Workshop on Learning-Augmented Algorithms

How can one use predictions to improve over approximation guarantees of classic algorithms, without increasing the running time? We propose a generic method for a broad class of optimization problems that ask to select a feasible subset of input items of minimal (or maximal) total weight. This gives simple (near-)linear-time algorithms for, e.g., Vertex Cover, Steiner Tree, Minimum Weight Perfect Matching, Knapsack, and Clique. Our algorithms produce an optimal solution when provided with perfect predictions and their approximation ratio smoothly degrades with increasing prediction error. With small enough prediction error we achieve approximation guarantees that are beyond the reach without predictions in given time bounds, as exemplified by the NP-hardness and APX-hardness of many of the above problems. Although we show our approach to be optimal for this class of problems as a whole, there is a potential for exploiting specific structural properties of individual problems to obtain improved bounds; we demonstrate this on the Steiner Tree problem. This is joint work with Antonios Antoniadis, Marek Eliáš, and Moritz Venzin.

Recent advances in machine learning have changed the theory and practice of algorithm design and analysis. From a theoretical perspective, new analysis frameworks have emerged including data-driven algorithm design (where we learn algorithms for solving problem instances coming from a given domain based on training instances from that domain) and algorithms with predictions (where algorithms are designed to take advantage of possibly imperfect machine learning predictions of some aspect of the problem).

In this talk we argue that we also need to loop back and expand our techniques for designing and analyzing machine learning algorithms themselves. I will present recent work that builds on tools from data driven algorithm design to address the question of how to tune machine learning algorithms, to achieve desired performance for a given domain, with provable guarantees. The main case study I will discuss is learning decision tree learning algorithms, a class of machine learning approaches often preferred in practice due to their high degree of interpretability.

We consider variants of online convex optimization in which before choosing a point, the algorithm receives a "hint" about the loss function that is to arrive. Such hints may be the outcome of some prediction algorithms operating with domain-specific side information. Recent works have shown that if hints are guaranteed to be "well-correlated" with the loss, then known regret bounds can be improved significantly (under additional assumptions).

In this talk, I will discuss some limitations of existing results and present hint models that can overcome them. Specifically, for the simple setting of online linear optimization on the sphere with bandit feedback, I will show a lower bound that \sqrt{T} regret is unavoidable even with well-correlated hints. I will then discuss the surprising power of "queried" hints: if the algorithm can get a weak indication of which of two points (or arms) is better before playing, it can overcome the lower bound and obtain constant regret.

This talk discusses how to leverage the framework of algorithms-with-predictions to design data structures for two fundamental dynamic graph problems: incremental topological ordering and cycle detection. In these problems, the input is a directed graph on n nodes, and the m edges arrive one by one. The data structure must maintain a topological ordering of the vertices at all times and detect if the newly inserted edge creates a cycle. The theoretically best worst-case algorithms for these problems have high update cost (polynomial in n and m). In practice, greedy heuristics (that recompute the solution from scratch each time) perform well but can have high update cost in the worst case.

We bridge this gap by leveraging predictions to design a learned new data structure for the problems. Our data structure guarantees consistency, robustness, and smoothness with respect to predictions -- that is, it has the best possible running time under perfect predictions, never performs worse than the best-known worst-case methods, and its running time degrades smoothly with the prediction error. Moreover, we demonstrate empirically that predictions, learned from a very small training dataset, are sufficient to provide significant speed-ups on real datasets.

In light of recent work on scheduling with predicted job sizes, we consider the effect of the cost of predictions in queueing systems, removing the assumption in prior research that predictions are external to the system’s resources and/or cost-free. In particular, we introduce a novel approach to utilizing predictions, SkipPredict, designed to address their inherent cost. Rather than uniformly applying predictions to all jobs, we propose a tailored approach that categorizes jobs based on their prediction requirements. To achieve this, we employ one-bit “cheap predictions” to classify jobs as either short or long. SkipPredict prioritizes predicted short jobs over long jobs, and for the latter, SkipPredict applies a second round of more detailed “expensive predictions” to approximate Shortest Remaining Processing Time for these jobs. Our analysis takes into account the cost of prediction. We examine the effect of this cost for two distinct models. In the external cost model, predictions are generated by some external method without impacting job service times but incur a cost. In the server time cost model, predictions themselves require server processing time, and are scheduled on the same server as the jobs.

In correlation clustering, the input is a complete graph where an edge indicates whether its endpoints should be in the same cluster or not. The goal is to partition the vertices in a way that minimizes the $\ell_p$-norm of the vertices’ disagreements. It is known that in the online setting, where nodes arrive one-by-one and must be irrevocably assigned to a cluster upon arrival, there are $\Omega(n)$ lower bounds on the competitive ratio of any online algorithm for correlation clustering. In search of meaningful ways to study this problem online despite these lower bounds, we consider the online-with-a-sample model, an approach that goes beyond worst-case analysis. To study correlation clustering in this model, we first created a new, combinatorial approach in the offline setting. In this talk, I’ll outline the machinery needed to study $\ell_p$-norm correlation clustering in the online-with-a-sample model, and showcase some unique structural insights into correlation clustering that we found along the way.

Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream. We give a novel algorithm, which in some parameter regimes, already theoretically outperforms the learning based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. This is joint work with Anders Aamand, Justin Y. Chen, Sandeep Silwal, and Ali Vakilian.

We consider learning augmented problems, where our goal is to be within 1% of an algorithm that "follows the prediction". Our prediction is provided at any step of an online algorithm and hence may be adaptive and based possibly upon all previous history. Specifically, the prediction may be any information that yields an action to be taken by an online algorithm at the current step. An algorithm is called 1-smooth if its cost (value) is as low (high) as the algorithm that follows the prediction. For the packet scheduling with deadlines problem (where the goal is to maximize the total value of transmitted packets), we provide 1-smooth and 3-robust algorithm. That means an algorithm that always achieves a value which is as large as "following the prediction" but never worse than 3 times the optimal solution for every possible sequence and prediction. For the list update problem (where the goal is to minimize the access cost plus the movement cost), we provide (1+\epsilon)-smooth randomized algorithm, offering robustness of O(1/\epsilon^4). That means an algorithm that always pays a cost which is within (1+\epsilon) of the cost incurred by "following the prediction" but never worse than O(1/\epsilon^4) times the optimal solution for every possible sequence and prediction.

Joint work with Shahar Lewkowicz, Varun Suriyanarayana and Or Vardi

The main bottleneck in designing efficient dynamic algorithms is the unknown nature of the update sequence. In particular, there are some problems, like triconnectivity, planar digraph all pairs shortest paths, k-edge connectivity, and others, where the separation in runtime between the best offline or partially dynamic solutions and the best fully dynamic solutions is polynomial, sometimes even exponential.

In this talk, we introduce the predicted-updates dynamic model, one of the first beyond-worst-case models for dynamic algorithms, which generalizes a large set of well-studied dynamic models including the offline dynamic, incremental, and decremental models to the fully dynamic setting when given predictions about the update times of the elements. In the most basic form of our model, we receive a set of predicted update times for all of the updates that occur over the event horizon. We give a novel framework that “lifts” offline divide-and-conquer algorithms into the fully dynamic setting with little overhead. Using this, we are able to interpolate between the offline and fully dynamic settings; when the $\ell_1$ error of the prediction is linear in the number of updates, we achieve the offline runtime of the algorithm (up to poly log n factors).

Joint work with Vaidehi Srinivas, appeared in COLT 2024.

We consider the Linear Elastic Caching problem, where the goal is to minimize the total cost of a cache inclusive of its misses and its memory footprint integrated over time. We show a theoretical connection to the classic ski-rental problem and propose a practical algorithm that combines online caching algorithms with learned ski-rental policies.

Joint work with Todd Lipcon, Manish Purohit, Tamas Sarlos.

CountSketch, or feature hashing, is a sparse random dimensionality reduction technique used in applications such as compressed sensing, low rank approximation, regression, second order optimization, and streaming algorithms. We study learned versions of CountSketch, where we learn both the positions and values of the hashing matrix to better adapt to the underlying data. We give theoretical and empirical results, showing improvements over classical CountSketch or earlier learned CountSketch algorithms where only the values are learned.

Based on a joint work with Yi Li, Honghao Lin, Simin Liu, and Ali Vakilian, with some discussion of earlier work joint with Piotr Indyk and Tal Wagner

Given a ground truth clustering, we consider the problem of recovering the underlying clusters by querying an oracle. The oracle receives as its input a subset of nodes, and outputs the number of clusters induced on those nodes. We would like to minimize the number of queries to the oracle, and consider further practical considerations such as query size, adaptivity etc. We will dig into the history of this problem, the motivation to study it, and will present some recent developments in this area.

The problem of exploring an unknown graph using predictions is a classical one, and strategies like Astar have long been studied for it. I will discuss some recent quantitative results about this problem, which give bounds on the cost of an agent exploring the graph in terms of the quality of the predictions. This is based on joint work with Siddhartha Banerjee, Vincent Cohen-Addad, and Zhouzi Li.

Algorithms with predictions is a recent framework for decision-making under uncertainty that leverages the power of machine-learned predictions without making any assumption about their quality. The goal in this framework is for algorithms to achieve an improved performance when the predictions are accurate while maintaining acceptable guarantees when the predictions are erroneous. A serious concern with algorithms that use predictions is that these predictions can be biased and, as a result, cause the algorithm to make decisions that are deemed unfair. We show that this concern manifests itself in the classical secretary problem in the learning-augmented setting---the state-of-the-art algorithm can have zero probability of accepting the best candidate, which we deem unfair, despite promising to accept a candidate whose expected value is at least max{Omega(1), 1 - O(ε)} times the optimal value, where ε is the prediction error. We show how to preserve this promise while also guaranteeing to accept the best candidate with probability Omega(1). Our algorithm and analysis are based on a new ``pegging'' idea that diverges from existing works and simplifies/unifies some of their results. Finally, we extend to the k-secretary problem and complement our theoretical analysis with experiments.

Joint work with Will Ma and Andreas Maggiori

Decision-makers often have access to a machine-learned prediction about demand, referred to as advice, which can potentially be utilized in online decision-making processes for resource allocation. However, exploiting such advice poses challenges due to its potential inaccuracy. To address this issue, we propose a framework that enhances online resource allocation decisions with potentially unreliable machine-learned (ML) advice. We assume here that this advice is represented by a general convex uncertainty set for the demand vector.

We introduce a parameterized class of Pareto optimal online resource allocation algorithms that strike a balance between consistent and robust ratios. The consistent ratio measures the algorithm's performance (compared to the optimal hindsight solution) when the ML advice is accurate, while the robust ratio captures performance under an adversarial demand process when the advice is inaccurate. Specifically, in a C-Pareto optimal setting, we maximize the robust ratio while ensuring that the consistent ratio is at least C. Our proposed C-Pareto optimal algorithm is an adaptive protection level algorithm, which extends the classical fixed protection level algorithm introduced in Littlewood (2005) and Ball and Queyranne (2009). Solving a complex non-convex continuous optimization problem characterizes the adaptive protection level algorithm. To complement our algorithms, we present a simple method for computing the maximum achievable consistent ratio, which serves as an estimate for the maximum value of the ML advice. Additionally, we present numerical studies to evaluate the performance of our algorithm in comparison to benchmark algorithms. The results demonstrate that by adjusting the parameter C, our algorithms effectively strike a balance between worst-case and average performance, outperforming the benchmark algorithms.

Assignment problems are among the most well-studied in online algorithms. In these problems, a sequence of items arriving online must be assigned among a set of agents so as to optimize a given objective. This encompasses scheduling problems for minimizing makespan, p-norms, and other objectives, as well as fair division problems such as the Santa Claus problem and Nash welfare maximization. One common feature is that many of these problems are characterized by strong worst-case lower bounds in the online setting. To circumvent these impossibility results, recent research has focused on using additional (learned) information about the problem instance and this has led to dramatic improvements in the competitive ratio over the worst case. In this talk, I will first survey some of this literature (Lattanzi et al., SODA 20; Li and Xian, ICML 21; Banerjee et al., SODA 22; Barman et al., AAAI 22) that addresses specific problems in this domain. I will then proceed to describe joint work with Ilan Cohen that brings these problems under one umbrella: we give a single algorithmic framework for near-optimal learning-augmented online assignment for a large class of maximization and minimization objectives, including all the problems mentioned above.

Online load balancing for heterogeneous machines aims to minimize the makespan (maximum machine workload) by scheduling arriving jobs with varying sizes on different machines. In the adversarial setting, where an adversary chooses not only the collection of job sizes but also their arrival order, the problem is well-understood and the optimal competitive ratio is known to be $\Theta(\log m)$ where $m$ is the number of machines. In the more realistic random arrival order model, the understanding is limited. Previously, the best lower bound on the competitive ratio was only $\Omega(\log \log m)$. We significantly improve this bound by showing an $\Omega(\sqrt {\log m})$ lower bound, even for the restricted case where each job has a unit size on two machines and infinite size on the others. On the positive side, we propose an $O(\log m/ \log \log m)$-competitive algorithm, demonstrating that better performance is possible in the random arrival model.

This is a joint work with Ravi Kumar, Shi Li, Aditya Petety, and Manish Purohit, which appeared in SPAA '24.

Online Bayesian bipartite matching is a central problem in digital marketplaces and exchanges, including advertising, crowdsourcing, ridesharing, and kidney exchange. We introduce a graph neural network (GNN) approach that emulates the problem's combinatorially complex optimal online algorithm, which selects actions (e.g., which nodes to match) by computing each action's value-to-go (VTG)—the expected weight of the final matching if the algorithm takes that action, then acts optimally in the future. We train a GNN to estimate VTG and show empirically that this GNN returns high-weight matchings across a variety of tasks. Moreover, we identify a common family of graph distributions in spatial crowdsourcing applications, such as rideshare, under which VTG can be efficiently approximated by aggregating information within local neighborhoods in the graphs. This structure matches the local behavior of GNNs, providing theoretical justification for our approach.

This is joint work with Alexandre Hayderi, Amin Saberi, and Anders Wikum.

Workshop on Learning-Augmented Algorithms

Date: August 19-21, 2024
Location: TTIC, Chicago, IL
Address: 6045 S Kenwood Ave, Chicago, IL 60637

Parking Information

Organizers:

Sponsors:

Registration:

Participants:

Schedule (Tentative):

Workshop on Learning-Augmented Algorithms

Date: August 19-21, 2024 Location: TTIC, Chicago, IL Address: 6045 S Kenwood Ave, Chicago, IL 60637 Parking Information

Organizers:

Sponsors:

Registration:

Participants:

Schedule (Tentative):

Date: August 19-21, 2024
Location: TTIC, Chicago, IL
Address: 6045 S Kenwood Ave, Chicago, IL 60637

Parking Information