Jakub Łącki

Jakub Łącki is a research scientist working on the graph mining and large-scale optimization teams. He received his PhD from the University of Warsaw in 2015, advised by Piotr Sankowski. Before joining Google, he was a postdoctoral researcher at Sapienza University of Rome, working with Stefano Leonardi.
Authored Publications
    We introduce TeraHAC, a (1+epsilon)-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to computing (1+epsilon)-approximate HAC, which is a novel combination of the nearest-neighbor chain algorithm and the notion of (1+epsilon)-approximate HAC. Our approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed. We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds compared to previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time.
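    To make the nearest-neighbor chain ingredient concrete, here is a minimal sequential sketch of the classic nearest-neighbor chain algorithm for exact average-linkage HAC on a small dissimilarity matrix. It only illustrates the chain idea that TeraHAC builds on; it is not the distributed (1+epsilon)-approximate algorithm, and the function name and dense-matrix representation are illustrative choices.

```python
import numpy as np

def nn_chain_average_linkage(D):
    """Exact average-linkage HAC via the nearest-neighbor chain algorithm.
    D: symmetric dissimilarity matrix. Returns a list of (a, b, dist) merges,
    where cluster b is merged into cluster a."""
    D = np.array(D, dtype=float)
    size = np.ones(D.shape[0])
    active = set(range(D.shape[0]))
    chain, merges = [], []
    while len(active) > 1:
        if not chain:
            chain.append(min(active))
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Nearest active neighbor of a, breaking ties toward the previous chain element.
        b = min((j for j in active if j != a),
                key=lambda j: (D[a, j], j != prev))
        if b == prev:
            # a and b are reciprocal nearest neighbors: merge b into a.
            chain.pop(); chain.pop()
            merges.append((a, b, D[a, b]))
            # Lance-Williams update for average linkage (UPGMA).
            for k in active - {a, b}:
                D[a, k] = D[k, a] = (size[a] * D[a, k] + size[b] * D[b, k]) / (size[a] + size[b])
            size[a] += size[b]
            active.remove(b)
        else:
            chain.append(b)
    return merges
```

    For example, nn_chain_average_linkage([[0, 1, 4], [1, 0, 5], [4, 5, 0]]) first merges points 0 and 1 (distance 1.0) and then merges the resulting cluster with point 2 (average distance 4.5).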
    Adaptive Massively Parallel Connectivity in Optimal Space
    Jara Uitto
    Rustam Latypov
    Yannic Maus
    SPAA'23 (2023)
    We study the problem of finding connected components in the Adaptive Massively Parallel Computation (AMPC) model. We show that when the total available space is linear in the size of the input graph, the problem can be solved in O(log* n) rounds in forests (with high probability) and 2^O(log* n) expected rounds in general graphs. This improves upon an existing O(log log_(m/n) n) round algorithm. For the case when the desired number of rounds is constant, we show that both problems can be solved with only O(m + n log^(k) n) total space, where k is an arbitrarily large constant and log^(k) is the k-th iterate of the log_2 function. This improves upon existing algorithms requiring Omega(m + n log n) total space.
    Obtaining scalable algorithms for hierarchical agglomerative clustering (HAC) is of significant interest due to the massive size of real-world datasets. At the same time, efficiently parallelizing HAC is difficult due to the seemingly sequential nature of the algorithm. In this paper, we address this issue and present ParHAC, the first efficient parallel HAC algorithm with sublinear depth for the widely-used average-linkage function. In particular, we provide a (1+ϵ)-approximation algorithm for this problem on graphs with m edges using O(m polylog m) work and poly-logarithmic depth. Moreover, we show that obtaining similar bounds for exact average-linkage HAC is not possible under standard complexity-theoretic assumptions. We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare with several state-of-the-art sequential and parallel baselines. On a broad set of large publicly-available real-world datasets, we find that ParHAC obtains a 50.1x speedup on average over the best sequential baseline, while achieving quality similar to the exact HAC algorithm. We also show that ParHAC can cluster one of the largest publicly available graph datasets, with 124 billion edges, in a little over three hours using a commodity multicore machine.
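    As a purely illustrative companion to the (1+ϵ)-approximation notion above, the sketch below runs sequential average-linkage HAC on a weighted similarity graph, at each step merging any cluster pair whose similarity is within a (1+ϵ) factor of the current best. The edge-list representation and function name are assumptions of this sketch; it has none of ParHAC's parallelism or work bounds.

```python
from collections import defaultdict

def approx_average_linkage_hac(n, edges, eps):
    """Sequential (1+eps)-approximate average-linkage HAC on a similarity graph.
    edges: list of (u, v, weight) triples on vertices 0..n-1 (higher = more similar).
    Returns the list of merges as (cluster_a, cluster_b, similarity)."""
    size = {i: 1 for i in range(n)}
    total = defaultdict(float)              # (a, b) with a < b -> total cross-edge weight
    for u, v, w in edges:
        total[(min(u, v), max(u, v))] += w

    def sim(p):
        a, b = p
        return total[p] / (size[a] * size[b])   # average-linkage similarity

    merges = []
    while total:
        best = max(sim(p) for p in total)
        # (1+eps)-approximate step: merge any pair close enough to the best similarity.
        a, b = next(p for p in total if sim(p) >= best / (1.0 + eps))
        merges.append((a, b, sim((a, b))))
        # Merge cluster b into cluster a and recompute cross-cluster totals.
        new_total = defaultdict(float)
        for (x, y), w in total.items():
            x, y = (a if x == b else x), (a if y == b else y)
            if x != y:
                new_total[(min(x, y), max(x, y))] += w
        total = new_total
        size[a] += size.pop(b)
    return merges
```

    Setting eps = 0 recovers exact greedy average-linkage HAC; larger eps gives the algorithm freedom in which near-best merge to perform.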
    Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LambdaCC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly-optimized implementations that scale to large data sets of billions of edges and that obtain high-quality clusters compared to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection. For example, on a 30-core machine with two-way hyper-threading, our implementations achieve orders of magnitude speedups over other correlation clustering baselines, and up to 28.44x speedups over our own sequential baselines, while maintaining or improving quality.
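    For readers unfamiliar with the LambdaCC objective referenced above, the snippet below scores a given clustering under one common unweighted form of the objective, in which a cut edge costs 1 − λ and a non-adjacent pair placed in the same cluster costs λ. This particular formulation and the helper name are assumptions of the sketch, not the framework's implementation.

```python
from itertools import combinations

def lambda_cc_cost(nodes, edges, clustering, lam):
    """Disagreement cost of `clustering` (a dict node -> cluster id) under an
    assumed unweighted LambdaCC-style objective: a cut edge costs (1 - lam)
    and a non-adjacent intra-cluster pair costs lam."""
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        same = clustering[u] == clustering[v]
        if frozenset((u, v)) in edge_set:
            cost += 0.0 if same else (1.0 - lam)   # edge cut between clusters
        else:
            cost += lam if same else 0.0           # missing edge inside a cluster
    return cost
```

    The single parameter trades off cluster size against edge cuts: small λ makes cutting edges expensive (favoring large clusters), large λ makes grouping non-adjacent nodes expensive (favoring small clusters).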
    We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient Õ(m) time exact algorithms for classic linkage measures, such as complete- and WPGMA-linkage, as well as other measures. Furthermore, for average-linkage, arguably the most popular variant of HAC, we provide an algorithm that runs in Õ(n sqrt(m)) time. For this variant, this is the first exact algorithm that runs in subquadratic time, as long as m = n^(2−ϵ) for some constant ϵ > 0. We complement this result with a simple ϵ-close approximation algorithm for average-linkage in our framework that runs in Õ(m) time. As an application of our algorithms, we consider clustering points in a metric space by first using k-NN to generate a graph from the point set, and then running our algorithms on the resulting weighted graph. We validate the performance of our algorithms on publicly available datasets, and show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
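    The point-clustering application described in the abstract can be approximated end to end with off-the-shelf components. The sketch below builds a k-NN graph and runs average-linkage agglomerative clustering restricted to that graph using scikit-learn; it stands in for, and is much slower than, the paper's Õ(m)-time algorithms, and the dataset and parameter values are purely illustrative.

```python
# Illustrative pipeline: k-NN graph construction followed by graph-restricted
# average-linkage clustering, using scikit-learn instead of the paper's algorithms.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 16))                 # toy point set

knn = kneighbors_graph(points, n_neighbors=10, mode='distance')
model = AgglomerativeClustering(n_clusters=20, linkage='average',
                                connectivity=knn)
labels = model.fit_predict(points)                   # cluster label per point
```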
    Classical single source shortest paths algorithms work by maintaining distance estimates d : V → ℝ and performing so-called edge relaxations. We call an edge uv of weight w(uv) relaxed if d(v) ≤ d(u) + w(uv), and tense otherwise. To relax a tense edge uv means to set d(v) to d(u) + w(uv). It is known that starting from d(s) = 0, and d(v) = ∞ for all v ≠ s, and performing edge relaxations in arbitrary order until there are no more tense edges leads to d being equal to the distances from the source s. This overall idea can be extended to a very simple incremental algorithm for maintaining shortest paths. We consider an operation which can be seen as a dual of a relaxation and study an approximate version of both operations. We show that by repeating the respective operation until convergence one obtains very simple incremental and decremental deterministic algorithms for (1 + ∊)-approximate shortest paths in directed graphs. Specifically, we give an algorithm maintaining all-pairs approximate shortest paths in O(n^3 log n log (nW)/∊) total update time, where the graph's edge weights come from the interval [1,W]. This is two log-factors faster than the known folklore solution obtained by combining King's decremental transitive closure algorithm [King, FOCS'99] and the h-SSSP algorithm [Bernstein, SICOMP'16] for h = 2. In addition, we give an algorithm for approximating single source shortest paths of hop-length at most h in O(mh log(nW)/∊) total time. The obtained algorithm is simpler and more efficient than Bernstein's h-SSSP algorithm [Bernstein, SICOMP'16].
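    The relaxation rule recalled in the abstract is easy to state in code. Below is a minimal static sketch that relaxes tense edges until none remain (a Bellman-Ford-style fixpoint); it is not the paper's incremental or decremental data structure, and the function name and edge-list format are illustrative.

```python
import math

def shortest_paths_by_relaxation(n, edges, s):
    """Distances from s obtained by relaxing tense edges until none remain.
    edges: list of (u, v, w) triples on vertices 0..n-1; assumes no negative cycles."""
    d = [math.inf] * n
    d[s] = 0.0
    changed = True
    while changed:                      # repeat until no edge is tense
        changed = False
        for u, v, w in edges:
            if d[u] + w < d[v]:         # edge (u, v) is tense
                d[v] = d[u] + w         # relax it
                changed = True
    return d
```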
    We study fundamental graph problems such as graph connectivity, minimum spanning forest (MSF), and approximate maximum (weight) matching in a distributed setting. In particular, we focus on the Adaptive Massively Parallel Computation (AMPC) model, which is a theoretical model that captures MapReduce-like computation augmented with a distributed hash table. We show the first AMPC algorithms for all of the studied problems that run in a constant number of rounds and use only O(n^ϵ) space per machine, where 0 < ϵ < 1. Our results improve both upon the previous results in the AMPC model, as well as the best-known results in the MPC model, which is the theoretical model underpinning many popular distributed computation frameworks, such as MapReduce, Hadoop, Beam, Pregel and Giraph. Finally, we provide an empirical comparison of the algorithms in the MPC and AMPC models in a fault-tolerant distributed computation environment. We empirically evaluate our algorithms on a set of large real-world graphs and show that our AMPC algorithms can achieve improvements in both running time and round-complexity over optimized MPC baselines.
    Walking Randomly, Massively, and Efficiently
    Krzysztof Onak
    Piotr Sankowski
    Slobodan Mitrović
    STOC 2020
    We introduce a set of techniques that allow for efficiently generating many independent random walks in the Massively Parallel Computation (MPC) model with space per machine strongly sublinear in the number of vertices. In this space-per-machine regime, many natural approaches to graph problems struggle to overcome the Θ(log n) MPC round complexity barrier, where n is the number of vertices. Our techniques enable achieving this for PageRank—one of the most important applications of random walks—even in more challenging directed graphs, as well as for approximate bipartiteness and expansion testing. In the undirected case, we start our random walks from the stationary distribution, which implies that we approximately know the empirical distribution of their next steps. This allows for preparing continuations of random walks in advance and applying a doubling approach. As a result, we can generate multiple random walks of length l in Θ(log l) rounds on MPC. Moreover, we show that under the popular 1-vs.-2-Cycles conjecture, this round complexity is asymptotically tight. For directed graphs, our approach stems from our treatment of the PageRank Markov chain. We first compute the PageRank for the undirected version of the input graph and then slowly transition towards the directed case, considering convex combinations of the transition matrices in the process. For PageRank with damping factor equal to 1 − є, we achieve the following round complexities: O(log log n + log 1/є) rounds for undirected graphs (with Õ(m / є^2) total space), and Õ(log^2 log n + log^2 1/є) rounds for directed graphs (with Õ((m + n^(1+o(1))) / poly(є)) total space). The round complexity of our result for computing PageRank has only logarithmic dependence on 1/є. We use this to show that our PageRank algorithm can be used to construct directed length-l random walks in O(log^2 log n + log^2 l) rounds with Õ((m + n^(1+o(1))) poly(l)) total space. More specifically, by setting є = Θ(1/l), a length-l PageRank walk with constant probability contains no random jump, and hence is a directed random walk.
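    As background for the PageRank discussion, here is a plain sequential power-iteration reference for PageRank with damping factor 1 − є (so є is the probability of a random jump). It shows the Markov chain being approximated, not the MPC construction; the dangling-node handling and iteration count are assumptions of this sketch.

```python
import numpy as np

def pagerank(adj, eps, iters=100):
    """Power iteration for PageRank with damping factor 1 - eps.
    adj[i, j] = 1 if there is an edge i -> j; rows with no out-edges are
    treated as uniform (an assumption of this sketch)."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; dangling rows become uniform.
    P = np.divide(A, out_deg, out=np.full_like(A, 1.0 / n), where=out_deg > 0)
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = eps / n + (1.0 - eps) * (pi @ P)   # jump with prob. eps, walk otherwise
    return pi
```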
    In this paper we show new dynamic algorithms for the all-pairs shortest paths problem in weighted directed graphs. Most importantly, we give a new deterministic incremental algorithm for the problem that handles updates in O~(n^{4/3} log W/ε) amortized time (where the edge weights are from [1, W]) and explicitly maintains a (1 + ε)-approximate distance matrix. For a fixed ε > 0, this is the first deterministic partially dynamic algorithm for all-pairs shortest paths in directed graphs whose update time is o(n^2), regardless of the number of edges. To obtain this result, we give a new, very simple deterministic incremental algorithm for the same problem, whose total update time is O~(n^3/ε). This algorithm can be easily adapted to work in the decremental setting within the same time bound. By using our techniques we also show how to improve the state-of-the-art partially dynamic randomized algorithms for all-pairs shortest paths [Baswana et al. STOC'02, Bernstein STOC'13] from Monte Carlo randomized to Las Vegas randomized, without increasing the running time bounds (with respect to the O~() notation).
    We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the widely popular Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round, all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by previous empirical studies of distributed graph algorithms using MapReduce and a distributed hash table service. This extension allows us to give new graph algorithms with much lower round complexities compared to the best known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in O(1) rounds, and connectivity/minimum spanning tree in O(log log_{m/n} n) rounds, which is an exponential improvement upon the best known algorithms in the MPC model with sublinear space per machine. Our results imply that the 2-Cycle conjecture, the most popular hardness conjecture in the MPC model, does not hold in the AMPC model.
    Stochastic Graph Exploration
    Aris Anagnostopoulos
    Ilan R. Cohen
    Stefano Leonardi
    ICALP 2019
    Exploring large-scale networks is a time-consuming and expensive task which is usually operated in a complex and uncertain environment. A crucial aspect of network exploration is the development of suitable strategies that decide which nodes and edges to probe at each stage of the process. Adaptive strategies are dynamically updated to adjust to the current status of the exploration process. Nonadaptive, oblivious strategies are also appealing, since they greatly reduce the computation and communication costs required by adaptive strategies. In order to model this process we introduce the \emph{stochastic graph exploration problem}. The input is an undirected graph $G=(V,E)$ with a source vertex $s$, stochastic costs for the edges drawn from distributions $\pi_e, e\in E$, and rewards on vertices of maximum value $R$. The goal is to find a set $F$ of edges of total cost at most $B$ such that the subgraph of $G$ induced by $F$ is connected, contains $s$, and has maximum total reward. This problem generalizes the stochastic knapsack problem and other stochastic probing problems recently studied. Our focus is on the development of efficient nonadaptive strategies with small adaptivity gap that are competitive against the optimal adaptive strategy. Unfortunately, we prove an $\Omega(n)$ adaptivity gap for the stochastic graph exploration problem, even on a tree of $n$ vertices, in contrast with the $O(1)$ adaptivity gap of the stochastic knapsack problem. We circumvent this negative result by showing that $O(\log nR)$ resource augmentation suffices for an adaptivity gap of $O(1)$ on trees and $O(\log nR)$ on general graphs. In order to achieve this result, we reduce stochastic graph exploration to a memoryless process -- the \emph{minesweeper} problem -- which assigns to every edge that is traversed a probability that the process would terminate. For this problem, interesting in its own right, we present an optimal polynomial time algorithm on trees and an $O(\log nR)$ approximation for general graphs. We also study the setting in which the maximum cost of an edge is a logarithmic fraction of the budget. We show that under this condition there exist polynomial time oblivious strategies that use a $(1+\epsilon)$ budget and obtain adaptivity gaps of $1+\epsilon$ on trees and $8+\epsilon$ on general graphs. Finally, we provide additional results on the structure and the complexity of nonadaptive and adaptive strategies.
    Optimal Dynamic Strings
    Adam Karczmarz
    Paweł Gawrychowski
    Piotr Sankowski
    Tomasz Kociumaka
    SODA 2018
    In this paper we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: (a) make_string -- add a string of constant length, (b) concatenate -- concatenate two strings, (c) split -- split a string into two at a given position, (d) compare -- find the lexicographical order (less, equal, greater) between two strings, (e) LCP -- calculate the longest common prefix of two strings. We develop a generic framework for dynamizing the recompression method recently introduced by Jeż [J. ACM 2016]. It allows us to present an efficient data structure for the above problem, where an update requires only O(log n) worst-case time with high probability, with n being the total length of all strings in the collection, and a query takes constant worst-case time. On the lower bound side, we prove that even if the only possible query is checking equality of two strings, either updates or queries must take amortized Omega(log n) time; hence our implementation is optimal.
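    To pin down the interface, here is a deliberately naive reference implementation of the five operations; every operation here takes linear time, whereas the recompression-based structure described above supports updates in O(log n) time and queries in O(1) time. The class and handle scheme are illustrative assumptions.

```python
class NaiveDynamicStrings:
    """Naive, linear-time reference for the dynamic string collection interface."""

    def __init__(self):
        self._strings = []                      # handle -> Python string

    def make_string(self, s):
        self._strings.append(s)
        return len(self._strings) - 1           # return a handle to the new string

    def concatenate(self, a, b):
        return self.make_string(self._strings[a] + self._strings[b])

    def split(self, a, pos):
        s = self._strings[a]
        return self.make_string(s[:pos]), self.make_string(s[pos:])

    def compare(self, a, b):
        sa, sb = self._strings[a], self._strings[b]
        return (sa > sb) - (sa < sb)            # -1 / 0 / 1 lexicographic order

    def lcp(self, a, b):
        sa, sb = self._strings[a], self._strings[b]
        k = 0
        while k < min(len(sa), len(sb)) and sa[k] == sb[k]:
            k += 1
        return k
```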
    Round Compression for Parallel Matching Algorithms
    Aleksander Mądry
    Artur Czumaj
    Krzysztof Onak
    Piotr Sankowski
    Slobodan Mitrović
    STOC 2018
    For over a decade now we have been witnessing the success of massively parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem—one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. (SPAA 2011) showed that if each machine has n^(1+Ω(1)) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem to get stuck in a fundamental way at roughly O(log n) rounds once we enter the (at most) near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that possibility. That is, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (2+є)-approximate maximum matching, for any fixed constant є>0, in O((log log n)^2) rounds. To establish our result we need to deviate from the previous work in two important ways that are crucial for exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we use vertex-based graph partitioning, instead of the edge-based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an O(1)-approximation of maximum matching in O(log n) independent PRAM phases and implement a super-constant number of these phases in only a constant number of MPC rounds.
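    For reference, the 2-approximation baseline discussed above amounts to computing a maximal matching; the sketch below is that simple sequential baseline, not the round-compressed MPC algorithm.

```python
def greedy_maximal_matching(edges):
    """Greedy maximal matching over a list of (u, v) edges. Any maximal matching
    has at least half as many edges as a maximum matching, i.e. it is a
    2-approximation."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```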
    Decremental SPQR-trees for Planar Graphs
    Adam Karczmarz
    Eva Rotenberg
    Giuseppe F. Italiano
    Jacob Holm
    ESA 2018
    We present a decremental data structure for maintaining the SPQR-tree of a planar graph, subject to contractions and deletions of edges. The amortized update time is O(log^2 n). Via SPQR-trees, we show a data structure that maintains 3-vertex connectivity in planar graphs, answers queries in O(1) time and processes edge deletions and contractions in O(log^2 n) time (amortized over Omega(n) updates). This is an exponential improvement over a previously known O(sqrt(n)) bound that has stood for over 20 years.
    Contracting a Planar Graph Efficiently
    Adam Karczmarz
    Eva Rotenberg
    Giuseppe F. Italiano
    Jacob Holm
    Piotr Sankowski
    ESA (2017)
    We show a data structure that can maintain a simple planar graph under edge contractions in linear total time. The data structure supports adjacency queries and provides access to neighbor lists in $O(1)$ time. Moreover, it can report all the arising loops and parallel edges. By applying the data structure, we can improve the running times of algorithms for several planar graph problems, including decremental 2-edge, 2-vertex and 3-edge connectivity. We also show that using our data structure in a black-box manner, one obtains very simple optimal algorithms for computing MST and 5-coloring in planar graphs.
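    A naive version of the contraction interface above is easy to write with union-find plus per-vertex neighbor sets. The sketch below supports adjacency queries and reports the parallel edges that arise from a contraction, but it does not exploit planarity and does not achieve the paper's linear total time; class and method names are illustrative.

```python
class NaiveContractions:
    """Maintain a simple graph under edge contractions (naive, not linear-time)."""

    def __init__(self, n, edges):
        self.parent = list(range(n))            # union-find forest over vertices
        self.neighbors = [set() for _ in range(n)]
        for u, v in edges:
            self.neighbors[u].add(v)
            self.neighbors[v].add(u)

    def find(self, x):                          # union-find with path halving
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def adjacent(self, u, v):
        u, v = self.find(u), self.find(v)
        return v in self.neighbors[u]

    def contract(self, u, v):
        """Contract edge (u, v); returns the common neighbors, i.e. the
        parallel edges that arise and are merged to keep the graph simple."""
        u, v = self.find(u), self.find(v)
        if u == v:
            return set()                        # endpoints already merged
        if len(self.neighbors[u]) < len(self.neighbors[v]):
            u, v = v, u                         # merge the smaller neighbor set
        parallel = self.neighbors[u] & self.neighbors[v]
        for w in self.neighbors[v]:
            self.neighbors[w].discard(v)
            if w != u:
                self.neighbors[w].add(u)
                self.neighbors[u].add(w)
        self.neighbors[u].discard(v)
        self.neighbors[v] = set()
        self.parent[v] = u
        return parallel
```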
    Decremental Single-Source Reachability in Planar Digraphs
    Giuseppe F. Italiano
    Adam Karczmarz
    Piotr Sankowski
    STOC, ACM (2017)