Distributed Coverage Maximization via Sketching


Coverage problems are central in optimization and have a wide range of applications in data mining and machine learning. While several distributed algorithms have been developed for coverage problems, the existing methods suffer from several limitations, e.g., they all achieve either suboptimal approximation guarantees or suboptimal space and memory complexities. In addition, previous algorithms developed for submodular maximization assume oracle access, and ignore the computational complexity of communicating large subsets or computing the size of the union of the subsets in this subfamily. In this paper, we develop an improved distributed algorithm for the k-cover and the set cover with λ outliers problems, with almost optimal approximation guarantees, almost optimal memory complexity, and linear communication complexity running in only four rounds of computation. Finally, we perform an extensive empirical study of our algorithms on a number of publicly available real data sets, and show that using sketches of size 30 to 600 times smaller than the input, one can solve the coverage maximization problem with quality very close to that of the state-of-the-art single-machine algorithm.