Large Scale Computation of Means and Clusters for Persistence Diagrams Using Optimal Transport
Abstract
Persistence diagrams (PDs) are now routinely used to summarize the underlying
topology of sophisticated data encountered in challenging learning problems. Despite
several appealing properties, integrating PDs in learning pipelines can be
challenging because their natural geometry is not Hilbertian. In particular, algorithms
to average a family of PDs have only been considered recently and are
known to be computationally prohibitive. In this article, we propose a tractable
framework to carry out fundamental tasks on PDs, namely evaluating distances,
computing barycenters and carrying out clustering. This framework builds upon a
formulation of PD metrics as optimal transport (OT) problems, for which recent
computational advances, in particular entropic regularization and its convolutional
formulation on regular grids, can all be leveraged to provide efficient,
GPU-scalable computations. We demonstrate the efficiency of our approach by carrying
out clustering on PDs at scales never seen before in the literature.
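
As a rough illustration of the OT formulation mentioned above, the following minimal NumPy sketch evaluates an entropy-regularized transport cost between two small diagrams using Sinkhorn iterations. It is not the implementation described in this article. The function names, the regularization strength gamma, the iteration count, the squared-Euclidean ground cost, and the single virtual point standing in for the diagonal are all assumptions made for illustration. The dense loop shown here also does not use the convolutional formulation on regular grids.

```python
import numpy as np


def diagram_cost_matrix(X, Y):
    """Cost matrix between diagrams X (n x 2) and Y (m x 2).

    Assumption: each diagram is augmented with one virtual point that
    represents the diagonal (last row of C for X, last column for Y).
    """
    n, m = len(X), len(Y)
    C = np.zeros((n + 1, m + 1))
    # Squared Euclidean distance between off-diagonal points.
    C[:n, :m] = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    # Squared distance from (birth, death) to its orthogonal projection
    # on the diagonal is (death - birth)^2 / 2.
    C[:n, m] = (X[:, 1] - X[:, 0]) ** 2 / 2.0
    C[n, :m] = (Y[:, 1] - Y[:, 0]) ** 2 / 2.0
    # Diagonal-to-diagonal transport is free (C[n, m] stays 0).
    return C


def sinkhorn_plan(a, b, C, gamma=0.05, n_iter=2000):
    """Entropy-regularized transport plan via plain Sinkhorn scaling."""
    K = np.exp(-C / gamma)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def entropic_pd_cost(X, Y, gamma=0.05, n_iter=2000):
    """Return (transport cost, plan) between two persistence diagrams."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = len(X), len(Y)
    # Unit mass per point; the virtual diagonal point carries enough mass
    # to absorb every point of the other diagram, so totals match (n + m).
    a = np.append(np.ones(n), float(m))
    b = np.append(np.ones(m), float(n))
    C = diagram_cost_matrix(X, Y)
    P = sinkhorn_plan(a, b, C, gamma, n_iter)
    return float((P * C).sum()), P


if __name__ == "__main__":
    X = [[0.1, 0.5], [0.2, 0.9]]
    Y = [[0.1, 0.5]]
    cost, plan = entropic_pd_cost(X, Y)
    print(cost)
```

The Gibbs kernel exp(-C / gamma) can underflow when costs are large relative to gamma, so a log-domain variant would be a safer choice for diagrams with widely spread coordinates.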