Learning Permutations with gradient descent and the sinkhorn operator


Gradient descent methods have greatly facilitated the practice of machine learning, as the learning problem can be usually represented as the minimization of a differentiable function over some parameters. However, in cases where some dependencies between parameters and variables are discrete, gradient descent cannot be applied, unless those discrete nodes are relaxed to continued values ones, where derivatives can be defined. Nonetheless, no clear solution exists in cases of structured discrete objects defined by a certain combinatorial structure; for example, in permutations, which underlie the notions of ordering, ranking and matching of objects. Here we show how to extend the relaxation method to enable gradient descent in computational graphs containing permutations as deterministic or stochastic nodes. To this end, we first show that permutations can be approximated by the differentiable Sinkhorn operator. With this, we are able to define Sinkhorn networks for the supervised learning of permutations. Finally, for stochastic nodes (corresponding to latent distributions over permutations) we introduce two implicit distributions: Gumbel-Matching and its relaxation, the Gumbel-Sinkhorn, and we prescribe how to perform inferences. We demonstrate the effectiveness of our method by showing we achieve state-of-the-art results on several tasks involving both standard datasets and a scientific application.