Bootstrapped Graph Diffusions: Exposing the Power of Nonlinearity

Abstract

Graph-based semi-supervised learning (SSL) algorithms predict labels for all nodes of a graph based on the provided labels of a small set of seed nodes. Classic methods capture the graph structure through some underlying diffusion process that propagates through the graph edges. Spectral diffusions, which include personalized PageRank and label propagation, can be formulated as repeated weighted averaging of the label vectors of adjacent nodes. Social diffusions propagate through shortest paths. Common ground to these diffusions is their {\em linearity}, which does not distinguish between the contributions of a few ``strong'' relations and many ``weak'' relations.
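As a rough illustration of the linear spectral diffusion described above (a minimal sketch, not the paper's exact formulation; the function name and the clamping schedule are our own choices), label propagation can be written as repeated row-stochastic averaging of neighbors' label vectors, with seed nodes held fixed:

```python
import numpy as np

def label_propagation(A, Y_seed, seed_mask, num_iters=50):
    # A: (n, n) adjacency matrix; Y_seed: (n, c) one-hot seed labels
    # (all-zero rows for unlabeled nodes); seed_mask: (n,) boolean.
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    P = A / deg                           # row-stochastic transition matrix
    Y = Y_seed.astype(float).copy()
    for _ in range(num_iters):
        Y = P @ Y                         # linear step: average adjacent label vectors
        Y[seed_mask] = Y_seed[seed_mask]  # clamp seeds to their given labels
    return Y.argmax(axis=1)               # predicted class per node
```

The update `Y = P @ Y` is a linear map of the label vectors, which is exactly the property the abstract highlights: the aggregate weight of many weak edges and one strong edge contribute identically.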

Recently, non-linear methods such as node embeddings and graph convolutional networks (GCN) have demonstrated large quality gains on SSL tasks. These methods introduce multiple components and vary greatly in how the graph structure, seed label information, and other features are used.

We aim here to study the contribution of non-linearity, as an isolated ingredient, to the performance gain. To do so, we place classic linear graph diffusions in a self-training framework. Surprisingly, we observe that the resulting {\em bootstrapped diffusions} not only significantly improve over the respective non-bootstrapped baselines but also outperform state-of-the-art non-linear methods. Moreover, since the self-training wrapper retains the scalability of the base method, we obtain both higher quality and better scalability.
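The self-training wrapper can be sketched as follows (a hedged illustration under our own assumptions: the confidence rule, round count, and function names are ours, not necessarily the paper's exact procedure). Each round runs the linear base diffusion, promotes the most confident non-seed predictions to hard one-hot labels, and re-seeds; the hard thresholding is the non-linear ingredient:

```python
import numpy as np

def diffuse(A, Y_seed, seed_mask, iters=30):
    # Base linear diffusion: repeated neighbor averaging with clamped seeds.
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    Y = Y_seed.astype(float).copy()
    for _ in range(iters):
        Y = P @ Y
        Y[seed_mask] = Y_seed[seed_mask]
    return Y                                # (n, c) per-node class scores

def bootstrap(A, Y_seed, seed_mask, rounds=2, k=1):
    # Self-training wrapper: each round, turn the k most confident
    # non-seed predictions into hard labels and add them to the seed set.
    Y, mask = Y_seed.astype(float).copy(), seed_mask.copy()
    for _ in range(rounds):
        scores = diffuse(A, Y, mask)
        conf = scores.max(axis=1)
        conf[mask] = -np.inf                # never re-promote existing seeds
        new = np.argsort(conf)[-k:]         # k most confident remaining nodes
        Y[new] = 0.0
        Y[new, scores[new].argmax(axis=1)] = 1.0  # hard one-hot: the non-linearity
        mask[new] = True
    return diffuse(A, Y, mask).argmax(axis=1)
```

Since each round only re-runs the base diffusion on an enlarged seed set, the wrapper inherits the base method's scalability, which is the point made above.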