Realistic Evaluation of Semi-Supervised Learning Algorithms


Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. Approaches based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks do not reflect real-world requirements and that the methods evaluated on them are compared against weak baselines. We propose a set of new benchmarks and find that simple, previously underappreciated baselines outperform more complicated research ideas that were regarded as state of the art. Using our new benchmarking procedures, we additionally find that SSL methods are highly sensitive to the amount of unlabeled data and to the class distribution of the data. We encourage researchers studying SSL to adopt our improved methodology, and we suggest that readers and reviewers of SSL papers familiarize themselves with the experimental design concerns we identify.