
Slice Finder: Automated Data Slicing for Model Validation

Neoklis Polyzotis
Steven Whang
Tim Klas Kraska
Yeounoh Chung
Proceedings of the IEEE Int'l Conf. on Data Engineering (ICDE), 2019 (to appear)

Abstract

As machine learning (ML) systems become democratized, helping users easily debug their models becomes increasingly important. Yet current data tools are still primitive when it comes to helping users trace model performance problems all the way to the data. We focus on the particular problem of slicing data to identify subsets of the training data where the model performs poorly. Unlike general techniques (e.g., clustering) that can find arbitrary slices, our goal is to find interpretable slices (which are easier to act on than arbitrary subsets) that are both problematic and large. We propose Slice Finder, an interactive framework for identifying such slices using statistical techniques. The slices can be used for applications such as diagnosing model fairness and fraud detection, where it is necessary to describe slices in terms humans can interpret.
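The abstract does not spell out the statistics involved. As a rough illustration only, the sketch below makes several assumptions. Slices are single `feature == value` predicates, which keeps them interpretable. A per-example loss is already available. A slice counts as "problematic" when a standardized effect size of its loss versus the rest of the data exceeds a threshold, and as "large" when it has at least `min_size` examples. The names `effect_size`, `find_single_feature_slices`, `min_size`, and `min_effect` are hypothetical. This is not the Slice Finder implementation: it omits significance testing and any search over multi-feature slices.

```python
import math
from statistics import mean, variance


def effect_size(in_losses, out_losses):
    """Standardized difference in mean loss between a slice and its complement.

    Assumed form: sqrt(2) * (mean_in - mean_out) / sqrt(var_in + var_out),
    using sample variances. Positive values mean the slice has higher loss.
    """
    denom = math.sqrt(variance(in_losses) + variance(out_losses))
    if denom == 0:
        # Both sides are constant: any gap in means is "infinitely" large.
        return math.inf if mean(in_losses) > mean(out_losses) else 0.0
    return math.sqrt(2) * (mean(in_losses) - mean(out_losses)) / denom


def find_single_feature_slices(rows, losses, min_size=2, min_effect=0.4):
    """Return (feature, value, size, effect) for each slice 'feature == value'
    that has at least min_size rows and an effect size >= min_effect."""
    results = []
    for feature in rows[0].keys():
        for value in {row[feature] for row in rows}:
            in_l = [l for r, l in zip(rows, losses) if r[feature] == value]
            out_l = [l for r, l in zip(rows, losses) if r[feature] != value]
            # statistics.variance needs at least two points on each side.
            if len(in_l) < max(min_size, 2) or len(out_l) < 2:
                continue
            d = effect_size(in_l, out_l)
            if d >= min_effect:
                results.append((feature, value, len(in_l), d))
    # Prefer larger slices first, then larger effect sizes.
    results.sort(key=lambda t: (-t[2], -t[3]))
    return results


# Toy data (made up for illustration): the model does badly on mobile traffic.
rows = [
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "desktop"},
    {"country": "DE", "device": "mobile"},
    {"country": "DE", "device": "desktop"},
    {"country": "US", "device": "mobile"},
    {"country": "DE", "device": "mobile"},
]
losses = [0.9, 0.1, 0.8, 0.2, 1.0, 0.7]

for feature, value, size, d in find_single_feature_slices(rows, losses):
    print(f"{feature} == {value!r}: size={size}, effect size={d:.2f}")
```

On this toy data only the `device == "mobile"` slice is reported. The `country` slices differ too little in mean loss to pass the threshold, and `device == "desktop"` has lower loss than the rest. Ranking by size before effect size reflects the abstract's stated preference for slices that are both large and problematic. A fuller approach would also test statistical significance and control false discoveries.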