Classification using Predictive State Smoothing (PRESS): A scalable kernel classifier for high-dimensional features with variable selection


In this work we adapt the predictive state smoothing (PRESS) framework to classification, which yields a fully probabilistic, non-linear classifier that estimates the minimal sufficient statistic for predicting class membership probabilities. It can be used for high-dimensional problems, both in the number of observations and in the number of covariates, and allows for variable selection using LASSO or Ridge penalties. We also establish a connection between the metric learning aspect of PRESS kernel smoothing and an equivalent state-dependent neural network representation. Out-of-sample prediction performance is comparable to existing state-of-the-art classifiers on several benchmark datasets. At the same time, a trained PRESS classifier provides meaningful domain-specific insights based on regression coefficients, using standard frequentist as well as Bayesian inference. The algorithms scale linearly in the number of observations and can be easily implemented in R, STAN, or TensorFlow.