Web-Scale Multi-Task Feature Selection for Behavioral Targeting


A typical behavioral targeting system optimizing purchase activities, called conversions, faces two main challenges: the web-scale amounts of user histories to process on a daily basis, and the relative sparsity of conversions. In this paper, we try to address these challenges through feature selection. We formulate a multi-task (or group) feature-selection problem among a set of related tasks (sharing a common set of features), namely advertising campaigns. We apply a group-sparse penalty consisting of a combination of an 1 and2 penalty and an associated fast optimization algorithm for distributed parameter estimation. Our algorithm relies on a variant of the well known Fast Iterative Thresholding Algorithm (FISTA), a closed-form solution for mixed norm programming and a distributed subgradient oracle. To eciently handle web-scale user histories, we present a distributed inference algorithm for the problem that scales to billions of instances and millions of attributes. We show the superiority of our algorithm in terms of both sparsity and ROC performance over baseline feature selection methods (both single-task L1-regularization and multi-task mutual-information gain).