Stochastic bandits robust to adversarial corruptions


We introduce a new model of stochastic bandits with adversarial corruptions which aims to capture settings where most of the input follows a stochastic pattern but some fraction of it can be adversarially changed to trick the algorithm, e.g., click fraud, fake reviews and email spam. The goal of this model is to encourage the design of bandit algorithms that (i) work well in mixed adversarial and stochastic models, and (ii) whose performance deteriorates gracefully as we move from fully stochastic to fully adversarial models.

We consider a setting with $K$ arms with underlying stochastic distributions such that $\Delta(a)$ is the difference between the mean of arm $a$ and the optimal arm $a^$. The total corruption $C$ corresponds to the total perturbation that an adversary can add to the stochastic input. Our main result is an algorithm with regret bounded by {$\bigO\prn{\sum_{a{\neq a^}}\frac{K\cdot C\log\prn{\nicefrac{KT}{\delta}}+\log\prn{T}}{\Delta(a)}\cdot\log\prn{\nicefrac{KT}{\delta}}}$ with $1-\delta$ probability and pseudoregret $\bigO\prn{\sum_{a{\neq a^}}\frac{K\cdot C+\log\prn{T}}{\Delta(a)}\cdot\log\prn{KT}}$.} Notably, our algorithm is agnostic to the total corruption and the bound holds with respect to the total corruption that was added in retrospect. We also provide a lower bound showing that the linear dependency on the corruption level is necessary.