We consider standard regret minimization setting where at each time step the decision maker has to choose a distribution over k alternatives, and then observes the loss of each alternative. The setting is very similar to the classical online job scheduling setting with three major differences:
- Information model: in the regret minimization setting losses are only observed after the actions (assigning the job to a machine) is performed and not observed before the action selection, as assumed in the classical online job scheduling setting,
- The comparison class: in regret minimization the comparison class is the best static algorithm (i.e., distribution over alternatives) and not the optimal offline solution.
- Performance measure: In regret minimization we measure the additive difference to the optimal solution in the comparison class, in contrast to the ratio used in online job scheduling setting.
The major contribution of this work is to design a novel regret minimization algorithm based on calibration that guarantees a vanishing average regret, where the regret is measured with respect to the best static decision maker, who selects the same distribution over alternatives at every time step. Our results hold for a wide class of global cost functions. which include the makespan and the Ld norms, for d>1. In contrast, we show that for concave global cost functions, such as Ld norms for d<1, the worst-case average regret does not vanish.
In addition to the general calibration based algorithm, we provide simple and efficient algorithms for special interesting cases.