Consider a buyer participating in a repeated auction, such as those prevalent in display advertising. How would she test whether the auction is incentive compatible? To bid effectively, she needs to know whether the auction is single-shot incentive compatible---a pure second-price auction with a fixed reserve price---and also dynamically incentive compatible---her bids are not used to set future reserve prices. In this work, we develop tests based on simple bid perturbations that a buyer can use to answer these questions, with a focus on dynamic incentive compatibility.
There are many potential A/B testing setups, but we find that many natural experimental designs are, in fact, flawed. For instance, we show that additive perturbations can lead to paradoxical results, in which higher bids lead to lower optimal reserve prices. We precisely characterize this phenomenon and show that reserve prices are guaranteed to be monotone only for distributions satisfying the Monotone Hazard Rate (MHR) property. The experimenter must also decide how to split traffic when applying systematic perturbations. It is tempting to randomize this split, but we demonstrate empirically that unless the perturbations are aligned with the partitions used by the seller to compute reserve prices, the results are guaranteed to be inconclusive.
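The paradox with additive perturbations can be illustrated with a small sketch. It assumes, purely for illustration, that the seller sets the reserve to the price r that maximizes r times the fraction of the buyer's past bids at or above r. The function name `empirical_optimal_reserve` and the two-point bid history are hypothetical, not taken from the experiments.

```python
from fractions import Fraction


def empirical_optimal_reserve(bids):
    """Return the reserve r maximizing r * (fraction of bids >= r).

    Assumption: the seller optimizes against the empirical bid distribution.
    Only observed bid values are candidates, because revenue increases
    linearly in r between consecutive bid values.
    """
    n = len(bids)
    best_reserve, best_revenue = None, Fraction(-1)
    for r in sorted(set(bids)):
        accepted = sum(1 for b in bids if b >= r)
        revenue = Fraction(r) * accepted / n
        if revenue > best_revenue:  # ties keep the lower reserve
            best_reserve, best_revenue = r, revenue
    return best_reserve


# Hypothetical two-point bid history; this distribution is not MHR.
bids = [1, 1, 1, 1, 10]
# Additive perturbation: every bid is raised by 2.
shifted = [b + 2 for b in bids]

print(empirical_optimal_reserve(bids))     # 10 (revenue 10 * 1/5 = 2 beats 1 * 5/5 = 1)
print(empirical_optimal_reserve(shifted))  # 3  (revenue 3 * 5/5 = 3 beats 12 * 1/5 = 2.4)
```

In this toy history every bid goes up, yet the revenue-maximizing reserve falls from 10 to 3, so a test that expects reserves to rise with bids would misread the outcome.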
We validate our results with experiments on real display auction data and show that a buyer can quantify both single-shot and dynamic incentive compatibility even under realistic conditions where only the cost of the impression is observed (as opposed to the exact reserve price). We analyze the cost of running such experiments, exposing trade-offs between test accuracy, cost, and underlying market dynamics.