Nicolas Remy

Authored Publications
    How to measure the incremental return on ad spend (iROAS) is a fundamental problem for the online advertising industry. A standard modern tool is to run randomized geo experiments, where experimental units are non-overlapping ad-targetable geographical areas (Vaver & Koehler 2011). However, designing a reliable and cost-effective geo experiment can be complicated, for example: 1) the number of geos is often small, 2) the response metric (e.g. revenue) across geos can be very heavy-tailed due to geo heterogeneity, and furthermore 3) the response metric can vary dramatically over time. To address these issues, we propose a robust nonparametric method for the design, called Trimmed Match Design (TMD), which extends the idea of Trimmed Match (Chen & Au 2019) and furthermore integrates the techniques of optimal subset pairing and sample splitting in a novel and systematic manner. Some simulation and real case studies are presented. We also point out a few open problems for future research.
    Cross Panel Imputation
    Jim Koehler
    Google, Inc. (2016), pp. 1-18 (to appear)
    Many empirical micro-economics studies rely on consumer panels. For example, TV and web metering panels track TV and online usage of individuals. Sometimes more than one panel is available, although these panels use different metering technologies and are subject to varying degrees of missingness. The problem we consider here is how to combine imputation based on two panels that have similar but not identical statistical characteristics. In the US, we have two two-screen panels, panel A (TV + desktop) and panel B (desktop + mobile), which are both calibrated to the US internet population. We want to estimate a count of ad impressions across all three screens. As desktop impressions are metered in both panels, we fit a joint imputation model by pooling observed desktop impression counts across panels. After imputation on panel B, we fit a truncated negative binomial hurdle regression of mobile impression count over desktop impression count, demographic information, etc. Then, for each panelist in panel A, we predict his/her mobile impression counts. In this way, we 'impute' mobile impressions in panel A to facilitate three-screen measurements.
    Inferring causal impact using Bayesian structural time-series models
    Fabian Gallusser
    Jim Koehler
    Steven L. Scott
    Annals of Applied Statistics, vol. 9 (2015), pp. 247-274
    An important problem in econometrics and marketing is to infer the causal impact that a designed market intervention has exerted on an outcome metric over time. In order to allocate a given budget optimally, for example, an advertiser must assess to what extent different campaigns have contributed to an incremental lift in web searches, product installs, or sales. This paper proposes to infer causal impact on the basis of a diffusion-regression state-space model that predicts the counterfactual market response that would have occurred had no intervention taken place. In contrast to classical difference-in-differences schemes, state-space models make it possible to (i) infer the temporal evolution of attributable impact, (ii) incorporate empirical priors on the parameters in a fully Bayesian treatment, and (iii) flexibly accommodate multiple sources of variation, including the time-varying influence of contemporaneous covariates, i.e., synthetic controls. Using a Markov chain Monte Carlo algorithm for model inversion, we illustrate the statistical properties of our approach on synthetic data. We then demonstrate its practical utility by evaluating the effect of an online advertising campaign on search-related site visits. We discuss the strengths and limitations of state-space models in enabling causal attribution in those settings where a randomised experiment is unavailable. The CausalImpact R package provides an implementation of our approach.
    Advertising on YouTube and TV: A Meta-analysis of Optimal Media-mix Planning
    Georg M. Goerg
    Sheethal Shobowale
    Jim Koehler
    Journal of Advertising Research (JAR), vol. 57 (2015), pp. 283-304 (to appear)
    In this work we investigate under what circumstances a TV campaign should be complemented with online advertising to increase combined reach. First, we use probabilistic models to derive necessary and sufficient conditions. We then test these optimality conditions on empirical findings of a large collection of TV campaigns to answer two important questions: i) which characteristics of a TV campaign make it favorable to shift part of its budget to online advertising?; and ii) if it should shift, how much cost savings and additional reach can advertisers expect? To answer the first question, we use classification methods such as linear discriminant analysis, logistic regression, and decision trees to decide whether a TV campaign should add online advertising; for the second, we train linear and support vector regression models to predict optimal budget allocation, cost savings, or additional reach. To train these models we use optimization results on roughly 26,000 campaigns. We not only achieve excellent out-of-sample predictive power, but also obtain simple, interpretable, and actionable rules that improve the understanding of media mix advertising.
    How Many People Visit YouTube? Imputing Missing Events in Panels With Excess Zeros
    Georg M. Goerg
    Jim Koehler
    SAGE Publications - edited by Herwig Friedl and Helga Wagner, Linz, Austria (2015), pp. 1-6
    Media-metering panels track TV and online usage of people to analyze viewing behavior. However, panel data is often incomplete due to non-registered devices, non-compliant panelists, or work usage. We thus propose a probabilistic model to impute missing events in data with excess zeros using a negative-binomial hurdle model for the unobserved events and beta-binomial sub-sampling to account for missingness. We then use the presented models to estimate the number of people in Germany who visit YouTube.
    Many socio-economic studies rely on panel data as they also provide detailed demographic information about consumers. For example, advertisers use TV and web metering panels to estimate ad effectiveness in selected target demographics. However, panels often record only a fraction of all events due to non-registered devices, technical problems, or work usage. Goerg et al. (2015) present a beta-binomial negative-binomial hurdle (BBNBH) model to impute missing events in count data with excess zeros. In this work, we study empirical properties of the maximum likelihood estimator (MLE) for the BBNBH model, extend it to categorical covariates, introduce a penalized MLE to get accurate estimates by demographic group, and apply the methodology to a German media panel to learn about demographic patterns in the YouTube viewership.
    There is increasing interest in measuring the overlap and/or incremental reach of cross-media campaigns. The direct method is to use a cross-media panel, but these are expensive to scale across all media. Typically, the cross-media panel is too small to produce reliable estimates when the interest comes down to subsets of the population. An alternative is to combine information from a small cross-media panel with a larger, cheaper but potentially biased single media panel. In this article, we develop a data enrichment approach specifically for incremental reach estimation. The approach not only integrates information from both panels while taking into account potential panel bias, but also borrows strength from modeling conditional dependence of cross-media reaches. We demonstrate the approach with data from six campaigns for estimating YouTube video ad incremental reach over TV. In a simulation directly modeled on the actual data, we find that data enrichment yields much greater accuracy than one would get by either ignoring the larger panel, or by using it in a data fusion.
    Collaboration in the Cloud at Google
    Diane Lambert
    Makoto Uchida
    research.google.com (2014), pp. 1-13
    Through a detailed analysis of logs of activity for all Google employees, this paper shows how the Google Docs suite (documents, spreadsheets and slides) enables and increases collaboration within Google. In particular, visualization and analysis of the evolution of Google’s collaboration network show that new employees have started collaborating more quickly and with more people as usage of Docs has grown. Over the last two years, the percentage of new employees who collaborate on Docs per month has risen from 70% to 90% and the percentage who collaborate with more than two people has doubled from 35% to 70%. Moreover, the culture of collaboration has become more open, with public sharing within Google overtaking private sharing.
    The Optimal Mix of TV and Online Ads to Maximize Reach
    Jim Koehler
    Georg M. Goerg
    research.google.com, 76 Ninth Avenue (2013), pp. 1-16
    Brand marketers often wonder how they should allocate budget between TV and online ads in order to maximize reach or maintain the same reach at a lower cost. We use probability models based on historical cross-media panel data to suggest the optimal budget allocation between TV and online ads to maximize reach to the target demographics. We take a historical TV campaign and estimate the reach and GRPs of a hypothetical cross-media campaign if some budget were shifted from TV to online. The models are validated against simulations and historical cross-media campaigns. They are illustrated on one case study to show how an optimized cross-media campaign can obtain a higher reach at the same cost or maintain the same reach at a lower cost than the TV-only campaign.