Ofer Meshi

Ofer Meshi is a Research Scientist at Google. Prior to joining Google, he was a Research Assistant Professor at the Toyota Technological Institute at Chicago. Before that, he obtained a Ph.D. and an M.Sc. in Computer Science from the Hebrew University of Jerusalem and a B.Sc. in Computer Science from Tel Aviv University. Ofer's research interests are in machine learning and optimization. In particular, he seeks efficient algorithms for structured output prediction, probabilistic graphical models, reinforcement learning, and related problems. More information and previous publications can be found on his homepage.

Authored Publications
    On the Value of Prior in Online Learning to Rank
    Branislav Kveton
    The 25th International Conference on Artificial Intelligence and Statistics (2022)
    This paper addresses the cold-start problem in online learning to rank (OLTR). We show both theoretically and empirically that priors improve the quality of ranked lists presented to users interactively based on user feedback. These priors can come in the form of unbiased estimates of the relevance of the ranked items, or more practically, can be obtained from offline-learned models. Our experiments show the effectiveness of priors in improving the short-term regret of tabular OLTR algorithms, based on Thompson sampling and BayesUCB.
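
    As a rough sketch of the idea (not the paper's algorithm), the Python simulation below warm-starts a tabular Thompson-sampling ranker with Beta pseudo-counts derived from an offline relevance estimate p_hat; the sizes, click model, and prior strength are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)

n_items, k = 20, 5                      # candidate pool and slate size (made up)
true_ctr = rng.uniform(0.02, 0.3, n_items)

def make_prior(p_hat=None, s=50.0):
    # Cold start: uninformative Beta(1, 1). Warm start: convert an offline
    # relevance estimate p_hat into pseudo-counts with strength s.
    if p_hat is None:
        return np.ones(n_items), np.ones(n_items)
    return 1.0 + s * p_hat, 1.0 + s * (1.0 - p_hat)

p_hat = np.clip(true_ctr + rng.normal(0, 0.02, n_items), 0.01, 0.99)
alpha, beta = make_prior(p_hat)

for t in range(1000):
    theta = rng.beta(alpha, beta)             # sample a relevance per item
    slate = np.argsort(-theta)[:k]            # rank by sampled relevance
    clicks = rng.random(k) < true_ctr[slate]  # simulated user feedback
    alpha[slate] += clicks                    # posterior update
    beta[slate] += 1 - clicks
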
    Advantage Amplification in Slowly Evolving Latent-State Environments
    Martin Mladenov
    Proceedings of the Twenty-eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macau, China (2019), pp. 3165-3172
    Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle called advantage amplification that can overcome these hurdles through the use of temporal abstraction. We propose several aggregation methods and prove they induce amplification in certain settings. We also bound the loss in optimality incurred by our methods in environments where latent state evolves slowly and demonstrate their performance empirically in a stylized user-modeling task.
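
    The toy simulation below, written for this page rather than taken from the paper, illustrates the temporal-abstraction intuition: holding each chosen action for k base steps lets a small per-step action advantage accumulate above belief-tracking noise. The environment, belief update, and constants are all made up.

import numpy as np

rng = np.random.default_rng(1)

def step(s, a):
    # Latent state s in {0, 1} evolves slowly; the matching action is only
    # slightly better (small action advantage); observations are noisy.
    r = 0.51 if a == s else 0.49
    obs = s if rng.random() < 0.7 else 1 - s
    s_next = 1 - s if rng.random() < 0.01 else s
    return s_next, obs, r

def run(k, horizon=5000):
    # Temporal abstraction by action repetition: commit to each action
    # for k base steps instead of re-deciding at every step.
    s, belief, total, t = 0, 0.5, 0.0, 0
    while t < horizon:
        a = int(belief > 0.5)                  # act on the current belief
        for _ in range(min(k, horizon - t)):
            s, obs, r = step(s, a)
            belief = 0.9 * belief + 0.1 * obs  # crude belief tracking
            total += r
            t += 1
    return total / horizon

print(run(k=1), run(k=20))                     # compare per-step reward
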
    Seq2Slate: Re-ranking and Slate Optimization with RNNs
    Ranking is a central task in machine learning and information retrieval. In this task, it is especially important to present the user with a slate of items that is appealing as a whole. This in turn requires taking into account interactions between items, since intuitively, placing an item on the slate affects the decision of which other items should be placed alongside it. In this work, we propose a sequence-to-sequence model for ranking called seq2slate. At each step, the model predicts the next "best" item to place on the slate given the items already selected. The sequential nature of the model allows complex dependencies between the items to be captured directly in a flexible and scalable way. We show how to learn the model end-to-end from weak supervision in the form of easily obtained click-through data. We further demonstrate the usefulness of our approach in experiments on standard ranking benchmarks as well as in a real-world recommendation system.
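
    A minimal sketch of the decoding loop described above, with a hypothetical pairwise scoring function standing in for the learned sequence model: at each step, the highest-scoring remaining item, conditioned on the items already placed, is appended to the slate.

import numpy as np

rng = np.random.default_rng(2)

n_items, d, k = 10, 4, 5
X = rng.normal(size=(n_items, d))   # item feature vectors (toy data)
W = rng.normal(size=(d, d))         # made-up interaction weights

def score(i, chosen):
    # Base relevance plus a pairwise-interaction penalty, a stand-in
    # for the learned next-item distribution in seq2slate.
    base = X[i].sum()
    interaction = sum(X[i] @ W @ X[j] for j in chosen)
    return base - 0.1 * interaction

slate, remaining = [], set(range(n_items))
for _ in range(k):
    best = max(remaining, key=lambda i: score(i, slate))  # greedy next item
    slate.append(best)
    remaining.remove(best)
print(slate)
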
    Planning and Learning with Stochastic Action Sets
    Martin Mladenov
    Proceedings of the Twenty-seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm (2018), pp. 4674-4682
    In many practical uses of reinforcement learning (RL), the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have gone unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundations. We show that optimal policies and value functions in this model have a structure that admits a compact representation. From an RL perspective, we show that Q-learning with sampled action sets is sound. In model-based settings, we consider two important special cases: when individual actions are available with independent probabilities, and a sampling-based model for unknown distributions. We develop poly-time value and policy iteration methods for both cases, and in the first, we offer a poly-time linear programming solution.
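
    A minimal sketch of the Q-learning variant the abstract calls sound, on a made-up tabular SAS-MDP: each action is available i.i.d. with some probability, and the bootstrap target maximizes only over the realized action set at the next state.

import numpy as np

rng = np.random.default_rng(3)

n_states, n_actions, gamma, lr, eps = 5, 4, 0.9, 0.1, 0.1
R = rng.uniform(size=(n_states, n_actions))                  # toy rewards
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
Q = np.zeros((n_states, n_actions))

def sample_action_set(p_avail=0.7):
    # Exogenous availability: each action appears independently.
    A = np.flatnonzero(rng.random(n_actions) < p_avail)
    return A if A.size else rng.integers(n_actions, size=1)  # never empty

s, A = 0, sample_action_set()
for _ in range(20000):
    a = rng.choice(A) if rng.random() < eps else A[np.argmax(Q[s, A])]
    s2 = rng.choice(n_states, p=P[s, a])
    A2 = sample_action_set()                     # realized set at s2
    target = R[s, a] + gamma * Q[s2, A2].max()   # max over the sampled set
    Q[s, a] += lr * (target - Q[s, a])
    s, A = s2, A2
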
    Asynchronous Parallel Coordinate Minimization for MAP Inference
    Alexander G. Schwing
    Advances in Neural Information Processing Systems (NIPS) 30 (2017)
    Finding the maximum a-posteriori (MAP) assignment is a central task in graphical models. Since modern applications give rise to very large problem instances, there is an increasing need for efficient solvers. In this work, we propose to improve the efficiency of coordinate-minimization-based dual-decomposition solvers by running them asynchronously in parallel. In this case, message-passing inference is performed by multiple processing units simultaneously, all reading and writing to shared memory without coordination. We analyze the convergence properties of the resulting algorithms and identify settings where speedup gains can be expected. Our numerical evaluations show that this approach indeed achieves significant speedups in common computer vision tasks.
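
    As a stylized analogue rather than the paper's MAP solver, the sketch below runs lock-free asynchronous coordinate minimization on a smooth quadratic: several threads repeatedly pick a coordinate and set it to its exact minimizer, all reading and writing shared memory without coordination.

import numpy as np
import threading

n = 50
rng = np.random.default_rng(4)
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)       # well-conditioned PSD matrix
b = rng.normal(size=n)
x = np.zeros(n)                   # shared iterate, updated without locks

def worker(seed, n_updates=20000):
    # Minimize f(x) = 0.5 x^T A x - b^T x one coordinate at a time.
    r = np.random.default_rng(seed)
    for _ in range(n_updates):
        i = r.integers(n)
        # Exact minimization over coordinate i given the (possibly stale) rest.
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]

threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for th in threads: th.start()
for th in threads: th.join()
print(np.linalg.norm(A @ x - b))  # residual near zero at the optimum
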
    Approximate Linear Programming for Logistic Markov Decision Processes
    Martin Mladenov
    Tyler Lu
    Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia (2017), pp. 2486-2493
    Online and mobile interactions with users, in areas such as advertising and product or content recommendation, have been transformed by machine learning techniques. However, such methods have largely focused on myopic prediction, i.e., predicting immediate user response to system actions (e.g., ads or recommendations), without explicitly accounting for the long-term impact on user behavior or the potential need for planning action sequences. In this work, we propose the use of Markov decision processes (MDPs) to formulate the long-term decision problem and address two key questions that emerge in their application to user interaction. The first focuses on model formulation, specifically, how best to construct MDP models of user interaction in a way that exploits the great successes of myopic prediction models. To this end, we propose a new model called logistic MDPs, an MDP formulation that allows the concise specification of transition dynamics. It does so by augmenting the natural factored form of dynamic Bayesian networks (DBNs) with user response variables that are captured by a logistic regression model (the latter being precisely the model used for myopic user interaction). The second question we address is how best to solve large logistic MDPs of this type. A variety of methods have been proposed for solving MDPs that exploit the conditional independence reflected in DBN representations, including approximate linear programming (ALP). Despite their compact form, logistic MDPs do not admit the same conditional independence as DBNs, nor do they satisfy the linearity requirements for standard ALP. We propose a constraint generation approach to ALP for logistic MDPs that circumvents these problems by: (a) recovering compactness by conditioning on the logistic response variable; and (b) devising two procedures, one exact and one approximate, that linearize the search for violated constraints in the master LP. For the approximate procedure, we also derive error bounds on the quality of the induced policy. We demonstrate the effectiveness of our approach on advertising problems with up to several thousand sparse binarized features (and up to 2^54 states and 2^39 actions).
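
    A toy illustration of ALP with constraint generation on a small enumerable MDP; note that the paper's contribution is precisely to avoid the brute-force separation step used here, which only works because this MDP is tiny. The MDP, features, and variable bounds are made up, and scipy's LP solver stands in for the master LP.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)

nS, nA, d, gamma = 30, 4, 6, 0.95
R = rng.uniform(size=(nS, nA))
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # transition kernel
Phi = rng.normal(size=(nS, d)); Phi[:, 0] = 1.0  # features with a bias term
c = np.ones(nS) / nS                             # state-relevance weights

# ALP: min c^T Phi w  s.t.  phi(x) w >= R[x, a] + gamma * P[x, a] @ Phi @ w
# for all (x, a), solved by adding the most violated constraint each round.
rows, rhs = [], []

def add_constraint(x, a):
    # linprog uses A_ub @ w <= b_ub: (gamma P Phi - phi(x)) w <= -R[x, a].
    rows.append(gamma * P[x, a] @ Phi - Phi[x])
    rhs.append(-R[x, a])

add_constraint(0, 0)                             # seed the master LP
for _ in range(200):
    res = linprog(c @ Phi, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(-100, 100)] * d)      # box keeps the LP bounded
    w = res.x
    V = Phi @ w
    viol = R + gamma * (P @ V) - V[:, None]      # violation of each (x, a)
    x, a = np.unravel_index(np.argmax(viol), viol.shape)
    if viol[x, a] <= 1e-8:
        break                                    # all constraints satisfied
    add_constraint(x, a)
print(len(rows), "constraints generated")
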