Breaking the curse of horizon: Infinite-horizon off-policy estimation
Abstract
We consider off-policy estimation of the expected reward of a target policy using
samples collected by a different behavior policy. Importance sampling (IS) has
been a key technique for deriving (nearly) unbiased estimators, but is known to
suffer from excessively high variance in long-horizon problems. In the extreme
case of infinite-horizon problems, the variance of an IS-based estimator may even
be unbounded. In this paper, we propose a new off-policy estimator that applies
IS directly to the stationary state-visitation distributions to avoid the exploding
variance faced by existing methods. Our key contribution is a novel approach to
estimating the density ratio of two stationary state distributions, using trajectories
sampled only from the behavior policy. We develop a mini-max loss function
for the estimation problem, and derive a closed-form solution when the function class
is a reproducing kernel Hilbert space (RKHS).
We support our method with both theoretical and empirical analyses.
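As a rough illustration of the estimator described above (a sketch only, assuming the average-reward setting; the symbols $\hat{R}$, $w$, $d_{\pi}$, and $d_{\pi_0}$ are notation introduced here for exposition and are not fixed by the abstract), applying IS to stationary state-visitation distributions replaces the per-trajectory product of importance weights with a single per-state weight:
\[
\hat{R}(\pi) \;=\; \frac{1}{n}\sum_{i=1}^{n} w(s_i)\,\frac{\pi(a_i \mid s_i)}{\pi_0(a_i \mid s_i)}\, r_i,
\qquad
w(s) \;=\; \frac{d_{\pi}(s)}{d_{\pi_0}(s)},
\]
where $(s_i, a_i, r_i)$ are transitions collected under the behavior policy $\pi_0$, $d_{\pi}$ and $d_{\pi_0}$ denote the stationary state distributions of the target and behavior policies, and $w$ is the density ratio whose estimation the mini-max loss targets. Because the weight depends only on the current state rather than on an entire trajectory, it does not compound over time steps, which is why the variance need not grow with the horizon.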