
Combining parametric and nonparametric models for off-policy evaluation

Omer Gottesman1, Yao Liu2, Scott Sussex1, Emma Brunskill2, Finale Doshi-Velez1

1 Paulson School of Engineering and Applied Sciences, Harvard University; 2 Department of Computer Science, Stanford University

Introduction

Off-Policy Evaluation – We wish to estimate the value of an evaluation policy for a sequential decision-making problem from batch data collected using a behavior policy we do not control.

Introduction

Model-Based vs. Importance Sampling – Importance sampling methods provide unbiased estimates of the value of the evaluation policy, but tend to require a huge amount of data to achieve reasonably low variance. When data is limited, model-based methods tend to perform better.

In this work we focus on improving model-based methods.
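For context, model-based OPE estimates the evaluation policy's value by rolling the policy out through a learned dynamics model and averaging the discounted returns. Below is a minimal sketch of that idea; the `model` and `policy` callables and their signatures are placeholder assumptions, not interfaces from the poster.

```python
import numpy as np

def model_based_value_estimate(model, policy, start_states, horizon, gamma):
    """Roll the evaluation policy out in a learned model and average returns.

    Assumptions (not from the poster): model(state, action) returns
    (next_state, reward), and policy(state) returns an action.
    """
    returns = []
    for x in start_states:
        g, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(x)
            x, r = model(x, a)
            g += discount * r
            discount *= gamma
        returns.append(g)
    return float(np.mean(returns))
```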

Combining multiple models

Challenge: It is hard for a single model to be accurate enough over the entire domain.

Question: If we had multiple models, with different strengths, could we combine them to get better estimates?

Approach: Use a planner to decide when to use each model to get the most accurate reward estimate over entire trajectories.

Balancing short vs. long term accuracy

[Figure: simulated trajectories passing through a well modeled area and a poorly modeled area of state space, contrasting accurate and inaccurate simulated transitions with the real transitions.]

Balancing short vs. long term accuracy

𝑔" − $𝑔" ≤ 𝐿'()*+

"

𝛾) ()-*+

)./

𝐿) )- 𝜀) 𝑡 − 𝑡2 − 1 +()*+

"

𝛾) 𝜀'(𝑡)

Total return error

Error due to state estimation

Error due to reward estimation

𝐿)/' - Lipschitz constants of transition/reward functions𝜀)/' 𝑡 - Bound on model errors for transition/reward at time 𝑡𝑇 - Time horizon𝛾 - Reward discount factor𝑔" ≡ ∑)*+" 𝛾) 𝑟(𝑡) - Return over entire trajectory

Closely related to the bound in Asadi, Misra, and Littman, "Lipschitz Continuity in Model-based Reinforcement Learning" (ICML 2018).
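To make the bound concrete, here is a minimal numerical transcription of it, assuming the per-step error bounds are available as arrays; the exact indexing convention is an assumption consistent with the formula above.

```python
import numpy as np

def return_error_bound(eps_s, eps_r, L_s, L_r, gamma):
    """Upper bound on |g_T - g_hat_T| from per-step model error bounds.

    eps_s[t], eps_r[t]: bounds on transition/reward model error at step t
    (eps_s needs at least T - 1 entries, eps_r defines the horizon T).
    L_s, L_r: Lipschitz constants of the transition and reward functions.
    """
    T = len(eps_r)
    bound = 0.0
    for t in range(T):
        # State errors made earlier in the trajectory are amplified by the
        # transition Lipschitz constant and enter via the reward Lipschitz
        # constant; the reward model error at step t is added directly.
        state_term = sum(L_s ** tp * eps_s[t - tp - 1] for tp in range(t))
        bound += gamma ** t * (L_r * state_term + eps_r[t])
    return bound
```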

Planning to minimize the estimated return error over entire trajectories

We use the Monte Carlo Tree Search (MCTS) planning algorithm to minimize the return error bound over entire trajectories.

[Figure: planner loop. At each step the agent supplies the state $x_t$, action $a_t$, and reward $r_t$; the planner's state is the pair $(x_t, a_t)$, its action is which model to use, and its reward is $-(r_t - \hat{r}_t)$.]
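The MCTS planner itself is not spelled out on the slide. The sketch below is a heavily simplified stand-in (rollout-based model selection rather than full tree search) that only illustrates the idea of choosing, per step, the model with the smallest estimated accumulated error; `models`, `error_estimators`, and `policy` are assumed placeholder interfaces.

```python
import numpy as np

def plan_model_choice(x, a, models, error_estimators, policy,
                      depth, gamma, n_rollouts=8):
    """Pick which model to simulate the next step with.

    A much-simplified stand-in for the MCTS planner on the slide: each
    candidate model is scored by random rollouts that accumulate *estimated*
    per-step error bounds, and the model with the smallest discounted error
    estimate wins. Assumed interfaces: models[i](x, a) -> (next_state, reward),
    error_estimators[i](x, a) -> estimated one-step error, policy(x) -> action.
    """
    def rollout_error(i, x, a):
        err, discount = 0.0, 1.0
        for _ in range(depth):
            err += discount * error_estimators[i](x, a)
            x, _ = models[i](x, a)
            a = policy(x)
            i = np.random.randint(len(models))  # random model choice past the root
            discount *= gamma
        return err

    scores = [np.mean([rollout_error(i, x, a) for _ in range(n_rollouts)])
              for i in range(len(models))]
    return int(np.argmin(scores))  # index of the model to simulate with
```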

Parametric vs. Nonparametric Models

Nonparametric models – Predict the dynamics for a given state-action pair based on similarity to its neighbors in the data. Nonparametric models can be very accurate in regions of state space where data is abundant.

Parametric models – Any parametric regression model, or a hand-coded model incorporating domain knowledge. Parametric models tend to generalize better to situations very different from those observed in the data.
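As an illustration of the nonparametric side, a minimal nearest-neighbor dynamics model might look like the sketch below; the data layout, discrete actions, and distance metric are all assumptions rather than details from the poster.

```python
import numpy as np

class NearestNeighborModel:
    """Minimal nonparametric dynamics model: predict the outcome of (x, a)
    from the most similar transition observed in the batch data."""

    def __init__(self, states, actions, next_states, rewards):
        self.states = np.asarray(states)          # shape (n, state_dim)
        self.actions = np.asarray(actions)        # discrete actions, shape (n,)
        self.next_states = np.asarray(next_states)
        self.rewards = np.asarray(rewards)

    def predict(self, x, a):
        # Only consider transitions taken with the same action (assumes at
        # least one such transition exists in the data).
        mask = self.actions == a
        d = np.linalg.norm(self.states[mask] - x, axis=1)
        j = int(np.argmin(d))
        # The neighbor distance d[j] is exactly what the nonparametric
        # error estimate L_s * Delta(x, x*) uses below.
        return self.next_states[mask][j], self.rewards[mask][j], d[j]
```

Returning the neighbor distance alongside the prediction is a convenient design choice: it makes the nonparametric error estimate described in the next section essentially free.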

Estimating bounds on model errors

𝑔" − $𝑔" ≤ 𝐿'()*+

"

𝛾) ()-*+

)./

𝐿) )- 𝜀) 𝑡 − 𝑡2 − 1 +()*+

"

𝛾) 𝜀'(𝑡)

𝐿)/' - Lipschitz constants of transition/reward functions𝜀)/' 𝑡 - Bound on model errors for transition/reward at time 𝑡𝑇 - Time horizon𝛾 - Reward discount factor𝑔" ≡ ∑)*+" 𝛾) 𝑟(𝑡) - Return over entire trajectory

Estimating bounds on model errors

Parametric: $\varepsilon_{s,P} \approx \max_{t'} \Delta\!\left(x_{t'+1}, \hat{f}_s(x_{t'}, a_{t'})\right)$ – the largest error the parametric model makes when predicting the observed transitions in the data.

Nonparametric: $\varepsilon_{s,NP} \approx L_s \cdot \Delta(x, x^{*}_{t'})$ – the transition Lipschitz constant times the distance from the query state $x$ to its nearest neighbor $x^{*}_{t'}$ in the data.
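A minimal sketch of both error estimates, under the same assumed interfaces as above (a parametric `model(x, a)` returning `(next_state, reward)`); whether the maximum is taken over all data or only over a neighborhood of the query is left as an assumption.

```python
import numpy as np

def parametric_error_bound(model, states, actions, next_states):
    """Estimate eps_s for a parametric model as its largest one-step
    prediction error on the observed transitions (restricting the max to a
    neighborhood of the query state is a possible refinement)."""
    preds = np.array([model(x, a)[0] for x, a in zip(states, actions)])
    return float(np.max(np.linalg.norm(np.asarray(next_states) - preds, axis=1)))

def nonparametric_error_bound(x, states, L_s):
    """Estimate eps_s for a nearest-neighbor model as the transition
    Lipschitz constant times the distance to the closest state in the data."""
    nearest = float(np.min(np.linalg.norm(np.asarray(states) - x, axis=1)))
    return L_s * nearest
```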

Demonstration on a toy domain

[Figure: a sequence of frames from a toy domain, showing the possible actions and the trajectories simulated by the parametric model.]

Performance on medical simulators (Cancer and HIV)

[Figure: off-policy evaluation errors on the Cancer and HIV simulators.]

• MCTS-MoE tends to outperform both the parametric and nonparametric models.
• With access to the true model errors, the performance of the MCTS-MoE could be improved even further.
• For these domains, all importance sampling methods result in errors that are orders of magnitude larger than any model-based method.

Summary and Future Directions

• We provide a general framework for combining multiple models to improve off-policy evaluation.

• Improvements can come from better individual models, better error estimation, or better ways of combining multiple models.
• Extension to stochastic domains is conceptually straightforward, but requires estimating distances between distributions rather than between states.
• Identifying particularly loose or tight error bounds.