Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi And ... · Rishabh Agarwal, Dale Schuurmans,...

How I Learned To Stop Worrying And Love Offline RL An Optimistic Perspective on Offline Reinforcement Learning Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi


Page 1:

How I Learned To Stop Worrying And Love Offline RL

An Optimistic Perspective on Offline Reinforcement Learning

Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

Page 2:

What makes Deep Learning Successful?

Expressive function approximators

Page 3:

What makes Deep Learning Successful?

Expressive function approximators

Powerful learning algorithms

Page 4:

What makes Deep Learning Successful?

Expressive function approximators

Large and Diverse Datasets

Powerful learning algorithms

Page 5:

How to make Deep RL similarly successful?

Expressive function approximators

Good learning algorithms e.g., actor-critic, approx DP

Page 6:

How to make Deep RL similarly successful?

Large and Diverse Datasets

Expressive function approximators

Good learning algorithms e.g., actor-critic, approx DP

Page 7:

How to make Deep RL similarly successful?

Interactive Environments

Expressive function approximators

Good learning algorithms e.g., actor-critic, approx DP

Active Data Collection

Page 8:

RL for Real-World: RL with Large Datasets

[1] Dasari, Ebert, Tian, Nair, Bucher, Schmeckpeper, .. Finn. RoboNet: Large-Scale Multi-Robot Learning.

[2] Yu, Xian, Chen, Liu, Liao, Madhavan, Darrell. BDD100K: A Large-scale Diverse Driving Video Database.

RoboNet

Robotics

Page 9:

RL for Real-World: RL with Large Datasets

[1] Dasari, Ebert, Tian, Nair, Bucher, Schmeckpeper, .. Finn. RoboNet: Large-Scale Multi-Robot Learning.

[2] Yu, Xian, Chen, Liu, Liao, Madhavan, Darrell. BDD100K: A Large-scale Diverse Driving Video Database.

RoboNet

Robotics

Recommender Systems

Page 10:

RL for Real-World: RL with Large Datasets

[1] Dasari, Ebert, Tian, Nair, Bucher, Schmeckpeper, .. Finn. RoboNet: Large-Scale Multi-Robot Learning.

[2] Yu, Xian, Chen, Liu, Liao, Madhavan, Darrell. BDD100K: A Large-scale Diverse Driving Video Database.

RoboNet

Robotics

Recommender Systems

Self-Driving Cars

Page 11:

RL for Real-World: RL with Large Datasets

[1] Dasari, Ebert, Tian, Nair, Bucher, Schmeckpeper, .. Finn. RoboNet: Large-Scale Multi-Robot Learning.

[2] Yu, Xian, Chen, Liu, Liao, Madhavan, Darrell. BDD100K: A Large-scale Diverse Driving Video Database.

RoboNet

Robotics

Recommender Systems

Self-Driving Cars

Page 12:

Offline RL: A Data-Driven RL Paradigm

Image Source: Data-Driven Deep Reinforcement Learning, BAIR Blog. https://bair.berkeley.edu/blog/2019/12/05/bear/

Page 13:

Offline RL: A Data-Driven RL Paradigm

Image Source: Data-Driven Deep Reinforcement Learning, BAIR Blog. https://bair.berkeley.edu/blog/2019/12/05/bear/

Offline RL can help:

● Pretrain agents on existing logged data.

Page 14:

Offline RL: A Data-Driven RL Paradigm

Image Source: Data-Driven Deep Reinforcement Learning, BAIR Blog. https://bair.berkeley.edu/blog/2019/12/05/bear/

Offline RL can help:

● Pretrain agents on existing logged data.

● Evaluate RL algorithms on the basis of exploitation alone on common datasets.

Page 15:

Offline RL: A Data-Driven RL Paradigm

Image Source: Data-Driven Deep Reinforcement Learning, BAIR Blog. https://bair.berkeley.edu/blog/2019/12/05/bear/

Offline RL can help:

● Pretrain the agents on existing logged data.

● Evaluate RL algorithms on the basis of exploitation alone on common datasets.

● Deliver real world impact.

Page 16:

But .. Offline RL is Hard!

NO new corrective feedback!

Page 17:

But .. Offline RL is Hard!

Requires Counterfactual Generalization

Page 18:

But .. Offline RL is Hard!

Bootstrapping (Learning guess from a guess)

Function Approximation

Fully Off-Policy

Page 19:

Standard RL fails in the offline setting ..

Page 20:

Standard RL fails in the offline setting ..

Page 21:

Standard RL fails in the offline setting ..

Page 22:

Can standard off-policy RL succeed in the offline setting?

Page 23:

Offline RL on Atari 2600

200 million frames

(standard protocol)

Train 5 DQN (Nature) agents on each Atari game using sticky actions (stochasticity)

Page 24:

Offline RL on Atari 2600

Save all of the tuples of (observation, action, next observation, reward) encountered to DQN-replay dataset(s)
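A minimal sketch of the logging step, assuming an append-only in-memory log. The `Transition` and `ReplayLogger` names are hypothetical and are not taken from the released code; only the four tuple fields come from the slide.

```python
from collections import namedtuple

# Hypothetical record type mirroring the tuple named on the slide.
Transition = namedtuple(
    "Transition", ["observation", "action", "next_observation", "reward"]
)


class ReplayLogger:
    """Append-only log of every transition an online agent encounters."""

    def __init__(self):
        self.transitions = []

    def add(self, observation, action, next_observation, reward):
        # Nothing is ever evicted: the whole interaction history is kept.
        self.transitions.append(
            Transition(observation, action, next_observation, reward)
        )

    def __len__(self):
        return len(self.transitions)


# Toy usage with integer observations standing in for Atari frames.
logger = ReplayLogger()
logger.add(observation=0, action=1, next_observation=1, reward=0.0)
logger.add(observation=1, action=0, next_observation=2, reward=1.0)
```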

Page 25:

Offline RL on Atari 2600

Train off-policy agents using DQN-replay dataset(s) without any further environment interaction
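To illustrate what "without any further environment interaction" means, here is a small sketch that runs tabular Q-learning purely on a fixed list of logged transitions. It is a toy stand-in, not the deep agents from the talk; the `done` flag, the chain environment, and all hyperparameters are assumptions made for the example.

```python
import numpy as np


def offline_q_learning(dataset, num_states, num_actions,
                       gamma=0.9, lr=0.1, num_steps=5000, seed=0):
    """Tabular Q-learning that only ever samples from a fixed dataset."""
    rng = np.random.default_rng(seed)
    q = np.zeros((num_states, num_actions))
    for _ in range(num_steps):
        # Sample a logged transition; the environment is never queried.
        s, a, s_next, r, done = dataset[rng.integers(len(dataset))]
        target = r if done else r + gamma * q[s_next].max()
        q[s, a] += lr * (target - q[s, a])
    return q


# Toy chain 0 -> 1 -> 2 (terminal): action 1 moves right, action 0 stays.
dataset = [
    (0, 1, 1, 0.0, False),
    (0, 0, 0, 0.0, False),
    (1, 1, 2, 1.0, True),
    (1, 0, 1, 0.0, False),
]
q = offline_q_learning(dataset, num_states=3, num_actions=2)
greedy = q.argmax(axis=1)  # greedy action per state
```

In this toy dataset every state-action pair of the non-terminal states is covered, which is the favourable case; the difficulties listed on the earlier slides arise when the greedy policy queries actions the logged data never tried.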

Page 26:

Does Offline DQN work?

Page 27:

Distributional RL uses Z(s, a), a distribution over returns, instead of the Q-function.

Let's try recent off-policy algorithms!

[Figure: QR-DQN. A shared neural network with outputs Z(1/K), Z(2/K), ..., Z(K/K). Axes: actions, returns.]
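As a rough illustration of how a distribution over returns Z(s, a) relates to a Q-value, the sketch below treats the K outputs per action as equally weighted return estimates and averages them, which is how QR-DQN recovers Q(s, a) for action selection. The array shapes and numbers are assumptions made for the example.

```python
import numpy as np


def q_values_from_quantiles(quantiles):
    """Average K equally weighted return estimates per action.

    quantiles: array of shape (num_actions, K).
    """
    return quantiles.mean(axis=1)


# Two actions, K = 4 return estimates each (made-up numbers).
quantiles = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 1.0, 1.0, 5.0],
])
q = q_values_from_quantiles(quantiles)
greedy_action = int(q.argmax())
```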

Page 28:

Does Offline QR-DQN work?

Page 29:

Does Offline DQN work?

Page 30:

Offline DQN (Nature) vs Offline C51

Average online scores of C51 and DQN (Nature) agents trained offline on DQN replay dataset for the same number of gradient steps as online DQN. The horizontal line shows the performance of fully-trained DQN.

Page 31:

Developing Robust Offline RL algorithms

➢ Emphasis on Generalization

○ Given a fixed dataset, generalize to unseen states during evaluation.

Page 32:

Developing Robust Offline RL algorithms

➢ Emphasis on Generalization

○ Given a fixed dataset, generalize to unseen states during evaluation.

➢ Ensemble of Q-estimates:

○ Ensembling, Dropout widely used for improving generalization.

Page 33:

Ensemble-DQN

Train multiple (linear) Q-heads with different random initialization.

[Figure: Ensemble-DQN. A shared neural network with heads Q1, Q2, ..., QK. Axes: actions, returns.]
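A minimal sketch of the multi-head idea: K linear heads with different random initialization sit on top of one shared feature vector. Averaging the heads to pick an action is an assumption of this sketch, as are the function names, shapes, and initialization scale; training of the heads is omitted.

```python
import numpy as np


def init_heads(num_heads, feature_dim, num_actions, seed=0):
    """K linear Q-heads, each with its own random initialization."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=0.1, size=(num_heads, feature_dim, num_actions))


def ensemble_q_values(features, heads):
    """Return per-head Q-values (K, num_actions) and their mean."""
    per_head = np.einsum("d,kda->ka", features, heads)
    return per_head, per_head.mean(axis=0)


# Stand-in for the output of the shared neural network.
features = np.ones(8)
heads = init_heads(num_heads=4, feature_dim=8, num_actions=3)
per_head, mean_q = ensemble_q_values(features, heads)
greedy_action = int(mean_q.argmax())
```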

Page 34:

Does Offline Ensemble-DQN work?

Page 35:

Does Offline DQN work?

Page 36:

Developing Robust Offline RL algorithms

➢ Emphasis on Generalization

○ Given a fixed dataset, generalize to unseen states during evaluation.

➢ Q-learning as constraint satisfaction:

Page 37:

Random Ensemble Mixture (REM)

Minimize TD error on random (per minibatch) convex combination of multiple Q-estimates.

[Figure: REM. A shared neural network with heads Q1, Q2, ..., QK, combined as ∑i αi Qi with random weights α1, ..., αK. Axes: actions, returns.]
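The REM update can be sketched as follows: draw a random convex combination of the K heads, then compute the TD error of the mixed Q-estimate. Sampling the weights as normalized uniform draws, applying the same weights to the target heads, and all names and shapes are assumptions of this sketch; the loss applied to the TD error and the gradient step are left out.

```python
import numpy as np


def sample_mixture_weights(num_heads, rng):
    """Random point on the simplex: non-negative weights summing to 1."""
    raw = rng.uniform(size=num_heads)
    return raw / raw.sum()


def rem_td_error(q_heads, target_q_heads, action, reward, alpha, gamma=0.99):
    """TD error of the mixture sum_i alpha_i * Q_i.

    q_heads: (K, num_actions) head outputs at the current state.
    target_q_heads: (K, num_actions) target-head outputs at the next state.
    """
    mixed_q = alpha @ q_heads              # (num_actions,)
    mixed_target = alpha @ target_q_heads  # (num_actions,)
    td_target = reward + gamma * mixed_target.max()
    return td_target - mixed_q[action]


# New weights would be drawn for every minibatch; one draw is shown here.
rng = np.random.default_rng(0)
alpha = sample_mixture_weights(num_heads=4, rng=rng)
q_heads = rng.normal(size=(4, 3))
target_q_heads = rng.normal(size=(4, 3))
td = rem_td_error(q_heads, target_q_heads, action=1, reward=1.0, alpha=alpha)
```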

Page 38:

REM vs QR-DQN

[Figure: REM (heads Q1, Q2, ..., QK over a shared neural network, combined as ∑i αi Qi) shown next to QR-DQN (outputs Z(1/K), Z(2/K), ..., Z(K/K) over a shared neural network). Axes: actions, returns.]

Page 39:

Offline Stochastic Atari Results

Scores averaged over 5 runs of offline agents trained using DQN replay data across 60 Atari games for 5X gradient steps. Offline REM surpasses gains from online C51 and offline QR-DQN.

Page 40:

Offline REM vs. Baselines

Page 41:

Reviewers asked: Does Online REM work?

Average normalized scores of online agents trained for 200 million game frames. Multi-network REM with 4 Q-functions performs comparably to QR-DQN.

Page 42:

Key Factor in Success: Offline Dataset Size

Randomly subsample N% of frames from 200 million frames for offline training.

Divergence with 1% of data for prolonged training!
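The subsampling step can be sketched as a uniform draw without replacement over frame indices. The function name and the small frame count are assumptions for illustration; the experiment on the slide subsamples from 200 million frames.

```python
import numpy as np


def subsample_indices(num_frames, percent, seed=0):
    """Pick percent% of the frame indices uniformly, without replacement."""
    rng = np.random.default_rng(seed)
    keep = int(num_frames * percent / 100)
    return np.sort(rng.choice(num_frames, size=keep, replace=False))


# Keep 10% of a toy 1000-frame dataset.
idx = subsample_indices(num_frames=1000, percent=10)
```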

Page 43:

Key Factor in Success: Offline Dataset Composition

Subsample first 10% of total frames (20 million) for offline training -- much lower quality data.

Page 44:

Choice of Algorithm: Offline Continuous Control

Offline agents trained using full experience replay of DDPG on MuJoCo environments.

Page 45:

Offline RL: Stability / Overfitting

More gradient updates eventually degrade performance :(

Average online scores of offline agents trained on 5 games using logged DQN replay data for 5X gradient steps compared to online DQN.

Page 46:

Offline RL for Robotics

Page 47:

Future Work

"The potential for off-policy learning remains tantalizing, the best way to achieve it still a mystery." - Sutton & Barto

Page 48:

● Rigorous characterization of role of generalization in offline RL

Offline RL: Future Work

Page 49:

● Rigorous characterization of role of generalization in offline RL

● Benchmarking with various data collection strategies

○ Subsampling DQN-replay datasets (e.g., first / last k million frames)

Offline RL: Future Work

Page 50:

● Rigorous characterization of role of generalization in offline RL

● Benchmarking with various data collection strategies

○ Subsampling DQN-replay datasets (e.g., first / last k million frames)

● Offline Evaluation / Hyperparameter Tuning

○ Currently, online evaluation is used for early stopping. “True” offline RL requires offline policy evaluation.

Offline RL: Future Work

Page 51:

● Rigorous characterization of role of generalization in offline RL

● Benchmarking with various data collection strategies

○ Subsampling DQN-replay datasets (e.g., first / last k million frames)

● Offline Evaluation / Hyperparameter Tuning

○ Currently, online evaluation is used for early stopping. “True” offline RL requires offline policy evaluation.

● Model-based RL approaches

Offline RL: Future Work

Page 52:

● Robust RL algorithms (e.g. REM, QR-DQN), trained on sufficiently large and diverse datasets, perform quite well in the offline setting.

● Offline RL provides a standardized setup for:

○ Isolating exploitation from exploration

○ Developing sample efficient and stable algorithms

○ Pretraining RL agents on logged data

TL;DR

Page 53:

For code, DQN-replay dataset(s) and previous version of paper, refer to offline-rl.github.io

Thank you!