Relational Transfer in Reinforcement Learning
![Page 1: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/1.jpg)
Lisa Torrey
University of Wisconsin – Madison
CS 540
Transfer Learning
![Page 2: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/2.jpg)
Transfer Learning in Humans
Education: hierarchical curriculum; learning tasks share common stimulus-response elements
Abstract problem-solving: learning tasks share general underlying principles
Multilingualism: knowing one language affects learning in another; transfer can be both positive and negative
![Page 3: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/3.jpg)
Transfer Learning in AI
Given: Task S
Learn: Task T
![Page 4: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/4.jpg)
Goals of Transfer Learning
[Figure: performance (y-axis) vs. training (x-axis)]
higher start
higher slope
higher asymptote
![Page 5: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/5.jpg)
Inductive Learning
All Hypotheses
Allowed Hypotheses
Search
![Page 6: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/6.jpg)
Transfer in Inductive Learning
All Hypotheses
Allowed Hypotheses
Search
Thrun and Mitchell 1995: Transfer slopes for gradient descent
![Page 7: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/7.jpg)
Transfer in Inductive Learning
Bayesian methods
[Figure: two panels, Bayesian Learning and Bayesian Transfer]
Prior distribution + Data = Posterior distribution
Raina et al. 2006: Transfer a Gaussian prior
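As a minimal sketch of the "prior + data = posterior" idea, the Python below computes the posterior over an unknown mean under a Gaussian prior and Gaussian noise of known variance. This is the textbook conjugate update, not the method of Raina et al.; the function name, the numbers, and the "source-derived" prior are assumptions made for illustration.

```python
def gaussian_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior (mean, variance) of an unknown mean with a Gaussian prior."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var
    sample_mean = sum(data) / n
    # Precisions (inverse variances) add; means are precision-weighted.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / noise_var)
    return post_mean, post_var


target_data = [2.1, 1.9]  # hypothetical target-task observations

# No transfer: a broad prior that says little about the mean.
no_transfer = gaussian_posterior(0.0, 100.0, target_data, noise_var=1.0)

# Transfer: a narrower prior, assumed to have been fitted on a source task.
with_transfer = gaussian_posterior(2.0, 0.5, target_data, noise_var=1.0)
```

With the same two observations, the transferred prior yields a lower posterior variance than the broad prior, which is the sense in which a good prior helps when target data is scarce.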
![Page 8: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/8.jpg)
Transfer in Inductive Learning
Hierarchical methods
[Figure: concept hierarchy with nodes Line, Curve, Surface, Circle, Pipe]
Stracuzzi 2006: Learn Boolean concepts that can depend on each other
![Page 9: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/9.jpg)
Transfer in Inductive Learning
Dealing with Missing Data or Labels
Shi et al. 2008: Transfer via active learning
Task S
Task T
![Page 10: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/10.jpg)
Reinforcement Learning
[Figure: agent-environment interaction loop]
Agent: Q(s1, a) = 0, π(s1) = a1
Environment: δ(s1, a1) = s2, r(s1, a1) = r2
Agent: Q(s1, a1) ← Q(s1, a1) + Δ, π(s2) = a2
Environment: δ(s2, a2) = s3, r(s2, a2) = r3
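The loop on this slide can be sketched as tabular Q-learning. The sketch assumes that Δ is the standard Q-learning temporal-difference step and that the policy π is ε-greedy; the four-state chain environment and all parameter values are hypothetical and do not come from the slides.

```python
import random
from collections import defaultdict


def q_learning(delta, reward, start, actions, is_terminal,
               episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q(s, a) = 0 for every unseen pair
    for _ in range(episodes):
        s = start
        while not is_terminal(s):
            # pi(s): epsilon-greedy, breaking ties among best actions at random
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s_next = delta(s, a)   # environment transition delta(s, a)
            r = reward(s, a)       # environment reward r(s, a)
            future = 0.0 if is_terminal(s_next) else max(
                Q[(s_next, b)] for b in actions)
            # Q(s, a) <- Q(s, a) + Delta, with Delta the scaled TD error
            Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])
            s = s_next
    return Q


# Hypothetical chain world: states 0..3, state 3 is the goal.
def delta(s, a):
    return min(s + 1, 3) if a == "right" else max(s - 1, 0)


def reward(s, a):
    return 1.0 if delta(s, a) == 3 else 0.0


Q = q_learning(delta, reward, start=0, actions=["left", "right"],
               is_terminal=lambda s: s == 3)
```

Every episode ends with the step from state 2 into the goal, so `Q[(2, "right")]` is updated toward 1 once per episode and ends up very close to it.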
![Page 11: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/11.jpg)
Transfer in Reinforcement Learning
Starting-point methods
Hierarchical methods
Alteration methods
Imitation methods
New RL algorithms
![Page 12: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/12.jpg)
Transfer in Reinforcement Learning
Starting-point methods
Initial Q-table, no transfer:
0 0 0 0
0 0 0 0
0 0 0 0
Initial Q-table, transfer (values from the source task):
2 5 4 8
9 1 7 2
5 9 1 4
Either table is then refined by target-task training.
Taylor et al. 2005: Value-function transfer
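A starting-point method can be sketched as a choice of how to initialize the target-task Q-table. The code below is an illustrative sketch only, not the procedure of Taylor et al.: the function `init_q_table`, the state and action names, and the hand-written mappings from target to source are all hypothetical.

```python
def init_q_table(target_states, target_actions, source_q=None,
                 state_map=None, action_map=None):
    """Build the initial target-task Q-table.

    Without a source table every entry starts at 0. With one, each target
    (state, action) pair copies the value of the source pair it maps to,
    falling back to 0 when the source has no such entry.
    """
    q = {}
    for s in target_states:
        for a in target_actions:
            if source_q is None:
                q[(s, a)] = 0.0
            else:
                q[(s, a)] = source_q.get((state_map[s], action_map[a]), 0.0)
    return q


# Hypothetical source-task values and target-to-source mappings.
source_q = {("near", "kick"): 5.0, ("far", "run"): 2.0}
states = ["near_left", "near_right", "far"]
actions = ["kick", "run"]
state_map = {"near_left": "near", "near_right": "near", "far": "far"}
action_map = {"kick": "kick", "run": "run"}

no_transfer = init_q_table(states, actions)
transfer = init_q_table(states, actions, source_q, state_map, action_map)
```

Target-task training then proceeds as usual from whichever table was chosen; only the starting point differs.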
![Page 13: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/13.jpg)
Transfer in Reinforcement Learning
Hierarchical methods
[Figure: task hierarchy with nodes Soccer, Pass, Shoot, Run, Kick]
Mehta et al. 2008: Transfer a learned hierarchy
![Page 14: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/14.jpg)
Transfer in Reinforcement Learning
Alteration methods
Walsh et al. 2006: Transfer aggregate states
Task S
Original states, original actions, original rewards
New states, new actions, new rewards
![Page 15: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/15.jpg)
Transfer in Reinforcement Learning
New RL Algorithms
Torrey et al. 2006: Transfer advice about skills
[Figure: agent-environment interaction loop]
Agent: Q(s1, a) = 0, π(s1) = a1
Environment: δ(s1, a1) = s2, r(s1, a1) = r2
Agent: Q(s1, a1) ← Q(s1, a1) + Δ, π(s2) = a2
Environment: δ(s2, a2) = s3, r(s2, a2) = r3
![Page 16: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/16.jpg)
Transfer in Reinforcement Learning
Imitation methods
[Figure: policy used over training; labels: source, target]
Torrey et al. 2007: Demonstrate a strategy
![Page 17: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/17.jpg)
My Research
Starting-point methods
Imitation methods
Hierarchical methods
New RL algorithms
Skill Transfer
Macro Transfer
![Page 18: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/18.jpg)
RoboCup Domain
3-on-2 BreakAway
3-on-2 KeepAway
3-on-2 MoveDownfield
2-on-1 BreakAway
![Page 19: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/19.jpg)
Inductive Logic Programming
IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
…
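The candidate rules above suggest a search over rule conditions scored against examples. The following is a deliberately simplified sketch of that idea, not a real ILP system: it enumerates a fixed grid of thresholds rather than refining clauses, and the example states, labels, and the "positives minus negatives" score are assumptions for illustration.

```python
from itertools import product

# Hypothetical examples: state features, and whether passing was a good choice.
examples = [
    ({"distance": 4, "angle": 35}, True),
    ({"distance": 3, "angle": 40}, True),
    ({"distance": 4, "angle": 10}, False),
    ({"distance": 4, "angle": 20}, False),
    ({"distance": 9, "angle": 45}, False),
    ({"distance": 12, "angle": 20}, False),
]


def covers(rule, state):
    """Rule (max_dist, min_angle) reads:
    IF distance <= max_dist AND angle >= min_angle THEN pass."""
    max_dist, min_angle = rule
    return state["distance"] <= max_dist and state["angle"] >= min_angle


def score(rule):
    """Positives covered minus negatives covered."""
    pos = sum(1 for s, good in examples if good and covers(rule, s))
    neg = sum(1 for s, good in examples if not good and covers(rule, s))
    return pos - neg


# A distance bound of infinity and an angle bound of 0 act as "no condition",
# so (inf, 0) corresponds to the empty rule IF [ ] THEN pass(Teammate).
candidates = list(product([5, 10, float("inf")], [0, 15, 30]))
best = max(candidates, key=score)
```

On this toy data the best-scoring candidate is `(5, 30)`, which corresponds to the rule "distance ≤ 5 AND angle ≥ 30" from the list above.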
![Page 20: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/20.jpg)
Advice Taking
Batch Reinforcement Learning via Support Vector Regression (RL-SVR)
[Figure: Agent and Environment interact in Batch 1, Batch 2, …, with a "Compute Q-functions" step between batches]
Find Q-functions that minimize: ModelSize + C × DataMisfit
![Page 21: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/21.jpg)
Advice Taking
Batch Reinforcement Learning with Advice (KBKR)
[Figure: Agent and Environment interact in Batch 1, Batch 2, …; the "Compute Q-functions" step between batches also takes Advice as input]
Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit
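To make the three-term objective ModelSize + C × DataMisfit + µ × AdviceMisfit concrete, here is a small sketch that evaluates it for a linear Q-function. The slides do not define the terms, so everything specific here is an assumption: ModelSize is taken as the L1 norm of the weights, DataMisfit as summed absolute error, and advice is simplified to "Q should be at least this large in this state", whereas KBKR proper expresses advice over regions of the state space.

```python
def objective(weights, data, advice, C=1.0, mu=1.0):
    """ModelSize + C * DataMisfit + mu * AdviceMisfit for a linear Q-function.

    data:   list of (features, target_q) pairs from the current batch
    advice: list of (features, lower_bound) pairs (a simplification)
    """
    def q(features):
        return sum(w * x for w, x in zip(weights, features))

    model_size = sum(abs(w) for w in weights)                # assumed: L1 norm
    data_misfit = sum(abs(q(x) - y) for x, y in data)        # assumed: abs error
    # Only a shortfall below the advised bound is penalized.
    advice_misfit = sum(max(0.0, lo - q(x)) for x, lo in advice)
    return model_size + C * data_misfit + mu * advice_misfit


weights = [1.0, 0.5]
data = [([1.0, 2.0], 2.0)]    # fitted exactly, so DataMisfit is 0
advice = [([2.0, 0.0], 3.0)]  # Q is 2.0 here, a shortfall of 1.0

with_advice = objective(weights, data, advice, C=1.0, mu=1.0)
without_advice = objective(weights, data, advice, C=1.0, mu=0.0)
```

Setting µ to 0 recovers the advice-free objective, and raising µ makes violating the advice more costly relative to fitting the data.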
![Page 22: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/22.jpg)
Skill Transfer Algorithm
[Figure: Source → ILP → Mapping → Advice Taking → Target, with [Human advice] as an additional input to Advice Taking]
Example skill learned by ILP:
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
![Page 23: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/23.jpg)
Selected Results
Skill transfer to 3-on-2 BreakAway from several tasks
![Page 24: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/24.jpg)
Macro-Operators
pass(Teammate)
move(Direction)
shoot(goalRight)
shoot(goalLeft)
IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(ahead)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalLeft)
IF [ ... ] THEN pass(Teammate)
IF [ ... ] THEN move(left)
IF [ ... ] THEN shoot(goalRight)
IF [ ... ] THEN shoot(goalRight)
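One way to read a macro-operator is as a small state machine whose nodes carry an action and a condition for being in that node. The sketch below follows that reading under stated assumptions: the rule bodies are elided on the slide, so the lambdas are hypothetical stand-ins, and the "advance if the next node's rule fires, otherwise stay if the current one does" control flow is an illustrative simplification.

```python
def run_macro(nodes, states):
    """Follow a linear macro; nodes is a list of (action, rule) pairs."""
    actions = []
    current = -1  # -1 means the macro has not been entered yet
    for state in states:
        nxt = current + 1
        if nxt < len(nodes) and nodes[nxt][1](state):
            current = nxt   # move on to the next node
        elif current < 0 or not nodes[current][1](state):
            break           # no rule fires, so abandon the macro
        actions.append(nodes[current][0])
    return actions


# Hypothetical rules standing in for the elided IF [ ... ] conditions.
macro = [
    ("pass(Teammate)", lambda s: s["teammate_open"]),
    ("move(ahead)", lambda s: not s["teammate_open"] and not s["near_goal"]),
    ("shoot(goalRight)", lambda s: s["near_goal"]),
]

states = [
    {"teammate_open": True, "near_goal": False},
    {"teammate_open": False, "near_goal": False},
    {"teammate_open": False, "near_goal": False},
    {"teammate_open": False, "near_goal": True},
]

trace = run_macro(macro, states)
```

Here the agent passes once, stays in the move node for two steps, and then shoots, so a node can repeat while its rule keeps firing.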
![Page 25: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/25.jpg)
Demonstration
An imitation method
[Figure: policy used over training; labels: source, target]
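Demonstration can be sketched as a switch in which policy chooses the actions: the transferred source strategy early in training, the learned target policy afterwards. The function below is a minimal sketch under that assumption; the name `choose_action`, the `demo_episodes` cutoff, and the two toy policies are hypothetical, and the slides do not say how long the demonstration period lasts.

```python
def choose_action(episode, state, demo_episodes, source_policy, target_policy):
    """Use the source strategy for the first demo_episodes episodes,
    then hand control to the target-task policy."""
    if episode < demo_episodes:
        return source_policy(state)
    return target_policy(state)


# Toy stand-ins for the two policies.
def source_policy(state):
    return "pass"


def target_policy(state):
    return "shoot"


early = choose_action(0, {}, 100, source_policy, target_policy)
late = choose_action(100, {}, 100, source_policy, target_policy)
```

Experience gathered during the demonstration period can still be used to train the target policy, so the learner does not start from scratch when control is handed over.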
![Page 26: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/26.jpg)
Macro Transfer Algorithm
[Figure: Source → ILP → Demonstration → Target]
![Page 27: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/27.jpg)
Macro Transfer Algorithm
Learning structures
Positive: BreakAway games that score
Negative: BreakAway games that didn’t score
ILP
IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)
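The structure rule can be read as "the game contains these four actions in consecutive states". A small sketch of checking a candidate structure against positive and negative games follows. The game logs are invented for illustration, and variable arguments such as Teammate and Direction are dropped so that `pass` and `move` match any argument.

```python
def contains_sequence(actions, structure):
    """True if structure occurs as a contiguous run inside actions."""
    n = len(structure)
    return any(actions[i:i + n] == structure
               for i in range(len(actions) - n + 1))


def coverage(structure, good_games, bad_games):
    """Count positive and negative games that the structure covers."""
    pos = sum(contains_sequence(g, structure) for g in good_games)
    neg = sum(contains_sequence(g, structure) for g in bad_games)
    return pos, neg


structure = ["pass", "move", "shoot(goalRight)", "shoot(goalLeft)"]

# Hypothetical action logs: games that scored and games that did not.
good_games = [
    ["move", "pass", "move", "shoot(goalRight)", "shoot(goalLeft)"],
    ["pass", "move", "shoot(goalRight)", "shoot(goalLeft)"],
    ["pass", "shoot(goalLeft)"],
]
bad_games = [
    ["pass", "move", "move"],
    ["move", "shoot(goalRight)"],
]

pos, neg = coverage(structure, good_games, bad_games)
```

A structure that covers many scoring games and few non-scoring ones is a good candidate for the macro's node sequence.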
![Page 28: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/28.jpg)
Macro Transfer Algorithm
Learning rules for arcs
Positive: states in good games that took the arc
Negative: states in good games that could have taken the arc but didn’t
ILP
shoot(goalRight)
IF [ … ] THEN enter(State)
IF [ … ] THEN loop(State, Teammate)
pass(Teammate)
![Page 29: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/29.jpg)
Selected Results
Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway
![Page 30: Relational Transfer in Reinforcement Learning](https://reader038.fdocuments.us/reader038/viewer/2022110119/556582f4d8b42a723f8b4dea/html5/thumbnails/30.jpg)
Summary
Machine learning is often designed for standalone tasks
Transfer is a natural learning ability that we would like to incorporate into machine learners
There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks