Advice Taking and Transfer Learning:
Naturally-Inspired Extensions to Reinforcement Learning
Lisa Torrey, Trevor Walker, Richard Maclin*, Jude Shavlik
University of Wisconsin - Madison
University of Minnesota - Duluth*
Reinforcement Learning

[Diagram: the agent sends an action to the environment; the environment returns the new state and a reward, which may be delayed.]
Q-Learning

Update the Q-function incrementally. Follow the current Q-function to choose actions. Converges to an accurate Q-function.

Q-function: (state, action) → value
policy(state) = argmax over actions of Q(state, action)
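The Q-learning scheme described here (incremental updates, argmax policy) can be sketched in a few lines of code. This is a minimal illustration, not the system from the talk; the learning rate, discount, and exploration values are assumed.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate (assumed)

def choose_action(Q, state, actions):
    """Follow the current Q-function to choose actions, with some random exploration."""
    if random.random() < EPSILON:
        return random.choice(actions)               # explore
    return max(actions, key=lambda a: Q[(state, a)])  # policy(state) = argmax over actions

def update(Q, state, action, reward, next_state, actions):
    """Update the Q-function incrementally toward reward + discounted best next value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

Q = defaultdict(float)  # all Q-values start at 0
update(Q, "s0", "pass", 1.0, "s1", ["pass", "shoot"])
# Q[("s0", "pass")] has moved a fraction ALPHA of the way toward the reward of 1.0
```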
Limitations

Agents begin without any information. Random exploration is required in the early stages of learning. Long training times can result.
Naturally-Inspired Extensions

Advice taking: a human teacher provides knowledge to the RL agent.
Transfer learning: a source-task agent provides knowledge to a target-task agent.
Potential Benefits

[Plot: performance vs. training, with knowledge vs. without knowledge. Prior knowledge can give a higher start, a higher slope, and a higher asymptote.]
Outline

RL in a complex domain
Extension #1: Advice Taking
Extension #2: Transfer Learning
    Skill Transfer
    Macro Transfer
    MLN Transfer
The RoboCup Domain

KeepAway: +1 per time step
MoveDownfield: +1 per meter
BreakAway: +1 upon goal
The RoboCup Domain

State features:
distBetween(a0, Player)
distBetween(a0, GoalPart)
distBetween(Attacker, goalCenter)
distBetween(Attacker, ClosestDefender)
distBetween(Attacker, goalie)
angleDefinedBy(topRight, goalCenter, a0)
angleDefinedBy(GoalPart, a0, goalie)
angleDefinedBy(Attacker, a0, ClosestDefender)
angleDefinedBy(Attacker, a0, goalie)
timeLeft

Actions:
move(ahead), move(away), move(right), move(left), shoot(GoalPart), pass(Teammate)
Q-Learning

Q-function: (state, action) → value
policy(state) = argmax over actions of Q(state, action)

State | Action | Q
1     | 1      | 0.5
1     | 2      | -0.5
1     | 3      | 0
2     | 1      | 0.3
…     | …      | …

Function approximation replaces the explicit table.
Approximating the Q-function

Linear support-vector regression:
Q-value = weight vector · feature vector (a dot product)

For example, features [distBetween(a0, a1), distBetween(a0, a2), distBetween(a0, goalie), …] with weights [0.2, -0.1, 0.9, …].

Set weights to minimize:
ModelSize + C × DataMisfit
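As a minimal sketch, the linear Q-function is just a dot product of the learned weight vector with the feature vector. The weight and feature values below follow the slide's example and are illustrative.

```python
def q_value(weights, features):
    """Linear Q-function: the dot product of weight and feature vectors."""
    return sum(w * f for w, f in zip(weights, features))

weights = [0.2, -0.1, 0.9]    # learned by support-vector regression (illustrative)
features = [15.0, 5.0, 20.0]  # distBetween(a0,a1), distBetween(a0,a2), distBetween(a0,goalie)
q = q_value(weights, features)  # 0.2*15 - 0.1*5 + 0.9*20 = 20.5
```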
RL in 3-on-2 BreakAway

[Learning curve: probability of goal (y-axis, 0 to 0.6) vs. training games (x-axis, 0 to 3000).]
Extension #1: Advice Taking
IF an opponent is near
AND a teammate is open
THEN pass is the best action
Advice in RL

Advice sets constraints on Q-values under specified conditions:

IF an opponent is near me
AND a teammate is open
THEN pass has a high Q-value

Apply as soft constraints in optimization:
ModelSize + C × DataMisfit + μ × AdviceMisfit
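A hypothetical sketch of how advice can enter the optimization as a soft constraint: advice of the form "Q should be at least q_min when the condition holds" contributes a slack penalty only when violated, so the learner can trade off following advice against fitting data. Function names and numeric values here are ours, not the system's.

```python
def advice_misfit(q_pred, q_min):
    """Slack for advice 'Q should be at least q_min': zero when satisfied, linear when violated."""
    return max(0.0, q_min - q_pred)

def objective(model_size, data_misfit, advice_slacks, C=1.0, mu=1.0):
    """The penalized objective from the slide: ModelSize + C*DataMisfit + mu*AdviceMisfit."""
    return model_size + C * data_misfit + mu * sum(advice_slacks)

satisfied = advice_misfit(0.9, 0.8)   # 0.0: advice satisfied, no penalty
violated = advice_misfit(0.5, 0.8)    # positive slack: advice violated
total = objective(1.0, 0.5, [violated])
```

Because the constraint is soft, a large enough data misfit can outweigh the advice term, which is how bad advice gets overridden by experience.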
Advice Performance

[Chart not captured in the transcript.]
Extension #2: Transfer
3-on-2 BreakAway
3-on-2 KeepAway
3-on-2 MoveDownfield
Relational Transfer

First-order logic describes relationships between objects:

distBetween(a0, Teammate) > 10
distBetween(Teammate, goalCenter) < 15

We want to transfer relational knowledge: it supports human-level reasoning and a general representation.
Skill Transfer

Learn advice about good actions from the source task:

good_action(pass(Teammate)) :-
    distBetween(a0, Teammate) > 10,
    distBetween(Teammate, goalCenter) < 15.

Example 1:
distBetween(a0, a1) = 15
distBetween(a0, a2) = 5
distBetween(a0, goalie) = 20
…
action = pass(a1)
outcome = caught(a1)

Select positive and negative examples of good actions and apply inductive logic programming to learn rules.
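To illustrate how such a learned clause classifies states, here is a hypothetical evaluation of the rule above; the function and argument names are ours, and the second distance value is made up for illustration (Example 1 does not give it).

```python
def good_action_pass(dist_a0_teammate, dist_teammate_goal):
    """The learned clause: pass(Teammate) is good when the teammate is
    far enough from a0 (> 10) and close enough to the goal (< 15)."""
    return dist_a0_teammate > 10 and dist_teammate_goal < 15

print(good_action_pass(15, 12))  # True: both conditions of the clause hold
print(good_action_pass(5, 12))   # False: teammate too close to a0
```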
User Advice in Skill Transfer

There may be new skills in the target task that cannot be learned from the source, e.g., shooting in BreakAway. We allow users to add their own advice about these new skills. User advice simply adds to the transfer advice.
Skill Transfer to 3-on-2 BreakAway

[Learning curves: probability of goal (0 to 0.6) vs. training games (0 to 3000) for Standard RL, Skill Transfer from 2-on-1 BreakAway, Skill Transfer from 3-on-2 MoveDownfield, and Skill Transfer from 3-on-2 KeepAway.]
Macro Transfer

Learn a strategy from the source task: find an action sequence that separates good games from bad games, then learn first-order rules to control transitions along the sequence.

move(ahead) → pass(Teammate) → shoot(GoalPart)
Transfer via Demonstration

Games played in the target task:
Games 0 to 100: execute the macro strategy; the agent learns an initial Q-function.
After game 100: perform standard RL; the agent adapts to the target task.
Macro Transfer to 3-on-2 BreakAway

[Learning curves: probability of goal (0 to 0.6) vs. training games (0 to 3000) for Standard RL, Skill Transfer from 2-on-1 BreakAway, and Macro Transfer from 2-on-1 BreakAway.]
MLN Transfer

Learn a Markov Logic Network to represent the source-task policy relationally. Apply the policy via demonstration in the target task.

MLN Q-function: (state, action) → value
Markov Logic Networks

A Markov network models a joint distribution. A Markov Logic Network combines probability with logic:
Template: a set of first-order formulas with weights
Each grounded predicate in a formula becomes a node
Predicates in a grounded formula are connected by arcs

Probability of a world: (1/Z) exp(Σᵢ WᵢNᵢ)

[Diagram: a small Markov network over nodes X, Y, Z and A, B.]
MLN Q-function

Formula 1 (W₁ = 0.75, N₁ = 1 teammate):
IF distance(me, Teammate) < 15
AND angle(me, goalie, Teammate) > 45
THEN Q ∈ (0.8, 1.0)

Formula 2 (W₂ = 1.33, N₂ = 3 goal parts):
IF distance(me, GoalPart) < 10
AND angle(me, goalie, GoalPart) > 45
THEN Q ∈ (0.8, 1.0)

Probability that Q ∈ (0.8, 1.0) = exp(W₁N₁ + W₂N₂) / (1 + exp(W₁N₁ + W₂N₂))
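The bin probability above is a logistic function of the weighted true-grounding counts, and can be sketched directly (the function name is ours; the weights and counts are the slide's):

```python
import math

def bin_probability(weighted_counts):
    """P(Q in bin) = exp(sum of W_i * N_i) / (1 + exp(sum of W_i * N_i)):
    a logistic in the weighted counts of true groundings."""
    s = sum(w * n for w, n in weighted_counts)
    return math.exp(s) / (1.0 + math.exp(s))

# Formula weights and grounding counts from the slide: W1=0.75 with N1=1, W2=1.33 with N2=3.
p = bin_probability([(0.75, 1), (1.33, 3)])  # close to 1: both formulas strongly support this bin
```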
Using an MLN Q-function

Q ∈ (0.8, 1.0): P₁ = 0.75
Q ∈ (0.5, 0.8): P₂ = 0.15
Q ∈ (0, 0.5): P₃ = 0.10

Q = P₁ · E[Q | bin₁] + P₂ · E[Q | bin₂] + P₃ · E[Q | bin₃]

where E[Q | binᵢ] is the Q-value of the most similar training example in that bin.
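The combination step is a probability-weighted sum over bins. In this sketch the bin probabilities are the slide's; the per-bin expected values are illustrative placeholders for "Q-value of the most similar training example in the bin".

```python
def mln_q_value(bin_probs, bin_estimates):
    """Q = sum over bins of P_i * E[Q | bin_i]."""
    return sum(p * e for p, e in zip(bin_probs, bin_estimates))

probs = [0.75, 0.15, 0.10]     # P1, P2, P3 from the slide
estimates = [0.9, 0.65, 0.25]  # illustrative E[Q | bin] values, one per bin
q = mln_q_value(probs, estimates)
```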
MLN Transfer to 3-on-2 BreakAway

[Learning curves: probability of goal (0 to 0.6) vs. training games (0 to 3000) for MLN Transfer, Macro Transfer, Value-function Transfer, and Standard RL.]
Conclusions

Advice and transfer can provide RL agents with knowledge that improves early performance. Relational knowledge is desirable because it is general and involves human-level reasoning. More detailed knowledge produces larger initial benefits, but is less widely transferable.
Acknowledgements

DARPA grant HR0011-04-1-0007
DARPA grant HR0011-07-C-0060
DARPA grant FA8650-06-C-7606
NRL grant N00173-06-1-G002
Thank You