Skill Acquisition via Transfer Learning and Advice Taking Lisa Torrey, Jude Shavlik, Trevor Walker University of Wisconsin-Madison, USA Richard Maclin University of Minnesota-Duluth, USA


Transfer Learning

Agent learns Task A

Agent encounters related Task B

Agent discovers how tasks are related (so far the user provides this info to the agent)

Agent uses knowledge from Task A to learn Task B faster

Task A is the source. Task B is the target.

Transfer Learning

The goal for the target task:

[Figure: performance versus training, with curves labeled "with transfer" and "without transfer"]

Reinforcement Learning Overview

Take an action

Observe world state (described by a set of features)

Receive a reward

Use the rewards to estimate the Q-values of actions in states

Policy: choose the action with the highest Q-value in the current state
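The loop on this slide can be sketched in a few lines. This is a minimal tabular illustration, not the learner used in the talk: the one-step Q-learning update, the learning rate `alpha`, the discount `gamma`, and the state and action names are all assumptions made for the example.

```python
from collections import defaultdict


def greedy_action(q_values, state, actions):
    """Policy from the slide: pick the action with the highest Q-value."""
    return max(actions, key=lambda a: q_values[(state, a)])


def q_update(q_values, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """One-step Q-learning update (assumed rule; the slides do not name one)."""
    best_next = max(q_values[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q_values[(state, action)] += alpha * (target - q_values[(state, action)])


# Hypothetical states and actions, for illustration only.
actions = ["pass", "shoot"]
q = defaultdict(float)  # every Q-value starts at 0.0
q_update(q, "near_goal", "shoot", 1.0, "done", actions)
```

After this single rewarded step, `greedy_action(q, "near_goal", actions)` prefers `"shoot"`, because its estimate has moved above the untouched estimate for `"pass"`.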

Transfer in Reinforcement Learning

What knowledge will we transfer from the source?
    Q-functions (Taylor & Stone 2005)
    Policies (Torrey et al. 2005)
    Skills (this work)

How will we extract that knowledge from the source?
    From Q-functions (Torrey et al. 2005)
    From observed behavior (this work)

How will we apply that knowledge in the target?
    Model reuse (Taylor & Stone 2005)
    Advice taking (Torrey et al. 2005, this work)

Advice Taking

Advice: instructions for the learner

IF: condition
THEN: prefer action

In these states, Q(action1) > Q(action2)

Apply advice as soft constraints (KBKR, 2005)

For each action, find the Q-function that minimizes:

Complexity of Q-function + Error on Training Data + Disagreement with Advice
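One way to write this three-term objective, assuming a linear Q-function per action, is shown below. The notation is illustrative and not taken from the slides: $w_a$ and $b_a$ are the weights and offset for action $a$, $(x_i, y_i)$ are training states with their Q-value estimates, $\zeta_j \ge 0$ is a slack measuring how far the model violates advice rule $j$, and $C$ and $\mu$ are trade-off weights.

```latex
\min_{w_a,\, b_a,\, \zeta \ge 0} \;
  \underbrace{\lVert w_a \rVert_1}_{\text{complexity of Q-function}}
  \;+\; C \underbrace{\sum_{i} \bigl\lvert y_i - (w_a^{\top} x_i + b_a) \bigr\rvert}_{\text{error on training data}}
  \;+\; \mu \underbrace{\sum_{j} \zeta_j}_{\text{disagreement with advice}}
```

Because the advice term is penalized rather than enforced, a large enough training error can outweigh it, which is what makes the constraints soft.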

Experimental Domain: RoboCup

KeepAway (KA/MKA): keep the ball (Stone & Sutton 2001)

MoveDownfield (MD): cross the line (Torrey et al. 2006)

BreakAway (BA): score a goal (Maclin et al. 2005)

Different objectives, but a transferable skill: passing to teammates

A Challenge for Skill Transfer

Shared skills are not exactly the same
    Skills have general and specific aspects

Aspects of the pass skill in RoboCup
    General: teammate must be open
    Game-specific: where teammate should be located
    Player-specific: whether teammate is nearest or furthest

"I’m open and near the goal. Pass to me!"

"I’m open and far from you. Pass to me!"

Addressing the Challenge

We focus on learning general skill aspects
    These should transfer better

We learn skills that apply to multiple players
    This generalizes over player-specific aspects

We allow humans to provide information
    They can point out game-specific aspects

Human-Provided Information

User provides a mapping to show task similarities

May also provide user advice about task differences

Pass -> Pass towards goal
Ø -> Move towards goal
Ø -> Shoot at goal

Our Transfer Algorithm

Observe source task games to learn skills

Create advice for the target task
    Translate learned skills into transfer advice
    If there is user advice, add it in

Learn target task with KBKR

Learning Skills By Observation

Source-task games are sequences: (state, action)

Learning skills is like learning to classify states by their correct actions

We use Inductive Logic Programming to learn classifiers

State 1:
    distBetween(me,teammate2) = 15
    distBetween(me,teammate1) = 10
    distBetween(me,opponent1) = 5
    ...
    action = pass(teammate2)
    outcome = caught(teammate2)

Advantages of ILP

Can produce first-order rules for skills
    pass(Teammate) vs. pass(teammate1) ... pass(teammateN)
    Capture only the essential aspects of the skill
    We expect these aspects to transfer better

Can incorporate background knowledge

Preparing Datasets for ILP

action = pass(Teammate)?
    yes: outcome = caught(Teammate)?
        yes: Q(pass) is high?
            yes: Q(pass) is highest?
                yes: Positive example for pass(Teammate)
                no: Reject example
            no: Reject example
        no: Reject example
    no: Q(other) is high?
        yes: Q(pass) is lower?
            yes: Negative example for pass(Teammate)
            no: Reject example
        no: Reject example
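The labelling flow on this slide can be sketched as a small function. This is an illustration under stated assumptions: the `high` threshold, the string encoding of actions and outcomes, and the reading of "Q(other)" as the Q-value of the action actually taken are not specified in the slides.

```python
def label_pass_example(action, outcome, q_values, high=0.5):
    """Label one source-task state for the pass(Teammate) skill.

    action   -- name of the action taken, e.g. "pass" or "hold" (assumed encoding)
    outcome  -- "caught" if a pass reached the teammate (assumed encoding)
    q_values -- dict mapping action name to its Q-value in this state
    high     -- hypothetical threshold for "Q is high"
    Returns "positive", "negative", or "reject".
    """
    q_pass = q_values["pass"]
    if action == "pass":
        # Positive only if the pass worked and the learner valued it most.
        if (outcome == "caught"
                and q_pass >= high
                and q_pass == max(q_values.values())):
            return "positive"
        return "reject"
    # Another action was taken: negative only if it was clearly preferred.
    q_taken = q_values[action]
    if q_taken >= high and q_pass < q_taken:
        return "negative"
    return "reject"
```

Rejecting the ambiguous states keeps the ILP training set small but clean: a pass that was intercepted, or a state where no action looked good, says little about when passing is the right choice.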

Example of a Skill Learned

pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30,
    passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.
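Read procedurally, the learned rule is a Boolean test on a state: the named teammate must satisfy the distance and angle conditions, and some opponent must be within 7 units. The sketch below evaluates it in Python; the dictionary layout of `state` and the sample `passAngle` values are invented for the example.

```python
def can_pass(state, teammate, opponents):
    """Evaluate the learned pass(Teammate) rule for one concrete teammate.

    `state` is a hypothetical layout: state["distBetween"][(a, b)] and
    state["passAngle"][teammate] hold the feature values.
    """
    dist = state["distBetween"]
    angle = state["passAngle"][teammate]
    return (dist[("me", teammate)] > 14
            and 30 < angle < 150
            # The unbound Opponent variable means "there exists an opponent".
            and any(dist[("me", opp)] < 7 for opp in opponents))


# Distances follow the State 1 example; the pass angles are made up.
state = {
    "distBetween": {("me", "teammate1"): 10,
                    ("me", "teammate2"): 15,
                    ("me", "opponent1"): 5},
    "passAngle": {"teammate1": 45, "teammate2": 90},
}
```

With these values `can_pass(state, "teammate2", ["opponent1"])` holds, while the same call for `"teammate1"` fails the distance condition.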

Technical Challenges

KBKR requires propositional advice
    We instantiate each rule head

Variables in rule bodies create disjunctions
    We use tile features to translate them

Variables can appear multiple times
    We create new features to translate them

Two Experimental Scenarios

4-on-3 MKA to 3-on-2 BA
    Pass -> Pass towards goal
    Ø -> Move towards goal
    Ø -> Shoot at goal

3-on-2 MD to 3-on-2 BA
    Pass -> Pass
    MoveAhead -> MoveAhead
    Ø -> Shoot at goal

Skill Transfer Results

[Figure: Probability of Goal (y-axis, 0 to 0.6) versus Games Played (x-axis, 0 to 5000). Curves: Without transfer, From MKA, From MD.]

Breakdown of MKA Results

[Figure: Probability of Goal (y-axis, 0 to 0.6) versus Games Played (x-axis, 0 to 5000). Curves: all advice, transfer advice only, user advice only, no advice.]

What if User Advice is Bad?

[Figure: Probability of Goal (y-axis, 0 to 0.6) versus Games Played (x-axis, 0 to 5000). Curves: Transfer with good advice, Transfer with bad advice, Bad advice only, No advice.]

Related Work

Q-function transfer in RoboCup
    Taylor & Stone (AAMAS 2005, AAAI 2005)

Transfer via policy reuse
    Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
    Madden & Howley (AI Review 2004)

Transfer via relational RL
    Driessens et al. (ICML workshop 2006)

Summary of Contributions

Transfer of shared skills in high-level logic
    Despite differences in shared skills

Demonstration of the value of user guidance
    Easy to give and beneficial

Effective transfer in the RoboCup domain
    Challenging and dissimilar tasks

Future Work

Learn more general skills by combining multiple source tasks

Compare several transfer methods on RoboCup scenarios of varying difficulty

Reach similar levels of transfer with less user input

Acknowledgements

DARPA Grant HR0011-04-1-0007

US Naval Research Laboratory Grant N00173-06-1-G002

Thank You

User Advice

IF: distBetween(me,goal) < 10 AND angle(goal, me, goalie) > 40
THEN: prefer shoot

IF: distBetween(me,goal) > 10
THEN: prefer move_ahead

IF: [transferred conditions] AND distBetween(Teammate,goal) < distBetween(me,goal)
THEN: prefer pass(Teammate)

[transferred conditions] is the part that came from transfer

Feature Tiling

[Figure: the original feature's range, from min value to max value, covered by Tiling #1 through Tiling #11; tile counts shown on the slide: 16 tiles, 8 tiles, 8 tiles.]
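A tiling turns one continuous feature into a set of Boolean features, one per interval. The sketch below shows a single tiling with equal-width tiles; the equal widths, the half-open intervals, and the function name are assumptions, since the slide only gives the range endpoints and tile counts.

```python
def tile_features(value, lo, hi, n_tiles):
    """Return a 0/1 list with a 1 in the tile that contains `value`.

    Tiles are equal-width, half-open intervals over [lo, hi]; the last
    tile also includes `hi` itself. This layout is an assumption.
    """
    width = (hi - lo) / n_tiles
    features = []
    for k in range(n_tiles):
        left = lo + k * width
        right = left + width
        inside = left <= value < right or (k == n_tiles - 1 and value == hi)
        features.append(1 if inside else 0)
    return features


# A distance of 5 on a hypothetical 0..32 range, at two resolutions.
coarse = tile_features(5.0, 0.0, 32.0, 8)   # 4-unit tiles
fine = tile_features(5.0, 0.0, 32.0, 16)    # 2-unit tiles
```

Using several tilings at different resolutions lets a linear model express threshold conditions such as "distance below 7" that a single raw feature cannot capture with one weight.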

Propositionalizing Rules

Step 1: rule head

pass(Teammate) :-
    distBetween(me, Teammate) > 14,
    …

pass(teammate1) :-
    distBetween(me, teammate1) > 14,
    …

pass(teammateN) :-
    distBetween(me, teammateN) > 14,
    …

Propositionalizing Rules

Step 2: single-variable disjunctions

distBetween(me, Opponent) < 7

distBetween(me,opponent1) < 7 OR … OR distBetween(me,opponentN) < 7

distBetween(me,opponent1)[0,7] + … + distBetween(me,opponentN)[0,7] ≥ 1
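The step above replaces a disjunction over opponents with a sum of 0/1 tile features that must reach at least 1. A minimal sketch of that equivalence follows; treating the tile [0,7] as the half-open interval [0, 7) and the helper names are assumptions.

```python
def in_tile(value, lo, hi):
    """Boolean tile feature: 1 if lo <= value < hi, else 0 (assumed half-open)."""
    return 1 if lo <= value < hi else 0


def some_opponent_close(opponent_dists, lo=0, hi=7):
    """Propositional form: the tile features over all opponents sum to >= 1."""
    return sum(in_tile(d, lo, hi) for d in opponent_dists) >= 1


def some_opponent_close_disjunction(opponent_dists):
    """Original disjunction: dist(opponent1) < 7 OR ... OR dist(opponentN) < 7."""
    return any(d < 7 for d in opponent_dists)
```

For non-negative distances the two functions agree on every input, and the first one is a linear inequality over Boolean features, which is the form the slides say KBKR requires.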

Propositionalizing Rules

Step 3: linked-variable disjunctions

distBetween(me, Player) > 14,
distBetween(Player, goal) < 10

Add to target task feature space:

newFeature(Player) :-
    Dist1 is distBetween(me, Player),
    Dist2 is distBetween(Player, goal),
    Dist1 > 14, Dist2 < 10.

newFeature(player1) + … + newFeature(playerN) ≥ 1
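When the same variable appears in two conditions, both must hold for the same player, so the conditions cannot be tiled separately. The sketch below mirrors the newFeature definition from this slide; the dictionary layout of `state` and the sample distances are invented for the example.

```python
def new_feature(state, player):
    """1 if this one player is both far from me (> 14) and near the goal (< 10)."""
    dist1 = state[("me", player)]
    dist2 = state[(player, "goal")]
    return 1 if dist1 > 14 and dist2 < 10 else 0


def linked_disjunction(state, players):
    """Propositional form: newFeature summed over all players is >= 1."""
    return sum(new_feature(state, p) for p in players) >= 1


# Hypothetical distances: each condition is met, but by different players.
split_state = {("me", "player1"): 20, ("player1", "goal"): 30,
               ("me", "player2"): 5, ("player2", "goal"): 4}
# Here player1 meets both conditions at once.
joint_state = {("me", "player1"): 20, ("player1", "goal"): 8,
               ("me", "player2"): 5, ("player2", "goal"): 4}
```

`linked_disjunction(split_state, ["player1", "player2"])` is false even though each condition is satisfied by some player, which is exactly the case that separate per-condition disjunctions would get wrong.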