Skill Acquisition via Transfer Learning and Advice Taking Lisa Torrey, Jude Shavlik, Trevor Walker...

Skill Acquisition via Transfer Learning

and Advice TakingLisa Torrey, Jude Shavlik, Trevor Walker

University of Wisconsin-Madison, USA

Richard MaclinUniversity of Minnesota-Duluth, USA

Transfer LearningTransfer Learning

Agent learns Task A

Agent encounters related Task B

Agent discovers how tasks are related

So far the user

provides this info to the agent

Agent uses knowledge from Task A to learn Task B faster

Task A is the

source. Task B is

the target.

Transfer LearningTransfer Learning

The goal for the target task:

perf

orm

ance

training

with transferwithout transfer

Reinforcement Learning Reinforcement Learning OverviewOverview

Take an actionObserve world state

Receive a reward

Policy: choose the action with the highest Q-value in the current state

Use the rewards to

estimate the Q-values of actions in

states

Described by a set of features

Transfer in Reinforcement Transfer in Reinforcement LearningLearning

What knowledge will we transfer from the source?What knowledge will we transfer from the source? Q-functions (Taylor & Stone 2005)Q-functions (Taylor & Stone 2005) Policies (Torrey et al. 2005)Policies (Torrey et al. 2005) Skills Skills (this work)(this work)

How will we extract that knowledge from the source?How will we extract that knowledge from the source? From Q-functions (Torrey et al. 2005)From Q-functions (Torrey et al. 2005) From observed behavior From observed behavior (this work)(this work)

How will we apply that knowledge in the target?How will we apply that knowledge in the target? Model reuse (Taylor & Stone 2005)Model reuse (Taylor & Stone 2005) Advice takingAdvice taking (Torrey et al. 2005, this work) (Torrey et al. 2005, this work)

Advice TakingAdvice Taking

AdviceAdvice: instructions for the learner: instructions for the learner

IF: IF: conditioncondition

THEN: THEN: prefer actionprefer action

In these states

Qaction1 >

Qaction2

Complexity of Q-function

Apply advice as Apply advice as soft constraintssoft constraints (KBKR, 2005)(KBKR, 2005)

For each action, find the Q-function that minimizes:

Error on Training

Data

Disagreement with Advice+ +

Experimental Domain: Experimental Domain: RoboCupRoboCup

Keep the ballStone & Sutton 2001

KeepAway (KA/MKA)

Score a goalMaclin et al. 2005

BreakAway (BA)

MoveDownfield (MD)

Cross the line Torrey et al. 2006

Different objectives, but a transferable skill: passing to teammates

A Challenge for Skill A Challenge for Skill TransferTransfer

Shared skills are not exactly the sameShared skills are not exactly the same Skills have general and specific aspectsSkills have general and specific aspects

Aspects of the Aspects of the passpass skill in RoboCup skill in RoboCup General: teammate must be openGeneral: teammate must be open Game-specific: where teammate should be locatedGame-specific: where teammate should be located Player-specific: whether teammate is nearest or Player-specific: whether teammate is nearest or

furthestfurthest

I’m open and near the goal.

Pass to me!

I’m open and far

from you. Pass to me!

Addressing the ChallengeAddressing the Challenge

We focus on learning We focus on learning generalgeneral skillskill aspectsaspects These should transfer betterThese should transfer better

We learn skills that We learn skills that apply toapply to multiple multiple playersplayers This generalizes over player-specific aspectsThis generalizes over player-specific aspects

We allow humans to We allow humans to provide informationprovide information They can point out game-specific aspectsThey can point out game-specific aspects

Human-Provided Human-Provided InformationInformation

User provides a User provides a mapping mapping to show task similaritiesto show task similarities

May also provide May also provide user adviceuser advice about task about task differencesdifferences

PassØØ

Pass towards goalMove towards goal

Shoot at goal

Our Transfer AlgorithmOur Transfer Algorithm

Observe source task games to learn skills

Create advice for thetarget task

Learn target taskwith KBKR

Translate learned skills into transfer

advice

If there is user advice,

add it in

Learning Skills By Learning Skills By ObservationObservation

Source-task games are sequences: (state, action)Source-task games are sequences: (state, action) Learning skills is like learning to classify states by Learning skills is like learning to classify states by

their correct actionstheir correct actions We use Inductive Logic Programming to learn classifiersWe use Inductive Logic Programming to learn classifiers

State 1:State 1:distBetween(me,teammate2) = 15distBetween(me,teammate2) = 15distBetween(me,teammate1) = 10distBetween(me,teammate1) = 10distBetween(me,opponent1) = 5distBetween(me,opponent1) = 5......action = pass(teammate2)action = pass(teammate2)outcome = caught(teammate2)outcome = caught(teammate2)

Advantages of ILPAdvantages of ILP

Can produce first-order rules for skillsCan produce first-order rules for skills Capture only the essential aspects of the Capture only the essential aspects of the

skillskill We expect these aspects to transfer betterWe expect these aspects to transfer better

Can incorporate background knowledgeCan incorporate background knowledge

pass(pass(TeammateTeammate))

pass(pass(teammate1teammate1))

pass(pass(teammateNteammateN))

vs. ...

Preparing Datasets for ILPPreparing Datasets for ILP

action = pass(Teammate) ?

outcome = caught(Teammate) ?

Q(pass) is high?

Q(pass) is highest?

Positive example for pass(Teammate)

yes

yes

yes

yes

Q(other) is high?

Q(pass) is lower?

Negative example for pass(Teammate)

no

yes

yes

Reject example

no

no

no

no

no

Example of a Skill LearnedExample of a Skill Learned

pass(pass(TeammateTeammate) :-) :-

distBetween(me, distBetween(me, TeammateTeammate) > 14,) > 14,

passAngle(passAngle(TeammateTeammate) > 30,) > 30,

passAngle(passAngle(TeammateTeammate) < 150,) < 150,

distBetween(me, distBetween(me, OpponentOpponent) < 7.) < 7.

KBKR requires propositional adviceKBKR requires propositional advice We instantiate each rule headWe instantiate each rule head

Variables in rule bodies create Variables in rule bodies create disjunctionsdisjunctions We use We use tile featurestile features to translate them to translate them

Variables can appear multiple timesVariables can appear multiple times We create new features to translate themWe create new features to translate them

Technical ChallengesTechnical Challenges

Two Experimental ScenariosTwo Experimental Scenarios

PassØØ

Pass towards goalMove towards goal

Shoot at goal

4-on-3 MKA 3-on-2 BA

3-on-2 BA3-on-2 MD

PassMoveAhead

Ø

PassMoveAhead

Shoot at goal

Skill Transfer ResultsSkill Transfer Results

0

0.1

0.2

0.3

0.4

0.5

0.6

0 1000 2000 3000 4000 5000

Games Played

Pro

bab

ility

of

Go

al Without transfer

From MKA

From MD

Breakdown of MKA ResultsBreakdown of MKA Results

0

0.1

0.2

0.3

0.4

0.5

0.6

0 1000 2000 3000 4000 5000

Games Played

Pro

bab

ilit

y o

f G

oal

all advice

transfer advice onlyuser advice only

no advice

What if User Advice is Bad?What if User Advice is Bad?

0

0.1

0.2

0.3

0.4

0.5

0.6

0 1000 2000 3000 4000 5000

Games Played

Pro

ba

bili

ty o

f G

oa

l

Transfer with good advice

Transfer with bad adviceBad advice only

No advice

Related WorkRelated Work

Q-function transfer in RoboCupQ-function transfer in RoboCup Taylor & Stone (AAMAS 2005, AAAI 2005)Taylor & Stone (AAMAS 2005, AAAI 2005)

Transfer via policy reuseTransfer via policy reuse Fernandez & Veloso (AAMAS 2006, ICML workshop Fernandez & Veloso (AAMAS 2006, ICML workshop

2006)2006) Madden & Howley (AI Review 2004)Madden & Howley (AI Review 2004)

Transfer via relational RLTransfer via relational RL Driessens et al. (ICML workshop 2006)Driessens et al. (ICML workshop 2006)

Summary of Summary of ContributionsContributions

Transfer of shared skills in high-level logicTransfer of shared skills in high-level logic Despite differences in shared skillsDespite differences in shared skills

Demonstration of the value of user guidanceDemonstration of the value of user guidance Easy to give and beneficialEasy to give and beneficial

Effective transfer in the RoboCup domainEffective transfer in the RoboCup domain Challenging and dissimilar tasksChallenging and dissimilar tasks

Future WorkFuture Work

Learn more general skills by Learn more general skills by combining multiple source taskscombining multiple source tasks

Compare several transfer methods on Compare several transfer methods on RoboCup scenarios of varying RoboCup scenarios of varying difficultydifficulty

Reach similar levels of transfer with Reach similar levels of transfer with less user inputless user input

AcknowledgementsAcknowledgements

DARPA Grant HR0011-04-1-0007DARPA Grant HR0011-04-1-0007

US Naval Research Laboratory US Naval Research Laboratory Grant N00173-06-1-G002Grant N00173-06-1-G002

Thank You

User AdviceUser AdviceIF: distBetween(me,goal) < 10 ANDIF: distBetween(me,goal) < 10 AND

angle(goal, me, goalie) > 40angle(goal, me, goalie) > 40

THEN: prefer shootTHEN: prefer shoot

IF: distBetween(me,goal) > 10IF: distBetween(me,goal) > 10

THEN: prefer move_aheadTHEN: prefer move_ahead

IF: IF: [transferred conditions] [transferred conditions] ANDAND distBetween(Teammate,goal) < distBetween(me,goal) distBetween(Teammate,goal) < distBetween(me,goal)

THEN: prefer pass(Teammate)THEN: prefer pass(Teammate)

This is the part

that came from

transfer

Feature TilingFeature Tiling

Original feature

Tiling #1Tiling #2

Tiling #8…

Tiling #9Tiling #10

Tiling #11

…

min value max value

(16 tiles)

(8 tiles)

(8 tiles)

Propositionalizing RulesPropositionalizing Rules

pass(pass(TeammateTeammate) :-) :-

distBetween(me, distBetween(me, TeammateTeammate) > 14,) > 14,

… …

Step 1: rule headStep 1: rule head

pass(pass(teammate1teammate1) :-) :-

distBetween(me, distBetween(me, teammate1teammate1) > ) > 14,14,

… …

pass(pass(teammateNteammateN) :-) :-

distBetween(me, distBetween(me, teammateNteammateN) > ) > 14,14,

… …

…


distBetween(me, distBetween(me, OpponentOpponent) < 7) < 7

distBetween(me,distBetween(me,opponent1opponent1))[0,7][0,7] + … + distBetween(me, + … + distBetween(me,opponentNopponentN ) )[0,7][0,7] ≥≥ 1 1

Step 2: single-variable disjunctionsStep 2: single-variable disjunctions

distBetween(me,distBetween(me,opponent1opponent1) < 7 OR … OR distBetween(me,) < 7 OR … OR distBetween(me,opponentNopponentN) < 7) < 7

distBetween(me, distBetween(me, PlayerPlayer) > 14,) > 14,

distBetween(distBetween(PlayerPlayer, goal) < 10, goal) < 10

newFeature(newFeature(player1player1) + … + newFeature() + … + newFeature(playerNplayerN) ) ≥≥ 1 1

newFeature(Player) :-newFeature(Player) :-

Dist1 is distBetween(me, Dist1 is distBetween(me, PlayerPlayer),),

Dist2 is distBetween(Dist2 is distBetween(PlayerPlayer, goal),, goal),

Dist1 > 14, Dist2 < 10.Dist1 > 14, Dist2 < 10.

Add to target task feature space:

Step 3: linked-variable disjunctionsStep 3: linked-variable disjunctions


Skill Acquisition via Transfer Learning and Advice Taking Lisa Torrey, Jude Shavlik, Trevor Walker...

Documents

Transcript of Skill Acquisition via Transfer Learning and Advice Taking Lisa Torrey, Jude Shavlik, Trevor Walker...