Relational Transfer in Reinforcement Learning

Lisa Torrey

University of Wisconsin – Madison

CS 540

Transfer Learning

Transfer Learning in Humans

Education: Hierarchical curriculum. Learning tasks share common stimulus-response elements.

Abstract problem-solving: Learning tasks share general underlying principles.

Multilingualism: Knowing one language affects learning in another. Transfer can be both positive and negative.

Transfer Learning in AI

Given Task S, learn Task T.

Goals of Transfer Learning

[Figure: performance vs. training curves illustrating a higher start, a higher slope, and a higher asymptote]
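The three goals can be made concrete by comparing two learning curves measured at the same training points. The sketch below is an illustration only: the definitions of "start" (first value), "slope" (average per-step improvement), and "asymptote" (mean of the last k values) are assumptions, not measures given on the slides.

```python
# Minimal sketch: compare a transfer curve with a no-transfer curve.
# Assumed definitions: start = first value, slope = average per-step
# improvement, asymptote = mean of the last k values.

def curve_stats(curve, k=2):
    """Return (start, slope, asymptote) for a list of performance values."""
    start = curve[0]
    slope = (curve[-1] - curve[0]) / (len(curve) - 1)
    asymptote = sum(curve[-k:]) / k
    return start, slope, asymptote


def transfer_gains(with_transfer, without_transfer, k=2):
    """Differences (transfer minus no transfer) in start, slope, and asymptote."""
    a = curve_stats(with_transfer, k)
    b = curve_stats(without_transfer, k)
    return tuple(x - y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical curves; positive gains mean transfer helped on that goal.
    transfer = [2.0, 4.0, 6.0, 8.0]
    no_transfer = [0.0, 1.0, 2.0, 3.0]
    print(transfer_gains(transfer, no_transfer))  # (2.0, 1.0, 4.5)
```

A negative value on any component would indicate negative transfer on that goal.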

Inductive Learning

[Figure: search within the allowed hypotheses, a subset of all hypotheses]

Transfer in Inductive Learning

[Figure: search within the allowed hypotheses, a subset of all hypotheses, with transfer]

Thrun and Mitchell 1995: Transfer slopes for gradient descent
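One way to see why transferred slopes help is to fit a target model to both its own examples and slope estimates supplied by a source-task model. The sketch below is not Thrun and Mitchell's algorithm (which trains neural networks by gradient descent); it is a hypothetical 1-D linear model solved in closed form, minimizing squared value error plus `lam` times squared slope error, with the slopes assumed to come from the source model.

```python
# Fit y ~ w*x + b to target examples while matching transferred slopes g_j.
# Loss: sum_i (w*x_i + b - y_i)^2 + lam * sum_j (w - g_j)^2
# The slope hints are assumed inputs from a source-task model.

def fit_with_slopes(xs, ys, slopes, lam=1.0):
    """Return (w, b) minimizing value error plus lam * slope error.

    Needs at least one slope hint or two distinct xs, otherwise the
    denominator is zero.
    """
    n, m = len(xs), len(slopes)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Zeroing the partial derivatives of the loss with respect to b and w:
    w = (sxy + lam * sum(slopes)) / (sxx + lam * m)
    b = y_mean - w * x_mean
    return w, b


# One target example cannot determine a line by itself,
# but one example plus one transferred slope can.
print(fit_with_slopes([2.0], [5.0], [3.0]))  # (3.0, -1.0)
```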


Bayesian Learning

Bayesian Transfer

Prior distribution + Data = Posterior distribution

Bayesian methods

Raina et al. 2006: Transfer a Gaussian prior
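The prior + data = posterior picture can be sketched for a single Gaussian parameter. This is a scalar simplification, not Raina et al.'s method (which constructs a prior covariance for logistic-regression weights): here the prior's mean and variance are simply estimated from parameter values learned on hypothetical source tasks, then combined with target data by the standard conjugate normal update with known noise variance.

```python
# Transfer a Gaussian prior: estimate it from source tasks, then update it
# with target-task data (conjugate normal model, known noise variance).

def prior_from_source(source_estimates, min_var=1e-6):
    """Fit N(mean, var) to parameter estimates learned on source tasks."""
    k = len(source_estimates)
    mean = sum(source_estimates) / k
    var = sum((t - mean) ** 2 for t in source_estimates) / k
    return mean, max(var, min_var)  # floor avoids a zero-variance prior


def posterior(prior_mean, prior_var, data, noise_var):
    """Posterior (mean, var) of a Gaussian mean given observations."""
    precision = 1.0 / prior_var + len(data) / noise_var
    mean = (prior_mean / prior_var + sum(data) / noise_var) / precision
    return mean, 1.0 / precision


mu0, tau2 = prior_from_source([1.0, 3.0])        # (2.0, 1.0)
print(posterior(mu0, tau2, [4.0], noise_var=1.0))  # (3.0, 0.5)
```

With little target data the posterior stays near the transferred prior; as data accumulates, the data term dominates.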


[Figure: concept hierarchy over Line, Curve, Surface, Circle, and Pipe]

Hierarchical methods

Stracuzzi 2006: Learn Boolean concepts that can depend on each other
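A toy version of concepts that depend on each other: once a lower-level concept is learned, its output becomes a feature for learning the next level. The sketch uses a simple Find-S style conjunction learner, which is a stand-in and not Stracuzzi's method; the feature names and examples are hypothetical.

```python
# Hypothetical illustration: each Boolean concept is a conjunction of features,
# learned Find-S style (keep the features that are True in every positive).
# A concept learned at one level becomes an input feature at the next level.

def learn_conjunction(positives):
    """Return the set of features that are True in every positive example."""
    required = {f for f, v in positives[0].items() if v}
    for example in positives[1:]:
        required &= {f for f, v in example.items() if v}
    return required


def make_concept(required):
    """Turn a learned conjunction into a predicate over feature dicts."""
    return lambda example: all(example.get(f, False) for f in required)


def add_concept(example, name, concept):
    """Copy an example and add a lower-level concept's output as a feature."""
    extended = dict(example)
    extended[name] = concept(example)
    return extended


# Level 1: learn "circle" from hypothetical raw features.
circle_positives = [
    {"curve": True, "closed": True, "thick": True},
    {"curve": True, "closed": True, "thick": False},
]
circle = make_concept(learn_conjunction(circle_positives))

# Level 2: learn "pipe" from raw features plus the learned "circle" concept.
pipe_positives = [
    add_concept({"curve": True, "closed": True, "surface": True}, "circle", circle),
    add_concept({"curve": True, "closed": True, "surface": True, "thick": True},
                "circle", circle),
]
pipe = make_concept(learn_conjunction(pipe_positives))
```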


Dealing with Missing Data or Labels

Shi et al. 2008: Transfer via active learning

[Figure: Task S and Task T]
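A simplified sketch of the idea behind transfer via active learning: let a source-task model label target examples it is confident about, and spend queries to an oracle (such as a human labeler) only on the uncertain ones. The confidence-threshold rule and the names `source_prob`, `oracle`, and `threshold` are assumptions for illustration, not the decision procedure of Shi et al.

```python
# Label target-task examples with a source-task model when it is confident;
# otherwise query an oracle. source_prob(x) is assumed to return P(y=1 | x).

def active_transfer_labels(examples, source_prob, oracle, threshold=0.9):
    """Return (labels, number_of_oracle_queries)."""
    labels, queries = [], 0
    for x in examples:
        p = source_prob(x)
        if p >= threshold:
            labels.append(1)           # source model confidently says positive
        elif p <= 1.0 - threshold:
            labels.append(0)           # source model confidently says negative
        else:
            labels.append(oracle(x))   # uncertain: pay for a true label
            queries += 1
    return labels, queries


probs = [0.95, 0.5, 0.02]  # hypothetical source-model outputs
print(active_transfer_labels([0, 1, 2], lambda i: probs[i], lambda i: 1))
```

Raising `threshold` trusts the source model less and asks the oracle more often.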

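Reinforcement Learning

[Figure: the agent-environment interaction loop]

Agent in s1: Q(s1, a) = 0 for every action a; the policy chooses π(s1) = a1

Environment: δ(s1, a1) = s2, r(s1, a1) = r2

Agent in s2: updates Q(s1, a1) ← Q(s1, a1) + Δ; the policy chooses π(s2) = a2

Environment: δ(s2, a2) = s3, r(s2, a2) = r3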
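The slide leaves Δ unspecified. One standard choice is the one-step Q-learning update, Δ = α(r + γ max over a' of Q(s', a') − Q(s, a)), sketched below with a tabular Q stored in a dictionary whose entries start at 0. The values of α (`alpha`) and γ (`gamma`) are illustrative assumptions.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One-step Q-learning: Q(s, a) <- Q(s, a) + alpha * (target - Q(s, a))."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    delta = alpha * (target - Q[(s, a)])
    Q[(s, a)] += delta
    return delta


def greedy_policy(Q, s, actions):
    """pi(s): the action with the highest current Q-value."""
    return max(actions, key=lambda a: Q[(s, a)])


Q = defaultdict(float)  # every Q(s, a) starts at 0, as on the slide
q_update(Q, "s1", "a1", 1.0, "s2", ["a1", "a2"])  # Q[("s1", "a1")] becomes 0.5
```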

Transfer in Reinforcement Learning

Starting-point methods

Alteration methods

Imitation methods

New RL algorithms


Initial Q-table for target-task training

No transfer:
0 0 0 0
0 0 0 0
0 0 0 0

Transfer (initialized from the source task):
2 5 4 8
9 1 7 2
5 9 1 4

Starting-point methods

Taylor et al. 2005: Value-function transfer
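A tabular sketch of a starting-point method: instead of initializing the target Q-table to zeros, copy values from the source task through inter-task mappings. The mapping functions `map_state` and `map_action` are hypothetical inputs here; how such mappings are defined (by hand or learned) is a separate problem, and Taylor et al.'s setting used function approximation rather than a table.

```python
# Initialize a target-task Q-table from a source-task Q-table.
# map_state / map_action are hypothetical inter-task mappings that send a
# target state or action to its counterpart in the source task.

def transfer_q_table(source_q, target_states, target_actions, map_state, map_action):
    """Return a target Q-table; unmatched pairs fall back to 0 (no transfer)."""
    target_q = {}
    for s in target_states:
        for a in target_actions:
            key = (map_state(s), map_action(a))
            target_q[(s, a)] = source_q.get(key, 0.0)
    return target_q


source_q = {("s1", "pass"): 5.0, ("s1", "shoot"): 2.0}
target_q = transfer_q_table(
    source_q,
    ["t1"],
    ["pass_to_A", "pass_to_B", "shoot"],
    map_state=lambda s: "s1",                                 # hypothetical
    map_action=lambda a: "pass" if a.startswith("pass") else a,  # hypothetical
)
```

Target-task learning then proceeds from these values rather than from zeros.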



[Figure: a task hierarchy for Soccer with subtasks Run, Kick, Pass, and Shoot]

Mehta et al. 2008: Transfer a learned hierarchy


Alteration methods

Walsh et al. 2006: Transfer aggregate states

[Figure: Task S with its original states, actions, and rewards altered into new states, new actions, and new rewards]
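A sketch of one way to build aggregate states from a source task: group states whose source Q-values are close for every action, then let the target learner work on the group ids instead of the raw states. The tolerance-based greedy grouping is an assumed simplification for illustration, not the specific abstraction procedure of Walsh et al.

```python
# Build a state abstraction phi from source-task Q-values: states whose
# Q-values agree within tol on every action share an abstract state.

def aggregate_states(source_q, states, actions, tol=0.5):
    """Return a dict mapping each original state to an abstract-state id."""
    reps = []  # one representative original state per abstract state
    phi = {}
    for s in states:
        for i, rep in enumerate(reps):
            if all(abs(source_q[(s, a)] - source_q[(rep, a)]) <= tol for a in actions):
                phi[s] = i
                break
        else:  # no existing group matched: start a new abstract state
            reps.append(s)
            phi[s] = len(reps) - 1
    return phi


q = {("A", "x"): 1.0, ("A", "y"): 2.0,
     ("B", "x"): 1.2, ("B", "y"): 2.1,
     ("C", "x"): 5.0, ("C", "y"): 0.0}
phi = aggregate_states(q, ["A", "B", "C"], ["x", "y"])  # A and B merge
```

In the target task the learner would index its Q-table by `phi[s]`, so experience in one state generalizes to every state in its group.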


New RL Algorithms

Torrey et al. 2006: Transfer advice about skills

[Figure: the same agent-environment loop as on the Reinforcement Learning slide]


Imitation methods

[Figure: policy used over the course of training, first the source policy, then the target policy]

Torrey et al. 2007: Demonstrate a strategy
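The policy switch in the figure can be written as a schedule over target-task episodes: follow the source-task strategy for an initial demonstration period, then hand control to the policy being learned. The names and the fixed `demo_episodes` cutoff are assumptions for illustration.

```python
# Imitation-style transfer: which policy acts in each target-task episode.
# demo_episodes is an assumed, fixed length for the demonstration period.

def policy_for_episode(episode, demo_episodes, source_policy, target_policy):
    """Use the source policy early in training, then the target policy."""
    return source_policy if episode < demo_episodes else target_policy


def schedule(n_episodes, demo_episodes):
    """Label each episode with the policy that acts in it."""
    return [policy_for_episode(e, demo_episodes, "source", "target")
            for e in range(n_episodes)]


print(schedule(4, 2))  # ['source', 'source', 'target', 'target']
```

The target learner can still learn from the transitions observed while the source policy acts; only the choice of actions is borrowed.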

My Research

Starting-point methods

Imitation methods

New RL algorithms

SkillTransfer

MacroTransfer

RoboCup Domain

3-on-2 BreakAway

3-on-2 KeepAway

3-on-2 MoveDownfield

2-on-1 BreakAway

Inductive Logic Programming

IF [ ] THEN pass(Teammate)

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)

IF distance(Teammate) ≤ 5 THEN pass(Teammate)

IF distance(Teammate) ≤ 10 THEN pass(Teammate)

…
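The candidate rules above come from a general-to-specific search that starts at IF [ ] THEN pass(Teammate) and adds conditions. The sketch below is a greedy propositional simplification: each literal is a threshold test on a hypothetical feature dict, and the score is positives covered minus negatives covered. Real ILP systems search relational clauses with richer scoring.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}

# Hypothetical candidate literals: (text, feature, op, threshold).
CANDIDATES = [
    ("distance(Teammate) <= 5", "distance", "<=", 5),
    ("distance(Teammate) <= 10", "distance", "<=", 10),
    ("angle(Teammate, Opponent) >= 15", "angle", ">=", 15),
    ("angle(Teammate, Opponent) >= 30", "angle", ">=", 30),
]


def covers(rule, example):
    """A rule (list of literals) covers an example if every literal holds."""
    return all(OPS[op](example[f], t) for _, f, op, t in rule)


def score(rule, pos, neg):
    return sum(covers(rule, e) for e in pos) - sum(covers(rule, e) for e in neg)


def learn_rule(pos, neg, candidates=CANDIDATES):
    """Greedily add the best literal until no negatives are covered."""
    rule = []
    while any(covers(rule, e) for e in neg):
        options = [c for c in candidates if c not in rule]
        if not options:
            break
        best = max(options, key=lambda c: score(rule + [c], pos, neg))
        if score(rule + [best], pos, neg) <= score(rule, pos, neg):
            break  # no literal improves the rule
        rule.append(best)
    return [text for text, *_ in rule]


# Hypothetical states where passing was (pos) or was not (neg) a good choice.
pos = [{"distance": 3, "angle": 40}, {"distance": 4, "angle": 35}]
neg = [{"distance": 12, "angle": 40}, {"distance": 4, "angle": 10}]
print(learn_rule(pos, neg))
```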

Advice Taking

Find Q-functions that minimize: ModelSize + C × DataMisfit

Batch Reinforcement Learning via Support Vector Regression (RL-SVR)

[Figure: the agent interacts with the environment in Batch 1, Q-functions are computed, and the agent continues with Batch 2, and so on]
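Before each Q-function computation, the batch of experience has to be turned into regression data. A common choice, assumed here because the slides do not specify it, is a one-step target r + γ max over a' of Q(s', a') using the Q-model from the previous batch, with one data set per action since each action gets its own Q-function. A regressor (support vector regression in RL-SVR) would then be fit to each data set.

```python
# Turn a batch of transitions into per-action regression data.
# q(state, action) is the previous batch's Q-model (assumed callable).

def batch_targets(transitions, q, actions, gamma=0.9):
    """transitions: (state, action, reward, next_state, done) tuples."""
    data = {a: [] for a in actions}
    for s, a, r, s_next, done in transitions:
        if done:
            y = r  # episode ended: no future value
        else:
            y = r + gamma * max(q(s_next, b) for b in actions)
        data[a].append((s, y))
    return data


batch = [("s1", "pass", 0.0, "s2", False), ("s2", "shoot", 1.0, None, True)]
data = batch_targets(batch, lambda s, a: 1.0, ["pass", "shoot"])
```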

Advice Taking

Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit

Batch Reinforcement Learning with Advice (KBKR)

[Figure: as in RL-SVR, the agent interacts with the environment in batches and Q-functions are computed after each batch, now also using advice]
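A toy version of the objective shows how the µ × AdviceMisfit term pulls the fit toward the advice. Several choices here are assumptions, not KBKR itself (which solves a linear program with advice over regions of state space): a 1-D linear Q-model q(x) = w·x, ModelSize = |w|, absolute-error DataMisfit, advice given as point-wise lower bounds "q(x) should be at least b", and a coarse grid search in place of an optimizer.

```python
# ModelSize + C * DataMisfit + mu * AdviceMisfit for q(x) = w * x.
# All modeling choices below are illustrative assumptions.

def objective(w, data, advice, C=10.0, mu=0.0):
    """data: (x, y) pairs; advice: (x, lower_bound) pairs."""
    model_size = abs(w)
    data_misfit = sum(abs(w * x - y) for x, y in data)
    advice_misfit = sum(max(0.0, bound - w * x) for x, bound in advice)
    return model_size + C * data_misfit + mu * advice_misfit


def best_weight(data, advice, mu, grid):
    """Pick the grid weight with the lowest objective."""
    return min(grid, key=lambda w: objective(w, data, advice, mu=mu))


grid = [i * 0.5 for i in range(9)]  # candidate weights 0.0, 0.5, ..., 4.0
data = [(1.0, 2.0)]                 # the batch says q(1) = 2
advice = [(1.0, 3.0)]               # the advice says q(1) >= 3
print(best_weight(data, advice, 0.0, grid))   # 2.0: data alone
print(best_weight(data, advice, 20.0, grid))  # 3.0: strong advice wins
```

A small µ treats advice as a soft suggestion that data can override; a large µ makes the learner satisfy it even at some cost in data fit.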

Skill Transfer Algorithm

[Figure: Source and Target tasks linked by ILP, Mapping, and Advice Taking, with optional [Human advice]; example rule learned in the source: IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)]
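Before a source-task rule can be given as advice in the target task, the objects it mentions must be renamed to their target-task counterparts. In the sketch below, the rule representation and the object names (`keeper1`, `attacker1`, and so on) are hypothetical; only the idea of applying a source-to-target mapping comes from the slide.

```python
# Rewrite a source-task rule for the target task by renaming objects.
# conditions: (predicate, args, op, value) tuples; action: (name, args).
# mapping: hypothetical dict from source object names to target names.

def map_rule(conditions, action, mapping):
    def rename(args):
        return tuple(mapping.get(x, x) for x in args)  # unmapped names kept

    new_conditions = [(p, rename(args), op, v) for p, args, op, v in conditions]
    return new_conditions, (action[0], rename(action[1]))


conds = [("distance", ("keeper1",), "<=", 5),
         ("angle", ("keeper1", "taker1"), ">=", 30)]
act = ("pass", ("keeper1",))
mapping = {"keeper1": "attacker1", "taker1": "defender1"}
target_rule = map_rule(conds, act, mapping)
```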

Selected Results: Skill transfer to 3-on-2 BreakAway from several tasks

Macro-Operators

pass(Teammate)

move(Direction)

shoot(goalRight)

shoot(goalLeft)

IF [ ... ] THEN pass(Teammate)

IF [ ... ] THEN move(ahead)

IF [ ... ] THEN shoot(goalRight)

IF [ ... ] THEN shoot(goalLeft)

IF [ ... ] THEN pass(Teammate)

IF [ ... ] THEN move(left)
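A macro can be read as a sequence of action nodes, each guarded by rules. The sketch below assumes one plausible execution scheme, with hypothetical rules: move to the next node when its enter rule fires, repeat the current node when its loop rule fires, and otherwise leave the macro and return control to the ordinary policy.

```python
# Execute a macro over a stream of states (hypothetical rules and states).
# enter_rules[i](state): should the agent move into node i?
# loop_rules[i](state): should the agent repeat node i's action?

def run_macro(nodes, enter_rules, loop_rules, states):
    """Return the actions the macro takes before it exits."""
    actions, node = [], -1  # -1: macro not entered yet
    for state in states:
        if node + 1 < len(nodes) and enter_rules[node + 1](state):
            node += 1                      # advance along the macro
        elif node < 0 or not loop_rules[node](state):
            break                          # no rule fires: leave the macro
        actions.append(nodes[node])
    return actions


nodes = ["pass", "move", "shoot"]
enter = [lambda s: s == "open", lambda s: s == "space", lambda s: s == "near_goal"]
loop = [lambda s: False, lambda s: s == "space_more", lambda s: False]
states = ["open", "space", "space_more", "near_goal", "anything"]
print(run_macro(nodes, enter, loop, states))  # ['pass', 'move', 'move', 'shoot']
```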



Demonstration

[Figure: policy used over the course of target-task training, first the source policy, then the target policy]

An imitation method

Macro Transfer Algorithm

[Figure: Source and Target tasks linked by ILP and Demonstration]

Macro Transfer Algorithm: Learning structures

Positive: BreakAway games that score

Negative: BreakAway games that didn't score

ILP

IF actionTaken(Game, StateA, pass(Teammate), StateB) AND actionTaken(Game, StateB, move(Direction), StateC) AND actionTaken(Game, StateC, shoot(goalRight), StateD) AND actionTaken(Game, StateD, shoot(goalLeft), StateE)

THEN isaGoodGame(Game)
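The isaGoodGame rule requires the four actions in consecutive states. The sketch below only checks whether a given structure covers a game, which is the test a learner would apply when comparing how many scoring (positive) and non-scoring (negative) games a candidate structure covers. The string encoding of actions and the "prefix ending in (" wildcard are assumptions.

```python
# Does a game contain the macro structure as consecutive actions?
# Pattern elements: an exact action, or a prefix like "pass(" that matches
# any argument (standing in for the variables Teammate and Direction).

def matches_macro(game_actions, pattern):
    def ok(action, p):
        return action.startswith(p) if p.endswith("(") else action == p

    k = len(pattern)
    return any(all(ok(a, p) for a, p in zip(game_actions[i:i + k], pattern))
               for i in range(len(game_actions) - k + 1))


PATTERN = ["pass(", "move(", "shoot(goalRight)", "shoot(goalLeft)"]
game = ["move(ahead)", "pass(t1)", "move(left)", "shoot(goalRight)", "shoot(goalLeft)"]
print(matches_macro(game, PATTERN))  # True
```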

Macro Transfer Algorithm: Learning rules for arcs

Positive: states in good games that took the arc

Negative: states in good games that could have taken the arc but didn't

ILP

shoot(goalRight)

IF [ … ] THEN enter(State)

IF [ … ] THEN loop(State, Teammate)

pass(Teammate)
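The positive and negative examples for one arc can be collected by scanning the good games. In the sketch, each game is a list of (state, action) pairs, and `node_action` and `could_take` are hypothetical predicates: the first recognizes actions belonging to the arc's destination node, the second says whether the arc was available in a state.

```python
# Split states from good games into ILP examples for one macro arc.
# node_action(action): does the action belong to the arc's target node?
# could_take(state): hypothetical check that the arc was available.

def arc_examples(good_games, node_action, could_take):
    """Positives took the arc; negatives could have taken it but didn't."""
    pos, neg = [], []
    for game in good_games:
        for state, action in game:
            if node_action(action):
                pos.append(state)
            elif could_take(state):
                neg.append(state)
    return pos, neg


games = [[("s1", "pass(t1)"), ("s2", "move(ahead)")],
         [("s3", "move(left)"), ("s4", "pass(t2)")]]
pos, neg = arc_examples(games, lambda a: a.startswith("pass("), lambda s: s != "s2")
```

These two lists are what ILP would receive to learn the IF [ … ] THEN enter(State) rule for the arc.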

Selected Results: Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway

Summary

Machine learning is often designed around standalone tasks.

Transfer is a natural learning ability that we would like to incorporate into machine learners.

There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks.
