University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement...

Page 1: Reinforcement Learning

University Paderborn
16 January 2009
RG Knowledge Based Systems
Hans Kleine Büning

Page 2: Outline

• Motivation
• Applications
• Markov Decision Processes
• Q-learning
• Examples

Page 4: Reinforcement Learning: The Idea

• A way of programming agents by reward and punishment without specifying how the task is to be achieved

Page 5: Learning to Ride a Bicycle

[Figure: environment diagram with the labels "state" and "action"]

Page 6: Learning to Ride a Bicycle

• States:
– Angle of handlebars
– Angular velocity of handlebars
– Angle of bicycle to vertical
– Angular velocity of bicycle to vertical
– Acceleration of angle of bicycle to vertical

Page 7: Learning to Ride a Bicycle

[Figure: environment diagram with the labels "state" and "action"]

Page 8: Learning to Ride a Bicycle

• Actions:
– Torque to be applied to the handlebars
– Displacement of the center of mass from the bicycle’s plane (in cm)

Page 9: Learning to Ride a Bicycle

[Figure: environment diagram with the labels "state" and "action"]

Page 10:

• Is the angle of the bicycle to the vertical greater than 12°?
– no: Reward = 0
– yes: Reward = -1
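The reward rule on this slide can be written as a small function. This is only a sketch: the function name, the radian input, and the symmetric treatment of left and right tilt (via `abs`) are assumptions, not something the slide states.

```python
import math

FALL_ANGLE_DEG = 12.0  # threshold taken from the slide


def bicycle_reward(tilt_angle_rad: float) -> int:
    """Return -1 if the bicycle tilts more than 12 degrees from vertical, else 0.

    Assumption: the tilt is given in radians and a tilt to either side counts.
    """
    tilt_deg = abs(math.degrees(tilt_angle_rad))
    return -1 if tilt_deg > FALL_ANGLE_DEG else 0
```

With this rule the agent is never told how to balance; it only receives a penalty once the tilt exceeds the threshold.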

Page 11: Learning to Ride a Bicycle

Reinforcement Learning

Page 12: Reinforcement Learning: Applications

• Board Games
– TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player
• Mobile Robot Controlling
– Learning to Drive a Bicycle
– Navigation
– Pole-balancing
– Acrobot
• Sequential Process Controlling
– Elevator Dispatching

Page 13: Key Features of Reinforcement Learning

• Learner is not told which actions to take
• Trial and error search
• Possibility of delayed reward:
– Sacrifice of short-term gains for greater long-term gains
• Explore/Exploit trade-off
• Considers the whole problem of a goal-directed agent interacting with an uncertain environment
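One common way to handle the explore/exploit trade-off is epsilon-greedy action selection. The slides do not say which strategy is used, so the sketch below is an illustration only; `epsilon_greedy` and its parameters are hypothetical names.

```python
import random
from typing import Dict, Hashable, Optional


def epsilon_greedy(action_values: Dict[Hashable, float],
                   epsilon: float,
                   rng: Optional[random.Random] = None) -> Hashable:
    """Pick a random action with probability epsilon, else the best-valued one."""
    rng = rng or random.Random()
    actions = list(action_values)
    if rng.random() < epsilon:
        return rng.choice(actions)                # explore
    return max(actions, key=action_values.get)    # exploit


# Hypothetical value estimates for two actions.
estimates = {"left": 0.1, "right": 0.9}
choice = epsilon_greedy(estimates, epsilon=0.1, rng=random.Random(0))
```

A small epsilon mostly exploits the current estimates while still trying other actions occasionally.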

Page 14: The Agent-Environment Interaction

• Agent and environment interact at discrete time steps: t = 0, 1, 2, …
– Agent observes state at step t: $s_t \in S$
– produces action at step t: $a_t \in A$
– gets resulting reward: $r_{t+1} \in \mathbb{R}$
– and resulting next state: $s_{t+1} \in S$
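The interaction steps above can be sketched as a loop. The corridor environment, its reward of 1 at the goal, and the names `CorridorEnv` and `run_episode` are hypothetical and chosen only to make the loop runnable.

```python
from typing import Callable, List, Tuple


class CorridorEnv:
    """Hypothetical toy environment: states 0..length-1, goal at the right end."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action is -1 (move left) or +1 (move right); walls clamp the move
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env: CorridorEnv,
                policy: Callable[[int], int],
                max_steps: int = 100) -> List[Tuple[int, int, float, int]]:
    """Collect (s_t, a_t, r_{t+1}, s_{t+1}) tuples for one episode."""
    trajectory = []
    s = env.reset()                        # agent observes s_0
    for _ in range(max_steps):
        a = policy(s)                      # agent produces a_t
        s_next, r, done = env.step(a)      # environment returns r_{t+1}, s_{t+1}
        trajectory.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return trajectory


episode = run_episode(CorridorEnv(), policy=lambda s: +1)
```

Each tuple in `episode` corresponds to one time step t of the interaction.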

Page 15: The Agent’s Goal

• Coarsely, the agent’s goal is to get as much reward as it can over the long run
• A policy is a mapping from states to actions: $\pi(s) = a$
• Reinforcement learning methods specify how the agent changes its policy as a result of experience

Page 16: Deterministic Markov Decision Process
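The slide body is not preserved in this transcript. As a reminder, one common way to write a deterministic MDP, using the state, action, and reward symbols from the agent-environment slide, is sketched below; the symbol $\delta$ for the transition function is an assumption and may differ from the slide's notation.

```latex
\[
s_{t+1} = \delta(s_t, a_t), \qquad r_{t+1} = r(s_t, a_t),
\qquad \delta : S \times A \to S, \quad r : S \times A \to \mathbb{R}
\]
```

Both the next state and the reward depend only on the current state and action, which is the Markov property.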

Page 17: Example

Pages 18-20: Example: Corresponding MDP

Page 21: Example: Policy

Page 22: Value of Policy and Rewards

Page 23: Value of Policy and Agent’s Task
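The formulas on these two slides are not preserved. A standard formulation of the value of a policy as the discounted cumulative reward, and of the agent's task as finding an optimal policy, is given below for the deterministic case; the discount factor $\gamma$ is an assumption introduced here.

```latex
\[
V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{i}\, r_{t+1+i}, \qquad 0 \le \gamma < 1,
\]
\[
\pi^{*} = \arg\max_{\pi} V^{\pi}(s) \quad \text{for all } s \in S
\]
```

Here the rewards $r_{t+1}, r_{t+2}, \dots$ are those obtained by starting in $s_t$ and repeatedly choosing $a = \pi(s)$.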

Page 24: Nondeterministic Markov Decision Process

[Figure: transition probabilities P = 0.8, P = 0.1, P = 0.1]
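The three probabilities in the figure can be turned into a sampling routine. The sketch assumes that 0.8 belongs to the intended direction and 0.1 to each of the two perpendicular directions; the figure itself is not preserved, so this assignment is a guess, and all names are hypothetical.

```python
import random

# Direction reached when slipping to the left or right of the intended one.
LEFT_OF = {"up": "left", "left": "down", "down": "right", "right": "up"}
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}


def sample_direction(intended: str, rng: random.Random) -> str:
    """Sample the direction actually taken for an intended move."""
    u = rng.random()
    if u < 0.8:
        return intended            # P = 0.8: move as intended
    if u < 0.9:
        return LEFT_OF[intended]   # P = 0.1: slip to the left
    return RIGHT_OF[intended]      # P = 0.1: slip to the right


rng = random.Random(0)
samples = [sample_direction("up", rng) for _ in range(10_000)]
fraction_intended = samples.count("up") / len(samples)
```

Over many samples `fraction_intended` should be close to 0.8.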

Pages 25-26: Nondeterministic Markov Decision Process

Pages 27-28: Example with South-Eastern Wind

Page 29: Methods

• Model (reward function and transition probabilities) is known
– discrete states: Dynamic Programming
– continuous states: Value Function Approximation + Dynamic Programming
• Model (reward function or transition probabilities) is unknown
– discrete states: Reinforcement Learning, Monte Carlo Methods
– continuous states: Value Function Approximation + Reinforcement Learning

Pages 30-31: Q-learning Algorithm
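The algorithm text on these slides is not preserved. The following is a minimal sketch of tabular Q-learning, not necessarily the exact variant shown in the lecture. The corridor task, the learning rate `alpha`, the discount `gamma`, and the epsilon-greedy exploration are all assumptions; setting `alpha = 1` gives the simpler update often used for deterministic MDPs.

```python
import random
from collections import defaultdict

N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Deterministic toy dynamics: reward 1 on reaching the goal, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                      # Q-table initialised to zero
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:          # explore
                a = rng.choice(ACTIONS)
            else:                               # exploit, ties broken randomly
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s_next, r, done = step(s, a)
            future = 0.0 if done else max(q[(s_next, b)] for b in ACTIONS)
            # Move Q(s, a) towards the one-step target r + gamma * max_a' Q(s', a')
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s_next
    return q


q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
```

On this toy task the greedy policy derived from `q` moves right in every non-terminal state.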

Page 32: Example

Page 33: Example: Q-table Initialization

Pages 34-38: Example: Episode 1

Page 39: Example: Q-table

Page 40: Example: Episode 1

Page 41: Episode 1

Page 42: Example: Q-table

Pages 43-45: Example: Episode 2

Page 46: Example: Q-table after Convergence

Page 47: Example: Value Function after Convergence

Pages 48-49: Example: Optimal Policy
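Once a Q-table has converged, the value function and a greedy policy follow from it by taking the maximum and the maximizing action for each state. The sketch below shows this step; the Q-values and state names are made up for illustration and are not the numbers from the lecture's example.

```python
from typing import Dict, Hashable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def value_and_policy(q: QTable):
    """Return V(s) = max_a Q(s, a) and pi(s) = argmax_a Q(s, a)."""
    values: Dict[Hashable, float] = {}
    policy: Dict[Hashable, Hashable] = {}
    for (state, action), q_value in q.items():
        if state not in values or q_value > values[state]:
            values[state] = q_value
            policy[state] = action
    return values, policy


# Hypothetical converged Q-values for two states.
q_table = {
    ("s1", "right"): 90.0, ("s1", "down"): 72.0,
    ("s2", "right"): 100.0, ("s2", "left"): 81.0,
}
values, policy = value_and_policy(q_table)
```

If several actions tie for the maximum in a state, each of them is optimal; this sketch keeps the first one it encounters.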

Page 50: Q-learning

Page 51: Convergence of Q-learning
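The slide body is not preserved. For reference, the commonly cited form of the Q-learning update with a learning rate $\alpha_n$, together with the usual sufficient conditions for convergence, is sketched below; the notation is an assumption and may differ from the lecture's.

```latex
\[
Q_{n}(s,a) \leftarrow (1-\alpha_n)\, Q_{n-1}(s,a)
  + \alpha_n \Bigl[\, r + \gamma \max_{a'} Q_{n-1}(s',a') \Bigr]
\]
\[
\sum_{n} \alpha_n = \infty, \qquad \sum_{n} \alpha_n^{2} < \infty, \qquad 0 \le \gamma < 1
\]
```

In addition, rewards are assumed to be bounded and every state-action pair must be visited infinitely often; the sums run over the updates of each individual pair $(s,a)$.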

Page 52: Blackjack

• Standard rules of blackjack hold
• State space:
– element[0] - current value of player's hand (4-21)
– element[1] - value of dealer's face-up card (2-11)
– element[2] - player does not have usable ace (0/1)
• Starting states:
– player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
• Actions:
– HIT
– STICK
• Rewards:
– -1 for a loss
– 0 for a draw
– 1 for a win
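The state and reward description above can be sketched in code. This is an illustration under assumptions: cards are integers with an ace given as 1, element[2] is taken to be 1 when the player has no usable ace, and the reward is -1 for a loss; the function names are hypothetical.

```python
from typing import List, Tuple


def hand_value(cards: List[int]) -> Tuple[int, bool]:
    """Return (total, usable_ace); an ace counts as 11 if that does not bust."""
    total = sum(cards)
    usable_ace = 1 in cards and total + 10 <= 21
    return (total + 10 if usable_ace else total), usable_ace


def encode_state(player_cards: List[int], dealer_card: int) -> Tuple[int, int, int]:
    """Build (player total, dealer face-up value 2-11, no-usable-ace flag)."""
    total, usable_ace = hand_value(player_cards)
    dealer_value = 11 if dealer_card == 1 else dealer_card
    return (total, dealer_value, 0 if usable_ace else 1)


def final_reward(player_total: int, dealer_total: int) -> int:
    """Return -1 for a loss, 0 for a draw, 1 for a win."""
    if player_total > 21:
        return -1                      # player busts
    if dealer_total > 21 or player_total > dealer_total:
        return 1
    return 0 if player_total == dealer_total else -1


state = encode_state([10, 7], dealer_card=1)
```

Here `state` is `(17, 11, 1)`: a hard 17 against a dealer ace.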

Page 53: Blackjack: Optimal Policy

Page 54: Reinforcement Learning: Example

• States
– Grid cells
• Actions
– Left
– Up
– Right
– Down
• Rewards
– Bonus 20
– Food 1
– Predator -10
– Empty grid cell -0.1
• Transition probabilities
– 0.80: agent goes where it intends to go
– 0.20: agent goes to any other adjacent grid cell or remains where it was (if it is on the border of the grid world, it goes to the other side)
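The transition rule above can be sketched as follows. The slide does not say how the remaining 0.20 is divided, so this sketch assumes it is spread uniformly over the three other moves and staying put; the grid size and all names are hypothetical.

```python
import random

MOVES = {"left": (0, -1), "up": (-1, 0), "right": (0, 1), "down": (1, 0), "stay": (0, 0)}
REWARDS = {"bonus": 20.0, "food": 1.0, "predator": -10.0, "empty": -0.1}


def apply_move(pos, move, rows, cols):
    """Move on the grid; leaving one border re-enters on the opposite side."""
    d_row, d_col = MOVES[move]
    return ((pos[0] + d_row) % rows, (pos[1] + d_col) % cols)


def step(pos, action, rows, cols, rng):
    """With probability 0.8 take the intended move, otherwise another one."""
    if rng.random() < 0.8:
        move = action
    else:
        # Assumption: the remaining 0.2 is uniform over the other moves and "stay".
        move = rng.choice([m for m in MOVES if m != action])
    return apply_move(pos, move, rows, cols)


rng = random.Random(1)
outcomes = [step((1, 1), "right", 3, 4, rng) for _ in range(10_000)]
fraction_intended = outcomes.count((1, 2)) / len(outcomes)
```

`REWARDS` would be looked up from the content of the cell the agent lands on; how cells are populated is not specified in the transcript.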

Pages 55-79: Reinforcement Learning: Example