University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement...

Post on 26-Mar-2015


University Paderborn

16 January 2009

RG Knowledge Based Systems

Hans Kleine Büning

Reinforcement Learning

Reinforcement Learning, Prof. Dr. Hans Kleine Büning, University Paderborn

Outline

• Motivation
• Applications
• Markov Decision Processes
• Q-learning
• Examples


Reinforcement Learning: The Idea

• A way of programming agents by reward and punishment without specifying how the task is to be achieved


Learning to Ride a Bicycle

[Figure: agent-environment loop with the labels Environment, state, and action]


Learning to Ride a Bicycle

• States:
– Angle of handle bars
– Angular velocity of handle bars
– Angle of bicycle to vertical
– Angular velocity of bicycle to vertical
– Acceleration of angle of bicycle to vertical


Learning to Ride a Bicycle

• Actions:
– Torque to be applied to the handle bars
– Displacement of the center of mass from the bicycle’s plane (in cm)


• Reward:
– Angle of bicycle to vertical is greater than 12°: Reward = -1
– Otherwise: Reward = 0
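The reward rule above can be written as a small function. This is a minimal sketch: the function name `bicycle_reward`, the use of degrees, and treating a tilt to either side the same way (via the absolute value) are assumptions made for illustration, not details stated on the slide.

```python
FALL_ANGLE_DEG = 12.0  # threshold taken from the slide


def bicycle_reward(tilt_deg: float) -> float:
    """Return -1 if the bicycle tilts more than 12 degrees from vertical, else 0."""
    # Assumption: a tilt to the left or to the right counts equally.
    return -1.0 if abs(tilt_deg) > FALL_ANGLE_DEG else 0.0
```

Because the only non-zero reward is the penalty for tipping over, the agent is never told how to balance; it only learns that tipping over is bad.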


Reinforcement Learning: Applications

• Board Games
– TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player
• Mobile Robot Controlling
– Learning to Drive a Bicycle
– Navigation
– Pole-balancing
– Acrobot
• Sequential Process Controlling
– Elevator Dispatching


Key Features of Reinforcement Learning

• Learner is not told which actions to take
• Trial and error search
• Possibility of delayed reward:
– Sacrifice of short-term gains for greater long-term gains
• Explore/Exploit trade-off
• Considers the whole problem of a goal-directed agent interacting with an uncertain environment


The Agent-Environment Interaction

• Agent and environment interact at discrete time steps: $t = 0, 1, 2, \dots$
– Agent observes state at step t: $s_t \in S$
– produces action at step t: $a_t \in A$
– gets resulting reward: $r_{t+1} \in \mathbb{R}$
– and resulting next state: $s_{t+1} \in S$
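The interaction loop described above can be sketched in a few lines. Everything named here is hypothetical and chosen for illustration: `run_episode`, the `step` and `policy` callables, and the five-state toy environment `toy_step` are assumptions, not part of the lecture.

```python
from typing import Callable, List, Tuple

State = int
Action = int


def run_episode(
    step: Callable[[State, Action], Tuple[float, State]],
    policy: Callable[[State], Action],
    start: State,
    num_steps: int,
) -> List[Tuple[State, Action, float, State]]:
    """Run the loop: observe s_t, choose a_t, receive r_{t+1} and s_{t+1}."""
    trajectory = []
    s = start
    for _ in range(num_steps):
        a = policy(s)           # agent produces action a_t
        r, s_next = step(s, a)  # environment returns r_{t+1} and s_{t+1}
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory


# Hypothetical toy environment: states 0..4 on a line, actions +1 or -1,
# reward 1 when the agent arrives in state 4, otherwise 0.
def toy_step(s: State, a: Action) -> Tuple[float, State]:
    s_next = min(4, max(0, s + a))
    return (1.0 if s_next == 4 else 0.0), s_next


trajectory = run_episode(toy_step, policy=lambda s: +1, start=0, num_steps=4)
```

Each tuple in `trajectory` holds one step of experience, which is the raw material a learning method uses to change the policy.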


The Agent’s Goal:

• Coarsely, the agent’s goal is to get as much reward as it can over the long run

• A policy $\pi$ is a mapping from states to actions: $\pi(s) = a$

• Reinforcement learning methods specify how the agent changes its policy as a result of experience


Deterministic Markov Decision Process


Example


Example: Corresponding MDP


Example: Policy


Value of Policy and Rewards


Value of Policy and Agent’s Task
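For reference, one common way to write the value of a policy and the agent's task is given below. It is a sketch under assumptions: the discount factor $\gamma$ with $0 \le \gamma < 1$ and the symbols $V^{\pi}$ and $\pi^{*}$ are standard textbook notation rather than something taken from the slide, and the reward index follows the $r_{t+1}$ convention used earlier. The formula is written for a deterministic environment in which actions are chosen by $\pi$; in the nondeterministic case the sum is replaced by its expected value.

```latex
V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{i} \, r_{t+i+1},
\qquad
\pi^{*} = \arg\max_{\pi} V^{\pi}(s) \quad \text{for all } s \in S
```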


Nondeterministic Markov Decision Process

[Figure: state transitions with probabilities P = 0.8, P = 0.1, and P = 0.1]


Example with South-Eastern Wind


Methods

Model (reward function and transition probabilities) is known:
– discrete states: Dynamic Programming
– continuous states: Value Function Approximation + Dynamic Programming

Model (reward function or transition probabilities) is unknown:
– discrete states: Reinforcement Learning, Monte Carlo Methods
– continuous states: Value Function Approximation + Reinforcement Learning


Q-learning Algorithm
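A minimal sketch of tabular Q-learning in its commonly stated form follows; it is not necessarily the exact formulation shown in the lecture. The learning rate `alpha`, discount factor `gamma`, exploration rate `epsilon`, the epsilon-greedy action choice with random tie-breaking, and the five-state chain environment `chain_step` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def q_learning(step, actions, start_states, is_terminal,
               episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        s = rng.choice(start_states)
        while not is_terminal(s):
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            r, s_next = step(s, a)
            # Move Q(s, a) towards r + gamma * max_a' Q(s', a').
            best_next = 0.0 if is_terminal(s_next) else max(
                Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q


# Hypothetical chain: states 0..4, state 4 is terminal, arriving there pays 1.
def chain_step(s, a):
    s_next = min(4, max(0, s + a))
    return (1.0 if s_next == 4 else 0.0), s_next


Q = q_learning(chain_step, actions=[-1, +1], start_states=[0, 1, 2, 3],
               is_terminal=lambda s: s == 4)
greedy = {s: max([-1, +1], key=lambda a: Q[(s, a)]) for s in range(4)}
```

On this chain the learned greedy policy moves right in every state, since each step to the right brings the discounted reward closer.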


Example


Example: Q-table Initialization


Example: Episode 1


Example: Q-table


Example: Episode 1


Example: Q-table


Example: Episode 2


Example: Q-table after Convergence


Example: Value Function after Convergence


Example: Optimal Policy
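Once a Q-table has converged, a value function and a greedy policy can be read off it with V(s) = max_a Q(s, a) and π(s) = argmax_a Q(s, a). The sketch below illustrates this; the function name `value_and_policy` and the numbers in the small Q-table are hypothetical and are not the values from the lecture's example.

```python
def value_and_policy(Q, states, actions):
    """Derive V(s) = max_a Q(s, a) and a greedy policy pi(s) = argmax_a Q(s, a)."""
    V = {s: max(Q[(s, a)] for a in actions) for s in states}
    pi = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
    return V, pi


# Hypothetical converged Q-table for two states and two actions.
Q = {("s1", "left"): 72.9, ("s1", "right"): 81.0,
     ("s2", "left"): 100.0, ("s2", "right"): 90.0}
V, pi = value_and_policy(Q, states=["s1", "s2"], actions=["left", "right"])
```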


Q-learning


Convergence of Q-learning


Blackjack

• Standard rules of blackjack hold
• State space:
– element[0] - current value of player's hand (4-21)
– element[1] - value of dealer's face-up card (2-11)
– element[2] - player does not have usable ace (0/1)
• Starting states:
– player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
• Actions:
– HIT
– STICK
• Rewards:
– -1 for a loss
– 0 for a draw
– 1 for a win
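The state description above fixes the size of a tabular Q-function: 18 hand values, 10 dealer cards and 2 ace flags give 360 states, each with 2 actions. The sketch below shows one possible encoding. The names `state_index`, `REWARD` and `q_table` are hypothetical, and reading the reward for a loss as -1 is an interpretation of the slide.

```python
HIT, STICK = 0, 1
ACTIONS = (HIT, STICK)

NUM_STATES = 18 * 10 * 2  # 18 hand values x 10 dealer cards x 2 ace flags


def state_index(player_sum: int, dealer_card: int, ace_flag: int) -> int:
    """Map (element[0], element[1], element[2]) to a row index in 0..359."""
    assert 4 <= player_sum <= 21
    assert 2 <= dealer_card <= 11
    assert ace_flag in (0, 1)
    return ((player_sum - 4) * 10 + (dealer_card - 2)) * 2 + ace_flag


# Assumed reward encoding: -1 for a loss, 0 for a draw, 1 for a win.
REWARD = {"loss": -1, "draw": 0, "win": 1}

# Q-table initialised to zero: one row per state, one column per action.
q_table = [[0.0 for _ in ACTIONS] for _ in range(NUM_STATES)]
```

With only 720 entries the table is small enough for plain tabular Q-learning, with no function approximation needed.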


Blackjack: Optimal Policy


Reinforcement Learning: Example

• States
– Grid cells
• Actions
– Left
– Up
– Right
– Down
• Rewards
– Bonus: 20
– Food: 1
– Predator: -10
– Empty grid cell: -0.1
• Transition probabilities
– 0.80: the agent goes where it intends to go
– 0.20: the agent moves to any other adjacent grid cell or remains where it was (if it is on the border of the grid world, it goes to the other side)
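The transition rule above can be sketched as a sampling function. Two points are assumptions, since the slide does not settle them: the remaining probability 0.20 is split evenly between the three other directions and staying in place, and "adjacent" means the four neighbouring cells. The names `move` and `sample_next` and the 5 x 5 grid size are hypothetical.

```python
import random

MOVES = {"left": (0, -1), "up": (-1, 0), "right": (0, 1), "down": (1, 0)}


def move(pos, direction, rows, cols):
    """Move one cell; leaving the grid wraps around to the opposite side."""
    r, c = pos
    dr, dc = MOVES[direction]
    return ((r + dr) % rows, (c + dc) % cols)


def sample_next(pos, intended, rows, cols, rng):
    """With probability 0.8 go where intended, otherwise pick another outcome."""
    if rng.random() < 0.8:
        return move(pos, intended, rows, cols)
    # Assumption: the remaining 0.2 is split evenly between the three other
    # directions and staying in place.
    others = [move(pos, d, rows, cols) for d in MOVES if d != intended]
    others.append(pos)
    return rng.choice(others)


rng = random.Random(0)
samples = [sample_next((0, 0), "up", 5, 5, rng) for _ in range(10000)]
share_intended = samples.count((4, 0)) / len(samples)
```

Starting in the top-left corner and moving up, the intended outcome is the bottom-left cell because of the wrap-around, and roughly 80% of the samples land there.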
