University of Paderborn
16 January 2009
RG Knowledge Based Systems
Prof. Dr. Hans Kleine Büning

Reinforcement Learning


Outline

• Motivation
• Applications
• Markov Decision Processes
• Q-learning
• Examples


Reinforcement Learning: The Idea

• A way of programming agents by reward and punishment without specifying how the task is to be achieved


Learning to Ride a Bicycle

[Figure: agent-environment diagram (labels: Environment, action)]


• States:
– Angle of the handlebars
– Angular velocity of the handlebars
– Angle of the bicycle to the vertical
– Angular velocity of the bicycle to the vertical
– Angular acceleration of the bicycle to the vertical


• Actions:
– Torque applied to the handlebars
– Displacement of the center of mass from the bicycle’s plane (in cm)


• Reward:
– Angle of the bicycle to the vertical is greater than 12°: reward = -1
– Otherwise: reward = 0
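A minimal sketch of this reward rule, assuming the angle is measured in degrees and that a lean to either side counts (hence the absolute value). The function name `bicycle_reward` is hypothetical.

```python
# Threshold taken from the slide; treating the angle as signed and comparing
# its absolute value is an assumption of this sketch.
FALL_ANGLE_DEG = 12.0


def bicycle_reward(angle_to_vertical_deg: float) -> int:
    """Return -1 once the lean exceeds 12 degrees, otherwise 0."""
    if abs(angle_to_vertical_deg) > FALL_ANGLE_DEG:
        return -1
    return 0


print(bicycle_reward(5.0))    # 0
print(bicycle_reward(-13.5))  # -1
```

All other states give 0, so the only learning signal is the penalty for exceeding the threshold; this is an instance of the delayed reward discussed later.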


Learning to Ride a Bicycle

Reinforcement Learning


Reinforcement Learning: Applications

• Board games
– TD-Gammon, a program based on reinforcement learning, has become a world-class backgammon player
• Mobile robot control
– Learning to ride a bicycle
– Navigation
– Pole balancing
– Acrobot
• Sequential process control
– Elevator dispatching


Key Features of Reinforcement Learning

• The learner is not told which actions to take
• Trial-and-error search
• Possibility of delayed reward:
– Sacrificing short-term gains for greater long-term gains
• Exploration/exploitation trade-off
• Considers the whole problem of a goal-directed agent interacting with an uncertain environment
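The exploration/exploitation trade-off needs a concrete action-selection rule. One common rule, ε-greedy selection, is sketched below as an assumption (the slides do not fix a rule): with probability `epsilon` the agent tries a random action (explore), otherwise it takes an action with the highest current estimate (exploit). The function name is hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given a list of action-value estimates."""
    if rng.random() < epsilon:
        # Explore: uniformly random action.
        return rng.randrange(len(q_values))
    # Exploit: best estimate, breaking ties at random.
    best = max(q_values)
    best_actions = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(best_actions)


rng = random.Random(0)
choices = [epsilon_greedy([0.0, 1.0, 0.5], epsilon=0.1, rng=rng) for _ in range(1000)]
print(choices.count(1) / len(choices))  # expected to be about 0.9 + 0.1/3
```

A larger `epsilon` gathers more information about untried actions at the cost of collecting less reward in the meantime.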


The Agent-Environment Interaction

• Agent and environment interact at discrete time steps $t = 0, 1, 2, \dots$
– the agent observes the state at step $t$: $s_t \in S$
– produces an action at step $t$: $a_t \in A$
– gets the resulting reward: $r_{t+1} \in \mathbb{R}$
– and the resulting next state: $s_{t+1} \in S$
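The interaction protocol can be written as a loop. The corridor environment, the functions `step` and `run_episode`, and the always-move-right policy are hypothetical and only illustrate the order $s_t, a_t, r_{t+1}, s_{t+1}$.

```python
from typing import Callable, List, Tuple

# Hypothetical toy environment (not from the lecture): a corridor of cells
# 0..4. Action -1 moves left, +1 moves right; entering cell 4 pays reward 1
# and ends the episode, every other step pays 0.
GOAL = 4


def step(state: int, action: int) -> Tuple[float, int]:
    """Environment side: return (r_{t+1}, s_{t+1}) for action a_t in state s_t."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return reward, next_state


def run_episode(policy: Callable[[int], int], start: int,
                max_steps: int = 20) -> List[Tuple[int, int, float, int]]:
    """Agent-environment loop over t = 0, 1, 2, ...; records (s_t, a_t, r_{t+1}, s_{t+1})."""
    trajectory = []
    s = start
    for _ in range(max_steps):
        a = policy(s)            # agent observes s_t and chooses a_t
        r, s_next = step(s, a)   # environment answers with r_{t+1} and s_{t+1}
        trajectory.append((s, a, r, s_next))
        if s_next == GOAL:
            break
        s = s_next
    return trajectory


for transition in run_episode(lambda s: +1, start=1):
    print(transition)  # (1, 1, 0.0, 2), then (2, 1, 0.0, 3), then (3, 1, 1.0, 4)
```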


The Agent’s Goal:

• Roughly, the agent’s goal is to get as much reward as it can over the long run
• A policy is a mapping from states to actions: $\pi(s) = a$
• Reinforcement learning methods specify how the agent changes its policy as a result of experience
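One standard way to make "as much reward as it can over the long run" precise is the discounted return $r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots$ with a discount factor $0 \le \gamma < 1$. The discount factor is an assumption here, since this slide does not define it. The sketch also shows why a delayed reward can be worth waiting for.

```python
def discounted_return(rewards, gamma):
    """Return r_{t+1} + gamma*r_{t+2} + gamma**2*r_{t+3} + ... for a finite list."""
    g = 0.0
    # Accumulate from the last reward backwards: G <- r + gamma * G.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


quick = discounted_return([1, 0, 0, 0], gamma=0.9)     # small reward now
patient = discounted_return([0, 0, 0, 10], gamma=0.9)  # larger reward later
print(quick, patient)  # 1.0 versus roughly 7.29
```

With $\gamma$ close to 0 the agent becomes short-sighted; with $\gamma$ close to 1 distant rewards count almost as much as immediate ones.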


Deterministic Markov Decision Process


Example


Example: Corresponding MDP


Example: Policy


Value of Policy and Rewards


Value of Policy and Agent’s Task
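For a deterministic MDP with successor function $\delta(s, a)$ and reward $r(s, a)$, the discounted value of a policy $\pi$ satisfies $V^\pi(s) = r(s, \pi(s)) + \gamma V^\pi(\delta(s, \pi(s)))$ (discounted criterion assumed). The sketch evaluates a policy by applying this equation repeatedly on a hypothetical four-state chain; the states, rewards and $\gamma = 0.9$ are invented for illustration.

```python
# Hypothetical deterministic MDP (illustration only, not the lecture's example):
# a chain A -> B -> C -> G where G is absorbing. DELTA[(s, a)] is the unique
# successor state and REWARD[(s, a)] the immediate reward (default 0).
DELTA = {
    ("A", "right"): "B", ("B", "right"): "C", ("C", "right"): "G",
    ("B", "left"): "A", ("C", "left"): "B",
}
REWARD = {("C", "right"): 100.0}
TERMINAL = {"G"}


def evaluate_policy(policy, gamma=0.9, sweeps=100):
    """Approximate V^pi by repeatedly applying
    V(s) <- r(s, pi(s)) + gamma * V(delta(s, pi(s)))."""
    states = {s for s, _ in DELTA} | TERMINAL
    v = {s: 0.0 for s in states}  # terminal states keep value 0
    for _ in range(sweeps):
        for s in states - TERMINAL:
            a = policy[s]
            v[s] = REWARD.get((s, a), 0.0) + gamma * v[DELTA[(s, a)]]
    return v


v = evaluate_policy({"A": "right", "B": "right", "C": "right"})
print(sorted(v.items()))  # expected: V(A) = 81, V(B) = 90, V(C) = 100, V(G) = 0
```

A policy that loops (for example B choosing "left") never collects the reward, so its value in A and B stays 0; comparing such values is exactly the agent's task of finding a policy with maximal value.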


Nondeterministic Markov Decision Process

[Figure: nondeterministic MDP (label: P = 0.1)]


Example with South-Eastern Wind


Example with South-Eastern Wind


Methods

• Model (reward function and transition probabilities) is known:
– discrete states: Dynamic Programming
– continuous states: Value Function Approximation + Dynamic Programming
• Model (reward function or transition probabilities) is unknown:
– discrete states: Reinforcement Learning, Monte Carlo Methods
– continuous states: Value Function Approximation + Reinforcement Learning
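When the model is known and the states are discrete, dynamic programming can compute optimal values directly from the transition probabilities and rewards. A minimal value-iteration sketch follows; the dictionary-based model format, the two-action example and $\gamma = 0.9$ are assumptions chosen for illustration.

```python
def value_iteration(model, gamma=0.9, theta=1e-10):
    """Dynamic programming with a known model.

    model[s][a] is a list of (probability, next_state, reward) triples.
    Successor states that are not keys of `model` are treated as
    terminal with value 0.
    """
    v = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, actions in model.items():
            # Bellman optimality backup: best expected one-step return.
            best = max(
                sum(p * (r + gamma * v.get(s2, 0.0)) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:
            return v


# Hypothetical model: "safe" ends the episode with reward 1; "risky" pays 4
# with probability 0.5 and otherwise returns to s0 with reward 0.
MODEL = {
    "s0": {
        "safe": [(1.0, "end", 1.0)],
        "risky": [(0.5, "end", 4.0), (0.5, "s0", 0.0)],
    },
}
print(value_iteration(MODEL))  # V(s0) approaches 2 / 0.55, about 3.64
```

The fixed point solves $V = \max\{1,\ 0.5 \cdot 4 + 0.5 \cdot 0.9\,V\}$, so "risky" is optimal here. Without the model (the lower half of the overview), these expectations cannot be computed and must be estimated from experience instead.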


Q-learning Algorithm


Q-learning Algorithm
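The Q-learning algorithm can be sketched as a tabular loop. This sketch uses the common form with a learning rate $\alpha$, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, which for $\alpha = 1$ reduces to the deterministic rule $Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')$. Whether the lecture uses exactly this variant is an assumption, and the corridor environment, ε-greedy exploration and all parameter values are hypothetical.

```python
import random

# Hypothetical corridor environment (illustration only): cells 0..5, the goal
# is cell 5; action 0 moves left, action 1 moves right. Entering the goal pays
# 100 and ends the episode; every other move pays 0.
N_STATES, GOAL = 6, 5
ACTIONS = (0, 1)


def step(s, a):
    """Deterministic transition: return (reward, next_state)."""
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, GOAL)
    return (100.0 if s2 == GOAL else 0.0), s2


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]  # Q-table starts at 0
    for _ in range(episodes):
        s = rng.randrange(GOAL)  # random non-goal start state
        while s != GOAL:
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.choice(ACTIONS)            # explore (or break a tie)
            else:
                a = 0 if q[s][0] > q[s][1] else 1  # exploit
            r, s2 = step(s, a)
            # Target r + gamma * max_a' Q(s', a'); the goal is terminal.
            target = r if s2 == GOAL else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


q = q_learning()
print([round(max(row), 2) for row in q[:GOAL]])
```

Because the corridor is deterministic, the entry for moving right from cell $s$ should approach $100 \cdot 0.9^{4-s}$, i.e. the goal reward discounted once per additional step.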


Example


Example: Q-table Initialization


Example: Episode 1


Example: Episode 1


Example: Episode 1


Example: Episode 1


Example: Episode 1


Example: Q-table


Example: Episode 1


Episode 1


Example: Q-table


Example: Episode 2
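A single update in a later episode can be checked by hand. Assuming the deterministic Q-learning rule and purely hypothetical numbers ($\gamma = 0.9$, immediate reward $r = 0$, and successor entries $0$, $0$ and $100$ learned in an earlier episode), the entry for the state-action pair just taken becomes:

$$
\hat{Q}(s,a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s',a') = 0 + 0.9 \cdot \max\{0, 0, 100\} = 90
$$

With a table initialized to zero, this is how the goal reward propagates one step further back in each episode.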


Example: Episode 2


Example: Episode 2


Example: Q-table after Convergence


Example: Value Function after Convergence


Example: Optimal Policy


Example: Optimal Policy
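Once the Q-table has converged, the value function and an optimal policy follow directly from it: $V^*(s) = \max_a Q(s,a)$ and $\pi^*(s) = \arg\max_a Q(s,a)$. The table below is hypothetical; ties would be resolved in favor of the first action listed.

```python
# Hypothetical converged Q-table (illustrative numbers, not the lecture's).
Q = {
    "s1": {"up": 90.0, "right": 81.0},
    "s2": {"left": 72.9, "right": 90.0},
    "s3": {"left": 81.0, "right": 100.0},
}


def value_function(q):
    """V*(s) = max_a Q(s, a)."""
    return {s: max(actions.values()) for s, actions in q.items()}


def greedy_policy(q):
    """pi*(s) = argmax_a Q(s, a); max() keeps the first action on ties."""
    return {s: max(actions, key=actions.get) for s, actions in q.items()}


print(value_function(Q))  # {'s1': 90.0, 's2': 90.0, 's3': 100.0}
print(greedy_policy(Q))   # {'s1': 'up', 's2': 'right', 's3': 'right'}
```

No model of the environment is needed for this step: acting greedily with respect to the learned Q-values is enough.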


Q-learning


Convergence of Q-learning


Blackjack

• Standard rules of blackjack hold
• State space:
– element[0]: current value of the player's hand (4-21)
– element[1]: value of the dealer's face-up card (2-11)
– element[2]: player does not have a usable ace (0/1)
• Starting states:
– the player has any 2 cards (uniformly distributed), the dealer has any 1 card (uniformly distributed)
• Actions:
– HIT
– STICK
• Rewards:
– -1 for a loss
– 0 for a draw
– 1 for a win
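The hand value, the usable-ace flag and the three reward values can be computed as sketched below. Assumptions of this sketch: cards are given by their point values with an ace encoded as 1, at most one ace is counted as 11 (two would exceed 21), and naturals are not treated specially. The function names are hypothetical.

```python
from typing import List, Tuple


def hand_value(cards: List[int]) -> Tuple[int, bool]:
    """Return (total, usable_ace) for a hand.

    One ace is counted as 11 whenever that does not push the total
    above 21; such an ace is called usable.
    """
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def outcome_reward(player_total: int, dealer_total: int) -> int:
    """-1 for a loss, 0 for a draw, 1 for a win.

    A player who goes over 21 loses regardless of the dealer's hand.
    """
    if player_total > 21:
        return -1
    if dealer_total > 21 or player_total > dealer_total:
        return 1
    if player_total == dealer_total:
        return 0
    return -1


print(hand_value([1, 6]))       # (17, True): the ace counts as 11
print(hand_value([1, 6, 10]))   # (17, False): the ace must count as 1
print(outcome_reward(20, 22))   # 1: the dealer went over 21
```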


Blackjack: Optimal Policy


Reinforcement Learning: Example

• States
– Grid cells
• Actions
– Left
– Up
– Right
– Down
• Rewards
– Bonus: 20
– Food: 1
– Predator: -10
– Empty cell: -0.1
• Transition probabilities
– 0.80: the agent goes where it intends to go
– 0.20: it goes to one of the other adjacent cells or remains where it was (if it is on the border of the grid world, it wraps around to the opposite side)
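This transition model can be written as a function that returns a distribution over next cells. The slide does not say how the remaining 0.20 is divided, so this sketch assumes it is split evenly over the three other neighbors and staying put (0.05 each). Grid size, coordinates (y growing downward) and the function name are hypothetical; moves across the border wrap around as described.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]

# Direction vectors (dx, dy); y grows downward (an assumption of this sketch).
MOVES = {"Left": (-1, 0), "Up": (0, -1), "Right": (1, 0), "Down": (0, 1)}


def transition_probs(pos: Cell, action: str, width: int, height: int,
                     p_intended: float = 0.8) -> Dict[Cell, float]:
    """Return {next_cell: probability} for one move on a wrap-around grid."""
    def shift(p: Cell, move: str) -> Cell:
        dx, dy = MOVES[move]
        return ((p[0] + dx) % width, (p[1] + dy) % height)

    probs = {shift(pos, action): p_intended}
    # Assumed split of the remaining probability: other neighbors + staying.
    others = [shift(pos, m) for m in MOVES if m != action] + [pos]
    for nxt in others:
        # Accumulate, because on very small grids two moves can coincide.
        probs[nxt] = probs.get(nxt, 0.0) + (1.0 - p_intended) / len(others)
    return probs


print(transition_probs((0, 0), "Left", width=5, height=5))
```

From cell (0, 0), "Left" wraps to (4, 0) with probability 0.8, and each of (0, 4), (1, 0), (0, 1) and (0, 0) receives about 0.05.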

