Transcript of "Outline: MDP (brief) – Background – Learning MDP – Q learning – Game theory (brief) – Background..."
Posted 15-Jan-2016
Outline
• MDP (brief)
  – Background
  – Learning MDP
• Q learning
• Game theory (brief)
  – Background
• Markov games (2-player)
  – Background
  – Learning Markov games
• Littman’s Minimax Q learning (zero-sum)
• Hu & Wellman’s Nash Q learning (general-sum)
MDP / SG / POSG
• Stochastic games (SG)
• Partially observable SG (POSG)
V(s) = max_a [ R(s, a) + γ Σ_{s'} T(s, a, s') V(s') ]
(immediate reward, plus the expectation over next states of the value of next state)
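The pieces labeled above (immediate reward, expectation over next states, value of next state) combine into the Bellman backup. A minimal sketch with value iteration, on a hypothetical two-state MDP whose transitions and rewards are invented for illustration:

```python
# Value iteration: V(s) = max_a [ R(s,a) + gamma * sum_{s'} T(s,a,s') * V(s') ]
# The two-state MDP below is a made-up example, not from the slides.
GAMMA = 0.9

# T[s][a] maps next state -> probability; R[s][a] is the immediate reward.
T = {
    0: {"stay": {0: 1.0}, "go": {0: 0.2, 1: 0.8}},
    1: {"stay": {1: 1.0}, "go": {0: 1.0}},
}
R = {
    0: {"stay": 0.0, "go": 0.0},
    1: {"stay": 1.0, "go": 0.0},
}

def value_iteration(T, R, gamma=GAMMA, tol=1e-8):
    V = {s: 0.0 for s in T}
    while True:
        V_new = {}
        for s in T:
            V_new[s] = max(
                R[s][a] + gamma * sum(p * V[sp] for sp, p in T[s][a].items())
                for a in T[s]
            )
        if max(abs(V_new[s] - V[s]) for s in T) < tol:
            return V_new
        V = V_new

V = value_iteration(T, R)
# State 1 earns 1 per step by staying: V(1) = 1/(1-0.9) = 10.
# State 0's best action is "go": V(0) = 0.9*(0.2*V(0) + 0.8*V(1)) = 7.2/0.82.
```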
• Model-based reinforcement learning:
  1. Learn the reward function and the state transition function
  2. Solve for the optimal policy
• Model-free reinforcement learning:
  1. Directly learn the optimal policy without knowing the reward function or the state transition function
T(s, a, s') ≈ n(s, a, s') / n(s, a)
  n(s, a): #times action a has been executed in state s
  n(s, a, s'): #times action a causes state transition s → s'
R(s, a) ≈ (total reward accrued when applying a in s) / n(s, a)
v(s') is then obtained by solving the estimated model
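These counts can be sketched in code. The dynamics below (state 0, action "go", a 70/30 split between two next states, constant reward 1) are invented for illustration:

```python
import random
from collections import defaultdict

# Count-based estimation of T and R for one state-action pair
# (model-based RL, step 1). Dynamics are a made-up example.
random.seed(0)

n_sa = defaultdict(int)        # #times action a has been executed in state s
n_sas = defaultdict(int)       # #times a caused the transition s -> s'
total_r = defaultdict(float)   # total reward accrued when applying a in s

for _ in range(10000):
    s, a = 0, "go"
    s_next = 1 if random.random() < 0.7 else 2   # true T(0,"go",1) = 0.7
    n_sa[(s, a)] += 1
    n_sas[(s, a, s_next)] += 1
    total_r[(s, a)] += 1.0                       # true R(0,"go") = 1.0

T_hat = n_sas[(0, "go", 1)] / n_sa[(0, "go")]   # close to 0.7
R_hat = total_r[(0, "go")] / n_sa[(0, "go")]    # exactly 1.0
```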
1. Start with arbitrary initial values of Q(s, a), for all s ∈ S, a ∈ A
2. At each time t the agent chooses an action and observes its reward r_t
3. The agent then updates its Q-values based on the Q-learning rule:
   Q(s_t, a_t) ← (1 − α_t) Q(s_t, a_t) + α_t [r_t + γ max_a Q(s_{t+1}, a)]
4. The learning rate α_t needs to decay over time in order for the learning algorithm to converge
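A minimal sketch of these steps on a hypothetical 4-state corridor (states 0–3, reward 1 for reaching state 3). The environment here is deterministic, so a fixed learning rate still converges; step 4's decaying α_t is needed in general:

```python
import random

# Tabular Q-learning on an invented corridor: states 0..3, actions L/R,
# episode ends on reaching state 3 with reward 1 (all other rewards 0).
random.seed(0)
GAMMA, ALPHA = 0.9, 0.5   # fixed alpha is fine only because the env is deterministic

Q = {(s, a): 0.0 for s in range(3) for a in "LR"}

def step(s, a):
    s2 = min(s + 1, 3) if a == "R" else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0)

for _ in range(2000):
    s = 0
    while s != 3:
        a = random.choice("LR")   # pure exploration, for simplicity
        s2, r = step(s, a)
        bootstrap = 0.0 if s2 == 3 else max(Q[(s2, b)] for b in "LR")
        Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * bootstrap)
        s = s2

# Converges to Q*(2,'R') = 1, Q*(1,'R') = 0.9, Q*(0,'R') = 0.81
```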
Famous game theory example
A co-operative game
Mixed strategy
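The famous example itself is not reproduced in this transcript; the prisoner's dilemma, with standard textbook payoffs assumed here, illustrates best responses and pure-strategy equilibria:

```python
# Prisoner's dilemma with standard textbook payoffs (assumed values).
# payoff[(row, col)] = (row player's payoff, column player's payoff)
C, D = 0, 1   # cooperate, defect
payoff = {
    (C, C): (3, 3), (C, D): (0, 5),
    (D, C): (5, 0), (D, D): (1, 1),
}

def is_pure_nash(a_row, a_col):
    """True if neither player gains by unilaterally switching actions."""
    row_best = max(payoff[(a, a_col)][0] for a in (C, D))
    col_best = max(payoff[(a_row, a)][1] for a in (C, D))
    return (payoff[(a_row, a_col)][0] == row_best
            and payoff[(a_row, a_col)][1] == col_best)

equilibria = [(r, c) for r in (C, D) for c in (C, D) if is_pure_nash(r, c)]
# Only (D, D) survives: both defect, even though (C, C) pays each player more.
```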
Generalization of MDP
Stationary: the agent’s policy does not change over time
Deterministic: the same action is always chosen whenever the agent is in state s
Example

Rock–paper–scissors (zero-sum):
   0   1  -1
  -1   0   1
   1  -1   0

A 2×2 zero-sum game:
   1  -1
  -1   1

State 1:
   2   1   1
   1   2   1
   1   1   2

State 2:
   1   1
   1   1
π* is optimal if v(s, π*) ≥ v(s, π) for all s ∈ S and all policies π

Max V
Such that: the expected payoff against every opponent action is at least V,
and π_rock + π_paper + π_scissors = 1 (each probability ≥ 0)
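To see what the LP maximizes: the worst-case (security) value of a candidate mixed strategy can be computed directly. The payoff numbers are taken from the matrix above; the row/column orientation is an assumption:

```python
# Worst-case (security) value of a mixed strategy in a zero-sum matrix game.
# M[a][o]: row player's payoff for our action a vs opponent action o
# (numbers from the rock-paper-scissors matrix above; orientation assumed).
M = [
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0],
]

def worst_case(pi, M):
    """min over opponent actions o of sum_a pi[a] * M[a][o]."""
    return min(sum(pi[a] * M[a][o] for a in range(3)) for o in range(3))

uniform = [1 / 3, 1 / 3, 1 / 3]   # the LP's optimum: guarantees the value 0
pure_first = [1.0, 0.0, 0.0]      # any pure strategy guarantees only -1
```

The LP searches over all distributions π for the one whose worst case is largest; here no pure strategy can guarantee better than -1, while the uniform mix guarantees 0.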
v(s) = max_π min_o Σ_a π(s, a) Q(s, a, o)
  (best response π, worst case over opponent actions o, expectation over all actions)
Q(s, a, o) = R(s, a, o) + γ Σ_{s'} T(s, a, o, s') v(s')
  (quality of a state-action pair: immediate reward plus the discounted value of all succeeding states weighted by their likelihood)
The model-free update replaces this sum with the discounted value of the sampled succeeding state
This learning rule converges to the correct values of Q and v
The exploration parameter explor controls how often the agent will deviate from its current policy
Q(s, a, o) is the expected reward for taking action a when the opponent chooses o from state s
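A sketch of minimax Q learning on a hypothetical single-state matching-pennies game; a closed-form 2×2 matrix-game solver stands in for the linear program that Littman's algorithm solves at each step:

```python
import random

# Minimax Q learning (zero-sum) on an invented single-state
# matching-pennies game. A closed-form 2x2 solver replaces the LP.
random.seed(0)
GAMMA = 0.9
R = [[1, -1], [-1, 1]]   # row agent's reward; the opponent gets the negative

def game_value_2x2(Q):
    """Value of the zero-sum matrix game Q for the row maximizer."""
    (a, b), (c, d) = Q[0], Q[1]
    lower = max(min(a, b), min(c, d))    # best pure-strategy row guarantee
    upper = min(max(a, c), max(b, d))    # best pure-strategy column cap
    if lower == upper:                   # pure saddle point exists
        return lower
    return (a * d - b * c) / (a - b - c + d)   # mixed-strategy value

Q = [[0.0, 0.0], [0.0, 0.0]]
v = 0.0
counts = [[0, 0], [0, 0]]
for _ in range(5000):
    act, opp = random.randrange(2), random.randrange(2)  # full exploration
    counts[act][opp] += 1
    alpha = 1.0 / counts[act][opp]                       # decaying learning rate
    # Single state: the succeeding state's value is v itself.
    Q[act][opp] = (1 - alpha) * Q[act][opp] + alpha * (R[act][opp] + GAMMA * v)
    v = game_value_2x2(Q)

# v approaches the game value 0, so Q(a, o) approaches R(a, o).
```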
Hu and Wellman general-sum Markov games as a framework for RL
Theorem (Nash, 1951) There exists a mixed strategy Nash equilibrium for any finite bimatrix game