
Outline

• MDP (brief)
  – Background
  – Learning MDP

• Q learning

• Game theory (brief)
  – Background

• Markov games (2-player)
  – Background
  – Learning Markov games
    • Littman’s Minimax Q learning (zero-sum)
    • Hu & Wellman’s Nash Q learning (general-sum)

MDP / SG / POSG

Stochastic games (SG)

Partially observable SG (POSG)

v(s) = max_a [ R(s, a) + γ Σ_{s'} T(s, a, s') v(s') ]

– R(s, a): immediate reward
– Σ_{s'} T(s, a, s') …: expectation over next states
– v(s'): value of next state
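As a concrete illustration, here is a minimal value iteration sketch that applies this backup until convergence. The array shapes for R and T, the discount factor, and the tolerance are assumptions for the example, not taken from the slides.

```python
import numpy as np

def value_iteration(R, T, gamma=0.9, tol=1e-6):
    """R[s, a]: immediate reward R(s, a); T[s, a, s2]: transition probability P(s2 | s, a)."""
    V = np.zeros(R.shape[0])
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' T(s, a, s') * V(s')
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)                  # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:    # stop once values have converged
            return V_new, Q.argmax(axis=1)     # state values and a greedy policy
        V = V_new
```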

• Model-based reinforcement learning:
  1. Learn the reward function and the state transition function
  2. Solve for the optimal policy

• Model-free reinforcement learning:
  1. Directly learn the optimal policy without knowing the reward function or the state transition function

Learning the MDP model from experience counts:

T(s, a, s') ≈ (#times action a causes state transition s → s') / (#times action a has been executed in state s)

R(s, a) ≈ (total reward accrued when applying a in s) / (#times action a has been executed in state s)

These estimates are then plugged into the backup above together with v(s').
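A minimal sketch of how these counts turn into a model estimate; the class and method names are hypothetical and only mirror the count descriptions above.

```python
from collections import defaultdict

class ModelEstimator:
    """Tabular maximum-likelihood model estimate from experience.

    n_sa[s, a]       -> #times action a has been executed in state s
    n_sas[s, a, s2]  -> #times action a caused the transition s -> s2
    r_sum[s, a]      -> total reward accrued when applying a in s
    """

    def __init__(self):
        self.n_sa = defaultdict(int)
        self.n_sas = defaultdict(int)
        self.r_sum = defaultdict(float)

    def update(self, s, a, r, s_next):
        self.n_sa[s, a] += 1
        self.n_sas[s, a, s_next] += 1
        self.r_sum[s, a] += r

    def T(self, s, a, s_next):
        # Estimated transition probability T(s, a, s').
        return self.n_sas[s, a, s_next] / self.n_sa[s, a] if self.n_sa[s, a] else 0.0

    def R(self, s, a):
        # Estimated expected immediate reward R(s, a).
        return self.r_sum[s, a] / self.n_sa[s, a] if self.n_sa[s, a] else 0.0
```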

1. Start with arbitrary initial values of Q(s, a), for all s ∈ S, a ∈ A

2. At each time t the agent chooses an action and observes its reward r_t

3. The agent then updates its Q-values based on the Q-learning rule:

   Q(s_t, a_t) ← (1 − α_t) Q(s_t, a_t) + α_t [ r_t + γ max_a Q(s_{t+1}, a) ]

4. The learning rate α_t needs to decay over time in order for the learning algorithm to converge
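A compact tabular Q-learning sketch following steps 1–4 above. The Gym-style env interface (reset/step), the epsilon-greedy action choice, and the 1/n(s, a) learning-rate decay are assumptions made for illustration.

```python
import random
from collections import defaultdict

def q_learning(env, num_episodes=500, gamma=0.9, epsilon=0.1):
    """Assumes env.reset() -> s and env.step(a) -> (s_next, r, done),
    with actions indexed 0 .. env.num_actions - 1."""
    Q = defaultdict(float)          # Q[s, a], arbitrary (zero) initial values
    n = defaultdict(int)            # visit counts used to decay the learning rate
    for _ in range(num_episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy choice between exploring and acting greedily
            if random.random() < epsilon:
                a = random.randrange(env.num_actions)
            else:
                a = max(range(env.num_actions), key=lambda a_: Q[s, a_])
            s_next, r, done = env.step(a)
            n[s, a] += 1
            alpha = 1.0 / n[s, a]   # decaying learning rate alpha_t
            target = r + gamma * max(Q[s_next, a_] for a_ in range(env.num_actions))
            Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
            s = s_next
    return Q
```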

Famous game theory example

A co-operative game

Mixed strategy

Generalization of MDP

Stationary: the agent’s policy does not change over time

Deterministic: the same action is always chosen whenever the agent is in state s

Example

Rock-paper-scissors (zero-sum), payoffs for the row player:

  0   1  -1
 -1   0   1
  1  -1   0

A 2×2 zero-sum game:

  1  -1
 -1   1

A two-state example, with one reward matrix per state:

State 1:
  2  1  1
  1  2  1
  1  1  2

State 2:
  1  1
  1  1

Optimal policy π*: v(s, π*) ≥ v(s, π) for all s ∈ S and all policies π

The optimal mixed strategy for rock-paper-scissors solves a linear program (R(a, o) is the payoff when the agent plays a and the opponent plays o):

Max V

Such that:
  π(rock) R(rock, o) + π(paper) R(paper, o) + π(scissors) R(scissors, o) ≥ V  for every opponent action o
  π(rock) + π(paper) + π(scissors) = 1
  π(rock), π(paper), π(scissors) ≥ 0
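A sketch of solving this linear program with scipy.optimize.linprog for rock-paper-scissors; the variable layout (the three probabilities followed by V) and the row/column ordering of the payoff matrix are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

# Row player's payoffs (rows: my action, columns: opponent's action).
A = np.array([[0, 1, -1],
              [-1, 0, 1],
              [1, -1, 0]])

n = A.shape[0]
# Variables: [pi_rock, pi_paper, pi_scissors, V]; linprog minimizes, so minimize -V.
c = np.zeros(n + 1)
c[-1] = -1.0

# For every opponent action o: sum_a pi(a) * A[a, o] >= V   <=>   -A[:, o]^T pi + V <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)

# The probabilities sum to one.
A_eq = np.array([[1.0] * n + [0.0]])
b_eq = np.array([1.0])

bounds = [(0, 1)] * n + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
pi, V = res.x[:n], res.x[-1]
print(pi, V)   # expected: roughly [1/3, 1/3, 1/3] and V = 0
```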

Minimax value of a state:

v(s) = max_π min_o Σ_a π(a) Q(s, a, o)

– max over π: best response
– min over o: worst case
– Σ_a π(a) …: expectation over all actions

Quality of a state-action pair:

Q(s, a, o) = R(s, a, o) + γ Σ_{s'} T(s, a, o, s') v(s')

– γ Σ_{s'} T(s, a, o, s') v(s'): discounted value of all succeeding states, weighted by their likelihood

In the learning rule this sum is replaced by the sampled target r + γ v(s'), the discounted value of the succeeding state.

This learning rule converges to the correct values of Q and v

The parameter explor controls how often the agent will deviate from its current policy

Q(s, a, o): expected reward for taking action a when the opponent chooses o from state s
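A compact sketch of a minimax-Q style update: the max-min mixed strategy and v(s') come from a linear program like the one sketched earlier, generalized to an |A| × |O| payoff table, and the action choice uses the explor parameter described above. The data structures and learning-rate handling are assumptions for illustration, not Littman's exact pseudocode.

```python
import numpy as np
from scipy.optimize import linprog

def maxmin_strategy(Q_s):
    """Solve max_pi min_o sum_a pi(a) * Q_s[a, o] as an LP.
    Q_s is the |A| x |O| table of Q(s, a, o) for one state s."""
    n_a, n_o = Q_s.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                                        # maximize V
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])       # sum_a pi(a) Q_s[a, o] >= V for all o
    A_eq = np.array([[1.0] * n_a + [0.0]])              # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_o),
                  A_eq=A_eq, b_eq=np.array([1.0]),
                  bounds=[(0, 1)] * n_a + [(None, None)])
    pi = np.clip(res.x[:n_a], 0.0, None)                # clean up numerical noise
    return pi / pi.sum(), res.x[-1]                     # (pi, v): strategy and state value

def choose_action(Q, s, explor):
    """With probability explor deviate from the current policy (explore);
    otherwise sample an action from the max-min mixed strategy for state s."""
    n_a = Q[s].shape[0]
    if np.random.random() < explor:
        return np.random.randint(n_a)
    pi, _ = maxmin_strategy(Q[s])
    return np.random.choice(n_a, p=pi)

def minimax_q_update(Q, s, a, o, r, s_next, alpha, gamma=0.9):
    """One update after observing (s, a, o, r, s_next).
    Q maps each state to an |A| x |O| array of Q(s, a, o) values."""
    _, v_next = maxmin_strategy(Q[s_next])              # v(s') via the LP
    Q[s][a, o] = (1 - alpha) * Q[s][a, o] + alpha * (r + gamma * v_next)
```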

Hu and Wellman: general-sum Markov games as a framework for RL

Theorem (Nash, 1951) There exists a mixed strategy Nash equilibrium for any finite bimatrix game
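A small sketch illustrating the theorem on the smallest interesting case: for a 2×2 bimatrix game it enumerates the pure strategy profiles and the fully mixed profile obtained by making each player indifferent. Matching pennies (no pure equilibrium, one mixed equilibrium at 1/2, 1/2) is used as the test case; the function name and layout are illustrative.

```python
import numpy as np

def nash_2x2(A, B):
    """Enumerate equilibria of a 2x2 bimatrix game.
    A: row player's payoffs, B: column player's payoffs."""
    equilibria = []
    # Pure equilibria: neither player gains by deviating unilaterally.
    for i in range(2):
        for j in range(2):
            if A[i, j] >= A[1 - i, j] and B[i, j] >= B[i, 1 - j]:
                equilibria.append((f"row {i}", f"col {j}"))
    # Fully mixed equilibrium: p = P(row 0), q = P(col 0) chosen so that the
    # opponent is indifferent between their two actions.
    denom_q = A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1]
    denom_p = B[0, 0] - B[1, 0] - B[0, 1] + B[1, 1]
    if denom_q != 0 and denom_p != 0:
        q = (A[1, 1] - A[0, 1]) / denom_q
        p = (B[1, 1] - B[1, 0]) / denom_p
        if 0 <= p <= 1 and 0 <= q <= 1:
            equilibria.append((f"p(row 0)={p:.2f}", f"q(col 0)={q:.2f}"))
    return equilibria

# Matching pennies: prints [('p(row 0)=0.50', 'q(col 0)=0.50')]
A = np.array([[1, -1], [-1, 1]])
print(nash_2x2(A, -A))
```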