
KI2 - 10

Kunstmatige Intelligentie (Artificial Intelligence) / RuG

Markov Decision Processes

AIMA, Chapter 17


Markov Decision Problem

How to use knowledge about the world to make decisions even when the outcomes of an action are uncertain and the payoffs will not be obtained until several (or many) actions have passed.


The Solution

Sequential decision problems in uncertain environments can be solved by calculating a policy that associates an optimal decision with every state that the agent might reach

=> Markov Decision Process (MDP)
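As a concrete starting point for the later sketches, one minimal way to represent such a problem in Python (the field names states, actions, transition, reward and gamma are our own choice, not notation from the slides):

    from dataclasses import dataclass
    from typing import Callable, Dict, Hashable, List

    State = Hashable
    Action = Hashable

    @dataclass
    class MDP:
        states: List[State]                                        # set of states
        actions: Callable[[State], List[Action]]                   # A(s): actions applicable in s
        transition: Callable[[State, Action], Dict[State, float]]  # T(s, a, .) as a distribution over s'
        reward: Callable[[State], float]                           # R(s)
        gamma: float = 1.0                                         # discount factor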


Example

[Figure: the 4×3 grid world. The agent starts at (1,1) in the lower-left corner; square (2,2) is an obstacle; the terminal states (4,3) and (4,2) have payoffs +1 and -1. Each action moves the agent in the intended direction with probability 0.8 and at right angles to it with probability 0.1 each.]

The world: actions have uncertain consequences.
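A possible encoding of this world's transition model in Python (the grid layout, the obstacle at (2,2), and the helper names follow the standard AIMA 4×3 example and are assumptions on our part):

    # 4x3 grid world: the intended move succeeds with probability 0.8;
    # with probability 0.1 each, the agent slips to either side instead.
    GRID = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
    TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}
    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    SLIPS = {"up": ("left", "right"), "down": ("left", "right"),
             "left": ("up", "down"), "right": ("up", "down")}

    def step(state, direction):
        """Deterministic move; bumping into a wall or the obstacle leaves the state unchanged."""
        dx, dy = MOVES[direction]
        nxt = (state[0] + dx, state[1] + dy)
        return nxt if nxt in GRID else state

    def transition(state, action):
        """Return T(state, action, .) as a dict {next_state: probability}."""
        if state in TERMINALS:                       # terminal states have no successors
            return {}
        dist = {}
        for direction, p in [(action, 0.8), (SLIPS[action][0], 0.1), (SLIPS[action][1], 0.1)]:
            nxt = step(state, direction)
            dist[nxt] = dist.get(nxt, 0.0) + p
        return dist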


Utility of a State Sequence

Additive rewards:

$U_h([s_0, s_1, s_2, \ldots]) = R(s_0) + R(s_1) + R(s_2) + \cdots$

Discounted rewards (with discount factor $0 \le \gamma \le 1$):

$U_h([s_0, s_1, s_2, \ldots]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$
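For example, a short sketch of both utility computations (the function names and the -0.04 step reward commonly used with this example are illustrative assumptions):

    def additive_utility(rewards):
        # U_h = R(s0) + R(s1) + R(s2) + ...
        return sum(rewards)

    def discounted_utility(rewards, gamma):
        # U_h = R(s0) + gamma*R(s1) + gamma^2*R(s2) + ...
        return sum(gamma ** t * r for t, r in enumerate(rewards))

    # e.g. three intermediate steps with a small negative step reward, then the +1 terminal
    print(discounted_utility([-0.04, -0.04, -0.04, 1.0], gamma=0.9))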


Utility of a State

The utility of each state is the expected sum of discounted rewards if the agent executes the policy $\pi$:

$U^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t) \,\middle|\, \pi, s_0 = s\right]$

The true utility of a state, $U(s)$, corresponds to the optimal policy $\pi^*$.
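One way to make this expectation concrete is to estimate it by simulating the policy; a minimal sketch building on the MDP representation above (the sampling helper and the episode/horizon limits are our own assumptions):

    import random

    def sample_next(dist):
        """Draw s' from the distribution {s': p} returned by the transition model."""
        r, acc = random.random(), 0.0
        for s_next, p in dist.items():
            acc += p
            if r <= acc:
                return s_next
        return s_next  # guard against rounding error

    def estimate_utility(mdp, policy, s, episodes=10_000, horizon=100):
        """Monte Carlo estimate of U^pi(s) = E[sum_t gamma^t R(s_t) | pi, s_0 = s].

        policy must map every state to an action; in terminal states the action is ignored."""
        total = 0.0
        for _ in range(episodes):
            state, ret, discount = s, 0.0, 1.0
            for _ in range(horizon):
                ret += discount * mdp.reward(state)
                dist = mdp.transition(state, policy[state])
                if not dist:                 # terminal state: no successors
                    break
                state = sample_next(dist)
                discount *= mdp.gamma
            total += ret
        return total / episodes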


Algorithms for Calculating the Optimal Policy

Value iteration

Policy iteration


Value Iteration

Calculate the utility of each state

Then use the state utilities to select an optimal action in each state:

$\pi^*(s) = \arg\max_a \sum_{s'} T(s, a, s')\, U(s')$


Value Iteration Algorithm

function value-iteration(MDP) returns a utility function
  local variables: U, U′, initially identical to R
  repeat
    U ← U′
    for each state s do
      U′(s) ← R(s) + γ max_a Σ_{s′} T(s, a, s′) U(s′)      (Bellman update)
    end
  until close-enough(U, U′)
  return U
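A minimal Python sketch of this algorithm, using the MDP representation and transition model sketched earlier (the convergence test via a small threshold epsilon is our own choice for "close enough"):

    def value_iteration(mdp, epsilon=1e-4):
        """Return a utility function U by iterating the Bellman update to (near) convergence."""
        U = {s: 0.0 for s in mdp.states}
        while True:
            U_new, delta = {}, 0.0
            for s in mdp.states:
                q = [sum(p * U[s2] for s2, p in mdp.transition(s, a).items())
                     for a in mdp.actions(s)]
                # Bellman update: U'(s) = R(s) + gamma * max_a sum_s' T(s,a,s') U(s')
                U_new[s] = mdp.reward(s) + mdp.gamma * (max(q) if q else 0.0)
                delta = max(delta, abs(U_new[s] - U[s]))
            U = U_new
            if delta < epsilon:          # close enough: largest change below threshold
                return U

    def best_policy(mdp, U):
        """pi*(s) = argmax_a sum_s' T(s,a,s') U(s')."""
        return {s: max(mdp.actions(s),
                       key=lambda a: sum(p * U[s2] for s2, p in mdp.transition(s, a).items()),
                       default=None)
                for s in mdp.states}

Run on the 4×3 world above, this should give utilities close to those on the next slide; the exact numbers depend on the step reward and discount factor used in the lecture, which the transcript does not state.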


The Utilities of the States Obtained After Value Iteration

The utilities of the states computed by the value iteration algorithm:

  row 3:  0.812   0.868   0.912    +1
  row 2:  0.762   (wall)  0.660    -1
  row 1:  0.705   0.655   0.611   0.388
          col 1   col 2   col 3   col 4


Policy Iteration

Pick a policy, then calculate the utility of each state given that policy (value determination step)

Update the policy at each state using the utilities of the successor states

Repeat until the policy stabilizes


Policy Iteration Algorithm

function policy-iteration(MDP) returns a policy
  local variables: U, a utility function; π, a policy
  repeat
    U ← value-determination(π, U, MDP, R)
    unchanged? ← true
    for each state s do
      if max_a Σ_{s′} T(s, a, s′) U(s′) > Σ_{s′} T(s, π(s), s′) U(s′) then
        π(s) ← argmax_a Σ_{s′} T(s, a, s′) U(s′)
        unchanged? ← false
    end
  until unchanged?
  return π
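A corresponding Python sketch, again using the MDP representation above; here the value-determination step is approximated by repeated fixed-policy Bellman backups, whereas the variant on the following slides solves the linear system exactly:

    def expected_utility(mdp, U, s, a):
        """sum_s' T(s, a, s') U(s')."""
        if a is None:
            return 0.0
        return sum(p * U[s2] for s2, p in mdp.transition(s, a).items())

    def policy_iteration(mdp, eval_sweeps=50):
        """Alternate value determination and greedy policy improvement until the policy is stable."""
        pi = {s: (mdp.actions(s)[0] if mdp.actions(s) else None) for s in mdp.states}
        U = {s: 0.0 for s in mdp.states}
        while True:
            # Value determination (approximate): evaluate the current policy pi
            for _ in range(eval_sweeps):
                U = {s: mdp.reward(s) + mdp.gamma * expected_utility(mdp, U, s, pi[s])
                     for s in mdp.states}
            # Policy improvement: update pi(s) wherever a strictly better action exists
            unchanged = True
            for s in mdp.states:
                if not mdp.actions(s):
                    continue
                best = max(mdp.actions(s), key=lambda a: expected_utility(mdp, U, s, a))
                if expected_utility(mdp, U, s, best) > expected_utility(mdp, U, s, pi[s]):
                    pi[s], unchanged = best, False
            if unchanged:
                return pi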


Value Determination

Simplification of the value iteration algorithm because the policy is fixed

Linear equations because the max() operator has been removed

Solve exactly for the utilities using standard linear algebra


Optimal Policy (policy iteration with 11 linear equations)

[Figure: the 4×3 grid world with the optimal action marked in each non-terminal state.]

With a fixed policy, the utility of each state is given by one linear equation, e.g. for the action "up" in states (1,1) and (1,2):

$u(1,1) = 0.8\, u(1,2) + 0.1\, u(1,1) + 0.1\, u(2,1)$

$u(1,2) = 0.8\, u(1,3) + 0.2\, u(1,2)$
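Because the max operator has been removed, the whole system can be solved in one step. A sketch of exact value determination with NumPy, building the linear system $(I - \gamma T_\pi)\, u = R$ from the fixed policy (the matrix construction and state ordering are our own choices):

    import numpy as np

    def value_determination(mdp, pi):
        """Solve the fixed-policy utilities exactly: u = R + gamma * T_pi u."""
        states = list(mdp.states)
        index = {s: i for i, s in enumerate(states)}
        n = len(states)
        T_pi = np.zeros((n, n))
        for s in states:
            if pi.get(s) is None:            # terminal state: no successors
                continue
            for s2, p in mdp.transition(s, pi[s]).items():
                T_pi[index[s], index[s2]] = p
        R = np.array([mdp.reward(s) for s in states])
        # (I - gamma*T_pi) u = R : one linear equation per state (11 in the example above)
        u = np.linalg.solve(np.eye(n) - mdp.gamma * T_pi, R)
        return dict(zip(states, u))

Note that with gamma = 1 this requires the policy to reach a terminal state with probability 1 from every state; otherwise the matrix I - T_pi is singular.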


Partially observable MDP (POMDP)

In an inaccessible environment, the percept does not provide enough information to determine the state or the transition probability

A POMDP is specified by:
– State transition function: $P(s_{t+1} \mid s_t, a_t)$
– Observation function: $P(o_t \mid s_t, a_t)$
– Reward function: $E(r_t \mid s_t, a_t)$

Approach
– Calculate a probability distribution over the possible states given all previous percepts, and base decisions on this distribution (see the belief-update sketch below)

Difficulty
– Actions cause the agent to obtain new percepts, which cause the agent's beliefs to change in complex ways
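A minimal sketch of the belief-update step that this approach relies on, written as a discrete Bayes filter (the function and argument names are illustrative, not from the slides):

    def update_belief(belief, action, observation, transition, observe):
        """One Bayes-filter update of a discrete belief state.

        belief:      dict {state: probability}
        transition:  transition(s, a) -> dict {s': P(s' | s, a)}
        observe:     observe(s', a)   -> dict {o: P(o | s', a)}
        """
        # Prediction: push the current belief through the transition model
        predicted = {}
        for s, b in belief.items():
            for s2, p in transition(s, action).items():
                predicted[s2] = predicted.get(s2, 0.0) + b * p
        # Correction: weight by the likelihood of the observation, then renormalize
        posterior = {s2: p * observe(s2, action).get(observation, 0.0)
                     for s2, p in predicted.items()}
        total = sum(posterior.values())
        return {s2: p / total for s2, p in posterior.items()} if total > 0 else posterior

Decisions are then based on this belief distribution rather than on a single, fully observed state.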