Reinforcement learning and human behavior

Hanan Shteingart and Yonatan Loewenstein

MTAT.03.292 Seminar in Computational Neuroscience

Zurab Bzhalava

Introduction

• Operant Learning

• The dominant computational approach to modeling operant learning is model-free RL

• Human behavior is far more complex

• Remaining Challenges

Reinforcement Learning

RL: A class of learning problems in which an agent interacts with an unfamiliar, dynamic and stochastic environment

Goal: Learn a policy to maximize some measure of long-term reward

Markov Decision Process

• A (finite) set of states S

• A (finite) set of actions A

• Transition model: T(s, a, s') = P(s' | s, a)

• Reward function: R(s)

• γ is a discount factor, γ ∈ [0, 1]

• Policy π

• Optimal policy π*

Markov Decision Process

Bellman equation:
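
With the definitions above (R(s), T(s, a, s'), γ), the Bellman optimality equation is commonly written as V*(s) = R(s) + γ max_a Σ_s' T(s, a, s') V*(s'). The following is a minimal sketch of value iteration on that equation. The two-state MDP, its numbers, and the function names are illustrative assumptions, not taken from the slides.

```python
def expected_next_value(T, V, s, a):
    """Sum over s' of T(s, a, s') * V(s')."""
    return sum(p * V[s2] for s2, p in T[(s, a)].items())

def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-8):
    """Repeat V(s) <- R(s) + gamma * max_a sum_s' T(s,a,s') V(s') until stable."""
    V = {s: 0.0 for s in states}
    while True:
        new_V = {
            s: R[s] + gamma * max(expected_next_value(T, V, s, a) for a in actions)
            for s in states
        }
        change = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if change < tol:
            return V

def greedy_policy(states, actions, T, V):
    """Pick, in each state, the action with the highest expected next value."""
    return {
        s: max(actions, key=lambda a: expected_next_value(T, V, s, a))
        for s in states
    }

# Hypothetical toy MDP (assumption): reward is only received in state "rich".
STATES = ["poor", "rich"]
ACTIONS = ["stay", "move"]
T = {
    ("poor", "stay"): {"poor": 1.0},
    ("poor", "move"): {"rich": 0.8, "poor": 0.2},
    ("rich", "stay"): {"rich": 1.0},
    ("rich", "move"): {"poor": 1.0},
}
R = {"poor": 0.0, "rich": 1.0}

V = value_iteration(STATES, ACTIONS, T, R, gamma=0.9)
policy = greedy_policy(STATES, ACTIONS, T, V)
```

In this toy case the greedy policy moves out of "poor" and stays in "rich", which is the optimal policy π* for these assumed numbers.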

Biological Algorithms

• Behavioral control

• Evaluate the world quickly

• Choose appropriate behavior based on those valuations

Midbrain dopamine neurons

• Central role in guiding our behavior and thoughts

• Valuation of our world

– Value of money

– Other human beings

• Major role in decision-making

• Reward-dependent learning

• Malfunction in mental illness

– Related to Parkinson's disease

– Schizophrenia

Reinforcement signals define an agent's goals

1. organism is in state X and receives reward information;

2. organism queries stored value of state X;

3. organism updates stored value of state X based on current reward information;

4. organism selects action based on stored policy;

5. organism transitions to state Y and receives reward information.
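
The five steps above can be sketched as a temporal-difference (TD(0)) loop. This is a minimal sketch under stated assumptions: the two-state "cue then food" task, the learning rate alpha, the discount gamma, and all names are hypothetical, and the value update for state X is completed once state Y is known, which is a common TD ordering rather than something the slide specifies.

```python
def run_episode(V, policy, transitions, rewards, start, alpha=0.1, gamma=0.9):
    """Run one episode and return the prediction error computed at each state."""
    state, deltas = start, []
    while state is not None:
        reward = rewards[state]                    # step 1: reward information in state X
        current_value = V[state]                   # step 2: query stored value of X
        action = policy[state]                     # step 4: select action from stored policy
        next_state = transitions[(state, action)]  # step 5: transition to Y (None ends the episode)
        next_value = 0.0 if next_state is None else V[next_state]
        delta = reward + gamma * next_value - current_value
        V[state] = current_value + alpha * delta   # step 3: update stored value of X
        deltas.append(delta)
        state = next_state
    return deltas

# Hypothetical task (assumption): a cue is always followed by food.
rewards = {"cue": 0.0, "food": 1.0}
policy = {"cue": "wait", "food": "eat"}
transitions = {("cue", "wait"): "food", ("food", "eat"): None}

V = {"cue": 0.0, "food": 0.0}
first_deltas = run_episode(V, policy, transitions, rewards, start="cue")
for _ in range(499):
    last_deltas = run_episode(V, policy, transitions, rewards, start="cue")
```

On the first episode the food is fully unexpected, so the error at "food" is large; after many episodes the stored values predict the reward and the errors shrink toward zero.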

The reward-prediction error hypothesis

Difference between the experienced and predicted “reward” of an event

• Neurons of the ventral tegmental area

• Phasic activity changes encode a 'prediction error about summed future reward'

Prediction-error signal encoded in dopamine neuron firing.
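
In temporal-difference terms, a prediction error about summed future reward is often written as below. The symbols r_t (reward at time t), V (stored value), s_t (state at time t), and γ (discount factor) follow standard TD notation and are assumed here; the slide itself gives no formula.

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
```

A positive δ_t means the outcome was better than predicted, a negative δ_t means it was worse, and δ_t = 0 means it was fully predicted.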

Value binding

Human reward responses

Model-based RL vs Model-free RL

• goal-directed vs habitual behaviors

• Implemented by two anatomically distinct systems (subject of debate)

• Some findings suggest:

– Medial striatum is more engaged during planning

– Lateral striatum is more engaged during choices in extensively trained tasks

Model-based RL vs Model-free RL

(b) Model-free RL

(c) Model-based RL

Human subjects exhibited a mixture of both effects.
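
The contrast between the two kinds of learner can be illustrated with a minimal sketch: a model-free learner caches action values from experienced rewards, while a model-based learner recomputes them from a world model and the current value of each outcome. The two-lever task, the devaluation of food, and all names and numbers are hypothetical, and the planning here is only one step deep.

```python
# Hypothetical world model (assumption): each action leads to one outcome.
TRANSITIONS = {"press_left": "food", "press_right": "water"}

def model_based_values(outcome_values):
    """Plan: look up each action's outcome in the model and read its current value."""
    return {action: outcome_values[outcome] for action, outcome in TRANSITIONS.items()}

def model_free_update(q, action, reward, alpha=0.2):
    """Cache: move the stored value of the taken action toward the experienced reward."""
    updated = dict(q)
    updated[action] += alpha * (reward - updated[action])
    return updated

# Training phase: both learners experience food as more valuable than water.
outcome_values = {"food": 1.0, "water": 0.5}
q = {"press_left": 0.0, "press_right": 0.0}
for _ in range(100):
    for action, outcome in TRANSITIONS.items():
        q = model_free_update(q, action, outcome_values[outcome])

# Devaluation (e.g. satiety): food loses its value, with no further lever presses.
outcome_values["food"] = 0.0
mb = model_based_values(outcome_values)

habitual_choice = max(q, key=q.get)         # still driven by cached values
goal_directed_choice = max(mb, key=mb.get)  # reflects the devalued outcome
```

The cached values keep favoring the lever that used to deliver food, which mirrors habitual behavior, while the model-based values switch immediately, which mirrors goal-directed behavior.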

Challenges in relating human behavior to RL algorithms

• Humans tend to alternate rather than repeat an action after receiving a positively surprising payoff

• Tremendous heterogeneity in reports on human operant learning

• Some studies report probability matching, others do not
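
Why probability matching is notable can be seen with a short calculation. This is a minimal sketch assuming a two-option task in which exactly one option is rewarded on each trial and option A is the rewarded one with probability 0.7; the task and the function name are illustrative assumptions.

```python
def expected_reward_rate(p_reward_a, p_choose_a):
    """Probability that the chosen option is the rewarded one on a given trial."""
    return p_reward_a * p_choose_a + (1 - p_reward_a) * (1 - p_choose_a)

p = 0.7  # assumed probability that option A is the rewarded one

# Probability matching: choose A as often as A is rewarded.
matching = expected_reward_rate(p, p_choose_a=p)

# Maximizing: always choose the more frequently rewarded option.
maximizing = expected_reward_rate(p, p_choose_a=1.0)
```

Under these assumptions matching earns reward on about 58% of trials and maximizing on 70%, so matching is suboptimal in a stationary task.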

Heterogeneity in world model

Learning the world model

Reference List:

• Shteingart, H. and Loewenstein, Y. Reinforcement learning and human behavior.

• Doll, B. B., Simon, D. A. and Daw, N. D. The ubiquity of model-based reinforcement learning.

• Montague, P. R., Hyman, S. E. and Cohen, J. D. Computational roles for dopamine in behavioral control.
