Human and Optimal Exploration and Exploitation in Bandit Problems Department of Cognitive Sciences,...


Human and Optimal Exploration and Exploitation in Bandit Problems

Department of Cognitive Sciences, University of California.

A Bayesian analysis of human decision-making on bandit problems: Journal of Mathematical Psychology 53 (2009) 168-179.

Presenter: Juan Wang

Date: 18/5/2015


Bandit problem

• A decision-maker must choose one of several alternatives on each trial of a short sequence of trials (such as choosing among different treatments).

• Each alternative has a fixed reward rate, but the decision-maker is not told what the rates are (such as the success rate after receiving a treatment).

• This creates a dilemma between exploration and exploitation, which arises in many real-world decision-making situations, e.g., as shown in the figure below.
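To make the setup concrete, here is a minimal Python sketch of a bandit with binary (success/failure) outcomes. The class name `BernoulliBandit`, its `choose` method and the reward rates used below are hypothetical illustrations, not taken from the paper; the only point is that each alternative has a fixed rate that the decision-maker never sees directly.

```python
import random


class BernoulliBandit:
    """A bandit whose alternatives pay 1 (success) or 0 (failure).

    Each alternative has a fixed reward rate that is hidden from the
    decision-maker; only the outcomes of the chosen alternatives are observed.
    """

    def __init__(self, reward_rates, seed=None):
        self._rates = list(reward_rates)  # hidden from the decision-maker
        self._rng = random.Random(seed)

    @property
    def n_alternatives(self):
        return len(self._rates)

    def choose(self, k):
        """Return 1 with probability equal to alternative k's rate, else 0."""
        return 1 if self._rng.random() < self._rates[k] else 0


# Usage: a hypothetical two-alternative problem with a short horizon of 8 trials,
# always choosing alternative 0.
bandit = BernoulliBandit([0.6, 0.4], seed=1)
outcomes = [bandit.choose(0) for _ in range(8)]
print(outcomes)
```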


Which alternative should be chosen on the 11th trial?

The first alternative shows more failures and fewer successes, but its success rate is moderate and better known than that of the second. Choosing the second alternative, however, explores the possibility that it may be the more rewarding one. This is the dilemma between exploration and exploitation.

Acquiring knowledge about each alternative is exploration; using that knowledge to choose an option is exploitation.
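The dilemma can be illustrated with a small worked example, assuming a uniform Beta(1, 1) prior on each reward rate and hypothetical counts after 10 trials (these are not the counts in the figure). The helper `beta_summary` is an illustrative name; it returns the posterior mean and standard deviation of an alternative's reward rate.

```python
def beta_summary(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a reward rate.

    With a Beta(prior_a, prior_b) prior and binary outcomes, the posterior is
    Beta(prior_a + successes, prior_b + failures).
    """
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5


# Hypothetical counts: alternative A tried 8 times, alternative B only twice.
mean_a, sd_a = beta_summary(successes=5, failures=3)  # well known, moderate
mean_b, sd_b = beta_summary(successes=1, failures=1)  # poorly known
print(f"A: mean={mean_a:.3f}, sd={sd_a:.3f}")  # A: mean=0.600, sd=0.148
print(f"B: mean={mean_b:.3f}, sd={sd_b:.3f}")  # B: mean=0.500, sd=0.224
```

With these counts A has the higher posterior mean and is better known, so choosing A exploits; choosing B explores an alternative whose true rate is still quite uncertain and could turn out to be higher.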


Therefore, decision-makers need good ways to balance exploration, which is needed to learn about the alternatives, and exploitation, which makes use of what has been learned, so as to obtain as much reward as possible.


Background

Human performance on bandit problems has been a topic of interest in a variety of fields, such as economics and cognitive neuroscience.

Most studies focused on a large number of trials (long-horizon bandit problems). Such designs say less about whether people switch flexibly between exploration and exploitation when there are only a small number of trials (short-horizon bandit problems).


Objective

To find out whether people switch flexibly between exploration and exploitation in short-horizon bandit problems, and to understand how they switch in a situation of special interest: a well-understood but only moderately rewarding alternative compared with a less well-understood but possibly more rewarding alternative.


In this paper, the authors developed and evaluated a probabilistic model that assumes different latent states guide decision-making in short-horizon bandit problems (a search/exploration state and a stand/exploitation state).
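The slides do not spell out the model's choice rule, so the following Python sketch is only a hypothetical illustration of how a latent state could guide a choice between two alternatives. The rule itself (search prefers the less-tried alternative, stand prefers the one with the higher smoothed success proportion), the `accuracy` parameter and the function name are all assumptions made for illustration, not the authors' specification.

```python
import random

SEARCH, STAND = "search", "stand"


def latent_state_choice(state, counts, accuracy=0.9, rng=random):
    """Hypothetical choice rule between two alternatives given a latent state.

    counts: [(successes, failures), (successes, failures)] observed so far.
    The preferred alternative is chosen with probability `accuracy`,
    otherwise the other alternative is chosen.
    """
    tries = [s + f for s, f in counts]
    if state == SEARCH:
        # Explore (assumed rule): favour the alternative we know least about.
        keys = [-n for n in tries]
    elif state == STAND:
        # Exploit (assumed rule): favour the alternative that looks best so far,
        # using a Laplace-smoothed success proportion (untried alternatives get 0.5).
        keys = [(s + 1) / (s + f + 2) for s, f in counts]
    else:
        raise ValueError(f"unknown state: {state!r}")
    if keys[0] == keys[1]:
        preferred = rng.randrange(2)  # no preference: break the tie at random
    else:
        preferred = 0 if keys[0] > keys[1] else 1
    return preferred if rng.random() < accuracy else 1 - preferred


# Usage with hypothetical counts: A tried 8 times, B tried twice.
counts = [(5, 3), (1, 1)]
print(latent_state_choice(SEARCH, counts, accuracy=1.0))  # 1 (explore B)
print(latent_state_choice(STAND, counts, accuracy=1.0))   # 0 (exploit A)
```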


Assumption of three different situations


The Probabilistic Model


Experiment

• Conditions: six types of bandit problem conditions, formed by combining two horizons (8 trials and 16 trials) with three different environmental distributions (Beta distributions whose two parameters correspond to prior successes and prior failures).

• 50 problems for each condition (300 problems in total).

• Data: collected from 10 naïve participants (6 male, 4 female).

• The order of all problems across the conditions was randomized for each participant.
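The design can be sketched as follows. The Beta parameter values for the three environments are hypothetical placeholders (the slide does not list them), and two alternatives per problem is also an assumption; only the 2 x 3 structure, the 50 problems per condition and the per-participant randomization come from the slide. `make_problems` is an illustrative name.

```python
import random

HORIZONS = (8, 16)
# Hypothetical Beta(prior successes, prior failures) environments;
# the actual parameter values are not given on the slide.
ENVIRONMENTS = {"low": (2.0, 4.0), "neutral": (1.0, 1.0), "high": (4.0, 2.0)}
N_PER_CONDITION = 50


def make_problems(n_alternatives=2, seed=None):
    """Build 2 horizons x 3 environments x 50 problems, in a randomized order."""
    rng = random.Random(seed)
    problems = []
    for horizon in HORIZONS:
        for env, (a, b) in ENVIRONMENTS.items():
            for _ in range(N_PER_CONDITION):
                # Each alternative's hidden reward rate is drawn from the environment.
                rates = [rng.betavariate(a, b) for _ in range(n_alternatives)]
                problems.append({"horizon": horizon, "environment": env, "rates": rates})
    rng.shuffle(problems)  # randomized problem order for this participant
    return problems


# Usage: one independently shuffled problem set per participant.
problem_sets = [make_problems(seed=p) for p in range(10)]
print(len(problem_sets[0]))  # 300
```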


Optimal Performance and Model analysis

1) Optimal decision-making behavior was calculated for all of the problems completed by the 10 participants, using a recursive approach from the reinforcement learning literature (e.g., Kaelbling et al., 1996).

I did not fully understand this recursive approach, which is discussed in Kaelbling et al.'s paper; in any case, it finds the optimal decision-making process for a bandit problem once the environmental distribution and the number of trials are given.

2) The graphical model in Figure 2 was applied to all of the optimal and human decision data (training data) for all six bandit problem conditions. For each data set, the parameters were estimated from 1000 posterior samples.
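A standard way to compute optimal play for a short-horizon bandit is backward induction (dynamic programming) over the counts of successes and failures. The sketch below assumes two alternatives whose rates are drawn independently from a known Beta(prior_a, prior_b) environment and an objective of maximizing total expected reward; it illustrates the general technique and is not necessarily the exact procedure the authors used. The function name `optimal_policy` is hypothetical.

```python
from functools import lru_cache


def optimal_policy(horizon, prior_a=1.0, prior_b=1.0):
    """Bayes-optimal play for a two-alternative Bernoulli bandit, by backward recursion.

    State = (s1, f1, s2, f2): successes and failures observed on each alternative.
    Returns (expected total reward from the start, choose), where choose(state)
    gives the optimal alternative (0 or 1) for a state with trials remaining.
    """

    def p_success(s, f):
        # Posterior mean of an alternative's rate after s successes and f failures.
        return (prior_a + s) / (prior_a + prior_b + s + f)

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        # Expected reward still obtainable from this state under optimal play.
        if horizon - (s1 + f1 + s2 + f2) <= 0:
            return 0.0
        return max(q_values(s1, f1, s2, f2))

    def q_values(s1, f1, s2, f2):
        # Expected immediate reward plus expected value of the next state.
        p1, p2 = p_success(s1, f1), p_success(s2, f2)
        q1 = p1 * (1 + value(s1 + 1, f1, s2, f2)) + (1 - p1) * value(s1, f1 + 1, s2, f2)
        q2 = p2 * (1 + value(s1, f1, s2 + 1, f2)) + (1 - p2) * value(s1, f1, s2, f2 + 1)
        return q1, q2

    def choose(state):
        if horizon - sum(state) <= 0:
            raise ValueError("no trials remaining in this state")
        q1, q2 = q_values(*state)
        return 0 if q1 >= q2 else 1  # ties go to alternative 0

    return value(0, 0, 0, 0), choose


# Usage: uniform environment, horizon of 2 trials.
v, choose = optimal_policy(2)
print(round(v, 4))             # 1.0833, i.e. 13/12
print(choose((1, 0, 0, 0)))    # 0: stay with the alternative that just succeeded
```

Caching the value of each count state keeps the recursion cheap even for the 16-trial horizon, since the number of distinct states is small.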
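The slide does not say which summary of the 1000 samples served as the estimate. A minimal sketch, assuming the samples for one parameter are already available as a plain list and that the posterior mean is used as the point estimate, with a simple nearest-rank 95% credible interval alongside it (`summarize_posterior` is a hypothetical name):

```python
import statistics


def summarize_posterior(samples, mass=0.95):
    """Posterior mean and a central credible interval (nearest-rank) from samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < mass < 1.0:
        raise ValueError("mass must be between 0 and 1")
    xs = sorted(samples)
    n = len(xs)
    tail = (1.0 - mass) / 2.0
    lo = xs[round(tail * (n - 1))]
    hi = xs[round((1.0 - tail) * (n - 1))]
    return statistics.fmean(xs), (lo, hi)


# Usage with 1000 synthetic, evenly spaced values (not real posterior samples).
mean, (lo, hi) = summarize_posterior([i / 999 for i in range(1000)])
print(round(mean, 3), round(lo, 3), round(hi, 3))
```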


Testing how well the latent state model fits the observed data

• Compared the model's predicted decisions at its best-fitting parameterization (estimator) with all of the human and optimal decision-making data.

• Calculated the proportion of agreement between the predicted and observed decisions.

The model generally fit well, though slightly less well for participant AH.
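Agreement is the fraction of trials on which the model's predicted choice matches the observed choice. A minimal sketch (the function name and the example decision sequences are hypothetical):

```python
def proportion_agreement(predicted, observed):
    """Fraction of trials on which predicted and observed decisions match."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("need at least one decision")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Usage: hypothetical choices (0 or 1) over four trials.
print(proportion_agreement([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```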


Check the descriptive adequacy of the latent state model

The inferred parameter Zi is an indicator variable that determines whether the i-th trial is in the search or the stand state.

Descriptive adequacy is shown in the figure on the next slide.

The posterior probability that the i-th trial uses the stand state is given by the posterior of the indicator variable Zi, estimated as the proportion of posterior samples in which Zi indicates the stand state.
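Given posterior samples of the indicators, the stand-state probability for each trial is the fraction of samples in which Zi equals the stand state. The sketch assumes a hypothetical coding of 1 = stand and 0 = search, and that the samples are stored as one list of Zi values (one per trial) per posterior sample; `stand_probabilities` is an illustrative name.

```python
def stand_probabilities(z_samples, stand=1):
    """Estimate P(Zi = stand | data) for every trial i from posterior samples.

    z_samples: list of posterior samples, each a list with one Zi value per trial.
    """
    if not z_samples:
        raise ValueError("need at least one posterior sample")
    n_samples = len(z_samples)
    n_trials = len(z_samples[0])
    return [sum(sample[i] == stand for sample in z_samples) / n_samples
            for i in range(n_trials)]


# Usage: four hypothetical posterior samples for a three-trial problem.
samples = [[0, 1, 1], [0, 0, 1], [1, 1, 1], [0, 1, 1]]
print(stand_probabilities(samples))  # [0.25, 0.75, 1.0]
```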
