Neuroecon Seminar Pres

Risk and Reward: Valuation in Decision-Making

Neuroeconomics Seminar

10/13/09

Trevor Kvaran

Outline

• Chapter 23

– Why care about how valuation is computed?

– Models of valuation

– Neurobiological evidence

• Chapter 25

– Neuroanatomy of the striatum

– The role of the striatum in valuation

Chapter 23: Take Home Message

• Valuation and choice are separable processes.

• Valuation for decisions under risk is accomplished by computing expected reward and risk, not by computing probabilities and utilities.

• This model can be extended to decisions under ambiguity (uncertainty).
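To make the contrast in the second bullet concrete, here is a minimal sketch of the two ways a gamble could be valued. The power utility, the risk weight, and all function names are illustrative assumptions, not taken from the chapter.

```python
from typing import Sequence


def expected_utility(outcomes: Sequence[float], probs: Sequence[float],
                     rho: float = 0.5) -> float:
    """Probability-weighted utility, with a hypothetical power utility u(x) = x**rho."""
    return sum(p * (x ** rho) for x, p in zip(outcomes, probs))


def mean_variance_value(outcomes: Sequence[float], probs: Sequence[float],
                        risk_weight: float = 0.01) -> float:
    """Expected reward minus a penalty on risk (variance); risk_weight is an assumed parameter."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    variance = sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs))
    return mean - risk_weight * variance


# A 50/50 gamble over 0 or 100 versus a sure 50.
gamble = ([0.0, 100.0], [0.5, 0.5])
sure_thing = ([50.0], [1.0])

for name, (outcomes, probs) in [("gamble", gamble), ("sure thing", sure_thing)]:
    print(name, expected_utility(outcomes, probs), mean_variance_value(outcomes, probs))
```

With these assumed parameters both rules prefer the sure thing, but they get there differently: the first needs a full probability and utility for every outcome, while the second needs only two summary statistics (mean and variance).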

Clarifying Terms

• Bossaerts, Preuschoff, & Hsu (BPH) seem to use uncertainty and ambiguity in uncommon ways.

• For BPH, “decisions under uncertainty” are any decisions that involve probabilistic choices.

• “Decisions under ambiguity” are decisions where probabilities are unknown.

Why care about how value is computed?

• Will hopefully allow for better prediction of choices.

Are Valuation and Choice Separable Processes?

• Evidence from Berns et al. (2007)

– Neural activity during passive trials was predictive of choices during active trials.

– Implication: if valuation and choice are separable processes, revealed preference may rest on questionable assumptions (is this right?).

– Potential problem: participants knew they would eventually be making active decisions, creating an incentive for passive valuation.

Default Actions

• Default actions:

– Stimulus-insensitive, computationally simple, goal-oriented behaviors.

– Default actions reflect choices that on average maximize utility.

– Similar to other dual-process explanations, prefrontal cortex activation may be involved in overriding default actions.

De Martino et al. (2006)

Phil’s Questions

• OFC activation decreases with subjects’ tendency to become risk-seeking with losses. Might this suggest that this type of behaviour is "automatic", or non-conscious? Could we reduce this behaviour through cognitive means?

• It’s noted that in the Random Utility Model, choice is always optimal; by contrast, in the default action valuation model, choice is often sub-optimal. Are economic agents constrained by some kind of "prepotent" response that impairs their adjustment to environmental changes?

David’s Question

Bossaerts, Preuschoff, and Hsu (2009) highlight that the computation of value is distinct from the computation of choice and, therefore, that choice can (at least in principle) be sub-optimal. That is, a decision-maker could make a physical act of choice that goes against his or her true preference. When and why might this disjunction between true and revealed preference happen?

More Default Action Evidence

• Caudate neurons that “prefer” a rewarded direction increase their firing rate prior to stimulus onset and decrease firing rate if rewarded direction is inappropriate.

• Increased errors for non-rewarded direction.

• Could support default action hypothesis.

Lauwereyns et al. (2002)

David’s Question

• Bossaerts et al. (2009) speculate that a bias toward “default actions” might cause a person to act against his or her true preference. By “default action,” the authors mean a behavior that an organism automatically performs unless overridden by other processes. Default actions are “stimulus-insensitive and goal-oriented” and, therefore, “robust to lapses of attention” (356), for they prevent the organism from having to constantly expend effort interpreting stimuli.

• See, for example, their interpretation of the Lauwereyns et al. (2002) data on page 357; briefly, caudate neurons are thought to encode a preference to saccade in the direction that is more regularly rewarded (this is the bias toward a default action, essentially), and mistakes are often made when the stimulus demands a saccade in the opposite, infrequently rewarded direction. They infer that “the mistake was caused by the monkey’s inability to overcome its default action” (358). How is the alternative explanation of a perceptual error ruled out? In particular, might the monkey just ignore the stimulus, thinking it already knows what the stimulus will be (i.e., the regularly rewarded direction)? The stated normative rationale for a default action, after all, is to prevent the organism from having to constantly expend effort interpreting stimuli. If the monkey assumes (albeit wrongly) that the stimulus demands a saccade in the more regularly rewarded direction, then there is not really a conflict between true and revealed preference: it does what it wants, just under wrong assumptions. What seems needed is a way to determine that the monkey successfully processed the stimuli but nonetheless chose the default action.

• Maybe the authors simply mean to describe perceptual errors, but, if so, then the separation of true and revealed preference is not so interesting, at least not to me. More intriguing is a general problem of, as the philosophers would call it, akrasia: acting against one’s own – and known – better judgment.

Policy Implications

• BPH raise interesting questions about the policy implications of the default action valuation model.

• If correct, it implies that “revealed preferences” may not reliably track true preferences.

• Any thoughts?

Risk Assessment and Learning

• Risk has often been ignored in reinforcement learning models.

• To learn optimally, the risk of a prediction error should be assessed.

– If risk is high, small change to predictions.

• Tobler et al. (2005) suggest that scaled prediction errors are calculated.
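One way to picture risk-scaled learning is the sketch below. It is an illustrative assumption rather than the model from any of the cited papers: the reward prediction error is divided by the predicted standard deviation before it updates the value estimate, and the risk estimate is itself learned from squared prediction errors. All names and learning rates are hypothetical.

```python
import math
from typing import Tuple


def update(value: float, risk: float, reward: float,
           lr_value: float = 0.1, lr_risk: float = 0.1,
           min_risk: float = 1e-6) -> Tuple[float, float]:
    """One learning step with a risk-scaled prediction error.

    value: current prediction of reward
    risk:  current prediction of the squared prediction error (variance)
    """
    delta = reward - value                      # reward prediction error
    # Scale by the predicted standard deviation (floored to avoid dividing by zero).
    scaled_delta = delta / math.sqrt(max(risk, min_risk))
    new_value = value + lr_value * scaled_delta
    risk_error = delta ** 2 - risk              # risk prediction error
    new_risk = risk + lr_risk * risk_error
    return new_value, new_risk


# Same prediction error (10), but different predicted risk.
low_risk_value, _ = update(value=0.0, risk=1.0, reward=10.0)
high_risk_value, _ = update(value=0.0, risk=100.0, reward=10.0)
print(low_risk_value, high_risk_value)
```

In this sketch the same surprise moves the prediction less when predicted risk is high, which matches the "if risk is high, small change to predictions" point above.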

Filippo’s Question

• The mean-variance models discussed by Bossaerts et al. are a useful extension of standard “first moment” utility models. Moreover, their “Taylor-style” approach, i.e. accumulating terms in the utility function in order to better represent choice settings, is methodologically interesting. To a certain extent, one can imagine comparing the fit of models with, e.g., n-1 terms from the Taylor expansion to see which term has the most dramatic marginal effect. In this way, it would be possible to keep track of the information used by the decision-maker. It is also remarkable that simple Bayesian algorithms, which were not designed to identify minimum-variance strategies, do in fact find them (this is the result of a set of simulations I ran).

• Therefore, variance minimization may be considered an explanatory factor in subjects’ (and algorithmic) behavior. The extent to which variance minimization accounts for actual computation in the brain is nonetheless unclear. In principle, the same outcome can be achieved without any explicit effort at variance minimization, and it may emerge as a side effect of other computations (as in the case of my algorithm).

• Brain data presented by Bossaerts et al. substantiate the idea that risk minimization plays a role in human choices. It is nonetheless not clear whether there is in fact a ‘risk-encoding’ evaluation signal, or whether risk evaluation is the emergent effect of other computations.

• As Bossaerts et al. note, the integration of ‘evaluation signals’ is one of the most challenging questions in the neuroscience of choice. In light of this question, some issues may be raised with respect to their approach. First of all, it is unlikely that these hypothesized signals are integrated in a linear way. For this reason, it seems equally hard to identify individual components of the evaluation process by fitting a linear composition of independent signals. Moreover, the identification problem mentioned above constitutes an even more serious challenge here. Indeed, the way in which two processes interact is different from the way in which emerging by-products of two processes interact.

• For these reasons, I am not sure whether the ease of interpreting these terms, when considered independently (or linearly merged), can compensate for the complexity that will likely emerge when we try to put these ‘signals’ together. Alternatively, Bayesian models developed to account for the integration of different signals (e.g. empirical Bayes) might be more generative and more manageable.
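For reference, the “Taylor-style” approach mentioned in this question can be written out as the standard expansion of expected utility around the mean reward. The notation here is assumed rather than taken from the chapter: R is the random reward, u the utility function, μ = E[R], and σ² = Var(R).

```latex
\mathbb{E}[u(R)] \approx u(\mu)
  + \tfrac{1}{2}\, u''(\mu)\, \sigma^{2}
  + \tfrac{1}{6}\, u'''(\mu)\, \mathbb{E}\!\left[(R-\mu)^{3}\right]
  + \dots
```

The first-order term drops out because E[R − μ] = 0. Truncating after the second term gives a mean-variance model; keeping the third adds sensitivity to skewness, which is the kind of nested comparison the question has in mind.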

Phil’s Question

• I found it interesting that risk encoding may play a role in learning. Perceived risk could affect the learning rate, with more risk-averse agents learning more slowly. It could be interesting to study this in more detail, perhaps examining whether some temporary manipulation of risk aversion could influence learning.

Evaluating Reward and Risk

“linear” relationships in striatum (reward), inverted u-shape in insula (risk).

Integrating Reward and Risk

Is this evidence of integrating risk and reward, or simply that risk and reward are both encoded in PFC?

Decisions Under Ambiguity

• Models for decisions under ambiguity

– Maxmin

• Utilities are computed assuming worst case scenarios about probabilities.

– α-maxmin

• Best and worst case scenarios can both be considered.

– α > .5 = ambiguity-averse

– α < .5 = ambiguity-seeking

• Assuming α-maxmin, utilities for decisions under ambiguity can be conceived as a trade-off between mean reward and risk.

• Kaisa’s Question

– Expected utility and prospect theories on the one hand, and risk-return models on the other, provide two different approaches to modeling decision making under risk. What are the advantages/disadvantages of these approaches, and how well do the brain imaging findings from these two approaches support each other?

• Mirre’s Question

– In decision theory, ambiguity aversion is distinguished from risk aversion. However, brain data suggest a similar neural mechanism for both risk and ambiguity. Knowing this, to what extent is ambiguity-averse behavior different from risk-averse behavior?
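The maxmin and α-maxmin rules on this slide can be sketched as follows. This is a minimal illustration under assumptions: the set of candidate probability distributions, the linear default utility, and the function name are all hypothetical, and α weights the worst case so that α > .5 corresponds to ambiguity aversion, as on the slide.

```python
from typing import Callable, Sequence


def alpha_maxmin(outcomes: Sequence[float],
                 candidate_probs: Sequence[Sequence[float]],
                 alpha: float,
                 utility: Callable[[float], float] = lambda x: x) -> float:
    """Weight the worst-case expected utility by alpha and the best case by (1 - alpha)."""
    expected_utilities = [
        sum(p * utility(x) for x, p in zip(outcomes, probs))
        for probs in candidate_probs
    ]
    return alpha * min(expected_utilities) + (1 - alpha) * max(expected_utilities)


# Ambiguous gamble: win 100 with a probability known only to be 0.25 or 0.75.
outcomes = [0.0, 100.0]
candidates = [[0.75, 0.25], [0.25, 0.75]]

print(alpha_maxmin(outcomes, candidates, alpha=1.0))   # pure maxmin (worst case only)
print(alpha_maxmin(outcomes, candidates, alpha=0.75))  # ambiguity-averse
print(alpha_maxmin(outcomes, candidates, alpha=0.25))  # ambiguity-seeking
```

Setting alpha to 1 recovers plain maxmin; lowering it shifts weight toward the best-case scenario.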

Chapter 25: Subjective Value in the Striatum

Chapter 25: Take Home Message

• Subjective valuation is represented prior to choice (anticipated valuation), informs choice, and can be updated following choice (outcome valuation).

• Ventral striatal regions represent information about anticipated value.

• Dorsal striatal regions represent information about outcome values.

Striatal Neuroanatomy

• Set of subcortical structures near the center of the brain.

• Includes three structures:

– Caudate, Putamen, Nucleus Accumbens

Striatal Neuroanatomy

• Striatum can also be divided into ventral and dorsal parts.

• Ventral striatum includes NAcc and lower caudate and putamen.

• Dorsal striatum includes higher parts of caudate and putamen.

Striatal Connectivity

• The striatum has a distinct “ascending spiral” connectivity with the prefrontal cortex.

• Ventral striatal regions connect to ventromedial cortical regions (associated with emotion and motivation), while more dorsal striatal regions connect to dorsolateral cortical regions (associated with movement and memory).

Valuation: evidence from rats

• Converging evidence that dopamine release in the NAcc occurs in response to anticipated reward.

– Dopamine increases when:

• Perception of escape from predator is high.

• Smell of food is introduced.

• Introduced to a receptive female.

• Introduced to new rats.

Anticipated value: evidence from human neuroimaging

• Knutson et al. (2001) found scaled activation to anticipated gains in NAcc.

Outcome value: evidence from neuroimaging

• Delgado et al. (2000) found caudate sensitive to outcome information.

• Further studies (O’Doherty et al., 2004; Delgado et al., 2005) suggest caudate is particularly sensitive when outcome information can inform future decisions.

Kaisa’s Question

• There is some evidence in the literature that the ventral striatum evaluates outcomes with respect to a reference point (Tom et al., 2007; De Martino et al., 2009), whereas dorsal parts of the ventral striatum are related to reference-independent evaluation of outcomes (Pine et al., 2009; Tobler et al., 2007; De Martino et al., 2009). How do these findings fit with the view that ventral parts of the striatum are related to the evaluation of expected gains, whereas dorsal parts are more involved in evaluating the value of an outcome and in selecting actions based on outcomes?

Alex’s Question

• King-Casas et al. (2005) found that striatum activation predicted participants’ tendency to invest in a partner who had cooperated with them before, and Delgado et al. (2005) found that the reputation of a partner, which influenced future social gains, correlated with striatum activation as well. Can we extend these findings to studies of expectations (e.g., knowing that players usually make generous offers induces responders to reject more unfair offers; see Sanfey, 2009, in Mind & Society)?

• My point is: do expectations influence behavior because of increased expectations of reward (as the reputation study showed, leading to a modulation in the striatum) or because of more negative emotional reactivity (leading to a modulation of insula activation)?

• Note: the Kliemann et al. (2009) prior-record paper may be relevant. rTPJ (theory of mind) activation increased for unfair players.

Mirre’s Question

• Several imaging studies have suggested that activity in the striatum reflects a common neural currency of reward, that is, for rewards in the economic domain (e.g. money) as well as in the social domain (e.g. status, being liked). My question is whether this is true, given that the striatum consists of quite a few different components and the low spatial resolution of fMRI does not allow for distinguishing between the smaller striatal components.

Cinzia’s Question

• The striatum encodes subjective value under risk. Accordingly, ventral striatal activation has been investigated in several studies using gambling tasks. They showed that ventral striatal activation predicts a switch to high-risk choices (Kuhnen and Knutson, 2005) and that this can also be exogenously controlled: after exposure to positive pictures, subjects showed riskier behavior, which was associated with an increase in ventral striatal activation (Knutson et al., 2008). Ventral striatal activation can thus predict subsequent choices and economic behavior.

• Can we conclude that “ventral striatal activation affects risk-seeking behavior”? How does activation of the ventral striatum affect behavior? Does its activation affect only behavior in the subsequent trial, or choices across the full task (e.g. a gambling task)?
