
Progress in Neurobiology 96 (2012) 96–135

Neural systems analysis of decision making during goal-directed navigation

Marsha R. Penner, Sheri J.Y. Mizumori *

Department of Psychology, University of Washington, Seattle, WA 98195-1525, United States

Contents

1. Introduction
2. Navigation and foraging behavior
3. Laboratory tasks that are based on foraging behavior
4. Reinforcement learning and decision making environments
   4.1. Temporal difference learning
   4.2. Dopamine and reinforcement learning
5. The neurobiology of reinforcement learning and goal-directed navigation: hippocampal contributions
   5.1. Hippocampal place fields as spatial context representations
   5.2. The hippocampus distinguishes contexts during navigation
   5.3. Cellular and network mechanisms underlying hippocampal context processing
      5.3.1. CA3 and CA1 place field contributions to the evaluation of context
      5.3.2. Temporal encoding of spatial contextual information
      5.3.3. Sources of hippocampal spatial and nonspatial information
      5.3.4. Determining context saliency as a part of learning
   5.4. Relationship between hippocampal context codes and reinforcement-based learning
      5.4.1. Functional connectivity between reinforcement and hippocampal systems
      5.4.2. A role for dopamine in hippocampal-dependent learning and plasticity
      5.4.3. Impact of hippocampal context processing on dopamine cell responses to reward
6. The neurobiology of reinforcement learning and goal-directed navigation: striatal contributions
   6.1. Striatal based navigational circuitry
   6.2. Dopamine signaling and reward prediction error within the striatum

ARTICLE INFO

Article history: Received 12 April 2011; Received in revised form 6 August 2011; Accepted 29 August 2011; Available online 21 September 2011

Keywords: Dopamine; Reinforcement learning; Hippocampus; Striatum; Navigation; Decision making

ABSTRACT

The ability to make adaptive decisions during goal-directed navigation is a fundamental and highly evolved behavior that requires continual coordination of perceptions, learning and memory processes, and the planning of behaviors. Here, a neurobiological account of such coordination is provided by integrating the current literatures on spatial context analysis and decision making. This integration includes discussions of our current understanding of the role of the hippocampal system in experience-dependent navigation, how hippocampal information comes to impact midbrain and striatal decision making systems, and finally the role of the striatum in the implementation of behaviors based on recent decisions. These discussions extend from cellular to neural systems levels of analysis. Not only are key findings described, but fundamental organizing principles within and across neural systems, as well as between neural systems functions and behavior, are emphasized. It is suggested that studying decision making during goal-directed navigation is a powerful model for studying interactive brain systems and their mediation of complex behaviors.

© 2011 Published by Elsevier Ltd.

Abbreviations: BLA, basolateral amygdala complex; DLS, dorsolateral striatum; DMS, dorsomedial striatum; LDTg, lateral dorsal tegmental nucleus; mPFC, medial prefrontal cortex; OFC, orbitofrontal cortex; PPTg, pedunculopontine nucleus; SI/MI, primary sensory and motor cortices; SNc, substantia nigra pars compacta; vPFC, ventral prefrontal cortex; VTA, ventral tegmental area.

* Corresponding author at: Department of Psychology, Box 351525, University of Washington, Seattle, WA 98195-1525, United States. Tel.: +1 206 685 9660; fax: +1 206 685 3157. E-mail addresses: [email protected], [email protected] (Sheri J.Y. Mizumori).


doi:10.1016/j.pneurobio.2011.08.010

   6.3. The ventral striatum: Pavlovian learning and cost-based decision making
      6.3.1. Nucleus accumbens and Pavlovian learning
      6.3.2. The nucleus accumbens and cost-based decision making
      6.3.3. Spatial learning and navigation: the role of the ventral striatum
   6.4. Dorsal striatum: contributions to response and associative learning
      6.4.1. Action–outcome learning and habit learning in the dorsal striatum
      6.4.2. Response learning in the dorsal striatum
      6.4.3. Sequence learning in the dorsal striatum
   6.5. Interactions between the dorsomedial and dorsolateral striatum
7. Neural systems coordination: cellular mechanisms
   7.1. Single cells and local network coordination
   7.2. Neural systems organization and oscillatory activity
      7.2.1. Theta rhythms
      7.2.2. Gamma rhythms
      7.2.3. Coordination of theta and gamma rhythms
8. Neural systems coordination: decisions and common foraging behaviors
   8.1. Goal-directed navigation in a familiar context
   8.2. Goal-directed navigation in a familiar context following a significant change in context
   8.3. Goal-directed navigation in a novel context
9. The challenges ahead
Acknowledgements
References

1. Introduction

Nearly all cognitive processes utilize or include some aspect of spatial information processing. An animal's ability to find its way around its world is critical for survival; it is crucial for obtaining food, avoiding predators, and finding mates. Research into spatial information processing over many decades not only continues to define the mechanisms that contribute to spatial information processing, but has also provided significant insight into the fundamental mechanisms that underlie learning and memory more generally.

Within the laboratory, goal-directed spatial navigation, in particular, is an immensely useful behavior to study because in many ways it reflects ethologically relevant learning challenges, and provides opportunities to examine dynamic features of neural function that are otherwise not afforded by simpler behavioral paradigms and tasks. Goal-directed navigation is a complex behavior, requiring the subject to perceive its environment, learn about the significance of the environment, and then select where to go next based upon what has been learned. Thus, navigation-based tasks can be used to investigate behavioral and neural aspects of external and internal sensory perception, learning and decision making, memory consolidation and updating, and planned movement. Goal-directed navigation, then, is a powerful model by which to study dynamic neural systems interactions during a fundamental and complex natural behavior.

As a whole, efforts to understand the neurobiology of navigational behavior have focused mainly on the nature and mechanisms of spatial representation in limbic brain structures that are known to be important for spatial learning. As a result, there have been important revelations regarding the physiological mechanisms that control limbic spatial representations. Relating such representations, however, to limbic-mediated learning or memory has been indirect and correlational at best (as discussed in Mizumori et al., 2007a). Here, we suggest that careful application of reinforcement learning theory to an understanding of how decisions are made during goal-directed navigation can identify a fundamental and essential process that likely underlies navigation-related perception, learning, memory, or response selection. That is, in order to understand how spatial representations are related to learning, it is necessary to understand how decisions are made during navigation from both neural and behavioral perspectives. Without the ability to make adaptive decisions, animals will not acquire the efficient learning strategies necessary for adaptive behaviors. It should be noted that the suggestion to link reinforcement learning ideas with navigation dates back decades, although the terminology may be different (e.g., cost–benefit analysis of foraging behavior vs. value-based decision making). By investigating this link in freely navigating animals, we may be able to uncover the mechanisms that underlie naturalistic motivated behaviors.

2. Navigation and foraging behavior

The natural foraging environments on which laboratory navigational tasks are based are tremendously complex. The forager's challenge is to acquire sufficient food stores to prevent starvation, produce viable offspring, and avoid predators. A natural tendency for many animals, including rodents, is to hoard small amounts of food in a scattered distribution within their home range or nest (Stephens, 1986). The caching of food requires careful route planning to and from the source of food, the cache, and the home nest. Moreover, because animals acquire food during times when it is abundant, and recover it when food sources are scarce, the animal must retain knowledge of where the food has been cached. This behavior, a naturally occurring spatially directed behavior, is evident in many species, including rodents, birds, spiders, honeybees, and humans (e.g., Anderson, 1984; Davies, 1977; Diaz-Fleischer, 2005; Goss-Custard, 1977; Hawkes et al., 1982; Waddington and Holden, 1979).

The development of mathematical models that formally defined naturally occurring foraging behaviors led to optimal foraging theory, which describes the foraging behavior of an animal in relation to the metabolic payoff it receives when using different foraging options. Most animals are adapted structurally and physiologically to feed on a limited range of food and to gather this food in specific ways (e.g., caching of food during times of abundance). Some food may contain more energy but be harder to capture or be further away, while food that is close at hand may not be considered as nutritionally profitable. According to optimal foraging theory, an 'optimal forager' will make decisions that maximize energy gain and minimize energy expenditure (Krebs and McCleery, 1984; Stephens, 1986). Two foraging models are of note: the 'prey model' proposed by MacArthur and Pianka (1966), and the 'patch model' proposed by Charnov (1976).

Fig. 1. Laboratory tasks used to assess navigational behaviors. (A) Morris swim task. Photograph of a rat swimming in the cued version of the Morris swim task, in which an escape platform is clearly visible to the rat. In the spatial version of the task, the platform is submerged beneath the opaque water, and the rat uses distal cues around the room to locate the platform. (B) Barnes Circular Platform Task. Photograph of a rat making an 'error' on the Circular Platform Task by looking into a hole that is not over the dark escape chamber. The arrow points to the correct location of the hole over the goal, which the rat must find on the basis of the features of the environment distal to the platform. (C) Radial arm maze. Photograph of a rat on one of the 8 arms of the radial maze, which is designed to mimic natural foraging behaviors. At the end of each of the arms is a food cup where reward is delivered. At the beginning of a trial, subjects are placed in the center of the maze and allowed access to all of the maze arms, but only a subset of the arms (usually four) will actually contain a reward. After a retention delay, the subject is returned to the maze. In win-stay conditions, the same four arms are baited after the delay, and the number of correct choices the subject makes in collecting these rewards is recorded. In win-shift conditions, the four arms not baited in the earlier trial are now baited, and the number of correct arm choices is recorded. Each day, a new set of four arms is chosen randomly. (D and E) Schematic of a plus maze. The plus maze represents a 'dual solutions' problem in that it can be solved using a 'response' strategy or a 'place' strategy. In the place/response task, rats are trained to retrieve food from one arm of a T-maze or cross maze. The content of learning can be assessed by moving the starting arm to the other side of the maze on a probe test. The animal may enter the arm corresponding to the location of the reward during training (place strategy) or the arm corresponding to the turning response that was reinforced during training (response strategy). Photograph in panel (A) taken by Dr. J. Lister; photograph in panel (B) taken by Dr. C.A. Barnes; photograph in panel (C) taken by D. Jaramillo. All used with permission.

The prey model seeks to define the criteria that determine whether prey items will be consumed based on the level of energetic investment needed to acquire the prey and the rate of energetic return (MacArthur and Pianka, 1966). One prediction of the prey model is that when there is an abundance of high quality food, an animal's diet will consist mainly of these items, and lower quality food is less likely to be consumed. The patch model, on the other hand (Charnov, 1976), takes into account the energy expended when an animal searches for food that is clumped in space and time, and thus must decide how long to spend foraging within a food patch before abandoning it and moving on to another (i.e., exploration vs. exploitation). These models have been mapped onto the behavior of several species (e.g., Anderson, 1984; Cowie, 1977; Davies, 1977; Diaz-Fleischer, 2005; Goss-Custard, 1977; Lima, 1983), and they demonstrated decades ago the strength of applying an economic approach to the study of naturally occurring, complex behaviors.
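The patch model is often formalized as the 'marginal value theorem' (Charnov, 1976). As a brief worked illustration (the notation here is ours, introduced only for exposition): let g(t) denote the cumulative energy gained after foraging for time t in a patch, an increasing but decelerating function as the patch depletes, and let T be the average travel time between patches. The optimal patch-leaving time t* then satisfies

\[
g'(t^{*}) \;=\; \frac{g(t^{*})}{T + t^{*}},
\]

that is, the forager should abandon a patch once its instantaneous rate of gain drops to the long-run average rate of gain for the habitat as a whole. One immediate prediction is that longer travel times between patches should produce longer stays within each patch.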

3. Laboratory tasks that are based on foraging behavior

The study of navigational behavior within the laboratory became central to the study of learning and memory function with the introduction of the rat as the primary research subject (Munn, 1950). There are a number of reasons why rodent foraging behavior is an ideal model with which to study complex learning in the laboratory: (1) rodents are naturally excellent foragers, and therefore they tend to learn tasks based on this ability exceptionally well; (2) we can apply our understanding of the brain's natural motivational circuitry to gain new clues about the mechanisms of a highly evolved and adaptive complex learning system; (3) despite its complexity (which characterizes most real-world learning), this model is highly tractable; and (4) within the human literature, navigation-based tasks have been developed that mimic the tasks used with rodents (e.g., Astur et al., 1998; Burgess et al., 2002; Fitting et al., 2007; Hamilton et al., 2002).

As early as the late 1890s and early 1900s, Willard S. Small used one of the first mazes to investigate learning by white rats (Small, 1899, 1900, 1901), and others soon followed (e.g., Carr, 1917; Honzik, 1933; Tolman, 1930; Watson, 1907). Early mazes consisted of a system of runways or alleys arranged in various configurations. The first investigations into maze learning were aimed primarily at determining which sensory inputs were essential for successfully navigating a maze to the intended goal, and this led to the assumption that navigation through a maze is performed purely on proprioceptive responses (i.e., stimulus–response behavior), although later studies demonstrated that stimulus–response strategies were not sufficient to optimally solve complex mazes (Munn, 1950; O'Keefe and Nadel, 1978a,b). While many different kinds of mazes were developed in the early years of maze use, only a select few are still used, and these are well suited for studying reinforcement learning in the context of navigation. These include the T-maze, and similar variations including the multiple T-maze, the plus maze, and the Y-maze. The radial maze, introduced by David Olton in 1976, is another excellent and well-used example of a so-called 'multiple solutions' laboratory task (Olton and Samuelson, 1976). Unlike many of the mazes used in the early days, the solution to these sorts of maze tasks is sufficiently ambiguous that successful performance is based on more than a single trajectory to a unique goal, and this allows for testing of more than one cognitive strategy (see Fig. 1).

The plus maze figured prominently in early debates between behaviorists and cognitive learning theorists who pondered what, exactly, an animal learned that enabled it to find the goal on the maze (Hull, 1932, 1943; Packard, 2009; Restle, 1957; Tolman, 1930). Behaviorists argued that all behavior is simply elicited by antecedent stimuli within the environment, and thus a task such as the plus maze can be solved simply via stimulus–response associations (Guthrie, 1935). Cognitive learning theorists, on the other hand, argued that rats could engage in goal-directed behaviors to solve the maze task, meaning that animals were capable of learning the causal relationship between their actions and the resulting outcomes, allowing them control over their own actions based on their desire for a particular outcome (Tolman, 1948).

Fig. 2. A general conceptual framework for evaluating goal-directed decision making behavior (schematic cycle: the goal in a given context → value assessment → action selection → outcome evaluation → learning & memory). Within a context, an assessment of the internal and external factors of the current situation helps to determine the current goal for behavior. The factors that influence goal assessment include internal states (e.g., hunger or thirst) and external factors (e.g., distance to different goal locations, presence of predators). A value assessment involves considering how rewarding any one goal is (e.g., a faraway large cache of food vs. an uncertain but close cache) and assigns value to each of the available options. An action is selected and is then implemented. An evaluation of the outcome is made. Did the behavior result in the expected reward? Was the outcome better (e.g., more food) or worse (no food) than expected? The outcome of the behavior results in learning when the outcome does not match the expectation, and learning might be considered 'complete' when a mismatch between what is expected and what is actually achieved no longer occurs. Memory stores can then be updated to guide subsequent behavior. After Rangel et al. (2008).

The plus maze is arranged so that a goal location can be approached from one of two start boxes. In the standard 'dual solution' version of the task, rats are consistently released from the same start arm, and are trained to retrieve reward from another consistently baited maze arm. Rats can use one of two strategies to solve this task: they can acquire information concerning the spatial location of the goal and use that information to navigate to the rewarded arm (i.e., a place strategy), or the rat can learn to approach the rewarded location by acquiring a specific response, such as a right body turn, to reach the reward (i.e., a response strategy). To determine which strategy the rat is using, a probe trial can be given in which the rat starts the task from a different arm of the maze. Rats with knowledge of the spatial location of the food should continue to approach the rewarded arm on the probe trial, whereas rats that have learned a specific body turn should choose the opposite arm. A number of factors can influence which strategy a rat will ultimately use to reach the goal, including the amount of training the animal receives. Rats that are overtrained on this task tend to predominantly use a response strategy, whereas most rats will use a place strategy early in training. Thus, overtraining results in a shift from goal-directed action–outcome learning and strategy use to less flexible stimulus–response learning and strategy use (Packard, 1999; Packard and McGaugh, 1996). Other goal-directed navigation-based tasks that are widely used include the Morris swim task (Morris, 1981) and the Barnes circular platform task (Barnes, 1979). All of the above tasks test goal-directed navigation that requires active decision making and learning about how reinforcers influence choices that are made. These tasks can be contrasted with other 'foraging' tasks in which the animal is not required to implement a decision-based strategy, including random foraging (for bits of food sprinkled randomly around an open platform or box), tasks in which movement is passive (i.e., 'assisted' exploration), or tasks in which animals follow paths provided by the experimenter until rewards are encountered.

Navigational strategies (such as those just described) may range from relatively simple approach and avoidance behavior to the use of complex representations of the environment (e.g., geometrical maps). In the context of natural foraging, the goal is to find food while avoiding predators and minimizing energy expenditures. Similarly, in many maze tasks, the goal of a hungry rodent is to find food, or to avoid unpleasant situations such as cool water or bright open spaces. In most cases, an animal is faced with more than one option. In a natural foraging context, an animal may be faced with a situation in which it must take into account the energy expended while searching for food, and thus must decide how long to spend foraging within a food patch before abandoning it and moving on to another (i.e., exploration vs. exploitation). On a maze task (e.g., the 8-arm radial maze), the animal may need to decide which arms of the maze to visit first, for example, an arm that always has a small food reward or an arm that only sometimes has a large food reward. To determine a course of action, the animal will engage in 'value-based decision making', which can be broken down into several steps (Fig. 2; Rangel et al., 2008; Mizumori et al., 2000; Sutton and Barto, 1998). First, the organism needs to determine the goal of the current behavior, a process that may include the assessment of one's internal state, such as level of hunger, and external context, such as risk in the environment. Next, a value assignment is made for each available action, taking into consideration the relative cost or benefit associated with each action. Once these values have been assigned, they can be compared; a choice is then made about which behavior to select, and it is then implemented. An analysis of the outcome of the behavior can then be made. Did the action result in the desired outcome? Was the outcome better than expected, or worse? Finally, this feedback is used to update learning and memory processes so that future decisions can be informed by what has just been learned. Learning is said to be 'complete' when the outcome of the chosen course of action is aligned with the expected outcome. If the outcome, on the other hand, is better or worse than expected, learning about which actions will lead to an optimal outcome continues.

These processes are, of course, theoretical in nature and not absolute, but they help to guide our thinking about the neurobiological processes that contribute to successful goal-directed navigation. It may be prudent, at this point, to define 'reward' (for the sake of simplicity, we consider reward to be synonymous with goal). Rewards can be defined as objects or events that elicit approach and consummatory behavior, and they represent positive outcomes of decisions that result in positive emotions and hedonic feelings. Rewards are crucial for survival and support elementary processes such as drinking, eating, and reproduction. In other situations, rewards can also be more abstract, such as money, social status, and information (e.g., Bromberg-Martin and Hikosaka, 2009; Corrado et al., 2009).

4. Reinforcement learning and decision making environments

Reinforcement learning describes the process through which an organism learns to optimize behavior within a decision environment (see Fig. 3). The ultimate goal of reinforcement learning is to implement behaviors or actions that result in a maximization of reward or minimization of punishment. The decision-making environments in which reinforcement learning occurs consist of a set of 'states' (Sutton and Barto, 1998), which in the case of navigation can be represented by locations on a maze (e.g., the center platform would be one 'state', the end of an arm another 'state'); a set of possible actions that the decision-maker can choose from (e.g., turn left or travel south); and a set of rules that the decision-maker will initially be naïve to, and thus must learn via interaction with the environment (e.g., a large reward is always available on the south maze arm).

Fig. 3. Reinforcement learning on a maze task (panels A and B: schematic plus maze with a start location and states S1–S5; plus signs mark rewarded locations, a minus sign marks the unrewarded state). (A) Schematic of model-free trial and error decision making on a plus maze task. Model-free reinforcement learning involves learning action values directly, by trial and error. The environment in which learning occurs consists of a set of states (i.e., locations on the maze), and each state (S1–S5) is initially independent of other states. Because the decision-maker has not had experience with the states, they will all have similar values assigned to them, and are thus equally likely to be chosen. (B) Schematic of model-based action–outcome decision making. The ultimate goal of reinforcement learning is to select actions that result in a maximization of reward. Model-based reinforcement learning uses experience to construct an internal model, for example, a cognitive map, of the transitions and immediate outcomes in the environment. Through trial and error learning, this representation is constructed, and helps to strengthen the connection between states. In the example shown here, thicker lines represent stronger associative connections, while thinner lines represent connections that are not as strong. Dashed lines indicate that an association has not been strengthened, as in the case when reward is not delivered at one of those states (S5). In this example, the decision-maker has learned that choosing to go from S2 to S4 results in a large reward, whereas moving from S2 to S3 results in acquisition of a small reward. In a dynamic environment, the value of the rewards may change, resulting in either strengthening or weakening of the connections between states.

The actions or behaviors that the decision-maker implements move the agent from one state to another, and produce outcomes that can have positive or negative utilities (e.g., finding a large reward, a small reward, or no reward). Finally, the utility of the outcome can change, even within the same state, owing to factors such as the motivational circumstances of the decision-maker, for example a change from hunger or thirst to satiation (e.g., Aberman and Salamone, 1999; Dayan and Daw, 2008; Dayan and Niv, 2008; Niv, 2009; Sutton and Barto, 1998).

Reinforcement learning models are often divided into model-free and model-based categories (e.g., Daw et al., 2005; Niv et al., 2006). Using model-free reinforcement learning strategies, animals learn the value of each action directly, by trial and error. In contrast, model-based reinforcement learning uses experience to construct an internal model, for example, a cognitive map, of the transitions and immediate outcomes in the environment. Animals can then estimate the value associated with each action on every trial using knowledge about their costs and benefits. Within the framework of navigational behavior, this kind of learning allows action selection to be dynamic, changing as the rules within the environment change, and is thus suited to support goal-directed behaviors. Learning using both model-based and model-free strategies is generally driven by 'prediction errors', which are the differences between actual and expected outcomes, and are used to update expectations in order to make predictions more accurate.
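The distinction can be made concrete with a toy version of the maze in Fig. 3. The sketch below is our own illustration, not code from the cited literature; the state names and payoffs are hypothetical. The model-free learner caches an action value updated only by prediction errors, whereas the model-based learner stores the transition and outcome structure and evaluates actions by looking ahead through it, which is why it adapts immediately when the rules change:

    # Hypothetical maze fragment: from choice point S2 the agent can move
    # to S3 (small reward) or S4 (large reward), as in Fig. 3.
    TRANSITIONS = {("S2", "go_S3"): "S3", ("S2", "go_S4"): "S4"}
    REWARDS = {"S3": 1.0, "S4": 4.0}

    # Model-free system: one cached value per action, trained by error.
    q = {("S2", "go_S3"): 0.0, ("S2", "go_S4"): 0.0}

    def model_free_update(state, action, reward, alpha=0.2):
        q[(state, action)] += alpha * (reward - q[(state, action)])

    # Model-based system: a learned 'cognitive map' of where each action
    # leads and what is found there, evaluated by one-step lookahead.
    model = {}    # (state, action) -> observed next state
    payoffs = {}  # state -> observed reward

    def model_based_value(state, action):
        next_state = model.get((state, action))
        return payoffs.get(next_state, 0.0)

    # A pass of experience through both arms updates both systems.
    for action in ("go_S3", "go_S4"):
        next_state = TRANSITIONS[("S2", action)]
        reward = REWARDS[next_state]
        model_free_update("S2", action, reward)
        model[("S2", action)] = next_state
        payoffs[next_state] = reward

    # If the outcome at S4 is devalued, the model-based evaluation changes
    # at once, while the cached model-free value must be retrained by error.
    payoffs["S4"] = 0.0
    print(model_based_value("S2", "go_S4"))  # 0.0 immediately
    print(q[("S2", "go_S4")])                # still positive until relearned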

4.1. Temporal difference learning

A critical problem in animal and human decision making is how to choose behaviors that will lead to reward in the long run. A 'classic' approach to this problem was proposed by Rescorla and Wagner (1972), who argued that learning occurs when there is a discrepancy between events that are predicted and those that actually happen. An extension to the Rescorla–Wagner model was proposed by Sutton (1988) and Sutton and Barto (1998) in a model which came to be known as 'temporal difference learning'. This has been widely used in modeling behavioral and neural aspects of reward-related learning (e.g., Bayer and Glimcher, 2005; Kurth-Nelson and Redish, 2009, 2010; Ludvig et al., 2008; Maia, 2009; Montague et al., 1996; Nakahara et al., 2004; O'Doherty et al., 2003; Pan et al., 2005, 2008; Schultz et al., 1997; Seymour et al., 2004), such that reward predictions are constantly improved by comparing them to actual rewards (Sutton and Barto, 1998). According to such models, an expected reward value for a given state is estimated. When external reward is delivered, it is translated into an internal signal that enters into a computation that determines whether the value of the current state is better or worse than predicted. Signals that reflect discrepancies between expected and actual reward values can be used to update future expected values and reward probabilities. The temporal difference model can be used to describe how neural responses to stimuli change during learning; as prediction improves, these responses reflect the linking of stimuli with their expected probability of reinforcement. By extension, then, the temporal difference model predicts that neural activation will gradually shift from the time of reward to the time of the predictors of subsequent reinforcement (reviewed in Suri, 2002; Suri and Schultz, 2001). Indeed, different types of neurons have been shown to exhibit these sorts of changes in firing during learning (Hollerman and Schultz, 1998; Mirenowicz and Schultz, 1994; Schultz et al., 1993).
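In the standard notation of Sutton and Barto (1998), the computation can be written compactly. On the transition from state s_t to state s_{t+1} with reward r_{t+1}, the temporal difference prediction error and the resulting value update are

\[
\delta_t \;=\; r_{t+1} + \gamma\,V(s_{t+1}) - V(s_t),
\qquad
V(s_t) \;\leftarrow\; V(s_t) + \alpha\,\delta_t,
\]

where V(s) is the estimated value of state s, \gamma (0 \le \gamma \le 1) discounts future reward, and \alpha is a learning rate. A positive \delta_t signals an outcome better than predicted, \delta_t = 0 a fully predicted outcome, and a negative \delta_t an outcome worse than predicted.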

Although the neural circuitry by which temporal difference computations occur remains to be clarified, a popular idea is that there is one neural network that selects behaviors (the 'actor'), and a second neural network that evaluates the outcomes of the behaviors selected by the actor. That second network is referred to as the 'critic' (e.g., Houk et al., 1995; Sutton and Barto, 1998). The fact that neurons within the reward circuitry represent action, and sometimes action sequences, as well as reward (Graybiel, 1998; Hikosaka et al., 1989, 1999; Lavoie and Mizumori, 1994; Mulder et al., 2004; Schmitzer-Torbert and Redish, 2004; Schultz et al., 1997; van der Meer et al., 2010; Wiener, 1993) was taken as initial evidence to support an actor–critic explanation.

Computational models suggest that the critic compares the outcome of the actor's action against the expected value based on past experience. If there is a discrepancy between predicted and actual rewards (i.e., a reward prediction error), a temporal difference reinforcement signal is used to update the value signal in memory. Future actions are then selected according to whether they are expected to produce the maximally valued reward.
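A minimal actor–critic sketch may help fix ideas. It is written under our own simplifying assumptions (a single choice point with two actions, a softmax actor, and hypothetical payoffs), not as a claim about the specific models cited above. The critic maintains the value estimate and broadcasts the temporal difference error; the same error signal both refines the critic and adjusts the actor's action preferences:

    import math, random

    prefs = {"left": 0.0, "right": 0.0}  # actor: action preferences
    state_value = 0.0                    # critic: expected reward here
    alpha_critic, alpha_actor = 0.1, 0.1

    def choose():
        # Softmax over preferences: better-evaluated actions are sampled
        # more often, but exploration never disappears entirely.
        weights = {a: math.exp(p) for a, p in prefs.items()}
        r = random.random() * sum(weights.values())
        for action, weight in weights.items():
            r -= weight
            if r <= 0:
                return action
        return action

    for trial in range(500):
        action = choose()
        # Hypothetical payoff: 'right' is rewarded on 80% of trials.
        reward = 1.0 if (action == "right" and random.random() < 0.8) else 0.0
        # The critic computes the reward prediction error (the TD signal)...
        delta = reward - state_value
        # ...uses it to refine its own value estimate...
        state_value += alpha_critic * delta
        # ...and the same signal trains the actor: actions followed by
        # better-than-expected outcomes become more likely to be repeated.
        prefs[action] += alpha_actor * delta

    print(prefs)  # the preference for 'right' grows relative to 'left'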

The striatum has received much attention as the locus of the actor–critic function (e.g., Joel et al., 2002). The lateral dorsal striatum is often considered to mediate stimulus–response or habit learning, while the ventral striatum and medial dorsal striatum are thought of as evaluators of the outcomes of actions (see Section 6). Thus, many view the actor–critic networks as corresponding to the lateral dorsal striatum and the ventral/medial dorsal striatum, respectively (e.g., van der Meer and Redish, 2011; van der Meer et al., 2010). Since reward prediction error signals are coded by dopamine neurons as well (Khamassi et al., 2008; O'Doherty et al., 2004; Schultz, 1997), dopamine neurons may also contribute to analysis by the critic. Others suggest that there are multiple actor–critic functional modules within the striatum, corresponding to the matrix–patch cellular subdivisions that run through both dorsal and ventral striatum (Houk, 1995). While the issue of localization remains to be resolved, it is becoming clearer that the neurocircuitry underlying critic functions extends across, at least, the dopaminergic–striatal circuitry (see Section 6).

It is worth noting that, as appealing as the temporal difference model is, it cannot represent the full picture of how reinforcement outcomes are determined. This is because reward is often delayed, and can be separated from the action for which it was rewarded by other, irrelevant actions. Such a delay creates an accountability problem referred to as the problem of 'temporal credit assignment' (Sutton and Barto, 1998). Studies of goal-directed navigation could be particularly useful for determining how the brain naturally solves temporal credit assignment: one can imagine a case in which an animal has to make a decision at, for example, a 'fork in the road'. After enacting a decision about which way to turn, a number of pathways may become available, the selection of any one of which will lead to the goal (see Fig. 3). The next time the animal encounters the 'fork in the road', it will have to remember which of the many subsequent alternatives led to the desired goal.
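In the computational literature, one standard remedy for temporal credit assignment is the eligibility trace of the TD(lambda) family (Sutton and Barto, 1998): each recently visited state carries a decaying 'tag', so that a delayed prediction error can still update the states and choices that preceded it. A minimal sketch, under our own assumption of a linear corridor of states with reward only at the end:

    # States form a corridor; reward arrives only at the final state, yet
    # eligibility traces let the terminal prediction error update the
    # values of all the earlier states the agent passed through.
    states = ["fork", "A", "B", "goal"]
    values = {s: 0.0 for s in states}
    alpha, gamma, lam = 0.1, 0.9, 0.8  # learning rate, discount, trace decay

    for episode in range(100):
        trace = {s: 0.0 for s in states}
        for i, s in enumerate(states[:-1]):
            next_s = states[i + 1]
            reward = 1.0 if next_s == "goal" else 0.0
            delta = reward + gamma * values[next_s] - values[s]
            trace[s] += 1.0  # the current state becomes fully eligible
            # Every eligible state shares in the correction, scaled by how
            # recently it was visited; traces then decay toward zero.
            for st in states:
                values[st] += alpha * delta * trace[st]
                trace[st] *= gamma * lam
            # 'goal' is terminal in this sketch, so its value stays 0.

    print(values)  # value propagates back from 'goal' toward 'fork'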

4.2. Dopamine and reinforcement learning

A critical and unresolved issue is how the brain implements reinforcement learning algorithms. In a series of pioneering studies conducted in non-human primates, Schultz et al. (1997) provided evidence that one of the primary neural correlates of reinforcement learning theory may reside in the signal provided by midbrain dopamine neurons. Dopamine neurons respond with phasic bursts of action potentials when an unexpected reward is delivered, and also respond to conditioned cues that predict reward (Ljungberg et al., 1992; Mirenowicz and Schultz, 1994). When, however, an expected event or reward does not occur, the activity of some putative dopamine cells tends to be inhibited. Thus, a reward that is better than predicted can generate a positive prediction error, a fully predicted reward elicits no error, and a reward that is worse than predicted can elicit a negative prediction error (e.g., Bayer and Glimcher, 2005; Hollerman and Schultz, 1998; Hollerman et al., 1998). In this way, dopamine acts as a teaching signal that enables the use of flexible behaviors during learning (Schultz and Dickinson, 2000), and facilitates motivated behaviors by signaling the salience of environmental stimuli, such as cues that predict food (Berridge and Robinson, 1998; Flagel et al., 2011; Salamone and Correa, 2002). In addition, the prediction error signal appears to take into account the behavioral context in which rewards are obtained (Nakahara et al., 2004).

The prediction error hypothesis has garnered a great deal of attention since it was first proposed because it is exactly the kind of teaching signal that figures prominently in many models of learning, including the Rescorla–Wagner model and the temporal difference reinforcement learning algorithm (Rescorla and Wagner, 1972; Sutton and Barto, 1998; Sutton, 1988). There is, however, evidence that dopamine may also function in other capacities to facilitate learning. For example, while most conceptualizations focus on reward-related signaling in the positive sense, there is also evidence that a subpopulation of dopamine neurons exhibits phasic responses to aversive stimuli or to cues that predict aversive events (e.g., Brischoux et al., 2009; Joshua et al., 2008; Matsumoto and Hikosaka, 2009; Zweifel et al., 2011). In addition, there are data suggesting that dopamine may provide a reward risk signal (Fiorillo et al., 2003), and also signal non-rewarding salient events, such as surprising or novel stimuli (Redgrave and Gurney, 2006). Thus, a broader conceptualization of the role of dopamine in learning has emerged (e.g., Berridge, 2007; Bromberg-Martin et al., 2010; Redgrave and Gurney, 2006; Redgrave et al., 1999b; Salamone, 2007; Wise, 2006). Based on a growing body of experimental evidence suggesting that different subgroups of neurons within the midbrain respond differentially to reward, aversive stimuli, and novelty, Bromberg-Martin et al. (2010) suggest that some dopamine neurons encode reward value, necessary for reward seeking and value learning, while others encode motivational salience, necessary for orienting and general motivation.

One hypothesis about how dopamine supports reinforcement learning is that it adjusts the strength of synaptic connections between neurons according to a modified Hebbian rule ('neurons that fire together wire together'; Hebb, 1949). Conceptually, if cell A activates cell B, and cell B results in an action that is rewarded, dopamine is released and the A-to-B connection is reinforced (Montague et al., 1996; Schultz, 1998a,b). With enough experience, this mechanism would allow an organism to learn the optimal choice of actions to gain reward. In fact, dopamine has been shown to facilitate synaptic plasticity in several mnemonic brain structures (Frank, 2005; Goto et al., 2010; Lisman and Grace, 2005; Marowsky et al., 2005; Molina-Luna et al., 2009; Surmeier et al., 2010). The precise information transmitted when dopamine cells fire is not clear. To address this issue, it is necessary to understand the firing patterns of dopamine neurons, and the factors that regulate these patterns. Dopamine signals occur in two modes, a tonic mode and a phasic mode (Grace, 1991; Grace et al., 2007). Tonic dopaminergic signaling maintains a steady baseline level of dopamine in afferent structures. While a precise functional role for the tonic dopamine signal has not yet been established (Ostlund et al., 2011), one intriguing hypothesis is that tonic dopamine may represent the 'net value' of rewards, and underlie the vigor with which responses are made (Niv et al., 2007). Phasic dopamine, on the other hand, is the dopaminergic signal that is thought to do the heavy lifting, at least in terms of reward processing (Schultz, 1997; Schultz et al., 1997; Wise, 2005) and the incentive salience that promotes reward seeking (Berridge and Robinson, 1998). Dopamine may have unique effects across different efferent targets, however, since (a) the regulation of tonic vs. phasic activation of dopamine cells is controlled by an array of diverse inputs, and (b) dopamine efferent systems express different levels and types of dopamine receptors. Important for the present discussion, both the ventral tegmental area (VTA) and the substantia nigra pars compacta (SNc) project to the hippocampus and to the striatum, two brain structures frequently discussed in terms of goal-directed navigation and learning. How dopamine contributes to information processing within these structures during navigation-based learning is discussed in the following sections.
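The modified Hebbian idea can be sketched as a 'three-factor' rule in which the change in a synaptic weight depends on presynaptic activity, postsynaptic activity, and a dopamine-like reward signal. The toy code below is our own conceptual illustration (all names and numbers are hypothetical), not a biophysical model:

    import random

    w = {"A->B": 0.2, "A->C": 0.2}  # weights of two synapses from cell A
    eta = 0.1                        # learning rate

    for trial in range(200):
        # Cell A fires (presynaptic activity); one downstream cell is
        # recruited (postsynaptic activity), with probability proportional
        # to its current synaptic weight.
        total = sum(w.values())
        active = "A->B" if random.random() < w["A->B"] / total else "A->C"
        # Suppose only the action driven by cell B is rewarded (hypothetical).
        dopamine = 1.0 if active == "A->B" else 0.0
        # Three-factor rule: the pre/post coincidence at the active synapse
        # produces a lasting weight change only when gated by dopamine.
        w[active] += eta * dopamine * (1.0 - w[active])  # saturates at 1.0

    print(w)  # the rewarded pathway A->B strengthens; A->C does not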

Fig. 4. Flow of cortical information to hippocampus (schematic: neocortex → parahippocampal cortex (postrhinal and perirhinal cortices) → medial and lateral entorhinal cortices → hippocampal formation (dentate gyrus, CA3, CA1, subiculum), with spatial and nonspatial streams labeled). Multimodal sensory, motor, and associative information arrives in the hippocampus primarily through the parahippocampal cortex. The anatomically distinct medial entorhinal cortex and lateral entorhinal cortex receive spatial and nonspatial information from distinct adjacent cortical regions: the postrhinal cortex (spatial), which receives input from the parietal and retrosplenial cortices (not shown), and the perirhinal cortex (nonspatial), respectively. Both entorhinal cortical regions, in turn, project to the dentate gyrus, CA3, CA1, and subicular regions of hippocampus proper. Although all intrahippocampal regions receive neocortical input, each is thought to make a distinct contribution to the determination of context saliency as context information passes through from the dentate gyrus to the subiculum. The red arrow refers to the large recurrent excitatory system found amongst CA3 neurons. Presumably this unique pattern allows information to be held on-line for brief periods.

Fig. 5. Spatial representations in the hippocampal system (panel A: place cells, shown on a schematic of CA3 and CA1; panel B: grid cells; panel C: head direction cells). (A) Schematic illustration of location-selective firing by a hippocampal CA1 place cell (red) and a hippocampal CA3 place cell (blue). As shown, CA3 place fields tend to be more spatially constricted than CA1 place fields. Also, place fields typically show a Gaussian distribution of firing as an animal traverses the place field. (B) Entorhinal cortex contains cells that show regularly spaced location-selective firing. These are referred to as grid cells, as the firing fields can be viewed as vertices of a grid that covers a particular environment. (C) A third type of spatial representation is one that relays information about the directional heading of an animal. In this example, the arrows indicate the preferred orientation direction of a cell: if the animal orients its head in the northeast direction of the environment (from any location), the cell will preferentially fire. Typically, when the rat orients its head in other directions, a head direction cell will not fire.

5. The neurobiology of reinforcement learning and goal-directed navigation: hippocampal contributions

The previous discussion clearly illustrates the central role of dopamine in decision-making processes that lead to effective learning. In this section, we first describe the hippocampal neural circuit whose dynamic and interactive functions form the substrate on which the dopamine system acts, then discuss how this circuit guides decision making (and ultimately learning) by identifying the saliency of a context (i.e., whether a familiar context has changed or if the current context is novel). Both instances of context analysis may rely on the same computation.

5.1. Hippocampal place fields as spatial context representations

The hippocampal complex is comprised of hippocampus proper and the surrounding parahippocampal cortex. Generally speaking, there are two tracks of information flow into the hippocampus from the neocortex (see Fig. 4). Spatial information arrives from the postrhinal region to the medial entorhinal area of posterior cortex. In contrast, predominantly nonspatial information is passed from the perirhinal cortex to the lateral entorhinal cortex. Both entorhinal cortices in turn project to all of the subregions of hippocampus proper (which includes the dentate gyrus, CA3, CA1, and subicular areas; Amaral and Lavenex, 2006; Burwell, 2000; Burwell and Amaral, 1998a,b; Van Strien et al., 2009).

Single unit recording studies have generated foundational information for theories of hippocampal function. The most commonly reported behavioral correlate of hippocampal output neurons (pyramidal cells) is location-selective firing, referred to as place fields (see Fig. 5 for an example; O'Keefe and Dostrovsky, 1971).

The seminal discovery that hippocampal pyramidal neurons exhibit remarkably distinct and reliable firing when rats visit particular regions of the environment led to a widely held view of the hippocampus as a cognitive map (O'Keefe and Nadel, 1978a,b). Decades of research (for reviews see McNaughton et al., 1996; Mizumori et al., 1999; Muller et al., 1996; O'Keefe, 1976; O'Mara, 1995; Wiener, 1996) clearly demonstrate that place fields reflect more than details of the current external sensory surround, since they are observed when external cues are essentially absent (McNaughton et al., 1996; O'Keefe and Conway, 1978; Quirk et al., 1990). Further, in the absence of external sensory cues, temporal or internal sensory cue information has been shown to shape the characteristics of place fields. For instance, the elapsed time since leaving a goal box can often be a better predictor of place fields than the external features of an environment (Gothard et al., 1996; Redish et al., 2000). Also, internally generated sensory and motion information about one's own behavior impacts place fields: the velocity of an animal's movement through a place field, the direction in which rats traverse a place field, and vestibular (or inertial) information have been shown to be correlated with place cell firing rates (e.g., Gavrilov et al., 1998; Hill and Best, 1981; Knierim et al., 1995; Markus et al., 1994; McNaughton et al., 1983; Wiener et al., 1995). Evidence indicates that the location selectivity of place fields is positively related to the degree of sensitivity to internally generated cues: for example, the extent to which place fields are sensitive to internally generated cues systematically declines from the septal pole to the temporal pole of the hippocampus (Maurer et al., 2005), and place fields become increasingly larger for place cells recorded along the dorsal-to-ventral axis (e.g., Jung et al., 1994). Also supporting the conclusion that (at least dorsal) hippocampal place fields represent egocentric information are findings that the degree to which animals are free to move about in an environment predicts place field specificity (Foster et al., 1989; Gavrilov et al., 1998; Song et al., 2005). Compared to passive movement conditions, in which rats are made to go through a place field either by being held by the experimenter or by being placed on a moveable robotic device, active and unrestrained movement corresponds to the observation of more selective and reliable place fields (Terrazas et al., 2005). The fact that neural representations in the brain are so dramatically affected by voluntary and active navigation provides a compelling argument for studying not only learning, but also decision making, in animals that navigate spatially extended environments.

One interpretation of the sensitivity of place fields to both egocentric and allocentric information is that it allows rats to rapidly switch between multiple cue sources, thereby ensuring continuously adaptive choices (e.g., Etienne and Jeffery, 2004; Gavrilov et al., 1998; Knierim et al., 1995; Maurer et al., 2005; McNaughton et al., 1996; Mizumori et al., 2000; Mizumori, 2008; Whishaw and Gorny, 1999). Such an ability seems advantageous in a constantly changing environment. Which changes in conditions lead to a decision to switch strategies, however, remains to be determined.

To identify motivational or mnemonic, rather than sensory or behavioral state, influences on place fields, rats can be trained to solve a maze task under conditions in which the external sensory environment and the behavioral requirements of the task are held constant while the internal state or the specific memory used to guide behavior is manipulated by the experimenter (e.g., Frank et al., 2000; Kelemen and Fenton, 2010; Smith and Mizumori, 2006a,b; Wood et al., 2000; Yeshenko et al., 2004). Under these test conditions, place field representation of sensory and behavioral information can be conditional upon an animal’s motivational state (e.g., hungry or thirsty; Kennedy and Shapiro, 2004), as well as upon recent (retrospective coding) or upcoming (prospective coding) events such as behavioral sequences or response trajectories (Buzsaki, 1989; Fenton and Muller, 1998; Ferbinteanu and Shapiro, 2003; Ferbinteanu et al., 2011; Foster and Wilson, 2006; Frank et al., 2000; Lee and Wilson, 2002; Louie and Wilson, 2001; Olypher et al., 2002; Pennartz et al., 2002; Touretzky and Redish, 1996; Redish, 1999; Wilson and McNaughton, 1994; Wood et al., 2000; Yeshenko et al., 2004). Additional reports provide evidence that place fields reflect expectations based on learned reward information (e.g., Jackson and Redish, 2007). Place fields have been observed to move closer to goal locations as animals gain more experience receiving rewards at the goal (Hollup et al., 2001; Lenck-Santini et al., 2001, 2002). Further, when compared to times of random foraging, a larger proportion of hippocampal neurons exhibit reward responsiveness when rats are explicitly trained to discriminate reward locations (Smith and Mizumori, 2006b). Thus, an animal’s motivational state, its expectations, and its successful behavioral outcomes all contribute to how learning-related brain structures code information that is directly relevant to future decisions and behavioral choices.

Place fields, then, appear to represent a matrix of information that includes location-selective salient features such as external and internal sensory information, an animal’s past, present, and future behaviors relative to the target location, as well as the expected consequences of behaviors. This sort of complex representation has been taken as evidence that during active navigation, the hippocampus represents spatially organized contextual information, perhaps for the purpose of determining the salience of the current context. Context saliency refers not only to the significance of currently existing contextual features, but also to the extent to which the expected contextual features have changed (e.g., Kubie and Ranck, 1983; Mizumori et al., 1999, 2000; Mizumori, 2008; Nadel and Payne, 2002; Nadel and Wilner, 1980). This conclusion is consistent with a literature documenting the impact of hippocampal lesions on animals’ use of contextual information (for reviews see Anagnostaras et al., 2001; Maren, 2001; Myers and Gluck, 1994). For example, subjects with hippocampal damage do not exhibit conditioned fear responses to contextual stimuli even though responses to discrete conditional stimuli remain intact (Kim and Fanselow, 1992; Phillips and LeDoux, 1992). While intact subjects exhibit decrements in conditioned responding when the context is altered, subjects with lesions of the hippocampus (Penick and Solomon, 1991) or the entorhinal cortex (Freeman et al., 1997) do not. These findings converge on the hypothesis that the hippocampus is important for determining context saliency.

It is important to note that a context processing interpretation of hippocampal neural representations is entirely consistent with a number of hypotheses that have been put forth to account for hippocampal contributions to learning, including spatial processing (e.g., Long and Kesner, 1996; O’Keefe and Nadel, 1978a,b; Poucet, 1993), working memory (Olton et al., 1979), relational learning (Eichenbaum and Cohen, 2001), episodic memory (e.g., Tulving, 2002), context processing (e.g., Hirsh, 1974), declarative memory (Squire, 1994), and the encoding of experiences in general (Moscovitch et al., 2005). It is consistent with these other theories because context analysis represents a fundamental computation of the hippocampus that underlies relational learning as well as episodic, working, and declarative memory (e.g., Mizumori, 2008).

5.2. The hippocampus distinguishes contexts during navigation

The literature shows that place cells are simultaneously responsive to, and thus presumably encode, a combination of different context-defining features: spatial information (i.e., location and heading direction), consequential information (i.e., reward), current movement-related information (i.e., velocity and acceleration, determinants of response trajectory), external (nonspatial) sensory information, the currently active memory (defined operationally in terms of task strategy and/or task phase), and the current motivational state. Thus, place fields are considered to be spatial context representations, and it has been suggested that they code the extent to which familiar contexts change (Nadel and Payne, 2002; Nadel and Wilner, 1980), perhaps by performing a match–mismatch comparison of expected and actual context features (e.g., Anderson and Jeffery, 2003; Jeffery et al., 2004; Mizumori et al., 1999, 2000; Vinogradova, 1995). The results of match–mismatch comparisons can serve as a metric for determining the saliency of the current context, and this in turn should be directly related to an animal’s ability to distinguish contexts. Such a discrimination function seems necessary for the hippocampus to define significant events or episodes (as defined by Tulving, 2002). Analogous to what has been described by others (e.g., Hasselmo, 2005a,b; Hasselmo and McGaughy, 2004; Lisman, 1999; Mizumori, 2008; Smith and Mizumori, 2006a,b; Treves, 2004; Wang and Morris, 2010), the process of comparing expected and actual contexts should be automatic in nature, because a change in a context can happen often or at unexpected times during natural foraging. By continually determining context saliency (i.e., always computing whether a context has changed), the hippocampus can immediately alert other neural systems when a change does occur. In this way, the hippocampus contributes to rapid learning of new information and the optimal implementation of adaptive choices and behaviors.
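
To make the proposed match–mismatch computation concrete, the following minimal sketch expresses it as a weighted distance between expected and observed context feature vectors; all feature names, values, and weights are hypothetical illustrations, not quantities taken from the studies cited above.

    import numpy as np

    def context_saliency(expected, observed, weights=None):
        """Return a scalar mismatch ('context prediction error') between
        an expected and an observed context; larger values indicate a
        more salient change in the context."""
        expected = np.asarray(expected, dtype=float)
        observed = np.asarray(observed, dtype=float)
        if weights is None:
            weights = np.ones_like(expected)  # weight all features equally
        # Weighted squared mismatch summed over context-defining features
        return float(np.sum(weights * (observed - expected) ** 2))

    # Hypothetical features: [location, heading, reward, task phase]
    expected = [0.8, 0.2, 1.0, 0.0]
    observed = [0.8, 0.2, 0.0, 0.0]  # reward omitted on this pass
    print(context_saliency(expected, observed))  # mismatch driven by the reward feature

On this reading, a zero output corresponds to a fully expected context, while a large output would be passed on as a context prediction error of the kind discussed below.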

What is the underlying neural circuitry that discriminates contexts? The Context Discrimination Hypothesis (Mizumori, 2008; Smith and Mizumori, 2006a) emphasizes the importance of representing integrated sensory, motivational, response, and memorial input. Indeed, place fields represent such integrated information. The relative strengths of these four types of inputs may vary depending on task demands, such that a given cell may show, for example, a place correlate during the performance of one task and a nonspatial correlate during the performance of a different task (e.g., Wiener et al., 1989). Also, movement correlates observed in one task may not be observed when the memory component of the context, and not behavior, changes (e.g., Yeshenko et al., 2004). It should be noted that context discrimination by hippocampal neurons is observed not only during performance of spatial tasks, but also during nonspatial task performance such as olfactory (e.g., Wiener et al., 1989) or auditory discrimination (Freeman et al., 1996; Sakurai, 1994). Thus, context discrimination may be a basic hippocampal operation that can be universally applied to facilitate decision making, enhance learning, and/or strengthen any sort of memory that uses context information. As such, it is important to understand how context discrimination is accomplished at a neural level, since this should help us to understand the types of contextual information that come to impact future decisions. The following summarizes the neural circuitry that may be responsible for the determination of context saliency by hippocampal neurons.

5.3. Cellular and network mechanisms underlying hippocampal context processing

Determining context saliency likely involves a number of stages of processing within different synaptic regions of the hippocampus (Fig. 4). The following discussion describes how these various stages of processing may result in an assessment of context saliency, beginning with context representation by individual neurons.

The relative influence of context-defining input on the discharge rates of place (pyramidal) cells and interneurons may vary not only according to the strength of each type of afferent input, but also according to the intrinsic (membrane) properties of a cell. Place cells exhibit characteristic short-lasting, high frequency bursts of action potentials when a rat passes through a cell’s place field (Ranck, 1973). This type of phasic, burst firing pattern is thought to be associated with increased synaptic plasticity (Martin et al., 2000), as well as with the encoding of discrete features of a situation that do not change very rapidly or often (e.g., significant locations, reward expectations, task phase). Interneurons, on the other hand, discharge continuously and at high rates, a pattern that is well suited to encode rapidly and continuously changing features, such as changes in movement and orientation during task performance. The combination of context features and the potential for temporally patterned discharge by both pyramidal cells and interneurons, then, provides the hippocampus with a rich array of rate and temporal neural codes to use in the determination of context saliency (Mizumori et al., 1999; Mizumori, 2008).

It is often reported that place fields rapidly reorganize (i.e., change field location and/or firing rate within the field) when an environmental context is altered. Notably, however, unless an animal is tested in a completely novel environment, one also finds a group of place fields that are unchanged following a change in the context. Thus, there seem to be two forms of context representation in the hippocampus. The place fields that reorganize after context modification may reflect current contextual features, while the place fields that persist when a context changes may reflect the expected contextual features. In principle, a novel environment would not generate expectations, resulting in ‘complete reorganization’, where 100% of the cells exhibit new place field properties. However, when an animal experiences a change in a familiar context, one observes what is referred to as ‘partial reorganization’, in which only a subset of place fields show altered properties (for review, see Colgin et al., 2008). To explain the latter, it is helpful to recall that any context representation, almost by definition, reflects a unique array of inputs. In theory, then, a change in any one feature, or in a combination of features, could result in the production of an ‘error’ signal that reflects a mismatch between expected and actual context features (Mizumori et al., 2000). If such a ‘context prediction error’ occurs, then the output message from hippocampus should reflect this fact. Such a signal may be sent to update cortical memory circuits, which in turn leads to an update of the most recent hippocampal expectation for a context. A hippocampal output that signals a context prediction error may also be sent to the ventral striatum to engage the critic function of the actor–critic system (described in more detail in Section 4.1). Further, a context error message should update the selection of ongoing behaviors by informing basal ganglia circuitry. If it is determined that the context has not changed (i.e., there is no place field reorganization), a consistent hippocampal output will result in the persistence and strengthening of currently active neural activity patterns, which in turn maintains the same expectation information in hippocampus, and the same behavioral expression patterns.

It is intriguing to note that the proposed error analysis by hippocampus is analogous to the prediction error signals that dopamine cells generate when an expected reward is not realized. It is known from studies of dopamine cells that the magnitude of the prediction error signal depends in part on the certainty and saliency of reward (Fiorillo et al., 2003; Mirenowicz and Schultz, 1994; Schultz, 1997; Schultz et al., 1997): the less certain it is that a reward will be found, the smaller the magnitude of the prediction error signal. When this idea is applied to our understanding of place field reorganization, one could argue that whether a place field reorganizes depends on the strength of memory expectations. A strong expectation signal to some cells may result in a high threshold for generating error signals, i.e., for place field reorganization. These cells would tend to show persistent place fields when there is a minor context shift. Such a condition may apply to CA1. Other cells may not receive such a strong expectation signal, resulting in place field reorganization following even minor changes in context, as is observed for CA3 place fields.
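
The expectation-dependent threshold proposed here can be sketched in the same illustrative style (hypothetical names and values throughout): a strong expectation signal raises the bar for declaring a mismatch, so the stored context template, and by analogy the place field, persists across minor context shifts (CA1-like), while a weak expectation signal lets even small changes trigger reorganization (CA3-like).

    def update_context_template(template, observed, expectation_strength,
                                base_threshold=0.5, lr=1.0):
        """Gate 'reorganization' of a stored context template on an
        expectation-dependent mismatch threshold (illustrative only)."""
        error = context_saliency(template, observed)  # sketch above
        threshold = base_threshold * expectation_strength
        if error > threshold:
            # Context prediction error: revise the template toward the
            # observed context (analogous to place field reorganization).
            template = [t + lr * (o - t) for t, o in zip(template, observed)]
        # Otherwise the existing representation persists (and, on the
        # proposal above, is strengthened by consistent output).
        return template, error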

With the introduction of new technologies and clever experimentation by a large number of researchers, a neurobiological model of hippocampal function has emerged that describes mechanisms involved in determining the saliency of a context. The process of context comparison begins by identifying the relevant stimuli and memories (or expectations). The dentate gyrus is thought to engage in pattern separation functions that might serve this purpose by distinguishing between similar, potentially important inputs (Gilbert et al., 2001; Leutgeb et al., 2007; O’Reilly and McClelland, 1994; Rolls, 1996). Specifically, dentate gyrus place fields tend to be smaller (i.e., more spatially localized) than either CA3 or CA1 place fields, and they show the most immediate response to context changes. Also, the fact that there is tremendous convergence of input from the dentate gyrus to the CA3 region (Amaral et al., 1990) further suggests that the dentate gyrus filters, or separates patterns of, information for subsequent hippocampal processing. The transformation of CA3 place fields to downstream CA1 place fields is currently enigmatic, since the connections are direct yet there are clear differences in the properties of CA3 and CA1 place fields.
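
Pattern separation of this sort is often illustrated computationally as sparsification: projecting an input onto a much larger population and keeping only the most strongly driven units, so that similar inputs map onto less-overlapping output patterns. The sketch below is a generic cartoon of this idea (random projection, arbitrary sizes), not a model of the dentate gyrus data cited above.

    import numpy as np

    rng = np.random.default_rng(0)

    def sparsify(pattern, projection, k=10):
        """Winner-take-all sparsification: only the k most strongly
        driven output units fire."""
        drive = projection @ pattern
        out = np.zeros_like(drive)
        out[np.argsort(drive)[-k:]] = 1.0
        return out

    def overlap(a, b):
        """Normalized overlap between two activity patterns."""
        return float(a @ b) / max(float(np.sqrt((a @ a) * (b @ b))), 1e-9)

    projection = rng.normal(size=(200, 50))   # expansion onto many units
    x1 = rng.normal(size=50)
    x2 = x1 + 0.5 * rng.normal(size=50)       # a similar input pattern

    print(overlap(x1, x2))                                              # high input overlap
    print(overlap(sparsify(x1, projection), sparsify(x2, projection)))  # typically lower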

5.3.1. CA3 and CA1 place field contributions to the evaluation of context

Hippocampal-based context evaluations require representation of both expected and current context information. There is ample evidence that both CA1 and CA3 place fields represent both expected and current contextual information. However, recent data suggest that the contributions made by CA3 and CA1 place cells differ. When rats perform at asymptotic levels on hippocampal-dependent spatial memory tasks, CA3 place fields are smaller than CA1 place fields, and more easily disrupted following cue manipulations (Barnes et al., 1990; Guzowski et al., 2004; Mizumori, 2006; Mizumori et al., 1989b, 1999). CA3 place fields are generally more labile than CA1 place fields in that they are also more easily disrupted following reversible inactivation of the medial septum (Mizumori et al., 1989a). The greater sensitivity of CA3 fields to changed inputs seems to occur regardless of the type of task being used (Lee et al., 2004; Leutgeb et al., 2004). This may indicate that CA3 place fields are more exclusively linked than CA1 place fields to the currently active spatial coordinate system (i.e., a map; Leutgeb et al., 2007). As such, CA3 is better suited than CA1 to distinguish the contextual significance of absolute locations in space, a process that presumably relies on small differences in input configurations at different locations. This function is likely related to the key role that CA3 plays in the rapid acquisition of new memories (Kesner, 2007; Miyashita et al., 2009), a conclusion that is consistent with a vast literature on the importance of hippocampus for new learning (Mizumori et al., 2007b).

If CA3 is the brain area where context novelty is identified, then one would expect CA3 to also represent information that defines the baseline expectations from which novelty (i.e., unexpected information) is determined. In this regard, it is worth noting that despite the greater overall sensitivity of CA3 place fields to changes in contextual information, a subpopulation of CA3 place fields continues to persist when faced with contextual changes in familiar environments (Mizumori et al., 1999). Novelty detection requires a mechanism by which baseline and new information can be held briefly on-line so that the expected and current information can be compared. The intrinsic circuitry of CA3 is one that can hold information on-line: less than one-third of its inputs come from outside of CA3 (Amaral and Lavenex, 2006), and the most prominent input to CA3 pyramidal cells comes from the CA3 cells themselves. The recurrent networks of the CA3 region may support the short-term buffer that is postulated to be needed to determine whether specific features of the current context match expected contextual features (e.g., Gold and Kesner, 2005; Guzowski et al., 2004; Treves, 2004).

CA1 also seems to represent current and expected contextual information but, relative to CA3, a greater proportion of its cells show persistent place fields despite changes in a familiar context (e.g., Lee et al., 2004; Leutgeb et al., 2004; Mizumori et al., 1989b, 1999). CA1 place fields also show more discordant responses to context change than CA3 fields (Lee et al., 2004), which may reflect the fact that CA3 is driven in large part by recurrent collaterals while CA1 is not. Further, as noted above, CA3 may be more strongly tied to a spatial coordinate system than CA1, and perhaps this accounts for the common findings that CA3 place fields tend to be smaller in size relative to CA1 place fields, and that more CA1 than CA3 place cells show ‘split fields’, i.e., more than one location that elicits elevated firing. All of the above differences suggest that CA1 place fields do not convey location or sensory information as precise as that of CA3 place fields, and consequently they may include more nonspatial information within their neural code (Mizumori et al., 2000; Wiener et al., 1989). Henriksen and colleagues (2010) further suggest that the extent to which CA1 conveys spatial and nonspatial information varies depending on the location of the CA1 place cell being recorded: distal (closest to subiculum) CA1 neurons show stronger spatial codes than proximal CA1 place neurons.

A difference in the ratio of spatial to nonspatial information coded by CA3 and CA1 place fields may be accounted for by their different patterns of afferent input. For example, nonspatial context-defining information may arrive directly in CA1 via layer III entorhinal input. By comparison, CA3 receives its direct entorhinal cortex input from layer II (Witter et al., 2000), which seems to contain more neural codes for explicit spatial features than layer III. If some of the nonspatial input to CA1 includes memory-defined expectations, then this may account for a greater proportion of CA1 place fields showing stability across minor shifts in context.

If CA3 is primarily responsible for the comparison of contextual information, then what function does CA1 serve? Many have suggested that CA1 is especially important for temporally organizing or sequencing information (e.g., Gilbert et al., 2001; Hampson et al., 1993; Hoge and Kesner, 2007; Kesner et al., 2004; Olton et al., 1979; Rawlins, 1985; Treves, 2004; Wiener et al., 1995). That is, CA1 place cells may temporally organize, or define, CA3 output such that meaningful epochs of related information are passed on to efferent targets, such as the prefrontal cortex (Jay et al., 1989) and subiculum, to impact future behavioral choices. Neocortical-based memory representations may, via direct entorhinal input to CA1 (Witter et al., 2000), predispose CA1 to temporally organize CA3-based information in experience-dependent ways (Mizumori et al., 1999). Although the precise nature of this temporal organization remains to be determined, CA1 cells appear to be more tightly coupled than CA3 cells to the rhythmic oscillations of the hippocampal EEG (Buzsaki, 2005; Buzsaki and Chrobak, 2005).

5.3.2. Temporal encoding of spatial contextual information

It is becoming clearer that important context information is embedded within the temporal organization of intrahippocampal networks. Many years ago, it was shown that movement through place fields is associated with dynamic changes in spike timing relative to the ongoing theta oscillations in the EEG (O’Keefe and Recce, 1993). That is, on a single pass through a field, the first spike of successive bursts of spikes occurs at progressively earlier phases of the theta cycle. The discovery of this so-called ‘phase precession’ effect is considered significant because it was the first clear evidence that place cells are part of a temporal code that could contribute to the mnemonic processes of the hippocampus. Changes in this sort of temporally organized spiking may be a key mechanism by which place fields provide a link between the temporally extended behaviors of an animal and the comparatively rapid synaptic plasticity mechanisms that are thought to subserve learning (e.g., Skaggs et al., 1996). Theoretical models have been generated to explain in more detail how phase precession could link predictive and sequential behaviors to neural plasticity mechanisms (Buzsaki, 2005; Buzsaki and Chrobak, 2005; Jensen and Lisman, 1996; Lisman and Redish, 2009; Zugaro et al., 2005).
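
Phase precession is often summarized with a simple phenomenological description: firing phase advances roughly linearly with the fraction of the field already traversed. The sketch below encodes that description with made-up parameter values; it is not a reproduction of any specific published model.

    import numpy as np

    def spike_phase(position, field_start, field_width,
                    entry_phase=np.pi, total_precession=np.pi):
        """Theta phase of spiking as a function of position in the place
        field: phase advances (decreases) approximately linearly as the
        animal moves through the field."""
        frac = np.clip((position - field_start) / field_width, 0.0, 1.0)
        return entry_phase - total_precession * frac

    positions = np.linspace(0.0, 0.3, 7)  # one pass through a 30 cm field
    print(spike_phase(positions, field_start=0.0, field_width=0.3))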

Another form of temporal-based neuroplasticity involves a change in the timing of spike discharge by one cell relative to that of other cells. For example, theta recorded from CA1 and CA3 tends to be more cohesive when rats pass through the stem region of a T-maze, presumably reflecting greater synchrony of neural firing during times when decisions are made (Montgomery et al., 2009). Greater synchronization could offer a stronger output signal to efferent structures. Experience-dependent temporal codes may also be found in the temporal relationships between the firing of cells with adjacent place fields. With continued exposure to a new environment, place fields begin to expand asymmetrically, in that the peak firing rate is achieved at shorter latency upon entrance into the field (Mehta et al., 1997, 2000). It was postulated that repeated activation of a particular sequence of place cells results in stronger synaptic connections between cells with adjacent fields. Under these conditions, entry into one place field begins to activate the cell with the adjacent place field at shorter and shorter latency. The asymmetric backwards expansion of place fields is thought to provide a neural mechanism for learning directional sequences. Moreover, it has been suggested that the backward expansion phenomenon may contribute to the transformation of a rate code into a temporal code such as that illustrated in phase precession (Mehta et al., 2000). The backward expansion mechanism could also help to explain other place field phenomena, such as the tendency for place cells to fire in anticipation of entering a field within a familiar environment (Muller and Kubie, 1989). While the dynamic changes in place field shape are intriguing, it remains to be determined whether the asymmetric expansion is directly related to spatial learning. Also, there is an intriguing possibility that dopamine may play a key role in coordinating some aspects of the temporal phenomena observed in hippocampus. For example, it has been shown that the temporal coherence of the discharges of place cells is greater in mice with an intact hippocampus than in mice with deficient NMDA systems (McHugh et al., 1996), and there is evidence that dopamine may exert powerful influences in hippocampus via control of NMDA receptor function (e.g., Bethus et al., 2010; Frey et al., 1990). Therefore, it is possible that even though the relative quantity of dopamine innervation in hippocampus is small (Fields et al., 2007), dopamine may have a critical orchestrating role in a hippocampal determination of context salience.
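
The asymmetric strengthening account can be caricatured in a few lines: repeatedly activating cells in a fixed order strengthens only the forward connections, so entry into one field comes to drive the next cell at shorter latency. Cell counts, the learning rate, and the saturating weight cap below are arbitrary illustrations.

    import numpy as np

    def asymmetric_sequence_learning(n_cells=10, n_laps=20, lr=0.05, w_max=1.0):
        """Toy model of the asymmetric strengthening thought to underlie
        backward place field expansion (illustrative only)."""
        w = np.zeros((n_cells, n_cells))
        for _ in range(n_laps):
            for i in range(n_cells - 1):
                # Pre-before-post ordering strengthens only i -> i+1
                w[i, i + 1] = min(w[i, i + 1] + lr, w_max)
        return w

    w = asymmetric_sequence_learning()
    print(w[0, 1], w[1, 0])  # forward weight grows; reverse weight stays zero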

5.3.3. Sources of hippocampal spatial and nonspatial information

Consideration of the sources of the different types of information that enter into hippocampal context-related computations provides keen insight into the stages of processing required to make efficient, context-relevant choices. The parahippocampal region (which includes the perirhinal, postrhinal, and entorhinal cortices; see Fig. 4) is considered to provide the bulk of the spatial and nonspatial sensory information to the hippocampus (Burwell, 2000; Burwell and Amaral, 1998a,b; Eichenbaum and Lipton, 2008; Hunsaker et al., 2007; Knierim et al., 2006; Witter et al., 2000). Generally, spatial information is thought to arrive in the hippocampus via the medial regions of the parahippocampal cortex (i.e., postrhinal cortex and the MEC), since a prominent input to postrhinal cortex is the posterior parietal cortex (Burwell and Amaral, 1998a,b). In contrast, the multimodal temporal cortex of the rat projects nonspatial information to the hippocampus via the lateral parahippocampal regions (i.e., perirhinal cortex and LEC). Both MEC and LEC afferents appear to relay visual, auditory, olfactory and/or tactile sensory information (Burwell and Amaral, 1998a). Thus, the nature of information transmitted within a pathway or brain structure does not reveal how that information is used. [This broad conclusion will be seen to be relevant when the mesoaccumbens system is discussed below.] Also, although the MEC is often considered to be specialized to process spatial information, accurate navigation likely relies on integrated input from both MEC and LEC, since one needs to understand the spatial dimensions of behavior (e.g., location and orientation) relative to salient environmental information. Indeed, contralateral, but not ipsilateral, lesion of the perirhinal cortex and the hippocampus results in impaired object–place association learning (Jo and Lee, 2010).

The recent development of more specific theories of parahippocampal cortical function during active navigation is mainly due to the discovery of multiple types of spatial representation in the MEC (Enomoto and Floresco, 2009; Hafting et al., 2005; Sargolini et al., 2006; Taha et al., 2007), including grid cells and head direction cells (see Fig. 5). Like place cells, grid cells fire when animals traverse specific locations within an environment. However, unlike place cells, grid cells fire relative to a number of small regions arranged in a hexagonal grid rather than in a single region of a given environment. Head direction cells, on the other hand, show elevated firing rates that coincide with a particular head orientation of the rat regardless of the rat’s location. A third population of cells shows both grid and head direction properties, and these are therefore called conjunctive cells. Finally, a fourth class of spatial cell is the border cells found in the medial entorhinal cortex. Head direction cells and border cells are known to also exist in related cortical regions, such as the subiculum, postsubiculum, parasubiculum, and postrhinal cortices (Lever et al., 2009; Taube et al., 1990). There are strong anatomical and functional ties between cells associated with these types of spatial representation, and they are thought to form a coordinated network for orienting an animal in allocentric space.

There are a number of excellent reviews that detail grid field properties (Burgess et al., 2007; Derdikman and Moser, 2010; Moser et al., 2008; Savelli and Knierim, 2010). Briefly, MEC layer II has the highest proportion of grid cells (~50%); layer III has a more diverse blend of grid cells, head direction cells, and conjunctive cells; and head direction cells are the predominant cell type in the deep layers. Nearby grid cells tend to have similar spacing, but their peaks are offset relative to each other. The spacing seems to reflect spatial features of the current environment since, in familiar environments, grid fields will rotate in the direction of cue rotations, and if a familiar environment is widened or narrowed, grid field spacing will resize accordingly (Barry et al., 2007). Across the dorsal–ventral axis, there seems to be a topographically organized increase in the spacing of adjacent grid fields (Enomoto and Floresco, 2009; Hafting et al., 2005). If experimental procedures induce grid field reorganization, different grid fields rotate and translate together. Such cohesion between grid cells, along with the regularity of the grids and their apparently consistent spacing, gives the impression that the grid system is stable across environments and that it might form a blueprint (i.e., a spatial reference frame) onto which the hippocampus can add relevant information. Presumably, the spatial and nonspatial associations in hippocampus derive from convergent input from the MEC and LEC. This associative process must occur fairly rapidly, since hippocampal place fields are observed upon first exposure to a new environment (e.g., Hill, 1978; Muller and Kubie, 1987; O’Keefe and Burgess, 1996; Wilson and McNaughton, 1993). The apparent regularity of the spatial representations within the hippocampal and entorhinal system has been further strengthened by findings that grid fields, head direction preferences, and place fields show a high degree of coherence (e.g., displacement) in response to changes in simple geometric environments (Hargreaves et al., 2007; Lee and Knierim, 2007; Nicola et al., 1996).
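
A common idealization of a grid field, used here purely for illustration and not tied to any particular study cited above, sums three cosine gratings oriented 60 degrees apart; the spacing, orientation, and spatial phase are the free parameters that differ between nearby grid cells.

    import numpy as np

    def grid_rate(x, y, spacing=0.5, orientation=0.0, phase=(0.0, 0.0)):
        """Idealized hexagonal grid-cell firing rate at position (x, y),
        built from three cosine gratings 60 degrees apart."""
        k = 4 * np.pi / (np.sqrt(3) * spacing)  # wave number for the spacing
        angles = orientation + np.array([0.0, np.pi / 3, 2 * np.pi / 3])
        r = sum(np.cos(k * ((x - phase[0]) * np.cos(a) +
                            (y - phase[1]) * np.sin(a)))
                for a in angles)
        return max(r, 0.0)  # rectify to a nonnegative firing rate

    print(grid_rate(0.0, 0.0))   # peak at a grid vertex
    print(grid_rate(0.25, 0.0))  # off-peak location between vertices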

Additional studies, however, suggest that a straightforward description of the relationship between grid and place fields is not likely. Place fields in CA1 continue to reorganize in response to changes in the visuo-spatial environment for periods of time that exceed the period of grid field responses (Van Cauter et al., 2008). Also, place fields have been observed to become more specific after repeated exposure to a familiar environment (Nicola and Malenka, 1998), even after entorhinal cortex lesions. Further, as behavioral tasks have become more complex, so has the nature of grid field responses. Importantly, the hexagonal grid patterns do not appear to persist in more complex environments. When an animal is running along a linear track, the grid patterns reset when the rat turns around (Fyhn et al., 2007), and if a maze contains multiple hairpin turns, the resetting occurs periodically (Hikosaka et al., 2008). Finally, when a linear track 18 m long is used, periodicity is limited to sections of the track (Nicola and Malenka, 1998). These observations imply that the ‘gridness’ of each cell is subject to being organized by ongoing behavior, perhaps separately from place field reorganization. The extent to which other features of a context (e.g., motivation, memory, etc.) similarly impact all spatial representations remains to be determined.

One issue of importance is the assumption that place and grid field reliability and spatial specificity are necessary for optimal decision making during navigation. For place fields, this issue has been addressed in a number of ways (for review see Mizumori et al., 2007b), including demonstrations that physiological conditions associated with normal learning and decisions (e.g., synaptic plasticity mechanisms, sensory and motor processing systems, motivational systems, and so on) are also associated with greater place field stability. Although a systematic and direct test of this relationship has yet to be carried out, it is worth noting that it may be difficult to observe a clear and strong correlation between (at least) CA1 place field stability and choice accuracy, since the recorded CA1 population tends to exhibit a heterogeneous collection of neural responses (e.g., within a single recording session, there are individual cell differences in place field responses to context changes). Indeed, laboratories have reported a lack of correlation between CA1 place field reorganization and behavior (e.g., Cooper and Mizumori, 2001; Jeffery et al., 2003). Most of the place field data in the literature are based on recordings from CA1 neurons; therefore, the relationship between CA3 place field properties and optimal decisions remains to be determined. The same is true for grid cells: the results of direct tests of the relevance of grid fields for accurate decisions are not yet known.

The discussion so far presents the view that the hippocampus functions to detect differences between contexts, or to detect when a context changes. A basic algorithm that compares an animal’s expectations of a familiar contextual environment (i.e., the spatial layout of external sensory cues, the relevant behaviors to obtain rewards, the location of goals, and the consequences of specific choices) with actual experiences can be used to discriminate contexts, detect changes in a familiar context, or identify novel situations. All of these operations have in common the need to determine the saliency of the current context. There is currently only a rudimentary understanding of how the various neural representations of the spatial context by hippocampal neurons (e.g., place and grid fields) may contribute to the determination of context saliency, but there is abundant evidence to support the claim that this is a key function of the hippocampus.

5.3.4. Determining context saliency as a part of learning

As one learns the significance of a new environment, one’s perception of the relationship between environmental stimuli, responses, and consequences is continually updated. Presumably, mismatches between updated expectations and experiences with the new context are frequently detected, resulting in the continual shaping of long-term memory representations (McClelland et al., 1995). As memory representations become more precise, so too will the feedback to hippocampal cells regarding the expected contextual features. Thus, it is predicted that place fields should become more specific and reliable with continued training as one gradually learns about associations relevant to the test environment. In support of this prediction, many studies have shown that place fields become more specific and/or reliable with short-term exposure to novel environments (e.g., Frank et al., 2004; Hetherington and Shapiro, 1997; Kentros et al., 1998; Markus et al., 1995; Muller and Kubie, 1987; O’Keefe and Burgess, 1996; Wilson and McNaughton, 1993). More spatially selective firing (or reduced ‘overdispersion’) has also been reported to reflect goal-directed learning (e.g., Fenton and Muller, 1998; Mizumori et al., 1996; Kentros et al., 1998; O’Keefe and Speakman, 1987; Rosenzweig et al., 2003).

Learning can be considered complete when mismatches no longer occur and consistent memory representations are maintained during behavior (Mizumori, 2008). Indeed, after learning, place fields are remarkably stable across repeated exposures to the same, familiar context, and this presumably reflects stable input from memory representations. If more than one context is learned simultaneously, a given population of place cells should show context-specific patterns of place fields, and each pattern should be reliable for its context (Smith and Mizumori, 2006a,b). Presumably, such stable hippocampal patterns are in some way driven by established neocortical networks, or schemas (Tse et al., 2007). To ensure adaptive behavior, however, the hippocampus must constantly engage in context comparisons in the event that the familiar context is altered. Similarly, the hippocampus should process contextual information even for tasks that do not explicitly require contextual knowledge, in case contextual information becomes relevant. Place cell studies indeed show that specific neural codes in the hippocampus remain responsive to changes in context even when contextual learning is not necessary to solve a task (Yeshenko et al., 2004). Thus, processing of contextual information by the hippocampus appears to be automatic and continuous (Morris and Frey, 1997). A different but related theory is that the hippocampus uses context information to recall specific context-relevant memories (Fuhs and Touretzky, 2007; Redish, 1999; Redish et al., 2001).

If the hippocampus continually processes contextual information, then why do hippocampal lesions disrupt only certain forms of learning and not others? If one assumes that lesion effects are observed only when the intrinsic processing by the structure of interest is unique and essential for learning to take place, then no behavioral impairment should be observed if other neural circuits can compensate for the lesion-induced change in function. Indeed, there is abundant evidence that under most conditions, stimulus–response learning is not impaired following hippocampal lesions, since striatal computations are sufficient to support such learning (e.g., McDonald and White, 1993; Packard et al., 1989; Packard and McGaugh, 1996). This does not mean that the hippocampus plays no role in stimulus–response performance, but rather that the hippocampus may contribute by defining the context for the learning, which in turn may allow the learned information to be applied more adaptively in new situations in the future.

[Fig. 6 appears here. The schematic depicts a circuit linking the hippocampus (CA1/subiculum), prefrontal cortex, ventral striatum, ventral pallidum, pedunculopontine tegmental nucleus, and ventral tegmental area (with additional inputs from the lateral dorsal tegmentum, lateral habenula, lateral hypothalamus, and more), via glutamatergic (GLU), GABAergic (GABA), cholinergic (ACh), and dopaminergic (DA) connections.]

Fig. 6. An essential neural circuit that links hippocampal (spatial context) information with reinforcement learning and decision making systems of the brain. Direct hippocampal output arrives in the reinforcement learning system via the CA1 and subicular projections to the ventral striatum (i.e., the nucleus accumbens). The ventral striatum is thought to serve as the ‘critic’ in the actor–critic model of reinforcement learning. As such, the ventral striatum determines whether the outcomes of behavior are as predicted based on an animal’s expectations for a given context. If the outcome is as expected, the ventral striatum continues exerting inhibitory control over VTA neurons. In this situation, encounters with rewards do not result in dopamine cell firing. If the saliency of a context changes (as determined by hippocampal processing), signals to the ventral striatum may preferentially excite VTA neurons via an indirect pathway that includes the ventral pallidum and the pedunculopontine nucleus. The result of this elevated excitation may be a depolarization of VTA neurons such that they are more likely to fire when subsequent reward information arrives in VTA.

5.4. Relationship between hippocampal context codes and reinforcement based learning

Hippocampal efferent systems can use the result of the hippocampal context analysis to update their neural response profiles such that subsequent behavioral choices are optimized. The midbrain and striatal reinforcement learning systems are a major target of hippocampal output (see Fig. 6). Therefore, it is often assumed that the hippocampus provides the necessary context information that guides dopamine-related reward or behavioral responses. The outcomes of behavioral choices are evaluated by the reinforcement learning system, and the result of such an evaluation is thought to feed back to memory systems and the hippocampus to update future context-based expectations. To begin to discuss how a hippocampal evaluation of context saliency impacts reinforcement learning systems of the brain, the following discusses (1) a neuroanatomical network that supports a functional link between hippocampal place fields and reinforcement learning systems, (2) evidence for a role for dopamine in hippocampal-dependent learning and plasticity, and (3) the possible impact of hippocampal context processing on dopamine cell responses to reward.

5.4.1. Functional connectivity between reinforcement and hippocampal systems

Direct dopaminergic innervation of the hippocampus arises from both the VTA and the substantia nigra pars compacta (SNc), although input from the VTA is more extensive (Gasbarri et al., 1994b). Dopaminergic projections occur across the entirety of the dorsal–ventral axis of the hippocampus, with the ventral portion being more heavily innervated. The innervation is also differentially distributed across the subiculum, CA1, CA3, and the dentate gyrus, with CA1 and the subiculum receiving more innervation relative to CA3 and the dentate gyrus (Gasbarri et al., 1994a,b, 1997). Compared to other efferent structures of the dopaminergic system, such as the nucleus accumbens, the hippocampus receives a relatively small proportion of input from the VTA; 10% or less of the cytochemically identified dopamine neurons project to the hippocampus, whereas approximately 80% of that population projects to the nucleus accumbens (Fields et al., 2007).

Although the hippocampus receives modest dopaminergic innervation from the VTA, it is one of the few brain regions that express all five dopamine receptor subtypes. The dentate gyrus and subiculum show high levels of the D1 receptor subtype, and the D1-like D5 receptors are expressed throughout the hippocampus. D2 receptor binding sites are most prominent in dorsal CA1 and the subiculum, while levels of D3 receptors are low throughout. Finally, D4 receptors are found in the dentate gyrus, CA1, and CA3. The dopaminergic innervation of the structure, along with the expression of all five receptor subtypes, allows dopamine to have a powerful influence on the function of the hippocampus, impacting information processing and plasticity (Frey et al., 1990; Huang and Kandel, 1995; Li et al., 2003; Otmakhova and Lisman, 1998).

The path from the hippocampus to the midbrain dopaminergic system is indirect and varied (see Fig. 6). The most direct path from the hippocampus involves transmission from both the dorsal and ventral subiculum, and to a lesser extent CA1, via the fimbria-fornix (Boeijinga et al., 1993; Lopes da Silva et al., 1984; Groenewegen et al., 1999a, 1987; McGeorge and Faull, 1989; Mulder et al., 1998; Swanson and Cowan, 1977; Totterdell and Meredith, 1997; van Groen and Wyss, 1990). More specifically, the dorsal subiculum (and CA1) projects primarily to the rostro-lateral shell region of the nucleus accumbens, while the ventral subiculum (and CA1) selectively terminates throughout the rostral–caudal extent of the accumbens shell. The entorhinal cortex also provides extensive input to the nucleus accumbens, with the MEC preferentially innervating the rostro-medial shell and core divisions of the accumbens, and the LEC terminating throughout the rostral–caudal extent of the lateral shell and core regions (Totterdell and Meredith, 1997). It should be noted that the limbic input to the ventral striatum (including the nucleus accumbens) is one of a number of convergent inputs to individual ventral striatal neurons (e.g., Floresco et al., 2001; French and Totterdell, 2002; Goto and O’Donnell, 2002; O’Donnell and Grace, 1995). Other sources of afferents include the prelimbic/infralimbic and orbital frontal cortices, as well as the basolateral amygdala. Thus, the ventral striatum has long been considered a central point of integration for the information needed for adaptive behaviors (Mogenson et al., 1980).

It is through the ventral striatum that the hippocampus may ultimately impact dopamine cell firing, since the ventral striatum in turn innervates the VTA and SNc. Moreover, both the core and shell components of the nucleus accumbens have some degree of control over the dopamine cells that in turn project to them. The details of the circuitry are complex (for a recent excellent summary, see Humphries and Prescott, 2010), but of direct relevance here is that the lateral and medial shell innervate, via direct or indirect routes, the lateral or ventral sectors of the VTA, respectively (Ikemoto, 2007; Zhou et al., 2003). This pattern matches the topography of VTA connections back to the shell region. Also of note is the fact that both GABA and dopamine neurons participate in this reciprocal interaction between the VTA and ventral striatum (Carr and Sesack, 2000; Nair-Roberts et al., 2008). This is an important point, since studies of VTA single unit representations during hippocampal-based memory performance suggest that both dopaminergic and GABAergic populations likely contribute to reward processing (Martig and Mizumori, 2011; Puryear et al., 2010). Core regions of the accumbens project to a slightly different population of dopaminergic neurons, those in the SNc and in the lateral regions of the VTA (Berendse et al., 1992a,b; Usuda et al., 1998; Zhou et al., 2003). These dopaminergic regions seem to project back to the same core areas that project to them (Joel and Weiner, 1994). For both shell and core regions, their impact on the VTA and SNc is presumed to be inhibitory, since the accumbens projection cells are GABAergic. Thus, one possibility is that excitatory (glutamatergic) messages from the hippocampus add to the inhibitory control over dopaminergic neurons. Currently it is not possible to state how much control the hippocampus exerts over dopamine neurons, since we do not yet fully understand the significance and mechanism of the convergence in the ventral striatum of hippocampal, frontal, and amygdala information. Nevertheless, this is likely an important pathway by which hippocampal systems and the midbrain motivational circuitry interact.

In addition to the hippocampal–accumbens–VTA/SNc pathway, there are a number of other sources of excitatory and inhibitory control over dopamine cell firing (see Fig. 6), and the details of these connections remain to be worked out. Four of the most studied dopamine afferent systems are the frontal cortex and the amygdala (Lodge and Grace, 2006; Woolf, 1991), as well as the pedunculopontine nucleus (PPTg) and the lateral dorsal tegmental nucleus. As an example of the complex nature of each afferent input, the PPTg provides cholinergic (Woolf, 1991) and glutamatergic input to the VTA and SNc (Beninato and Spencer, 1987; Futami et al., 1995; Sesack et al., 2003), and this input is topographical in nature. The PPTg is characterized by an uneven distribution of distinct populations of cholinergic, glutamatergic, and GABAergic cells (Wang and Morales, 2009), with differential input and output projections of its anterior and posterior subdivisions (Alderson et al., 2008). Cholinergic cells are concentrated in the posterior PPTg (Wilson et al., 2009) and project mostly to the VTA, while the anterior PPTg contains proportionately more GABAergic cells, which project to the SNc (Oakman et al., 1995). It has been argued that the PPTg regulates the transition to burst firing by dopamine cells (Grace et al., 2007), but precisely how this happens remains under investigation. Thus, the ventral striatum may ultimately be in a position to orchestrate the balance between inhibitory and excitatory control over dopamine cell firing, depending on the hippocampal determination of the saliency of the current context.
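
The sign-following logic of this circuit can be summarized in a deliberately simplified sketch. Every quantity and function below is a hypothetical placeholder for the pathways just described; the point is only to show how GABAergic accumbens output could suppress dopamine cell firing when a context is as expected, while a hippocampally signaled context change could tip the balance toward excitation via the ventral pallidum/PPTg route.

    def vta_drive(context_changed, accumbens_inhibition=1.0,
                  indirect_excitation=1.5, baseline=0.5):
        """Net drive onto VTA dopamine cells (arbitrary units).

        If the context is as expected, GABAergic accumbens output keeps
        the net drive low; a hippocampally detected context change is
        assumed to recruit the disinhibitory/excitatory indirect route
        through ventral pallidum and PPTg."""
        drive = baseline - accumbens_inhibition
        if context_changed:
            drive += indirect_excitation
        return drive

    print(vta_drive(context_changed=False))  # suppressed: rewards evoke little firing
    print(vta_drive(context_changed=True))   # depolarized: rewards can now evoke firing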

5.4.2. A role for dopamine in hippocampal-dependent learning and plasticity

There is abundant evidence that the dopaminergic system plays an important role in hippocampal-dependent behavior and plasticity. The hippocampal dopaminergic system has been manipulated in a number of ways, and the bulk of the evidence shows that dopaminergic agonism and antagonism, respectively, enhance and impair spatial learning. As examples, D1 receptor knock-out mice exhibit deficits in spatial learning (El-Ghundi et al., 1999), and selective 6-OHDA lesions in hippocampus impair performance in the Morris swim task (Gasbarri et al., 1996). Direct hippocampal infusions of agents that disrupt D1–NMDA receptor interactions also produce performance deficits in the working memory version of the Morris swim task (Nai et al., 2010). Selective removal of hippocampal dopamine input via local 6-OHDA infusions into the subiculum and adjacent CA1 region of rats also impairs performance in the spatial version of the water maze (Gasbarri et al., 1996). Manipulations of endogenous levels of dopamine in the hippocampus likewise negatively impact hippocampal-dependent processing (e.g., Kentros et al., 2004; Martig et al., 2009; Wisman et al., 2008). Finally, dopamine agonist treatment in the hippocampus can reverse age-related decreases in spatial performance (Bach et al., 1999; Behr et al., 2000).

The hippocampus likely plays a role in detecting changes in familiar contexts, and in generating novelty-related signals that initiate relevant investigatory behaviors for both spatial and nonspatial tasks. Interestingly, the dopamine system is also known for its association with novelty detection (Horvitz et al., 1997; Ljungberg et al., 1992; Redish et al., 2007; Seamans and Yang, 2004), a response that is perhaps triggered following hippocampal identification of novelty. Further, exposure to novel environments enhances synaptic plasticity mechanisms in hippocampus, and this enhancement appears related to D1 receptor activation (Li et al., 2003). Thus, it has been postulated that a functional loop between the VTA and the hippocampus allows novelty signals from the hippocampus to be relayed to the VTA to generate responses to novelty by dopaminergic neurons (Lisman and Grace, 2005; Mizumori et al., 2004). The latter responses are then thought to be relayed back to the hippocampus to facilitate plasticity mechanisms and learning.

Most of the studies investigating possible dopaminergic effects on hippocampal function have involved the application of drugs directly to, or lesions of, the hippocampus. Recently, Martig et al. (2009) employed a different approach: they reversibly inactivated the VTA of rats to temporarily reduce endogenous levels of dopamine within the hippocampus. Attempts were made to selectively silence VTA dopamine neurons by infusing baclofen (Xi and Stein, 1998), rather than more broadly inactivating the VTA with anesthetics such as lidocaine or tetracaine. VTA inactivation significantly impaired choice accuracy on a hippocampal-dependent spatial working memory task. However, the effect was time dependent: greater impairment was observed after the initial days of infusion, suggesting some form of compensatory change in the neural circuitry connecting the hippocampus and the VTA. Further, VTA inactivation selectively impaired short term working memory, a form of memory that is hypothesized to be important following a change in context. Importantly, the selective behavioral effects demonstrate that the hippocampal effects were not due to changes in behavioral control or motivation.

In a subsequent experiment, Martig and Mizumori (2011) recorded hippocampal place field responses to baclofen-induced inactivation of the VTA as rats performed a spatial working memory task on a radial arm maze. Based on the findings of Kentros et al. (2004), it was predicted that VTA inactivation would destabilize choice accuracy that is dependent on hippocampal function, as well as the stability of place fields. Also, given the differential distribution of VTA afferents to the hippocampal subfields (CA1 > CA3), it was expected that CA1 place fields would be impacted more dramatically than CA3 place fields. Finally, given the transient behavioral effect observed by Martig et al. (2009), the maze training procedures were modified to increase the likelihood that the VTA was essential for good performance. That is, rats learned to expect rewards of different magnitudes at specific locations on the maze.

The results showed that VTA inactivation impaired choice accuracy significantly, and more consistently, than in Martig et al. (2009). This behavioral impairment occurred even though rats retained their preference to visit maze locations that were previously associated with large rewards. This result was surprising given that VTA neurons are known to respond preferentially to larger rewards over small rewards (Puryear et al., 2010; Schultz et al., 1997). The authors interpreted this unexpected result to indicate that the VTA’s selective coding of large rewards is not necessary or sufficient to drive behavioral choices toward the large rewards. Rather, the VTA neural codes may contribute to an evaluation of the consequences of behaviors. Recorded hippocampal CA1 place cells showed less stable fields after VTA inactivation relative to control conditions and relative to CA3 place cells. The differential response reveals that in a well learned task, CA3 place fields alone are not sufficient to maintain high choice accuracy during navigation. This supports the view described above that a hippocampal evaluation of the expectations (and hence saliency) of a context requires coordinated effort between CA1 and CA3.

In summary, there is substantial evidence for an important role of VTA dopamine cells in regulating hippocampal-dependent learning and context representation. The place field data show that hippocampal neurons rely on dopamine input for representing context-relevant information over time. These results are consistent with growing evidence that dopamine increases the stability of neural plasticity mechanisms in hippocampus. Cellular mechanisms for this stabilization function are revealed in studies of dopamine effects on hippocampal synaptic plasticity. Dopamine appears to importantly regulate a leading model of learning-related synaptic plasticity, long-term potentiation (LTP). LTP is generally described as a persistent increase in synaptic efficiency (Martin et al., 2000), and it has been shown that its induction alters place fields (Dragoi et al., 2003). The duration of LTP varies depending upon the pattern of neural activation used for induction (Morris and Frey, 1997). D1 receptor activation appears critical for the maintenance of late phase LTP (L-LTP) in CA1 (Frey et al., 1990, 1991; Huang and Kandel, 1995; Williams and Eskandar, 2006). Dopamine application is also capable of inducing early phase LTP (E-LTP) in the dentate gyrus following stimulation protocols that are normally insufficient to do so (Kusuki et al., 1997). Further, there is some indication that dopamine agonists alone may be sufficient to induce a slowly developing potentiation that is independent of any other external stimulation (Huang and Kandel, 1995; Williams et al., 2006; Williams and Eskandar, 2006). The general pattern, then, seems to be that dopamine elevates and/or maintains the synaptic excitability of hippocampal neurons. Extending the duration of strong neural signals may be an important way to increase the associative capacity of temporally discrete events, and this could in turn facilitate accurate determinations of context saliency.

A possible mechanism for dopamine's effects on hippocampal neurons was revealed by findings that dopamine agonist-induced L-LTP can be significantly attenuated by NMDA-receptor antagonism (Stramiello and Wagner, 2008), suggesting an important interaction between these neurotransmitter systems. There is additional evidence that the interaction between glutamatergic and dopaminergic systems modulates heterosynaptic LTP, whereby weak inputs become strongly potentiated (O'Carroll and Morris, 2004). Specifically, it is suggested that NMDA-receptor activation in hippocampus may 'prime' synaptic markers that synergize with neuromodulatory signals, such as dopamine, to initiate increases in the mRNA and protein synthesis that are thought to be so important for L-LTP (Frey and Morris, 1997).

The electrical stimulation protocols used to induce LTP are unlikely to occur during natural learning scenarios. However, evidence indicates that lasting changes in synaptic plasticity in the hippocampus can result from exposure to different spatial contexts. Dopamine has been implicated in such context-induced changes in hippocampal synaptic plasticity. Pre-treatment with a D1/D5 receptor antagonist interferes with the LTP-inducing effects of spatial exploration (Lemon and Manahan-Vaughan, 2006; Li et al., 2003). The ability of dopamine to gate exploration-induced synaptic plasticity, then, may be reflected in changes in spatially selective neural activity. If dopamine enhances the duration of LTP, then dopamine may act to stabilize place field properties. This hypothesis was supported recently by Martig and Mizumori (2011), who found that temporarily removing dopamine input to place cells reduces place field stability.

Hippocampal output via the subiculum is also modulated by dopamine afferents. In one study, a low dose of dopamine was shown to reduce EPSPs in the subiculum (Behr et al., 2000). This result implies that excitatory inputs to the hippocampus must surpass the inhibitory influence of low levels of dopamine in the subiculum. However, when large quantities of dopamine are applied, there is a facilitation of long-lasting synaptic potentiation in the CA1 region (Huang and Kandel, 1995). Therefore, dopamine acts to dose-dependently gate excitatory drive by reducing the effectiveness of potentially irrelevant inputs. By determining the overall effectiveness of excitatory inputs within a structure, dopamine could be part of a mechanism that determines the likelihood that new or salient information is remembered.

5.4.3. Impact of hippocampal context processing on dopamine cell responses to reward

In contrast to the abundant evidence for a functional link from the dopaminergic system to the hippocampal system, converging evidence for a functional link in the other direction has only recently begun to emerge. Nevertheless, existing theories argue that the VTA–hippocampal connection is important for several complex behaviors, such as reinforcement learning, spatial/contextual learning, and motivation (Fields et al., 2007; Lisman and Grace, 2005; Schultz, 2002; Wise, 2004). Central to these functions is the idea that dopamine may strengthen stimulus–reward associations (Schultz, 2002). Accordingly, dopamine neurons fire upon presentation of unexpected rewards and of conditioned cues that predict reward, and they are inhibited when expected events do not occur (Schultz and Dickinson, 2000). These firing patterns may signal an error in the prediction of reward (Bayer and Glimcher, 2005; Hollerman and Schultz, 1998), and this in turn enables the use of flexible behaviors during learning (Schultz and Dickinson, 2000). The reward prediction error signal appears to take into account the behavioral context in which rewards are obtained (Nakahara et al., 2004; Roesch et al., 2007), context information that may derive from hippocampal input. If this is the case, it should be possible to record similar reward responses in freely behaving rats performing a hippocampal-dependent maze task. A recent study explicitly tested this idea.
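
The sign conventions of this prediction error can be stated compactly as the standard temporal difference error, delta = r + gamma*V(next state) - V(current state). The snippet below is a minimal illustration with invented values, not a model of any recording discussed here.

```python
# Minimal temporal difference error: positive for an unexpected reward,
# negative when an expected reward is omitted.
gamma = 0.9                        # discount factor (illustrative value)
V = {"reward_site": 0.0}           # learned state-value estimate

def td_error(r, s, v_next=0.0):
    return r + gamma * v_next - V[s]

print(td_error(r=1.0, s="reward_site"))   # naive animal, surprise reward: +1.0
V["reward_site"] = 1.0                    # after learning, reward is fully expected
print(td_error(r=0.0, s="reward_site"))   # expected reward omitted: -1.0
```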

Puryear et al. (2010) found that VTA dopamine neurons increased firing when rats encountered rewards in expected locations on a radial maze, and that the response was much larger following encounters with larger rewards. This is analogous to dopamine responses reported from studies with primates (Schultz et al., 1997). Moreover, these cells appeared to fire in response to cues that predict reward, in that they exhibited elevated discharge coincident with an auditory stimulus that signified the beginning of a trial. Also, it was shown that changes in the visual aspects of the test environment resulted in significant alterations in the reward responsiveness of the dopamine neurons. Thus, again as shown in primate studies, the dopamine reward responses appear to be context-dependent. Of particular interest was whether rodent VTA neurons would show evidence for either positive or negative reward prediction signaling during navigation-based goal-directed behaviors. Indeed, it was found that VTA cells increased firing when a larger than expected reward was encountered, and reduced firing when an expected reward was not found. In addition to confirming that rodent dopamine cells code reward when spatial information is used to guide behaviors to locations that signify food, the use of a navigation-based task allowed Puryear et al. (2010) to examine the relationship between voluntary movement and reward codes. This was of interest given a vast clinical and research literature showing a critical role for the dopamine system in the voluntary initiation of behaviors. The firing rates of dopaminergic reward neurons were found to be correlated with velocity and/or acceleration as rats moved between food locations. However, in contrast to the reward responses, the movement correlates were not context-dependent, suggesting that there are at least two independent sources that regulate dopamine cell firing during navigation.

A rather surprising result of the Puryear study was that dopamine neurons consistently responded to rewards even though the task was well learned. According to the now classic studies by Schultz and his colleagues (e.g., Schultz, 1998b, 2010; Schultz et al., 1997), dopamine cells cease firing to rewards and instead fire in response to the presentation of cues that predict rewards. Firing to cues was in fact observed in the Puryear study, but so was firing to the rewards. One possible explanation for the continued response to reward by dopamine neurons is that our working memory task generated a sufficient degree of uncertainty about choices that dopamine responses to rewards were retained (Fiorillo et al., 2003). Dopamine signals can be thought of as 'uncertainty signals' that reflect the strategy of continually updating action–outcome systems to optimize future behavioral choices. To test this hypothesis, Martig and Mizumori (2011) recorded VTA neurons as rats learned a spatial task that did not involve working memory. Rats learned to visit the same maze arm to obtain food reward. After rats learned the initial goal location over days, the same rats were trained to find food in a novel location; after rats learned the second location, a third novel location was introduced. The number of VTA cells showing reward responses declined as additional locations were learned. For comparison, SNc neurons were also recorded as rats performed the same task. In contrast to the VTA cells, SNc cells did not show a change in the number of reward cells with continued training. This differential response of VTA and SNc cells is potentially highly significant since it (1) suggests that dopamine signaling can have more than one function, and (2) stresses the importance in future studies of identifying the locations of the cells being recorded in any functional analysis of dopamine neurons. Evidently, context-dependent reward responses are more apparent for VTA than for SNc cells. This finding raises the question: what is the source of context information for VTA neurons?

The VTA may receive context-dependent information via an indirect pathway from the hippocampus that includes the ventral striatum, ventral pallidum, and the PPTg (Fig. 6). Recent work tested whether the latter pathway is an essential link that bridges hippocampal context processing and the VTA. It had been known that the PPTg contributes to the burst firing of dopamine cells (Oakman et al., 1995; Pan and Hyland, 2005), yet the significance of this influence is not clear. Consideration of sensory afferents to the PPTg (Redgrave et al., 1987; Reese et al., 1995), along with the established role of dopamine in reinforcement-based operant learning (Schultz, 1998b), suggests that the PPTg may facilitate the processing of (or attention to) learned conditioned stimuli via a sensory-gating mechanism (Kobayashi and Isa, 2002; Winn, 2006). Indeed, PPTg neurons exhibit phasic responses to auditory and visual sensory stimuli that predict reward, with a shorter latency (5–10 ms) than dopamine cells (Pan and Hyland, 2005). The PPTg may, however, serve a more complex function than relaying current sensory information, since context-dependent responses of PPTg neurons have been described in cats performing a motor conditioning task (Dormont et al., 1998). Thus it was of interest to identify the nature of the information passed from the PPTg to dopamine cells during goal-directed navigation by investigating PPTg neural responses during performance of a task that is (a) known to rely on intact hippocampal processing, and (b) known to generate burst firing by VTA neurons in a context-dependent fashion (Puryear et al., 2010).

When PPTg cells were recorded from rats searching for food in known locations on a radial maze, 45% of recorded PPTg neurons were either excited or inhibited upon reward acquisition, and there was no evidence for prediction error signaling. Thus, the latter component of reward processing may arrive in the VTA via a route that does not involve the PPTg (such as the lateral habenula; Matsumoto and Hikosaka, 2007). A separate population of PPTg neurons exhibited firing rate correlations with the velocity of movement. There were also a small number of cells that encoded reward in conjunction with a specific type of egocentric movement (i.e., turning behavior). The context-dependency of PPTg reward responses was tested by observing the impact of changes in visuospatial and reward information. Visuospatial, but not reward, manipulations significantly altered PPTg reward-related activity. Movement-related responses, however, were not affected by either type of manipulation. These results suggest that PPTg neurons conjunctively encode both reward and behavioral response information, and that the reward information is processed in a context-dependent manner.

Upon closer examination of the PPTg data, it was found that excitatory reward responses predominated for anterior, and not posterior, PPTg neurons. Considering their different efferent targets (Puryear and Mizumori, 2008), it appears that there is increased synaptic drive to nigral cells from the anterior PPTg coincident with reward consumption in our task, and at the same time reduced synaptic drive to the VTA. This was unexpected since it has been shown that, under identical test conditions, both VTA and nigral cells increase burst firing relative to reward acquisition (Gill and Mizumori, 2007; Martig and Mizumori, 2011; Puryear et al., 2010). To account for this apparent discrepancy, it is suggested that during reward acquisition, the reduction of cholinergic input to the VTA from the posterior PPTg may reduce the excitatory drive to VTA GABA neurons. Since VTA GABA neurons normally provide inhibitory control over dopamine cells (Omelchenko and Sesack, 2009), their reduced activation 'permits' dopamine burst firing. Posterior PPTg responses to rewards tended to persist for the duration of reward consumption, whereas VTA cells show phasic high frequency burst firing to rewards, and the duration of the VTA response is relatively short compared to the duration of reward consumption. Thus, while the posterior PPTg may initiate VTA dopaminergic reward responses, other intrinsic or extrinsic mechanisms regulate the duration of dopamine burst firing (perhaps the inhibitory input from the accumbens or pallidum; Zahm and Heimer, 1990; Zahm et al., 1996). Fig. 7 provides a schematic illustration of a comparison between VTA and PPTg neural responses to reward.

A salient feature of the dopamine cell response to reward is the brief change in firing rate when rats encounter unexpectedly large or small rewards. Such a prediction error signal was not observed for PPTg neurons, suggesting that it is either computed locally within VTA circuitry, or received from an afferent structure. Matsumoto and Hikosaka (2007) provide convincing evidence that the lateral habenula is at least a critical player in generating a prediction error signal for dopamine cells, since its neurons also show altered firing rates in response to a change in the expected amount of reward. The direction of the change, however, is the opposite of that of dopamine cells: lateral habenula neurons increase firing when animals encounter less reward than expected, and they show reduced firing after encounters with unexpectedly large rewards. This pattern is consistent with the finding that lateral habenula activation normally inhibits the activity of VTA and SNc dopamine neurons (Christoph et al., 1986; Herkenham and Nauta, 1979). Additionally, Puryear and Mizumori (2008) found prediction error codes in cells of the medial reticular nucleus (Swanson, 2003), which is known to provide glutamatergic input to the VTA (Geisler et al., 2007). The reticular formation is thought to be important for modulating the arousal and vigilance levels necessary for attending to and acting upon salient stimuli (Mesulam, 1981; Pragay et al., 1978). Thus, it seems reasonable that multiple areas modulate the activity of VTA dopamine neurons when the outcome of behavior does not meet expectations.


Fig. 7. Reward-related neural discharge has now been shown to exist in multiple brain structures throughout the midbrain and forebrain. Left: Responses of a midbrain (VTA) dopamine cell to rewards of large and small magnitude. The top two rows illustrate responses when a large or small reward is unexpectedly presented to an animal: the top row shows a schematized response illustrating a greater dopamine cell response to large rewards, and the example response of a single dopamine neuron in the second row confirms the schematic. The third row illustrates that after a stimulus has been associated with reward, the stimulus itself, and not the reward, elicits dopamine cell discharge; in this case the subject expects to receive reward following presentation of the stimulus. The bottom row illustrates dopamine cell responses when a reward is omitted after the associated stimulus is presented: dopamine cells increase firing after stimulus presentation, but the same cell shows reduced firing at the time when the rat expected to receive reward. This inhibited response is referred to as an inhibitory (or negative) reward prediction error that signals to efferent structures that an expected reward was not found. Right: For comparison with dopamine cell responses, schematized and exemplar responses are shown for cells recorded in the pedunculopontine nucleus (PPTg), a structure thought to regulate burst firing by dopamine cells. Like dopamine cells, PPTg cells not only respond to encounters with unexpected reward, but also do so differentially. However, in contrast to dopamine cells, PPTg responses differentiate reward magnitudes in terms of the duration, not the magnitude, of the response. This pattern suggests that PPTg cells signal the presence of reward. If stimuli are associated with subsequent reward encounters, PPTg cells show responses to cues that predict rewards (and not to stimuli that do not predict rewards). Unlike dopamine cells, PPTg cells continue to respond to reward presentations even after the presentation of a conditioned stimulus. The last row shows that, again unlike dopamine neurons, PPTg cells show no evidence of prediction error signaling.

To summarize, the hippocampus may provide a fundamental analysis of the current context that allows subsequent decisions to be made based on the most recent determination of context saliency. Via direct projections to the ventral striatal–VTA system, the hippocampus may signal the dopaminergic component of the reinforcement learning system when there are violations of one's expectations for a given context. This 'alerting' signal may lower the threshold for dopamine cell firing to reward so that the 'teaching signal' can be distributed to update memory and behavioral systems. The following section will describe current ideas about the impact of dopamine signals on the ventral and dorsal striatum, focusing on the role of dopamine in decision making and behavioral control during navigation.

6. The neurobiology of reinforcement learning and goal-directed navigation: striatal contributions

Decision making, or action selection, processes have been attributed to the striatum, which acts as a dynamic controller of behavior, integrating sensory, contextual, and motivational information from a wide network of cortical and subcortical structures. This function can be accomplished through the use of reinforcement learning algorithms that compare the expected success of a learned behavior with the actual success experienced by the organism. In reinforcement learning models, the actor and critic use these predictions to implement successful action–outcome policies (Khamassi et al., 2005); a minimal sketch of this architecture is given below. The actor–critic architecture maps onto a classic distinction in the psychological literature, that between Pavlovian learning (stimulus–outcome relationships) and instrumental learning (action–outcome learning). While these aspects of learning are often studied under restrictive conditions designed to assess particular features of each type of learning, in fact these forms of learning can be represented on a kind of continuum. Pavlovian learning mechanisms underlie the ability of an organism to learn that neutral stimuli can be predictive of rewards and goals and can eventually facilitate instrumental learning (i.e., Pavlovian-instrumental transfer), and instrumental learning can progress from goal-directed behavior to habitual action–outcome associations once a behavior has been well learned. Within the reinforcement learning literature, these different modes of learning are described by 'model-free' algorithms that attempt to explain stimulus–response behavior, and 'model-based' algorithms that describe how learning about the environment allows an organism to consider impending actions or formulate new actions within the current context. Until very recently, it was thought that the dorsal striatum worked as the actor in a model-free system, and the ventral striatum functioned as the critic in a model-based system (Atallah et al., 2007; Johnson et al., 2007; van der Meer and Redish, 2011). A wealth of recent data, however, suggests a more fine-tuned delineation of function across the dorsal–ventral striatum. Along with a refinement of the functional anatomy of the striatum, it is also clear that reinforcement learning algorithms themselves may need to be reconsidered if they are to successfully model learning in complex environments.
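
To make the division of labor concrete, here is a minimal, generic tabular actor–critic in Python: the critic learns state values and computes a prediction error, and that same scalar error trains the actor's action preferences. The state and action counts, learning rates, and variable names are illustrative assumptions, not parameters from any model in the literature cited here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 8, 4               # e.g., maze locations and turn choices (assumed sizes)
V = np.zeros(n_states)                   # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))  # actor: action preferences
alpha_v, alpha_p, gamma = 0.1, 0.1, 0.95

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(s, a, r, s_next):
    """One actor-critic update: the critic's TD error trains both modules."""
    delta = r + gamma * V[s_next] - V[s]  # prediction error ('teaching signal')
    V[s] += alpha_v * delta               # critic update
    prefs[s, a] += alpha_p * delta        # actor update: reinforce actions that beat expectations
    return delta

# One illustrative transition: a softmax-chosen action in state 3 leads to reward at state 5.
a = rng.choice(n_actions, p=softmax(prefs[3]))
step(s=3, a=a, r=1.0, s_next=5)
```

The single scalar error delta here plays the 'teaching signal' role attributed to phasic dopamine throughout this section.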

6.1. Striatal-based navigational circuitry

Like the hippocampus, the striatum is composed of several functionally and anatomically distinct subregions. All cortical areas project to the striatum (Berendse et al., 1992a,b; McGeorge and Faull, 1987, 1989; Parent, 1990), and the distribution of these projections helps to define three main subdivisions of the striatum: the ventral striatum (often synonymous with the nucleus accumbens), the dorsomedial striatum, and the dorsolateral striatum (Alexander and Crutcher, 1990a; Alexander et al., 1986; Humphries and Prescott, 2010; Voorn et al., 2004). Each of these subregions participates in one of a series of parallel loops that run from the neocortex to the striatum, pallidum, and thalamus, and then back to neocortex (see Fig. 8; Alexander and Crutcher, 1990a; Groenewegen et al., 1999a; Haber, 2003). These loops include a 'limbic loop' that connects the ventromedial prefrontal cortex with the ventral striatum (Alexander and Crutcher, 1990a; Graybiel, 2008; Graybiel et al., 1994; Pennartz et al., 2009; Voorn et al., 2004; Yin and Knowlton, 2006), an 'associative loop' that connects the medial prefrontal cortex with the dorsomedial striatum, and a 'sensorimotor loop' that connects somatosensory and motor cortical areas with the dorsolateral striatum. Activity within these loops is modulated by dopamine released from fibers originating in either the VTA or the SNc. Dopamine influences glutamatergic afferents and striatal medium spiny neuron efferents, and through these actions modulates striatal output from these loops (Horvitz, 2002; Nicola et al., 2004). The particular role that dopamine plays in regulating information processing within each of the cortical–striatal loops is influenced by the origin and destination of the dopaminergic projections. In addition, recent work has demonstrated regional differences in tonic and phasic dopamine signals across the ventral–dorsal axis of the striatum (Zhang et al., 2009).

Fig. 8. Striatal–cortical information processing loops. (A) The 'limbic loop' connects the orbital and ventromedial prefrontal cortex with the nucleus accumbens. Input from these cortical regions is excitatory. The accumbens sends inhibitory projections to the ventral pallidum, which innervates the mediodorsal and other thalamic divisions. (B) An 'associative loop' connects the prefrontal and parietal association cortices with the dorsomedial striatum. The dorsomedial striatum sends inhibitory projections to the associative pallidum, which innervates the mediodorsal and ventral thalamus. (C) The 'sensorimotor loop' connects the primary sensorimotor cortices with the dorsolateral striatum. Emphasis is placed on the spiraling midbrain–striatum–midbrain projections, which allow information to be propagated forward in a hierarchical manner. Note that this is only one possible neural implementation; interactions via different thalamo-cortico-thalamic projections are also possible (Haber, 2003). BLA, basolateral amygdala complex; core, nucleus accumbens core; DLS, dorsolateral striatum; DMS, dorsomedial striatum; mPFC, medial prefrontal cortex; OFC, orbitofrontal cortex; shell, nucleus accumbens shell; SI/MI, primary sensory and motor cortices; SNc, substantia nigra pars compacta; vPFC, ventral prefrontal cortex; VTA, ventral tegmental area.

As recently pointed out by Humphries and Prescott (2010), and also noted by others (Bromberg-Martin et al., 2010; Salamone, 2007; Wise, 2009; Yin et al., 2008), a number of issues related to dopamine signaling within the striatum remain topics of intense debate; for example, where and what type of dopamine receptors are found within the striatum, and what effects their activation may have on cell signaling and behavior. The factors that are likely to contribute to the confusion include unclear boundaries between striatal compartments, unclear boundaries between midbrain dopaminergic regions (VTA and SNc), and the different methods used to study the effects of dopamine (e.g., pharmacological manipulations, lesions, genetically engineered mice, microdialysis, and voltammetry) on many different kinds of behaviors (e.g., learning vs. performance, operant vs. maze learning, Pavlovian vs. instrumental learning). A complete discussion of these issues is beyond the scope of the current paper; thus, the interested reader is directed to several excellent reviews that have discussed these details (Bromberg-Martin et al., 2010; Humphries and Prescott, 2010; Nicola et al., 2000; Redgrave and Gurney, 2006; Wise, 2009; Yin et al., 2008).

6.2. Dopamine signaling and reward prediction error within the striatum

The striatum is a major target of midbrain dopaminergic projections from both the VTA and the SNc (Beckstead et al., 1979; Haber et al., 2000; Humphries and Prescott, 2010). These dopaminergic projections play a crucial role in motor control and in emotional and cognitive processes (Wise, 2004). Dopamine neurons in the VTA send projections to the prefrontal cortex, hippocampus, and amygdala, in addition to the projection to the ventral striatum, whereas dopaminergic neurons of the SNc project primarily to the dorsal striatum (Bjorklund and Dunnett, 2007). The projections that originate in the VTA and terminate in the prefrontal cortex are thought to regulate attentional processes and working memory (Dalley et al., 2004), whereas VTA projections to the ventral striatum are assumed to play a key role in reward, motivation, and goal-directed behavior (Ikemoto, 2007; McFarland and Ettenberg, 1995; Smith-Roe and Kelley, 2000; Wolterink et al., 1993). In terms of dopaminergic projections that originate in the SNc, the traditional view has been that this projection influences motor output and stimulus–response learning (Featherstone and McDonald, 2004; Hikosaka et al., 2006; O'Doherty et al., 2004). However, recent evidence indicates that goal-directed behaviors also depend on signaling in the dorsomedial striatum and prefrontal cortex (Graybiel, 2008; Yin et al., 2008). In addition, data from rodents with neurotoxic lesions of nigrostriatal dopaminergic neurons suggest that the dorsal striatum strongly contributes to visuospatial function and memory (Baunez and Robbins, 1999; Chudasama and Robbins, 2006; De Leonibus et al., 2007; Da Cunha et al., 2003).


The nucleus accumbens is the dopamine terminal field most strongly implicated in reward function. As discussed in Section 4.2, the predominant view of phasic burst firing of dopaminergic neurons within the midbrain is that it provides a reward prediction error signal representing the difference between the expected and the received reward outcome (Ljungberg et al., 1992; Schultz, 1998b). In Pavlovian conditioning tasks, in which a cue signals the availability of reward, these neurons burst fire in response to reward, but with learning this activity shifts to the cue that predicts reward. When the reward is omitted after learning, the putative dopamine cells show a brief depression in activity at the expected time of its delivery (e.g., Fiorillo et al., 2003; Tobler et al., 2003; Waelti et al., 2001; see Section 4.2). Demonstrating changes in activity within the dopamine-rich VTA, however, does not necessarily equate to dopamine release within its target structure, although one would predict that these events would be correlated if dopamine modulates activity within the nucleus accumbens. Technological advances have provided a tool, fast-scan cyclic voltammetry, for measuring dopamine release in target structures on a subsecond timescale (Clark et al., 2010; Robinson et al., 2003; Wightman and Robinson, 2002). Using this technique, work by Regina Carelli's group tested the hypothesis that dopamine release in the accumbens core is indeed correlated with a prediction error signal in an appetitive Pavlovian conditioning paradigm (Day et al., 2007). As would be predicted based on activity in the VTA, a phasic dopamine signal in the accumbens core was observed immediately after receipt of reward, but over extended training this signal shifted to the conditioned stimuli. This finding supports the original 'prediction error' hypothesis and is also consistent with earlier work showing impaired performance of a Pavlovian conditioned response after either dopamine receptor antagonism or dopamine depletion in the accumbens core (Di Ciano et al., 2001; Parkinson et al., 2002). Thus, at least within the nucleus accumbens, the generation of a reward prediction error within the VTA does appear to provide a teaching signal that facilitates learning, although it may not provide a unitary teaching signal across the ventral striatum (Aragona et al., 2009).
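
The training-induced shift of the phasic signal from reward delivery to the predictive cue falls directly out of temporal difference learning. The short simulation below is a generic illustration of that point with arbitrary parameter values; it is not a model of the Day et al. (2007) data.

```python
import numpy as np

# TD(0) sketch of the shift of the phasic error signal from reward to cue.
# States within a trial: 0 = cue, 1 = delay, 2 = reward delivery. The value of
# the inter-trial interval is held at 0, reflecting unpredictable cue timing.
alpha, gamma, n_trials = 0.2, 1.0, 200
V = np.zeros(3)

for trial in range(n_trials):
    delta_cue = gamma * V[0] - 0.0            # error at cue onset (ITI -> cue)
    for s in range(3):
        r = 1.0 if s == 2 else 0.0
        v_next = V[s + 1] if s < 2 else 0.0   # the trial ends after reward
        delta = r + gamma * v_next - V[s]
        V[s] += alpha * delta
        if s == 2:
            delta_reward = delta
    if trial in (0, n_trials - 1):
        print(f"trial {trial:3d}: delta at cue = {delta_cue:+.2f}, at reward = {delta_reward:+.2f}")
```

Early in training the error appears at reward delivery; by the final trial it has migrated to the cue, the same pattern seen in the voltammetric recordings described above.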

Although many remarkable discoveries have been made in terms of how the nucleus accumbens contributes to decision making processes, it has become increasingly clear that the dorsal striatum is also involved. The existence of a prediction error signal, however, is not as well established in the dorsal as in the ventral striatum. Direct measurement of dopamine within the dorsal striatum has not been undertaken during a task that would produce a prediction error signal from the midbrain. Work by Oyama et al. (2010) has provided the best evidence to date that an error signal is in fact generated in the dorsal striatum. In this study, single unit activity was recorded in the dorsal striatum and the VTA/SNc within the same animals, to look for correlated activity between structures during performance of a probabilistic Pavlovian conditioning task. The data indicate that neurons within the dorsal striatum do in fact show activity indicative of an error prediction signal similar to the signal generated by putative dopaminergic neurons within the midbrain.

In addition to potentially providing a prediction signal, dopamine within the dorsal striatum promotes learning and memory processes that are necessary for goal-directed behavior. The dopamine projection to the dorsomedial striatum, however, may play a different role in learning than the projection to the dorsolateral striatum, as these two regions may differ significantly in the temporal profile of dopamine release, uptake, and degradation (Wickens et al., 2007a,b). One current working hypothesis is that dopamine projections from the medial SNc to the dorsomedial striatum promote action–outcome learning, while dopaminergic projections from the lateral SNc to the dorsolateral striatum promote habit learning (Yin et al., 2008). For example, selective lesions of dopamine cells that project to the dorsolateral striatum impair habit learning (Faure et al., 2005). Local dopamine depletion, then, is similar to excitotoxic lesions of the dorsolateral striatum, in that both manipulations retard habit formation and favor the acquisition of goal-directed actions (Yin et al., 2004). Further evidence that dopamine signaling within the dorsal striatum may differentially mediate action–outcome and habit/motor learning has been provided by Yin et al. (2009). Medium spiny neurons within the striatum can be segregated into two distinct populations: those projecting directly to neurons of the substantia nigra pars reticulata (SNr) and the internal segment of the globus pallidus (the entopeduncular nucleus in rodents), forming the 'direct' pathway, and those that project to the external segment of the globus pallidus, forming the 'indirect' pathway. Neurons of the external globus pallidus in turn project to the SNr, the entopeduncular nucleus, and the subthalamic nucleus. These two populations exhibit distinct physiological properties and, importantly, express different dopaminergic receptors, with neurons of the direct pathway preferentially expressing D1 receptors and neurons of the indirect pathway preferentially expressing D2 receptors (Albin et al., 1989; Surmeier et al., 2007; see the schematic summary below). Using D2-eGFP mice, Yin et al. (2009) found that D2-expressing neurons located in the dorsolateral striatum exhibit a significant increase in synaptic strength compared to D1-expressing neurons from the same region when mice underwent extended training on a rotarod task. Further, blocking D1 receptors did not affect performance when the antagonist was injected after the task had been well learned. In contrast, blocking D2 receptors impaired performance at both early and late training phases. This suggests that motor skill learning involves an increase in synaptic activation of D2-expressing medium spiny neurons within the dorsolateral striatum. An intriguing possibility is that these kinds of changes may also underlie habitual behavior as routes become very familiar in an unchanging environment.
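
For reference, the schematic summary below encodes the direct/indirect pathway description above as plain data. It is deliberately simplified (rodent nomenclature, with collaterals and interneurons omitted), and the 'net effect' entries reflect the textbook view of the two pathways rather than findings from the studies cited here.

```python
# Simplified summary of the two striatal output pathways described above.
pathways = {
    "direct": {
        "dopamine_receptor": "D1",
        "projects_to": ["SNr", "internal globus pallidus (entopeduncular nucleus)"],
        "net_effect": "disinhibits thalamus (facilitates action)",
    },
    "indirect": {
        "dopamine_receptor": "D2",
        "projects_to": ["external globus pallidus"],
        "net_effect": "inhibits thalamus (suppresses action)",
    },
}

for name, p in pathways.items():
    print(f"{name}: {p['dopamine_receptor']} -> {', '.join(p['projects_to'])}")
```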

Additional methods to assess the distinct role that dopamine has in learning and decision making mechanisms within the dorsal striatum have been employed by Palmiter and colleagues, using a dopamine-deficient mouse (Palmiter, 2008; Wall et al., 2011). These mice lack tyrosine hydroxylase selectively in dopamine neurons and are therefore unable to synthesize dopamine. In contrast to lesion models, dopamine neurons in dopamine-deficient mice are functionally intact (Robinson et al., 2004), and endogenous dopamine signaling can be selectively restored by the experimenter, making them a powerful tool for studying dopamine signaling. These mice show impairments in instrumental learning and performance, but their performance can be restored either by L-DOPA injection or by anatomically selective viral gene transfer (Robinson et al., 2007; Sotak et al., 2005). Using a water U-maze task in which mice had to shift from an initially acquired escape strategy to a new strategy, or to reverse the initially learned strategy, Darvas and Palmiter (2010, 2011) provided evidence that striatal dopamine is necessary for cognitive flexibility. Restricting dopamine signaling to the ventral striatum did not impair learning of the initial strategy or reversal learning, but strongly disrupted strategy shifting. In contrast, mice with dopamine signaling restricted to the dorsal striatum showed intact learning of the initial strategy, reversal learning, and strategy shifting. This suggests that dopamine signaling in both dorsal and ventral striatum is sufficient for reversal learning, whereas only dopamine signaling in the dorsal striatum is sufficient for the more demanding strategy-shifting task. In a follow-up study (Darvas and Palmiter, 2011), dopamine was restored to the ventromedial striatum, and this treatment rescued spatial memory as well as visuospatial and discriminatory learning. Acquisition of operant behavior was delayed, however, and motivation to obtain food rewards was blunted. These studies indicate that precise restoration of dopamine signaling within the striatum can selectively affect behavior. It should be noted, however, that whatever functions can be rescued by L-DOPA or adenosine antagonism in dopamine-deficient mice are likely related to restoration of tonic, rather than phasic, dopamine signaling. In addition, these mice have not been used to directly assess habit formation, or the potential parallel signaling that may take place between the dorsolateral and dorsomedial striatum as learning develops. Nevertheless, the development of this kind of model for selectively investigating dopamine function in the striatum is likely to significantly advance our understanding of the role that dopamine plays in decision making during learning.

Based on these data, one hypothesis about the influence of dopamine on striatal function suggests that the striatum can be organized into four regions that underlie different, but synergistic, association processes, each contributing to the decision processes that are necessary for navigating within complex learning environments (Ikemoto, 2007; Yin et al., 2008). Neuronal signaling moves through a serial cascade, beginning in the ventral striatum and moving into the dorsomedial and, finally, the dorsolateral striatum as learning progresses. It is thought that this spiraling of information through the ventral–dorsal aspects of the striatum promotes the transition from goal-directed to habit-driven behaviors (Belin and Everitt, 2008; Everitt and Robbins, 2005). Details of this working model of the striatum include the following (also see Fig. 9):

(a) The ventral striatum is important for Pavlovian learning and the interaction between Pavlovian and instrumental learning mechanisms. This kind of stimulus–reward learning underlies conditioned approach behaviors, and is a powerful way in which one can learn that neutral stimuli lead to reward. In some cases, the stimuli that predict reward may acquire some of the motivational properties of the primary reward. An example of this is the value that money has: while money itself has no innate biological importance, it is often paired with items that do have motivational significance, allowing it to serve as a predictor of future rewards, and also as a powerful conditioned reinforcer.

Fig. 9. Major functional domains of the striatum. An illustration of a coronal section of the striatum showing half of the brain (Paxinos and Watson, 2007). The four functional domains are anatomically continuous, and roughly correspond to what are commonly known as the nucleus accumbens shell and core (ventral striatum), the dorsomedial striatum, and the dorsolateral striatum. The figure labels these domains as follows: dorsolateral striatum, stimulus–response learning (habits, skills, behavioral sequencing); dorsomedial striatum, action–outcome learning (goal-directed action); nucleus accumbens core, stimulus–outcome learning (Pavlovian preparatory CRs and anticipatory approach behaviors); nucleus accumbens shell, stimulus–outcome learning (Pavlovian consummatory CRs and hedonic URs). These striatal subregions are thought to implement different aspects of reinforcement learning, either 'model-free' learning (dark grey) or 'model-based' learning (light grey). In addition, these subregions are thought to represent both the actor and the critic. Within the dorsal striatum, the lateral portion supports a model-free actor function whereas the dorsomedial region represents a model-based actor. The ventral striatum, which is crucial for Pavlovian learning, is thought to represent the critic; the core represents a model-free critic, whereas the shell represents a model-based critic. After Bornstein and Daw (2011) and Yin et al. (2008).

(b) The dorsomedial striatum, on the other hand, appears to support action–outcome associations. This kind of learning is fundamental for adaptive goal-directed behaviors. Many of our behaviors can be considered goal-directed; for example, publishing more papers will lead to a promotion at work, or increasing our level of exercise may lead to better health.

(c) The dorsolateral striatum is involved in incremental stimulus–response kinds of learning that underlie procedural learning, the formation of habits, and the sequencing of behavior. In many cases, habits are thought of in a negative context, such as drug addiction. When habits are discussed here, the term is meant to indicate something more general and adaptive, reflecting a well-learned skill or automatic behavior. One example of this kind of learning is learning to ride a bicycle; initially, a great deal of effort and conscious thought goes into staying upright and moving the bicycle forward. Over time, however, these actions become considerably easier, and the individual components of the behavior that keep you upright and move the bicycle forward become an implicit, fluid sequence that may be difficult to verbalize when teaching someone else how to ride.

While these descriptions of the contributions of the striatal subregions to decision making processes suggest separable functions (i.e., serial processing), it is more likely that these subregions function synergistically within a wide network to direct behavior in complex learning environments (Groenewegen et al., 1999b; Haber, 2003; Haruno and Kawato, 2006; Joel and Weiner, 2000; Yin et al., 2008; Zahm, 2000). These functions will be discussed individually below.

6.3. The ventral striatum: Pavlovian learning and cost-based decision making

The ventral striatum receives convergent glutamatergic input from multiple sensory and association areas of the neocortex (prefrontal cortex) and the limbic system, including the amygdala and the hippocampus and related structures (subiculum, area CA1, entorhinal cortex) (Boeijinga et al., 1993; Flaherty and Graybiel, 1993; Groenewegen et al., 1999a,b, 1987; Humphries and Prescott, 2010; Izquierdo et al., 2006; McGeorge and Faull, 1989; Mulder et al., 1998; Totterdell and Meredith, 1997; van Groen and Wyss, 1990; Voorn et al., 2004). The nucleus accumbens, the main portion of the ventral striatum, can be divided into two major subregions: the core, which is continuous with the dorsomedial striatum, and the shell, which occupies the ventral and medial portions of the nucleus accumbens. Although the core and shell regions share common characteristics, they also differ significantly in terms of their cellular morphology, neurochemistry, and patterns of projections, all of which may suggest different functions for the core and shell (Heimer et al., 1991; Jongen-Relo et al., 1994; Meredith, 1999; Meredith et al., 1992, 1996, 2008; Usuda et al., 1998; Zahm and Brog, 1992; Zahm and Heimer, 1993). The core and shell regions of the nucleus accumbens are not likely to function completely independently of each other, however, as direct interconnections between these areas have also been described (Heimer et al., 1991; van Dongen et al., 2005; Zahm, 1999; Zahm and Brog, 1992; Zahm and Heimer, 1993).

Based on its connectivity, a general working model has been that the nucleus accumbens represents a 'limbic–motor interface' that facilitates appropriate responding to reward-predictive stimuli (e.g., Ikemoto and Panksepp, 1999; Mogenson et al., 1980; Nicola, 2007; Pennartz et al., 1994; Wise, 2004; Wright et al., 1996; Zahm, 2000). How this process is achieved, however, is not fully understood. If the accumbens does indeed represent such an interface, then it should, at the very least, process information related to reward and the actions that lead to the acquisition of reward. In fact, there is a fair amount of evidence suggesting that neurons within the nucleus accumbens respond to cues associated with a reward (e.g., Carelli and Ijames, 2001; Cromwell and Schultz, 2003; Hassani et al., 2001; Hollerman and Schultz, 1998; Nicola et al., 2004; Roitman et al., 2005; Setlow et al., 2003; Wilson and Bowman, 2005), as well as to the selection of one behavior from among competing alternatives (Hikosaka et al., 2006; Nicola, 2007; Pennartz et al., 1994; Redgrave et al., 1999a; Roesch et al., 2009; Taha et al., 2007).

6.3.1. Nucleus accumbens and Pavlovian learning

Foraging animals encounter situations in which they are required to find food or other necessary resources. In order to learn that certain stimuli may signal the availability of the resource being pursued, organisms must be able to learn the relationships between positive outcomes and their reward-predictive cues. This behavior can be investigated within the laboratory using an autoshaping (also known as 'sign tracking') paradigm. In autoshaping experiments, a cue is paired with the availability of reward. Initially, this cue is neutral, meaning that the cue itself is neither biologically significant nor predictive of reward. Because the cue is novel, and rodents have a propensity for investigating novel cues and objects (Bardo et al., 1989, 1996; Bardo and Dwoskin, 2004; Burns et al., 1996; De Leonibus et al., 2006), the animal will approach the cue, and over time will begin to associate the cue with reward. Thus, the neutral cue gains control over approach responses even though reward delivery is independent of any specific behavior, and with extended training, approach responses are observed nearly every time the reward-predictive cue is presented. A cue that has never been paired with reward does not elicit approach behavior, even after repeated presentation (Bussey et al., 1997; Robbins and Everitt, 2002). This approach behavior lacks the flexibility of instrumental learning, in that the behavior is not generally altered by the introduction of new contingencies (Bussey et al., 1997; Day and Carelli, 2007; Jenkins and Moore, 1973; Locurto et al., 1976; Williams and Williams, 1969). Autoshaping has important implications for foraging behavior; in a rapidly changing environment, autoshaping represents a fundamental mechanism through which an organism learns about environmental cues that lead to biologically significant events such as food, mates, and shelter. It is not surprising, then, that autoshaping is demonstrated by a number of species, including birds (Brown and Jenkins, 1968), monkeys (Sidman and Fletcher, 1968), and humans (Wilcove and Miller, 1974).

A number of studies suggest that the nucleus accumbens mediates autoshaping. For example, Cardinal et al. (2001) demonstrated that excitotoxic lesions of the nucleus accumbens core impair the ability to discriminate between a cue that is predictive of reward and an alternate cue with no predictive value. Similarly, depletion of dopamine in the nucleus accumbens results in deficits in the acquisition and expression of approach behaviors (Di Ciano et al., 2001; Parkinson et al., 2002). Further, electrophysiological recordings during autoshaping demonstrate that accumbens neurons exhibit phasic changes in firing rate that are selective for cues predictive of reward; in some cases, an increase in activity is associated with the onset of a reward-predicting cue, while a second subset of neurons is significantly inhibited. These same cells showed little or no change in activity in response to a cue that was not paired with reward. These findings were also core- and shell-specific; significantly fewer neurons in the shell showed an excitatory response to predictive cues compared to neurons within the core (Day et al., 2006). In addition, lesion and pharmacological data indicate that disrupting activity within the core interferes with approach toward predictive cues, suggesting that the core may help organisms discriminate between biologically relevant and irrelevant cues (Cardinal et al., 2001; Di Ciano et al., 2001). The functional dissociation between the core and shell might be expected given that these regions send separate projections to different output structures (Heimer et al., 1991; Sesack and Grace, 2010).

The accumbens is also involved in Pavlovian-instrumental transfer (PIT), which is the capacity of a Pavlovian stimulus that predicts reward to elicit or increase instrumental responses for the same (or a similar) reward (Estes, 1943, 1948; Kruse et al., 1983; Rescorla and Solomon, 1967). To produce PIT, animals first undergo Pavlovian and then instrumental training, during which they learn to associate a cue with reward and then, later, learn to make a specific operant response (e.g., press a lever) for the reward. On a probe trial, the predictive cue is presented with the lever, and the change in response rate on the lever is measured. Two forms of PIT can be observed: one that is related to the arousing effect of reward-related cues (non-selective, or general, PIT), and another that is selective for choice performance produced by the predictive status of a cue with respect to one specific reward compared to others (outcome-selective PIT) (Holmes et al., 2010). The shell and core regions of the nucleus accumbens are differentially involved in general and selective PIT; general PIT is disrupted by lesions of the core, but not by lesions of the shell (Hall et al., 2001), whereas selective PIT is disrupted by lesions of the shell, but not by lesions of the core (Corbit et al., 2001). Importantly, because the accumbens is not thought to be integral to instrumental behaviors (Yin et al., 2008), other regions of the striatum that are involved in instrumental learning should also be involved in PIT. In fact, Corbit and Janak (2007, 2010) have shown that the dorsolateral and dorsomedial striatum integrate different aspects of Pavlovian and instrumental information. For example, lesions of the dorsolateral striatum reduce PIT altogether, whereas lesions of the dorsomedial striatum interfere with the selectivity of PIT (Corbit and Janak, 2007).

6.3.2. The nucleus accumbens and cost-based decision making

When animals are pursuing a goal, they are often faced with complex effort- or time-related barriers that separate the actions they make from the goal being pursued. This is the case in natural foraging environments, and in the laboratory where animals are trained to lever press or navigate a maze for reward. Thus, it is adaptive for animals to cope with delayed reinforcement or increased effort to obtain the desired outcome. Within the laboratory, effort-based decision making can be assessed by providing the organism with a choice between a low-cost/low-value reward and a high-cost/high-value reward. Most typically, low-cost options are associated with, for example, few lever press responses or a short time delay, while high-cost options require significantly more lever presses or impose a longer delay between the last response and the delivery of reward. Many factors may influence the choice that any one animal makes, including motivational factors such as how hungry the animal is, or how desirable the reward is (Salamone et al., 2007, 2009). A growing body of work suggests that the nucleus accumbens and its cortical afferents (e.g., the anterior cingulate cortex and medial prefrontal cortex) are involved in the exertion of effort and effort-related choice behaviors (e.g., Cardinal et al., 2001; Floresco and Ghods-Sharifi, 2007; Floresco et al., 2008a; Salamone, 2002; Walton et al., 2006). Disrupting activity within the nucleus accumbens can shift behavior toward actions that require less effort or are associated with shorter delays to reward (Aberman and Salamone, 1999; Aberman et al., 1998; Bezzina et al., 2008; Cardinal et al., 2001; Day et al., 2011; Hauber and Sommer, 2009; Walton et al., 2006). A recent study (Day et al., 2011) assessed the complex role that the nucleus accumbens plays in weighing effort-based and delay-based costs. In this study, a visual cue signaled the relative value of an upcoming reward. Analysis of single unit activity within the accumbens indicated that a subgroup of neurons shows phasic increases in firing in response to the predictive cue, and that this activity reflects the cost-discounted value of the upcoming response for effort-related, but not delay-related, costs. In contrast, additional subgroups of neurons responded during response initiation or reward delivery, but this activity did not differ on the basis of reward cost. Finally, another population of neurons within the accumbens showed sustained changes in firing rate (either excitation or inhibition) while rats completed high-effort requirements or waited for delayed rewards. The diversity of the results reported in this study highlights the complexity of the computations required to make decisions when faced with competing options. For the foraging animal, the cost of obtaining rewards is dynamic; for example, the time to explore and the distance that must be travelled to obtain resources are constantly changing (Stephens, 1986). Because individual neurons within the accumbens receive diverse cortical and subcortical inputs, they are likely to carry a heavy information processing load (Kincaid et al., 1998) in complex decision making environments.

Dopamine signaling also contributes to the execution of cost–benefit decisions (Fiorillo et al., 2003, 2008; Gan et al., 2010; Kobayashi and Schultz, 2008; Ostlund et al., 2011; Phillips et al., 2007; Roesch et al., 2007; Roitman et al., 2004; Tobler et al., 2005; Wanat et al., 2010). Some studies have investigated the contribution of putative dopamine neurons to cost-based decisions by measuring activity in the midbrain (Fiorillo et al., 2003, 2005, 2008; Kobayashi and Schultz, 2008; Roesch et al., 2007; Tobler et al., 2005), while other studies have obtained a measure of dopamine activity within the nucleus accumbens, since the latter is a major target of midbrain dopaminergic projections and is known to be involved in the computations that support cost-based decision making (Day et al., 2011; Gan et al., 2010; Salamone et al., 2009; Wanat et al., 2010). In studies using voltammetry to measure phasic dopamine release, cue-evoked dopamine signals are shown to be relatively insensitive to both effort-based and delay-based costs, but a significant response is observed when the cost to obtain reward changes (Gan et al., 2010; Roesch et al., 2007; Wanat et al., 2010). Further, Wanat et al. (2010) showed that dopamine responses to rewards and their predictive cues are separable and independently modulated when instrumental response requirements are progressively increased. That is, reward-evoked dopamine release within the accumbens is affected by escalating costs in proportion to the delay imposed prior to reward delivery, rather than to increased work requirements, whereas cue-evoked dopamine release is unaffected by either temporal or effort-related costs. Together, these results may be congruent with competing theories of dopamine function: if dopamine provides a prediction error signal, then dopamine neurons in a trained animal respond to rewards only when they are unexpected (Fiorillo et al., 2003; Schultz et al., 1997), as would be the case when the relative cost of a reward changes. In addition, phasic dopamine signals may provide an incentive signal that is used to determine the value of the reward (Berridge, 2007). This would also explain the observation that changes in phasic dopamine occur when the cost to obtain the reward changes. Finally, these results may also be consistent with the 'Flexible Approach Hypothesis', which states that dopamine signaling within the accumbens is required for reward-seeking behavior only when the specific actions necessary to obtain reward are variable across trials (Nicola, 2010).

The role of the nucleus accumbens in mediating cost-based choice behavior has also been tested using maze tasks. For example, a T-maze choice task (Cousins et al., 1996; Salamone, 1994) can be used in which one of the choice arms contains a large food reward, whereas the other arm has a significantly smaller reward. Effort-related decision problems can be introduced by placing a barrier in the arm that contains the larger reward, thus presenting an obstacle that the rat must climb to gain access to the larger reward. Alternatively, the barrier that prevents the rat from accessing the larger reward can be used to impose a delay before access to the large reward is granted. Using an effort-based version of this task, Cousins et al. (1996) demonstrated that excitotoxic lesions of the accumbens significantly decreased selection of the high-effort/high-reward maze arm. When, however, reward was entirely omitted from the low-effort maze arm, these rats chose the high-effort/high-reward arm and were capable of obtaining the reward, despite the high cost.

Recently, Bardgett et al. (2009) used a discounting version of the T-maze task in which the amount of food in the large-reward arm of the maze was reduced each time the rat selected that arm. This 'adjusting-amount' discounting variant of the T-maze task permits assessment of the indifference point for each rat, defined as the point at which the rat no longer shows a preference for one reward over the other and therefore chooses both amounts equally often (Richards et al., 1997). When dopamine signaling was blocked with either a D1 or D2 receptor antagonist, rats were more likely to choose the small-reward arm, but when treated with amphetamine, rats were more likely to choose the large-reward arm. Clearly, carefully designed behavioral studies with mazes can provide a more complete understanding of how the brain processes the information necessary for making (optimal) decisions in complex learning environments. In fact, several maze-based tasks used to investigate cost-based decision making have undergone behavioral validation and evaluation (Cousins et al., 1996; Salamone et al., 1991; van den Bos et al., 2006) and have been used by several laboratories to characterize the effects of brain lesions or drug manipulations on choice behavior (Bardgett et al., 2009; Denk et al., 2005; Salamone et al., 1991; Schweimer and Hauber, 2006; Walton et al., 2002). Although there are obvious differences between these tasks and the operant tasks after which they have been modeled, both have yielded remarkably similar results (Bardgett et al., 2009; Cousins et al., 1994; Denk et al., 2005; Floresco et al., 2008b; Koch et al., 2000; Salamone et al., 1991, 2002; Sink et al., 2008; Wakabayashi et al., 2004; Walton et al., 2006). Thus, maze tasks appear to be valid models for investigating choice behavior during cost-based decision making.
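
The logic of the adjusting-amount titration can be illustrated with a toy simulation. This is a schematic sketch only, not the published Bardgett et al. (2009) protocol: the hyperbolic discounting rule, step size, and parameter values are all illustrative assumptions.

```python
# Schematic 'adjusting-amount' titration on a T-maze (illustrative parameters).
# The large-arm amount shrinks after each large-arm choice and recovers after
# each small-arm choice; the subjective value of the large arm is discounted
# by the delay imposed before its delivery.

def discounted_value(amount, delay, k=0.2):
    """Hyperbolic discounting: V = A / (1 + k * delay)."""
    return amount / (1.0 + k * delay)

small, delay, step = 2.0, 5.0, 0.5
large = 10.0
for trial in range(60):
    chose_large = discounted_value(large, delay) > small
    large += -step if chose_large else step

# After titration, 'large' hovers near the indifference point, where the
# discounted large reward equals the immediate small one:
# A = small * (1 + k * delay) = 4.0 here.
print(f"indifference amount ~ {large:.1f}")
```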

6.3.3. Spatial learning and navigation: the role of the ventral striatum

The ability to make optimal cost-based decisions is essential if animals are to make adaptive behavioral choices during goal-directed navigation. The ventral striatum appears strategically positioned to play a key role in cost-based decisions during navigation, given the convergent evidence from a variety of maze studies, including the spatial version of the Morris swim task (Sargolini et al., 2003; Setlow and McGaugh, 1998), the radial maze (Gal et al., 1997; Smith-Roe et al., 1999), a spatial version of the hole board task (Maldonado-Irizarry and Kelley, 1995), as well as a task in which the animals are required to discriminate a spatial displacement of objects (e.g., Annett et al., 1989; Ferretti et al., 2005; Roullet et al., 2001; Sargolini et al., 1999; Seamans and Phillips, 1994; Usiello et al., 1998).

To investigate the idea that the ventral striatum specifically associates spatial context with reward information to facilitate initiation of appropriate navigation-based behaviors (Mogenson et al., 1980), Lavoie and Mizumori (1994) recorded neural activity in the ventral striatum while rats navigated an 8-arm radial maze for food reward. This study demonstrated, for the first time, spatial firing correlates within the ventral striatum (Lavoie and Mizumori, 1994). The mean place specificity for all ventral striatal neurons was significantly lower than that typically observed in the hippocampus (Barnes et al., 1990), indicating that while ventral striatal neurons discharge with spatial selectivity, they are not as selective as hippocampal neurons. The moderate spatial selectivity likely reflects the integration of spatial with other non-spatial information within the ventral striatum, including reward and movement. The fact that single ventral striatal neurons encode multiple types of information supports the view that spatial, reward, and movement information may be integrated at the level of individual ventral striatal neurons. Recent evidence suggests that spatial information within the ventral striatum is derived from the hippocampus: Ito et al. (2008) showed that interrupting information sharing between the hippocampus and the shell of the nucleus accumbens disrupted the acquisition of context-dependent retrieval of cue information, suggesting that the shell, in particular, may provide a site at which spatial and discrete cue information are integrated.
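
Place specificity of the kind compared here is commonly quantified with the spatial information measure of Skaggs and colleagues, which expresses how many bits each spike conveys about the animal's location. A minimal sketch, assuming occupancy and firing-rate maps as inputs (the example maps are invented for illustration):

```python
import numpy as np

def spatial_information(rate_map, occupancy):
    """Skaggs spatial information (bits/spike):
    I = sum_i p_i * (r_i / r_mean) * log2(r_i / r_mean),
    where p_i is the occupancy probability of bin i and r_i its firing rate."""
    p = occupancy / occupancy.sum()
    r_mean = np.sum(p * rate_map)
    nz = rate_map > 0
    ratio = rate_map[nz] / r_mean
    return np.sum(p[nz] * ratio * np.log2(ratio))

# A sharply tuned field carries more bits/spike than a diffuse one.
occ = np.ones(100)                              # uniform occupancy
sharp = np.zeros(100); sharp[40:45] = 10.0      # hippocampus-like field
broad = np.full(100, 0.4); broad[30:70] = 0.6   # weakly modulated field
print(spatial_information(sharp, occ))  # ~4.3 bits/spike
print(spatial_information(broad, occ))  # ~0.03 bits/spike
```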

Work by Redish and his colleagues has sought to describe the unique contributions that the hippocampus and the striatum make to choice behavior and spatial information processing using a multiple T-maze task. With this task, several choice points are presented to the rat as it navigates from a start location to a reward site. The final choice point on the maze represents a point in space where the animal makes a final 'high-cost' choice to gain access to reward. At this critical point, a number of interesting events occur in terms of both observable behavior and neuronal responses. First, early in training, while the animal is learning the correct choice, it pauses and engages in what is called 'vicarious trial and error' (Tolman, 1938, 1939). During this behavior, ensembles of hippocampal neurons transiently represent locations ahead of the animal, sweeping down the arms of the maze before the animal implements a choice (Schmitzer-Torbert and Redish, 2002; van der Meer et al., 2010). In parallel with these forward sweeps, neurons in the ventral striatum that are responsive to reward (i.e., at the reward site on the maze) also show enhanced responses at the final decision point. This activity is thought to reflect an 'expectation-of-reward' signal at decision points (van der Meer et al., 2010; van der Meer and Redish, 2010). This interpretation is congruent with work described above showing that the ventral striatum is involved in mediating the influence that motivationally relevant cues have on behavior (Cardinal et al., 2001; Day and Carelli, 2007; Kelley, 2004). In addition, these results support the idea that the moderately spatially selective neurons described by Lavoie and Mizumori (1994) reflect the integration of spatial with non-spatial (i.e., reward- and movement-related) information at the level of individual ventral striatal neurons. Thus, together with the hippocampus, the ventral striatum plays a key role in evaluating and selecting the behaviors most likely to result in reward, and thus underlies goal-directed behavior (in this particular case, goal-directed navigation).

In addition to characterizing the activity of the hippocampus and the ventral striatum in a maze-based decision making task, the dorsal striatum has also been characterized. Previous studies have provided evidence that neurons within the dorsal striatum exhibit egocentric movement-related discharge (e.g., Barnes et al., 2005; Jog et al., 1999; Yeshenko et al., 2004) and show spatially selective firing on maze tasks. On the multiple T-maze task, van der Meer et al. (2010) observed a gradual increase in the coding efficiency of dorsal striatal neurons as the animals became better at implementing the correct choice. In addition, these responses within the dorsal striatum are most evident during the turn sequence, at the reward location, and in response to cues that are predictive of reward (van der Meer et al., 2010). This suggests that activity in the dorsal striatum may reflect the events that define the task structure: because the ultimate goal of the task is to reach reward, this is one salient event, and the turn sequence that the rat makes in order to reach that reward might be considered another salient aspect of task structure. This result is in line with work from Graybiel and colleagues (e.g., Barnes et al., 2005; Jog et al., 1999), which is discussed in greater detail below. Overall, these results provide evidence for a functional network that supports choice behavior on a goal-directed, navigation-based task. The role that the dorsal striatum plays in decision and learning processes is discussed below.

6.4. Dorsal striatum: contributions to response and associative learning

Historically, investigations of the role that the dorsal striatum plays in mediating goal-directed behaviors treated the dorsal striatum as a single entity, and it has only fairly recently been recognized that the lateral and medial aspects of the dorsal striatum participate in learning in unique ways (Balleine et al., 2007; Balleine and O'Doherty, 2010; Yin et al., 2008). The dorsomedial striatum is innervated by the association cortices; the anterior portion of the dorsomedial striatum also receives projections from the prefrontal cortex, while the more posterior region receives significant projections from the perirhinal and agranular insular regions, as well as the entorhinal cortex and basolateral amygdala (McGeorge and Faull, 1987, 1989). This region of the dorsal striatum is thought to mediate goal-directed behaviors, as has been shown in instrumental operant tasks and in goal-directed navigational tasks. In contrast, the dorsolateral striatum, which is innervated by the primary motor and somatosensory cortices, underlies motor skill learning and the habit learning that allows automaticity of behavior when appropriate (see Balleine et al., 2007; Johnson et al., 2007; Yin and Knowlton, 2006; Yin et al., 2008). Importantly, both modes of learning contribute to flexible navigational behaviors – it is through the interaction of these two modes of learning that animals are able to select the most adaptive behavior necessary to navigate in a complex learning environment. In terms of reinforcement learning theory, the dorsal striatum as a whole is thought to represent the actor in the actor–critic framework, but the dorsomedial striatum is thought to perform this function within a model-based system, whereas the dorsolateral striatum is thought to perform this function within a model-free framework.
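
The actor–critic division of labor can be made concrete with a minimal model-free sketch. This is a generic textbook formulation rather than a model of striatal circuitry per se; the table sizes, learning rates, and the mapping of modules onto structures are illustrative.

```python
import numpy as np

n_states, n_actions = 5, 2
V = np.zeros(n_states)               # critic: state values ('ventral striatum')
H = np.zeros((n_states, n_actions))  # actor: action preferences ('dorsal striatum')

def actor_critic_update(s, a, reward, s_next,
                        alpha_v=0.1, alpha_h=0.1, gamma=0.95):
    """One update from an experienced transition (s, a, r, s').
    The critic's TD error trains both the value table and the policy,
    echoing the idea of a single dopaminergic 'teaching signal'."""
    delta = reward + gamma * V[s_next] - V[s]
    V[s] += alpha_v * delta          # critic update
    H[s, a] += alpha_h * delta       # actor update (chosen action only)
    return delta

# e.g., a rewarded transition from state 0 via action 1 into state 4:
print(actor_critic_update(s=0, a=1, reward=1.0, s_next=4))  # positive delta
```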

6.4.1. Action–outcome learning and habit learning in the dorsal striatum

Given enough time and practice, the learning of a motor skill or habit can move from being effortful to a point where the newly acquired skill can be performed without a great deal of cognitive effort. Under 'normal' learning conditions, some degree of automation of behavior may be beneficial in that well-learned behaviors can take place without engaging a great deal of information-processing resources, thus leaving the organism in a position to direct attentional and cognitive resources to more difficult or urgent matters. The mechanisms that underlie this transition are only just beginning to be understood. Behavioral evidence indicates that motor skill and habit learning take place over an initial phase of fast improvement, followed by a slower phase of gradual refinement (Costa et al., 2004; Karni et al., 1998; Yin and Knowlton, 2006; Yin et al., 2008). Within an instrumental learning task, this incremental learning is observed during an initial phase of learning that is sensitive to both the action–outcome contingency and the value of the outcome. After prolonged training, however, these actions are transformed, and the behavior becomes automatic and insensitive to both the action–outcome contingency and the outcome value (Balleine and Dickinson, 1998; Balleine et al., 2009; Yin et al., 2008).

A series of elegant studies conducted by Yin and his colleagues has clearly identified functional differences between the dorsolateral and dorsomedial striatum (Yin et al., 2004, 2005, 2006, 2009; Yin and Knowlton, 2004). Animals were trained to lever press for sucrose reward using instrumental contingencies that are known to eventually lead to habit formation. To test whether the behavior had indeed reached habit status, the reward was paired with lithium chloride to induce taste aversion. Control animals given this treatment continued to lever press for sucrose reward, indicating that their behavior was impervious to the reward devaluation procedure. Animals with selective lesions of the dorsolateral striatum, however, significantly reduced their rate of responding, indicating that the dorsolateral striatum plays a key role in habit behavior. Importantly, lesions of the dorsomedial striatum after the acquisition of the habitual behavior did not affect habitual responding; these animals continued to lever press for sucrose reward after lithium chloride treatment, indicating that the dorsomedial striatum is not necessary for the expression of habitual behavior once it has been acquired (Yin et al., 2004). Working on the idea that the dorsomedial striatum may be involved in action–outcome learning rather than habit learning, Yin et al. again trained rats on a task that is normally sensitive to outcome devaluation and to contingency degradation, in which the probability of reward delivery is no longer dependent on an appropriate response by the rat (Colwill and Rescorla, 1990; Hammond, 1980). Reversible inactivation of the posterior part of the dorsomedial striatum, as well as pre- and post-training lesions of this region, eliminated sensitivity to outcome devaluation and degradation, and thus led to habit-like responding (Yin et al., 2005). Based on these results, it appears that the posterior dorsomedial striatum is important for the learning and expression of goal-directed behavior: when this region is functionally blocked, the animal's behavior becomes habitual even under training conditions that normally result in goal-directed actions in control rats.

As discussed in relation to the ventral striatum, maze tasks can be used that closely parallel the learning contingencies used within instrumental-operant tasks, despite the obvious differences between the motor programs necessary for pressing a lever and for traversing a maze. Using a T-maze task, Yin and Knowlton (2004) evaluated the idea that the posterior dorsomedial striatum is involved in flexible action–outcome/associative learning, whereas the dorsolateral striatum underlies response (motor) learning. Lesions of the dorsomedial or dorsolateral striatum were made prior to the acquisition of the task, and rats were then extensively trained to retrieve reward using a response strategy, specifically a rightward body turn. The strategy that the animal is using can be assessed directly on a probe trial in which the animal begins the trial on a different arm of the maze. If the animal is dependent on a response/motor strategy, then it will persist in making a rightward body turn, but if it is using a more flexible place strategy, the animal will be able to navigate to the rewarded site by reintegrating the spatial features of the environment with the goal location. Lesions of the posterior dorsomedial striatum resulted in the use of a response strategy; in this case, the animals continued to make rightward body turns, while control animals were able to employ a place strategy to successfully retrieve the reward. This observation, together with the data discussed above, indicates that the dorsomedial striatum underlies flexible choice behavior (Corbit and Janak, 2010; Devan and White, 1999; Ragozzino et al., 2002; Whishaw et al., 1987).

Neurophysiological studies indicate that neurons within the dorsomedial striatum undergo changes in activity early during motor learning, and their firing has been shown to change according to flexible stimulus-value assignments (Kimchi and Laubach, 2009; Yin et al., 2009). Similarly, inactivation or pharmacological manipulation of the prelimbic and infralimbic cortical areas, which form part of the association loop that projects to the medial portion of the dorsal striatum, also impairs behavioral flexibility (Ragozzino et al., 1999a,b). Whereas the hippocampus may be necessary to establish the spatial location of the goal (see Section 5), it would appear that the dorsomedial striatum is important for choosing the correct course of action that leads the animal to this location. One intriguing interpretation of these results is that the hippocampus does not compete with, or function independently of, the striatum, as has been previously claimed (Packard and Knowlton, 2002; Poldrack and Packard, 2003); rather, these brain regions work synergistically to form a functional circuit (Mizumori et al., 2004, 2009; Yin and Knowlton, 2006). This hypothesis is supported by studies that have examined neural activity in the dorsomedial and dorsolateral striatum during spatial navigation. Some of the neurons within these regions exhibit location-specific firing while a rat traverses a maze, occasionally independent of both movement and reward condition (Mizumori et al., 2000; Ragozzino et al., 2001; Wiener, 1993). While it has been argued that hippocampal place fields contribute to the determination of context saliency (discussed in Section 5.3.1), striatal place fields may be used to provide location-selective and context-dependent control over an animal's movement. On the other hand, neurons that are sensitive to the egocentric movement of the animal are likely to reflect intentional movement/planning of movement toward the goal location, and neurons responsive to the goal location provide information regarding the outcome of the action/movement to the goal location (Mizumori et al., 2004; Yeshenko et al., 2004). Support for this idea has also been found in non-human primates, in which striatal neurons become engaged in processing information about learned events that have not yet occurred, suggesting that this activity is evoked by the expectation of an upcoming salient event (Schultz et al., 1997). This kind of neural activity signals not only whether an event is going to occur, but also the location of the event (Hikosaka et al., 1989) and, in some cases, the direction of impending movement (Alexander and Crutcher, 1990b).

6.4.2. Response learning in the dorsal striatum

In response learning, sensory stimuli direct the behavior or motor response that will ultimately be made, for example an arm movement or a body turn. The likelihood that any particular movement is made in response to a stimulus is initially influenced by the presence or absence of reward. Over time, however, reward no longer reliably influences behavior, and thus the behavior is no longer considered flexible but habitual. The acquisition of a habit involves the gradual development of specific S–R associations (Mishkin et al., 1984; Squire et al., 1993). A habit is distinguished by the tendency to be 'response-like', meaning that it is triggered automatically by a particular stimulus or stimulus complex (Dickinson, 1985). If individual neurons represent stimulus–response associations, then they should exhibit two key characteristics: their activity should be modulated by the presentation of a stimulus that cues the organism to perform an action for reward, and their activity should encode some aspect of the action that the organism performs once the stimulus has been presented. This kind of activity has been well demonstrated in the dorsolateral striatum using tasks that require the subject to make a specific response movement to receive a reward as directed by an instructional cue (e.g., Barnes et al., 2011; Jog et al., 1999; Thorn et al., 2010). These kinds of results have been demonstrated in both primates and rodents, for several different kinds of task-relevant cues, including auditory and visual cues, and for many different body movements, including movements of the hand, arm/forelimb, eyes, head, and whole body (Alexander and Crutcher, 1990b; Barnes et al., 2005; Gardiner and Kitai, 1992; Hikosaka et al., 1989; Jaeger et al., 1993; Jog et al., 1999; Kimura et al., 1992; Schultz and Romo, 1988, 1992; White and Rebec, 1993).

Work by Ann Graybiel and her colleagues has identified some of the key neural mechanisms that underlie habit formation/stimulus–response learning (Barnes et al., 2005, 2011; Jog et al., 1999). Using a T-maze task, rats were overtrained to respond to the presentation of an auditory instruction cue that indicated whether the animal should turn left or right to reach the goal (i.e., food reward). Single-unit recordings from the dorsolateral striatum were performed throughout the training procedure, which allowed an assessment of potential changes in neural activity as learning progressed. In addition, task-related neural activity was assessed at different areas on the maze, including the start area, the area where the tone was presented, the area where the body turn toward the goal was executed, and the goal location. Initially, neural activity was responsive to several aspects of the task, especially the point at which an animal executed the body turn toward the goal location. Over the course of learning, however, neural activity gradually shifted, so that task-related activity reflected the beginning and the end of the task. This pattern of activity remained stable over the course of several weeks, as did the behavior (Jog et al., 1999). These results suggest that there is a restructuring of neuronal responses within the sensorimotor striatum as habitual behavior develops.

6.4.3. Sequence learning in the dorsal striatum

In addition to learning which behaviors ultimately lead to reward, goal-directed behavior may require that behaviors be performed in a particular order or sequence. There is evidence that the striatum participates in the sequential organization of natural behaviors in monkeys (Van den Bercken and Cools, 1982) and rats (Berridge and Whishaw, 1992; DeCoteau and Kesner, 2000; Pellis et al., 1993). For example, the dorsal striatum has been shown to be critical for grooming sequences in rats (Aldridge and Berridge, 1998; Berridge and Whishaw, 1992). In addition, in the work discussed above, it was demonstrated that neurons within the dorsolateral striatum tend to respond to the beginning and the end of trials as training on a cued T-maze task progresses. This response may indicate that behavioral sequences are parsed into 'chunks' as the task is learned (e.g., Barnes et al., 2005; Boyd et al., 2009; Graybiel, 1998; Kubota et al., 2009; Thorn and Graybiel, 2010; Tremblay et al., 2009, 2010). Recent work by Yin (2010) also suggests that the dorsal striatum participates in self-initiated sequences of behaviors that lead to reward. In this study, rats were trained to press two levers in a particular sequence in order to gain access to reward. Excitotoxic lesions of the dorsolateral striatum significantly impaired the acquisition of the correct sequence, while lesions of the dorsomedial striatum had no significant effect on sequence learning. In terms of reinforcement learning algorithms, the chunking of behaviors into a coherent 'whole' that leads to a desired goal is formalized in hierarchical reinforcement learning models (Botvinick et al., 2009). These models are attractive for describing goal-directed behavior in complex learning situations because they may be able to more accurately describe the multiple 'bits' of behavior that ultimately lead to goal acquisition, blending the model-based and model-free behavioral strategies that are likely to underlie flexible goal-directed behavior. Learning to execute learned actions in a complete sequence is essential for survival and subserves many routine behaviors, including navigation.
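
In hierarchical reinforcement learning, chunking is often formalized with 'options': temporally extended actions that have their own initiation states, internal policy, and termination condition, and that the higher-level learner treats as single actions. A minimal sketch of the data structure, with invented names for a multiple T-maze example:

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option chunks a sequence of primitive actions into one unit:
    it can be initiated in certain states, follows its own policy until
    a termination condition fires, and is then treated by the top-level
    learner as a single (temporally extended) action."""
    name: str
    initiation: Set[int]              # states where the chunk may start
    policy: Callable[[int], int]      # maps state -> primitive action
    terminate: Callable[[int], bool]  # True when the chunk is complete

# e.g., a hypothetical 'run to the final choice point' chunk, started at
# the maze stem (state 0) and ended at the final choice point (state 9):
run_to_choice = Option(
    name="run_to_choice_point",
    initiation={0},
    policy=lambda s: 1,               # keep moving forward
    terminate=lambda s: s == 9,
)
```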

Organizing behaviors into sequences requires precise timing and identification of the beginning and end of a complete sequence of behaviors. Recent work has elegantly demonstrated that the 'stop' and 'start' signals that identify the beginning and end of self-initiated sequential behavior appear to be coded within the dorsal striatum (Jin and Costa, 2010). In this study, rats were trained to press a lever on a fixed-ratio schedule that required 8 lever presses to obtain sucrose reward. Over the course of training, rats gradually acquired a sequence of approximately 8 lever presses, rarely responding more or fewer times while the lever was active. As the rats learned the behavioral sequence necessary for obtaining reward, the activity of neurons within the dorsal striatum and the SNc came to reflect the initiation and termination of the self-paced action sequences. Importantly, control experiments provided evidence that these learning-related changes in neuronal activity reflected neither movement speed nor action value (Jin and Costa, 2010). Thus, these results identify a fundamental mechanism that organizes actions into behavioral sequences, and they have important implications for complex adaptive behaviors, including goal-directed navigation.

6.5. Interactions between the dorsomedial and dorsolateral striatum

Although many behaviors that are performed on a regular basis are often performed automatically, there are instances when it is necessary to alter a routine because something in the environment has changed and the routine behavior is thus rendered inappropriate. The regulation of this behavioral switching can occur either retroactively, as a result of error feedback, or proactively, by detecting a change in context. A salient example often given for this kind of behavior is driving to work: anecdotally, many people have experienced suddenly arriving at work in their car without any specific recollection of the journey, despite being the driver. This is due, in part, to a fairly static context in which we traverse the same route, and thus encounter the same traffic lights, execute the same turns, and become accustomed to the background scenery around us (buildings, street lights, trees, etc.). When, however, a significant change is encountered on our drive to work, for example an unexpected accident that is backing up traffic, we can quite quickly interrupt our behavioral routine and evaluate other available options for getting to work. Thus, when confronted with a change in context, an important decision can be made to switch from a routine behavior to an alternative behavior that will allow us to reach our goal location.

In order for habits to develop, learning needs to occur that associates a particular action with a particular outcome. As described above, this kind of association can be mediated by the dorsomedial striatum. Once a behavior has been well learned, however, its performance appears to be mediated by the dorsolateral striatum. If these observations are true, then a question that remains is how these different subregions gain or maintain control over behavior. Recent work by Thorn et al. (2010) suggests that the dorsolateral and dorsomedial striatum undergo simultaneous changes in their neuronal activity patterns as learning progresses, but that these changes are unique in each structure. These results converge with many other pieces of data (e.g., Jog et al., 1999; Yin and Knowlton, 2004; Yin et al., 2009) and suggest a current working model in which the dorsomedial striatum regulates the evolution of behavior toward habit formation. This idea has been further tested by Yin et al. (2009), who identified region-specific changes in striatal neural activity that map onto different phases of skill learning. Electrophysiological recordings from the dorsolateral and dorsomedial striatum were performed while mice learned an accelerating rotarod task, a task that requires the gradual acquisition of complex movements to stay on the rotating rod. Performance on this task is characterized by rapid initial improvement on the first day of training, with performance reaching asymptotic levels after three days of training. These behavioral observations were accompanied by distinct changes in the rate of neuronal activity in the dorsomedial striatum early in training, while the dorsolateral striatum showed increased rate modulation during the extended training period. Further, when lesions of the dorsomedial striatum were made prior to training, mice were unable to acquire the skill, but this was not observed when lesions were made after acquisition of the skill. In contrast, lesions of the dorsolateral striatum affected both early and late phases of training, suggesting that the dorsolateral and dorsomedial striatum both participate in the acquisition of the motor skill, but once the skill is learned, the dorsomedial striatum is no longer engaged. Recordings from slices taken from the trained animals demonstrated a potential synaptic mechanism for this transition: medium spiny neurons in both the dorsomedial and dorsolateral striatum exhibited training-phase-related changes in glutamatergic transmission. The slope of excitatory postsynaptic potentials, a measure of synaptic strength, was selectively higher in the dorsomedial striatum following early training, while synaptic strength was higher in the dorsolateral striatum only after extended training. Although this task differs from more traditional learning tasks (instrumental-operant or maze tasks), it is likely to point to a fundamental synaptic mechanism that underlies the transition from action–outcome/associative learning to well-learned habits/motor skills, irrespective of the task used.

In summary, there is emerging evidence that the striatum functions to evaluate the outcomes of behaviors in terms of an organism's learned expectations. Through a series of interactive loops of information flow between the striatum and different cortical and subcortical structures, behavioral responses and their expected consequences will become more refined and predictable. Ultimately, a well-learned behavioral response will develop as the dorsolateral striatum assumes greater control over behavior. These functions must ultimately be coordinated with the context saliency function of the hippocampus so that the 'best' behaviors can be selected within the correct context or decision making environment. How this coordination among different brain structures occurs is discussed in the following section.

7. Neural systems coordination: cellular mechanisms

Understanding how, and under what conditions, neural systems interact is no small feat, even with a tractable model such as goal-directed navigation. This is the case not only because multiple neural systems are involved, but also because the adaptive features of this behavioral model depend on conditional and iterative processing loops, as well as on coordination at multiple levels of neural function (from single neurons to specific interactions between brain structures). Also contributing to the difficulty of studying complex behaviors are the dynamic ways in which the nature of the signals transmitted to efferent structures can change, both in terms of information content and in terms of whether such signals serve activating, inhibiting, or permissive roles. Moreover, much of the existing literature on the neurobiology of complex behaviors considers rate codes of neurons and, to a lesser degree, temporal codes, although this is changing in more recent studies. At a higher, more integrative level, the identity of the coordinating mechanism of orchestrated neural activity is not yet known. With regard to the latter issue, a likely possibility is that the primary determinant of the interactive and dynamic patterns that emerge is not a single brain structure but rather a state, such as a motivational or emotional state.

7.1. Single cells and local network coordination

The functional orchestration of neural systems that underlie complex behaviors should be expected to involve integration within and across multiple levels of processing, from cellular to local-circuit to neural-systems levels. We are only beginning to understand how such integration can happen, and studies of goal-directed navigation have begun to reveal important clues. Starting at the level of single neurons, it is known that dopamine has effects across different timescales in different brain structures, and this may define the type of coordination that is possible at any given point in time. In the hippocampus, a short-lasting effect of dopamine may be to determine the location of a place field (Martig et al., 2009), while a long-lasting effect could be to extend the duration of the post-event period of plasticity (Huang and Kandel, 1995; Otmakhova and Lisman, 1996; Rossato et al., 2009). By prolonging periods of plasticity, dopamine activation may allow sufficient time for accurate context analysis, a process that in turn determines which memories are formed or updated. An example of how this might work can be seen in place field responses during learning: place fields become sequentially associated as a rat repeatedly traverses a path on its way to reward. This sequential activation of place cells has been shown to repeat itself 'off-line' during subsequent periods of relative inactivity (e.g., Lee and Wilson, 2002; Louie and Wilson, 2001; Wilson and McNaughton, 1994). This pattern of neural 'replay' is consistent with many theories of memory, including the idea that optimal memory requires the reactivation of behavioral experiences, typically during periods of sleep or rest (Buzsaki, 1989; Marr, 1971; McClelland et al., 1995; Pennartz et al., 2002). Interestingly, dopamine has been shown to facilitate hippocampal 'replay' of sequences of place fields (Singer and Frank, 2009). Thus, dopamine may direct both cellular (e.g., place field location) and circuit-level (e.g., sequential activation of place fields) neural organization within the hippocampus, consistent with a role for dopamine in hippocampal synaptic plasticity.
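
Replay fidelity in studies of this kind is often scored by comparing the order in which cells fire during a candidate event with their order on the track, for example with a rank-order correlation. A minimal sketch of that common measure (the template and event orders below are invented):

```python
from scipy.stats import spearmanr

def replay_score(template_order, event_order):
    """Rank-order correlation between the order in which place cells fired
    along the track (template) and their order in a candidate 'replay' event;
    values near +1 suggest forward replay, near -1 reverse replay."""
    rho, _ = spearmanr(template_order, event_order)
    return rho

template = [0, 1, 2, 3, 4, 5]        # cell order along the path to reward
event = [0, 2, 1, 3, 4, 5]           # near-faithful off-line reactivation
print(replay_score(template, event))  # close to +1
```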

The replay of temporally ordered neural activity has been studied primarily in populations of hippocampal pyramidal cells that exhibit place fields (Skaggs et al., 1996; Wilson and McNaughton, 1994), where it is assumed to underlie spatial and contextual information processing. Work by Pennartz et al. (2004), however, indicates that this kind of replay may reflect a common process that enables the binding of many kinds of information. In that study, replay of sequences of neural activity was found to also occur in the ventral striatum during periods of rest that followed periods of activity. Moreover, recent work suggests that reward-related replay contributes a motivational component to a reactivated memory trace (Lansink et al., 2008). A follow-up study by the same group (Lansink et al., 2009) further demonstrated that hippocampal–striatal ensembles reactivated together during sleep. This process was especially strong in pairs in which the hippocampal cell processed spatial information and the ventral striatal cell's firing correlated with reward, suggesting a mechanism for consolidating place–reward associations.

7.2. Neural systems organization and oscillatory activity

Neural circuits have a natural tendency to oscillate across a wide range of frequencies, and these oscillations likely reflect a fundamental mechanism for coordinating neural activity across multiple brain regions (e.g., Buzsaki, 2006; Fries, 2009). Goal-directed navigation likely requires a high degree of coordination of multiple forms of information so that decisions can be made quickly. Thus, it seems reasonable to assume that a rich array of rhythmic coordination occurs as animals engage in decision processes during navigation. Oscillatory activity reflects alternating periods of synchronous and desynchronous neural firing: synchronous activity is associated with greater synaptic plasticity and stronger coupling among cells of an ensemble, while desynchronous activity is associated with periods of less plasticity and weaker signal strength (Buzsaki, 2006; Hasselmo, 2005b; Hasselmo et al., 2002).

7.2.1. Theta rhythms

Numerous laboratories have now reported that synchronous neural activity (in particular, coherence of the theta rhythm) can be detected across local neural networks both within and between brain structures such as the hippocampus, striatum, and prefrontal cortex (DeCoteau et al., 2007a; Engel et al., 2001; Fell et al., 2001; Siapas et al., 2005; Tabuchi et al., 2000; Varela et al., 2001; Womelsdorf et al., 2007). For example, hippocampal theta activity modulates the probability of neuronal firing, and theta can become synchronized with place cell firing, serving to coordinate the timing of spatial coding (Gengler et al., 2005; O'Keefe and Recce, 1993). A growing number of studies demonstrate coordinated neural activity between the hippocampus and the striatum. Theta oscillations within the striatum can become entrained to the hippocampal theta rhythm (Allers et al., 2002; Berke et al., 2004; DeCoteau et al., 2007a). Stimulating the striatum can induce hippocampal theta activity (Sabatino et al., 1985) and increase high-frequency theta power, which is thought to be important for sensorimotor integration (Hallworth and Bland, 2004). When neural activity is disrupted in the striatum via D2 receptor antagonism, striatal modulation of high-frequency hippocampal theta activity is reduced, motor and spatial/contextual information is not integrated, and task performance is impaired (Gengler et al., 2005). It appears, then, that during goal-directed navigation, hippocampal and striatal activity becomes increasingly coherent, and this pattern appears to be dopamine dependent.
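
Theta-band coherence between structures is typically estimated from simultaneously recorded local field potentials. A minimal sketch using Welch-based coherence on synthetic signals; the sampling rate, band edges, and signal construction are illustrative assumptions, not parameters from the cited studies:

```python
import numpy as np
from scipy.signal import coherence

fs = 1000.0                           # assumed LFP sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)

# Two synthetic LFPs sharing an 8 Hz theta component plus independent
# noise, standing in for simultaneous hippocampal and striatal traces.
theta = np.sin(2 * np.pi * 8 * t)
hpc_lfp = theta + 0.5 * np.random.randn(t.size)
striatal_lfp = 0.8 * theta + 0.5 * np.random.randn(t.size)

f, Cxy = coherence(hpc_lfp, striatal_lfp, fs=fs, nperseg=2048)
theta_band = (f >= 6) & (f <= 10)
print(f"mean theta-band coherence: {Cxy[theta_band].mean():.2f}")
```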

Particularly intriguing is a finding common to both the hippocampus and striatum: synchronous neural activity occurs in specific task-relevant ways (e.g., Hyman et al., 2005; Jones and Wilson, 2005), and in particular during times when rats are said to be engaged in decision making (e.g., Benchenane et al., 2010). For example, striatal theta is modified over the course of learning an egocentric T-maze task, increasing as the rat chooses and initiates turn behavior (DeCoteau et al., 2007a,b). Rats that learned the task developed an antiphase relationship between hippocampal and striatal theta oscillations, while rats that did not learn the task did not show this coherent theta relationship. This coherence has also been observed during striatal-dependent classical conditioning (Kropf and Kuschinsky, 1993).

Coherent theta oscillations across distant brain structures can be enhanced with the application of dopamine, at least in anesthetized rats (Benchenane et al., 2010). Assuming this is also the case in awake, navigating rats, it may be that dopamine plays a crucial role in coordinating ensemble activity across brain areas within a decision-making network during navigation. Functionally, this type of control by dopamine suggests that information about the saliency of reward may determine which brain systems become synchronized (and desynchronized), and this in turn informs decisions about what information is used to update memories and which behaviors are selected.

7.2.2. Gamma rhythms

Neuronal groups are also observed to synchronize their activity at frequencies higher than the theta rhythm. In particular, it is now well established that many brain areas exhibit rhythmic neural activity in the gamma band (30–100 Hz). These include many sensory and motor areas of cortex, the hippocampus, parietal cortex, and striatum (e.g., Bauer et al., 2006; Berke et al., 2004; Brosch et al., 2002; Csicsvari et al., 2003; Hoogenboom et al., 2006; Leung and Yim, 1993; Womelsdorf et al., 2006). In all cases, it is thought that the inhibitory interneuron networks within each structure play a major role in generating synchronized gamma oscillations (e.g., Bartos et al., 2007; Vida et al., 2006; Whittington et al., 1995). The functional importance of gamma oscillations remains debated. However, since gamma oscillations tend to occur intermittently (i.e., in the form of a 'gamma burst' of about 150–250 ms followed by periods of desynchronous activity), information carried by the cells that participate in a gamma burst effectively becomes a noticeable punctate signal against a background of disorganized neural activity. For this reason, it has been suggested that gamma bursts represent a fundamental mechanism by which information becomes segmented and/or filtered within a structure, as well as a way to coordinate information across structures (Buzsaki, 2006). Although theta and gamma frequencies differ considerably (perhaps reflecting the type of information that each rhythm coordinates), there are many common physiological and behavioral relationships that suggest they are components of a coordinated, larger-scale oscillatory network. For example, similar to theta rhythms, single-unit responses recorded simultaneously with gamma oscillations have been found to have specific phase relationships to the gamma rhythm (e.g., Berke, 2009; Kalenscher et al., 2010; van der Meer and Redish, 2009). Also, it is hypothesized that gamma oscillations may effectively select salient information that can come to impact decisions, learning, and behavioral responses (e.g., Kalenscher et al., 2010; van der Meer and Redish, 2009), since their appearance is often tied to task-relevant events. Another similarity with the theta system is that the occurrence of gamma oscillations appears to be at least in part regulated by the dopamine system (Berke, 2009).
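
Intermittent gamma bursts of the kind described here are commonly detected by band-pass filtering the field potential and thresholding its amplitude envelope. A minimal sketch under assumed parameters (the band edges, threshold, and minimum duration are illustrative, with the minimum set near the 150–250 ms burst range mentioned above):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def gamma_bursts(lfp, fs, band=(30.0, 100.0), thresh_sd=2.0, min_ms=150.0):
    """Boolean mask of samples inside putative gamma bursts: band-pass the
    signal, take the Hilbert amplitude envelope, and keep supra-threshold
    epochs that last at least min_ms."""
    b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    envelope = np.abs(hilbert(filtfilt(b, a, lfp)))
    above = envelope > envelope.mean() + thresh_sd * envelope.std()

    # Enforce the minimum duration by scanning contiguous runs.
    min_len = int(min_ms * fs / 1000.0)
    mask = np.zeros_like(above)
    start = None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                mask[start:i] = True
            start = None
    if start is not None and above.size - start >= min_len:
        mask[start:] = True
    return mask
```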

7.2.3. Coordination of theta and gamma rhythms

It appears that task demands dictate the nature of neural synchrony across distal brain structures, suggesting that coordination of neural activity across brain structures has at least a mnemonic component. A recent study (Fujisawa and Buzsaki, 2010) showed that such an influence may come in the form of a very low frequency (4 Hz) rhythm that entrains local field potentials across brain areas (e.g., the 7–12 Hz theta oscillation). In that study, a 4 Hz rhythm emerged only during phases of a maze task when rats made decisions (i.e., in the stem of a T-maze). During decision periods, the 4 Hz rhythm was phase-locked to the theta oscillations in both the prefrontal cortex and the VTA. Some of the individual prefrontal and VTA neurons were also phase-locked to the hippocampal theta oscillation at this time. Importantly, the 4 Hz rhythm was present only during a decision-making period when theta oscillations were also present. The findings of the Fujisawa and Buzsaki (2010) study suggest that a 4 Hz rhythm may coordinate activity in distal brain structures specifically as animals make decisions during goal-directed navigation. It remains to be seen whether dopamine selectively activates the 4 Hz rhythm when decisions need to be made.
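
Phase locking of unit activity to a slow rhythm, as reported by Fujisawa and Buzsaki (2010), is often quantified with the phase-locking value: the resultant length of the rhythm's instantaneous phases sampled at spike times. A minimal sketch on synthetic data (the sampling rate and spike selection are illustrative):

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(slow_lfp, spike_indices):
    """Resultant length of slow-rhythm phases at spike times:
    1.0 = perfect locking, values near 0 = no phase preference."""
    phase = np.angle(hilbert(slow_lfp))   # instantaneous phase
    return np.abs(np.mean(np.exp(1j * phase[spike_indices])))

fs = 1000
t = np.arange(0, 5, 1 / fs)
slow = np.sin(2 * np.pi * 4 * t)          # idealized 4 Hz rhythm

locked_spikes = np.where(slow > 0.95)[0]  # spikes clustered near the peak
random_spikes = np.random.randint(0, t.size, locked_spikes.size)
print(phase_locking_value(slow, locked_spikes))   # close to 1
print(phase_locking_value(slow, random_spikes))   # near 0
```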

8. Neural systems coordination: decisions and common foraging behaviors

Successful decisions during goal-directed navigation likely depend on a hierarchy of systems- and cellular-level interactions in the brain. The accompanying video (http://depts.washington.edu/mizlab) demonstrates, on a basic level, the relative involvement of the hippocampus, the dopamine system, and the ventral and dorsal (medial and lateral) striatum during a simple food search task on a laboratory maze. Particular attention is paid to the relative contributions of these brain areas during each of the five 'states' of processing in Fig. 3, and as a function of novel exposure, new learning, and asymptotic performance levels.

To illustrate in more detail the functional interactions of these same brain regions during common foraging scenarios, the following are neural and behavioral explanations for how animals make adaptive choices while navigating familiar environments, how decisions are adjusted when familiar conditions change, and how this same circuitry mediates rapid and adaptive learning when animals find themselves in novel situations.

8.1. Goal directed navigation in a familiar context

There is clearly a home court advantage when it comes to an animal's survival. If animals are familiar with their environment, they are more likely to make good choices when it comes to deciding when and where to secure food, safe shelter, and mates. This is the case not only because animals have learned the physical characteristics of the environment, but perhaps more importantly because they have learned to identify its salient features. These salient features have taken on predictive value based on the expected probability of reward given certain levels of effort. This information can be used to make choices that are appropriate for different motivational and behavioral states. Under constant conditions, obtaining a predicted outcome should result in the strengthening of the memories that were used to guide decisions and behavioral choices in the first place.

It is postulated that the motivational state of an animal predisposes it to pay attention to specific cues within a familiar environment, cues that have been previously associated with goal acquisition. In this way, memories of past behavioral outcomes of, for example, a hungry rat define the appropriate behavioral responses needed to obtain maximum amounts of food with minimal effort or temporal delay. Based on the extensive literature summarized previously, it seems reasonable to assume that when a rat enters a familiar environment in search of food, its translational movement generates (movement-sensitive) theta rhythms in hippocampal regions, resulting in the activation of a spatial coordinate system that in turn imposes an experience-determined spatial organization on the information used during the current event. The clearest neural instantiation of such an organization (often referred to as a spatial reference frame, map, or chart) is represented by the grid cells of the medial entorhinal cortex. While there remain unresolved issues about how such a reference system actually works (e.g., does a given 'map' reset during a single navigational event, and if so, how and under what conditions?), the current view is that learned spatial and nonspatial information arrive in the hippocampus via the medial and lateral entorhinal cortices, respectively. Upon entering a familiar environment, the medial entorhinal spatial reference includes not only a representation of the current spatial structure of the environment, but also an experience-dependent definition of the rat's expectations for the sensory environment that is, itself, influenced by the appropriate behavioral repertoire and by expectations about the consequences of decisions and choices. The lateral entorhinal cortex is presumably also activated by current (but in this case nonspatial) sensory input, as well as by the same set of expectations (i.e., memories) that influence medial entorhinal cortical processing. With the combined input from the medial and lateral entorhinal cortices, the hippocampus can determine the extent to which the rat's (spatial and nonspatial) expectations for the current context are met.

When goals are achieved as predicted (e.g., food is found in expected locations), hippocampal output may have the effect of strengthening currently active memory circuits, thereby increasing the likelihood that the same decisions and behaviors will be selected the next time the rat is in the same familiar situation. The signal strength to the ventral striatum would be expected to be moderate, resulting in ventral striatal output that maintains a baseline level of inhibitory control over VTA neural responses to reward encounters. That is, when rats encounter rewards in expected locations, there should be no VTA response to the reward encounter itself. If an animal finds itself engaging in rather stereotyped or habitual behaviors in the familiar environment, it is likely that the dorsolateral striatum exerts more control over behavior than the ventral striatum, since the dorsolateral striatum is particularly involved in the performance of habitual behaviors (e.g., Atallah et al., 2007; Jog et al., 1999; Thorn et al., 2010; Yin and Knowlton, 2004; Yin et al., 2009, as discussed above).

VTA dopamine neurons are known to increase firing when an animal encounters cues that predict reward (in familiar test conditions; e.g., Puryear et al., 2010; Schultz et al., 1997). These cue-elicited responses may arrive from the frontal cortex, as there is little evidence of predictive cue processing in at least two other major VTA afferent structures (e.g., the PPTg and LDTg). Thus, during navigation in a familiar environment, both the frontal cortex and the hippocampus may determine the timing of dopamine cells' contribution to reward processing. Although the details of the underlying neurocircuitry are presently not clear, this pattern of dopamine cell firing to cues and rewards results in the maintenance of the currently active memory networks.

8.2. Goal directed navigation in a familiar context following a significant change in context

The natural environment is a continuously changing one. Thus, even when a rat navigates a familiar environment, the hippocampus should automatically and continuously evaluate the saliency of the current context. In that way, when a rat encounters a change in the expected matrix of context information, hippocampal output can immediately reflect the detected change so that the need to change decisions and behaviors can be assessed. Note that since a given context comprises multiple features, a detectable change in any one feature should result in a signal that the context is different. The impact of detecting a context change on subsequent behaviors depends on the processing within efferent target structures.

When an unexpected behavioral outcome or stimulus configuration occurs in a familiar environment, rats increase exploratory activity and attention to potential cues. The latter would be expected to result from the reorganization of spatial representations (e.g., grid and place cells) in hippocampal systems. The hippocampal reorganization would in turn generate an output that reflects the context change. In anticipation of the receipt of new information, striatal theories (e.g., Belin and Everitt, 2008; Humphries and Prescott, 2010; Salamone et al., 2009) suggest that when there is a significant change in a familiar environment, the ventral striatum may come to play a greater role in behavioral control than the dorsal striatum. According to the circuitry presented in Fig. 6, hippocampal output to the ventral striatum can potentially activate two pathways of information flow to the VTA. In a scenario described by Humphries and Prescott (2010), the ventral striatum relays information about reward expectations via a direct inhibitory pathway to the VTA, and information about the actual rewards via an indirect excitatory pathway (through the ventral pallidum and the PPTg) to the VTA. When the actual rewards occur as expected, there is comparable inhibitory and excitatory control over dopamine cell responses to reward. This balanced pattern of input results in no response to rewards by dopamine neurons; indeed, dopamine cells do not respond to the acquisition of expected rewards. If, however, the actual reward is greater than expected, the excitatory drive should be greater than the inhibitory one, resulting in increased firing to reward by dopamine cells. Perhaps the increased excitatory ventral striatal input transitions dopamine cell membranes to a relatively depolarized state. On the other hand, if the actual reward is less than expected, the inhibitory drive becomes greater than the excitatory one, and this is manifest as reduced firing at the time of expected reward. Either of these altered dopamine responses to reward has been interpreted as a 'teaching signal' for other neural systems (Schultz and Dickinson, 2000).
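
The push-pull arrangement described here reduces to simple arithmetic: the phasic dopamine response reflects excitatory drive scaled by the actual reward minus inhibitory drive scaled by the expected reward. A toy sketch with invented gain parameters, intended only to illustrate the sign of the response:

```python
def dopamine_response(actual_reward, expected_reward, g_exc=1.0, g_inh=1.0):
    """Net phasic response under balanced direct-inhibitory and
    indirect-excitatory drive onto VTA dopamine cells (illustrative gains)."""
    return g_exc * actual_reward - g_inh * expected_reward

print(dopamine_response(1.0, 1.0))  # 0.0  -> reward as expected, no response
print(dopamine_response(1.5, 1.0))  # +0.5 -> better than expected, burst
print(dopamine_response(0.5, 1.0))  # -0.5 -> worse than expected, pause
```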

Fig. 10. Orchestration of neural systems while animals make decisions during goal-directed navigation. Accurate goal-directed navigation requires precise integration of multiple types of information (e.g., context salience, reward salience, expectations (based on memories), and one's behavioral state). Based on the current literature, it is clear that all of these types of information are represented in some way within different neural systems. For illustration purposes, only the hippocampus, dopamine system, and ventral and dorsal striatum are shown. Thus, the nature of the information represented does not clearly reveal the unique contributions of any one of these neural systems to goal-directed navigation. Rather, the specialized contributions of different neural systems must be defined by their computational capacities (i.e., their intrinsic patterns of neural connectivity) and the particular efferent structures that receive their output messages. Converging evidence supports the view that hippocampal output reflects an evaluation of the salience of the current context, dopamine cells signal changes in expected reward values (and in doing so serve as a 'teaching signal' that updates processing in efferent structures), the ventral striatum determines whether the outcomes of behavior were predicted, and the dorsal striatum selects the appropriate behavior based on the ventral striatal analysis. Especially during new learning, the dorsal medial striatum plays this 'actor' role for model-based learning. As learning and performance become model free, the dorsal lateral striatum serves the 'actor' role. These neural systems do not necessarily function independently. Rather, emerging findings show that, depending on specific task demands, neural activity may become synchronized across combinations of two or three brain structures according to theta and gamma rhythm frequencies. Importantly, the synchronization appears to happen at times when decisions should be made. This suggests that there may be some overarching factor that determines when systems interactions will occur. One possibility is that a very low frequency oscillation (4 Hz) coordinates the theta and gamma coherence that has been observed between neural systems (Fujisawa and Buzsaki, 2010). Since general physiological states are known to alter patterns of neural representations during learning (e.g., Kennedy and Shapiro, 2004), it is suggested here that physiological states, such as hunger, fear, and stress, may determine the kind of neural systems orchestration that needs to take place in order for animals to make optimal decisions relative to the achievement of specific kinds of goals.

The outcome of a striatal/VTA evaluation of the reinforcement outcomes of context-dependent behaviors is likely used by striatal efferent systems to modify decisions about which behaviors to engage and which memories to modify. As memories become updated, so do the expectations for a given spatial context. Assuming that the expected spatial context input to the hippocampus is continuously refreshed, the context discrimination can always proceed with the most recent information from the neocortex.

8.3. Goal directed navigation in a novel context

Recent evidence suggests that rats, at least, have an innate, though initially rudimentary, spatial navigation-related neural network that continues to develop over time (Langston et al., 2010; Wills et al., 2010). While the directional heading circuitry appears adult-like from a very young age, the grid and location systems take more time to develop. As experiences accumulate over a lifetime, then, so might the efficiency of a context-dependent navigation circuit. Learning is faster when the outcomes of behaviors are predictable, and predictability can be enhanced even in a novel environment if the significance of at least a subset of contextual features can be inferred from past experiences with similar features.

Novelty coding by the navigational circuit is typically tested with rats that have been trained to forage for food in one environment, then placed in a new testing environment but asked to perform the same behaviors (e.g., search for randomly placed food in a novel open arena). Thus, the task rules and motor instructions for the novel context are previously learned, but there is novelty in terms of the cues that are present to inform goal-directed choices. The familiar features (e.g., the narrow alleys of a maze, an enclosed testing area with cues, and the fact that rewards can be found on such mazes) should immediately activate a ‘best match’ reference frame that can be used to guide initial exploration and goal-directed decisions. As consequences to choices occur and learning takes place, the difference between expected and actual context will diminish, and the relevant memory and reference frame will be updated accordingly. At the point when the expected contexts and behavioral outcomes match what actually occurs, one can conclude that learning is complete. This new learning process may use similar neural circuitry as that described above, when information about changes in an expected context updates memories. In this way, behaviors that increase cue predictability and reduce unexpected outcomes will become associated with specific cues.
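As a toy rendering of this expectancy-matching idea (an illustration of the logic above, not a model taken from the cited work), the expected context can be treated as a feature vector that is nudged toward the observed context after each trial, so that the expected-actual difference shrinks until it falls below a criterion, the point at which learning would be judged complete. The feature dimensionality, learning rate, and criterion below are arbitrary.

import numpy as np

# Hypothetical expectancy-matching loop: the stored 'expected context' is
# updated toward the observed context on every trial, so the mismatch between
# expectation and observation diminishes as learning proceeds.

rng = np.random.default_rng(0)
actual_context = rng.random(8)    # features of the current (novel) context
expected_context = rng.random(8)  # initial 'best match' drawn from prior experience
rate, criterion = 0.3, 0.05

trials = 0
while np.linalg.norm(actual_context - expected_context) > criterion:
    # move the memory-based expectation toward what is actually observed
    expected_context += rate * (actual_context - expected_context)
    trials += 1

print(f"expected and actual context match after {trials} trials")

On this reading, the point made in the next paragraph follows directly: the farther the initial ‘best match’ lies from the actual context, the more trials such a loop requires before the criterion is met.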

If a rat with no testing experience is placed in an experimental arena for the first time, the rat may still bring to bear a ‘best match’ option, or some minimal form of spatial reference within which to incorporate new information into the memories that are being created during learning. For example, the rat may have learned the identity of a safe new food, but now needs to learn the rules that lead to the efficient, most cost-effective strategy for securing the food. Compared to a foraging situation when there are slight changes in a familiar context, it should take more trials or more time to reach the point when the expectations match the actual outcomes (i.e., when learning is complete).

9. The challenges ahead

A big challenge facing the general field of neuroscience is to understand the dynamic neural mechanisms that underlie complex and adaptive natural behaviors. A first step toward addressing this challenge could be to integrate existing literatures on specific components of the adaptive behavior of interest, such as the context processing and decision making that occur during goal-directed navigation. In addition, new findings indicate that decision making during navigation is a powerful model not only for defining neural and behavioral states that are relevant to this behavior, but also for understanding how these states switch processing modes during natural learning situations. The identification of such ‘switching mechanisms’ is important for our understanding of what leads to decisions to ‘stay the course’ or change behaviors. It is proposed that the motivational state of the animal establishes the intended goals, and as such sets the thresholds for, and constraints on, neural activation across multiple brain structures. A summary of key elements of this model is shown in Fig. 10.

Explanations of the neurobiological mechanisms that support decisions during goal-directed navigation will undoubtedly become more complex. This is the case not only because of technological advances in our ability to probe brain function, but also because of the following:

(a) There are other important contributing factors that were not discussed here. Examples include the possible roles of serotonin, acetylcholine, enkephalins, A2A receptors, and GABA in reinforcement learning (e.g., Doya, 2008; Farrar et al., 2008, 2007, 2010; Font et al., 2008; Mingote et al., 2008a,b; Miyazaki et al., 2011; Mott et al., 2009; Ragozzino, 2003; Ragozzino et al., 2009; Worden et al., 2009).

(b) There are many unanswered questions regarding the role of dopamine in decision making and learning. For example, does dopamine have the same impact on synaptic and behavioral functions in all brain regions that receive dopamine inputs? The answer is likely yes and no. Dopamine appears to facilitate excitation in efferent structures, although the details, including their time courses, may vary. Even if the degree of excitability were the same in different brain areas, the impact on behavior will likely be different since different structures (e.g., hippocampus and striatum) engage unique intrinsic computational architectures to process similar information (e.g., spatial, movement, and reward). Another critical issue whose resolution will impact future theoretical explanations of decision making during navigation is the regulation and meaning of tonic release of dopamine. For instance, tonic levels of dopamine may contribute to defining the overall motivation or goals during navigation (e.g., Niv et al., 2007; a schematic formalization is given after this list).

(c) When recording in navigating animals, it is clear that against a foreground of interesting task-relevant firing is a background of neural codes for the egocentric movements exhibited by the animal. The meaning of this seemingly universal coding of egocentric information remains elusive. An intriguing possibility is that such codes guide specific task-relevant codes in a manner analogous to the way that intended movements appear to bias sensory responses by cortical neurons (e.g., Colby, 1998; Colby and Goldberg, 1999). Interestingly, the movement-related cells are often interpreted as reflecting the firing patterns of inhibitory interneurons, the specific functions of which are only beginning to be appreciated.
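Returning to point (b), the suggestion that tonic dopamine defines overall motivation can be given a schematic form along the lines of the opportunity-cost account of Niv et al. (2007); the cost function below is a simplified stand-in for their model, with a an arbitrary vigor-cost constant. If responding with latency \(\tau\) incurs a vigor cost \(a/\tau\), while every unit of time spent forgoes reward at the average rate \(\bar{r}\) (the quantity proposed to be reflected in tonic dopamine), then minimizing the total cost

\[
C(\tau) = \frac{a}{\tau} + \bar{r}\,\tau,
\qquad
\frac{dC}{d\tau} = -\frac{a}{\tau^{2}} + \bar{r} = 0
\quad\Longrightarrow\quad
\tau^{*} = \sqrt{a/\bar{r}},
\]

so a larger \(\bar{r}\) (on this account, a higher tonic dopamine level) prescribes a shorter optimal latency, that is, more vigorous responding toward the current goal.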

The existence of many unresolved issues should not deter continued and intensive investigation of the adaptive navigation-based heuristic for complex learning situations. Rather, because it is evolutionarily highly conserved, this model holds great promise for continuing to reveal fundamental organizing principles within and across neural systems, as well as between neural systems functions and behavior.

Acknowledgements

We thank Yong Sang Jo for helpful comments on earlier versions of this manuscript and for producing all of the figures, Trevor Bortins for producing the video linked to the article, Daniela Jaramillo for help managing references, Drs. Jeremy Clark and Andrea Stocco for insightful discussion regarding the striatum, and Dr. Van Redila for comments on an earlier version. We also thank anonymous reviewers for their comments. This work is funded by NIMH grant MH58755.

References

Aberman, J.E., Salamone, J.D., 1999. Nucleus accumbens dopamine depletions make rats more sensitive to high ratio requirements but do not impair primary food reinforcement. Neuroscience 92, 545–552.

Aberman, J.E., Ward, S.J., Salamone, J.D., 1998. Effects of dopamine antagonists and accumbens dopamine depletions on time-constrained progressive-ratio performance. Pharmacol. Biochem. Behav. 61, 341–348.

Albin, R.L., Young, A.B., Penney, J.B., 1989. The functional anatomy of basal ganglia disorders. Trends Neurosci. 12, 366–375.

Alderson, H.L., Latimer, M.P., Winn, P., 2008. A functional dissociation of the anterior and posterior pedunculopontine tegmental nucleus: excitotoxic lesions have differential effects on locomotion and the response to nicotine. Brain Struct. Funct. 213, 247–253.

Aldridge, J.W., Berridge, K.C., 1998. Coding of serial order by neostriatal neurons: a ‘‘natural action’’ approach to movement sequence. J. Neurosci. 18, 2777–2787.


Alexander, G.E., Crutcher, M.D., 1990a. Functional architecture of basal ganglia circuits: neural substrates of parallel processing. Trends Neurosci. 13, 266–271.

Alexander, G.E., Crutcher, M.D., 1990b. Preparation for movement: neural representations of intended direction in three motor areas of the monkey. J. Neurophysiol. 64, 133–150.

Alexander, G.E., DeLong, M.R., Strick, P.L., 1986. Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci. 9, 357–381.

Allers, K.A., Ruskin, D.N., Bergstrom, D.A., Freeman, L.E., Ghazi, L.J., Tierney, P.L., Walters, J.R., 2002. Multisecond periodicities in basal ganglia firing rates correlate with theta bursts in transcortical and hippocampal EEG. J. Neurophysiol. 87, 1118–1122.

Amaral, D.G., Ishizuka, N., Claiborne, B., 1990. Neurons, numbers and the hippocampal network. Prog. Brain Res. 83, 1–11.

Amaral, D.G., Lavenex, P., 2006. Hippocampal neuroanatomy. In: Anderson, P., Morris, R., Amaral, D., Bliss, T., O’Keefe, J. (Eds.), The Hippocampus. Oxford University Press, Oxford.

Anagnostaras, S.G., Gale, G.D., Fanselow, M.S., 2001. Hippocampus and contextual fear conditioning: recent controversies and advances. Hippocampus 11, 8–17.

Anderson, M.I., Jeffery, K.J., 2003. Heterogeneous modulation of place cell firing by changes in context. J. Neurosci. 23, 8827–8835.

Anderson, O., 1984. Optimal foraging by largemouth bass in structured environments. Ecology 65, 851–861.

Annett, L.E., McGregor, A., Robbins, T.W., 1989. The effects of ibotenic acid lesions of the nucleus accumbens on spatial learning and extinction in the rat. Behav. Brain Res. 31, 231–242.

Aragona, B.J., Day, J.J., Roitman, M.F., Cleaveland, N.A., Wightman, R.M., Carelli, R.M., 2009. Regional specificity in real-time development of phasic dopamine transmission patterns during acquisition of a cue-cocaine association in rats. Eur. J. Neurosci. 30, 1889–1899.

Astur, R.S., Ortiz, M.L., Sutherland, R.J., 1998. A characterization of performance by men and women in a virtual Morris water task: a large and reliable sex difference. Behav. Brain Res. 93, 185–190.

Atallah, H.E., Lopez-Paniagua, D., Rudy, J.W., O’Reilly, R.C., 2007. Separate neural substrates for skill learning and performance in the ventral and dorsal striatum. Nat. Neurosci. 10, 126–131.

Bach, M.E., Barad, M., Son, H., Zhuo, M., Lu, Y.F., Shih, R., Mansuy, I., Hawkins, R.D., Kandel, E.R., 1999. Age-related defects in spatial memory are correlated with defects in the late phase of hippocampal long-term potentiation in vitro and are attenuated by drugs that enhance the cAMP signaling pathway. Proc. Natl. Acad. Sci. U.S.A. 96, 5280–5285.

Balleine, B.W., Delgado, M.R., Hikosaka, O., 2007. The role of the dorsal striatum in reward and decision-making. J. Neurosci. 27, 8161–8165.

Balleine, B.W., Dickinson, A., 1998. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407–419.

Balleine, B.W., Liljeholm, M., Ostlund, S.B., 2009. The integrative function of the basal ganglia in instrumental conditioning. Behav. Brain Res. 199, 43–52.

Balleine, B.W., O’Doherty, J.P., 2010. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35, 48–69.

Bardgett, M.E., Depenbrock, M., Downs, N., Points, M., Green, L., 2009. Dopamine modulates effort-based decision making in rats. Behav. Neurosci. 123, 242–251.

Bardo, M.T., Donohew, R.L., Harrington, N.G., 1996. Psychobiology of novelty seeking and drug seeking behavior. Behav. Brain Res. 77, 23–43.

Bardo, M.T., Dwoskin, L.P., 2004. Biological connection between novelty- and drug-seeking motivational systems. Nebr. Symp. Motiv. 50, 127–158.

Bardo, M.T., Neisewander, J.L., Pierce, R.C., 1989. Novelty-induced place preference behavior in rats: effects of opiate and dopaminergic drugs. Pharmacol. Biochem. Behav. 32, 683–689.

Barnes, C.A., 1979. Memory deficits associated with senescence: a neurophysiological and behavioral study in the rat. J. Comp. Physiol. Psychol. 93, 74–104.

Barnes, C.A., McNaughton, B.L., Mizumori, S.J., Leonard, B.W., Lin, L.H., 1990. Comparison of spatial and temporal characteristics of neuronal activity in sequential stages of hippocampal processing. Prog. Brain Res. 83, 287–300.

Barnes, T.D., Kubota, Y., Hu, D., Jin, D.Z., Graybiel, A.M., 2005. Activity of striatal neurons reflects dynamic encoding and recoding of procedural memories. Nature 437, 1158–1161.

Barnes, T.D., Mao, J.B., Hu, D., Kubota, Y., Dreyer, A.A., Stamoulis, C., Brown, E.N., Graybiel, A.M., 2011. Advance-cueing produces enhanced action-boundary patterns of spike activity in the sensorimotor striatum. J. Neurophysiol. 105, 1861–1878.

Barry, C., Hayman, R., Burgess, N., Jeffery, K.J., 2007. Experience-dependent rescaling of entorhinal grids. Nat. Neurosci. 10, 682–684.

Bartos, M., Vida, I., Jonas, P., 2007. Synaptic mechanisms of synchronized gamma oscillations in inhibitory interneuron networks. Nat. Rev. Neurosci. 8, 45–56.

Bauer, M., Oostenveld, R., Peeters, M., Fries, P., 2006. Tactile spatial attention enhances gamma-band activity in somatosensory cortex and reduces low-frequency activity in parieto-occipital areas. J. Neurosci. 26, 490–501.

Baunez, C., Robbins, T.W., 1999. Effects of dopamine depletion of the dorsal striatum and further interaction with subthalamic nucleus lesions in an attentional task in the rat. Neuroscience 92, 1343–1356.

Bayer, H.M., Glimcher, P.W., 2005. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141.

Beckstead, R.M., Domesick, V.B., Nauta, W.J., 1979. Efferent connections of the substantia nigra and ventral tegmental area in the rat. Brain Res. 175, 191–217.

Behr, J., Gloveli, T., Schmitz, D., Heinemann, U., 2000. Dopamine depresses excitatory synaptic transmission onto rat subicular neurons via presynaptic D1-like dopamine receptors. J. Neurophysiol. 84, 112–119.

Belin, D., Everitt, B.J., 2008. Cocaine seeking habits depend upon dopamine-dependent serial connectivity linking the ventral with the dorsal striatum. Neuron 57, 432–441.

Benchenane, K., Peyrache, A., Khamassi, M., Tierney, P.L., Gioanni, Y., Battaglia, F.P., Wiener, S.I., 2010. Coherent theta oscillations and reorganization of spike timing in the hippocampal–prefrontal network upon learning. Neuron 66, 921–936.

Beninato, M., Spencer, R.F., 1987. A cholinergic projection to the rat substantia nigra from the pedunculopontine tegmental nucleus. Brain Res. 412, 169–174.

Berendse, H.W., Galis-de Graaf, Y., Groenewegen, H.J., 1992a. Topographical organization and relationship with ventral striatal compartments of prefrontal corticostriatal projections in the rat. J. Comp. Neurol. 316, 314–347.

Berendse, H.W., Groenewegen, H.J., Lohman, A.H., 1992b. Compartmental distribution of ventral striatal neurons projecting to the mesencephalon in the rat. J. Neurosci. 12, 2079–2103.

Berke, J.D., 2009. Fast oscillations in cortical–striatal networks switch frequency following rewarding events and stimulant drugs. Eur. J. Neurosci. 30, 848–859.

Berke, J.D., Okatan, M., Skurski, J., Eichenbaum, H.B., 2004. Oscillatory entrainment of striatal neurons in freely moving rats. Neuron 43, 883–896.

Berridge, K.C., 2007. The debate over dopamine’s role in reward: the case for incentive salience. Psychopharmacology (Berl) 191, 391–431.

Berridge, K.C., Robinson, T.E., 1998. What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? Brain Res. Brain Res. Rev. 28, 309–369.

Berridge, K.C., Whishaw, I.Q., 1992. Cortex, striatum and cerebellum: control of serial order in a grooming sequence. Exp. Brain Res. 90, 275–290.

Bethus, I., Tse, D., Morris, R.G., 2010. Dopamine and memory: modulation of the persistence of memory for novel hippocampal NMDA receptor-dependent paired associates. J. Neurosci. 30, 1610–1618.

Bezzina, G., Body, S., Cheung, T.H., Hampson, C.L., Deakin, J.F., Anderson, I.M., Szabadi, E., Bradshaw, C.M., 2008. Effect of quinolinic acid-induced lesions of the nucleus accumbens core on performance on a progressive ratio schedule of reinforcement: implications for inter-temporal choice. Psychopharmacology (Berl) 197, 339–350.

Bjorklund, A., Dunnett, S.B., 2007. Dopamine neuron systems in the brain: an update. Trends Neurosci. 30, 194–202.

Boeijinga, P.H., Mulder, A.B., Pennartz, C.M., Manshanden, I., Lopes da Silva, F.H., 1993. Responses of the nucleus accumbens following fornix/fimbria stimulation in the rat. Identification and long-term potentiation of mono- and polysynaptic pathways. Neuroscience 53, 1049–1058.

Bornstein, A.M., Daw, N.D., 2011. Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Curr. Opin. Neurobiol. 21, 374–380.

Botvinick, M.M., Niv, Y., Barto, A.C., 2009. Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition 113, 262–280.

Boyd, L.A., Edwards, J.D., Siengsukon, C.S., Vidoni, E.D., Wessel, B.D., Linsdell, M.A., 2009. Motor sequence chunking is impaired by basal ganglia stroke. Neurobiol. Learn. Mem. 92, 35–44.

Brischoux, F., Chakraborty, S., Brierley, D.I., Ungless, M.A., 2009. Phasic excitation of dopamine neurons in ventral VTA by noxious stimuli. Proc. Natl. Acad. Sci. U.S.A. 106, 4894–4899.

Bromberg-Martin, E.S., Hikosaka, O., 2009. Midbrain dopamine neurons signal preference for advance information about upcoming rewards. Neuron 63, 119–126.

Bromberg-Martin, E.S., Matsumoto, M., Hikosaka, O., 2010. Dopamine in motivational control: rewarding, aversive, and alerting. Neuron 68, 815–834.

Brosch, M., Budinger, E., Scheich, H., 2002. Stimulus-related gamma oscillations in primate auditory cortex. J. Neurophysiol. 87, 2715–2725.

Brown, P.L., Jenkins, H.M., 1968. Auto-shaping of the pigeon’s key-peck. J. Exp. Anal. Behav. 11, 1–8.

Burgess, N., Barry, C., O’Keefe, J., 2007. An oscillatory interference model of grid cell firing. Hippocampus 17, 801–812.

Burgess, N., Maguire, E.A., O’Keefe, J., 2002. The human hippocampus and spatial and episodic memory. Neuron 35, 625–641.

Burns, L.H., Annett, L., Kelley, A.E., Everitt, B.J., Robbins, T.W., 1996. Effects of lesions to amygdala, ventral subiculum, medial prefrontal cortex, and nucleus accumbens on the reaction to novelty: implication for limbic–striatal interactions. Behav. Neurosci. 110, 60–73.

Burwell, R.D., 2000. The parahippocampal region: corticocortical connectivity. Ann. N. Y. Acad. Sci. 911, 25–42.

Burwell, R.D., Amaral, D.G., 1998a. Cortical afferents of the perirhinal, postrhinal, and entorhinal cortices of the rat. J. Comp. Neurol. 398, 179–205.

Burwell, R.D., Amaral, D.G., 1998b. Perirhinal and postrhinal cortices of the rat: interconnectivity and connections with the entorhinal cortex. J. Comp. Neurol. 391, 293–321.

Bussey, T.J., Everitt, B.J., Robbins, T.W., 1997. Dissociable effects of cingulate and medial frontal cortex lesions on stimulus–reward learning using a novel Pavlovian autoshaping procedure for the rat: implications for the neurobiology of emotion. Behav. Neurosci. 111, 908–919.

Buzsaki, G., 1989. Two-stage model of memory trace formation: a role for ‘‘noisy’’ brain states. Neuroscience 31, 551–570.


Buzsaki, G., 2005. Theta rhythm of navigation: link between path integration and landmark navigation, episodic and semantic memory. Hippocampus 15, 827–840.

Buzsaki, G., 2006. Rhythms of the Brain. Oxford University Press, New York.

Buzsaki, G., Chrobak, J.J., 2005. Synaptic plasticity and self-organization in the hippocampus. Nat. Neurosci. 8, 1418–1420.

Cardinal, R.N., Pennicott, D.R., Sugathapala, C.L., Robbins, T.W., Everitt, B.J., 2001. Impulsive choice induced in rats by lesions of the nucleus accumbens core. Science 292, 2499–2501.

Carelli, R.M., Ijames, S.G., 2001. Selective activation of accumbens neurons by cocaine-associated stimuli during a water/cocaine multiple schedule. Brain Res. 907, 156–161.

Carr, D.B., Sesack, S.R., 2000. Projections from the rat prefrontal cortex to the ventral tegmental area: target specificity in the synaptic associations with mesoaccumbens and mesocortical neurons. J. Neurosci. 20, 3864–3873.

Carr, H.A., 1917. The distribution and elimination of errors in the maze. J. Anim. Behav. 7, 145–159.

Charnov, E.L., 1976. Optimal foraging, the marginal value theorem. Theor. Popul. Biol. 9, 129–136.

Christoph, G.R., Leonzio, R.J., Wilcox, K.S., 1986. Stimulation of the lateral habenula inhibits dopamine-containing neurons in the substantia nigra and ventral tegmental area of the rat. J. Neurosci. 6, 613–619.

Chudasama, Y., Robbins, T.W., 2006. Functions of frontostriatal systems in cognition: comparative neuropsychopharmacological studies in rats, monkeys and humans. Biol. Psychol. 73, 19–38.

Clark, J.J., Sandberg, S.G., Wanat, M.J., Gan, J.O., Horne, E.A., Hart, A.S., Akers, C.A., Parker, J.G., Willuhn, I., Martinez, V., Evans, S.B., Stella, N., Phillips, P.E., 2010. Chronic microsensors for longitudinal, subsecond dopamine detection in behaving animals. Nat. Methods 7, 126–129.

Colby, C.L., 1998. Action-oriented spatial reference frames in cortex. Neuron 20, 15–24.

Colby, C.L., Goldberg, M.E., 1999. Space and attention in parietal cortex. Annu. Rev. Neurosci. 22, 319–349.

Colgin, L.L., Moser, E.I., Moser, M.B., 2008. Understanding memory through hippocampal remapping. Trends Neurosci. 31, 469–477.

Colwill, R.M., Rescorla, R.A., 1990. Effect of reinforcer devaluation on discriminative control of instrumental behavior. J. Exp. Psychol. Anim. Behav. Process 16, 40–47.

Cooper, B.G., Mizumori, S.J., 2001. Temporary inactivation of the retrosplenial cortex causes a transient reorganization of spatial coding in the hippocampus. J. Neurosci. 21, 3986–4001.

Corbit, L.H., Janak, P.H., 2007. Inactivation of the lateral but not medial dorsal striatum eliminates the excitatory impact of Pavlovian stimuli on instrumental responding. J. Neurosci. 27, 13977–13981.

Corbit, L.H., Janak, P.H., 2010. Posterior dorsomedial striatum is critical for both selective instrumental and Pavlovian reward learning. Eur. J. Neurosci. 31, 1312–1321.

Corbit, L.H., Muir, J.L., Balleine, B.W., 2001. The role of the nucleus accumbens in instrumental conditioning: evidence of a functional dissociation between accumbens core and shell. J. Neurosci. 21, 3251–3260.

Corrado, G.S., Sugrue, L.P., Brown, J.R., Newsome, W.T., 2009. The trouble with choice: studying decision variables in the brain. In: Glimcher, P.W., Camerer, C.F., Fehr, E., Poldrack, R.A. (Eds.), Neuroeconomics: Decision Making and the Brain. Elsevier.

Costa, R.M., Cohen, D., Nicolelis, M.A., 2004. Differential corticostriatal plasticity during fast and slow motor skill learning in mice. Curr. Biol. 14, 1124–1134.

Cousins, M.S., Atherton, A., Turner, L., Salamone, J.D., 1996. Nucleus accumbens dopamine depletions alter relative response allocation in a T-maze cost/benefit task. Behav. Brain Res. 74, 189–197.

Cousins, M.S., Wei, W., Salamone, J.D., 1994. Pharmacological characterization of performance on a concurrent lever pressing/feeding choice procedure: effects of dopamine antagonist, cholinomimetic, sedative and stimulant drugs. Psychopharmacology (Berl) 116, 529–537.

Cowie, R.J., 1977. Optimal foraging in great tits (Parus major). Nature 268, 137–139.

Cromwell, H.C., Schultz, W., 2003. Effects of expectations for different reward magnitudes on neuronal activity in primate striatum. J. Neurophysiol. 89, 2823–2838.

Csicsvari, J., Jamieson, B., Wise, K.D., Buzsaki, G., 2003. Mechanisms of gamma oscillations in the hippocampus of the behaving rat. Neuron 37, 311–322.

Dalley, J.W., Cardinal, R.N., Robbins, T.W., 2004. Prefrontal executive and cognitive functions in rodents: neural and neurochemical substrates. Neurosci. Biobehav. Rev. 28, 771–784.

Da Cunha, C., Wietzikoski, S., Wietzikoski, E.C., Miyoshi, E., Ferro, M.M., Anselmo-Franci, J.A., Canteras, N.S., 2003. Evidence for the substantia nigra pars compacta as an essential component of a memory system independent of the hippocampal memory system. Neurobiol. Learn. Mem. 79, 236–242.

Darvas, M., Palmiter, R.D., 2010. Restricting dopaminergic signaling to either dorsolateral or medial striatum facilitates cognition. J. Neurosci. 30, 1158–1165.

Darvas, M., Palmiter, R.D., 2011. Contributions of striatal dopamine signaling to the modulation of cognitive flexibility. Biol. Psychiatry 69, 704–707.

Davies, N.B., 1977. Prey selection and search strategy of the spotted flycatcher (Muscicapa striata): a field study on optimal foraging. Anim. Behav. 25, 1016–1033.

Daw, N.D., Niv, Y., Dayan, P., 2005. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8, 1704–1711.

Day, J.J., Carelli, R.M., 2007. The nucleus accumbens and Pavlovian reward learning. Neuroscientist 13, 148–159.

Day, J.J., Jones, J.L., Carelli, R.M., 2011. Nucleus accumbens neurons encode predicted and ongoing reward costs in rats. Eur. J. Neurosci. 33, 308–321.

Day, J.J., Roitman, M.F., Wightman, R.M., Carelli, R.M., 2007. Associative learning mediates dynamic shifts in dopamine signaling in the nucleus accumbens. Nat. Neurosci. 10, 1020–1028.

Day, J.J., Wheeler, R.A., Roitman, M.F., Carelli, R.M., 2006. Nucleus accumbens neurons encode Pavlovian approach behaviors: evidence from an autoshaping paradigm. Eur. J. Neurosci. 23, 1341–1351.

Dayan, P., Daw, N.D., 2008. Decision theory, reinforcement learning, and the brain. Cogn. Affect. Behav. Neurosci. 8, 429–453.

Dayan, P., Niv, Y., 2008. Reinforcement learning: the good, the bad and the ugly. Curr. Opin. Neurobiol. 18, 185–196.

De Leonibus, E., Pascucci, T., Lopez, S., Oliverio, A., Amalric, M., Mele, A., 2007. Spatial deficits in a mouse model of Parkinson disease. Psychopharmacology (Berl) 194, 517–525.

De Leonibus, E., Verheij, M.M., Mele, A., Cools, A., 2006. Distinct kinds of novelty processing differentially increase extracellular dopamine in different brain regions. Eur. J. Neurosci. 23, 1332–1340.

DeCoteau, W.E., Kesner, R.P., 2000. A double dissociation between the rat hippocampus and medial caudoputamen in processing two forms of knowledge. Behav. Neurosci. 114, 1096–1108.

DeCoteau, W.E., Thorn, C., Gibson, D.J., Courtemanche, R., Mitra, P., Kubota, Y., Graybiel, A.M., 2007a. Learning-related coordination of striatal and hippocampal theta rhythms during acquisition of a procedural maze task. Proc. Natl. Acad. Sci. U.S.A. 104, 5644–5649.

DeCoteau, W.E., Thorn, C., Gibson, D.J., Courtemanche, R., Mitra, P., Kubota, Y., Graybiel, A.M., 2007b. Oscillations of local field potentials in the rat dorsal striatum during spontaneous and instructed behaviors. J. Neurophysiol. 97, 3800–3805.

Denk, F., Walton, M.E., Jennings, K.A., Sharp, T., Rushworth, M.F., Bannerman, D.M., 2005. Differential involvement of serotonin and dopamine systems in cost–benefit decisions about delay or effort. Psychopharmacology (Berl) 179, 587–596.

Derdikman, D., Moser, E.I., 2010. A manifold of spatial maps in the brain. Trends Cogn. Sci. 14, 561–569.

Devan, B.D., White, N.M., 1999. Parallel information processing in the dorsal striatum: relation to hippocampal function. J. Neurosci. 19, 2789–2798.

Di Ciano, P., Cardinal, R.N., Cowell, R.A., Little, S.J., Everitt, B.J., 2001. Differential involvement of NMDA, AMPA/kainate, and dopamine receptors in the nucleus accumbens core in the acquisition and performance of Pavlovian approach behavior. J. Neurosci. 21, 9471–9477.

Dickinson, A., 1985. Actions and habits: the development of behavioural autonomy. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 308, 67–78.

Diaz-Fleischer, F., 2005. Predatory behavior and prey-capture decision-making by the web-weaving spider Micrathena sagittata. Can. J. Zool. Rev. Can. Zool. 83, 268–273.

Dormont, J.F., Conde, H., Farin, D., 1998. The role of the pedunculopontine tegmental nucleus in relation to conditioned motor performance in the cat. I. Context-dependent and reinforcement-related single unit activity. Exp. Brain Res. 121, 401–410.

Doya, K., 2008. Modulators of decision making. Nat. Neurosci. 11, 410–416.

Dragoi, G., Harris, K.D., Buzsaki, G., 2003. Place representation within hippocampal networks is modified by long-term potentiation. Neuron 39, 843–853.

Eichenbaum, H., Cohen, N.J., 2001. From Conditioning to Conscious Recollection: Memory Systems of the Brain. Oxford University Press, New York.

Eichenbaum, H., Lipton, P.A., 2008. Towards a functional organization of the medial temporal lobe memory system: role of the parahippocampal and medial entorhinal cortical areas. Hippocampus 18, 1314–1324.

El-Ghundi, M., Fletcher, P.J., Drago, J., Sibley, D.R., O’Dowd, B.F., George, S.R., 1999. Spatial learning deficit in dopamine D(1) receptor knockout mice. Eur. J. Pharmacol. 383, 95–106.

Engel, A.K., Fries, P., Singer, W., 2001. Dynamic predictions: oscillations and synchrony in top-down processing. Nat. Rev. Neurosci. 2, 704–716.

Enomoto, T., Floresco, S.B., 2009. Disruptions in spatial working memory, but not short-term memory, induced by repeated ketamine exposure. Prog. Neuropsychopharmacol. Biol. Psychiatry 33, 668–675.

Estes, W.K., 1943. Discriminative conditioning. I. A discriminative property of conditioned anticipation. J. Exp. Psychol. 32, 150–155.

Estes, W.K., 1948. Discriminative conditioning. II. Effects of a Pavlovian conditioned stimulus upon a subsequently established operant response. J. Exp. Psychol. 38, 173–177.

Etienne, A.S., Jeffery, K.J., 2004. Path integration in mammals. Hippocampus 14, 180–192.

Everitt, B.J., Robbins, T.W., 2005. Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nat. Neurosci. 8, 1481–1489.

Farrar, A.M., Font, L., Pereira, M., Mingote, S., Bunce, J.G., Chrobak, J.J., Salamone, J.D., 2008. Forebrain circuitry involved in effort-related choice: injections of the GABAA agonist muscimol into ventral pallidum alter response allocation in food-seeking behavior. Neuroscience 152, 321–330.

Farrar, A.M., Pereira, M., Velasco, F., Hockemeyer, J., Muller, C.E., Salamone, J.D., 2007. Adenosine A(2A) receptor antagonism reverses the effects of dopamine receptor antagonism on instrumental output and effort-related choice in the rat: implications for studies of psychomotor slowing. Psychopharmacology (Berl) 191, 579–586.


Farrar, A.M., Segovia, K.N., Randall, P.A., Nunes, E.J., Collins, L.E., Stopper, C.M., Port, R.G., Hockemeyer, J., Muller, C.E., Correa, M., Salamone, J.D., 2010. Nucleus accumbens and effort-related functions: behavioral and neural markers of the interactions between adenosine A2A and dopamine D2 receptors. Neuroscience 166, 1056–1067.

Faure, A., Haberland, U., Conde, F., El Massioui, N., 2005. Lesion to the nigrostriatal dopamine system disrupts stimulus–response habit formation. J. Neurosci. 25, 2771–2780.

Featherstone, R.E., McDonald, R.J., 2004. Dorsal striatum and stimulus–response learning: lesions of the dorsolateral, but not dorsomedial, striatum impair acquisition of a simple discrimination task. Behav. Brain Res. 150, 15–23.

Fell, J., Klaver, P., Lehnertz, K., Grunwald, T., Schaller, C., Elger, C.E., Fernandez, G., 2001. Human memory formation is accompanied by rhinal–hippocampal coupling and decoupling. Nat. Neurosci. 4, 1259–1264.

Fenton, A.A., Muller, R.U., 1998. Place cell discharge is extremely variable during individual passes of the rat through the firing field. Proc. Natl. Acad. Sci. U.S.A. 95, 3182–3187.

Ferbinteanu, J., Shirvalkar, P., Shapiro, M.L., 2011. Memory modulates journey-dependent coding in the rat hippocampus. J. Neurosci. 31, 9135–9146.

Ferbinteanu, J., Shapiro, M.L., 2003. Prospective and retrospective memory coding in the hippocampus. Neuron 40, 1227–1239.

Ferretti, V., Florian, C., Costantini, V.J., Roullet, P., Rinaldi, A., De Leonibus, E., Oliverio, A., Mele, A., 2005. Co-activation of glutamate and dopamine receptors within the nucleus accumbens is required for spatial memory consolidation in mice. Psychopharmacology (Berl) 179, 108–116.

Fields, H.L., Hjelmstad, G.O., Margolis, E.B., Nicola, S.M., 2007. Ventral tegmental area neurons in learned appetitive behavior and positive reinforcement. Annu. Rev. Neurosci. 30, 289–316.

Fiorillo, C.D., Newsome, W.T., Schultz, W., 2008. The temporal precision of reward prediction in dopamine neurons. Nat. Neurosci. 11, 966–973.

Fiorillo, C.D., Tobler, P.N., Schultz, W., 2003. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902.

Fiorillo, C.D., Tobler, P.N., Schultz, W., 2005. Evidence that the delay-period activity of dopamine neurons corresponds to reward uncertainty rather than backpropagating TD errors. Behav. Brain Funct. 1, 7.

Fitting, S., Allen, G.L., Wedell, D.H., 2007. Remembering places in space: a human analog study of the Morris water maze. In: Barkowsky, T., Knauff, M., Ligozat, G., Montello, D.R. (Eds.), Spatial Cognition V: Reasoning, Action, Interaction. Springer-Verlag, Berlin, Heidelberg, pp. 59–75.

Flagel, S.B., Clark, J.J., Robinson, T.E., Mayo, L., Czuj, A., Willuhn, I., Akers, C.A., Clinton, S.M., Phillips, P.E., Akil, H., 2011. A selective role for dopamine in stimulus–reward learning. Nature 469, 53–57.

Flaherty, A.W., Graybiel, A.M., 1993. Output architecture of the primate putamen. J. Neurosci. 13, 3222–3237.

Floresco, S.B., Blaha, C.D., Yang, C.R., Phillips, A.G., 2001. Modulation of hippocampal and amygdalar-evoked activity of nucleus accumbens neurons by dopamine: cellular mechanisms of input selection. J. Neurosci. 21, 2851–2860.

Floresco, S.B., Ghods-Sharifi, S., 2007. Amygdala–prefrontal cortical circuitry regulates effort-based decision making. Cereb. Cortex 17, 251–260.

Floresco, S.B., St Onge, J.R., Ghods-Sharifi, S., Winstanley, C.A., 2008a. Cortico-limbic-striatal circuits subserving different forms of cost–benefit decision making. Cogn. Affect. Behav. Neurosci. 8, 375–389.

Floresco, S.B., Tse, M.T., Ghods-Sharifi, S., 2008b. Dopaminergic and glutamatergic regulation of effort- and delay-based decision making. Neuropsychopharmacology 33, 1966–1979.

Font, L., Mingote, S., Farrar, A.M., Pereira, M., Worden, L., Stopper, C., Port, R.G., Salamone, J.D., 2008. Intra-accumbens injections of the adenosine A2A agonist CGS 21680 affect effort-related choice behavior in rats. Psychopharmacology (Berl) 199, 515–526.

Foster, D.J., Wilson, M.A., 2006. Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature 440, 680–683.

Foster, T.C., Castro, C.A., McNaughton, B.L., 1989. Spatial selectivity of rat hippocampal neurons: dependence on preparedness for movement. Science 244, 1580–1582.

Frank, L.M., Brown, E.N., Wilson, M., 2000. Trajectory encoding in the hippocampus and entorhinal cortex. Neuron 27, 169–178.

Frank, L.M., Stanley, G.B., Brown, E.N., 2004. Hippocampal plasticity across multiple days of exposure to novel environments. J. Neurosci. 24, 7681–7689.

Frank, M.J., 2005. Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism. J. Cogn. Neurosci. 17, 51–72.

Freeman Jr., J.H., Cuppernell, C., Flannery, K., Gabriel, M., 1996. Context-specific multi-site cingulate cortical, limbic thalamic, and hippocampal neuronal activity during concurrent discriminative approach and avoidance training in rabbits. J. Neurosci. 16, 1538–1549.

Freeman Jr., J.H., Weible, A., Rossi, J., Gabriel, M., 1997. Lesions of the entorhinal cortex disrupt behavioral and neuronal responses to context change during extinction of discriminative avoidance behavior. Exp. Brain Res. 115, 445–457.

French, S.J., Totterdell, S., 2002. Hippocampal and prefrontal cortical inputs monosynaptically converge with individual projection neurons of the nucleus accumbens. J. Comp. Neurol. 446, 151–165.

Frey, U., Matthies, H., Reymann, K.G., 1991. The effect of dopaminergic D1 receptor blockade during tetanization on the expression of long-term potentiation in the rat CA1 region in vitro. Neurosci. Lett. 129, 111–114.

Frey, U., Morris, R.G., 1997. Synaptic tagging and long-term potentiation. Nature 385, 533–536.

Frey, U., Schroeder, H., Matthies, H., 1990. Dopaminergic antagonists prevent long-term maintenance of posttetanic LTP in the CA1 region of rat hippocampal slices. Brain Res. 522, 69–75.

Fries, P., 2009. Neuronal gamma-band synchronization as a fundamental process in cortical computation. Annu. Rev. Neurosci. 32, 209–224.

Fuhs, M.C., Touretzky, D.S., 2007. Context learning in the rodent hippocampus. Neural Comput. 19, 3173–3215.

Fujisawa, S., Buzsaki, G., 2010. Theta and 4 Hz oscillations: region-specific coupling of PFC, VTA and hippocampus in a goal-directed behavior. Society for Neuroscience, San Diego, CA.

Futami, T., Takakusaki, K., Kitai, S.T., 1995. Glutamatergic and cholinergic inputs from the pedunculopontine tegmental nucleus to dopamine neurons in the substantia nigra pars compacta. Neurosci. Res. 21, 331–342.

Fyhn, M., Hafting, T., Treves, A., Moser, M.B., Moser, E.I., 2007. Hippocampal remapping and grid realignment in entorhinal cortex. Nature 446, 190–194.

Gal, G., Joel, D., Gusak, O., Feldon, J., Weiner, I., 1997. The effects of electrolytic lesion to the shell subterritory of the nucleus accumbens on delayed non-matching-to-sample and four-arm baited eight-arm radial-maze tasks. Behav. Neurosci. 111, 92–103.

Gan, J.O., Walton, M.E., Phillips, P.E., 2010. Dissociable cost and benefit encoding of future rewards by mesolimbic dopamine. Nat. Neurosci. 13, 25–27.

Gardiner, T.W., Kitai, S.T., 1992. Single-unit activity in the globus pallidus and neostriatum of the rat during performance of a trained head movement. Exp. Brain Res. 88, 517–530.

Gasbarri, A., Packard, M.G., Campana, E., Pacitti, C., 1994a. Anterograde and retrograde tracing of projections from the ventral tegmental area to the hippocampal formation in the rat. Brain Res. Bull. 33, 445–452.

Gasbarri, A., Sulli, A., Innocenzi, R., Pacitti, C., Brioni, J.D., 1996. Spatial memory impairment induced by lesion of the mesohippocampal dopaminergic system in the rat. Neuroscience 74, 1037–1044.

Gasbarri, A., Sulli, A., Packard, M.G., 1997. The dopaminergic mesencephalic projections to the hippocampal formation in the rat. Prog. Neuropsychopharmacol. Biol. Psychiatry 21, 1–22.

Gasbarri, A., Verney, C., Innocenzi, R., Campana, E., Pacitti, C., 1994b. Mesolimbic dopaminergic neurons innervating the hippocampal formation in the rat: a combined retrograde tracing and immunohistochemical study. Brain Res. 668, 71–79.

Gavrilov, V.V., Wiener, S.I., Berthoz, A., 1998. Discharge correlates of hippocampal complex spike neurons in behaving rats passively displaced on a mobile robot. Hippocampus 8, 475–490.

Geisler, S., Derst, C., Veh, R.W., Zahm, D.S., 2007. Glutamatergic afferents of the ventral tegmental area in the rat. J. Neurosci. 27, 5730–5743.

Gengler, S., Mallot, H.A., Holscher, C., 2005. Inactivation of the rat dorsal striatum impairs performance in spatial tasks and alters hippocampal theta in the freely moving rat. Behav. Brain Res. 164, 73–82.

Gilbert, P.E., Kesner, R.P., Lee, I., 2001. Dissociating hippocampal subregions: double dissociation between dentate gyrus and CA1. Hippocampus 11, 626–636.

Gill, K.M., Mizumori, S.J., 2007. Inactivation of prefrontal cortex alters reward-related neural activity in substantia nigra. In: Society for Neuroscience Abstracts. Program No. 640.1.

Gold, A.E., Kesner, R.P., 2005. The role of the CA3 subregion of the dorsal hippocampus in spatial pattern completion in the rat. Hippocampus 15, 808–814.

Goss-Custard, J.D., 1977. Response of redshank, Tringa totanus, to absolute and relative densities of 2 prey species. J. Anim. Ecol. 46, 867–874.

Gothard, K.M., Skaggs, W.E., Moore, K.M., McNaughton, B.L., 1996. Binding of hippocampal CA1 neural activity to multiple reference frames in a landmark-based navigation task. J. Neurosci. 16, 823–835.

Goto, Y., O’Donnell, P., 2002. Timing-dependent limbic–motor synaptic integration in the nucleus accumbens. Proc. Natl. Acad. Sci. U.S.A. 99, 13189–13193.

Goto, Y., Yang, C.R., Otani, S., 2010. Functional and dysfunctional synaptic plasticity in prefrontal cortex: roles in psychiatric disorders. Biol. Psychiatry 67, 199–207.

Grace, A.A., 1991. Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: a hypothesis for the etiology of schizophrenia. Neuroscience 41, 1–24.

Grace, A.A., Floresco, S.B., Goto, Y., Lodge, D.J., 2007. Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends Neurosci. 30, 220–227.

Graybiel, A.M., 1998. The basal ganglia and chunking of action repertoires. Neurobiol. Learn. Mem. 70, 119–136.

Graybiel, A.M., 2008. Habits, rituals, and the evaluative brain. Annu. Rev. Neurosci. 31, 359–387.

Graybiel, A.M., Aosaki, T., Flaherty, A.W., Kimura, M., 1994. The basal ganglia and adaptive motor control. Science 265, 1826–1831.

Groenewegen, H.J., Galis-de Graaf, Y., Smeets, W.J., 1999a. Integration and segregation of limbic cortico-striatal loops at the thalamic level: an experimental tracing study in rats. J. Chem. Neuroanat. 16, 167–185.

Groenewegen, H.J., Vermeulen-Van der Zee, E., te Kortschot, A., Witter, M.P., 1987. Organization of the projections from the subiculum to the ventral striatum in the rat. A study using anterograde transport of Phaseolus vulgaris leucoagglutinin. Neuroscience 23, 103–120.

Groenewegen, H.J., Wright, C.I., Beijer, A.V., Voorn, P., 1999b. Convergence and segregation of ventral striatal inputs and outputs. Ann. N. Y. Acad. Sci. 877, 49–63.

Guthrie, E.R., 1935. The Psychology of Learning. Harper, New York.

Guzowski, J.F., Knierim, J.J., Moser, E.I., 2004. Ensemble dynamics of hippocampal regions CA3 and CA1. Neuron 44, 581–584.


Haber, S.N., 2003. The primate basal ganglia: parallel and integrative networks. J. Chem. Neuroanat. 26, 317–330.

Haber, S.N., Fudge, J.L., McFarland, N.R., 2000. Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J. Neurosci. 20, 2369–2382.

Hafting, T., Fyhn, M., Molden, S., Moser, M.B., Moser, E.I., 2005. Microstructure of a spatial map in the entorhinal cortex. Nature 436, 801–806.

Hall, J., Parkinson, J.A., Connor, T.M., Dickinson, A., Everitt, B.J., 2001. Involvement of the central nucleus of the amygdala and nucleus accumbens core in mediating Pavlovian influences on instrumental behaviour. Eur. J. Neurosci. 13, 1984–1992.

Hallworth, N.E., Bland, B.H., 2004. Basal ganglia–hippocampal interactions support the role of the hippocampal formation in sensorimotor integration. Exp. Neurol. 188, 430–443.

Hamilton, D.A., Driscoll, I., Sutherland, R.J., 2002. Human place learning in a virtual Morris water task: some important constraints on the flexibility of place navigation. Behav. Brain Res. 129, 159–170.

Hammond, L.J., 1980. The effect of contingency upon the appetitive conditioning of free-operant behavior. J. Exp. Anal. Behav. 34, 297–304.

Hampson, R.E., Heyser, C.J., Deadwyler, S.A., 1993. Hippocampal cell firing correlates of delayed-match-to-sample performance in the rat. Behav. Neurosci. 107, 715–739.

Hargreaves, E.L., Yoganarasimha, D., Knierim, J.J., 2007. Cohesiveness of spatial and directional representations recorded from neural ensembles in the anterior thalamus, parasubiculum, medial entorhinal cortex, and hippocampus. Hippocampus 17, 826–841.

Haruno, M., Kawato, M., 2006. Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus–action–reward association learning. Neural Netw. 19, 1242–1254.

Hassani, O.K., Cromwell, H.C., Schultz, W., 2001. Influence of expectation of different rewards on behavior-related neuronal activity in the striatum. J. Neurophysiol. 85, 2477–2489.

Hasselmo, M.E., 2005a. The role of hippocampal regions CA3 and CA1 in matching entorhinal input with retrieval of associations between objects and context: theoretical comment on Lee et al. (2005). Behav. Neurosci. 119, 342–345.

Hasselmo, M.E., 2005b. What is the function of hippocampal theta rhythm? Linking behavioral data to phasic properties of field potential and unit recording data. Hippocampus 15, 936–949.

Hasselmo, M.E., Hay, J., Ilyn, M., Gorchetchnikov, A., 2002. Neuromodulation, theta rhythm and rat spatial navigation. Neural Netw. 15, 689–707.

Hasselmo, M.E., McGaughy, J., 2004. High acetylcholine levels set circuit dynamics for attention and encoding and low acetylcholine levels set dynamics for consolidation. Prog. Brain Res. 145, 207–231.

Hauber, W., Sommer, S., 2009. Prefrontostriatal circuitry regulates effort-related decision making. Cereb. Cortex 19, 2240–2247.

Hawkes, K., Hill, K., O’Connell, J., 1982. Why hunters gather: optimal foraging and the Ache of eastern Paraguay. Am. Ethnol. 9, 379–398.

Hebb, D.O., 1949. The Organization of Behavior: A Neuropsychological Theory. John Wiley and Sons.

Heimer, L., Zahm, D.S., Churchill, L., Kalivas, P.W., Wohltmann, C., 1991. Specificity in the projection patterns of accumbal core and shell in the rat. Neuroscience 41, 89–125.

Henriksen, E.J., Colgin, L.L., Barnes, C.A., Witter, M.P., Moser, M.B., Moser, E.I., 2010. Spatial representation along the proximodistal axis of CA1. Neuron 68, 127–137.

Herkenham, M., Nauta, W.J., 1979. Efferent connections of the habenular nuclei in the rat. J. Comp. Neurol. 187, 19–47.

Hetherington, P.A., Shapiro, M.L., 1997. Hippocampal place fields are altered by the removal of single visual cues in a distance-dependent manner. Behav. Neurosci. 111, 20–34.

Hikosaka, O., Bromberg-Martin, E., Hong, S., Matsumoto, M., 2008. New insights on the subcortical representation of reward. Curr. Opin. Neurobiol. 18, 203–208.

Hikosaka, O., Nakahara, H., Rand, M.K., Sakai, K., Lu, X., Nakamura, K., Miyachi, S., Doya, K., 1999. Parallel neural networks for learning sequential procedures. Trends Neurosci. 22, 464–471.

Hikosaka, O., Nakamura, K., Nakahara, H., 2006. Basal ganglia orient eyes to reward. J. Neurophysiol. 95, 567–584.

Hikosaka, O., Sakamoto, M., Usui, S., 1989. Functional properties of monkey caudate neurons. III. Activities related to expectation of target and reward. J. Neurophysiol. 61, 814–832.

Hill, A.J., 1978. First occurrence of hippocampal spatial firing in a new environment. Exp. Neurol. 62, 282–297.

Hill, A.J., Best, P.J., 1981. Effects of deafness and blindness on the spatial correlates of hippocampal unit activity in the rat. Exp. Neurol. 74, 204–217.

Hirsh, R., 1974. The hippocampus and contextual retrieval of information from memory: a theory. Behav. Biol. 12, 421–444.

Hoge, J., Kesner, R.P., 2007. Role of CA3 and CA1 subregions of the dorsal hippocampus on temporal processing of objects. Neurobiol. Learn. Mem. 88, 225–231.

Hollerman, J.R., Schultz, W., 1998. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat. Neurosci. 1, 304–309.

Hollerman, J.R., Tremblay, L., Schultz, W., 1998. Influence of reward expectation on behavior-related neuronal activity in primate striatum. J. Neurophysiol. 80, 947–963.

Hollup, S.A., Kjelstrup, K.G., Hoff, J., Moser, M.B., Moser, E.I., 2001. Impaired recognition of the goal location during spatial navigation in rats with hippocampal lesions. J. Neurosci. 21, 4505–4513.

Holmes, N.M., Marchand, A.R., Coutureau, E., 2010. Pavlovian to instrumental transfer: a neurobehavioural perspective. Neurosci. Biobehav. Rev. 34, 1277–1295.

Honzik, C.H., 1933. Maze learning in rats in the absence of specific intra- and extra-maze stimuli. Psychol. Bull. 30, 589–590.

Hoogenboom, N., Schoffelen, J.M., Oostenveld, R., Parkes, L.M., Fries, P., 2006. Localizing human visual gamma-band activity in frequency, time and space. Neuroimage 29, 764–773.

Horvitz, J.C., 2002. Dopamine gating of glutamatergic sensorimotor and incentive motivational input signals to the striatum. Behav. Brain Res. 137, 65–74.

Horvitz, J.C., Stewart, T., Jacobs, B.L., 1997. Burst activity of ventral tegmental dopamine neurons is elicited by sensory stimuli in the awake cat. Brain Res. 759, 251–258.

Houk, J.C., 1995. Information processing in modular circuits linking basal ganglia and cerebral cortex. In: Houk, J.C., Davis, J.L., Beiser, D.G. (Eds.), Models of Information Processing in the Basal Ganglia. MIT Press, Cambridge.

Houk, J.C., Davis, J.L., Beiser, D.G., 1995. Models of Information Processing in the Basal Ganglia. MIT Press, Cambridge, MA.

Huang, Y.Y., Kandel, E.R., 1995. D1/D5 receptor agonists induce a protein synthesis-dependent late potentiation in the CA1 region of the hippocampus. Proc. Natl. Acad. Sci. U.S.A. 92, 2446–2450.

Hull, C.L., 1932. The goal gradient hypothesis and maze learning. Psychol. Rev. 39, 25–43.

Hull, C.L., 1943. Principles of Behavior. Appleton-Century-Crofts, New York.

Humphries, M.D., Prescott, T.J., 2010. The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Prog. Neurobiol. 90, 385–417.

Hunsaker, M.R., Mooy, G.G., Swift, J.S., Kesner, R.P., 2007. Dissociations of the medial and lateral perforant path projections into dorsal DG, CA3, and CA1 for spatial and nonspatial (visual object) information processing. Behav. Neurosci. 121, 742–750.

Hyman, J.M., Zilli, E.A., Paley, A.M., Hasselmo, M.E., 2005. Medial prefrontal cortex cells show dynamic modulation with the hippocampal theta rhythm dependent on behavior. Hippocampus 15, 739–749.

Ikemoto, S., 2007. Dopamine reward circuitry: two projection systems from the ventral midbrain to the nucleus accumbens–olfactory tubercle complex. Brain Res. Rev. 56, 27–78.

Ikemoto, S., Panksepp, J., 1999. The role of nucleus accumbens dopamine in motivated behavior: a unifying interpretation with special reference to reward-seeking. Brain Res. Brain Res. Rev. 31, 6–41.

Ito, R., Robbins, T.W., Pennartz, C.M., Everitt, B.J., 2008. Functional interaction between the hippocampus and nucleus accumbens shell is necessary for the acquisition of appetitive spatial context conditioning. J. Neurosci. 28, 6950–6959.

Izquierdo, I., Bevilaqua, L.R., Rossato, J.I., Bonini, J.S., Da Silva, W.C., Medina, J.H., Cammarota, M., 2006. The connection between the hippocampal and the striatal memory systems of the brain: a review of recent findings. Neurotox. Res. 10, 113–121.

Jackson, J., Redish, A.D., 2007. Network dynamics of hippocampal cell-assemblies resemble multiple spatial maps within single tasks. Hippocampus 17, 1209–1229.

Jaeger, D., Gilman, S., Aldridge, J.W., 1993. Primate basal ganglia activity in a precued reaching task: preparation for movement. Exp. Brain Res. 95, 51–64.

Jay, T.M., Glowinski, J., Thierry, A.M., 1989. Selectivity of the hippocampal projection to the prelimbic area of the prefrontal cortex in the rat. Brain Res. 505, 337–340.

Jeffery, K.J., Anderson, M.I., Hayman, R., Chakraborty, S., 2004. A proposed architecture for the neural representation of spatial context. Neurosci. Biobehav. Rev. 28, 201–218.

Jeffery, K.J., Gilbert, A., Burton, S., Strudwick, A., 2003. Preserved performance in a hippocampal-dependent spatial task despite complete place cell remapping. Hippocampus 13, 175–189.

Jenkins, H.M., Moore, B.R., 1973. The form of the auto-shaped response with food or water reinforcers. J. Exp. Anal. Behav. 20, 163–181.

Jensen, O., Lisman, J.E., 1996. Hippocampal CA3 region predicts memory sequences: accounting for the phase precession of place cells. Learn. Mem. 3, 279–287.

Jin, X., Costa, R.M., 2010. Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature 466, 457–462.

Jo, Y.S., Lee, I., 2010. Disconnection of the hippocampal–perirhinal cortical circuits severely disrupts object–place paired associative memory. J. Neurosci. 30, 9850–9858.

Joel, D., Niv, Y., Ruppin, E., 2002. Actor–critic models of basal ganglia function: new anatomical and computational perspectives. Neural Netw. 15, 535–547.

Joel, D., Weiner, I., 1994. The organization of the basal ganglia–thalamocortical circuits: open interconnected rather than closed segregated. Neuroscience 63, 363–379.

Joel, D., Weiner, I., 2000. The connections of the dopaminergic system with the striatum in rats and primates: an analysis with respect to the functional and compartmental organization of the striatum. Neuroscience 96, 451–474.

Jog, M.S., Kubota, Y., Connolly, C.I., Hillegaart, V., Graybiel, A.M., 1999. Building neural representations of habits. Science 286, 1745–1749.

Johnson, A., van der Meer, M.A., Redish, A.D., 2007. Integrating hippocampus and striatum in decision-making. Curr. Opin. Neurobiol. 17, 692–697.

Jones, M.W., Wilson, M.A., 2005. Theta rhythms coordinate hippocampal–prefrontal interactions in a spatial memory task. PLoS Biol. 3, e402.


Jongen-Relo, A.L., Voorn, P., Groenewegen, H.J., 1994. Immunohistochemical char-acterization of the shell and core territories of the nucleus accumbens in the rat.Eur. J. Neurosci. 6, 1255–1264.

Joshua, M., Adler, A., Mitelman, R., Vaadia, E., Bergman, H., 2008. Midbrain dopa-minergic neurons and striatal cholinergic interneurons encode the differencebetween reward and aversive events at different epochs of probabilistic classi-cal conditioning trials. J. Neurosci. 28, 11673–11684.

Jung, M.W., Wiener, S.I., McNaughton, B.L., 1994. Comparison of spatial firingcharacteristics of units in dorsal and ventral hippocampus of the rat. J. Neurosci.14, 7347–7356.

Kalenscher, T., Lansink, C.S., Lankelma, J.V., Pennartz, C.M., 2010. Reward-associatedgamma oscillations in ventral striatum are regionally differentiated and mod-ulate local firing activity. J. Neurophysiol. 103, 1658–1672.

Karni, A., Meyer, G., Rey-Hipolito, C., Jezzard, P., Adams, M.M., Turner, R., Ungerlei-der, L.G., 1998. The acquisition of skilled motor performance: fast and slowexperience-driven changes in primary motor cortex. Proc. Natl. Acad. Sci. U.S.A.95, 861–868.

Kelemen, E., Fenton, A.A., 2010. Dynamic grouping of hippocampal neural activityduring cognitive control of two spatial frames. PLoS 8, e1000403.

Kelley, A.E., 2004. Ventral striatal control of appetitive motivation: role in ingestive behavior and reward-related learning. Neurosci. Biobehav. Rev. 27, 765–776.

Kennedy, P.J., Shapiro, M.L., 2004. Retrieving memories via internal context requires the hippocampus. J. Neurosci. 24, 6979–6985.

Kentros, C.G., Agnihotri, N.T., Streater, S., Hawkins, R.D., Kandel, E.R., 2004. Increased attention to spatial context increases both place field stability and spatial memory. Neuron 42, 283–295.

Kentros, C., Hargreaves, E., Hawkins, R.D., Kandel, E.R., Shapiro, M., Muller, R.V., 1998. Abolition of long-term stability of new hippocampal place cell maps by NMDA receptor blockade. Science 280, 2121–2126.

Kesner, R.P., 2007. Behavioral functions of the CA3 subregion of the hippocampus. Learn. Mem. 14, 771–781.

Kesner, R.P., Lee, I., Gilbert, P., 2004. A behavioral assessment of hippocampal function based on a subregional analysis. Rev. Neurosci. 15, 333–351.

Khamassi, M., Lacheze, L., Girard, B., Berthoz, A., Guillot, A., 2005. Actor–critic models of reinforcement learning in the basal ganglia: from natural to artificial rats. Adapt. Behav. 13, 131–148.

Khamassi, M., Mulder, A.B., Tabuchi, E., Douchamps, V., Wiener, S.I., 2008. Anticipatory reward signals in ventral striatal neurons of behaving rats. Eur. J. Neurosci. 28, 1849–1866.

Kim, J.J., Fanselow, M.S., 1992. Modality-specific retrograde amnesia of fear. Science 256, 675–677.

Kimchi, E.Y., Laubach, M., 2009. Dynamic encoding of action selection by the medial striatum. J. Neurosci. 29, 3148–3159.

Kimura, M., Aosaki, T., Hu, Y., Ishida, A., Watanabe, K., 1992. Activity of primate putamen neurons is selective to the mode of voluntary movement: visually guided, self-initiated or memory-guided. Exp. Brain Res. 89, 473–477.

Kincaid, A.E., Zheng, T., Wilson, C.J., 1998. Connectivity and convergence of single corticostriatal axons. J. Neurosci. 18, 4722–4731.

Knierim, J.J., Kudrimoti, H.S., McNaughton, B.L., 1995. Place cells, head direction cells, and the learning of landmark stability. J. Neurosci. 15, 1648–1659.

Knierim, J.J., Lee, I., Hargreaves, E.L., 2006. Hippocampal place cells: parallel input streams, subregional processing, and implications for episodic memory. Hippocampus 16, 755–764.

Kobayashi, S., Schultz, W., 2008. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846.

Kobayashi, Y., Isa, T., 2002. Sensory-motor gating and cognitive control by the brainstem cholinergic system. Neural Netw. 15, 731–741.

Koch, M., Schmid, A., Schnitzler, H.U., 2000. Role of nucleus accumbens dopamine D1 and D2 receptors in instrumental and Pavlovian paradigms of conditioned reward. Psychopharmacology 152, 67–73.

Krebs, J.R., McCleery, R.H., 1984. Optimization in behavioural ecology. In: Krebs, J.R., Davies, N.B. (Eds.), Behavioural Ecology. Sinauer, Sunderland, MA, pp. 91–121.

Kropf, W., Kuschinsky, K., 1993. Conditioned effects of apomorphine are manifest in regional EEG of rats both in hippocampus and in striatum. Naunyn Schmiedebergs Arch. Pharmacol. 347, 487–493.

Kruse, J.M., Overmier, B., Konz, W.A., Rokke, E., 1983. Pavlovian conditioned stimulus effects upon instrumental choice behavior are reinforcer specific. Learn. Motiv. 14, 165–181.

Kubie, J.L., Ranck Jr., J.B., 1983. Sensory-behavioral correlates in individual hippocampus neurons in three situations: space and context. In: Seifert, W. (Ed.), Neurobiology of the Hippocampus. Academic, New York, pp. 433–447.

Kubota, Y., Liu, J., Hu, D., DeCoteau, W.E., Eden, U.T., Smith, A.C., Graybiel, A.M., 2009. Stable encoding of task structure coexists with flexible coding of task events in sensorimotor striatum. J. Neurophysiol. 102, 2142–2160.

Kurth-Nelson, Z., Redish, A.D., 2009. Temporal-difference reinforcement learning with distributed representations. PLoS One 4, e7362.

Kurth-Nelson, Z., Redish, A.D., 2010. A reinforcement learning model of precommitment in decision making. Front. Behav. Neurosci. 4, 184.

Kusuki, T., Imahori, Y., Ueda, S., Inokuchi, K., 1997. Dopaminergic modulation of LTP induction in the dentate gyrus of intact brain. Neuroreport 8, 2037–2040.

Langston, R.F., Ainge, J.A., Couey, J.J., Canto, C.B., Bjerknes, T.L., Witter, M.P., Moser, E.I., Moser, M.B., 2010. Development of the spatial representation system in the rat. Science 328, 1576–1580.

Lansink, C.S., Goltstein, P.M., Lankelma, J.V., Joosten, R.N., McNaughton, B.L., Pennartz, C.M., 2008. Preferential reactivation of motivationally relevant information in the ventral striatum. J. Neurosci. 28, 6372–6382.

Lansink, C.S., Goltstein, P.M., Lankelma, J.V., McNaughton, B.L., Pennartz, C.M., 2009. Hippocampus leads ventral striatum in replay of place-reward information. PLoS Biol. 7, e1000173.

Lavoie, A.M., Mizumori, S.J., 1994. Spatial, movement- and reward-sensitive discharge by medial ventral striatum neurons of rats. Brain Res. 638, 157–168.

Lee, A.K., Wilson, M.A., 2002. Memory of sequential experience in the hippocampus during slow wave sleep. Neuron 36, 1183–1194.

Lee, I., Knierim, J.J., 2007. The relationship between the field-shifting phenomenon and representational coherence of place cells in CA1 and CA3 in a cue-altered environment. Learn. Mem. 14, 807–815.

Lee, I., Yoganarasimha, D., Rao, G., Knierim, J.J., 2004. Comparison of population coherence of place cells in hippocampal subfields CA1 and CA3. Nature 430, 456–459.

Lemon, N., Manahan-Vaughan, D., 2006. Dopamine D1/D5 receptors gate the acquisition of novel information through hippocampal long-term potentiation and long-term depression. J. Neurosci. 26, 7723–7729.

Lenck-Santini, P.P., Muller, R.U., Save, E., Poucet, B., 2002. Relationships between place cell firing fields and navigational decisions by rats. J. Neurosci. 22, 9035–9047.

Lenck-Santini, P.P., Save, E., Poucet, B., 2001. Evidence for a relationship between place-cell spatial firing and spatial memory performance. Hippocampus 11, 377–390.

Leung, L.S., Yim, C.Y., 1993. Rhythmic delta-frequency activities in the nucleus accumbens of anesthetized and freely moving rats. Can. J. Physiol. Pharmacol. 71, 311–320.

Leutgeb, J.K., Leutgeb, S., Moser, M.B., Moser, E.I., 2007. Pattern separation in the dentate gyrus and CA3 of the hippocampus. Science 315, 961–966.

Leutgeb, S., Leutgeb, J.K., Treves, A., Moser, M.B., Moser, E.I., 2004. Distinct ensemble codes in hippocampal areas CA3 and CA1. Science 305, 1295–1298.

Lever, C., Burton, S., Jeewajee, A., O’Keefe, J., Burgess, N., 2009. Boundary vector cells in the subiculum of the hippocampal formation. J. Neurosci. 29, 9771–9777.

Li, S., Cullen, W.K., Anwyl, R., Rowan, M.J., 2003. Dopamine-dependent facilitation of LTP induction in hippocampal CA1 by exposure to spatial novelty. Nat. Neurosci. 6, 526–531.

Lima, S.L., 1983. Downy woodpecker foraging behavior: foraging by expectation and energy intake rate. Oecologia 58, 232–237.

Lisman, J.E., 1999. Relating hippocampal circuitry to function: recall of memory sequences by reciprocal dentate–CA3 interactions. Neuron 22, 233–242.

Lisman, J.E., Grace, A.A., 2005. The hippocampal–VTA loop: controlling the entry of information into long-term memory. Neuron 46, 703–713.

Lisman, J., Redish, A.D., 2009. Prediction, sequences and the hippocampus. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 364, 1193–1201.

Ljungberg, T., Apicella, P., Schultz, W., 1992. Responses of monkey dopamine neurons during learning of behavioral reactions. J. Neurophysiol. 67, 145–163.

Locurto, C., Terrace, H.S., Gibbon, J., 1976. Autoshaping, random control, and omission training in the rat. J. Exp. Anal. Behav. 26, 451–462.

Lodge, D.J., Grace, A.A., 2006. The laterodorsal tegmentum is essential for burst firing of ventral tegmental area dopamine neurons. Proc. Natl. Acad. Sci. U.S.A. 103, 5167–5172.

Long, J.M., Kesner, R.P., 1996. The effects of dorsal versus ventral hippocampal, total hippocampal, and parietal cortex lesions on memory for allocentric distance in rats. Behav. Neurosci. 110, 922–932.

Lopes da Silva, F.H., Arnolds, D.E., Neijt, H.C., 1984. A functional link between the limbic cortex and ventral striatum: physiology of the subiculum accumbens pathway. Exp. Brain Res. 55, 205–214.

Louie, K., Wilson, M.A., 2001. Temporally structured replay of awake hippocampal ensemble activity during rapid eye movement sleep. Neuron 29, 145–156.

Ludvig, E.A., Sutton, R.S., Kehoe, E.J., 2008. Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput. 20, 3034–3054.

MacArthur, R.H., Pianka, E.R., 1966. On optimal use of patchy environments. Am. Nat. 100, 603–609.

Maia, T.V., 2009. Reinforcement learning, conditioning, and the brain: successes and challenges. Cogn. Affect. Behav. Neurosci. 9, 343–364.

Maldonado-Irizarry, C.S., Kelley, A.E., 1995. Excitatory amino acid receptors within nucleus accumbens subregions differentially mediate spatial learning in the rat. Behav. Pharmacol. 6, 527–539.

Maren, S., 2001. Neurobiology of Pavlovian fear conditioning. Annu. Rev. Neurosci. 24, 897–931.

Markus, E.J., Barnes, C.A., McNaughton, B.L., Gladden, V.L., Skaggs, W.E., 1994. Spatial information content and reliability of hippocampal CA1 neurons: effects of visual input. Hippocampus 4, 410–421.

Markus, E.J., Qin, Y.L., Leonard, B., Skaggs, W.E., McNaughton, B.L., Barnes, C.A., 1995. Interactions between location and task affect the spatial and directional firing of hippocampal neurons. J. Neurosci. 15, 7079–7094.

Marowsky, A., Yanagawa, Y., Obata, K., Vogt, K.E., 2005. A specialized subclass of interneurons mediates dopaminergic facilitation of amygdala function. Neuron 48, 1025–1037.

Marr, D., 1971. Simple memory: a theory for archicortex. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 262, 23–81.

Martig, A.K., Jones, G.L., Smith, K.E., Mizumori, S.J., 2009. Context dependent effects of ventral tegmental area inactivation on spatial working memory. Behav. Brain Res. 203, 316–320.

Martig, A.K., Mizumori, S.J., 2011. Ventral tegmental area disruption selectively affects CA1/CA2 but not CA3 place fields during a differential reward working memory task. Hippocampus 21, 172–184.

Martin, S.J., Grimwood, P.D., Morris, R.G., 2000. Synaptic plasticity and memory: an evaluation of the hypothesis. Annu. Rev. Neurosci. 23, 649–711.

Matsumoto, M., Hikosaka, O., 2007. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447, 1111–1115.

Matsumoto, M., Hikosaka, O., 2009. Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459, 837–841.

Maurer, A.P., Vanrhoads, S.R., Sutherland, G.R., Lipa, P., McNaughton, B.L., 2005. Self-motion and the origin of differential spatial scaling along the septo-temporal axis of the hippocampus. Hippocampus 15, 841–852.

McClelland, J.L., McNaughton, B.L., O’Reilly, R.C., 1995. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419–457.

McDonald, R.J., White, N.M., 1993. A triple dissociation of memory systems: hippocampus, amygdala, and dorsal striatum. Behav. Neurosci. 107, 3–22.

McFarland, K., Ettenberg, A., 1995. Haloperidol differentially affects reinforcement and motivational processes in rats running an alley for intravenous heroin. Psychopharmacology (Berl) 122, 346–350.

McGeorge, A.J., Faull, R.L., 1987. The organization and collateralization of corticostriate neurones in the motor and sensory cortex of the rat brain. Brain Res. 423, 318–324.

McGeorge, A.J., Faull, R.L., 1989. The organization of the projection from the cerebral cortex to the striatum in the rat. Neuroscience 29, 503–537.

McHugh, T.J., Blum, K.I., Tsien, J.Z., Tonegawa, S., Wilson, M.A., 1996. Impaired hippocampal representation of space in CA1-specific NMDAR1 knockout mice. Cell 87, 1339–1349.

McNaughton, B.L., Barnes, C.A., Gerrard, J.L., Gothard, K., Jung, M.W., Knierim, J.J., Kudrimoti, H., Qin, Y., Skaggs, W.E., Suster, M., Weaver, K.L., 1996. Deciphering the hippocampal polyglot: the hippocampus as a path integration system. J. Exp. Biol. 199, 173–185.

McNaughton, B.L., Barnes, C.A., O’Keefe, J., 1983. The contributions of position, direction, and velocity to single unit activity in the hippocampus of freely-moving rats. Exp. Brain Res. 52, 41–49.

Mehta, M.R., Barnes, C.A., McNaughton, B.L., 1997. Experience-dependent, asymmetric expansion of hippocampal place fields. Proc. Natl. Acad. Sci. U.S.A. 94, 8918–8921.

Mehta, M.R., Quirk, M.C., Wilson, M.A., 2000. Experience-dependent asymmetric shape of hippocampal receptive fields. Neuron 25, 707–715.

Meredith, G.E., 1999. The synaptic framework for chemical signaling in nucleus accumbens. Ann. N. Y. Acad. Sci. 877, 140–156.

Meredith, G.E., Agolia, R., Arts, M.P., Groenewegen, H.J., Zahm, D.S., 1992. Morphological differences between projection neurons of the core and shell in the nucleus accumbens of the rat. Neuroscience 50, 149–162.

Meredith, G.E., Baldo, B.A., Andrezjewski, M.E., Kelley, A.E., 2008. The structural basis for mapping behavior onto the ventral striatum and its subdivisions. Brain Struct. Funct. 213, 17–27.

Meredith, G.E., Pattiselanno, A., Groenewegen, H.J., Haber, S.N., 1996. Shell and core in monkey and human nucleus accumbens identified with antibodies to calbindin-D28k. J. Comp. Neurol. 365, 628–639.

Mesulam, M.M., 1981. A cortical network for directed attention and unilateral neglect. Ann. Neurol. 10, 309–325.

Mingote, S., Font, L., Farrar, A.M., Vontell, R., Worden, L.T., Stopper, C.M., Port, R.G., Sink, K.S., Bunce, J.G., Chrobak, J.J., Salamone, J.D., 2008a. Nucleus accumbens adenosine A2A receptors regulate exertion of effort by acting on the ventral striatopallidal pathway. J. Neurosci. 28, 9037–9046.

Mingote, S., Pereira, M., Farrar, A.M., McLaughlin, P.J., Salamone, J.D., 2008b. Systemic administration of the adenosine A(2A) agonist CGS 21680 induces sedation at doses that suppress lever pressing and food intake. Pharmacol. Biochem. Behav. 89, 345–351.

Mirenowicz, J., Schultz, W., 1994. Importance of unpredictability for reward responses in primate dopamine neurons. J. Neurophysiol. 72, 1024–1027.

Mishkin, M., Malamut, B., Bachevalier, J., 1984. Memories and habits: two neural systems. In: Lynch, G., McGaugh, J.L., Weinberger, N.M. (Eds.), Neurobiology of Learning and Memory. Guilford, New York.

Miyashita, T., Kubik, S., Haghighi, N., Steward, O., Guzowski, J.F., 2009. Rapid activation of plasticity-associated gene transcription in hippocampal neurons provides a mechanism for encoding of one-trial experience. J. Neurosci. 29, 898–906.

Miyazaki, K.W., Miyazaki, K., Doya, K., 2011. Activation of the central serotonergic system in response to delayed but not omitted rewards. Eur. J. Neurosci. 33, 153–160.

Mizumori, S.J., 2006. Hippocampal place fields: a neural code for episodic memory? Hippocampus 16, 685–690.

Mizumori, S.J., Barnes, C.A., McNaughton, B.L., 1989a. Reversible inactivation of the medial septum: selective effects on the spontaneous unit activity of different hippocampal cell types. Brain Res. 500, 99–106.

Mizumori, S.J., Cooper, B.G., Leutgeb, S., Pratt, W.E., 2000. A neural systems analysis of adaptive navigation. Mol. Neurobiol. 21, 57–82.

Mizumori, S.J., Lavoie, A.M., Kalyani, A., 1996. Redistribution of spatial representation in the hippocampus of aged rats performing a spatial memory task. Behav. Neurosci. 110, 1006–1016.

Mizumori, S.J., McNaughton, B.L., Barnes, C.A., Fox, K.B., 1989b. Preserved spatial coding in hippocampal CA1 pyramidal cells during reversible suppression of CA3c output: evidence for pattern completion in hippocampus. J. Neurosci. 9, 3915–3928.

Mizumori, S.J., Puryear, C.B., Martig, A.K., 2009. Basal ganglia contributions to adaptive navigation. Behav. Brain Res. 199, 32–42.

Mizumori, S.J., Ragozzino, K.E., Cooper, B.G., Leutgeb, S., 1999. Hippocampal representational organization and spatial context. Hippocampus 9, 444–451.

Mizumori, S.J., Smith, D.M., Puryear, C.B., 2007a. Hippocampal and neocortical interactions during context discrimination: electrophysiological evidence from the rat. Hippocampus 17, 851–862.

Mizumori, S.J., Yeshenko, O., Gill, K.M., Davis, D.M., 2004. Parallel processing across neural systems: implications for a multiple memory system hypothesis. Neurobiol. Learn. Mem. 82, 278–298.

Mizumori, S.J.Y., 2008. Hippocampal Place Fields: Relevance to Learning and Memory. Oxford University Press, New York.

Mizumori, S.J.Y., Smith, D.M., Puryear, C.B., 2007b. Mnemonic contributions of hippocampal place cells. In: Martinez, J.L., Kesner, R.P. (Eds.), Neurobiology of Learning and Memory. Academic Press.

Mogenson, G.J., Jones, D.L., Yim, C.Y., 1980. From motivation to action: functional interface between the limbic system and the motor system. Prog. Neurobiol. 14, 69–97.

Molina-Luna, K., Pekanovic, A., Rohrich, S., Hertler, B., Schubring-Giese, M., Rioult-Pedotti, M.S., Luft, A.R., 2009. Dopamine in motor cortex is necessary for skill learning and synaptic plasticity. PLoS One 4, e7082.

Montague, P.R., Dayan, P., Sejnowski, T.J., 1996. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16, 1936–1947.

Montgomery, S.M., Betancur, M.I., Buzsaki, G., 2009. Behavior-dependent coordination of multiple theta dipoles in the hippocampus. J. Neurosci. 29, 1381–1394.

Morris, R.G., Frey, U., 1997. Hippocampal synaptic plasticity: role in spatial learning or the automatic recording of attended experience? Philos. Trans. R. Soc. Lond. B: Biol. Sci. 352, 1489–1503.

Morris, R.G.M., 1981. Spatial localization does not require the presence of local cues. Learn. Motiv. 12, 239–260.

Moscovitch, M., Rosenbaum, R.S., Gilboa, A., Addis, D.R., Westmacott, R., Grady, C., McAndrews, M.P., Levine, B., Black, S., Winocur, G., Nadel, L., 2005. Functional neuroanatomy of remote episodic, semantic and spatial memory: a unified account based on multiple trace theory. J. Anat. 207, 35–66.

Moser, E.I., Kropff, E., Moser, M.B., 2008. Place cells, grid cells, and the brain’s spatial representation system. Annu. Rev. Neurosci. 31, 69–89.

Mott, A.M., Nunes, E.J., Collins, L.E., Port, R.G., Sink, K.S., Hockemeyer, J., Muller, C.E., Salamone, J.D., 2009. The adenosine A2A antagonist MSX-3 reverses the effects of the dopamine antagonist haloperidol on effort-related decision making in a T-maze cost/benefit procedure. Psychopharmacology (Berl) 204, 103–112.

Mulder, A.B., Hodenpijl, M.G., Lopes da Silva, F.H., 1998. Electrophysiology of the hippocampal and amygdaloid projections to the nucleus accumbens of the rat: convergence, segregation, and interaction of inputs. J. Neurosci. 18, 5095–5102.

Mulder, A.B., Tabuchi, E., Wiener, S.I., 2004. Neurons in hippocampal afferent zones of rat striatum parse routes into multi-pace segments during maze navigation. Eur. J. Neurosci. 19, 1923–1932.

Muller, R.U., Kubie, J.L., 1987. The effects of changes in the environment on the spatial firing of hippocampal complex-spike cells. J. Neurosci. 7, 1951–1968.

Muller, R.U., Kubie, J.L., 1989. The firing of hippocampal place cells predicts the future position of freely moving rats. J. Neurosci. 9, 4101–4110.

Muller, R.U., Stead, M., Pach, J., 1996. The hippocampus as a cognitive graph. J. Gen. Physiol. 107, 663–694.

Munn, N.L., 1950. Handbook of Psychological Research on the Rat; An Introduction to Animal Psychology. Houghton Mifflin, Oxford.

Myers, C.E., Gluck, M., 1994. Context, conditioning, and hippocampal rerepresentation in animal learning. Behav. Neurosci. 108, 835–847.

Nadel, L., Payne, J.D., 2002. The hippocampus, wayfinding and episodic memory. In: Sharp, P.E. (Ed.), The Neural Basis of Navigation: Evidence from Single Cell Recording. Kluwer Academic Publication, MA.

Nadel, L., Wilner, J., 1980. Context and conditioning: a place for space. Physiol. Psychol. 8, 218–228.

Nai, Q., Li, S., Wang, S.H., Liu, J., Lee, F.J., Frankland, P.W., Liu, F., 2010. Uncoupling the D1–N-methyl-D-aspartate (NMDA) receptor complex promotes NMDA-dependent long-term potentiation and working memory. Biol. Psychiatry 67, 246–254.

Nair-Roberts, R.G., Chatelain-Badie, S.D., Benson, E., White-Cooper, H., Bolam, J.P., Ungless, M.A., 2008. Stereological estimates of dopaminergic, GABAergic and glutamatergic neurons in the ventral tegmental area, substantia nigra and retrorubral field in the rat. Neuroscience 152, 1024–1031.

Nakahara, H., Itoh, H., Kawagoe, R., Takikawa, Y., Hikosaka, O., 2004. Dopamine neurons can represent context-dependent prediction error. Neuron 41, 269–280.

Nicola, S.M., 2007. The nucleus accumbens as part of a basal ganglia action selection circuit. Psychopharmacology (Berl) 191, 521–550.

Nicola, S.M., 2010. The flexible approach hypothesis: unification of effort and cue-responding hypotheses for the role of nucleus accumbens dopamine in the activation of reward-seeking behavior. J. Neurosci. 30, 16585–16600.

Nicola, S.M., Kombian, S.B., Malenka, R.C., 1996. Psychostimulants depress excitatory synaptic transmission in the nucleus accumbens via presynaptic D1-like dopamine receptors. J. Neurosci. 16, 1591–1604.

Nicola, S.M., Malenka, R.C., 1998. Modulation of synaptic transmission by dopamine and norepinephrine in ventral but not dorsal striatum. J. Neurophysiol. 79, 1768–1776.

Nicola, S.M., Surmeier, J., Malenka, R.C., 2000. Dopaminergic modulation of neuronal excitability in the striatum and nucleus accumbens. Annu. Rev. Neurosci. 23, 185–215.

Nicola, S.M., Yun, I.A., Wakabayashi, K.T., Fields, H.L., 2004. Firing of nucleus accumbens neurons during the consummatory phase of a discriminative stimulus task depends on previous reward predictive cues. J. Neurophysiol. 91, 1866–1882.

Niv, Y., 2009. Reinforcement learning in the brain. J. Math. Psychol. 53, 139–154.

Niv, Y., Daw, N.D., Joel, D., Dayan, P., 2007. Tonic dopamine: opportunity costs and the control of response vigor. Psychopharmacology (Berl) 191, 507–520.

Niv, Y., Joel, D., Dayan, P., 2006. A normative perspective on motivation. Trends Cogn. Sci. 10, 375–381.

O’Carroll, C.M., Morris, R.G., 2004. Heterosynaptic co-activation of glutamatergic and dopaminergic afferents is required to induce persistent long-term potentiation. Neuropharmacology 47, 324–332.

O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., Dolan, R.J., 2004. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452–454.

O’Doherty, J.P., Dayan, P., Friston, K., Critchley, H., Dolan, R.J., 2003. Temporal difference models and reward-related learning in the human brain. Neuron 38, 329–337.

O’Donnell, P., Grace, A.A., 1995. Synaptic interactions among excitatory afferents to nucleus accumbens neurons: hippocampal gating of prefrontal cortical input. J. Neurosci. 15, 3622–3639.

O’Keefe, J., 1976. Place units in the hippocampus of the freely moving rat. Exp. Neurol. 51, 78–109.

O’Keefe, J., Burgess, N., 1996. Geometric determinants of the place fields of hippocampal neurons. Nature 381, 425–428.

O’Keefe, J., Conway, D.H., 1978. Hippocampal place units in the freely moving rat: why they fire where they fire. Exp. Brain Res. 31, 573–590.

O’Keefe, J., Dostrovsky, J., 1971. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 34, 171–175.

O’Keefe, J., Nadel, L., 1978a. The Hippocampus as a Cognitive Map. Oxford University Press.

O’Keefe, J., Recce, M.L., 1993. Phase relationship between hippocampal place units and the EEG theta rhythm. Hippocampus 3, 317–330.

O’Mara, S.M., 1995. Spatially selective firing properties of hippocampal formation neurons in rodents and primates. Prog. Neurobiol. 45, 253–274.

O’Reilly, R.C., McClelland, J.L., 1994. Hippocampal conjunctive encoding, storage, and recall: avoiding a trade-off. Hippocampus 4, 661–682.

O’Keefe, J., Nadel, L., 1978b. The Hippocampus as a Cognitive Map. Oxford University Press, Oxford.

O’Keefe, J., Speakman, A., 1987. Single unit activity in the rat hippocampus during a spatial memory task. Exp. Brain Res. 68, 1–27.

Oakman, S.A., Faris, P.L., Kerr, P.E., Cozzari, C., Hartman, B.K., 1995. Distribution of pontomesencephalic cholinergic neurons projecting to substantia nigra differs significantly from those projecting to ventral tegmental area. J. Neurosci. 15, 5859–5869.

Olton, D.S., Becker, J.T., Handelmann, G.E., 1979. Hippocampus, space, and memory. Behav. Brain Sci. 2, 313–365.

Olton, D.S., Samuelson, R.J., 1976. Remembrance of places passed: spatial memory in rats. J. Exp. Psychol. Anim. Behav. Process. 2, 97–116.

Olypher, A.V., Lansky, P., Fenton, A.A., 2002. Properties of the extra-positional signal in hippocampal place cell discharge derived from the overdispersion in location-specific firing. Neuroscience 111, 553–566.

Omelchenko, N., Sesack, S.R., 2009. Ultrastructural analysis of local collaterals of rat ventral tegmental area neurons: GABA phenotype and synapses onto dopamine and GABA cells. Synapse 63, 895–906.

Ostlund, S.B., Wassum, K.M., Murphy, N.P., Balleine, B.W., Maidment, N.T., 2011. Extracellular dopamine levels in striatal subregions track shifts in motivation and response cost during instrumental conditioning. J. Neurosci. 31, 200–207.

Otmakhova, N.A., Lisman, J.E., 1996. D1/D5 dopamine receptor activation increases the magnitude of early long-term potentiation at CA1 hippocampal synapses. J. Neurosci. 16, 7478–7486.

Otmakhova, N.A., Lisman, J.E., 1998. D1/D5 dopamine receptors inhibit depotentiation at CA1 synapses via cAMP-dependent mechanism. J. Neurosci. 18, 1270–1279.

Oyama, K., Hernadi, I., Iijima, T., Tsutsui, K., 2010. Reward prediction error coding in dorsal striatal neurons. J. Neurosci. 30, 11447–11457.

Packard, M.G., 1999. Glutamate infused posttraining into the hippocampus or caudate-putamen differentially strengthens place and response learning. Proc. Natl. Acad. Sci. U.S.A. 96, 12881–12886.

Packard, M.G., 2009. Exhumed from thought: basal ganglia and response learning in the plus-maze. Behav. Brain Res. 199, 24–31.

Packard, M.G., Knowlton, B.J., 2002. Learning and memory functions of the basal ganglia. Annu. Rev. Neurosci. 25, 563–593.

Packard, M.G., McGaugh, J.L., 1996. Inactivation of hippocampus or caudate nucleus with lidocaine differentially affects expression of place and response learning. Neurobiol. Learn. Mem. 65, 65–72.

Packard, M.G., Hirsh, R., White, N.M., 1989. Differential effects of fornix and caudate nucleus lesions on two radial maze tasks: evidence for multiple memory systems. J. Neurosci. 9, 1465–1472.

Palmiter, R.D., 2008. Dopamine signaling in the dorsal striatum is essential for motivated behaviors: lessons from dopamine-deficient mice. Ann. N. Y. Acad. Sci. 1129, 35–46.

Pan, W.X., Hyland, B.I., 2005. Pedunculopontine tegmental nucleus controls conditioned responses of midbrain dopamine neurons in behaving rats. J. Neurosci. 25, 4725–4732.

Pan, W.X., Schmidt, R., Wickens, J.R., Hyland, B.I., 2005. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25, 6235–6242.

Pan, W.X., Schmidt, R., Wickens, J.R., Hyland, B.I., 2008. Tripartite mechanism of extinction suggested by dopamine neuron activity and temporal difference model. J. Neurosci. 28, 9619–9631.

Parent, A., 1990. Extrinsic connections of the basal ganglia. Trends Neurosci. 13, 254–258.

Parkinson, J.A., Dalley, J.W., Cardinal, R.N., Bamford, A., Fehnert, B., Lachenal, G., Rudarakanchana, N., Halkerston, K.M., Robbins, T.W., Everitt, B.J., 2002. Nucleus accumbens dopamine depletion impairs both acquisition and performance of appetitive Pavlovian approach behaviour: implications for mesoaccumbens dopamine function. Behav. Brain Res. 137, 149–163.

Paxinos, G., Watson, C., 2007. The Rat Brain in Stereotaxic Coordinates. Elsevier Academic Press, San Diego.

Pellis, S.M., Castaneda, E., McKenna, M.M., Tran-Nguyen, L.T., Whishaw, I.Q., 1993. The role of the striatum in organizing sequences of play fighting in neonatally dopamine-depleted rats. Neurosci. Lett. 158, 13–15.

Penick, S., Solomon, P.R., 1991. Hippocampus, context, and conditioning. Behav. Neurosci. 105, 611–617.

Pennartz, C.M., Berke, J.D., Graybiel, A.M., Ito, R., Lansink, C.S., van der Meer, M., Redish, A.D., Smith, K.S., Voorn, P., 2009. Corticostriatal interactions during learning, memory processing, and decision making. J. Neurosci. 29, 12831–12838.

Pennartz, C.M., Groenewegen, H.J., Lopes da Silva, F.H., 1994. The nucleus accumbens as a complex of functionally distinct neuronal ensembles: an integration of behavioural, electrophysiological and anatomical data. Prog. Neurobiol. 42, 719–761.

Pennartz, C.M., Lee, E., Verheul, J., Lipa, P., Barnes, C.A., McNaughton, B.L., 2004. The ventral striatum in off-line processing: ensemble reactivation during sleep and modulation by hippocampal ripples. J. Neurosci. 24, 6446–6456.

Pennartz, C.M., Uylings, H.B., Barnes, C.A., McNaughton, B.L., 2002. Memory reactivation and consolidation during sleep: from cellular mechanisms to human performance. Prog. Brain Res. 138, 143–166.

Phillips, P.E., Walton, M.E., Jhou, T.C., 2007. Calculating utility: preclinical evidence for cost–benefit analysis by mesolimbic dopamine. Psychopharmacology (Berl) 191, 483–495.

Phillips, R.G., LeDoux, J.E., 1992. Differential contribution of amygdala and hippocampus to cued and contextual fear conditioning. Behav. Neurosci. 106, 274–285.

Poldrack, R.A., Packard, M.G., 2003. Competition among multiple memory systems: converging evidence from animal and human brain studies. Neuropsychologia 41, 245–251.

Poucet, B., 1993. Spatial cognitive maps in animals: new hypotheses on their structure and neural mechanisms. Psychol. Rev. 100, 163–182.

Pragay, E.B., Mirsky, A.F., Ray, C.L., Turner, D.F., Mirsky, C.V., 1978. Neuronal activity in the brain stem reticular formation during performance of a “go-no go” visual attention task in the monkey. Exp. Neurol. 60, 83–95.

Puryear, C.B., Kim, M.J., Mizumori, S.J., 2010. Conjunctive encoding of movement and reward by ventral tegmental area neurons in the freely navigating rodent. Behav. Neurosci. 124, 234–247.

Puryear, C.B., Mizumori, S.J., 2008. Reward prediction error signals by reticular formation neurons. Learn. Mem. 15, 895–898.

Quirk, G.J., Muller, R.U., Kubie, J.L., 1990. The firing of hippocampal place cells in the dark depends on the rat’s recent experience. J. Neurosci. 10, 2008–2017.

Ragozzino, K.E., Leutgeb, S., Mizumori, S.J., 2001. Dorsal striatal head direction and hippocampal place representations during spatial navigation. Exp. Brain Res. 139, 372–376.

Ragozzino, M.E., 2003. Acetylcholine actions in the dorsomedial striatum support the flexible shifting of response patterns. Neurobiol. Learn. Mem. 80, 257–267.

Ragozzino, M.E., Detrick, S., Kesner, R.P., 1999a. Involvement of the prelimbic–infralimbic areas of the rodent prefrontal cortex in behavioral flexibility for place and response learning. J. Neurosci. 19, 4585–4594.

Ragozzino, M.E., Mohler, E.G., Prior, M., Palencia, C.A., Rozman, S., 2009. Acetylcholine activity in selective striatal regions supports behavioral flexibility. Neurobiol. Learn. Mem. 91, 13–22.

Ragozzino, M.E., Ragozzino, K.E., Mizumori, S.J., Kesner, R.P., 2002. Role of the dorsomedial striatum in behavioral flexibility for response and visual cue discrimination learning. Behav. Neurosci. 116, 105–115.

Ragozzino, M.E., Wilcox, C., Raso, M., Kesner, R.P., 1999b. Involvement of rodent prefrontal cortex subregions in strategy switching. Behav. Neurosci. 113, 32–41.

Ranck Jr., J.B., 1973. Studies on single neurons in dorsal hippocampal formation and septum in unrestrained rats. I. Behavioral correlates and firing repertoires. Exp. Neurol. 41, 461–531.

Rangel, A., Camerer, C., Montague, P.R., 2008. A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556.

Rawlins, J.N.P., 1985. Associations across time: the hippocampus as a temporary memory store. Behav. Brain Sci. 8, 479–496.

Redgrave, P., Gurney, K., 2006. The short-latency dopamine signal: a role in discovering novel actions? Nat. Rev. Neurosci. 7, 967–975.

Redgrave, P., Mitchell, I.J., Dean, P., 1987. Further evidence for segregated output channels from superior colliculus in rat: ipsilateral tecto-pontine and tecto-cuneiform projections have different cells of origin. Brain Res. 413, 170–174.

Redgrave, P., Prescott, T.J., Gurney, K., 1999a. The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89, 1009–1023.

Redgrave, P., Prescott, T.J., Gurney, K., 1999b. Is the short-latency dopamine response too short to signal reward error? Trends Neurosci. 22, 146–151.

Redish, A.D., Jensen, S., Johnson, A., Kurth-Nelson, Z., 2007. Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling. Psychol. Rev. 114, 784–805.

Redish, A.D., 1999. Beyond the Cognitive Map: From Place Cells to Episodic Memory. The MIT Press, Cambridge, MA.

Redish, A.D., Battaglia, F.P., Chawla, M.K., Ekstrom, A.D., Gerrard, J.L., Lipa, P., Rosenzweig, E.S., Worley, P.F., Guzowski, J.F., McNaughton, B.L., Barnes, C.A., 2001. Independence of firing correlates of anatomically proximate hippocampal pyramidal cells. J. Neurosci. 21, RC134 (1–6).

Redish, A.D., Rosenzweig, E.S., Bohanick, J.D., McNaughton, B.L., Barnes, C.A., 2000. Dynamics of hippocampal ensemble activity realignment: time versus space. J. Neurosci. 20, 9298–9309.

Reese, N.B., Garcia-Rill, E., Skinner, R.D., 1995. The pedunculopontine nucleus—auditory input, arousal and pathophysiology. Prog. Neurobiol. 47, 105–133.

Rescorla, R.A., Solomon, R.L., 1967. Two-process learning theory: relationships between Pavlovian conditioning and instrumental learning. Psychol. Rev. 74, 151–182.

Rescorla, R.A., Wagner, A.R., 1972. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black, A.H., Prokasy, W.F. (Eds.), Classical Conditioning II: Current Research and Theory. Appleton Century Crofts, New York, pp. 64–99.

Restle, F., 1957. Discrimination of cues in mazes: a resolution of the place-vs.-response question. Psychol. Rev. 64, 217–228.

Richards, J.B., Mitchell, S.H., de Wit, H., Seiden, L.S., 1997. Determination of discount functions in rats with an adjusting-amount procedure. J. Exp. Anal. Behav. 67, 353–366.

Robbins, T.W., Everitt, B.J., 2002. Limbic–striatal memory systems and drug addiction. Neurobiol. Learn. Mem. 78, 625–636.

Robinson, D.L., Venton, B.J., Heien, M.L., Wightman, R.M., 2003. Detecting subsecond dopamine release with fast-scan cyclic voltammetry in vivo. Clin. Chem. 49, 1763–1773.

Robinson, S., Rainwater, A.J., Hnasko, T.S., Palmiter, R.D., 2007. Viral restoration of dopamine signaling to the dorsal striatum restores instrumental conditioning to dopamine-deficient mice. Psychopharmacology (Berl) 191, 567–578.

Robinson, S., Smith, D.M., Mizumori, S.J., Palmiter, R.D., 2004. Firing properties of dopamine neurons in freely moving dopamine-deficient mice: effects of dopamine receptor activation and anesthesia. Proc. Natl. Acad. Sci. U.S.A. 101, 13329–13334.

Roesch, M.R., Calu, D.J., Schoenbaum, G., 2007. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat. Neurosci. 10, 1615–1624.

Roesch, M.R., Singh, T., Brown, P.L., Mullins, S.E., Schoenbaum, G., 2009. Ventral striatal neurons encode the value of the chosen action in rats deciding between differently delayed or sized rewards. J. Neurosci. 29, 13365–13376.

Roitman, M.F., Stuber, G.D., Phillips, P.E., Wightman, R.M., Carelli, R.M., 2004. Dopamine operates as a subsecond modulator of food seeking. J. Neurosci. 24, 1265–1271.

Roitman, M.F., Wheeler, R.A., Carelli, R.M., 2005. Nucleus accumbens neurons are innately tuned for rewarding and aversive taste stimuli, encode their predictors, and are linked to motor output. Neuron 45, 587–597.

Rolls, E.T., 1996. A theory of hippocampal function in memory. Hippocampus 6, 601–620.

Rosenzweig, E.S., Redish, A.D., McNaughton, B.L., Barnes, C.A., 2003. Hippocampal map realignment and spatial learning. Nat. Neurosci. 6, 609–615.

Rossato, J.I., Bevilaqua, L.R., Izquierdo, I., Medina, J.H., Cammarota, M., 2009. Dopamine controls persistence of long-term memory storage. Science 325, 1017–1020.

Roullet, P., Sargolini, F., Oliverio, A., Mele, A., 2001. NMDA and AMPA antagonist infusions into the ventral striatum impair different steps of spatial information processing in a nonassociative task in mice. J. Neurosci. 21, 2143–2149.

Sabatino, M., Ferraro, G., Liberti, G., Vella, N., La Grutta, V., 1985. Striatal and septal influence on hippocampal theta and spikes in the cat. Neurosci. Lett. 61, 55–59.

Sakurai, Y., 1994. Involvement of auditory cortical and hippocampal neurons in auditory working memory and reference memory in the rat. J. Neurosci. 14, 2606–2623.

Salamone, J.D., 1994. The involvement of nucleus accumbens dopamine in appetitive and aversive motivation. Behav. Brain Res. 61, 117–133.

Salamone, J.D., 2002. Functional significance of nucleus accumbens dopamine: behavior, pharmacology and neurochemistry. Behav. Brain Res. 137, 1.

Salamone, J.D., 2007. Functions of mesolimbic dopamine: changing concepts and shifting paradigms. Psychopharmacology (Berl) 191, 389.

Salamone, J.D., Arizzi, M.N., Sandoval, M.D., Cervone, K.M., Aberman, J.E., 2002. Dopamine antagonists alter response allocation but do not suppress appetite for food in rats: contrast between the effects of SKF 83566, raclopride, and fenfluramine on a concurrent choice task. Psychopharmacology (Berl) 160, 371–380.

Salamone, J.D., Correa, M., 2002. Motivational views of reinforcement: implications for understanding the behavioral functions of nucleus accumbens dopamine. Behav. Brain Res. 137, 3–25.

Salamone, J.D., Correa, M., Farrar, A., Mingote, S.M., 2007. Effort-related functions of nucleus accumbens dopamine and associated forebrain circuits. Psychopharmacology (Berl) 191, 461–482.

Salamone, J.D., Correa, M., Farrar, A.M., Nunes, E.J., Pardo, M., 2009. Dopamine, behavioral economics, and effort. Front. Behav. Neurosci. 3, 13.

Salamone, J.D., Steinpreis, R.E., McCullough, L.D., Smith, P., Grebel, D., Mahan, K., 1991. Haloperidol and nucleus accumbens dopamine depletion suppress lever pressing for food but increase free food consumption in a novel food choice procedure. Psychopharmacology (Berl) 104, 515–521.

Sargolini, F., Florian, C., Oliverio, A., Mele, A., Roullet, P., 2003. Differential involvement of NMDA and AMPA receptors within the nucleus accumbens in consolidation of information necessary for place navigation and guidance strategy of mice. Learn. Mem. 10, 285–292.

Sargolini, F., Fyhn, M., Hafting, T., McNaughton, B.L., Witter, M.P., Moser, M.B., Moser, E.I., 2006. Conjunctive representation of position, direction, and velocity in entorhinal cortex. Science 312, 758–762.

Sargolini, F., Roullet, P., Oliverio, A., Mele, A., 1999. Effects of lesions to the glutamatergic afferents to the nucleus accumbens in the modulation of reactivity to spatial and non-spatial novelty in mice. Neuroscience 93, 855–867.

Savelli, F., Knierim, J.J., 2010. Hebbian analysis of the transformation of medial entorhinal grid-cell inputs to hippocampal place fields. J. Neurophysiol. 103, 3167–3183.

Schmitzer-Torbert, N., Redish, A.D., 2002. Development of path stereotypy in a single day in rats on a multiple-T maze. Arch. Ital. Biol. 140, 295–301.

Schmitzer-Torbert, N., Redish, A.D., 2004. Neuronal activity in the rodent dorsal striatum in sequential navigation: separation of spatial and reward responses on the multiple T task. J. Neurophysiol. 91, 2259–2272.

Schultz, W., 1997. Dopamine neurons and their role in reward mechanisms. Curr. Opin. Neurobiol. 7, 191–197.

Schultz, W., 1998a. The phasic reward signal of primate dopamine neurons. Adv. Pharmacol. 42, 686–690.

Schultz, W., 1998b. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27.

Schultz, W., 2002. Getting formal with dopamine and reward. Neuron 36, 241–263.

Schultz, W., 2010. Dopamine signals for reward value and risk: basic and recent data. Behav. Brain Funct. 6, 24.

Schultz, W., Apicella, P., Ljungberg, T., 1993. Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J. Neurosci. 13, 900–913.

Schultz, W., Dayan, P., Montague, P.R., 1997. A neural substrate of prediction and reward. Science 275, 1593–1599.

Schultz, W., Dickinson, A., 2000. Neuronal coding of prediction errors. Annu. Rev. Neurosci. 23, 473–500.

Schultz, W., Romo, R., 1988. Neuronal activity in the monkey striatum during the initiation of movements. Exp. Brain Res. 71, 431–436.

Schultz, W., Romo, R., 1992. Role of primate basal ganglia and frontal cortex in the internal generation of movements. I. Preparatory activity in the anterior striatum. Exp. Brain Res. 91, 363–384.

Schweimer, J., Hauber, W., 2006. Dopamine D1 receptors in the anterior cingulate cortex regulate effort-based decision making. Learn. Mem. 13, 777–782.

Seamans, J.K., Phillips, A.G., 1994. Selective memory impairments produced by transient lidocaine-induced lesions of the nucleus accumbens in rats. Behav. Neurosci. 108, 456–468.

Seamans, J.K., Yang, C.R., 2004. The principal features and mechanisms of dopamine modulation in the prefrontal cortex. Prog. Neurobiol. 74, 1–58.

Sesack, S.R., Carr, D.B., Omelchenko, N., Pinto, A., 2003. Anatomical substrates for glutamate–dopamine interactions: evidence for specificity of connections and extrasynaptic actions. Ann. N. Y. Acad. Sci. 1003, 36–52.

Sesack, S.R., Grace, A.A., 2010. Cortico-basal ganglia reward network: microcircuitry. Neuropsychopharmacology 35, 27–47.

Setlow, B., McGaugh, J.L., 1998. Sulpiride infused into the nucleus accumbens posttraining impairs memory of spatial water maze training. Behav. Neurosci. 112, 603–610.

Setlow, B., Schoenbaum, G., Gallagher, M., 2003. Neural encoding in ventral striatum during olfactory discrimination learning. Neuron 38, 625–636.

Seymour, B., O’Doherty, J.P., Dayan, P., Koltzenburg, M., Jones, A.K., Dolan, R.J., Friston, K.J., Frackowiak, R.S., 2004. Temporal difference models describe higher-order learning in humans. Nature 429, 664–667.

Siapas, A.G., Lubenov, E.V., Wilson, M.A., 2005. Prefrontal phase locking to hippocampal theta oscillations. Neuron 46, 141–151.

Sidman, M., Fletcher, F.G., 1968. A demonstration of auto-shaping with monkeys. J. Exp. Anal. Behav. 11, 307–309.

Singer, A.C., Frank, L.M., 2009. Rewarded outcomes enhance reactivation of experience in the hippocampus. Neuron 64, 910–921.

Sink, K.S., Vemuri, V.K., Olszewska, T., Makriyannis, A., Salamone, J.D., 2008. Cannabinoid CB1 antagonists and dopamine antagonists produce different effects on a task involving response allocation and effort-related choice in food-seeking behavior. Psychopharmacology (Berl) 196, 565–574.

Skaggs, W.E., McNaughton, B.L., Wilson, M.A., Barnes, C.A., 1996. Theta phase precession in hippocampal neuronal populations and the compression of temporal sequences. Hippocampus 6, 149–172.

Small, W.S., 1899. Notes on the psychic development of the young white rat. Am. J. Psychol. 11, 80–100.

Small, W.S., 1900. An experimental study of the mental processes of the rat. Am. J. Psychol. 11, 133–165.

Small, W.S., 1901. Experimental study of the mental processes of the rat. Am. J. Psychol. 12, 206–239.

Smith-Roe, S.L., Kelley, A.E., 2000. Coincident activation of NMDA and dopamine D1 receptors within the nucleus accumbens core is required for appetitive instrumental learning. J. Neurosci. 20, 7737–7742.

Smith-Roe, S.L., Sadeghian, K., Kelley, A.E., 1999. Spatial learning and performance in the radial arm maze is impaired after N-methyl-D-aspartate (NMDA) receptor blockade in striatal subregions. Behav. Neurosci. 113, 703–717.

Smith, D.M., Mizumori, S.J., 2006a. Hippocampal place cells, context, and episodic memory. Hippocampus 16, 716–729.

Smith, D.M., Mizumori, S.J., 2006b. Learning-related development of context-specific neuronal responses to places and events: the hippocampal role in context processing. J. Neurosci. 26, 3154–3163.

Song, E.Y., Kim, Y.B., Kim, Y.H., Jung, M.W., 2005. Role of active movement in place-specific firing of hippocampal neurons. Hippocampus 15, 8–17.

Sotak, B.N., Hnasko, T.S., Robinson, S., Kremer, E.J., Palmiter, R.D., 2005. Dysregulation of dopamine signaling in the dorsal striatum inhibits feeding. Brain Res. 1061, 88–96.

Squire, L.R., Knowlton, B., Musen, G., 1993. The structure and organization of memory. Annu. Rev. Psychol. 44, 453–495.

Squire, L.R., 1994. Memory and forgetting: long-term and gradual changes in memory storage. Int. Rev. Neurobiol. 37, 243–269 (discussion 248–285).

Stephens, D.W., Krebs, J.R., 1986. Foraging Theory. Princeton University Press, Princeton, NJ.

Stramiello, M., Wagner, J.J., 2008. D1/5 receptor-mediated enhancement of LTP requires PKA, Src family kinases, and NR2B-containing NMDARs. Neuropharmacology 55, 871–877.

Suri, R.E., 2002. TD models of reward predictive responses in dopamine neurons. Neural Netw. 15, 523–533.

Suri, R.E., Schultz, W., 2001. Temporal difference model reproduces anticipatory neural activity. Neural Comput. 13, 841–862.

Surmeier, D.J., Ding, J., Day, M., Wang, Z., Shen, W., 2007. D1 and D2 dopamine-receptor modulation of striatal glutamatergic signaling in striatal medium spiny neurons. Trends Neurosci. 30, 228–235.

Surmeier, D.J., Shen, W., Day, M., Gertler, T., Chan, S., Tian, X., Plotkin, J.L., 2010. The role of dopamine in modulating the structure and function of striatal circuits. Prog. Brain Res. 183, 149–167.

Sutton, R.S., 1988. Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44.

Sutton, R., Barto, A., 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

Swanson, L.W., 2003. Brain Maps: Structure of the Rat Brain, 3rd edition. Academic Press, San Diego, CA.

Swanson, L.W., Cowan, W.M., 1977. An autoradiographic study of the organization of the efferent connections of the hippocampal formation in the rat. J. Comp. Neurol. 172, 49–84.

Tabuchi, E.T., Mulder, A.B., Wiener, S.I., 2000. Position and behavioral modulation of synchronization of hippocampal and accumbens neuronal discharges in freely moving rats. Hippocampus 10, 717–728.

Taha, S.A., Nicola, S.M., Fields, H.L., 2007. Cue-evoked encoding of movement planning and execution in the rat nucleus accumbens. J. Physiol. 584, 801–818.

Taube, J.S., Muller, R.U., Ranck Jr., J.B., 1990. Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis. J. Neurosci. 10, 420–435.

Terrazas, A., Krause, M., Lipa, P., Gothard, K.M., Barnes, C.A., McNaughton, B.L., 2005. Self-motion and the hippocampal spatial metric. J. Neurosci. 25, 8085–8096.

Thorn, C.A., Atallah, H., Howe, M., Graybiel, A.M., 2010. Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron 66, 781–795.

Thorn, C.A., Graybiel, A.M., 2010. Pausing to regroup: thalamic gating of cortico-basal ganglia networks. Neuron 67, 175–178.

Tobler, P.N., Dickinson, A., Schultz, W., 2003. Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J. Neurosci. 23, 10402–10410.

Tobler, P.N., Fiorillo, C.D., Schultz, W., 2005. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645.

Tolman, E.C., 1930. Maze performance a function of motivation and of reward as well as knowledge of the maze paths. J. Gen. Psychol. 4, 338–342.

Tolman, E.C., 1938. The determiners of behavior at a choice point. Psychol. Rev. 45, 1–41.

Tolman, E.C., 1939. Prediction of vicarious trial and error by means of the schematic sowbug. Psychol. Rev. 46, 318–336.

Tolman, E.C., 1948. Cognitive maps in rats and men. Psychol. Rev. 55, 189–208.

Totterdell, S., Meredith, G.E., 1997. Topographical organization of projections from the entorhinal cortex to the striatum of the rat. Neuroscience 78, 715–729.

Touretzky, D.S., Redish, A.D., 1996. Theory of rodent navigation based on interacting representations of space. Hippocampus 6, 247–270.

Tremblay, P.L., Bedard, M.A., Langlois, D., Blanchet, P.J., Lemay, M., Parent, M., 2010. Movement chunking during sequence learning is a dopamine-dependent process: a study conducted in Parkinson’s disease. Exp. Brain Res. 205, 375–385.

Tremblay, P.L., Bedard, M.A., Levesque, M., Chebli, M., Parent, M., Courtemanche, R., Blanchet, P.J., 2009. Motor sequence learning in primate: role of the D2 receptor in movement chunking during consolidation. Behav. Brain Res. 198, 231–239.

Treves, A., 2004. Computational constraints between retrieving the past and predicting the future, and the CA3–CA1 differentiation. Hippocampus 14, 539–556.

Tse, D., Langston, R.F., Kakeyama, M., Bethus, I., Spooner, P.A., Wood, E.R., Witter, M.P., Morris, R.G., 2007. Schemas and memory consolidation. Science 316, 76–82.

Tulving, E., 2002. Episodic memory: from mind to brain. Annu. Rev. Psychol. 53, 1–25.

Usiello, A., Sargolini, F., Roullet, P., Ammassari-Teule, M., Passino, E., Oliverio, A., Mele, A., 1998. N-methyl-D-aspartate receptors in the nucleus accumbens are involved in detection of spatial novelty in mice. Psychopharmacology (Berl) 137, 175–183.

Usuda, I., Tanaka, K., Chiba, T., 1998. Efferent projections of the nucleus accumbens in the rat with special reference to subdivision of the nucleus: biotinylated dextran amine study. Brain Res. 797, 73–93.

Van Cauter, T., Poucet, B., Save, E., 2008. Unstable CA1 place cell representation in rats with entorhinal cortex lesions. Eur. J. Neurosci. 27, 1933–1946.

Van den Bercken, J.H., Cools, A.R., 1982. Evidence for a role of the caudate nucleus in the sequential organization of behavior. Behav. Brain Res. 4, 319–327.

van den Bos, R., Lasthuis, W., den Heijer, E., van der Harst, J., Spruijt, B., 2006. Toward a rodent model of the Iowa gambling task. Behav. Res. Methods 38, 470–478.

van der Meer, M.A., Redish, A.D., 2011. Ventral striatum: a critical look at models of learning and evaluation. Curr. Opin. Neurobiol. 21, 387–392.

van der Meer, M.A., Johnson, A., Schmitzer-Torbert, N.C., Redish, A.D., 2010. Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron 67, 25–32.

van der Meer, M.A., Redish, A.D., 2009. Low and high gamma oscillations in rat ventral striatum have distinct relationships to behavior, reward, and spiking activity on a learned spatial decision task. Front. Integr. Neurosci. 3, 9.

van der Meer, M.A., Redish, A.D., 2010. Expectancies in decision making, reinforcement learning, and ventral striatum. Front. Neurosci. 4, 6.

van Dongen, Y.C., Deniau, J.M., Pennartz, C.M., Galis-de Graaf, Y., Voorn, P., Thierry, A.M., Groenewegen, H.J., 2005. Anatomical evidence for direct connections between the shell and core subregions of the rat nucleus accumbens. Neuroscience 136, 1049–1071.

van Groen, T., Wyss, J.M., 1990. Extrinsic projections from area CA1 of the rat hippocampus: olfactory, cortical, subcortical, and bilateral hippocampal formation projections. J. Comp. Neurol. 302, 515–528.

Van Strien, N.M., Cappaert, N.L., Witter, M.P., 2009. The anatomy of memory: an interactive overview of the parahippocampal–hippocampal network. Nat. Rev. Neurosci. 10, 272–282.

Varela, F., Lachaux, J.P., Rodriguez, E., Martinerie, J., 2001. The brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci. 2, 229–239.

Vida, I., Bartos, M., Jonas, P., 2006. Shunting inhibition improves robustness of gamma oscillations in hippocampal interneuron networks by homogenizing firing rates. Neuron 49, 107–117.

Vinogradova, O.S., 1995. Expression, control, and probable functional significance of the neuronal theta-rhythm. Prog. Neurobiol. 45, 523–583.

Voorn, P., Vanderschuren, L.J., Groenewegen, H.J., Robbins, T.W., Pennartz, C.M., 2004. Putting a spin on the dorsal–ventral divide of the striatum. Trends Neurosci. 27, 468–474.

Waddington, K.D., Holden, L.R., 1979. Optimal foraging: flower selection by bees. Am. Nat. 114, 179–196.

Waelti, P., Dickinson, A., Schultz, W., 2001. Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43–48.

Wakabayashi, K.T., Fields, H.L., Nicola, S.M., 2004. Dissociation of the role of nucleus accumbens dopamine in responding to reward-predictive cues and waiting for reward. Behav. Brain Res. 154, 19–30.

Wall, V.Z., Parker, J.G., Fadok, J.P., Darvas, M., Zweifel, L., Palmiter, R.D., 2011. A behavioral genetics approach to understanding D1 receptor involvement in phasic dopamine signaling. Mol. Cell. Neurosci. 46, 21–31.

Walton, M.E., Bannerman, D.M., Rushworth, M.F., 2002. The role of rat medial frontal cortex in effort-based decision making. J. Neurosci. 22, 10996–11003.

Walton, M.E., Kennerley, S.W., Bannerman, D.M., Phillips, P.E., Rushworth, M.F., 2006. Weighing up the benefits of work: behavioral and neural analyses of effort-related decision making. Neural Netw. 19, 1302–1314.

Wanat, M.J., Kuhnen, C.M., Phillips, P.E., 2010. Delays conferred by escalating costs modulate dopamine release to rewards but not their predictors. J. Neurosci. 30, 12020–12027.

Wang, H.L., Morales, M., 2009. Pedunculopontine and laterodorsal tegmental nuclei contain distinct populations of cholinergic, glutamatergic and GABAergic neurons in the rat. Eur. J. Neurosci. 29, 340–358.

Wang, S.H., Morris, R.G., 2010. Hippocampal–neocortical interactions in memory formation, consolidation, and reconsolidation. Annu. Rev. Psychol. 61, 49–79, C1–C4.

Watson, J.B., 1907. Kinaesthetic and organic sensations: their role in the reactions of the white rat. Psychol. Rev. Monogr. Suppl. 8 (2).

Whishaw, I.Q., Gorny, B., 1999. Path integration absent in scent-tracking fimbria-fornix rats: evidence for hippocampal involvement in “sense of direction” and “sense of distance” using self-movement cues. J. Neurosci. 19, 4662–4673.

Whishaw, I.Q., Mittleman, G., Bunch, S.T., Dunnett, S.B., 1987. Impairments in the acquisition, retention and selection of spatial navigation strategies after medial caudate-putamen lesions in rats. Behav. Brain Res. 24, 125–138.

White, I.M., Rebec, G.V., 1993. Responses of rat striatal neurons during performance of a lever-release version of the conditioned avoidance response task. Brain Res. 616, 71–82.

Whittington, M.A., Traub, R.D., Jefferys, J.G., 1995. Synchronized oscillations in interneuron networks driven by metabotropic glutamate receptor activation. Nature 373, 612–615.

Wickens, J.R., Budd, C.S., Hyland, B.I., Arbuthnott, G.W., 2007a. Striatal contributions to reward and decision making: making sense of regional variations in a reiterated processing matrix. Ann. N. Y. Acad. Sci. 1104, 192–212.

Wickens, J.R., Horvitz, J.C., Costa, R.M., Killcross, S., 2007b. Dopaminergic mechanisms in actions and habits. J. Neurosci. 27, 8181–8183.

Wiener, S.I., 1993. Spatial and behavioral correlates of striatal neurons in rats performing a self-initiated navigation task. J. Neurosci. 13, 3802–3817.

Wiener, S.I., 1996. Spatial, behavioral and sensory correlates of hippocampal CA1 complex spike cell activity: implications for information processing functions. Prog. Neurobiol. 49, 335–361.

Wiener, S.I., Korshunov, V.A., Garcia, R., Berthoz, A., 1995. Inertial, substratal and landmark cue control of hippocampal CA1 place cell activity. Eur. J. Neurosci. 7, 2206–2219.

Wiener, S.I., Paul, C.A., Eichenbaum, H., 1989. Spatial and behavioral correlates of hippocampal neuronal activity. J. Neurosci. 9, 2737–2763.

Wightman, R.M., Robinson, D.L., 2002. Transient changes in mesolimbic dopamine and their association with ‘reward’. J. Neurochem. 82, 721–735.

Wilcove, W.G., Miller, J.C., 1974. CS-UCS presentations and a lever: human autoshaping. J. Exp. Psychol. 103, 868–877.

Williams, D.R., Williams, H., 1969. Auto-maintenance in the pigeon: sustainedpecking despite contingent non-reinforcement. J. Exp. Anal. Behav. 12, 511–520.

Williams, S., Mmbaga, N., Chirwa, S., 2006. Dopaminergic D1 receptor agonist SKF38393 induces GAP-43 expression and long-term potentiation in hippocampusin vivo. Neurosci. Lett. 402, 46–50.

Williams, Z.M., Eskandar, E.N., 2006. Selective enhancement of associative learningby microstimulation of the anterior caudate. Nat. Neurosci. 9, 562–568.

Wills, T.J., Cacucci, F., Burgess, N., O’Keefe, J., 2010. Development of the hippocampalcognitive map in preweanling rats. Science 328, 1573–1576.

Wilson, D.I., Bowman, E.M., 2005. Rat nucleus accumbens neurons predominantlyrespond to the outcome-related properties of conditioned stimuli rather thantheir behavioral-switching properties. J. Neurophysiol. 94, 49–61.

Wilson, D.I., MacLaren, D.A., Winn, P., 2009. Bar pressing for food: differential consequences of lesions to the anterior versus posterior pedunculopontine. Eur. J. Neurosci. 30, 504–513.

Wilson, M.A., McNaughton, B.L., 1993. Dynamics of the hippocampal ensemble code for space. Science 261, 1055–1058.

Wilson, M.A., McNaughton, B.L., 1994. Reactivation of hippocampal ensemble memories during sleep. Science 265, 676–679.

Winn, P., 2006. How best to consider the structure and function of the pedunculopontine tegmental nucleus: evidence from animal studies. J. Neurol. Sci. 248, 234–250.

Wise, R.A., 2004. Dopamine, learning and motivation. Nat. Rev. Neurosci. 5, 483–494.

Wise, R.A., 2005. Forebrain substrates of reward and motivation. J. Comp. Neurol. 493, 115–121.

Wise, R.A., 2006. Role of brain dopamine in food reward and reinforcement. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 361, 1149–1158.

Wise, R.A., 2009. Roles for nigrostriatal—not just mesocorticolimbic—dopamine in reward and addiction. Trends Neurosci. 32, 517–524.

Wisman, L.A., Sahin, G., Maingay, M., Leanza, G., Kirik, D., 2008. Functional convergence of dopaminergic and cholinergic input is critical for hippocampus-dependent working memory. J. Neurosci. 28, 7797–7807.

Witter, M.P., Naber, P.A., van Haeften, T., Machielsen, W.C., Rombouts, S.A., Barkhof, F., Scheltens, P., Lopes da Silva, F.H., 2000. Cortico-hippocampal communication by way of parallel parahippocampal-subicular pathways. Hippocampus 10, 398–410.

Wolterink, G., Phillips, G., Cador, M., Donselaar-Wolterink, I., Robbins, T.W., Everitt, B.J., 1993. Relative roles of ventral striatal D1 and D2 dopamine receptors in responding with conditioned reinforcement. Psychopharmacology (Berl) 110, 355–364.

Womelsdorf, T., Fries, P., Mitra, P.P., Desimone, R., 2006. Gamma-band synchronization in visual cortex predicts speed of change detection. Nature 439, 733–736.

Womelsdorf, T., Schoffelen, J.M., Oostenveld, R., Singer, W., Desimone, R., Engel, A.K., Fries, P., 2007. Modulation of neuronal interactions through neuronal synchronization. Science 316, 1609–1612.

Wood, E.R., Dudchenko, P.A., Robitsek, R.J., Eichenbaum, H., 2000. Hippocampal neurons encode information about different types of memory episodes occurring in the same location. Neuron 27, 623–633.

Woolf, N.J., 1991. Cholinergic systems in mammalian brain and spinal cord. Prog. Neurobiol. 37, 475–524.

Worden, L.T., Shahriari, M., Farrar, A.M., Sink, K.S., Hockemeyer, J., Muller, C.E., Salamone, J.D., 2009. The adenosine A2A antagonist MSX-3 reverses the effort-related effects of dopamine blockade: differential interaction with D1 and D2 family antagonists. Psychopharmacology (Berl) 203, 489–499.

Wright, C.I., Beijer, A.V., Groenewegen, H.J., 1996. Basal amygdaloid complex afferents to the rat nucleus accumbens are compartmentally organized. J. Neurosci. 16, 1877–1893.

Xi, Z.X., Stein, E.A., 1998. Nucleus accumbens dopamine release modulation by mesolimbic GABAA receptors—an in vivo electrochemical study. Brain Res. 798, 156–165.

Yeshenko, O., Guazzelli, A., Mizumori, S.J., 2004. Context-dependent reorganization of spatial and movement representations by simultaneously recorded hippocampal and striatal neurons during performance of allocentric and egocentric tasks. Behav. Neurosci. 118, 751–769.

Yin, H.H., 2010. The sensorimotor striatum is necessary for serial order learning. J. Neurosci. 30, 14719–14723.

Yin, H.H., Knowlton, B.J., 2004. Contributions of striatal subregions to place and response learning. Learn. Mem. 11, 459–463.

Yin, H.H., Knowlton, B.J., 2006. The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–476.

Yin, H.H., Knowlton, B.J., Balleine, B.W., 2004. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19, 181–189.

Yin, H.H., Knowlton, B.J., Balleine, B.W., 2006. Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav. Brain Res. 166, 189–196.

Yin, H.H., Mulcare, S.P., Hilario, M.R., Clouse, E., Holloway, T., Davis, M.I., Hansson, A.C., Lovinger, D.M., Costa, R.M., 2009. Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nat. Neurosci. 12, 333–341.

Yin, H.H., Ostlund, S.B., Balleine, B.W., 2008. Reward-guided learning beyond dopamine in the nucleus accumbens: the integrative functions of cortico-basal ganglia networks. Eur. J. Neurosci. 28, 1437–1448.

Yin, H.H., Ostlund, S.B., Knowlton, B.J., Balleine, B.W., 2005. The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523.

Zahm, D.S., 1999. Functional–anatomical implications of the nucleus accumbens core and shell subterritories. Ann. N. Y. Acad. Sci. 877, 113–128.

Zahm, D.S., 2000. An integrative neuroanatomical perspective on some subcortical substrates of adaptive responding with emphasis on the nucleus accumbens. Neurosci. Biobehav. Rev. 24, 85–105.

Zahm, D.S., Brog, J.S., 1992. On the significance of subterritories in the "accumbens" part of the rat ventral striatum. Neuroscience 50, 751–767.

Zahm, D.S., Heimer, L., 1990. Two transpallidal pathways originating in the rat nucleus accumbens. J. Comp. Neurol. 302, 437–446.

Zahm, D.S., Heimer, L., 1993. Specificity in the efferent projections of the nucleus accumbens in the rat: comparison of the rostral pole projection patterns with those of the core and shell. J. Comp. Neurol. 327, 220–232.

Zahm, D.S., Williams, E., Wohltmann, C., 1996. Ventral striatopallidothalamic projection: IV. Relative involvements of neurochemically distinct subterritories in the ventral pallidum and adjacent parts of the rostroventral forebrain. J. Comp. Neurol. 364, 340–362.

Zhang, L., Doyon, W.M., Clark, J.J., Phillips, P.E., Dani, J.A., 2009. Controls of tonic and phasic dopamine transmission in the dorsal and ventral striatum. Mol. Pharmacol. 76, 396–404.

Zhou, L., Furuta, T., Kaneko, T., 2003. Chemical organization of projection neurons in the rat accumbens nucleus and olfactory tubercle. Neuroscience 120, 783–798.

Zugaro, M.B., Monconduit, L., Buzsaki, G., 2005. Spike phase precession persists after transient intrahippocampal perturbation. Nat. Neurosci. 8, 67–71.

Zweifel, L.S., Fadok, J.P., Argilli, E., Garelick, M.G., Jones, G.L., Dickerson, T.M.K., Allen, S.E., Mizumori, S.J.Y., Bonci, A., Palmiter, R.D., 2011. Activation of dopamine neurons is critical for aversive conditioning and prevention of generalized anxiety. Nat. Neurosci. 14, 620–626.