DESCRIPTION: State the application’s broad, long-term...


Principal Investigator/Program Director (Last, first, middle): Guenther, Frank H.

INTRODUCTION

Overview. This application is a first resubmission. The original application proposed three inter-related subprojects concerning modeling and neuroimaging of (i) the brain regions involved in speech sound sequence generation, (ii) the neural processes underlying the learning of new sound sequences, and (iii) problems with this sequencing circuitry that may underlie stuttering. The reviewers of the original application generally agreed that the proposed research was highly innovative, was of high potential significance, and was theoretically well-motivated. We have therefore left the Background and Significance intact, as well as the theoretical description of the proposed model and the theoretical motivation for the fMRI experiments (with the exception of the removal of the stuttering component of the project, as described below). The reviewers’ primary concerns were that the application was too ambitious, and that (perhaps as a result of the large scope) important details concerning the experiments were lacking, as were descriptions of how the model simulations would be compared to the experimental results. Our revisions have therefore focused on improvements to these aspects of the proposal, as detailed in the following paragraphs. Changes to the application text are indicated with vertical bars in the left margin of the Research Plan.

Project was too ambitious as originally proposed. The reviewers generally felt that the project as originally proposed was too ambitious. Reviewer 3 explicitly referred to the stuttering work, which formed over one-third of the original research plan, as overextending the project. To address these concerns, we have removed the stuttering component of the proposed project and used the resulting space to expand the remainder of the research design and methods. Removal of the stuttering component also addresses the concern of Reviewers 1 and 3 that this component may be confounded by treatment history and/or compensatory strategies of the stuttering subjects. Removal of the stuttering component included removal of three fMRI experiments and three modeling projects.

Simulating a BOLD signal from the model and comparing model activations to fMRI results. The reviewers generally felt that the description of how the model simulations will be compared to the results of fMRI experiments was not sufficiently detailed. To address this concern, we have added a section entitled “Generating simulated fMRI activations from model simulations” (Section C.2), in which we detail how we simulate fMRI data from the models. Our method is based on the most recent results concerning the relationship between neural activity and the blood oxygen level dependent (BOLD) signal measured with fMRI, as detailed in this section. The section also includes a treatment of how inhibitory neurons are modeled in terms of BOLD effects, thus addressing Reviewer 3’s concern about this issue. Additional text describing how we will compare the model fMRI activations to the results of our fMRI experiments has been added after the descriptions of each of the fMRI experiments.
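As an illustration only, the general approach of generating a simulated BOLD signal from modeled neural activity is commonly sketched as a linear convolution with a canonical double-gamma hemodynamic response function. The parameters and the linearity assumption below are standard simplifications, not the specific method detailed in Section C.2.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(dt=0.1, duration=30.0):
    """Double-gamma hemodynamic response function (standard SPM-style shape)."""
    t = np.arange(0.0, duration, dt)
    response = gamma.pdf(t, 6.0)       # positive lobe peaking near 5 s
    undershoot = gamma.pdf(t, 16.0)    # delayed negative undershoot
    hrf = response - undershoot / 6.0
    return hrf / hrf.max()

def simulate_bold(neural_activity, dt=0.1):
    """Linearly convolve a modeled cell-activity time series with the HRF."""
    hrf = canonical_hrf(dt)
    return np.convolve(neural_activity, hrf)[: len(neural_activity)]

# A 1 s burst of model cell activity starting at t = 5 s ...
activity = np.zeros(600)   # 60 s at dt = 0.1 s
activity[50:60] = 1.0
bold = simulate_bold(activity)
# ... yields a simulated BOLD response that peaks several seconds later.
```

In practice the simulated response would be sampled at the scanner repetition time and compared, region by region, to the measured activations.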

fMRI power analysis. Reviewer 2 expressed concern regarding how many subjects would be needed to obtain significant results in our fMRI experiments. To address this concern, we have added a subsection entitled “fMRI power analysis” at the beginning of Section D that justifies the subject population sizes proposed in the fMRI experiments. In this analysis, we have considered the possibility that many trials may contain production errors such as insertions of extra phonemes (as pointed out by Reviewers 2 and 3). When determining the number of subjects needed to obtain significant results, we have very conservatively assumed that as many as 30% of the trials in the experiments in Section D.1 and 50% of the trials in the experiments in Section D.2 may need to be removed from the analysis due to such errors. These assumed error rates are much higher than those obtained in our previous fMRI experiment on sound sequencing described in Section C.4 (which had an average error rate of 14.4% for the most difficult utterances) and our pilot studies for the learning experiments in D.2 (which had an average error rate of 21%).
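A rough sketch of how such trial-loss assumptions can propagate into subject counts (the actual power analysis appears in Section D): one attenuates the per-subject effect size by the square root of the usable-trial fraction and applies the usual normal-approximation sample-size formula. The effect size of 0.8 used below is a hypothetical placeholder, not a value from this proposal.

```python
import math

def required_subjects(effect_size, usable_fraction, z_alpha=1.96, z_beta=0.84):
    """Normal-approximation sample size for a one-sample comparison.

    Discarding error trials leaves a fraction `usable_fraction` of trials,
    inflating within-subject noise; here the per-subject effect size is
    attenuated by sqrt(usable_fraction) (a simplifying assumption).
    Defaults correspond to two-sided alpha = .05 and power = .80.
    """
    attenuated = effect_size * math.sqrt(usable_fraction)
    return math.ceil(((z_alpha + z_beta) / attenuated) ** 2)

# Hypothetical effect size d = 0.8, with 30% vs. 50% of trials discarded:
n_d1 = required_subjects(0.8, usable_fraction=0.7)  # cf. experiments in D.1
n_d2 = required_subjects(0.8, usable_fraction=0.5)  # cf. experiments in D.2
```

As expected, the assumed 50% trial loss demands more subjects than the 30% case, which is why conservative loss assumptions were built into the proposed sample sizes.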

Effective connectivity analyses. Reviewer 1 noted that effective connectivity analyses should receive more emphasis in the proposal, since they may provide a valuable means for testing model predictions. We have accordingly added a subsection called “Effective connectivity analysis” in Section D of the proposal. We have also added more explicit descriptions of how effective connectivity analysis will be used to test predictions in the fMRI experiments.

Hypothesis tests. The reviewers also felt that descriptions of how specific hypotheses would be tested were lacking. We have thus added text explicitly stating the hypotheses to be tested and the manner in which they will be tested. To address reviewer concerns that it was not clear what we would conclude if our hypotheses are not supported, we have added descriptions of alternative interpretations. It is important to note that, although most of the hypotheses to be tested are embodied by our proposed neural model, they are not simply our view, but instead reflect current hypotheses proposed by many other researchers studying non-speech motor control, as detailed in the theoretical background portions of Section D. Indeed, these hypotheses were the primary forces that shaped the proposed model. Thus our experiments not only test our particular model, but they also test whether these non-speech motor control theories, which generally arise from the animal literature, generalize to the neural processes underlying speech production in humans. If the hypotheses are supported by our experimental results, then we have gained important information regarding similarities between the brain mechanisms underlying speech in humans and non-speech motor behaviors in animals. If the hypotheses are not supported, we have gained equally important information regarding how the neural bases of speech motor control differ from other forms of motor control.

PHS 398 (Rev. 09/04) Page 14

Potential problems with the fMRI experiments of novel sound sequence learning. Two reviewers expressed concern regarding the design of the fMRI experiments involving novel speech sound sequence learning in Section D.2, in particular the experiment involving generation of novel sub-syllabic sequences that are phonotactically illegal or highly unlikely in English. Specifically, the reviewers pointed out that subjects might insert additional phonemes or change some consonant clusters into more familiar forms, which would make these utterances easier to produce. Reviewer 1 also noted that the original experiment, which involved learning within a single fMRI scanning session, would necessarily be limited to very early stages of learning. To address these concerns, we have modified the design of the fMRI experiments in Section D.2. The experiments now involve learning in two training sessions that take place on different days, performed before the fMRI experiment. Furthermore, only subjects who show significant learning over the training sessions (as measured by improvements in error rate, duration, and reaction time) will be used in the corresponding fMRI experiment. Finally, all trials in the fMRI experiment will be checked for production errors, and any trials containing errors will be removed from subsequent data analysis.

Timing issues. Reviewer 3 stated that it was not clear how much priority is being given to modeling timing. We have provided text clarifying this in several places. The models to be developed will make systematic predictions regarding latencies and other aspects of the timing of speech sequences. In particular, we have emphasized that the modeling framework adopted (competitive queuing, CQ) is the only one of the four classes of sequencing models compared by Farrell & Lewandowsky (2004) that has been shown to explain not just latency patterns for correct performances but also the latencies of errors.

Computational framework. Reviewer 3 also expressed concern about missing details regarding the software, hardware, etc. used to implement the proposed computational model. A subsection called “Computational framework” has been added to the beginning of Section D to address this concern.

Miscellaneous concerns. Two reviewers were concerned with potential confounds in the jaw clench condition in fMRI Experiment 1 in Section D.1. This condition has been removed in the revised experiment.



A. SPECIFIC AIMS

The primary aim of this project is to develop and experimentally test a neural model of the brain interactions underlying the production of speech sound sequences. In particular, we will focus on several brain regions thought to be involved in motor sequence production, including the lateral prefrontal cortex, lateral premotor cortex, supplementary motor area (SMA), pre-SMA, and associated subcortical structures (basal ganglia, cerebellum, and thalamus). Each of these brain regions will be modeled mathematically with equations governing neuron activities, and the interactions between the regions will be modeled with equations governing synaptic strengths. The resulting model will be implemented in computer software and integrated with an existing neural model of speech sound production, the DIVA1 model (Guenther, 1994, 1995; Guenther et al., 1998, in press), to allow generation of simulated articulator movements (along with corresponding acoustic signals) for producing speech sound sequences. The results of these computer simulations will be compared to existing behavioral and functional neuroimaging data to guide model development. We also propose four new functional magnetic resonance imaging (fMRI) experiments designed to test key hypotheses of the model, to test between the model and competing hypotheses, and to fill in gaps in the existing neuroimaging literature.

The project is divided into two separate but highly integrated subprojects whose aims are as follows:

(1) Creating and testing a neural model of speech sequence production. The primary aim of this subproject is to develop a model of the neural circuits involved in the properly ordered and properly timed production of speech sound sequences, such as a sequence of phonemes, syllables, and words making up a sentence. The model will be implemented in computer software and integrated with the DIVA model. The DIVA model describes the brain mechanisms responsible for producing individual speech sounds, whereas the model developed in this proposal describes the “higher-level” brain mechanisms involved in representing a sequence of sounds and determining which sound in the sequence to produce next (sequencing), as well as when to produce it (initiation). Thus the output of the current model essentially acts as input to the DIVA model, which then commands the sequence of articulator movements needed to produce each sound. We also propose two speech production fMRI experiments which test key hypotheses of our preliminary model: (i) the processing of syllable “frames” by the SMA/pre-SMA and “content” by lateral premotor areas, and (ii) the existence of a working memory representation for speech sound sequences in the inferior frontal sulcus. Simulations of the model performing the same speech tasks as subjects in the fMRI experiments will be run, and the results of these simulations will be compared to results from the fMRI studies to test key aspects of the model.

(2) Investigating the learning of new speech sequences. The primary aim of this subproject is to further develop the model created in Subproject 1 to incorporate the effects of practice on the neural circuits underlying speech sound sequence generation. In two modeling projects, we will model learning effects as changes in synaptic strengths in two subcortical structures: the basal ganglia and the cerebellum. This work will be guided by the existing literature on learning of motor sequences, and we propose two behavioral experiments and two corresponding fMRI experiments to test hypotheses concerning learning to quickly and accurately produce novel supra-syllabic sequences (multi-syllable utterances involving novel combinations of known syllables) and sub-syllabic sequences (new syllables consisting of infrequent combinations of phonemes).

We believe our integrated approach of computational neural modeling and functional brain imaging will provide a clearer, more mechanistic account of the neural processes underlying speech production in normal speakers and individuals with disorders affecting speech sound initiation and sequencing. In the long term, we believe this improved understanding will aid in developing better treatments for these disorders.

B. BACKGROUND AND SIGNIFICANCE

Combining neural models and functional brain imaging to understand the neural bases of speech. Recent years have witnessed a large number of functional brain imaging experiments studying speech and language, and much has been learned from these studies regarding the brain mechanisms underlying speech and its disorders. For example, functional magnetic resonance imaging (fMRI) studies have identified the cortical and subcortical areas involved in simple speech tasks (e.g., Hickok et al., 2000; Riecker et al., 2000a,b; Wise et al., 1999; Wildgruber et al., 2001) as well as more complex language tasks (e.g., Dapretto & Bookheimer, 1999; Kerns et al., 2004; Vingerhoets et al., 2003). However, these imaging experiments do not, by themselves, answer the question of what important function, if any, a particular brain region may play in speech. For example, activity in the anterior insula has been identified in numerous speech neuroimaging studies (Wise et al., 1999; Hickok et al., 2000; Riecker et al., 2000a), but much controversy still exists concerning its particular role in the neural control of speech (Dronkers, 1996; Ackermann & Riecker, 2004; Hillis et al., 2004; Shuster & Lemieux, 2005). A better understanding of the exact roles of different brain regions in speech requires the formulation of computational neural models whose components model the computations performed by individual brain regions, as well as the interactions between these regions (see Horwitz & Braun, 2004; Husain et al., 2004; Tagamets & Horwitz, 1997; Fagg & Arbib, 1998 for other examples of this approach). In the past decade we have developed one such model of speech production called the DIVA model (Guenther, 1994, 1995; Guenther et al., 1998, in press). The model’s components correspond to regions of the cerebral cortex and cerebellum, and they consist of modeled neurons whose activities during speech tasks can be measured and compared to the brain activities of human subjects performing the same task (e.g., Guenther et al., in press, included in Appendix materials for this application). These neurons are connected by adaptive synapses that become tuned during a babbling process as well as with continued practice with a speech sound. Computer simulations of the model have been shown to provide a unified account for a wide range of observations concerning speech acquisition and production, including data concerning the kinematics and acoustics of speech movements (Callan et al., 2000; Guenther, 1994, 1995; Guenther et al., 1998, in press; Guenther and Ghosh, 2003; Perkell et al., 2000) as well as the brain activities underlying speech production (Guenther et al., in press). The DIVA model computes articulator movement trajectories for producing individual speech sounds that are presented to it by the modeler. Importantly, however, the DIVA model does not model the brain regions and computations that are responsible for initiating and sequencing of speech sounds2. Understanding the computations performed by these brain regions could provide important insight into a number of communication disorders in which these regions appear to malfunction.

1 DIVA stands for Directions Into Velocities of Articulators, which describes a central aspect of the model’s control scheme.

In this application we propose to develop a new model that identifies the neural computations underlying the initiation and sequencing of speech sounds during production. For clarity, we will refer to this model as the sequence model in this grant application. This model addresses several brain regions not treated in the DIVA model, including the supplementary and pre-supplementary motor areas, ventrolateral prefrontal cortex, and basal ganglia. These brain regions are believed to be involved in the initiation and sequencing of motor actions and appear to function abnormally in several communication disorders (see Communication disorders involving sequencing and/or initiation of speech sounds below). Furthermore, we propose fMRI studies of speech initiation and sequencing designed to test and refine this neural model. The “neural” nature of the model’s components makes possible direct comparisons between the model’s cell activities and the results of neuroimaging experiments. The resulting model will provide a computational account of normal speech mechanisms, and it will serve as a theoretical framework for investigating communication disorders involving malfunctions in speech initiation and sequencing, thus aiding in the development of better treatments/prostheses for these disorders.

Sequencing in motor control and language. A sine qua non of speech production is our ability to learn and perform many sequences defined over a relatively small set of elements. Behaviorist theories postulated that sequences are produced by sequential chaining, in which associative links allowed early responses in a sequence to elicit later ones. Recurrent neural network models (e.g., Elman, 1995; Dominey, 1998; Beiser & Houk, 1998) proposed revisions to the associative chaining theory, hypothesizing that an entire series of sequence-specific cognitive states must be learned to mediate any sequence recall.
Although this type of recurrent net allows more than one sequence to be learned over the same alphabet of elements, there is no basis for performance of novel sequences, learning is often unrealistically slow with poor temporal generalization (Henson et al., 1996; Page, 2000; Wang, 1996), and internal recall of a sequence remains an iterative sequential operation. In contrast, competitive queuing (CQ) models allow performance of novel sequences (including reuse of the same alphabet of elements), rapid learning, and internal recall of a sequence representation as a parallel operation. Since Lashley (1951), behavioral evidence has accumulated (cf. Rhodes et al., 2004) to support the idea that parallel representation of elements constituting a sequence underlies much of our learned serial behavior. From speech and typing errors, Lashley inferred that there must be an active “co-temporal” representation of the items constituting a forthcoming sequence. He also inferred that item-item associative links may be unnecessary in, and even a hindrance to, the learning of many sequences defined over a small finite “alphabet”. Left unanswered were questions of mechanism: What is the nature of the parallel representation? How is the relative priority of simultaneously active item representations “tagged”? What limitations are inherent in this representation? What mechanisms convert the parallel representation to serial action? All four questions have been addressed, without any reliance on item-item associations, in various CQ models (Grossberg, 1978a,b; Houghton, 1990; Bullock & Rhodes, 2003). These neural network models postulate that a standing parallel representation of all the items constituting a planned sequence exists in a motor working memory prior to initiating performance of the first item. As explained in Section C.2, this parallel representation works in tandem with an iterated choice process to generate a sequential performance.
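The two-stage CQ mechanism just described can be caricatured in a few lines: a planned sequence is held as a parallel activation gradient, and an iterated choose-and-suppress cycle converts it to serial order. This is only a schematic of the idea, not the proposed neural implementation; the syllable labels and activation values are arbitrary.

```python
def cq_recall(gradient):
    """Serial recall from a parallel plan (competitive queuing caricature).

    `gradient` maps items to activation levels; serial order is carried
    entirely by relative activation (a primacy gradient), with no
    item-item chaining. A choice field repeatedly selects the most
    active item, and the chosen item is then suppressed, letting the
    next-most-active item win on the following cycle.
    """
    plan = dict(gradient)                 # parallel working-memory plan
    sequence = []
    while plan:
        winner = max(plan, key=plan.get)  # competitive (winner-take-all) choice
        sequence.append(winner)
        del plan[winner]                  # suppression of the performed item
    return sequence

# Novel sequences over the same alphabet require only a new gradient:
order1 = cq_recall({"ba": 0.9, "da": 0.6, "ga": 0.3})
order2 = cq_recall({"ba": 0.3, "da": 0.9, "ga": 0.6})
```

Because the plan is parallel, reordering the same items needs no new associative links, which is the property that distinguishes CQ from chaining accounts.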

To date, CQ-compatible neural models have been used to account for data in many domains of learned serial behavior, including: eye movements (Grossberg & Kuperstein, 1986); recall of novel lists (Boardman & Bullock, 1991; Page & Norris, 1998) and highly practiced lists (Rhodes & Bullock, 2002); cursive handwriting (Bullock et al., 1993); working memory storage of sequential inputs (Bradski et al., 1994); word recognition and recall (Grossberg, 1986; Hartley & Houghton, 1996; Gupta & MacWhinney, 1997); language production (Dell et al., 1997; Ward, 1994); and music learning and performance (Mannes, 1994; Page, 1999). The stature of CQ as a neurobiological model has grown steadily due to an accumulation of directly pertinent neurophysiological observations (e.g., Averbeck et al., 2002, 2003; discussed in C.2). Section C.2 will summarize CQ theory and the evidence that led us to choose CQ circuitry as a core for a new model of sequential speech motor control.

2 If presented with a sequence of sounds, the DIVA model can produce them in the order presented. However, this “sequencing” is external to the model; a computer program simply tells the model to produce the first sound, then the second sound, etc.

Communication disorders involving sequencing and/or initiation of speech sounds. A number of communication disorders, including aphasias, apraxia of speech (AOS), and stuttering, involve deficits in the proper initiation and/or sequencing of speech sounds. Sequencing errors known as literal or phonemic paraphasias, in which “well-formed sounds or syllables are substituted or transposed in an otherwise recognizable target word” (Goodglass, 1993), occur in most aphasic patients, most commonly in conduction aphasics. Most of the common symptoms reported in AOS patients3 are selection and sequencing problems, including articulation errors, phonemic mistakes, prosodic disturbances, difficulties initiating speech, and slowed speech (McNeil and Doyle, 2004; Dronkers, 1996). Several brain areas have been implicated in AOS, including the left premotor cortex, Broca’s area, and the left anterior insula (Miller, 2002).
Models of speech production have largely been unable to inform the study of AOS because “theories of AOS encounter a dilemma in that they begin where the most powerful models of movement control end and end where most cognitive neurolinguistic models begin” (Ziegler, 2002). The model proposed herein attempts to fill this gap between neurolinguistic models and movement control models.

Though different in many ways, stuttering, which affects approximately 1% of the adult population in the United States, shares with AOS the trait of improper initiation of speech motor programs without impairment of comprehension or damage to the peripheral speech neuromuscular system (Kent, 2000; Dronkers, 1996). Phenomenological, physiological, and psychological studies of developmental stuttering over the past several decades have produced a large body of data and spawned many theories regarding both its etiology (e.g., Geschwind & Galaburda, 1985; Starkweather, 1987; Travis, 1931; West, 1958) and its expression (e.g., Johnson & Knott, 1936; Mysak, 1960; Perkins et al., 1991; Postma & Kolk, 1993; Zimmerman, 1980). The advent of structural and functional imaging technologies has provided a means to investigate the neural underpinnings of developmental stuttering and greatly increased the available data while offering a new perspective on the disorder. A proliferation of imaging studies prompted Peter Fox, whose group has conducted several imaging studies of stuttering (e.g., Fox et al., 1996, 2000), to describe the need for a model of the neural systems of speech, their breakdown in stuttering, and their normalization with treatment, in order to advance the study of developmental stuttering (Fox, 2003). The modeling work described here will provide a substrate for exploring the initiation and repetition behaviors in persons who stutter. Perhaps more importantly, it will provide a cohesive framework for the examination of available data and the assessment of theories of stuttering and other disorders (see also van der Merwe, 1997).

In addition to considering symptom-based diagnoses, it is important to consider the effects of lesions and pathological conditions involving particular brain regions on speech processes. Case studies in patients with lesions of the supplementary motor area (e.g., Jonas, 1981, 1987; Ziegler et al., 1997; Pai, 1999) or basal ganglia pathologies (e.g., Ho et al., 1998; Pickett et al., 1998) have shown that these areas provide specific contributions to the sequencing and initiation of speech sounds. While pathological speech data are abundant, parsimonious explanations for differential syndromes remain elusive. Many authors have noted the importance of establishing well-specified models of normal and disordered speech to help provide differential diagnoses and treatment options for these conditions. In just this context, while describing the DIVA model of speech production, McNeil et al. (2004) write “While this model addresses phenomena that may be relevant in the differential diagnosis of motor speech disorders…in its current stage of development it has not been extended to make claims about the relationship between disrupted processing and speech errors in motor speech disorders” (p. 406). The work proposed here seeks to extend the DIVA model in this direction.

C. PRELIMINARY STUDIES

C.1. The DIVA model. Over the past decade our laboratory has developed and experimentally tested the DIVA model, a neural network model of the brain processes underlying speech acquisition and production (e.g., Guenther, 1994, 1995; Guenther et al., 1998; Guenther et al., in press). The model is able to produce speech sounds (including both articulator movements and a corresponding acoustic signal) by learning mappings between articulator movements and their acoustic consequences, as well as auditory and somatosensory targets for individual speech sounds. It accounts for a number of speech production phenomena including aspects of speech acquisition, coarticulation, contextual variability, motor equivalence, velocity/distance relationships, and speaking rate effects (see Guenther, 1995 and Guenther et al., 1998 in Appendix documents). The latest version of the DIVA model is detailed in Guenther et al. (in press), in the Appendix documents. Here we briefly describe the model with attention to aspects relevant for this proposal.

3 Darley et al. (1975) describe AOS as a unique syndrome that affects motor speech production without diminished muscle strength. AOS has been associated with phoneme substitution errors similar to literal paraphasias (e.g., Wertz et al., 1984).

A schematic of the DIVA model is shown in Fig. 1. Each box in the diagram corresponds to a set of neurons in the model, and arrows correspond to synaptic projections that form mappings from one type of neural representation to another. Several mappings in the network are tuned during a babbling phase in which semi-random articulator movements lead to auditory and somatosensory feedback; the model’s synaptic projections are adjusted to encode sensory-motor relationships based on this combination of articulatory, auditory, and somatosensory information. The model posits additional forms of learning wherein (i) auditory targets for speech sounds are learned through exposure to the native language (Auditory Goal Region in Fig. 1), (ii) feedforward commands between premotor and motor cortical areas are learned during “practice” in which the model attempts to produce a learned sound (Feedforward Command), and (iii) somatosensory targets for speech sounds are learned through practice (Somatosensory Goal Region).

In the model, production of a phoneme or syllable starts with activation of a Speech Sound Map cell corresponding to the sound to be produced. Speech sound map cells are hypothesized to lie in left lateral premotor cortex, specifically posterior Broca’s area (left Brodmann’s Area 44; abbreviated as BA 44 herein). After the cell has been activated, signals project from the cell to the auditory and somatosensory cortical areas through tuned synapses that encode sensory expectations for the sound, where they are compared to incoming sensory information. Any discrepancy between expected and actual sensory information constitutes a production error (Auditory Error and/or Somatosensory Error) that leads to corrective movements via projections from the sensory areas to the motor cortex (Auditory Feedback-based Command and Somatosensory Feedback-based Command). Additional synaptic projections from speech sound map cells to the motor cortex form a feedforward motor command; this command is tuned by monitoring the effects of the feedback control system and incorporating corrective commands into the feedforward command.

Feedforward and feedback control signals are combined in the model’s motor cortex. Early in development the feedforward command is inaccurate, and the model depends on feedback control. Over time, however, the feedforward command becomes well tuned through monitoring of the movements controlled by the feedback subsystem. Once the feedforward subsystem is accurately tuned, the system can rely almost entirely on feedforward commands, because no sensory errors are generated unless external perturbations are applied.
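The interplay between the two subsystems can be conveyed with a small numerical sketch. This is an illustration of the scheme described above, not the DIVA model’s actual equations: the gains, the simplified identity “plant” (produced output equals the previous feedforward command), and all names are assumptions made for the example.

```python
def speech_controller_step(target, actual, feedforward, alpha=0.5, learn=0.1):
    """One illustrative production step: combine the learned feedforward
    command with a feedback correction driven by the sensory error, and fold
    a fraction of that correction into the feedforward command."""
    error = target - actual                 # sensory error (auditory/somatosensory)
    feedback_cmd = alpha * error            # feedback-based corrective command
    motor_cmd = feedforward + feedback_cmd  # commands combine in motor cortex
    feedforward = feedforward + learn * feedback_cmd  # feedforward learning
    return motor_cmd, feedforward

# "Practice" loop: early on, production is driven by an inaccurate feedforward
# command and large feedback corrections; as corrections are incorporated,
# the feedforward command converges on the target and sensory errors vanish.
target, ff = 1.0, 0.0
for trial in range(200):
    produced = ff                 # output of the previous feedforward command
    cmd, ff = speech_controller_step(target, produced, ff)
```

After enough practice trials, the feedforward command alone lands on the target, matching the developmental progression described above.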

Of particular note for the current application are the BA 44 speech sound map cells that form the “input” to the DIVA model. In Section D we describe a new model of the sequencing and initiation of sound sequences. Activation of the speech sound map cells in the DIVA model constitutes the “output” of this new model, which will be integrated with the DIVA model to form a more complete description of the neural bases underlying speech. Also of note for the current application is the fact that the latest version of the DIVA model (Guenther et al., in press, in Appendix documents) incorporates realistic transmission delays between brain regions, including sensory feedback delays. This development has made it possible to more precisely simulate the timing of articulator movements during normal and perturbed speech. The results of these simulations closely approximate behavioral results (see Guenther et al., in press in Appendix). The modeling project proposed in Section D.1 extends this work to account for serial reaction time studies of sequence production. Finally, a unique feature of the DIVA model is that each of the model’s components is associated with a particular neuroanatomical location based on the results of fMRI and PET studies of speech production and articulation (see Guenther et al., in press in Appendix documents for details). Since the model’s components correspond to groups of neurons, it is possible to generate simulated fMRI activations corresponding to model cell activities during a simulation (described further in Section C.2). Fig. 2 compares results from an fMRI study of single consonant-vowel (CV) syllable production performed by our lab to simulated fMRI data from the DIVA model in the same speech task. Comparison of the top and bottom panels of Fig. 2 indicates that the model accounts for most of the fMRI activations. The proposed research will extend this work to account for additional fMRI activations that occur in more complex multi-syllabic speaking tasks.

Fig. 2. Top: fMRI activation (white) measured during CV syllable production. [10 subjects; random effects analysis; p<0.001 uncorrected.] Bottom: Simulated fMRI activation when the DIVA model produces CV syllables. [For additional views, in color, see Guenther et al. (in press) in Appendix.]

C.2 Generating simulated fMRI activations from model simulations. The relationship between the signal measured in blood oxygen level dependent (BOLD) fMRI and the electrical activity of neurons has been studied by numerous investigators in recent years. It is well known that the BOLD signal is sluggish compared to electrical neural activity: for a very brief burst of neural activity, the BOLD signal begins to rise and continues rising well after the neural activity stops, peaking about 4-6 seconds after the burst, dipping somewhat below the starting level around 10-12 seconds after the burst, and then slowly returning to baseline. This hemodynamic response function (HRF) is schematized in Fig. 3.
We use such a response function, which is part of the SPM software package that we use for fMRI data analysis, to transform neural activities in our model cells into simulated fMRI activity. However, there are different possible definitions of “neural activity,” and the exact nature of the neural activity that gives rise to the BOLD signal is still under debate (e.g., Caesar et al., 2003; Heeger et al., 2000; Logothetis et al., 2001; Logothetis and Pfeuffer, 2004; Rees et al., 2000; Tagamets and Horwitz, 2001).

In our modeling work, each model cell is hypothesized to correspond to a small population of neurons that fire together. The output of the cell corresponds to the neural firing rate (i.e., the number of action potentials per second of the population of neurons). This output is sent to other cells in the network, where it is multiplied by synaptic weights to form synaptic inputs to those cells. The activity level of a cell is calculated as the sum of all the synaptic inputs to the cell (both excitatory and inhibitory); if the net activity is above zero, the cell’s output is proportional to the activity level, and if the net activity is below zero, the cell’s output is zero. It has been shown that the magnitude of the BOLD signal typically scales proportionally with the average firing rate of the neurons in the region where the BOLD signal is measured (e.g., Heeger et al., 2000; Rees et al., 2000). It has been noted elsewhere, however, that the BOLD signal correlates more closely with local field potentials, which are thought to arise primarily from averaged postsynaptic potentials (corresponding to the inputs of neurons), than it does with the average firing rate of an area (Logothetis et al., 2001). In particular, whereas the average firing rate may habituate down to zero with prolonged stimulation (greater than 2 sec), the local field potential and BOLD signal do not habituate completely, maintaining non-zero steady-state values with prolonged stimulation. In accord with this finding, the fMRI activations that we generate from our models are determined by convolving the total inputs to our modeled neurons (i.e., the activity level as defined above), rather than their outputs4 (firing rates), with an idealized hemodynamic response function generated using the default settings of the function ‘spm_hrf’ from the SPM toolbox (see Guenther et al., in press in Appendix for details).

4 It is noteworthy, however, that total synaptic input correlates highly with firing rate, both physiologically and in our models. Thus the two measures for estimating neural activity (firing rate vs. total input) are likely to produce similar results.

Fig. 3. Typical hemodynamic response function (HRF).

Fig. 4. Left: Locations of the cell types in the DIVA model. Right: Generating BOLD signals for a jaw-perturbed speech – unperturbed speech contrast. Left panels show cell activities (gray) and resulting BOLD signals (black) for two cell types in the model. The perturbation results in increased somatosensory error cell activation, but no increased auditory error cell activation. The corresponding BOLD signals are spatially smoothed and plotted on the standard single subject brain from the SPM software package (top right panel). The results of an fMRI experiment comparing perturbed to unperturbed speech are shown in the bottom right panel.

In our models, an active inhibitory neuron has two effects on the BOLD signal: (i) the total input to the inhibitory neuron will have a positive effect on the local BOLD signal, and (ii) the output of the inhibitory neuron will act as an inhibitory input to excitatory neurons, thereby decreasing their summed input and, in turn, reducing the corresponding BOLD signal. Relatedly, it has been shown that inhibition to a neuron can cause a decrease in the firing rate of that neuron while at the same time causing an increase in cerebral blood flow, which is closely related to the BOLD signal (Caesar et al., 2003). Caesar et al. (2003) conclude that this cerebral blood flow increase probably occurs as the result of excitation of inhibitory neurons, consistent with our model. They also note that the cerebral blood flow increase caused by combined excitatory and inhibitory inputs is somewhat less than the sum of the increases to each input type alone; this too is consistent with our model, since the increase in BOLD signal caused by the active inhibitory neurons is somewhat counteracted by the inhibitory effect of these neurons on the total input to excitatory neurons.
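To make the mapping from model cell activity to simulated BOLD concrete, the sketch below convolves a cell’s half-wave-rectified total synaptic input with a canonical double-gamma HRF. This is an illustrative reconstruction, not our production code: the gamma parameters (roughly a 5 s peak and a late, 1:6-weighted undershoot) mirror commonly cited SPM defaults and are assumed here, and all function names are our own.

```python
import math

def gamma_pdf(t, shape, scale):
    """Gamma probability density, the building block of the double-gamma HRF."""
    if t <= 0.0:
        return 0.0
    return (t ** (shape - 1.0) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def canonical_hrf(dt=0.1, duration=30.0):
    """Double-gamma HRF: a positive response lobe minus a weighted late
    undershoot. Shape/scale/ratio values mirror commonly cited SPM defaults."""
    ts = [i * dt for i in range(int(duration / dt))]
    return [gamma_pdf(t, 6.0, 1.0) - gamma_pdf(t, 16.0, 1.0) / 6.0 for t in ts]

def simulated_bold(total_input, dt=0.1):
    """Half-wave rectify a cell's total synaptic input (the 'activity level'
    defined above) and convolve it with the HRF to get a simulated BOLD trace."""
    hrf = canonical_hrf(dt)
    rectified = [max(x, 0.0) for x in total_input]
    bold = [0.0] * len(rectified)
    for i in range(len(rectified)):
        for j in range(min(i + 1, len(hrf))):
            bold[i] += rectified[i - j] * hrf[j] * dt
    return bold

# A brief (0.5 s) input burst over a 20 s window: the simulated BOLD response
# peaks several seconds after the burst, then undershoots before recovering.
dt = 0.1
burst = [1.0 if i * dt < 0.5 else 0.0 for i in range(int(20.0 / dt))]
bold = simulated_bold(burst, dt)
peak_time = bold.index(max(bold)) * dt
```

The resulting trace reproduces the qualitative HRF shape described above: a delayed peak followed by a post-stimulus undershoot.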

Figure 4 illustrates the process of generating fMRI activations from a model simulation and comparing the resulting activation to the results of an fMRI experiment designed to test a model prediction. The left panel of the figure illustrates the locations of the DIVA model components on the “standard” single subject brain from the SPM2 software package. The DIVA model predicts that unexpected perturbation of the jaw during speech will cause a mismatch between somatosensory targets and actual somatosensory inputs, causing activation of somatosensory error cells in higher-order somatosensory cortical areas in the supramarginal gyrus of the inferior parietal cortex. The location of these cells is denoted by S in the left panel of the figure. Simulations of the DIVA model producing speech sounds with and without jaw perturbation were performed. The top middle panel shows the neural activity (gray) of the somatosensory error cells in the perturbed condition minus that in the unperturbed condition, along with the resulting BOLD signal (black). Since the somatosensory error cells are more active in the perturbed condition, a relatively large positive response is seen in the BOLD signal. Auditory error cells, on the other hand, show little differential activation in the two conditions since very little auditory error is created by the jaw perturbation (bottom middle panel), and thus the BOLD signal for the auditory error cells in the perturbed – unperturbed contrast is near zero. The derived BOLD signals are Gaussian smoothed spatially and plotted on the standard SPM brain in the top right panel. The bottom right panel shows the results of an fMRI study we carried out to compare perturbed and unperturbed speech (13 subjects, random effects analysis, false discovery rate = 0.05). In this case, the model correctly predicts the existence and location of somatosensory error cell activation, but additional activation not explained by the model is found in the left frontal operculum region.

C.3 Competitive queuing (CQ) models of motor sequencing. As described above, competitive queuing (CQ) models have been applied to many domains of learned serial behavior (see Background and Significance) and account for a number of behavioral and neurophysiological observations. An Investigator on the proposed project, Dr. Daniel Bullock, and his colleagues have published a series of articles describing CQ models and providing computer simulations verifying the ability of this class of models to account for sequencing, timing, and kinematic phenomena in non-speech motor tasks (Boardman & Bullock, 1991; Bullock et al., 1993; Bullock et al., 1999; Rhodes & Bullock, 2002; Bullock & Rhodes, 2003; Brown et al., 2004; Bullock, 2004a,b; Rhodes et al., 2004). Here we describe the basic CQ mechanism as formulated in such studies; further details are available in Bullock (2004a) and Rhodes et al. (2004) in the Appendix materials.

A schematic of a CQ network is shown in Fig. 5. A fundamental principle of CQ networks is the parallel representation of the items in a motor sequence (e.g., a sequence of phonemes making up an utterance) in working memory prior to initiation of movement. These items are represented by nodes in the model’s planning layer. The relative activations of these nodes determine the ordering of the segments within the sequence. Items in the planning layer compete via mutually inhibitory connections at the competitive choice layer. The “winning” item, typically the item with the highest activity in the planning layer, is selected by the “winner-take-all” (WTA) dynamics of the neural network making up the choice layer, such that only one node (sequence item) is active at the choice layer, and the motor program corresponding to that item is executed by downstream mechanisms outside the CQ model. At this point the chosen item’s representation is extinguished at the planning layer, a new competition is run, and the item with the next highest activation is chosen. This cycle continues until all sequence items are performed.
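The choose-perform-delete cycle just described can be sketched in a few lines. This is a toy illustration of the CQ principle, not the published neural network equations; the function name and activation values are invented for the example.

```python
def competitive_queue(plan):
    """Toy CQ cycle: repeatedly let the choice layer pick the most active
    planning-layer item (winner-take-all), 'perform' it, and delete its
    planning-layer representation until the plan is exhausted."""
    plan = dict(plan)          # copy so the caller's plan is untouched
    performed = []
    while plan:
        winner = max(plan, key=plan.get)   # WTA competition in the choice layer
        performed.append(winner)           # downstream execution of the item
        del plan[winner]                   # inhibitory deletion of the winner
    return performed

# Relative activation levels implicitly code serial order:
competitive_queue({"ba": 0.9, "da": 0.6, "ti": 0.3})  # → ["ba", "da", "ti"]
```

Because order is carried entirely by relative activation, noise that brings two items’ activations close together can swap their output order, which is the mechanism behind the transposition errors discussed below.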

Biologically plausible recurrent or feedforward competitive neural nets (e.g., Grossberg, 1978b; Durstewitz & Seamans, 2002) provide the types of interactions required in a CQ model. Because of its need to store an activity pattern, the planning layer is modeled as a normalized recurrent net in which the activity of each item is lessened as more items are added, and when an item is extinguished its share of activity automatically redistributes to the remaining items. Such networks afford parametric modulation of the competition, e.g., the rate at which the choice layer selects the most highly active node from the planning layer. Psychophysical studies of serial planning and performance support these properties of the CQ model. CQ models explain error patterns in verbal immediate serial recall (ISR) studies, including primacy and recency effects (Henson, 1996) and transposition or “fill-in” errors (e.g., Page & Norris, 1998) that occur when noise causes two items with similar activation levels to be selected in the wrong order. They also account for word length effects found in ISR (Cowan, 1994) and sequence length effects on latency and interresponse intervals in typing tasks (e.g., Sternberg et al., 1978; Rosenbaum et al., 1984; Boardman & Bullock, 1991; Rhodes et al., 2004). CQ models are unique in their ability to explain both timing and error data. Farrell & Lewandowsky (2004) recently compared four major classes of sequencing models and showed that three of the four classes made incorrect predictions regarding the latencies of transposition errors. Only the CQ model predicted both the correct distribution of transposition errors and the latencies of those errors in production.
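The normalization property of the planning layer can likewise be sketched. For simplicity this toy version enforces full normalization (a fixed total activity), whereas the recordings discussed in this section show only partial normalization; the names and numbers are illustrative, not the published network equations.

```python
def normalize(plan, total=1.0):
    """Scale planning-layer activations so they share a fixed total activity."""
    s = sum(plan.values())
    return {item: total * a / s for item, a in plan.items()}

# With three planned items, each item's share of activity is smaller...
three = normalize({"A": 3.0, "B": 2.0, "C": 1.0})   # A: 0.5, B: ~0.33, C: ~0.17
# ...and when the winner ("A") is extinguished, its share automatically
# redistributes to the remaining items upon renormalization.
rest = normalize({item: a for item, a in three.items() if item != "A"})
```

Because the total activity is bounded, each added item dilutes the activation differences that code serial order, which is one way to see why the capacity to encode novel serial orders is limited.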

Complementary to such CQ-consistent behavioral patterns are recent neural recordings that provide direct evidence of CQ-like processing in the frontal cortex of monkeys. These studies have strikingly supported four key predictions of CQ models, as indicated by the neurophysiological data (top) and model simulations (bottom) in Fig. 6. First, the primate electrophysiological studies of Averbeck et al. (2002; 2003) demonstrated that prior to initiating a serial act (using a cursor to draw a geometric form with a prescribed stroke sequence), there exists in prefrontal area 46 an active parallel (simultaneous) representation of each of the strokes planned as components of the forthcoming sequence. Small pools of active neurons code each stroke. Second, the relative strength of activation of a stroke representation (neural pool) predicts its order of production, with a higher activation level indicating earlier production. Third, as the sequence is being produced, the initially simultaneous representations are serially deactivated in the order that the corresponding strokes are produced. Fourth, several studies of neural planning sites (Averbeck et al., 2002; Basso & Wurtz, 1998; Cisek & Kalaska, 2002; Pellizer & Hedges, 2003) also show partial activity normalization: the amount of activation that is spread among the plans grows more slowly than the number of plans in the sequence, and eventually stops growing. This property, which is critical to the CQ planning layer, explains why the capacity of working memory to encode novel serial orders is limited. A simulation of the planning layer dynamics of a CQ model (Boardman & Bullock, 1991) is shown at the bottom of Fig. 6 for comparison with recording data from Averbeck et al. (2002). The simulation traces correspond remarkably well with empirical observations made a decade later. Taken together, the physiological evidence of CQ-like processing in the brain and the behavioral results explained by the model provide a strong argument for choosing the CQ model as the basis of a sequencing mechanism for speech production.

In Section D.1 we propose the development of a model that effectively combines the CQ model with the DIVA model of speech production. CQ output will interface with the DIVA Speech Sound Map to enable serial performance of speech sound sequences.

C.4 fMRI study of brain activations during syllable sequence production. As part of an existing grant concerning the DIVA model (R01 DC02852), we have conducted an fMRI experiment to explore brain activity underlying the sequencing of speech sounds. Here we present results from this experiment. The experiment and associated modeling work serve as the starting point for much of the work proposed herein5.

5 The research projects described herein do not duplicate any of the research in R01 DC02852, which focuses primarily on sensory-motor interactions in the production of speech sounds as embodied by the DIVA model. The current proposal, in contrast, focuses on the frontal cortical mechanisms involved in higher-level aspects of the planning of speech sequences.

Fig. 5. Schematic of a competitive queuing (CQ) network. All CQ models have at least two layers, a parallel planning layer and a competitive choice layer. The planning layer contains nodes representing possible sequence elements (in this example, planning layer nodes represent drawing stroke directions). To plan a sequence, a desired subset of these nodes is activated in parallel (e.g., the subset of nodes representing the strokes required to draw a square) and the relative amount of activation (signaled by the relative heights of bars placed above the nodes) specifies the relative order of performance. At the onset of a performance, a gating signal (not shown) causes competition within the planning layer for output via the choice layer to begin. Typically the most active planning layer node wins the competition and thereby generates a corresponding output from the choice layer, which initiates the action associated with that item. A second effect of this output, mediated by an inhibitory pathway from each output node to its corresponding planning layer node, is deletion of activity at whatever planning layer node has just won. Iteration of this choose-perform-delete cycle assures that an element’s initial relative activation level in the planning layer implicitly codes its relative order in the forthcoming sequence. In this example, the CQ network dynamics step through the segments required to draw a square.

This experiment was designed to elucidate the roles of several brain regions in speech production, including the medial premotor cortex (the supplementary motor area proper (SMA), pre-supplementary motor area (pre-SMA), and cingulate motor areas), the peri- and intra-sylvian cortex (including the anterior insula, frontal operculum, and inferior frontal gyrus), the cerebellum, and the basal ganglia. Clinical studies have suggested that these areas may be important for sequencing in speech motor control (e.g. Dronkers, 1996; Jonas, 1981, 1987; Riva, 1998; Pickett et al., 1998). Only a small portion of the functional imaging work dedicated to speech and language has dealt with overt speech production; the largest body of relevant studies comes from Ackermann, Riecker, and colleagues (reviewed in Dogil et al., 2002). Regarding sequencing, Riecker et al. (2000b) examined brain activations evoked by overt production of speech stimuli of varying complexity: CVs, CCCVs, CVCVCV sequences, and lexical items (words). These speech test materials failed to elicit activation of SMA, cerebellum, or anterior insula. This finding contrasts with other studies (e.g. Fiez, 2001; Indefrey & Levelt, 2004) as well as our own findings suggesting involvement of these areas even in simple mono-syllable production (Guenther et al., in press). This issue provided further motivation for the following experiment.

In this study we examined differences in brain activations during preparation for and overt production of memory-guided non-lexical sequences of three syllables. Two parameters determined the stimulus content. The first, sub-syllabic complexity, varied the number of phoneme segments that constituted each individual syllable (i.e. CV vs. CCCV). The second, supra-syllabic complexity, varied the number of unique syllables comprising the three-syllable sequence (repetition of the same syllable vs. three different syllables). Each factor took one of two values (simple or complex), yielding four stimulus types. Each type was presented either for vocalization (GO condition) or for preparation only (NOGO condition).

Thirteen neurologically normal right-handed adult American English speakers participated. In a 3T Siemens Trio scanner, subjects were visually presented the syllable sequences. After 2.5 s the syllables were removed from the projection screen and replaced by a white fixation cross. In the GO case, following a short random duration (0.5 - 2.0 s), the white cross turned green, signaling the subject to begin vocalization of the most recent sequence. During this production period, the scanner remained silent. In the NOGO case, the fixation cross remained white throughout. Following the 2.5 s production period (or an equivalent interval in the NOGO case), the scanner was triggered (see fMRI experimental protocol in Section D for details) to acquire three full-brain functional images (TR=2.5 s, 30 slices, 5 mm thickness). Following the third volume, the fixation cross disappeared and the next stimulus was presented. The subjects’ vocal responses were recorded using an MRI-compatible microphone and checked offline for accuracy. Functional image volumes were realigned, coregistered to a high-resolution structural series acquired for each subject, normalized into stereotactic space, and smoothed using a Gaussian kernel with a full-width at half-maximum of 12 mm. Analysis was performed using a random effects model with SPM2 (http://www.fil.ion.ucl.ac.uk/spm/).

Fig. 7 shows statistically significant cortical activations during overt production (GO condition) of complex sequences of complex syllables compared to baseline (p < 0.05). Overt production activated a wide bilateral cortical and subcortical network: the precentral gyrus and somatosensory cortices, the SMA and pre-SMA, auditory cortical regions, the intra-sylvian cortex and nearby frontal regions, as well as subcortical regions including the thalamus, basal ganglia, and superior cerebellum (not pictured).

Fig. 6. A comparison of simulated CQ dynamics with cellular data from area 46 of prefrontal cortex. Top: Each black or gray data trace (solid, dashed, dotted) represents the relative activation level in monkey area 46 of a small neural ensemble that represents one element of a 3-, 4-, or 5-element sequence used to draw a geometric form. [Adapted from Averbeck et al., 2002.] Bottom: A simulation of cellular dynamics in the plan layer of a normalized CQ model (Boardman & Bullock, 1991) during production of a 5-item sequence. Each simulation trace depicts the activation history of one of the sequence element representations during the interval from just before initiation of sequence performance to just after production of the last element.
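A brief aside on the preprocessing above: smoothing kernels are conventionally specified by their full-width at half-maximum (FWHM), which relates to the Gaussian standard deviation by FWHM = 2√(2 ln 2)·σ. The helper below is our own, written for illustration only.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's full-width at half-maximum (FWHM) to its
    standard deviation, using FWHM = 2 * sqrt(2 * ln 2) * sigma."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

sigma_mm = fwhm_to_sigma(12.0)   # ≈ 5.1 mm for the 12 mm kernel used here
```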

Active regions relevant to the sequence model proposed in Section D include the left hemisphere SMA, pre-SMA, inferior frontal sulcus (IFS), and posterior Broca’s area (BA 44). These areas are labeled in Fig. 7. The remaining activations in the figure, along the sensorimotor cortex surrounding the central sulcus as well as in superior temporal auditory cortical areas, are accounted for by the DIVA model (see Section C.1 and Guenther et al., in press in Appendix materials).

The preparation (NOGO) condition (not shown) activated much of the same cortical network to a lesser degree. The auditory cortical areas as well as motor and somatosensory face areas were much less active in the NOGO conditions, as expected. The superior cerebellum, basal ganglia/anterior thalamus, left anterior insula/frontal operculum, and SMA also showed significantly increased activity for overt production compared to preparation. These findings generally agree with studies comparing overt to covert speech production (Murphy et al., 1997; Wise et al., 1999; Sakurai et al., 2001; Riecker et al., 2000a), although there is considerable variability in experimental designs and outcomes.

Production of complex sequences of three distinct syllables (e.g. ba-da-ti) was expected to engage cortical sequencing mechanisms to a greater degree than production of syllable repetitions (e.g. ba-ba-ba). Several cortical regions responded more strongly to complex sequences in our experiment (Fig. 8, top), including bilateral pre-SMA, frontal operculum/anterior insula, left IFS, and superior parietal cortex. Furthermore, subcortical activation in the right inferior cerebellum and bilateral basal ganglia was observed for complex – simple sequences of simple syllables (Fig. 8, bottom).

Increased syllable complexity (e.g. stra-stra-stra vs. ba-ba-ba) should engage brain mechanisms necessary for programming articulator movements at a sub-syllabic level. Additional activations for overt production of complex vs. simple syllable types were observed in the pre-SMA bilaterally, within the left precentral gyrus, and in the superior paravermal cerebellum.

The results of this experiment provide important initial data points for understanding how the brain represents and executes sequences of syllables. The region of activity around the left inferior frontal sulcus (IFS) represents the “highest-level” brain region that showed reliable differential activations in this experiment and is likely to serve as a working memory representation for planned utterances (see D.1 for further discussion). The pre-SMA showed great sensitivity to both supra- and sub-syllabic complexity and has suitable connectivity to serve as an interface between the prefrontal cortex and the motor executive system (Jurgens, 1984; Luppino et al., 1993). The SMA proper, in contrast, showed larger activations for overt production and is likely to provide for the initiation of motor outputs. A region at the junction of the frontal operculum and anterior insula also showed speech complexity differences; this region may play a role in online monitoring during speech production (see also Ackermann & Riecker, 2004). The cerebellum showed distinct activation patterns along its superior and inferior aspects. The superior portions were more active for overt production than for preparation and more active for complex syllables than for simple ones. This region may be crucial for execution of well-learned syllables and/or for coarticulation effects. The right inferior cerebellum was significantly active for complex sequences but not simple ones. In line with cerebellar projections to prefrontal cortex (Middleton and Strick, 2000; Dum and Strick, 2003), this area may play a role in supporting the working memory representation in IFS (discussed in D.2).

D. RESEARCH DESIGN AND METHODS


Fig. 8. Top: Cortical activations for complex – simple sequences, GO condition. Bottom: Right inferior cerebellum (left) and basal ganglia (right) activity for complex – simple sequences of simple syllables, GO condition.

Fig. 7. Cortical activation (white patches) during production of complex sequences of complex syllables.

The proposed research consists of two closely inter-related subprojects that combine fMRI with computational neural modeling to investigate sequencing and initiation in speech production. For the sake of clarity, the basic fMRI methods and modeling framework are described first, followed by the subproject descriptions.

fMRI experimental protocol. The fMRI experiments proposed herein will each involve 17 subjects6. All fMRI sessions will be carried out on a 3 Tesla Siemens scanner at the Massachusetts General Hospital NMR Center. Prior to the functional runs, a high-resolution structural image of the subject's brain is collected. This structural image serves as the basis for localizing task-related blood oxygenation level dependent (BOLD) activity. The fMRI experiment parameters will be based on the sequences available at the time of scanning7. The faculty and research staff at MGH, together with engineers from Siemens, continuously develop and test pulse sequences that optimize T1 and BOLD contrast while providing maximum spatial and temporal resolution for the installed Siemens scanners (Allegra, Sonata, and Trio). Because scanner noise related to echo-planar imaging (EPI) may alter normal auditory cortical responses and/or cause subjects to adopt abnormal strategies during speech, and because articulator movements can induce artifacts in MR images, it is important to avoid image acquisition during stimulus presentation and articulation (e.g., Munhall, 2001). To this end, we will use an event-triggered paradigm in which the scanner is triggered to collect 2 full-brain image volumes following production of each stimulus. Because BOLD changes induced by the task persist for many seconds, this technique allows us to measure activation changes while avoiding scanner noise confounds and motion artifacts.
The inter-stimulus interval will be determined for each experiment to be long enough to allow collection of two volumes starting approximately 3 seconds8 after the speech production period is complete (total ISI of approximately 12-15 seconds). Data analysis will correct for summation of the BOLD response across trials using a general linear model (including correction for the effects of the scanner noise during the previous trial).
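The linear-summation assumption behind this correction can be illustrated with a short sketch: event responses are modeled by convolving the trial onsets with a hemodynamic response function (HRF), so overlap between successive trials is captured in the regressor rather than contaminating the estimates. The double-gamma shape and all timing parameters below are generic illustrative values, not the specific basis functions our analysis software will use.

```python
import math

def double_gamma_hrf(t, peak=6.0, under=16.0, ratio=6.0):
    """Illustrative double-gamma hemodynamic response (SPM-like shape).

    Parameter values are assumptions for the sketch, not fitted values.
    """
    if t <= 0.0:
        return 0.0
    pos = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    neg = t ** (under - 1) * math.exp(-t) / (ratio * math.gamma(under))
    return pos - neg

def build_regressor(event_times, duration=60.0, dt=0.1):
    """Convolve a train of speech-production events with the HRF,
    so that overlapping responses from successive trials sum linearly
    in the design matrix."""
    n = int(duration / dt)
    signal = [0.0] * n
    for onset in event_times:
        for i in range(n):
            t = i * dt - onset
            if t > 0:
                signal[i] += double_gamma_hrf(t)
    return signal

# Two trials 14 s apart: the second trial's response rides on the
# tail (undershoot) of the first, which the GLM separates by design.
reg = build_regressor([0.0, 14.0])
peak_time = reg.index(max(reg)) * 0.1
```

With a 12-15 s ISI the overlap is small but nonzero, which is exactly why the trial-summation correction is included in the model rather than ignored.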

Each session will consist of approximately 4-8 functional runs of approximately 6-12 minutes each. During a run, stimuli will typically be presented in a pseudo-random order. For each experiment, the task(s) and stimulus type(s) are carefully chosen to address the aspect of speech sequencing and/or initiation being studied; these tasks and stimuli are described in the subproject descriptions in Sections D.1-D.2. We have developed software to allow us to send event triggers to the scanner and to analyze the resulting data, and we have successfully used this technique to measure brain activation during speech production as part of another grant (R01 DC02852; see Sections C.2 and C.4 and Guenther et al., in press in Appendix).

fMRI power analysis. Following the methodology described in Zarahn and Slifstein (2001), we utilized the data from our syllable sequence production experiment described in Section C.4 to obtain reference parameters from which to derive power estimates for the fMRI studies proposed in this application. Activations during overt speech production compared to baseline provided measures of within- and between-subject variability as well as a reference effect size of the SPM-derived general linear model parameters. The expected within-subject variability for our proposed studies was then computed from the reference value by using the number of conditions and stimulus presentations in the proposed studies compared to the same values for the reference study. Power estimates for the two proposed fMRI experiments in Section D.1, which involve 5 stimulus conditions, show that 17 subjects would be needed to detect (with probability > .8) in a random-effects analysis (at a p < .01 Type I error level) an effect size that is 35% as large as the effect size of the reference contrast (overt speech - baseline). Most of the contrasts evaluated in the reference study resulted in effect sizes well above this 35% threshold.
This power calculation assumes that 30% of the trials in any condition contain subject errors or have other problems that cause them to be removed from the analysis. (For comparison, the error rate for the most difficult condition in the preliminary sequencing experiment described in Section C.4 -- the complex syllable/complex sequence condition -- was 14.2%.)

The fMRI experiments in Section D.2 involve only 3 conditions. However, the production tasks may be more prone to errors, as they involve more difficult phoneme/syllable sequences. If we assume that 50% of the trials must be removed from the analysis due to production errors, we arrive at power estimates nearly identical to those described above for the experiments in D.1.
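The logic of these estimates can be illustrated with a simplified sketch. The function below uses a one-sided normal approximation to a one-sample random-effects test rather than the actual Zarahn and Slifstein (2001) procedure; the trial-loss adjustment (shrinking the effective effect size by the square root of the retained-trial fraction) and all numeric inputs are illustrative assumptions only.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_subjects, effect_size, alpha=0.01, unusable_frac=0.0):
    """Normal-approximation power for a one-sided, one-sample
    random-effects group test (a rough stand-in for the full
    noncentral-t calculation; all values are illustrative).

    Discarding a fraction of trials inflates the within-subject
    variance of each subject's contrast estimate roughly by
    1 / (1 - unusable_frac), shrinking the effective effect size.
    """
    z = NormalDist()
    d_eff = effect_size * sqrt(1.0 - unusable_frac)
    z_crit = z.inv_cdf(1.0 - alpha)
    return z.cdf(d_eff * sqrt(n_subjects) - z_crit)
```

Plugging in a hypothetical standardized effect size with 30% trial loss reproduces the qualitative conclusion that 17 subjects give adequate power; the real calculation additionally uses the noncentral t distribution and the empirically estimated variance components.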

These power estimates are conservative, particularly considering that they assume large numbers of unusable trials and are based on voxel-level statistics. The use in the proposed studies of cluster-level and/or ROI analyses, methods that use information from neighboring voxels, should increase the statistical power of the proposed studies beyond the values predicted here (Friston et al., 1996; Nieto-Castanon et al., 2003).

6 Subject pool sizes and selection criteria. All subjects will be between the ages of 18 and 55 with normal hearing and no known neurological disorders. We anticipate the need for 17 subjects per fMRI study to produce optimal results (see fMRI power analysis above). We further anticipate that 1-2 subjects will be needed to fine-tune each fMRI experiment, and that data from 1-2 fMRI scanning sessions will be unusable due to subject motion or technical difficulties, yielding a budgeting estimate of 20 fMRI sessions per experiment.

7 Current scanning parameters are as follows. T1-weighted high-resolution anatomical scans: voxel size 1.33mm sagittal x 1mm coronal x 1mm axial. T2-weighted Echo Planar Imaging (EPI) functional scans: 16 slices per second, matrix size 64x64, field of view (FOV) 200x200mm, in-plane resolution 3.125 x 3.125mm. Our experience has shown that a scan volume of 200mm x 200mm x 150mm (e.g., 37 slices of 4.05mm thickness) is sufficient to cover the entire brain. Slices are acquired in an interleaved manner and the volume acquisition time is approximately 2.25 seconds.

8 The 3s offset allows us to scan at the peak of the hemodynamic response to speech production (cf. Birn et al., 1999).

PHS 398 (Rev. 09/04) Page 25

Effective connectivity analysis. Whereas commonly used voxel-based analyses of fMRI data rely on the notion of functional specialization, the brain, as well as the neural model proposed herein, is a connected structure in a graph-theoretic sense, and connections between specialized regions bring about functional integration (see e.g. Horwitz, 2000; Friston, 2002). Functional integration, or the task-specific interactions between brain regions, can be assessed through various network analyses that measure effective connectivity. In the current proposal, we will use structural equation modeling (SEM) to examine these interactions. SEM is the most widely used method for making effective connectivity inferences from fMRI (Penny et al., 2004) and benefits from a large literature regarding its application to neuroimaging (e.g. McIntosh et al., 1994a, 1994b; Büchel and Friston, 1997; Bullmore et al., 2000; Mechelli et al., 2002) as well as its general theory (see e.g. Bollen, 1989). This method requires the specification of a causal, directed anatomical (structural) model, and it estimates path coefficients (strengths of influence) for each connection in the model that minimize the difference between the measured inter-regional covariance matrix and that implied by the model. We will utilize a single characteristic time course of the BOLD response from each region of interest (ROI) corresponding to a component of our model in the SEM calculations.
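The model-comparison arithmetic used with SEM fit statistics can be sketched briefly: the overall fit and the comparison between a constrained and an unconstrained model both reduce to chi-square tail probabilities. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square upper tail rather than a statistics package, and the numeric fit values in any example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the Wilson-Hilferty
    normal approximation (adequate for this model-comparison sketch)."""
    if x <= 0:
        return 1.0
    k = 2.0 / (9.0 * df)
    zval = ((x / df) ** (1.0 / 3.0) - (1.0 - k)) / sqrt(k)
    return 1.0 - NormalDist().cdf(zval)

def stacked_model_test(chi2_null, df_null, chi2_alt, df_alt):
    """Chi-square difference test between a constrained ('null') SEM,
    in which path coefficients are equal across conditions, and an
    'alternative' SEM with the coefficients free to vary.
    Returns (chi2_diff, df_diff, p); a small p rejects the null,
    i.e., effective connectivity differs across conditions."""
    chi2_diff = chi2_null - chi2_alt
    df_diff = df_null - df_alt
    return chi2_diff, df_diff, chi2_sf(chi2_diff, df_diff)
```

In practice the fit statistics themselves come from the SEM estimation; only the difference-test step is shown here.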
There exists a natural correspondence between structural equation models and neural network models, as both are specified by connectivity graphs and connection strengths. We can specify the connectivity structure in both models identically (based on known anatomy from primate studies, diffusion tensor imaging studies, etc.) and directly compare the resulting inferred path coefficients with the connectivity in the model. In both cases, inter-regional interaction may be dynamic in the sense that the activity in one region may be driven by different regions (or in different proportions) in varying tasks and contexts; likewise, learning may result in the strengthening or weakening of effective connections (e.g. Büchel et al., 1999). To assess the overall goodness of fit of the SEM we will use the χ2 statistic corresponding to a likelihood ratio test. If we are unable to obtain proper fits using our theoretical structural models (i.e., fits for which P(χ2) > 0.05), we will consider this evidence that the connectivity structure is insufficient, and we will develop and test alternative models. To make inferences about changes in effective connectivity due to task manipulations or learning (see Sections D.1 and D.2 for details), we will utilize a “stacked model” approach; this consists of comparing a ‘null model’ in which path coefficients are constrained to be the same across conditions with an ‘alternative model’ in which the coefficients are unconstrained. A χ2 difference test will be used to determine whether the alternative model provides a significant improvement in the overall goodness of fit. If so, the null model can be rejected, indicating that effective connectivity differed across the conditions of interest.

Computational modeling framework.
The model will be implemented using the Matlab programming environment on Windows and Linux workstations equipped with large amounts of RAM (4 gigabytes) to allow manipulation of large matrices of synaptic weights, as well as other memory-intensive computations, in the neural network simulations. Matlab allows graphical user interface generation, sophisticated matrix-based computations (ideal for simulating neural networks as proposed herein), generation of graphical output (in the form of moving speech articulators, simulated brain activation patterns, and plots of variables of interest), and generation of acoustic output (in the form of speech sounds produced by the model). Portions of the model that require intensive computations will be implemented in C and imported into Matlab as MEX executable files in order to speed up the simulations. Our department has all the relevant Matlab licenses, and our laboratory has used this environment for development of the DIVA model of speech production as part of another project.

D.1 Subproject 1: Combining the CQ and DIVA models to investigate the neural basis of speech sound sequence production. This subproject will consist of one modeling project and two neuroimaging experiments. The goal of the modeling project is to create a competitive queuing (CQ) based model whose components specifically relate to cells in brain areas underlying sequence generation in speech. This model, which we will refer to as the sequence model, will then be integrated with the DIVA model of speech production. The combined model will be referred to as the CQ+DIVA model in this application. Simulations of the CQ+DIVA model producing syllable strings in particular speaking tasks will be run and compared to the results of the proposed fMRI studies, which involve those same speaking tasks. These experiments are designed to guide model development, test particular aspects of the model, and test between the model and alternative theories.
This work offers three advances over previous modeling work in this area: 1) it builds upon the CQ work of Bullock and colleagues (Boardman & Bullock, 1991; Rhodes & Bullock, 2002; Rhodes et al., 2004), which has thus far focused on sequences of externally cued manual movements; 2) the CQ network will work in concert with a well-developed neural model of speech production (the DIVA model) to allow quantitative measurement of interactions between the planning and motor execution stages of speech; and 3) by developing a model that includes both stages, a framework will exist for examining speech disorders that involve sequencing, initiation, and/or motor execution of speech sounds, such as stuttering and apraxia of speech.

In the following paragraphs we describe the theoretical framework that will be implemented (in equations and computer simulations) in the proposed modeling project and tested in the accompanying fMRI experiments. The framework synthesizes a number of theoretical and experimental contributions into a cohesive, unified account of a broad range of neurological and behavioral data. We first describe a basic computational unit, the basal ganglia (BG) loop, that will be used in the complete sequence model, before describing the rest of the model.

The cortico-basal ganglia-thalamo-cortical loop as a functional unit. The proposed sequence model is built upon the basic functional unit illustrated in Fig. 9. This unit consists of a cortico-BG-thalamo-cortical loop that begins and ends in the same portion of the cerebral cortex. Such neural loops have been widely reported in the monkey neuroanatomical literature (e.g., Alexander et al., 1986; Alexander & Crutcher, 1990; Cummings, 1993; Middleton & Strick, 2000). In our model we will include two “cells”9 to represent a cortical column: a superficial layer cell and a deep layer cell. This simplified breakdown of the layers in a cortical column is analogous to the breakdown utilized in the detailed model of BG function of Brown et al. (2004). The two-layer simplification allows the model to incorporate two major empirical generalizations regarding cortico-BG and cortico-cortical projections. First, the dominant cortico-striatal projection is from layers 5a or above (“superficial”), whereas the cortico-thalamic and cortico-subthalamic projections are from deeper layers (5b, 6). Second, cortico-cortical projections run either from deep layers to superficial layers or from superficial layers to deep layers10; cortico-cortical projections between layers of equivalent depth appear to be excluded (e.g., Barbas & Rempel-Clower, 1997).
Each superficial layer cortical cell projects to a corresponding cell in the striatum (the major input portion of the BG, which includes the caudate and putamen). The striatum then projects to cells in the internal segment of the globus pallidus (GPi) or the substantia nigra pars reticulata (SNr), the output structures of the BG, via two pathways: a direct pathway in which each cell in the striatum projects through an inhibitory connection to a corresponding output cell (Albin et al., 1989), and an indirect pathway that projects through inhibitory pathways to the external segment of the globus pallidus (GPe), which in turn provides diffuse inhibition of the GPi/SNr11 (Parent & Hazrati, 1995; Mink, 1996). The BG output nuclei, typically tonically active, project through inhibitory connections to the thalamus (Penney and Young, 1981; Deniau and Chevalier, 1985), which in turn sends excitatory projections to the deep layer of the cortical column from which the loop originated.

We will typically assume that each cortical column represents a different planned motor action (e.g., a particular phoneme). Thus the cortex schematized in Fig. 9 represents four motor actions. The superficial layer in this example contains a parallel representation of the four actions, as in the planning layer of the CQ model described in Section C.3 (see Fig. 5). We follow prior interpretations (Mink & Thach, 1993; Kropotov & Etlinger, 1999; Brown et al., 2004) in hypothesizing that pathways through the BG have the effect of selectively enabling output from the winner of the competition between motor actions. In the schematic of Fig. 9, a “winner take all” or “choice” dynamic enables the cortical column responsible for the largest input to the BG to receive enough thalamic activation to generate an output from its deep layer, which drives a subsequent stage of processing. In contrast, the columns representing the other items do not receive such output-enabling activation from thalamus and thus have no deep layer activity. In other instances, the BG loop may simply “gate off” or scale the activity in the deep layer of the cortical column instead of performing a “choice”.
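The winner-take-all choice dynamic can be sketched with a toy rate model in which each column’s activity excites itself (via the focused direct pathway) and diffusely inhibits all columns (via the indirect pathway), so the column with the largest initial activation suppresses the others and saturates. All weights, rates, and activation values below are illustrative assumptions, not fitted model parameters.

```python
def bg_loop_competition(initial, w_direct=2.5, w_indirect=1.0,
                        dt=0.05, steps=400):
    """Toy rate model of the BG loop's choice dynamic. Each column
    excites itself through the focused direct pathway (w_direct) and
    inhibits every column through the diffuse indirect pathway
    (w_indirect). Activities are clipped to [0, 1]; parameters are
    illustrative only.
    """
    a = list(initial)
    for _ in range(steps):
        total = sum(a)  # diffuse inhibition pools all columns
        a = [min(1.0, max(0.0,
                 x + dt * (-x + w_direct * x - w_indirect * total)))
             for x in a]
    return a

# Four planned actions with graded activations; the most active
# column wins the competition and the rest are driven to zero.
final = bg_loop_competition([0.5, 0.6, 0.4, 0.3])
```

Because deviations between columns grow while the shared inhibition scales with the total, even a small initial advantage is amplified until only one column retains deep-layer output, matching the “choice” behavior described above.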

To understand the functionality of the BG loop, it is useful to consider the net effect of the excitatory and inhibitory projections between the striatum and cortex via the direct and indirect pathways. In the direct pathway, activation of a striatal cell has the effect of inhibiting the corresponding GPi/SNr cell. This disinhibits the corresponding thalamic cell, which in turn excites the corresponding superficial cortical layer cell. In other words, the direct pathway is excitatory and has the specific effect of exciting only the same column(s) as the superficial layer cell(s) that provide the initial excitatory input to the striatum. Because it involves one extra inhibitory connection, the indirect pathway is inhibitory on thalamus and cortex. Furthermore, due to the diffuse indirect path projection, this inhibition spreads across all competing columns (i.e., across all motor action representations). Thus the larger an action’s advantage in superficial layer activation, the more it will excite itself and inhibit other actions in the BG-loop-mediated competition. This inherent connectivity provides the types of interactions that would be expected for a CQ-like choice network. The balance between direct and indirect pathways can change; e.g., the release of dopamine in the striatum has an excitatory effect on the direct pathway and an inhibitory effect on the indirect pathway (Gerfen & Wilson, 1996; Brown et al., 2004; Frank, 2005), thus biasing the system toward the direct pathway. Such a bias can affect competitions between actions. For example, a stronger indirect pathway will more strongly suppress all action; such a process may be responsible for the overall lack of movement seen in Parkinson’s disease, which is characterized by a lack of dopamine in the striatum and a corresponding bias toward the indirect pathway (e.g. Wichmann and DeLong, 1996), in addition to pathological neuronal bursting that leads to tremor (Bevan et al., 2002).

9 Although we use the term “cells” to describe the smallest computational units in the model, we expect that each cell corresponds to a small population of neurons that behave in a highly coordinated fashion rather than a single neuron. Accordingly, when modeling the fMRI activity associated with a model cell’s activity, we generate a Gaussian spread of activation centered at the cell’s defined location in stereotactic space. This spread also helps account for the spatial averaging and smoothing inherent in fMRI data analysis.

10 Due to spatial resolution limits of fMRI, it is not possible to reliably distinguish deep from superficial cortical layers. The deep and superficial layer cells of a cortical column will be assumed to be at the same spatial location in our model.

11 Due to space constraints, we do not explicitly address the role of the subthalamic nucleus (STN) in the indirect pathway in this application, though it is implicit in our indirect pathway and we will include it in the proposed modeling study.

Fig. 9. The cortico-BG-thalamo-cortical functional unit (BG loop).

This basic functional unit will be used to model several cortical regions along with associated basal ganglia and thalamic areas. The following paragraphs describe the model that will be implemented, tested, and refined in this subproject. For clarity of exposition, we will introduce the overall model in several stages.

Working memory sound sequence representation. An overview of the cortical interactions in our proposed model is presented in Fig. 10. The top part of the figure shows brain activations from the complex sequence/complex syllable vs. baseline comparison from our preliminary fMRI study (Section C.4), with the approximate locations of the proposed model’s components labeled. The bottom half of the figure schematizes the cortical interactions in the model, as described in the following paragraphs. Basal ganglia loops associated with the cortical areas are not shown for clarity.

A parallel working memory (WM) representation of speech sounds, hypothesized to exist in/along the left inferior frontal sulcus (labeled IFS in Fig. 10), constitutes the highest level of our model (in concert with pre-SMA, described later). The model’s “job” is to produce a sound sequence represented in the IFS working memory in the proper order and with proper timing. Evidence for a verbal working memory representation in or near the left inferior frontal sulcus12, including the dorsolateral and/or ventrolateral prefrontal cortex, has been found in a number of experimental studies (e.g., Crottaz-Herbette et al., 2004; Henson et al., 2000; Veltman et al., 2003; Wagner et al., 2001), though the exact nature and location of this representation remain unclear. In our fMRI study of syllable sequence production described in Section C.4, left IFS showed more activity for more difficult speech sequences, in keeping with its proposed role as a sound sequence working memory; in an fMRI study of syllable production that did not include a working memory component (unlike the study in C.4), activation of this area was absent (see Guenther et al., in press in Appendix). Furthermore, Petrides & Pandya (1999) suggest this region may be the human equivalent of the monkey posterior principal sulcus region, which was the site of the queued representation of drawing movement sequences identified by Averbeck et al. (2002, 2003) and used as the basis for the working memory representation in our model.

The model’s working memory representation is informed by the competitive queuing (CQ) framework described in Section C.3 and by the strikingly consistent neurophysiological findings from the Averbeck et al. (2002) study of drawing sequences. However, speech is more complex than drawing, and patterns of speech errors provide evidence against the sufficiency of a CQ model that utilizes a single parallel planning zone mixing representations of all phonological units without regard to type. Error distribution data suggest that competition occurs between speech units of the same phonological type but not between units of different types, such as syllable onsets, nuclei, and codas. In speech errors known as Spoonerisms (MacKay, 1970), two parts of a phrase are switched; e.g., “a lack of pies” may be spoken instead of the intended “a pack of lies”. Such sound swapping errors in speech almost always involve two items of the same phonological or grammatical type. For example, syllable onset consonants often swap with other syllable onset consonants (as in the example above), entire initial syllables can swap with other initial syllables (“Birthington’s washday” instead of “Washington’s birthday”), verbs can swap with other verbs, and nouns can swap with other nouns (Nooteboom, 1969; Fromkin, 1971, 1973; Garrett, 1975, 1980). However, very rarely are vowels swapped with consonants, initial syllables swapped with final syllables, or nouns swapped with verbs.

12 It is likely that the parietal cortex also plays a role in working memory (e.g., Becker et al., 1999; Hickok et al., 2003; Jonides et al., 1998; Mottaghy et al., 2002, 2003). In this application we focus on frontal cortical mechanisms; however, we will also analyze parietal activations in our experimental results and incorporate parietal WM mechanisms into the model as needed to account for the results.

Fig. 10. Cortical interactions underlying sequencing and initiation of speech in the proposed model.

We therefore propose that the IFS working memory consists of several distinct CQ circuits, each of which mediates competition among speech units of a particular type (cf. Hartley & Houghton, 1996). That is, there are separate CQ circuits in IFS respectively dedicated to choosing syllable onsets (the initial consonants of a syllable), syllable nuclei (vowels), and syllable codas (final consonants). In this modeling project we will limit ourselves to these three phonological types for the sake of tractability. Within each of these CQ circuits, the model will include sets of cells to represent sets of speech sounds of the corresponding type (e.g., in the syllable onset circuit there is a cell for /p/, a cell for /t/, etc.). Our modeling starts with the assumption that a sequence of speech sounds becomes represented as parallel patterns of activity in these IFS CQ circuits. The brain mechanisms leading to this activation of IFS (that is, the mechanisms responsible for choosing the words/syllables to be spoken) are beyond the scope of the current model, which starts from this pattern of activation and performs the neural computations needed to transform it into a set of articulator movements that produce the sound sequence in the proper order and with the proper timing.

Proper readout of a sound sequence requires coordination across the three CQ circuits in IFS. Each circuit only “knows” the order of sounds of a particular type (e.g., syllable onsets) in the current sequence; it does not know which sound type should come next in the utterance. We propose below that the pre-SMA sends signals to the CQ circuits in IFS indicating which sound type should occur next in the sequence. These “frame signals” cause the readout of the next sound in the CQ circuit that receives the signal. This breakdown of speech production into a subsystem that processes syllable structure without regard for the exact phonemes within a syllable (“syllable frames”, e.g. /CV/, /CVC/) and a subsystem that processes the phonemic “content” of these frames is in accord with “frame/content” or “slots-and-fillers” theories of speech production (e.g., MacNeilage, 1998; Shattuck-Hufnagel, 1979, 1983, 1987). Further, we agree with MacNeilage’s (1998) proposal that the medial premotor areas (specifically SMA and pre-SMA in our model) are primarily involved in processing frames while lateral areas (specifically IFS and BA 44 in our model) are primarily involved in processing content. The model we propose is also in the spirit of prior, more abstract models in which CQ appeared as the core of models that could explain grammar-respecting patterns of sequencing errors observed in language production (e.g., Dell et al., 1997; Hartley and Houghton, 1996; Ward, 1994). Our model differs in two key ways from prior efforts: it simulates in detail the lower levels of processing, including the interface between sequencing and the actual articulations needed for phone production, and its hypotheses have detailed brain-circuit interpretations that make them testable via brain activity measurements.

In the following subsections we delineate the neural circuitry believed to be involved in the “reading out” of sound sequences represented in working memory in the IFS.

Choosing the next sound from the working memory sequence representation. Once a syllable sequence representation has been activated in left IFS and the proper CQ circuit receives a frame signal from the pre-SMA (to be described in the next subsection), the triggered CQ circuit must read out the next sound from its parallel representation, as in the competitive layer of the CQ model (see Fig. 5 in Section C.3). We propose that this competition is carried out by the IFS-basal ganglia loop. In monkeys, BA 46v (thought to be equivalent to our IFS; cf. Petrides & Pandya, 1999) projects to the caudate nucleus (part of the striatum) and receives input from GPi/SNr via the thalamus (Middleton and Strick, 2002). We propose that this loop performs a competition between the different sound sequence items, as schematized in Fig. 9. The winning item projects back up to the IFS deep layers and then on to the BA 44 superficial layers (pathway labeled “next sound” in Fig. 10).

The DIVA model speech sound map cells are proposed to lie in the deep layers of BA 44. Recall that activation of these cells leads to readout of the motor program for producing the speech sound (see Section C.1 and Guenther et al., in press in Appendix materials for details). As described above, the superficial layer BA 44 cell corresponding to the chosen item from working memory becomes active due to projections from IFS. However, readout of the motor program for that sound does not occur until the corresponding deep layer cell (speech sound map cell) is activated. In the proposed sequence model, the activation of the deep layer cells depends multiplicatively on two factors: (i) the output of the BA 44 BG loop that starts from the superficial layers of BA 44 and terminates in the deep layers, and (ii) a “trigger signal” arising from the SMA. The SMA trigger signal (described further below) has a value of 1 or 0; it becomes active (value of 1) to initiate the motor production of the current syllable/phoneme, and its activity goes to 0 at completion of the motor program for the syllable. The model’s BA 44 BG loop has the effect of scaling the size of the activation represented in the superficial layer; this is hypothesized to control the speed of readout of the motor program (i.e., speaking rate). Thus the speech sound map cell corresponding to the chosen syllable will remain inactive until the SMA trigger signal arrives, at which time it will become active and will stay active during production of the current syllable, with its level of activation regulated by the BA 44 BG loop to control speaking rate. From this point the DIVA model takes over the motor execution of the syllable via a combination of learned feedforward commands to motor cortex and feedback control circuitry (see Guenther et al., in press in Appendix for details).

Next we address the generation of properly timed frame and trigger signals by the pre-SMA and SMA.

Initiating the chosen sound at the right time. Numerous studies in the past decade have suggested a separation of the medial wall premotor area previously described as the “supplementary motor area” into a posterior area termed the SMA proper (referred to here as SMA) and an anterior area termed the pre-SMA with different functionality (e.g. Matsuzaka et al., 1992; Shima & Tanji, 2000). The SMA has been implicated in the initiation of specific actions in numerous studies (Eccles, 1982; Krainik et al., 2001; Picard & Strick, 1996; Tanji, 2001; Hoshi & Tanji, 2004). Many cells in SMA fire time-locked to the execution of a particular motor act (Matsuzaka et al., 1992; Tanji & Shima, 1994). The pre-SMA, on the other hand, appears to be involved in higher-level aspects of complex tasks such as choosing the right action in response to a particular stimulus (“cognitive set”; Matsuzaka et al., 1992), representing the numerical order of movements in a sequence (Clower & Alexander, 1998; Shima & Tanji, 2000), or updating an entire sequential motor plan (Shima et al., 1996; Kennerley et al., 2004). Whereas many SMA cells are only active immediately before and during movement execution and are only active for a specific movement type or effector system, pre-SMA cells typically begin firing well in advance of movements and often are not specific for which effector system is used to carry out an action (e.g. Shima & Tanji, 2000; Fujii et al., 2002; Hoshi & Tanji, 2004).

In keeping with these observations, we propose that the pre-SMA sends “frame signals” (see Fig. 10) to the IFS working memory that determine which CQ circuit (i.e., the onset, nucleus, or coda CQ circuit) will read out its next item to BA 44. Actual motor execution of the item, however, requires an SMA trigger signal projecting to BA 44. This is necessary in the model because pre-SMA does not have direct sensory and motor inputs (Luppino et al., 1993) that can act as “completion signals” indicating completion of the previous item in the sequence. The SMA, in contrast, receives a wide variety of sensory and motor inputs (Passingham, 1993) that can act as completion signals, and thus SMA is capable of timing the motor execution of the current item immediately upon completion of the previous item. We further propose that, in addition to sending trigger signals to BA 44, the SMA also sends trigger signals to primary motor cortex. Such projections from SMA to motor cortex and F5 (the monkey analog of BA 44) have been identified in anatomical studies (Luppino et al., 1993; Jürgens, 1984). We propose that “inner speech” (with no overt articulation) involves only the trigger signals to BA 44, whereas overt speech involves SMA trigger signals to both BA 44 and primary motor cortex.

Generation of the pre-SMA frame signals and SMA trigger signals for a speech sequence is hypothesized to occur as follows. To fix ideas, consider production of a CVC syllable. A pre-SMA “frame cell” representing the syllable’s frame structure (e.g., CVC) is activated by other parts of cortex (not treated in the model) at the same time that the phonemes for that syllable are loaded into the IFS working memory representation. Single cell recordings in monkeys have identified cells in pre-SMA that respond in advance of a particular movement sequence but not other sequences (e.g., Shima & Tanji, 2000). This frame cell activates another pre-SMA cell that represents the first item in the sequence (C for a CVC), and this cell in turn sends a frame signal (see Fig. 10) to the corresponding working memory CQ circuit (the “onset” circuit). Activation of the SMA trigger cell (to initiate production of the item) can occur in one of three ways in the model: via projections from pre-SMA (for internally timed sequences that have not been practiced heavily), via the SMA BG loop (for heavily practiced internally timed sequences), or via cortico-cortical connections from sensory areas processing external timing cues (in externally timed sequences). We focus here on internally timed sequences. Furthermore, for the current modeling project we will focus on the pre-SMA -> SMA route; the SMA BG loop route is explored in Section D.2. Once the SMA trigger cell is activated, it remains active until it receives a signal indicating completion of the motor execution of the current item. The SMA then projects back to the pre-SMA to signal completion of the item, extinguishing activity in the pre-SMA cell representing the completed frame component and activating the cell for the next component (V for a CVC). This process repeats (except for the activation of the pre-SMA frame cell, which happens only once for each syllable) until the entire syllable has been completed.
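The frame/trigger cycle just described can be summarized as a simple control loop. The sketch below is purely illustrative (all function and variable names are our own hypothetical shorthand, and completion signals are assumed to arrive instantaneously); it is not part of the model implementation:

```python
# Hypothetical sketch of the pre-SMA/SMA control loop for one syllable.
# A pre-SMA "frame cell" steps through the frame components in order;
# for each component, a frame signal selects the matching CQ circuit,
# the circuit's next item is read out to BA 44, and an SMA trigger cell
# stays active until the completion signal advances the loop.

def produce_syllable(frame, cq_circuits):
    """frame: ordered component labels, e.g. ["onset", "nucleus", "coda"].
    cq_circuits: component label -> queued items, in planned order."""
    produced = []
    for component in frame:
        # pre-SMA frame signal: choose which CQ circuit reads out next
        item = cq_circuits[component].pop(0)  # CQ competition winner
        # SMA trigger: item is executed (stands in for BA 44 / motor cortex)
        produced.append(item)
        # completion signal: extinguish this pre-SMA cell, activate the next
    return produced

# Producing a CVC syllable such as /pad/:
queues = {"onset": ["p"], "nucleus": ["a"], "coda": ["d"]}
print(produce_syllable(["onset", "nucleus", "coda"], queues))  # ['p', 'a', 'd']
```

In the full model the readout step is implemented by competitive neural dynamics rather than an explicit queue, but the ordering logic of the loop is the same.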
We further propose that the left and right SMA/pre-SMA control different aspects of speech production: the left hemisphere SMA/pre-SMA is more involved in signaling the frame type associated with a word and triggering the production of the particular content items in the frame, while the right hemisphere is more involved in sentence-level prosodic aspects of the utterances, in keeping with several studies and theoretical treatments (e.g., Ross & Mesulam, 1979; Heilman et al., 2004; Baum & Dwivedi, 2003; Perkins et al., 1991). We focus on left SMA/pre-SMA in the current proposal; we omit sentence-level prosodic control from the model for the sake of tractability.

The distinction between motor program storage in lateral premotor areas and timing signal generation by medial premotor areas is also similar to the FARS model of Arbib and colleagues, a model of monkey grasping circuits (Fagg & Arbib, 1998) that has also been used to investigate evolution of language (Arbib, in press). Although space limitations preclude detailed treatment, our model differs in several key respects, most notably in the use of CQ for working memory and the fact that our model will generate articulator movements and sounds that will be compared to experimental data.

PHS 398 (Rev. 09/04) Page> 30

Modeling project. In the modeling project, we will define mathematical equations for all model components described above and implement these equations in computer software that will allow us to simulate the sequence model to produce speech sound sequences, in concert with the DIVA model. The modeling project will be carried out in several stages. In the first stage, a generic cortex-BG loop model will be implemented in computer simulations, and we will verify its ability to perform a competitive “choice” of a parallel plan in the superficial layer (as schematized in Fig. 9). In the second stage, we will implement the circuitry for readout of an IFS working memory sound sequence plan to the speech sound map cells. We will verify the circuit’s ability to read out sequences in the proper order when provided with appropriate trigger signals from the modeler. In the third stage, we will implement the circuitry for generating the pre-SMA and SMA trigger signals, and in the fourth stage we will combine the various sequence model components into a single computer simulation and verify its ability to activate BA 44 speech sound map cells in the proper order. The timing of signals between these regions will reflect realistic transmission delays, as is the case in the DIVA model (see Section C.1 and Guenther et al., in press in Appendix documents). Finally, we will integrate this model with the DIVA model (thus forming the CQ+DIVA model) to allow generation of the articulator movements and acoustic signal for the sound sequence.
We will verify the functionality of the overall system by performing simulations of the CQ+DIVA model producing a number of sound sequences, allowing us to qualitatively assess model performance by listening to its productions.

After the model has passed these qualitative tests, we will evaluate it more quantitatively by adapting protocols for model-data comparisons that consider both timing and error patterns (MacKay, 1970; Sternberg et al., 1978; Boardman & Bullock, 1991; Verwey, 1996; Page & Norris, 1998; Conway & Christiansen, 2001; Rhodes & Bullock, 2002; Klapp, 2003; Rhodes et al., 2004; Farrell & Lewandowsky, 2004; Agam et al., 2005). Regarding timing patterns, past studies of speeded immediate serial recall (ISR) of prepared verbal sequences have shown several systematic timing phenomena, including: (1) a sequence length effect on the latency to initiate a sequence; (2) a ratio much greater than one between initiation latency and continuation latencies (inter-item intervals); (3) an inverse sequence length effect on mean production rate; and (4) non-monotonic relationships between inter-item intervals and serial position within the sequence. Moreover, (5) high levels of practice with specific sequences cause the latency to initiate the practiced sequence to become independent of the length of that sequence. Thus learning can eliminate effect (1) while most of the other effects remain. All these patterns, which also hold for non-verbal key-press sequencing, were successfully modeled in the N-STREAMS model for key-pressing, an extended CQ model developed by Rhodes & Bullock (2002) and assessed vis-à-vis further data in Rhodes et al. (2004). In this modeling project we will verify that the CQ+DIVA model also has these properties, modifying the model if necessary to account for the data. Furthermore, adding noise at the CQ level of our deterministic model will create a stochastic model that is capable of making sequencing errors, notably transposition errors in which two elements of a planned sequence mistakenly swap positions in the output.
Plotting data from a large number of stochastic simulations will allow us to verify that the model's phoneme transposition errors obey both frame and adjacency constraints suggested by many prior reports (MacKay, 1970; Fromkin, 1973; Garrett, 1980; Dell et al., 1997). In particular, most transposition errors should be swaps between items of the same type (e.g., onset or coda), and they should be separated by only one serial position within their respective CQ circuit (i.e. contents of adjacent onsets in a multi-syllable sequence plan should swap much more often than contents of onsets separated by more than one serial position in the plan). Also, CQ-type models, unlike all other types so far proposed to explain transposition distributions, can explain the monotonic decline of the latency of a transposition error as a function of transposition distance (Farrell & Lewandowsky, 2004; cf. also Dell et al., 1997). The stochastic model's ability to explain this key interaction effect will also be assessed.
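To illustrate the mechanism (a toy sketch under our own simplifying assumptions, not the actual CQ or N-STREAMS implementation; all names are hypothetical), a primacy-gradient readout with additive Gaussian noise already produces mostly adjacent transpositions:

```python
import random

def cq_readout(plan, noise_sd=0.0, rng=None):
    """Competitive-queuing readout: planned items start on a primacy
    gradient (earlier items more active); Gaussian noise can let a
    later item win early, producing a transposition error."""
    rng = rng or random.Random(0)
    n = len(plan)
    # activation = gradient step per serial position, plus noise
    act = {item: (n - i) + rng.gauss(0.0, noise_sd)
           for i, item in enumerate(plan)}
    out = []
    while act:
        winner = max(act, key=act.get)  # most active item wins the competition
        out.append(winner)
        del act[winner]                 # winner is suppressed after readout
    return out

# Noise-free readout preserves the planned order:
print(cq_readout(["a", "b", "c", "d"]))  # ['a', 'b', 'c', 'd']

# With noise comparable to the gradient step, most errors are swaps of
# adjacent items, mirroring the adjacency constraint described above:
rng = random.Random(1)
trials = [tuple(cq_readout(["a", "b", "c", "d"], 0.6, rng)) for _ in range(2000)]
adjacent_swaps = {("b", "a", "c", "d"), ("a", "c", "b", "d"), ("a", "b", "d", "c")}
errors = [t for t in trials if t != ("a", "b", "c", "d")]
print(sum(t in adjacent_swaps for t in errors), "of", len(errors),
      "errors are single adjacent swaps")
```

Because the activation gap between items two positions apart is twice the gap between neighbors, distant swaps require proportionally larger noise excursions, which is the intuition behind the adjacency constraint.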

As in our work with the DIVA model (see Section C.1 and Guenther et al., in press in Appendix), we will identify specific locations of the model’s cells in the stereotactic coordinate frame used in the SPM2 analysis software (based on the anatomy of the SPM “standard brain”) and extract simulated fMRI data from the model simulations (see Section C.2). These data will be compared to activations measured in our proposed fMRI studies (described further below).

Although the model described above is in keeping with a wide range of data from behavioral and neuroimaging studies, we consider it to be a preliminary model that will be refined in this modeling and experimental work. For example, we have omitted in this description the cingulate motor areas, which may also play a role in sequencing and initiation of motor actions (e.g. Paus et al., 1993; Procyk et al., 2000). We do this in part because the connectivity is not as well established (cf. Geyer et al., 2000), and also to keep the model as simple as possible. Nonetheless we will analyze the cingulate motor areas in our fMRI studies and look for evidence of their involvement in our tasks. If found, this evidence will be used to guide refinement of the model to incorporate these areas. Also, the anterior insula is not treated in the sequence model because it appears to be primarily involved in overt rather than covert articulation (Ackermann & Riecker, 2004) and therefore is considered part of the DIVA model, which is addressed by another grant. Nonetheless we will analyze insula activations in our fMRI studies and incorporate any relevant findings into the CQ+DIVA model.

fMRI Experiment 1: The representation of syllable frames in prefrontal cortex. According to our proposed model (and in accord with MacNeilage’s frame/content theory), the medial premotor areas, in particular the pre-SMA, are involved in syllable frame generation during speech, whereas the IFS is a working memory that stores content elements (i.e., specific phonemes to fill in the syllabic frames). Furthermore, projections from pre-SMA to IFS trigger the readout of the appropriate content element to the BA 44 speech sound map, and the production of this element takes place when the SMA sends a trigger signal to BA 44. Overt speech (as opposed to inner speech) additionally involves SMA trigger signals to primary motor cortex. In this experiment we propose to test these hypotheses.

Two different stimulus types will be used in this experiment: (1) three-syllable nonsense words with simple frames, and (2) three-syllable nonsense words with complex frames. The simple and complex frames will be matched for “content”; i.e., they will use the same phonemes. Each of these stimulus types will be used in each of two speech tasks performed in different runs: inner speech (i.e., saying the utterance “in your head”, without articulating) and overt speech (saying the utterance out loud), for a total of four experimental conditions. Additionally, a control condition of resting quietly while viewing “XXXX” on the screen will be used as a baseline for determining activations in the production tasks. For simple frames, we will use CV-CV-CV and V-CV-CVC nonsense pseudowords (like “padita” and “adapit”), and for complex frames we will use V-CVC-CV (e.g., “akupni”) and CV-V-CVC (e.g., “puabab”). These frames were chosen based in part on their relative frequency in English words. The publicly available CELEX-2 Lexical Database (Centre for Lexical Information, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, 1995) contains detailed information about the phonology, morphology, syntax, and frequency of English words and syllables drawn from text passages. According to the CELEX-2 database, the frequency of occurrence of each frame type per 1 million spoken words is 6444 for CV-CV-CV, 2079 for V-CV-CVC, 177 for V-CVC-CV, and 29 for CV-V-CVC; thus the simple frames occur on average over 40 times more frequently than the complex frames. Subjects will be instructed before the experiment to stress the first syllable of each utterance to avoid variations in stress patterns across subjects. During the fMRI session, the experimenter will monitor the subject’s productions, and if frequent errors in production are detected, the experimenter will remind the subject between runs of the correct pronunciations.
The event-triggered design of our protocol allows us to remove trials in which subjects have committed production errors. After completion of the fMRI session, recordings of the subject’s productions will be reviewed and erroneous productions removed from subsequent analyses. According to our power analysis described above, average error rates as high as 30% will still allow detection of significant activation changes in the fMRI analysis with 17 subjects. We expect the actual error rates to be much lower than this, given our rate of 14.2% for the most difficult syllable sequences in our prior experiment (Section C.4).
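The “over 40 times” figure follows directly from the CELEX-2 counts quoted above; the snippet below simply re-derives that arithmetic (frequencies per million spoken words, copied from the text):

```python
# CELEX-2 frame frequencies per million spoken words, as quoted above.
freq = {"CV-CV-CV": 6444, "V-CV-CVC": 2079,   # simple frames
        "V-CVC-CV": 177,  "CV-V-CVC": 29}     # complex frames

simple_mean = (freq["CV-CV-CV"] + freq["V-CV-CVC"]) / 2   # 4261.5
complex_mean = (freq["V-CVC-CV"] + freq["CV-V-CVC"]) / 2  # 103.0
print(round(simple_mean / complex_mean, 1))  # 41.4, i.e. over 40x
```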

Hypothesis Test 1. It is very likely that complex frames require more neural processing than simpler frames in brain areas responsible for frame generation; fMRI and PET studies widely report more activation during more difficult tasks (e.g., Paus et al., 1998; Gould et al., 2003). If the pre-SMA is the primary site of frame processing as proposed in the model, then it should be more active in the complex frames task than in the simple frames task. This should be true for both the overt and covert speech cases. To test this hypothesis, we will perform a contrast (using the SPM software package) between the complex and simple frame cases for both the overt speech and inner speech conditions (denoted as complex – simple frame hereafter). This contrast will identify any statistically significant activity differences (random effects analysis, statistics controlled at a 0.05 false discovery rate (FDR)) between the two cases. If there is significantly greater activity for complex – simple frames in pre-SMA in both the overt and inner speech cases, this will be taken as support for the model’s hypothesis that pre-SMA is involved in frame generation. Such a finding would also be consistent with MacNeilage’s (1998) proposal that medial frontal cortex is involved in frame processing. If there is no significant difference in one or both cases (overt or inner speech), we will examine alternative accounts of frame generation and of the pre-SMA role in speech. Significant activity differences in other premotor areas, for example, will be taken as evidence that these areas may contribute to frame generation. If such a result is found, the model will be modified accordingly. Hypothesis Test 2. In contrast to pre-SMA, IFS should show the same amount of activity in the two tasks (complex vs. simple frames) according to our model, since they involve the same content elements (phonemes).
This hypothesis will be tested using the complex – simple frame contrast in both overt and inner speech conditions. If significant activation is found in IFS for complex compared to simple frames, the model’s hypothesis will be rejected and the alternative hypothesis that IFS is involved in frame generation, not just content storage, will be supported. If no significant difference is found in the IFS region, this would support our model. This finding would also be consistent with the results shown in Figure 6 (greater activation for complex vs. simple sequences) and findings from previous imaging studies which show activation in IFS during verbal short term memory tasks (e.g., Wagner et al., 2001; Crottaz-Herbette et al., 2003; Veltman et al., 2003). Hypothesis Test 3. The model predicts that inner speech should involve less activation in SMA than overt speech, since in overt speech cells within SMA send trigger signals to both BA 44 and primary motor cortex (involving two different subsets of SMA cells), while in inner speech only cells which trigger BA 44 are recruited. To test this hypothesis, we will perform a contrast of overt speech – inner speech for both the complex and simple frame cases. If these contrasts show significantly greater activity in SMA for overt speech, the model’s hypothesis will be supported; this would represent a generalization to human speech of the finding of SMA cell activity time-locked to movement onsets in monkeys (Matsuzaka et al., 1992; Tanji & Shima, 1994). If no significant activity is found in SMA for overt – inner speech, this would support the alternative hypothesis that the SMA is equally involved in triggering speech regardless of whether or not the speech is overtly articulated (e.g., see Riecker et al., 2000a; Shuster & Lemieux, 2005). Finally, if significantly more activity is found in SMA for inner speech than overt speech, the alternative hypothesis that SMA actively inhibits motor cortex during inner speech will be supported (cf. Goldberg, 1985). Hypothesis Test 4. In contrast to SMA, the model predicts that pre-SMA activation should be the same for the overt and inner speech tasks since they both involve the same frames. This hypothesis will be supported if no significant activity is detected in pre-SMA in the overt – inner speech contrast. Significant activity in pre-SMA for this contrast13 will be taken as evidence for the alternative hypothesis that pre-SMA is, like SMA, more involved when speech is overtly articulated than when it is simply spoken internally. This would be evidence for a more “motoric” role of pre-SMA than previously thought (e.g., Shima & Tanji, 2000; Fujii et al., 2002; Hoshi & Tanji, 2004).
Although fMRI studies of both inner speech and overt speech have been performed14 (e.g., Murphy et al., 1997; Riecker et al., 2000a; Shuster & Lemieux, 2005), none of these studies differentiated pre-SMA activation from SMA activation, likely because SMA and pre-SMA are abutting and thus difficult to differentiate using standard fMRI and PET analysis techniques. We have developed region-of-interest (ROI) based fMRI analysis tools that allow us to test hypotheses concerning particular anatomically defined regions of interest with greater statistical power and anatomical accuracy than standard fMRI analysis methods (see Nieto-Castanon et al., 2003, in Appendix documents for details). We will use these tools to differentiate SMA and pre-SMA when testing Hypotheses 3 and 4.

The components of our sequencing model are all hypothesized to lie in the left hemisphere, based on the fact that these regions were left-lateralized in our preliminary sequencing study (Section C.4). In addition to the hypothesis tests described above, we will test for the predicted laterality of each of the regions in the proposed model (paired t-test within subjects across hemispheres for the contrasts of interest in Hypotheses 1 to 4), as well as for the contrasts comparing each stimulus type vs. the control condition. Though not repeated hereafter for brevity, laterality tests will be performed for all voxel-based hypotheses in all experiments.

In addition to the voxel-based tests described above, we plan to perform structural equation model (SEM) analyses to test hypotheses concerning the signaling between brain regions (see Effective connectivity analysis above for methodological details). It should be noted, however, that the validity of effective connectivity methods is not as well established as the validity of voxel-based methods. Thus caution must be exercised in interpreting SEM results as falsifying or strongly supporting a particular hypothesis. Nonetheless SEM can be expected to provide valuable supplemental information to the voxel-based hypothesis tests described above. Hypothesis Test 5. During both overt and inner speech, the proposed model predicts signals from pre-SMA to IFS, from IFS to BA 44, from SMA to BA 44, and bidirectional signaling between pre-SMA and SMA. These signals are not expected during the baseline task (viewing ‘XXXX’). We therefore expect greater effective connectivity between these regions during each speech condition than during the control condition. To test this hypothesis we will compare effective connectivity path strengths determined using SEM between each speech condition and the baseline condition, looking specifically at the pathways mentioned above. Greater path strength for a given path during each speech condition supports the model’s hypothesis that signaling between the two regions increases during speech. Hypothesis Test 6. The model predicts no change in effective connectivity for the paths outlined in Hypothesis Test 5 when comparing the overt and inner speech conditions. This hypothesis will be supported if SEM indicates no difference in path strength between these regions in the overt and covert conditions. If a difference is found, it will be taken as evidence for the alternative hypothesis that signaling between these areas is modulated by whether or not speech is overtly articulated. Hypothesis Test 7.
The model predicts no modulation of effective connectivity from IFS to BA 44 and from SMA to BA 44 due to frame complexity. This hypothesis will be supported if SEM indicates no difference in path strength between these regions in the simple and complex frame conditions. If a difference is found, it will be taken as evidence for the alternative hypothesis that signaling between these areas is modulated by frame complexity, and the model will be modified accordingly. Specifically, a difference between IFS and BA 44 would indicate that IFS is involved in frame generation, not just content storage. Likewise, a difference in the coupling between SMA and BA 44 would indicate that SMA plays a role in frame generation and not just timing signals. Hypothesis Test 8. As outlined in Hypothesis 3 above, the model posits trigger signals from SMA to the speech articulator portions of motor cortex during overt speech but not during inner speech. This hypothesis will be supported if SEM analysis indicates a stronger positive path from SMA to ventral motor cortex during the overt than the inner speech condition. If, on the other hand, a negative path strength is found between SMA and motor cortex for the inner speech condition, this will support the alternative hypothesis that SMA inhibits motor cortex to prevent movement during inner speech.

13 It is also possible that overt speech, compared to inner speech, causes an increase in the overall gain of the speech network. If a large number of regions beyond the primary sensorimotor areas show this effect, we will remove the gain using linear regression and evaluate regions that show the overt vs. inner speech effect above and beyond this global modulation.

14 Note that the NOGO condition in our fMRI study in Section C.4 is different from covert speech in that subjects are never actually triggered to produce the syllable sequences, although they may be mentally rehearsing them in anticipation of the trigger signal. This confound prevents that study from testing our hypotheses concerning the degree of SMA/pre-SMA activity in covert vs. overt speech.

Comparing model cell activations to fMRI experimental results. We will perform computer simulations of the proposed CQ+DIVA model performing the same speech tasks as the speakers in the fMRI experiment. Specifically, we will first train the model to produce the utterances from the stimulus set (see Guenther et al., in press in Appendix materials for details regarding the learning of new speech sounds in the DIVA model). The model will then produce each utterance in the stimulus list in both overt and inner speech modes.
In addition to verifying that the model is capable of producing the syllables for each utterance in the proper order (thus testing the functionality of the CQ components of the model as well as the frame generation mechanism), we will also generate simulated fMRI data as the model produces the utterances from the experiment. The same contrasts performed for the fMRI experiment will be performed on the simulated data to allow direct comparisons between simulated fMRI activity and the experimental results (see Section C.2 for details). As in our earlier work with the DIVA model (see Guenther et al., in press in the Appendix materials for details), each cell in the proposed model will be associated with a precise anatomical location specified in Montreal Neurological Institute (MNI) normalized spatial coordinates, the same coordinate frame used to analyze fMRI data in the SPM software package. This allows us to project the model’s simulated fMRI activations onto the same brain surface used to plot the results of our fMRI experiments, and to directly compare the locations and magnitudes of the peak activations in the model to those found in the fMRI experiments. If the model’s cell activities differ significantly from the experimental results (e.g., if the locations of the peak activations of the model differ substantially from the peak activity locations in the experimental data for a particular contrast), the model will be modified to be in accord with the experimental findings by relocating model cells and/or changing the hypothesized functionality of a particular brain region in the model. There is currently no standard for quantitatively comparing the locations of simulated BOLD activations produced by a model such as the one described here to fMRI experimental results.
To test a proposed location of a cell type in the model that is active in a particular simulated contrast, we will simply determine whether the stereotactic coordinate of the model cell falls within a cluster of activation in the corresponding fMRI contrast.

fMRI Experiment 2: IFS as a sound sequence working memory or semantic processing center. Our model proposes that the left IFS is the site of a sound sequence working memory needed during production of speech sound sequences. Activity in this same region was identified in the word generation study of Crosson et al. (2001), who surmised from their results that the area was involved in semantic processing. This contrasts with our model as well as the results of our preliminary fMRI study described in Section C.4, wherein IFS activation was found for semantically meaningless syllable sequences. To resolve this issue and further inform our model, in this experiment we propose to directly test between these two hypotheses concerning the function of the IFS, and to test additional hypotheses generated from our model.

Subjects will perform three tasks in the fMRI scanner. In the semantic generation task, subjects will be briefly presented with a word (text on a video screen) that describes a category of items (e.g., “birds” or “furniture”) and will be asked to think of two items (few items condition) or four items (many items condition) that are members of that category. The word will disappear, and after a few seconds a “go” signal will be presented on the screen, at which time the subject will say the items that he/she thought of. This is similar in several respects (though not identical) to the paced word generation condition of Crosson et al. (2001). In the nonsense utterance task, subjects will be briefly presented with two short nonsense words (few items condition) or four short nonsense words (many items condition) on the screen, then the words will disappear. A few seconds later a go signal will be presented and the subject will say the nonsense words. Finally, the baseline control task will consist of the brief presentation of “XXXX” on the video screen instead of a category word or nonsense words, with the go signal presented a few seconds later. Subjects will be instructed before the experiment to rest quietly and look at the screen when they see the X’s, rather than producing speech. This task will be used as a baseline for measuring brain activations in the other tasks. Behavioral pilot studies will be run in advance of the scanning sessions (involving approximately 10-15 subjects) to determine an appropriate delay period between word presentation and the go signal, i.e. one that is long enough to allow a subject to generate the category items in the semantic generation task, but not so long that subjects cannot remember the nonsense words in the nonsense utterance task. This delay is designed to increase the working memory demand of the tasks in order to highlight the differences in the predictions of the two hypotheses. We will also measure the word lengths and phonological makeup of the words that subjects most frequently generate in the semantic task of the pilot study, and use these to determine the length and phonological makeup of the nonsense words to use in the nonsense utterance task of the fMRI experiment.

Hypothesis Test 1. According to the Crosson et al. (2001) hypothesis of IFS function, the IFS should be significantly more active in the semantic generation tasks than in the nonsense utterance tasks. This contrasts with our model’s prediction that both tasks should lead to the same amount of activation in IFS, since they involve approximately the same number of speech sounds buffered in working memory. We will test between these alternatives in Hypothesis Tests 1 and 2. The first test involves assessing a semantic generation – nonsense utterance contrast with SPM. Significantly more activity (random effects analysis, FDR = 0.05) in IFS for the semantic generation task will be taken as support for the Crosson et al. (2001) hypothesis. If no significant activation difference is found, this will support our model’s explanation for the role of IFS as a speech sound buffer, which is specifically tested in Hypothesis Test 2. Hypothesis Test 2. Our model predicts that the many items conditions of both tasks should lead to higher activation in IFS than the few items conditions, since more items need to be stored in the sound sequence working memory. This should be true in both the nonsense utterance and semantic generation tasks. In contrast, the Crosson et al. hypothesis predicts no difference in the amount of IFS activation for many items vs. few items in the nonsense utterance task, since neither condition involves a semantic component. We will test these predictions by performing a many – few items contrast for both the nonsense and semantic generation tasks. The model’s hypothesis will be supported, and the Crosson hypothesis rejected, if significantly more activation is found in left IFS for this contrast in both the nonsense and semantic generation tasks.
If no significant activation is found for this contrast in either the semantic generation or nonsense utterance task, we will take this as evidence against the model’s hypothesis that left IFS represents a speech sound working memory that does not differentiate between meaningful and nonsense utterances. Collectively, Hypothesis Tests 1 and 2 will allow us to choose between the competing proposals of our model and Crosson’s model. Hypothesis Test 3. Our model predicts greater signaling from IFS to BA 44 during the many items condition of both tasks since a greater number of speech sound map cells will be activated (see also Shuster & Lemieux, 2005). This hypothesis will be supported if SEM analysis indicates a stronger IFS-to-BA 44 path in the many items condition than in the few items condition.

Comparing model activations to fMRI experimental results. The CQ+DIVA model will be trained to produce the utterances used in the preceding fMRI experiment, and simulations of the model producing the stimuli from the experiment will be performed. In simulations of the semantic generation task, utterances from the experimental subjects will be used for the model simulation stimulus set. Note that the CQ+DIVA model does not differentiate between semantically meaningful and nonsense utterances; thus we expect the model simulations to show the same activations in the two tasks (semantic generation and nonsense utterance). In addition to verifying that the model is capable of producing the syllables for each utterance in the proper order (thus testing the functionality of the CQ components of the model as well as the frame generation mechanism), we will also generate simulated fMRI data as the model produces the utterances from the experiment, and the same contrasts performed in the fMRI experiment will be performed on the resulting data to allow direct comparisons between simulated fMRI activity and the experimental results.
We will compare the locations and magnitudes of the peak activations in the model to those found in the fMRI experiments. If the model’s cell activities differ significantly from the experimental results (as described above for Experiment 1), the model will be modified to be in accord with the experimental findings by relocating model cells and/or changing the hypothesized functionality of a particular brain region in the model.

D.2 Subproject 2: Investigating the learning of new speech sound sequences. This subproject will consist of two modeling projects and two corresponding fMRI/behavioral experiments that investigate the learning of new speech sequences. Sakai et al. (2003) define motor sequence learning as “a process whereby a series of elementary movements is re-coded into an efficient representation for the entire sequence” (p. 229). In terms of speech learning, this implies that familiar phoneme or syllable motor programs might be combined into larger “chunks” that can be efficiently manipulated and read out for rapid speech. The proposed modeling projects will investigate two possible loci for learning to produce familiar speech sound sequences: the basal ganglia and the cerebellum. These areas are well studied and anatomically quite distinct. Our computational studies are designed to help determine how these areas individually contribute to sequence learning, and will generate predictions that can be verified or falsified by assessing the results of the proposed fMRI studies.
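The CQ principle underlying these projects can be illustrated with a minimal sketch (the function and the toy activation gradient below are our illustration, not the proposed implementation): every item of a planned sequence is active in parallel, with activation strength coding serial position; the most active item wins the choice-layer competition, is produced, and is deleted from the plan.

```python
import numpy as np

def cq_readout(plan):
    """Serial readout from a parallel activation gradient.

    In competitive queuing (CQ), all items of the planned sequence are
    active simultaneously, with activation strength coding serial
    position; the most active item wins, is produced, and is then
    deleted from the plan buffer.
    """
    plan = np.asarray(plan, dtype=float).copy()
    order = []
    while np.any(plan > 0):
        winner = int(np.argmax(plan))  # choice layer: winner-take-all
        order.append(winner)
        plan[winner] = 0.0             # deletion of the produced item
    return order

# A primacy gradient over four syllables is read out left to right.
assert cq_readout([0.9, 0.7, 0.5, 0.3]) == [0, 1, 2, 3]
```

Because the whole sequence is represented in parallel before production begins, such a model naturally accommodates the pre-movement rank-ordered activity reported by Averbeck et al. (2002).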

Both the basal ganglia and cerebellum have been shown to be important for sequence learning in various studies (e.g. Miyachi et al., 1997; Lu et al., 1998; Nixon and Passingham, 2000; Doyon et al., 2002; Shin and Ivry, 2003). The storage of detailed sequence-specific information in these subcortical structures may reduce demands on higher-level cognitive processes mediated by prefrontal cortical areas during sequence performance. While this type of learning-driven shift in processing regions has inspired theories of sequential motor control for limb, eye, and finger movement sequences (e.g. Hikosaka et al., 1999; 2000; Nakahara et al., 2001; Rhodes and Bullock, 2002; Rhodes et al., 2004), the nature and limits of such off-loading remain

PHS 398 (Rev. 09/04) Page> 35

controversial and have yet to be determined, particularly for well-learned sequences in speech. In the modeling studies, we hypothesize ways in which the basal ganglia and cerebellum could be utilized in learning without sacrificing flexibility in performance. We will use behavioral studies to verify that subjects indeed improve performance in our sequence learning tasks prior to the corresponding fMRI investigations of supra-syllabic (Exp. 1) and sub-syllabic (Exp. 2) sequence learning. The results of these experiments will, in turn, be used to further constrain our sequence learning model.

Modeling Project 1: Basal ganglia contributions in sequence learning. A key finding from studies that examined visuo-motor sequence learning through early, intermediate, and late phases (Hikosaka et al., 2000) is a progressive decrease in activity in prefrontal/pre-SMA/anterior striatum (caudate) and a progressive increase in activity in SMA/posterior striatum (putamen). This generalization is supported by single-cell studies in non-human primates (Nakamura et al., 1998, 1999; Miyachi et al., 1997; Miyachi et al., 2002) and by brain imaging experiments in humans (Sakai et al., 1998). Nakahara et al. (2001) proposed a non-CQ computational model15 of processes that might underlie these activity shifts; in contrast, we propose a CQ-consistent model to address these and further data on contributions of BG circuits and their associated parts in frontal cortex.

In our proposed model, there are two different sources of timing signals that initiate SMA trigger cell activation and thus trigger production of the next item during self-timed speech16: the SMA BG loop or cortico-cortical input from the pre-SMA. According to our model, when a child first begins producing speech sequences, his/her brain relies heavily on prefrontal cortical areas and the pre-SMA to trigger the onset of each sound in the sequence. During this time, the BG is effectively “monitoring” the time course of SMA activity generated by cortex, and after some practice with a particular sequence, the SMA BG loop becomes capable of activating and deactivating the SMA trigger cells internally (i.e., without requiring pre-SMA timing input to SMA). This is possible because the SMA BG loop receives a wide range of sensory, motor, and prefrontal input (including input from IFS, which codes the planned sequence, as well as motor and sensory cortical information concerning ongoing speech movements) that it can use as context for determining when to activate a particular SMA cell. Thus, the SMA BG loop is hypothesized to be involved in activating SMA trigger cells for “automatic tasks” that have been practiced before, including commonly occurring syllable strings. The generation of trigger signals by the BG when producing a familiar speech sound sequence can be envisioned as follows. High-level cortical areas such as BA 45 generate the sequence and activate the appropriate IFS working memory cells and pre-SMA frame cells. A pre-SMA to SMA signal activates an SMA trigger cell for the first item in the sequence, and this leads to motor execution of that item. Based on earlier learning, the BG are capable of recognizing the motor commands and sensory feedback associated with completion of the item17.
When the BG recognize completion, they send a signal to SMA that terminates trigger cell activation for the current sound and initiates activation of the trigger cell for the next sound. In this way, the BG are responsible for the timing of heavily practiced sound sequences. In keeping with this view, several studies have indicated a role for BG in timing of movements within a sequence (Rao et al., 1997; Harrington and Haaland, 1998).
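This triggering scheme can be sketched as a simple loop (a toy illustration under our own simplifying assumptions; the function names and the fixed completion time are hypothetical, not part of the model specification):

```python
def produce_sequence(items, is_complete):
    """Produce items in order, advancing only when the BG-like
    completion detector fires for the current item."""
    produced, current, t = [], 0, 0
    while current < len(items):
        t += 1                              # motor execution unfolds in time
        if is_complete(items[current], t):  # BG recognize completion
            produced.append(items[current]) # trigger cell terminated...
            current += 1                    # ...and next item triggered
            t = 0
    return produced

# Here every item "completes" after three time steps.
assert produce_sequence(["go", "di", "va"], lambda item, t: t >= 3) == ["go", "di", "va"]
```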

In this framework, prior to any practice, production of a novel speech sequence (and, more generally,

15 Like many prior models, the Nakahara et al. model proposed “central chaining” (networks with internal recurrence) as the underlying serial representation. Early in learning the model forms a response chain using visual coordinates, whereas in later learning it forms a chain using motor coordinates. This model has several shortcomings. Although it was applied to a task that requires monkeys to learn a “hyperset” sequence comprising five two-item sub-sequences, the proposed learning process completely ignores the two-item chunks, and in effect learns a ten-item chain. As such, the model cannot explain abundant evidence for chunks. Moreover, because there is no CQ component in such chaining models, the only representation of sequences is implicit in the recurrent weight matrices that govern state transitions during performance. Thus there is no way that such models can explain the neurophysiological data (e.g., Averbeck et al., 2002) that show a pre-movement rank ordering, across simultaneous neural activities in prefrontal cortex, that predicts the forthcoming behavioral sequence. Finally, Nakahara et al. reduce the BG loop to computation of a softmax function.
In contrast, our proposed sequencing model provides an alternative basis for modeling the caudate to putamen shift in a way that is consistent both with observations of parallel sequence representation and with the complex BG functionality (Brown et al., 2004; see Appendix materials) implied by the complexity of BG-cortical connectivity.
16 Note that externally timed sequencing tasks are proposed to involve a third source of timing signal, arriving at SMA via the sensory areas that process the timing signal (see Initiating the chosen sound at the right time in D.1).
17 Under normal and fast speaking conditions, we expect that the motor command is the dominant cue for determining completion of the sound, since sensory feedback is significantly delayed relative to the motor command; thus, reliance on a sensory completion signal would lead to pauses between speech sounds, or prolongations of the current sound, while the system waits for sensory evidence of current sound completion before triggering the next sound. It is possible, however, that sensory information regarding the early part of the production of the speech sound also factors into identifying completion of the sound by the BG. For example, if the sensory delay is 75 ms, then the sensory state corresponding to 75 ms before completion of the sound could act (in combination with the motor command corresponding to the end of the sound) to signal completion of the sound and to initiate production of the next sound.


a novel movement sequence of any sort) relies heavily on prefrontal and pre-SMA circuits for timing. These areas are part of BG loops that primarily involve anterior portions of the striatum, particularly in the caudate (Parent and Hazrati, 1995). With practice, as described above, the SMA BG loop becomes more and more involved in production of the sequence. This loop involves more posterior portions of the striatum, particularly the putamen. Thus this view is in accord with the main results reported by Hikosaka and colleagues. Below we propose experiments that extend these studies to the learning of new speech sequences.

In the first modeling study we will define the “learning law” equations that modulate the strengths of synapses in the SMA BG loop, and we will implement these equations in the CQ+DIVA model computer software. We will also modify the SMA BG loop as described below to allow it to identify completion of the current sound. We will perform simulations of the extended model performing speech sequence learning tasks to verify its ability to acquire new sequences in a manner consistent with human behavioral data. These tasks will include the learning of sub-syllabic and supra-syllabic sequences as in the experiments described below. The model will be fit to data obtained about the rate of learning new sequences, reaction times, and acoustic durations throughout the course of learning; this fitting will involve adjustment of learning rate parameters in the learning law equations. Synthetic fMRI data will be generated from the model simulations and compared to the results of the imaging experiments (described further below).
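As a sketch of the kind of learning law we have in mind (the delta-rule form, dimensions, and parameters below are illustrative placeholders; the actual equations are to be defined in this modeling study), corticostriatal weights can be adjusted until the BG's prediction of an SMA trigger-cell state change from the cortical context matches the observed change:

```python
import numpy as np

def update_weights(w, context, trigger_change, lr=0.1):
    """Delta-rule update: move the BG prediction (w . context) of the
    SMA trigger-cell state change toward the observed change."""
    prediction = float(np.dot(w, context))
    return w + lr * (trigger_change - prediction) * context

rng = np.random.default_rng(0)
w = np.zeros(8)
context = rng.random(8)       # motor/sensory/IFS context "snapshot"
for _ in range(200):          # repeated practice with one sequence
    w = update_weights(w, context, trigger_change=1.0)

# After practice the loop predicts the trigger transition on its own,
# i.e., without requiring the pre-SMA timing input.
assert abs(np.dot(w, context) - 1.0) < 1e-3
```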

In order to recognize completion of an ongoing sound, the BG must receive inputs from the relevant motor, somatosensory, and (possibly) auditory areas. The DIVA model currently includes motor, somatosensory, and auditory representations that are appropriate for this purpose. We will implement projections from these areas to the striatum in the CQ+DIVA model, specifically to the portions of the putamen that are part of the SMA BG loop. This is in keeping with the notion of a sensorimotor loop (Parent and Hazrati, 1995). Furthermore, projections from the model’s working memory representation (left IFS) to the portion of the striatum corresponding to the SMA BG loop will be added; the pattern of IFS inputs to the BG identifies the sequence being produced, whereas the motor and sensory inputs identify the state of the motor execution of the current sound. Together they provide all the information needed for identifying the completion of the current sound in a sequence and triggering of the next sound. Recall that each striatal cell in the model’s SMA BG loop corresponds to a different cortical column in SMA (Section D.1). Each time the SMA trigger cell from that column changes state (i.e., goes from inactive to active or from active to inactive), the BG will effectively take a “snapshot” of the current contextual inputs and will associate it with the change in state of the SMA trigger cell. This type of associative memory operation is a common property of many neural network architectures. Over time, this learning process will allow the BG to generate an appropriate timing signal based on the contextual inputs alone, without requiring cortico-cortical processes to activate/inactivate the SMA trigger cell via pre-SMA.

Modeling Project 2: Sequence learning in the cerebellum. In this modeling project, we will examine how cerebellar learning can contribute to sequencing of speech sounds.
As in modeling project 1, we will implement biologically realistic learning equations that modify synaptic weights, in this case within the model cerebellum and cortico-cerebellar loops (e.g. Allen & Tsukahara, 1974; Middleton & Strick, 2000), and we will again make comparisons between data simulated from the model and the reaction time, duration, and fMRI results of the learning experiments described below. Like the basal ganglia, the cerebellum receives widespread inputs from motor, somatosensory, and prefrontal cortical areas (e.g. Schmahmann & Pandya, 1997) and its output nuclei project to many cortical areas including the SMA and pre-SMA (Wiesendanger & Wiesendanger, 1985; Matelli et al., 1995) and BA 46/9 (Dum & Strick, 2003). Furthermore, both the basal ganglia and cerebellum included portions that were more active for complex syllable sequences than for simple syllable sequences in our preliminary fMRI study (Section C.4). Although cerebellar and basal ganglia input systems are not completely redundant, the two receive much of the same contextual information from the cortex. However, unlike the striatum (Zheng & Wilson, 2002), the cerebellar cortex has connectivity sufficient to perform a sparse expansive recoding of its input. By implementing both of these adaptive components in neurobiologically realistic computer simulations of the CQ+DIVA model, and by comparing simulations to the results of our proposed fMRI and behavioral studies, we hope to clarify the precise nature of cerebellar and basal ganglia involvement in learning of speech sound sequences.
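The general approach to generating synthetic fMRI data from such simulations (detailed in Section C.2) can be sketched as follows; the canonical two-gamma HRF parameters and the rectification of inhibitory input below are illustrative assumptions of this sketch, not the proposal's exact method:

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma(shape, scale=1) density, used to build the canonical HRF."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / math.gamma(shape)
    return out

def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Canonical two-gamma hemodynamic response function."""
    return gamma_pdf(t, peak) - ratio * gamma_pdf(t, undershoot)

def simulate_bold(synaptic_activity, dt=0.1):
    """Rectify synaptic activity (inhibitory input also drives the
    BOLD signal) and convolve it with the HRF."""
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    return np.convolve(np.abs(synaptic_activity), hrf)[: len(synaptic_activity)]

activity = np.zeros(400)
activity[50:60] = -1.0             # a brief burst of inhibitory input
bold = simulate_bold(activity)
assert bold.max() > 0.0            # rectified input yields a positive response
assert int(np.argmax(bold)) > 59   # BOLD peak lags the neural event
```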

In prior research, we defined biologically plausible neural network models of learning in cerebellum to characterize its function in typing-like performances and in eyeblink conditioning (Rhodes & Bullock, 2002; Fiala et al., 1996). In this project, we will incorporate such a network within the CQ+DIVA model (in addition to the BG learning circuit described in Modeling Project 1). In particular, Rhodes & Bullock (2002) simulated how the cortico-cerebellar loop could learn long-term sequence memories for familiar items and retrieve them into a CQ planning layer. This enables the cortex to treat familiar sequences as single items, i.e., “chunks” (which behave similarly in manual and speech sequences; see Klapp, 2003). This will augment the CQ+DIVA model with an ability to incrementally learn what Levelt et al. (1999) called a “syllabary”, i.e., a representation of high frequency syllables that does not necessarily respect word boundaries but aids in rapid speech production18.
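The sparse expansive recoding attributed to cerebellar cortex above can be sketched as a random expansion followed by k-winners-take-all (the dimensions and the value of k below are illustrative choices, not values from the model):

```python
import numpy as np

def sparse_expand(x, weights, k=10):
    """Project a low-dimensional cortical context into a much larger
    granule-cell-like population and keep only the k most active units
    (k-winners-take-all)."""
    h = weights @ x
    out = np.zeros_like(h)
    top = np.argsort(h)[-k:]          # indices of the k winners
    out[top] = h[top]
    return out

rng = np.random.default_rng(1)
W = rng.standard_normal((1000, 20))   # 20-D context -> 1000 granule units
code = sparse_expand(rng.standard_normal(20), W)
assert np.count_nonzero(code) == 10   # only k of 1000 units stay active
```

Such sparse, high-dimensional codes make contexts that are similar at the cortical level easy to discriminate downstream, which is the functional advantage attributed to the cerebellar expansion here.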

18 Dell et al. (1999) criticized the Levelt et al. model because it could not generate observed patterns of exchange errors.


In the experiments described below, participants will learn to quickly and accurately produce novel

strings of syllables (Experiment 1) or phonemes (Experiment 2). Lesion and imaging studies suggest that multiple memory systems and brain areas may be recruited to support such learning, which occurs many times daily during the rapid vocabulary acquisition phase of language learning. If elements of the sequence are already familiar and easily reproduced in isolation, then such sequence learning has three notable phases: (1) good perceptual sequence recognition, but perhaps incomplete (error-prone) recall, following one or a few exposures; (2) robust sequence recall sufficient to accurately reproduce the sequence with moderate fluency; and (3) integration of the sequence into a single “chunk” with highly fluent production. The number of training trials in our experiments will allow subjects to reach at least stage (2) for learned sequences (cf. Klapp, 2003).

Behavioral Experiment 1: Learning of new supra-syllabic sequences. The purpose of this experiment is to prepare subjects for a subsequent fMRI experiment that will look for differences in brain activation for learned sequences of syllables compared to novel sequences. In order to carefully control the degree to which particular sequences of syllables have been learned, and to avoid potential confounds due to semantic content, we will construct a list of 18 nonsense pseudowords, each consisting of four concatenated syllables (e.g., “nerGERpretez”), and train each subject to produce a subset of the pseudowords. The pseudowords are generated by using random permutations of 4 syllables according to several constraints: 1) all stimuli have the same syllable structure ([CVC][CVC][CCV][CVC]); 2) stress is always assigned to the second syllable; 3) all syllables occur with approximately the same frequency in English; 4) consecutive syllable-syllable pairs occur with negligible frequency in English; 5) the phonological neighborhood density for all stimuli is zero; 6) none of the syllables are words in English.
Frequency information is derived from the CELEX-2 database; neighborhood density is assessed using the Washington University Neighborhood Database.
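The construction procedure can be sketched as follows (the syllable inventories below are hypothetical placeholders; the real inventories are screened against the CELEX-2 frequency and neighborhood-density criteria listed above):

```python
import random

# Hypothetical syllable inventories; real stimuli must also satisfy the
# frequency, bigram, and neighborhood-density constraints in the text.
CVC = ["ner", "tez", "gom", "bip", "duf", "kem"]
CCV = ["pre", "gra", "spo", "tru"]

def make_pseudoword(rng):
    """Build one [CVC][CVC][CCV][CVC] pseudoword; upper case marks the
    obligatory stress on the second syllable (cf. "nerGERpretez")."""
    s1, s2, s4 = rng.sample(CVC, 3)   # three distinct CVC syllables
    s3 = rng.choice(CCV)
    return s1 + s2.upper() + s3 + s4

rng = random.Random(42)
words = [make_pseudoword(rng) for _ in range(18)]
assert all(len(w) == 12 for w in words)
assert all(w[3:6].isupper() for w in words)   # stress on syllable 2
```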

The experiment will involve two approximately 1-hour sessions for each of 17-34 healthy adult subjects. The first run of the first session19 will be a pre-test that measures mean error rates, reaction times, and durations as the subject produces the 18 pseudowords, presented 5 times each, in a simple reaction time experiment. At the beginning of each trial, the pseudoword will be presented orthographically and auditorily (to promote proper pronunciation) for 1.5 s. After a random delay period (500 ms to 2000 ms), a go signal in the form of a beep will be presented, and the subject will be instructed to accurately say the word as quickly as possible after the beep; then the next trial begins (total trial length approx. 5 s). After the pre-test, the subject will perform 6 training runs (each approximately 8-10 minutes long) in which they repeatedly produce 6 of the pseudowords from the list in the same serial reaction time task (10 presentations of each stimulus in each run). These 6 pseudowords will constitute the learned sequences for that subject, and the remaining 12 pseudowords, which are not encountered during the training runs, will constitute the unlearned sequences for that subject20. The second training session will be scheduled one day after the first training session21, at which time the subject will perform six additional training runs followed by a post-test identical to the pre-test.

Hypothesis Test 1. Based on the results of serial reaction time studies (Schendan et al., 2003; Doyon, 2002, 2003; Aizenstein, 2004) and prepared sequence reaction time studies (Klapp, 2003) involving the learning of novel sequences, we expect reaction times to decrease more for the learned sequences than the unlearned sequences as a result of training22. This hypothesis will be tested by performing a repeated measures ANOVA (two-way interaction term between training session [pre- vs. post-training] and sequence set [learned vs. unlearned], one-tailed, p < 0.05). Hypothesis Test 2. We also expect a relatively larger decrease in the duration of the learned pseudoword productions after training compared to unlearned pseudowords (same test as Hypothesis 1 but with duration as outcome). Hypothesis Test 3. Finally, we expect a relatively larger decrease in the error rates of the learned pseudoword productions after training compared to the unlearned pseudowords (log-linear test for two-way interaction in crosstabulation of correct/incorrect productions, training session, and sequence set, p < .05). Error rates will be determined by presenting produced

Levelt et al. (1999) responded that they might be able to address this problem by adopting a central CQ assumption: production of a segment causes deletion of that segment’s representation from the output buffer. The proposed CQ+DIVA model, augmented with trans-cerebellar chunking to provide the production-speed advantages of a “syllabary”, will continue to exhibit realistic exchange errors because cerebellar output will act via the planning level of the CQ+DIVA model.
19 Prior to the first run, subjects will undergo a practice run with stimuli that will not appear in the subsequent experiment. Our pilot studies indicate that such a run greatly reduces variability in the pre-test reaction time and duration measures.
20 The learned and unlearned sequence lists will be balanced for average reaction time, duration, and error rate as determined from pilot runs. Different subjects will learn different pseudowords to control for additional possible biases.
21 Studies indicate that consolidation of motor sequence learning benefits from sleep (e.g., Doyon and Benali, 2005).
22 Reaction time experiments typically report a task learning effect in addition to sequence learning. Thus we anticipate that subjects will show faster reaction times in the post-test than the pre-test for both learned and unlearned sequences. The important measure for our purposes is the relative sizes of these reductions; specifically, reaction time for the learned sequences should decrease more than for the unlearned sequences. Our statistical tests are designed to address this issue.
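The logic of Hypothesis Test 1 can be sketched on simulated data (our simplification: with two sessions and two sequence sets, the within-subject interaction reduces to a one-tailed paired test on the difference of pre-to-post reaction time reductions; the effect sizes below are invented for illustration):

```python
import numpy as np

def interaction_t(pre_l, post_l, pre_u, post_u):
    """Paired t statistic on the subject-wise difference between the
    RT reduction for learned and for unlearned sequences."""
    diff = (pre_l - post_l) - (pre_u - post_u)
    n = len(diff)
    return diff.mean() / (diff.std(ddof=1) / np.sqrt(n)), n - 1

rng = np.random.default_rng(3)
n = 17                                       # planned "significant learners"
pre_l = 600 + rng.normal(0, 30, n)           # simulated RTs (ms)
post_l = pre_l - 80 + rng.normal(0, 20, n)   # large drop for learned set
pre_u = 600 + rng.normal(0, 30, n)
post_u = pre_u - 30 + rng.normal(0, 20, n)   # smaller drop (task effect only)
t, dof = interaction_t(pre_l, post_l, pre_u, post_u)
assert dof == 16
assert t > 1.746    # one-tailed critical t for p < .05 at df = 16
```

Note that both sets improve (the generic task effect of footnote 22); the test is sensitive only to the extra improvement for the learned set.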


stimuli in random order to three speech scientists for error tagging23. An utterance will be considered an error if two of the three scientists tag it as erroneous.
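The 2-of-3 tagging rule amounts to a simple majority vote (a trivial sketch, with a hypothetical function name):

```python
def is_error(tags):
    """An utterance counts as an error when at least two of the three
    raters (speech scientists) tag it as erroneous."""
    return sum(tags) >= 2

assert is_error([True, True, False])
assert not is_error([True, False, False])
```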

It is likely that error rates will show floor effects for some subjects who make very few errors even early in learning. Also, the duration/reaction time differences may be quite small in some subjects. Thus it is unlikely that all subjects will show significant differences in all three measures (error rate, duration, reaction time) even if they learn to produce the sequences fluently. We will conclude that a subject has undergone significant learning of the sequences if any two of Hypotheses 1-3 are supported for that subject. We have performed pilot studies (involving 3 subjects) of the experimental protocols described here and in Behavioral Experiment 2 below. Subjects showed significant decreases in error rate and duration for the learned words (Hypotheses 2 and 3), but the changes in reaction time for these subjects were not statistically significant (Hypothesis 1). This appeared to be due (at least in part) to having only 3 learned pseudowords for each subject24. We will use 6 learned pseudowords rather than 3 in the proposed studies in order to increase statistical power for the reaction time measure. Only subjects who show significant learning will be used in the subsequent fMRI experiment. Based on our pilot study, we anticipate that we may need to run as many as 34 subjects to obtain 17 “significant learners” for the fMRI experiment.

fMRI Experiment 1: Brain activations underlying the learning of new supra-syllabic sequences. This experiment is designed to test several hypotheses discussed in the theoretical background above. The 17 “significant learners” from the behavioral experiment will participate in this fMRI study, which will compare the brain activity patterns underlying the production of learned and novel syllable sequences25.
In a given trial, the subject will be presented with either a learned sequence from the behavioral experiment or an unlearned sequence not encountered in the behavioral experiment to produce in a simple reaction time task. The learned and unlearned sequence trials will be the same as in the behavioral experiment except that we will use a longer total trial length to accommodate our event-triggered scanning protocol (see fMRI experimental protocol above). A third, baseline condition will consist of viewing XXXX on the screen; subjects will be instructed to rest quietly during trials that start with the XXXX stimulus. The three different trial types (learned sequences, unlearned sequences, and baseline) will be randomly interspersed within a run. During all parts of the fMRI study, the experimenter will monitor the subject's productions, and if frequent errors in production are detected, the experimenter will remind the subject between runs of the correct pronunciations. The event-triggered design of our fMRI protocol allows us to remove trials in which subjects have committed production errors. After completion of the fMRI session, recordings of the subject's utterances will be reviewed and erroneous productions removed from subsequent analyses. According to our power analysis, average error rates as high as 50% will still allow detection of significant activation changes in the fMRI analysis.
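The intuition behind this error-rate allowance can be sketched with a back-of-the-envelope calculation (ours, not the formal power analysis of Section D): discarding error trials shrinks the per-subject trial count, which inflates the standard error of the condition-difference estimate by the reciprocal square root of the retained fraction.

```python
import math

def se_inflation(error_rate):
    """Factor by which the standard error of a condition-difference
    estimate grows when error trials are discarded, assuming SE scales
    as 1/sqrt(number of retained trials)."""
    return 1.0 / math.sqrt(1.0 - error_rate)

# Even when half of the trials are discarded, the standard error grows
# only by a factor of sqrt(2) (about 41%), so a design with comfortable
# margin at low error rates retains sensitivity.
assert abs(se_inflation(0.5) - math.sqrt(2.0)) < 1e-12
```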

According to the theoretical framework described earlier in this section, for novel syllable sequences the pre-SMA and associated circuitry in the anterior striatum (caudate) must send individual trigger signals for each syllable to the SMA. In contrast, for a learned sequence only the first trigger signal comes from pre-SMA, while the trigger signals for the remaining syllables are transmitted to SMA via the SMA BG loop, which involves more posterior regions of the striatum (putamen). To test these hypotheses, we will perform a contrast between the learned and unlearned conditions to identify significant activity differences (random effects, FDR = 0.05) between the two conditions (learned – unlearned contrast and unlearned – learned contrast). Hypothesis Test 1. The hypothesis of more pre-SMA involvement for unlearned sequences will be supported if we find significant activation in the pre-SMA in the unlearned – learned contrast. If no significant activation difference between the learned and unlearned sequences is found in pre-SMA, we will conclude that pre-SMA is likely to be equally involved in the production of novel and learned supra-syllabic speech sequences, unlike other types of movement sequences (Sakai et al., 1998; Wu et al., 2004; Doyon, 2002). Hypothesis Test 2. The hypothesis of more caudate activity for unlearned sequences will be supported if we find significant activation in the caudate in the unlearned – learned contrast. If no significant activation difference between the learned and unlearned sequences is found in caudate, we will conclude that the caudate is likely equally involved in the production of novel and learned supra-syllabic speech sequences, unlike other types of movement sequences (Floyer-Lea, 2004). Hypothesis Test 3. The hypothesis of more putamen activity for learned sequences will be supported if we find significant activation in the putamen in the learned – unlearned contrast.
If no significant activation difference between the learned and unlearned sequences is found in putamen, we will conclude that the putamen is likely equally involved in the production of novel and
23 We will count all phoneme and syllable substitutions, omissions, and insertions as errors, as well as incorrect stress patterning. Analysis of the acoustic signal (including f0, intensity, and duration measures) will be used to aid in this process.
24 The number of presentations of each pseudoword in the pre-test must be small since it is matched to the number of presentations of each unlearned word, and the number of presentations of unlearned words must be limited to prevent significant learning of these words during the pre-test.
25 Subjects will undergo two additional behavioral training runs with the learned sequence stimuli approximately 1-2 hours prior to the scanning session to re-familiarize themselves with these stimuli.


learned supra-syllabic speech sequences, unlike other types of movement sequences (Floyer-Lea, 2004; Doyon, 2002). Hypothesis Test 4. According to our model, the pre-SMA sends many more trigger signals to SMA for novel than for learned sequences. To test this hypothesis, we will compare the strength of effective connectivity between SMA and pre-SMA in the two conditions. Our hypothesis will be supported if we find significantly stronger effective connectivity in the novel condition than in the learned condition. Hypothesis Test 5. According to our model, the basal ganglia send many more trigger signals via thalamus to SMA for learned sequences than for novel sequences. To test this hypothesis, we will compare the strength of effective connectivity between the ventral lateral thalamic nucleus and the SMA in the two conditions. Our hypothesis will be supported if we find significantly stronger effective connectivity in the learned condition than in the novel condition.
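The FDR = 0.05 criterion used throughout these contrasts corresponds to the Benjamini-Hochberg procedure, sketched here on a handful of simulated voxel p-values (SPM's implementation differs in detail; the p-values are invented for illustration):

```python
import numpy as np

def fdr_threshold(pvals, q=0.05):
    """Largest p-value cutoff satisfying the Benjamini-Hochberg rule:
    p_(i) <= q * i / m for the sorted p-values p_(1) <= ... <= p_(m)."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = len(p)
    below = p <= q * np.arange(1, m + 1) / m
    if not below.any():
        return 0.0                 # no voxel survives correction
    return float(p[np.nonzero(below)[0][-1]])

voxel_p = [0.001, 0.008, 0.039, 0.041, 0.27, 0.6, 0.74, 0.9]
thr = fdr_threshold(voxel_p, q=0.05)
assert thr == 0.008                # the first two voxels survive
```

Unlike a fixed voxelwise alpha, the cutoff adapts to the observed p-value distribution, controlling the expected proportion of false positives among the voxels declared active.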

According to our theoretical framework, the cerebellum learns to “chunk” frequently occurring sub-phrases to reduce the working memory load in IFS (see also Rhodes and Bullock, 2002). Thus we should see less IFS activity and more cerebellar output nucleus activity (i.e., more deep cerebellar nucleus activity) for learned sequences. Hypothesis Test 6. If significant activity is found in the left IFS for the unlearned – learned contrast, the hypothesis of reduced working memory load in IFS for learned sequences will be supported. Hypothesis Test 7. If significant activity is found in the deep cerebellar nuclei for the learned – unlearned contrast, the hypothesis of cerebellar involvement in the learning of frequently occurring sequence “chunks” will be supported. If Hypotheses 6 and 7 are not supported, we will conclude that the cerebellum is not involved in learning frequently occurring syllable sequences, and instead it may be involved in learning novel sub-syllabic sequences; this will be tested in fMRI Experiment 2.

Behavioral Experiment 2: Learning of new sub-syllabic sequences. This experiment will be the same as Behavioral Experiment 1, except that subjects will be trained on novel sub-syllabic sequences; i.e., syllables that are composed of phonemes from American English, but which rarely, if ever, occur in the language. These single syllable stimuli will be constructed from phonotactically illegal and/or highly infrequent syllable onsets and codas. For example, the syllable onsets /tl/ and /pf/ are not used in English, but occur in other languages. Likewise, certain consonant clusters rarely if ever appear in the coda position (e.g. /lthk/). A set of unfamiliar syllables will be constructed by combining such onsets and codas with vowels (/i/, /a/, /u/) to form syllables like /pfalthk/. A similar approach to creating improbable speech stimuli from infrequent phonological sequences has been used in several prior studies (e.g.
Treiman et al., 2000; Munson, 2001; Vitevitch et al., 1997). To verify that the syllables we create are highly infrequent or non-existent in spoken or written English, we will search the CELEX-2 database to ensure that it contains no instances of each syllable. Because subjects will presumably have no practice with these syllables26, according to Levelt’s model they should not be part of the mental syllabary and they should be produced with longer reaction times (Levelt and Wheeldon, 1994) than frequently occurring English syllables. Furthermore, according to our model they should involve more pre-SMA activity than more frequently occurring syllables. We will perform the same hypothesis tests described for Behavioral Experiment 1, and we will conclude that a subject is a “significant learner” if 2 of the 3 hypotheses are supported for that subject (see Behavioral Experiment 1 for details). Only significant learners will participate in the following fMRI experiment. Although subjects may insert additional sounds to make the sequences more familiar at the start of learning, our pilot data indicate that the use of an auditory example reduces this tendency, and the error rate is not high enough to prohibit proper analyses excluding those trials.

fMRI Experiment 2: Brain activations underlying the learning of new sub-syllabic sequences. In this fMRI experiment, we will use the same stimulus set used in Behavioral Experiment 2 to measure BOLD activity for learned vs. unlearned sub-syllabic sequences. The experimental protocol will be identical to fMRI Experiment 1, the only difference being the use of sub-syllabic stimuli from Behavioral Experiment 2 rather than the supra-syllabic stimuli used in fMRI Experiment 1. As in fMRI Experiment 1, our analysis will look for differences in brain activation for learned vs. unlearned sequences. We will perform the same 7 hypothesis tests for this experiment as performed in fMRI Experiment 1.
Erroneous utterances will be removed from the analysis as described for Experiment 1. We will compare the results of the two experiments to identify similarities and differences in the brain mechanisms used to learn sequences at the sub-syllabic and supra-syllabic levels. Together these experiments and associated modeling projects should significantly improve our understanding of the brain mechanisms involved in learning to produce fluent speech sequences.

Comparing model activations to fMRI experimental results. The CQ+DIVA model will be trained to produce the supra- and sub-syllabic sequences used in the fMRI experiments, and simulations of the model producing the stimuli from the experiment will be performed. In addition to verifying that the model is capable of properly producing the sequences, we will also generate simulated fMRI data and perform the same contrasts as outlined for the fMRI experiments. We will then compare the simulated fMRI activity and the experimental

26 For this study, we will recruit only subjects who speak only American English. This eliminates a possible confound with subjects who have experience with languages that use sub-syllabic phoneme transitions that are illegal in English.

PHS 398 (Rev. 09/04) Page 40

results as described previously. Particular attention will be paid to differential activations of cortical and subcortical areas in the two tasks; we expect to gain a much clearer picture of how the basal ganglia and cerebellum interact with cortical regions during speech sequence learning based on these results. The model will be modified accordingly.

D.3. Timeline of proposed research


[Timeline figure spanning Years 1–5: Behavioral Expts D.2.1 and D.2.2 (data collection & analysis); fMRI Expts 1.1, 1.2, 2.1, and 2.2 (data collection, then analysis & writeup); Modeling Projects 1.1, 2.1, and 2.2.]

E. Protection of Human Subjects

RISKS TO THE SUBJECTS

Human Subjects Involvement and Characteristics: All subjects will be healthy individuals with no history of speech or hearing disorders (except as noted), seizures, or severe claustrophobia. Normally speaking subjects will be recruited by local advertisement and consented by procedures approved by the local Institutional Review Boards. The proposed study is of basic physiology and thus cannot be expected, at least initially, to be influenced by gender or ethnicity. Subjects will therefore be recruited in proportion to their ethnic and gender balance in the local community, with the additional constraint that subjects will be native speakers of American English (Massachusetts demographics according to the year 2000 census data: 7% Latino, 93% not Latino; 84% white, 5% black or African American, 4% Asian, 0.2% American Indian and Alaska Native, with similar numbers of men and women). A further constraint will apply to subjects recruited for the sub-syllabic experiments described in Section D.2: to ensure no exposure to sub-syllabic phoneme combinations that are illegal in American English (which may occur in the experimental stimuli), these subjects must speak only American English. Subjects will be between the ages of 18 and 55, with no form of implant that involves magnetic or electric parts. Because brain lateralization for different speech tasks is among the issues being studied, we will use right-handed subjects to maximize the likelihood that each subject has left-hemisphere dominance for language. A total of approximately 160 subjects (60 of whom speak only American English) will be needed for the experiments proposed herein. In experiments involving visually presented stimuli, subjects will be required to have normal or corrected-to-normal vision (corrected using non-magnetic glasses available at the MGH NMR Center). The proposed dates of enrollment are 4/1/2006 – 3/31/2010.

1. Inclusion Criteria

(1) Age between 18 and 55 years
(2) Normal physical and neurologic examination
(3) Ability to participate in fMRI experiments

2. Exclusion Criteria
(1) Cognitive deficit that could impair ability to give informed consent or competently participate in the study
(2) Active medical, neurologic or psychiatric condition
(3) Presence of an MRI risk factor:
    (1) Known claustrophobia
    (2) An electrically, magnetically or mechanically activated implant (such as a cardiac pacemaker) or intracerebral vascular clip
    (3) Metal in any part of the body, including metal injury to the eye
    (4) Pregnancy
(4) Left-handedness

Sources of Materials: Research material will be in the form of computer data concerning performance on psychophysical tests conducted at Boston University and MRI data recorded at the NMR Center of Massachusetts General Hospital. These data will be obtained specifically for the research purposes described in Sections A-D and will be stored in accordance with HIPAA regulations.

Potential Risks: The planned procedures involve unlikely and negligible risk. There are no known or foreseeable physical risks associated with MEG. There are also no known or foreseeable physical risks associated with undergoing MRI, except for those individuals who have an electronically, magnetically or mechanically activated implant or metal in their body, or are pregnant (see MRI risk factors above). All features of the MRI system to be used in the proposed study have been approved by the FDA and will be operated using parameters accepted by the FDA. Subjects will wear earplugs or headsets as hearing protection as mandated by OSHA. There is the risk of psychological discomfort if a subject has claustrophobia. All potential subjects will be screened for risk factors prior to study enrollment, and all enrolled subjects will again be screened just before undergoing MRI. These screening procedures should exclude subjects with foreseeable risk.

ADEQUACY OF PROTECTION AGAINST RISKS

Recruitment and Informed Consent: Subjects will be recruited by advertising in the form of paper and web-based posters, and electronic mail messages. When an individual volunteers to be a subject, the experimental protocol will be explained in detail verbally and she/he will be given a copy of the consent form to read and sign. Prior to the fMRI experiments, the subject will fill out a questionnaire designed to identify any potentially dangerous health conditions (such as metal implants). The subject will be told that the experimenters will answer any questions about the procedure (except about aspects of the design or hypotheses that might influence their performance).

Protection Against Risk: The MRI equipment is built to ensure the highest possible degree of subject safety. Subjects will be able to terminate experimental sessions for any reason by signaling the experimenters in a manner explained to the subject before the session begins. Subject confidentiality is protected by not using their names or initials in published reports and protecting their data in accordance with HIPAA regulations. All potential subjects will be screened for MRI risk factors prior to study enrollment. If the potential subject cannot rule out the possibility of pregnancy, a pregnancy test will be conducted prior to study enrollment. All enrolled subjects will again be screened just before undergoing MRI. These screening procedures will exclude subjects with foreseeable risks. All subjects will be monitored continuously by research investigators during MRI sessions. Subjects will be able to communicate with research investigators throughout all experimental sessions via a 2-way microphone and an emergency alarm system. All subjects will wear earplugs or headsets during MRI to reduce the transmission of noises of the MRI scanner (e.g., buzzing, beeping) to a comfortable and safe level. If a subject experiences any discomfort that cannot be alleviated by the research investigators, the experimental session will be terminated.

POTENTIAL BENEFITS OF THE PROPOSED RESEARCH TO THE SUBJECTS AND OTHERS AND IMPORTANCE OF THE KNOWLEDGE TO BE GAINED

Subjects will not directly benefit from their participation in the proposed studies, except for subject payment. However, their participation will contribute very useful information concerning the neural mechanisms of normal and dysfluent speech. Thus the risk/benefit ratio is negligible.

Collaborating Sites

Massachusetts General Hospital, Charlestown, MA (OHRP assurance number M1331)

INCLUSION OF WOMEN

Subject recruitment is via advertisement, and it will be clearly stated that women are encouraged to participate in the research. From our past studies we expect women to make up approximately half of our subject pool (see Planned Enrollment Table below).

INCLUSION OF MINORITIES

Subject recruitment is via advertisement, and it will be clearly stated that members of minority groups are encouraged to participate in the research. From our past studies we expect minorities to be represented in our subject pool in approximate proportion to their representation in the local population (see Planned Enrollment Table below).

INCLUSION OF CHILDREN

Children aged 18-21 will be enrolled. Younger children will not be enrolled for the following reasons:

(1) the proposed studies are primarily aimed at determining brain function in the fully developed brain and are not addressing issues of development;

(2) the fMRI experiments require subjects who can patiently perform repetitive tasks without moving and/or losing concentration.



Targeted/Planned Enrollment Table

This report format should NOT be used for data collection from study participants.

Study Title: Sequencing and Initiation in Speech Production

Total Planned Enrollment: 160

TARGETED/PLANNED ENROLLMENT: Number of Subjects

Ethnic Category                                Females   Males   Total
Hispanic or Latino                                   6       6      12
Not Hispanic or Latino                              74      74     148
Ethnic Category: Total of All Subjects*             80      80     160

Racial Categories                              Females   Males   Total
American Indian/Alaska Native                        1       1       2
Asian                                                4       4       8
Native Hawaiian or Other Pacific Islander            1       1       2
Black or African American                            6       6      12
White                                               68      68     136
Racial Categories: Total of All Subjects*           80      80     160

*The "Ethnic Category Total of All Subjects" must be equal to the "Racial Categories Total of All Subjects."


F. Vertebrate Animals

There are no studies involving vertebrate animals in the current proposal.



G. Literature Cited

[1] H. Ackermann and A. Riecker (2004). "The contribution of the insula to motor aspects of speech production: a review and a hypothesis." Brain Lang 89(2): 320-8.

[2] Y. Agam, D. Bullock and R. Sekuler (2005). "Imitating unfamiliar sequences of connected linear motions." Volen Center for Complex Systems Technical Report 2005-3: Submitted for publication.

[3] H. J. Aizenstein, V. A. Stenger, J. Cochran, K. Clark, M. Johnson, R. D. Nebes and C. S. Carter (2004). "Regional brain activation during concurrent implicit and explicit sequence learning." Cerebral Cortex 14: 199-208.

[4] R. L. Albin, A. B. Young and J. B. Penney (1989). "The functional anatomy of basal ganglia disorders." Trends in Neurosciences 12: 366-374.

[5] G. E. Alexander and M. D. Crutcher (1990). "Functional architecture of basal ganglia circuits: neural substrates of parallel processing." Trends in Neurosciences 13: 266-271.

[6] G. E. Alexander, M. R. DeLong and P. L. Strick (1986). "Parallel organization of functionally segregated circuits linking basal ganglia and cortex." Annual Reviews of Neuroscience 9: 357-381.

[7] G. I. Allen and N. Tsukahara (1974). "Cerebrocerebellar communication systems." Physiol Rev 54(4): 957-1006.

[8] M. A. Arbib (in press). "From monkey-like action recognition to human language." Behavioral and Brain Sciences.

[9] B. B. Averbeck, M. V. Chafee, D. A. Crowe and A. P. Georgopoulos (2002). "Parallel processing of serial movements in prefrontal cortex." Proc Natl Acad Sci U S A 99(20): 13172-7.

[10] B. B. Averbeck, D. A. Crowe, M. V. Chafee and A. P. Georgopoulos (2003). "Neural activity in prefrontal cortex during copying geometrical shapes. II. Decoding shape segments from neural ensembles." Exp Brain Res 150(2): 142-53.

[11] H. Barbas and N. Rempel-Clower (1997). "Cortical structure predicts the pattern of corticocortical connections." Cereb Cortex 7(7): 635-46.

[12] M. A. Basso and R. H. Wurtz (1998). "Modulation of neuronal activity in superior colliculus by changes in target probability." Journal of Neuroscience 18: 7519-7543.

[13] S. R. Baum and V. D. Dwivedi (2003). "Sensitivity to prosodic structure in left- and right-hemisphere-damaged individuals." Brain Lang 87(2): 278-89.

[14] J. T. Becker, D. K. MacAndrew and J. A. Fiez (1999). "A comment on the functional localization of the phonological storage subsystem of working memory." Brain Cogn 41(1): 27-38.

[15] D. G. Beiser and J. C. Houk (1998). "Model of cortical-basal ganglionic processing: encoding the serial order of sensory events." J Neurophysiol 79(6): 3168-88.

[16] M. D. Bevan, P. J. Magill, D. Terman, J. P. Bolam and C. J. Wilson (2002). "Move to the rhythm: oscillations in the subthalamic nucleus-external globus pallidus network." Trends Neurosci 25(10): 525-31.

[17] R. M. Birn, P. A. Bandettini, R. W. Cox and R. Shaker (1999). "Event-related fMRI of tasks involving brief motion." Hum Brain Mapp 7(2): 106-14.

[18] I. Boardman and D. Bullock (1991). A neural network model of serial order recall from short-term memory. IJCNN Proceedings, Seattle, Washington.

[19] K. A. Bollen (1989). Structural equations with latent variables. New York, Wiley.

[20] G. Bradski, G. A. Carpenter and S. Grossberg (1994). "STORE working memory networks for storage and recall of arbitrary temporal sequences." Biological Cybernetics 71: 469-480.

[21] J. W. Brown, D. Bullock and S. Grossberg (2004). "How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades." Neural Netw 17(4): 471-510.

[22] C. Buchel, J. T. Coull and K. J. Friston (1999). "The predictive value of changes in effective connectivity for human learning." Science 283: 1538-1541.

[23] C. Buchel and K. J. Friston (1997). "Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modeling and fMRI." Cerebral Cortex 7: 768-778.

[24] E. Bullmore, B. Horwitz, G. Honey, M. Brammer, S. Williams and T. Sharma (2000). "How good is good enough in path analysis of fMRI data?" Neuroimage 11: 289-301.

[25] D. Bullock (2004a). "Adaptive neural models of queuing and timing in fluent action." Trends Cogn Sci 8(9): 426-433.

[26] D. Bullock (2004b). "From parallel sequence representations to calligraphic control: A conspiracy of neural circuits." Motor Control 6: 371-391.

[27] D. Bullock, R. Bongers, M. Lankhorst and P. J. Beek (1999). "A vector-integration-to-endpoint model for performance of viapoint movements." Neural Networks 12: 1-29.

[28] D. Bullock, S. Grossberg and C. Mannes (1993). "A neural network model for cursive script production." Biological Cybernetics 70: 15-28.

[29] D. Bullock and B. Rhodes (2003). Competitive queuing for serial planning and performance. In M. Arbib (Ed.), Handbook of brain theory and neural networks, 2nd ed. Cambridge, MA, MIT Press: 241-244.

[30] K. Caesar, L. Gold and M. Lauritzen (2003). "Context sensitivity of activity-dependent increases in cerebral blood flow." PNAS 100(7): 4239-4244.

[31] D. E. Callan, R. D. Kent, F. H. Guenther and H. K. Vorperian (2000). "An auditory-feedback-based neural network model of speech production that is robust to developmental changes in the size and shape of the articulatory system." J Speech Lang Hear Res 43(3): 721-36.

[32] P. Cisek and J. F. Kalaska (2002). "Simultaneous encoding of multiple potential reach directions in dorsal premotor cortex." Journal of Neurophysiology 87: 1149-1154.

[33] W. T. Clower and G. E. Alexander (1998). "Movement sequence-related activity reflecting numerical order of components in supplementary and presupplementary motor areas." J Neurophysiol 80(3): 1562-6.

[34] C. M. Conway and M. H. Christiansen (2001). "Sequential learning in non-human primates." Trends in Cognitive Sciences 5(12): 539-546.

[35] N. Cowan (1994). "Mechanisms of verbal short-term memory." Current Directions in Psychological Science 3(6): 185-189.

[36] B. Crosson, J. R. Sadek, L. Maron, D. Gokcay, C. M. Mohr, E. J. Auerbach, A. J. Freeman, C. M. Leonard and R. W. Briggs (2001). "Relative shift in activity from medial to lateral frontal cortex during internally versus externally guided word generation." J Cogn Neurosci 13(2): 272-83.

[37] S. Crottaz-Herbette, R. T. Anagnoson and V. Menon (2004). "Modality effects in verbal working memory: differential prefrontal and parietal responses to auditory and visual stimuli." Neuroimage 21(1): 340-51.

[38] J. L. Cummings (1993). "Frontal-subcortical circuits and human behavior." Arch Neurol 50(8): 873-80.

[39] M. Dapretto and S. Y. Bookheimer (1999). "Form and content: dissociating syntax and semantics in sentence comprehension." Neuron 24(2): 427-32.

[40] F. L. Darley, A. E. Aronson and J. R. Brown (1975). Motor speech disorders. Philadelphia, Saunders.

[41] G. S. Dell, L. K. Burger and W. R. Svec (1997). "Language production and serial order: a functional analysis and a model." Psychol Rev 104(1): 123-47.

[42] G. S. Dell, V. S. Ferreira and K. Bock (1999). "Binding, attention, and exchanges." Behavioral and Brain Sciences 22: 41-42.

[43] J. M. Deniau and G. Chevalier (1985). "Disinhibition as a basic process in the expression of striatal functions. II. The striato-nigral influence on thalamocortical cells of the ventromedial thalamic nucleus." Brain Res 334(2): 227-33.

[44] G. Dogil, H. Ackermann, W. Grodd, H. Haider, H. Kamp, J. Mayer, A. Riecker and D. Wildgruber (2002). "The speaking brain: a tutorial introduction to fMRI experiments in the production of speech, prosody and syntax." Journal of Neurolinguistics 15: 59-90.

[45] P. F. Dominey (1998). "Influences of temporal organization on sequence learning and transfer: Comments on Stadler (1995) and Curran and Keele (1993)." Journal of Experimental Psychology: Learning, Memory, and Cognition 24: 234-248.

[46] J. Doyon and H. Benali (2005). "Reorganization and plasticity in the adult brain during learning of motor skills." Curr Opin Neurobiol 15(2): 161-167.

[47] J. Doyon, V. B. Penhune and L. G. Ungerleider (2003). "Distinct contribution of the cortico-striatal and cortico-cerebellar systems to motor skill learning." Neuropsychologia 41: 252-262.

[48] J. Doyon, A. W. Song, A. Karni, F. Lalonde, M. M. Adams and L. G. Ungerleider (2002). "Experience-dependent changes in cerebellar contributions to motor sequence learning." Proc Natl Acad Sci U S A 99(2): 1017-22.

[49] N. F. Dronkers (1996). "A new brain region for coordinating speech articulation." Nature 384(6605): 159-61.

[50] R. P. Dum and P. L. Strick (2003). "An unfolded map of the cerebellar dentate nucleus and its projections to the cerebral cortex." J Neurophysiol 89(1): 634-9.

[51] D. Durstewitz and J. K. Seamans (2002). "The computational role of dopamine D1 receptors in working memory." Neural Netw 15(4-6): 561-72.

[52] J. C. Eccles (1982). "The initiation of voluntary movements by the supplementary motor area." Arch Psychiatr Nervenkr 231(5): 423-41.

[53] J. Elman (1995). Language processing. The handbook of brain theory and neural networks. M. A. Arbib. Cambridge, MA, MIT Press: 508-512.

[54] A. H. Fagg and M. A. Arbib (1998). "Modeling parietal-premotor interactions in primate control of grasp-ing." Neural Netw 11(7-8): 1277-1303.

[55] S. Farrell and S. Lewandowsky (2004). "Modelling transposition latencies: Constraints for theories of serial order memory." Journal of Memory and Language 51: 115-135.

[56] J. C. Fiala, S. Grossberg and D. Bullock (1996). "Metabotropic glutamate receptor activation in cerebellar Purkinje cells as substrate for adaptive timing of the classically conditioned eye-blink response." J Neurosci 16(11): 3760-74.

[57] J. A. Fiez (2001). "Neuroimaging studies of speech: an overview of techniques and methodological approaches." J Commun Disord 34(6): 445-454.

[58] A. Floyer-Lea and P. M. Matthews (2004). "Changing brain networks for visuomotor control with increased movement automaticity." Journal of Neurophysiology 92: 2405-2412.

[59] P. T. Fox (2003). "Brain imaging in stuttering: where next?" J Fluency Disord 28(4): 265-72.

[60] P. T. Fox, R. J. Ingham, J. C. Ingham, T. B. Hirsch, J. H. Downs, C. Martin, P. Jerabek, T. Glass and J. L. Lancaster (1996). "A PET study of the neural systems of stuttering." Nature 382(6587): 158-61.

[61] P. T. Fox, R. J. Ingham, J. C. Ingham, F. Zamarripa, J. H. Xiong and J. L. Lancaster (2000). "Brain correlates of stuttering and syllable production. A PET performance-correlation analysis." Brain 123(Pt 10): 1985-2004.

[62] M. J. Frank (2005). "Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism." Journal of Cognitive Neuroscience 17: 51-72.

[63] K. J. Friston (2002). "Beyond phrenology: What can neuroimaging tell us about distributed circuitry?" Annual Reviews of Neuroscience 25: 221-250.

[64] K. J. Friston, A. Holmes, J. B. Poline, C. J. Price and C. D. Frith (1996). "Detecting activations in PET and fMRI: levels of inference and power." Neuroimage 4(3 Pt 1): 223-235.

[65] V. Fromkin (1971). "The non-anomalous nature of anomalous utterances." Language 47: 27-52.

[66] V. Fromkin (1973). Speech errors as linguistic evidence. The Hague, Mouton Publishers.

[67] N. Fujii, H. Mushiake and J. Tanji (2002). "Distribution of eye- and arm-movement-related neuronal activity in the SEF and in the SMA and Pre-SMA of monkeys." J Neurophysiol 87(4): 2158-66.

[68] M. F. Garrett (1975). The analysis of sentence production. The psychology of learning and motivation. G. H. Bower. New York, Academic Press. 9: 133-177.

[69] M. F. Garrett (1980). Levels of processing in sentence production. Language production, Vol. 1, Speech and talk. B. L. Butterworth. London, Academic Press: 177-220.

[70] C. Gerfen and C. Wilson (1996). The basal ganglia. Handbook of chemical neuroanatomy, Vol. 12, Integrated systems of the CNS, Part III. L. W. Swanson, A. Bjorklund and T. T. Hokfelt. Holland, Elsevier Science B.V.: 371-468.

[71] N. Geschwind and A. M. Galaburda (1985). "Cerebral lateralization. Biological mechanisms, associations, and pathology: I. A hypothesis and a program for research." Arch Neurol 42(5): 428-59.

[72] S. Geyer, M. Matelli, G. Luppino and K. Zilles (2000). "Functional neuroanatomy of the primate isocortical motor system." Anat Embryol (Berl) 202(6): 443-74.

[73] G. Goldberg (1985). "Supplementary motor area structure and function: Review and hypotheses." Behavioral and Brain Sciences 8: 567-616.

[74] R. L. Gould, R. G. Brown, A. M. Owen, D. H. ffytche and R. J. Howard (2003). "fMRI BOLD response to increasing task difficulty during successful paired associates learning." Neuroimage 20(2): 1006-19.

[75] S. Grossberg (1978a). "Behavioral contrast in short term memory: Serial binary memory models or parallel continuous memory models?" Journal of Mathematical Psychology 17: 199-219.

[76] S. Grossberg (1978b). A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. Progress in theoretical biology. R. Rosen and S. F. New York, Academic Press. 5: 233-374.

[77] S. Grossberg (1986). The adaptive self-organization of serial order in behavior: Speech, language, and motor control. Pattern recognition by humans and machines. Volume 1: Speech perception. E. C. Schwab and H. C. Nusbaum. New York, Academic Press.

[78] S. Grossberg and M. Kuperstein (1986). Neural dynamics of adaptive sensory-motor control, expanded edition. New York, Pergamon.

[79] F. H. Guenther (1994). "A neural network model of speech acquisition and motor equivalent speech production." Biol Cybern 72(1): 43-53.

[80] F. H. Guenther (1995). "Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production." Psychol Rev 102(3): 594-621.

[81] F. H. Guenther and S. S. Ghosh (2003). A neural model of speech production. Proceedings of the 6th International Seminar on Speech Production, Sydney, Australia.

[82] F. H. Guenther, S. S. Ghosh and J. A. Tourville (in press). "Neural modeling and imaging of the cortical interactions underlying syllable production." Brain and Language.

[83] F. H. Guenther, M. Hampson and D. Johnson (1998). "A theoretical investigation of reference frames for the planning of speech movements." Psychol Rev 105(4): 611-33.

[84] P. Gupta and B. MacWhinney (1997). "Vocabulary acquisition and verbal short-term memory: computational and neural bases." Brain Lang 59(2): 267-333.


[85] D. Harrington and K. Haaland (1998). Sequencing and timing operations of the basal ganglia. Timing of behavior. D. Rosenbaum and C. Collyer. Cambridge, MA, MIT Press.

[86] T. A. Hartley and G. Houghton (1996). "A linguistically constrained model of short-term memory for nonwords." Journal of Memory and Language 35: 1-31.

[87] D. J. Heeger, A. C. Huk, W. S. Geisler and D. G. Albrecht (2000). "Spikes versus BOLD: what does neuroimaging tell us about neuronal activity?" Nature Neuroscience 3(7): 631-633.

[88] K. M. Heilman, S. A. Leon and J. C. Rosenbek (2004). "Affective aprosodia from a medial frontal stroke." Brain Lang 89(3): 411-6.

[89] R. N. Henson, N. Burgess and C. D. Frith (2000). "Recoding, storage, rehearsal and grouping in verbal short-term memory: an fMRI study." Neuropsychologia 38(4): 426-40.

[90] R. N. A. Henson, D. G. Norris, M. P. A. Page and A. D. Baddeley (1996). "Unchained memory: Error patterns rule out chaining models of immediate serial recall." Quarterly Journal of Experimental Psychology 49A: 80-115.

[91] G. Hickok, B. Buchsbaum, C. Humphries and T. Muftuler (2003). "Auditory-motor interaction revealed by fMRI: speech, music, and working memory in area Spt." J Cogn Neurosci 15(5): 673-82.

[92] G. Hickok, P. Erhard, J. Kassubek, A. K. Helms-Tillery, S. Naeve-Velguth, J. P. Strupp, P. L. Strick and K. Ugurbil (2000). "A functional magnetic resonance imaging study of the role of left posterior superior temporal gyrus in speech production: implications for the explanation of conduction aphasia." Neurosci Lett 287(2): 156-60.

[93] O. Hikosaka, H. Nakahara, M. K. Rand, K. Sakai, X. Lu, K. Nakamura, S. Miyachi and K. Doya (1999). "Parallel neural networks for learning sequential procedures." Trends Neurosci 22(10): 464-71.

[94] O. Hikosaka, K. Sakai, H. Nakahara, X. Lu, S. Miyachi, K. Nakamura and M. K. Rand (2000). Neural mechanisms for learning of sequential procedures. The new cognitive neuroscience. M. S. Gazzaniga. Cambridge, MA, MIT Press: 553-572.

[95] A. E. Hillis, M. Work, P. B. Barker, M. A. Jacobs, E. L. Breese and K. Maurer (2004). "Re-examining the brain regions crucial for orchestrating speech articulation." Brain 127(Pt 7): 1479-87.

[96] A. Ho, J. Bradshaw, R. Cunnington, J. Phillips and R. Iansek (1998). "Sequence heterogeneity in parkinsonian speech." Brain and Language 64: 122-145.

[97] B. Horwitz and A. R. Braun (2004). "Brain network interactions in auditory, visual and linguistic processing." Brain Lang 89(2): 377-84.

[98] B. Horwitz, K. J. Friston and J. G. Taylor (2000). "Neural modeling and functional brain imaging: an overview." Neural Networks 13: 829-846.

[99] E. Hoshi and J. Tanji (2004). "Differential roles of neuronal activity in the supplementary and presupplementary motor areas: From information retrieval to motor planning and execution." J Neurophysiol.

[100] G. Houghton (1990). The problem of serial order: A neural network model of sequence learning and recall. Current research in natural language generation. R. Dale, C. Mellish and M. Zock. London, Academic Press: 287-319.

[101] F. T. Husain, M. A. Tagamets, S. J. Fromm, A. R. Braun and B. Horwitz (2004). "Relating neuronal dynamics for auditory object processing to neuroimaging activity: a computational modeling and an fMRI study." Neuroimage 21(4): 1701-20.

[102] P. Indefrey and W. J. Levelt (2004). "The spatial and temporal signatures of word production components." Cognition 92(1-2): 101-44.

[103] W. Johnson and J. R. Knott (1936). "The moment of stuttering." Journal of Genetics and Psychology 48: 475-479.

[104] S. Jonas (1981). "The supplementary motor region and speech emission." Journal of Communication Disorders 14: 349-373.

[105] S. Jonas (1987). The supplementary motor region and speech. The frontal lobes revisited. E. Perecman. New York, IRBN Press: 241-250.

[106] J. Jonides, E. H. Schumacher, E. E. Smith, R. A. Koeppe, E. Awh, P. A. Reuter-Lorenz, C. Marshuetz and C. R. Willis (1998). "The role of parietal cortex in verbal working memory." J Neurosci 18(13): 5026-34.

[107] U. Jurgens (1984). "The efferent and afferent connections of the supplementary motor area." Brain Research 300: 63-81.

[108] S. W. Kennerley, K. Sakai and M. F. Rushworth (2004). "Organization of action sequences and the role of the pre-SMA." J Neurophysiol 91(2): 978-93.

[109] R. D. Kent (2000). "Research on speech motor control and its disorders: a review and prospective." J Commun Disord 33(5): 391-427.

[110] J. G. Kerns, J. D. Cohen, V. A. Stenger and C. S. Carter (2004). "Prefrontal cortex guides context-appropriate responding during language production." Neuron 43(2): 283-91.

[111] S. T. Klapp (2003). "Reaction time analysis of two types of motor preparation for speech articulation: action as a sequence of chunks." Journal of Motor Behavior 35: 135-150.

[112] A. Krainik, S. Lehericy, H. Duffau, M. Vlaicu, F. Poupon, L. Capelle, P. Cornu, S. Clemenceau, M. Sahel, C. A. Valery, A. L. Boch, J. F. Mangin, D. L. Bihan and C. Marsault (2001). "Role of the supplementary motor area in motor deficit following medial frontal lobe surgery." Neurology 57(5): 871-8.

[113] J. D. Kropotov and S. C. Etlinger (1999). "Selection of actions in the basal ganglia-thalamocortical circuits: review and model." Int J Psychophysiol 31(3): 197-217.

[114] K. S. Lashley (1951). The problem of serial order in behavior. Cerebral mechanisms in behavior. L. A. Jeffress. New York, Wiley.

[115] W. J. Levelt and L. Wheeldon (1994). "Do speakers have access to a mental syllabary?" Cognition 50(1-3): 239-69.

[116] W. J. M. Levelt, A. Roelofs and A. S. Meyer (1999). "A theory of lexical access in speech production." Behavioral and Brain Sciences 22: 1-75.

[117] N. K. Logothetis, J. Pauls, M. Augath, T. Trinath and A. Oeltermann (2001). "Neurophysiological investigation of the basis of the fMRI signal." Nature 412(6843): 150-157.

[118] N. K. Logothetis and J. Pfeuffer (2004). "On the nature of the BOLD fMRI contrast mechanism." Magnetic Resonance Imaging 22: 1517-1531.

[119] X. Lu, O. Hikosaka and S. Miyachi (1998). "Role of monkey cerebellar nuclei in skill for sequential movement." J Neurophysiol 79(5): 2245-54.

[120] G. Luppino, M. Matelli, R. Camarda and G. Rizzolatti (1993). "Corticocortical connections of area F3 (SMA-proper) and area F6 (Pre-SMA) in the macaque monkey." Journal of Comparative Neurology 338: 114-140.

[121] D. G. MacKay (1970). "Spoonerisms: the structure of errors in the serial order of speech." Neuropsychologia 8(3): 323-50.

[122] P. F. MacNeilage (1998). "The frame/content theory of evolution of speech production." Behavioral and Brain Sciences 21: 499-511.

[123] C. Mannes (1994). Neural network models of serial order and handwriting movement generation. Department of Cognitive and Neural Systems. Boston, MA, Boston University.

[124] M. Matelli, G. Luppino and G. Rizzolatti (1995). "Convergence of pallidal and cerebellar outputs on the frontal motor areas." Acta Biomed Ateneo Parmense 66(3-4): 83-92.

[125] Y. Matsuzaka, H. Aizawa and J. Tanji (1992). "A motor area rostral to the supplementary motor area (presupplementary motor area) in the monkey: neuronal activity during a learned motor task." J Neurophysiol 68(3): 653-62.

[126] A. R. McIntosh and F. Gonzalez-Lima (1994a). "Structural equation modeling and its application to network analysis in functional brain imaging." Human Brain Mapping 2: 2-22.

[127] A. R. McIntosh, C. L. Grady, L. G. Ungerleider, J. V. Haxby, S. I. Rapoport and B. Horwitz (1994b). "Network analysis of cortical visual pathways mapped with PET." Journal of Neuroscience 14(2): 655-666.

[128] M. R. McNeil and P. J. Doyle (2004a). Apraxia of speech: nature and phenomenology. The MIT Encyclopedia of Communication Disorders. R. Kent. 101-103.

[129] M. R. McNeil, S. R. Pratt and T. Fossett (2004b). The differential diagnosis of apraxia. Speech motor control in normal and disordered speech. B. Maassen, R. Kent, H. F. M. Peters, P. H. H. M. van Lieshout and W. Hulstijn. New York, Oxford University Press: 389-413.

[130] A. Mechelli, W. D. Penny, C. J. Price, D. R. Gitelman and K. J. Friston (2002). "Effective connectivity and intersubject variability: using a multisubject network to test differences and commonalities." NeuroImage 17(3): 1459-1469.

[131] F. A. Middleton and P. L. Strick (2000). "Basal ganglia and cerebellar loops: motor and cognitive circuits." Brain Res Brain Res Rev 31(2-3): 236-50.

[132] F. A. Middleton and P. L. Strick (2002). "Basal-ganglia 'projections' to the prefrontal cortex of the primate." Cereb Cortex 12(9): 926-35.

[133] N. Miller (2002). "The neurological bases of apraxia of speech." Semin Speech Lang 23(4): 223-30.

[134] J. W. Mink (1996). "The basal ganglia: focused selection and inhibition of competing motor programs." Prog Neurobiol 50(4): 381-425.

[135] J. W. Mink and W. T. Thach (1993). "Basal ganglia intrinsic circuits and their role in behavior." Curr Opin Neurobiol 3(6): 950-7.

[136] S. Miyachi, O. Hikosaka and X. Lu (2002). "Differential activation of monkey striatal neurons in the early and late stages of procedural learning." Experimental Brain Research 146(1): 122-126.

[137] S. Miyachi, O. Hikosaka, K. Miyashita, Z. Karadi and M. K. Rand (1997). "Differential roles of monkey striatum in learning of sequential hand movement." Exp Brain Res 115(1): 1-5.

[138] F. M. Mottaghy, T. Doring, H. W. Muller-Gartner, R. Topper and B. J. Krause (2002). "Bilateral parieto-frontal network for verbal working memory: an interference approach using repetitive transcranial magnetic stimulation (rTMS)." Eur J Neurosci 16(8): 1627-32.

[139] F. M. Mottaghy, M. Gangitano, B. J. Krause and A. Pascual-Leone (2003). "Chronometry of parietal and prefrontal activations in verbal working memory revealed by transcranial magnetic stimulation." Neuroimage 18(3): 565-75.

[140] K. G. Munhall (2001). "Functional imaging during speech production." Acta Psychologica 107: 95-117.

[141] B. Munson (2001). "Phonological pattern frequency and speech production in adults and children." J Speech Lang Hear Res 44(4): 778-92.

[142] K. Murphy, D. R. Corfield, A. Guz, G. R. Fink, R. J. Wise, J. Harrison and L. Adams (1997). "Cerebral areas associated with motor control of speech in humans." J Appl Physiol 83(5): 1438-1447.

[143] E. D. Mysak (1960). "Servo theory and stuttering." J Speech Hear Disord 25: 188-95.

[144] H. Nakahara, K. Doya and O. Hikosaka (2001). "Parallel cortico-basal ganglia mechanisms for acquisition and execution of visuomotor sequences - a computational approach." J Cogn Neurosci 13(5): 626-47.

[145] K. Nakamura, K. Sakai and O. Hikosaka (1998). "Neuronal activity in medial frontal cortex during learning of sequential procedures." J Neurophysiol 80(5): 2671-87.

[146] K. Nakamura, K. Sakai and O. Hikosaka (1999). "Effects of local inactivation of monkey medial frontal cortex in learning of sequential procedures." J Neurophysiol 82(2): 1063-8.

[147] A. Nieto-Castanon, S. S. Ghosh, J. A. Tourville and F. H. Guenther (2003). "Region of interest based analysis of functional imaging data." Neuroimage 19(4): 1303-16.

[148] P. D. Nixon and R. E. Passingham (2000). "The cerebellum and cognition: cerebellar lesions impair sequence learning but not conditional visuomotor learning in monkeys." Neuropsychologia 38(7): 1054-72.

[149] S. G. Nooteboom (1969). The tongue slips into patterns. Leyden studies in linguistics and phonetics. A. G. Sciarone, van Essen, A.J., van Raad, A.A. The Hague, Mouton: 114-132.

[150] M. Page (2000). "Connectionist modelling in psychology: a localist manifesto." Behav Brain Sci 23(4): 443-67; discussion 467-512.

[151] M. P. Page and D. Norris (1998). "The primacy model: a new model of immediate serial recall." Psychol Rev 105(4): 761-81.

[152] M. P. A. Page (1999). Modeling the perception of musical sequences with self-organizing neural networks. Musical networks. N. Griffith and P. M. Todd. Cambridge, MA, MIT Press: 175-198.

[153] M. Pai (1999). "Supplementary motor area aphasia: a case report." Clinical Neurology and Neurosurgery 101: 29-32.

[154] A. Parent and L. N. Hazrati (1995). "Functional anatomy of the basal ganglia. II. The place of subthalamic nucleus and external pallidum in basal ganglia circuitry." Brain Res Brain Res Rev 20(1): 128-54.

[155] R. Passingham (1993). The frontal lobes and voluntary activity. Oxford, Oxford U. Press.

[156] T. Paus, L. Koski, Z. Caramanos and C. Westbury (1998). "Regional differences in the effects of task difficulty and motor output on blood flow response in the human anterior cingulate cortex: a review of 107 PET activation studies." Neuroreport 9(9): R37-47.

[157] T. Paus, M. Petrides, A. C. Evans and E. Meyer (1993). "Role of the human anterior cingulate cortex in the control of oculomotor, manual, and speech responses: a positron emission tomography study." J Neurophysiol 70(2): 453-69.

[158] G. Pellizzer and J. H. Hedges (2003). "Motor planning: effect of directional uncertainty with discrete spa-tial cues." Experimental Brain Research 150: 276-289.

[159] J. B. Penney, Jr. and A. B. Young (1981). "GABA as the pallidothalamic neurotransmitter: implications for basal ganglia function." Brain Res 207(1): 195-9.

[160] W. D. Penny, K. E. Stephan, A. Mechelli and K. J. Friston (2004). "Modelling functional integration: a comparison of structural equation and dynamic causal models." NeuroImage 23: 264-274.

[161] J. S. Perkell, F. H. Guenther, H. Lane, M. L. Matthies, P. Perrier, J. Vick, R. Wilhelms-Tricarico and M. Zandipour (2000). "A theory of speech motor control and supporting data from speakers with normal hearing and profound hearing loss." Journal of Phonetics 28: 233-272.

[162] W. H. Perkins, R. D. Kent and R. F. Curlee (1991). "A theory of neuropsycholinguistic function in stuttering." J Speech Hear Res 34(4): 734-52.

[163] M. Petrides and D. N. Pandya (1999). "Dorsolateral prefrontal cortex: comparative cytoarchitectonic analysis in the human and the macaque brain and corticocortical connection patterns." Eur J Neurosci 11(3): 1011-36.

[164] N. Picard and P. L. Strick (1996). "Motor areas of the medial wall: a review of their location and func-tional activation." Cereb Cortex 6(3): 342-53.

[165] E. Pickett, E. Kuniholm, A. Protopapas, J. Friedman and P. Lieberman (1998). "Selective speech motor, syntax and cognitive deficits associated with bilateral damage to the putamen and the head of the caudate nucleus: a case study." Neuropsychologia 36: 173-188.

[166] A. Postma and H. Kolk (1993). "The covert repair hypothesis: prearticulatory repair processes in normal and stuttered disfluencies." J Speech Hear Res 36(3): 472-87.

[167] E. Procyk, Y. L. Tanaka and J. P. Joseph (2000). "Anterior cingulate activity during routine and non-routine sequential behaviors in macaques." Nat Neurosci 3(5): 502-8.

[168] S. M. Rao, D. L. Harrington, K. Y. Haaland, J. A. Bobholz, R. W. Cox and J. R. Binder (1997). "Distributed neural systems underlying the timing of movements." J Neurosci 17(14): 5528-35.

[169] G. Rees, K. Friston and C. Koch (2000). "A direct quantitative relationship between the functional properties of human and macaque V5." Nature Neuroscience 3(7): 716-723.

[170] B. Rhodes and D. Bullock (2002). Neural dynamics of learning and performance of fixed sequences: Latency pattern reorganization and the N-STREAMS model. Boston University Technical Report CAS/CNS-02-007.

[171] B. J. Rhodes, D. Bullock, W. B. Verwey, B. B. Averbeck and M. P. A. Page (2004). "Learning and production of movement sequences: Behavioral, neurophysiological, and modeling perspectives." In press, Human Movement Science 23.

[172] A. Riecker, H. Ackermann, D. Wildgruber, G. Dogil and W. Grodd (2000a). "Opposite hemispheric lateralization effects during speaking and singing at motor cortex, insula and cerebellum." Neuroreport 11(9): 1997-2000.

[173] A. Riecker, H. Ackermann, D. Wildgruber, J. Meyer, G. Dogil, H. Haider and W. Grodd (2000b). "Articulatory/phonetic sequencing at the level of the anterior perisylvian cortex: a functional magnetic resonance imaging (fMRI) study." Brain and Language 75(2): 259-276.

[174] D. Riva (1998). "The cerebellar contribution to language and sequential functions: evidence from a child with cerebellitis." Cortex 34(2): 279-87.

[175] D. A. Rosenbaum, E. Saltzman and A. Kingman (1984). Choosing between movement sequences. Preparatory states and processes. S. Kornblum and J. Requin. Hillsdale, NJ, Erlbaum: 119-134.

[176] E. D. Ross and M. M. Mesulam (1979). "Dominant language functions of the right hemisphere? Prosody and emotional gesturing." Arch Neurol 36(3): 144-8.

[177] K. Sakai, O. Hikosaka, S. Miyauchi, R. Takino, Y. Sasaki and B. Putz (1998). "Transition of brain activation from frontal to parietal areas in visuomotor sequence learning." Journal of Neuroscience 18: 1827-1840.

[178] K. Sakai, K. Kitaguchi and O. Hikosaka (2003). "Chunking during human visuomotor sequence learning." Exp Brain Res 152(2): 229-42.

[179] Y. Sakurai, T. Momose, M. Iwata, Y. Sudo, K. Ohtomo and I. Kanazawa (2001). "Cortical activity associated with vocalization and reading proper." Brain Res Cogn Brain Res 12(1): 161-5.

[180] H. E. Schendan, M. M. Searl, R. J. Melrose and C. E. Stern (2003). "An fMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning." Neuron 37: 1013-1025.

[181] J. D. Schmahmann and D. N. Pandya (1997). "The cerebrocerebellar system." Int Rev Neurobiol 41: 31-60.

[182] S. Shattuck-Hufnagel (1979). Speech errors as evidence for a serial order mechanism in sentence production. Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett. W. E. Cooper and E. C. T. Walker. Hillsdale, NJ, Erlbaum: 295-342.

[183] S. Shattuck-Hufnagel (1983). Sublexical units and suprasegmental structure in speech production planning. The production of speech. P. F. MacNeilage. New York, Springer-Verlag: 109-136.

[184] S. Shattuck-Hufnagel (1987). The role of word-onset consonants in speech production planning: New evidence from speech error patterns. Motor and Sensory Processes of Language. E. Keller and M. Gopnik. Hillsdale NJ, Lawrence Erlbaum: 17-51.

[185] K. Shima, H. Mushiake, N. Saito and J. Tanji (1996). "Role for cells in the presupplementary motor area in updating motor plans." Proc Natl Acad Sci U S A 93(16): 8694-8.

[186] K. Shima and J. Tanji (2000). "Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements." J Neurophysiol 84(4): 2148-60.

[187] J. C. Shin and R. B. Ivry (2003). "Spatial and temporal sequence learning in patients with Parkinson's disease or cerebellar lesions." J Cogn Neurosci 15(8): 1232-43.

[188] L. I. Shuster and S. K. Lemieux (2005). "An fMRI investigation of covertly and overtly produced mono- and multisyllabic words." Brain and Language 93(1): 20-31.

[189] C. W. Starkweather (1987). Fluency and Stuttering. Englewood Cliffs, NJ, Prentice-Hall.

[190] S. Sternberg, S. Monsell, R. L. Knoll and C. D. Wright (1978). The latency and duration of rapid movement sequences: Comparisons of speech and typewriting. Information processing in motor control and learning. G. E. Stelmach. New York, Academic Press: 117-152.

[191] M. A. Tagamets and B. Horwitz (1997). Modeling brain imaging data with neuronal assembly dynamics. New York, NY, Plenum.


[192] M. A. Tagamets and B. Horwitz (2001). "Interpreting PET and fMRI measures of functional neural activity: The effects of synaptic inhibition on cortical activation in human imaging studies." Brain Research Bulletin 54(3): 267-273.

[193] J. Tanji (2001). "Sequential organization of multiple movements: involvement of cortical motor areas." Annu Rev Neurosci 24: 631-51.

[194] J. Tanji and K. Shima (1994). "Role for supplementary motor area cells in planning several movements ahead." Nature 371(6496): 413-416.

[195] J. A. Tourville and F. H. Guenther (2003). A cortical parcellation scheme for speech studies. Boston University Technical Report CAS/CNS-03-022. Boston, MA, Boston University.

[196] L. E. Travis (1931). Speech Pathology. New York, D. Appleton-Century.

[197] R. Treiman, B. Kessler, S. Knewasser, R. Tincoff and M. Bowman (2000). English speakers' sensitivity to phonotactic patterns. Papers in laboratory phonology. M. Broe and J. Pierrehumbert. Cambridge, MA, CUP.

[198] A. van der Merwe (1997). A theoretical framework for the characterization of pathological speech sensorimotor control. Clinical management of sensorimotor speech disorders. M. R. McNeil. New York, Thieme: 1-25.

[199] D. J. Veltman, S. A. Rombouts and R. J. Dolan (2003). "Maintenance versus manipulation in verbal working memory revisited: an fMRI study." Neuroimage 18(2): 247-56.

[200] W. B. Verwey (1996). "Buffer loading and chunking in sequential keypressing." Journal of Experimental Psychology-Human Perception and Performance 22: 544-562.

[201] G. Vingerhoets, J. Van Borsel, C. Tesink, M. van den Noort, K. Deblaere, R. Seurinck, P. Vandemaele and E. Achten (2003). "Multilingualism: an fMRI study." Neuroimage 20(4): 2181-96.

[202] M. S. Vitevitch, P. A. Luce, J. Charles-Luce and D. Kemmerer (1997). "Phonotactics and syllable stress: implications for the processing of spoken nonsense words." Lang Speech 40(Pt 1): 47-62.

[203] A. D. Wagner, A. Maril, R. A. Bjork and D. L. Schacter (2001). "Prefrontal contributions to executive control: fMRI evidence for functional distinctions within lateral prefrontal cortex." Neuroimage 14(6): 1337-47.

[204] D. L. Wang, X. M. Liu and S. C. Ahalt (1996). "On temporal generalization of simple recurrent networks." Neural Networks 9: 1099-1118.

[205] N. Ward (1994). A connectionist language generator. Norwood, NJ, Ablex Publishing.

[206] R. T. Wertz, L. L. LaPointe and J. C. Rosenbek (1984). Apraxia of Speech in Adults: The Disorder and its Management. Orlando, Grune and Stratton Inc.

[207] R. West (1958). An agnostic's speculations about stuttering. New York, Harper & Row.

[208] T. Wichmann and M. R. DeLong (1996). "Functional and pathophysiological models of the basal ganglia." Curr Opin Neurobiol 6(6): 751-8.

[209] R. Wiesendanger and M. Wiesendanger (1985). "Cerebello-cortical linkage in the monkey as revealed by transcellular labeling with the lectin wheat germ agglutinin conjugated to the marker horseradish peroxidase." Exp Brain Res 59(1): 105-17.

[210] D. Wildgruber, H. Ackermann and W. Grodd (2001). "Differential contributions of motor cortex, basal ganglia, and cerebellum to speech motor control: effects of syllable repetition rate evaluated by fMRI." Neuroimage 13(1): 101-9.

[211] R. J. Wise, J. Greene, C. Buchel and S. K. Scott (1999). "Brain regions involved in articulation." Lancet 353(9158): 1057-61.

[212] T. Wu, K. Kansaku and M. Hallett (2004). "How self-initiated memorized movements become automatic: a functional MRI study." Journal of Neurophysiology 91: 1690-1698.

[213] E. Zarahn and M. Slifstein (2001). "A reference effect approach for power analysis in fMRI." Neuroimage 14(3): 768-779.

[214] T. Zheng and C. J. Wilson (2002). "Corticostriatal combinatorics: The implications of corticostriatal axonal arborizations." Journal of Neurophysiology 87: 1007-1017.

[215] W. Ziegler (2002). "Psycholinguistic and motor theories of apraxia of speech." Semin Speech Lang 23(4): 231-44.

[216] W. Ziegler, B. Kilian and K. Deger (1997). "The role of the left mesial frontal cortex in fluent speech: evidence from a case of left supplementary motor area hemorrhage." Neuropsychologia 35(9): 1197-1208.

[217] G. Zimmerman (1980). "Stuttering: A disorder of movement." Journal of Speech and Hearing Research 23: 122-136.



H. Consortium/Contractual Arrangements

In addition to his primary appointment at Boston University, Dr. Guenther has a research appointment at Massachusetts General Hospital, where the fMRI experiments will be performed. Dr. Guenther and his lab members therefore have direct access to the MRI equipment needed to perform the research in this application. Imaging time at the MRI facilities will be invoiced to Prof. Guenther at Boston University; thus no subcontract is necessary.


I. Resource Sharing

The proposal does not involve direct cost amounts in excess of $500,000 in any year, and it does not include model organisms.


J. Consultants

There are no outside consultants on the proposed project.


8. Appendix Materials

The following documents are included in the Appendix:

1) Guenther, F.H., Ghosh, S.S., and Tourville, J.A. (in press). Neural modeling and imaging of the cortical interactions underlying syllable production. Brain and Language.

2) Guenther, F.H., Hampson, M., and Johnson, D. (1998). A theoretical investigation of reference frames for the planning of speech movements. Psychological Review, 105, pp. 611-633.

3) Guenther, F.H. (1995). Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychological Review, 102, pp. 594-621.

4) Bullock, D. (2004a). Adaptive neural models of queuing and timing in fluent action. Trends in Cognitive Sciences, 8, 426-433.

5) Rhodes, B.J., Bullock, D., Verwey, W.B., Averbeck, B.B., and Page, M.P.A. (2004). Learning and production of movement sequences: Behavioral, neurophysiological, and modeling perspectives. In press, Human Movement Science, 23, pp. 699-746.

6) Nieto-Castanon, A., Ghosh, S.S., Tourville, J.A., and Guenther, F.H. (2003). Region-of-interest based analysis of functional imaging data. NeuroImage, 19, pp. 1303-1316.
