THE ATOMS OF PHONOLOGICAL REPRESENTATION ...


THE ATOMS OF PHONOLOGICAL REPRESENTATION:

GESTURES, COORDINATION AND PERCEPTUAL FEATURES IN

CONSONANT CLUSTER PHONOTACTICS

by

Lisa Davidson

A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

Baltimore, Maryland

July, 2003

© Lisa Davidson 2003

All Rights Reserved


ABSTRACT

The central goal of this dissertation is to investigate the roles and interaction of articulatory, perceptual, and temporal elements in the phonological component of the grammar. This inquiry extends both to the input representations that are submitted to a phonological grammar, and to the constraints in the grammar. In order to adequately account for both production data and data from language typology, two elements must be integrated into the phonological component alongside articulatory gestures: perceptual features, which play an important role in determining phonotactic patterns, and gestural coordination, which establishes whether and how adjacent gestures are related to one another. This dissertation reports three experiments on the production of word-initial consonant clusters; such clusters are an appropriate environment for investigating how perception, articulation, and coordination interact in the phonology. The first experiment is an acoustic study of the production by native English speakers of Czech-possible consonant clusters (e.g. fkale, zbano, vnodi). Results show that speakers are more accurate on some English-illegal phonotactic sequences than others. Speakers most often repair illegal target clusters by inserting a schwa between the two consonants in the cluster. The nature of this schwa is addressed in the second experiment. A comparison of speakers producing both phonotactically legal and illegal word-initial clusters using ultrasound imaging shows that speakers' repairs of the illegal sequence are more consistent with the alteration of gestural coordination than with phonological vowel epenthesis. The third experiment addresses fast speech schwa deletion in English. Results from this experiment suggest that surface changes caused by speech rate may be implemented in the phonology through a modification of the coordination relationship between gestures.
The formal analysis of this data draws on insights from Articulatory Phonology (Browman and Goldstein 1986, et seq.) and the Licensing-by-Cue framework (Steriade 1997) to determine how articulatory, perceptual and temporal factors affect consonant cluster production. These factors are incorporated into a constraint-based phonological framework that not only accounts for the coordination between sequential gestures (COORDINATION constraints, based on Gafos 2002), but also determines which gestures are subject to such coordination (ASSOCIATION constraints) and whether the coordinated gestures form a phonotactically legal sequence (*OVERLAP constraints). Together, these constraints form Gestural Association Theory. The framework is extended to incorporate floating constraints to account for the variation observed in the experimental results.

Advisor/Reader: Dr. Paul Smolensky
Reader: Dr. Luigi Burzio


ACKNOWLEDGEMENTS

If I had listened to my first linguistics professor, I would have ended up in computer science. He told me that if I liked the kind of thinking and problem-solving required for studying linguistics, then I would be better prepared for future economic stability and personal sanity by changing my major to CS. Needless to say, I didn't listen to him, and I have never regretted the decision I made. When I was deciding on graduate schools, I was intrigued by the fact that at Johns Hopkins I could study with one of the foremost faculty in linguistics and learn how to integrate my interests in phonology with other aspects of cognitive science at the same time. Paul Smolensky has been the best advisor and role model that a student could hope for. Not only is he a brilliant thinker who challenged me to constantly rework, refine, and expand my ideas, but he is also an exceptionally generous person who always found time for his students despite his inconceivably busy schedule. Paul may have never expected that I would turn out to be so excited by the "low-level" world of phonetics, but I respect him deeply for sticking with it as we found a way to unite our interests. Despite the considerable amount of time Paul spent tutoring me in the formalism of phonology, I'll call us even if I really have helped make an experimentalist out of him after all. I am enormously appreciative of the efforts of my other teachers throughout the years. I owe a special debt of gratitude to Maureen Stone of the University of Maryland and Lisa Zsiga of Georgetown University for teaching me phonetics and helping me with the experimental work in my dissertation. I cannot thank them enough. At Johns Hopkins, Géraldine Legendre taught me that syntax projects can be fun too (despite my initial misgivings).
I thank Bill Badecker, Brenda Rapp, and Mike McCloskey for the questions about experimentation that they’ve had to field over the years, and Luigi Burzio for the insightful comments he gave me on early versions of my dissertation. At Brown, my first two phonology teachers, Rolf Noyer and Katherine Demuth, were instrumental in sparking my interest in research. It is because they took the time to mentor me as an over-achieving undergraduate that I have ended up where I am today. At the University of Barcelona, Núria Sebastián Gallés patiently put up with my initial attempts at speaking Catalan as she taught me how to be a rigorous experimentalist. Moltes gràcies! Thanks are also due to some people who were particularly helpful at various stages during my dissertation work. Vijay Parthasarathy was kind enough to explain his visualization software to me multiple times and to cheerfully accompany me on my journey to find the right statistical test. Melissa Epstein was generous not only in helping me run ultrasound subjects, but also in discussing how to teach a phonetics class with me. Also, though they probably don’t even realize the role they’ve played, I am grateful to a number of linguists who have donated their time and expertise to me: Mary Beckman, Ioana Chitoran, Adamantios Gafos, John Kingston, Bob Ladd, and Donca Steriade. Thanks also to Sanjeev Khudanpur and Barbara Landau, the remaining members of my dissertation committee. I might never have finished if it weren’t for the support and companionship of the many friends I have made during graduate school. I continue to be impressed by the level of commitment that Hopkins students have to the field of cognitive science, and I will always be inspired by their enthusiasm. I am especially indebted to Matt Goldrick and


Colin Wilson, who are incredibly insightful phonologists; had it not been for their answers to my many questions, my work would certainly have suffered. I hope our professional relationships and personal friendships continue for many years to come. I'd like to thank John Hale for sticking it out with me even as all of our other classmates disappeared. Over the years many others have contributed to making Hopkins and the city of Baltimore great places to work and live: (in alphabetical order) Adam Buchwald, Alexis Lewis, Chris Goldrick, Danny Dilks, Jared Medina, Jussi Valtonen, Laura LaKusta, Marni Switkin, Matt Clapp, Oren Schwartz, and Tamara Nicol. Finally, this dissertation is dedicated to my parents, Evan and Judy Davidson, and to my fiancé, David Goldberg. Not only did Dave selflessly volunteer his formidable MATLAB skills whenever I needed them, but he also continues to be the best friend, confidante, sounding board, and moral support system that I could ever hope for. As for my parents, I thank them for remaining open-minded and supporting my decision to get my doctorate. They've always believed in me, and I cannot thank them enough for allowing me the freedom to make my own choices. It has made all the difference.


TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF FIGURES
TABLE OF TABLES
CHAPTER 1. The Elements of Phonetic and Phonological Representations
  1.1. Outline of the dissertation
CHAPTER 2. Articulatory and Perceptual Approaches to Phonology
  2.1. Articulatory Phonology and gestural representation
    2.1.1. Gestures and gestural scores
    2.1.2. Regularities and natural classes
    2.1.3. Phasing and coordination relationships
    2.1.4. Gestural overlap
  2.2. Adding perceptual factors into phonology
    2.2.1. "Tug-of-war" between articulation and perception
    2.2.2. Grounding and functionalism in phonological theory
  2.3. Summary
CHAPTER 3. Perceptual and Articulatory Influences on the Production of Non-Native Sequences
  3.1. The perception, production, and acquisition of non-native sequences
    3.1.1. Perception
    3.1.2. Production and acquisition
  3.2. Experiment
    3.2.1. Phonetic influences
    3.2.2. Participants
    3.2.3. Materials
    3.2.4. Design and procedure
  3.3. Results
    3.3.1. Repetition first vs. Sentence first
    3.3.2. Repetition condition
      3.3.2.1. First segment
      3.3.2.2. Second segment
      3.3.2.3. Cluster type
      3.3.2.4. Error types
    3.3.3. Sentence condition
      3.3.3.1. First segment
      3.3.3.2. Second segment
      3.3.3.3. Cluster type
      3.3.3.4. Error types
      3.3.3.5. Discussion
  3.4. General discussion
    3.4.1. Fricative-initial obstruent clusters
    3.4.2. Nature of errors
  3.5. Summary
CHAPTER 4. An Ultrasound Investigation of Consonant Coordination in Initial Clusters
  4.1. The nature of schwa
    4.1.1. Experimental and typological characterizations of transitional schwa
    4.1.2. The relationship between transitional schwa and non-native production
  4.2. Previous ultrasound research
  4.3. Ultrasound experiment
    4.3.1. Participants
    4.3.2. Materials
    4.3.3. Design and data collection
      4.3.3.1. Ultrasound setup
      4.3.3.2. Recording procedure
    4.3.4. Methods
      4.3.4.1. Data processing
      4.3.4.2. Statistical measures: L2 norms and the sign test
  4.4. Results
    4.4.1. Visual imaging
    4.4.2. Statistical results
  4.5. General discussion
  4.6. Summary
CHAPTER 5. Cluster Production in a Constraint-Based Gestural Theory
  5.1. Loanwords: the interaction between phonology and non-native sequences
  5.2. Justifying a phonological account
  5.3. Consonant cluster phonotactics in a grammar of gestural coordination
    5.3.1. Constraints on word-initial clusters
    5.3.2. Gestural Association Theory and syllable structure
      5.3.2.1. Analysis of phonotactically legal forms
      5.3.2.2. Analysis of phonotactically illegal forms
  5.4. Accounting for variability in production
    5.4.1. Variation in OT grammars
    5.4.2. A floating constraint analysis of English experimental performance
    5.4.3. The origin of hidden rankings
  5.5. The special case of /f/-initial clusters
  5.6. Summary
CHAPTER 6. Coordination in Fast Speech Schwa Elision
  6.1. Pre-tonic schwa elision
    6.1.1. Weak vowel deletion in the phonology
    6.1.2. Acoustic and articulatory evidence: Elision as overlap
    6.1.3. The influence of speaking rate on gestural coordination
  6.2. Experiment
    6.2.1. Participants
    6.2.2. Materials
    6.2.3. Design and procedure
    6.2.4. Analysis
  6.3. Results
    6.3.1. Speaking rate
    6.3.2. Elision
    6.3.3. Schwa devoicing
    6.3.4. Schwa duration
    6.3.5. Individual /#CəC/ sequences
    6.3.6. Individual speakers
    6.3.7. Word frequency
  6.4. General discussion
    6.4.1. The relationship between overlap, speaking rate, and coordination
    6.4.2. Deletion plus gestural mistiming: A possible alternative account?
  6.5. Summary
CHAPTER 7. Concluding Remarks
APPENDIX 1. Words used in Czech-consonant cluster experiment (Chapter 3)
APPENDIX 2: An alternative to local conjunction (Chapter 5)
APPENDIX 3: Stimuli from fast speech schwa elision experiment (Chapter 6)
APPENDIX 4: Word frequency counts using Google (Chapter 6)
APPENDIX 5: Spectrograms showing schwas, aspiration, and elision (Chapter 6)
REFERENCES
CURRICULUM VITA


TABLE OF FIGURES

Figure 1. Schematic gestural score for 'pan' [pʰæn]
Figure 2. Feature geometric articulator tree
Figure 3. Possible mappings between articulator sets and constriction locations
Figure 4. Gestural landmarks
Figure 5. Coordination relationships for VC and CV sequences
Figure 6. Coordination topology for onset and coda clusters
Figure 7. Canonical and casual speech productions of perfect memory
Figure 8. Gestural scores of wash and warsh
Figure 9. Production accuracy on obstruent word-initial clusters in Davidson et al. (2003)
Figure 10. Performance based on first segment in Repetition condition
Figure 11. Performance based on second segment in Repetition condition
Figure 12. Performance on clusters broken down by cluster type in Repetition condition
Figure 13. Error types in Repetition condition
Figure 14. Performance on fricative-initial clusters in Sentence condition
Figure 15. Performance based on second segment in Sentence condition
Figure 16. Performance on clusters broken down by cluster type in Sentence condition
Figure 17. Error types in Sentence condition
Figure 18. Different manifestations of inserted vowel between voiceless consonants
Figure 19. Different manifestations of inserted vowel between voiced consonants
Figure 20. Productions of auditory stimuli by Czech speaker
Figure 21. Frontal image of HATS system
Figure 22. Mid-sagittal ultrasound image of the beginning of the sound /s/
Figure 23. Automatically tracked contour
Figure 24. The sequence of steps in ultrasound data collection
Figure 25. Overlay of spatiotemporal XY-T images
Figure 26. Difference graph for the first 8 frames of ELR's succumb and zgama
Figure 27. Schematic of L2 norms
Figure 28. (a)-(c) XY-T displays of each surface in the /sək/, /sk/, /zg/ triad for speaker JED
Figure 29. JED's production of [zɛl] in zealot
Figure 30. Examples from speaker JED
Figure 31. Examples from speaker JED
Figure 32. Examples from speaker HJC for the labial triad
Figure 33. Performance on clusters broken down by context category
Figure 34. Possible association topologies for onset clusters
Figure 35. Example association topology for onset and coda clusters that violates *MULTASSOC
Figure 36. Scatter plots indicating performance on successively higher ranked strata
Figure 37. Spectrograms demonstrating different productions of /f+voiceless obstruent/ clusters from the experiment in Chapter 3
Figure 38. Pre-tonic schwa deletion in word-initial position in read speech
Figure 39. Schwa retention in slow speech "suffice"
Figure 40. Schwa deletion in fast speech "suffice"
Figure 41. Elision by sequence type
Figure 42. Aspiration/devoicing by sequence type
Figure 43. Elision compared to aspiration/devoicing
Figure 44. Duration in ms of /s/ in tokens with schwa elision vs. schwa retention
Figure 45. Duration in ms of /l/ in tokens with schwa elision vs. schwa retention
Figure 46. The word superior with vowel deletion and aspiration on the /p/
Figure 47. Distribution of schwa durations for slow and fast speech tokens
Figure 48. Hypothetical histogram representing a phonetic continuum of reduction
Figure 49. Distribution of normalized schwa durations for slow and fast speech tokens
Figure 50. Proportion of elision for each individual /#CəC/ environment
Figure 51. Proportion of devoicing/aspiration for each individual /#CəC/ environment
Figure 52. Overall elision patterns by participant
Figure 53. Elision in rate-dependent eliders
Figure 54. Elision in rate-independent eliders
Figure 55. A hypothetical probability distribution for a CC-COORD phase window
Figure 56. (a) Standard English CC-COORD constraint with phase window
Figure 57. Normal and fast CV-COORD constraints (with phase windows)


TABLE OF TABLES

Table 1. Word-initial cluster stimuli used in Davidson et al. (2003)
Table 2. Pseudo-Czech word-initial clusters used in experiment
Table 3. Possible response types
Table 4. Proportion correct in Repetition and Sentence conditions for the first segment, divided by version seen first
Table 5. Accuracy groupings for cluster type in the Repetition condition
Table 6. Accuracy groupings for context category in the Sentence condition
Table 7. English and pseudo-Polish experimental target words
Table 8. Speakers' productions of non-native stimuli
Table 9. Frame-by-frame average L2 norms for each speaker
Table 10. Languages containing /f/, /z/, and /v/-initial clusters
Table 11. Ranking typology to account for fricative-initial onset cluster inventories
Table 12. Observed proportion correct versus predicted proportion correct
Table 13. Observed proportion correct versus predicted proportion correct
Table 14. Details of the production of /f+voiceless obstruent/ sequences
Table 15. Experimental /#C1əC2-/ tokens
Table 16. Predictions for acoustic output of overlapped /C/ and /ə/
Table 17. Number of cases of /s/-initial tokens with elided schwa and with schwa present, by individual token and by speaker
Table 18. Number of cases of /l/-second tokens with elided schwa and with schwa present
Table 19. Correlation of word frequency and elision scores


CHAPTER 1. The Elements of Phonetic and Phonological Representations

In the study of the phonetics-phonology interface, it is often assumed that there is a division between abstract, categorical, symbolic phonological processes and grounded, gradient phonetic ones. The distinction between the two types of processes is that phonological knowledge employs qualitative categories that are cognitive in nature, whereas phonetic implementation is contingent on the fact that articulators have continuous trajectories that give rise to gradient differences among outputs (Keating 1988, Pierrehumbert 1990). The need for a phonological level of representation arises from the observation that some processes cannot be explained on an articulatory or acoustic basis alone. An obvious example is the class of suprasegmental phenomena concerned with metrical structure, which are not affected by the limitations imposed by vocal tract movement, but rather controlled by an abstract system that determines the different types of stress patterns that can arise (e.g. Hayes 1985, Halle and Vergnaud 1987). But even processes which directly involve transitions between articulators or the influence of one segment on another, such as nasal place assimilation or voice assimilation in consonant clusters, are also typically considered phonological phenomena (e.g. Chomsky and Halle 1968). It has been argued, however, that certain processes previously treated in a phonological framework may actually result from gradient phonetic implementation, not from the imposition of categorical rules, as in the case of anticipatory nasalization in English (Cohn 1993). Other similar processes, however, like vowel nasalization in French, remain fundamentally phonological.
While much work in this area has been devoted to demonstrating that there must be a distinction between phonological and phonetic processes, a number of researchers have also focused on detailing a framework which can incorporate both phonological and phonetic information into a unitary grammatical system. Notably, integrated accounts have proposed that grammars must be sensitive to both acoustic/perceptual and articulatory factors. Both Flemming (1995) and Kirchner (1998/2001) propose that phonological representations contain all speaker-controlled articulatory properties as well as auditory properties; in these systems, the phonology has access to both articulatory and acoustic elements. A later proposal by Flemming (2001) postulates a system akin to Optimality Theory that uses weighted constraints to balance the maximization of contrast against the minimization of articulatory effort by the speaker. Another type of proposal by Boersma (1998) argues that there are separate production and perception grammars, but he still advocates for a phonology that is concerned with both articulatory and perceptual specifications. The constraints allowed in his system are the standard phonological kind (e.g. *COMPLEX), ones that refer to articulatory events (e.g. *GESTURE), and others that pertain to perceptual drives (e.g. *WARP).

In comparison to these frameworks which integrate phonology and phonetics by incorporating primitives from both domains into a single system, Browman and Goldstein's Articulatory Phonology (Browman and Goldstein 1986, 1988, 1990a, b, 1992a, b, 1995, 2001) reconceptualizes the interface by assuming an innovative type of representation over which the grammar operates. Whereas other accounts often presuppose underlying representations that are segmental in nature (whether perceptual, as for Boersma, or more abstract), Browman and Goldstein claim that the basic unit of
representation is a dynamically defined gesture. Unlike the segment, which is a static entity composed of features, the gesture is defined articulatorily along both spatial and temporal dimensions. The spatial dimensions include vocal tract variables, such as lips, tongue tip, glottis, etc., which are involved in determining the location in the vocal tract where the gesture is made. The spatial dimension also consists of the types of constrictions that can occur, such as closures for stop gestures or critical narrowing for fricative gestures. The temporal dimension refers to landmarks in the gesture over time: the gestural onset; the onset, center, and release of the target; and the gestural offset. It is change over time which makes gestural representations dynamic. Although the gestural score—the level of organization at which the relationships between all gestures in a domain like the word are represented—does not contain discrete linear units, the constellation of gestures (a group of more than one gesture that makes up a specific sound) and the timing relationship among them can be thought of as representing phonological units like stops or liquids or vowels, or other segments as they are traditionally considered (Byrd 1996b). Furthermore, the coordination relationships between adjacent gestures and the constraints that govern them are important for understanding how syllable structure is realized in Articulatory Phonology. Articulatory Phonology and its assumptions will be fleshed out more completely in the next section. One appealing aspect of Articulatory Phonology as a grammatical framework is that gestural representations and the interactions between them are potentially able to give rise to both phonological and phonetic phenomena. That is, it seems likely that certain manipulations of gestures—such as compression or overlap—could lead to gradient behavior, whereas others may be categorical in nature—such as gesture deletion or insertion. 
However, the ability of Articulatory Phonology to capture both types of phenomena has been a matter of dispute. One strength of the current formulation of Articulatory Phonology is that it demonstrates that many phenomena that have previously been given a discrete, phonological explanation may actually be more appropriately accounted for by gestural coordination and overlap. For example, Browman and Goldstein and others have shown how changes in the relative positioning and duration of different gestures can lead to optional phenomena like schwa deletion and insertion (Browman and Goldstein 1992b, Jannedy 1994), intrusive consonants (Gick 1999), nasal place assimilation (Browman and Goldstein 1990b), and palatalization at word boundaries (Zsiga 1995, 2000). These are often processes found in fast speech or in a more casual speaking register, but ones which have typically been classified as phonological. On the other hand, it has been argued that Articulatory Phonology is ill-equipped to handle these processes when they are categorical, including deletion or insertion of segments, such as schwa epenthesis in Dutch (Warner, Jongman, Cutler and Mücke 2002) or linking and intrusive [r] in non-rhotic varieties of English (McMahon, Foulkes and Tollfree 1994, but see Gick 1999 for an alternative gestural account). This perceived incompatibility between adjustments among gestures and the modification of the actual gestural score in Articulatory Phonology has arisen from the claim that variation in different speech registers or at different speaking rates may result from changes in magnitude and overlap, but that “gestures are never changed into other gestures, nor are gestures added” (Browman and Goldstein 1992a:173). However, while there has not been an alternative theory presented to explain these processes or the formulation of an
Articulatory Phonology-based formal theory that could account for them, it is not the case that Articulatory Phonology is fundamentally ill-equipped to make the necessary distinctions. Another concern regarding Articulatory Phonology is that while its goal is to create a functionally-based framework for phonological theories, it ignores the other functionally grounded area which has also been shown to have an effect on phonological grammars: namely, acoustic or perceptual phonetics. It has been recognized for some time that phenomena like sound change and connected speech processes are influenced by the pressures of both perception and articulation (e.g. Martinet 1952, Ohala 1981, Lindblom and Maddieson 1988, Kohler 1990, Lindblom 1990a). For example, Kohler (1990) examined segmental reduction in German in order to investigate both the articulatory pressure for economy of effort and the perceptual constraints that determine which of the possible speech patterns become cemented in a language. Kohler argues that segmental reductions, deletions, and assimilations may initially occur in order to minimize energy expenditure, but notes that “they are only accepted (1) if they bear an auditory similarity to their points of departure and (2) if the situational context does not force the speaker to rate the cost of a misunderstanding or a break-down of communication very high” (89). A similar claim is made by Lindblom (1983, 1990a), who proposed the Hyper- and Hypospeech Theory (H&H) to elucidate how speakers adapt their speech to accommodate the communicative situation. In hyperspeech mode, speakers may over-articulate in order to maximize contrast between linguistic elements, but they use hypospeech when minimization of articulatory effort can be achieved without compromising the ability to correctly perceive the message. 
Lindblom (1990a) underscores the relationship between phonetic influences and language typology by making an explicit link between the H&H Theory and cross-linguistic inventories. More recently, a number of analyses have sought to formalize the role of perception in Optimality Theoretic grammars, which are fundamentally well-suited to capturing competition (e.g. Jun 1995, Côté 1997, Silverman 1997, Steriade 1997, Kirchner 1998/2001, Hayes 1999, Côté 2000, Wilson 2000, Fleischhacker 2001, Wilson 2001). These will be discussed further in the next section. As advocated by Zsiga (2000), it seems likely that ultimately, a distinction between a phonological and phonetic component will have to be maintained. As is implicit in many proposals advocating the need for both articulatory and perceptual aspects in the grammar, it is only at a more abstract phonological level that otherwise unrelated phonetic factors may combine and interact to influence not only language inventories, but also speech production. In addition, the existence of both phonetic and phonological components may explain the dissociation between the modification of gestural scores and fine adjustment of gestural timing; whereas the phonological component can manipulate the gestural content of the score and determine and alter abstract coordination relations among adjacent gestures, the phonetic component controls the timing between gestures that do not have coordination relationships specified by the phonology.1 As will be illustrated in Section 2.1, the majority of cases that Browman and

1 From this point on, the term “timing” will be reserved for the fine-grained phonetic implementation of relationships among gestures, and “coordination” will be used in reference to the abstract, discrete relationships defined by the phonology.

Goldstein use to construct the theory are processes that occur at word boundaries or which are otherwise not determined phonologically. On the other hand, when relationships between gestures are specified in the output, they are in the domain of the phonology. Crucially, the phonology must have access to abstract coordination relationships between gestures, but it does not have access to absolute duration, the implementation of gestural timing, or how gestures at word boundaries interact with one another. In this dissertation, it is assumed that phonological and phonetic systems are unified in the sense that both of them take gestural representations as their input. By positing the same type of representation for both systems, the problem of translation between phonological output and units that the phonetic component can operate on is presumably greatly simplified. It will be illustrated that gestural representations in the phonology are necessary because they allow for the temporal coordination between segmental gestures. The experimental data that will be presented in this dissertation suggest that grammaticization of temporal coordination can account for aspects of surface representations that have been problematic for phonological representations that do not represent temporal relationships. The issue of coordination between gestures—or the lack thereof—has led to the criticism that the Articulatory Phonology framework as initially formulated is merely a descriptive account of relationships between gestures, insufficiently able to predict or constrain the types of coordination and timing relationships that can occur (Kingston 1992, Kingston and Cohen 1992). 
Kingston argues that while the gestural machinery employed by Articulatory Phonology is capable of accounting for how particular speech events are produced, there is no distinction made, if one exists, between coordination relationships among gestures which are found in natural speech and those which are not. Kingston and Cohen note that by limiting variation to gestural overlap or change in the magnitude of the gestures, Browman and Goldstein (1992a) have made a step in the right direction. However, like other authors, Kingston and Cohen agree that adjustments of timing relationships alone will ultimately fail to account for all types of allophonic variation. Although the papers by Browman and Goldstein do not directly address the issue of possible versus impossible timing and coordination relationships, Gafos (1996/1999, 2002) takes on the task of adopting a formalism based on Optimality Theory (Prince and Smolensky 1993) to demonstrate how coordination relationships among gestures can be constrained to give rise to different configurations between consonants and vowels. In the formalism developed in Gafos (2002), coordination relations are defined by alignment constraints, which indicate how gestures are related to one another in the output in order to give the optimal configuration. The central goal of this dissertation is to investigate the types of elements that are necessary in the phonological component, whether articulatory, perceptual, temporal, or abstract in nature, and to study how these elements interact. These questions extend both to the input representations that are submitted to a phonological grammar, and to the types of constraints found in the grammar. In order to adequately account for both production data and data from language typology, a phonological component must include articulatory gestures, which are the basis for spatiotemporally defined linguistic units; perceptual features, which play an important role in determining phonotactic
patterns; and gestural coordination, which establishes whether and how adjacent gestures are related to one another. In this dissertation I examine the production of word-initial consonant clusters in various conditions, since they are an environment in which perception, articulation, and coordination interact in the phonology. I draw on the insights from research in the Licensing-by-Cue framework (Steriade 1997) to determine how perceptual factors affect consonant cluster production by English speakers during various tasks. Building on the formalism based on Articulatory Phonology developed in Gafos (2002), I further explore the claim that a phonological component which incorporates gestural coordination can best account for various phenomena found in production. I will argue that a coherent phonological system takes gestures as inputs to a constraint-based grammar that evaluates both gestural coordination relations and the components of the gestures themselves. Crucially, the formation of coordination relationships among gestures is a grammatical process that defines how adjacent gestures are related. In other words, a gestural coordination system must account for whether consonants have a relationship with preceding or following vowels, whether consonants are coordinated with other consonants, and whether these complexes have a relationship with the same vowel or with different ones. In essence, this provides the basis for a gestural correlate of syllable structure. This will be illustrated with experimental data from schwa insertion in the production of Czech onset clusters by English speakers and English schwa elision in pre-tonic syllables.

1.1. Outline of the dissertation

The organization of the dissertation is as follows. In Chapter 2, I discuss the basic tenets of both Articulatory Phonology and perceptual approaches to phonology in detail. I also address the formal theory developed by Gafos (2002), and review empirical findings that have either supported or refuted the claims of Articulatory Phonology. Theoretical frameworks incorporating articulatory and perceptual factors into the phonological grammar are also discussed. Chapters 3 and 4 contain experimental results that provide direct evidence for the need for both abstract phonological entities and gestural coordination in phonological grammars. In Chapter 3, I report on the acoustic results of an experiment in which English speakers produced non-native word-initial clusters. It is shown that speakers make distinctions among illegal phonotactic sequences, even though none of them are permitted in the native English lexicon. When speakers fail to correctly produce the target word-initial clusters, they most often repair them by inserting a schwa between the two consonants in the cluster. The precise nature of this schwa is investigated in Chapter 4. Evidence from articulatory studies has suggested that ill-formed phonotactic sequences may be repaired not only by phonological epenthesis, but also by modifications of gestural coordination. An ultrasound study of the production of non-native word-initial clusters was conducted to determine whether English speakers are repairing illegal sequences with phonological epenthesis or through alterations in gestural coordination. In Chapter 5, a formal analysis of the experimental findings regarding the production of non-native onset clusters is developed. The chapter is divided into three parts. First, it is argued that the production patterns exhibited by English speakers can only be accounted for with a phonological analysis, and that a purely articulatory account
will not suffice. Rather, an adequate analysis of word-initial consonant clusters, both to account for the experiment and to explain typological facts about consonant cluster inventories, must appeal to both perceptual and articulatory factors that can only be combined at the phonological level. Results of the ultrasound experiment from Chapter 4 provide further evidence that gestural coordination must reside in the grammar in order for the speakers’ repairs to be able to interact with phonotactic markedness constraints. Second, it is hypothesized that gestures subject to coordination relationships must be defined by the phonology, so Gestural Association Theory, a mechanism for determining which consonants and vowels have coordination relationships among them, is developed. Finally, the proportion of accuracy on the different non-native clusters produced in the experiment in Chapter 3 is accounted for with a floating constraint analysis. It is argued that speakers are sometimes accurately producing the phonotactically illegal clusters and are otherwise failing to coordinate the consonants correctly. This behavior is attributed to a grammar in which a coordination constraint pertaining to consonant sequences does not have a fixed ranking with respect to consonant cluster markedness constraints. In Chapter 6, the relationship between the theoretical framework developed in Chapter 5 and the results of an experiment investigating fast speech pre-tonic schwa deletion in English is discussed. Whereas coordination assists in producing a schwa under certain conditions in some environments, it may also play a role in eliminating it in others. Though the findings suggest that the phonotactic environment (i.e. whether the resulting cluster is legal word-initially in English) does not seem to affect deletion rates, it is argued that speech rate may nevertheless affect the phonology by modifying the standard coordination relationships among gestures.

CHAPTER 2. Articulatory and Perceptual Approaches to Phonology

In this chapter, evidence and previous arguments for incorporating articulatory, perceptual and coordination elements in the grammar are presented. In Section 2.1, the basic tenets of Articulatory Phonology are explained in some detail, as they form the basis for the constraint-based grammar of gestural coordination that will be adopted in this dissertation. The notion of the gesture and the organization of gestures into gestural scores are introduced in Section 2.1.1. Gestures are the fundamental phonological unit in Articulatory Phonology, and scores represent how gestures combine to form a lexical entity. The relationship between gestural representation and traditional phonological natural classes is covered in Section 2.1.2. The similarities between gestural notions of natural classes and those used in feature geometry aid in the interpretation of how the phonology can make reference to types of gestures that are similar along some dimension. In Section 2.1.3, phasing and coordination relationships are discussed, and Gafos’s (2002) constraint-based grammar, which takes gestures and coordination relationships as input, is introduced. This is the framework that will be used in the formal analyses of the data presented in this dissertation in Chapters 5 and 6. Section 2.1.4 contains a review of previous experimental and instrumental studies of gestural overlap. In addition to articulation, the basis of a gestural theory of phonology, perception has also been argued to play a critical role in shaping phonological inventories and phonotactic patterns, and in influencing how speech is implemented by a speaker who is (implicitly) taking the needs of the listener into consideration. This idea is reviewed in Section 2.2. In Section 2.2.1, empirical evidence supporting the notion that speech production and the organization of typologies are affected by both perception and articulation is presented. 
Research focused on developing theoretical frameworks to account for this idea is reviewed in Section 2.2.2.

2.1. Articulatory Phonology and gestural representation

2.1.1. Gestures and gestural scores

Gestures in Articulatory Phonology, as described by Browman and Goldstein (1986, 1989, 1990b, 1992a), are defined by variables that refer to both the location and degree of constrictions in the vocal tract. Five tract variables are identified: Lips (L), Tongue Tip (TT), Tongue Body (TB), Velum (VEL) and Glottis (GLO) (other tract variables, like Tongue Root, have yet to be developed in detail). It is assumed that each of these articulators is independent of the others, which allows for separate gestural representations for each variable. Tract variables are specified for constriction location (CL), to accommodate the fact that variables can constrict the vocal tract in more than one location. CL values include [protrusion], [labial], [dental], [alveolar], [postalveolar], [palatal], [velar], [uvular], and [pharyngeal]. In addition to location, there is also constriction degree (CD), which is necessary for distinguishing between different types of constrictions that can be made at the same location. There are five values for CD: [closure], [critical], [narrow], [mid], and [wide]. For stops, the CD is [closure] (clo), whereas for fricatives, it is [critical] (crit). This distinction corresponds to the values in the [±continuant] feature of Chomsky and Halle
(1968). The remaining three constriction degrees pertain to approximants and vowels, where [mid] in consonants is similar to Catford’s (1977) feature [approximant], but is grouped with [wide] and [narrow] in vowels to refer to vowel height. Combining location and degree information, for example, a [g] would be produced with a tongue body constriction location (TBCL) of velar and a tongue body constriction degree (TBCD) of closure. For Browman and Goldstein, the gestures that constitute a speech sound are considered the primitives of phonology and are given significance when they are combined into a pattern of organization that is the representation of an utterance. The information that is necessary for defining the phasing relationships between gestures at any point in the utterance is represented in a two-dimensional gestural score. The dimension on the y-axis corresponds to the articulatory tiers, while the x-axis encodes temporal information. This is shown in the schema for the word “pan” in Figure 1.

VEL    wide
TB     wide pharyngeal   [æ]
TT     clo alveolar
LIPS   clo labial
GLO    wide

Figure 1. Schematic gestural score for ‘pan’ [phæn]

In this schema, the sound usually described as the voiceless aspirated stop [ph] corresponds to both the labial closure and the wide glottal gesture. The gesture defined by TBCD=[wide] and TBCL=[pharyngeal] corresponds to the vowel [æ], while the [wide] gesture produced at the velum in combination with TTCL=[alveolar] and TTCD=[closure] is the representation of the [n]. In the case that multiple gestures are required to identify one speech sound (as typically defined), the grouping of gestures into a coherent unit is called a constellation of gestures, and can be thought of as corresponding to the traditional notion of segment. The specifics of the timing of the two gestures within a constellation, as in [ph], will be described further below, as will the notion that the vocalic gesture spans across the duration of the initial and final consonantal gestures.
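The tract-variable and timing information just described can be pictured as a simple data structure. The following is a minimal illustrative sketch, not Browman and Goldstein's own formalism: the `Gesture` class, the `overlapping` helper, and the activation intervals are all invented here to make the tier-plus-interval organization of a score concrete.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Gesture:
    tier: str             # tract variable: LIPS, TT, TB, VEL, GLO
    cd: str               # constriction degree: clo, crit, narrow, mid, wide
    cl: Optional[str]     # constriction location (VEL and GLO have degree only)
    start: float          # schematic activation interval (arbitrary units)
    end: float

# Schematic score for 'pan' [phaen], following Figure 1 above:
pan = [
    Gesture("LIPS", "clo",  "labial",     0.0, 0.3),  # labial closure of [ph]
    Gesture("GLO",  "wide", None,         0.1, 0.4),  # aspiration of [ph]
    Gesture("TB",   "wide", "pharyngeal", 0.2, 0.9),  # vowel [ae] spans the word
    Gesture("TT",   "clo",  "alveolar",   0.7, 1.0),  # oral closure of [n]
    Gesture("VEL",  "wide", None,         0.6, 1.0),  # velum lowering of [n]
]

def overlapping(score, t):
    """Tiers whose gestures are active at time t (a rough 'constellation')."""
    return sorted(g.tier for g in score if g.start <= t <= g.end)

print(overlapping(pan, 0.15))  # both gestures of the [ph] constellation
print(overlapping(pan, 0.85))  # vowel still active alongside the [n] gestures
```

The point of the sketch is only that a score is tiers crossed with time: "segments" fall out as groups of temporally overlapping gestures, not as primitive units.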

2.1.2. Regularities and natural classes Browman and Goldstein (1989) claim that gestural scores are effectively lexical entries. Minimal contrast between lexical entries—such as between ‘pan’ and ‘man’—is identified by differences in the gestures that make up the score. Thus, while the initial labial closure in ‘pan’ has a relation with a wide glottal gesture, the [m] of ‘man’ will not have a wide glottis but rather a wide velum in a relationship with the labial closure instead. These kinds of contrasts suggest that generalizations about phonological entities can be gleaned from the gestures and gestural coordination. Browman and Goldstein contend that the organization of gestures in Articulatory Phonology has much in common with the hierarchies found in various feature geometry proposals (Clements 1985, Sagey 1986, Ladefoged 1988a, b, McCarthy 1988, Halle 1992), which were originally
developed to demonstrate that phonological features are organized with respect to articulators. The articulatory geometry tree proposed by Browman and Goldstein (1989, and modified by Gafos (1996/1999)), in which gestures are organized by major articulators, is shown in Figure 2:

Vocal Tract
  Oral
    LIPS [CD, CL]
    Tongue
      TT [CD, CL]
      TB [CD, CL]
      TR [CD, CL]
  VEL [CD]
  GLO [CD]
  LARYNX [CD]

Figure 2. Feature geometric articulator tree

In addition to dividing the gestures in terms of articulators, they can also be further delineated by the constriction degree and constriction location specifications. Constriction degree is the gestural analog of manner, though Browman and Goldstein are careful to note that it is purely articulatory in nature and not acoustic in any sense. As noted in Section 2.1.1, the constriction degree [closure], which results in a complete stoppage of airflow through the vocal tract, defines both oral and nasal stops, whereas [critical], which allows for frication when it is in conjunction with an oral articulator, is the same as fricative manner. The values [narrow], [wide], and [mid] correspond to vowel features like [+high], [+low], and [+mid], but are also relevant for glottal aperture. Whereas the [wide] glottal gesture in combination with stops leads to aspiration, the [narrow] glottal gesture is associated with unaspirated stops. Just as constriction degree is related to manner, constriction location is similar to place. Browman and Goldstein note that a difference between traditional notions of place and articulatory constriction location is that there is not a one-to-one mapping of articulators to constriction locations. That is, the set of articulators forming a constriction and the location of that constriction are independent dimensions of a gesture and it may be possible for more than one articulator to make a constriction at a given location. Consequently, they argue that this kind of mapping is better motivated because it shows how these two gestural variables can combine to produce fine-grained articulatory distinctions, such as bilabial, labio-dental, and dental. This is shown in Figure 3:

Articulator set    Constriction locations
LIPS               protruded, labial, dental
TT                 alveolar, postalveolar, palatal
TB                 velar, uvular
TR                 pharyngeal

Figure 3. Possible mappings between articulator sets and constriction locations

2.1.3. Phasing and coordination relationships

Now that the spatial characteristics of gestures have been reviewed, their temporal aspects can be examined in order to determine how gestures behave when in contact with one another. Much like vocal tract variables have analogs in feature geometry, the spatial information represented in a gestural score can be organized in terms of functional tiers that are similar to those posited for consonants and vowels in CV phonology (e.g. Clements and Keyser 1983). In both of these frameworks, it is hypothesized that consonants and vowels are represented on separate tiers, where vowels are contiguous on the vocalic tier, but consonants on the consonantal tier are not (e.g. Öhman 1966, Carney and Moll 1971). Whereas contiguous vowel sequences are not affected by intervening consonants, a vowel situated between two consonants prevents them from overlapping at all (Browman and Goldstein 1990b, Gafos 1996/1999). This organization has been used to explain, for example, coarticulation facts (Keating 1985) and why vowel harmony is a common process in natural language, but consonant harmony is very rare (Gafos 1996/1999). While the division of consonants and vowels onto separate tiers is motivated by facts from coarticulation and vowel harmony, this does not mean that there is no relationship between consonants and vowels or between adjacent consonants. Using tracings from x-ray pellet trajectories of speakers producing various different types of words, Browman and Goldstein (1990b) demonstrate that consonant and vowel gestures have coordination relationships that consistently associate particular positions of one gesture with particular positions of another, depending on which gesture precedes or follows the other. Traditionally, gestural landmarks are described with a notation based on a 360º trajectory for each gesture. 
However, Gafos (2002) employs a more intuitive notation to indicate the types of relationships found between consonants and vowels and between consonants and other consonants. This notation is shown in comparison to the 360º trajectory in Figure 4:

onset      0º
target     180º
center     240º
release    330º
offset     360º

Figure 4. Gestural landmarks

Gafos’s landmarks do not correspond to neat divisions of the 360º cycle because the trajectory of a gesture is assumed to be critically damped. Critical damping, a feature of the task-dynamic model that has been developed to implement gestural phonology, means that the trajectory of the gesture approaches its equilibrium position increasingly slowly. Browman and Goldstein define the equilibrium position as occurring at 240º (Browman and Goldstein 1990b). For English sequences, it was determined that the center of the consonant (about 240º) in simple VC sequences coincides with the release of the vowel (about 330º).2 Using Gafos’s notation, this relationship can be approximated by aligning the release of the vowel with the center of the consonant. For simple CV sequences, they found that the center of the consonantal gesture is phased with the onset of the vocalic gesture. A division between VC and CV timing relationships is further supported by findings reported in Browman and Goldstein (1995), who show that nasals and laterals are realized differently when they are in the onset as opposed to the coda. In this case, the individual gestures that make up the nasal (tongue body closure and velum lowering) or the lateral (tongue tip raising and tongue body retraction) are timed such that the gesture with the wider constriction degree comes earlier following the vowel, but is approximately simultaneous with the narrower constriction when preceding the vowel. Browman and Goldstein note that this type of distinction in Articulatory Phonology reflects the kinds of distinctions that are captured by syllable structure. A number of other articulatory studies have also shown that pre-vocalic/onset consonants have different patterns of gestural organization from post-vocalic/coda consonants (e.g. Krakow 1989, Sproat and Fujimura 1993, Browman and Goldstein 1995, Byrd 1996a, Fougeron and Keating 1997, Kochetov to appear). 
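The claim that a critically damped trajectory approaches its target increasingly slowly can be made concrete with the closed-form solution for a critically damped second-order system. The sketch below is purely illustrative: the function name and all constants are invented here and are not fitted to articulatory data or to the task-dynamic model's actual parameters.

```python
import math

def trajectory(x0, xt, w, t):
    """Critically damped approach from x0 to target xt with natural frequency w:
    x(t) = xt + (x0 - xt) * (1 + w*t) * exp(-w*t)."""
    return xt + (x0 - xt) * (1 + w * t) * math.exp(-w * t)

x0, xt, w = 0.0, 1.0, 10.0   # start, target, frequency (arbitrary units)
for t in [0.1, 0.2, 0.4, 0.8]:
    print(f"t={t}: x={trajectory(x0, xt, w, t):.3f}")
# The trajectory covers most of the distance early and then creeps toward the
# target without overshooting, which is why landmarks like 'target' (~180º) and
# 'center' (~240º) are not evenly spaced around the 360º cycle.
```

The monotone, ever-slowing approach is the property at issue: the equilibrium position is reached only asymptotically, so late landmarks crowd together.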
In his work on Moroccan Colloquial Arabic, Gafos further formalizes coordination relationships with Alignment constraints in the grammar (McCarthy and Prince 1993, see also Zsiga 2000 for phonetic alignment constraints). Thus, VC- and CV-coordination relations for English are defined by the following constraints:

(1) VC-COORD: ALIGN (V, release, C, center)

(2) CV-COORD: ALIGN (C, center, V, onset)

These VC- and CV-coordination relations are shown schematically in Figure 5:

Figure 5. Coordination relationships for VC and CV sequences

(solid lines=consonants, dotted lines=vowels). The vertical line marks the point of alignment.

Consonant clusters both in the onset and in the coda of a syllable present a new set of considerations for determining the coordination relationships between gestures.

2 Though not a term typically associated with vowels, release refers to the point at which the trajectory of the gesture starts to move away from its steady-state (or equilibrium) portion. In obstruents, the release coincides with an audible event; while this does not occur for all gestures, a release is nevertheless part of the gesture.
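One way to picture how an alignment constraint such as VC-COORD might be evaluated is to score the temporal distance between the two named landmarks. This is a hypothetical sketch, not Gafos's actual implementation: the landmark times and the `align_violation` function are invented for illustration, and real evaluation is defined over gestural dynamics rather than fixed millisecond values.

```python
def align_violation(gest1, lm1, gest2, lm2):
    """Gradient violation = temporal distance between the two named landmarks."""
    return abs(gest1[lm1] - gest2[lm2])

# Hypothetical landmark times (ms) for a V + C sequence:
vowel     = {"onset": 0,   "target": 60,  "center": 90,  "release": 140, "offset": 180}
consonant = {"onset": 100, "target": 130, "center": 145, "release": 175, "offset": 200}

# VC-COORD: ALIGN(V, release, C, center)
print(align_violation(vowel, "release", consonant, "center"))  # 5 ms misalignment
```

A candidate that timed the consonant so that its center fell exactly at the vowel's release would incur zero violation of this constraint.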

With respect to onsets, Browman and Goldstein and colleagues have shown that consonants in a cluster exhibit a consistent timing relationship among themselves, but also act as a single unit with respect to the following vowel (Browman and Goldstein 1988, Byrd 1995, Honorof and Browman 1995, Browman and Goldstein 2001). Evidence from x-ray microbeam tracings of the utterances sayed [sejd], paid [pejd], and spayed [spejd] showed that with respect to the fixed reference point of maximum constriction for the final [d], the centers of the target interval for [s] and [p] in sayed and paid were essentially in the same position (Browman and Goldstein 2001, see also other clusters examined in Browman and Goldstein 1988). Neither of the centers of the individual consonants in the [sp] cluster was similarly aligned to the [d], but the mean of the centers of the two consonants—called the c-center—was found to have the same timing relationship with the reference point as the centers of the singleton [s] and [p]. For singleton consonants, the c-center and the center landmark are equivalent. It is assumed that the vowel gesture does not change with respect to the [d] in any of these words, so the fact that the onset consonants are invariantly timed with respect to it suggests that they also all have similar timing relationships with respect to the vowel. It is concluded that the c-center is the point in the onset that has the most stable relationship with a following vowel. Browman and Goldstein (2001) hypothesize that the stability of the c-center derives from the fact that each consonant in the onset has an individual CV-coordination relationship with the following vowel in addition to a CC-coordination relationship. These two coordination relations both exert pressure on the onset consonants. 
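The c-center computation itself is simple arithmetic: the mean of the center landmarks of the onset consonants. The times below are invented for illustration; only the relationship between them mirrors the empirical finding described above, namely that the mean, not either individual center, lines up with the singleton's timing.

```python
def c_center(centers):
    """Mean of the center landmarks of the onset consonants (ms)."""
    return sum(centers) / len(centers)

s_center, p_center = 80, 140   # hypothetical centers of [s] and [p] in 'spayed'
singleton_center = 110         # hypothetical center of singleton [s] in 'sayed'

print(c_center([s_center, p_center]))  # 110.0: matches the singleton's timing
print(c_center([singleton_center]))    # 110.0: for a singleton, c-center = center
```
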
Full satisfaction of the CV-coordination relation would require that the two consonantal gestures occur simultaneously, since this would be the only way for the target of each C to be aligned with the onset of the V. However, depending on the particular consonants, this might result in one of the consonants being unrecoverable (Mattingly 1981, Silverman 1995/1997), so Browman and Goldstein conclude that it must be the case that the CC-coordination relation has a stronger influence than the CV-coordination relation. In Optimality Theoretic terms, this would result from the ranking CC-COORD >> CV-COORD. In coda clusters, however, no one particular stable timing relationship has been found to suggest that there is a c-center effect for the coda (Byrd 1995, Browman and Goldstein 2001). Based on evidence that excrescent schwas (or an open transition between consonants) in Moroccan Colloquial Arabic occur between consonants in a coda cluster but not in an onset cluster (which has a close transition), Gafos (2002) also posits a difference between the consonantal relationships in complex onsets and codas. The general coordination topology for CCVCC sequences can be described as in Figure 6, in which each consonant (C1, C2, C3, C4) stands in a CV- or VC-coordination relation with the vowel V, and the members of each cluster stand in a CC-coordination relation with one another.

Figure 6. Coordination topology for onset and coda clusters
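The topology in Figure 6 amounts to a small set of pairwise relations. A sketch that enumerates them for a CCVCC syllable (the tuple encoding of relations is an illustrative assumption, not notation from the text):

```python
def coordination_topology(onset, coda):
    """Enumerate the coordination relations of a CCVCC syllable:
    every consonant is coordinated with the vowel V (CV- or
    VC-coordination), and adjacent cluster-mates are coordinated
    with each other (CC-coordination)."""
    relations = [(c, "V", "CV-coord") for c in onset]
    relations += [(c, "V", "VC-coord") for c in coda]
    for cluster in (onset, coda):
        for a, b in zip(cluster, cluster[1:]):
            relations.append((a, b, "CC-coord"))
    return relations

# The CCVCC case of Figure 6: onset C1 C2, coda C3 C4.
for rel in coordination_topology(["C1", "C2"], ["C3", "C4"]):
    print(rel)
```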

The evidence from x-ray pellet trajectories already discussed with respect to CV relationships indicates that consonants in onset clusters are not only coordinated with the vowel, but also have consistent coordination relations in which the release of the first consonant is aligned with the onset of the second consonant (Browman and Goldstein 1990b). This differs somewhat from the CC-coordination relation for English discussed by Gafos (2002), who suggests that consonants in close transition in a cluster have a CC-coordination relationship such that the release of the first consonant is aligned with the target of the second consonant (i.e. ALIGN(C1, release, C2, target)). Gafos's coordination relation follows from Catford (1988), who defines close transition as the formation of the articulatory stricture of the second consonant in a cluster before the stricture of the first consonant is released. A phonetic alignment constraint expressing a very similar timing relationship for English consonants across word boundaries was also proposed by Zsiga (2000). Using Gafos's schematic notation, this coordination relationship is demonstrated in (3):

(3) CC-COORD in English: ALIGN(C1, release, C2, target)

Further evidence supporting the coordination relationship in (3) as the appropriate alignment for English comes from comparison with the optimal CC-COORD relationship in Moroccan Colloquial Arabic (MCA). Because MCA exhibits a transitional schwa in coda clusters, Gafos hypothesizes that this arises from a timing relationship in which there is not a continuation from the target plateau of the first consonant to that of the second. This leaves a short period of open vocal tract between the two consonants, which gives rise to the transitional schwa. (Pressure from CV-COORD, which affects onset but not coda clusters, will force the retiming of onset consonants so that there is no transitional schwa syllable-initially.)
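An alignment constraint of this form can be read as a requirement that a landmark of C1 coincide in time with a landmark of C2. A minimal sketch (the Gesture record, the landmark times, and the distance-as-violation measure are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    """Temporal landmarks of a single gesture, in ms (hypothetical)."""
    onset: float
    target: float
    center: float
    release: float

def align_violation(c1, landmark1, c2, landmark2):
    """How far (ms) the timing is from satisfying
    ALIGN(C1, landmark1, C2, landmark2); 0.0 means satisfied."""
    return abs(getattr(c1, landmark1) - getattr(c2, landmark2))

# A close-transition cluster: C1's release coincides with C2's
# target, satisfying the English pattern in (3).
c1 = Gesture(onset=0.0, target=40.0, center=60.0, release=80.0)
c2 = Gesture(onset=55.0, target=80.0, center=100.0, release=120.0)
print(align_violation(c1, "release", c2, "target"))  # 0.0
# An alternative alignment, ALIGN(C1, center, C2, onset), is not
# satisfied by this particular timing:
print(align_violation(c1, "center", c2, "onset"))    # 5.0
```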
CC-COORD in MCA is demonstrated in (4):

(4) CC-COORD in MCA: ALIGN(C1, center, C2, onset), leaving a period of open vocal tract between C1 and C2

In English, both onset and coda clusters are in close transition, and acoustic releases are not evidenced between consonants in a cluster. However, it is possible that different types of consonants necessitate different coordination relations between them. For example, Browman and Goldstein base their definition of the timing relationship between consonants in onset clusters on x-ray tracing evidence from a [pl] cluster. Because the [l] is vocalic in nature, it may have a CC-coordination relationship similar to the English preferred CV timing. Gafos, on the other hand, formulates his CC-COORD constraint on the basis of obstruent clusters, which may need a different amount of overlap to produce a close transition. Further examination of experimental data on [stop+liquid] clusters and obstruent clusters is necessary to determine the coordination relationships that may hold between different types of consonants in clusters. The idea that consonant manner and place as well as extralinguistic factors like speech rate influence the phasing relationships between CV, CC, and VV sequences has been studied in depth by Byrd (1994, 1996a, 1996b), who has claimed that the invariant nature of the coordination relationships posited by Browman and Goldstein does not hold up across all conditions. Specifically, she argues that there is a constrained variability in intergestural timing that can be affected by factors such as the manner and place of each consonant, whether they are tautosyllabic or heterosyllabic, and whether it is possible for resyllabification to occur across word boundaries (Byrd 1996a). In order to account for the variability found in phasing relationships, Byrd (1996b) introduces a phase window framework which allows linguistic and extralinguistic variables to affect intergestural timing relationships while imposing limitations on the influence these factors can have. In this framework, probabilities are assigned to a restricted range of possible phase relations between gestures. For example, a cluster in coda position may increase or decrease in overlap within certain limits, and with a certain probability. The probability associated with the amount of overlap can be influenced in three different ways: (1) by having a preference for a particular region of the phase window, (2) by differing in the extent of the window that it can influence, and (3) by contributing a particular weighting or activation level to some linguistic or extralinguistic factor. Phase windows, and how they might be incorporated into a phonological account of coordination, are discussed more extensively in Chapter 6. Byrd is careful to note that the phase window is most pertinent for understanding how lexically unspecified timing relationships, such as those across word boundaries, may vary from utterance to utterance. She hypothesizes that within lexical items, there can be lexically-specified relationships between gestures that are stable, which may well correspond to the traditional notion of segment. It is also implied that timing relationships may be more stable for particular components of a syllable (such as onsets) than for others (see also Byrd 1995).
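Byrd's phase window can be caricatured as a bounded range of permissible phase relations with weights over its subregions (the window limits, the weights, and the equal-subregion sampling scheme below are hypothetical illustrations, not Byrd's actual model):

```python
import random

def sample_phase(window, weights, rng=random):
    """Sample a phase relation (in degrees) from a phase window.
    The window bounds the permissible phase relations; the weights
    bias the choice toward particular equal-width subregions."""
    low, high = window
    n = len(weights)
    # Pick a subregion by weight, then sample uniformly within it.
    i = rng.choices(range(n), weights=weights)[0]
    step = (high - low) / n
    return rng.uniform(low + i * step, low + (i + 1) * step)

# A hypothetical window of 240-270 degrees whose later subregions
# are more probable; samples never fall outside the window.
phase = sample_phase((240.0, 270.0), [1, 2, 4])
assert 240.0 <= phase <= 270.0
```

The key property this captures is that variability is constrained: factors shift probabilities within the window, but no factor can push the phase relation outside it.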
This would result in narrower phase windows for those timing relations as well. Zsiga (2000) discusses how the phase window framework might be applied to palatalization of /s/ across word boundaries, as in the realization of “miss you” as [mISU]. Zsiga showed that at word boundaries, English speakers show a range of palatalization effects from full palatalization (=[S]), to a token that is acoustically similar to [s] at the beginning of the production and closer to [S] at the end (=[sS]), to an audible release of the coronal fricative (=[s#j]). Zsiga attributes these differences to variations in the amount of overlap of the critical alveolar gesture and the palatal gesture, and posits a constraint that contains a window of possible alignment locations for the first consonant: ALIGN(C1, [240°-270°], C2, [240°]) (see Figure 4). A similar proposal is offered by Cho (1998), who examined gestural overlap in two kinds of comparisons: lexicalized (semantically opaque) vs. non-lexicalized (semantically transparent) compounds and tautomorphemic vs. heteromorphemic sequences in Korean. For example, descriptively, /t/ is palatalized before /i/ only in derived environments: /mat-i/ [madZi] ‘the eldest’ but /mati/ [madi] ‘knot’ (intervocalic voicing in Korean is a separate process). Based on EPG and EMA data from Korean, Cho finds considerably greater variability (measured in standard deviation) in gestural timing between two consonant gestures at the boundary of two morphemes in a non-lexicalized compound than in a lexicalized compound. He develops an Optimality Theoretic analysis incorporating the phase window framework that focuses on the interaction between the constraints OVERLAP and IDENT(timing). He proposes that the intergestural coordination relationships for lexicalized compounds and tautomorphemic sequences are specified in the lexical representation of these words, but that there is no such specification for non-lexicalized compounds or heteromorphemic sequences. The lexicalized coordination relationship is accompanied by the specification of a phase window that regulates the amount of overlap that two gestures can exhibit. This is regulated by the constraint IDENT(timing), defined in (5):

(5) IDENT(timing): The range of gestural phasings in the output must be identical to, or fall within, the Phase Window in the input, which specifies a permissible range of gestural overlap.

This constraint is violated whenever the overlap between two consonants does not fall within the range specified by the phase window. Such a violation could be compelled by the constraint OVERLAP, which Cho takes to mean that two gestures must be maximally overlapped. OVERLAP is evaluated in a gradient fashion, such that partial overlap incurs one violation, and minimal overlap incurs two violations. Thus, in the derived (non-lexicalized) environment, there is no specified timing relationship, so satisfaction of OVERLAP leads to palatalization, as in [madZi]. A similar account of the realization of /RC/ clusters in Urban East Norwegian is developed in Bradley (2002). Alternatives to Cho's (1998) notion of overlap have also been proposed, namely that gestures prefer to avoid total overlap rather than require it. In order to explain Korean palatalization effects, Cho assumes that “the system prefers as much overlap as possible unless there are other competing factors in the grammar, such as contrast maximization constraints and preservation of lexically specified timing” (33). However, constraint-based gestural analyses of other phenomena, such as svarabhakti in various languages (Hall 2002) and OCP restrictions in Moroccan Arabic (Gafos 2002) and English (Smorodinsky 2002), have either explicitly or implicitly proposed constraints that require gestures to reduce overlap as much as possible. Because Cho does not incorporate coordination constraints like the Alignment constraints proposed by Gafos (2002), he is required to posit OVERLAP in order to ensure that gestures in Korean overlap sufficiently to produce palatalization.
However, an analysis containing both coordination constraints and constraints specifying which gestures stand in a relationship with one another (and are therefore subject to the coordination constraints), such as the one proposed for palatal coalescence in Zoque (Davidson 2003), would also be able to capture the facts presented by Cho. It is important to reiterate that Articulatory Phonology is a theory of phonology; that is, it takes gestural scores to be lexical representations. Typically it is assumed that certain coordination relations among gestures are specified as part of the lexical entry (e.g. Byrd 1996b), but it will be argued in this dissertation that specified relationships, or associations, are both created and regulated by grammatical constraints. This occurs when associations are essential to defining relationships between consonants and vowels, and between consonants and consonants, that are typically considered to be syllabic relationships, such as a CV that is an onset-nucleus sequence or a CC that is an onset cluster. The grammatical constraints necessary for determining which gestures are associated with one another, and how they interact with the alignment constraints proposed for coordination relations, are developed in Chapter 5.
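Cho's ranking IDENT(timing) >> OVERLAP can be sketched as a toy tableau evaluation (the numeric overlap values and the phase window limits are hypothetical; only the violation counts follow Cho's description above):

```python
def ident_timing(overlap, window):
    """One violation if the overlap falls outside the lexically
    specified phase window; no violation if no window is given."""
    if window is None:
        return 0
    low, high = window
    return 0 if low <= overlap <= high else 1

def overlap_viol(degree):
    """Gradient OVERLAP: total overlap incurs no violation,
    partial overlap one, minimal overlap two (after Cho 1998)."""
    return {"total": 0, "partial": 1, "minimal": 2}[degree]

def winner(candidates, window):
    # IDENT(timing) >> OVERLAP: compare violation tuples in order.
    return min(candidates,
               key=lambda d: (ident_timing(candidates[d], window),
                              overlap_viol(d)))

# Hypothetical overlap proportions for three candidates.
cands = {"total": 0.9, "partial": 0.3, "minimal": 0.05}
print(winner(cands, window=(0.1, 0.4)))  # lexicalized: "partial"
print(winner(cands, window=None))        # derived: "total", as in [madZi]
```

With a lexically specified window, total overlap fatally violates IDENT(timing); with no window (the derived environment), OVERLAP alone decides and total overlap wins, yielding palatalization.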

2.1.4. Gestural overlap

Having explained the notion of intergestural coordination, it is now possible to review studies that both demonstrate the successes of Articulatory Phonology in accounting for phonological phenomena and indicate where it falls short of doing so. Browman and Goldstein (1990b) establish that a number of alternations found in casual speech, including segmental insertions, deletions, assimilations, and weakenings, can result from alterations in magnitude and temporal duration in the gestural score. For example, they examine word-final deletions, such as when the final /t/ of the word perfect in “perfect memory” is not present in the acoustic record, as in [p‘fEkmEmri]. Using evidence from x-ray pellet trajectories, they demonstrate that in fact the tongue tip raising gesture for the /t/ is still present, but it is completely overlapped by the labial closure gesture for the /m/. If the onset or target of the closure gesture for /m/ overlaps the release of the alveolar closure gesture, it will not be audible. Thus, a case of apparent deletion can in fact be explained by a modification in the timing relationship between the gestures, where all gestures from the input are still present in the output. This can be illustrated using the gestural score notation, as in the partial scores in Figure 7: (a) the closure gestures for /k/, /t/, and /m/ (on the TB, TT, and LIPS tiers) in a list reading of [p‘fEkt mEmri], with no overlap of the alveolar and labial gestures; (b) the same closure gestures in a casual production of [p‘fEkmEmri], with total overlap of the alveolar gesture by the velar and labial gestures.

Figure 7. Canonical and casual speech productions of perfect memory

A gestural account of purported segmental insertion is detailed for the case of intrusive /r/ in Gick (1999). This process, which refers to the appearance of an /r/ intervocalically at morphological boundaries in some dialects, as in the realization of idea is as [ejdi´rIz], is generally taken to be an arbitrary segmental insertion (e.g. McCarthy 1993). This has been problematic for an Articulatory Phonology/gestural timing account because a gesture or gestures corresponding to that phoneme would have to be inserted into the gestural score for lexical items where there is no /r/ underlyingly (Kohler 1992, McMahon et al. 1994). Gick, however, proposes that the /r/ is present in the lexical representation of these forms but absent from the surface realization of the citation form because it undergoes vocalization. Using EMMA (electromagnetic midsagittal articulometer) data, Gick shows that /r/ is made up of two gestures: a consonantal gesture (C-gesture: tongue blade raising) and a vocalic gesture (V-gesture: pharyngeal constriction). In word-final environments, the C-gesture is significantly reduced in magnitude, leading to vocalization of the /r/. Word-initial /r/ is always present because it has a much stronger C-gesture. Gick concludes that Articulatory Phonology is compatible with this kind of account of intrusive /r/, since change in the magnitude of gestures is allowed by the theory.

While this particular account of intrusive /r/ only holds up if one is willing to agree that /r/ is present in the lexical entry, Gick also presents an account of word-internal intrusive /r/ that is much more easily defensible. Word-internal intrusive /r/ refers to the presence of an /r/ between /a/ and /S/ as in the pronunciation of Washington [warSINtǹ] in certain dialects of American English. In this case, speakers may overlap the pharyngeal gesture for the /a/ and the critical palatal and lip protrusion gestures, which will lead to the percept of an /r/. A similar analysis has been developed for intrusive stops (tense [tEnts]) (Browman and Goldstein 1990b). The distinction between wash and warsh is shown in Figure 8 (following Gick): in wash, the tongue body pharyngeal narrowing gesture for /a/ does not overlap the tongue tip critical palatal, lip protrusion, and glottal wide gestures for /S/; in warsh, overlap of the pharyngeal gesture with the palatal and protrusion gestures produces the percept of /r/ between (w)a and S.

Figure 8. Gestural scores of wash and warsh

In another study examining whether gestural coordination can account for perceived segmental insertions and deletions, Jannedy (1994) compared the production of three minimal pairs in German along a continuum of speaking rates by six speakers to determine whether they deleted schwa in fast speech and inserted it at slower rates. Her corpus comprised the pairs Kannen “cans” ~ kann “can (V)”, geleiten “to accompany” ~ gleiten “to slide”, and beraten “to advise” ~ braten “to fry”. Results showed that as speaking rate increases, the duration of the [n´n] portion of Kannen becomes more similar to the [nn] of kann, although the change is gradient. In other words, at slow rates of speech, the durations of the nasal sequences were quite different, but increasingly comparable as the rate increased. Unlike Kannen ~ kann, the comparison between the production of geleiten and gleiten indicates that the durations of the [g´l] and [gl] sequences remain significantly different at all speaking rates. Furthermore, there is no token of geleiten in which the schwa is entirely deleted, suggesting that this environment led to neither deletion nor reduction. Finally, the production of beraten and braten demonstrates that there is no schwa deletion in the initial syllable of beraten at fast speeds, while at slow speeds some speakers exhibit a transitional schwa between the [b] and the [“] of braten. Jannedy states that because there are no abrupt discontinuities in duration as speaking rate slows (which would indicate the presence of schwa epenthesis), this cannot be a case of phonological epenthesis. She concludes that it represents an instance of increasing the distance between the [b] gesture and the [“] gesture. However, she does not measure the vowel directly to determine whether or not it is present (her conclusion is based on the measurement of the duration of [b“] and [b´“]), nor does she examine whether the magnitude of the [b] or the [“] gestures changes as a function of rate.
This makes it difficult to determine whether or not there is a categorical change between those tokens that have a schwa and those that do not.

Assimilations are hypothesized to occur when the same articulator is required for two consecutive gestures that may have a different constriction location. Thus, in the pronunciation of ten things, the final /n/ of ten may be dental instead of alveolar because the articulator is forced to compromise between the two locations (Browman and Goldstein 1986). Zsiga (1997) also notes that a word-final alveolar may be perceived as assimilating to a following velar or labial if the coronal closure of the first gesture is overlapped. Jun (1996) takes a different view of the role of articulation in place assimilation. Based on evidence from oral and pharyngeal pressure changes, Jun argues that assimilation is not a result of gestural overlap, but rather of gestural reduction. He investigates the production of /pk/ clusters in both Korean and English and shows that the /p/ is perceived as assimilated only when the labial gesture is reduced in magnitude—that is, when the labial closure which would cut off oral airflow is not made completely. When changes in oral air pressure indicate that the velar closure occurs before the labial release, which is indicative of gestural overlap, both closures in the /pk/ sequence are accurately perceived. In Korean, reduction of the labial is found to be quite common, whereas it is very rare in the English productions. Jun takes this, and the fact that reduction does not generalize to /pt/ clusters, to be evidence that assimilation is under speaker control. He concludes that these results are incompatible with Articulatory Phonology, since research based in that framework has typically claimed that gestural overlap leads to assimilation. However, there is no Articulatory Phonology account which specifically claims that assimilation can only occur as a result of overlap, so Jun's gestural reduction analysis is not necessarily at odds with Articulatory Phonology.
A study of place assimilation in English /n#k/ sequences discussed by Ellis and Hardcastle (1999) employs both electropalatography (EPG) and electromagnetic articulography (EMA) to show that individual speakers may exploit either categorical or gradient processes when producing the nasal. Results from EPG data showed that two speakers never exhibited tongue-palate contact in the velar region during the /n/, whereas four other speakers consistently produced /N/. Two more speakers alternated between complete assimilation and no assimilation. Only two speakers showed a range of tongue-palate contact from full alveolar closure to partial velar contact to total velar closure. It is concluded that the speakers who exhibit both full alveolar closure and full velar closure demonstrate only a phonological (but optional) assimilation effect, whereas the other two treat assimilation as a gradient, mechanical process. Although the authors do not discuss the relevance of these results for Articulatory Phonology, the findings suggest that phonological systems in general must be able to account for at least some cases of assimilation, even if not all of them can be attributed to the phonology. Whereas most of these studies have focused primarily on validating the claims of Articulatory Phonology with a variety of different types of data, a number of other studies have argued that some processes are more appropriately divided up into those phenomena that can be accounted for by manipulating gestural relationships and those that require a categorical phonological explanation employing featural representations. A major proponent of this kind of division is Zsiga (1995, 1997, 2000), who examines a number of phenomena in several different languages, including English, Russian, and Igbo. In her study of Igbo, Zsiga (1997) demonstrates that there are two separate processes—ATR harmony and vowel assimilation—that are examples of categorical and gradient phenomena, respectively.
Zsiga argues that ATR harmony must be considered a categorical process because one vowel replaces the other, and there are no phonetic differences between the harmonized vowel and the underlying version of the same vowel. Furthermore, harmony is sensitive to the morphological environment in which it occurs. Vowel assimilation, on the other hand, is shown to be a gradient process. At word boundaries in Igbo, certain vowels partially assimilate to the following vowel, but the resulting quality of the assimilated vowel is not absolutely the same as the first vowel. Experimental evidence first reported in Zsiga (1993) shows that assimilation in Igbo varies from very little to total. This is analyzed in terms of blending that results from increased overlap of the two tongue body gestures that correspond to the different vowels in contact. As discussed in Section 2.1.3, Zsiga (2000) demonstrates that /s/+/j/ palatalization across word boundaries in English is a gradient process since tokens are often intermediate between /s/, /S/ and /s+j/. In a similar environment in Russian, however, there is absolutely no palatalization of /s/ when followed by /j/. The distinction between English and Russian is accounted for by positing that English and Russian have different consonant alignment parameters, such that the English consonants allow more overlap than the Russian ones, which may have no overlap at all (at least at word boundaries). This overlap in English may also be accompanied by weakening and blending, which gives rise to the gradient character of palatalization that was observed. Palatalized fricatives in Russian (/sj/), which have been analyzed as an /s/ and a /j/ articulated at the same time, differ from the English overlap case in that they do not also exhibit blending and weakening. Ladd and Scobbie (forthcoming) present the case of external sandhi in Sardinian as another example of a process that cannot be explained by gestural overlap.
It is hypothesized that post-lexical gemination, a sandhi process in which consonants lengthen at word boundaries in Sardinian, could potentially be given a gestural analysis. However, duration measurements indicate that the post-lexical geminates are indistinguishable from lexical geminates, and that there is a contrastive difference between singletons and geminates in that environment. It is argued that this result does not support a gestural overlap account of sandhi in Sardinian, even though sandhi processes in other languages (like the case of palatalization in English just reviewed) may result from gestural overlap. In their study of optional schwa epenthesis between /l/, /r/ and a non-coronal segment in Dutch, Warner et al. (2002) challenge the claim that schwa epenthesis results from increased temporal distance between gestures. They argue that the epenthesis process is phonological, not phonetic, despite the fact that it occurs optionally and somewhat infrequently in spontaneous speech. This process typically occurs word-finally (e.g. /mElk/ “milk”: [mElk] ~ [mEl´k]), but may also occur word-medially (/fIlm´r/ “cameraman”: [fIlm´“] ~ [fIl´m´“]). Using data from articulograph measurements, Warner et al. find that /l/ before epenthetic schwa is a light /l/, much like it is before underlying schwa. In both of these cases, /l/ is considered an onset. The lightness of the /l/ is determined by showing that tongue tip raising is greater both before epenthetic and lexical schwa than it is for coda /l/, which is dark in Dutch. They argue that if the presence of the schwa resulted from the pulling apart of gestures, then /l/ should remain dark, since it would be coordinated with the preceding vowel but not with the following vowel (thus making it effectively a coda, not an onset). Since the /l/ before epenthetic schwa is light, it is concluded that the schwa must result from epenthesis, which would require the addition of a gesture in the score. However, the results reported by Warner et al. must be treated cautiously. Because spontaneous epenthesis is relatively infrequent, the majority of the data for the so-called epenthetic case comes from tokens in which speakers were instructed to make a distinction between the forms with a cluster and those with the “extra sound”. Given that speakers were consciously told to produce a schwa (which may not be a natural production for them, as supported by the fact that one speaker could not accurately produce any tokens with epenthesis), it is not surprising that these speakers appear to be inserting a full vowel. A comparison of the few natural productions of epenthetic forms and the items produced in the instructed condition shows that there are significant differences in the vertical position of the tongue tip for /l/ for the two different kinds of schwas for four of six speakers. Since the tongue tip movement is in the same direction for both kinds of schwas, the authors conclude that epenthetic schwa is the same in both of these cases. While this is suggestive, it is by no means conclusive confirmation that schwa epenthesis in Dutch is a robust phonological process. A related situation is encountered in the analysis of schwa insertion by English speakers in non-native word-initial consonant clusters, presented in Chapters 3 and 4. In this dissertation, an account of consonant cluster production using both input and output representations based on gestures is proposed. However, despite the articulatory nature of gestures, this does not mean that acoustic or perceptual factors play no role in phonological processes and grammars. The next section presents a review of research which has shown that many aspects of language, including typology, speech production, and sound change, are influenced by perceptual factors.

2.2. Adding perceptual factors into phonology

2.2.1. “Tug-of-war” between articulation and perception

It has been recognized for some time that perceptual salience plays an important role not only in determining the shape of language inventories (e.g. Martinet 1952, Liljencrants and Lindblom 1972, Ohala 1983, Lindblom 1986, Lindblom and Maddieson 1988, Ohala 1990), but also in influencing the types of modifications, reductions, assimilations, or mergers that can occur in connected speech (Kohler 1990, Lindblom 1990a, Kohler 1991). This line of research has argued that speech systems are organized in a way that takes both articulatory and perceptual factors into account. In one of the earlier proposals of this idea, as support for his Principle of Least Effort in describing human behaviors in wide-ranging areas such as language, relationships among organisms, art, geography, mental illness, and international cooperation and conflict, Zipf (1949) proposed that languages tend to include phonemes that are “easiest both to articulate orally and to discriminate aurally” (100). Lindblom (1983), in a more rigorous investigation of the claim, observed that the speech production system is rarely forced to its physical limits by speakers. Speakers neither hyperarticulate, which would make them maximally perceptible, nor do they constantly whisper or mumble, which would minimize the amount of energy spent on the production of speech. Lindblom concludes that the concurrent demands of speech perception and speech production determine the behavioral patterns of the speaker; in other words, “These conditions interact to yield a subset of signals which are sufficiently adapted to their communicative purpose but at the same time put reasonable demands on the expenditure of physiological energy” (Lindblom 1983:219). Lindblom supports these claims through investigations of coarticulation and vowel reduction, in which he shows that speakers do not execute extreme displacements or velocities in production, but rather prefer behaviors that minimize motoric demands. Lindblom (1990a) further refines this concept in his outline of the Hyper- and Hypospeech theory (H&H Theory). He proposes that speech production is governed by two sets of constraints, output constraints and system constraints. When output constraints are dominant, hyperforms—those which are most clearly articulated—are most common, whereas hypospeech—reduced forms—is present when system constraints prevail. Usually, however, neither one type of constraint nor the other is in full control of the speech production system, leading to behaviors that indicate a compromise. For example, to demonstrate that speakers do not usually produce speech as clearly as they could, Lindblom cites a number of jaw movement studies which show that both under normal circumstances and when the system is strained by having speakers produce loud speech or speech with a bite-block, the extreme jaw movements that would be necessary to produce the clearest articulations of the vowel targets are not evidenced (Lindblom 1967, Gay, Lindblom and Lubker 1981, Lindblom, Lubker, Lyberg, Branderud and Holmgren 1987). In taking the H&H Theory one step further, Lindblom notes that similar perceptual and articulatory constraints ultimately affect the construction of phoneme inventories cross-linguistically. Based on an investigation of the UPSID database (Lindblom and Maddieson 1988), Lindblom divides the segments found in language inventories into three types based on articulatory difficulty: Basic, Elaborated, and Complex.
It is shown that the presence of these different segments is correlated with the size of a language’s inventory, such that languages with the smallest inventories have only Basic segments, languages with medium-sized inventories can have Basic and Elaborated segments, and only languages with the largest inventories can also include Complex segments. Lindblom argues that articulatorily complex segments are only available when the preservation of contrast is important in a language. A similar account of the distribution of oral and nasal vowels in various languages has also been proposed (Wright 1986). Kawasaki (1982) claims that cross-linguistic preferences for specific consonant-consonant sequences can best be understood in terms of perceptual salience. It is hypothesized that acoustic modulation, defined as rapid acoustic change in sound intensity across frequency regions or rapid variation in the spectrum, plays a significant role in determining well-formed consonant sequences because the perceptual system can best distinguish sequences with maximal modulation. Kawasaki investigates this hypothesis by examining the acoustic characteristics of rare CC sequences such as alveolar stop+[l], labial C+[w], dental/alveolar/palatal stop+[j], and dispreferred CV sequences like [wu] and [ji]. Her findings modestly support the acoustic modulation hypothesis: for example, sequences like stop+[r] show greater spectral differences than stop+[l] sequences, suggesting that the absence of alveolar stop+[l] may be attributable to a preference for a following [r]. However, the spectral differences between [dl] and [gl] are minimal, despite the fact that the latter cluster is widely evidenced cross-linguistically while the former is not. In the case of [j]+front vowel, it is shown that [j]+low vowel sequences have large acoustic modulation, whereas sequences like [ji] do not. Likewise,
labial consonant+[w] and dental/alveolar/palatal stop+[j] lack significant spectral change, but the acoustic characteristics of [gw] are also quite similar to those for [bw], which is not expected from typological distributions. Thus, Kawasaki concludes that while there may be some other factors that have an important role in determining some phonotactic universals, many such universals can be attributed to predominantly acoustic/auditory bases (but see Janson 1986 for an alternative view). Silverman (1995/1997) is primarily concerned with the phasing of articulatory gestures necessary for producing “laryngeally complex” vowels, but his investigation of articulatory timing relationships is based on the hypothesis that phasing is governed by the need to recover the perceptual cues that distinguish these vowels for the listener. Silverman surveys a large number of Otomanguean languages that have both tone and non-modal phonatory settings such as breathiness and creakiness. His analysis of the coordination relationships among laryngeal gestures corresponding to these vowels and supralaryngeal gestures demonstrates how languages implement fine laryngeal distinctions such as breathy or creaky registers, pre- and post-aspiration, and glottalization. While a number of timing relationships could be possible, Silverman argues that the patterns attested cross-linguistically are those most likely to optimize the recoverability of these non-modally phonated segments. Other studies have also reanalyzed traditionally phonological phenomena in light of perceptual considerations, including Wright (1996), Gordon (1999, 2002), and Zhang (2001). Wright’s study of consonant clusters in Tsou is similar to Silverman’s analysis of vowels. Though Tsou allows a large number of unusual consonant clusters, speakers of the language ensure that the perceptual cues to the identity of the elements of the clusters are very robust.
Wright demonstrates that the robust encoding of perceptual cues can be investigated in terms of redundancy, the auditory impact of cues, and the resistance of cues to environmental masking. Gordon shows that phonological weight distinctions necessary to account for stress and metrical systems are correlated with the total energy needed to produce such syllables. Like the cases of typology and inventory already discussed, Gordon argues that cross-linguistically, syllable weight reflects a compromise—in this case between the optimal phonetic conditions for producing weight contrasts and structural simplicity. Zhang examines the cross-linguistic distribution of contour tones, demonstrating that their placement is directly related to a number of phonetic factors contributing to syllable length. Ohala (1974, 1981, 1983, 1989, 1990, 1993) has also argued that articulatory and perceptual factors are both important in determining phoneme inventories and phonological processes as well as conditioning sound change. According to Ohala, one source of variation leading to sound change is the speaker, who over time may tend to rectify phonological patterns that are articulatorily difficult. One example of a speaker-controlled sound change discussed in Ohala (1989) involves aerodynamic constraints pertaining to the optimal conditions for voicing in stops. Because it is difficult to maintain the appropriate balance between oral and glottal pressure necessary to sustain voicing during the production of stops, Ohala argues that phonological processes have arisen which avoid such a situation, especially under more extreme circumstances, such as the devoicing of word-medial voiced stop geminates in Nubian, or intervocalic voiced stop spirantization in Spanish.
Another source of variation leading to sound change is the listener, who can misinterpret the acoustic information in the speech stream. This may lead listeners to posit novel representations, which are ultimately reanalyzed into new forms in the language. Listener-induced change may arise from a number of sources, including confusion of similar sounds, hypo-correction (segmental encoding of phonetic cues, such as [t] affricating and palatalizing before [i], leading the hearer to posit [tʃ]), and hyper-correction (implementation of ‘corrective rules’ by hearers who assume that phonetic cues are redundant; one such example is the diachronic change in Shona in which [w] became [ɣ] after labials, under the assumption that the labiovelar offglide is predictable and should be factored out, leaving only a velar). Similarly, perception-based accounts of sound change are presented for several different types of metathesis (Blevins and Garrett 1998, in press) and velar palatalization (Guion 1998). Kohler (1990) also argues that phonetic factors play a major role in determining which phonological processes are found in natural language and which are not. However, Kohler goes further than Lindblom or Ohala in claiming that generative phonological rules, such as those used to indicate vowel reduction, consonant or vowel deletion, and assimilation, are meaningless statements unless they are defined in terms of their phonetic origins and physical requirements. In general, Kohler attributes such articulatory weakening processes in German casual speech to motor economy that is conditioned by a communicative situation in which “the listener decides whether reductions are permissible in certain speech situations or whether they interfere with the transmission of information” (p. 90). Collectively, these studies provide a rich inventory of cases in which articulatory and perceptual factors appear to be at work simultaneously.
However, the difference between processes or changes within the phonological component that are phonetic in origin, whether perceptual or articulatory, and phenomena that might be due solely to phonetic implementation is not always well delineated. In the next section, some formal accounts that attempt to situate phonetic factors in phonology within a theoretical framework are reviewed.

2.2.2. Grounding and functionalism in phonological theory

As noted by Hayes (1999), incorporating functional grounding into phonological theory has become prominent only since the widespread adoption of Optimality Theory as a phonological framework (notable pre-OT proposals include Natural Phonology, Stampe 1973, Donegan and Stampe 1979, and Grounded Phonology, Archangeli and Pulleyblank 1994). As Hayes points out, this may be because Optimality Theory is based on the idea that markedness constraints are universal but ranked differently in different languages. Even if constraints are designed to be phonetically grounded, it is still in the domain of the phonology to decide which phonetic tradeoffs will be preferred in any particular language. It is along these lines that Hayes argues that phonetics cannot directly give rise to phonology, since phonological grammars may or may not implement the preferred phonetic state for a particular phenomenon. As an example, he contrasts post-nasal voicing in English, which is a typically gradient process (Hayes and Stivers 2000), with post-nasal voicing in Ecuadorian Quechua, which is found categorically at morpheme boundaries (Orr 1962). Were phonetics the only basis for such processes,
post-nasal voicing should either be categorical in every language or simply a tendency in every language. One of the first OT-based proposals for a perceptual account of vowel inventories is that of Flemming (1995, further developed in Flemming 2001, Ní Chiosáin and Padgett 2001, Flemming to appear). Flemming introduces Dispersion Theory, based on Lindblom’s Theory of Adaptive Dispersion (Lindblom 1986, 1990b), which contends that phonological inventories are constrained by three functional criteria (Flemming 1995:13):

(6) i. Maximize the number of contrasts
    ii. Maximize the distinctiveness of contrasts
    iii. Minimize articulatory effort

Crucially, these principles can only be applied to pairs or sets of sounds; there is no sense in which a single phoneme has greater or lesser contrast, but sets of phonemes can be evaluated in terms of how different they are from one another. In Flemming’s phonological system, MINDIST constraints requiring a minimal distance on some dimension (such as F2 values) interact with both MAXIMIZECONTRAST and articulatory *EFFORT constraints, which are geared toward ruling out particularly rapid articulatory movements. Flemming shows how these constraints can account not only for vowel inventories in general, but also for the reduction of vowel inventories from stressed to unstressed syllables, vowel neutralization based on the context of surrounding consonants, stop voicing contrasts, and contrastive nasalization. Other early OT-based proposals employ experimental results to demonstrate that phonological systems require both perceptually and articulatorily motivated constraints. Jun (1995) examines place assimilation in Korean with articulatory evidence from oral pressure experiments.
Based on his conclusion that assimilation is due to gestural reduction, Jun proposes an analysis of assimilation within Optimality Theory that relies on the interaction of the constraint WEAKENING, which requires the conservation of articulatory effort, with PRESERVATION constraints, which maintain perceptual cues for place and manner and may refer to syllabic position. In his analysis of labial-to-velar place assimilation, Jun posits a hierarchy of PRESERVATION constraints to explain why labials assimilate before velars, while other assimilations (e.g. velar-to-labial) are not found. Assimilation occurs when the articulatory constraint WEAKENING is ranked above the PRESERVATION constraints. A similar account of consonant reduction in Taiwanese is developed in Hsu (1996). Kirchner (1997, 1998/2001, to appear) applies articulatorily and perceptually based constraints to an account of consonant lenition. In addition to utilizing PRESERVATION constraints to account for another type of phonological phenomenon, Kirchner focuses mostly on the role of articulatory effort minimization. His proposal is based on a family of constraints called LAZY, which treat the speaker’s ability to estimate the amount of effort expended as a mental notion that can interact with the drive to preserve perceptual contrasts. A similar goal is pursued by Boersma (1998) using a modification of OT called Functional Optimality Theory. Boersma’s constraint system also weighs perceptual contrasts against articulatory effort, but the crucial difference between Boersma’s proposal and most other functionally-based OT analyses is that he considers
underlying representations to be only perceptually specified, and he makes a division between production and perception grammars. The work by Silverman, Gordon, Zhang, Jun, Flemming and Kirchner is an integral part of the development of the Licensing-by-Cue framework of Steriade (1997, 1999a, 1999b, in press), which has been elaborated by a number of other researchers (e.g. Côté 1997, 2000, Wilson 2000, 2001, Fleischhacker 2001). Licensing-by-Cue is a theory of contrast and neutralization that is based on the hypothesis that phonological contrasts can be licensed only in contexts where there are sufficiently strong perceptual cues to signal the contrast. For example, it has been shown that perceptual cues to a voice contrast in obstruents, such as voice onset time, closure duration, and burst duration and amplitude, are best perceived when the obstruent is followed by a vowel or a sonorant consonant (Keating 1984, Kingston 1990, Steriade 1993). Steriade (1997) presents an analysis of voice neutralization in Lithuanian based on this finding. In Lithuanian, distinctive voicing is preserved before sonorants, but is neutralized both word-finally and before obstruents. Specifically, word-finally there is devoicing, whereas voicing assimilation occurs before obstruents. Using contextual markedness constraints, Steriade defines a hierarchy to account for this pattern of voice neutralization. The necessary ranking, including the faithfulness constraint PRESERVE[voice], is shown in (7):

(7) Ranking for voicing neutralization in Lithuanian (Steriade 1997)

*αvoice/V_[−son] >> *αvoice/V_# >> PRESERVE[voice] >> *αvoice/V_[+son]

Steriade discusses not just voice neutralization, but also neutralization of contrasts for aspiration, creakiness, and ejectives. She points out that [−sonorant] alone may not be a sufficient environment for determining neutralization, and notes that the real generalization to be captured is something like *voice/in contexts lacking VOT cues. Steriade hypothesizes that ultimately it will be necessary to refer directly to cues and cue duration in order to determine where contrasts will be licensed. While Steriade’s implementation of this framework is primarily geared toward accounting for patterns of neutralization, it can be adapted to demonstrate why structures with poor transitional cues would be prohibited outright in some languages, or repaired through epenthesis or deletion. Wilson’s (2000, 2001) Targeted Constraints are an extension of Licensing-by-Cue designed to explain directionality in contextual neutralizations, such as why deletion processes targeting intervocalic clusters systematically affect the first consonant of the cluster. Côté (2000) also investigates consonant cluster phonotactics within the Licensing-by-Cue framework, arguing that the traditional syllable-based approach to phonotactics cannot account for cross-linguistic patterns of consonant deletion, vowel epenthesis, or vowel deletion in different positions in the word. Based on data from Parisian French, Québec French, and Ondarroa Basque, she argues that factors such as contextual cues, modulation in the acoustic signal, and cue enhancement at the edges of prosodic domains must be taken into account. Kochetov (2001/2002) departs from previous formal accounts incorporating phonetic factors into phonological grammars in that he attributes markedness patterns and scales to the emergence of high-level structure from low-level perceptual and articulatory factors.
In his investigation of the contrast between plain and palatalized consonants in
Russian, Kochetov argues that marked structures are those whose gestural production and coordination are unstable and difficult to reproduce, whereas unmarked structures are stable and robust articulatory and perceptual targets that do not present either processing system with a significant challenge. However, Kochetov recognizes that there is not a perfect mapping between the optimal phonetic situation and the structures found in languages, so he posits some grammatical restructuring which typically occurs in order to maintain similarities between morphologically related items. To this end, he provides an analysis of the interaction of plain and palatalized consonants and morphological boundaries within an Output-Output correspondence framework (Benua 1997, Burzio 2000, Steriade 2000).
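Returning to Steriade’s analysis, the effect of the ranking in (7) can be illustrated with two schematic tableaux. The inputs /ad/ and /ad-ra/ below are hypothetical forms chosen purely for exposition, not actual Lithuanian words: word-finally, *αvoice/V_# dominates PRESERVE[voice], so the devoiced candidate wins, whereas before a sonorant, PRESERVE[voice] dominates *αvoice/V_[+son], so underlying voicing surfaces.

```latex
% Schematic tableaux for the ranking in (7); inputs are hypothetical.
% Abbreviations: *Vc = *alpha-voice (contextual markedness), Pres = PRESERVE[voice].
\begin{tabular}{l|c|c|c|c}
/ad/ (word-final) & *$\alpha$vc/V\_[$-$son] & *$\alpha$vc/V\_\# & \textsc{Pres}[vc] & *$\alpha$vc/V\_[$+$son] \\ \hline
\phantom{ww} [ad] &  & *! &   &  \\
$\Rightarrow$ [at] &  &    & * &  \\
\end{tabular}

\begin{tabular}{l|c|c|c|c}
/ad-ra/ (presonorant) & *$\alpha$vc/V\_[$-$son] & *$\alpha$vc/V\_\# & \textsc{Pres}[vc] & *$\alpha$vc/V\_[$+$son] \\ \hline
$\Rightarrow$ [adra] &  &  &    & * \\
\phantom{ww} [atra]  &  &  & *! &  \\
\end{tabular}
```

In both tableaux the winning candidate violates only the lowest-ranked constraint relevant to its context, which is exactly the neutralization-before-obstruents/word-finally, preservation-before-sonorants pattern described above.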

2.3. Summary

In this chapter, the tenets of Articulatory Phonology, including both theoretical proposals and the empirical investigations that support them, have been reviewed. The major development within the Articulatory Phonology research program that will be explored and expanded on in this dissertation is the idea that temporal coordination is in the domain of the phonology, and should not be relegated to the phonetic implementation component (see Gafos 2002). Additionally, a significant body of research has shown that articulatory phonetics is not the only source of functional pressures in phonology; on the contrary, it has also been claimed that perceptual and acoustic phonetics play a key role both in shaping phonological inventories and in affecting the types of processes realized in speech production. The results of the experiments in this dissertation are consistent with such a view, and a formal account of the findings must ultimately combine perceptual, articulatory, and temporal factors. Chapter 3 presents data from acoustic measurements of non-native consonant cluster production which illustrate a case in which both articulatory and perceptual factors differentially influence speakers’ accuracy. In Chapter 4, results from an articulatory (ultrasound) investigation of the production of illegal word-initial clusters demonstrate that speakers manipulate the temporal coordination of gestures in order to repair non-native sequences. In Chapter 5, I argue that these results can only be accounted for if both phonetic and temporal factors are incorporated into the phonological grammar, and that phonetic implementation alone cannot account for the patterns of production exhibited by the speakers. Chapter 6 extends the analysis to fast speech deletion patterns in English.
CHAPTER 3. Perceptual and Articulatory Influences on the Production of Non-Native Sequences

This chapter examines the production of non-native word-initial consonant clusters by English speakers. Previous research has shown that speakers produce and perceive phonotactically illegal sequences with different levels of accuracy, even though none of them are found in their native language. The experiment presented in this chapter was designed to uncover which factors affect speakers’ ability to produce non-native sequences. It is shown that English speakers producing Czech-legal word-initial consonant clusters exhibit varying levels of accuracy on the clusters, depending on the place, voice, and manner of the members of the sequence. It is argued that the influence of these phonological features stems from their acoustic and articulatory properties. Whereas some types of features, such as place, seem to be disadvantaged for acoustic reasons, other features, like voice, lead to lower accuracy for articulatory reasons. The nature of the predominant error, schwa insertion, is also discussed.

3.1. The perception, production, and acquisition of non-native sequences

3.1.1. Perception

Investigating the perception, production, and acquisition of non-native sequences can shed light on many aspects of a speaker’s phonology that cannot be observed by examining native lexical items alone. A common goal of studies of the perception of phonotactically illegal sequences has been to examine the role of the native language as a phonological filter. One of the earliest descriptive studies of the perception of illegal clusters, by Brown and Hildum (1956), showed that in general, untrained listeners asked to transcribe non-words perform progressively worse when the stimuli are (a) low frequency English words with initial clusters (/skejn/), (b) possible but non-existent English words with legal initial clusters (/θriv/), and (c) non-words with illegal initial clusters (/ps&uwp/). Results showed that the types of errors found in the transcribed words were related to how many phonemes were changed from the target. If the error affected only one phoneme, it generally involved changing only one feature, such as place, voicing, or continuancy. If the errors affected more than one phoneme, they usually resulted in a real word. While it is difficult to determine whether a transcription task reflects a locus of error in the input or the output, the fact that the participants’ errors tended to form legal sequences out of illegal words suggests that the phonology of one’s native language influences the process at some level. Using a task designed to test perception only, Greenberg and Jenkins (1964) hypothesized that listeners rate the acceptability of non-native sequences as a function of their “distance” from existing English words. Distance is defined by the number of segments that must be replaced in order for a non-English word to become an English word.
A list of 24 CCVC sequences was created, ranging in distance from 0 substitutions necessary (these were in fact real English words) to 4 substitutions (where all segments were either non-English or could not occur in the given position in a real English word). Some of the nonce words were rendered non-English only by virtue of having an illegal CC sequence (such as /ʒrɪk/), while others contained illegal CC sequences and non-
English segments (such as /ðgʌx/). Participants were asked to make a subjective judgment of how far these items seemed to be from being a possible English word relative to the other words in the list. The results showed that as the number of substitutions necessary increased, the subjective distance ratings of the participants also increased. This finding was replicated for French speakers in a similarly designed study (d'Anglejan, Lambert, Tucker and Greenberg 1971). An interesting secondary point arises in Greenberg and Jenkins (1964) because the number of words with non-native segments versus the number of words with only a non-native sequence was not properly counterbalanced. Consequently, Greenberg and Jenkins revised the initial materials such that the items for each distance level were divided between those that had non-English sounds and sequences and those that had only non-English sequences. Results from subsequent testing showed that participants’ ratings were equivalent regardless of whether the item had only an illegal cluster or non-English sounds in addition to the illegal initial cluster. Though not the main focus for the authors, one important finding from this study is that participants are as sensitive to illegal sequences in judging the “goodness” of non-words as they are to non-native segments. This supports the idea that the phonotactics of the language are at least as important as acoustic considerations in the perceptual domain. More recent studies that have looked at on- and off-line perceptual processing of illegal clusters have shown that listeners tend to assimilate impossible clusters to ones that are legal in their native languages. That is, listeners may be taking a low-level code and integrating it with higher-level phonological constraints.
Based on results from a transcription task, a forced choice paradigm giving written options, a phonemic gating task, and a phoneme monitoring task, Hallé, Segui, Frauenfelder and Meunier (1998) found that French listeners are more likely to hear illegal /tl/ and /dl/ word-initial sequences as /kl/ and /gl/. Dupoux and colleagues demonstrated that Japanese listeners faced with consonant sequences that are prohibited by the native language phonology, such as in ebzo, perceive an illusory epenthetic [u] between the consonants, and cannot distinguish between stimuli without the vowel (ebzo) and those with one (ebuzo) (Dupoux, Kakehi, Hirose, Pallier and Mehler 1999, Dupoux, Pallier, Kakehi and Mehler 2001). These authors conclude that higher-level knowledge of phonotactics must ultimately dominate lower-level perceptual processes which can effortlessly distinguish native segments in otherwise legal contexts. Pitt (1998) showed that in the perception of illegal clusters in English, listeners are more sensitive to phonotactic constraints than they are to frequency considerations. In one experiment, Pitt hypothesized that if frequency is more influential than phonotactics in recognizing consonant clusters during speech, then listeners would be more biased toward hearing the more frequent member of pairs like /br/-/bl/, /gr/-/gl/ or /tr/-/tl/. Frequency calculations were made across word-initial and word-medial clusters, so clusters like /tl/ (as in atlas) were also assigned a frequency. Participants were asked to classify /b,d,g,s,t + liquid + æ/ CCV sequences, where the liquid varied on an 8-step continuum from /r/ to /l/. Results showed that the correct liquid was identified for all legal sequences and that there was no bias toward reporting the more frequent of the cluster types. For the illegal sequences, there was a strong bias toward hearing the legal sequence, without any interference from frequency.
In another experiment, the perception of /tl/ and /sr/ clusters was investigated by lengthening the steady state of the liquid to determine whether
hearers would interpret this as epenthesis more often in illegal sequences than in legal sequences. As the steady state increased, the percentage of two-syllable responses also went up for both legal and illegal sequences, but illegal clusters were reported as having two syllables significantly more often at all time steps. This is another example showing that listeners tend to treat illegal sequences as legal when they can apply an appropriate language-specific repair for the sequence. One potential point of interest is that whereas Hallé et al. (1998) found assimilation effects where the second consonant of the cluster affected the perception of the first consonant, Pitt (1998) reported that participants’ errors involved incorrect perception of the second consonant. One potential source for this difference arises from the different experimental paradigms used. An older study by Massaro and Cohen (1983) sheds light on this issue by showing that under differing conditions, it can be the case that either the first consonant affects the perception of the second, or vice versa. In the first experiment, /p,t,s,v/ + /r,l/ + /i/ sequences were created by manipulating the third formant (F3) of the liquid on a 5-step /r/-to-/l/ continuum. Participants were asked to press one of 8 buttons that corresponded to /pri/, /pli/, /tri/, /tli/ and so forth. When presented with an illegal sequence like /tl/ or /sr/, participants showed a bias toward choosing a legal cluster (as in /tr/ and /sl/). This is similar to the results reported by Pitt. However, in a subsequent experiment, Massaro and Cohen factorially combined the /r/-to-/l/ continuum with a /b/-to-/d/ continuum and asked participants to label the auditory stimuli as “bl”, “br”, “dl”, or “dr”.
If the prototypes of these categories can be formed by the various combinations of a low second formant (F2) for /b/, a high F2 for /d/, a high F3 for /l/ and a low F3 for /r/, then it would be expected that the combination of [high F2]+[high F3] would produce an unequivocal /dl/ sequence. However, the results showed that this combination was not in fact sufficient. Instead, they showed that a high F3 not only biased the liquid judgment to /l/, but it also biased the consonant judgment toward /b/. In addition, a high F2 biased judgment not only toward /d/ but also toward /r/. Massaro and Cohen report that this finding is most robust in the /dl/ case, where the phonotactic illegality of the sequence is most likely to be susceptible to this kind of manipulation. Thus, depending on the particular experimental manipulation of F2 and F3 for the obstruent+liquid combination, participants’ repairs may involve either the first or second consonant. One concern regarding the role of the phonology in the perception of non-native sequences is that it may not be the phonology per se that is affecting the perception of these sequences, but rather other factors, such as frequency, lexicality, lexical neighborhood effects, etc., that determine how hearers process such sequences. In addition to Pitt (1998), who argued that frequency did not affect hearers’ perception of phonotactically illegal sequences, Moreton (2002) designed a study to investigate whether structural models (those which explain perceptual processing in terms of phonological generalizations over classes of phonemes) or unit models (those which are concerned with frequency, lexicality, etc.) can better account for perceptual data. Moreton contrasted English listeners’ performance on nonsense words beginning with either /bw/ or /dl/ clusters.
The sequences /bw/ and /dl/ both have zero frequency in English lexical items, but structurally, phonological evidence indicates that English contains onset consonants with the same properties as /bw/ (such as /br/, where /r/ is considered secondarily labialized), but not /dl/. Results show that speakers were significantly more
likely to correctly perceive /bw/ than /dl/. Moreton argues that since both onsets have zero frequency, the bias in favor of /bw/ can only be accounted for in terms of structural models that incorporate higher level phonological knowledge. In sum, the results from these studies indicate that in perception, the phonology of the native language plays an important role in mediating how illegal clusters are perceived by listeners. The next section focuses on studies of complex clusters in production and acquisition, which shed light on the nature of the output of the phonology when the input is a non-native sequence.

3.1.2. Production and acquisition

The earliest research in the acquisition of syllable structure by second language learners primarily addressed whether difficulties and errors with phonotactics were related to transfer from the native language phonology or whether speakers were reverting to universal principles of markedness during L2 acquisition. This question pertained both to consonant clusters and to coda constraints. In order to investigate this question, Tarone (1987) analyzed the spontaneously elicited speech of two speakers each of Cantonese, Korean, and Portuguese. She found that with respect to their consonant clusters and final consonants, the large majority of errors were attributable to language transfer (such as [kəlas] for [klæs] in a Korean speaker’s speech, since Korean does not have any consonant clusters). Sato (1987), for example, found that native Vietnamese speakers learning English preferred to simplify consonant clusters in such a way that they more often resulted in CVC syllables. Sato speculates that this is because Vietnamese has a large proportion of CVC syllables in the lexicon, which influences the general preferred syllable shape in her participants’ L2 productions. The Markedness Differential Hypothesis (MDH) (Eckman 1977) incorporates both native language transfer and universal markedness principles into a theory of second language speech production. The MDH states the following:

(1) Markedness Differential Hypothesis

The areas of difficulty that a language learner will have can be predicted on the basis of a systematic comparison of the grammars of the native language, the target language, and the markedness relations stated in universal grammar, such that:

(a) Those areas of the target language which differ from the native language and are more marked than the native language will be difficult.

(b) The relative degree of difficulty of the areas of the target language which are more marked than the native language will correspond to the relative degree of markedness.

(c) Those areas of the target language which are different from the native language, but are not more marked than the native language, will not be difficult.

Anderson (1987) conducted an investigation of the MDH, testing speakers of colloquial Egyptian Arabic (CEA), Mandarin Chinese (MC), and Amoy Chinese (AC) on consonant clusters of English. Whereas neither MC nor AC allows any clusters in initial or final position, CEA can have two-consonant final clusters. Taking into consideration the fact
that the MDH allows for effects of both native language transfer and general markedness principles, Anderson predicted that since longer clusters (CC vs. CCC) are more marked than shorter clusters, they would be more difficult for both Chinese and Arabic speakers, even though CEA does allow CC final clusters. For position, she hypothesized that for the AC and MC speakers, who have no clusters at all, final clusters would be more difficult since they are more marked. As for CEA, Anderson noted that there was a conflict in the predictions because native language transfer gives an advantage to final clusters, but markedness factors favor better performance on initial clusters. Consequently, she predicted no difference in performance in these two positions. The results basically upheld her predictions. Since spontaneous speech was used, an insufficient number of three-consonant initial clusters was produced. For final clusters, however, she found that correct performance of both Arabic and Chinese speakers decreased significantly as final clusters increased from two to three consonants. Likewise, at least for CC clusters, there was an advantage for initial over final clusters for both Arabic and Chinese speakers. Anderson concluded that the MDH was generally accurate in its predictions. Similar findings were reported by Carlisle (1998). He found that when tested on initial /sk/, /skr/, /sp/ and /spr/ clusters at two different stages of learning, native Spanish speakers learning English were more correct in producing the two-consonant versus three-consonant clusters at both times, although overall percent correct increased for both categories in the second testing session. In addition to investigating the effects of native language transfer versus markedness, several studies have also focused on examining the types of errors made by L2 learners during the acquisition of consonant clusters.
One of the most frequently cited findings regarding cluster simplification is that of Broselow (1983), who analyzed the L2 initial consonant clusters of Iraqi and Egyptian Arabic speakers. She reported that speakers of these Arabic dialects generally simplified English consonant clusters with epenthesis. However, since the vowel insertion sites differ between Egyptian and Iraqi Arabic, she found that the epenthesis sites in English differed along the same lines. Whereas Egyptian speakers epenthesized segments between the two initial consonants (as in [bilastik] for plastic or [silajd] for slide), Iraqi speakers added a vowel word-initially (as in [isnoo] for snow or [istadi] for study). Broselow attributes these simplification strategies to positive transfer from vowel insertion patterns found in the native languages. Major (1987) found similar effects for speakers of Brazilian Portuguese, who tended to epenthesize [i] between the two consonants of final English CC clusters. Since this type of repair is commonly found in loanwords in Brazilian Portuguese, it was an expected strategy.

Other studies, however, have not found such strong evidence for epenthesis as a means of cluster simplification. In languages that do not have productive cluster simplification strategies as Arabic does, predicting the errors that learners will exhibit when producing L2 consonant clusters is not so straightforward. Anderson (1987) found a pattern of epenthesis in initial clusters for her CEA speakers, but higher percentages of deletion than epenthesis for final clusters. The Chinese speakers in her study significantly favored deletion over epenthesis for both cluster types. Davidson (1997) reported that for both CC and CCC final clusters, speakers of Turkish, Japanese, and Shanghai Chinese all strongly preferred deletion over epenthesis. Results in Hancin-Bhatt and Bhatt (1998) suggest that there is an asymmetry in the types of simplifications preferred for onset


clusters versus coda clusters. Both their Japanese and Spanish speakers exhibited more epenthesis than deletion errors in the onset, and more deletion than epenthesis errors in the coda. Because their error rates were so small, it is not clear whether this trend would replicate. If it is in fact robust, however, the reason behind this asymmetry merits further research.

Several more recent studies have questioned whether learners distinguish among related illegal structures during acquisition. Following their hypothesis that sonority sequencing affects the production of onset clusters in L2 acquisition, Broselow and Finer (1991) designed a set of stimuli that included words with initial clusters /pr/, /br/, /fr/, /py/, /by/, and /fy/. Working with the sonority scale (in order of decreasing sonority) [glides > liquids > nasals > fricatives > stops], Broselow and Finer were primarily interested in (a) whether the voiced-voiceless distinction is necessary and relevant for sonority, and (b) whether the stop-continuant distinction is necessary. The stimuli were designed such that performance on obviously different clusters such as [obstruent+glide] vs. [obstruent+liquid] could be contrasted with less obvious distinctions such as [stop+sonorant] vs. [fricative+sonorant] or [voiceless obstruent+sonorant] vs. [voiced obstruent+sonorant]. Under these criteria, /Cy/ clusters should be less marked than /Cr/ clusters, and the obstruent clusters increase in markedness in the sequence /pC/ < /bC/ < /fC/. Results showed that both the manner and voice predictions held up. The Japanese and Korean participants in this study exhibited the following pattern (followed by percent error): /py/—5%, /pr/—3%, /by/—8%, /br/—26%, /fy/—24%, /fr/—34%. Given that both Japanese and Korean have only /Cy/ initial clusters and that neither language has the phoneme /f/, these results accord both with native language transfer and sonority predictions.
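The sonority-distance reasoning above can be made concrete. Below is a minimal sketch (Python; the integer ranks and the `sonority_rise` helper are illustrative choices, not values proposed by Broselow and Finer):

```python
# Sonority ranks for the scale cited above (higher = more sonorous).
# The specific integers are an illustrative assumption, not from the source.
SONORITY = {"stop": 0, "fricative": 1, "nasal": 2, "liquid": 3, "glide": 4}

# Manner classes for the segments used in Broselow and Finer's stimuli.
MANNER = {"p": "stop", "b": "stop", "f": "fricative",
          "r": "liquid", "y": "glide"}

def sonority_rise(cluster):
    """Sonority rise from C1 to C2 of a two-consonant onset."""
    c1, c2 = cluster
    return SONORITY[MANNER[c2]] - SONORITY[MANNER[c1]]

# /pr/ rises by 3 (stop -> liquid); /py/ rises by 4 (stop -> glide).
print(sonority_rise("pr"), sonority_rise("py"))
```

Note that a bare rise metric scores /py/ above /pr/, whereas Clements' (1990) Sonority Dispersion Principle (discussed in footnote 4 below) treats the sharp obstruent-to-glide rise as more marked; this gap is exactly why refinements such as voicing and continuancy are at issue here.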
In fact, Broselow and Finer hypothesized that if the participants had relied merely on native language transfer, they would have performed better on /by/ clusters than on /pr/ clusters, since only the first type of cluster is legal in the native languages. However, because /pr/ is less marked overall, it is as acceptable to these speakers as /py/ and /by/. These findings suggest that both voicing and continuancy are important factors in cluster acquisition.

Broselow and Finer also looked at the error patterns exhibited by their participants. Three major types of cluster simplification are attested: epenthesis, deletion, and “manner errors”, or segmental substitutions. Neither the Japanese nor the Korean speakers appear to systematically prefer any one of these simplification strategies. However, given that the number of errors was so low, any conclusion regarding error patterns based on these data would be premature.

Eckman and Iverson (1993) responded to the Broselow and Finer study with their own analysis. They conducted a similar study of Korean, Japanese, and additionally Cantonese speakers, this time testing them on /pr/, /pl/, /br/, /bl/, /py/, /fr/, /fl/, /tr/, /dr/, /tw/, /Tr/, /kl/, /kr/, /gl/, /gr/, /kw/, and /ky/. Of these clusters, they hypothesized that the following markedness relationships would be attested in terms of greater pronunciation difficulty, where the clusters increase in markedness:


(2) /pr/,/pl/ < /br/,/bl/ < /py/4
    /pr/,/pl/ < /fr/,/fl/
    /tr/ < /dr/ < /tw/
    /tr/ < /Tr/
    /kl/,/kr/ < /gl/,/gr/ < /kw/,/ky/

Their participants’ errors generally increased as predicted by these markedness oppositions, although, as in the Broselow and Finer study, the number of errors was very low. The major difference, however, is not empirical but theoretical. Eckman and Iverson suggest that an explanation in terms of sonority is not necessary and that typological universals are instead more appropriate for explaining the pattern. They claim that the following markedness generalizations are sufficient to explain the performance exhibited by these participants:

(3) (i) A voiced stop followed by a liquid or glide is more difficult than a voiceless stop followed by a liquid or glide.
    (ii) A voiced fricative followed by a liquid or glide is more difficult than a voiceless fricative followed by a liquid or glide.
    (iii) A voiceless fricative followed by a liquid or glide is more difficult than a voiceless stop followed by a liquid or glide.

As Archibald (1998b, 1998a) rightly points out, however, this is less an explanation than a description of facts that themselves need to be explained.

A number of the perceptual and production studies reviewed here demonstrate that speakers’ accuracy on illegal onset clusters varies, even though none of the clusters are found in the native language inventory. Broselow and Finer (1991) attribute such distinctions in production to the influence of sonority sequencing, but such a conclusion assumes that sonority sequencing can be broken down into divisions as fine-grained as voiceless versus voiced obstruents. While some researchers have proposed just such a sonority scale (e.g. Hooper 1976, Steriade 1982, Selkirk 1984, Dell and Elmedlaoui 1985), others contend that there is no empirical basis for such exhaustive distinctions (e.g. Clements 1990, Kenstowicz 1994, Morelli 1999).

A few studies of English speakers (not learners) asked to produce non-native sequences in a variety of tasks indicate that there may be a number of factors other than sonority distance affecting the production of phonotactically illegal clusters. Following the implicational universals for word-initial consonant clusters proposed by Greenberg (1965), Pertz and Bever (1975) examined children’s and adolescents’ ratings of pairs of phonotactically illegal non-words that differed only by one phoneme (e.g. /ntIf/-/nkIf/, /rbik/-/rnik/, /ldIf/-/lmIf/). Each pair was matched for a universal principle that predicted one cluster would be more marked than the other (e.g. heterorganic nasal+stop clusters

4 The treatment of /obstruent+glide/ clusters as being more marked than /obstruent+liquid/ clusters is based on Clements’ (1990) Sonority Dispersion Principle, which states that sonority preferably increases gradually from onset to nucleus. Whereas /obstruent+liquid/ clusters do in fact increase gradually in sonority, /obstruent+glide/ clusters rise sharply from obstruent to glide, and then almost not at all from glide to vowel. Consequently, /obstruent+glide/ clusters are considered more marked than /obstruent+liquid/ clusters on these grounds.


like /nkIf/ are more marked than homorganic ones like /ntIf/). It was hypothesized that if such universals are represented as innate linguistic content, then younger participants would be as good as or better than adolescents at distinguishing among the pairs of words related on a markedness scale. If the hierarchy of universals is due to “an innate acquisition process (152)”, however, then the adolescents should show more distinctions, since they are more mature. Results show that adolescents were more likely than the children to treat the more marked member of each pair as less English-like, which the authors take to confirm the hypothesis that universal implications must result from a learning process. However, the greater discrimination shown by adolescents may be a task effect; perhaps younger children (9-11 years vs. 16-20 years) either have more difficulty with the task or do not understand the instructions (choose the word that is easier, or more likely to have an “initial sound cluster used in more languages in the world (154)”). Since the criteria each group uses to make a decision are unknown, these results would have to be confirmed by a more sensitive task.

Two studies conducted independently but addressing similar issues examined English speakers’ production of Polish word-initial clusters (Davidson, Jusczyk and Smolensky 2003) and production and perception of Russian word-initial clusters (Haunz 2002). In a follow-up of L2 studies proposing that sonority sequencing plays an important role in determining why speakers make distinctions among sequences that are not possible in their native language, Davidson et al. (2003) presented English speakers with words containing Polish-legal onset clusters, including the sequences in Table 1.

CLUSTER TYPE            CLUSTERS
[stop+stop]             /kp/, /kt/, /pt/
[affricate+stop]        /čk/
[fricative+fricative]   /vz/
[stop+fricative]        /tf/, /dv/
[fricative+nasal]       /zm/, /vn/, /sm/, /sn/
[fricative+liquid]      /zr/, /Sl/, /Sr/, /fr/

Table 1. Word-initial cluster stimuli used in Davidson et al. (2003)

Clusters in italics are legal or marginally acceptable in English. English speakers were presented the target words both auditorily and with English-like orthography simultaneously to ensure that they were correctly perceiving the cluster that was being spoken by a native Polish speaker. In one condition, the speakers were asked to read aloud a sentence containing the word after hearing it produced by the Polish speaker. Results showed that as clusters became more marked on various dimensions, speakers produced them less accurately. Based on a Newman-Keuls post-hoc grouping of accuracy proportions (p<.05), clusters were divided into four groups: Legal, Easy, Intermediate, and Difficult. These are demonstrated in Figure 9:


[Figure: bar chart. Production accuracy by cluster, ascending from left to right: vn 11%, vz 19%, dv 20%, kt 34%, pt 36%, kp 38%, chk 39%, tf 52%, zm 63%, zr 63%, shl 94%, sn 95%, shr 97%, sm 97%, fr 98%. The clusters are grouped, from left to right, as Difficult, Intermediate, Easy, and Legal; the y-axis is percent accuracy (0-100%).]

Figure 9. Production accuracy on obstruent word-initial clusters in Davidson et al. (2003)
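For reference, the values recoverable from Figure 9 can be tabulated. A minimal sketch (Python; the cluster labels follow the romanized bar labels in the figure, and the pairing of clusters to percentages reflects the figure's left-to-right order):

```python
# Production accuracy per cluster, read off Figure 9 (Davidson et al. 2003).
ACCURACY = {
    "vn": 11, "vz": 19, "dv": 20,              # least accurate clusters
    "kt": 34, "pt": 36, "kp": 38, "chk": 39,
    "tf": 52, "zm": 63, "zr": 63,
    "shl": 94, "sn": 95, "shr": 97, "sm": 97, "fr": 98,  # at or near ceiling
}

# Order the clusters from least to most accurately produced.
ranked = sorted(ACCURACY, key=ACCURACY.get)
print(ranked[0], ranked[-1])  # least vs. most accurate cluster
```

Tabulating the figure this way makes it easy to check specific claims in the text, e.g. that /dv/ is produced less accurately than /tf/, or that /vz/ is among the least accurate clusters.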

The production results show that sonority sequencing alone cannot account for the data. For example, clusters like /zm/ and /vn/ differ only in place, not in sonority distance, but speakers’ accuracy on these clusters is very different. The difference in performance was attributed to the fact that /z/ is [+coronal], whereas /v/ is a more marked labial fricative.

In an Optimality Theoretic analysis of the results, Davidson et al. (2003) proposed that the relationships between various marked sequences could be established and accounted for with the ranking of constraints. For example, one such markedness restriction entails that there are no [stop+obstruent] or [stop+nasal] word-initial clusters because in English, obstruents must be released in onsets, and release only occurs if an obstruent is followed by an approximant (Steriade 1993), which is defined as a liquid, glide, or vowel. Another aspect relevant to the production facts is that as the first segment becomes more marked on the dimensions of place, manner, and voicing, the cluster is less likely to be produced accurately. For example, the cluster /dv/ prompts significantly lower accuracy than /tf/, which differs only in voicing. Furthermore, when more than one of these markedness factors is violated, it causes an even larger decrement in accuracy. The cluster /vz/, for example, is among the least accurate of all not only because the first consonant does not release into an approximant, but also because it is voiced. The reader interested in the full account of all the clusters is referred to Davidson et al. (2003).

In her experiment focusing on Russian onset clusters, Haunz (2002) examined the role of the native language phonology in both perception and production. In one task, English speakers were presented with a Russian word auditorily and were asked to repeat it in isolation and within an English sentence.
Although the particular clusters used in Haunz were different than those in Davidson et al., the findings were roughly similar. For example, speakers were more accurate on voiceless clusters like /fp/ than on voiced clusters like /vb/. Likewise, speakers were more accurate when fricatives were followed by a liquid (/vr/ and /vl/) than a stop (/vz/ and /vb/). However, Haunz points out that this methodology conflates perception and production, so she presents another condition in which participants listen to stimuli and write down the cluster that they hear. Again, speakers show differences in accuracy among non-native clusters, but the results also demonstrate that speakers are considerably more accurate when writing the target words than when producing them orally. Based on the comparison of the results from writing


and speaking, Haunz concludes that adaptation—the conversion of the illegal target string to something possible in one’s native language—occurs to some extent in perception.

While Davidson et al.’s (2003) and Haunz’s (2002) examinations of a large number of clusters are a necessary first step toward determining what dimensions affect the perception and production of non-native sequences, these studies may in fact be too detailed to support strong conclusions. That is, the clusters used in the experiments were not particularly well matched on any criterion other than sonority distance, so only a limited number of comparisons can be made to discover which dimensions are relevant to determining speakers’ accuracy on non-native clusters. In light of the large body of research reviewed in Chapter 2 supporting the proposal that speech production, speech perception, and language typology (and by extension, phonology) are influenced by articulatory and perceptual factors, a new production experiment with more tightly controlled stimuli was designed to investigate how the production of phonotactically illegal word-initial clusters can shed light on the relationship between phonological factors and their acoustic and articulatory characteristics. The experiment presented below tests the effects of place, voice, and manner in determining the relative accuracy of production of different illegal word-initial clusters.

3.2. Experiment

3.2.1. Phonetic influences

Since typological studies of word-initial clusters have shown that inventories are sensitive to the features of both the first and second consonants (Wright 1996, Morelli 1999), it was hypothesized that similar biases would affect production of illegal sequences. Working within a cue-based framework, Wright argues that segments which contain internal cues to voice and place, such as fricatives, should be better tolerated in preconsonantal position than segments such as stops, which require transitions with a sonorant or vowel in order to maximize their cues. This is confirmed by Morelli’s cross-linguistic work on word-initial obstruent clusters, which argues that fricative-initial clusters are typologically least marked. However, even fricatives may suffer from weak place and voice cues; when the cues for the first consonant are not sufficient, the perception of these features may need to be bolstered by information in the transition to the second consonant of the cluster (Harris 1958, Steriade 1993, 1997). In fact, a closer look at the typological survey carried out by Morelli (1999) shows that her claim that fricative-initial obstruent clusters are the most common initial obstruent clusters rests solely on the relatively wide distribution of /s/-initial clusters; /s/ is commonly considered the most robust and perceptible of all fricatives. Other types of fricatives are not nearly as common in obstruent clusters.

In the current study, English speakers were tested on their ability to produce word-initial consonant clusters beginning with /s/, /f/, /z/, and /v/ and followed by stops, fricatives, and nasals, in order to examine the robustness of the initial segment and the importance of the perceptual cues that may be provided by the second consonant of the cluster. These clusters are found word-initially in Czech, which was the language used to record the auditory stimuli for this experiment.
By using fricatives, a comparison between English-illegal clusters and legal (/s/-initial) clusters is facilitated. Furthermore, since English has /s/-initial clusters, it is assumed that being fricative-initial per se is not


disallowed by the English grammar. Rather, it is the place and/or voicing of /f/, /z/, and /v/ that are prohibited. With respect to the effect of place, a number of phonetic studies of fricatives have shown that the sibilants /s/ and /z/ are significantly more perceptible than non-sibilants /f/ and /v/. Because the sound source is filtered by a long anterior-cavity resonance for sibilants /s/ and /z/, the amplitude of the frication noise is much greater than that for labiodental fricatives, which have no cavity in front of the obstruction (Behrens and Blumstein 1988, Stevens 1998, Jongman, Wayland and Wong 2000). This also leads to a well-defined, distinct spectral shape. This high-intensity noise and clear timbre contribute to the salience and distinctiveness of /s/, making its acoustic properties almost vowel-like (Kingston 1990). Jongman et al. (2000) also found that /f/ and /v/ had significantly lower noise amplitude relative to /s/ and /z/. In terms of duration, they reported that non-sibilant fricatives were significantly shorter than the sibilants. Furthermore, discriminant analysis and perceptual confusion tests have shown that non-sibilant fricatives like /f/, /v/, /T/, and /D/ are significantly more likely to be confused with one another than are sibilants such as /s/, /z/, /S/ and /Z/ (Miller and Nicely 1955, Tomiak 1990, Jongman et al. 2000). In a different type of perceptual task, Harris (1958) presented listeners with syllables consisting of a fricative /s/, /S/, /f/, or /T/ combined with a vowel that had originally been produced in conjunction with one of the other fricatives. Thus, the /a/ of a syllable such as /sa/ could be spliced with an initial /f/ to create a syllable /fa/ that had the wrong F2 transition for the consonant and vowel pair. When speakers were asked to classify the syllable, these mismatched syllables did not affect accurate discrimination of the sibilants, but caused significantly poorer classification of the non-sibilants. 
Harris concluded that accurate perception of non-sibilants depends not just on internal cues to fricative place, but also on the F2 transitions from the fricative to the following segment.

Unlike that of /f/, the failure of /z/ to appear in word-initial clusters does not seem to be perceptually motivated. Perceptually, /z/ is nearly as salient as /s/, both in terms of its internal features and in comparison to other fricatives. Miller and Nicely (1955) note that all sibilants, regardless of their voicing specification, are characterized not only by intense high-frequency noise but also by extra long duration. A different type of evidence comes from Harris’s (1958) discrimination task, in which /z/ was correctly identified among other voiced fricatives as often as /s/ was among voiceless ones. The discriminant analysis discussed in Jongman et al. (2000) likewise shows that classification rates for both /s/ and /z/ were significantly higher than those for either /f/ and /v/ or /T/ and /D/.

Instead, whereas non-sibilant fricatives are less salient than sibilants primarily for perceptual reasons, voiced fricatives are disadvantaged compared to voiceless ones on articulatory grounds. As noted by Ohala (1994), the optimal situation for obstruent voicing occurs when oral pressure is maximally lower than glottal pressure. For the most favorable frication, however, oral pressure should be maximally higher than atmospheric pressure, setting up conflicting articulatory requirements for the production of voiced fricatives. For example, in an investigation of the cues to voicing in the fricative pairs /s,z/ and /f,v/, Stevens et al. (1992) found that about 22% of the singleton intervocalic voiced fricatives in their stimulus set were not fully voiced throughout the duration of the fricative (see also Haggard 1978).
Voiced obstruent clusters are further disadvantaged by the fact that they are longer in duration than single voiced obstruents,


which may require the conflicting air pressure requirements to be sustained for longer than the speech system can accommodate (Westbury and Keating 1986, Ohala and Kawasaki-Fukumori 1997). Voiced fricative-stop clusters have articulatory shortcomings similar to those of fricative-fricative clusters; just as frication requires high oral pressure, stops block the flow of air, which likewise causes oral pressure to increase, adversely affecting the pressure drop needed for voicing.

These perceptual and articulatory factors contribute to the overall “goodness” of certain consonant clusters, and play a role in determining which types of sequences are more likely to be found in word-initial cluster inventories cross-linguistically (e.g. Greenberg 1965, Ohala 1994, Morelli 1999). Thus, since /s/ is the optimal fricative in terms of both voice and place (i.e. it is a voiceless strident), it is the best candidate among the fricatives for the edges of obstruent and other consonant clusters.

Note that the prevalence of /s/ in cluster-initial position is not likely to be accounted for by a general statement that [+coronal] segments are articulatorily simpler. For typology, this claim would also lead to the expectation that [+coronal] stops should be preferred as the first segment in /stop+obstruent/ word-initial clusters. In other words, there should be no languages that have [+dorsal] or [+labial] cluster-initial segments without also having [+coronal], and if a language has only one type of /stop+obstruent/ word-initial cluster, it should be coronal-initial.
However, a review of the obstruent cluster inventories in Morelli (1999) shows that (a) clusters can begin with coronal, labial or dorsal stops in 10 languages (Cambodian, Dakota, Georgian, Hebrew, Khasi, Nisgha, Pashto, Serbo-Croatian, Seri, and Tsou), (b) two languages have clusters beginning with labial and dorsal stops but not coronal ones (Attic Greek and German), (c) two languages only allow clusters beginning with dorsal stops (Mawo and Wichita), and (d) only one language allows labials and coronals in initial position, but not dorsals (Yatee Zapotec). For some of the languages that do allow coronal-initial clusters, the inventories show that the number of these clusters is much smaller than dorsal or labial-initial ones (e.g. Dakota has 4 /p/-initial clusters, 5 /k/-initial clusters, and only one /t/-initial cluster; Yatee Zapotec has 9 labial-initial clusters and only 2 coronal-initial ones). This type of evidence suggests that being [+coronal] alone cannot account for the wide cross-linguistic distribution of /s/ in cluster initial position. Word-initial clusters beginning with /f/, /z/, or /v/ may be ruled out of a language’s cluster inventory for having insufficiently perceptible frication noise or for being too difficult to articulate. However, the presence of such clusters in an inventory is language-dependent; Slavic languages, for example, include clusters beginning with all of these fricatives. It must therefore be the case that their presence or absence is determined by the phonological grammar of a language. Since low-intensity energy and voicing are disadvantaged with respect to the optimal fricative /s/, traditional phonological terminology may be applied to say that these characteristics are cross-linguistically marked. Furthermore, since these properties play an important role in phonological grammars, they can be characterized in terms of traditional phonological features, such as [±voice] and [±strident]. 
Note that both perceptually and articulatorily based features are necessary. It is hypothesized that English speakers may produce non-native clusters less accurately for two reasons: the first consonant is marked by virtue of (i) the poor perceptual cues of a non-strident fricative, which has weak-intensity energy, or (ii)


the articulatory difficulty of being followed by a voiced segment. Specifically, this leads to the prediction of three possibilities with respect to production accuracy based on the effect of the first consonant, shown in (4); these are tested in this experiment.

(4) Predicted accuracy (more accurate > less accurate):
Possibility 1. [−strident] and [+voice,−son] are equally marked: /s/ > /f/ = /z/ > /v/
Possibility 2. [−strident] is more marked than [+voice,−son]: /s/ > /z/ > /f/ > /v/
Possibility 3. [+voice,−son] is more marked than [−strident]: /s/ > /f/ > /z/ > /v/

If voicing and a lack of stridency are equally marked, then there should be no difference between production accuracy on /f/-initial and /z/-initial clusters. This corresponds to Possibility 1. If one or the other has a greater effect, however, then either Possibility 2 or 3 could occur. Regardless of whether [−strident] is more marked than [+voice] or vice versa, performance on /v/-initial clusters, which are marked for both stridency and voice, should be least accurate. Likewise, because English does contain /s/-initial clusters, which are unmarked on both features, participants should be at or near 100% accuracy on these.

In addition to the markedness of the first segment of the cluster, the combined effects of the first and second consonant must also be taken into consideration. Following Steriade’s (1993, 1997) claim that perceptual cues to the place and voice of obstruents are bolstered when they are followed by sonorants, it is hypothesized that for each of the fricatives investigated, accuracy on clusters will be improved when the second member of the cluster is a nasal rather than an obstruent, since nasals are sonorants (the exception is /s/, which is expected to be at ceiling regardless of the second consonant). The effect of the second consonant will be described more fully in Section 3.3.
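The three possibilities in (4) amount to different relative weightings of two markedness penalties on the first consonant. A minimal sketch (Python; the numeric weights are arbitrary placeholders used only to derive the orderings, not values from the source):

```python
# Marked properties of each initial fricative: ([-strident], [+voice]).
FEATURES = {"s": (0, 0), "f": (1, 0), "z": (0, 1), "v": (1, 1)}

def predicted_order(w_nonstrident, w_voice):
    """Fricatives ordered from most to least accurate, assuming accuracy
    falls as the summed markedness penalty of C1 rises."""
    def penalty(c):
        nonstrid, voiced = FEATURES[c]
        return nonstrid * w_nonstrident + voiced * w_voice
    return sorted("sfzv", key=penalty)

print(predicted_order(2, 1))  # Possibility 2: [-strident] weighted more
print(predicted_order(1, 2))  # Possibility 3: [+voice] weighted more
```

Under any positive weighting, /s/ (unmarked on both features) comes out most accurate and /v/ (marked on both) least accurate, matching the prediction in the text; only the relative placement of /f/ and /z/ depends on the weights.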

3.2.2. Participants

The participants were 20 Johns Hopkins University undergraduates who received course credit for their participation. All were native speakers of English and had no exposure to Slavic languages. None reported any history of speech or hearing impairments.

3.2.3. Materials

The target words used in the study were pseudo-Czech words with /s/-, /f/-, /z/-, and /v/-initial obstruent clusters. These initial segments were combined with the stops /p/, /t/, /k/ for the voiceless fricatives and /b/, /d/, /g/ for the voiced fricatives, with the other fricative having the same voicing specification (i.e. /sf/, /fs/, /zv/ and /vz/), and with the nasals /m/ and /n/, to create 24 word-initial clusters. All possible combinations are given in Table 2. Four distinct CCaCV tokens were created for each onset, for a total of 96 target words. The stimuli were recorded by a native Czech speaker using the Kay Elemetrics Computerized Speech Lab (CSL) at a 44.1-kHz sampling rate. These words are shown in Appendix 1.


FIRST SEGMENT   SECOND SEGMENT   CLUSTERS
/s/             nasal            /sm/, /sn/
                fricative        /sf/
                stop             /sp/, /st/, /sk/
/f/             nasal            /fm/, /fn/
                fricative        /fs/
                stop             /fp/, /ft/, /fk/
/z/             nasal            /zm/, /zn/
                fricative        /zv/
                stop             /zb/, /zd/, /zg/
/v/             nasal            /vm/, /vn/
                fricative        /vz/
                stop             /vb/, /vd/, /vg/

Table 2. Pseudo-Czech word-initial clusters used in experiment
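The 24 onsets in Table 2 follow mechanically from the design rule in Section 3.2.3: each fricative combines with the three voicing-matched stops, the other fricative of the same voicing, and the two nasals. A sketch of that rule (Python; variable names are illustrative):

```python
# Build the 24 pseudo-Czech onsets from the design rule in Section 3.2.3.
VOICELESS_STOPS = ["p", "t", "k"]
VOICED_STOPS = ["b", "d", "g"]
NASALS = ["m", "n"]
PARTNER = {"s": "f", "f": "s", "z": "v", "v": "z"}  # same-voicing fricative
VOICED_FRICS = {"z", "v"}

onsets = []
for c1 in ["s", "f", "z", "v"]:
    stops = VOICED_STOPS if c1 in VOICED_FRICS else VOICELESS_STOPS
    onsets += [c1 + n for n in NASALS]   # fricative + nasal
    onsets += [c1 + PARTNER[c1]]         # fricative + fricative
    onsets += [c1 + s for s in stops]    # fricative + stop

print(len(onsets))  # 24 onsets; 4 CCaCV tokens each yields 96 target words
```

This makes the count explicit: 4 fricatives x (2 nasals + 1 fricative + 3 stops) = 24 clusters, and 24 x 4 tokens = 96 target words.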

The experiment was designed with PsyScope 1.2.6 and was presented on a Macintosh G3 laptop. Participants were seated in a sound-proof booth and their responses were digitally recorded with a CSL using a head-mounted, unidirectional short-range microphone.

3.2.4. Design and procedure

All 96 words were presented to all participants in two conditions: Repetition and Sentence. These conditions were intended to test whether different types of tasks would have an effect on speakers’ ability to accurately produce non-native word-initial clusters. With respect to the Repetition condition, it may be the case that speakers employ a “foreign register” when producing non-native words in isolation that does not represent the English grammar. Another possibility is that simple repetition does not require speakers to access their English grammar. If either is true, speakers are hypothesized to perform with higher accuracy in the Repetition condition than in the Sentence condition, which is assumed to more fully engage the English grammar. In fact, the Sentence condition was designed to encourage the use of the English grammar; by embedding the non-native words in an English sentence, it is hypothesized that speakers will not be able to switch back and forth between the English grammar needed to produce the carrier sentence and any special strategies that might be used with non-native words. However, if speakers cannot “bypass” their English phonology in the Repetition condition after all, then performance in the two conditions should be nearly equivalent.

Repetition Condition. At the start of each trial, the target word written in English orthography appeared on the screen and remained there for the rest of the trial.5 Twenty milliseconds after the word appeared on the screen, the target stimulus recorded by a native Czech speaker was presented auditorily to the participant through external

5 The target words were presented orthographically in order to bias the participant toward hearing the word-initial cluster. While the use of orthography may present a somewhat undesired confound, it was felt that presenting the participant with the written form of the target was critical for preventing possible misperceptions of the word.


speakers. The word was repeated again 300ms after the end of the first auditory presentation. Participants were instructed to listen to the two repetitions of the word, and then repeat it one time into the microphone. They then pressed the space bar to move on to the next trial. Each trial lasted 2500ms. Participants were given five practice trials before the experimental trials.

Sentence Condition. Forty-eight English carrier sentences were created for the experimental targets (e.g. How far away is the Zvabu subway station?). Every target item was incorporated into one of the carrier sentences, which were each used twice in order to accommodate all of the 96 words. In this condition, the participants were told to imagine themselves as tourists traveling in a foreign country. While they did not know the language of this country, they would find themselves having to ask natives for directions to different places, or how to buy different products. They could assume that the natives spoke some English, and that they would be helpful in trying to answer their questions.

For each trial, participants simultaneously heard the target word and saw the word in English orthography, which remained on the computer screen for 1100ms. The visual word then disappeared, and 500ms later the sentence containing that word appeared on the screen for 3400ms. Participants were told to read the sentence on the screen into the microphone. At the end of the trial the sentence disappeared and the next trial automatically began. Each trial lasted 5 seconds. Participants were given five practice trials before the experimental trials.

The order of the conditions was counterbalanced such that 10 participants saw the Repetition condition first and 10 saw the Sentence condition first. In both conditions, the stimuli were presented in a different random order for each subject.

Coding.
For each experimental condition, the waveform and spectrogram for each target word were analyzed using Praat for Windows6 to determine what, if any, error had been made. There were six possible response categories for each target: Correct, Insertion, Prothesis, Deletion, Segment Change, and Other. Correct productions did not contain any period of voicing, aspiration, or formant structure between the two consonants (as in Czech). If any of these phonetic characteristics were present, the target was coded for insertion. Repairs are exemplified in Table 3:

RESPONSE TYPE    DEFINITION                                                   EXAMPLE
Correct          Target is produced with no changes or simplifications        /zvabu/ [zvabu]
Insertion        Target is produced with a schwa between the consonants       /zvabu/ [zəvabu]
                 in the cluster
Deletion         Target is produced with either the first or second           /zvabu/ [zabu], /zvabu/ [vabu]
                 member deleted
Prothesis        Target is produced with a schwa before the cluster           /zvabu/ [əzvabu]
Segment Change   Target is produced with two segments, but one differs        /zvabu/ [svabu]
                 from the original
Other            Target is not produced, has more than one error, or is       /zvabu/ ∅, /zvabu/ [vəvabu],
                 completely unrecognizable                                    /zvabu/ [sfada]

Table 3. Possible response types

6 Praat was developed by Paul Boersma and David Weenink, and can be found at http://www.praat.org.
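The decision logic of the coding scheme in Table 3 can be sketched as a simple classifier. This is a hypothetical illustration, not the procedure actually used (coding was done by hand from waveforms and spectrograms in Praat); transcriptions are simplified strings, with "@" standing in for schwa.

```python
# Hypothetical sketch of the response-coding logic from Table 3.
# Targets and productions are broad phonemic transcriptions; "@" stands for schwa.

def code_response(target: str, produced: str) -> str:
    """Assign one of the six response categories to a produced form."""
    c1, c2 = target[0], target[1]           # the two cluster consonants
    rest = target[2:]                        # remainder of the word
    if produced == target:
        return "Correct"
    if produced == c1 + "@" + c2 + rest:
        return "Insertion"                   # schwa between the consonants
    if produced == "@" + target:
        return "Prothesis"                   # schwa before the cluster
    if produced in (c1 + rest, c2 + rest):
        return "Deletion"                    # one cluster member deleted
    if (len(produced) == len(target)
            and sum(a != b for a, b in zip(produced, target)) == 1
            and produced[2:] == rest):
        return "Segment Change"              # one of the two consonants differs
    return "Other"                           # absent, multiple errors, unrecognizable

# Examples from Table 3:
assert code_response("zvabu", "zvabu") == "Correct"
assert code_response("zvabu", "z@vabu") == "Insertion"
assert code_response("zvabu", "@zvabu") == "Prothesis"
assert code_response("zvabu", "vabu") == "Deletion"
assert code_response("zvabu", "svabu") == "Segment Change"
assert code_response("zvabu", "sfada") == "Other"
```

The ordering of the checks matters: a form like [vəvabu] fails every specific pattern (it is neither the target's insertion form nor a single-segment change) and so falls through to Other, mirroring the "more than one error" clause of the table.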

3.3. Results

3.3.1. Repetition first vs. Sentence first

The effect of seeing the Repetition condition first versus seeing the Sentence condition first was examined with an ANOVA. The independent variables were the condition seen first (Repetition first versus Sentence first) and the initial segment (/s/, /f/, /z/ and /v/). The dependent variable was proportion correct. For this and all subsequent statistical tests, each data point corresponds to each participant’s proportion correct for each cluster. Thus, for example, if a subject was correct on 3 out of 4 attempts at the cluster /fn/, then she was given a score of .75 for /fn/. This was done in order to have a continuous value for the dependent variable. Mean proportion of correct production for each first segment, divided by version seen first, is given in Table 4:
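The per-participant, per-cluster scoring described above can be sketched in a few lines. This is an illustrative reconstruction only; the trial data below are invented, and the original scores were of course computed from the actual coded productions.

```python
# Minimal sketch of the dependent-variable computation: each participant gets
# one proportion-correct score per cluster (e.g. 3 of 4 correct attempts at
# /fn/ -> .75). The trials listed here are hypothetical.
from collections import defaultdict

trials = [
    # (participant, cluster, was_correct)
    (1, "fn", True), (1, "fn", True), (1, "fn", True), (1, "fn", False),
    (1, "zb", False), (1, "zb", True), (1, "zb", False), (1, "zb", False),
]

counts = defaultdict(lambda: [0, 0])     # (participant, cluster) -> [correct, total]
for subj, cluster, ok in trials:
    counts[(subj, cluster)][0] += int(ok)
    counts[(subj, cluster)][1] += 1

scores = {key: corr / total for key, (corr, total) in counts.items()}
# scores[(1, "fn")] == 0.75 and scores[(1, "zb")] == 0.25
```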

                                  Repetition seen first    Sentence seen first
Performance in Repetition   s            96%                      97%
                            f            55%                      65%
                            z            37%                      41%
                            v            19%                      26%
Performance in Sentence     s            93%                      95%
                            f            45%                      63%
                            z            46%                      56%
                            v            25%                      35%

Table 4. Proportion correct in Repetition and Sentence conditions for the first segment, divided by version seen first

For the Repetition condition, results from the ANOVA show a main effect of first segment type [F(3,472)=160.47, p<.001], and a main effect of version seen first [F(1,472)=4.41, p<.04]. The interaction between first segment and version seen first is not significant [F(3,472)<1]. A similar result is found for the Sentence condition. Results show a main effect of first segment type [F(3,472)=84.12, p<.001], and a main effect of version seen first [F(1,472)=11.21, p<.001]. Again, the interaction between first segment and version seen first is not significant [F(3,472)=1.32, p>.26]. Though the participants that saw the Sentence version first appear to be more accurate overall, accounting for the main effect of version seen first, the lack of interaction demonstrates that the pattern of production based on the type of first segment is the same regardless of which version was seen first. Since this is the parameter of interest, the two groups can be collapsed for both conditions.

3.3.2. Repetition condition

3.3.2.1 First segment


The effect of first segment was examined with a one-way ANOVA considering all participants together. The independent variable was the first segment (/s/, /f/, /z/ and /v/). The dependent variable was proportion correct. Participants were treated as a random factor. Mean proportion correct for each first segment is presented in Figure 10. Results show a main effect of first segment type [F(3,57)=96.78, p<.001]. Pairwise comparisons show that the initial segment categories are all significantly different from one another (p<.001). These results indicate that the nature of the first consonant has a crucial effect on accuracy: speakers produce /f/-initial clusters more accurately than /z/-initial ones, which are in turn produced more accurately than /v/-initial sequences. As expected, speakers are nearly perfect on /s/-initial sequences.
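The one-way ANOVAs reported throughout this section reduce to a standard F-statistic over groups of per-participant proportion-correct scores. The sketch below shows that computation; the F formula is standard, but the scores are invented for illustration and are not the experimental data.

```python
# Hedged sketch of a one-way ANOVA F-statistic: groups correspond to the four
# first segments, data points to per-participant proportion-correct scores.
# The scores below are invented for illustration.
from statistics import mean

def one_way_anova_F(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_vals = [x for g in groups for x in g]
    grand = mean(all_vals)
    k, N = len(groups), len(all_vals)
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, N - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

s_scores = [1.0, 0.95, 1.0, 0.9]
f_scores = [0.6, 0.55, 0.65, 0.6]
z_scores = [0.4, 0.35, 0.45, 0.35]
v_scores = [0.2, 0.25, 0.15, 0.3]
F, df_b, df_w = one_way_anova_F([s_scores, f_scores, z_scores, v_scores])
# With these hypothetical data the group means are far apart relative to the
# within-group spread, so F comes out large.
```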

Figure 10. Performance based on first segment in Repetition condition
(proportion correct: /s/ = .97, /f/ = .60, /z/ = .39, /v/ = .22)

3.3.2.2 Second segment

The influence of the second segment of the non-native sequence on speakers’ accuracy was tested with a one-way ANOVA. The independent variable was the second segment (stop, fricative, nasal). The dependent variable was proportion correct. Participants were treated as a random factor. Mean proportion correct based on the second segment is shown in Figure 11. Results of the ANOVA show a main effect for the second segment [F(2,38)=18.45, p<.001]. Pairwise comparisons show that nasals are significantly different from stops and fricatives (p<.001). Accuracy on stops and fricatives is identical.


Figure 11. Performance based on second segment in Repetition condition
(proportion correct: stop = .49, fricative = .49, nasal = .65)

These results suggest that the second segment plays an important role in influencing accuracy. For any given first consonant, if the second segment is a nasal, speakers are more likely to produce the sequence correctly than if the second segment is an obstruent. This can be corroborated by examining the effect of cluster context—that is, individual combinations of /s/, /f/, /z/, and /v/ with stops, fricatives, or nasals.

3.3.2.3 Cluster type

In order to investigate the combined effects of the first and second consonants on participants’ accuracy, clusters were divided into 7 cluster types: /f/+nasal (fN), /f/+obstruent (fO), /z/+nasal (zN), /z/+obstruent (zO), /v/+nasal (vN), /v/+obstruent (vO), and /s/+any consonant (sC) (the /s/ category is collapsed since both sN and sO are legal and speakers perform at ceiling on these stimuli). Accuracy based on cluster type was examined with a one-way ANOVA for each condition, with cluster type as the independent variable and proportion correct as the dependent variable.


Figure 12. Performance on clusters broken down by cluster type in Repetition condition
(proportion correct: sC = .97, fN = .75, fO = .53, zN = .56, zO = .31, vN = .31, vO = .18)

For the Repetition condition, results show that there is a main effect of cluster type [F(6,114)=62.84, p<.001]. A Newman-Keuls post-hoc test for significance across types gives the grouping shown in Table 5. Types grouped at the same level are not significantly different from one another; all other differences are significant (p<.001).

Cluster Type    Means
sC              .97
fN              .75
fO, zN          .53, .56
zO, vN          .31, .31
vO              .18

Table 5. Accuracy groupings for cluster type in the Repetition condition

Dividing the data by simultaneously considering both the first and second consonants indicates that speakers are sensitive not only to the characteristics of each consonant individually, but also to the relationship that the consonants have with one another. The results show that for each first consonant, the second consonant further determines how accurate speakers will be. A principled explanation for these differences will be discussed in greater detail in Section 3.4.1.

3.3.2.4 Error types

The distribution of error types for the Repetition condition is shown in Figure 13. The single largest error type for all categories of first segments is insertion. There is also some prothesis, especially corresponding to the production of /z/. This repair is consistent with cross-linguistic data showing that using prothesis to repair illegal clusters generally only occurs with sibilants (Broselow 1983, 1991, Fleischhacker 2001). There is also a moderate number of errors in the Segment Change category for /z/, which becomes the legal cluster-initial /s/.

Figure 13. Error types in Repetition condition
(bar chart showing the proportion of each error type, i.e. Insertion, Prothesis, Deletion, Segment Change, and Other, for each first segment /s/, /f/, /z/, /v/)

3.3.3. Sentence condition

3.3.3.1 First segment

In general, the results of the Sentence condition are similar to those of the Repetition condition. As shown in Figure 14, there was a main effect of the first segment [F(3,57)=61.28, p<.001]. Speakers were most accurate on /s/- and least accurate on /v/-initial clusters, but /f/- and /z/-initial clusters were not significantly different from one another (p>.30).

Figure 14. Performance on fricative-initial clusters in Sentence condition
(proportion correct: /s/ = .94, /f/ = .54, /z/ = .51, /v/ = .30)


3.3.3.2 Second segment

The pattern for the second consonant, shown in Figure 15, was nearly identical to that of the Repetition condition. There was a main effect of the second consonant type [F(2,38)=19.30, p<.001], and planned comparisons show that nasals were significantly different from stops and fricatives (p<.003), which were not different from one another (p=.82).

Figure 15. Performance based on second segment in Sentence condition
(proportion correct: stop = .53, fricative = .50, nasal = .67)

3.3.3.3 Cluster type

The effects of context for the Sentence condition are similar to the Repetition condition, but differ especially with respect to the behavior of the zN category. This is shown in Figure 16. There is a main effect of context category [F(6,114)=45.65, p<.001].

Figure 16. Performance on clusters broken down by cluster type in Sentence condition
(proportion correct: sC = .94, fN = .64, fO = .49, zN = .68, zO = .43, vN = .39, vO = .26)


The Newman-Keuls grouping for the Sentence condition is given in Table 6 (p<.05).

Cluster Type    Means
sC              .94
zN, fN          .68, .64
fO, zO, vN      .49, .43, .39
vO              .26

Table 6. Accuracy groupings for context category in the Sentence condition

3.3.3.4 Error types

Error types for the Sentence condition were also nearly the same as for the Repetition condition. Here too insertion was the most frequent repair, as shown in Figure 17.

Figure 17. Error types in Sentence condition
(bar chart showing the proportion of each error type, i.e. Insertion, Prothesis, Deletion, Segment Change, and Other, for each first segment /s/, /f/, /z/, /v/)

3.3.3.5 Discussion

In the Sentence condition, speakers produced /z/-initial and /f/-initial clusters with equal accuracy, whereas in the Repetition condition, speakers were significantly more accurate on /f/-initial clusters. The difference in performance between the Repetition and Sentence conditions is likely due to resyllabification of the fricative as a coda constituent in the Sentence condition, which would explain why, relative to /fC/ clusters, speakers seem more accurate on /zC/ clusters in the Sentence condition than in the Repetition condition. Although resyllabification is always a potential option when the target words are in a carrier sentence, the likelihood of its occurrence depends on the ability of the cluster-initial fricative in question to form a legal coda cluster. In this sense, the fricative /z/ is especially susceptible to this effect, since it is the only segment (of the phonotactically illegal clusters) that can be resyllabified after vowels, nasals, liquids, and voiced obstruents. Other segments are affected to a lesser degree since they are most likely to be resyllabified only after vowels.

In a number of the carrier sentences, the initial /z/ in a target word could form a singleton coda or English-legal coda cluster with the preceding word, allowing for the possibility that speakers would in fact associate the /z/ with the coda instead of with the onset. For example, in the carrier sentences Do the mountains near ________ have campgrounds?, Will a map of _______ show the subway system?, or Can I buy a ticket to __________ at this window?, the word preceding the target ends in either a vowel or a consonant that can form a coda cluster with /z/. If speakers are using this strategy to avoid illegal word-initial clusters, no correlates of such an “error” will be visible on the spectrogram, leading it to be coded as correct. This also explains the otherwise aberrant performance on /zN/: in addition to nasals being a better context for accuracy, resyllabification could lead to accuracy equal to that of /fN/, instead of /fO/.

The resyllabification hypothesis is also supported by a change in the types of errors that are found for /z/-initial sequences. Whereas vowel insertion is the most common type of error for all sequence types in both the Repetition and Sentence conditions, Figure 13 shows that speakers also displayed a considerable amount of prothesis in the Repetition condition for /z/-initial sequences (e.g. zvabu [əzvabu]). In the Sentence condition, however, prothesis nearly disappears as a repair for /z/-initial sequences. This is consistent with a resyllabification strategy: rather than appending a vowel to separate the constituents of the /zC/ cluster, speakers can take advantage of the material preceding the /z/ to break up the cluster.
Despite the resyllabification confound, performance in the Sentence condition is compatible with the Repetition condition, though it can be considered less sensitive to the distinctions speakers make. Consequently, in the remainder of the discussion, the Repetition condition will be considered the most accurate reflection of speakers’ performance on the task of producing non-native word-initial clusters.

3.4. General discussion

3.4.1. Fricative-initial obstruent clusters

The results of this experiment demonstrate that performance on non-native word-initial clusters is affected by the first segment of the cluster, the second segment of the cluster, and the characteristics of the whole cluster created by combining the first and second segments. The findings regarding the first segment of the cluster conform to the prediction made in Possibility 3 in (4): /sC/ ≻ /fC/ ≻ /zC/ ≻ /vC/. These results suggest that the [−strident] fricatives corresponding to non-coronal place are marked relative to /s/, but voicing is even more marked, as evidenced by the decreased accuracy on /z/ relative to /f/. As hypothesized in Section 3.2.1, /v/-initial clusters induce the poorest performance of all the fricatives, due to the fact that they are marked for both stridency and voicing.

The second segment of the cluster also affects accuracy on the individual clusters tested in this experiment. As shown in Figure 11, collapsing over the first consonant, accuracy increases significantly when the second segment is a nasal (/CN/ ≻ /CO/). This suggests that /fricative+obstruent/ clusters are more marked than /fricative+nasal/ clusters, which is likely a result of the fact that nasals are sonorants and thus provide better transitions for initial obstruents. This combination is preferred cross-linguistically for perceptual reasons.

The effect of the second segment is more fully realized when the clusters are broken down into categories based on the combination of the first and second segments: /s+consonant/, /f+nasal/, /f+obstruent/, /z+nasal/, /z+obstruent/, /v+nasal/, and /v+obstruent/. The results summarized in Table 5 indicate that, at least for English speakers, if a cluster is less marked on the basis of the first consonant but more marked with respect to the second consonant, it is essentially equivalent to a cluster that is more marked on the first consonant and less marked on the second. For example, speakers’ accuracy on /zN/ is equivalent to their performance on /fO/ (/zN/~/fO/). This suggests that the markedness of the first and second consonants is roughly equally important in determining the overall markedness of the cluster.

The fact that English speakers produce /f/-, /z/-, and /v/-initial onset clusters with significantly different accuracy is notable. These distinctions are unlikely to result from an intrinsic phonetic relationship between perceptual effects (weak-intensity non-sibilant fricatives) and articulatory factors (voiced initial obstruent sequences). That is, although perceptually and articulatorily disadvantaged phonotactic sequences may be dispreferred with respect to those sequences that are salient and/or simple to produce, there is no sense in which either articulatory or perceptual drives must always be maximized above the other. Rather, this is a language-specific decision (Ohala 1983, Lindblom 1990a). In Optimality Theoretic terms, a ranking such as *[+VOICE] >> *[−STRIDENT] (as a very rough example) could be seen as a language-specific option exploited by the English grammar. The absence of a phonetically constrained implicational relationship, at least among clusters starting with /f/ or /z/, is supported by cross-linguistic data.
For example, while languages like Dutch or Norwegian allow some /f/-initial clusters but no /z/-initial ones, languages like Italian and Serbo-Croatian have /z/-initial clusters but not /f/-initial ones. However, if a language has /v/-initial clusters, like Tsou, Greek, or many Slavic languages, then it also has /f/- and /z/-initial ones (note that all of these languages have the phonemes /s/, /f/, /z/, and /v/). A more detailed examination of cross-linguistic fricative-initial inventories will be presented in Chapter 5.

Given that cross-linguistic cluster inventories exhibit different patterns that allow or prohibit a given phonotactic structure, discrete relationships among /f/-, /z/-, and /v/-initial clusters can be said to have become phonologized in the sense of Lindblom (1990a) or Ohala (1983). In other words, though these fricatives are clearly disadvantaged in clusters on phonetic grounds, making them less likely to be found in legal phonotactic sequences cross-linguistically, their distribution is nevertheless governed by phonological grammars. For this reason, /fC/, /zC/, and /vC/ sequences can be considered “marked” in the traditional phonological sense, though this label can be understood as having perceptual and articulatory (i.e. phonetic) origins. Crucially, it is the grammar—not just phonetic difficulty—that determines which clusters, if any, are allowed in a given language.

A phonological analysis of the results is supported precisely because both perceptual and articulatory factors must be taken into account; that is, phonetic or articulatory implementation difficulties cannot account for the role of perceptual salience in causing /f/-initial clusters to be poor phonotactic sequences in English. In addition, /vC/ clusters were predicted to be least accurately produced because they are both perceptually and articulatorily marked. The combination of these factors to create a more marked structure can only occur at a grammatical level. The structure of the grammar that accounts for the cluster production findings will be discussed in Chapter 5.

3.4.2. Nature of errors

Most studies that report vowel insertion as a repair for phonotactically illegal sequences in non-native language production have assumed that speakers are phonologically epenthesizing a schwa or other vowel between the consonants that form the illegal sequence (e.g. Broselow 1983, Anderson 1987, Broselow and Finer 1991, Eckman and Iverson 1993, Hancin-Bhatt and Bhatt 1998, Fleischhacker 2001, Davidson et al. 2003). However, it has also been argued that the presence of a vowel, especially a schwa, in the acoustic record does not necessarily arise through epenthesis or deletion of a gesture, but may be a result of lengthening a gesture or changing the timing between two gestures (Price 1980, Browman and Goldstein 1992b, Jannedy 1994). Thus, with respect to the prevalence of insertion by the speakers in this experiment, one possibility regarding the nature of the insertion repair is that speakers are not actually epenthesizing a vowel, but rather are mistiming gestures with respect to their expected coordination patterns.

Other than sibilant-initial clusters, no type of obstruent-obstruent or obstruent-nasal cluster is found in the inventory of legal word-initial clusters in English. It is reasonable to hypothesize that since English speakers have no experience with onset clusters like /fp/, /zn/, /vd/, and so on, they are unable to assign the correct coordination relation to these consonants in an onset. In fact, it may be that since these clusters are not legal phonotactic sequences in English, it is the grammar that prohibits speakers from assigning the correct English coordination to them. Given that the participants in the experiment are intending to produce the clusters correctly, it is possible that epenthesis is blocked, since repairing a cluster with a schwa is clearly a strategy for making a prohibited phonotactic sequence legal, not for producing it correctly.
Much like second language learners who exhibit unexpected repairs (such as Chinese speakers who devoice word-final voiced obstruents even though no final obstruents are legal in Chinese), English speakers may instead avoid the formation of a cluster by imposing a non-overlapping coordination relationship on consonant sequences that are not legal clusters in English. With respect to the experimental classification “insertion”, then, a schwa could appear on the surface through phonological epenthesis, but also through a gestural coordination in which the release of the first consonant is not overlapped by the target of the second consonant (as proposed for excrescent schwa in Moroccan Arabic by Gafos 2002). That is, if the consonantal gestures comprising the cluster are produced sufficiently far apart and the speaker is in the speech-ready state, then the vocal tract could be fully open just long enough for vocal fold adduction and vibration to occur, ultimately leading to the percept of voicing between two voiceless segments (Goldstein 2002).7 This can be exemplified by the schematic from Section 2.1.3 demonstrating the Moroccan Arabic case, recharacterized in (5):

7 In general, Browman and Goldstein assume that the default glottal state is the one that produces voicing (Browman and Goldstein 1986), so if the constrictions and glottal abduction gestures of the consonants are relaxed for long enough, then voicing can occur.


(5) Percept of excrescent schwa between consonants (no gesture corresponding to /ə/)

        target:   f     p
        output:   f  ə  p        (open vocal tract between the two consonantal gestures)

The possibility that the schwa arises from a non-overlapping gestural coordination is supported by a closer examination of details of the acoustic record. Spectrographic evidence indicates that there is a considerable amount of variability in the duration, intensity and amount of formant structure corresponding to the schwa produced between the two consonants of the non-native cluster. This is the case for both voiceless and voiced clusters. When categorizing voiceless clusters in terms of errors produced by the speakers, an utterance was coded as containing an epenthesized vowel if any period of voicing intervened between the consonants, whether it resembled a long, robust vowel or a short, transitional one. In the case of the voiced clusters, the schwa can be long and intense, or shorter and considerably weaker in energy. Examples of these cases are shown in Figure 18 and Figure 19:

Figure 18. Different manifestations of inserted vowel between voiceless consonants
(spectrograms; segment labels: f ə k a d a in the upper panel, f ə t a k e in the lower panel)
The upper panel represents a large, robust period of voicing in interconsonantal position in the production of the word /fkada/ [fəkada] (subject 20). In the lower panel, the cluster contains a much shorter, transitional period of voicing in the word /ftaka/ [fətake] (subject 8). The low-intensity energy present in the voiceless consonants is due to the hum of the laptop computer used to present the stimuli.

Figure 19. Different manifestations of inserted vowel between voiced consonants
(spectrograms; segment labels: z ə d a t e in the upper panel, z ə b a n o in the lower panel)
The upper panel represents a long, intense period of voicing in interconsonantal position of the word /zdate/ [zədate] (subject 20). In the lower panel, the period of voicing is shorter and considerably weaker in the word /zbano/ [zəbano] (subject 16).

When the English productions are compared with the spectrograms for the Czech versions of both voiced and voiceless clusters, it is clear that there is no vocalic material present in the interconsonantal position. This is further evidence that the English speakers are not accurately reproducing the auditory stimuli. Examples of the Czech productions are shown in Figure 20:

Figure 20. Productions of auditory stimuli by Czech speaker
(spectrograms; segment labels: f p a k u in the upper panel, z d a b a in the lower panel)
The upper panel represents a voiceless cluster in the word [fpaku]. The lower panel contains a voiced cluster in the word [zdaba].

That there is so much variability in how the schwa is realized in the English speakers’ productions further suggests that speakers may not be epenthesizing a phonological schwa in order to repair the phonotactically ill-formed cluster. Since the speech rate stays relatively constant during the course of the experiment, there is no simple explanation for why the schwa can be produced so variably if it is the manifestation of a phonologically represented vowel. However, if the schwa arises from the inability to adequately coordinate the consonants, then it might be more likely for the duration and intensity of the schwa to show considerable variability, since speakers may not necessarily coordinate the consonants in the same way each time they produce a target word. That is, if the only requirement to be satisfied in the production of consonants that cannot form a legal cluster is that they do not overlap as in the correct production of a cluster in English, then the amount of opening between the two consonants is not necessarily prescribed.8

In order to better determine whether the period of voicing between the consonants used as experimental targets results from phonological epenthesis or gestural “mistiming”, it is necessary to be able to visualize the movement of the articulators during speech. Ultrasound, a technique which allows for the real-time imaging of tongue motion during speech, can be used to determine whether speakers are actively moving toward a gestural target corresponding to a phonological schwa, or are simply pulling apart the consonant gestures of the cluster, giving rise to an excrescent schwa. An experiment designed to examine this question is presented in Chapter 4.

8 On the other hand, there may be restrictions on how consonant sequences that do not conform to canonical English cluster coordination can be realized. This is discussed further in Chapter 5.


3.5. Summary

The major focus of the experiment reported in this chapter has been on the articulatory and perceptual factors contributing to the production of phonotactically ill-formed sequences. Participants were tested on non-native fricative-initial clusters in order to see how segments that varied along multiple dimensions compared to legal /s/-initial clusters. According to the results, speakers were reliably more accurate on /f/-initial clusters, followed by /z/- and then /v/-initial clusters. The second member of the cluster also had an effect on speakers’ accuracy: when the second consonant was a nasal, accuracy was improved compared to when the second segment was a fricative or a stop.

Decreased performance on /f/-, /z/-, and /v/-initial clusters can be attributed to the perceptual and articulatory characteristics of such clusters. Unlike sibilants, which have high-intensity energy and clear timbre, making them highly salient at the edge of word-initial consonant clusters, the labiodental fricatives /f/ and /v/ have very weak energy and rely on transitional cues from the following segment to bolster their perceptibility. The fact that this perceptual consideration influences speakers’ production even though perceptual confusion is eliminated by providing participants with an orthographic representation of the target word suggests that it takes effect at a phonological level. That is, /f/- and /v/-initial clusters are marked because the overall perceptual poorness of such clusters has been represented in the phonology.

Whereas the markedness of /f/-initial clusters is perceptual in origin, the markedness of /z/-initial clusters is of an articulatory nature. In this case, though /z/ is a sibilant, making it a potentially good candidate for the first member of a word-initial consonant cluster, it is disadvantaged compared to /s/ because it is voiced.
Since the aerodynamic requirements for sustaining voicing on combinations of voiced fricatives and stops are more difficult to meet than those for voiceless sequences, word-initial voiced clusters are marked relative to voiceless ones. Clearly, they are not articulatorily impossible, since they exist in a number of languages, but they are found in considerably fewer inventories than are voiceless obstruent clusters (Morelli 1999). Notably, /v/-initial clusters, which contain a segment that is both weak in intensity and voiced, induce the poorest performance by the speakers.

A priori, it is not clear how to assess the relative costliness of the poor perceptibility of weak-intensity labiodental fricatives and the articulatory difficulties of voiced fricatives with respect to the relative markedness of /f/, /z/, and /v/. While it is recognized that both articulatory and perceptual factors influence the structure of grammars, it is not necessarily the case that one is always prioritized over the other. The fact that a distinction is made in English—for example, that speakers are less accurate on voiced /z/-initial clusters than on voiceless but weak-intensity /f/-initial clusters—is another factor suggesting that the relative markedness among these characteristics is determined by the phonological grammar. Another way of saying the same thing is that phonetic factors become phonologized, and once in the grammar, they behave like any other phonological constraints. A phonological analysis of the results for the English speakers is developed in Chapter 5.

The results of this study also show that the most common repair of illegal phonotactic sequences is insertion of a vowel. This kind of repair is typically considered a result of phonological epenthesis; however, research within Articulatory Phonology has argued that just such a period of voicing can also be present if the two consonantal gestures in a cluster are sufficiently far apart. In the next chapter, an experiment addressing this question with ultrasound imaging is presented.


CHAPTER 4. An Ultrasound Investigation of Consonant Coordination in Initial Clusters

The nature of the inserted schwa produced by English speakers attempting to produce phonotactically illegal word-initial clusters is examined with ultrasound imaging. Ultrasound imaging is a non-invasive technology that allows for the real-time visualization of tongue motion during speech. In the experiment in this chapter, ultrasound imaging is used to compare English speakers’ productions of legal words with (a) lexical schwa and (b) initial clusters to their productions of non-native initial clusters. Results from the ultrasound recordings suggest that the intrusive schwa produced between the two consonants of a non-native cluster is not consistent with a vowel that has its own gestural target.

4.1. The nature of schwa

4.1.1. Experimental and typological characterizations of transitional schwa

In studies of production and second language acquisition, it is typically assumed that when speakers produce a vowel between the two consonants in a phonotactically illegal sequence, it is a result of the phonological epenthesis of a vowel (e.g. Broselow 1983, Anderson 1987, Major 1987, Tarone 1987, Broselow and Finer 1991, Eckman and Iverson 1993, Davidson 1997, Hancin-Bhatt and Bhatt 1998, Davidson, Smolensky and Jusczyk in press). Sometimes the inserted vowel is a schwa, as for English speakers producing Polish clusters (Davidson et al. in press) or Korean speakers learning English (Tarone 1987). However, speakers of languages that do not have schwa in the inventory often exhibit the epenthesis of a vowel with a particular quality, such as [i] for speakers of Brazilian Portuguese (Major 1987) or Egyptian Arabic (Broselow 1983).

The assumption that schwa results from phonological epenthesis of a vowel has been indirectly challenged by some of the research in the Articulatory Phonology framework. It has been proposed by Browman and Goldstein (1990a, 1992a, 1992b) and others that schwas in English, even ones that are generally accepted to be present underlyingly (as in p[ə]rade or t[ə]morrow), do not necessarily need to have their own gesture associated with them, and can be derived acoustically from variations in the coordination and distance between the flanking consonants. A similar idea was originally proposed by Price (1980), who demonstrated that in a string like [plis], listeners perceived the word as police when the liquid was lengthened. Focusing on the articulatory tasks of the speaker, Browman and Goldstein (1990a) hypothesize that the difference between minimal pairs like beret and bray is not that a schwa gesture is present in the score for beret, but that the /b/ and the /r/ in bray have a tight gestural coordination (i.e. conforming to the C-center effect, see Section 2.1.3), whereas in beret, the bilabial and rhotic gestures are not overlapping at all. Synthetic stimuli produced by a vocal tract model that generated tokens with amounts of overlap varying in 10 ms increments were presented to listeners, who tended to categorize stimuli with greater than 10 ms of overlap as bray and stimuli with greater than 0 ms of separation as beret. Jannedy (1994) makes a similar claim for German, as reviewed in Section 2.1.4.


One problem with this account, however, is that if the consonantal gestures were sequential in both beret and bray but differed in their coordination, it is unclear how the phonological component would assign the correct coordination relations to the different lexical items. In a preview of the analysis that will be developed in Chapter 5, the job of the phonology is to assign the correct, language-specific coordination to sequential consonants; coordination patterns are not lexically specified. Consequently, if the consonants /b/ and /r/ are sequential in the input, the grammar should always treat them in the same way, and should not be able to produce both a tightly coordinated /b/ and /r/ in bray and a non-overlapping configuration for beret.

While Browman and Goldstein tried to demonstrate that it is theoretically possible for schwa to appear on the surface even if it does not correspond to a gesture in the gestural score, other research of theirs provides evidence that schwa likely does require its own gestural target in underlying representations. In their simulations examining whether the schwa in nonsense stimuli such as /pipəpipə/ or /pipəpapə/ could be a “targetless” interpolation of the transition from [i]-to-[i] or [i]-to-[a], Browman and Goldstein (1992b) show that schwa cannot be fully described in terms of the flanking vowels and consequently does appear to have its own target. Furthermore, an examination of tongue position during schwa using x-ray microbeam and ultrasound in two different tasks has shown that schwa seems to have a canonical position involving tongue body lowering and tongue root retraction (Gick 2002, Gick and Wilson to appear).

A similar issue is addressed by Smorodinsky (2002), who attempts to posit a distinction between lexical schwa and so-called “epenthetic” schwa.
Smorodinsky contends that the English epenthetic schwa found between a coronal-final word and the past-tense morpheme (as in need[ə]d) may result from pulling the coronal gestures apart to avoid an OCP violation, whereas lexical schwas like the final sound in the words panda or Rita have their own gestural targets. Using EMMA, the tongue movements for epenthetic schwa and lexical schwa were compared by having speakers produce minimal pairs such as If needed even once and If Needa’d even known. It was hypothesized that movements corresponding to epenthetic schwa, which does not have a gestural target, should vary more with the context of the flanking vowels (i.e. [i]…[ə]…[i] in the examples above) than lexical schwa does. In other words, an interaction between vowel context and schwa type is expected. It is also noted that, qualitatively, the trajectory corresponding to lexical schwa may demonstrate motion toward a target, whereas the trajectory of epenthetic schwa should remain flat if it is an interpolation between two flanking vowel gestures.

Smorodinsky’s results showed that there were no qualitative differences between lexical and epenthetic schwa, since both diverged from the flat trajectory that would have been expected if schwa were just an interpolation between preceding and following gestures. The quantitative results are mixed, neither providing good confirmation of the claim that there is a difference between lexical and epenthetic schwa nor supporting the alternative possibility that both types of schwa are articulatorily the same. Only the vertical dimension of the tongue dorsum EMMA pellet showed a greater effect of vowel context on the tongue position of epenthetic schwa, and this effect was significant for only one of three subjects. The movements of the tongue tip for the two types of schwa were not different, and tongue body measurements were not reported.


An additional measure of intergestural timing, which examined the interval between the achievements of the targets of the first and second consonants, was also considered. It was hypothesized that for epenthetic schwa, the two surrounding consonants would be coordinated with one another (though non-overlapping), so the interval between them should be less variable. In the case of tokens with lexical schwa, the flanking consonants are not coordinated with one another, so there might be more variability in the timing of the achievement of each consonantal target. Results indicated that there was less variability for all three subjects in the epenthesis case, but this was not statistically significant. Smorodinsky nevertheless argues that the results support the conclusion that epenthetic past-tense schwa in English differs from lexical schwa because it does not have a gestural target. However, while many of Smorodinsky’s findings were in the predicted direction if the past-tense epenthetic schwa in English is different from lexical schwa, the results are not conclusive and should be interpreted cautiously.

Though “targetless” schwa resulting from non-overlapping gestures is not necessarily a viable output of the core phonology of English, a similar idea has been proposed for other languages, such as Piro (Arawak family) and Moroccan Arabic.
In consonant clusters in Piro, a transitional vocoid can optionally occur between the consonants, as illustrated in (1) (Matteson and Pike 1958, Lin 1997):

(1)  /kwali/    [kəwali:], [kowali:]    ‘platform’
     /tkatʃi/   [tə̆katʃi]               ‘sun’
     /tçema/    [tə̆çema], [tĭçema]      ‘she hears’

Matteson and Pike present a number of arguments supporting their claim that the vocoid present between the two consonants of a cluster is not a segment, including (but not limited to): (a) they are in free variation, (b) they have no syllabic stress, (c) they are much shorter than all other vowels in the language, and (d) they are heavily coarticulated with surrounding consonants. A similar case is found in Sierra Popoluca (Elson 1956).

Much like the analysis for Piro, Gafos (2002) proposes that the transitional vocoid present in final consonant clusters in Moroccan Arabic does not have an underlying schwa gesture, but rather results from non-overlapping gestures. As discussed in Section 2.1.3, Gafos analyzes this lack of overlap as a result of the gestural coordination for (coda) consonant clusters as specified by the phonology. Specifically, Gafos proposes a family of COORDINATION constraints that determine the alignment of adjacent gestures. According to the gestural account of Moroccan Arabic, the center of the first consonant is aligned with the onset of the second consonant, which ensures that the release of the first consonant is not overlapped and in fact provides for a period of open vocal tract between the two consonants in the cluster.

In the well-known case of Berber, it has been argued that words can be composed of only consonants, and that syllabic nuclei are determined by the relative sonority values of the consonants that form the word (e.g. Dell and Elmedlaoui 1985, Prince and Smolensky 1993, Zec 1995). Examples from Berber are shown in (2) (data taken from Dell and Elmedlaoui 1996b):

(2)  /ukkd+nt/   [ukkədntə]    ‘they (f) asserted’
     /t+!ngd/    [!tnəgədə]    ‘she drowned’


However, prompted by a reanalysis from Coleman (1996) proposing that Berber syllables have empty nuclei, Dell and Elmedlaoui (1996a, 1996b) reexamined the data, and subsequently concluded that Berber (like Piro or Moroccan Arabic) contains non-segmental transitional vocoids in many consonant sequences.9 In this proposal, consonants still serve as syllabic nuclei. The exact placement of transitional schwa is dependent on a number of factors, including consonant length, gemination and homorganicity, which are treated in detail in both of the Dell and Elmedlaoui papers. A gestural analysis of Berber would have to show how these types of factors interact with the coordination relations to produce audible releases and transitional vocoids between some gestures, but not others. Presumably, such distinctions would be dependent on whether the consonants in a sequence form a single syllable or are heterosyllabic.

4.1.2. The relationship between transitional schwa and non-native production

The nature of transitional schwa as characterized both by simulations and experimental data and by the analyses of languages such as Piro, Sierra Popoluca, Moroccan Arabic, and Berber presents an alternative to the assumption that the schwa found in the production of non-native sequences is the epenthesis of a phonological vowel. While it is true that a phonological process may repair a phonotactically illegal sequence through epenthesis, it is also conceivable that in a production task, the repair is actually implemented in the coordination of the gestures. Although Browman and Goldstein’s (1990a, 1992b) speculation that English schwa may result from a particular coordination pattern rather than the presence of a schwa gesture has not received much empirical support, the cross-linguistic data suggest that for many other languages, transitional schwa arising from non-overlapping coordination is a robust aspect of the phonology. Whereas lexical schwa is likely to have an underlying target, a production task in which speakers are given consonant sequences that are prohibited from being coordinated in the canonical way for English may in fact lead to a different kind of schwa. In this case, it is plausible that speakers attempting to correctly produce these sequences may nevertheless fail, but do so by using non-overlapping coordination to repair them rather than by inserting a schwa in the phonological output. This question is addressed with an ultrasound experiment in Section 4.3.

4.2. Previous ultrasound research

Ultrasound imaging has been productively used to investigate both lengthwise and cross-sectional tongue shapes in speech production (e.g. Stone 1991, Stone, Faber, Rafael and Shawker 1992, Stone 1995, Iskarous 1998, Gick and Wilson to appear). Ultrasound is an appealing technology for the study of speech because of its good temporal (30 frames/second) and spatial (~1 mm) resolution. Furthermore, ultrasound is a non-invasive method of imaging tongue motion, and it does not expose the subject to radiation (as do other imaging techniques such as x-ray). The images produced by ultrasound are computer reconstructions of sound echoes reflecting off tissue boundaries. The tissue/air boundary on the upper surface of the tongue appears in the ultrasound image as a bright white line. The images can be recorded in real time (30 frames per second), showing the tongue surface motion as a subject is speaking. Images can be collected either in the coronal or the sagittal plane.

Iskarous (1998) employed ultrasound to investigate how tongue trajectories in [i-a] and [a-i] sequences might correlate with the asymmetrical behavior of these sequences in vowel coalescence. In nearly all of the languages studied by Casali (1996) which resolved hiatus through vowel coalescence, [a] followed by [i] coalesces to [e], but [i] followed by [a] does not. Iskarous investigated this asymmetry by having Spanish and Korean speakers (who have non-diphthongal [e] in their vowel inventories) produce the target word [iBeBa] (where /B/ = any labial segment). Labial consonants were chosen because they are not considered to have a tongue shape of their own, and consequently will not disturb the trajectory from vowel to vowel. It was hypothesized that when [e] was preceded by [i] and followed by [a], it would be in the middle of a trajectory with an overall falling and backing movement. If [e] is “static” (similar to being targetless in Browman and Goldstein’s terminology), then it would be expected that the movement might simply pause at [e] during the trajectory before moving on to [a]. If [e] has a target, however, then it would require an active raising and fronting gesture that should result in a slight upward and frontward motion as [i] moves to [a]. The [i]-[a] environment was examined because it was hypothesized that this is the environment in which [e] is least likely to show movement toward its own target.

9 Dell and Elmedlaoui state that between voiceless consonants, the transitional vocoid is voiceless, whereas it is voiced between voiced sequences. In a study of the production of sequences of voiceless consonants in Tashlhiyt Berber using nasal endoscopy (Ridouane 2002), it is demonstrated that there is no adduction of the vocal folds (i.e. no voicing gesture) present. Based on this result and on phonological facts about assibilation of [t] before vowels, Ridouane concludes that there can be no phonological schwa gesture present in the production of voiceless sequences, even if it were devoiced. However, he does not discuss the role of consonant releases, non-overlapping consonants, and transitional vocoids, which may still be present in his speakers’ productions of voiceless consonant sequences.
Results from the ultrasound data on tongue movement showed that there was a peak of upward movement corresponding to the [e], not just a transition from [i] to [a]. It was concluded that the trajectories of [a]-[i] and [e] have dynamic targets that are similar in direction, facilitating coalescence, but because the dynamic targets for [i]-[a] and [e] are in opposite directions, they are not predicted to alternate.

Gick and Wilson (to appear) used ultrasound to demonstrate that the percept of a schwa that often occurs in English vowel+liquid sequences (e.g. heel [hiəl], hire [hajər]) arises as a solution to conflicting articulatory targets. Previous phonological accounts have attempted to explain the presence of excrescent schwa in this environment as a repair of a particularly bad sonority cline (McCarthy 1991), or as a consequence of [tense vowel + liquid] sequences being trimoraic syllables (Lavoie and Cohn 1999). However, Gick and Wilson hypothesize that the acoustic schwa appears because the articulatory movements necessary for the production of vowels and liquids may conflict: whereas the former have an anterior tongue root position, the latter are characterized by a retracted tongue root. In the production of these sequences, the schwa is a by-product of the route taken to resolve this conflict. This is termed the “schwa space” hypothesis, since it claims that the tongue must move through the canonical position for schwa when transitioning between a vowel and a liquid. By matching the acoustic signal containing the schwa with the corresponding ultrasound frames, it is shown that the tongue position occurring at this time is very similar to that for a canonical schwa produced by the same speaker. In addition, Gick and Wilson discuss data from Beijing Mandarin, Nuu-chah-nulth (Wakashan) and Chilcotin (Athapaskan) which indicate that similar processes may be occurring in these languages as well.


In the next section, an ultrasound experiment examining the nature of the schwa used to repair phonotactically illegal sequences in English is presented.

4.3. Ultrasound experiment

The findings of the experiment discussed in Chapter 3 demonstrate that English speakers tend to produce a vowel in between the consonants of a phonotactically illegal word-initial cluster. Although it is often assumed that the vowel is a product of phonological epenthesis, it is also possible that it may arise from a coordination of the consonants in which the gestures are not fully overlapped. As discussed in Chapter 2, it has been proposed that the proper gestural coordination for consonant clusters in English is alignment of the release of the first consonant with the target of the second consonant. The schematic of consonant cluster coordination in English first shown in Section 2.1.3 is repeated in (3):

(3) CC-COORD in English: ALIGN (C1, release, C2, target)
    [schematic gestural score: the release of C1 is aligned with the target of C2]

As further demonstrated in Section 3.4.2, since the Czech speaker’s productions used as auditory stimuli in the experiment did not have any vowel-like material between the consonants, it can be assumed that the target consonantal alignment that the English speakers are hearing and trying to reproduce is similar to that for English. However, although the coordination for phonologically legal clusters may be similar, the English speakers’ grammar must still contain phonological constraints prohibiting this type of coordination in conjunction with certain phonotactic configurations, such as /f/-, /z/-, or /v/-initial clusters in which the second member is an obstruent or nasal, since these are not found in English. Given the target cluster as produced by the Czech speaker, then, an English speaker may repair it in one of two ways: either by pulling apart the gestures (gestural mistiming) or by epenthesizing a vowel. The realization of phonological vowel epenthesis in gestural terms is the modification of the gestural score with a new vocalic gesture corresponding to the schwa. The two options are shown in (4)a and (4)b.

(4) a. Gestural mistiming: target /zb/ → output [zəb], where the schwa percept arises from a period of open vocal tract between the non-overlapping /z/ and /b/ gestures [schematic gestural scores]


b. Phonological epenthesis: target /zb/ → output [zəb], where a schwa gesture is inserted into the gestural score between the /z/ and /b/ gestures [schematic gestural scores]

In order to assess whether the production of a vowel in non-native onsets has an epenthetic (segmental) or gestural coordination origin, production of non-native /CC-/ sequences can be compared to the production of native sequences that are the same in every respect except voicing. Because the experimental results demonstrated that English speakers had considerable difficulty accurately producing /z/-initial clusters, the production of these clusters can be compared to the production of /s/-initial clusters, which are legal in English. Two important comparisons must be carried out. First, the production of /zC/ sequences can be compared to the production of /sC/ clusters that are matched for the place, manner, and continuancy of the second segment. Second, the production of /zC/ sequences must also be compared to the production of /səC/ sequences, which are assumed to contain a target for the schwa in the underlying representation.

Using ultrasound images, tongue shapes during speakers’ production of /zC/ as [zəC] can be compared to their production of /sC/ clusters and /səC/ sequences. If quantitative measurements show that tongue shapes for /zC/ are more like /səC/ than /sC/, this would support the claim that there is epenthesis of a phonological schwa gesture into the speaker’s gestural score for the /zC/-initial word being produced. If, on the other hand, tongue shapes for /zC/ are similar to those for /sC/, despite the acoustic presence of a schwa, this would suggest that the schwa percept arises from gestural mistiming. With no schwa target in the gestural score, the production of /zC/ should show no articulatory evidence of the presence of a schwa, such as coarticulation between /z/ and /ə/, tongue body lowering, or tongue root retraction.

The validity of this comparison is based on the assumption that the supraglottal constriction of the cognate fricatives /s/ and /z/ is comparable.
Though it is known that maintaining vocal fold vibration during the production of a voiced fricative is aerodynamically demanding (a transglottal pressure drop must be maintained even as pressure builds up behind the oral constriction), some evidence has shown that any requisite changes in the configuration of the vocal tract do not occur in the supralaryngeal region, but rather in the pharyngeal and glottal regions (Stevens et al. 1992). For example, maintenance of vocal fold vibration can be enhanced by actively expanding the walls of the vocal tract in the pharyngeal region (Perkell 1969), or perhaps by slackening the vocal folds (House and Fairbanks 1953, Halle and Stevens 1971). Thus, the ultrasound imaging experiment is carried out under the assumption that the tongue shapes produced for the supralaryngeal constrictions are not qualitatively different for voiced and voiceless cognate obstruents.
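The comparison logic described above can be sketched in code. This is a minimal illustration only: the dissertation's actual quantitative comparison uses the CAVITE software described in Section 4.3.4, and the distance measure here (mean point-to-point Euclidean distance between fixed-length contours) and all coordinate values are illustrative assumptions, not the study's metric.

```python
import math

def mean_distance(contour_a, contour_b):
    """Mean Euclidean distance between corresponding contour points."""
    assert len(contour_a) == len(contour_b)
    return sum(math.dist(p, q) for p, q in zip(contour_a, contour_b)) / len(contour_a)

def closer_to(zc, sc, sec):
    """Classify a /zC/ contour by which legal sequence it better resembles."""
    if mean_distance(zc, sc) <= mean_distance(zc, sec):
        return "/sC/ (gestural mistiming)"
    return "/səC/ (phonological epenthesis)"

# Toy two-point contours, purely illustrative:
sc  = [(0.0, 0.0), (1.0, 1.0)]   # legal /sC/ cluster
sec = [(0.0, 2.0), (1.0, 3.0)]   # legal /səC/ sequence
zc  = [(0.0, 0.2), (1.0, 1.2)]   # repaired /zC/ token
print(closer_to(zc, sc, sec))    # → /sC/ (gestural mistiming)
```

On this toy data the /zC/ contour lies nearer the /sC/ contour, which under the logic of the experiment would point to gestural mistiming rather than epenthesis.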

4.3.1. Participants

The participants were 5 University of Maryland graduate students from various schools, including the Law School, the Pharmacy School, the Nursing School, and the Medical School. All students were native speakers of American English; one was also a speaker of Korean. No students had been exposed to Slavic languages. None reported any history of speech or hearing impairments. Six other participants were also recorded, but their data were not examined here, since these speakers did not produce any errors (that is, they did not produce an acoustic schwa between the consonants of the /zC/ sequence). All participants were paid for their time.

4.3.2. Materials

The target stimuli in this study were three triads consisting of /sCi-/, /səCi-/, and /zCi-/ initial words. The /zC/-initial words were designed to be phonotactically possible non-words in Polish, so that all words could be recorded by a bilingual English-Polish speaker. The second consonant of each member of a triad was matched for place, manner, and continuancy. To the extent possible, an effort was also made to match all members of the triad on the vowel immediately following the second consonant, so that any coarticulatory effects of the vowel on the production of the preceding consonant would be minimized. This was not always possible, however, since the vowels of Polish are only a subset of those of English. For each triad, two possible pseudo-Polish words were constructed in order to improve the likelihood of capturing usable ultrasound images. The target words used in the experiment are shown in Table 7:

Triad                          English /səC/   English /sC/   Pseudo-Polish /zC/
Labial:  /səp/-/sp/-/zb/       superfluous     spurt          [zbura], [zbertu]
Coronal: /sət/-/st/-/zd/       satirical       steer          [zdiri], [zderu]
Velar:   /sək/-/sk/-/zg/       succumb         scum           [zgama], [zgomu]

Table 7. English and pseudo-Polish experimental target words

In addition to the target words, 24 more legal words and 8 more non-words were also presented to the participants, for a total of 44 words. The additional words were collected with the intention of using them for future research. Three different randomized lists of all 44 words were created. The stimuli were recorded by a bilingual English-Polish speaker using the Kay Elemetrics CSL at a 44.1-kHz sampling rate.

4.3.3. Design and data collection

4.3.3.1 Ultrasound setup

A commercially available ultrasound machine (Acoustic Imaging, Inc., Phoenix, AZ, Model AI5200S) was used to collect midsagittal images of the tongue during the production of the /səC/-, /sC/-, and /zC/-initial words. A 2.0-4.0 MHz multi-frequency convex-curved linear array transducer that produces wedge-shaped scans with a 90° angle was used. Focal depth was set at 10 cm, producing 30 scans per second.

In order to ensure that the speaker’s tongue does not change position during data collection, the speaker’s head is stabilized by a specially designed head and transducer support (HATS) system (Stone and Davis 1995). This is necessary because speakers’ heads do not stay steady during running speech, and unless the transducer is immobilized, it is likely to shift by rotation or translation, leading to off-plane images. In the HATS system, the speaker’s head is immobilized by padded clamps positioned at the forehead, the base of the skull, and the temples that can be re-sized for different heads. The transducer is held by a motorized arm that can be positioned under the subject’s head and adjusted to optimize the image for a particular speaker. The transducer holder in the HATS system is designed to both maintain the transducer in constant alignment with the head and allow for full motion of the jaw. A frontal view image of the HATS system is shown in Figure 21.

Figure 21. Frontal image of HATS system.

The speaker’s head is immobilized with a series of padded clamps. The transducer is secured with a specially designed holder that ensures constant alignment with the head while allowing full motion of the jaw. Image from Dr. Maureen Stone, http://speech.umaryland.edu.

In ultrasound imaging, piezoelectric crystals in the transducer emit a beam of ultra high-frequency sound that is directed through the lingual soft tissue. A curvilinear array of 96 crystals in the transducer fire sequentially, and the sound waves travel until they reach the tongue-air boundary on the superior surface of the tongue. They reflect off the boundary, returning to the same transducer crystals, and are then processed by the computer, which reconstructs a 90° wedge-shaped image of the 2-mm thick mid-sagittal slice of the tongue. In the reconstructed image, the tongue slice appears as a bright white line on a gray background. This is shown in Figure 22. Flanking the image of the tongue slice on either side are two shadows; the left shadow is cast by the hyoid bone, and the right is cast by the jaw, since bone refracts the ultrasonic beam.

Figure 22. Mid-sagittal ultrasound image of the beginning of the sound /s/

The bright white curve is the surface of the tongue. The tongue tip is oriented to the right and the back of the tongue to the left, conforming to the image of the speaker in the photo inset. The inset on the right is the acoustic waveform.


4.3.3.2 Recording procedure

Participants were seated in the HATS system, which was adjusted to fit the speaker’s head comfortably. The transducer was coated with ultrasound gel and placed in the holder. The position of the transducer was adjusted until the crispest image of the speaker’s tongue was obtained.

The target stimuli and filler words were presented to the speaker using PsyScope 1.2.6 on a Macintosh G3 laptop that was placed on a table in front of the speaker. The speakers were first given instructions informing them that they would be seeing a series of both words and non-words on the screen that would also be presented aurally. The words appeared on the screen in English-like orthography while a sample of each word, as recorded by the bilingual English-Polish speaker, was simultaneously played on an external speaker. Participants were asked to repeat each word seven times, and then wait for the experimenter’s signal to move on. At the signal, the speaker pressed the space bar to move on to the next word. The whole recording procedure lasted between 15 and 20 minutes, depending on small variations in speech rate and in the experimenter’s instructions during the recording.

For each word, both the visual ultrasound image and the synchronized acoustic signal were captured. In addition, the speaker’s head was videotaped throughout the duration of the recording, and a video mixer (Panasonic WJ-MX30) was used to insert both the image of the head and an oscilloscopic image of the acoustic signal. A video timer (FOR-A VTG-33, Natick, MA) was used to superimpose a digital clock in hundredths of a second on each frame. This can be seen in Figure 22. The composite video output, which includes the ultrasound image, the videotaped image of the speaker’s head, the image of the oscilloscope, and the time, was recorded along with the audio simultaneously on a VCR and digitally on a computer using the VideoSavant A/V capture program (IO Industries, London, Ontario). Using VideoSavant, the audio and video signals can be synchronized within an accuracy of ±15 ms. Each frame is then exported from VideoSavant to JPEG format so that it can be analyzed.

4.3.4. Methods

4.3.4.1 Data processing

For each token, the ultrasound frames of interest were chosen by examining the acoustic record to determine the time and duration of each /səC/, /sC/, and /zC/ sequence produced by the speaker. The middle 5 of the 7 repetitions of each sequence produced by the speaker were measured. The starting and ending times and the durations of the sequences in the middle 5 repetitions were ascertained using Praat by marking from the start of frication for the /s/ or /z/ to the beginning of the burst for the following obstruent. To locate the frame at the onset of the target phoneme, the acoustic time values were divided by 0.033 (since each frame is 33 ms long). Since speakers do not always repeat at the same rate, the duration of repetitions may change slightly, but repetitions were generally within ±2 frames of one another. Since the software ultimately used to compare two sequences of tongue shapes requires that the sequences of frames being compared be equal in length, the median number of frames (6-7) over each sequence of the triad was determined for each subject. For longer repetitions, the last frame was cut. For shorter repetitions, one additional frame from the steady-state portion of the stop was added.
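The frame-selection arithmetic above can be sketched as follows. Only the 33 ms frame duration and the trim/pad rule come from the text; the function names and the example times are illustrative assumptions, not values from the study.

```python
# Sketch of the frame-selection procedure: acoustic times are mapped to
# ultrasound frame indices, and repetitions are trimmed or padded to a
# common length before sequences are compared.
FRAME_DUR = 0.033  # one ultrasound frame = 33 ms

def time_to_frame(t_seconds):
    """Map an acoustic time (in seconds) to an ultrasound frame index."""
    return int(t_seconds / FRAME_DUR)

def equalize(frames, target_len):
    """Cut the last frame(s) of longer repetitions; pad shorter ones by
    repeating the final (steady-state stop) frame."""
    if len(frames) >= target_len:
        return frames[:target_len]
    return frames + [frames[-1]] * (target_len - len(frames))

# A hypothetical repetition with frication onset at 1.20 s and stop
# burst at 1.42 s spans frames 36 through 43 (8 frames); equalizing to
# the per-subject median of 7 frames trims the last one.
start, end = time_to_frame(1.20), time_to_frame(1.42)
frames = equalize(list(range(start, end + 1)), 7)
```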


In order to measure the tongue shapes, Stone and her colleagues have developed EdgeTrak, an automatic system for the extraction and tracking of tongue contours (Akgul, Kambhamettu and Stone 1999, Li, Kambhamettu and Stone 2002). Using this system, a few points on the tongue image are chosen manually. EdgeTrak then uses an active contour model (or “snake”) to determine where the tongue edges in the image are. Once the edge of the first frame in a sequence is tracked and optimized, the algorithm can be applied to all of the tongue contours in the sequence. The tongue contours are defined by 100 interpolated (x,y) values that correspond to the distance from the left of the whole ultrasound image (x-axis) and the top of the image (y-axis). This is illustrated in Figure 23. One recent application of this system has been to visualize and compare three-dimensional tongue shapes for a variety of different sounds, including almost all of the vowels in English and the segments /T S s l n N/ (Stone and Lundberg 1996, Lundberg and Stone 1999).

Figure 23. Automatically tracked contour

The contour is superimposed on a mid-sagittal ultrasound image of the beginning of the sound /s/. The x and y values assigned to the contour are measured from the left and top of the entire ultrasound image, with the origin in the top left corner.

Once the tongue contours are tracked, they can be displayed as a series of x, y, t surfaces using the program CAVITE (Contour Analysis and VIsualization TEchnique: Vijay Parthasarathy, Maureen Stone, Jerry Prince, Min Li, Chandra Kambhamettu, 2002-03). CAVITE is implemented in MATLAB (Mathworks, Natick, MA). In order to be able to compare repetitions of the same utterances or examine tongue contours that are matched for experimental variables, it must be ensured that the data collection process does not introduce too much error. CAVITE is designed to minimize a number of shortcomings that may arise in extracting the tongue contours, including (1) small differences in speaking rate or mismatches in the first frame in a sequence across repetitions, (2) small spatial variations due to head motion, and (3) differences in tongue lengths over the course of the utterance and across speakers. In order to minimize the first of these effects, CAVITE implements a time-alignment algorithm which ensures that variations in individual repetitions due to speech rate do not affect the formation of averaged tongue contours. The time-alignment algorithm compares individual contours across repetitions to determine which frames in the sequences being averaged match up most closely, and then uses interpolation to create the best average surface. An Iterative Closest Point (ICP) algorithm slightly rotates and translates overlaid curves from different repetitions to optimize the overlay without modifying the contour shape. Differences in contour length are addressed with kriging, a statistical estimation technique that extrapolates and resamples the tongue surface contours so that they are all of equal length and can be directly compared (for a detailed explanation of kriging and its application to ultrasound, see Parthasarathy, Stone and Prince 2003).

Once the series of tongue contours for an utterance have been temporally and spatially aligned, CAVITE is used to average multiple repetitions of a single utterance. This is desirable because the tongue shapes for a particular sequence of sounds may vary from utterance to utterance, so averaging provides a more stable estimate of tongue shape. For current purposes, this is most useful as a tool for visualizing the data. In this experiment, each average is over five repetitions. In averaging, the kriging procedure is useful because the curves in all of the repetitions must be the same length in order for multiple repetitions to be averaged. Because kriging may lengthen the curve at both the back and the front of the tongue in order to produce curves of equal length, the data for these regions may have considerable variance. In addition to the variance contributed by kriging, the tongue contours produced by EdgeTrak are not always reliable at the back of the tongue, because accurate reflection of the sound waves and reconstruction of the tongue surface in this region is often hindered by the shadow cast by the hyoid bone (Li et al. 2002). The same is true for the tongue tip, which is affected by the shadow of the jaw.
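As a rough illustration of the resampling-and-averaging step, the sketch below uses plain linear interpolation as a stand-in for kriging (unlike kriging, linear interpolation cannot extrapolate beyond the measured contour); the function names and the evenly-spaced-in-arc-length convention are assumptions made for the example, not CAVITE's actual implementation:

```python
from math import hypot

def resample_contour(xs, ys, n_points=100):
    """Resample a tongue contour to n_points spaced evenly in arc length,
    so that contours of different lengths can be compared point by point."""
    # cumulative arc length at each original contour point
    s = [0.0]
    for i in range(1, len(xs)):
        s.append(s[-1] + hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]))
    total = s[-1]
    out_x, out_y = [], []
    j = 0
    for k in range(n_points):
        target = total * k / (n_points - 1)
        # advance to the segment containing the target arc length
        while j < len(s) - 2 and s[j + 1] < target:
            j += 1
        span = s[j + 1] - s[j]
        t = (target - s[j]) / span if span > 0 else 0.0
        out_x.append(xs[j] + t * (xs[j + 1] - xs[j]))
        out_y.append(ys[j] + t * (ys[j + 1] - ys[j]))
    return out_x, out_y

def average_contours(contours, n_points=100):
    """Average several repetitions after resampling to a common length."""
    res = [resample_contour(xs, ys, n_points) for xs, ys in contours]
    avg_x = [sum(r[0][k] for r in res) / len(res) for k in range(n_points)]
    avg_y = [sum(r[1][k] for r in res) / len(res) for k in range(n_points)]
    return avg_x, avg_y
```

Note that `average_contours` presupposes that the repetitions have already been time-aligned frame by frame, as CAVITE's alignment step ensures.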
Given these measurement concerns, and because previous research (Gick and Wilson to appear) and the visual imaging of the data suggest that the main manifestation of the presence of a schwa is in the positioning of the tongue blade and body, the main region of interest in this study is the tongue blade and body. In order to ensure that only the relevant region is examined and submitted to a statistical test, the extreme ends of the tongue tip and tongue root were excluded from measurement. Since there are no established divisions of the tongue into root, body, blade and tip in the ultrasound literature, an arbitrary cut of 12mm from both the back of the tongue and the tip of the tongue was imposed.

The sequence of steps for the ultrasound data collection and processing is summarized in Figure 24. Following the capture of the ultrasound data (Figure 24a) and extraction of tongue contours with EdgeTrak (Figure 24b), an averaged surface of each /s´C/, /sC/, and /zC/ sequence is calculated using CAVITE. The averaged surface can be displayed either as a waterfall (Figure 24c) or as a spatio-temporal XY-T surface (Figure 24d) that allows for better visualization of how the tongue changes shape over time. These steps are exemplified in Figure 24c-d for the first 8 frames of the word succumb as produced by speaker ELR.


(a) Image collection (b) Tongue tracking

(c) The tracks of consecutive frames are plotted together as a waterfall (d) XY-T spatiotemporal surface

Figure 24. The sequence of steps in ultrasound data collection
(a) Images at midline are collected. (b) Tongue shapes are tracked from the original ultrasound image using EdgeTrak. (c) Using CAVITE, the tracked curves for each repetition of the sequence are averaged. The averaged curves are plotted as a waterfall with respect to spatial x- and y-axes and a temporal axis. The x-axis is the tongue length and the y-axis is tongue height. The units of the x- and y-axes are in millimeters, where the origin is the top left corner of the ultrasound frame (see Figure 23). Based on the acoustic record, the frames most closely corresponding to the /s/, /´/ and /k/ can be determined. (d) Interpolation is used to connect the frames in order to get a better idea of how the tongue changes shape over time. Note that this figure does not represent one tongue surface, but rather 6 connected frames. The colors reflect the height of the tongue curve: dark blue is a high tongue position, and lighter blue and yellow are lower positions.

CAVITE can also display two types of more direct comparisons of XY-T spatiotemporal surfaces like the one shown in Figure 24d. One of the visualization techniques built into CAVITE is an overlay function, which superimposes one spatiotemporal surface over another. This is useful because it can be difficult to visually compare two surfaces displayed side by side. The overlay is illustrated in Figure 25 for the first 8 frames of participant ELR’s succumb and zgama. In these figures, the real English word is represented by the solid spatiotemporal surface, and the /zC/ token is illustrated by the white mesh figure. Again, the x-axis is tongue length, the y-axis is tongue height, and the time frames are plotted on the t-axis.


Figure 25. Overlay of spatiotemporal XY-T images

This is the first 8 frames of ELR’s succumb (solid) and zgama (white mesh).

The best method for directly comparing how two spatiotemporal surfaces differ is a difference graph, in which the numerical difference between productions of /s´C/-/zC/ or /sC/-/zC/ can be visualized by subtracting the averaged XY-T spatiotemporal surfaces and plotting the difference in a two-dimensional graph. The difference graph for ELR’s succumb and zgama is given in Figure 26. The length of each tongue curve is plotted on the x-axis, and time frames are plotted on the y-axis. Differences between the subtracted tongue height curves are represented by the different colors. In the graph in Figure 26, small or no differences between the heights of the two tongue curves being compared are signified by light blue. Red and orange positive differences indicate that the tongue shape for the illegal /zC/ token (the white mesh XY-T surface in Figure 25) is higher in the mouth, whereas darker blue negative differences indicate that the native word (/s´C/ or /sC/) is higher. In Figure 26, the first 4 frames of zgama have a higher tongue position, the two productions are very similar in the fifth frame, and in frames 6-8, the tongue blade region is higher for the curves of succumb.

Figure 26. Difference graph for the first 8 frames of ELR’s succumb and zgama

The x-axis represents the distance along the midsagittal line of the tongue from tongue body (left) to tongue blade (right). Time, or individual tongue contour frames, is on the y-axis.
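The subtraction underlying such difference graphs is straightforward; the sketch below (illustrative Python, not the CAVITE/MATLAB implementation) assumes two already-aligned surfaces stored as per-frame lists of tongue heights in millimeters, with larger values meaning a higher tongue position (raw ultrasound y-coordinates actually increase downward from the top of the image, so real data would need a sign flip):

```python
def difference_surface(native, nonnative):
    """Signed frame-by-point height differences between two aligned
    surfaces (lists of per-frame tongue-height lists, in mm).
    Positive cells: the non-native token's tongue is higher;
    negative cells: the native token's tongue is higher."""
    assert len(native) == len(nonnative), "surfaces must be time-aligned"
    return [[b - a for a, b in zip(f_nat, f_non)]
            for f_nat, f_non in zip(native, nonnative)]

def largest_difference(diff):
    """(frame, point, value) of the largest absolute difference."""
    f, p = max(((i, j) for i in range(len(diff))
                for j in range(len(diff[i]))),
               key=lambda ij: abs(diff[ij[0]][ij[1]]))
    return f, p, diff[f][p]
```

Plotting the returned grid with a diverging colormap reproduces the kind of display shown in Figure 26.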


In Section 4.4.1, a number of tongue shapes for different triads and speakers will be examined in order to determine whether speakers’ productions of /zC/ are more similar to their productions of [sC] or [s´C].

4.3.4.2 Statistical measures: L2 norms and the sign test

One metric useful for determining the differences between tongue shape change surfaces on a frame-by-frame basis is the L2 norm. The L2 norm is an error measure that quantifies the difference between the tongue shapes for each frame, based on the subtraction of the vector representing one tongue shape from the vector representing the other. Although each point in each vector is defined by x- and y-values (tongue length and height), only the y-values are necessary in the L2 norm calculation, since preprocessing with CAVITE ensures that the x-values are the same. The L2 norms of the difference vectors are calculated by the equation in (5):

(5) L2 norm = √( Σ (Y1 - Y2)² )

The L2 norm is a convenient error metric since it allows for a calculation of differences between tongue shape curves for each frame in the sequence. A more global measure that collapses across all of the frames in the tongue surface could be used, but this kind of metric would obscure the finer details of where the differences between the tongue shape changes over time occur. In contrast, the L2 norm allows for the localization of differences in the most appropriate regions. Specifically, it is hypothesized that the most informative differences will occur in the first 4-5 frames of the surfaces, since these are the ones that correspond to the /s/ or /z/ and the /´/ in the acoustic record.

Initially, the L2 norm was calculated for the comparison of the averaged tongue shapes (i.e. the L2 norm of the difference of the average of the 5 repetitions) for the first 5 frames of each word for all of the speakers. However, finding an appropriate statistical measure to determine whether the averaged L2 norms for the [sC]-/zC/ comparison are reliably different from the averaged L2 norms for the [s´C]-/zC/ comparison has so far proven intractable. Therefore, the sign test is used to determine whether speakers’ productions of /zC/ are statistically more similar to [s´C] or [sC].10 The sign test is a statistical measure that matches the L2 norms of individual repetitions of [s´C]-/zC/ comparisons to [sC]-/zC/ comparisons to determine whether the L2 norms for one of the comparisons are smaller significantly more often. The logic behind using this measure is that if speakers’ tongue shapes for [z´C] are reliably more like [sC], then the L2 norms for comparisons of the individual repetitions should be smaller for this pair, regardless of which repetitions are compared to one another. The sign test is a conservative test that does not rely on the assumption that the data are normally distributed.
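In code, the per-frame computation in (5) amounts to the following (a Python sketch for illustration; the dissertation’s own computations were carried out in MATLAB):

```python
from math import sqrt

def l2_norm(y1, y2):
    """L2 norm of the difference between two tongue-height vectors for a
    single frame. Only the y-values (tongue height) enter the computation,
    since CAVITE preprocessing equates the x-values across curves."""
    assert len(y1) == len(y2), "curves must be resampled to the same length"
    return sqrt(sum((a - b) ** 2 for a, b in zip(y1, y2)))

# identical curves differ by 0; a 3-4-5 example gives 5.0
assert l2_norm([3.0, 0.0], [0.0, 4.0]) == 5.0
```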
To create the input for the sign test, the L2 norm between every possible combination of repetitions from 1-5 for the [s´C]-/zC/ comparison is determined for each frame. The same is done for every possible combination of repetitions for the [sC]-/zC/ comparison. This generates 25 L2 norms per frame for both the [s´C]-/zC/ comparison and the [sC]-/zC/ comparison, which is the input to the matched-pairs sign test. This set-up is shown schematically in Figure 27.

10 Thanks to Mary Beckman for suggesting this statistical test.

Frame i [s´C]: rep1 rep2 rep3 rep4 rep5

Frame i /zC/: rep1 rep2 rep3 rep4 rep5

Figure 27. Schematic of L2 norms

25 L2 norms (black lines) compare all combinations of the 5 repetitions of Frame i for [s´C] (boxes) and /zC/ (circles). L2 norms are calculated for each frame i (i=1-5) and matched to the corresponding L2 norms for the [sC]-/zC/ comparison in the sign test calculations.

Sign test comparisons were conducted with MATLAB. For 25 comparisons, the sign test criterion values for significance are ≥18 and ≤7. That is, if the L2 norms for individual repetitions are smaller for [z´C]-[sC] for at least 18 of the 25 comparisons, then it can be said that a speaker’s tongue shapes for [z´C] are reliably more similar to [sC]. On the other hand, if at most 7 of the 25 L2 norms are smaller for the [z´C]-[sC] comparison, then a speaker’s tongue shapes for [z´C] are significantly more similar to [s´C]. Any number between 8 and 17 (inclusive) indicates that the production of [z´C] is not reliably more similar to either [sC] or [s´C]. This assumes a two-tailed test with an alpha value of .021 for either tail.
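The matched-pairs counting and the criterion values just described can be sketched as follows (illustrative Python; the actual sign tests were run in MATLAB, and the function names here are invented). The sketch also shows where the criterion values come from: under the null hypothesis each of the 25 matched pairs is a fair coin flip, and P(X ≥ 18) for a Binomial(25, 1/2) variable is roughly .02, in line with the stated per-tail alpha of .021.

```python
from math import comb, sqrt

def binom_tail(n, k):
    """P(X >= k) for X ~ Binomial(n, 1/2): the chance of k or more 'wins'
    out of n matched pairs when neither comparison is systematically
    smaller (the null hypothesis of the sign test)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def count_cluster_wins(schwa_reps, cluster_reps, z_reps):
    """For one frame, count over all 5 x 5 repetition pairings how often
    the [sC]-/zC/ L2 norm is smaller than the matched [s-schwa-C]-/zC/
    L2 norm. Each argument is a list of tongue-height vectors, one per
    repetition. With 25 pairings, >= 18 wins means /zC/ is reliably more
    similar to [sC]; <= 7 wins means it is reliably more similar to the
    [s-schwa-C] sequence."""
    def l2(y1, y2):
        return sqrt(sum((a - b) ** 2 for a, b in zip(y1, y2)))
    wins = 0
    for z in z_reps:
        for schwa, cluster in zip(schwa_reps, cluster_reps):
            if l2(cluster, z) < l2(schwa, z):
                wins += 1
    return wins
```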

4.4. Results

4.4.1. Visual imaging

One simple method for appraising whether speakers’ productions of /zC/ sequences are more similar to /s´C/ or /sC/ is to examine the XY-T spatiotemporal surfaces for the members of a triad. There are two types of patterns evident in the cases when the /zC/ sequence is produced as [z´C]: (1) the sequence of tongue shape changes over time is more similar to the speaker’s production of [sC], and (2) the sequence of tongue shape changes is more like the speaker’s production of [s´C]. The first pattern is illustrated with JED’s velar triad. In Figure 28, the first 6 frames of the spatiotemporal surfaces for JED’s productions of succumb, scum, and zgomu are plotted. In Figure 28(d), the spectrogram for JED’s utterance of zgomu illustrates that it is produced as [z´gomu].


(a) [s´k] of succumb (b) [sk] of scum

(c) [z´g] of zgomu (d) z | ´ | g | o | m | u

Figure 28. (a)-(c) XY-T displays of each surface in the /s´k/, /sk/, /zg/ triad for speaker JED (d) Spectrogram of one repetition of JED’s production of zgomu. Note the presence of a robust schwa between [z] and [g].

Impressionistically, a number of differences can be seen among the images in Figure 28(a)-(c). First, the [s] of [s´k] is characterized by a tongue body position that is lower in the mouth than that of either of the other two tokens. This is indicated in the images by a lighter blue, which corresponds to a lower position in [s´k], and a darker blue, which indicates a higher position for [sk] and [z´g]. More generally, the first two frames are flatter and less vaulted than the first two frames for [sk] or [z´g]. Note that if the schwa target in the word succumb corresponds to its own gesture, then changes in tongue shape might be expected to have three stages: the tongue blade and body would (1) be slightly higher for the production of the [s] (first 2 frames), (2) lower and perhaps retract for the production of the [´] (third frame; Gick 2002, Gick and Wilson to appear), and (3) rise considerably in the dorsal region for the production of the [k] (fourth and fifth frames). However, this does not appear to be the case for succumb. In the production of this word, evidence for a schwa target comes from the starting position of the [s], which coarticulates with the immediately following gesture. In scum, the [s] coarticulates with the [k], which has the very high tongue body position necessary for creating a velar closure. This causes the [s] to start in a relatively high position also. The production of scum can be contrasted with succumb, in which the [s] has a considerably lower starting position because it is coarticulating with a [´], which has a tongue body target position that is lower in the mouth. Despite the fact that the acoustic record for JED’s production of /zg/ contains a schwa, the tongue shape changes for [z´g] appear more similar to those for [sk] than [s´k]. Like the [s] in scum, the [z] appears to be coarticulating with the [g], which has a high tongue body target, rather than with a schwa gesture, which would presumably force the [z] to have a lower starting position for the tongue body. It is worth noting that /z/ does not have an intrinsically high starting position; as shown in Figure 29, when coarticulating with the /E/ of zealot, the [z] has a lower starting position.

Figure 29. JED’s production of [zEl] in zealot
The [z] in this word has a lower starting position than JED’s production of [z] in zgomu.

The differences between JED’s [s´k]-[z´g] and [sk]-[z´g] can be further visualized by overlaying the two surfaces, as in Figure 30. These figures indicate that while the tongue for [z´g] is higher in the mouth for most of both tokens, the starting tongue position for [z´g] is much closer to [sk] than [s´k], as indicated by the arrows in Figure 30.


Figure 30. Examples from speaker JED
Left: Overlay of XY-T surfaces for the [s´k] of succumb (solid) and [z´g] of zgomu (white mesh). Right: Overlay for the [sk] of scum (solid) and [z´g] of zgomu (white mesh). The arrow points to the starting position of the tongue for each word. Note the considerably smaller difference between [z´g] and [sk].

Because it is difficult to see the differences between the two surfaces beyond the first frame and the edges of the tongue curve, differences between the productions of [s´k]-[z´g] and [sk]-[z´g] can also be visualized by subtracting the two surfaces of tongue curves and examining the difference graph. In these graphs, small or no differences between the two surfaces are signified by the areas in yellow or light green. Greater positive differences are indicated by orange and red (tongue positions in the non-native word are higher in the mouth) and negative differences appear as shades of blue (tongue positions in the native word are higher in the mouth).

Figure 31. Examples from speaker JED
Left: Difference between surfaces for the [s´k] of succumb and [z´g] of zgomu. Right: Difference between surfaces for the [sk] of scum and [z´g] of zgomu. The difference in millimeters between the two surfaces is encoded by color: yellow to light green represents the smallest differences between the two curves. There are two areas of greater difference for the [s´k]-[z´g] comparison than for the [sk]-[z´g] comparison, as indicated by the circles.


In the case of speaker JED’s productions of succumb, scum, and zgomu, as shown in Figure 31, it can be seen that there are larger differences between [s´k]-[z´g] than [sk]-[z´g]. For example, in frames 1-3 of the graph on the left in Figure 31, the height of the tongue differs by 5-8mm in the region of the tongue body, as indicated by the circled orange and dark red shading. Another area in the same figure that shows a substantial difference is in frames 4-5, where the curves differ by as much as 8mm in the tongue blade, as indicated by the circled dark blue shading. In the right-side graph in Figure 31, on the other hand, the differences between the [sk]-[z´g] surfaces are much smaller, never reaching greater than 5mm in any location. Thus, for JED’s production of the velar triad, visual inspection indicates that the non-native target /zg/ is more similar to /sk/ than /s´k/.

The second pattern, in which the tongue shape changes for the production of /zC/ as [z´C] are more similar to [s´C], is exemplified by speaker HJC’s production of the labial triad. The acoustic record shows that HJC produced the word zbertu with a schwa between the /z/ and /b/ of the initial cluster, and her production of /zb/ appears more like [s´p] than [sp]. This is demonstrated by the individual tongue surface images, the overlay images, and the difference images shown in Figure 32.

(a) [s´p] of superfluous (b) [sp] of spurt

(c) [z´b] of zbertu (d) z | ´ | b | e | r | t | u



(e) surfaces overlay: [s´p] (solid) & [z´b] (mesh) (f) surfaces overlay: [sp] (solid) & [z´b] (mesh)

(g) Difference between [s´p] and [z´b] (h) Difference between [sp] and [z´b]

Figure 32. Examples from speaker HJC for the labial triad

Unlike JED, HJC’s labial triad suggests that her articulation for [z´b] is more similar to her production of the sequence [s´p] than to her production of the cluster [sp]. The XY-T spatiotemporal surfaces in Figure 32(a)-(c) illustrate that [s´p] and [z´b] both have a higher starting tongue body position than [sp], as indicated by the darker blue color. The overlay images in Figure 32(e)-(f) additionally show that the initial tongue body position for [s´p] and [z´b] is nearly identical, whereas the tongue body starting position for [sp] is nearly 4mm lower than that for [z´b]. These findings are consistent with the hypothesis that the [s] or [z] coarticulates with the following gesture, which determines what the starting tongue height will be. Because [p] is not produced with any contact between the tongue and the palate, it can be assumed that the tongue position for [p] is similar to rest; that is, it is assumed to be low in the mouth. When [s] coarticulates with a following [p], as in the case of spurt, it has a lower starting position than when followed by a vowel gesture, even schwa. In superfluous, [s] coarticulates with a schwa gesture that apparently has a higher tongue body position than [p]. Thus, for HJC, the tongue remains higher throughout the whole [s´] sequence than it does in an [sp] sequence. Likewise, in [z´b], the tongue has a higher starting position than it does for [sp]. Differences between the shape of the [p] and [b] for both superfluous and spurt may result from coarticulation with a vowel that differs in quality. Whereas the second vowel in both superfluous and spurt is [‘], the second (rhotacized) vowel in zbertu is produced as [er].

The difference images reinforce the greater similarity of [s´p]-[z´b]. For the first five frames of the [s´p]-[z´b] difference, which is the main time sequence of interest, there is at most a 3mm difference between the tongue shapes (Figure 32(g)). The differences between [sp]-[z´b] are closer to 4-5mm in the tongue blade in multiple regions, as indicated by the two black circles (Figure 32(h)). The white dashed circles indicate that by frames 6-7, the tongue body for [z´b] is beginning to differ considerably from both [s´p] and [sp]. Again, this may be attributable to the differences in vowel quality between the English words and the non-native word.

Visual imaging is an important step in understanding how articulation relates to the events in the acoustic signal. In the case of schwa production, the XY-T spatiotemporal surfaces created from the tongue contours in the ultrasound images suggest that an acoustic schwa may correspond either to a real schwa gesture (as evidenced by cases in which the production of [z´C] is more similar to [s´C]) or to gestural mistiming (when [z´C] is more similar to [sC]). That is, either of these repairs appears to be available to and used by English speakers. However, visual inspection is only impressionistic and requires confirmation by statistical measures. As described in Section 4.3.4.2, the L2 norm and sign test were used for statistical validation.

4.4.2. Statistical results

Recall that each of the five speakers was presented with the coronal, labial, and velar triads. The final data set contained 15 non-native words with a /zC/-initial sequence (5 speakers x 3 words, with 5 repetitions of each word). Of these 15 words, 11 contained a schwa in the acoustic record.11 In the other cases, the non-native targets were either devoiced or produced correctly. The results for /zC/-initial clusters are summarized by speaker in Table 8.12 When a speaker repaired the non-native sequence with a schwa, the duration of the schwa, averaged over the 5 repetitions, is given in parentheses.

11 Whenever a speaker produced a /zC/ target with a schwa, he or she was consistent for all repetitions.
12 Note that although each speaker produced 2 words for each of the /zC/ targets (see Table 7), only the stimulus with the best image for each speaker was measured. This is why different target words for the /zC/ stimuli are given in Table 8.


Non-native stimulus   JED              HJC               PDD              KAH               ELR
Coronal (/zd/)        [zderu]          [z´diri] (71ms)   [zderu]          [z´diri] (62ms)   [steru]
Labial (/zb/)         [z´bura] (24ms)  [z´bertu] (34ms)  [z´bura] (37ms)  [z´bertu] (31ms)  [z´bura] (54ms)
Velar (/zg/)          [z´gomu] (35ms)  [z´gomu] (47ms)   [z´gomu] (34ms)  [z´gomu] (41ms)   [skama]

Table 8. Speakers’ productions of non-native stimuli
11 of 15 /zC/ clusters were repaired with a schwa in the acoustic record. Of the remaining 4 tokens, speaker ELR repaired 2 tokens by devoicing the consonants to make legal /sC/ clusters, and JED and PDD produced the /zd/ token correctly. The number in parentheses is the average duration of the schwa in the [z´C] sequence as measured from the 5 repetitions.

L2 norm values for the /s´C/-/zC/ and /sC/-/zC/ comparisons can be evaluated to determine which native sequence /zC/ is more similar to. The smaller the L2 norm value for a particular comparison, the more similar /zC/ is to that native sequence. In order to determine whether the L2 norms are significantly smaller for one comparison or the other, the sign test is used. For each frame, an L2 norm is determined for every combination of the 5 repetitions of /zC/ with the 5 repetitions of /s´C/ and /sC/. This provides 25 L2 norms per frame that are submitted to the sign test. The mean L2 norm of the 25 values for each of the comparisons is shown in Table 9 for illustrative purposes. Differences between the comparisons of less than 1 were considered within measurement error. Significance is reached when the sign test value is ≤7 (the /s´C/-/zC/ comparison is significantly smaller) or ≥18 (the /sC/-/zC/ comparison is significantly smaller).13

13 In ELR’s production of zderu, PDD’s production of zbura, and JED’s production of zgomu, only 4 repetitions of each word were included due to measurement errors. Since this gives 20 comparisons to be submitted to the sign test, the criterion values were 5 and 15 just for these three triads.


                       Cor: /s´t/-/st/-/zd/        Lab: /s´p/-/sp/-/zb/        Vel: /s´k/-/sk/-/zg/
Frame                  /s´t/-/zd/   /st/-/zd/      /s´p/-/zb/   /sp/-/zb/      /s´k/-/zg/   /sk/-/zg/

ELR ([st], [z´b], [sk])
1                      10.4         12.0           28.8         14.5           30.7         14.4
2                      10.8         15.4           35.0         16.4           41.4         24.3
3                      11.2         19.9           49.8         17.7           37.8         26.7
4                      13.2         25.6           59.1         20.8           27.3         24.0
5                      17.9         31.3           59.3         25.0           20.8         21.7

JED ([zd], [z´b], [z´g])
1                      15.1         11.2           15.9         11.3           28.0         15.8
2                      11.7         13.3           17.8         12.6           35.3         17.6
3                       9.4         20.1           25.8         17.5           36.1         18.7
4                      11.8         28.6           34.7         24.7           17.4         22.9
5                      18.9         32.3           42.8         36.5           20.4         32.7

PDD ([zd], [z´b], [z´g])
1                      18.8         14.6           17.9         12.8           13.5         10.4
2                      18.2         10.4           22.8         13.2           22.8         15.4
3                      11.6         11.5           26.3         16.3           28.2         16.1
4                      15.1         15.9           31.5         22.0           23.2         16.6
5                      22.5         18.5           33.6         20.6           15.9         16.1

KAH ([z´d], [z´b], [z´g])
1                      13.2         13.8           13.8         17.6           16.6         20.1
2                      12.7         14.7           14.6         16.7           23.0         25.0
3                      11.7         14.3           15.3         14.0           32.1         29.0
4                      10.5         13.4           13.8         12.3           26.3         22.7
5                      11.2         11.8           15.9         11.8           18.6         20.3

HJC ([z´d], [z´b], [z´g])
1                      13.9         14.2            9.2         15.4           17.5         18.1
2                      13.3         15.1            9.6         14.9           18.8         18.0
3                      12.2         14.4           10.9         15.6           22.1         23.9
4                      11.7         13.3           11.9         17.0           26.1         26.5
5                      13.5         16.1           17.3         19.2           26.1         23.1

Total # of
significant frames      8            3              5           16              2           13

Table 9. Frame-by-frame average L2 norms for each speaker
The sequences in parentheses after the speakers’ initials indicate the speaker’s production for each /zC/ target. Columns contain mean L2 norm results for /zC/ comparisons to [s´C] (odd columns) and [sC] (even columns) for each of the 5 measured frames in the sequence. The final row tallies, for each comparison, the number of frames in which that comparison’s L2 norms were smaller significantly more often according to the sign test.

The results of the sign test lead to two main observations. First, the tally of significant frames in the last row of Table 9 shows that the coronal sequences pattern differently from the velars and labials. In the velar and labial data, /zC/ tongue shapes were generally more similar to the /sC/ sequences than to the /s´C/ sequences, despite the fact that in all cases (except ELR’s velar triad) there was an acoustic schwa. For the labial triads, the number of smaller L2 norms was significant for 16 frames of the /sp/-/zb/ comparison, versus 5 significant frames for /s´p/-/zb/ (4 of which are due to one speaker). For the velar triad, L2 norms were significantly smaller for /sk/-/zg/ for 13 frames, as compared to 2 significant frames for /s´k/-/zg/. For the coronals, on the other hand, there were fewer significant similarities, and they mostly indicated that the tongue shapes were similar to the /s´C/ shapes, even though 3 subjects did not have an acoustic schwa. This finding is discussed further in the General Discussion (Section 4.5).

Second, there was also a subject effect. JED, ELR and PDD strongly reflected the majority pattern and accounted for much of the significant data. KAH reflected this pattern more weakly, with significant comparisons mostly in the final frames. HJC’s acoustic record contained a schwa in all contexts; her tongue shapes were more similar to /s´C/ for the labial case and did not show statistically greater similarity to either pattern for the velar and coronal data. In sum, 3 of the 5 participants have tongue shapes consistent with a pattern in which coronals differ from velars and labials, and in which, for non-coronals, tongue shapes do not reflect the pattern expected of an underlying schwa. Reasons for this unexpected distinction between the different types of triads are discussed in the next section.

4.5. General discussion

The results of the ultrasound imaging demonstrate that for three speakers, productions of /zC/ as [z´C] are more similar to their productions of /sC/ than to their productions of /s´C/. It was hypothesized in Section 4.3 that the greater similarity of [z´C] to /sC/ would arise if the output of the phonology for /zC/ word-initial clusters does not actually include a schwa gesture with its own target. Instead, the schwa present in the acoustic record follows from the hypothesis illustrated in (4)a: speakers are pulling apart the /z/ and the subsequent consonant in order to prevent their overlap. If the vocal tract between the constrictions of the two consonants is sufficiently open, then a vowel will be perceived. If a schwa target were actually present in the production of [z´C], it would have direct consequences for the production of the preceding consonant. First, the /z/ would then be part of a syllable with the /´/ rather than forming a cluster with the following /b/, /d/ or /g/, which would in turn affect the timing relationships that /z/ has with the following gestures (Browman and Goldstein 1995, Byrd 1996a, b). Second, as discussed with respect to JED’s velar triad and HJC’s labial triad, the tongue shape and position of /s/ and /z/ are determined by the immediately following gesture. Thus, the starting tongue shape depends on whether or not a schwa gesture is present in the score.

For JED, PDD, and ELR’s labial and velar triads, the direction of the L2 norms for the 5 frames of interest is almost always consistent. While more data might be necessary to obtain more robust results in the cases that are non-significant but trending in one direction or the other, the uniformity of the findings suggests that the ultrasound methodology does not introduce excessive noise and error when a speaker’s strategy is internally consistent. Speakers HJC and KAH, on the other hand, may be vacillating between different strategies.
At least for her labial triad, HJC appears to be using an epenthesis strategy rather than a mistiming one. Her velar triad, on the other hand, seems to be a compromise between a mistiming approach and epenthesis, since none of the L2 norms for the frames in that sequence are significantly smaller for either the /s´k/-[z´g] or /sk/-[z´g] comparisons. It is possible that HJC is experimenting with multiple strategies. It is somewhat harder to interpret KAH’s results, since they start off

in the early frames with her [z´C] production being more similar to /s´C/ and then become more similar to /sC/. More data would need to be collected from KAH in order to better understand her results. At this point, speakers’ behavior on the coronal sequences must also be addressed. A careful look at all of the productions of this triad demonstrates that the results for most speakers are somewhat unexpected. For JED, who produced the /zd/ cluster without an acoustic vowel, the L2 norms are smaller for the /s´t/-[zd] comparison despite the fact that there was no schwa. ELR devoiced the /zd/ cluster, but her L2 norms for that production were also smaller for the comparison with /s´t/. KAH’s [z´d] production is more similar to her production of /s´t/. PDD produced [zd] without a schwa, and the significant frames in this comparison are more similar to /st/. Because /s/ and /t/ or /z/ and /d/ are homorganic consonants, the production of /st/ and /zd/ clusters may create a situation that does not arise with the other two triads. If the articulation of /s/ and /t/ is very similar, with the essential difference between them being a tightening of the constriction in the alveolar region from /s/ to /t/ (or /z/ to /d/) (Catford 1988, Stone and Lundberg 1996), then there is no chance for the vocal tract to be open between the two consonants unless the speaker moves the tongue away from the palate. Even if the consonants in the cluster are coordinated such that they do not overlap, the motion of the tongue from /s/ to /t/ will not result in a transitional vowel unless it is pulled away from the alveolar ridge (a similar discussion regarding homorganic consonants can be found in Gafos 2002). It may be that in producing /z/ and /d/ in a non-overlapping configuration, the surface production will sound indistinguishable from a cluster unless the tongue also moves away from the palate.
For example, although the acoustic output for JED appears to be [zd], it could be that this results from a configuration that is non-overlapping but in which the tongue is not pulled away from the palate. However, if he produced [zd] with the gestures pulled apart and his tongue moved even slightly away from the palate (though not as much as if he were producing a schwa gesture), the coordination of the sequence may appear articulatorily more similar to the coordination for /s´t/ even though it does not result in an acoustic schwa. For both HJC and KAH, who produced [z´d], the L2 norms for the comparison with /s´t/ are smaller, although not quite significant. In keeping with similar behavior on their other triads (especially HJC), it is possible that these speakers are actually epenthesizing a schwa gesture in their production of [z´d]. The issues unique to the production of the coronal triad may shed light on why there are differences in behavior on these sequences versus the labial and velar sequences. The fact that the L2 norms are smaller for the [z´C]-/sC/ comparison for three speakers has been interpreted as demonstrating that speakers at least have available to them the ability to repair phonotactically illegal tautosyllabic consonant sequences with gestural mistiming, that is, the pulling apart of the consonantal gestures of a target cluster. This finding does not rule out the possibility that speakers also have the phonological epenthesis repair available to them, and this may be reflected in some of the cases in which the speakers’ productions of [z´C] are more similar to /s´C/. Given the nature of the task, it is likely that some speakers are attempting to produce the non-native clusters as close to the Polish target as possible rather than trying to nativize the word to something English-like. Were these words to be borrowed into English, for example, it might be

expected that the speech community would in fact treat them as /z´C/, since this would be a legal English sequence. However, the task in which these words are being uttered is more like a second language production situation, so the speaker is likely intending to be as faithful as possible to the target. This could explain why many speakers would not epenthesize a schwa even though this is the expected repair for English. More important, however, is that even though speakers may successfully avoid epenthesizing a schwa in this case, they nevertheless cannot produce these sequences completely faithfully, because they cannot adequately coordinate consonants that do not form legal word-initial clusters in English. The locus of coordination effects and the implementation of this repair are discussed further in the next chapter. It should also be noted that this type of repair may only be available to speakers of a language that has schwa as the epenthetic segment. For a language like Japanese, for example, studies of both perception and production (especially of loanwords) of phonotactically illegal sequences have demonstrated that the epenthetic vowels of Japanese are typically high vowels such as /i/ or /u/ (Dupoux et al. 1999, Itô and Mester 1999b, Dupoux et al. 2001). Presumably, in order to produce this vowel, the tongue must have a particular position that is not achievable simply by pulling the flanking consonantal gestures far enough apart. If such a configuration is ruled out on the basis that it gives rise to a percept that is not part of the inventory of a language, then it may not be a possible repair of an illegal phonotactic sequence. This is an interesting speculation: non-overlap of consonantal gestures is not a productive process in English, but it may be permitted because the acoustic signal that results from it is well-formed for English.
If it is the case that in Japanese the repair is always the phonological epenthesis of a vowel gesture, then this would be consistent with hypotheses that emphasize the role of perception in phonological theory (notably, Boersma 1998). A more articulated theory of how such a system would work is a topic for future research.

4.6. Summary

The results of the ultrasound experiment reported in this chapter are consistent with the hypothesis that the schwa produced between the consonants of a phonotactically illegal cluster can be the result of gestural mistiming rather than phonological epenthesis. In order to further investigate claims in the Articulatory Phonology literature that schwa may result from a period of open vocal tract between two non-overlapping consonants (Browman and Goldstein 1990a, 1992b, Jannedy 1994, Gafos 2002, Smorodinsky 2002), five English speakers were asked to produce words starting with /s´Ci/-/sCi/-/zCi/ sequences to determine whether tongue motion during their production of [z´C] was more similar to [s´C] or to [sC]. The findings demonstrate that for three speakers who produced /zC/ as [z´C], the tongue shapes were more similar to /sC/, suggesting that they are not inserting a schwa with its own gestural target. The other two speakers may be using epenthesis, although their results are not conclusive. The phonotactically illegal sequences used in this experiment were specifically chosen to be /z/-initial, since these can be directly compared to legal English /s´C/ and /sC/ sequences. However, it is hypothesized that if speakers use this repair for /zC/ clusters, then they likely also use it for other types of non-native sequences, such as the

/fC/ or /vC/ clusters in the experiment in Chapter 3. With this assumption in mind, an analysis of non-native cluster production based on the results from Chapter 3 is presented in Chapter 5. It is shown that the cluster coordination that gives rise to the transitional schwa as indicated by the ultrasound findings interacts with phonological markedness constraints prohibiting the fricative-initial clusters investigated in the previous chapter.

CHAPTER 5. Cluster Production in a Constraint-Based Gestural Theory

In this chapter, an Optimality Theoretic account of the experimental data presented in Chapters 3 and 4 is developed. An analysis that can account for the differences in accuracy in cluster production reported in Chapter 3 must incorporate both perceptual and articulatory considerations. The analysis builds on the Licensing-by-Cue framework (Jun 1995, Silverman 1997, Steriade 1997, Kirchner 1998/2001, Wilson 2000), with the additional inclusion of articulatory factors in the evaluation of the phonotactic environments that make for preferred consonant clusters. It is argued that the significant performance differences between combinations of fricatives with stops, fricatives or nasals arise because the final state of the English grammar distinguishes them, despite the fact that none of the /f/-, /z/-, or /v/-initial clusters are found in English. Under certain circumstances, such as loan phonology, second language acquisition or the experimental conditions explored in this study, speakers are able to manipulate these grammatical distinctions. Further theoretical and empirical evidence is presented to rule out the possibility that phonetic factors alone are sufficient to account for the results. Another finding of the experiment in Chapter 3 is that speakers are able to correctly produce each type of illegal cluster some proportion of the time, although the level of accuracy is dependent on the particular constituents of the cluster. It is argued that such variability is best captured by a grammatical account, and that it is constrained in a way that is consistent with being phonologically represented. The constraint governing accurate coordination does not have a fixed ranking with respect to the markedness constraints pertaining to the clusters.
More specifically, a floating constraint analysis is proposed to demonstrate how speakers are able to produce illegal clusters correctly a certain proportion of the time. The results from Chapter 4 suggest that the relevant constraint floating over the cluster markedness constraints is not a typical faithfulness constraint (like DEP), but rather a coordination constraint. The ultrasound findings indicate that speakers’ productions of initial /zC/ clusters are more similar to their production of /sC/ clusters than /s´C/. This pattern suggests that when faced with non-native clusters in an experimental task in which speakers are intending to correctly utter the target structure, they do not epenthesize a schwa gesture, but they still fail to produce the cluster accurately. The locus of the errors is in the coordination of the consonant clusters; because /f/, /z/, or /v/-initial clusters are prohibited by markedness constraints in English, English speakers cannot apply the appropriate cluster coordination to these sequences. In order for these clusters to be added to the allowable inventory for English speakers, the coordination constraint CC-COORD must be ranked above the markedness constraints prohibiting the clusters. An analysis demonstrating this is developed in the next section. Finally, a discussion of consonant cluster coordination crucially depends on the ability to define gestures as having a coordination relationship between them. Just like rules or constraints determining what sequences are phonotactically legal in a language, gesturally-based theories must also have a mechanism for ascertaining which gestures can be associated in a relationship that is governed by a coordination constraint. In Section 5.3.2, Gestural Association Theory (GAT) is introduced in order to determine how gestures are related to one another.
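The intuition behind a floating constraint can be illustrated with a toy simulation. This is my own simplification for exposition, not the formal analysis developed below: CC-COORD is assumed to land uniformly at one of several ranking positions on each evaluation, and the illegal cluster surfaces faithfully only when it lands above the markedness constraint prohibiting that cluster (the function name and uniform-landing assumption are mine):

```python
import random

def floating_cc_coord(landing_sites, markedness_rank, n_trials=10000, seed=1):
    """Toy simulation of a floating constraint: on each evaluation,
    CC-COORD lands at a random position drawn from `landing_sites`
    (lower number = higher rank).  The cluster is produced faithfully
    only when CC-COORD outranks the markedness constraint at
    `markedness_rank`.  Returns the proportion of faithful outputs."""
    rng = random.Random(seed)
    hits = sum(rng.choice(landing_sites) < markedness_rank
               for _ in range(n_trials))
    return hits / n_trials
```

On this toy view, a cluster whose markedness constraint sits low in the floating range is produced correctly more often than one whose markedness constraint sits high, which is the qualitative pattern the floating-constraint analysis is meant to capture.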

5.1. Loanwords: the interaction between phonology and non-native sequences

As reviewed in Chapter 3, the role of the phonology in experimental production and perception of non-native sequences has recently received considerable attention (e.g. Broselow and Finer 1991, Hallé et al. 1998, Pitt 1998, Haunz 2002, Moreton 2002, Davidson et al. 2003). However, this work has been largely exploratory, and only limited attempts have been made to formalize the interaction of the native language grammar and non-native inputs. One relevant area in which just this type of formal analysis has been developed more fully is loan phonology. The phonological changes that occur when foreign words are borrowed into a language are similar to those found in the production of phonotactically illegal words in second language acquisition or experimental tasks. Loan phonology may also be useful for shedding light on aspects of the native language grammar that are not reflected in the native language lexicon; in the context of the experiment in Chapter 3, for example, English speakers exhibited different degrees of accuracy when producing fricative-initial consonant clusters even though none of them are allowed natively in the English phonology. Presumably, the distinctions that English speakers make in production are influenced by particular characteristics of their native phonology; that is, it could be the case that the same clusters would be treated differently by speakers of other languages. It has been claimed that similar effects are evident in loan phonology. The role of the grammar in its interaction with non-native sequences through loan phonology is summed up by Zuraw (2002):

Because of the diversity of the world’s phonologies, foreign words typically contain sounds, sound sequences, and prosody that must undergo ‘repair’ if they are to be usable in the receiving language. Because these foreign sound structures were likely absent from the data speakers encountered during acquisition of their native language, adapting loanwords requires speakers to apply their implicit phonological knowledge to novel situations, revealing aspects of the grammar that are underdetermined by the native-language learning data and are presumably the residue of the grammar’s initial (pre-learning) state and of learning strategies. The treatment of loanwords should therefore tell us something about universal grammar that is not evident from looking at the native phonology alone.

Evidence from loan phonology has been used throughout the development of generative phonology to uncover the details of many different types of phenomena, including phonotactic patterns, epenthesis and deletion, processes at morphological boundaries, reduplication, and stress (e.g. Saciuk 1969, Hyman 1970, Lovins 1974, Byarushengo 1976, Holden 1976, Adams 1977, Kaye 1981, Mehmet 1982, Poplack and Sankoff 1984, Silverman 1992, Paradis and LaCharité 1997). More relevant to the current study is the treatment of loan words in Optimality Theory. In an Optimality Theoretic grammar, it is claimed that the final state consists of a total ranking of constraints, whether they are crucial in determining the correct candidate during an optimization or not (Prince and Smolensky 1993, Tesar and Smolensky 2000). It is very difficult to determine the total ranking of a language given the type of data usually used to construct Optimality Theoretic grammars (Albro 2003), but data from loan phonology can be very informative regarding the ranking of what Zuraw (1996) calls invisible constraints. Invisible constraints are those constraints which are not in conflict with any other markedness or faithfulness constraint, and which do not seem to be violated by any of the

presumed inputs for the lexical items of that language. However, evidence from loan phonology, second language acquisition, and experimental production has been used to support the claim that there are hidden rankings of invisible constraints, and that these can be uncovered by testing speakers with non-native structures (Zuraw 1996, Jacobs and Gussenhoven 2000, O'Connor 2002, Davidson et al. 2003). One phenomenon in loan phonology similar to hidden rankings is the core-periphery structure that has been reported for a number of languages with large numbers of loans in their lexicons (Itô and Mester 1995, Davidson and Noyer 1997, Fukazawa, Kitahara and Ota 1998, Itô and Mester 1999b, a). According to this view of phonological organization, the lexicon of a language is divided into different strata that are subject to particular rankings of constraints pertaining specifically to each stratum. Typically, strata stand in a core-periphery relationship such that each stratum is a proper subset of the ones outside of it. In Japanese, for example, it has been claimed that there are four identifiable lexical strata that correspond to the native lexicon and different levels of nativization of loanwords: native (Yamato), established loans (Sino-Japanese), assimilated foreign, and unassimilated foreign (Itô and Mester 1995, Fukazawa et al. 1998, Itô and Mester 1999b). The phonological characteristics of the lexical items from each stratum indicate that Japanese contains a certain ranking of markedness constraints, and the behavior of the more unassimilated (or peripheral) lexical items is captured by positing stratum-specific rankings of faithfulness constraints that are interspersed with the markedness constraints. Crucially for Japanese, the ranking of the markedness constraints remains the same across strata. Without going more fully into the analysis in Itô and Mester (1999b), the ranking they propose to account for all four strata is presented in (1):

(1)  SYLLSTRUC (*COMPLEX, CODACOND, etc.)
       ≫ FAITH/UNASSIMILATEDFOREIGN
       ≫ *DD (“No voiced geminates”)
       ≫ FAITH/ASSIMILATEDFOREIGN
       ≫ *P (“No singleton-p”)
       ≫ FAITH/SINO-JAPANESE
       ≫ *NT (“Post-nasal obstruents must be voiced”)
       ≫ FAITH/YAMATO

What is interesting about the Japanese case is that it can be hypothesized that the relevant hierarchy of markedness constraints was present in the phonology of Japanese even before the first stratum of borrowed words entered the Japanese lexicon. For example, both the Sino-Japanese and Assimilated Foreign strata have a morpheme /paN/, but it is only realized faithfully in the latter case (/paN/ → [paN] “bread”); in the Sino-Japanese case, /paN/ “group” surfaces as [haN] (compare: [ippaN] “group one”) (Itô and Mester 1999b). Because of the loanwords, the ranking of markedness constraints in (1), which might once have been a “hidden” ranking of constraints that did not pertain to the native (Yamato) lexical items of Japanese, is now an integral part of the synchronic grammar. I will argue later that hidden rankings and invisible constraints, much like those which are now considered an integral part of Japanese phonology, can also be probed by experimental methodologies like the one used in Chapter 3.
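The effect of interleaving stratum-specific faithfulness with a fixed markedness hierarchy can be sketched with a minimal Optimality Theoretic evaluator. This is a toy sketch: the candidate set, the constraint definitions, and the ASCII notation are simplifications of Itô and Mester's analysis, introduced here only to make the ranking logic in (1) concrete:

```python
def star_p(cand, inp):
    """*P: one violation per singleton [p] in the candidate."""
    return cand.count('p')

def faith(cand, inp):
    """FAITH: one violation per segment changed from the input."""
    return sum(a != b for a, b in zip(inp, cand))

def optimal(inp, candidates, ranking):
    """EVAL: the winner is the candidate with the lexicographically
    smallest violation profile under the ranked constraints
    (highest-ranked constraint first)."""
    return min(candidates, key=lambda c: tuple(con(c, inp) for con in ranking))

# Sino-Japanese stratum: *P >> FAITH/SINO-JAPANESE, so /paN/ -> [haN]
print(optimal('paN', ['paN', 'haN'], [star_p, faith]))  # haN
# Assimilated Foreign stratum: FAITH/ASSIMILATEDFOREIGN >> *P, so /paN/ -> [paN]
print(optimal('paN', ['paN', 'haN'], [faith, star_p]))  # paN
```

The same markedness constraint (*P) is present in both evaluations; only the position of the stratum's faithfulness constraint differs, which is exactly what yields [haN] for the Sino-Japanese item but faithful [paN] for the assimilated foreign item.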

In Section 5.3, the hidden markedness constraints necessary to account for English speakers’ production of phonotactically illegal word-initial clusters are presented. However, a phonological account can only be supported once a phonetic account of non-native cluster production is shown to be inadequate; this is discussed in the following section.

5.2. Justifying a phonological account

The previous section introduced the claim that the performance differences of English speakers on the production of non-native word-initial clusters have a phonological, rather than phonetic, locus. However, this assumption cannot be taken for granted; it is plausible that the pattern of accuracy could be explained by purely articulatory factors without having to appeal to the phonology. It has already been recognized that speakers are not practiced in producing /f/-, /z/-, or /v/-initial obstruent clusters, so it may be that this lack of experience is instantiated primarily at the articulatory/motor level. There are a number of reasons to believe that this is not the case. Starting with /f/, English already contains some /f/-initial clusters, so it could be the case that speakers would be able to simply transfer the production plans for legal /f/-initial clusters to the target clusters in the experiment. For example, the articulatory motion necessary to produce /fn/ or /ft/ word-initially may be quite comparable to /fl/, since the transition from /f/ to a coronal requires tongue tip raising in all of these cases. However, these are not the only /f/-initial clusters used as experimental stimuli. Targets like /fk/ and /fm/, which are not articulatorily similar to initial clusters already existing in English, are also used, and speakers do not perform more accurately on /f+coronal/ sequences than they do on any other cluster in the /fO/ or /fN/ categories. Consequently, it does not seem likely that speakers are applying their knowledge of legal /f/-initial clusters to help them produce the non-native experimental stimuli. A similar “transfer-of-knowledge” explanation might also be expected to account for /z/-initial clusters.
Of all legal fricative-initial clusters, only coronal (/s/-initial) clusters may combine with obstruents and nasals, and the oral motor pattern for producing /z/-initial clusters is hypothesized to be nearly the same as the one necessary for /s/-initial clusters. However, it has already been argued that what makes /z/ difficult is its glottal specification, not its oral configuration. If speakers are relatively “good” at producing /f/-initial clusters, and voicing has a greater negative effect on the articulation of non-native sequences than place of articulation does, then it might be expected that speakers would be similarly inaccurate on all voiced-initial sequences, including the /z/-initial and /v/-initial clusters. Yet the results show that speakers are significantly less accurate on /v/-initial clusters than on any others, suggesting that speakers’ performance cannot be affected by voicing alone. A purely articulatory account of the data is also weakened by the fact that English contains many of the experimental sequences word-medially and word-finally, as in aft [ft#], offset [-fs-], heaved [vd#], husband [-zb-], and Mazda [-zd-]. In these kinds of medial sequences, the first consonant is neither released nor followed by a transitional vocoid (Henderson and Repp 1982), indicating that there must be some overlap among the gestures. Even if the production of final and medial sequences is somewhat different from that of initial sequences, an articulatory account may predict that a speaker could appropriate the

implementation necessary for the sequence in another position and apply it to the target words. However, the fact that speakers do not appear to be taking advantage of articulatory patterns they have already learned is consistent with the research showing that different syllabic positions have particular gestural organizations that are not necessarily interchangeable (Krakow 1989, Sproat and Fujimura 1993, Browman and Goldstein 1995, Byrd 1996a, Fougeron and Keating 1997, Kochetov to appear, see also Ussishkin and Wedel 2003a, b). Crucially, within Articulatory Phonology, gestural coordination corresponding to different syllabic positions is not a purely implementational factor, but rather something that is specified phonologically (Browman and Goldstein 1990b, Gafos 2002). That speakers are not able to simply transfer their knowledge regarding medial or final obstruent sequences to word-initial position is further evidence against a phonetic account. It should be recalled that the experiment in Chapter 3 was designed to test predictions regarding the phonological distinctions among the experimental items. Specifically, it was hypothesized that gestures containing the features [−strident] and [+voice,−son] are disadvantaged with respect to the optimal gesture /s/ in cluster-initial position, for perceptual reasons in the case of [−strident] and for articulatory reasons in the case of [+voice,−son]. As discussed in Section 3.2.1, /f/ is not favorable as the first segment of a word-initial cluster and is consequently rarely found in that position cross-linguistically because it is difficult to accurately perceive in the absence of the information that a following vowel provides (e.g. Harris 1958). 
/z/ is dispreferred in cluster-initial position because long sequences of voiced sounds, especially voiced obstruents, create a difficult aerodynamic environment in which the optimal condition for frication is in opposition with that for voicing (Ohala and Kawasaki-Fukumori 1997). Because /v/ is both [−strident] and [+voice,−son], it was suggested that the effects would be compounded and speakers’ performance on /v/-initial clusters would be even weaker than on /f/ or /z/ initial clusters. This is exactly what the results showed. It was also noted in Section 3.2.1 that there is no basis on which to posit an intrinsic phonetic relationship among clusters that are disadvantaged for perceptual reasons (like /fC/) and those that are dispreferred for their articulatory characteristics (like /zC/). Instead, the decision to favor either articulatory simplicity over acoustic salience or vice versa is a characteristic of phonological systems. It can be said that in some sense, the English grammar prefers sequences that are less perceptible to those that are articulatorily difficult, as evidenced by better performance on /f/-initial clusters. Other languages, however, may show a different pattern. In fact, typological evidence suggests that word-initial cluster inventories of various languages do indeed show various combinations in conformity with phonological predictions. As mentioned in Chapter 3, there are a number of languages that allow word-initial consonant clusters beginning with some combination of /s/, /f/, /z/, or /v/. However, while these languages have all of these fricatives in the phoneme inventory and they have fricative-initial clusters beginning with at least one of these phonemes, they do not necessarily have all of the possible combinations. 
A survey of 16 languages collated from Greenberg (1965), Morelli (1999), an inquiry posted on Linguist List (December 2, 2002), and other sources demonstrates that languages with all of these fricatives in the inventory may have /f/-initial clusters only, /z/-initial clusters only, or /f/, /z/, and /v/-clusters (although no language has /v/-initial clusters without also having /f/ and /z/, and

vice versa). All of the 16 languages also have /s/-initial clusters in the inventory. A summary of the languages, their inventories, and some examples is shown in Table 10.

/f/-initial only:
  Danish (M. Tjalve, p.c.): /fn/ (fnadder ‘slush’)
  Dutch (P. Boersma, p.c.): /fn/ (fnuiken ‘to cripple’)
  Norwegian (T. Kinn, p.c.): /fn/ (fnyse ‘snort’)
  Afrikaans (Kritzinger 1986): /fn/ (fnuik ‘clip the wings, frustrate’)

/z/-initial only:
  Hebrew (Y. Falk, p.c.): /zk zd zg zx zv zm zn/ (zvuv ‘fly’, zxut ‘privilege’, zminut ‘availability’)
  Italian: /zb zd zg zv/ (zdentato ‘toothless’, zvenire ‘to faint’)
  Croatian (Bogadek 1985): /zb zd zg zv zm zn/ (zbirka ‘collection’, zveka ‘sound’, znan ‘well-known’)
  Lithuanian (Piesarskas 1995): /zv/ (zvimbti ‘hum, buzz’)
  Romanian (Schönkron 1967, D. Steriade, p.c.): /zm zv zb zd zg/ (zmeu ‘kite, dragon’, sbor [zb] ‘flight’, zvon ‘noise, rumor’)

/f/-, /z/-initial only: none

/f/-, /z/-, and /v/-initial:
  Greek (Tserdanelis 2001, N. Tantalou, p.c.): /fθ fx ft/, /zF zm zv/, /vF vD/ (ftErç ‘feather’, zminos ‘school (of fish)’, vDomaDa ‘week’, vFazo ‘I take out’)
  Tsou (Wright 1996): /fs fz ft fk f/ fts fn/, /zv/, /vz vh vts/ (ftuke ‘bent’, fsoi ‘plant name’, zviji ‘a kind of snake’, vtsoNˆ ‘spouse’)
  Czech (Kucera 1961, Cermak 1963): /fp ft fk fs fm fn/, /zb zd zg zv zm zn/, /vb vd vg vz vn vm/ (vsázka [fs] ‘wager’, vpad [fp] ‘raid’, zdar ‘success’, zmotati ‘to confuse’, vdaná ‘married’, vzor ‘pattern’)
  Russian (D. Collins, p.c., M. Gouskova, p.c.): /fp ft fk fs fS ftS/, /zb zd zg zv zZ zm zn/, /vb vd vg vz vZ vn vm/ (ftoroj ‘second’, fsJE ‘all-pl.’, zvuk ‘sound’, znak ‘sign’, vnJE ‘outside’, vdJEtJ ‘to thread’) (Note: C2 in all clusters can be palatalized, except /S/ and /Z/.)
  Polish (Stanislawski 1988, G. Jarosz, p.c.): /fp ft fk fs/, /zb zd zg zv zm zn/, /vb vd vg vz vn vm/ (wtorek [ft] ‘Tuesday’, wpaść [fp] ‘fall in’, zbadać ‘explore’, zmaza ‘blemish’, wzad [vz] ‘back’)
  Albanian (Kici 1976): /fS ftS ft/, /zb zd zg zm zv/, /vn vd/ (fshat ‘village’, ftekem ‘I think’, zgorre ‘skeleton’, zmadhim ‘enlargement’, vdekur ‘dead’)
  Slovenian (Prekmurje dialect) (A. Ferme and S. Zivanovic, p.c., M. Greenberg, p.c., Grad and Leeming 1994): /fs fc fk fp ft/, /zb zd zg zm zn/, /vd vb vg vz vZ/ (fkaniti ‘to cheat’, fsaki ‘every’, zbadati ‘to sting’, znaten ‘considerable’, vzeti ‘to take’, vdariti ‘to hit’)

Table 10. Languages containing /f/, /z/, and /v/-initial clusters.

Each of these languages has these fricatives and /s/ in the phoneme inventory. They also all have /s/-initial clusters, not shown here. Each inventory may not be exhaustive, since the sources are not always complete.

Examples are given in orthography as used in a dictionary, or in transliteration. Brackets following an example give the pronunciation of a cluster when the orthography is not indicative of the surface form.

Typological data validates the notion that constraints in the phonology are independently necessary to account for the fact that the inventory of fricative-initial onset clusters differs among languages that have them. Underlying the phonological analysis developed in the next section is the assumption that these constraints are not only useful for determining cross-linguistic fricative-initial consonant cluster inventories, but also for accounting for the performance of English speakers. For this reason, in the discussion that follows, it will be assumed that the relevant constraints used in the analysis of English speakers’ production of Czech clusters already exist in the speakers’ grammar, even though they do not interact with other constraints and are seemingly not active given the lexicon of English. The simplest explanation for the presence of such constraints is that they are innate. However, nothing in the analysis hinges on an assumption of innateness; any explanation for how unviolated constraints might arise despite a lack of evidence from the lexicon would suffice. Since the origin of such constraints is not the main concern here, it will be left to future research. However, the origin of the rankings will be addressed in Section 5.4.3. Now that arguments against a purely implementational account of speakers’ errors have been presented, a phonological account of English speakers’ production of non-native word-initial clusters is developed in the next section.

5.3. Consonant cluster phonotactics in a grammar of gestural coordination

5.3.1. Constraints on word-initial clusters

Like the analysis of voice neutralization in Lithuanian developed in Steriade (1997), the results of the study in Chapter 3 reflect the fact that speakers are sensitive not just to the initial member of the non-native cluster, but also to the combination of consonants found in individual clusters. Steriade posits a hierarchy of specific environments that provide more or fewer cues to a voicing distinction in obstruents, as shown in (2).

(2) Environments for voice neutralization (O = obstruent, R = sonorant, # = word boundary)

    Weaker cues                          Stronger cues
    #_O, O_#  <  R_O  <  R_#  <  _R  <  R_R

Page 103: THE ATOMS OF PHONOLOGICAL REPRESENTATION ...


When the environments in (2) are characterized in terms of constraints, the ranking of the faithfulness constraint PRESERVE[voice] determines which consonant sequences exhibit voice neutralization and which do not. In the example in (3), a voice contrast is licensed before sonorants and word-finally, as is found in French.

(3) *αvoice/#_[−son], *αvoice/[−son]_# ≫ *αvoice/[+son]_[−son] ≫ PRESERVE[voice] ≫ *αvoice/[+son]_# ≫ *αvoice/[+son]_[+son]

The second consonant contexts in (2) are likely not just a continuum of environments that are better or worse hosts for voicing distinctions; they have a similar effect on manner and place distinctions in the first consonant as well. That is, the transitional cues present when the second segment of the sequence is a sonorant provide information not just about the voicing of the first consonant, but also about its manner and place. Even though consonants like fricatives or nasals are typically considered to have internal cues to their identity (Wright 1996), some members of these classes may have weaker internal cues than others, and these cues can be bolstered by information from the transition into a following sonorant. For example, transitions may help in the identification of place in nasals, since their most obvious spectral characteristic is weakened second and third formants, which are often perceptual cues to place (Borden and Harris 1984, Jun 1995).

Just as the second consonant in a cluster determines how well the cues of the first consonant can be recovered, the intrinsic qualities of the first consonant also contribute to the overall goodness of the sequence. As discussed at length in Section 3.2.1, fricatives other than the voiceless strident /s/ are not good candidates for the first consonant of a word-initial consonant cluster, and are rarely found in that position cross-linguistically. In order to improve the articulatory and perceptual environment for non-strident and/or voiced fricative-initial clusters, the minimally disruptive solution is to ensure that the release of the first consonant is not obscured by the plateau of the second consonant (i.e. its target, center, or release). Although it may be best for the fricative to be followed by an approximant, at the very least allowing an obstruent to release provides further cues to the identity of the consonant (e.g. Dorman, Studdert-Kennedy and Raphael 1977, Blumstein and Stevens 1978, Steriade 1993). Furthermore, allowing the consonant to release is especially beneficial in the voiced case, since the conflicting aerodynamic situation created by the production of the first consonant is allowed to reset before the second is produced.

Because the ultimate goal is to reconcile prohibitions on certain combinations of consonants with the mistiming repair that speakers exhibit, the constraints necessary for analyzing speakers' behavior must combine perceptual, articulatory, and coordination elements. In a gestural framework, allowable and prohibited consonant clusters can be defined by positing a family of constraints called *OVERLAP/Fα,Fβ, which bans the overlap of two gestures that are specified for given phonological features. A first version of the constraint is defined in (4):

(4) *OVERLAP/Fα,Fβ (Version 1): Do not overlap the release of a gesture specified for a feature Fα with the plateau of a following gesture specified for a feature Fβ.


This constraint is the gestural analogue of Steriade's (1997) Licensing-by-Cue constraints, which identify the environments in which a particular type of segment is dispreferred.14 For example, a constraint like *OVERLAP/[−sonorant],[−sonorant] ultimately prohibits sequences such as a stop followed by a fricative (e.g. /ps/ or /dv/) from being realized as clusters (with the canonical coordination) in a language. In the context of non-native cluster production, this constraint represents the statement that obstruent gestures are preferentially followed by a sonorant gesture (or perhaps an approximant, Steriade 1993), since this is the best perceptual and articulatory configuration.

Note that here, traditional phonological features are employed in the feature specification of a gesture. In a purely gestural system, a constraint pertaining to stop-fricative sequences might be more properly stated as *OVERLAP/[Closure],[Critical], where these features correspond to the vocal tract variables in Articulatory Phonology. However, this would be adequate only if articulatory factors alone could define phonotactically well- and ill-formed sequences. It has been argued in this dissertation that, at least in the case of the fricatives, features that have an acoustic basis, such as [±strident], must be representable as well. Thus, the *OVERLAP constraint can be considered a hybrid constraint that combines both perceptual and articulatory factors in its featural specification. Furthermore, perceptual and articulatory correlates can be represented by these traditional binary features: if binary features have a functional origin, a feature like [−strident] can refer to a poorly perceptible, weak-intensity fricative.

Another point regarding the formulation of the *OVERLAP constraints concerns the role of natural classes. Although there might be advantages to being able to make reference to and combine individual articulators, constriction locations, and constriction degrees, the larger class groupings that can be stated with traditional phonological features are not so easily defined in Articulatory Phonology. Thus, a language that prohibits stops from combining with stops and fricatives as the second member of a cluster, but allows nasals, liquids and glides as second members, cannot be simply represented using the vocal tract variables of Articulatory Phonology, because there is no single value corresponding to [−sonorant]. That is, while the single constraint *OVERLAP/[−sonorant],[−sonorant] bans all stop+obstruent clusters, Articulatory Phonology would require both *OVERLAP/[Closure],[Closure] and *OVERLAP/[Closure],[Critical]. Ultimately, it may be the case that this is the type of detail represented in the phonological system, but until more evidence is found, it will be assumed that it is preferable to posit fewer constraints.

It should be noted that previous formulations of OVERLAP constraints have sometimes been affirmative statements that are the opposite of those proposed here, like

14 Certain constraints presented in this section serve a purpose similar to constraints that have already been proposed in the literature. *OVERLAP constraints, for example, are akin to licensing constraints like *αvoice/#_[−son] or phonotactic constraints like *NC̥ (Pater 1999). Ultimately, in order to account for many phonological phenomena in the framework advocated in this dissertation, more traditional constraints will have to be formulated so that they refer to gestural elements. A phonological framework based on gestural concepts does not eradicate the intent of traditional constraints that pertain to established phonological processes, but rather incorporates them into a framework based on gestures that also includes temporal coordination as a phonological phenomenon. In a related example, the ASSOCIATION constraints presented in Section 5.3.2 lay the foundation for a gesturally-based syllable theory that can eventually be developed more fully through the examination of more and different types of evidence.

Page 105: THE ATOMS OF PHONOLOGICAL REPRESENTATION ...

95

that proposed by Cho (1998) and adopted by others (Hsu 1996, Bradley 2002): "Adjacent consonantal gestures must be overlapped." However, as mentioned by Bradley (2002) in a footnote, very general positive statements of overlap might be better cast in terms of coordination constraints which more precisely define the relationships between gestures in lexical items. That is, rather than just stating the principle that gestures should be overlapped, a more principled analysis will define the specific coordination relations that gestures have, and how the coordination of different locations can account for the types of surface forms found cross-linguistically. This is the approach pursued in this chapter, using COORDINATION constraints like those proposed in Gafos (2002).

In order to account for the experimental production results from Chapter 3, both of the consonants in the word-initial cluster must be taken into consideration. Figure 12 from Section 3.3.2.3 is repeated in Figure 33 below as a reminder of the participants' performance in the context of both the first and second consonants.

[Figure 33. Performance on clusters broken down by context category. Proportion of clusters produced correctly: sC = 0.97, fN = 0.75, fO = 0.53, zN = 0.56, zO = 0.31, vN = 0.31, vO = 0.18.]

The distinction that speakers make among /f/-, /z/- and /v/-initial clusters in performance can be captured by ranking *OVERLAP constraints that pertain to these sequences. Like loanwords that provide information about invisible constraints, English speakers' performance on fricative-initial clusters exposes crucial rankings of the *OVERLAP constraints. The constraints that pertain to /f/-initial clusters can be defined roughly as in (5). This definition of the constraints is preliminary, and will be refined as other important factors come to light.

(5) a. *OVERLAP/[−strident],[−sonorant]: Do not overlap the release of a non-strident fricative with the plateau of an obstruent. (abbreviation: *OV/[−strid],[−son])
    b. *OVERLAP/[−strident],[−approximant]: Do not overlap the release of a non-strident fricative with the plateau of a non-approximant. (*OV/[−strid],[−approx])

In the constraints in (5)a and (5)b, the lone feature [−strident] is posited as Fα to refer to non-strident fricatives, since this feature is typically only necessary for distinguishing between different kinds of fricatives (Morelli 1999, Lombardi 2001). The Fβ feature [−sonorant] indicates that non-strident fricatives should not be followed by another obstruent in a cluster. In (5)b, the feature [−approximant] encompasses stops, fricatives and nasals; since clusters like /fl/, /fr/ and /fj/ are legal in English but /fk/, /fs/, /fm/, etc. are not, this broader constraint is warranted.

The constraint in (5)a covers a subset of the clusters covered by (5)b, and stands in a stringency relation with it (Prince 1997, de Lacy 2002). That is, in order for *OVERLAP/[−strident],[−sonorant] to single out a group of clusters different from those that *OVERLAP/[−strident],[−approximant] pertains to, *OV/[−strid],[−son] must outrank *OV/[−strid],[−approx]. A stringency relation is similar to a universal ranking, except that it relates the features to one another in a scale that is incorporated into the constraints; such constraints are called scale-referring constraints (de Lacy 2002). In the example given here, although the constraints are not fully formulated as scale-referring, the ranking || *OVERLAP/[−strident],[−sonorant] ≫ *OVERLAP/[−strident],[−approximant] || is a Paninian one, since any violation of the former constraint also entails a violation of the latter.15

The constraints that concern /z/ are slightly more complicated, since they entail using two features to represent one gesture: [+voice, +continuant].16 Referring only to the feature [+voice] would be too broad, since non-obstruents are preferentially voiced. Thus, the feature bundle [+voice, −sonorant] must be specified, on the expectation that all voiced obstruents are treated similarly in a language. While this may not ultimately hold up cross-linguistically, it will be taken to be true for English based on the results of Davidson, Jusczyk, and Smolensky (2003) (see Section 3.1.2). The first version of the constraints regarding /z/-initial clusters is shown in (6):

15 These constraints may be more precisely stated as scale-referring markedness constraints, which are defined by de Lacy (2002) as follows:

Featural scale-referring markedness constraints:
(a) For every element p in every scale S, there is a markedness constraint m.
(b) m assigns a violation for each segment that either (i) contains p, or (ii) contains anything more marked than p in scale S.

Scale-referring markedness constraints are part of the Stringent Theory developed to impose harmonic ordering on the elements of a scale without having to posit fixed rankings. Currently, the *OVERLAP constraints do not quite conform to this definition, since both Fα and Fβ would have to incorporate scales (and at present only Fβ does so). de Lacy argues that scale-referring markedness constraints are preferred to fixed hierarchies because they are freely rerankable, and that only such stringent constraints can account for the types of conflations of marked elements found in different types of inventories cross-linguistically. Conflation refers to the ability of individual grammars to collapse distinctions in a scale when elements are treated as part of a group by a phonological process. Although this approach is likely applicable to the case of initial fricative clusters presented in this chapter, its further development will be left until a more comprehensive study of initial obstruent clusters can round out the necessary scales.

16 Under the current assumptions of Articulatory Phonology, /z/ is the single gesture Tongue Tip Critical, not a constellation of gestures, since voicing is not specified with its own glottal gesture (as opposed to /s/, which is composed of Tongue Tip Critical and Wide Glottis). As discussed in Chapter 2, AP considers voicing to be the default state of the glottis in the speech-ready state. Whether this is accurate, or whether both voiced and voiceless gestures should be represented in the gestural score, is still unresolved. Furthermore, it is in opposition to the traditional phonological conception of voice, which considers [+voice] to be marked for obstruents and therefore requires it to be specified in the underlying representation. The fact that [+voice] plays an important role in determining speakers' behavior on /zC/ sequences is suggestive of the need to represent both voicing (glottal adduction) and voicelessness (glottal abduction) as a glottis tract variable in the gestural score. Although full support of this claim requires more empirical evidence, it will be tentatively assumed that both [+voice] and [−voice] can be referred to in constraints.


(6) a. *OVERLAP/[+voice,−sonorant],[−sonorant]: Do not overlap the release of a voiced obstruent with the plateau of an obstruent. (*OV/[+voi],[−son])
    b. *OVERLAP/[+voice,−sonorant],[−approximant]: Do not overlap the release of a voiced obstruent with the plateau of a non-approximant. (*OV/[+voi],[−approx])

Given the definitions in (5) and (6), /v/-initial clusters violate both sets of constraints, since /v/ is both [−strident] and [+voice,−sonorant]. A look at the inventories in Table 10 shows that this is precisely what is expected: there are no languages that have /v/-initial clusters without /f/- and /z/-initial clusters, or that have both /f/- and /z/-initial clusters without also having /v/-initial ones. The initial fricative cluster scale can be stated as in (7).

(7) | sC ≻ fC, zC ≻ vC |

This scale indicates that /sC/ is always most harmonic, /vC/ least harmonic, and /fC/ and /zC/ are equally harmonic, but can be treated as more or less marked than one another by particular languages. Abstracting away from the influence of the second consonant, possible rankings of the *OVERLAP constraints proposed so far predict that languages can have any of the word-initial cluster inventories shown in (8).

(8) a. /sC/
    b. /sC/, /fC/
    c. /sC/, /zC/
    d. /sC/, /fC/, /zC/, /vC/

The inventories in (8) result from the ranking of three different types of *OVERLAP constraints: *OV/[−strid],[+cons], *OV/[+voi],[+cons] and a general *OVERLAP/[+cons],[+cons] constraint that pertains to any kind of overlapping consonants. Currently, this is posited as a cover constraint similar to the traditional OT constraint *COMPLEX, which prohibits complex consonant clusters. The inventories of languages that do not have a general ban on clusters result from ranking *OV/[+cons],[+cons] below more specific *OVERLAP constraints and below any faithfulness or coordination constraints. In the particular example of fricative-initial clusters addressed here, ranking *OV/[+cons],[+cons] below the other *OVERLAP constraints produces an inventory which allows /sC/ clusters and prohibits /fC/, /zC/ and /vC/. The typologically possible rankings are shown in Table 11 along with example languages.


RANKINGS | CLUSTERS ADMITTED | EXAMPLE LANGUAGES
FAITH ≫ *OV/[−strid],[+cons], *OV/[+voi],[+cons] ≫ *OV/[+cons],[+cons] | /sC/, /fC/, /zC/, /vC/ | Greek, Czech, Russian
*OV/[−strid],[+cons] ≫ FAITH ≫ *OV/[+voi],[+cons] ≫ *OV/[+cons],[+cons] | /sC/, /zC/ | Hebrew, Croatian, Romanian
*OV/[+voi],[+cons] ≫ FAITH ≫ *OV/[−strid],[+cons] ≫ *OV/[+cons],[+cons] | /sC/, /fC/ | Dutch, Norwegian, Danish
*OV/[−strid],[+cons], *OV/[+voi],[+cons] ≫ FAITH ≫ *OV/[+cons],[+cons] | /sC/ | Hindi, Telugu (Dravidian), English
*OV/[−strid],[+cons], *OV/[+voi],[+cons] ≫ *OV/[+cons],[+cons] ≫ FAITH | none | Chinese, Axininca Campa

Table 11. Ranking typology to account for fricative-initial onset cluster inventories

In the RANKINGS column, the notation A, B ≫ C indicates that A and B are ranked with respect to one another, but it does not matter whether that ranking is A ≫ B or B ≫ A.
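The factorial typology in Table 11 can be checked mechanically. The following is a minimal sketch, not the dissertation's formalism: it assumes a two-candidate comparison per input (the faithful, overlapped cluster versus a "no-overlap" repair that violates only FAITH) and evaluates candidates by strict domination. The ASCII constraint names, the violation assignments, and the two-candidate simplification are all assumptions made for illustration.

```python
# A minimal strict-domination evaluator (a sketch, not the dissertation's
# formalism) that derives the inventories of Table 11 from rankings of the
# *OVERLAP constraints. Each input cluster gets just two candidates: the
# faithful, overlapped cluster, and a "no-overlap" repair that violates FAITH.

def violations(cluster, candidate, constraint):
    """Number of violations the candidate incurs on one constraint."""
    if candidate == "no-overlap":
        return 1 if constraint == "FAITH" else 0
    # Faithful (overlapped) candidate: check the *OVERLAP constraints.
    return {
        "OV/[-strid],[+cons]": int(cluster in ("fC", "vC")),  # non-strident C1
        "OV/[+voi],[+cons]":   int(cluster in ("zC", "vC")),  # voiced obstruent C1
        "OV/[+cons],[+cons]":  1,  # every true cluster overlaps two consonants
        "FAITH":               0,
    }[constraint]

def winner(cluster, ranking):
    """Strict domination: compare violation profiles constraint by constraint."""
    profile = lambda cand: [violations(cluster, cand, c) for c in ranking]
    return min(["overlap", "no-overlap"], key=profile)

def inventory(ranking):
    """The clusters that surface faithfully (are admitted) under a ranking."""
    return [c for c in ("sC", "fC", "zC", "vC") if winner(c, ranking) == "overlap"]

# Row 1 of Table 11: FAITH dominates everything, so all clusters are admitted.
print(inventory(["FAITH", "OV/[-strid],[+cons]", "OV/[+voi],[+cons]",
                 "OV/[+cons],[+cons]"]))   # -> ['sC', 'fC', 'zC', 'vC']
# Row 4: both specific constraints dominate FAITH; only /sC/ survives.
print(inventory(["OV/[-strid],[+cons]", "OV/[+voi],[+cons]", "FAITH",
                 "OV/[+cons],[+cons]"]))   # -> ['sC']
```

Reordering the ranking list reproduces the other rows of the table, including the empty inventory when the general *OV/[+cons],[+cons] constraint dominates FAITH.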

Table 11 demonstrates that the current constraints adequately account for the typological data. However, there is a discrepancy between the typological data and the experimental results, which show that speakers’ accuracy is significantly different for each type of fricative-initial cluster. Given the simple rankings of the current constraints in Table 11, if a speaker can accurately produce /fC/ and /zC/ clusters in the experimental task, she must also be able to produce /vC/ clusters. However, the situation is in fact more complicated. Speakers are significantly less accurate on /vC/ clusters than they are on either of the other non-native clusters, which indicates that the current set of constraints is not sufficient to account for behavior in the experimental condition. One way to account for the distinction found in the experimental data is to posit a fourth constraint that can be ranked above both *OV/[−strid],[−approx] and *OV/[+voi],[−approx].17

17 Theoretically, it is possible to account for less accurate performance on /vC/ clusters with only the *OVERLAP constraints proposed so far. This is discussed in Appendix 2, which also shows how the alternative analysis cannot accurately account for the data in this dissertation.


While there is no a priori markedness relationship between /f/-initial and /z/-initial clusters, each of the experimental hypotheses in Section 3.2.1 predicted that speakers' accuracy on /v/-initial clusters would be the lowest of all since /v/ is both [−strident] and [+voice]. /v/'s violation of both the *OVERLAP/[−strident],Fβ and *OVERLAP/[+voice,−sonorant],Fβ constraints is a worst-of-the-worst situation that distinguishes it from the other fricatives. Inventories that ban only the worst-of-the-worst are common and are dealt with in Optimality Theory with locally conjoined constraints (e.g. Smolensky 1995, Kirchner 1996, Alderete 1997, Smolensky 1997, Lubowicz 1998, Moreton and Smolensky 2002, but see Padgett 2001 for an alternative view). A local conjunction is a constraint formed through the combination of two lower-ranked constraints that have in common the same domain of application. When each of these constraints is violated separately, their violations are not enough to be fatal, but when the conjoined constraint is violated, the candidate which violates it cannot be optimal.

For example, in English, the velar nasal /N/ (i.e. [ŋ]) is not allowed in onsets, but neither velars nor nasals per se are prohibited in that position. Thus constraints like *ONSET/velar ("don't have velars in onset position") and *ONSET/nasal ("don't have nasals in onset position") might be posited, but violations of these constraints will not rule out /N/ onsets. This is demonstrated in the tableau in (9).

(9)
            | IDENT | *ONSET/velar | *ONSET/nasal
  a. /kot/  |       |              |
   ☞  kot   |       |      *       |
       tot  |  *!   |              |
  b. /not/  |       |              |
   ☞  not   |       |              |      *
       tot  |  *!   |              |
  c. /Not/  |       |              |
   ☞  Not   |       |      *       |      *
       not  |  *!   |              |

The ranking in the tableau in (9) clearly produces the wrong optimal output for the input /Not/ in (9)c. However, the correct output for this input can be obtained if the two ONSET constraints are conjoined to specifically ban onsets that are both nasal and velar. The locally conjoined constraint *ONSET/velar & *ONSET/nasal can be rewritten as *ONSET/velar&nasal. In order for the conjoined constraint to have an effect in the phonology, it must be ranked above the singleton constraints. The tableau from (9) is repeated in (10), with the addition of the conjoined constraint which provides the correct output [not] for /Not/.


(10)
            | *ONSET/velar&nasal | IDENT | *ONSET/velar | *ONSET/nasal
  a. /kot/  |                    |       |              |
   ☞  kot   |                    |       |      *       |
       tot  |                    |  *!   |              |
  b. /not/  |                    |       |              |
   ☞  not   |                    |       |              |      *
       tot  |                    |  *!   |              |
  c. /Not/  |                    |       |              |
       Not  |         *!         |       |      *       |      *
   ☞  not   |                    |   *   |              |
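The effect of ranking the conjoined constraint on top can be sketched with the same kind of strict-domination evaluation used for OT tableaux. The candidate sets, feature assignments, and function names below are hypothetical simplifications for illustration; /N/ stands for the velar nasal, as in the text.

```python
# A sketch of local conjunction under strict-domination evaluation: the
# conjoined constraint is violated only when both of its conjuncts are
# violated in the same domain (here, the onset). Candidate sets and feature
# assignments are simplified; "N" stands for the velar nasal.

ONSETS = {"kot": {"velar"}, "not": {"nasal"},
          "Not": {"velar", "nasal"}, "tot": set()}

def viols(inp, cand, constraint):
    feats = ONSETS[cand]
    if constraint == "IDENT":
        return int(cand != inp)                     # unfaithful candidate
    if constraint == "*ONSET/velar&nasal":          # the local conjunction
        return int({"velar", "nasal"} <= feats)
    return int(constraint.split("/")[1] in feats)   # *ONSET/velar, *ONSET/nasal

def optimal(inp, candidates, ranking):
    return min(candidates, key=lambda c: [viols(inp, c, k) for k in ranking])

# Tableau (9): without the conjoined constraint, /Not/ wrongly surfaces intact.
r9 = ["IDENT", "*ONSET/velar", "*ONSET/nasal"]
print(optimal("Not", ["Not", "not"], r9))    # prints Not (the wrong winner)

# Tableau (10): ranking the conjunction on top yields the correct output [not].
r10 = ["*ONSET/velar&nasal"] + r9
print(optimal("Not", ["Not", "not"], r10))   # prints not
```

Because the conjoined constraint assigns no violation to [kot] or [not], ranking it at the top changes only the outcome for /Not/, exactly as in tableau (10).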

Like the example of /N/, /v/-initial clusters can be ruled out by conjoining the two lower-ranked *OVERLAP constraints. Using the abbreviated notation for the conjoined constraint, the constraints pertaining to /v/-initial clusters are defined in (11):

(11) a. *OVERLAP/[−strident]&[+voice,−sonorant],[−sonorant]: Do not overlap the release of a non-strident and voiced fricative with the plateau of an obstruent. (*OV/[−strid]&[+voi],[−son])
     b. *OVERLAP/[−strident]&[+voice,−sonorant],[−approximant]: Do not overlap the release of a non-strident and voiced fricative with the plateau of a non-approximant. (*OV/[−strid]&[+voi],[−approx])

With the addition of the locally conjoined constraint, there is now a discrepancy between what is found typologically and the inventories that are possible given these four types of *OVERLAP constraints. The existence of the locally conjoined constraint predicts typologically that there should be languages which allow /fC/ and /zC/ but not /vC/. However, this is only true if conjoined constraints are innate. It is possible that such a language exists and has simply not yet been recorded. On the other hand, if conjoined constraints are language-specific, or can somehow be constructed "on the fly" from pieces of other constraints during situations like second language learning or experimental tasks, then it could be possible to have a discrepancy between the inventories attested cross-linguistically and the behavior seen in the experimental condition. At this moment, there is no solution to this problem, nor will proposals regarding the origins of local conjunctions be offered. For the purposes of the analyses in the following sections, it will be assumed that the English speakers in this study have access to the locally conjoined constraint prohibiting /v/-initial clusters.

As already noted, the feature Fα in the *OVERLAP constraints is not the only aspect of these constraints requiring fixed or stringent rankings. The features [−approximant] and [−sonorant] in Fβ must similarly stand in a markedness relation. Based on Steriade (1997), it is assumed that obstruents are least preferred as the second element of a cluster, followed by nasals, and finally by approximants. In this study, it is necessary to distinguish between (a) approximants such as /l/ and /r/, which typically are legal second elements of obstruent-initial clusters, (b) nasals, which are not phonotactically legal but are more accurately produced in the experimental task, and (c) obstruents, which are likewise not legal and are also the least accurately produced. In order to distinguish between nasals and obstruents, the features [−approximant] and [−sonorant] have been proposed. The scale defining the relationship between the two Fβ features relevant to this study is given in (12).

(12) | [−approximant] ≻ [−sonorant] |

When related to a ranking of constraints, the scale for Fβ entails that *OVERLAP constraints containing [−sonorant] as Fβ be ranked at least as high as the corresponding constraint with [−approximant] as Fβ. "Corresponding constraints" are those constraints that have Fα in common. Thus, the rankings || *OV/[−strid],[−son] ≫ *OV/[−strid],[−approx] || or || *OV/[+voi],[−son] ≫ *OV/[+voi],[−approx] || could be found in some language, or the members of each pair may be unranked with respect to one another. When constraints have different Fαs, there is no intrinsic relation between them: constraints like *OV/[−strid],[−approx] and *OV/[+voi],[−son] may be ranked relative to one another in one language and unranked in another.

Given the experimental results from Chapter 3, it is clear that under certain conditions, speakers are able to "bypass" their native language grammar in order to produce structures that are phonotactically illegal. However, the way in which they do this is principled: not all word-initial sequences are equally likely to be produced accurately. The fact that there are still restrictions on which sequences can be successfully produced suggests that the grammar plays an active role even though none of these sequences are found in English. Although the ranking of the *OVERLAP constraints pertaining to the experimental clusters is not detectable from English lexical items, the experimental performance does provide evidence for hidden rankings of the *OVERLAP markedness constraints.
Experimental production results from Chapter 3 are reviewed in (13), and a preview of the hidden ranking necessary for capturing the production facts is given in (14). It should be pointed out that the hierarchy in (14) is the only possible ranking given just two stipulations: (a) all constraints are active; therefore, by Panini's Theorem, we must have S ≫ G, where S and G are the special and general constraints in a stringency relation (hence *OV/Fα,[−son] ≫ *OV/Fα,[−approx], and also A&B ≫ A); and (b) *OV/[+voi],[−son] ≫ *OV/[−strid],[−son]. A more detailed exposition of these rankings and how they reflect the experimental production facts will be given once the relevant repair of illegal clusters is discussed in the next section.

(13) better accuracy                                  worse accuracy
     sC ≻ fN ≻ fO = zN ≻ zO = vN ≻ vO    (O=obstruent, N=nasal, C=consonant)


(14) *OV/[−strid]&[+voi],[−son] ≫
     *OV/[−strid]&[+voi],[−approx], *OV/[+voi],[−son] ≫
     *OV/[+voi],[−approx], *OV/[−strid],[−son] ≫
     *OV/[−strid],[−approx] ≫
     *OV/[+cons],[+cons]
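The stringency entailments that stipulation (a) relies on can be checked mechanically over the cluster types in (13). The sketch below uses hypothetical feature assignments (C1 drawn from /s f z v/, C2 either a nasal N or an obstruent O) and verifies that every violator of a special constraint also violates the corresponding general one, so that each S ≫ G ranking is indeed Paninian.

```python
# A sketch verifying the stringency (Paninian) entailments behind (14), using
# hypothetical feature assignments: C1 drawn from /s f z v/, C2 either N
# (nasal) or O (obstruent). *OV/alpha,beta is violated when C1 has property
# alpha and C2 has property beta.

def violates(cluster, alpha, beta):
    c1, c2 = cluster[0], cluster[1]
    a = {"-strid":      c1 in "fv",                       # non-strident fricatives
         "+voi":        c1 in "zv",                       # voiced obstruents
         "-strid&+voi": c1 in "fv" and c1 in "zv"}[alpha]  # conjunction: only /v/
    b = {"-son":    c2 == "O",                            # followed by an obstruent
         "-approx": c2 in "NO"}[beta]                     # followed by a non-approximant
    return a and b

CLUSTERS = [c1 + c2 for c1 in "sfzv" for c2 in "NO"]

def violators(alpha, beta):
    return {cl for cl in CLUSTERS if violates(cl, alpha, beta)}

# S >> G is Paninian only if every violator of the special constraint also
# violates the general one; check this for both dimensions of specificity.
for alpha in ("-strid", "+voi", "-strid&+voi"):
    assert violators(alpha, "-son") <= violators(alpha, "-approx")
for beta in ("-son", "-approx"):
    assert violators("-strid&+voi", beta) <= violators("-strid", beta)
    assert violators("-strid&+voi", beta) <= violators("+voi", beta)
print("all stringency entailments hold")
```

For instance, the violators of *OV/[−strid],[−son] are {fO, vO}, a subset of the violators of *OV/[−strid],[−approx], {fN, fO, vN, vO}, so ranking the former above the latter is consistent with both being active.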

5.3.2. Gestural Association Theory and syllable structure

The results of the consonant cluster production experiment in Chapter 3 demonstrate that English speakers producing phonotactically illegal word-initial consonant clusters show different levels of accuracy depending on how marked the cluster is. The findings of the ultrasound experiment in Chapter 4 indicate that when they are not able to produce the sequence correctly, it is not because they are epenthesizing a phonological schwa, but rather because they are failing to accurately coordinate the consonantal gestures. In other words, correct cluster coordination in English requires the consonants to be overlapped, but when particular *OVERLAP constraints are high-ranked in English, certain non-native sequences are prohibited by the grammar.

The idea that coordination between consonants in a cluster, and between consonants and vowels, is determined by the grammar was originally proposed by Browman and Goldstein (1986 et seq.) and formalized by Gafos (2002), who showed that differences in the surface form of various types of consonant clusters in Moroccan Colloquial Arabic (MCA) can best be accounted for by positing different underlying coordination relationships. In Moroccan Arabic, heterorganic sequences are produced with a transitional schwa between the two consonants; this schwa can be analyzed as non-segmental, much like the schwa in Piro or Sierra Popoluca (see Section 4.1.1). Homorganic sequences also have transitional schwas, but their configuration is not exactly the same as that for heterorganic sequences. Gafos argues that the presence of transitional schwas is due to a non-overlapping consonant coordination that differs depending on whether the consonants are hetero- or homorganic. For heterorganic sequences, the standard coordination relationship defined for the language is sufficient to produce an excrescent schwa between the two consonants.
This is shown in (15) (repeated from Section 2.1.3):

(15) CC-COORD in MCA: ALIGN(C1, center, C2, onset)
     [gestural diagram: the C1 and C2 gestures, with an open vocal tract interval between them]

Homorganic sequences cannot satisfy the CC-COORD constraint, because this amount of space between the release of the first consonant and the target of the second is not sufficient for the tongue to return to a neutral position after the production of the first consonant before having to move to the same place of articulation for the second consonant. Gafos treats this configuration as an OCP violation, which can be avoided by producing homorganic sequences with less overlap, so that the tongue has time to move away from the constriction for the first consonant before producing the same constriction again. Thus, in order to satisfy the OCP, CC-COORD is violated. The resulting configuration for homorganic sequences is shown in (16).

(16) [gestural diagram: C1 and C2 with a longer open vocal tract interval between them] for homorganic C1, C2

Unlike Moroccan Arabic, English consonant clusters have a close transition, so the release of the first consonant must be overlapped by the target of the second consonant (Henderson and Repp 1982, Catford 1988), as repeated from Section 2.1.3. (17) CC-COORD in English: ALIGN (C1, release, C2, target) C1 C2 Similar to Moroccan Arabic speakers, however, English speakers faced with /f/, /z/ and /v/-initial consonant sequences must violate CC-COORD in English to satisfy the *OVERLAP constraints that prevent phonotactically illegal sequences. While it is not clear how the amplitude and duration of the transitional schwa that English speakers produce between the two consonants is related to the amount of open vocal tract or “space” found between the two consonants, it is at least the case that speakers cannot overlap the release of the first consonant with the target of the second. If the grammar does not prescribe a particular repair, and the only requirement is that *OVERLAP cannot be violated, then speakers’ outputs could actually conform to a number of different surface coordination relations. A few possible cases for the surface gestural configurations are shown in (18). (18) a. b. c. Gafos (2002) proposes that there is a temporal distance τ, corresponding to the distance between the c-center of a consonant and either the target or release, that is the “minimal unit of temporal distance employed in gradient evaluation of coordination constraints (279).” This means that as the coordination between two consonants increases by a multiple of τ, it will cause another violation of CC-COORD. If gradient evaluation is correct, then only the repair in (18)a will be permitted by the ranking *OVERLAP ≫ CC-COORD, since (18)b and (18)c will necessarily incur more violations of CC-COORD than (18)a does. In the analysis presented in this chapter, it will be assumed that (18)a is the


result when CC-COORD is violated in order to satisfy *OVERLAP, but caveats to this assumption are discussed in Section 5.3.2.1. The goal of Gafos (2002) is to determine the relationship between bases and derived forms in Moroccan Arabic, where the input representation is the base and the output candidates are derived forms that include different possible coordination relations as well as other repairs like epenthesis and deletion. Some examples of the data Gafos accounts for are shown in (19).

(19) /smin/ 'fat'    [smimən] 'fat, diminutive' (diminutive template: /CCiCC/)
     /fddan/ 'field' [fdadən] 'field, plural' (plural template: /CCaCC/)

Because the cases on which he focuses his analysis have final consonant clusters that arise from the template for the derived word, Gafos does not specifically discuss the formation of syllable structure in a grammar with gestural elements. This is an extremely important point, however, especially for languages in which morphology is concatenative rather than templatic. In other gesturally-based proposals, it has been explicitly assumed that output syllabic affiliations are lexicalized, not determined by the phonology (e.g. Browman and Goldstein 1995, Byrd 1996b, Cho 1998, Browman and Goldstein 2001, Bradley 2002). However, lexicalization of syllabic affiliation cannot be an accurate analysis, since many morphological and phonological processes at morpheme edges would require a reassignment of syllabic affiliation. Unless these authors are also willing to defend the position that all derived and inflected words are fully stored in the lexicon, the position that surface syllabic affiliation is lexically determined is untenable. For example, principles of syllabification will determine whether a stem-final consonant is in the coda or the onset, which can have consequences for its coordination and ultimately its pronunciation when it is in contact with a following morpheme.
Even in the case of consonant clusters in English, a gestural correlate of syllable structure is necessary because it will ultimately account for the fact that some consonant sequences can occur word-medially but not word-initially. In order to address this issue, Gestural Association Theory (GAT) is proposed here as a phonological mechanism for establishing when a consonant and vowel or a consonant and consonant are governed by a coordination relationship. Under GAT, it is assumed that (a) the existence of a coordination relationship between a given consonant and vowel or consonant and consonant is not in the input, and (b) GEN allows for a number of possible association configurations. GAT is implemented in a constraint-based phonology through a number of ASSOCIATE constraints, which establish which gestures are in a coordination relationship with one another.18

18 The idea of this kind of association was first suggested by Browman and Goldstein (1990b), who mention that specific coordination relationships exist only among associated gestures, although they do not discuss how the associations are established.

Associations must be created for series of gestures which are governed by COORD constraints, namely sequential CV, CC, and VC gestures. Recall from Section 2.1.3 that sequences of C1C2V gestures will require both association of the C1C2 sequence, and separate association of C1↔V and C2↔V. Browman and Goldstein (2001) propose that one of the differences between the production of syllable-initial and syllable-final clusters is that in addition to the CC-coordination relationship, both consonantal gestures in the


initial cluster also have independent relationships with the following vowel, whereas only the consonant closest to the vowel in a final cluster is coordinated with the vowel. Final consonants do have a CC coordination relationship, however. The claim about consonants in an initial cluster comes from the finding that the articulatory unit most invariantly timed with respect to the following vowel in a CC(C0)V sequence is the c-center of the consonant sequence, or the mean of all the midpoints of the gestures in that sequence (Browman and Goldstein 1988, see also Section 2.1.3). The fact that onset clusters must conform to both CC-COORD and CV-COORD constraints results in competing coordination pressures, since full satisfaction of CV-COORD by both consonants would require them to be totally overlapped. However, complete overlapping would prevent the recoverability of one or more of the consonants (Mattingly 1981, Wright 1996, Silverman 1997, Chitoran, Goldstein and Byrd in press). According to Browman and Goldstein (2001), the recoverability requirement indicates that the CC-COORD relationship must be the stronger one, although Gafos (2002) attributes this to a separate recoverability constraint. Like CC-COORD, the details of the coordination relationships between consonants and following vowels are defined by the constraint CV-COORD, first introduced in Section 2.1.3 and repeated in (20).

(20) CV-COORD (for English): ALIGN(C, center, V, onset)
[diagram: the center of C aligned with the onset of V]

Associations, or the gestural correlates of syllable structure, are determined by a series of ASSOCIATE constraints, as defined in (21)-(24).

(21) ASSOC(IATE)-CV: A consonant gesture must have a coordination relationship with the nearest following vowel gesture.
(22) ASSOC(IATE)-CC: A consonant gesture must have a coordination relationship with adjacent consonant gestures.
(23) ASSOC(IATE)-C: A (non-nuclear) consonant gesture must not be unassociated.
(24) *MULT(IPLE)ASSOC(IATION): A consonantal gesture must not be associated with multiple vowels. Consonantal gestures associated with different vowels must not be associated with one another.

The constraint ASSOC-CV is straightforward, ensuring that coordination relationships are formed between adjacent CV sequences. One aspect of this constraint that may be slightly less intuitive is that ASSOC-CV pertains to all pre-vocalic consonants, whether they are adjacent to the vocalic gesture or not. This means that consonants will be parsed as onsets unless there is a markedness constraint prohibiting complex onsets (which would cause ASSOC-CV to be violated). Sample association topologies conforming to ASSOC-CV are shown in Figure 34(a)-(c); the case in Figure 34(d) entails a violation of ASSOC-CV.
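The logic of the ASSOCIATE constraints in (23)-(24) can be sketched as a toy violation checker. The representation is an assumption made purely for illustration: `cv` maps each consonant gesture to the set of vowels it is associated with, and `cc` lists associated consonant pairs. ASSOC-CV is omitted because evaluating it additionally requires the linear order of gestures.

```python
# Toy checker for ASSOC-C (23) and *MULTASSOC (24); the dict-based encoding of
# association topologies is a hypothetical simplification, not the dissertation's
# formal representation.
def assoc_violations(cv, cc):
    v = {"ASSOC-C": 0, "*MULTASSOC": 0}
    for cons, vowels in cv.items():
        if not vowels:
            v["ASSOC-C"] += 1        # (23): a non-nuclear consonant is unassociated
        if len(vowels) > 1:
            v["*MULTASSOC"] += 1     # (24): one consonant associated with multiple vowels
    for c1, c2 in cc:
        if cv[c1] and cv[c2] and cv[c1] != cv[c2]:
            v["*MULTASSOC"] += 1     # (24): CC-associated consonants belong to different vowels
    return v

# Figure 34(d)-style parse: C1 a coda of V1, C2 an onset of V2, no C-C association
print(assoc_violations({"C1": {"V1"}, "C2": {"V2"}}, cc=[]))
# Figure 35-style parse: the same CV associations plus a C1-C2 association
print(assoc_violations({"C1": {"V1"}, "C2": {"V2"}}, cc=[("C1", "C2")]))
```

The second call incurs a *MULTASSOC violation, matching the topology that Figure 35 is intended to rule out.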


[Figure 34 diagrams not reproduced; solid lines mark CC-association, plain lines mark CV or VC association.]

(a) Simple onset association
(b) Complex onset word-initial association
(c) Complex onset word-medial association (onset maximization)
(d) Word-medial association (when markedness prevents association of C1 and C2)

Figure 34. Possible association topologies for onset clusters

Because languages do not typically exhibit coda maximization, there is no corresponding constraint for VC sequences. However, some mechanism is necessary to guarantee that consonantal gestures will be associated as codas rather than not associated at all. This is formulated as the constraint ASSOC-C in (23), which penalizes unassociated consonant gestures. For example, ASSOC-C guarantees that if a consonant cannot be syllabified as part of the onset because doing so would violate some markedness constraint, then it will be syllabified as a coda (as in the case of singing [sɪŋ.ɪŋ] in English; see (d) in Figure 34). Because many languages allow syllabic consonants, for the time being the definition refers to non-nuclear consonants. Ultimately the definition will need to be refined, but this is not a central concern of the present analysis. One possible alternative to ASSOC-C is a simple constraint ASSOCIATE, defined as "No gesture can remain unassociated". However, this would penalize syllables containing only a vowel, which is not the intention of such a constraint. The constraint ASSOC-CC requires that adjacent consonantal gestures form an association. This can pertain to sequences of either prevocalic or postvocalic consonants. Note that this constraint is not formulated as "Consonants associated with the same vowel must have a coordination relation with one another", because in the coda, only the immediately postvocalic consonant has a coordination relationship with the vowel. In order to ensure that only adjacent consonants that are either both in the onset or both in the coda are associated, the constraint *MULTASSOC is proposed. This constraint is intended to rule out the situation in which a consonant C1 associated to a vowel V1 is also associated with a consonant C2 that is not associated with V1, but only with a different vowel V2.
This configuration is shown in Figure 35 (see Section 2.1.1).

[Figure 35 diagram not reproduced: C1 and C2 share a CC association, while C1 is associated with V1 and C2 with V2 (CV- or VC association).]

Figure 35. Example association topology for onset and coda clusters that violates *MULTASSOC

*MULTASSOC will be low-ranked in languages that require ambisyllabicity for metrical or other reasons. Under most circumstances, the pressures of both *MULTASSOC and ASSOC-CV will force consonantal gestures to be syllabified as onsets rather than codas. Furthermore, ASSOC-CC is necessary because only gestures that are associated are subject to coordination constraints. In a language that allows complex margins, these


three constraints will have crucial interactions with constraints like *OVERLAP, which prevent certain phonotactic configurations.19 Now that the concept of association has been introduced, a caveat to both the *OVERLAP and COORDINATION constraints should be noted. First, following the findings regarding the influence of syllabic organization on gestural coordination (e.g. Krakow 1989, Sproat and Fujimura 1993, Browman and Goldstein 1995, Byrd 1996a, Fougeron and Keating 1997, Kochetov to appear), consonant-consonant or consonant-vowel sequences that are not associated with one another should not be subject to COORDINATION constraints. These studies have shown that it is the syllabic position of a consonant that determines how it is coordinated with respect to the preceding or following vowel. In English, for example, the particular coordination of onset /l/ is what makes it a light /l/, whereas coda /l/ has a different coordination that is perceived as dark (Sproat and Fujimura 1993). Another example is that of /s/-initial onsets: when a stop is the second member of an s-cluster, it is not aspirated, a result of the particular coordination relationship between the two consonants in such a cluster (Kingston 1990). The general point to be taken from this research is that particular coordination relationships are defined for consonant-consonant or consonant-vowel gestures that have a syllabic relationship, but no relationship is defined when they are heterosyllabic. Consequently, it follows from the concept of association that only associated gestures should have coordination relationships. Likewise, if the purpose of *OVERLAP is to serve as a markedness constraint prohibiting specific combinations of gestures from co-occurring in the onset, then it must be specified as pertaining only to associated gestures. The general definition for the *OVERLAP constraints can be reformulated as in (25).

(25) *OVERLAP/Fα,Fβ (version 2): Do not overlap the release of a gesture specified for a feature Fα with the plateau of a following associated gesture specified for a feature Fβ.

A similar modification can be made to COORDINATION constraints. Gafos (2002: 278) proposes the generalized definition for the ALIGNMENT constraints that instantiate coordination in (26), which can be revised as in (27).20

19 The intention of the constraints ASSOC-CC and *MULTASSOC departs slightly from the proposal detailed in Gafos (2002). In his analysis, at least one member of an intervocalic consonant cluster is permitted to be associated with both of the vowels surrounding it, as in Figure 35. Gafos proposes this kind of topology to demonstrate why the coordination relations of final consonant clusters give rise to a transitional schwa, but word-medial clusters do not. However, this step may be unnecessary, as becomes evident from Gafos’s analysis of clusters of identical consonants, which also have a transitional schwa word-finally but lack one word-medially. To explain this case, Gafos suggests that there is a V-V relationship governed by a constraint VV-COORD which requires overlap of consecutive vowels. Apparently then, satisfying VV-COORD could lead to an overlap of the consonants in a word-medial cluster, whether or not they are somehow coordinated with one another, either directly or through multiple association. If VV-COORD is enough to account for the more particular case of overlapping identical gestures, it can also account for overlapping heterorganic gestures without requiring medial consonants to be associated to two vowels. 20 It should be noted that the use of alignment in COORDINATION constraints here and by Gafos (2002) differs from the standard definition of alignment in that COORD constraints are not violated if there is a G1 gesture but no G2 present in the output string. The constraints as defined here state that if there are two


(26) ALIGN(G1, landmark1, G2, landmark2): Align landmark1 of G1 to landmark2 of G2
(27) ALIGN(G1, landmark1, G2, landmark2): Align landmark1 of G1 to landmark2 of associated G2

The phonological outputs of a framework incorporating coordination must include the temporal landmark information that is referred to in coordination constraints. According to GAT, the inputs are a linear sequence of gestures that are not yet either associated or coordinated, since forming these relationships is a process that takes place in the phonology.
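The landmark-alignment check that ALIGN constraints such as (26)-(27) perform can be illustrated with a small sketch. The `Gesture` record and the numeric landmark values are hypothetical, chosen only to exemplify the CC-COORD relation in (17).

```python
from dataclasses import dataclass

# Landmarks of a gesture in arbitrary time units; this layout is an illustrative
# assumption, not the dissertation's formal gestural representation.
@dataclass
class Gesture:
    target: float
    c_center: float
    release: float

def satisfies_align(g1, lm1, g2, lm2, tol=1e-9):
    """Check ALIGN(G1, landmark1, G2, landmark2), e.g. CC-COORD in (17):
    ALIGN(C1, release, C2, target)."""
    return abs(getattr(g1, lm1) - getattr(g2, lm2)) <= tol

c1 = Gesture(target=0.0, c_center=0.5, release=1.0)
c2 = Gesture(target=1.0, c_center=1.5, release=2.0)   # close transition: C1 release = C2 target
print(satisfies_align(c1, "release", c2, "target"))   # True
c2_pulled_apart = Gesture(target=1.4, c_center=1.9, release=2.4)
print(satisfies_align(c1, "release", c2_pulled_apart, "target"))  # False
```

The second check models a "pulled apart" configuration, which fails CC-COORD and, on the account developed below, surfaces with a transitional schwa.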

5.3.2.1 Analysis of phonotactically legal forms

In order to illustrate both the inputs and outputs in a constraint-based theory of coordination, sample candidates for the English word spa are shown in (28). As shown in the cell labeled "Input", the input gestures are sequentially arranged, but are not coordinated. The separation of consonants and vowels onto different lines reflects the distinction between the consonant and vowel tiers (see Section 2.1.3). The column labeled "Schematic Form" illustrates possible association and coordination relationships defining different candidates; the constraints that these representations violate are listed in the final column. In the schematic form, connecting lines indicate that the gestures are associated. Coordination relations are also encoded in the schematic form, which will be employed in the tableaux to follow, since it is the clearest way to indicate what the relevant coordination and association relationships are. Solid lines between gestures indicate that the coordination relationship (either CV- or CC-COORD, depending on the gestures in question) conforms to the relevant COORD constraint, and dashed lines signify that the coordination does not satisfy the COORD constraint (either because gestures have been "pulled apart" or "pushed together"). Double dashed lines indicate that the surface coordination violates CC-COORD by more than the minimal distance τ. The form in square brackets underneath the schematic form reflects both the syllabification and the pronunciation that each gestural output representation gives rise to. Since this analysis is only concerned with intergestural relationships and not intragestural ones, it suffices to use the traditional phoneme symbol to represent a constellation of gestures. For example, in the schematic form, "s" represents the coronal tongue tip gesture and wide laryngeal gesture that would typically be shown on a gestural score as the representation of /s/.
The third column, "Gestural Output Representation," illustrates the coordination among the gestures as it is realized on the surface. The boxes underneath the oral gestures in the gestural output representation represent the glottal gesture that is part of the constellation for /s/ and /p/. The intragestural relationship between the glottal gesture and the oral gesture is not part of this analysis, but the glottal gestures are shown in (28) in order to make it clear why there is aspiration in some cases and not in others. In short, whenever the relationship between associated (voiceless) consonants conforms to CC-COORD, their glottal gestures will be merged and no aspiration will appear on the second

20 (cont.) gestures G1 and G2 and they are associated, then their landmarks must be aligned as indicated in the COORDINATION constraints.


consonant of the cluster if it is a stop. A short discussion of glottal opening gestures and their relationship to the oral gesture in voiceless consonants follows the presentation of the candidates. It should also be noted that some of these gestural output representations give rise to the same surface form. In some of these cases, the candidates are harmonically bounded by other candidates and consequently can never win. For others, rankings determined by inputs other than legal clusters like word-initial /sp/ will resolve the question of the correct gestural representation for a particular winning surface form.


(28) Input: the gestures for /s/, /p/ (consonant tier) and /a/ (vowel tier), sequentially arranged but neither associated nor coordinated. (The schematic forms and gestural output representations are diagrams and are not reproduced here; only the surface forms and violation profiles are listed.)

Candidate | Surface form | Constraints violated (ignoring *OVERLAP)
a. | [spa] | CV-COORD (2x)
b. | [spʰa] | CV-COORD, CV-COORD2 (see following discussion)
c. | [spʰa] | ASSOC-CV
d. | [səpʰa] | CC-COORD, CV-COORD, CV-COORD2
e. | [səpa] | CC-COORD, CV-COORD (2x), CV-COORD2
f. | [sə.pʰa] | ASSOC-C, ASSOC-CV, ASSOC-CC
g. | [sə.pʰa] (epenthetic schwa gesture) | DEP
h. | [s͡pa] | ASSOC-CC
i. | [s͡pa] | CC-COORD

The candidate in (28)a, which is the correct representation for English, violates only the constraint CV-COORD. However, note that this constraint is violated twice. This


follows from the proposal in Gafos (2002) that total satisfaction of CV-COORD by both consonants would require total overlap of the two consonants. Since an adjustment in CV coordination must be made to avoid complete overlap, Gafos argues that it is better to displace both consonant gestures a small amount than to displace only one consonant by a larger amount. The two possibilities are shown in (29).

(29) a. Each consonant displaced from the vowel-aligned position by τ: violates *CV-COORD twice, once per consonant.
     b. C2 fully aligned, C1 displaced by 2τ: violates *CV-COORD twice, both times on C1, and additionally violates *CV-COORD2.

In the gestural configuration in (29)a, neither consonant achieves the CV coordination relation for English, ALIGN(C, center, V, onset). However, each consonant is only displaced from the optimal configuration by one temporal unit τ, the distance between the c-center and the target or release, which Gafos proposes as the minimal movement distance that will incur a violation of a constraint. In the case of (29)b, the second consonant satisfies the CV-COORD relationship, but the first consonant violates it by a distance of 2τ. Gafos notes that in (29)a, "the temporal disharmony is equally dispersed between the two consonants. Instead, [(29)b] is maximally harmonic with respect to the second consonant, localizing the temporal disharmony entirely on the first consonant" (320-21). In order to rule out the configuration in (29)b, also given above as candidate (28)b, Gafos proposes that it violates a special form of local conjunction, namely self-conjunction: when the two violations of CV-COORD occur on one gesture, they cause an additional violation of the self-conjoined constraint CV-COORD2, which must be higher-ranked than CV-COORD. It should be noted that a candidate which would satisfy both CV-COORD and CC-COORD for all consonants cannot be produced by GEN, because it is impossible for both consonants to have their centers aligned with the vowel onset and at the same time have the release of C1 coordinated with the target of C2. The candidate in (28)c avoids violations of CV-COORD since there is no association between /s/ and /a/. In the pronunciation of the output corresponding to (28)c, the /p/ would be aspirated since it has the standard singleton voiceless stop-vowel relationship for English.
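Gradient evaluation of displacement in multiples of τ can be sketched in a few lines of Python (the function name and the unit value of τ are hypothetical, for illustration only):

```python
import math

TAU = 1.0  # the minimal unit of temporal distance tau; the value is arbitrary here

def coord_violations(displacement, tau=TAU):
    """Gradient evaluation: one violation of a COORD constraint per multiple
    of tau by which a gesture's landmark is displaced from alignment."""
    return math.ceil(abs(displacement) / tau)

# (29)a: each consonant displaced by tau -> one *CV-COORD violation apiece (two total)
per_consonant = coord_violations(1.0)
# (29)b: C1 displaced by 2*tau -> two *CV-COORD violations localized on one gesture,
# which on Gafos's proposal additionally triggers the self-conjoined CV-COORD2
on_c1 = coord_violations(2.0)
print(per_consonant, on_c1)  # 1 2
```

Dispersing the displacement thus yields one violation per consonant, while localizing it yields two violations on a single gesture, the configuration that CV-COORD2 penalizes.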
In English, a well-formed /s/-initial onset cluster has only one glottal opening (devoicing) gesture that takes place during the production of both the /s/ and the following consonant (Löfqvist and Yoshioka 1981, Yoshioka, Löfqvist and Hirose 1981). This detail of phonetic implementation may be related to the fact that both consonants are individually associated with the vowel and with each other. Since the two consonants are sufficiently overlapped, only one glottal gesture may span the two oral gestures. In (28)c, only the second consonant is associated with the vowel and has the correct CV-COORD relationships, so it is implemented as a singleton consonant would be on the surface: with aspiration. The /s/ also has its own glottal gesture, so this configuration predicts that there should be two separate glottal gestures visible if they were to be measured. Because the second CV association is not formed, this candidate violates ASSOC-CV. The candidate in (28)d, which will ultimately be shown to be the


winner for inputs like fpa, violates both CV-COORD and CC-COORD because the gestures are pulled apart. Furthermore, this candidate must also violate CV-COORD2, because coordinating the consonant gestures such that they do not overlap also entails that C1 has been displaced from its optimal configuration by more than the temporal distance τ. The lack of overlap leads to the production of a transitional schwa between the two consonantal gestures. The violation of COORD constraints in this optimal output for fpa will be forced by higher-ranked *OVERLAP/[−strident],Fβ constraints. In the case of spa, however, the constraint *OVERLAP/[+strident],[+consonantal] is ranked low enough that it will not force violations of COORD constraints. The candidate in (28)e is similar, except that it also violates CV-COORD for the relationship between /p/ and the vowel; this candidate is harmonically bounded by (28)d. In the candidate in (28)f, the gesture for /s/ is associated neither to the consonant nor to the vowel, causing violations of ASSOC-CV, ASSOC-CC, and ASSOC-C. In this candidate, the /s/ is treated as being in its own syllable, and since there is no overlap between /s/ and /p/, a transitional schwa would also result. In (28)g, the epenthesis of a vowel gesture between the two consonantal gestures creates two syllables and violates DEP. In the candidate in (28)h, there is no association between the two consonants, allowing CV-COORD to be totally satisfied for both consonants without violating CC-COORD. On the surface, this entails that the consonantal gestures will be totally overlapping (denoted [s͡p]). However, this candidate violates ASSOC-CC. Candidate (28)i also has totally overlapping consonants, but in this case an association is formed between the two consonants. However, the consonants must be pushed together in order to satisfy CV-COORD for both consonants, and this violates CC-COORD.
In addition to the candidates presented in (28), there are a number of candidates with gestural output representations that are logically possible, but which do not immediately correspond to a sensible surface phonetic form. Some examples are shown in (30). The constraint ranking for English will independently ensure that these candidates lose, in some cases because they are harmonically bounded by candidates already given in (28) (such as (30)c/(28)c). Ultimately, a more principled reason (formal or phonetic) for why these candidates are not optimal will be required. This is an area for future research.

(30) Losing candidates with indeterminate surface forms (schematic diagrams not reproduced)

Candidate | Constraints violated (ignoring *OVERLAP)
a. s p a | ASSOC-C, ASSOC-CV, ASSOC-CC
b. s p a | ASSOC-CV
c. s p a | ASSOC-CV (2x)

The candidates in (28) provide evidence for some of the crucial rankings for English: all constraints must be ranked above CV-COORD, except the *OVERLAP


constraint *OV/[+cons],[+cons], which must be low-ranked in English, and CV-COORD2, whose ranking is not determined by this input (though it would be ranked above CV-COORD if the special-to-general relationship for conjoined constraints holds in this case). This is illustrated in the tableau in (31).

(31) Input: /s p a/ (schematic candidate diagrams not reproduced; ☞ marks the winner)

Candidate | ASSOC-C | DEP | CC-COORD | ASSOC-CV | ASSOC-CC | CV-COORD2 | CV-COORD | *OV/[+cons],[+cons]
☞ a. [spa]  |      |    |      |      |      |      | **   | *
b. [spʰa]   |      |    |      |      |      | *!   | **   | *
c. [spʰa]   |      |    |      | *!   |      |      |      | *
d. [səpʰa]  |      |    | *(!) |      |      | *(!) | *    |
e. [səpʰa]  |      |    | *(!) |      |      | *(!) | **   |
f. [sə.pʰa] | *(!) |    |      | *(!) | *(!) |      |      |
g. [sə.pʰa] |      | *! |      |      |      |      |      |
h. [s͡pa]    |      |    |      |      | *!   |      |      |
i. [s͡pa]    |      |    | *!   |      |      |      |      |
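Strict-domination evaluation of the candidate set in (31) can be sketched as follows. The violation profiles are transcribed from the tableau (surface forms rendered in ASCII, with schwa as "e", aspiration as "h", and overlap as "_"); the helper names are hypothetical.

```python
# Ranking (32) for English, highest-ranked first.
RANKING = ["ASSOC-C", "DEP", "CC-COORD", "ASSOC-CV", "ASSOC-CC",
           "CV-COORD2", "CV-COORD", "*OV/[+cons],[+cons]"]

# Violation profiles for the candidates in (31); unlisted constraints = 0 violations.
CANDIDATES = {
    "a [spa]":                             {"CV-COORD": 2, "*OV/[+cons],[+cons]": 1},
    "b [spha] (C1 displaced 2tau)":        {"CV-COORD2": 1, "CV-COORD": 2, "*OV/[+cons],[+cons]": 1},
    "c [spha] (s unassociated to V)":      {"ASSOC-CV": 1, "*OV/[+cons],[+cons]": 1},
    "d [sepha]":                           {"CC-COORD": 1, "CV-COORD2": 1, "CV-COORD": 1},
    "e [sepha] (both CVs misaligned)":     {"CC-COORD": 1, "CV-COORD2": 1, "CV-COORD": 2},
    "f [se.pha] (s in its own syllable)":  {"ASSOC-C": 1, "ASSOC-CV": 1, "ASSOC-CC": 1},
    "g [se.pha] (epenthesis)":             {"DEP": 1},
    "h [s_pa] (overlapped, unassociated)": {"ASSOC-CC": 1},
    "i [s_pa] (overlapped, associated)":   {"CC-COORD": 1},
}

def eval_winner(candidates, ranking):
    """Under strict domination, the optimal candidate has the lexicographically
    smallest violation vector read off in ranking order."""
    profile = lambda viols: tuple(viols.get(c, 0) for c in ranking)
    return min(candidates, key=lambda name: profile(candidates[name]))

print(eval_winner(CANDIDATES, RANKING))  # -> a [spa]
```

Tuple comparison in Python is lexicographic, so `min` over the profiles implements strict domination directly: a candidate's violations of low-ranked constraints never outweigh a single violation of a higher-ranked one.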


From the tableau in (31), the ranking in (32) is determined:

(32) ASSOC-C, DEP, CC-COORD, ASSOC-CV, ASSOC-CC, CV-COORD2 ≫ CV-COORD, *OV/[+cons],[+cons]

Crucial rankings for more of the constraints can be established by examining how a consonant sequence like /ft/ is syllabified in word-medial position (as in after). In order to account for this cluster word-medially, the *OVERLAP constraint *OV/[−strid],[−son] and *MULTASSOC must be added to the ranking. This is illustrated in the tableau for the hypothetical form /afta/ in (33). (CV-COORD2 is left out to conserve space.)

(33) Input: /a f t a/ (schematic candidate diagrams not reproduced; ☞ marks the winner)

Candidate | *MULTASSOC | ASSOC-C | *OV/[−strid],[−son] | CC-COORD | DEP | ASSOC-CV | ASSOC-CC | CV-COORD | *OV/[+cons],[+cons]
☞ a. [af.tʰa] |      |    |      |    |    | *    | *    |    |
b. [aft.tʰa]  | *(!) |    | *(!) |    |    |      |      | *  | *
c. [af.fta]   | *(!) |    | *(!) |    |    |      |      | ** | *
d. [a.fta]    |      |    | *!   |    |    |      |      | ** | *
e. [a.fətʰa]  |      |    |      | *! |    |      |      | ** |
f. [a.ftʰa]   |      |    | *!   |    |    |      | *    |    | *
g. [a.f.tʰa]  |      | *! |      |    |    | *    | *    |    |
h. [a.fə.ta]  |      |    |      |    | *! |      |      |    |


The first crucial ranking that can be determined from the tableau for the medial cluster in the hypothetical form /afta/ is that ASSOC-CV and ASSOC-CC must be ranked below *OV/[−strid],[−son] and CC-COORD. This is established by the winning candidate in (33)a, which violates both ASSOC-CV and ASSOC-CC. Candidate (33)d, which parses /ft/ as a syllable onset, violates *OV/[−strid],[−son]. Candidate (33)e does not violate *OVERLAP because the gestures are pulled apart, but this leads to a fatal violation of CC-COORD. ASSOC-CV and ASSOC-CC must also be ranked below DEP, as shown by candidate (33)h. Likewise, candidate (33)g, in which /f/ is not parsed into a syllable at all, fatally violates ASSOC-C, showing that ASSOC-C must be ranked above ASSOC-CV and ASSOC-CC. Note that candidates (33)b and c also violate *MULTASSOC. Here *MULTASSOC is not the only constraint ruling these candidates out, but it would be if there were an /s/ instead of an /f/ in the input: *[as.spa] (assuming a non-ambisyllabic analysis of English). In the case of /s/, the *OVERLAP constraint that is violated is too low-ranked to otherwise defeat this candidate. Consequently, *MULTASSOC must also be ranked above ASSOC-CV and ASSOC-CC. From this tableau, the ranking in (34) is established. The ranking for ASSOC-C remains undetermined for the moment, since it could be ranked either above the ASSOC-CX constraints or in the same stratum. The rankings of CV-COORD2 and ASSOC-CC are similarly underdetermined. This is recognized in (34) by listing these constraints in parentheses.

(34) *MULTASSOC, (ASSOC-C), *OV/[−strid],[−son], DEP, CC-COORD ≫ ASSOC-CV, (ASSOC-CC), (CV-COORD2) ≫ CV-COORD, *OV/[+cons],[+cons]

5.3.2.2 Analysis of phonotactically illegal forms

The analysis of phonotactically illegal word-initial clusters in forms like /fpa/ assists in further clarifying the ranking of ASSOCIATION and COORDINATION constraints with respect to other constraints. Normally, it is assumed that when English speakers produce borrowed words or names with a schwa, the schwa results from a phonological process of epenthesis that renders the initial sequence legal in English, as in the composer Dvořák [dəvɔrʒæk] or the pickle brand Vlasic [vəlæsɪk]. In the tableau in (35), an analysis of the standard view, in which epenthesis of a schwa gesture is the preferred repair, is considered. The constraint CV-COORD2 is reintroduced into the hierarchy, as its relative ranking will become important in this section.


(35) Input: /f p a/ (schematic candidate diagrams not reproduced; ☞ marks the winner)

Candidate | ASSOC-C | CC-COORD | *OV/[−strid],[−son] | DEP | ASSOC-CV | ASSOC-CC | CV-COORD2 | CV-COORD
a. [fpa]                 |    |    | *! |   |   |   |   | **
b. [fpʰa]                |    |    | *! |   | * |   |   |
c. [fəpʰa]               |    | *! |    |   |   |   | * | **
d. [fəpʰa]               |    | *! |    |   | * |   |   |
e. [fə.pʰa]              | *! |    |    |   | * | * |   |
☞ f. [fə.pʰa] (epenthesis) |    |    |    | * |   |   |   |

In the tableau in (35), candidates a and b are not optimal because they both violate *OV/[−strid],[−son]. Candidate e, which treats the initial [f] as being in its own syllable, fatally violates ASSOC-C. Candidates c and d both violate CC-COORD, which is ranked above DEP in this analysis of the base English grammar. Thus, candidate f, which has an epenthesized schwa, is the winner. If English speakers represent words like Dvořák or Vlasic with an underlying initial cluster, then this ranking seems plausible as a base grammar for English. Consequently, inputs with phonotactically illegal sequences would be repaired with a phonological schwa when produced as English words. However, this does not necessarily have to be the case. When producing these loanwords, a speaker (especially a literate one) may, in some sense, be aware of the fact that these words do not or should not contain a vowel between the initial consonants even in the output.21 This may be especially true in situations when speakers are either

21 On the other hand, when speakers produce such words with transitional schwas, hearers acquiring the word through that mode of transmission may assume that there is actually a real schwa (rather than one


trying to learn a new language or faithfully reproduce words with a phonotactically illegal target in an experimental condition. When speakers attempt to produce non-English clusters in borrowed words or acquire new ones, not only do they have to learn which *OVERLAP constraints have to be outranked by a faithfulness or coordination constraint, but they also have to learn the coordination relation that exists between the consonants in the cluster. In some cases, this may mean learning a new coordination relation that is different from the one active for the native language, or it may simply require promoting the existing CC-COORD constraint above the *OVERLAP constraints. In the case of the fricative-initial clusters that English speakers are trying to produce in the experiment in Chapter 3, being as accurate as possible in the experimental situation requires speakers to determine where to rank CC-COORD with respect to the *OVERLAP constraints. Given that the Czech-legal target clusters as produced by the speaker in the experiment did not have an audible release, it can be hypothesized that it is sufficient for English speakers to rerank the existing CC-COORD constraint in order to accurately produce the target stimuli. If it is left in the base position, speakers will only produce /s/-initial clusters correctly. However, a speaker may experiment with the placement of CC-COORD in an attempt to ascertain the ranking that will allow her to correctly produce the non-native targets. The results of the ultrasound experiment provide support for this hypothesis: instead of epenthesizing a schwa, speakers fail to accurately coordinate the gestures. The repair consists of pulling the gestures apart so that they do not overlap, violating CC-COORD. Instead of assuming that CC-COORD ≫ DEP is the appropriate ranking for English, it is hypothesized here that DEP ≫ CC-COORD is at least equally likely to be the correct ranking for English.
This ranking means that speakers repair *OVERLAP violations by “pulling apart” the gestures, which violates CC-COORD. The outcome of this ranking is exemplified in the tableau in (36).

which arises as a result of non-overlapping coordination). In this case, the hearer is likely to posit a schwa in the underlying representation, thereby relieving the phonology of having to repair an illegal cluster.


(36)  Input: /f p a/ for candidates (a)–(f); candidate (g) has an input with an underlying schwa, /f p ə a/.
      Constraints, left to right: ASSOC-C | DEP | *OV/[−strid],[−son] | CC-COORD | ASSOC-CV | ASSOC-CC | CV-COORD2 | CV-COORD

      a.     [fpa]       *!  **
      b.     [fpʰa]      *!  *
      c. ☞   [fəpʰa]     *   *   **
      d.     [fəpʰa]     *   *!
      e.     [fə.pʰa]    *!  *   *
      f.     [f͡pa]       *!  *
      g.     [fə.pʰa]    *!

This tableau provides evidence for one more crucial ranking: ASSOC-CV ≫ CV-COORD2. The winning candidate (36)c contains a violation of CV-COORD2, while candidate (36)d instead violates ASSOC-CV; since (36)c wins, ASSOC-CV must be ranked above CV-COORD2. The ranking for the English base grammar, including all of the constraints that have been discussed so far, is given in (37). The relative ranking of ASSOC-CC with respect to ASSOC-CV and CV-COORD2 remains undetermined in this grammar; this is shown in parentheses.

(37) *MULTASSOC, (ASSOC-C), DEP, *OV/[−strid],[−son] ≫ CC-COORD ≫ ASSOC-CV, (ASSOC-CC) ≫ CV-COORD2 ≫ CV-COORD, *OV/[+cons],[+cons]
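The selection of a winner under strict domination can be sketched computationally. The following Python fragment is an illustrative sketch, not the dissertation's formalism: the candidate set and violation profiles are simplified stand-ins for tableau (36) (schwa written @), and only the effect of the DEP/CC-COORD ranking on the choice of repair is modeled.

```python
def optimal(ranking, candidates):
    """Strict domination: the winner has the lexicographically smallest
    violation profile, read off constraint by constraint in ranking order."""
    def profile(violations):
        return tuple(violations.get(c, 0) for c in ranking)
    return min(candidates, key=lambda name: profile(candidates[name]))

# Simplified candidates for /fpa/: a faithful overlapped cluster, a gesturally
# mistimed ("pulled apart") cluster, and true vowel epenthesis (adds a segment).
candidates = {
    "[fpa] faithful":     {"*OV/[-strid],[-son]": 1},
    "[f@pa] mistimed":    {"CC-COORD": 1},
    "[f@.pa] epenthesis": {"DEP": 1},
}

# DEP >> *OVERLAP >> CC-COORD: gestural mistiming is the optimal repair.
repair_dep_high = optimal(["DEP", "*OV/[-strid],[-son]", "CC-COORD"], candidates)

# CC-COORD >> DEP instead: vowel epenthesis becomes the optimal repair.
repair_cc_high = optimal(["*OV/[-strid],[-son]", "CC-COORD", "DEP"], candidates)
```

Under strict domination only the relative order of the constraints matters, which is why a single reranking of DEP and CC-COORD flips the repair from mistiming to epenthesis.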


In the next section, the reranking strategy of English speakers will be explored. It will be argued that in the experimental context, speakers treat CC-COORD as a floating constraint that can be ranked in various positions with respect to the family of *OVERLAP constraints. It will also be shown how a floating constraint analysis can account for the actual proportions of accurate versus mistimed clusters produced by the speakers. It should be noted that it is possible that some speakers do in fact have the ranking CC-COORD ≫ DEP. As the ultrasound results showed, two speakers’ productions did not necessarily conform to the predicted tongue shape pattern for gestural mistiming, suggesting that they could be epenthesizing a schwa instead. For these speakers, there are two possibilities. One is that the relevant floating constraint is the faithfulness constraint DEP, and these speakers do not ever modify the coordination of phonotactically illegal consonant sequences. The other possibility is that they have one strategy when dealing with items borrowed into the English lexicon (epenthesis) but another when acquiring a new language or participating in an experimental task (gestural mistiming). In the latter situation, these speakers would not only have to learn the relative ranking of CC-COORD with respect to *OVERLAP constraints (except *OV/[+cons],[+cons]), but they would also have to make sure that it was ranked below DEP. For the sake of simplicity, only the grammar in which speakers already have DEP ≫ CC-COORD will be considered in the analysis of the experimental production in the next section.

5.4. Accounting for variability in production

5.4.1. Variation in OT grammars

A number of studies have demonstrated that variation can arise in phonological processes, despite the fact that variation is often used as a diagnostic for gradient, phonetic processes (Reynolds 1994, Anttila 1997a, b, Nagy and Reynolds 1997, Boersma 1998, Ellis and Hardcastle 1999, Boersma and Hayes 2001, Davidson et al. 2003). In this situation, a rule or constraint may be active some proportion of the time, giving rise to one phonological output, while the remainder of the time another output is attested (e.g. Labov 1969, Cedergren and Sankoff 1974, Guy and Boberg 1997). The problem discussed by Anttila (1997a, 1997b), for example, concerns optional use of different allomorphs for the Finnish genitive plural in a single phonological environment. In the speech of Faetar speakers investigated by Nagy and Reynolds (1997), word endings may be deleted in a number of ways, giving rise to multiple variants that are optionally produced for the same lexical item and consistently accepted by listeners.

Several formal analyses of phonological and syntactic variation have been presented within a version of Optimality Theory which employs floating constraints (Reynolds 1994, Anttila 1997a, b, Nagy and Reynolds 1997, Nagy and Heap 1998, Walker 2000, Legendre, Hagstrom, Vainikka and Todorova 2001b, Davidson and Goldrick 2003, Davidson et al. 2003, Davidson and Legendre 2003). This framework, in which certain constraints do not have a fixed rank with respect to other strictly ranked constraints, can explain how the phonology can give rise to multiple outputs. Floating constraints account for variation by allowing one or more constraints to move over a range of multiple constraints. Typically, this occurs when markedness constraints are fixed and faithfulness constraints are promoted in order to make a particular structure legal either in borrowing, language acquisition, or during


experimental production. For example, floating constraints have been used to account for the variation in the presence of overt morphology in the acquisition of tense and agreement by children learning French, Catalan, and Mandarin Chinese (Legendre, Hagstrom, Tao, Chen and Davidson 2001a, Legendre et al. 2001b, Davidson and Goldrick 2003, Davidson and Legendre 2003). In the case of the French learners, for example, it was proposed that two separate faithfulness constraints requiring tense and agreement morphology to be realized overtly (PARSEAGREEMENT and PARSETENSE) float with respect to two markedness constraints prohibiting the projection of functional structure (*F: “Do not have a functional projection” and *F2: “Do not have more than one functional projection”) (Legendre et al. 2001b). A schematic of the floating range for one stage for the French learners is shown in (38).

(38)  *F2 ≫ *F

      PARSEA and PARSET float over the range spanning *F2 and *F.

The illustration in (38) indicates that on any given optimization, PARSEA and PARSET may be ranked anywhere with respect to the markedness constraints *F and *F2. Though they are not fixed in the grammar at this stage, they are fixed for each optimization, or production of each utterance. If a child attempts to produce a first person past tense form and the current ranking is *F2 ≫ PARSEA ≫ *F ≫ PARSET, then the optimal output will be one that realizes first person agreement, but no overt tense marking. Alternatively, if both PARSE constraints are ranked below the *F constraints, then the child will use a root infinitive (the default form for French). The empirical data for French indicates that each PARSE constraint is equally likely to be ranked either above, between, or below the *F constraints, but this does not necessarily hold true for all cases of language acquisition (see Davidson and Goldrick 2003).

In a number of other cases of variation, the grammar responsible for allowing multiple outputs cannot be explained by defining the floating range of one or two constraints. For example, Zuraw (2000) accounts for nasal substitution in Tagalog and the exceptions to the process that speakers exhibit when producing loans or foreign words with a grammar that requires a minimum of 9 constraints. Because speakers do not consistently repair illegal sequences the same way and because some lexical items resist repair, no one fixed grammar can be deduced to account for nasal substitution in Tagalog. Nevertheless, there is a pattern in the types of outputs that are admitted, so Zuraw concludes that constraints are stochastically ranked (Stochastic OT: Boersma 1998, Hayes and MacEachern 1998, Hayes 2000, Boersma and Hayes 2001). In a stochastic grammar, each constraint has a probability distribution and constraints may overlap to varying degrees. An example from a hypothetical grammar is shown in (39):


(39)  Hypothetical probabilistic constraint ranking

      range of C1 / range of C2 / range of C3   (overlapping probability distributions along the ranking scale)

      majority outcome:        C1 ≫ C2 ≫ C3
      less probable outcome:   C2 ≫ C1 ≫ C3
      “vanishingly rare”:22    C1 ≫ C3 ≫ C2

The diagram in (39) demonstrates how the specific variation arising from the interaction of a number of constraints can be determined. The constraint C1 is high-ranked and overlaps C2 to some extent. It would not be surprising for the ranking C2 ≫ C1 to be found on a given optimization, but this grammar indicates that the ranking C1 ≫ C2 will be more frequent. C3, on the other hand, will almost never be ranked above either C2 or C1. It has been shown that stochastic rankings can be learned by the Gradual Learning Algorithm (GLA: Boersma and Levelt 2000, Zuraw 2000, Boersma and Hayes 2001, Boersma, Escudero and Hayes 2003), which extracts information from lexical frequency to determine what the ranking is and how much overlap is necessary among constraint distributions.

Notably, stochastic OT as proposed by Boersma (1998) cannot account for all attested types of variation. Because stochastic OT in combination with the GLA requires each constraint to have the same probability distribution (relative to a variable mean), it cannot be used to model a case like that of the French morphology example, in which a constraint floats over two other constraints and can be found above, between, or below them. A fixed probability distribution was implemented in the GLA in order to restrict the types of variation that may occur, but it is possible that this is not the right restriction after all. In fact, it can be argued that the situation necessary for the French and Catalan child language acquisition facts is illustrated in (40):

(40)  Ranking of constraints with non-equal distributions

      range of C1 / range of C2 / range of C3

      attested outcomes:   C2 ≫ C3 ≫ C1
                           C2 ≫ C1 ≫ C3
                           C1 ≫ C2 ≫ C3

22 While there are technically no fixed constraints in this account, strict-domination effects arise when only the very tail ends of two constraints overlap. Since the floating range of a constraint is a normal distribution, the constraint will have a tendency to be found in the center of the range. In a case like (39), Boersma and Hayes point out that the likelihood of C3 outranking C2 should be “vanishingly rare”.
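The stochastic evaluation sketched in (39) can be simulated directly: each constraint has a fixed ranking value, and on each optimization the same amount of Gaussian noise is added to every value before sorting. The ranking values below are hypothetical, chosen only so that C1 and C2 overlap slightly while C3 sits far below them.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical ranking values for the three constraints in (39); every
# constraint gets the same evaluation-noise SD, as in Boersma (1998).
ranking_values = {"C1": 100.0, "C2": 96.0, "C3": 80.0}
NOISE_SD = 2.0

def sample_ranking():
    """One optimization: perturb each ranking value with Gaussian noise
    and read off the resulting total order, highest first."""
    noisy = {c: v + random.gauss(0.0, NOISE_SD) for c, v in ranking_values.items()}
    return tuple(sorted(noisy, key=noisy.get, reverse=True))

counts = Counter(sample_ranking() for _ in range(10_000))
# Expected pattern: C1 >> C2 >> C3 is the majority outcome, C2 >> C1 >> C3 is
# less probable, and any ranking of C3 above C2 is vanishingly rare.
```

Because only the tails of the C3 distribution reach the C2 range, rankings with C3 above C2 essentially never occur, reproducing the strict-domination effect described in footnote 22.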


In the next section, it is shown that a ranking like that described in (40) is also necessary to account for the results of the non-native cluster production experiment presented in Chapter 3. Currently, the GLA is not able to learn such a ranking, but this should not be taken as evidence that the scenario in (40) is unattested. On the contrary, it may be that while constraints and their distributions take the form illustrated in (39) in the final state of a grammar, the ranking situation in (40) occurs during the learning process or in experimental situations (Davidson and Goldrick 2003, Davidson et al. 2003, Davidson and Legendre 2003). To capture this with the GLA, the algorithm would have to be modified, but that is not the goal of this study.

5.4.2. A floating constraint analysis of English experimental performance

In Section 5.3.1, it was shown that the *OVERLAP constraints detailed in that section have an internal ranking. The ranking among these constraints results from a stringency relationship among both the Fα features and the Fβ features in *OVERLAP/Fα,Fβ. For the fricative-initial onset clusters under consideration, universally /sC/ is the most harmonic of all the clusters, followed by /fC/ and /zC/, which are not ordered among themselves, and finally by /vC/, which is the least harmonic sequence of all. The Fβ features [−sonorant] and [−approximant] are also similarly related in that [−approx, +son] is more harmonic than [−approx, −son]. Consequently, for each Fα, the sequences /Fα+[−approx, +son]/ and /Fα+[−approx, −son]/ may be conflated in the sense of de Lacy (2002), or /Fα+[−approx, +son]/ may be treated as more harmonic than /Fα+[−approx, −son]/ in a language, but the opposite is never true. The combination of Fα and Fβ in the ranking of *OVERLAP constraints for English is given again in (41). In cases in which there is no intrinsic hierarchy between certain constraints, the evidence for the ranking comes from English speakers’ performance on the experiment in Chapter 3.

(41) *OV/[−strid]&[+voi],[−son] ≫ *OV/[−strid]&[+voi],[−approx], *OV/[+voi],[−son] ≫ *OV/[+voi],[−approx], *OV/[−strid],[−son] ≫ *OV/[−strid],[−approx] ≫ *OV/[+cons],[+cons]

In production, speakers attempt to accurately produce the /f/-, /z/-, and /v/-initial target clusters. Ultimately, ranking CC-COORD higher than all *OVERLAP constraints is necessary if a speaker is to produce each of these clusters accurately on each attempt. However, as demonstrated in the OT acquisition literature, it is usually not the case that learners can change a ranking so dramatically with little or no data, so one way to accomplish the goal is to posit a floating range for a constraint (e.g. Legendre et al. 2001b, Davidson and Goldrick 2003). If the speakers in the experiment were really to attempt to acquire the target clusters, allowing CC-COORD to float over the *OVERLAP constraints provides them with a gradual mechanism for learning which clusters must be produced correctly. If the acquisition of a new grammar in second language learning is similar to first language acquisition, then it might be expected that a speaker would not radically restructure her grammar immediately.


The interaction of these constraints with CC-COORD, the constraint which is allowed to float over the range of *OVERLAP constraints by speakers intending to produce the non-native clusters correctly, can be illustrated with a simple example. As demonstrated by the experimental results from Chapter 3, participants produce /f/-initial clusters with higher accuracy than the other illegal target clusters. While the constraints in the base grammar in the final state of English must be ranked such that CC-COORD dominates *OV/[+cons],[+cons] but is lower-ranked than all other constraints in (41), speakers attempting to faithfully reproduce an /f/-initial cluster under experimental conditions can sometimes successfully rerank CC-COORD above the constraints *OV/[−strid],[−approx] and *OV/[−strid],[−son]. A speaker who is able to rank CC-COORD above both *OV/[−strident],Fβ constraints will be able to produce all /f/-initial clusters. This is shown in (42). In the tableaux in this section, only the ranking of CC-COORD relative to the *OVERLAP constraints will be depicted. DEP is ranked above CC-COORD, and the rankings of the ASSOCIATION constraints and CV-COORD with respect to the *OVERLAP constraints are as stated in (37). These rankings ensure that speakers will exhibit only gestural mistiming as a repair for phonotactically illegal sequences.

(42)  Constraints, left to right: CC-COORD (reranked position) | *OV/[−strid],[−son] | *OV/[−strid],[−approx] | CC-COORD (base position) | *OV/[+cons],[+cons]

      /fmatu/
      a. ☞  [fmatu]      *
      b.    [fəmatu]     *!

      /fkada/
      c. ☞  [fkada]      *
      d.    [fəkada]     *!  *

In the tableau in (42), the optimal candidate for both /fN/ and /fO/ word-initial clusters is the one with correct coordination for English. Notice, however, that *OV/[−strid],[−approx] and *OV/[−strid],[−son] are crucially ranked with respect to one another. The stringency relation between these two constraints does not necessitate this ranking, but it is posited because performance on /fN/ clusters is significantly more accurate than performance on /fO/ clusters. The results indicate that speakers can rerank CC-COORD to be positioned between *OV/[−strid],[−approx] and *OV/[−strid],[−son],


producing /fN/ clusters with greater accuracy than /fO/ clusters. However, because the individual markedness of the first and second consonant gesture contribute equally to the overall markedness of the cluster, some of the constraints will occupy the same stratum in the hierarchy. For example, the second member of /fO/ clusters is the more marked of the two possible types of gestures varied in the experiment (obstruent or nasal), but in /zN/ clusters, the /z/ is the more marked member. In the comparison of pairs like /fO/ and /zN/, the overall markedness is equivalent, and in fact these two types of clusters are produced with equal accuracy by the speakers. The reranking of CC-COORD in the tableau in (43) would result in the accurate production of both /zN/ clusters and /fO/ clusters, but not /zO/ clusters. The constraints *OV/[+voi],[−approx] and *OV/[−strid],[−son] are in the same stratum. For the sake of simplicity, it is illustrated as if CC-COORD cannot be ranked between them, but the same production results are accounted for if CC-COORD can be ranked between *OV/[+voi],[−approx] and *OV/[−strid],[−son] and their ranking relative to one another may also change on a given optimization.

(43)  Constraints, left to right: *OV/[+voi],[−son] | CC-COORD (reranked position) | *OV/[+voi],[−approx] | *OV/[−strid],[−son] | CC-COORD (base position)

      /fkada/
      a. ☞  [fkada]      *
      b.    [fəkada]     *!

      /zmafo/
      c. ☞  [zmafo]      *
      d.    [zəmafo]     *!

      /zbasi/
      e.    [zbasi]      *!  *
      f. ☞  [zəbasi]     *


In order for the local conjunction to have an effect, the constraints *OV/[−strid]&[+voi],[−approx] and *OV/[−strid]&[+voi],[−son] which rule out /v/-initial clusters are ranked above the corresponding contextually simple *[−strident] and *[+voice, −son] constraints. Like the other pairs of *OVERLAP constraints, *OV/[−strid]&[+voi],[−approx] is ranked above *OV/[+voi],[−approx], but it does not have to be ranked above *OV/[+voi],[−son]. In fact, according to the results of the experiment, *OV/[−strid]&[+voi],[−approx] must be in the same stratum as *OV/[+voi],[−son], since participants exhibit equal accuracy on /vN/ clusters and /zO/ clusters. Performance on /vO/ clusters, however, is significantly reduced compared to /vN/ clusters, as correctly predicted by the ranking *OV/[−strid]&[+voi],[−son] ≫ *OV/[−strid]&[+voi],[−approx]. This is illustrated in (44), in which CC-COORD is reranked above *OV/[−strid]&[+voi],[−approx].

(44)  Constraints, left to right: *OV/[−strid]&[+voi],[−son] | CC-COORD (reranked position) | *OV/[−strid]&[+voi],[−approx] | *OV/[+voi],[−son] | CC-COORD (base position)

      /zbasi/
      a. ☞  [zbasi]      *
      b.    [zəbasi]     *!

      /vmape/
      c. ☞  [vmape]      *
      d.    [vəmape]     *!

      /vbaza/
      e.    [vbaza]      *!  *  *
      f. ☞  [vəbaza]     *


Without data from a task in which speakers are faced with phonotactically illegal forms, the “hidden” rankings of the *OVERLAP constraints would not be evident since they are not reflected in native English forms. Speakers who correctly produce illegal clusters at least some proportion of the times that they attempt them seem to be able to move coordination constraints out of the position they occupy in the base grammar of English, which ultimately uncovers the hidden rankings. The next question this leads to, then, is what the idea of “percent correct” on the experimental targets corresponds to in grammatical terms. One possibility is that at the beginning of the task, speakers rerank CC-COORD to some position in the hierarchy and allow it to remain there for the duration of the experiment. However, this predicts that they should be performing at 100% for those clusters whose corresponding markedness constraints are ranked below the coordination constraint, while never accurately producing those clusters governed by higher ranked constraints. In fact, speakers do not show all-or-none performance, suggesting that they are not reranking the coordination constraint at the beginning of the experiment and leaving it in that position. Instead, it indicates that the speakers are more likely reranking the coordination constraint spontaneously for each trial. Such reranking is possible because CC-COORD is allowed to float over the range of markedness constraints that prohibit the illegal clusters. For each optimization, CC-COORD is assigned a fixed position in the hierarchy. This position is potentially different for each attempt of a target cluster. Some clusters, but not others, will be possible depending on where the coordination constraint lands each time it is reranked. This is demonstrated in (45):

(45)  … ≫ 5 *OV/[−strid]&[+voi],[−son] ≫ 4 *OV/[−strid]&[+voi],[−approx], *OV/[+voi],[−son] ≫ 3 *OV/[+voi],[−approx], *OV/[−strid],[−son] ≫ 2 *OV/[−strid],[−approx] ≫ 1 *OV/[+cons],[+cons]

      (The numbers 1–5 mark the possible docking positions of CC-COORD, which floats over this range.)

The fact that CC-COORD can float to any position in the hierarchy predicts that if it floats above a higher constraint some proportion of the time, clusters banned by a lower constraint must be accurately produced even more often. The position marked 1 defines the base grammar of English, allowing only /s/-initial clusters. By reranking CC-COORD to position 2, speakers correctly produce /fN/ clusters, whereas elevating CC-COORD to position 3 adds /fO/ and /zN/ clusters to the inventory. Further elevations of CC-COORD lead to the incremental inclusion of /zO/ and /vN/, and finally /vO/ clusters. The simplest hypothesis regarding the floating range of CC-COORD is that of “uniform floating”, in which each of the positions 1–5 is an equally likely resting place (Anttila 1997a). Uniform floating entails that clusters that can be attained from a greater number of docking positions should be attested more often. For example, if there is equal probability that CC-COORD will land in any of the five positions whenever an illegal cluster is attempted, then /fN/ clusters should be correctly pronounced approximately 80% of the time, since four of the five positions allow them. These predicted percentages are shown next to the observed experimental percentages in Table 12:


                EXPERIMENT                             THEORY (Uniform Floating)

Stratum   Clusters               Observed Correct         CC-COORD   Visitation    Predicted
                                 (mean across subjects)   Position   Probability   Correct
1         sm,sn / sp,st,sk       97%                      1          20%           100%
2         fm,fn                  75%                      2          20%           80%
3         fp,ft,fk,fs / zm,zn    54%                      3          20%           60%
4         zb,zd,zg / vm,vn       31%                      4          20%           40%
5         vb,vd,vg               18%                      5          20%           20%

Table 12. Observed proportion correct versus predicted proportion correct

Predictions based on uniform floating of CC-COORD constraint.

The probabilities in Table 12 are based on the assumption that speakers can rank CC-COORD either above or below the constraints in a stratum, but not between them. Another possibility is that there is both additional (uniform) floating between the constraints unranked with respect to one another in a particular stratum, and the two constraints in a stratum are freely rankable. To take stratum 3 as an example, on a given optimization in this type of floating constraint scenario, CC-COORD could be found not only above or below *OV/[+voi],[−approx] and *OV/[−strid],[−son]; rather, all of the partial grammars in (46) would be possible:

(46)  CC-COORD ≫ *OV/[+voi],[−approx] ≫ *OV/[−strid],[−son]
      CC-COORD ≫ *OV/[−strid],[−son] ≫ *OV/[+voi],[−approx]
      *OV/[+voi],[−approx] ≫ CC-COORD ≫ *OV/[−strid],[−son]
      *OV/[−strid],[−son] ≫ CC-COORD ≫ *OV/[+voi],[−approx]
      *OV/[+voi],[−approx] ≫ *OV/[−strid],[−son] ≫ CC-COORD
      *OV/[−strid],[−son] ≫ *OV/[+voi],[−approx] ≫ CC-COORD
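The uniform-floating predictions of Table 12 reduce to a simple position count: a cluster stratum surfaces correctly from every docking position at or above the one that outranks its banning constraint. A minimal sketch (assuming the five positions of (45), each equally likely; the stratum labels are shorthand, not the dissertation's notation):

```python
from fractions import Fraction

# Lowest docking position of CC-COORD from which each cluster stratum is
# produced correctly, per the hierarchy in (45); 1 is the English base position.
REQUIRED_POSITION = {"sC": 1, "fN": 2, "fO/zN": 3, "zO/vN": 4, "vO": 5}
N_POSITIONS = 5  # docking positions 1-5, each assumed equally likely

predicted = {stratum: Fraction(N_POSITIONS - pos + 1, N_POSITIONS)
             for stratum, pos in REQUIRED_POSITION.items()}
# e.g. /fN/ clusters are licensed from positions 2-5, so 4/5 = 80%
```

Evaluating `predicted` yields 100%, 80%, 60%, 40%, and 20% for strata 1 through 5, the rightmost column of Table 12.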

The proportion of correct performance predicted by applying the same type of floating constraint analysis to all of the strata with multiple constraints is given in Table 13. The new predictions with the additional floating are shown in the third column, and the predictions based on the original analysis in Table 12 are in the fourth column.


Cluster   Observed Correct          Predicted Correct        Predicted Correct (no floating
Type      (mean across subjects)    (additional floating)    within strata, from Table 12)
sC        .97                       1.00                     1.00
fN        .75                       .88                      .80
fO        .53                       .66                      .60
zN        .56                       .66                      .60
zO        .31                       .33                      .40
vN        .31                       .33                      .40
vO        .18                       .11                      .20

Table 13. Observed proportion correct versus predicted proportion correct. Predictions based on two possible types of floating of CC-COORD constraint.

The predicted proportions correct for the two types of floating in Table 13 are comparable and are about equally similar to the observed proportions correct. In keeping with the Optimality Theoretic assertion that the final state of the grammar is fully ranked (Prince and Smolensky 1993, Tesar and Smolensky 2000) and the idea in floating constraint accounts of acquisition that a fully ranked hierarchy is produced for each optimization (Legendre et al. 2001b, Davidson and Goldrick 2003, Davidson and Legendre 2003), it is assumed that speakers can in fact rank CC-COORD between the constraints in a stratum and that those constraints themselves are rerankable.

When the averaged proportions for all speakers are considered, the uniform floating hypothesis is very nearly upheld. However, an examination of individual speakers’ performance shows that there is in fact a considerable amount of variation. This is demonstrated by the scatter plots in Figure 36, in which performance on the clusters in a given stratum is compared to performance in the next highest stratum (e.g. fm,fn vs. fp,ft,fk,fs / zm,zn). In these scatter plots, the strata from the first type of floating possibility in Table 12 are shown to simplify the discussion. Regardless of the probability of CC-COORD floating to any particular position, if strata are strictly defined by the grammar, then an individual speaker should perform at least as accurately on lower strata as on higher strata. In the plots in Figure 36, the proportion correct on a lower stratum is plotted on the X-axis, whereas the proportion correct on the next higher stratum is on the Y-axis. It is predicted that all points will lie on or below the principal diagonal (y = x). This prediction is borne out for all but a small number of the speakers, and even for most of these speakers the data points are still close to the diagonal.


[Figure 36 consists of three scatter plots of per-speaker accuracy, each with the lower stratum on the X-axis (0.0–1.0) and the next higher stratum on the Y-axis (0.0–1.0): Stratum 2 (fN) vs. Stratum 3 (fO/zN); Stratum 3 (fO/zN) vs. Stratum 4 (zO/vN); Stratum 4 (zO/vN) vs. Stratum 5 (vO).]

Figure 36. Scatter plots indicating performance on successively higher ranked strata
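The prediction that every point in these plots lies on or below the diagonal is a simple monotonicity condition on a speaker's per-stratum accuracies, and can be stated mechanically. A sketch with hypothetical accuracy vectors (the example numbers are illustrative, not experimental data):

```python
def consistent_with_fixed_strata(accuracies):
    """True if accuracy never increases from a lower stratum to a higher one,
    i.e. every Figure-36-style point lies on or below the y = x diagonal."""
    return all(higher <= lower
               for lower, higher in zip(accuracies, accuracies[1:]))

# hypothetical speakers, accuracies on strata 1-5
speaker_ok  = [0.95, 0.80, 0.55, 0.30, 0.20]   # consistent with the hierarchy
speaker_bad = [0.95, 0.40, 0.55, 0.30, 0.20]   # stratum 3 beats stratum 2
```

A speaker whose vector fails this check is producing some higher-stratum cluster more accurately than a lower-stratum one, which no docking position of CC-COORD in the fixed hierarchy of (45) can generate.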


These plots show that while speakers do not necessarily conform to the proportion correct predicted by the uniform floating hypothesis for each cluster stratum, the performance of almost all of the speakers is nevertheless consistent with the fixed ranking of markedness constraints given in (45). In other words, assuming that all English speakers share this common hidden ranking of constraints, then for any given speaker, performance on clusters in stratum 2 equals or exceeds performance on clusters in stratum 3, which equals or exceeds performance on clusters in stratum 4, and so on. Determining the floating ranges for CC-COORD for each individual speaker is not the goal of the analysis presented in this section, but it should be noted that differences among speakers can be attributed to speakers’ varying ability to allocate the cognitive resources necessary to elevate the coordination constraint (see Davidson et al. 2003). Furthermore, non-uniform floating is not limited to experimental production; it can also be found in child language acquisition (Davidson and Goldrick 2003).

5.4.3. The origin of hidden rankings

A hidden ranking analysis of any data, whether experimental or in language acquisition, inevitably raises the question of the origin of these hidden rankings. There are two possibilities regarding the basis for hidden rankings, between which the data in this dissertation cannot decide. The first possibility is that the ranking in (45) is present in the initial state. In this case, the constraint hierarchy can be considered a default ranking that can change given the appropriate input during first language acquisition. Children acquiring languages like Hebrew or Serbo-Croatian will then learn that the ranking || *OV/[+voice],[−son] ≫ *OV/[−strid],[−son] || must be reversed. This possibility also predicts that speakers of languages that do not have the relevant initial clusters should show the same pattern as English speakers on the experiment in Chapter 3, since they would have had no impetus for changing the default ranking. This could be easily tested by administering the experiment to speakers of languages like Spanish, Chinese, or Hindi. Second, such rankings could be language-specific. One way language-specific rankings might occur is as a result of some aspect of the learning process (Tesar and Smolensky 2000, Zuraw 2002), which could end up leading to particular hidden strata, even if they will never have an effect on the base English phonology. In a discussion of the different phonemes that second language learners of varying language backgrounds substitute for /θ/, O’Connor (2002) claims that not only are hidden rankings likely to be language-specific, but that Optimality Theory additionally predicts interspeaker variation, since more than one fully ranked hierarchy can be consistent with the learning data for one’s native language. For example, she cites evidence that French speakers learning English may substitute either [s] or [t] for /θ/.
However, as the data for the production of non-native onset clusters indicate, English speakers seem to share a common hidden ranking. The differences between O’Connor’s findings and those presented here may indicate that some hidden rankings exist because they form a default hierarchy, whereas others may vary both by language and even by speaker. Before such a claim can be verified with respect to the cluster data, however, a much larger-scale study of speakers of different languages would have to be carried out. This is an area for future research.


5.5. The special case of /f/-initial clusters

The formal analysis developed in this chapter is based on the assumption that speakers are using gestural mistiming as the repair for all of the illegal sequences that they produce. This method of repair was determined from the results of the ultrasound experiment in Chapter 4. However, it is conceivable that speakers actually have different production strategies that are specific to the cluster type that they are attempting to produce. This is most plausible for /f/-initial clusters. Unlike the voiced clusters, the coordination of the consonants in voiceless clusters is not only a matter of determining the relationship between C1 and C2; speakers must also learn the appropriate glottal configuration, since an active abduction gesture is required for voicelessness. This may be especially relevant for clusters of two voiceless consonants like /fp/, /ft/, /fk/ and /fs/. As already discussed, word-initial /s+voiceless obstruent/ sequences in English are produced with only one glottal gesture that spans the duration of the oral gestures for the two individual consonants (Yoshioka et al. 1981, Hoole 1997). Browman and Goldstein (1986) interpret this finding as a phonological requirement for only one devoicing gesture per syllable. However, Kingston (1990) points out that the single, large glottal abduction gesture that has its peak opening during the strident may be unique to strident fricative-initial onset clusters.
Though he has no production data to verify this, Kingston argues that the dearth of other fricative-initial clusters cross-linguistically suggests that the coordinative structure for the laryngeal gestures in /s+voiceless obstruent/ sequences, “with the loss of an aspiration contrast in the stop it produces, may not be generalizable to other sequences of a continuant followed by a stop” (428). While no glottal information has been collected for the speakers in the Czech cluster experiment in Chapter 3, it may be that the English speakers are facing two separate coordination issues when it comes to producing /f/-initial clusters: learning how to coordinate the oral gestures with respect to the CC-COORD relationship specified for English, and ascertaining the appropriate laryngeal configuration for these sequences, especially /f+voiceless obstruent/. One possibility is that speakers attempt to impose the same configuration that is used for /s/-initial sequences; another is that each consonant in the cluster is produced with its own laryngeal abduction gesture. Unfortunately, there are no reported glottography or endoscopy data from Slavic to indicate whether the acoustic target generated by the Czech speaker was produced with one or two glottal openings. Such information could be used as a guide to at least speculate about what an English speaker’s strategy might be. A detailed acoustic examination of English speakers’ production of /fp/, /ft/, and /fk/ clusters suggests that speakers are in fact treating oral and glottal configurations as two separate objectives that must ultimately be resolved if the sequences are to be produced accurately. Taking a subset of the data from the Repetition condition in the experiment in Chapter 3, there are 240 tokens of /f+voiceless obstruent/ sequences that were produced (20 participants x 3 clusters x 4 items per cluster).
These tokens can be classified in terms of whether the second consonant is aspirated or not, whether a vowel is present between the two consonants, and whether that vowel is voiced or not.


Number of Utterances | Example Surface Form | Oral/Glottal Status | Experimental Classification

(a) 37% (88/240) | [fpa] | Overlapping oral gestures; single glottal gesture | “Correct”
(b) 17% (41/240) | [fpʰa] | Overlapping oral gestures; two glottal gestures | “Correct”
(c) 2% (4/240) | [fə̥pa] | Mistimed oral gestures; single glottal gesture? | “Insertion”
(d) 5% (12/240) | [fə̥pʰa] | Mistimed oral gestures; two glottal gestures | “Insertion”
(e) 8% (19/240) | [fəpa] | Mistiming or real epenthesis?; two glottal gestures (but incorrectly coordinated?) | “Insertion”
(f) 13% (32/240) | [fəpʰa] | Mistiming or real epenthesis?; two glottal gestures (correctly coordinated) | “Insertion”

Table 14. Details of the production of /f+voiceless obstruent/ sequences. Data comes from the Repetition condition of the experiment in Chapter 3. Productions are classified as to whether the vowel is voiced or voiceless and whether or not the second consonant shows aspiration. The actual proportion and raw number of each utterance type is shown in the first column. A possible interpretation of the surface form in terms of the configuration of the oral and glottal gestures is in the third column, and the classification of each of these forms used in the experiment is given in the fourth column.
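For concreteness, the six-way classification in Table 14 can be read as a decision procedure over the three binary acoustic criteria (vowel present, vowel voiced, stop aspirated). The following sketch is purely illustrative; the function name and the boolean encoding are inventions for exposition, not the coding scheme or scripts actually used in the experiment.

```python
def classify_token(vowel_present: bool, vowel_voiced: bool, aspirated: bool):
    """Map one /f+voiceless obstruent/ token to its surface-form type
    (a)-(f) and its experimental classification, following Table 14."""
    if not vowel_present:
        # No vocalic period between the consonants: [fpa] vs. [fpʰa]
        return ("b" if aspirated else "a"), "Correct"
    if not vowel_voiced:
        # Devoiced vocalic period: [fə̥pa] vs. [fə̥pʰa]
        form = "d" if aspirated else "c"
    else:
        # Voiced vocalic period: [fəpa] vs. [fəpʰa]
        form = "f" if aspirated else "e"
    return form, "Insertion"

# Example: a token with a voiced vowel and an aspirated stop, like [fəpʰa]
print(classify_token(True, True, True))  # → ('f', 'Insertion')
```

Note that the first two criteria collapse when no vowel is present, which is why the six attested surface types, rather than eight, exhaust the logical space.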

Sample spectrograms for selected surface form types are shown in Figure 37.

(a) Surface form type (a) (no vowel, burst but no aspiration on stop): S11 fkale, annotated [f k a l e]


(b) Surface form type (d) (devoiced vowel, aspiration on stop): S1 fkale, annotated [f ə̥ kʰ a l e]

(c) Surface form type (f) (voiced vowel, aspiration on stop): S12 fkale, annotated [f ə kʰ a l e]

Figure 37. Spectrograms demonstrating different productions of /f+voiceless obstruent/ clusters from the experiment in Chapter 3.

Without glottographic or endoscopic information about the status of the glottis during English speakers’ production of these sequences, it is only possible to speculate about the articulatory configurations for each of the surface forms in Table 14. It is hypothesized that for surface form types (a) [fpa] and (b) [fpʰa], which do not


contain any evidence of a vowel between the consonants, the speakers are producing the oral gestures in conformity with CC-COORD, such that there is overlap between the consonants and no open vocal tract. In (b), however, there is aspiration on the stop, which would not be expected if speakers were imposing the same glottal coordination that is used for English /s+voiceless obstruent/ initial sequences. It may be that these productions include two sequentially produced glottal gestures, each one affiliated with an individual consonant in the cluster. As suggested above, this configuration may be available to speakers who are trying to determine whether all /voiceless continuant+voiceless obstruent/ sequences have the same glottal configuration as /s+voiceless obstruent/. Surface form type (a) is likely produced with only one glottal opening. Like (a), surface form type (c) [fə̥pa] does not have an aspirated second consonant. However, (c) also contains a devoiced vocalic period, which suggests that these sequences may be produced with oral gestures that are pulled apart (i.e. mistimed). This would suggest that the devoicing gesture spans the period between the consonants during which they are not overlapped. Alternatively, it could be that speakers are producing a voiceless stop with nearly 0 ms voice onset time, but this is very unlikely given that native-like VOT values are very difficult for second language learners to acquire, even after long periods of learning (e.g. Flege and Eefting 1987a, Flege and Eefting 1987b). Furthermore, since there were only 4 utterances of surface form type (c), it does not seem to be a very productive form for the participants. Surface form type (d) [fə̥pʰa] may be analyzable as having a glottal gesture for each of the consonants and mistiming of the oral gestures. This gives rise to a voiceless vowel between the consonants and aspiration on the stop.
The major difference between types (c), (d) and types (e), (f) is that the acoustic record indicates that the vocalic portion between the two consonants is voiceless for the former, whereas in the latter, the vocalic material is voiced. In this case, there is definitely a period of vocal fold vibration between the two voiceless consonants, but it is not clear whether this arises from the epenthesis of a phonological schwa, or from gestural mistiming which leads to a sufficiently large vocal tract opening that produces spontaneous voicing. In (e) [fəpa], the second consonant is not produced with aspiration, which would not be expected if there were phonologically two syllables in /fə.pa/. The surface form in (f) [fəpʰa] is closer to what an output with phonological epenthesis should look like. However, it could still be possible to get (f) if each consonant formed a constellation with a separate glottal gesture, and then these constellations were coordinated such that they violated CC-COORD (did not overlap) but not ASSOC-CV (both consonants are associated to the vowel). Because there is currently no model of the vocal tract that can determine whether or not spontaneous voicing is possible in this situation, it will be assumed that it is feasible, and that in fact speakers prefer this option.23 On the other hand, given the extra complication of having to coordinate glottal gestures in addition to the oral gestures in the /f/-initial cases, it would not necessarily be

23 It should be noted that a number of phoneticians have been consulted on this issue, and while they could not be absolutely certain, none of them were willing to rule out the possibility that spontaneous voicing could occur if two voiceless gestures were pulled far enough apart while the speaker was in a speech-ready state. Any real resolution of this issue, however, requires further research.


surprising to find that speakers use a different repair for voiceless sequences than they do for voiced ones. The analysis in Section 5.4 assumes that speakers repair the violation of *OVERLAP by pulling the consonantal gestures apart. However, when the two consonant gestures are pulled apart, the relationship that exists between the two oral gestures of the consonants and the intended single glottal gesture is disrupted. It is possible that violating a constraint pertaining to the optimal oral-glottal configuration is worse than epenthesizing a vowel and violating DEP. Since the phonological configuration that gives rise to the oral-glottal relationship reported in Yoshioka et al. (1981) is not well understood, and because it is not clear what the glottal motions of the speakers in the production study actually are, determining whether the repairs for /f/-initial sequences are the same as those for voiced sequences must be left for future research.

5.6. Summary

The results of the experiment in Chapter 3 show that when asked to produce non-native word-initial consonant clusters, speakers reliably distinguish between clusters even though none of them are found in the legal English cluster inventory. In this chapter, it was shown how these distinctions can arise from a “hidden” hierarchy of ranked markedness constraints that speakers can access in an experimental situation. The relative costliness of the poor perceptibility of weak-intensity labiodental fricatives and the articulatory difficulties of voiced fricatives cannot a priori determine the relative markedness of /f/, /z/, and /v/. This is reflected phonologically in a scale that treats /fC/ and /zC/ clusters as being equally harmonic, with /sC/ clusters more harmonic and /vC/ clusters less harmonic. The stringency relationships posited for fricative-initial clusters predict that there is constrained variation both in the types of cluster inventories that can be found cross-linguistically, and in the behavior speakers will exhibit in an experimental condition. Both of these sources of data upheld the predictions. In the present analysis, both the phonotactic constraints and the repair that speakers exhibit were characterized in terms of phonological gestural coordination. As the ultrasound results show, participants repair phonotactic violations not by epenthesizing a vowel with its own gestural target, but by reducing the overlap between the two consonants in the cluster. It has been argued that this repair is also phonological in nature, and that coordination constraints that are independently necessary are also recruited in the repair of phonotactic violations. In English, the constraint CC-COORD, which governs the coordination relationship between consonants in a cluster, states that the release of the first consonant must be coordinated with the target of the second consonant.
This relationship pertains to associated consonants, which are determined by Gestural Association Theory. Given that articulatory implementation of gestures has been shown to be sensitive to syllable position, GAT has been developed as an alternative to specifying syllable structure as part of the lexical entry. When the association and coordination of two consonants violates a phonotactic *OVERLAP constraint, it can be repaired either by epenthesizing a new vowel gesture, or by “mistiming” the consonant gestures—pulling them apart so that they no longer violate *OVERLAP. It was argued that English speakers do the latter in the experimental conditions. The formal analysis developed in this chapter accounts for the fact that speakers do not exhibit all-or-nothing performance on the illegal word-initial clusters, but rather produce them accurately some proportion of the time. It was proposed that speakers can


treat certain constraints (here, CC-COORD) as floating constraints, which can be reranked when speakers attempt to attain the grammar which allows such word-initial clusters. Individual speakers’ ability to rerank the floating constraint may vary, but in almost all cases they respect the common hidden ranking posited for the English grammar. The idea that the final state of the native language grammar is fully ranked, and that it can affect speech production under certain circumstances, has important consequences for language acquisition, contact, and production.


CHAPTER 6. Coordination in Fast Speech Schwa Elision

In the previous chapter, a framework for incorporating gestural coordination in the grammar was examined with respect to English speakers’ production of non-native word-initial consonant clusters. In this chapter, another sort of production data—pre-tonic schwa deletion in English fast speech—is investigated in an effort to begin to determine what other types of speech production phenomena may be in the domain of phonological coordination, as opposed to phonetic (or motoric) implementation. Browman and Goldstein (1990b) argue that many of the surface characteristics of casual speech production that have traditionally been considered a result of phonological processes can be accounted for by changes in gestural overlap or magnitude. For example, nasal assimilation might occur if the release of a nasal followed by a stop is substantially overlapped by the stop, so that its own place of articulation is acoustically obscured. However, Browman and Goldstein are not explicit about whether this kind of relationship among gestures is determined by coordination considerations at a phonological level, or whether it occurs in phonetic implementation after the phonology has produced a “well-formed” output. If it is indeed the case that there is a principled distinction between categorical, phonological processes and gradient, phonetic ones, then it must ultimately be possible to determine the level at which changes affecting gestural properties like overlap and magnitude reside. Pre-tonic schwa deletion has attracted much attention in the phonological literature as an example of how varied registers or speaking styles may allow different phonotactic structures or apply different phonological rules from those found in the canonical grammar of English (e.g. potential → *[pt]ential vs. felicity → [fl]icity) (Zwicky 1972, Hooper 1978, Kaisse 1985).
These accounts tend to be impressionistic, so there has been little quantitative data regarding whether deletion is equally likely to be found in all environments, how often it occurs when it is found, and whether there is a tendency for deletion to be more frequent in environments that would lead to legal onsets (but see Glowacka 2001 for British English). This kind of information is crucial to understanding whether the schwa “deletion” process is phonologically or phonetically governed, since it might be predicted that phonological deletion would be influenced by phonotactic restrictions. That is, if deletion is a phonological process, it would be expected that it would be considerably more common when it would result in a legal word-initial cluster for English. In this chapter, pre-tonic schwa deletion is investigated in order to examine three issues: (a) whether “deletion” is actually deletion of the schwa gesture or rather substantial gestural overlap, (b) whether the nature of the surrounding consonants affects either deletion or the amount of overlap, and (c) if “deletion” is a result of gestural overlap, whether it is possible that the overlap is governed by phonological coordination constraints.


6.1. Pre-tonic schwa elision

6.1.1. Weak vowel deletion in the phonology

The claim that English speakers reduce or delete schwa in pre-tonic initial syllables (abbreviated as /#CəC-/) in fast speech has been incorporated into analyses of diachronic change, syllable structure, sonority scales, and the phonology of connected or casual speech (Zwicky 1972, Hooper 1978, Kaisse 1985, Roca and Johnson 1999). In these proposals, schwa deletion may result in the formation of both legal and illegal word-initial consonant clusters, as in semester → smester or fatigue → ftigue. With the exception of “universally unacceptable” initial clusters such as *rmember (Zwicky 1972), these accounts are not concerned with whether the resulting cluster is found in the English cluster inventory per se. The examples of candidates for “pre-stress contraction” discussed by Zwicky (1972), for instance, include derivative, galoshes, and ferocious, but the same list also contains development, vicinity, and demonstrative. He mentions that some words, such as Decameron, revised, and pedestrian, do not undergo deletion “presumably for phonetic reasons” (284), but does not claim that it is because these would result in word-initial clusters that are illegal in English. Hooper (1978) frames the question of schwa deletion in terms of a competition between a tendency to delete unstressed vowels and a pressure to maintain universal constraints on syllable structure (based on Greenberg 1965). In the case of word-initial, pre-stress schwa deletion, she claims that rapid speech provides adequate motivation for the overriding of syllable structure conditions. While the influence rapid speech has on vowel deletion is not explicitly articulated, it can be assumed that deletion is driven by a physical preference to minimize articulatory effort (Flemming 1995, Boersma 1998, Kirchner 1998/2001, Flemming 2001).
Unlike Zwicky, who allows for the possibility that some /#CəC-/ sequences will not exhibit schwa deletion (despite failing to explain the properties of those sequences), Hooper argues that all /#CəC-/ sequences, such as /#stop-ə-stop-/, /#fricative-ə-stop-/, /#obstruent-ə-nasal-/, etc., are candidate environments for schwa deletion. Despite the general acceptance of Hooper’s claim, empirical evidence demonstrating the frequency and nature of schwa deletion and reduction in fast speech is sparse. Typically, the conclusions regarding when schwa deletion occurs are based on introspective judgments, leading to the impression that pre-tonic schwa deletion occurs frequently (Zwicky 1972, Hooper 1978). There are, however, two corpus studies that categorized schwa deletion in English speech using empirical data, mainly collected through perceptual judgments involving syllable counting and supplemented by examinations of spectrograms in unclear cases. Dalby (1986) examined schwa deletion in initial, medial, and final syllables in fast and slow read speech and speech from television news broadcasts. For the read speech, Dalby found that pre-tonic schwa was deleted in 2% of the tokens in slow speech and 44% in fast speech. In broadcast speech, pre-tonic schwa was deleted in 9% of the tokens. Dalby also presents a rough examination of the effects of the place, manner and voice of the surrounding consonants on deletion rates, finding that deletion was more common when the schwa in question was preceded by a stop or a fricative than when preceded by a sonorant. A closer analysis shows that while deleted forms may contain sequences of consonants that are otherwise not found in English, other restrictions, such as the number of consonants in the onset or the general


sonority profile of the clusters, are mostly upheld. Dalby argues that these findings “suggest that fast-speech consonant changes such as deletion and assimilation are part of an overall strategy to reduce the number of syllables in an utterance while preserving the well-formedness of surface syllabization (sic)” (78). A similar corpus study which emphasized deletion rates in conversational speech was carried out by Patterson, LoCasto and Connine (2003). With the main focus of this study being the effect of stress, position in the word, and word frequency on schwa deletion, these researchers examined pre-tonic deletion in two- and three-syllable words and post-tonic deletion in three-syllable words using the Switchboard corpus (Godfrey, Holliman and McDaniel 1992). Using a coding technique similar to Dalby’s, it was found that pre-tonic schwa deletion occurred in 15.4% of the tokens for high-frequency two-syllable words, 6.2% for low-frequency two-syllable words, 15.1% for high-frequency three-syllable words, and 12.3% for low-frequency three-syllable words. Logistic regressions showed that frequency was not a significant predictor of schwa deletion. Patterson et al. do not discuss the influence of the consonants flanking the schwa on deletion rates, but an examination of their data in the appendix reveals that nearly all of the candidates for pre-tonic schwa deletion would result in an English-legal onset cluster if the schwa were deleted (such as believe → blieve). In their concluding remarks, Patterson et al. are primarily interested in the ability of speakers to recognize a target word that has undergone schwa deletion, though they also tentatively note that pre-tonic schwa deletion does not appear to be phonological deletion, but may be the result of a “phonetic realization rule” (62). However, without a comparison of their data to environments in which deletion would lead to the formation of illegal onset clusters, it is difficult to assess this claim.
Though Patterson et al.’s data may not shed light on the issue of whether the legality of the resulting cluster influences pre-tonic schwa deletion rates, other research has suggested that perhaps the phonological nature of schwa deletion should be questioned regardless of the resulting phonotactics. Considering the recent work which has shown that processes traditionally thought of as phonological—such as nasal assimilation, closed syllable vowel shortening, retroflexion, full vowel reduction, post-nasal voicing, stressed vowel lengthening and sandhi—are very similar to processes that have been considered phonetic—such as coarticulation or nasalization (Cohn 1993, Zsiga 1995, 1997, Ellis and Hardcastle 1999, Flemming 2001), it may be that pre-tonic schwa deletion is a similar process. As has been observed previously, it may be possible to characterize segmental deletion simply as the extreme endpoint of a phonetic reduction process (Browman and Goldstein 1990b). Alternatively, it is also possible that one characteristic of casual speech or faster speech rates is greater overlap of sequential gestures (Byrd and Tan 1996). If this is the case, then apparent deletion may actually reflect extreme overlap which obscures the schwa, even though it is phonologically produced. This issue is more fully explored in the next section.

6.1.2. Acoustic and articulatory evidence: Elision as overlap

Two acoustic studies of schwa deletion in /#CəC-/ sequences have both suggested that in cases where the schwa appears impressionistically to be deleted, there is still phonetic evidence on the surface which indicates that the form must contain a schwa at some level of analysis (Manuel, Shattuck-Hufnagel, Huffman, Stevens, Carlson and


Hunnicutt 1992, Fougeron and Steriade 1997). For French, Fougeron and Steriade (1997) used electropalatography to examine the articulatory and acoustic differences between de rôle [dəʁol] “of role”, the elided form d’rôle [dʁol], and the monomorphemic drôle [dʁol] “funny”. Results indicated that for all participants, the [d] in d’rôle contained significantly longer linguopalatal contact, a longer lingual occlusion, and was less likely to be lenited than the [d] in drôle. These phonetic differences between the contracted example and the single lexical item led to the conclusion that the motor program corresponding to de was maintained even though the schwa was not acoustically present in d’rôle. Manuel et al.’s (1992) acoustic examination of the status of schwa in support as compared to speakers’ production of sport in casual versus careful speech indicated that speakers allow significant variability when producing the vowel. While some tokens of support in casual speech provided no acoustic evidence of a schwa, a number of other utterances contained either a vowel, a voice bar, or aspiration between the [s] and the [p]. Furthermore, it was reported that the indications of a vowel were also accompanied by aspiration on the [p]. These two results led to the conclusion that the glottal gesture for the underlying schwa is present in casual speech, and that speakers also retain the same glottal-oral timing found in careful speech. A study by Fokes and Bond (1993) also examined the phonetic effects of weak vowel deletion, comparing the production of /#səC/-initial sequences, real initial clusters, “inadvertently created” clusters in which speakers appeared to delete a vowel, and “deliberately created” clusters in which speakers were essentially told to produce an /#səC/-initial word with an initial cluster. The test sequences were triads of /#sp/-/#səp/-/#s’p/ and /#sk/-/#sək/-/#s’k/ initial words, where the apostrophe refers to the deliberately created clusters.
The results showed that despite well-attested findings that singleton /s/ and the VOT of singleton consonants are longer than those in clusters, speakers did not reliably use singleton consonant values when producing created clusters. On the other hand, the values were not necessarily always like those in real clusters, either. In other words, these phonetic characteristics of speakers’ created clusters (those which resulted from vowel deletion) were not consistently more similar to either real clusters or /#s´C/ sequences. Furthermore, when these productions were presented to listeners who were asked to identify whether the words contained a cluster or a /#s´C/ sequence, the identification of words with created clusters as having clusters was often between 50-70%—essentially at chance—for many of the tokens. While these findings are somewhat inconclusive, they nevertheless suggest that pre-tonic schwa deletion does not simply lead to well-formed consonant clusters on the surface. While the results from Fougeron and Steriade (1997), Manuel et al. (1992), and Fokes and Bond (1993) are useful in pointing out the inadequacy of a purely phonological analysis, it must be noted that these studies, like that of Jannedy (1994) discussed in Section 2.1.2, looked only at schwa deletion that leads to a legal cluster in each of the languages. None of these studies address the role of phonotactics in determining whether schwa deletion is allowed, which is one of the main questions of the present study. Beckman (1996) attempts to provide an analysis of some of the findings detailed in this section in terms of gestures and the types of coordination that are often attributed


to gestural relationships. Following similar analyses in Browman and Goldstein (1990b), Beckman hypothesizes that if the apparent deletion of schwa in various environments in English

…is demonstrably nothing more than the extreme endpoint of an attested continuum of degrees of reduction and confusability, then the continuously variable values of overlap in the gestural score are a better representation than a categorical phonological rule of schwa deletion for fast-speech pronunciations of words (100).

In addition to English pre-tonic schwa deletion, Beckman also presents similar types of phenomena that occur in Japanese, Korean, and Montreal French. More specifically, the devocalization of vowels in certain environments in these languages and the apparent deletion in English and German can both potentially be attributed to greater overlap between the vowel and either preceding or following consonants, since the articulatory or aerodynamic environments imposed by the consonants may eclipse the acoustic presence of the vowel. In addition to the production issues raised by vowel deletion or devoicing, Beckman also discusses the potential for these phonetic processes to affect prosodic reanalysis. This refers to the types of categorical changes that can be found over time (see Ohala 1974, 1981), perhaps as in the “obligatory” schwa deletion found word-medially in words like camera or every. Beckman remarks that while phonotactic considerations may not preclude extreme reduction or deletion in production, reanalysis of these forms will certainly be blocked if they do not conform to the phonotactic patterns of the language. Other factors that may contribute to the possibility of reanalysis are stress, metrical structure, the existence of potential homophones, and the sociolinguistic significance of reduction in a particular language. Beckman’s discussion of prosodic reanalysis suggests another important point that has generally been overlooked by both phonologically and phonetically-oriented researchers in the study of pre-tonic schwa deletion. If schwa deletion is phonological, it might be expected that it would interact with other phonological phenomena. Specifically, it is plausible that while deletion may occur in many types of /#C´C-/ environments, those environments which would result in phonotactically legal onset clusters after deletion might show higher rates of deletion. 
Without analyzing empirical data, phonological accounts have just assumed that deletion occurs across the board (Zwicky 1972, Hooper 1978, Kaisse 1985, Roca and Johnson 1999), whereas phonetic accounts have typically not investigated enough different environments to shed light on this question (Manuel et al. 1992, Fokes and Bond 1993). Yet, an interaction between deletion and phonotactics would be a strong indication that deletion involves the modification of the gestural score, not just gestural overlap. This issue will be one of the aspects of pre-tonic schwa deletion discussed in this chapter.

6.1.3. The influence of speaking rate on gestural coordination

The effect of speaking rate on the coordination relationships between different types of gestures is important because it sheds light on the way that gestural patterns can be manipulated. To the extent that “paralinguistic” factors may influence phonetic and/or phonological regularities (Liberman 1983, Gafos to appear), examining the changes that occur as a result of speech rate (or registers of formality, or sociolinguistic variables such


as group membership, etc.) may indicate not only what types of modifications are possible, but at what level they reside (i.e. phonology vs. phonetic implementation). Research on how speaking rate affects gestural shape and coordination is often descriptive, but instructive in detailing the types of changes that have been observed. Gay (1981) discusses both acoustic and electromyographic (EMG) data indicating that segmental duration does decrease as speech rate increases, but that it does not do so uniformly. Generally, vowels are more compressed than consonants as rate increases. Another variable that has been investigated is velocity, which has been shown to increase for some articulators in rapid speech (e.g. Gay, Ushijima, Hirose and Cooper 1974, Gay 1981), but not for others (e.g. Benguerel and Cowan 1974). Adams, Weismer and Kent (1993) found that the most reliable speech-rate induced modification with respect to velocity was that the velocity profile changed from a symmetrical, single-peaked function at faster rates of speech (i.e. velocity changes occurred smoothly and rapidly) to a multi-peaked function in slower speech (i.e. velocity changes were discontinuous). In an investigation of the control of rate and duration in the tongue dorsum, Ostry and Munhall (1985) found that changes in rate were implemented differently by individual speakers. For two of their speakers, the change from the slow to fast condition was reflected by a decreased average tongue dorsum movement amplitude, whereas the third speaker increased the average maximum velocity for the tongue lowering gesture. These are just a few of the variables that are affected by speech rate. As suggested by Beckman (1996), the factor most relevant to the study of pre-tonic vowel deletion in fast speech is the amount of gestural overlap found at various speech rates, and whether it changes as a function of rate. 
A number of studies have provided both acoustic and kinematic measures indicating that rate does induce greater overlap in various environments, including intrasyllabically and across word boundaries (Engstrand 1988, Boyce, Krakow, Bell-Berti and Gelfer 1990, Munhall and Löfqvist 1992, Zsiga 1994, Solé 1995, Byrd and Tan 1996, Tjaden and Weismer 1998, Shaiman 2001). For example, in their study of voiceless consonants at the word boundary in the phrase Kiss Ted, Munhall and Löfqvist (1992) found that at slow rates, each consonant retained its own glottal opening gesture, but that at quicker rates of speech, only one glottal opening motion was observed. They conclude that the single opening gesture results from greater overlap of the consonants and subsequent blending of the glottal gestures. Munhall and Löfqvist also carry out simulations of this effect, assuming that greater overlap is the only type of change that occurs as a function of rate. However, this is not necessarily the case. Many studies of speech rate show that there is a decrease in the total duration of CV, C#CV, VCC sequences, and so on, but it is not necessarily the case that smaller durations result from increased overlap only. Byrd and Tan’s (1996) examination of electropalatography (EPG) data demonstrated that while there was some greater gestural overlap for consonants at word boundaries at faster rates, it is also true that the duration of the individual consonants could decrease. It is also worth emphasizing the fact that Byrd and Tan examined overlap at word boundaries; their results may or may not be generalizable to word-internal gestural coordination patterns. Shaiman (2001) looked at the production of CVC, CVCC, and CVCCC words to determine how vowel length is affected by both speaking rate and coda composition (number of consonants in the coda). Using a kinematic jaw opening measure, she found that changes in vowel duration as a function of speaking rate are implemented by both

increasing gestural overlap and decreasing the magnitude of individual gestures, whereas increasing the number of consonants in codas led only to greater overlap of the vowel by the consonants, not to decreased magnitude. Tjaden and Weismer (1998) examined properties of formant transitions in order to investigate the extent of gestural overlap. Tjaden and Weismer predicted that if there is greater overlap of gestures at faster rates of speech, then the F2 onset frequency in the vowel of CVC words should covary with vowel duration (see also Weismer, Tjaden and Kent 1995a, b). More specifically, if overlap of the first consonant and the vowel leads to the obscuring of part of the vowel gesture by the consonantal closing gesture, then even more of the vowel should be eclipsed at faster rates and higher degrees of overlap. By the time the vowel is acoustically present in the more overlapped tokens, it will be at a later point in the vowel and closer to the F2 target values than to the F2 onset values. Results from regression functions showed that F2 onset frequency was significantly related to vowel duration. This suggests that F2 onset frequency is sensitive to rate and gestural overlap. However, this covariation only occurred over extreme changes in speaking rate; in other words, when small variations in rate were implemented, there was no change in the relationship between F2 onset values and vowel duration. Tjaden and Weismer take this to suggest that the linear variations in gestural overlap as implemented in Munhall and Löfqvist’s (1992) simulation may not be accurate. Furthermore, because small variations do not have an effect on F2 transition values, Tjaden and Weismer suggest that there may be categorical differences in the way that overlapping gestures are implemented in faster and slower speech. In addition, they mention that gestural magnitude, though not directly investigated in their study, may also be affected by speech rate. 
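Munhall and Löfqvist’s blending finding, discussed above, can be illustrated with a toy simulation. The sketch below is not their model: it simply treats each glottal opening gesture as a Gaussian activation curve, sums the two, and counts local maxima in the summed trajectory, so that sufficient overlap collapses two peaks into a single “blended” one. All parameter values (gesture width, separations) are invented for illustration.

```python
# Toy illustration of gestural blending (not Munhall & Lofqvist's actual
# simulation): two glottal opening gestures are modeled as Gaussian
# activation curves and summed. Widely separated gestures yield a
# two-peaked trajectory; sufficiently overlapped gestures yield one peak.
import math

def glottal_trajectory(separation_ms, width_ms=30.0, step_ms=1.0):
    """Sum of two Gaussian gestures whose centers are separation_ms apart."""
    center1 = 100.0
    center2 = center1 + separation_ms
    n_steps = int((center2 + 150.0) / step_ms)
    return [math.exp(-((t * step_ms - center1) / width_ms) ** 2)
            + math.exp(-((t * step_ms - center2) / width_ms) ** 2)
            for t in range(n_steps)]

def count_peaks(signal):
    """Count strict local maxima in a sampled trajectory."""
    return sum(1 for i in range(1, len(signal) - 1)
               if signal[i - 1] < signal[i] > signal[i + 1])

print(count_peaks(glottal_trajectory(separation_ms=120)))  # slow rate: 2
print(count_peaks(glottal_trajectory(separation_ms=30)))   # fast rate: 1
```

For two equal-width Gaussian gestures of this form, the summed curve stays single-peaked whenever the separation falls below the gesture width times √2, which is one way of picturing how a fast-rate reduction in inter-gesture timing can make two glottal openings surface as one.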
Zsiga (1994) also investigated the relationship between overlap and F2 transitions, focusing on whether greater consonantal overlap in Vd#C sequences would increasingly affect the values of the F2 transitions for the vowel as speech rate increased. She hypothesized that if /d/ and the following consonant were substantially overlapped, then the vowel may demonstrate the characteristic F2 transition patterns corresponding to the place of the consonant following the /d/. Furthermore, as rate increases, the influence of the second consonant may have an even more dramatic effect on the F2 transitions. Zsiga found that while there is an influence of the second consonant at normal speech rates, there is no consistent further effect at increased speech rates. However, she does find evidence of increased overlap in that there was an increase in the ratio of vowel duration to the closure duration of the following consonants for some tokens. In other words, when there is greater overlap between consonant closure gestures, the total duration of the consonant closure is shorter, and so the ratio of vowel duration to closure duration increases. Although Zsiga (1994), Byrd and Tan (1996), Tjaden and Weismer (1998) and Shaiman (2001) all present similar conclusions regarding the effect of overlap on various kinematic or acoustic measures, they also find that there is substantial variability among speakers with respect to how speech rate affects overlap (see also Kuehn and Moll 1976, Lubker and Gay 1982, Adams et al. 1993, Shaiman, Adams and Kimelman 1997, Matthies, Perrier, Perkell and Zandipour 2001 for further evidence regarding variation in the implementation of speech rate changes). Tjaden and Weismer (1998) point out that such wide variability has two important consequences. First, it is risky to make

conclusions based on data from one or two speakers, since two different speakers may use entirely different mechanisms for implementing fast vs. slow speech. Second, the fact that there may be numerous mechanisms employed by speakers to vary speech rate is interesting in itself, and is deserving of further study. Individual speaker variability will be discussed again below in the context of the experimental data presented in Section 6.2. In the next section, an experiment designed to investigate pre-tonic schwa deletion in fast speech in a large number of phonotactic environments is presented. It is hypothesized that even if speakers are not actually deleting the vowel gesture, then the appearance of deletion may arise from the considerable overlap of the first consonant gesture in the initial syllable of the word with the following schwa. A number of factors are addressed in the results in Section 6.3, including apparent vowel deletion—called elision—from the acoustic record (Section 6.3.2), significant periods of devoicing or aspiration occurring between the consonants of the /#C´C-/ sequences (Section 6.3.4), the results for individual sequences to determine the effect of phonotactic legality (Section 6.3.5), the behavior of individual speakers (Section 6.3.6), and the effect of word frequency on elision (Section 6.3.7). The potential compatibility of these results with the gestural coordination constraints introduced in Chapter 5 is discussed in Section 6.4.

6.2. Experiment

In this study, a large number of /#C1´C2-/ sequences have been selected in order to examine whether phonotactics play a role in determining whether deletion can occur and clusters can be formed. Despite previous claims by phonologists that pre-tonic schwa deletion is an across-the-board process, it is hypothesized that phonological deletion involving the removal of the schwa gesture should be more frequent when it would result in the creation of phonologically legal word-initial clusters. On the other hand, as suggested by Beckman (1996), it may otherwise be the case that “deletion” is only apparent, and is actually the result of gestural overlap. That is, speakers may not modify the gestural score by deleting the schwa gesture, but they may alter the coordination of the C1 and /´/ to create greater overlap between them (see evidence from Tjaden and Weismer 1998 that speakers can modify the amount of overlap between the onset consonant and following vowel). If the overlap is extreme enough, the schwa may be so eclipsed that it is absent. If this is the strategy that speakers are employing, then it is not necessarily the case that overlap should be sensitive to the phonotactic environment. However, it is possible to make predictions about some phonetic characteristics that should be dependent on the particular consonants in the /#C1´C2-/ sequences; these are introduced in Section 6.2.2 and refined in Section 6.3. Finally, even if the acoustic evidence does suggest that speakers are implementing overlap rather than deletion, it may still be governed by the phonology. More specifically, it could be that coordination constraints are recruited in order to execute differences between speech rates. This is discussed in more detail in Section 6.4.1.
The few experimental studies examining the acoustic properties of the schwa or “deleted” schwa in fast speech have typically been restricted to one or two tokens of /s/-initial or other consonants that lead to potentially legal word-initial clusters (e.g. Manuel et al. 1992, Fokes and Bond 1993). In this study, many more environments are tested in order to examine the hypothesis that the phonotactic characteristics of the resulting cluster influence deletion. Furthermore, two distinct definitions of deletion are applied in

order to garner the most information about the process underlying deletion (or overlap): elision, which refers to the absence of voicing, formant structure, and aspiration (which may signal a voiceless vowel), and schwa devoicing, which takes into account the cases where there is no voicing or voice bar, but where there is a significant period of aspiration or devoicing, signaling the retention of the schwa. Note that this use of “elision” is not meant to have any theoretical import; it is simply meant as a descriptive term to denote the lack of acoustic vocalic information. After this study was completed, it was discovered that another investigation of deletion in a large variety of /#C1´C2-/ sequences in British English had also been carried out. Głowacka (2001) examined whether phonotactically illegal consonant clusters could be created by deletion, both in read speech and in spontaneous speech for 19 British speakers. Głowacka focused mostly on varying the general categories of phonotactic contexts, and her reading passages contained 46 words that were (unevenly) divided into 26 phonotactic environments, such as [voiceless fricative __ voiceless stop] (suppose, chicanery), [voiceless stop __ voiced fricative] (position), [nasal __ voiceless stop] (material), and [voiceless stop __ approximant] (cholesterol, police).24 As in the experiment below, Głowacka is careful to define actual schwa deletion as occurring only when there is neither evidence of voicing, nor any aspiration following a voiceless stop. Large periods of aspiration after voiceless stops are treated as devoiced vowels. Głowacka’s results are shown in the graph in Figure 38. Of the 12 environments which produced more than 10% deletion, five of them had a sonorant as the second member, and four of them had /s/ as the second member. There are five cases which have no deletion, and these are generally either voiced contexts or /r/-initial environments. Głowacka offers the following summary:

The best contexts for unstressed vowel deletion in English are the ones which consist of two voiceless obstruents (e.g. persuade, potential), a voiceless stop followed by a resonant (e.g. tomorrow, solicitor), a voiced obstruent followed by an approximant (e.g. believe, variety), or a nasal followed by an approximant (e.g. malicious) (78).

She also provides some information about the proportion of aspiration in voiceless obstruent-initial sequences. For [voiceless stop __ approximant] sequences (e.g. police, collection), 66% contain significant aspiration, and 9% demonstrate deletion. For [voiceless stop __ nasal] sequences (e.g. tomorrow, contain), there is 49% aspiration and 26% deletion. Since Głowacka does not provide very much detail about her phonetic measurements, it is difficult to know whether it was possible for her to distinguish between the schwa and following consonant, especially /l/ and /r/. It may be that her deletion rates are inflated to some degree if it was impossible to determine the difference between the schwa and a following approximant. Because Głowacka had only one token per individual /#C1´C2-/ sequence and never more than 3 that could be grouped under the same category, it is difficult to determine the reliability of her results. In addition, her speakers were British, so it could be

24 It should be noted that 8 of Głowacka’s stimuli would actually lead to tri-consonantal clusters if deletion occurs, as in discovered, computers, neglecting, etc. This may provide an extra confound regarding the likelihood of deletion in these tokens. In some cases, the increased number of consonants may block deletion. In others, like computers, deletion may occur because /m/ can serve as a nuclear (syllabic) nasal.

interesting to compare her findings to those from speakers of American English. The results of the present experiment are reported below.

#C_C phonotactic environment    Percentage of deletion
v_k                             0
r_s, r_f                        0
m_t                             0
b_g                             0
b_n, d_m                        0
s_bl, f_g                       2.6
n_gl                            2.6
d_t, b_k, d_p                   4.3
d_sk, d_st                      5
l_m                             5.2
d_v                             5.2
s_v                             5.2
v_l                             5.5
t_g, t_d                        5.6
sh_k, s_k                       6.1
k_l, p_l                        9
v_n                             10.5
p_z                             15.7
b_l                             15.7
n_s                             15.7
v_s                             21
k_mp, k_nt, t_m                 26
p_f, p_s                        28.9
p_t                             28.9
s_s                             28.9
s_p                             31.5
m_l                             36.8
v_r                             52.6

Figure 38. Pre-tonic schwa deletion in word-initial position in read speech.

Deletion values for particular sequences or groups of sequences were determined from numbers in tables, figures and the text of Głowacka (2001).

6.2.1. Participants

The participants were 9 Johns Hopkins University undergraduates who received course credit for their participation. All of them were native speakers of English. No speaker reported any hearing impairments or speech impediments.

6.2.2. Materials

Target sequences in this experiment are words with an initial pre-tonic /C´/ syllable (e.g. semester, demolish, Venetian) where deletion of the vowel would lead to the creation of an initial two-member consonant cluster. The set of target words comprises three tokens each of 28 different /#C´C-/ sequences, for a total of 84 words. The target words can be grouped into categories based on certain characteristics of the consonants comprising the clusters. First, it is hypothesized that there should be differences in deletion rates based on whether a sequence is stop-initial or fricative-initial. Because the burst of the stop and formant transitions into a following sonorant are critical cues for identifying a stop, it is hypothesized that it is less likely that the schwa will be deleted in this environment. In addition, since most of the fricative-initial target words begin with /s/, which is a common first member of initial clusters, these sequences are more likely candidates for deletion. Second, it is also hypothesized that the voicing of the first consonant could influence whether the schwa is deleted. If the formation of an initial cluster through deletion is sensitive to the relative markedness of the voicing of the resulting sequence, then there should be less deletion in the voiced category. Third, it is hypothesized that deletion may be greater when the second consonant of the /#C1´C2-/ sequence is /l/. Like /s/-initial clusters, /l/-second clusters are legal in English, and this could affect deletion rates. The fricative-initial and /l/-second categories include sequences that would lead to both legal (i.e. /#s´m-/ → /#sm-/ and /#b´l-/ → /#bl-/) and illegal clusters (i.e. /#s´b-/ → /#sb-/ and /#d´l-/ → /#dl-/). At the start, it is not clear if the potentially illegal sequences are likely to show deletion, or if deletion will be blocked because a non-legal cluster would result.
For the time being, they will be grouped with other fricative-initial and /l/-second sequences. The final categories used in the experiment are shown in Table 15. The number of sequences in each category is not equal, because one of the main goals of the experiment was to obtain a broad cross-section of /#C1´C2-/ sequences that could result in deletion. Rather than limit the number of sequences in some categories, it was decided that it would be better to have an unequal number. The actual words used in the experiment are given in Appendix 3.

Categorization criteria          Sequences used
Voiceless stop-initial           k´p- p´t- p´θ- p´d- t´b- k´m- t´m-
Voiced-initial                   b´g- d´b- d´v- d´m- d´n- v´n- d´k- d´p- b´f- d´f-
(Voiceless) fricative-initial    f´t- s´b- s´p- s´f- s´v- s´m-
/l/-second                       g´l- d´l- m´l- b´l- s´l-25

Table 15. Experimental /#C1´C2-/ tokens

The 84 target words were incorporated into four reading passages that were created for this experiment. The passages ranged in length from 200 to 400 words, and each passage contained between 13 and 27 target items.

6.2.3. Design and procedure

Participants were recorded in a quiet room using a Sony ECM-717 microphone and a Sony MZ-R50 MiniDisc recorder. They were given the four reading passages each on a separate sheet of paper plus a practice passage and told to read them out loud first at a normal reading rate and then a second time, as fast as they could without making mistakes. The order of presentation of passages was randomized for each participant.

6.2.4. Analysis

The acoustic data for each participant were digitized at 22 kHz and analyzed using Praat for Windows. In order to ascertain the presence or absence of a schwa in /#C1´C2-/ sequences, the waveform and spectrogram of each target word were examined. For each sequence, five duration measurements were taken: the entire /#C1´C2-/ sequence, C1, C1 burst and aspiration, vowel, and C2. Any part of the inter-consonantal interval which included a voice bar and/or formant structure was considered part of the vowel. This is consistent with the claim that the voice bar in this environment is a glottal gesture of vowel retention (Manuel et al. 1992). In the first step, vowel elision was counted only when both the vowel and C1 aspiration were absent, since aspiration could indicate a devoiced vowel (Keating 1984). The spectrograms in Figure 39 and Figure 40 illustrate the presence of the schwa in slow speech and its deletion in fast speech in the phrase “not suffice”. In Figure 39, the slow speech sample, a voiced segment between [s] and [f] is present, indicating a schwa. There is no corresponding voicing in Figure 40.
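The coding criteria just described can be summarized as a small decision procedure. The sketch below is a hypothetical reconstruction of that logic, not the analysis script actually used, and the function and argument names are invented:

```python
# Hypothetical sketch of the token-coding logic described above (not the
# actual analysis script; names are invented for illustration). Any voice
# bar or formant structure in the interconsonantal interval counts as a
# retained vowel; aspiration after C1 with no voicing may signal a
# devoiced vowel (Keating 1984); only when both cues are absent is the
# token coded as elided.
def code_interval(has_voicing_or_formants, has_c1_aspiration):
    if has_voicing_or_formants:
        return "schwa present"
    if has_c1_aspiration:
        return "aspirated/devoiced"
    return "elided"

# A token with neither voicing nor C1 aspiration is coded as elided.
print(code_interval(has_voicing_or_formants=False, has_c1_aspiration=False))
```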

25 The sequence /s´l-/ was placed into the /l/-second category in order to make the sizes of the categories more equal. This is a somewhat arbitrary decision, but it will be shown in the results that it has no crucial effects on the outcome.


Figure 39. Schwa retention in slow speech “suffice”


Figure 40. Schwa deletion in fast speech “suffice”

6.3. Results

6.3.1. Speaking rate

Although a specific fast speaking rate was not induced by any external method other than telling the speakers to speed up, an increase in rate in the fast condition was verified by an Analysis of Variance, treating participants as a random factor. The dependent variable was the duration of the entire /#C´C-/ sequence, and the independent variable was the story rate condition (slow and fast). The mean duration for /#C´C-/ sequences in the slow condition was 183 ms, and the mean duration in the fast condition was 160 ms. Results confirmed a significant difference between duration in the slow and fast conditions [F(1,8)=77.03, p<.001].
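With a single two-level within-subject factor and participants as the random factor, the reported F(1,8) is equivalent to the square of a paired t computed over the nine per-subject mean durations. The sketch below shows that computation on invented per-subject means; the actual subject means are not reported in the text.

```python
# Paired comparison of per-subject mean /#C´C-/ durations (ms). With one
# two-level within-subject factor, the repeated-measures F(1, n-1) equals
# the squared paired t over the n subject difference scores. The subject
# means below are invented for illustration.
import math

def paired_f(cond_a, cond_b):
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return t ** 2  # = F(1, n - 1)

slow = [190, 178, 185, 181, 188, 176, 183, 180, 186]  # hypothetical values
fast = [165, 158, 162, 155, 166, 152, 160, 157, 163]
print(round(paired_f(slow, fast), 1))
```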

6.3.2. Elision

The data for elision in slow versus fast speech were examined using a repeated measures ANOVA. The independent variables were sequence type (voiceless stop-initial, voiced-initial, fricative-initial, and /l/-second, as shown in Table 15) and rate (slow and

fast). Participants were treated as a random variable. The dependent variable was elision, as defined in Section 6.2.4. The mean proportion of elision for each sequence type by rate is shown in the graph in Figure 41. Results from the ANOVA show a main effect of sequence type [F(3,24)=11.95, p<.001], but no main effect of rate [F(1,8)=1.70, p=.23]. The interaction between sequence type and rate is not significant [F(3,24)=2.48, p=.09].26

Sequence type        Slow   Fast
Vcd Initial          0.02   0.01
Vcls Stop Initial    0.03   0.05
Fric Initial         0.13   0.28
/l/ Second           0.19   0.20

Figure 41. Elision by sequence type

With respect to overall deletion patterns across categories, a Student-Newman-Keuls post-hoc test indicated that fricative-initial and /l/-second sequences show significantly more deletion (21% and 19%, respectively, collapsing over rate since there was no main effect of rate) than voiced-initial (2%) or voiceless stop-initial (3%) sequences (p<.05). Fricative-initial and /l/-second sequences did not differ significantly from one another, nor did voiced-initial and voiceless stop-initial sequences. This suggests that while voicing does not have an effect on elision in stop-initial sequences, being fricative-initial or /l/-second does. Likewise, the fact that there was no main effect of rate—but that there is still elision for two categories despite this—indicates that elision is present to some extent in the speech of English speakers for multiple speaking registers (perhaps excluding a particularly formal register).

6.3.3. Schwa devoicing

In order to determine whether speakers’ utterances include phonological deletion and/or gestural overlap, elision is not the only relevant variable that must be

26 As shown in Figure 41, there is a large increase in elision in the fricative-initial category from slow to fast speech (13% vs. 28% elision). When participants are included as a random variable, this is not significant (hence no significant interaction between sequence type and rate). However, when participants are not included as a random variable, the comparison for fricative-initial sequences is significant [F(1,314)=84.04, p<.001]. This effect of the participants factor indicates that there is considerable variability among the speakers, which will be addressed in further detail in Section 6.3.6.

measured. It is possible that the same articulatory maneuver on the part of the speaker results in a different acoustic manifestation depending on the phonotactic environment, and elision may not be the best measure for each type of experimental sequence. More specifically, if overlap of the consonant and schwa gestures occurs across-the-board for all sequence types either for fast speech or simply in a speaker’s habitual speaking register, it would likely have a different acoustic consequence for different flanking-consonant environments. First, in the case of voiceless stop-initial sequences, substantial overlap of the stop and the schwa may lead to devoicing of the vowel, or the appearance of aspiration after the stop even though it is in an unstressed syllable. This would occur because the duration of the glottal opening gesture associated with the initial voiceless stop extends after the stop is released, so if the stop and schwa gestures were overlapped, production of the devoicing gesture may overshadow or even preclude production of a voiced schwa. Consequently, elision may be low since this measure treats aspiration/devoicing as the indication of the vowel, but the proportion of cases showing aspiration/devoicing (but no voice bar) should be high. The distinction between “normal” (or canonical) production and overlapping gestures is illustrated in (1). Illustrative spectrograms for the conditions in (1)-(4) are shown in Appendix 5.

(1) a. Normal voiceless consonant-schwa coordination with voiced schwa present (in the original gestural score, a vertical dotted line marks the alignment and a box labeled “glottis open” marks the glottal abduction gesture): p h ´ d

b. Overlapping voiceless consonant-schwa coordination with no voiced schwa present (the “glottis open” gesture eclipses the schwa): p h (´) d

If there is considerable overlap when the initial consonant is voiced, then there would likely still be a remnant of the schwa present in the acoustic record even if its duration is reduced. This is because there is no glottal abduction gesture eclipsing the production of the vowel, so unless the schwa is completely overlapped, a portion of it should remain on the surface of the speaker’s production. For this sequence type, it is predicted that elision should be low, and there should be few or no cases with

aspiration/devoicing since vocal fold vibration can be maintained from the consonant to the schwa. A schematic for the overlapped voiced-initial case is shown in (2).

(2) Overlapping voiced consonant-schwa coordination with voiced schwa present: b ´ d

Considerable overlap when the first consonant in the /#C´C-/ sequence is a voiceless fricative is likely to look like an extended fricative. Since both fricatives and aspiration are characterized by aperiodic noise, there may be very little difference in visual appearance between them on a spectrogram. If the glottal abduction gesture for the /s/—which is typically longer than the one for a voiceless stop like /p/ (Kingston 1990)—eclipses the schwa, then the surface form may simply resemble a longer /s/. This explanation would predict that the duration of the /s/ when the schwa appears to be deleted should be longer than when the schwa is still present. Furthermore, if the gesture corresponding to the schwa is not actually deleted and an initial /s/-cluster is not formed, then it is expected that there would be aspiration on a C2 that is a voiceless stop as well as on C1. Thus, it is predicted that elision should be high and aspiration/devoicing should be low, since the ability to distinguish between the fricative and a devoiced vowel will be difficult. The duration of /s/ when /´/ is deleted should be longer than when it is not, and aspiration should be present on the following stop when one is present. The schematic for overlapping /s/ and /´/ and a following voiceless stop is shown in (3).

(3) Overlapping voiceless fricative-schwa coordination with no schwa present (in the original figure, “glottis open” boxes mark the glottal abduction gestures of the fricative and the stop): s s (´) p h

Finally, the case of /l/-second sequences is a little more complicated. In all of the stimuli used in this experiment, the initial gesture of the sequences is a voiced stop. Thus, if the stop overlaps the schwa, a remnant of the vowel should still be present on the surface. On the other hand, if the portion of the schwa remaining in the acoustic signal is very short, it may be virtually indistinguishable both perceptually and visibly (on the spectrogram) from the following /l/. Since /l/ is vowel-like and contains formant structure that is influenced by coarticulation with adjacent vowels (Ladefoged and Maddieson 1996), it would not be surprising if the remaining schwa portion could not be distinguished from the following /l/ (see also Patterson et al. 2003 for a similar concern). In this sequence it is predicted that elision will be high, since a schwa may be easily

confused with the acoustic characteristics of /l/, and aspiration/devoicing should be essentially non-existent since all C1s are voiced. In addition, the duration of /l/ when /´/ is deleted should be longer than when it is not if the schwa is being measured as part of the /l/. This is shown in (4).

(4) Overlapping voiced consonant-schwa-/l/ coordination with no schwa present: b l l

Given these scenarios for different /#C1´C2-/ sequences, it becomes clear that the elision measure used above may be misleading. Elision implies that if there is no voicing, aspiration, or formant structure present between the two consonants, then the schwa is also not present. However, it is now evident that even if none of these characteristics are found on the acoustic record for fricative-initial and /l/-second sequences, the schwa gesture, although overlapped, may still exist in the speaker’s output and may affect other factors, such as aspiration on the second consonant or duration of the fricative or /l/. The predictions based on these expectations about the surface features of overlapping for the different categories of /#C1´C2-/ sequences are summarized in Table 16:

Sequence Type                          Likelihood of Elision?   Likelihood of Aspiration/Devoicing?
Voiceless Stop-Initial (ex. /#p´t/)    Low                      High
Voiced-Initial (ex. /#d´v/)            Low                      Low
Fricative-Initial (ex. /#f´t/)         High                     Low
/l/-Second (ex. /#b´l/)                High                     Low

Table 16. Predictions for acoustic output of overlapped /C/ and /´/

A gestural overlap account differs from true gestural deletion in that the only characteristic that should be important for deletion is whether the resulting cluster is phonotactically legal. Elision may be high for both fricative-initial and /l/-second sequences if speakers are actually deleting the schwa gesture, since most of these result in a legal onset for English. However, there should be “islands of resistance” for those fricative-initial and /l/-second sequences that would not be legal clusters after deletion. This will be investigated further in Section 6.3.5. In addition, a gestural deletion account does not make any predictions regarding the presence of aspiration/devoicing for voiceless stop-initial sequences. Since the presence of aspiration/devoicing is counted as the vowel being retained, there is no way to distinguish between tokens that have some voicing and those that only have aspiration. However, this is an important clue as to how the coordination of these gestures is affected by speech rate, and the two should not be considered the same phenomenon. In order to confirm whether or not these predictions are upheld, it is necessary to examine how often the interconsonantal interval in the participants’ productions of the

/#C1´C2-/ sequences consisted of aspiration/devoicing (but no actual vowel). Since aspiration and devoicing are often visually and perceptually indistinguishable from one another, they are collapsed into one measurement. The proportion of productions with only aspiration/devoicing can be compared to those with elision and then assessed with respect to the predictions in Table 16. The data for aspiration/devoicing in slow versus fast speech were examined using a repeated measures ANOVA. The independent variables were sequence type (voiceless stop-initial, voiced-initial, fricative-initial, and /l/-second, as shown in Table 15) and rate (slow and fast). Participants were treated as a random variable. The dependent variable was the proportion of tokens containing aspiration/devoicing. Tokens were positively coded as being aspirated/devoiced only if they did not also contain any voicing corresponding to a schwa. Cases which have neither aspiration nor a schwa are not included in this measure, since they were included in the elision count. The mean proportion of aspiration/devoicing for each sequence type by rate is shown in the graph in Figure 42. Results from the ANOVA show a main effect of sequence type [F(3,24)=71.69, p<.001], and a main effect of rate [F(1,8)=6.66, p<.03]. The interaction between sequence type and rate is significant [F(3,24)=3.34, p<.04].

[Bar graph: proportion of aspiration/devoicing (y-axis, 0–0.60) for each sequence type (Vcd Initial, Vcls Stop Initial, Fric Initial, /l/ Second), with slow and fast bars.]

Figure 42. Aspiration/devoicing by sequence type

Collapsing over rate, a Student-Newman-Keuls post hoc test indicates that only the voiceless stop-initial sequences have significantly more aspiration/devoicing than the other sequences (p<.05), which are not significantly different from one another (p=.50). A planned comparison of the voiceless stop-initial categories shows only marginally more deletion in the fast condition [F(1,8)=4.43, p=.068]. Although there is a significant main effect of rate and a significant interaction between rate and sequence type, the fact that 34% of the voiceless stop-initial tokens in the slow condition already contain only aspiration or a devoiced vowel again indicates that "deletion" is not simply a property of fast speech, but extends into casual speech as well. Figure 43 combines both Elision and Aspiration/Devoicing on one graph to demonstrate that the predictions in Table 16 are upheld.


[Bar graph: proportions (y-axis, 0.00–0.50) for Vcd Stop Initial, Vcls Initial, Fric Initial, and /l/ Second sequences; series: Slow: Elision, Fast: Elision, Slow: Asp/Devoice, Fast: Asp/Devoice.]

Figure 43. Elision compared to aspiration/devoicing

In order to determine whether the prediction about the duration of frication is upheld, the duration of /s/ in sequences with vowels is compared to that in clusters resulting from schwa deletion. In her impressionistic account of schwa deletion in English, Hooper (1978) noted that the fast speech /s/ in /#sC-/ clusters that arise from /#səC-/ sequences is longer in duration than the slow speech /s/ where deletion has not occurred. In a comparison of French de rôle [dəʁol] "of role", the elided form d'rôle [dʁol], and the monomorphemic drôle [dʁol] "funny", Fougeron and Steriade (1997) found that the duration of the [d] in d'rôle was not significantly different from that of the /d/ in de rôle, but it was significantly longer than the [d] in drôle. Durations of /s/ for both slow speech and fast speech elisions were calculated. The number of tokens for each condition is summarized in Table 17 and duration measurements are shown in the graph in Figure 44:

Condition        Individual tokens   Speakers with tokens in category
Slow Elided            16                     5
Slow Unelided         115                     7
Fast Elided            40                     9
Fast Unelided          91                     9

Table 17. Number of /s/-initial tokens with elided schwa and with schwa present, by individual token and by speaker


[Bar graph: /s/ duration in ms for the slow, fast, and both-rates-average conditions. Schwa present: 107, 93, 100 ms; schwa elided: 122, 110, 112 ms.]

Figure 44. Duration in ms. of /s/ in tokens with schwa elision vs. schwa retention

For the slow rate, a t-test based on participant averages does not show a significant difference [t(10)=1.56, p=.15]. For the fast rate, however, a t-test shows that /s/-duration is significantly longer when the schwa is elided [t(14)=2.45, p=.03]. Likewise, when the durations are averaged over speech rate (the third column of the graph in Figure 44), the difference is also significant [t(15)=2.18, p<.05]. The difference in the slow condition likely fails to reach significance because the number of elided tokens in that condition is too small. The fast condition and averaged values, however, indicate that the duration of /s/ in tokens with elision is significantly longer than in tokens which exhibit the vowel. A similar measurement can be made for /l/-second tokens. The number of tokens for each condition is summarized in Table 18, and durations are shown in Figure 45.
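The participant-averages comparisons appear consistent with a pooled two-sample t test (e.g. t(10) follows from 5 elided-speaker and 7 unelided-speaker averages, as in Table 17). A minimal stdlib sketch with invented durations, not the experimental data:

```python
import math

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and degrees of freedom.

    a, b: per-participant average durations (ms) for two groups.
    Returns (t, df) with df = len(a) + len(b) - 2.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variance, group a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)   # sample variance, group b
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical participant averages: 5 speakers with elided tokens vs. 7 without.
t_stat, df = two_sample_t([120, 125, 118, 122, 115],
                          [105, 100, 110, 108, 102, 104, 101])
```

With 5 and 7 speakers contributing, df is 10, matching the degrees of freedom reported for the slow-rate comparison.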

Condition        Individual tokens   Speakers with tokens in category
Slow Elided            25                     8
Slow Unelided         108                     9
Fast Elided            26                     9
Fast Unelided         107                     9

Table 18. Number of /l/-second tokens with elided schwa and with schwa present, by individual token and by speaker


[Bar graph: /l/ duration in ms for the slow, fast, and both-rates-average conditions. Schwa present: 58, 50, 54 ms; schwa elided: 73, 57, 65 ms.]

Figure 45. Duration in ms. of /l/ in tokens with schwa elision vs. schwa retention

Similar to /s/ in the fast condition, the duration of /l/ is significantly longer when the schwa is elided than when it is not, both in the slow condition and when averaged over speech rate [Slow: t(15)=2.44, p<.03; Fast: t(13)=1.61, p=.13; Averaged: t(16)=3.38, p<.001]. Again, the averaged condition is the most robust measurement, and it is significant. The finding that /l/ is longer when there appears to be schwa elision is consistent with the results for /s/, suggesting that the schwa is in fact being produced by the speaker but is obscured in the acoustic record. Finally, in order to determine whether the absence of vocalic material corresponding to the /ə/ leads to the formation of a true initial /s/-cluster, all /#səp-/ tokens (the only /s + voiceless obstruent/ sequences in the experiment) which were coded as having no vowel or vowel-like material were examined to see if the /p/ was aspirated. These tokens consisted of the words superfluous, superior, and support. Of the 13 tokens with schwa deletion (two in slow speech and 11 in fast speech), all exhibited aspiration on the /p/. Furthermore, the range of aspiration, 34–77 ms, is typical of the VOT for voiceless stops in English. Since it is well known that voiceless stops in initial s-clusters in English are not aspirated (Kahn 1976, Browman and Goldstein 1986, Cooper 1991), these findings are consistent with the argument that a separate glottal gesture for each consonant is present even though the schwa appears to be deleted. An example of the word superior without a schwa but with a large period of aspiration is shown in Figure 46:


[Spectrogram, 0–5000 Hz over approximately 0.49 s, with segments labeled s, p (asp), i …]

Figure 46. The word superior with vowel deletion and aspiration on the /p/

(Participant 6, fast condition)

The results in this section show that the predictions in Table 16 are upheld. The amount of elision for voiceless stop-initial and voiced-initial sequences is low, and the amount of aspiration/devoicing for voiceless stop-initial sequences is high. For both fricative-initial and /l/-second sequences, the amount of elision is high. These findings are further corroborated by the finding that the /s/ and /l/ in tokens with elision are significantly longer than in tokens without elision. In addition, the /p/ in /#səp-/ sequences is aspirated, indicating that the absence of a vowel does not lead to the formation of a typical /#sp/ cluster. The difficulty of distinguishing an overlapped schwa from the preceding /s/ or following /l/ may lead to an inflated proportion of deletion in these phonotactic categories, a point which is addressed further in the next section.

6.3.4. Schwa duration

Under a gestural overlap analysis of elision, apparent schwa deletion is the result of total overlap of the preceding consonant gesture and the schwa. If overlap increases in fast speech, then significant shifts in the duration of the schwa should be observed when normal speech is compared to fast speech. Specifically, there should be an increase in the total number of tokens with increasingly smaller vowel durations, with elision being the extreme option along a continuum of vowel reduction (Browman and Goldstein 1990b). The distribution of schwa durations is shown in the histograms in Figure 47. The duration measurement includes both voiced and devoiced/aspirated portions of the schwa.


[Two histograms of schwa duration in 10 ms bins (0–120 ms). Slow: mean=49.5 ms, SD=24.34, N=759. Fast: mean=42.9 ms, SD=24.27, N=743.]

Figure 47. Distribution of schwa durations for slow and fast speech tokens

Duration measurements include both vocalic and devoiced/aspirated portions of the vowel. The distributions in these histograms do not quite conform to the expectation that, if elision is a result of increased gestural overlap, there should be a shift from a normal distribution to one with greater frequencies in the bins closer to 0 ms. In other words, if elision were part of a continuous process, a histogram more similar to the hypothetical one in Figure 48 might be expected.


Figure 48. Hypothetical histogram representing a phonetic continuum of reduction

The number of tokens expected in the 0 ms bin for each of the speaking rates can be determined from the total N and the standard deviation of the distribution. Because there cannot be any negative values for duration, or anything shorter than ~6–10 ms, the approximate minimal duration of a glottal pulse depending on gender (Johnson 1997), it must be assumed that all of the tokens below this minimum that would otherwise fall in the left-hand tail of the distribution are actually "piling up" in the 0 ms bin. Given the N, mean, and standard deviation of the distributions in the slow and fast conditions, 29 tokens (4%) are expected in the 0 ms bin in the slow condition, and 49 tokens (7%) are expected for the fast condition. Both of these expectations are significantly different, by a binomial test, from the number of tokens actually found in the 0 ms bin for the slow and fast conditions (p<.001, slow N=54, fast N=85). The fact that a distribution like that in Figure 48 is not found for the actual schwa durations may be a result of the inflated proportions of elision discussed at the end of Section 6.3.3. If it is really the case that the overlapping frication and glottal gesture of voiceless fricative-initial /#CəC/ sequences make any evidence of a schwa indistinguishable from the preceding frication on the acoustic record, then the frequency of tokens in the 0 ms bin will be artificially high. Likewise, if a very short /ə/ merges into the /l/, this will also contribute to the number of 0 ms tokens. In order to determine whether the rate of elision is really inflated, articulatory measures would be necessary in addition to acoustic ones.
For example, ultrasound imaging could be used to determine differences in tongue shapes and tongue trajectories for tokens in which there is an acoustic schwa as compared to utterances which do not contain a schwa in the acoustic record. However, in the absence of articulatory data, it is possible to use the existing acoustic data to speculate on whether the number of tokens in the 0 ms bin is inflated. For the tokens of elided /s/-initial and /l/-second sequences that have longer-than-average /s/ and /l/ durations, it can be hypothesized that the extended duration arises from the indistinguishable frication or formant structure (respectively) that actually pertains to the schwa. For each rate condition, the durations of longer-than-average consonants in the elision category can be normalized by subtracting the average /s/ or /l/ duration (not including the elided productions) and treating the remainder as the duration of the vowel. The normalized data can then be replotted in histograms to determine whether the number of tokens in the 0 ms bin is still greater than what would be expected if elision were really due to varying degrees of overlap. These are shown in Figure 49.
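The normalization step just described can be sketched directly. This is a hypothetical helper, not the original analysis code; the 122 ms and 107 ms values below are merely of the order of the /s/ durations reported earlier and are used only as an illustration:

```python
def normalized_vowel_ms(vowel_ms, cons_ms, mean_unelided_cons_ms):
    """Credit any longer-than-average consonant duration to the vowel.

    For an elided token whose /s/ or /l/ ran longer than the average over
    unelided tokens, the excess is treated as schwa material hidden in the
    consonant's frication or formant structure.
    """
    excess = max(0.0, cons_ms - mean_unelided_cons_ms)
    return vowel_ms + excess

# An elided token (0 ms acoustic vowel) with a 122 ms /s/, measured against a
# 107 ms unelided average, is reassigned a 15 ms vowel; a token whose consonant
# is at or below the average keeps its measured vowel duration.
reassigned = normalized_vowel_ms(0.0, 122.0, 107.0)
unchanged = normalized_vowel_ms(40.0, 100.0, 107.0)
```

Applying this to every elided token and re-binning yields the normalized histograms in Figure 49.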


[Two histograms of normalized schwa duration in 10 ms bins. Slow: mean=50.8 ms, SD=22.69, N=759. Fast: mean=44.5 ms, SD=22.63, N=742.]

Figure 49. Distribution of normalized schwa durations for slow and fast speech tokens

In this case, 19 tokens (2%) are expected in the 0 ms bin in the slow condition, and 34 tokens (5%) are expected for the fast condition. Now, these values are not significantly different from the number of tokens actually found in the 0 ms bin for the slow and fast conditions (slow N=25, p=.07; fast N=38, p=.21). These findings provide tentative support for the possibility that the rates of elision are in fact inflated, and that using the acoustic record to determine when deletion/elision has occurred is not always reliable. It should of course be emphasized that this normalization is speculative, but it nevertheless presents an interesting result.


6.3.5. Individual /#CəC/ sequences

A look at the individual /#CəC/ environments shows the breakdown of the categories in Figure 43. Figure 50 shows the proportion of elision for each sequence, and Figure 51 contains the proportion of aspiration/devoicing for the individual sequences. A few important points can be made from these graphs. First, the role of phonotactics can be addressed with respect to the individual sequences. For the sequences in which elision is relatively high (i.e. /#səC/ and /#Cəl/), the prospective legality of the resulting /#CC/ cluster does not affect whether the sequence shows greater deletion. In the fricative-initial category, /#səb/ and /#fət/ sequences show deletion rates similar to potentially legal sequences. In the slow condition, /#səv/ does not show any deletion, but in the fast condition it shows 11% deletion. Likewise, in the /l/-second environment, /#dəl/ shows 11% deletion in the slow condition (and 7% in the fast), and /#məl/ has the second highest deletion rates in the category. In the devoiced vowel/aspiration measurement in Figure 51, the most aspiration occurs for /#kəm/ and /#təm/ regardless of the speech rate. This may be because, as with /l/, it is difficult to distinguish a short, highly overlapped schwa from the following nasal. Alternatively, if this condition is similar to the phonotactic environments explored in the non-native cluster experiment, it could be that greater overlap is allowed before a sonorant. That is, if the phonetic environment for the release of the stop is more favorable when the second consonant is a sonorant (Steriade 1997), then it is possible that the amount of gestural overlap allowed is also sensitive to the nature of the second consonant. The sequence /#gəl/ is also produced with a considerable amount of devoicing at both speech rates (slow: 15%, fast: 31%).
A spot check of these tokens reveals a significant amount of spirantization of the /g/, which leads to both the /g/ and the following schwa appearing fricative-like on the spectrogram. In this case, then, the spectral portion corresponding to the schwa may be voiced, but the energy is not periodic. It is not clear why /g/ is more likely to be spirantized than other stops like /b/ or /d/. One possibility is that speakers can weaken /g/, thereby reducing its articulatory difficulty, since spirantized /g/ cannot be confused with any other sound of English. In the case of /b/ and /d/, however, spirantization could lead to confusion with /v/ and /ð/, respectively (Ortega-Llebaria 2002). In sum, two points can be made regarding the breakdown of performance on individual /#CəC/ sequences. First, the phonotactic environment does not seem to have a large effect on whether elision can occur, in that /#səC/ and /#Cəl/ sequences that would lead to both legal and illegal initial clusters demonstrate similar amounts of deletion. Instead, the articulatory and acoustic properties of /s/ and /l/ seem to have a greater influence on deletion patterns. As explained in Section 6.3.3 for /s/-initial sequences, a particularly large glottal opening may entail a larger apparent amount of overlap simply because the glottis is open and obscuring the /ə/ for a longer period of time (Kingston 1990). Furthermore, if the devoiced vowel is both acoustically and auditorily difficult to distinguish from the frication of the /s/, then it may appear as if there is total deletion even if speakers are actually overlapping the consonant and the schwa. Similarly, for /l/-second sequences, overlap by the voiced consonant may lead to a particularly short schwa, which is then difficult to distinguish from the /l/.


Second, the amount of aspiration/devoicing in individual sequences suggests that the perceptibility of the consonants in the /#C1əC2-/ sequences may influence the degree of overlap that speakers are able to produce. The voiceless stop-initial sequences have the greatest amount of aspiration before nasals, which is consistent with the notion that the first consonant is most recoverable when it precedes a sonorant. Furthermore, of all /l/-second sequences with an initial voiced stop, only /#gəl/ gives rise to substantial aspiration. It is hypothesized that this aspiration is actually spirantization of the /g/, which occurs only for that particular voiced stop because spirantized /g/ cannot be confused with any other fricative in the inventory of English.

[Horizontal bar graph: proportion of elision (0.00–1.00) for each sequence (db, df, dk, dm, dn, dv, vn, bf, dp, bg, kp, pt, tb, tm, pd, pth, km, sv, ft, sb, sf, sm, sp, gl, sl, dl, ml, bl), grouped into voiced, voiceless, /s/, and /l/ categories; slow vs. fast bars.]

Figure 50. Proportion of elision for each individual /#C´C/ environment


[Horizontal bar graph: proportion of aspiration/devoicing (0.00–1.00) for each sequence (db, df, dk, dm, dn, dv, vn, bf, dp, bg, kp, pt, tb, tm, pd, pth, km, sv, ft, sb, sf, sm, sp, gl, sl, dl, ml, bl), grouped into voiced, voiceless, /s/, and /l/ categories; slow vs. fast bars.]

Figure 51. Proportion of devoicing/aspiration for each individual /#C´C/ environment

6.3.6. Individual speakers

In order to further examine the observation that gestural overlap is implemented to differing degrees by individual speakers, the performance of each individual participant on elision was examined. Elision is used as just one of several possible measures of individual variation. A breakdown by participant indicates that there are two types: those that elide much more in fast speech, and those that elide regardless of speaking rate. This is shown in Figure 52.


[Bar graph: overall proportion of elision (0–0.30) for each of the nine participants, slow vs. fast bars.]

Figure 52. Overall elision patterns by participant

Participants 1–4 can be classified as rate-dependent eliders (for each, t-tests confirm significantly more elision in fast speech, p<.04, except Participant 4, whose difference is marginally significant, p<.07). Participants 5–9, who show similar amounts of elision in both slow and fast speech (even Participant 6's difference is not significant, p=.18), can be called rate-independent eliders. It should be noted that speech rate increased significantly for both rate-dependent (mean CəC duration: slow=175.3 ms, fast=156.8 ms, p<.006) and rate-independent eliders (mean CəC duration: slow=192.6 ms, fast=163.6 ms, p<.006). Re-examination of the voiceless-initial, voiced-initial, and /s/-initial sequence types by type of elider reveals that the interaction between phonotactic context and elision is respected whether the elision is rate-dependent or not.

[Bar graph: proportion of elision (0–0.60) by sequence type (Vcd Stop Initial, Vcls Initial, Fric Initial, /l/-Second) for rate-dependent eliders, slow vs. fast bars.]

Figure 53. Elision in rate-dependent eliders


[Bar graph: proportion of elision (0–0.60) by sequence type (Vcd Stop Initial, Vcls Initial, Fric Initial, /l/-Second) for rate-independent eliders, slow vs. fast bars.]

Figure 54. Elision in rate-independent eliders

An ANOVA for rate-dependent eliders reveals a significant main effect of sequence type [F(3,9)=12.73, p<.001] and rate [F(1,3)=19.93, p<.02]. The interaction of sequence type and rate was also significant [F(3,9)=19.65, p<.001]. These results indicate that in the slow, more formal speech of rate-dependent eliders, there is nearly no schwa deletion in any /#CəC-/ sequence. In the fast speech of these speakers, however, t-tests show a significant increase in the amount of elision in fricative-initial sequences [t(3)=5.46, p<.01] and /l/-second sequences [t(3)=5.00, p<.02]. For rate-independent eliders, there is a significant effect of sequence type [F(3,12)=5.49, p<.02] and no effect of rate [F(1,6)=2.48, p=.17]. The interaction of sequence type and rate was also not significant [F(3,12)<1]. A Student-Newman-Keuls post hoc test shows that, collapsing over rate, the amounts of elision in voiced and voiceless stop-initial sequences are not significantly different from one another, while fricative-initial and /l/-second sequences are different both from one another and from the voiced and voiceless stop-initial sequences (all p<.05). These results indicate that whether schwa absence is a function of speaking rate or a general characteristic of a participant's normal speech patterns, it is limited to fricative-initial and /l/-second sequences. The gestural configurations and amounts of overlap necessary to account for these findings were discussed in Section 6.3.3.
The variation due to speaker type, and the fact that elision occurs at most a little over 40% of the time in any condition (in the fast speech of rate-dependent eliders), raise the question of whether the overlap producing this absence is an optional, phonetically driven process, or one which is actually specified in the coordination patterns of the phonology, despite the fact that the kind of extreme overlap leading to apparent deletion does not always happen when the appropriate environment is encountered. Possible analyses are discussed in Section 6.4.


6.3.7. Word frequency

The last point to address with respect to speakers' performance on words with pre-tonic schwa concerns word frequency. It has been argued that in word-initial, pre-tonic vowel reduction processes, the lexical frequency of a word interacts with the phonological environment, that is, with the consonants surrounding the vowel targeted for reduction (Fidelholtz 1975). Fidelholtz specifically looks at "strong" initial, pre-tonic syllables, which he defines as a schwa followed by two or more consonants (e.g. abstain, purgation). Relying on the pronunciations given in Kenyon and Knott (1953) and the frequency counts in Thorndike and Lorge (1944), Fidelholtz argues that in environments in which vowel reduction is allowed, whether it actually occurs is affected by the frequency of the word. It should be noted that by "reduced vowel," Fidelholtz refers to the option of pronouncing a pre-tonic vowel as schwa instead of with a fuller quality (e.g. mosquito as [məskito], not [moskito]), not to the tendency of speakers to further shorten or delete vowels that are already indisputably pronounced as schwa. Regardless, it is worth pursuing the possibility that frequency interacts with the degree of gestural overlap, or with pre-tonic vowel deletion. The frequency counts for the 84 experimental words in this study were gathered from the Internet search engine Google (www.google.com). Blair, Urland, and Ma (2002) demonstrated that frequency counts from Internet search engines with large databases (over 300 million web pages indexed) were highly consistent with those in Kučera and Francis (1967) and CELEX (Baayen, Piepenbrock and Gulikers 1995). However, as Blair, Urland, and Ma note, the Internet databases are not only more up-to-date, but also cost-effective, and they include modern or idiosyncratic words (such as place names) that Kučera and Francis or CELEX may not contain.
In May 2003, when the counts for these words were performed, Google indexed 3,083,324,652 web pages.27 When Google returns search results, it also displays the number of pages containing the search term; this is the number used for the frequency counts in this section. These frequency counts are listed in Appendix 4. In order to confirm that word frequency counts from CELEX and Google were consistent, a Pearson correlation of the frequencies from each database was calculated for 73 of the 84 experimental words. Eleven of the experimental words were not found in CELEX, which is one of the main reasons for using the Internet database instead. Using frequency per million as the estimates, a highly significant correlation was found (r=.79, p<.0001). The results for the overall correlations between frequency and elision are shown in Table 19. The data were divided into the fast and slow conditions, and correlations were computed over all words and for the phonotactic categories of voiced-initial, voiceless stop-initial, fricative-initial, and /l/-second sequences.28
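The log-frequency correlation described in footnote 28 can be sketched as follows. The word counts and elision scores here are invented for illustration only; they are not the experimental data of Appendix 4:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-word Google page counts and elision proportions.
freqs = [120000, 45000, 2300000, 9800, 310000]
elision = [0.10, 0.05, 0.35, 0.02, 0.20]

# Correlate elision with the log of word frequency, which evens out the skewed
# distribution of high- vs. low-frequency words (footnote 28).
log_freqs = [math.log10(f) for f in freqs]
r = pearson_r(log_freqs, elision)
```

With frequencies spanning several orders of magnitude, correlating against raw counts would let the single most frequent word dominate; the log transform is what makes a linear correlation sensible here.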

27 In order to maximize the frequency of a word, singulars and regular present tense forms (which have more hits on Google) were used instead of plurals and past tense forms (e.g. petunia and begonia instead of the plurals, and petition, capitulate, denounce instead of the past tense forms of these words). Google does not allow wild-card searches and returns only exact matches, so only this form of the word was searched. Adjectives such as dilapidated and ballooning were not changed to their verbal stems. Only English-language pages were searched.
28 Following Zipf (1949) and Jescheniak and Levelt (1994), elision scores are correlated with the log of the word frequency. The log frequency has been argued to better reflect human performance, since it evens out


SEQUENCE TYPE            SLOW CONDITION   FAST CONDITION
All sequences            r=.12            r=.27*
Voiced-initial           r=−.07           r=−.04
Voiceless stop-initial   r=−.14           r=.33*
Fricative-initial        r=.15            r=.32*
/l/-second               r=.41**          r=.61**

Table 19. Correlation of word frequency and elision scores. **=p<.001, *=p<.01.

Correlation results show that in the slow condition, only elision in the /l/-second sequences is significantly correlated with frequency. No other category, including the calculation over all sequences, is significant, and in some cases the correlation is even negative. In the fast condition, on the other hand, elision in all categories except voiced-initial is correlated with the frequency of the words. These findings suggest that at habitual speech rates, frequency is not correlated with elision, probably because elision is fairly uncommon (practically non-existent for voiced and voiceless stop-initial sequences, 13% for fricative-initial sequences, and 19% for /l/-second sequences; see Figure 41). The more elision there is, the more likely a correlation is to emerge. Although the increase in speech rate over all participants was not significant, the actual proportions of elision are higher in the fast condition, which makes it more likely that a correlation will be found if one exists (still nearly non-existent for voiced-initial sequences, 5% for voiceless stop-initial sequences, 28% for fricative-initial sequences, and 20% for /l/-second sequences; see Figure 41). Table 19 shows that there is a significant correlation between word frequency and elision in the fast condition. In cases where speakers allow greater overlap (perhaps to the point of elision), the frequency of the word also influences whether particular individual words are more or less likely to demonstrate that overlap. One plausible reason why lexical frequency factors into the amount of overlap that speakers allow is perceptibility: the more frequent a word is, the more likely a listener is to be able to recover it even if certain gestures are masked in the acoustic signal (see Rosenzweig and Postman 1957, Owens 1961, Broadbent 1967 on the general role of frequency in the perception of words).
If gestural overlap interacts with frequency such that it is sensitive to the recoverability of the stimulus, this would be consistent with the general idea that perceptibility plays an important role in shaping speech production behaviors.

6.4. General discussion

The results of this study show that word-initial pre-tonic schwa deletion is not simply a phonological deletion rule that is applied at fast speech rates. A number of factors refute the validity of such an analysis:

1. Impressionistic or introspective accounts of where vowel deletion occurs do not explain their criteria for deletion. It is unclear whether proponents of these accounts are referring only to the absence of voicing, or whether they also exclude cases in

28 (continued) the distribution of the (relatively few) high-frequency words and the (relatively common) low-frequency words. This makes linear correlation a more appropriate measure.


which there is aspiration or a devoiced vowel (Zwicky 1972, Hooper 1978, Roca and Johnson 1999). Measuring the rates of both elision and aspiration/devoicing provides more insight and shows how acoustic information can be related to speakers’ articulatory behavior. Most notably, it is likely that apparent pre-tonic schwa deletion is a result of increased gestural overlap in some conditions (see Beckman 1996).

2. In fact, the findings regarding elision can be accounted for by gestural overlap, but not by an analysis treating absence of schwa from the acoustic record as deletion of the schwa gesture. As a result of gestural overlap:

a. The schwa in voiceless stop-initial /#C1əC2/ sequences surfaces as aspiration of C1 [C1ʰC2] or a devoiced schwa [C1ə̥C2], because the glottal opening gesture of C1, which is completed after the C1 oral gesture ends, can obscure whatever part of the schwa is not already overlapped by the C1 oral closure.

b. There is no elision in voiced-initial /#C1əC2-/ sequences because overlap only partially obscures the vowel, and there is no glottal opening gesture to conceal the part of the vowel that remains.

c. In voiceless fricative-initial /#C1əC2-/ sequences, the frication noise of the first consonant and the aspiration from the glottal opening gesture combine to totally obscure the schwa, making it appear as though it has been deleted. Because frication and aspiration/devoicing have a similar acoustic representation on a spectrogram, it is difficult to tell where the fricative ends and the vowel begins. It was predicted that schwa absence in these cases would coincide with a longer duration for the fricative, and this prediction was confirmed (see Section 6.3.4).

d. Likewise, the schwa in /l/-second /#C1əC2-/ sequences may appear deleted because overlap by the initial consonant (either a voiced stop or /s/ in this data) leaves the schwa with a very short acoustic duration, which can be difficult to distinguish from the formant structure of the following /l/. As with /s/-initial sequences, schwa absence rates may be inflated for this reason.

3. A closer look at individual performance indicates that speakers can be divided into rate-dependent eliders and rate-independent eliders. For some speakers, very overlapped gestures are characteristic of their habitual speech patterns some proportion of the time, whereas other speakers only increase the overlap at faster rates. Regardless of whether speakers' overlap is rate dependent or not, however, their productions always follow the predicted pattern for each phonotactic category. This suggests that schwa deletion cannot simply be an optional phonological rule found only in fast speech.

4. At faster speech rates, the lexical frequency of the experimental words was correlated with the amount of elision for all phonotactic categories except voiced-initial sequences. It is likely that this category reflects a floor effect, since there is so little elision to begin with. For the rest of the stimuli, it is hypothesized that the ability to recover the target is greater when the word is more frequent, and that this could factor into speakers’ production.

6.4.1. The relationship between overlap, speaking rate, and coordination

The data presented in this chapter raise some important questions about the nature of coordination, how it is implemented both phonologically and at a more peripheral level, and how speech rate affects the amount of overlap that sequential gestures exhibit. In the analysis of the production of phonotactically illegal word-initial clusters in the previous chapter, the coordination constraint CC-COORD plays an integral role in determining how consonants are to be phased with respect to one another in English when there is no constraint prohibiting the consonants from overlapping at all. This constraint ensures that in the output of the phonology, the release of C1 is aligned with the target of C2. It is expected that in ideal (canonical speech) conditions, the amount of overlap that this alignment leads to could be observed empirically.

Of course, it is well-known that there is often considerable variability in the amounts of overlap as determined from both acoustic and articulatory data. Speakers may display some amount of variation in the overlap between associated gestures (or more precisely, associated gestural constellations) (Lubker 1986, Löfqvist 1991), less variation among the gestures within a constellation (e.g. Krakow 1989, Sproat and Fujimura 1993, Saltzman, Löfqvist and Mitra 2000), and likely even more across word boundaries (e.g. Byrd and Tan 1996, Zsiga 2000). One framework proposed to account for such variability in a principled way is phase windows (Byrd 1996b, Saltzman and Byrd 2000; see also a similar framework in Keating 1990a,b). As introduced in Chapter 2, according to the phase window proposal, coordination between gestures is not defined with respect to two points (punctate relative phasing), but rather with respect to constrained ranges in the phase cycles of sequential gestures or constellations of gestures (i.e. gestures that are traditionally conceived of as two adjacent segments). In terms of the framework developed in this dissertation, a hypothetical CV-COORD relationship, for example, might be more accurately stated by specifying a range of landmarks in the consonant that the vowel would be allowed to coordinate with: ALIGN(C, [target-release], V, onset) (see also a related proposal in Zsiga 2000). Note that this range encompasses the ALIGN(C, center, V, onset) coordination that has been proposed for English, but allows the coordination to vary in overlap on either side of the center of the consonant.

Byrd argues that if coordination is limited to two single points, or landmarks in gestural duration, then there would be no way to explain the range of variation in the amount of overlap that speakers may produce. According to this proposal, phase windows should be able to account for both the variation in overlap that is seen even when factors such as stress, focus, speech rate, and phrasal boundaries are kept constant, and the variation that occurs as these factors have an increasing influence on how a speaker produces an utterance. On the other hand, it is clear that not all possible overlapping configurations are empirically attested, which is why phase windows are proposed in the first place: by limiting overlap to a pre-specified (language-specific) range, variability can be constrained. In Saltzman and Byrd (2000), the task dynamic model that has been developed to demonstrate how gestural patterning is implemented (Saltzman and Munhall 1989) is expanded to incorporate phase windows. It is shown that task-dynamics are just as well-suited to accounting for phase windows as they are to punctate phasing.


However, the phase window account does not necessarily explain the significantly greater amounts of elision and schwa devoicing that are evidenced by the speakers in this study at fast rates of speech. One point to note about phase windows is that not every possible coordination within the window is equally likely; instead, alignment may be probabilistically distributed around some point, perhaps with some non-normal values for skewness and kurtosis. This is illustrated in Figure 55, adapted from Byrd (1996b). In this schematic, the coordination relationship between C1 and C2 could be defined as ALIGN(C1, [center-offset], C2, target), such that the target of C2 can vary in its alignment from the center to the offset of the gesture. However, as demonstrated by the hypothetical probability distribution, the alignment will most often conform to ALIGN(C1, release, C2, target).

[Figure: a window in which alignment of the C2 target may vary, overlaid with a probability distribution over that window.]

Figure 55. A hypothetical probability distribution for a CC-COORD phase window

In Byrd’s (1996b) proposal, since the amount of overlap (or regions of coordination within the phase window) is influenced by linguistic and extralinguistic variables, the probability distributions depend on the particular conditions under which they are being implemented. In other words, there will be low token-to-token variability when the contextual influences remain similar. Specifically with regard to speech rate, Byrd mentions that “a variable may cause a preference for a particular region of the phase window…For example, a fast speech rate will favor the ‘more overlapped’ end of the window and a slower speech rate the ‘less overlapped’ end” (151). Other factors, such as the amount of overlap (or coordination) specified for different types of gestures and for syllabic position, may combine with the amount of overlap preferred by a certain speech rate to define the window that will be found for, say, an initial fricative-stop cluster at a medium-fast rate.
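The probabilistically weighted phase window can be made concrete as a sampling procedure. The following is a toy sketch, not part of Byrd's model: the landmark fractions and the Beta(2, 2) weighting are invented for illustration, chosen only so that alignments cluster around the release landmark, as in Figure 55.

```python
import random

# Hypothetical landmark positions, as fractions of C1's 360-degree phase
# cycle (onset = 0.0 ... offset = 1.0); the values are illustrative only.
LANDMARKS = {"onset": 0.0, "target": 0.25, "center": 0.5,
             "release": 0.75, "offset": 1.0}

def sample_alignment(window=("center", "offset"), a=2.0, b=2.0):
    """Sample the phase of C1 at which the target of C2 is aligned.

    The phase window restricts alignment to [lo, hi]; the symmetric
    Beta(2, 2) weighting concentrates probability around the midpoint
    of the window, here the release landmark of C1.
    """
    lo, hi = LANDMARKS[window[0]], LANDMARKS[window[1]]
    return lo + random.betavariate(a, b) * (hi - lo)

random.seed(1)
samples = [sample_alignment() for _ in range(20000)]
mean_phase = sum(samples) / len(samples)
# Every sample falls inside the [center, offset] window, and the mean sits
# near the release landmark (0.75), mirroring the schematic in Figure 55.
assert all(0.5 <= s <= 1.0 for s in samples)
assert abs(mean_phase - LANDMARKS["release"]) < 0.01
```

Under this formulation, the two analyses discussed below correspond to two different edits: shifting the window means changing `lo` and `hi`, while reshaping the distribution means changing the Beta parameters with the window held fixed.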

However, this idea is not as simple as it first seems. Even if Byrd is correct, and a variable like speech rate influences the system by pushing alignment within the phase window toward the more overlapped end (but retaining the same beginning and ending values of the phase), it must be explained how this is implemented. That is, what controls the probability distribution of phase windows so that faster speech rates always have the effect of causing greater overlap? Byrd even illustrates a hypothetical distribution for the medium-fast speaking rate as an example, suggesting that there is in fact some fixed plan for how speech rate is executed. Yet, another possibility is that speech rate has the effect not of changing the probability distribution within the window, but of changing the point at which the window is effectively centered. In this scenario, the window and its distribution stay the same, but the starting and ending phase values or landmarks shift so that greater overlap is effected. Taking the CC-COORD example again, it could be that the range for C1 shifts from ALIGN(C1, [center-offset], C2, target) to ALIGN(C1, [target-release], C2, target). This is illustrated in Figure 56.


(a) ALIGN(C1, [center-offset], C2, target)    (b) ALIGN(C1, [target-release], C2, target)

Figure 56. (a) Standard English CC-COORD constraint with phase window. (b) Hypothetical constraint with the phase window shifted for greater overlap. The straight solid lines and double-sided arrow represent the width of the window, and the oval marks the current point of alignment.

The possible analyses—whether the probability distribution shifts or whether the landmark range does—likely predict differences in the range of variability that would be observed, and these could be tested empirically. The point that is important for the current discussion is that in general, phase windows are compatible with the proposition that a variable like fast speech can force a categorical change in the amount of overlap, suggesting implementation at a more central or cognitive level. Consequently, it can be hypothesized that speech rate affects phonological coordination. A number of studies have confirmed that the amount of overlap, or the coordination relationship between two gestures/constellations, varies systematically as a function of rate, though sometimes in a speaker-specific way. Nittrouer, Munhall, Kelso, Tuller and Harris (1988) demonstrated that the phase relation between the jaw closing gesture of a vowel and the upper lip lowering gesture of a following bilabial stop significantly shortened as speakers moved from normal to fast speech.29 This was primarily due to a decrease in jaw cycle duration. In other words, the jaw cycle duration was shorter at faster rates, which led to an increase in the amount of overlap between the jaw closing gesture and the lip lowering gesture. Nittrouer et al. concluded that the relation between jaw cycle duration and upper lip phase angle was not continuous.

Shaiman, Adams, and Kimelman (1995) examined the same relationship between the jaw closing gesture and the upper lip lowering gesture as a function of rate. Although their findings were in agreement with Nittrouer et al. (1988) in that the amount of overlap between these two gestures changed significantly as a result of manipulations of speech rate for 5 of 8 speakers, Shaiman et al. (1995) found that the direction of the effects was not consistent across speakers. That is, for three of the speakers, as the jaw cycle durations for the vowel decreased, the amount of overlap increased. For two other speakers, the phase angle decreased, meaning that the onset of upper lip lowering for the bilabial gesture started earlier in the jaw cycle at the slower rates of speech. They concluded that the way in which speakers implement rate changes may vary, although the notable finding is that these different methods all result in categorical differences between the amount of overlap found for slower versus faster speech. The remaining three speakers showed no significant difference in the amount of overlap across speaking rate. The authors note that this effect may be due to the fact that these speakers had particularly large standard deviations for all types of utterances, which may have prevented a significant difference from emerging.

29 In these studies, phase relation refers to the point in the 360° cycle of a gesture at which the following gesture begins. Shortening the phase angle means that the next gesture begins earlier in the cycle than it did under some other condition(s). This is equivalent to saying that overlap increases.
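The phase-angle arithmetic in this footnote can be made explicit. The durations below are invented for illustration (they are not Nittrouer et al.'s measurements); the sketch only shows how a shorter onset lag within the jaw cycle translates into a smaller phase angle, i.e. more overlap.

```python
def phase_angle(cycle_duration_ms, onset_lag_ms):
    """Degrees into the jaw cycle at which the next gesture begins."""
    return 360.0 * onset_lag_ms / cycle_duration_ms

# Hypothetical values: at a normal rate, upper-lip lowering begins 90 ms
# into a 200 ms jaw cycle; at a fast rate, 50 ms into a 140 ms cycle.
normal = phase_angle(200.0, 90.0)  # 162.0 degrees
fast = phase_angle(140.0, 50.0)    # ~128.6 degrees
# A smaller phase angle means the lip gesture starts earlier in the jaw
# cycle, i.e. the two gestures overlap more.
assert fast < normal
```

Note that if the absolute lag stayed at 90 ms while the cycle shrank to 140 ms, the phase angle would grow rather than shrink; the lag must decrease faster than the cycle for overlap to increase.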


Finally, as already discussed in Section 6.1.3, Tjaden and Weismer (1998) demonstrated that the F2 onset frequency of a vowel can vary systematically as it becomes more overlapped by a preceding consonant at faster rates of speech. However, they argue that the increase in speaking rate must be sufficiently large in order for overlap to cause a significant, categorical change in F2 onset frequency. In addition, not all of their 8 speakers showed significant changes in F2 onset frequency, leading the authors to suggest that individual speakers may use different factors to signal a faster rate of speech.

The issue of individual variability is important in the pre-tonic schwa elision study presented in this chapter, since in fact 5 of the 9 speakers did not show an increase in elision as speech rate increased. For these speakers, though, the amount of elision (for fricative-initial and /l/-second sequences) was much higher at the slow rate than it was for rate-dependent eliders. While statistical measures showed that these speakers significantly increased their rate in the fast condition, it is possible that their “canonical” coordination already provides for greater overlap. Following the suggestion of Tjaden and Weismer (1998), the increase in speech rate that these speakers self-selected may not have been great enough to shift them to a different coordination pattern. The remaining 4 speakers, however, produced greater amounts of elision in words with pre-tonic schwas in the fast condition. For these speakers, it can be hypothesized that the faster speaking rate they used in this experiment was sufficient to categorically change the coordination pattern used to determine how much overlap will be tolerated between the gestures in the target words. If the amount of overlap between two gestures is a direct consequence of a COORDINATION constraint (whether punctate or with phase windows), then the distinction between the different kinds of speakers could mean that even at habitual speaking rates, they may have different landmarks or windows designated by the alignment constraint. Alternatively, in a phase window account, such a distinction between speakers could also arise if different people have different probability distributions associated with their windows. Obviously speakers have different habitual speaking rates, so to the extent that this influences and is influenced by a speaker’s phonology, it might be expected that the exact gestural landmarks specified by the COORDINATION constraints may not be invariant across all speakers of a language.
While it can be established that increases in speaking rate may lead to a categorical change in the nature of the coordination found between two gestures, there is still an open question about how that change is specifically executed. It seems that there are two possibilities. One, alluded to above, is that speech rate is governed by a central parameter that directly influences phonological coordination. In other words, one of the ways in which speech rate is implemented is by using constraints that specify a more overlapping coordination. Presumably, this is done in order to achieve the goal of shortening utterance length without having to directly specify what that length should be. It is not being claimed that overlapping is the only factor that speakers manipulate in order to increase rate, but it is the one most directly related to temporal coordination. The other possibility is that greater overlap is a compensatory mechanism that is not planned at a central level, but falls out of the way in which the peripheral motor system organizes itself when faced with the task of producing a greater amount of speech in less time. Though the second possibility is still viable, a number of studies have provided computational evidence that speech rate is determined by a central/cognitive system that controls the temporal organization of the gestures before they reach the motor implementation stage. Some models of speech production systems have included a central clock which serves as an extrinsic timer. The task-dynamics model incorporates two levels that control different kinds of coordination: the intergestural level, and the interarticulatory level. Saltzman and Munhall (1989) define the roles of these levels as the following:

The intergestural level accounts for patterns of relative timing and cohesion among the activation intervals of gestural units that participate in a given utterance (e.g. the activation intervals for tongue-dorsum and bilabial gestures in a vowel-bilabial-vowel sequence). The interarticulator level accounts for the coordination among articulators evident at a given point in time due to the currently active set of gestures (e.g. the coordination among lips, jaw, and tongue during periods of vocalic and bilabial gestural coproduction). (336)

From this description, it can be inferred that the intergestural level most closely corresponds to the domain in which coordination constraints would be found. In discussing the types of overlap and “articulator sliding” presented in Stetson (1951) and Hardcastle (1985) due to increases in speaking rate, Saltzman and Munhall speculate that these rate effects could be captured in the dynamical model through “hypothetically simple changes in the values of a control parameter or parameter set presumably at the model’s intergestural level” (366).

A later version of the task-dynamics model presented in Saltzman, Löfqvist and Mitra (2000) eschews explicit gestural scores with predetermined coordination at the intergestural level, incorporating instead recurrent connectionist networks that determine gestural activation trajectories. Specifically, the intergestural level in this hybrid model contains a sequential neural network, while the interarticulator level still employs a task-dynamic model. Crucially, when the nature of the intergestural level is reconceived, an explicit subnetwork acting as a clock is required to ensure that time flow is accurately represented in the model. Certain parameters in the clocking subnetwork can be fixed to control the speech rate, which ultimately will have an effect on the coordination among gestures. In other words, by changing speech rate, gestures can be phased differently with respect to one another. It should be noted that although these researchers intend to eliminate the need for gestural scores as input, they still assume that there are gestures which must be assigned some kind of temporal coordination. This is compatible with the phonological framework proposed in Chapter 5, since individual gestures or gestural constellations are assumed to be in the input, not fully coordinated scores. It is then the job of the phonology—here the network of the intergestural level—to ensure that the gestures are temporally related in a principled, but language-specific way. In another type of speech production model proposed by Kawato, Vatikiotis-Bateson and colleagues, speech rate is determined by a combination of both temporal performance parameters and smoothness control, both of which are considered higher level information that affect trajectory planning and motor command generation (Vatikiotis-Bateson, Hirayama, Honda and Kawato 1992, Hirayama, Vatikiotis-Bateson and Kawato 1994, Munhall, Kawato and Vatikiotis-Bateson 2000). 
In this model, the gesture is essentially defined by via-points which demarcate the trajectory that a gesture should make through articulatory space. However, whether or not a gesture accurately attains the positions of the via-points depends on a smoothness constraint, which often forces trajectories to miss the via-points. Munhall, Kawato, and Vatikiotis-Bateson (2000) remark that the smoothness constraint alone may be responsible for distinguishing between casual/fast speech and precise/slow speech. Like the task-dynamics model, this model also assumes that a high-level planning component determines the rate of speech and then implements it in the coordinative patterns of the language. This too is compatible with a phonological account of temporal coordination.

The results of the study in Section 6.2 can be tentatively interpreted in terms of the preceding discussion of overlap, temporal coordination, and whether speech rate prompts, or is determined by, changes in the phonology. Both the task dynamics framework and the approach based on via-points and a smoothing constraint assume that either gestures (in the case of the former) or phonemes are the underlying units that are being combined and ultimately given temporal specifications. The phonological framework developed in Chapter 5 is compatible with the concepts espoused with respect to the dynamical systems in that different sequences of gestures have their own specific coordination patterns, and these must somehow be defined and enforced for any given language. In the task dynamics framework, it is proposed that there are functional linkages among gestures, and that these arise as a result of dynamical coupling among gestures in some particular relationship (such as onset cluster, onset-nucleus, nucleus-coda, etc.: Saltzman and Munhall 1989). It has been argued in this dissertation that these linkages are not posited lexically, but are rather assigned by the phonological system of a language. More specifically, the linkages are established by ASSOCIATION constraints and the temporal relationships are defined by COORDINATION constraints.
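The smoothness/via-point trade-off just described can be sketched numerically. This is a generic quadratic-penalty sketch, not the Kawato/Vatikiotis-Bateson model itself: the trajectory length, via-point location, and weights are invented, and the smoothness term penalizes squared second differences (acceleration) rather than true jerk. The point is only that a stiffer smoothness penalty makes the trajectory miss its via-point by more.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def smooth_trajectory(n=21, via=None, smooth=1.0):
    """Trajectory x[0..n-1] with endpoints pinned at 0, minimizing
    smooth * sum_j (x[j-1] - 2*x[j] + x[j+1])**2 + sum_k (x[t_k] - v_k)**2.
    Larger `smooth` makes the trajectory stiffer, so it undershoots the
    via-points by more (the casual/fast-speech regime, in the text's terms).
    """
    via = {10: 1.0} if via is None else via  # hypothetical via-point
    m = n - 2                                # free interior points x[1..n-2]
    A = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    # Normal equations (smooth * D^T D + E) x = E v, where D is the
    # second-difference operator and E picks out the via-point indices.
    for j in range(1, n - 1):
        for ai, ac in ((j - 1, 1.0), (j, -2.0), (j + 1, 1.0)):
            if 0 < ai < n - 1:
                for bi, bc in ((j - 1, 1.0), (j, -2.0), (j + 1, 1.0)):
                    if 0 < bi < n - 1:
                        A[ai - 1][bi - 1] += smooth * ac * bc
    for t, v in via.items():
        A[t - 1][t - 1] += 1.0
        b[t - 1] += v
    return [0.0] + solve(A, b) + [0.0]

precise = smooth_trajectory(smooth=0.01)  # slow/precise: nearly hits via-point
casual = smooth_trajectory(smooth=1.0)    # fast/casual: smoother, misses it
assert abs(casual[10] - 1.0) > abs(precise[10] - 1.0)
```

Raising `smooth` plays the role described for casual or fast speech: the trajectory stays smoother and undershoots the via-point, analogous to gestures missing their spatial targets at faster rates.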
It is plausible that this process—phonological optimization—resides at the intergestural level, especially considering the formulation of this level as a neural network (Saltzman et al. 2000), given the relationship between Optimality Theory and neural networks (Prince and Smolensky 1997, Smolensky and Legendre to appear). If speech rate is determined at the intergestural level, then one way to conceive of effecting a rate change is to change the values or the range specified in the COORDINATION constraints. For example, using a phase window formulation of the constraints, it could be hypothesized that the CV-COORD constraint for a standard or normal rate is ALIGNNORMAL(C, [target-release], V, onset). At a faster rate, a constraint which gives rise to more overlap may be specified (or higher-ranked): ALIGNFAST(C, [onset-center], V, onset). As mentioned earlier in this section, this kind of grammatical change is one mechanism that speakers use in order to produce more sounds in less time, which is presumably the goal of increasing the speech rate. Note that if the onset of V is aligned with the onset of C, the gestures will be fully overlapped and will give the appearance of deletion, or what has been called elision in this chapter. Alternatively, it could be that the probability distribution shifts so that the onset of the V is more often closer to the target of the C. However, this approach would entail that the vowel gesture must also shrink in size at faster rates, or else there would never be elision. This is because the earliest landmark that the onset of the vowel could be aligned with is the target, and so there would always be some portion of the vowel gesture not overlapped by the consonant. With shifting windows, it is possible to have elision without also shrinking the duration of the vowel gesture, although a decrease in vowel duration is still possible.


The two types of coordination constraints possible under a shifting window analysis are illustrated in Figure 57.

ALIGNNORMAL(C, [target-release], V, onset)    ALIGNFAST(C, [onset-center], V, onset)

Figure 57. Normal and fast CV-COORD constraints (with phase windows). The straight solid lines and double-sided arrow represent the width of the window, and the oval marks the current point of alignment.

In a punctate (non-phase-window) approach, the alignment would be defined at the points delineated by the oval. For the speakers in this study who showed significantly greater elision in the fast condition, their self-selected change in rate was sufficient to induce a change from ALIGNNORMAL to ALIGNFAST as the constraint controlling coordination. For the other speakers, a greater change in rate might be necessary to make the ALIGNFAST constraint the active one. On the other hand, it is perhaps possible that these speakers already use ALIGNFAST at all speech rates, and thus cannot have a more overlapping coordination, since the onset of the V gesture in ALIGNFAST is already allowed to start at the onset of C, which is the earliest possible landmark it can be aligned with.
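A toy timeline illustrates why the shift from ALIGNNORMAL to ALIGNFAST can surface as elision. The landmark fractions and durations below are invented for the sketch, and the vowel is assumed to be shorter than the consonant's span; the point is only that aligning the vowel's onset at the consonant's onset leaves no un-overlapped vowel, while alignment at or after the target always leaves some audible residue.

```python
# Hypothetical landmark times, as fractions of the consonant gesture's
# duration (these proportions are invented for illustration).
LM = {"onset": 0.0, "target": 0.3, "center": 0.5, "release": 0.7, "offset": 1.0}

def exposed_vowel_ms(c_landmark, c_dur=100.0, v_dur=80.0):
    """Milliseconds of the vowel gesture left un-overlapped by the consonant
    when the vowel's onset is aligned with the named consonant landmark."""
    v_onset = LM[c_landmark] * c_dur
    return max((v_onset + v_dur) - c_dur, 0.0)

# ALIGNNORMAL(C, [target-release], V, onset): some vowel always remains
# audible, even at the earliest landmark in the normal window (~10 ms here).
assert exposed_vowel_ms("target") > 0.0
assert exposed_vowel_ms("release") > exposed_vowel_ms("target")
# ALIGNFAST(C, [onset-center], V, onset): alignment at C's onset fully
# hides the (shorter) vowel, which surfaces as elision.
assert exposed_vowel_ms("onset") == 0.0
```

This also reproduces the contrast drawn above between shifting the window and reshaping its probability distribution: if the earliest available landmark stays at the target, `exposed_vowel_ms` never reaches zero without shrinking the vowel gesture itself.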

6.4.2. Deletion plus gestural mistiming: A possible alternative account?

In Chapter 4 and Chapter 5, it was argued that when producing non-native word-initial clusters, English speakers do not epenthesize a vowel. Instead, the vowel present on the acoustic record was shown to be the result of producing gestures with a non-overlapping coordination. Since it has been demonstrated that this configuration gives rise to the appearance of a schwa on a spectrogram, it is possible that this is what the speakers are doing when producing /#CəC-/ words at faster speaking rates. That is, it could be that speakers actually are deleting the vowel gesture, and that the schwa found on the acoustic record is present because the consonants are not produced in conformity with CC-COORD. Based on the grammar of the phonotactics for English presented in Chapter 5, this is possible for clusters like /tp/ (from Topeka) or /ft/ (from fatigue), since higher-ranked *OVERLAP constraints would in fact prohibit these gestures from becoming associated and forming an initial cluster. Considering these types of clusters alone, it would be possible that speakers were actually deleting the vowel at faster rates of speech, but then failing to form a correctly coordinated cluster because *OVERLAP constraints prevented that from happening. Consequently, the schwa on the surface would result from non-overlapping gestures. In this conception of fast speech, deletion of a weak vowel like schwa could come from promoting a *STRUCTURE constraint which would reduce the number of gestures that have to be produced in a shorter period of time.

However, this process is not extendable to /s/-initial or /l/-second sequences. For example, given a word like superior, if a speaker deletes the initial schwa, there would be no *OVERLAP constraint prohibiting the formation of the cluster /sp/. Yet it was shown in Section 6.3.3 that even with elision a true cluster was not formed, as evidenced by the presence of aspiration on the first /p/ and a longer duration for the /s/. It was argued that


the duration of the /s/ likely results from the inability to distinguish between the fricative and a following devoiced/aspirated vowel, but even if it were really the case that the schwa were deleted in these tokens, then it would be impossible for the phonology to produce an optimal output with these surface phonetic characteristics for /s/ and /p/. Putting aside the longer /s/ (which would be difficult to explain with any phonological account assuming deletion had occurred), the presence of aspiration on the /p/ suggests that the /s/ is not associated with the following vowel, leaving /p/ to have the appropriate coordination which would give rise to aspiration. In Chapter 5, this was represented by the candidate shown in (5).

(5)  s  p  i   →   [sphi]rior

The tableau in (6) shows the candidates and optimal outcome for the production of superior, assuming there is a high-ranked *STRUC constraint that forces deletion to occur. The ranking for all other constraints is taken from Chapter 5. CC-COORD and CV-COORD are taken to be the original constraints with punctate coordination.

(6)  Input: /s p ə i/ (superior)

     Constraints, in ranking order: *STRUC, ASSOC-C, CC-COORD, ASSOC-CV, ASSOC-CC, CV-COORD, *OV/[+cons], [+cons]

     ☞ a.  s p i    [spi]       ** *
        b.  s p i    [sphi]      *! *
        c.  s p i    [səpi]      *! **
        e.  s p i    [s.phi]     *! * *
        f.  s p ə i  [səphi]     *!

In (6) the optimal candidate is a, which has a well-formed cluster. The candidate in b, which has aspiration on the /p/ but no excrescent schwa between the consonants, is


not the winner, even though this is what is present on the surface. In some productions of superior, there actually is a schwa in the acoustic record, which could be represented by c under the true deletion story. But again, given this ranking for English, it would be impossible for c to be the optimal candidate. The same situation is true for any other /s/-initial or /l/-second sequence that would give rise to a legal word-initial cluster. Actual deletion in these cases would entail a well-formed cluster on the surface, but this often does not happen. Assuming that the same process is at work for both potentially legal and potentially illegal word-initial clusters (that is, some *STRUC constraint is high-ranked and forces deletion for both types of sequences), this account does not produce the right results, and therefore must be rejected.
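The logic of this argument can be replayed with a minimal strict-domination evaluator. The constraint ranking is the one given for tableau (6), but the per-candidate violation counts below are hypothetical placeholders (the original cell-by-cell assignments are not reproduced here); they encode only the structure of the argument: every deletion candidate incurs a fatal violation on a higher-ranked constraint than any violation of candidate a.

```python
def eval_ot(ranking, candidates):
    """Standard OT evaluation: the winner has the lexicographically smallest
    violation profile when constraints are read in ranking order."""
    profile = lambda v: tuple(v.get(c, 0) for c in ranking)
    return min(candidates, key=lambda name: profile(candidates[name]))

# Ranking taken from tableau (6).
ranking = ["*STRUC", "ASSOC-C", "CC-COORD", "ASSOC-CV",
           "ASSOC-CC", "CV-COORD", "*OV/[+cons],[+cons]"]

# HYPOTHETICAL violation profiles, illustrative of the argument's shape only:
candidates = {
    "a. [spi]":   {"CV-COORD": 2, "*OV/[+cons],[+cons]": 1},
    "b. [sphi]":  {"CC-COORD": 1, "CV-COORD": 1},
    "c. [səpi]":  {"ASSOC-C": 1, "CV-COORD": 2},
    "e. [s.phi]": {"ASSOC-C": 1, "CC-COORD": 1, "CV-COORD": 1},
    "f. [səphi]": {"*STRUC": 1},
}

winner = eval_ot(ranking, candidates)
assert winner == "a. [spi]"
# The attested surface form b can never beat a under this ranking, which is
# why the deletion-plus-mistiming account fails for /s/-initial sequences.
```

Because evaluation is lexicographic over the ranked constraints, no reweighting of lower-ranked violations can rescue candidate b; only a different ranking (not available for English, per Chapter 5) could do so.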

6.5. Summary

The goal of this chapter was to examine whether pre-tonic schwa deletion in fast speech is a phenomenon that potentially interacts with phonological coordination. In order to determine this, three questions had to be answered. The first question asked whether apparent deletion—elision—is actually the result of the deletion of the schwa gesture, or whether it is more accurately attributed to substantial gestural overlap. Results from the experiment in Section 6.2 suggest that elision is most likely a result of overlap. Whereas a true deletion account would have to assume that different /#CəC-/ sequences selectively allow deletion, overlap can account for the results by assuming that the same process occurs for all /#CəC-/ sequences. Specifically, the rates of elision and aspiration or devoicing for voiceless stop-initial, voiced-initial, fricative-initial, and /l/-second sequences can all be accounted for in terms of gestural overlap. This analysis is bolstered by the answer to the second question, regarding whether the surrounding consonants affect the amount of elision present. In other words, if a legal word-initial cluster will result when the schwa is elided, is elision more common? The answer is that elision rates are dependent on the phonotactic environment in a certain sense, since elision is more frequent in fricative-initial and /l/-second sequences, but this is not related to the phonotactic legality of a resulting cluster. The breakdown of the categories into individual clusters showed that sequences which would lead to illegal onsets, such as /#dəl/ or /#səb/, lead to just as much elision as the sequences which are found in English, such as /#bəl/ or /#səm/. Since fast speech can cause greater gestural overlap between the initial consonant and the schwa regardless of the composition of the sequence, the legality of the resulting sequence does not have any effect.
Finally, having established that elision is a result of substantial gestural overlap, the mechanism for causing changes in the amount of overlap was discussed. For almost half of the speakers in the study, speaking at a faster rate of speech caused significantly greater elision in the fricative-initial and /l/-second sequences. A review of some models of gestural production including the task-dynamics model and the via-points/smoothing constraint model suggests that speech rate affects coordination patterns at a central level. In the task-dynamics model, this is the intergestural level which also contains the specifications for the gestures in a particular utterance. The fact that coordination and gestural specification are organized at the same level is compatible with the phonological component developed in this dissertation, which also assumes that these aspects interact at the same level of planning. A change in speech rate is implemented by modifying a coordination parameter, which was hypothesized to be a change in the specifications of


the CV-COORD alignment constraint. By shifting the coordinating landmarks to be earlier in the time course of a gesture, more overlap is achieved. A phase-window approach was also considered in order to capture the amount of variation present in the amount of overlap that is allowed. Whether or not phase-windows are ultimately the right approach to capturing the variation is an empirical matter that requires further research.


CHAPTER 7. Concluding Remarks

The central goal of this dissertation was to investigate the role of temporal coordination in the grammar, how coordination interacts with perceptual and articulatory features, and whether these factors can be combined with gestural representations to adequately characterize both experimental production and cross-linguistic typological data. The main evidence used to examine these issues came from three experiments focusing on English speakers’ production of word-initial consonant clusters, since previous research has shown that consonant clusters are a phonotactic environment that is particularly sensitive to perceptual, articulatory, and temporal factors. In the first experiment, English speakers produced stimuli with Czech-legal/English-illegal initial clusters, as in the nonsense words vmape or fpala. Though all of the stimuli began with an initial fricative, a number of other variables were manipulated, including the place and voice of the first consonant and the manner of the second consonant. Results showed that when faced with fricative-initial clusters that are illegal in English (/fC/, /zC/ and /vC/), English speakers do not produce them all with equal accuracy. On the contrary, speakers were most accurate on /fC/ clusters, followed by /zC/ and then /vC/. The specification of the second consonant also contributed to speakers’ performance; for each initial consonant, speakers were more accurate when the fricative was followed by a nasal than when it was followed by an obstruent. It was argued that these results follow from the articulatory and perceptual factors that make /fC/, /zC/ and /vC/ disadvantaged with respect to /sC/, a type of fricative-initial cluster that is legal in English. Specifically, evidence was presented showing that decreased accuracy for /fC/ relative to sibilants is perceptually motivated: low-intensity fricatives may be insufficiently detectable at obstruent cluster edges.
The disadvantage of /zC/ was claimed to be articulatory in origin, since oral pressure buildup in obstruent clusters makes voicing difficult to maintain. The cluster /vC/, which is both voiced and weak in intensity, caused the lowest rates of accuracy among speakers. That speakers discriminate between /f/-, /z/-, and /v/-initial clusters, which are not related on an intrinsic phonetic difficulty scale, suggests that the distinction is located at a phonological level. A number of arguments were presented to support the view that these results could not be attributed to articulatory difficulty or poor phonetic implementation.

Results of the first experiment also showed that when speakers failed to produce the non-native clusters accurately, they most often repaired them with schwa insertion. In studies of second language acquisition or loan phonology, it is typically assumed that the schwa results from the phonological epenthesis of a vowel with its own gesture. However, a number of studies in the Articulatory Phonology framework have questioned this assumption, arguing that excrescent schwa can appear in the acoustic record in some environments even when there is no corresponding gesture. Specifically, if the consonant gestures in the cluster are coordinated so that they are barely overlapping, then a brief period of vocal tract opening between the consonants can give rise to voicing.

Consequently, the second experiment investigated the nature of the inserted schwa from the first experiment. Using ultrasound imaging, speakers were recorded producing triads of words such as succumb-scum-zgomu. When speakers produced the non-native word with a schwa inserted into the initial cluster, the ultrasound frames
containing the tongue shapes corresponding to [zəC] were compared to the frames for both [səC] and [sC]. It was hypothesized that if [zəC] did not contain a schwa with its own gesture, then the tongue shapes for [zəC] should be reliably more similar to those for [sC] than to those for [səC]. Results showed that the hypothesis was consistently upheld for three of the five speakers in the experiment. It was concluded that, at least for some speakers, failure to coordinate the consonant gestures according to the canonical coordination patterns for English (gestural mistiming) is a viable repair of phonotactically illegal sequences.

The results of these experiments can be accounted for in a constraint-based phonological framework incorporating gestural coordination. It was argued that if the distinction speakers make between illegal word-initial clusters like /fC/, /zC/ and /vC/ has a phonological locus, then gestural mistiming must also be phonological, since it interacts with the production of these sequences. That is, speakers ‘mistime’ /vC/ clusters more often than /zC/ or /fC/ clusters. Taking Gafos’s (2002) grammar of gestural coordination as a model, it was argued that an adequate phonological component must be able to refer both to temporally defined gestural elements and to features with perceptual or acoustic origins. Consequently, Gestural Association Theory (GAT), encompassing three families of constraints, was developed.

First, *OVERLAP constraints were proposed to establish the combinations of consonant sequences that can be present in a language. These constraints can include both perceptual and articulatory features when determining which consonantal gestures may or may not overlap to form a cluster in a language. To account for differences between consonant sequences found tautosyllabically and those allowed only heterosyllabically, *OVERLAP must interact with a set of constraints that determine the syllabic affiliation of gestures.
These were called ASSOCIATION constraints, and they form the second family of constraints in GAT. By definition, *OVERLAP constraints pertain only to associated sequences; in other words, to be treated as an onset cluster, gestures must be associated. Finally, associated gestures are also subject to COORDINATION constraints, which determine the temporal relationships between the landmarks of gestures. Thus, associated units like multiple consonants in an onset must conform to CC-COORD, which establishes, for example, whether the cluster will have an open or closed transition.

It was proposed that in an experimental situation where participants are trying to accurately produce phonotactically illegal sequences, speakers may not accomplish the task perfectly because they fail to establish the correct coordination between the consonants in the illegal cluster. In the current analysis, six *OVERLAP constraints prohibit these sequences, but when CC-COORD is sufficiently low-ranked, speakers can avoid violations of the *OVERLAP constraints: by producing the gestures in the cluster so that they do not overlap, *OVERLAP is satisfied. In order to fully account for the results of the first experiment, it was proposed that speakers must allow CC-COORD to float over the range of the six *OVERLAP constraints. When CC-COORD has an equal probability of being ranked at each point in the range on each optimization, the accuracy levels observed in the first experiment are captured fairly closely.

Finally, just as phonological coordination among gestures accounts for the appearance of a schwa between consonants in the production of non-native forms, it also governs the schwa’s disappearance in the production of legal English words. This was addressed in the last experiment, in which deletion of pre-tonic schwa was examined. Despite claims in the phonological literature that pre-tonic schwa deletion is a
phonological rule active in fast speech, instrumental studies have challenged this assertion, suggesting instead that schwa deletion rates are much lower than has been assumed and that cases of apparent deletion are actually due to extreme gestural overlap. The third experiment further investigated the viability of the claim that schwa deletion is phonological by having speakers produce 23 different types of words with pre-tonic schwa (such as semester, potato, vanilla, etc.) at both normal and fast rates of speech. It was hypothesized that if deletion is phonologically governed, it should be sensitive to whether or not it would result in a word-initial cluster that is legal in English.

The results showed that phonotactic legality did not affect deletion rates, but that the schwa appeared to be elided more frequently in fricative-initial and /l/-second sequences. A fine-grained analysis of the phonetic details of the resulting sequences indicated that even when there was no evidence of a voiced schwa on the surface, other factors, such as consonant duration and aspiration on the second consonant, provide evidence that the schwa is not deleted.

It was argued that the results could be accounted for with a uniform explanation in terms of gestural overlap. If speakers increase the overlap between the initial consonant and the schwa in fast or casual speech, different sequences are predicted to have different acoustic manifestations: aspiration in place of a vowel should increase for voiceless stop-initial sequences, voiced stop-initial sequences should be unaffected, and the schwa should appear to be missing for fricative-initial and /l/-second sequences. These predictions were upheld. Despite the lack of evidence for phonological deletion of a gesture, these results suggest that phonological coordination plays a role in determining the amount of overlap that is preferred at faster rates of speech.
It was hypothesized that increased overlap at faster rates of speech, or as part of a speaker’s habitual speech rate (as compared to more deliberate speech), may not simply be a reflex of phonetic implementation, but rather the result of a change in the phonological coordination relationship specified for consonants and vowels. This idea is consistent with proposals from the modeling of gestural production, which typically treat both temporal coordination and speech rate as high-level parameters that are part of the planning stages.

In conclusion, this dissertation has provided evidence that temporal factors must be regulated by the phonology and not simply left to phonetic implementation. Furthermore, I have shown that temporal coordination interacts with both perceptual and articulatory features. It is only at a more abstract, or phonological, level that these elements can combine to account for both speech production and typological data.
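As a concrete illustration of the landmark-based view of coordination that runs through these conclusions, a minimal sketch is given below. The landmark names, the particular "open transition" alignment chosen, and all timing values are illustrative assumptions, not GAT's formalism itself:

```python
from dataclasses import dataclass

@dataclass
class Gesture:
    """Hypothetical landmark times (in seconds) for one gesture."""
    onset: float     # movement toward the constriction begins
    target: float    # constriction is reached
    release: float   # constriction begins to be released
    offset: float    # movement away from the constriction ends

def cc_coord_open_transition(c1: Gesture, c2: Gesture, tol: float = 0.005) -> bool:
    """One possible CC-COORD setting: an 'open' transition aligns the
    release of C1 with the onset of C2, leaving a brief open period
    between the two constrictions (a source of excrescent schwa)."""
    return abs(c1.release - c2.onset) <= tol

# C1's constriction is released just as movement toward C2 begins.
c1 = Gesture(onset=0.00, target=0.05, release=0.12, offset=0.18)
c2 = Gesture(onset=0.12, target=0.17, release=0.25, offset=0.30)
print(cc_coord_open_transition(c1, c2))  # True
```

On this view, a rate-induced change in overlap amounts to re-specifying the coordination relation (e.g., aligning C2's onset with C1's target instead of its release), rather than a low-level phonetic adjustment.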

APPENDIX 1. Words used in Czech-consonant cluster experiment (Chapter 3)
Note: Stimuli are written in IPA.

/s/-initial
sm: smava, smani, smagu, smapo
sn: snabu, snadi, snale, snava
sf: sfano, sfadu, sfagi, sfatu
sp: spagi, spanu, spato, spama
st: stamo, staka, stape, staga
sk: skabu, skafu, skadi, skamo

/f/-initial
fm: fmatu, fmake, fmapa, fmale
fn: fnagu, fnada, fnaʃa, fnapi
fs: fsaga, fsake, fsapi, fsalu
fp: fpami, fpala, fpaze, fpaku
fk: fkada, fkabe, fkati, fkale
ft: ftake, ftapi, ftano, ftani

/z/-initial
zm: zmafo, zmagu, zmapi, zmadu
zn: znagi, znafe, znaʃo, znase
zv: zvato, zvabu, zvami, zvapa
zb: zbatu, zbasi, zbagi, zbano
zd: zdanu, zdaba, zdapo, zdati
zg: zgano, zgame, zgaba, zgade

/v/-initial
vm: vmala, vmape, vmabu, vmadu
vn: vnali, vnake, vnapa, vnaze
vz: vzaku, vzamo, vzaba, vzagi
vb: vbagu, vbano, vbadu, vbaki
vd: vdale, vdapi, vdaba, vdagu
vg: vgane, vgalu, vgapo, vgadi

APPENDIX 2: An alternative to local conjunction (Chapter 5)

The analysis of the production of non-native fricative-initial clusters by English speakers in Chapter 5 proposes that the conjoined constraints *OV/[−strid]&[+voi],[−son] and *OV/[−strid]&[+voi],[−approx] are needed to account for the fact that speakers are less accurate on /vC/ than they are on /fC/ and /zC/ clusters. It was noted that /vC/ clusters are the “worst-of-the-worst” since the initial consonant is both voiced and non-strident; typically this situation is accounted for with a locally-conjoined constraint in OT. By ranking the conjoined constraints above the corresponding simple constraints *OV/[−strid],Fβ and *OV/[+voi,-son],Fβ, speakers are able to make a distinction between /fC/, /zC/ and /vC/ clusters. The analysis presented in this chapter assumes that there is a hidden ranking not just of the conjoined constraints, but of all of these constraints for English speakers, as shown in (1).

(1) *OV/[−strid]&[+voi],[−son] ≫ *OV/[−strid]&[+voi],[−approx], *OV/[+voi],[−son] ≫ *OV/[+voi],[−approx], *OV/[−strid],[−son] ≫ *OV/[−strid],[−approx] ≫ *OV/[+cons],[+cons]

However, it is not actually the case that a conjoined constraint is required to get significantly worse performance on /vC/ clusters in a floating constraint analysis. Rather than concluding that English speakers have hidden rankings of *OVERLAP constraints that are uncovered by the experimental task, it is also possible that the constraints which make certain clusters phonotactically illegal are actually not ranked with respect to one another in the English grammar. Taking just the simple case in which only the initial consonant of the clusters is considered, it can be demonstrated that allowing CC-COORD to float over the whole range of *OVERLAP constraints does produce lower accuracy for /vC/ than for either /fC/ or /zC/ clusters. When *OV/[−strid],Fβ and *OV/[+voi,-son],Fβ are not ranked with respect to one another, either ranking *OV/[−strid],Fβ ≫ *OV/[+voi,-son],Fβ or *OV/[+voi,-son],Fβ ≫ *OV/[−strid],Fβ is possible. Both of these, however, are ranked above *OV/[+cons],[+cons], since consonant clusters are not generally prohibited. The possible rankings of these constraints (not including *OV/[+cons],[+cons], which is always bottom-ranked) are shown in (2).

(2)
Ranking                                                   Inventory
CC-COORD ≫ *OV/[+voi,-son],Fβ ≫ *OV/[−strid],Fβ           sC, fC, zC, vC
CC-COORD ≫ *OV/[−strid],Fβ ≫ *OV/[+voi,-son],Fβ           sC, fC, zC, vC
*OV/[+voi,-son],Fβ ≫ CC-COORD ≫ *OV/[−strid],Fβ           sC, fC
*OV/[−strid],Fβ ≫ CC-COORD ≫ *OV/[+voi,-son],Fβ           sC, zC
*OV/[+voi,-son],Fβ ≫ *OV/[−strid],Fβ ≫ CC-COORD           sC
*OV/[−strid],Fβ ≫ *OV/[+voi,-son],Fβ ≫ CC-COORD           sC

Total number of possible grammars: 6
Number of grammars allowing: /sC/: 6/6; /fC/: 3/6; /zC/: 3/6; /vC/: 2/6
Predicted proportion of accuracy on: /sC/: 100%; /fC/: 50%; /zC/: 50%; /vC/: 33%
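The counts in (2) can be reproduced mechanically. The following sketch (an illustration, not part of the original analysis) enumerates all total rankings of the three constraints and counts, for each cluster type, the grammars in which CC-COORD outranks every *OVERLAP constraint the cluster violates, i.e., the grammars in which the cluster surfaces faithfully:

```python
from itertools import permutations

# Abbreviations of the constraints in (2).
CC = "CC-COORD"
OV_VOI = "*OV/[+voi,-son],Fb"   # violated by voiced-fricative-initial clusters
OV_STRID = "*OV/[-strid],Fb"    # violated by non-strident-fricative-initial clusters

# Which *OVERLAP constraints each cluster type violates.
VIOLATIONS = {
    "sC": set(),                  # strident and voiceless: violates neither
    "fC": {OV_STRID},             # non-strident
    "zC": {OV_VOI},               # voiced
    "vC": {OV_STRID, OV_VOI},     # both: the "worst-of-the-worst"
}

def grammars_allowing(violations):
    """For each cluster, count the total rankings of the three constraints
    in which CC-COORD outranks every constraint the cluster violates."""
    counts = {cluster: 0 for cluster in violations}
    for ranking in permutations([CC, OV_VOI, OV_STRID]):
        pos = {c: i for i, c in enumerate(ranking)}   # 0 = highest-ranked
        for cluster, violated in violations.items():
            if all(pos[CC] < pos[c] for c in violated):
                counts[cluster] += 1
    return counts

print(grammars_allowing(VIOLATIONS))  # {'sC': 6, 'fC': 3, 'zC': 3, 'vC': 2}
```

Dividing each count by the six possible grammars yields exactly the predicted accuracy proportions listed under (2).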

The rankings and inventories in (2) show that if each of these grammars is equally available to speakers on any given attempt to produce an experimental cluster, then speakers are indeed expected to show lower accuracy on /vC/ clusters than on /fC/ or /zC/ clusters. However, this analysis makes the wrong prediction regarding /fC/ and /zC/ clusters: whereas the experiment showed that speakers exhibited 60% accuracy for /fC/ clusters and 39% for /zC/ clusters, this account predicts that they will be equally accurate on both. The predictions become even more complicated as Fβ is added and more constraints are included in the floating range. Again, this is a case in which there are no hidden rankings of *OVERLAP constraints, as shown in (3).

(3) *OV/[+voi],[−son], *OV/[+voi],[−approx], *OV/[−strid],[−son], *OV/[−strid],[−approx] ≫ *OV/[+cons],[+cons]

The hierarchy in (3) gives rise to 120 possible grammars in a floating constraint analysis ((4! orders for the *OVERLAP constraints = 24) × (5 locations for CC-COORD in each order of *OVERLAP constraints) = 120 grammars). Calculating the clusters allowed by each of the 120 grammars gives rise to the predictions in (4).

(4)
           sC         fN        fO        zN        zO        vN        vO
predicted  1.00       .50       .33       .50       .33       .33       .21
           (120/120)  (60/120)  (40/120)  (60/120)  (40/120)  (40/120)  (25/120)
observed   .97        .75       .53       .56       .31       .31       .18

As in the simple case, including different values for Fα and Fβ in the constraints still fails to produce the observed proportions. Again it is predicted that speakers should produce the pair /fN/ and /zN/ and the pair /fO/ and /zO/ with equal accuracy. Based on these calculations, it is concluded that the experimental results from Chapter 3 can only be accounted for with both hidden rankings and locally-conjoined constraints. However, the discovery that different proportions of accuracy can result from floating constraints alone, and that hidden rankings and locally-conjoined constraints are not absolutely required, is a notable theoretical point that may enable interesting analyses of other data qualitatively similar to those analyzed here, but with different quantitative structure.
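The 120-grammar computation behind (4) can be sketched the same way. The code below (an illustration, not the dissertation's own procedure) assumes, as above, that a cluster is admitted by a grammar only when CC-COORD outranks every *OVERLAP constraint the cluster violates; under that criterion the counts come out very close to those reported in (4):

```python
from itertools import permutations

CC = "CC-COORD"
V_SON = "*OV/[+voi],[-son]"     # voiced C1 + obstruent C2
V_APP = "*OV/[+voi],[-approx]"  # voiced C1 + obstruent or nasal C2
S_SON = "*OV/[-strid],[-son]"
S_APP = "*OV/[-strid],[-approx]"
OVERLAP = [V_SON, V_APP, S_SON, S_APP]

# Violation profiles: /f/ is [-strid], /z/ is [+voi], /v/ is both;
# a nasal C2 is only [-approx], while an obstruent C2 is [-son] and [-approx].
VIOLATIONS = {
    "sC": set(),
    "fN": {S_APP},
    "fO": {S_SON, S_APP},
    "zN": {V_APP},
    "zO": {V_SON, V_APP},
    "vN": {V_APP, S_APP},
    "vO": {V_SON, V_APP, S_SON, S_APP},
}

counts = {cl: 0 for cl in VIOLATIONS}
total = 0
for order in permutations(OVERLAP):        # 4! = 24 orders of *OVERLAP
    for slot in range(len(OVERLAP) + 1):   # 5 floating positions for CC-COORD
        total += 1
        ranking = list(order[:slot]) + [CC] + list(order[slot:])
        pos = {c: i for i, c in enumerate(ranking)}
        # A cluster surfaces faithfully only when CC-COORD outranks
        # every *OVERLAP constraint the cluster violates.
        for cl, violated in VIOLATIONS.items():
            if all(pos[CC] < pos[c] for c in violated):
                counts[cl] += 1

print(total)   # 120
print(counts)  # {'sC': 120, 'fN': 60, 'fO': 40, 'zN': 60, 'zO': 40, 'vN': 40, 'vO': 24}
```

This simple admission criterion yields 24/120 (.20) for /vO/ rather than the 25/120 (.21) in (4), so the exact counting convention assumed here may differ slightly from the original; the remaining six proportions match.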

APPENDIX 3: Stimuli from fast speech schwa elision experiment (Chapter 6)

VOICED-INITIAL SEQUENCES
dv: developers, development, diversify
db: debatable, debilitating, debauchery
bg: begun, beginning, begonia
dp: depend, dependence, departure
dk: Dakota, decanter, decaying
bf: before, buffet, befall
vn: Venetian, Vanessa, vanilla
dn: denotes, denounced, denied
dm: democracy, demolish, diminish
df: deficient, defeat, defense

VOICELESS-STOP INITIAL SEQUENCES
kp: Copernicus, capacity, capitulated
pt: potato, petunia, petition
pth: pathetic, pathology, Pythagorean
pd: pedantic, pedometer, pedestrian
tb: Tabasco, tabouli, Tibetan
tm: tamale, tomato, timidity
km: commercial, commendable, community

FRICATIVE-INITIAL SEQUENCES
sf: sophisticated, sufficient, suffice
sv: civilian, Savannah, Seville
sm: Somalian, semester, cement
sp: superior, support, superfluous
sb: suburban, Sebastian, sabbatical
ft: fatigue, photographer, photography

/l/-SECOND SEQUENCES
bl: balloon, believe, baloney
ml: Malaysia, malaria, Melissa
dl: dilapidated, delightful, delectable
gl: Galapagos, Galicia, galactic
sl: cylindrical, selection, celebrity

APPENDIX 4: Word frequency counts using Google (Chapter 6)

Word | CELEX raw frequency (out of 14.7 million) | Google raw frequency (out of 3.08 billion). A dash indicates that the word has no CELEX count.

ballooning | 14 | 253,000
baloney | — | 91,500
befall | 5 | 223,000
before | 17510 | 116,000,000
beginning | 2916 | 25,300,000
begonias | 3 | 99,100
begun | 1114 | 4,850,000
believe | 1289 | 35,300,000
buffet | 2 | 2,570,000
capacity | 926 | 17,800,000
capitulated | 3 | 45,000
celebrity | 46 | 12,000,000
cement | 201 | 2,310,000
civilians | 65 | 3,030,000
commendable | 27 | 277,000
commercial | 1012 | 34,200,000
community | 2256 | 91,900,000
Copernicus | 24 | 537,000
cylindrical | 13 | 661,000
Dakota | — | 10,600,000
debatable | 21 | 227,000
debauchery | 11 | 187,000
debilitating | 32 | 329,000
decanter | 16 | 168,000
decaying | 76 | 363,000
defeat | 400 | 3,520,000
defense | 1843 | 14,600,000
deficient | 59 | 908,000
delectable | 31 | 253,000
delightful | 234 | 1,660,000
democracy | 941 | 5,320,000
demolish | 6 | 245,000
denied | 98 | 5,680,000
denotes | 19 | 1,930,000
denounced | 22 | 280,000
departure | 453 | 4,800,000
depend | 572 | 6,120,000
dependence | 229 | 2,890,000
developers | 65 | 15,500,000
development | 3241 | 92,300,000
dilapidated | 44 | 180,000
diminish | 29 | 890,000
diversify | 8 | 422,000
fatigued | 4 | 2,570,000
galactic | 28 | 1,030,000
Galapagos | — | 493,000
Galicia | — | 207,000
malaria | 37 | 502,000
Malaysia | 52 | 5,150,000
Melissa | — | 2,220,000
pathetic | 195 | 504,000
pathology | 28 | 1,180,000
pedantic | 30 | 113,000
pedestrian | 65 | 722,000
pedometer | 1 | 49,800
petitioned | 4 | 2,130,000
petunias | 4 | 75,000
photographer | 159 | 2,130,000
photography | 98 | 7,060,000
potato | 206 | 1,740,000
Pythagoras | — | 93,100
sabbatical | 15 | 205,000
Savannah | — | 2,280,000
Sebastian | — | 1,660,000
selection | 539 | 6,500,000
semester | 17 | 4,350,000
Seville | — | 339,000
Somalian | 2 | 1,450,000
sophisticated | 495 | 2,780,000
suburban | 163 | 1,390,000
suffice | 17 | 509,000
sufficient | 807 | 5,640,000
superfluous | 71 | 205,000
superior | 527 | 7,510,000
support | 2578 | 9,170,000
Tabasco | 5 | 160,000
tabouli | — | 8,750
tamales | 1 | 60,200
Tibetan | 14 | 509,000
timidity | 42 | 78,800
tomato | 123 | 1,280,000
Vanessa | — | 1,210,000
vanilla | 33 | 1,240,000
Venetian | 66 | 329,000
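One way to sanity-check a table like this, in the spirit of the validation reported by Blair et al. (2002) for search-engine frequency estimates, is to verify that the two sources agree on a log scale. The sketch below uses a hypothetical handful of rows drawn from the table; the full table would be handled the same way:

```python
import math

# A small sample of (CELEX, Google) raw counts from the table above.
SAMPLE = {
    "before":    (17510, 116_000_000),
    "beginning": (2916, 25_300_000),
    "community": (2256, 91_900_000),
    "potato":    (206, 1_740_000),
    "semester":  (17, 4_350_000),
    "pedometer": (1, 49_800),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Frequencies are compared on a log scale, as is standard for word counts.
celex = [math.log10(c) for c, g in SAMPLE.values()]
google = [math.log10(g) for c, g in SAMPLE.values()]
r = pearson(celex, google)
print(round(r, 2))
```

A high positive correlation on the log counts is what licenses using Google counts as frequency estimates for words (mostly proper nouns) that lack a CELEX entry.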

APPENDIX 5: Spectrograms showing schwas, aspiration, and elision (Chapter 6)

(Note: The low-frequency noise present during voiceless consonants in these spectrograms is due to a hum in the room in which the recordings took place.)

[Spectrograms not reproduced here; each panel plotted frequency from 0 to 5000 Hz against time, with segment labels as transcribed below.]

a. Normal (less overlapped) voiceless consonant-schwa coordination with voiced schwa present (Participant 3, potato, slow condition): [p ə tʰ ej ɾ o]

b. Overlapping voiceless consonant-schwa coordination with no voiced schwa present (Participant 12, petitioned, fast condition): [p ə tʰ ɪ ʃ ə n d]

c. Overlapping voiced consonant-schwa coordination with voiced schwa present (Participant 6, dependence, fast condition): [d ə pʰ ɛ n d ɛ n s]

d. Overlapping voiceless fricative-schwa coordination with no schwa present (Participant 6, superior, fast condition): [s pʰ i r jɚ]

e. Overlapping voiced consonant-schwa-/l/ coordination with no schwa present (Participant 8, Malaysia, fast condition): [m l ej ʒ ə]

REFERENCES

Adams, Douglas Q. 1977. Inter-Dialect Rule Borrowing: Some Cases from Greek. General Linguistics 17 (3), 141-154.
Adams, Scott G., Gary Weismer and Raymond Kent. 1993. Speaking rate and speech movement velocity profiles. Journal of Speech and Hearing Research 36, 41-54.
Akgul, Yusuf Sinan, Chandra Kambhamettu and Maureen Stone. 1999. Automatic extraction and tracking of the tongue contours. IEEE Transactions on Medical Imaging 18 (10), 1035-1045.
Albro, Daniel. 2003. A large-scale, computerized phonological analysis of Malagasy. Talk presented at the Annual Meeting of the Linguistic Society of America, January 2-5, 2003, Atlanta, GA.
Alderete, John. 1997. Dissimilation as local conjunction. In K. Kusumoto, Ed., Proceedings of the North East Linguistics Society 27. Amherst, MA: GLSA, pp. 17-31.
Anderson, J. 1987. The Markedness Differential Hypothesis and syllable structure difficulty. In G. Ioup and S. Weinberger, Eds., Interlanguage Phonology: The Acquisition of a Second Language Sound System. Cambridge, MA: Newbury House.
Anttila, Arto. 1997a. Deriving variation from grammar. In F. Hinskens, R. van Hout and W.L. Wetzel, Eds., Variation, Change, and Phonological Theory. Amsterdam: John Benjamins, pp. 35-68.
Anttila, Arto. 1997b. Variation in Finnish Phonology and Morphology. Ph.D. dissertation, Stanford University.
Archangeli, Diana and Douglas Pulleyblank. 1994. Grounded Phonology. Cambridge: MIT Press.
Archibald, John. 1998a. Second Language Phonology. Philadelphia: John Benjamins.
Archibald, John. 1998b. Second language phonology, phonetics, and typology. Studies in Second Language Acquisition 20, 189-211.
Baayen, R.H., R. Piepenbrock and L. Gulikers. 1995. The CELEX Lexical Database (Release 2) [CD-ROM]. Philadelphia: Linguistic Data Consortium.
Beckman, Mary. 1996. When is a syllable not a syllable? In Takashi Otake and Anne Cutler, Eds., Phonological Structure and Language Processing: Cross-Linguistic Studies. New York: Mouton de Gruyter, pp. 95-123.
Behrens, S.J. and Sheila Blumstein. 1988. Acoustic characteristics of English voiceless fricatives: A descriptive analysis. Journal of Phonetics 16, 295-63.
Benguerel, A.P. and H.A. Cowan. 1974. Coarticulation of upper lip protrusion in French. Phonetica 30, 40-51.
Benua, Laura. 1997. Transderivational Identity: Phonological Relations between Words. Ph.D. dissertation, University of Massachusetts.
Blair, Irene, Geoffrey Urland and Jennifer Ma. 2002. Using Internet search engines to estimate word frequency. Behavior Research Methods, Instruments, and Computers 34 (2), 286-290.
Blevins, Juliette and Andrew Garrett. 1998. The origins of consonant-vowel metathesis. Language 74, 508-556.
Blevins, Juliette and Andrew Garrett. in press. The evolution of metathesis. In Bruce Hayes, Robert Kirchner and Donca Steriade, Eds., The Phonetic Bases of Markedness. Cambridge: Cambridge University Press.
Blumstein, Sheila and K. Stevens. 1978. Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America 64, 1358-1368.
Boersma, Paul. 1998. Functional Phonology: Formalizing the interactions between articulatory and perceptual drives. The Hague: Holland Academic Graphics.
Boersma, Paul, Paola Escudero and Rachel Hayes. 2003. Learning abstract phonological from auditory phonetic categories: An integrated model for the acquisition of language-specific sound categories. In Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona.
Boersma, Paul and Bruce Hayes. 2001. Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry 32, 45-86.
Boersma, Paul and Clara Levelt. 2000. Gradual constraint-ranking learning algorithm predicts acquisition order. Proceedings of the Child Language Research Forum 30, 229-237.
Bogadek, Francis A. 1985. New English-Croatian and Croatian-English Dictionary. New York: Macmillan.
Borden, G. and Katherine Harris. 1984. Speech Science Primer: Physiology, Acoustics, and Perception of Speech. Baltimore: Williams and Wilkins.
Boyce, S.E., Rena Krakow, Fredericka Bell-Berti and C.E. Gelfer. 1990. Converging sources of evidence for dissecting articulatory movements into core gestures. Journal of Phonetics 18, 173-188.
Bradley, Travis. 2002. Gestural timing and derived environment effects in Norwegian clusters. In L. Mikkelsen and C. Potts, Eds., Proceedings of WCCFL 21. Somerville, MA: Cascadilla Press.
Broadbent, D.E. 1967. Word-frequency effect and response bias. Psychological Review 74, 1-15.
Broselow, Ellen. 1983. Nonobvious transfer: On predicting epenthesis errors. In S. Gass and L. Selinker, Eds., Language Transfer in Language Learning. Rowley, MA: Newbury House.
Broselow, Ellen. 1991. The Structure of Fricative-Stop Onsets. Talk presented at the Conference for the Organization of Phonological Features, Santa Cruz.
Broselow, Ellen and Daniel Finer. 1991. Parameter Setting in Second Language Phonology and Syntax. Second Language Research 7 (1), 35-59.
Browman, Catherine and Louis Goldstein. 1986. Towards an articulatory phonology. Phonology Yearbook 3, 219-252.
Browman, Catherine and Louis Goldstein. 1988. Some Notes on Syllable Structure in Articulatory Phonology. Phonetica 45, 140-155.
Browman, Catherine and Louis Goldstein. 1989. Articulatory gestures as phonological units. Phonology 6, 201-251.
Browman, Catherine and Louis Goldstein. 1990a. Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics 18, 299-320.
Browman, Catherine and Louis Goldstein. 1990b. Tiers in articulatory phonology, with some implications for casual speech. In John Kingston and Mary Beckman, Eds., Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech. Cambridge: Cambridge University Press.

Browman, Catherine and Louis Goldstein. 1992a. Articulatory Phonology: An overview. Phonetica 49, 155-180.
Browman, Catherine and Louis Goldstein. 1992b. "Targetless" schwa: an articulatory analysis. In Gerard Docherty and D. Robert Ladd, Eds., Papers in Laboratory Phonology II: Gesture, Segment, Prosody. Cambridge: Cambridge University Press.
Browman, Catherine and Louis Goldstein. 1995. Gestural syllable position effects in American English. In Fredericka Bell-Berti and Lawrence Raphael, Eds., Producing Speech: Contemporary Issues for Katherine Safford Harris. New York: American Institute of Physics.
Browman, Catherine and Louis Goldstein. 2001. Competing Constraints on Intergestural Coordination and Self-Organization of Phonological Structures. Bulletin de la Communication Parlée 5, 25-34.
Burzio, Luigi. 2000. Segmental Contrast meets Output-to-Output Faithfulness. The Linguistic Review 17 (2-4), 368-384.
Byarushengo, Ernest R. 1976. Strategies in loan phonology. Proceedings of the Annual Meeting, Berkeley Linguistics Society 2.
Byrd, Dani. 1994. Articulatory Timing in English Consonant Sequences. Ph.D. dissertation, UCLA.
Byrd, Dani. 1995. C-centers revisited. Phonetica 52, 285-306.
Byrd, Dani. 1996a. Influences on articulatory timing in consonant sequences. Journal of Phonetics 24, 209-244.
Byrd, Dani. 1996b. A phase window framework for articulatory timing. Phonology 13, 139-169.
Byrd, Dani and Cheng Cheng Tan. 1996. Saying consonant clusters quickly. Journal of Phonetics 24, 263-282.
Carlisle, Robert. 1998. The acquisition of onsets in a markedness relationship: A longitudinal study. Studies in Second Language Acquisition 20, 245-260.
Carney, P. and K. Moll. 1971. A Cinefluorographic Investigation of Fricative Consonant-Vowel Coarticulation. Phonetica 23, 193-202.
Casali, Roderic. 1996. Resolving Hiatus. Ph.D. dissertation, UCLA.
Catford, J.C. 1977. Fundamental Problems in Phonetics. Bloomington: Indiana University Press.
Catford, J.C. 1988. A Practical Introduction to Phonetics. Oxford: Clarendon Press.
Cedergren, Henrietta and David Sankoff. 1974. Variable rules: Performance as a statistical reflection of competence. Language 50, 333-355.
Cermak, Alois. 1963. English-Czech Czech-English Dictionary. New York: Saphrograph Co.
Chitoran, Ioana, Louis Goldstein and Dani Byrd. in press. Gestural overlap and recoverability: Articulatory evidence from Georgian. In Natasha Warner and Carlos Gussenhoven, Eds., Papers in Laboratory Phonology VII. Cambridge: Cambridge University Press.
Cho, Taehong. 1998. Specification of Intergestural Timing and Gestural Overlap: EMA and EPG Studies. M.A. thesis, UCLA.
Chomsky, Noam and Morris Halle. 1968. The Sound Pattern of English. New York: Harper and Row.
Clements, G.N. 1985. The geometry of phonological features. Phonology Yearbook 2, 225-252.
Clements, G.N. and Steven Keyser. 1983. CV Phonology: A Generative Theory of the Syllable. Cambridge, MA: MIT Press.
Clements, G.N. 1990. The role of the sonority cycle in core syllabification. In M. Beckman and J. Kingston, Eds., Papers in Laboratory Phonology I. Cambridge: Cambridge University Press.
Cohn, Abigail. 1993. Nasalisation in English: phonology or phonetics. Phonology 10, 43-81.
Coleman, John. 1996. Declarative syllabification in Tashlhit Berber. In J. Durand and B. Laks, Eds., Current Trends in Phonology: Models and Methods, vol. 1. Manchester: University of Salford, pp. 175-216.
Cooper, André. 1991. An Articulatory Account of Aspiration in English. Ph.D. dissertation, Yale University.
Côté, Marie-Hélène. 1997. Phonetic salience and the OCP in coda cluster reduction. Papers from the Meeting of the Chicago Linguistic Society 33, 57-71.
Côté, Marie-Hélène. 2000. Consonant Cluster Phonotactics: A Perceptual Approach. Ph.D. dissertation, MIT.
Dalby, Jonathan. 1986. Phonetic Structure of Fast Speech in American English. Ph.D. dissertation, Indiana University.
d'Anglejan, D., W. Lambert, G. Tucker and J. Greenberg. 1971. Psychological correlates of the French sound system. Perception and Psychophysics 9, 356-357.
Davidson, Lisa. 1997. An Optimality Theoretic Approach to Second Language Acquisition. Senior Honors Thesis, Brown University.
Davidson, Lisa. 2003. The role of gestural coordination in Zoque palatal coalescence. Talk presented at the 2003 LSA Meeting, Atlanta, GA, January 3-5.
Davidson, Lisa and Matthew Goldrick. 2003. Tense, agreement and defaults in child Catalan: An Optimality Theoretic analysis. In Silvina Montrul, Ed., Selected Papers from the 4th Conference on the Acquisition of Spanish and Portuguese as First and Second Languages. Cambridge, MA: Cascadilla Press.
Davidson, Lisa, Peter Jusczyk and Paul Smolensky. 2003. The initial and final states: Theoretical implications and experimental explorations of richness of the base. In Rene Kager, Wim Zonneveld and Joe Pater, Eds., Fixing Priorities: Constraints in Phonological Acquisition. Cambridge: Cambridge University Press.
Davidson, Lisa and Géraldine Legendre. 2003. Defaults and competition in the acquisition of functional categories in Catalan and French. In Rafael Núñez-Cedeño, Richard Cameron and Luis López, Eds., A Romance Perspective on Language Knowledge and Use. Philadelphia: John Benjamins.
Davidson, Lisa and Rolf Noyer. 1997. Loan phonology in Huave: Nativization and the ranking of faithfulness constraints. Proceedings of the West Coast Conference on Formal Linguistics 15.
Davidson, Lisa, Paul Smolensky and Peter Jusczyk. in press. The initial and final states: Theoretical implications and experimental explorations of richness of the base. In Joe Pater, Ed., Fixing Priorities: Constraints in Phonological Acquisition. Cambridge: Cambridge University Press.

de Lacy, Paul. 2002. The Formal Expression of Markedness. Ph.D. dissertation, University of Massachusetts at Amherst.

Dell, François and Mohamed Elmedlaoui. 1985. Syllabic consonants and syllabification in Imdlawn Tashlhiyt Berber. Journal of African Languages and Linguistics 7, 105-130.

Dell, François and Mohamed Elmedlaoui. 1996a. Nonsyllabic transitional vocoids in Imdlawn Tashlhiyt Berber. In J. Durand and B. Laks, Eds., Current Trends in Phonology: Models and Methods, vol. 1. Manchester: University of Salford, pp. 217-244.

Dell, François and Mohamed Elmedlaoui. 1996b. On consonant release in Imdlawn Tashlhiyt Berber. Linguistics 34, 357-395.

Donegan, P.J. and David Stampe. 1979. The study of natural phonology. In Daniel Dinnsen, Ed., Current Approaches to Phonological Theory. Bloomington: Indiana University Press, pp. 126-173.

Dorman, Michael, Michael Studdert-Kennedy and Lawrence Raphael. 1977. Stop-consonant recognition: Release bursts and formant transitions as functionally equivalent, context-dependent cues. Perception and Psychophysics 22, 109-122.

Dupoux, Emmanuel, Kazuhiko Kakehi, Y. Hirose, Christophe Pallier and Jacques Mehler. 1999. Epenthetic vowels in Japanese: A perceptual illusion? Journal of Experimental Psychology: Human Perception and Performance 25, 1568-1578.

Dupoux, Emmanuel, Christophe Pallier, Kazuhiko Kakehi and Jacques Mehler. 2001. New evidence for prelexical phonological processing in word recognition. Language and Cognitive Processes 16 (5/6), 491-505.

Eckman, Fred. 1977. Markedness and the Contrastive Analysis Hypothesis. Language Learning 27, 315-330.

Eckman, Fred and Gregory Iverson. 1993. Sonority and markedness among onset clusters in the interlanguage of ESL learners. Second Language Research 9, 234-252.

Ellis, Lucy and William Hardcastle. 1999. An instrumental study of alveolar to velar assimilation in careful and fast speech. In Proceedings of the XIV International Congress of Phonetic Sciences, vol. 3. San Francisco, pp. 2425-2428.

Elson, Benjamin. 1956. Sierra Popoluca syllable structure. International Journal of American Linguistics 13, 13-17.

Engstrand, Olle. 1988. Articulatory correlates of stress and speaking rate in Swedish VCV utterances. Journal of the Acoustical Society of America 83 (5), 1863-1875.

Fidelholtz, James L. 1975. Word frequency and vowel reduction in English. In Robin E. Grossman, L. James San and Timothy J. Vance, Eds., Papers from the Eleventh Regional Meeting of the Chicago Linguistic Society. Chicago: CLS, pp. 200-213.

Flege, James Emil and Wieke Eefting. 1987a. Cross-language switching in stop consonant perception and production by Dutch speakers of English. Speech Communication 6 (3), 185-202.

Flege, James Emil and Wieke Eefting. 1987b. Production and perception of English stops by native Spanish speakers. Journal of Phonetics 15 (1), 67-83.

Fleischhacker, Heidi. 2001. Cluster-dependent epenthesis asymmetries. UCLA Working Papers in Linguistics 7, 71-116.

Flemming, Edward. 1995. Auditory Representations in Phonology. Ph.D. dissertation, UCLA.

Flemming, Edward. 2001. Scalar and categorical phenomena in a unified model of phonetics and phonology. Phonology 18, 7-44.

Flemming, Edward. to appear. Contrast and perceptual distinctiveness. In Bruce Hayes, Robert Kirchner and Donca Steriade, Eds., The Phonetic Bases of Markedness. Cambridge: Cambridge University Press.

Fokes, Joanne and Z.S. Bond. 1993. The elusive/illusive syllable. Phonetica 50, 102-123.

Fougeron, Cécile and Patricia Keating. 1997. Articulatory strengthening at the edges of prosodic domains. Journal of the Acoustical Society of America 101, 3728-3740.

Fougeron, Cécile and Donca Steriade. 1997. Does deletion of French schwa lead to neutralization of lexical distinctions? In Proceedings of Eurospeech97, vol. 2. Rhodes, Greece, pp. 943-946.

Fukazawa, Haruka, Mafuyu Kitahara and Mits Ota. 1998. Lexical stratification and ranking invariance in constraint-based grammars. Papers from the Meeting of the Chicago Linguistic Society 34 (2).

Gafos, Adamantios. 1996/1999. The Articulatory Basis of Locality in Phonology. New York: Garland.

Gafos, Adamantios. 2002. A grammar of gestural coordination. Natural Language and Linguistic Theory 20 (2), 269-337.

Gafos, Adamantios. to appear. Dynamics in grammar: comment on Ladd and Ernestus & Baayen. In Papers in Laboratory Phonology VIII. Berlin/New York: Mouton de Gruyter.

Gay, Thomas. 1981. Mechanisms in the control of speech rate. Phonetica 38, 148-158.

Gay, Thomas, Björn Lindblom and James Lubker. 1981. Production of bite-block vowels: acoustic equivalence by selective compensations. Journal of the Acoustical Society of America 69 (3), 802-810.

Gay, Thomas, T. Ushijima, Hajime Hirose and F. Cooper. 1974. Effect of speaking rate on labial consonant-vowel articulation. Journal of Phonetics 2, 47-63.

Gick, Bryan. 1999. A gesture-based account of intrusive consonants in English. Phonology 16, 29-54.

Gick, Bryan. 2002. An X-ray investigation of pharyngeal constriction in American English schwa. Phonetica 59 (1), 38-48.

Gick, Bryan and Ian Wilson. to appear. Excrescent schwa and vowel laxing: Cross-linguistic responses to conflicting articulatory targets. In Papers in Laboratory Phonology VIII. Cambridge: Cambridge University Press.

Glowacka, Dorota. 2001. Unstressed vowel deletion and new consonant clusters in English. Poznan Studies in Contemporary Linguistics 37, 71-94.

Godfrey, J., E.C. Holliman and J. McDaniel. 1992. SWITCHBOARD: a telephone speech corpus for research and development. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). San Francisco, pp. 517-520.

Goldstein, Louis. 2002. Laryngeal gestures and states of the glottis. Class website for Linguistics 120, Yale University. http://www.ling.yale.edu:16080/ling120/Larynx/Laryngeal_Gestures.html

Gordon, Matthew. 1999. Syllable weight: Phonetics, phonology, and typology. Ph.D. dissertation, UCLA.

Gordon, Matthew. 2002. A phonetically driven account of syllable weight. Language 78 (1), 51-80.

Grad, Anton and Henry Leeming. 1994. Slovensko-Angleski Slovar = Slovene-English dictionary. Ljubljana (Oxford): DZS (Oxford University Press).

Greenberg, J. 1965. Some generalizations concerning initial and final consonant clusters. Linguistics 18, 5-34.

Greenberg, J. and J. Jenkins. 1964. Studies in the psychological correlates of the sound system of American English. Word 20, 157-172.

Guion, Susan. 1998. The role of perception in the sound change of velar palatalization. Phonetica 55, 18-52.

Guy, Gregory and Charles Boberg. 1997. Inherent variability and the obligatory contour principle. Language Variation and Change 9, 149-164.

Haggard, M. 1978. The devoicing of voiced fricatives. Journal of Phonetics 6, 95-102.

Hall, Nancy. 2002. Svarabhakti as gestural overlap. Talk presented at LSA, San Francisco and HUMDRUM 2002, University of Massachusetts.

Halle, Morris. 1992. Phonological features. In William Bright, Ed., International Encyclopedia of Linguistics, vol. 3. Oxford: Oxford University Press, pp. 207-212.

Halle, Morris and Kenneth Stevens. 1971. A note on laryngeal features. MIT Research Laboratory of Electronics Report 101, 198-213.

Halle, Morris and Jean-Roger Vergnaud. 1987. An Essay on Stress. Cambridge, MA: MIT Press.

Hallé, P., Jose Segui, Uli Frauenfelder and C. Meunier. 1998. Processing of illegal consonant clusters: A case of perceptual assimilation? Journal of Experimental Psychology: Human Perception and Performance 24 (2), 592-608.

Hancin-Bhatt, Barbara and Rajesh Bhatt. 1998. Optimal L2 syllables: Interactions of transfer and developmental effects. Studies in Second Language Acquisition 19, 331-378.

Hardcastle, William. 1985. Some phonetic and syntactic constraints on lingual coarticulation during /kl/ sequences. Speech Communication 4, 247-263.

Harris, Katherine. 1958. Cues for the discrimination of American English fricatives in spoken syllables. Language and Speech 1, 1-7.

Haunz, Christine. 2002. Speech perception in loanword adaptation. Talk presented at the Postgraduate Conference of the Edinburgh University Department of Theoretical and Applied Linguistics. May 27-28, 2002.

Hayes, Bruce. 1985. A Metrical Theory of Stress Rules. New York: Garland Press.

Hayes, Bruce. 1999. Phonetically-driven phonology: The role of Optimality Theory and inductive grounding. In Mike Darnell, Edith Moravcsik, Michael Noonan, Frederick Newmeyer and Kathleen Wheatley, Eds., Functionalism and formalism in linguistics, volume I: General papers. Amsterdam: John Benjamins, pp. 243-285.

Hayes, Bruce. 2000. Gradient well-formedness in Optimality Theory. In Joost Dekkers, Frank van der Leeuw and Jeroen van de Weijer, Eds., Optimality Theory: Phonology, Syntax, and Acquisition. Oxford: Oxford University Press, pp. 88-120.

Hayes, Bruce and Margaret MacEachern. 1998. Quatrain form in English folk verse. Language 74, 473-507.

Hayes, Bruce and Tanya Stivers. 2000. Postnasal voicing. ms, UCLA, http://www.humnet.ucla.edu/humnet/linguistics/people/hayes/phonet/ncphonet.pdf.

Henderson, Janet and Bruno Repp. 1982. Is a Stop Consonant Released when Followed by Another Stop Consonant? Phonetica 39, 71-82.

Hirayama, Makoto, Eric Vatikiotis-Bateson and Mitsuo Kawato. 1994. Inverse dynamics of speech motor control. In Jack Cowan, Gerry Tesauro and Josh Alspector, Eds., Proceedings of Neural Information Processing Systems (NIPS) 1993. San Francisco: Morgan Kaufmann, pp. 1043-1050.

Holden, Kyril. 1976. Assimilation rates of borrowing and phonological productivity. Language 52, 131-147.

Honorof, Douglas and Catherine Browman. 1995. The center or the edge: How are consonant clusters organized with respect to the vowel? In K Elenius and P Branderud, Eds., Proceedings of the XIIIth International Congress of Phonetic Sciences, vol. 3. Stockholm, Sweden, pp. 552-555.

Hoole, Philip. 1997. Laryngeal coarticulation section 1: Coarticulatory investigations of the devoicing gesture. Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München 35, 89-99.

Hooper, Joan. 1976. An Introduction to Natural Generative Phonology. New York: Academic Press.

Hooper, Joan. 1978. Constraints on schwa-deletion in American English. In J Fisiak, Ed., Recent Developments in Historical Linguistics. The Hague: Mouton, pp. 183-207.

House, A.S. and G. Fairbanks. 1953. The influence of consonant environment upon the secondary acoustical characteristics of vowels. Journal of the Acoustical Society of America 25, 105-113.

Hsu, Chai-Shune. 1996. A phonetically-based Optimality-Theoretic Account of Consonant Reduction in Taiwanese. UCLA Working Papers in Phonetics 92, 1-44.

Hyman, Larry. 1970. The role of borrowings in the justification of phonological grammars. Studies in African Linguistics 1, 1-48.

Iskarous, Khalil. 1998. Vowel Dynamics and Vowel Phonology. In Shahin Kimary, Susan Blake and Eun-Sook Kim, Eds., The Proceedings of the Seventeenth West Coast Conference on Formal Linguistics. Palo Alto: CSLI.

Itô, Junko and Armin Mester. 1995. Japanese phonology. In John Goldsmith, Ed., Handbook of Phonological Theory. Oxford: Blackwell, pp. 817-838.

Itô, Junko and Armin Mester. 1999a. Loanword phonology in Optimality Theory. Talk presented at the Parasession on Loan Word Phenomena, Annual Meeting of the Berkeley Linguistics Society, February 12-15, 1999.

Itô, Junko and Armin Mester. 1999b. The phonological lexicon. In N Tsujimura, Ed., Handbook of Japanese Linguistics. Oxford: Blackwell.

Jacobs, Haike and Carlos Gussenhoven. 2000. Loan phonology: Perception, Salience, the Lexicon, and OT. In Joost Dekkers, Frank van der Leeuw and Jeroen van de Weijer, Eds., Optimality Theory: Phonology, Syntax, and Acquisition. Oxford: Oxford University Press, pp. 193-209.

Jannedy, Stefanie. 1994. Rate effects on German unstressed syllables. OSU Working Papers in Linguistics 44, 105-124.

Janson, Tore. 1986. Cross-linguistic trends in the frequency of CV sequences. Phonology Yearbook 3, 179-195.

Jescheniak, Joerg D. and Willem J.M. Levelt. 1994. Word frequency effects in speech production: Retrieval of syntactic information and of phonological form. Journal of Experimental Psychology: Learning, Memory and Cognition 20 (4), 824-843.

Johnson, Keith. 1997. Acoustic and Auditory Phonetics. Oxford: Blackwell.

Jongman, Allard, Ratree Wayland and Serena Wong. 2000. Acoustic characteristics of English fricatives. Journal of the Acoustical Society of America 108 (3), 1252-1263.

Jun, Jongho. 1995. Perceptual and Articulatory Factors in Place Assimilation: An Optimality Theoretic Approach. Ph.D. dissertation, UCLA.

Jun, Jongho. 1996. Place assimilation is not the result of gestural overlap: evidence from Korean and English. Phonology 13, 377-407.

Kahn, Daniel. 1976. Syllable-based generalizations in English phonology. Ph.D. dissertation, MIT.

Kaisse, Ellen. 1985. Connected Speech: The Interaction of Syntax and Phonology. San Diego: Academic Press.

Kawasaki, Haruko. 1982. An Acoustical Basis for Universal Constraints on Sound Sequences. Ph.D. dissertation, UC Berkeley.

Kaye, Jonathan. 1981. Loan Words and Abstract Phonotactic Constraints. Studia Anglica Posnaniensia 13, 21-42.

Keating, Patricia. 1984. Phonetic and phonological representation of stop consonant voicing. Language 60, 286-319.

Keating, Patricia. 1985. CV phonology, experimental phonetics, and coarticulation. UCLA Working Papers in Phonetics 62, 1-13.

Keating, Patricia. 1988. The phonology-phonetics interface. In F.J. Newmeyer, Ed., Linguistics: The Cambridge Survey, vol. 1. Cambridge: Cambridge University Press, pp. 281-302.

Keating, Patricia. 1990a. Phonetic representations in a generative grammar. Journal of Phonetics 18, 321-334.

Keating, Patricia. 1990b. The window model of coarticulation: articulatory evidence. In John Kingston and Mary Beckman, Eds., Papers in Laboratory Phonology I: Between the grammar and physics of speech. Cambridge: Cambridge University Press, pp. 451-470.

Kenstowicz, Michael. 1994. Phonology in Generative Grammar. Oxford: Blackwell.

Kenyon, John and Thomas Knott. 1953. A Pronouncing Dictionary of American English. Springfield, MA: G&C Merriam Co.

Kici, Gasper. 1976. Albanian-English dictionary. Tivoli: Tip. A. Picchi.

Kingston, John. 1990. Articulatory binding. In John Kingston and Mary Beckman, Eds., Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech. Cambridge: Cambridge University Press.

Kingston, John. 1992. Comments on Chapter 2. In Gerard Docherty and D.R. Ladd, Eds., Papers in Laboratory Phonology II: Gesture, Segment, Prosody. Cambridge: Cambridge University Press.

Kingston, John and Avis Cohen. 1992. Extending Articulatory Phonology. Phonetica 49 (49), 194-204.

Kirchner, Robert. 1996. Synchronic chain shifts in Optimality Theory. Linguistic Inquiry 27, 341-350.

Kirchner, Robert. 1997. Contrastiveness and faithfulness. Phonology 14, 83-111.

Kirchner, Robert. 1998/2001. An Effort-Based Approach to Consonant Lenition. London: Routledge.

Kirchner, Robert. to appear. Phonological contrast and articulatory effort. In Linda Lombardi, Ed., Segmental Processes in Optimality Theory. Cambridge: Cambridge University Press.

Kochetov, Alexei. 2001/2002. Production, Perception and Emergent Phonotactic Patterns: A Case of Contrastive Palatalization. New York: Routledge.

Kochetov, Alexei. to appear. Syllable position effects and gestural organization: Articulatory evidence from Russian. In Papers in Laboratory Phonology VIII. Cambridge: Cambridge University Press.

Kohler, K.J. 1990. Segmental reduction in connected speech: Phonological facts and phonetic explanations. In William Hardcastle and Alain Marchal, Eds., Speech Production and Speech Modeling. Dordrecht: Kluwer, pp. 69-92.

Kohler, K.J. 1991. The phonetics/phonology issue in the study of articulatory reduction. Phonetica 48, 180-192.

Kohler, K.J. 1992. Gestural reorganization in connected speech: a functional viewpoint on Articulatory Phonology. Phonetica 49, 205-211.

Krakow, R. 1989. The articulatory organization of syllables: A kinematic analysis of labial and velar gestures. Ph.D. dissertation, Yale University.

Kritzinger, Matthys. 1986. Groot woordeboek : Afrikaans-Engels, [English-Afrikaans]. Pretoria: J.L. van Schaik.

Kucera, Henry. 1961. The Phonology of Czech. 'S-Gravenhage: Mouton & Co.

Kucera, Henry and W. Nelson Francis. 1967. Computational Analysis of Present-Day American English. Providence, RI: Brown University Press.

Kuehn, D.P. and K. Moll. 1976. A cinefluorographic investigation of CV and VC articulatory velocities. Journal of Phonetics 3, 303-320.

Labov, William. 1969. Contraction, deletion, and inherent variability of the English copula. Language 45, 715-762.

Ladd, D. Robert and James M. Scobbie. forthcoming. Post-lexical phonology does not reduce to phonetics: The case of Sardinian external sandhi. In Papers in Laboratory Phonology VI. Cambridge: Cambridge University Press.

Ladefoged, Peter. 1988a. Hierarchical features of the International Phonetic Alphabet. UCLA Working Papers in Phonetics 70, 1-12.

Ladefoged, Peter. 1988b. The many interfaces between phonetics and phonology. UCLA Working Papers in Phonetics 70, 13-23.

Ladefoged, Peter and Ian Maddieson. 1996. The Sounds of the World's Languages. Oxford: Blackwell.

Lavoie, Lisa and Abigail Cohn. 1999. Sesquisyllables of English: the structure of vowel-liquid syllables. In Proceedings of the XIV International Congress of Phonetic Sciences. San Francisco, CA, pp. 109-112.

Legendre, Géraldine, Paul Hagstrom, Ling Tao, Joan Chen and Lisa Davidson. 2001a. A preliminary look at the acquisition of aspect in Mandarin Chinese in Optimality Theory. In L Chen and Y Zhou, Eds., Proceedings of the Third International Conference on Cognitive Science. Beijing, China: Press of the University of Science and Technology of China, pp. 398-405.

Legendre, Géraldine, Paul Hagstrom, Anne Vainikka and Marina Todorova. 2001b. Evidence for syntactic competition in the acquisition of tense and agreement in child French. In Arika Okrent and John P Boyle, Eds., Papers from the 36th Regional Meeting of the Chicago Linguistics Society. Chicago, Ill.: Chicago Linguistics Society.

Li, Min, Chandra Kambhamettu and Maureen Stone. 2002. Region based contour tracking for human tongue. IEEE International Symposium on Biomedical Imaging: Macro to Nano (ISBI2002), Washington, DC, July 7-10, 2002.

Liberman, Mark. 1983. In favor of some uncommon approaches to the study of speech. In Peter MacNeilage, Ed., The Production of Speech. New York: Springer-Verlag.

Liljencrants, Johan and Björn Lindblom. 1972. Numerical simulation of vowel systems: the role of perceptual contrast. Language 48, 839-862.

Lin, Yen-Hwei. 1997. Syllabic and moraic structures in Piro. Phonology 14, 403-436.

Lindblom, B., J. Lubker, B. Lyberg, P. Branderud and K. Holmgen. 1987. The concept of target and speech timing. In R. Channon and L. Shockey, Eds., In Honor of Ilse Lehiste. Dordrecht: Foris.

Lindblom, Björn. 1967. Vowel duration and a model of lip mandible coordination. STL-QPSR 4/1967, 1-29.

Lindblom, Björn. 1983. Economy of Speech Gestures. In Peter MacNeilage, Ed., The Production of Speech. New York: Springer-Verlag.

Lindblom, Björn. 1986. Phonetic universals in vowel systems. In John Ohala and Jeri Jaeger, Eds., Experimental Phonology. Orlando: Academic Press.

Lindblom, Björn. 1990a. Explaining phonetic variation: a sketch of the H&H theory. In William Hardcastle and Alain Marchal, Eds., Speech Production and Speech Modeling. Netherlands: Kluwer, pp. 403-439.

Lindblom, Björn. 1990b. Phonetic content in phonology. PERILUS 11.

Lindblom, Björn and Ian Maddieson. 1988. Phonetic universals in consonant systems. In Larry Hyman and C. Li, Eds., Language, Speech and Mind. London: Routledge, pp. 62-78.

Löfqvist, Anders. 1991. Proportional timing in speech motor control. Journal of Phonetics 19, 343-350.

Löfqvist, Anders and Hirohide Yoshioka. 1981. Interarticulator programming in obstruent production. Phonetica 38, 21-34.

Lombardi, Linda. 2001. Why place and voice are different: constraint-specific alternations and Optimality Theory. In Linda Lombardi, Ed., Segmental Phonology in Optimality Theory: Constraints and Representations. Cambridge: Cambridge University Press.

Lovins, Julie. 1974. Why Loan Phonology is Natural Phonology. Papers from the Regional Meetings, Chicago Linguistic Society Special Issue, 240-250.

Lubker, James. 1986. Articulatory timing and the concept of phase. Journal of Phonetics 14, 133-137.

Lubker, James and Thomas Gay. 1982. Anticipatory labial coarticulation: Experimental, biological, and linguistic variables. Journal of the Acoustical Society of America 71, 437-448.

Lubowicz, Anya. 1998. Derived environment effects in Optimality Theory. Lingua 112, 243-280.

Lundberg, Andrew and Maureen Stone. 1999. Three-dimensional tongue surface reconstruction: Practical considerations for ultrasound data. Journal of the Acoustical Society of America 106 (5), 2858-2867.

Major, Roy. 1987. A model for interlanguage phonology. In Georgette Ioup and Steven Weinberger, Eds., Interlanguage Phonology: The Acquisition of a Second Language Sound System. Cambridge, MA: Newbury House.

Manuel, S., S. Shattuck-Hufnagel, S. Huffman, K. Stevens, R. Carlson and S. Hunnicut. 1992. Studies of vowel and consonant reduction. In Proceedings of the International Conference on Spoken Language Processing. Banff, Canada.

Martinet, André. 1952. Function, structure, and sound change. Word 8, 1-32.

Massaro, Dominic and M. Cohen. 1983. Phonological context in speech perception. Perception and Psychophysics 34 (3), 338-348.

Matteson, Esther and Kenneth Pike. 1958. Non-phonemic transition vocoids in Piro (Arawak). Miscellanea Phonetica 3, 22-30.

Matthies, Melanie, Pascal Perrier, Joseph Perkell and Majid Zandipour. 2001. Variation in anticipatory coarticulation with changes in clarity and rate. Journal of Speech, Language and Hearing Research 44 (2), 340-353.

Mattingly, Ignatius. 1981. Phonetic representation and speech synthesis by rule. In Terry Myers, John Laver and John Anderson, Eds., The Cognitive Representation of Speech. Amsterdam: North Holland.

McCarthy, John. 1988. Feature geometry and dependency: a review. Phonetica 45, 84-108.

McCarthy, John. 1991. Synchronic rule inversion. In Proceedings of the Berkeley Linguistics Society, vol. 17. Berkeley, CA: Berkeley Linguistics Society, pp. 192-207.

McCarthy, John. 1993. A case of surface constraint violation. Canadian Journal of Linguistics 38, 169-195.

McCarthy, John and Alan Prince. 1993. Generalized alignment. In Geert Booij and Jan van Marle, Eds., Yearbook of Morphology. Dordrecht: Kluwer.

McMahon, April, Paul Foulkes and Laura Tollfree. 1994. Gestural representation and lexical phonology. Phonology 11, 277-316.

Mehmet, Yavas. 1982. Natural Phonology and Borrowing Assimilations. Linguistics 20 (1-2), 123-132.

Miller, George and Patricia Nicely. 1955. An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America 27 (2), 338-352.

Morelli, Frida. 1999. The Phonotactics and Phonology of Obstruent Clusters in Optimality Theory. Ph.D. dissertation, University of Maryland.

Moreton, Elliott. 2002. Structural constraints in the perception of English stop-sonorant clusters. Cognition 84, 55-71.

Moreton, Elliott and Paul Smolensky. 2002. Typological consequences of local constraint conjunction. In L. Mikkelsen and C. Potts, Eds., Proceedings of the West Coast Conference on Formal Linguistics 21. Cambridge, MA: Cascadilla Press.

Munhall, Kevin, Mitsuo Kawato and Eric Vatikiotis-Bateson. 2000. Coarticulation and physical models of speech production. In Michael Broe and Janet Pierrehumbert, Eds., Papers in Laboratory Phonology 5: Acquisition and the Lexicon. Cambridge: Cambridge University Press, pp. 9-28.

Munhall, Kevin and Anders Löfqvist. 1992. Gestural aggregation in speech: laryngeal gestures. Journal of Phonetics 20, 111-126.

Nagy, Naomi and David Heap. 1998. Francoprovençal null subjects and constraint interaction. In The Proceedings from the Chicago Linguistic Society's 34th Meeting. Chicago: Chicago Linguistic Society, pp. 151-166.

Nagy, Naomi and William Reynolds. 1997. Optimality Theory and variable word-final deletion in Faeter. Language Variation and Change 9, 37-55.

Ni Chiosáin, Máire and Jaye Padgett. 2001. Markedness, Segment Realization, and Locality in Spreading. In Linda Lombardi, Ed., Segmental Phonology in Optimality Theory. Cambridge: Cambridge University Press, pp. 118-156.

Nittrouer, Susan, Kevin Munhall, J.A. Scott Kelso, Betty Tuller and Katherine Harris. 1988. Patterns of interarticulator phasing and their relation to linguistic structure. Journal of the Acoustical Society of America 84 (5), 1653-1661.

O'Connor, Kathleen. 2002. Hidden rankings in Optimality Theory: Evidence from second language acquisition. Talk presented at the GLOW Workshop on Phonological Acquisition, Utrecht, The Netherlands.

Ohala, John. 1974. Phonetic explanation in phonology. In A. Bruck, R.A. Fox and M.W LaGaly, Eds., Papers from the parasession on natural phonology, Chicago Linguistic Society. Chicago: Chicago Linguistic Society, pp. 251-274.

Ohala, John. 1981. The listener as the source of sound change. In Carrie S. Masek, Roberta A. Hendrick and Mary Francis Miller, Eds., Papers from the Parasession on Language and Behavior, Chicago Linguistic Society. Chicago: Chicago Linguistic Society, pp. 178-203.

Ohala, John. 1983. The origin of sound patterns in vocal tract constraints. In Peter MacNeilage, Ed., The Production of Speech. New York: Springer-Verlag.

Ohala, John. 1989. Sound change is drawn from a pool of synchronic variation. In Leiv Egil Brevik and Enrst Hakon Jahr, Eds., Language Change: Contributions to the Study of Its Causes. New York: Mouton de Gruyter, pp. 171-198.

Ohala, John. 1990. The phonetics and phonology of assimilation. In John Kingston and Mary Beckman, Eds., Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech. Cambridge: Cambridge University Press.

Ohala, John. 1993. The phonetics of sound change. In Charles Jones, Ed., Historical Linguistics: Problems and Perspectives. London: Longman, pp. 237-278.

Ohala, John. 1994. Speech aerodynamics. In R.E. Asher and J.M.Y. Simpson, Eds., The Encyclopedia of Language and Linguistics, vol. 8. New York: Pergamon Press, pp. 4144-4148.

Ohala, John and Haruko Kawasaki-Fukumori. 1997. Alternatives to the sonority hierarchy for explaining segmental sequential constraints. In Stig Eliasson and Ernst Hakon Jahr, Eds., Language and Its Ecology. Berlin: Mouton de Gruyter.

Öhman, S. 1966. Coarticulation in VCV sequences: spectrographic measurements. Journal of the Acoustical Society of America 39, 151-168.

Orr, Carolyn. 1962. Ecuadorian Quichua phonology. In Benjamin Elson, Ed., Studies in Ecuadorian Indian Languages I. Norman, OK: Summer Institute of Linguistics.

Ortega-Llebaria, Marta. 2002. Interplay between phonetic and inventory constraints in the degree of spirantization of voiced stops: Comparing intervocalic /b/ and intervocalic /g/ in Spanish and English. Talk presented at the Laboratory Approaches to Spanish Phonology Conference, September 6-7, 2002, University of Minnesota.

Ostry, David and Kevin Munhall. 1985. Control of rate and duration of speech movements. Journal of the Acoustical Society of America 77 (2), 640-648.

Owens, Elmer. 1961. Intelligibility of words varying in familiarity. Journal of Speech and Hearing Research 4, 113-129.

Padgett, Jaye. 2001. Constraint conjunction versus grounded constraint subhierarchies in Optimality Theory. ms, UC Santa Cruz.

Paradis, Carole and Darlene LaCharité. 1997. Preservation and Minimality in Loanword Adaptation. Journal of Linguistics 33 (2), 379-430.

Parthasarathy, Vijay, Maureen Stone and Jerry Prince. 2003. Spatiotemporal visualization of the tongue surface using ultrasound and kriging. Proceedings of SPIE-Medical Imaging.

Pater, Joe. 1999. Austronesian Nasal Substitution and Other NC Effects. In Harry van der Hulst, Rene Kager and Wim Zonneveld, Eds., The Prosody Morphology Interface. Cambridge: Cambridge University Press, pp. 310-343.

Patterson, David, Paul C. LoCasto and Cynthia M. Connine. 2003. Corpora analyses of frequency of schwa deletion in conversational American English. Phonetica 60, 45-69.

Perkell, Joseph. 1969. Physiology of speech production: results and implications of a quantitative cineradiographic study. Cambridge, MA: MIT Press.

Pertz, Doris and Tom Bever. 1975. Sensitivity to phonological universals in children and adolescents. Language 51 (1), 149-162.

Pierrehumbert, Janet. 1990. Phonological and phonetic representation. Journal of Phonetics 18, 375-394.

Piesarskas, Bronius. 1995. Lithuanian dictionary: English-Lithuanian, Lithuanian-English. London: Routledge.

Pitt, Mark. 1998. Phonological processes and the perception of phonotactically illegal consonant clusters. Perception and Psychophysics 60 (6), 941-951.

Poplack, Shana and David Sankoff. 1984. Borrowing: The synchrony of integration. Linguistics 22, 99-135.

Price, P.J. 1980. Sonority and syllabicity: Acoustic correlates of perception. Phonetica 37, 327-343.

Prince, Alan. 1997. Stringency and anti-Paninian hierarchies. Handout from LSA Institute; http://ling.rutgers.edu/gamma/talks/insthdt2.pdf.

Prince, Alan and Paul Smolensky. 1993. Optimality Theory: Constraint Interaction in Generative Grammar. Technical report, Rutgers University and University of Colorado. To appear, MIT Press.

Prince, Alan and Paul Smolensky. 1997. Optimality: From neural networks to universal grammar. Science 275, 1604-1610.

Reynolds, William. 1994. Variation and Phonological Theory. Ph.D. dissertation, University of Pennsylvania.

Ridouane, Rachid. 2002. Words without vowels: Phonetic and phonological evidence from Tashlhiyt Berber. ZAS Papers in Linguistics 28, 93-110.

Roca, Iggy and Wyn Johnson. 1999. A Workbook in Phonology. Oxford: Blackwell.

Rosenzweig, M.R. and L. Postman. 1957. Intelligibility as a function of frequency of usage. Journal of Experimental Psychology 54, 412-422.

Saciuk, Bohdan. 1969. The stratal division of the lexicon. Papers in Linguistics 1, 464-532.

Sagey, E.C. 1986. The Representation of Features and Relations in Non-Linear Phonology. Ph.D. dissertation, MIT.

Saltzman, Elliot and Dani Byrd. 2000. Task-dynamics of gestural timing: Phase windows and multifrequency rhythms. Human Movement Science 19, 499-526.

Saltzman, Elliot, Anders Löfqvist and Subhobrata Mitra. 2000. 'Glue' and 'clocks': intergestural cohesion and global timing. In Michael Broe and Janet Pierrehumbert, Eds., Papers in Laboratory Phonology 5: Acquisition and the Lexicon. Cambridge: Cambridge University Press, pp. 88-101.

Saltzman, Elliot and Kevin Munhall. 1989. A dynamical approach to gestural patterning in speech production. Ecological Psychology 1, 333-382.

Sato, Charlene. 1987. Phonological Processes in Second Language Acquisition: Another Look at Interlanguage Syllable Structure. In Georgette Ioup and Steven Weinberger, Eds., Interlanguage Phonology: The Acquisition of a Second Language Sound System. Cambridge, MA: Newbury House, pp. 248-260.

Schönkron, Marcel. 1967. Rumanian-English and English-Rumanian dictionary; with supplement of new words. New York: F. Ungar Publishing Company.

Selkirk, Elizabeth. 1984. On the major class features and syllable theory. In Mark Aronoff and Richard Oehrle, Eds., Language Sound Structure. Cambridge, MA: MIT Press.

Shaiman, Susan. 2001. Kinematics of compensatory vowel shortening: the effect of speaking rate and coda composition on intra- and inter-articulatory timing. Journal of Phonetics 29, 89-107.

Shaiman, Susan, Scott G. Adams and M.D.Z. Kimelman. 1997. Velocity profiles of lip protrusion across changes in speaking rate. Journal of Speech, Language and Hearing Research 40, 144-158.

Shaiman, Susan, Scott G. Adams and Mikael D.Z. Kimelman. 1995. Timing relationships of the upper lip and jaw across changes in speaking rate. Journal of Phonetics 23, 119-128.


Silverman, Daniel. 1992. Multiple scansions in loanword phonology: evidence from Cantonese. Phonology 9, 289-328.

Silverman, Daniel. 1995/1997. Phasing and Recoverability. New York: Garland Press.

Smolensky, Paul. 1995. On the internal structure of the constraint component Con of UG. Handout of talk given at University of Arizona.

Smolensky, Paul. 1997. Constraint interaction in generative grammar II: Local Conjunction (or, Random rules in Universal Grammar). Paper presented at the Hopkins Optimality Theory Conference/University of Maryland Mayfest. Baltimore, MD.

Smolensky, Paul and Géraldine Legendre, Eds. to appear. The harmonic mind: From neural computation to optimality-theoretic grammar. Cambridge: Blackwell.

Smorodinsky, Iris. 2002. Schwas with and without active control. Ph.D. dissertation, Yale University.

Solé, M.J. 1995. Spatio-temporal patterns of velopharyngeal action in phonetic and phonological nasalization. Language and Speech 38, 1-23.

Sproat, Richard and Osamu Fujimura. 1993. Allophonic variation in English /l/ and its implications for phonetic implementation. Journal of Phonetics 21, 291-311.

Stampe, David. 1973. A Dissertation on Natural Phonology. Ph.D. dissertation, University of Chicago.

Stanislawski, J. 1988. McKay's English-Polish/Polish-English dictionary. New York: Random House.

Steriade, Donca. 1982. Greek prosodies and the nature of syllabification. Ph.D. dissertation, MIT.

Steriade, Donca. 1993. Closure, release and nasal contours. In M. Huffman and R. Krakow, Eds., Nasals, Nasalization, and the Velum. San Diego: Academic Press, pp. 401-470.

Steriade, Donca. 1997. Phonetics in Phonology: The Case of Laryngeal Neutralization. ms., UCLA.

Steriade, Donca. 1999a. Alternatives to the syllabic interpretation of consonantal phonotactics. In O. Fujimura, B. Joseph and B. Palek, Eds., Proceedings of the 1998 Linguistics and Phonetics Conference. Prague: The Karolinum Press.

Steriade, Donca. 1999b. The phonology of perceptibility effects: the P-map and its consequences for constraint organization. ms, UCLA.

Steriade, Donca. 2000. Paradigm uniformity and the phonetics-phonology boundary. In Janet Pierrehumbert and Michael Broe, Eds., Papers in Laboratory Phonology VI. Cambridge: Cambridge University Press.

Steriade, Donca. in press. Directional asymmetries in place assimilation: a perceptual account. In Elizabeth Hume and Keith Johnson, Eds., The role of speech perception phenomena in phonology. San Diego: Academic Press.

Stetson, R.H. 1951. Motor phonetics: A study of speech movements in action. Amsterdam: North-Holland.

Stevens, Kenneth. 1998. Acoustic Phonetics. Cambridge, MA: MIT Press.

Stevens, Kenneth, Sheila Blumstein, L. Glicksman, Martha Burton and K. Kurowski. 1992. Acoustic and perceptual characteristics of voicing in fricatives and fricative clusters. Journal of the Acoustical Society of America 91 (5), 2979-3000.


Stone, Maureen. 1991. Imaging the tongue and vocal tract. British Journal of Disorders of Communication 26, 11-23.

Stone, Maureen. 1995. How the tongue takes advantage of the palate during speech. In Fredericka Bell-Berti and Lawrence Raphael, Eds., Producing Speech: Contemporary Issues: A Festschrift for Katherine Safford Harris. New York: American Institute of Physics, pp. 143-153.

Stone, Maureen and Edward P. Davis. 1995. A head and transducer support system for making ultrasound images of tongue/jaw movement. Journal of the Acoustical Society of America 98 (6), 3107-3112.

Stone, Maureen, Alice Faber, Lawrence Raphael and Thomas Shawker. 1992. Cross-sectional tongue shape and linguopalatal contact patterns in [s], [ʃ], and [l]. Journal of Phonetics 20 (2), 253-270.

Stone, Maureen and Andrew Lundberg. 1996. Three-dimensional tongue surface shapes of English consonants and vowels. Journal of the Acoustical Society of America 99, 3728-3737.

Tarone, Elaine. 1987. Some influences on the syllable structure of interlanguage phonology. In Georgette Ioup and Steven Weinberger, Eds., Interlanguage Phonology: The Acquisition of a Second Language Sound System. Cambridge: Newbury House Publishers.

Tesar, Bruce and Paul Smolensky. 2000. Learnability in Optimality Theory. Cambridge, MA: MIT Press.

Thorndike, Edward L. and Irving Lorge. 1944. A Teacher's Word Book of 30,000 Words. New York: Teachers College, Columbia University.

Tjaden, Kris and Gary Weismer. 1998. Speaking-Rate-Induced Variability in F2 Trajectories. Journal of Speech, Language and Hearing Research 41 (5), 976-989.

Tomiak, G. R. 1990. An acoustic and perceptual analysis of the spectral moments invariant with voiceless fricative obstruents. Ph.D. dissertation, SUNY Buffalo.

Tserdanelis, Georgios. 2001. A perceptual account of manner dissimilation in Greek. OSU Working Papers in Linguistics 56, 172-199.

Ussishkin, Adam and Andrew Wedel. 2003a. Gestural motor programs account for asymmetries in loanword adaptation patterns. Talk presented at the 77th Annual Meeting of the Linguistic Society of America, Atlanta, Georgia, January 2-5, 2003.

Ussishkin, Adam and Andrew Wedel. 2003b. Gestural motor programs and the nature of phonotactic restrictions: Evidence from loanword phonology. Talk presented at West Coast Conference on Formal Linguistics (WCCFL) 22, UC San Diego, March 21-23, 2003.

Vatikiotis-Bateson, Eric, Makoto Hirayama, K. Honda and Mitsuo Kawato. 1992. The articulatory dynamics of running speech: gestures from phonemes. In Proceedings of the 1992 International Conference on Spoken Language Processing, vol. 2, pp. 887-890.

Walker, James. 2000. Prosodic optimality and variability in English auxiliaries. McGill Working Papers in Linguistics 15 (1), 105-119.

Warner, Natasha, Allard Jongman, Anne Cutler and Doris Mücke. 2002. The phonological status of Dutch epenthetic schwa. Phonology 18.


Weismer, Gary, Kris Tjaden and Raymond Kent. 1995a. Can articulatory behavior in motor speech disorders be accounted for by theories of normal speech production? Journal of Phonetics 23, 149-164.

Weismer, Gary, Kris Tjaden and Raymond Kent. 1995b. Speech production theory and articulatory behavior in motor speech disorders. In Fredericka Bell-Berti and Lawrence Raphael, Eds., Producing Speech: Contemporary Issues for Katherine Safford Harris. New York: AIP Press, pp. 35-50.

Westbury, John and Patricia Keating. 1986. On the naturalness of stop consonant voicing. Journal of Linguistics 22, 145-166.

Wilson, Colin. 2000. Targeted Constraints: An Approach to Contextual Neutralization in Optimality Theory. Ph.D. dissertation, Johns Hopkins University.

Wilson, Colin. 2001. Consonant cluster neutralisation and targeted constraints. Phonology 18, 147-197.

Wright, J.T. 1986. The behavior of nasalized vowels in the perceptual vowel space. In John Ohala and Jeri Jaeger, Eds., Experimental Phonology. New York: Academic Press, pp. 45-67.

Wright, Richard. 1996. Consonant Clusters and Cue Preservation in Tsou. Ph.D. dissertation, UCLA.

Yoshioka, Hirohide, Anders Löfqvist and Hajime Hirose. 1981. Laryngeal adjustments in the production of consonant cluster and geminates in American English. Journal of the Acoustical Society of America 70, 1615-1623.

Zec, Draga. 1995. Sonority constraints on syllable structure. Phonology 12, 85-129.

Zhang, Jie. 2001. The Effects of Duration and Sonority on Contour Tone Distribution: Typological Survey and Formal Analysis. Ph.D. dissertation, UCLA.

Zipf, George. 1949. Human Behavior and the Principle of Least Effort. Cambridge: Addison-Wesley Press, Inc.

Zsiga, Elizabeth. 1993. Features, gestures, and the temporal aspects of phonological organization. Ph.D. dissertation, Yale University.

Zsiga, Elizabeth. 1994. Acoustic evidence for gestural overlap in consonant sequences. Journal of Phonetics 22, 121-140.

Zsiga, Elizabeth. 1995. An acoustic and electropalatographic study of lexical and post-lexical palatalisation in American English. In B. Connell and A. Arvaniti, Eds., Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV. Cambridge: Cambridge University Press, pp. 282-302.

Zsiga, Elizabeth. 1997. Features, gestures, and Igbo vowels: An approach to the phonology-phonetics interface. Language 73 (2), 227-274.

Zsiga, Elizabeth. 2000. Phonetic alignment constraints: consonant overlap and palatalization in English and Russian. Journal of Phonetics 28, 69-102.

Zuraw, Kie. 1996. Floating Phonotactics: Variability in Infixation and Reduplication of Tagalog Loanwords. MA Thesis, UCLA.

Zuraw, Kie. 2000. Patterned exceptions in phonology. Ph.D. dissertation, UCLA.

Zuraw, Kie. 2002. Class notes from Topics in Phonetics and Phonology: Loan Phonology. UCLA.

Zwicky, Arnold. 1972. Note on a phonological hierarchy in English. In R. Stockwell and R. Macaulay, Eds., Linguistic Change and Phonological Theory. Bloomington, IN: Indiana University Press.


CURRICULUM VITA

Lisa Davidson was born in Poughkeepsie, New York on March 9, 1975. Ms. Davidson graduated magna cum laude from Brown University in 1997 with an A.B. in Linguistics and Hispanic Studies. While at Brown, Ms. Davidson began research in linguistics, first with Dr. Rolf Noyer, and later with Dr. Katherine Demuth. After graduating from Brown, Ms. Davidson was awarded a Fulbright Pre-Doctoral Grant and spent a year working with Dr. Núria Sebastián Gallés in the Department of Basic Psychology at the University of Barcelona. She entered graduate school at Johns Hopkins University in 1998 and received her M.A. in Cognitive Science in 2000. While at Johns Hopkins, Ms. Davidson has been the teaching assistant for a number of courses, including Introduction to Cognitive Neuropsychology, World of Language, Language and Mind, and Sound Structure in Natural Language. Her research has been directed by Dr. Paul Smolensky. Ms. Davidson has presented her work at a number of conferences, including the West Coast Conference on Formal Linguistics, the Annual Meeting of the Linguistic Society of America, and the International Congress on Phonetic Sciences. Ms. Davidson has also had several articles published in edited volumes. In the fall of 2003, Ms. Davidson will begin a position as an assistant professor of linguistics at New York University.