L2 phonology of Cantonese speakers of English: Voicing and aspiration contrast of stops in onset and coda.1
Angus Fung
Department of Linguistics
University of Calgary
Supervisor: John Archibald
April 2004
1 I would like to express my gratitude to all those who helped me to complete this thesis. I am deeply indebted to my supervisor, Prof. Dr. J. Archibald, whose help, suggestions and encouragement helped me throughout the writing of this thesis. I extend my thanks to him for his countless hours of discussion and commentary.
1. Introduction
Second language acquisition is the phrase used to describe the process that
people go through when confronted by a need to use a language other than their native
one for communication. People acquire their first and second languages differently.
Some of the issues and processes involved in language acquisition include the idea of
innateness (Is language ability determined genetically?), the relevance of the language
input the language learner receives, and the nature of early (developmental) grammars
(O'Grady et al., 1989). In this paper, I address a number of issues that
have to do with the acquisition of voicing and aspiration contrasts in a second language
(L2). My major focus will be on what Cantonese-speaking learners do when they are
learning English stops. I will also look at speakers of a few other languages and their
acquisition of new stop consonants in an L2.
Most if not all of the pronunciation problems encountered by Cantonese
learners of English may be adequately accounted for by the contrastive differences of
the two languages. I will also examine the phonological differences between the two
languages: their phoneme inventories, the characteristics and distributions of the
phonemes, and syllable structure. At the segmental
level, substitution by a related sound in the native language, deletion and epenthesis
are by far the most common strategies.
Cantonese and English are two typologically different languages. Cantonese is
one of the major dialects of Chinese and the language belongs to the Sino-Tibetan
language family. It is spoken in Guangdong (including Hong Kong), Macau, and in the
southern part of Guangxi (Figure 1). English, on the other hand, is a Germanic language
which belongs to the Indo-European family (Ethnologue, 2004).
Figure 1: Map of Guangdong Province, China (Wertz, 2003)
2. Phonetics of VOT
In this section, I will look at the production and perception of stops in
terms of voice onset time, one of the cues to voicing and aspiration
contrasts in stops.
2.1 Articulation
Voice Onset Time (VOT) is the duration of the period of time between the
release of a plosive and the beginning of vocal fold vibration. This period is usually
measured in milliseconds (ms). It is useful to distinguish at least three types of
plosives with different VOT: voiceless-unaspirated, voiceless-aspirated and voiced.
Figure 2 shows the waveforms of the two plosives “t” and “d”. The arrows
indicate the release burst of the stop consonant and the onset of glottal vibrations for
the vowel. Clearly, the VOT is longer for the voiceless than for the voiced stop. This is
due to glottal abduction, that is, the opening of the vocal folds for the voiceless
stop, and its temporal relationship to the oral closing and opening movements.
Figure 2. Waveform of English [t, d], each followed by the vowel [a]. The y-axis represents amplitude; the x-axis is time (1.5 s overall). Morton, K. (1995)
2.2 Voicing and Aspiration
When a plosive has a fairly long positive VOT (longer than about
50 ms), the air from the lungs travels quite quickly through the vocal tract. It is
slowed down neither by the vocal folds, which are open, nor by a constriction in
the vocal tract, because the plosive has been released. The rapid airflow creates a weak
friction noise, which is heard as aspiration. When a voiceless unaspirated plosive is
followed by a vowel, the time when the vocal folds begin vibrating for the vowel will
coincide almost exactly with the time when the plosive is released (give or take up to
20 milliseconds). After a voiceless aspirated stop, however, the vocal folds will not
begin vibrating until well after the plosive is released.
The production of stops is not always uniform in terms of VOT, but when
a language has two or more contrasting stops, for example /t/ and /d/ in
English, each stop is produced within a particular range of VOT. Figure 3
shows the productions of a speaker of American English
for words beginning with /d/ or /t/. The production of /d/ ranges from 0 ms to 25 ms and
that of /t/ ranges from 50 ms to 80 ms.
Figure 3. VOT production of a single normal adult speaker of American English for words beginning with /d/ and /t/. Blumstein et al. (1980)
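The two ranges reported for this speaker can be written down as a small lookup. The following is only a minimal sketch: the function name and the "unclassified" label are illustrative assumptions, and the ranges describe a single speaker rather than English in general.

```python
# VOT ranges in ms as reported in the text for one American English speaker.
# The function name and the "unclassified" label are illustrative assumptions.
VOT_RANGES_MS = {"d": (0, 25), "t": (50, 80)}


def classify_alveolar_stop(vot_ms):
    """Return 'd' or 't' if vot_ms lies in a reported range, else 'unclassified'."""
    for stop, (low, high) in VOT_RANGES_MS.items():
        if low <= vot_ms <= high:
            return stop
    return "unclassified"


if __name__ == "__main__":
    for vot in (10, 35, 65):
        print(vot, "ms ->", classify_alveolar_stop(vot))
```

Values between the two ranges (for example 35 ms) are left unclassified here, since the text reports no productions in that gap.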
These are just two different possible ways of coordinating the timing between
vocal fold vibration and a closure in the mouth. Various languages make use of many
points along this VOT continuum. In the following diagrams (Figure 4), the top half
represents the closing and opening of a plosive in the mouth and the bottom half
represents the state of the vocal folds -- a straight line means voicelessness and a
wavy line means voicing.
1 fully voiced /ba/
2 partially voiced /ba/
3 voiceless unaspirated /pa/
4 aspirated /pha/
5 strongly aspirated /pha/
Figure 4. Different VOT of stops, shown relative to the lip closure (Russell, 1997)
Languages that make voicing contrasts usually choose two or three points
along this continuum (Abramson & Lisker, 1970). English has chosen to use position
2 for its voiced sounds and either 3 or 4 (depending on position in the word or
syllable) for voiceless sounds. French has chosen to use 1 (fully voiced) and 3
(voiceless unaspirated) (Flege 1987). Cantonese has chosen to use 3 (voiceless
unaspirated) and 5 (strongly aspirated) (Tsui & Ciocca 2000).
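To make the comparison concrete, the positions each language is said to use can be stored as sets and compared. This is a sketch under the simplifying assumption that the five numbered positions of Figure 4 are discrete categories; the dictionary and function names are my own.

```python
# Positions on the VOT continuum (Figure 4) that each language is said to use.
# Treating the positions as discrete categories is a simplifying assumption.
VOT_POSITIONS = {
    1: "fully voiced",
    2: "partially voiced",
    3: "voiceless unaspirated",
    4: "aspirated",
    5: "strongly aspirated",
}
LANGUAGE_POSITIONS = {
    "English": {2, 3, 4},
    "French": {1, 3},
    "Cantonese": {3, 5},
}


def shared_positions(lang_a, lang_b):
    """Positions used by both languages."""
    return sorted(LANGUAGE_POSITIONS[lang_a] & LANGUAGE_POSITIONS[lang_b])


def new_positions(l1, l2):
    """Positions used by the L2 but not by the L1."""
    return sorted(LANGUAGE_POSITIONS[l2] - LANGUAGE_POSITIONS[l1])


if __name__ == "__main__":
    for pos in new_positions("Cantonese", "English"):
        print("New for Cantonese learners of English:", pos, VOT_POSITIONS[pos])
    print("Shared:", shared_positions("Cantonese", "English"))
```

On this simplified view, only the voiceless unaspirated position is common to Cantonese and English, while the partially voiced and (moderately) aspirated positions are new to the Cantonese learner.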
2.3 Perception of VOT
The perception of the voicing and aspiration contrasts (e.g. /p/ vs. /b/, /ph/ vs.
/p/) in stops depends on acoustic cues such as VOT. We usually do not perceive
stimuli categorically (Kess 1992). For example, we do not see a colour spectrum from
blue to red as either pure blue or pure red with nothing in between; a colour can be
kind of blue and kind of green. A stop, however, cannot be kind of [d] and kind of [t]; it
is either a [d] or a [t].
One of the things that people seem to perceive categorically is speech. This is
called categorical perception because instead of getting a percept that is ambiguous,
you get a percept that perfectly matches an example of a particular category. So even
when the physical stimuli change continuously, people still perceive them
categorically. For example, both /b/ and /p/ are stop consonants and to produce these,
you close your lips, then open them, release some air, and the vocal cords begin
vibrating. The difference between /ba/ and /pa/ is the different VOT of the two stops.
For /b/, VOT is very short; voicing begins at almost the same time as the air is
released. For /p/, the onset of voicing is delayed.
To show the categorical perception of stops, a study by Pisoni & Tash (1974)
used a series of synthetic stimuli that span the VOT continuum between /ba/ and /pa/.
When people were asked to identify these stimuli, they generally had no difficulty:
the lower half of the continuum was consistently identified as /ba/ and the upper half as
/pa/, as shown in Figure 5. People did not report hearing something that was a bit like [b]
and a bit like [p]. Rather, they reported hearing either [b] or [p]. Thus, the actual VOT
of the individual stimulus appears to be discarded, and all that remains in the percept
is category membership.
Because speech is perceived categorically, it is not an easy task for
people to distinguish all speech sounds. Generally, they can only distinguish the
speech sounds that result in meaningful differences in their native language. To
investigate infants' ability to discriminate different speech sounds, Eimas et al. (1971)
tested two groups of infants who were 1 month and 4 months of age.
The results showed that infants at both ages distinguished sounds that were members of
separate phonemes (i.e. categories) from one another, but failed to distinguish
sounds within a given category. The study also shows that infants can distinguish
speech sounds before they can produce them. Figure 6 shows the results of this
experiment. Infants perceived stops with VOT of –20 ms and 0 ms as the
same stop, and likewise stops with VOT of 60 ms and 80 ms. Stops with VOT of
20 ms and 40 ms, however, were perceived as two different stops.
Figure 6. Experimental design of infant discrimination study. Eimas et al. (1971)
Note: S = perceived as the same stop; D = perceived as two different stops.
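The same/different pattern in Figure 6 follows from a single category boundary on the VOT continuum. The sketch below assumes a boundary at 30 ms; this exact value is my assumption and is not reported in the text, but any value between 20 ms and 40 ms yields the same predictions for the three stimulus pairs.

```python
# Assumed category boundary in ms; not reported in the text. Any value between
# 20 and 40 gives the same predictions for the three pairs below.
BOUNDARY_MS = 30


def category(vot_ms, boundary_ms=BOUNDARY_MS):
    """Label a stimulus by the side of the boundary on which its VOT falls."""
    return "voiced" if vot_ms < boundary_ms else "voiceless"


def predicted_response(vot_a, vot_b, boundary_ms=BOUNDARY_MS):
    """'D' if the two stimuli fall in different categories, otherwise 'S'."""
    same = category(vot_a, boundary_ms) == category(vot_b, boundary_ms)
    return "S" if same else "D"


if __name__ == "__main__":
    # Stimulus pairs (in ms) described in the text.
    for a, b in [(-20, 0), (20, 40), (60, 80)]:
        print(f"{a} ms vs {b} ms -> {predicted_response(a, b)}")
```

All three pairs differ by the same 20 ms, so the physical difference alone cannot explain why only the middle pair is discriminated; category membership can.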
In the next section, we are going to look at the differences between English and
Cantonese phonological systems, as this would help us to account for problems and
difficulties encountered by Cantonese speakers in the process of learning English
pronunciation.
3. Phonology of Syllables
There are 24 consonants in English and 19 consonants in Cantonese. Both
English and Cantonese have six stops at the bilabial, alveolar, and velar places of
articulation (English /p, b/, /t, d/, /k, g/). In English, /p, t, k/ are voiceless
whereas /b, d, g/ are voiced. In Cantonese, however, there are no voiced plosives;
all plosives are voiceless. The feature that distinguishes between the stops is
aspiration (/p, t, k/ vs. /ph, th, kh/).
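Table 1. An overview of English and Cantonese consonants (Chan & Li, 2000). (C) = Cantonese, (E) = English.
Manner of articulation   Consonants by place of articulation
Plosives (C)             bilabial p, ph; alveolar t, th; velar k, kh; labio-velar kw, kwh
Plosives (E)             bilabial p, b; alveolar t, d; velar k, g
Fricatives (C)           labio-dental f; alveolar s; glottal h
Fricatives (E)           labio-dental f, v; dental θ, ð; alveolar s, z; palatal-alveolar ʃ, ʒ; glottal h
Affricates (C)           alveolar ts, tsh
Affricates (E)           palatal-alveolar tʃ, dʒ
Nasals (C)               bilabial m; alveolar n; velar ŋ
Nasals (E)               bilabial m; alveolar n; velar ŋ
Lateral (C)              alveolar l
Lateral (E)              alveolar l
Approximants (C)         bilabial w; palatal j
Approximants (E)         bilabial w; alveolar r; palatal j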
English has a relatively complex syllable structure. There can be a maximum of
three consonants before a vowel and a maximum of four consonants after a vowel. One
such example is ‘strengths’ /strɛŋkθs/. The syllable structure of Cantonese, in
contrast, is rather simple; the possible combinations of sounds are restricted. Unlike
English, Cantonese has no consonant clusters. Thus, in terms of possible
configurations of V and C, English clearly outnumbers Cantonese, the latter being
limited to V, CV, VC, and CVC. Examples are given in Table 2 below:
Table 2
Syllable structure   Example
V                    //     ‘exclamation showing surprise’
CV                   /fu/   ‘husband’
VC                   /an/   ‘late’
CVC                  /sik/  ‘colour’
In terms of the distribution of consonants, all the stops in English may occur in
initial or final position of a syllable except [ŋ], which cannot occur syllable-initially. In
contrast, only /p, t, k/ among the Cantonese plosives may occur in syllable-final position,
as illustrated in Figure 7. It should be noted that, unlike plosives in English, Cantonese
plosives in word-final position are always unreleased, for example in the words ‘duck’ /ap/,
‘prosper’ /fat/, and ‘house’ /uk/. In English, by contrast,
unreleased stops occur only in connected speech, when a word-final stop is followed
by a word beginning with a stop. For example, the word-final [p] of “cup” is
unreleased when it is followed by a consonant.
(1) “cup to” /kptu/
(2) “cup on” /kpn/
Figure 7. Syllable structure of Cantonese and English.
Cantonese: syllable = onset + rhyme (nucleus + coda); template (C) V (C); coda consonants p, t, k, m, n, ŋ
English: syllable = onset + rhyme (nucleus + coda); template C C C V C C C C
4. Explaining L2 Behavior
Having examined the phonological differences between the two
languages, I would now like to review the behavior of L2 learners and how we can predict
what they will do when the target forms are not found in their native language.
Second language researchers have proposed a number of theories to explain
why certain target forms are more difficult to acquire than others. One of the earliest
was the Contrastive Analysis Hypothesis (Lado, 1957). This hypothesis stated that
where two languages are similar, positive transfer will occur and hence those forms will
be easy to learn; where they are different, negative transfer or interference will result
and those forms will be difficult to acquire. However, it turns out that defining
similarity and difference is not always easy. Some researchers (Eckman & Iverson
1993, 1994) suggested that typological markedness be the basis of prediction.
Structures that are simple and/or especially common in human language are said to be
unmarked, while structures that are more complex or less common are said to be
marked. A definition is given in Eckman (1981): "A phenomenon A in some language
is more marked relative to some other phenomenon B if, cross-linguistically, the
presence of A in a language necessarily implies the presence of B, but the presence of
B does not necessarily imply the presence of A." In other words, when a language has
voiced stops, e.g. [d], we would expect it to have the voiceless
counterpart, [t], but not vice versa. From that, we can say that voiced stops are more
marked than voiceless ones.
Sometimes something that is not in the L1 can be easy to acquire. For example, English
does not contrast [ʃ] and [ʒ] in word-initial position, but English
speakers seem able to make the contrast in French onsets without trouble. The
Markedness Differential Hypothesis (MDH) investigates second language acquisition
by comparing the relative markedness of structures in the L1 and L2. In those areas in
which the target language (TL) differs from the native language (NL), difficulty
will be greater when the TL structure is more marked than the NL structure, and the
degree of difficulty will correspond directly to the degree of markedness. The two
factors that the MDH asks us to consider when predicting L2
difficulty are as follows:
- The differences between the NL and TL.
- The markedness relationships holding between those areas of difference.
In (3), the presence of nasal vowels implies the presence of oral vowels but not
vice versa. There are languages which have both [a] and [ã] and languages which have
[a] alone, but there are no languages which have [ã] but not [a]. From that, we know
that nasal vowels are more marked than oral vowels, and so we would predict that the
degree of difficulty is higher when there are nasal vowels in the target language but
not in the native language.
(3) [ã] implies [a]
∴ Nasal vowels are more marked than oral vowels. Hence, the prediction of the
MDH is that nasal vowels are more difficult to acquire.
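The implicational definition of markedness can be checked mechanically against a sample of inventories. The following is a minimal sketch: the three inventories are hypothetical toy data, not a real typological sample, and the function names are my own.

```python
ORAL_A = "a"
NASAL_A = "\u00e3"  # a with tilde, i.e. the nasal vowel

# Hypothetical toy inventories, not a real typological sample.
SAMPLE_INVENTORIES = [
    {ORAL_A, NASAL_A},
    {ORAL_A},
    {ORAL_A, NASAL_A, "i"},
]


def implies(a, b, inventories):
    """True if every inventory that contains a also contains b."""
    return all(b in inv for inv in inventories if a in inv)


def more_marked(a, b, inventories):
    """a is more marked than b if a implies b but b does not imply a."""
    return implies(a, b, inventories) and not implies(b, a, inventories)


if __name__ == "__main__":
    print(more_marked(NASAL_A, ORAL_A, SAMPLE_INVENTORIES))
    print(more_marked(ORAL_A, NASAL_A, SAMPLE_INVENTORIES))
```

With this sample, the nasal vowel comes out as more marked than the oral vowel and not the reverse, which is the relationship stated in (3).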
On the other hand, those TL differences that are not more marked will not be
difficult. The MDH can explain several major patterns of difficulty found in second
language acquisition. Now that we know what kinds of target forms are difficult for L2
learners, we will discuss what L2 learners do when the target forms are difficult to
acquire.
4.1 Repair Strategies
Repair is a common phenomenon in second language learning; it involves
modifying an L2 word so that it fits the L1 syllable structure. For example, in
Japanese loanwords, “strike” /straik/ becomes /sutoraiki/ because Japanese mainly
allows CV syllables. Another example is found in German speakers.
When they are learning English, they produce words with syllable-final
obstruent devoicing (producing [hæt] for [hæd] “had”) because they have no voicing
contrast at the end of words in their L1.
The consonant-vowel (CV) syllable is the least marked syllable structure because it
can be found in all languages of the world. In order to
pronounce the target English items, Cantonese speakers adopt a number of
strategies to break up the more complex, more marked syllable structures of English.
Epenthesis is one of the strategies Cantonese speakers use: a vowel, usually a schwa
/ə/, is inserted within a consonant cluster or after a final consonant of the syllable.
Another repair strategy is deletion. In this case, coda consonants or one of the
consonants in a cluster are deleted in order to obtain the more optimal CV syllable. The
final type of strategy concerning coda consonants or onset consonant clusters is
replacement or substitution. This strategy does not alter the syllable structure, and it
appears quite frequently in final voiced stops (Edge, 1991). For Cantonese L2 learners
of English, most of the errors found in these items involve the voice feature;
devoicing is the most common error in final voiced stops. The following examples illustrate
the three strategies mentioned above.
(4) Solutions to syllable structure problems:
a. Epenthesis /dg/ /dg/
b. Deletion /dg/ /d/
c. Devoicing /dg/ /dk/
Different strategies for syllable structure simplification result in different
outcomes: CVC sequences undergoing final consonant deletion or epenthesis surface
as CV syllables, whereas repair strategies such as final devoicing and substitution
maintain the relatively marked, closed CVC structure. Even though both deletion and
epenthesis convert the relatively complex CVC syllable into relatively simple CV
syllables, their outcomes differ as to what degree of ambiguity they impose on a word.
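The three repair strategies and their effect on syllable shape can be sketched as simple string operations. The sketch below is illustrative only: it uses a plain ASCII transcription in which "@" stands in for schwa, and the input form "dog" is a hypothetical example of a CVC word ending in a voiced stop.

```python
# ASCII stand-ins: "@" represents schwa. The segment sets are deliberately small.
VOWELS = set("aeiou@")
DEVOICE = {"b": "p", "d": "t", "g": "k"}


def epenthesis(word):
    """Add a schwa after a word-final consonant."""
    return word + "@" if word[-1] not in VOWELS else word


def deletion(word):
    """Delete a word-final consonant."""
    return word[:-1] if word[-1] not in VOWELS else word


def devoicing(word):
    """Replace a word-final voiced stop with its voiceless counterpart."""
    return word[:-1] + DEVOICE.get(word[-1], word[-1])


def skeleton(word):
    """CV skeleton of a transcription, e.g. 'dog' -> 'CVC'."""
    return "".join("V" if seg in VOWELS else "C" for seg in word)


if __name__ == "__main__":
    form = "dog"  # hypothetical CVC input
    for repair in (epenthesis, deletion, devoicing):
        output = repair(form)
        print(f"{repair.__name__}: {form} -> {output} ({skeleton(output)})")
```

Epenthesis yields CVCV, which syllabifies as two open CV syllables, and deletion yields CV, whereas devoicing keeps the closed CVC shape. Only deletion removes a segment from the word, which is why it creates more ambiguity than epenthesis.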
According to Weinberger (1994), recoverability is a principle “subsumed under
a theory of universal grammar” by which languages, native speakers, and language
learners avoid or minimize ambiguity. Young children frequently delete segments in both
onset and coda position but very rarely make use of epenthesis. This is because their
phonetic ability is low and their functional knowledge (in terms of the recoverability
principle) is not yet developed. Adults learning L2s seem to exhibit far more instances of
epenthesis than children acquiring their L1. The reason why epenthesis is a more
common simplification strategy in adult L2 acquisition is, according to Weinberger
(1994), that even though adults’ phonetic skills in the target language lag behind those
of native speakers, they do have access to the recoverability principle.
To learn more about recoverability in L2 learners, Abrahamsson (2003) conducted a
study of Chinese-Swedish interphonology. Three Chinese subjects were included in
this longitudinal study of their L2 acquisition of Swedish. Recordings were made at
3- to 5-week intervals from August 1990 to May 1991. The experiment tested
his hypotheses about the developmental aspect of L2 learners’ errors and about their
selection of repair strategies with regard to grammatical and functional factors. He predicted
that error frequencies would be relatively low in the initial stages, higher
at a later stage, and relatively low again at even later stages of acquisition.
Also, epenthetic forms would be relatively less frequent in early phases of development but
more frequent in later phases. Figure 8 shows the overall error frequencies in
the experiment. The results agreed with his prediction that learners’ acquisition of
codas can be characterized by the following four phases: (a) an initial phase with
relatively high error rates, followed by a rapid decrease in error frequency; (b) a linear
increase in error frequency; (c) a stable plateau phase of relatively high error
frequencies; and (d) a possible decrease in error rates as acquisition proceeds.
Figure 8. Overall error frequencies (deletion + epenthesis), development over time.
Figure 9 gives a summarized description of what the pattern looks like when
the mean epenthesis proportions for the autumn semester 1990 are compared with the
mean proportions for the spring semester 1991. Subject C1 already used epenthesis
more than twice as much as deletion during the autumn semester (epenthesis-deletion
proportion: 2.13) and almost three times as often during the spring semester
(proportion: 2.87). C2’s use of epenthesis is barely half as frequent as his use of
deletion during the first semester (proportion: 0.44), but there is a significant increase
in his use of epenthesis, which is almost as frequent as deletion during the second
semester (proportion: 0.88). Finally, C4 increased her use of epenthesis, which was
nearly as frequent as deletion during the autumn of 1990 (proportion: 0.75), to a level
almost three times as frequent during the spring of 1991 (proportion: 2.77). This was
a significant change.
Figure 9. Proportion of epenthesis to deletion errors, development over time.
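The epenthesis-deletion proportion is simply the number of epenthesis errors divided by the number of deletion errors, so a value above 1 means epenthesis is the preferred repair. The sketch below uses the proportions quoted above for the three subjects; the function names are my own, and the fold-change measure is an illustrative addition rather than a statistic reported in the study.

```python
# Epenthesis-deletion proportions quoted in the text: (autumn 1990, spring 1991).
PROPORTIONS = {"C1": (2.13, 2.87), "C2": (0.44, 0.88), "C4": (0.75, 2.77)}


def epenthesis_deletion_proportion(n_epenthesis, n_deletion):
    """Ratio of epenthesis errors to deletion errors."""
    if n_deletion == 0:
        raise ValueError("proportion is undefined when there are no deletions")
    return n_epenthesis / n_deletion


def fold_change(autumn, spring):
    """How many times larger the spring proportion is than the autumn one."""
    return spring / autumn


if __name__ == "__main__":
    for subject, (autumn, spring) in PROPORTIONS.items():
        preferred = "epenthesis" if spring > 1 else "deletion"
        change = round(fold_change(autumn, spring), 2)
        print(f"{subject}: x{change} from autumn to spring; spring preference: {preferred}")
```

All three subjects move towards epenthesis over time, but C2 still deletes more often than he epenthesizes in the spring semester.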
The functional or grammatical role of the coda also determines the use of
different repair strategies. According to Abrahamsson’s (2003) hypothesis, word-final
codas that are relatively more important for the retention of semantically relevant
information will generate lower overall frequencies of simplification, greater
epenthesis-deletion proportions, or both, than will codas containing information that is
more recoverable (or predictable) from other segments or features in the context. In
Swedish, a coda /r/ can serve as a plural marker or a tense marker, and it also occurs in
noninflected words. According to Abrahamsson’s hypothesis, if the final consonant of a
noninflected word is deleted, the lost information is generally not expressed by other
explicit markers or features in the context, and it can be argued that deletion of the
stem-final /r/ results in much greater lexical-semantic ambiguity than the partial
deletion of an inflectional morpheme. It may therefore also be argued that the retention
of final /r/ is more beneficial in noninflected words. To test the hypothesis, inflected
words that ended in either the present-tense morpheme -r/-er or the plural morpheme
-r/-ar/-er/-or were compared with noninflected words with stems that ended in a single
/r/. Figure 10 shows the proportions of epenthesis to deletion. Although the differences
again appear to be very small, all subjects produced significantly more epentheses for
noninflected forms than for inflected forms.
Figure 10. Proportion of epenthesis to deletion errors, inflectional vs. lexical /r/ codas.
Figure 11. Proportion of epenthesis to deletion errors, present tense vs. plural /r/ codas.
Two pairs of word classes were compared with respect to epenthesis and
deletion. The first comparison is between present-tense and plural codas. As can be
seen in Figure 11, there is no consistency between the three subjects: C1 used
epenthesis significantly more often for present-tense (proportion: 0.1) than for plural
codas (0.02), whereas subject C2 did not differentiate his use of epenthesis between the
two inflectional categories in any significant way. The other comparison deals with
differences between open- and closed-class words. Since Swedish word-final /r/
in open-class words is less recoverable from the context, it is predicted to be pronounced
more accurately, with a lower overall error frequency and a higher proportion of
epenthesis, than the more recoverable or predictable /r/ of closed-class words. The
result is shown in Figure 12.
Figure 12. Proportion of epenthesis to deletion errors, closed-class vs.
open-class /r/ codas.
It is generally believed that L2 learners achieve greater accuracy in their production
of singleton consonants as style becomes more formal (Schmidt, 1987).
However, Lin (2001) found that in the case of consonant clusters, it is the learners’
choice of repair strategy, not the error rate, that varies with the style of speech.
Twenty Chinese adults took part in her study of the production of English onset
consonant clusters in four different types of tasks. The experiment included a
range of task types, from the most formal, “reading of minimal pairs”, through “word
list reading” and “sentence reading” to the least formal, “conversation”, as shown in
Figure 13.
Figure 13. Task types from most formal to least formal: reading of minimal pairs, word list reading, sentence reading, conversation.
The error rates support her hypothesis and do not conform to
the general prediction that L2 learners produce target items more accurately
as the style becomes more formal. No significant
difference was found in the students’ error rates across the four speech tasks, as shown in
Figure 14.
Figure 14. Overall error rates in the four tasks. (Lin 2000)
Her study also showed that the use of epenthesis increased as the style of the
task became more formal, and that the percentage of deletions and replacements became
higher in less formal tasks. The proportion of epenthesis to deletion is also
expected to be greater in tasks without linguistic context than in tasks with linguistic
context. For tasks that were more formal or that required more attention to form or
pronunciation rather than to content, the use of epenthesis increased. On the
other hand, when the tasks became less formal, or as more attention was paid to
content rather than form, deletion and replacement were
preferred. The results of her experiment indicate that what shifts with style is the
learners’ choice of repair strategy rather than their accuracy.
Figure 15. Percentages of the three strategies in the four tasks. (Lin 2000)
Note: MP = minimal pair; WL = word list; S = sentence; C = conversation.
5. Phonetics of L2 Learners
So, can L2 learners acquire new VOT values? In this section, I will review existing
studies of the acquisition of L2 stops that differ from those of the learners’
L1.
Curtin et al. (1998)
Curtin et al. (1998) studied the acquisition of Thai voicing and aspiration by
English and French speakers. Thai has a three-way phonemic contrast in stops:
voiced, voiceless unaspirated and voiceless aspirated. English
also has three phonetically different stops, but only two phonemically different
stops. Aspiration is not a contrastive feature in English, and so there
is no lexical distinction between aspirated and unaspirated stops. Still, there is a
phonetic difference between the [p] in “spin” and the aspirated [ph] in “pin”.
Underspecification means that underlying representations are not fully specified and
that predictable information is not underlyingly present. Underspecification theory
expresses this by assuming that underlyingly neither p is specified for aspiration.
In this study, Curtin et al. (1998) wanted to find out whether allophonic aspiration in
English [p] vs. [ph] aids in the acquisition of contrastive aspiration in Thai /p/ and /ph/.
They also wanted to compare the developmental progression of the English learners
to that of native speakers of French. Like English, French has a two-way phonemic
voicing contrast. Phonetically, however, French makes only a voicing contrast, with no
aspiration: it has voiced and voiceless stops but no aspirated stops.
There is some cross-language speech perception research (Abramson and
Lisker, 1970; Strange, 1972; Pisoni et al., 1982) which has shown that English
speakers find it easier to perceptually distinguish aspirated-unaspirated segments (e.g.
/ph/ vs. /p/) than voiced-voiceless segments (e.g. /p/ vs. /b/) in synthetic VOT
studies. In Curtin et al.’s (1998) study, however, the results showed the opposite in one
of the tasks: English speakers did better at distinguishing voiced-voiceless segments than
aspirated-unaspirated segments in a minimal pair task. Curtin et al. (1998) claimed
that these contradictory findings (aspiration contrasts are perceptually easier for
English speakers to distinguish, yet English subjects did better on the voicing contrast in
this study) can be explained by the generative phonological distinction between lexical
and surface representations: responses on the minimal pair task must be made on the basis
of lexically stored representations. The details and the results of the experiment are
discussed later in this section.
Aspiration is not part of the lexical representation in English; all voiceless
stops are stored as unaspirated in the lexicon, and aspiration emerges only in the fully
specified phonetic representation. Underspecification theory expresses this by assuming
that underlyingly the /p/s in [phin] and [spin] are both unspecified for aspiration. The
aspiration feature in [phin] is later specified by a context-sensitive rule that applies at
the beginning of a syllable; aspiration does not apply in other contexts. English has no
lexical distinction between aspirated and unaspirated stops, but there is still a phonetic
difference, as the derivation in (5) shows.
(5) Lexical representation: /pæt/ /spæt/ /bæt/
Aspiration rule: [phæt] — —
Surface representation: [phæt] [spæt] [bæt]
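The derivation in (5) can be sketched as a single rewrite rule applied to lexical forms. This is a minimal sketch under two assumptions of mine: the forms are monosyllables, so that "syllable-initial" reduces to "word-initial", and aspiration is written as "h" after the stop, following the notation used in this paper.

```python
VOICELESS_STOPS = {"p", "t", "k"}


def apply_aspiration(lexical):
    """Aspirate a voiceless stop at the beginning of a (monosyllabic) form.

    Aspiration is written as 'h' after the stop. Stops in any other position,
    such as the /p/ after /s/ in 'spæt', are left unchanged.
    """
    if lexical and lexical[0] in VOICELESS_STOPS:
        return lexical[0] + "h" + lexical[1:]
    return lexical


if __name__ == "__main__":
    for form in ("pæt", "spæt", "bæt"):
        print(f"/{form}/ -> [{apply_aspiration(form)}]")
```

The lexical forms carry no aspiration at all; the feature appears only in the output of the rule, which is the sense in which aspiration is a surface feature in English.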
Thai has triads of words that differ minimally in both voice and aspiration.
Neither of these features is predictable, and so both voice and aspiration
are represented lexically.
(6) /bèt/ ‘fishhook’ /pèt/ ‘duck’ /phèt/ ‘spicy’
The first task of the study is a Minimal Pair Task. Nine Canadian English
speakers, 8 Canadian French speakers and 10 native speakers of Thai (controls) were
asked to choose between pictures of words that are in minimal pair relationship, when
presented with one word aurally. The pictures of the minimal pair are accompanied by
a picture of a foil that differs phonetically in more than one segment from the other
words. An aural presentation was heard and subjects were asked to respond by
pressing a key that corresponds to the position of the appropriate picture on the
screen. This task was used to study the development of lexical representations and to
find out whether the subjects could lexically contrast both voice and aspiration, that is,
whether they could access the correct lexical entry when they heard a word.
The second task is called an ABX Task. In this task, a minimal pair ‘AB’ is
presented aurally followed by a third word ‘X’ that matches either A or B. The
tokens used for A, B and X were each produced by a different speaker. There were 72
trials: 16 each of Aspiration–Voiceless, Voiced–Voiceless and Aspiration–Voiced, and
24 Place controls. Subjects were asked to decide whether the third word ‘X’ matched
A or B.
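The trial structure of the ABX task can be laid out as follows. Only the trial counts come from the description above; the random choice of whether X matches A or B, the shuffling of trial order, and all names in the sketch are my own assumptions.

```python
import random

# Trial counts per contrast type as described in the text (16 * 3 + 24 = 72).
TRIAL_COUNTS = {
    "Aspiration-Voiceless": 16,
    "Voiced-Voiceless": 16,
    "Aspiration-Voiced": 16,
    "Place control": 24,
}


def build_trials(counts, seed=0):
    """Build a shuffled list of ABX trials.

    Choosing X at random and shuffling the order are assumptions of this sketch.
    """
    rng = random.Random(seed)
    trials = []
    for contrast, n_trials in counts.items():
        for _ in range(n_trials):
            trials.append({"contrast": contrast, "x_matches": rng.choice("AB")})
    rng.shuffle(trials)
    return trials


def proportion_correct(trials, responses):
    """Share of trials on which the response ('A' or 'B') equals the match for X."""
    correct = sum(resp == trial["x_matches"] for trial, resp in zip(trials, responses))
    return correct / len(trials)


if __name__ == "__main__":
    trials = build_trials(TRIAL_COUNTS)
    print(len(trials), "trials")
    all_a = ["A"] * len(trials)
    print("Always answering A:", round(proportion_correct(trials, all_a), 2))
```

Because each trial has two possible answers, chance performance is 0.5, which is the baseline against which the proportions correct in this task should be read.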
The results of the Minimal Pair task show that aspirated–unaspirated minimal
pairs were discriminated by both the English and French groups at a level only slightly
better than chance; performance on the voicing contrast was better (Figure 16). The
experiment lasted for 11 days, and results were collected on day 2, day 4 and day 11.
From the results on the last day (day 11), we can see a developmental difference
between some of the English and French subjects. This suggests that the presence of
surface aspiration in English might facilitate the establishment of a lexical aspiration
contrast in the L2 acquisition of Thai. Because of this, Curtin et al. (1998) suggested
that L1 surface features can be lexicalized in L2 acquisition.
Figure 16. Minimal Pair Task: proportion correct (Curtin et al. 1998)
French has a voicing contrast only, in both lexical and surface representations, so,
as expected, French speakers performed better on the voice contrast than
on aspiration in the ABX task (Figure 17), similar to their performance in the Minimal
Pair task. English speakers performed similarly on the voicing and aspiration contrasts
in the ABX task, as shown in Figure 17. These ABX results were quite different from the
English speakers’ results in the Minimal Pair task, in which their performance on
aspiration was significantly worse than on voice.
Figure 17. ABX Task- proportion correct (Curtin et al. 1998)
Curtin et al. (1998) claimed that the Minimal Pair task accesses lexical
representations which lack aspiration in English, while the discrimination task
accesses surface representations which contain aspiration in English. The results
of the ABX discrimination task show that the English subjects did better than
the French subjects on aspiration.
L2 learners initially construct lexical representations that make use of only
those features that are present lexically in the L1, even though they may be able to
discriminate other L2 contrasts on the basis of surface features, and may eventually
lexicalize these surface features. The results show that aspirated–unaspirated minimal
pairs were better discriminated by the English speakers than by the French speakers. The
French speakers performed better on the voice contrast than on aspiration.
In the task that accesses lexical representations, the English speakers discriminated
aspiration poorly, whereas in the task that accesses surface representations they
did better. This is supported by the results of the
discrimination task, in which the English subjects did better than the French subjects.
Flege and Eefting (1988)
Flege and Eefting (1988) examined the imitation of a VOT continuum ranging
from /da/ to /ta/ (-60 to +90 ms) by subjects differing in age and/or linguistic
experience. Subjects were native speakers of English, native speakers of Spanish and
bilingual speakers of both. Spanish and English use different phonetic categories to
implement the contrast between /t/ and /d/. In Spanish, [d] is used to implement /d/
and [t] implements /t/. Spanish categories of [d] and [t] yield stops with VOT values
of approximately –80 ms and 20 ms respectively, in word initial position. In English,
/d/ is implemented by [d] and [t], and /t/ is implemented by [th]. English output of [d]
and [t] results in VOT values of about –80 ms and 20 ms. The rule used to implement
[th] yields VOT values of approximately 80 ms (Flege and Eefting, 1986). Figure 18
illustrates how English and Spanish speakers divide up a VOT continuum based on
their native language categories.
Figure 18. Identification of a VOT continuum by English and Spanish speakers
In the experiment, subjects were asked to identify the stimuli before imitating
them. The stimuli, which consisted of a 16-member continuum ranging from /da/ to
/ta/, were presented twice on each trial. The results showed that, regardless of the
properties of the acoustic input, children and adults who spoke only Spanish
produced only lead and short-lag VOT responses, which correspond to the phonetic
categories of their L1; they perceived each stimulus on the VOT continuum as a member
of one of their L1 categories (Figure 19). English speakers likewise tended to reproduce
the categories of their L1: they produced stops with only short-lag and long-lag VOT
values (Figure 20). On the other hand, native speakers of Spanish who spoke English
produced stops with VOT values falling into three modal VOT ranges (Figure 21).
They had acquired a new phonetic category that is not in their L1.
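The three modal VOT ranges mentioned above can be illustrated with a minimal sketch. The 16-member continuum from -60 to +90 ms is taken from the description of the stimuli; the equal 10 ms spacing and the 35 ms boundary between short-lag and long-lag are assumptions made here for illustration and are not values reported by Flege and Eefting (1988).

```python
def vot_category(vot_ms, short_lag_max=35.0):
    """Label a VOT value (in ms) as lead, short-lag or long-lag.

    The boundary short_lag_max is an illustrative assumption."""
    if vot_ms < 0:
        return "lead"
    if vot_ms <= short_lag_max:
        return "short-lag"
    return "long-lag"


def make_continuum(start_ms=-60, stop_ms=90, members=16):
    """Build an equally spaced VOT continuum (equal spacing is assumed)."""
    step = (stop_ms - start_ms) / (members - 1)
    return [start_ms + i * step for i in range(members)]


continuum = make_continuum()
labels = [vot_category(v) for v in continuum]

# Approximate modal values cited in the text for [d], [t] and [th].
for value in (-80, 20, 80):
    print(value, vot_category(value))
```

On this sketch, a monolingual Spanish speaker would be expected to use only the first two labels in imitation, a monolingual English speaker only the last two, and a Spanish-English bilingual all three.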
[Figure 18 content: English /d/ and /t/ and Spanish /d/ and /t/ categories along a VOT axis from -80 to 80 ms.]
Figure 19. The frequency of VOT values produced by the native Spanish subjects.
SA = Spanish adult; SC = Spanish children
Figure 20. The frequency of VOT values produced by the native English subjects.
EA = English adult; EC = English children
Figure 21. The frequency of VOT values produced by the native Spanish speakers of English. LCB = late childhood bilinguals; ECB = early childhood bilinguals; BC = bilingual children
6. Phonology of L2 Learners
After looking at the phonetics of L2 learners, we will now consider what
has to be acquired in the domain of phonology. In this section, we review
literature that examines the segmental level, which has to do with phonological segments
(consonants), and the prosodic level, which has to do with syllabification in L2
phonology.
Eckman & Iverson (1993)
Even when the L1 has no clusters, some clusters are easier to acquire than
others; e.g., [pl] is easier for L2 learners to acquire than [fl]. To explain this
phenomenon, Broselow & Finer (1991) proposed that a Minimal Sonority Distance
(MSD) parameter can predict the acquisition of L2 consonant
clusters in syllable onsets. The basis for the markedness of the clusters in Broselow &
Finer’s (1991) study is the Sonority Index shown in (7) and the proposed MSD
parameter.
(7) Sonority Index
    Class        Scale
    Stops        1
    Fricatives   2
    Nasals       3
    Liquids      4
    Glides       5
The function of the MSD parameter is to provide a characterization of
consonant clusters allowed in a language. Languages differ in the minimal
difference on the Sonority Index that they allow between adjacent segments in a
syllable onset. Other things being equal,
languages that require a greater difference in sonority between adjacent segments will
have fewer kinds of consonant clusters in the onset. For example, a stop-liquid cluster [pr]
would be less marked than a stop-fricative cluster [ps]. But Eckman & Iverson (1993)
argued that it is typological markedness rather than sonority distance which better explains
L2 learners’ knowledge of English clusters in syllable onsets. They proposed a
sequential markedness principle as the better explanation: “For any two segments A
and B and any given context X_Y, if A is less marked than B, then XAY is less
marked than XBY.” On this assumption, since [p] is less marked than [f], [pr]
clusters are less marked than [fr] clusters and are predicted to cause less IL difficulty
than [fr] clusters.
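The way the MSD parameter sorts onset clusters can be sketched as follows. The scale values are those of the Sonority Index in (7); the assignment of individual segments to classes and the function names are assumptions introduced here for illustration, not part of Broselow & Finer's (1991) formulation.

```python
# Scale values from the Sonority Index in (7).
SONORITY_INDEX = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4, "glide": 5}

# A small, illustrative segment-to-class table (an assumption of this sketch).
SEGMENT_CLASS = {
    "p": "stop", "b": "stop",
    "f": "fricative", "s": "fricative",
    "l": "liquid", "r": "liquid",
}


def sonority_distance(first, second):
    """Sonority of the second onset segment minus that of the first."""
    return SONORITY_INDEX[SEGMENT_CLASS[second]] - SONORITY_INDEX[SEGMENT_CLASS[first]]


def onset_allowed(cluster, msd):
    """A two-segment onset is allowed if its sonority distance is at least msd."""
    first, second = cluster
    return sonority_distance(first, second) >= msd


for cluster in ("pl", "pr", "fl", "ps"):
    print(cluster, sonority_distance(*cluster))
```

Under an MSD setting of 3, this sketch admits [pl] and [pr] but excludes [fl] and [ps], which matches the ordering of difficulty described above. It says nothing about the difference between [pr] and [fr] on grounds other than sonority, which is where Eckman & Iverson's markedness account comes in.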
Eckman & Iverson (1993) conducted an experiment with 11 subjects: 4 Japanese, 4
Korean, and 3 Cantonese speakers. They studied the production of English onset
consonant clusters (CCV). A subject was said to have a given onset in the IL
if the subject produced that onset cluster correctly at least 80% of the
time on at least 4 attempts. The data were collected on 8 occasions in casual conversations
of 5 to 10 minutes. No attempt was made to control the vocabulary used by the
subject. They claimed that a more marked cluster would be present only if the
corresponding less marked cluster was also present. There were 55 potential tests of
their claim (five sets of onsets per subject × 11 subjects). Of these 55 potential tests,
the data allow 50 to be tested (91%). Five of the potential tests yield no result
because the subject did not produce at least four tokens of the relevant clusters. Four
of the 50 testable cases appeared to go against what typological markedness would
predict. In the remaining 46 cases (92%), the subject’s performance obeyed the markedness
predictions.
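The acquisition criterion and the counts reported above can be restated in a short sketch. The 80% threshold, the minimum of four attempts and the figures 55, 50 and 4 come from the text; the function names and the use of None for an untestable case are assumptions made for illustration.

```python
def has_onset(correct, attempts, min_attempts=4, threshold=0.8):
    """Return True or False for the acquisition criterion,
    or None when there are too few attempts to test."""
    if attempts < min_attempts:
        return None
    return correct / attempts >= threshold


def obeys_markedness(less_marked_present, more_marked_present):
    """A more marked cluster should be present only if the less marked one is."""
    return less_marked_present or not more_marked_present


potential_tests = 5 * 11               # five sets of onsets for each of 11 subjects
testable = potential_tests - 5         # five tests had fewer than four tokens
violations = 4
conforming = testable - violations

print(potential_tests, testable, conforming)
print(round(100 * testable / potential_tests))   # share of tests that could be run
print(round(100 * conforming / testable))        # share that obeyed the predictions
```

A subject who has [br] and [fr] but not [pr] gives obeys_markedness(False, True), which is False; this is the pattern of the Cantonese cases discussed next.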
Of the four cases which ostensibly violated what typological markedness would
predict, two were from Cantonese-speaking subjects, who had acquired the two
clusters [br] and [fr] but not [pr]. Since [p] is less marked than [b] and [f], we would
expect [pr] also to be less marked than [br] and [fr]. Analysis of the actual
errors from these two subjects showed that both of them substituted [ph] onsets for
the intended [pr] onsets. To explain this, Eckman & Iverson (1993) assumed
that, on the basis of similarities in VOT, the two subjects were associating their NL /p/
with the TL /b/, and their NL /ph/ with the TL /p/.
(8) Mapping of the NL obstruents on to the TL.
    NL      TL
    /p/     /b/     Short-lag VOTs.
    /ph/    /p/     Long-lag VOTs.
With this assumption, the subjects’ production would agree with the markedness
prediction because aspirated stops are typologically more marked than
unaspirated stops. Hence, the [ph]-liquid onset is more marked than the [p]-liquid onset
and the [f]-liquid onset. Eckman & Iverson’s explanation raises the question of
whether Cantonese speakers generally have this kind of mapping.
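The mapping in (8) can be pictured as assigning each NL category to the TL category with the closest VOT. The sketch below is only an illustration of that idea: the VOT values are hypothetical round numbers chosen to represent short-lag and long-lag stops, not measurements from Eckman & Iverson (1993) or from any other study, and the function name is likewise an assumption.

```python
# Hypothetical modal VOT values in ms (assumptions for illustration only).
NL_CANTONESE = {"/p/": 15, "/ph/": 75}   # short-lag vs. long-lag
TL_ENGLISH = {"/b/": 10, "/p/": 70}      # short-lag vs. long-lag


def map_by_vot(nl, tl):
    """Map each NL category to the TL category whose VOT is nearest."""
    mapping = {}
    for nl_cat, nl_vot in nl.items():
        mapping[nl_cat] = min(tl, key=lambda tl_cat: abs(tl[tl_cat] - nl_vot))
    return mapping


mapping = map_by_vot(NL_CANTONESE, TL_ENGLISH)
print(mapping)
```

With these assumed values the sketch reproduces the pairing in (8): NL /p/ goes with TL /b/, and NL /ph/ with TL /p/. Whether Cantonese learners actually perceive the categories this way is the empirical question raised above.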
Edge (1991)
This is a replication and extension of Eckman’s (1981) study of the
production of English word-final voiced obstruents by native speakers of Japanese
and Cantonese. In Edge’s (1991) study, data from native speakers of English were
included to account for native devoicing and epenthesis. This was done to avoid
classifying native-like articulation as evidence of IL rules, since devoicing, vowel
epenthesis, and the deletion of final voiced obstruents all characterize spoken English.
Seven Japanese, seven Cantonese and four native speakers of English were the subjects of this study.
The tasks in this study included (1) a picture-elicited storytelling task which
contained words with voiced obstruents, (2) an oral reading of a short story and (3) an
oral reading of 41 randomly ordered words. The voiced obstruents were classified in
the data as target, deletion, glottal stop substitution, devoicing, epenthesis,
fricativization or other consonant substitution. In Eckman’s model, while the
surface phonetic forms are influenced by language-specific processes, the underlying
processes, such as terminal devoicing, are universal. Edge’s data from the Cantonese
speakers provide evidence for an IL rule of terminal devoicing and so support
Eckman’s hypothesis. For the Cantonese subjects, 67% of the non-target variants
were devoiced, and deletion appeared to be more frequent in connected speech.
Compared with deletion in the native English subjects’ data, deletion by the Cantonese
subjects was quite different in its distribution. While deletion of /v/ in function words
(fond of playing) rarely occurred, deletion of final /g/, as in dog, and of /d/ after a
diphthong, as in beside, occurred across phonetic environments. The results of
this experiment indicate that across the three tasks, devoicing was the strategy
most frequently used by the Cantonese speakers. It is also important to take into account
native speech in formulating rules for a language learner’s IL production. After we’ve
loo
Cichocki, et al. (1999)
Cichocki, et al. (1999) studied the acquisition of French consonants by native
speakers of Cantonese in onset and coda positions. The two consonant inventories
differ in several ways. French allows more consonants in both onset and coda
position. The number of consonants differs greatly between the two languages in coda
position, since Cantonese allows only the unreleased stops /p, t, k/ and nasals in the
coda. Cantonese does not have the voicing contrast found in French stops but
does have an aspiration contrast, implemented as voiceless unaspirated versus
voiceless aspirated.
There were 6 subjects in this study, and their level of proficiency in French
was at the upper beginner and lower intermediate levels. The subjects were asked to
read a passage in the first task. For the second task, subjects were given English and
Cantonese translations of the items and were asked to give the French equivalents. The
37 words were expected to be well known. Only five words were unknown to some of
the subjects, and in only three cases were the target words read and repeated after the
fieldworker.
In judging whether a response was acceptable or unacceptable, they followed
several principles. A response was judged acceptable when it was, or contained, a
merely sub-phonemic inaccuracy, even though it contained a wrong nucleus; e.g. [ph]
was treated as acceptable for initial /p/. A response was also judged acceptable when it ended
in a nonnuclear element agreeing with the target phoneme even though it contained a
wrong nucleus, e.g. [sz] for /s/ and [sz] for /z/. Finally, a response was also judged acceptable
when it contained an allophone of the target but ended in a wrong phonetype,
e.g. [p] for /p/.
As we can see from the table below (Figure 17), focusing on the results for
stops, Cantonese speakers had considerable difficulty producing the French initial voiceless
stops /p, t, k/, with accuracy around 50%, even though their native language has
equivalent phone types. They made errors by producing the stops with prevoicing
and sometimes with a schwa-like vowel inserted after the consonant. In their
production of onset /p, t, k/, about 40% of the tokens were voiced [b, d, g], 35% were
voiceless aspirated [ph, th, kh], and only about 20% were voiceless unaspirated [p, t, k].
This contradicts the MDH, because these French stops have Cantonese counterparts
and one might expect them to be easily learned. In coda position, the results of this
experiment are as expected: Cantonese speakers had more difficulty with voiced stops
than with voiceless stops. The voiced stops were nearly always devoiced in final position.
As Figure 18 shows, of all the errors made in the production of stops, 95%
involved the presence or absence of the voice feature.
Figure 17. Cichocki, et al. (1999)
To account for the difficulties with French onset stops in Cantonese speakers’
production, Cichocki, et al. (1999) suggested that we could look at the patterns of
difficulty found in first language acquisition, which show that voiceless initial stops
are more difficult than voiced initial stops (Ingram, 1978). Cichocki, et al. also
noted that one of the problems in this study is that all the subjects were learning
French as a second foreign language, because English is taught in all Hong Kong
schools and is the medium of instruction in many. The possibility of interference from
English cannot be neglected when we look at the data obtained in this study. My
prediction is that English speakers would not have this trouble because English
speakers have the voicing contrast in their L1. Cantonese speakers may have difficulties
contrasting voiced stops and voiceless unaspirated stops.
Figure 18. Cichocki, et al. (1999)
7. Discussion
Based on a comparison and contrast of the major differences between the
English and Cantonese phonological systems in this article, we have examined some
difficulties that Cantonese speakers may have when learning English pronunciation. It
is argued that most of the Cantonese ESL learners’ difficulties with English
pronunciation may be accounted for by reference to fundamental differences between
the phoneme inventories of the two languages, the characteristics and distribution of
the phonemes and the permissible syllable structures of the two languages in question.
In this section, we are going to look at differences between the acquisition of stops in
onset and coda position, and at the different repair strategies that are used under different
circumstances.
Onset vs. Coda
In the data from Cantonese speakers of English collected by Eckman (1981),
Cantonese speakers exhibit a voice contrast in word-initial, -medial and -final position.
However, devoicing occurred in some voiced stops in coda position but not in onset or
word-medial position. Although voiced stops are absent from the L1 phonology,
Cantonese speakers seem to have no difficulty with onset voiced stops. Since coda is a
more marked position than onset, we would expect learners to have more
difficulties in coda position. Similar to the English and Spanish speakers in Flege &
Eefting’s (1988) study, Cantonese speakers judge tokens of [p, t, k] in their L1 and
tokens of [b, d, g] to be realizations of the same phonetic categories in coda
position, even though they can auditorily detect the acoustic differences between
corresponding L1 and L2 stops.
The results of the study by Cichocki, et al. (1999) show that Cantonese speakers
had fewer problems in the production of onset voiced stops in the acquisition of
French. This happened only in onset, not in coda position. Since voiced
stops are more marked than voiceless stops, this is not what we would expect from the
prediction of the MDH. Comparable to the results in Eckman (1981), subjects in this study
also had more difficulties with coda voiced stops. Apart from the fact
that voiceless initial stops are more difficult than voiced initial stops in L1
acquisition studies, the reason why French voiceless onsets are difficult for
Cantonese speakers to acquire may also be due to the perception of the voicing contrast.
Cantonese subjects may have a wrong realization in time of the phonological units
(phonemes) that distinguish words. Voiced stops in French are easier for
Cantonese speakers to distinguish, as Flege (1987) stated that, all other things being equal, we actually
learn L2 sounds which are dissimilar to the sounds in our L1 more easily than their
less dissimilar counterparts.
Repair strategies
In terms of the kind of repair strategies that Cantonese speakers will choose in
the acquisition of English voiced stops, we need to look at proficiency, formality and
the grammatical and functional aspects of the speech. In Abrahamsson’s (2003)
study, the data show that coda deletion is low in the initial phase of development; it
increases during the early phase and decreases during later phases. The
proportion of epenthesis to deletion increases over time, which means that the use
of epenthesis is relatively low at the early stage and increases later on in L2
development. The error rate increases because fluency also increases
considerably with higher L2 proficiency. Fluent speech is characterized by more focus
on content and less focus on form, and so the increase in deletion and epenthesis
is found in the early phase of L2 development. Another factor that affects an
individual L2 learner’s use of epenthesis versus deletion is the phenomenon of
avoiding ambiguity and facilitating recoverability. As suggested by Lin (2001), it
appears that the epenthesis-deletion distribution for consonant clusters correlates
positively with increased formality of the speech task, such that epenthesis is
frequently employed in formal tasks (e.g., word-list or minimal-pair reading) but less
frequently in less formal tasks (e.g., sentence, text, and story reading or natural
conversation), where deletion is the dominant simplification strategy. In addition,
one aspect of recoverability from the context is whether the coda is a crucial part of a
noninflected lexical form or whether it is part of an inflectional morpheme. It can be
argued that the reduction of lexical forms generally increases lexical ambiguity, and this
might particularly be the case for content words. In contrast, the information
expressed by inflectional morphemes is usually redundantly expressed by other
formal markers or otherwise predictable from the context, and it might be argued that
inflectional information is more easily recoverable from the context than the
underlying form of a reduced lexical stem. It is therefore more likely that word-final codas that
are part of a lexical stem will be pronounced more accurately than word-final codas
that are part of an inflectional morpheme.
References:
Abrahamsson, N. (2003). Development and recoverability of L2 codas: A longitudinal study of Chinese/Swedish interphonology. Studies in Second Language Acquisition, 25:3, 313-349.
Abramson, A. & Lisker, L. (1970). Discriminability along the voicing continuum: cross-language tests. In Hala, B., Romportl, M. and Janota, P., editors, Proceedings of the Sixth International Congress of Phonetic Sciences. Prague: Academia, 569–573.
Blumstein, S., Cooper, W., Goodglass, H., Statlender, S., & Gottlieb, J. (1980). Production deficits in aphasia: A voice-onset time analysis. Brain and Language 9, 153–170.
Chan, A.Y.M. & Li, D.C.S. (2000). “English and Cantonese phonology in contrast: explaining Cantonese ESL learners’ English pronunciation problems”. Language, Culture and Curriculum, 13, 67-85.
Cichocki, W., House, A.B., Kinloch, A.M. & Lister, A.C. (1999). “Cantonese speakers and the acquisition of French consonants”. Language Learning, 49, 95-121.
Curtin, S., Goad, H. & Pater, J. (1998). “Phonological transfer and levels of representation: the perceptual acquisition of Thai voice and aspiration by English and French.” Second Language Research 14, 4. 389-405.
Eckman, F. (1981). “On predicting phonological difficulty in second language acquisition.” Studies in Second Language Acquisition 4: 18-30.
Eckman, F. & Iverson, G. (1993). “Sonority and markedness among onset clusters in the interlanguage of ESL learners”. Second Language Research 9, 3. 234-252.
Eckman, F. & Iverson, G. (1994). 'Pronunciation difficulties in ESL: coda consonants in English interlanguage.' In M. Yavas (ed.), First and Second Language Phonology. San Diego: Singular Publishing Company. 251-265.
Edge, B.A. (1991). ‘The production of word-final voiced obstruents in English by L1 speakers of Japanese and Cantonese’. Studies in Second Language Acquisition, 13, 377-393.
Eimas, P.D., Siqueland, E.R., Jusczyk, P.W., & Vigorito, J. (1971). Speech perception in infants. Science 171: 303-306.
Ethnologue. Website of the Summer Institute of Linguistics. http://www.ethnologue.com/ (Jan, 2004)
Flege, J. (1987). 'The production of "new" and "similar" phones in a foreign language: Evidence for the effect of equivalence classification.' Journal of Phonetics 15: 47-65.
Flege, J.E. & Eefting, W. (1988). "Imitation of a VOT continuum by native speakers of English and Spanish: Evidence for phonetic category formation". Journal of the Acoustical Society of America 83: 729-740.
Hansen, J. (2001). “Linguistic constraints on the acquisition of English syllable codas by native speakers of Mandarin Chinese”. Applied Linguistics, 22, 338-365.
Kess, J. F. (1992). Psycholinguistics: Psychology, Linguistics, and the Study of Natural Language. Amsterdam: John Benjamins Publishers BV.
Lado, R. (1957). Linguistics across cultures. Ann Arbor: University of Michigan Press.
Lin, Y. H. (2001). “Syllable simplification strategies-A stylistic perspective”. Language Learning 51:4, 681-718.
Morton, K. (1995). Kate Morton's Image Resource. http://www.essex.ac.uk/speech/material/kate/k-images.html
O'Grady, W., Dobrovolsky, M. and Aronoff, M. (1989). Contemporary Linguistics. New York: St. Martin's Press.
Pisoni, D., and Tash, J. (1974). Reaction times to comparisons within and across phonetic categories. Perception and Psychophysics 15(2), 285-290.
Pisoni, D., Aslin, R., Perey, A. and Hennessy, B. (1982). Some effects of laboratory training on identification and discrimination of voicing contrasts in stop consonants. Journal of Experimental Psychology: Human Perception and Performance 8, 297–314.
Radwanska-Williams, J. & Yam, J.P.S. (2001). “The acquisition of English plosives by Chinese learners”. In Phonetics Teaching & Learning Conference 2001.
Russell, K. (1997). Narrower transcriptions of English: Aspiration (and Voice Onset Time). http://www.umanitoba.ca/linguistics/russell/138/2001/notes.html
Schmidt, R. (1987). "Sociolinguistic variation and language transfer in phonology." In G. Ioup & S.H. Weinberger (Eds.), Interlanguage phonology, 365-377. Rowley, MA: Newbury House Publishers.
Strange, W. (1972). The effects of training on the perception of synthetic speech sounds: voice onset time. Doctoral dissertation, University of Minnesota.
Tsui, I. Y. H., & Ciocca, V. (2000). “The perception of aspiration and place of articulation of Cantonese initial stops by normal and sensorineural hearing-impaired listeners”. The International Journal of Language and Communication Disorders, 35, 507-525.
Wertz, R. R. (2003). Geographical Database: Map of Guangdong Province. http://www.ibiblio.org/chinesehistory/images/atlas/provincial/guangdong.html