The Phonetician
A Publication of ISPhS/International Society of Phonetic Sciences
Historic larynx models from Franz Wethlo
Number 101/102 2010 – I / II
ISPhS
International Society of Phonetic Sciences
President: Ruth Huntley Bahr
Secretary General: Mária Gósy
Honorary President: Harry Hollien
Vice Presidents: Angelika Braun, Marie Dohalská-Zichová, Mária Gósy, Damir Horga, Eric Keller, Heinrich Kelz, Stephen Lambacher, Asher Laufer, Judith Rosenhouse
Past Presidents: Jens-Peter Köster, Harry Hollien, William A. Sakow †, Martin Kloster-Jensen, Milan Romportl †, Bertil Malmberg †, Eberhard Zwirner †, Daniel Jones †
Honorary Vice Presidents:
A. Abramson P. Janota A. Marchal M. Rossi R. Weiss
S. Agrawal W. Jassem H. Morioka M. Shirt
L. Bondarko M. Kloster-Jensen R. Nasr E. Stock
E. Emerit M. Kohno T. Nikolayeva M. Tatham
G. Fant E.-M. Krech R. K. Potapova F. Weingartner
Auditor: Angelika Braun
Treasurer: Ruth Huntley Bahr
Affiliated Members (Associations):
American Association of Phonetic Sciences
Dutch Society of Phonetics: B. Schouten
International Association for Forensic Phonetics and Acoustics: P. French
Phonetic Society of Japan: I. Oshima & K. Maekawa
Polish Phonetics Association: G. Demenko
Affiliated Members (Institutes and Companies):
KayPENTAX, Lincoln Park, NJ, USA: J. Crump
Inst. for Advanced Study of the Communication Processes, University of Florida, USA: H. Hollien
Dept. of Phonetics, University of Trier, Germany: A. Braun
Dept. of Phonetics, University of Helsinki, Finland: A. Iivonen
Dept. of Phonetics, University of Zürich, Switzerland: S. Schmid
Centre of Poetics and Phonetics, University of Geneva, Switzerland: S. Vater
International Society of Phonetic Sciences (ISPhS) Addresses
www.isphs.org
President: Secretary General:
Professor Ruth Huntley Bahr, Ph.D. Prof. Dr. Mária Gósy
President's Office: Secretary General's Office:
University of South Florida Kempelen Farkas Speech Research Laboratory
Dept. of Communication Sciences & Disorders Hungarian Academy of Sciences
4202 E. Fowler Ave., PCD 1017 Benczúr u. 33
Tampa, FL 33620-8200 H-1068 Budapest
USA Hungary
Tel.: ++1-813-974-3182 ++36 (1) 321-4830 ext. 172
Fax: ++1-813-974-0822 ++36 (1) 322-9297
e-mail:rbahr@ usf.edu e-mail: [email protected]
Guest Editor: Book Review Editor:
Dr. Jürgen Trouvain Prof. Judith Rosenhouse, Ph.D.
FR 4.7 Computational Linguistics Swantech
and Phonetics 89 Hagalil St
Saarland University Haifa 32684
Campus C7.2 Israel
D-66041 Saarbrücken Tel.: ++972-4-8235546
Germany Fax: ++972-4-8235546
Tel.: +49 (681) 302 4694 e-mail: [email protected]
Fax: +49 (681) 302-4684
Email: [email protected]
FROM THE PRESIDENT
I hope that you are enjoying the new format of the Phonetician.
The ability to include color photographs and graphs makes the
text come alive. I am grateful to the individuals who volunteer to
edit an issue. Prof./Dr. Mária Gósy is doing an excellent job of
recruiting editors; however we would welcome you to volunteer
to edit an issue. The Phonetician would be an excellent way to
showcase your area of phonetics and your institute. We all
benefit from hearing about each other’s work. So, please consider editing an issue
for us. A quick email to me or Prof./Dr. Gósy and we will help you get started and
guide you through the process.
Many thanks go out to Dr. Jürgen Trouvain for editing the current issue. My
favorite thing about this issue is the variety of topics covered. The research articles
range from a description of long term formant distributions in read and
spontaneous speech to throat singing. There is a good article on the acoustic-
phonetic collection in Dresden, as well as an article on a lesser studied language,
Lower Sorbian. Finally, we have an article in French dealing with prosody. There
is definitely something for everyone. We would love to hear your comments on
the recent issues of the Phonetician and its new online format.
FROM THE EDITOR
After various guest editorships, this double issue of the
Phonetician comes from Saarbrücken. It brings together
different research contributions which reflect a large range of
the phonetic sciences: from the acoustics of individual speaker
characteristics to the physiology of throat singing, from the
collection of historical phonetic instruments via the acquisition
of a corpus of an endangered language to an experimental study at the syntax-
prosody interface. In addition to the research articles, the reader finds conference
reports, the presentation of phonetic institutes, book reviews, as well as obituaries.
My warm thanks go to all contributors of this issue. I would like to express
my gratitude to all colleagues who supported me as a guest editor, be it as a
reviewer or in another form.
Jürgen Trouvain
Saarbrücken, March, 2012
The Phonetician
A Publication of ISPhS/International Society of Phonetic Sciences
ISSN 0741-6164
Numbers 101/102 / 2010-I/II
Contents
From the President
From the Editor
Articles and Research Notes
Long-term formant distribution as a measure of speaker characteristics in read and spontaneous speech
by Anja Moos
On the Physiology of Voice Production in South-Siberian Throat singing – Extended Abstract
by Sven Grawunder
The Historical Phonetic-Acoustic Collection of the TU Dresden
by Rüdiger Hoffmann & Dieter Mehnert
GENIE: The Corpus for Spoken Lower Sorbian (GEsprochenes NIEdersorbisch)
by Roland Marti, Bistra Andreeva & William J. Barry
Adjectif épithète et attribut de l’objet. Qu’en est-il de la prosodie?
by Denis Ramasse
Obituaries
Eli Fischer-Jørgensen (1911-2010)
by Jack Windsor Lewis
Eva Sivertsen (1922-2010)
by Jack Windsor Lewis
Gösta Bruce (1947-2010)
by Merle Horne
Ilse Lehiste (1922-2010)
by Viola Váradi
Awards
Svend Smith Award 2008 for Elisabeth Lhote
by Jens-Peter Köster
Phonetic Institutes Present Themselves
The Department of Language and Communication Studies at Norwegian University of Science and Technology, Trondheim, Norway
by Jacques Koreman
Phonetics Lab and the Phonogram Archives at Zurich University, Switzerland
by Volker Dellwo & Dieter Studer
Conference Reports
Speech Prosody 2010 Chicago (USA)
by Stefan Baumann
19th Annual Conference of the IAFPA 2010 Trier (Germany)
by Peter Knopp
New Sounds 2010 – 6th International Symposium of the Acquisition of Second Language Speech Poznań (Poland)
by Matthias Jilka
Book Reviews
Steve Parker (ed) 2009. Phonological Argumentation. Essays on Evidence and Motivation.
reviewed by Péter Siptár
Géza Németh & Gábor Olaszy (eds.) 2010. A magyar beszéd. Beszédkutatás, beszédtechnológia, beszédinformációs rendszerek [Hungarian Speech. Speech research, speech technology, speech information systems]
reviewed by Péter Siptár
Halicki, Shannon D. 2010. Learner Knowledge of Target Phonotactics: Judgements of French Word Transformations.
reviewed by Chantal Paboudjian
Meetings, Conferences and Workshops
Call for Papers
Instruction for Book Reviewers
ISPhS Membership Application Form
News on Dues
LONG-TERM FORMANT DISTRIBUTION AS A MEASURE OF
SPEAKER CHARACTERISTICS IN READ AND SPONTANEOUS
SPEECH
Anja Moos
GULP (Glasgow University Laboratory of Phonetics) and School of
Psychology, University of Glasgow, UK
e-mail: [email protected]
Abstract
The simple method of averaging the formant values of a speaker's recording, known as
Long-Term Formant Distribution (LTF), is applied here to German speech in the context
of forensic speaker identification. Introduced by Nolan and Grigoras (2005), the
advantage of LTF is that it is not necessary to categorize and label each vowel produced.
Instead, for each speaker, the formants of all vocalic portions are averaged, thus leading
to one mean value per formant. The volume of speech data necessary to attain reliable
LTF values is also examined.
LTF values of 71 German speaking males in spontaneous and read speech
recorded via mobile phone connections were analysed. Good speaker characterisation is
possible using the LTF values of F2 and F3; LTF values of F3 seem slightly more useful
because they are less variable within speakers than those of F2. Comparison of spontaneous and read
speech revealed significant differences between the LTF values of F2 and F3 of the two
speaking styles. The LTF values of formants of read speech are higher. As LTF values
only return the average and standard deviation of formants, they are not suitable for
speaker recognition on their own. However, LTF is independent of many other measures
of a speaker, such as speaking rate, dialect, and fundamental frequency. Therefore, LTF
values can be used as an additional independent factor in speaker recognition.
Keywords
Long-term formant distribution, LTF, read vs. spontaneous speech, mobile phone
recordings, speaker comparison
Definition of LTF
Long-Term Formant Distribution (LTF) is a method used to determine average formant
values of a speaker. For each formant, all formant measurements of all vowels produced
by a speaker are averaged (across the entire recording or appropriate sub-portions of a
recording). This average is the LTF value for this formant. That means that every speaker
has one LTF value and a standard deviation (SD) per formant which shall be called
LTF1, LTF2 and so on. It is a frame-by-frame measurement, meaning that long vowels
carry more weight than short vowels.
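The frame-by-frame weighting described in this definition can be illustrated with a minimal Python sketch. The two F2 tracks below are hypothetical values chosen for illustration, not data from this study; the frame interval of 10 ms follows the analysis settings reported in section 2.4, and the use of the population standard deviation is an assumption, since the text does not specify which SD formula was applied.

```python
from statistics import mean, pstdev


def ltf(frames_hz):
    """Return (mean, SD) over all frame-level measurements of one formant."""
    # Assumption: population SD; the paper does not state which formula it used.
    return mean(frames_hz), pstdev(frames_hz)


# Hypothetical F2 tracks in Hz, one value per 10 ms analysis frame.
long_vowel = [2100.0] * 20   # a 200 ms vowel
short_vowel = [900.0] * 5    # a 50 ms vowel

# LTF pools all frames, so the long vowel contributes four times as many values.
ltf2_mean, ltf2_sd = ltf(long_vowel + short_vowel)

# For contrast: averaging one value per vowel ignores vowel duration.
per_vowel_mean = mean([mean(long_vowel), mean(short_vowel)])

print(f"frame-weighted LTF2: {ltf2_mean:.0f} Hz (SD {ltf2_sd:.0f} Hz)")
print(f"unweighted per-vowel mean: {per_vowel_mean:.0f} Hz")
```

In this toy case the pooled LTF2 lies closer to the long vowel than the per-vowel mean does, which is the sense in which long vowels carry more weight.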
1. Introduction
To identify a speaker by his or her phonetic speaker characteristics, various
acoustic and auditory measures are taken into account. According to Jessen
(2007), auditory measures such as estimation of age, health, sex, dialect and
sociolect mostly refer to group characteristics, whereas fundamental frequency,
articulation rate, formants and voice quality, which are often measured both
acoustically and aurally, are more speaker specific. This paper focuses on
formants as the importance of and interest in formant measures for forensic cases
grows. Many studies in the last decade have shown that formants carry speaker-
specific information and that their analysis is also possible under forensic
conditions, i.e. given poor quality and bandpass filtering due to phone recordings (see
Rose, 2006; Nolan, 2002; Byrne & Foulkes, 2004). This paper follows Nolan &
Grigoras (2005) who state:
“It is argued here that formants, whose frequencies and dynamics are
the product of the interaction of an individual vocal tract with the
idiosyncratic articulatory gestures needed to achieve linguistically
agreed targets, are so central to speaker identity that they must play a
pivotal role in speaker identification.” (Nolan & Grigoras, 2005: 143)
Of course formants of different people are not unique; but when combined with
other speaker characteristics listed above, they may lead to a very idiosyncratic
speaker description. Each additional independent feature can help to identify a
speaker.
The most commonly used method for formant measures in forensic phonetics
to date is the centre frequency of different vowels (cf. Jessen, 2008; Rose, 2002).
Here, formants are measured at the midpoint which is defined as the articulatory
target of the vowel produced. Usually one tries to find a number of representatives
of a couple of different vowels, mostly /i a o/, to compare their formant values
from the suspect with those of the perpetrator. Comparison of vowels in speech
can be problematic using this method as the context influences the formants. It
might also be difficult to define vowel phonemes in general or their centre
frequency in particular when dealing with a foreign language and/or poor
recording quality. Although it is an accurate method, it is very time consuming.
Another method is the study of formant dynamics. McDougall did this for
/aI/ (McDougall, 2004) and /u/ (McDougall & Nolan, 2007). They found within-
speaker consistency and between-speaker differences in the data and argued that
more attention should be paid to the development of techniques to measure
dynamic features (McDougall, 2006). However, the effects of the vowel context on
this method are still unknown, and further research is necessary. Long-term spectra
(LTS) are also used to show formant average distributions (see e.g. Nolan &
Grigoras, 2005; Hollien, 1990). An LTS is the average of all spectral slices of a
sound sample. As well as voiced speech, LTS takes everything else in the signal
into account, including voiceless portions of speech, background noise etc.
Long-Term Formant Distribution (LTF) was developed by Nolan and
Grigoras (2005) in order to address the flaws of the single vowel phoneme
measures and LTS. This method does not require a categorization of vowels;
instead, every vowel is used for the measurements. It is also less time consuming
to select all vowels by reading the spectrogram rather than carefully listening to
the file repeatedly to detect single vowel phonemes. In addition to saving time,
being easy to use and suitable for foreign languages, Nolan & Grigoras (2005)
mention two more benefits. First, the distribution of the formants not only reflects
the dimensions of the vocal tract but also shows habits in articulatory settings like
palatalization or lip rounding. Second, the shape of the distribution of a formant
might show useful information about the speaker insofar as a broad-peaked or
narrow-peaked distribution might reflect the speaker's vowel space. The
disadvantages of LTF are that inter-individual differences on single vowels cannot
be detected and speech dynamics like transitions and coarticulation are lost.
The work of Nolan & Grigoras (2005) showed the benefits, usefulness and
efficiency of the LTF method on an English forensic case. This study will show its
applicability to German and also provide information on the following aspects:
● Testing for correlation of LTF values with the fundamental frequency,
articulation rate, and dialect groups. If they correlate, it is not necessary to use
LTF in addition because no further information is gained. If they do not
correlate, LTF can be used as an independent measure that adds further
information to the characterisation or discrimination.
● Determination of how many seconds of vocalic stream or of speech
recordings are needed to derive reliable LTF values. This is an essential issue
for forensic case work because voice recordings are often limited in duration.
● Comparison of different speaking styles (read and spontaneous speech). It is
important to know whether, and to what degree, recordings of the same voice
differ in their LTF values between speaking styles so that it can be determined
whether spontaneous speech of a perpetrator can be reliably compared with
read speech of a suspect.
● Creation of a reference database for German LTF values comprising 71
speakers. This will be useful for future use in Bayesian methods like the
likelihood ratio (see Jessen, 2008; Morrison, 2009; Rose, 2002 for usage of
likelihood ratio in forensic speaker comparison).
2. Methods
2.1. Data
Recordings of the speech corpus “Pool2010” (Jessen et al., 2005) were used. From
this German corpus, recordings of 71 male participants who read out the German
version of “North wind and the sun” were used for this experiment. For
spontaneous speech, participants were asked to describe objects to another person
without using predefined words, similar to the game “Taboo”. The person
guessing the object played ignorant to encourage the speaker to describe the items
more extensively, thereby triggering longer stretches of spontaneous speech. All
the recordings were made in high studio quality and later played back through
speakers and re-recorded through mobile phones to have data close to forensic
case data. The mobile phone data was used for this experiment. The recordings of spontaneous speech were 79-313 seconds long (M=178 seconds).
Recordings of the read story were 31-54 seconds long (M= 39 seconds). For the LTF
analysis, recordings were cut in a way described in section “2.3 Data preparation” below,
so that only vowels remained. After that, the vocalic stream of spontaneous speech was
12-83 seconds (M= 40 seconds), and the vocalic stream of read speech was 8-16 seconds
(M= 12 seconds). In total, 142 sound files were used (71 speakers × 2 speaking styles).
2.2. Speakers
Recordings of 71 male German speakers were used. Speakers were 25 to 55 years
of age (M= 38 years). Roughly half of them had recognizable but generally weak
dialectal features of Hessian German (‘Hessisch’); 45 of the participants were
actually from that area. The remaining participants were from other parts of
Germany. None of the speakers had heavy dialectal features, and everyone had an
average or above average educational background. No noticeable speech or voice
disorders were present. Speaker IDs ranged from 35-107 (excluding 61 because of
lack of data); speakers will be referred to later in the text by their IDs.
2.3. Data preparation
For LTF, only the vocalic stream is used (i.e., every recording was cut in such a
way that only vowel sounds remained). WaveSurfer (Sjölander & Beskow, 2005)
was used for the cutting procedure. The selection process was based on several
criteria:
● Clear and visible formant structure of the first three formants (intensity
settings were sometimes increased to find F3, especially for back vowels
which tend to have a higher spectral tilt)
● Laterals and approximants were kept¹
● Filled pauses and hesitations were kept if vocalic
● Creaky voice was kept if vocalic
● No nasals or strong nasality (because of zero formants at 2-3 kHz)
● No vowels spoken with a very high pitch so that harmonics rather than
formants were visible
This procedure resulted in sound files of pure vocalic stream, without any pauses
or consonants other than those stated above. These criteria were applied while
reading the spectrogram and deleting all unwanted regions. When it was unclear
whether nasality was present or not from reading the spectrogram, additional
auditory judgements were made.
2.4. Data analysis
The cut sound files were used for the formant measurements of F1, F2 and F3 with
WaveSurfer. The automatic formant tracking was set to four formants, an LPC
order of 12, a frame interval of 0.01 seconds and a nominal F1 of 500 Hz.
Recordings were down-sampled to 10 kHz. Usually the bandwidth of telephone
recordings (roughly 300 Hz to 3-4 kHz) does not display F4 correctly or at all
because of the upper cut-off frequency. However, without a fourth dummy
formant, the tracking of F2 and F3 was often found to be unreliable, so it was kept.
¹ Because the formant structure of laterals and approximants is very similar to vowels, they were kept. This
does not distort the data but saves working hours if no auditory inspection is needed after visual selection of
vocalic stream.
Every file was manually checked and corrected if necessary. This correction
was needed because, due to the cutting procedure, samples could contain jumps
(e.g., from /i/ to /u/) without the usual formant transition. The prediction algorithm
would find such unnatural jumps quite problematic.
3. Results
3.1. General results for LTF
Figure 1 presents individual LTF2 and LTF3 values for every speaker in
spontaneous and read speech, with speakers ordered by ascending spontaneous LTF2
(Figure 1a) and by ascending spontaneous LTF3 (Figure 1b). LTF1 is not shown as it is
too error prone due to the lower cut-off frequency in mobile phone transmission.
Byrne & Foulkes (2004) showed that F1 on average shifts 29 % in mobile phone
recordings compared to direct high fidelity recordings. Table 1 lists the average
LTF values for every formant and speaking style averaged across all speakers.
Both figures and the table show that LTF values are higher for read than for
spontaneous speech. A t-test for paired samples showed that this difference is
significant for all formants: t=-6.016, p<0.0001 for LTF1; t=-11.449, p<0.0001 for
LTF2; t=-6.917, p<0.0001 for LTF3. Regarding the within speaker comparison,
Figure 1a shows that hardly any LTF2 in read speech was lower than for
spontaneous speech. Only very few LTF3 values for read speech were lower than
for spontaneous speech, as Figure 1b displays.
[Figure 1, panels (a) Spontaneous LTF2 ascending and (b) Spontaneous LTF3 ascending. y-axis: Hz (1200 to 2700); x-axis: speaker; series: F3_read, F3_spont, F2_read, F2_spont.]
Figure 1. LTF2 and LTF3 of every speaker in read and spontaneous speech. Speakers
ordered by ascending LTF values of spontaneous speech.
Table 1. LTF values in Hz for spontaneous and read speech and their standard deviations
(SD) averaged across all speakers.
       F1_spont  F1_read  F2_spont  F2_read  F3_spont  F3_read
LTF    470       484      1400      1463     2378      2422
SD     24        21       79        70       128       125
3.2. Between-speaker comparison
Speaker-specific features can be identified in the distribution of LTF values, as
well as their mean value. Figure 2 shows the distribution of F2 and F3 for two
speakers with very different LTF values at the top, and two different speakers with
very similar values at the bottom. As the top graph shows, speakers not only
differed in their LTF mean value (with up to 500 Hz difference), but also in the
distribution. While the distribution of speaker 44 is more platykurtic (broad peak),
the distribution of speaker 95 is more leptokurtic (narrow peak). As the bottom
graph shows, both speakers have a double peak distribution for F3, but their main
peaks lie 250 Hz apart while having very similar F2 distributions. While this is not
a very distinctive feature, it would still raise some doubt whether these two
distributions are from the same speaker or not.
Figure 2. F2 and F3 distributions of two speakers producing spontaneous speech in
comparison. Top: Clearly distinguishable formant distributions of speaker 44 and 95.
Bottom: Similar formant distributions of speaker 35 and 66.
In comparison, Figure 3 (top) shows the distribution of F2 and F3 for speaker
44 only, with the recording of his spontaneous speech divided into two halves. The
same was done for speaker 35 in Figure 3 (bottom). For speaker 44, the
distributions of F2 and F3 are very similar in the two parts of his spontaneous
speech; however, there is a peak shift of 125 Hz for F3. No differences of the
distributions of F2 and F3 were found for speaker 35, indicating no within-speaker
differences for spontaneous speech.
[Figure 2 plots. x-axis: Hz (600 to 3350); y-axis in %. Top: series 44 F2, 44 F3, 95 F2, 95 F3. Bottom: series 35 F2, 35 F3, 66 F2, 66 F3.]
Figure 3. F2 and F3 distributions of speaker 44 (top) and 35 (bottom) producing
spontaneous speech; first half of recorded speech in black, second half in grey.
To sum up, it can be very useful to look at the distribution of F2 and F3 for
speakers with very similar LTF means because their distributions can be manifold:
They can be single vs. double peaked and/or lepto- vs. platykurtic, and these
distribution shapes seem to be stable within speakers but can vary between
speakers.
3.3. Within-speaker comparison
3.3.1. Effect of speaking style on mean LTF
Recordings of the perpetrator are sometimes compared to recordings of the suspect
reading what has been said by the perpetrator during the crime. For this reason it is
important to know whether, and to what degree, recordings of the same voice
differ in their LTF values between spontaneous and read speech. To determine
whether spontaneous speech of the perpetrator can be reliably compared with read
speech of the suspect, LTF values within speakers were analysed across speaking
styles.
Table 2 shows the results of a t-test for paired samples. LTF values of spontaneous
and read speech were paired for every speaker. A negative mean value indicates
that spontaneous speech has lower values than the read speech. This is the case for
all three formants. The mean difference is given in Hz, so LTF2 of spontaneous
speech is 62.21 Hz lower than that of read speech and is the formant with the
largest difference between speaking styles. Given a mean difference of -62.21 Hz,
the standard deviation (here 45.78 Hz) indicates that the LTF2 difference of 68 %
of the speakers lies between -107.99 and -16.43 Hz. These numbers were derived
like this:
(1) -62.21 – 45.78 = -107.99
(2) -62.21 + 45.78 = -16.43
A positive upper bound indicates that some speakers within this typical 68 % have a
higher LTF in spontaneous speech. This is the case for LTF3, where the range of one SD
around the mean difference of read and spontaneous speech extends from -97.60 to +9.60
Hz. All the differences are highly significant (p<0.0001). As already mentioned
LTF1 should not be taken as a reliable measure because of the lower cut-off
frequency of the mobile phone transmission. LTF3 seems most reliable to use for
speaker identification because it shows less difference between speaking styles
and is less influenced by the mobile phone bandwidth than LTF1.
Table 2. t-test for paired samples. Pairs: LTF values of read and spontaneous speech of
every speaker. SD = standard deviation, SE = standard error. All T’s significant with
p<0.0001.
        paired differences
        mean     SD      SE     T         df
LTF1    -14.08   19.72   2.34   -6.016    70
LTF2    -62.21   45.78   5.43   -11.449   70
LTF3    -44.00   53.60   6.36   -6.917    70
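The columns of Table 2 are linked by the standard paired t-test relations SE = SD / sqrt(n) and T = mean / SE, and the range covering roughly 68 % of speakers is mean ± SD. The following Python sketch recomputes these quantities from the mean and SD columns only, as a consistency check; the function and variable names are illustrative, and n = 71 is taken from the 70 degrees of freedom. Small deviations from the printed T values are expected because the table entries are rounded.

```python
from math import sqrt

N_SPEAKERS = 71  # df = 70 in Table 2

# (mean difference, SD) in Hz, copied from Table 2.
table2 = {
    "LTF1": (-14.08, 19.72),
    "LTF2": (-62.21, 45.78),
    "LTF3": (-44.00, 53.60),
}


def paired_summary(mean_diff, sd, n=N_SPEAKERS):
    """Derive SE, t and the mean +/- 1 SD range from a paired-difference summary."""
    se = sd / sqrt(n)
    return {
        "se": se,
        "t": mean_diff / se,
        "lower": mean_diff - sd,
        "upper": mean_diff + sd,
    }


summaries = {name: paired_summary(m, s) for name, (m, s) in table2.items()}

for name, s in summaries.items():
    print(
        f"{name}: SE = {s['se']:.2f}, t = {s['t']:.3f}, "
        f"1-SD range = {s['lower']:.2f} to {s['upper']:.2f} Hz"
    )
```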
Despite the fact that LTF values differ significantly across speaking styles
they can still correlate strongly. In this case, a stable difference in their values can
be assumed. To find correlations, a Pearson product-moment-correlation for
interval-scaled data was conducted between all LTF values across all speakers to
look at relationships of formant specific LTF values. Table 3 shows the
statistically significant correlations between LTF values. Correlations between
LTF values of the same formant across the two speaking styles were stronger
(indicated in bold print) than correlations within the same speaking style across
different formants. The strongest correlation was for LTF3 which is known to be
the most stable formant within a speaker.
Table 3. Pearson product-moment-correlation of all LTF values. All r values are
significant with p<0.01. n.s. = not significant. * = same formant across the two
speaking styles (bold in the original).
           LTF2spon  LTF3spon  LTF1read  LTF2read  LTF3read
LTF1spon   0.395     n.s.      0.615*    0.370     n.s.
LTF2spon   1         0.514     0.400     0.819*    0.484
LTF3spon   0.514     1         n.s.      0.502     0.910*
LTF1read   0.400     n.s.      1         0.377     n.s.
LTF2read   0.819*    0.502     0.377     1         0.575
These correlations were made using the data of all the speakers.
Correlations of individuals may vary, so these r values can only be used as guide
values. Combining the results of the t-test and the Pearson correlation, it was
found that there is a stable difference in LTF insofar that read speech produces
mostly higher LTF values than spontaneous speech.
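That a significant mean difference and a strong correlation can coexist is illustrated by the following Python sketch. The five LTF3 values and the shifts are hypothetical numbers invented for the illustration, not data from this study; `pearson_r` is a plain implementation of the product-moment formula.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical LTF3 values (Hz) of five speakers in read speech.
ltf3_read = [2300.0, 2380.0, 2420.0, 2500.0, 2560.0]
# Hypothetical, roughly constant downward shifts to spontaneous speech.
shifts = [-50.0, -40.0, -45.0, -35.0, -50.0]
ltf3_spont = [value + shift for value, shift in zip(ltf3_read, shifts)]

r = pearson_r(ltf3_read, ltf3_spont)
mean_shift = mean(s - v for s, v in zip(ltf3_spont, ltf3_read))

print(f"r = {r:.3f}, mean shift = {mean_shift:.1f} Hz")
```

Because every hypothetical speaker shifts by a similar amount, the rank order of the speakers is preserved and r stays close to 1 even though the two styles differ systematically.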
The scatter plot in Figure 4 shows the downshift of LTF2 and LTF3 from
read to spontaneous speech in the F2-F3-vowel space. The LTF values of every
speaker are connected with a grey arrow indicating the direction of change from
read (red circle) to spontaneous (blue x) speech. The general trend leads to a lower
LTF2 and LTF3, but some speakers also show upward shifts or downward shifts
of only one formant; only three speakers show an upward shift of both LTF2 and
LTF3.
Figure 4. Scatter plot of LTF2 and LTF3 of all speakers. Circle = read speech, x
= spontaneous speech. Values of every speaker connected through grey arrows.
3.3.2. Effect of speaking style on formant distribution
When investigating the distributions of read and spontaneous speech within one
speaker, it is not only interesting to see the differences between the LTF means but
also the distributions. Are they similar apart from a little upward shift? No clear
answer can be given, as shown in Figure 5. While speaker 35 has very different F3
distributions across speaking styles, the F3 distribution of speaker 100 is nearly
identical. This raises problems discussed earlier in section “3.2 Between-speaker
comparison”. When the mean is similar but the distribution different, it is still not
clear whether the samples are from different speakers or whether the same speaker
is using different speaking styles.
Figure 5. F2 and F3 distributions of spontaneous and read speech in comparison.
Spontaneous speech in black, read speech in grey. F2 solid line, F3 dashed line. Top:
Speaker 35. Bottom: Speaker 100.
3.4. Amount of data necessary for LTF
One of the most important questions regarding LTF values for forensic phonetics
use is: How much speech data is necessary to get reliable LTF measurements? The
amount of data is crucial because an LTF value is only meaningful if enough
(different) vowels are used. In a sample of 2 seconds of pure vocalic stream, for
example, it might well be that only /e/, /a/ and /ə/ are present and this would skew
the data towards the open front side of the vowel quadrilateral and therefore not
represent the vowel space of a speaker.
Because in most forensic cases there will not be extensive recordings to
extract many seconds of vocalic stream, it is necessary to find out whether short
recordings are sufficient. For this, each LTF sound file was divided into packages.
Each package represents a short sound file of one speaker. If the LTF values of the
packages (of one sound file) do not differ much from each other, it is assumed that
this size is sufficient to get reliable LTF data. The difference between packages
was detected by calculating the standard deviation between packages.
These calculations were made with various package sizes to detect the
threshold package size (because the package size is an approximation of the length
of vocalic stream needed to get reliable LTF values). Every sound file was divided
into packages of 1, 1.5, 2, 2.5 … 10 sec. The average number of packages per size
per speaker is listed in Table 4. Within each package size, the LTF package values
were taken, and a standard deviation was determined for every speaker separately
and for every package size. As the package size increases, the number of packages
naturally decreases, so the standard deviation might be influenced by size and,
therefore, the number of packages. On the other hand, in bigger packages there is
much more variation within a package, so LTF values do not differ much any
more and not many packages are needed to get a stable SD.
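The package procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script used in the study: the formant track is a hypothetical 12-second sequence that cycles through three vowel qualities, an incomplete trailing package is discarded, and the sample standard deviation is used for the spread between package means. The frame interval of 0.01 seconds is the one reported in section 2.4.

```python
from statistics import mean, stdev

FRAME_S = 0.01  # frame interval from section 2.4


def package_sd(frames_hz, package_s, frame_s=FRAME_S):
    """SD between the LTF values of consecutive packages of one sound file."""
    per_package = round(package_s / frame_s)
    # Assumption: an incomplete trailing package is dropped.
    n_packages = len(frames_hz) // per_package
    if n_packages < 2:
        return None  # an SD needs at least two packages
    package_ltf = [
        mean(frames_hz[i * per_package:(i + 1) * per_package])
        for i in range(n_packages)
    ]
    return stdev(package_ltf)


# Hypothetical F3 track: 1 s each at 2300, 2400 and 2500 Hz, repeated four times.
one_cycle = [2300.0] * 100 + [2400.0] * 100 + [2500.0] * 100
track = one_cycle * 4  # 12 s of vocalic stream

for size in (1.0, 2.0, 3.0, 6.0, 10.0):
    print(f"package size {size} s: SD = {package_sd(track, size)}")
```

In this constructed track the SD between packages shrinks as the package size grows and vanishes once a package spans a full cycle of vowel qualities, which mirrors the reasoning that larger packages contain more internal variation and therefore yield more similar LTF values.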
Table 4. Average number of packages per speaker used to calculate standard deviations.
Top: spontaneous speech. Bottom: read speech. Package size in seconds.
Spontaneous speech
package size         1     1.5   2     2.5   3     3.5   4     4.5   5     5.5
number of packages   39.9  26.5  19.6  15.7  13.0  11.0  9.6   8.5   7.6   6.8
package size         6     6.5   7     7.5   8     8.5   9     9.5   10
number of packages   6.3   5.8   5.3   5.0   4.7   4.3   4.1   3.9   3.8
Read speech
package size         1     1.5   2     2.5   3     3.5   4     4.5   5     5.5
number of packages   11.5  7.5   5.5   4.4   3.6   3.0   2.6   2.1   2.1   2.0
package size         6     6.5   7     7.5   8
number of packages   2.0   2.0   2.0   2.0   2.0
If the standard deviation asymptotically approaches a constant, increasing the
package size further changes little, and it can be assumed that the amount of data of this
package size is enough to get reliable LTF values. Figure 6 shows the course of
the SD curves across the different package sizes. The x-axis lists the package sizes
and the y-axis the SD. It is shown that for both read and spontaneous speech, the
SD was smallest for LTF1 and largest for LTF2. LTF1 is not very meaningful
because the lower cut-off frequency of mobile phone transmission shifts the
formant values in unpredictable ways and amounts, mostly upwards. LTF3 has a
smaller SD than LTF2 and is regarded as being more speaker specific (see Rose, 2002, p.
237; Ladefoged, 2001, p. 194). It is therefore best to work with LTF3. For
spontaneous speech, LTF3 seems to become stable at a package size of 6 seconds
(see Figure 6a), which equals about 27 seconds of spontaneous speech dialogue
recording. Note that the curve does not seem to have reached its
asymptote; nonetheless, its course changes very little beyond this point.
Figure 6. Standard deviation of package sizes averaged across all speakers. It can be
assumed that enough data is collected to get reliable LTF values when the curve reaches
an asymptotical level.
[Figure 6a (spontaneous speech, package sizes 1 to 10 sec) and Figure 6b (read speech, package sizes 1 to 8 sec). x-axis: package size (sec); y-axis: standard deviation (Hz); curves: F1, F2, F3.]
For read speech, the LTF3 threshold is difficult to detect. It might be at 5
seconds, equal to about 16 seconds of read speech recording. Empty symbols were
used in Figure 6 for read speech at a package size of 7 seconds or larger because
they were only based on 6, 4 and 1 speaker(s) respectively. The other speakers
produced passages too short to be divided into packages larger than 6.5 seconds.
As the reading passage was not very long, very few speakers produced vocalic
data of that length and it cannot be assumed that the data represented by the empty
symbols act in a typical way.
For LTF2 a package size of 4.5 seconds seems to give reliable LTF data in
read speech (equivalent to about 14.5 seconds of read speech recording). For
spontaneous speech, the threshold is also difficult to detect. The safest choice is
the package size of 9 seconds (equivalent to about 40 seconds of spontaneous
dialogue recording) but 5.5 seconds (≈25 sec) seems to be a justifiable choice as
well.
In sum, LTF values of speech samples with at least 6 seconds of pure vocalic
stream can be considered reliable. This estimation is based on the average
behaviour of all speakers. There can sometimes be large variation between
speakers as to the threshold of sufficient LTF data (see Moos, 2008, Figure 3.15).
4. Discussion
In this study, LTF has been shown to be a valuable measure for speech
comparison and can aid in speaker identification. Some speakers had very similar
LTF values, but the distribution of the formants may vary, resulting in leptokurtic,
platykurtic or double-peaked curve shapes. Other speakers had easily
distinguishable distributions with clearly distinct means. Within-speaker
comparisons of speaking style revealed that read speech had significantly higher
LTF values than spontaneous speech. It is unclear whether this increase reflects a
shift or an expansion of the vowel quadrilateral. Hyper-articulation in read speech
would explain an expansion of the vowel space (an expansion would also result in
an upward shift of LTF because front and open vowels are used more often than
close back vowels in German, see Simpson, 1998). But, as the SD remained
constant (see Table 1), a simple upward shift rather than an expansion is assumed
(for an expansion the SD increases as well). Despite the shift, LTF values within
formants correlated strongly across speaking styles. The curve distribution within
speakers across speaking styles can also vary in different ways but generally does
not show drastic changes and shifts.
LTF is a measure of speaker characterisation that is independent of f0,
dialect and speech rate; Moos (2008) showed no correlation between LTF and
these measures using a dataset common to both studies. One aspect that could not
be covered in this article is the correlation between LTF and the physiognomy of
the speakers (e.g., body height). Several studies found weak negative correlations
between body height and formant measures (Greisbach, 1999 for German;
Gonzalez, 2004 for Spanish; Rendall et al., 2005 for Canadian English, but only
for males). The same was found by Jessen (2010) using the same data the current
study is based on. The size of the vocal tract might be a mediator of these
correlations. Although no firm conclusions can be drawn from weak correlations,
it is somewhat less likely that someone with high formants will be tall or that someone
with low formants will be small.
Before working with LTF measures, it is very important to know whether
one has a sufficient amount of data. Because LTF is an average of all vowels
produced in a speech sample, short samples are not suitable for this measurement.
By dividing the given samples into smaller packages, it was estimated that roughly
6 seconds of pure vocalic stream (equivalent to 27 seconds of dialogue or 19
seconds of read speech) are, on average, enough to produce reliable LTF values.
An important aim of this work was to create a reference database for LTF to
work towards probability statements using likelihood ratios (LR). In court,
evidence has to be weighed, and probabilities have to be given in a
strength-of-evidence statement. How similar or different are the LTF values of two voice
samples, and how typical are they, i.e., do many people in the population have
those LTF values? (See Jessen, 2008; Morrison, 2009; Rose, 2002 on LR
in forensic speaker comparison.) To be able to give a strength-of-evidence
statement (i.e., to be able to say how much more likely the observed LTF values
are if they come from the same speaker than if they come from different speakers), the creation of a reference database is
essential. If there are, for example, two very similar LTF values of a suspect and
the perpetrator, it does not necessarily mean that they are from the same speaker;
if the LTF values are very typical in the population, there is relatively more
evidence that they are from different speakers than if they are very atypical (e.g.,
very low or high). An LTF database was constructed from 71 German speakers
producing read and spontaneous speech recorded through mobile phone
transmission as part of this work. This database, which enables such likelihood
ratio statements, is more extensively described in Moos (2008).
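The interplay of similarity and typicality can be illustrated with a deliberately simplified likelihood ratio. The sketch below is not the procedure of this study or of Moos (2008). It assumes a single LTF value per sample, normally distributed LTF values, and known population parameters; the function names and all numbers are hypothetical. The numerator expresses similarity (how probable the offender value is if it comes from the suspect), the denominator typicality (how probable it is for a random member of the reference population).

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(suspect_ltf, offender_ltf, pop_mean, between_sd, within_sd):
    """Simplified univariate LR for one LTF value (all inputs in Hz).

    between_sd -- assumed SD of speaker means in the reference population
    within_sd  -- assumed SD of one speaker's LTF value across recordings
    """
    similarity = normal_pdf(offender_ltf, suspect_ltf, within_sd)
    typicality = normal_pdf(offender_ltf, pop_mean, math.hypot(between_sd, within_sd))
    return similarity / typicality

# Hypothetical LTF3 values: the same 10 Hz difference, once near the
# population mean (typical) and once far above it (atypical).
lr_typical = likelihood_ratio(2500.0, 2510.0, 2500.0, 100.0, 30.0)
lr_atypical = likelihood_ratio(2800.0, 2810.0, 2500.0, 100.0, 30.0)
```

With these made-up numbers the atypical pair yields the larger ratio, which mirrors the argument above: equally similar values count as stronger same-speaker evidence when few people in the population have them.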
Future work should compare the mobile phone data with the high-fidelity
recordings that also exist for the speech material used here.
Another interesting investigation would be to explore the influence telephone
bandwidth has on LTF values. The results could then be compared with those of
Byrne & Foulkes (2004) with the advantage that the same speech data was used
for both hi-fi and mobile phone qualities. Comparisons across different languages
should also be made to investigate whether LTF measures of recordings of one
person speaking different languages can be reliably compared. A further important
test concerns the reliability of LTF measures across different phoneticians taking
the measures. Will every expert include and exclude the same vocalic portions and
hence produce the same data for analysis? The same question can be applied to
different formant tracking algorithms used in different programmes like Praat,
WaveSurfer, Emu, etc. Statistical measures to evaluate the amount of LTF data
necessary to be reliable would improve the validity of the prediction. Research is
currently being undertaken to answer many of these questions and will hopefully
give insight into these neglected areas of LTF research.
References
Byrne, C. & Foulkes, P. (2004). The 'mobile phone effect' on vowel formants.
International Journal of Speech, Language and the Law 11(1), pp. 83-102.
Greisbach, R. (1999). Estimation of speaker height from formant frequencies.
Forensic Linguistics 6(2), pp. 265-277.
Gonzalez, J. (2004). Formant frequencies and body size of speaker: A weak
relationship in adult humans. Journal of Phonetics 32(2), pp. 277-287.
Hollien, H. (1990). The Acoustics of Crime: The New Science of Forensic
Phonetics. New York: Plenum Press.
Jessen, M. (2007). Speaker classification in forensic phonetics and acoustics. In:
C. Mueller (ed): Speaker Classification I, pp. 180-204. New York, Berlin:
Springer.
Jessen, M. (2008). Forensic phonetics. Language and Linguistics Compass 2(4),
pp. 671-711.
Jessen, M. (2010). The forensic phonetician. Forensic speaker identification by
experts. In: M. Coulthard & A. Johnson (eds): The Routledge Handbook of
Forensic Linguistics, pp. 378-394. London, New York: Routledge.
Jessen, M., Köster, O. & Gfroerer, S. (2005). Influence of vocal effort on average
and variability of fundamental frequency. International Journal of Speech,
Language and the Law 12(2), pp. 174-213.
Ladefoged, P. (2001). A Course in Phonetics. USA: Heinle & Heinle.
McDougall, K. (2004). Speaker-specific formant dynamics: An experiment on
Australian English /ai/. International Journal of Speech, Language and the
Law 11(1), pp. 103-130.
McDougall, K. (2006). Dynamic features of speech and the characterization of
speakers: Towards a new approach using formant frequencies. International
Journal of Speech, Language and the Law 13(1), pp. 89-126.
McDougall, K. & Nolan, F. (2007). Discrimination of speakers using the formant
dynamics of /u:/ in British English. Proceedings of the 16th International
Congress of Phonetic Sciences, Saarbrücken, Germany, pp. 1825-1828.
Moos, A. (2008). Forensische Sprechererkennung mit der Messmethode LTF
(long-term formant distribution). Unpublished Master thesis (Magister-
arbeit), Saarbrücken, Universität des Saarlandes.
http://www.psy.gla.ac.uk/docs/download.php?type=PUBLS&id=1286
(accessed 17/08/2010).
Morrison, G. (2009). Forensic voice comparison and the paradigm shift. Science &
Justice 49(4), pp. 298-308.
Nolan, F. (2002). The 'telephone effect' on formants: A response. Forensic
Linguistics 9(1), pp. 74-82.
Nolan, F. & Grigoras, C. (2005). A case for formant analysis in forensic speaker
identification. International Journal of Speech, Language and the Law
12(2), pp. 143-173.
Rendall, D., Kollias, S., Ney, C. & Lloyd, P. (2005). Pitch (F0) and formant
profiles of human vowels and vowel-like baboon grunts: The role of
vocalizer body size and voice-acoustic allometry. The Journal of the
Acoustical Society of America 117(2), pp. 944-955.
Rose, P. (2002). Forensic Speaker Identification. London: Taylor & Francis.
Rose, P. (2006). Technical forensic speaker recognition: Evaluation, types and
testing of evidence. Computer, Speech and Language 20, pp. 159-191.
Simpson, A. (1998). Phonetische Datenbanken des Deutschen in der empirischen
Sprachforschung und der phonologischen Theoriebildung.
Habilitationsschrift, Christian-Albrechts-Universität zu Kiel.
Sjölander, K. & Beskow, J. (2005). WaveSurfer 1.8.5, Stockholm, KTH Royal
Institute of Technology. Software available online:
http://www.speech.kth.se/wavesurfer/index.html (accessed 06/10/2007).
ON THE PHYSIOLOGY OF VOICE PRODUCTION IN SOUTH-
SIBERIAN THROAT SINGING – EXTENDED ABSTRACT
Sven Grawunder
Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
e-mail: [email protected]
This paper is an extended abstract of a PhD project that was finished in 2005 and
published as a book (Grawunder, 2009). The project represents the first field-work
based phonetic study of the extraordinary voice production mechanisms that occur
in throat singing.
Throat singing (ThS) is practiced in four areas in South Siberia: the Republic
of Tuva, the Republic of Hakassia and the Republic of Gorno-Altai (all parts of
the Russian Federation), and adjacent Mongolia. ThS is a defined genre among and
intertwined with other oral folk-arts and singing types, and it is distinct from
Western overtone singing. Like Western overtone singing, South-Siberian ThS
uses reinforced harmonics as carriers of sung melodies and enforced phonation
modes. However, unlike in Western overtone singing, such targeted use of
harmonics appears to be common but not essential to ThS in the conceptions of
singers (cf. van Tongeren, 2002, Grawunder, 2003b). There are two (sometimes
three) main styles with regard to voice use, as discussed by singers and
ethnomusicologists (cf. Kyrgys, 2002): first, a tensed medial (chest-) register
voice and second, a raspy growling low register voice. Often these voice registers
are referred to in the literature with the Tuvan style names khöömei and kargyraa,
respectively.
The laryngoscopic evidence, which includes a small-scale endoscopic study of
one singer (the author) and adds to the few available articulatory studies of
throat singers (e.g. Dmitriev et al., 1983, Edgerton, 2005, Grawunder, 2003a,
2003b, Lindestad et al., 2001, 2004, Sakakibara et al., 2002), suggests that
throat singers make use of three voice production mechanisms. All mechanisms share an
excessive constriction of the larynx entrance resulting, at various levels, in an
approximation of the aryepiglottic folds and the epiglottis. Therefore the study
focuses on phonation types which result, in addition to the normal activity of the
vocal folds (VF), from various combinations of phonation activities involving the
aryepiglottic sphincter chain (AES), the ventricular folds (VTF) and sometimes
even the aryepiglottic folds (AEF).
Two main types are therefore proposed for voice production in South-
Siberian throat singing: a voice production by means of the vocal folds featuring a
constriction of the AES (Phonation Mode 1, henceforth PM1), and a voice
production with involvement of the ventricular folds (Phonation Mode 2,
henceforth PM2). VTF involvement in PM2 appears as a double cyclic period,
with the vocal folds vibrating twice as fast as the ventricular folds; every second
cycle consists of a (near-) synchronous closure of VF and VTF (cf. Bailly et al.,
2010). A third proposed mechanism for PM2 is the involvement of the AEF (cf.
Sakakibara et al., 2004), similar to epiglottic trill (Esling et al., 2007). On the one hand,
the mechanisms of the constriction of VTFs and AEFs are discussed with respect
to histoanatomical findings of muscular tissue that facilitates the medial
compression of the VTFs (Kotby et al., 1991; Reidenbach, 1998b) as well as with
respect to the findings of muscular and ligamentous components for an AEF-
sphincter framework (Reidenbach, 1998a) that takes part in the anterior-posterior
constriction by means of the AES. On the other hand, the constriction mechanisms
are discussed with respect to the anterior-posterior constriction that is often found
in professional singing (Yanagisawa et al., 1989; Koufman et al., 1996; Stager et
al., 2003). Finally, the occurrence of these structures in linguistically relevant
sound patterns (cf. Esling et al., 2007; Edmondson & Esling, 2006) emphasizes the
significance of these phonation modes to general phonetic research.
The typical oro-pharyngeal configurations in ThS are described in a rough
scheme of at least three (overtone) articulation techniques (denoted here as
articulation types, AT) that are generally also found in overtone singing (cf.
Edgerton, 2005; Neuschaefer-Rube et al., 2001; Saus, 2004; Trân, 1991): an [l]-
like articulation of the tongue tip (AT1), an [n]-, [ŋ]-, [i]- or [u]-like articulation of
the tongue dorsum (AT2), and a mid-low vowel articulation of different heights of
front and back vowels (AT3) including larger jaw movement than in the other two
ATs. The three main ATs can be easily linked with techniques that are commonly
associated with particular (here Tuvan) styles (AT1 – sygyt, AT2 – khöömei, AT3
– kargyraa). Although all combinations of PMs and ATs are found to be used in
ThS, PM1 is mainly combined with AT1 and AT2 whereas PM2 is mainly
combined with AT3. Further ‘articulatory substyles,’ such as ezeŋgileer AT2 or
AT1 with a strong nasal (AT4) component or birlaŋnaadyr AT1/AT2 with strong
labial component (AT5), were considered but have been excluded from the
analysis to a great extent.
AT1 and AT2 display the highest prominence of the single ‘melodic’
harmonic, measured as amplitude difference to the previous and next harmonic
(12-14dB). However, the bandwidths for AT1 tend to be wider since here F2 and
F3 usually merge. Besides a general explorative investigation of the properties of
the phonation modes and articulation types, it was essential that the project
investigate possible areal patterns, i.e. differences between the four groups of
singers with regard to their origin in Southern Central Siberia. Questions
addressing the areal typology of traditional music performance (especially
singing) have gained more attention recently (see Blench & Dendo, 2006). In
particular cases, the analyses of specific ThS styles may help to retrieve parts of
the unwritten demographic history of the area, including population contact.
Figure 1: Sample sequence of the Tuvan Singer Oleg Kuular, starting out with
PM1AT3 and PM1AT2 (khöömei) switching to PM1AT1 (sygyt) and proceeding
further with PM2AT3 (kargyraa); the third tier contains the reinforced harmonics
for AT1/AT2 and vowel qualities for AT3
The current study comprises data from 69 male singers. The material
was in part collected during fieldwork in South Siberia in the years 1999 to 2002,
where 25 singers were recorded by use of a specific field setting for acoustic (Vx),
electroglottographic (Lx) and subglottal resonance (Sx) signal acquisition. For the
latter, an approach by Neumann et al. (2003) was adopted, which makes use of a
signal acquired with a small condenser microphone placed in contact with the skin
of the singer’s jugular notch, i.e. the dip at the superior border of the sternum,
between the clavicular notches. The cricothyroid ligament (ligamentum conicum),
which is palpable as a small depression below the Adam’s apple, was recently
suggested by Wokurek and Madsack (2009) as an alternative measurement point for
subglottal resonance. Supplementary recordings from 44 available professional
music recordings and field recordings of other researchers were added to the
acoustic analysis.
The results of the perturbation measures for acoustic signals show a dominance
of individual variability over areal (cultural) factors. As one would expect, there is a
strong influence of the articulatory strategy. Nonetheless, there are some
parameters, such as APQ11, the amplitude perturbation quotient (i.e. shimmer)
over 11 cycles, which seem to allow areal grouping, e.g. with Mongolian singers,
who show the highest values for PM1. However, the articulatory reinforcement
strategy interacted strongly with the phonation mode and showed the highest
amplitude perturbation values for AT1 (see Fig. 2). Another clear areal group
tendency is observable for Hakas singers with a clear preference for lower F0 in
PM1 (median values: 110Hz for AT2/AT3, 160Hz for AT1) and PM2 (60Hz for
AT3). PM2 samples of Hakas singers also show higher harmonics-to-noise ratio
(HNR) values and lower mean spectral slopes in all three bands investigated (0-
2kHz; 2-5kHz; 5-8kHz).
Figure 2. Areal tendencies represented by the median (full circle within the box)
for the acoustic shimmer (APQ11) measures of 44 singers
For perturbation measures of Lx-signals of the double-cyclic phonation
(PM2), the perturbation parameters have been adapted so that every second cycle
could be taken into account (see bottom channel in Figure 3). This reveals a very
stable vibratory pattern, unlike similarly labeled pathological patterns (cf. Fuks et
al., 1998).
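One possible reading of this adaptation is sketched below: instead of comparing each cycle with its immediate successor, each cycle is compared with the cycle two positions later, so that the regular long-short alternation of the double cycle is not counted as perturbation. This is an assumption about the procedure, not a description of the software used in the study; the function name period_perturbation and its stride parameter are hypothetical.

```python
def period_perturbation(periods, stride=1):
    """Mean absolute period difference at a given stride, relative to the mean period.

    periods -- successive cycle durations, e.g. from Lx closure instants (ms)
    stride  -- 1 compares adjacent cycles (ordinary local jitter),
               2 compares every second cycle, as assumed here for PM2
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    if len(periods) <= stride:
        raise ValueError("not enough cycles for this stride")
    diffs = [abs(a - b) for a, b in zip(periods, periods[stride:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period

# Hypothetical double-cycle pattern: strictly alternating 8 ms and 9 ms cycles.
double_cycle = [8.0, 9.0] * 10
adjacent = period_perturbation(double_cycle, stride=1)
every_second = period_perturbation(double_cycle, stride=2)
```

For the strictly alternating example, the adjacent-cycle measure is large while the every-second-cycle measure is zero, which is the sense in which a double-cyclic pattern can be very stable.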
Figure 3. Double cycle phonation mode (kargyraa) sequence of a three-channel
recording (VxLxSx) of the Tuvan singer SI
For the ultra-structure of the Lx, besides a schematic description of the cycle
shape, the applicable phase quotients (closed, closing, open quotient) and
symmetry indicators (speed quotient, contact index) were analyzed. Based on the
values of the closed quotient and closing quotient for PM1 (AES-VF), the
impression of a tensed (sometimes pressed) voice seems to be justified. In AT1,
the subglottal wave is fully dominated by reinforced harmonic formants (usually
F2).
For the low PM2, there was only one singer for whom the involvement of
an AEF-VF phonation type seemed reasonably certain. A controlled imitation of
AEF-VF phonation by the author was added. For both singers, the lack of a peak
around 3 kHz in the long-term-average spectra (cf. spectrogram in Figure 1)
comes into question. However, the noise-to-harmonics ratio value, that is the ratio
of nonharmonic energy (frequency range: 70Hz – 4500Hz) in the spectrum, which
is therefore taken as an indicator of higher frequency noise, was not particularly
noticeable. The majority of the investigated singers seem to use a phonation type
of a double-cycle ventricular fold/vocal fold oscillation (VTF-VF). Based on the
synchronous analysis of Vx, Lx and inverse filtered Vx it can be concluded that
the main vocal tract excitation occurs with the closure phase of the ‘pure’ vocal
fold cycle (cf. Henrich et al., 2006; Bailly et al., 2010). One cycle, presumably the
VF-cycle, showed short closing phases and higher symmetry indicators. Then the
vibration of the VFs triggers the VTF vibration at F0/2. However, in terms of
cycle-to-cycle amplitude difference, the subcorpus of Vx-Lx signals contains
examples with exactly the reverse patterns: the Vx excitation instant aligns well
with the higher closing peak in Lx but more frequently with the lower peak (see
Figure 3). It also remains uncertain to what degree the subglottal wave is able to
support one of the two cycles. For the one case where all three channels were
successfully recorded, the subglottal sound pressure maximum seemed to precede
the supraglottal peak, appearing right at the end of the opening phase.
Overall, the acquired data support a model of reinforcement of harmonics
by four different means (cf. Edgerton, 2005). First, there is voice source variation
(shortened closing phase, with increased excitation strength presumably via
increased subglottal pressure, while air flow remains constant or lowered for the
tensed mode (PM1); and double cycle modes involving mass bodies of
supralaryngeal structures for the low mode (PM2) enabling fundamentals at half of
VF-F0). Second, a specific formant adjustment for F2 comes into play that results,
for some articulatory strategies, in formant merging (F1/F2 or F2/F3) due to
multiple vocal tract constrictions (e.g. the sublaminal cavity; Engstrand et al.,
2007), including a coupling of source and adjacent epi- and supralaryngeal rooms
(of approx. 1/6 vocal-tract length; cf. Titze & Story, 1997). Third, a specific
bandwidth tuning results partially from adjustment of lip radiation and partially
from a stiffness of the articulators. Finally, the fourth mechanism of reinforcing
harmonics is the aryepiglottic sphincter which facilitates F1 and F0 damping, a
mechanism that is used individually to a very different extent.
[A short audio sample from the Tuvan singer Ayas Danzyrin, recorded by author in 2000,
can be found here: http://www.eva.mpg.de/~grawunde/otsths/phdxtdabs.html]
References
Bailly, L., Henrich, N. & Pelorson, X. (2010). Vocal fold and ventricular fold
vibration in period-doubling phonation: Physiological description and
aerodynamic modeling. Journal of the Acoustical Society of America 127(5),
pp. 3212–3222.
Blench, R. & Dendo, M. (2006). Musical instruments and musical practice as
markers of the Austronesian expansion post-Taiwan. Paper presented at the
18th Congress of the Indo-Pacific Prehistory Association, University of the
Philippines, Manila, 20–26 March 2006. Retrieved 2011-04-01 from
http://www.rogerblench.info/Ethnomusicology%20data/Papers/Asia/General/Roger%20Blench%20AN%20music%20II%20paper%20submit.pdf
Dmitriev, L. B., Chernov, B. P. & Maslov, V. T. (1983). Functioning of the voice
mechanism in double-voice touvinian singing. Folia Phoniatrica 35(5),
pp.193–197.
Edgerton, M. (2005). The 21st-century voice: contemporary and traditional extra-
normal voice. The New Instrumentation (Vol. 9). Scarecrow, Lanham (ML),
Toronto, Oxford.
Edmondson, J. & Esling, J. (2006). The valves of the throat and their functioning
in tone, vocal register, and stress: laryngoscopic case studies. Phonology 23,
pp.157–191.
Engstrand, O., Frid, J. & Lindblom, B. (2007). A perceptual bridge between
coronal and dorsal /r/. In Solé, M.-J., Beddor, P. S. & Ohala, M.,(eds),
Experimental approaches to phonology, pp 175–191. Oxford University
Press.
Esling, J. H., Zeroual, C. & Crevier-Buchman, L. (2007). A study of muscular
synergies at the glottal, ventricular and aryepiglottic levels. Proc. of the 16th
ICPhS, Saarbrücken, pp. 585-588.
Fuks, L., Hammarberg, B. & Sundberg, J. (1998). A self-sustained vocal-
ventricular phonation mode: acoustical, aerodynamic and glottographic
evidences. TMH-QPSR 3/1998, pp. 49–59.
Grawunder, S. (2003a). Comparison of voice production types of ’western’
overtone singing and south siberian throat singing. Proc. of the 15th ICPhS,
Barcelona., pp. 1699–1702.
Grawunder, S. (2003b). Der südsibirische Kehlgesang als Gegenstand
phonetischer Untersuchungen. In: Krech, E.-M. & Stock, E. (eds)
Gegenstandsauffassung und aktuelle Forschungen der halleschen
Sprechwissenschaft (Hallesche Schriften zur Sprechwissenschaft und
Phonetik vol. 10), pp 53–91. Peter Lang, Frankfurt am Main.
Grawunder, S. (2009). On the Physiology of Voice Production in South-Siberian
Throat Singing - Analysis of Acoustic and Electrophysiological Evidences.
Frank & Timme, Berlin.
Kotby, M. N., Kirchner, J. A., Kahane, J. C., Basiouny, S. E. & el Samaa, M.
(1991). Histo-anatomical structure of the human laryngeal ventricle. Acta
Otolaryngol, 111(2), pp. 396–402.
Koufman, J. A., Radomski, T. A., Joharji, G. M., Russell, G. B., & Pillsbury, D.
C. (1996). Laryngeal biomechanics of the singing voice. Otolaryngol Head
Neck Surg,115(6), pp.527–537.
Kyrgys, Z. K. (2002). Tuvinskoe gorlovoe penie - etnomuzikovečeskoe
issledovanie [Tuvan Throat Singing - ethnomusicological studies]. Nauka,
Novosibirsk.
Lindestad, P. A., Sødersten, M., Merker, B. & Granqvist, S. (2001). Voice source
characteristics in mongolian “throat singing” studied with high-speed
imaging technique, acoustic spectra, and inverse filtering. Journal of Voice
15(1), pp.78–85.
Lindestad, P. A., Blixt, V., Pahlberg-Olsson, J. & Hammarberg, B. (2004).
Ventricular fold vibration in voice production: a high-speed imaging study
with kymographic, acoustic and perceptual analyses of a voice patient and a
vocally healthy subject. Logoped Phoniatr Vocol 29(4), pp. 162–70.
Neumann, K., Gall, V., Schutte, H. K. & Miller, D. G. (2003). A new method to
record subglottal pressure waves: potential applications. Journal of Voice
17(2), pp.140–59.
Neuschaefer-Rube, C., Saus, W., Matern, G., Kob, M. & Klajman, S. (2001).
Sono-graphische und endoskopische Untersuchungen beim Obertonsingen.
In: Geissner, H. (ed) Stimmkulturen – 3. Stuttgarter Stimmtage 2000, pp.
219–222. Röhrig Universitätsverlag, St. Ingbert.
Reidenbach, M. M. (1998a). Aryepiglottic fold: normal topography and clinical
implications. Clin Anat 11(4), pp. 223–35.
Reidenbach, M. M. (1998b). The muscular tissue of the vestibular folds of the
larynx. Eur Arch Otorhinolaryngol 255(7), pp.365–7.
Sakakibara, K.-I., Kimura, M., Imagawa, H., Niimi, S. & Tayama, N. (2004).
Physiological study of the supraglottal structure. In ICVPB 2004, Marseille.
Stager, S. V., Neubert, R., Miller, S., Regnell, J. R. & Bielamowicz, S. A. (2003).
Incidence of supraglottic activity in males and females: a preliminary report.
Journal of Voice 17(3), pp. 395–402.
Saus, W. (2004). Oberton singen – Das Geheimnis einer magischen Stimmkunst.
Traumzeit-Verlag, Schönau, Odenwald.
Titze, I. R. & Story, B. H. (1997). Acoustic interactions of the voice source with
the lower vocal tract. Journal of the Acoustical Society of America 101(4),
pp. 2234–2243.
Trân, Q. H. (1991). New experiments about the overtone singing style. Bulletin d’
Audio-phonologie. Ann. Sc. Univ. Franche-Comté, Vol. VII (N◦5&6), pp.
607–618.
van Tongeren, M. (2002). Overtone Singing - physics and metaphysics of
harmonics in east and west. The Harmonic Series (vol. 1). Fusica,
Amsterdam.
Wokurek, W., & Madsack, A. (2009). Comparison of manual and automated
estimates of subglottal resonances, Proc. Interspeech, Brighton, pp. 1671-
1674.
Yanagisawa, E., Estill, J., Kmucha, S. T. & Leder, S. B. (1989). The contribution
of aryepiglottic constriction to “ringing” voice quality - a videolaryngoscopic
study with acoustic analysis. Journal of Voice 3(4), pp. 342–350.
THE HISTORIC ACOUSTIC-PHONETIC COLLECTION
OF THE TU DRESDEN
Rüdiger Hoffmann, Dieter Mehnert
Technische Universität Dresden, Institut für Akustik und
Sprachkommunikation
email: [email protected], [email protected]
1 Introduction
At the beginning of the last century, the growing interest in foreign cultures and
languages led to a rapid development in experimental phonetics. In Germany,
Rousselot’s student, Panconcelli-Calzia, introduced experimental phonetics as a
scientific discipline in Hamburg, as did Gutzmann and Wethlo in Berlin. With the
development of electronic computing in the middle of the century, the interest in
human hearing and speaking was extended to machines, and the field of speech
technology, with the main topics of speech recognition and synthesis, started to be
investigated. We can thus look back on well over a century of fascinating
development in experimental phonetics and speech technology, which can be
illustrated by numerous material objects from phonetic and acoustic
laboratories. The Dresden University of Technology, which was one of the
pioneering institutions in German speech technology, hosts a collection of such
objects, called Historic Acoustic-phonetic Collection (HAPS). HAPS was formally
founded more than one decade ago, in 1999, but its roots go back to very
renowned German institutes of the past. This paper describes the history and the
recent activities of this university-owned collection.
2 History of the Collection
2.1 Forming a Collection in Speech Technology
Information Technology at the TU Dresden goes back to Heinrich Barkhausen
(1881–1956), the “father of the electron valve,” who taught from 1911 to 1953. He
was also interested in psychoacoustics and invented the first measurement device
for loudness. Speech research in a narrower sense started with the development of
a vocoder in the 1950s. Walter Tscheschner (1927–2004, Figure 1) started his
extensive investigations on the speech signal using components of this vocoder.
Figure 1. Walter Tscheschner (right), pioneer of speech
technology in Dresden, with the founder of the Institute of
Telecommunication of the former TH Dresden, Kurt
Freitag. Photograph from about 1960.
In 1969, a scientific unit for Communication and Measurement was founded
in Dresden. It was the main root of the present Institute of Acoustics and Speech
Communication. W. Tscheschner was appointed Professor of Speech Communication
and started research in speech synthesis and recognition.
A number of representative devices for speech synthesis and recognition
have been developed in Dresden. Over six decades, they formed a historic
collection, which demonstrates how speech technology was developed depending
on the technological base, starting with electronic valves and ending with
embedded devices [1].
2.2 Expanding the Collection towards Experimental Phonetics
At the Berlin University, phonetics was established as an institution out of two
disciplines: linguistics and medicine. The linguistic root was formed by the
Phonographic Commission, founded in 1915, which set out to record the
voices of speakers representing foreign peoples on wax cylinders or records. This
institution developed in several stages into the Institute of Sound Research at
Berlin University. In 1951, the institute was renamed Institute of Phonetics.
The second root of phonetics at the Berlin University is represented by
Hermann Gutzmann sen. (1865-1922), who worked as a voice and speech
pathologist. Gutzmann, who made speech therapy part of the university’s
curriculum, collected all the new instruments and research devices that had been
used since 1900 by the emerging discipline of experimental phonetics. It was on
Gutzmann’s initiative that the first Phonetics Laboratory was founded in Berlin. In
1926, the Phonetics Laboratory became an independent institution under the
direction of Franz Wethlo (1866–1960). Wethlo received the teaching assignment
for Experimental Phonetics in 1926, which gave him the opportunity to extend the
laboratory and to purchase new equipment. He developed numerous pieces of
equipment. After the re-opening of Berlin University in 1947, the Phonetics
Laboratory became part of the Institute for Special Education in 1950, which had
just been founded.
More details and a description of how the two roots came together can be
found in [2] and [3]. After the restructuring following German reunification in
1990, phonetics was organized under the roof of the School of Rehabilitation
Sciences. As a result of the higher education reform at the three Berlin
universities, enrollment for the course of study ‘Science of Speech/specialisation
Voice and Speech Therapy’ was stopped by decree in the autumn semester, 1993.
This led to the closing of the subject area Phonetics in Berlin at the end of the year
1996.
Based on the long-standing cooperation between phonetics in Berlin and
speech technology in Dresden [4], the historical remnants of the phonetic
equipment were transferred to Dresden following the closing of the Chair of
Phonetics in Berlin. This equipment set was complemented by a number of
devices which came from numerous other German institutions, mainly from a
former laboratory in Chemnitz which was founded by Georg Zöppel (1892–1963).
With this merger, the Dresden collection expanded to represent one century of
continuous development of experimental phonetics and speech technology. The
merger was completed in 1999. Therefore, we consider this as the founding year
of the HAPS.
2.3 The Merger with the Former Hamburg Phonetic Collection
The Humanities Faculty of the Hamburg University goes back mainly to the
Hamburg Colonial Institute, which was opened in 1908. It included a number of
chairs working with foreign languages. There, a phonetics laboratory was founded
in 1910 as a part of the Department of African Languages, developing later into a
separate institute of the Hamburg University, which was founded in 1919.
From 1910 to 1949, the Phonetics Laboratory or Institute, respectively, was
directed by Giulio Panconcelli-Calzia (1878–1966, [5], Figure 2) who was a
student of the Abbé Rousselot. He was an ingenious researcher who built the
institute into a place of international scientific importance. He founded the journal
VOX, which served as an international platform for experimental phonetics. It is
notable that the First International Congress of Experimental Phonetics took place
in Hamburg back in 1914.
Figure 2. Giulio Panconcelli-Calzia demonstrates the
application of a kymograph. Photograph from the
HAPS collection.
A detailed description of the history of the institute is given in [6]. In the
1990s, the educational branch of the institute was transferred to another
department. The remaining part, which focused on general phonetics, was closed at
the end of the winter term 2006/07 due to the restructuring of Hamburg
University.
The large collection of phonetic devices, which was part of the Phonetic
Institute, fortunately survived the destruction of Hamburg during World War II
and was opened to the public in 1986 [7]. To preserve this valuable
collection despite the closing of the institute, the responsible department
proposed a merger with the collection in Dresden. The collection was transferred
to Dresden in 2005. Since 2006, the united collection can be visited in two rooms
of the Barkhausen building of the TU Dresden (Figure 3).
Figure 3. View of one room of the collection in the Barkhausen building of the
TU Dresden.
3 Recent Status of the Collection
The HAPS preserves parts of the material estate of several important institutions in
Germany. It represents, therefore, the development of experimental phonetics and
speech technology in Germany with a high degree of completeness. In more detail,
the following groups of exhibits are available:
Historic phonetic devices of the pre-electronic era
These devices from the first half of the 20th century are mainly mechanical and
include different groups:
instruments for the experimental work of the phonetician (devices for
recording speech and related signals),
devices for interpreting the recordings, e.g. for measuring pitch contours,
devices for measuring frequencies and performing spectral analysis,
objects for teaching purposes (models of voicing and articulation),
early devices for speech training and rehabilitation of handicapped people.
Historic phonetic devices of the early electronic era
The purpose of these objects from the second half of the 20th century is similar to
that of the mechanical devices mentioned above, but it is now
accomplished by electronic means. This collection stops with the introduction of
the computer in the phonetic laboratories.
Historic objects demonstrating the development of speech technology
A few objects of this collection demonstrate how sounds and speech can be
produced by mechanical means. Of course, the real development of devices for
speech synthesis and speech recognition is connected to the electronic and,
primarily, the computer era. The collection includes not only objects from the
research and development in Dresden (following the vocoder from the 1950s), but
also a number of early speech synthesizers from other laboratories.
Historic sound recordings
It must first be noted that the whereabouts of the important collection of wax
cylinders from the former Hamburg Colonial Institute and its successor chairs are
not known. Hence, the cylinders did not come to Dresden. However, the HAPS includes a
large number of shellac records. Some of them were produced in the laboratory of
Panconcelli-Calzia for demonstration purposes. The main collection, however,
consists of commercial music records with lower scientific importance. They were
collected by Wilhelm Heinitz (1883–1963) who directed a research unit for
ethnomusicology in Hamburg until 1948. Furthermore, the HAPS includes tapes
with sound examples of the Dresden vocoder and early speech synthesizers.
Historic photographs and transparencies
The collection includes, among other visual media, a set of valuable photographic
plates from Panconcelli-Calzia’s laboratory. Some of them are very useful because
they demonstrate the correct application of early phonetic devices.
4 Public Activities
The HAPS is a collection of the university which is used in teaching and research.
The university collections in Dresden are managed by a curator who is
responsible for the inventory. Due to the rapid growth of the collection during the
last decade, the simple activity of producing such an inventory was very
important. The objects have been photographed, and a first selection of the images
is available on the websites of the institute [8]. A printed catalogue of the
collection is in preparation. A first volume, which includes the historic phonetic
devices, will be published around the end of this year by the publisher Thelem in
Dresden. The HAPS can be visited on request and on special occasions such as the
dies academicus or the annual “night of sciences”. Additionally, selected objects
have been presented at special exhibitions as follows:
Exhibition about measuring pitch with historic instruments at the 3rd
International Conference on Speech Prosody in Dresden, 2006,
Participation with selected objects at the exhibition “Kempelen – Man in
the Machine” in the Hall of Arts, Budapest, 2007,
Exhibition of selected objects at the 16th International Congress of
Phonetic Sciences (ICPhS) in Saarbrücken, 2007 (Figure 4),
Special exhibition “SprachSignale” (SpeechSignals) in the Technical
Museum Dresden, 2009–2010.
Figure 4. Selected exhibits from the HAPS at the International Congress of
Phonetic Sciences in Saarbrücken, Germany, 2007.
5 Scientific Projects
A number of scientific historic projects have been performed during the last
decade. They have been partially supported by the German Acoustic Society
(DEGA). A short overview of these activities follows:
5.1 History of the Institutions
The HAPS illustrates more than a century of development in experimental
phonetics and more than a half century in speech technology. It is important to
connect the exhibits with the scientific development at the places where they
originated. Therefore, we are collecting and publishing material on the
development in Dresden and Berlin [4], Hamburg and other places. In particular,
we are working on a monograph about the development of speech technology in
Dresden.
5.2 Investigations on Selected Phonetic Devices
It is sometimes not easy to understand how the historic phonetic devices worked.
Many questions had to be answered for the descriptions of the instruments in the
catalogue, which is now being prepared for printing. Some of the devices were
investigated in more detail.
Wethlo’s cushion pipes
An early project dealt with the reconstruction of historical larynx models. In 1898,
Ewald had proposed an improvement of the existing larynx models by replacing
the simple membranes with air-pressurized cushions. Wethlo investigated this
more natural construction in great detail from 1913 onwards [9]. The model,
which was critical in the development of voicing theories, is known as “Wethlo’s
Polsterpfeife” (cushion pipe). The Dresden collection includes a number of these
objects in different sizes (Figure 5). Some of them are originals from Wethlo’s
estate. They were reconstructed, and a number of experiments and measurements
were performed [10].
Figure 5. Historic larynx models from Franz Wethlo, so-called cushion
pipes.
History of pitch measurement
Pitch measurement has always played an important role in phonetics. There were
different methods for recording speech signals, but the application of a kymograph
was the predominant one. After recording the speech signal, it had to be measured
to produce a curve showing the pitch vs. time. The whole procedure of converting
kymographic waveforms into pitch contours required a number of steps, which
had to be performed with great precision. Because this was a very time-consuming
process, a number of aids were proposed, which were in use until the 1950s. We
have tried to explain their application [11].
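
As an illustration of the final step of this procedure, the following minimal Python sketch converts period lengths measured along a kymographic trace into a pitch contour. It rests on assumptions that are not taken from [11]: the lengths are given in millimetres, the recording surface moves at a known constant speed, and the function name and sample values are purely illustrative.

```python
def pitch_contour(period_lengths_mm, surface_speed_mm_per_s):
    """Return (time_s, f0_hz) pairs for consecutive periods of a trace.

    period_lengths_mm: lengths of successive periods measured on the trace
    surface_speed_mm_per_s: assumed constant speed of the recording surface
    """
    if surface_speed_mm_per_s <= 0:
        raise ValueError("surface speed must be positive")

    contour = []
    position_mm = 0.0  # start of the current period along the trace
    for length_mm in period_lengths_mm:
        if length_mm <= 0:
            raise ValueError("period lengths must be positive")
        # Duration of one period: distance on paper divided by speed.
        duration_s = length_mm / surface_speed_mm_per_s
        # Fundamental frequency is the reciprocal of the period duration.
        f0_hz = 1.0 / duration_s
        # Assign the value to the temporal midpoint of the period.
        time_s = (position_mm + length_mm / 2.0) / surface_speed_mm_per_s
        contour.append((time_s, f0_hz))
        position_mm += length_mm
    return contour


# Hypothetical measurements: two periods of 2.0 mm and 2.5 mm at 250 mm/s
for time_s, f0_hz in pitch_contour([2.0, 2.5], 250.0):
    print(f"{time_s:.4f} s  {f0_hz:.1f} Hz")
```

Because each value depends only on the length of one period and the surface speed, any error in measuring a single period directly affects the corresponding point of the contour, which is why the manual procedure demanded such precision.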
Pitch measurement with Boeke’s rack
Another way to measure pitch contours and other parameters is based on the
measurement of the “glyphs” at the surface of the wax cylinders of Edison’s
phonograph. This was performed using a very sophisticated instrument which was
designed by J. D. Boeke. One of these devices is part of the HAPS (Figure 6) and
was described in more detail in [12].
Figure 6. Boeke’s rack for measuring the ‘glyphs’ at wax cylinders.
Accuracy of measuring frequencies with mechanical resonators
A simple and widespread method for measuring the frequency of sounds was the
application of Helmholtz resonators (fixed frequencies) or resonator tubes of
Schaefer (tunable frequencies, see Figure 7). It is interesting to know more about
the accuracy of these historic measuring devices. Therefore we performed a
number of listening experiments which showed high accuracy in general, but a
systematic deviation in the case of Schaefer’s resonators [13].
Figure 7. A set of tunable resonators
from Schaefer.
Transfer functions of Marey’s capsules
Transducers that convert speech sounds into mechanical movements of writing
pins were used successfully for waveform recording early in experimental
phonetics. The sound is transmitted through a hose into a flat, normally circular
capsule, which is closed by a thin rubber membrane. The movement of the
membrane is transferred to a light lever with an attached pin. The tip of the pin
scratches the waveform in the sooted paper on the revolving drum of a
kymograph. This approach dates back to E. J. Marey (1830–1904) who used it for
recordings of the movement of the pulse artery (sphygmograph) and other
physiologic motions. Later, it was widely applied in experimental phonetics by
P.-J. Rousselot (1846–1924), his student G. Panconcelli-Calzia, and other successors.
The properties of the transducers of the Marey type were evaluated by interfero-
metric measurements of the transfer functions of numerous capsules from the
HAPS collection [13]. It became clear that the transfer functions are not at all flat
over the frequency range of interest. They show several maxima which are
determined by the interplay of the system components, mainly the hose and the
capsule. Fortunately, the lack of flatness does not influence the period lengths of
the recorded signals, which are measured for determining the pitch contour.
Historic devices for rehabilitation purposes
Rehabilitation engineering is a classical application field of speech technology.
Therefore, it is interesting to study the early attempts, mainly from the pre-
electronic era. Prototypes of such devices are rare exhibits. The HAPS owns some
examples which have been demonstrated in [14].
5.3 History of Speech Technology
Speech technology is the main research focus of the chair where the HAPS is
maintained. Therefore the development of speech analysis and synthesis is one of
the foci of the historic interest. During the last years, special attention was directed
to the following problems.
History of mechanical speech synthesis
This research activity is due to the existence of small mechanical sound or word
synthesizers which came to the HAPS from the Hamburg collection (Figure 8). In
the year 1899, the notable otologist Johannes Kessel (1839–1907) presented such
instruments at a scientific meeting in Munich [15]. Kessel aimed to use them to
teach people who have a significant degree of deafness. He recognized, however,
that the quality of the synthetic voice was still insufficient for this purpose. Later,
the original devices came to the Hamburg laboratory. The mechanical voices are
interesting as early mechanical speech synthesizers. Therefore, we started a project
to explore the development of this technology [16]. It can be interpreted as a late
spin-off of Kempelen’s speaking machine, the principle of which came (via
Melzel) to the puppet manufacturers.
Figure 8. A collection of voice mechanics by Hugo Hölbe, arranged in
a demonstration box.
In our case, Hugo Hölbe (1844–1931) from Sonneberg was the manufacturer
of the voices. Sonneberg is a town in Thuringia and was known as the world
capital of toys in former times (Figure 9). We learned that “Stimmenmacher”
(voice manufacturer) was a separate profession in the production of puppets and
cuddly toys.
Figure 9. The “speaking picture book” was patented in 1874 by the bookseller
Theodor Brand from Sonneberg, Germany. It applies voice mechanics similar to
those in Figure 8. Left: view of the title; right: the interior.
History of early vocoders
The development of the vocoder in the 1930s had a profound impact on speech
research in general. The first patent on the principle of the channel vocoder was
granted to K.-O. Schmidt [17], but the most important prototype was created
by H. Dudley [18], who also coined the name. A number of other prototypes were
developed during and after World War II in different countries. We tried to collect
all available information about this period [19]. This was not always easy because
much of the work was secret at that time.
History of electronic speech synthesis
As already mentioned, there was also a vocoder developed in Dresden in the 1950s
(Figure 10). In the following decades, many prototypes of a speech synthesis
terminal were developed [20], partially in cooperation with the computer company
Robotron. We demonstrated these objects in the special exhibition SprachSignale
(cf. Section 4) and included the historic examples in our lectures on speech technology.
Figure 10. Photograph of the Dresden vocoder from the 1950s.
6 Conclusion
The HAPS has developed well during the last decade. We are confident that we will be
able to continue collecting equipment and to pursue some
research activity. We hope that the Department of Electrical Engineering and
Information Technology at the TU Dresden specifies a final place for all scientific
collections in the near future, which would guarantee stable conditions for the
future of the HAPS.
References
[1] Hoffmann, R.: 40 Jahre institutionalisierte Sprachtechnologie in Dresden.
Studientexte zur Sprachkommunikation, vol. 54. Dresden: TUDpress 2009,
7–35.
[2] Mehnert, D.: Phonetics at the University of Berlin – a history. The
Phonetician, No. 92 (2005–II), 34–39.
[3] Mehnert, D.: Phonetik an der Berliner Universität - ein Rückblick auf ihre
Geschichte und auf Forschungsarbeiten der letzten Jahre. Studientexte zur
Sprachkommunikation, vol. 35. Dresden: Universitätsverlag 2005, 33–54.
[4] Hoffmann, R.; Mehnert, D.: Berlin-Dresden traditions in experimental
phonetics and speech communication. In: Boe, L.-J.; Vilain, C.-E. (eds.): Un
siècle de phonétique expérimentale. Lyon: ENS Éditions 2010, 191–210.
[5] Köster, J.: Giulio Panconcelli-Calzia. The Phonetician, CL-61, 1992, 3–10.
[6] Neppert, J.; Pétursson, M.: Death of a Phonetic Institute: The Phonetic
Institute of the University of Hamburg. Studientexte zur Sprach-
kommunikation, vol. 54. Dresden: TUDpress 2009, 36–39.
[7] Grieger, W.: Führer durch die Schausammlung, Phonetisches Institut.
Hamburg: Christians 1989.
[8] www.ias.et.tu-dresden.de/sprache
[9] Wethlo, F.: Versuche mit Polsterpfeifen. Passow-Schaefers Beiträge für die
gesamte Physiologie 6(1913) 3, 268–280.
[10] Hoffmann, R.; Mehnert, D.; Dietzel, R.; Kordon, U.: Acoustic experiments
with Wethlo’s larynx model. International Workshop to the Memory of
Wolfgang von Kempelen, Budapest, March 11–13, 2004. Grazer
Linguistische Studien 62 (2004), 51–60.
[11] Mehnert, D.; Hoffmann, R.: Measuring Pitch with Historic Phonetic Devices.
3rd International Conference Speech Prosody, Dresden. May 2–5, 2006.
Dresden: TUDpress 2006, 927–931.
[12] Mehnert, D.; Dietzel, R.: Von Glyphen zu Tonhöhen und Intensitäten – das
Boekesche Gestell, ein historisches Auswertegerät. Studientexte zur
Sprachkommunikation, vol. 52. Dresden: TUDpress 2009, 198–208.
[13] Hoffmann, R.; Mehnert, D.; Dietzel, R.: Measuring the accuracy of historic
phonetic instruments. Proc. 17th Int. Congress of Phonetic Sciences, Hong
Kong 2011, pp. 176-179.
[14] Mehnert, D.; Dietzel, R.; Kordon, U.: Aus den Anfängen der Experimental-
phonetik – Hilfsgeräte zur Behandlung Hör- und Sprachbehinderter.
Fortschritte der Akustik, DAGA 2011, Düsseldorf, 147–148.
[15] Denker, A.: Bericht über die Versammlung deutscher Ohrenärzte und
Taubstummenlehrer zu München. Archiv für Ohrenheilkunde 47, Nr. 3, Nov.
1899, 198–208.
[16] Hoffmann, R.; Mehnert, D.: Die Kesselschen Stimm-Mechaniken in der
historischen akustisch-phonetischen Sammlung der TU Dresden. DAGA,
Stuttgart, March 19–22, 2007.
[17] Schmidt, K.-O.: Verfahren zur besseren Ausnutzung des Übertragungsweges.
German Patent 594 976, patented February 27, 1932.
[18] Dudley, H. W.: Signaling System. US Patent 2,098,956, patented Nov. 16,
1937.
[19] Hoffmann, R.: On the development of early vocoders. Proc. IEEE Histelcon,
Madrid 2010, 6 p.
[20] Hoffmann, R.: Sprachsynthese an der TU Dresden: Wurzeln und
Entwicklung. Studientexte zur Sprachkommunikation, vol. 35. Dresden:
Universitätsverlag 2005, 55–77.
GENIE: The Corpus for Spoken Lower Sorbian (GEsprochenes
NIEdersorbisch)
Roland Marti, Bistra Andreeva, William J. Barry
Department of Slavonic Languages, Saarland University, Saarbrücken,
Germany
Phonetics, Saarland University, Saarbrücken, Germany
Abstract
Lower Sorbian is a Slavonic minority language spoken in Eastern Germany in
German-speaking surroundings. The language is on the brink of extinction as there
are basically no native speakers below the age of sixty. Therefore, the
documentation of spoken Lower Sorbian is crucial. The corpus of spoken Lower
Sorbian GENIE (GE[sprochenes] NIE[dersorbisch]: http://genie.coli.uni-saarland.de/)
is the first documentation of this kind. It brings together various
kinds of spoken Lower Sorbian: recordings from the archive of Sorbian broadcasts
(years 1956-2006), recordings from the Archive of Sorbian Culture (dialect
recordings 1951-1971), and new recordings from native speakers made especially
for the corpus in 2005/2006.
The paper presents the corpus and its defining features, paying special
attention to the particular situation of Lower Sorbian and its bilingual speakers. On
the one hand, there is a very strong German influence; but on the other, Upper
Sorbian interference is also clearly recognizable in the recordings. Furthermore,
the paper illustrates the problem of what constitutes the speech of a native speaker
in the case of minority languages. Finally, the problems of corpora of endangered
languages are discussed.
1. Sorbian
Sorbian currently forms the westernmost part of the Slavic-speaking
area. At present it is a language island (more exactly, an archipelago of
islets) within a German-speaking area, situated in Upper and Lower Lusatia.
This represents the remainder of an originally much larger territory, which
was gradually Germanized through language shift, a process that was
repeatedly triggered and fostered by language-policy measures that still continue
(cf. Figure 1).
Figure 1: The Sorbian-speaking region in Germany.
This language area can be roughly divided into Upper and Lower Sorbian.
Only in the Upper Sorbian area, more precisely in the Catholic districts, are there
still villages where Sorbian is the common language (Scholze, 2008); elsewhere it
remains nothing more than a family language, or rather the language of the older
generation(s). The number of people with an active command of Sorbian can only
be estimated. The estimates vary between 15,000 and 30,000 for Upper Sorbian
and between 5,000 and 10,000 for Lower Sorbian (Jodlbauer, Spieß & Steenwijk,
2001). Upper and Lower Sorbian are both autonomous languages. They are
officially recognized as minority languages in Germany: first, in the
constitutions and relevant laws concerning Sorbs (or Sorbs/Wends) in the Free
State of Saxony and the state of Brandenburg¹ and, second, in the European Charter
for Regional or Minority Languages.
The main problem for the Sorbian language is the dying-off of the Sorbian
speaking community due to the lack of younger native speakers and the
consequent shrinking of the area in which Sorbian is spoken. Geographical
shrinkage is a phenomenon that has been observed since the 16th century. Both
trends have been accelerating since the mid 19th century, and neither the revival
measures nor fostering throughout the German Democratic Republic era could
stop them. There are language preservation and revitalization measures at present
1 The official designations in Brandenburg are “Sorbs/Wends” (“Sorben/Wenden”) and “Sorbian/Wendish”
(“sorbisch/wendisch”), since a part of the Lower Sorbian-speaking community rejects the names “Sorbs”
(“Sorben”) and “Sorbian” (“sorbisch”) where native speakers are concerned. In linguistic
(Slavic) tradition, only “Sorbs” (“Sorben”) and “Sorbian” (“sorbisch”) are used.
[Figure 1 map labels: Lusatia (English), Łužyca (Lower Sorbian), Łužica (Upper Sorbian), Lausitz (German); Berlin, Brandenburg, Saxony, Poland, Czech Republic.]
(especially the so-called WITAJ project; Budar & Norberg, 2006), which can,
however, at best slow down the language assimilation process. The situation of
Lower Sorbian is particularly dramatic since inter-generational transmission does
not exist any longer and children are led by means of (partial) immersion to the
status of a kind of “secondary native speaker”.
There are yet other specific problems concerning Lower Sorbian. The
revival of Sorbian life and its organization after the Second World War was
primarily initiated in the Upper Sorbian region and by Upper Sorbian exponents.
This led to the perception that the cultural life was Upper-Sorbian oriented, which
was in fact partially the case. This was experienced especially intensively in the
language domain. The spelling reform of 1949–1952 brought Lower Sorbian
orthography closer to that of Upper Sorbian. Since a
pronunciation oriented towards the written language was fostered and
required at school and in the media, the spelling reform also had orthoepic
consequences (so-called “spelling pronunciation”). The Upper Sorbian linguistic
influence was further strengthened by the fact that, owing to the small number of
autochthonous Lower Sorbian experts, functionaries in Sorbian organizations and
teachers came predominantly from Upper Lusatia, and their language did not
conform to the linguistic features of Lower Sorbian. This resulted in the popular
impression that the Lower Sorbian standard language does not represent real
Lower Sorbian at all, but an overall Sorbian hybrid language at best, or a kind of
Upper Sorbian that had been adjusted slightly to Lower Sorbian. Many native
speakers of Lower Sorbian therefore refused to participate in official efforts to
strengthen the language and restricted its use to private life. Often they even
stopped transmitting the language to the next generation. On the other hand, the
official language policy, centred on the standard language and neglecting dialects,
gave rise to the feeling in Lower Sorbian speakers that they could not speak
correct Sorbian (an opinion that is heard repeatedly during field recordings). This
explains the wish for reinforced demarcation from Upper Sorbian which emerged
when state control over cultural life ceased. The latter finds expression in the
adoption of different terminology (“Wendish” instead of “Lower Sorbian”, cf. n.
1), in the withdrawal of some parts of the spelling reform from 1949-1952, and in
the rejection of a purist language that is felt to be Upper Sorbian.²
2. The Corpus for Spoken Lower Sorbian GENIE
In view of the precarious situation of Lower Sorbian that was described in relevant
studies (Jodlbauer, Spieß & Steenwijk, 2001; Norberg, 1996), it was foreseeable
that the “authentic” mother tongue would cease to exist within one generation at
most. That turned out to be particularly fatal for the spoken language since the
2 This results in the current (re)appearance, also in the written language, of lexical Germanisms
that have always been in colloquial use (lazowaś instead of cytaś, hundert instead of sto). A similar
situation can be observed in grammar, e.g. with determination (occasional use of the definite and,
marginally, also the indefinite article).
“secondary mother tongue” (the maximum goal aimed at by efforts of
revitalization) differs strongly from the “authentic” mother tongue, especially in
its pronunciation.³ It was therefore important, and extremely urgent, to
document spoken Lower Sorbian. With this objective in mind, the corpus GENIE:
GEsprochenes NIEdersorbisch (Spoken Lower Sorbian) was created. The corpus
creation was partially funded by the Scientific Committee of the University of
Saarland in the years 2005-2006. The endeavour was also financially supported by
the Radio Berlin-Brandenburg (RBB) and the Sorbian Institute/Serbski Institut. In
order to make this corpus usable for scientific research internationally, it was
made available on the web (http://genie.coli.uni-saarland.de). The GENIE website
is supported by the Institut für Phonetik
(http://www.coli.uni-saarland.de/groups/WB/Phonetics/index.php) together with the
Institut für Slavistik (http://www.uni-saarland.de/fak4/fr44/) at the University of Saarland.
Owing to copyright and data privacy protection, the corpus could not be made generally
available; its use is permitted for scientific purposes upon application
(http://genie.coli.uni-saarland.de/cgi-bin/benutzer.html). The corpus arrangement
was structured to meet the special features of the situation of Lower Sorbian
presented above and, where possible, to take into account the diachronic level.⁴
There are more than sixty hours of spoken Lower Sorbian in its distinct variants
available in GENIE. Even though the period of time covered by the recordings
ranges only from 1951 to 2006, the speakers' dates of birth indicate that the
diachrony is considerably deeper: the oldest speaker was born in 1860 (he was 94
years old at the time of the recording), the youngest speaker was born in 1973.
Individual diachrony is also traceable since several people are represented in
multiple recordings that were produced at different times.
2.1 Sources
The corpus consists of recordings from three different sources:
a) Archive of the Sorbian Radio (Studio Cottbus of the Radio Berlin-Brandenburg
RBB, formerly ORB, earlier still Radio of GDR)
This source consists of 110 recordings made between 1956 and 2006. Speakers of
dialects and of the standard language (native speakers of Lower Sorbian/Wendish,
Upper Sorbian or German) are both represented in different variants of the
standard language. The text types are very different: conversation, interview,
address, report etc.
3 This is primarily because the teachers employed in the revitalization project
WITAJ, apart from a few exceptions, do not have a command of Lower Sorbian as their mother tongue, but
at best as their secondary mother tongue.
4 Owing to copyright, the oldest recordings of Sorbian could not be adopted from the Berlin Archive;
therefore, only marginal diachronic depth is taken into consideration: the recordings were made in the years
1951–2006.
b) Archive of Sorbian Culture/Serbski kulturny archiw (SKA) in the Sorbian
Institute/Serbski Institut
The source contains 135 recordings made between 1951 and 1971. The recordings
were compiled for linguistic purposes by the Institute, in particular for the Sorbian
Linguistic Atlas (cf. References SSA 1-13 1965-1993). Their aim was the recording
of local dialects (story, interview, elicitation etc.).
c) The field study project specifically for this corpus
The source consists of 100 recordings made between 2005 and 2006. They involve
conversations between J. Frahnow (pastor and native speaker) and mostly elderly
native speakers whose speech usually represents a local dialect. In selecting
the recordings and speakers from the three sources mentioned, we attempted to
capture the diversity of dialectal forms of Lower Sorbian/Wendish along with the
various standard-language variants.
2.2 Metadata files
There is a data record sheet for every recording containing the most important
information about the recording. Specifically, these are:
call number (the recording identifier): this consists of one of the letters f, r
or s followed by a four-character number, where f means field recording created by J.
Frahnow, r stands for recordings from the radio archive of the RBB, and s
signifies recordings from the Archive of Sorbian Culture. In addition to the
call numbers valid for this corpus, there are archive call numbers as used in
the source.
text type (e.g., conversation, interview, report)
contents (e.g., village life, customs, farming)
place of the recording
date of the recording
indication of sex (names are not given to protect the person’s identity)
speaker’s place of birth
speaker’s date of birth
dialect
family language: it is specified here whether the family language was
Lower Sorbian/ Wendish, German or mixed (or Upper Sorbian where
applicable)
places of residence
education
The place names in the fields (place of the recording, place of birth, dialect and
place of residence) are given in German and Lower Sorbian/Wendish and can be
displayed and sorted at three levels: place, municipality, and district.
Additionally, all the Lower Sorbian places covered are assigned to dialect
areas. This assignment follows the classification of the Sorbian Linguistic Atlas,
which ultimately goes back to the categorization by Muka (1911-1926). It
distinguishes only between Lower Sorbian dialects proper and transitional
dialects. In the case of native speakers of Upper Sorbian, this fact is merely
noted, without indication of the dialect area. In the case of non-native
speakers or native speakers who use the standard language, the word “standard” is
used.
There are several metadata sets available for some recordings, namely in
cases where there is more than one speaker participating in the recording (hosts
and interviewers were usually not taken into account). The call numbers of the
metadata sets are identified in these cases by the attached index letters (e.g., a, b,
etc.).
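The call number scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source says "four-character number" and "index letters (e.g., a, b, etc.)", so treating the number as four digits and the index as one trailing lowercase letter is a guess, and the names `parse_call_number` and `CallNumber` are hypothetical, not part of the GENIE system.

```python
import re
from typing import NamedTuple, Optional

# Source letters as described in section 2.2.
SOURCES = {
    "f": "field recording (J. Frahnow)",
    "r": "radio archive of the RBB",
    "s": "Archive of Sorbian Culture",
}

# Assumption: the four-character number is four digits, and the optional
# speaker index is a single lowercase letter appended directly to it.
CALL_NUMBER = re.compile(r"^([frs])(\d{4})([a-z])?$")


class CallNumber(NamedTuple):
    source: str                   # human-readable name of the source
    number: str                   # the four-character number, kept as text
    speaker_index: Optional[str]  # index letter for multi-speaker recordings


def parse_call_number(text: str) -> CallNumber:
    """Split a call number into source, number and optional speaker index."""
    match = CALL_NUMBER.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid call number: {text!r}")
    letter, number, index = match.groups()
    return CallNumber(SOURCES[letter], number, index)
```

Under these assumptions, a hypothetical call number such as `f0012b` would be read as the metadata set of the second speaker in field recording 0012.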
The datasets and audio recordings in the corpus can be accessed either
directly, by stating the call number, or indirectly, by using a search form
in which all the fields listed can be searched or sorted with the help of filter
functions.
2.3 Technical data of the recordings
In addition to the specified background information, the data record sheets
contain the following technical information:
length of the recording in minutes and seconds
size of the .wav file in bytes/kilobytes/megabytes
size of the .mp3 file⁵ in bytes/kilobytes/megabytes
sampling rate in Hz
amplitude quantization (bit depth) in bits per sample
number of channels (1 for mono, 2 for stereo)
signal-to-noise ratio SNR (as yet only for data from the field study
project)
bit rate (.mp3 file) in kbit/s
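The technical fields listed above are related to one another: for uncompressed PCM audio, the size of the sample data follows from length, sampling rate, bit depth and number of channels, and the size of a constant-bit-rate mp3 follows from length and bit rate. The sketch below shows this arithmetic. The function names and the example values (44100 Hz, 16 bits, 128 kbit/s) are illustrative assumptions, not figures from the corpus, and file headers are ignored.

```python
def pcm_data_bytes(duration_s: float, sampling_rate_hz: int,
                   bits_per_sample: int, channels: int) -> int:
    """Size of the raw PCM sample data in bytes (container header excluded).

    Assumes the bit depth is a whole number of bytes (8, 16, 24, ...).
    """
    bytes_per_sample = bits_per_sample // 8
    n_frames = round(duration_s * sampling_rate_hz)  # one frame per sampling instant
    return n_frames * bytes_per_sample * channels


def mp3_bytes(duration_s: float, bit_rate_kbit_s: float) -> int:
    """Approximate size of a constant-bit-rate mp3 stream in bytes."""
    return round(duration_s * bit_rate_kbit_s * 1000 / 8)


# Hypothetical one-minute mono recording.
wav_size = pcm_data_bytes(60, 44100, 16, 1)   # 60 * 44100 * 2 * 1
mp3_size = mp3_bytes(60, 128)                 # 60 * 128000 / 8
print(wav_size, mp3_size)
```

For this hypothetical recording the mp3 stream is roughly 5.5 times smaller than the PCM data, which is the kind of saving footnote 5 refers to.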
3. Examples from GENIE
It is evident from the description of the GENIE corpus that the material can be
analysed with various objectives in mind. For one thing, describing and
comparing the structural characteristics of the various dialect areas is an
attractive challenge in itself. Even though the spontaneous speech of the
recordings does not allow for an exhaustive grammatical description, the newly
recorded material provides a valuable supplement to the (not immediately
accessible) dialect recordings made during the German Democratic Republic era.
Another important question is to what extent the spoken standard language
varies and, depending on the speaker’s origin, takes on a dialectal form, thus
containing Lower Sorbian, Upper Sorbian or German features. The focus of our
first analyses, though, will be on the influence of German on spoken Lower
Sorbian, an influence that grew steadily over the 20th century but had been
present long before. Comparing recordings of younger and older people can shed
light on the extent of this influence, as well as on the linguistic features
affected by it. More striking still is the comparison of recordings of the
same person made at different times.
5 mp3 audio files are highly compressed and therefore take much less time to transmit over the internet.
According to the existing descriptions (Schwela, 1906; Janaš, 1984; Starosta,
1991), there are well-known phonetic dissimilarities between German and Lower
Sorbian on the segmental level: vowel quality and quantity, the r sound, the
realization of plosives with regard to voicing and aspiration, the existence
of the dark l or a [w], and the correlation of palatalization, which is
widespread in Slavic languages. There are, above all, characteristic features
of intonation and word stress known from impressionistic descriptions of the
prosody. Other rarely mentioned though important differences concern the way
words are linked, such as the separation of neighbouring vowels by a glottal
stop or the type of voicing assimilation (progressive or regressive).
As examples of the existing and growing influence of German on Lower Sorbian,
we show here four of the phenomena mentioned above in utterances of an elderly
speaker (A, born in 1890) and of a younger speaker (B, born in 1960).
Figure 2, a representation of the microphone signal and the spectrogram of the
utterance “Chtož tu rolu wobźěłajo” (English “Who works on the land”),
illustrates several pronunciation features in one short stretch of speech that
show the influence of German, three of which we comment on below:
1. In the word “rolu” the /r/ is realized as a uvular approximant [ʁ] (see I).
2. “wobźěłajo” /'obʑewajo/ starts with a glottal onset instead of a smooth
transition from “rolu” or an alternatively possible [h] (see II).
3. The syllable-final /b/ and the following syllable-initial /ʑ/ are voiceless (see
III).
Figure 2. The utterance “Chtož tu rolu wobźěłajo” (here: [xtɔʃ tʊ ʁɔlu ʔɔpʃevajɔ])
by speaker B (born in 1960) with (I) uvular [ʁ], (II) hard vowel onset (glottal stop)
and (III) devoicing in the coda with progressive devoicing of a voiced initial
fricative.
Figure 3 shows the oscillogram and the spectrogram of the utterance “tak
daloko” (English “so far”). Contrary to the claim that voiceless plosives are
unaspirated in Lower Sorbian, the voiceless plosives /t/ (see I) and /k/ (see
II) show clear signs of a moderate degree of aspiration (26 ms in both cases).
The measured duration of aspiration is relatively short compared to that of
monolingual speakers of German. It is therefore important to examine whether
an intermediate form (similar to the weak aspiration found in Canadian speakers
of French; Sundara et al., 2006; Fowler et al., 2008) has become established
in Sorbian, within this generation or with this speaker alone.
Figure 3. The utterance “tak daloko” (here: [tʰak dalɔkʰɔ]) by speaker B (born in
1960), where clear aspiration (I) of /t/ and (II) of /k/ can be noticed.
The older speaker (born in 1890) demonstrates a different articulation
pattern. In her utterance “To njejo tak dobre” (English “It’s not that good”),
shown in Figure 4, a tendency to aspirate can indeed be observed: /t/ in “to”
has an aspiration duration of 37 ms (see I).
On the other hand, following /k/ in “tak” she produces a fully voiced initial
/d/ in “dobre” that affects /k/ regressively, making it voiced (see II). This suggests
that the assimilation process contrasts with the common German pattern but
corresponds to what is typical of other Slavic languages. The apical [r] in “dobre”
also differs from the German standard /r/, which is a uvular fricative [ʁ]. Two
taps of the apical [r], each briefly damping the signal, can be seen in the
spectrogram as well as in the microphone signal (see III).
Figure 4. The utterance “To njejo tak dobre” (here: [tʰɔ ne tʰag dɔbrə]) by speaker
A (born in 1890), where (I) aspiration of /t/, (II) a fully voiced /d/ with partial
voicing of the preceding /k/ and (III) a double-contact apical /r/ can be observed.
As far as the fourth phenomenon in the younger speaker's recording is
concerned (the missing smooth transition from one vowel to the next across a word
boundary), it cannot be maintained that glottal constriction following the
German pattern did not occur in earlier times. In a short utterance (“a to ak,”
English “as”) of speaker A, there is clear glottalization at the beginning of the
utterance and at the word boundary between “to” and “ak” (see I and II in Figure
5). Further studies will allow us to determine how often such instances of
glottalization occur in her speech. It also cannot be ruled out that Slavic languages
behave similarly to other “binding” languages (French, Italian, English etc.) and
dialects (such as Alemannic); that is to say, a stressed word with an initial vowel
in an emphatic context can very well start with a hard glottal onset. In the younger
speaker’s example, however, the glottalization appears in a non-emphatic context.
The older speaker’s utterances are characterized by a generally emphatic “word by
word” style. The utterance in question is not distinctively emphatic, but the
glottalization might be attributed to this general style. A further uncertainty in
comparing the two speakers results from age-related differences in voice quality,
which add to the difficulty of interpreting glottal phenomena.
Figure 5. The utterance “a to ak” (here: [ʔa tʰɔ ʔak]) by speaker A (born in 1890)
with glottalization (I) at the beginning of the utterance and (II) at the word
boundary between “to” and “ak.”
4. Corpora of endangered languages – an exceptional case?
Following the presentation of a concrete corpus of an endangered language,
we should ask whether, from a general linguistic perspective, corpora of
endangered languages, or of micro languages in the broader sense (see The UCLA
Phonetics Lab Archive [http://archive.phonetics.ucla.edu/], The Endangered
Language Fund [ELF: http://endangeredlanguagefund.org/], DOkumentation
BEdrohter Sprachen/documentation of endangered languages [DOBES:
http://www.mpi.nl/DOBES/], and the Leipzig Endangered Languages Archive
[LELA: http://www.eva.mpg.de/lingua/resources/lela.php] among others), are
essentially different from the corpora of other languages and whether this has
consequences for their planning, composition and supervision. There are indeed
differences, but they are not fundamental in nature.
An important difference concerns the information value or, in other words,
the representativeness of the corpora. Paradoxically, corpora of endangered
languages are simultaneously more and less representative than those of other
languages. The higher degree of representativeness becomes especially clear in the
case of written corpora: only for languages with a limited written tradition can
a corpus include a high percentage of all that has been written.
There are two reasons for lower representativeness. First, endangered
languages are either not documented at all, or if they are, then by relatively small-
sized corpora and only rarely by means of several corpora. In addition, the data
that exists has usually been collected by chance and does not reflect an intentional
selection. The second reason for lower representativeness lies in the fact that the
norm of endangered languages is less fixed, and so there is greater variability
within them that can only be imperfectly represented. It is even possible that
idiolectal predominance in a corpus may distort linguistic structures.
A further difference is related to the composition, processing and
supervision of the corpora. As far as endangered languages are concerned, the
group of people who are interested in the corpora and capable of putting them
together is rather small. The same applies to the financial means of
minorities. As a consequence, corpora of minority languages, if they are created at
all, cannot be specialized (they are the proverbial 'all-in-one' tools) and will only
be partially annotated, if at all. Continuous development, updating and
documentation are only possible to a very limited degree.
A major difference ultimately lies in the function of the corpora. As far
as endangered languages are concerned, the corpus is not primarily a linguistic
working tool. It is, rather, a memorial with a quite distinct cultural-political
objective. It is meant to document what still exists and what may soon
disappear.⁶ This may well have consequences for the choice of the texts to be
recorded if the “antiquarian” idea prevails.
Corpora of endangered languages are clearly an exceptional case. Both
producers and users must take this into consideration. The producers must
take into account the limiting general conditions and the additional functions and
ensure that such corpora will be supervised in spite of limited resources. The users
must show understanding for the particularities of such corpora and also be willing
to contribute actively to their optimization, for example, by making the
transcriptions and annotations they have created themselves available for the corpus.
6 It is not a coincidence that in the “Archive of vanished places” (“Archiv verschwundener Orte/archiw
zgubjonych jsow”) in the village of Baršć/Forst, recordings of Sorbian language are to be heard in order to
demonstrate how “Devastation” (open-cast lignite mining) affected the cultural heritage of the region
(www.forst-lausitz.de/sixcms/media.php/674/Broschuere_AVO_Aufl2.pdf).
References
Budar, L. & Norberg, M. (2006). „Les écoles sorabes après 1990“. Education et
Sociétés Plurilingues 20 (juin): 27-38.
Fowler, C. A., Sramko, V., Ostry, D. J., Rowland, S. & Halle, P. (2008). Cross-
language phonetic influences on the speech of French-English bilinguals.
Journal of Phonetics 36, pp. 649-663.
Janaš, Pětr (1984). Niedersorbische Grammatik für den Schulgebrauch. Bautzen:
Domowina.
Jodlbauer, R., Spieß, G. & Steenwijk, H. (2001). Die aktuelle Situation der
niedersorbischen Sprache: Ergebnisse einer soziolinguistischen Untersuchung der
Jahre 1993-1995. Bautzen: Domowina (= Schriften des Sorbischen Instituts
27).
Muka, Ernst (1911-1926). Słownik dolnoserbskeje rěcy a jeje narěcow I–III.
Petrograd: RAN; Praha: ČAVU.
Norberg, Madlena (1996). Sprachwechselprozeß in der Niederlausitz.
Soziolinguistische Fallstudie der deutsch-sorbischen Gemeinde
Drachhausen/Hochoza. Uppsala (= Acta Universitatis Upsaliensis. Studia Slavica
Upsaliensia 37).
Scholze, Lenka (2008). Das grammatische System der obersorbischen
Umgangssprache im Sprachkontakt. Bautzen: Domowina (= Schriften des Sorbischen
Instituts 45).
Schwela, Gotthold (1906). Lehrbuch der Niederwendischen Sprache. Erster Teil:
Grammatik. Heidelberg: Ficker.
SSA 1-15 1965-1996 Sorbischer Sprachatlas (Serbski rěčny atlas), bearbeitet von
H. Faßke, H. Jentsch und S. Michalk, 1-15, Bautzen (Budyšin) 1965-1996.
Starosta, Manfred (1991). Niedersorbisch schnell und intensiv 1. Bautzen:
Domowina.
Sundara, M., Polka, L., & Baum, S. (2006). Production of coronal stops by
simultaneously bilingual adults. Bilingualism: Language and Cognition 9, pp. 97–
114.
Internet sources (accessed 30.03.2011):
www.forst-lausitz.de/sixcms/media.php/674/Broschuere_AVO_Aufl2.pdf
http://genie.coli.uni-saarland.de
http://www.mpi.nl/DOBES/
http://www.eva.mpg.de/lingua/resources/lela.php
http://endangeredlanguagefund.org/
Adjectif épithète et attribut de l’objet. Qu’en est-il de la prosodie ?
Denis Ramasse
CRISCO EA 4255, Université de Caen, France
e-mail: [email protected]
Résumé
En français, un adjectif placé juste après un nom peut avoir deux fonctions
différentes : épithète et attribut du complément d’objet (a.c.o.). Une confusion
peut ainsi naître dans l’interprétation d’une phrase comme : J’ai cru cet homme
sincère qui peut être comprise de deux façons : cet homme était vraiment sincère
et je l’ai cru, cela correspond à la fonction épithète ; ou j’ai cru que cet homme
était sincère et je me suis peut-être trompé, dans cette interprétation l’adjectif est
attribut de l’objet homme. On a cherché à savoir si la prosodie permettait de lever
cette ambiguïté sous deux aspects : celui de l’encodage et celui du décodage. 10
phrases ambiguës, présentées dans deux cotextes (l’un forçant l’analyse de
l’adjectif en épithète, l’autre en a.c.o.), ont été enregistrées par 6 locuteurs (3
hommes, 3 femmes). L’analyse acoustique de ce corpus a révélé 4 indices
prosodiques susceptibles de différencier les deux fonctions: un court silence entre
nom et adjectif (appelé pausette dans une description précédente), une montée
mélodique finale, un allongement moyen de durée et une élévation moyenne de
hauteur. Une analyse statistique des données a montré l’importance des deux
premiers indices. Un double test de perception a permis de vérifier que cette
hiérarchie des indices n’était pas la même au niveau du décodage parce qu’elle a
révélé aussi qu’une élévation moyenne de hauteur venait renforcer le rôle de la
pausette pour indiquer une fonction attribut de l’objet.
Abstract
Can prosody help to decide whether an adjective is an epithet or an attribute of the
object in a sentence?
In French, an adjective placed immediately after a noun can be ambiguous when the
context does not make its exact function clear. The sentence J’ai cru cet homme
sincère can be understood as “I trusted this sincere man” (the adjective is an
epithet) or as “I thought this man was sincere” (the adjective is an attribute of the
object homme). Prosodic cues might disambiguate such sentences. To test this
hypothesis, 10 sentences, each realized in two different co-texts (20 sentences in
all), were recorded by 3 men and 3 women. The acoustic analysis of the recordings
revealed 4 cues that could distinguish the two functions. The adjective was
analyzed as an attribute of the object when (i) there was a short silence between
the noun and the adjective, (ii) there was a melodic rise at the end of the
sentence, and (iii, iv) the average duration and the average pitch of the sentence
were slightly greater. A statistical analysis of the data showed that the silence
and the final rise were the most important cues. A perceptual test was then run to
check whether these cues were used in perception. It showed that the cues are not
ranked in the same way in perception: the average pitch of the sentence appears to
be a useful cue that reinforces the role of the silence.
1 Introduction
A Verb (V) + Noun (N) + Adjective (A) sequence can be a source of ambiguity
in French. The adjective can attach in two ways (Fuchs 1996):
either it depends on the noun, there is no syntactic boundary between N and A
(the bracketing is V(NA)), and it is an epithet;
or it depends on the verb, there is a boundary between N and A (the syntactic
structure is ((V N) A)), and the adjective is then an attribute of the object
complement (attribut du complément d’objet, abbreviated a.c.o. following
Riegel, Pellat & Rioul 1994).
Riegel (1991) proposes a set of tests that bring out the function of an adjective
in a given use and thus distinguish between epithet and a.c.o. For example,
taking a sentence from the corpus studied here:
Il a acheté cette voiture neuve. (“He bought this new car.” / “He bought this car new.”)
Table 1:
test | epithet | a.c.o.
pronominalization | Il l’a achetée. | Il l’a achetée neuve.
question in qu(e) or qu’est-ce qu(e) | Qu’est-ce qu’il a acheté ? | Qu’est-ce qu’il a acheté neuf ?
transformation into noun + relative clause | Cette voiture neuve qu’il a achetée. | Cette voiture qu’il a achetée neuve.
passivization | Cette voiture neuve a été achetée. | Cette voiture a été achetée neuve.
extraction with c’est… que | C’est cette voiture neuve qu’il a achetée. | C’est cette voiture qu’il a achetée neuve.
detachment | Cette voiture neuve, il l’a achetée. | Cette voiture, il l’a achetée neuve.
(A seventh test (in fact the third in the list he presents) seems difficult to
apply, even to the example he gives (Le jury a jugé ce travail remarquable.); it
is the question in comment:
Table 2:
test | epithet | a.c.o.
question in comment | Comment a-t-il acheté cette voiture neuve? | ? Comment a-t-il acheté cette voiture ?
This test implicitly tends to treat the adjective in a.c.o. function as an
adverbial (complément circonstanciel); it therefore seems preferable not to use
it.)
If the distinction between the two functions is delicate in writing, one may
wonder whether speech offers cues that resolve the ambiguity. Speakers could,
at the encoding stage, add prosodic elements that listeners would be accustomed
to picking up at the decoding stage. The study presented here sets out to
identify, in the prosody of the sentences, any cues that allow the two functions
of the adjective to be contrasted.
1.1. Essential a.c.o. and accessory (or accidental) a.c.o.
After certain verbs, for example verbs of opinion (juger, croire, trouver, voir,
sentir, etc.) or declarative verbs (dire, prétendre, assurer, affirmer, etc.), the
adjective functioning as attribute of the object is considered essential because
it determines the sense in which these verbs are taken (after Noailly 1999: 120,
and Le Goffic 1993, in particular § 263).
With these verbs, according to Fuchs (1996: 133), so-called "reduced"
constructions appear:
either reduction of a complement clause in que if the adjective is an essential
a.c.o.: J’ai cru que cet homme était sincère.
or reduction of a relative clause if the adjective is an epithet: J’ai cru
cet homme qui était sincère.
The result of the reduction is: J’ai cru cet homme sincère. (sentence 10 of the corpus).
With other verbs there is no reduction; the attributes are accessory.
In the corpus studied here, a distinction can thus be made between:
1. Il a trouvé cette idée folle.
10. J’ai cru cet homme sincère.
where the adjective, if it is an attribute of the object, is considered an
essential attribute of the object (these are the only essential attributes in
the corpus), and, for example:
6. Il boit son chocolat froid.
8. J’ai connu cet homme intraitable.
where, if applicable, the function analyzed is that of accessory attribute of the object.
Regarding sentence 1 of the corpus:
Il a trouvé cette idée folle.
it is worth noting that English uses word order to avoid the ambiguity created
by the two possible functions of the adjective, epithet or a.c.o.: if the
adjective is an epithet, the sentence is:
He found this crazy idea.
and conversely, if it is an attribute of the object, it follows the noun:
He found this idea crazy.
It is therefore quite reasonable to suppose that, if one language disambiguates
by syntactic means, prosodic cues may play the same role in another language.
1.2. Semantics of the two functions
The epithet or attribute-of-the-object function brings out one or another clique
of a verb, a clique being a microscopic sense according to the CRISCO dictionary
of synonyms, and a maximal complete subgraph in the graph representation of a
word's synonymy according to Ploux and Victorri (1998).
For example, for sentence no. 1: trouver = 22: concevoir, créer, découvrir,
imaginer, inventer, trouver, avoir with an epithet adjective; = 48: estimer, juger,
penser, trouver, être d'avis with an adjective as attribute of the object.
Furthermore, objects possess an intrinsic, ontological property (to borrow the
term from Thomas 2003); for example, in sentence no. 3 of the corpus presented
below, the property of a traffic light may be its color. The adjective defines
this property. Likewise, in sentence no. 6, the property of the chocolate is its
temperature. The epithet adjective defines it in a "durable" way (to borrow the
term from Blanche-Benvéniste 1991), whereas the adjective as attribute of the
object defines it in a transient way.
For sentence no. 6 as well as sentence no. 8, with attributes of the object there
would be, according to Fabienne Martin (2006), simultaneity of two processes and
juxtaposition of two predicates, the second being a depictive (descriptive
secondary predicate):
6. Il boit son chocolat froid. (= Il boit son chocolat alors qu’il est froid, “He drinks his chocolate while it is cold”)
8. J’ai connu cet homme intraitable. (= J’ai connu cet homme alors qu’il était intraitable, “I knew this man when he was intractable”)
Fabienne Martin contrasts depictives with resultative secondary predicates,
which are found in sentences no. 4 and no. 5 of the corpus:
4. Il a rendu son devoir irréprochable.
5. Il a gardé sa chemise propre.
In the first of these two sentences, the irreproachable quality is the result
obtained by the first process. Although this is less obvious for the second
sentence, the cleanness of the shirt is the result of an implicit process of
protection.
Another semantic opposition, subjective/objective, can be conveyed by this
difference in function. The essential attributes of sentences no. 1 and no. 10
contrast, through their subjective character, with the objective character
conferred by the epithet function. For example, in sentence no. 10, the man's
sincerity is the product of an impression or a judgment in one case and a reality
in the other.
An objective and durable aspect conveyed by the epithet function is thus opposed
to the subjective and ephemeral character contributed by the
attribute-of-the-object function, with circumstantial nuances of simultaneity
(in depictives) or of purpose (in resultative secondary predicates).
2. The study
This study sought to bring out a difference in the prosody of two sentences that
are segmentally identical but contain
in one case, an epithet adjective
in the other, the same adjective functioning as attribute of the object.
For example, sentence 1 of the corpus, Il a trouvé cette idée folle., can be
paraphrased as Il a conçu cette idée folle. (epithet adjective) on the one hand,
and as Il a jugé cette idée folle. (a.c.o. adjective) on the other.
To this end, the segmentally identical sentence was placed in two different
co-texts that induce two different functions of the adjective. These co-texts
were very simple and had no literary pretensions; they were devised solely to
give the adjective a clearly distinct function. For example, for this first
sentence:
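Co-text 1: Il cherche toujours à se faire remarquer. Il a trouvé cette idée folle. Il s’est acheté une chemise violette. (“He is always trying to get noticed. He came up with this crazy idea. He bought himself a purple shirt.”)
Co-text 2: Elle lui a suggéré d’acheter une chemise violette. Il a trouvé cette idée folle. (“She suggested that he buy a purple shirt. He found this idea crazy.”)
Or, taking sentence 6 of the corpus, Il boit son chocolat froid.:
Co-text 1: Il fait très chaud. Il entre dans un café et se commande un chocolat froid. Il regarde sa montre. Il boit son chocolat froid. Il sort. (“It is very hot. He goes into a café and orders a cold chocolate. He looks at his watch. He drinks his cold chocolate. He leaves.”)
Co-text 2: Il se sert son chocolat bien chaud, bien fumant. Il s’attarde plus qu’il ne l’aurait fallu à sa lecture. Il boit son chocolat froid. Il part travailler. (“He pours himself a piping hot, steaming chocolate. He lingers over his reading longer than he should have. He drinks his chocolate cold. He leaves for work.”)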
Ten sentences were thus assembled into a small corpus¹, in practice two corpora,
one with the adjectives as epithets, the other with the adjectives functioning as
attributes of the object.
The aim was to identify cues used to encode the two functions, not to carry out
an exhaustive and rigorous statistical study of the production of sentences with
an epithet adjective or an attribute of the object. This corpus (cf. Figure 1)
was therefore recorded by six speakers (including myself²), three women
(designated loc1, loc2 and loc3) and three men (loc4, loc5 and loc6). The corpus
of epithet adjectives was always recorded before that of the attributes of the
object, following the order adopted in the presentation of Figure 1.
1 A first corpus, produced by a single female speaker, was initially analyzed in a preliminary study
presented to a CRISCO research group on the adjective. That corpus was modified, partly thanks to
remarks from members of the group, because it contained two sentences that posed problems for the
analysis.
1° One sentence was something of an "intruder", since its adjective can be not an attribute of the
object complement but an attribute of the complement of the presentative. This was the sentence
Voilà la question insoluble. It had been included in the corpus to test, at the prosodic level, what
Riegel et al. (1994) say about the attribute of the complement of the presentative (p. 241), namely:
Sequences introduced by the presentatives voici, voilà and by the impersonal verb falloir
occupy the structural position of a direct object (c.o.d.). They may be followed by a
predicative element functioning as an a.c.o.: Le voici enfin libre.
But the analysis showed that the prosodic realization of such a sentence differed from that of a
sentence containing an a.c.o. adjective.
2° Another sentence was modified because the phonetic form of the adjective was the same in the
masculine and the feminine. This was: Il a acheté cette voiture chère. As a consequence, the adjective
was realized like the adverb cher. The prosodic realization of this sentence differed clearly from that
of the others, because the adjective chère no longer functioned as an a.c.o. but as an adverbial
(complément circonstanciel).
2 There is no significant difference between my realization and that of each of the two other male
speakers. This is shown by 2 Wilcoxon signed-rank tests (significance level .05) on the differences of
the means, for the last 3 parameters, between each sentence with an epithet adjective and the
corresponding sentence with an a.c.o. adjective; moreover, I did not produce any pausette.
3. Previous contributions to the description of the prosody of epithet and
a.c.o. adjectives
There are no real studies of the prosody of sentences containing epithet
adjectives and attributes of the object, only more or less detailed
impressionistic descriptions. The most recent dates from 1999: Noailly (p. 120)
comments on the sentence Lise voudrait un mur jaune. by saying that one is
undecided, everything depending on the context and on the intonation, which is
more or less linked (“plus ou moins liée”).
This can be glossed as follows:
the adjective jaune is an epithet if the intonation is linked
it is an attribute of the object if there is a break in the intonation.
This break in intonation, occurring in sentences with adjectives as attributes of
the object, had already been described by Damourette and Pichon (1911-1940) in
volume II of their work (p. 18), in connection with the sentence: Je veux ma robe
rouge.
According to them:
The confusion can arise for the dianathète de l'ayance. Take the sentence
« Je veux ma robe rouge ». Is rouge an épithète or a diathète? Several criteria
make it possible to decide:
If there is a pausette after robe, we are dealing with an (échoite) dianathète:
at the same time as I order my dress, I indicate my wish that it be red; if
there is no pause, we are dealing with an épithète: I express the wish to have
the one of my dresses that is red. The addressee is thus informed of the
speaker's intention by the vocal pause.
According to the glossary of special terms, or terms used in a special sense, in
the Essai de grammaire, which they include in their Compléments:
DIATHÈTE: attribute with adjectival value: « Je suis grand. »
ÉCHOITE: attribute of a complement other than the subject.
DIANATHÈTE: attribute with adjectival value and moderately tight
attachment: « Petit poisson deviendra grand. »
AYANCE: direct object complement.
It follows that the expression dianathète de l’ayance designates the function of
attribute of the direct object complement.
4. The pausette
In the passage quoted, Damourette and Pichon use the term pausette, which rests
on a classification of pauses proposed in volume I (§169, p. 188). They
distinguish three types of pause:
1. Grandes pauses: sentence-final pauses, ordinarily marked by the full stop.
2. Pausules: small pauses, ordinarily marked by the comma.
3. Pausettes: very small pauses for which current spelling unfortunately has no
punctuation mark, although the need for one is felt at every moment.
Or, more simply, according to their glossary:
PAUSULE: vocal pause (spelled pose in the original, sic) ordinarily marked by a comma.
PAUSETTE: vocal pause smaller than that ordinarily marked by a comma.
Thus the presence of a short pause between the noun and the adjective is,
according to them, the prosodic cue to an attribute-of-the-object function, while
the absence of such a pause helps the adjective to be interpreted as an epithet.
5. In search of pausettes
The first question to answer was whether Damourette and Pichon's pausettes are a
cue that distinguishes attributive adjectives from object complements. If so, one
could then ask how important this cue is and whether it appears systematically. To
find out, it was enough to look for a silence between the noun and the following
adjective. This analysis revealed six pausettes of between 50 ms and 100 ms (72 ms on
average). Out of the 60 sentences with an adjective in a.c.o. function, this is only
10% of what could have been produced. Only one pausette was produced by a man; all
the others occur in the productions of the 3 female speakers.
The example chosen by Damourette and Pichon to illustrate the role of the pausette
as a prosodic cue of the a.c.o. is consistent with this observation, since it can
only be uttered by a female speaker: Je veux ma robe rouge.
Moreover, 3 of the 6 pausettes occur in sentence 8, between homme and
intraitable in J'ai connu cet homme intraitable ('I knew this intractable man' or
'I knew this man when he was intractable'), followed by the sentence Il est
maintenant doux comme un agneau ('He is now as gentle as a lamb'). This is
illustrated in Figure 2, where a pausette of 68 ms between homme and intraitable
can be seen in the upper curve.
Figure 2. Comparison of the shapes of the two curves for sentence 8 as produced
by the first female speaker (loc1). Note the 68 ms pausette (between homme
and intraitable), shown by the break in the line of the upper curve.
Other prosodic cues were also sought. They are first presented
separately, and their relative importance is then assessed.
6. Comparing the shape of the curves
The shape of the curves was captured by measurements taken on the vowels, namely
their duration and their frequency; more precisely:
• the initial frequency of the fundamental
• the final frequency of the fundamental
The fundamental frequency of voiced consonants was not measured for the
vowel-by-vowel comparison, but it was taken into account in describing the ends of
sentences. Frequency was most often measured with the AMDF (Average Magnitude
Difference Function) pitch extractor proposed by Ross et al. (1974). Occasionally,
for portions where the signal was too weak, three other algorithms were used:
• the comb function proposed by Martin (1981),
• an algorithm based on the autocorrelation method of Boersma (1993),
• a simple FFT (Fast Fourier Transform), which appears to have given
accurate measurements.
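As an illustration of the principle behind an AMDF pitch extractor, here is a minimal Python sketch. It is not the implementation used in PHONÉDIT, SPEECH ANALYZER or PRAAT; the function name amdf_f0, the search range of 75 to 400 Hz and the synthetic test signal are assumptions made for this example. The AMDF averages the absolute difference between a frame and a delayed copy of itself; for a periodic signal it falls to a minimum when the delay equals one period.

```python
import math

def amdf_f0(frame, fs, fmin=75.0, fmax=400.0):
    """Estimate F0 (Hz) of one voiced frame from the minimum of the AMDF.

    fmin and fmax bound the search and are assumed values, not taken
    from the study.
    """
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    if len(frame) <= max_lag:
        raise ValueError("frame too short for the requested F0 range")
    n = len(frame) - max_lag  # same number of terms for every lag
    best_lag, best_val = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        # mean absolute difference between the frame and its delayed copy
        val = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if val < best_val:
            best_lag, best_val = lag, val
    return fs / best_lag

# Synthetic check: a 100 Hz sine sampled at 8 kHz has a period of 80 samples.
fs = 8000
frame = [math.sin(2 * math.pi * 100 * i / fs) for i in range(400)]
estimate = amdf_f0(frame, fs)
print(estimate)
```

Restricting the lag range matters: the AMDF also dips at multiples of the period, so an unbounded search could return a sub-harmonic of the true F0.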
These algorithms are available in three software packages: PHONÉDIT, SPEECH
ANALYZER and PRAAT. Pitch was then evaluated by converting the frequencies to
semitones, with 100 Hz as the reference value (100 Hz = 0 semitones; any value
below 100 Hz becomes negative when expressed in semitones).
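The conversion to semitones relative to 100 Hz follows the standard relation st = 12 · log2(f / 100). A minimal Python sketch is given below; the function name hz_to_semitones is chosen for this example and does not come from the study.

```python
import math

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    if f_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_hz / ref_hz)

octave_up = hz_to_semitones(200.0)    # one octave above the reference
reference = hz_to_semitones(100.0)    # the reference itself
octave_down = hz_to_semitones(50.0)   # one octave below: negative value
print(octave_up, reference, octave_down)
```

Working in semitones makes pitch movements comparable across speakers with different registers, since equal musical intervals correspond to equal differences on this scale.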
7. Vowel duration
One strategy for differentiating the two functions could have been to lengthen the
vowels of sentences containing an a.c.o. adjective. A vowel-by-vowel comparison of
durations (paired samples) could show whether the vowels of sentences with an
object-complement adjective are longer than those of sentences with an attributive
adjective; this was checked by comparing the durations of each pair of vowels. For
each sentence, the significance of the differences was tested either with Student's
t-test (if the sample distributions were normal according to the Lilliefors test)
or with the Wilcoxon test.
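The paired comparison can be sketched as follows in Python. This is only an illustration under stated assumptions: the duration values are hypothetical, the function name paired_t is chosen here, and only the t statistic is computed (turning it into a p-value requires the Student t distribution, which the standard library does not provide, and the Lilliefors normality check is not shown).

```python
import math
import statistics

def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return mean_d, t, len(diffs) - 1

# Hypothetical vowel durations in ms (NOT data from the study):
# the same vowels in the a.c.o. reading and in the attributive reading.
aco = [120.0, 95.0, 140.0, 110.0, 130.0]
epi = [100.0, 80.0, 115.0, 95.0, 105.0]
mean_d, t, df = paired_t(aco, epi)
print(mean_d, round(t, 3), df)
```

Pairing vowel by vowel removes the variation due to intrinsic vowel length, so only the within-pair difference between the two readings is tested.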
This holds for only 3 sentences, nos. 1, 4 and 6, produced by the first female
speaker (loc1), in which the vowels of the sentences with a.c.o. are on average
20.3 ms longer than those of the sentences with an attributive adjective
(significance level of Student's t < .02).
A mixed model (see note 3), with duration as the dependent variable, the category
attributive/a.c.o. as a fixed factor and the speakers as a random factor, was used
to test the relevance of this cue for distinguishing the categories. Its relevance
was confirmed at a significance level below .01 (< .0001).
8. Looking for a systematic difference in pitch
Pitch was compared for each pair of vowels. For each sentence, the significance
of the differences was checked either with Student's t-test (if the sample
distributions were normal according to the Lilliefors test) or with the Wilcoxon
test.
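When the differences cannot be assumed to be normally distributed, the Wilcoxon signed-rank test is the non-parametric alternative. The Python sketch below computes the statistic W+ and an exact two-sided p-value by enumerating all sign assignments, which is practical only for small samples. The pitch values are hypothetical and the function name wilcoxon_signed_rank is chosen for this example; statistical packages may use a different variant (for instance a normal approximation).

```python
from itertools import product

def wilcoxon_signed_rank(a, b):
    """Return (W+, exact two-sided p-value) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    w_obs = min(w_plus, total - w_plus)
    # Exact p-value: share of the 2**n sign assignments at least as extreme.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            extreme += 1
    return w_plus, extreme / 2 ** n

# Hypothetical vowel pitch values in semitones (NOT data from the study).
aco = [1.5, 2.0, 3.5, 4.0, 6.0, 7.5]
epi = [1.0, 1.0, 2.0, 2.0, 3.5, 4.5]
w_plus, p_value = wilcoxon_signed_rank(aco, epi)
print(w_plus, p_value)
```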
There is a significant difference in pitch for 10 sentences: 7 positive
differences (11.67%) and 3 negative differences (5%). Sentences 4 and 8
alone account for 5 of the positive differences, and it will be recalled that they
contain 4 of the 6 pausettes found in the corpus, 3 for sentence 8 and 2 for
sentence 4. But this sentence 4 shows a negative difference of 3 tones.
A mixed model, analogous to the one used to confirm the relevance of the
duration cue but with pitch as the dependent variable, was used. A
significance level below .01 (= .007) here too confirmed this
relevance.
Note 3. In the mixed model used for this study, the means of vowel duration of the
sentences with an attributive adjective, on the one hand, and of those with an
a.c.o. adjective, on the other, were compared. The two other mixed models discussed
further on concerned the means of pitch and of the final melodic rises.
9. The final melodic rise as a cue
48 of the 120 sentences (40%) show a final rise in the melody. This rise is
sometimes realized only within the last vowel: this is a glissando (a melodic rise
within a vowel). But the rise may also begin at the onset of the consonant
preceding this final vowel and continue to the end of the following consonant: in
these last two cases there is what may be called an "intraconsonantal rise", as
opposed to a "simple glissando". On two or three occasions, the rise ends in the
realization of a schwa, thus extending over two vowels and a consonant. Four
configurations are therefore possible:
• glissando
• intraconsonantal rise + glissando
• intraconsonantal rise + glissando + intraconsonantal rise
• intraconsonantal rise + glissando + intraconsonantal rise + schwa (cf.
Figure 3)
Figure 3. Sentence 3, Il a vu le feu orange ('He saw the amber light' or 'He saw
the light when it was amber'), with a.c.o. adjective, produced by speaker 6. This
realization is notable for a final melodic rise of just over 7 tones (the largest
rise recorded in this study).
Of these 48 sentences, 29 contain an object-complement adjective (about 48% of the
60 sentences in this category, or about 24% of the whole corpus), and 19 contain an
attributive adjective. Table 1 presents the data for the two sets of
sentences.
Here too, the relevance of this cue is confirmed by a mixed model (with the
final melodic rise as the dependent variable), the significance level here being
.001.
10. Assessing the relative importance of each cue
Cluster analyses, in this case k-means classifications, were carried out to check
the importance of each cue in discriminating between the two functions,
attributive and object complement. Each analysis was run with 1000 iterations.
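To make the procedure concrete, here is a minimal Python sketch of a two-class k-means classification on a single cue, the pausette duration. Several things are assumptions made for this illustration: the six individual durations are hypothetical (chosen only to respect the reported range of 50 to 100 ms and the mean of 72 ms), sentences without a pausette are coded as 0 ms, the centroids are seeded at the minimum and maximum, and the 1000 iterations are treated as an upper bound. The software and settings actually used may differ.

```python
def kmeans_1d(values, max_iter=1000):
    """Two-class k-means on scalars (Lloyd's algorithm), seeded at min and max."""
    centroids = [min(values), max(values)]
    labels = []
    for _ in range(max_iter):
        # assignment step: each value goes to the nearest centroid
        labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                  for v in values]
        # update step: each centroid becomes the mean of its members
        new = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new.append(sum(members) / len(members) if members else centroids[k])
        if new == centroids:  # converged
            break
        centroids = new
    return labels, centroids

# Pause duration in ms per sentence; 0.0 means no pausette was produced.
pauses_epi = [0.0] * 60                                          # attributive sentences
pauses_aco = [0.0] * 54 + [52.0, 60.0, 68.0, 72.0, 80.0, 100.0]  # hypothetical values
labels, centroids = kmeans_1d(pauses_epi + pauses_aco)

# Cross-tabulate the classes against the known function of the adjective.
epi_counts = [labels[:60].count(k) for k in (0, 1)]
aco_counts = [labels[60:].count(k) for k in (0, 1)]
print(epi_counts, aco_counts, centroids)
```

Because k-means is unsupervised, the known function of the adjective is used only afterwards, to see how well the classes found from the cue alone line up with the two functions.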
Ideally, the analysis would have placed the 60 sentences with an attributive
adjective in a class 1 and the 60 sentences with an a.c.o. adjective in a class 2.
The result is very far from this ideal. But even though it is only remotely
approached, the proportions among the different results can be used to assess the
relative importance of the 4 cues. Thus, within each class, the difference between
the number of sentences with an attributive adjective and the number with an a.c.o.
adjective gives an estimate of the importance of each cue for discriminating
between the two functions. This is summarized in Table 2.
Table 2. Results of the k-means classification tests. Class 1 contains
(except for the pitch cue) a larger number of sentences with an attributive
adjective, and class 2 a larger number of sentences with an a.c.o. adjective;
the difference between the two types of sentence within each class is given in
the last row.
                                  pausette    final melodic rise    duration      pitch
class                              1     2        1     2           1     2      1     2
sentences with attributive adj.   60     0       47    13          39    21     30    30
sentences with a.c.o. adj.        54     6       43    17          37    23     30    30
difference                           6              4                 2            0
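The last row of Table 2 and the resulting ranking of the cues can be recomputed directly from the counts in the table. The short Python sketch below does this; the dictionary layout and the function name class_difference are choices made for this example.

```python
# Counts from Table 2: (class 1, class 2) for each cue and adjective function.
counts = {
    "pausette":           {"epithet": (60, 0),  "aco": (54, 6)},
    "final melodic rise": {"epithet": (47, 13), "aco": (43, 17)},
    "duration":           {"epithet": (39, 21), "aco": (37, 23)},
    "pitch":              {"epithet": (30, 30), "aco": (30, 30)},
}

def class_difference(cue):
    """Difference between the two sentence types within class 1.

    Both sentence types total 60, so the difference is the same in class 2.
    """
    return abs(cue["epithet"][0] - cue["aco"][0])

differences = {name: class_difference(cue) for name, cue in counts.items()}
ranking = sorted(differences, key=differences.get, reverse=True)
print(differences)
print(ranking)
```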
The pausette turns out to be the cue with the greatest discriminating power.
This may seem surprising, since only 6 pausettes were produced out of a possible
60. But the other cues also occur only to a very limited extent. The final melodic
rise comes next in this ranking, with a difference of 4 items in each class
(4 more sentences with an attributive adjective in class 1, and 4 more sentences
with an a.c.o. adjective in class 2). The discriminating role of the duration cue
is very small, since the difference is only 2 items. Finally, the vowel pitch cue
has no discriminating role at all, since each class contains equal numbers of
sentences with an adjective of each function.
The hierarchy of the cues is therefore: pausette, final melodic rise, duration
and pitch. Since pitch appears to have no discriminating function, it does not
act as a cue that differentiates the two functions of the adjective in these
sentences. Moreover, this result was obtained from too small a number of speakers
to be statistically significant.
Nevertheless, some cues that may be used in encoding have been identified.
The question then was whether these cues also play a part in decoding, and how
listeners obtain an indication of the function of an adjective placed
immediately after a direct-object noun; this was investigated by means of
perception tests.
11. Perception tests
To determine more precisely the role of each cue in signalling the function, two
tests using natural and synthetic stimuli were prepared from two sentences of the
corpus produced by the first female speaker (loc1): sentence no. 8, J'ai connu cet
homme intraitable, and sentence no. 6, Il boit son chocolat froid ('He drinks his
cold chocolate' or 'He drinks his chocolate cold').
For each test, the sentence with attributive adjective and the sentence with
a.c.o. adjective produced by speaker 1 served as stimulus st000 and stimulus
st100 respectively. The other stimuli were derived from these sentences by
manipulation with the Praat software. In the first test the pausette lasted
68 ms, and in the second test the final melodic rise was 5.5 semitones. The
nature of the stimuli is summarized in Table 3. After a preliminary listening
phase, the stimuli were presented in random order, at intervals of 7 s, to
18 subjects (first-year students of modern literature). Each series of stimuli
was repeated 3 times.
Table 3. Description of the stimuli used in the two perception tests; the
pausette lasted 68 ms and the final melodic rise was 5.5 semitones.
An answer sheet had to be filled in as follows. In the first test (sentence: J'ai
connu cet homme intraitable), for each stimulus presented, subjects were asked to
tick a yes or no box in answer to the question: l'homme a-t-il changé ? ('Has the
man changed?'). The question in the second test (sentence: Il boit son chocolat
froid) was: le chocolat a-t-il refroidi ? ('Has the chocolate gone cold?'), and
again a box had to be ticked. Cases where no answer was given were included in the
analysis (they are labelled nsp in the graphs).
The overall analysis of the results showed how difficult the tests were, since
no statistically significant conclusion could be drawn for the 18 listeners taken
together. A selection therefore had to be made among the results: 6 listeners were
chosen on the basis of the diversity and accuracy of their answers to the
unmodified stimuli.
A correspondence analysis (analyse factorielle des correspondances, AFC) was
carried out, for the first test, on the data obtained, represented in the
contingency table of Figure 4. The test of independence between rows and columns
(chi-square) is significant at .006.
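The chi-square test of independence compares each observed count with the count expected if rows (stimuli) and columns (responses) were independent. The Python sketch below computes the statistic and its degrees of freedom; the small table is a toy example and not the data of the study, and the function name chi_square is chosen here. Obtaining a p-value additionally requires the chi-square distribution, which is not shown.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    table is a list of rows of observed counts; every row and column total
    must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Toy 2 x 2 table (hypothetical counts, NOT the results of the perception test).
toy = [[10, 20],
       [20, 10]]
stat, df = chi_square(toy)
print(round(stat, 3), df)
```

For a table with 8 stimuli and 3 response categories, as in the first test, the same formula gives (8 - 1) x (3 - 1) = 14 degrees of freedom.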
Figure 4. Contingency table of the results of the first test; of the 18 initial
subjects, 6 were selected, and a series of 8 stimuli was presented 3 times,
giving a total of 144 data points.
[Figure 4 plot: 3D view of the contingency table. Rows: stimuli st000, st001, st010, st011, st100, st101, st110, st111; columns: nsp, aco, épi; vertical axis: counts from 0 to 14.]
Figure 5 shows the map of this analysis. It reveals a very clear distribution of
the stimuli along axis F1:
o the stimuli whose adjective is perceived as a.c.o. all contain a
pausette
o the stimuli whose adjective is perceived as attributive contain no
pausette
o there is one exception, however: when the pausette is the only cue of the
a.c.o. function (st100), the stimulus lies on the boundary but is perceived
as having an attributive adjective.
Along axis F2:
o a raising of the mean pitch of the sentence ensures better
identification of the a.c.o. function of the adjective
o the absence of such raising leaves room for doubt, even when a pausette
and relative lengthening are both present (st101)
Figure 5. Map of the correspondence analysis of the results of the first test.
Axis F1 separates the sentences with a pausette, on one side, from the other
sentences, on the other. Along axis F2, a raising of the mean pitch of the
sentence ensures better identification of the a.c.o. function of the adjective,
and the absence of such raising leaves room for doubt.
[Figure 5 plot: symmetric plot of rows (stimuli st000 to st111) and columns (aco, épi, nsp); horizontal axis F1 (84.46 %), vertical axis F2 (15.54 %); axes F1 and F2 together: 100.00 %.]
A correspondence analysis was also carried out on the data from the second test;
their contingency table is shown in Figure 6. The test of independence between
rows and columns (chi-square) is even more significant here (.0001).
Figure 6. Contingency table of the results of the second test. As there were
only 4 stimuli per series, the number of data points is only 72.
Figure 7 shows the map of this analysis. It can be seen that:
• along axis F1, the final melodic rise plays exactly the same role as the
pausette in the identification of the a.c.o. function (stimuli with rise) and
of the attributive function (stimuli without rise). There is thus an analogy
with the results of the first test.
• axis F2 shows that the presence of the final melodic rise is sufficient on
its own for reliable recognition of the function of the adjective, unlike the
pausette.
[Figure 6 plot: 3D view of the contingency table. Rows: stimuli st000, st100, st011, st111; columns: nsp, aco, épi; vertical axis: counts from 0 to 18.]
Figure 7. Map of the correspondence analysis of the results of the second test.
Axis F1 shows that the final melodic rise is perceived as a prosodic cue of the
a.c.o. function; at the level of perception, the final melodic rise and the
pausette are analogous and complementary.
12. Conclusion
Two main cues for distinguishing the attributive function from the a.c.o.
function, in encoding as in decoding, emerge: the final melodic rise and the
pausette. It is surprising, however, that the pausette, the only cue that had
previously been described, is little used and, with one exception, used only by
female speakers.
The final melodic rise is used more widely, but it is also found in sentences
with an attributive adjective; this does not impair its role in discriminating
between the two functions, as the perception test shows.
Finally, a higher mean pitch of the sentence reinforces the discriminating role
of the pausette in decoding. These cues are little used, but they belong to
prosody, which explains their optional character. This study was based on a
corpus of read sentences. The validity of the cues identified now needs to be
verified on corpora of spontaneous speech.
[Figure 7 plot: symmetric plot of rows (stimuli st000, st100, st011, st111) and columns (épi, aco, nsp); horizontal axis F1 (87.60 %), vertical axis F2 (12.40 %); axes F1 and F2 together: 100.00 %.]
References
Bachelet, R. (2010). L’analyse factorielle des correspondances. http://rb.ec-lille.fr
Blanche-Benveniste, C. (1991). Deux relations de solidarité utiles pour l’analyse de l’attribut. In Gaulmyn, M.M., Rémi-Giraud, S. & Basset, L. (eds.), À la recherche de l'attribut. PUL, Lyon, pp. 83-98.
Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proc. of the Institute of Phonetic Sciences of the University of Amsterdam 17, pp. 97-110.
Cibois, P. (2007). Les méthodes d’analyse d’enquêtes. PUF, Paris.
CRISCO (2011). Dictionnaire des synonymes. http://www.crisco.unicaen.fr/
Damourette, J. & Pichon, E. (1911-1940). Des mots à la pensée. Essai de grammaire de la langue française, tomes I, II et Compléments. D'Artrey, Paris.
Fuchs, C. (1996). Les ambiguïtés du français. Ophrys, Gap-Paris.
Le Goffic, P. (1993). Grammaire de la phrase française. Hachette, Paris.
Martin, F. (2006). Prédicats statifs, causatifs et résultatifs en discours – Sémantique des adjectifs évaluatifs et des verbes psychologiques. Doctoral thesis, Université libre de Bruxelles.
Martin, Ph. (1981). Mesure de la fréquence fondamentale par intercorrélation avec une fonction Peigne. Actes des XIIèmes Journées d’Étude sur la Parole, Montréal.
Noailly, M. (1999). L'adjectif en français. Ophrys, Gap-Paris.
Ploux, S. (1997). Modélisation et traitement informatique de la synonymie. Linguisticae Investigationes 21/1, pp. 1-28.
Ploux, S. & Victorri, B. (1998). Construction d’espaces sémantiques à l’aide de dictionnaires de synonymes. Traitement automatique des langues 39, n°1, pp. 161-182.
Riegel, M. (1991). Pour ou contre la notion grammaticale d'attribut de l'objet: critères et arguments. In Gaulmyn, M.M., Rémi-Giraud, S. & Basset, L. (eds.), À la recherche de l'attribut. PUL, Lyon, pp. 99-118.
Riegel, M., Pellat, J.C. & Rioul, R. (1994). Grammaire méthodique du français. PUF, Paris.
Ross, M.J., Schaeffer, H.L., Cohen, A., Freudberg, R. & Manley, H.J. (1974). Average Magnitude Difference Function Pitch Extraction. IEEE Trans. ASSP-22, pp. 353-362.
Thomas, I. (2003). Quels types de données pour la traduction automatique de l’adjectif qualificatif dans les groupes ADJ NOM/NOM ADJ : vers une approche ontologique et contextuelle. Bulletin de Linguistique appliquée et générale 28, pp. 255-274.
Software used
PHONÉDIT, developed by S.Q.Lab in collaboration with the Laboratoire Parole et Langage, Aix-en-Provence (C.N.R.S. URA 261).
PRAAT, speech analysis and synthesis software developed by Paul Boersma and David Weenink, Phonetic Sciences, University of Amsterdam.
SPEECH ANALYZER version 3.0.1 (2007), developed by SIL (Dallas).
XLSTAT, statistics and data analysis software developed by Addinsoft.
OBITUARIES
Eli Fischer-Jørgensen (1911-2010)
At the age of ninety-nine, Emeritus Professor Eli Fischer-Jørgensen died at her
home in Denmark in February, 2010. This marked the end of a very long and
distinguished career that had begun in 1929 with studies of the French and
German languages which were firmly in the Danish tradition stemming from great
scholars of the linguistic sciences, such as Otto Jespersen. While still a student,
she was accepted into the Linguistic Circle of Copenhagen, which was famous for
the "glossematic" theories of Louis Hjelmslev. He was a scholar who may be easy
to overlook due to the fact that he collaborated with a colleague (Poul Andersen)
to produce a practical textbook for their students of phonetics. While still a
student, Eli developed her lifelong passion for integrating observational and
instrumental phonetic work with phonological theory. Graduating MA in 1936,
she set off on travels to and sojourns in places which included Marburg (for
German dialectology), Paris to work with Martinet and Marguerite Durand, and
Berlin to study with Eberhard Zwirner. Returning home just before the outbreak of
World War II, she got work in the Department of German which, in due course,
morphed into a lectureship in phonetics created for her under the aegis of
Hjelmslev.
After the War, she extended her experience by visits to London to the
Phonetics Department at University College to study with Jones and Hélène
Coustenoble, and also to the School of Oriental and African Studies to attend
lectures by J. R. Firth as well as lectures on Yoruba and Chinese. Other journeys took her
to America to the Haskins Laboratories and to Stockholm to cooperate with
Gunnar Fant. At home, her work became recognised by the creation of a Chair of
Phonetics for her in 1966 and an associated institute. Fruitful connections with
colleagues at Lund also followed.
As time went by, she became the host herself of researchers from abroad,
including individuals from Japan, Edinburgh, Berkeley and Germany. A most
memorable and brilliantly managed (very much by her) occasion was the 1979
visit to Copenhagen for the Ninth International Congress of Phonetic Sciences.
This was something of a swan song for her since two years later, on her reaching
70, regulations no doubt required her to relinquish her post.
Her varied publications were far too many to detail here. They included a
classic account of the Danish stød, the historical Tryk i ældre dansk (on Stress in
Old Danish), Trends in phonological theory and her accounts of the phonetic
symbolisms of vowels. Nor should her modest, concise, clear summary of general
phonetics for her Danish students, Almen Fonetik, be quite forgotten. She was held
in high esteem amongst her friends for her gifted water colours. She'll be
remembered for a long time to come.
Jack Windsor-Lewis
Eva Sivertsen (1922-2010)
Eva Sivertsen was born on the 8th of July, 1922 at Trondheim, the ancient city on
the shores of a fjord in the middle of Norway's thousand-mile coastline. She
graduated in English at the University of Oslo continuing her studies there with a
Ph.D. on the famous dialect of working-class Londoners, known as Cockney. This
activity developed after some years of further work into the 280-page book
published by Oslo University Press in 1960 as Cockney Phonology. She did much
of her work on Cockney from a base at University College London's Department
of Phonetics, but also lived for a while among her main informants at a social
settlement in the East End area of Bethnal Green. Besides the influence of the
contemporary and previous UCL staff which she clearly acknowledged, she
became a great enthusiast for the work of the American structuralists. The
influence of Charles F Hockett certainly pervades the whole book.
A three-page review of it in Le Maître Phonétique by J. D. O'Connor began
"the standard work on Cockney Phonetics has now been written" and ended with
"altogether a splendid book". She included in it also an admirable "conspectus of
the general problems posed by the phonological analysis of English" thus making
it "two books in one".
Besides being a brilliant scholar she was an equally gifted administrator, as
was seen when she became a principal organiser of the Eighth International
Congress of Linguists in 1957 and edited its volume of Proceedings. In 1960, she
headed the Department of English at Trondheim University. She ultimately
became the Rektor of the whole University. She always maintained an interest in
the teaching of English as an extra language in its grammar and other linguistic
features, as well as its phonology. She was an outstandingly energetic person
physically, as well as intellectually — much given to outdoor pursuits with
remarkable endurance. She never married, but she had many friends by whom she
was well liked.
Jack Windsor-Lewis
Gösta Bruce (1947-2010)
(picture courtesy of Daniel Bruce)
Gösta Bruce, Professor of Phonetics at Lund University, Sweden, passed away on
June 15, 2010, following a short period of hospitalization. He was 63 years old.
Gösta Bruce is survived by his wife, Barbro, and his children Sara (with partner
Valtteri), Daniel, and Niklas.
Born and brought up in the southern Swedish town of Helsingborg, Gösta
chose to continue his higher education at Lund University, 60 km south of
Helsingborg. After an undergraduate degree in Russian, Gösta went on to study
phonetics, drawn to the department where Bertil Malmberg and Kerstin Hadding
had developed the field of phonetics as an experimental discipline at the
Humanities faculty at Lund University. Under the direction of Hadding’s
successor, Eva Gårding, Gösta Bruce developed the Lund model of intonation. He
carried the phonetic analysis of Swedish word accents in a new direction by
analysing them with respect to their syntactic position and pragmatic function
(focus) in utterances. His seminal dissertation, Swedish Word Accents in Sentence
Perspective (1977) laid the theoretical foundation for the development of ideas
about how intonational phenomena could be analysed as components in a
hierarchical prosodic structure. These fundamental ideas on intonational structure
and their relation to syntax and pragmatics have since been adopted and developed
by many researchers the world over.
Following a research stay at Bell Labs in 1984, as well as a period as a
visiting professor at Stockholm University during 1985-1986, Gösta Bruce was
appointed to the chair of phonetics at Lund University in 1986. The contributions
to the festschrift for Gösta on the occasion of his 50th birthday in 1997 (Horne,
2000) bear witness to the influence that his work had for researchers, not only in
phonetics, but also in general linguistics and in speech technology.
Although Gösta Bruce’s model was based on the prosodic patterning of
‘standard’ central Swedish, Gösta’s own dialect, that of Helsingborg in the
southern province of Scania, differed quite considerably from that of the standard
variety. This variation in the patterning of word accents in Swedish dialects was an
area that intrigued Gösta as it had earlier Eva Gårding (1977) and Ernst Meyer
(1937–1954). Gösta Bruce followed in their footsteps and carried the investigation
of dialectal variation to new heights in his work on prosodic modeling. Although
the phonetic realization of the two Swedish word accents differs quite
considerably dialectally, the crucial timing difference between the word accents
with respect to the stressed syllable is something that is constant for all dialects
and is something which fascinated Gösta. He had an extremely sensitive ear for
tonal variation and timing, and in recent years, his work was focused on
systematizing this variation as regards Swedish dialect prosody in several
externally financed research projects such as SweDia 2000 and SIMULEKT.
Shortly before his untimely death, his vast accumulated knowledge on the varieties
of Swedish was published in his book Vår fonetiska geografi ‘Our phonetic
geography’ (Bruce, 2010).
Gösta’s sensitivity for timing differences also led to a number of novel
studies on rhythmic structure in Swedish. By carrying out a number of innovative
experimental studies on differences in the duration of unstressed syllables, he
could show how rhythmic alternation was created postlexically in strings of
nonprominent syllables (Bruce, 1987).
Gösta Bruce was not only a creative researcher and scientist; he was also a
dedicated and respected teacher. His undergraduate courses on prosody, Swedish
dialect variation and sounds of the world’s languages were always highly
evaluated. At the time of his premature death, Gösta was planning to rework and
update his very popular course book on Swedish prosody (Bruce, 1998). On the
graduate level, Gösta was regularly engaged in doctoral courses on both a local
and national level. He was a devoted teacher and supervisor, and during his career,
Gösta supervised 13 doctoral dissertations. He sincerely cared about his students
and constantly inspired and encouraged them, both by his words of wisdom and by
his empathetic manner. His humor, often spontaneously expressed in terms of
perfect sound imitation (everything from different Swedish dialects to Russian
intonation to complex African click consonants), was another productive outlet for
his very creative mind.
Despite all his research and teaching duties, Gösta Bruce played an important
role in academic leadership at Lund University. During his time as professor, he
served as head of the department of linguistics and phonetics, vice dean of the
humanities faculty, chairman of the appointments’ board for language and
linguistics, and most recently, member of the board of research at the Center for
Languages and Literature. He was also engaged as an expert evaluator at the
Swedish and Norwegian Research Councils and was a member of the editorial
board of Phonetica. In addition, he was an active member in several learned
societies, including The Royal Swedish Academy of Letters, History, and
Antiquities.
In 2007, Gösta Bruce was appointed president of the International Phonetic
Association. In this role, Gösta saw the opportunity to approach a discussion of
fundamental issues related to the future of the discipline of phonetics, including
the relationship of prosodic research within a larger interdisciplinary perspective
where phonetics plays a central role in understanding speech processing
phenomena. Due to his untimely death, however, many of Gösta’s plans were
tragically left at the planning stage.
Following a suggestion by Gösta’s family at the time of his funeral, the IPA
set up a memorial fund to honor Gösta and his accomplishments. Since that time,
the IPA Council has decided to make the fund a permanent fund. The Gösta Bruce
Memorial Fund is intended to serve as a means to support students in phonetics
and speech sciences by awarding scholarships in Gösta’s name that will assist
them in traveling to ICPhS conferences in order to meet other speech scientists
and present their research results to the international community. Nothing could be
more fitting to keep the memory of Gösta Bruce’s many scientific
accomplishments and his constant devotion to developing knowledge of phonetics
alive.
References
Bruce, Gösta. 1977. Swedish word accents in sentence perspective. (Travaux de l’Institut
de linguistique de Lund XII). Lund: Gleerup.
Bruce, Gösta. 1987. On the phonology and phonetics of rhythm: Evidence from Swedish.
In Dressler, W., Luschützky, H., Pfeiffer, O. & Rennison, J. (Eds.), Phonologica
1984. Proceedings of the Fifth International Phonology Meeting, Eisenstadt, 25–28
June 1984, pp. 21-32. Cambridge: Cambridge University Press.
Bruce, Gösta. 1998. Allmän och svensk prosodi [General and Swedish prosody]. (Praktisk
lingvistik 16). Dept. of linguistics and phonetics, Lund University.
Bruce, Gösta. 2010. Vår fonetiska geografi [Our phonetic geography]. Lund:
Studentlitteratur.
Gårding, Eva. 1977. The Scandinavian word accents (Travaux de l’Institut de
linguistique de Lund XI). Lund: Gleerup.
Horne, Merle (Ed.). 2000. Prosody: Theory and experiment. Studies presented to Gösta
Bruce. Dordrecht: Kluwer.
Meyer, Ernst A. 1937-1954. Die Intonation im Schwedischen [Intonation in Swedish], 2
vols. (Stockholm Studies in Scandinavian Philology, 0562-1097). Stockholm:
Fritzes.
Merle Horne
Professor of general linguistics
Dept. of linguistics and phonetics
Lund University, Sweden
Ilse Lehiste (1922 – 2010)
(picture by courtesy of Sarah Ritschert)
One of the greatest phoneticians, a remarkable scientist, has passed
away. Ilse Lehiste, born on January 31, 1922 in Tallinn, Estonia, died at Riverside
Methodist Hospital on Saturday, December 25, 2010. She was born into the family
of a senior officer. She began her studies in Estonia: she graduated from the Lender
high school, studied piano for one year at the Conservatory of Tallinn, and
entered the Faculty of Arts of the University of Tartu in 1942.
After two years, she continued her studies in Germany, having left
Estonia as a refugee in 1944 to flee the Soviet invasion of her homeland. At first,
she studied at the University of Leipzig and then at the University of Hamburg.
Her postgraduate studies concentrated on the work of William Morris, the many-sided
Victorian designer, artist, writer, and socialist. She was especially interested
in Nordic literary motifs in his work. She defended her PhD in
Philology at the University of Hamburg in 1948. At that time, she lived in a
refugee camp in Germany.
The following year she moved to the United States, where she continued her
studies, now focusing on linguistics. In 1959, she defended her
second PhD at the University of Michigan. Her main research area was acoustic
phonetics, but she was also active in other fields of linguistics: prosody,
language contact, Estonian, phonetics and phonology, and Serbo-Croatian
accentology. After receiving her PhD, she spent four years at the Communication
Sciences Laboratory there as a research associate.
In 1963, Ilse Lehiste joined the linguistics faculty at The Ohio State
University (OSU), Columbus. She first spent two years in the Slavic
Department and was then elected the Linguistics Department’s first Chair
when it was founded in 1965. She enjoyed a long and especially distinguished
career at OSU: she was appointed Professor of Linguistics in 1965 and, from 1987,
continued as Professor Emeritus. She gave exciting lectures at
universities and at conferences all over the world.
She was not only a linguist but also a phonetician. She worked to build a bridge
between the linguists of Estonia and the West. That interest is exemplified by the
11th International Phonetics Conference, which was organized in Tallinn in 1987
at her suggestion. She was a Renaissance person: linguist, littérateur, poet,
musician, etc. Her poems were published in 1989 (Noorest peast kirjutatud
laulud). She analyzed Estonian literature and wrote several overviews for
World Literature Today in the United States. In the last decade of her life, she
cooperated with the Institute of Estonian and General Linguistics of the
University of Tartu to investigate Finno-Ugric prosody.
Lehiste left behind an enormous body of work: she was author, co-author or
editor of twenty books, two hundred articles and around a hundred reviews. I
would like to highlight just one of her admirable books. She devoted much of her
research to the production and perception of suprasegmental features, and her
general work on the topic, Suprasegmentals, was published in 1970. In it, Lehiste
summarized what was known about the phonetic nature of suprasegmentals and
evaluated the available evidence from the point of view of linguistic theory.
Ilse Lehiste attended the Speech Research ’89 Conference in Budapest
(Hungary) more than 20 years ago, offering her help to the conference organizers.
It was a great experience for the Hungarian phoneticians to meet her personally.
The title of her talk was The experimental studies of poetic rhythm.
The importance of her scientific work was recognized by a number of
professional bodies around the world. Lehiste received honorary doctorates
from Essex University, England (1977), the University of Lund, Sweden (1982),
Tartu University, Estonia (1989), and The Ohio State University (1999). She was a
Fellow of the American Academy of Arts and Sciences (1990), Foreign Member
of the Finnish Academy of Sciences (1998), and Foreign Member of the Estonian
Academy of Sciences (2008). Ilse Lehiste will be remembered both personally and
professionally.
Viola Váradi
Eötvös Loránd University
Phonetics Department
Budapest, Hungary
Svend Smith Award 2008 for Elisabeth Lhote
Elisabeth Lhote was born in Toul. After graduating
from high school, she studied French literature and
linguistics at the University of Lille and was
introduced to phonetics there. Motivated by her
growing interest in general, experimental and applied
phonetics, she moved to the Institute of Phonetics of
Strasbourg University where she joined the research
team around Georges Straka. Under his guidance,
Elisabeth Lhote specialized in voice production and
earned her doctorate in Phonetics in 1970 with a thesis
on "La méthode glottospectrographique et la
simulation de la parole" (Glottospectrography and the simulation of speech). She
continued her career as a researcher under the supervision of Péla Simon, who had
succeeded Georges Straka in the position of head of the Phonetics Department in
1971, and presented an excellent habilitation treatise in 1980 on "Analyse et
synthèse de faits de langue au niveau du larynx" (Analysis and synthesis of
laryngeal features).
In 1980, Elisabeth Lhote was appointed Professor of Phonetics and head of
the Phonetics Laboratory at the University of Franche-Comté in Besançon. In
1986, she became director of the Center of Applied Linguistics and head of the
Laboratory of Speech Analysis. In these positions she was able to substantially
develop and foster phonetics and applied linguistics at her university until her
retirement in 1997.
Elisabeth Lhote’s list of publications comprises 4 books and 65 articles. She
started publishing the results of her research activities in the late sixties. Her first
publications may be characterized as reports on detailed experimental
investigations of the activities of the vocal cords by glottography and
glottospectrography. Her findings shed new light on the acoustics of the glottal
source, provided new impulses to the theory of phonation and stimulated new
research initiatives in the domains of intonation and of tone in tone languages.
Later, her interests shifted to speech pathology and therapy, speech perception and
comprehension, speaker recognition and foreign language teaching.
As an academic teacher, Professor Lhote has supervised 18 doctoral and 2
habilitation theses. Through her outstanding commitment, devotion and excellence as a
researcher and academic teacher, she has profoundly advanced the phonetic
sciences and applied linguistics in France, Europe and the world. ISPhS’s
membership is proud to confer the 2008 Svend Smith Award on her.
Jens-Peter Koester
References
Lhote, E. (1970). La méthode glottospectrographique et la simulation de la parole. Dr.
dissertation, Strasbourg.
Lhote, E. (1973). Contribution à l'étude de la fonction linguistique du larynx. Phonetica,
n° 28, p. 26-41.
Lhote, E. (1982). La parole et la voix. Hamburg (Buske).
Lhote, E. (Ed.) (1990). Le paysage sonore d'une langue, le français. Hamburg (Buske).
Lhote, E. (1995). Enseigner l'oral en interaction. Percevoir, écouter, comprendre. Paris
(Hachette).
PHONETICS INSTITUTES PRESENT THEMSELVES
THE DEPARTMENT OF LANGUAGE AND COMMUNICATION
STUDIES
NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY,
TRONDHEIM, NORWAY
The Department of Language and Communication Studies, or in Norwegian:
Institutt for språk- og kommunikasjonsstudier (ISK), is the only department in
Norway where it is possible to study Phonetics. Its research is both fundamental
and applied, and often cross-disciplinary.
The Dragvoll campus, which houses the Department of Language and
Communication Studies
Study programmes
The Department of Language and Communication Studies
<http://www.ntnu.edu/isk> offers a full BA/MA programme in Phonetics
<http://www.ntnu.edu/studies/bfon>. The programme covers all traditional areas
of phonetics (transcription, physiology and articulation, acoustics, and speech
perception) and focuses on experimental phonetics. All courses aim to combine
phonetic theory with practical exercises, usually in the studio or in the phonetic
lab. The Phonetics section is represented by two professors, Wim van Dommelen
and Jacques Koreman. In addition to Phonetics, the Department of Language and
Communication Studies offers full study programmes in General Linguistics and
Applied Linguistics, as well as subsidiary programmes in Swahili and Norwegian
as a Second Language. It is responsible for all Norwegian courses for exchange
students at the Norwegian University of Science and Technology (NTNU). This
varied environment, and collaboration with speech technologists at NTNU, opens
up possibilities for a wide range of research themes.
Research
The research in the Department of Language and Communication Studies covers
comparative language studies and foreign language acquisition, speech perception,
speaker recognition and speech technology.
In a long-standing collaboration with Norwegian as a Second Language,
Wim van Dommelen <http://www.hf.ntnu.no/hf/isk/Ansatte/wim.van.dommelen/
personInfo.html> has investigated the difficulties foreigners have in learning
Norwegian. His research covers both segmental and supra-segmental properties.
Tone and intonation have been (and remain) an area of interest, especially the realization
of Norwegian lexical tones, on which he collaborates closely with Linguistics.
As a spin-off result of his involvement in the Sound-to-Sense project
<http://www.sound2sense.eu/>, he is also involved in experiments on foreigners’
perception of English sounds in noise. This research is carried out in collaboration
with University College London, the University of the Basque Country (Bilbao)
and Radboud University in Nijmegen. The Sound-to-Sense project is a Marie-
Curie Research Training Network in which Ph.D. students and post-docs are
trained outside their native country. It also brought Helena Spilková to Trondheim.
Helena is carrying out her Ph.D. research on reductions in spontaneous
conversational speech and comparing productions of native English speakers with
productions of two groups of non-native speakers of English (Czech and
Norwegian speakers). This research involves detailed phonetic analysis as well as
evaluation of various context influences on the word realizations. In the same
project, there is a collaborative research effort with Radboud University in
Nijmegen on the systematic phonetic variation of word-final /t/ in Dutch, where
the influence of linguistic (e.g. morphological structure) and probabilistic factors
(word frequency) on the realization of canonical /t/’s is being investigated and
compared to the way an automatic speech recognition system deals with such
phonetic variation.
Recently, the department has started a collaborative project which brings
together theoretical expertise from phonetics with pedagogical experience from
the Norwegian teachers in the department to build a computer-assisted
pronunciation teaching system (CAPT). This system is based on VILLE
<http://www.speech.kth.se/ville>, which was developed by KTH in Stockholm,
who are also one of the partners in the project, “Computer-Assisted Listening and
Speaking Tutor (CALST)” <http://www.ntnu.edu/isk/projects>. This project aims
to not only adapt the Swedish system to Norwegian, but also extend it so that users
can train with different dialects. The reason for this is that there is no accepted
pronunciation standard for Norwegian, so that foreigners must learn to deal with
different dialects in their communication with Norwegians to be able to
understand different speakers. Besides focusing on different target dialects, the
system is being developed for specific source languages (or native languages of
the users), so that learners of Norwegian can be guided through pronunciation
exercises that are relevant for their native language. This is done in detail for a few
major learner groups in Norway, but we also analyse a large number of languages
in less detail. In order to do this, an automatic contrastive analysis of the phoneme
inventory is made on the basis of UPSID (UCLA Phonological Segment Inventory
Database) <http://www.linguistics.ucla.edu/faciliti/sales/software.htm#upsid>.
The aim is to build a flexible, extendable interface for contrastive analysis
between any language pair that can be used in CAPT applications for any
language. Jacques Koreman <http://www.hf.ntnu.no/isk/koreman> is the project
manager. He is interested in speech technology, and has previously worked on
speech recognition with the use of phonetic features. He also coordinated research
on biometric user authentication in the SecurePhone Project
<http://www.secure-phone.info>, where he and his colleagues specifically worked
with speaker recognition and fusion (combination) of different modalities (voice,
face and signature). The biometric recognizer was also implemented on a
PDA/mobile phone. Besides speech technology, he is interested in the voice and
voice pathology. He has carried out research on the phonetic consequences of
unilateral vocal fold paralysis with a colleague at Saarland University, Germany,
where he worked before moving to Trondheim. He also investigated vocal fold
aerodynamics using a Rothenberg mask. He is now involved in other research
projects in collaboration with Saarland University (project leader) and with the
Technical University in Berlin. These projects investigate the production and
perception of prominent syllables in several languages, of which Norwegian is
one. The investigations so far show that languages use different prosodic
properties to signal that a syllable is prominent, which of course has implications
for second language acquisition and perception.
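The automatic contrastive analysis of phoneme inventories mentioned above can be illustrated with a minimal sketch. It assumes that each inventory is simply a set of phoneme symbols. The function name contrast_inventories and the toy inventories are hypothetical; they are not taken from UPSID or from the CALST system, whose actual implementation is not described here.

```python
def contrast_inventories(source, target):
    """Compare a learner's native (source) inventory with the target language's.

    Both arguments are iterables of phoneme symbols (hypothetical format).
    """
    source, target = set(source), set(target)
    return {
        # Target phonemes the learner does not know from the native language
        "new_for_learner": sorted(target - source),
        # Phonemes present in both inventories
        "shared": sorted(source & target),
        # Native phonemes that the target language does not use
        "unused_in_target": sorted(source - target),
    }


# Toy inventories, invented for illustration only (not real UPSID entries).
native = {"p", "t", "k", "s", "m", "n", "a", "i", "u"}
target = {"p", "t", "k", "s", "m", "n", "a", "i", "u", "y", "ø", "ʉ"}

result = contrast_inventories(native, target)
print(result["new_for_learner"])
```

In a CAPT setting, the list of phonemes that are new for the learner would be the natural starting point for selecting pronunciation exercises, although a real system would presumably also have to consider allophones, phonotactics and prosody.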
Equipment
The department has a high-quality recording studio. Besides audio recordings, it is
possible to record electroglottograms (Glottal Enterprises EG-2) as well as
aerodynamic signals (Rothenberg mask). In addition, a motion capture system is
being installed.
Recording of the airflow and microphone signals in the studio
Location
The Norwegian University of Science and Technology <http://www.ntnu.edu>
(NTNU) consists of two campuses <http://www.ntnu.edu/about-ntnu/campuses>.
The Gløshaugen campus is home to the engineering sciences, while Dragvoll hosts
the humanities and social sciences. Dragvoll is just outside Trondheim, and a bus
ride into the city centre takes 15 minutes. Most of the buildings are connected by
glass-roofed streets, with a bookshop, a café, small shops and a student cafeteria,
in addition to the university library, lecture halls and offices.
Walking the indoor streets of Dragvoll or enjoying the sunny spell we call winter
What else?
Students can use the university’s sports facilities, and there is ample opportunity
for hiking in the beautiful surroundings of Trondheim, which is situated next to a
fjord. During the long winters, you can go skiing on “lysløyper” (lighted ski trails)
in the Estenstadsmarka close to Dragvoll, or in the Bymarka. There are also ski
jumps, as well as alpine slopes, in the vicinity of Trondheim. There are many lakes
where you can go for a swim in summer or skate in winter. The city itself is the
third-largest city in Norway – but it is still small. It has a cozy atmosphere with its
wooden houses, and is at the same time alive with its large student population and
rich cultural life.
Jacques Koreman
e-mail: [email protected]
THE PHONETICS LAB AND THE PHONOGRAM ARCHIVES AT
ZURICH UNIVERSITY, SWITZERLAND
A language expert’s need for knowledge of phonetics was probably one of the
main motivations for the English philology professor Eugen Dieth to found the
Phonetics Lab at the University of Zürich (UZH) in 1935 and to carry out
phonetics research using early versions of palatography and sound kymography
(Dieth, 1950). Apart from focusing on speech research activities, Dieth was also
involved in descriptive work on dialectal variability. For this reason, he desired to
maintain the ‘Phonogram Archives’, which were co-founded in 1909 at UZH by
Albert Bachmann and Louis Gauchat with the aim of collecting vernacular
language recordings in the four Swiss national languages (German, French, Italian
and Rhaeto-Romance). At present, the Phonetics Lab and the Phonogram
Archives form two inseparable institutions in the Faculty of Philosophy at
UZH that have been actively involved in phonetics and dialectology research and
teaching for the past decade.
UZH is the largest of the 10 Swiss universities in terms of number of
students and staff members. A need for knowledge in phonetics and speech
sciences in both research and education exists across a wide variety of disciplines
such as the philologies (German, English and Romance languages), psychology,
general linguistics and others. The Phonetics Lab/Phonogram Archives can be
viewed as a hybrid institute which serves research needs in a variety of
departments and offers students from a wide range of disciplines the facilities and
expertise to carry out projects in phonetics and speech sciences at Graduate,
Postgraduate and Doctoral level. We do not offer degree courses specifically in
phonetics, but most philology students (English, German and Romance languages)
are required to attend the phonetics lectures
provided by the Phonetics Lab. Students with a deeper interest in the subject then
take part in voluntary higher level phonetics courses and graduate in a related
discipline (at any level) with a focus in a phonetic topic. Supervision and
examination of such students is provided by staff-members of the Phonetics Lab.
Our lab consists of a sound-proof booth with a supervisory window that is
well suited for high-quality speech recordings and speech perception experiments.
The booth has high-end recording equipment permanently installed, and we can apply
standard speech measurement and analysis techniques, such as laryngography,
palatography and phonatory aerodynamic analysis. We also own a large variety of
portable recording devices and perceptual testing equipment for field work. In
addition, we have our own research library with the main journals in the area of
phonetics and speech sciences and a large number of monographs from all areas of
spoken language, phonetics, linguistics, acoustics, and speech and hearing
sciences. All of our facilities are easily accessible in the tower of the main UZH
building right in the heart of Zurich.
At present our team consists of the following researchers who are actively
involved in teaching and/or research in phonetics and speech archiving
(alphabetically by surname):
Camilla Bernardasci (Student Research Assistant)
Dario Brander (Post-graduate Research Assistant)
Volker Dellwo (PhD, Assistant Professor of Phonetics/Phonology)
Elvira Glaser (PhD, Professor of German Linguistics and member of
permanent leading board)
Lea Hagmann (Student Research Assistant)
Ingrid Hove (PhD, part-time Lecturer)
Marie-José Kolly (Research Assistant and PhD student)
Adrian Leemann (PhD, Post-Doc in Phonetics/Speaker Identification)
Michele Loporcaro (PhD, Professor of Romance Linguistics and Head-of-
Lab)
Mathias Müller (Student Research Assistant)
Stephan Schmid (PhD, PD, Senior Lecturer of Phonetics)
Daniel Schreier (PhD, Professor of English Linguistics and member of
permanent leading board)
Michael Schwarzenbach (lic. phil, Research Assistant)
Jürg Strässler (PhD, part-time Lecturer)
Dieter Studer (lic. phil, Research Assistant)
Sibylle Sutter (Post-graduate Research Assistant)
Our research interests range from historical sound development through synchronic
dialectology to speech production, acoustics and perception, and we work on
segmental as well as suprasegmental/prosodic levels of analysis. Work is
currently being carried out on the distribution of rhythmic patterns across Italian
and Swiss German dialects (Stephan Schmid), and we are interested in which
functions rhythmic and timing variability may have in human speech
communication (Volker Dellwo, Lea Hagmann, Mathias Müller). In a number of
pilot studies, we found that there is significant rhythmic variability between
speakers. We are now interested in how this variability can be used in areas like
speaker identification (Volker Dellwo, Adrian Leemann, Marie-José Kolly, Stephan
Schmid). For this project we received major grant funding for three years from the
Swiss National Science Foundation (SNF). We are also interested in how this
variability may help listeners to segregate two speakers speaking simultaneously
(Volker Dellwo, Dario Brander, Sibylle Sutter; see Cushing & Dellwo, 2010). For
this project we received one year of start-up funding from the University of Zurich
Research Fund. Another significant area of expertise in the group is the dialectal
distribution of sound patterns and the diachronic phonological development of Italian
dialects (Michele Loporcaro & Stephan Schmid) and Swiss German (Elvira Glaser), as
well as the socio-phonetic distribution of speech features across non-standard varieties
of English (Daniel Schreier). On a yearly basis, the Romance language oriented
members of the group organize fieldwork trips to various regions of the Italian
speaking world to systematically record a wide variety of Italian accents and
dialects. These recordings have led to research on the distribution and functions of
phonemic vowel quantity across different accents of Italian and to arguments
about the historical phonological development of Romance languages (Loporcaro,
2007). For research into the historical development and synchronic dialectal
variability of Swiss German (Fleischer & Schmid, 2006, Christen, Glaser &
Friedli, 2010), the Phonogram Archives offer an impressive collection of sound
carriers which have been collected and archived over the past 100 years. This
material contains valuable specimens of language varieties that have since become
extinct or near-extinct, such as the West Yiddish dialect spoken in Lengnau and
Endingen (Aargau) or the Franco-Provençal “Patois” formerly spoken all over
the Western (now French-speaking) part of Switzerland. It also contains early
recordings on wax disc (collaboratively recorded with the Phonogram Archives of
Vienna between 1909 and 1923), which are now part of the UNESCO Memory of
the World Programme (Fleischer & Gadmer, 2002). Major projects of the archives
(Dieter Studer, Michael Schwarzenbach, Lea Hagmann & Camilla Bernardasci)
are currently the compilation of an on-line catalogue, the production of a digital
version of the entire historic archives holdings (in collaboration with the Swiss
National Sound Archives in Lugano) and the presentation of a major exhibition on
Swiss dialects together with the Swiss National Library in Bern in 2012.
In teaching, we offer a variety of lectures, seminars and practical lab sessions
at an introductory and advanced level of phonetics. For students of philology, we
have specifically designed courses in German, English and Romance phonetics.
Additionally, we offer lab sessions in which higher level and postgraduate students
learn experimental techniques in speech production, acoustic measurements and
speech perception. In different lecture series, students are introduced to the main
concepts, as well as specialist areas of phonetics (e.g. speaker idiosyncratic
features or speech rhythmic variability). We have strong links to other departments
like Experimental Audiology or Psychology with whom we provide collaborative
PhD supervision. There are currently four PhD students in the lab, and the interest
is growing.
At present, both the Phonetics Lab and the Phonogram Archives are in a
highly dynamic situation of change. Both institutions are co-directed in different
ways by a board of professors from the philologies, Michele Loporcaro (Romance
Linguistics), Elvira Glaser (German Linguistics) and Daniel Schreier (English
Linguistics). While both institutions were rather separate entities during the past
decades, a proposal to unite them in a single unit is currently being implemented (on
a practical level, this process is nearly completed). In addition, the university
recently decided to invest in the area of spoken language sciences and
established a new Assistant Professorship in Phonetics/Phonology for which
Volker Dellwo (formerly University College London) was hired in August 2010.
With the merger of the Phonetics Lab and the Phonogram Archives, we are
expecting to strengthen phonetics and dialectology research and teaching at UZH
in the future. The group has attracted grant funding both in the past and at
present, and further major and minor grant applications have been submitted over the
past months. We thus hope to further enlarge our research team and be able to
offer more funded PhD research in Phonetic Sciences at UZH in the near future.
Should we manage to convince UZH to make further investments in our lab (for
example a full professorship in Phonetics), our aim would be to set up a degree
course in phonetics at the postgraduate level.
Further information on the Phonetics Lab, the Phonogram Archives and our
dynamic situation can be found at our (still separate) webpages
www.pholab.uzh.ch and www.phonogrammarchiv.uzh.ch.
References
Christen, H., Glaser, E. and Friedli, M. (2010) Kleiner Sprachatlas der deutschen
Schweiz. Frauenfeld: Huber.
Cushing, I.R., and Dellwo, V. (2010) The role of speech rhythm in attending to
one of two simultaneous speakers. In: Electronic Proceedings of Speech
Prosody, Chicago/USA
(http://speechprosody2010.illinois.edu/papers/100039.pdf)
Dieth, E. (1950) Vademekum der Phonetik. Bern: Francke.
Fleischer, J. and Gadmer, T. (2002) Schweizer Aufnahmen–Enregistrements
Suisses–Ricordi sonori Svizzeri–Registraziuns Svizras. Sound Documents
from the Phonogrammarchiv of the Austrian Academy of Science. The
Complete Historical Collections 1899-1950, Series 6/1-6/3. Wien:
Österreichische Akademie der Wissenschaften, Zürich: Phonogrammarchiv
der Universität Zürich.
Fleischer, J. and Schmid, S. (2006) Zurich German. In: Journal of the
International Phonetic Association 36.2: 243-253.
Loporcaro, M. (2007) Facts, theory and dogmas in historical linguistics: vowel
quantity from Latin to Romance. In: Salmons J. C. and Dubenion-Smith S.
(eds.), Historical Linguistics 2005. Selected papers from the 17th
International Conference on Historical Linguistics, Madison, Wisconsin, 31
July-5 August 2005. Amsterdam, Philadelphia: John Benjamins, 311-336.
Some staff members of the Phonetics Lab and the Phonogram Archives at Zurich
University in front of our recording cabin/sound lab (from left to right: Volker
Dellwo, Michael Schwarzenbach, Stephan Schmid, Ingrid Hove, Dieter Studer,
Camilla Bernardasci).
Volker Dellwo & Dieter Studer
e-mail: [email protected]
CONFERENCE REPORTS
Speech Prosody 2010
Chicago, USA, 11-14 May 2010
Speech Prosody is the biennial meeting of ISCA’s (the International Speech
Communication Association) Speech Prosody Special Interest Group (SProSIG).
In 2010, it was held in Chicago and was co-organized by various departments of
the University of Illinois at Urbana-Champaign, the Northwestern Institute on
Complex Systems and the Toyota Technological Institute. For five days (an
externally organized Satellite Workshop on the perceptual and automatic
identification of prosodic prominence took place on May 10th), more than 300
participants attended the 270 oral and poster presentations on aspects of prosody
which play a role in various disciplines besides Linguistics, such as Psychology,
Computer Science, Speech and Hearing Science, and Electrical Engineering.
The general theme of Speech Prosody 2010 was the large diversity, as well
as the universality of prosody, also addressed in the Keynote lectures: the role of
prosody research in enriching speech engineering (Shrikanth Narayanan), prosodic
cues in first and second sign language acquisition (Diane Brentari), representations
of prosodic cues in computational models for language processing (Mari
Ostendorf), prosody from an evolutionary perspective (Steven Mithen) and from a
psycho- and neuro-linguistic perspective (Aniruddh Patel). Interestingly, the last
two Keynote lectures related language to music, adding another interdisciplinary
facet.
In addition to the Keynotes, three of the special sessions included in the
program can be regarded as highlights of this year’s conference. Their topics were
computer aided pronunciation training and prosody, experimental approaches to
focus, as well as shape, scaling, and alignment of F0 events. It has to be stated,
however, that the quality of the papers and posters was generally very high. In
particular, there were a large number of excellent student papers, which is a
promising sign for the workshops and conferences to come.
Stefan Baumann, Cologne
19th Annual Conference of the
International Association for Forensic Phonetics and Acoustics (IAFPA)
Trier, Germany, 18-21 July 2010
Seventeen years after the last IAFPA conference in Germany’s oldest city, the
Phonetics Department of the University of Trier hosted the 19th Annual
Conference of the International Association for Forensic Phonetics and Acoustics.
Prof. Dr. Angelika Braun and her team of organizers were pleased to pass the
100-participant threshold for the first time and welcomed phoneticians and
acousticians from 14 countries. The main topics presented and discussed in 27
presentations and 10 posters were formants, whispered voice, speech databases,
automatic voice/speaker comparison, and language analysis for the determination
of origin (LADO).
The conference was opened by the President of the University of Trier, Prof.
Dr. Schwenkmezger, who commemorated 40 years of (forensic) phonetic
expertise at the university and at the same time gave assurances that degrees in
phonetics will continue to be awarded in the future. In her opening address, the Dean
of the Department of Languages, Literature and Media Science, Prof. Dr. Hilaria
Gössmann, referred to the large number of students of the university attending this
year’s conference, citing it as evidence of an active and interested student body
and of the spirit of cooperation in the phonetics department. Both Prof.
Schwenkmezger and Prof. Gössmann stressed the importance for Trier of hosting
this high-profile international conference and wished all the participants a
successful and enjoyable time.
The first session of the 2010 conference was chaired by Jens-Peter Koester,
the founder and long-time head of Trier’s phonetics department. It started with a
presentation by Francis Nolan, Kirsty McDougall and Toby Hudson entitled
Perceived voice similarity and acoustic measures, which followed up on previous
research towards a model of voice similarity for linguistically homogeneous
voices. Their perception experiment showed that telephone recordings level out
the perceived difference between different speakers. Furthermore, the mixing of
studio and telephone recordings increases the perceived difference between
samples from the same speaker. In a second step, Nolan et al. applied
multidimensional scaling (MDS, dim1-dim5) to the perceptual results of the
studio recordings and correlated them with acoustic parameters. Some correlation
was found between dim2 and F3, dim3 and F2, and dim4 and F1. The strongest
correlation, however, was found between dim1 and F0, indicating the importance of
fundamental frequency to naive listeners when judging voice similarity.
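The correlation step described above can be illustrated with a small sketch. It assumes that each voice already has a coordinate on one MDS dimension and a mean F0 value, and it computes a plain Pearson correlation between the two. The function name pearson and all numbers are invented for illustration; they are not data from the study, and the report does not say which correlation measure the authors used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five voices (NOT data from the study):
# coordinate on one perceptual MDS dimension and mean F0 in Hz.
dim1 = [-1.2, -0.5, 0.1, 0.6, 1.0]
mean_f0 = [98.0, 110.0, 121.0, 130.0, 142.0]

r = pearson(dim1, mean_f0)
print(round(r, 3))
```

A coefficient close to 1 or -1 for a given dimension would suggest that listeners' similarity judgements along that dimension track the acoustic parameter in question.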
These results were supported by Mette Hjortshøj Sørensen in her paper on
Perception of voice similarity by different groups of listeners. Her experiment
included 3 groups of listeners (Danish L1, Danish L2 and no knowledge of
Danish) who listened to paired Danish voice samples with the task of judging
degrees of similarity or dissimilarity. Her preliminary findings suggest that most
listeners used fundamental frequency as the main cue for their decision making
although L1 listeners utilised linguistic cues as well. She also noted that regardless
of their linguistic background, listener performance varied significantly, thus
indicating that voice-discrimination ability varied among listeners. Both findings
are relevant for earwitness testimony evaluation. Her presentation received the
2010 IAFPA student paper award.
The first day of the conference ended with a session in which two papers
shifted the focus from forensic speech evidence proper to the meta-level of
evidence presentation. Allen Hirson in his talk Electronic presentation of evidence
in Forensic Phonetics: A critical appraisal argued that electronic presentation of
evidence promotes effectiveness and efficiency in court. The analysis and
decision-making process of the expert becomes more comprehensible when
explained with the help of digital presentations or interactive visualizations. Jonas
Lindh, Anders Eriksson and Gustaf Nelhans concerned themselves with the
phrasing of conclusions, questioning the claim made by some scientists that the
Bayesian framework actually constitutes a paradigm shift as compared to traditional
verbal scales.
The tell-tale dialect: Analysis of dialectal variation of German native
speakers in telephone conversations by Karen Masthoff, Yasmin Hadj Boubaker
and Olaf Köster showed that when they are given the task of dialect identification
on telephone voice samples, experts’ performance does not correlate with time
spent, number and type of methods applied or perceived degree of difficulty.
Individual skill and experience appear to be the dominant factors for dialect
identification performance.
Anna Czajkowski’s contribution, Vocal tract Resonances in Voiced and
Whispered Speech and Listeners’ Perception of Voice Depth and Pitch, compared
mean F1 and F2 LPC values of voiced and whispered recordings. F1 was higher in
whispered speech for all vowels and all speakers. The same proved to be the case
for F2 except with /i/ and /u/. She also presented findings from an experiment on
listeners’ perception of a deep voice, concluding that untrained listeners may
associate low mid-points of F1/F2 vowel spaces with a ‘deep’ voice even if F0
values do not indicate a low voice.
Probably the most anticipated talk of the conference was Tina Cambier-
Langeveld’s presentation on Performance of native speakers and linguists in
LADO cases with true origin established. She presented results based on actual
LADO cases in which the speakers’ true origins could be confirmed beyond
reasonable doubt after the forensic speech analysis had been done. The
combination of trained native speakers and supervising linguists turned out to
perform very well with 120/124 cases (primary aim: verification of claimed
origin) and 65/69 cases (secondary aim: identification of real origin) correctly
established. Counter-expert reports written by specialized linguists alone on some
of the same cases did not show this level of accuracy: 1/8 was correct for the
primary aim but incorrect for the secondary aim, 5/8 were incorrect for the primary
aim and 2/8 were inconclusive.
She concluded that both trained native speakers and linguists can contribute to
LADO and that a priori exclusion of trained native speakers is unfounded.
The session on automatic speaker and voice comparison opened with
Automatic Forensic Voice Comparison: Experiments on Real Case Data from the
BKA by Timo Becker et al. They presented findings based on experiments with
their own SPES system using real case material, confirming that transmission
channel and speaking style mismatch as well as short recording durations reduce
system performance. As a result, the use of global EER measures for automatic
voice comparison systems was discouraged. In fact, system evaluation requires
suitable data, matching the conditions of the case recordings in question, in order
to provide meaningful EERs.
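The point about condition-matched evaluation can be made concrete with a minimal sketch of how an equal error rate is obtained from comparison scores. The function name and all score values below are invented for illustration; they are not taken from the SPES experiments, and real systems typically interpolate between thresholds rather than picking the closest one.

```python
def equal_error_rate(same_scores, diff_scores):
    """Approximate the EER: the error rate at the threshold where the
    miss rate and the false-alarm rate are closest to each other.
    Higher scores are assumed to mean 'more likely the same speaker'."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(same_scores) | set(diff_scores)):
        # Same-speaker pairs scoring below the threshold are misses.
        miss = sum(s < t for s in same_scores) / len(same_scores)
        # Different-speaker pairs at or above the threshold are false alarms.
        false_alarm = sum(d >= t for d in diff_scores) / len(diff_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


# Hypothetical scores for two recording conditions (invented values).
matched = equal_error_rate([2.1, 2.8, 3.5, 4.0], [-1.0, -0.2, 0.4, 1.1])
mismatched = equal_error_rate([0.3, 1.2, 2.0, 2.9], [-0.5, 0.6, 1.5, 2.2])
print(matched, mismatched)
```

Pooling both conditions into one global figure would hide exactly the difference that matters for a given case, which is why the EER is only informative when the test data resemble the case recordings.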
Hermann Künzel presented Automatic Speaker Identification with
Multilingual Speech Material in which he tested Batvox 3.1 for three channel
conditions (studio, landline, GSM) and language mismatch conditions (GER-RUS,
GER-POL, GER-ENG, GER-SPAN, GER-SPAN CATL). He confirmed that
system performance generally decreases with reduction of channel quality
(studio>landline>GSM). His language mismatch settings however seemed to have
no or very little effect on the system’s EERs, leading him to the conclusion that
language mismatch, at least for non-tone languages, can be ignored when using
Batvox or similar systems for automatic speaker identification.
In his presentation Empirically Assessing the Validity and Reliability of
Forensic-Comparison Systems Geoffrey Morrison explained and supported the use
of log-likelihood-ratio cost (Cllr) as an appropriate measure of accuracy for
automatic speaker recognition systems used in forensic voice-comparison.
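For readers unfamiliar with the measure, the following is a minimal sketch of the standard log-likelihood-ratio cost, computed from likelihood ratios for known same-speaker and known different-speaker comparisons. The function name and the example values are assumptions made for illustration and do not come from the presentation.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost for a set of likelihood ratios (LRs).

    Same-speaker comparisons are penalised when the LR is small,
    different-speaker comparisons when the LR is large. A system that
    always outputs LR = 1 (no information) scores exactly 1; lower
    values indicate better performance."""
    same_term = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    diff_term = sum(math.log2(1 + lr) for lr in different_speaker_lrs)
    return 0.5 * (same_term / len(same_speaker_lrs)
                  + diff_term / len(different_speaker_lrs))


# Invented LRs: strong support in the correct direction gives a low
# cost, while strongly misleading LRs are penalised heavily.
print(cllr([100.0, 50.0], [0.01, 0.02]))
print(cllr([0.01, 0.02], [100.0, 50.0]))
```

Unlike an error rate based on a hard accept/reject threshold, this cost takes the magnitude of each likelihood ratio into account, so a confidently wrong answer costs more than a weakly wrong one.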
The well-received poster sessions featured, among others, three contributions
concerning speech databases: A Swedish Dialect Database by Jonas Lindh, an
Alcohol Language Corpus by Florian Schiel et al. and a Database of Chinese
Female Voice Recordings by Cuiling Zhang and Geoffrey Morrison.
The conference ended on Wednesday afternoon with the announcement that
the next IAFPA annual conference in 2011 will be hosted by the Austrian
Academy of Sciences in Vienna, Austria.
Peter Knopp, Trier
New Sounds 2010
Sixth International Symposium on the Acquisition of Second Language
Speech
Poznań, Poland, 1-3 May, 2010
The sixth New Sounds meeting took place at Adam Mickiewicz University in
Poznań. As the name (and subtitle) suggests, “New Sounds” aims to describe and
investigate the acquisition of second language speech, i.e. the
phonetic/phonological aspects of second language acquisition. The idea of a
“New Sounds” conference was originally developed by Allan James and Jonathan
Leather who organized the first meeting in Amsterdam in 1990, as well as the
following three meetings in 1992 (Amsterdam), 1997 (Klagenfurt) and 2000
(Amsterdam once again). New Sounds returned in 2007, taking place in
Florianópolis, Brazil (organized by Barbara Baptista, Michael Watkins and
Andréia Rauber).
For 2010, the responsibility for setting up the conference was taken over by
Katarzyna Dziubalska-Kołaczyk, Magdalena Wrembel and Małgorzata Kul. With
180 participants, the Poznań conference (see also
http://ifa.amu.edu.pl/newsounds/introduction) can safely be said to be the largest
and most successful one yet.
The New Sounds conferences have always stood out thanks to being very
well-organized and providing an especially friendly and relaxed atmosphere,
which allows for fruitful and extensive discussions both during and outside of the
actual presentation sessions. The Poznań conference did not break with this
tradition. On the contrary, the excellent lunches, the very
pleasant conference reception and a cultural program, including a guided tour of
the old city and an exhilarating choir performance, can be described as
exceptional.
Each of the three conference days was introduced by a keynote speech that
provided an overview of a core area of phonetic/phonological SLA studies while
presenting new insights into its theoretical underpinnings. Conference co-founder
Allan James opened the meeting with a talk entitled “Sounds new? Extending the
explanatory remit of second language phonology: identifications, multivalent
sound categories and a use take on acquisition” in which he argued that recent,
different sociolinguistically influenced conceptions of language, which involve
‘unordered scenarios’ of selective learning, partial competence and performance
without competence, should also be reflected in the acquisition process and thus
the phonetic and phonological paradigms used to describe it.
For the second keynote speech, it was especially fortunate that the organizers
were successful in coaxing a relaxed, serene and helpful (many of the younger
researchers at the conference benefitted from his advice and encouragement) Jim
Flege out of retirement in Italy. Now an “immigrant” and late L2 learner himself,
Flege spoke about his latest insights into an area to which he has already
contributed a great deal, namely “Age effects on second language acquisition”. He
concentrated especially on the factor of age of arrival (AOA), what underlying
variables (neural maturation, cognitive changes across the life span, change in the
way L1 and L2 systems interact, and difference in L2 input) may be correlated
with it, and how this co-variation among multiple variables might be controlled.
Finally, Martha Young-Scholten started the last day of the conference by
introducing her most recent ideas and undertakings in the study of “Development
in L2 phonology”. She convincingly argued that in order to effectively compare
the different stages of phonological development in native and non-native learners,
there is a need for longitudinal studies that involve naturalistic L2 learners, i.e.,
learners under conditions comparable to those applying to younger L1 learners
(who do, of course, receive regular and plentiful input from the native speakers of
their target language, but have no or very limited exposure to written text).
She presented data from three learners of L2 German, analyzing their
progress in terms of the successive re-ranking of OT constraints.
The fact that the Poznań meeting has been the biggest New Sounds
conference to date can certainly be interpreted to mean that the study of the
acquisition of second language speech phonetics/phonology is a growing area.
This is also reflected by the increasing variety within the field.
In order to provide an impression of the multitude of different subjects
addressed during the conference, a classification of major blocks of topics seems
useful, even though it is of course subjective (and deviates slightly from the
categories the organizers had proposed before the conference). Similarly, the
following overview of papers given at the conference is just as subjective and
guided by what the author of this report witnessed himself and/or perceived as
interesting.
Segmental production of second language speech
“Production of English interdental fricatives by Dutch, German, and English
speakers” by Adriana Hanulikova and Andrea Weber examined the substitution of
/θ/ by other sounds. German learners tend towards /s/, while the majority of Dutch
learners prefer /t/. Besides the distribution of the substitutions, the study also
aimed to compare these productions with actually intended /t, s/ productions and
acoustically analyzed those instances of /θ/ where the speakers succeeded.
In his study on “Voiced obstruents in L2 French: the case of Swiss German
learners” Stephan Schmid showed that speakers of Swiss German, depending on
phonotactic context, frequently did not reproduce voicing in obstruents when
speaking French, realizing contrasts instead by means of longer/shorter durations.
Thorsten Piske (co-authors James Flege, Ian MacKay and Diane Meador)
gave a presentation “Investigating native and non-native vowels produced in
conversational speech” arguing that true mastery of L2 vowels should be
determined with respect to this more realistic and more challenging criterion.
An instrumental approach measuring “Language-specific articulatory
settings in L2 speech” and comparing them to native speaker settings was
demonstrated in a paper by Sonja Schäffler, Ineke Mennen and James Scobbie.
Rob Drummond combined L2 research with sociolinguistic aspects in his
study of native Polish speakers in Manchester adopting local features, i.e.,
northern high, rounded pronunciation of the STRUT vowel vs. more widespread
features like t-glottaling (“Speaking like the locals - the acquisition of local accent
features by native Polish speakers living in Manchester”).
L2 speech perception
Silke Hamann, Paul Boersma and Małgorzata Ćavar examined whether closely
related languages show a similar use of perceptual cues to identify phonological
categories, thus facilitating L2 learning (“Language-specific differences in the
weighting of perceptual cues for labiodentals”). They investigated such perceptual
cues as duration, amplitude of friction noise and percentage of voicing, for the
Dutch labiodentals /f, v, ʋ/ and how they would be perceived by native speakers of
German, English, Croatian and Polish. Preliminary results indicated that the
number of labiodental categories in these second languages was more influential
than being a member of the same language family.
Joan C. Mora, James L. Keidel and James Flege argued that the perception
of the contrasts between the mid vowels /e/ - /ε/ and /o/ - /ɔ/ was difficult even for
Spanish-Catalan bilinguals because of a smaller degree of categoriality. A higher
percentage of language use/experience was the most important factor for success
(“Why are Catalan contrasts between /e/ - /ε/ and /o/ - /ɔ/ so difficult for even
early Spanish-Catalan bilinguals to perceive?”).
In their study of “The impact of visual cues and lexical knowledge on the
perception of a non-native consonant contrast for Colombian adults” Michele
Thompson and Valerie Hazan showed not only that both of the mentioned
parameters were indeed used to support the identification of contrasts (e.g., /b/ vs.
/v/), but also that there seemed to be a culture-specific bias with respect to the use
of visual cues, as the Colombian speakers relied much more on them than Korean
or mainland Spanish speakers did in earlier studies.
Several studies, of course, combined production and perceptual data from
L2 speakers, e.g. “Speech production and perception findings for native German
speakers learning English as a second language” by Bruce L. Smith and Rachel
Hayes-Harb or “Individual variation in the production and perception of SL
phonemes: French speakers learning /i - ɪ/” by Georgina Oliver and Paul Iverson,
who showed in their experiment that L2 vowel production was not highly linked to
L2 vowel perception. They interpreted this result as indicating that learning an L2
category did not rely on just a single underlying ability or representation.
There were also a number of studies that examined perceptual abilities
employing neurolinguistic methods. Nuria Kaufmann, Martin Meyer and Stephan
Schmid, for example, performed an EEG experiment using mismatch negativity
paradigms to investigate contrasts between Serbian affricates as perceived by
native speakers of Swiss German and of Rhaeto-Romance (“Phonetic contrasts in
foreign language perception: A neuropsychological study on Serbian affricates”).
Cheryl Frenck-Mestre and colleagues also used event-related potentials to
investigate the perception of contrasts between the American English vowels /ε/,
/æ/ and /ɪ/ by native speakers of American English, of French and by late French-
English bilinguals (“ERP evidence of the acquisition of non-native contrasts in
late learners”).
Prosody
The number of studies dealing with prosodic features has increased in recent years
and the field was also well-represented at New Sounds 2010. Ineke Mennen, Aoju
Chen and Fredrik Karlsson’s paper “Characterising the internal structure of learner
intonation and its development over time” examined the internal organization and
longitudinal development of L2 learner intonation. Their approach thus did not
look at individual aspects of intonation, but aimed to describe each learner
intonation variety in its entirety. Results suggested that apart from language-
specific transfer phenomena, learners started out with a set of basic elements to
build a simple, but efficient intonation system.
“Categorizing Mandarin tones into prosodic categories: the role of phonetic
properties” by Connie K. So and Catherine T. Best described how L2 learners
perceived foreign tones according to the pitch patterns of the intonational
categories in their native prosodic systems. Speakers of non-tone languages (e.g.
English or French) therefore assimilated Mandarin tones into the corresponding
categories (e.g., Mandarin tone 3 (fall-rise) may be interpreted as expressing
uncertainty).
The realization of different types of focus (narrow, broad, contrastive) as a
source of foreign accent was discussed in Mary O’Brien and Ulrike Gut’s paper
“Phonological and phonetic realisation of different types of focus in L2 speech.”
Johannes Schliesser’s poster on “Prosodic encoding of focus and sentence
mode in L2” also considered the realization of focus in L2 speech and especially
considered Gussenhoven’s biological codes as an explanation for patterns that
transfer from the L1 cannot easily account for.
Foreign accent detection/identification
Steven Weinberger and Stephen Kunath introduced “A computational model for
accent identification”, the Speech Transcription Analysis Tool (STAT), which
used segment and syllable structure generalizations, such as vowel shortening,
final obstruent devoicing, palatalization, interdental fricative substitution, vowel
epenthesis or consonant deletion to derive a specific set of phonological speech
patterns that are characteristic of a particular foreign accent.
Sylwia Scheuer’s presentation “How sure are judges about their foreign
accent judgments?” on the other hand, dealt with human quality judgments of
foreign accent. Scheuer confirmed that judges are constant in their ratings and on
that basis attempted to identify those phonetic features (in this case of L2 English)
that promise to provide the greatest reliability.
Teaching
The studies just described do of course have a close connection to the applied
aspects of the study of second language speech, i.e. pronunciation teaching. New
Sounds also offered various papers dealing with particular phonetic phenomena
that trigger the impression of foreign accent.
Walcir Cardoso, co-host of New Sounds 2013 in Montréal, looked at the
production of foreign /s/ + consonant clusters by learners, e.g. speakers of
Brazilian Portuguese, who were not familiar with them (“Teaching foreign sC
onset clusters: Comparing the effect of three types of instruction”). He tested the
success of three different forms of instruction (and the underlying philosophy)
finding that the Projection Model of Markedness showed the largest instructional
effect.
Wiktor Gonet, Jolanta Szpyra-Kozłowska and Radosław Święciński
investigated why the velar nasal /ŋ/ is especially difficult for Polish learners of
English to acquire when it is not followed by a velar plosive (“Acquiring angma –
the velar nasal in advanced learners’ English”), while Esther Gómez Lacabex and
María Luisa García Lecumberri demonstrated success in instructing native
speakers of Spanish to produce correct instances of vowel reduction in English
(“Investigating training effects in the production of English weak forms by
Spanish learners”).
Factors influencing second language performance
The study of the various individual parameters that play a role in a learner’s
overall competence has always been one of the major subjects in second language
speech research. New Sounds again included many interesting papers devoted to
particular aspects of the individual and demonstrated their relevance. Various
areas were covered, ranging from cognitive psychology, e.g. “Phonological short-
term memory and L2 speech learning in adulthood” by Cristina Aliaga-Garcia,
Joan C. Mora and Eva Cerviño-Povedano, to “classic” factors like age, albeit from
the unusual perspective of very young learners, as in Henning Wode’s talk on “L2
phonological acquisition by young learners: Evidence from production”, to other,
somewhat external, linguistic aspects, as in Yasaman Rafat’s paper on “Orthography
as a conditioning factor in L2 transfer: evidence from English speakers’
production of Spanish consonants.”
Several presentations also attempted to investigate the possible interactions
between different phonetic abilities and the many known relevant psychological
and neurological factors, as well as those describing the external circumstances of
acquisition in order to isolate the significance of a particular parameter. This is the
case in the study “Investigating the concept of talent in phonetic performance” by
Matthias Jilka, Natalie Lewandowski and Giuseppina Rota and a connected
investigation of the phenomenon of phonetic convergence as an indicator of talent
(“Is dynamic phonetic adaptation in dialog related to talent?” by Lewandowski,
Jilka and Grzegorz Dogil). Yoon Hyun Kim and Valerie Hazan’s study on
“Individual variability in perceptual learning of L2 speech sounds and its cognitive
correlates” also followed a similar methodology (use of a test battery covering
various cognitive abilities) in order to investigate individual variability in
discriminating non-native phonetic contrasts.
Models and theories of the acquisition of second language speech
Another important aspect of second language acquisition research was provided by
studies that explicitly attempt to contribute to the (further) development and
explanatory/predictive power of models of sound acquisition and/or
representation.
Ocke-Schwen Bohn and Catherine T. Best attempted to account for native
German listeners’ abilities to perceive the contrasts between the American
English approximants /r/, /l/, /w/ and /j/ in terms of Flege’s Speech Learning
Model and Best’s own Perceptual Assimilation Model.
John Archibald argued for the existence of a L1 phonological filter that can
be overcome by especially robust cues, explaining why certain articulations,
although equally unfamiliar to learners, are acquired more easily than others
(“Conditions for overriding the L1 phonological filter”).
Finally, conference host Katarzyna Dziubalska-Kołaczyk and co-author
Daria Zielińska presented an approach predicting preferred and dispreferred
consonant clusters based on the recognition of phonotactic and morphonotactic
(sound clusters across morphological boundaries) structures. Phonotactic
preferences were based on the notion of markedness, which in turn was defined by
the perceptual distance between segments (as measured according to Dziubalska-
Kołaczyk’s own Net Auditory Distance Principle). Morphonotactic clusters
behaved differently as they contained morphological information and markedness
was used to signal their function.
As indicated earlier, this can only be a subjective, somewhat
impressionistic summary of the many interesting presentations given at New
Sounds 2010. Full Proceedings can be found at
http://ifa.amu.edu.pl/newsounds/Proceedings_guidelines.
The conference organizers intend to publish two books with more elaborate
versions of many of the presented papers early next year.
The next New Sounds conference will take place in 2013 at Concordia
University in Montréal, Canada!
Matthias Jilka, Stuttgart
BOOK REVIEWS
Steve Parker (ed.) (2009) Phonological Argumentation. Essays on Evidence and
Motivation. London/Oakville: Equinox (377 pp. ISBN 978-1-84553-221-5)
Reviewed by: Péter Siptár
Eötvös Loránd University, Budapest, Hungary
The Equinox series Advances in Optimality Theory (series editors: Ellen Woolford
and Armin Mester) was launched in 2007 with John J. McCarthy’s monograph
Hidden Generalizations: Phonological Opacity in Optimality Theory. The present
volume is the fifth in the series and is a Festschrift for McCarthy, written by his
former students, all of them alumni of the graduate school of the University of
Massachusetts at Amherst (except Joe Pater who is McCarthy’s colleague, a
professor in the Department of Linguistics there). The book has a Foreword by
Elisabeth Selkirk and the editor’s Introduction includes excerpts from some of the
authors’ personal comments on John McCarthy.
The eleven chapters of the collection all discuss the process of phonological
argumentation, the way the validity (or otherwise) of particular phonological
analyses can (or must) be demonstrated within the framework of Optimality
Theory (and in general). The chapters are divided into two main sections: the first
six chapters discuss the evidence for, and the methodology used in, discovering
the bases of phonological theory (i.e., how constraints are formed and what sort of
evidence is relevant in positing them); the last five chapters present case studies
that focus on particular theoretical issues within OT through various phenomena
in one or several languages, arguing in favour of or against specific formal
analyses.
Andries W. Coetzee’s “Grammar is both categorical and gradient” (pp. 9–42)
motivates the claim in its title by presenting the results of psycholinguistic
experiments involving speakers of English and Hebrew. In particular, the author
shows that the subjects’ mental grammars are capable of making both categorical
and gradient judgements about the well-formedness of hypothetical word-like
forms. He also proposes a new type of comparative OT tableau to model both
types of decision-making behaviour, pointing out that traditional grammars are
unable to handle them. Standard derivational models of generative grammar can
easily account for the categorical distinction between grammatical and
ungrammatical forms but have some difficulty with gradient well-formedness
distinctions. On the other hand, models in which the bifurcation of grammatical
and ungrammatical forms does not exist, that is, where an ungrammatical form is
taken to be simply a form with extremely low probability of occurrence, are also
challenged by the experimental results. The author argues that the inherent
comparative character of OT grammars enables that theory to model both kinds of
behaviours in a straightforward manner.
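The comparative nature of OT evaluation referred to here can be illustrated with a minimal sketch of standard evaluation under strict domination: candidates are compared constraint by constraint, from the highest-ranked down, and the candidate with the best violation profile wins. This is not Coetzee’s comparative tableau; the toy input, the candidates, the violation counts and the function name are all assumptions made for illustration.

```python
def optimal_candidate(candidates, ranking):
    """Return the candidate that best satisfies a ranked constraint set.

    candidates: dict mapping each candidate to a dict of
                constraint name -> number of violations
    ranking:    list of constraint names, highest-ranked first
    """
    def profile(name):
        # Violations listed in ranking order; tuples compare
        # lexicographically, which mirrors strict domination.
        return tuple(candidates[name].get(c, 0) for c in ranking)

    return min(candidates, key=profile)


# Toy candidate set for a hypothetical input /pat/.
candidates = {
    "pat":   {"NoCoda": 1},  # faithful, but has a coda
    "pa":    {"Max": 1},     # deletes the final consonant
    "pa.ta": {"Dep": 1},     # inserts a final vowel
}

print(optimal_candidate(candidates, ["Max", "Dep", "NoCoda"]))
print(optimal_candidate(candidates, ["NoCoda", "Dep", "Max"]))
```

Because the same comparison also orders the losing candidates relative to one another, a grammar of this kind yields a ranking among non-winners and not only a single winner, which is the property that makes gradient judgements modellable.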
Paul de Lacy’s contribution on “Phonological evidence” (pp. 43–77)
examines the innatist theory of generative grammar’s phonological component and
related modules, asking what such a framework identifies as empirical evidence
that supports it. The chapter also refers to predicted ambiguities where two or
more modules influence the same phenomenon. Specifically, the author discusses
phenomena like alternations, phonotactics, phonetic neutralization, free variation,
diachronic change, loanword adaptation, language games, language acquisition
data, and typological frequency, and concludes that the theory – or at least its
phonological component – does not claim responsibility for many of these
phenomena. Based on his earlier work on markedness, he proposes methods to
help separate valid from spurious evidence.
Elliott Moreton’s “Underphonologization and modularity bias” (pp. 79–101)
proposes a stochastic learning algorithm to capture the relative frequency of
phonologization effects, showing that the model derives the correct results in a
simulation of typological patterns involving tones interacting with other tones. The
author concludes that the hypothesis pairing “hard typology” (what grammars are
cognitively possible) with Universal Grammar and “soft typology” (how frequent
they are) with other factors affecting language change is probably too strong.
“Cognition and phonetics interact to determine typology in ways more
complicated (and interesting) than has been generally acknowledged. Further
progress will require a better quantitative understanding of the typology of
phonetic precursors, and of the differential receptiveness of learners to different
patterns” (p. 100).
Máire Ní Chiosáin and Jaye Padgett’s “Contrast, comparison sets, and the
perceptual space” (pp. 103–121) uses a systemic approach couched in Flemming’s
Dispersion Theory to argue for a principled restriction of the perceptual space of
comparison sets which resolves the problem of infinite candidate generation. The
discussion focuses on secondary palatalization contrasts in onset versus coda
position, using perceptual data from Irish.
Joe Pater’s “Morpheme-specific phonology: Constraint indexation and
inconsistency resolution” (pp. 123–154) argues that exceptions and other instances of
morpheme-specific phonology are best analysed in OT in terms of lexically
indexed markedness and faithfulness constraints (as opposed to lexically specified
rankings, i.e., cophonologies). This approach can capture locality restrictions,
distinctions between exceptional and truly impossible patterns, distinctions
between blocking and triggering, and distinctions between variation and
exceptionality. The chapter discusses data from Assamese, Finnish, and Yine
(formerly known as Piro) and provides a learnability account of the genesis of
lexically indexed constraints.
Jennifer L. Smith’s “Source similarity in loanword adaptation:
Correspondence Theory and the posited source-language representation” (pp. 155–
177) assumes a correspondence relation between loanwords and their “pLs
representations”, i.e., the borrower’s posited representation of the source-language
form, allowing for a consistent account of the interaction between phonological
adaptation processes and factors such as perception and orthography. The author
provides empirical support from Japanese, Finnish, Hmong, and Sranan,
predicting multiple phonological adaptation strategies for loanwords.
Part Two of the volume includes five case studies. John Alderete’s
“Exploring recursivity, stringency, and gradience in the Pama-Nyungan stress
continuum” (pp. 181–202) reviews contemporary approaches to the morphological
influences on stress in Diyari, Dyirbal, Warlpiri, and other Pama-Nyungan
languages. The author develops nine different theories to account for the variation
found; these differ in the constraints responsible for edge effects in stress and the
alignment of morphological and prosodic structure. Analysing the factorial
typology of each theory, the author comes up with three conclusions. First,
stringency (special-general) relations between morpho-prosodic alignment
constraints are necessary because theories that ignore them either fail to describe
all relevant data or predict the existence of implausible (and unattested) stress
patterns. Second, some gradiently evaluated constraints have to stay even though
some others can (and must) be dispensed with. And third, McCarthy and Prince’s
recursive prosodic word analysis can be given both theoretical and empirical
support.
Maria Gouskova and Nancy Hall’s “Acoustics of epenthetic vowels in
Lebanese Arabic” (pp. 203–225) examines Lebanese epenthetic vowels through
acoustic experiments and shows that such vowels have phonetic traces that can
help learners distinguish them from underlying vowels. Although epenthetic and
lexical vowels are often transcribed as identical, they turn out to be acoustically
distinct: epenthetic vowels are either shorter or backer or both. The authors
propose a learning strategy based on McCarthy’s theory of Candidate Chains that
provides a way to model this incomplete neutralization and its opaque interaction
with stress assignment. In particular, they suggest that phonetic implementation
optionally accesses an intermediate level of phonological derivation, that is, a
stage that is closer to the underlying representation than the (fully neutralized)
surface phonological form of the given item.
Junko Ito and Armin Mester’s “The onset of the prosodic word” (pp. 227–
260) is my personal favourite in the whole volume. In one of the pioneering works
of OT, McCarthy offered a comprehensive analysis of r-insertion in non-rhotic
English dialects, suggesting that the constraint driving the process was not an
onset-related one but rather a constraint requiring prosodic words to end in a
consonant. This paper shows that this counter-intuitive ‘anti-wellformedness’
constraint can be done away with on the basis of an enriched view of prosodic
constituent structure involving functional morphemes and the onset properties of
the maximal prosodic word. “Empirically, our analysis not only accounts for the
complex distribution of the linking r-consonant in RP and the Eastern
Massachusetts dialect, but also extends straightforwardly to the different
distributions in other dialects. While preserving the central insights of
[McCarthy’s paper], which remains not just a classic but also a model of
optimality-theoretic analysis, the present proposal is theoretically grounded in
correspondence theory (positional faithfulness), and is a natural outgrowth of a
conception of prosodic structure that views function words as occupying positions
within extended word structures (maximal prosodic words)” (pp. 256–7).
Ania Łubowicz’s “Infixation as morpheme absorption” (pp. 261–284)
presents evidence that infixes in Palauan and Akkadian are subject to feature
cooccurrence restrictions (OCP) on the root domain, whereas segmentally
identical prefixes are not. In order to account for this asymmetry, the author
proposes that infixes are structurally incorporated into the root morpheme in the
output through a process called morpheme absorption.
Finally, Sam Rosenthall’s “Vowel length in Arabic verb stems” (pp. 285–
307) relies on a foundational insight of OT, the interaction between ranked and
violable constraints, in analysing the intricate morphophonemics of Arabic verb
roots containing a glide as one of their radicals. Vowel coalescence and
compensatory lengthening are both seen to arise from the same subhierarchy of
constraints, but only if verb roots are crucially triliteral underlyingly. The chapter
also argues for a prosodic analysis of verb stems, in accordance with McCarthy
and Prince’s Prosodic Morphology Hypothesis.
The back matter includes a cumulative list of References (pp. 308–347), as
well as an author index, an index of constraints, an index of languages, and a
subject index.
All in all, this is an important book and, although by no means easy
bedside reading, it is thoroughly enjoyable even for readers whose acquaintance
with the current OT scene is somewhat superficial. It is a pity that the volume is
riddled with a substantial number of typos of various sorts, from simple
misalignments (as on p. 275 (26) or p. 294 (14)) through cases like “it is difficult
how to see how” (p. 138), “a language that that neutralizes contrasts” (p. 149), “it
less likely to affect” (p. 210), “the fact that that the optimal stem has a long vowel”
(p. 298), “as well in as clusters” (p. 154, fn. 3), “such as constraint” (for such a
constraint, p. 258, fn. 8), to truly embarrassing instances like “case ending suffix”
for infinitive suffix (p. 283 fn. 20), “obstruent-sonorant clusters” for sonorant-
obstruent clusters (p. 205 (4)), and even transcription errors (in nonsense items)
like “stʌt” for stɔɪt (p. 37). Perhaps the most serious error is this: “a special
PRECEDENCE constraint requires that epenthesis precede insertion of stress” where
the correct requirement is that stress assignment precedes epenthesis (p. 219). The
typographic details of referencing conventions are not uniform throughout
(Alderete’s chapter is the odd man out in this respect). And even the editor’s own
name is misspelt at one point as “Stever Parker” (p. 75, fn. 1).
Such minor (or not-so-minor) imperfections notwithstanding, the book will
be of interest to anyone who seriously follows what is going on in the field of
phonology in general and Optimality Theory in particular.
Géza Németh & Gábor Olaszy eds. (2010) A magyar beszéd. Beszédkutatás,
beszédtechnológia, beszédinformációs rendszerek
[Hungarian Speech. Speech research, speech technology, speech information
systems]
Budapest: Akadémiai Kiadó (708 pp. ISBN 978-963-05-8966-6)
Reviewed by: Péter Siptár
Eötvös Loránd University, Budapest, Hungary
Speech technology is one of the new industries of the late twentieth and early
twenty-first centuries – and this volume is its first systematic book-size overview
in Hungarian and on Hungarian. As the various devices and services of speech
technology, with their functions growing fast both in number and in diversity,
become part of our everyday lives and especially part and parcel of our children’s
lives, it is increasingly important that they are made interesting, attractive, easy to
learn and simple to use. The forms and functions of human speech communication
have taken several millennia to emerge; their application for information exchange
between man and machine is therefore a great opportunity and a great challenge.
Scientists have only taken the very first steps in that direction. So far, their
machines have but a tiny fraction of the communicative endowments of human
speakers at their disposal, especially with respect to the realm of meaning or
semantic interpretation.
With respect to the time of a potential financial breakthrough for speech
technology solutions, serious experts had predicted back in the 1980s that an
exponential increase was to be expected in the English-language speech
recognition market in a matter of two years or so. This did not happen; at best,
linear development took place, a fact that discouraged decision makers who
wielded influence over financial resources. Ever since, due to a tension between
marketing promises and actual performance, cycles of increased attention followed
by less awareness can be observed every five or six years. However, if we
compare the early eighties with the present day, the overall rate of development is
enormous. Fortunately, speech research has a significant tradition in Hungary.
Hence, it is not necessary for Hungarians to wait for technologies from big
multinational companies to fill the relatively small market of this country. Instead,
the Hungarians have found competitive solutions based on their own intellectual
and material resources.
This book is a compendium of what current results of scientific and
technological research have to tell us about Hungarian speech in the twenty-first
century. The aim of the authors, as the editors point out in the preface, is to present
an overview of the acoustic structure of present-day Hungarian speech, and to
review the recent results, problem areas, and applications of speech technology as
a relatively new interdisciplinary area of research, especially insofar as it pertains
to Hungary. The book has essential chapters (for instance, those on speech
acoustics or signal processing), as well as chapters on various applications and
technologies that characterize the state of the art. The book has an
associated homepage (http://magyarbeszed.tmit.bme.hu) that contains a host of
relevant data that had to be left out of the book due to lack of space.
The authors are leading speech technology experts of this country: Géza
Németh and Gábor Olaszy (the two editors), as well as Kálmán Abari, Mátyás
Bartalis, Tamás Bőhm, Tamás Gábor Csapó, László Czap, Tibor Fegyó, Géza
Kiss, Péter Mihajlik, György Szaszák, György Takács, Péter Tatai, Bálint Tóth,
Klára Vicsi, Ákos Viktóriusz, and Csaba Zainkó. The volume also has a
“supervising editor”, Géza Gordos.
The book has four large sections, preceded by a preface, a list of authors, and
a key to abbreviations, and followed by a large list of references, an appendix and
an index. The first section (People, language, and speech, pp. 1–92) consists of
four introductory chapters (Speech and the information society, pp. 3–7, The
complex structure of speech, pp. 9–18, Physiological and physical basics, pp. 19–
71, The connection between speech and writing, pp. 73–92). The second section,
still on a preliminary note, but focusing on Hungarian, discusses The structural
analysis of speech (pp. 93–205). This is the part of the book that is closest to
linguistic phonetics and is divided into two chapters (The segmental structure of
speech, pp. 95–170, and The suprasegmental structure of speech, pp. 171–205).
The third and largest section (Speech technology, pp. 207–522) discusses The
science of speech technology (pp. 209–259), Data bases serving speech
technology (pp. 261–331), Speech perception and recognition by machine (pp.
333–409), and Speech production by machine (pp. 411–522). Finally, the fourth
section (Applications of speech technology, pp. 523–655) tells us about Speech
information systems (pp. 525–539), provides Examples of the areas of application
of speech technology (pp. 541–629), lists Interfaces, standards, homepages, and
programs (pp. 631–651), and concludes with a very brief chapter by Nick Campbell and
Géza Németh on The future of speech technology (pp. 653–655), the last sentence
of which is “Speech technology is roughly at a stage of development that the
vehicle industry had reached by 1900.” It remains for the reader to decide whether
this is an optimistic or a pessimistic note to end a book like this on.
The book is primarily intended as a textbook for students of informatics.
However, it will also be useful for experts and decision makers in
telecommunication, speech technology research and development, designers of
content providing services, the health industry and rehabilitation. But the authors
had an even larger audience in mind when writing it. In their view, the book may
turn out to be useful in a range of less technologically minded university courses
in the humanities and elsewhere (phonetics, speech analysis, linguistics, speech
psychology, health promotion and disease prevention, mass communication, and
so on). The authors furthermore recommend this book for secondary schools, and
indeed for anybody who might be interested (like physicists, linguists, people who
work for radio or television or in the movie industry or media experts in popular
science). The comprehensive contents and the relatively popular attitude of the
book make it readable and even enjoyable for everybody from philosophers to
engineers (and beyond).
Halicki, Shannon D. (2010)
Learner Knowledge of Target Phonotactics: Judgements of French Word
Transformations
Lincom GmbH, (LINCOM Studies in Language Acquisition Series (LSLA), 27),
ix + 234 pages,
ISBN 9783895867408, price €65,10
USD 79.70 / EUR 64.80 / GBP 55.10.2009.
Reviewed by
Chantal Paboudjian
University of Provence, Aix-en-Provence, France
This 27th volume of the LINCOM Studies in Language Acquisition series contains
the publication of the Ph.D. dissertation of Dr. Shannon Halicki, who is now
assistant professor of French and Spanish at the Department of Humanities of
West Liberty University in West Virginia. The dissertation was defended in 2009
at Indiana University in Bloomington.
Throughout its 7 chapters, the book addresses the relationship between
language learners’ inter-language phonology and Universal Grammar (UG). More
precisely, it seeks to determine the extent to which inter-language phonology is
constrained by UG principles. Following Chomsky’s 1965 Aspects of the Theory
of Syntax, it is assumed that an innate language learning mechanism is intact in
adult second language acquisition. Learners would thus be equipped with pre-existing
knowledge that makes acquisition possible. The author takes a position opposed to
most studies on the subject, which hold that second language phonology is
not native-like. She investigates whether adult English-speaking second-language (L2)
learners of French acquire L2 phonotactic constraints at abstract levels in the same
way as native learners, and hypothesizes that learners can reconfigure L1
parameters to accommodate new L2 material. Two major research questions are
addressed:
“Do non-native speakers of a language exhibit consistent judgments of
wordlikeness in their target language?” and if they do,
“Are the judgements native-like, driven by L1 transfer and inhabit the niche
occupied by native language phonologies?”
To provide answers to these questions, the author tests L2 knowledge
of three structural features which differ between native language (L1) and L2, i.e.,
consonant cluster limits in French, sonorancy assimilation at morpheme
boundaries, and similarity avoidance at morpheme boundaries.
Chapter 1. Introduction and Background (pp. 1-33) reviews arguments
that grammars (including phonological grammars) are generative systems whose
acquisition is driven by an innate learning mechanism. It presents research on
how both native and L2 learners acquire language, particularly syntactic
well-formedness and interpretation, through innate ability. It also briefly
addresses issues such as evidence of native speaker intuition about phonotactics,
L2 learners’ acquisition of non-learnable knowledge about the target language (not
transferred from the L1), the relationship between UG constraints and phonology,
and L2 phonological systems (with a focus on learners’ pronunciation). The chapter
concludes with a
presentation of the research questions the author studies in the volume.
Chapter 2. Studies in native and learner phonotactic performance (pp.
32-66). Since L2 learners seem to demonstrate native-like judgments of syntactic
well-formedness and interpretation, the author asks whether they demonstrate
similar abilities in L2 phonotactics. She thus surveys literature relevant to the
study of L2 phonotactic knowledge with special focus on syllable well-formedness
contrasts (relationship between markedness and language universals) and the
concept of ‘wordlikeness’ in cognitive linguistics. She also reviews two relevant
L2 studies carried out in an Optimality Theory framework and presents studies
showing the importance of an abstract phonological level in the account of the
data.
Chapter 3. The learnability of French syllable Constraints by L1 English
Speakers (pp. 67-110). The author here examines the ‘learnability’ (author’s
expression) of constraints on French consonant clusters exhibiting L1-L2
contrasts. She describes facts of syllable structure in French and English in order
to specify the nature of the learning task, the type of input available to learners as
well as representations that may be transferred from the L1 system. The
parametric difference of syllable structure between English and French is
presented using McCarthy and Prince’s Prosodic Morphology analysis. Syllable
structure constraints and a detailed description of the French maximal syllable in
codas are provided and illustrated and the validity of some minor linguistic
phenomena such as word transformations is discussed. A section further analyses
the rules of popular French re-suffixation (likened to slang word
manipulations), which are difficult for L2 learners to acquire.
Chapter 4. Experimental Design and Methodology (pp. 111-130)
describes the design of the word-building experiment and the statistical procedures
used in the data analysis. Three structural features, i.e., consonant cluster limit,
sonorancy assimilation and continuancy dissimilation, have been tested. The tests,
designed to probe intuitions regarding the well-formedness of re-suffixed items in
French, instructed intermediate and advanced English-speaking learners of French
as well as native French speakers to give their levels of acceptance of series of
items with sequences varying at the phonotactic level. The stimuli, the
questionnaire and the experimental hypotheses for the tests are described.
Moreover, the questionnaires for participants and lists of the test items are provided
in the appendix to the volume.
Chapter 5. Quantitative Results (pp. 131-161) provides quantitative
results for the three tests in the experiment, illustrated by 24 tables and figures. Both
French native speakers and English learners of French appear to exhibit similar
judgments of asymmetries in the well-formedness of proposed nonce re-suffixed
items, with the level of confidence increasing with language proficiency. However, a
difference is noted in the rates of acceptance of some consonant clusters in nonce
words between learners and native speakers. Legal sequences in French were
accepted within the context of roots but not in derivations.
Chapter 6. Discussion (pp. 162-189). In this last part, the author interprets
and discusses the central findings of Chapter 5, which are that advanced and
intermediate learners as well as native French speakers rejected some items but
accepted others as well-formed. The author concludes that formal phonological
grammar (knowledge of phonotactic constraints and knowledge of alternations) is
the primary locus of both groups. She argues that L2 learners construct the
phonological shape of the suffix at the prosodic level and obey constraints on
operations having to do with the preservation of roots, and specification of
phonological features of allomorphs. She discusses the potential influences of
lexical frequency and universal markedness which would predict outcomes in the
word judgment task.
Chapter 7. Conclusion (pp. 190-203). This chapter contains a general
discussion with conclusions drawn from the experimental findings. The author
points out two novel aspects of the presented research: (1) her adoption of a new
approach to issues such as learner simplification strategies, namely the
Optimality Theory framework, which assumes the universality of constraints on
language output as well as parametric differences between languages; (2) her
adoption of a new psycholinguistic approach to phonological testing, namely the
introduction of the notion of relative acceptability/rejection, which takes into
account the gradient judgments of listeners. Finally, two sections stress the role of UG in L2
phonology and of lexical frequencies in phonotactic knowledge. The concluding
remarks show that a line of research has been opened up onto important issues in
language acquisition. Further research could focus on the order of acquisition of
structures and the definition of cues needed to establish correct parameter settings.
The first impression made by this book is that it is geared towards language
acquisition specialists who are fluent readers of English. Other readers may be
discouraged by the complexity of a presentation (particularly in the last two
chapters) more suited to a dissertation than to the communication of a scientific
work to a larger public. In addition, the small font size does not make
reading easier for some. However, language acquisition specialists will
acknowledge the tremendous work that has been conducted in the presentation of
the reviews, and language teachers will appreciate the analysis of research on
language acquisition mechanisms. French teachers will also find helpful and
sometimes practical information they can directly use in their teaching approach.
Reference
Chomsky, Noam (1965). Aspects of the Theory of Syntax, Cambridge, MA, MIT
Press.
WORKSHOPS AND CONFERENCES
+++
2-3 May 2012
The Listening Talker (LISTA) Workshop
Edinburgh, Scotland
+++
2-4 May 2012
2nd Workshop on Sound Change
Kloster Seeon, Germany
+++
21-27 May 2012
8th International Conference on Language Resources and Evaluation (LREC)
Istanbul, Turkey
+++
22-25 May 2012
Speech Prosody 2012
Shanghai, China
+++
26 May 2012
4th International Workshop on Corpora for Research on Emotion Sentiment &
Social Signals
Istanbul, Turkey
+++
19-21 July 2012
Interdisciplinary Workshop on Perspectives on Rhythm and Timing
Glasgow, UK
+++
27-29 July 2012
Lab Phon 13
Stuttgart, Germany
+++
5-8 August 2012
Annual Conference of the International Association for Forensic Phonetics and
Acoustics (IAFPA)
Santander, Spain
+++
3-5 September 2012
ISICS 2012: International Symposium on Imitation and Convergence in Speech
Aix-en-Provence, France
+++
7-8 September 2012
Interdisciplinary Workshop on Feedback Behaviors in Dialog
Portland, U.S.
+++
9-13 September 2012
Interspeech 2012
Portland, U.S.
+++
25-29 August 2013
Interspeech 2013
Lyon, France
+++
7-11 September 2014
Interspeech 2014
Singapore
+++
August 2015
18th International Congress of the Phonetic Sciences (ICPhS)
Glasgow, Scotland
+++
September 2015
Interspeech 2015
Dresden, Germany
CALL FOR PAPERS
The Phonetician will publish peer-reviewed papers and short articles in all areas of
speech science, including articulatory and acoustic phonetics, speech production and
perception, speech synthesis, speech technology, applied phonetics,
psycholinguistics, sociophonetics, history of phonetics, etc. Contributions should
primarily focus on experimental work but theoretical and methodological papers
will also be considered. Papers should be original works that have not been
published and are not under consideration for publication elsewhere.
Authors should follow the guidelines of the Journal of Phonetics for the
preparation of their manuscripts. Manuscripts will be reviewed anonymously by
two experts in the field. The title page should include the authors’ names and
affiliations, address, e-mail, telephone, and fax numbers. Manuscripts should
include an abstract of no more than 150 words and up to four keywords. The final
version of the manuscript should be sent as both .doc and .pdf files. It is the
authors’ responsibility to obtain written permission to reproduce copyright
material.
All kinds of manuscripts should be sent in electronic form (.doc and .pdf) to the
Editor. We encourage our colleagues to send manuscripts for our newly released
section entitled Master’s research: Introduction. Master’s students are invited to
sum up their research in the area of phonetics answering the questions of
motivation, topic, goal, and results (no more than 1,200 words).
INSTRUCTIONS FOR BOOK REVIEWERS
Reviews in The Phonetician are dedicated to books related to
phonetics and phonology. Usually the editor contacts prospective
reviewers. Readers who wish to review a book mentioned in the list
of “Publications Received”, or any other book, should contact the
editor about it.
A review should begin with the author’s surname and name,
publication date, the book title and subtitle, publication place, publishers, ISBN
numbers, price, page numbers, and other relevant information such as number of
indexes, tables, or figures. The reviewer’s name, surname, and address should
follow “Reviewed by” in a new line.
The review should be factual and descriptive rather than interpretive, unless
reviewers can relate a theory or other information to the book which could benefit
our readers. Review length usually ranges between 700 and 2500 words. All
reviews should be sent in electronic form to Prof. Judith Rosenhouse (e-mail:
ISPhS MEMBERSHIP APPLICATION FORM
Please mail the completed form to:
Treasurer:
Prof. Dr. Ruth Huntley Bahr, Ph.D.
Treasurer’s Office:
Dept. of Communication Sciences and Disorders
4202 E. Fowler Ave. PCD 1017
University of South Florida
Tampa, FL 33620 USA
I wish to become a member of the International Society of Phonetic Sciences
Title: ____ Last Name: _________________ First Name: _________________
Company/Institution: ________________________________________________
Full mailing address: ________________________________________________
________________________________________________________________
Phone: __________________________ Fax: ____________________________
E-mail: ___________________________________________________________
Education degrees: __________________________________________________
Area(s) of interest: __________________________________________________
The Membership Fee Schedule (check one):
1. Members (Officers, Fellows, Regular) $ 30.00 per year
2. Student Members $ 10.00 per year
3. Emeritus Members NO CHARGE
4. Affiliate (Corporate) Members $ 60.00 per year
5. Libraries (plus overseas airmail postage) $ 32.00 per year
6. Sustaining Members $ 75.00 per year
7. Sponsors $ 150.00 per year
8. Patrons $ 300.00 per year
9. Institutional/Instructional Members $ 750.00 per year
Go online at www.isphs.org and pay your dues via PayPal using your credit card.
I have enclosed a cheque (in US $ only), made payable to ISPhS.
Date ___________________ Full Signature _____________________________
Students should provide a copy of their student card.
News on Dues
Your dues should be paid as soon as it is convenient for you to do so. Please send
them directly to the Treasurer in US$:
Prof. Ruth Huntley Bahr, Ph.D.
Dept. of Communication Sciences & Disorders
4202 E. Fowler Ave., PCD 1017
University of South Florida
Tampa, FL 33620-8200 USA
Tel.: +1.813.974.3182, Fax: +1.813.974.0822
e-mail: rbahr@usf.edu
VISA and MASTERCARD: You now have the option to pay your ISPhS
membership dues by credit card using PayPal if you hold a VISA or
MASTERCARD. Please visit our website, www.isphs.org, and click on the
Membership tab and look under Dues for the underlined phrase, “paid online via
PayPal.” Click on this phrase and you will be directed to PayPal.
The Fee Schedule:
1. Members (Officers, Fellows, Regular) $ 30.00 per year
2. Student Members $ 10.00 per year
3. Emeritus Members NO CHARGE
4. Affiliate (Corporate) Members $ 60.00 per year
5. Libraries (plus overseas airmail postage) $ 32.00 per year
6. Sustaining Members $ 75.00 per year
7. Sponsors $ 150.00 per year
8. Patrons $ 300.00 per year
9. Institutional/Instructional Members $ 750.00 per year
Special members (categories 6–9) will receive certificates; Patrons and
Institutional members will receive plaques, and Affiliate members will be
permitted to appoint/elect members to the Council of Representatives (two each
for national groups; one each for other organizations).
Libraries: Please encourage your library to subscribe to The Phonetician. Library
subscriptions are quite modest – and they aid us in funding our mailings to
phoneticians in Third World Countries.
Life members: Based on the request of several members, the Board of Directors
has approved the following rates for Life Membership in ISPhS:
Age 60 or older: $ 150.00
Age 50–60: $ 250.00
Younger than 50 years: $ 450.00