the Phonetician

A Publication of ISPhS/International Society of Phonetic Sciences

Historic larynx models from Franz Wethlo

Number 101/102 2010 – I / II


ISPhS

International Society of Phonetic Sciences

President: Ruth Huntley Bahr

Secretary General: Mária Gósy

Honorary President: Harry Hollien

Vice Presidents: Angelika Braun, Marie Dohalská-Zichová, Mária Gósy, Damir Horga, Eric Keller, Heinrich Kelz, Stephen Lambacher, Asher Laufer, Judith Rosenhouse

Past Presidents: Jens-Peter Köster, Harry Hollien, William A. Sakow †, Martin Kloster-Jensen, Milan Romportl †, Bertil Malmberg †, Eberhard Zwirner †, Daniel Jones †

Honorary Vice Presidents:

A. Abramson P. Janota A. Marchal M. Rossi R. Weiss

S. Agrawal W. Jassem H. Morioka M. Shirt

L. Bondarko M. Kloster-Jensen R. Nasr E. Stock

E. Emerit M. Kohno T. Nikolayeva M. Tatham

G. Fant E.-M. Krech R. K. Potapova F. Weingartner

Auditor: Angelika Braun

Treasurer: Ruth Huntley Bahr

Affiliated Members (Associations):
American Association of Phonetic Sciences
Dutch Society of Phonetics (B. Schouten)
International Association for Forensic Phonetics and Acoustics (P. French)
Phonetic Society of Japan (I. Oshima & K. Maekawa)
Polish Phonetics Association (G. Demenko)

Affiliated Members (Institutes and Companies):
KayPENTAX, Lincoln Park, NJ, USA (J. Crump)
Inst. for Advanced Study of the Communication Processes, University of Florida, USA (H. Hollien)
Dept. of Phonetics, University of Trier, Germany (A. Braun)
Dept. of Phonetics, University of Helsinki, Finland (A. Iivonen)
Dept. of Phonetics, University of Zürich, Switzerland (S. Schmid)
Centre of Poetics and Phonetics, University of Geneva, Switzerland (S. Vater)


International Society of Phonetic Sciences (ISPhS) Addresses

www.isphs.org

President:
Professor Ruth Huntley Bahr, Ph.D.
President's Office:
University of South Florida
Dept. of Communication Sciences & Disorders
4202 E. Fowler Ave., PCD 1017
Tampa, FL 33620-8200
USA
Tel.: ++1-813-974-3182
Fax: ++1-813-974-0822
e-mail: rbahr@usf.edu

Secretary General:
Prof. Dr. Mária Gósy
Secretary General's Office:
Kempelen Farkas Speech Research Laboratory
Hungarian Academy of Sciences
Benczúr u. 33
H-1068 Budapest
Hungary
Tel.: ++36 (1) 321-4830 ext. 172
Fax: ++36 (1) 322-9297
e-mail: [email protected]

Guest Editor:
Dr. Jürgen Trouvain
FR 4.7 Computational Linguistics and Phonetics
Saarland University
Campus C7.2
D-66041 Saarbrücken
Germany
Tel.: +49 (681) 302 4694
Fax: +49 (681) 302-4684
Email: [email protected]

Book Review Editor:
Prof. Judith Rosenhouse, Ph.D.
Swantech
89 Hagalil St
Haifa 32684
Israel
Tel.: ++972-4-8235546
Fax: ++972-4-8235546
e-mail: [email protected]


FROM THE PRESIDENT

I hope that you are enjoying the new format of the Phonetician.

The ability to include color photographs and graphs makes the

text come alive. I am grateful to the individuals who volunteer to

edit an issue. Prof./Dr. Mária Gósy is doing an excellent job of

recruiting editors; however we would welcome you to volunteer

to edit an issue. The Phonetician would be an excellent way to

showcase your area of phonetics and your institute. We all

benefit from hearing about each other’s work. So, please consider editing an issue

for us. A quick email to me or Prof./Dr. Gósy and we will help you get started and

guide you through the process.

Many thanks go out to Dr. Jürgen Trouvain for editing the current issue. My

favorite thing about this issue is the variety of topics covered. The research articles

range from a description of long term formant distributions in read and

spontaneous speech to throat singing. There is a good article on the acoustic-

phonetic collection in Dresden, as well as an article on a lesser studied language,

Lower Sorbian. Finally, we have an article in French dealing with prosody. There

is definitely something for everyone. We would love to hear your comments on

the recent issues of the Phonetician and its new online format.

FROM THE EDITOR

After various guest editorships, this double issue of the

Phonetician comes from Saarbrücken. It brings together

different research contributions which reflect a large range of

the phonetic sciences: from the acoustics of individual speaker

characteristics to the physiology of throat singing, from the

collection of historical phonetic instruments via the acquisition

of a corpus of an endangered language to an experimental study at the syntax-

prosody interface. In addition to the research articles, the reader finds conference

reports, the presentation of phonetic institutes, book reviews, as well as obituaries.

My warm thanks go to all contributors of this issue. I would like to express

my gratitude to all colleagues who supported me as a guest editor, be it as a

reviewer or in another form.

Jürgen Trouvain

Saarbrücken, March, 2012


The Phonetician

A Publication of ISPhS/International Society of Phonetic Sciences

ISSN 0741-6164

Numbers 101/102 / 2010-I/II

Contents

From the President …………………………………………………………. 4

From the Editor ………...……...…………………………………………… 4

Articles and Research Notes

Long-term formant distribution as a measure of speaker characteristics in

read and spontaneous speech

by Anja Moos …………………………………………………………………

7

On the Physiology of Voice Production in South-Siberian Throat singing –

Extended Abstract

by Sven Grawunder …………………………………………………………..

25

The Historical Phonetic-Acoustic Collection of the TU Dresden

by Rüdiger Hoffmann & Dieter Mehnert ..........................................................

33

GENIE: The Corpus for Spoken Lower Sorbian (GEsprochenes

NIEdersorbisch)

by Roland Marti, Bistra Andreeva & William J. Barry ………………………

47

Adjectif épithète et attribut de l’objet. Qu’en est-il de la prosodie?

by Denis Ramasse …………………………………………………………….

60

Obituaries

Eli Fischer-Jørgensen (1911-2010)

by Jack Windsor Lewis ………………………………………………………

78

Eva Sivertsen (1922-2010)

by Jack Windsor Lewis ……………………………………………………….

79

Gösta Bruce (1947-2010)

by Merle Horne ……………………………………………………………….

80

Ilse Lehiste (1922-2010)

by Viola Váradi ……………………………………………………………….

83


Awards

Svend Smith Award 2008 for Elisabeth Lhote

by Jens-Peter Köster ………………………………………………………….

86

Phonetic Institutes Present Themselves

The Department of Language and Communication Studies at Norwegian

University of Science and Technology, Trondheim, Norway

by Jacques Koreman ………………………………………………………….

88

Phonetics Lab and the Phonogram Archives at Zurich University,

Switzerland

by Volker Dellwo & Dieter Studer …………………………………………...

91

Conference Reports

Speech Prosody 2010 Chicago (USA)

by Stefan Baumann …………………………………………………………...

96

19th Annual Conference of the IAFPA 2010 Trier (Germany)

by Peter Knopp ……………………………………………………………….

96

New Sounds 2010 – 6th International Symposium of the Acquisition of

Second Language Speech Poznań (Poland)

by Matthias Jilka ……………………………………………………………..

99

Book Reviews

Steve Parker (ed) 2009. Phonological Argumentation. Essays on Evidence

and Motivation.

reviewed by Péter Siptár …………………………………………………….

106

Géza Németh & Gábor Olaszy (eds.) 2010. A magyar beszéd.

Beszédkutatás, beszédtechnológia, beszédinformációs rendszerek

[Hungarian Speech. Speech research, speech technology, speech information

systems]

reviewed by Péter Siptár ……………………………………………………..

110

Halicki, Shannon D. 2010. Learner Knowledge of Target Phonotactics:

Judgements of French Word Transformations.

reviewed by Chantal Paboudjian …………………………………………….

112

Meetings, Conferences and Workshops …………………………………... 116

Call for Papers ……………………………………………………………… 118

Instruction for Book Reviewers …………………………………………… 118

ISPhS Membership Application Form ……………………………………. 119

News on Dues ……………………………………………………………….. 120


LONG-TERM FORMANT DISTRIBUTION AS A MEASURE OF

SPEAKER CHARACTERISTICS IN READ AND SPONTANEOUS

SPEECH

Anja Moos

GULP (Glasgow University Laboratory of Phonetics) and School of

Psychology, University of Glasgow, UK

e-mail: [email protected]

Abstract

The simple method of averaging the formant values of a speaker's recording, known as

Long-Term Formant Distribution (LTF), is applied here to German speech in the context

of forensic speaker identification. Introduced by Nolan and Grigoras (2005), the

advantage of LTF is that it is not necessary to categorize and label each vowel produced.

Instead, for each speaker, the formants of all vocalic portions are averaged, thus leading

to one mean value per formant. The volume of speech data necessary to attain reliable

LTF values is also examined.

LTF values of 71 German speaking males in spontaneous and read speech

recorded via mobile phone connections were analysed. Good speaker characterisation is

possible using the LTF values of F2 and F3; LTF values of F3 seem slightly more useful

because F3 is less variable within speakers than F2. Comparison of spontaneous and read

speech revealed significant differences between the LTF values of F2 and F3 of the two

speaking styles. The LTF values of formants of read speech are higher. As LTF values

only return the average and standard deviation of formants, they are not suitable for

speaker recognition on their own. However, LTF is independent of many other measures

of a speaker, such as speaking rate, dialect, and fundamental frequency. Therefore, LTF

values can be used as an additional independent factor in speaker recognition.

Keywords

Long-term formant distribution, LTF, read vs. spontaneous speech, mobile phone

recordings, speaker comparison

Definition of LTF

Long-Term Formant Distribution (LTF) is a method used to determine average formant

values of a speaker. For each formant, all formant measurements of all vowels produced

by a speaker are averaged (across the entire recording or appropriate sub-portions of a

recording). This average is the LTF value for this formant. That means that every speaker

has one LTF value and a standard deviation (SD) per formant which shall be called

LTF1, LTF2 and so on. It is a frame-by-frame measurement, meaning that long vowels

carry more weight than short vowels.
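The definition above can be illustrated with a minimal sketch. It assumes the formant tracker has already produced one (F1, F2, F3) tuple per analysis frame of the vocalic stream. The input format, the function name `ltf` and the toy values are hypothetical, and the sample standard deviation is used here although the text does not say which SD variant was applied.

```python
from statistics import mean, stdev

def ltf(frames):
    """Return {"LTF1": (mean_hz, sd_hz), ...} from per-frame formant values.

    `frames` is a list of (F1, F2, F3) tuples in Hz, one per analysis frame
    of the vocalic stream (hypothetical input format).
    """
    columns = list(zip(*frames))  # regroup the frames into one sequence per formant
    return {
        f"LTF{i}": (mean(col), stdev(col))  # sample SD; an assumption
        for i, col in enumerate(columns, start=1)
    }

# Invented toy data: a long vowel contributes three frames, a short vowel one frame.
long_vowel = [(300.0, 2200.0, 2900.0)] * 3
short_vowel = [(700.0, 1200.0, 2500.0)]
result = ltf(long_vowel + short_vowel)
print(result["LTF2"][0])  # the mean F2 is pulled towards the long vowel
```

Because every frame counts once, the long vowel dominates the average, which is the frame-by-frame weighting described in the definition.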

1. Introduction

To identify a speaker by his or her phonetic speaker characteristics, various

acoustic and auditory measures are taken into account. According to Jessen


(2007), auditory measures such as estimation of age, health, sex, dialect and

sociolect mostly refer to group characteristics, whereas fundamental frequency,

articulation rate, formants and voice quality, which are often measured both

acoustically and aurally, are more speaker specific. This paper focuses on

formants as the importance of and interest in formant measures for forensic cases

grows. Many studies in the last decade have shown that formants carry speaker-

specific information and that their analysis is also possible under forensic

conditions, i.e. given the poor quality and band-pass filtering of telephone recordings (see

Rose, 2006; Nolan, 2002; Byrne & Foulkes, 2004). This paper follows Nolan &

Grigoras (2005) who state:

“It is argued here that formants, whose frequencies and dynamics are

the product of the interaction of an individual vocal tract with the

idiosyncratic articulatory gestures needed to achieve linguistically

agreed targets, are so central to speaker identity that they must play a

pivotal role in speaker identification.” (Nolan & Grigoras, 2005: 143)

Of course formants of different people are not unique; but when combined with

other speaker characteristics listed above, they may lead to a very idiosyncratic

speaker description. Each additional independent feature can help to identify a

speaker.

The most commonly used method for formant measures in forensic phonetics

to date is the centre frequency of different vowels (cf. Jessen, 2008; Rose, 2002).

Here, formants are measured at the midpoint which is defined as the articulatory

target of the vowel produced. Usually one tries to find a number of representatives

of a couple of different vowels, mostly /i a o/, to compare their formant values

from the suspect with those of the perpetrator. Comparison of vowels in speech

can be problematic using this method as the context influences the formants. It

might also be difficult to define vowel phonemes in general or their centre

frequency in particular when dealing with a foreign language and/or poor

recording quality. Although it is an accurate method, it is very time consuming.

Another method is the study of formant dynamics. McDougall did this for

/aI/ (McDougall, 2004) and /u/ (McDougall & Nolan, 2007). They found within-

speaker consistency and between-speaker differences in the data and argued that

more attention should be paid to the development of techniques to measure

dynamic features (McDougall, 2006). However, this method bears unknown

effects of the vowel context and further research is necessary. Long term spectra

(LTS) are also used to show formant average distributions (see e.g. Nolan &

Grigoras, 2005; Hollien, 1990). An LTS is the average of all spectral slices of a

sound sample. As well as voiced speech, LTS takes everything else in the signal

into account, including voiceless portions of speech, background noise etc.
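As a rough illustration of why an LTS also reflects non-speech material, the sketch below averages a set of spectral slices bin by bin. The function name and the three-bin spectra are invented for illustration; real slices would come from a short-time spectral analysis that is not shown here.

```python
def long_term_spectrum(slices):
    """Average a list of equally long magnitude spectra bin by bin."""
    n = len(slices)
    return [sum(bin_values) / n for bin_values in zip(*slices)]

# Invented three-bin spectra: two voiced frames and one noise-only frame.
slices = [
    [10.0, 40.0, 20.0],  # voiced frame
    [12.0, 44.0, 18.0],  # voiced frame
    [8.0, 6.0, 7.0],     # background noise, still counted by an LTS
]
lts = long_term_spectrum(slices)
print(lts)
```

The noise-only slice lowers the average in the middle bin, whereas a measure restricted to the vocalic stream would not include that slice at all.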

Long-Term Formant Distribution (LTF) was developed by Nolan and

Grigoras (2005) in order to address the flaws of the single vowel phoneme


measures and LTS. This method does not require a categorization of vowels;

instead, every vowel is used for the measurements. It is also less time consuming

to select all vowels by reading the spectrogram rather than carefully listening to

the file repeatedly to detect single vowel phonemes. In addition to saving time,

being easy to use and suitable for foreign languages, Nolan & Grigoras (2005)

mention two more benefits. First, the distribution of the formants not only reflects

the dimensions of the vocal tract but also shows habits in articulatory settings like

palatalization or lip rounding. Second, the shape of the distribution of a formant

might show useful information about the speaker insofar as a broad-peaked or

narrow-peaked distribution might reflect the speaker’s vowel space. The

disadvantages of LTF are that inter-individual differences on single vowels cannot

be detected and speech dynamics like transitions and coarticulation are lost.

The work of Nolan & Grigoras (2005) showed the benefits, usefulness and

efficiency of the LTF method on an English forensic case. This study will show its

applicability to German and also provide information on the following aspects:

● Testing for correlation of LTF values with fundamental frequency,
articulation rate, and dialect groups. If they correlate, it is not necessary
to use LTF in addition because no further information is gained. If they do
not correlate, LTF can be used as an independent measure that adds further
information to the characterisation or discrimination.

● Determination of how many seconds of vocalic stream or of speech
recordings are needed to derive reliable LTF values. This is an essential
issue for forensic case work because voice recordings are often limited in
duration.

● Comparison of different speaking styles (read and spontaneous speech). It is
important to know whether, and to what degree, recordings of the same voice
differ in their LTF values between speaking styles so that it can be
determined whether spontaneous speech of a perpetrator can be reliably
compared with read speech of a suspect.

● Creation of a reference database for German LTF values comprising 71
speakers. This will be useful for future use in Bayesian methods like the
likelihood ratio (see Jessen, 2008; Morrison, 2009; Rose, 2002 for usage of
likelihood ratio in forensic speaker comparison).

2. Methods

2.1. Data

Recordings of the speech corpus “Pool2010” (Jessen et al., 2005) were used. From

this German corpus, recordings of 71 male participants who read out the German

version of “North wind and the sun” were used for this experiment. For

spontaneous speech, participants were asked to describe objects to another person

without using predefined words, similar to the game “Taboo”. The person

guessing the object played ignorant to encourage the speaker to describe the items

more extensively, thereby triggering longer stretches of spontaneous speech. All


the recordings were made in high studio quality and later played back through

speakers and re-recorded through mobile phones to have data close to forensic

case data. The mobile phone data was used for this experiment.

The recordings of spontaneous speech were 79-313 seconds long (M= 178 seconds).

Recordings of the read story were 31-54 seconds long (M= 39 seconds). For the LTF

analysis, recordings were cut in a way described in section “2.3 Data preparation” below,

so that only vowels remained. After that, the vocalic stream of spontaneous speech was

12-83 seconds (M= 40 seconds), and the vocalic stream of read speech was 8-16 seconds

(M= 12 seconds). In total 142 sound files were used (71 speakers × 2 speaking styles).

2.2. Speakers

Recordings of 71 male German speakers were used. Speakers were 25 to 55 years

of age (M= 38 years). Roughly half of them had recognizable but generally weak

dialectal features of Hessian German (‘Hessisch’); 45 of the participants were

actually from that area. The remaining participants were from other parts of

Germany. None of the speakers had heavy dialectal features, and everyone had an

average or above average educational background. No noticeable speech or voice

disorders were present. Speaker IDs ranged from 35 to 107 (excluding 61 because of

lack of data); speakers will be referred to later in the text by their IDs.

2.3. Data preparation

For LTF, only the vocalic stream is used (i.e., every recording was cut in such a

way that only vowel sounds remained). WaveSurfer (Sjölander & Beskow, 2005)

was used for the cutting procedure. The selection process was based on several

criteria:

● Clear and visible formant structure of the first three formants (intensity

settings were sometimes increased to find F3, especially for back vowels

which tend to have a higher spectral tilt)

● Laterals and approximants were kept (see footnote 1)

● Filled pauses and hesitations were kept if vocalic

● Creaky voice was kept if vocalic

● No nasals or strong nasality (because of zero formants at 2-3 kHz)

● No vowels spoken with a very high pitch so that harmonics rather than

formants were visible

This procedure resulted in sound files of pure vocalic stream, without any pauses

or consonants other than those stated above. These criteria were applied while

reading the spectrogram and deleting all unwanted regions. When it was unclear

whether nasality was present or not from reading the spectrogram, additional

auditory judgements were made.

2.4. Data analysis

The cut sound files were used for the formant measurements of F1, F2 and F3 with

WaveSurfer. The automatic formant tracking was set to four formants, an LPC

[Footnote 1: Because the formant structure of laterals and approximants is very similar to that of vowels, they were kept. This does not distort the data but saves working hours, since no auditory inspection is needed after visual selection of the vocalic stream.]

order of 12, a frame interval of 0.01 seconds and a nominal F1 of 500 Hz.

Recordings were down-sampled to 10 kHz. Usually the bandwidth of telephone

recordings (roughly 300 Hz to 3-4 kHz) does not display F4 correctly or at all

because of the upper cut-off frequency. However, without a fourth dummy

formant, the tracking of F2 and F3 was often found to be unreliable, so it was kept.

Every file was manually checked and corrected if necessary. This correction

was needed because, due to the cutting procedure, samples could contain jumps

(e.g., from /i/ to /u/) without the usual formant transition. The prediction algorithm

would find such unnatural jumps quite problematic.

3. Results

3.1. General results for LTF

Figure 1 presents individual LTF2 and LTF3 values for every speaker in spontaneous

and read speech, ordered by ascending spontaneous LTF2 (Figure 1a) and by ascending

spontaneous LTF3 (Figure 1b). LTF1 is not shown as it is

too error prone due to the lower cut-off frequency in mobile phone transmission.

Byrne & Foulkes (2004) showed that F1 on average shifts 29 % in mobile phone

recordings compared to direct high fidelity recordings. Table 1 lists the average

LTF values for every formant and speaking style averaged across all speakers.

Both figures and the table show that LTF values are higher for read than for

spontaneous speech. A t-test for paired samples showed that this difference is

significant for all formants: t=-6.016, p<0.0001 for LTF1; t=-11.449, p<0.0001 for

LTF2; t=-6.917, p<0.0001 for LTF3. Regarding the within speaker comparison,

Figure 1a shows that hardly any LTF2 in read speech was lower than for

spontaneous speech. Only very few LTF3 values for read speech were lower than

for spontaneous speech, as Figure 1b displays.

(a) Spontaneous LTF2 ascending

[Figure 1a plot: y-axis frequency in Hz (1200 to 2700), x-axis speaker ID; series: F3_read, F3_spont, F2_read, F2_spont]

(b) Spontaneous LTF3 ascending

Figure 1. LTF2 and LTF3 of every speaker in read and spontaneous speech. Speakers

ordered by ascending LTF values of spontaneous speech.

Table 1. LTF values in Hz for spontaneous and read speech and their standard deviations

(SD) averaged across all speakers.

       F1_spont  F1_read  F2_spont  F2_read  F3_spont  F3_read
LTF    470       484      1400      1463     2378      2422
SD     24        21       79        70       128       125

3.2. Between-speaker comparison

Speaker-specific features can be identified in the distribution of LTF values, as

well as their mean value. Figure 2 shows the distribution of F2 and F3 for two

speakers with very different LTF values at the top, and two different speakers with

very similar values at the bottom. As the top graph shows, speakers not only

differed in their LTF mean value (with up to 500 Hz difference), but also in the

distribution. While the distribution of speaker 44 is more platykurtic (broad peak),

the distribution of speaker 95 is more leptokurtic (narrow peak). As the bottom

graph shows, both speakers have a double peak distribution for F3, but their main

peaks lie 250 Hz apart while having very similar F2 distributions. While this is not

a very distinctive feature, it would still raise some doubt whether these two

distributions are from the same speaker or not.
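The shape comparison described above can be sketched in a few lines. The example assumes per-frame formant values are available as plain numbers. The 125 Hz bin width, the function names and the toy values are assumptions for illustration, and excess kurtosis is only one possible way of quantifying a broad versus a narrow peak; the text does not state that it was computed.

```python
from statistics import mean

def histogram_percent(values, start, width, n_bins):
    """Share of frames (in %) in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [100.0 * c / len(values) for c in counts]

def excess_kurtosis(values):
    """Population excess kurtosis: negative = flatter (platykurtic),
    positive = more peaked (leptokurtic)."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m4 = mean((v - m) ** 4 for v in values)
    return m4 / m2 ** 2 - 3.0

# Invented F2 frames (Hz): one evenly spread speaker, one concentrated speaker.
flat = [1300, 1400, 1500, 1600, 1700]
peaked = [1500] * 6 + [1300, 1700]

shares = histogram_percent(peaked, start=1225, width=125, n_bins=4)
print(shares)
print(excess_kurtosis(flat), excess_kurtosis(peaked))
```

Both toy speakers have the same mean of 1500 Hz, so only the shape statistics separate them, which mirrors the point that distributions can differ even when LTF means are similar.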

[Figure 1b plot: y-axis frequency in Hz (1200 to 2700), x-axis speaker ID; series: F3_read, F3_spont, F2_read, F2_spont]

Figure 2. F2 and F3 distributions of two speakers producing spontaneous speech in

comparison. Top: Clearly distinguishable formant distributions of speaker 44 and 95.

Bottom: Similar formant distributions of speaker 35 and 66.

In comparison, Figure 3 (top) shows the distribution of F2 and F3 for speaker

44 only, with the recording of his spontaneous speech divided into two halves. The

same was done for speaker 35 in Figure 3 (bottom). For speaker 44, the

distributions of F2 and F3 are very similar in the two parts of his spontaneous

speech; however, there is a peak shift of 125 Hz for F3. No differences of the

distributions of F2 and F3 were found for speaker 35, indicating no within-speaker

differences for spontaneous speech.

[Figure 2 plots: x-axis frequency in Hz (600 to 3350), y-axis in %; top panel series: 44 F2, 44 F3, 95 F2, 95 F3; bottom panel series: 35 F2, 35 F3, 66 F2, 66 F3]

Figure 3. F2 and F3 distributions of speaker 44 (top) and 35 (bottom) producing

spontaneous speech; first half of recorded speech in black, second half in grey.

To sum up, it can be very useful to look at the distribution of F2 and F3 for

speakers with very similar LTF means because their distributions can be manifold:

They can be single vs. double peaked and/or lepto- vs. platykurtic, and these


distribution shapes seem to be stable within speakers but can vary between

speakers.

3.3. Within-speaker comparison

3.3.1. Effect of speaking style on mean LTF

Recordings of the perpetrator are sometimes compared to recordings of the suspect

reading what has been said by the perpetrator during the crime. For this reason it is

important to know whether, and to what degree, recordings of the same voice

differ in their LTF values between spontaneous and read speech. To determine

whether spontaneous speech of the perpetrator can be reliably compared with read

speech of the suspect, LTF values within speakers were analysed across speaking

styles.

Table 2 shows the results of a t-test for paired samples. LTF values of spontaneous

and read speech were paired for every speaker. A negative mean value indicates

that spontaneous speech has lower values than the read speech. This is the case for

all three formants. The mean difference is given in Hz, so LTF2 of spontaneous

speech is 62.21 Hz lower than that of read speech and is the formant with the

largest difference between speaking styles. Given a mean difference of -62.21 Hz,

the standard deviation (here 45.78 Hz) indicates that the LTF2 difference of 68 %

of the speakers lies between -107.99 and -16.43 Hz. These numbers were derived

like this:

(1) -62.21 – 45.78 = -107.99

(2) -62.21 + 45.78 = -16.43

A positive frequency indicates that some speakers of the typical 68 % have a

higher LTF in spontaneous speech. This is the case for LTF3, where the interval of one SD

around the mean difference between spontaneous and read speech ranges from -97.60 to +9.60

Hz (-44.00 ± 53.60). All the differences are highly significant (p<0.0001). As already mentioned,

LTF1 should not be taken as a reliable measure because of the lower cut-off

frequency of the mobile phone transmission. LTF3 seems most reliable to use for

speaker identification because it shows less difference between speaking styles

and is less influenced by the mobile phone bandwidth than LTF1.
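The interval arithmetic in equations (1) and (2) can be repeated for all three formants with a short sketch. The means and SDs are taken from Table 2; the function name is hypothetical, and reading the interval as covering about 68 % of speakers presupposes roughly normally distributed differences.

```python
# Mean and SD of the paired differences (spontaneous minus read), in Hz, from Table 2.
paired = {
    "LTF1": (-14.08, 19.72),
    "LTF2": (-62.21, 45.78),
    "LTF3": (-44.00, 53.60),
}

def one_sd_interval(mean_diff, sd):
    """Interval mean - SD to mean + SD, rounded to two decimals."""
    return (round(mean_diff - sd, 2), round(mean_diff + sd, 2))

intervals = {name: one_sd_interval(m, s) for name, (m, s) in paired.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.2f} to {high:.2f} Hz")
```

An upper bound above zero means that some speakers within one SD of the mean have a higher LTF in spontaneous than in read speech.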

Table 2. t-test for paired samples. Pairs: LTF values of read and spontaneous speech of

every speaker. SD = standard deviation, SE = standard error. All T’s significant with

p<0.0001.

        paired differences
        mean     SD      SE     T         df
LTF1    -14.08   19.72   2.34   -6.016    70
LTF2    -62.21   45.78   5.43   -11.449   70
LTF3    -44.00   53.60   6.36   -6.917    70
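The relation between the columns of Table 2 can be checked with a minimal sketch, assuming the standard paired t-test in which SE = SD / sqrt(n) and t = mean / SE. The function name is hypothetical; because the inputs are the rounded table values, the results agree with the table only to within rounding.

```python
import math

def paired_t(mean_diff, sd, n):
    """Return (standard error, t statistic) for n paired differences."""
    se = sd / math.sqrt(n)
    return se, mean_diff / se

n = 71  # speakers, hence df = n - 1 = 70
rows = [("LTF1", -14.08, 19.72), ("LTF2", -62.21, 45.78), ("LTF3", -44.00, 53.60)]
for name, mean_diff, sd in rows:
    se, t = paired_t(mean_diff, sd, n)
    print(f"{name}: SE = {se:.2f}, t = {t:.2f}, df = {n - 1}")
```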


Despite the fact that LTF values differ significantly across speaking styles

they can still correlate strongly. In this case, a stable difference in their values can

be assumed. To find correlations, a Pearson product-moment-correlation for

interval-scaled data was conducted between all LTF values across all speakers to

look at relationships of formant specific LTF values. Table 3 shows the

statistically significant correlations between LTF values. Correlations between

LTF values of the same formant across the two speaking styles were stronger

(indicated in bold print) than correlations within the same speaking style across

different formants. The strongest correlation was for LTF3 which is known to be

the most stable formant within a speaker.

Table 3. Pearson product-moment-correlation of all LTF values. All r values are
significant with p<0.01. n.s. = not significant.

            LTF2spon  LTF3spon  LTF1read  LTF2read  LTF3read
LTF1spon    0.395     n.s.      0.615     0.370     n.s.
LTF2spon    1         0.514     0.400     0.819     0.484
LTF3spon    0.514     1         n.s.      0.502     0.910
LTF1read    0.400     n.s.      1         0.377     n.s.
LTF2read    0.819     0.502     0.377     1         0.575

These correlations were made using the data of all the speakers.

Correlations of individuals may vary, so these r values can only be used as guide

values. Combining the results of the t-test and the Pearson correlation, it was

found that there is a stable difference in LTF insofar as read speech produces

mostly higher LTF values than spontaneous speech.
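How a significant mean difference and a strong correlation can coexist is perhaps easiest to see on a toy example. The sketch below implements the Pearson product-moment correlation directly; the five pairs of LTF3 values are invented and are not taken from the corpus.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented LTF3 values (Hz) for five speakers; read values sit roughly 46 Hz higher.
ltf3_spont = [2250.0, 2320.0, 2380.0, 2450.0, 2510.0]
ltf3_read = [2300.0, 2360.0, 2430.0, 2490.0, 2560.0]

r = pearson_r(ltf3_spont, ltf3_read)
mean_shift = sum(s - rd for s, rd in zip(ltf3_spont, ltf3_read)) / len(ltf3_spont)
print(f"r = {r:.3f}, mean difference = {mean_shift:.1f} Hz")
```

Every toy speaker shifts by a similar amount, so the ranking of speakers is preserved (high r) while the paired difference stays clearly non-zero.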

The scatter plot in Figure 4 shows the downshift of LTF2 and LTF3 from

read to spontaneous speech in the F2-F3-vowel space. The LTF values of every

speaker are connected with a grey arrow indicating the direction of change from

read (red circle) to spontaneous (blue x) speech. The general trend leads to a lower

LTF2 and LTF3, but some speakers also show upward shifts or downward shifts

of only one formant; only three speakers show an upward shift of both LTF2 and

LTF3.


Figure 4. Scatter plot of LTF2 and LTF3 of all speakers. Circle = read speech, x

= spontaneous speech. Values of every speaker connected through grey arrows.

3.3.2. Effect of speaking style on formant distribution

When investigating the distributions of read and spontaneous speech within one

speaker, it is not only interesting to see the differences between the LTF means but

also the distributions. Are they similar apart from a little upward shift? No clear

answer can be given, as shown in Figure 5. While speaker 35 has very different F3

distributions across speaking styles, the F3 distribution of speaker 100 is nearly

identical. This raises problems discussed earlier in section “3.2 Between-speaker

comparison”. When the mean is similar but the distribution different, it is still not

clear whether the samples are from different speakers or whether the same speaker

is using different speaking styles.


Figure 5. F2 and F3 distributions of spontaneous and read speech in comparison.

Spontaneous speech in black, read speech in grey. F2 solid line, F3 dashed line. Top:

Speaker 35. Bottom: Speaker 100.

3.4. Amount of data necessary for LTF

One of the most important questions regarding LTF values for forensic phonetics

use is: How much speech data is necessary to get reliable LTF measurements? The

amount of data is crucial because an LTF value is only meaningful if enough

(different) vowels are used. In a sample of 2 seconds of pure vocalic stream, for


example, it might well be that only /e/, /a/ and /ə/ are present and this would skew

the data towards the open front side of the vowel quadrilateral and therefore not

represent the vowel space of a speaker.

Because in most forensic cases there will not be extensive recordings to

extract many seconds of vocalic stream, it is necessary to find out whether short

recordings are sufficient. For this, each LTF sound file was divided into packages.

Each package represents a short sound file of one speaker. If the LTF values of the

packages (of one sound file) do not differ much from each other, it is assumed that

this size is sufficient to get reliable LTF data. The difference between packages

was detected by calculating the standard deviation between packages.

These calculations were made with various package sizes to detect the

threshold package size (because the package size is an approximation of the length

of vocalic stream needed to get reliable LTF values). Every sound file was divided

into packages of 1, 1.5, 2, 2.5 … 10 sec. The average number of packages per size

per speaker is listed in Table 4. Within each package size, the LTF package values

were taken, and a standard deviation was determined for every speaker separately

and for every package size. As the package size increases, the number of packages

naturally decreases, so the standard deviation might be influenced by size and,

therefore, the number of packages. On the other hand, in bigger packages there is

much more variation within a package, so LTF values do not differ much any

more and not many packages are needed to get a stable SD.
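
The package procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the scripts used in the study: it assumes that the formant track of the pure vocalic stream is available as a plain list of frequency values (in Hz) at a fixed frame rate. The function names `package_means` and `sd_by_package_size`, the frame rate of 100 frames per second and the synthetic track are all hypothetical.

```python
from statistics import mean, stdev


def package_means(values, frame_rate_hz, package_sec):
    """Split a formant track into consecutive packages of equal duration
    and return the mean value (the package's LTF) of each full package."""
    frames_per_package = int(round(frame_rate_hz * package_sec))
    if frames_per_package <= 0:
        raise ValueError("a package must contain at least one frame")
    n_packages = len(values) // frames_per_package  # incomplete tail is dropped
    return [
        mean(values[i * frames_per_package:(i + 1) * frames_per_package])
        for i in range(n_packages)
    ]


def sd_by_package_size(values, frame_rate_hz, sizes_sec):
    """Map each package size to the standard deviation between package means.
    The entry is None when fewer than two full packages are available."""
    result = {}
    for size in sizes_sec:
        means = package_means(values, frame_rate_hz, size)
        result[size] = stdev(means) if len(means) >= 2 else None
    return result


# Hypothetical example: 20 s of synthetic F3 values at 100 frames per second.
track = [2400.0 + 100.0 * ((i % 7) - 3) for i in range(2000)]
sizes = [1 + 0.5 * k for k in range(19)]  # 1, 1.5, 2, ... 10 sec
sd_curve = sd_by_package_size(track, 100, sizes)
```

Plotting `sd_curve` against package size for one speaker would correspond to one of the individual curves that are averaged in Figure 6. Dropping the incomplete last package is a design choice of this sketch; the source does not say how remainders were handled.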

Table 4. Average number of packages per speaker used to calculate standard deviations.

Top: spontaneous speech. Bottom: read speech. Package size in seconds.

Spontaneous speech

package size          1     1.5   2     2.5   3     3.5   4     4.5   5     5.5
number of packages    39.9  26.5  19.6  15.7  13.0  11.0  9.6   8.5   7.6   6.8

package size          6     6.5   7     7.5   8     8.5   9     9.5   10
number of packages    6.3   5.8   5.3   5.0   4.7   4.3   4.1   3.9   3.8

Read speech

package size          1     1.5   2     2.5   3     3.5   4     4.5   5     5.5
number of packages    11.5  7.5   5.5   4.4   3.6   3.0   2.6   2.1   2.1   2.0

package size          6     6.5   7     7.5   8
number of packages    2.0   2.0   2.0   2.0   2.0

If the standard deviation asymptotically approaches a constant, larger package sizes

no longer change the result much, and it can be assumed that the amount of data in a

package of this size is enough to get reliable LTF values. Figure 6 shows the course of

the SD curves across the different package sizes. The x-axis lists the package sizes

and the y-axis the SD. It is shown that for both read and spontaneous speech, the

SD was smallest for LTF1 and largest for LTF2. LTF1 is not very meaningful

because the lower cut-off frequency of mobile phone transmission shifts the

formant values in unpredictable ways and amounts, mostly upwards. LTF3 has a


smaller SD than LTF2 and is regarded as being more speaker specific (see Rose, 2002, p.

237; Ladefoged, 2001, p. 194). It is therefore best to work with LTF3. For

spontaneous speech, LTF3 seems to become stable at a package size of 6 seconds

(see Figure 6a), which equals about 27 seconds of spontaneous speech dialogue

recording. It has to be noted that the curve does not seem to have reached its

asymptotical level but, nonetheless, there is very little change in its course

anymore.

Figure 6. Standard deviation of package sizes averaged across all speakers. It can be

assumed that enough data is collected to get reliable LTF values when the curve reaches

an asymptotical level.

[Figure 6: two panels plotting standard deviation (Hz, y-axis, 0 to 140) against package size (sec, x-axis) for F1, F2 and F3. Panel (a), spontaneous speech: package sizes 1 to 10 sec. Panel (b), read speech: package sizes 1 to 8 sec.]


For read speech, the LTF3 threshold is difficult to detect. It might be at 5

seconds, equal to about 16 seconds of read speech recording. Empty symbols were

used in Figure 6 for read speech at a package size of 7 seconds or larger because

they were only based on 6, 4 and 1 speaker(s) respectively. The other speakers

produced passages too short to be divided into packages larger than 6.5 seconds.

As the reading passage was not very long, very few speakers produced vocalic

data of that length and it cannot be assumed that the data represented by the empty

symbols act in a typical way.

For LTF2 a package size of 4.5 seconds seems to give reliable LTF data in

read speech (equivalent to about 14.5 seconds of read speech recording). For

spontaneous speech, the threshold is also difficult to detect. The safest choice is

the package size of 9 seconds (equivalent to about 50 seconds of spontaneous

dialogue recording) but 5.5 seconds (≈25 sec) seems to be a justifiable choice as

well.

In sum, LTF values of speech samples with at least 6 seconds of pure vocalic

stream can be considered reliable. This estimation is based on the average

behaviour of all speakers. There can sometimes be large variation between

speakers as to the threshold of sufficient LTF data (see Moos, 2008, Figure 3.15).

4. Discussion

In this study, LTF has been shown to be a valuable measure for speech

comparison and can aid in speaker identification. Some speakers had very similar

LTF values, but the distribution of the formants may vary, resulting in leptokurtic,

platykurtic or double-peaked curve shapes. Other speakers had easily

distinguishable distributions with clearly distinct means. Within-speaker

comparisons of speaking style revealed that read speech had significantly higher

LTF values than spontaneous speech. It is unclear whether this upward shift is a

shift or an expansion of the vowel quadrilateral. Hyper-articulation in read speech

would explain an expansion of the vowel space (an expansion would also result in

an upward shift of LTF because front and open vowels are used more often than

close back vowels in German, see Simpson, 1998). But, as the SD remained

constant (see Table 1), a simple upward shift rather than an expansion is assumed

(for an expansion the SD increases as well). Despite the shift, LTF values within

formants correlated strongly across speaking styles. The curve distribution within

speakers across speaking styles can also vary in different ways but generally does

not show drastic changes and shifts.

LTF is a measure of speaker characterisation that is independent of f0,

dialect and speech rate; Moos (2008) showed no correlation between LTF and

these measures using a dataset common to both studies. One aspect that could not

be covered in this article is the correlation between LTF and the physiognomy of

the speakers (e.g., body height). Several studies found weak negative correlations

between body height and formant measures (Greisbach, 1999 for German;

Gonzalez, 2004 for Spanish; Rendall et al., 2005 for Canadian English, but only


for males). The same was found by Jessen (2010) using the same data the current

study is based on. The size of the vocal tract might be a mediator of these

correlations. Although no firm conclusions can be drawn from weak correlations,

it is very unlikely that someone with high formants will be tall and that someone

with low formants will be short.

Before working with LTF measures, it is very important to know whether

one has a sufficient amount of data. Because LTF is an average of all vowels

produced in a speech sample, short samples are not suitable for this measurement.

By dividing the given samples into smaller packages, it was estimated that roughly

6 seconds of pure vocalic stream (equivalent to 27 seconds of dialogue or 19

seconds of read speech) are, on average, enough to produce reliable LTF values.

An important aim of this work was to create a reference database for LTF to

work towards probability statements using likelihood ratios (LR). In court,

evidence has to be weighed, and probabilities have to be given in a strength-of-

evidence statement. How similar or different are the LTF values of two voice

samples, and how typical are they (i.e., do many people in the population have

those LTF values)? (See Jessen, 2008; Morrison, 2009; Rose, 2002 to learn about LR

in forensic speaker comparison.) To be able to give a strength-of-evidence

statement (i.e., to be able to say how much more likely the observed LTF values

are if they come from the same speaker than if they come from different speakers), the creation of a reference database is

essential. If there are, for example, two very similar LTF values of a suspect and

the perpetrator, it does not necessarily mean that they are from the same speaker;

if the LTF values are very typical in the population, there is relatively more

evidence that they are from different speakers than if they are very atypical (e.g.,

very low or high). An LTF database was constructed from 71 German speakers

producing read and spontaneous speech recorded through mobile phone

transmission as part of this work. This database, which enables such likelihood

ratio statements, is more extensively described in Moos (2008).
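
The interplay of similarity and typicality can be illustrated with a deliberately simplified likelihood ratio. The sketch below is an assumption made for illustration only and is not the procedure used with the database described here: it models both the within-speaker variation and the population distribution of an LTF value as univariate normal distributions, and every number in it is hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def likelihood_ratio(offender, suspect_mean, within_sd, pop_mean, pop_sd):
    """Simplified LR for a single LTF value (illustrative assumption).

    Numerator (similarity): density of the offender value under the
    suspect's own distribution.
    Denominator (typicality): density of the offender value under the
    reference population distribution.
    """
    similarity = normal_pdf(offender, suspect_mean, within_sd)
    typicality = normal_pdf(offender, pop_mean, pop_sd)
    return similarity / typicality


# Hypothetical LTF3 values in Hz: both pairs differ by 10 Hz, but the first
# pair lies at the population mean and the second far away from it.
lr_typical = likelihood_ratio(2400.0, 2410.0, 30.0, 2400.0, 100.0)
lr_atypical = likelihood_ratio(2700.0, 2710.0, 30.0, 2400.0, 100.0)
```

In this sketch `lr_atypical` exceeds `lr_typical`, which mirrors the argument above: equally similar values carry more weight towards the same-speaker hypothesis when they are rare in the population.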

Prospects for future work are to compare the mobile phone data with high

fidelity recordings which exist for the data that has been used here as well.

Another interesting investigation would be to explore the influence telephone

bandwidth has on LTF values. The results could then be compared with those of

Byrne & Foulkes (2004) with the advantage that the same speech data was used

for both hi-fi and mobile phone qualities. Comparisons across different languages

should also be made to investigate whether LTF measures of recordings of one

person speaking different languages can be reliably compared. A further important

test concerns the reliability of LTF measures across different phoneticians taking

the measures. Will every expert include and exclude the same vocalic portions and

hence produce the same data for analysis? The same question can be applied to

different formant tracking algorithms used in different programmes like Praat,

WaveSurfer, Emu, etc. Statistical measures to evaluate the amount of LTF data

necessary to be reliable would improve the validity of the prediction. Research is


currently being undertaken to answer many of these questions and will hopefully

give insight into these neglected areas of LTF research.

References

Byrne, C. & Foulkes, P. (2004). The 'mobile phone effect' on vowel formants.

International Journal of Speech, Language and the Law 11(1), pp. 83-102.

Greisbach, R. (1999). Estimation of speaker height from formant frequencies.

Forensic Linguistics 6(2), pp. 265-277.

Gonzalez, J. (2004). Formant frequencies and body size of speaker: A weak

relationship in adult humans. Journal of Phonetics 32(2), pp. 277-287.

Hollien, H. (1990). The Acoustics of Crime: The New Science of Forensic

Phonetics. New York: Plenum Press.

Jessen, M. (2007). Speaker classification in forensic phonetics and acoustics. In:

C. Mueller (ed): Speaker Classification I, pp. 180-204. New York, Berlin:

Springer.

Jessen, M. (2008). Forensic phonetics. Language and Linguistics Compass 2(4),

pp. 671-711.

Jessen, M. (2010). The forensic phonetician. Forensic speaker identification by

experts. In: M. Coulthard & A. Johnson (eds): The Routledge Handbook of

Forensic Linguistics, pp. 378-394. London, New York: Routledge.

Jessen, M., Köster, O. & Gfroerer, S. (2005). Influence of vocal effort on average

and variability of fundamental frequency. International Journal of Speech,

Language and the Law 12(2), pp. 174-213.

Ladefoged, P. (2001). A Course in Phonetics. USA: Heinle & Heinle.

McDougall, K. (2004). Speaker-specific formant dynamics: An experiment on

Australian English /ai/. International Journal of Speech, Language and the

Law 11(1), pp. 103-130.

McDougall, K. (2006). Dynamic features of speech and the characterization of

speakers: Towards a new approach using formant frequencies. International

Journal of Speech, Language and the Law 13(1), pp. 89-126.

McDougall, K. & Nolan, F. (2007). Discrimination of speakers using the formant

dynamics of /u:/ in British English. Proceedings of the 16th International

Congress of Phonetic Sciences, Saarbrücken, Germany, pp. 1825-1828.

Moos, A. (2008). Forensische Sprechererkennung mit der Messmethode LTF

(long-term formant distribution). Unpublished Master thesis (Magisterarbeit),

Saarbrücken, Universität des Saarlandes.

http://www.psy.gla.ac.uk/docs/download.php?type=PUBLS&id=1286

(accessed 17/08/2010).

Morrison, G. (2009). Forensic voice comparison and the paradigm shift. Science &

Justice 49(4), pp. 298-308.

Nolan, F. (2002). The 'telephone effect' on formants: A response. Forensic

Linguistics 9(1), pp. 74-82.


Nolan, F. & Grigoras, C. (2005). A case for formant analysis in forensic speaker

identification. International Journal of Speech, Language and the Law

12(2), pp. 143-173.

Rendall, D., Kollias, S., Ney, C. & Lloyd, P. (2005). Pitch (F0) and formant

profiles of human vowels and vowel-like baboon grunts: The role of

vocalizer body size and voice-acoustic allometry. The Journal of the

Acoustical Society of America 117(2), pp. 944-955.

Rose, P. (2002). Forensic Speaker Identification. London: Taylor & Francis.

Rose, P. (2006). Technical forensic speaker recognition: Evaluation, types and

testing of evidence. Computer, Speech and Language 20, pp. 159-191.

Simpson, A. (1998). Phonetische Datenbanken des Deutschen in der empirischen

Sprachforschung und der phonologischen Theoriebildung.

Habilitationsschrift, Christian-Albrechts-Universität zu Kiel.

Sjölander, K. & Beskow, J. (2005). WaveSurfer 1.8.5, Stockholm, KTH Royal

Institute of Technology. Software available online:

http://www.speech.kth.se/wavesurfer/index.html (accessed 06/10/2007).


ON THE PHYSIOLOGY OF VOICE PRODUCTION IN SOUTH-

SIBERIAN THROAT SINGING – EXTENDED ABSTRACT

Sven Grawunder

Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany

e-mail: [email protected]

This paper is an extended abstract of a PhD project that was finished in 2005 and

published as a book (Grawunder, 2009). The project represents the first field-work

based phonetic study of the extraordinary voice production mechanisms that occur

in throat singing.

Throat singing (ThS) is practiced in four areas in South-Siberia: the Republic

of Tuva, the Republic of Hakassia and the Republic of Gorno-Altai (all parts of

the Russian Federation), as well as adjacent Mongolia. ThS is a defined genre among and

intertwined with other oral folk-arts and singing types, and it is distinct from

Western overtone singing. Like Western overtone singing, South-Siberian ThS

uses reinforced harmonics as carriers of sung melodies and enforced phonation

modes. However, different from Western overtone singing, such targeted use of

harmonics appears as common but not essential to ThS in the conceptions of

singers (cf. van Tongeren, 2002, Grawunder, 2003b). There are two (sometimes

three) main styles with regard to voice use, as discussed by singers and

ethnomusicologists (cf. Kyrgys, 2002): first, a tensed medial (chest-) register

voice and second, a raspy growling low register voice. Often these voice registers

are referred to in the literature with the Tuvan style names khöömei and kargyraa,

respectively.

Including a small-scale endoscopic study of one singer (the author), which

contributes to the few available articulatory studies of throat-singers (e.g. Dmitriev

et al., 1983, Edgerton, 2005, Grawunder, 2003a, 2003b, Lindestad et al., 2001,

2004, Sakakibara et al., 2002), the laryngoscopic evidence suggests that throat-

singers make use of three voice production mechanisms. All mechanisms share an

excessive constriction of the larynx entrance resulting, at various levels, in an

approximation of the aryepiglottic folds and the epiglottis. Therefore the study

focuses on phonation types which result, in addition to the normal activity of the

vocal folds (VF), from various combinations of phonation activities involving the

aryepiglottic sphincter chain (AES), the ventricular folds (VTF) and sometimes

even the aryepiglottic folds (AEF).

Two main types are therefore proposed for voice production in South-

Siberian throat singing: a voice production by means of the vocal folds featuring a

constriction of the AES (Phonation Mode 1, henceforth PM1), and a voice

production with involvement of the ventricular folds (Phonation Mode 2,

henceforth PM2). VTF involvement in PM2 appears as a double cyclic period,

with the vocal folds vibrating twice as fast as the ventricular folds; every second

cycle consists of a (near-) synchronous closure of VF and VTF (cf. Bailly et al.,


2010). A third proposed mechanism for PM2 is the involvement of the AEF (cf.

Sakakibara et al., 2004), similar to an epiglottic trill (Esling et al., 2007). On the one hand,

the mechanisms of the constriction of VTFs and AEFs are discussed with respect

to histoanatomical findings of muscular tissue that facilitates the medial

compression of the VTFs (Kotby et al., 1991; Reidenbach, 1998a) as well as with

respect to the findings of muscular and ligamentous components for an AEF-

sphincter framework (Reidenbach, 1998b) that takes part in the anterior-posterior

constriction by means of the AES. On the other hand, the constriction mechanisms

are discussed with respect to the anterior-posterior constriction that is often found

in professional singing (Yanagisawa et al., 1989; Koufman et al., 1996; Stager et

al., 2003). Finally, the occurrence of these structures in linguistically relevant

sound patterns (cf. Esling et al., 2007; Edmondson & Esling, 2006) emphasizes the

significance of these phonation modes to general phonetic research.

The typical oro-pharyngeal configurations in ThS are described in a rough

scheme of at least three (overtone) articulation techniques (denoted here as

articulation types, AT) that are generally also found in overtone singing (cf.

Edgerton, 2005; Neuschaefer-Rube et al., 2001; Saus, 2004; Trân, 1991): an [l]-

like articulation of the tongue tip (AT1), an [n]-, [ŋ]-, [i]- or [u]-like articulation of

the tongue dorsum (AT2), and a mid-low vowel articulation of different heights of

front and back vowels (AT3) including larger jaw movement than in the other two

ATs. The three main ATs can be easily linked with techniques that are commonly

associated with particular (here Tuvan) styles (AT1 – sygyt, AT2 – khöömei, AT3

– kargyraa). Although all combinations of PMs and ATs are found to be used in

ThS, PM1 is mainly combined with AT1 and AT2 whereas PM2 is mainly

combined with AT3. Further ‘articulatory substyles,’ such as ezeŋgileer AT2 or

AT1 with a strong nasal (AT4) component or birlaŋnaadyr AT1/AT2 with strong

labial component (AT5), were considered but have been excluded from the

analysis to a great extent.

AT1 and AT2 display the highest prominence of the single ‘melodic’

harmonic, measured as amplitude difference to the previous and next harmonic

(12-14dB). However, the bandwidths for AT1 tend to be wider since here F2 and

F3 usually merge. Besides a general explorative investigation of the properties of

the phonation modes and articulation types, it was essential that the project

investigate possible areal patterns, i.e. differences between the four groups of

singers with regard to their origin in Southern Central Siberia. Questions

addressing the areal typology of traditional music performance (especially

singing) have gained more attention recently (see Blench & Dendo, 2006). In

particular cases, the analyses of specific ThS styles may help to retrieve parts of

the unwritten demographic history of the area, including population contact.


Figure 1: Sample sequence of the Tuvan Singer Oleg Kuular, starting out with

PM1AT3 and PM1AT2 (khöömei) switching to PM1AT1 (sygyt) and proceeding

further with PM2AT3 (kargyraa); the third tier contains the reinforced harmonics

for AT1/AT2 and vowel qualities for AT3

The current study comprises data from 69 male singers. The material

was in part collected during fieldwork in South Siberia in the years 1999 to 2002,

where 25 singers were recorded by use of a specific field setting for acoustic (Vx),

electroglottographic (Lx) and subglottal resonance (Sx) signal acquisition. For the

latter, an approach by Neumann et al. (2003) was adopted, which makes use of a

signal acquired with a small condenser microphone placed in contact with the skin

of the singer’s jugular notch, i.e. the dip at the superior border of the sternum,

between the clavicular notches. The cricothyroid ligament (ligamentum conicum),

which is palpable as a small depression below the Adam’s apple, has recently been

suggested by Wokurek and Madsack (2009) as an alternative measurement point for

subglottal resonance. Supplementary recordings from 44 available professional

music recordings and field recordings of other researchers were added to the

acoustic analysis.

The results of the perturbation measures for acoustic signals show a dominance

of individual variability over areal (cultural) factors. As one could expect, there is a

strong influence of the articulatory strategy. Nonetheless, there are some

parameters, such as APQ11, the amplitude perturbation quotient (i.e. shimmer)

over 11 cycles, which seem to allow areal grouping, e.g. with Mongolian singers,

who show the highest values for PM1. However, the articulatory reinforcement

strategy interacted strongly with the phonation mode and showed the highest

amplitude perturbation values for AT1 (see Fig. 2). Another clear areal group

tendency is observable for Hakas singers with a clear preference for lower F0 in

PM1 (median values: 110Hz for AT2/AT3, 160Hz for AT1) and PM2 (60Hz for

AT3). PM2 samples of Hakas singers also show higher harmonics-to-noise ratio


(HNR) values and lower mean spectral slopes in all three bands investigated (0-

2kHz; 2-5kHz; 5-8kHz).

Figure 2. Areal tendencies represented by the median (full circle within the box)

for the acoustic shimmer (APQ11) measures of 44 singers

For perturbation measures of Lx-signals of the double-cyclic phonation

(PM2), the perturbation parameters have been adapted so that every second cycle

could be taken into account (see bottom channel in Figure 3). This reveals a very

stable vibratory pattern, unlike similarly labeled pathological patterns (cf. Fuks et

al., 1998).
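
As an illustration of the kind of measure involved, the sketch below computes an amplitude perturbation quotient over an odd window of cycles (11 for APQ11), using one common formulation: the mean absolute deviation of each cycle amplitude from its local window average, divided by the overall mean amplitude. It is a sketch under assumptions, not the analysis code of the study. In particular, the `stride` and `offset` parameters are a hypothetical way of reading "every second cycle": they compare only every second cycle with its counterparts, so that the regular alternation of a double-cyclic pattern is not counted as perturbation.

```python
def apq(amplitudes, window=11, stride=1, offset=0):
    """Amplitude perturbation quotient over `window` neighbouring cycles.

    amplitudes: peak amplitude of each glottal cycle, in temporal order.
    stride, offset: use only every `stride`-th cycle starting at `offset`
    (stride=2 is the assumed adaptation for double-cyclic phonation).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    selected = amplitudes[offset::stride]
    if len(selected) < window:
        raise ValueError("not enough cycles for the chosen window")
    half = window // 2
    deviations = []
    for i in range(half, len(selected) - half):
        # Average amplitude of the cycle and its nearest neighbours.
        local_mean = sum(selected[i - half:i + half + 1]) / window
        deviations.append(abs(selected[i] - local_mean))
    mean_deviation = sum(deviations) / len(deviations)
    mean_amplitude = sum(selected) / len(selected)
    return mean_deviation / mean_amplitude


# Hypothetical double-cyclic signal: strong and weak cycles alternate regularly.
cycles = [1.0, 0.5] * 20
plain = apq(cycles)               # treats the alternation as perturbation
adapted = apq(cycles, stride=2)   # compares only every second cycle
```

For this perfectly regular alternation the adapted value is zero while the plain value is not, which is the sense in which a double-cyclic pattern can be very stable despite large cycle-to-cycle amplitude differences.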

Figure 3. Double cycle phonation mode (kargyraa) sequence of a three-channel

recording (VxLxSx) of the Tuvan singer SI


For the ultra-structure of the Lx, besides a schematic description of the cycle

shape, the applicable phase quotients (closed, closing, open quotient) and

symmetry indicators (speed quotient, contact index) were analyzed. Based on the

values of the closed quotient and closing quotient for PM1 (AES-VF), the

impression of a tensed (sometimes pressed) voice seems to be justified. In AT1,

the subglottal wave is fully dominated by reinforced harmonic formants (usually

F2).

For the low PM2, there was only one singer for whom the involvement of

an AEF-VF phonation type seemed reasonably certain. A controlled imitation of

AEF-VF phonation by the author was added. For both singers, the lack of a peak

around 3 kHz in the long-term-average spectra (cf. spectrogram in Figure 1)

comes into question. However, the noise-to-harmonics ratio value, that is the ratio

of nonharmonic energy (frequency range: 70Hz – 4500Hz) in the spectrum, which

is therefore taken as an indicator of higher frequency noise, was not particularly

noticeable. The majority of the investigated singers seem to use a phonation type

of a double-cycle ventricular fold/vocal fold oscillation (VTF-VF). Based on the

synchronous analysis of Vx, Lx and inverse filtered Vx it can be concluded that

the main vocal tract excitation occurs with the closure phase of the ‘pure’ vocal

fold cycle (cf. Henrich et al., 2006; Bailly et al., 2010). One cycle, presumably the

VF-cycle, showed short closing phases and higher symmetry indicators. Then the

vibration of the VFs triggers the VTF vibration at F0/2. However, in terms of

cycle-to-cycle amplitude difference, the subcorpus of Vx-Lx signals contains

examples with exactly the reverse pattern: the Vx excitation instant sometimes aligns

with the higher closing peak in Lx but more frequently with the lower peak (see

Figure 3). It also remains uncertain to what degree the subglottal wave is able to

support one of the two cycles. For the one case where all three channels were

successfully recorded, the subglottal sound pressure maximum seemed to precede

the supraglottal peak, appearing right at the end of the opening phase.

Overall, the acquired data support a model of reinforcement of harmonics

by four different means (cf. Edgerton, 2005). First, there is voice source variation

(shortened closing phase, with increased excitation strength presumably via

increased subglottal pressure, while air flow remains constant or lowered for the

tensed mode (PM1); and double cycle modes involving mass bodies of

supralaryngeal structures for the low mode (PM2) enabling fundamentals at half of

VF-F0). Second, a specific formant adjustment for F2 comes into play that results,

for some articulatory strategies, in formant merging (F1/F2 or F2/F3) due to

multiple vocal tract constrictions (e.g. the sublaminal cavity; Engstrand et al.,

2007), including a coupling of source and adjacent epi- and supralaryngeal rooms

(of approx. 1/6 vocal-tract length; cf. Titze & Story, 1997). Third, a specific

bandwidth tuning results partially from adjustment of lip radiation and partially

from a stiffness of the articulators. Finally, the fourth mechanism of reinforcing

harmonics is the aryepiglottic sphincter which facilitates F1 and F0 damping, a

mechanism that is used individually to a very different extent.


[A short audio sample from the Tuvan singer Ayas Danzyrin, recorded by author in 2000,

can be found here: http://www.eva.mpg.de/~grawunde/otsths/phdxtdabs.html]

References

Bailly, L., Henrich, N. & Pelorson, X. (2010). Vocal fold and ventricular fold

vibration in period-doubling phonation: Physiological description and

aerodynamic modeling. Journal of the Acoustical Society of America 127(5),

pp. 3212–3222.

Blench, R. & Dendo, M. (2006). Musical instruments and musical practice as

markers of the Austronesian expansion post-Taiwan. Paper presented at the

18th Congress of the Indo-Pacific Prehistory Association, University of the

Philippines, Manila, 20 – 26 March 2006. Retrieved 2011-04-01 from

http://www.rogerblench.info/Ethnomusicology%20data/Papers/Asia/General/Roger%20Blench%20AN%20music%20II%20paper%20submit.pdf

Dmitriev, L. B., Chernov, B. P. & Maslov, V. T. (1983). Functioning of the voice

mechanism in double-voice touvinian singing. Folia Phoniatrica 35(5),

pp.193–197.

Edgerton, M. (2005). The 21st-century voice: contemporary and traditional extra-

normal voice. The New Instrumentation (Vol. 9). Scarecrow, Lanham (MD),

Toronto, Oxford.

Edmondson, J. & Esling, J. (2006). The valves of the throat and their functioning

in tone, vocal register, and stress: laryngoscopic case studies. Phonology 23,

pp.157–191.

Engstrand, O., Frid, J. & Lindblom, B. (2007). A perceptual bridge between

coronal and dorsal /r/. In Solé, M.-J., Beddor, P. S. & Ohala, M.,(eds),

Experimental approaches to phonology, pp 175–191. Oxford University

Press.

Esling, J. H., Zeroual, C. & Crevier-Buchman, L. (2007). A study of muscular

synergies at the glottal, ventricular and aryepiglottic levels. Proc. of the 16th

ICPhS, Saarbrücken, pp. 585-588.

Fuks, L., Hammarberg, B. & Sundberg, J. (1998). A self-sustained vocal-

ventricular phonation mode: acoustical, aerodynamic and glottographic

evidences. TMH-QPSR 3/1998, pp. 49–59.

Grawunder, S. (2003a). Comparison of voice production types of ’western’

overtone singing and south siberian throat singing. Proc. of the 15th ICPhS,

Barcelona., pp. 1699–1702.

Grawunder, S. (2003b). Der südsibirische Kehlgesang als Gegenstand

phonetischer Untersuchungen. In: Krech, E.-M. & Stock, E. (eds)

Gegenstandsauffassung und aktuelle Forschungen der halleschen

Sprechwissenschaft (Hallesche Schriften zur Sprechwissenschaft und

Phonetik vol. 10), pp 53–91. Peter Lang, Frankfurt am Main.


Grawunder, S. (2009). On the Physiology of Voice Production in South-Siberian

Throat Singing - Analysis of Acoustic and Electrophysiological Evidences.

Frank & Timme, Berlin.

Kotby, M. N., Kirchner, J. A., Kahane, J. C., Basiouny, S. E. & el Samaa, M.

(1991). Histo-anatomical structure of the human laryngeal ventricle. Acta

Otolaryngol, 111(2), pp. 396–402.

Koufman, J. A., Radomski, T. A., Joharji, G. M., Russell, G. B., & Pillsbury, D.

C. (1996). Laryngeal biomechanics of the singing voice. Otolaryngol Head

Neck Surg,115(6), pp.527–537.

Kyrgys, Z. K. (2002). Tuvinskoe gorlovoe penie - etnomuzikovečeskoe

issledovanie [Tuvan Throat Singing - ethnomusicological studies]. Nauka,

Novosibirsk.

Lindestad, P. A., Sødersten, M., Merker, B. & Granqvist, S. (2001). Voice source

characteristics in mongolian “throat singing” studied with high-speed

imaging technique, acoustic spectra, and inverse filtering. Journal of Voice

15(1), pp.78–85.

Lindestad, P. A., Blixt, V., Pahlberg-Olsson, J. & Hammarberg, B. (2004).

Ventricular fold vibration in voice production: a high-speed imaging study

with kymographic, acoustic and perceptual analyses of a voice patient and a

vocally healthy subject. Logoped Phoniatr Vocol 29(4), pp. 162–70.

Neumann, K., Gall, V., Schutte, H. K. & Miller, D. G. (2003). A new method to

record subglottal pressure waves: potential applications. Journal of Voice

17(2), pp.140–59.

Neuschaefer-Rube, C., Saus, W., Matern, G., Kob, M. & Klajman, S. (2001).

Sono-graphische und endoskopische Untersuchungen beim Obertonsingen.

In: Geissner, H. (ed) Stimmkulturen – 3. Stuttgarter Stimmtage 2000, pp.

219–222. Röhrig Universitätsverlag, St. Ingbert.

Reidenbach, M. M. (1998a). Aryepiglottic fold: normal topography and clinical

implications. Clin Anat 11(4), pp. 223–35.

Reidenbach, M. M. (1998b). The muscular tissue of the vestibular folds of the

larynx. Eur Arch Otorhinolaryngol 255(7), pp.365–7.

Sakakibara, K.-I., Kimura, M., Imagawa, H., Niimi, S. & Tayama, N. (2004).

Physiological study of the supraglottal structure. In ICVPB 2004, Marseille.

Stager, S. V., Neubert, R., Miller, S., Regnell, J. R. & Bielamowicz, S. A. (2003).

Incidence of supraglottic activity in males and females: a preliminary report.

Journal of Voice 17(3), pp. 395–402.

Saus, W. (2004). Oberton singen – Das Geheimnis einer magischen Stimmkunst.

Traumzeit-Verlag, Schönau, Odenwald.

Titze, I. R. & Story, B. H. (1997). Acoustic interactions of the voice source with

the lower vocal tract. Journal of the Acoustical Society of America 101(4),

pp. 2234–2243.


Trân, Q. H. (1991). New experiments about the overtone singing style. Bulletin d’

Audio-phonologie. Ann. Sc. Univ. Franche-Comté, Vol. VII (N◦5&6), pp.

607–618.

van Tongeren, M. (2002). Overtone Singing - physics and metaphysics of

harmonics in east and west. The Harmonic Series (vol. 1). Fusica,

Amsterdam.

Wokurek, W., & Madsack, A. (2009). Comparison of manual and automated

estimates of subglottal resonances, Proc. Interspeech, Brighton, pp. 1671-

1674.

Yanagisawa, E., Estill, J., Kmucha, S. T. & Leder, S. B. (1989). The contribution

of aryepiglottic constriction to “ringing” voice quality - a videolaryngoscopic

study with acoustic analysis. Journal of Voice 3(4), pp. 342–350.


THE HISTORIC ACOUSTIC-PHONETIC COLLECTION

OF THE TU DRESDEN

Rüdiger Hoffmann, Dieter Mehnert

Technische Universität Dresden, Institut für Akustik und

Sprachkommunikation

email: [email protected], [email protected]

1 Introduction

At the beginning of the last century, the growing interest in foreign cultures and

languages led to a rapid development in experimental phonetics. In Germany,

Rousselot’s student, Panconcelli-Calzia, introduced experimental phonetics as a

scientific discipline in Hamburg, as did Gutzmann and Wethlo in Berlin. With the

development of electronic computing in the middle of the century, the interest in

human hearing and speaking was extended to machines, and the field of speech

technology, with the main topics of speech recognition and synthesis, started to be

investigated. In this way, we have far more than one century of fascinating

development of experimental phonetics and speech technology. It can be

illustrated by numerous material objects coming from phonetic or acoustic

laboratories. The Dresden University of Technology, which was one of the

pioneering institutions in German speech technology, hosts a collection of such

objects, called Historic Acoustic-phonetic Collection (HAPS). HAPS was formally

founded more than one decade ago, in 1999, but its roots go back to very

renowned German institutes of the past. This paper describes the history and the

recent activities of this university-owned collection.

2 History of the Collection

2.1 Forming a Collection in Speech Technology

Information Technology at the TU Dresden goes back to Heinrich Barkhausen

(1881–1956), the “father of the electron valve,” who taught from 1911 to 1953. He

was also interested in psychoacoustics and invented the first measurement device

for loudness. Speech research in a narrower sense started with the development of

a vocoder in the 1950s. Walter Tscheschner (1927–2004, Figure 1) started his

extensive investigations on the speech signal using components of this vocoder.

Figure 1. Walter Tscheschner (right), pioneer of speech

technology in Dresden, with the founder of the Institute of

Telecommunication of the former TH Dresden, Kurt

Freitag. Photograph from about 1960.

In 1969, a scientific unit for Communication and Measurement was founded

in Dresden. It was the main root of the present Institute of Acoustics and Speech

Communication. W. Tscheschner was appointed Professor of Speech

Communication and began research in speech synthesis and recognition.

A number of representative devices for speech synthesis and recognition

have been developed in Dresden. Over six decades, they formed a historic

collection, which demonstrates how speech technology was developed depending

on the technological base, starting with electronic valves and ending with

embedded devices [1].

2.2 Expanding the Collection towards Experimental Phonetics

At the Berlin University, phonetics was established as an institution out of two

disciplines: linguistics and medicine. The linguistic root was formed by the

Phonographic Commission, founded in 1915, which set out to record the

voices of speakers representing foreign peoples on wax cylinders or records. This

institution developed in several stages into the Institute of Sound Research at

Berlin University. In 1951, the institute was renamed Institute of Phonetics.

The second root of phonetics at the Berlin University is represented by

Hermann Gutzmann sen. (1865–1922), who worked as a voice and speech

pathologist. Gutzmann, who made speech therapy part of the university’s

curriculum, collected all the new instruments and research devices that had been

used since 1900 by the emerging discipline of experimental phonetics. It was on

Gutzmann’s initiative that the first Phonetics Laboratory was founded in Berlin. In

1926, the Phonetics Laboratory became an independent institution under the

direction of Franz Wethlo (1866–1960). Wethlo received the teaching assignment

for Experimental Phonetics in 1926, which gave him the opportunity to extend the

laboratory and to purchase new equipment. He developed numerous pieces of

equipment. After the re-opening of Berlin University in 1947, the Phonetics

Laboratory became part of the Institute for Special Education in 1950, which had

just been founded.

More details and a description of how the two roots came together can be

found in [2] and [3]. After the restructuring following German reunification in

1990, phonetics was organized under the roof of the School of Rehabilitation

Sciences. As a result of the higher education reform at the three Berlin

universities, enrollment for the course of study ‘Science of Speech/specialisation

Voice and Speech Therapy’ was stopped by decree in the autumn semester, 1993.

This led to the closing of the subject area Phonetics in Berlin at the end of the year

1996.

Based on the long-standing cooperation between phonetics in Berlin and

Speech Technology in Dresden [4], the historical remnants of the phonetic

equipment were transferred to Dresden following the closing of the Chair of

Phonetics in Berlin. This equipment set was complemented by a number of

devices which came from numerous other German institutions, mainly from a

former laboratory in Chemnitz which was founded by Georg Zöppel (1892–1963).

With this merger, the Dresden collection expanded to represent one century of

continuous development of experimental phonetics and speech technology. The

merger was completed in 1999. Therefore, we consider this as the founding year

of the HAPS.

2.3 The Merger with the Former Hamburg Phonetic Collection

The Humanities Faculty of the Hamburg University goes back mainly to the

Hamburg Colonial Institute, which was opened in 1908. It included a number of

chairs working with foreign languages. There, a phonetics laboratory was founded

in 1910 as a part of the Department of African Languages, developing later into a

separate institute of the Hamburg University, which was founded in 1919.

From 1910 to 1949, the Phonetics Laboratory or Institute, respectively, was

directed by Giulio Panconcelli-Calzia (1878–1966, [5], Figure 2) who was a

student of the Abbé Rousselot. He was an ingenious researcher who built the

institute into a place of international scientific importance. He founded the journal

VOX, which served as an international platform for experimental phonetics. It is

notable that the First International Congress of Experimental Phonetics took place

in Hamburg back in 1914.

Figure 2. Giulio Panconcelli-Calzia demonstrates the

application of a kymograph. Photograph from the

HAPS collection.

A detailed description of the history of the institute is given in [6]. In the

1990s, the educational branch of the institute was transferred to another

department. The remaining part, which focused on general phonetics, was closed at

the end of the winter term 2006/07 due to the restructuring of Hamburg

University.

The large collection of phonetic devices, which was part of the Phonetic

Institute, fortunately survived the destruction of Hamburg during World War II

and was opened to the public in 1986 [7]. To preserve this valuable

collection despite the closing of the institute, the responsible department

proposed a merger with the collection in Dresden. The collection was transferred

to Dresden in 2005. Since 2006, the united collection has been on view in two rooms

of the Barkhausen building of the TU Dresden (Figure 3).

Figure 3. View of one room of the collection in the Barkhausen building of the

TU Dresden.

3 Recent Status of the Collection

The HAPS preserves parts of the material estate of several important institutions in

Germany. It represents, therefore, the development of experimental phonetics and

speech technology in Germany with a high degree of completeness. In more detail,

the following groups of exhibits are available:

Historic phonetic devices of the pre-electronic era

These devices from the first half of the 20th century are mainly mechanical and

include different groups:

instruments for the experimental work of the phonetician (devices for

recording speech and related signals),

devices for interpreting the recordings like measuring pitch contours,

devices for measuring frequencies and performing spectral analysis,

objects for teaching purposes (models of voicing and articulation),

early devices for speech training and rehabilitation of handicapped people.

Historic phonetic devices of the early electronic era

These objects from the second half of the 20th century serve purposes similar to

those of the mechanical devices mentioned above, but accomplish them by

electronic means. This collection stops with the introduction of

the computer in the phonetic laboratories.

Historic objects demonstrating the development of speech technology

A few objects of this collection demonstrate how sounds and speech can be

produced by mechanical means. Of course, the real development of devices for

speech synthesis and speech recognition is connected to the electronic and,

primarily, the computer era. The collection includes not only objects from the

research and development in Dresden (following the vocoder from the 1950s), but

also a number of early speech synthesizers from other laboratories.

Historic sound recordings

It must first be noted that the whereabouts of the important collection of wax

cylinders from the former Hamburg Colonial Institute and its successor chairs are

not known. Hence, they did not come to Dresden. However, the HAPS includes a

sizable number of shellac records. Some of them were produced in the laboratory of

Panconcelli-Calzia for demonstration purposes. The main collection, however,

consists of commercial music records with lower scientific importance. They were

collected by Wilhelm Heinitz (1883–1963) who directed a research unit for

ethnomusicology in Hamburg until 1948. Furthermore, the HAPS includes tapes

with sound examples of the Dresden vocoder and early speech synthesizers.

Historic photographs and transparencies

The collection includes, among other visual media, a set of valuable photographic

plates from Panconcelli-Calzia’s laboratory. Some of them are very useful because

they demonstrate the correct application of early phonetic devices.

4 Public Activities

The HAPS is a collection of the university which is used in teaching and research.

The university collections in Dresden are managed by a curator who is

responsible for the inventory. Due to the rapid growth of the collection during the

last decade, the simple activity of producing such an inventory was very

important. The objects have been photographed, and a first selection of the images

is available on the websites of the institute [8]. A printed catalogue of the

collection is in preparation. A first volume, which includes the historic phonetic

devices, will be published around the end of this year by the publisher Thelem in

Dresden. The HAPS can be visited on request and on special occasions such as the

dies academicus or the annual “night of sciences”. Additionally, selected objects

have been presented at special exhibitions as follows:

Exhibition about measuring pitch with historic instruments at the 3rd

International Conference on Speech Prosody in Dresden, 2006,

Participation with selected objects at the exhibition “Kempelen – Man in

the Machine” in the Hall of Arts, Budapest, 2007,

Exhibition of selected objects at the 16th International Congress of

Phonetic Sciences (ICPhS) in Saarbrücken, 2007 (Figure 4),

Special exhibition “SprachSignale” (SpeechSignals) in the Technical

Museum Dresden, 2009–2010.

Figure 4. Selected exhibits from the HAPS at the International Congress of

Phonetic Sciences in Saarbrücken, Germany, 2007.

5 Scientific Projects

A number of historical research projects have been carried out during the last

decade. They have been partially supported by the German Acoustic Society

(DEGA). A short overview of these activities follows:

5.1 History of the Institutions

The HAPS illustrates more than a century of development in experimental

phonetics and more than a half century in speech technology. It is important to

connect the exhibits with the scientific development at the places where they

originated. Therefore, we are collecting and publishing material on the

development in Dresden and Berlin [4], Hamburg and other places. In particular,

we are working on a monograph about the development of speech technology in

Dresden.

5.2 Investigations on Selected Phonetic Devices

It is sometimes not easy to understand how the historic phonetic devices worked.

Many questions had to be answered for the descriptions of the instruments in the

catalogue, which is now being prepared for printing. Some of the devices were

investigated in more detail.

Wethlo’s cushion pipes

An early project dealt with the reconstruction of historical larynx models. In 1898,

Ewald had proposed an improvement of the existing larynx models by replacing

the simple membranes with air-pressurized cushions. Wethlo investigated this

more natural construction in great detail from 1913 onwards [9]. The model,

which was critical in the development of voicing theories, is known as “Wethlo’s

Polsterpfeife” (cushion pipe). The Dresden collection includes a number of these

objects in different sizes (Figure 5). Some of them are originals from Wethlo’s

estate. They were reconstructed, and a number of experiments and measurements

were performed [10].

Figure 5. Historic larynx models from Franz Wethlo, so-called cushion

pipes.

History of pitch measurement

Pitch measurement has always played an important role in phonetics. There were

different methods for recording speech signals, but the application of a kymograph

was the predominant one. After recording the speech signal, it had to be measured

to produce a curve showing the pitch vs. time. The whole procedure of converting

kymographic waveforms into pitch contours required a number of steps, which

had to be performed with great precision. Because this was a very time-consuming

process, a number of aids were proposed, which were in use until the 1950s. We

have tried to explain their application [11].
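
The core of this conversion can be sketched in a few lines. The following is only a minimal illustration, not a reconstruction of the historic procedure: it assumes that the length of each period has been read off the trace in millimetres and that the drum surface moved at a known, constant speed. The function name and the example values are hypothetical.

```python
def pitch_contour(period_lengths_mm, drum_speed_mm_per_s):
    """Convert period lengths measured on a kymograph trace into a pitch contour.

    Returns a list of (time_s, f0_hz) pairs, one per period, where time_s is
    the midpoint of that period on the time axis.
    """
    if drum_speed_mm_per_s <= 0:
        raise ValueError("drum speed must be positive")
    contour = []
    elapsed_s = 0.0
    for length_mm in period_lengths_mm:
        if length_mm <= 0:
            raise ValueError("period lengths must be positive")
        # Length on the paper divided by paper speed gives the period duration.
        period_s = length_mm / drum_speed_mm_per_s
        # The fundamental frequency is the reciprocal of the period duration.
        f0_hz = 1.0 / period_s
        contour.append((elapsed_s + period_s / 2.0, f0_hz))
        elapsed_s += period_s
    return contour


# Hypothetical trace: drum speed 200 mm/s, three periods of shrinking length,
# i.e. a rising pitch.
example = pitch_contour([2.0, 1.6, 1.25], 200.0)
```

Every period has to be measured individually, and a small error in a length of one or two millimetres changes the resulting frequency noticeably, which may explain why the manual procedure demanded such precision.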

Pitch measurement with Boeke’s rack

Another way to measure pitch contours and other parameters is based on the

measurement of the “glyphs” on the surface of the wax cylinders of Edison’s

phonograph. This was performed using a very sophisticated instrument which was

designed by J. D. Boeke. One of these devices is part of the HAPS (Figure 6) and

was described in more detail in [12].

Figure 6. Boeke’s rack for measuring the ‘glyphs’ on wax cylinders.

Accuracy of measuring frequencies with mechanical resonators

A simple and widespread method for measuring the frequency of sounds was the

application of Helmholtz resonators (fixed frequencies) or resonator tubes of

Schaefer (tunable frequencies, see Figure 7). It is interesting to know more about

the accuracy of these historic measuring devices. Therefore we performed a

number of listening experiments which showed high accuracy in general, but a

systematic deviation in the case of Schaefer’s resonators [13].

Figure 7. A set of tunable resonators

from Schaefer.

Transfer functions of Marey’s capsules

Transducers that convert speech sounds into mechanical movements of writing

pins were used successfully for waveform recording in the early days of experimental

phonetics. The sound is transmitted through a hose into a flat, normally circular

capsule, which is closed by a thin rubber membrane. The movement of the

membrane is transferred to a light lever with an attached pin. The tip of the pin

scratches the waveform in the sooted paper on the revolving drum of a

kymograph. This approach dates back to E. J. Marey (1830–1904) who used it for

recordings of the movement of the pulse artery (sphygmograph) and other

physiologic motions. Later, it was widely applied in experimental phonetics by

P.-J. Rousselot (1846–1924), his student G. Panconcelli-Calzia, and other successors.

The properties of the transducers of the Marey type were evaluated by

interferometric measurements of the transfer functions of numerous capsules from the

HAPS collection [13]. It became clear that the transfer functions are not at all flat

over the frequency range of interest. They show several maxima which are

determined by the interplay of the system components, mainly the hose and the

capsule. Fortunately, this lack of flatness does not affect the period lengths of

the recorded signals, which are measured for determining the pitch contour.
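
The reason is that a linear, time-invariant system changes only the amplitudes and phases of the harmonics of a periodic input, not its period. The following sketch illustrates this under clearly stated assumptions: the capsule is treated as approximately linear, a single two-pole resonator stands in for one maximum of a non-flat transfer function, and all numerical values (sampling rate, fundamental frequency, resonance frequency, pole radius) are arbitrary choices for the simulation, not measured data from the capsules.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumed sampling rate of the simulation
F0_HZ = 125.0           # assumed fundamental; one period is 64 samples


def periodic_signal(num_samples):
    """Stand-in for a voiced sound: five harmonics of F0_HZ with 1/k amplitudes."""
    return [
        sum(math.sin(2 * math.pi * k * F0_HZ * n / SAMPLE_RATE_HZ) / k
            for k in range(1, 6))
        for n in range(num_samples)
    ]


def resonator(samples, centre_hz, pole_radius):
    """Two-pole resonator, a crude model of one peak in a transfer function."""
    theta = 2 * math.pi * centre_hz / SAMPLE_RATE_HZ
    a1 = 2 * pole_radius * math.cos(theta)
    a2 = -pole_radius ** 2
    out = []
    y1 = y2 = 0.0
    for x in samples:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def estimate_period(samples, start=2000, window=1000, min_lag=32, max_lag=100):
    """Return the lag (in samples) with the highest normalized correlation.

    The analysis starts late in the signal so that the filter transient
    has died out.
    """
    def score(lag):
        a = samples[start:start + window]
        b = samples[start + lag:start + lag + window]
        dot = sum(p * q for p, q in zip(a, b))
        norm = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        return dot / norm
    return max(range(min_lag, max_lag + 1), key=score)


original = periodic_signal(4000)
recorded = resonator(original, centre_hz=600.0, pole_radius=0.95)

period_original = estimate_period(original)
period_recorded = estimate_period(recorded)
```

In this simulation both estimates are 64 samples, i.e. 125 Hz at the assumed sampling rate, although the resonator strongly reshapes the waveform and raises its level.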

Historic devices for rehabilitation purposes

Rehabilitation engineering is a classical application field of speech technology.

Therefore, it is interesting to study the early attempts, mainly from the pre-

electronic era. Prototypes of such devices are rare exhibits. The HAPS owns some

examples which have been described in [14].

5.3 History of Speech Technology

Speech technology is the main research focus of the chair where the HAPS is

maintained. Therefore the development of speech analysis and synthesis is one of

the foci of the historic interest. In recent years, special attention was directed

to the following problems.

History of mechanical speech synthesis

This research activity is due to the existence of small mechanical sound or word

synthesizers which came to the HAPS from the Hamburg collection (Figure 8). In

the year 1899, the notable otologist Johannes Kessel (1839–1907) presented such

instruments at a scientific meeting in Munich [15]. Kessel aimed to use them to

teach people who have a significant degree of deafness. He recognized, however,

that the quality of the synthetic voice was still insufficient for this purpose. Later,

the original devices came to the Hamburg laboratory. The mechanical voices are

interesting as early mechanical speech synthesizers. Therefore, we started a project

to explore the development of this technology [16]. It can be interpreted as a late

spin-off of Kempelen’s speaking machine, the principle of which came (via

Mälzel) to the puppet manufacturers.

Figure 8. A collection of voice mechanics by Hugo Hölbe, arranged in

a demonstration box.

In our case, Hugo Hölbe (1844–1931) from Sonneberg was the manufacturer

of the voices. Sonneberg is a town in Thuringia and was known as the world

capital of toys in former times (Figure 9). We learned that “Stimmenmacher”

(voice manufacturer) was a separate profession in the production of puppets and

cuddly toys.

Figure 9. The “speaking picture book” was patented in 1874 by the bookseller

Theodor Brand from Sonneberg, Germany. It applies voice mechanics similar to

those in Figure 8. Left: view of the title; right: the interior.

History of early vocoders

The development of the vocoder in the 1930s had a profound impact on speech

research in general. The first patent on the principle of the channel vocoder was

obtained by K.-O. Schmidt [17], but the most important prototype was built

by H. Dudley [18], who also coined the name. A number of other prototypes were

developed during and after World War II in different countries. We tried to collect

all available information about this period [19]. It was not always easy because

much of the work was secret at that time.

History of electronic speech synthesis

As already mentioned, there was also a vocoder developed in Dresden in the 1950s

(Figure 10). In the following decades, many prototypes of a speech synthesis

terminal were developed [20], partially in cooperation with the computer company

Robotron. We demonstrated these objects in the special exhibition SprachSignale

(cf. Section 4) and included the historic examples in our lectures on speech technology.

Figure 10. Photograph of the Dresden vocoder from the 1950s.

6 Conclusion

The HAPS has developed well during the last decade. We are confident that we can

continue the work of collecting equipment, as well as continuing some

research activity. We hope that the Department of Electrical Engineering and

Information Technology at the TU Dresden specifies a final place for all scientific

collections in the near future, which would guarantee stable conditions for the

future of the HAPS.

References

[1] Hoffmann, R.: 40 Jahre institutionalisierte Sprachtechnologie in Dresden.

Studientexte zur Sprachkommunikation, vol. 54. Dresden: TUDpress 2009,

7–35.

[2] Mehnert, D.: Phonetics at the University of Berlin – a history. The

Phonetician, No. 92 (2005–II), 34–39.

[3] Mehnert, D.: Phonetik an der Berliner Universität - ein Rückblick auf ihre

Geschichte und auf Forschungsarbeiten der letzten Jahre. Studientexte zur

Sprachkommunikation, vol. 35. Dresden: Universitätsverlag 2005, 33–54.

[4] Hoffmann, R.; Mehnert, D.: Berlin-Dresden traditions in experimental

phonetics and speech communication. In: Boe, L.-J.; Vilain, C.-E. (eds.): Un

siècle de phonétique expérimentale. Lyon: ENS Éditions 2010, 191–210.

[5] Köster, J.: Giulio Panconcelli-Calzia. The Phonetician, CL-61, 1992, 3–10.

[6] Neppert, J.; Pétursson, M.: Death of a Phonetic Institute: The Phonetic

Institute of the University of Hamburg. Studientexte zur

Sprachkommunikation, vol. 54. Dresden: TUDpress 2009, 36–39.

[7] Grieger, W.: Führer durch die Schausammlung, Phonetisches Institut.

Hamburg: Christians 1989.

[8] www.ias.et.tu-dresden.de/sprache

[9] Wethlo, F.: Versuche mit Polsterpfeifen. Passow-Schaefers Beiträge für die

gesamte Physiologie 6(1913) 3, 268–280.

[10] Hoffmann, R.; Mehnert, D.; Dietzel, R.; Kordon, U.: Acoustic experiments

with Wethlo’s larynx model. International Workshop to the Memory of

Wolfgang von Kempelen, Budapest, March 11–13, 2004. Grazer

Linguistische Studien 62 (2004), 51–60.

[11] Mehnert, D.; Hoffmann, R.: Measuring Pitch with Historic Phonetic Devices.

3rd International Conference Speech Prosody, Dresden. May 2–5, 2006.

Dresden: TUDpress 2006, 927–931.

[12] Mehnert, D.; Dietzel, R.: Von Glyphen zu Tonhöhen und Intensitäten – das

Boekesche Gestell, ein historisches Auswertegerät. Studientexte zur

Sprachkommunikation, vol. 52. Dresden: TUDpress 2009, 198–208.

[13] Hoffmann, R.; Mehnert, D.; Dietzel, R.: Measuring the accuracy of historic

phonetic instruments. Proc. 17th Int. Congress of Phonetic Sciences, Hong

Kong 2011, 176–179.

[14] Mehnert, D.; Dietzel, R.; Kordon, U.: Aus den Anfängen der

Experimentalphonetik – Hilfsgeräte zur Behandlung Hör- und Sprachbehinderter.

Fortschritte der Akustik, DAGA 2011, Düsseldorf, 147–148.

[15] Denker, A.: Bericht über die Versammlung deutscher Ohrenärzte und

Taubstummenlehrer zu München. Archiv für Ohrenheilkunde 47, Nr. 3, Nov.

1899, 198–208.

[16] Hoffmann, R.; Mehnert, D.: Die Kesselschen Stimm-Mechaniken in der

historischen akustisch-phonetischen Sammlung der TU Dresden. DAGA,

Stuttgart, March 19–22, 2007.

[17] Schmidt, K.-O.: Verfahren zur besseren Ausnutzung des Übertragungsweges.

German Patent 594 976, patented February 27, 1932.

[18] Dudley, H. W.: Signaling System. US Patent 2,098,956, patented Nov. 16,

1937.

[19] Hoffmann, R.: On the development of early vocoders. Proc. IEEE Histelcon,

Madrid 2010, 6 p.

[20] Hoffmann, R.: Sprachsynthese an der TU Dresden: Wurzeln und

Entwicklung. Studientexte zur Sprachkommunikation, vol. 35. Dresden:

Universitätsverlag 2005, 55–77.

GENIE: The Corpus for Spoken Lower Sorbian (GEsprochenes

NIEdersorbisch)

Roland Marti, Bistra Andreeva, William J. Barry

Department of Slavonic Languages, Saarland University, Saarbrücken,

Germany

Phonetics, Saarland University, Saarbrücken, Germany

e-mail: [email protected], [email protected],

[email protected]

Abstract

Lower Sorbian is a Slavonic minority language spoken in Eastern Germany in

German-speaking surroundings. The language is on the brink of extinction as there

are basically no native speakers below the age of sixty. Therefore, the

documentation of spoken Lower Sorbian is crucial. The corpus of spoken Lower

Sorbian GENIE (GE[sprochenes] NIE[dersorbisch]:

http://genie.coli.uni-saarland.de/) is the first documentation of this kind. It brings together various

kinds of spoken Lower Sorbian: recordings from the archive of Sorbian broadcasts

(years 1956-2006), recordings from the Archive of Sorbian Culture (dialect

recordings 1951-1971), and new recordings from native speakers made especially

for the corpus in 2005/2006.

The paper presents the corpus and its defining features, paying special

attention to the particular situation of Lower Sorbian and its bilingual speakers. On

the one hand, there is a very strong German influence; but on the other, Upper

Sorbian interference is also clearly recognizable in the recordings. Furthermore,

the paper illustrates the problem of what constitutes the speech of a native speaker

in the case of minority languages. Finally, the problems of corpora of endangered

languages are discussed.

1. Sorbian

Sorbian currently forms the westernmost part of the Slavic-speaking

area. It is at present a language island (more exactly, an archipelago of

islets) within a German-speaking area, situated in Upper and Lower Lusatia.

This represents the remainder of an originally much larger territory, which, by

means of language shift, was gradually Germanized; a process that was

repeatedly triggered and fostered by language-political measures that still continue

(cf. Figure 1).

Figure 1: The Sorbian-speaking region in Germany.

This language area can be roughly divided into Upper and Lower Sorbian.

Only in the Upper Sorbian area, more precisely in the Catholic districts, are there

still villages where Sorbian is the common language (Scholze, 2008); elsewhere it

remains nothing more than a family language, or rather the language of the older

generation(s). The number of people with an active command of Sorbian can only

be estimated. The estimates vary between 15,000 and 30,000 for Upper Sorbian

and between 5,000 and 10,000 for Lower Sorbian (Jodlbauer, Spieß & Steenwijk,

2001). Both Upper and Lower Sorbian are autonomous languages. They are

officially acknowledged as minority languages in Germany, first, in the

constitutions and appropriate laws concerning Sorbs (or Sorbs/Wends) in the Free

State of Saxony and the state of Brandenburg¹ and, second, in the European Charter

for Regional or Minority Languages.

The main problem for the Sorbian language is the dying-off of the Sorbian

speaking community due to the lack of younger native speakers and the

consequent shrinking of the area in which Sorbian is spoken. Geographical

shrinkage is a phenomenon that has been observed since the 16th century. Both

trends have been accelerating since the mid 19th century, and neither the revival

measures nor fostering throughout the German Democratic Republic era could

stop them. There are language preservation and revitalization measures at present

(especially the so-called WITAJ project; Budar & Norberg, 2006), which can,

however, at best slow down the language assimilation process. The situation of

Lower Sorbian is particularly dramatic since inter-generational transmission does

not exist any longer and children are led by means of (partial) immersion to the

status of a kind of “secondary native speaker”.

¹ The official name in Brandenburg is “Sorbs/Wends” (“Sorben/Wenden”) and “Sorbian/Wendish”

(“sorbisch/wendisch”), since a part of the Lower Sorbian speaking community rejects the names “Sorbs”

(“Sorben”) and “Sorbian” (“sorbisch”) where native speakers are concerned. According to linguistic

(Slavic) tradition, only “Sorbs” (“Sorben”) and “Sorbian” (“sorbisch”) are used.

[Figure 1, map labels: Berlin, Brandenburg, Saxony, Poland, Czech Republic; Lusatia is Łužyca in Lower Sorbian, Łužica in Upper Sorbian and Lausitz in German.]

There are yet other specific problems concerning Lower Sorbian. The

revival of Sorbian life and its organization after the Second World War was

primarily initiated in the Upper Sorbian region and by Upper Sorbian exponents.

This led to the perception that the cultural life was Upper-Sorbian oriented, which

was in fact partially the case. This was experienced especially intensively in the

language domain. The spelling reform from 1949-1952 brought about the

approximation of Lower Sorbian to Upper Sorbian orthography. Since

a pronunciation oriented towards the written language was fostered and

required at school and in the media, the spelling reform also had orthoepic

consequences (so-called “spelling pronunciation”). The Upper Sorbian linguistic

influence was further strengthened by the fact that, owing to the small number of

autochthonous Lower Sorbian experts, functionaries in Sorbian organizations and

teachers came predominantly from Upper Lusatia, and their language did not

conform to the linguistic features of Lower Sorbian. This resulted in the popular

impression that the Lower Sorbian standard language does not represent real

Lower Sorbian at all, but an overall Sorbian hybrid language at best, or a kind of

Upper Sorbian that had been adjusted slightly to Lower Sorbian. Many native

speakers of Lower Sorbian therefore refused to participate in official efforts to

strengthen the language and restricted its use to private life. Often they even

stopped transmitting the language to the next generation. On the other hand, the

official language policy, centred on the standard language and neglecting dialects,

gave rise to the feeling in Lower Sorbian speakers that they could not speak

correct Sorbian (an opinion that is heard repeatedly during field recordings). This

explains the wish for reinforced demarcation from Upper Sorbian which emerged

when state control over cultural life ceased. The latter finds expression in the

adoption of different terminology (“Wendish” instead of “Lower Sorbian”, cf. n.

1), in the withdrawal of some parts of the spelling reform from 1949-1952, and in

the rejection of a purist language that is felt to be Upper Sorbian.²

² This results in the current (re)appearance in the written language of lexical Germanisms (lazowaś

instead of cytaś, hundert instead of sto), which have always been in colloquial use. A similar situation can

be observed in grammar, e.g. with determination (occasional use of the definite and, marginally,

also the indefinite article).

2. The Corpus for Spoken Lower Sorbian GENIE

In view of the precarious situation of Lower Sorbian described in the relevant

studies (Jodlbauer, Spieß & Steenwijk, 2001; Norberg, 1996), it was foreseeable

that the “authentic” mother tongue would cease to exist within one generation at

most. This is particularly fatal for the spoken language since the

“secondary mother tongue” (the maximum goal aimed at by efforts of

revitalization) differs strongly from the “authentic” mother tongue, especially in

its pronunciation.³ It was therefore important, and extremely urgent, to

document spoken Lower Sorbian. With this objective in mind, the corpus GENIE:

GEsprochenes NIEdersorbisch (Spoken Lower Sorbian) was created. The corpus

creation was partially funded by the Scientific Committee of the University of

Saarland in the years 2005-2006. The endeavour was also financially supported by

the Radio Berlin-Brandenburg (RBB) and the Sorbian Institute/Serbski Institut. In

order to make this corpus internationally usable for scientific research, it was

made available on the web (http://genie.coli.uni-saarland.de). The GENIE website

is supported by the Institut für Phonetik

(http://www.coli.uni-saarland.de/groups/WB/Phonetics/index.php) together with the Institut für

Slavistik (http://www.uni-saarland.de/fak4/fr44/) at the University of Saarland.

Owing to copyright and data protection regulations, the corpus could not be made generally

available; its use is permitted for scientific purposes upon application

(http://genie.coli.uni-saarland.de/cgi-bin/benutzer.html). The corpus arrangement

was structured to meet the special features of the situation of Lower Sorbian

presented above and, where possible, to take into account the diachronic level.⁴

There are more than sixty hours of spoken Lower Sorbian in its distinct variants

available in GENIE. Even though the period of time covered by the recordings

ranges only from 1951 to 2006, the speakers' dates of birth indicate that the

diachrony is considerably deeper: the oldest speaker was born in 1860 (he was 94

years old at the time of the recording), the youngest speaker was born in 1973.

Individual diachrony is also traceable since several people are represented in

multiple recordings that were produced at different times.

2.1 Sources

The corpus consists of recordings from three different sources:

a) Archive of the Sorbian Radio (Studio Cottbus of the Radio Berlin-Brandenburg

RBB, formerly ORB, earlier still Radio of GDR)

This source consists of 110 recordings made between 1956 and 2006. Both speakers of

dialects and speakers of the standard language in its different variants (native

speakers of Lower Sorbian/Wendish, Upper Sorbian or German) are represented.

The text types vary widely: conversation, interview,

address, report etc.

3 This is primarily because the teachers employed in the revitalization project WITAJ, apart from a few exceptions, do not have a command of Lower Sorbian as their mother tongue, but at best as their secondary mother tongue.
4 Owing to copyright, the oldest recordings of Sorbian could not be adopted from the Berlin Archive; therefore only marginal diachronic depth is taken into consideration: the recordings were made in the years 1951-2006.

b) Archive of Sorbian Culture/Serbski kulturny archiw (SKA) in the Sorbian

Institute/Serbski Institut

The source contains 135 recordings made between 1951 and 1971. The recordings
were compiled for linguistic purposes by the Institute, in particular for the Sorbian
Linguistic Atlas (cf. References SSA 1-15 1965-1996). Their aim was to record
local dialects (story, interview, elicitation etc.).

c) The field study project specifically for this corpus

The source consists of 100 recordings made between 2005 and 2006. They involve

conversations between J. Frahnow (pastor and native speaker) and mostly elderly

native speakers whose speech usually represents a local dialect. While selecting

the recordings and test persons, we attempted to depict the complexity of dialectal

forms of Lower Sorbian/Wendish along with diverse standard linguistic variants

employing the three sources mentioned.

2.2 Metadata files

There is a data record sheet for every recording containing the most important

information about the recording. Specifically, these are:

call number (the recording identifier): this consists of the letter f, r or s and
a four-digit number, where f means a field recording created by J.
Frahnow, r stands for recordings from the radio archive of the RBB, and s

signifies recordings from the Archive of Sorbian Culture. In addition to the

call numbers valid for this corpus, there are archive call numbers as used in

the source.

text type (e.g., conversation, interview, report)

contents (e.g., village life, customs, farming)

place of the recording

date of the recording

indication of sex (names are not given to protect the person’s identity)

speaker’s place of birth

speaker’s date of birth

dialect

family language: it is specified here whether the family language was

Lower Sorbian/ Wendish, German or mixed (or Upper Sorbian where

applicable)

places of residence

education

The place names in the fields (place of the recording, place of birth, dialect and
place of residence) are given in German and Lower Sorbian/Wendish and can be
shown and arranged at three levels: place, municipality, and district.
Additionally, all the Lower Sorbian places covered are allocated to the dialect
areas. In doing so, the classification of the Sorbian Language Atlas was taken into

consideration, which ultimately goes back to the categorization by Muka (1911-

1926). In it, only Lower Sorbian dialects proper or transitional dialects are

distinguished. In the case of native speakers of Upper Sorbian, there is only a

reference to this fact without indication of the dialect area. In case of non-native

speakers or native speakers that use the standard language, the word “standard” is

used.

There are several metadata sets available for some recordings, namely in

cases where there is more than one speaker participating in the recording (hosts

and interviewers were usually not taken into account). The call numbers of the

metadata sets are identified in these cases by the attached index letters (e.g., a, b,

etc.).
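The call-number scheme described above (source letter, four-digit number, optional speaker index letter) can be sketched as a small parser. This is only an illustration under stated assumptions: the exact pattern, the names `CALL_NUMBER` and `parse_call_number`, and the sample identifier `f0012b` are hypothetical and are not taken from the GENIE website.

```python
import re
from typing import NamedTuple, Optional

# Source letters as described in the text; the wording of the labels is ours.
SOURCES = {
    "f": "field recording (J. Frahnow)",
    "r": "radio archive of the RBB",
    "s": "Archive of Sorbian Culture",
}

# Assumed shape: one source letter, four digits, optional index letter
# for recordings with more than one speaker.
CALL_NUMBER = re.compile(r"^([frs])(\d{4})([a-z]?)$")


class CallNumber(NamedTuple):
    source: str
    number: int
    speaker_index: Optional[str]


def parse_call_number(text: str) -> CallNumber:
    """Split a call number into source, running number and speaker index."""
    match = CALL_NUMBER.match(text.strip().lower())
    if match is None:
        raise ValueError(f"not a call number of the assumed form: {text!r}")
    letter, digits, index = match.groups()
    return CallNumber(SOURCES[letter], int(digits), index or None)


# Hypothetical identifier: second speaker of field recording 12.
print(parse_call_number("f0012b"))
```

A record without an index letter simply yields `speaker_index=None`, which matches the case of a recording with a single metadata set.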

Access to the datasets and audio recordings in the corpus may be obtained
either directly, by stating the call number, or indirectly by using a search form
in which all the fields listed above can be searched or sorted with the help of
filter functions.

2.3 Technical data of the recordings

In addition to the specified background information, data record sheets comprise

the following information:

length of the recording in minutes and seconds

size of the .wav-file in bytes/kilobytes/megabytes

size of the .mp3-file5 in bytes/kilobytes/megabytes

sampling rate in Hz

amplitude quantization rate in bits per sample

number of channels (1 for mono, 2 for stereo)

signal-to-noise ratio SNR (so far only for data from the field study
project)

bit rate (.mp3-file) in kBit/s
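The technical fields listed above are related arithmetically: the size of an uncompressed .wav file follows from duration, sampling rate, quantization and number of channels, and the size of an .mp3 file follows from duration and bit rate. The sketch below illustrates this relation; the function names and the sample values (200 s, 44100 Hz, 16 bits, mono, 128 kBit/s) are assumptions for illustration, not figures from the corpus, and 1 kBit is taken to mean 1000 bits. A real .wav file is slightly larger because of its header.

```python
def pcm_bytes(seconds: float, sampling_rate_hz: int,
              bits_per_sample: int, channels: int) -> int:
    """Size of the raw audio data in an uncompressed PCM recording."""
    return round(seconds * sampling_rate_hz * channels * bits_per_sample / 8)


def mp3_bytes(seconds: float, bit_rate_kbit_s: float) -> int:
    """Approximate size of a constant-bit-rate mp3 file (1 kBit = 1000 bits)."""
    return round(seconds * bit_rate_kbit_s * 1000 / 8)


# Hypothetical recording: 3 min 20 s, 44.1 kHz, 16 bit, mono, mp3 at 128 kBit/s.
duration = 3 * 60 + 20
wav = pcm_bytes(duration, 44100, 16, 1)
mp3 = mp3_bytes(duration, 128)
print(wav, mp3, round(wav / mp3, 2))
```

A check of this kind can be used to spot inconsistent entries in a data record sheet, for example a file size that does not match the stated duration and sampling rate.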

3. Examples from GENIE

It is evident from the description of the GENIE corpus that the material can be

analysed with various objectives in mind. For one thing, the description and the

comparison of the structural characteristics of the various dialect areas are an

attractive challenge in itself. Even though the spontaneous speech of the

recordings does not allow for an exhaustive grammatical description, the newly

recorded material provides a valuable supplement to the (not immediately

accessible) dialect recordings made during the German Democratic Republic era.

Another important question is to what extent the spoken standard language may
vary and, depending on the speaker’s origin, adopt a dialectal form, thus actually
containing Lower Sorbian, Upper Sorbian or German features. The focus of our
first analyses, though, will be on the influence of German on spoken Lower
Sorbian, an influence that grew steadily over the 20th century but had been
present long before. The comparison of recordings of younger and older
people can shed light on the extent of this influence, as well as on the linguistic
features affected by it. More striking yet is the comparison of recordings of the
same person made at different times.

5 mp3 audio files are highly compressed in size. They take much less time to transmit over the internet.

According to the existing descriptions (Schwela, 1906; Janaš, 1984; Starosta,
1991), there are well-known phonetic differences between German and Lower
Sorbian on the segmental level: vowel quality and quantity, the r-sound, the
realization of plosives with regard to voicing and aspiration, the existence of
the dark l or a [w], and the palatalization correlation that is widespread in
Slavic languages. In addition, characteristic features of intonation and word
stress are known from impressionistic descriptions of the prosody. Other
differences, rarely mentioned but important, concern the linking of words, such
as the separation of adjacent vowels by a glottal stop or the direction of voicing
assimilation (progressive or regressive).

As examples of the existing and growing impact of the influence of German

on Lower Sorbian, we show here four of the phenomena mentioned above in

utterances of an elderly speaker (A, born in 1890) and of a younger speaker (B,

born in 1960).

Figure 2, a representation of the microphone signal and the spectrogram of the
utterance “Chtož tu rolu wobźěłajo” (English “Who works on the land”),
illustrates several pronunciation features in one short stretch of speech that show
the influence of German, three of which we comment on below:
1. In the word “rolu” the /r/ is realized as a uvular approximant [ʁ] (see I).
2. “wobźěłajo” /'obʑewajo/ starts with a glottal onset instead of a smooth
transition from “rolu” (see II) or an alternatively possible [h].
3. The syllable-final /b/ and the following syllable-initial /ʑ/ are voiceless (see
III).

Figure 2. The utterance “Chtož tu rolu wobźěłajo” (here: [xtɔʃ tʊ ʁɔlu ʔɔpʃevajɔ])
by speaker B (born in 1960) with (I) uvular [ʁ], (II) hard vowel onset (glottal stop)
and (III) devoicing at the syllable coda with progressive devoicing of a voiced initial
fricative.

In Figure 3, which shows the oscillogram and the spectrogram of the utterance
“tak daloko” (English “so far”), the voiceless plosives /t/ (see I) and /k/ (see II)
show, contrary to the claim that voiceless plosives are unaspirated in Lower
Sorbian, clear signs of moderate aspiration (26 ms in both cases). The measured
aspiration is relatively short compared with that of monolingual speakers of
German. It is therefore important to examine whether an intermediate form
(similar to the weak aspiration of Canadian speakers of French; Sundara et al.,
2006; Fowler et al., 2008) has become established in Sorbian generally, within
this generation, or with this speaker alone.


Figure 3. The utterance “tak daloko” (here: [tʰak dalɔkʰɔ]) by speaker B (born in
1960), where clear aspiration (I) of /t/ and (II) of /k/ can be noticed.

The older speaker (born in 1890) shows a different articulation pattern.
Admittedly, in her statement “To njejo tak dobre” (English “It’s not that good”)
in Figure 4, a tendency to aspirate can be observed: /t/ in “to” has an
aspiration duration of 37 ms (see I).
On the other hand, following /k/ in “tak” she produces a fully voiced initial
/d/ in “dobre” that affects /k/ regressively, making it voiced (see II). This
assimilation pattern contrasts with the common German one but corresponds to
what is typical of other Slavic languages. The apical [r] in “dobre” also differs
from the German standard /r/, which is a uvular fricative [ʁ]. Two taps of the
apical [r], each briefly interrupting the signal, can be seen in the spectrogram
as well as in the microphone signal (see III).


Figure 4. The utterance “To njejo tak dobre” (here: [tʰɔ ne tʰag dɔbrə]) by speaker
A (born in 1890), where (I) aspiration of /t/, (II) a fully voiced /d/ with partial
voicing of the preceding /k/ and (III) a double-contact apical /r/ can be observed.

As far as the fourth phenomenon in the younger speaker's recording is
concerned (the missing smooth transition from one vowel to the next across a word
boundary), it cannot be maintained that glottal constriction on the German
pattern did not occur in earlier times. In a short utterance of speaker A (“a to
ak,” English “as”), there is clear glottalization at the beginning of the
utterance and at the word boundary between “to” and “ak” (see I and II in Figure
5). Further studies will allow us to determine how often such instances of
glottalization occur in her speech. It also cannot be ruled out that Slavic languages
behave like other “binding” languages (French, Italian, English etc.) and
dialects (such as Alemannic), in which a stressed vowel-initial word in an
emphatic context can very well start with a hard glottal onset. In the younger
speaker’s example, however, the glottalization appears in a non-emphatic context.
The older speaker’s utterances are characterized by a generally emphatic “word by
word” style; the utterance in question is not distinctly emphatic, but its
glottalization might be attributed to this general style. A further uncertainty in
comparing the two speakers results from age-related differences in voice quality,
which add to the difficulty of interpreting glottal phenomena.


Figure 5. The utterance “a to ak” (here: [ʔa tʰɔ ʔak]) by speaker A (born in 1890)

with glottalization (I) at the beginning of the utterance and (II) at the word

boundary between “to” and “ak.”

4. Corpora of endangered languages – an exceptional case?

Following the presentation of a concrete corpus of an endangered language,

we should ask whether, from a general linguistic perspective, corpora of

endangered languages, or of micro languages in the broader sense (see The UCLA

Phonetics Lab Archive [http://archive.phonetics.ucla.edu/], The Endangered

Language Fund [ELF: http://endangeredlanguagefund.org/], DOkumentation

BEdrohter Sprachen/documentation of endangered languages [DOBES:

http://www.mpi.nl/DOBES/], and the Leipzig Endangered Languages Archive
[LELA: http://www.eva.mpg.de/lingua/resources/lela.php] among others), are
essentially different from the corpora of other languages and whether this has
consequences for their planning, composition and supervision. There are indeed
differences, but they are not fundamental in nature.

An important difference concerns information value or, in other words,

representativeness of the corpora. Paradoxically, the corpora of endangered

languages are simultaneously more and less representative than those of other

languages. The higher degree of representativeness becomes especially clear in the
case of written corpora: only for a language with a limited written tradition can a
corpus include a high percentage of all that has been written.

There are two reasons for lower representativeness. First, endangered

languages are either not documented at all, or if they are, then by relatively small-

sized corpora and only rarely by means of several corpora. In addition, the data

that exists has usually been collected by chance and does not reflect an intentional

selection. The second reason for lower representativeness lies in the fact that the

norm of endangered languages is less fixed, and so there is greater variability

within them that can only be imperfectly represented. It is even possible that

idiolectal predominance in a corpus may distort linguistic structures.

A further discrepancy is related to the composition, processing and

supervision of the corpora. For endangered languages, the
group of people who are interested in the corpora and capable of putting them
together is rather small. The same applies to the financial resources of
minorities. As a consequence, corpora of minority languages, if they are created at

all, cannot be specialized (they are the proverbial 'all-in-one' tools) and will only

be partially annotated, if at all. Continuous development, updating and

documentation are only possible to a very limited degree.

A final major difference lies in the function of the corpora. For endangered
languages, the corpus is not primarily a linguistic working tool. It is, rather, a
memorial with a distinct cultural-political objective: it is meant to document
what still exists and what may soon disappear.6 This may well have consequences
for the choice of the texts to be recorded if the “antiquarian” idea prevails.

Corpora of endangered languages are clearly an exceptional case. Both

producers and consumers must take this into consideration. The producers must

take into account the limiting general conditions and the additional functions and

ensure that such corpora will be supervised in spite of limited resources. The users

must show understanding for the particularities of such corpora and also be willing

to contribute actively to their optimization, for example, by making the

transcriptions and annotations they created themselves available for the corpus.

6 It is not a coincidence that in the “Archive of vanished places” (“Archiv verschwundener Orte/archiw zgubjonych jsow”) in the village of Baršć/Forst, recordings of the Sorbian language can be heard in order to demonstrate how “Devastation” (open-cast lignite mining) affected the cultural heritage of the region (www.forst-lausitz.de/sixcms/media.php/674/Broschuere_AVO_Aufl2.pdf).

References

Budar, L. & Norberg, M. (2006). „Les écoles sorabes après 1990“. Education et Sociétés Plurilingues 20 (juin): 27-38.

Fowler, C. A., Sramko, V., Ostry, D. J., Rowland, S. & Halle, P. (2008). Cross-language phonetic influences on the speech of French-English bilinguals. Journal of Phonetics 36, pp. 649-663.

Janaš, Pětr (1984). Niedersorbische Grammatik für den Schulgebrauch. Bautzen: Domowina.

Jodlbauer, R., Spieß, G. & Steenwijk, H. (2001). Die aktuelle Situation der niedersorbischen Sprache: Ergebnisse einer soziolinguistischen Untersuchung der Jahre 1993-1995. Bautzen: Domowina (= Schriften des Sorbischen Instituts 27).

Muka, Ernst (1911-1926). Słownik dolnoserbskeje rěcy a jeje narěcow I–III. Petrograd: RAN; Praha: ČAVU.

Norberg, Madlena (1996). Sprachwechselprozeß in der Niederlausitz. Soziolinguistische Fallstudie der deutsch-sorbischen Gemeinde Drachhausen/Hochoza. Uppsala (= Acta Universitatis Upsaliensis. Studia Slavica Upsaliensia 37).

Scholze, Lenka (2008). Das grammatische System der obersorbischen Umgangssprache im Sprachkontakt. Bautzen: Domowina (= Schriften des Sorbischen Instituts 45).

Schwela, Gotthold (1906). Lehrbuch der Niederwendischen Sprache. Erster Teil: Grammatik. Heidelberg: Ficker.

SSA 1-15 1965-1996 Sorbischer Sprachatlas (Serbski rěčny atlas), bearbeitet von H. Faßke, H. Jentsch und S. Michalk, 1-15, Bautzen (Budyšin) 1965-1996.

Starosta, Manfred (1991). Niedersorbisch schnell und intensiv 1. Bautzen: Domowina.

Sundara, M., Polka, L., & Baum, S. (2006). Production of coronal stops by simultaneously bilingual adults. Bilingualism: Language and Cognition 9, pp. 97–114.

Internet sources (accessed 30.03.2011):

www.forst-lausitz.de/sixcms/media.php/674/Broschuere_AVO_Aufl2.pdf

http://genie.coli.uni-saarland.de

http://www.mpi.nl/DOBES/

http://www.eva.mpg.de/lingua/resources/lela.php

http://endangeredlanguagefund.org/

Epithet adjective and attribute of the object: what about prosody?

Denis Ramasse

CRISCO EA 4255, Université de Caen, France

Abstract

Can prosody help to decide whether an adjective is epithet or attribute of the
object in a sentence?

In French, an adjective placed directly after a noun can have two different functions: epithet, or attribute of the object complement (attribut du complément d’objet, abbreviated a.c.o.). A sentence such as J’ai cru cet homme sincère can therefore be understood in two ways: “I trusted this sincere man” (the man really was sincere and I believed him; the adjective is an epithet) or “I thought this man was sincere” (and I may have been mistaken; the adjective is an attribute of the object homme). We investigated whether prosody can resolve this ambiguity, both in encoding and in decoding. Ten ambiguous sentences, each presented in two co-texts (one forcing the epithet reading, the other the a.c.o. reading), were recorded by six speakers (three men and three women). The acoustic analysis of this corpus revealed four prosodic cues that could distinguish the two functions. The adjective was analysed as an attribute of the object when (i) there was a short silence between the noun and the adjective (called a pausette in an earlier description), (ii) there was a melodic rise at the end of the sentence, and (iii, iv) the mean duration and the mean pitch of the sentence were slightly greater. A statistical analysis of the data showed that the first two cues, the silence and the final rise, were the most important. A two-part perception test was then run to check whether these cues are used in perception. It showed that the hierarchy of cues is not the same in decoding: a higher mean pitch proved to be a useful cue that reinforces the role of the pausette in signalling the attribute-of-the-object function.

1 Introduction

A sequence Verb (V) + Noun (N) + Adjective (A) can be a source of ambiguity in French. The adjective can attach in two ways (Fuchs 1996):

either it depends on the noun: there is no syntactic boundary between N and A, the bracketing is V(NA), and the adjective is an epithet;
or it depends on the verb: there is a boundary between N and A, the syntactic structure is ((V N) A), and the adjective is then an attribute of the object complement (abbreviated a.c.o., following Riegel, Pellat & Rioul 1994).
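The two attachments can be made concrete by writing both bracketings of the same word string as nested structures. The following is a minimal sketch, not a parser: the tuple representation and the helper names `bracket` and `leaves` are assumptions chosen for illustration, and the example is the sentence Il a acheté cette voiture neuve from the corpus (without its subject).

```python
def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree:
        words.extend(leaves(child))
    return words


def bracket(tree):
    """Render a nested tuple as a parenthesized string."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(bracket(child) for child in tree) + ")"


V, N, A = "a acheté", "cette voiture", "neuve"

epithet = (V, (N, A))   # V(NA): the adjective depends on the noun
aco = ((V, N), A)       # ((V N) A): the adjective depends on the verb

print(bracket(epithet))  # (a acheté (cette voiture neuve))
print(bracket(aco))      # ((a acheté cette voiture) neuve)

# Both structures cover exactly the same word string, hence the ambiguity.
print(leaves(epithet) == leaves(aco))  # True
```

The point of the sketch is that the segmental string is identical in both cases; only the grouping differs, and it is this grouping that prosody might signal.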

Riegel (1991) proposes a set of tests that bring out the function of an adjective in a given use and so distinguish epithet from a.c.o. Take, for example, a sentence from the corpus studied below:

Il a acheté cette voiture neuve. (“He bought this new car.” / “He bought this car new.”)

Table 1:

| Test | Epithet | a.c.o. |
|---|---|---|
| pronominalization | Il l’a achetée. | Il l’a achetée neuve. |
| question with qu(e) or qu’est-ce qu(e) | Qu’est-ce qu’il a acheté ? | Qu’est-ce qu’il a acheté neuf ? |
| transformation into noun + relative clause | Cette voiture neuve qu’il a achetée. | Cette voiture qu’il a achetée neuve. |
| passivization | Cette voiture neuve a été achetée. | Cette voiture a été achetée neuve. |
| extraction with c’est… que | C’est cette voiture neuve qu’il a achetée. | C’est cette voiture qu’il a achetée neuve. |
| detachment | Cette voiture neuve, il l’a achetée. | Cette voiture, il l’a achetée neuve. |

(A seventh test, in fact the third in the list he presents, seems difficult to apply even to the example he gives (Le jury a jugé ce travail remarquable.); it is the question with comment:

Table 2:

| Test | Epithet | a.c.o. |
|---|---|---|
| question with comment | Comment a-t-il acheté cette voiture neuve ? | ? Comment a-t-il acheté cette voiture ? |

This test implicitly treats the adjective in a.c.o. function as an adverbial (complément circonstanciel), which is why it seems preferable not to use it.)

If the distinction between the two functions is delicate in writing, one may ask whether speech offers cues that resolve the ambiguity. Speakers might, when encoding, add prosodic elements that listeners are used to finding when decoding. The present study sets out to establish whether the prosody of the sentences contains cues that oppose the two functions of the adjective.

1.1. Essential a.c.o. and accessory (or accidental) a.c.o.

After certain verbs, for example verbs of opinion (juger, croire, trouver, voir, sentir, etc.) or declarative verbs (dire, prétendre, assurer, affirmer, etc.), the adjective that is attribute of the object complement is considered essential, because it determines the sense in which the verb is taken (after Noailly 1999: 120, and Le Goffic 1993, in particular § 263).
With these verbs, according to Fuchs (1996: 133), so-called "reduced" constructions appear:

either the reduction of a complement clause in que, if the adjective is an essential a.c.o.: J’ai cru que cet homme était sincère.
or the reduction of a relative clause, if the adjective is an epithet: J’ai cru cet homme qui était sincère.

The result of the reduction is: J’ai cru cet homme sincère. (sentence 10 of the corpus).
With other verbs there is no reduction; the attributes are accessory.
In the corpus studied, a distinction can thus be made between:

1. Il a trouvé cette idée folle.
10. J’ai cru cet homme sincère.

where the adjective, if it is an attribute of the object, is considered an essential attribute of the object (these are the only essential attributes in the corpus), and, for example:

6. Il boit son chocolat froid.
8. J’ai connu cet homme intraitable.

where, if applicable, the function analysed is that of accessory attribute of the object.
Regarding sentence 1 of the corpus:

Il a trouvé cette idée folle.

it is worth noting that English uses word order to avoid the ambiguity created by the two possible functions of the adjective, epithet or a.c.o. If it is an epithet, the sentence is:

He found this crazy idea.

and conversely, if it is an attribute of the object, it follows the noun:

He found this idea crazy.

It is therefore entirely reasonable to suppose that, if one language disambiguates by syntactic means, prosodic cues may play the same role in another language.

1.2. Semantics of the two functions

The epithet or attribute-of-the-object function brings out one or another clique of a verb. A clique is a microscopic unit of meaning according to the CRISCO dictionary of synonyms and, according to Ploux and Victorri (1998), a maximal complete subgraph in the graph representation of a word's synonymy.
For example, for sentence no. 1: trouver = 22: concevoir, créer, découvrir, imaginer, inventer, trouver, avoir with an epithet adjective; = 48: estimer, juger, penser, trouver, être d'avis with an adjective that is attribute of the object.
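The graph-theoretic notion of a clique can be illustrated with a toy synonymy graph. The sketch below enumerates maximal complete subgraphs with the basic Bron-Kerbosch procedure (without pivoting). It is not the CRISCO implementation: the graph is built only from the two word lists quoted above, on the simplifying assumption that words from different lists are not synonyms of one another, and the helper names are hypothetical.

```python
def maximal_cliques(graph):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for vertex in list(candidates):
            expand(current | {vertex},
                   candidates & graph[vertex],
                   excluded & graph[vertex])
            candidates = candidates - {vertex}
            excluded = excluded | {vertex}

    expand(set(), set(graph), set())
    return cliques


def link_all(graph, words):
    """Declare every pair of words in the list to be synonyms (toy assumption)."""
    for a in words:
        for b in words:
            if a != b:
                graph.setdefault(a, set()).add(b)


SENSE_EPITHET = ["concevoir", "créer", "découvrir", "imaginer",
                 "inventer", "trouver", "avoir"]
SENSE_ACO = ["estimer", "juger", "penser", "trouver", "être d'avis"]

graph = {}
link_all(graph, SENSE_EPITHET)
link_all(graph, SENSE_ACO)

for clique in maximal_cliques(graph):
    if "trouver" in clique:
        print(sorted(clique))
```

In this toy graph trouver belongs to exactly two maximal cliques, one per reading, which is the sense in which each function of the adjective selects a different clique of the verb.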

Moreover, objects have an intrinsic, ontological property (to use the term of Thomas 2003); for example, in sentence no. 3 of the corpus presented below, the property of a traffic light can be its colour. The adjective specifies this property. Likewise, in sentence no. 6, the property of the chocolate is its temperature. The epithet adjective specifies it in a “lasting” way (to use the term of Blanche-Benvéniste 1991), whereas the adjective that is attribute of the object specifies it in a transient way.
For sentence no. 6 and for sentence no. 8, with attributes of the object there would be, according to Fabienne Martin (2006), two simultaneous processes and two juxtaposed predicates, the second being a depictive (a descriptive secondary predicate):

6. Il boit son chocolat froid. (= Il boit son chocolat alors qu’il est froid, “He drinks his chocolate while it is cold”)
8. J’ai connu cet homme intraitable. (= J’ai connu cet homme alors qu’il était intraitable, “I knew this man when he was intractable”)

Fabienne Martin contrasts depictives with resultative secondary predicates, which are found in sentences no. 4 and no. 5 of the corpus:

4. Il a rendu son devoir irréprochable.
5. Il a gardé sa chemise propre.

In the first of these two sentences, the irreproachable character is the result obtained by the first process. Although it is less obvious for the second sentence, the clean state of the shirt is the result of an implicit process of protection.
Another semantic opposition, subjective versus objective, can be conveyed by this difference in function. The essential attributes of sentences no. 1 and no. 10 contrast, through their subjective aspect, with the objective character conferred by the epithet function. In sentence no. 10, for example, the man's sincerity is the product of an impression or a judgement in one case and a reality in the other.
An objective and lasting aspect conveyed by the epithet function is thus opposed to the subjective and ephemeral character contributed by the attribute-of-the-object function, with circumstantial nuances of simultaneity (in depictives) or of purpose (in resultative secondary predicates).

2. The present study

This study set out to bring out a difference in the prosody of two sentences that are segmentally identical but contain

in one, an epithet adjective
in the other, the same adjective as attribute of the object.

For example, sentence 1 of the corpus, Il a trouvé cette idée folle., can be paraphrased as Il a conçu cette idée folle. (epithet adjective) on the one hand, and as Il a jugé cette idée folle. (a.c.o. adjective) on the other.
To achieve this, the same segmental sentence was placed in two different co-texts that induce two different functions of the adjective. These co-texts were very simple and had no literary pretension; they were devised solely to give the adjective a clearly distinct function. For example, for this first sentence:

Co-text 1: Il cherche toujours à se faire remarquer. Il a trouvé cette idée folle. Il s’est acheté une chemise violette.
Co-text 2: Elle lui a suggéré d’acheter une chemise violette. Il a trouvé cette idée folle.

Or, taking sentence 6 of the corpus: Il boit son chocolat froid.

Co-text 1: Il fait très chaud. Il entre dans un café et se commande un chocolat froid. Il regarde sa montre. Il boit son chocolat froid. Il sort.
Co-text 2: Il se sert son chocolat bien chaud, bien fumant. Il s’attarde plus qu’il ne l’aurait fallu à sa lecture. Il boit son chocolat froid. Il part travailler.

Ten sentences were thus assembled into a small corpus1, in practice two corpora, one with the epithet adjectives, the other with the adjectives as attributes of the object.
The aim was to identify cues used to encode the two functions, not to carry out an exhaustive and rigorous statistical study of the production of sentences with an epithet or attribute-of-the-object adjective. This corpus (cf. figure 1) was therefore recorded by six speakers (including myself2), three women (designated loc1, loc2 and loc3) and three men (loc4, loc5 and loc6). The corpus of epithet adjectives was always recorded before that of the attributes of the object, following the order adopted in the presentation of figure 1.

1 A first corpus, produced by a single female speaker, was initially analysed in a preliminary study presented to a CRISCO research group on the adjective. That corpus was modified, partly thanks to remarks made by members of the group, because it contained two sentences that raised problems for the analysis.
1° One sentence was something of an "intruder", since its adjective is liable to be not an attribute of the object complement but an attribute of the complement of the presentative. The sentence was Voilà la question insoluble. It was included in the corpus to test, on the prosodic level, what Riegel et al. (1994) say about the attribute of the complement of the presentative (p. 241), namely: sequences introduced by the presentatives voici, voilà and by the impersonal verb falloir occupy the structural position of a direct object, and they can be followed by a predicative element functioning as an a.c.o.: Le voici enfin libre. The analysis showed, however, that the prosodic realization of such a sentence differed from that of a sentence containing an a.c.o. adjective.
2° Another sentence was changed because the phonetic form of the adjective was the same in the masculine and the feminine. The sentence was: Il a acheté cette voiture chère. As a consequence, this adjective was realized like the adverb cher. The prosodic realization of this sentence differed markedly from that of the others, because the adjective chère no longer functioned as an a.c.o. but as an adverbial (complément circonstanciel).
2 There is no significant difference between my realization and that of each of the two other male speakers. This is shown by applying 2 Wilcoxon signed-rank tests (significance level .05) to the differences in means, for the last 3 parameters, between each sentence with an epithet adjective and the corresponding sentence with an a.c.o. adjective; moreover, I produced no pausette.
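For readers unfamiliar with the Wilcoxon signed-rank test mentioned in this note, the following is a minimal sketch of an exact two-sided version for small paired samples, written with the standard library only. It is not the procedure or software actually used in the study; the function name and the sample values are hypothetical, zero differences are dropped, and tied absolute differences receive average ranks.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p), where W is the smaller of the two signed rank sums.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under the null hypothesis every pattern of signs is equally likely.
    at_least_as_extreme = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            at_least_as_extreme += 1
    return w, at_least_as_extreme / 2 ** n


# Hypothetical mean-pitch values (Hz) for five sentence pairs, NOT study data.
epithet = [182.0, 175.5, 190.2, 168.3, 171.9]
aco = [186.4, 174.0, 197.5, 171.1, 180.2]
w, p = wilcoxon_signed_rank(epithet, aco)
print(f"W = {w}, exact two-sided p = {p:.4f}")
```

The exhaustive enumeration of sign patterns is practical only for small samples such as the ten sentence pairs of this corpus; for larger samples a normal approximation is normally used instead.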

66

3. Previous contributions to the description of the prosody of attributive and a.c.o. adjectives

There are no real studies of the prosody of sentences containing attributive and object-complement adjectives, only more or less detailed impressionistic descriptions. The most recent, dating from 1999, is that of Noailly (p. 120), who comments on the sentence Lise voudrait un mur jaune. by saying that one is undecided, everything depending on the context and on the intonation, which is more or less connected (on est indécis, tout dépendant du contexte, et de l'intonation, plus ou moins liée).

This can be glossed as follows:

- the adjective jaune is attributive if the intonation is connected;
- it is an object complement if there is a break in the intonation.

This break in the intonation, found in sentences with object-complement adjectives, had already been described by Damourette and Pichon (1911-1940) in volume II of their work (p. 18), with regard to the sentence: Je veux ma robe rouge.

According to them:

Confusion may arise for the dianathète de l'ayance. Take the sentence « Je veux ma robe rouge ». Is rouge an épithète or a diathète? Several criteria make it possible to decide: if there is a pausette after robe, we are dealing with an (échoite) dianathète: at the same time as I order my dress, I indicate my wish that it be red; if there is no pause, we are dealing with an épithète: I express the wish to have the one of my dresses that is red. The addressee is thus informed of the speaker's intention by the vocal pause.

According to the glossary of special terms, or terms used in a special sense, in the Essai de grammaire, which they include in their Compléments:

DIATHÈTE: predicative complement (attribut) with adjectival value: « Je suis grand. »
ÉCHOITE: predicative complement (attribut) of a constituent other than the subject.
DIANATHÈTE: predicative complement (attribut) with adjectival value and a moderately tight attachment: « Petit poisson deviendra grand. »
AYANCE: direct object.

From this we deduce that the expression dianathète de l'ayance designates the function of object complement (attribut du complément direct d'objet).

4. The pausette

In the passage quoted, Damourette and Pichon use the term pausette, which is based on a classification of pauses proposed in volume I (§169, p. 188). They distinguish three types of pauses:

1. Grandes pauses (long pauses): sentence-final pauses, ordinarily marked by the full stop.
2. Pausules: short pauses, ordinarily marked by the comma.
3. Pausettes: very short pauses for which current spelling unfortunately has no punctuation mark, although the need for one is felt at every moment.

Or, more simply, according to their glossary:

PAUSULE: vocal pause (written pose in the original, sic) ordinarily marked by a comma.
PAUSETTE: vocal pause shorter than that ordinarily marked by a comma.

Thus, according to them, the presence of a short pause between the noun and the adjective is the prosodic cue to the object-complement function, while the absence of such a pause contributes to the adjective being interpreted as attributive.

5. In search of pausettes

The first question to be answered was whether Damourette and Pichon's pausettes were a cue that distinguishes attributive adjectives from object complements. If so, one could then ask how important this cue was and whether it appeared systematically. To find out, it was enough to look for a silence between the noun and the following adjective. This analysis revealed six pausettes of between 50 ms and 100 ms (72 ms on average). Out of the 60 sentences with an adjective in a.c.o. function, this represents only 10% of what could have been produced. Only one pausette was produced by a man; all the others occur in the productions of the 3 female speakers.

Damourette and Pichon's example illustrating the role of the pausette as a prosodic cue to the a.c.o. is consistent with this observation, since it can only be uttered by a female speaker: Je veux ma robe rouge.

Moreover, 3 of the 6 pausettes occur in sentence 8, between homme and intraitable in J’ai connu cet homme intraitable (followed by the sentence Il est maintenant doux comme un agneau.). This is illustrated in Figure 2, where a 68 ms pausette can be seen between homme and intraitable in the upper curve.

Figure 2. Comparison of the shape of the two curves for sentence 8 as produced by the first female speaker (loc1). Note the 68 ms pausette (between homme and intraitable), shown by the break in the line in the upper curve.

Other prosodic cues were sought. They will first be presented separately, and their relative importance will then be assessed.

6. Comparison of the shape of the curves

The shape was captured by measurements taken on the vowels: their duration and their frequency, more precisely:

- the initial fundamental frequency
- the final fundamental frequency

The fundamental frequency of voiced consonants was not measured for the comparison between vowels, but it was taken into account in the description of sentence endings. Frequency was most often measured with the AMDF (Average Magnitude Difference Function) pitch extractor proposed by Ross et al. (1974); occasionally, however, for portions where the signal was too weak, three other algorithms were used:

- the comb function proposed by Martin (1981),
- an algorithm based on the autocorrelation method of Boersma (1993),
- a simple FFT (Fast Fourier Transform), which appears to have given accurate measurements.
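To illustrate the first method in the list above, the AMDF principle can be sketched in a few lines. This is a minimal sketch on a synthetic tone, not the implementation of Ross et al. (1974) nor that of the software used in the study; the sampling rate, search range, tolerance and function names are all assumptions chosen for the example.

```python
import math

def amdf(frame, lag):
    """Mean absolute difference between the frame and a copy shifted by `lag` samples."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def estimate_f0(frame, fs, f_min=75.0, f_max=500.0, tol=0.1):
    """Estimate F0 (Hz) from the first deep valley of the AMDF curve.

    f_min, f_max and tol are illustrative values, not taken from the study.
    """
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    values = {lag: amdf(frame, lag) for lag in range(lag_min, lag_max + 1)}
    lowest, highest = min(values.values()), max(values.values())
    threshold = lowest + tol * (highest - lowest)
    # Take the shortest lag that enters a deep valley, which avoids
    # picking a multiple of the true period (octave error).
    lag = next(l for l in range(lag_min, lag_max + 1) if values[l] <= threshold)
    # Walk down to the bottom of that valley.
    while lag < lag_max and values[lag + 1] < values[lag]:
        lag += 1
    return fs / lag

# Synthetic 200 Hz tone, 50 ms long, sampled at 8 kHz (assumed test signal).
FS = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / FS) for n in range(400)]
f0 = estimate_f0(frame, FS)
print(round(f0, 1))
```

The valley search is one simple way of limiting octave errors; real pitch extractors add voicing decisions and smoothing that are left out here.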

These algorithms are available in three software packages: PHONÉDIT, SPEECH ANALYZER and PRAAT. The pitch of the fundamental was then evaluated by converting the frequencies into semitones, with 100 Hz as the reference value (100 Hz = 0 semitones; any value below 100 Hz becomes negative when expressed in semitones).
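The conversion can be sketched as follows. The text gives only the reference value (100 Hz = 0 semitones); the formula used here, 12 · log2(f / 100), is the usual equal-tempered definition and is assumed rather than quoted from the study, and the function name is hypothetical.

```python
import math

REF_HZ = 100.0  # reference stated in the text: 100 Hz = 0 semitones

def hz_to_semitones(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones relative to ref_hz (assumed formula)."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 12.0 * math.log2(f_hz / ref_hz)

for f in (80.0, 100.0, 200.0):
    print(f, round(hz_to_semitones(f), 2))
```

With this definition an octave above the reference (200 Hz) is 12 semitones, and any frequency below 100 Hz yields a negative value, as the text states.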

7. Vowel duration

One strategy for differentiating the functions could have been to lengthen the vowels of sentences containing an a.c.o. adjective. A vowel-by-vowel comparison of durations (paired samples) could show whether the vowels of sentences containing an object-complement adjective are longer than those of sentences containing an attributive adjective; this was tested by comparing the durations within each pair of vowels. For each sentence, the significance of the differences was checked either with Student's t-test (if the sample distributions were normal according to the Lilliefors test) or with the Wilcoxon test.
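The paired comparison described above can be sketched as follows for the parametric case. The durations are invented for illustration (they are not the study's measurements), the function name is hypothetical, and only the t statistic and its degrees of freedom are computed; the normality check and the p-value are left to a statistics package.

```python
import math
import statistics

def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1

# Hypothetical vowel durations in ms, one pair per vowel of a sentence.
attributive = [80, 95, 110, 70, 120, 100]
aco = [98, 118, 128, 95, 139, 124]

mean_diff, t_stat, df = paired_t(attributive, aco)
print(round(mean_diff, 1), round(t_stat, 2), df)
```

A positive mean difference means that the vowels of the a.c.o. version are longer than those of the attributive version.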

This holds for only 3 sentences, nos. 1, 4 and 6, as produced by the first female speaker (loc1), where the vowels of the sentences with an a.c.o. are longer than those of the sentences with an attributive adjective by 20.3 ms on average (significance level of Student's t < .02).

A mixed model³, with duration as the dependent variable, the category attributive/a.c.o. as a fixed factor and the speakers as a random factor, was used to test the relevance of this cue for distinguishing the categories. Its relevance was confirmed at a significance level below .01 (< .0001).

³ In the mixed model used for this study, it is the mean vowel durations of the sentences with an attributive adjective, on the one hand, and of those with an a.c.o. adjective, on the other, that were compared. The two other mixed models, discussed further on, dealt with the means of pitch and of the final melodic rises.

8. Search for a systematic difference in pitch

For each pair of vowels, pitch was compared. For each sentence, the significance of the differences was checked either with Student's t-test (if the sample distributions were normal according to the Lilliefors test) or with the Wilcoxon test.

There is a significant difference in pitch for 10 sentences: 7 positive differences (11.66%) and 3 negative differences (5%). Sentences 4 and 8 alone account for 5 of the positive differences, and it will be recalled that they contain 4 of the 6 pausettes found in the corpus, 3 for sentence 8 and 2 for sentence 4. But sentence 4 also shows a negative difference of 3 tones.

A mixed model, analogous to the one used to confirm the relevance of the duration cue but with pitch as the dependent variable, was used. A significance level below .01 (= .007) confirmed this relevance here too.

9. The cue of the final melodic rise

48 of the 120 sentences (40%) show a final rise in the melody. This rise is sometimes realized only within the last vowel, in which case there is a glissando (melodic rise within a vowel); but the rise may also begin at the onset of the consonant preceding this final vowel and continue to the end of the following consonant. In these last two cases there is what may be called an "intraconsonantal rise", as opposed to a "simple glissando". On two or three occasions, the rise ends in the realization of a schwa, thus extending over two vowels and a consonant. Four configurations are therefore possible:

- glissando
- intraconsonantal rise + glissando
- intraconsonantal rise + glissando + intraconsonantal rise
- intraconsonantal rise + glissando + intraconsonantal rise + schwa (cf. Figure 3)

Figure 3. Sentence 3, Il a vu le feu orange., with an a.c.o. adjective, produced by speaker 6. This realization is notable for its final melodic rise of slightly more than 7 tones (the largest rise recorded in this study).

Of these sentences, 29 contain an object-complement adjective (about 24% of the 120 sentences, or about 48% of the 60 sentences in this category) and 19 contain an attributive adjective. Table 1 presents the data for the two sets of sentences.

Here too, the relevance of this cue is confirmed by a mixed model (with the final melodic rise as the dependent variable), the significance level here being .001.

10. Study of the relative importance of each cue

Cluster analyses, in this case k-means classifications, were carried out to assess the importance of each cue in discriminating between the two functions, attributive and object complement. Each analysis used 1000 iterations.

The ideal outcome would have been a partition with the 60 sentences containing an attributive adjective in class 1 and the 60 sentences with an a.c.o. adjective in class 2. The result is very far from that. But even though it comes nowhere near, the proportions between the different results can be used to assess the relative importance of the 4 cues. Thus the difference between the number of sentences with an attributive adjective and the number with an a.c.o. adjective in each class gives an estimate of the importance of each cue for discriminating between the two functions. This is summarized in Table 2.

Table 2. Results of the k-means classification tests. Class 1 contains (except for the pitch cue) a larger number of sentences with an attributive adjective and class 2 a larger number of sentences with an a.c.o. adjective; the difference between the two sentence types within each class is given in the last row.

                                  pausette   final melodic rise   duration   pitch
class                              1    2        1    2            1    2     1    2
sentences with attributive adj.   60    0       47   13           39   21    30   30
sentences with a.c.o. adjective   54    6       43   17           37   23    30   30
difference                           6             4                 2          0
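The pausette column of Table 2 can be reproduced with a minimal one-dimensional k-means. This is a sketch under stated assumptions: only the counts (60 attributive sentences without a pausette, 54 a.c.o. sentences without one, 6 with one) and the 68 ms value come from the text; the five other durations are invented so that the six values lie between 50 and 100 ms with a mean of 72 ms, and the plain Lloyd iteration below is not necessarily the procedure of the statistics package used in the study.

```python
def kmeans_1d(values, max_iter=1000):
    """Two-class k-means on scalars, initialised at the minimum and maximum."""
    centroids = [min(values), max(values)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids

# Pausette duration in ms per sentence; 0 means no pausette.
# Only 68 ms is reported in the text; the other five values are hypothetical.
pausettes_aco = [52, 60, 68, 72, 80, 100]
attributive = [0] * 60
aco = [0] * 54 + pausettes_aco

labels, centroids = kmeans_1d(attributive + aco)
att_labels, aco_labels = labels[:60], labels[60:]

# Rows of the "pausette" column: (class 1, class 2) counts per sentence type.
att_counts = (att_labels.count(0), att_labels.count(1))
aco_counts = (aco_labels.count(0), aco_labels.count(1))
print(att_counts, aco_counts)
```

Because 114 of the 120 sentences have no pausette at all, the second class can only ever contain the six a.c.o. sentences with a pausette, which is why a rarely used cue still produces the largest difference in the table.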

The pausette emerges as the cue with the greatest discriminating function. This may seem surprising, since only 6 pausettes were produced out of a possible 60. But the other cues are also very sparsely realized. The final melodic rise comes next in this ranking, with a difference of 4 items in each class (4 more sentences with an attributive adjective in class 1, and 4 more sentences with an a.c.o. adjective in class 2). The discriminating role of the duration cue is very limited, since the difference is only 2 items. Finally, the vowel pitch cue has no discriminating role, since each class contains an equal number of sentences with an adjective in each function.

The hierarchy of cues is therefore: pausette, final melodic rise, duration and pitch. Since pitch appears to have no discriminating function, it does not serve as a cue for differentiating the two functions of the adjective in the sentences. Moreover, this result is based on too small a number of speakers to be statistically significant.

Nevertheless, a few cues that may be used in encoding have been identified.

One could then ask whether they also played a part in decoding, and how listeners obtained an indication of the function of an adjective placed immediately after a direct-object noun; this was investigated by means of perception tests.

11. Perception tests

To determine more precisely the role of each cue in signalling the function, two tests involving natural and synthetic stimuli were prepared from two sentences of the corpus produced by the first female speaker (loc1): sentence no. 8, J’ai connu cet homme intraitable., and sentence no. 6, Il boit son chocolat froid.

For each test, the sentence with an attributive adjective and the one with an a.c.o. adjective, as produced by speaker 1, served as stimulus st000 and stimulus st100 respectively. The other stimuli were derived from these sentences by manipulation with the Praat software. In the first test, the duration of the pausette was 68 ms, and in the second test the final melodic rise was 5.5 semitones. The nature of the stimuli is summarized in Table 3. After a preliminary listening phase, the stimuli were presented in random order, at 7 s intervals, to 18 subjects (first-year students of modern literature). Each series of stimuli was repeated 3 times.

Table 3. Description of the stimuli used in the two perception tests; the pausette lasted 68 ms and the final melodic rise was 5.5 semitones.

A response sheet had to be filled in as follows: in the first test (sentence: J’ai connu cet homme intraitable.), for each stimulus presented, listeners were asked to tick a yes or no box in answer to the question: l’homme a-t-il changé ? (has the man changed?). The question in the second test (sentence: Il boit son chocolat froid.) was: le chocolat a-t-il refroidi ? (has the chocolate gone cold?), and again a box had to be ticked. Cases where no answer was given were included in the analysis (they are labelled nsp in the graphs).

The overall analysis of the results showed how difficult the tests were, since no statistically significant conclusion could be drawn for the 18 listeners. A selection therefore had to be made among the results: 6 listeners were chosen on the basis of the diversity and accuracy of their responses to the unmodified stimuli.

A correspondence analysis (analyse factorielle des correspondances, AFC) was carried out, for the first test, on the data obtained, which are represented in the contingency table of Figure 4. The test of independence between rows and columns (chi-square) is significant at .006.

Figure 4. Contingency table of the results of the first test; of the 18 initial subjects, 6 were selected, and there were 3 presentations of a series of 8 stimuli, giving a total of 144 data points.

[Figure 4 plot: 3D view of the contingency table. Columns: nsp, aco, épi. Rows: st000, st001, st010, st011, st100, st101, st110, st111. Vertical axis: counts from 0 to 14.]

Figure 5 shows the map of this analysis. It reveals a very clear distribution of the stimuli.

Along axis F1:
o the stimuli whose adjective is perceived as an a.c.o. all contain a pausette
o the stimuli whose adjective is perceived as attributive do not contain a pausette
o there is one exception, however: when the pausette is the only cue to the a.c.o. function (st100), the stimulus lies at the boundary but is perceived as having an attributive adjective.

Along axis F2:
o a rise in the mean pitch of the sentence ensures better identification of the a.c.o. function of the adjective
o the absence of such a rise leaves room for doubt, even when a pausette and a relative lengthening are both present (st101)

Figure 5. Map of the correspondence analysis of the results of the first test. Axis F1 separates the sentences with a pausette, on one side, from the other sentences, on the other. Along axis F2, a rise in the mean pitch of the sentence ensures better identification of the a.c.o. function of the adjective, and the absence of such a rise leaves room for doubt.

[Figure 5 plot: symmetric plot (axes F1 and F2: 100.00%). Horizontal axis: F1 (84.46%). Vertical axis: F2 (15.54%). Column points: aco, épi, nsp. Row points: st000, st001, st010, st011, st100, st101, st110, st111.]

A correspondence analysis was also carried out on the data of the second test; their contingency table is presented in Figure 6. The test of independence between rows and columns (chi-square) is even more significant here (.0001).

Figure 6. Contingency table of the results of the second test. As there were only 4 stimuli per series, the number of data points is only 72.

Figure 7 shows the map of this analysis. It can be seen that:

- along axis F1, the final melodic rise plays exactly the same role as the pausette in the identification of the a.c.o. function (stimuli with a rise) and of the attributive function (stimuli without a rise). There is thus a parallel here with the results of the first test;
- axis F2 shows that the presence of the final melodic rise is sufficient on its own for reliable recognition of the function of the adjective, unlike the pausette.

[Figure 6 plot: 3D view of the contingency table. Columns: nsp, aco, épi. Rows: st000, st100, st011, st111. Vertical axis: counts from 0 to 18.]

Figure 7. Map of the correspondence analysis of the results of the second test. Axis F1 shows that the final melodic rise is perceived as a prosodic cue to the a.c.o. function; at the level of perception, the final melodic rise and the pausette are analogous and complementary.

12. Conclusion

Two main cues emerge that distinguish the attributive function from the a.c.o. function in both encoding and decoding: the final melodic rise and the pausette. It is surprising, however, that the pausette, the only cue that had already been described, is little used and, with one exception, only by female speakers.

The final melodic rise is used more widely, but it is also found in sentences with an attributive adjective; this does not impair its role in discriminating between the two functions, as the perception test shows.

Finally, a higher mean pitch of the sentence reinforces the discriminating role of the pausette in decoding. These cues are little used, but they are prosodic in nature, which explains their optional character. This study was based on a corpus of read sentences. The validity of the cues identified must now be verified on corpora of spontaneous speech.

[Figure 7 plot: symmetric plot (axes F1 and F2: 100.00%). Horizontal axis: F1 (87.60%). Vertical axis: F2 (12.40%). Column points: épi, aco, nsp. Row points: st000, st100, st011, st111.]

References

Bachelet, R. (2010). L’analyse factorielle des correspondances. http://rb.ec-lille.fr

Blanche-Benvéniste, C. (1991). Deux relations de solidarité utiles pour l’analyse de l’attribut. In Gaulmyn, M.M., Rémi-Giraud, S. & Basset, L. (eds), À la recherche de l'attribut. PUL, Lyon, pp. 83-98.

Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proc. of the Institute of Phonetic Sciences of the University of Amsterdam 17, pp. 97-110.

Cibois, P. (2007). Les méthodes d’analyse d’enquêtes. PUF, Paris.

CRISCO (2011). Dictionnaire des synonymes. http://www.crisco.unicaen.fr/

Damourette, J. & Pichon, E. (1911-1940). Des mots à la pensée. Essai de grammaire de la langue française, tomes I, II et Compléments. D'Artrey, Paris.

Fuchs, C. (1996). Les ambiguïtés du français. Ophrys, Gap-Paris.

Le Goffic, P. (1993). Grammaire de la phrase française. Hachette, Paris.

Martin, F. (2006). Prédicats statifs, causatifs et résultatifs en discours – Sémantique des adjectifs évaluatifs et des verbes psychologiques. Thesis, Université libre de Bruxelles.

Martin, Ph. (1981). Mesure de la fréquence fondamentale par intercorrélation avec une fonction Peigne. Actes des XIIèmes Journées d’Étude sur la Parole, Montréal.

Noailly, M. (1999). L'adjectif en français. Ophrys, Gap-Paris.

Ploux, S. (1997). Modélisation et traitement informatique de la synonymie. Linguisticae Investigationes, 21/1, pp. 1-28.

Ploux, S. & Victorri, B. (1998). Construction d’espaces sémantiques à l’aide de dictionnaires de synonymes. Traitement automatique des langues 39, n°1, pp. 161-182.

Riegel, M. (1991). Pour ou contre la notion grammaticale d'attribut de l'objet: critères et arguments. In Gaulmyn, M.M., Rémi-Giraud, S. & Basset, L. (eds), À la recherche de l'attribut. PUL, Lyon, pp. 99-118.

Riegel, M., Pellat, J.C. & Rioul, R. (1994). Grammaire méthodique du français. PUF, Paris.

Ross, M.J., Shaffer, H.L., Cohen, A., Freudberg, R. & Manley, H.J. (1974). Average Magnitude Difference Function Pitch Extractor. IEEE Trans ASSP-22, pp. 353-362.

Thomas, I. (2003). Quels types de données pour la traduction automatique de l’adjectif qualificatif dans les groupes ADJ NOM/NOM ADJ : vers une approche ontologique et contextuelle. Bulletin de Linguistique appliquée et générale 28, pp. 255-274.

Software used

PHONÉDIT, developed by S.Q.Lab in collaboration with the Laboratoire Parole et Langage, Aix-en-Provence (C.N.R.S. URA 261).

PRAAT, speech analysis and synthesis software developed by Paul Boersma and David Weenink, Phonetic Sciences, University of Amsterdam.

SPEECH ANALYZER version 3.0.1 (2007), developed by SIL (Dallas).

XLSTAT, statistics and data analysis software developed by Addinsoft.

OBITUARIES

Eli Fischer-Jørgensen (1911- 2010)

At the age of ninety-nine, Emeritus Professor Eli Fischer-Jørgensen died at her

home in Denmark in February, 2010. This marked the end of a very long and

distinguished career that had begun in 1929 with studies of the French and

German languages which were firmly in the Danish tradition stemming from great

scholars of the linguistic sciences, such as Otto Jespersen. While still a student,

she was accepted into the Linguistic Circle of Copenhagen, which was famous for

the "glossematic" theories of Louis Hjelmslev. He was a scholar who may be easy

to overlook due to the fact that he collaborated with a colleague (Poul Andersen)

to produce a practical textbook for their students of phonetics. While still a

student, Eli developed her lifelong passion for integrating observational and

instrumental phonetic work with phonological theory. Graduating MA in 1936,

she set off on travels to and sojourns in places which included Marburg (for

German dialectology), Paris to work with Martinet and Marguérite Durand and

Berlin to study with Eberhard Zwirner. Returning home just before the outbreak of

World War II, she got work in the Department of German which, in due course,

morphed into a lectureship in phonetics created for her under the aegis of

Hjelmslev.

After the War, she extended her experience by visits to London to the

Phonetics Department at University College to study with Jones and Hélène

Coustenoble and also to the School of Oriental and African Studies to attend

lectures by J. R. Firth and on Yoruba and Chinese as well. Other journeys took her

to America to the Haskins Laboratories and to Stockholm to cooperate with

Gunnar Fant. At home, her work became recognised by the creation of a Chair of


Phonetics for her in 1966 and an associated institute. Fruitful connections with

colleagues at Lund also followed.

As time went by, she became the host herself of researchers from abroad,

including individuals from Japan, Edinburgh, Berkeley and Germany. A most

memorable and brilliantly managed (very much by her) occasion was the 1979

visit to Copenhagen for the Ninth International Congress of Phonetic Sciences.

This was something of a swan song for her since two years later, on her reaching

70, regulations no doubt required her to relinquish her post.

Her varied publications were far too many to detail here. They included a

classic account of the Danish stød, the historical Tryk i ældre dansk (on Stress in

Old Danish), Trends in phonological theory and her accounts of the phonetic

symbolisms of vowels. Nor should her modest, concise, clear summary of general

phonetics for her Danish students, Almen Fonetik, be quite forgotten. She was held

in high esteem amongst her friends for her gifted water colours. She'll be

remembered for a long time to come.

Jack Windsor-Lewis

Eva Sivertsen (1922-2010)

Eva Sivertsen was born on the 8th of July, 1922 at Trondheim, the ancient city on

the shores of a fjord in the middle of Norway's thousand-mile coastline. She

graduated in English at the University of Oslo continuing her studies there with a

Ph.D. on the famous dialect of working-class Londoners, known as Cockney. This

activity developed after some years of further work into the 280-page book

published by Oslo University Press in 1960 as Cockney Phonology. She did much

of her work on Cockney from a base at University College London's Department

of Phonetics, but also lived for a while among her main informants at a social

settlement in the East End area of Bethnal Green. Besides the influence of the

contemporary and previous UCL staff which she clearly acknowledged, she


became a great enthusiast for the work of the American structuralists. The

influence of Charles F Hockett certainly pervades the whole book.

A three-page review of it in Le Maître Phonétique by J. D. O'Connor began

"the standard work on Cockney Phonetics has now been written" and ended with

"altogether a splendid book". She included in it also an admirable "conspectus of

the general problems posed by the phonological analysis of English" thus making

it "two books in one".

Besides being a brilliant scholar she was an equally gifted administrator, as

was seen when she became a principal organiser of the Eighth International

Congress of Linguists in 1957 and edited its volume of Proceedings. In 1960, she

headed the Department of English at Trondheim University. She ultimately

became the Rektor of the whole University. She always maintained an interest in

the teaching of English as an extra language in its grammar and other linguistic

features, as well as its phonology. She was an outstandingly energetic person

physically, as well as intellectually — much given to outdoor pursuits with

remarkable endurance. She never married, but she had many friends by whom she

was well liked.

Jack Windsor-Lewis

Gösta Bruce (1947-2010)

(picture courtesy of Daniel Bruce)

Gösta Bruce, Professor of Phonetics at Lund University, Sweden, passed away on

June 15, 2010, following a short period of hospitalization. He was 63 years old.

Gösta Bruce is survived by his wife, Barbro, and his children Sara (with partner

Valtteri), Daniel, and Niklas.

Born and brought up in the southern Swedish town of Helsingborg, Gösta

chose to continue his higher education at Lund University, 60 km south of


Helsingborg. After an undergraduate degree in Russian, Gösta went on to study

phonetics, drawn to the department where Bertil Malmberg and Kerstin Hadding

had developed the field of phonetics as an experimental discipline at the

Humanities faculty at Lund University. Under the direction of Hadding’s

successor, Eva Gårding, Gösta Bruce developed the Lund model of intonation. He

carried the phonetic analysis of Swedish word accents in a new direction by

analysing them with respect to their syntactic position and pragmatic function

(focus) in utterances. His seminal dissertation, Swedish Word Accents in Sentence

Perspective (1977) laid the theoretical foundation for the development of ideas

about how intonational phenomena could be analysed as components in a

hierarchical prosodic structure. These fundamental ideas on intonational structure

and their relation to syntax and pragmatics have since been adopted and developed

by many researchers the world over.

Following a research stay at Bell Labs in 1984, as well as a period as a

visiting professor at Stockholm University during 1985-1986, Gösta Bruce was

appointed to the chair of phonetics at Lund University in 1986. The contributions

to the festschrift to Gösta on the occasion of his 50th birthday in 1997 (Horne,

2000) bear witness to the influence that his work had for researchers, not only in

phonetics, but also in general linguistics and in speech technology.

Although Gösta Bruce’s model was based on the prosodic patterning of

‘standard’ central Swedish, Gösta’s own dialect, that of Helsingborg in the

southern province of Scania, differed quite considerably from that of the standard

variety. This variation in the patterning of word accents in Swedish dialects was an

area that intrigued Gösta as it had earlier Eva Gårding (1977) and Ernst Meyer

(1937–1954). Gösta Bruce followed in their footsteps and carried the investigation

of dialectal variation to new heights in his work on prosodic modeling. Although

the phonetic realization of the two Swedish word accents differs quite

considerably dialectally, the crucial timing difference between the word accents

with respect to the stressed syllable is something that is constant for all dialects

and is something which fascinated Gösta. He had an extremely sensitive ear for

tonal variation and timing, and in recent years, his work was focused on

systematizing this variation as regards Swedish dialect prosody in several

externally financed research projects such as SweDia 2000 and SIMULEKT.

Shortly before his untimely death, his vast accumulated knowledge on the varieties

of Swedish was published in his book Vår fonetiska geografi ‘Our phonetic

geography’ (Bruce, 2010).

Gösta’s sensitivity for timing differences also led to a number of novel

studies on rhythmic structure in Swedish. By carrying out a number of innovative

experimental studies on differences in the duration of unstressed syllables, he

could show how rhythmic alternation was created postlexically in strings of

nonprominent syllables (Bruce, 1987).

Gösta Bruce was not only a creative researcher and scientist; he was also a

dedicated and respected teacher. His undergraduate courses on prosody, Swedish


dialect variation and sounds of the world’s languages were always highly

evaluated. At the time of his premature death, Gösta was planning to rework and

update his very popular course book on Swedish prosody (Bruce, 1998). On the

graduate level, Gösta was regularly engaged in doctoral courses on both a local

and national level. He was a devoted teacher and supervisor, and during his career,

Gösta supervised 13 doctoral dissertations. He sincerely cared about his students

and constantly inspired and encouraged them, both by his words of wisdom and by

his empathetic manner. His humor, often spontaneously expressed in terms of

perfect sound imitation (everything from different Swedish dialects to Russian

intonation to complex African click consonants), was another productive outlet for

his very creative mind.

Despite all his research and teaching duties, Gösta Bruce played an important

role in academic leadership at Lund University. During his time as professor, he

served as head of the department of linguistics and phonetics, vice dean of the

humanities faculty, chairman of the appointments board for language and

linguistics, and most recently, member of the board of research at the Center for

Languages and Literature. He was also engaged as an expert evaluator at the

Swedish and Norwegian Research Councils and was a member of the editorial

board of Phonetica. In addition, he was an active member in several learned

societies, including The Royal Swedish Academy of Letters, History, and

Antiquities.

In 2007, Gösta Bruce was appointed president of the International Phonetic

Association. In this role, Gösta saw the opportunity to approach a discussion of

fundamental issues related to the future of the discipline of phonetics, including

the relationship of prosodic research within a larger interdisciplinary perspective

where phonetics plays a central role in understanding speech processing

phenomena. Due to his untimely death, however, many of Gösta’s plans were

tragically left at the planning stage.

Following a suggestion by Gösta’s family at the time of his funeral, the IPA

set up a memorial fund to honor Gösta and his accomplishments. Since that time,

the IPA Council has decided to make the fund a permanent fund. The Gösta Bruce

Memorial Fund is intended to serve as a means to support students in phonetics

and speech sciences by awarding scholarships in Gösta’s name that will assist

them in traveling to ICPhS conferences in order to meet other speech scientists

and present their research results to the international community. Nothing could be

more fitting to keep the memory of Gösta Bruce’s many scientific

accomplishments and his constant devotion to developing knowledge of phonetics

alive.

References

Bruce, Gösta. 1977. Swedish word accents in sentence perspective. (Travaux de l’Institut

de linguistique de Lund XII). Lund: Gleerup.

Bruce, Gösta. 1987. On the phonology and phonetics of rhythm: Evidence from Swedish.

In Dressler, W., Luschützky, H., Pfeiffer, O. & Rennison, J. (Eds.), Phonologica

1984. Proceedings of the Fifth International Phonology Meeting, Eisenstadt, 25–28

June 1984, pp. 21-32. Cambridge: Cambridge University Press.

Bruce, Gösta. 1998. Allmän och svensk prosodi [General and Swedish prosody]. (Praktisk

lingvistik 16). Dept. of linguistics and phonetics, Lund University.

Bruce, Gösta. 2010. Vår fonetiska geografi [Our phonetic geography]. Lund:

Studentlitteratur.

Gårding, Eva. 1977. The Scandinavian word accents (Travaux de l’Institut de

linguistique de Lund XI). Lund: Gleerup.

Horne, Merle (Ed.). 2000. Prosody: Theory and experiment. Studies presented to Gösta

Bruce. Dordrecht: Kluwer.

Meyer, Ernst A. 1937-1954. Die Intonation im Schwedischen [Intonation in Swedish], 2

vols. (Stockholm Studies in Scandinavian Philology, 0562-1097). Stockholm:

Fritzes.

Merle Horne

Professor of general linguistics

Dept. of linguistics and phonetics

Lund University, Sweden

Ilse Lehiste (1922 – 2010)

(picture by courtesy of Sarah Ritschert)

One of the greatest phoneticians, and a remarkable scientist, has passed

away. Ilse Lehiste, born on January 31, 1922 in Tallinn, Estonia, died at Riverside

Methodist Hospital on Saturday, December 25, 2010. She was born into the family

of a senior officer. She started her studies in Estonia: she graduated from the Lender

high school, then studied piano for one year at the Conservatory of Tallinn, and

entered the Faculty of Arts at the University of Tartu in 1942.

After two years, in 1944, she left Estonia as a refugee, fleeing the Soviet

invasion of her homeland, and continued her studies in Germany. At first,

she studied at the University of Leipzig and then at the University of Hamburg.

Her postgraduate studies concentrated on the work of William Morris, the many-

sided Victorian designer, artist, writer, and socialist. She was especially interested

in the motifs of Nordic literature in his work. She defended her PhD in

Philology at the University of Hamburg in 1948. At that time, she lived in a

refugee camp in Germany.

The following year, she moved to the United States, where she continued her

studies. Here, she was engaged especially in linguistics. In 1959, she defended her

second PhD at the University of Michigan. Her main research area was acoustic

phonetics; in addition, she was engaged in other fields of linguistics: prosody,

language contact, Estonian, phonetics and phonology, Serbo-Croatian

accentology. After receiving her PhD, she spent four years at the Communication

Sciences Laboratory there as a research associate.

In 1963, Ilse Lehiste joined the linguistics faculty at The Ohio State

University (OSU), Columbus. She first spent two years in the Slavic

Department and was then elected the Linguistics Department’s first Chair

when it was founded in 1965. She enjoyed a long and especially distinguished

career at OSU: she became Professor of Linguistics in 1965 and, from 1987,

continued as Professor Emeritus. She gave exciting lectures at

universities and at conferences all over the world.

She was not only a linguist, but also a phonetician. She worked to build a bridge

between the linguists of Estonia and the West. That interest is exemplified by the

11th International Phonetics Conference, which was organized in Tallinn in 1987

at her suggestion. She was a Renaissance person: linguist, littérateur, poet,

musician, etc. Her poems were published in 1989 (Noorest peast kirjutatud

laulud). She analyzed Estonian literature and wrote several overviews for

World Literature Today in the United States. In the past decade, she

cooperated with the Institute of Estonian and General Linguistics of the

University of Tartu to investigate Finno-Ugric prosody.

Lehiste left behind an enormous body of work: she was author, co-author or

editor of twenty books, two hundred articles and around a hundred reviews. I

would like to highlight just one of her admirable books. She was engaged in

researching the production and perception of suprasegmental features, and her

general work on the subject, Suprasegmentals, was published in 1970. Lehiste summarized

what was known about the phonetic nature of suprasegmentals and evaluated the

available evidence from the point of view of linguistic theory.

Ilse Lehiste attended the Speech Research ’89 Conference in Budapest

(Hungary) more than 20 years ago, offering her help to the conference organizers.

It was a great experience for the Hungarian phoneticians to meet her personally.

The title of her talk was The experimental studies of poetic rhythm.

The importance of her scientific work was well recognized by a number of

professional bodies around the world. Lehiste received honorary doctorates

from Essex University, England (1977), the University of Lund, Sweden (1982),

Tartu University, Estonia (1989), and The Ohio State University (1999). She was a

Fellow of the American Academy of Arts and Sciences (1990), Foreign Member

of the Finnish Academy of Sciences (1998), and Foreign Member of the Estonian

Academy of Sciences (2008). Ilse Lehiste will be remembered both personally and

professionally.

Viola Váradi

Eötvös Loránd University

Phonetics Department

Budapest, Hungary

Svend Smith Award 2008 for Elisabeth Lhote

Elisabeth Lhote was born in Toul. After graduating

from high school, she studied French literature and

linguistics at the University of Lille and was

introduced to phonetics there. Motivated by her

growing interest in general, experimental and applied

phonetics, she moved to the Institute of Phonetics of

Strasbourg University where she joined the research

team around Georges Straka. Under his guidance,

Elisabeth Lhote specialized in voice production and

earned her doctorate in Phonetics in 1970 with a thesis

on "La méthode glottospectrographique et la

simulation de la parole" (Glottospectrography and the simulation of speech). She

continued her career as a researcher under the supervision of Péla Simon, who had

succeeded Georges Straka in the position of head of the Phonetics Department in

1971, and presented an excellent habilitation treatise in 1980 on "Analyse et

synthèse de faits de langue au niveau du larynx" (Analysis and synthesis of

laryngeal features).

In 1980, Elisabeth Lhote was appointed Professor of Phonetics and head of

the Phonetics Laboratory at the University of Franche-Comté in Besançon. In

1986, she became director of the Center of Applied Linguistics and head of the

Laboratory of Speech Analysis. In these positions she was able to substantially

develop and foster phonetics and applied linguistics at her university until her

retirement in 1997.

Elisabeth Lhote’s list of publications comprises 4 books and 65 articles. She

started publishing the results of her research activities in the late sixties. Her first

publications may be characterized as reports on detailed experimental

investigations of the activities of the vocal cords by glottography and

glottospectrography. Her findings shed new light on the acoustics of the glottal

source, provided new impulses to the theory of phonation and stimulated new

research initiatives in the domains of intonation and of tone in tone languages.

Later, her interests shifted to speech pathology and therapy, speech perception and

comprehension, speaker recognition and foreign language teaching.

As an academic teacher, Professor Lhote has supervised 18 doctoral and 2

habilitation theses. By her outstanding commitment, devotion and excellence as a

researcher and academic teacher, she has profoundly promoted the phonetic

sciences and applied linguistics in France, Europe and the world. ISPhS’s

membership is proud to confer the 2008 Svend Smith Award on her.

Jens-Peter Koester

[email protected]

References

Lhote, E. (1970). La méthode glottospectrographique et la simulation de la parole. Dr.

dissertation, Strasbourg.

Lhote, E. (1973). Contribution à l'étude de la fonction linguistique du larynx. Phonetica,

n° 28, p. 26-41.

Lhote, E. (1982). La parole et la voix. Hamburg (Buske).

Lhote, E. (Ed.) (1990). Le paysage sonore d'une langue, le français. Hamburg (Buske).

Lhote, E. (1995). Enseigner l'oral en interaction. Percevoir, écouter, comprendre. Paris

(Hachette).

PHONETICS INSTITUTES PRESENT THEMSELVES

THE DEPARTMENT OF LANGUAGE AND COMMUNICATION

STUDIES

NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY,

TRONDHEIM, NORWAY

The Department of Language and Communication Studies, or in Norwegian:

Institutt for språk- og kommunikasjonsstudier (ISK), is the only department in

Norway where it is possible to study Phonetics. Its research is both fundamental

and applied, and often cross-disciplinary.

The Dragvoll campus, which houses the Department of Language and

Communication Studies

Study programmes

The Department of Language and Communication Studies

<http://www.ntnu.edu/isk> offers a full BA/MA programme in Phonetics

<http://www.ntnu.edu/studies/bfon>. The programme covers all traditional areas

of phonetics (transcription, physiology and articulation, acoustics, and speech

perception) and focuses on experimental phonetics. All courses aim to combine

phonetic theory with practical exercises, usually in the studio or in the phonetic

lab. The Phonetics section is represented by two professors, Wim van Dommelen

and Jacques Koreman. In addition to Phonetics, the Department of Language and

Communication Studies offers full study programmes in General Linguistics and

Applied Linguistics, as well as subsidiary programmes in Swahili and Norwegian

as a Second Language. It is responsible for all Norwegian courses for exchange

students at the Norwegian University of Science and Technology (NTNU). This

varied environment, and collaboration with speech technologists at NTNU, opens

up possibilities for a wide range of research themes.

Research

The research in the Department of Language and Communication Studies covers

comparative language studies and foreign language acquisition, speech perception,

speaker recognition and speech technology.

In a long-standing collaboration with Norwegian as a Second Language,

Wim van Dommelen <http://www.hf.ntnu.no/hf/isk/Ansatte/wim.van.dommelen/

personInfo.html> has investigated the difficulties foreigners have in learning

Norwegian. His research covers both segmental and supra-segmental properties.

Tone and intonation have been (and remain) an area of interest, especially the realization

of Norwegian lexical tones, on which he collaborates closely with Linguistics.

As a spin-off result of his involvement in the Sound-to-Sense project

<http://www.sound2sense.eu/>, he is also involved in experiments on foreigners’

perception of English sounds in noise. This research is carried out in collaboration

with University College London, the University of the Basque Country (Bilbao)

and Radboud University in Nijmegen. The Sound-to-Sense project is a Marie-

Curie Research Training Network in which Ph.D. students and post-docs are

trained outside their native country. It also brought Helena Spilková to Trondheim.

Helena is carrying out her Ph.D. research on reductions in spontaneous

conversational speech and comparing productions of native English speakers with

productions of two groups of non-native speakers of English (Czech and

Norwegian speakers). This research involves detailed phonetic analysis as well as

evaluation of various context influences on the word realizations. In the same

project, there is a collaborative research effort with Radboud University in

Nijmegen on the systematic phonetic variation of word-final /t/ in Dutch, where

the influence of linguistic (e.g. morphological structure) and probabilistic factors

(word frequency) on the realization of canonical /t/’s is being investigated and

compared to the way an automatic speech recognition system deals with such

phonetic variation.

Recently, the department has started a collaborative project which brings

together theoretical expertise from phonetics with pedagogical experience from

the Norwegian teachers in the department to build a computer-assisted

pronunciation teaching system (CAPT). This system is based on VILLE

<http://www.speech.kth.se/ville>, which was developed by KTH in Stockholm,

which is also one of the partners in the project “Computer-Assisted Listening and

Speaking Tutor (CALST)” <http://www.ntnu.edu/isk/projects>. This project aims

to not only adapt the Swedish system to Norwegian, but also extend it so that users

can train with different dialects. The reason for this is that there is no accepted

pronunciation standard for Norwegian, so that foreigners must learn to deal with

different dialects in their communication with Norwegians to be able to

understand different speakers. Besides focusing on different target dialects, the

system is being developed for specific source languages (or native languages of

the users), so that learners of Norwegian can be guided through pronunciation

exercises that are relevant for their native language. This is done in detail for a few

major learner groups in Norway, but we also analyse a large number of languages

in less detail. In order to do this, an automatic contrastive analysis of the phoneme

inventory is made on the basis of UPSID (UCLA Phonological Segment Inventory

Database) <http://www.linguistics.ucla.edu/faciliti/sales/software.htm#upsid>.

The aim is to build a flexible, extendable interface for contrastive analysis

between any language pair that can be used in CAPT applications for any

language. Jacques Koreman <http://www.hf.ntnu.no/isk/koreman> is the project

manager. He is interested in speech technology, and has previously worked on

speech recognition with the use of phonetic features. He also coordinated research

on biometric user authentication in the SecurePhone Project

<http://www.secure-phone.info>, where he and his colleagues specifically worked

with speaker recognition and fusion (combination) of different modalities (voice,

face and signature). The biometric recognizer was also implemented on a

PDA/mobile phone. Besides speech technology, he is interested in the voice and

voice pathology. He has carried out research on the phonetic consequences of

unilateral vocal fold paralysis with a colleague at Saarland University, Germany,

where he worked before moving to Trondheim. He also investigated vocal fold

aerodynamics using a Rothenberg mask. He is now involved in other research

projects in collaboration with Saarland University (project leader) and with the

Technical University in Berlin. These projects investigate the production and

perception of prominent syllables in several languages, of which Norwegian is

one. The investigations so far show that languages use different prosodic

properties to signal that a syllable is prominent, which of course has implications

for second language acquisition and perception.
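
The automatic contrastive analysis of phoneme inventories described above can be pictured, in its simplest form, as a set comparison between the learner's native inventory and the target inventory. The following Python lines are only a rough sketch of that idea and not the CALST implementation: the function name and the two toy inventories are invented for illustration and are not taken from UPSID or from any real language description.

```python
def contrastive_analysis(native, target):
    """Compare two phoneme inventories given as collections of IPA symbols.

    Returns the target sounds that are new to the learner, the sounds both
    inventories share, and the native sounds the target does not use.
    """
    native, target = set(native), set(target)
    return {
        "new": sorted(target - native),      # likely candidates for exercises
        "shared": sorted(target & native),   # probably need little training
        "unused": sorted(native - target),   # native sounds absent from target
    }


# Toy inventories, invented for illustration only (NOT real UPSID data).
native_inventory = {"p", "t", "k", "m", "n", "s", "i", "a", "u"}
target_inventory = {"p", "t", "k", "m", "n", "s", "ʃ", "i", "y", "a", "u"}

result = contrastive_analysis(native_inventory, target_inventory)
print("Sounds new to the learner:", result["new"])
```

A comparison of whole symbols like this ignores phonetic similarity between sounds and any allophonic detail; a comparison based on phonetic features would be needed to rank the new sounds by expected difficulty.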

Equipment

The department has a high-quality recording studio. Besides audio recordings, it is

possible to record electroglottograms (Glottal Enterprises EG-2) as well as

aerodynamic signals (Rothenberg mask). In addition, a motion capture system is

being installed.

Recording of the airflow and microphone signals in the studio

Location

The Norwegian University of Science and Technology <http://www.ntnu.edu>

(NTNU) consists of two campuses <http://www.ntnu.edu/about-ntnu/campuses>.

The Gløshaugen campus is home to the engineering sciences, while Dragvoll hosts

the humanities and social sciences. Dragvoll is just outside Trondheim, and a bus

ride into the city centre takes 15 minutes. Most of the buildings are connected by

glass-roofed streets, with a bookshop, a café, small shops and a student cafeteria,

in addition to the university library, lecture halls and offices.

Walking the indoor streets of Dragvoll or enjoying the sunny spell we call winter

What else?

Students can use the university’s sports facilities, and there is ample opportunity

for hiking in the beautiful surroundings of Trondheim, which is situated next to a

fjord. During the long winters, you can go skiing on “lysløyper” (lighted ski trails)

in the Estenstadsmarka close to Dragvoll, or in the Bymarka. There are also ski

jumps, as well as alpine slopes, in the vicinity of Trondheim. There are many lakes

where you can go for a swim in summer or skate in winter. The city itself is the

third-largest city in Norway – but it is still small. It has a cozy atmosphere with its

wooden houses, and is at the same time alive with its large student population and

rich cultural life.

Jacques Koreman

e-mail: [email protected]

THE PHONETICS LAB AND THE PHONOGRAM ARCHIVES AT

ZURICH UNIVERSITY, SWITZERLAND

A language expert’s need for phonetic knowledge was probably one of the

main motivations for the English philology professor Eugen Dieth to found the

Phonetics Lab at the University of Zürich (UZH) in 1935 and to carry out

phonetics research using early versions of palatography and sound kymography

(Dieth, 1950). Apart from focusing on speech research activities, Dieth was also

involved in descriptive work on dialectal variability. For this reason, he desired to

maintain the ‘Phonogram Archives’, which were co-founded in 1909 at UZH by

Albert Bachmann and Louis Gauchat with the aim of collecting vernacular

language recordings in the four Swiss national languages (German, French, Italian

and Rhaeto-Romance). At present, the Phonetics Lab and the Phonogram

Archives form two inseparable institutions in the Faculty of Philosophy at

UZH that have actively been involved in phonetics and dialectology research and

teaching for the past decade.

UZH is the largest of the 10 Swiss universities in terms of number of

students and staff members. A need for knowledge in phonetics and speech

sciences in both research and education exists across a wide variety of disciplines

such as the philologies (German, English and Romance languages), psychology,

general linguistics and others. The Phonetics Lab/Phonogram Archives can be

viewed as a hybrid institute which serves research needs in a variety of

departments and offers students from a wide range of disciplines the facilities and

expertise to carry out projects in phonetics and speech sciences at Graduate,

Postgraduate and Doctoral level. We do not offer degree courses specifically in

phonetics, but it is part of the required program for most philology students

(English, German and Romance languages) to attend the phonetics lectures

provided by the Phonetics Lab. Students with a deeper interest in the subject then

take part in voluntary higher level phonetics courses and graduate in a related

discipline (at any level) with a focus on a phonetic topic. Supervision and

examination of such students are provided by staff members of the Phonetics Lab.

Our lab consists of a sound-proof booth with a supervisory window that is

well suited for high-quality speech recordings and speech perception experiments.

The booth has high-end recording equipment permanently installed, and we carry

out standard speech measurement and analysis techniques, like laryngography,

palatography and phonatory aerodynamic analysis. We also own a large variety of

portable recording devices and perceptual testing equipment for field work. In

addition, we have our own research library with the main journals in the area of

phonetics and speech sciences and a large number of monographs from all areas of

spoken language, phonetics, linguistics, acoustics, and speech and hearing

sciences. All of our facilities are easily accessible in the tower of the main UZH

building right in the heart of Zurich.

At present, our team consists of the following researchers, who are actively

involved in teaching and/or research in phonetics and speech archiving

(alphabetically by surname):

Camilla Bernardasci (Student Research Assistant)

Dario Brander (Post-graduate Research Assistant)

Volker Dellwo (PhD, Assistant Professor of Phonetics/Phonology)

Elvira Glaser (PhD, Professor of German Linguistics and member of

permanent leading board)

Lea Hagmann (Student Research Assistant)

Ingrid Hove (PhD, part-time Lecturer)

Marie-José Kolly (Research Assistant and PhD student)

Adrian Leemann (PhD, Post-Doc in Phonetics/Speaker Identification)

Michele Loporcaro (PhD, Professor of Romance Linguistics and Head-of-

Lab)

Mathias Müller (Student Research Assistant)

Stephan Schmid (PhD, PD, Senior Lecturer of Phonetics)

Daniel Schreier (PhD, Professor of English Linguistics and member of

permanent leading board)

Michael Schwarzenbach (lic. phil, Research Assistant)

Jürg Strässler (PhD, part-time Lecturer)

Dieter Studer (lic. phil, Research Assistant)

Sibylle Sutter (Post-graduate Research Assistant)

Our research interests range from historical sound development through synchronic

dialectology to speech production, acoustics and perception, and we work on

segmental, as well as suprasegmental/prosodic levels of analysis. Work is

currently being carried out on the distribution of rhythmic patterns across Italian

and Swiss German dialects (Stephan Schmid), and we are interested in which

functions rhythmic and timing variability may have in human speech

communication (Volker Dellwo, Lea Hagmann, Mathias Müller). In a number of

pilot studies, we found that there is significant rhythmic variability between

speakers. We are now interested in how this variability can be used in areas like

speaker identification (Volker Dellwo, Adrian Leemann, Marie-José Kolly, Stephan

Schmid). For this project we received major grant funding for three years from the

Swiss National Science Foundation (SNF). We are also interested in how this

variability may help listeners to segregate two speakers speaking simultaneously

(Volker Dellwo, Dario Brander, Sibylle Sutter; see Cushing & Dellwo, 2010). For

this project we received one-year start-up funding from the University of Zurich

Research Fund. Another significant area of expertise in the group is the dialectal distribution

of sound patterns and the diachronic phonological development in Italian dialects

(Michele Loporcaro & Stephan Schmid) and Swiss German (Elvira Glaser), as

well as socio-phonetic distribution of speech features across non-standard varieties

of English (Daniel Schreier). On a yearly basis, the Romance language oriented

members of the group organize fieldwork trips to various regions of the Italian

speaking world to systematically record a wide variety of Italian accents and

dialects. These recordings have led to research on the distribution and functions of

phonemic vowel quantity across different accents of Italian and to arguments

about the historical phonological development of Romance languages (Loporcaro,

2007). For research into the historical development and synchronic dialectal

variability of Swiss German (Fleischer & Schmid, 2006, Christen, Glaser &

Friedli, 2010), the Phonogram Archives offer an impressive collection of sound

carriers which have been collected and archived over the past 100 years. This

material contains valuable specimens of language varieties that have since become

extinct or near-extinct – such as the West Yiddish dialect spoken in Lengnau and

Endingen (Aargau) or the Franco-Provençal “Patois” formerly spoken all over

the Western (now French-speaking) part of Switzerland. It also contains early

recordings on wax disc (collaboratively recorded with the Phonogram Archives of

Vienna between 1909 and 1923), which are now part of the UNESCO Memory of

the World Programme (Fleischer & Gadmer, 2002). Major projects of the archives

(Dieter Studer, Michael Schwarzenbach, Lea Hagmann & Camilla Bernardasci)

are currently the compilation of an on-line catalogue, the production of a digital

version of the entire historic archives holdings (in collaboration with the Swiss

National Sound Archives in Lugano) and the presentation of a major exhibition on

Swiss dialects together with the Swiss National Library in Bern in 2012.

In teaching, we offer a variety of lectures, seminars and practical lab sessions

at an introductory and advanced level of phonetics. For students of philology, we

have specifically designed courses in German, English and Romance phonetics.

Additionally, we offer lab sessions in which higher level and postgraduate students

learn experimental techniques in speech production, acoustic measurements and

speech perception. In different lecture series, students are introduced to the main

concepts, as well as specialist areas of phonetics (e.g. speaker idiosyncratic

features or speech rhythmic variability). We have strong links to other departments

like Experimental Audiology or Psychology with whom we provide collaborative

PhD supervision. There are currently four PhD students in the lab, and the interest

is growing.

At present, both the Phonetics Lab and the Phonogram Archives are in a

highly dynamic situation of change. Both institutions are co-directed in different

ways by a board of professors from the philologies, Michele Loporcaro (Romance

Linguistics), Elvira Glaser (German Linguistics) and Daniel Schreier (English

Linguistics). While both institutions were rather separate entities during the past

decades, a proposal is currently being carried out to unite them in a single unit (on

a practical level, this process is nearly completed). In addition, the university

recently decided to invest in the area of spoken language sciences and

established a new Assistant Professorship in Phonetics/Phonology for which

Volker Dellwo (formerly University College London) was hired in August 2010.

With the merger of the Phonetics Lab and the Phonogram Archives, we are

expecting to strengthen phonetics and dialectology research and teaching at UZH

in the future. The group managed to attract grant funding in the past and at

present. More major and minor grant applications have been submitted over the

past months. We thus hope to further enlarge our research team and be able to

offer more funded PhD research in Phonetic Sciences at UZH in the near future.

Should we manage to convince UZH to make further investments in our lab (for

example, a full professorship in Phonetics), our aim would be to set up a degree

course in phonetics at the postgraduate level.

Further information on the Phonetics Lab, the Phonogram Archives and our

dynamic situation can be found at our (still separate) webpages

www.pholab.uzh.ch and www.phonogrammarchiv.uzh.ch.

References

Christen, H., Glaser, E. and Friedli, M. (2010) Kleiner Sprachatlas der deutschen

Schweiz. Huber: Frauenfeld.

Cushing, I.R., and Dellwo, V. (2010) The role of speech rhythm in attending to

one of two simultaneous speakers. In: Electronic Proceedings of Speech

Prosody, Chicago/USA (http://speechprosody2010.illinois.edu/papers/100039.pdf)

Dieth, E. (1950) Vademekum der Phonetik. Bern: Francke.

Fleischer, J. and Gadmer, T. (2002) Schweizer Aufnahmen–Enregistrements

Suisses–Ricordi sonori Svizzeri–Registraziuns Svizras. Sound Documents

from the Phonogrammarchiv of the Austrian Academy of Science. The

Complete Historical Collections 1899-1950, Series 6/1-6/3. Wien:

Österreichische Akademie der Wissenschaften, Zürich: Phonogrammarchiv

der Universität Zürich.

Fleischer, J. and Schmid, S. (2006) Zurich German. In: Journal of the

International Phonetic Association 36.2: 243-253

Loporcaro, M. (2007) Facts, theory and dogmas in historical linguistics: vowel

quantity from Latin to Romance. In: Salmons, J. C. and Dubenion-Smith, S.

(eds.), Historical Linguistics 2005. Selected papers from the 17th

International Conference on Historical Linguistics, Madison, Wisconsin, 31

July-5 August 2005. Amsterdam, Philadelphia: John Benjamins, 311-336.

Some staff members of the Phonetics Lab and the Phonogram Archives at Zurich

University in front of our recording cabin/sound lab (from left to right: Volker

Dellwo, Michael Schwarzenbach, Stephan Schmid, Ingrid Hove, Dieter Studer,

Camilla Bernardasci).

Volker Dellwo & Dieter Studer

e-mail: [email protected]

CONFERENCE REPORTS

Speech Prosody 2010

Chicago, USA, 11-14 May 2010

Speech Prosody is the biennial meeting of ISCA’s (the International Speech

Communication Association) Speech Prosody Special Interest Group (SProSIG).

In 2010, it was held in Chicago and was co-organized by various departments of

the University of Illinois at Urbana-Champaign, the Northwestern Institute on

Complex Systems and the Toyota Technological Institute. For five days (an

externally organized Satellite Workshop on the perceptual and automatic

identification of prosodic prominence took place on May 10th), more than 300

participants attended the 270 oral and poster presentations on aspects of prosody

which play a role in various disciplines besides Linguistics, such as Psychology,

Computer Science, Speech and Hearing Science, and Electrical Engineering.

The general theme of Speech Prosody 2010 was the great diversity, as well

as the universality, of prosody, which was also addressed in the Keynote lectures: the role of

prosody research in enriching speech engineering (Shrikanth Narayanan), prosodic

cues in first and second sign language acquisition (Diane Brentari), representations

of prosodic cues in computational models for language processing (Mari

Ostendorf), prosody from an evolutionary perspective (Steven Mithen) and from a

psycho- and neuro-linguistic perspective (Aniruddh Patel). Interestingly, the last

two Keynote lectures related language to music, adding another interdisciplinary

facet.

In addition to the Keynotes, three of the special sessions included in the

program can be regarded as highlights of this year’s conference. Their topics were

computer aided pronunciation training and prosody, experimental approaches to

focus, as well as shape, scaling, and alignment of F0 events. It has to be stated,

however, that the quality of the papers and posters was generally very high. In

particular, there were a large number of excellent student papers, which is a

promising sign for the workshops and conferences to come.

Stefan Baumann, Cologne

19th Annual Conference of the

International Association for Forensic Phonetics and Acoustics (IAFPA)

Trier, Germany, 18-21 July 2010

Seventeen years after the last IAFPA conference in Germany’s oldest city, the

Phonetics Department of the University of Trier hosted the 19th Annual

Conference of the International Association for Forensic Phonetics and Acoustics.

Prof. Dr. Angelika Braun and her team of organizers were pleased to break the

unprecedented 100 participant threshold and welcomed phoneticians and

acousticians from 14 countries. The main topics presented and discussed in 27

presentations and 10 posters were formants, whispered voice, speech databases,

automatic voice/speaker comparison, and language analysis for the determination

of origin (LADO).

The conference was opened by the President of the University of Trier, Prof.

Dr. Schwenkmezger, who commemorated 40 years of (forensic) phonetic

expertise at the university and at the same time gave assurances that degrees in phonetics

will continue to be awarded in the future. In her opening address, the Dean of the

Department of Languages, Literature and Media Science Prof. Dr. Hilaria

Gössmann referred to the large number of students at this university attending this

year’s conference, citing it as evidence of an active and interested student body

and of the spirit of cooperation in the phonetics department. Both Prof.

Schwenkmezger and Prof. Gössmann stressed the importance for Trier of being

host to this high-profile international conference and wished all the participants a

successful and enjoyable time.

The first session of the 2010 conference was chaired by Jens-Peter Köster,

the founder and long-time head of Trier’s phonetics department. It started with a

presentation by Francis Nolan, Kirsty McDougall and Toby Hudson entitled

Perceived voice similarity and acoustic measures following up on previous

research towards a model of voice similarity for linguistically homogeneous

voices. Their perception experiment showed that telephone recordings level out

the perceived difference between different speakers. Furthermore, the mixing of

studio and telephone recordings increases the perceived difference between

samples from the same speaker. In a second step Nolan et al. applied

multidimensional scaling (MDS, dim1-dim5) to the perceptual results of the

studio recordings and correlated them with acoustic parameters. Some correlation

was found between dim2 - F3, dim3 - F2 and dim4 - F1. The strongest correlation

however was found between dim1 and F0 indicating the importance of

fundamental frequency to naive listeners when judging voice similarity.

These results were supported by Mette Hjortshøj Sørensen in her paper on

Perception of voice similarity by different groups of listeners. Her experiment

included 3 groups of listeners (Danish L1, Danish L2 and no knowledge of

Danish) who listened to paired Danish voice samples with the task of judging

degrees of similarity or dissimilarity. Her preliminary findings suggest that most

listeners used fundamental frequency as the main cue for their decision making

although L1 listeners utilised linguistic cues as well. She also noted that regardless

of their linguistic background, listener performance varied significantly, thus

indicating that voice-discrimination ability varied among listeners. Both findings

are relevant for earwitness testimony evaluation. Her presentation received the

2010 IAFPA student paper award.

The first day of the conference ended with a session in which two papers

shifted the focus from forensic speech evidence proper to the meta-level of

evidence presentation. Allen Hirson in his talk Electronic presentation of evidence

in Forensic Phonetics: A critical appraisal argued that electronic presentation of

evidence promotes effectiveness and efficiency in court. The analysis and

decision-making process of the expert becomes more comprehensible when

explained with the help of digital presentations or interactive visualizations. Jonas

Lindh, Anders Eriksson and Gustaf Nelhans concerned themselves with the

phrasing of conclusions, questioning the claim made by some scientists that the

Bayesian framework actually constitutes a paradigm shift as compared to traditional

verbal scales.

The tell-tale dialect: Analysis of dialectal variation of German native

speakers in telephone conversations by Karen Masthoff, Yasmin Hadj Boubaker

and Olaf Köster showed that when they are given the task of dialect identification

on telephone voice samples, experts’ performance does not correlate with time

spent, number and type of methods applied or perceived degree of difficulty.

Individual skill and experience appear to be the dominant factors for dialect

identification performance.

Anna Czajkowskis’ contribution, Vocal tract Resonances in Voiced and

Whispered Speech and Listeners’ Perception of Voice Depth and Pitch, compared

mean F1 and F2 LPC values of voiced and whispered recordings. F1 was higher in

whispered speech for all vowels and all speakers. The same proved to be the case

for F2 except with /i/ and /u/. She also presented findings from an experiment on

listeners’ perception of a deep voice, concluding that untrained listeners may

associate low mid-points of F1/F2 vowel spaces with ’deep’ voice even if F0

values do not indicate a low voice.

Probably the most anticipated talk of the conference was Tina Cambier-

Langeveld’s presentation on Performance of native speakers and linguists in

LADO cases with true origin established. She presented results based on actual

LADO cases in which the speakers’ true origins could be confirmed beyond

reasonable doubt after the forensic speech analysis had been done. The

combination of trained native speakers and supervising linguists turned out to

perform very well with 120/124 cases (primary aim: verification of claimed

origin) and 65/69 cases (secondary aim: identification of real origin) correctly

established. Counter-expert reports by specialized linguists only on some of the

same cases did not show this level of accuracy: 1/8 correct for primary aim but

incorrect for secondary aim, 5/8 incorrect for primary aim and 2/8 inconclusive.

She concluded that both trained native speakers and linguists can contribute to

LADO and that a priori exclusion of trained native speakers is unfounded.
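The counts reported above can be turned into percentages for easier comparison. The following is only a small arithmetic sketch; the helper name `percent` is introduced here for illustration and is not part of the presentation.

```python
def percent(count, total):
    """Return count/total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Trained native speakers working with supervising linguists
primary_aim = percent(120, 124)    # verification of claimed origin
secondary_aim = percent(65, 69)    # identification of real origin

# Counter-expert reports on eight of the same cases
counter_correct_primary = percent(1, 8)
counter_incorrect_primary = percent(5, 8)
counter_inconclusive = percent(2, 8)

print(primary_aim, secondary_aim)
print(counter_correct_primary, counter_incorrect_primary, counter_inconclusive)
```

In other words, roughly 97% and 94% of the cases were correctly established for the two aims, against one correct primary-aim result in eight for the counter-expert reports.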

The session on automatic speaker and voice comparison opened with

Automatic Forensic Voice Comparison: Experiments on Real Case Data from the

BKA by Timo Becker et al. They presented findings based on experiments with

their own SPES system using real case material, confirming that transmission

channel and speaking style mismatch as well as short recording durations reduce

system performance. As a result, the use of global EER measures for automatic

voice comparison systems was discouraged. In fact, system evaluation requires

suitable data, matching the conditions of the case recordings in question, in order

to provide meaningful EERs.
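To illustrate why an EER depends on the evaluation data, the following minimal sketch computes an equal error rate from two lists of comparison scores. It assumes a generic score-based system in which higher scores mean "more likely the same speaker". The function name and the scores are hypothetical, and nothing here reflects the SPES system itself.

```python
def equal_error_rate(same_scores, diff_scores):
    """Return (eer, threshold) at the point where the false-acceptance and
    false-rejection rates are closest to each other."""
    thresholds = sorted(set(same_scores) | set(diff_scores))
    best = None
    for t in thresholds:
        # False rejection: a same-speaker trial scores below the threshold.
        frr = sum(s < t for s in same_scores) / len(same_scores)
        # False acceptance: a different-speaker trial scores at or above it.
        far = sum(s >= t for s in diff_scores) / len(diff_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Hypothetical scores, for illustration only.
same = [0.2, 0.6, 0.8, 0.9]   # same-speaker comparisons
diff = [0.1, 0.3, 0.4, 0.7]   # different-speaker comparisons
eer, threshold = equal_error_rate(same, diff)
print(eer, threshold)
```

Because the result is driven entirely by the score distributions supplied, an EER obtained on studio-quality test data says little about performance on short, channel-mismatched case recordings.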

Hermann Künzel presented Automatic Speaker Identification with

Multilingual Speech Material in which he tested Batvox 3.1 for three channel

conditions (studio, landline, GSM) and language mismatch conditions (GER-RUS,

GER-POL, GER-ENG, GER-SPAN, GER-SPAN CATL). He confirmed that

system performance generally decreases with reduction of channel quality

(studio>landline>GSM). His language mismatch settings however seemed to have

no or very little effect on the system’s EERs, leading him to the conclusion that

language mismatch, at least for non-tone languages, can be ignored when using

Batvox or similar systems for automatic speaker identification.

In his presentation Empirically Assessing the Validity and Reliability of

Forensic-Comparison Systems Geoffrey Morrison explained and supported the use

of log-likelihood-ratio cost (Cllr) as an appropriate measure of accuracy for

automatic speaker recognition systems used in forensic voice-comparison.
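For readers unfamiliar with the measure, the sketch below computes Cllr following its usual definition in the speaker recognition literature: the mean of log2(1 + 1/LR) over same-speaker comparisons and the mean of log2(1 + LR) over different-speaker comparisons, averaged together. The function name and the example likelihood ratios are assumptions made for illustration and are not taken from the presentation.

```python
import math

def cllr(same_speaker_lrs, diff_speaker_lrs):
    """Log-likelihood-ratio cost for two lists of likelihood ratios (LRs).

    Same-speaker comparisons are penalised for small LRs,
    different-speaker comparisons for large LRs.
    """
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    same_cost /= len(same_speaker_lrs)
    diff_cost = sum(math.log2(1 + lr) for lr in diff_speaker_lrs)
    diff_cost /= len(diff_speaker_lrs)
    return 0.5 * (same_cost + diff_cost)

# A system that always answers LR = 1 (no information) scores exactly 1.
print(cllr([1.0], [1.0]))

# Hypothetical LRs pointing strongly in the right direction score near 0.
print(cllr([100.0, 1000.0], [0.01, 0.001]))
```

Unlike a pure error rate, the measure also penalises LRs that point in the wrong direction in proportion to their size, so a system that is confidently wrong can score well above 1.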

The well-received poster sessions featured, among others, three contributions

concerning speech databases: A Swedish Dialect Database by Jonas Lindh, an

Alcohol Language Corpus by Florian Schiel et al. and a Database of Chinese

Female Voice Recordings by Cuiling Zhang and Geoffrey Morrison.

The conference ended on Wednesday afternoon with the announcement that

the next IAFPA annual conference in 2011 will be hosted by the Austrian

Academy of Sciences in Vienna, Austria.

Peter Knopp, Trier

New Sounds 2010

Sixth International Symposium on the Acquisition of Second Language

Speech

Poznań, Poland, 1-3 May, 2010

The sixth New Sounds meeting took place at Adam Mickiewicz University in

Poznań. As the name (and subtitle) suggest, “New Sounds” aims to describe and

investigate the acquisition of second language speech, i.e. the

phonetic/phonological aspects of second language acquisition. The idea of a

“New Sounds” conference was originally developed by Allan James and Jonathan

Leather who organized the first meeting in Amsterdam in 1990, as well as the

following three meetings in 1992 (Amsterdam), 1997 (Klagenfurt) and 2000

(Amsterdam once again). New Sounds returned in 2007, taking place in

Florianópolis, Brazil (organized by Barbara Baptista, Michael Watkins and

Andréia Rauber).

For 2010, the responsibility for setting up the conference was taken over by

Katarzyna Dziubalska-Kołaczyk, Magdalena Wrembel and Małgorzata Kul. With

180 participants, the Poznań conference (see also http://ifa.amu.edu.pl/newsounds/

introduction) can safely be said to be the largest and most successful one yet.

The New Sounds conferences have always stood out thanks to being very

well-organized and providing an especially friendly and relaxed atmosphere,

which allows for fruitful and extensive discussions both during and outside of the

actual presentation sessions. The Poznań conference did not break with this

tradition. On the contrary, the excellent food supply for the lunches, the very

pleasant conference reception and a cultural program, including a guided tour of

the old city and an exhilarating choir performance, can be described as

exceptional.

Each of the three conference days was introduced by a keynote speech that

provided an overview of a core area of phonetic/phonological SLA studies while

presenting new insights into its theoretical underpinnings. Conference co-founder

Allan James opened the meeting with a talk entitled “Sounds new? Extending the

explanatory remit of second language phonology: identifications, multivalent

sound categories and a use take on acquisition” in which he argued that recent,

different sociolinguistically influenced conceptions of language, which involve

‘unordered scenarios’ of selective learning, partial competence and performance

without competence, should also be reflected in the acquisition process and thus

the phonetic and phonological paradigms used to describe it.

For the second keynote speech, it was especially fortunate that the organizers

were successful in coaxing a relaxed, serene and helpful (many of the younger

researchers at the conference benefitted from his advice and encouragement) Jim

Flege out of retirement in Italy. Now an “immigrant” and late L2 learner himself,

Flege spoke about his latest insights into an area to which he has already

contributed a great deal, namely “Age effects on second language acquisition”. He

concentrated especially on the factor of age of arrival (AOA), what underlying

variables (neural maturation, cognitive changes across the life span, change in the

way L1 and L2 systems interact, and difference in L2 input) may be correlated

with it, and how this co-variation among multiple variables might be controlled.

Finally, Martha Young-Scholten started the last day of the conference by

introducing her most recent ideas and undertakings in the study of “Development

in L2 phonology”. She convincingly argued that in order to effectively compare

the different stages of phonological development in native and non-native learners,

there is a need for longitudinal studies that involve naturalistic L2 learners, i.e.,

learners under conditions comparable to those applying to younger L1 learners

(who do, of course, receive regular and plentiful input from the native speakers of

their target language, but have no or very limited exposure to written text).

She presented data from three learners of L2 German, analyzing their

progress in terms of the successive re-ranking of OT constraints.
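The notion of re-ranking can be made concrete with a minimal sketch of Optimality Theory evaluation: candidates are compared on their violations of the highest-ranked constraint first, and lower-ranked constraints only break ties. The constraint names, the input form and the violation counts below are hypothetical textbook-style illustrations (final devoicing in German) and are not data from the talk.

```python
def optimal_candidate(candidates, ranking):
    """Return the candidate whose violation profile is best under `ranking`.

    candidates: dict mapping each output form to a dict of
                {constraint name: number of violations}
    ranking:    list of constraint names, highest-ranked first
    """
    def profile(form):
        # Tuples compare position by position, which mirrors strict domination.
        return tuple(candidates[form].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

# Hypothetical candidates for an input /rad/ (German "Rad").
candidates = {
    "rad": {"*VoicedCoda": 1},    # keeps the voiced final obstruent
    "rat": {"Ident(voice)": 1},   # devoices it, changing the input
}

# Assumed early learner grammar: faithfulness outranks markedness.
early = ["Ident(voice)", "*VoicedCoda"]
# Assumed later grammar after re-ranking: markedness outranks faithfulness.
later = ["*VoicedCoda", "Ident(voice)"]

print(optimal_candidate(candidates, early))
print(optimal_candidate(candidates, later))
```

The same candidate set yields a different winner once the two constraints swap places, which is the sense in which a learner's development can be described as a sequence of re-rankings.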

The fact that the Poznań meeting has been the biggest New Sounds

conference to date can certainly be interpreted to mean that the study of the

acquisition of second language speech phonetics/phonology is a growing area.

This is also reflected by the increasing variety within the field.

In order to provide an impression of the multitude of different subjects

addressed during the conference, a classification of major blocks of topics seems

useful, even though it is of course subjective (and deviates slightly from the

categories the organizers had proposed before the conference). Similarly, the

following overview of papers given at the conference is just as subjective and

guided by what the author of this report witnessed himself and/or perceived as

interesting.

Segmental production of second language speech

“Production of English interdental fricatives by Dutch, German, and English

speakers” by Adriana Hanulikova and Andrea Weber examined the substitution of

/θ/ by other sounds. German learners tend towards /s/, while the majority of Dutch

learners prefer /t/. Besides the distribution of the substitutions, the study also

aimed to compare these productions with actually intended /t, s/ productions and

acoustically analyzed those instances of /θ/ where the speakers succeeded.

In his study on “Voiced obstruents in L2 French: the case of Swiss German

learners” Stephan Schmid showed that speakers of Swiss German, depending on

phonotactic context, frequently did not reproduce voicing in obstruents when

speaking French, realizing contrasts instead by means of longer/shorter durations.

Thorsten Piske (co-authors James Flege, Ian MacKay and Diane Meador)

gave a presentation “Investigating native and non-native vowels produced in

conversational speech” arguing that true mastery of L2 vowels should be

determined with respect to this more realistic and more challenging criterion.

An instrumental approach measuring “Language-specific articulatory

settings in L2 speech” and comparing them to native speaker settings was

demonstrated in a paper by Sonja Schäffler, Ineke Mennen and James Scobbie.

Rob Drummond combined L2 research with sociolinguistic aspects in his

study of native Polish speakers in Manchester adopting local features, i.e.,

northern high, rounded pronunciation of the STRUT vowel vs. more widespread

features like t-glottaling (“Speaking like the locals - the acquisition of local accent

features by native Polish speakers living in Manchester”).

L2 speech perception

Silke Hamann, Paul Boersma and Małgorzata Ćavar examined whether closely

related languages show a similar use of perceptual cues to identify phonological

categories, thus facilitating L2 learning (“Language-specific differences in the

weighting of perceptual cues for labiodentals”). They investigated such perceptual

cues as duration, amplitude of friction noise and percentage of voicing, for the

Dutch labiodentals /f, v, ʋ/ and how they would be perceived by native speakers of

German, English, Croatian and Polish. Preliminary results indicated that the

number of labiodental categories in these second languages was more influential

than being a member of the same language family.

Joan C. Mora, James L. Keidel and James Flege argued that the perception

of the contrasts between the mid vowels /e/ - /ε/ and /o/ - /ɔ/ was difficult even for

Spanish-Catalan bilinguals because of a smaller degree of categoriality. A higher

percentage of language use/experience was the most important factor for success

(“Why are Catalan contrasts between /e/ - /ε/ and /o/ - /ɔ/ so difficult for even

early Spanish-Catalan bilinguals to perceive?”).

In their study of “The impact of visual cues and lexical knowledge on the

perception of a non-native consonant contrast for Colombian adults” Michele

Thompson and Valerie Hazan showed not only that both of the mentioned

parameters were indeed used to support the identification of contrasts (e.g., /b/ vs.

/v/), but also that there seemed to be a culture-specific bias with respect to the use

of visual cues, as the Colombian speakers relied much more on them than Korean

or mainland Spanish speakers did in earlier studies.

Several studies, of course, combined production and perceptual data from

L2 speakers, e.g. “Speech production and perception findings for native German

speakers learning English as a second language” by Bruce L. Smith and Rachel

Hayes-Harb or “Individual variation in the production and perception of SL

phonemes: French speakers learning /i - ɪ/” by Georgina Oliver and Paul Iverson,

who showed in their experiment that L2 vowel production was not highly linked to

L2 vowel perception. They interpreted this result as indicating that learning an L2

category did not rely on just a single underlying ability or representation.

There were also a number of studies that examined perceptual abilities

employing neurolinguistic methods. Nuria Kaufmann, Martin Meyer and Stephan

Schmid, for example, performed an EEG experiment using mismatch negativity

paradigms to investigate contrasts between Serbian affricates as perceived by

native speakers of Swiss German and of Rhaeto-Romance (“Phonetic contrasts in

foreign language perception: A neuropsychological study on Serbian affricates”).

Cheryl Frenck-Mestre and colleagues also used event-related potentials to

investigate the perception of contrasts between the American English vowels /ε/,

/æ/ and /ɪ/ by native speakers of American English, of French and by late French-

English bilinguals (“ERP evidence of the acquisition of non-native contrasts in

late learners”).

Prosody

The number of studies dealing with prosodic features has increased in recent years

and the field was also well-represented at New Sounds 2010. Ineke Mennen, Aoju

Chen and Fredrik Karlsson’s paper “Characterising the internal structure of learner

intonation and its development over time” examined the internal organization and

longitudinal development of L2 learner intonation. Their approach thus did not

look at individual aspects of intonation, but aimed to describe each learner

intonation variety in its entirety. Results suggested that apart from language-

specific transfer phenomena, learners started out with a set of basic elements to

build a simple, but efficient intonation system.

“Categorizing Mandarin tones into prosodic categories: the role of phonetic

properties” by Connie K. So and Catherine T. Best described how L2 learners

perceived foreign tones according to the pitch patterns of the intonational

categories in their native prosodic systems. Speakers of non-tone languages (e.g.

English or French) therefore assimilated Mandarin tones into the corresponding

categories (e.g., Mandarin tone 3 (fall-rise) may be interpreted as expressing

uncertainty).

The realization of different types of focus (narrow, broad, contrastive) as a

source of foreign accent was discussed in Mary O’Brien and Ulrike Gut’s paper

“Phonological and phonetic realisation of different types of focus in L2 speech.”

Johannes Schliesser’s poster on “Prosodic encoding of focus and sentence

mode in L2” also considered the realization of focus in L2 speech and especially

considered Gussenhoven’s biological codes as an explanation for patterns that

transfer from the L1 cannot easily account for.

Foreign accent detection/identification

Steven Weinberger and Stephen Kunath introduced “A computational model for

accent identification”, the Speech Transcription Analysis Tool (STAT), which

used segment and syllable structure generalizations, such as vowel shortening,

final obstruent devoicing, palatalization, interdental fricative substitution, vowel

epenthesis or consonant deletion to derive a specific set of phonological speech

patterns that are characteristic of a particular foreign accent.

Sylwia Scheuer’s presentation “How sure are judges about their foreign

accent judgments?” on the other hand, dealt with human quality judgments of

foreign accent. Scheuer confirmed that judges are consistent in their ratings and on

that basis attempted to identify those phonetic features (in this case of L2 English)

that promise to provide the greatest reliability.

Teaching

The studies just described do of course have a close connection to the applied

aspects of the study of second language speech, i.e. pronunciation teaching. New

Sounds also offered various papers dealing with particular phonetic phenomena

that trigger the impression of foreign accent.

Walcir Cardoso, co-host of New Sounds 2013 in Montréal, looked at the

production of foreign /s/ + consonant clusters by learners, e.g. speakers of

Brazilian Portuguese, who were not familiar with them (“Teaching foreign sC

onset clusters: Comparing the effect of three types of instruction”). He tested the

success of three different forms of instruction (and the underlying philosophy)

finding that the Projection Model of Markedness showed the largest instructional

effect.

Wiktor Gonet, Jolanta Szpyra-Kozłowska and Radosław Święciński

investigated why the velar nasal /ŋ/ is especially difficult for Polish learners of

English to acquire when it is not followed by a velar plosive (“Acquiring angma –

the velar nasal in advanced learners’ English”), while Esther Gómez Lacabex and

María Luisa García Lecumberri demonstrated success in instructing native

speakers of Spanish to produce correct instances of vowel reduction in English

(“Investigating training effects in the production of English weak forms by

Spanish learners”).

Factors influencing second language performance

The study of the various individual parameters that play a role in a learner’s

overall competence has always been one of the major subjects in second language

speech research. New Sounds again included many interesting papers devoted to

particular aspects of the individual and demonstrated their relevance. Various

areas were covered, ranging from cognitive psychology, e.g. “Phonological short-

term memory and L2 speech learning in adulthood” by Cristina Aliaga-Garcia,

Joan C. Mora and Eva Cerviño-Povedano, to “classic” factors like age, albeit from

the unusual perspective of very young learners as in Henning Wode’s talk on “L2

phonological acquisition by young learners: Evidence from production”, to other,

somewhat external, linguistic aspects, as in Yasaman Rafat’s paper on “Orthography

as a conditioning factor in L2 transfer: evidence from English speakers’

production of Spanish consonants.”

Several presentations also attempted to investigate the possible interactions

between different phonetic abilities and the many known relevant psychological

and neurological factors, as well as those describing the external circumstances of

acquisition in order to isolate the significance of a particular parameter. This is the

case in the study “Investigating the concept of talent in phonetic performance” by

Matthias Jilka, Natalie Lewandowski and Giuseppina Rota and a connected

investigation of the phenomenon of phonetic convergence as an indicator of talent

(“Is dynamic phonetic adaptation in dialog related to talent?” by Lewandowski,

Jilka and Grzegorz Dogil). Yoon Hyun Kim and Valerie Hazan’s study on

“Individual variability in perceptual learning of L2 speech sounds and its cognitive

correlates” also followed a similar methodology (use of a test battery covering

various cognitive abilities) in order to investigate individual variability in

discriminating non-native phonetic contrasts.

Models and theories of the acquisition of second language speech

Another important aspect of second language acquisition research was provided by

studies that explicitly attempt to contribute to the (further) development and

explanatory/predictive power of models of sound acquisition and/or

representation.

Ocke-Schwen Bohn and Catherine T. Best attempted to account for native

German listeners’ abilities to perceive the contrasts between the American

English approximants /r/, /l/, /w/ and /j/ in terms of Flege’s Speech Learning

Model and Best’s own Perceptual Assimilation Model.

John Archibald argued for the existence of an L1 phonological filter that can

be overcome by especially robust cues, explaining why certain articulations,

although equally unfamiliar to learners, are acquired more easily than others

(“Conditions for overriding the L1 phonological filter”).

Finally, conference host Katarzyna Dziubalska-Kołaczyk and co-author

Daria Zielińska presented an approach predicting preferred and dispreferred

consonant clusters based on the recognition of phonotactic and morphonotactic

(sound clusters across morphological boundaries) structures. Phonotactic

preferences were based on the notion of markedness, which in turn was defined by

the perceptual distance between segments (as measured according to Dziubalska-

Kołaczyk’s own Net Auditory Distance Principle). Morphonotactic clusters

behaved differently as they contained morphological information and markedness

was used to signal their function.

As indicated earlier, this can only be a subjective, somewhat

impressionistic summary of the many interesting presentations given at New

Sounds 2010. Full Proceedings can be found at

http://ifa.amu.edu.pl/newsounds/Proceedings_guidelines.

The conference organizers intend to publish two books with more elaborate

versions of many of the presented papers early next year.

The next New Sounds conference will take place in 2013 at Concordia

University in Montréal, Canada!

Matthias Jilka, Stuttgart

BOOK REVIEWS

Steve Parker ed. (2009) Phonological Argumentation. Essays on Evidence and

Motivation. London/Oakville: Equinox (377 pp. ISBN 978-1-84553-221-5)

Reviewed by: Péter Siptár

Eötvös Loránd University, Budapest, Hungary

The Equinox series Advances in Optimality Theory (series editors: Ellen Woolford

and Armin Mester) was launched in 2007 with John J. McCarthy’s monograph

Hidden Generalizations: Phonological Opacity in Optimality Theory. The present

volume is the fifth in the series and is a Festschrift for McCarthy, written by his

former students, all of them alumni of the graduate school of the University of

Massachusetts at Amherst (except Joe Pater who is McCarthy’s colleague, a

professor in the Department of Linguistics there). The book has a Foreword by

Elisabeth Selkirk and the editor’s Introduction includes excerpts from some of the

authors’ personal comments on John McCarthy.

The eleven chapters of the collection all discuss the process of phonological

argumentation, the way the validity (or otherwise) of particular phonological

analyses can (or must) be demonstrated within the framework of Optimality

Theory (and in general). The chapters are divided into two main sections: the first

six chapters discuss the evidence for, and the methodology used in, discovering

the bases of phonological theory (i.e., how constraints are formed and what sort of

evidence is relevant in positing them); the last five chapters present case studies

that focus on particular theoretical issues within OT through various phenomena

in one or several languages, arguing in favour of or against specific formal

analyses.

Andries W. Coetzee’s “Grammar is both categorical and gradient” (pp. 9–42)

motivates the claim in its title by presenting the results of psycholinguistic

experiments involving speakers of English and Hebrew. In particular, the author

shows that the subjects’ mental grammars are capable of making both categorical

and gradient judgements about the well-formedness of hypothetical word-like

forms. He also proposes a new type of comparative OT tableau to model both

types of decision-making behaviour, pointing out that traditional grammars are

unable to handle them. Standard derivational models of generative grammar can

easily account for the categorical distinction between grammatical and

ungrammatical forms but have some difficulty with gradient well-formedness

distinctions. On the other hand, models in which the bifurcation of grammatical

and ungrammatical forms does not exist, that is, where an ungrammatical form is

taken to be simply a form with extremely low probability of occurrence, are also

challenged by the experimental results. The author argues that the inherent

comparative character of OT grammars enables that theory to model both kinds of

behaviours in a straightforward manner.

Paul de Lacy’s contribution on “Phonological evidence” (pp. 43–77)

examines the innatist theory of generative grammar’s phonological component and

related modules, asking what such a framework identifies as empirical evidence

that supports it. The chapter also refers to predicted ambiguities where two or

more modules influence the same phenomenon. Specifically, the author discusses

phenomena like alternations, phonotactics, phonetic neutralization, free variation,

diachronic change, loanword adaptation, language games, language acquisition

data, and typological frequency, and concludes that the theory – or at least its

phonological component – does not claim responsibility for many of these

phenomena. Based on his earlier work on markedness, he proposes methods to

help separate valid from spurious evidence.

Elliott Moreton’s “Underphonologization and modularity bias” (pp. 79–101)

proposes a stochastic learning algorithm to capture the relative frequency of

phonologization effects, showing that the model derives the correct results in a

simulation of typological patterns involving tones interacting with other tones. The

author concludes that the hypothesis pairing “hard typology” (what grammars are

cognitively possible) with Universal Grammar and “soft typology” (how frequent

they are) with other factors affecting language change is probably too strong.

“Cognition and phonetics interact to determine typology in ways more

complicated (and interesting) than has been generally acknowledged. Further

progress will require a better quantitative understanding of the typology of

phonetic precursors, and of the differential receptiveness of learners to different

patterns” (p. 100).

Máire Ní Chiosáin and Jaye Padgett’s “Contrast, comparison sets, and the

perceptual space” (pp. 103–121) uses a systemic approach couched in Flemming’s

Dispersion Theory to argue for a principled restriction of the perceptual space of

comparison sets which resolves the problem of infinite candidate generation. The

discussion focuses on secondary palatalization contrasts in onset versus coda

position, using perceptual data from Irish.

Joe Pater’s “Morpheme-specific phonology: Constraint indexation and

inconsistency resolution” (pp. 123–154) argues that exceptions and other instances of

morpheme-specific phonology are best analysed in OT in terms of lexically

indexed markedness and faithfulness constraints (as opposed to lexically specified

rankings, i.e., cophonologies). This approach can capture locality restrictions,

distinctions between exceptional and truly impossible patterns, distinctions

between blocking and triggering, and distinctions between variation and

exceptionality. The chapter discusses data from Assamese, Finnish, and Yine

(formerly known as Piro) and provides a learnability account of the genesis of

lexically indexed constraints.

Jennifer L. Smith’s “Source similarity in loanword adaptation:

Correspondence Theory and the posited source-language representation” (pp. 155–


177) assumes a correspondence relation between loanwords and their “pLs

representations”, i.e., the borrower’s posited representation of the source-language

form, allowing for a consistent account of the interaction between phonological

adaptation processes and factors such as perception and orthography. The author

provides empirical support from Japanese, Finnish, Hmong, and Sranan,

predicting multiple phonological adaptation strategies for loanwords.

Part Two of the volume includes five case studies. John Alderete’s

“Exploring recursivity, stringency, and gradience in the Pama-Nyungan stress

continuum” (pp. 181–202) reviews contemporary approaches to the morphological

influences on stress in Diyari, Dyirbal, Warlpiri, and other Pama-Nyungan

languages. The author develops nine different theories to account for the attested

variation; they differ in the constraints responsible for edge effects in stress and the

alignment of morphological and prosodic structure. Analysing the factorial

typology of each theory, the author arrives at three conclusions. First,

stringency (special-general) relations between morpho-prosodic alignment

constraints are necessary because theories that ignore them either fail to describe

all relevant data or predict the existence of implausible (and unattested) stress

patterns. Second, some gradiently evaluated constraints have to stay even though

some others can (and must) be dispensed with. And third, McCarthy and Prince’s

recursive prosodic word analysis can be given both theoretical and empirical

support.

Maria Gouskova and Nancy Hall’s “Acoustics of epenthetic vowels in

Lebanese Arabic” (pp. 203–225) examines Lebanese epenthetic vowels through

acoustic experiments and shows that such vowels have phonetic traces that can

help learners distinguish them from underlying vowels. Although epenthetic and

lexical vowels are often transcribed as identical, they turn out to be acoustically

distinct: epenthetic vowels are either shorter or backer or both. The authors

propose a learning strategy based on McCarthy’s theory of Candidate Chains that

provides a way to model this incomplete neutralization and its opaque interaction

with stress assignment. In particular, they suggest that phonetic implementation

optionally accesses an intermediate level of phonological derivation, that is, a

stage that is closer to the underlying representation than the (fully neutralized)

surface phonological form of the given item.

Junko Ito and Armin Mester’s “The onset of the prosodic word” (pp. 227–

260) is my personal favourite in the whole volume. In one of the pioneering works

of OT, McCarthy offered a comprehensive analysis of r-insertion in non-rhotic

English dialects, suggesting that the constraint driving the process was not an

onset-related one but rather a constraint requiring prosodic words to end in a

consonant. This paper shows that this counter-intuitive ‘anti-wellformedness’

constraint can be done away with on the basis of an enriched view of prosodic

constituent structure involving functional morphemes and the onset properties of

the maximal prosodic word. “Empirically, our analysis not only accounts for the

complex distribution of the linking r-consonant in RP and the Eastern


Massachusetts dialect, but also extends straightforwardly to the different

distributions in other dialects. While preserving the central insights of

[McCarthy’s paper], which remains not just a classic but also a model of

optimality-theoretic analysis, the present proposal is theoretically grounded in

correspondence theory (positional faithfulness), and is a natural outgrowth of a

conception of prosodic structure that views function words as occupying positions

within extended word structures (maximal prosodic words)” (pp. 256–7).

Anna Łubowicz’s “Infixation as morpheme absorption” (pp. 261–284)

presents evidence that infixes in Palauan and Akkadian are subject to feature

cooccurrence restrictions (OCP) on the root domain, whereas segmentally

identical prefixes are not. In order to account for this asymmetry, the author

proposes that infixes are structurally incorporated into the root morpheme in the

output through a process called morpheme absorption.

Finally, Sam Rosenthall’s “Vowel length in Arabic verb stems” (pp. 285–

307) relies on a foundational insight of OT, the interaction between ranked and

violable constraints, in analysing the intricate morphophonemics of Arabic verb

roots containing a glide as one of their radicals. Vowel coalescence and

compensatory lengthening are both seen to arise from the same subhierarchy of

constraints, but only if verb roots are crucially triliteral underlyingly. The chapter

also argues for a prosodic analysis of verb stems, in accordance with McCarthy

and Prince’s Prosodic Morphology Hypothesis.

The back matter includes a cumulative list of References (pp. 308–347), as

well as an author index, an index of constraints, an index of languages, and a

subject index.

All in all, this is an important book and, although by no means easy

bedside reading, it is thoroughly enjoyable even for readers whose acquaintance

with the current OT scene is somewhat superficial. It is a pity that the volume is

riddled with a substantial number of typos of various sorts, from simple

misalignments (as on p. 275 (26) or p. 294 (14)) through cases like “it is difficult

how to see how” (p. 138), “a language that that neutralizes contrasts” (p. 149), “it

less likely to affect” (p. 210), “the fact that that the optimal stem has a long vowel”

(p. 298), “as well in as clusters” (p. 154, fn. 3), “such as constraint” (for such a

constraint, p. 258, fn. 8), to truly embarrassing instances like “case ending suffix”

for infinitive suffix (p. 283 fn. 20), “obstruent-sonorant clusters” for sonorant-

obstruent clusters (p. 205 (4)), and even transcription errors (in nonsense items)

like “stʌt” for stɔɪt (p. 37). Perhaps the most serious error is this: “a special

PRECEDENCE constraint requires that epenthesis precede insertion of stress” where

the correct requirement is that stress assignment precedes epenthesis (p. 219). The

typographic details of referencing conventions are not uniform throughout

(Alderete’s chapter is the odd man out in this respect). And even the editor’s own

name is misspelt at one point as “Stever Parker” (p. 75, fn. 1).


Such minor (or not-so-minor) imperfections notwithstanding, the book will

be of interest to anyone who seriously follows what is going on in the field of

phonology in general and Optimality Theory in particular.

Géza Németh & Gábor Olaszy eds. (2010) A magyar beszéd. Beszédkutatás,

beszédtechnológia, beszédinformációs rendszerek

[Hungarian Speech. Speech research, speech technology, speech information

systems]

Budapest: Akadémiai Kiadó (708 pp. ISBN 978-963-05-8966-6)

Reviewed by: Péter Siptár

Eötvös Loránd University, Budapest, Hungary

e-mail: [email protected]

Speech technology is one of the new industries of the late twentieth and early

twenty-first centuries – and this volume is its first systematic book-size overview

in Hungarian and on Hungarian. As the various devices and services of speech

technology, with their functions growing fast both in number and in diversity,

become part of our everyday lives and especially part and parcel of our children’s

lives, it is increasingly important that they are made interesting, attractive, easy to

learn and simple to use. The forms and functions of human speech communication

have taken several millennia to emerge; their application for information exchange

between man and machine is therefore a great opportunity and a great challenge.

Scientists have only taken the very first steps in that direction. So far, their

machines have but a tiny fraction of the communicative endowments of human

speakers at their disposal, especially with respect to the realm of meaning or

semantic interpretation.

With respect to the time of a potential financial breakthrough for speech

technology solutions, serious experts had predicted back in the 1980s that an

exponential increase was to be expected in the English-language speech

recognition market in a matter of two years or so. This did not happen; at best,

linear development took place, a fact that discouraged decision makers who

wielded influence over financial resources. Ever since, due to a tension between

marketing promises and actual performance, cycles of increased attention followed

by less awareness can be observed every five or six years. However, if we

compare the early eighties with the present day, the overall rate of development is

enormous. Fortunately, speech research has a significant tradition in Hungary.

Hence, it is not necessary for Hungarians to wait for technologies from big

multinational companies to fill the relatively small market of this country. Instead,

the Hungarians have found competitive solutions based on their own intellectual

and material resources.


This book is a compendium of what current results of scientific and

technological research have to tell us about Hungarian speech in the twenty-first

century. The aim of the authors, as the editors point out in the preface, is to present

an overview of the acoustic structure of present-day Hungarian speech, and to

review the recent results, problem areas, and applications of speech technology as

a relatively new interdisciplinary area of research, especially insofar as it pertains

to Hungary. The book has essential chapters (for instance, those on speech

acoustics or signal processing), as well as chapters on various applications and

technologies that characterize the state of the art. The book has an

associated homepage (http://magyarbeszed.tmit.bme.hu) that contains a host of

relevant data that had to be left out of the book due to lack of space.

The authors are leading speech technology experts of this country: Géza

Németh and Gábor Olaszy (the two editors), as well as Kálmán Abari, Mátyás

Bartalis, Tamás Bőhm, Tamás Gábor Csapó, László Czap, Tibor Fegyó, Géza

Kiss, Péter Mihajlik, György Szaszák, György Takács, Péter Tatai, Bálint Tóth,

Klára Vicsi, Ákos Viktóriusz, and Csaba Zainkó. The volume also has a

“supervising editor”, Géza Gordos.

The book has four large sections, preceded by a preface, a list of authors, and

a key to abbreviations, and followed by a large list of references, an appendix and

an index. The first section (People, language, and speech, pp. 1–92) consists of

four introductory chapters (Speech and the information society, pp. 3–7, The

complex structure of speech, pp. 9–18, Physiological and physical basics, pp. 19–

71, The connection between speech and writing, pp. 73–92). The second section,

still on a preliminary note, but focusing on Hungarian, discusses The structural

analysis of speech (pp. 93–205). This is the part of the book that is closest to

linguistic phonetics and is divided into two chapters (The segmental structure of

speech, pp. 95–170, and The suprasegmental structure of speech, pp. 171–205).

The third and largest section (Speech technology, pp. 207–522) discusses The

science of speech technology (pp. 209–259), Data bases serving speech

technology (pp. 261–331), Speech perception and recognition by machine (pp.

333–409), and Speech production by machine (pp. 411–522). Finally, the fourth

section (Applications of speech technology, pp. 523–655) tells us about Speech

information systems (pp. 525–539), provides Examples of the areas of application

of speech technology (pp. 541–629), lists Interfaces, standards, homepages, and

programs (pp. 631–651), and concludes with a very brief chapter by Nick Campbell and

Géza Németh on The future of speech technology (pp. 653–655), the last sentence

of which is “Speech technology is roughly at a stage of development that the

vehicle industry had reached by 1900.” It remains for the reader to decide whether

this is an optimistic or a pessimistic note to end a book like this on.

The book is primarily intended as a textbook for students of informatics.

However, it will also be useful for experts and decision makers in

telecommunication, speech technology research and development, designers of


content providing services, the health industry and rehabilitation. But the authors

had an even larger audience in mind when writing it. In their view, the book may

turn out to be useful in a range of less technologically minded university courses

in the humanities and elsewhere (phonetics, speech analysis, linguistics, speech

psychology, health promotion and disease prevention, mass communication, and

so on). The authors furthermore recommend this book for secondary schools, and

indeed for anybody who might be interested (like physicists, linguists, people who

work for radio or television or in the movie industry or media experts in popular

science). The comprehensive contents and the relatively popular style of the

book make it readable and even enjoyable for everybody from philosophers to

engineers (and beyond).

Halicki, Shannon D. (2010)

Learner Knowledge of Target Phonotactics: Judgements of French Word

Transformations

Lincom GmbH, (LINCOM Studies in Language Acquisition Series (LSLA), 27),

ix + 234 pages,

ISBN 9783895867408, price €65,10

USD 79.70 / EUR 64.80 / GBP 55.10.2009.

Reviewed by

Chantal Paboudjian

University of Provence, Aix-en-Provence, France

This 27th volume of the LINCOM Studies in Language Acquisition series is the

published version of the Ph.D. dissertation of Dr. Shannon Halicki, who is now

assistant professor of French and Spanish at the Department of Humanities of

West Liberty University in West Virginia. The dissertation was defended in 2009

at Indiana University in Bloomington.

Throughout its 7 chapters, the book addresses the relationship between

language learners’ inter-language phonology and Universal Grammar (UG). More

precisely, it seeks to determine the extent to which inter-language phonology is

constrained by UG principles. Following Chomsky’s 1965 Aspects of the Theory

of Syntax, it is assumed that an innate language learning mechanism is intact in

adult second language acquisition. Learners would thus be equipped with

pre-existing knowledge that makes acquisition possible. Most studies on the subject

hold that second language phonology is not native-like; the author takes the

opposing view. She investigates whether adult English-speaking second-language

(L2) learners of French acquire L2 phonotactic constraints at abstract levels in the

same way as native learners, and hypothesizes that learners can reconfigure L1


parameters to accommodate new L2 material. Two major research questions are

addressed:

“Do non-native speakers of a language exhibit consistent judgments of

wordlikeness in their target language?” and if they do,

“Are the judgements native-like, driven by L1 transfer and inhabit the niche

occupied by native language phonologies?”

To provide answers to these questions, the author tests L2 knowledge of three

structural features that differ between the native language (L1) and the L2, namely

consonant cluster limits in French, sonorancy assimilation at morpheme

boundaries, and similarity avoidance at morpheme boundaries.

Chapter 1. Introduction and Background (pp. 1-33) reviews arguments

that grammars (including phonological grammars) are generative systems whose

acquisition is driven by an innate learning mechanism. It presents research on

the role of innate ability in language acquisition, particularly with regard to

syntactic well-formedness and interpretation, for both native and L2 learners. It

also briefly addresses issues such as evidence of native speaker intuition about

phonotactics, L2 learners’ acquisition of non-learnable knowledge about the target

language (not transferred from the L1), the relationship between UG constraints

and phonology, and L2 phonological systems (with a focus on learners’

pronunciation). The chapter concludes with a

presentation of the research questions the author studies in the volume.

Chapter 2. Studies in native and learner phonotactic performance (pp.

32-66). Since L2 learners seem to demonstrate native-like judgments of syntactic

well-formedness and interpretation, the author asks whether they demonstrate

similar abilities in L2 phonotactics. She thus surveys literature relevant to the

study of L2 phonotactic knowledge with special focus on syllable well-formedness

contrasts (relationship between markedness and language universals) and the

concept of ‘wordlikeness’ in cognitive linguistics. She also reviews two relevant

L2 studies carried out in an Optimality Theory framework and presents studies

showing the importance of an abstract phonological level in the account of the

data.

Chapter 3. The learnability of French syllable Constraints by L1 English

Speakers (pp. 67-110). The author examines here the ‘learnability’ (author’s

expression) of constraints on French consonant clusters exhibiting L1-L2

contrasts. She describes facts of syllable structure in French and English in order

to specify the nature of the learning task, the type of input available to learners as

well as representations that may be transferred from the L1 system. The

parametric difference of syllable structure between English and French is

presented using McCarthy and Prince’s Prosodic Morphology analysis. Syllable

structure constraints and a detailed description of the French maximal syllable in

codas are provided and illustrated and the validity of some minor linguistic

phenomena such as word transformations is discussed. A section further analyses


the rules of popular French re-suffixation (akin to slang word

manipulations), which are difficult for L2 learners to acquire.

Chapter 4. Experimental Design and Methodology (pp. 111-130)

describes the design of the word-building experiment and the statistical procedures

used in the data analysis. Three structural features, i.e., consonant cluster limit,

sonorancy assimilation and continuancy dissimilation, have been tested. The tests,

designed to probe intuitions regarding the well-formedness of re-suffixed items in

French, instructed intermediate and advanced English-speaking learners of French

as well as native French speakers to give their levels of acceptance of series of

items with sequences varying at the phonotactic level. The stimuli, the

questionnaire and the experimental hypotheses for the tests are described.

Moreover, the questionnaires for participants and lists of the test items are provided

in the appendix to the volume.

Chapter 5. Quantitative Results (pp. 131-161) provides quantitative

results for the three tests in the experiment, illustrated by 24 tables and figures. Both

native French speakers and English-speaking learners of French appear to exhibit

similar judgments of asymmetries in the well-formedness of proposed nonce

re-suffixed items, with the level of confidence increasing with language proficiency.

However, a difference is noted between learners and native speakers in the rates of

acceptance of some consonant clusters in nonce words. Legal sequences in French were

accepted within the context of roots but not in derivations.

Chapter 6. Discussion (pp. 162-189). In this last part, the author interprets

and discusses the central findings of Chapter 5, which are that advanced and

intermediate learners as well as native French speakers rejected some items but

accepted others as well-formed. The author concludes that formal phonological

grammar (knowledge of phonotactic constraints and knowledge of alternations) is

the primary locus of both groups. She argues that L2 learners construct the

phonological shape of the suffix at the prosodic level and obey constraints on

operations having to do with the preservation of roots, and specification of

phonological features of allomorphs. She discusses the potential influences of

lexical frequency and universal markedness which would predict outcomes in the

word judgment task.

Chapter 7. Conclusion (pp. 190-203). This chapter contains a general

discussion with conclusions drawn from the experimental findings. The author

points out two novel aspects of the presented research: (1) her adoption of a new

approach to issues such as learner simplification strategies, namely the

Optimality Theory framework, which assumes the universality of constraints on

language output as well as the parametric difference between languages; (2) her

adoption of a new psycholinguistic approach in phonological testing, that is, the

introduction of the notion of relative acceptability/rejection, which takes into account

the gradient judgments of listeners. Finally, two sections stress the role of UG in L2

phonology and of lexical frequencies in phonotactic knowledge. The concluding

remarks show that a line of research has been opened up onto important issues in


language acquisition. Further research could focus on the order of acquisition of

structures and the definition of cues needed to establish correct parameter settings.

The first impression made by this book is that it is geared towards language

acquisition specialists who are fluent readers of English. Other readers may be

discouraged by the complexity of a presentation (particularly in the last two

chapters) more suited to a dissertation than to the communication of a scientific

work to a larger public. In addition, the small font size does not make

reading any easier. However, language acquisition specialists will

acknowledge the tremendous work that has been conducted in the presentation of

the reviews, and language teachers will appreciate the analysis of research on

language acquisition mechanisms. French teachers will also find helpful and

sometimes practical information they can directly use in their teaching approach.

Reference

Chomsky, Noam (1965). Aspects of the Theory of Syntax, Cambridge, MA, MIT

Press.


WORKSHOPS AND CONFERENCES

+++

2-3 May 2012

The Listening Talker (LISTA) Workshop

Edinburgh, Scotland

+++

2-4 May 2012

2nd Workshop on Sound Change

Kloster Seeon, Germany

+++

21-27 May 2012

8th International Conference on Language Resources and Evaluation (LREC)

Istanbul, Turkey

+++

22-25 May 2012

Speech Prosody 2012

Shanghai, China

+++

26 May 2012

4th International Workshop on Corpora for Research on Emotion Sentiment &

Social Signals

Istanbul, Turkey

+++

19-21 July 2012

Interdisciplinary Workshop on Perspectives on Rhythm and Timing

Glasgow, UK

+++

27-29 July 2012

Lab Phon 13

Stuttgart, Germany

+++

5-8 August 2012

Annual Conference of the International Association for Forensic Phonetics and

Acoustics (IAFPA)

Santander, Spain


+++

3-5 September 2012

ISICS 2012: International Symposium on Imitation and Convergence in Speech

Aix-en-Provence, France

+++

7-8 September 2012

Interdisciplinary Workshop on Feedback Behaviors in Dialog

Portland, U.S.

+++

9-13 September 2012

Interspeech 2012

Portland, U.S.

+++

25-29 August 2013

Interspeech 2013

Lyon, France

+++

7-11 September 2014

Interspeech 2014

Singapore

+++

August 2015

18th International Congress of the Phonetic Sciences (ICPhS)

Glasgow, Scotland

+++

September 2015

Interspeech 2015

Dresden, Germany


CALL FOR PAPERS

The Phonetician will publish peer-reviewed papers and short articles in all areas of

speech science including articulatory and acoustic phonetics, speech production and

perception, speech synthesis, speech technology, applied phonetics,

psycholinguistics, sociophonetics, history of phonetics, etc. Contributions should

primarily focus on experimental work but theoretical and methodological papers

will also be considered. Papers should be original works that have not been

published and are not under consideration for publication elsewhere.

Authors should follow the guidelines of the Journal of Phonetics for the

preparation of their manuscripts. Manuscripts will be reviewed anonymously by

two experts in the field. The title page should include the authors’ names and

affiliations, address, e-mail, telephone, and fax numbers. Manuscripts should

include an abstract of no more than 150 words and up to four keywords. The final

version of the manuscript should be sent as both .doc and .pdf files. It is the

authors’ responsibility to obtain written permission to reproduce copyright

material.

All kinds of manuscripts should be sent in electronic form (.doc and .pdf) to the

Editor. We encourage our colleagues to send manuscripts for our newly released

section entitled Master’s research: Introduction. Master’s students are invited to

sum up their research in the area of phonetics answering the questions of

motivation, topic, goal, and results (no more than 1,200 words).

INSTRUCTIONS FOR BOOK REVIEWERS

Reviews in The Phonetician are dedicated to books related to

phonetics and phonology. Usually the editor contacts prospective

reviewers. Readers who wish to review a book mentioned in the list

of “Publications Received” or any other book, should address the

editor about it.

A review should begin with the author’s surname and name,

publication date, the book title and subtitle, publication place, publishers, ISBN

numbers, price, page numbers, and other relevant information such as number of

indexes, tables, or figures. The reviewer’s name, surname, and address should

follow “Reviewed by” in a new line.

The review should be factual and descriptive rather than interpretive, unless

reviewers can relate a theory or other information to the book which could benefit

our readers. Review length usually ranges between 700 and 2500 words. All

reviews should be sent in electronic form to Prof. Judith Rosenhouse (e-mail:

[email protected]).


ISPhS MEMBERSHIP APPLICATION FORM

Please mail the completed form to:

Treasurer:

Prof. Dr. Ruth Huntley Bahr, Ph.D.

Treasurer’s Office:

Dept. of Communication Sciences and Disorders

4202 E. Fowler Ave. PCD 1017

University of South Florida

Tampa, FL 33620 USA

I wish to become a member of the International Society of Phonetic Sciences

Title: ____ Last Name: _________________ First Name: _________________

Company/Institution: ________________________________________________

Full mailing address: ________________________________________________

________________________________________________________________

Phone: __________________________ Fax: ____________________________

E-mail: ___________________________________________________________

Education degrees: __________________________________________________

Area(s) of interest: __________________________________________________

The Membership Fee Schedule (check one):

1. Members (Officers, Fellows, Regular) $ 30.00 per year

2. Student Members $ 10.00 per year

3. Emeritus Members NO CHARGE

4. Affiliate (Corporate) Members $ 60.00 per year

5. Libraries (plus overseas airmail postage) $ 32.00 per year

6. Sustaining Members $ 75.00 per year

7. Sponsors $ 150.00 per year

8. Patrons $ 300.00 per year

9. Institutional/Instructional Members $ 750.00 per year

Go online at www.isphs.org and pay your dues via PayPal using your credit card.

I have enclosed a cheque (in US $ only), made payable to ISPhS.

Date ___________________ Full Signature _____________________________

Students should provide a copy of their student card.


News on Dues

Your dues should be paid as soon as it is convenient for you to do so. Please send

them directly to the Treasurer in US$:

Prof. Ruth Huntley Bahr, Ph.D.

Dept. of Communication Sciences & Disorders

4202 E. Fowler Ave., PCD 1017

University of South Florida

Tampa, FL 33620-8200 USA

Tel.: +1.813.974.3182, Fax: +1.813.974.0822

e-mail: rbahr@usf.edu

VISA and MASTERCARD: You now have the option to pay your ISPhS

membership dues by credit card using PayPal if you hold a VISA or

MASTERCARD. Please visit our website, www.isphs.org, and click on the

Membership tab and look under Dues for the underlined phrase, “paid online via

PayPal.” Click on this phrase and you will be directed to PayPal.

The Fee Schedule:

1. Members (Officers, Fellows, Regular) $ 30.00 per year

2. Student Members $ 10.00 per year

3. Emeritus Members NO CHARGE

4. Affiliate (Corporate) Members $ 60.00 per year

5. Libraries (plus overseas airmail postage) $ 32.00 per year

6. Sustaining Members $ 75.00 per year

7. Sponsors $ 150.00 per year

8. Patrons $ 300.00 per year

9. Institutional/Instructional Members $ 750.00 per year

Special members (categories 6–9) will receive certificates; Patrons and

Institutional members will receive plaques, and Affiliate members will be

permitted to appoint/elect members to the Council of Representatives (two each for

national groups; one each for other organizations).

Libraries: Please encourage your library to subscribe to The Phonetician. Library

subscriptions are quite modest – and they aid us in funding our mailings to

phoneticians in Third World Countries.

Life members: Based on the request of several members, the Board of Directors

has approved the following rates for Life Membership in ISPhS:

Age 60 or older: $ 150.00

Age 50–60: $ 250.00

Younger than 50 years: $ 450.00