UNIVERSITY OF NOVI SAD - Language Bits · The Serbian language does not have diphthongs, while they...

UNIVERSITY OF NOVI SAD The Faculty of Philosophy English Department MA Paper: Pronunciation of English Diphthongs by Speakers of Serbian: Acoustic Characteristics Mentor: Dr. Maja Marković Student: Romeo Mlinar Novi Sad, 2011


UNIVERSITY OF NOVI SAD

The Faculty of Philosophy

English Department

MA Paper:

Pronunciation of English Diphthongs by

Speakers of Serbian: Acoustic Characteristics

Mentor:

Dr. Maja Marković

Student:

Romeo Mlinar

Novi Sad, 2011


i Abstract

The Serbian language does not have diphthongs, while they are frequent and very important

in English. In this paper, we discuss the physical properties of the English diphthongs, as

pronounced by Serbian freshman students. The pronunciation of eight diphthongs (/eɪ/,

/aɪ/, /ɔɪ/, /əʊ/, /ɑʊ/, /ɪə/, /ɛə/, and /ʊə/) in their long and short variants was analysed after

recording 15 female test speakers, and compared with the data from a reference RP speaker.

The focus in the results was on length and formants. Pitch and intensity were briefly

analysed. The results showed that the diphthong lengths of the Serbian speakers, relative to

those of the RP speaker, ranged from 85% to 112% for the long variants and from 63% to 79%

for the short variants. We also compared the length of the diphthongs with the length of the

words within which they were pronounced: less than 8% difference was observed in the short

variants, while the long variants showed differences of between 10% and 17%. The formant

measurements and the

comparison of the change between the first and the second vowel within the diphthongs

showed that the students had the same values for F1, F2, and F3 in long /ɑʊ/, /ɛə/ and short

/eɪ/, /ɪə/, /ɔɪ/. Seven variants had matches in the first two formants, while four variants (short

/ʊə/, /ɛə/; long /ɪə/, /ʊə/) had the most prominent mismatches in all three formants. The

formant magnitude, calculated by using the Euclidean distance, showed 95% similarity in the

long /ɔɪ/, while the lowest value, 58%, was recorded for the short /ɑʊ/. The paper shows that

the starting hypothesis about the problems Serbian speakers might have in diphthongal

realisation of the English vowels is valid, and offers directions for possible further research,

while suggesting which parts of the English vocal space should receive more attention in the

teaching of English.


i Abstract (translated from the Serbian Сажетак)

The Serbian language has no diphthongs, while in English they are frequent and very important.

In this paper we deal with the physical properties of English diphthongs as pronounced by

Serbian female students in the first year of English studies. The pronunciation of eight

diphthongs (/eɪ/, /aɪ/, /ɔɪ/, /əʊ/, /ɑʊ/, /ɪə/, /ɛə/ and /ʊə/), in their long and short variants,

was analysed after recording 15 female speakers, and was then compared with the pronunciation

of a control speaker whose native language is English (RP). In the results we deal mainly with

length and formants, while fundamental frequency and intensity are analysed only briefly.

The results showed that the ratio of diphthong length in the Serbian speakers, compared with

the length recorded for the English speaker, ranged from 85% to 112% in the long diphthongs

and from 63% to 79% in the short ones. We also compared the length of the diphthong with the

length of the word in which it was pronounced: a length difference of less than 8% was

recorded in all short variants, while the long variants showed differences of between 10% and

17%. The formant measurements and the comparison of the changes between the first and the

second vowel within the diphthongs showed that the participants had the same values for F1,

F2 and F3 in the long forms of /ɑʊ/, /ɛə/ and the short forms of /eɪ/, /ɪə/ and /ɔɪ/. Seven

forms had no matches in the first two formants, while four variants (short /ʊə/, /ɛə/; long

/ɪə/, /ʊə/) had the greatest differences in all three formants. The formant magnitude,

calculated using the Euclidean distance, showed 95% similarity in the pronunciation of the

long form of /ɔɪ/, and the lowest similarity, 58%, was recorded in the short diphthong /ɑʊ/.

The paper shows that the starting hypothesis about the possible problems of Serbian speakers

in pronouncing English diphthongs is correct, and points to possible further research. The

paper also suggests which parts of the English vowel space should receive more attention in

the teaching of this foreign language.


ii Content

i Abstract

ii Content

iii Introduction

1. Physiology and Physics of Speech

1.1 Speech as a Physiological Process

1.2 Sound and Speech Production

1.3 Formants

1.4 Formants in Vowels

2. Speech: Segments, Features, Duration

2.1 Non-Linearity of Speech Elements

2.2 Segment Quality and Duration

3. Vowel Sounds

3.1 Vowels in English

3.2 Vowels in Serbian

4. Methods

4.1 Diphthong Selection

4.2 Corpus Compilation

4.3 Questionnaire

4.4 Speakers

4.5 Corpus Training

4.6 Recording

4.7 Signal Files Manipulation

4.8 Software

4.9 Segmentation and Theory behind It

4.9.1 Praat Settings Methods

4.9.2 Praat Scripting and R

5. Results

5.1 A Note about the Terminology and Notation

5.2 Diphthong Length

5.2.1 Isolated Diphthong Lengths

5.2.2 Length of the Words

5.2.3 Ratios of the Diphthongs within the Words

5.2.4 Conclusion

5.3 Formant Values

5.3.1 Prediction of Target Values – Assumed Shifts

5.3.2 Positive and Negative Target Values in Corpora – Measured Shifts

5.3.3 Correcting Target Value Predictions

5.3.4 Vowel Space in Targets

5.3.5 Formant Magnitudes

5.3.6 Conclusion

5.4 Intensity

5.4.1 Overall Intensity

5.4.2 Intensity by Targets and Length

5.4.3 Conclusion

5.5 Pitch

5.5.1 Overall Pitch

5.5.2 Conclusion

6. Conclusion

7. Appendices

7.1 Questionnaire Results

7.2 Programming and Scripting

7.2.1 Programming in R

7.2.2 Python

7.2.3 Praat

7.3 Log Files from the Recording Sessions

7.4 The Result Tables

7.5 The Euclidean Distance

8. Bibliography


iii Introduction

Diphthongs are an important element of the English vocal system. This cannot be claimed for

the Serbian language. Therefore, we believed that it would be beneficial to see how well the

Serbian speakers perform in pronouncing a selected group of the RP diphthongs. The

challenge for the Serbian speakers is considerable, because of the foreign language vowels

that are not part of their vocal space.

This paper is a work in experimental phonetics, a sub-discipline of general phonetics.

The examinations include acoustic (physical) properties of speech sounds. We are exclusively

dealing with the analysis of the produced speech sounds – a discussion about perception,

another important phonetic aspect, is not part of this work.

We based our conclusions on contrasting the measurements from the Serbian data with

the RP data. The digital corpus, used to acquire the acoustic data (time, frequencies,

intensity), was created from the recordings of a written corpus. Female freshman students

were recorded, and the results were compared with the measurements from the reference

female RP speaker. Fifteen students were selected after filling in a questionnaire

about their mother tongue and their relevant linguistic experience.

The hypothesis of the paper is related to the findings of experimental phonetics, about

the differences in physical properties of speech sounds: the properties differ greatly,

depending, namely, on the class of the articulated speech sound, the language spoken, and the

gender of the speakers. Thus, we expected to find significant differences in the acoustic

properties of the speech sounds (diphthongs) spoken by the Serbian speakers, when compared

to the native RP speaker. We chose to examine length, formant frequencies, and intensity of

the selected diphthongs.

We derived the experimental framework for this paper from the writings of well-known

authors in the field: Ladefoged, Johnson, Kent and Read, Clark and Yallop, Harrington,

Stevens etc. We took the basic assumptions about the vowels in English from Gimson, Jones,

O'Connor, Odden, Laver etc. The acoustic description of Serbian vowels was extracted from

the works by Marković and by Petrović and Gudurić, but only the first source was used, since

the speakers in that corpus were female, just like in our research. We processed the sound

data in Audacity, while the measurements and segmentation were done in Praat. The analysis

of measurements was performed in R: A Language and Environment for Statistical


Computing, and this environment was used for the customised plotting as well. We also wrote

three small software tools in Python, to facilitate different stages of the research process (the

corpus compilation, the recording, and the quality check).

Although we have shown, as predicted, that the acoustic properties of the English

diphthongs in the pronunciation of the Serbian students were different from the native

pronunciation, we must be cautious about the extent of the differences. The reason for this is

that we compared the aggregated data from the fifteen Serbian speakers with only one RP

speaker. The second reason for conditional acceptance of our results is that the conclusions

were relative, because we did not compensate for discrepancies within corpora (e.g. different

rate of speech or different intonation).


1. Physiology and Physics of Speech

1.1 Speech as a Physiological Process

Speech is produced by three groups of organs working together (Kent and Read 1):

respiratory, phonatory and articulatory. The dominant elements of the respiratory system¹ are

the lungs, the chest wall, and the diaphragm. Working together, they provide the mechanical

energy in the form of air pressure, or the aerodynamic energy of the speech (2), needed to

produce sound in the larynx. The larynx is the place of phonation, while the tongue, the lips,

the jaw and the velum, the articulatory elements of the speech organs, modify the properties

of created sounds. The extent of modification depends on several factors, including the

position of articulatory organs, the intensity of sound (pressure), physical properties of the

tissues, etc.

There are two important phases in the respiratory system related to speech: inspiration

and expiration. Together they make up the respiratory cycle. Inspiration, or the process of inhaling,

occurs when the thoracic volume increases, which causes the lowering of the pressure in

lungs. This pressure difference causes air to enter the system. The increase of space within

the thorax is achieved by the rib cage moving upwards or by lowering of the diaphragm.

Expiration, or exhalation, is achieved by the relaxation pressure (Clark and Yallop 24).

Complex physiological responses of the muscular system are focused on maintaining the

pressure below the glottis needed for articulation (26).

¹ Another system directly involved in speech is the nervous system.


Figure 1. The vocal tract (Ogden 10)

The air from the lungs passes through the trachea, which is about 11 cm long and 2.5 cm in

diameter (30), and enters the larynx, a structure consisting of several cartilages. Above the larynx is the epiglottis,

a leaf-shaped cartilage that closes the airways during swallowing, thus protecting sensitive

tissue. The larynx houses vocal folds, “typically about 17 to 22 mm long in males and about

11 to 16 mm long in females” (32). Above the epiglottis is the pharynx, a muscular passage

that connects the oral cavity, the larynx, and the velum. The pharynx is passively involved in

speech (42), because it modifies the size of the space between the oral cavity and the larynx.

The velum, a soft tissue, is positioned above the pharynx. It directs the airflow in speech: when

raised, it closes the velopharyngeal port, an opening to the nasal cavity² (46).

The oral cavity is the part of the vocal tract over whose size and shape humans can exert

the greatest control (O'Connor, Phonetics 34), which makes the oral cavity critical for

“determining the phonetic qualities of speech sounds” (Clark and Yallop 47). It is a space

between the lips, the palatoglossus muscle, the tongue, and the roof of the mouth (47). The

² The velopharyngeal port is very important in discussing nasal sounds, where the air stream has a complex path,

including several cavities, and an intricate physical model.


lips, the tongue and the angle of the mandible have an important role in speech sound

production, although not of equal importance (for example, it is possible to make a distinctive

sound with the mandible fixed) (47). Considering the complex muscular and neural structure

of the mobile parts that surround the oral cavity it is no surprise “that the characteristics of

vowels depend on the shape of the open passage above the larynx” (Jones, Outline 29). Of

course, this refers not only to vowels, but to all speech sounds; what makes vowels

interesting, however, is the lack of any closure in the passages, so their quality is conditioned

by the shape of the passages, or “inherent properties of the cavities” (Crystal 27).

The movement of the tongue backwards or forwards modifies the space in the pharyngeal

region; the movement upwards and downwards (usually followed by mandible movement)

changes the volume and shape of the space defined by the hard palate and tongue (Stevens

22). According to Johnson, the volume of the vocal tract is about 170 cm³ in males and 130 cm³

in females; when the mandible is lowered by about 1 cm (the average in speech), the volume

increases to 190 cm³ and 150 cm³, respectively (24). Citing Goldstein, Johnson gives 14.1 cm

as the average vocal tract length in adult females, with 6.3 cm for the pharynx and 7.8 cm for

the oral cavity. In males, the values are 16.9 cm, 8.9 cm and 8.1 cm, respectively (25).

This shows that the oral cavity in both sexes is almost of the same length, while differences

are reflected in the length of the pharyngeal region.

1.2 Sound and Speech Production

Sound is a form of energy (Crystal 32), a series of pressure fluctuations in a medium

(Johnson 4). In speech, the medium is usually air, although sound can propagate through

solid objects and water. Sound is a series of rarefaction and compression events in the

medium, which are (in speech) initiated once the air particles become energised by the vibration

of the vocal folds. Compression occurs when particles are shifted closer to each other, which

results in increased density within the medium. Rarefaction is the opposite: the particles move

apart, so the density of the medium decreases.

A cycle (or oscillation) consists of one rarefaction and one compression event. The

number of cycles per second is defined as frequency, and it is measured in hertz (Hz). A

sound with one oscillation per second has a frequency of 1 Hz; a sound of 100 Hz has an

identifiable part that repeats once every hundredth of a second.
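
The relation between frequency and the duration of one cycle can be checked with a few lines of Python. This is only an illustrative sketch: the function name `period_seconds` and the example frequencies are our own choices and do not come from the recordings.

```python
def period_seconds(frequency_hz: float) -> float:
    """Return the duration of one cycle (the period) for a given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


# Print the period for a few illustrative frequencies.
for f in (1, 100, 440):
    print(f"{f} Hz -> one cycle every {period_seconds(f):.4f} s")
```

A 1 Hz sound completes one cycle per second, while a 100 Hz sound completes one cycle every 0.01 s.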

The energy of a sound wave depends on the force that created it. The greater the

energy used in making the sound wave, the greater the pressure level it creates in the medium. The

energy of a sound wave is related to its amplitude (the degree of change within an

oscillation): a very strong wave will have a large amplitude, and vice versa. The sound pressure,

or its intensity, is measured in dB (decibels). The human ear is very sensitive to pressure

variations, and it can register about 10¹³ units of intensity (Crystal 36). For easier reference,

such a long scale was logarithmically compressed into a scale of 130 dB (36).
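
The step from 10¹³ units of intensity to 130 dB follows from the standard definition of the decibel as ten times the base-10 logarithm of an intensity ratio. The sketch below assumes that definition; the function names are hypothetical and chosen only for this illustration.

```python
import math


def intensity_ratio_to_db(ratio: float) -> float:
    """Convert an intensity ratio to decibels: 10 * log10(ratio)."""
    return 10.0 * math.log10(ratio)


def db_to_intensity_ratio(level_db: float) -> float:
    """Inverse conversion: the intensity ratio that corresponds to a dB value."""
    return 10.0 ** (level_db / 10.0)


# The full range of about 10**13 intensity units spans 130 dB.
print(intensity_ratio_to_db(1e13))
```

Each step of 10 dB therefore corresponds to a tenfold change in intensity, which is why thirteen orders of magnitude fit on a 130 dB scale.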

A sinusoid (figure 3) is an abstraction of a simple periodic sine wave. Three items are

needed for its description: amplitude, frequency and phase³ (Johnson 7). From the figure we

see that the frequency of the sound is one cycle per unit of time, while the amplitude reaches its peaks

at 2 and -2 on the vertical scale.
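
The three descriptive items can be made concrete with a short sketch that evaluates a sinusoid at chosen points in time. The default values (amplitude 2, frequency 1 per unit of time, phase 0) are read off the description of figure 3; the function name `sinusoid` and the sampling grid are assumptions made only for this example.

```python
import math


def sinusoid(t: float, amplitude: float = 2.0, frequency: float = 1.0,
             phase: float = 0.0) -> float:
    """Value of a sine wave at time t, given its amplitude, frequency and phase."""
    return amplitude * math.sin(2.0 * math.pi * frequency * t + phase)


# Sample one full cycle at eight equally spaced points (plus the end point).
for n in range(9):
    t = n / 8
    print(f"t = {t:.3f}  value = {sinusoid(t):+.3f}")
```

With these defaults the wave reaches its maximum of 2 a quarter of the way through the cycle and its minimum of -2 three quarters of the way through.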

Waves created in speech are complex and “composed of at least two sine waves” (8).

One such complex wave has a pressure oscillation (an amplitude) that is the result of the

pressure oscillations of at least two waves (Ladefoged, Elements 37), and, of course, the

phases of the waves involved. Every complex wave can be seen as composed of several

simple waves, and the merit of such a model is that “any complex waveform can be

decomposed into a set of sine waves having particular frequencies, amplitudes and phase

relations” (Johnson 11). The process of “breaking complex wave down into its sinusoidal

components” (Clark and Yallop 203) is well-known in physics and it is called the Fourier

analysis. Diphthongs, our key theme, are vowels; therefore, they are sounds with sustained

vocal phonation and they can be analysed as periodic sounds.
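
The idea behind Fourier analysis can be illustrated with a minimal sketch that builds a complex wave from two sine waves and then recovers their frequencies and amplitudes with a naive discrete Fourier transform. Everything specific here is an assumption for the sake of the example: the 100 Hz and 300 Hz components, their amplitudes, and the sampling rate are invented, and the slow textbook transform stands in for the optimised routines of tools such as Praat.

```python
import cmath
import math

SAMPLE_RATE = 800  # samples per second (illustrative)
N = 800            # one second of signal, so DFT bin k corresponds to k Hz


def complex_wave(t: float) -> float:
    """Sum of two sine waves: 100 Hz (amplitude 1.0) and 300 Hz (amplitude 0.5)."""
    return (1.0 * math.sin(2 * math.pi * 100 * t)
            + 0.5 * math.sin(2 * math.pi * 300 * t))


signal = [complex_wave(n / SAMPLE_RATE) for n in range(N)]


def dft_amplitude(samples, k):
    """Amplitude of the sinusoidal component in DFT bin k (naive O(N) sum)."""
    n_total = len(samples)
    total = sum(x * cmath.exp(-2j * cmath.pi * k * n / n_total)
                for n, x in enumerate(samples))
    return 2.0 * abs(total) / n_total


# Keep only the bins whose amplitude is clearly above numerical noise.
amplitudes = {k: dft_amplitude(signal, k) for k in range(N // 2)}
components = {k: round(a, 3) for k, a in amplitudes.items() if a > 0.01}
print(components)
```

The decomposition returns exactly the two components that were put in, which is the property that makes spectral analysis of vowels possible.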

³ Phase is “the timing of the waveform relative to some reference point” (Johnson 8), but the term is not

important for our paper.


Figure 2. An analysis of a spectral slice extracted from schwa within the word

“word”, as spoken by a native RP speaker; the peaks show the dominant frequency

components within the complex sound (horizontal axis: frequency in Hz, 0 to 2.205·10⁴;

vertical axis: sound pressure level in dB/Hz).

The second group of waves consists of aperiodic waves. They are characterised by the lack of a

repetitive pattern. Two types of waves are grouped under the term aperiodic: white noise and

transients. White noise contains a completely random waveform, and an example for white

noise is a fricative such as [s] (Johnson 12). Transients are “various types of clanks and bursts

which produce a sudden pressure fluctuation that is not sustained” (12). Aperiodic sounds can

also be subjected to Fourier analysis, but since they include groups of sounds other than

vowels, there was no need, or adequate opportunity, to analyse them directly in this paper.

An awareness of the types of waves and their visual representation in various forms,

whether numeric or graphical, was necessary to interpret the data. For example, we had

to be able to recognise the waveforms and spectrogram parts belonging to different sounds –

without this it would be impossible to isolate the vowels, which was our task.

Sometimes pressure fluctuations in form of sound that hit an object cause the object to

vibrate. The vibrations occur if the acting frequency is within the “effective frequency range”

or resonator bandwidth (Ladefoged, Elements 68). Such induction of vibrations by another

vibrating object is called resonance. Every object has a specific range of frequencies that it

can respond to, and those frequencies correspond to the dominant frequencies of the sound


the object can create – or as Ladefoged explains it: “... the resonance curve of a body has the

same shape as its spectrum” (65). In speech, the speech organs have the function of

resonators: they filter (enhance and dampen) properties of waves, recognised as the speech

sounds.

Figure 3. A sinusoid with equilibrium, maximum and minimum points corresponding to

the rarefaction/compression events in sound propagation

An object will amplify the vibrations⁴ that are close to its own natural frequencies. This is

what happens in the process of speech: while speech sound propagates through the vocal tract,

some frequencies are dampened, while others are amplified, in accordance with the resonant

properties of the parts of the tract, its cavities and tissues. This is a description of a well-known

theory about the production of vowels, the source-filter theory, which postulates that

the vocal folds are the source of the sound, and after the sound is made it passes through a

filter shaped by the vocal tract cavities (Ladefoged, Elements 103). This filter is “frequency-

selective and constantly modifies the spectral characteristics of sound sources during

articulation” (Clark and Yallop, Introduction 217), and it changes during articulation (218).

Kent and Read refer to the theory as a “linear source-filter theory, because it is based on a

linear mathematical level” (18).
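
The source-filter idea can be sketched numerically. In the toy model below the source is a series of harmonics whose level falls by 12 dB per octave, and the filter is a single second-order resonator; the output level of each harmonic is the sum of the two in dB. All values (a 200 Hz fundamental, a resonance at 1000 Hz, a quality factor of 10) and all names are hypothetical and serve only to show how a resonance produces a peak in an otherwise falling spectrum. A real vocal tract has several resonances.

```python
import math

F0_HZ = 200.0            # assumed fundamental frequency of the source
SOURCE_SLOPE_DB = -12.0  # assumed fall of the source spectrum per octave
RESONANCE_HZ = 1000.0    # hypothetical single resonance of the filter
Q_FACTOR = 10.0          # hypothetical sharpness of that resonance


def source_level_db(harmonic: int) -> float:
    """Level of the n-th source harmonic relative to the first one."""
    return SOURCE_SLOPE_DB * math.log2(harmonic)


def filter_gain_db(frequency_hz: float) -> float:
    """Gain of a simple second-order resonator at the given frequency."""
    r = frequency_hz / RESONANCE_HZ
    gain = 1.0 / math.sqrt((1.0 - r * r) ** 2 + (r / Q_FACTOR) ** 2)
    return 20.0 * math.log10(gain)


def output_level_db(harmonic: int) -> float:
    """Source level plus filter gain for the n-th harmonic (linear model in dB)."""
    return source_level_db(harmonic) + filter_gain_db(harmonic * F0_HZ)


for h in range(1, 9):
    print(f"{h * F0_HZ:6.0f} Hz  source {source_level_db(h):7.2f} dB  "
          f"output {output_level_db(h):7.2f} dB")
```

In the printed table the source levels fall steadily, while the output shows a local peak at the harmonic closest to the resonance, which is the kind of peak described in this paper as a formant.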

⁴ In speech the origin of vibration is usually the vocal folds, but when a person is unable to produce sound

with the vocal folds, usually because of illness, other means can be employed (Pinker, Instincts 165).


However, the vocal tract is not the only filter involved: the sound is also modified

after it leaves the vocal tract, because the air in the “outside world” affects the sound⁵; it is

also affected by the physical properties of the head.

1.3 Formants

Ladefoged notes that “the best way of describing vowels is not in terms of the articulations

involved, but in terms of their acoustic properties” (Data 104). That is the reason we now

focus on critical elements in speech acoustics – formants, since they are “one of the most

important properties which allows a listener to distinguish speech sounds” (Odden 10). The

distribution of formants in the spectrum is critical for vowel perception, which makes them

an important part of our research.

Formants are defined as the “peaks of energy ... produced by selective enhancement

of the source by tract resonance” (Clark and Yallop, Introduction 221). In other words,

formants are the most prominent elements of energy distribution in speech sound. The

“selective enhancement” is related to the previously explained filtering function of the vocal

tract.

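Formants must be distinguished from the lowest component of speech, the fundamental

frequency F0 (the rate at which the vocal folds vibrate), which forms the “F0 patterns”

(Clark and Yallop, Introduction 229). F0 is the most influential energy distribution (167) in

speech production. The harmonics of the voice source are whole-number multiples of F0

(Johnson 79); the formants (F1, F2 etc.) are the vocal tract resonances that enhance the

harmonics lying close to them.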

The fundamental frequency F0 is often equated with pitch, “the perceived period of

frequency of a sound wave” (Clark and Yallop, Introduction 210). Pitch is closely related to

fundamental frequency and to a “minor extent to the intensity of the sound, but the

relationship between [them] is ... nonlinear” (210). Pitch is related to “muscle forces

controlling the tension of the vocal folds and ... the air pressure” (11). Ladefoged points out

one very important terminology distinction: pitch is related to perception, and not to

measurable property of sound. However, he concludes that “it can usually be said to be the

rate at which vocal fold pulses recur” – which is the definition of fundamental frequency

(Data 75).

⁵ This is the “radiation factor/impedance”, a filter that intensifies high frequencies by 6 dB for each octave.

Within the pulse coming from the vocal folds, frequency peaks decrease by about 12 dB per octave. Thus,

“these two ... factors account for a -6dB/octave slope in the output spectrum” (Ladefoged, Elements 105). Such a

sharp fall of the energy peaks also means that “the intensity of the harmonics falls quite rapidly at high

frequencies” (Clark and Yallop, Introduction 212). It is then logical that most of the significant data in a sound signal is

below 5,000 Hz, assuming that the upper hearing limit in humans is 20,000 Hz.


In the context of vowel analysis F0 is related to the vowel height, so “high vowels

have a somewhat higher fundamental frequency on the average than low vowels” (Kent 95).

In a speech wave sampled at 10,000 Hz, which means that the maximum frequency6

can be 5,000 Hz, “we can expect to find five7 formants” (Ladefoged, Elements 193), but not

all formants are of the same importance for vowel distinction. The first two formants are “the

most important acoustic parameters for vowel quality distinction” (Harrington, Techniques

60). Clark and Yallop also include the third formant as an important one, and comment that

“higher peaks [of energy] contribute more to the personal voice quality and become

progressively less significant above about 5 kHz” (221). Furthermore, they underline that

formants are not the only important part of vowel identification: “We also depend on the

overall dynamic pattern of syllabic structure to supplement formant information in

establishing the phonological identity of vowel sounds” (239).

In our research, we are primarily interested in F1 and F2, although F3 is available in all

of the data we acquired.
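
The relation between the sampling rate, the maximum representable frequency and the number of formants expected below that limit can be sketched numerically. The sketch below is only an illustration: it assumes a uniform tube closed at one end (a common textbook idealisation of the neutral vocal tract), a speed of sound of 350 m/s, and tract lengths of 17.5 cm and 15 cm. None of these values is taken from our measurements, and the function names are hypothetical.

```python
def nyquist_frequency(sampling_rate_hz):
    """Highest frequency that can be represented at the given sampling rate."""
    return sampling_rate_hz / 2.0


def tube_resonances(tract_length_m, max_hz, speed_of_sound=350.0):
    """Resonances of a uniform tube closed at one end, up to max_hz.

    Uses F_n = (2n - 1) * c / (4 * L); the tube model and the default
    speed of sound are assumptions made for illustration only.
    """
    resonances = []
    n = 1
    while True:
        f = (2 * n - 1) * speed_of_sound / (4.0 * tract_length_m)
        if f > max_hz:
            break
        resonances.append(f)
        n += 1
    return resonances


limit = nyquist_frequency(10_000)          # 10,000 Hz sampling rate
longer = tube_resonances(0.175, limit)     # assumed longer vocal tract
shorter = tube_resonances(0.15, limit)     # assumed shorter vocal tract
print(limit, len(longer), len(shorter))
```

Under these assumptions the longer tube has five resonances below the limit and the shorter tube four, because its fifth resonance falls above 5,000 Hz. Real vocal tracts are not uniform tubes, so the figures are indicative only.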

1.4 Formants in Vowels

Formant distribution (formant patterns in vowels) refers to the specific positions of energy

peaks within vowels. Ladefoged (Elements 108) notes that formant frequencies are influenced

by “three factors:

- the position of the point of maximum constrictions in the vocal tract (... controlled

by the backward and forward movement of the tongue);

- the size of cross-sectional area of the maximum constriction (... controlled by the

movements of the tongue toward and away from the roof of the mouth and the

back of the throat);

- and the position of the lips”.

With the above as our starting points, we can state general rules, such as “F1 varies mostly

with tongue height and F2 varies mostly with tongue advancement” (Kent and Read 92).

Many researchers8 wrote about the influence different places of constriction in the vocal tract

6 The Nyquist theorem.

7 Or four, if a speaker is female (because female speakers, as well as children, have higher pitch). In that case

the fifth formant can be expected around 5,500 Hz. Our results are explained in the chapter about methods.

8 The discussion is a recurring topic in every book about acoustic phonetics we had an opportunity to study.

have on formants.9 Here we give the information provided by Petrović and Gudurić10 (92),

based on research by Gunnar Fant and his findings about the neutral vowel schwa:

1. Constriction in the front part of the vocal tract lowers F1, but increases F2

and F3.

2. Constriction in the back part of the vocal tract increases F1 and lowers F2 and

F3.

3. Labialisation lowers the frequency of all formants.

4. The increase in the aperture makes formant values close to those of the

neutral vowel schwa.

Here is another overview of formant predictions, adapted from Kent and Read (27):

1. All formant frequencies are lowered by labial constriction.

2. All three formant frequencies are raised by a constriction near the

larynx.

3. The curve for F2 has a negative region corresponding to the tongue

constriction for /ɑ/ and a positive region corresponding to the tongue

constriction for /ɪ/.

4. The curve for F3 has negative regions corresponding to constriction at

the lips, the palate and the pharynx. (...)

Ladefoged comments that “the major effect of lip rounding is to lower F3”, while in “the

palatal, velar and uvular regions ... there is much greater effect on F2” (Elements 131). The

reason for the lowering of F3, as well as other formants, in cases where lips are protruded is

the lengthening of the vocal tract, which lowers the resonance frequencies (Kent and Read

24).

Table 1

Formant values in vowels (+ means high, - low). Based on

Kent and Read (92).

9 One of the most influential studies is by Chiba and Kajiyama (1941), on perturbation theory.

10 „1. Сужење пролаза (констрикција) у предњем делу говорног канала проузроковаће снижење Ф1, а

повишење Ф2 и Ф3; 2. сужење пролаза у задњем делу канала утиче превасходно на повишење Ф1, уз

истовремено снижавање Ф2 и Ф3; 3. лабијализација снижава фреквенције свих форманата; 4. повећање

апертуре приближава формантске фреквенције фреквенцијама неутралног гласа ə.“

Vowel   Formant   Difference
Low     F1+
High    F1-
Front   F2+       large F1-F2 difference
Back    F2-       small F1-F2 difference

The correlation between formants and the vocal tract configuration has an important practical

implication. The fact that approximate formant shapes are known, both in Serbian and in

English, enabled us to establish two procedures related to the methodology of this research.

The first was easier reading of the spectrograms: by knowing where to find the energy peaks,

we were able to discriminate between the formant contours, and choose the most probable

location of the formants we wanted to measure.11 The second procedure was related to the

verification of the measurements: by comparing the measurements from the students with the

measurements from the native speaker, we expected to find a significant correlation in results

(which was indeed the case).
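
The general rules summarised in Table 1 can be expressed as a rough sketch that labels a vowel from its first two formants. The reference values of 500 Hz and 1500 Hz are illustrative assumptions (approximately the first two resonances of an idealised neutral vocal tract), not thresholds used in this research, and the input values in the example are hypothetical.

```python
def describe_vowel(f1_hz, f2_hz, f1_ref=500.0, f2_ref=1500.0):
    """Rough height and advancement labels from F1 and F2.

    f1_ref and f2_ref are assumed reference values for illustration;
    real classification would need speaker-specific normalisation.
    """
    # Table 1: low vowels have high F1, high vowels have low F1.
    height = "low" if f1_hz > f1_ref else "high"
    # Table 1: front vowels have high F2, back vowels have low F2.
    advancement = "front" if f2_hz > f2_ref else "back"
    return {
        "height": height,
        "advancement": advancement,
        "f2_minus_f1": f2_hz - f1_hz,  # large for front, small for back vowels
    }


print(describe_vowel(300, 2300))  # hypothetical high front vowel
print(describe_vowel(750, 1100))  # hypothetical low back vowel
```

A two-way split of each dimension is a simplification; it only mirrors the plus and minus signs of the table.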

2. Speech: Segments, Features, Duration

This chapter discusses the relations between temporal organisation of speech, segments, and

features. It explains the theoretical assumptions for the framework of linguistic investigation

of speech, and explains why knowledge of such a framework was important for this

paper.

2.1 Non-Linearity of Speech Elements

Speech, observed in time, is a predominantly constant, uninterrupted phenomenon.

Spontaneous pauses and breaks do happen, but “in two occurrences in the linear production

of speech by a single speaker”, and those pauses are at the start or end of “a speaking-turn”

and the start or end of “an individual utterance” (Laver, Linguistic 157). For the purposes of

linguistic investigation, utterances are divided into smaller units, so the elements of speech

are analysed as if they were individual, and not part of some larger construct (157). Such

elements are phonetic segments and phonetic features.

A speech segment, or “any one of the minimal units of which an utterance may be

regarded as a linear sequence, at either the phonetic or phonological level” (Trask, Dictionary

11 The details about the procedure are in 4.9.

318), has a structure reflected in the different configuration of the speech articulators for the

given segment. This is significant for distinguishing different elements within the same word

in a spectrogram; each segment has a different description of measurable acoustic properties of

the produced sound due to the influence the articulation exerts on the mechanics of sound.

Unlike segments, whose organisation is linear, features are non-linear because “they

can overlap each other in time, and have start-points and end-points which do not necessary

align with those of the chain of segments” (Laver, Linguistic 157). In other words, the

segmental phases are significant in coarticulation and the temporal overview of a segment,

because a feature may or may not be synchronous with the articulatory phases.12

An example of a feature that overlaps segments is nasalisation, where elements of

nasalisation are detectable not only in a nasal but also in the surrounding sounds. The

“unwanted” nasalisation of neighbouring sounds can affect the interpretation of spectrograms

and measurements. That is why the awareness of non-linearity was important for our work,

and that is the reason there are no nasals or approximants, for example, preceding diphthongs

in our corpus (chapter 4.2).

2.2 Segment Quality and Duration

Speakers can mark differences in speech segments either by “changing the phonetic

quality of the sound, or its duration” (161), which are dependent on coordination between

five important sub-processes of speech: initiation and direction of airflow, temporal

organization, intersegmental coordination, articulation, and phonation (161).

Phonation types are “distinctive patterns of vocal fold vibration” (Ogden 25). There are

six of them: voiceless sound, whisper, voiced sound, creaky voice, breathy voice and glottal

stop (Laver, Linguistic 164). In our research, we deal with the analysis of voiced and voiceless

sound phonation, because our topic is related to diphthongs in the fortis/lenis context (which

correlates with the presence or absence of voicing).

Intersegmental coordination describes the relation between the three phases of a segment:

onset, medial and offset phase (164). The phases are defined in terms of the configuration

of the vocal tract. Thus, the onset phase is characterised by an active articulator (for example,

the tongue) attaining the maximum stricture. The offset phase is the part of articulation where

articulators shift from maximum stricture to the medial phase of the next segment. Lastly, the

medial phase is “the time occupied in maintaining the maximum degree of stricture” (164).

12 Using the vocabulary of acoustic phonetics we can describe the overlapping of features as the “formant

frequency patterns [that] may [vary] within and across sound segment boundaries” (Fant 150).

The knowledge of segmental coordination was needed in the spectrogram analysis, because

different measurements required different approaches (chapter 4.9). For example, in

measuring the formant values of a diphthong, we had to find the two medial phases of the

glide; if the total duration of a word was measured, we had to identify the onset phase of the

first segment and the offset of the last.

The second element for identifying differences in segments is duration, which is related to

“certain inherent ... constraints which have physiological or perceptual explanations” (177).

Such properties often have phonological significance, so duration has a role in marking a

different context and meaning. In our research duration is observed as a phonetic

phenomenon: we compared the duration measurements between the RP speaker and the

students, without discussing the possible consequences in the domain of meaning.
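
The kind of duration comparison described above can be sketched as two simple ratios: the duration of a student's diphthong relative to that of the reference speaker, and the share of a word's duration taken up by the diphthong. The function names and the millisecond values are hypothetical and serve only to show the arithmetic.

```python
def duration_ratio_percent(student_ms, reference_ms):
    """Student duration expressed as a percentage of the reference duration."""
    if reference_ms <= 0:
        raise ValueError("reference duration must be positive")
    return 100.0 * student_ms / reference_ms


def diphthong_share_percent(diphthong_ms, word_ms):
    """Share of the whole word's duration occupied by the diphthong."""
    if word_ms <= 0:
        raise ValueError("word duration must be positive")
    return 100.0 * diphthong_ms / word_ms


# Hypothetical measurements in milliseconds, not data from this research.
print(duration_ratio_percent(170, 200))
print(diphthong_share_percent(150, 500))
```

A ratio below 100% means the student's diphthong is shorter than the reference speaker's, and a ratio above 100% means it is longer.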

3. Vowel Sounds

Vowels are speech sounds13 during whose production “the tongue is held at such a distance

from the roof of the mouth that there is no perceptible frictional noise” and “a resonance

chamber is formed which modifies the quality of tone” (Jones, Pronunciation 12). Gimson

defines vowels14 as a “category of sounds ... normally made with a voiced egressive air-stream,

without any closure or narrowing such as would result in the noise component

characteristic of many consonantal sounds” (Introduction 35).

Fant gave a list of several correlates in speech sound classification (Speech pp. 153-155),

and we compiled an overview of properties a sound should have, according to Fant, to

be classified as a vowel/diphthong. Thus, the first condition is that a vowel must have sound

energy visible in the sound spectrum, and that the acoustic energy originates from

vocal fold vibration. A vowel should also have a “vowelike correlate” in speech

production, which means an unobstructed passage of the airstream. Waveform analysis of a

vowelike sound implies that “at least F1 and F2 [are] detectable”, while F3 should be visible if

F1/F2 are not at their lower ends (156). To classify a vowel as a diphthong,15 the speech

13 Of course, they are also discussed in terms of being “purely linguistic units, counters which do a certain job,

irrespective of how they sound” (O’Connor, Phonetics 199) but that is the approach we leave to phonological

discussions.

14 Gimson refers to vowels in the introductory chapters as “the vowel type” of sounds, “described in mainly

auditory terms” (Introduction 35). When discussing the vowel versus the consonant distinction he notes: “It

will be found that the phonemes of a language usually fall into two classes, those which are typically central (or

nuclear) in the syllable and those which are non-central (or marginal). The term ‘vowel’ can then be applied to

those phonemes having the former function and ‘consonant’ to those having the latter.” (53).

15 More on diphthongs in the next chapter.

sound must satisfy the “glide” correlate, which in the production context means “moderate

speed within a segment”, seen as a “relatively slow [spectrum change] rate but faster than for

mere combination of two vowels” (156). We believe that figure 4 shows a spectrogram of a

diphthong, satisfying all of Fant’s requirements for the classification.

We will give one more description of vowels,16 as described by Laver, who says that

two of the distinctions for classifying speech sounds are place of articulation and degree of

stricture, both related to the medial phase17 of a segment. Place of articulation refers to “the

location of the articulatory zone in which the active articulator is closest to the passive

articulator during the medial phase of a segment” (166). Degree of stricture identifies the

degree of closure between the two articulators in the medial phase. Thus, he defines vowels

as a group of sounds articulated in places of neutral articulation (167), when “the potential

active articulators ... lie in their neutral anatomical position” (166) opposite their passive

articulators. In the discussion of the degree of stricture Laver says that in resonants “the stricture

is one of open approximation” (168), allowing unrestrained passage of energy from the vocal

folds.

3.1 Vowels in English

The English language has a very rich vocalic system, which consists of 21 distinctive

vowels.18 The vowel sounds are listed in the table below, using the newer IPA notation.

Table 2

The English vowels with examples (O’Connor, first edition 1973)

No.  O’Connor  Examples
1    i:        see, unique, feel
2    ɪ         wit, mystic, little
3    e         set, meant, bet
4    æ         pat, cash, bad
5    ɑ:        half, part, father
6    ɒ         not, what, cost
7    ɔ:        port, caught, all

16 Laver (pp. 167-172) gives a detailed description of several articulation aspects, but we focused on only those

most relevant to our preliminary notes.

17 The segments were defined in chapter 2.

18 Or 22 depending on the classification used.

8    ʊ         wood, could, put
9    u:        you, music, rude
10   ʌ         bus, come, but
11   ɜ:        bird, word, fur
12   ə         alone, butter
13   eɪ        lady, make
14   əʊ        go, home
15   aɪ        my, time
16   ɑʊ        now, round
17   ɔɪ        boy, noise
18   ɪə        here, beard
19   ɛə        fair, scarce
20   ɔə        more, board
21   ʊə        pure, your

Gimson (Introduction 90) sorts English vowels into three groups: short, long “relatively

pure” and long “diphthongal glides, with prominent 1st element”.

Table 3

Short and long monophthongs in English

Short   ɪ   e   æ   ɒ   ʊ   ʌ   ə
Long    i:  u:  ɑ:  ɔ:  ɜ:

A diphthong, the speech sound in the primary focus of our research, is defined by Jones as “a

sound made by gliding from one vowel to another ... represented phonetically by sequence of

two letters” (Pronunciation 22). A sound realised as a diphthong marks “a change from one

vowel quality to another, and the limits of the change are roughly indicated by the two vowel

symbols” (O’Connor, Phonetics 155). It is important to note that even though a diphthong is

“... phonetically a vowel glide or a sequence of two vowel segments [it] ... functions as a

single phoneme” (220).

Figure 4. The spectrograms of /ɑɪ/, as spoken in “dies” by one of the Serbian

speakers (the first spectrogram), and the referent RP speaker

The critical property of diphthongal realisation of a sound is movement, when “the

organs of speech perform a clearly perceptible movement” (Jones, Outline 63). Gimson notes

that diphthongs, or “diphthongal vowel sounds” (Introduction 39) are sounds “which have a

considerable voluntary glide”. They are “the sequences of vocalic elements ... which form a

glide within one movement” (126).

The movement in a diphthong starts from the first element, which is usually a pure

vowel (127) and reaches an approximate value of a vowel indicated by the second element or

“the point in the direction of which the glide is made” (126). The point of direction, whether

on the cardinal vowel diagram, or the tongue in the mouth, enables classification of the RP

diphthongs into two groups: closing and centring (Jones, Pronunciation 23-24):

Table 4

Classification of diphthongs into closing and centring

Diphthong class   Diphthongs
Closing           eɪ, ɔʊ, ɑɪ, ɑʊ, ɔɪ
Centring          ɪə, ɛə, ɔə, ʊə

The first element in RP diphthongs is usually [ɪ, e, a, ʊ, ə], while the second is [ɪ, ʊ, ə]

(Gimson, Introduction 126). However, one of the characteristics of diphthongs is great

regional variety.19

Diphthongs can also be divided into groups based on the vowel to which they

gravitate in the second element. Thus, we have groups that have /ɪ/, /ʊ/ and /ə/ as the second

element.

Table 5

Diphthongs in RP English

Long vowels / diphthongal glides

[ɪ]   eɪ, ɔɪ, ʊɪ
[ʊ]   əʊ, ɑʊ
[ə]   ɪə, ɛə, ɔə, ʊə
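
The two classifications described above (closing versus centring, and grouping by the second element) can be sketched as a small lookup. The sketch uses the eight diphthongs analysed in this paper; the function names are hypothetical, and the rule simply inspects the last symbol of each transcription.

```python
# The eight diphthongs analysed in this paper.
DIPHTHONGS = ["eɪ", "aɪ", "ɔɪ", "əʊ", "ɑʊ", "ɪə", "ɛə", "ʊə"]


def classify(diphthong):
    """Return 'closing' or 'centring' based on the second element."""
    second = diphthong[-1]
    if second in ("ɪ", "ʊ"):
        return "closing"   # glide towards a closer vowel
    if second == "ə":
        return "centring"  # glide towards schwa
    raise ValueError(f"unexpected second element: {second}")


def group_by_second_element(diphthongs):
    """Group diphthongs by the vowel towards which they glide."""
    groups = {}
    for d in diphthongs:
        groups.setdefault(d[-1], []).append(d)
    return groups


for d in DIPHTHONGS:
    print(d, classify(d))
print(group_by_second_element(DIPHTHONGS))
```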

In this paper we focus on Received Pronunciation, and the examples about the

sounds do not include different variants of pronunciation (whether in the UK itself, or in the

USA, Australia or elsewhere). The RP vowels of English, placed on the Cardinal Vowel diagram, are

19 Regional variations of diphthongs are not discussed in this paper.

displayed below. The image is drawn based on the overview given by O’Connor in his

Phonetics.

Figure 5. Vowels in English20

We can use the diagram to provide details about the sounds involved. The phoneme /i:/ often

has the quality of a diphthong (O'Connor 154), which depends on the accent. The arrow on

the diagram marks the approximate final location of the sound in diphthongal realisation. The

phoneme /ɪ/ is short and monophthongal. The phoneme /e/ is “in RP ... generally realised ...

as a short, front vowel between cardinals [e] and [ɛ]” (O’Connor 156), while /æ/ is also a

short vowel, but between cardinals [ɛ] and [a]; it is usually realised as a monophthong. The

phoneme /ʌ/ is a “short almost open central vowel”, while /ɑ:/ is an “open, rather back

vowel” (O’Connor 157-8). The phoneme /ɒ/ is pronounced by speakers of RP as “a short,

back, open or almost open vowel” (158). In a word such as caught there is the phoneme /ɔ:/.

In the diagram, /ɔ:/ is just below the cardinal vowel [o]. The dashed line pointing towards

the more central position illustrates the fact that many speakers do not make a distinction

between a monophthong /ɔ:/ and a diphthong /ɔə/. In such cases, the speakers “nevertheless

use a diphthong [ɔə] ... before pause” (160). The consequence is that “both saw and sore are

pronounced [sɔə] and both caught and court are pronounced [kɔ:t]” (160). The phoneme /ʊ/

is somewhat more centralised than cardinal [o], and it shows a relatively constant

20 Figures 5, 6 and 7 were derived from O’Connor’s Phonetics.

pronunciation in dialects (162), unlike most of other vowels. About /u:/ O'Connor notes that

it “most often has a diphthongal realisation ... but it may be given a monophthongal

pronunciation slightly lower and more central than cardinal [u]” (162). The diphthongal

property of the vowel is indicated by an arrow in the graph. The phoneme /ɜ:/ is “typically a

long, mid, central vowel”, but in rhotic accents (American English, for example) this vowel,

in the sequence /ər/, is replaced by the retroflex [ɹ], e.g. in bird (163). The phoneme /ə/ has

“two major allophones in RP, one central and half-close which occurs in non-final

positions..., and one central and about half open which occurs before pause ...” (the example

for the first variant is about, and for the second sailor) (164).

Figure 6 shows the approximation of the diphthongs /eɪ/, /aɪ/, /əʊ/ and /ɑʊ/ in

Received Pronunciation. Diphthong /eɪ/ starts “from slightly below the half-close front

position and moves in the direction of RP /ɪ/” (Gimson, Introduction 128). The beginning of

this diphthong is between cardinals [e] and [ɛ]. The first element of the diphthong /aɪ/ “varies

from central to front” (O’Connor 167) or, in Gimson’s description, it is “slightly behind the

front open position i.e. C[ä]” (Introduction 129). The glide ends with RP /ɪ/ position.

The first element of /ɔɪ/ in RP is pronounced very close to cardinal [ɔ] and the second,

after the configuration changes, is close to the pronunciation of /ɪ/ (O’Connor,

Phonetics 169). In this glide “the range of closing ... is not as great as in /aɪ/ ...” and “the jaw

movement ... may not ... be as marked as in the case of /aɪ/” (Gimson, Introduction 131). This

diphthong can be seen as asymmetrical in the RP system, since it is the “only glide of this

type with a back starting point” (132).

The realisation of diphthong /əʊ/ starts with the articulators positioned for “typical RP

[ɜ:] position”, while afterwards the tongue moves “slightly up and back to RP [ʊ], but the

starting point may vary ...” (O’Connor 167). In conservative pronunciation this diphthong

starts “in a more retracted region”, near centralised (or centralised-open) [o], “and the whole

glide is accompanied by increasing lip-rounding” (Gimson, Introduction 133). In an affected

variant, the diphthong starts with more centralised-closed [ɜ] position (134). Also, “in many

speakers of general RP, the 1st (central) element is so long that there may arise for a listener a

confusion between /əʊ/ and /ɜ:/, especially when [ɫ] follows, e.g. goal, girl ... ” (134).

The diphthong /ɑʊ/ starts “further back than /aɪ/ and changes towards RP /ʊ/”

(O’Connor, Phonetics 168); Gimson describes it as starting “slightly more fronted ... than RP

/ɑ:/” (Introduction 136). Another dominant diphthong in the back region is /əʊ/, so /ɑʊ/ has

to be pronounced with a perceivable difference – for this reason no raising is possible without

losing the contrast, and so “fronting or retraction” (136) prevails in the variants of /ɑʊ/.

We will now describe the centring diphthongs /ɪə/, /ɛə/ and /ʊə/. Diphthong /ɪə/ starts

with the tongue positioned for /ɪ/. In the second part of the pronunciation, the movement is of

two types. The first is “the more open variety of /ə/ when /ɪə/ is final in the words”, while in

the second type, in non-final positions,21 the movement is not so extensive (Gimson,

Introduction 142). The two pronunciations are, in essence, “two main allophones of /ɪə/ in

RP, corresponding to those of /ə/” (O’Connor, Phonetics 170).

Diphthong /ɛə/ “starts at cardinal /ɛ/ or below and moves to more central but equally

open position” (171). Gimson adds that, when final, /ə/ acquires a more open position, while

in the cases when /ɛə/ is “closed by a consonant”, /ə/ is of “mid ... type” (Introduction 143).

The variants are mostly in the degree of openness of the first element (143).

The glide /ʊə/ has “coalesced with /ɔ:/ for most RP speakers” (Gimson, Introduction

145) and “[a] monophthongal pronunciation is ... found regularly before /r/ in, e.g. alluring,

furious, having the quality of the diphthong’s beginning point” (O’Connor, Phonetics 172).

Gimson also gives an overview of the monophthongal pronunciation, such as in the words

your, Shaw or sure, but warns “that such lowering or monophthongization of /ʊə/ is rarer in

case of less commonly used monosyllabic words such as moor, tour, dour” (Introduction

145). The diphthong is pronounced with the first element around /ʊ/, while the second

element reaches a “more open type of /ə/” (144).

21 In the words used in this research, most of the instances of /ə/ are non-final.

Figure 6. The closing diphthongs

A characteristic relevant to our research is the prominence of the diphthongal

elements and the length of the sound. With the exception of the falling diphthongs, “most of

the height and stress associated [with the sound] is concentrated on the 1st element, the 2nd

element being only lightly sounded” (126). The length of the diphthongs is the same as in

long pure vowels, which means they are affected by the same syllabic fortis and lenis rules.

Figure 7. The centring diphthongs

Harrington describes a study based on the hypotheses by Pols, about classification of

diphthongs applied in American English by Cottinfield, and the importance of the targets for

the classification. The first hypothesis is about “dual target” (or onset plus offset), the second

about “onset plus slope”, while the third involves “onset plus direction”. According to the

first hypothesis, “both diphthong targets are critical for identification [of a diphthong]”, while

the second claims that “quality is presumed to depend on the first target”; the third hypothesis

postulates that “the first target and the direction of spectral movement” are the biggest

contributors to diphthong recognition (Techniques 66).
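
The quantities involved in the three hypotheses can be illustrated with a short sketch that derives them from a formant track. It is a simplification under stated assumptions: the first and last samples of the track stand in for the two targets, the slope is the overall change in Hz per second, and the direction is an angle in the F2/F1 plane. The function name and the sample values are hypothetical and are not taken from the study described by Harrington or from our data.

```python
import math


def diphthong_features(f1_track, f2_track, duration_s):
    """Derive onset, offset, slope and direction from formant tracks.

    f1_track and f2_track are equally spaced formant samples in Hz.
    Taking the first and last samples as targets is an assumption
    made for illustration only.
    """
    onset = (f1_track[0], f2_track[0])
    offset = (f1_track[-1], f2_track[-1])
    d_f1 = offset[0] - onset[0]
    d_f2 = offset[1] - onset[1]
    slope = (d_f1 / duration_s, d_f2 / duration_s)     # Hz per second
    direction = math.degrees(math.atan2(d_f1, d_f2))   # angle in the F2/F1 plane
    return {
        "dual_target": (onset, offset),
        "onset_plus_slope": (onset, slope),
        "onset_plus_direction": (onset, direction),
    }


# Hypothetical track: F1 falls and F2 rises over 200 ms.
features = diphthong_features([800, 600, 400], [1200, 1700, 2200], 0.2)
for name, value in features.items():
    print(name, value)
```

Each entry of the returned dictionary corresponds to the information that one hypothesis treats as sufficient for identifying a diphthong.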

3.2 Vowels in Serbian

In most descriptions of the phonetic system of the Serbian language five vowels are cited. These

are [a], [e], [i], [o] and [u]. The vowels [i] and [e] are classified as front, [a], [o], [u] as back

(Stanojčić 31). Table 6 shows the traditional classification of the Serbian vowels, together

with information about openness and height. Figure 8 shows the vowels, placed on the F1/F2

plane. The information is extracted from an appendix22 in the research done by Marković.

22 It includes 450 measurements of Serbian long and short vowel contexts.

Table 6

Serbian vowels, adapted from Stanojčić (31)

Tongue/mouth cavity Front part Bottom part Back part

High [i] [u]

Mid [e] [o]

Low [a]

Figure 8. Vowels of the Serbian language, derived from the Marković corpus

Figure 9 shows formant values for Serbian, taken from three sources.23 In this paper

we will refer to the Marković corpus, because of the speakers that were included in that corpus.

23 The Ivić-Lehiste and Gudurić corpora were taken from Fonologija srpskog jezika.

Only female students were included in the Marković corpus, which explains the

differences between the graphical representations of the corpora.

Figure 9. Vowel spaces from three corpora in Serbian: Ivić-Lehiste,

Gudurić, Marković (Marković contains the values from female speakers

only, with long/short vowel context, here joined in one graph)

The Serbian language features four accent types, of which two are falling and two rising. Rising

and falling accents are also present in the non-accented syllables. Combined, they form three

types of contrast (Petrović 115). The first contrast is by quantity, the second by quality and

the third by the placement of the accent. In Serbian, falling accents are realised on one

syllable only. In polysyllabic words such an accent can be placed on the first syllable; the rising

accent is found on the first and mid syllables of polysyllabic words, while the last syllable

does not have an accent (117).

Table 7

Accents in Serbian (Stanojčić 39)

Description Symbol Example

Short falling pesma

Short rising ` sloboda

Long falling ˆ sunce

Long rising ´ glava

Marković claims that “... accents have been, traditionally, in the literature about Serbian

phonology, viewed as separate, prosodic characteristics, and separated from so called

inherent distinctive features”24 (36). We will not discuss Serbian accents and their role in the

length of words, but our initial results about the vowel space of the Serbian students imply

that the English space was influenced by long Serbian vowels, rather than short (chapter

5.3.4).

4. Methods

4.1 Diphthong Selection

The single most important task in the preparatory phase was to single out diphthongs for the

research. We selected the diphthongs after consulting several authoritative sources. Daniel

Jones (Outline 99) lists 12 diphthongs he considers important in RP. According to him there

are “essential” and “nonessential” diphthongs, 10 of which are the important ones.

O’Connor mentions 9 diphthongs (Phonetics 153), but indicates, just like Jones, that /ɔ:/ and

/ɔə/ are not separated in pronunciation (“relatively few RP speakers make a contrast”). Thus,

/ɔə/ is non-essential. Gimson (Introduction 98) does not mention /ɔə/ in his main overview,

and lists only /eɪ/, /aɪ/, /ɔɪ/, /əʊ/, /ɑʊ/, /ɪə/, /ɛə/, /ʊə/. Several other sources list the same

classification, and we decided to select the above eight diphthongs for the research.

4.2 Corpus Compilation

Each of the eight diphthongs in the corpus was represented by 4 words, two of which had

long and two short versions of the diphthong, reflecting the lenis-fortis distinction.

24 “Акценти се традиционално у литератури о фонологији српског језика посматрају као засебна,

прозодијска обележја и раздвајају се од такозваних инхерентних дистинктивних обележја” (Marković 36)


Since the corpus was designed for recording and digital analysis, particular attention
was given to the problem of coarticulation25 (Kent and Read 111; Fant, Speech 143; Crystal
52). To get signal files with usable formant contours, we chose words in which the
diphthongs were not nasalized (Fant 149) and did not precede laterals or liquids (151). In
addition, the words were monosyllabic wherever we could find words matching the criteria. We
also wanted the voiced and voiceless plosives to be the same or at least homorganic26.

Fortis and lenis relate to the force of pronunciation. Clark and Yallop cite Pike’s
comment that fortis articulation “‘entails strong, tense movements ...
relative to a norm assumed for all sounds’” (Introduction 89). If we postulate that fortis relates
to strong sounds, it follows that lenis relates to weak ones (O’Connor, Phonetics 129). The
main distinction between the two is in the energy distribution, which is important because “in
certain situations, the voice oppositions may be lost, so the energy of articulation becomes a
significant factor” (Gimson, Introduction 32). This “implies that, no matter whether the
sounds are fully voiced, partially voiced or completely without voice, there will always be a
difference between them on an energy basis” (O’Connor, Phonetics 140).

Table 8

Lenis and fortis consonants in English (O’Connor 129)

Fortis /p/ /t/ /k/ /ʧ/ /f/ /θ/ /s/ /ʃ/

Lenis /b/ /d/ /ɡ/ /ʤ/ /v/ /ð/ /z/ /ʒ/

The fact that neighbouring consonants are important for energy distribution in vocalic
sounds had a critical role in the preparation of this paper. Since fortis and lenis consonants
directly influence the length of vowels, we used the lenis and fortis distinction in the selection
of the words for our corpus. The differentiation was possible because “when the vowels are
followed by a strong consonant they are shorter than when they are followed by a weak
consonant”27 (O’Connor, Better 102). To get a significant number of items for conclusive
results, 16 of our corpus words feature fortis consonants and 16 lenis consonants after the
diphthongs.

25 Assimilation is also one of the problems, but since we are not interested in the boundaries between words,
where this problem appears, we did not list it.

26 The instances when “the oral stricture is at the same place of articulation” (Laver 169).

27 This sentence continues: “... even so the vowel i: is always longer than the vowels i and e in any one
set”.


Table 9

The corpus: the selected words, categorised by the length of their diphthongs (fortis versus lenis)

Diphthong  Short              Long
/eɪ/       bait, dais         babe, daze
/aɪ/       bite, dice         bide, dies
/ɔɪ/       doit, Joyce        toyed, joys
/əʊ/       boat, joke         bode, Job
/ɑʊ/       douse, doubt       bows, gouge
/ɪə/       idiot, fierce      idiom, fears
/ɛə/       daresay, thereto   dare, theirs
/ʊə/       bourse, graduate   gourd, abjured

To produce shortlisted words that met the coarticulation rules, we used two databases,
BEEP and Moby Hyphenation. To use the databases in this preparatory step, we had to write
a program to search the files28.
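The kind of search described above can be illustrated with a minimal Python sketch. It is not the FONRYE source, which is not reproduced in this paper. It assumes a dictionary with one entry per line in the form `WORD phone phone ...`, with phone symbols as in the “Simple” column of Table 10; the function names, the consonant symbol sets and the sample entries are hypothetical.

```python
# Hypothetical sketch of a pronunciation-dictionary search (not the FONRYE code).
# Assumed entry format: "WORD phone phone ...", one entry per line.
DIPHTHONGS = {"ey", "ay", "oy", "ow", "aw", "ia", "ea", "ua"}
FORTIS = {"p", "t", "k", "ch", "f", "th", "s", "sh"}
LENIS = {"b", "d", "g", "jh", "v", "dh", "z", "zh"}
EXCLUDED_NEIGHBOURS = {"m", "n", "ng", "l", "r"}  # nasals, laterals, liquids


def parse_entry(line):
    """Split 'WORD ph1 ph2 ...' into (word, [phones])."""
    word, *phones = line.split()
    return word.lower(), phones


def classify(phones, diphthong):
    """Return 'short', 'long', or None for one pronunciation.

    'short' means the diphthong is directly followed by a fortis consonant,
    'long' that it is directly followed by a lenis consonant. Entries in which
    the diphthong touches a nasal, lateral, or liquid are rejected.
    """
    if phones.count(diphthong) != 1:
        return None
    i = phones.index(diphthong)
    before = phones[i - 1] if i > 0 else None
    after = phones[i + 1] if i + 1 < len(phones) else None
    if before in EXCLUDED_NEIGHBOURS or after in EXCLUDED_NEIGHBOURS:
        return None
    if after in FORTIS:
        return "short"
    if after in LENIS:
        return "long"
    return None


def shortlist(lines, diphthong):
    """Group dictionary words by the length class of the given diphthong."""
    result = {"short": [], "long": []}
    for line in lines:
        if not line.strip():
            continue
        word, phones = parse_entry(line)
        label = classify(phones, diphthong)
        if label:
            result[label].append(word)
    return result


sample = ["BAIT b ey t", "BABE b ey b", "BANE b ey n", "DAZE d ey z", "BEE b iy"]
print(shortlist(sample, "ey"))
# {'short': ['bait'], 'long': ['babe', 'daze']}
```

A shortlist produced this way would still need manual review, since criteria such as syllable count and homorganic pairing are not covered by the sketch.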

As expected, diphthongs of greater lexical distribution were represented in the corpus
by word pairs matching most of our conditions, while for the less frequent ones we had to
be flexible in the pair selection; that is why we included fierce/fears (no plosives) and
graduate/abjured (polysyllabic words). Table 9 shows an overview of the corpus. Table 10 is
a more detailed view of the corpus, including the unique strings identifying each word (and
file name).

The words were embedded into the sentence “The word WORD is spoken”, and the

compiled randomised list was used in the recording session.

Table 10

The corpus used in the research

Diphthong  ID   Word      Simple transcription  IPA transcription
/eɪ/       1.   bait      b ey t                /beɪt/
           2.   dais      d ey s                /deɪs/
           3.   babe      b ey b                /beɪb/
           4.   daze      d ey z                /deɪz/
/aɪ/       5.   bite      b ay t                /baɪt/
           6.   dice      d ay s                /daɪs/
           7.   bide      b ay d                /baɪd/
           8.   dies      d ay z                /daɪz/
/ɔɪ/       9.   doit      d oy t                /dɔɪt/
           10.  Joyce     jh oy s               /dʒɔɪs/
           11.  toyed     t oy d                /tɔɪd/
           12.  joys      jh oy z               /dʒɔɪz/
/əʊ/       13.  boat      b ow t                /bəʊt/
           14.  joke      jh ow k               /dʒəʊk/
           15.  bode      b ow d                /bəʊd/
           16.  Job       jh ow b               /dʒəʊb/
/ɑʊ/       17.  douse     d aw s                /dɑʊs/
           18.  doubt     d aw t                /dɑʊt/
           19.  bows      b aw z                /bɑʊz/
           20.  gouge     g aw jh               /gɑʊdʒ/
/ɪə/29     21.  idiot     ih d ia t             /ɪdɪət/
           22.  fierce    f ia s                /fɪəs/
           23.  idiom     ih d ia m             /ɪdɪəm/
           24.  fears     f ia z                /fɪəz/
/ɛə/       25.  daresay   d ea s ey             /dɛəseɪ/
           26.  thereto   dh ea t uw            /ðɛəˈtu/
           27.  dare      d ea                  /dɛə/
           28.  theirs    dh ea z               /ðɛəz/
/ʊə/30     29.  bourse    b ua s                /bʊəs/
           30.  graduate  g r ae jh ua t        /gradʒʊət/
           31.  gourd     g ua d                /gʊəd/
           32.  abjured   ax b jh ua d          /əbˈdʒʊəd/

28 We named the program FONRYE, after Serbian “fonetski rječnik” or “phonetic dictionary”, and made it
available at http://languagebits.com/files/fonrye033.zip (version 0.3.3).

4.3 Questionnaire

The topics covered in the questionnaire were the mother tongue and the details of learning

English. The aim was to select a representative group of Serbian speakers and to acquire

29 We could not find monosyllabic words meeting the requirements, so polysyllabic words were selected and
paired homorganically.

30 We could not match the properties in the monosyllabic pairs.


additional information that could explain possible significant differences in the

measurements.31

After reviewing the supplied answers, we selected a group of 21 students as potential
speakers for our corpus, 15 of whom were asked to record the sentences. The complete
overview of the gathered results is in the Appendix.

4.4 Speakers

The subjects of this research were female freshmen students at the English Department.32

Most of them grew up in Novi Sad or in the surrounding areas. This gave us a
relatively uniform sample of the Serbian language variant spoken in Novi Sad and,
consequently, a uniform influence of the mother tongue33 on the pronunciation of English.

The number of students who filled in the questionnaire was 57. The mother tongue of
one of the students was English, so her results were excluded from the final overview. After
the questionnaire data was compared, the number of students eligible for the recording
was 21. They were divided into three groups. The first group, Group A,
consisted of 15 students who reported Serbian as their mother tongue and had
spent at least 15 years in Novi Sad. Group B consisted of one student only. Her mother tongue is
Hungarian, while she speaks Serbian with her family. The third group, Group C, consisted of
5 students; their mother tongue is Serbian, and they were born in Novi Sad but had spent less
than 15 years there.

Only female students were included in the research. The reason was that the male
students were under-represented: out of 57 students who filled in the questionnaire, 7 were
male. After consultations and establishing the recording schedule, we managed to record
the sentences as spoken by 15 students. Twelve of the students were from Group A, and 3 from
Group C. Since most of them (80%) had spent at least 15 years in Novi Sad, and the minority
(20%) were born in the same city, we believe that this group is a good representation of the
dialect spoken in Novi Sad.

31 At least this was the intention at the beginning, but we had so many results to write about that notes about the
individual pronunciation differences were left for some other research. We do refer to them briefly in the part
about segmentation.

32 We would like to express our gratitude to all of the students who took part in the research.

33 To some students it was the second language.


Table 11

The three groups of students and the number within each group that participated in the recording

Group      A     B    C
Students   12    0    3
%          80%   0%   20%

Our control speaker, a native speaker of RP English, is Ms Helen Zaltzman. Helen is an
award-winning author and broadcaster, born and raised in Tunbridge Wells, Kent, United Kingdom.
Her accent is distinctively RP. Since 2002 she has been living in London. She cited Italian as
her second language, which she uses actively.

4.5 Corpus Training

Before the actual recording, we gave the speakers a short introduction to acoustic
phonetics. We also explained to the students how the recording would take place and what
they were expected to do.34 We prepared two handouts containing the most significant
information.

The first handout was labelled “Acoustic phonetics: Research, Corpus Recording”35.
The introductory part was about the study of phonetics. The second part listed example
sentences (the “template” sentence with the embedded sample words). In the third section of
the handout, we presented the SNTRecorder software: it was a short manual about the steps of
recording sentences. We noted that the sentences should be pronounced without rushing, and that
potential re-recording should not worry or discourage the speakers. They were also reminded not to
memorise the order of the words, since the list was to be randomised in every recording
session.

The second handout, titled “Words for Recording”, contained all the words for the
recording. This time the words were given in a table, without the context of the template
sentence. The table listed isolated words, followed by the IPA pronunciation keys, a sample
sentence from a real-world text, and similar words (words containing the same diphthong).
The introductory part contained the instructions for the recording practice: the speakers were
to read all the words aloud, paying attention to the IPA notation, first the main words,
followed by the similar words.

34 For this preparatory work with students, Phonetic Analysis of Speech Corpora (Harrington) was of great help.

35 The handouts were written in Serbian.


Not all of the 32 words were accompanied by an example sentence. We used Princeton’s
database (a digital dictionary) and WordWeb Pro Dictionary to isolate the most frequent
English words, for which no example was provided. However, some exceptions were allowed
for the words we suspected could be less known to the Serbian students.

The purpose of the table was to prepare the students for the recording, by giving them

not only pronunciation keys, but also an example from which they could observe the

meaning. We were careful not to pronounce the corpus words ourselves to avoid students

imitating us.

Students were not told the precise purpose of the experiment, so as to
avoid biased pronunciation.

4.6 Recording

The recording took place36 at the Faculty of Philosophy, in a room with an improvised studio.
Professional recording equipment was used, to which an AKG C 1000S microphone was
attached.

The students were given a low-noise laptop, on which SNTRecorder was running.
Most of the students had no problems with the process, and we were able to get as
natural-sounding pronunciation as possible in the given environment. In this setting, 12 students
were recorded.

Due to vis major we had to use a portable voice recorder to record the last 3 students,

and opt for MP3 recording. A professional Roland Edirol R-09 model was used. The

recording took place in a soundproof room.

The files recorded with the AKG microphone were saved as the signal files sampled

at 41.000 Hz, with a depth of 32 bits, stored as uncompressed waveform audio file format

(WAV). When the R-09 was used, the sampling frequency remained the same, but signal files

were saved as MPEG 1 Layer 3 format (MP3), with a bitrate of 128 kilobits per second. The

MP3 sampling was confidently above the threshold of 60 bits per second, which was a

technical minimum for our research (Van Son). We did not expect significant
differences in the data due to the 20% of the sound files that were recorded by the R-09.

The native speaker recorded the corpus in a semi-professional studio, using
uncompressed PCM format, sampled at 44.1 kHz in 16 bits.37

4.7 Signal Files Manipulation

36 We would like to thank Mr Jaroslav Kovač, the recording technician, for his great help during this process.

37 Unfortunately, we do not know the type of microphone used.


After the recording was finished, the files were opened in Audacity. Excessive pauses between
words, repetitions, and the students’ self-corrections were deleted. Each signal file was
verified against the corresponding log file (chapter 7.3) and by the student’s self-introduction at the
beginning. The introduction was removed from the final signal files for privacy
reasons. To keep track of the speakers’ profiles,38 a file naming convention was
adopted:

NO-speaker-ID

NO stands for the speaker’s number, determined by the order of the recording (1 for
the first student recorded, 15 for the last), and ID stands for the speaker’s initials.

The three MP3 files were decompressed. All signal files were saved as mono files
sampled at 41.000 Hz, 16 bits. No processing that could alter the voice data (for example,
noise removal) was applied to the tracks.

4.8 Software

Praat was our primary analysis tool, followed by R, used for statistical computations.

Audacity was used for basic sound data manipulation. Two custom pieces of software were

written to facilitate the corpus building and the recording session: Fonrye and SNTRecorder,

respectively.

Audacity, a free and open-source program for editing digitally stored sound data, was
used to cut and export signal files. Thanks to its ability to work with multiple tracks and
annotations, it was of great help in preparing the signal files for segmentation in Praat.
Audacity was also used to decompress the MP3 data. Although the compressed sound data
could have been unpacked in Praat, we decided to dispose of the MP3
format right away and work with WAV immediately. The resulting decompression in
Audacity and Praat should be the same, considering that both programs use the same MP3
decompression library (MAD39). Although it has been shown that properly compressed signal files
can, to a satisfying extent, be used in research like this (Van Son), we had to be careful
about decompressing the data to WAV files, because a low-quality decoder could alter our
data.

38 This included the exact order of the randomised sentences, the upper limit settings in Praat and the unique
depersonalised identification string in the measurements.

39 The software extension used both in Praat and in Audacity. Its MPEG audio decoder compliance was a
guarantee that the decoding would be done in accordance with the ISO/IEC 11172 international standard
(http://www.underbit.com/resources/mpeg/audio/compliance/).


We used Praat to acquire most of the information from our signal files
(formant, intensity, pitch, and time). Thanks to the program’s scripting features, we
developed custom scripts that compiled data in accordance with segments and labels (chapter
4.9). The data was afterwards exported to tab-delimited text files.
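As an illustration of how such tab-delimited exports can be processed, the following minimal Python sketch computes the mean duration per interval label. It is only a sketch: the column names (`speaker`, `label`, `start`, `end`) and the sample values are assumptions made for the example, not the actual layout or data of our export files, and our own analysis was carried out in R.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical export layout; column names and values are illustrative only.
SAMPLE = (
    "speaker\tlabel\tstart\tend\n"
    "01\tey_s\t1.200\t1.350\n"
    "01\tey_l\t4.100\t4.350\n"
    "02\tey_s\t1.000\t1.170\n"
    "02\tey_l\t3.900\t4.130\n"
)


def mean_durations(handle):
    """Return the mean duration in seconds for each interval label."""
    durations = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        durations[row["label"]].append(float(row["end"]) - float(row["start"]))
    return {label: mean(values) for label, values in durations.items()}


means = mean_durations(io.StringIO(SAMPLE))
for label, value in sorted(means.items()):
    print(f"{label}\t{value:.3f}")
```

With a real export, `io.StringIO(SAMPLE)` would be replaced by an opened file object.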

R is a statistics-programming package. We encountered the use of R in experimental
phonetics in Harrington’s Phonetic Analysis of Speech Corpora. The book primarily
describes the use of EMU and EMU-R (the software extension for linking EMU and R).
Although we liked the combination of EMU and R, we decided to choose Praat for acquiring
data, even though that required additional work in scripting and programming. The scripts in
Praat produced text-table data that was easily imported into R for analysis. R was also used
to create some of the most conclusive images in the Paper.

SNTRecorder is the second piece of software written especially for the research. It was used
during the recording session by each speaker. With this program, we solved the potential
problem of reading the sentences too slowly or too quickly. In addition, if some sentences needed
re-recording, the program had an option to select those sentences. SNTRecorder took a project
file as the input. The project file consisted of the template sentence and the word list. When
the project loaded, the program created the full sentences based on the template and the list.
The full sentence list was then shuffled, so each speaker had a unique recording set. Parallel
to this, the program created a time list, a unique list of randomly chosen intervals of 1 to 4
seconds. We decided to randomize the time intervals so that our speakers would not develop a rhythm
in pronunciation. The program40 was used as follows:

1. The project is loaded and the program creates the sentences and the time list, and then

shuffles them.

2. A user sits in front of the screen and enters their initials.

3. The program shows a sentence and a red line in the lower part of the sentence screen

(figure 10). The sentence is crossed out at this step, as a signal to the speaker to read

the sentence without saying it aloud.

4. The randomly41 selected time for the current sentence elapses, the red line changes
to green, and the text appears normal.42

40 We wrote SNTRecorder in the Python programming language. The user interface was created in Tkinter, the Tk
interface for Python.

41 This proved to be very useful, combined with random pauses between words. The only condition for students
was not to speak during the display of the red line, so they read without pressure (as much as the situation
allowed) and they were unable to hasten the process.


5. The speaker pronounces the sentence and presses the “next” key to continue.

6. Steps 3 through 5 repeat until all sentences are recorded.

7. The program informs the user that the current session is over and asks whether any items
need to be repeated. If the answer is yes, a window is shown for selecting the sentences, and
steps 3-6 are repeated for the selection. The session ends once no re-recording is needed.

8. A new user is ready and the cycle restarts.
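The preparation in step 1 can be sketched in a few lines of Python. This is not the SNTRecorder source, only a minimal illustration of the idea: the function name `build_session`, the use of continuous (rather than whole-second) pauses and the fixed random seed are assumptions made for the example.

```python
import random

# The template sentence used in the recording sessions.
TEMPLATE = "The word {word} is spoken"


def build_session(words, rng, min_pause=1.0, max_pause=4.0):
    """Return a shuffled list of (sentence, pause_seconds) pairs.

    Each word is embedded in the template sentence, the order is shuffled
    for the speaker, and every sentence receives its own random waiting time.
    """
    sentences = [TEMPLATE.format(word=w) for w in words]
    rng.shuffle(sentences)
    pauses = [rng.uniform(min_pause, max_pause) for _ in sentences]
    return list(zip(sentences, pauses))


words = ["bait", "dais", "babe", "daze"]
session = build_session(words, random.Random(1))  # seed used only for repeatability
for sentence, pause in session:
    print(f"{pause:.1f} s  {sentence}")
```

Passing a separate `random.Random` instance per speaker keeps each session order independent, which matches the aim of giving every speaker a unique recording set.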

Figure 10. A SNTRecorder screen, shown to a speaker just before

pronunciation is allowed

4.9 Segmentation and Theory behind It

The segmentation was done in Praat, using TextGrids, Praat’s annotations. Segments are the

annotated parts of sound data, superimposed over it, used as references for measurements or

for further data extraction. TextGrids allowed very precise placement and manipulation of

segments, and, overall, they were very important for consistent work and measurement.

42 The program provides sound feedback as well. There are two distinct short sounds: one when a new sentence
appears and no speaking is allowed, and one when the speaker is to read the content off the screen. The idea
was to wire the output of the computer to the recording equipment, so that the signal files would contain the
feedback sound marks: this could have helped in the initial segmentation (the computer would have searched for the
particular feedback frequencies marking the starts and the ends, and made two cuts for each sentence). However,
the wiring proved to be too inconvenient during this particular recording, so the automation was left to be
implemented in some other project.


Two methods were used to make annotations in the grids, which enabled the

extraction of several types of values: measuring between two points (for measuring time) and

extracting values from single points in data (for measuring formants, intensity, and pitch).

Before the actual segmentation of the 16 recorded signal files,43 each containing 32
sentences, the signal files were individually checked in Audacity for errors or strange peaks.
Afterwards, they were opened in Praat, inspected and segmented44. We created 16 TextGrid
files containing the segmented parts of our recorded sentences.

The segmentation was done on three levels (figure 11): three Praat “tiers” (grid
elements) were created per TextGrid, for two different types of calculations. One tier (number
3) was named “word” and it included the words within the sentences. The second (“diph”,
number 1) marked the beginning and the end of the diphthongs within the words. Both were
used to extract the length of the corresponding elements. The words were labelled with the corpus
word names (“bourse”, “Joyce”), while the diphthongs followed a more detailed procedure. Each
diphthong was marked with ASCII45 letters representing both the first and the second targets,
followed by an underscore and a length mark (“s” for short and “l” for long). For example, /ɪə/
in “fierce” was labelled “ia_s”.

The third tier contained not intervals, but single points that referred to the two
targets within the diphthong. We placed the point marks in the positions that we believed
best approximated the articulatory values of the target vowels within a diphthong.
The points were labelled with the diphthong name, followed by an underscore, a length mark,
another underscore, and the target number (for example, “babe” had the word label “babe”, the
diphthong interval label “ey_l”, and the targets “ey_l_1” and “ey_l_2”)46.
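The labelling scheme lends itself to automatic parsing. The following minimal Python sketch shows one way to split labels such as “ia_s” or “ey_l_1” into their parts; the regular expression and the function name `parse_label` are our illustrative assumptions, not part of the scripts used in the research.

```python
import re

# Two ASCII letters for the diphthong, a length mark ("s" or "l"),
# and an optional target number (1 or 2) for point labels.
LABEL = re.compile(r"^(?P<diphthong>[a-z]{2})_(?P<length>[sl])(?:_(?P<target>[12]))?$")


def parse_label(label):
    """Split a diphthong or target label into its components."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"malformed label: {label!r}")
    target = match.group("target")
    return {
        "diphthong": match.group("diphthong"),
        "length": "short" if match.group("length") == "s" else "long",
        "target": int(target) if target else None,
    }


print(parse_label("ia_s"))    # interval label, no target number
print(parse_label("ey_l_2"))  # point label for the second target
```

A label that does not follow the scheme raises `ValueError`, which makes the function usable as a simple format check.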

43 15 recorded by our Serbian speakers, and one by the referent speaker.

44 The sound data annotation was very time-consuming. It was done in several steps.

45 A system consisting mostly of English alphabet letters. It is widely supported in software, unlike the more
modern Unicode system.

46 In the chapter about the results the notation is explained again.


Figure 11. A TextGrid example, drawn here below a waveform

Table 12 shows the total number of annotations that needed to be placed manually
during work in Praat after visual inspection and correct settings. The completed 16 TextGrid
files with 2048 labels underwent a detailed automated check47 to ensure all segmented elements
were properly formatted, placed within the correct subordinate elements, and complete in number.

Table 12

The overview of the TextGrid elements in the corpora, per file and the final count

Sound signal   “words” (interval)   “diph” (interval)   “points” (point)   Total
1              32                   32                  64                 128

For 16 sound signals, total marks: 2048
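The counts in Table 12 and the containment requirement can be expressed as simple checks. The sketch below is a hypothetical illustration in plain Python, not the custom code used in the research; it assumes that a TextGrid has already been loaded into ordinary lists, and the tier names follow the column headings of Table 12.

```python
# Expected number of labels per TextGrid file, as in Table 12.
EXPECTED_PER_FILE = {"words": 32, "diph": 32, "points": 64}
FILES = 16


def count_problems(tiers):
    """Compare the label counts of one file with the expected counts.

    `tiers` maps a tier name to the list of labels found on that tier.
    """
    problems = []
    for name, expected in EXPECTED_PER_FILE.items():
        found = len(tiers.get(name, []))
        if found != expected:
            problems.append(f"{name}: expected {expected}, found {found}")
    return problems


def orphan_points(diph_intervals, points):
    """Return the point times that fall outside every diphthong interval."""
    return [
        t for t in points
        if not any(start <= t <= end for start, end in diph_intervals)
    ]


per_file = sum(EXPECTED_PER_FILE.values())
print(per_file, FILES * per_file)  # 128 2048
```

An empty list from both functions for every file would indicate that the annotations are complete and correctly nested.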

The segmentation itself was performed after referring to several sources. One of the

methodological issues in the research was where to place boundaries that limit the data, later

used for measurements. Harrington and Cassidy gave an overview for determining vowel

47 Again, custom code was written to load the TextGrids and run several checks (diphthong count, matching,
etc.). The code uses a part of the NLTK library. More details are in the Appendix.

[Figure 11 labels: “oyl” (diphthong interval); “oyl1”, “oyl2” (target points); “joys” (word). Time axis (s): 55.76 to 56.24.]


targets (Techniques 59) by different authors. In some studies the targets were defined as
“the formant values at a single time point” (59), while in others the authors propose an
entire section. In the second instance, it was suggested that 25% of the total steady duration
of a vowel be taken as the reference section. However, this can be applied only to monophthongs;
applying it to diphthongs would raise other problems, such as defining the
limits and lengths of the two targets. Also, a problem related to the steady state in vowels is
that such “steadiness” is not substantiated by firm evidence (59).

Another proposed measurement approach was to focus on F1 movement. The authors
suggested calculation from the point where “the first formant reaches its maximum value”

(60). The rationale was that F1 movement is related to the jaw movement, achieving the

target values within a syllable. This approach should be applicable in certain vowel classes,

“at least for most phonetically open and mid vowels”, where “F1 in general should be in the

shape of inverted parabola whose maximum occurs at the vowel target” (60).

In the chapter about characterising vowels and measurements, Ladefoged writes that

“in short monosyllables ... that do not have diphthongs, an interval near the middle of the

vowel is usually appropriate” (Data 104). However, in diphthongal sounds there should be

two points for formant measurement, “one near beginning, but not so close as to be part of

the consonant transition”, and the second should be “near the end, but again sufficiently far

from any consonantal effects” (150). Kent and Read cite 50 ms as “one fairly reliable

temporal constant of stop articulation” during which a transition takes place from a stop to a

vowel and from a vowel to a stop (Acoustic 116). In the segmentation of tokens for analysis
we took 50 ms as an estimate for the transition Ladefoged refers to.


Figure 12. The region of approximately 50 ms after the plosive release, after
which the beginning of a segment is placed (the first line in the aw_s
part). The waveform and the spectrogram show the word “doubt”. The
formant and pitch lines are shown since they help in segmentation.

During the segmentation phase, we used both approaches, each for a different purpose:

F1 movement was usually a good reference for reaching the target value, so no measurement

occurred before F1 reached the expected maximum. The measurement points of the two

diphthong targets were placed approximately in the middle of the targets (the points were

used to measure F1, F2, F3, intensity and pitch).

However, these techniques were not sufficient for measurements spanning more than a
single point: in measuring whole-diphthong (and whole-word) duration, the
segmentation included placing one boundary at the start of the TextGrid segment, and the
other at the end. The temporal segmentation was no easy task. It was done as the first step,
before placing the points for the single-value measurements; it required several manual “passes”
through the corpus and detailed inspection of waveforms, spectrograms, formants, and pitch
(they were all very important cues for determining where a word or diphthong approximately
began and where it ended).

This was a complex issue due to coarticulation, the variety of speaking styles our
speakers employed, and the fact that our speakers were female. For example, Speaker 8 had
no voicing in the vowel that preceded the voiceless plosive. If we had relied only on voicing
as a criterion for a vowel, then this speaker would, in some examples, have had an extremely


short vowel duration (which would not have been true, compared with other data and the

place of the plosive release). In some words in data recorded by Speaker 14, a sudden pitch

drop was evident just before the plosive, as a mark of the diphthong end, because the speaker

was speaking so fast that other cues were hard to discern.

The fact that all analysed data originated from female speakers meant taking care
with higher frequencies, spectral resolutions and formant cues. Even though we were aware
of the differences between male and female voices, as well as of methodological issues,48 our
only concern (since we had no mixed corpus) was to take good measurements and have
consistent procedures.49

The result of the above considerations was a set of rules for temporal segmentation of

diphthongs, to some extent influenced by the theoretical assumptions in this chapter, and to

some extent by our own in-practice observations:

1. After a plosive, a boundary was placed 0.04 to 0.05 seconds after the release or

voicing. This was considered sufficient to reduce the influence of the consonant.

With some speakers this could have been even more, with some less, but a precise

boundary would have been extremely difficult to determine: it would have lacked

consistency and would have been influenced by subjective factors.

2. The second boundary was placed after whatever a speaker pronounced that was
supposed to be a diphthong. For example, if /ðɛətu/ was pronounced with “rhotic
r” instead of /ə/ as the second target, the boundary was placed after /r/. This
included, but was not limited to, the lack of voicing and pitch drop. This rule was
crucial in determining the temporal domain of diphthongs (errors in words were
on a much lower scale, because words last longer).

3. When placing the second boundary before the fricative /s/, on average three cycles

in waveform were indicative of the boundary limit. This seemed to be an

interesting consistency throughout the corpus.

4. Before a voiceless plosive, the second boundary was placed where the
voicing in the segment ended, where applicable.
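Rule 1 can be illustrated with a short Python sketch. It only shows the arithmetic of shifting the first boundary by a fixed offset after the plosive release; the boundaries in this research were placed manually, and the function names and the example times are hypothetical.

```python
def onset_boundary(release_time, offset=0.05):
    """Return the start of the diphthong segment after a plosive release.

    The offset must stay within the 0.04 to 0.05 s range named in rule 1.
    """
    if not 0.04 <= offset <= 0.05:
        raise ValueError("offset must be between 0.04 and 0.05 seconds")
    return release_time + offset


def diphthong_duration(release_time, end_time, offset=0.05):
    """Duration between the shifted onset boundary and the end boundary."""
    start = onset_boundary(release_time, offset)
    if end_time <= start:
        raise ValueError("end boundary must follow the onset boundary")
    return end_time - start


# Hypothetical times in seconds, not measurements from the corpus.
print(round(diphthong_duration(release_time=10.00, end_time=10.25), 3))  # 0.2
```

Keeping the offset as an explicit parameter makes the choice between 0.04 and 0.05 seconds visible wherever the function is called.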

48 Kent and Read, in the chapter “Speaker Variables: Age and Sex”, quote Titze: “One wonders, for example, if
the source-filter theory of speech production would have taken the same course of development if female voices
had been the primary model early on” (154).

49 More about this in Praat settings below.


Figure 13. Selecting the target points for the diphthongs

The primary rules for selecting the points used as the places for measuring formant, pitch and

intensity were:

1. The target points must be within the temporal boundaries.

2. The target points must be in the section where F1 reached a sustained maximum value.

3. The target points must be selected with the optimal upper formant frequency set.

4. The target points must be within the expected range (formant curves must have

expected forms).

4.9.1 Praat Settings Methods

Spectrogram settings in Praat were applied having in mind the instructions about the

differences in measurements in male and female voices. Thus, we expected to find

approximately one formant per 1200 Hz in our research (Ladefoged, Data 125), because all

of our speakers were females.


Our primary references for the settings, segmentation and measurements came from

Ladefoged, notes from the Praat help files, Weenink’s instructions (Speech Signal Processing

With Praat), Harrington, and other sources.50

These were used to assemble a set of methods

that we hoped to be suitable for an approximation (as any measurement is an approximation)

of the digital data we recorded with our speakers.

The upper limit for spectrogram settings in Praat ranged from 3400 to 3900 Hz. The generally suggested value was about 1000 Hz per formant for a male speaker and 1100 Hz per formant for a female speaker. After examining our data it was obvious that 1100 Hz per formant, as suggested in Praat, gave too low an upper frequency for formant calculations. When 3300 Hz, which corresponds to 3 formants at 1100 Hz each, was set in Praat, the program could rarely determine the most probable F3 values. This was a problem because of the narrow differences some speech sounds have in the F2/F3 range (most notably high front vowels).

Figure 14. The diphthong and the formant marks from “joke”, after 3800 Hz

was applied as the maximum frequency for the first three formants

However, our knowledge of formant distribution within vowels and vocalic articulatory features enabled us to introduce a consistent method for setting the upper frequency limit in formant calculations. The method consisted of a gradual increase of the upper frequency for the calculation of the first three formants until, in the region of /ɪ/, a significant number of F3 values (re-calculated by the program) became visible (drawn upon the spectrogram). The vowel /ɪ/ was used in this step because F2 and F3 lie close together in it. Of course, even this is an approximate selection, but when accompanied by observation of the spectrum, it provided a good orientation for measuring formant values within the targets of the diphthongs.

50 Most of the works on acoustic phonetics in our working bibliography had notes about the analysis.
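The gradual search described above can be sketched in code. This is only an illustration under stated assumptions: in the paper the ceiling was raised by hand and judged visually on the Praat spectrogram, whereas here a hypothetical callback `f3_coverage` stands in for that judgement and returns the fraction of /ɪ/ frames in which an F3 value was found. The 50 Hz step, the 0.8 threshold and the 4500 Hz stop value are invented for the example and do not come from the paper; only the 3300 Hz starting point does.

```python
from typing import Callable


def find_formant_ceiling(
    f3_coverage: Callable[[float], float],  # hypothetical: share of /ɪ/ frames with an F3 value
    start_hz: float = 3300.0,               # initial setting named in the paper
    stop_hz: float = 4500.0,                # assumed safety limit
    step_hz: float = 50.0,                  # assumed step size
    min_coverage: float = 0.8,              # assumed meaning of "a significant number"
) -> float:
    """Raise the upper frequency step by step until F3 is tracked often enough."""
    ceiling = start_hz
    while ceiling <= stop_hz:
        if f3_coverage(ceiling) >= min_coverage:
            return ceiling
        ceiling += step_hz
    raise ValueError("F3 coverage never reached the requested threshold")
```

A real `f3_coverage` would have to re-run the formant analysis at each ceiling; that part depends on the analysis tool and is deliberately left out.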

In practice, this meant that the initial upper frequency setting was 3300 Hz, while the program was set to calculate only 3 formants. Because all of our subjects were young adolescent females, the upper frequency limits were, as expected, relatively high. Table 13 shows the upper frequencies used in the measurement settings. The frequencies range from 3400 Hz (two speakers, one of whom is our referent speaker, who is in the upper age bracket) to 3900 Hz. The average value is 3620 Hz, with an average deviation of 112.5 Hz.

Table 13

Maximum frequency settings for formant calculations

Speaker Frequency settings (Hz)

16-speaker-hz 3400

15-speaker-sr 3600

14-speaker-ao 3500

13-speaker-jr 3900

12-speaker-vv 3600

11-speaker-dz 3600

10-speaker-st 3500

09-speaker-tz 3770

08-speaker-ni 3650

07-speaker-lc 3600

06-speaker-ip 3700

05-speaker-gl 3500

04-speaker-ym 3800

03-speaker-im 3800

02-speaker-jj 3400

01-speaker-jk 3600

Average: 3620

Average deviation: 112.5
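The two summary rows of table 13 can be checked with a few lines of Python. The sketch below assumes that “average deviation” means the mean absolute deviation from the mean; the paper does not name the formula, but this reading reproduces both reported values from the table.

```python
from statistics import mean

# Maximum frequency settings (Hz) copied from table 13.
ceilings_hz = {
    "16-speaker-hz": 3400, "15-speaker-sr": 3600, "14-speaker-ao": 3500,
    "13-speaker-jr": 3900, "12-speaker-vv": 3600, "11-speaker-dz": 3600,
    "10-speaker-st": 3500, "09-speaker-tz": 3770, "08-speaker-ni": 3650,
    "07-speaker-lc": 3600, "06-speaker-ip": 3700, "05-speaker-gl": 3500,
    "04-speaker-ym": 3800, "03-speaker-im": 3800, "02-speaker-jj": 3400,
    "01-speaker-jk": 3600,
}

values = list(ceilings_hz.values())
average = mean(values)
# Assumed definition: mean absolute deviation from the mean.
average_deviation = mean(abs(v - average) for v in values)

print(average, average_deviation)  # 3620 112.5
```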

4.9.2 Praat Scripting and R


After we created the TextGrids we wrote a Praat script to automate the measurement, which

was possible because Praat has a feature for task automation. The measurements could have

been done manually, but this would have been strongly influenced by factors of human error,

which could have appeared during reading values of more than 2000 elements in the corpus.

Scripting enabled not only the reduction of possible human error, but also consistent and

easily manageable measuring.

The script was written to give joint measurements for the signal files. Thus, we had one folder with our corpus files (30 files, of which 15 were sound files and 15 were TextGrids), and another for the referent files (2 files). The script started by loading all TextGrid file names into the Praat working space. Once the files were enumerated, the first TextGrid and the corresponding sound file were loaded. The second step was to create analysis objects,51 after applying the upper limit for each recording individually, as specified in the table above.

The calculations were made on the file level instead of on the diphthong level, which means that the measurements were taken by passing all file content to the program, not just the bits containing words or diphthongs. By this we avoided the influence of the “analysis window” (Johnson 32), which reduces important data near the ends of segments.

The next step in the script52

was to calculate the lengths of diphthongs and words,

while checking if they were all present in the TextGrid. Afterwards, pitch, formant, and

intensity calculations were read per point (the first and the second diphthong target). The

lengths were saved in one file, while formants, pitch and intensity in another.

All data was saved in tab-delimited text files with proper column headings:

file/speaker handle, diphthong, word, time, f1, f2, f3, pitch and intensity labels.
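For readers who want to reproduce the file layout outside Praat, a minimal sketch of such a tab-delimited table follows. The original files were written by the Praat script; the Python function and the exact column spellings (for example `speaker` for the file/speaker handle) are assumptions made for illustration.

```python
import csv

# Column order follows the list above; the exact header spellings are assumed.
COLUMNS = ["speaker", "diphthong", "word", "time",
           "f1", "f2", "f3", "pitch", "intensity"]


def write_measurements(rows, handle):
    """Write measurement rows (dicts keyed by COLUMNS) as tab-delimited text."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

A file written this way can be read directly in R with `read.delim`, which fits the workflow described in the next section.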

R: A Language and Environment for Statistical Computing

R, or officially R: A Language and Environment for Statistical Computing, was our primary tool for statistical analysis, calculations, and drawing graphics. Its value for this paper was immense. The analysis of data in R included learning the R programming language53 in order to produce the results and to attest the validity of calculations.

51 A Praat element with (in our case) calculated formants, intensity and pitch.

52 One part of the script was more elaborate, and involved a special reading of the data. The duration of the diphthong (n) was divided by 10, and formant values were read and saved for each of the resulting time spans. This file was later to be used in determining the phases of diphthongs and target attainments. However, the analysis proved to be methodologically very challenging.

The code in R was divided into several sections, the most notable being fpi (formants, pitch, intensity), time and graph. These sections processed the measurement data and filtered them at the same time. Thus, we had mean values of the first and the second targets for both the corpus and referent recordings. The same sections were used to create calculations from the Marković corpus.

The second important aspect of R was the plotting function. We created several

plotting elements that were used to generate many images in this paper. The basic idea

revolved around the F1/F2 graph (Ladefoged, Data 131) on which other elements were placed:

individual IPA signs or ellipses.54

53 The author would like to thank people from the official #R channel on Freenode Network for their unselfish help during programming.

54 Ellipses were calculated using the “car” package for R.


5. Results

This chapter is about the results we reached in our work. The primary topics are formant

targets, length of diphthongs and words, pitch, and intensity.

5.1 A note about the Terminology and Notation

The number of diphthongs in the chapters about results is 16. This is due to our

approach, where each of the eight diphthongs was observed in two versions, the long and the

short. As already noted, the variants were labelled with the letters “l” and “s” (which is the

same as +V and -V). We are aware that it is not usual to place non-IPA characters within IPA

ones as was done here, but we found it convenient to represent long and short diphthongs (i.e.

/__s/ for short and /__l/ for long). This system was used to differentiate the vowels within the

diphthongs, which means that there were 32 targets: 8 diphthongs times 2 variants

(short/long) times 2 targets (first/second).
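The counts above (16 variants, 32 targets) follow directly from combining the labels. The short sketch below enumerates them; the underscore suffix for the target number follows the notation the paper uses for formant targets (for example /əʊs_1/), while the variable names are only illustrative.

```python
from itertools import product

DIPHTHONGS = ["eɪ", "aɪ", "ɔɪ", "əʊ", "ɑʊ", "ɪə", "ɛə", "ʊə"]
VARIANTS = ["s", "l"]   # short and long
TARGETS = [1, 2]        # first and second vowel of the diphthong

# 8 diphthongs x 2 variants = 16 labels such as "eɪs" or "eɪl".
variant_labels = [d + v for d, v in product(DIPHTHONGS, VARIANTS)]

# 16 variants x 2 targets = 32 labels such as "əʊs_1".
target_labels = [f"{d}{v}_{t}" for d, v, t in product(DIPHTHONGS, VARIANTS, TARGETS)]

print(len(variant_labels), len(target_labels))  # 16 32
```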

We compiled the results by comparing two recorded sources: the recordings from the students55 and the recordings from the native RP speaker56. Where we referred both to the students’ and the native speaker’s data, the term “corpora” was used.

55 Also referred to as: “the data from Serbian speakers/students/speakers”, “Serbian data”.

56 The terms “referent data” and “the native/test/RP/English speaker” were also used to denote the same data.


5.2 Diphthong Length

5.2.1 Isolated Diphthong Lengths

This chapter is about observations and results regarding the temporal domain or the lengths57

within the corpora. We were primarily interested in the diphthong lengths, but, as we will

show, a discussion about the word length was necessary as well.

The diphthong lengths were described in two ways. We will see first how much

diphthongs differ in both corpora in terms of time expressed in seconds. The second approach

will be discussing the diphthong lengths by the percentage they account for in their words, or

their ratio length. Some of the observations deal with the differences between the

measurements obtained from the native speaker data and the Serbian data: in such cases the

values were calculated by subtracting the corpus from the referent data (in seconds or

percents).

Table 14

Measured time range in the diphthongs

Range Minimum Maximum

Referent 0.134 s 0.296 s

Corpus 0.190 s 0.279 s

Table 14 shows the minimum and maximum values (range) for diphthong lengths in the corpus and the referent data. The referent speaker’s minimum was lower than the corpus minimum, and her maximum was higher than the corpus maximum. This shows that the time domain in the English speaker’s pronunciation was more differentiated when compared to the measurements taken from Serbian data.

57 “Temporal domain length” (or just “diphthong length”) was introduced for emphasis and differentiation from the “diphthong magnitude”, which also relates to length, but in the context of distance in the F1/F2 plane (Chapter 5.3).


Figure 15. Mean diphthong lengths in both corpora (sorted by the referent speaker data)

Figure 15 shows mean diphthong lengths from the corpus and the referent data, sorted by the latter. The range now becomes more apparent, as well as its relation to the overall data. Most of the diphthong lengths from the Serbian speakers are distributed in the right half of the graph, which means that the majority of the glides lasted longer in their pronunciation.

Another conclusion from figure 15 is that the short diphthongs are the ones that the Serbian speakers tended to pronounce longer than our referent speaker: all short diphthongs were much longer, when compared to the overall data, than the corresponding diphthongs in the English speaker’s pronunciation. Table 15 provides more details.

Table 15

Mean diphthong lengths in seconds and percentages (both corpora, sorted by

the referent speaker data)

Diphthong Referent (s) Corpus (s) Difference (s) Duration ratio (ref : cor) (%)

eɪs 0.134 0.211 -0.077 63.51

əʊs 0.135 0.205 -0.07 65.85

ɪəs 0.143 0.191 -0.048 74.87

aɪs 0.144 0.202 -0.058 71.29

ʊəs 0.15 0.202 -0.052 74.26

ɛəs 0.153 0.195 -0.042 78.46

ɔɪs 0.165 0.207 -0.042 79.71

ɑʊs 0.166 0.238 -0.072 69.75

ɪəl 0.21 0.19 0.02 110.53

eɪl 0.211 0.229 -0.018 92.14

ɛəl 0.239 0.278 -0.039 85.97

əʊl 0.243 0.249 -0.006 97.59

ɔɪl 0.262 0.252 0.01 103.97

ɑʊl 0.264 0.278 -0.014 94.96

ʊəl 0.279 0.249 0.03 112.05

aɪl 0.296 0.28 0.016 105.71

Table 15 is sorted so values in the referent column are rising. What we immediately noticed

in the English speaker data is that all short variants of the diphthongs shared the lowest

values, while the long accounted for the highest values. This confirmed our choice of the

corpus words and fortis versus lenis distinction (chapter 4.2).

Table 15 divides the diphthongs into two groups, one with the short glides and another

with the long ones. The difference column is the result of the corpus values being subtracted

from the referent values. Thus, the closer the value to zero, the more similar the length of a

diphthong; a negative value in the difference column indicates that the Serbian speakers

pronounced the diphthong longer than the English speaker, while a positive value refers to

the diphthong spoken shorter than by the referent speaker.

The diphthong /ɔɪl/, or the long version of /ɔɪ/, in the last quarter of the data set, had the smallest positive difference, 0.01 seconds, and we took it as our provisional zero point. In the list sorted by difference (table 16), the diphthong just above it, /əʊl/, had a difference of -0.006 seconds, and the one below, /aɪl/, had a difference of 0.016 seconds. All three differences lie between a few milliseconds and less than two centiseconds, which is indicative of the overall similarity in length.

The last column in table 15 shows the length ratio, or the length of the diphthongs measured in the RP speaker’s pronunciation expressed as a percentage of the length measured in the Serbian speakers’ pronunciation (referent : corpus). For example, the value for the long variant of the diphthong /ɪə/ shows that the native speaker’s diphthong lasted about 110% of the length measured in the corpus; a value above 100% therefore means that the Serbian speakers pronounced the diphthong shorter.
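The two derived columns of table 15 can be recomputed from the referent and corpus columns. The sketch below uses four rows copied from the table and assumes the formulas implied by the column headings: difference = referent minus corpus, ratio = referent divided by corpus. The function name is illustrative.

```python
# diphthong: (referent length in s, corpus length in s), copied from table 15
table_15_subset = {
    "eɪs": (0.134, 0.211),
    "ɔɪs": (0.165, 0.207),
    "ɪəl": (0.21, 0.19),
    "ʊəl": (0.279, 0.249),
}


def difference_and_ratio(referent_s, corpus_s):
    """Return (difference in seconds, referent : corpus ratio in percent)."""
    difference = round(referent_s - corpus_s, 3)
    ratio = round(referent_s / corpus_s * 100, 2)
    return difference, ratio


for diphthong, (referent_s, corpus_s) in table_15_subset.items():
    print(diphthong, *difference_and_ratio(referent_s, corpus_s))
```

A negative difference and a ratio below 100% both indicate that the Serbian speakers pronounced the diphthong longer than the referent speaker.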

Figure 16. Differences in seconds between the diphthong lengths in corpora

Figure 16 shows the differences in diphthong lengths, expressed in seconds. Our provisional central (zero) point /ɔɪl/, the glide with a similar length in the corpora, is in 13th place, counting from the lowest value. Four diphthongs are in the positive region, /ɔɪl/ included, and 12 in the negative (that is another way of saying that the Serbian speakers pronounced 12 diphthongs longer than the native speaker). In percents, with /ɔɪl/ counted among the positive values, 25% of the diphthong variants are in the positive region, while 75% of them are in the negative (the students pronounced one quarter of the diphthong variants shorter). Table 16 shows the diphthongs again, this time with the difference columns only, expressed in both seconds and ratio.

Table 16

Diphthong values and differences (seconds

and percentages) in both corpora

Diphthong Difference (s) Ratio (%), ref. vs. cor.

eɪs -0.077 63.51

əʊs -0.07 65.85

ɑʊs -0.072 69.75

aɪs -0.058 71.29

ʊəs -0.052 74.26

ɪəs -0.048 74.87

ɛəs -0.042 78.46

ɔɪs -0.042 79.71

ɛəl -0.039 85.97

eɪl -0.018 92.14

ɑʊl -0.014 94.96

əʊl -0.006 97.59

ɔɪl 0.01 103.97

aɪl 0.016 105.71

ɪəl 0.02 110.53

ʊəl 0.03 112.05

If we set /ɔɪl/ aside as the zero point, we can conclude that the Serbian speakers in our research tended to pronounce 20% of the English vowel glides (3 of the remaining 15 variants) shorter than our referent English speaker, while they pronounced 80% (12 of 15) longer than the test speaker native in RP.
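The percentages quoted for table 16 depend on whether the provisional zero point /ɔɪl/ is counted. The sketch below, with illustrative names, recomputes both readings from the difference column: 25% and 75% over all 16 variants, 20% and 80% once /ɔɪl/ is set aside.

```python
# Differences in seconds (referent minus corpus), copied from table 16.
differences_s = {
    "eɪs": -0.077, "əʊs": -0.07, "ɑʊs": -0.072, "aɪs": -0.058,
    "ʊəs": -0.052, "ɪəs": -0.048, "ɛəs": -0.042, "ɔɪs": -0.042,
    "ɛəl": -0.039, "eɪl": -0.018, "ɑʊl": -0.014, "əʊl": -0.006,
    "ɔɪl": 0.01, "aɪl": 0.016, "ɪəl": 0.02, "ʊəl": 0.03,
}


def shares(diffs):
    """Return (% pronounced shorter by the students, % pronounced longer)."""
    shorter = sum(1 for d in diffs.values() if d > 0)  # positive: students shorter
    longer = sum(1 for d in diffs.values() if d < 0)   # negative: students longer
    return 100 * shorter / len(diffs), 100 * longer / len(diffs)


all_variants = shares(differences_s)
without_zero_point = shares({k: v for k, v in differences_s.items() if k != "ɔɪl"})

print(all_variants)        # (25.0, 75.0)
print(without_zero_point)  # (20.0, 80.0)
```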

Considering that we are operating on the scale of 10^-3 s (milliseconds), and that the zero point could have been /əʊl/ with -0.006 seconds or /aɪl/ with 0.016 seconds, we can recalculate the percentage. First, we could conditionally accept that all diphthongs that in the Serbian speakers’ pronunciation lasted between 95% and 105% of the native speaker’s diphthongs are “of the same length”. There are three diphthongs that fulfil that condition: /ɔɪl/, /əʊl/ and /aɪl/. The rest of the diphthongs range from the minimum of 63.51% in /eɪs/ to the maximum of 112.05% in /ʊəl/, which is 13 diphthongs or 81.25% of all diphthong variants.

Table 16 points to another detail of the diphthong lengths: the distribution of short and long glides. All short diphthongs have differences lower than zero (they were shorter when pronounced by the English speaker), and they are at the bottom of the scale. There are four long diphthongs above zero, while the remaining four are below the established zero value. This shows that the Serbian speakers in our research pronounced the short diphthongs longer than the referent English speaker (the difference was between -0.042 seconds in /ɔɪs/ and -0.077 seconds in /eɪs/).

Amongst the four long diphthongs that were shorter in the students’ pronunciation, the smallest difference is 0.01 seconds in /ɔɪl/, while the biggest is 0.03 seconds in /ʊəl/. However, the difference in the long diphthongs, even in the ones pronounced longer by the students, was smaller than the difference in the short diphthongs: the short diphthongs had the greatest differences.

5.2.2 Length of the Words

While most of the diphthongs were longer in the Serbian pronunciation, the words were all

longer (all words were shorter in the pronunciation of the native speaker).

Table 17

Differences and values in seconds between the word

lengths in corpora (all words included)

Word Corpus Referent Difference (s)

boat 0.467 0.261 -0.206

babe 0.44 0.263 -0.177

bait 0.47 0.285 -0.185

dare 0.418 0.302 -0.116

doit 0.505 0.303 -0.202

joke 0.502 0.331 -0.171

doubt 0.494 0.334 -0.16

toyed 0.487 0.336 -0.151


bite 0.483 0.337 -0.146

dice 0.508 0.361 -0.147

fierce 0.612 0.368 -0.244

bode 0.468 0.368 -0.1

bourse 0.578 0.371 -0.207

Job 0.523 0.373 -0.15

bide 0.458 0.386 -0.072

fears 0.593 0.4 -0.193

bows 0.548 0.407 -0.141

theirs 0.556 0.409 -0.147

idiot 0.551 0.411 -0.14

gourd 0.516 0.44 -0.076

dies 0.567 0.443 -0.124

dais 0.514 0.444 -0.07

douse 0.546 0.457 -0.089

daze 0.517 0.47 -0.047

Joyce 0.549 0.471 -0.078

idiom 0.502 0.475 -0.027

gouge 0.528 0.477 -0.051

Joys 0.554 0.481 -0.073

graduate 0.69 0.508 -0.182

thereto 0.684 0.525 -0.159

daresay 0.72 0.55 -0.17

abjured 0.668 0.661 -0.007

Table 17 shows that only “abjured” was spoken with approximately the same length (the difference is -0.007 seconds). The largest difference is in the word “fierce”: almost a quarter of a second (0.244 s).
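The extremes of table 17 are easy to misread because the table is sorted by the referent column, not by the difference. The sketch below recomputes the difference column (referent minus corpus) from the table values and looks up the largest and smallest gaps; the variable names are illustrative.

```python
# word: (corpus length in s, referent length in s), copied from table 17
table_17 = {
    "boat": (0.467, 0.261), "babe": (0.44, 0.263), "bait": (0.47, 0.285),
    "dare": (0.418, 0.302), "doit": (0.505, 0.303), "joke": (0.502, 0.331),
    "doubt": (0.494, 0.334), "toyed": (0.487, 0.336), "bite": (0.483, 0.337),
    "dice": (0.508, 0.361), "fierce": (0.612, 0.368), "bode": (0.468, 0.368),
    "bourse": (0.578, 0.371), "Job": (0.523, 0.373), "bide": (0.458, 0.386),
    "fears": (0.593, 0.4), "bows": (0.548, 0.407), "theirs": (0.556, 0.409),
    "idiot": (0.551, 0.411), "gourd": (0.516, 0.44), "dies": (0.567, 0.443),
    "dais": (0.514, 0.444), "douse": (0.546, 0.457), "daze": (0.517, 0.47),
    "Joyce": (0.549, 0.471), "idiom": (0.502, 0.475), "gouge": (0.528, 0.477),
    "Joys": (0.554, 0.481), "graduate": (0.69, 0.508), "thereto": (0.684, 0.525),
    "daresay": (0.72, 0.55), "abjured": (0.668, 0.661),
}

# Difference as defined in the table: referent minus corpus.
differences = {word: round(ref - cor, 3) for word, (cor, ref) in table_17.items()}

largest_gap = max(differences, key=lambda word: abs(differences[word]))
smallest_gap = min(differences, key=lambda word: abs(differences[word]))

print(largest_gap, differences[largest_gap])    # fierce -0.244
print(smallest_gap, differences[smallest_gap])  # abjured -0.007
```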


Figure 17. The length of words in corpora (excluded polysyllabic words: thereto,

daresay, graduate)

The difference in the word lengths may not be overly conclusive, considering the fact that the native speaker used a different rate of speech, and that we are comparing such data to the mean values of 15 other speakers who are not native speakers of English.

However, we can still use the data to comment on the lengths. For example (table 18), the standard deviation of word lengths was 18.1% lower in the Serbian speakers, that is, their word lengths were less diversified. This suggests that our speakers were less aware of the length58 than the referent English speaker was.

58 At least during articulation; this paper does not discuss the perceptual features of length.

Table 18

Standard deviation in word lengths in referent and corpus data

Standard deviation

Referent Corpus

0.0891 0.0730
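The 18.1% figure quoted for the word lengths can be reproduced from the two values in table 18, assuming it is the relative reduction of the corpus standard deviation with respect to the referent one (the paper does not spell out the formula).

```python
sd_referent = 0.0891  # standard deviation of word lengths, referent speaker (table 18)
sd_corpus = 0.0730    # standard deviation of word lengths, Serbian speakers (table 18)

# Assumed formula: how much smaller the corpus deviation is, relative to the referent.
relative_reduction = (sd_referent - sd_corpus) / sd_referent * 100

print(round(relative_reduction, 1))  # 18.1
```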

Figure 18. The word “graduate”, as spoken by the referent speaker

and by one of the Serbian speakers (the line marks intensity)

5.2.3 Ratios of the Diphthongs within the Words

So far, we have discussed the length of a diphthong as an individual element extracted from the word in which the diphthong was pronounced. Thus, the diphthongs were described as having a length in seconds. However, there is another perspective in analysing the lengths. It involves comparing the lengths of the diphthongs with the lengths of the words in which the diphthongs were pronounced. In this perspective, the lengths were expressed in percentages, and we labelled such lengths as “the ratios of the diphthongs within the words”. We consider this equally important as the previous discussion, because it compensates for different speeds of pronunciation (the students versus the native speaker).

Before we start with the percentage ratios, let us see the overall relationship between the word lengths and the diphthong lengths (table 19): both the diphthong and the word lengths were greater in the students’ pronunciation (+0.028 s and +0.128 s respectively).

Table 19

Mean length in seconds (the diphthongs versus the words)

Mean diphthong length (s) Mean word length (s) Percent

Referent 0.205 0.393 52.16%

Corpus 0.233 0.521 44.72%

In percentages, the RP speaker pronounced her diphthongs with 52.16% of the length of the words. In the students’ pronunciation, we measured that the diphthongs lasted approximately 44.72% of the word length. That means that the students, when compared to the RP speaker, “undershot” the share of the diphthong in the word by about 7.44 percentage points.
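The percentages in table 19 follow from dividing the mean diphthong length by the mean word length. A minimal check, with an illustrative function name, is:

```python
def share_of_word(diphthong_s, word_s):
    """Diphthong length as a percentage of word length, rounded as in table 19."""
    return round(diphthong_s / word_s * 100, 2)


referent_share = share_of_word(0.205, 0.393)  # mean lengths from table 19
corpus_share = share_of_word(0.233, 0.521)
gap = round(referent_share - corpus_share, 2)  # difference in percentage points

print(referent_share, corpus_share, gap)  # 52.16 44.72 7.44
```

The gap is a difference between two percentages, so it is best read as percentage points, not as a percentage of the diphthong length.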

Table 20

Ratios of the diphthongs within the words

Diphthong Referent (%) Corpus (%) Difference (%)

eɪs 35.32 43.88 -8.56

ɑʊs 42.46 46.7 -4.24

aɪs 41.26 41.29 -0.03

ɛəs 28.52 28.1 0.42

əʊs 44.53 43.35 1.18

ʊəs 35.36 33.1 2.26

ɪəs 37.23 32.71 4.52

ɔɪs 45.17 40.06 5.11

ɑʊl 59.75 52.63 7.12

ʊəl 53.37 44 9.37

ɛəl 67.71 58.04 9.67

eɪl 59.55 49.38 10.17

ɔɪl 64.25 49.47 14.78

əʊl 65.83 50.85 14.98


ɪəl 49.74 34.38 15.36

aɪl 72 55.35 16.65

Table 20 describes the diphthongs in terms of their lengths within the words (in percentages).

The column difference was calculated by subtracting the corpus values from the referent

values. Again, a positive value in the difference column means that the English speaker had

greater values in pronunciation, or more precisely, that a particular diphthong had a greater

percentage within the word in which it was pronounced. A negative value shows that the

Serbian speakers had greater length within the word when compared to the native English

speaker.

Table 21

Standard deviation in the diphthong lengths within the

words

Standard deviation

Referent Corpus

13.4% 8.67%

Standard deviations (table 21) again show that the native speaker had a greater differentiation of the diphthong lengths within the words, 13.4%, while the deviation in the corpus is 8.67%.

The best value ratio-wise was measured in the short diphthong /aɪ/, just -0.03%. This means that the diphthong’s share of the word in the Serbian speakers’ pronunciation was 0.03 percentage points greater than in the test speaker’s. This makes /aɪs/ equally realised percent-wise in the corpora.

After the data is sorted by differences (table 20), we see that two more diphthongs have negative values, /eɪs/ and /ɑʊs/, which means they lasted longer within words spoken by the Serbian speakers. Both diphthongs are short, and the difference was noticeable: 8.56 and 4.24 percent, respectively. There are six diphthongs that had a difference of less than 5% in the length within the words in which they occurred, when compared to the native speaker, and all such diphthongs are short: /ɑʊs/, /aɪs/, /ɛəs/, /əʊs/, /ʊəs/ and /ɪəs/ (although the remaining short diphthongs had differences of up to about 8.5%).


Figure 19. Differences in the diphthong ratios expressed in percentages (i.e. /aɪ/ is

close to zero, which means that it lasted approximately equally)

Table 22 shows the diphthongs sorted by absolute differences in the diphthong ratio

within the words, in groups of less than 5%, between 5% and 10%, and between 10% and

17% (the maximum measured).

Table 22

Absolute ratio differences in the diphthong lengths

x < 5% ɑʊs, aɪs, ɛəs, əʊs, ʊəs, ɪəs

5% < x < 10% eɪs, ɔɪs, ɑʊl, ʊəl, ɛəl

10% < x < 17% eɪl, ɔɪl, əʊl, ɪəl, aɪl

The table demonstrates that the Serbian speakers achieved the best diphthong ratio values in

the short versions of the diphthongs. A greater difference, over 10% of diphthong length

difference within the words, was observed in the longer versions (the upper part of the scale).
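The grouping in table 22 can be reproduced from the difference column of table 20 by binning the absolute values. The sketch below uses the three ranges of the table; the function and variable names are illustrative.

```python
# Difference (%) = referent minus corpus, copied from table 20.
ratio_differences = {
    "eɪs": -8.56, "ɑʊs": -4.24, "aɪs": -0.03, "ɛəs": 0.42,
    "əʊs": 1.18, "ʊəs": 2.26, "ɪəs": 4.52, "ɔɪs": 5.11,
    "ɑʊl": 7.12, "ʊəl": 9.37, "ɛəl": 9.67, "eɪl": 10.17,
    "ɔɪl": 14.78, "əʊl": 14.98, "ɪəl": 15.36, "aɪl": 16.65,
}


def bin_label(difference):
    """Assign an absolute ratio difference to one of the ranges of table 22."""
    size = abs(difference)
    if size < 5:
        return "under 5%"
    if size < 10:
        return "5% to 10%"
    return "over 10%"


groups = {}
for diphthong, difference in ratio_differences.items():
    groups.setdefault(bin_label(difference), []).append(diphthong)

for label in ("under 5%", "5% to 10%", "over 10%"):
    print(label, ", ".join(groups[label]))
```

All six variants in the lowest range are short, and all five in the highest range are long, which is the pattern the text describes.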


The percentage-ratio can be calculated not only on the diphthong level (what is the

ratio of a diphthong in all of the words in which it was pronounced?) but on the word level as

well (what is the ratio of a diphthong in an individual word in which it was pronounced?).

Table 23 and figure 20 are part of the answer to the second question.

Table 23

The lengths of the diphthongs within each word and the

corresponding differences

Word Referent (%) Corpus (%) Difference (%)

bait 28.37 43.66 -15.29

boat 36.24 44.4 -8.16

douse 38.71 46.47 -7.76

Joyce 33.07 36.89 -3.82

dais 42.27 44.11 -1.84

daresay 24.41 26 -1.59

bite 40.59 41.44 -0.85

doubt 46.21 46.94 -0.73

bourse 43.84 43.55 0.29

dice 41.93 41.13 0.8

idiot 29.71 27.78 1.93

thereto 32.63 30.2 2.43

daze 52.66 48.45 4.21

graduate 26.87 22.65 4.22

idiom 29.67 24.72 4.95

bows 59.98 54.27 5.71

dare 71.48 64.81 6.67

abjured 40.23 33.28 6.95

Job 56.9 49.88 7.02

fierce 44.75 37.63 7.12

gouge 59.52 50.99 8.53

joke 52.81 42.3 10.51


toyed 65 54.3 10.7

gourd 66.51 54.72 11.79

theirs 63.94 51.27 12.67

dies 64.44 51.05 13.39

doit 57.28 43.24 14.04

babe 66.45 50.3 16.15

joys 63.5 44.64 18.86

bide 79.57 59.66 19.91

bode 74.75 51.81 22.94

fears 69.82 44.04 25.78


Table 23 shows the ratio length and differences on the word level. For example, in the

word “daresay”, the diphthong lasted 24.41% of the word length (which is the lowest value in

the referent data, most likely because the word is polysyllabic). The highest percentage in the

native speaker data was measured for “bide” – 79.57%. In the Serbian data the values were

22.65% in the polysyllabic “graduate” and 64.81% in “dare”. The mean length of the

diphthong calculated on the word level was 50.12% in the native speaker pronunciation and

43.95% in the student pronunciation, while the standard deviation was 15.8% and 10.09%,

respectively.

Figure 20. Differences of percentages between the diphthong ratios on word

level (i.e. “dice” is close to zero, which means that /aɪ/ lasted approximately

equally in the students’ and native speaker’s pronunciation on word level)


5.2.4 Conclusion

Considering the relatively small number of the test students, and only one referent

native speaker, we could not reach a detailed and definitive conclusion. Our measurements of

diphthong lengths in seconds (table 16) are provisional only, and there are at least two

reasons for that: the number of speakers and the methodology of segmentation.

However, we can state the conclusions within the scope and purpose of our research. If we refer back to table 16, we see that the longer versions of the diphthongs had a smaller difference in seconds; their referent-to-corpus ratio ranged from 85% to 112%. In the short diphthongs the ratio ranged from 63% to 79%. In the short diphthongs the ratio never reached 100%, which means that the students pronounced every short diphthong longer than the native speaker did; this does not hold true for the long diphthongs.

When the lengths were placed in the context of the words (table 20), the percentages showed a different result. The short diphthongs had the smallest absolute difference, with six of the eight having less than 5% ratio difference; this means that the Serbian speakers pronounced the words so that the share of the diphthong in the word was very close to the share in the recording of the native speaker. The long diphthongs had different results ratio-wise: they had a considerably greater ratio difference, with five of the eight showing over 10% absolute difference when compared to the native speaker’s pronunciation.

We can now compare the two sets of results. The diphthongs that the speakers pronounced longer than the referent speaker seemed to have a better diphthong-to-word ratio; these are the diphthongs in the “x < 5%” row of table 22: /ɑʊs/, /aɪs/, /ɛəs/, /əʊs/, /ʊəs/ and /ɪəs/. The diphthongs whose length in seconds was very close to the native speaker’s had the largest ratio difference, and these were all long diphthongs with a difference of over 10% (/eɪl/, /ɔɪl/, /əʊl/, /ɪəl/ and /aɪl/). Most of the other diphthongs fall in the middle range, between 5% and 10%.

If we should accept that that the ratio of diphthongs within words is more relevant, the

conclusion is that the Serbian students had problems in the long diphthongs. This relates to

the long vowel realisation and fortis distinction. The overall conclusion could then be that the

students had weaker fortis values than the referent speaker. However, if we consider the

length of the diphthongs per se (regardless of their ratio within words) as the relevant fact for

our research, the conclusion is that the Serbian speakers had the biggest problems with lenis

values, because the measured times differed significantly in the cases where the diphthongs

were followed by a voiceless consonant. In general, we can conclude that although the

Serbian speakers were well aware of the importance of lengths, it seems they did not master

it completely.

5.3 Formant Values

The purpose of this chapter is to give an overview of the differences between formant values

measured in the Serbian and the RP data. We did not provide many details about the

individual diphthongs and their short/long variant, because we were focused on the overall

targets. The chapter starts with the assumptions adopted from phonetic theory, about the

relationship between the tongue position and formant values. The discussion then moves to

the differences between targets in the corpora. Afterwards, the differences are compared with

the expected results.

The targets in this paper refer to the diphthong “units”, primarily the first and the

second vowel.59

Thus, we have two targets for the short diphthongs and two targets for the

long diphthongs. A note is again necessary about the notation: for example, in /əʊs_1/ “1”

refers to the first target of /əʊ/, the vowel /ə/, while “s” denotes the short version of the

diphthong. The second target (vowel) in /əʊ/ is /ʊ/, but in most of the tables and the graphs, it

was listed as /əʊs_2/ for the short and /əʊl_2/ for the long variant. This practice is consistent

throughout the chapters.

In this chapter we use the expressions “measured target shifts” and “assumed target

shifts”. The targets denote the two representative positions within a diphthong: the first vowel

and the second vowel. The shift is some value, which is in our paper numeric, positive (+) or

negative (-). What the shift describes is a difference between the formants in the target one

and the target two; it shows the value calculated by subtracting the target one from the target

two (thus, if T1 has a value of 400 Hz and T2 500 Hz, the shift is T2 - T1 = +100 Hz).
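As an illustration only (this sketch is not part of the original analysis, and the function names are hypothetical), the shift can be computed as the second target minus the first, following the T2-T1 convention used in the captions of tables 26 and 27. The input values are the F1 measurements for the long /ɪə/ reported later in this chapter.

```python
def target_shift(t1_hz: float, t2_hz: float) -> float:
    """Return the shift T2 - T1 in Hz (positive means the formant rises)."""
    return t2_hz - t1_hz


def shift_sign(shift_hz: float) -> str:
    """Reduce a shift to its region: '+', '-' or '0'."""
    if shift_hz > 0:
        return "+"
    if shift_hz < 0:
        return "-"
    return "0"


# F1 of long /ɪə/ in Hz: (first target /ɪ/, second target /ə/)
measurements = {
    "referent": (396.67, 393.69),
    "Serbian": (388.98, 463.23),
}

for label, (t1, t2) in measurements.items():
    shift = target_shift(t1, t2)
    print(f"{label}: {shift:+.2f} Hz ({shift_sign(shift)})")
```

The sign, rather than the exact value, is what the later comparison of positive and negative regions relies on.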

The shift is assumed (“expected”) or measured. The assumed shift values were

derived from the theoretical observations.60

The measured shifts were acquired by measuring

the data. The details are explained in the next chapters.

5.3.1 Prediction of Target Values – Assumed Shifts

Designing a method for comparing formant targets was a challenging task. The main problem

was the lack of the referent formant values that could be used for the comparison. Even if we

had the referent values for the same corpus, the problem would remain in defining the

achieved/undershot targets. To tackle the problem, we used a method that we hoped would

show the approximate similarities and the differences in formant values. The proposed

solution consists of three parts. Part one was predicting the probable formant values on a

purely theoretical level. The second part consisted of sorting the measurements (from both

the students and the native speaker) and placing them in the context of the expectations from

the first part of the procedure. Finally, the third part was where the results were compared

with the expected results and measured values.

59

The middle part of a diphthong can also be one of the targets, but it will not be discussed here.

60

In essence, from the fact that F1 and F2 change during vowel articulation. The details are available in chapter

2.4.

Part one of the procedure dealt with the theoretical predictions of what should happen

with the formants within the diphthong. Such predictions we labelled “assumed target shifts”.

Table 24

Assumed target changes (shifts) in diphthongs

Diphthong T1F1 T1F2 T2F1 T2F2 STF1 STF2

/ɪə/ 1 3 2 2 +1 -1

/ɛə/ 2 3 2 2 0 -1

/ʊə/ 1 1 2 2 +1 +1

/eɪ/ 2 3 1 3 -1 0

/aɪ/ 3 1 1 3 -2 +2

/ɔɪ/ 2 1 1 3 -1 +2

/ɑʊ/ 3 1 1 1 -2 0

/əʊ/ 2 2 1 1 -1 -1

Table 24 is another way of representing relative formant values in vowels, derived from table

25. Table 25 is a standard way to represent relative formant change depending on the place of

vowel articulation (we only added the scale). For example, the table shows that a close vowel

will have low F1, and that a back vowel will have low F2. Of course, the extent of “low” is

relative, so the previous statement should be rephrased as: back vowel will probably have

lower F2 when compared to the same front vowel in the same pronunciation. This means, for

example, that we expect /ʊ/ to have lower F2 than /ɪ/. This also means that /ə/, being

centralised, should have F2 lower than /ɪ/, but again higher than /ʊ/.

Three numbers can represent these relative values: a 1 for “expected lower”, a 2 for

“expected middle value” and a 3 for “expected higher value”. The numbers “1, 2, 3” and “3,

2, 1” in table 25 reflect such predictions (based on chapter 1.4).

We repeated table 25, and the result, with some additions, is shown in table 24. Let us

now explain table 24. To begin, we focus on the first five columns of the table. The column

“Diphthong” shows /ɪə/ as the first entry. The following four columns range from T1F1 (the

F1 value of the first target of a diphthong) through T2F2 (the F2 value of the second target of

a diphthong). In the diphthong /ɪə/ T1 is /ɪ/, T2 is /ə/. Looking back to table 25 we see that the

assumed values for /ɪ/ in (F1, F2) are (1, 3) and for /ə/ (2, 2). The procedure is repeated for

each diphthong.

Table 25

Relationship between vowels, the place of articulation, and the

expected formant change

                  F2+                  F2-
                  3         2          1
                  Front     Central    Back
F1-   1   Close   ɪ                    ʊ
      2   Mid     e         ə          ɔ
F1+   3   Open                         ɑ, a

What is left to explain is the ST column. The label ST means “subtracted target”, and

it is the result of subtracting61

the assumed values of the first target from the assumed

values of the second target (T2 - T1). When the change is positive the formant values rise, they fall when

the change is negative, and remain unchanged (zero) when there is no change. Thus, the

resulting change can be moderate (absolute value62

is 1) or high (absolute value is 2). Of

course, the table shows only a model of changes between the targets, so zero value denotes a

minimum change, and not the lack of a change (as we will see, there were some uniform

discrepancies in the shifts where the value is zero). We assume these will be sufficient to

review our data in the next section.
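The derivation of the ST columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary and function names are hypothetical, the levels are read off the 1 to 3 scale of table 25, and the level for /ɛ/ (which is not printed in table 25) is taken from the /ɛə/ row of table 24.

```python
# Relative (F1, F2) levels on the 1-3 scale of table 25.
# Assumption: /ɛ/ is given (2, 3), as implied by the /ɛə/ row of table 24.
LEVELS = {
    "ɪ": (1, 3), "ʊ": (1, 1),
    "e": (2, 3), "ɛ": (2, 3), "ə": (2, 2), "ɔ": (2, 1),
    "a": (3, 1), "ɑ": (3, 1),
}


def assumed_shift(diphthong: str) -> tuple[int, int]:
    """Return (STF1, STF2): second-target level minus first-target level."""
    first, second = diphthong  # a diphthong is written as two vowel symbols
    (t1f1, t1f2), (t2f1, t2f2) = LEVELS[first], LEVELS[second]
    return t2f1 - t1f1, t2f2 - t1f2


for d in ("ɪə", "ɛə", "ʊə", "eɪ", "aɪ", "ɔɪ", "ɑʊ", "əʊ"):
    stf1, stf2 = assumed_shift(d)
    print(f"/{d}/  STF1={stf1:+d}  STF2={stf2:+d}")
```

Under these assumptions the printed values correspond to the STF1 and STF2 columns of table 24, with an absolute value of 1 read as a moderate change and 2 as a high change.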

61

The subtraction is therefore analogous to the degree of change in the speech apparatus.

62

Absolute value: the distance from zero (-2 and +2 have the same absolute value, 2); “absolute value is defined

as the distance, without regard to direction, that any number is from 0 on the real number line” (Mathematics 3).

5.3.2 Positive and Negative Target Values in Corpora – Measured Shifts

Table 26

Measured target shifts (T2-T1) in the referent data

Diphthong f1 f2 f3

ɑʊl -527.35 -517.51 146.75

ɑʊs -475.35 -663.35 -58.48

aɪl -369.05 1102.68 -192.73

aɪs -310.02 699.77 -240.9

ɛəl -121.84 -106.97 -184.49

ɛəs 98.17 -4.27 58.01

eɪl -174.56 144.01 -16.21

eɪs -82.02 179.86 9.87

ɪəl -2.98 -599.09 -213.74

ɪəs 26.64 -419.61 -60.49

əʊl -209.45 -108.72 -5.79

əʊs -245.74 -122.42 -87.67

ɔɪl -133.1 1208.06 216.23

ɔɪs -161.52 747.89 119.16

ʊəl -149.24 422.77 249.12

ʊəs 3.79 117.44 -72.31

Table 27

Measured target shifts (T2-T1) in the corpus data

Diphthong f1 f2 f3

ɑʊl -411.68 -136.61 181.38

ɑʊs -375.1 -285.84 181.38

aɪl -382.43 861.63 142.15

aɪs -427.73 861.41 177.43

ɛəl -9.65 -404.6 -265.41

ɛəs -91.54 -218.24 -120.21

eɪl -126.71 221.99 75.56

eɪs -195.76 383.58 223.1

ɪəl 74.26 -725.41 -221.93

ɪəs 26.7 -519 -159.03

əʊl -169 -229.7 49.48

əʊs -160.24 -287.13 19.05

ɔɪl -119.65 1152.92 -9.01

ɔɪs -155.41 990.11 14.56

ʊəl 29.59 285.36 -24.27

ʊəs -20.7 273.68 -1.26

Tables 26 and 27 were calculated by subtracting F1, F2 and F3 formant values in Hertz in

two targets:

\[ \Delta F_i = F_{i,2} - F_{i,1} \]

In the above formula, i is the first, the second or the third formant, 1 the first target and 2 the

second target within one diphthong. We can use the tables to discuss the degree of differences

in formants, between the corpora and the assumed values. Since we are now more interested

in the positive/negative regions than in the exact frequency values, table 28 will be more

useful. It was created by combining tables 26 and 27, while taking positive and negative

values only.

Table 28

Measured positive and negative target shifts in both the

test speaker and the Serbian speakers (“ref” – the native

speaker, “cor” – the Serbian speakers).

Diphthong f1ref f2ref f3ref f1cor f2cor f3cor

ɑʊl - - + - - +

ɑʊs - - - - - +

aɪl - + - - + +

aɪs - + - - + +

ɛəl - - - - - -

ɛəs + - + - - -

eɪl - + - - + +

eɪs - + + - + +

ɪəl - - - + - -

ɪəs + - - + - -

əʊl - - - - - +

əʊs - - - - - +

ɔɪl - + + - + -

ɔɪs - + + - + +

ʊəl - + + + + -

ʊəs + + - - + -

The table shows big similarities between the measured target shifts in data acquired after

analysing the recordings of Serbian speakers (“cor” columns) and the native speaker (“ref”

columns). Let us focus on F1 and F2, as the more dominant formants in vowel distinction,

and see where the similarities and differences are.

Table 29

List of diphthongs and their target matches/mismatches in the corpora

Target differences                   Diphthongs

The same targets in F1, F2 and F3    /ɑʊl/, /ɛəl/, /eɪs/, /ɪəs/, /ɔɪs/

The same targets in F1 and F2        /ɑʊs/, /aɪl/, /aɪs/, /eɪl/, /əʊl/, /əʊs/, /ɔɪl/

Mismatches in targets                /ɛəs/, /ɪəl/, /ʊəl/, /ʊəs/

Table 29 shows there were five diphthongs with the same target differences in F1, F2, and F3

in the pronunciation. For example, in the long version of /ɑʊ/ both the Serbian speakers and

the native RP speaker had the same positive and negative values in diphthong targets.

Consequently, it is safe to assume that the movements and the positions of the articulators

were approximately the same in these five diphthongs: /ɑʊl/, /ɛəl/, /eɪs/, /ɪəs/, and /ɔɪs/.
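The grouping in table 29 follows mechanically from the signs of the measured shifts. The sketch below is an illustration only, not the procedure used for the paper: the function names and group labels are hypothetical, and only three diphthongs are copied from tables 26 and 27 to keep it short.

```python
# (F1, F2, F3) shifts in Hz, copied from table 26 (referent) and table 27 (corpus).
REFERENT = {
    "ɑʊl": (-527.35, -517.51, 146.75),
    "ɑʊs": (-475.35, -663.35, -58.48),
    "ɛəs": (98.17, -4.27, 58.01),
}
CORPUS = {
    "ɑʊl": (-411.68, -136.61, 181.38),
    "ɑʊs": (-375.10, -285.84, 181.38),
    "ɛəs": (-91.54, -218.24, -120.21),
}


def signs(shifts):
    """Map each shift to '+' or '-' as in table 28 (no exact zeros occur in the data)."""
    return tuple("+" if s > 0 else "-" for s in shifts)


def classify(ref_shifts, cor_shifts):
    """Place a diphthong into one of the three groups of table 29."""
    ref, cor = signs(ref_shifts), signs(cor_shifts)
    if ref == cor:
        return "same in F1, F2 and F3"
    if ref[:2] == cor[:2]:
        return "same in F1 and F2"
    return "mismatch"


for name in REFERENT:
    print(name, classify(REFERENT[name], CORPUS[name]))
```

Comparing signs rather than raw Hertz values keeps the comparison independent of the overall formant range of each speaker group.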

The next group consists of seven vowel glides: /ɑʊs/, /aɪl/, /aɪs/, /eɪl/, /əʊl/, /əʊs/ and

/ɔɪl/. In the context of the target shifts, they show the same positive/negative sign in F1 and

F2. These diphthongs had no similarity in the F3 target (the referent speaker had negative

values for the F3 target shift in six out of seven diphthongs in this group). This means that F3

values in the first targets were higher than those in the second target, and it implies that lip

rounding63

was prominent in the second target.

The diphthong /ɑʊs/ is in the group with the F3 mismatch. In the short version of /ɑʊ/

the native speaker had higher F3 formant values in /ɑ/ than in rounded /ʊ/, while the Serbian

speakers did not. We can explain formant lowering in this diphthong by more intense lip

rounding in /ʊ/ in the native speaker’s pronunciation.

The only diphthong in the group where the native speaker has higher F3 in the second

target than in the first target is /ɔɪl/.

The third group of diphthongs in this section about measured target shifts lists the

glides where the Serbian speakers exhibited the greatest differences; these are /ɛəs/, /ɪəl/, /ʊəl/

and /ʊəs/ – all centring diphthongs. Both versions of /ʊə/ are in this group, but it is worth

noting that the RP speaker pronounced “bourse” with a monophthong. The

monophthongisation might have affected the data and positive/negative values. In /ɛəs/ the

students had much lower second targets in F2 and F3; in /ɪəl/ they had higher F1 in the first

target. In the short version of /ʊə/, the Serbian students had lower F1 in the second target,

while in the pronunciation of the long version they had higher F1 in the second target, and, in

addition – lower F3. Standard deviation in this group is similar within the corpora.

5.3.3 Correcting Target Value Predictions

Table 24 provided information about the general change of diphthong targets and the

assumed changes. Let us now see how the table with the expected values corresponds with

the real-world data we acquired from the corpora, and where the differences and similarities

were. Below is table 24 again, with marked discrepancies found after the measurements.

Table 30

Target value predictions (assumptions) with differences within the

corpora (* - different in the Serbian speakers, ! - different in the referent

speaker, in brackets – the measured values)

Diphthong T1F1 T1F2 T2F1 T2F2 STF1 STF2

/ɪə/ 1 3 2 2 +1! -1

/ɛə/ 2 3 2 2 0 (-, +) -1!

/ʊə/ 1 1 2 2 +1*! +1

/eɪ/ 2 3 1 3 -1 0 (+)

/aɪ/ 3 1 1 3 -2 +2

/ɔɪ/ 2 1 1 3 -1 +2

/ɑʊ/ 3 1 1 1 -2 0 (-)

/əʊ/ 2 2 1 1 -1 -1

63

The result of rounding is elongation of the speaking apparatus. This is followed by the drop of resonance in

the apparatus (tube), which consequently lowers the formants.

The target values with a mismatch are marked with * (the student data had different

value) and ! (the referent speaker had different value). The mismatches occurred in /ɪə/, /ʊə/

and /ɛə/. These are all centring diphthongs with schwa as the second target. In /ɪə/, both

corpora had the assumed values except for the long version (/ɪəl/) in F1, where the value in

the RP speaker was negative.

We can further explain (or complicate) the discrepancies by stating what the target

values should have been in the acquired data in order to have the predicted positive/negative

shifts. Thus, to have a plus in the table for /ɪə/, the native speaker should have had either

lower value of F1 in the first target, or higher value of F1 in the second target. In other words,

F1 should have been either lower in /ɪ/ or higher in /ə/. When we compare the data64 we see

that the differences were very small indeed: in the long version of /ɪə/ F1 for /ɪ/ was 396.67

Hz and F1 for /ə/ was 393.69 Hz, so a change of sign in the shift was highly probable. The Serbian

speakers articulated with greater difference: the values were 388.98 Hz and 463.23 Hz

respectively, and therefore the calculation was positive.

The diphthongs with different measurements, when compared with the assumed target

shifts, were /ʊə/ and /ɛə/. In F1 of /ʊə/, where this diphthong was pronounced long, the

referent speaker had a negative value, while the Serbian speakers had a negative value in the

short version. The referent speaker had -149.24 Hz difference between F1 of /ʊ/ and /ə/ for

the long version, while only 3.79 Hz for the short version, which points to a conclusion that

/ʊə/ was monophthongized. The Serbian speakers had -20.70 Hz difference in the short

version of the diphthong.

64

Detailed data is in chapter 7.4.

We have given details about the discrepancies between the expected positive and

negative values in the measured target shifts. What is left is a discussion about the expected

zero values. In the assumed values a zero, very broadly, meant “lack of change”, but we knew

that such claim was not going to be substantiated by the measurements, because it was highly

improbable.65

As already noted, the assumptions were abstracted from the assumed F1/F2

changes according to the tongue position. What then happened with the target changes that

were provisionally labelled as zeroes (table 24)? The updated table 30 has either "+" or "-" in

the zero cells; this means that the target changes for a particular diphthong were in the

positive or the negative region. For /ɛə/ the F1 target change was negative, /eɪ/ had a positive

F2 target change, while in /ɑʊ/ we observed a negative target difference in F2. The values

were fairly consistent in both corpora for F1 and F2, except for /ɛə/, where the referent

speaker had a positive value in the short version.

5.3.4 Vowel Space in Targets

This section deals with the vowel space in the corpora. The focus is not on the diphthongs

individually, but on the vowel space created by the diphthongal targets. To acquire an

overview of the vowel space we will use F1/F2 plots. F3 data is available only in the tables,

and we will refer to it occasionally.

65

To begin the list of reasons: even the same speaker will not pronounce the same word in absolutely the same

manner to achieve the zero values we “anticipated”. Other reasons are differences due to the hardware,

calculations, methods, etc.

Figure 21. Vowel space in targets – the referent speaker (the second targets are above)

Figure 21 shows the F1/F2 plot of data measured in the native speaker. The upper segment of

the graph represents the second targets, which were vowels /ɪ/, /ə/ and /ʊ/. The lower part of

the graph contains the first targets. The two figures were calculated by taking the highest

values in the F1/F2 data, and then delineating66

the vowel space in the graph.
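The delineation step can be illustrated with a convex hull computed in plain Python. This is a sketch under stated assumptions rather than the procedure used for the figures: the paper reports using the R function chull, while the code below swaps in Andrew's monotone chain algorithm, the function name is hypothetical, and the input points are the per-vowel mean values from table 31, used as a stand-in for the individual measurements.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product of vectors OA and OB
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        chain = []
        for p in seq:
            # drop points that would make a clockwise or straight turn
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # the last point starts the other half

    return half_hull(pts) + half_hull(reversed(pts))


# (F2 mean, F1 mean) in Hz per vowel, from table 31 (referent data).
vowel_means = {
    "ɑ": (1656.59, 891.89), "a": (1389.30, 700.65),
    "ɛ": (1909.00, 585.82), "e": (2197.79, 532.55),
    "ɔ": (1270.26, 493.94), "ʊ": (1240.57, 383.08),
    "ɪ": (2308.99, 383.48), "ə": (1680.69, 480.32),
}

outline = convex_hull(vowel_means.values())
print(outline)  # vertices of the polygon enclosing the mean vowel positions
```

Points that fall inside the polygon are dropped from the outline, which is why central vowels typically do not appear among the hull vertices.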

Figure 21 is complemented with table 31, where the columns show minimum,

maximum and mean values for the three formants.

66

This was calculated in R with the chull function, which “[c]omputes the subset of points which lie on the convex hull

of the set of points specified” (the R documentation).

Table 31

Individualised vowels (extracted from the diphthongs) and their values in the referent data

Vowel Count f1min f1max f1mean f2min f2max f2mean f3min f3max f3mean

ɑ 2 882.82 900.96 891.89 1600.1 1713.08 1656.59 2513.43 2614.59 2564.01

a 2 691.64 709.66 700.65 1295.36 1483.24 1389.3 2805.72 2937.75 2871.73

ɛ 2 546.69 624.95 585.82 1865.31 1952.69 1909 2653.66 2772.51 2713.09

e 2 523.18 541.93 532.55 2127.7 2267.88 2197.79 2711.26 2717.46 2714.36

ɔ 2 487.11 500.78 493.94 1137.87 1402.66 1270.26 2583.58 2624.88 2604.23

ʊ 6 301.9 455.22 383.08 1049.73 1498.36 1240.57 2537.22 2664.93 2610.09

ɪ 8 339.26 447.1 383.48 2150.55 2447.74 2308.99 2564.82 2799.81 2719.21

ə 8 301.29 644.86 480.32 1295.34 1879.49 1680.69 2556.76 2846.56 2652.19

This table was not used for creating the graph, but it relies on the same data. In the count

column we see the total count of vowels in the data; e.g. there were six instances of /ʊ/, four in

the second target (up) and two in the first (below).

Figure 22. Vowel space for both targets – the Serbian speakers (the second targets are above)

Table 32

Individualised vowels (extracted from the diphthongs) and their values in the corpus data

Vowel Count f1min f1max f1mean f2min f2max f2mean f3min f3max f3mean

ɑ 2 823.07 826.65 824.86 1542.39 1606.43 1574.41 2766.15 2782.68 2774.41

a 2 792.74 822.62 807.68 1387.43 1460.32 1423.88 2810.99 2817.31 2814.15

ɛ 2 513.6 544.2 528.9 2135.42 2269.92 2202.67 3005.13 3027.45 3016.29

e 2 491.2 550.48 520.84 2209.54 2367.65 2288.6 2947.11 3024.48 2985.8

ɔ 2 492.95 528.7 510.83 1121.82 1300.17 1210.99 2949.4 2958.97 2954.18

ʊ 6 373.77 456.09 422.01 1285.96 1501.36 1384.7 2873.92 2964.06 2913.38

ɪ 8 354.72 410.31 383.39 2249.06 2593.12 2416.39 2949.96 3170.21 3037.97

ə 8 433.86 546.56 482.62 1530.73 1970.33 1776.05 2762.04 2920.09 2864.04

Figure 23. Vowel spaces divided into targets in the corpora (the second targets are above)

Figure 23 shows the two vowel spaces combined. The differences in the vowel scope are

visible for both targets. The referent speaker data is much broader for most of the vowels in

the second target. It is only in /ɪ/ that the Serbian speakers had a significant shift outside the

referent space; here, the left edge of the second target in corpus data corresponds to /eɪ/ (short

and long). The /ɪ/ is protruding in the first targets (in /ɪə/) as well; in the first target section the

biggest difference is visible in /ɑ/, where the native speaker had much higher F1 and F2.

The students were successful in emulating the English vowel space, but there were

significant differences. The Serbian speakers seemed to struggle with back rounded vowel

/ʊ/, particularly when it was pronounced in the second target (table 28 from the previous

section shows mismatches for the measured shifts in F3 in diphthongs containing /ʊ/, the

formant not shown in the figures). We also noticed diversity in schwa in the pronunciation of

the native speaker, which was lacking in the Serbian data: it is /ə/ and /ʊ/ in the referent speaker

that influenced broadening of the second target region. If /ə/ were less diversified, we might

have seen almost a parallel line in the second region, between /ɪ/ and /ʊ/ – and the picture

would be much closer to the Serbian data (and vice versa: more range in /ə/ for the Serbian

speakers could have meant better RP targets).

Figure 24. The English vowel space and the two diphthong targets (the

second targets above), with superimposed Serbian vowel space (the

triangles)

Figure 24 shows the English vowel space for the Serbian speakers and the native speaker, in

the context of the first and the second targets. The vowel spaces from our research are shown

by a full line (the Serbian speakers) and dotted line (the RP speaker). The third data set, the

dot-dash line, is also from the Serbian speakers, but from the research where the speakers

pronounced Serbian vowels (Marković67). The two dot-dash triangles correspond to short (the

inner triangle) and long Serbian vowels (the outer triangle). We see that the Serbian vowel space68

exerted an influence on the English vowel space of the Serbian speakers from our research;

this is particularly visible in the lines on the left side, where the full line (the speakers) is

almost parallel to the dot-dash triangular lines (Marković corpus). Also, such parallel

properties are visible in the upper part, where the second targets follow the Serbian vowel

space. It seems that the long vowel context in Serbian had a strong influence on English

vowel production – at least that is what the graph is leading us to conclude: the parallels that

follow the triangle correspond to the Serbian long vowel context.

67

A corpus compiled from measurements taken after a test group (of the same profile as the test group for this

paper) recorded in Serbian, for a context in which Serbian vowels show long and short realisation.

68

The figure does not contain individual sounds, due to technical constraints: they would not appear clear in low

resolution.

Figure 25. English vowel space in the diphthongs, as measured in the students’

data (the first targets)69

We already mentioned how schwa was less diversified in the Serbian data. After

examining figure 24, there is another explanation for the second target difference in graphs:

the influence of Serbian vowel space might be stronger than the tendency (or ability) of the

Serbian speakers to have a wide range of schwa sounds. In other words, our Serbian speakers,

in their first university year, fresh from high-school, did not master the central point of the

English vowel system. It seems that the lack of clear perception and articulation of /ə/

affected the overall structure in English as the second language. Table 34 lists schwa

properties regardless of the diphthongal context, and it shows that the Serbian speakers

struggled with F1, which corresponds to the close/mid/open positions in vowels. Similar

results were also found for final /ə/ (table 33). This demonstrates that throughout the

contexts where schwa is present, the students most likely had insufficient articulatory closure

level to achieve better results in English (when compared to the native speaker)70.

69

It would have been useful to have this kind of a plot for the English data as well, but the computation was not

possible with one set of samples (one speaker). The ellipses were drawn by using the car package (An {R}

Companion to Applied Regression).

Table 33

Values for /ə/ in the final position within “dare”

Vowel Diphthong f1 f2 f3

Corpus ə ɛə 502.6815 2260.998 3029.305

Referent ə ɛə 587.0339 1975.631 2686.869

Table 34

Schwa extracted from its diphthongal context (data from all words)

vowel count f1min f1max f1mean f2min f2max f2mean f3min f3max

Ref. ə 8 301.29 644.86 480.32 1295.34 1879.49 1680.69 2556.76 2846.56

Cor. ə 8 433.86 546.56 482.62 1530.73 1970.33 1776.05 2762.04 2920.09

70

It would be interesting to conduct research that would examine formant values in the final versus the non-final

schwa.

Figure 26. English vowel space in the diphthongs, as measured in the students’

data (the second targets)

5.3.5 Formant Magnitudes

This section is about movements and differences between the two targets. It features two

ways of representing data that we introduce in this paper for the first time. We will first

present graphs showing diphthong targets as lines in the coordinate system consisting of F1

and F2 axes, similar to the representation of vectors. The second approach is drawing and

examining differences by using diphthong magnitude.71

71

We are introducing “diphthong magnitude” (borrowed from the Euclidean terminology) to differentiate it

from the “diphthong length”, a phrase we used in the chapter about the temporal domain. Description of vowels

through the Euclidean calculations is not a novel endeavour. However, we had no access to sources pertaining

to such calculations and differences, nor did the idea come from any formal text. The numbers referring to the

magnitude are not an attempt to present a detailed procedure, but a wish to provide more details about graphs in

the context of this paper. This is a good place to thank the members of the official online IRC channel for R

project (statisticians, mathematicians, programmers etc.) for their unselfish help in work with R, and their note

that Euclidean distance could be a suitable way to measure distances in a graph.

Figure 27. The movement of the two targets in F1/F2 vowel space – the long

diphthongs in corpora

Figure 27 shows long diphthongs in the research. They were drawn by plotting points A with

coordinates x1 and y1, then points B with x2 and y2. The points A and B were then joined by

a line, with an arrow next to B. A is the first vowel (target) of a diphthong, with its F2

and F1 values, while B is the second vowel, again, with its F2 and F1 values. This means that

the points represent the approximate values attained in the articulatory configuration, while

the arrow points towards a particular configuration the speech apparatus was attaining

towards the second target in a diphthong72.

Figure 28. The movement of the two targets in F1/F2 vowel space – the short

diphthongs in the corpora

Therefore, the properties of these diphthong lines73 are: the direction (from the first

target/vowel towards the second) and the magnitude (the Euclidean distance74 in the F2/F1 plane

between the targets). The direction identifies the diphthong, while the distance shows the

extent of the change in the frequency domain.

72

Ladefoged (Data 135) gives an example of the F1/F2 plot where diphthongs were represented by several

points between the two targets. Surely, that was possible here as well. The magnitude in that instance would

have been calculated by adding up all of the lengths between the points. However, we decided to choose a

simpler approach, so only the starting and the ending points are used.

73

It is tempting to call them vectors, but we will avoid that, because it would involve mathematical evaluations

we are not sufficiently familiar with.

74

Appendix 7.5 is where the equation is explained.

Figure 29. The magnitude differences (the distances in F1/F2 plot) for all diphthongs

This follows from the assumption that F1 and F2 change in accordance with the acoustic
properties of the articulators. A line defined by two points, each reflecting a particular state
of the vocal apparatus in Hertz values of F1 and F2, will be longer for pairs which are more
distant in the vocal space (e.g. /aɪ/, back-open versus front-closed) than for those that are
closer in the same vocal space (e.g. /ɛə/, central-middle versus central-middle).
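The relation between the distance of the targets and the length of the line can be illustrated with a minimal Python sketch (the calculations in this paper were done in R). The formant values below are hypothetical and are not taken from the corpora; the function names `magnitude` and `path_length` are assumptions made for the illustration. The second function corresponds to the alternative with several points between the targets, which was not used here.

```python
import math


def magnitude(first, second):
    """Euclidean distance (Hz) between two (F2, F1) points."""
    return math.hypot(second[0] - first[0], second[1] - first[1])


def path_length(points):
    """Sum of the distances between consecutive (F2, F1) points."""
    return sum(magnitude(a, b) for a, b in zip(points, points[1:]))


# Hypothetical (F2, F1) targets in Hz, for illustration only.
wide_glide = [(1300.0, 800.0), (2100.0, 400.0)]    # distant targets
narrow_glide = [(1700.0, 550.0), (1500.0, 500.0)]  # close targets

wide = magnitude(*wide_glide)
narrow = magnitude(*narrow_glide)

# The same wide glide measured through one intermediate point.
curved = path_length([(1300.0, 800.0), (1600.0, 700.0), (2100.0, 400.0)])

print(f"wide: {wide:.2f} Hz, narrow: {narrow:.2f} Hz, curved: {curved:.2f} Hz")
```

With only the two end points, the magnitude is the straight-line distance; a path through intermediate points can only be equal to it or longer.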

Table 35 lists the magnitudes and their differences, and figure 29 plots those differences.
The closer a difference is to zero, the more similar the referent and the corpus diphthong
were magnitude-wise in the coordinate system.

Table 35
Magnitudes and percentage/distance differences

Diphthong   Ref. (Hz)   Cor. (Hz)   Difference (Hz)   Perc. (%)
ɛəl         162.13      404.72      -242.59           249.63
ɔɪs         765.13      1002.23     -237.1            130.99
eɪs         197.68      430.65      -232.97           217.85
aɪs         765.37      961.76      -196.39           125.66
ʊəs         117.5       274.46      -156.96           233.58
ɛəs         98.26       236.66      -138.4            240.85
ɪəl         599.1       729.2       -130.1            121.72
ɪəs         420.45      519.69      -99.24            123.6
əʊs         274.54      328.82      -54.28            119.77
əʊl         235.99      285.17      -49.18            120.84
eɪl         226.3       255.61      -29.31            112.95
ɔɪl         1215.37     1159.11     56.26             95.37
ʊəl         448.34      286.89      161.45            63.99
aɪl         1162.8      942.69      220.11            81.07
ɑʊl         738.86      433.75      305.11            58.71
ɑʊs         816.08      471.6       344.48            57.79

The percentage column expresses the ratio of the magnitude in the Serbian data to the magnitude
in the referent data. For example, the value of 249.63% means that the students’ data had about
2.5 times larger Euclidean distance than the referent data for the diphthong /ɛəl/. We will use
this abstracted data to evaluate the similarities and differences between the first two formants
in the referent and in the Serbian data. In the previous section we discussed the vowel spaces in
general, and this is where we see what exactly happened in the vocal spaces.
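The two derived columns of table 35 can be checked with a short Python sketch. It assumes, as the figures in the table suggest, that the difference is the referent value minus the corpus value and that the percentage is the corpus value divided by the referent value; the function name `compare` is hypothetical.

```python
def compare(ref, cor):
    """Return (difference in Hz, corpus/referent ratio in %)."""
    difference = ref - cor
    percentage = cor / ref * 100
    return round(difference, 2), round(percentage, 2)


# Three rows of table 35: label -> (referent, corpus) magnitude in Hz.
rows = {
    "ɛəl": (162.13, 404.72),
    "ɔɪl": (1215.37, 1159.11),
    "ɑʊs": (816.08, 471.6),
}

results = {label: compare(ref, cor) for label, (ref, cor) in rows.items()}

for label, (difference, percentage) in results.items():
    print(label, difference, percentage)
```

A negative difference therefore goes together with a percentage above 100: the students' glide covered a larger distance than the referent one.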

Table 36
Magnitude differences in the referent and the corpus data, according to the diphthong length

                        Ref.     Cor.     Difference
Short (Hz)              431.9    528.23   -96.36
Long (Hz)               598.61   562.14   36.47
Percents (short/long)   72.15%   93.97%


Table 36 shows the magnitudes of the diphthongs, grouped by length. The magnitude of
the short diphthongs in the native speaker was 72.15% of the long ones, while in the students’
data the short diphthongs reached 93.97% of the magnitude of the long ones. This means that the
RP speaker’s magnitudes in the short diphthongs were more than a quarter (27.85%) smaller than in
the long diphthongs; for the Serbian speakers the difference was only 6.03%.
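The grouping by length can be reproduced from the rows of table 35. The following Python sketch is only an illustration of that arithmetic (it is not the R code used for the paper); it assumes that the last letter of each label marks the length (s for short, l for long).

```python
from statistics import mean

# Table 35: label -> (referent, corpus) magnitude in Hz.
table35 = {
    "ɛəl": (162.13, 404.72), "ɔɪs": (765.13, 1002.23),
    "eɪs": (197.68, 430.65), "aɪs": (765.37, 961.76),
    "ʊəs": (117.5, 274.46), "ɛəs": (98.26, 236.66),
    "ɪəl": (599.1, 729.2), "ɪəs": (420.45, 519.69),
    "əʊs": (274.54, 328.82), "əʊl": (235.99, 285.17),
    "eɪl": (226.3, 255.61), "ɔɪl": (1215.37, 1159.11),
    "ʊəl": (448.34, 286.89), "aɪl": (1162.8, 942.69),
    "ɑʊl": (738.86, 433.75), "ɑʊs": (816.08, 471.6),
}


def mean_by_length(length, column):
    """Mean magnitude for one length ('s' or 'l') and one column (0 = ref., 1 = cor.)."""
    return mean(v[column] for label, v in table35.items() if label[-1] == length)


ref_short, ref_long = mean_by_length("s", 0), mean_by_length("l", 0)
cor_short, cor_long = mean_by_length("s", 1), mean_by_length("l", 1)

# Short magnitude as a percentage of the long magnitude.
ref_ratio = ref_short / ref_long * 100
cor_ratio = cor_short / cor_long * 100

print(f"referent: {ref_short:.2f} / {ref_long:.2f} = {ref_ratio:.2f}%")
print(f"corpus:   {cor_short:.2f} / {cor_long:.2f} = {cor_ratio:.2f}%")
```

The means and the two ratios agree with table 36 to within rounding.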

Table 37
The magnitude by the second diphthong target (included are only diphthongs that have
these vowels as their second element)

Ending vowel   Ref. (Hz)   Cor. (Hz)   Diff. (Hz)   %
ɪ              722.11      792.01      -69.9        109.68
ə              307.63      408.6       -100.97      132.82
ʊ              516.37      379.84      136.53       73.56

When the data is split according to the second targets (table 37), the results are ranked as
expected: the magnitude is higher for the diphthongs whose second target is on the periphery
of the vocal space, and lower for the diphthongs whose second target is closer to the central
positions of the vocal space.

In the RP speaker data, the highest magnitude was measured in the diphthongs ending
with /ɪ/, followed by /ʊ/, while the glides with mid-central /ə/ as the second target had the
lowest magnitude. In the Serbian data the diphthongs ending with /ɪ/ again had the highest
magnitude. Compared with the referent data, the values were higher in /ɪ/ (110%) and /ə/ (133%),
but lower in /ʊ/ (74%), so that the glides ending with /ʊ/ (379.84) fell slightly below those
ending with /ə/ (408.6).

5.3.6 Conclusion

Let us conclude the chapter by seeing how similar the overall magnitudes between the
measurements were. The Serbian speakers had the best result in /ɔɪl/, whose magnitude was 95%
of the referent one (table 35). They pronounced eleven out of the sixteen diphthongs with higher
magnitude (113-250%); out of those, seven glides were short (/əʊ/, /ɛə/, /ɪə/, /ʊə/, /aɪ/, /eɪ/ and
/ɔɪ/) and four long (/eɪ/, /əʊ/, /ɪə/ and /ɛə/). The remaining four diphthongs, pronounced with
lower magnitude (58-81%), consisted of one short (/ɑʊ/) and three long (/ɑʊ/, /aɪ/, /ʊə/)
diphthongs. At the top of the magnitude list is the long version of the diphthong /ɛə/, where
the Serbian speakers had about 2.5 times higher magnitude in pronunciation. The lowest ratio,
0.58, was measured in the short /ɑʊ/, while its long counterpart was just a notch above, with a
ratio of 0.59. The difference in pronunciation is visible in the magnitude results: the change in
the students’ vocal apparatus rarely reflects the scope and length of the change in the RP
speaker’s apparatus. Even when the magnitude is very similar, as in /ɔɪl/, F1 and F2 are outside
the region expected in the English vocal space, and closer to the Serbian vocal space
(figures 22, 24).

This leaves us with a question symmetrical to the one in the conclusion about the
length in seconds: what is important in a diphthong? Is it the F1 and F2 values in the first
target, in the second, or in both that matter most, or must they also be accompanied by the
proper magnitude?


5.4 Intensity

We measured intensity at the same points in the corpora as formants and pitch. This means that
the values in this chapter show the intensity within a diphthong where formants have a typical
distribution for a particular vowel. Such a point is not necessarily at the peak of intensity;
it can be located before or after the peak. One alternative was to measure intensity in all
points (the frames of sound data) of a diphthong, and then to calculate the mean value. However,
this would have raised another problem: how to find the intensity at the individual targets,
which is our main concern? We could have attempted to solve this second problem by calculating
the mean for the first and the second target separately, but where should the line that
separates the two targets be drawn? Faced with these considerations, we decided to measure
intensity at the points of formant measurement. What follows is a summary of the data acquired.
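The difference between the two approaches can be shown on a small Python sketch. The intensity track and the frame indices below are hypothetical and serve only as an illustration; they are not measurements from the corpora.

```python
from statistics import mean

# Hypothetical intensity track of one diphthong (dB), one value per frame.
frames = [70.0, 76.0, 80.0, 82.0, 81.0, 78.0, 75.0, 73.0, 71.0]

# Hypothetical frame indices at which the formants of the two targets
# were measured.
first_target, second_target = 2, 7

# Approach not used in the paper: one mean value for the whole diphthong.
overall_mean = mean(frames)

# Approach used in the paper: intensity read at the formant measurement points.
at_targets = (frames[first_target], frames[second_target])

# The intensity peak need not coincide with either target.
peak_index = frames.index(max(frames))

print(f"mean over all frames: {overall_mean:.2f} dB")
print(f"at the targets: {at_targets[0]} dB and {at_targets[1]} dB")
print(f"peak at frame {peak_index}, first target at frame {first_target}")
```

The overall mean gives a single number per diphthong and hides the difference between the targets, while the readings at the measurement points keep the two targets apart without any need to split the diphthong in two.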

5.4.1 Overall Intensity


Figure 30. Intensity of all diphthongs and their targets in the native speaker data

Intensity in the referent data had a wider range than intensity in the corpus data. The
standard deviation of intensity in the native speaker data was 5 dB, while in the Serbian data
it was 3.3 dB.

Figure 30 shows intensity in the recording of the RP speaker. The black symbols
refer to the long diphthongs, the white to the short. The triangles denote the first targets, the
squares the second ones. Therefore, there are four symbols per diphthong: two for the short
and two for the long version. For example, the graph shows that /ʊə/ and /əʊ/ had
approximately the same intensity for the first target (/ʊ/ and /ə/, respectively) in their long
variants (the black triangles), or that /ə/ in the short /ɛə/ had one of the lowest intensities
in the RP pronunciation.


Table 38
Decibels in the short diphthong targets (the native speaker)

Target (short)   dB (ref.)

ɔɪs_2 66.52

ɛəs_2 66.89

əʊs_2 70.11

eɪs_2 73.07

ɪəs_2 73.64

ɑʊs_2 73.82

ʊəs_2 74.13

ʊəs_1 76.51

ɪəs_1 77.62

aɪs_2 79.02

ɑʊs_1 80.01

eɪs_1 80.43

aɪs_1 80.44

ɛəs_1 80.72

ɔɪs_1 81.83

əʊs_1 82.19

Table 39
Decibels in the long diphthong targets (the native speaker)

Target (long)   dB (ref.)

ʊəl_2 67.36

əʊl_2 68.3

eɪl_2 72.86

ɔɪl_2 73.71

aɪl_2 74.13

ɛəl_2 75.18

ɪəl_2 75.97

ɑʊl_2 76.48

ɪəl_1 77.4

aɪl_1 78.78

ɛəl_1 79.05

ɑʊl_1 79.06

ɔɪl_1 80.12

eɪl_1 80.37

əʊl_1 82.24

ʊəl_1 82.46


Tables 38 and 39 show intensity in the targets75, as pronounced by the referent speaker. The
results were consistent, and indicated that the second targets had the lowest intensity in both
the short and the long diphthongs. Only /ɪ/ in the short version of /aɪ/ showed a slight
deviation, because it was found in the group of targets with intensity similar to that of the
first vowels of the diphthongs.

5.4.2 Intensity by Targets and Length

The means in table 40 were calculated by target position (the first versus the second) and by
length (the long versus the short). The results show that the native speaker had higher
intensity in the long diphthongs. As for the target positions, the first targets in the RP data
were spoken with higher intensity than the second targets. Overall, the difference in intensity
was larger on the target level (7.38 dB) than on the length level (0.4 dB).

Table 40
Mean intensity values (dB) of the targets within the referent and the corpus data

Mean       Long    Short   First   Second
Referent   76.46   76.06   79.95   72.57
Corpus     75.87   75.57   78.92   72.53

The Serbian speakers had similar differences in intensity. The difference between the long and
the short diphthongs in the Serbian corpus was small, 0.3 dB, against 0.4 dB in the referent
data. Between the targets the students had a larger difference (6.39 dB), which is close to the
7.38 dB of the native speaker.

75 Again, “ʊəl_1”, for example, means “/ʊ/ in the long variant of /ʊə/”.


Figure 31. The intensity values within the Serbian speakers data

Figure 31 shows the intensity distribution in the corpus data. The range of the x-axis is the
same as in the previous figure, to better illustrate the differences and similarities. The
figure shows that the Serbian speakers pronounced the words in the corpus more uniformly. This is
particularly visible in the first targets, where the symbols (triangles) are almost stacked
together in some diphthongs. The reason for this might be that the native speaker, not foreign
to the studio and the recording process, was much more relaxed than our students, who, despite
our efforts, seemed to approach the recording somewhat uncomfortably and tried hard to
read the words properly.


From tables 41 and 42 we see that the pattern is the same as the one observed in the data from
the native speaker: the first targets, regardless of whether they were found in a short or a
long diphthong, had higher intensity in all instances.

Table 41
Decibels in the long diphthong targets (the Serbian speakers)

Target (long)   dB (cor.)
ɛəl_2           71.49
aɪl_2           71.57
əʊl_2           72.42
ʊəl_2           72.5
ɑʊl_2           73.01
ɪəl_2           73.04
ɔɪl_2           73.87
eɪl_2           73.96
ɪəl_1           77.83
aɪl_1           78.64
ʊəl_1           78.92
eɪl_1           79.01
ɑʊl_1           79.27
ɔɪl_1           79.29
ɛəl_1           79.47
əʊl_1           79.72

Table 42
Decibels in the short diphthong targets (the Serbian speakers)

Target (short)   dB (cor.)
əʊs_2            70.97
eɪs_2            71.4
ʊəs_2            71.43
ɑʊs_2            72.27
ɪəs_2            72.41
aɪs_2            72.45
ɔɪs_2            72.92
ɛəs_2            74.78
ɪəs_1            77.53
ʊəs_1            78.03
ɑʊs_1            78.28
ɛəs_1            78.62
aɪs_1            79.19
eɪs_1            79.31
ɔɪs_1            79.46
əʊs_1            80.21

Another way of representing the data is to subdivide it by both length and target position:

Table 43
Intensity (dB) in corpora calculated by the target position and the length

           First, long   First, short   Second, long   Second, short
Referent   79.935        79.968         72.998         72.15
Corpus     79.018        78.82          72.732         72.328

When the overall data is separated into groups by target position and diphthong length, the
results show that the native speaker had the highest intensity in the first vowels of the short
diphthongs. However, the difference from the first vowels of the long diphthongs was very
small, just 0.033 dB. The difference was somewhat higher in the second targets: 0.848 dB in
favour of the long diphthongs. The Serbian speakers had a larger difference between the first
targets (0.198 dB in favour of the long diphthongs), but a smaller difference in the second
targets (0.404 dB).
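The referent row of table 43 can be recomputed from tables 38 and 39 with the Python sketch below. It is an illustration of the grouping, not the R code used for the paper, and it assumes the notation of the tables: the letter before the underscore marks the length (s or l), and the digit after it marks the target (1 or 2).

```python
from collections import defaultdict
from statistics import mean

# Referent intensity (dB) per target, from tables 38 and 39.
referent = {
    "ɔɪs_2": 66.52, "ɛəs_2": 66.89, "əʊs_2": 70.11, "eɪs_2": 73.07,
    "ɪəs_2": 73.64, "ɑʊs_2": 73.82, "ʊəs_2": 74.13, "ʊəs_1": 76.51,
    "ɪəs_1": 77.62, "aɪs_2": 79.02, "ɑʊs_1": 80.01, "eɪs_1": 80.43,
    "aɪs_1": 80.44, "ɛəs_1": 80.72, "ɔɪs_1": 81.83, "əʊs_1": 82.19,
    "ʊəl_2": 67.36, "əʊl_2": 68.3, "eɪl_2": 72.86, "ɔɪl_2": 73.71,
    "aɪl_2": 74.13, "ɛəl_2": 75.18, "ɪəl_2": 75.97, "ɑʊl_2": 76.48,
    "ɪəl_1": 77.4, "aɪl_1": 78.78, "ɛəl_1": 79.05, "ɑʊl_1": 79.06,
    "ɔɪl_1": 80.12, "eɪl_1": 80.37, "əʊl_1": 82.24, "ʊəl_1": 82.46,
}

groups = defaultdict(list)
for label, db in referent.items():
    stem, position = label.split("_")  # e.g. "ʊəl", "1"
    length = stem[-1]                  # "s" (short) or "l" (long)
    groups[(position, length)].append(db)

means = {key: mean(values) for key, values in groups.items()}

for (position, length), value in sorted(means.items()):
    print(f"target {position}, length {length}: {value:.3f} dB")
```

The four means agree with the referent row of table 43 to within rounding; the same grouping applied to tables 41 and 42 gives the corpus row.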

Figure 32. The differences in intensity (the referent minus the speakers’ data)

5.4.3 Conclusion


We defined long diphthongs as the vocalic glides found before voiced (lenis) consonants,
and short as those found before voiceless (fortis) consonants. Therefore, the analysis in this
chapter points to the conclusion that the Serbian speakers had not mastered the degree of
intensity in the vowels of English when the vowels were found before voiced and voiceless
consonants. Table 44 shows that the students were most mismatched in intensity (when
compared to the native speaker) in the second targets of the short diphthongs, that is, the
diphthongs followed by a fortis. The results showed that the Serbian speakers had a very small
difference, less than 1 dB, in the following targets: /əʊs_2/, /ɪəl_1/, /ɛəl_1/, /ɑʊl_1/, /ɔɪl_2/,
/ɪəs_1/, /aɪl_1/, and /ɔɪl_1/.

Table 44
The differences in pitch and intensity between the RP and the Serbian data
(shown by the target pairs)

Targets   Pitch (Hz difference)   Intensity (dB difference)

ɑʊl_1 13.88 -0.21

ɑʊl_2 -7.02 3.47

ɑʊs_1 9.88 1.73

ɑʊs_2 -8.24 1.55

aɪl_1 17.22 0.14

aɪl_2 -14.35 2.56

aɪs_1 29.34 1.25

aɪs_2 -9.53 6.57

ɛəl_1 4.25 -0.42

ɛəl_2 -6.12 3.69

ɛəs_1 -9.92 2.1

ɛəs_2 -5.1 -7.89

eɪl_1 6.14 1.36

eɪl_2 -6.75 -1.1

eɪs_1 20.24 1.12

eɪs_2 -14.11 1.67

ɪəl_1 3.5 -0.43

ɪəl_2 -14.83 2.93

ɪəs_1 -2.28 0.09

ɪəs_2 -31.88 1.23

əʊl_1 9.95 2.52

əʊl_2 -15.56 -4.12

əʊs_1 14.78 1.98

əʊs_2 -6.61 -0.86

ɔɪl_1 -13.37 0.83

ɔɪl_2 -2.18 -0.16

ɔɪs_1 25.23 2.37

ɔɪs_2 1.61 -6.4

ʊəl_1 -5.35 3.54

ʊəl_2 -17.75 -5.14

ʊəs_1 11.64 -1.52

ʊəs_2 -12.21 2.7
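The selection of the targets with a difference below 1 dB can be checked against the intensity column of table 44. The Python sketch below is only an illustration; the threshold of 1 dB is the one named in the text, and the variable names are hypothetical.

```python
# Intensity differences (dB, referent minus corpus) from table 44.
intensity_diff = {
    "ɑʊl_1": -0.21, "ɑʊl_2": 3.47, "ɑʊs_1": 1.73, "ɑʊs_2": 1.55,
    "aɪl_1": 0.14, "aɪl_2": 2.56, "aɪs_1": 1.25, "aɪs_2": 6.57,
    "ɛəl_1": -0.42, "ɛəl_2": 3.69, "ɛəs_1": 2.1, "ɛəs_2": -7.89,
    "eɪl_1": 1.36, "eɪl_2": -1.1, "eɪs_1": 1.12, "eɪs_2": 1.67,
    "ɪəl_1": -0.43, "ɪəl_2": 2.93, "ɪəs_1": 0.09, "ɪəs_2": 1.23,
    "əʊl_1": 2.52, "əʊl_2": -4.12, "əʊs_1": 1.98, "əʊs_2": -0.86,
    "ɔɪl_1": 0.83, "ɔɪl_2": -0.16, "ɔɪs_1": 2.37, "ɔɪs_2": -6.4,
    "ʊəl_1": 3.54, "ʊəl_2": -5.14, "ʊəs_1": -1.52, "ʊəs_2": 2.7,
}

# Targets in which the students were within 1 dB of the native speaker.
close = sorted(t for t, d in intensity_diff.items() if abs(d) < 1.0)

# The three largest mismatches, regardless of sign.
largest = sorted(intensity_diff, key=lambda t: abs(intensity_diff[t]), reverse=True)[:3]

print("within 1 dB:", ", ".join(close))
print("largest mismatches:", ", ".join(largest))
```

The three largest mismatches are all second targets of short diphthongs, which is the group singled out in the conclusion above.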


5.5 Pitch

Pitch, just like intensity, has not been discussed in detail in this paper, but there are
several things to examine. One is that “high vowels have a somewhat higher fundamental
frequency on the average than low vowels” (Kent 95). In addition, we will look at the overall
difference between the data gathered from the native speaker and from the Serbian speakers.
The perceptual side of pitch is not discussed.

The notes about intensity from the previous chapter apply to pitch as well: we
measured the pitch values at the same points where formants were measured. The graphs also
follow the same principle: they show the eight diphthongs, with the targets and the lengths
labelled on the same line.

5.5.1 Overall Pitch

Table 45
Pitch in the referent data (sorted by ascending mean)

Vowel   N   Min. (Hz)   Max. (Hz)   Mean (Hz)
ə       8   165.17      238         193.92
ɪ       8   164.25      229.08      199.27
ʊ       6   172.23      227.64      199.95
ɛ       2   213.93      226.26      220.09
ɑ       2   222.87      229.85      226.36
ɔ       2   217.99      245.81      231.9
e       2   225.47      242.99      234.23
a       2   232.54      240.59      236.56

Table 45 was created by taking vowels from their targets within the diphthongs. It shows that
in our case high vowels have on average lower fundamental frequency than the low ones. However,
we should note that the number of samples per vowel is not the same (8, 6 or 2) and is generally
too low for a firm conclusion.

The lowest pitch was measured in the mid-central /ə/, while the highest value was
recorded in /a/. The high (close) vowels /ɪ/ and /ʊ/ came right after the mid-central vowel,
with means about 5-6 Hz above the lowest value.
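The way table 45 is built from the individual targets can be sketched in Python, using the referent pitch values listed in table 47. This is an illustration only (the paper's calculations were done in R). It assumes that each diphthong label consists of two single-character vowel symbols, so that the digit after the underscore selects the first or the second of them.

```python
from collections import defaultdict
from statistics import mean

# Referent pitch (Hz) per target, from table 47.
pitch = {
    "aɪl_2": 164.25, "ɪəs_2": 165.17, "ʊəl_2": 168.25, "əʊl_2": 172.23,
    "ɪəl_2": 174.39, "ʊəs_2": 177.28, "ɛəl_2": 178.61, "eɪs_2": 180.12,
    "ɑʊl_2": 181.53, "ɔɪl_2": 184.14, "aɪs_2": 188.72, "ɑʊs_2": 196.73,
    "əʊs_2": 196.84, "eɪl_2": 202.87, "ɛəs_1": 213.93, "ɛəs_2": 216.01,
    "ɔɪl_1": 217.99, "ɔɪs_2": 218.52, "ɑʊs_1": 222.87, "ʊəl_1": 224.73,
    "eɪl_1": 225.47, "ɛəl_1": 226.26, "ɪəl_1": 226.48, "ʊəs_1": 227.64,
    "ɪəs_1": 229.08, "ɑʊl_1": 229.85, "aɪl_1": 232.54, "əʊl_1": 233.66,
    "əʊs_1": 238.0, "aɪs_1": 240.59, "eɪs_1": 242.99, "ɔɪs_1": 245.81,
}

by_vowel = defaultdict(list)
for label, hz in pitch.items():
    stem, position = label.split("_")  # e.g. "aɪl", "2"
    vowel = stem[int(position) - 1]    # first or second vowel symbol
    by_vowel[vowel].append(hz)

# (vowel, N, min, max, mean), sorted by ascending mean as in table 45.
summary = sorted(
    ((v, len(x), min(x), max(x), mean(x)) for v, x in by_vowel.items()),
    key=lambda row: row[4],
)

for vowel, n, lowest, highest, average in summary:
    print(f"{vowel}  {n}  {lowest:.2f}  {highest:.2f}  {average:.2f}")
```

The schwa and the close vowels are collected from both target positions, which is why they have more samples than the vowels that occur only as first targets.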


Table 46
Pitch in the corpus data (sorted by ascending mean)

Vowel   N   Min. (Hz)   Max. (Hz)   Mean (Hz)
ə       8   184.73      223.71      201.82
ɪ       8   178.6       231.36      204.78
ʊ       6   187.79      230.08      205.14
a       2   211.25      215.32      213.28
ɑ       2   212.99      215.97      214.48
e       2   219.33      222.75      221.04
ɛ       2   222.01      223.85      222.93
ɔ       2   220.58      231.36      225.97

The data from the Serbian speakers (table 46) also shows that /ɪ/ and /ʊ/ come right after
schwa, the sound with the lowest fundamental frequency. The native speaker had the following
order of mean pitch in vowels: /ə/, /ɪ/, /ʊ/, /ɛ/, /ɑ/, /ɔ/, /e/, and /a/; the Serbian speakers:
/ə/, /ɪ/, /ʊ/, /a/, /ɑ/, /e/, /ɛ/, and /ɔ/. The order in the low range is identical: schwa and
the close vowels (/ə/, /ɪ/, and /ʊ/). The Serbian speakers had the highest pitch in /e/, /ɛ/ and
/ɔ/, while the native speaker had the highest fundamental frequency in /ɔ/, /e/ and /a/.

The biggest difference is in the highest values: /a/ versus /ɔ/. The native speaker
pronounced the front-open vowel with the highest pitch, while the Serbian speakers had the
highest pitch in the mid-back vowel.


Figure 33. Pitch in the referent data

Figure 33 shows pitch measured in the native speaker data. All first targets (the triangles) are
on the right side, with higher pitch, while the second targets (the squares) are on the left
side of the graph, with lower pitch. This shows that pitch within the diphthongs was falling in
the direction of the second target.

Table 47
The targets and pitch in the native speaker data

Targets   Pitch (Hz)

aɪl_2 164.25

ɪəs_2 165.17

ʊəl_2 168.25

əʊl_2 172.23

ɪəl_2 174.39

ʊəs_2 177.28

ɛəl_2 178.61

eɪs_2 180.12

ɑʊl_2 181.53

ɔɪl_2 184.14

aɪs_2 188.72

ɑʊs_2 196.73


əʊs_2 196.84

eɪl_2 202.87

ɛəs_1 213.93

ɛəs_2 216.01

ɔɪl_1 217.99

ɔɪs_2 218.52

ɑʊs_1 222.87

ʊəl_1 224.73

eɪl_1 225.47

ɛəl_1 226.26

ɪəl_1 226.48

ʊəs_1 227.64

ɪəs_1 229.08

ɑʊl_1 229.85

aɪl_1 232.54

əʊl_1 233.66

əʊs_1 238

aɪs_1 240.59

eɪs_1 242.99

ɔɪs_1 245.81

Table 47 shows pitch in numbers: it falls as the values approach the second targets. The table

below shows the same relationship between pitch and the diphthong targets, this time in the

Serbian data.

Table 48
The targets and pitch in the Serbian data

Target   Pitch (Hz)

aɪl_2 178.6

ɛəl_2 184.73

ʊəl_2 186

ɔɪl_2 186.32

əʊl_2 187.79

ɑʊl_2 188.55

ɪəl_2 189.22

ʊəs_2 189.49

eɪs_2 194.23

ɪəs_2 197.05

aɪs_2 198.25

əʊs_2 203.45

ɑʊs_2 204.97

eɪl_2 209.62

aɪs_1 211.25

ɑʊs_1 212.99

aɪl_1 215.32

ɑʊl_1 215.97

ʊəs_1 216

ɔɪs_2 216.91

eɪl_1 219.33

ɔɪs_1 220.58

ɛəs_2 221.11

ɛəl_1 222.01

eɪs_1 222.75

ɪəl_1 222.98

əʊs_1 223.22

əʊl_1 223.71

ɛəs_1 223.85

ʊəl_1 230.08


ɪəs_1 231.36

ɔɪl_1 231.36


The standard deviation for pitch in the Serbian speakers’ data was 16.02 Hz, while in the
native speaker’s data it was 26.15 Hz. Once again, just like in the temporal domain, the native
speaker used a more diversified range in pronunciation. Considering that we have only one
native speaker as the reference, compared to 15 Serbian students in the research, and that it is
very difficult to give any precise numbers, we will try to conclude by finding a possible
rationale in the pitch differences. Figure 35 shows such differences; the values were
calculated by subtracting the students’ data from the referent data.

Figure 34. Pitch in the Serbian speakers

The differences are distributed between the maximum of 29.34 Hz and the minimum of -31.88 Hz.
Thirteen diphthong targets had a positive difference, which means that the Serbian
speakers in those instances had lower fundamental frequency. Except for /ɪ/ in the short
version of /ɔɪ/ (ɔɪs_2), all of them are first elements of the diphthongs. The values below
zero, which correspond to the values in the Serbian corpus being higher than the native speaker
data, were registered in 19 targets. Out of those 19 targets, four are first targets, while the
rest are second targets. In addition, the highest positive differential values were observed in
the first targets (top of the y-axis), while the second targets have the highest negative
differential values (bottom of the y-axis).
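The counts given above can be verified against the pitch column of table 44. The Python sketch below is an illustration only; the variable names are hypothetical, and the differences are the referent value minus the corpus value, as in the table.

```python
# Pitch differences (Hz, referent minus corpus) from table 44.
pitch_diff = {
    "ɑʊl_1": 13.88, "ɑʊl_2": -7.02, "ɑʊs_1": 9.88, "ɑʊs_2": -8.24,
    "aɪl_1": 17.22, "aɪl_2": -14.35, "aɪs_1": 29.34, "aɪs_2": -9.53,
    "ɛəl_1": 4.25, "ɛəl_2": -6.12, "ɛəs_1": -9.92, "ɛəs_2": -5.1,
    "eɪl_1": 6.14, "eɪl_2": -6.75, "eɪs_1": 20.24, "eɪs_2": -14.11,
    "ɪəl_1": 3.5, "ɪəl_2": -14.83, "ɪəs_1": -2.28, "ɪəs_2": -31.88,
    "əʊl_1": 9.95, "əʊl_2": -15.56, "əʊs_1": 14.78, "əʊs_2": -6.61,
    "ɔɪl_1": -13.37, "ɔɪl_2": -2.18, "ɔɪs_1": 25.23, "ɔɪs_2": 1.61,
    "ʊəl_1": -5.35, "ʊəl_2": -17.75, "ʊəs_1": 11.64, "ʊəs_2": -12.21,
}

# Positive: the students had lower pitch; negative: the students had higher pitch.
positive = [t for t, d in pitch_diff.items() if d > 0]
negative = [t for t, d in pitch_diff.items() if d < 0]

positive_second = [t for t in positive if t.endswith("_2")]
negative_first = [t for t in negative if t.endswith("_1")]

print(len(positive), "positive,", len(negative), "negative")
print("positive second targets:", positive_second)
print("negative first targets:", negative_first)
print("range:", min(pitch_diff.values()), "to", max(pitch_diff.values()))
```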

Figure 35. The differences in pitch (referent minus speaker)

5.5.2 Conclusion

Pitch is often more interesting and conclusive at the level of units larger than speech
sounds. Further, we not only limited data measurement to a small fraction of words, the
diphthongs, but narrowed the data even more by selecting only two points for measurement
(the representative formant vowel points in the recording).

However, we showed that both the Serbian speakers and the native speaker had higher
pitch for the first targets of the diphthongs. The differences in pitch (figure 35) might seem
significant, but they are linear. With the overall data and measurements in corpora, we can
conclude that the Serbian speakers were successful in using fundamental frequency (pitch,
accent) in a way that is very similar to the native speaker’s strategy.

6. Conclusion

In the classification of the diphthongs we made a distinction between the fortis and the lenis

variants of the eight selected diphthongs (/eɪ/, /aɪ/, /ɔɪ/, /əʊ/, /ɑʊ/, /ɪə/, /ɛə/, /ʊə/), so each

diphthong had a short and a long form. Within each variant, we had two targets, or the

constituent vowels. A special notation was introduced, and every diphthong variant was

examined as a separate diphthong.

The length of diphthongs was discussed in two ways: in terms of their absolute time

within a word, and in terms of their ratio within a word. When we compared the Serbian data

with the native speaker data, we saw that most of the diphthongs were shorter in the

pronunciation of the native speaker. The average length for the native speaker was 0.20 s
(0.25 s for the long, 0.15 s for the short), and for the students 0.23 s (0.25 s for the long
as well, 0.21 s for the short). All long variants of the diphthongs in the Serbian speakers’

pronunciation were closer to the length of the RP speaker (the ratio ranged from 85% to
112%), while the short variants showed a lower ratio (63% – 79%) and never reached the
length measured in the native speaker’s data. However, when we compared the ratios the
diphthongs had within the words, the results were better for the short diphthongs (the absolute
difference was between 0.42% and 8%) than for the long ones (7% – 16%). The Serbian speakers
had the best diphthong length for the long /əʊ/, /ɔɪ/, and /aɪ/, and the worst for the short /eɪ/,
/əʊ/, and /ɑʊ/. They achieved the best ratio in the short diphthongs /aɪ/, /ɛə/, and /əʊ/, while
the greatest difference was in the long variants of /əʊ/, /ɪə/, and /aɪ/. We also measured the
length of the words, and the results showed lower values in the native speaker’s pronunciation.

The formant values were examined within the vocal space. We introduced the
magnitude, or the length between the diphthongal targets in the F1/F2 coordinates, which
approximately corresponds to the degree of movement of the articulators. The students had
good results for both variants of /əʊ/, and the long versions of /eɪ/ and /ɔɪ/ (less than 60 Hz in
the Euclidean distance), while all other diphthongs had greater differences in magnitude.
Most of the diphthongs (eleven out of sixteen, 69%) had a negative value, which means that the
movement within the vocal space was greater in the Serbian speakers. When we compared the
English vocal space (from both the native speaker and the Serbian speakers) with the Serbian
vocal space, the graphs indicated that the long vowels in Serbian might have exerted an
influence on the English vocal space in the students.

In examining the formant target values, we decided to analyse all of the target
differences in corpora and then compare the results. First, we calculated an overview (5.3.1)
of the expected differences for the formants between the targets. This was then used to verify
the results and to comment on the students’ pronunciation. The Serbian students had the
expected formant differences, which were also found in the native speaker, for /ɑʊl/, /ɛəl/,
/eɪs/, /ɪəs/, and /ɔɪs/; these diphthong variants had the same76 F1, F2, and F3 values. Good
results (the same F1 and F2 difference) were measured for /ɑʊs/, /aɪl/, /aɪs/, /eɪl/, /əʊl/,
/əʊs/, and /ɔɪl/. The diphthongs that posed the greatest difficulties to the students were
/ɛəs/, /ɪəl/, /ʊəl/, and /ʊəs/: these variants had mismatches in the target differences. The
diphthongs /ɛə/ and /ɪə/ were found in the group with both the best and the worst results, which
indicates that the length of diphthongs had an important role in formant achievement.

The formant values for each constituent vowel were not discussed77, but we can
comment on the measurements. The students had the biggest differences in F1 for /a/ (-107
Hz), /ɑ/ (67 Hz) and /ɛ/ (57 Hz), while the other vowels had differences of less than 50 Hz. F2
formants had higher differences, with /ɛ/ (-294 Hz), /ʊ/ (-144 Hz), and /ɪ/ (107 Hz) at the top
of the list. F3 values showed very high discrepancies, with most vowels being between -210
Hz and -350 Hz, while only /a/ had a positive difference (58 Hz). In /ʊ/ all values were higher
in the Serbian speakers (/ɔ/ also had very high F3), which could be due to less rounding in
their pronunciation.

Intensity, a parameter that correlates with loudness, was higher for the first targets
throughout the corpora, and the results indicate that loudness was equally distributed within
the targets in the RP and the Serbian corpus. The only part where we expected lower results,
following the pattern of the native speaker’s data, was for the second targets in the Serbian
data: the students and the native speaker had the same intensity (72-73 dB), which cannot be
said for the first targets. The conclusion is that the Serbian speakers were faster than the RP
speaker to lower the intensity at the end of the diphthongs.

Pitch was, as expected, falling as the articulation was nearing the second target. The
students, too, had higher pitch in the first targets than in the second. The differences
between the referent and the students’ data were mostly positive in the first targets; as they
decreased (nearing zero around /ɔɪ/), the values entered the negative region (the students had
higher values than the native speaker in the second targets).

76 Here, “the same” refers to the positive or negative formant difference between the targets (5.3.2).
77 However, an overview is given in 7.4.


7. Appendices

7.1 Questionnaire Results

This section contains the responses the students gave in the questionnaire. The
responses were entered into a spreadsheet and then exported as tab-delimited text files. The
tabular data was imported into R, where the summary() command produced the following
output:

Initials     Sex    YearOfBirth    UniYear     Group     BornInNS   Spent15inNS
ag     : 1   f:49   Min.   :1987   Min.   :1   a   :12   no :34     no :38
ak     : 1   m: 7   1st Qu.:1991   1st Qu.:1   b   :16   yes:22     yes:18
ao     : 1          Median :1991   Median :1   c   :11
bz     : 1          Mean   :1991   Mean   :1   d   : 4
ddj    : 1          3rd Qu.:1991   3rd Qu.:1   g   : 8
de     : 1          Max.   :1992   Max.   :1   v   : 3
(Other):50                         NA's   :2   NA's: 2

MotherTongue   OtherLanguage   YrsStudyingEngl   PrivateClasses   ESCVisited
Hungarian: 5   no       :48    Min.   : 5.00     a   : 6          Canada: 1
Russian  : 1   English  : 2    1st Qu.: 9.00     b   :12          GB    : 1
Serbian  :50   German   : 1    Median :10.00     c   :21          no    :49
               Hungarian: 1    Mean   :10.55     no  :15          USA   : 3
               Italian  : 1    3rd Qu.:12.00     NA's: 2          NA's  : 2
               Russian  : 1    Max.   :15.00
               (Other)  : 2    NA's   : 1.00

ESCAge         ESCVisitedSpent   ESCStay   ESCYrStarted   ESCRegStayM
Min.   :15.0   Min.   : 0.500    no  :55   Mode:logical   Mode:logical
1st Qu.:16.0   1st Qu.: 0.875    NA's: 1   NA's:56        NA's:56
Median :17.0   Median : 2.000
Mean   :16.6   Mean   : 4.125
3rd Qu.:17.0   3rd Qu.: 5.250
Max.   :18.0   Max.   :12.000
NA's   :51.0   NA's   :52.000

UniYear – the year a student enrolled at the university;

Group – the student’s lecture group;

PrivateClasses – the time a student attended language classes out of school, where a

represents up to a year, b 2 to 5 years, and c more than 5 years;

ESCVisited – an English speaking country that a student visited;

ESCAge – how old the students were when they visited an English speaking country;

ESCVisitedSpent – the number of months stayed in an English speaking country;


ESCStay, ESCYrStarted and ESCRegStayM – the entries denoting regular visits to
an English speaking country, the student’s age when the visits started and the months spent
in the country. These last three items had no positive entries, so they were excluded from
the overview.

7.2 Programming and Scripting

7.2.1 Programming in R

The R environment “is designed around a true computer language, and it allows users to add
additional functionality by defining new functions”.78 Most of the calculations in this paper
were done by defining new functions in accordance with our needs.

Here is an example of code that draws F1/F2 plots in the figures used in this paper:

    DrawPlotFormants <- function(st = TRUE, title = 'Test graph', sdata) {
      #
      # Draw F1/F2 plane
      #
      # st selects the axis limits (Hz): TRUE for the defaults, 'data' for
      # the range found in sdata, 'markovic' for a fixed wider range.
      if (isTRUE(st)) {
        f2max <- 2600
        f1max <- 900
        f2min <- 1000
        f1min <- 300
        warning('Using default max/min values for plot axes.')
      } else if (st == 'data') {
        f2max <- max(sdata$f2)
        f1max <- max(sdata$f1)
        f2min <- min(sdata$f2)
        f1min <- min(sdata$f1)
      } else if (st == 'markovic') {
        f2max <- 2750
        f1max <- 940
        f2min <- 750
        f1min <- 300
      } else {
        stop('Unknown value for st.')
      }
      # Empty plot with both axes reversed.
      plot(f2max, f1max, type = 'n', axes = TRUE,
           xlab = 'F2 (Hz)', ylab = 'F1 (Hz)', main = title,
           xlim = rev(c(f2min, f2max)),
           ylim = rev(c(f1min, f1max)))
    }

78 The R Project documentation

Once the initial plot was drawn on the screen, other elements were placed (points, IPA signs, arrows, and lines).

This code is an example of the difference calculations:

GetFPIDiffs <- function(fpidata.ref, fpidata.cor) {
  #
  # Calculate the differences and return the resulting
  # data frame.
  #
  crows <- c('f1', 'f2', 'f3', 'pitch', 'intensity')
  res.diff <- fpidata.ref[, crows] - fpidata.cor[, crows]
  res.diff[, c('ascii', 'ipa')] <- fpidata.ref[, c('ascii', 'ipa')]
  return(res.diff)
}

This snippet calculates the Euclidean distance for the given coordinate values:

euc.dist2 <- function(x1, y1, x2, y2) {
  #
  # Euclidean distance for two dimensions.
  #
  return(sqrt((x1 - x2)^2 + (y1 - y2)^2))
}

There are many other examples of the code for this paper, divided into several files.

7.2.2 Python

Python is an object-oriented, multi-paradigm scripting language. It is widely used in the IT industry, as well as in academia (for example, the Natural Language Toolkit (NLTK), a suite for natural language processing, is written in Python).

Python was used as an aid in the process of creating this paper. As we already noted, two custom programs were written: one for searching for diphthongs within a required context, and one for recording the corpus.

The code for searching diphthongs loaded all the words from the databases, filtered them according to the rules supplied, and, finally, selected only the words matching the syllable count. The program produced about 100 words for our needs. However, not all words could be used in the corpus, because only several diphthongs had a matching pair (e.g. bait, babe). Here is an example of the program call to search for the words containing the diphthong /eɪ/ within the monosyllabic words, before plosives /b/ or /d/, and after /p/ or /t/:

before = ('b', 'd'),
after = ('p', 't'),
diphthongs = ('ay',),
syllable = 0

Two small utilities were also used in the process. The first was a quick corpus filter. We had 16 files to segment, each containing 32 sentences (which accounted for 64 tokens per speaker, or 1024 in all). During lengthy segmentation sessions it was easy to lose track of what was seen or heard, especially since we used two notations. A Python program with a Tk interface was used to type in part of a string and get all the needed information: the word, the string to be used in segmentation, the ASCII transcription, the IPA transcription and a duration note.

Figure 36. Filtering the corpus: only the short forms of the diphthong /oy/ returned

Figure 37. Filtering the corpus: all long forms returned

The second small utility was used in “quality assurance”. With so many bits of information combined in 16 TextGrids and no room for error, we wrote a Python script to check all the TextGrids and make sure that all of them had the correct words, diphthongs, length and target marks, and that everything matched. Here is a sample of the report after the check finished successfully:

~$ python Researh/scripts/textgridchecker/checker.py
STARTING TEXTGRID CHECKS
The path is /media/data/corpus/speaker-reference.
The number of text grids is 1.
Starting the loop...
Checking file 16-speaker-hh.TextGrid
Checking proper tier names...
Checking if the tiers contain 32 nonempty items...
Checking that the number of empty and nonempty tiers is the same...
Checking if all tiers have valid text...
Checking if the diphthongs have pairs...
Checking if all words are present...
Checking if the words and diphthongs match...
Checking if the number of intervals is 64...
Checking if intervals match diphthongs...
OK 16-speaker-hh.TextGrid
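Two of the checks named in the report (the count of nonempty items and the pairing of diphthongs) could look roughly like the sketch below. It is not the original checker: the tier is represented here simply as a list of interval labels, and the label scheme (a diphthong code followed by _l or _s), the function name check_tier and the demonstration data are assumptions made for illustration.

```python
from collections import Counter


def check_tier(labels, expected_nonempty=32):
    """Return a list of problems found in one tier's interval labels."""
    problems = []
    nonempty = [text for text in labels if text.strip()]
    if len(nonempty) != expected_nonempty:
        problems.append("expected %d nonempty items, found %d"
                        % (expected_nonempty, len(nonempty)))
    # Every diphthong should occur equally often in its long (_l)
    # and short (_s) form.
    counts = Counter(nonempty)
    diphthongs = {label.rsplit("_", 1)[0] for label in nonempty}
    for d in sorted(diphthongs):
        if counts[d + "_l"] != counts[d + "_s"]:
            problems.append("diphthong %s lacks a matching long/short pair" % d)
    return problems


# Hypothetical tier: 8 diphthongs x 2 variants x 2 words, separated by empty intervals.
tier = []
for d in ("aw", "ay", "ea", "ey", "ia", "ow", "oy", "ua"):
    for variant in ("l", "s"):
        for _ in range(2):
            tier += ["", d + "_" + variant]

ok_report = check_tier(tier)
bad_report = check_tier(tier[:-1])  # drop the last labelled interval
print(ok_report)
print(bad_report)
```

Collecting the problems in a list, rather than stopping at the first one, lets a single run report everything that is wrong with a file.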

7.2.3 Praat

Praat is a highly scriptable program. We used this option to automate the measurements for the paper. When the script was executed, Praat loaded one file after another and calculated the values for formants, pitch and intensity, as well as the length of the words and the diphthongs. The results were saved in tabulated text files and later used in the analysis.

Below is a short sample from the script. As explained in Methods, each speaker was assigned a different maximum frequency for the formant calculation: this code made sure that the settings were applied and that an error was reported if there was an unrecognized file.

procedure MakeFormants name$
    #
    # Make formant table.
    #
    print Running procedure MakeFormants... 'newline$'
    select Sound 'name$'
    print 'tab$'Making formant file for the speaker...'newline$'
    # Each signal file requires different
    # settings for the formants. The values are
    # obtained by examining the spectrograms and
    # waveforms.
    if name$ = "16-speaker-hh"
        To Formant (burg)... 0 3 3400 0.025 50
    elif name$ = "15-speaker-sr"
        To Formant (burg)... 0 3 3600 0.025 50
    elif name$ = "14-speaker-ao"
        To Formant (burg)... 0 3 3500 0.025 50
    elif name$ = "13-speaker-jr"
        To Formant (burg)... 0 3 3900 0.025 50
    elif name$ = "12-speaker-vv"
        To Formant (burg)... 0 3 3600 0.025 50
    elif name$ = "11-speaker-dz"
        To Formant (burg)... 0 3 3600 0.025 50
    elif name$ = "10-speaker-st"
        To Formant (burg)... 0 3 3500 0.025 50
    elif name$ = "09-speaker-tz"
        To Formant (burg)... 0 3 3770 0.025 50
    elif name$ = "08-speaker-ni"
        To Formant (burg)... 0 3 3650 0.025 50
    elif name$ = "07-speaker-lc"
        To Formant (burg)... 0 3 3600 0.025 50
    elif name$ = "06-speaker-ip"
        To Formant (burg)... 0 3 3700 0.025 50
    elif name$ = "05-speaker-gl"
        To Formant (burg)... 0 3 3500 0.025 50
    elif name$ = "04-speaker-ym"
        To Formant (burg)... 0 3 3800 0.025 50
    elif name$ = "03-speaker-im"
        To Formant (burg)... 0 3 3800 0.025 50
    elif name$ = "02-speaker-jj"
        To Formant (burg)... 0 3 3400 0.025 50
    elif name$ = "01-speaker-jk"
        To Formant (burg)... 0 3 3600 0.025 50
    else
        # No speaker ID is found. Abort.
        exit "Error: Invalid speaker ID in the formant calculations."
    endif
    select Formant 'name$'
    print 'tab$'Making table...'newline$'
    Down to Table... 0 1 6 1 3 1 3 1
    print 'tab$'Done.'newline$'
endproc
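The per-speaker settings in the procedure above amount to a lookup from the file name to a maximum formant frequency, with an error for any unknown name. As an illustration only, the same logic can be written in Python as a table lookup. The frequency values are copied from the Praat script; the names FORMANT_CEILING_HZ and formant_ceiling are assumptions introduced for this sketch.

```python
# Maximum formant frequency (Hz) per signal file, as listed in the Praat script.
FORMANT_CEILING_HZ = {
    "16-speaker-hh": 3400, "15-speaker-sr": 3600,
    "14-speaker-ao": 3500, "13-speaker-jr": 3900,
    "12-speaker-vv": 3600, "11-speaker-dz": 3600,
    "10-speaker-st": 3500, "09-speaker-tz": 3770,
    "08-speaker-ni": 3650, "07-speaker-lc": 3600,
    "06-speaker-ip": 3700, "05-speaker-gl": 3500,
    "04-speaker-ym": 3800, "03-speaker-im": 3800,
    "02-speaker-jj": 3400, "01-speaker-jk": 3600,
}


def formant_ceiling(name):
    """Return the ceiling for a speaker file; abort on an unknown name."""
    try:
        return FORMANT_CEILING_HZ[name]
    except KeyError:
        raise ValueError(
            "Invalid speaker ID in the formant calculations: %r" % name
        ) from None


print(formant_ceiling("09-speaker-tz"))
```

Keeping the values in one table makes it easier to see at a glance which speakers share a setting, while an unrecognized file still stops the run, as in the original procedure.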

7.3 Log Files from the Recording Sessions

The sentences during the recording sessions appeared in random order, so we needed to implement a system to keep track of what was actually said: we could not have relied solely on our hearing to decide which word was spoken.79

For each student a unique log file was created and filled automatically during the recording. The heading of the file contains general information to identify the speaker (labelling was similar to signal file naming). In the example supplied, the random time display is set between 1 s and 4 s. Time scale refers to a fraction of a second (here, the scale is 0.1 second).

The two numerical columns in the sentence list represent times. As indicated below, when the sentence “The word bite is spoken” was shown, the student JJ had to wait 3.1 seconds before she was allowed to read the sentence aloud and go to the next sentence. The time when the next sentence was shown is indicated in the final column.

79 We were bound to make a mistake in that case. For example, many students pronounced joys and Joyce identically.

# Started at: 2011-03-09-09:27
# Ended at: 2011-03-09-09:32
# Speaker: jj
# Random time: 1-4
# Time scale: 10
Sentence                          t(1/SCALE s)  Next
The word "bows" is spoken.        36.82         09:27:45:77
The word "fierce" is spoken.      11.08         09:28:07:11
The word "dies" is spoken.        33.34         09:28:12:96
The word "babe" is spoken.        17.34         09:28:20:72
The word "douse" is spoken.       32.13         09:28:26:53
The word "fears" is spoken.       18.40         09:28:34:49
The word "bourse" is spoken.      34.30         09:28:41:14
The word "bite" is spoken.        31.24         09:28:55:77
The word "joke" is spoken.        25.03         09:29:03:80
The word "idiom" is spoken.       12.70         09:29:10:40
The word "bide" is spoken.        36.72         09:29:15:41
The word "bode" is spoken.        24.39         09:29:22:77
The word "doubt" is spoken.       19.24         09:29:28:83
The word "gouge" is spoken.       38.75         09:29:34:56
The word "doit" is spoken.        35.23         09:29:41:48
The word "dare" is spoken.        35.20         09:29:48:98
The word "theirs" is spoken.      11.04         09:29:57:50
The word "dice" is spoken.        21.00         09:30:02:33
The word "toyed" is spoken.       20.06         09:30:08:31
The word "Joyce" is spoken.       34.95         09:30:14:52
The word "abjured" is spoken.     21.96         09:30:22:42
The word "Job" is spoken.         24.68         09:30:29:77
The word "daze" is spoken.        17.95         09:30:35:78
The word "graduate" is spoken.    14.60         09:30:42:35
The word "dais" is spoken.        36.80         09:30:48:65
The word "thereto" is spoken.     27.17         09:31:03:38
The word "boat" is spoken.        14.38         09:31:21:18
The word "idiot" is spoken.       30.08         09:31:27:76
The word "daresay" is spoken.     39.28         09:31:35:25
The word "joys" is spoken.        16.09         09:31:43:97
The word "bait" is spoken.        14.48         09:31:50:27
The word "gourd" is spoken.       37.18         09:31:56:73
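A log line of this kind can be read back by a short script. The sketch below is an illustration under assumptions, not the code used for the paper: the regular expression and the function name parse_log_line are invented for this example, the waiting time is converted to seconds by dividing by the time scale from the file heading, and the final column is kept as plain text because its exact field layout is not documented here.

```python
import re

# One sentence line: the quoted word, the waiting time in 1/SCALE seconds,
# and the time when the next sentence was shown.
LINE_RE = re.compile(
    r'^The word "(?P<word>[^"]+)" is spoken\.\s*'
    r'(?P<delay>\d+\.\d+)\s+'
    r'(?P<next>\d{2}:\d{2}:\d{2}:\d{2})$'
)


def parse_log_line(line, scale=10):
    """Return (word, waiting time in seconds, time of the next sentence)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("Unrecognized log line: %r" % line)
    delay_seconds = float(match.group("delay")) / scale
    return match.group("word"), delay_seconds, match.group("next")


word, delay, shown = parse_log_line(
    'The word "bite" is spoken. 31.24 09:28:55:77'
)
print(word, round(delay, 1), shown)
```

For the line with the word bite, the waiting time comes out at about 3.1 seconds, which agrees with the description of the log above.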

7.4 The Result Tables

The following data were used for the tables and the figures in this paper. The referent data come from one source only, since we had only one native speaker. The data from the Serbian speakers were aggregated by diphthong using the mean() function.

The data as measured in the recordings of the native speaker (formants, pitch, intensity):

> res.fpi.ref
    ascii   ipa     f1      f2      f3  pitch intensity
1  aw_l_1 ɑʊl_1 900.96 1600.10 2513.43 229.85     79.06
2  aw_l_2 ɑʊl_2 373.61 1082.59 2660.18 181.53     76.48
3  aw_s_1 ɑʊs_1 882.82 1713.08 2614.59 222.87     80.01
4  aw_s_2 ɑʊs_2 407.47 1049.73 2556.11 196.73     73.82
5  ay_l_1 aɪl_1 709.66 1295.36 2937.75 232.54     78.78
6  ay_l_2 aɪl_2 340.61 2398.04 2745.02 164.25     74.13
7  ay_s_1 aɪs_1 691.64 1483.24 2805.72 240.59     80.44
8  ay_s_2 aɪs_2 381.62 2183.01 2564.82 188.72     79.02
9  ea_l_1 ɛəl_1 624.95 1952.69 2772.51 226.26     79.05
10 ea_l_2 ɛəl_2 503.11 1845.72 2588.02 178.61     75.18
11 ea_s_1 ɛəs_1 546.69 1865.31 2653.66 213.93     80.72
12 ea_s_2 ɛəs_2 644.86 1861.04 2711.67 216.01     66.89
13 ey_l_1 eɪl_1 541.93 2127.70 2711.26 225.47     80.37
14 ey_l_2 eɪl_2 367.37 2271.71 2695.05 202.87     72.86
15 ey_s_1 eɪs_1 523.18 2267.88 2717.46 242.99     80.43
16 ey_s_2 eɪs_2 441.16 2447.74 2727.33 180.12     73.07
17 ia_l_1 ɪəl_1 396.68 2375.88 2770.50 226.48     77.40
18 ia_l_2 ɪəl_2 393.70 1776.79 2556.76 174.39     75.97
19 ia_s_1 ɪəs_1 447.10 2299.10 2707.10 229.08     77.62
20 ia_s_2 ɪəs_2 473.74 1879.49 2646.61 165.17     73.64
21 ow_l_1 əʊl_1 519.23 1558.91 2670.72 233.66     82.24
22 ow_l_2 əʊl_2 309.78 1450.19 2664.93 172.23     68.30
23 ow_s_1 əʊs_1 547.64 1620.78 2624.89 238.00     82.19
24 ow_s_2 əʊs_2 301.90 1498.36 2537.22 196.84     70.11
25 oy_l_1 ɔɪl_1 487.11 1137.87 2583.58 217.99     80.12
26 oy_l_2 ɔɪl_2 354.01 2345.93 2799.81 184.14     73.71
27 oy_s_1 ɔɪs_1 500.78 1402.66 2624.88 245.81     81.83
28 oy_s_2 ɔɪs_2 339.26 2150.55 2744.04 218.52     66.52
29 ua_l_1 ʊəl_1 450.53 1184.66 2597.44 224.73     82.46
30 ua_l_2 ʊəl_2 301.29 1607.43 2846.56 168.25     67.36
31 ua_s_1 ʊəs_1 455.22 1177.90 2644.63 227.64     76.51
32 ua_s_2 ʊəs_2 459.01 1295.34 2572.32 177.28     74.13

The data as measured and aggregated in the recordings of the Serbian speakers (formants, pitch, intensity):

> res.fpi.cor
    ascii   ipa     f1      f2      f3  pitch intensity
1  aw_l_1 ɑʊl_1 823.07 1542.39 2766.15 215.97     79.27
2  aw_l_2 ɑʊl_2 411.39 1405.78 2947.53 188.55     73.01
3  aw_s_1 ɑʊs_1 826.65 1606.43 2782.68 212.99     78.28
4  aw_s_2 ɑʊs_2 451.55 1320.59 2964.06 204.97     72.27
5  ay_l_1 aɪl_1 792.74 1387.43 2810.99 215.32     78.64
6  ay_l_2 aɪl_2 410.31 2249.06 2953.14 178.60     71.57
7  ay_s_1 aɪs_1 822.62 1460.32 2817.31 211.25     79.19
8  ay_s_2 aɪs_2 394.89 2321.73 2994.74 198.25     72.45
9  ea_l_1 ɛəl_1 513.60 2269.92 3027.45 222.01     79.47
10 ea_l_2 ɛəl_2 503.95 1865.32 2762.04 184.73     71.49
11 ea_s_1 ɛəs_1 544.20 2135.42 3005.13 223.85     78.62
12 ea_s_2 ɛəs_2 452.66 1917.18 2884.92 221.11     74.78
13 ey_l_1 eɪl_1 491.20 2367.65 3024.48 219.33     79.01
14 ey_l_2 eɪl_2 364.49 2589.64 3100.04 209.62     73.96
15 ey_s_1 eɪs_1 550.48 2209.54 2947.11 222.75     79.31
16 ey_s_2 eɪs_2 354.72 2593.12 3170.21 194.23     71.40
17 ia_l_1 ɪəl_1 388.98 2523.25 3092.60 222.98     77.83
18 ia_l_2 ɪəl_2 463.24 1797.84 2870.67 189.22     73.04
19 ia_s_1 ɪəs_1 407.16 2489.33 3079.12 231.36     77.53
20 ia_s_2 ɪəs_2 433.86 1970.33 2920.09 197.05     72.41
21 ow_l_1 əʊl_1 542.77 1530.73 2869.22 223.71     79.72
22 ow_l_2 əʊl_2 373.77 1301.03 2918.70 187.79     72.42
23 ow_s_1 əʊs_1 546.56 1573.09 2865.86 223.22     80.21
24 ow_s_2 əʊs_2 386.32 1285.96 2884.91 203.45     70.97
25 oy_l_1 ɔɪl_1 492.95 1121.82 2958.97 231.36     79.29
26 oy_l_2 ɔɪl_2 373.30 2274.74 2949.96 186.32     73.87
27 oy_s_1 ɔɪs_1 528.70 1300.17 2949.40 220.58     79.46
28 oy_s_2 ɔɪs_2 373.29 2290.28 2963.96 216.91     72.92
29 ua_l_1 ʊəl_1 452.97 1493.49 2873.92 230.08     78.92
30 ua_l_2 ʊəl_2 482.56 1778.85 2849.65 186.00     72.50
31 ua_s_1 ʊəs_1 456.09 1501.36 2891.15 216.00     78.03
32 ua_s_2 ʊəs_2 435.39 1775.04 2889.89 189.49     71.43

The data as measured and aggregated in the recordings of the native speaker (the formant ranges, mean values across data):

> res.stats.ref
  vowel count  f1min  f1max   f1sd  f1mea   f2min   f2max   f2sd   f2mea
1     ɑ     2 882.82 900.96  12.83 891.89 1600.10 1713.08  79.89 1656.59
3     a     2 691.64 709.66  12.74 700.65 1295.36 1483.24 132.85 1389.30
5     ɛ     2 546.69 624.95  55.34 585.82 1865.31 1952.69  61.79 1909.00
7     e     2 523.18 541.93  13.26 532.55 2127.70 2267.88  99.12 2197.79
8     ɔ     2 487.11 500.78   9.67 493.94 1137.87 1402.66 187.23 1270.26
2     ʊ     6 301.90 455.22  66.93 383.08 1049.73 1498.36 189.12 1240.57
4     ɪ     8 339.26 447.10  42.21 383.48 2150.55 2447.74 103.84 2308.99
6     ə     8 301.29 644.86 102.60 480.32 1295.34 1879.49 200.29 1680.69

    f3min   f3max  f3sd   f3mea intmin intmax intsd intmea
1 2513.43 2614.59 71.53 2564.01  79.06  80.01  0.67  79.53
3 2805.72 2937.75 93.36 2871.73  78.78  80.44  1.17  79.61
5 2653.66 2772.51 84.04 2713.09  79.05  80.72  1.18  79.88
7 2711.26 2717.46  4.38 2714.36  80.37  80.43  0.04  80.40
8 2583.58 2624.88 29.20 2604.23  80.12  81.83  1.21  80.97
2 2537.22 2664.93 54.94 2610.09  68.30  82.46  5.09  74.61
4 2564.82 2799.81 70.75 2719.21  66.52  79.02  3.92  74.29
6 2556.76 2846.56 94.14 2652.19  66.89  82.24  5.75  74.70

  pchmin pchmax pchsd pchmea
1 222.87 229.85  4.94 226.36
3 232.54 240.59  5.69 236.56
5 213.93 226.26  8.72 220.09
7 225.47 242.99 12.39 234.23
8 217.99 245.81 19.67 231.90
2 172.23 227.64 22.40 199.95
4 164.25 229.08 23.74 199.27
6 165.17 238.00 30.21 193.92
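As a cross-check on how one row of this table relates to the per-target measurements, the F1 figures for /ɑ/ in the native speaker's data can be recomputed from the two first-target values of /ɑʊ/ listed in res.fpi.ref (aw_l_1 and aw_s_1). The sketch below uses Python's statistics module; the sample standard deviation is an assumption about how the sd columns were computed, although it does reproduce the printed value here.

```python
from statistics import mean, stdev

# F1 (Hz) of the first target of /ɑʊ/ in res.fpi.ref: aw_l_1 and aw_s_1.
f1_targets = [900.96, 882.82]

summary = {
    "count": len(f1_targets),
    "f1min": min(f1_targets),
    "f1max": max(f1_targets),
    "f1sd": round(stdev(f1_targets), 2),   # sample standard deviation
    "f1mea": round(mean(f1_targets), 2),
}
print(summary)
```

The result matches the first row of the table (count 2, minimum 882.82, maximum 900.96, standard deviation 12.83, mean 891.89).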

The data as measured and aggregated in the recordings of the Serbian speakers (the formant ranges, mean values across data):

> res.stats.cor
  vowel count  f1min  f1max  f1sd  f1mea   f2min   f2max   f2sd   f2mea
1     ɑ     2 823.07 826.65  2.53 824.86 1542.39 1606.43  45.28 1574.41
3     a     2 792.74 822.62 21.13 807.68 1387.43 1460.32  51.54 1423.88
5     ɛ     2 513.60 544.20 21.64 528.90 2135.42 2269.92  95.11 2202.67
7     e     2 491.20 550.48 41.92 520.84 2209.54 2367.65 111.80 2288.60
8     ɔ     2 492.95 528.70 25.28 510.83 1121.82 1300.17 126.11 1210.99
2     ʊ     6 373.77 456.09 36.62 422.01 1285.96 1501.36  96.69 1384.70
4     ɪ     8 354.72 410.31 20.13 383.39 2249.06 2593.12 146.83 2416.39
6     ə     8 433.86 546.56 44.79 482.62 1530.73 1970.33 154.68 1776.05

    f3min   f3max  f3sd   f3mea intmin intmax intsd intmea
1 2766.15 2782.68 11.69 2774.41  78.28  79.27  0.70  78.78
3 2810.99 2817.31  4.47 2814.15  78.64  79.19  0.39  78.91
5 3005.13 3027.45 15.78 3016.29  78.62  79.47  0.60  79.05
7 2947.11 3024.48 54.71 2985.80  79.01  79.31  0.21  79.16
8 2949.40 2958.97  6.77 2954.18  79.29  79.46  0.12  79.38
2 2873.92 2964.06 36.40 2913.38  70.97  78.92  3.34  74.27
4 2949.96 3170.21 83.07 3037.97  71.40  77.83  2.49  73.94
6 2762.04 2920.09 46.18 2864.04  71.43  80.21  3.56  74.45

  pchmin pchmax pchsd pchmea
1 212.99 215.97  2.11 214.48
3 211.25 215.32  2.88 213.28
5 222.01 223.85  1.30 222.93
7 219.33 222.75  2.42 221.04
8 220.58 231.36  7.62 225.97
2 187.79 230.08 16.24 205.14
4 178.60 231.36 18.47 204.78
6 184.73 223.71 17.67 201.82

The formant values extracted from the diphthong targets and grouped by the vowels, for the native speaker:

Vowel  f1mean   f2mean   f3mean
ɑ      891.89  1656.59  2564.01
a      700.65  1389.30  2871.73
ɛ      585.82  1909.00  2713.09
e      532.55  2197.79  2714.36
ɔ      493.94  1270.26  2604.23
ʊ      383.08  1240.57  2610.09
ɪ      383.48  2308.99  2719.21
ə      480.32  1680.69  2652.19

For the Serbian speakers:

Vowel  f1mean   f2mean   f3mean
ɑ      824.86  1574.41  2774.41
a      807.68  1423.88  2814.15
ɛ      528.90  2202.67  3016.29
e      520.84  2288.60  2985.80
ɔ      510.83  1210.99  2954.18
ʊ      422.01  1384.70  2913.38
ɪ      383.39  2416.39  3037.97
ə      482.62  1776.05  2864.04

The differences (RP minus Serbian data), sorted by the F1 difference in ascending order:

Vowel       f1       f2       f3
a      -107.03   -34.58    57.58
ʊ       -38.93  -144.13  -303.29
ɔ       -16.89    59.27  -349.95
ə        -2.30   -95.36  -211.85
ɪ         0.09  -107.40  -318.76
e        11.71   -90.81  -271.44
ɛ        56.92  -293.67  -303.20
ɑ        67.03    82.18  -210.40
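The difference table can be reproduced from the two tables of grouped formant means above. The following Python sketch is an illustration rather than the R code used for the paper: it subtracts the Serbian means from the RP means and sorts the vowels by the F1 difference. The variable names are assumptions made for this example; the values are copied from the tables.

```python
# Mean (F1, F2, F3) in Hz per vowel, copied from the two tables above.
ref = {  # native (RP) speaker
    "ɑ": (891.89, 1656.59, 2564.01), "a": (700.65, 1389.30, 2871.73),
    "ɛ": (585.82, 1909.00, 2713.09), "e": (532.55, 2197.79, 2714.36),
    "ɔ": (493.94, 1270.26, 2604.23), "ʊ": (383.08, 1240.57, 2610.09),
    "ɪ": (383.48, 2308.99, 2719.21), "ə": (480.32, 1680.69, 2652.19),
}
cor = {  # Serbian speakers
    "ɑ": (824.86, 1574.41, 2774.41), "a": (807.68, 1423.88, 2814.15),
    "ɛ": (528.90, 2202.67, 3016.29), "e": (520.84, 2288.60, 2985.80),
    "ɔ": (510.83, 1210.99, 2954.18), "ʊ": (422.01, 1384.70, 2913.38),
    "ɪ": (383.39, 2416.39, 3037.97), "ə": (482.62, 1776.05, 2864.04),
}

# RP minus Serbian, rounded to two decimals as in the tables.
diffs = {
    vowel: tuple(round(r - c, 2) for r, c in zip(ref[vowel], cor[vowel]))
    for vowel in ref
}

# Sort by the F1 difference, smallest first.
ordered = sorted(diffs.items(), key=lambda item: item[1][0])
for vowel, (f1, f2, f3) in ordered:
    print(vowel, f1, f2, f3)
```

The printed order (a, ʊ, ɔ, ə, ɪ, e, ɛ, ɑ) is the same as in the table.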

7.5 The Euclidean Distance

Euclidean distance is the metric distance from point A to point B in a Cartesian system, and it is derived from the Pythagorean theorem. If a point p has the coordinates (p1, p2) and a point q has the coordinates (q1, q2), the distance between them is calculated using this formula:

$$d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}$$

Our Cartesian coordinate system was defined by the F2 and F1 axes80, and the metric distance refers to the distance from one diphthong target to another. The vowel targets, corresponding to points A and B, were defined by the F1/F2 values in Hertz for a particular vowel. For example, these are the targets within the long version of /ɑʊ/ in the Serbian speakers:

   ascii   ipa     f1      f2
1 aw_l_1 ɑʊl_1 823.07 1542.39
2 aw_l_2 ɑʊl_2 411.39 1405.78

Point A (the first target, /ɑ/) has the coordinates (1542, 823), and point B (the second target, /ʊ/) has the coordinates (1405, 411). To illustrate the calculation, we will use simplified coordinates in R, where A is (2, 2) and B is (8, 2)81:

# Define the axes
x <- 1:10
y <- 1:10
# Create the plot
plot(x, y, type="n")
# Place the points
points(c(2,8), c(2,2), pch=c("A", "B"))
# Draw the line
lines(c(2,8), c(2,2))
# Calculate the distance using
# the formula: sqrt((x1-x2)^2+(y1-y2)^2)
distance <- sqrt((2-8)^2+(2-2)^2)
# Print the distance
print("The distance is:")
print(distance)

80 Where F2 is the x axis and F1 the y axis.

81 The plotted diphthongs are in 5.3.5

This code prints the following output:

[1] "The distance is:"
[1] 6

A graph is also created, in which we can see that the distance is 6:

Figure 38. Euclidean distance calculation example
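The same formula can be applied to the actual targets of the long /ɑʊ/ in the Serbian speakers quoted above, with F2 on the x axis and F1 on the y axis. The sketch below is in Python rather than R and serves only as an illustration; the function name euclidean_distance is an assumption, and math.hypot is used as a standard way to compute the square root of the sum of squares.

```python
import math


def euclidean_distance(p, q):
    """Distance between two points given as (x, y) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# (F2, F1) in Hz for the two targets of the long /ɑʊ/, Serbian speakers.
target_a = (1542.39, 823.07)   # aw_l_1, first target
target_b = (1405.78, 411.39)   # aw_l_2, second target

distance_hz = euclidean_distance(target_a, target_b)
print(round(distance_hz, 2))

# The simplified example from the text: A = (2, 2), B = (8, 2).
print(euclidean_distance((2, 2), (8, 2)))
```

For these two targets the distance is about 434 Hz in the F1/F2 plane, and the simplified example again gives 6.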

8. Bibliography

Audacity Development Team. Audacity (version 3.13) [Computer software].

< http://audacity.sourceforge.net/>

Belić, Aleksandar. Savremeni srpskohrvatski jezik, Deo 1, Glasovi i akcenat. Beograd:

Izdavačko preduzeće N. P. Srbije, 1948.

Boersma, P., and David Weenink. Praat: Doing Phonetics by Computer (version 5.2.18)

[Computer software].

< http://www.fon.hum.uva.nl/praat/>

Clark, John, and Colin Yallop. An Introduction to Phonetics and Phonology. Oxford, UK: Blackwell, 1990.

Collins English Dictionary – Complete and Unabridged. 2003rd ed. HarperCollins

Publishers. Web. Nov 2010.

Cox, F. “The Acoustic Characteristics of /hVd/ Vowels in the Speech of Some Australian

Teenagers.” Australian Journal of Linguistics 26.2 (2006): 147–179.

Crystal, David. How Language Works. Penguin Books Ltd, 2008.

Demolin, Didier. “Control and Regulation of Speech Production.” Experimental Approaches to Phonology. Ed. Maria-Josep Sole, Patrice Speeter Beddor, and Manjari Ohala. Oxford: Oxford University Press, 2007.

Fellbaum, Christiane. WordNet: An Electronic Lexical Database [Database]. Cambridge,

MA: MIT Press, 1998.

Fox, John, and Sanford Weisberg. An R Companion to Applied Regression. Sage.

<http://socserv.socsci.mcmaster.ca/jfox/Books/Companion>

Garde, Paul. Naglasak. Zagreb: Školska knjiga, 1993.

Gibaldi, Joseph, and Walter S. Achtert. MLA Handbook for Writers of Research Papers –

Fifth Edition. The Modern Language Association of America, 1999.

Gimson, A. An Introduction to the Pronunciation of English by A. C. Gimson. 2nd ed.

London: Edward Arnold, 1970.

Gimson, A.C. A Practical Course of English Pronunciation: A Perceptual Approach.

Hodder Arnold, 1975.

Harrington, J., and S. Cassidy. “Techniques in Speech Acoustics.” Computational

Linguistics 26.2 (2000): 294–295.

Harrington, Jonathan. Phonetic Analysis of Speech Corpora. Oxford: Wiley-Blackwell.

Web. Dec 2010.

< http://phonetik.uni-muenchen.de/~jmh/research/pasc010808/pasc.pdf>

Hillenbrand, J., and R. T Gayvert. “Vowel Classification Based on Fundamental Frequency

and Formant Frequencies.” Journal of Speech and Hearing Research 36.4 (1993): 694.

Johnson, Keith. Acoustic and Auditory Phonetics. 2nd ed. Malden, Mass: Blackwell Pub,

2003.

Jones, Daniel. An Outline of English Phonetics. 9th ed. Cambridge: Cambridge University

Press, 1975.

---. The Pronunciation of English. Cambridge University Press, 1956.

Kent, Raymond, and Charles Read. The Acoustic Analysis of Speech. San Diego: Singular,

1996.

Ladefoged, Peter. Elements of Acoustic Phonetics. 2nd ed. Chicago: University of Chicago

Press, 1996.

---. Phonetic Data Analysis: An Introduction to Fieldwork and Instrumental Techniques.

Malden, MA: Blackwell Pub, 2003.

Lass, Norman J. Experimental Phonetics. New York: MSS Information Corp, 1974.

Laver, John. “Linguistic Phonetics.” The Handbook of Linguistics. Ed. Mark Aronoff and

Janie Rees-Miller. Malden, Mass: Blackwell Publishers, 2001.

Leather, Jonathan. “Second-Language Research: An Introduction.” Language Learning

49.S1 (1999): 1-56.

Lewis, Anthony. WordWeb Pro (Version 6) [Computer software]

<http://www.wordweb.info/>

Marković, Maja. Kontrastivna analiza akustičkih i artikulacionih karakteristika vokalskih

sistema engleskog i srpskog jezika. Diss. Novi Sad: University of Novi Sad, 2007.

Mathematics. New York: Macmillan Reference USA, 2002.

Martirosian, O., and M. Davel. “Acoustic Analysis of Diphthongs in Standard South African

English.” Nineteenth Annual Symposium of the Pattern Recognition Association of

South Africa (PRASA 2008), Cape Town, South Africa. 2008. 27–28.

Mitchell, Margaret, and Steven Bird. Natural Language Toolkit: TextGrid Analysis

[Computer software].

< http://nltk.googlecode.com/svn/trunk/nltk_contrib/nltk_contrib/textgrid.py>

O'Connor, J. D. Phonetics. Harmondsworth: Penguin, 1973.

O'Connor, Joseph. Better English Pronunciation. Cambridge, UK: Cambridge University Press, 1967.

Odden, David Arnold. Introducing Phonology. Cambridge, UK: Cambridge University

Press, 2005.

Ogden, Richard. An Introduction to English Phonetics. Edinburgh University Press, 2009.

Petrović, Dragoljub, and Snežana Gudurić. Fonologija srpskog jezika. Beograd: Institut za

srpski jezik SANU, Beogradska knjiga, Matica srpska, 2010.

Pinker, Steven. Language Instinct. Penguin, 1995.

Port, R. F. “Guidelines for Phonetics Project.” The Trustees of Indiana University. 1998. 4

Nov 2010.

< http://www.cs.indiana.edu/~port/teach/541/project.guidelines.html>

Python Software Foundation. Python (versions 2.7 and 3.1) [Computer software]. Python

Software Foundation.

< http://www.python.org>

R Development Core Team. R: A Language and Environment for Statistical Computing

(version 2.12.1) [Computer software]. R Foundation for Statistical Computing.

< http://www.R-project.org/>

R Development Core Team. R Data Import/Export.

< http://cran.r-project.org/doc/manuals/R-data.pdf>

Robinson, Tony. BEEP (British English Example Pronunciations) [Database]. University of

Cambridge. 1996. FTP. Oct 2010.

< ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/beep.tar.gz>

Rossini, Anthony J. et al. Emacs Speaks Statistics (ESS) (version 5.13) [Computer software]

< http://ess.r-project.org>

Sawicka, Irena. An Outline of the Phonetic Typology of the Slavic Languages. Toruń: Wydawn. Uniwersytetu M. Kopernika, 2001.

Stevenson, Angus. Oxford Dictionary of English [Database]. Oxford, OUP. 2010.

Van Son, R. “Can Standard Analysis Tools Be Used on Decompressed Speech?” COCOSDA

2002 Workshop of the International Committee for the Co-ordination and

Standardisation of Speech Databases and Assessment Techniques, Denver, Colorado.

2002. Web. Oct 2010.

<http://www.cocosda.org/meet/denver/COCOSDA2002-Rob.pdf >

Stanojčić, Živojin, and Popović, Ljubomir. Gramatika srpskoga jezika: udžbenik za I, II, III i IV razred srednje škole. Beograd: Zavod za udžbenike i nastavna sredstva, 1995.

Stevens, Kenneth N. Acoustic Phonetics. Cambridge, Mass: MIT Press, 1998.

Trask, R. L. Language and Linguistics: The Key Concepts. 2nd ed. Abingdon, England:

Routledge, 2007.

Trask, R.L. A Dictionary of Phonetics and Phonology. 1st ed. Routledge, 1995.

Venables W. N., D. M. Smith and the R Development Core Team. An Introduction to R

Notes on R: A Programming Environment for Data Analysis and Graphics.

< http://cran.r-project.org/doc/manuals/R-intro.pdf>

Welch, Brent B. Practical Programming in Tcl & Tk. Upper Saddle River, NJ: Prentice Hall,

2000.

Ward, Grady. Moby Hyphenator II [Database]. University of Cambridge. 1993. FTP. Oct

2010.

< ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/dictionaries/moby/mhyph.tar.gz>

Ward, Ida. The Phonetics of English. Fifth Edition. Cambridge: Cambridge University

Press, 1972.

Weenink, David. Speech Signal Processing with Praat. Dec 2010.

< http://www.fon.hum.uva.nl/david/sspbook/sspbook.pdf>