Pavel Skrelin (Saint-Petersburg State University)

Some Principles and Methods of Measuring Fo and Tempo

My main principle:

• Acoustic data that we retrieve from speech material for analysis should be connected with phonetic (linguistic) features. Hence the obtained values should reflect concrete features and have a clear phonetic (linguistic) interpretation, and the methods of calculation and classification should take into account the properties of speech perception as well as those of speech production.

Fo measurements

(phrase № 53 in reading and spontaneous speech, F>40)

Terms:

• Smoothing: Fo data are processed with a rectangular window 100 ms long, shifted pitch-synchronously.

• Correction: pitch marks are eliminated on voiced consonants and approximants, on voiced onsets, on hesitations, and on voiced transitions between vowels and consonants.
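The two operations defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: "pitch-synchronous shift" is read as re-centring the window on every pitch mark, each pitch mark is assumed to carry one Fo value, and the segments to be corrected are assumed to be given as time intervals. The names `smooth_f0` and `drop_marks` and the interval format are hypothetical.

```python
from bisect import bisect_left, bisect_right


def smooth_f0(times, f0, window_s=0.100):
    """Rectangular (unweighted) moving average, re-centred on each pitch mark.

    times: sorted pitch-mark times in seconds; f0: Fo value (Hz) at each mark.
    Assumption: the 100 ms window is centred on the current mark.
    """
    half = window_s / 2.0
    smoothed = []
    for t in times:
        lo = bisect_left(times, t - half)    # first mark inside the window
        hi = bisect_right(times, t + half)   # one past the last mark inside
        window = f0[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def drop_marks(times, f0, excluded):
    """Remove pitch marks that fall inside any excluded (start, end) interval.

    excluded: hypothetical list of intervals labelled as voiced consonants,
    voiced onsets, hesitations or voiced vowel-consonant transitions.
    """
    kept = [(t, f) for t, f in zip(times, f0)
            if not any(start <= t <= end for start, end in excluded)]
    return [t for t, _ in kept], [f for _, f in kept]
```

Calling `drop_marks` before `smooth_f0` keeps the eliminated segments out of the averages; that order is an assumption of this sketch.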

Why the correction is needed on voiced transitions between vowels and consonants

This voiced transition does not affect the perceived vowel duration:

• The whole group [kak'i tak'i]

• [kaki] isolated: original vowel length; vowel without transition

• [i]: original length (66 ms); vowel without transition (37 ms)

but it does affect the duration of the next consonant and the Fo values:

Smoothed Fo data with pitch marks on voiced [i-t] transition

Smoothed Fo data without pitch marks on voiced [i-t] transition

Raw data: reading

Raw data: spontaneous speech

Smoothed data: reading

Smoothed data: spontaneous speech

Smoothed Fo without laryngealization: reading

Smoothed Fo without laryngealization: spontaneous speech

Smoothed Fo without laryngealization and some consonants: reading

Smoothed Fo without laryngealization and some consonants: spontaneous speech

Fo measurements

                         Raw Fo data         Smoothed Fo data    Smoothed Fo without Smoothed Fo without
                                                                 laryngealization    laryngealization
                                                                                     and some consonants
                         Reading   Spont.    Reading   Spont.    Reading   Spont.    Reading   Spont.
Average Fo (Hz)          200.9     210.1     204.5     210.8     206.5     214.1     210.2     218
Max Fo (Hz)              359       347       324       339       324       339       336       338
Min Fo (Hz)              62        51        110       51        128       135       128       126
Range (Hz)               297       296       214       288       196       204       208       212
St. deviation (Hz)       56.8      58.8      53.1      54.5      52.1      51.6      54.2      53.4
Mean rise slope (Hz/s)   839       951       268       386       244       387       197       151
Mean fall slope (Hz/s)   846       1054      255       303       249       306       219       349
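The measures in the table can be computed from a list of pitch marks along the following lines. The slides do not define the slope measures, so this sketch makes its own assumptions: a slope is taken between each pair of neighbouring pitch marks, rises and falls are averaged separately, and the standard deviation is the sample one. The function name `f0_summary` is hypothetical. The range is simply maximum minus minimum, as in the table (for example 359 - 62 = 297).

```python
from statistics import mean, stdev


def f0_summary(times, f0):
    """Summary statistics of an Fo contour given as parallel lists (s, Hz)."""
    # Slope between neighbouring pitch marks, in Hz/s (assumed definition).
    slopes = [(f0[i + 1] - f0[i]) / (times[i + 1] - times[i])
              for i in range(len(f0) - 1)]
    rises = [s for s in slopes if s > 0]
    falls = [-s for s in slopes if s < 0]   # reported as positive magnitudes
    return {
        "average": mean(f0),
        "max": max(f0),
        "min": min(f0),
        "range": max(f0) - min(f0),
        "st_dev": stdev(f0),                # sample SD (assumption)
        "mean_rise_slope": mean(rises) if rises else 0.0,
        "mean_fall_slope": mean(falls) if falls else 0.0,
    }
```

Slopes taken across unvoiced gaps or eliminated segments would inflate the rise and fall values, so in practice the contour would probably be split into continuous stretches first; the sketch leaves that out.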

Tempo measurements

• Methods may be different for different tasks:

• For comparison on the basis of the whole material

• For tempo monitoring, for example for revealing tempo modifications specific to some IU types or IU positions in the utterance, or for local tempo comparison between read and spontaneous realizations of the same phrase

Tempo measurements

Comparison on the basis of the whole material

• Syllables: average duration of syllables realized in spontaneous speech divided by average duration of syllables realized in reading

Example for F>40: 152/143 = 1.06

• Sounds: average duration of sounds realized in spontaneous speech divided by average duration of sounds realized in reading

Example for F>40: 67/63 = 1.06

Possible correction: taking into account the ideal number of syllables or sounds
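As a small worked example, the whole-material comparison is a ratio of two averages. The sketch below assumes the durations are available as plain lists in milliseconds (the unit is an assumption; the slide quotes only the numbers), and the function name `duration_ratio` is hypothetical. The correction by the ideal number of syllables or sounds is not specified on the slide and is not implemented here.

```python
from statistics import mean


def duration_ratio(spont_ms, read_ms):
    """Average unit duration in spontaneous speech divided by that in reading."""
    return mean(spont_ms) / mean(read_ms)


# Averages quoted on the slide for speaker F>40.
syllable_ratio = round(152 / 143, 2)   # 1.06
sound_ratio = round(67 / 63, 2)        # 1.06
```

A value above 1 means that units are on average longer in spontaneous speech than in reading.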

The simplest way: direct comparison of sound durations in both phrases

But some sounds are longer in reading and others in spontaneous speech, which makes the tempo comparison difficult and inconsistent.

Tempo measurements

Tempo monitoring: Example (Speaker F<20: phrase №12, sound durations in spontaneous speech and reading)

[Bar chart: duration of each sound in ms (axis 0 to 140), Read vs Spont. Sounds along the horizontal axis: k a n' e w n a u m n' a p e i v' i l y s' m n o g y n o v y h_z t r u z' e i]

Methods for tempo monitoring

1. Current syllable duration/average syllable duration:

current syllable duration = IU duration/number of syllables in the IU;

average syllable duration = net sound material duration/number of syllables in the whole material

Not good, because the result depends on the syllable structures in the current IU; it would need a normalization that takes into account both the average syllable structure (C/V coefficient) and the current one.

Methods for tempo monitoring

2. Average sound duration in current IU/average sound duration:

average sound duration in current IU = IU duration/number of sounds in the IU;

average sound duration = net sound material duration/number of sounds in the whole material

Possible correction: taking into account the ideal number of sounds in the whole material and in the current IU

Example for F<20

• Reading: 1st IU 59/64 = 0.92; 2nd IU 61/64 = 0.95

• Spont. speech: 1st IU 70/71 = 0.99; 2nd IU 54/71 = 0.76

Not good, because the result ignores the individual average duration of each sound in the IU and the deviation of each sound's current duration from its average duration in the whole material.
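Method 2 can be sketched as follows, assuming each IU and the whole material are available as lists of sound durations. The function name `relative_sound_duration` and the dictionary layout are hypothetical; the number pairs are the averages quoted in the example above.

```python
from statistics import mean


def relative_sound_duration(iu_durations, material_durations):
    """Average sound duration in the IU divided by the whole-material average."""
    return mean(iu_durations) / mean(material_durations)


# (average in current IU, average in whole material) for speaker F<20,
# copied from the slide example.
slide_values = {
    ("reading", 1): (59, 64),
    ("reading", 2): (61, 64),
    ("spont", 1): (70, 71),
    ("spont", 2): (54, 71),
}
ratios = {key: round(iu / whole, 2) for key, (iu, whole) in slide_values.items()}
# ratios: reading 0.92 and 0.95, spont 0.99 and 0.76
```

Because the denominator is the same for every IU of a given style, the ratio cannot tell a fast IU from one that merely contains intrinsically short sounds, which is the weakness noted above.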

Methods for tempo monitoring

3. Average sound duration in current IU/averaged sound duration in the IU:

average sound duration in current IU = IU duration/number of sounds in the IU;

averaged sound duration = sum of the average durations (on the basis of the whole material) of the sounds in the IU/number of sounds in the IU

Example for F<20

• Reading: 1st IU 59/72 = 0.82; 2nd IU 61/56 = 1.09

• Spont. speech: 1st IU 70/75 = 0.93; 2nd IU 54/66 = 0.82

Possible correction: taking into account the ideal number of sounds in the whole material and in the current IU, and the average durations of pre-stressed and post-stressed vowels
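Method 3 replaces the global average with an expectation built from the sounds that actually occur in the IU. A minimal sketch, assuming the IU is a list of (label, duration) pairs and that the whole-material average duration of every sound is already known; the names `expected_sound_duration` and `method3_tempo` are hypothetical, and the corrections mentioned above are not implemented.

```python
from statistics import mean


def expected_sound_duration(labels, mean_by_sound):
    """Mean of the whole-material average durations of the sounds in the IU."""
    return mean([mean_by_sound[label] for label in labels])


def method3_tempo(iu_sounds, mean_by_sound):
    """iu_sounds: list of (label, duration) pairs for one IU.

    Returns average sound duration in the IU divided by the averaged
    (expected) sound duration for the same sequence of sounds.
    """
    current = mean([duration for _, duration in iu_sounds])
    expected = expected_sound_duration(
        [label for label, _ in iu_sounds], mean_by_sound)
    return current / expected
```

With the slide's figures for the first read IU, the current average is 59 and the expected average is 72, so the ratio is 59/72 = 0.82: the sounds in that IU are shorter than the same sounds usually are in this speaker's material.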

Methods for tempo monitoring

4. Rob van Son proposal (Z-values):

• "As Finnish and Dutch (and Russian?) use quantities on (some) phonemes, this is not a good way to define tempo. We had a PhD student (Xue Wang) who developed a very nice way to define "local" tempo as the Z value of the phoneme (i.e., LocalTempo = (PhonemeDuration - MeanPhonemeDuration)/StandDeviation for each phoneme).

• The local speaking rate is then the mean of these values over an utterance."

Example for F<20

• Reading: 1st IU -0.39; 2nd IU 0.26

• Spont. speech: 1st IU -0.12; 2nd IU -0.41

No comprehensible relation between values and linguistic features
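The quoted formula can be written out as below. This is a sketch under assumptions, not the implementation used for the figures above: per-phoneme statistics are estimated from the whole material with the sample standard deviation, and phonemes with fewer than two tokens or with zero deviation are skipped. The names `phoneme_stats` and `local_tempo` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def phoneme_stats(material):
    """material: iterable of (label, duration); returns label -> (mean, sd)."""
    by_label = defaultdict(list)
    for label, duration in material:
        by_label[label].append(duration)
    # stdev needs at least two tokens per phoneme (skipping others is an assumption)
    return {label: (mean(d), stdev(d))
            for label, d in by_label.items() if len(d) > 1}


def local_tempo(iu_sounds, stats):
    """Mean Z-value of the sound durations in one IU or utterance."""
    z_values = [(duration - stats[label][0]) / stats[label][1]
                for label, duration in iu_sounds
                if label in stats and stats[label][1] > 0]
    return mean(z_values)
```

By construction, a negative mean Z-value means the sounds are on average shorter than usual for their phoneme class, and a positive one means they are longer.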

Method comparison

                                                                             IU-1                     IU-2
                                                                             Reading  Spont. speech   Reading  Spont. speech
2. Average sound duration in current IU/average sound duration               0.92     0.99            0.95     0.76
3. Average sound duration in current IU/averaged sound duration in the IU    0.82     0.93            1.09     0.82
4. Mean of Z-values                                                          -0.39    -0.12           0.26     -0.41