ECE 598: The Speech Chain
![Page 1: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/1.jpg)
ECE 598: The Speech Chain
Lecture 8: Formant Transitions; Vocal Tract Transfer Function
![Page 2: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/2.jpg)
Today
• Perturbation Theory: a different way to estimate vocal tract resonant frequencies, useful for consonant transitions
• Syllable-Final Consonants: Formant Transitions
• Vocal Tract Transfer Function
   Uniform Tube (Quarter-Wave Resonator)
   During Vowels: All-Pole Spectrum
      Q
      Bandwidth
• Nasal Vowels: Sum of two transfer functions gives spectral zeros
![Page 3: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/3.jpg)
Topic #1: Perturbation Theory
![Page 4: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/4.jpg)
Perturbation Theory (Chiba and Kajiyama, The Vowel, 1940)
A(x) is constant everywhere, except for one small perturbation.
Method:
1. Compute formants of the “unperturbed” vocal tract.
2. Perturb the formant frequencies to match the area perturbation.
![Page 5: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/5.jpg)
Conservation of Energy Under Perturbation
![Page 6: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/6.jpg)
Conservation of Energy Under Perturbation
![Page 7: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/7.jpg)
“Sensitivity” Functions
![Page 8: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/8.jpg)
Sensitivity Functions for the Quarter-Wave Resonator (Lips Open)
[Figure: sensitivity functions over tract length L; labels: /AA/, /ER/, /IY/, /W/]
• Note: low F3 of /er/ is caused in part by a side branch under the tongue – perturbation alone is not enough to explain it.
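As a rough numerical companion to the quarter-wave sensitivity curves, the sketch below assumes the standard first-order perturbation result for a lossless tube closed at the glottis (x = 0) and open at the lips (x = L): the relative formant shift is approximately (1/L) ∫ (sin²(k_n x) − cos²(k_n x)) (ΔA/A) dx, with k_n = (2n−1)π/2L. That formula, the function names, the 17.5 cm tract length, and the 20% squeeze are all assumptions made for illustration; none of them comes from the slides.

```python
import math

def sensitivity(n, x, L):
    """Kinetic minus potential energy density of mode n (normalized).

    Assumes a lossless tube closed at x = 0 (glottis) and open at x = L (lips).
    """
    k = (2 * n - 1) * math.pi / (2 * L)
    return math.sin(k * x) ** 2 - math.cos(k * x) ** 2

def relative_shift(n, x_start, x_end, dA_over_A, L, steps=1000):
    """First-order estimate of dF_n / F_n for a constant relative area
    change dA_over_A applied on [x_start, x_end] (midpoint-rule integral)."""
    dx = (x_end - x_start) / steps
    total = 0.0
    for i in range(steps):
        x_mid = x_start + (i + 0.5) * dx
        total += sensitivity(n, x_mid, L) * dA_over_A * dx
    return total / L

L_TRACT = 0.175  # hypothetical tract length in meters

# Squeeze the area by 20% over the last 1 cm at the lips (a velocity peak).
lips = [relative_shift(n, L_TRACT - 0.01, L_TRACT, -0.2, L_TRACT) for n in (1, 2, 3)]

# Squeeze the area by 20% over the first 1 cm at the glottis (a pressure peak).
glottis = [relative_shift(n, 0.0, 0.01, -0.2, L_TRACT) for n in (1, 2, 3)]

print("lips   :", lips)     # all negative: formants go down
print("glottis:", glottis)  # all positive: formants go up
```

Under these assumptions the signs agree with the rule of thumb stated in the summary: a squeeze near a velocity peak lowers the formant, and a squeeze near a pressure peak raises it.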
![Page 9: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/9.jpg)
Sensitivity Functions for the Half-Wave Resonator (Lips Rounded)
[Figure: sensitivity functions over tract length L; labels: /L,OW/, /UW/]
• Note: high F3 of /l/ is caused in part by a side branch above the tongue – perturbation alone is not enough to explain it.
![Page 10: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/10.jpg)
Formant Frequencies of Vowels
From Peterson & Barney, 1952
![Page 11: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/11.jpg)
Topic #2: Formant Transitions, Syllable-Final Consonant
![Page 12: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/12.jpg)
Events in the Closure of a Nasal Consonant
Vowel Nasalization
Formant Transitions
Nasal Murmur
![Page 13: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/13.jpg)
Formant Transitions: A Perturbation Theory Model
![Page 14: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/14.jpg)
Formant Transitions: Labial Consonants
“the mom”
“the bug”
![Page 15: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/15.jpg)
Formant Transitions: Alveolar Consonants
“the tug”
“the supper”
![Page 16: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/16.jpg)
Formant Transitions: Post-alveolar Consonants
“the shoe”
“the zsazsa”
![Page 17: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/17.jpg)
Formant Transitions: Velar Consonants
“the gut”
“sing a song”
![Page 18: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/18.jpg)
Topic #3: Vocal Tract Transfer Functions
![Page 19: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/19.jpg)
Transfer Function
• “Transfer Function”: $T(\omega) = \mathrm{Output}(\omega)/\mathrm{Input}(\omega)$
• In speech, it’s convenient to write $T(\omega) = U_L(\omega)/U_G(\omega)$
   $U_L(\omega)$ = volume velocity at the lips
   $U_G(\omega)$ = volume velocity at the glottis
   $T(0) = 1$
• Speech recorded at a microphone = pressure: $P_R(\omega) = R(\omega)\,T(\omega)\,U_G(\omega)$
   $R(\omega) = j\rho f/r$ = “radiation characteristic”
   $\rho$ = density of air
   $r$ = distance to the microphone
   $f$ = frequency in Hertz
![Page 20: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/20.jpg)
Transfer Function of an Ideal Uniform Tube
• Ideal Terminations:
   Reflection coefficient at glottis (zero velocity): $+1$
   Reflection coefficient at lips (zero pressure): $-1$
• Obviously, this is an approximation, but it gives…
   $T(\omega) = \dfrac{1}{\cos(\omega L/c)} = \prod_{n=1}^{\infty} \dfrac{1}{1 - \omega^2/\omega_n^2}$
   $\omega_n = n\pi c/L - \pi c/2L$
   $F_n = nc/2L - c/4L$
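The resonance formula $F_n = nc/2L - c/4L = (2n-1)c/4L$ is easy to evaluate directly. The sketch below does so for a hypothetical 17.5 cm tube with a sound speed of 350 m/s; both values and the function name are assumptions chosen for illustration, not figures given on this slide.

```python
def uniform_tube_formants(length_m, speed_of_sound=350.0, count=3):
    """Resonances of an ideal tube closed at the glottis and open at the lips:
    F_n = n*c/(2L) - c/(4L) = (2n - 1) * c / (4L)."""
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]

formants = uniform_tube_formants(0.175)  # assumed length: 17.5 cm
print([round(f) for f in formants])      # [500, 1500, 2500]
```

With these assumed values the formants fall at odd multiples of 500 Hz, which is why the tube is called a quarter-wave resonator: the lowest resonance has a wavelength of 4L.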
![Page 21: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/21.jpg)
Transfer Function of an Ideal Uniform Tube
Peaks are actually infinite in height (figure is clipped to fit the display)
![Page 22: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/22.jpg)
Transfer Function of a Non-Ideal Uniform Tube
• Almost ideal terminations:
   At glottis: velocity almost zero, reflection coefficient $\approx 1$
   At lips: pressure almost zero, reflection coefficient $\approx -1$
• $T(\omega) = \dfrac{1}{j/Q + \cos(\omega L/c)}$
• … at $F_n = nc/2L - c/4L$, …
   $T(2\pi F_n) = -jQ$
   $20\log_{10}|T(2\pi F_n)| = 20\log_{10} Q$
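A quick numerical check of the peak value: at a resonance the cosine term vanishes, so $T = 1/(j/Q) = -jQ$. The sketch below evaluates the non-ideal tube formula at $F_1$ for a hypothetical Q of 10, a 17.5 cm tube, and c = 350 m/s; these numbers and the function name are illustrative assumptions.

```python
import cmath
import math

def tube_transfer(freq_hz, length_m, q, speed_of_sound=350.0):
    """T(omega) = 1 / (j/Q + cos(omega * L / c)) for a slightly lossy uniform tube."""
    omega = 2 * math.pi * freq_hz
    return 1.0 / (1j / q + cmath.cos(omega * length_m / speed_of_sound))

Q = 10.0                       # hypothetical quality factor
LENGTH = 0.175                 # hypothetical tube length in meters
F1 = 350.0 / (4 * LENGTH)      # first resonance, c / 4L

peak = tube_transfer(F1, LENGTH, Q)
peak_db = 20 * math.log10(abs(peak))
print(peak, peak_db)           # close to -10j, i.e. about 20 dB
```

Unlike the ideal tube, whose peaks are infinite, the lossy tube has a finite peak whose height in dB is set by Q alone.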
![Page 23: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/23.jpg)
Transfer Function of a Non-Ideal Uniform Tube
![Page 24: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/24.jpg)
Transfer Function of a Vowel: Height of First Peak is $Q_1 = F_1/B_1$
$T(\omega) = \prod_{n=1}^{\infty} \dfrac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$
$T(2\pi F_1) \approx \dfrac{(2\pi F_1)^2}{j4\pi^2 F_1 B_1} = -jF_1/B_1$
Call $Q_n = F_n/B_n$
$T(2\pi F_1) \approx -jQ_1$
$20\log_{10}|T(2\pi F_1)| \approx 20\log_{10} Q_1$
![Page 25: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/25.jpg)
Transfer Function of a Vowel: Bandwidth of a Peak is $B_n$
$T(\omega) = \prod_{n=1}^{\infty} \dfrac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$
$T(2\pi F_1 + \pi B_1) \approx \dfrac{(2\pi F_1)^2}{(j4\pi F_1)(\pi B_1 + j\pi B_1)} = \dfrac{-jQ_1}{1+j}$
At $f = F_1 + 0.5B_1$,
$|T(\omega)| \approx Q_1/\sqrt{2}$
$20\log_{10}|T(\omega)| \approx 20\log_{10} Q_1 - 3\,\mathrm{dB}$
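The peak-height and bandwidth approximations can be checked numerically. The sketch below evaluates a single pole pair with $s_1 = -\pi B_1 + j2\pi F_1$, using F1 = 500 Hz and B1 = 80 Hz (the values that appear in the synthetic example in these slides); the function name and the choice to keep only one formant are assumptions made to isolate the effect.

```python
import math

def all_pole(freq_hz, formants_hz, bandwidths_hz):
    """T(s) = prod_n s_n s_n* / ((s - s_n)(s - s_n*)), evaluated at s = j*2*pi*f."""
    s = 1j * 2 * math.pi * freq_hz
    t = 1.0 + 0j
    for f_n, b_n in zip(formants_hz, bandwidths_hz):
        s_n = complex(-math.pi * b_n, 2 * math.pi * f_n)
        t *= (s_n * s_n.conjugate()) / ((s - s_n) * (s - s_n.conjugate()))
    return t

def level_db(freq_hz, formants_hz, bandwidths_hz):
    return 20 * math.log10(abs(all_pole(freq_hz, formants_hz, bandwidths_hz)))

F1, B1 = 500.0, 80.0
q1_db = 20 * math.log10(F1 / B1)                  # predicted peak height
peak_db = level_db(F1, [F1], [B1])                # exact value at the peak
upper_drop = peak_db - level_db(F1 + B1 / 2, [F1], [B1])
lower_drop = peak_db - level_db(F1 - B1 / 2, [F1], [B1])

print(q1_db, peak_db)          # agree to within a small fraction of a dB
print(lower_drop, upper_drop)  # both roughly 3 dB
```

The drops at the two band edges are not exactly 3 dB and not exactly equal, because the conjugate pole contributes a slowly varying factor that the slide's approximation treats as constant across the peak.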
![Page 26: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/26.jpg)
Amplitudes of Higher Formants: Include the Rolloff
$T(\omega) = \prod_{n=1}^{\infty} \dfrac{(2\pi F_n)^2 + (\pi B_n)^2}{(j\omega + j2\pi F_n + \pi B_n)(j\omega - j2\pi F_n + \pi B_n)}$
At $f$ above $F_1$:
$T(2\pi f) \approx (F_1/f)$
$T(2\pi F_2) \approx (-jF_2/B_2)(F_1/F_2)$
$20\log_{10}|T(2\pi F_2)| \approx 20\log_{10} Q_2 - 20\log_{10}(F_2/F_1)$
1/f Rolloff: 6 dB per octave (per doubling of frequency)
![Page 27: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/27.jpg)
Vowel Transfer Function: Synthetic Example
L1 = 20log10(500/80) = 16 dB
L2 = 20log10(1500/240) – 20log10(F2/F1) = 16 dB – 9.5 dB
L3 = 20log10(2500/600) – 20log10(F3/F1) – 20log10(F3/F2)
B1 = 80 Hz
B2 = 240 Hz
B3 = 600 Hz? (hard to measure because rolloff from F1, F2 turns the F3 peak into a plateau)
F4 peak completely swamped by rolloff from lower formants
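The level arithmetic in this example can be carried through to final numbers. The sketch below applies the slide's own rule (peak level = 20log10 Q_n, minus 20log10(F_n/F_m) for every lower formant F_m) to the formants and bandwidths listed above; the function name is hypothetical, and the code reproduces the slide's approximation rather than an exact evaluation of the transfer function.

```python
import math

FORMANTS_HZ = [500.0, 1500.0, 2500.0]
BANDWIDTHS_HZ = [80.0, 240.0, 600.0]

def peak_level_db(index):
    """Slide's rule: 20*log10(Q_n) minus the 1/f rolloff from each lower formant."""
    f_n = FORMANTS_HZ[index]
    level = 20 * math.log10(f_n / BANDWIDTHS_HZ[index])
    for f_lower in FORMANTS_HZ[:index]:
        level -= 20 * math.log10(f_n / f_lower)
    return level

levels = [peak_level_db(i) for i in range(3)]
print([round(v, 1) for v in levels])  # [15.9, 6.4, -6.0]
```

By this rule the third peak lands about 6 dB below the low-frequency level, which is consistent with the note that the F3 peak flattens into a plateau.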
![Page 28: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/28.jpg)
Shorthand Notation for the Spectrum of a Vowel
$T(s) = \prod_{n=1}^{\infty} \dfrac{s_n s_n^*}{(s - s_n)(s - s_n^*)}$
$s = j\omega$
$s_n = -\pi B_n + j2\pi F_n$
$s_n^* = -\pi B_n - j2\pi F_n$
$s_n s_n^* = |s_n|^2 = (2\pi F_n)^2 + (\pi B_n)^2$
$T(0) = 1$
$20\log_{10}|T(0)| = 0\,\mathrm{dB}$
![Page 29: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/29.jpg)
Another Shorthand Notation for the Spectrum of a Vowel
$T(s) = \prod_{n=1}^{\infty} \dfrac{1}{(1 - s/s_n)(1 - s/s_n^*)}$
![Page 30: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/30.jpg)
Topic #4: Nasalized Vowels
![Page 31: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/31.jpg)
Vowel Nasalization
Nasalized Vowel
Nasal Consonant
![Page 32: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/32.jpg)
Nasalized Vowel
$P_R(\omega) = R(\omega)\,(U_L(\omega) + U_N(\omega))$
$U_N(\omega)$ = Volume Velocity from Nostrils
$P_R(\omega) = R(\omega)\,(T_L(\omega) + T_N(\omega))\,U_G(\omega) = R(\omega)\,T(\omega)\,U_G(\omega)$
$T(\omega) = T_L(\omega) + T_N(\omega)$
![Page 33: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/33.jpg)
Nasalized Vowel
$T(s) = T_L(s) + T_N(s)$
$= \dfrac{1}{(1 - s/s_{Ln})(1 - s/s_{Ln}^*)} + \dfrac{1}{(1 - s/s_{Nn})(1 - s/s_{Nn}^*)}$
$= \dfrac{2(1 - s/s_{Zn})(1 - s/s_{Zn}^*)}{(1 - s/s_{Ln})(1 - s/s_{Ln}^*)(1 - s/s_{Nn})(1 - s/s_{Nn}^*)}$
$1/s_{Zn} = \tfrac{1}{2}(1/s_{Ln} + 1/s_{Nn})$
$s_{Zn}$ = $n$th spectral zero
$T(s) = 0$ if $s = s_{Zn}$
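To see where the zero lands, the sketch below sums two single-pole-pair branches and compares the slide's expression for $s_{Zn}$ with the zero found by solving the numerator of the sum directly. The oral pole (700 Hz), nasal pole (1000 Hz), 100 Hz bandwidths, and all function names are hypothetical values chosen for illustration. Expanding the sum shows that the slide's expression matches the linear term of the numerator exactly and the quadratic term only approximately, so it appears to be an approximation that is tightest when the two poles are close together.

```python
import cmath
import math

def pole(freq_hz, bandwidth_hz):
    """s_n = -pi*B_n + j*2*pi*F_n"""
    return complex(-math.pi * bandwidth_hz, 2 * math.pi * freq_hz)

def branch(s, s_n):
    """One pole pair: 1 / ((1 - s/s_n)(1 - s/s_n*))"""
    return 1.0 / ((1 - s / s_n) * (1 - s / s_n.conjugate()))

def slide_zero(s_l, s_n):
    """Zero from the slide's formula: 1/s_Z = (1/s_L + 1/s_N) / 2."""
    return 1.0 / (0.5 * (1 / s_l + 1 / s_n))

def exact_zero(s_l, s_n):
    """Root (positive frequency) of the summed numerator c2*s^2 - c1*s + 2 = 0."""
    c1 = 2 * (1 / s_l).real + 2 * (1 / s_n).real
    c2 = 1 / abs(s_l) ** 2 + 1 / abs(s_n) ** 2
    return (c1 + cmath.sqrt(c1 * c1 - 8 * c2)) / (2 * c2)

s_oral = pole(700.0, 100.0)    # hypothetical oral pole
s_nasal = pole(1000.0, 100.0)  # hypothetical nasal pole

fz_slide = slide_zero(s_oral, s_nasal).imag / (2 * math.pi)
z_exact = exact_zero(s_oral, s_nasal)
fz_exact = z_exact.imag / (2 * math.pi)
residual = abs(branch(z_exact, s_oral) + branch(z_exact, s_nasal))

print(fz_slide, fz_exact)  # both fall between 700 Hz and 1000 Hz
print(residual)            # essentially zero at the exact root
```

For these assumed values the two estimates differ by only a few percent, and both place the zero between the oral and nasal poles, as the summary states.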
![Page 34: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/34.jpg)
The “Pole-Zero Pair”
$20\log_{10} T(\omega) =$
$20\log_{10}\left(\dfrac{1}{(1 - s/s_{Ln})(1 - s/s_{Ln}^*)}\right)$
$+\ 20\log_{10}\left(\dfrac{(1 - s/s_{Zn})(1 - s/s_{Zn}^*)}{(1 - s/s_{Nn})(1 - s/s_{Nn}^*)}\right)$
= original vowel log spectrum
+ log spectrum of a pole-zero pair
![Page 35: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/35.jpg)
Additive Terms in the Log Spectrum
![Page 36: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/36.jpg)
Transfer Function of a Nasalized Vowel
![Page 37: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/37.jpg)
Pole-Zero Pairs in the Spectrogram
Nasal Pole
Zero
Oral Pole
![Page 38: ECE 598: The Speech Chain](https://reader035.fdocuments.us/reader035/viewer/2022062314/56812d94550346895d92b064/html5/thumbnails/38.jpg)
Summary
• Perturbation Theory:
   Squeeze near a velocity peak: formant goes down
   Squeeze near a pressure peak: formant goes up
• Formant Transitions
   Labial closure: loci near 250, 1000, 2000 Hz
   Alveolar closure: loci near 250, 1700, 3000 Hz
   Velar closure: F2 and F3 come together (“velar pinch”)
• Vocal Tract Transfer Function
   $T(s) = \prod_n s_n s_n^*/((s - s_n)(s - s_n^*))$
   $T(\omega = 2\pi F_n) = Q_n = F_n/B_n$
   3dB bandwidth = $B_n$ Hertz
   $T(0) = 1$
• Nasal Vowels:
   Sum of two transfer functions gives a spectral zero between the oral and nasal poles
   Pole-zero pair is a local perturbation of the spectrum