Post on 28-Feb-2018
7/25/2019 Lecture 2 Acoustic Theory
Acoustic Theory of Speech Production
Overview
Sound sources
Vocal tract transfer function
Wave equations
Sound propagation in a uniform acoustic tube
Representing the vocal tract with simple acoustic tubes
Estimating natural frequencies from area functions
Representing the vocal tract with multiple uniform tubes
6.345 Automatic Speech Recognition
Lecture # 2
Session 2003

Anatomical Structures for Speech Production

Phonemes in American English
PHONEME  EXAMPLE   PHONEME  EXAMPLE   PHONEME  EXAMPLE
/i/      beat      /s/      see       /w/      wet
/I/      bit       /S/      she       /r/      red
/e/      bait      /f/      fee       /l/      let
/E/      bet       /T/      thief     /y/      yet
/@/      bat       /z/      z         /m/      meet
/a/      Bob       /Z/      Gigi      /n/      neat
/O/      bought    /v/      v         /4/      sing
/^/      but       /D/      thee      /C/      church
/o/      boat      /p/      pea       /J/      judge
/U/      book      /t/      tea       /h/      heat
/u/      boot      /k/      key
/5/      Burt      /b/      bee
/a/      bite      /d/      Dee
/O/      Boyd      /g/      geese
/a/      bout
/{/      about

Places of Articulation for Speech Sounds
[Figure labels: Labial, Dental, Alveolar, Palato-Alveolar, Palatal, Velar, Uvular]

Speech Waveform: An Example
[Figure: waveform of the utterance "Two plus seven is less than ten"]

A Wideband Spectrogram
[Figure: wideband spectrogram of the utterance "Two plus seven is less than ten"]

Acoustic Theory of Speech Production
The acoustic characteristics of speech are usually modelled as a sequence of source, vocal tract filter, and radiation characteristics
[Figure: vocal tract with glottal volume velocity $U_G$, lip volume velocity $U_L$, and radiated pressure $P_r$ at distance $r$]
$P_r(j\Omega) = S(j\Omega)\, T(j\Omega)\, R(j\Omega)$
For vowel production:
$S(j\Omega) = U_G(j\Omega)$
$T(j\Omega) = U_L(j\Omega) / U_G(j\Omega)$
$R(j\Omega) = P_r(j\Omega) / U_L(j\Omega)$
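Since the radiated pressure is modelled as the product of the source, vocal tract and radiation terms, the three magnitudes add when expressed in dB. The sketch below only illustrates that bookkeeping; the three magnitude values are made up for the example and do not come from the lecture.

```python
import math

def to_db(magnitude):
    """Convert a linear magnitude to decibels."""
    return 20.0 * math.log10(magnitude)

# Hypothetical magnitudes of S, T and R at a single frequency (illustrative only).
s_mag, t_mag, r_mag = 0.5, 4.0, 0.1

# P_r = S * T * R, so the dB levels of the three stages add.
pr_mag = s_mag * t_mag * r_mag
pr_db = to_db(pr_mag)
sum_db = to_db(s_mag) + to_db(t_mag) + to_db(r_mag)
print(f"|P_r| = {pr_db:.2f} dB, sum of stages = {sum_db:.2f} dB")
```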

Sound Source: Vocal Fold Vibration
Modelled as a volume velocity source at glottis, $U_G(j\Omega)$
[Figure: glottal volume velocity $U_G(t)$ and radiated pressure $P_r(t)$ versus $t$, with period $T_o = 1/F_o$; source spectrum $U_G(f)$ falling off as $1/f^2$]
          F0 ave (Hz)  F0 min (Hz)  F0 max (Hz)
Men       125          80           200
Women     225          150          350
Children  300          200          500
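Two quantities follow directly from this slide: the glottal period $T_o = 1/F_o$ for the average F0 values in the table, and the roll-off implied by a $1/f^2$ source spectrum. A minimal sketch (the helper name `slope_db_per_octave` is only for illustration):

```python
import math

# Average F0 values (Hz) from the table above.
f0_ave = {"Men": 125.0, "Women": 225.0, "Children": 300.0}

# The glottal period is T0 = 1/F0; report it in milliseconds.
period_ms = {speaker: 1000.0 / f0 for speaker, f0 in f0_ave.items()}
for speaker, t0 in period_ms.items():
    print(f"{speaker}: T0 = {t0:.2f} ms")

def slope_db_per_octave(exponent):
    """dB change per doubling of frequency for a 1/f**exponent spectrum."""
    return 20.0 * math.log10(2.0 ** (-exponent))

# A 1/f^2 source spectrum falls by about 12 dB per octave.
print(f"1/f^2 slope: {slope_db_per_octave(2):.2f} dB/octave")
```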

Sound Source: Turbulence Noise
Turbulence noise is produced at a constriction in the vocal tract
Aspiration noise is produced at glottis
Frication noise is produced above the glottis
Modelled as series pressure source at constriction, $P_S(j\Omega)$
[Figure: noise source spectrum $P_S(f)$ versus $f$, with a broad peak near $f = 0.2\,V/D$]
V: Velocity at constriction
D: Critical dimension $= \sqrt{4A/\pi} \approx \sqrt{A}$
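As a rough numerical illustration of the $0.2\,V/D$ peak, the sketch below assumes that the critical dimension $D$ is the diameter of a circle with the constriction area $A$, that is $D = \sqrt{4A/\pi}$. The constriction area and flow velocity are hypothetical values chosen only for the example.

```python
import math

def critical_dimension(area_cm2):
    """D = sqrt(4A/pi): diameter (cm) of a circle with area A (assumed form)."""
    return math.sqrt(4.0 * area_cm2 / math.pi)

def noise_peak_hz(velocity_cm_s, area_cm2):
    """Approximate location of the noise spectrum peak, 0.2 V / D."""
    return 0.2 * velocity_cm_s / critical_dimension(area_cm2)

# Hypothetical constriction: A = 0.1 cm^2, V = 4000 cm/s (illustrative values).
area = 0.1
velocity = 4000.0
print(f"D = {critical_dimension(area):.3f} cm")
print(f"spectrum peak near {noise_peak_hz(velocity, area):.0f} Hz")
```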
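
Vocal Tract Wave Equations
Define:
$u(x, t)$ = particle velocity
$U(x, t)$ = volume velocity ($U = uA$)
$p(x, t)$ = sound pressure variation ($P = P_O + p$)
$\rho$ = density of air
$c$ = velocity of sound
Assuming plane wave propagation (for a cross dimension $\ll \lambda$), and a one-dimensional wave motion, it can be shown that
$-\dfrac{\partial p}{\partial x} = \rho\,\dfrac{\partial u}{\partial t}$, $\quad -\dfrac{\partial u}{\partial x} = \dfrac{1}{\rho c^2}\,\dfrac{\partial p}{\partial t}$, $\quad \dfrac{\partial^2 u}{\partial x^2} = \dfrac{1}{c^2}\,\dfrac{\partial^2 u}{\partial t^2}$
Time and frequency domain solutions are of the form
$u(x, t) = u^+(t - x/c) - u^-(t + x/c)$, $\quad u(x, s) = \dfrac{1}{\rho c}\left[P_+ e^{-sx/c} - P_- e^{sx/c}\right]$
$p(x, t) = \rho c\left[u^+(t - x/c) + u^-(t + x/c)\right]$, $\quad p(x, s) = P_+ e^{-sx/c} + P_- e^{sx/c}$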
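
Propagation of Sound in a Uniform Tube
[Figure: uniform tube of cross-sectional area $A$ driven by $U_G$, with ends labelled $x = -l$ and $x = 0$]
The vocal tract transfer function of volume velocities is
$T(j\Omega) = \dfrac{U_L(j\Omega)}{U_G(j\Omega)} = \dfrac{U(l, j\Omega)}{U(0, j\Omega)}$
Using the boundary conditions $U(0, s) = U_G(s)$ and $P(l, s) = 0$
$T(s) = \dfrac{2}{e^{sl/c} + e^{-sl/c}}$, $\quad T(j\Omega) = \dfrac{1}{\cos(\Omega l/c)}$
The poles of the transfer function $T(j\Omega)$ are where $\cos(\Omega l/c) = 0$
$\dfrac{(2\pi f_n)\, l}{c} = \dfrac{(2n - 1)\pi}{2}$, $\quad f_n = \dfrac{c}{4l}(2n - 1)$, $\quad \lambda_n = \dfrac{4l}{2n - 1}$, $\quad n = 1, 2, \ldots$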
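The step from $T(s)$ to $T(j\Omega)$ can be checked numerically by evaluating the exponential form at $s = j\Omega$ and comparing it with $1/\cos(\Omega l/c)$. The sketch below does this for a tube with $l = 17$ cm and $c = 34{,}000$ cm/sec; the function names are chosen for the example only.

```python
import cmath
import math

def transfer_s(s, length_cm=17.0, c_cm_s=34000.0):
    """T(s) = 2 / (exp(s l / c) + exp(-s l / c))."""
    return 2.0 / (cmath.exp(s * length_cm / c_cm_s) + cmath.exp(-s * length_cm / c_cm_s))

def transfer_jw(freq_hz, length_cm=17.0, c_cm_s=34000.0):
    """T(jW) = 1 / cos(W l / c), with W = 2 pi f."""
    return 1.0 / math.cos(2.0 * math.pi * freq_hz * length_cm / c_cm_s)

# The two forms agree on the jW axis, and |T| grows as f approaches the first pole.
for f in (100.0, 250.0, 400.0, 490.0):
    t_exp = transfer_s(2j * math.pi * f)
    t_cos = transfer_jw(f)
    print(f"{f:6.1f} Hz  |T| = {abs(t_exp):8.3f}  1/cos = {t_cos:8.3f}")
```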

Propagation of Sound in a Uniform Tube (cont)
For $c$ = 34,000 cm/sec, $l$ = 17 cm, the natural frequencies (also called the formants) are at 500 Hz, 1500 Hz, 2500 Hz, ...
[Figure: $20 \log_{10} |T(j\Omega)|$ (0 to 40 dB) versus Frequency (0 to 5 kHz), with the poles marked x along the $j\Omega$ axis]
The transfer function of a tube with no side branches, excited at one end and response measured at another, only has poles
The formant frequencies will have finite bandwidth when vocal tract losses are considered (e.g., radiation, walls, viscosity, heat)
The length of the vocal tract, $l$, corresponds to $\tfrac{1}{4}\lambda_1, \tfrac{3}{4}\lambda_2, \tfrac{5}{4}\lambda_3, \ldots$, where $\lambda_i$ is the wavelength of the $i$th natural frequency
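The quoted formants follow from $f_n = \dfrac{c}{4l}(2n - 1)$. A short sketch that reproduces them and checks the wavelength relation $l = (2n - 1)\lambda_n/4$ (the function name is chosen for the example):

```python
def uniform_tube_formants(length_cm, c_cm_s=34000.0, count=3):
    """Natural frequencies f_n = c (2n - 1) / (4 l) of a tube closed at one end."""
    return [c_cm_s * (2 * n - 1) / (4.0 * length_cm) for n in range(1, count + 1)]

formants = uniform_tube_formants(17.0)
print(formants)  # [500.0, 1500.0, 2500.0]

# Each formant wavelength satisfies l = (2n - 1) * lambda_n / 4.
for n, f in enumerate(formants, start=1):
    wavelength_cm = 34000.0 / f
    print(f"n = {n}: (2n - 1) * lambda / 4 = {(2 * n - 1) * wavelength_cm / 4.0:.2f} cm")
```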

Standing Wave Patterns in a Uniform Tube
A uniform tube closed at one end and open at the other is often referred to as a quarter wavelength resonator
[Figure: standing wave patterns (SWP) of $|U(x)|$ from glottis to lips for F1, F2 (marked at $x = 2l/3$), and F3 (marked at $x = 2l/5$ and $4l/5$)]
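
Natural Frequencies of Simple Acoustic Tubes
[Figure: two uniform tubes of area $A$ spanning $x = -l$ to $x = 0$, with the admittance $Y_{-l}$ taken at $x = -l$]
Quarter-wavelength resonator:
$P(x, j\Omega) = 2P_+ \cos\dfrac{\Omega x}{c}$
$U(x, j\Omega) = -j\,\dfrac{A}{\rho c}\, 2P_+ \sin\dfrac{\Omega x}{c}$
$Y_{-l} = j\,\dfrac{A}{\rho c}\tan\dfrac{\Omega l}{c} \approx j\Omega\,\dfrac{A l}{\rho c^2} = j\Omega C_A$ for $\Omega l/c \ll 1$
$C_A = A l/(\rho c^2)$ = acoustic compliance
$f_n = \dfrac{c}{4l}(2n - 1)$, $\quad n = 1, 2, \ldots$
Half-wavelength resonator:
$P(x, j\Omega) = -j\,2P_+ \sin\dfrac{\Omega x}{c}$
$U(x, j\Omega) = \dfrac{A}{\rho c}\, 2P_+ \cos\dfrac{\Omega x}{c}$
$Y_{-l} = -j\,\dfrac{A}{\rho c}\cot\dfrac{\Omega l}{c} \approx -j\,\dfrac{A}{\Omega \rho l} = \dfrac{1}{j\Omega M_A}$ for $\Omega l/c \ll 1$
$M_A = \rho l/A$ = acoustic mass
$f_n = \dfrac{c}{2l}\, n$, $\quad n = 0, 1, 2, \ldots$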
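The two families of natural frequencies can be tabulated side by side. The sketch below uses $c$ = 34,000 cm/sec and an illustrative tube length of 17 cm, and confirms that $\cos(\Omega l/c)$ vanishes at the quarter-wavelength frequencies while $\sin(\Omega l/c)$ vanishes at the half-wavelength ones.

```python
import math

C = 34000.0  # speed of sound in cm/sec, the value used earlier in the lecture

def quarter_wave_frequencies(length_cm, count=3):
    """f_n = c (2n - 1) / (4 l), n = 1, 2, ..."""
    return [C * (2 * n - 1) / (4.0 * length_cm) for n in range(1, count + 1)]

def half_wave_frequencies(length_cm, count=3):
    """f_n = c n / (2 l), n = 0, 1, 2, ..."""
    return [C * n / (2.0 * length_cm) for n in range(count)]

length = 17.0  # illustrative tube length in cm
qw = quarter_wave_frequencies(length)
hw = half_wave_frequencies(length)
print(qw)  # [500.0, 1500.0, 2500.0]
print(hw)  # [0.0, 1000.0, 2000.0]

# tan(W l / c) is singular (cos = 0) at the quarter-wavelength frequencies,
# cot(W l / c) is singular (sin = 0) at the half-wavelength frequencies.
cos_values = [math.cos(2.0 * math.pi * f * length / C) for f in qw]
sin_values = [math.sin(2.0 * math.pi * f * length / C) for f in hw]
```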

Approximating Vocal Tract Shapes
[Figure: two-tube approximations (areas $A_1$, $A_2$; lengths $l_1$, $l_2$) of the vocal tract shapes for [i], [a], and [u]]

Estimating Natural Resonance Frequencies
Resonance frequencies occur where impedance (or admittance) function equals natural (e.g., open circuit) boundary conditions
[Figure: two-tube model driven by $U_G$ with output $U_L$, areas $A_1$, $A_2$ and lengths $l_1$, $l_2$, with $Y_1 + Y_2 = 0$ at the junction]
For a two tube approximation it is easiest to solve for $Y_1 + Y_2 = 0$
$j\,\dfrac{A_1}{\rho c}\tan\dfrac{\Omega l_1}{c} - j\,\dfrac{A_2}{\rho c}\cot\dfrac{\Omega l_2}{c} = 0$
$\sin\dfrac{\Omega l_1}{c}\,\sin\dfrac{\Omega l_2}{c} - \dfrac{A_2}{A_1}\cos\dfrac{\Omega l_1}{c}\,\cos\dfrac{\Omega l_2}{c} = 0$

Decoupling Simple Tube Approximations
If $A_1 \gg A_2$, or $A_1 \ll A_2$, the tubes can be decoupled and natural frequencies of each tube can be computed independently
For the vowel /i/, the formant frequencies are obtained from:
[Figure: two-tube model with areas $A_1$, $A_2$ and lengths $l_1$, $l_2$]
$f_n = \dfrac{c}{2 l_1}\, n$ plus $f_n = \dfrac{c}{2 l_2}\, n$
At low frequencies:
$f = \dfrac{c}{2\pi}\left(\dfrac{A_2}{A_1 l_1 l_2}\right)^{1/2} = \dfrac{1}{2\pi}\left(\dfrac{1}{C_{A_1} M_{A_2}}\right)^{1/2}$
This low resonance frequency is called the Helmholtz resonance
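The decoupled formulas are easy to evaluate. The sketch below applies them to an /i/-like shape using the dimensions from the vowel production example in this lecture, under two assumptions that are not stated on the slides: the 8 cm^2, 9 cm tube is the back cavity and the 1 cm^2, 6 cm tube is the front tube, and $c$ = 35,000 cm/sec (this value reproduces the quoted estimates of 268, 1944 and 2917 Hz).

```python
import math

C = 35000.0  # cm/sec; assumed, chosen because it reproduces the quoted estimates

def half_wave(length_cm, n=1):
    """f_n = c n / (2 l) for a decoupled tube."""
    return C * n / (2.0 * length_cm)

def helmholtz(a1, l1, a2, l2):
    """Low-frequency resonance f = (c / 2 pi) * sqrt(A2 / (A1 l1 l2))."""
    return C / (2.0 * math.pi) * math.sqrt(a2 / (a1 * l1 * l2))

# /i/-like shape (assumed pairing): back cavity 8 cm^2 x 9 cm, front tube 1 cm^2 x 6 cm.
f_helmholtz = helmholtz(8.0, 9.0, 1.0, 6.0)
f_back = half_wave(9.0)
f_front = half_wave(6.0)
print(round(f_helmholtz), round(f_back), round(f_front))  # 268 1944 2917
```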

Vowel Production Example
[Figure: two two-tube configurations. First: areas 7 cm^2 and 1 cm^2, lengths 9 cm and 8 cm. Second: areas 8 cm^2 and 1 cm^2, lengths 9 cm and 6 cm]
Decoupled natural frequencies (Hz), first configuration: 972, 2917, ... + 1093, ...
Decoupled natural frequencies (Hz), second configuration: 268 + 1944, ... + 2917, ...
First configuration:
Formant  Actual (Hz)  Estimated (Hz)
F1       789          972
F2       1276         1093
F3       2808         2917
Second configuration:
Formant  Actual (Hz)  Estimated (Hz)
F1       256          268
F2       1905         1944
F3       2917         2917
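The "Actual" values appear to be roots of the two-tube condition $\sin\frac{\Omega l_1}{c}\sin\frac{\Omega l_2}{c} - \frac{A_2}{A_1}\cos\frac{\Omega l_1}{c}\cos\frac{\Omega l_2}{c} = 0$. The sketch below solves it with a 1 Hz scan followed by bisection. It is an illustration under assumptions, not the lecture's own procedure: tube 1 is taken as the glottis-end tube, the first configuration is read as $A_1$ = 1 cm^2, $l_1$ = 9 cm, $A_2$ = 7 cm^2, $l_2$ = 8 cm, the second as $A_1$ = 8 cm^2, $l_1$ = 9 cm, $A_2$ = 1 cm^2, $l_2$ = 6 cm, and $c$ = 35,000 cm/sec.

```python
import math

C = 35000.0  # cm/sec; assumed, consistent with the estimated values in the table

def two_tube_condition(freq_hz, a1, l1, a2, l2):
    """sin(W l1/c) sin(W l2/c) - (A2/A1) cos(W l1/c) cos(W l2/c), W = 2 pi f."""
    k = 2.0 * math.pi * freq_hz / C
    return (math.sin(k * l1) * math.sin(k * l2)
            - (a2 / a1) * math.cos(k * l1) * math.cos(k * l2))

def natural_frequencies(a1, l1, a2, l2, f_max=3000.0, step=1.0):
    """Scan for sign changes of the condition, then refine each root by bisection."""
    def g(f):
        return two_tube_condition(f, a1, l1, a2, l2)

    roots = []
    f_lo = step
    while f_lo < f_max:
        f_hi = f_lo + step
        if g(f_lo) * g(f_hi) < 0.0:
            a, b = f_lo, f_hi
            for _ in range(40):
                mid = 0.5 * (a + b)
                if g(a) * g(mid) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
        f_lo = f_hi
    return roots

first = natural_frequencies(1.0, 9.0, 7.0, 8.0)
second = natural_frequencies(8.0, 9.0, 1.0, 6.0)
print([round(f) for f in first])
print([round(f) for f in second])
```

Under these assumptions the roots land within a few Hz of the "Actual" column for both configurations.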

Example of Vowel Spectrograms
[Figure: two panels, /bit/ and /bat/, each showing a wide band spectrogram (0 to 8 kHz), zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform versus time (0.0 to 0.7 seconds)]

Estimating Anti-Resonance Frequencies (Zeros)
Zeros occur at frequencies where there is no measurable output
[Figure: branched model with source $U_G$ and output $U_N$, areas $A_p$, $A_o$, $A_n$, admittances $Y_p$, $Y_o$, $Y_n$ and lengths $l_p$, $l_o$, $l_n$; three-section model with source $P_s$ and output $U_L$, areas $A_b$, $A_c$, $A_f$ and lengths $l_b$, $l_c$, $l_f$; annotations $Y_1 = 0$ and $Y_3 + Y_4 = 0$]
For nasal consonants, zeros in $U_N$ occur where $Y_O = \infty$
For fricatives or stop consonants, zeros in $U_L$ occur where the impedance behind source is infinite (i.e., a hard wall at source)
Zeros occur when measurements are made in vocal tract interior

Consonant Production
[Figure: three-section model with source $P_s$, areas $A_b$, $A_c$, $A_f$ and lengths $l_b$, $l_c$, $l_f$]
      A_b   A_c   A_f   l_b   l_c   l_f
[g]   5     0.2   4     9     3     5
[s]   5     0.5   4     11    3     2.5

[g] poles  [g] zeros  [s] poles  [s] zeros
215        0          306        0
1750       1944       1590       1590
1944       2916       3180       2916
3888       3888       3500       3180
...        ...        ...        ...
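One way to reproduce the pole columns is to treat the three sections as decoupled: a Helmholtz resonance of the back cavity with the constriction, half-wavelength resonances of the back cavity, and quarter-wavelength resonances of the front cavity. This is a sketch under assumptions rather than the lecture's stated method: areas are taken in cm^2, lengths in cm, $c$ = 35,000 cm/sec, and only poles below 4 kHz are listed.

```python
import math

C = 35000.0  # cm/sec; assumed value

def decoupled_poles(a_b, a_c, l_b, l_c, l_f, f_max=4000.0):
    """Decoupled pole estimates (Hz) for back cavity, constriction and front cavity."""
    # Helmholtz resonance: back-cavity compliance with constriction mass.
    poles = [C / (2.0 * math.pi) * math.sqrt(a_c / (a_b * l_b * l_c))]
    # Back cavity treated as a half-wavelength resonator.
    n = 1
    while C * n / (2.0 * l_b) <= f_max:
        poles.append(C * n / (2.0 * l_b))
        n += 1
    # Front cavity treated as a quarter-wavelength resonator.
    n = 1
    while C * (2 * n - 1) / (4.0 * l_f) <= f_max:
        poles.append(C * (2 * n - 1) / (4.0 * l_f))
        n += 1
    return sorted(poles)

poles_g = decoupled_poles(a_b=5.0, a_c=0.2, l_b=9.0, l_c=3.0, l_f=5.0)
poles_s = decoupled_poles(a_b=5.0, a_c=0.5, l_b=11.0, l_c=3.0, l_f=2.5)
print([round(f) for f in poles_g])
print([round(f) for f in poles_s])
```

With these assumptions the results fall within a few Hz of the tabulated poles for [g] and [s].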

Example of Consonant Spectrograms
[Figure: two panels, /kip/ (0.0 to 0.7 seconds) and /si/ (0.0 to 0.8 seconds), each showing a wide band spectrogram (0 to 8 kHz), zero crossing rate, total energy, energy from 125 Hz to 750 Hz, and waveform versus time]

Perturbation Theory
Consider a uniform tube, closed at one end and open at the other
[Figure: tube of length $l$ along $x$, with a short section of length $\Delta l$ at the open end]
$Y \approx -j\,\dfrac{A}{\Omega \rho\, \Delta l}$ for small $\Delta l$
Reducing the area of a small piece of the tube near the opening (where $U$ is max) has the same effect as keeping the area fixed and lengthening the tube
Since lengthening the tube lowers the resonant frequencies, narrowing the tube near points where $U(x)$ is maximum in the standing wave pattern for a given formant decreases the value of that formant

Perturbation Theory (cont'd)
[Figure: tube of length $l$ along $x$, with a short section of length $\Delta l$ at the closed end]
$Y \approx j\Omega\,\dfrac{A\,\Delta l}{\rho c^2}$ for small $\Delta l$
Reducing the area of a small piece of the tube near the closure (where $p$ is max) has the same effect as keeping the area fixed and shortening the tube
Since shortening the tube will increase the values of the formants, narrowing the tube near points where $p(x)$ is maximum in the standing wave pattern for a given formant will increase the value of that formant
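Both perturbation results can be checked for F1 with the two-tube resonance condition used earlier in the lecture, $\tan\frac{\Omega l_1}{c}\tan\frac{\Omega l_2}{c} = A_2/A_1$, where tube 1 is at the closed (glottis) end. The dimensions below are illustrative assumptions: a 17 cm tube of area 4 cm^2 in which a 2 cm piece is narrowed to 3 cm^2, either at the lips (where $U$ is maximum for F1) or at the glottis (where $p$ is maximum).

```python
import math

C = 34000.0  # cm/sec

def first_formant(a1, l1, a2, l2):
    """Lowest root of sin(k l1) sin(k l2) - (A2/A1) cos(k l1) cos(k l2) = 0."""
    def g(f):
        k = 2.0 * math.pi * f / C
        return (math.sin(k * l1) * math.sin(k * l2)
                - (a2 / a1) * math.cos(k * l1) * math.cos(k * l2))

    lo, hi = 1.0, 2.0
    while g(lo) * g(hi) > 0.0:  # step upward in 1 Hz increments to the first sign change
        lo, hi = hi, hi + 1.0
    for _ in range(50):         # refine by bisection
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

uniform = first_formant(4.0, 15.0, 4.0, 2.0)         # no perturbation
narrow_lips = first_formant(4.0, 15.0, 3.0, 2.0)     # narrowed near the opening
narrow_glottis = first_formant(3.0, 2.0, 4.0, 15.0)  # narrowed near the closure
print(f"uniform {uniform:.1f} Hz, lips {narrow_lips:.1f} Hz, glottis {narrow_glottis:.1f} Hz")
```

In this sketch F1 drops below the uniform-tube value when the lip end is narrowed and rises above it when the glottis end is narrowed, in line with the two statements above.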

Summary of Perturbation Theory Results
[Figure: standing wave patterns (SWP) of $|U(x)|$ from glottis to lips for F1, F2 (marked at $x = 2l/3$), and F3 (marked at $x = 2l/5$ and $4l/5$), with the regions along the tube where each of F1, F2, and F3 rises (+) or falls as a consequence of decreasing $A$]

Illustration of Perturbation Theory
[Figure with linked audio file ah_ba_wa.wav]

Illustration of Perturbation Theory
[Figure: utterance "The ship was torn apart on the sharp (reef)"]

Illustration of Perturbation Theory
[Figure: utterance "(The ship was torn apart on the sh)arp reef"]

Multi-Tube Approximation of the Vocal Tract
We can represent the vocal tract as a concatenation of $N$ lossless tubes with constant area $\{A_k\}$ and equal length $\Delta x = l/N$
The wave propagation time through each tube is $\tau = \dfrac{\Delta x}{c} = \dfrac{l}{Nc}$
[Figure: vocal tract drawn as seven concatenated tube sections, each of length $\Delta x$, with areas labelled up to $A_7$]

Wave Equations for Individual Tube
The wave equations for the $k$th tube have the form
$p_k(x, t) = \dfrac{\rho c}{A_k}\left[U_k^+(t - x/c) + U_k^-(t + x/c)\right]$
$U_k(x, t) = U_k^+(t - x/c) - U_k^-(t + x/c)$
where $x$ is measured from the left-hand side ($0 \le x \le \Delta x$)
[Figure: tubes $A_k$ and $A_{k+1}$, each of length $\Delta x$, with forward waves $U_k^+(t)$, $U_k^+(t - \tau)$, $U_{k+1}^+(t)$, $U_{k+1}^+(t - \tau)$ and backward waves $U_k^-(t)$, $U_k^-(t + \tau)$, $U_{k+1}^-(t)$, $U_{k+1}^-(t + \tau)$ at the tube ends]

Update Expression at Tube Boundaries
We can solve update expressions using continuity constraints at tube boundaries e.g., $p_k(\Delta x, t) = p_{k+1}(0, t)$, and $U_k(\Delta x, t) = U_{k+1}(0, t)$
[Figure: signal flow graph of the junction between the $k$th tube and the $(k+1)$st tube, with DELAY elements, signals $U_k^+(t)$, $U_k^+(t - \tau)$, $U_{k+1}^+(t)$, $U_{k+1}^+(t - \tau)$, $U_k^-(t)$, $U_k^-(t + \tau)$, $U_{k+1}^-(t)$, $U_{k+1}^-(t + \tau)$, and branch gains $1 + r_k$, $1 - r_k$, $r_k$ and $-r_k$]
$U_{k+1}^+(t) = (1 + r_k)\,U_k^+(t - \tau) + r_k\,U_{k+1}^-(t)$
$U_k^-(t + \tau) = -r_k\,U_k^+(t - \tau) + (1 - r_k)\,U_{k+1}^-(t)$
$r_k = \dfrac{A_{k+1} - A_k}{A_{k+1} + A_k}$, note $|r_k| \le 1$
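The update expressions can be checked against the continuity constraints they come from. The sketch below computes $r_k$ for one junction, applies the two update equations to arbitrary incoming waves, and verifies that volume velocity and pressure (up to the common factor $\rho c$) match on both sides. The areas and wave amplitudes are illustrative values.

```python
def reflection_coefficient(a_k, a_k1):
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k)."""
    return (a_k1 - a_k) / (a_k1 + a_k)

def scatter(u_plus_in, u_minus_in, r):
    """Junction update.

    u_plus_in  = U_k^+(t - tau), forward wave arriving from tube k
    u_minus_in = U_{k+1}^-(t), backward wave arriving from tube k+1
    Returns (U_{k+1}^+(t), U_k^-(t + tau)).
    """
    u_plus_out = (1.0 + r) * u_plus_in + r * u_minus_in
    u_minus_out = -r * u_plus_in + (1.0 - r) * u_minus_in
    return u_plus_out, u_minus_out

a_k, a_k1 = 2.0, 6.0              # illustrative areas
r = reflection_coefficient(a_k, a_k1)
u_plus_in, u_minus_in = 1.0, 0.4  # illustrative incoming waves
u_plus_out, u_minus_out = scatter(u_plus_in, u_minus_in, r)

# Continuity of volume velocity: U_k(dx, t) = U_{k+1}(0, t)
flow_left = u_plus_in - u_minus_out
flow_right = u_plus_out - u_minus_in
# Continuity of pressure divided by rho*c: (U+ + U-)/A on each side
pressure_left = (u_plus_in + u_minus_out) / a_k
pressure_right = (u_plus_out + u_minus_in) / a_k1
print(r, flow_left, flow_right, pressure_left, pressure_right)
```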

Digital Model of Multi-Tube Vocal Tract
Updates at tube boundaries occur synchronously every $2\tau$
If excitation is band-limited, inputs can be sampled every $T = 2\tau$
Each tube section has a delay of $z^{-1/2}$
[Figure: signal flow graph of one junction with $z^{-1/2}$ delays, signals $U_k^+(z)$, $U_{k+1}^+(z)$, $U_k^-(z)$, $U_{k+1}^-(z)$, and branch gains $1 + r_k$, $r_k$, $-r_k$ and $1 - r_k$]
The choice of $N$ depends on the sampling period $T$
$T = 2\tau = \dfrac{2l}{Nc} \implies N = \dfrac{2l}{cT}$
Series and shunt losses can also be introduced at tube junctions
Bandwidths are proportional to energy loss to storage ratio
Stored energy is proportional to tube length
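As a worked instance of $N = 2l/(cT)$, the sketch below uses the tube length and sound speed from the uniform-tube example ($l$ = 17 cm, $c$ = 34,000 cm/sec) together with an assumed sampling rate of 10 kHz, which is not a value given in the lecture.

```python
def sections_for_sampling(length_cm, sample_rate_hz, c_cm_s=34000.0):
    """N = 2 l / (c T), with sampling period T = 1 / sample_rate."""
    return 2.0 * length_cm * sample_rate_hz / c_cm_s

length_cm = 17.0
sample_rate_hz = 10000.0  # assumed sampling rate
n_sections = sections_for_sampling(length_cm, sample_rate_hz)
tau = length_cm / (n_sections * 34000.0)  # propagation time per section, in seconds
print(n_sections)  # 10.0
print(2.0 * tau, 1.0 / sample_rate_hz)
```

Doubling the sampling rate doubles the number of sections needed for the same tract length.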

Assignment 1

References
Zue, 6.345 Course Notes
Stevens, Acoustic Phonetics, MIT Press, 1998.
Rabiner & Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978.