Lecture 2 Acoustic Theory

7/25/2019 Lecture 2 Acoustic Theory

1/34

Acoustic

Theory

of

Speech

Production

Overview

Soundsources

Vocal

tract

transfer

function

Waveequations

Soundpropagationinauniformacoustictube

Representing

the

vocal

tract

with

simple

acoustic

tubes

Estimatingnaturalfrequenciesfromareafunctions

Representingthevocaltractwithmultipleuniformtubes

6.345AutomaticSpeechRecognition AcousticTheoryofSpeechProduction1

Lecture # 2

Session 2003


2/34

A n a t o m i ca l S tr u ctu r e s f o r S p ee ch P r o d u ct i o n

6. 34 5 Automatic Speech Recognition Acous tic T heory of Speech Production 2


3/34

Phonemesin AmericanEnglish

PHONEME

EXAMPLE

PHONEME

EXAMPLE

PHONEME

EXAMPLE

/i/ beat/I/ bit

/e/

bait

/E/ bet/@/ bat/a/ Bob/O/ bought

/^/

but

/o/

boat

/U/ book/u/ boot

/5/

Burt

/a/ bite/O/ Boyd/a/ bout/{/ about

/s/ see /w/ wet/S/ she /r/ red

/f/

fee

/l/

let

/T/ thief /y/ yet/z/ z /m/ meet/Z/ Gigi /n/ neat/v/ v /4/ sing

/D/

thee

/C/

church

/p/

pea

/J/

judge

/t/ tea /h/ heat/k/ key

/b/

bee

/d/ Dee/g/ geese



4/34

Places

of

Articulation

for

Speech

Sounds

Palato-Alveolar Velar

AlveolarLabial

Uvular

Dental

Palatal



5/34

Speech

Waveform:

An

Example

Two

plus

seven

is

less

than

ten



6/34

A WidebandSpectrogram

Two

plus

seven

is

less

than

ten



7/34

Acoustic

Theory

of

Speech

Production

The

acoustic

characteristics

of

speech

are

usually

modelled

as

a

sequenceofsource,vocaltractfilter,andradiationcharacteristics

UG

UL

Prr

Pr(j)=S(j)T(j)R(j)

For

vowel

production:

S(j) = UG(j)

T(j) = UL(j)/ UG(j)

R(j)

=

Pr(j)

/ UL(j)



8/34

SoundSource: VocalFold Vibration

Modelled

as

a

volume

velocity

source

at

glottis,

UG(j)

Pr( t )

UG

( t )

T 1/Fo o=

t

t

UG

( f )

1 / f 2

f

F0 ave

(Hz)

F0 min

(Hz)

F0 max

(Hz)Men 125 80 200

Women 225 150 350

Children 300 200 500



9/34

SoundSource: TurbulenceNoise

Turbulence

noise

is

produced

at

a

constriction

in

the

vocal

tract

Aspirationnoiseisproducedatglottis

Frication

noise

is

produced

above

the

glottis

Modelledasseriespressuresourceatconstriction,PS(j)

P ( f )s

f

0.2V

D

4A

V: Velocityatconstriction D: Criticaldimension= A



10/34

Vocal TractWaveEquations

Define:

u(x, t) =

U(x, t) =p(x, t) =

=

c

=

particle

velocity

volumevelocity(U =uA)

soundpressurevariation(P=PO +p)

densityofair

velocity

of

sound

Assumingplanewavepropagation(foracrossdimension ),andaone-dimensionalwavemotion,itcanbeshownthat

p = u u = 1 p 2u 1 2u

=x

t

x c2 t x2 c2 t2

Timeandfrequencydomainsolutionsareoftheform

u(x, t)=u+(tx)u(t+x) u(x, s)= 1 P+esx/cPesx/cc c c

x xp(x, t)=c u+(t

) + u(t+ ) p(x, s)=P+esx/c +P

esx/c

c c



11/34

UG

Propagation

of

Sound

in

a

Uniform

Tube

A

x = - l x = 0

Thevocaltracttransferfunctionofvolumevelocitiesis

UL(j) U(, j)T(j)=

UG(j)=

U(0, j)

UsingtheboundaryconditionsU(0, s) = UG(s)andP(, s) = 0 2 1

T(s) =

e

s/c +e

s/cT

(j)

=

cos(/c)

ThepolesofthetransferfunctionT(j)arewherecos(/c) = 0

4(2fn)=

(2n

2

1) fn =

4

c

(2n

1) n =

(2n

1)

n= 1,2, . . .

c



12/34

Propagation

of

Sound

in

a

Uniform

Tube

(cont)

For

c

= 34,

000 cm/sec,

= 17

cm,

the

natural

frequencies

(also

calledtheformants)areat500Hz,1500Hz,2500Hz,. . . j

x

x

x

x

x

x

40)

T(j

2010

20log

0

0 1 2 3 4 5

Frequency ( kHz )

Thetransferfunctionofatubewithnosidebranches,excitedat

oneendandresponsemeasuredatanother,onlyhaspoles

The

formant

frequencies

will

have

finite

bandwidth

when

vocal

tractlossesareconsidered(e.g.,radiation,walls,viscosity,heat)

41,42,43,..., Thelengthofthevocaltract,,correspondsto

1 3 5

where

i is

the

wavelength

of

the

ith

natural

frequency



13/34

Standing

Wave

Patterns

in

a

Uniform

Tube

A

uniform

tube

closed

at

one

end

and

open

at

the

other

is

often

referredtoasaquarterwavelengthresonator

xglottis lips

SWP forF1

|U(x)|

SWP for

F223

SWP forF3

2 45 5



14/34

Natural

Frequencies

of

Simple

Acoustic

Tubes

z-l

A z-l

A

x = - l x = 0 x = - l x = 0

Quarterwavelengthresonator Half-wavelengthresonator

P(x, j) = 2P+ cosx

P(x, j) =

j2P+

sin

xc c

U(x,j)=

j

A A

c

2P+ sinx

U(x, j) =

c

2P+ cosx

c c

ctan

ccotY =j

A Y =j A

c c

j

A

A 1

c2 =

jCA /c

1

j

=

j

MA /c1CA =A/c

2 =acousticcompliance MA =/A=acousticmass

c c

fn =

4 (2n

1)

n

= 1, 2, . . .

fn =

2 n n

= 0, 1, 2, . . .



15/34

Approximating Vocal TractShapes

[ i] [a] [ u]

A1 A2

1l 2l



16/34

2

1

2

l

Estimating NaturalResonanceFrequencies

Resonance

frequencies

occur

where

impedance

(or

admittance)

functionequalsnatural(e.g.,opencircuit)boundaryconditions

UG A1 A2 UL

1l

Y + Y = 0

ForatwotubeapproximationitiseasiesttosolveforY1+Y2 = 0

j

A1tan

1j

A2cot

2= 0

c

c c c

sin1

sin2 A2 cos1 cos2 = 0

c c A1 c c



17/34

Decoupling SimpleTube Approximations

If

A1 A2, orA1 A2,thetubescanbedecoupledandnaturalfrequenciesofeachtubecanbecomputedindependently

Forthevowel/i/,theformantfrequenciesareobtainedfrom:

A1 A2

1l

2l

c cfn =

21

n plus fn =22

n

At

low

frequencies:

A21/2 1 1 1/2c

f = =2 A112 2 CA1

MA2

This

low

resonance

frequency

is

called

the

Helmholtz

resonance



18/34

VowelProduction Example

7 cm2

1 cm2

8 cm2

1 cm2

9 cm 8 cm 9 cm 6 cm

++ +

1093 268 1944 2917972

2917 . . .

. . . .

.

.

.

.. . . .

Formant Actual Estimated Formant ActualF1 789 972 F1 256F2 1276 1093 F2 1905F3

2808

2917

F3

2917

. . . . .

. . . . .

Estimated268

19442917

.

.



19/34

Example of Vowel Spectrograms

kHz kHz

Wide Band Spectrogram

kHz kHz

0

1

2

3

4

5

6

7

8

0

1

2

3

4

5

6

7

8

Time (seconds)

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

kHz kHz

0 0

8 8

16 16Zero Crossing Rate

dB dBTotal Energy

dB dBEnergy -- 125 Hz to 750 Hz

Waveform

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

kHz kHz


kHz kHz

0

1

2

3

4

5

6

7

8

0

1

2

3

4

5

6

7

8

Time (seconds)

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

kHz kHz

0 0

8 8


dB dBTotal Energy


Waveform

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

/bit/ bat /

6.345 Automatic Speech Recognition Acoustic Theory of Speech Production 19

/


20/34

Estimating

Anti-Resonance

Frequencies

(Zeros)

Zeros

occur

at

frequencies

where

there

is

no

measurable

output

UN

UG

Ap Ao

An

Yp Yo

Yn

nl

Ab Ac AfPsUL

lp

lo lb lc lf

Fornasalconsonants,zerosinUN occurwhereYO = Forfricativesorstopconsonants,zerosinUL occurwherethe

impedancebehindsourceisinfinite(i.e.,ahardwallatsource)

Y = 0

Y + Y = 01 3 4

Zeros

occur

when

measurements

are

made

in

vocal

tract

interior



21/34

ConsonantProduction

Ab Ac AfPs

lb

lc

lf

POLES

ZEROS

+ + + +

Ab Ac Af b c f

[g]

5

0.2

4

9

3

5[s]

5

0.5

4

11

3

2.5

[g] [s]poles zeros poles zeros

215

0

306

0

1750 1944 1590 15901944 2916 3180 29163888 3888 3500 3180

.

.

.

.. . . .



22/34

Example of Consonant Spectrograms

kHz kHz


kHz kHz

0

1

2

3

4

5

6

7

8

0

1

2

3

4

5

6

7

8

Time (seconds)0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

kHz kHz

0 0

8 8


dB dBTotal Energy


Waveform

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

kHz kHz


kHz kHz

0

1

2

3

4

5

6

7

8

0

1

2

3

4

5

6

7

8

Time (seconds)0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

kHz kHz

0 0

8 8


dB dBTotal Energy


Waveform

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

/kip/ si /

6.345 Automatic Speech Recognition Acoustic Theory of Speech Production 22

/


23/34

A

A Y

j

Yl

PerturbationTheoryforsmall

l

Considerauniformtube,closedatoneendandopenattheother

l

x

Reducingtheareaofasmallpieceofthetubeneartheopening

(where

U

is

max)

has

the

same

effect

as

keeping

the

area

fixedandlengtheningthetube

Sincelengtheningthetubelowerstheresonantfrequencies,

narrowingthetubenearpointswhereU(x)ismaximuminthe

standing

wave

pattern

for

a

given

formant

decreases

the

value

ofthatformant



24/34

A

Perturbation

Theory

(contd)

A

Yj

c2 for

small

Yl

l

l

x

Reducingtheareaofasmallpieceofthetubeneartheclosure

(wherepismax)hasthesameeffectaskeepingtheareafixedand

shortening

the

tube

Sinceshorteningthetubewillincreasethevaluesoftheformants,

narrowingthetubenearpointswherep(x)ismaximuminthe

standingwavepatternforagivenformantwillincreasethevalue

of

that

formant



25/34

Summary

of

Perturbation

Theory

Results

xglottis lips

SWP forF1

|U(x)|

SWP forF2

23

SWP forF3

2 45 5

xglottis lips

F1 12

+

(as a consequence of decreasing A)

F2 12

++

F3 12

++

+



26/34

Illustration

of

Perturbation

Theory

http://d%7C/media/6345/ah_ba_wa.wav


27/34

Illustration

of

Perturbation

Theory

Theshipwastornapartonthesharp(reef)



28/34

Illustration

of

Perturbation

Theory

(Theshipwastornapartonthesh)arpreef



29/34

Multi-Tube

Approximation

of

the

Vocal

Tract

We

can

represent

the

vocal

tract

as

a

concatenation

of

N

lossless

tubeswithconstantarea{Ak} andequallengthx=/N Thewavepropagationtimethrougheachtubeis= x = Ncc

A A7

x

xxx

x

xx



30/34

Wave

Equations

for

Individual

Tube

The

wave

equations

for

the

kth tube

have

the

form

c x

Akk(t

x) + U

cpk(x, t) = [U

+k(t+ )]c

Uk(x, t) =

U

+

c)

U

c)k(t

x k(t

+

x

wherexismeasuredfromtheleft-handside(0 x x)

+ + + +Uk

( t ) Uk( t - ) U

k+1( t ) U

k+1( t - )

- - - -

Uk( t ) U k( t + ) U k+1( t ) U k+1 ( t + )

Ak

x

x

Ak+1



31/34

Update Expressionat Tube Boundaries

We

can

solve

update

expressions

using

continuity

constraints

at

tubeboundariese.g.,pk(x, t) = pk+1(0, t), andUk(x, t) = Uk+1(0, t)

+k + 1U+

k + 1U

-

kU )

-

kU )

+

1 - r

1 + rk

k

rkk

- r

DELAY

DELAY

DELAY

DELAY

k th ( k + 1 ) st

k(t

) +

rkU

( t )

( t )( t +

( t -

tube tube

+Uk

( t ) U k + 1 ( t - )

- -U

k

( t ) Uk + 1

( t + )

Uk+

+1(t)

=

(1

+

rk)U

+k+1(t)

Uk

(t

+

)=rkUk+(t ) + ( 1 rk)Uk+1(t)

rk

=Ak+1Ak

note|

rk

|1

Ak+1+

Ak



32/34

Digital

Model

of

Multi-Tube

Vocal

Tract

Updates

at

tube

boundaries

occur

synchronously

every

2

Ifexcitationisband-limited,inputscanbesampledeveryT = 2

Eachtubesectionhasadelayofz1/21

+ z2

1 + rk +Uk

( z )

kr

1

k-r

Uk + 1

( z )

- -

Uk( z ) Uk + 1 ( z )z 2 1 - rk

ThechoiceofN dependsonthesamplingrateT

T = 2= 2

=

N =

2

Nc

cT

Seriesandshuntlossescanalsobeintroducedattubejunctions

Bandwidthsareproportionaltoenergylosstostorageratio

Stored

energy

is

proportional

to

tube

length



33/34

Assignment

1



34/34

References

Zue,

6.345

Course

Notes

Stevens,AcousticPhonetics,MITPress,1998.

Rabiner&Schafer,DigitalProcessingofSpeechSignals,

Prentice-Hall,

1978.


Lecture 2 Acoustic Theory

Documents

Transcript of Lecture 2 Acoustic Theory

Non-Intrusive Acoustic Valve Inspection (NAVI) Theory & Applications

Lecture 3 Acoustic Auditory Phonetics

Theory Lecture 13

Acoustic Waves in Layered Media - From Theory to Seismic

Digital Speech Processing— Lecture 3 · 2011-12-22 · Digital Speech Processing— Lecture 3 Acoustic Theory of Speech Production 2 Topics to be Covered • Sound production mechanisms

Acoustic versus Electromagnetic Field Theory: Scalar ...

ACOUSTIC WAVEFORM LOGGING ADVANCES IN THEORY AND …

Basic Acoustic Theory - R2Sonic€¦ · sonar operating frequency, the more rapid the vibration (or excitement) of the ... R2Sonic LLC Multibeam Training – Basic Acoustic Theory

Lecture 1.1 Theory

Acoustic Theory Speech Production

Acoustic High-Frequency Diffraction Theory

Acoustic methods in Rock Mechanics - ipmnet.rusvkuznec/Lecture courses/Lecture course on Acoustic... · Acoustic methods in Rock Mechanics Sergey V. Kuznetsov Institute for Problems

LECTURE 10: ACOUSTIC TRANSDUCERS - ISIP · LECTURE 10: ACOUSTIC TRANSDUCERS Objectives: Introduce the basic types of microphones Understand microphone impedance and other physical

Graph Theory Lecture

Digital Speech Processing— Lecture 3 - UCSB speech... · Digital Speech Processing— Lecture 3 Acoustic Theory of Speech Production 2 Topics to be Covered • Sound production

Theory Lecture

Sound Propagation Theory for Linear Ray Acoustic …skiminki/bib/D-skiminki.pdf · Sound Propagation Theory for Linear Ray Acoustic Modelling ... Sound Propagation Theory for Linear

Acoustic Theory 3 Sound Creation and Manipulation.

Acoustic Theory Applied to the Physics

Digital Audio Signal Processing Lecture-4: Acoustic Echo Cancellation