Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis...

1 Digital Speech Processing— Lecture 9 Short-Time Fourier Analysis Methods- Introduction


Page 1: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Digital Speech Processing: Lecture 9

Short-Time Fourier Analysis Methods: Introduction

Page 2:

General Discrete-Time Model of Speech Production

Voiced Speech:

• $A_V\, P(z)\, G(z)\, V(z)\, R(z)$

Unvoiced Speech:

• $A_N\, N(z)\, V(z)\, R(z)$

Page 3:

Short-Time Fourier Analysis

• represent the signal by a sum of sinusoids or complex exponentials, as this leads to convenient solutions to problems (formant estimation, pitch period estimation, analysis-by-synthesis methods) and to insight into the signal itself

• such Fourier representations provide
– a convenient means to determine the response of linear systems to a sum of sinusoids
– clear evidence of signal properties that are obscured in the original signal

Page 4:

Why STFT for Speech Signals

• steady-state sounds, like vowels, are produced by periodic excitation of a linear system => the speech spectrum is the product of the excitation spectrum and the vocal tract frequency response

• speech is a time-varying signal => need more sophisticated analysis to reflect time-varying properties
– changes occur at syllabic rates (~10 times/sec)
– over fixed time intervals of 10-30 msec, the properties of most speech signals are relatively constant (when is this not the case?)

Page 5:

Overview of Lecture

• define time-varying Fourier transform (STFT) analysis method
• define synthesis method from time-varying FT (filter-bank summation, overlap addition)
• show how time-varying FT can be viewed in terms of a bank of filters model
• computation methods based on using FFT
• application to vocoders, spectrum displays, formant estimation, pitch period estimation

Page 6:

Frequency Domain Processing

• Coding:
– transform, subband, homomorphic, channel vocoders

• Restoration/Enhancement/Modification:
– noise and reverberation removal, helium restoration, time-scale modifications (speed-up and slow-down of speech)

Page 7:

Frequency and the DTFT

• sinusoids

$$x(n) = \cos(\omega_0 n) = \frac{e^{j\omega_0 n} + e^{-j\omega_0 n}}{2}$$

where $\omega_0$ is the frequency (in radians) of the sinusoid

• the Discrete-Time Fourier Transform (DTFT)

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} = \mathrm{DTFT}\{x(n)\}$$

$$x(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega = \mathrm{DTFT}^{-1}\{X(e^{j\omega})\}$$

where $\omega$ is the frequency variable of $X(e^{j\omega})$

Page 8:

DTFT and DFT of Speech

The DTFT and the DFT for the infinite duration signal could be calculated (the DTFT) and approximated (the DFT) by the following:

$$X(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, e^{-j\omega m} \quad \text{(DTFT)}$$

$$X(k) = \sum_{m=0}^{L-1} x(m)\, w(m)\, e^{-j(2\pi/L)km}, \quad k = 0, 1, \ldots, L-1$$

$$X(k) = X(e^{j\omega})\big|_{\omega = 2\pi k/L} \quad \text{(DFT)}$$

using a value of $L = 25000$ we get the following plot
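The relation between the two formulas above can be checked numerically. The sketch below is an illustration only: the two-tone test signal, the length `L = 64`, and the helper name `dtft` are assumptions that do not come from the lecture. It shows that the DFT of the windowed segment coincides with samples of the DTFT of that same windowed segment at $\omega = 2\pi k/L$.

```python
import numpy as np

def dtft(seq, omega):
    """Evaluate the DTFT of a finite sequence (indexed from m = 0) at one frequency."""
    m = np.arange(len(seq))
    return np.sum(seq * np.exp(-1j * omega * m))

# Hypothetical test signal: a short two-tone sequence standing in for speech.
L = 64
m = np.arange(L)
x = np.cos(0.3 * np.pi * m) + 0.5 * np.cos(0.7 * np.pi * m)
w = np.hamming(L)                      # w(m), the analysis window

X_dft = np.fft.fft(x * w)              # X(k), k = 0, 1, ..., L-1
X_dtft = np.array([dtft(x * w, 2 * np.pi * k / L) for k in range(L)])

# Largest difference between the FFT values and the sampled DTFT
max_err = np.max(np.abs(X_dft - X_dtft))
```

Note that the agreement is with the DTFT of the windowed segment, which is why the slide describes the DFT as an approximation for the infinite-duration signal.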

Page 9:

25000-Point DFT of Speech

[Figure: 25000-point DFT of speech; two panels, Magnitude and Log Magnitude (dB)]

Page 10:

Short-Time Fourier Transform (STFT)

Page 11:

Short-Time Fourier Transform

• speech is not a stationary signal, i.e., it has properties that change with time
• thus a single representation based on all the samples of a speech utterance, for the most part, has no meaning
• instead, we define a time-dependent Fourier transform (TDFT or STFT) of speech that changes periodically as the speech properties change over time

Page 12:

Definition of STFT

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(\hat n - m)\, e^{-j\hat\omega m}$$

both $\hat n$ and $\hat\omega$ are variables

• $w(\hat n - m)$ is a real window which determines the portion of $x(\hat n)$ that is used in the computation of $X_{\hat n}(e^{j\hat\omega})$

Page 13:

Short Time Fourier Transform

STFT is a function of two variables, the time index, $\hat n$, which is discrete, and the frequency variable, $\hat\omega$, which is continuous

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(\hat n - m)\, e^{-j\hat\omega m} = \mathrm{DTFT}\big(x(m)\, w(\hat n - m)\big) \;\Rightarrow\; \hat n \text{ fixed},\ \hat\omega \text{ variable}$$
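The definition above can be evaluated directly for one time index and one frequency. The following is a minimal sketch under stated assumptions: the window is taken to be causal and of finite length (`w[0..L-1]`), the signal is treated as zero outside its stored samples, and the function name `stft_point` and the test cosine are hypothetical.

```python
import numpy as np

def stft_point(x, w, n_hat, omega):
    """X_n(e^{j omega}) = sum_m x[m] w[n_hat - m] e^{-j omega m} for a causal window w[0..L-1]."""
    L = len(w)
    total = 0j
    # w[n_hat - m] is nonzero only for n_hat - L + 1 <= m <= n_hat
    for m in range(max(0, n_hat - L + 1), min(len(x), n_hat + 1)):
        total += x[m] * w[n_hat - m] * np.exp(-1j * omega * m)
    return total

# Fix the time index and sweep the frequency variable (n fixed, omega variable).
x = np.cos(0.25 * np.pi * np.arange(200))      # hypothetical test signal
w = np.hamming(64)
omegas = np.linspace(0, np.pi, 257)
spectrum = np.array([abs(stft_point(x, w, 100, om)) for om in omegas])
peak_omega = omegas[np.argmax(spectrum)]       # location of the spectral peak
```

For this test signal the peak of `spectrum` should fall at or very near the sinusoid frequency, since the windowed segment is a short piece of a single cosine.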

Page 14:

Short-Time Fourier Transform

alternative form of STFT (based on change of variables) is

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} w(m)\, x(\hat n - m)\, e^{-j\hat\omega(\hat n - m)} = e^{-j\hat\omega \hat n} \sum_{m=-\infty}^{\infty} x(\hat n - m)\, w(m)\, e^{j\hat\omega m}$$

if we define

$$\tilde X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(\hat n - m)\, w(m)\, e^{j\hat\omega m}$$

then $X_{\hat n}(e^{j\hat\omega})$ can be expressed as (using $m' = -m$)

$$X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega \hat n}\, \tilde X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega \hat n}\, \mathrm{DTFT}\big[x(\hat n + m)\, w(-m)\big]$$

Page 15:

STFT-Different Time Origins

the STFT can be viewed as having two different time origins

1. time origin tied to signal $x(n)$

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(\hat n - m)\, e^{-j\hat\omega m} = \mathrm{DTFT}\big[x(m)\, w(\hat n - m)\big], \quad \hat n \text{ fixed},\ \hat\omega \text{ variable}$$

2. time origin tied to window signal $w(-m)$

$$X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega \hat n} \sum_{m=-\infty}^{\infty} x(\hat n + m)\, w(-m)\, e^{-j\hat\omega m} = e^{-j\hat\omega \hat n}\, \tilde X_{\hat n}(e^{j\hat\omega})$$

$$\tilde X_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\big[w(-m)\, x(\hat n + m)\big], \quad \hat n \text{ fixed},\ \hat\omega \text{ variable}$$
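The two time origins differ only by the linear-phase factor $e^{-j\hat\omega \hat n}$, which can be confirmed numerically. This is a sketch under assumptions not stated on the slide: a causal window of length `L`, a random test signal, and the hypothetical function names `stft_signal_origin` and `stft_window_origin`.

```python
import numpy as np

def stft_signal_origin(x, w, n_hat, omega):
    """Time origin tied to the signal: sum_m x[m] w[n_hat - m] e^{-j omega m}."""
    L = len(w)
    m = np.arange(n_hat - L + 1, n_hat + 1)
    return np.sum(x[m] * w[n_hat - m] * np.exp(-1j * omega * m))

def stft_window_origin(x, w, n_hat, omega):
    """Time origin tied to the window: sum_m x[n_hat + m] w[-m] e^{-j omega m}."""
    L = len(w)
    m = np.arange(-(L - 1), 1)          # w[-m] is nonzero for -(L-1) <= m <= 0
    return np.sum(x[n_hat + m] * w[-m] * np.exp(-1j * omega * m))

rng = np.random.default_rng(0)
x = rng.standard_normal(200)            # hypothetical stand-in for speech samples
w = np.hamming(32)
n_hat, omega = 100, 0.37 * np.pi        # requires n_hat >= L - 1 so all indices are valid

X = stft_signal_origin(x, w, n_hat, omega)
X_tilde = stft_window_origin(x, w, n_hat, omega)
phase_corrected = np.exp(-1j * omega * n_hat) * X_tilde
```

The magnitudes of `X` and `X_tilde` are identical, so the choice of time origin matters only when the phase is used.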

Page 16:

Time Origin for STFT

[Figure: time origin tied to the window, showing $w[-m]\, x[\hat n + m]$; the point $m = -\hat n$ corresponds to $x[0]$]

Page 17:

Interpretations of STFT

there are 2 distinct interpretations of $X_{\hat n}(e^{j\hat\omega})$

1. assume $\hat n$ is fixed, then $X_{\hat n}(e^{j\hat\omega})$ is simply the normal Fourier transform of the sequence $w(\hat n - m)\, x(m)$, $-\infty < m < \infty$ => for fixed $\hat n$, $X_{\hat n}(e^{j\hat\omega})$ has the same properties as a normal Fourier transform

2. consider $X_{\hat n}(e^{j\hat\omega})$ as a function of the time index $\hat n$ with $\hat\omega$ fixed. Then $X_{\hat n}(e^{j\hat\omega})$ is in the form of a convolution of the signal $x(\hat n)\, e^{-j\hat\omega \hat n}$ with the window $w(\hat n)$. This leads to an interpretation in the form of linear filtering of the frequency modulated signal $x(\hat n)\, e^{-j\hat\omega \hat n}$ by $w(\hat n)$.

- we will now consider each of these interpretations of the STFT in a lot more detail

Page 18:

Fourier Transform Interpretation

• consider $X_{\hat n}(e^{j\hat\omega})$ as the normal Fourier transform of the sequence $w(\hat n - m)\, x(m)$, $-\infty < m < \infty$, for fixed $\hat n$

• the window $w(\hat n - m)$ slides along the sequence $x(m)$ and defines a new STFT for every value of $\hat n$

• what are the conditions for the existence of the STFT
the sequence $w(\hat n - m)\, x(m)$ must be absolutely summable for all values of $\hat n$
- since $|x(\hat n)| \le L$ (32767 for 16-bit sampling)
- since $|w(\hat n)| \le 1$ (normalized window levels)
- since window duration is usually finite

• $w(\hat n - m)\, x(m)$ is absolutely summable for all $\hat n$

Page 19:

Frequencies for STFT

• the STFT is periodic in $\hat\omega$ with period $2\pi$, i.e.,

$$X_{\hat n}(e^{j\hat\omega}) = X_{\hat n}(e^{j(\hat\omega + 2\pi k)}), \quad \forall k$$

• can use any of several frequency variables to express STFT, including
-- $\hat\omega = \hat\Omega T$ (where $T$ is the sampling period for $x(m)$) to represent analog radian frequency, giving $X_{\hat n}(e^{j\hat\Omega T})$
-- $\hat\omega = 2\pi \hat f$ or $\hat\omega = 2\pi \hat F T$ to represent normalized frequency ($0 \le \hat f \le 1$) or analog frequency ($0 \le \hat F \le F_s = 1/T$), giving $X_{\hat n}(e^{j2\pi \hat f})$ or $X_{\hat n}(e^{j2\pi \hat F T})$

Page 20:

Signal Recovery from STFT

• since for a given value of $\hat n$, $X_{\hat n}(e^{j\hat\omega})$ has the same properties as a normal Fourier transform, we can recover the input sequence exactly

• since $X_{\hat n}(e^{j\hat\omega})$ is the normal Fourier transform of the windowed sequence $w(\hat n - m)\, x(m)$, then

$$w(\hat n - m)\, x(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\, e^{j\hat\omega m}\, d\hat\omega$$

• assuming the window satisfies the property that $w(0) \ne 0$ (a trivial requirement), then by evaluating the inverse Fourier transform when $m = \hat n$, we obtain

$$x(\hat n) = \frac{1}{2\pi\, w(0)}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\, e^{j\hat\omega \hat n}\, d\hat\omega$$

Page 21:

Signal Recovery from STFT

$$x(\hat n) = \frac{1}{2\pi\, w(0)}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\, e^{j\hat\omega \hat n}\, d\hat\omega$$

• with the requirement that $w(0) \ne 0$, the sequence $x(\hat n)$ can be recovered exactly from $X_{\hat n}(e^{j\hat\omega})$, if $X_{\hat n}(e^{j\hat\omega})$ is known for all values of $\hat\omega$ over one complete period
- sample-by-sample recovery process
- $X_{\hat n}(e^{j\hat\omega})$ must be known for every value of $\hat n$ and for all $\hat\omega$

• can also recover sequence $w(\hat n - m)\, x(m)$ but can't guarantee that $x(m)$ can be recovered since $w(\hat n - m)$ can equal 0
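The sample-by-sample recovery formula can be tried numerically. The sketch below makes several assumptions that are not part of the slides: a causal Hamming window of length `L` (so $w(0) = 0.08 \ne 0$), a random test signal, the hypothetical helpers `stft_frame` and `recover_sample`, and a uniform grid of `N >= L` frequencies in place of the continuous integral. For a finite-length window that grid average reproduces the integral exactly.

```python
import numpy as np

def stft_frame(x, w, n_hat, N):
    """Samples of X_n(e^{j omega}) at omega_k = 2 pi k / N for a causal window w[0..L-1]."""
    L = len(w)
    m = np.arange(n_hat - L + 1, n_hat + 1)
    omegas = 2 * np.pi * np.arange(N) / N
    seg = x[m] * w[n_hat - m]                       # windowed segment w[n-m] x[m]
    return np.exp(-1j * np.outer(omegas, m)) @ seg, omegas

def recover_sample(X, omegas, w0, n_hat):
    """x(n) = (1 / (2 pi w(0))) * integral of X_n e^{j omega n}; the integral / (2 pi) is a grid mean here."""
    return np.real(np.mean(X * np.exp(1j * omegas * n_hat)) / w0)

rng = np.random.default_rng(3)
x = rng.standard_normal(120)                        # hypothetical test signal
L, N = 32, 64
w = np.hamming(L)

# One STFT frame per output sample: the "sample-by-sample recovery process".
recovered = np.array([
    recover_sample(*stft_frame(x, w, n, N), w[0], n)
    for n in range(L - 1, len(x))
])
```

The cost of this scheme is one full spectrum per recovered sample, which is why the later slides look at sampling the STFT in time as well as in frequency.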

Page 22:

Properties of STFT

$$X_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\big[w(\hat n - m)\, x(m)\big], \quad \hat n \text{ fixed},\ \hat\omega \text{ variable}$$

• relation to short-time power density function

$$S_{\hat n}(e^{j\hat\omega}) = |X_{\hat n}(e^{j\hat\omega})|^2 = X_{\hat n}(e^{j\hat\omega}) \cdot X^{*}_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\big[R_{\hat n}(k)\big], \quad \hat n \text{ fixed}$$

$$R_{\hat n}(k) = \sum_{m=-\infty}^{\infty} w(\hat n - m)\, x(m)\, w(\hat n - m - k)\, x(m + k) \;\Leftrightarrow\; S_{\hat n}(e^{j\hat\omega})$$

• relation to regular $X(e^{j\hat\omega})$ (assuming it exists)

$$X(e^{j\hat\omega}) = \mathrm{DTFT}\big[x(m)\big] = \sum_{m=-\infty}^{\infty} x(m)\, e^{-j\hat\omega m}$$

$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\, X(e^{j(\hat\omega - \theta)})\, e^{-j\theta \hat n}\, d\theta$$

$$\mathrm{DTFT}\big[w(\hat n - m)\, x(m)\big] \leftrightarrow W(e^{-j\theta})\, e^{-j\theta \hat n} * X(e^{j\theta})$$

Page 23:

Properties of STFT

assume $X(e^{j\hat\omega})$ exists

$$X(e^{j\hat\omega}) = \mathrm{DTFT}\big[x(m)\big] = \sum_{m=-\infty}^{\infty} x(m)\, e^{-j\hat\omega m}$$

$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\, X(e^{j(\hat\omega - \theta)})\, e^{-j\theta \hat n}\, d\theta$$

limiting case

$$w(\hat n) = 1, \quad -\infty < \hat n < \infty \;\Leftrightarrow\; W(e^{j\hat\omega}) = 2\pi\,\delta(\hat\omega)$$

$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} 2\pi\,\delta(-\theta)\, X(e^{j(\hat\omega - \theta)})\, e^{-j\theta \hat n}\, d\theta = X(e^{j\hat\omega})$$

i.e., we get the same thing no matter where the window is shifted

Page 24:

Alternative Forms of STFT

Alternative forms of $X_{\hat n}(e^{j\hat\omega})$

1. real and imaginary parts

$$X_{\hat n}(e^{j\hat\omega}) = \mathrm{Re}\big[X_{\hat n}(e^{j\hat\omega})\big] + j\, \mathrm{Im}\big[X_{\hat n}(e^{j\hat\omega})\big] = a_{\hat n}(\hat\omega) - j\, b_{\hat n}(\hat\omega)$$

$$a_{\hat n}(\hat\omega) = \mathrm{Re}\big[X_{\hat n}(e^{j\hat\omega})\big], \qquad b_{\hat n}(\hat\omega) = -\mathrm{Im}\big[X_{\hat n}(e^{j\hat\omega})\big]$$

• when $x(m)$ and $w(\hat n - m)$ are both real (usually the case) can show that $a_{\hat n}(\hat\omega)$ is symmetric in $\hat\omega$, and $b_{\hat n}(\hat\omega)$ is anti-symmetric in $\hat\omega$

2. magnitude and phase

$$X_{\hat n}(e^{j\hat\omega}) = |X_{\hat n}(e^{j\hat\omega})|\, e^{j\theta_{\hat n}(\hat\omega)}$$

• can relate $|X_{\hat n}(e^{j\hat\omega})|$ and $\theta_{\hat n}(\hat\omega)$ to $a_{\hat n}(\hat\omega)$ and $b_{\hat n}(\hat\omega)$

Page 25:

Role of Window in STFT

The window $w(\hat n - m)$ does the following:
1. chooses portion of $x(m)$ to be analyzed
2. window shape determines the nature of $X_{\hat n}(e^{j\hat\omega})$

Since $X_{\hat n}(e^{j\hat\omega})$ (for fixed $\hat n$) is the normal FT of $w(\hat n - m)\, x(m)$, then if we consider the normal FT's of both $x(n)$ and $w(n)$ individually, we get

$$X(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, e^{-j\hat\omega m}$$

$$W(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} w(m)\, e^{-j\hat\omega m}$$

Page 26:

Role of Window in STFT

• then for fixed $\hat n$, the normal Fourier transform of the product $w(\hat n - m)\, x(m)$ is the convolution of the transforms of $w(\hat n - m)$ and $x(m)$

• for fixed $\hat n$, the FT of $w(\hat n - m)$ is $W(e^{-j\hat\omega})\, e^{-j\hat\omega \hat n}$ -- thus

$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{-j\theta})\, e^{-j\theta \hat n}\, X(e^{j(\hat\omega - \theta)})\, d\theta$$

• and replacing $\theta$ by $-\theta$ gives

$$X_{\hat n}(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\, e^{j\theta \hat n}\, X(e^{j(\hat\omega + \theta)})\, d\theta$$

Page 27:

Interpretation of Role of Window

• $X_{\hat n}(e^{j\hat\omega})$ is the convolution of $X(e^{j\hat\omega})$ with the FT of the shifted window sequence, $W(e^{-j\hat\omega})\, e^{-j\hat\omega \hat n}$

• $X(e^{j\hat\omega})$ really doesn't have meaning since $x(\hat n)$ varies with time; consider $x(\hat n)$ defined for window duration and extended for all time to have the same properties => then $X(e^{j\hat\omega})$ does exist with properties that reflect the sound within the window (can also consider $x(\hat n) = 0$ outside the window and define $X(e^{j\hat\omega})$ appropriately--but this is another case)

Bottom Line: $X_{\hat n}(e^{j\hat\omega})$ is a smoothed version of the FT of the part of $x(\hat n)$ that is within the window $w$.

Page 28:

Windows in STFT

• for $X_{\hat n}(e^{j\hat\omega})$ to represent the short-time spectral properties of $x(\hat n)$ inside the window => $W(e^{j\theta})$ should be much narrower in frequency than significant spectral regions of $X(e^{j\hat\omega})$ -- i.e., almost an impulse in frequency

• consider rectangular and Hamming windows, where width of the main spectral lobe is inversely proportional to window length, and side lobe levels are essentially independent of window length

Rectangular Window: flat window of length L samples; first zero in frequency response occurs at FS/L, with sidelobe levels of about -13 dB or lower

Hamming Window: raised cosine window of length L samples; first zero in frequency response occurs at 2FS/L, with sidelobe levels of -40 dB or lower
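The main-lobe and sidelobe figures quoted for the two windows can be measured from a heavily zero-padded FFT. This is a rough sketch rather than a definitive measurement: the helper name `window_stats`, the FFT size, and the window length of 400 samples are assumptions, and the "first zero" is located as the first local minimum of the sampled magnitude response.

```python
import numpy as np

def window_stats(w, nfft=1 << 16):
    """Return (first-null location in units of Fs/L, peak sidelobe level in dB re the main-lobe peak)."""
    W = np.abs(np.fft.rfft(w, nfft))
    W_db = 20 * np.log10(W / W[0] + 1e-12)
    # Walk down the main lobe until the magnitude starts rising again.
    k = 1
    while k < len(W) - 1 and W[k + 1] < W[k]:
        k += 1
    first_null = k * len(w) / nfft          # bin index converted to multiples of Fs/L
    peak_sidelobe_db = W_db[k:].max()       # highest level beyond the main lobe
    return first_null, peak_sidelobe_db

L = 400                                      # hypothetical window length in samples
rect_null, rect_sidelobe = window_stats(np.ones(L))
hamm_null, hamm_sidelobe = window_stats(np.hamming(L))
```

Repeating the measurement for several values of `L` is a simple way to see that the null location scales with 1/L while the sidelobe level stays essentially fixed.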

Page 29:

Windows

[Figure: L=2M+1-point Hamming window and its corresponding DTFT]

Page 30:

Frequency Responses of Windows

Page 31:

Effect of Window Length-HW

Page 32:

Effect of Window Length-HW

Page 33:

Effect of Window Length-RW

Page 34:

Effect of Window Length-HW

Page 35:

Relation to Short-Time Autocorrelation

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} w[\hat n - m]\, x[m]\, e^{-j\hat\omega m}$$

is the discrete-time Fourier transform of $w[\hat n - m]\, x[m]$ for each value of $\hat n$; then it is seen that

$$S_{\hat n}(e^{j\hat\omega}) = |X_{\hat n}(e^{j\hat\omega})|^2 = X_{\hat n}(e^{j\hat\omega})\, X^{*}_{\hat n}(e^{j\hat\omega})$$

is the Fourier transform of

$$R_{\hat n}(l) = \sum_{m=-\infty}^{\infty} w[\hat n - m]\, x[m]\, w[\hat n - l - m]\, x[m + l]$$

which is the short-time autocorrelation function of the previous chapter. Thus the above equations relate the short-time spectrum to the short-time autocorrelation.
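The Fourier-transform pair between the short-time autocorrelation and the short-time power spectrum can be verified for a single frame. The sketch assumes a causal Hamming window, a random test signal, and one arbitrarily chosen frequency; none of these values come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(300)              # hypothetical stand-in for a speech signal
L, n_hat = 48, 150
w = np.hamming(L)

m = np.arange(n_hat - L + 1, n_hat + 1)
frame = x[m] * w[n_hat - m]               # w[n - m] x[m] over the window support

# Short-time autocorrelation R_n(l) for lags l = -(L-1), ..., L-1
R = np.correlate(frame, frame, mode="full")
lags = np.arange(-(L - 1), L)

omega = 0.3 * np.pi
X = np.sum(frame * np.exp(-1j * omega * m))                  # X_n(e^{j omega})
S_from_X = np.abs(X) ** 2                                    # |X_n|^2
S_from_R = np.real(np.sum(R * np.exp(-1j * omega * lags)))   # DTFT of R_n(l)
```

Because the frame is real, `R` is symmetric in the lag, so its transform is real and non-negative, as a power spectrum should be.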

Page 36:

Short-Time Autocorrelation and STFT

Page 37:

Summary of FT view of STFT

• interpret $X_{\hat n}(e^{j\hat\omega})$ as the normal Fourier transform of the sequence $w(\hat n - m)\, x(m)$, $-\infty < m < \infty$

• properties of this Fourier transform depend on the window
o frequency resolution of $X_{\hat n}(e^{j\hat\omega})$ varies inversely with the length of the window => want long windows for high resolution
o want $x(n)$ to be relatively stationary (non-time-varying) during duration of window for most stable spectrum => want short windows

• as usual in speech processing, there needs to be a compromise between good temporal resolution (short windows) and good frequency resolution (long windows)

Page 38:

Linear Filtering Interpretation of STFT

Page 39:

Linear Filtering Interpretation

1. modulation-lowpass filter form ($n$ rather than $\hat n$)

$$X_n(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, e^{-j\hat\omega m}\, w(n - m)$$

$$= \big(x(n)\, e^{-j\hat\omega n}\big) * w(n), \quad n \text{ variable},\ \hat\omega \text{ fixed}$$

$$= \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\, X(e^{j(\theta + \hat\omega)})\, e^{j\theta n}\, d\theta$$

2. bandpass filter-demodulation

$$X_n(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} w(m)\, x(n - m)\, e^{-j\hat\omega(n - m)}$$

$$= e^{-j\hat\omega n} \sum_{m=-\infty}^{\infty} \big(w(m)\, e^{j\hat\omega m}\big)\, x(n - m)$$

$$= e^{-j\hat\omega n}\Big[\big(w(n)\, e^{j\hat\omega n}\big) * x(n)\Big], \quad n \text{ variable},\ \hat\omega \text{ fixed}$$

Page 40:

Linear Filtering Interpretation

1. modulation-lowpass filter form:

$$X_n(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, e^{-j\hat\omega m}\, w(n - m), \quad n \text{ variable},\ \hat\omega \text{ fixed}$$

$$= \big(x(n)\, e^{-j\hat\omega n}\big) * w(n)$$

$$= \big(x(n)\cos(\hat\omega n)\big) * w(n) - j\,\big(x(n)\sin(\hat\omega n)\big) * w(n)$$

$$= a_n(\hat\omega) - j\, b_n(\hat\omega)$$

Page 41:

Linear Filtering Interpretation

2. bandpass filter-demodulation form

$$X_n(e^{j\hat\omega}) = e^{-j\hat\omega n}\Big[\big(w(n)\, e^{j\hat\omega n}\big) * x(n)\Big], \quad n \text{ variable},\ \hat\omega \text{ fixed}$$

• complex bandpass filter output modulated by signal $e^{-j\hat\omega n}$
• if $W(e^{j\theta})$ is lowpass, then filter is bandpass around $\theta = \hat\omega$
• all real computation for lower half structure
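Both filtering forms can be compared against the defining sum for every output time. The sketch below is an illustration under assumptions: a causal Hamming window acting as the lowpass impulse response, a random test signal treated as zero before its first sample, a single analysis frequency, and hypothetical variable names throughout.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(256)              # hypothetical test signal
L = 32
w = np.hamming(L)                         # causal window w[0..L-1], used as the lowpass filter
omega = 0.4 * np.pi                       # the fixed analysis frequency
n = np.arange(len(x))

def stft_direct(n_idx):
    """Defining sum: sum_m x[m] w[n - m] e^{-j omega m}, with x taken as zero for m < 0."""
    m = np.arange(max(0, n_idx - L + 1), n_idx + 1)
    return np.sum(x[m] * w[n_idx - m] * np.exp(-1j * omega * m))

X_def = np.array([stft_direct(k) for k in n])

# 1. modulation-lowpass form: shift the band at omega down to zero frequency, then filter with w
X_mod_lp = np.convolve(x * np.exp(-1j * omega * n), w)[:len(x)]

# 2. bandpass-demodulation form: filter with w(n) e^{j omega n}, then demodulate
h_bp = w * np.exp(1j * omega * np.arange(L))
bp_out = np.convolve(x, h_bp)[:len(x)]
X_bp_demod = np.exp(-1j * omega * n) * bp_out
```

The demodulating factor has unit magnitude, so `np.abs(bp_out)` already equals the STFT magnitude; the final multiplication is needed only when the phase matters.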

Page 42:

Linear Filtering Interpretation

[Figure: two panels, lowpass filter frequency response and bandpass filter frequency response]

Page 43:

Linear Filtering Interpretation

• assume normal FT of $x(n)$ exists

$$x(n) \leftrightarrow X(e^{j\theta})$$

$$x(n)\, e^{-j\hat\omega n} \leftrightarrow X(e^{j(\theta + \hat\omega)}) \quad \text{(recall that } \hat\omega \text{ is a particular frequency)}$$

=> spectrum of $x(n)$ at frequency $\hat\omega$ is shifted to zero frequency

• since the STFT is a convolution, the FT of the STFT is the product of the individual FT's, i.e., $X(e^{j(\theta + \hat\omega)}) \cdot W(e^{j\theta})$

• if $W(e^{j\theta})$ resembles a narrow band lowpass filter, i.e., $W(e^{j\theta}) = 1$ for small $\theta$ and is 0 otherwise, then

$$X(e^{j(\theta + \hat\omega)}) \cdot W(e^{j\theta}) \approx X(e^{j\hat\omega})$$

Page 44:

Summary-STFT

Short-Time Fourier Transform (STFT)

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\, w[\hat n - m]\, e^{-j\hat\omega m}, \quad -\infty < \hat n < \infty,\ 0 \le \hat\omega < 2\pi$$

Fixed value of $\hat n$, varying $\hat\omega$ -- DFT Interpretation

Fixed value of $\hat\omega$, varying $\hat n$ -- Filter Bank Interpretation

Page 45:

Summary

Short-Time Fourier Transform (STFT)

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\, w[\hat n - m]\, e^{-j\hat\omega m}, \quad -\infty < \hat n < \infty,\ 0 \le \hat\omega < 2\pi$$

[Figure: time-frequency plane with $\hat n$ axis marked at 0, R, 2R, 3R and $\hat\omega$ axis starting at 0]

$$\text{DFT:}\quad X_{\hat n}(e^{j\hat\omega}) = \sum_{m=\hat n - L + 1}^{\hat n} x[m]\, w[\hat n - m]\, e^{-j\hat\omega m}$$

$$X_{\hat n}(e^{j\hat\omega}) = \mathrm{DFT}\big(x[m]\, w[\hat n - m]\big), \quad 0 \le \hat\omega < 2\pi,\ \hat n = 0, R, 2R, \ldots$$

Page 46:

Summary – Modulation/Lowpass Filter

Short-Time Fourier Transform (STFT)

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\, w[\hat n - m]\, e^{-j\hat\omega m}, \quad -\infty < \hat n < \infty,\ 0 \le \hat\omega < 2\pi$$

[Figure: time-frequency plane with frequency axis marked at $\hat\omega_0, \hat\omega_1, \hat\omega_2, \ldots, \hat\omega_{L-1}$ and time axis $n$]

$$\text{Filter Bank:}\quad X_{\hat n}(e^{j\hat\omega}) = \sum_{m=\hat n - L + 1}^{\hat n} \big(x[m]\, e^{-j\hat\omega m}\big)\, w[\hat n - m]$$

$$X_n(e^{j\hat\omega}) = \sum_{m} \big(x[m]\, e^{-j\hat\omega m}\big)\, w[n - m] = \big(x[n]\, e^{-j\hat\omega n}\big) * w[n]$$

Page 47:

Summary – Bandpass Filter/Demodulation

Short-Time Fourier Transform (STFT)

$$X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[m]\, w[\hat n - m]\, e^{-j\hat\omega m}, \quad -\infty < \hat n < \infty,\ 0 \le \hat\omega < 2\pi$$

[Figure: time-frequency plane with frequency axis marked at $\hat\omega_0, \hat\omega_1, \hat\omega_2, \ldots, \hat\omega_{L-1}$ and time axis $n$]

$$\text{Filter Bank:}\quad X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x[\hat n - m]\, e^{-j\hat\omega(\hat n - m)}\, w[m]$$

$$X_n(e^{j\hat\omega}) = e^{-j\hat\omega n}\Big[\big(w[n]\, e^{j\hat\omega n}\big) * x[n]\Big]$$

Page 48:

Summary – Modulation

Modulation

$$x[n]\, e^{j\hat\omega n} \leftrightarrow X(e^{j\omega}) * \mathrm{FT}(e^{j\hat\omega n}) = X(e^{j\omega}) * \delta(\omega - \hat\omega) = X(e^{j(\omega - \hat\omega)})$$

[Figure: $X(e^{j\omega})$ occupying $-W$ to $W$ around 0, and $X(e^{j(\omega - \hat\omega)})$ occupying $\hat\omega - W$ to $\hat\omega + W$ around $\hat\omega$]

Page 49:

STFT Magnitude Only

• for many applications you only need the magnitude of the STFT (not the phase)

• in such cases, the bandpass filter implementation is less complex, since

$$|X_n(e^{j\hat\omega})| = \big[a_n^2(\hat\omega) + b_n^2(\hat\omega)\big]^{1/2} = |\tilde X_n(e^{j\hat\omega})| = \big[\tilde a_n^2(\hat\omega) + \tilde b_n^2(\hat\omega)\big]^{1/2}$$

Page 50:

Sampling Rates of STFT

Page 51:

Sampling Rates of STFT

• need to sample STFT in both time and frequency to produce an unaliased representation from which x(n) can be exactly recovered

• sampling rates lower than the theoretical minimum rate can be used, in either time or frequency, and x(n) can still be exactly recovered from the aliased (under-sampled) short-time transform
– this is useful for spectral estimation, pitch estimation, formant estimation, speech spectrograms, vocoders
– for applications where the signal is modified, e.g., speech enhancement, cannot undersample STFT and still recover modified signal exactly

Page 52:

Sampling Rate in Time

• to determine the sampling rate in time, we take a linear filtering view
1. $X_n(e^{j\hat\omega})$ is the output of a filter with impulse response $w(n)$
2. $W(e^{j\omega})$ is a lowpass response with effective bandwidth of $B$ Hertz

• thus the effective bandwidth of $X_n(e^{j\hat\omega})$ is $B$ Hertz => $X_n(e^{j\hat\omega})$ has to be sampled at a rate of $2B$ samples/second to avoid aliasing

Example: Hamming Window

$$w(n) = \begin{cases} 0.54 - 0.46\cos\big(2\pi n/(L-1)\big), & 0 \le n \le L-1 \\ 0, & \text{otherwise} \end{cases}$$

$$B \approx \frac{2F_s}{L} \text{ (Hz)}; \quad \text{for } L = 400,\ F_s = 10{,}000 \text{ Hz} \Rightarrow B = 50 \text{ Hz}$$

=> need rate of 100/sec (every 100 samples) for sampling rate in time
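The arithmetic of the Hamming-window example can be written out as a small helper. The function name `time_sampling` and its parameter `c` are hypothetical; `c = 2` encodes the slide's approximation B ≈ 2·Fs/L for a Hamming window.

```python
def time_sampling(fs_hz, window_len, c=2.0):
    """Return (bandwidth B in Hz, required frame rate 2B per second, hop in samples).

    c = 2 assumes the Hamming-window bandwidth B ~= 2 * Fs / L quoted on the slide.
    """
    bandwidth_hz = c * fs_hz / window_len
    rate_per_sec = 2.0 * bandwidth_hz           # sample the lowpass output at twice its bandwidth
    hop_samples = fs_hz / rate_per_sec          # spacing between successive analysis times
    return bandwidth_hz, rate_per_sec, hop_samples

# Values from the slide: L = 400 samples, Fs = 10,000 Hz
bandwidth, rate, hop = time_sampling(10_000, 400)
```

With these inputs the hop works out to L/4, i.e., successive Hamming windows overlap by 75 percent.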

Page 53:

Sampling Rate in Frequency

• since $X_{\hat n}(e^{j\hat\omega})$ is periodic in $\hat\omega$ with period $2\pi$, it is only necessary to sample over an interval of length $2\pi$

• need to determine an appropriate finite set of frequencies, $\hat\omega_k = 2\pi k/N$, $k = 0, 1, \ldots, N-1$, at which $X_{\hat n}(e^{j\hat\omega})$ must be specified to exactly recover $x(n)$

• use the Fourier transform interpretation of $X_{\hat n}(e^{j\hat\omega})$
1. if the window $w(n)$ is time-limited, then the inverse transform of $X_{\hat n}(e^{j\hat\omega})$ is time-limited
2. the sampling theorem requires that we sample $X_{\hat n}(e^{j\hat\omega})$ in the frequency dimension at a rate of at least twice its ('symmetric') "time width"
3. since the inverse Fourier transform of $X_{\hat n}(e^{j\hat\omega})$ is the signal $x(m)\, w(\hat n - m)$ and this signal is of duration $L$ samples (the duration of $w(n)$), then according to the sampling theorem $X_{\hat n}(e^{j\hat\omega})$ must be sampled (in frequency) at the set of frequencies

$$\hat\omega_k = \frac{2\pi k}{L}, \quad k = 0, 1, \ldots, L-1 \quad \text{(where } L/2 \text{ is the effective width of the window)}$$

in order to exactly recover $x(n)$ from $X_{\hat n}(e^{j\hat\omega_k})$

• thus for a Hamming window of duration L=400 samples, we require that the STFT be evaluated at at least 400 uniformly spaced frequencies around the unit circle

Page 54:

"Total" Sampling Rate of STFT

• the "total" sampling rate for the STFT is the product of the sampling rates in time and frequency, i.e.,
SR = SR(time) x SR(frequency) = 2B x L samples/sec
B = frequency bandwidth of window (Hz)
L = time width of window (samples)

• for most windows of interest, B is a multiple of FS/L, i.e.,
B = C FS/L (Hz), C=1 for Rectangular Window, C=2 for Hamming Window
SR = 2C FS samples/second

• can define an 'oversampling rate' of
SR/FS = 2C = oversampling rate of STFT as compared to conventional sampling representation of x(n)
for RW, 2C=2; for HW 2C=4 => range of oversampling is 2-4
this oversampling gives a very flexible representation of the speech signal
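The total-rate formula can be evaluated for both windows in a few lines. The helper name `total_sampling_rate` is hypothetical, and the sampling rate of 10,000 Hz and window length of 400 samples are example values, not requirements.

```python
def total_sampling_rate(fs_hz, window_len, c):
    """Return (SR in samples/sec, oversampling ratio SR / Fs) for a window with B = c * Fs / L."""
    bandwidth_hz = c * fs_hz / window_len
    sr_time = 2.0 * bandwidth_hz          # frames per second needed in time
    sr_freq = window_len                  # frequency samples needed per frame
    total = sr_time * sr_freq             # SR = 2B x L = 2 C Fs
    return total, total / fs_hz

rect_total, rect_ratio = total_sampling_rate(10_000, 400, c=1)   # rectangular window
hamm_total, hamm_ratio = total_sampling_rate(10_000, 400, c=2)   # Hamming window
```

The window length cancels in the product, so the oversampling ratio depends only on the window type through C.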

Page 55:

Mathematical Basis for Sampling the STFT

• assume sample in time at $\hat n = n_r = rR$, $-\infty < r < \infty$, and in frequency at $\hat\omega = \hat\omega_k = \frac{2\pi}{N}k$, $k = 0, 1, \ldots, N-1$

sample values

$$X_{rR}(e^{j\frac{2\pi}{N}k}) = \sum_{m=-\infty}^{\infty} w[rR - m]\, x[m]\, e^{-j\frac{2\pi}{N}km} = e^{-j\frac{2\pi}{N}krR}\, \tilde X_{rR}(e^{j\frac{2\pi}{N}k})$$

$$\tilde X_{rR}(e^{j\frac{2\pi}{N}k}) = \sum_{m=-\infty}^{\infty} x[rR + m]\, w(-m)\, e^{-j\frac{2\pi}{N}km} \quad \text{(set } m' = m - rR;\ m = rR + m'\text{, then rename } m' \text{ as } m\text{)}$$

define DFT-type notation

$$X_r(k) = X_{rR}(e^{j\frac{2\pi}{N}k}) = e^{-j\frac{2\pi}{N}krR}\, \tilde X_r(k)$$

Page 56: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

56

Sampling the STFT

Page 57: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

57

Sampling the STFT

DFT Notation

$$X_r[k] = X_{rR}(e^{j\frac{2\pi}{N}k}) = e^{-j\frac{2\pi}{N}krR}\, \tilde{X}_r[k]$$

• let $w[-m] \neq 0$ for $0 \le m \le L-1$ (finite duration window with no zero-valued samples)

$$\tilde{X}_r[k] = \sum_{m=0}^{L-1} x[rR+m]\, w[-m]\, e^{-j\frac{2\pi}{N}km} \qquad (r \text{ fixed},\ 0 \le k \le N-1)$$

• if $L \le N$ then (DFT defined with no aliasing; can recover sequence exactly using inverse DFT)

$$x[rR+m]\, w[-m] = \frac{1}{N}\sum_{k=0}^{N-1} \tilde{X}_r[k]\, e^{j\frac{2\pi}{N}km} \qquad (r \text{ fixed},\ 0 \le m \le N-1)$$

• if $R \le L$ (IDFT defined with no aliasing), then all samples can be recovered from $X_r[k]$ ($R > L \Rightarrow$ gaps in sequence)
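The recovery conditions on this slide (window samples nonzero, L ≤ N, R ≤ L) can be illustrated with a short NumPy sketch. The random test signal, the particular values of L, N and R, and the use of `np.hamming` as the nonzero window are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(400)       # stand-in for a speech signal (assumption)
L, N, R = 64, 128, 48              # window length, DFT size, frame shift: R <= L <= N
win = np.hamming(L)                # win[m] plays the role of w[-m]; no zero-valued samples

recovered = np.zeros_like(x)
for r in range((len(x) - L) // R + 1):
    frame = x[r * R : r * R + L] * win           # x[rR + m] w[-m], 0 <= m <= L-1
    X_tilde = np.fft.fft(frame, N)               # N-point DFT, zero-padded since L <= N
    back = np.fft.ifft(X_tilde).real[:L]         # inverse DFT returns x[rR + m] w[-m]
    recovered[r * R : r * R + L] = back / win    # divide out the window

print(np.max(np.abs(recovered - x)))
```

Because R ≤ L, consecutive frames leave no gaps, so every sample of `x` is recovered. Choosing R > L would leave R − L samples per frame untouched.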

Page 58: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

58

What We Have Learned So Far

1. $X_{\hat n}(e^{j\hat\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(\hat n - m)\, e^{-j\hat\omega m}$

– function of $\hat n$ for sampled $\hat n$ (looks like a time sequence)

– function of $\hat\omega$ for sampled $\hat\omega$ (looks like a transform)

– $X_{\hat n}(e^{j\hat\omega})$ defined for $\hat n = 1, 2, 3, \ldots$; $0 \le \hat\omega \le \pi$ (no sampling rate reduction)

2. $X_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\left[x(m)\, w(\hat n - m)\right] \Rightarrow \hat n$ fixed, $\hat\omega$ variable, with time origin tied to $x(n)$

$X_{\hat n}(e^{j\hat\omega}) = e^{-j\hat\omega \hat n}\, \mathrm{DTFT}\left[x(\hat n + m)\, w(-m)\right] \Rightarrow \hat n$ fixed, $\hat\omega$ variable, with time origin tied to $w(-m)$

3. Interpretations of $X_{\hat n}(e^{j\hat\omega})$

1. $\hat n$ fixed, $\hat\omega$ variable; $X_{\hat n}(e^{j\hat\omega}) = \mathrm{DTFT}\left[x(m)\, w(\hat n - m)\right] \Rightarrow$ DFT View

2. $\hat n = n$ variable, $\hat\omega$ fixed; $X_n(e^{j\hat\omega}) = x(n)\, e^{-j\hat\omega n} * w(n) \Rightarrow$ Linear Filtering view $\Rightarrow$ filter bank implementation

Page 59: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

59

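What We Have Learned So Far

4. Signal Recovery from STFT

$$x(m)\, w(\hat n - m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\, e^{j\hat\omega m}\, d\hat\omega$$

$$x(\hat n) = \frac{1}{2\pi\, w(0)}\int_{-\pi}^{\pi} X_{\hat n}(e^{j\hat\omega})\, e^{j\hat\omega \hat n}\, d\hat\omega$$

5. Linear Filtering Interpretation

1. modulation-lowpass filter $\Rightarrow X_n(e^{j\hat\omega}) = w(n) * \left[x(n)\, e^{-j\hat\omega n}\right]$, $\hat n = n$ variable, $\hat\omega$ fixed

$$X_n(e^{j\hat\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(e^{j\theta})\, X(e^{j(\hat\omega+\theta)})\, e^{j\theta n}\, d\theta$$

2. bandpass filter-demodulation $\Rightarrow X_n(e^{j\hat\omega}) = e^{-j\hat\omega n}\left[\left(w(n)\, e^{j\hat\omega n}\right) * x(n)\right]$, $\hat n = n$ variable, $\hat\omega$ fixed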
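The two linear filtering forms (modulation followed by lowpass filtering, and bandpass filtering followed by demodulation) should produce the same STFT channel. The sketch below checks this numerically for one analysis frequency. The random signal, the 32-point causal Hamming window and the chosen frequency are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)        # stand-in signal (assumption)
w = np.hamming(32)                  # causal lowpass window w(n), n = 0..31
omega = 2 * np.pi * 5 / 32          # one fixed analysis frequency (assumption)
n = np.arange(len(x))

# 1. modulation-lowpass filter: w(n) * [x(n) e^{-j omega n}]
X_mod = np.convolve(x * np.exp(-1j * omega * n), w)[: len(x)]

# 2. bandpass filter-demodulation: e^{-j omega n} [(w(n) e^{j omega n}) * x(n)]
h = w * np.exp(1j * omega * np.arange(len(w)))
X_bp = np.exp(-1j * omega * n) * np.convolve(x, h)[: len(x)]

# Direct evaluation of sum_m x(m) w(n0 - m) e^{-j omega m} at one time index
n0 = 100
m = np.arange(n0 - len(w) + 1, n0 + 1)
X_direct = np.sum(x[m] * w[n0 - m] * np.exp(-1j * omega * m))

print(np.max(np.abs(X_mod - X_bp)), abs(X_mod[n0] - X_direct))
```

Both printed differences should be at the level of floating-point round-off.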

Page 60: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

60

What We Have Learned So Far

6. Sampling Rates in Time and Frequency

1. time: $W(e^{j\omega})$ has bandwidth of $B$ Hertz $\Rightarrow$ $2B$ samples/sec rate

Hamming Window: $B = \frac{2F_S}{L}$ (Hz)

2. frequency: $w(n)$ is time limited to $L$ samples $\Rightarrow$ inverse of $X_n(e^{j\omega})$ is also time limited $\Rightarrow$ need to sample in frequency at twice the (effective) time width of the time-limited sequence $\Rightarrow$ $L$ frequency samples

3. total Sampling Rate: $2B \cdot L$ samples/sec

- $B$ = frequency bandwidth of the window (Hz)
- $L$ = effective time width of the window (samples)
- $B = C \cdot F_S / L$ (Hz) $\Rightarrow$ Sampling Rate $= 2B \cdot L = 2CF_S$ samples/second
- for Rectangular Window, $C = 1$
- for Hamming Window, $C = 2$

Page 61: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Spectrographic Displays

61

Page 62: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

62

Spectrographic Displays

• Sound Spectrograph: one of the earliest embodiments of the time-dependent spectrum analysis techniques

– 2-second utterance repeatedly modulates a variable frequency oscillator, then bandpass filtered, and the average energy at a given time and frequency is measured and used as a crude measure of the STFT

– thus energy is recorded by an ingenious electro-mechanical system on special electrostatic paper called teledeltos paper

– result is a two-dimensional representation of the time-dependent spectrum-with vertical intensity being spectrum level at a given frequency, and horizontal intensity being spectral level at a given time-with spectrum magnitude being represented by the darkness of the marking

– wide bandpass filters (300 Hz bandwidth) provide good temporal resolution and poor frequency resolution (resolve pitch pulses in time but not in frequency)—called wideband spectrogram

– narrow bandpass filters (45 Hz bandwidth) provide good frequency resolution and poor time resolution (resolve pitch pulses in frequency, but not in time)—called narrowband spectrogram

Page 63: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Conventional Spectrogram (Every salt breeze comes from the sea)

63

Page 64: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

64

Digital Speech Spectrograms

• wideband spectrogram

– follows broad spectral peaks (formants) over time

– resolves most individual pitch periods as vertical striations since the impulse response (IR) of the analyzing filter is comparable in duration to a pitch period

– what happens for low-pitch male versus high-pitch female voices?

– for unvoiced speech there are no vertical pitch striations

• narrowband spectrogram

– individual harmonics are resolved in voiced regions

– formant frequencies are still in evidence

– usually can see fundamental frequency

– unvoiced regions show no strong structure

Page 65: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Digital Speech Spectrograms

• Speech Parameters (“This is a test”):
– sampling rate: 16 kHz
– speech duration: 1.406 seconds
– speaker: male

• Wideband Spectrogram Parameters:
– analysis window: Hamming window
– analysis window duration: 6 msec (96 samples)
– analysis window shift: 0.625 msec (10 samples)
– FFT size: 512
– dynamic range of spectral log magnitudes: 40 dB

• Narrowband Spectrogram Parameters:
– analysis window: Hamming window
– analysis window duration: 60 msec (960 samples)
– analysis window shift: 6 msec (96 samples)
– FFT size: 1024
– dynamic range of spectral log magnitudes: 40 dB
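A minimal sketch of how the wideband and narrowband parameter sets listed above could be applied in NumPy follows. The helper `log_spectrogram` is a hypothetical name, and the input is a synthetic 100 Hz impulse train rather than the "This is a test" utterance, which is not available here.

```python
import numpy as np

def log_spectrogram(x, win_len, shift, nfft, dyn_range_db=40.0):
    """Log-magnitude STFT (frames x bins), clipped to dyn_range_db below the peak."""
    win = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, shift)
    frames = np.stack([x[s : s + win_len] * win for s in starts])
    mag = np.abs(np.fft.rfft(frames, nfft, axis=1))      # zero-pads each frame to nfft
    log_mag = 20.0 * np.log10(mag + 1e-12)               # small offset avoids log(0)
    return np.maximum(log_mag, log_mag.max() - dyn_range_db)

fs = 16000
x = np.zeros(fs // 2)     # 0.5 s synthetic excitation (an assumption, not real speech)
x[::160] = 1.0            # impulse train with a 100 Hz "pitch"

wideband = log_spectrogram(x, win_len=96, shift=10, nfft=512)      # 6 msec window
narrowband = log_spectrogram(x, win_len=960, shift=96, nfft=1024)  # 60 msec window
print(wideband.shape, narrowband.shape)
```

With the 6 msec window shorter than the 10 msec pitch period, successive wideband frames alternately contain and miss a pulse, which is what produces vertical striations. The 60 msec window spans several periods and so resolves the harmonics instead.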

65

Page 66: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Digital Speech Spectrograms

66

Top Panel: 3 msec (48 samples) window

Second Panel: 6 msec (96 samples) window

Third Panel: 9 msec (144 sample) window

Fourth Panel: 30 msec (480 sample) window

Page 67: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

67

Spectrogram Comparisons

Page 68: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Spectrogram - Male

68

nfft = 1024, L = 80, Overlap = 75

Page 69: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Spectrogram - Female

69

nfft = 1024, L = 80, Overlap = 75

Page 70: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Overlap Addition (OLA) Method

70

Page 71: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

71

Overlap Addition (OLA) Method

• based on normal FT interpretation of short-time spectrum

$$X_{\hat n}(e^{j\omega_k}) \;\xleftrightarrow{\ \mathrm{DFT/IDFT}\ }\; y_{\hat n}(m) = x(m)\, w(\hat n - m)$$

• can reconstruct $x(m)$ by computing IDFT of $X_{\hat n}(e^{j\omega_k})$ and dividing out the window (assumed non-zero for all samples)

• this process gives $L$ signal values of $x(m)$ for each window => window can be moved by $L$ samples and the process repeated

• since $X_{\hat n}(e^{j\omega_k})$ is "undersampled" in time, it is highly susceptible to aliasing errors => need more robust synthesis procedure

Page 72: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

72

Overlap Addition (OLA) Method

$$y(n) = \sum_{m}\left[\sum_{k} X_m(e^{j\omega_k})\, e^{j\omega_k n}\right]$$

• summation is for overlapping analysis sections

• for each value of $m$ where $X_m(e^{j\omega_k})$ is measured, do an inverse FT to give

$$y_m(n) = L\, x(n)\, w(m - n) \qquad \text{(where } L \text{ is the size of the FT)}$$

$$y(n) = \sum_m y_m(n) = L\, x(n) \sum_m w(m - n)$$

• a basic property of the window is

$$W(e^{j0}) = W(e^{j\omega_k})\Big|_{\omega_k = 0} = \sum_{n=0}^{N-1} w(n)$$

• since any set of samples of the window are equivalent (by sampling arguments), then if $w(n)$ is sampled often enough we get (independent of $n$)

$$\sum_m w(m - n) = W(e^{j0})$$

$$y(n) = L\, x(n)\, W(e^{j0}) \qquad \text{using overlap-added sections}$$

Page 73: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Overlap Addition of Bartlett and Hann Windows

73

Page 74: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

74

Overlap Addition of Hamming Window, L=128

Page 75: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Window Spectra

75

DTFT of Bartlett (triangular), Hann and Hamming windows

Page 76: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Hamming Window Spectra

76

DTFTs of even-length, odd-length and modified odd-to-even length Hamming windows; zeros spaced at 2π/R give perfect reconstruction using OLA (even-length window)

Page 77: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

77

Overlap Addition (OLA) Method

Page 78: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

78

Overlap Addition (OLA) Method

• w(n) is an L-point Hamming window with R=L/4

• assume x(n)=0 for n<0

• time overlap of 4:1 for HW

• first analysis section begins at n=L/4

Page 79: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

79

Overlap Addition (OLA) Method

• 4-overlapping sections contribute to each interval

• N-point FFTs done using L speech samples, with N-L zeros padded at end to allow modifications without significant aliasing effects

• for a given value of n

y(n) = x(n)w(R-n) + x(n)w(2R-n) + x(n)w(3R-n) + x(n)w(4R-n)
= x(n)[w(R-n) + w(2R-n) + w(3R-n) + w(4R-n)]
= x(n) W(e^{j0})/R
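The claim that four shifted Hamming windows sum to the constant W(e^{j0})/R when R = L/4 can be checked numerically. The sketch below assumes the "odd-to-even" window is obtained by dropping the last sample of an (L+1)-point symmetric Hamming window; that construction and the value L = 128 are assumptions for illustration.

```python
import numpy as np

L, R = 128, 32                        # window length and shift, R = L/4
w = np.hamming(L + 1)[:-1]            # L-point periodic Hamming window (assumed construction)

total = np.zeros(L + 20 * R)
for start in range(0, len(total) - L + 1, R):
    total[start : start + L] += w     # overlap-add shifted copies of the window

steady = total[L:-L]                  # interior region where four windows overlap
print(steady.min(), steady.max(), w.sum() / R)
```

In the interior region the sum is flat and equals `w.sum() / R`, i.e. W(e^{j0})/R. A plain symmetric `np.hamming(L)` window would be expected to leave a small ripple instead.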

Page 80: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

Filter Bank Summation (FBS)

80

Page 81: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

81

Filter Bank Summation

• the filter bank interpretation of the STFT shows that for any frequency $\omega_k$, $X_n(e^{j\omega_k})$ is a lowpass representation of the signal in a band centered at $\omega_k$ ($\hat n = n$ for FBS)

$$X_n(e^{j\omega_k}) = e^{-j\omega_k n} \sum_{m=-\infty}^{\infty} x(n-m)\, w_k(m)\, e^{j\omega_k m}$$

where $w_k(m)$ is the lowpass window used at frequency $\omega_k$ (we have generalized the structure to allow a different lowpass window at each frequency $\omega_k$).

Page 82: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

82

Filter Bank Summation

• define a bandpass filter and substitute it in the equation to give

$$h_k(n) = w_k(n)\, e^{j\omega_k n}$$

$$X_n(e^{j\omega_k}) = e^{-j\omega_k n} \sum_{m=-\infty}^{\infty} x(n-m)\, h_k(m)$$

Page 83: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

83

Filter Bank Summation

Page 84: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

84

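Filter Bank Interpretation of STFT (case: $w_k[n] = w[n]$)

$$w[n] \longleftrightarrow W(e^{j\omega})$$

$$h_k[n] = w[n]\, e^{j\omega_k n} \longleftrightarrow H_k(e^{j\omega}) = W(e^{j\omega}) \otimes \mathrm{FT}(e^{j\omega_k n})$$

$$\mathrm{FT}(e^{j\omega_k n}) = \sum_{n=-\infty}^{\infty} e^{j\omega_k n}\, e^{-j\omega n} = \sum_{n=-\infty}^{\infty} e^{-j(\omega - \omega_k) n} = \delta(\omega - \omega_k)$$

$$H_k(e^{j\omega}) = W(e^{j\omega}) \otimes \delta(\omega - \omega_k) = W(e^{j(\omega - \omega_k)})$$

$$v_k[n] = X_n(e^{j\omega_k}) = e^{-j\omega_k n} \cdot \left(x[n] * h_k[n]\right)$$

$$V_k(e^{j\omega}) = \left[H_k(e^{j\omega}) \cdot X(e^{j\omega})\right] \otimes \mathrm{FT}(e^{-j\omega_k n})$$

$$= \left[H_k(e^{j\omega}) \cdot X(e^{j\omega})\right] \otimes \delta(\omega + \omega_k)$$

$$= \left[X(e^{j\omega}) \cdot W(e^{j(\omega - \omega_k)})\right] \otimes \delta(\omega + \omega_k)$$

$$= X(e^{j(\omega + \omega_k)}) \cdot W(e^{j\omega})$$

[Figure: “lowpass” response $W(e^{j\omega})$ occupying $-\omega_0 \le \omega \le \omega_0$, and “single-sided, bandpass” response $H_k(e^{j\omega})$ occupying $\omega_k - \omega_0 \le \omega \le \omega_k + \omega_0$]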

Page 85: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

85

Filter Bank Summation

• thus $X_n(e^{j\omega_k})$ is obtained by bandpass filtering $x(n)$ followed by modulation with the complex exponential $e^{-j\omega_k n}$. We can express this in the form

$$y_k(n) = X_n(e^{j\omega_k})\, e^{j\omega_k n} = \sum_{m=-\infty}^{\infty} x(n-m)\, h_k(m)$$

• thus $y_k(n)$ is the output of a bandpass filter with impulse response $h_k(n)$

Page 86: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

86

Filter Bank Summation

Page 87: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

87

Filter Bank Summation

• a practical method for reconstructing $x(n)$ from the STFT is as follows

1. assume we know $X_n(e^{j\omega_k})$ for a set of $N$ frequencies $\{\omega_k\}$, $k = 0, 1, \ldots, N-1$

2. assume we have a set of $N$ bandpass filters with impulse responses

$$h_k(n) = w_k(n)\, e^{j\omega_k n}, \qquad k = 0, 1, \ldots, N-1$$

3. assume $w_k(n)$ is an ideal lowpass filter with cutoff frequency $\omega_{pk}$

- the frequency response of the bandpass filter is

$$H_k(e^{j\omega}) = W_k(e^{j(\omega - \omega_k)})$$

Page 88: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

88

Filter Bank Summation

• consider a set of $N$ bandpass filters, uniformly spaced, so that the entire frequency band is covered

$$\omega_k = \frac{2\pi}{N}k, \qquad k = 0, 1, \ldots, N-1$$

• also assume window the same for all channels, i.e., $w_k(n) = w(n)$, $k = 0, 1, \ldots, N-1$

• if we add together all the bandpass outputs, the composite response is

$$\tilde{H}(e^{j\omega}) = \sum_{k=0}^{N-1} H_k(e^{j\omega}) = \sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)})$$

• if $W(e^{j\omega})$ is properly sampled in frequency ($N \ge L$), where $L$ is the window duration, then it can be shown that

$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)}) = w(0) \quad \forall\, \omega \qquad \text{(FBS Formula)}$$

Page 89: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

89

Filter Bank Summation

1/N

Page 90: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

90

Filter Bank Summation

• derivation of FBS formula

$$w(n) \;\xleftrightarrow{\ \mathrm{FT/IFT}\ }\; W(e^{j\omega})$$

• if $W(e^{j\omega})$ is sampled in frequency at $N$ uniformly spaced points, the inverse discrete Fourier transform of the sampled version of $W(e^{j\omega_k})$ is (recall that sampling ⇒ multiplication ⇔ convolution ⇒ aliasing)

$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j\omega_k})\, e^{j\omega_k n} = \sum_{r=-\infty}^{\infty} w(n + rN)$$

• an aliased version of $w(n)$ is obtained.

Page 91: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

91

Filter Bank Summation

• If $w(n)$ is of duration $L$ samples, then $w(n) = 0$ for $n < 0$ and $n \ge L$, and no aliasing occurs due to sampling in frequency of $W(e^{j\omega})$. In this case if we evaluate the aliased formula for $n = 0$, we get

$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j\omega_k}) = w(0)$$

• the FBS formula is seen to be equivalent to the formula above, since (according to the sampling theorem) any set of $N$ uniformly spaced samples of $W(e^{j\omega})$ is adequate
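The FBS formula, (1/N) Σ_k W(e^{j(ω−ω_k)}) = w(0) for every ω when N ≥ L, lends itself to a direct numerical check. The window type, the values of L and N, and the test frequencies below are assumptions chosen for the sketch.

```python
import numpy as np

L, N = 40, 64                          # window length and number of channels, N >= L
w = np.hamming(L)                      # w(n) nonzero only for 0 <= n <= L-1
n = np.arange(L)

def W(omega):
    """DTFT of the window evaluated at a single frequency omega."""
    return np.sum(w * np.exp(-1j * omega * n))

omega_k = 2 * np.pi * np.arange(N) / N
test_freqs = (0.0, 0.3, 1.7, 2.9)      # arbitrary frequencies, not tied to the grid
fbs_sums = [sum(W(omega - wk) for wk in omega_k) / N for omega in test_freqs]
for omega, s in zip(test_freqs, fbs_sums):
    print(f"omega = {omega:.1f}: {s.real:.6f}  (w(0) = {w[0]:.6f})")
```

Each sum should equal `w[0]` regardless of the frequency at which it is evaluated, which is the sense in which any set of N uniformly spaced samples is adequate.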

Page 92: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

92

Filter Bank Summation

• the impulse response of the composite filter bank system is

$$\tilde{h}(n) = \sum_{k=0}^{N-1} h_k(n) = \sum_{k=0}^{N-1} w(n)\, e^{j\omega_k n} = N\, w(0)\, \delta(n)$$

• thus the composite output is

$$y(n) = x(n) * \tilde{h}(n) = N\, w(0)\, x(n)$$

• thus for FBS method, the reconstructed signal is

$$y(n) = \sum_{k=0}^{N-1} y_k(n) = \sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, e^{j\omega_k n} = N\, w(0)\, x(n)$$

if $X_n(e^{j\omega_k})$ is sampled properly in frequency, and is independent of the shape of $w(n)$
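The FBS reconstruction y(n) = N w(0) x(n) can be sketched directly as a bank of complex bandpass filters whose outputs are summed. The random input, the causal 32-point Hamming window and the choice N = L are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(300)         # stand-in signal (assumption)
L, N = 32, 32                        # window length and number of channels, N >= L
w = np.hamming(L)                    # causal window, so w(0) is w[0]
m = np.arange(L)

y = np.zeros(len(x), dtype=complex)
for k in range(N):
    omega_k = 2 * np.pi * k / N
    h_k = w * np.exp(1j * omega_k * m)        # bandpass filter h_k(n) = w(n) e^{j omega_k n}
    y_k = np.convolve(x, h_k)[: len(x)]       # y_k(n) = X_n(e^{j omega_k}) e^{j omega_k n}
    y += y_k                                  # filter bank summation

x_hat = y.real / (N * w[0])                   # undo the gain N w(0)
print(np.max(np.abs(x_hat - x)))
```

The result does not depend on the window shape, only on its first sample w(0), as long as N ≥ L.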

Page 93: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

93

Filter Bank Summation

1/N

Page 94: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

94

Filter Bank Summation

$$y(n) = \sum_{k=0}^{N-1} y_k(n) = \sum_{k=0}^{N-1} X_n(e^{j\frac{2\pi}{N}k})\, e^{j\frac{2\pi}{N}kn}$$

$$= \sum_{k=0}^{N-1}\left[\sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, e^{-j\frac{2\pi}{N}km}\right] e^{j\frac{2\pi}{N}kn}$$

$$= \sum_{m=-\infty}^{\infty} x(m)\, w(n-m) \sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}k(n-m)}$$

$$= \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, N \sum_{r=-\infty}^{\infty} \delta(n - m - rN)$$

$$y(n) = N \sum_{r=-\infty}^{\infty} w(rN)\, x(n - rN)$$

• $w(n) \neq 0$ for $0 \le n \le L-1$ ⇒ if $N \ge L$ then need only the $r = 0$ term

$$y(n) = N\, w(0)\, x(n)$$

• if $N < L$ then in order for $y(n) = x(n)$ you need the condition $w(rN) = 0$, $r = \pm 1, \pm 2, \ldots$

• 'undersampled' representation can still work--at least in theory

Page 95: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

95

Summary of FBS Method

• perfect reconstruction of $x(n)$ from $X_n(e^{j\omega_k})$ is possible using FBS under the following conditions:

1. $w(n)$ is a finite duration filter/window

2. $X_n(e^{j\omega_k})$ is sampled properly in both time and frequency

• perfect reconstruction of $x(n)$ from $X_n(e^{j\omega_k})$ is also possible using FBS under the following condition:

$W(e^{j\omega})$ is perfectly bandlimited

• to avoid time aliasing, $X_n(e^{j\omega_k})$ must be evaluated at at least $L$ uniformly spaced frequencies, where $L$ is the window duration

- since a window of length $L$ samples has a frequency bandwidth between $2\pi/L$ (for RW) and $4\pi/L$ (for HW), the bandpass filters in FBS overlap in frequency, since the analysis frequencies are $2\pi k/L$, $k = 0, 1, \ldots, L-1$

• there is a way (at least theoretically) for $X_n(e^{j\omega_k})$ to be evaluated in non-overlapping bands and for which $x(n)$ can still be exactly recovered

Page 96: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

96

FBS Reconstruction in Non-Overlapping Bands

• assume window length for all bands is $L$ samples

• assume the same window is used for $N$ equally spaced frequency bands with analysis frequencies

$$\omega_k = \frac{2\pi}{N}k, \qquad k = 0, 1, \ldots, N-1$$

where $N$ can be less than $L$

• assume $w(n)$ is an ideal lowpass filter with cutoff frequency

$$\omega_p = \frac{\pi}{N}$$

[Figure: example with N=6 equally spaced ideal filters]

Page 97: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

97

FBS Reconstruction in Non-Overlapping Bands

• the composite impulse response for the FBS system is

$$\tilde{h}(n) = \sum_{k=0}^{N-1} w(n)\, e^{j\omega_k n} = w(n) \sum_{k=0}^{N-1} e^{j\omega_k n}$$

• defining a composite of the terms being summed as

$$p(n) = \sum_{k=0}^{N-1} e^{j\omega_k n} = \sum_{k=0}^{N-1} e^{j2\pi kn/N}$$

we get for $\tilde{h}(n)$

$$\tilde{h}(n) = w(n)\, p(n)$$

• it is easy to show that $p(n)$ is a periodic train of impulses of the form

$$p(n) = N \sum_{r=-\infty}^{\infty} \delta(n - rN)$$

giving for $\tilde{h}(n)$ the expression

$$\tilde{h}(n) = N \sum_{r=-\infty}^{\infty} w(rN)\, \delta(n - rN)$$

• thus the composite impulse response is the window sequence sampled at intervals of $N$ samples

Page 98: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

98

FBS Reconstruction in Non-Overlapping Bands

• for ideal LPF we have (impulse response of ideal lowpass filter with cutoff frequency π/N)

$$w(n) = \frac{\sin(\pi n/N)}{\pi n}, \qquad w(rN) = \frac{\sin(\pi r)}{\pi r N} = \frac{1}{N}\,\delta(r)$$

giving

$$\tilde{h}(n) = \delta(n)$$

• other cases where perfect reconstruction is obtained

1. $w(n)$ is of finite length $L \le N$ and causal (no images)

2. $w(n)$ has length $> N$ and has the property

$$w(n) = 1/N \ \text{ for } n = r_0 N; \qquad w(n) = 0 \ \text{ for } n = rN \quad (r \neq r_0,\ r = 0, \pm 1, \pm 2, \ldots)$$

giving

$$\tilde{h}(n) = p(n)\, w(n) = \delta(n - r_0 N)$$

$$\tilde{H}(e^{j\omega}) = e^{-j\omega r_0 N} \;\Rightarrow\; y(n) = x(n - r_0 N)$$
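The statement that a window longer than N still gives perfect reconstruction when its samples at nonzero multiples of N are zero can be checked on a truncated, tapered sinc. The non-causal indexing n = −M..M, the Hamming taper and the values of N and M are assumptions for this sketch; the taper leaves the zeros at n = rN in place and keeps w(0) = 1/N because its centre value is 1.

```python
import numpy as np

N = 6                                    # number of channels
M = 30                                   # window spans n = -M..M, so L = 2M + 1 = 61 > N
n = np.arange(-M, M + 1)
w = np.sinc(n / N) / N                   # sin(pi n / N) / (pi n); zero at n = rN, r != 0
w = w * np.hamming(len(n))               # finite-length taper (assumed), zeros are preserved

k = np.arange(N)
p = np.exp(2j * np.pi * np.outer(k, n) / N).sum(axis=0)   # p(n) = sum_k e^{j 2 pi k n / N}
h_tilde = (w * p).real                   # composite impulse response w(n) p(n)

expected = np.zeros(len(n))
expected[M] = 1.0                        # unit impulse at n = 0
print(np.max(np.abs(h_tilde - expected)))
```

Since the composite impulse response is a unit impulse, summing the N = 6 channel outputs returns x(n) even though far fewer channels are used than the window length.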

Page 99: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

99

Summary of FBS Reconstruction

• for perfect reconstruction using FBS methods

1. $w(n)$ does not need to be either time-limited or frequency-limited to exactly reconstruct $x(n)$ from $X_n(e^{j\omega_k})$

2. $w(n)$ just needs equally spaced zeros, spaced $N$ samples apart, for theoretically perfect reconstruction

• exact reconstruction of the input is possible with a number of frequency channels less than that required by the sampling theorem

• key issue is how to design digital filters that match these criteria

Page 100: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

100

Practical Implementation of FBS

Page 101: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

FBS and OLA Comparisons

101

Page 102: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

102

FBS and OLA Comparisons

• filter bank summation method ⟷ (duals) ⟷ overlap addition method

-- one depends on sampling relation in frequency
-- one depends on sampling relation in time

• FBS requires sampling in frequency be such that the window transform $W(e^{j\omega})$ obeys the relation

$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)}) = w(0) \qquad \text{any } \omega$$

• OLA requires that sampling in time be such that the window obeys the relation

$$\sum_{r=-\infty}^{\infty} w(rR - n) = W(e^{j0})/R \qquad \text{any } n$$

• the key to Short-Time Fourier Analysis is the ability to modify the short-time spectrum (via quantization, noise enhancement, signal enhancement, speed-up/slow-down, etc) and recover an "unaliased" modified signal

Page 103: Digital Speech Processing— Lecture 9 Short-Time Fourier ... · Short-Time Fourier Analysis Methods-Introduction. 2 General Discrete-Time Model of Speech Production Voiced Speech:

103

Overlap Addition (OLA) Method

• assume $X_n(e^{j\omega_k})$ sampled with period $R$ samples in time

$$Y_r(e^{j\omega_k}) = X_{rR}(e^{j\omega_k}), \qquad r \text{ integer},\ 0 \le k \le N-1$$

• the Overlap Add Method is based on the summation

$$y(n) = \sum_{r=-\infty}^{\infty}\left[\frac{1}{N}\sum_{k=0}^{N-1} Y_r(e^{j\omega_k})\, e^{j\omega_k n}\right] \qquad \text{(OLA Method)}$$

• for each value of $r$, compute the inverse transform of $Y_r(e^{j\omega_k})$ giving the sequences

$$y_r(m) = x(m)\, w(rR - m), \qquad -\infty < m < \infty$$

• the signal at time $n$ is obtained by summing the values at time $n$ of all the sequences, $y_r(m)$, that overlap at time $n$, giving

$$y(n) = \sum_{r=-\infty}^{\infty} y_r(n) = x(n) \sum_{r=-\infty}^{\infty} w(rR - n)$$

• if $w(n)$ has a bandlimited FT and if $X_n(e^{j\omega_k})$ is properly sampled in time (i.e., $R$ small enough to avoid aliasing) then

$$\sum_{r=-\infty}^{\infty} w(rR - n) \approx W(e^{j0})/R \qquad \text{-- independent of } n \text{, and}$$

$$y(n) = x(n)\, W(e^{j0})/R \qquad \text{-- exact reconstruction of } x(n)$$
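The OLA procedure above (inverse-transform each frame, overlap-add, then scale by R/W(e^{j0})) can be sketched in a few lines of NumPy. The random input, the periodic Hamming window, and the values of L, R and N are assumptions. The window is applied in the forward orientation rather than as w(rR − m), which does not change the overlap-add sum.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1024)            # stand-in signal (assumption)
L, R, N = 128, 32, 256                   # window length, frame shift, FFT size
w = np.hamming(L + 1)[:-1]               # periodic Hamming window (assumed choice)

y = np.zeros(len(x))
for start in range(0, len(x) - L + 1, R):
    Y_r = np.fft.fft(x[start : start + L] * w, N)   # sampled STFT for this frame
    # a modification of the short-time spectrum Y_r would be applied here
    y_r = np.fft.ifft(Y_r).real[:L]                 # windowed segment x(m) w(.)
    y[start : start + L] += y_r                     # overlap-add at the frame position

gain = w.sum() / R                        # W(e^{j0}) / R
x_hat = y / gain
print(np.max(np.abs(x_hat[L:-L] - x[L:-L])))
```

Only the interior samples are compared, because the first and last L samples are covered by fewer than four overlapping windows in this finite-length example.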