Digital Speech ProcessingDigital Speech ... - web.ece.ucsb.edu...a(t) (or equivalently a lowpass...

Digital Speech ProcessingDigital Speech Processing——Lecture 15Lecture 15Lecture 15Lecture 15

Speech Coding Methods Based Speech Coding Methods Based on Speech Waveform on Speech Waveform

R t ti d S hR t ti d S hRepresentations and Speech Representations and Speech ModelsModels——Uniform and NonUniform and Non--Uniform Coding MethodsUniform Coding Methods

1

Uniform Coding MethodsUniform Coding Methods

AnalogAnalog--toto--Digital ConversionDigital Conversion(Sampling and Quantization)(Sampling and Quantization)(Sampling and Quantization)(Sampling and Quantization)

2Class of “waveform coders” can be

represented in this manner

Information RateInformation RateInformation RateInformation Rate

Waveform coder information rate Ii ,

( )

Waveform coder information rate, of the digital representation of the signal,

d fi d

wIi

( )/

, defined as:

a

w S

x tI B F B T= ⋅ =

1where is the number of bits used to

t h l d i th

B

F ,represent each sample and is the

numbe

SFT

=

r of samples/second3

number of samples/second

Speech Information RatesSpeech Information RatesSpeech Information RatesSpeech Information Rates• Production level:

– 10-15 phonemes/second for continuous speech– 32-64 phonemes per language => 6 bits/phoneme– Information Rate=60-90 bps at the source– Information Rate=60-90 bps at the source

• Waveform level– speech bandwidth from 4 – 10 kHz => sampling rate

f 8 20 kHfrom 8 – 20 kHz– need 12-16 bit quantization for high quality digital

coding– Information Rate=96-320 Kbps => more than 3

orders of magnitude difference in Information Rates between the production and waveform levels

4

Speech Analysis/Synthesis SystemsSpeech Analysis/Synthesis Systems

• Second class of digital speech coding systems:o analysis/synthesis systemso model-based systemso hybrid coderso hybrid coderso vocoder (voice coder) systems

• Detailed waveform properties generally not preservedo coder estimates parameters of a model for speech production

5

o coder tries to preserve intelligibility and quality of reproduction from the digital representation

Speech Coder ComparisonsSpeech Coder Comparisons

• Speech parameters (the chosen representation) are• Speech parameters (the chosen representation) are encoded for transmission or storage

• analysis and encoding gives a data parameter vector• data parameter vector computed at a sampling rate much lower than the signal sampling rateg p g• denote the “frame rate” of the analysis as Ffr• total information rate for model-based coders is:

I B F= ⋅• where Bc is the total number of bits required to represent the parameter vector

m c frI B F=

6

Speech Coder ComparisonsSpeech Coder Comparisons

• waveform coders characterized by:• high bit rates (24 Kbps – 1 Mbps)• low complexitylow complexity• low flexibility

• analysis/synthesis systems characterized by:• low bit rates (10 Kbps – 600 bps)

hi h l it

7

• high complexity• great flexibility (e.g., time expansion/compression)

Introduction to Waveform CodingIntroduction to Waveform Coding

( ) ( )=x n x nT1 2

( ) ( )

( ) ( )π∞Ω

=

= Ω +∑a

j Ta

x n x nTj kX e X j

T T1 2( ) ( )ω ω πω

=−∞

∞

= Ω ⇒ = +

∑

∑

ak

ja

T Tj j kT X e X

T T T8

( ) ( )=−∞∑ a

kT T T

Introduction to Waveform CodingIntroduction to Waveform CodingT=1/ Fs

• to perfectly recover xa(t) (or equivalently a lowpass filtered version of it) from the set of digital samples (as yet unquantized) we require that F = 1/T > twicethe set of digital samples (as yet unquantized) we require that Fs = 1/T > twice the highest frequency in the input signal

• this implies that xa(t) must first be lowpass filtered since speech is not inherently lowpassinherently lowpass

for telephone bandwidth the frequency range of interest is 200-3200 Hz (filtering range) => Fs = 6400 Hz, 8000 Hz

f id b d h th f f i t t i 100 7000 H

9

for wideband speech the frequency range of interest is 100-7000 Hz (filtering range) => Fs = 16000 Hz

Sampling Speech SoundsSampling Speech Sounds

• notice high frequency components of vowels and fricatives (up to 10 kHz) => need F > 20 kHz without lowpass filtering

10

need Fs 20 kHz without lowpass filtering

• need only about 4 kHz to estimate formant frequencies

• need only about 3.2 kHz for telephone speech coding

Telephone Channel ResponseTelephone Channel ResponseTelephone Channel ResponseTelephone Channel Response

it is clear that 4 kHz bandwidth is sufficient for most li ti i t l h h b f i h t

11

applications using telephone speech because of inherent channel band limitations from the transmission path

Statistical Model of SpeechStatistical Model of SpeechStatistical Model of SpeechStatistical Model of Speech• assume xa(t) is a sample function of a continuous-time a( ) p

random process• then x(n) (derived from xa(t) by sampling) is also a

l f di t ti dsample sequence of a discrete-time random process• xa(t) has first order probability density, p(x), with

autocorrelation and power spectrum of the formautocorrelation and power spectrum of the form

[ ]( ) ( ) ( )φ τ τ= +a a aE x t x t[ ]( ) ( ) ( )

( ) ( ) τ

φ

φ τ τ∞

− ΩΦ Ω = ∫

a a a

ja a e d

12

−∞∫

Statistical Model of SpeechStatistical Model of SpeechStatistical Model of SpeechStatistical Model of Speech

[ ] ( ) has autocorrelation and power spectrum of the form:

( ) ( ) ( )• x n

E [ ][ ]

( ) ( ) ( )

( ) ( ) ( )

φ

φ∞

= +

= + =

∑

a a a

j T j T

m E x n x n m

E x nT x nT mT mT

1 2

( ) ( )

( )

φ

π

Ω − Ω

=−∞

∞

Φ =

= Φ Ω +

∑

∑

j T j Tm

m

a

e m e

kT T

( )

since ( ) is a sampφ=−∞

•

∑ akT T

m

2

led version of ( ), its transform is an infinite sum

of the power spectrum shifted by aliased version of the power

φ τ

π>

a

kof the power spectrum, shifted by aliased version of the power

spectrum of the original analog signal

=>T

13

Speech Probability Density FunctionSpeech Probability Density Function• probability density function for x(n) is the same as for xa(t) since

x(n)= xa(nT) => the mean and variance are the same for both x(n)and xa(t)and xa(t)

• need to estimate probability density and power spectrum from speech waveforms– probability density estimated from long term histogram of amplitudesprobability density estimated from long term histogram of amplitudes– good approximation is a gamma distribution, of the form:

1 2 33

/−⎡ ⎤ x

– simpler approximation is Laplacian density of the form:

23 08

( ) ( )| |

σ

πσ⎡ ⎤

= = ∞⎢ ⎥⎢ ⎥⎣ ⎦

x

x

p x e px

– simpler approximation is Laplacian density, of the form:

21 10| |

( ) ( )σ−

= =x

x

p x e p14

02 2

( ) ( )σ σx x

p x e p

Measured Speech DensitiesMeasured Speech DensitiesMeasured Speech DensitiesMeasured Speech Densities• distribution normalized so

i 0 d i i 1mean is 0 and variance is 1 (x=0, σx=1)

• gamma density more closely approximates measured distribution for speech than Laplacian

• Laplacian is still a good model and is used in analytical studiesy

• small amplitudes much more likely than large amplitudes—by 100:1 ratio

15

amplitudes by 100:1 ratio

LongLong--Time Autocorrelation and Power Time Autocorrelation and Power SpectrumSpectrumSpectrumSpectrum

{ }

analog signal

( ) ( ) ( ) ( ) ( ) τφ φ∞

− Ω

•

Φ Ω ∫ jE x t x t e d{ }

{ } { }

( ) ( ) ( ) ( ) ( )

discrete-time signal( ) ( ) ( ) ( ) ( ) ( )

τφ τ τ φ τ τ

φ φ

Ω

−∞

= + ⇔ Φ Ω =

•

+ +

∫ ja a a a aE x t x t e d

m E x n x n m E x nT x nT mT mT{ } { }1 2

( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

φ φ

πφ∞ ∞

Ω − Ω

=−∞ =−∞

= + = + =

⇔ Φ = = Φ Ω +∑ ∑

a a a

j T j Tma

m k

m E x n x n m E x nT x nT mT mT

ke m eT T

estimated correlation and power •1

0

1 0 1

spectrum

ˆ ( ) ( ) ( ), | | ,φ− −

=

= + ≤ ≤ −∑L m

nm x n x n m m L

L

Estimates ≠ Expectations

• tapering with m due to finite windows0

is large integer

ˆˆ ( ) ( ) ( )φ

=

Ω − ΩΦ = ∑

n

Mj T j Tm

L

e w m m e

• need window on estimated correlation for smoothing because of discontinuity at ±M

16

=−m M ±M

Speech AC and Power SpectrumSpeech AC and Power Spectrum

• can estimate long term autocorrelation and power spectrum

Lowpass filtered speech

autocorrelation and power spectrum using time-series analysis methods

1

10

1

ˆ[ ]ˆ[ ] ˆ[ ]φρφ

− ≤ =

L m

mmBandpass filtered speech

1

01

2

1

11

( ) ( ),

( )

− −

=− −

+= ≤

∑

∑

L m

nL m

x n x n mL

x n

• 8 kHz sampled speech for several speakers

• high correlation between adjacent0

0 1

( )

, is window length=

≤ ≤ −

∑n

x nL

m L L

high correlation between adjacent samples

• lowpass speech more highly correlated than bandpass speech

17

correlated than bandpass speech

Speech Power SpectrumSpeech Power SpectrumSpeech Power SpectrumSpeech Power Spectrum

• power spectrum estimate derived from one minute of speech

• peaks at 250-500 Hz (region of maximal spectral information)

• spectrum falls at about 8-10 db/octave

• computed from set of bandpass filters

18

Alternative Power Spectrum Alternative Power Spectrum E iE iEstimateEstimate

ˆestimate long term correlation ( )φ• m estimate long term correlation, ( ), using sampled speech, thenweight and transform, giving:

φ• m

2 2/ /ˆˆ ) ( ) ( )

ˆ this lets us use ( ) to get a power

π πφ

φ

−

=−

Φ =

•

∑M

j k N j km N

m M(e w m m e

m( ) g pˆspectrum estimate

φ

Φ(e 2 / )via the weighting window, ( )

πj k N

w m

Contrast linear versus logarithmic scale for power

19

logarithmic scale for power spectrum plots

Estimating Power Spectrum via Estimating Power Spectrum via M th d f A d P i dM th d f A d P i dMethod of Averaged PeriodogramsMethod of Averaged Periodograms Periodogram defined as:i

21

0

1

g

( ) ( ) ( )ω ω−

−

=

= ∑L

j j n

nP e x n w n e

LU

( )1

21 where [ ] is an -point window (e.g., Hamming window), and

( )−

= ∑

iL

w n L

U w nL

( )0

is a normalizing constant that compensates for window=∑

inL

U tapering use the DFT to compute the periodogram as:i

212 2

0

1 0 1( )/ ( / ) ( ) ( ) ( ) , ,π π−

−

=

= ≤ ≤ −∑L

j k N j N kn

nP e x n w n e k N

LU

20

is size of DFTN

Averaging (Short) PeriodogramsAveraging (Short) PeriodogramsAveraging (Short) PeriodogramsAveraging (Short) Periodograms variability of spectral estimates can be reduced by averaging

short periodograms, computed over a long segment of speechishort periodograms, computed over a long segment of speech using an -point window, a short periodogram is the same as

the STFT of the weLi

1

ighted speech interval, namely:rR L+ −1

2 / 2 /[ ] ( ) [ ] [ ] , 0 1

where is the DFT transform size (number of frequency estimates)C id i l f h ( h

rR Lj k N j km N

r rRm rR

X k X e x m w rR m e k N

NN

π π+

−

=

= = − ≤ ≤ −∑i

i l ) thN Consider using samples of speech (where SNi

212 / 2 /1( ) ( ) 0 1

is large); theaveraged periodogram is defined as:

S

Kj k N j k N

N

e X e k Nπ π−

Φ ≤ ≤∑0

( ) ( ) , 0 1

where is the number of windowed segments in samples, is the window length, and is the windo

j jrR

r

S

e X e k NKLU

K NL U

=

Φ = ≤ ≤ −∑i

w normalization factor

21

s t e do e gt , a d s t e doU/ 2

o a at o acto use (shift window by half the window duration between

adjacent periodogram estimates)R L=i

88,00088,000--Point FFT Point FFT –– Female Female S kS kSpeakerSpeaker

22Single spectral estimate using all signal samples

Long Time Average SpectrumLong Time Average SpectrumLong Time Average SpectrumLong Time Average Spectrum

23

Long Time Average SpectrumLong Time Average Spectrum

Female speaker

M l kMale speaker

24

Instantaneous QuantizationInstantaneous QuantizationInstantaneous QuantizationInstantaneous Quantization• separating the processes of sampling andseparating the processes of sampling and

quantization• assume x(n) obtained by sampling aassume x(n) obtained by sampling a

bandlimited signal at a rate at or above the Nyquist rateyq

• assume x(n) is known to infinite precision in amplitudep

• need to quantize x(n) in some suitable manner

25

Quantization and CodingQuantization and CodingCoding is a two-stage process 1. quantization process:

ˆ( ) ( )ˆ( ) ( ) 2. encoding process:

ˆ( ) ( )

→

→

x n x n

x n c n( ) ( )- where is the (assumed fixed) quantization step size

→Δ

x n c n

Decoding is a single-stage process 1. decoding process:

ˆ( ) ( )′ ′→c n x n( ) ( )if ( ) ( ), (no errors in

ˆ ˆ transmission) then ( ) ( )

→′• =

′ =

c n x nc n c n

x n x n

26

ˆ ( ) ( ) coding and quantization loses information

′ ≠ ⇒x n x nassume Δ’=Δ

BB--bit Quantizationbit QuantizationBB bit Quantizationbit Quantization• use B-bit binary numbers to represent the quantized y p q

samples => 2B quantization levels• Information Rate of Coder: I=B FS= total bit rate in

bits/secondbits/second– B=16, FS= 8 kHz => I=128 Kbps– B=8, FS= 8 kHz => I=64 Kbps– B=4 F = 8 kHz => I=32 Kbps– B=4, FS= 8 kHz => I=32 Kbps

• goal of waveform coding is to get the highest quality at a fixed value of I (Kbps), or equivalently to get the lowest value of I for a fixed qualityvalue of I for a fixed quality

• since FS is fixed, need most efficient quantizationmethods to minimize I

27

Quantization BasicsQuantization BasicsQuantization BasicsQuantization Basics

| ( )| ( )• assume |x(n)| ≤ Xmax (possibly ∞)– for Laplacian density (where Xmax = ∞),p y ( max ),

can show that 0.35% of the samples fall outside the range -4σx ≤ x(n) ≤ 4σx => g x ( ) xlarge quantization errors for 0.35% of the samplesp

– can safely assume that Xmax is proportional to σx

28

proportional to σx

Quantization ProcessQuantization Process• quantization => dividing amplitude

range into a finite set of ranges, and assigning the same bin to all

3-bit quantizer => 8 levels

and assigning the same bin to all samples in a given range

0 1 10 100ˆ( ) ( )= < ≤ ⇒x x n x x

1 2 2

2 3 3

101110

ˆ( ) ( )ˆ( ) ( )

< ≤ ⇒

< ≤ ⇒

x x n x xx x n x x

3 4

1 0 1

1110 011

010

ˆ( ) ( )ˆ( ) ( )ˆ( ) ( )

− −

< < ∞ ⇒

< ≤ = ⇒

x x n xx x n x x

2 1 2

3 2 3

010001000

( ) ( )ˆ( ) ( )ˆ( ) ( )

− − −

− − −

< ≤ ⇒

< ≤ ⇒

∞ < ≤ ⇒

x x n x xx x n x x

x n x x• codewords are arbitrary!! => th d h i th t

29

3 4 000( ) ( ) range level

− −− ∞ < ≤ ⇒x n x xcodeword

there are good choices that can be made (and bad choices)

Uniform QuantizationUniform QuantizationUniform QuantizationUniform Quantization• choice of quantization ranges and levels so that

signal can easily be processed digitallysignal can easily be processed digitally

mid-riser mid-treadd se mid-tread

1

ˆ ˆ−− = Δi ix x

30

1ˆ ˆquantization step size

−− = Δ

Δ =i ix x

MidMid--Riser and MidRiser and Mid--Tread QuantizersTread Quantizers• mid-riser

– origin (x=0) in middle of rising part of the staircase– same number of positive and negative levels– symmetrical around origin

• mid-tread– origin (x=0) in middle of quantization level

ti l l th iti– one more negative level than positive– one quantization level of 0 (where a lot of activity occurs)

• code words have direct numerical significance (sign-magnitude representation for mid-riser two’s complement for mid-tread)representation for mid riser, two s complement for mid tread)

[ ]2

- for mid-riser quantizer:

ˆ( ) ( ) ( )Δ= + Δx n sign c n c n

[ ]2

1 0

1 1

where ( ) if first bit of ( ) if first bit of ( )

- for mid-tread quantizer code words are 3-bit

= + =

= − =

sign c n c nc n

31

- for mid-tread quantizer code words are 3-bittwo's complement representation, giving

ˆ ( ) ( ) = Δx n c n

AA--toto--D and DD and D--toto--A ConversionA Conversionx̂[n] x [n]

x[n]

ˆ[ ] [ ] [ ]= −e n x n x n

Quantization error[ ] [ ] [ ]

32

ˆ x B[n]

Quantization of a Sine WaveQuantization of a Sine WaveUnquantizedsinewave

3-bit quantizationquantizationwaveform

3 bit3-bitquantizationerror

Δ

8-bitquantization

Δ

33

error

Quantization of Complex SignalQuantization of Complex SignalQuantization of Complex SignalQuantization of Complex Signal[ ] sin(0.1 ) 0.3cos(0.3 )x n n n= +

(a) Unquantized signal

(b) 3-bit quantized x[n]

(c) 3-bit quantization error e[n]error, e[n]

(d) 8-bit quantization

34

error, e[n]

Uniform QuantizersUniform QuantizersUniform QuantizersUniform Quantizers• Uniform Quantizers characterized by:

number of levels 2B (B bits)– number of levels—2B (B bits)– quantization step size-Δ

• if |x(n)|≤Xmax and x(n) is a symmetric density, thenΔ2B 2 XΔ2B = 2 Xmax

Δ= 2 Xmax/ 2B

• if we let

• with x(n) the unquantized speech sample and e(n) the quantization

ˆ( ) ( ) ( )x n x n e n= +

with x(n) the unquantized speech sample, and e(n) the quantization error (noise), then

( )Δ Δ≤ ≤e n

(except for last quantization level which can exceed X

352 2

( )− ≤ ≤e n level which can exceed Xmaxand thus the error can exceed

Δ/2)

Quantization Noise ModelQuantization Noise Model1. quantization noise is a zero-mean, stationary

white noise processwhite noise process[ ] 2 0

0

( ) ( ) ,σ+ = =

=eE e n e n m m

otherwise

2. quantization noise is uncorrelated with the input signal

0 otherwise

input signal

3 distribution of quantization errors is uniform[ ] 0( ) ( )E x n e n m m+ = ∀

3. distribution of quantization errors is uniformover each quantization interval

1 2 2= Δ − Δ ≤ ≤ Δep e e( ) / / /2

20 σ Δ⇒ = =e

36

1 2 20Δ Δ ≤ ≤ Δ

=ep e e

otherwise( ) / / / 0

12, σ⇒ ee

Quantization ExamplesQuantization Examples

S h Si l 3 bi Q i i

how good is the d l f th

Speech Signal 3-bit Quantization Error

model of the previous slide?

378-bit Quantization Error (scaled)

Typical Amplitude DistributionsTypical Amplitude Distributions

Laplaciandistribution isdistribution isoften used asa model for speech signalsspeech signals

A 3-bit quantized i l h lsignal has only

8 different values

38

33--Bit Speech QuantizationBit Speech Quantization

Input toquantizer

3-bit ti tiquantization

waveform

39

33--Bit Speech Quantization ErrorBit Speech Quantization Error

Input topquantizer

3-bitquantizationquantizationerror

40

55--Bit Speech QuantizationBit Speech Quantization

Input toquantizer

5-bit ti tiquantization

waveform

41

55--Bit Speech Quantization ErrorBit Speech Quantization Error

Input topquantizer

5-bitquantizationquantizationerror

42

Histograms of Quantization NoiseHistograms of Quantization NoiseHistograms of Quantization NoiseHistograms of Quantization Noise

bi3-bit quantizationhistogramg

8-bit quantizationhistogram

43

s og

Spectra of Quantization NoiseSpectra of Quantization Noise

≈ 12 dB

44

Correlation of Quantization ErrorCorrelation of Quantization ErrorCorrelation of Quantization ErrorCorrelation of Quantization Error

3-bit Quantization

8-bit Quantization

45assumptions look good at 8-bit quantization, not as good at 3-bit levels

(however only 6 db variation in power spectrum level)

8 bit Quantization

Sound DemoSound Demo• “Original” speech sampled at 16kHz, 16 bits/sample

• Quantized to 10 bits/sample

• Quantization error (x32) for 10 bits/sample

• Quantized to 5 bits/sample

• Quantization error for 5 bits/sample

46

SNR for QuantizationSNR for QuantizationSNR for QuantizationSNR for Quantization• can determine SNR for quantized speech as

( )( )

222

2 22

( )( )

( )( )σσ

= = =∑∑

x nx nE x n

SNRe nE e n( )2

22

max

( )( )

(uniform quantizer step size)

σ

Δ =

∑en

B

e nE e n

X2

12 2

assume ( ) (uniform distribution)Δ Δ• = − ≤ ≤

Δ

B

p e e

222

2

0

12 3 2max

( )

σ

=

Δ= =e B

otherwiseX

47

12 3 2( )

SNR for QuantizationSNR for QuantizationSNR for QuantizationSNR for Quantization• can determine SNR for quantized speech as

( )( )

222

2 22

( )( )

( )( )σσ

= = =∑∑

x n

en

x nE x nSNR

e nE e n

222

2

22

12 3 2max

( )σ Δ

= =

⎡ ⎤ ⎡ ⎤

n

e B

B

X

22

10 102 2

3 2 10 6 4 77 20 max

max

( ) ; ( ) log . logσσ σ

σ

⎡ ⎤ ⎡ ⎤= = = + −⎢ ⎥ ⎢ ⎥⎡ ⎤ ⎣ ⎦⎣ ⎦⎢ ⎥⎣ ⎦

Bx

e x

x

XSNR SNR dB BX

4 6 7 216

maxif we choose , then . ,

σ⎣ ⎦

• = = −

=

x

xX SNR BB S 88 8

8 40 8.=NR dB

B SNR dB The term Xmax/σx tells how

48

8 40 83 10 8

, . , .

= == =

B SNR dBB SNR dB

max xbig a signal can be

accurately represented

Variation of SNR with Signal LevelVariation of SNR with Signal Level

Overloadf

SNR improves 6 dB/bit,from

clipping

pbut it decreases 6 dB for each halving of theinput signal amplitudeinput signal amplitude

12 dB

49

Clipping StatisticsClipping StatisticsClipping StatisticsClipping Statistics

50

ReviewReview----Linear Noise ModelLinear Noise Model

2 2{( [ ]) } σ= xE x n• assume speech is stationary random signal.• error is uncorrelated with the input.

• error is uniformly distributed over the interval

0{ [ ] [ ]} { [ ]} { [ ]}= =E x n e n E x n E e n

y

• error is stationary white noise, (i.e. flat spectrum)2 2( / ) [ ] ( / ).− Δ < ≤ Δe n

51

e o s s a o a y e o se, ( e a spec u )2

2

12( ) , | |ω σ ω πΔ

= = ≤e eP

Review of Quantization Review of Quantization A iA iAssumptionsAssumptions

1. input signal fluctuates in a complicated p g pmanner so a statistical model is valid

2. quantization step size is small enough to i l l t d tt iremove any signal correlated patterns in

quantization error3 range of quantizer matches peak to peak3. range of quantizer matches peak-to-peak

range of signal, utilizing full quantizer range with essentially no clippingg y pp g

4. for a uniform quantizer with a peak-to-peak range of ±4 σx, the resulting SNR(dB) i 6B 7 2

52

SNR(dB) is 6B-7.2

Uniform Quantizer SNR IssuesUniform Quantizer SNR IssuesUniform Quantizer SNR IssuesUniform Quantizer SNR IssuesSNR=6B-7.2

• to get an SNR of at least 30 dB, need at least B≥6 bits (assuming Xmax=4 σx)– this assumption is weak across speakers and differentthis assumption is weak across speakers and different

transmission environments since σx varies so much (order of 40 dB) across sounds, speakers, and input conditions

– SNR(dB) predictions can be off by significant amounts if full ( ) p y gquantizer range is not used; e.g., for unvoiced segments => need more than 6 bits for real communication systems, more like 11-12 bitsneed a quantizing system where the SNR is independent of the– need a quantizing system where the SNR is independent of the signal level => constant percentage error rather than constant variance error => need non-uniform quantization

53

Instantaneous CompandingInstantaneous Companding• to order to get constant percentage error (rather than constant

variance error), need logarithmically spaced quantization levels– quantize logarithm of input signal rather than input signal itself

54

μμ--Law CompandingLaw Companding

μ−LawCompander Quantizer

y[n] ˆ y [n]x[n]

( ) ln | ( ) |=y n x n

[ ] [ ][ ] 1 0

( ) exp ( ) ( )

where ( ) ( )

= ⋅

• = + ≥

x n y n sign x n

sign x n x n[ ] 1 0

1 0

where ( ) ( )( )

th ti d l it d i

• = + ≥

= − <

sign x n x n

x n

[ ] the quantized log magnitude is

ˆ log•

=y(n) Q | x(n) |

55log | ( ) | ( ) new error signalε= +x n n

μμ--Law CompandingLaw Companding

[ ] assume that ( ) is independent of log | ( ) | . The inverse is

ˆ ˆexp ( )ε•

⎡ ⎤⎣ ⎦

n x nx(n) y(n) sign x n[ ]

[ ] [ ][ ]

exp ( )

| ( ) | ( ) exp ( )

( ) exp ( )

ε

ε

= ⋅⎡ ⎤⎣ ⎦= ⋅

= ⋅

x(n) y(n) sign x n

x n sign x n n

x n n[ ][ ] 1

1

( ) exp ( )

assume ( ) is small, then exp ( ) ( ) ...

ˆ ( )

ε

ε ε ε• ≈ + +

= +

x n n

n n n

x(n) x n [ ]( ) ( ) ( ) ( ) ( ) ( )ε ε= + = +n x n n x n x n f n

2 2 2

2

since we assume ( ) and ( ) are independent, then

ε

ε

σ σ σ

•

= ⋅f x

x n n

2

2 2

2

1

is independent of --it depends only on step sizeε

σσ σ

σ

= =

•

x

f

x

SNR

SNR

56

p p y px

PseudoPseudo--Logarithmic CompressionLogarithmic Compressiong pg p

• unfortunately true logarithmic compression is not practical, since the dynamic range (ratio between the largest and smallest values) is infinite => need anrange (ratio between the largest and smallest values) is infinite > need an infinite number of quantization levels

• need an approximation to logarithmic compression => μ-law/A-law compression

57

μμ--law Compressionlaw Compressionμμ law Compressionlaw Compression[ ]

| ( ) |

=

⎡ ⎤

y(n) F x(n)

[ ]1

1max

max

| ( ) |log

log( )

μ

μ

⎡ ⎤+⎢ ⎥

⎣ ⎦= ⋅+

x nX

X sign x(n)

0 00

when ( ) ( )when ( ) ( ) linear compressionμ

• = ⇒ =• ⇒

x n y ny n x n0

max

when , ( ) ( ) linear compression when is large, and for large | ( ) |

| ( ) | | ( ) | loglog

μμ

μ

• = = ⇒•

⎡ ⎤≈ ⋅ ⎢ ⎥

⎣ ⎦

y n x nx n

X x ny nXmaxlogμ ⎢ ⎥

⎣ ⎦X

58

Histogram for Histogram for μμ --Law CompandingLaw Companding

SpeechSpeechwaveform

Output ofLμ-Law

compander

59

μμ--law approximation to loglaw approximation to log law encoding gives a good approximation to constant percentage

error (| ( ) | log | ( ) |)

μ• −

∝y n x n

60

SNR forSNR for μμ--law Quantizerlaw QuantizerSNR for SNR for μμ law Quantizerlaw Quantizer

[ ]2

10 106 4 77 20 1 10 1 2max max( ) . log ln( ) logμμσ μσ

⎡ ⎤⎛ ⎞ ⎛ ⎞⎢ ⎥= + − + − + +⎜ ⎟ ⎜ ⎟⎢ ⎥⎝ ⎠ ⎝ ⎠⎣ ⎦

X XSNR dB B

6

max

dependence on

h l d d

μσ μσ⎢ ⎥⎝ ⎠ ⎝ ⎠⎣ ⎦• ⇒

x x

B BX

good

dmax

max

much less dependence on

for large , is less sensitive to changes in

σ

μ

• ⇒

• ⇒

x

XSNR

good

goodσ x

- -law quantizer used in wireline telephony for more than 40 yearsμ

61

μμ--Law CompandingLaw Compandingμμ Law CompandingLaw Companding

Mu-law compressed signalcompressed signal utilizes almost the full dynamic range (±1) much more ( )

effectively than the original speech

signal

62

μμ--Law Quantization ErrorLaw Quantization Errorμμ Law Quantization ErrorLaw Quantization Error(a) Input speech

signal

(b) Histogram of 255μ=255

compressed speech

(c) Histogram of 8-bit companded quantization

63

quantization error

Comparison of Linear and Comparison of Linear and μμ--law Quantizerslaw Quantizers

Dashed lineDashed line –linear (uniform) quantizers with 6quantizers with 6, 7, 8 and 12 bit quantizers

Solid lineSolid line – μ-law quantizers with 6 7 and 8 bitwith 6, 7 and 8 bit quantizers (μ=100)

• can see in these plots that Xmax characterizes the quantizer (it specifies the ‘overload’ amplitude) and σ characterizes the signal (it

64

specifies the overload amplitude), and σx characterizes the signal (it specifies the signal amplitude), with the ratio (Xmax/σx) showing how the signal is matched to the quantizer

Comparison of Linear and Comparison of Linear and μμ--law law Q tiQ tiQuantizersQuantizers

Dashed lineDashed line –linear (uniform) quantizers with 6quantizers with 6, 7, 8 and 13 bit quantizers

Solid lineSolid line – μ-law quantizers with 6 7 and 8 bitwith 6, 7 and 8 bit quantizers (μ=255)

65

Analysis of Analysis of μμ--Law PerformanceLaw Performance• curves show that μ-law quantization can maintain

roughly the same SNR over a wide range ofroughly the same SNR over a wide range of Xmax/σx, for reasonably large values of μ– for μ=100, SNR stays within 2 dB for

8< X /σ <308< Xmax/σx<30– for μ=500, SNR stays within 2 dB for

8< Xmax/σx<150loss in SNR in going from μ=100 to μ=500 is about 2 6– loss in SNR in going from μ=100 to μ=500 is about 2.6 dB => rather small sacrifice for much greater dynamic rangeB=7 gives SNR=34 dB for μ=100 => this is 7 bit μ law– B=7 gives SNR=34 dB for μ=100 => this is 7-bit μ-law PCM-the standard for toll quality speech coding in the PSTN => would need about 11 bits to achieve this dynamic range and SNR using a linear quantizer

66

dynamic range and SNR using a linear quantizer

CCITT G.711 StandardCCITT G.711 StandardCCITT G.711 StandardCCITT G.711 Standard

• Mu-law characteristic is approximated byMu-law characteristic is approximated by 15 linear segments with uniform quantization within a segmentquantization within a segment.– uses a mid-tread quantizer. +0 and –0 are

defineddefined.– decision and reconstruction levels defined to

be integersbe integers• A-law characteristic is approximated by 13

linear segments67

linear segments– uses a mid-riser quantizer

Summary of Uniform andSummary of Uniform and μμ--Law PCMLaw PCMSummary of Uniform and Summary of Uniform and μμ Law PCMLaw PCM

• quantization of sample values is unavoidable in DSP li ti d i di it l t i i d t fapplications and in digital transmission and storage of

speech• we can analyze quantization error using a random noise

d lmodel• the more bits in the number representation, the lower the

noise. The ‘fundamental theorem’ of uniform quantization is that “the signal to noise ratio increases 6quantization is that the signal-to-noise ratio increases 6 dB with each added bit in the quantizer”; however, if the signal level decreases while keeping the quantizer step-size the same it is equivalent to throwing away one bitsize the same, it is equivalent to throwing away one bit for each halving of the input signal amplitude

• μ-law compression can maintain constant SNR over a wide dynamic range, thereby reducing the dependency

68

wide dynamic range, thereby reducing the dependency on signal level remaining constant

Quantization for Optimum SNR Quantization for Optimum SNR (MMSE)(MMSE)(MMSE)(MMSE)

• goal is to match quantizer to actual signalgoal is to match quantizer to actual signal density to achieve optimum SNRμ law tries to achieve constant SNR over wide– μ-law tries to achieve constant SNR over wide range of signal variances => some sacrifice over SNR performance when quantizer step p q psize is matched to signal variance

– if σx is known, you can choose quantizer x y qlevels to minimize quantization error variance and maximize SNR

69

Quantization for MMSEQuantization for MMSEQuantization for MMSEQuantization for MMSE• if we know the size of the signal (i.e., we know

the signal variance σ 2) then we can design athe signal variance, σx2) then we can design a

quantizer to minimize the mean-squared quantization error.q– the basic idea is to quantize the most probable

samples with low error and least probable with higher error.error.

– this would maximize the SNR– general quantizer is defined by defining M

t ti l l d t f M d i i l lreconstruction levels and a set of M decision levels defining the quantization “slots”.

– this problem was first studied by J. Max, “Quantizing

70

for Minimum Distortion,” IRE Trans. Info. Theory, March,1960.

Quantizer Levels for Maximum Quantizer Levels for Maximum SNRSNRSNRSNR

( )22 2

variance of quantization noise is:

ˆ( ) ( ) ( )

•

⎡ ⎤⎡ ⎤E E ( )[ ]

2 2

2 2 1 1 1 2( / ) ( / ) ( / )

( ) ( ) ( )

ˆ with ( ) ( ) . Assume quantization levels

ˆ ˆ ˆ ˆ ˆ , ,..., , ,...,

σ

− − + −

⎡ ⎤⎡ ⎤= = −⎣ ⎦ ⎣ ⎦• =

⎡ ⎤⎣ ⎦

e

M M M

E e n E x n x n

x n Q x n M

x x x x x2 2 1 1 1 2( / ) ( / ) ( / ), , , , , ,

associating quant− − + −⎣ ⎦

•M M M

1

ization level with signal intervals as:ˆ quantization level for interval ,

f t i di t ib ti ith l lit d ( )−⎡ ⎤= ⎣ ⎦j j jx x x

for symmetric, zero-mean distributions, with large amplitudes ( )it makes sense to define the boundary • ∞

0 20 /

points: (central boundary point),

th i i th±= = ±∞Mx x

2 2

the error variance is thus

( )σ∞

−∞

•

= ∫e ee p e de

71

Optimum Quantization LevelsOptimum Quantization LevelsOptimum Quantization LevelsOptimum Quantization Levels

ˆby definition ; thus we can make a change of variables to give:e x x

ˆ/

by definition, ; thus we can make a change of variables to give:ˆ ˆ ( ) ( ) ( / ) ( )

giving

• = −= − = =

•e e xx x

e x xp e p x x p x x p x

1

22 2

2 1

/

/

ˆ ( ) ( )

assuming ( ) ( ) so that th

σ−

=− +

= −

• = −

∑ ∫i

i

xM

e i xi M x

x x p x dx

p x p x e optimum quantizer is antisymmetric then assuming ( ) ( ) so that th• = −p x p x e optimum quantizer is antisymmetric, thenˆ ˆ and

thus we can write the error variance as− −= − = −

•i i i ix x x x

1

22 2

1

2

2/

ˆ ( ) ( )

goal is to minimize through choices of

σ

σ−

=

= −

•

∑ ∫i

i

xM

e i xi x

x x p x dx

ˆ{ } and { }i ix x

72

goal is to minimize through choices ofσ• e { } and { }i ix x

Solution for Optimum LevelsSolution for Optimum Levelspp2ˆ to solve for optimum values for { } and { }, we differentiate wrt

the parameters set the derivative to 0 and solve numerically:σ• i i ex x

1

2 0 1 2 2 1

the parameters, set the derivative to 0, and solve numerically:

ˆ ( ) ( ) , ,..., / ( )−

− = =∫i

i

x

i xx

x x p x dx i M

12

(=ix ( )

( )1

0 2

1 2 2 1 2

0 3/

ˆ ˆ ) , ,..., /

with boundary conditions of ,

+

±

+ = −

• = = ±∞

i i

M

x x i M

x x ( )0 2/y can also constrain quantizer to be uniform and solve for value of Δ

that maximizes SNRoptimum boundary points lie halfway betw

±

•M

2een / quantizer levelsM- optimum boundary points lie halfway betw

1

2een / quantizer levelsˆ - optimum location of quantization level is at the centroid of the probability

density over the interval to −

i

i i

Mx

x x

73

- solve the above set of equations (1,2,3) iterative { } { }ˆly for ,i ix x

Optimum Quantizers for Laplace DensityOptimum Quantizers for Laplace Density

Assumes1=xσ

74

M. D. Paez and T. H. Glisson, “Minimum Mean-Squared Error QuantizationIn Speech,” IEEE Trans. Comm., April 1972.

Optimum Quantizer for 3Optimum Quantizer for 3--bit Laplace Density; bit Laplace Density; Uniform CaseUniform CaseUniform CaseUniform Case

75step size decreases roughly exponentially

with increasing number of bitsquantization levels get further apart as

the probability density decreases

Performance of Optimum QuantizersPerformance of Optimum Quantizers

76

SummarySummarySummarySummary• examined a statistical model of speech showing

probability densities autocorrelations and power spectraprobability densities, autocorrelations, and power spectra• studied instantaneous uniform quantization model and

derived SNR as a function of the number of bits in the quantizer, and the ratio of signal peak to signal variancequantizer, and the ratio of signal peak to signal variance

• studied a companded model of speech that approximated logarithmic compression and showed that the resulting SNR was a weaker function of the ratio of gsignal peak to signal variance

• examined a model for deriving quantization levels that were optimum in the sense of matching the quantizer to p g qthe actual signal density, thereby achieving optimum SNR for a given number of bits in the quantizer

77

Digital Speech ProcessingDigital Speech ... - web.ece.ucsb.edu...a(t) (or equivalently a lowpass...

Documents

Transcript of Digital Speech ProcessingDigital Speech ... - web.ece.ucsb.edu...a(t) (or equivalently a lowpass...