Noise Reduction in Speech Recognition. Professor: Jian-Jiun Ding. Student: Yung Chang. 2011/05/06.
Noise Reduction in Speech Recognition
Professor: Jian-Jiun Ding
Student: Yung Chang
2011/05/06
Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
- Feature-based: CMS, CMVN, HEQ
- Feature-based: RASTA, data-driven temporal filtering
- Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications
Mel Frequency Cepstral Coefficients (MFCC)
The most commonly used feature in speech recognition.
Advantages: high accuracy and low complexity.
39 dimensions.
Mel Frequency Cepstral Coefficients (MFCC)
The framework of feature extraction:
speech signal x(n) → pre-emphasis x'(n) → windowing xt(n) → DFT At(k) → Mel filter-bank Yt(m) → log(| |^2) Yt'(m) → IDFT yt(j) → MFCC, plus energy and first/second derivatives.
Pre-emphasis
Pre-emphasis boosts the spectrum at higher frequencies, mapping x[n] to x'[n].
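A minimal sketch of the pre-emphasis step as a first-order high-pass filter, x'[n] = x[n] - a*x[n-1]; the coefficient value a = 0.95 is a common choice, not taken from the slides:

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    """High-pass the signal: x'[n] = x[n] - a * x[n-1]."""
    x = np.asarray(x, dtype=float)
    # First sample has no predecessor, so it is passed through unchanged.
    return np.append(x[0], x[1:] - a * x[:-1])
```

A constant (DC-heavy) signal is strongly attenuated while fast changes pass through, which is exactly the "emphasize higher frequencies" behavior the slide describes.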
End-point Detection (Voice Activity Detection)
Separates speech segments from noise (silence) before feature extraction.
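A naive energy-based endpoint detector as a sketch of the idea; the frame length and threshold ratio are illustrative choices, not from the slides:

```python
import numpy as np

def energy_vad(x, frame_len=256, threshold_ratio=0.1):
    """Mark frames whose energy exceeds a fraction of the maximum
    frame energy as speech; the rest is treated as noise (silence)."""
    n = len(x) // frame_len
    energies = np.array([np.sum(x[i * frame_len:(i + 1) * frame_len] ** 2)
                         for i in range(n)])
    return energies > threshold_ratio * energies.max()
```

Real systems use more robust statistics (zero-crossing rate, noise-floor tracking), but the frame-energy comparison captures the basic speech/silence split.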
Windowing
Rectangular window
Hamming window
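The Hamming window can be sketched directly from its standard definition:

```python
import numpy as np

def hamming(N):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
```

Unlike the rectangular window, the Hamming window tapers each frame toward its edges, which reduces spectral leakage after the DFT.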
Mel-filter bank
After the DFT we get the magnitude spectrum (amplitude versus frequency).
Mel-filter bank
- Triangular filters in frequency (overlapping)
- Uniformly spaced below 1 kHz
- Logarithmically spaced above 1 kHz
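The spacing the slide describes (roughly linear below 1 kHz, logarithmic above) follows from the standard mel-scale formula; a sketch of the conversions and of filter center frequencies placed uniformly on the mel axis:

```python
import numpy as np

def hz_to_mel(f):
    """Standard mel-scale formula."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_low, f_high):
    # n_filters triangular filters need n_filters + 2 edge points;
    # equal spacing in mel gives the linear/log spacing in Hz.
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    return mel_to_hz(mels)
```

Each triangular filter then spans from one center point to the next-but-one, so adjacent filters overlap by half.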
Delta Coefficients
First- and second-order differences of the cepstral features. Each order adds 13 dimensions to the 13 static coefficients, giving 39 dimensions in total.
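In practice the deltas are usually computed with a regression over neighboring frames rather than a plain one-step difference; a sketch under that common formulation (the window size K is illustrative):

```python
import numpy as np

def deltas(feats, K=2):
    """First-order regression deltas over a (frames, dims) array:
    d_t = sum_k k*(c_{t+k} - c_{t-k}) / (2 * sum_k k^2), edges padded."""
    T = len(feats)
    padded = np.pad(feats, ((K, K), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, K + 1))
    return np.array([
        sum(k * (padded[t + K + k] - padded[t + K - k])
            for k in range(1, K + 1)) / denom
        for t in range(T)
    ])
```

Stacking the 13 static coefficients with `deltas(c)` and `deltas(deltas(c))` yields the 39-dimensional vector.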
Mismatch in Statistical Speech Recognition
In recognition, the original speech x[n] passes through acoustic reception h[n], microphone distortion, and a phone/wireless channel, picking up additive noise n1(t) and n2(t) along the way; the result is the input signal y[n]. Feature extraction converts y[n] into feature vectors O = o1 o2 ... oT, and the search stage, using the acoustic models, lexicon, and language model, produces the output sentence W = w1 w2 ... wR.
In training, a speech corpus goes through the same feature extraction into model training to build the acoustic models, and a text corpus builds the language model. Mismatch arises because recognition-time speech carries convolutional noise (h[n]) and additive noise that the training data did not.

Possible Approaches for Acoustic Environment Mismatch
- Speech enhancement (recover x[n] from y[n])
- Feature-based approaches
- Model-based approaches
Feature-based Approach: Cepstral Mean Subtraction (CMS, CMVN)
Cepstral Mean Subtraction (CMS) targets convolutional noise. Convolution in the time domain becomes addition in the cepstral domain: y[n] = x[n]*h[n] implies y = x + h, with x, y, h in the cepstral domain. Most convolutional noise changes only very slightly over a reasonable time interval, so x = y - h. Assuming E[x] = 0, then E[y] = h, and
xCMS = y - E[y]
CMS shifts the observed distribution P(y) back toward the clean distribution P(x).
Feature-based Approach: Cepstral Moment Normalization (CMVN)
CMVN normalizes the variance as well:
xCMVN = xCMS / [Var(xCMS)]^(1/2)
so both the mean and the variance of the features match across environments.
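A minimal per-utterance CMVN sketch over a (frames, dims) feature array; the eps guard against zero variance is an implementation detail, not from the slides:

```python
import numpy as np

def cmvn(feats, eps=1e-8):
    """Per-utterance CMVN: subtract the mean of each cepstral dimension
    (this part alone is CMS) and divide by its standard deviation."""
    mu = feats.mean(axis=0)
    sigma = feats.std(axis=0)
    return (feats - mu) / (sigma + eps)
```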
Feature-based Approach: HEQ (Histogram Equalization)
The whole distribution is equalized, not just the first two moments:
y = CDFy^(-1)(CDFx(x))
Each feature value x is mapped through its cumulative distribution to the point with the same cumulative probability under the reference distribution.
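One way to realize the HEQ mapping is with empirical distributions: push each value of x through its own empirical CDF, then look up the matching quantile of a reference sample. The function names are illustrative:

```python
import numpy as np

def heq(x, reference):
    """Histogram equalization: y = CDF_ref^{-1}(CDF_x(x)), using the
    empirical CDF of x and empirical quantiles of a reference sample."""
    ranks = np.argsort(np.argsort(x))       # rank of each value in x
    cdf = (ranks + 0.5) / len(x)            # empirical CDF_x(x)
    return np.quantile(np.sort(reference), cdf)
```

The mapping is monotone, so the ordering of feature values is preserved while their distribution is reshaped to match the reference.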
Feature-based Approach: RASTA
Plotting each feature dimension over time gives a trajectory with its own modulation-frequency content. RASTA performs filtering on these trajectories (temporal filtering).
Feature-based Approach: RASTA (Relative Spectral Temporal filtering)
Assumes the rate of change of noise often lies outside the typical rate of change of the vocal tract shape.
A specially designed temporal band-pass filter; the classic RASTA filter is
B(z) = 0.1 * z^4 * (2 + z^(-1) - z^(-3) - 2z^(-4)) / (1 - 0.98z^(-1))
which emphasizes the modulation frequencies (around 4 Hz) where speech energy concentrates and suppresses slowly or very rapidly varying components.
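A causal sketch of the classic RASTA temporal filter, numerator 0.1*(2 + z^-1 - z^-3 - 2z^-4) with a pole at 0.98, applied along the time axis of one log-spectral band; the z^4 advance in the usual statement only shifts the output by four frames, so this sketch drops it:

```python
import numpy as np

def rasta_filter(band):
    """Band-pass one log-spectral trajectory over time.
    Zero gain at DC (the numerator taps sum to 0) removes the slowly
    varying convolutional component, matching the RASTA assumption."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # FIR taps
    band = np.asarray(band, dtype=float)
    out = np.zeros_like(band)
    for t in range(len(band)):
        acc = sum(b[k] * band[t - k] for k in range(5) if t - k >= 0)
        if t > 0:
            acc += 0.98 * out[t - 1]                   # IIR pole
        out[t] = acc
    return out
```

A constant input (a stationary channel offset) decays to zero, while mid-rate modulations typical of speech pass through.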
Data-driven Temporal filtering
PCA (Principal Component Analysis): project the data onto the direction e of maximum variance.
Data-driven Temporal filtering
We should not guess the filter, but derive it from data: take length-L windows zk of the original feature stream yt along the frame index, derive a bank of filters B1(z), B2(z), ..., Bn(z) from them (for example, from the principal components of the windows), and apply the filters to the stream by convolution.
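A sketch of deriving one such filter from data: stack length-L windows of a feature stream and take the leading principal component as the filter's impulse response. L and the single-filter simplification are illustrative:

```python
import numpy as np

def pca_temporal_filter(stream, L=5):
    """Leading principal component of length-L windows of the stream,
    used as a data-derived temporal filter's impulse response."""
    windows = np.array([stream[t:t + L] for t in range(len(stream) - L + 1)])
    windows = windows - windows.mean(axis=0)
    cov = windows.T @ windows / len(windows)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues ascending
    return eigvecs[:, -1]                     # largest-eigenvalue direction
```

Taking the next-largest eigenvectors as well yields the filter bank B1(z), ..., Bn(z).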
Speech Enhancement: Spectral Subtraction (SS)
Produce a better signal by removing the noise, for listening or recognition purposes. Noise n[n] changes quickly and unpredictably in the time domain, but its spectrum N(ω) changes relatively slowly, so a noise-spectrum estimate taken from non-speech segments can be subtracted from the spectrum of the noisy signal.
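A minimal power-spectral-subtraction sketch with a spectral floor to keep the subtracted power non-negative; the floor value is an assumption, not from the slides:

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Power spectral subtraction with a floor:
    |X|^2 = max(|Y|^2 - |N|^2, floor * |Y|^2)."""
    power = noisy_mag ** 2 - noise_mag ** 2
    power = np.maximum(power, floor * noisy_mag ** 2)
    return np.sqrt(power)
```

The floor prevents negative power when the noise estimate exceeds the observed power in a bin, at the cost of residual "musical noise".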
Conclusions
We gave a general framework for extracting speech features.
We introduced the mainstream robustness techniques; numerous other noise reduction methods remain, left to the references.
References
Q & A