Noise Reduction in Speech Recognition. Professor: Jian-Jiun Ding. Student: Yung Chang. 2011/05/06.
Noise Reduction in Speech Recognition
Professor: Jian-Jiun Ding
Student: Yung Chang
2011/05/06
Outline
- Mel Frequency Cepstral Coefficients (MFCC)
- Mismatch in speech recognition
- Feature-based: CMS, CMVN, HEQ
- Feature-based: RASTA, data-driven temporal filtering
- Speech enhancement: spectral subtraction, Wiener filtering
- Conclusions and applications
Mel Frequency Cepstral Coefficients (MFCC)
The most commonly used feature in speech recognition.
Advantages: high accuracy and low complexity.
39 dimensions.
Mel Frequency Cepstral Coefficients (MFCC)
The framework of feature extraction:
speech signal x(n) → pre-emphasis x'(n) → windowing xt(n) → DFT At(k) → Mel filter-bank Yt(m) → log(| |^2) Yt'(m) → IDFT yt(j) → MFCC, plus energy and first/second derivatives.
Pre-emphasis
Pre-emphasis boosts the spectrum at higher frequencies, mapping x[n] to x'[n].
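A minimal sketch of the pre-emphasis step as a first-order high-pass filter, x'[n] = x[n] - a*x[n-1]; the coefficient value a = 0.95 is a common choice, not taken from the slides:

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    """High-pass the signal: x'[n] = x[n] - a * x[n-1]."""
    x = np.asarray(x, dtype=float)
    # First sample has no predecessor, so it is passed through unchanged.
    return np.append(x[0], x[1:] - a * x[:-1])
```

A constant (DC-heavy) signal is strongly attenuated while fast changes pass through, which is exactly the "emphasize higher frequencies" behavior the slide describes.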
End-point Detection (Voice Activity Detection)
Separates speech segments from noise (silence) before feature extraction.
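A naive energy-based endpoint detector as a sketch of the idea; the frame length and threshold ratio are illustrative choices, not from the slides:

```python
import numpy as np

def energy_vad(x, frame_len=256, threshold_ratio=0.1):
    """Mark frames whose energy exceeds a fraction of the maximum
    frame energy as speech; the rest is treated as noise (silence)."""
    n = len(x) // frame_len
    energies = np.array([np.sum(x[i * frame_len:(i + 1) * frame_len] ** 2)
                         for i in range(n)])
    return energies > threshold_ratio * energies.max()
```

Real systems use more robust statistics (zero-crossing rate, noise-floor tracking), but the frame-energy comparison captures the basic speech/silence split.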
Windowing
Rectangular window
Hamming window
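The Hamming window can be sketched directly from its standard definition:

```python
import numpy as np

def hamming(N):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1))."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
```

Unlike the rectangular window, the Hamming window tapers each frame toward its edges, which reduces spectral leakage after the DFT.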
Mel-filter bank
After the DFT we get the magnitude spectrum (amplitude versus frequency).
Mel-filter bank
- Triangular filters in frequency (overlapping)
- Uniformly spaced below 1 kHz
- Logarithmically spaced above 1 kHz
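The spacing the slide describes (roughly linear below 1 kHz, logarithmic above) follows from the standard mel-scale formula; a sketch of the conversions and of filter center frequencies placed uniformly on the mel axis:

```python
import numpy as np

def hz_to_mel(f):
    """Standard mel-scale formula."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_low, f_high):
    # n_filters triangular filters need n_filters + 2 edge points;
    # equal spacing in mel gives the linear/log spacing in Hz.
    mels = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    return mel_to_hz(mels)
```

Each triangular filter then spans from one center point to the next-but-one, so adjacent filters overlap by half.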
Delta Coefficients
First- and second-order differences of the cepstral features. Each order adds 13 dimensions to the 13 static coefficients, giving 39 dimensions in total.
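In practice the deltas are usually computed with a regression over neighboring frames rather than a plain one-step difference; a sketch under that common formulation (the window size K is illustrative):

```python
import numpy as np

def deltas(feats, K=2):
    """First-order regression deltas over a (frames, dims) array:
    d_t = sum_k k*(c_{t+k} - c_{t-k}) / (2 * sum_k k^2), edges padded."""
    T = len(feats)
    padded = np.pad(feats, ((K, K), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, K + 1))
    return np.array([
        sum(k * (padded[t + K + k] - padded[t + K - k])
            for k in range(1, K + 1)) / denom
        for t in range(T)
    ])
```

Stacking the 13 static coefficients with `deltas(c)` and `deltas(deltas(c))` yields the 39-dimensional vector.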
Mismatch in Statistical Speech Recognition
In recognition, the original speech x[n] passes through acoustic reception h[n], microphone distortion, and a phone/wireless channel, picking up additive noise n1(t) and n2(t) along the way; the result is the input signal y[n]. Feature extraction converts y[n] into feature vectors O = o1 o2 ... oT, and the search stage, using the acoustic models, lexicon, and language model, produces the output sentence W = w1 w2 ... wR.
In training, a speech corpus goes through the same feature extraction into model training to build the acoustic models, and a text corpus builds the language model. Mismatch arises because recognition-time speech carries convolutional noise (h[n]) and additive noise that the training data did not.

Possible Approaches for Acoustic Environment Mismatch
- Speech enhancement (recover x[n] from y[n])
- Feature-based approaches
- Model-based approaches
Feature-based Approach: Cepstral Mean Subtraction (CMS, CMVN)
Cepstral Mean Subtraction (CMS) targets convolutional noise. Convolution in the time domain becomes addition in the cepstral domain: y[n] = x[n]*h[n] implies y = x + h, with x, y, h in the cepstral domain. Most convolutional noise changes only very slightly over a reasonable time interval, so x = y - h. Assuming E[x] = 0, then E[y] = h, and
xCMS = y - E[y]
CMS shifts the observed distribution P(y) back toward the clean distribution P(x).
Feature-based Approach: Cepstral Moment Normalization (CMVN)
CMVN normalizes the variance as well:
xCMVN = xCMS / [Var(xCMS)]^(1/2)
so both the mean and the variance of the features match across environments.
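A minimal per-utterance CMVN sketch over a (frames, dims) feature array; the eps guard against zero variance is an implementation detail, not from the slides:

```python
import numpy as np

def cmvn(feats, eps=1e-8):
    """Per-utterance CMVN: subtract the mean of each cepstral dimension
    (this part alone is CMS) and divide by its standard deviation."""
    mu = feats.mean(axis=0)
    sigma = feats.std(axis=0)
    return (feats - mu) / (sigma + eps)
```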
Feature-based Approach: HEQ (Histogram Equalization)
The whole distribution is equalized, not just the first two moments:
y = CDFy^(-1)(CDFx(x))
Each feature value x is mapped through its cumulative distribution to the point with the same cumulative probability under the reference distribution.
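One way to realize the HEQ mapping is with empirical distributions: push each value of x through its own empirical CDF, then look up the matching quantile of a reference sample. The function names are illustrative:

```python
import numpy as np

def heq(x, reference):
    """Histogram equalization: y = CDF_ref^{-1}(CDF_x(x)), using the
    empirical CDF of x and empirical quantiles of a reference sample."""
    ranks = np.argsort(np.argsort(x))       # rank of each value in x
    cdf = (ranks + 0.5) / len(x)            # empirical CDF_x(x)
    return np.quantile(np.sort(reference), cdf)
```

The mapping is monotone, so the ordering of feature values is preserved while their distribution is reshaped to match the reference.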
Feature-based Approach: RASTA
Plotting each feature dimension over time gives a trajectory with its own modulation-frequency content. RASTA performs filtering on these trajectories (temporal filtering).
Feature-based Approach: RASTA (Relative Spectral Temporal filtering)
Assumes the rate of change of noise often lies outside the typical rate of change of the vocal tract shape.
A specially designed temporal band-pass filter; the classic RASTA filter is
B(z) = 0.1 * z^4 * (2 + z^(-1) - z^(-3) - 2z^(-4)) / (1 - 0.98z^(-1))
which emphasizes the modulation frequencies (around 4 Hz) where speech energy concentrates and suppresses slowly or very rapidly varying components.
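A causal sketch of the classic RASTA temporal filter, numerator 0.1*(2 + z^-1 - z^-3 - 2z^-4) with a pole at 0.98, applied along the time axis of one log-spectral band; the z^4 advance in the usual statement only shifts the output by four frames, so this sketch drops it:

```python
import numpy as np

def rasta_filter(band):
    """Band-pass one log-spectral trajectory over time.
    Zero gain at DC (the numerator taps sum to 0) removes the slowly
    varying convolutional component, matching the RASTA assumption."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # FIR taps
    band = np.asarray(band, dtype=float)
    out = np.zeros_like(band)
    for t in range(len(band)):
        acc = sum(b[k] * band[t - k] for k in range(5) if t - k >= 0)
        if t > 0:
            acc += 0.98 * out[t - 1]                   # IIR pole
        out[t] = acc
    return out
```

A constant input (a stationary channel offset) decays to zero, while mid-rate modulations typical of speech pass through.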
Data-driven Temporal filtering
PCA (Principal Component Analysis): project the data onto the direction e of maximum variance.
Data-driven Temporal filtering
We should not guess the filter, but derive it from data: take length-L windows zk of the original feature stream yt along the frame index, derive a bank of filters B1(z), B2(z), ..., Bn(z) from them (for example, from the principal components of the windows), and apply the filters to the stream by convolution.
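A sketch of deriving one such filter from data: stack length-L windows of a feature stream and take the leading principal component as the filter's impulse response. L and the single-filter simplification are illustrative:

```python
import numpy as np

def pca_temporal_filter(stream, L=5):
    """Leading principal component of length-L windows of the stream,
    used as a data-derived temporal filter's impulse response."""
    windows = np.array([stream[t:t + L] for t in range(len(stream) - L + 1)])
    windows = windows - windows.mean(axis=0)
    cov = windows.T @ windows / len(windows)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues ascending
    return eigvecs[:, -1]                     # largest-eigenvalue direction
```

Taking the next-largest eigenvectors as well yields the filter bank B1(z), ..., Bn(z).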
Speech Enhancement: Spectral Subtraction (SS)
Produce a better signal by removing the noise, for listening or recognition purposes. Noise n[n] changes quickly and unpredictably in the time domain, but its spectrum N(ω) changes relatively slowly, so a noise-spectrum estimate taken from non-speech segments can be subtracted from the spectrum of the noisy signal.
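A minimal power-spectral-subtraction sketch with a spectral floor to keep the subtracted power non-negative; the floor value is an assumption, not from the slides:

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Power spectral subtraction with a floor:
    |X|^2 = max(|Y|^2 - |N|^2, floor * |Y|^2)."""
    power = noisy_mag ** 2 - noise_mag ** 2
    power = np.maximum(power, floor * noisy_mag ** 2)
    return np.sqrt(power)
```

The floor prevents negative power when the noise estimate exceeds the observed power in a bin, at the cost of residual "musical noise".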
Conclusions
We gave a general framework for extracting speech features.
We introduced the mainstream robustness techniques; numerous other noise reduction methods remain, left to the references.
References
Q & A