NOISE DETECTION AND CLASSIFICATION IN SPEECH SIGNALS WITH BOOSTING

Nobuyuki Miyake, Tetsuya Takiguchi and Yasuo Ariki

Department of Computer and System Engineering, Kobe University

Research purpose

Purpose: Detecting and classifying sudden and short-period noises

Background

Sudden and short-period noises often affect speech recognition systems in real environments.

Noise reduction improves speech recognition. However, it is difficult to remove sudden and short-period noises because we do not know where the noise overlaps the speech or what the noise is.

System overview

[Figure: system overview. Example input: a telephone call in which the utterance "Well, I believe that you will ..." is overlapped by a clatter. Blocks: feature extraction; noise detection using AdaBoost (clean speech vs. noisy speech overlapped by sudden noises); noise classification using AdaBoost; smoothing; final results.]

AdaBoost

AdaBoost is a boosting method. It selects the weak classifiers and their weights, and combines them into a strong classifier:

\[ H(x) = \mathrm{sgn}\left( \sum_{t} \alpha_t h_t(x) \right) \]

where $h_t(x)$ is the $t$-th weak classifier and $\alpha_t$ is its weight.

The learning data are labeled with {-1, +1}: +1 is the label for noisy speech data and -1 is the label for clean speech data.

Multi-class classification using AdaBoost

We perform multi-class classification using AdaBoost in order to determine noise classes. AdaBoost must be extended to handle multiple classes.

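[Figure: the feature vector is given to K two-class AdaBoost classifiers (class 1 or other class, ..., class K or other class), each trained with its own labeling of the data (label of class 1, label of class 2, label of class 3, ...).]

The output of the classifier for class $k$ is

\[ H^k(x) = \sum_{t} \alpha_t^k h_t^k(x) \]

and the class with the maximum output value is selected:

\[ C(x) = \arg\max_k H^k(x) \]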

[Figure: the weighted weak classifiers are combined into the strong classifier $H(x)$.]

If a speech recognition system can detect sudden noises, it can ask the speaker to repeat the utterance. If it can also determine what the noise is and where it overlaps the speech, this information will be useful for noise reduction or model composition.

Each weak classifier outputs $h_t(x) \in \{-1, +1\}$.

Noise detection using AdaBoost

[Figure: the feature vector of each frame is given to AdaBoost, which labels the frame as clean speech or noise-overlapped speech. The training data are a sequence of frame feature vectors $x_1, x_2, x_3, x_4, x_5, \ldots$ with one label per frame.]

Learning

AdaBoost builds a strong classifier that separates clean speech frames from noisy speech frames using these data. Each weak classifier is a one-dimensional linear classifier.

Detection

AdaBoost determines whether each frame is clean speech or speech overlapped by noise.

Multiple two-class classifiers are created, each of which distinguishes one class from all the other classes. The class whose classifier gives the largest output value is selected.
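The one-vs-rest selection described here can be sketched as follows. This is only an illustration: the helper names (`stump`, `weighted_sum`, `classify_one_vs_rest`), the threshold-type weak classifiers, and the toy weights are assumptions made for the example, not values from the poster.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a real-valued per-class score function H^k(x).
Scorer = Callable[[Sequence[float]], float]


def stump(dim: int, threshold: float, polarity: int = 1) -> Callable[[Sequence[float]], int]:
    """Weak classifier on one feature dimension, returning +1 or -1."""
    def h(x: Sequence[float]) -> int:
        return polarity if x[dim] > threshold else -polarity
    return h


def weighted_sum(alphas: Sequence[float], weak: Sequence[Callable]) -> Scorer:
    """Real-valued output: the sum over t of alpha_t * h_t(x)."""
    def score(x: Sequence[float]) -> float:
        return sum(a * h(x) for a, h in zip(alphas, weak))
    return score


def classify_one_vs_rest(x: Sequence[float], scorers: Sequence[Scorer]) -> int:
    """Return the index of the class whose scorer gives the largest output."""
    scores: List[float] = [score(x) for score in scorers]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example with three classes over 2-D features (made-up parameters).
scorers = [
    weighted_sum([1.0, 0.5], [stump(0, 1.0), stump(1, 1.0, -1)]),
    weighted_sum([1.0, 0.5], [stump(1, 1.0), stump(0, 1.0, -1)]),
    weighted_sum([1.0], [stump(0, -1.0, -1)]),
]
print(classify_one_vs_rest([2.0, 0.0], scorers))
```

Note that the per-class outputs are compared before any sign is taken, so the decision uses the real-valued weighted sums rather than the +1/-1 decisions.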

\[ H(x) = \mathrm{sgn}\left( \sum_{t} \alpha_t h_t(x) - \eta \right) \]

Here $\eta$ is a threshold. Changing $\eta$ in this equation adjusts the numbers of positive errors and negative errors.
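The effect of η can be illustrated with a small sketch. It assumes that η is a threshold subtracted from the weighted sum before the sign is taken; the scores and labels below are made-up numbers, and `detect` and `count_errors` are hypothetical helper names.

```python
from typing import Sequence, Tuple


def detect(score: float, eta: float) -> int:
    """Return +1 (noisy) if the score exceeds the threshold eta, else -1 (clean)."""
    return 1 if score - eta > 0 else -1


def count_errors(scores: Sequence[float], labels: Sequence[int], eta: float) -> Tuple[int, int]:
    """Count (false positives, false negatives) for a given threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if detect(s, eta) == 1 and y == -1)
    fn = sum(1 for s, y in zip(scores, labels) if detect(s, eta) == -1 and y == 1)
    return fp, fn


# Made-up weighted-sum outputs and reference labels (+1 noisy, -1 clean).
scores = [-1.2, -0.3, 0.2, 0.4, 0.9, 1.5]
labels = [-1, -1, -1, 1, 1, 1]

for eta in (-0.5, 0.0, 0.5):
    print(eta, count_errors(scores, labels, eta))
```

Raising η in this sketch trades false positives for false negatives, and lowering it does the opposite.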

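[Figure: two-class example (red vs. blue). The data weights are changed between the weak classifiers $h_1(x)$, $h_2(x)$, ..., which contribute the weighted terms $\alpha_1 h_1(x)$, $\alpha_2 h_2(x)$, ....]

Algorithm

Input: $n$ examples $Z = \{(x_1, y_1), \ldots, (x_n, y_n)\}$

Initialize: $w_1(z_i) = 1/n$

Do for $t = 1, \ldots, T$:

1. Train a base learner with respect to the weighted example distribution $w_t$ and obtain hypothesis $h_t : x \mapsto \{-1, +1\}$.

2. Calculate the training error $\varepsilon_t$ of $h_t$:

\[ \varepsilon_t = \sum_{i=1}^{n} w_t(z_i) \, \frac{I(h_t(x_i) \neq y_i) + 1}{2} \]

3. Set

\[ \alpha_t = \log \frac{1 - \varepsilon_t}{\varepsilon_t} \]

4. Update the example distribution $w_t$:

\[ w_{t+1}(z_i) = \frac{w_t(z_i) \exp\{\alpha_t I(h_t(x_i) \neq y_i)\}}{\sum_{j=1}^{n} w_t(z_j) \exp\{\alpha_t I(h_t(x_j) \neq y_j)\}} \]

Final hypothesis:

\[ H(x) = \mathrm{sgn}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right) \]

Here $I(\cdot)$ takes the value +1 when its argument is true and -1 otherwise, so that $\varepsilon_t$ is the weighted error rate.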
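The training loop can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the weak learner is an exhaustively searched threshold on one feature dimension, the weight update uses the common 0/1 form of the indicator (misclassified examples are multiplied by exp(α_t), then the weights are normalised), and the toy data are made up. The indicator convention on the poster may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stump encoding: (feature index, threshold, polarity).
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """One-dimensional linear classifier: threshold a single feature."""
    dim, threshold, polarity = stump
    return polarity if x[dim] > threshold else -polarity


def best_stump(X: Sequence[Sequence[float]], y: Sequence[int],
               w: Sequence[float]) -> Tuple[Stump, float]:
    """Exhaustive search for the stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for dim in range(len(X[0])):
        values = sorted({x[dim] for x in X})
        # Candidates: one threshold below all values, then midpoints.
        thresholds = [values[0] - 1.0]
        thresholds += [(a + b) / 2 for a, b in zip(values, values[1:])]
        for threshold in thresholds:
            for polarity in (1, -1):
                stump = (dim, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def train_adaboost(X: Sequence[Sequence[float]], y: Sequence[int],
                   rounds: int) -> List[Tuple[float, Stump]]:
    """Return a list of (alpha_t, stump_t) pairs."""
    n = len(X)
    w = [1.0 / n] * n                      # uniform initial distribution
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # keep the log finite
        alpha = math.log((1.0 - err) / err)
        model.append((alpha, stump))
        # Increase the weights of misclassified examples, then normalise.
        w = [wi * math.exp(alpha if stump_predict(stump, xi) != yi else 0.0)
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def strong_classify(model: Sequence[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Sign of the weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score > 0 else -1


# Toy 1-D data: the positive class is an interval, so no single
# threshold separates it, but a combination of three stumps does.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [-1, -1, 1, 1, -1, -1]
model = train_adaboost(X, y, rounds=3)
predictions = [strong_classify(model, x) for x in X]
print(predictions)
```

The toy data show why combining weak classifiers matters: each stump alone misclassifies at least two points, while the weighted vote of three stumps fits all six.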

We use AdaBoost for noise detection and classification because it can form complex decision boundaries.

Each weak classifier is weighted according to its performance.

The weights of misclassified data are increased, and the weights of correctly classified data are decreased.

Comparative approach

We use the log-likelihood ratio of GMMs, a popular method for VAD (voice activity detection).

Detection

\[ H(x) = \mathrm{sgn}\left( \log \frac{P(x \mid \text{noisy\_model})}{P(x \mid \text{speech\_model})} \right) \]

Classification

We select the class with the maximum likelihood among the noisy speech GMMs:

\[ C(x) = \arg\max_k P(x \mid \text{noise}_k) \]
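The GMM baseline can be sketched as follows. The sketch assumes diagonal-covariance mixtures (the poster does not state the covariance type), and the one-dimensional toy models and function names are made up for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical component encoding: (mixture weight, means, variances),
# i.e. a diagonal-covariance Gaussian.
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_log_likelihood(x: Sequence[float], gmm: Sequence[Component]) -> float:
    """log P(x | GMM), computed with the log-sum-exp trick for stability."""
    logs: List[float] = []
    for weight, mean, var in gmm:
        log_p = math.log(weight)
        for xi, m, v in zip(x, mean, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(log_p)
    peak = max(logs)
    return peak + math.log(sum(math.exp(value - peak) for value in logs))


def detect_llr(x: Sequence[float], noisy_gmm: Sequence[Component],
               speech_gmm: Sequence[Component]) -> int:
    """+1 (noisy) if the log-likelihood ratio is positive, else -1 (clean)."""
    llr = gmm_log_likelihood(x, noisy_gmm) - gmm_log_likelihood(x, speech_gmm)
    return 1 if llr > 0 else -1


def classify_ml(x: Sequence[float], noise_gmms: Sequence[Sequence[Component]]) -> int:
    """Index of the noise-class GMM with the maximum likelihood."""
    scores = [gmm_log_likelihood(x, gmm) for gmm in noise_gmms]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up 1-D models, for illustration only.
speech_gmm = [(1.0, [0.0], [1.0])]
noisy_gmm = [(0.5, [3.0], [1.0]), (0.5, [5.0], [1.0])]
noise_gmms = [[(1.0, [3.0], [1.0])], [(1.0, [5.0], [1.0])]]

print(detect_llr([0.2], noisy_gmm, speech_gmm), detect_llr([4.0], noisy_gmm, speech_gmm))
print(classify_ml([3.2], noise_gmms), classify_ml([4.6], noise_gmms))
```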

Experiments


Classification

[Figure: a frame judged to be noisy in the detection stage is given to K two-class classifiers (noise class 1 or other class, noise class 2 or other class, ..., noise class K or other class), which yield noise class k.]

Learning

Noises are divided into classes in advance. AdaBoost learns classifiers that discriminate between these classes.

Classification

Classification is applied only to the frames that are judged to be noisy in the detection stage. The classifiers decide the noise class of each noisy speech frame.

Smoothing

[Figure: example frame-level classification outputs labeled noise 1 and noise 2.]

A signal interval detected by AdaBoost may consist of only a few frames.

Experimental condition

Window: 20-msec Hamming window, shifted every 10 msec

Feature: 24-order log-Mel filter bank and 12-order MFCC

Number of AdaBoost weak classifiers: 500

SNR of learning data: -5 dB to 5 dB
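The framing in these conditions can be sketched as below, assuming the 16 kHz sampling rate given for the speech data. Only the windowing step is shown; the log-Mel filter bank and MFCC computation are not. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def hamming(length: int) -> List[float]:
    """Standard Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_signal(signal: Sequence[float], sample_rate: int = 16000,
                 window_ms: int = 20, shift_ms: int = 10) -> List[List[float]]:
    """Cut the signal into overlapping, Hamming-windowed frames."""
    length = sample_rate * window_ms // 1000   # 320 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000     # 160 samples at 16 kHz
    window = hamming(length)
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(signal) - length + 1, shift):
        segment = signal[start:start + length]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# One second of a constant signal gives 99 full frames of 320 samples.
frames = frame_signal([1.0] * 16000)
print(len(frames), len(frames[0]))
```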

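[Figure: bar chart at SNR of 0 dB comparing AdaBoost and GMM; axis labels: Recall, Precision, F-measure, Classification Accuracy; vertical axis from 0.70 to 1.00. Bar values as extracted (assignment to bars not recoverable): 0.973, 0.958, 0.914, 0.896, 0.965, 0.962, 0.950, 0.951, 0.989, 0.973.]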

Detected intervals that last only a few frames are removed by smoothing, for which we use majority voting.

When one frame is smoothed, the three preceding and the three following frames are also taken into consideration:

\[ c'_N = \arg\max_{c} \sum_{i=N-3}^{N+3} I(c_i = c) \]

where $c_i$ is the classification output of the $i$-th frame.
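Majority-vote smoothing over a window of seven frames can be sketched as follows. Two details are assumptions, since the poster does not specify them: the window is truncated at the start and end of the sequence, and on a tie the frame keeps its original label when that label is among the winners. The label coding (0 for clean, k for noise class k) is also only illustrative.

```python
from collections import Counter
from typing import List, Sequence


def smooth_majority(labels: Sequence[int], half_width: int = 3) -> List[int]:
    """Replace each label by the majority label of its neighbourhood."""
    smoothed: List[int] = []
    for n in range(len(labels)):
        lo = max(0, n - half_width)                     # truncate at the start
        hi = min(len(labels), n + half_width + 1)       # truncate at the end
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        winners = [label for label, count in counts.items() if count == best]
        # Tie-break (assumption): prefer the frame's own label.
        smoothed.append(labels[n] if labels[n] in winners else winners[0])
    return smoothed


# An isolated one-frame detection is removed ...
print(smooth_majority([0, 0, 0, 0, 1, 0, 0, 0, 0, 0]))
# ... and an isolated class change inside a longer noise interval is corrected.
print(smooth_majority([0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 0, 0, 0]))
```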

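Criteria of evaluation

\[ \text{Recall} = \frac{tp}{tp + fn} \]

\[ \text{Precision} = \frac{tp}{tp + fp} \]

\[ \text{F-measure} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} \]

[Equation(s) for classification accuracy, defined using ce: not recoverable from the extracted text.]

tp: the number of true positive frames

fp: the number of false positive frames

fn: the number of false negative frames

ce: the number of classification error frames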
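The detection measures can be computed from frame-level labels as in the sketch below. It covers only recall, precision, and F-measure in their standard forms; the function names, the zero-denominator guards, and the example labels are assumptions for illustration.

```python
from typing import Sequence, Tuple


def count_frames(reference: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int]:
    """Return (tp, fp, fn) for +1 (noisy) / -1 (clean) frame labels."""
    tp = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    fp = sum(1 for r, p in zip(reference, predicted) if r == -1 and p == 1)
    fn = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == -1)
    return tp, fp, fn


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def f_measure(r: float, p: float) -> float:
    return 2.0 * r * p / (r + p) if r + p else 0.0


# Made-up example: 4 noisy frames, 3 of them detected, 1 false alarm.
reference = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
predicted = [1, 1, 1, -1, 1, -1, -1, -1, -1, -1]
tp, fp, fn = count_frames(reference, predicted)
r, p = recall(tp, fn), precision(tp, fp)
print(tp, fp, fn, r, p, f_measure(r, p))
```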


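Experimental results

[Figure: bar chart at SNR of 5 dB comparing AdaBoost and GMM; vertical axis from 0.70 to 1.00. Bar values as extracted (assignment to bars not recoverable): 0.923, 0.842, 0.804, 0.973, 0.949, 0.915, 0.950, 0.900, 0.947, 0.932.]

[Figure: bar chart at SNR of -5 dB comparing AdaBoost and GMM; vertical axis from 0.70 to 1.00. Bar values as extracted (assignment to bars not recoverable): 0.974, 0.973, 0.972, 0.973, 0.973, 0.974, 0.989, 0.989, 0.937, 0.933.]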

η in this equation adjusts the numbers of positive errors and negative errors.

Summary

We proposed a method for detecting and classifying sudden noises with boosting. Detection and classification achieve high performance at low SNR, and AdaBoost performs better than the GMM-based method.

Future work

We will detect more kinds of noise by combining this method with a clustering method such as k-means. We will also combine noise detection and classification with a noise reduction method.

Speech data (16 kHz)

Training: 210 utterances of 21 men

Testing: 2104 utterances of 5 men

Noise data

6 kinds of noise: "spray," "telephone," "tearing paper," "pouring of a granular substance," "bell-ringing," "horn"

Each kind of noise has 50 sources: 20 for training and 30 for testing.
