Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep...

28
Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1 , Yan Song 1 , Si Wei 2 , Meng-Ge Wang 1 , Ian McLoughlin 1 , Li-Rong Dai 1 1、National Engineering Laboratory of Speech and Language Information Processing, University of Science and Technology of China 2、iFlytek Research, Anhui USTC iFlytek Co. Ltd.

Transcript of Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep...

Page 1: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Performance Evaluation of Deep Bottleneck Features for Spoken

Language Identification

Bing Jiang1, Yan Song1, Si Wei2, Meng-Ge Wang1, Ian McLoughlin1, Li-Rong Dai1

1、National Engineering Laboratory of Speech and Language Information Processing,

University of Science and Technology of China2、iFlytek Research, Anhui USTC iFlytek Co. Ltd.

Page 2: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Outline

• Background

• Our Method

• Experiments

• Conclusions

Friday, September 19, 2014 2

Page 3: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Outline

• Background• Our Method

• Experiments

• Conclusions

Friday, September 19, 2014 3

Page 4: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Background

• Language Identification is a typical problem inMachine learning

Friday, September 19, 2014 4

Feature Domain

Model

Domain

SDC GMM

Page 5: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Background

• There are many language-independent nuisancescovered by original acoustical feature.– Speaker variations

– Channel variations

– Special content variations

– Noise variations

• Feature improvement– MFCC SDC

• Temporal extension

– Compensation in feature domain• Factor analysis

Friday, September 19, 2014 5

So difficult

Page 6: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Background

• Model– Generative model Discriminative model

• GMM-UBM

• SVM

• MMI

– i-vector is the state of the art• Factor analysis

• With compensation methods:– LDA

– WCCN

– PLDA

• More suitable features are wanted……

Friday, September 19, 2014 6

Page 7: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Background

• Recently, DNN is drawing lots of attention– Non-linear modeling capability

• Deep layers structure

• Non-linear activation function

– Feature learning capability• Extracting information about the target layer by layer

• Using neural network to extract the discriminativefeature for LID task??– PLLR

– MLP

– Deep Bottleneck Feature

Friday, September 19, 2014 7

Page 8: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Outline

• Background

• Our Method• Experiments

• Conclusions

Friday, September 19, 2014 8

Page 9: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Our Method

• What are Deep bottleneck features?

Friday, September 19, 2014 9

输入特征

输出类别

Bottleneck layer

当前帧

当前帧

输出特征

Y=[y1,y2,...,yf]

Target Class

Input Feature

Current Frame

Current Frame

Output Feature

Page 10: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Our Method

• What are Deep bottleneck features?

Friday, September 19, 2014 10

1 -2 1

1 2

1 1 1 1

1 =1 1

( ; , ,..., )

( (... ( )...) )l l

l

m

M M Ml l l l

mj ji id d i j m

j i d

y W W W

x b b b

x

Page 11: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Our Method

• Why do we use Deep bottleneck features?– The target class

• Phonemes or phoneme states are suitable forlanguage identification task– Statistical method

• A low-dimensional compact representation ofthe original inputs

– Non-linear transformation– Discriminative features

Friday, September 19, 2014 11

Page 12: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Our Method

• Why do we use Deep bottleneck features?

Friday, September 19, 2014 12

SDC PK DBF

Page 13: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Our Method

• How to train the DBF extractor?

Friday, September 19, 2014 13

Page 14: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Outline

• Background

• Our Method

• Experiments• Conclusions

Friday, September 19, 2014 14

Page 15: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• DNN training database– 500 hours Mandarin telephone database

• Evaluation database– NIST LRE 2009

Friday, September 19, 2014 15

Page 16: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper1: Comparison with SDC– DBF:43x11-2048-2048-43-2048-2048-6004

– 2048 mixture GMM-UBM

Friday, September 19, 2014 16

Page 17: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper2: Context window size of DNNinput– Motivation

• Context window size is sensitive for LID

• The parameter for SDC (7-1-3-7)– Can cover 21 frames

• For LID, the input window should be more length than speech recognition– Speech recognition: 5-1-5

Friday, September 19, 2014 17

Page 18: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper2: Context window size of DNNinput– DBF:43xn-2048-2048-43-2048-2048-6004

Friday, September 19, 2014 18

Page 19: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper3: Dimension of DBF– Motivation

• DNN training forces the activation signals inthe bottleneck layer to form a low-dimensional compact representation of theoriginal inputs

• Find the relationship of the feature dimensionand the performance.

Friday, September 19, 2014 19

Page 20: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper3: Dimension of DBF– DBF:43x21-2048-2048-d-2048-2048-6004

Friday, September 19, 2014 20

Page 21: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper4: Generated in different layers– Motivation

• The feature is more discriminative for target, the more suitable for LID??

• The bottleneck layer is more closer to the output layer, the performance more better???

Friday, September 19, 2014 21

Page 22: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper4: Generated in different layers– Layer3:43x21-2048-2048-43-2048-2048-6004

– Layer4:43x21-2048-2048-2048-43-2048-6004

– Layer5:43x21-2048-2048-2048-2048-43-6004

Friday, September 19, 2014 22

Page 23: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper5: DBF with PCA– Motivation

• Since we use the diagonal covariance matrix toapproximate the GMM, each dimension of the inputfeature need to be de-correlated.

• For SDC, (Discrete cosine transformation) DCT.

• For DBF, we use the classical PCA to have a try.

Friday, September 19, 2014 23

Page 24: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Experiments

• Exper5: DBF with PCA– DBF:43x21-2048-2048-43-2048-2048-6004

Friday, September 19, 2014 24

Page 25: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Outline

• Background

• Our Method

• Experiments

• Conclusions

Friday, September 19, 2014 25

Page 26: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Conclusions

• In this paper, we investigated the useof bottleneck features for LID task.– DBF can significantly improve LID

performance, especially for shortduration utterances.

– DBF is a new milestone for LID research.

• We believe that using DNN to extractmore suitable feature for LID will makea great process in LID community.

Friday, September 19, 2014 26

Page 27: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Related Paper

• For more information about DBF for LID, you can seethe following paper:– Song Yan, Jiang Bing, Bao Ye-Bo, Wei Si, Dai Li-Rong, “I-vector

representation based on bottleneck features for languageidentification.” Electronics Letters, vol.49, no. 24, pp. 1569-1570,2013.

– Jiang Bing, Song Yan, Si Wei, Liu Jun-hua, Ian McLoulghlin andDai Li-Rong, “Deep Bottleneck features for Spoken languageidentification.” Plos One 9(7): e100795.doi:10.1371/journal.pone.0100795, 2014.

– Jiang Bing, Song Yan, Si Wei, Ian McLoulghlin and Dai Li-Rong,“Task-aware deep bottleneck features for spoken languageidentification.” in Proc. of INTERSPECH 2014, Singapore, 2014.

Friday, September 19, 2014 27

Page 28: Performance Evaluation of Deep Bottleneck Features for ... · Performance Evaluation of Deep Bottleneck Features for Spoken Language Identification Bing Jiang 1, Yan Song , Si Wei2,

Friday, September 19, 2014 28