Probabilistic Linear Discriminant Analysis (PLDA) with Bottleneck Features for Speech Recognition

Liang Lu and Steve Renals
University of Edinburgh

Liang Lu, Interspeech, September 2014.

- Motivation
- The model
- Experiments
- Conclusion


Motivation

- Deep learning for speech feature representations
  - Deep neural networks
  - Denoising autoencoder
  - Bottleneck features
- Do they fit GMMs?
  - Low-dimensional, decorrelated feature input
  - Weak covariance modelling
- PLDA-based acoustic models
  - Higher-dimensional feature input
  - Approximate full-covariance modelling
- Bottleneck features from a DNN (this paper)


Probabilistic linear discriminant analysis

A factorisation model:

\[ y_t \mid j = U x_{jt} + G z_j + b + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \Lambda) \tag{1} \]

where

- $j$ is the class index, e.g. the HMM state index
- $t$ is the frame index
- $z_j$ is the state variable, which depends on the state
- $x_{jt}$ is the frame variable, which depends on the frame
- $U$ and $G$ are two factor loading matrices, and $b$ is the bias vector
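The generative process in Eq. (1) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the dimensions, the standard normal priors on the state and frame variables, the diagonal form of Λ, and the function name `sample_plda_frames` are all assumptions made for the example.

```python
import numpy as np

def sample_plda_frames(U, G, b, lam_diag, n_frames, rng):
    """Draw n_frames observations for one class j following Eq. (1).

    Assumptions (not from the slides): z_j and x_jt have standard normal
    priors, and Lambda is diagonal with entries lam_diag.
    """
    d, q = U.shape                                # feature dim, frame-variable dim
    p = G.shape[1]                                # state-variable dim
    z_j = rng.standard_normal(p)                  # one draw per state
    x_jt = rng.standard_normal((n_frames, q))     # one draw per frame
    eps = rng.standard_normal((n_frames, d)) * np.sqrt(lam_diag)
    y = x_jt @ U.T + G @ z_j + b + eps            # G z_j + b is shared by all frames
    return y, z_j

rng = np.random.default_rng(0)
d, q, p = 6, 2, 3                                 # illustrative sizes only
U = rng.standard_normal((d, q))
G = rng.standard_normal((d, p))
b = rng.standard_normal(d)
lam_diag = np.full(d, 0.1)

y, z_j = sample_plda_frames(U, G, b, lam_diag, n_frames=20000, rng=rng)
# Frames of one state share z_j, so their sample mean approaches G z_j + b.
mean_error = np.abs(y.mean(axis=0) - (G @ z_j + b)).max()
```

Because the state variable is drawn once and reused for every frame, it sets the mean of the state, while the frame variable and the noise term account for the spread around that mean.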


PLDA-HMM acoustic model

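[Figure: graphical model of the PLDA-HMM acoustic model. Labels: HMM states j − 1, j, j + 1; state variable z_j; frame variable x_jt; observation y_t; plate T_j.]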


Gaussian PLDA

- If we assume the latent variables are Gaussian distributed, we obtain Gaussian PLDA:

\[ p(y_t \mid x_{jt}, z_j, j) = \mathcal{N}(y_t;\; U x_{jt} + G z_j + b,\; \Lambda) \tag{2} \]

  or, marginalising out $x_{jt}$,

\[ p(y_t \mid z_j, j) = \mathcal{N}(y_t;\; G z_j + b,\; \underbrace{U U^{T} + \Lambda}_{\text{covariance}}) \tag{3} \]

- For higher-dimensional features, $z_j$ and $x_{jt}$ can be low-dimensional, so the model is fast to train
- Better covariance modelling
- The model can be trained using variational Bayesian EM
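To illustrate why a low-dimensional frame variable keeps the full-covariance likelihood in Eq. (3) cheap, the sketch below evaluates it twice: once by forming the dense covariance, and once with the Woodbury identity and the matrix determinant lemma, which only require solving a system of the size of the frame variable. It assumes a standard normal prior on the frame variable (which is what gives the UU^T term) and a diagonal Λ; the function names and the sizes are hypothetical choices for the example, not taken from the talk.

```python
import numpy as np

def plda_loglik_dense(y, U, G, b, lam_diag, z_j):
    """log N(y; G z_j + b, U U^T + Lambda), forming the full covariance."""
    d = y.shape[0]
    r = y - (G @ z_j + b)
    cov = U @ U.T + np.diag(lam_diag)
    _, logdet = np.linalg.slogdet(cov)
    quad = r @ np.linalg.solve(cov, r)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def plda_loglik_lowrank(y, U, G, b, lam_diag, z_j):
    """Same quantity via the Woodbury identity and determinant lemma.

    Only a q x q system is solved, where q is the dimension of x_jt.
    Assumes Lambda is diagonal (an assumption of this sketch).
    """
    d, q = U.shape
    r = y - (G @ z_j + b)
    Ui = U / lam_diag[:, None]                    # Lambda^{-1} U
    M = np.eye(q) + U.T @ Ui                      # I + U^T Lambda^{-1} U
    _, logdet_M = np.linalg.slogdet(M)
    logdet = logdet_M + np.log(lam_diag).sum()    # log det(U U^T + Lambda)
    v = Ui.T @ r                                  # U^T Lambda^{-1} r
    quad = r @ (r / lam_diag) - v @ np.linalg.solve(M, v)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(1)
d, q, p = 65, 10, 10                              # illustrative sizes only
U = rng.standard_normal((d, q))
G = rng.standard_normal((d, p))
b = rng.standard_normal(d)
lam_diag = rng.uniform(0.5, 1.5, size=d)
z_j = rng.standard_normal(p)
y = rng.standard_normal(d)

dense = plda_loglik_dense(y, U, G, b, lam_diag, z_j)
lowrank = plda_loglik_lowrank(y, U, G, b, lam_diag, z_j)
```

Both functions return the same log-likelihood up to floating-point error; the low-rank form avoids inverting a matrix of the full feature dimension.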


PLDA-HMM acoustic model

Extensions:

- PLDA mixture model

\[ y_t \mid j, m = U_m x_{jmt} + G_m z_{jm} + b_m + \epsilon_{jmt} \tag{4} \]

- Tied PLDA mixture model, similar to SGMM

\[ y_t \mid j, m = U_m x_{jmt} + G_m z_j + b_m + \epsilon_{jmt} \tag{5} \]
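The difference between Eqs. (4) and (5) lies only in the state variable: the mixture model keeps one state vector per component, whereas the tied model shares a single state vector across all components of a state. A minimal sketch of the resulting component means is given below; the array layouts, sizes, and function names are assumptions for illustration.

```python
import numpy as np

def component_means_mix(G, b, z_jm):
    """Eq. (4): component m of state j has mean G_m z_jm + b_m.

    G: (M, d, p), b: (M, d), z_jm: (M, p) -> returns (M, d).
    """
    return np.einsum("mdp,mp->md", G, z_jm) + b

def component_means_tied(G, b, z_j):
    """Eq. (5): all components of state j share one z_j of shape (p,)."""
    return G @ z_j + b

rng = np.random.default_rng(2)
M, d, p = 4, 65, 20                               # illustrative sizes only
G = rng.standard_normal((M, d, p))
b = rng.standard_normal((M, d))
z_j = rng.standard_normal(p)

tied = component_means_tied(G, b, z_j)
# Eq. (4) reduces to Eq. (5) when every component uses the same state vector.
mix_same = component_means_mix(G, b, np.tile(z_j, (M, 1)))
# State-level values stored per state: M * p for Eq. (4), p for Eq. (5).
state_params_mix, state_params_tied = M * p, p
```

Under these assumptions the tied model stores M times fewer state-level values per state, while the component-level matrices and biases are shared across states in both variants.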

Related work:

- PLDA + i-vectors for speaker verification
- Joint factor analysis model
- Factorized cluster adaptive training (fCAT)


Setup

- Switchboard corpus
  - Using 33 or 109 hours of training data
- Bottleneck features (BN)
  - 33 hours of data → 6 hidden layers × 1024 hidden units
  - 109 hours of data → 6 hidden layers × 1200 hidden units
  - 5th hidden layer → bottleneck layer
- Tandem approach → (BN, MFCC)
- Maximum likelihood, speaker-independent (SI) systems
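The Tandem step amounts to a frame-by-frame concatenation of the two feature streams. The sketch below assumes 26-dimensional bottleneck features and 39-dimensional MFCCs, which matches the 65-dimensional Tandem features quoted in the results tables; the random arrays stand in for real features, and `make_tandem` is a hypothetical helper name.

```python
import numpy as np

def make_tandem(bn_feats, mfcc_feats):
    """Concatenate bottleneck and MFCC features frame by frame."""
    if bn_feats.shape[0] != mfcc_feats.shape[0]:
        raise ValueError("both streams need the same number of frames")
    return np.concatenate([bn_feats, mfcc_feats], axis=1)

rng = np.random.default_rng(3)
n_frames = 100
bn = rng.standard_normal((n_frames, 26))      # placeholder for DNN bottleneck outputs
mfcc = rng.standard_normal((n_frames, 39))    # placeholder for MFCC_0 + delta + delta-delta
tandem = make_tandem(bn, mfcc)                # shape (100, 65), ordered as (BN, MFCC)
```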


Experiments - MFCCs

Table: WERs (%) using 33 hours of Switchboard training data, SI systems

System     Feature               Dim   WER (%)
---------  --------------------  ----  -------
GMM        MFCC_0+Δ+ΔΔ           39    36.6
GMM        MFCC_0(±2)+LDA_STC    40    34.4
GMM        MFCC_0(±3)+LDA_STC    40    33.5
GMM        MFCC_0(±4)+LDA_STC    40    33.3
mix-PLDA   MFCC_0(±2)            65    33.1
mix-PLDA   MFCC_0(±3)            91    32.4
mix-PLDA   MFCC_0(±4)            117   31.5
mix-PLDA   MFCC_0(±5)            143   33.2
mix-PLDA   MFCC_0+Δ+ΔΔ(±1)       117   32.4
mix-PLDA   MFCC_0+Δ+ΔΔ(±2)       195   34.0
SGMM       MFCC_0+Δ+ΔΔ           39    31.4
DNN        MFCC_0+Δ+ΔΔ(±4)       396   27.6
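The (±k) notation in the table denotes splicing each frame with k neighbouring frames on either side, which is how the mix-PLDA input dimensions arise. The sketch below reproduces those dimensions, assuming 13 static MFCCs per frame (inferred from 65 = 13 × 5) and padding the utterance edges by repeating the first and last frames; the edge handling and the `splice` helper are assumptions of this example.

```python
import numpy as np

def splice(feats, context):
    """Stack each frame with `context` neighbours on either side.

    feats: (T, d) -> (T, d * (2 * context + 1)). Edge frames are padded by
    repeating the first/last frame, which is an assumption of this sketch.
    """
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.concatenate(
        [padded[k:k + T] for k in range(2 * context + 1)], axis=1
    )

static = np.arange(50 * 13, dtype=float).reshape(50, 13)   # dummy 13-dim static MFCCs
dims_static = {c: splice(static, c).shape[1] for c in (2, 3, 4, 5)}

deltas = np.zeros((50, 39))                                # dummy MFCC_0 + delta + delta-delta
dims_deltas = {c: splice(deltas, c).shape[1] for c in (1, 2)}
```

With these inputs, `dims_static` is `{2: 65, 3: 91, 4: 117, 5: 143}` and `dims_deltas` is `{1: 117, 2: 195}`, matching the Dim column for the mix-PLDA rows.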


Experiments - bottleneck features

Table: WERs (%) using 33 hours of Switchboard training data, 26-dim bottleneck features (65-dim Tandem), SI systems

System     Feature               WER (%)
---------  --------------------  -------
DNN        MFCC_0+Δ+ΔΔ (±4)      27.6
BN-DNN     MFCC_0+Δ+ΔΔ (±4)      28.8
GMM        MFCC_0+Δ+ΔΔ           36.6
GMM        MFCC_0(±3)+LDA_STC    33.5
GMM        Tandem                30.9
GMM        Tandem + LDA_STC      27.4
SGMM       Tandem + LDA_STC      26.7
mix-PLDA   Tandem                27.1
tied-PLDA  Tandem                26.8
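To read the table in relative terms, the short sketch below computes the relative WER reduction of the PLDA systems over the plain GMM system on the same Tandem features. The WER values are copied from the table; the helper function is a hypothetical convenience, not part of the original work.

```python
def relative_wer_reduction(baseline, system):
    """Relative WER reduction in percent, going from baseline to system."""
    return 100.0 * (baseline - system) / baseline

# WERs (%) from the 33-hour Tandem rows of the table.
wer = {
    "GMM Tandem": 30.9,
    "mix-PLDA Tandem": 27.1,
    "tied-PLDA Tandem": 26.8,
}

rel_mix = relative_wer_reduction(wer["GMM Tandem"], wer["mix-PLDA Tandem"])
rel_tied = relative_wer_reduction(wer["GMM Tandem"], wer["tied-PLDA Tandem"])
```

This gives roughly 12.3% for mix-PLDA and 13.3% for tied-PLDA, relative to the GMM without LDA_STC on the same Tandem input.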


Experiments - bottleneck features

- Results for different bottleneck layer sizes

[Figure: WER (%) against the dimension of the bottleneck features (26, 39, 52, 65, 80) for four systems: BN DNN, GMM + Tandem, GMM + Tandem + LDA_STC, and PLDA + Tandem. The WER axis runs from 26 to 34.]


Experiments

Table: WERs (%) using 109 hours of Switchboard training data, SI systems

System     Feature               WER (%)
---------  --------------------  -------
DNN        MFCC_0+Δ+ΔΔ (±4)      22.0
BN-DNN     MFCC_0+Δ+ΔΔ (±4)      22.7
GMM        MFCC_0+Δ+ΔΔ           31.0
GMM        MFCC_0(±3)+LDA_STC    28.0
GMM        Tandem                25.5
GMM        Tandem + LDA_STC      22.1
SGMM       Tandem + LDA_STC      21.7
mix-PLDA   Tandem                21.6
tied-PLDA  Tandem                21.4


Summary

- PLDA for acoustic modelling, with results using bottleneck features
- Discriminative training
- Speaker adaptation: another route to full-covariance adaptation
- FMLLR with log-Mel filter bank features for DNN adaptation


Thanks!
