
Probabilistic Linear Discriminant Analysis (PLDA) with Bottleneck Features for Speech Recognition

Liang Lu and Steve Renals
University of Edinburgh

Liang Lu, Interspeech, September, 2014. RTSC


- Motivation
- The model
- Experiments
- Conclusion


Motivation

- Deep learning for speech feature representations
  - Deep neural networks
  - Denoising autoencoder
  - Bottleneck features
- Do they fit GMMs?
  - Low-dimensional and de-correlated feature input
  - Weak covariance modelling
- PLDA-based acoustic models
  - Higher-dimensional feature input
  - Approximated full-covariance modelling
- Bottleneck features from a DNN (this paper)


Probabilistic linear discriminant analysis

A factorisation model:

$$y_t \mid j = U x_{jt} + G z_j + b + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \Lambda) \tag{1}$$

where

- $j$ is the class index, e.g. the HMM state index
- $t$ is the frame index
- $z_j$ is the state variable, which depends on the state
- $x_{jt}$ is the frame variable, which depends on the frame
- $U$ and $G$ are two factor loading matrices, and $b$ is the bias vector
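To make the factorisation concrete, here is a minimal NumPy sketch that draws frames for a single state from the model above. It assumes a standard-normal frame variable and a diagonal noise covariance, neither of which is stated on this slide; the function name, dimensions and random parameters are purely illustrative.

```python
import numpy as np

def sample_plda_frames(U, G, b, lam_diag, z_j, num_frames, rng):
    """Draw y_t = U x_jt + G z_j + b + eps_t for one state j.

    Assumptions (not from the slides): x_jt ~ N(0, I) and a diagonal
    noise covariance Lambda = diag(lam_diag).
    """
    p = U.shape[1]
    x = rng.standard_normal((num_frames, p))                # frame variables x_jt
    eps = rng.standard_normal((num_frames, b.size)) * np.sqrt(lam_diag)  # residual noise
    return x @ U.T + G @ z_j + b + eps                      # one row per frame

rng = np.random.default_rng(0)
d, p, q = 6, 2, 3                      # feature, frame-variable and state-variable dims
U = rng.standard_normal((d, p))
G = rng.standard_normal((d, q))
b = rng.standard_normal(d)
lam_diag = rng.uniform(0.1, 0.5, size=d)
z_j = rng.standard_normal(q)           # state variable, shared by all frames of state j

Y = sample_plda_frames(U, G, b, lam_diag, z_j, num_frames=200_000, rng=rng)
print(Y.shape)
```

All frames of the state share the same $z_j$, so their sample mean should be close to $G z_j + b$, while the per-frame variation comes from $x_{jt}$ and $\epsilon_t$.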


PLDA-HMM acoustic model

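[Figure: graphical model of the PLDA-HMM over HMM states j − 1, j and j + 1, with nodes z_j, x_jt and y_t, and the label T_j.]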


Gaussian PLDA

- If we assume the latent variables are Gaussian distributed, we obtain Gaussian PLDA:

$$p(y_t \mid x_{jt}, z_j, j) = \mathcal{N}(y_t;\, U x_{jt} + G z_j + b,\, \Lambda) \tag{2}$$

or, marginalising out $x_{jt}$,

$$p(y_t \mid z_j, j) = \mathcal{N}(y_t;\, G z_j + b,\, \underbrace{U U^T + \Lambda}_{\text{covariance}}) \tag{3}$$

- For higher-dimensional features, $z_j$ and $x_{jt}$ can be low-dimensional, which keeps training fast
- Better covariance modelling
- The model can be trained using variational Bayesian EM
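One way to see why a low-dimensional frame variable keeps the covariance model cheap is to evaluate the marginal likelihood without ever forming the full covariance matrix. The sketch below assumes a diagonal noise covariance (not stated on the slide) and uses the Woodbury identity and the matrix determinant lemma; it is an illustration, not necessarily how the authors compute likelihoods, and all names are hypothetical.

```python
import numpy as np

def plda_state_loglik(y, U, G, b, lam_diag, z_j):
    """log N(y; G z_j + b, U U^T + Lambda), with Lambda = diag(lam_diag) assumed."""
    d, p = U.shape
    r = y - (G @ z_j + b)                        # residual from the state mean
    Ui = U / lam_diag[:, None]                   # Lambda^{-1} U
    M = np.eye(p) + U.T @ Ui                     # I + U^T Lambda^{-1} U, only p x p
    t = Ui.T @ r                                 # U^T Lambda^{-1} r
    # Woodbury: r^T Sigma^{-1} r = r^T Lambda^{-1} r - t^T M^{-1} t
    quad = r @ (r / lam_diag) - t @ np.linalg.solve(M, t)
    # Determinant lemma: log|Sigma| = log|M| + log|Lambda|
    logdet = np.linalg.slogdet(M)[1] + np.sum(np.log(lam_diag))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def dense_loglik(y, mean, cov):
    """Reference implementation that builds the full d x d covariance."""
    r = y - mean
    logdet = np.linalg.slogdet(cov)[1]
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + r @ np.linalg.solve(cov, r))

rng = np.random.default_rng(1)
d, p, q = 65, 10, 10                             # illustrative sizes only
U, G = rng.standard_normal((d, p)), rng.standard_normal((d, q))
b, z_j = rng.standard_normal(d), rng.standard_normal(q)
lam_diag = rng.uniform(0.5, 1.5, size=d)
y = rng.standard_normal(d)

fast = plda_state_loglik(y, U, G, b, lam_diag, z_j)
dense = dense_loglik(y, G @ z_j + b, U @ U.T + np.diag(lam_diag))
print(np.isclose(fast, dense))
```

The only matrix that has to be solved against is p × p, so the cost grows mildly with the feature dimension d while the model still represents a full (low-rank plus diagonal) covariance.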


PLDA-HMM acoustic model

Extensions:

- PLDA mixture model

$$y_t \mid j, m = U_m x_{jmt} + G_m z_{jm} + b_m + \epsilon_{jmt} \tag{4}$$

- Tied PLDA mixture model, similar to SGMM

$$y_t \mid j, m = U_m x_{jmt} + G_m z_j + b_m + \epsilon_{jmt} \tag{5}$$
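For reference, the per-component marginal of the mixture model (4) can be written by the same argument as for the single-component Gaussian case. The worked form below is a sketch under assumptions that are not on the slide: a standard-normal $x_{jmt}$, component noise $\epsilon_{jmt} \sim \mathcal{N}(0, \Lambda_m)$, and hypothetical mixture weights $\pi_{jm}$ that sum to one over $m$.

```latex
% Assumptions (not stated on the slide): x_{jmt} ~ N(0, I),
% eps_{jmt} ~ N(0, Lambda_m), and hypothetical weights pi_{jm}.
\begin{align*}
p(y_t \mid z_{jm}, j, m)
  &= \mathcal{N}\!\left(y_t;\; G_m z_{jm} + b_m,\; U_m U_m^T + \Lambda_m\right) \\
p(y_t \mid \{z_{jm}\}_m, j)
  &= \sum_{m} \pi_{jm}\,
     \mathcal{N}\!\left(y_t;\; G_m z_{jm} + b_m,\; U_m U_m^T + \Lambda_m\right)
\end{align*}
```

Under the same assumptions, the tied model (5) is obtained by replacing $z_{jm}$ with a single $z_j$ shared across the components of state $j$.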

Related work:

- PLDA + iVectors for speaker verification
- Joint factor analysis model
- Factorized cluster adaptive training (fCAT)


Setup

- Switchboard corpus
  - Using 33/109 hours of training data
- Bottleneck features (BN)
  - 33 hours data → 6 hidden layers × 1024 hidden units
  - 110 hours data → 6 hidden layers × 1200 hidden units
  - 5th hidden layer → bottleneck layer
- Tandem approach → (BN, MFCC)
- Maximum likelihood speaker independent systems


Experiments - MFCCs

Table: WERs (%) using 33 hours Switchboard training data, SI systems

System     Feature                 Dim    WER
GMM        MFCC_0+Δ+ΔΔ             39     36.6
GMM        MFCC_0(±2)+LDA_STC      40     34.4
GMM        MFCC_0(±3)+LDA_STC      40     33.5
GMM        MFCC_0(±4)+LDA_STC      40     33.3
mix-PLDA   MFCC_0(±2)              65     33.1
mix-PLDA   MFCC_0(±3)              91     32.4
mix-PLDA   MFCC_0(±4)              117    31.5
mix-PLDA   MFCC_0(±5)              143    33.2
mix-PLDA   MFCC_0+Δ+ΔΔ(±1)         117    32.4
mix-PLDA   MFCC_0+Δ+ΔΔ(±2)         195    34.0
SGMM       MFCC_0+Δ+ΔΔ             39     31.4
DNN        MFCC_0+Δ+ΔΔ(±4)         396    27.6
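The mix-PLDA feature dimensions in the table are consistent with splicing a 13-dimensional static MFCC vector over a ±k frame window, giving 13 × (2k + 1) dimensions. The 13-dim base is inferred from 65 / 5 rather than stated on the slide, and the edge-repeat padding in this sketch is an arbitrary choice made only for illustration.

```python
import numpy as np

def splice(frames, context):
    """Concatenate each frame with its +/- `context` neighbours.

    Utterance edges are handled by repeating the first/last frame
    (an assumption; the slides do not describe edge handling).
    """
    num_frames = frames.shape[0]
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    # Window k holds the frame at offset (k - context) for every time step.
    windows = [padded[k:k + num_frames] for k in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

# Five dummy frames of 13-dim static features (dimension inferred, see above).
static = np.arange(5 * 13, dtype=float).reshape(5, 13)

for context in (2, 3, 4, 5):
    print(context, splice(static, context).shape[1])   # 65, 91, 117, 143
```

The same arithmetic applied to the 39-dim MFCC_0+Δ+ΔΔ vector gives 117 for ±1 and 195 for ±2, matching the last two mix-PLDA rows.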


Experiments - bottleneck features

Table: WERs (%) using 33 hours Switchboard training data, 26-dim bottleneck features (65-dim Tandem), SI systems

System      Feature                WER
DNN         MFCC_0+Δ+ΔΔ (±4)       27.6
BN-DNN      MFCC_0+Δ+ΔΔ (±4)       28.8
GMM         MFCC_0+Δ+ΔΔ            36.6
GMM         MFCC_0(±3)+LDA_STC     33.5
GMM         Tandem                 30.9
GMM         Tandem + LDA_STC       27.4
SGMM        Tandem + LDA_STC       26.7
mix-PLDA    Tandem                 27.1
tied-PLDA   Tandem                 26.8


Experiments - bottleneck features

- Results for different bottleneck layer sizes

[Figure: WER (%) against the dimension of the bottleneck features (26, 39, 52, 65, 80) for BN DNN, GMM + Tandem, GMM + Tandem + LDA_STC and PLDA + Tandem; WER axis from 26 to 34.]


Experiments

Table: WERs (%) using 109 hours Switchboard training data, SI systems

System      Feature                WER
DNN         MFCC_0+Δ+ΔΔ (±4)       22.0
BN-DNN      MFCC_0+Δ+ΔΔ (±4)       22.7
GMM         MFCC_0+Δ+ΔΔ            31.0
GMM         MFCC_0(±3)+LDA_STC     28.0
GMM         Tandem                 25.5
GMM         Tandem + LDA_STC       22.1
SGMM        Tandem + LDA_STC       21.7
mix-PLDA    Tandem                 21.6
tied-PLDA   Tandem                 21.4
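Absolute WER differences between the Tandem systems are small, so it can help to read them as relative reductions. The snippet below is plain arithmetic on the 109-hour figures in the table above; the helper name and the choice of baselines are illustrative and not part of the original slides.

```python
def relative_wer_reduction(baseline_wer, system_wer):
    """Relative WER reduction in percent: 100 * (baseline - system) / baseline."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer

# WERs (%) copied from the 109-hour table.
wer_109h = {
    "DNN": 22.0,
    "GMM Tandem + LDA_STC": 22.1,
    "SGMM Tandem + LDA_STC": 21.7,
    "mix-PLDA Tandem": 21.6,
    "tied-PLDA Tandem": 21.4,
}

for baseline in ("GMM Tandem + LDA_STC", "SGMM Tandem + LDA_STC", "DNN"):
    gain = relative_wer_reduction(wer_109h[baseline], wer_109h["tied-PLDA Tandem"])
    print(f"tied-PLDA Tandem vs {baseline}: {gain:.1f}% relative")
```

These ratios say nothing about statistical significance, which the slides do not report.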


Summary

- PLDA for acoustic modelling, and results of using bottleneck features
- Discriminative training
- Speaker adaptation: another way to do full-covariance adaptation
- FMLLR with log-Mel filter bank features for DNN adaptation


Thanks!
