Probabilistic Linear Discriminant Analysis (PLDA) with Bottleneck Features for Speech Recognition
Liang Lu and Steve Renals, University of Edinburgh
Liang Lu, Interspeech, September, 2014. RTSC
- Motivation
- The model
- Experiments
- Conclusion
Motivation
- Deep learning for speech feature representations
  - Deep neural networks
  - Denoising autoencoder
  - Bottleneck features
- Do they fit GMMs?
  - Low-dimensional and de-correlated feature input
  - Weak covariance modelling
- PLDA-based acoustic models
  - Higher dimensional feature input
  - Approximated full-covariance modelling
- Bottleneck features from a DNN (this paper)
Probabilistic linear discriminant analysis
A factorisation model:
$$y_t \mid j = U x_{jt} + G z_j + b + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \Lambda) \tag{1}$$
where
- $j$ is the class index, e.g. the HMM state index
- $t$ is the frame index
- $z_j$ is the state variable, which depends on each state
- $x_{jt}$ is the frame variable, which depends on each frame
- $U$, $G$ are two factor loading matrices, and $b$ is the bias vector
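To make the roles of the state and frame variables concrete, the following is a minimal sketch that samples frames from Eq. (1) for a single state. The dimensions, the random parameter values, the standard normal draws for the latent variables, and the diagonal noise covariance (stored as a vector `Lam`) are all assumptions made for illustration; the function name `sample_plda_frames` is hypothetical.

```python
import numpy as np

def sample_plda_frames(U, G, b, Lam, z_j, num_frames, rng):
    """Draw frames y_t = U x_jt + G z_j + b + eps_t for one state j."""
    d, q = U.shape
    # Frame variable x_jt: a fresh draw for every frame (standard normal is an assumption).
    x = rng.standard_normal((num_frames, q))
    # Noise eps_t ~ N(0, Lam); Lam is assumed diagonal and stored as a length-d vector.
    eps = rng.standard_normal((num_frames, d)) * np.sqrt(Lam)
    # G z_j + b is shared by all frames of the state; U x_jt varies per frame.
    return x @ U.T + G @ z_j + b + eps

rng = np.random.default_rng(0)
d, q_x, q_z = 6, 2, 3              # hypothetical sizes
U = rng.standard_normal((d, q_x))
G = rng.standard_normal((d, q_z))
b = rng.standard_normal(d)
Lam = np.full(d, 0.1)
z_j = rng.standard_normal(q_z)     # state variable, fixed for state j
frames = sample_plda_frames(U, G, b, Lam, z_j, num_frames=5, rng=rng)
print(frames.shape)  # (5, 6)
```

All five frames share the same offset G z_j + b, while the per-frame term U x_jt and the noise account for the variation within the state.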
PLDA-HMM acoustic model
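[Figure: graphical model of the PLDA-HMM, showing HMM states j−1, j, j+1, the state variable z_j, the frame variable x_jt, the observation y_t, and a plate labelled T_j.]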
Gaussian PLDA
- If we assume the latent variables are Gaussian distributed, we obtain Gaussian PLDA

$$p(y_t \mid x_{jt}, z_j, j) = \mathcal{N}(y_t;\, U x_{jt} + G z_j + b,\, \Lambda) \tag{2}$$

or, if we marginalise out $x_{jt}$,

$$p(y_t \mid z_j, j) = \mathcal{N}(y_t;\, G z_j + b,\, \underbrace{U U^{\top} + \Lambda}_{\text{covariance}}) \tag{3}$$

- For higher dimensional features, $z_j$ and $x_{jt}$ can be low dimensional, so the model is fast to train
- Better covariance modelling
- Model can be trained using Variational Bayesian EM
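The marginal in Eq. (3) follows from Eq. (2) when the frame variable has a standard normal prior, x_jt ~ N(0, I), which is assumed here. As a rough sketch, the per-frame log-likelihood under Eq. (3) can be evaluated as below; the function name `marginal_log_likelihood` and the example values are hypothetical, and `Lam` is passed as a full d × d matrix for simplicity.

```python
import numpy as np

def marginal_log_likelihood(y, G, z_j, b, U, Lam):
    """log N(y; G z_j + b, U U^T + Lam) for a single frame y, as in Eq. (3)."""
    d = y.shape[0]
    mean = G @ z_j + b
    cov = U @ U.T + Lam                 # low-rank term plus noise covariance
    diff = y - mean
    # slogdet and solve avoid forming an explicit matrix inverse.
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

rng = np.random.default_rng(0)
d, q_x, q_z = 4, 2, 2                   # hypothetical sizes
U = rng.standard_normal((d, q_x))
G = rng.standard_normal((d, q_z))
b = rng.standard_normal(d)
Lam = np.diag(np.full(d, 0.5))
z_j = rng.standard_normal(q_z)
y = rng.standard_normal(d)
print(marginal_log_likelihood(y, G, z_j, b, U, Lam))
```

Because U has only a few columns, U U^T + Lam describes correlations between feature dimensions with far fewer parameters than an unconstrained full covariance matrix.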
PLDA-HMM acoustic model
Extensions:
- PLDA mixture model

$$y_t \mid j, m = U_m x_{jmt} + G_m z_{jm} + b_m + \epsilon_{jmt} \tag{4}$$

- Tied PLDA mixture model, similar to SGMM

$$y_t \mid j, m = U_m x_{jmt} + G_m z_j + b_m + \epsilon_{jmt} \tag{5}$$
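The difference between Eqs. (4) and (5) lies in how the state variable is indexed. The sketch below computes the component means G_m z + b_m for both variants, using hypothetical sizes and random parameters, to show that the tied model keeps a single state vector per state rather than one per (state, component) pair.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q_z = 5, 2                    # hypothetical feature and state-variable dimensions
num_states, num_comps = 3, 4     # hypothetical numbers of HMM states and components

G = rng.standard_normal((num_comps, d, q_z))   # G_m, indexed by component only
b = rng.standard_normal((num_comps, d))        # b_m, indexed by component only

# Eq. (4): a separate state variable z_jm for every (state, component) pair.
z_mix = rng.standard_normal((num_states, num_comps, q_z))
means_mix = np.einsum("mdq,jmq->jmd", G, z_mix) + b

# Eq. (5): one state variable z_j per state, reused by every component.
z_tied = rng.standard_normal((num_states, q_z))
means_tied = np.einsum("mdq,jq->jmd", G, z_tied) + b

print(means_mix.shape, means_tied.shape)   # (3, 4, 5) (3, 4, 5)
print(z_mix.size, z_tied.size)             # 24 6
```

Both variants produce one mean per state and component, but the tied model stores four times fewer state-dependent parameters in this toy configuration.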
Related works:
- PLDA + iVectors for speaker verification
- Joint factor analysis model
- Factorized cluster adaptive training (fCAT)
Setup
- Switchboard corpus
  - Using 33/109 hours of training data
- Bottleneck features (BN)
  - 33 hours data → 6 hidden layers × 1024 hidden units
  - 110 hours data → 6 hidden layers × 1200 hidden units
  - 5th hidden layer → bottleneck layer
- Tandem approach → (BN, MFCC)
- Maximum likelihood speaker independent systems
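The Tandem approach appends the bottleneck features to the MFCCs frame by frame. A minimal sketch, using the 26-dimensional bottleneck and 39-dimensional MFCC sizes reported in the experiments and zero-filled placeholder arrays instead of real features:

```python
import numpy as np

num_frames = 100                      # hypothetical utterance length
bn = np.zeros((num_frames, 26))       # placeholder bottleneck features
mfcc = np.zeros((num_frames, 39))     # placeholder MFCC_0 + delta + delta-delta features

# Tandem feature: (BN, MFCC) concatenated along the feature axis.
tandem = np.concatenate([bn, mfcc], axis=1)
print(tandem.shape)   # (100, 65)
```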
Experiments - MFCCs
Table: WERs (%) using 33 hours Switchboard training data, SI systems

System     Feature               Dim   WER (%)
GMM        MFCC_0+∆+∆∆           39    36.6
GMM        MFCC_0(±2)+LDA_STC    40    34.4
GMM        MFCC_0(±3)+LDA_STC    40    33.5
GMM        MFCC_0(±4)+LDA_STC    40    33.3

mix-PLDA   MFCC_0(±2)            65    33.1
mix-PLDA   MFCC_0(±3)            91    32.4
mix-PLDA   MFCC_0(±4)            117   31.5
mix-PLDA   MFCC_0(±5)            143   33.2
mix-PLDA   MFCC_0+∆+∆∆(±1)       117   32.4
mix-PLDA   MFCC_0+∆+∆∆(±2)       195   34.0

SGMM       MFCC_0+∆+∆∆           39    31.4
DNN        MFCC_0+∆+∆∆(±4)       396   27.6
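The Dim column for the mix-PLDA rows on static features is consistent with splicing a 13-dimensional static MFCC_0 vector with ±c neighbouring frames, giving 13 × (2c + 1) dimensions. The 13-dimensional base size is inferred from the table rather than stated on the slide, and the helper `splice` with its edge padding is a hypothetical illustration, not the authors' front end.

```python
import numpy as np

def splice(feats, context):
    """Stack each frame with `context` neighbours on both sides.

    Frames beyond the utterance edges are filled by repeating the boundary frame.
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Window i holds the frames at offset i - context relative to the centre frame.
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)

# 13-dim static MFCC_0 is an assumption inferred from the table.
static = np.random.default_rng(0).standard_normal((50, 13))
for c in (2, 3, 4, 5):
    print(c, splice(static, c).shape[1])   # second column: 65, 91, 117, 143
```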
Experiments - bottleneck features
Table: WERs (%) using 33 hours Switchboard training data, 26-dim bottleneck features (65-dim Tandem), SI systems

System      Feature               WER (%)
DNN         MFCC_0+∆+∆∆(±4)       27.6
BN-DNN      MFCC_0+∆+∆∆(±4)       28.8

GMM         MFCC_0+∆+∆∆           36.6
GMM         MFCC_0(±3)+LDA_STC    33.5
GMM         Tandem                30.9
GMM         Tandem + LDA_STC      27.4
SGMM        Tandem + LDA_STC      26.7

mix-PLDA    Tandem                27.1
tied-PLDA   Tandem                26.8
Experiments - bottleneck features
- Results of using different sizes of bottleneck layer

[Figure: WER (%) versus dimension of bottleneck features (26, 39, 52, 65, 80) for four systems: BN DNN, GMM + Tandem, GMM + Tandem + LDA_STC, and PLDA + Tandem. The WER axis spans 26 to 34.]
Experiments
Table: WERs (%) using 109 hours Switchboard training data, SI systems

System      Feature               WER (%)
DNN         MFCC_0+∆+∆∆(±4)       22.0
BN-DNN      MFCC_0+∆+∆∆(±4)       22.7

GMM         MFCC_0+∆+∆∆           31.0
GMM         MFCC_0(±3)+LDA_STC    28.0
GMM         Tandem                25.5
GMM         Tandem + LDA_STC      22.1
SGMM        Tandem + LDA_STC      21.7

mix-PLDA    Tandem                21.6
tied-PLDA   Tandem                21.4
Summary
- PLDA for acoustic modelling, and results of using bottleneck features
- Discriminative training
- Speaker adaptation – another way for full-covariance adaptation
- FMLLR with log-Mel filter bank features for DNN adaptation
Thanks!