Chengyuan Ma and Jinyu Li - ISyE Home | ISyEbrani/isyebayes/bank/ISYE8843_Ma_and_Li.pdf ·...
Transcript of Chengyuan Ma and Jinyu Li - ISyE Home | ISyEbrani/isyebayes/bank/ISYE8843_Ma_and_Li.pdf ·...
Bayesian Method in Speaker Verification
Chengyuan Ma and Jinyu Li
School of Electrical and Computer Engineering
Bayesian Method in Speaker Verification. December 2, 2004 – p. 1/??
Outline
Speaker Verification System
MAP adaptation of Speaker Model
Experiment Setup, Conclusion and Future Work
Bayesian Method in Speaker Verification. December 2, 2004 – p. 2/??
Speaker Verification System
Bayesian Method in Speaker Verification. December 2, 2004 – p. 3/??
Speaker Verification SystemFramework
Speaker Verification
Determines whether person is who they claim to be
User makes identity claim (one to one mapping)
Three Main Components
Front-end processing (Feature Extraction)
Speaker and background modeling (Speaker Modeling)
Log-likelihood-ratio score normalization (Decision
Making)
Bayesian Method in Speaker Verification. December 2, 2004 – p. 4/??
Speaker Verification SystemFramework (Cont.)
Bayesian Method in Speaker Verification. December 2, 2004 – p. 5/??
Speaker and Background Modeling
GMM (Gaussian Mixtures Model) represent static acoustic
event distribution for each speaker
p(ot|Λ) =M∑
m=1
wm
(2π)d
2 |Σm|1
2
· e−1
2(ot−µm)τΣ−1
m (ot−µm)
M∑
m=1
wm = 1 and wm > 0
Λ is used to represent speaker model parameters. wm is the mth weight ofGaussian mixture N (ot; µm, Σm) with mean µm and covariance matrix Σm.
Bayesian Method in Speaker Verification. December 2, 2004 – p. 6/??
Speaker Recognition Scoring
p(O|Λ) = p(o1, o2, · · · , oT |Λ) =
T∏
t=1
p(ot|Λ)
The score (average Log-Likelihood Ratio (LLR)) of a given test segment O is
computed as follow,
score =1
T
T∑
t=1
(log p(ot|Λtar) − log p(ot|ΛUBM))
T is the length of the test segment, ot is the feature vector at time t and Λtar and
ΛUBM are parameters of the target model and UBM (Universal Background
Model) respectively.
Bayesian Method in Speaker Verification. December 2, 2004 – p. 7/??
Performance Measurement ofSpeaker Verification System
ROC and DET plots are used for performance measurement
Bayesian Method in Speaker Verification. December 2, 2004 – p. 8/??
MAP adaptation of Speaker Model
Bayesian Method in Speaker Verification. December 2, 2004 – p. 9/??
Speaker Model Adaptation UsingMAP Method
EM training is reliable only when sufficient training data are
available
If only a small amount of training data, adaptation method
will be used for model training.
Background model is regarded as prior information for each
speaker, speaker-specified training data are the observation
samples, these two can be combined in the Bayesian
framework.
Bayesian Method in Speaker Verification. December 2, 2004 – p. 10/??
Speaker Model Adaptation UsingMAP Method (Cont.)
Speaker model parameters to be estimated
Λ = (w1, w2, · · · , wM , µ1, µ2, · · · , µM ,Σ1,Σ2, · · · ,ΣM )
MAP estimation of speaker model parameters
θMAP = arg maxθ
p(O|Λ)g(Λ)
Bayesian Method in Speaker Verification. December 2, 2004 – p. 11/??
Speaker Model Adaptation UsingMAP Method (Cont.)
Prior distribution of mixture weights is a conjugete density
such as Dirichlet density
g(w1, w2, · · · , wM ) ∝M∏
m=1
wvk−1m
where vk > 0 are the parameters for the Dirichlet density. a
aJ.-L. Gauvain and C.-H. Lee, ”Maximum a Posterior Estimation for Multivariate Gaussian Mixture Ob-
servations of Markov Chains,” IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, April
1994.
Bayesian Method in Speaker Verification. December 2, 2004 – p. 12/??
Speaker Model Adaptation UsingMAP Method (Cont.)
Prior distribution of (µm,Σm) is a normal_Wishart density
g(µm,Σm) ∝ |γm|(αm−p)/2
exp[
−τm
2(µm − mm)tΣm(µm − mm)
]
exp
[
−1
2tr(umΣm)
]
where (τm, mm, αm, um) are the prior density parameters
such that αm > p − 1, τm > 0, mm is a vector of dimension p,
and um is a p ∗ p positive definite matrix.
Bayesian Method in Speaker Verification. December 2, 2004 – p. 13/??
Speaker Model Adaptation UsingMAP Method (Cont.)
Practical method for parameter adaptation only the mean
vectors will be updated
µ̂m = αmEm(O) + (1 − αm)µm
Bayesian Method in Speaker Verification. December 2, 2004 – p. 14/??
Experiment Setup, Conclusion and Future Work
Bayesian Method in Speaker Verification. December 2, 2004 – p. 15/??
Experiment Setup
1996 NIST(National Institute of Standards and Technology)
dataset was used for experiment,
225 Speakers in the Corpus
1 minutes for training and 15 second for testing for each
speaker
Bayesian Method in Speaker Verification. December 2, 2004 – p. 16/??
System Performance using EMTraining
Bayesian Method in Speaker Verification. December 2, 2004 – p. 17/??
System Performance using MAPAdaptation
Bayesian Method in Speaker Verification. December 2, 2004 – p. 18/??
Future Work
Acoustic features robust against noisy environment and
different channels
Incorporation dynamical acoustic features and prosodic
features
Higher level information
novel probabilistic models for speaker modeling
Bayesian Method in Speaker Verification. December 2, 2004 – p. 19/??
Q & A
Bayesian Method in Speaker Verification. December 2, 2004 – p. 20/??