Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)
-
Upload
xavier-giro -
Category
Data & Analytics
-
view
32 -
download
8
Transcript of Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)
![Page 1: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/1.jpg)
[course site]
Day 3 Lecture 3
Speaker ID I
Javier Hernando
![Page 2: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/2.jpg)
2
Acknowledgments
Miquel India, Omid Ghahabi, Pooyan SafariPh.D. candidates
![Page 3: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/3.jpg)
3
Speech Recognition
Emotion - HappyGender - WomanAge - Teengger
.
.
.
![Page 4: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/4.jpg)
4
Speaker ID as Biometrics
![Page 5: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/5.jpg)
5
Security
![Page 6: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/6.jpg)
6
Applications
![Page 7: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/7.jpg)
7
Modalities
![Page 8: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/8.jpg)
8
Tasks
![Page 9: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/9.jpg)
9
FeaturesHumans use different features to recognise the speaker:
● Pronunciation, diction, …● Prosody, rhythm, speed, volume,…● Acoustic aspects of the voice.
Desirable aspects for those features:● Practical
○ To appear frequently and naturally during the speech○ Easily measurable for the system
● Robust○ Any change over time or by speaker’s health○ Any change by different transmission characteristics or by background noise
● Secure○ Hard to falsify
No feature has all those propertiesSpectrum-derived features are the more used by now because of their effectiveness
![Page 10: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/10.jpg)
10
HMMs and GMMs
![Page 11: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/11.jpg)
11
GMM-UBM Universal Background Model
MAP Adaptation
![Page 12: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/12.jpg)
12
Supervectors
![Page 13: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/13.jpg)
13
i-vectorsJoint Factor Analysis (JFA) model
![Page 14: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/14.jpg)
14
i-Vector dimension
![Page 15: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/15.jpg)
15
i-Vector TrainingT is trained iteratively according to the GMM posteriors
Given T and the UBM, i-vectors are extracted for each speaker utterance
![Page 16: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/16.jpg)
16
i-vector Scoring
![Page 17: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/17.jpg)
17
SoA Speaker Recognition
![Page 18: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/18.jpg)
18
DL in Speaker Recognition
![Page 19: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/19.jpg)
19
DL Features
![Page 20: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/20.jpg)
20
BN Features
After M. Mclaren e al.,“Advances in deep neural network approaches to speaker recognition” ICASSP 2015.
![Page 21: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/21.jpg)
Autoencoder
![Page 22: Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)](https://reader031.fdocuments.us/reader031/viewer/2022022201/5899c4731a28ab45548b5735/html5/thumbnails/22.jpg)
Denoising Autoencoder
Denoising autoencoder for cepstral domain dereverberation.
● Transfrom noisy features of reverberant speech to clean speech features.
● Pre-Trainning with Deep Belief Networks (DBN)
Zhang et al., Deep neural network-based bottleneck feature and denoising autoencoder-based fro distant-talking speaker identification, EURASSIP Journal on Audio, Speech, and Music Processing (2015) 2015:12