Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)

[course site]

Day 3 Lecture 3

Speaker ID I

Javier Hernando

https://telecombcn-dl.github.io/2017-dlsl/

http://futur.upc.edu/179875?locale=en

Acknowledgments

Miquel India, Omid Ghahabi, Pooyan Safari

Ph.D. candidates

Speech Recognition

Emotion - Happy

Gender - Woman

Age - Teenager

Speaker ID as Biometrics

Security

Applications

Modalities

Tasks

Features

Humans use different features to recognise the speaker:

● Pronunciation, diction, …
● Prosody, rhythm, speed, volume, …
● Acoustic aspects of the voice

Desirable properties for those features:

● Practical
  ○ Appear frequently and naturally in speech
  ○ Easily measurable by the system
● Robust
  ○ To changes over time or in the speaker’s health
  ○ To different transmission characteristics and to background noise
● Secure
  ○ Hard to falsify

No feature has all those properties. Spectrum-derived features are currently the most widely used because of their effectiveness.

HMMs and GMMs

GMM-UBM Universal Background Model

MAP Adaptation

Supervectors
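
The slides name MAP adaptation and supervectors without giving formulas. As a rough illustration, the sketch below uses the common mean-only (relevance) MAP update for a diagonal-covariance UBM and then stacks the adapted means into a supervector. The function name `map_adapt_means`, the relevance factor of 16, and the toy two-component UBM are assumptions made for this example and do not come from the lecture.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM (the UBM).

    X: (T, D) frames; weights: (C,); means, variances: (C, D).
    The relevance factor is an assumed, commonly used default.
    """
    # Log-likelihood of every frame under every component: shape (T, C).
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss
    # Normalise to component posteriors in a numerically stable way.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    n = post.sum(axis=0)                          # zeroth-order statistics (C,)
    first = post.T @ X                            # first-order statistics (C, D)
    data_means = first / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]        # data-dependent mixing weight
    # Interpolate between the speaker data mean and the UBM mean.
    return alpha * data_means + (1.0 - alpha) * means

# Toy UBM and synthetic "speaker" frames (illustrative values only).
rng = np.random.default_rng(0)
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[-2.0, 0.0], [2.0, 0.0]])
ubm_var = np.ones((2, 2))
X = rng.normal(loc=[2.5, 0.5], scale=1.0, size=(200, 2))

adapted = map_adapt_means(X, ubm_w, ubm_mu, ubm_var)
supervector = adapted.reshape(-1)                 # stacked adapted means, length C*D
```

Components that see little speaker data stay close to the UBM, while well-observed components move toward the speaker's data; the supervector is simply the adapted means concatenated into one vector.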

i-vectors

Joint Factor Analysis (JFA) model

i-Vector dimension

i-Vector Training

T (the total variability matrix) is trained iteratively according to the GMM posteriors

Given T and the UBM, i-vectors are extracted for each speaker utterance
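
The extraction step can be sketched as follows, assuming the usual total variability model in which the utterance supervector is M = m + Tw and the i-vector is the posterior mean of w given the utterance's Baum-Welch statistics. The function name `extract_ivector`, the argument layout, and the synthetic statistics are assumptions for illustration; training T itself is not shown.

```python
import numpy as np

def extract_ivector(N, F_centered, T, variances):
    """Posterior mean of w in M = m + T w (a sketch, not a full system).

    N: (C,) zeroth-order statistics per UBM component.
    F_centered: (C, D) first-order statistics centred on the UBM means.
    T: (C*D, R) total variability matrix; variances: (C, D) UBM variances.
    """
    C, D = F_centered.shape
    R = T.shape[1]
    inv_var = (1.0 / variances).reshape(C * D)    # diagonal of Sigma^-1
    N_expanded = np.repeat(N, D)                  # N_c repeated for each dimension
    Tt_Sinv = T.T * inv_var                       # T^T Sigma^-1, shape (R, C*D)
    precision = np.eye(R) + (Tt_Sinv * N_expanded) @ T
    rhs = Tt_Sinv @ F_centered.reshape(C * D)
    return np.linalg.solve(precision, rhs)

# Synthetic check: noise-free statistics generated from a known w.
rng = np.random.default_rng(0)
C, D, R = 4, 3, 2
T = rng.normal(size=(C * D, R))
variances = np.ones((C, D))
w_true = np.array([0.5, -1.0])
N = np.full(C, 1e6)                               # many frames per component
F_centered = N[:, None] * (T @ w_true).reshape(C, D)

w_hat = extract_ivector(N, F_centered, T, variances)
```

With many frames the standard-normal prior on w has little influence and the estimate approaches the generating vector; with no frames at all the i-vector falls back to zero.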

i-vector Scoring
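
The transcript does not say which scoring method the slide shows. One widely used option is cosine scoring between the enrolment and test i-vectors (PLDA is another); the sketch below illustrates only the cosine variant. The function name `cosine_score` and the example vectors are hypothetical.

```python
import numpy as np

def cosine_score(w_enroll, w_test):
    """Cosine similarity between two i-vectors; higher means more similar."""
    num = float(np.dot(w_enroll, w_test))
    den = float(np.linalg.norm(w_enroll) * np.linalg.norm(w_test))
    return num / den

# Hypothetical i-vectors, for illustration only.
enrolled = np.array([1.0, 2.0, 0.5])
same_direction = 3.0 * enrolled                   # scaling does not change the score
orthogonal = np.array([2.0, -1.0, 0.0])

score_same = cosine_score(enrolled, same_direction)
score_orth = cosine_score(enrolled, orthogonal)
```

In a verification setting the score would be compared against a threshold tuned on development data; the threshold value is system dependent and is not given in the slides.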

SoA Speaker Recognition

DL in Speaker Recognition

DL Features

BN Features

After M. McLaren et al., “Advances in deep neural network approaches to speaker recognition,” ICASSP 2015.

Autoencoder

Denoising Autoencoder

Denoising autoencoder for cepstral domain dereverberation.

● Transform noisy features of reverberant speech into clean speech features.

● Pre-training with Deep Belief Networks (DBN)

Zhang et al., “Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification,” EURASIP Journal on Audio, Speech, and Music Processing (2015) 2015:12
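
The idea of mapping corrupted features to clean ones can be illustrated with a very small denoising autoencoder. This is a minimal sketch under several assumptions: one tanh hidden layer, random initialisation instead of the DBN pre-training mentioned on the slide, plain gradient descent, and synthetic Gaussian data with additive noise standing in for cepstral features of reverberant speech. All names and hyperparameters are hypothetical and are not taken from Zhang et al.

```python
import numpy as np

def train_denoising_autoencoder(noisy, clean, hidden=16, epochs=200, lr=0.05, seed=0):
    """Train a one-hidden-layer network to map noisy features to clean ones."""
    rng = np.random.default_rng(seed)
    n, d = noisy.shape
    W1 = rng.normal(0.0, 0.1, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, size=(hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, linear output layer.
        h = np.tanh(noisy @ W1 + b1)
        out = h @ W2 + b2
        err = out - clean
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradient of the squared error, up to a constant factor.
        g_out = err / n
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)
        g_W1 = noisy.T @ g_h
        g_b1 = g_h.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def denoise(params, noisy):
    """Apply the trained network to noisy feature vectors."""
    W1, b1, W2, b2 = params
    return np.tanh(noisy @ W1 + b1) @ W2 + b2

# Synthetic stand-in data: "clean" features plus additive noise.
rng = np.random.default_rng(1)
clean = rng.normal(size=(256, 8))
noisy = clean + 0.3 * rng.normal(size=clean.shape)

params, losses = train_denoising_autoencoder(noisy, clean)
restored = denoise(params, noisy)
```

The key point is the training target: the network sees the corrupted features as input but is penalised against the clean features, so it learns a mapping that undoes the corruption rather than an identity function.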
