Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)

[course site]

Day 3 Lecture 3

Speaker ID I

Javier Hernando

Acknowledgments

Miquel India, Omid Ghahabi, Pooyan SafariPh.D. candidates

Speech Recognition

Emotion - HappyGender - WomanAge - Teengger

Speaker ID as Biometrics

Security

Applications

Modalities

FeaturesHumans use different features to recognise the speaker:

● Pronunciation, diction, …● Prosody, rhythm, speed, volume,…● Acoustic aspects of the voice.

Desirable aspects for those features:● Practical

○ To appear frequently and naturally during the speech○ Easily measurable for the system

● Robust○ Any change over time or by speaker’s health○ Any change by different transmission characteristics or by background noise

● Secure○ Hard to falsify

No feature has all those propertiesSpectrum-derived features are the more used by now because of their effectiveness

HMMs and GMMs

GMM-UBM Universal Background Model

MAP Adaptation

Supervectors

i-vectorsJoint Factor Analysis (JFA) model

i-Vector dimension

i-Vector TrainingT is trained iteratively according to the GMM posteriors

Given T and the UBM, i-vectors are extracted for each speaker utterance

i-vector Scoring

SoA Speaker Recognition

DL in Speaker Recognition

DL Features

BN Features

After M. Mclaren e al.,“Advances in deep neural network approaches to speaker recognition” ICASSP 2015.

Autoencoder

Denoising Autoencoder

Denoising autoencoder for cepstral domain dereverberation.

● Transfrom noisy features of reverberant speech to clean speech features.

● Pre-Trainning with Deep Belief Networks (DBN)

Zhang et al., Deep neural network-based bottleneck feature and denoising autoencoder-based fro distant-talking speaker identification, EURASSIP Journal on Audio, Speech, and Music Processing (2015) 2015:12

Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)

Data & Analytics

Transcript of Speaker ID I (D3L3 Deep Learning for Speech and Language UPC 2017)

A SPEAKER DEPENDENT, LARGE VOCABULARY, ISOLATED WORD SPEECH

Recurrent Neural Networks II (D2L3 Deep Learning for Speech and Language UPC 2017)

Speech Recognition with Deep Neural Networks (D3L2 Deep Learning for Speech and Language UPC 2017)

Deep Belief Networks (D2L1 Deep Learning for Speech and Language UPC 2017)

The Competent Speaker Speech Evaluation Form - National

What makes a GOOD PUBLIC SPEAKER & Speech?

Speaker normalization in speech perception

Speaker Diarization with Enhancing Speech for the First ...

Advanced Neural Machine Translation (D4L2 Deep Learning for Speech and Language UPC 2017)

Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)

Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)

Language Model (D3L1 Deep Learning for Speech and Language UPC 2017)

Speech Synthesis: WaveNet (D4L3 Deep Learning for Speech and Language UPC 2017)

Organizing Your Speech The Better Speaker Series 276.

Speech separation using speaker-adapted eigenvoice speech ...dpwe/pubs/WeissE08-spkadapt.pdf · Speech separation using speaker-adapted eigenvoice speech ... system and model the

The speaker producing speech

DT2118 Speech and Speaker Recognition - Introduction · DT2118 Speech and Speaker Recognition Introduction ... Itrain and evaluate a speech recogniser using the HTK ... DT2118 Speech

Speaker Adaptation in Speech Synthesis - LMU …jmh/lehre/sem/ws0708/venice... · Speaker Adaptation in Speech Synthesis Uwe Reichel Institute of Phonetics und Speech Processing ...

COMBINING SPEECH RECOGNITION AND SPEAKER VERIFICATION

Speech and speaker normalization (in vowel normalization)