Introduction Mapping Modeling Speaker Diarization Summary H. Aronowitz (IBM) Intra-Class Variability...

Page 1: Intra-Class Variability Modeling for Speech Processing

Introduction Mapping Modeling Speaker Diarization Summary

H. Aronowitz (IBM) Intra-Class Variability Modeling for Speech Processing June 08 1

Dr. Hagai Aronowitz

IBM Haifa Research Lab

Presentation is available online at: http://aronowitzh.googlepages.com/

Page 2: Speech Classification: Proposed framework

Given labeled training segments from class + and class –, classify unlabeled test segments.

Classification framework

1. Represent speech segments in segment-space

2. Learn a classifier in segment-space
• SVMs
• NNs
• Bayesian classifiers
• …

Page 3: Outline: Intra-Class Variability Modeling for Speech Processing

1 Introduction to GMM based classification

2 Mapping speech segments into segment space

3 Intra-class variability modeling

4 Speaker diarization

5 Summary

Page 4: Text-Independent Speaker Recognition: GMM-Based Algorithm [Reynolds 1995]

Question: $\Pr(Y \mid S) = ?$

GMM based speaker recognition

Estimate $\Pr(y_t \mid S)$:

1. Train a universal background model (UBM) GMM using EM

2. For every target speaker S: train a GMM $G_S$ by applying MAP-adaptation

Assuming frame independence:

$\Pr(y_1, \dots, y_T \mid S) = \prod_{t=1}^{T} \Pr(y_t \mid S)$

Figure: the UBM with component means μ1, μ2, μ3 and the adapted models Q1 (speaker #1) and Q2 (speaker #2) in the $R^{26}$ MFCC feature space.
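The scoring rule on this slide can be sketched in a few lines. This is only a minimal illustration, assuming diagonal-covariance GMMs stored as NumPy arrays; the function names `gmm_frame_loglik` and `segment_loglik` and all numbers are hypothetical and not taken from the talk. Under frame independence the segment log-likelihood is the sum of per-frame log-likelihoods, so the cost grows with the number of frames T.

```python
import numpy as np

def gmm_frame_loglik(frames, weights, means, variances):
    """Return log Pr(y_t | GMM) for each frame of a diagonal-covariance GMM.

    frames: (T, D), weights: (G,), means: (G, D), variances: (G, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, G, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (G,)
    log_comp = np.log(weights) + log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)
    # Log-sum-exp over the G components, shifted by the max for stability.
    top = log_comp.max(axis=1, keepdims=True)
    return (top + np.log(np.exp(log_comp - top).sum(axis=1, keepdims=True)))[:, 0]

def segment_loglik(frames, weights, means, variances):
    """log Pr(y_1..y_T | GMM) under frame independence: a sum over frames."""
    return float(gmm_frame_loglik(frames, weights, means, variances).sum())

# Toy example (made-up numbers): a 2-component UBM and a shifted speaker model.
ubm = (np.array([0.5, 0.5]), np.array([[0.0], [4.0]]), np.ones((2, 1)))
spk = (np.array([0.5, 0.5]), np.array([[0.5], [4.5]]), np.ones((2, 1)))
test_frames = np.array([[0.4], [4.6], [0.6]])
llr = segment_loglik(test_frames, *spk) - segment_loglik(test_frames, *ubm)
print("log-likelihood ratio vs. UBM:", llr)
```

A positive log-likelihood ratio means the test frames fit the speaker model better than the UBM.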

Page 5: GMM Based Algorithm - Analysis

1. Invalid frame independence assumption: factors such as channel, emotion, lexical variability, and speaker aging cause frame dependency

2. GMM scoring is inefficient: it is linear in the length of the audio

3. GMM scoring does not support indexing


Page 7: Mapping Speech Segments into Segment Space: GMM scoring approximation 1/4

Definitions

X: training session for target speaker

Y: test session

Q: GMM trained for X

P: GMM trained for Y

Goal

Compute Pr(Y|Q) using GMMs P and Q only

Motivation

1. Efficient speaker recognition and indexing

2. More accurate modeling

Page 8: Mapping Speech Segments into Segment Space: GMM scoring approximation 2/4

$\frac{1}{T} \log \Pr(Y \mid Q) = \frac{1}{T} \sum_{t=1}^{T} \log \Pr(y_t \mid Q) \approx \int_x \Pr(x \mid P) \log \Pr(x \mid Q)\, dx = H(P, Q)$   (1)

H(P, Q) is labeled on the slide as the negative cross entropy.

Approximating the cross entropy between two GMMs

1. Matching based lower bound [Aronowitz 2004]

2. Unscented-transform based approximation [Goldberger & Aronowitz 2005]

3. Other options in [Hershey 2007]

Page 9: Mapping Speech Segments into Segment Space: GMM scoring approximation 3/4

Matching based approximation

$H(P, Q) \approx \sum_{g=1}^{G} w_g^P \max_j \left[ \log w_j^Q - \sum_{d=1}^{D} \log \sigma_{j,d}^Q - \frac{1}{2} \sum_{d=1}^{D} \frac{(\mu_{g,d}^P - \mu_{j,d}^Q)^2}{(\sigma_{j,d}^Q)^2} - \frac{1}{2} \sum_{d=1}^{D} \frac{(\sigma_{g,d}^P)^2}{(\sigma_{j,d}^Q)^2} \right] + C$   (2)

Assuming weights and covariance matrices are speaker independent (+ some approximations):

$H(P, Q) \approx -\sum_{g=1}^{G} w_g \sum_{d=1}^{D} \frac{(\mu_{g,d}^P - \mu_{g,d}^Q)^2}{2 \sigma_{g,d}^2} + C$   (3)

Mapping T is induced:

$T: \mathrm{GMM} \to R^{G \cdot D}; \quad T(\mathrm{GMM}) = \hat{\mu}, \quad \hat{\mu}_{g,d} = \frac{\sqrt{w_g}\, \mu_{g,d}^{\mathrm{GMM}}}{\sigma_{g,d}}$   (4)
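The induced mapping can be illustrated with a short sketch. It assumes that T scales each component mean by sqrt(w_g)/σ_{g,d} using the shared (speaker-independent) weights and variances, so that approximation (3) becomes, up to the constant C, a negative half squared Euclidean distance between mapped vectors. The names `supervector` and `approx_score` are hypothetical.

```python
import numpy as np

def supervector(means, shared_weights, shared_variances):
    """Map a GMM to a vector of length G*D: sqrt(w_g) * mu_{g,d} / sigma_{g,d}.

    means: (G, D) adapted means; shared_weights: (G,); shared_variances: (G, D).
    """
    scaled = np.sqrt(shared_weights)[:, None] * means / np.sqrt(shared_variances)
    return scaled.ravel()

def approx_score(means_p, means_q, shared_weights, shared_variances):
    """Approximate score between GMMs P and Q, omitting the constant C."""
    delta = (supervector(means_p, shared_weights, shared_variances)
             - supervector(means_q, shared_weights, shared_variances))
    return -0.5 * float(delta @ delta)

# Toy example (made-up numbers): G = 2 components, D = 2 dimensions.
w = np.array([0.25, 0.75])
var = np.full((2, 2), 4.0)
score = approx_score(np.zeros((2, 2)), np.ones((2, 2)), w, var)
print(score)
```

Once every session is stored as a supervector, scoring a pair of sessions costs one vector difference and one dot product, independent of the audio length.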

Page 10: Mapping Speech Segments into Segment Space: GMM scoring approximation 4/4

Results

Figure and table taken from: H. Aronowitz, D. Burshtein, "Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)", in IEEE Trans. on Audio, Speech & Language Processing, September 2007.

Page 11: Other Mapping Techniques

1. Anchor modeling projection [Sturim 2001]
• efficient but inaccurate

2. MLLR transforms [Stolcke 2005]
• accurate but inefficient

3. Kernel-PCA-based mapping [Aronowitz 2007c]

Given:
- a set of objects
- a kernel function (a dot product between each pair of objects)

Finds a mapping of the objects into R^n which preserves the kernel function.
• accurate & efficient


Page 13: Intra-Class Variability Modeling [Aronowitz 2005b]: Introduction

The classic GMM algorithm does not explicitly model intra-speaker inter-session variability:
• channel, noise
• language
• stress, emotion, aging

The frame independence assumption does not hold in these cases!

$\Pr(y_1, \dots, y_T \mid S) = \prod_{t=1}^{T} \Pr(y_t \mid S)$   (1)

Instead, we can use a more relaxed assumption:

$\Pr(y_1, \dots, y_T \mid S, f) = \prod_{t=1}^{T} \Pr(y_t \mid S, f)$   (2)

which leads to:

$\Pr(y_1, \dots, y_T \mid S) = \int \Pr(y_1, \dots, y_T \mid S, f) \Pr(f \mid S)\, df = \int \Pr(f \mid S) \prod_{t=1}^{T} \Pr(y_t \mid S, f)\, df$   (3)

Page 14: Old vs. New Generative Models

Old Model: Speaker (a GMM) → frame sequence (frames generated independently)

New Model: Speaker (a PDF over GMM space) → session GMM (a GMM) → frame sequence (frames generated independently)

Page 15: Session-GMM Space

Figure: session-GMM space containing regions for speaker #1, speaker #2 and speaker #3; two points inside the speaker #1 region are labeled "GMM for session A of speaker #1" and "GMM for session B of speaker #1".

Page 16: Modeling in Session-GMM space 1/2

Recall mapping T induced by the GMM approximation analysis:

$T: \mathrm{GMM} \to R^{G \cdot D}; \quad T(\mathrm{GMM}) = \hat{\mu}, \quad \hat{\mu}_{g,d} = \frac{\sqrt{w_g}\, \mu_{g,d}^{\mathrm{GMM}}}{\sigma_{g,d}}$

• $\hat{\mu}$ is called a supervector
• A speaker is modeled by a multivariate normal distribution in supervector space:

$\Pr(\hat{\mu} \mid S) \sim N(\mu_S, \tilde{\Sigma}_{GD})$   (3)

• A typical dimension of $\tilde{\Sigma}_{GD}$ is 50,000 × 50,000
• $\tilde{\Sigma}_{GD}$ is estimated robustly using PCA + regularization: the covariance is assumed to be a low rank matrix with an additional non-zero (noise) diagonal

Page 17: Modeling in Session-GMM Space 2/2: Estimating covariance matrix

Figure: supervector space with two sessions (labeled 1 and 2) for each of speaker #1, speaker #2 and speaker #3, annotated with $\tilde{\Sigma}_{GD}$; the corresponding delta supervector space is annotated with $2\tilde{\Sigma}$.
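One way to read the "PCA + regularization" estimate is sketched below. It is an illustration under stated assumptions, not the talk's implementation: the covariance is taken to be a rank-r PCA part plus one shared noise variance for all remaining directions, it is estimated from differences between two sessions of the same development speaker (whose covariance is twice the session covariance), and the names `estimate_covariance`, `log_density` and `noise_floor` are hypothetical.

```python
import numpy as np

def estimate_covariance(session_pairs, rank, noise_floor=1e-3):
    """Estimate a low-rank-plus-noise covariance from same-speaker session pairs.

    session_pairs: (N, 2, dim), two supervectors per development speaker.
    Assumes rank <= min(N, dim) and rank < dim.
    Returns (basis (dim, rank), eigenvalues (rank,), noise variance).
    """
    deltas = session_pairs[:, 0, :] - session_pairs[:, 1, :]
    # A same-speaker delta has covariance 2 * Sigma, hence the factor 2.
    scaled = deltas / np.sqrt(2.0 * len(deltas))
    # SVD of the data avoids forming the dim x dim covariance explicitly.
    _, sing, vt = np.linalg.svd(scaled, full_matrices=False)
    eigvals = sing ** 2
    dim = deltas.shape[1]
    residual = (eigvals.sum() - eigvals[:rank].sum()) / (dim - rank)
    noise = max(residual, noise_floor)
    lam = np.maximum(eigvals[:rank], noise)
    return vt[:rank].T, lam, noise

def log_density(x, mean, basis, lam, noise):
    """log N(x; mean, Sigma) with Sigma = basis diag(lam) basis^T + noise * (I - basis basis^T)."""
    d = x - mean
    proj = basis.T @ d
    quad = np.sum(proj ** 2 / lam) + (d @ d - proj @ proj) / noise
    dim = d.shape[0]
    logdet = np.sum(np.log(lam)) + (dim - len(lam)) * np.log(noise)
    return float(-0.5 * (quad + logdet + dim * np.log(2.0 * np.pi)))
```

With this parameterization neither estimation nor scoring needs the full 50,000 × 50,000 matrix; only the rank-r basis and one noise variance are stored.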

Page 18: Experimental Setup

Datasets
• $\tilde{\Sigma}_{GD}$ is estimated from the NIST-2006-SRE corpus
• Evaluation is done on the NIST-2004-SRE corpus

System description
• ETSI MFCC (13-cep + 13-delta-cep)
• Energy based voice activity detector
• Feature warping
• 2048 Gaussians
• Target models are adapted from GI-UBM
• ZT-norm score normalization

Page 19: Results

38% reduction in EER

Page 20: Other Modeling Techniques

• NAP+SVMs [Campbell 2006]
• Factor Analysis [Kenny 2005]
• Kernel-PCA [Aronowitz 2007c]

Kernel-PCA based algorithm

• Model each supervector as s + u, where
  s ∈ S: Common speaker subspace
  u ∈ U: Speaker unique subspace
• S is spanned by a set of development supervectors (700 speakers)
• U is the orthogonal complement of S in supervector space
• Intra-speaker variability is modeled separately in S and in U
• U was found to be more discriminative than S
• EER was reduced by 44% compared to baseline GMM

Page 21: Kernel-PCA Based Modeling

Figure: sessions x and y in session space are sent by the kernel-induced map f to f(x) and f(y) in feature space. K-PCA on the anchor sessions yields the common speaker subspace (R^n), where x and y are represented by Tx and Ty; the residuals ux and uy lie in the speaker unique subspace.


Page 23: Trainable Speaker Diarization [Aronowitz 2007d]

Goals

• Detect speaker changes: "speaker segmentation"
• Cluster speaker segments: "speaker clustering"

Motivation for new method

Current algorithms do not exploit available training data! (besides tuning thresholds, etc.)

Method

Explicitly model inter-segment intra-speaker variability from labeled training data, and use it to define the metric used by the change-detection / clustering algorithms.

Page 24: Speaker recognition on pairs of 3s segments

Dev data
• BNAD05 (5hr): Arabic, broadcast news

Eval data
• BNAT05: Arabic, broadcast news (207 target models, 6756 test segments)

System                                                             EER (%)
Anchor modeling (baseline)                                         15.1
Anchor modeling - Kernel based scoring                             10.8
Kernel-PCA projection (CSS)                                        8.8
Kernel-PCA projection (CSS) + inter-segment variability modeling   7.4

Page 25: Speaker Diarization System & Experiments

Speaker change detection
• 2 adjacent sliding windows (3s each)
• Speaker verification scoring + normalization

Speaker clustering
• Speaker verification scoring + normalization
• Bottom-up clustering

Speaker Error Rate (SER) on BNAT05
• Anchor modeling (baseline): 12.9%
• Kernel-PCA based method: 7.9%
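The bottom-up clustering step can be sketched as follows. This is a generic agglomerative loop, not the talk's system: it assumes each segment is already represented by a vector, and it uses the squared Euclidean distance between cluster centroids as a stand-in for the normalized speaker verification score. The function name `bottom_up_cluster` and the `threshold` value are hypothetical.

```python
import numpy as np

def bottom_up_cluster(vectors, threshold):
    """Merge the two closest clusters repeatedly until none are closer than threshold.

    vectors: (N, dim) array, one vector per speech segment.
    Returns a list of clusters, each a list of segment indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                centroid_a = vectors[clusters[a]].mean(axis=0)
                centroid_b = vectors[clusters[b]].mean(axis=0)
                dist = float(np.sum((centroid_a - centroid_b) ** 2))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        if best[0] > threshold:
            break  # remaining clusters are treated as different speakers
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy example (made-up vectors): two well separated groups of segments.
segments = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(bottom_up_cluster(segments, threshold=1.0))
```

Swapping the distance for a trained, normalized verification score is where labeled training data would enter such a loop.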


Page 27: Summary 1/2

• A method for mapping speech segments into a GMM supervector space was described
• Intra-speaker inter-session variability is modeled in GMM supervector space

Speaker recognition
• EER was reduced by 38% on the NIST-2004 SRE
• A corresponding kernel-PCA based approach reduces EER by 44%

Speaker diarization
• SER for speaker diarization was reduced by 39%.
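The 39% figure is consistent with the SER values reported for BNAT05 earlier in the talk (12.9% for the anchor-modeling baseline, 7.9% for the kernel-PCA based method). A short worked check, with rounding to the nearest percent as the only assumption:

```latex
\[
\frac{12.9 - 7.9}{12.9} = \frac{5.0}{12.9} \approx 0.388 \approx 39\%
\]
```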

Page 28: Summary 2/2: Algorithms based on the proposed framework

• Speaker recognition [Aronowitz 2005b; Aronowitz 2007c]
• Speaker diarization ("who spoke when") [Aronowitz 2007d]
• VAD (voice activity detection) [Aronowitz 2007a]
• Language identification [Noor & Aronowitz 2006]
• Gender identification [Bocklet 2008]
• Age detection [Bocklet 2008]
• Channel/bandwidth classification [Aronowitz 2007d]

Page 29: Bibliography 1/2

[1] D. A. Reynolds et al., "Speaker identification and verification using Gaussian mixture speaker models", Speech Communication, 17, 91-108, 1995.

[2] D. E. Sturim et al., "Speaker indexing in large audio databases using anchor models", in Proc. ICASSP, 2001.

[3] H. Aronowitz, D. Burshtein, A. Amir, "Speaker indexing in audio archives using test utterance Gaussian mixture modeling", in Proc. ICSLP, 2004.

[4] H. Aronowitz, D. Burshtein, A. Amir, "A session-GMM generative model using test utterance Gaussian mixture modeling for speaker verification", in Proc. ICASSP, 2005.

[5] P. Kenny et al., "Factor Analysis Simplified", in Proc. ICASSP, 2005.

[6] H. Aronowitz, D. Irony, D. Burshtein, "Modeling Intra-Speaker Variability for Speaker Recognition", in Proc. Interspeech, 2005.

[7] J. Goldberger and H. Aronowitz, "A distance measure between GMMs based on the unscented transform and its application to speaker recognition", in Proc. Interspeech, 2005.

[8] H. Aronowitz, D. Burshtein, "Efficient Speaker Identification and Retrieval", in Proc. Interspeech, 2005.

Page 30: Bibliography 2/2

[9] A. Stolcke et al., "MLLR Transforms as Features in Speaker Recognition", in Proc. Interspeech, 2005.

[10] E. Noor, H. Aronowitz, "Efficient Language Identification using Anchor Models and Support Vector Machines", in Proc. ISCA Odyssey Workshop, 2006.

[11] W. M. Campbell et al., "SVM Based Speaker Verification Using a GMM Supervector Kernel and NAP Variability Compensation", in Proc. ICASSP, 2006.

[12] H. Aronowitz, "Segmental modeling for audio segmentation", in Proc. ICASSP, 2007.

[13] J. R. Hershey and P. A. Olsen, "Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models", in Proc. ICASSP, 2007.

[14] H. Aronowitz, D. Burshtein, "Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)", in IEEE Trans. on Audio, Speech & Language Processing, September 2007.

[15] H. Aronowitz, "Speaker Recognition using Kernel-PCA and Intersession Variability Modeling", in Proc. Interspeech, 2007.

[16] H. Aronowitz, "Trainable Speaker Diarization", in Proc. Interspeech, 2007.

[17] T. Bocklet et al., "Age and Gender Recognition for Telephone Applications Based on GMM Supervectors and Support Vector Machines", in Proc. ICASSP, 2008.

Page 31: Thanks!

Presentation is available online at: http://aronowitzh.googlepages.com/

Page 32: Backup slides

Page 33: Kernel-PCA Based Mapping 2/5

Figure: the kernel trick maps sessions x and y from session space to f(x) and f(y) in a dot-product feature space via f(); the anchor sessions are shown in session space.

Goals:
- Map sessions into feature space
- Model in feature space

Page 34: Kernel-PCA Based Mapping 3/5

Given
- kernel K
- n anchor sessions $A_1, \dots, A_n$

Find an orthonormal basis for $\mathrm{span}\{f(A_1), \dots, f(A_n)\}$

Method

1) Compute eigenvectors of the centralized kernel-matrix $k_{i,j} = K(A_i, A_j)$.

2) Normalize eigenvectors by square-roots of corresponding eigenvalues → $\{v_i\}$

3) $\{f_i\}$ for $i = 1, \dots, n$ is the requested basis, where $f_i = (f(A_1), \dots, f(A_n))\, v_i$

Page 35: Kernel-PCA Based Mapping 4/5

Common speaker subspace: $C = \mathrm{span}\{f(A_1), \dots, f(A_n)\}$

Speaker unique subspace: $U = F / \mathrm{span}\{f(A_1), \dots, f(A_n)\}$

Given sessions x, y, the images f(x), f(y) may be uniquely represented as:

$f(x) = c_x + u_x$ and $f(y) = c_y + u_y$, with $c_x, c_y \in C$ and $u_x, u_y \in U$

$T: x \mapsto (v_1 \ \dots \ v_n)^{\top} \, (K(x, A_1), \dots, K(x, A_n))^{\top}$

is a mapping x→R^n with the property:

$\|T(x) - T(y)\|^2 = \|c_x - c_y\|^2$
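The kernel-PCA construction and the distance-preserving property can be checked numerically. The sketch below is a generic kernel-PCA illustration, not the talk's code: the names `kpca_fit` and `kpca_map` are hypothetical, eigenvectors with near-zero eigenvalue are dropped (controlled by the hypothetical `tol`), and a linear kernel on toy 2-D "sessions" is used so that f is the identity and the property is easy to verify.

```python
import numpy as np

def kpca_fit(K, tol=1e-10):
    """Return the normalized eigenvectors {v_i} of the centered anchor kernel matrix.

    K: (n, n) kernel matrix with K[i, j] = K(A_i, A_j).
    """
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    K_centered = K - one @ K - K @ one + one @ K @ one
    eigval, eigvec = np.linalg.eigh(K_centered)
    keep = eigval > tol * eigval.max()      # drop numerically null directions
    return eigvec[:, keep] / np.sqrt(eigval[keep])

def kpca_map(k_x, K, V):
    """T(x): project a session onto the basis, given k_x[i] = K(x, A_i)."""
    k_centered = k_x - K.mean(axis=0) - k_x.mean() + K.mean()
    return V.T @ k_centered

# Toy check with a linear kernel (made-up data): 4 anchors in 2-D.
anchors = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 3.0], [-1.0, 1.0]])
K = anchors @ anchors.T
V = kpca_fit(K)
x, y = np.array([0.3, -0.7]), np.array([1.5, 2.0])
tx, ty = kpca_map(anchors @ x, K, V), kpca_map(anchors @ y, K, V)
print(np.sum((tx - ty) ** 2), np.sum((x - y) ** 2))
```

In this toy case the anchors span the whole feature space, so the unique-subspace parts vanish and the two printed squared distances coincide up to floating-point error.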

Page 36: Kernel-PCA Based Mapping 5/5

Figure: sessions x and y in session space are mapped to f(x) and f(y) in feature space. K-PCA on the anchor sessions yields the common speaker subspace (R^n), where x and y are represented by Tx and Ty; the residuals ux and uy lie in the speaker unique subspace.

Page 37: Modeling in Segment-GMM Supervector Space

Figure: frame sequences for segment #1, segment #2, …, segment #n are mapped to points in segment-GMM supervector space, which contains regions for music, speech and silence.

Page 38: Segmental Modeling for Audio Segmentation

Goal
• Segment audio accurately and robustly into speech / silence / music segments.

Novel idea
• Acoustic modeling is usually done on a frame-basis.
• Segmentation/classification is usually done on a segment-basis (using smoothing).

Why not explicitly model whole segments?

Note: speaker, noise, music-context, channel (etc.) are constant during a segment.

Page 39: Speech / Silence Segmentation - Results 1/2

Figure: SPEECH / SILENCE SEGMENTATION. Axes: speech miss probability vs. silence miss probability (log scale, ticks at 10^-2 and 10^-1). Legend: IBM EVAL06, IBM EVAL06 no-pad, GMM baseline, Segmental.

System            EER     FA @ FR=0.5%   FR @ FA=1%
EVAL06            FA=24.2% @ FR=0.25%
GMM baseline      2.9%    7.9%           29.6%
Segmental         1.7%    5.1%           2.7%
Error reduction   41%     35%            91%

Page 40: Speech / Silence Segmentation - Results 2/2

Figure: SPEECH / MUSIC SEGMENTATION. Axes: speech miss probability vs. music miss probability (log scale, ticks at 10^-3, 10^-2 and 10^-1). Legend: IBM EVAL06, IBM EVAL06 no-pad, GMM baseline, Segmental.

System            EER     FA @ FR=0.5%   FR @ FA=1%
EVAL06            FA=69% @ FR=0.25%
GMM baseline      1.43%   3.4%           3.2%
Segmental         1.27%   2.0%           1.9%
Error reduction   11%     41%            41%

Page 41: LID in Session Space

Figure: session space with regions for English, Arabic and French; markers distinguish training sessions from test sessions.

Page 42: LID in Session Space - Algorithm

1. Front end: shifted delta cepstrum (SDC).

2. Represent every train/test session by a GMM super-vector.

3. Train a linear SVM to classify GMM super-vectors.

Results
• EER=4.1% on the NIST-03 Eval (30sec sessions).

Page 43: Anchor Modeling Projection

• Speaker indexing [Sturim et al., 2001]
• Intersession variability modeling in projected space [Collet et al., 2005]
• Speaker clustering [Reynolds et al., 2004]
• Speaker segmentation [Collet et al., 2006]
• Language identification [Noor and Aronowitz, 2006]

Given: anchor models λ1,…,λn and session X = x1,…,xF

Projection:

$X \mapsto \left( \hat{s}(X \mid \lambda_1), \dots, \hat{s}(X \mid \lambda_n) \right)$

$\hat{s}(X \mid \lambda_i) = \frac{1}{F} \log \frac{\Pr(X \mid \lambda_i)}{\Pr(X \mid \lambda_{UBM})}$ = average normalized log-likelihood
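A minimal sketch of the anchor projection follows. It assumes that anchors and the UBM are diagonal-covariance GMMs given as (weights, means, variances) tuples and that frames are scored independently, so the average normalized log-likelihood is the mean per-frame log-likelihood ratio against the UBM. The names `frame_loglik` and `anchor_projection` and all numbers are hypothetical.

```python
import numpy as np

def frame_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (F, D), weights: (G,), means: (G, D), variances: (G, D).
    """
    diff = frames[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                - 0.5 * np.sum(diff ** 2 / variances, axis=2))
    top = log_comp.max(axis=1, keepdims=True)   # stable log-sum-exp
    return (top + np.log(np.exp(log_comp - top).sum(axis=1, keepdims=True)))[:, 0]

def anchor_projection(frames, anchors, ubm):
    """Map a session to R^n: one average normalized log-likelihood per anchor."""
    ubm_ll = frame_loglik(frames, *ubm)
    return np.array([np.mean(frame_loglik(frames, *anchor) - ubm_ll)
                     for anchor in anchors])

# Toy example (made-up models): single-Gaussian UBM and two single-Gaussian anchors.
unit = np.array([1.0])
ubm = (unit, np.array([[0.0]]), np.array([[1.0]]))
anchors = [(unit, np.array([[0.0]]), np.array([[1.0]])),
           (unit, np.array([[1.0]]), np.array([[1.0]]))]
session = np.zeros((3, 1))
print(anchor_projection(session, anchors, ubm))
```

Each session becomes an n-dimensional vector, one coordinate per anchor model, which is what allows sessions to be indexed and compared without rescoring the audio.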

Page 44: Intra-Class Variability Modeling: Introduction

The classic GMM algorithm does not explicitly model intra-speaker inter-session variability:
• Noise
• Channel
• Language
• Changing speaker characteristics: stress, emotion, aging

The frame independence assumption does not hold in these cases!

$\Pr(y_1, \dots, y_T \mid S) = \prod_{t=1}^{T} \Pr(y_t \mid S)$   (1)

Instead, we get:

$\Pr(y_1, \dots, y_T \mid S) = \int \Pr(y_1, \dots, y_T \mid S, f) \Pr(f \mid S)\, df = \int \Pr(f \mid S) \prod_{t=1}^{T} \Pr(y_t \mid S, f)\, df$   (2)

Annotations on the slide: $\Pr(y_t \mid G_{S,f})$ and $\Pr(G_{S,f} \mid S)$