Fusion Gérard CHOLLET chollet@tsi.enst.fr@ GET-ENST/CNRS-LTCI 46 rue Barrault 75634 PARIS cedex 13...

Post on 03-Apr-2015


Fusion

Gérard CHOLLET
chollet@tsi.enst.fr

GET-ENST/CNRS-LTCI
46 rue Barrault
75634 PARIS cedex 13

http://www.tsi.enst.fr/~chollet

Outline

Motivations, applications

Pattern recognition

Multi-sensor

Signal enhancement

Features

Scores

Decisions

Conclusions, perspectives

Introduction

Pattern recognition

Why fuse?

What to fuse? Signals from various sensors, features measured on these signals, scores computed by classifiers, decisions made by classifiers

How to fuse?

Pattern recognition

Signal fusion

Number of sensors

Types of sensors: identical?

Number of sources

Examples: microphone arrays, stereo vision, seismographs

Feature fusion

Features from a single sensor

Features from several sensors

Multi-stream models

Examples: speech recognition, Bayesian networks

Score fusion

Decision fusion
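
As a minimal sketch of decision-level fusion, assuming each classifier outputs a hard accept/reject decision, a majority vote is one common way to combine them. The function name `majority_vote` and the tie-breaking rule (reject on ties) are illustrative assumptions, not taken from the slides.

```python
def majority_vote(decisions):
    """Fuse hard decisions (True = accept, False = reject) by majority vote.

    Ties are resolved as reject, a conservative choice assumed here
    for an identity-verification setting.
    """
    if not decisions:
        raise ValueError("at least one decision is required")
    accepts = sum(1 for d in decisions if d)
    # Strict majority: more than half of the classifiers must accept.
    return accepts > len(decisions) / 2


# Three hypothetical classifiers (e.g. voice, face, lip motion).
print(majority_vote([True, True, False]))
# Two classifiers that disagree: the tie is treated as a reject.
print(majority_vote([True, False]))
```

Voting discards the confidence carried by each classifier's score, which is why score-level fusion is often preferred when calibrated scores are available.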

Vector Quantization (VQ)

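Best quantization: $\mu(X,Y) = \sum_{j} d^{2}(y_j, C_i^{X})$, where $C_i^{X}$ is the codeword of speaker X's codebook that best quantizes the test vector $y_j$.

[Figure: the test utterance "Bonjour" from speaker Y is compared with the codebooks of speaker 1, speaker 2, through speaker n; X denotes the speaker whose codebook is being scored.]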

SOONG, ROSENBERG 1987
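
The codebook comparison on this slide can be sketched as follows, assuming a squared Euclidean distance and a distortion accumulated over all test vectors. The function names, the toy two-dimensional vectors and the speaker labels are hypothetical; real systems use cepstral vectors and much larger codebooks.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def vq_distortion(test_vectors, codebook):
    """Sum, over the test vectors, of the squared distance to the
    nearest (best-quantizing) codeword of the codebook."""
    return sum(
        min(squared_distance(y, c) for c in codebook)
        for y in test_vectors
    )


def identify_speaker(test_vectors, codebooks):
    """Return the speaker whose codebook gives the lowest distortion."""
    return min(
        codebooks,
        key=lambda name: vq_distortion(test_vectors, codebooks[name]),
    )


# Toy codebooks for two hypothetical speakers.
codebooks = {
    "speaker_1": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_2": [(5.0, 5.0), (6.0, 6.0)],
}
test_y = [(0.1, 0.0), (0.9, 1.0)]

print(identify_speaker(test_y, codebooks))
```

For verification rather than identification, the distortion against the claimed speaker's codebook would be compared with a threshold instead of taking the minimum over speakers.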

Hidden Markov Models (HMM)

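Best path: $\mu(X,Y) = -\sum_{j} \log P(y_j \mid S_i^{X})$, where $S_i^{X}$ is the state of speaker X's model that the best path assigns to the test vector $y_j$.

[Figure: the test utterance "Bonjour" from speaker Y is compared with the "Bonjour" models of speaker 1, speaker 2, through speaker n; X denotes the speaker whose model is being scored.]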

ROSENBERG 1990, TSENG 1992
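
A best-path score of this kind is usually obtained with the Viterbi algorithm. The sketch below is a simplified illustration, assuming discrete observation symbols and including transition probabilities in the path score; speech systems normally use continuous emission densities. All names and the toy two-state left-to-right model are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps a zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_score(observations, init, trans, emit):
    """Return -log of the probability of the best state path.

    init[s]     : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    A lower score means a better match between model and observations.
    """
    n_states = len(init)
    # delta[s] = best log-probability of any path ending in state s.
    delta = [
        safe_log(init[s]) + safe_log(emit[s][observations[0]])
        for s in range(n_states)
    ]
    for obs in observations[1:]:
        delta = [
            max(delta[p] + safe_log(trans[p][s]) for p in range(n_states))
            + safe_log(emit[s][obs])
            for s in range(n_states)
        ]
    return -max(delta)


# Toy left-to-right model: state 0 may stay or advance, state 1 is final.
init = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emit = [[0.9, 0.1], [0.2, 0.8]]

score = viterbi_score([0, 0, 1, 1], init, trans, emit)
print(score)
```

In a speaker recognition setting, the same test utterance would be scored against each speaker's model and the lowest score retained.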

Ergodic HMM

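Best path: $\mu(X,Y) = -\sum_{j} \log P(y_j \mid S_i^{X})$, where $S_i^{X}$ is the state of speaker X's ergodic HMM that the best path assigns to the test vector $y_j$.

[Figure: the test utterance "Bonjour" from speaker Y is compared with the HMMs of speaker 1, speaker 2, through speaker n; X denotes the speaker whose HMM is being scored.]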

PORITZ 1982, SAVIC 1990

Gaussian Mixture Models (GMM)

REYNOLDS 1995

HMM structure depends on the application

Gaussian Mixture Model

Parametric representation of the probability distribution of observations:
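
A standard way to write this representation, assuming $M$ Gaussian components with mixture weights $w_k$, mean vectors $\mu_k$ and covariance matrices $\Sigma_k$ (here $\mu_k$ is a component mean, not the score $\mu(X,Y)$ used on the earlier slides), is:

```latex
p(x \mid \lambda) = \sum_{k=1}^{M} w_k \, \mathcal{N}(x;\, \mu_k, \Sigma_k),
\qquad
\sum_{k=1}^{M} w_k = 1, \quad w_k \ge 0,

\mathcal{N}(x;\, \mu_k, \Sigma_k) =
\frac{1}{(2\pi)^{D/2} \, |\Sigma_k|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right)
```

Here $D$ is the dimension of the observation vector $x$ and $\lambda = \{w_k, \mu_k, \Sigma_k\}$ denotes the model parameters of one speaker.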

Gaussian Mixture Models

8 Gaussians per mixture
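
As a rough illustration of how a GMM scores a sequence of feature vectors, the sketch below computes the average per-frame log-likelihood, assuming diagonal covariance matrices. The function names and the toy one-dimensional, two-component model are hypothetical; a model with 8 Gaussians per mixture would simply have 8 entries in each parameter list.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one frame under a diagonal-covariance GMM."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp keeps the computation numerically stable.
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def average_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of a sequence of frames."""
    total = sum(gmm_log_likelihood(x, weights, means, variances) for x in frames)
    return total / len(frames)


# Toy model: two equally weighted unit-variance components at 0 and 4.
weights = [0.5, 0.5]
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]

near = average_log_likelihood([[0.1], [3.9]], weights, means, variances)
far = average_log_likelihood([[10.0], [12.0]], weights, means, variances)
print(near > far)
```

In speaker verification, this score is commonly compared with the score of a background model, and the difference is thresholded.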

Support Vector Machines and Speaker Verification

A hybrid GMM-SVM system is proposed.

The SVM scoring model is trained on development data to separate true-target speaker accesses from impostor accesses, using a new feature representation based on GMMs.

[Figure: modeling is done with the GMM, scoring with the SVM.]

SVM principles

[Figure: a mapping takes X from the input space to its image in the feature space, where several separating hyperplanes H are shown together with the optimal hyperplane Ho; the output is Class(X).]
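
The classification step can be sketched with the usual kernel form of the SVM decision function, in which the feature-space mapping is applied implicitly through a kernel. This is an illustration only: the RBF kernel, the support vectors, the coefficients and the bias below are hypothetical values, not the result of any training.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel; stands in for a dot product in feature space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support_vectors, alphas, labels, bias, kernel=rbf_kernel):
    """Signed score of x relative to the separating hyperplane."""
    return sum(
        a * y * kernel(sv, x)
        for sv, a, y in zip(support_vectors, alphas, labels)
    ) + bias


def svm_class(x, support_vectors, alphas, labels, bias):
    """Class(X): +1 on one side of the hyperplane, -1 on the other."""
    score = svm_decision(x, support_vectors, alphas, labels, bias)
    return 1 if score >= 0 else -1


# Hypothetical support vectors: one per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
bias = 0.0

print(svm_class((0.1, 0.1), support_vectors, alphas, labels, bias))
print(svm_class((1.9, 2.0), support_vectors, alphas, labels, bias))
```

The optimal hyperplane is the one that maximizes the margin between the two classes; training determines which examples become support vectors and the values of their coefficients.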

Results

Combining Speech Recognition and Speaker Verification.

Speaker-independent phone HMMs

Selection of segments or segment classes which are speaker specific

Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)

Some developments were done during a six-week workshop (SuperSID) in summer 2002

SuperSID experiments

GMM with cepstral features

Selection of nasals in words ending in -ing:

being, everything, getting, anything, thing, something, things, going

Fusion

Fusion results

Audio-Visual Identity Verification

A person speaking in front of a camera offers 2 modalities for identity verification (speech and face).

The sequence of face images and the synchronisation of speech and lip movements could be exploited.

Imposture is much more difficult than with single modalities.

Many PCs, PDAs and mobile phones are equipped with a camera. Audio-Visual Identity Verification will offer non-intrusive security for e-commerce, e-banking, …

Examples of Speaking Faces

Sequence of digits (PIN code)

Free text


Fusion of Speech and Face

(from the thesis of Conrad Sanderson, Aug. 2002)

1. Acquisition of biometric signals for each modality
2. Scores are computed for each modality
3. Fusion of scores and decision

Insecure network

Distant server:
1. Access to private data
2. Secured transactions
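
The fusion-and-decision step can be sketched as a weighted sum of normalized scores followed by a threshold. This is a generic illustration, not necessarily the method used in the cited thesis: the normalization statistics, the weight and the threshold are hypothetical and would normally be estimated on development data.

```python
def z_normalize(score, mean, std):
    """Map a raw score to zero mean and unit variance using
    statistics assumed to come from development data."""
    if std <= 0:
        raise ValueError("std must be positive")
    return (score - mean) / std


def fuse_scores(speech_score, face_score, speech_weight=0.5):
    """Weighted-sum fusion of two normalized scores."""
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    return speech_weight * speech_score + (1.0 - speech_weight) * face_score


def verify(fused_score, threshold=0.0):
    """Accept the identity claim if the fused score reaches the threshold."""
    return fused_score >= threshold


# Hypothetical raw scores and development-set statistics.
speech = z_normalize(2.0, mean=0.0, std=2.0)
face = z_normalize(0.3, mean=0.5, std=0.1)

fused = fuse_scores(speech, face, speech_weight=0.7)
print(verify(fused))
```

Normalizing before fusion matters because the speech and face classifiers usually produce scores on different scales; the weight lets the more reliable modality dominate, for example the face score in a noisy audio environment.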

An illustration

Conclusions and Perspectives

Speech is often the only usable biometric modality (over the telephone network).

Interactive Voice Servers may use both text dependent and text independent approaches for improved verification accuracy.

Evaluation campaigns and research workshops are efficient means to stimulate progress.

Most PCs, PDAs and mobile phones will be equipped with cameras. Audio-Visual Identity Verification should find applications in e-Banking, e-Commerce, …