Dynamic Tuning Of Language Model Score In Speech Recognition Using A Confidence Measure

Sherif Abdou, Michael Scordilis
Department of Electrical and Computer Engineering, University of Miami
Coral Gables, Florida 33124, U.S.A.

DSAP

Abstract

Speech recognition errors limit the capability of language models to predict subsequent words correctly

Error analysis on Switchboard data shows that:
87% of words preceded by a correct word were correctly decoded
47% of words preceded by an incorrect word were correctly decoded

An effective way to enhance the function of the language model is by using confidence measures

Most current efforts to develop confidence measures for speech recognition focus on verifying the final result but make no attempt to correct recognition errors

In this work, we use confidence measures early during the search process.

A word-based acoustic confidence metric is used to define a dynamic language weight.

Using Confidence To Guide The Search

The search score is changed from

$$\mathrm{Score}(W/A) = P(A/W)\, P(W)^{LW}$$

to the confidence-based score

$$\mathrm{Score}(W/A) = P(A/W)\, P(W)^{LW(C(W))}$$

where
A: the acoustic input
W: the hypothesized word sequence
P(A/W): the acoustic model score
P(W): the language model score
LW: the language weight
C(W): the confidence of word sequence W
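Decoders usually combine these scores in the log domain, where the language weight becomes a multiplier on the language model log-probability. The following is a minimal sketch of that combination, assuming natural-log scores; the function name and the numeric values are hypothetical and only illustrate how a smaller weight shrinks the language model's contribution.

```python
import math


def combined_log_score(log_p_acoustic, log_p_lm, language_weight):
    """log Score(W/A) = log P(A/W) + LW * log P(W)."""
    return log_p_acoustic + language_weight * log_p_lm


# Illustrative numbers only (not taken from the slides).
log_p_a = math.log(1e-5)   # acoustic model score P(A/W)
log_p_w = math.log(0.01)   # language model score P(W)

# Static language weight.
static_score = combined_log_score(log_p_a, log_p_w, 6.5)

# A smaller weight, as a confidence-based LW might produce, reduces the
# penalty contributed by the language model term.
reduced_score = combined_log_score(log_p_a, log_p_w, 3.0)
```

Because log P(W) is negative, lowering the weight raises the combined score and lets the acoustic term dominate the ranking of hypotheses.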

We used the functional form

$$LW(C(W)) = \frac{2\,LW_0}{1 + \exp\left(r\,\Delta C(W)\right)}$$

$$\Delta C(W) = \frac{1}{N}\sum_{j=1}^{N} C(w_j) - C_0$$

The word sequence confidence is estimated by the average of its words' confidences.

where
N: the number of words in sequence W
C(w_j): the confidence of word w_j
C_0: the operation point threshold
LW_0: the static language weight
r: a smoothing parameter
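The behaviour of the dynamic weight is easier to see numerically. The sketch below assumes the weight has the form 2·LW0 / (1 + exp(r·(mean word confidence − C0))), so that it equals LW0 at the threshold and falls as confidence rises. The function names and the choice r = 2 are hypothetical; LW0 = 6.5 and C0 = 0.65 are the values quoted for the plot of LW versus C(W).

```python
import math


def delta_confidence(word_confidences, c0):
    """Average word confidence minus the operation point threshold C0."""
    return sum(word_confidences) / len(word_confidences) - c0


def dynamic_language_weight(word_confidences, lw0, c0, r):
    """LW(C(W)) = 2 * LW0 / (1 + exp(r * (mean confidence - C0)))."""
    delta = delta_confidence(word_confidences, c0)
    return 2.0 * lw0 / (1.0 + math.exp(r * delta))


LW0, C0 = 6.5, 0.65
R = 2.0  # illustrative smoothing parameter (an assumption of this sketch)

at_threshold = dynamic_language_weight([0.65, 0.65], LW0, C0, R)  # equals LW0
confident = dynamic_language_weight([0.90, 0.95], LW0, C0, R)     # below LW0
uncertain = dynamic_language_weight([0.20, 0.30], LW0, C0, R)     # above LW0
static = dynamic_language_weight([0.90, 0.95], LW0, C0, 0.0)      # r = 0 gives LW0
```

With r = 0 the weight stays at LW0 for every confidence value, and for any r the weight is bounded between 0 and 2·LW0.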

For bigram models, the average is approximated using the confidences of the current and previous words:

$$\Delta C(W) = \frac{C(w_N) + C(w_{N-1})}{2} - C_0$$

Figure: LW as a function of C(W), with LW_0 = 6.5 and C_0 = 0.65

Constraints On The Measures Used For Confidence-Based Language Model (CBLM)

Efficiency: has to be computationally inexpensive

Synchronization: can be extracted from on-line information

Source of information: extracted only from acoustic data

Word Posterior As a Confidence Measure

$$\hat{W} = \arg\max_{W} p(W/X) = \arg\max_{W} \frac{p(X/W)\,p(W)}{p(X)} = \arg\max_{W} p(X/W)\,p(W)$$

The denominator p(X) is ignored in all ASR systems
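Dropping p(X) does not change which hypothesis wins, but it leaves the scores unnormalized, so they cannot be read as confidences. The sketch below illustrates this with hypothetical log scores for three competing words. It approximates p(X) by summing over the listed hypotheses, which is an assumption of this example and not the estimation method used in the slides.

```python
import math


def posteriors(log_joint_scores):
    """Normalize log p(X/W) + log p(W) over the competing hypotheses.

    p(X) is approximated by the sum over the listed hypotheses only,
    which is an assumption of this sketch.
    """
    top = max(log_joint_scores.values())
    # Log-sum-exp for numerical stability.
    log_p_x = top + math.log(
        sum(math.exp(s - top) for s in log_joint_scores.values())
    )
    return {w: math.exp(s - log_p_x) for w, s in log_joint_scores.items()}


# Hypothetical log p(X/W) + log p(W) values for three competing words.
scores = {"speech": -10.0, "beach": -11.0, "peach": -13.0}

best_unnormalized = max(scores, key=scores.get)
post = posteriors(scores)
best_posterior = max(post, key=post.get)
```

Both rankings select the same word, but only the normalized values lie in [0, 1] and sum to one, which is what makes a posterior usable as a confidence measure.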

Observation Probability Estimation

Theoretically:

$$p(x) = \sum_{q} p(q)\, p(x/q)$$

Discrete HMM:

$$p(x) = \sum_{q} p(q)\, p(m(x)/q) = p(m(x))$$

Semi-Continuous HMM:

$$p(x) = \sum_{\text{all } q} p(q) \sum_{i=1}^{C} w_{iq}\, g_i(x)$$

where
q: model states
m(x): vector quantization of x
C: number of mixtures
w_iq: mixture weights
g_i(x): mixtures
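The semi-continuous case can be sketched directly from its definition: every state shares the same C Gaussians and differs only in its mixture weights. The example below assumes one-dimensional Gaussians and a hypothetical two-state model; all names and numbers are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian g_i(x)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def observation_probability(x, state_priors, mixture_weights, gaussians):
    """p(x) = sum over q of p(q) * sum over i of w_iq * g_i(x)."""
    # The shared Gaussians are evaluated once and reused for every state.
    densities = [gaussian_pdf(x, mean, var) for mean, var in gaussians]
    total = 0.0
    for state, prior in state_priors.items():
        state_likelihood = sum(
            w * g for w, g in zip(mixture_weights[state], densities)
        )
        total += prior * state_likelihood
    return total


# Hypothetical model: two states sharing C = 2 Gaussians.
gaussians = [(0.0, 1.0), (3.0, 1.0)]                      # (mean, variance)
state_priors = {"q1": 0.6, "q2": 0.4}                     # p(q)
mixture_weights = {"q1": [0.9, 0.1], "q2": [0.2, 0.8]}    # w_iq per state

p_x = observation_probability(0.5, state_priors, mixture_weights, gaussians)
```

Since the Gaussians are shared, the cost is dominated by evaluating the C densities once per frame, not once per state.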

Continuous HMM:

Building a catch-all model

[Block diagram with components: Original acoustic model, Clustering Technique, Catch-all Model, Vector Quantization, Mapping information]

Mixtures Clustering Technique

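$$B_{distance} = -\log \int \sqrt{p_1(x)\,p_2(x)}\,dx$$

For two Gaussian mixtures with means μ_1, μ_2 and covariances Σ_1, Σ_2:

$$B_{distance} = \frac{1}{8}(\mu_1 - \mu_2)^T \left[\frac{\Sigma_1 + \Sigma_2}{2}\right]^{-1} (\mu_1 - \mu_2) + \frac{1}{2}\ln\frac{\left|(\Sigma_1 + \Sigma_2)/2\right|}{|\Sigma_1|^{1/2}\,|\Sigma_2|^{1/2}}$$

Two mixtures are merged using

$$w_{new} = w_1 + w_2$$

$$\mu_{new} = \frac{w_1\mu_1 + w_2\mu_2}{w_1 + w_2}$$

$$\Sigma_{new} = w_1\left(\Sigma_1 + (\mu_1 - \mu_{new})^2\right) + w_2\left(\Sigma_2 + (\mu_2 - \mu_{new})^2\right)$$

B_distance: Bhattacharyya distance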
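One clustering step can be sketched for one-dimensional Gaussians as follows. Two points are assumptions of this example and are not stated in the slides: the merge is applied greedily to the closest pair, and the merged variance is divided by w1 + w2 (standard moment matching) so that the weights need not sum to one. All function names are hypothetical.

```python
import math


def bhattacharyya_distance(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two one-dimensional Gaussians."""
    avg_var = 0.5 * (var1 + var2)
    mean_term = 0.125 * (mean1 - mean2) ** 2 / avg_var
    var_term = 0.5 * math.log(avg_var / math.sqrt(var1 * var2))
    return mean_term + var_term


def merge_gaussians(w1, mean1, var1, w2, mean2, var2):
    """Moment-matched merge; normalizing by w1 + w2 is an assumption."""
    w_new = w1 + w2
    mean_new = (w1 * mean1 + w2 * mean2) / w_new
    var_new = (
        w1 * (var1 + (mean1 - mean_new) ** 2)
        + w2 * (var2 + (mean2 - mean_new) ** 2)
    ) / w_new
    return w_new, mean_new, var_new


def merge_closest_pair(mixtures):
    """Merge the two mixtures (weight, mean, variance) that are closest."""
    pairs = [
        (i, j)
        for i in range(len(mixtures))
        for j in range(i + 1, len(mixtures))
    ]
    i, j = min(
        pairs,
        key=lambda p: bhattacharyya_distance(
            *mixtures[p[0]][1:], *mixtures[p[1]][1:]
        ),
    )
    merged = merge_gaussians(*mixtures[i], *mixtures[j])
    return [m for k, m in enumerate(mixtures) if k not in (i, j)] + [merged]


# Hypothetical mixtures: (weight, mean, variance).
mixtures = [(0.3, 0.0, 1.0), (0.3, 0.2, 1.0), (0.4, 5.0, 2.0)]
reduced = merge_closest_pair(mixtures)
```

Repeating this step until the desired number of mixtures remains yields a smaller model, and the total mixture weight is preserved at each merge.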

Vector Quantization

OV: observation vector
CV_i: code vector
μ: Gaussian mixture mean

Figure: Computation reduction using VQ (observation vector OV shown among code vectors CV_i, CV_j, CV_k, CV_m)
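The quantization step m(x) can be sketched as a nearest-code-vector search. How the result is used to reduce computation is not spelled out in the slides, so the shortlist lookup below is a hypothetical illustration: it assumes each code vector maps to the subset of Gaussians worth evaluating for observations that fall near it.

```python
def quantize(observation, code_vectors):
    """Return the index of the code vector nearest to the observation."""
    def squared_distance(code_vector):
        return sum((o - c) ** 2 for o, c in zip(observation, code_vector))

    return min(
        range(len(code_vectors)),
        key=lambda i: squared_distance(code_vectors[i]),
    )


# Hypothetical two-dimensional codebook.
code_vectors = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0), (0.0, 5.0)]

# Hypothetical mapping: code vector index -> Gaussians to evaluate.
gaussian_shortlist = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2], 3: [3]}

ov = (0.9, 1.2)                         # observation vector
index = quantize(ov, code_vectors)      # nearest code vector
shortlist = gaussian_shortlist[index]   # Gaussians evaluated for this frame
```

Under this assumption only the shortlisted Gaussians are evaluated for each frame instead of the full set.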

The Catch-all Model Performance

Relative ROC performance of reduced catch-all models

Word Level Confidence Measures

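Arithmetic Mean

$$CM_{am}(w) = \frac{1}{N}\sum_{i=1}^{N} CM(ph_i)$$

Geometric Mean

$$CM_{gm}(w) = \exp\left(\frac{1}{N}\sum_{i=1}^{N} \log\left(CM(ph_i)\right)\right)$$

Weighted Mean

$$CM_{wm}(w) = \frac{1}{N}\sum_{i=1}^{N} \left(a_i\, CM(ph_i) + b_i\right)$$

where
CM(ph_i): confidence score of phoneme ph_i
a_i, b_i: linear model parameters
N: number of phonemes in word w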
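The three word-level measures differ in how strongly a single poorly scored phoneme pulls the word score down. The sketch below computes each one from a list of phoneme confidence scores; the function names, the example scores and the linear parameters are hypothetical, and the geometric mean assumes all scores are strictly positive.

```python
import math


def cm_arithmetic(phone_scores):
    """Arithmetic mean of the phoneme confidence scores."""
    return sum(phone_scores) / len(phone_scores)


def cm_geometric(phone_scores):
    """Geometric mean, computed as exp of the mean log score."""
    return math.exp(sum(math.log(s) for s in phone_scores) / len(phone_scores))


def cm_weighted(phone_scores, a, b):
    """Mean of a_i * CM(ph_i) + b_i with per-phoneme linear parameters."""
    terms = [a_i * s + b_i for s, a_i, b_i in zip(phone_scores, a, b)]
    return sum(terms) / len(phone_scores)


# Hypothetical phoneme scores for one word: two good phonemes, one poor.
scores = [0.9, 0.8, 0.1]

am = cm_arithmetic(scores)
gm = cm_geometric(scores)
# Identity parameters (a_i = 1, b_i = 0) reduce the weighted mean to am.
wm = cm_weighted(scores, a=[1.0, 1.0, 1.0], b=[0.0, 0.0, 0.0])
```

The geometric mean is never larger than the arithmetic mean, so it penalizes the single poor phoneme more heavily.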

Word Level Confidence Measures Performance

ROC curves indicating the relative performance of CM_am, CM_gm and CM_wm

Performance Evaluation Compared With Other Approaches

Comparison of the catch-all model measure, the likelihood ratio (LR) measure and the word lattice based measure

Experimental Results 

WER for different threshold and r values

Smoothing            Threshold
parameter (r)    0.5      0.6      0.7      0.8      0.9
0                19.3%    19.3%    19.3%    19.3%    19.3%
1                18.6%    18.43%   18.41%   18.31%   24%
2                18.9%    18.42%   18.41%   18.30%   22%
3                18.9%    18.47%   18.63%   18.43%   25%

Figure: Recognition accuracy for words following correctly decoded and incorrectly decoded words

CONCLUSION AND FUTURE WORK

We used a confidence metric to improve the integration of system models and guide the search towards the most promising paths

Dynamic tuning of the language model weight parameter proved to be effective for performance improvement

Word posterior based confidence measures are efficient and can be extracted from the online search side information. They do not require the training of anti-models

With CBLM the language model score is favored in regions of ambiguous acoustics, but plays second fiddle when the acoustics are well matched.

Future work: We plan to extend this work to the case where we have high confidence for only one of the words. In that case we should back off to the unigram language model score rather than completely reducing the language model score.