Dynamic Tuning Of Language Model Score In Speech Recognition Using A Confidence Measure
Sherif Abdou, Michael Scordilis
Department of Electrical and Computer Engineering, University of Miami
Coral Gables, Florida 33124, U.S.A.
DSAP
Abstract
Speech recognition errors limit the capability of language models to predict subsequent words correctly
Error analysis on Switchboard data shows that 87% of words preceded by a correctly decoded word were themselves correctly decoded, while only 47% of words preceded by an incorrectly decoded word were correctly decoded.
An effective way to improve the contribution of the language model is to use confidence measures.
Most current efforts to develop confidence measures for speech recognition focus on verifying the final result and make no attempt to correct recognition errors.
In this work, we use confidence measures early during the search process.
A word-based acoustic confidence metric is used to define a dynamic language weight.
Using Confidence To Guide The Search
The search score is changed from

$$\mathrm{Score}(W|A) = P(A|W)\, P(W)^{LW}$$

to the confidence-based score

$$\mathrm{Score}(W|A) = P(A|W)\, P(W)^{LW(C(W))}$$

where:
A: the acoustic input
W: the hypothesized word sequence
P(A|W): the acoustic model score
P(W): the language model score
LW: the language weight
C(W): the confidence of word sequence W
We used the functional form

$$LW(C(W)) = \frac{2\, LW_0}{1 + \exp\left(r\, C(W)\right)}$$

The word sequence confidence is estimated by the average of its words' confidences, centered at the operating point:

$$C(W) = \frac{1}{N}\sum_{j=1}^{N} C(w_j) - C_0$$

where:
N: the number of words in sequence W
C(w_j): the confidence of word w_j
C_0: the operating point threshold
LW_0: the static language weight
r: a smoothing parameter
For bigram models we approximate C(W) by the confidences of the current and previous words:

$$C(W) = \frac{C(w_N) + C(w_{N-1})}{2} - C_0$$
Figure: LW as a function of C(W), with LW_0 = 6.5 and C_0 = 0.65.
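As a concrete illustration, here is a minimal Python sketch of the confidence-based score in the log domain; the function names are ours, LW_0 = 6.5 and C_0 = 0.65 follow the figure above, and r = 2.0 is an assumed value taken from the range explored in the experiments below.

```python
import math

def dynamic_lw(avg_word_conf, lw0=6.5, c0=0.65, r=2.0):
    """Sigmoid mapping from hypothesis confidence to language weight.

    At avg_word_conf == c0 the weight equals the static weight lw0;
    it rises toward 2*lw0 when the acoustics are ambiguous (low
    confidence) and falls toward 0 when they are well matched.
    """
    c_w = avg_word_conf - c0                 # operating-point-centered C(W)
    return 2.0 * lw0 / (1.0 + math.exp(r * c_w))

def hypothesis_score(log_p_acoustic, log_p_lm, avg_word_conf):
    """Confidence-based search score in the log domain:
    log Score(W|A) = log P(A|W) + LW(C(W)) * log P(W)."""
    return log_p_acoustic + dynamic_lw(avg_word_conf) * log_p_lm
```

The sigmoid keeps the weight at the static LW_0 when the confidence sits exactly at the operating point, and smoothly trades the language model against the acoustics on either side of it.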
Constraints On The Measures Used For Confidence-Based Language Model (CBLM)
Efficiency: has to be computationally inexpensive.
Synchronization: can be extracted from on-line information.
Source of information: extracted only from acoustic data.
Word Posterior As a Confidence Measure
$$\hat{W} = \arg\max_{W} p(W|X) = \arg\max_{W} \frac{p(X|W)\, p(W)}{p(X)} = \arg\max_{W} p(X|W)\, p(W)$$

The denominator p(X) is ignored in all ASR systems, since it does not change the maximization; estimating it is exactly what is needed to turn the decoder score into a posterior-based confidence.
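To illustrate how recovering that denominator yields a confidence score, here is a minimal Python sketch that normalizes the per-frame acoustic likelihood of a hypothesized word by a per-frame estimate of p(x), such as the catch-all model developed below; the array names and the frame-averaged (geometric-mean) normalization are our assumptions.

```python
import numpy as np

def word_posterior_confidence(word_loglik, frame_loglik):
    """Posterior-style confidence for one hypothesized word.

    word_loglik:  per-frame log p(x_t | w) along the decoder's best path.
    frame_loglik: per-frame log p(x_t), e.g. from a catch-all model.
    Returns exp of the frame-averaged log ratio, clipped to (0, 1].
    """
    word_loglik = np.asarray(word_loglik, dtype=float)
    frame_loglik = np.asarray(frame_loglik, dtype=float)
    # Length-normalized log posterior (language model prior omitted).
    log_post = np.mean(word_loglik - frame_loglik)
    return float(np.exp(min(log_post, 0.0)))  # keep the score in (0, 1]
```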
Observation Probability Estimation
Theoretically:

$$p(x) = \sum_{q} p(q)\, p(x|q)$$

Discrete HMM:

$$p(x) = \sum_{q} p(q)\, p(m(x)|q)$$

Semi-continuous HMM:

$$p(x) = \sum_{\text{all } q} p(q) \sum_{i=1}^{C} w_{iq}\, g_i(x)$$

where:
q: the model states
m(x): the vector quantization of x
C: the number of mixtures
w_iq: the mixture weights
g_i(x): the Gaussian mixtures
Continuous HMM: the summation over all state mixtures is expensive, so a reduced catch-all model is built.

Building a catch-all model

Diagram: the original acoustic model is reduced to a catch-all model by a mixtures clustering technique, with vector quantization supplying the mapping information.
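As a sketch of what the catch-all model computes at run time, the following Python evaluates log p(x) under a pooled diagonal-covariance Gaussian mixture; the flattened weight/mean/variance arrays are our assumed representation of the clustered model.

```python
import numpy as np

def catchall_logprob(x, weights, means, variances):
    """log p(x) under a diagonal-covariance Gaussian mixture.

    weights:   (M,)   mixture weights pooled over all states (sum to 1)
    means:     (M, D) component means
    variances: (M, D) component diagonal variances
    """
    x = np.asarray(x, dtype=float)
    # Per-component Gaussian log densities with diagonal covariance.
    log_comp = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + (x - means) ** 2 / variances,
        axis=1)
    # Stable log-sum-exp over the weighted components.
    m = np.max(log_comp)
    return float(m + np.log(np.sum(weights * np.exp(log_comp - m))))
```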
Mixtures Clustering Technique
Mixtures are compared by the Bhattacharyya distance:

$$B_{distance} = -\log \int \sqrt{p_1(x)\, p_2(x)}\, dx$$

which for Gaussian densities has the closed form

$$B_{distance} = \frac{1}{8} (\mu_1 - \mu_2)^T \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\mu_1 - \mu_2) + \frac{1}{2} \ln \frac{\left| \frac{\Sigma_1 + \Sigma_2}{2} \right|}{|\Sigma_1|^{1/2}\, |\Sigma_2|^{1/2}}$$

The closest pair of mixtures is merged into a single Gaussian:

$$w_{new} = w_1 + w_2$$

$$\mu_{new} = \frac{w_1 \mu_1 + w_2 \mu_2}{w_1 + w_2}$$

$$\sigma^2_{new} = \frac{w_1 \left( \sigma_1^2 + (\mu_1 - \mu_{new})^2 \right) + w_2 \left( \sigma_2^2 + (\mu_2 - \mu_{new})^2 \right)}{w_1 + w_2}$$

where B_distance is the Bhattacharyya distance.
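A minimal sketch of one clustering step for diagonal-covariance Gaussians, matching the merge formulas above; the function names are ours, and a full clustering loop would repeatedly merge the pair with the smallest distance.

```python
import numpy as np

def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    avg = (var1 + var2) / 2.0
    term1 = 0.125 * np.sum((mu1 - mu2) ** 2 / avg)
    term2 = 0.5 * np.sum(np.log(avg / np.sqrt(var1 * var2)))
    return term1 + term2

def merge(w1, mu1, var1, w2, mu2, var2):
    """Moment-matching merge of two weighted Gaussians into one."""
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + (mu1 - mu) ** 2)
           + w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var
```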
Vector Quantization
Diagram: computation reduction using VQ. OV: observation vector; CV_i: code vector; μ: Gaussian mixture mean.
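A sketch of the VQ shortcut: the observation vector is quantized to its nearest code vector, and only the Gaussians whose means were assigned to that cell offline are evaluated exactly; the mapping table vq_to_mixtures is an assumed name.

```python
import numpy as np

def nearest_code_vector(ov, codebook):
    """Index of the code vector closest (squared Euclidean distance) to ov."""
    return int(np.argmin(np.sum((codebook - ov) ** 2, axis=1)))

def shortlist(ov, codebook, vq_to_mixtures):
    """Mixture indices worth evaluating exactly for observation vector ov.

    vq_to_mixtures[k] holds the indices of the Gaussians whose means were
    assigned to codebook cell k offline; all other components are skipped
    or approximated, which is the computation reduction in the diagram.
    """
    return vq_to_mixtures[nearest_code_vector(ov, codebook)]
```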
The Catch-all Model Performance
Figure: relative ROC performance of reduced catch-all models.
Word Level Confidence Measures
Arithmetic mean:

$$CM_{am}(w) = \frac{1}{N} \sum_{i=1}^{N} CM(ph_i)$$

Geometric mean:

$$CM_{gm}(w) = \exp\left( \frac{1}{N} \sum_{i=1}^{N} \log CM(ph_i) \right)$$

Weighted mean:

$$CM_{wm}(w) = \frac{1}{N} \sum_{i=1}^{N} \left( a_i\, CM(ph_i) + b_i \right)$$

where:
CM(ph_i): the confidence score of phoneme ph_i
a_i, b_i: linear model parameters
N: the number of phonemes in word w
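A small Python sketch of the three combiners; phone_scores is our assumed name for the list of per-phoneme confidence scores.

```python
import math

def cm_arithmetic(phone_scores):
    """Arithmetic mean of the per-phoneme confidence scores."""
    return sum(phone_scores) / len(phone_scores)

def cm_geometric(phone_scores):
    """Geometric mean, computed in the log domain for stability."""
    return math.exp(sum(math.log(c) for c in phone_scores)
                    / len(phone_scores))

def cm_weighted(phone_scores, a, b):
    """Weighted mean with per-phoneme linear parameters a[i], b[i]."""
    n = len(phone_scores)
    return sum(a[i] * phone_scores[i] + b[i] for i in range(n)) / n
```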
Word Level Confidence Measures Performance
Figure: ROC curves indicating the relative performance of CM_am, CM_gm, and CM_wm.
Performance Evaluation Compared With Other Approaches
Figure: comparison of the catch-all model measure, the likelihood ratio (LR) measure, and the word-lattice-based measure.
Experimental Results
WER for different threshold (C_0) and smoothing parameter (r) values:

r \ Threshold    0.5      0.6       0.7       0.8       0.9
0                19.3%    19.3%     19.3%     19.3%     19.3%
1                18.6%    18.43%    18.41%    18.31%    18.24%
2                18.9%    18.42%    18.41%    18.30%    18.22%
3                18.9%    18.47%    18.63%    18.43%    18.25%
Figure: recognition accuracy for words following correctly decoded and incorrectly decoded words.
CONCLUSION AND FUTURE WORK
We used a confidence metric to improve the integration of system models and guide the search towards the most promising paths
Dynamic tuning of the language model weight parameter proved to be effective for performance improvement
Word posterior based confidence measures are efficient and can be extracted from the on-line search side information; they do not require the training of anti-models.
With CBLM the language model score is favored in regions of ambiguous acoustics, but plays second fiddle when the acoustics are well matched.
Future work: we plan to extend this approach to the case where only one of the two words has high confidence; there, the system should back off to the unigram language model score rather than reduce the language model score entirely.