Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation
• Authors: Naveen Parihar and Joseph Picone
  Inst. for Signal and Info. Processing
  Dept. of Electrical and Computer Eng.
  Mississippi State University
• Contact Information:
  Box 9571
  Mississippi State University
  Mississippi State, Mississippi 39762
  Tel: 662-325-8335
  Fax: 662-325-2298
• URL: http://www.isip.msstate.edu/publications/seminars/msstate_misc/2004/gsa/
Email: {parihar,picone}@isip.msstate.edu
INTRODUCTION: BLOCK DIAGRAM APPROACH
Core components (a toy sketch of how they compose follows this list):
• Transduction
• Feature extraction
• Acoustic modeling (hidden Markov models)
• Language modeling (statistical N-grams)
• Search (Viterbi beam)
• Knowledge sources
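The decomposition above can be made concrete in code. The following toy Python sketch shows how feature extraction, acoustic scoring, language modeling, and search compose into a recognizer; every function body here is an invented stand-in for illustration, not the evaluation system:

```python
import math

def extract_features(samples, frame_len=200, frame_shift=80):
    """Feature extraction: slice the waveform into overlapping frames and
    reduce each frame to a feature vector (here, just log-energy)."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_shift):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) + 1e-10
        feats.append([math.log(energy)])
    return feats

def acoustic_score(feats, word):
    """Acoustic model: log P(O | W). A real system scores HMM state
    sequences; this stand-in returns a dummy value."""
    return -len(feats) * (1.0 + 0.01 * len(word))

def lm_score(history, word):
    """Language model: log P(word | history) from a statistical N-gram
    (here a uniform toy bigram)."""
    return -2.0

def decode(feats, vocab):
    """Search: pick the hypothesis maximizing the combined score."""
    return max(vocab, key=lambda w: acoustic_score(feats, w) + lm_score((), w))
```

A real decoder replaces this arg-max over isolated words with a Viterbi beam search over word sequences, combining the same acoustic and language model scores.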
INTRODUCTION: AURORA EVALUATION OVERVIEW
• WSJ 5K (closed task) with seven (digitally-added) noise conditions
• Common ASR system
• Two participants:
  QIO (Qualcomm, ICSI, OGI) and MFA (Motorola, France Telecom, Alcatel)
• Client/server applications
• Evaluate robustness in noisy environments
• Propose a standard for LVCSR applications
Performance Summary (WER)

Site (Train Set)   Clean   Noise (Sennheiser)   Noise (Multi-Mic)
Base (TS1)         15%     59%                  75%
Base (TS2)         19%     33%                  50%
QIO (TS2)          17%     26%                  41%
MFA (TS2)          15%     26%                  40%
• Is the 31% relative improvement (34.5% vs. 50.3% WER) operationally significant?
INTRODUCTION: MOTIVATION
• The Aurora Large Vocabulary (ALV) evaluation goal was at least a 25% relative improvement over the baseline MFCC front end
ALV Evaluation Results (WER)

Front End   Overall   8 kHz Avg.   8 kHz TS1   8 kHz TS2   16 kHz Avg.   16 kHz TS1   16 kHz TS2
MFCC        50.3%     49.6%        58.1%       41.0%       51.0%         62.2%        39.8%
QIO         37.5%     38.4%        43.2%       33.6%       36.5%         40.7%        32.4%
MFA         34.5%     34.5%        37.5%       31.4%       34.4%         37.2%        31.5%
• Generic baseline LVCSR system with no front-end-specific tuning
• Would front-end-specific tuning change the rankings?
EVALUATION PARADIGM: THE AURORA-4 DATABASE
Acoustic Training:
• Derived from the 5000-word WSJ0 task
• TS1 (clean) and TS2 (multi-condition)
• Clean plus 6 noise conditions
• Randomly chosen SNR between 10 and 20 dB
• 2 microphone conditions (Sennheiser and secondary)
• 2 sample frequencies: 16 kHz and 8 kHz
• G.712 filtering at 8 kHz and P.341 filtering at 16 kHz
Development and Evaluation Sets:
• Derived from the WSJ0 Evaluation and Development sets
• 14 test sets each
• 7 test sets recorded on the Sennheiser mic; 7 on a secondary mic
• Clean plus 6 noise conditions
• Randomly chosen SNR between 5 and 15 dB (a mixing sketch follows below)
• G.712 filtering at 8 kHz and P.341 filtering at 16 kHz
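To make the noisy-set construction concrete, here is a hedged NumPy sketch of digitally adding a noise recording to clean speech at a chosen SNR. The function name and all details are illustrative assumptions, not the official Aurora-4 preparation scripts:

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db,
    then mix. Both arguments are float arrays at the same sample rate."""
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]        # match lengths
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    target_p_noise = p_speech / (10.0 ** (snr_db / 10.0))
    return speech + noise * np.sqrt(target_p_noise / p_noise)

# Training-style condition: SNR drawn uniformly from 10-20 dB.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)                   # stand-in clean utterance
noise = rng.standard_normal(8000)                     # stand-in noise recording
noisy = add_noise_at_snr(speech, noise, snr_db=rng.uniform(10.0, 20.0))
```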
EVALUATION PARADIGM: BASELINE LVCSR SYSTEM
Standard context-dependent cross-word HMM-based system:
• Acoustic models: state-tied 4-mixture cross-word triphones
• Language model: WSJ0 5K bigram
• Search: Viterbi one-best using lexical trees for N-gram cross-word decoding
• Lexicon: based on CMUlex
• Real-time factor: 4 xRT for training and 15 xRT for decoding on an 800 MHz Pentium
[Diagram: acoustic training flow: training data, monophone modeling, CD-triphone modeling, state tying, CD-triphone modeling, mixture modeling (2 and 4 mixtures)]
EVALUATION PARADIGM: WI007 ETSI MFCC FRONT END
• Zero-mean debiasing
• 10 ms frame duration
• 25 ms Hamming window
• Absolute energy
• 12 cepstral coefficients
• First and second derivatives (a NumPy sketch of this front end follows the diagram)
[Diagram: WI007 MFCC front end: input speech, zero-mean and pre-emphasis, Fourier transform analysis, cepstral analysis, and energy]
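A minimal NumPy sketch of an MFCC front end along these lines is shown below. The frame/window sizes, 12 cepstra plus energy, and first/second derivatives match the bullets above; the 23-channel filterbank, the 0.97 pre-emphasis coefficient, and the +/-2-frame delta regression are assumptions for illustration, not the WI007 specification:

```python
import numpy as np

def _mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def _inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=8000, frame_ms=10, win_ms=25, n_filt=23, n_ceps=12):
    shift, win = int(fs * frame_ms / 1000), int(fs * win_ms / 1000)
    nfft = 1 << (win - 1).bit_length()
    signal = signal - signal.mean()                     # zero-mean debiasing
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(signal) - win) // shift
    idx = np.arange(win) + shift * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)              # 25 ms Hamming window
    log_e = np.log(np.maximum((frames ** 2).sum(axis=1), 1e-10))  # absolute energy
    spec = np.abs(np.fft.rfft(frames, nfft)) ** 2
    # Triangular mel filterbank spanning 0 to fs/2.
    edges = _inv_mel(np.linspace(_mel(0.0), _mel(fs / 2.0), n_filt + 2))
    bins = np.floor((nfft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filt, nfft // 2 + 1))
    for i in range(n_filt):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(np.maximum(spec @ fbank.T, 1e-10))
    # DCT-II down to 12 cepstral coefficients (c1..c12), plus log-energy.
    chan = np.arange(n_filt) + 0.5
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), chan) / n_filt)
    static = np.hstack([logmel @ dct.T, log_e[:, None]])
    def delta(x, k=2):
        # Simple regression over +/- k frames (window size is an assumption).
        pad = np.pad(x, ((k, k), (0, 0)), mode="edge")
        num = sum(j * pad[k + j:len(x) + k + j] for j in range(-k, k + 1))
        return num / (2.0 * sum(j * j for j in range(1, k + 1)))
    return np.hstack([static, delta(static), delta(delta(static))])
```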
FRONT END PROPOSALS: QIO FRONT END
• 10 msec frame duration
• 25 msec analysis window
• 15 RASTA-like filtered cepstral coefficients
• MLP-based VAD
• Mean and variance normalization
• First and second derivatives (RASTA filtering and normalization are sketched after the diagram)
[Diagram: QIO front end: input speech, Fourier transform, mel-scale filter bank, RASTA, DCT, and mean/variance normalization, with an MLP-based VAD branch]
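Two of the QIO stages are easy to sketch. The snippet below applies a RASTA-style band-pass filter to each cepstral trajectory and then per-utterance mean/variance normalization; the filter coefficients are the classic Hermansky-Morgan RASTA filter, an assumption that may differ from the actual QIO filter:

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(ceps):
    """Band-pass each cepstral trajectory over time, suppressing slowly
    varying convolutional effects and very fast frame-to-frame noise.
    ceps: array of shape (n_frames, n_coeffs)."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # classic RASTA numerator
    a = np.array([1.0, -0.98])                        # leaky integrator pole
    return lfilter(b, a, ceps, axis=0)

def mean_variance_normalize(feats, eps=1e-10):
    """Per-utterance normalization of each coefficient to zero mean and
    unit variance, reducing channel and level mismatch."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)
```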
FRONT END PROPOSALS: MFA FRONT END
• 10 msec frame duration
• 25 msec analysis window
• Mel-warped Wiener-filter-based noise reduction (a toy Wiener-gain sketch follows the diagram)
• Energy-based VADNest
• Waveform processing to enhance SNR
• Weighted log-energy
• 12 cepstral coefficients
• Blind equalization (cepstral domain)
• VAD based on acceleration of various energy-based measures
• First and second derivatives
[Diagram: MFA front end: input speech, noise reduction, waveform processing, cepstral analysis, blind equalization, and feature processing, with VADNest and VAD stages]
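The noise-reduction stage can be illustrated with a toy single-stage Wiener gain. The actual MFA proposal uses a two-stage mel-warped Wiener filter, so treat this purely as a sketch of the principle, with the noise spectrum estimated from the first frames (assumed speech-free):

```python
import numpy as np

def wiener_denoise(power_spec, n_noise_frames=10, gain_floor=0.1):
    """power_spec: (n_frames, n_bins) magnitude-squared spectra.
    Estimates the noise spectrum from the first frames (assumed to be
    speech-free) and applies a floored Wiener gain per bin."""
    noise_psd = power_spec[:n_noise_frames].mean(axis=0)
    snr_est = np.maximum(power_spec / (noise_psd + 1e-12) - 1.0, 0.0)
    gain = np.maximum(snr_est / (1.0 + snr_est), gain_floor)  # Wiener gain
    return power_spec * gain
```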
EXPERIMENTAL RESULTS: FRONT END SPECIFIC TUNING
• Pruning beams (word, phone and state) were opened during the tuning process to eliminate search errors.
• Tuning parameters:
  • State-tying thresholds: address the sparsity of training data by sharing state distributions among phonetically similar states
  • Language model scale: controls the influence of the language model relative to the acoustic models (more relevant for WSJ)
  • Word insertion penalty: balances insertions and deletions (always a concern in noisy environments); a scoring sketch follows below
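For intuition, here is a hedged sketch of how the last two parameters typically enter a hypothesis score during Viterbi search; sign conventions vary, and the defaults simply mirror the baseline settings in the table on the next slide:

```python
def hypothesis_score(acoustic_logprob, lm_logprob, n_words,
                     lm_scale=18.0, word_ins_penalty=10.0):
    """Combined log-domain path score used to rank hypotheses.
    lm_scale weights the language model against the acoustic model;
    word_ins_penalty taxes each hypothesized word, trading insertions
    against deletions."""
    return acoustic_logprob + lm_scale * lm_logprob - n_words * word_ins_penalty
```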
EXPERIMENTAL RESULTS: FRONT END SPECIFIC TUNING
• QIO FE - 7.5% relative improvement
• MFA FE - 9.4% relative improvement
• Ranking is still the same (14.9% vs. 12.5%)!
FE    Cond.   # of Tied   State-Tying Thresholds      LM      Word        WER
              States      Split   Merge   Occupancy   Scale   Ins. Pen.
QIO   Base    3209        165     165     840         18      10          16.1%
QIO   Tuned   3512        125     125     750         20      10          14.9%
MFA   Base    3208        165     165     840         18      10          13.8%
MFA   Tuned   4254        100     100     600         18      5           12.5%
EXPERIMENTAL RESULTS: COMPARISON OF TUNING
Front End   Train Set   Tuning   Average WER over 14 Test Sets
QIO         1           No       43.1%
QIO         2           No       38.1%
QIO         Avg.        No       38.4%
QIO         1           Yes      45.7%
QIO         2           Yes      35.3%
QIO         Avg.        Yes      40.5%
MFA         1           No       37.5%
MFA         2           No       31.8%
MFA         Avg.        No       34.7%
MFA         1           Yes      37.0%
MFA         2           Yes      31.1%
MFA         Avg.        Yes      34.1%
• Same ranking: relative performance gap increased from 9.6% to 15.8%
• On TS1, MFA FE significantly better on all 14 test sets (MAPSSWE p=0.1%)
• On TS2, MFA FE significantly better only on test sets 5 and 14 (a simplified significance-test sketch follows below)
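The cited test is NIST's MAPSSWE (Matched Pairs Sentence-Segment Word Error) test. As a rough illustration of the idea only, here is a simplified matched-pairs z-test over per-segment error-count differences; it is not NIST's exact implementation, which defines the segments more carefully:

```python
import numpy as np
from scipy.stats import norm

def matched_pairs_pvalue(errors_a, errors_b):
    """Two-sided p-value for the hypothesis that two systems have the
    same mean per-segment error count. errors_a and errors_b are arrays
    of word-error counts for the same segments under each system."""
    d = np.asarray(errors_a, float) - np.asarray(errors_b, float)
    z = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    return 2.0 * norm.sf(abs(z))
```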
EXPERIMENTAL RESULTS: MICROPHONE VARIATION
• Train on Sennheiser mic.; evaluate on secondary mic.
• Matched conditions result in optimal performance
• Significant degradation for all front ends on mismatched conditions
• Both QIO and MFA provide improved robustness relative to MFCC baseline
[Chart: WER (%) by microphone (Sennheiser vs. secondary) for the ETSI, QIO, and MFA front ends; y-axis 0-40%]
EXPERIMENTAL RESULTS: ADDITIVE NOISE
[Chart: WER (%) on noisy test sets TS2-TS7 for the ETSI, QIO, and MFA front ends with clean-data training; y-axis 0-70%]
• Performance degrades on noisy conditions when systems are trained only on clean data
• Both QIO and MFA deliver improved performance
[Chart: WER (%) on noisy test sets TS2-TS7 with multi-condition (TS2) training; y-axis 0-40%]
• Exposing systems to noise and microphone variations (TS2) improves performance
SUMMARY AND CONCLUSIONS: WHAT HAVE WE LEARNED?
• Both QIO and MFA front ends achieved the ALV evaluation goal of at least a 25% relative improvement over the ETSI baseline
• WER is still high (~35%) while human benchmarks report error rates near 1%, so the improvement is not operationally significant
• Front-end-specific parameter tuning did not significantly change overall performance (MFA still outperforms QIO)
• Both QIO and MFA front ends handle convolutional and additive noise better than the ETSI baseline
APPENDIX: AVAILABLE RESOURCES
• Speech Recognition Toolkits: compare front ends to standard approaches using a state-of-the-art ASR toolkit
• ETSI DSR Website: reports and front end standards
• Aurora Project Website: recognition toolkit, multi-CPU scripts, database definitions, publications, and performance summary of the baseline MFCC front end