
Quantile Based Histogram Equalization for Noise Robust Speech Recognition

by Diplom-Physiker Florian Erich Hilger

from Bonn-Bad Godesberg

Referee: Univ.-Prof. Dr.-Ing. Hermann Ney

Presenter: Chen Hung_Bin

December 2004

Outline

Histogram Normalization
Quantile Based Histogram Equalization
Experiments
Conclusion

Histogram Normalization

Histogram normalization is a general non-parametric method to make the cumulative distribution function (CDF) of some given data match a reference distribution.

The goal is to reduce a possible mismatch between the distribution of the incoming test data and the distribution of the training data, which is used as the reference.

The mismatch between the test and the training data distributions is caused by the different acoustic conditions.

If $P$ is the CDF of the current test data and $P_{\text{train}}^{-1}$ is the inverse reference CDF of the training data, the two CDFs can be used directly to define a transformation:

$$\hat{Y} = P_{\text{train}}^{-1}\big(P(Y)\big)$$

Example of the cumulative distribution functions of a clean and a noisy signal.

The arrows show how an incoming noisy value is transformed based on these two cumulative distribution functions.
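The transformation $\hat{Y} = P_{\text{train}}^{-1}(P(Y))$ can be sketched with empirical CDFs obtained by sorting. This is a minimal illustration, assuming the raw sample sets are available (the thesis works with high-resolution histograms instead); the function name `histogram_normalize` is hypothetical.

```python
import bisect


def histogram_normalize(test_values, train_values):
    """Map each test value y to P_train^{-1}(P(y)) using empirical CDFs.

    Hypothetical helper for illustration: both CDFs are estimated from the
    given samples, and the inverse training CDF returns the smallest training
    sample whose empirical CDF value reaches P(y).
    """
    train = sorted(train_values)
    test = sorted(test_values)
    n_train, n_test = len(train), len(test)
    result = []
    for y in test_values:
        # count = n_test * P(y): number of test samples <= y
        count = bisect.bisect_right(test, y)
        # smallest index i with (i + 1) / n_train >= count / n_test (ceil division)
        i = -(-count * n_train // n_test) - 1
        result.append(train[min(max(i, 0), n_train - 1)])
    return result


# Usage: four "noisy" values mapped onto four "clean" reference values
print(histogram_normalize([10.0, 20.0, 30.0, 40.0], [1.0, 2.0, 3.0, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
```

Integer ceiling division is used so that the CDF lookup is not affected by floating-point rounding.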

Two-pass method: two separate histograms, one for silence and the other for speech, can be estimated on the training data. A first recognition pass is then used to determine the amount of silence in the utterances to be recognized, and based on that percentage the appropriate target histogram is determined.

This requires a sufficiently large amount of data from the same recording environment or noise condition to get reliable estimates for the high-resolution histograms.
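The slides do not state how the silence percentage selects the target histogram. One plausible reading, used here purely as an assumption, is a mixture of the silence and speech reference CDFs weighted by the silence fraction found in the first pass. `make_target_cdf` is a hypothetical name.

```python
import bisect


def make_target_cdf(silence_samples, speech_samples, silence_fraction):
    """Return a target CDF that mixes silence and speech reference CDFs.

    Assumption for illustration: the target is a mixture weighted by the
    silence fraction estimated in the first recognition pass.
    """
    silence = sorted(silence_samples)
    speech = sorted(speech_samples)

    def cdf(y):
        f_silence = bisect.bisect_right(silence, y) / len(silence)
        f_speech = bisect.bisect_right(speech, y) / len(speech)
        return silence_fraction * f_silence + (1.0 - silence_fraction) * f_speech

    return cdf


# Usage: 25% silence frames detected in the first pass
target = make_target_cdf([0.0, 1.0], [10.0, 11.0], silence_fraction=0.25)
print(target(5.0))  # 0.25
```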

Drawback of the two-pass method: it cannot be used when a real-time response of the recognizer is required, as in command-and-control applications or spoken dialog systems.

A straightforward solution to this problem is to reduce the number of histogram bins in order to get reliable estimates even with little data. Quantile equalization follows this idea.

Quantile Based Histogram Equalization

Quantiles are very easy to determine by just sorting the sample data set.

Cumulative distributions can be approximated using quantiles. Example: two cumulative distribution functions with four 25% quantiles ($N_Q = 4$).

With $N_Q = 4$, as shown in the example, about one second of data (100 time frames) is already sufficient to get a rough estimate of the cumulative distribution.

Another advantage of quantiles: even if the data set under consideration consists of only very few samples, or in an extreme case just one sample, the quantiles can be calculated without any special modification of the algorithm.
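Determining quantiles by sorting, including the single-sample case, can be sketched as follows. The nearest-rank rule and the function name `quantiles` are assumptions; the thesis may interpolate between samples.

```python
def quantiles(samples, n_q=4):
    """Return the N_Q quantiles Q_1..Q_{N_Q} of a sample set by sorting.

    Uses the nearest-rank definition (an assumption); Q_{N_Q} is the maximum.
    Works unchanged for a single sample.
    """
    data = sorted(samples)
    n = len(data)
    # Q_i is the smallest sample with at least i / n_q of the data at or below it
    return [data[-(-i * n // n_q) - 1] for i in range(1, n_q + 1)]


print(quantiles([4, 1, 3, 2, 8, 6, 7, 5]))  # [2, 4, 6, 8]
print(quantiles([0.7]))                     # [0.7, 0.7, 0.7, 0.7]
```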

The current quantiles $Q_i$ of the incoming data and the corresponding reference quantiles $Q_i^{\text{train}}$ of the training data define a set of points that can be used to determine the parameters $\theta$ of a transformation function

$$\tilde{Y} = T(Y, \theta)$$

which transforms the incoming data $Y$ to $\tilde{Y}$ and thus reduces the mismatch between the test and training data quantiles.

Applying a transformation function to make the four training and recognition quantiles match.

Within the context of this work, the transformation is applied to the output of the Mel-scaled filter bank after a 10th root has been applied to reduce the dynamic range. In the following, $Y$ denotes the output vector of the filter bank and $Y_k$ its $k$-th component.

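To scale the incoming filter output values down to the interval [0, 1], they are divided by the largest quantile $Q_{k,N_Q}$. After the power function transformation is applied, the values are scaled back to the original range:

$$\tilde{Y}_k = T_k(Y_k, \theta_k) = Q_{k,N_Q} \left( \frac{Y_k}{Q_{k,N_Q}} \right)^{\gamma_k}$$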

With the power function alone, small values are scaled down even further towards zero, so small amplitude differences would be enhanced considerably if a logarithm were applied afterwards. This contradicts the desired compression of the signal to a smaller range.

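Therefore, the transformation function used throughout this work combines the power function linearly with the identity, weighted by $\alpha_k$:

$$\tilde{Y}_k = T_k(Y_k, \theta_k) = Q_{k,N_Q} \left( \alpha_k \left( \frac{Y_k}{Q_{k,N_Q}} \right)^{\gamma_k} + (1 - \alpha_k) \frac{Y_k}{Q_{k,N_Q}} \right), \qquad \theta_k = \{\alpha_k, \gamma_k\}$$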
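The per-channel transformation can be written directly as a function. A minimal sketch, assuming $Q_{k,N_Q}$ is passed in as `q_max`; the name `qe_transform` is hypothetical.

```python
def qe_transform(y, q_max, alpha, gamma):
    """Quantile equalization transform T_k for one filter channel.

    The value is scaled by the largest quantile q_max, passed through a
    weighted sum of a power function and the identity, and scaled back.
    """
    x = y / q_max
    return q_max * (alpha * x ** gamma + (1.0 - alpha) * x)


# alpha = 0 leaves the value unchanged; alpha = 1 applies the pure power function
print(qe_transform(0.5, 1.0, 0.0, 2.0))  # 0.5
print(qe_transform(0.5, 1.0, 1.0, 2.0))  # 0.25
```

Because $x = 1$ is a fixed point of both terms, the largest quantile is mapped onto itself for any $\alpha_k$ and $\gamma_k$; only the values below it are reshaped.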

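Both transformation parameters are jointly optimized to minimize the squared distance between the current quantiles $Q_{k,i}$ and the training quantiles $Q_{k,i}^{\text{train}}$:

$$\{\alpha_k, \gamma_k\} = \arg\min_{\{\alpha'_k, \gamma'_k\}} \sum_{i} \left( T_k(Q_{k,i}, \alpha'_k, \gamma'_k) - Q_{k,i}^{\text{train}} \right)^2$$

The minimum is determined with a simple grid search, where the parameters are restricted to the ranges

$$0 \le \alpha_k \le 1, \qquad 1 \le \gamma_k \le \gamma_{\max}$$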

The step size for the grid search can be set to a value on the order of 0.01.
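The grid search can be sketched as an exhaustive loop over both parameters with step 0.01. The value `gamma_max = 3.0` is a hypothetical default, not taken from the slides, and the function names are illustrative. The largest quantile contributes only a constant term (it is mapped onto itself), so including it in the sum does not change the minimum.

```python
def qe_transform(y, q_max, alpha, gamma):
    """T_k: weighted sum of power function and identity on values scaled by q_max."""
    x = y / q_max
    return q_max * (alpha * x ** gamma + (1.0 - alpha) * x)


def grid_search(q_test, q_train, gamma_max=3.0, step=0.01):
    """Find (alpha, gamma) minimizing sum_i (T(Q_i) - Q_i^train)^2 on a grid.

    q_test and q_train are ascending quantile lists; q_test[-1] is the largest
    current quantile. gamma_max = 3.0 is a hypothetical default.
    """
    q_max = q_test[-1]
    n_alpha = round(1.0 / step)
    n_gamma = round((gamma_max - 1.0) / step)
    best_err, best_alpha, best_gamma = float("inf"), 0.0, 1.0
    for a in range(n_alpha + 1):
        alpha = a * step  # 0 <= alpha <= 1
        for g in range(n_gamma + 1):
            gamma = 1.0 + g * step  # 1 <= gamma <= gamma_max
            err = sum(
                (qe_transform(q, q_max, alpha, gamma) - qt) ** 2
                for q, qt in zip(q_test, q_train)
            )
            if err < best_err:
                best_err, best_alpha, best_gamma = err, alpha, gamma
    return best_alpha, best_gamma


# Synthetic check: training quantiles generated with alpha = 0.5, gamma = 2.0
current = [0.2, 0.4, 0.6, 1.0]
reference = [qe_transform(q, 1.0, 0.5, 2.0) for q in current]
print(grid_search(current, reference))
```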

Example: output of the 6th Mel-scaled filter over time for a sentence from the Aurora 4 test set, and the cumulative distributions of the signals. In this case, the values 0.1 and 4.1 are obtained from the search grid.

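Combining neighboring filter channels: a linear combination of a filter with its left and right neighbors can be used to further reduce the remaining difference. Here $\tilde{Y}_k$ and $\tilde{Q}_{k,i}$ are the filter output values and the recognition quantiles after the preceding power function transformation; the combination factors are denoted $\lambda_k$ for the left neighbors and $\rho_k$ for the right neighbors. With $\theta_k = \{\lambda_k, \rho_k\}$, the transformation step can be written as:

$$\hat{Y}_k = T_k(\tilde{Y}_k, \theta_k) = (1 - \lambda_k - \rho_k)\, \tilde{Y}_k + \lambda_k \tilde{Y}_{k-1} + \rho_k \tilde{Y}_{k+1}$$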
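The neighbor combination can be sketched per frame as below. How the edge channels (which lack a left or right neighbor) are handled is not stated; here the channel's own value stands in for the missing neighbor, which is an assumption. `combine_neighbors` is a hypothetical name.

```python
def combine_neighbors(y_tilde, lam, rho):
    """Return Y_hat_k = (1 - lam_k - rho_k) Y~_k + lam_k Y~_{k-1} + rho_k Y~_{k+1}.

    y_tilde: filter outputs after the power function transformation.
    lam, rho: per-channel weights of the left and right neighbors.
    Edge channels reuse their own value for the missing neighbor (assumption).
    """
    n = len(y_tilde)
    y_hat = []
    for k in range(n):
        left = y_tilde[k - 1] if k > 0 else y_tilde[k]
        right = y_tilde[k + 1] if k < n - 1 else y_tilde[k]
        y_hat.append((1.0 - lam[k] - rho[k]) * y_tilde[k] + lam[k] * left + rho[k] * right)
    return y_hat


print(combine_neighbors([1.0, 2.0, 3.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]))  # [1.0, 1.5, 3.0]
```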

Comparison of the RWTH baseline feature extraction front-end

Experiments

Car Navigation: isolated German words recorded in cars; the vocabulary consists of 2100 equally probable words; the training data was recorded in a quiet office environment.

Aurora 3 – SpeechDat-Car: continuous digit strings recorded in cars; four languages are available: Danish, Finnish, German, and Spanish.

Aurora 4 – noisy WSJ 5k: utterances read from the Wall Street Journal with various artificially added noises; the vocabulary consists of 5000 words.

Comparison of Logarithm and Root Functions

Comparison of the logarithm with different root functions on the isolated-word Car Navigation database.

LOG: logarithm, CMN: cepstral mean normalization, 2nd - 20th: root instead of logarithm, FMN: filter mean normalization.

Comparison of the logarithm and the 10th root on the Aurora 3 database.

WM: well matched, MM: medium mismatch, HM: high mismatch, FMN: filter mean normalization

Comparison of the logarithm and different root functions on the Aurora 4 noisy WSJ 16 kHz database.

LOG: logarithm, CMN: cepstral mean normalization, 2nd - 20th: root instead of logarithm, FMN: filter mean normalization.

Experiment - Quantile Equalization

Recognition results on the Car Navigation database with quantile equalization

LOG: logarithm, CMN: cepstral mean normalization, 10th: root instead of logarithm, FMN: filter mean normalization, QE: quantile equalization, QEF(2): quantile equalization with filter combination (2 neighbors).

Comparison of quantile equalization with histogram normalization on the Car Navigation database.

QE train: applied during training and recognition. HN: speaker-session-wise histogram normalization, HN sil: histogram normalization dependent on the amount of silence, ROT: feature space rotation.

Comparison of QE and HN

Cumulative distribution function of the 6th filter output.

HN: after histogram normalization, QE: after quantile equalization. Clean: data from test set 1; noisy: test set 12.

Recognition results on the Car Navigation database for different numbers of quantiles.

10th: root instead of logarithm, FMN: filter mean (and variance) normalization, QE: quantile equalization with NQ quantiles, QEF quantile equalization with filter combination.

Comparison of the logarithm in the feature extraction with different root functions on the Car Navigation database.

2nd - 20th: root instead of logarithm, FMN: filter mean normalization, QE: quantile equalization, QEF: quantile equalization with filter combination.

Conclusion

Replacing the logarithm in the feature extraction by a root function significantly increased the recognition performance on noisy data.

Using four quantiles ($N_Q = 4$) can be recommended as the standard setup; it can be used on short windows as well as on complete utterances.

Spectral Entropy Feature in Full-Combination Multi-Stream for Robust ASR

Hemant Misra, Hervé Bourlard
IDIAP Research Institute, Martigny, Switzerland

Presenter: Chen Hung_Bin

INTERSPEECH 2005

Introduction

Spectral entropy features are computed from the sub-bands of the spectrum in order to locate the spectral peaks.

The spectral entropy features are used along with PLP features in a multi-stream framework.

A separate multi-layered perceptron (MLP) is trained for the PLP features.

The approach yields a 9.2% relative error reduction compared to the baseline.

Spectral entropy feature

Entropy measures can be used to capture the “peakiness” of a distribution: a distribution with a sharp peak has low entropy, while a flat distribution has high entropy.

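The spectrum is converted into a probability mass function (PMF)-like function by normalizing it:

$$x_i = \frac{X_i}{\sum_{j=1}^{N} X_j}, \qquad i = 1, \ldots, N$$

where $X_i$ is the energy of the $i$-th frequency component of the spectrum.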
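Normalizing a frame's spectrum and computing its entropy $H = -\sum_i x_i \log_2 x_i$ can be sketched as follows. The base-2 logarithm and the function name are assumptions; the example contrasts a flat spectrum (maximum entropy) with a single sharp peak (zero entropy).

```python
import math


def spectral_entropy(spectrum):
    """Entropy of a power spectrum after normalizing it to a PMF-like function."""
    total = sum(spectrum)
    h = 0.0
    for energy in spectrum:
        x = energy / total  # x_i = X_i / sum_j X_j
        if x > 0.0:  # 0 * log 0 is taken as 0
            h -= x * math.log2(x)
    return h


print(spectral_entropy([1.0] * 8))             # 3.0 (flat: log2 of 8 bins)
print(spectral_entropy([0.0, 0.0, 5.0, 0.0]))  # 0.0 (single sharp peak)
```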

Spectral entropy feature

It can be observed that the entropy computed on the full-band spectrum can be used for speech/silence detection.

Entropy computed from the full-band spectrum. (a) Clean speech waveform, (b) entropy contour for clean speech, (c) speech corrupted with factory noise at 6 dB SNR, and (d) entropy contour for speech corrupted with factory noise at 6 dB SNR.

Multi-band/multi-resolution spectral entropy feature

The full-band spectral entropy feature can capture only the gross peakiness of the spectrum.

The best results were obtained by dividing the normalized full-band spectrum into 24 overlapping sub-bands defined on the Mel scale and computing the entropy of each sub-band.
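The multi-band feature can be sketched as below. The exact band layout is not given, so this assumes 24 bands between equally spaced Mel points, each band spanning two Mel steps (50% overlap), and it renormalizes each sub-band before computing its entropy; the Mel formula $2595 \log_{10}(1 + f/700)$ and all names are illustrative assumptions.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def entropy(values):
    """Entropy (bits) of a non-negative vector after normalizing it to sum to 1."""
    total = sum(values)
    if total <= 0.0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0.0)


def subband_entropies(spectrum, sample_rate, n_bands=24):
    """Entropy of each of n_bands overlapping, Mel-spaced sub-bands.

    spectrum holds power values for bins spaced evenly from 0 Hz to
    sample_rate / 2. Band j spans Mel points j..j+2, so neighbors overlap
    by half a band (assumed layout). Each sub-band is renormalized before
    its entropy is computed (assumption).
    """
    n_bins = len(spectrum)
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    edges = [mel_to_hz(mel_max * i / (n_bands + 1)) for i in range(n_bands + 2)]

    def to_bin(freq):
        return round(freq / nyquist * (n_bins - 1))

    features = []
    for j in range(n_bands):
        lo = to_bin(edges[j])
        hi = max(to_bin(edges[j + 2]), lo + 1)  # at least two bins per band
        features.append(entropy(spectrum[lo:hi + 1]))
    return features


# Usage: a flat 129-bin spectrum (256-point FFT) at 8 kHz
feats = subband_entropies([1.0] * 129, sample_rate=8000)
print(len(feats))  # 24
```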

Entropy based full-combination multi-stream (FCMS)

Full-combination multi-stream:

All possible combinations of the two features are treated as separate streams. An MLP expert is trained for each stream. The posteriors at the output of the experts are weighted and combined, and the combined posteriors are passed to an HMM decoder.

Entropy based full-combination multi-stream

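The combined output posterior probability for the $k$-th class $q_k$ and the $n$-th frame is:

$$\hat{P}(q_k \mid X_n, \Theta) = \sum_{i=1}^{I} w_{i,n}\, P(q_k \mid x_n^i, \Theta_i)$$

The weights are derived from the entropy of each expert's output posteriors:

$$h_{i,n} = -\sum_{k=1}^{K} P(q_k \mid x_n^i, \Theta_i) \log P(q_k \mid x_n^i, \Theta_i)$$

$$w_{i,n} = \frac{1 / \tilde{h}_{i,n}}{\sum_{j=1}^{I} 1 / \tilde{h}_{j,n}}$$

where $\tilde{h}_{i,n}$ is the entropy $h_{i,n}$, replaced by the large constant 10000 for streams considered unreliable, so that they receive a negligible weight.

$I$: number of streams (3 in this case)
$\Theta$: parameter set
$n$: frame number
$x_n^i$: feature vector of stream $i$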
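Combining the experts' posteriors for one frame with inverse entropy weights can be sketched as below. The slide's weighting formula is only partly recoverable; the rule that streams with above-average entropy get the constant 10000 in place of their entropy is an assumption here, as are the function names.

```python
import math


def entropy(posteriors):
    """Entropy h_{i,n} of one expert's class posteriors (base-2 log)."""
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)


def combine_streams(stream_posteriors, penalty=10000.0):
    """Weight each stream's posteriors by its inverse entropy and sum them.

    Assumption: a stream whose entropy exceeds the frame's average entropy
    is treated as unreliable and gets `penalty` instead of its entropy.
    """
    h = [entropy(p) for p in stream_posteriors]
    h_avg = sum(h) / len(h)
    # Guard against division by zero for a perfectly confident expert
    h_tilde = [penalty if h_i > h_avg else max(h_i, 1e-12) for h_i in h]
    inv = [1.0 / value for value in h_tilde]
    total = sum(inv)
    weights = [value / total for value in inv]
    n_classes = len(stream_posteriors[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, stream_posteriors))
        for k in range(n_classes)
    ]
    return combined, weights


# Three streams (e.g. PLP, entropy, PLP + entropy) for a 3-class frame
streams = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3], [0.6, 0.2, 0.2]]
posterior, weights = combine_streams(streams)
```

Since the weights sum to one and each stream's posteriors sum to one, the combined vector is again a valid posterior distribution.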

Spectral entropy feature in Tandem framework

The Tandem approach exploits the advantages of both HMM/ANN and HMM/GMM systems.

Multi-stream Tandem: outputs from the different experts are weighted and combined. The combined output undergoes a Karhunen-Loève (KL) transform before being fed as features into the HMM/GMM system.

The Tandem system only has access to the ‘outputs before softmax’.

Therefore we cannot use the entropy-based weighting directly. To overcome this problem, we converted the ‘outputs before softmax’ into posteriors using the softmax equation.

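“Softmax” nonlinearity in this position (exponentials normalized to sum to 1):

$$P(q_k \mid x_n) = \frac{\exp\big(y(q_k \mid x_n)\big)}{\sum_{j=1}^{K} \exp\big(y(q_j \mid x_n)\big)}$$

where $y(q_k \mid x_n)$ is the output before softmax for class $q_k$, $n$ is the time instant, and $x_n$ is the feature vector.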
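The conversion of the ‘outputs before softmax’ into posteriors is the standard softmax. A minimal sketch; subtracting the maximum first is a common numerical-stability trick that does not change the result, and the function name is illustrative.

```python
import math


def softmax(outputs):
    """Convert pre-softmax outputs y(q_k | x_n) of one frame into posteriors."""
    m = max(outputs)  # shift for numerical stability; cancels in the ratio
    exps = [math.exp(y - m) for y in outputs]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([0.0, 0.0]))        # [0.5, 0.5]
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5], no overflow
```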

Experiments

The Numbers95 database of US English connected-digit telephone speech is used.

There are 30 words in the database, represented by 27 phonemes.

Noise from the Noisex92 database is added at different signal-to-noise ratios (SNRs).

There were 3,330 utterances for training and 2,250 utterances for testing the system.
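The slides do not describe how the Noisex92 noise was mixed in. A common approach, sketched here as an assumption, scales the noise so that the ratio of average speech power to average noise power over the utterance matches the target SNR; `add_noise_at_snr` is a hypothetical name.

```python
import math


def add_noise_at_snr(speech, noise, snr_db):
    """Add noise scaled so that 10 * log10(P_speech / P_noise) equals snr_db.

    Powers are averaged over the whole utterance (assumption); noise must be
    at least as long as speech.
    """
    if len(noise) < len(speech):
        raise ValueError("noise is shorter than speech")
    n = len(speech)
    p_speech = sum(s * s for s in speech) / n
    p_noise = sum(v * v for v in noise[:n]) / n
    scale = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + scale * v for s, v in zip(speech, noise)]


# Usage: mix a toy noise signal into a toy "speech" signal at 6 dB SNR
speech = [1.0, -1.0] * 50
noise = [0.5, 0.5, -0.5, -0.5] * 25
noisy = add_noise_at_snr(speech, noise, snr_db=6.0)
```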

Results

Hybrid system under different noise conditions:

WERs for PLP features, 24 Mel-band spectral entropy features and their time derivatives (24-Mel), the two features appended (PLP + 24-Mel), and PLP and spectral entropy features in FCMS with inverse entropy weighting.

Results

Tandem system under different noise conditions:

WERs for PLP features, 24 Mel-band spectral entropy features and their time derivatives (24-Mel), the two features appended (PLP + 24-Mel), and PLP and spectral entropy features in FCMS with inverse entropy weighting.

Conclusion

We demonstrated that better performance can be achieved by FCMS as compared to appending the multi-resolution entropy feature vector to the PLP feature vector.

