Robust Speech Recognition Algorithm Against Unknown
Short-Time Noise
By Arthur Chan
Supervised by Prof. Manhung Siu
Hong Kong University of Science and Technology
Copyright © by Arthur Chan 2001

Outline
• Robust Speech Recognition
• HMM-based speech recognition in short-time noises
• Our Proposal: Skip the poor frames
  – Theory
  – Implementation: FSVA and FSHMM
• Evaluation I: Gaussian noise replacement
• Improvement of FSVA
• Evaluation II: Further evidence
  – Additive short-time noise
  – Short-time noise in GSM environment
• Conclusion and Future Work
Robust Speech Recognition

Speech Recognition
• Speech recognition
  – Acceptable performance in matched training and testing conditions,
  – or when the operating conditions are known in training.
  – Digit recognition (99%).
  – Dictation (90%).
  – Performance is still improving for tasks under active research.

Mismatch Conditions
• The difference between the training and operating (testing) environments.
• It exists in practice. For example,
  – A simple example: a sudden door slam when dictating a letter.
  – In a wireless environment, the background of the speaker can change.
• Robust speech recognition is the study of building speech recognizers that handle mismatched conditions.

Mismatch Conditions (cont.)
• Why are mismatch conditions hard to deal with?
  – There are many causes:
    • Additive noise (e.g. background noise such as air-conditioning)
    • Channel noise (e.g. difference between microphones in training and testing conditions)
    • Others: Lombard effect, reflections from buildings.
  – In general, noise can have
    • Random amplitude,
    • Random duration,
    • Random occurrence,
    • Random spectral characteristics.

Conventional Approach to Robust Speech Recognition
• E.g. Parallel Model Combination (PMC) (Gales, 95)
  – First collect some samples of noise in the operating environment,
  – Update the acoustic model using the noise statistics.
• Works satisfactorily for stationary noise.
• General time-varying noise cannot be handled.

Short Time Noise
• Time-limited noise.
• Usually occurs in the operating environment, such as,
  – Door slam,
  – Click sound of keyboard,
  – Frame loss in network transmission of speech.

Short Time Noise (cont.)
• In this work, we define short-time noise as having,
  – Random spectral characteristics,
  – Random amplitude,
  – Random occurrence,
  – Random duration,
  – Duration shorter than the speech signal.
• Also known as partial temporal corruption (J. Ming, 2001).
• Some parts of the speech are not corrupted.

This work
• Deals with short-time noise.
• Some parts of the speech are uncorrupted.
• Uses an interesting perspective,
  – Can we ignore the contributions of the corrupted frames in the decision-making process?
  – Supported by Missing Feature Theory (Lippmann, 97).
    • We can regard the corrupted parts of speech as missing.
    • We can ignore those missing parts in the decision.
HMM-Based Speech Recognition

Hidden Markov Model (HMM)
• Markov model with an unobservable state sequence.
• Can be used in other pattern recognition tasks.
• Efficient algorithms for training and testing exist.
• Example: left-to-right HMM to model speech.

Viterbi Algorithm
• Efficiently searches for the most likely state sequence that explains all observations:

$$\tilde{Q} = \arg\max_{Q \in Q^*} \log P(Q \mid O)$$

• $O$: An observation sequence, $(o_1, o_2, \ldots, o_T)$ or $O_1^T$.
• $Q$: A state sequence, $(q_1, q_2, \ldots, q_T)$ or $Q_1^T$.
• $Q^*$: The set of all possible state sequences.
• $\tilde{Q}$: The best state sequence.

Viterbi Algorithm (cont.)
  – Expressed in the HMM's parameters,

$$\tilde{Q} = \arg\max_{Q_1^T \in Q^*} \log P(O_1^T \mid Q_1^T)\, P(Q_1^T)
= \arg\max_{Q_1^T \in Q^*} \left[ \log \pi_{q_1} + \sum_{t=2}^{T} \log a_{q_{t-1} q_t} + \sum_{t=1}^{T} \log b_{q_t}(o_t) \right]$$

where $\pi_{q_1}$ is the initial probability, $a_{q_{t-1} q_t}$ the transition probability, and $b_{q_t}(o_t)$ the observation probability.

Viterbi Algorithm (cont.)
[Figure: trellis of states 1, 2, 3 unrolled over time T=1, T=2, T=3, ...]
• Efficient implementation
  – At each state $i$ and each time $t$, define the partial score,

$$\delta_t(i) = \max_{Q_1^{t-1}} P(O_1^t \mid Q_1^{t-1}, q_t = i)$$

• Recursive formula

$$\delta_t(j) = \left[ \max_i \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$$
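
The recursion above can be sketched in a few lines. This is a minimal log-domain illustration, not the author's implementation; the function name `viterbi` and the array layout (`log_pi`, `log_a`, `log_b`) are assumptions made for the example.

```python
import numpy as np

def viterbi(log_pi, log_a, log_b):
    """Standard Viterbi search in the log domain (illustrative sketch).

    log_pi : (N,)   log initial probabilities
    log_a  : (N, N) log_a[i, j] = log a_ij
    log_b  : (T, N) log_b[t, j] = log b_j(o_t), precomputed per frame
    Returns the best log score and the best state path.
    """
    T, N = log_b.shape
    delta = log_pi + log_b[0]               # partial scores at the first frame
    back = np.zeros((T, N), dtype=int)      # back-pointers for backtracking
    for t in range(1, T):
        cand = delta[:, None] + log_a       # cand[i, j] = delta(i) + log a_ij
        back[t] = np.argmax(cand, axis=0)   # best predecessor of each state j
        delta = np.max(cand, axis=0) + log_b[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):           # follow back-pointers to the start
        path.append(int(back[t, path[-1]]))
    return float(np.max(delta)), path[::-1]
```

Every frame's `log_b[t]` is added unconditionally, which is why a single badly matched frame can dominate the total score.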

Short-Time Noise in Viterbi Algorithm
• Finding the best state sequence sums the contribution of every frame,

$$\log P(O_1^T \mid Q_1^T)\, P(Q_1^T) = \sum_{t=1}^{T} \log a_{q_{t-1} q_t} + \sum_{t=1}^{T} \log b_{q_t}(o_t)$$

  (with the initial probability written as $a_{q_0 q_1} = \pi_{q_1}$).
• This is like finding the mean using the average, $\frac{1}{N} \sum_{n=1}^{N} x_n$
  – E.g. mean of 2.2, 2.3, 2.4, 2.2 = 2.275
  – Mean of 2.2, 2.3, 2.4, 100 = 26.725
• Easily affected by outlier frames.
Our Proposal

Our Proposal
• Search for the most likely state sequence while ignoring the K most poorly performing frames.
• Can be implemented efficiently
  – similar to the Viterbi algorithm.
• Achieves satisfactory performance.

Robust mean of 2.2, 2.3, 2.4, 100 = (2.2 + 2.3 + 2.4) / 3 = 2.3
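
The robust-mean analogy can be made concrete with a small sketch. The rule used here (drop the `k` values farthest from the median) is an assumption chosen for illustration; the slides only state that the outlier is left out.

```python
from statistics import mean, median

def robust_mean(values, k):
    """Mean after discarding the k values farthest from the median.

    Hypothetical helper: the discard rule is an illustrative assumption.
    """
    m = median(values)
    kept = sorted(values, key=lambda v: abs(v - m))[: len(values) - k]
    return mean(kept)

samples = [2.2, 2.3, 2.4, 100]
plain = mean(samples)             # pulled far away by the single outlier
robust = robust_mean(samples, 1)  # the outlier 100 is discarded
```

The frame-skipping idea applies the same principle to frame likelihoods: the K worst frames are left out of the score instead of being averaged in.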

Formulation: Ignore the poorest frame
• Try to ignore the frame with the lowest likelihood, i.e.

$$\tilde{Q} = \arg\max_{Q \in Q^*} \prod_{t=1}^{T} a_{q_{t-1} q_t} \prod_{i=1}^{T-1} b_{q_{t_i}}(o_{t_i})$$

• Here the frames $(o_1, \ldots, o_T)$ have been rank-ordered to $(o_{t_1}, \ldots, o_{t_T})$
• such that $b_{q_{t_i}}(o_{t_i}) \ge b_{q_{t_{i+1}}}(o_{t_{i+1}})$.

Generalization: Ignore the poorest K frames
• The robust likelihood $\Psi_K$ is defined to skip the $K$ frames with the lowest likelihood:

$$\Psi_K(O_1^T \mid Q_1^T) = \sum_{i \in \{t_1, \ldots, t_{T-K}\}} \log b_{q_i}(o_i) + \sum_{i=1}^{T} \log a_{q_{i-1} q_i}
= \sum_{i=1}^{T-K} \log b_{q_{t_i}}(o_{t_i}) + \sum_{i=1}^{T} \log a_{q_{i-1} q_i}$$

  – Still, we maintain the alignment information (the transition term is unchanged).
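
For one fixed state sequence, the definition amounts to sorting the per-frame log-likelihoods and dropping the K lowest while leaving the transition term untouched. The sketch below illustrates this; the function name and argument layout are assumptions.

```python
import numpy as np

def robust_log_likelihood(frame_log_b, log_trans, k):
    """Robust log-likelihood of one fixed state sequence (illustrative sketch).

    frame_log_b : per-frame log b_{q_t}(o_t) along the chosen state sequence
    log_trans   : sum of the log transition terms along that sequence
    k           : number of lowest-likelihood frames to ignore
    """
    kept = np.sort(np.asarray(frame_log_b, dtype=float))[k:]  # drop the k lowest
    return float(kept.sum() + log_trans)                      # transitions unchanged

# One frame (-50.0) is badly corrupted; skipping it removes its penalty.
score_all = robust_log_likelihood([-2.0, -3.0, -50.0, -1.0], log_trans=-4.0, k=0)
score_skip1 = robust_log_likelihood([-2.0, -3.0, -50.0, -1.0], log_trans=-4.0, k=1)
```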

Generalization (cont.)
• Speech recognition becomes the problem of finding the state sequence with the best robust likelihood $\Psi_K$,

$$\tilde{Q} = \arg\max_{Q_1^T \in Q^*} \Psi_K(O_1^T \mid Q_1^T)$$

Alternative Formulation
• For every state sequence, consider all possible patterns of corruption of K frames among T frames.
• There are $C_K^T$ of them in total. Denote them as $(l_1, l_2, \ldots, l_{C_K^T})$.
• For each pattern, $l_i = (o_{i_1}, o_{i_2}, \ldots, o_{i_{T-K}})$ is the set of uncorrupted frames in this pattern.
• E.g. T=4, K=2 has the following patterns of corruption:
  – Frames 1 and 2,
  – Frames 1 and 3,
  – Frames 1 and 4,
  – Frames 2 and 3,
  – Frames 2 and 4,
  – Frames 3 and 4.
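
The patterns of corruption can be enumerated directly, which also confirms the count for the T=4, K=2 example. The helper names below are hypothetical and frames are numbered from 1 as on the slide.

```python
from itertools import combinations
from math import comb

def corruption_patterns(T, K):
    """All ways to mark K of the T frames (numbered from 1) as corrupted."""
    return list(combinations(range(1, T + 1), K))

def uncorrupted_frames(T, pattern):
    """Frames left over once the frames in `pattern` are treated as missing."""
    return [t for t in range(1, T + 1) if t not in pattern]

patterns = corruption_patterns(4, 2)
n_patterns = comb(4, 2)   # number of patterns, T choose K
```

The count grows combinatorially with T and K, which is what makes direct enumeration impractical for realistic utterance lengths.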

Alternative Formulation (cont.)
• The robust likelihood can alternatively be defined as,

$$\Psi_K(O_1^T \mid Q_1^T) = \max_{1 \le i \le C_K^T} \log p(l_i \mid Q_1^T) + \sum_{j=1}^{T} \log a_{q_{j-1} q_j}
= \max_{1 \le i \le C_K^T} \sum_{j=1}^{T-K} \log p(o_{i_j} \mid q_{i_j}) + \sum_{j=1}^{T} \log a_{q_{j-1} q_j}$$

• Extended Union Model (EUM) probability (J. Ming)

$$\Psi_K^{\mathrm{EUM}}(O_1^T \mid Q_1^T) = \log \sum_{i=1}^{C_K^T} p(l_i \mid Q_1^T) + \sum_{j=1}^{T} \log a_{q_{j-1} q_j}$$

Missing Feature Theory Interpretation
• The above formulation relates to Missing Feature Theory, which suggests:
  – If a feature is corrupted, we can just ignore it.
  – Example: multi-band ASR assumes band-limited noise (frequency limited).
  – Similarly, our idea assumes noises are short-time in nature (time limited).

Direct Implementation
• Exhaustively neglect K frames for every state sequence
  – Very expensive,
  – For each state sequence, $C_K^T (T-K)$ additions are required,
  – Intractable for useful values of T and K.

Previous Attempts to Tackle the Computational Burden
• Let us look at an attempt that deals with the EUM.
• J. Ming et al. (2001)
  – N-best re-scoring paradigm.
  – An approximate model based on segments (a consecutive number of frames) is used.
  – Corruption in a few frames is also regarded as corruption of a whole segment.
• A more efficient algorithm is desirable.

Efficient Implementation of a Viterbi Algorithm that Skips Frames
• Two approaches
  – Topological-space expansion approach
    • using FSHMM,
    • using terminology similar to HMM.
  – State-space expansion approach
    • Modify the Viterbi algorithm directly.

Topological Space Expansion
• Frame-Skipping HMM (FSHMM)
• Skipping state
  – Consumes one observation vector.
  – Generates a constant only.
  – Example:
[Figure: non-skipping version with a single state 1; frame-skipping version with state 1 and an added skip state 1_s]

Left-to-right HMM (FS version)
[Figure: left-to-right HMM in which each non-skip state is paired with a skip state]

Implementation of Topological Space Expansion Approach
• Memory usage is (2N+1) times that of the Viterbi algorithm.
• Can be implemented with standard HMM software (e.g. HTK).
• Hard to generalize to continuous word recognition
  – A huge HMM needs to be constructed.

State-Space Expansion Approach
• The general idea
  – Augment K scores when skipping K frames.
  – In updates from previous skips, we ignore the contribution of the observation probability.
  – E.g.
[Figure: non-skipping version with states 1, 2, 3; skipping version with states 1_0, 1_1, 2_0, 2_1, 3_0, 3_1; arcs are labelled $a_{ij}$ (skip) and $a_{ij} b_j(o_t)$ (non-skip)]

Update Formula
• We can prove the recursion for the partial robust likelihood.
• We define the partial score (robust likelihood) of state $j$ at time $t$ with $k$ skips as

$$\delta_t(j, k) = \max\left( \delta_t^{\text{skip}}(j, k),\ \delta_t^{\text{non-skip}}(j, k) \right)
= \max_i \left[ a_{ij} \max\left( \delta_{t-1}(i, k-1),\ \delta_{t-1}(i, k)\, b_j(o_t) \right) \right]$$

Proof of Update Formula

$$\Psi_k(O_1^t \mid Q_1^t) = \max_{1 \le i \le C_k^t} p(l_i \mid Q_1^t)
= \max\left( \max_{l \in L_1} p(l \mid Q_1^t),\ \max_{l \in L_2} p(l \mid Q_1^t) \right)$$

  – $L_1$ is the set of corruption patterns in which the $t$-th frame is skipped.
  – $L_2$ is the set of corruption patterns in which the $t$-th frame is not skipped.

Proof of Update Formula (cont.)

$$\Psi_k(O_1^t \mid Q_1^t) = \max\left( \max_{l \in L_1} p(l \mid Q_1^t),\ \max_{l \in L_2} p(l \mid Q_1^t) \right)
= \max\left( \Psi_{k-1}(O_1^{t-1} \mid Q_1^{t-1}),\ \Psi_k(O_1^{t-1} \mid Q_1^{t-1})\, b_{q_t}(o_t) \right)$$

  – If we check the cardinality (size) of the two sets:

$$|L_1| = C_{k-1}^{t-1}, \quad |L_2| = C_k^{t-1}, \quad L_1 \cap L_2 = \emptyset$$

$$|L_1 \cup L_2| = |L_1| + |L_2| = C_{k-1}^{t-1} + C_k^{t-1} = C_k^t \quad \text{(Pascal's formula)}$$
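
The recursion can be checked numerically against exhaustive enumeration for a single fixed state sequence. The sketch below works in the log domain, so the product with the observation probability becomes a sum; the function names are hypothetical, and the transition term is left out because it is the same for every pattern.

```python
import math
from itertools import combinations

def robust_by_enumeration(log_b, k):
    """Best total log-likelihood over all patterns that skip exactly k frames."""
    T = len(log_b)
    return max(sum(log_b[t] for t in kept)
               for kept in combinations(range(T), T - k))

def robust_by_recursion(log_b, k):
    """Same quantity from the split into 'last frame skipped / not skipped'."""
    neg = -math.inf
    psi = [0.0] + [neg] * k      # psi[j]: best score so far with exactly j skips
    for frame_score in log_b:
        new = [neg] * (k + 1)
        for j in range(k + 1):
            keep = psi[j] + frame_score           # this frame is kept
            skip = psi[j - 1] if j >= 1 else neg  # this frame is skipped
            new[j] = max(keep, skip)
        psi = new
    return psi[k]

log_b = [-1.0, -7.5, -2.0, -0.5, -9.0]
by_enum = robust_by_enumeration(log_b, 2)
by_rec = robust_by_recursion(log_b, 2)
```

Both routes give the same value, but the recursion needs only on the order of T times K updates instead of one sum per pattern.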

Frame Skipping Viterbi Algorithm (FSVA)
• Transition probabilities can easily be incorporated into the above formula.
• The resulting update formula is called FSVA.
• A similar idea can be used to compute the probability of the extended union model (EUM).

FSVA (cont.)
• Update formula

$$\delta_t(j, k) = \max\left( \delta_t^{s}(j, k),\ \delta_t^{n}(j, k) \right)
= \max_i \left[ a_{ij} \max\left( \delta_{t-1}(i, k-1),\ \delta_{t-1}(i, k)\, b_j(o_t) \right) \right]$$

  – $\delta_{t-1}(i, k)\, b_j(o_t)$: updated from skip $k$ (frame $t$ is used).
  – $\delta_{t-1}(i, k-1)$: updated from skip $k-1$ (frame $t$ is skipped).
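
A minimal log-domain sketch of this update is given below. It returns only the best final score for each exact number of skips, without backtracking; the function name `fsva_scores` and the array layout are assumptions, and this is an illustration rather than the author's code.

```python
import numpy as np

def fsva_scores(log_pi, log_a, log_b, max_skips):
    """Frame-skipping Viterbi scores (illustrative sketch, no backtracking).

    log_pi : (N,)   log initial probabilities
    log_a  : (N, N) log_a[i, j] = log a_ij
    log_b  : (T, N) log_b[t, j] = log b_j(o_t)
    Returns an array whose entry k is the best robust log score when
    exactly k frames are skipped.
    """
    T, N = log_b.shape
    delta = np.full((N, max_skips + 1), -np.inf)   # delta[j, k]
    delta[:, 0] = log_pi + log_b[0]                # first frame used
    if max_skips >= 1:
        delta[:, 1] = log_pi                       # first frame skipped
    for t in range(1, T):
        # trans[j, k] = max_i (delta[i, k] + log a_ij)
        trans = np.max(delta[:, None, :] + log_a[:, :, None], axis=0)
        non_skip = trans + log_b[t][:, None]       # coming from k skips, frame used
        skip = np.full_like(trans, -np.inf)
        skip[:, 1:] = trans[:, :-1]                # coming from k-1 skips, frame ignored
        delta = np.maximum(non_skip, skip)
    return delta.max(axis=0)
```

The score table has one extra column per allowed skip, so memory grows linearly with the number of skips, and entry 0 reduces to the ordinary Viterbi score.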

Implementation II (State-Space Expansion Approach)
• Similar to the exact N-best algorithm.
• Memory usage: N times that of the normal Viterbi algorithm.
• With caching of observation probabilities, the computation is quite similar to normal Viterbi.

Evaluation I: Gaussian Noise Replacement

Evaluation I (Objective)
• To determine the usefulness of FSVA.

Evaluation I (Conditions)
• Baseline
  – Corpus: TIDIGITS (adults), train 8668, test 8668.
  – Training: 12 MFCCs + delta + delta-delta + energy = 39 features.
  – Testing results (accuracy, %)
    • 99.72 (Isolated Digit Recognition),
    • 98.90 (Connected Digit Recognition) (untuned).

Evaluation I (Conditions) (cont.)
• Corruption is simulated
  – 10% of the frames in each testing utterance are removed and replaced by a frame of
    • Gaussian noise,
    • constant energy level.
  – A clean model is used for testing.
  – Testing results using a left-to-right HMM
    • 85.34% (Isolated Digit Recognition),
    • 78.83% (Continuous Digit Recognition).
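
One way such a corruption could be simulated is sketched below. The slides do not say whether the replacement is made on the waveform or on the feature vectors, nor how the noise level is set; the feature-domain replacement, the `noise_std` parameter and the function name are all assumptions made for illustration.

```python
import numpy as np

def replace_frames_with_noise(features, rate=0.10, noise_std=1.0, seed=0):
    """Replace a random fraction of frames with Gaussian noise frames.

    features  : (T, D) array of per-frame feature vectors
    rate      : fraction of frames to corrupt (10% in the experiment above)
    noise_std : fixed noise level, standing in for 'constant energy' (assumption)
    Returns the corrupted copy and the sorted indices of the replaced frames.
    """
    rng = np.random.default_rng(seed)
    T, D = features.shape
    n_bad = int(round(rate * T))
    bad = rng.choice(T, size=n_bad, replace=False)   # frames picked without repeats
    corrupted = features.copy()
    corrupted[bad] = rng.normal(0.0, noise_std, size=(n_bad, D))
    return corrupted, np.sort(bad)
```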

Experiment I (Results)
• Using FSVA

             Acc     Skip
CDR Clean    98.97   2
CDR Noisy    93.71   28
IDR Clean    98.47   20
IDR Noisy    99.76   2

[Figure: accuracy (%, 70 to 100) against number of skips (1 to 55) for CDR noisy (0.1), CDR clean, IDR noisy (0.1) and IDR clean; annotations: IDR +88%, CDR +70%]

But: we are not happy!
  – Degradation in clean speech.
  – Hard to determine the best number of skips if the condition is unknown.
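
The "+70%" annotation for CDR appears consistent with a relative error reduction computed from the noisy baseline (78.83%) and the FSVA result (93.71%). That reading is an assumption, since the slides do not define the annotation; the helper below simply shows the arithmetic.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Relative reduction in error rate (%), given two accuracies in percent."""
    base_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (base_err - new_err) / base_err

# Continuous digit recognition, noisy condition: baseline vs. FSVA.
cdr_gain = relative_error_reduction(78.83, 93.71)
```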

How many corrupted frames are skipped? An Analysis
• Define
  – $A$: all frames.
  – $C$: set of corrupted frames.
  – $U = A / C$: set of uncorrupted frames.
  – $H$: set of detected (hit) frames.
• The likelihood ratio is then found to be

$$\frac{P(H \mid C)}{P(H \mid U)} = 10$$

• We skip mostly corrupted frames.

How much can be gained from FSVA? – 2nd Analysis
• Performance of FSVA using the number of skips that gives the lowest WER for each sentence
  – 99.72 (Isolated), 97.66 (Continuous)
• Still room for improvement
  – Longer sentences require more skips to recover.
• E.g. (observed from data)
  111.wav
  – SIL 1 1 SIL (from skip 1 to 5)
  – SIL 1 1 1 SIL (from skip 6 to 29)
  – SIL 3 1 1 SIL (from skip 29 to 57)
  ….
  24z982z.wav
  – SIL 2 z o 9 8 2 o SIL (from skip 1 to 4)
  – SIL 2 4 z o 9 8 2 o SIL (from skip 5 to 22)
  – SIL 2 4 z o 9 8 2 z o SIL (from skip 23 to 36)
  – SIL 2 4 z o 9 8 2 z SIL (from skip 37 to 57)
  ….

Observations from Evaluation I
• It is difficult to determine the number of skips because of two factors,
  – The condition (rate of corruption) is unknown.
  – The length of the sentence is unknown.
• Memory issue: N times that of the standard Viterbi algorithm.
Improvements of FSVA

Improvements of FSVA
• We present solutions to the skip determination problem,
  – Skip determination
    • An automatic skip determination mechanism is presented.
  – The memory problem is related to skip determination
    • An approximate algorithm is presented.
    • A preliminary result is presented.

Improvements of FSVA: Automatic Skip Determination
• This is a hard problem; it depends on
  – Length of utterance,
  – Rate of corruption.
• With a known corruption rate and length of corruption
  – skipping a fixed number of frames may be the most intuitive choice.
• In general, these conditions are unknown
  – Ideally, we seek a method that requires no prior knowledge of the environment.

Improvements of FSVA: Automatic Skip Determination (cont.)
• Idea: Log Likelihood Ratio Thresholding (LLRT)
  – Stop the skipping process by testing the ratio of likelihoods,

$$\frac{\Psi_{K+1}(O \mid \tilde{Q}(K+1))}{\Psi_K(O \mid \tilde{Q}(K))}$$

• Why does it work?
  – In general, the robust likelihood increases with K,

$$\Psi_K(O \mid \tilde{Q}(K)) \le \Psi_{K+1}(O \mid \tilde{Q}(K+1))$$

  – because one more frame's contribution is removed from the criterion function.

Improvements of FSVA: Automatic Skip Determination (cont.)
• The improvement
  – A likelihood ratio,
  – Generally decreasing.
• It suggests we can stop skipping if the ratio > a certain threshold c.

Improvements of FSVA: Automatic Skip Determination (cont.)
• Can be done very efficiently
  – We can easily generate multiple solutions.
[Figure: non-skipping version and skipping version of the search; the point at which backtracking starts is marked]

Evaluation of LLRT in Gaussian Noise Replacement
• It works.
  – Undegraded in the clean condition.
  – Improved in the noisy condition.
  – A single value works for all conditions, e.g. c=90.

         BL      LLRT
Clean    98.90   98.98
Noisy    78.33   95.61

Discussion
• In LLRT, the threshold c
  – Effectively means the minimum likelihood of the clean frames.
  – Success of LLRT suggests
    • Skipping frames with likelihood smaller than c.
    • Simplified Frame-Skipping Viterbi Algorithm (SFSVA).
    • The update formula can be expressed as

$$\hat{b}(o_t) = \begin{cases} b(o_t), & \text{if } b(o_t) > c \\ c, & \text{else.} \end{cases}$$
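
The simplified algorithm can be sketched as an ordinary Viterbi search run on floored frame scores. The sketch assumes the threshold is applied as a floor on log observation likelihoods; the slides do not state the domain or sign convention of c (e.g. c=90), and the function names are hypothetical.

```python
import numpy as np

def viterbi_score(log_pi, log_a, log_b):
    """Best-path log score of a standard Viterbi search."""
    delta = log_pi + log_b[0]
    for t in range(1, log_b.shape[0]):
        delta = np.max(delta[:, None] + log_a, axis=0) + log_b[t]
    return float(np.max(delta))

def sfsva_score(log_pi, log_a, log_b, log_c):
    """Viterbi on frame scores floored at log_c (assumed log-domain threshold).

    A frame whose log-likelihood falls below log_c contributes the constant
    log_c instead, which caps the damage a corrupted frame can do.
    """
    floored = np.maximum(log_b, log_c)
    return viterbi_score(log_pi, log_a, floored)
```

Because only the frame scores change, no extra score table per skip count is needed, so the memory use is that of normal Viterbi.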

Simplified FSVA: Preliminary Evaluation
• At c=90

         BL      FSVA+LLRT   SFSVA
Clean    98.90   98.98       98.86
Noisy    78.33   95.61       95.61

• Performance comparable to FSVA+LLRT.

Evaluation II: Further Evidence

Evaluation II
• Previous experiment in Evaluation I
  – Fixed spectral content (Gaussian noise),
  – Fixed amplitude,
  – Fixed duration (1 frame),
  – Replacement noise,
  – Not general enough.
• Experiment 1: additive short-time noise
  – With varying spectral content, amplitude, duration and occurrence.
• Experiment 2: GSM environment (replacement noise)
  – Replacement with comfort noise,
  – Similar to speech in this case.

Experiment 1 (Setup)
• The training set is the same as in Evaluation I.
• Additive short-time noise.
  – Frames are picked at random from 7 types of noise, such as ring-tones and ICQ message sounds.
  – Controlled by 3 factors,
    • Amplitude (SNR),
    • Duration (L),
    • Rate of corruption (C).
• FSVA + LLRT is used in the evaluation.

Experiment 1 (Results)
• Changing amplitudes, C=20%, L=1

SNR (dB)     BL      LLRT (opt. c)
∞ (clean)    98.90   98.99 (102)
10           98.62   98.67 (106)
0            97.57   97.99 (102)
-10          84.04   91.89 (94)

Experiment 1 (Results) (cont.)
• Changing rate of corruption, SNR=-10dB, L=1

Rate    BL      LLRT (opt. c)
20%     84.04   91.89 (94)
30%     69.21   82.96 (94)
40%     56.39   71.15 (94)

Experiment 1 (Results) (cont.)
• Changing length of corruption, SNR=-10dB, C=20%

Length  BL      LLRT (opt. c)
1       84.04   91.89 (94)
2       87.33   91.99 (100)
3       90.24   94.07 (94)
4       93.30   96.22 (92)
5       95.17   97.09 (94)
6       95.69   97.31 (92)

Experiment 1 (Results) (cont.)
• Average performance.
• Outperforms the baseline over a wide range of c.
• For c in [90, 100]
  – Close to optimal performance.

Experiment 1 (Summary of Results)
• FSVA + LLRT works in all conditions,
  – Undegraded results for SNR > 0dB,
  – Outperforms the Viterbi algorithm in other cases.
• Is it necessary to use the optimal threshold?
  – No.
  – A large range of values of c outperforms the Viterbi algorithm.
  – A large range of values of c can be used such that,
    • The result is close to optimal,
    • Tuning is needed in a single condition only.
![Page 63: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/63.jpg)
Experiment 2 (Comfort Noise Generation)
• GSM codec (GSM 06.10)
  – Regular Pulse Excited, Long Term Prediction (RPE-LTP)
  – Linear predictive analysis and synthesis
• Residual coefficients are important
• Comfort Noise Generation (GSM 06.11)
  – 1st frame: replaced by the last good frame
  – 2nd to 16th frame: the magnitude of the residual coefficients of the 1st frame is decreased
  – Beyond the 16th frame: a predefined “silence” frame is substituted
• The generator cannot deal with frame loss of long duration.
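The three-stage substitution rule above can be sketched as follows. This is only an illustration of the rule as stated on the slide, not the GSM reference implementation: the function name `conceal_lost_frames`, the representation of a frame as a list of residual coefficients, the use of `None` for a lost frame, and the geometric `decay` factor are all assumptions made for the example.

```python
from typing import List, Optional

Frame = List[float]


def conceal_lost_frames(
    frames: List[Optional[Frame]],
    silence: Frame,
    decay: float = 0.8,   # hypothetical attenuation per lost frame
    max_muted: int = 16,  # after this many lost frames, substitute silence
) -> List[Frame]:
    """Replace lost frames (None) following the three-stage rule on the slide."""
    out: List[Frame] = []
    last_good: Optional[Frame] = None
    lost_run = 0  # number of consecutive lost frames seen so far
    for frame in frames:
        if frame is not None:
            last_good = list(frame)
            lost_run = 0
            out.append(list(frame))
            continue
        lost_run += 1
        if last_good is None or lost_run > max_muted:
            # Long loss (or nothing to repeat): predefined "silence" frame.
            out.append(list(silence))
        else:
            # 1st lost frame: gain 1 (plain repeat of the last good frame).
            # 2nd to 16th lost frame: progressively smaller magnitude.
            gain = decay ** (lost_run - 1)
            out.append([gain * x for x in last_good])
    return out
```

With a long run of lost frames the output decays towards silence and then stays there, which is why the recognizer sees little usable information when the loss lasts many frames.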
![Page 64: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/64.jpg)
Experiment 2 (Setup)
• Using the AURORA database.
  – Down-sampled version of TIDIGITS.
  – 8008 training utterances.
  – 4004 testing utterances.
• Baseline result
  – Trained on GSM-coded speech, tested on GSM-coded speech: 98.64% (< 98.90%)
![Page 65: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/65.jpg)
Experiment 2 (Frame Loss Condition)
• Experiment in noisy condition
  – 1%~2% of frames are corrupted.
  – All skip positions are known to the comfort noise generator.
  – Comfort noise generation is done before speech recognition.
  – Two factors are controlled:
    • Rate of corruption
    • Length of corruption
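One way to simulate such a frame-loss pattern with a controlled rate and length is sketched below. The slides do not say how the corrupted positions were drawn, so the uniform random placement, the slot-aligned bursts, and the names `corruption_mask`, `rate`, `length` and `seed` are all assumptions.

```python
import random
from typing import List


def corruption_mask(num_frames: int, rate: float, length: int, seed: int = 0) -> List[bool]:
    """Return a per-frame mask; True marks a frame treated as lost.

    rate   -- fraction of frames to corrupt (the "C" factor)
    length -- number of consecutive frames per corruption (the "D" factor)
    """
    if not 0.0 <= rate <= 1.0 or length < 1:
        raise ValueError("rate must be in [0, 1] and length must be >= 1")
    rng = random.Random(seed)
    mask = [False] * num_frames
    # Number of bursts needed to reach (approximately) the requested rate.
    n_bursts = round(rate * num_frames) // length
    # Candidate start positions on a grid of width `length`, so that
    # bursts never overlap (two bursts may still end up adjacent).
    starts = list(range(0, num_frames - length + 1, length))
    for start in rng.sample(starts, min(n_bursts, len(starts))):
        for t in range(start, start + length):
            mask[t] = True
    return mask
```

The mask can then be handed to the frame substitution stage, matching the assumption on this slide that all skip positions are known to the generator.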
![Page 66: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/66.jpg)
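Experiment 2 (Results)

| D (length) | C (rate) | BL | LLRT (opt. threshold c) |
|---|---|---|---|
| 1 | 1% | 98.03 | 98.12 (104) |
| 2 | 1% | 96.47 | 97.40 (104) |
| 3 | 1% | 96.10 | 97.19 (98) |
| 1 | 2% | 97.71 | 97.95 (106) |
| 1 | 5% | 96.31 | 97.20 (98) |
| 1 | 10% | 92.98 | 95.33 (98) |
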
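Because the accuracies are close to 100%, the gains are easier to read as relative error reductions. The short sketch below computes them from the BL and LLRT columns above, assuming both columns are accuracies in percent; the helper name `relative_error_reduction` is hypothetical.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Relative reduction of the error rate, in percent.

    Both arguments are accuracies in percent, so the error is 100 - accuracy.
    """
    base_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (base_err - new_err) / base_err


# (D, C, BL, LLRT) rows copied from the results table above.
rows = [
    (1, "1%", 98.03, 98.12),
    (2, "1%", 96.47, 97.40),
    (3, "1%", 96.10, 97.19),
    (1, "2%", 97.71, 97.95),
    (1, "5%", 96.31, 97.20),
    (1, "10%", 92.98, 95.33),
]

for d, c, bl, llrt in rows:
    rer = relative_error_reduction(bl, llrt)
    print(f"D={d} C={c}: relative error reduction {rer:.1f}%")
```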
![Page 67: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/67.jpg)
Experiment 2 (Average Performance)
![Page 68: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/68.jpg)
Experiment 2 (Summary of Results)
• Corruptions of 1 frame can be handled by the comfort noise generator.
• FSVA still has market value:
  – when the length of corruption > 1;
  – when the rate of corruption increases;
  – after all, there is no degradation even at D=1.
![Page 69: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/69.jpg)
Conclusion and Future Work
![Page 70: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/70.jpg)
Contribution of this work
• FSVA: Frame Skipping Viterbi Algorithm
  – found to be theoretically interesting
  – can be easily and efficiently implemented
  – good results in simulated noise
• The search technique can be applied to fast computation of the Extended Union Model (EUM).
• LLRT: Log Likelihood Ratio Thresholding
  – Automatically determines the number of skips for FSVA.
• Preliminary study of SFSVA (simplified FSVA)
  – Same amount of memory as the Viterbi algorithm
  – Improvement comparable to FSVA + LLRT
![Page 71: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/71.jpg)
Impact of this work
• HMMs have a wide range of applications in pattern recognition and digital communication.
  – FSVA can be used to deal with time-limited (or space-limited) corruption in these applications.
![Page 72: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/72.jpg)
Future Work
• Other possibilities implied by MFT.
  – Don’t ignore, but impute.
  – When should we ignore a frame? When should we impute it?
• Combination of FSVA and model-compensation techniques
  – Deal with general additive noise
• Automatic skip determination: any other combination schemes?
  – E.g. ROVER with confidence and voting?
• Evaluation with the comfort noise generators of other codecs.
  – E.g. Voice over IP (VoIP)
• Extend FSVA to other applications that use HMMs.
![Page 73: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/73.jpg)
Thanks for your patience!
![Page 74: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/74.jpg)
Q & A
![Page 75: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/75.jpg)
3, Have you tried your algorithm in Aurora?
• Yes! We tried it on AURORA II.
• But FSVA doesn’t work there because
  – Most of the noises are additive noise
    • E.g. street noise
    • E.g. babble noise
  – The database is designed for feature extraction
![Page 76: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/76.jpg)
4, It is hard to get a clean speech corpus. How do you solve this?
• Our paradigm assumes
  – Train on clean speech
  – Test on noisy speech
• A complementary method (not yet successful)
  – Train on noisy speech
  – Test on clean speech
  – Difficult because the multiple-mixture paradigm is hard to beat.
![Page 77: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/77.jpg)
5, Can we incorporate burst corruption in FSVA?
• It is possible but not elegant.
[Figure labels: Burst skip state, Skip state]
![Page 78: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/78.jpg)
6, Relation between Noise Composition?
• Not yet thoroughly understood
• Decompose FSHMM will result
![Page 79: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/79.jpg)
7, How about Null Node?
• This is a little bit tricky.
• A skip state is a real state.
• A null state cannot result in the skipping of a frame,
  – because no frame is consumed!
[Figure labels: states 1_0, 1_1, 2_0, 2_1, 3_0, 3_1 (skipping version); Null Nodes]
![Page 80: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/80.jpg)
8, Have you considered any real-life examples of additive noise?
• Yes!
• Not presented in the thesis or in this presentation.
• We have tested on the machine gun noise in NOISEX-92.
• Results: 0.7% absolute gain or no gain.
• Cause: the machine gun noise in NOISEX-92 corrupts all speech frames.
  – Better to regard it as additive.
  – The recording was made while the man was shooting continuously for several minutes. (Can this be real?)
  – A positive result was obtained when the additive noise component was removed.
  – Not reported because it may not be easily accepted by the community.
![Page 81: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/81.jpg)
10, Examples of extending this idea to other applications?
• Yes.
![Page 82: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/82.jpg)
11, How can this idea be used in convolutional coding?
![Page 83: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/83.jpg)
12, What is your plan for combining other techniques with FSVA?
![Page 84: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/84.jpg)
13, Do you really think short-time noise should always be shorter than speech?
• There is an intrinsic difficulty in defining short-time noise.
• Dictionaries of technology always characterize short-time noise as having
  – random spectral content,
  – random amplitude,
  – random occurrence.
• No characterization in terms of length.
• The length of speech may be the basic norm for the length of noise.
![Page 85: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/85.jpg)
14, How do you compare this with other similar techniques?
• As we have mentioned,
  – there is another technique called EUM search.
![Page 86: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/86.jpg)
15, Actually, what makes FSVA work?

• Sorry!

• This is a problem we do not thoroughly understand.

• We obtained some strange results.
• Hypothesis: partially corrupted frames.
![Page 87: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/87.jpg)
17, Why do you keep the transition probability in your formulation?
• In theory, we can also ignore the transition contribution. However,
  – changing the transitions means breaking the word apart;
  – it would be disastrous if a phone were deleted or distorted.
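The point about keeping the transition term can be illustrated with a small log-domain Viterbi recursion that carries a skip count. This is a minimal sketch under assumptions, not the thesis implementation: a skipped frame drops only its emission score while the transition score is still paid, and the function name `frame_skipping_viterbi`, its argument layout, and the "exactly k skips" bookkeeping are chosen for the example.

```python
from typing import List

NEG_INF = float("-inf")


def frame_skipping_viterbi(
    log_init: List[float],         # log_init[j]: log prior of state j
    log_trans: List[List[float]],  # log_trans[i][j]: log P(j | i)
    log_emit: List[List[float]],   # log_emit[t][j]: log b_j(o_t)
    max_skips: int,
) -> List[float]:
    """Return, for each k in 0..max_skips, the best path log score
    when the emission scores of exactly k frames are ignored."""
    n_states = len(log_init)
    # delta[k][j]: best score of a path ending in state j with k skipped frames.
    delta = [[NEG_INF] * n_states for _ in range(max_skips + 1)]
    for j in range(n_states):
        delta[0][j] = log_init[j] + log_emit[0][j]
        if max_skips >= 1:
            delta[1][j] = log_init[j]  # first frame skipped: no emission term
    for t in range(1, len(log_emit)):
        new = [[NEG_INF] * n_states for _ in range(max_skips + 1)]
        for k in range(max_skips + 1):
            for j in range(n_states):
                # Frame t is used: transition plus emission.
                keep = max(delta[k][i] + log_trans[i][j] for i in range(n_states))
                score = keep + log_emit[t][j]
                if k >= 1:
                    # Frame t is skipped: the transition is still applied.
                    skip = max(delta[k - 1][i] + log_trans[i][j] for i in range(n_states))
                    score = max(score, skip)
                new[k][j] = score
        delta = new
    return [max(row) for row in delta]
```

With `max_skips=0` the recursion reduces to the ordinary Viterbi score. Since every frame still pays a transition, the state sequence keeps the same length as the observation sequence, so the word model is not broken apart by a skip.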
![Page 88: Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and.](https://reader031.fdocuments.us/reader031/viewer/2022032800/56649d375503460f94a0f96d/html5/thumbnails/88.jpg)
Topological Expansion of SFSHMM