Advances in WP1
Nancy Meeting – 6-7 July 2006
www.loquendo.com
WP1: Environment & Sensor Robustness
T1.2 Noise Independence
Noise Reduction:
– Spectral Subtraction (Year 1) and Spectral Attenuation (Year 2)
“Automatic Speech Recognition With a Modified Ephraim-Malah Rule”, Roberto Gemello, Franco Mana and Renato De Mori, IEEE Signal Processing Letters, vol. 13, no. 1, January 2006
– Evaluation of HEQ for feature normalization (HEQ study + Revision 2)
Denoising Techniques for Y2 evaluations (1)
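Spectral attenuation (or spectral weighting) is a form of audio signal enhancement in which noise suppression is viewed as the application of a suppression rule, i.e. a non-negative real-valued gain $G_k$, to each bin $k$ of the observed signal magnitude spectrum, in order to form an estimate of the original signal magnitude spectrum:

$$\hat{X}_k = G_k \, Y_k$$

Ephraim–Malah MMSE log estimator rule:

$$G_k = \frac{\xi_k}{1+\xi_k} \exp\left( \frac{1}{2} \int_{v_k}^{\infty} \frac{e^{-t}}{t} \, dt \right), \qquad v_k = \frac{\xi_k}{1+\xi_k} \, \gamma_k$$

A priori SNR and its decision-directed estimate at frame $m$:

$$\xi_k = \frac{|X_k|^2}{|D_k|^2}, \qquad \hat{\xi}_k(m) = \eta \, \frac{|\hat{X}_k(m-1)|^2}{|\hat{D}_k(m-1)|^2} + (1-\eta) \max\left[0, \gamma_k(m) - 1\right], \quad \eta \in [0, 1)$$

A posteriori SNR and its estimate:

$$\gamma_k = \frac{|Y_k|^2}{|D_k|^2}, \qquad \hat{\gamma}_k = \frac{|Y_k|^2}{|\hat{D}_k|^2}$$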
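The gain rule and the decision-directed a priori SNR estimate on this slide can be illustrated with a minimal Python sketch. It is not the Loquendo implementation: the function names, the pure-Python exponential integral `exp1`, and the small floor on `v` are assumptions made for illustration, and the smoothing weight `eta` must be supplied by the caller because the slide gives no value for it.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp1(x: float) -> float:
    """Exponential integral E1(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("exp1 is only defined here for x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{n>=1} (-x)^n / (n * n!)
        total, term = 0.0, 1.0
        for n in range(1, 60):
            term *= -x / n           # term now equals (-x)^n / n!
            total -= term / n
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction evaluated with the modified Lentz method (x > 1)
    b = x + 1.0
    c = 1.0e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)

def em_log_gain(xi: float, gamma: float) -> float:
    """Log-MMSE gain G = xi/(1+xi) * exp(0.5 * E1(v)), with v = xi/(1+xi) * gamma."""
    ratio = xi / (1.0 + xi)
    v = max(ratio * gamma, 1e-12)    # assumed floor, avoids E1(0)
    return ratio * math.exp(0.5 * exp1(v))

def decision_directed_xi(x_prev_mag: float, d_prev_mag: float,
                         gamma: float, eta: float) -> float:
    """A priori SNR estimate from the previous clean estimate and the current a posteriori SNR."""
    return (eta * x_prev_mag ** 2 / d_prev_mag ** 2
            + (1.0 - eta) * max(0.0, gamma - 1.0))

# Example for one bin; eta = 0.98 is a hypothetical value, not taken from the slides
xi = decision_directed_xi(x_prev_mag=2.0, d_prev_mag=1.0, gamma=5.0, eta=0.98)
print(em_log_gain(xi, 5.0))
```

The gain approaches 1 when the a priori SNR is high and falls towards 0 when it is low, which is the behaviour expected from a suppression rule.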
Denoising Techniques for Y2 evaluations (2)
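Modified Ephraim–Malah MMSE log estimator rule:

We propose to make the estimation of the a priori and the a posteriori SNR dependent on the noise overestimation factor $\alpha(m)$ and the spectral floor $\beta(m)$, as follows:

$$\tilde{\xi}_k(m) = \max\left[ \eta(m) \, \frac{|\hat{X}_k(m-1)|^2}{\alpha(m) \, |\hat{D}_k(m-1)|^2} + \big(1-\eta(m)\big) \big(\tilde{\gamma}_k(m) - 1\big), \; \beta(m) \right]$$

$$\tilde{\gamma}_k(m) = \max\left[ \frac{|Y_k(m)|^2}{\alpha(m) \, |\hat{D}_k(m)|^2}, \; \beta(m) + 1 \right]$$

These estimates replace $\xi_k = |X_k|^2 / |D_k|^2$ and $\gamma_k = |Y_k|^2 / |D_k|^2$ in the gain of the original rule:

$$\tilde{G}_k(m) = G_k\big(\tilde{\xi}_k(m), \tilde{\gamma}_k(m)\big)$$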
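[Figure: two plots of the SNR-dependent parameters against SNR(m) in dB. Extracted axis values, first plot: 1.5 and 0.001, with SNR ticks at 0, 10 and 20 dB; second plot: 1.0 and 0.01, with SNR ticks at 0, 15 and 20 dB.]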
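As a rough illustration of how the overestimation factor and the spectral floor enter the two SNR estimates, the sketch below assumes that the noise power is scaled by the overestimation factor (here `alpha`) and that the spectral floor (here `beta`) acts as a lower bound, as in the formulation of the cited paper. The function and argument names are hypothetical, and the SNR-dependent schedules for `alpha` and `beta` are left to the caller rather than guessed from the plots.

```python
def modified_posterior_snr(y_mag: float, d_mag: float,
                           alpha: float, beta: float) -> float:
    """A posteriori SNR with overestimated noise power, bounded below by beta + 1."""
    return max(y_mag ** 2 / (alpha * d_mag ** 2), beta + 1.0)

def modified_prior_snr(x_prev_mag: float, d_prev_mag: float, gamma_tilde: float,
                       alpha: float, beta: float, eta: float) -> float:
    """Decision-directed a priori SNR with overestimated noise power, bounded below by beta."""
    decision_directed = (eta * x_prev_mag ** 2 / (alpha * d_prev_mag ** 2)
                         + (1.0 - eta) * (gamma_tilde - 1.0))
    return max(decision_directed, beta)

# Example: a low-energy bin is clamped to the floor instead of going to zero.
# alpha = 1.5, beta = 0.01 and eta = 0.9 are illustrative values only.
g = modified_posterior_snr(y_mag=0.5, d_mag=1.0, alpha=1.5, beta=0.01)
x = modified_prior_snr(x_prev_mag=0.0, d_prev_mag=1.0, gamma_tilde=g,
                       alpha=1.5, beta=0.01, eta=0.9)
print(g, x)
```

Because the a posteriori estimate never drops below `beta + 1`, the term `gamma_tilde - 1` stays non-negative, so the explicit `max(0, ...)` of the original rule is no longer needed in this form.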
Denoising Techniques for Y2 evaluations (3)
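The noise spectrum amplitude is obtained by a first-order recursion in conjunction with an energy-based Voice Activity Detector (VAD), as follows:

$$|\hat{D}_k(m)| = \begin{cases} \lambda \, |\hat{D}_k(m-1)| + (1-\lambda) \, |Y_k(m)| & \text{if } |Y_k(m)| - |\hat{D}_k(m-1)| < \kappa \, \sigma(m) \ \wedge \ \mathrm{VAD} = \text{false} \\ |\hat{D}_k(m-1)| & \text{otherwise} \end{cases}$$

where $\lambda$ controls the update speed of the recursion (0.9), $\kappa$ controls the allowed dynamics of the noise (4.0), and the noise standard deviation $\sigma(m)$ is estimated as:

$$\sigma^2(m) = \lambda \, \sigma^2(m-1) + (1-\lambda) \, \big(|Y_k(m)| - |\hat{D}_k(m)|\big)^2$$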
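The recursion can be sketched for a single frequency bin as follows. This is an illustrative reading rather than the actual front-end code: the names `lam` and `kappa` are hypothetical labels for the update-speed (0.9) and noise-dynamics (4.0) constants, the test is assumed to compare the current magnitude with the previous noise estimate, and the VAD decision is passed in as a boolean.

```python
def update_noise(d_prev: float, y_mag: float, sigma: float, vad_speech: bool,
                 lam: float = 0.9, kappa: float = 4.0) -> float:
    """Return the new noise magnitude estimate for one bin.

    The estimate is updated only when the VAD reports no speech and the
    observed magnitude does not exceed the previous estimate by kappa * sigma.
    """
    if (y_mag - d_prev) < kappa * sigma and not vad_speech:
        return lam * d_prev + (1.0 - lam) * y_mag
    return d_prev

def update_sigma2(sigma2_prev: float, y_mag: float, d_mag: float,
                  lam: float = 0.9) -> float:
    """First-order recursion for the noise variance estimate."""
    return lam * sigma2_prev + (1.0 - lam) * (y_mag - d_mag) ** 2

# Example: a non-speech frame slightly above the noise floor updates the estimate,
# while a large jump or a speech frame leaves it unchanged.
print(update_noise(1.0, 2.0, 1.0, vad_speech=False))
print(update_noise(1.0, 10.0, 1.0, vad_speech=False))
print(update_noise(1.0, 2.0, 1.0, vad_speech=True))
```

Freezing the estimate during speech and during large excursions keeps speech energy from leaking into the noise model.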
Baseline evaluations of Loquendo ASR on Aurora2 speech databases
Year 1+2 Performance evaluations
Error rates (%), with the relative error reduction with respect to ND (%) in parentheses:

           Test A                     Test B                     Test C                     A-B-C Avg
Models     Clean        Multi         Clean        Multi         Clean        Multi         Clean        Multi
ND         24.4         6.5           22.5         8.9           24.7         9.8           23.7         8.1
WM         16.0 (34.4)  6.1 (6.1)     15.6 (30.7)  7.9 (11.2)    16.7 (32.4)  9.5 (3.0)     16.0 (32.5)  7.5 (7.4)
EMM        14.7 (39.7)  6.0 (7.7)     15.8 (29.8)  8.0 (10.1)    15.2 (38.5)  8.9 (9.2)     15.2 (35.9)  7.4 (8.6)

The testing conditions used in the experiments are the following:
1) No Denoising (ND): Rasta PLP features (RPLP) are used without any preliminary noise reduction.
2) Wiener modified (WM): RPLP with Wiener filtering dependent on global SNR.
3) Ephraim-Malah modified (EMM): RPLP with noise reduction based on the modified Ephraim-Malah spectral attenuation rule.
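The values in parentheses in the results table appear consistent with a relative error reduction with respect to the ND condition. The short sketch below shows that arithmetic; the function name is hypothetical, and small differences from the tabulated figures are to be expected because the displayed error rates are already rounded.

```python
def relative_reduction(baseline_err: float, system_err: float) -> float:
    """Relative error reduction in percent with respect to a baseline error rate."""
    return 100.0 * (baseline_err - system_err) / baseline_err

# Test A, clean models: ND = 24.4, WM = 16.0 (the table reports 34.4)
print(round(relative_reduction(24.4, 16.0), 1))
```

A negative result means that the error rate increased relative to the baseline.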
Baseline evaluations of Loquendo ASR on Aurora3 speech databases
Year 1+2 Performance evaluations
In the column headers, WM and HM denote the Aurora3 well-matched and high-mismatch conditions; in the row labels, WM denotes the Wiener modified condition.

        Ita WM       Ita HM        Spa WM       Spa HM
ND      1.8          53.4          2.7          25.4
WM      1.7 (5.5)    22.5 (57.9)   2.4 (11.1)   10.1 (60.2)
EMM     1.6 (11.1)   17.8 (66.7)   2.3 (14.8)   11.5 (54.7)
Baseline evaluations of Loquendo ASR on Aurora4 speech databases
Year 1+2 Performance evaluations
CLEAN Models

        CLEAN        Car           Babble        Restaurant    Street        Airport       Train Station   Noise avg.
ND      14.8         45.7          76.9          70.6          66.0          70.7          67.7            66.3
WM      14.8 (0.0)   33.0 (27.8)   63.4 (17.5)   69.3 (1.8)    56.9 (13.8)   68.1 (3.7)    51.2 (24.4)     57.0 (14.0)
EMM     14.5 (2.0)   29.6 (35.2)   62.9 (18.2)   68.4 (3.1)    54.2 (17.8)   68.4 (3.2)    46.3 (31.6)     55.0 (17.0)
Year 1+2 Performance evaluations
MULTI Models

        CLEAN         Car          Babble        Restaurant    Street       Airport       Train Station   Noise avg.
ND      15.7          24.8         40.1          41.8          41.9         39.1          42.3            38.3
WM      16.6 (-5.7)   24.1 (2.8)   39.7 (1.0)    43.2 (-3.3)   39.6 (5.5)   39.5 (-1.0)   37.1 (12.3)     37.2 (2.9)
EMM     15.5 (1.3)    24.7 (0.4)   40.4 (-0.7)   44.2 (-5.7)   39.5 (5.7)   40.4 (-3.3)   38.2 (9.7)      37.9 (1.0)
HEQ + Denoising techniques
HEQ Evaluation: Revision 1 (1) (Loquendo & UGR)
HEQ (121)
E + 12 CEP
DE + 12 DEP
DDE + 12 DDEP
(39 coefficients)
Problems:
(1) Context dependency (whole-utterance CDF estimation works best)
(2) High variability in background noise segments
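As an illustration of histogram equalization with whole-utterance CDF estimation, the sketch below maps the track of one coefficient onto a reference distribution through its empirical CDF. It is not the UGR implementation: the standard normal reference, the rank-based CDF estimate, and the function name are assumptions made for this example.

```python
from statistics import NormalDist

def heq_utterance(values: list[float]) -> list[float]:
    """Equalize one coefficient track over a whole utterance.

    Each value is replaced by the reference quantile of its empirical CDF,
    estimated from the ranks of all values in the utterance.
    """
    n = len(values)
    reference = NormalDist()                       # assumed reference: N(0, 1)
    order = sorted(range(n), key=lambda i: values[i])
    equalized = [0.0] * n
    for rank, index in enumerate(order):
        cdf = (rank + 0.5) / n                     # stays strictly inside (0, 1)
        equalized[index] = reference.inv_cdf(cdf)
    return equalized

# Example: three frames of a single coefficient
print(heq_utterance([3.0, 1.0, 2.0]))
```

With a short window instead of the whole utterance, the ranks are computed from fewer frames, which makes the CDF estimate noisier and more dependent on the local context.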
HEQ Integration: Revision 1 (2) (Loquendo & UGR)
Loquendo FE: Denoise (power spectrum level)
UGR HEQ: Feature Normalization (frame level, 39 coefficients)
Loquendo ASR: Phoneme-based Models

AURORA3 ITA - HM
            SA       WA       WI      WD       WS
Loquendo    46.6%    77.5%    4.8%    7.2%     10.4%
+HEQ121     38.2%    69.6%    4.3%    12.6%    13.5%
HEQ121      37.9%    69.1%    3.5%    13.8%    13.5%
+HEQ1001    46.5%    77.7%    4.0%    7.3%     11.0%
HEQ Evaluation: Revision 2 (3) (Loquendo & UGR)
E + 12 CEP: HEQ (1573)
DE + 12 DEP: HEQ (1573)
DDE + 12 DDEP: HEQ (1573)
(39 coefficients)
Benefits:
(1) Relations in magnitude and dynamics among coefficients are preserved
(2) More stable CDF estimation, similar to extending the HEQ temporal window
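One possible reading of the two benefits, offered only as an assumption, is that each group of 13 coefficients shares a single CDF estimated from all of its values in the window (121 frames of 13 coefficients give 121 × 13 = 1573 samples). The sketch below illustrates that pooled variant with a standard normal reference; the function name and the reference distribution are hypothetical choices, not details taken from the slides.

```python
from bisect import bisect_right
from statistics import NormalDist

def heq_pooled(frames: list[list[float]]) -> list[list[float]]:
    """Equalize a group of coefficients with one CDF pooled over frames and coefficients."""
    pooled = sorted(value for frame in frames for value in frame)
    n = len(pooled)
    reference = NormalDist()                        # assumed reference: N(0, 1)

    def transform(value: float) -> float:
        # Empirical CDF of the pooled samples, kept strictly inside (0, 1)
        cdf = (bisect_right(pooled, value) - 0.5) / n
        return reference.inv_cdf(cdf)

    return [[transform(value) for value in frame] for frame in frames]

# Example: two frames with two coefficients each
print(heq_pooled([[1.0, 10.0], [2.0, 20.0]]))
```

Because a single monotonic mapping is applied to every coefficient in the group, their relative ordering is kept, and the CDF is estimated from 13 times more samples than with one CDF per coefficient.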
HEQ Evaluation: Revision 2 (4) (Loquendo & UGR)
AURORA3 ITA - HM
             SA       WA       WI      WD      WS
WM           46.6%    77.5%    4.8%    7.2%    10.4%
HEQ121       47.9%    77.7%    5.1%    6.7%    10.5%
HEQ241       49.7%    79.7%    4.3%    6.6%    9.3%
WM+HEQ121    49.0%    79.2%    5.1%    5.7%    10.0%
WM+HEQ241    50.8%    79.8%    4.6%    6.1%    9.4%
HEQ for denoising (5) (Loquendo & UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same clean and noisy signal
HEQ for signal level equalization (6) (Loquendo & UGR)
Comparing RPLP / HEQrev1 / HEQrev2 using the same clean signal at normal gain level and at low gain level
WP1: Workplan
• Selection of suitable benchmark databases (m6)
• Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR dependent) (m12)
• Discriminative VAD (training + AURORA3 testing) (m16)
• Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR dependent) (m21)
• Preliminary results on spectral subtraction and HEQ techniques (m24)
• Integration of denoising and normalization techniques (m33)
• Noise estimation and reduction for non-stationary noises (m33)