Communications & Multimedia Signal Processing Meeting 6 Esfandiar Zavarehei Department of Electronic...
Communications & Multimedia Signal Processing
Meeting 6
Esfandiar Zavarehei
Department of Electronic and Computer Engineering
Brunel University
6 July, 2005
Contents
• Review of Noise Reduction Methods (more Results)
– Review of the methods
– DFT-Kalman, a new method for parameter estimation
– Evaluation results and sample speech signals
• FTLP-HNM Model
– FTLP-HNM for gap restoration
• Noise Station
– An interface for the programs
Review of Noise Reduction Methods
• Most noise reduction systems fit this block diagram
• The de-noising method is based on:
– Spectral subtraction, or
– Bayesian estimation
[Block diagram: blocks FFT Analysis, Noise Estimation, SNR Estimation, De-noising Method, Soft Decision, unit delay (z^-1) and Overlap-Add; signal labels Noisy Speech, Noisy Phase and Enhanced Speech]
Spectral Subtraction
• The spectral subtraction method is generally formulated as:
$$S_k = X_k \max\left(A_k - B_k \left(\frac{N_k}{X_k}\right)^{\alpha},\; T_k\right)$$
for $\alpha = 1$:
$$S_k = \max\left(A_k X_k - B_k N_k,\; T_k\right)$$
• where S, X and N are the speech, noisy speech and noise spectral amplitudes, k is the frequency index, α is the power exponent, A and B are the attenuation and subtraction coefficients respectively, and T is the dynamic threshold
• Spectral subtraction methods differ in how A and B are estimated
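As an illustration of the α = 1 rule above, here is a minimal sketch. It assumes constant scalar A, B and T shared by all bins (the simplest case), and the function names and sample amplitudes are made up for the example; they are not taken from the slides.

```python
def spectral_subtract(noisy_amp, noise_amp, A=1.0, B=1.0, T=0.0):
    """Apply S_k = max(A*X_k - B*N_k, T) to every frequency bin (alpha = 1)."""
    return [max(A * x - B * n, T) for x, n in zip(noisy_amp, noise_amp)]


def negative_fraction(noisy_amp, noise_amp, A=1.0, B=1.0):
    """Fraction of bins where A*X_k - B*N_k is negative before thresholding."""
    diffs = [A * x - B * n for x, n in zip(noisy_amp, noise_amp)]
    return sum(d < 0 for d in diffs) / len(diffs)


# Hypothetical spectral amplitudes for one frame (four bins).
X = [4.0, 1.0, 2.5, 0.5]   # noisy speech amplitudes
N = [1.0, 2.0, 1.0, 1.0]   # estimated noise amplitudes

enhanced = spectral_subtract(X, N)
share_negative = negative_fraction(X, N)
```

The `max` with the threshold is what keeps bins with a negative difference from producing a negative amplitude estimate.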
Spectral Subtraction
• Simple SS: constant A and B (e.g. A=1, B=1, T=0, α=1 or 2)
• Adaptive spectral subtraction:
– Using the a posteriori SNR (uses only the speech information in the current frame)
– Using the a priori SNR (tracks the fluctuations of speech in successive frames)
– Using both a posteriori and a priori SNRs (e.g. optimized to give the MMSE)
• Different algorithms are used for calculation of the threshold
[Figure: percentage of negative STSA values (%) versus SNR (dB) for car noise, train noise and white noise]
• The number of negative values resulting from spectral subtraction could be large and depends on the noise spectrum and SNR
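The two SNR quantities mentioned on this slide can be sketched as follows. The a priori estimate uses the decision-directed rule of Ephraim and Malah; that this is the estimator intended here is an assumption, and the function names and numbers are illustrative only.

```python
def a_posteriori_snr(noisy_power, noise_power):
    """A posteriori SNR: uses only the current frame's noisy power."""
    return noisy_power / noise_power


def a_priori_snr_dd(prev_clean_power, noise_power, post_snr, alpha=0.99):
    """Decision-directed a priori SNR.

    Mixes the previous frame's clean-speech power estimate with the
    current frame's (a posteriori SNR - 1), floored at zero.
    """
    return (alpha * prev_clean_power / noise_power
            + (1.0 - alpha) * max(post_snr - 1.0, 0.0))


# One frequency bin, hypothetical values.
gamma = a_posteriori_snr(noisy_power=4.0, noise_power=1.0)
xi = a_priori_snr_dd(prev_clean_power=2.0, noise_power=1.0,
                     post_snr=gamma, alpha=0.5)
```

With `alpha` close to 1 the estimate leans on the previous frame, which is how the a priori SNR follows speech across successive frames instead of reacting to each frame in isolation.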
Bayesian Estimation
• Frames are independent:
– Estimation of ST-DFT components (real and imaginary)
  • Gaussian-Gaussian (Wiener)
  • Other distributions for speech and noise (various estimators by Martin)
– Estimation of the amplitude, using the noisy phase
  • Amplitude, log-amplitude, power (different parameters to be estimated)
  • Gaussian, Gaussian mixtures (needs training), Laplacian (computationally not feasible)
  • Criteria: MMSE, MAP, joint phase and amplitude MAP, etc.
– Methods for parameter estimation use inter-frame information
• Frames are not independent:
– DFT-Kalman
Bayesian Estimation
• Wiener: speech always suppressed
• Distributions vary from phoneme to phoneme and frequency to frequency
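The first bullet can be seen from the standard Wiener gain, G = ξ / (1 + ξ), where ξ is the a priori SNR. The sketch below is an illustration only; the helper names and the chosen SNR values are assumptions.

```python
def wiener_gain(xi):
    """Wiener gain for a priori SNR xi (linear scale)."""
    return xi / (1.0 + xi)


def db_to_linear(db):
    """Convert a power ratio in dB to linear scale."""
    return 10.0 ** (db / 10.0)


# Gain at a few a priori SNRs (dB); every value lies strictly between 0 and 1.
gains = {snr_db: wiener_gain(db_to_linear(snr_db)) for snr_db in (-16, 0, 16)}
```

Because the gain is below 1 for any finite SNR, the noisy component is always attenuated, even in bins dominated by speech.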
[Figure: gains G_k^EM and G_k^w (dB) plotted over two axes in dB, each from -16 to 16]
[Figure: histogram with fitted densities; Gaussian SKLD = 0.28, Laplacian SKLD = 0.24, Gamma SKLD = 0.36]
[Figure: average symmetric Kullback-Leibler distance; bar values 0.81, 0.62, 0.56 for Gaussian, Laplacian, Gamma]
DFT-Kalman
• Incorporate the AR model of the short-time DFT trajectories for estimation
• Gaussian distribution
• Noise in each ST-DFT channel is assumed to be WGN
[Figure: averaged correlation coefficient versus time lag (×5 ms) for car noise, train noise, white noise and speech]
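$$\mathbf{F}_r(n) = \begin{bmatrix} a_1(n) & a_2(n) & \cdots & a_{N-1}(n) & a_N(n) \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$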
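A minimal sketch of Kalman filtering along one real-valued ST-DFT trajectory, using a state-transition matrix in companion form with the AR coefficients in its first row. It assumes fixed AR coefficients, a scalar excitation variance `q`, a white-noise variance `r`, and an identity initial covariance; these simplifications and all names are assumptions made for the example, not details given on the slides.

```python
def companion(a):
    """State-transition matrix F with AR coefficients a_1..a_N in the first row."""
    n = len(a)
    rows = [list(a)]
    for i in range(n - 1):
        rows.append([1.0 if j == i else 0.0 for j in range(n)])
    return rows


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kalman_track(observations, a, q, r):
    """Filter noisy samples of one trajectory; observation picks the first state."""
    n = len(a)
    F = companion(a)
    Ft = [list(col) for col in zip(*F)]
    x = [[0.0] for _ in range(n)]                      # state estimate
    P = [[1.0 if i == j else 0.0 for j in range(n)]    # error covariance
         for i in range(n)]
    estimates = []
    for y in observations:
        # Prediction: x <- F x, P <- F P F^T + Q (excitation drives state 0 only).
        x = matmul(F, x)
        P = matmul(matmul(F, P), Ft)
        P[0][0] += q
        # Update with H = [1, 0, ..., 0].
        s = P[0][0] + r
        K = [P[i][0] / s for i in range(n)]
        innovation = y - x[0][0]
        x = [[x[i][0] + K[i] * innovation] for i in range(n)]
        P = [[P[i][j] - K[i] * P[0][j] for j in range(n)] for i in range(n)]
        estimates.append(x[0][0])
    return estimates


noisy = [1.0, -2.0, 0.5]
tracked = kalman_track(noisy, a=[0.5, -0.1], q=1.0, r=0.5)
```

The same routine would be run separately on the real and imaginary parts of each frequency channel.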
DFT-Kalman
• During noise-only periods the output converges to zero, making the whole output zero. To avoid too-small values of the LP error covariance, Q, during speech-active periods:
$$Q = \max\left(Q,\; m\,|X(k)|^2\right), \qquad (0.05)^2 < m < (0.30)^2$$
• Smaller values of m give further reduction of the background noise but more distortion of the speech signal.
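The floor on Q can be written as a one-line helper. The default `m = 0.1 ** 2` is only one value inside the stated range, picked for illustration; the function name is hypothetical.

```python
def floor_excitation_variance(q, noisy_dft_value, m=0.1 ** 2):
    """Return max(Q, m * |X(k)|^2) so Q cannot collapse towards zero."""
    return max(q, m * abs(noisy_dft_value) ** 2)


# A very small Q is lifted to m * |X|^2; a large Q is left unchanged.
lifted = floor_excitation_variance(1e-6, 3 + 4j, m=0.01)
unchanged = floor_excitation_variance(2.0, 1.0)
```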
[Figure: waveform, amplitude versus samples]
DFT-Kalman
Another method is based on spectral subtraction of the ST-DFT Trajectories. An autocorrelation vector is obtained using spectral subtraction at the start of the speech after long noise-only periods:
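$$\hat{\boldsymbol{\Phi}}_S(n) = \max\left(\boldsymbol{\Phi}_X(n) - \boldsymbol{\Phi}_D,\; 0\right)$$
$$\boldsymbol{\Phi}_X(n) = \mathcal{F}\left\{ E\left[ X_r(n)\,\left[X_r(n), \ldots, X_r(n-L)\right]^T \right] \right\}$$
$$\mathbf{r}_{ss}(n) = \mathcal{F}^{-1}\left\{ \hat{\boldsymbol{\Phi}}_S(n) \right\}$$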
Where L+1 is the number of samples used in calculation of the autocorrelation vector and Xr(n) is the real component of the ST-DFT trajectories at frame n and an arbitrary frequency. Similar equations hold for the imaginary components.
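The three steps (transform the autocorrelation, subtract the noise spectrum with a floor at zero, transform back) can be sketched as below. Several details are assumptions made so that the example is self-contained: the expectation is replaced by a given autocorrelation vector, the vector is extended symmetrically so that its DFT is real, and a naive O(n²) DFT is used. None of these choices is stated on the slides.

```python
import cmath


def dft(seq, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a short sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(seq[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def subtract_noise_from_autocorr(r_noisy, noise_psd):
    """r_noisy holds lags 0..L (L >= 1); noise_psd holds 2L spectral values.

    Returns the spectral-subtraction-based autocorrelation for lags 0..L.
    """
    L = len(r_noisy) - 1
    # Symmetric circular extension: r0, r1, ..., rL, r(L-1), ..., r1.
    sym = list(r_noisy) + list(r_noisy[-2:0:-1])
    phi_x = [v.real for v in dft(sym)]
    phi_s = [max(px - pd, 0.0) for px, pd in zip(phi_x, noise_psd)]
    r_sym = dft(phi_s, inverse=True)
    return [v.real for v in r_sym[:L + 1]]


# Hypothetical noisy autocorrelation and a flat (white) noise spectrum.
r_ss = subtract_noise_from_autocorr([1.0, 0.5, 0.25], [0.2] * 4)
```

With a flat noise spectrum and no bin hitting the floor, only the zero-lag term changes, which matches the intuition that white noise adds power at lag 0 only.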
DFT-Kalman
This autocorrelation is linearly combined with the autocorrelation estimated from the previously estimated samples:
$$\hat{\mathbf{r}}(n) = \frac{L - n + n_1}{L}\,\mathbf{r}_{ss}(n) + \frac{n - n_1}{L}\,\hat{\mathbf{r}}_{sr}(n-1), \qquad \text{for } n_1 \le n \le n_1 + L$$
where $n_1$ is the frame index of the first speech segment detected.
Regardless of the presence of speech, if the variance of the excitation of the AR model is lower than a fixed threshold, a weighted average of the spectral subtraction-based autocorrelation and the autocorrelation of the previous estimates of the ST-DFT trajectories is used:
$$\text{if } Q < m\,|X(n)|^2: \qquad \hat{\mathbf{r}}(n) = \beta\,\mathbf{r}_{ss}(n) + (1 - \beta)\,\hat{\mathbf{r}}_{sr}(n-1)$$
Evaluation of the methods
• The correlation coefficient between each distortion measure and the mean opinion score (MOS) is calculated over 90 sentences (noisy, clean and de-noised; number of listeners: 10)
• PESQ has the highest correlation with the MOS results
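The comparison rests on a correlation coefficient between each objective measure and the MOS. A minimal sketch, assuming the Pearson coefficient is meant; the scores below are invented for the example and are not the experiment's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented per-sentence scores: an objective measure against listener MOS.
objective = [1.8, 2.4, 2.9, 3.3, 3.6]
mos = [1.5, 2.5, 2.7, 3.5, 3.4]
rho = pearson(objective, mos)
```

A measure for which lower values mean better quality would show a negative coefficient, so the magnitude is what indicates agreement with the listeners.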
[Figure: bar chart of correlation coefficients with MOS, one bar per distortion measure; values shown: 0.86, -0.69, -0.61, -0.45, 0.24, 0.07]
PESQ – Car Noise
[Figure: PESQ score by method for car noise at -5 dB, 0 dB, 5 dB and 10 dB]
SASS: Simple Amplitude SS
BPSS: a posteriori Power SS
MBSS: Multiband SS
SSAPR: a priori Amplitude SS
PSS: Parametric SS
MMSE STSA: Ephraim’s Amplitude Estimator
MMSE LSA: Ephraim’s Log-Amplitude Estimator
GGDFT: Martin’s Gamma-Gamma DFT Estimator
PESQ – Train Noise
[Figure: PESQ score by method for train noise at -5 dB, 0 dB, 5 dB and 10 dB]
Mean Opinion Score – Car Noise
[Figure: Noise Level Car (MOS); noise level scores by method for Natural Noise and Annoying Noise]
[Figure: Speech Quality Car (MOS); scores by method for Speech Naturalness and Overall Preference]
Mean Opinion Score – Train Noise
[Figure: Speech Quality Train (MOS); scores by method for Speech Naturalness and Overall Preference]
[Figure: Noise Level Train (MOS); noise level scores by method for Natural Noise and Annoying Noise]
Sample Speech Signals
• Car Noise
• Noisy
• SASS
• BPSS
• MBSS
• SSAPR
• PSS
• Wiener
• MMSE STSA
• MMSE LSA
• GGDFT
• DFTK
• DFTSS
• Train Noise
• Noisy
• SASS
• BPSS
• MBSS
• SSAPR
• PSS
• Wiener
• MMSE STSA
• MMSE LSA
• GGDFT
• DFTK
• DFTSS
• Clean Signal
Future and Present Work
• Investigate the effect of incorporating noise AR model in the Kalman formulation:
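$$\mathbf{F}_r(n) = \begin{bmatrix} \mathbf{F}_{\text{Speech}} & \mathbf{0} \\ \mathbf{0} & \mathbf{F}_{\text{Noise}} \end{bmatrix}$$
• where the F’s are the state transition matrices of speech and noise. Clean speech would be a by-product of the Kalman filtering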
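Building the combined state-transition matrix amounts to placing the speech and noise matrices on the diagonal and filling the rest with zeros. The sketch below assumes both are square lists of rows; the AR orders and coefficient values are hypothetical.

```python
def block_diag(f_speech, f_noise):
    """Return [[F_speech, 0], [0, F_noise]] as a list of rows."""
    n_speech, n_noise = len(f_speech), len(f_noise)
    top = [list(row) + [0.0] * n_noise for row in f_speech]
    bottom = [[0.0] * n_speech + list(row) for row in f_noise]
    return top + bottom


# Hypothetical AR(2) speech model and AR(1) noise model.
F_speech = [[0.9, -0.2],
            [1.0, 0.0]]
F_noise = [[0.5]]
F = block_diag(F_speech, F_noise)
```

With this joint state, the observation would be the sum of the first speech state and the first noise state, so the speech estimate comes out as one component of the filtered state vector.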
Future and Present Work
• Develop the FTLP-HNM model together with the group and explore its potential for:
– Gap restoration,
– Speech enhancement, and
– (possibly) Coding
• The problem with phase in gap restoration
• Sample
[Block diagram: blocks LP Decompose, Formant Tracking, Pitch Estimation/Tracking, Voiced Harmonic Tracking, Sub-band Voiced/Unvoiced Decisions, Unvoiced Energy Tracking, Excitation Spectrum Reconstruction, Inverse FFT and Filter; signal labels Noisy Speech, AR, Excitation, Formants, Pitch, Original Excitation Phase, Harmonic Amplitudes A_k, UV Sub-band Energies, Phase Φ_k, Restored Excitation and Restored Speech; side labels "Interpolation for Gap Estimation" and "Correction and Tracking for Enhancement"]
Future and Present Work
• Further development of the Noise Station program
Future and Present Work
• Current capabilities:
– Open/Close/Save/Amplify/Play/Resample wave signals
– Frame-by-frame and overall viewing of signal/FFT/LP Spectrum/Excitation/Formants/Pitch Frequency/Harmonics
– Add Noise/De-Noise (different methods)/Distortion Measurement
– Formant/Pitch/Harmonic Tracking and viewing
• Future capabilities:
– An option for easily adding new methods (de-noising, pitch tracking, etc.)
Future and Present Work
function output=MMSESTSA84_NS(signal,fs,P)
% output=MMSESTSA84_NS(signal,fs,P)
% HELP AND DIRECTIONS APPEAR HERE
% Author: -
% Date: Dec-04

% INITIALIZE ALL THE PARAMETERS HERE
IS=.25; %INITIAL SILENCE LENGTH
alpha=.99; %DECISION DIRECTED PARAMETER

if (nargin>=3 && isstruct(P)) %EXTRACTING PARAMETERS
    if isfield(P,'alpha')
        alpha=P.alpha; %DECISION DIRECTED PARAMETER (TAKEN FROM P)
    else
        alpha=.99; %DECISION DIRECTED PARAMETER
    end
    if isfield(P,'IS')
        IS=P.IS; %INITIAL SILENCE LENGTH (TAKEN FROM P)
    else
        IS=.25; %INITIAL SILENCE LENGTH
    end
end

%THE PROGRAM STARTS HERE...............
Template for the Programs