Communications & Multimedia Signal Processing Meeting 6 Esfandiar Zavarehei Department of Electronic...

Communications & Multimedia Signal Processing

Meeting 6

Esfandiar Zavarehei

Department of Electronic and Computer Engineering

Brunel University

6 July, 2005


Contents

• Review of Noise Reduction Methods (more results)
  – Review of the methods
  – DFT-Kalman, a new method for parameter estimation
  – Evaluation results and sample speech signals

• FTLP-HNM Model
  – FTLP-HNM for gap restoration

• Noise Station
  – An interface for the programs


Review of Noise Reduction Methods

• Most noise reduction systems fit this block diagram

• The de-noising method is based on:
  – Spectral subtraction, or
  – Bayesian estimation

[Block diagram. Blocks: FFT Analysis, De-noising Method, SNR Estimation, Noise Estimation, Soft Decision, Overlap-Add, Z^-1 (one-frame delay). Signals: Noisy Speech, Noisy Phase, Enhanced Speech.]
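As a rough illustration of the block diagram, the following Python sketch runs the analysis/synthesis loop: windowed frames are transformed, a gain is applied to the spectral amplitudes, the noisy phase is reused, and the frames are recombined by overlap-add. The function names, the periodic Hann window with 50% overlap, and the naive DFT are assumptions made for this example, not details taken from the slides.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) DFT (or inverse DFT) of a short frame."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [c / n for c in out] if inverse else out


def enhance(noisy, frame_len, gain_fn):
    """Frame, transform, apply gains to amplitudes, keep the noisy phase, overlap-add.

    gain_fn is a hypothetical callback: it receives the list of noisy
    amplitudes of one frame and returns one gain per frequency bin.
    """
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to exactly 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    out = [0.0] * len(noisy)
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = [noisy[start + t] * window[t] for t in range(frame_len)]
        spectrum = dft(frame)
        amplitudes = [abs(c) for c in spectrum]
        phases = [cmath.phase(c) for c in spectrum]  # noisy phase, left untouched
        gains = gain_fn(amplitudes)
        cleaned = [cmath.rect(g * a, p) for g, a, p in zip(gains, amplitudes, phases)]
        restored = dft(cleaned, inverse=True)
        for t in range(frame_len):
            out[start + t] += restored[t].real
    return out
```

With a gain of 1 in every bin, samples covered by two frames are reconstructed unchanged, which is a convenient sanity check before plugging in a real de-noising rule.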


Spectral Subtraction

• The spectral subtraction method is generally formulated as:

$$\hat{S}_k = \max\left( X_k \left[ A_k - B_k \left( \frac{N_k}{X_k} \right)^{\alpha} \right],\; T_k \right)$$

for $\alpha = 1$:

$$\hat{S}_k = \max\left( A_k X_k - B_k N_k,\; T_k \right)$$

• where S, X and N are the speech, noisy speech and noise spectral amplitudes, k is the frequency index, α is the power exponent, A and B are the attenuation and subtraction coefficients respectively, and T is the dynamic threshold

• Spectral subtraction methods differ in how A and B are estimated


Spectral Subtraction

• Simple SS: constant A and B (e.g. A=1, B=1, T=0, α=1 or 2)

• Adaptive Spectral Subtraction:
  – Using a posteriori SNR (uses only the speech information in the current frame)
  – Using a priori SNR (tracks the fluctuations of speech in successive frames)
  – Using a posteriori and a priori SNRs (e.g. optimized to give the MMSE)

• Different algorithms are used for calculation of the threshold

[Figure: negative STSA percentage (%) versus SNR (dB) for car noise, train noise and white noise]

• The number of negative values resulting from spectral subtraction could be large and depends on the noise spectrum and SNR
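A minimal sketch of the α = 1 case, using constant coefficients as in simple SS, may make the role of the threshold concrete. It also counts how often the raw subtraction goes negative, which is the quantity plotted against SNR. The function names and the use of plain Python lists are assumptions for this example.

```python
def spectral_subtract(noisy_amp, noise_amp, a=1.0, b=1.0, threshold=0.0):
    """alpha = 1 form: S_k = max(A * X_k - B * N_k, T), with constant A, B and T."""
    return [max(a * x - b * n, threshold) for x, n in zip(noisy_amp, noise_amp)]


def negative_percentage(noisy_amp, noise_amp, a=1.0, b=1.0):
    """Percentage of bins where A * X_k - B * N_k is negative before flooring."""
    diffs = [a * x - b * n for x, n in zip(noisy_amp, noise_amp)]
    return 100.0 * sum(d < 0 for d in diffs) / len(diffs)
```

Bins where the noise estimate exceeds the scaled noisy amplitude are exactly those that the threshold T has to catch, so a high negative percentage means the output is dominated by the flooring rule rather than by the subtraction itself.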


Bayesian Estimation

• Frames are independent:
  – Estimation of ST-DFT components (real and imaginary)
    • Gaussian-Gaussian (Wiener)
    • Other distributions for speech and noise (various estimators by Martin)
  – Estimation of the amplitude, using the noisy phase
    • Amplitude, log-amplitude, power (different parameters to be estimated)
    • Gaussian, Gaussian mixtures (needs training), Laplacian (computationally not feasible)
    • Criteria: MMSE, MAP, joint phase and amplitude MAP, etc.
  – Methods for parameter estimation use inter-frame information

• Frames are not independent:
  – DFT-Kalman
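To make the Gaussian-Gaussian (Wiener) case and the use of inter-frame information concrete, here is a small sketch for a single frequency bin. It combines the Wiener gain ξ/(1+ξ) with the widely used decision-directed estimate of the a priori SNR ξ. The slides do not spell out which parameter estimator is used, so the decision-directed rule, the function names and the default smoothing value of 0.99 are assumptions for illustration.

```python
def wiener_gain(xi):
    """Wiener gain for a priori SNR xi (linear scale)."""
    return xi / (1.0 + xi)


def decision_directed_snr(prev_clean_amp, noisy_amp, noise_var, alpha=0.99):
    """A priori SNR from the previous frame's estimate and the current frame."""
    gamma = noisy_amp ** 2 / noise_var  # a posteriori SNR of the current frame
    return alpha * prev_clean_amp ** 2 / noise_var + (1.0 - alpha) * max(gamma - 1.0, 0.0)


def enhance_track(noisy_amps, noise_var, alpha=0.99):
    """Run the recursion along one frequency bin across successive frames."""
    prev = 0.0
    out = []
    for x in noisy_amps:
        xi = decision_directed_snr(prev, x, noise_var, alpha)
        prev = wiener_gain(xi) * x  # amplitude estimate fed back to the next frame
        out.append(prev)
    return out
```

Because the gain is always below 1, every output amplitude is smaller than the noisy one, regardless of how strong the speech is.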


Bayesian Estimation

• Wiener: speech is always suppressed

• Distributions vary from phoneme to phoneme and from frequency to frequency

[Figure: gain G_k (dB) plotted against two axes in dB, each ranging from -16 to 16 (axis symbols lost in extraction); curves G_k^EM and G_k^W]

[Figure: histogram with fitted densities: Gaussian SKLD=0.28, Laplacian SKLD=0.24, Gamma SKLD=0.36]

[Figure: average symmetric Kullback-Leibler distance; categories Gaussian, Laplacian, Gamma; bar values 0.81, 0.62, 0.56]
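The symmetric Kullback-Leibler distance used to compare the histogram with each candidate density can be sketched as follows. The slides do not give the exact definition, so this example assumes the common form KL(p||q) + KL(q||p) over a shared set of bins (some authors halve it); the function names and the small floor `eps` are also assumptions.

```python
import math


def normalize(counts):
    """Turn histogram counts into a discrete distribution that sums to 1."""
    total = float(sum(counts))
    return [c / total for c in counts]


def symmetric_kl(p, q, eps=1e-12):
    """KL(p||q) + KL(q||p) for two discrete distributions on the same bins."""
    total = 0.0
    for pi, qi in zip(p, q):
        pi = max(pi, eps)  # floor avoids log(0) for empty bins
        qi = max(qi, eps)
        total += (pi - qi) * math.log(pi / qi)
    return total
```

In use, `p` would be the normalized histogram of the DFT components and `q` the model density evaluated and normalized on the same bins; a smaller value means a closer fit.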


DFT-Kalman

• Incorporate the AR model of the short-time DFT trajectories for estimation

• Gaussian distribution

• Noise in each ST-DFT channel is assumed to be WGN

[Figure: averaged correlation coefficient versus time lag (5 ms steps) for car noise, train noise, white noise and speech]

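$$F_r(n) = \begin{bmatrix} a_1(n) & a_2(n) & \cdots & a_{N-1}(n) & a_N(n) \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$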
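A compact sketch of how an AR model of one ST-DFT trajectory can drive a Kalman filter is given below, for a real-valued trajectory observed in white Gaussian noise. It builds the state transition matrix F in companion form from the AR coefficients and runs the standard predict/update recursion with the observation picking out the first state. The function names, the fixed AR coefficients, and the identity initial covariance are assumptions for this example; the presentation's own estimator updates its parameters over time.

```python
def companion(a):
    """Companion-form state transition matrix F for AR coefficients a1..aN."""
    n = len(a)
    f = [[0.0] * n for _ in range(n)]
    f[0] = [float(c) for c in a]
    for i in range(1, n):
        f[i][i - 1] = 1.0
    return f


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kalman_track(y, a, q, r):
    """Filter one noisy trajectory y with AR coefficients a.

    q is the excitation (LP error) variance, r the observation noise variance.
    """
    n = len(a)
    f = companion(a)
    f_t = [list(row) for row in zip(*f)]
    x = [[0.0] for _ in range(n)]
    p = [[float(i == j) for j in range(n)] for i in range(n)]
    out = []
    for obs in y:
        # Predict: x = F x, P = F P F^T + Q (excitation enters the first state only).
        x = matmul(f, x)
        p = matmul(matmul(f, p), f_t)
        p[0][0] += q
        # Update with observation row H = [1, 0, ..., 0].
        s = p[0][0] + r
        k = [p[i][0] / s for i in range(n)]
        innovation = obs - x[0][0]
        x = [[x[i][0] + k[i] * innovation] for i in range(n)]
        p = [[p[i][j] - k[i] * p[0][j] for j in range(n)] for i in range(n)]
        out.append(x[0][0])
    return out
```

Running the same routine separately on the real and imaginary parts of each frequency channel is one way to apply it to complex ST-DFT trajectories.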


DFT-Kalman

• During noise-only periods the output converges to zero, making the whole output zero

• To avoid excessively small values of the LP error covariance, Q, during speech-active periods:

$$Q = \max\left( Q,\; m \, |X(k)|^2 \right), \qquad (0.05)^2 < m < (0.30)^2$$

• Small values of m give further reduction of the background noise but more distortion of the speech signal.
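The floor on Q is a one-line rule; the sketch below shows it for a single frequency bin. The function name and the default value of m (chosen inside the stated range) are assumptions for this example.

```python
def floor_excitation_variance(q, x_k, m=0.1 ** 2):
    """Keep Q from falling below m * |X(k)|^2, where x_k is the noisy DFT value."""
    return max(q, m * abs(x_k) ** 2)
```

For example, with m = 0.01 and a noisy component of magnitude 5, any Q below 0.25 is raised to 0.25, while a larger Q passes through unchanged.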

[Figure: waveform, amplitude versus samples]


DFT-Kalman

Another method is based on spectral subtraction of the ST-DFT trajectories. An autocorrelation vector is obtained using spectral subtraction at the start of the speech after long noise-only periods:

$$\hat{\Phi}_S(n) = \max\left( \Phi_X(n) - \Phi_D,\; 0 \right)$$

$$\Phi_X(n) = \mathcal{F}\left\{ E\left[ X_r(n) \, [X_r(n) \;\cdots\; X_r(n-L)]^T \right] \right\}$$

$$r_{ss}(n) = \mathcal{F}^{-1}\left\{ \hat{\Phi}_S(n) \right\}$$

where L+1 is the number of samples used in the calculation of the autocorrelation vector and X_r(n) is the real component of the ST-DFT trajectory at frame n and an arbitrary frequency. Similar equations hold for the imaginary components.
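The chain autocorrelation, Fourier transform, subtraction with a zero floor, and inverse transform can be sketched in a few lines. This example makes several assumptions that the slides do not state: the expectation is replaced by a biased sample autocorrelation, the lags 0..L are mirrored into a circularly symmetric sequence of length 2L+1 before the transform, the noise spectrum is supplied per DFT bin, and a naive DFT is used. All function names are hypothetical.

```python
import cmath
import math


def biased_autocorr(x, max_lag):
    """Biased sample autocorrelation of a real trajectory for lags 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) / n for lag in range(max_lag + 1)]


def dft(values, inverse=False):
    """Naive DFT (or inverse DFT) of a short sequence."""
    m = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / m) for t in range(m))
        for k in range(m)
    ]
    return [c / m for c in out] if inverse else out


def subtract_in_spectrum(r_x, phi_d):
    """Return r_ss (lags 0..L) from noisy autocorrelation r_x and noise spectrum phi_d.

    phi_d must hold one value per bin of the length 2L+1 transform.
    """
    max_lag = len(r_x) - 1
    symmetric = list(r_x) + list(reversed(r_x[1:]))  # lags 0..L, then L..1
    phi_x = [c.real for c in dft(symmetric)]
    phi_s = [max(px - pd, 0.0) for px, pd in zip(phi_x, phi_d)]
    r_full = dft(phi_s, inverse=True)
    return [r_full[lag].real for lag in range(max_lag + 1)]
```

With a zero noise spectrum the routine returns the input autocorrelation; a flat (white) noise spectrum mainly lowers the lag-0 term, which matches the intuition that white noise adds only to the signal variance.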


DFT-Kalman

This autocorrelation is linearly combined with the autocorrelation estimated from the previously estimated samples:

$$\hat{r}(n) = \frac{n_1 + L - n}{L} \, r_{ss}(n) + \frac{n - n_1}{L} \, r_{\hat{s}_r}(n-1), \qquad \text{for } n_1 \le n \le n_1 + L$$

where n_1 is the frame index of the first speech segment detected.

Regardless of the presence of speech, if the variance of the excitation of the AR model is lower than a fixed threshold, a weighted average of the spectral-subtraction-based autocorrelation and the autocorrelation of the previous estimates of the ST-DFT trajectories is used:

$$\text{if } Q < m \, |X(n)|^2: \qquad \hat{r}(n) = \beta \, r_{ss}(n) + (1 - \beta) \, r_{\hat{s}_r}(n-1)$$

where β is the averaging weight.


Evaluation of the methods

• The correlation coefficient between each distortion measure and the mean opinion score (MOS) is calculated over 90 sentences (noisy, clean and de-noised), rated by 10 listeners

• PESQ has the highest correlation with the MOS results

[Figure: correlation coefficient of each distortion measure with MOS, axis from -1 to 1; bar values 0.86, -0.69, -0.61, -0.45, 0.24, 0.07]
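The comparison of distortion measures with MOS rests on a correlation coefficient. A minimal sketch, assuming the usual Pearson definition (the slides do not name the variant) and a hypothetical function name, is:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)
```

Here `x` would hold one objective score per sentence and `y` the corresponding MOS. A measure that grows with distortion is expected to give a negative coefficient, so it is the magnitude that indicates how well the measure tracks listener opinion.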


PESQ – Car Noise

[Figure: Car Noise. PESQ score per method; series: -5 dB, 0 dB, 5 dB, 10 dB]

Method abbreviations:
SASS: Simple Amplitude SS
BPSS: a posteriori Power SS
MBSS: Multiband SS
SSAPR: a priori Amplitude SS
PSS: Parametric SS
MMSE STSA: Ephraim's Amplitude Estimator
MMSE LSA: Ephraim's Log-Amplitude Estimator
GGDFT: Martin's Gamma-Gamma DFT Estimator


PESQ – Train Noise

[Figure: Train Noise. PESQ score per method; series: -5 dB, 0 dB, 5 dB, 10 dB]



Mean Opinion Score – Car Noise


[Figure: Noise Level Car (MOS). Axis: Noise Level; series: Natural Noise, Annoying Noise]

[Figure: Speech Quality Car (MOS). Axis: Score; series: Speech Naturalness, Overall Preference]


Mean Opinion Score – Train Noise


[Figure: Speech Quality Train (MOS). Axis: Score; series: Speech Naturalness, Overall Preference]

[Figure: Noise Level Train (MOS). Axis: Noise Level; series: Natural Noise, Annoying Noise]


Sample Speech Signals

• Car Noise

• Noisy

• SASS

• BPSS

• MBSS

• SSAPR

• PSS

• Wiener

• MMSE STSA

• MMSE LSA

• GGDFT

• DFTK

• DFTSS

• Train Noise

• Noisy

• SASS

• BPSS

• MBSS

• SSAPR

• PSS

• Wiener

• MMSE STSA

• MMSE LSA

• GGDFT

• DFTK

• DFTSS

• Clean Signal



Future and Present Work

• Investigate the effect of incorporating a noise AR model in the Kalman formulation:

$$F_r(n) = \begin{bmatrix} F_{\text{Speech}} & 0 \\ 0 & F_{\text{Noise}} \end{bmatrix}$$

• where the F's are the state transition matrices of speech and noise. Clean speech would be a by-product of the Kalman filtering
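A short sketch of the augmented state model: the speech and noise AR models each contribute a companion-form block, and the two blocks are placed on the diagonal of a single transition matrix. The observation row that sums the first speech state and the first noise state is an assumption about how the noisy observation would be formed, and the function names are hypothetical.

```python
def companion(a):
    """Companion-form transition matrix for AR coefficients a1..aN."""
    n = len(a)
    f = [[0.0] * n for _ in range(n)]
    f[0] = [float(c) for c in a]
    for i in range(1, n):
        f[i][i - 1] = 1.0
    return f


def block_diag(top, bottom):
    """Block-diagonal matrix [[top, 0], [0, bottom]] from two square matrices."""
    n, m = len(top), len(bottom)
    out = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            out[i][j] = top[i][j]
    for i in range(m):
        for j in range(m):
            out[n + i][n + j] = bottom[i][j]
    return out


def observation_row(speech_order, noise_order):
    """Assumed observation row: noisy sample = first speech state + first noise state."""
    row = [0.0] * (speech_order + noise_order)
    row[0] = 1.0
    row[speech_order] = 1.0
    return row
```

With this layout the filter estimates speech and noise states jointly, and the clean speech estimate is simply the first element of the state vector.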



• Development of the FTLP-HNM model together with the group, and exploration of its potential for:
  – Gap restoration,
  – Speech enhancement, and
  – (possibly) coding

• The problem with phase in gap restoration

• Sample

[Block diagram: FTLP-HNM model. Blocks: LP Decompose, Formant Tracking, Pitch Estimation/Tracking, Voiced Harmonic Tracking, Sub-band Voiced/Unvoiced Decisions, Unvoiced Energy Tracking, Excitation Spectrum Reconstruction, Inverse FFT, Filter. Signals: Noisy Speech, AR, Excitation, Formants, Pitch, Original Excitation, Phase, Harmonic Amplitudes A_k, UV Sub-band Energies, Phase Φ_k, Restored Excitation, Restored Speech. Side labels: Interpolation for Gap Estimation; Correction and Tracking for Enhancement.]



• Further development of the Noise Station program



• Current capabilities:
  – Open/Close/Save/Amplify/Play/Resample wave signals
  – Frame-by-frame and overall viewing of signal/FFT/LP spectrum/excitation/formants/pitch frequency/harmonics
  – Add noise/de-noise (different methods)/distortion measurement
  – Formant/pitch/harmonic tracking and viewing

• Future capabilities:
  – An option for easily adding new methods (de-noising, pitch tracking, etc.)



Template for the Programs

function output=MMSESTSA84_NS(signal,fs,P)
% output=MMSESTSA84_NS(signal,fs,P)
% HELP AND DIRECTIONS APPEAR HERE
% Author: -
% Date: Dec-04

% INITIALIZE ALL THE PARAMETERS HERE
IS=.25;     %INITIAL SILENCE LENGTH
alpha=.99;  %DECISION DIRECTED PARAMETER

if (nargin>=3 && isstruct(P)) %EXTRACTING PARAMETERS
    if isfield(P,'alpha')
        alpha=P.alpha;  %DECISION DIRECTED PARAMETER (READ FROM P, NOT FROM IS)
    else
        alpha=.99;      %DECISION DIRECTED PARAMETER
    end
    if isfield(P,'IS')
        IS=P.IS;
    else
        IS=.25;         %INITIAL SILENCE LENGTH
    end
end

%THE PROGRAM STARTS HERE...............
