ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May...

35
Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 1 May 14th, 2006 ICASSP 2006, Toulouse, France 1 Defeating Ambient Noise: Practical Approaches for Noise Reduction and Suppression Ivan Tashev Microsoft Research Redmond, USA May 14th, 2006 ICASSP 2006, Toulouse, France 2 Introduction Why signal enhancement is important: Reducing the ambient noise from the captured audio signal is crucial for providing good sound in modern computing systems, critical for the needs of real time communication and speech recognition. Tutorial goal: To present the key theoretical aspects and share our practical experience in the area of noise suppression and reduction for application in sound capture and processing systems. Target audience: Engineers and researchers working in the area of audio signal processing planning or building audio systems for sound capturing.

Transcript of ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May...

Page 1: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 1

May 14th, 2006 ICASSP 2006, Toulouse, France 1

Defeating Ambient Noise:Practical Approaches for Noise Reduction and Suppression

Ivan TashevMicrosoft Research

Redmond, USA

May 14th, 2006 ICASSP 2006, Toulouse, France 2

IntroductionWhy signal enhancement is important:

Reducing the ambient noise from the captured audio signal is crucial for providing good sound in modern computing systems, critical for the needs of real time communication and speech recognition.

Tutorial goal:To present the key theoretical aspects and share our practical experience in the area of noise suppression and reduction for application in sound capture and processing systems.

Target audience:Engineers and researchers working in the area of audio signal processing planning or building audio systems for sound capturing.

Page 2: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 2

May 14th, 2006 ICASSP 2006, Toulouse, France 3

Introduction (2)

Noise suppression as science and as art:It is a science, because uses mathematical models and hypotheses, it is repeatable, i.e. we get the same results with the same input dataIt is an art, because it is about human perception of the sound and requires evaluation from a human

For speech signals the process is part of more general term speech enhancement

May 14th, 2006 ICASSP 2006, Toulouse, France 4

Defeating ambient noise: tutorial agenda

BasicsNoise suppressionDirectional microphonesMicrophone arraysAdvanced techniquesFree joke and conclusions

Page 3: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 3

May 14th, 2006 ICASSP 2006, Toulouse, France 5

Basics

Noise: definition and propertiesSignal: definition and propertiesNoise suppression and reduction, speech enhancementAudio processing in frequency domain: weighting, transformation, synthesisBandpass filtering

May 14th, 2006 ICASSP 2006, Toulouse, France 6

Basics: noise properties

Statistical model: Zero mean Gaussian random processRight: airplane noise PDF vs. Gaussian PDF

In frequency domain: White noise spectrumPink noise: 6 dB/oct decreaseColored noise – with given spectrum Hoth noise: typical room noise model

Temporal characteristics: Pseudo stationary compared to speechSpecific noises may be different: wind noise

Spatial characteristics: Ambient, isotropic: evenly distributed Point noise sources - jammers

0 1000 2000 3000 4000 5000 6000 7000 8000-10

-5

0

5

10

15

20

25

30

35

Frequency, Hz

Spc

tral D

ensi

ty, d

B

Hoth noise

-4 -3 -2 -1 0 1 2 3 40

0.005

0.01

0.015

0.02

0.025

0.03

0.035

0.04Probability distribution function

Times standard deviation

Pro

babi

lity

NoiseGauss

White Inside A320 NYC Café

Page 4: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 4

May 14th, 2006 ICASSP 2006, Toulouse, France 7

Basics: signal properties

In most of the cases the signal is speechStatistical model (in long term):

Zero mean random Gaussian (Laplace, Gamma) process

Frequency domain (in short term):Voiced – e.g. vowels (harmonic structure) Unvoiced – e.g. fricatives (noise type)

Temporal: Speech and nonspeech segments

Spatial: Point sound source (mouth or loudspeaker)

-4 -3 -2 -1 0 1 2 3 40

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4Probability distribution function

Times standard deviation

Pro

babi

lity

SpeechGaussLaplaceGamma

May 14th, 2006 ICASSP 2006, Toulouse, France 8

Basics: classification

Noise suppression: removing the noise based on statistical models of the noise and signal, spectral subtractionNoise reduction or cancellation: removing the noise based on knowledge or estimation of the corrupting signalSignal (speech) enhancement: more general term for any type of processing aiming improving some property of the signalActive noise cancellation: decreasing the noise level in certain area by sending opposite phase sound with loudspeakers – not discussed in this tutorial

Page 5: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 5

May 14th, 2006 ICASSP 2006, Toulouse, France 9

Basics: processing flow

Processing in frequency domainAudio frames:

80-1024 samples, 5-25 ms

Frequency domain transformations:Fourier (FFT): symmetric spectra, zero Fs/2 bin, process the first halfMCLT (Malvar, 1992): shifts bins ½ frequency binOther: Hartley, wavelet, cepstra; no re-synthesis

May 14th, 2006 ICASSP 2006, Toulouse, France 10

Basics: processing flow (2)

Overall process (typical):Extract the frame

Weighting TransformProcessInverse transformSynthesis (overlap-add) using ½ of the previous frame

Move one half frame forward, repeat

+ +. . .

Page 6: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 6

May 14th, 2006 ICASSP 2006, Toulouse, France 11

Basics: processing flow (3)

Weighting function:Keeps the spectral peaks less smearedCommonly used:

Bartlett (triangle)Hann or Hanning (cos-shaped)Modified Hann – sqrt(cos)-shaped, to be applied twice

If re-synthesis is not requiredNatural, Bartlett, Parsen: sinc, sinc2

and sinc4 in frequency domainMax-Fauque-Bertier (sinc): rectangular in frequency domainBlackman and further generalization as Taylor sequence

-0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.80

0.2

0.4

0.6

0.8

1

Weight windows

Time

Wei

ght

BartlettHannModified Hann

May 14th, 2006 ICASSP 2006, Toulouse, France 12

Basics: bandpass filtering

Bandpass filtering:Do not process frequency bins below and above certain frequencies – zero themTypical low limit: 100-300 Hz for speechTypical high limit: 0.45Fs, reduces aliasingDynamic bandpass filtering

Measure SNR per binAdjust the low and high slopesApply the filter

No kidding!Increases speech intelligibilitySaves artifacts and distortionsSaves efforts and some CPU time

Page 7: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 7

May 14th, 2006 ICASSP 2006, Toulouse, France 13

Basics: summary

Noise and signal properties: statistical, frequency, temporal, and spatialSuppression vs. reduction vs. enhancement vs. cancellationProcessing in frequency domain

Break in 50% overlapping frames – most commonWeighting function is important, sqrt(cos)-shaped most commonOverlap-add processing

Bandpass filtering: increases intelligibility, reduces artifacts and saves efforts

May 14th, 2006 ICASSP 2006, Toulouse, France 14

Noise suppressionGain based noise suppressiona priori and a posteriori SNRSuppression rulesML and Decision Directed approach for a priori SNR estimationUncertain presence of signalVoice activity detectorsAccounting for the temporal characteristicsOverall architectureDemos

Page 8: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 8

May 14th, 2006 ICASSP 2006, Toulouse, France 15

Noise suppression: gain based processing

Given signal xn(t) and noise dn(t) mixed in yn(t)Observed in frequency domain, n-th frame, k-th frequency bin: Yk = Xk + DkNoise suppression:

Gk – time varying, non-negative, real value gain (or suppression rule)The estimator keeps the same phase as Yk: under Gaussian assumptions the best phase estimator is observed phase

The goal of noise suppression is for each frame to estimate Gk vector optimal in certain way

( ) .kk k k k k

k

YX G Y G YY

= =

May 14th, 2006 ICASSP 2006, Toulouse, France 16

Noise suppression: a priori and a posteriori SNR

Signal and noise: statistically independent Gaussian processesSignals variances a priori and a posteriori SNRs

The suppression rule is now function of two parameters:

( ), ( ), ( )X D Yk k kλ λ λ

( , )k k kG ξ γ

2( )( )

( )D

Y kk

λ( )( )( )

X

D

kkk

λξλ

Page 9: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 9

May 14th, 2006 ICASSP 2006, Toulouse, France 17

Noise suppression:suppression rules

Wiener (1945):

MMSE spectral amplitude estimator

DerivationGoal Solution

Problems:Musical noises in the pausesDistortion in the speech segments

2

2

( ) ( ) ( )( ) 1 ( )1 ( )( )

DY k k kG k kkY k

λ ξγξ

−= = − =

+

{ }2ˆk kX Xε ⎡ ⎤−⎣ ⎦

2

2 2

( ) ( )( ) ( ) ( ) ( )( ) 1 1 ( )( ) ( ) ( ) ( )

DXY YY DD D

YY YY

Y k kP k P k P k kG k kP k P k Y k Y k

λ λ γ−−= = = = − = −

Musical noisesand distortions

May 14th, 2006 ICASSP 2006, Toulouse, France 18

Noise suppression:suppression rules

McAulay/Malpass (1980):

ML spectral amplitude estimator

Ephraim/Malah (1984):

Introduce a priori SNR

MMSE short term spectral amplitude estimator

Where:

2

2

( ) ( )1 1( )2 2 ( )

DY k kG k

Y kλ−

= +

0 1(1 ) exp2 2 2 2

k k k kk k k

k

G I Iπν ν ν νν νγ

⎡ ⎤ −⎛ ⎞ ⎛ ⎞ ⎛ ⎞= + +⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎢ ⎥⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎣ ⎦

-40-20

020

40

-40-20

020

40-40

-30

-20

-10

0

10

ζ , dB

Ephraim and Malah suppression rule

γ, dB

Sup

pres

sion

Gai

n, d

B

( )1

kk

kk ξν γ

ξ+

-40-20

020

40

-40-20

020

40-40

-30

-20

-10

0

ζ , dB

Wiener suppression rule

γ, dB

Sup

pres

sion

Gai

n, d

B

Page 10: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 10

May 14th, 2006 ICASSP 2006, Toulouse, France 19

Noise suppression:suppression rules (2)Ephraim/Malah (1985):

MMSE short term log spectral amplitude estimator

Computational complexity of Ephraim and Malahsuppression rulesEfficient alternatives, P. Wolfe/S. Godsill (2001):

Joint Maximum A Posteriori Spectral Amplitude EstimatorMaximum A Posteriori Spectral Amplitude Estimator

MMSE Spectral Power Estimator:

Gaussian noise and Gamma speech distributions,Martin (2002)

11

k kk

k k

G ξ νξ γ

⎛ ⎞+= ⎜ ⎟+ ⎝ ⎠

1.exp1 2

k

tk

kk

eG dttν

ξξ

∞ −⎛ ⎞= ⎜ ⎟⎜ ⎟+ ⎝ ⎠

May 14th, 2006 ICASSP 2006, Toulouse, France 20

Noise suppression:a priori SNR estimation

a priori SNR estimation:

ML approximation:

Decision-directed (Ephraim/Malah, 1984):

Noise variation estimationRequires signal/noise classification of the audio frames/binsIn non-signal frames/bins update the noise model:

2( ) ( 1) ( )( ) (1 ) ( ) ( )n n nD Dk k Y kλ β λ β−= − +

2( ) ( )ˆ( )( )

D

D

Y k kk

ξλ

−=

2( 1)( )

( 1)

ˆ ( )ˆ( ) (1 )max 0, ( ) 1 , [0,1)

( )

nn

nD

X kk k

kξ α α γ α

λ

−⎡ ⎤= + − − ∈⎣ ⎦

Page 11: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 11

May 14th, 2006 ICASSP 2006, Toulouse, France 21

Noise suppression:uncertain presence of signal

McAulay/Malpass (1980)Observation Yk = Xk + Dk holds only if we have signal presentedReal case:

Modified MMSE suppression rule:

1

0

,,

k kk

k

X D with signal state HY

D just noise state H+

=

{ } { }1 1| ( | ). | ,k k k k kE X Y P H Y E X Y H=

May 14th, 2006 ICASSP 2006, Toulouse, France 22

Noise suppression:voice activity detectors

Energy based, binary decision

Track minimal energy

For classification apply threshold (2.5-7 Emin)Can be done per frame or per bin

Probabilistic based (Sohn et. all., 1999)Compute likelihood ratio:Apply hang-over scheme Result: signal presence probability vector (per bin)

See Martin (2001) as well

( )( )

2 2( 1) ( 1) ( 1)min min min

( )min

2 2( 1) ( 1) ( 1)min min min

upn n n

n

n n ndown

E E Y Y ETE

E E Y E YT

τ

τ

− − −

− − −

+ − >=

+ − >

1 exp1 1

k kk

k k

γ ξξ ξ

⎧ ⎫Λ = ⎨ ⎬+ +⎩ ⎭

Page 12: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 12

May 14th, 2006 ICASSP 2006, Toulouse, France 23

Noise suppression:using temporal properties

Suppression rule estimators use only the current frame: artifacts, distortionsTemporal gain smoothing

Direct smoothing:

HMM based:

Practical interpolation:

( ) ( 1)(1 )n nk k kG G Gβ β−= − +

( 1)( ) 01 11

( 1)00 10

nn k

k knk

a a GG Ga a G

+=+

( ) ( 1)n nk k kG G G−=

May 14th, 2006 ICASSP 2006, Toulouse, France 24

Noise suppression:overall architecture

Non-observable

signal x(t)

noise d(t)

Observable

SFFT

VADUpdatenoisemodel

Computesuppression

rule

Final estimator

phase

magnitude

Corruptedsignal y(t)

iSFFT

noise model

presence probability vector

suppression rule

x(t) estimation

Page 13: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 13

May 14th, 2006 ICASSP 2006, Toulouse, France 25

Noise suppression:practical tips and tricks

Limit: Suppression gains: keep above -60 dBProbabilities: [1e-4,0.9999]

Smooth (in time and/or frequency):Noise modelsGains

Simplify:Do not use more complex models than necessarySimpler model with more precise or faster parameters estimation usually works better

May 14th, 2006 ICASSP 2006, Toulouse, France 26

Noise suppression:demonstrations

Input file Wiener MMSE SPEMcAulay/Malpass Ephraim/Malah

10.722.5-44.8-22.2MMSE SPE

13.225.0-47.0-22.0Ephraim-Malah

2.614.4-36.0-21.6McAulay-Malpass

18.230.2-52.3-22.1Wiener filtered

11.8-33.3-21.5Not processed

ImprovementSNRNoiseSignalAlgorithm

Note: All measurement units are dB

Page 14: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 14

May 14th, 2006 ICASSP 2006, Toulouse, France 27

Noise suppression:summary

Noise suppression as time varying, real value, non-negative gain (or suppression rule) based operationa priori and a posteriori SNRs estimation is essential – the decision-directed approachSignal may or may not be present – voice activity detectors are criticalEstimation of precise noise model is with high importanceSmoothing in time improves listening results

May 14th, 2006 ICASSP 2006, Toulouse, France 28

Directional microphones

Microphone typesPressure gradient microphoneParameters for directional microphonesFirst order directional microphonesClassification and parametersBottom line

Page 15: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 15

May 14th, 2006 ICASSP 2006, Toulouse, France 29

Directional microphones:microphone types

Microphone is a device that converts the air pressure to a electric signalMicrophone types:

Carbon – in first phonesCrystal – piezoelectric effect basedDynamic – inverted loudspeakerCondenser – measurement grade micsElectret – the most common today

May 14th, 2006 ICASSP 2006, Toulouse, France 30

Directional microphones:pressure gradient microphone

Pressure microphoneConverts pressure to electric signalCan be designed as diaphragm in closed capsuleAcoustical monopole

Pressure gradient microphoneConverts the pressure difference into electric signalCan be designed as diaphragm in a open capsuleAcoustical dipole

acoustical monopole,omnidirectional

acoustical dipole,directional

closedcapsule

diaphragm

opencapsule diaphragm

Page 16: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 16

May 14th, 2006 ICASSP 2006, Toulouse, France 31

Directional microphones:pressure gradient microphone (2)

Directivity pattern of pressure gradient microphone

Has figure-8 directivity patternFrequency response: 6 dB/octslope towards low frequencies

sound0 deg

sound90 deg

0.05

0.1

0.15

0.2

30

210

60

240

90

270

120

300

150

330

180 0

Directivity pattern at 1000 Hz

0 1000 2000 3000 4000 5000 6000 7000 80000

0.2

0.4

0.6

0.8

1

1.2

1.4Frequency response at 0 deg

cos( )( , ) 1 exp( 2 )dU f j f θθ πν

= − −

d = 9 mmυ = 342 m/sf = 0-8000 Hzө= 0-360O

May 14th, 2006 ICASSP 2006, Toulouse, France 32

Directional microphones:first order microphones

First order microphone as combination of delayed τ and subtracted two signals from two microphones at distance dDirectivity pattern

cos( )( , ) 1 exp 2

( , ) (1 )cos( )Norm

dU f j f

U f

θθ π τν

θ α α θ

⎛ ⎞⎛ ⎞= − − +⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠≈ + −

Omnidirectional and directionalmicrophones

Page 17: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 17

May 14th, 2006 ICASSP 2006, Toulouse, France 33

Directional microphones:classification

Zero at 90 deg, acoustic dipole4.80.00figure 8

Highest DI, zeros at ± 109 deg6.00.25hypercardioid

Highest front-to-back ratio, zeros at ±125 deg5.7~0.35supercardioid

Zero at 180 deg4.80.50cardioid

No directivity0.01.00omnidirectional

NoteDIαType

cardioid supercardioid hypercardioid figure 8 (dipole)

0.2

0.4

0.6

0.8

1

30

210

60

240

90

270

120

300

150

330

180 0

Directivity: cardioid

0.2

0.4

0.6

0.8

1

30

210

60

240

90

270

120

300

150

330

180 0

Directivity: supercardioid

0.2

0.4

0.6

0.8

1

30

210

60

240

90

270

120

300

150

330

180 0

Directivity: hypercardioid

0.2

0.4

0.6

0.8

1

30

210

60

240

90

270

120

300

150

330

180 0

Directivity: figure 8

May 14th, 2006 ICASSP 2006, Toulouse, France 34

Directional microphones:parameters

Directivity patternDirectivity index Sensitivity, -45 dBV/Pa typicalSNR, 60 dB typicalFrequency response: front/back

10 2

0 0

( , , )( ) 10.log1 ( , , )

4

T TP fDI f

d d P fπ π

ϕ θ

θ ϕ ϕ θπ

⎛ ⎞⎜ ⎟⎜ ⎟

= ⎜ ⎟⎜ ⎟⋅⎜ ⎟⎜ ⎟⎝ ⎠

∫ ∫2

0( , , ) ( , ) , constantP f U f cϕ θ ρ ρ= = =

( , )U f c

Page 18: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 18

May 14th, 2006 ICASSP 2006, Toulouse, France 35

Directional microphones:summary

In the Noise suppression section we learned that 6 dB noise suppression is a good achievementAn cardioid microphone gives 4.8 dB noise reduction without distortions and artifactsIn real systems design using directional microphones is importantThe microphone directivity pattern is further denoted as U(f,c), f – frequency, c – look-up direction { , , }c θ ϕ ρ=

May 14th, 2006 ICASSP 2006, Toulouse, France 36

Microphone arrays

Definition and typesDelay-and-sum beamformerTerminologyTime-invariant beamformers, demoSound source localizationAdaptive beamformersSpatial filtering, demo

Page 19: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 19

May 14th, 2006 ICASSP 2006, Toulouse, France 37

Microphone arrays:definition and types

Set of synchronously sampled microphonesTypes:

linear, planar, 3Dcompact and largeuniform, nonuniform and random spacingnear field and far field

Advantage: allow spatial filtering, reducing the noises and reverberationDisadvantage: require more microphones and more processing time

May 14th, 2006 ICASSP 2006, Toulouse, France 38

Microphone arrays:delay-and-sum beamformer

The most intuitive approachShift the signals to align them and sumAdvantages:

Simple and efficientProblems:

Variable directivityBig sidelobesLow efficiency

Page 20: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 20

May 14th, 2006 ICASSP 2006, Toulouse, France 39

Microphone arrays:terminology

Beamforming: making the microphone array to listen to given look-up directionBeamsteering: electronically change the look-up direction the microphone array listens toNullsteering: suppressing the sounds coming from given directionSound source localization: techniques to detect, localize and track one or multiple sound sources using microphone array

May 14th, 2006 ICASSP 2006, Toulouse, France 40

Microphone arrays:general parameters

Generalized form:M – number of microphonesXi(f) – spectrum of i-th channelW(f,i) – weight coefficients matrixY(f) – output signal

Parameters:Directivity pattern B:

Main Response Axis – direction towards max sensitivity, look-up directionBeamwidth: area -3 dB around MRA

1

0( ) ( , ) ( )

M

ii

Y f W f i X f−

=

= ∑

2

( , ) ( ) ( , ),

( , ) ( , )

m

H

c pj f

m

B f W f D f

eD f U f cc p

πν

θ θ

θ

−−

= ⋅

=−

maxθ

Page 21: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 21

May 14th, 2006 ICASSP 2006, Toulouse, France 41

Microphone arrays:general parameters (2)

Ambient noise gain: isotropic noise reduction

Non-correlated (sensor) noise gain

Total noise gain: combination of the two above

The beamformer design is to find weight matrix to satisfy certain criteria & constrains

2 2

02

1( ) ( , , )4CH f B f d d

ππ

π

ϕ θ θ ϕπ

+

= ∫ ∫

12

0( ) ( , )

M

Ni

H f W f i−

=

= ∑

( ) ( )2 2

2

( ). ( ) ( ). ( )( )

( )C C N N

C

H f N f H f N fH f

N f+

=

May 14th, 2006 ICASSP 2006, Toulouse, France 42

Microphone arrays:time invariant beamformerDesign criteria:

Max noise suppression: highly non-linearReplaced with directivity pattern matching – reducing the optimization dimensionsIsotropic noise assumption

Constrains:Unit gain and zero phase shift towards MRAFrequently: in the beamwidth area

Two controversial trends: decreasing the ambient noise gain increases the non-correlated noise gain. Optimum? – Minimize the total gain

Page 22: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 22

May 14th, 2006 ICASSP 2006, Toulouse, France 43

Microphone arrays:time invariant beamformer (2)

Superidirective beamformer (Cox, 1986)

is the power spectral density matrix of the input signals assuming isotropic noiseConstrained LMS algorithm, antenna arrayAchieves maximum directivityChu, 1997; Elko, 2000

min( ) 1H HXXW

W W subject to W DΦ =

XXΦ

May 14th, 2006 ICASSP 2006, Toulouse, France 44

Microphone arrays:time invariant beamformer (3)

0 2000 4000 6000 80000

5

10

15

20

25

Frequency (Hz)

Dire

ctiv

ity (l

inea

r)

SuperdirectiveDelay-and-sum

0 2000 4000 6000 8000-60

-50

-40

-30

-20

-10

0

10

Frequency (Hz)

Whi

te n

oise

gai

n (d

B)

SuperdirectiveDelay-and-sum

-20

-10

0

90

270

180 0

Superdirective beamformer (f=3000 Hz)

-20

-10

0

90

270

180 0

Delay-and-sum beamformer (f=3000 Hz)

Comparison:Delay and sum and Superdirective array

Simulation:5 element linear array,3 cm distance

Page 23: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 23

May 14th, 2006 ICASSP 2006, Toulouse, France 45

Microphone arrays:time invariant beamformer (4)

Design example (Tashev/Malvar, 2005)Four element linear arrayBeamwidth vs. Frequency vs. Total Noise GainDirectivity pattern vs. FrequencyDirectivity pattern in 3D for 1000 Hz

Demonstrations:a) Parallel recordingb) Real-time SSL

May 14th, 2006 ICASSP 2006, Toulouse, France 46

Microphone arrays:time invariant beamformer (5)Advantages:

No VAD required Stable, reliable, predictable, measurableGuaranteed parametersFast switching to different speakerLow CPU requirement

Real-world problems: Requires Sound Source Localizer to find and track the desired sound sourceSensor’s & equipment’s noises limit the performance Microphones manufacturing tolerances:

Calibration during manufacturingAuto calibration during use (Tashev, 2004)

Page 24: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 24

May 14th, 2006 ICASSP 2006, Toulouse, France 47

Microphone arrays:source localization

Time delay estimates basedCross-correlation functionWeighting: ML, PHAT (Knap/Carter, 1976)Combining the pairs

Brandstein et. all., 1996Burchfield et. all., 2001 – uses optimization, works in 2DRui/Florencio, 2003 – sum or cross-correlation functions towards hypothesis

Beamsteering basedCompute the output energy of set of beamsFind the maximumDo interpolation for increased precisionVariant: two dimensional search

May 14th, 2006 ICASSP 2006, Toulouse, France 48

Microphone arrays:source localization (2)

Problems: noise and reverberation Post-processing the raw SSL results

Particle filteringKalman filteringReal-time clustering

Camera-assisted approachFace detection softwareFusion SSL and video data

•Real SSL results: raw, post-processed, snapped to 10 degrees beams.•Two persons talking at 6 and -38 degrees, distance 12 feet, conference room.•Four element linear array.

Page 25: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 25

May 14th, 2006 ICASSP 2006, Toulouse, France 49

Microphone arrays:adaptive algorithms

Frost algorithm (Frost, 1972)

is the power spectral density matrix of the input signalsGradient descent optimization, i.e. constrained LMS algorithmDesigned for antenna array

min( ) 1H HXXW

W W subject to W DΦ =

XXΦ

May 14th, 2006 ICASSP 2006, Toulouse, France 50

Microphone arrays:adaptive algorithms (2)

Generalized Side Lobe Canceller (Griffiths/Jim, 1982)Time-invariant beamformerNulls are sharper than beamsBlocking matrix – place null towards the sound sourceAdaptive filters to minimize residual in the beamformer output

Page 26: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 26

May 14th, 2006 ICASSP 2006, Toulouse, France 51

Microphone arrays:adaptive algorithms (3)

AdvantagesUse fully the geometry under the specific noiseVery good with point noise sources No calibration required

Real-world problemsHigher requirement for CPU, memoryMore complex for implementationSlower adaptation and switching to next sound sourceNon-predictable and non-guaranteed parametersSimilar to fixed beamformers performance with ambient type of noise

May 14th, 2006 ICASSP 2006, Toulouse, France 52

Microphone arrays:non-linear spatial filtering

Implemented as non-linear post-processorBased on Instantaneous Direction Of Arrival (IDOA) estimation per bin

where Compute the probability and apply in the same way as in noise suppression under uncertain presence of signal

[ ]1 2 1( ) ( ), ( ), , ( )Mf f f fδ δ δ −∆ …

1 1( ) arg( ( )) arg( ( ))j jf X f X fδ − = −-2

02

-20

2

-2

0

2

-π ≤ δ1 ≤ +π

Phase differences at 750 Hz

-π ≤ δ2 ≤ +π

-π ≤

δ3 ≤

MeasuredComputed

Page 27: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 27

May 14th, 2006 ICASSP 2006, Toulouse, France 53

Microphone arrays:non-linear spatial filtering (2)

Generalized suppression with spatial information and known look-up directionDemo:

Recording conditions:Human speaker at 0 degrees, 1.5 mRadio at -45 degrees, 2 mOffice: normal noise and reverberationFour element linear microphone array

Same audio recording, two sequences:video: direction-frequency-power; audio: one microphonevideo: direction-power for SSL; audio: array output

May 14th, 2006 ICASSP 2006, Toulouse, France 54

Microphone arrays:non-linear spatial filtering (3)

AdvantagesBetter directivity than time-invariant beamformerGood source separationLow CPU overhead

Real-world problemsRequires channel matching, i.e. calibrationNon-linear processing (artifacts, musical noises) direction-time-power

Page 28: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 28

May 14th, 2006 ICASSP 2006, Toulouse, France 55

Advanced techniques

Adaptive noise reductionPsychoacoustic based noise suppressorNoise suppressor optimized for speech recognitionNoise suppression with speech modelSpatial noise suppression

May 14th, 2006 ICASSP 2006, Toulouse, France 56

Advanced techniques:adaptive noise reduction

Add a microphone to capture the noise signal (HDD in a laptop, engine in a car)Two inputs system:

voice + noise: y(t)=x(t)+h(t)*z(t)noise only: z(t)

Use LMS, RLS or NLMS adaptive filterDouble talk detector necessary if leakage of x(t) in z(t)

Page 29: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 29

May 14th, 2006 ICASSP 2006, Toulouse, France 57

Advanced techniques:adaptive noise reduction (2)

Advantages:Linear! No musical noises or distortionsWorks with non-stationary noisesLow CPU requirement

Real-world issuesNeeds a second microphoneLimited applicability: when we can capture the noise only signalHas some audible residuals and artifacts

May 14th, 2006 ICASSP 2006, Toulouse, France 58

Advanced techniques:psychoacoustic noise suppressor

Concept:More energy removed -> more musical noises and distortionsMasking effects in frequency and time domains in human perception of soundWhy remove noises we can’t hear?

Real-life issuesNeeds MOS tests for evaluationDuplicates codec functionality – the new audio codecs use the same effect

Page 30: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 30

May 14th, 2006 ICASSP 2006, Toulouse, France 59

Advanced techniques:noise suppressor for ASR

General idea: optimize parameterized suppression rule for best recognition rate (Tashev/Droppo/Acero, 2006)

More training data improves average recognition, harms clean speech recognitionRprop optimization algorithm: enhanced version of gradient descent Objective function: Maximum Mutual Information (MMI) from ASR, closely related to the recognition accuracyOptimization parameters: the suppression ruleStarting point: MMSE Spectral Power Estimator rule

Baseline: 99.5% clean, 52.5% averageStarting point: 96.9% clean, 74.9% averageAchieved optimal point: 99.0% clean, 77.7% average

May 14th, 2006 ICASSP 2006, Toulouse, France 60

Advanced techniques:noise suppressor for ASR (4)

MMSE SPE After 20 Iterations

-40-20

020

40

-40-20

020

40-40

-30

-20

-10

0

10

ζ , dB

Suppression Rule - start point

γ, dB

Sup

pres

sion

Gai

n, d

B

-40-20

020

40

-40-20

020

40-40

-30

-20

-10

0

ζ , dB

Suppression Rule - result point

γ, dB

Sup

pres

sion

Gai

n, d

B

Page 31: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 31

May 14th, 2006 ICASSP 2006, Toulouse, France 61

Advanced techniques:using speech model

General idea:Detect and parse the speech signal: fricatives, vowels, glides, nasals, stopsMeasure the parametersSynthesize clean speech signal

Real-world issues:If we can do reliably the parsing – we solved the noise robust ASR problems ☺Even text-to-speech systems do not have very good pronunciation, doing this without language model is more difficult

May 14th, 2006 ICASSP 2006, Toulouse, France 62

Advanced techniques:using speech model (2)

Drucker (1968):Detect and parse the speech signal: fricatives, vowels, glides, nasals, stopsUse separate enhancing filters for each categoryHard decision for presence and class

McAulay/Malpass (1980) introduced soft decision rules and using several filters in parallelSome techniques:

Use the harmonic structure of vowels, time warping to make them flat, clean, un-warpUse vocal tract model for generating fricatives and other consonantsUsing language model (too specific)

Page 32: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 32

May 14th, 2006 ICASSP 2006, Toulouse, France 63

Advanced techniques:spatial noise suppression

Microphone array for headset (Tashev/Seltzer/Acero, 2005)

3-element microphone arrayBone sensor for reliable VADWorking in IDOA space

Multidimensional generalization of classic noise suppression

Building position-dependent noise modelsApply suppression rule

[ ]1 2 1( ) ( ), ( ), , ( )Mf f f fδ δ δ −∆ …

-3 -2 -1 0 1 2 3-3

-2

-1

0

1

2

3

Phase differences for 1000 Hz

D12

D13

1 1( ) arg( ( )) arg( ( ))j jf X f X fδ − = −

May 14th, 2006 ICASSP 2006, Toulouse, France 64

Advanced techniques:spatial noise suppression (2)

0 2000 4000 6000 8000-10

-8

-6

-4

-2

0

2

Frequency, Hz

Mag

nitu

de, d

B

Mic1Mic2

•General architecture•Beamformerdirectivity•Diffraction around the head correction

Page 33: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 33

May 14th, 2006 ICASSP 2006, Toulouse, France 65

Advanced techniques:spatial noise suppression (3)

Signal and noise variances

a priory and a posteriori SNR

Suppression rule

2

2

( | ) ( | )

( | ) ( | )

Y

D

f E Y f

f E D f

λ

λ

⎡ ⎤∆ ∆⎢ ⎥⎣ ⎦⎡ ⎤∆ ∆⎢ ⎥⎣ ⎦

2

( | ) ( | )( | ) (1 )max[0, ( | )], [0,1)( | )

( | )( | )

( | )

Y D

D

D

f ff ff

Y ff

f

λ λξ β β γ βλ

γλ

∆ − ∆∆ + − ∆ ∈∆

∆∆

( | ) 1 ( | )( | )1 ( | ) ( | )

f fH ff f

ξ ϑξ γ

⎛ ⎞∆ + ∆∆ = ⎜ ⎟+ ∆ ∆⎝ ⎠

-20

2

-2

0

2

0

50

100

-π ≤ δ2 ≤ +π-π ≤ δ1 ≤ +π

Sig

nal v

aria

nce

-20

2

-2

0

2

0

5

10

-π ≤ δ2 ≤ +π-π ≤ δ1 ≤ +π

Noi

se v

aria

nce

May 14th, 2006 ICASSP 2006, Toulouse, France 66

Advanced techniques:spatial noise suppression (4)

SNR improvement, all units in dB

BM – best microphone,BF – beamformerNS – noise suppressorSR – spatial noise suppressor

Demo: parallel recording with BT headset

16.411.16.43.2Car, 90 dB22.817.512.37.2Café, 75 dB 34.729.422.525.2Office, 55 dB SRNSBFBM

Page 34: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 34

May 14th, 2006 ICASSP 2006, Toulouse, France 67

Advanced techniques:summary

Improving further the noise suppression and reduction increases complexity, requires more information.The algorithms become more specialized: for car, for speech, for ASR, for specific noises.Use good judgment when use or design them:

Do I need this? How specific is the application?

Remember: more complex model with more parameters means slower computation and adaptation. Use with caution.Still very exciting new algorithms, solving problems unsolved so far.

May 14th, 2006 ICASSP 2006, Toulouse, France 68

Defeating ambient noise: final remarks

The art of noise suppression is to know when to stop.None of the methods is universal, use cascading and make sure not to destroy important properties.Build processing blocks, think the whole system: well balanced suppression across the processing chain.Noise suppression is about human perception: use your ears and MOS tests.

Page 35: ICASSP06 Tutorial Defeating Ambient Noise · Defeating Ambient Noise - practical approaches May 14th, 2006 ICASSP 2006, Toulouse, France 3 May 14th, 2006 ICASSP 2006, Toulouse, France

Defeating Ambient Noise - practical approaches

May 14th, 2006

ICASSP 2006, Toulouse, France 35

May 14th, 2006 ICASSP 2006, Toulouse, France 69

Finally

Thank you for choosing this tutorial!Thank you for the attention!

Questions?

Contact info: [email protected]://research.microsoft.com/users/ivantash