EE E6820: Speech & Audio Processing & Recognition
Lecture 8: Spatial sound
Michael Mandel <[email protected]>
Columbia University Dept. of Electrical Engineering
http://www.ee.columbia.edu/~dpwe/e6820
March 27, 2008
1 Spatial acoustics
2 Binaural perception
3 Synthesizing spatial audio
4 Extracting spatial sounds
Michael Mandel (E6820 SAPR) Spatial sound March 27, 2008 1 / 33
Spatial acoustics
Received sound = source + channel
  - so far, only considered ideal source waveform
Sound carries information on its spatial origin
  - "ripples in the lake"
  - evolutionary significance
The basis of scene analysis?
  - yes and no: try blocking an ear
Ripples in the lake
[Figure: source radiating circular wavefronts (speed c m/s) toward a listener; energy ∝ $1/r^2$]
Effect of relative position on sound
  - delay = $\Delta r / c$
  - energy decay ∼ $1/r^2$
  - absorption ∼ $G(f)^r$
  - direct energy plus reflections
Give cues for recovering source position
Describe wavefront by its normal
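The delay and energy-decay relations on this slide can be sketched numerically. This is a minimal illustration, not part of the lecture: the speed of sound (343 m/s) and the function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def propagation_delay(distance_m, c=SPEED_OF_SOUND):
    """Time in seconds for a wavefront to travel distance_m (delay = delta_r / c)."""
    return distance_m / c

def relative_energy_db(r_m, r_ref_m=1.0):
    """Energy at range r_m relative to r_ref_m under the 1/r^2 law, in dB."""
    return 10.0 * math.log10((r_ref_m / r_m) ** 2)

delay = propagation_delay(3.43)   # about 10 ms
level = relative_energy_db(2.0)   # about -6 dB for a doubling of distance
```

Each doubling of range costs roughly 6 dB of direct energy, which is one reason the direct-to-reverberant balance changes with distance.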
Recovering spatial information
Source direction as wavefront normal
  - moving plane found from timing at 3 points
[Figure: plane wavefront crossing sensors A, B, C at angle θ; pressure vs. time at each sensor]
$c\,\Delta t = \Delta s = AB \cos\theta$
  - need to solve correspondence
Space: need 3 parameters
  - e.g. 2 angles (azimuth θ, elevation φ) and range r
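As a rough sketch of recovering direction from timing, the path difference between two sensors A and B is $\Delta s = c\,\Delta t$, and $\Delta s = AB\cos\theta$ gives the angle. The function name, the 343 m/s speed of sound, and the clamping step are assumptions for illustration.

```python
import math

def arrival_angle_deg(delta_t, spacing_m, c=343.0):
    """Angle between the sensor axis AB and the wavefront normal.

    Solves c * delta_t = AB * cos(theta) for theta, in degrees.
    """
    cos_theta = c * delta_t / spacing_m
    # Clamp: measurement noise can push the ratio slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

broadside = arrival_angle_deg(0.0, 0.2)          # no delay: wavefront normal at 90 degrees
oblique = arrival_angle_deg(0.1 / 343.0, 0.2)    # half the maximum delay: 60 degrees
```

Two sensors only fix one angle; a third sensor (as on the slide) is needed to pin down the plane's orientation in space.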
The effect of the environment
Reflection causes additional wavefronts
[Figure: reflection, diffraction & shadowing]
  - + scattering, absorption
  - many paths → many echoes
Reverberant effect
  - causal ‘smearing’ of signal energy
[Figure: two spectrograms, time / sec (0 to 1.5) vs. freq / Hz (0 to 8000): dry speech 'airvib16', and the same speech + reverb from hlwy16]
Reverberation impulse response
Exponential decay of reflections:
$h_{room}(t) \sim e^{-t/T}$
[Figure: spectrogram of hlwy16 (128pt window), time / s (0 to 0.7) vs. freq / Hz (0 to 8000), level scale -70 to -10]
Frequency-dependent
  - greater absorption at high frequencies → faster decay
Size-dependent
  - larger rooms → longer delays → slower decay
Sabine’s equation:
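$RT_{60} = \dfrac{0.049\,V}{S\,\alpha}$ (V = room volume in ft³, S = surface area in ft², α = average absorption coefficient; the constant is 0.161 when working in metres)
Time constant varies with size, absorption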
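Sabine's equation is easy to evaluate directly. The sketch below assumes imperial units for the 0.049 constant and 0.161 for metric; the room dimensions and absorption value are made up for illustration. The conversion to a time constant further assumes that $e^{-t/T}$ describes the amplitude envelope, so a 60 dB drop is a factor of 1000 in amplitude.

```python
import math

def sabine_rt60(volume, surface_area, mean_absorption, metric=False):
    """Reverberation time in seconds from Sabine's equation.

    Units: ft^3 and ft^2 by default (constant 0.049); m^3 and m^2 if metric=True (0.161).
    """
    k = 0.161 if metric else 0.049
    return k * volume / (surface_area * mean_absorption)

def decay_time_constant(rt60):
    """T in exp(-t/T), assuming the exponential describes amplitude (60 dB = factor 1000)."""
    return rt60 / math.log(1000.0)

# Hypothetical 20 x 15 x 10 ft room with average absorption 0.2
rt60 = sabine_rt60(volume=3000.0, surface_area=1300.0, mean_absorption=0.2)
T = decay_time_constant(rt60)
```

Larger volume raises $RT_{60}$ and more absorbing surface lowers it, matching the size and absorption dependence noted on the slide.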
Binaural perception
[Figure: source, head with L and R ears; path length difference, head shadow (high freq)]
What is the information in the 2 ear signals?
  - the sound of the source(s) (L+R)
  - the position of the source(s) (L-R)
Example waveforms (ShATR database)
[Figure: shatr78m3 waveform, Left and Right channels, time / s (2.2 to 2.235), amplitude -0.1 to 0.1]
Main cues to spatial hearing
Interaural time difference (ITD)
  - from different path lengths around head
  - dominates in low frequency (< 1.5 kHz)
  - max ∼750 µs → ambiguous for freqs > 600 Hz
Interaural intensity difference (IID)
  - from head shadowing of far ear
  - negligible for LF; increases with frequency
Spectral detail (from pinna reflections) useful for elevation & range
Direct-to-reverberant ratio useful for range
[Figure: spectrogram of claps 33 and 34 from 627M:nf90, time / s (0 to 1.8) vs. freq / kHz (0 to 20)]
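The link between the maximum ITD and the frequency at which it becomes ambiguous can be checked with a short calculation. This sketch assumes the ambiguity sets in once the ITD exceeds half a period (an interaural phase difference beyond π); the 0.257 m path difference and 343 m/s are illustrative assumptions.

```python
def max_itd_seconds(path_difference_m, c=343.0):
    """Largest interaural delay for a given extra path length around the head."""
    return path_difference_m / c

def itd_ambiguity_frequency(itd_max_s):
    """Frequency whose half-period equals itd_max_s; above it, phase-based ITD wraps."""
    return 1.0 / (2.0 * itd_max_s)

itd = max_itd_seconds(0.257)                 # roughly 750 microseconds
f_limit = itd_ambiguity_frequency(750e-6)    # roughly 667 Hz
```

The result is in the same range as the 600 Hz figure quoted on the slide.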
Head-Related Transfer Functions (HRTFs)
Capture source coupling as impulse responses
$\{\ell_{\theta,\phi,R}(t),\ r_{\theta,\phi,R}(t)\}$
Collection: (http://interface.cipic.ucdavis.edu/)
[Figure: HRIR_021 Left and Right @ 0 el, shown as azimuth / deg vs. time / ms (0 to 1.5); HRIR_021 Left and Right @ 0 el 0 az, shown as waveforms over time / ms]
Highly individual!
Cone of confusion
[Figure: cone of confusion around the interaural axis at azimuth θ (approx equal ITD)]
Interaural timing cue dominates (below 1 kHz)
  - from differing path lengths to two ears
But: only resolves to a cone
  - Up/down? Front/back?
Further cues
Pinna causes elevation-dependent coloration
Monaural perception
  - separate coloration from source spectrum?
Head motion
  - synchronized spectral changes
  - also for ITD (front/back) etc.
Combining multiple cues
Both ITD and ILD influence azimuth; what happens when they disagree?
[Figure: l(t) and r(t) pulse pairs for the three cases below; the delay shown is 1 ms]
Identical signals to both ears → image is centered
Delaying right channel moves image to left
Attenuating left channel returns image to center
“Time-intensity trading”
Binaural position estimation
Imperfect results: (Wenzel et al., 1993)
[Figure: Judged Azimuth (Deg) vs. Target Azimuth (Deg), both axes -180 to 180]
listening to ‘wrong’ HRTFs → errors
front/back reversals stay on cone of confusion
The Precedence Effect
Reflections give misleading spatial cues
[Figure: l(t) and r(t) vs. t, showing the direct sound followed by a reflected copy delayed by R/c]
But: spatial impression based on 1st wavefront, then ‘switches off’ for ∼50 ms
  - ... even if ‘reflections’ are louder
  - ... leads to impression of room
Binaural Masking Release
Adding noise to reveal target
[Figure: waveforms vs. t at each ear for the two cases below]
Tone + noise to one ear: tone is masked
Identical noise to other ear: tone is audible
Binaural Masking Level Difference up to 12 dB
  - greatest for noise in phase, tone anti-phase
Synthesizing spatial audio
Goal: recreate realistic soundfield
  - hi-fi experience
  - synthetic environments (VR)
Constraints
  - resources
  - information (individual HRTFs)
  - delivery mechanism (headphones)
Source material types
  - live recordings (actual soundfields)
  - synthetic (studio mixing, virtual environments)
Classic stereo
[Figure: listener facing L and R loudspeakers]
‘Intensity panning’: no timing modifications, just vary level ±20 dB
  - works as long as listener is equidistant (ILD)
Surround sound: extra channels in center, sides, ...
  - same basic effect: pan between pairs
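Intensity panning can be illustrated with a pair of channel gains. The slide does not specify a pan law; the sketch below swaps in the common constant-power (sine/cosine) law as an assumption, and the function names are hypothetical.

```python
import math

def constant_power_pan(position):
    """Map position in [-1 (full left), +1 (full right)] to (left_gain, right_gain).

    Uses a sine/cosine law so that left_gain^2 + right_gain^2 stays equal to 1.
    """
    angle = (position + 1.0) * math.pi / 4.0   # 0 .. pi/2
    return math.cos(angle), math.sin(angle)

def level_difference_db(left_gain, right_gain):
    """Inter-channel level difference in dB (both gains must be non-zero)."""
    return 20.0 * math.log10(left_gain / right_gain)

left, right = constant_power_pan(-0.5)     # source panned halfway to the left
ild = level_difference_db(left, right)     # positive: left channel is louder
```

No delay is applied to either channel, so the image position depends only on the level difference, which is why the listener needs to sit equidistant from the speakers.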
Simulating reverberation
Can characterize reverb by impulse response
  - spatial cues are important: record in stereo
  - IRs of ∼1 sec → very long convolution
Image model: reflections as duplicate sources
[Figure: source, listener, reflected path, and virtual (image) sources]
‘Early echoes’ in room impulse response:
[Figure: $h_{room}(t)$ vs. t, showing the direct path followed by early echoes]
Actual reflection may be $h_{reflect}(t)$, not $\delta(t)$
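The image model can be sketched for first-order reflections only. This is a simplified illustration under stated assumptions: a box-shaped room with one corner at the origin, ideal mirror-like walls (each reflection a pure $\delta(t)$), and 343 m/s for the speed of sound. The function names and room dimensions are hypothetical.

```python
import math

def first_order_images(source, room_dims):
    """Mirror the source in each of the six walls of a box room with a corner at the origin."""
    images = []
    for axis, size in enumerate(room_dims):
        for wall in (0.0, size):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]   # reflect across the wall plane
            images.append(tuple(image))
    return images

def echo_delays(source, listener, room_dims, c=343.0):
    """Arrival times (s) of the direct path, then the six first-order reflections."""
    paths = [tuple(source)] + first_order_images(source, room_dims)
    return [math.dist(p, listener) / c for p in paths]

delays = echo_delays((1.0, 2.0, 1.5), (4.0, 2.0, 1.5), (5.0, 4.0, 3.0))
```

Each delay marks the position of one spike in the early part of the room impulse response; higher-order reflections would need images of images.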
Artificial reverberation
Reproduce perceptually salient aspects
  - early echo pattern (→ room size impression)
  - overall decay tail (→ wall materials...)
  - interaural coherence (→ spaciousness)
Nested allpass filters (Gardner, 1992)
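[Figure: allpass filter block diagram from x[n] to y[n] with delay $z^{-k}$ and gains $g$ and $-g$]
$H(z) = \dfrac{z^{-k} - g}{1 - g\,z^{-k}}$
Impulse response: $h[0] = -g$, $h[k] = 1 - g^2$, $h[2k] = g(1 - g^2)$, $h[3k] = g^2(1 - g^2)$, ...
[Figure: Nested+Cascade Allpass Synthetic Reverb: allpass sections AP0, AP1, AP2 (labels 20,0.3; 30,0.7; 50,0.5), an LPF block with gain g, and output taps a0, a1, a2]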
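A single allpass section with $H(z) = (z^{-k} - g)/(1 - g\,z^{-k})$ can be written as a difference equation, and its impulse response checked against the values $-g$, $1-g^2$, $g(1-g^2)$, $g^2(1-g^2)$. This sketch covers only one plain section, not the nested or cascaded structure; the values k = 20, g = 0.3 are used purely as an example.

```python
def allpass(x, k, g):
    """Allpass section: y[n] = -g*x[n] + x[n-k] + g*y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        x_delayed = x[n - k] if n >= k else 0.0
        y_delayed = y[n - k] if n >= k else 0.0
        y.append(-g * xn + x_delayed + g * y_delayed)
    return y

impulse = [1.0] + [0.0] * 60
h = allpass(impulse, k=20, g=0.3)
# Non-zero samples occur only at multiples of k: h[0], h[20], h[40], h[60]
```

The output is a decaying train of echoes spaced k samples apart, while the magnitude response stays flat, which is what makes allpass sections useful building blocks for a reverb tail.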
Synthetic binaural audio
Source convolved with {L,R} HRTFs gives precise positioning
  - ... for headphone presentation
  - can combine multiple sources (by adding)
Where to get HRTFs?
  - measured set, but: specific to individual, discrete
  - interpolate by linear crossfade, PCA basis set
  - or: parametric model - delay, shadow, pinna (Brown and Duda, 1998)
[Figure (after Brown & Duda '97): source feeding, for each ear, a delay $z^{-t_{DL}(\theta)}$ / $z^{-t_{DR}(\theta)}$, a first-order shadow filter with coefficients $a$ and $b_L(\theta)$ / $b_R(\theta)$, and pinna echoes $\sum_k p_{kL}(\theta,\phi)\, z^{-t_{PkL}(\theta,\phi)}$ / $\sum_k p_{kR}(\theta,\phi)\, z^{-t_{PkR}(\theta,\phi)}$; a room echo $K_E\, z^{-t_E}$ is added to both ears]
Head motion cues?
  - head tracking + fast updates
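Binaural rendering by convolution can be sketched in a few lines. The impulse responses below are toy stand-ins (a pure delay and attenuation at the far ear), not measured HRTFs, and all names are hypothetical.

```python
from itertools import zip_longest

def convolve(signal, kernel):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out

def render_binaural(source, hrir_left, hrir_right):
    """Filter one mono source with a left/right impulse response pair."""
    return convolve(source, hrir_left), convolve(source, hrir_right)

def mix(a, b):
    """Add two channels sample by sample, padding the shorter one with zeros."""
    return [x + y for x, y in zip_longest(a, b, fillvalue=0.0)]

# Toy impulse responses (assumed): the right ear hears the source 3 samples later and quieter.
toy_left = [1.0]
toy_right = [0.0, 0.0, 0.0, 0.5]
left, right = render_binaural([1.0, -1.0], toy_left, toy_right)
```

Several sources are combined by rendering each with its own impulse response pair and calling `mix` on the left channels and on the right channels.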
Transaural sound
Binaural signals without headphones?
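Can cross-cancel wrap-around signals
  - speakers $S_{L,R}$, ears $E_{L,R}$, binaural signals $B_{L,R}$
  - Goal: present $B_{L,R}$ to $E_{L,R}$
$S_L = H_{LL}^{-1}(B_L - H_{RL} S_R)$
$S_R = H_{RR}^{-1}(B_R - H_{LR} S_L)$
[Figure: binaural signals $B_L$, $B_R$ pass through a matrix M to speakers $S_L$, $S_R$, which reach ears $E_L$, $E_R$ via direct paths $H_{LL}$, $H_{RR}$ and cross paths $H_{LR}$, $H_{RL}$]
Narrow ‘sweet spot’
  - head motion?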
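The two coupled speaker equations can be solved jointly at each frequency as a 2×2 linear system. The sketch below does this for a single frequency bin with complex transfer values; the function name and the example path gains are assumptions, with $H_{RL}$ taken as the path from the right speaker to the left ear.

```python
def crosstalk_cancel(b_left, b_right, h_ll, h_rl, h_lr, h_rr):
    """Speaker signals (S_L, S_R) that deliver (B_L, B_R) at the ears for one frequency bin.

    Ear model: E_L = H_LL*S_L + H_RL*S_R and E_R = H_LR*S_L + H_RR*S_R.
    """
    det = h_ll * h_rr - h_rl * h_lr
    if abs(det) < 1e-12:
        raise ValueError("speaker-to-ear matrix is singular at this frequency")
    s_left = (h_rr * b_left - h_rl * b_right) / det
    s_right = (h_ll * b_right - h_lr * b_left) / det
    return s_left, s_right

# Hypothetical paths: unit direct gain, cross paths attenuated and phase-shifted.
h_ll, h_rr, h_rl, h_lr = 1.0 + 0j, 1.0 + 0j, 0.5j, 0.5j
s_l, s_r = crosstalk_cancel(1.0 + 0j, 0.0 + 0j, h_ll, h_rl, h_lr, h_rr)
ear_left = h_ll * s_l + h_rl * s_r     # should equal B_L
ear_right = h_lr * s_l + h_rr * s_r    # should equal B_R
```

The solution depends on the exact path responses, so small head movements change the H values and the cancellation degrades, consistent with the narrow sweet spot.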
Soundfield reconstruction
Stop thinking about ears
  - just reconstruct pressure + spatial derivatives: $p(x,y,z,t)$, $\partial p(t)/\partial x$, $\partial p(t)/\partial y$, $\partial p(t)/\partial z$
  - ears in reconstructed field receive same sounds
Complex reconstruction setup (ambisonics)
  - able to preserve head motion cues?
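One concrete form of "pressure plus spatial derivatives" is first-order ambisonic B-format, with an omnidirectional channel W and three figure-of-eight (pressure-gradient) channels X, Y, Z. The sketch below encodes a mono sample using the traditional convention in which W is scaled by $1/\sqrt{2}$; the function name and the axis convention (X forward, Y left, Z up) are assumptions for illustration.

```python
import math

def encode_b_format(sample, azimuth_rad, elevation_rad):
    """Encode one mono sample arriving from (azimuth, elevation) into (W, X, Y, Z)."""
    w = sample / math.sqrt(2.0)                                    # omnidirectional pressure
    x = sample * math.cos(azimuth_rad) * math.cos(elevation_rad)   # front-back gradient
    y = sample * math.sin(azimuth_rad) * math.cos(elevation_rad)   # left-right gradient
    z = sample * math.sin(elevation_rad)                           # up-down gradient
    return w, x, y, z

front = encode_b_format(1.0, 0.0, 0.0)
side = encode_b_format(1.0, math.pi / 2.0, 0.0)
```

Because the four channels describe the field at a point rather than the signals at two ears, decoding to a speaker array is what makes the reconstruction setup complex.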
Extracting spatial sounds
Given access to soundfield, can we recover separate components?
  - degrees of freedom: > N signals from N sensors is hard
  - but: people can do it (somewhat)
Information-theoretic approach
  - use only very general constraints
  - rely on precision measurements
Anthropic approach
  - examine human perception
  - attempt to use same information
Microphone arrays
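Signals from multiple microphones can be combined to enhance/cancel certain sources
‘Coincident’ mics with different directional gains
[Figure: sources $s_1$, $s_2$ reaching mics $m_1$, $m_2$ with gains $a_{11}$, $a_{12}$, $a_{21}$, $a_{22}$]
$\begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} \Rightarrow \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = A^{-1} m$
Microphone arrays (endfire)
[Figure: endfire array of mics joined by delays D and adders, with directional gain patterns (scale 0, -20, -40) for λ = 4D, λ = 2D, λ = D]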
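The wavelength dependence of an endfire array can be explored with a delay-and-sum model. This is a sketch under assumptions not stated on the slide: four equally spaced microphones, a far-field plane wave, and steering delays equal to the travel time across one spacing D. The function name is hypothetical.

```python
import cmath
import math

def endfire_gain_db(theta, wavelength, spacing, n_mics=4):
    """Delay-and-sum gain (dB) of an endfire line array for arrival angle theta.

    theta = 0 is along the array axis (the steered direction).
    """
    # Residual phase per microphone after the steering delay has been applied.
    phase_step = 2.0 * math.pi * spacing * (math.cos(theta) - 1.0) / wavelength
    total = sum(cmath.exp(1j * n * phase_step) for n in range(n_mics))
    magnitude = abs(total) / n_mics
    return 20.0 * math.log10(max(magnitude, 1e-6))   # floor avoids log of zero

on_axis = endfire_gain_db(0.0, wavelength=4.0, spacing=1.0)      # 0 dB in the steered direction
rear_4d = endfire_gain_db(math.pi, wavelength=4.0, spacing=1.0)  # deep null behind the array
rear_2d = endfire_gain_db(math.pi, wavelength=2.0, spacing=1.0)  # full-strength lobe behind the array
```

With λ = 4D the rear direction cancels, while at λ = 2D the rear signals add in phase again, so a fixed array is only directional over a limited band.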
Adaptive Beamforming & Independent Component Analysis (ICA)
Formulate mathematical criteria to optimize
Beamforming: Drive interference to zero
  - cancel energy during nontarget intervals
ICA: maximize mutual independence of outputs
  - from higher-order moments during overlap
[Figure: mics $m_1$, $m_2$ combined through weights $a_{11}$, $a_{12}$, $a_{21}$, $a_{22}$ into outputs $s_1$, $s_2$; weights adapted by $-\delta\,\mathrm{MutInfo} / \delta a$]
Limited by separation model parameter space
  - only N × N?
Binaural models
Human listeners do better?
  - certainly given only 2 channels
Extract ITD and IID cues?
  - cross-correlation finds timing differences
  - ‘consume’ counter-moving pulses
  - how to achieve IID, trading
  - vertical cues...
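Finding a timing difference by cross-correlation can be sketched as a search over integer lags. This is a brute-force illustration on short sample lists, not a model of the auditory system; the function name and test signals are hypothetical.

```python
def estimate_lag(left, right, max_lag):
    """Lag in samples (positive = right channel delayed) that maximizes cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            left[n] * right[n + lag]
            for n in range(len(left))
            if 0 <= n + lag < len(right)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

left = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]   # same pulse, 3 samples later
lag = estimate_lag(left, right, max_lag=5)
```

Dividing the lag by the sample rate gives the ITD in seconds; for periodic signals several lags can score equally, which is the ambiguity seen at higher frequencies.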
Time-frequency masking
How to separate sounds based on direction?
  - assume one source dominates each time-frequency point
  - assign regions of spectrogram to sources based on probabilistic models
  - re-estimate model parameters based on regions selected
Model-based EM Source Separation and Localization
  - Mandel and Ellis (2007)
  - models include IID as $\left| L_\omega / R_\omega \right|$ and IPD as $\arg\left( L_\omega / R_\omega \right)$
  - independent of source, but can model it separately
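The interaural cues named above can be computed per time-frequency bin from the ratio of the left and right complex spectra. The sketch below pairs that with a hard IID threshold to form a binary mask; the threshold is a deliberately crude stand-in for the probabilistic EM model, and all names are hypothetical.

```python
import cmath
import math

def interaural_cues(left_bin, right_bin):
    """IID in dB and IPD in radians for one complex bin (right_bin must be non-zero)."""
    ratio = left_bin / right_bin
    return 20.0 * math.log10(abs(ratio)), cmath.phase(ratio)

def binary_mask(left_spec, right_spec, iid_threshold_db=0.0):
    """1 where the left channel is louder than the threshold (left-side source), else 0."""
    return [
        [1 if interaural_cues(l, r)[0] > iid_threshold_db else 0
         for l, r in zip(left_row, right_row)]
        for left_row, right_row in zip(left_spec, right_spec)
    ]

iid_db, ipd = interaural_cues(2.0 + 0j, 1j)
mask = binary_mask([[2.0 + 0j, 0.5 + 0j]], [[1.0 + 0j, 1.0 + 0j]])
```

Because both cues are ratios of the two channels, the source spectrum cancels out of them, which is the sense in which they are independent of the source.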
Summary
Spatial sound
  - sampling at more than one point gives information on origin direction
Binaural perception
  - time & intensity cues used between/within ears
Sound rendering
  - conventional stereo
  - HRTF-based
Spatial analysis
  - optimal linear techniques
  - elusive auditory models
References
Elizabeth M. Wenzel, Marianne Arruda, Doris J. Kistler, and Frederic L. Wightman. Localization using nonindividualized head-related transfer functions. The Journal of the Acoustical Society of America, 94(1):111–123, 1993.
William G. Gardner. A real-time multichannel room simulator. The Journal of the Acoustical Society of America, 92(4):2395–2395, 1992.
C. P. Brown and R. O. Duda. A structural model for binaural sound synthesis. IEEE Transactions on Speech and Audio Processing, 6(5):476–488, 1998.
Michael I. Mandel and Daniel P. Ellis. EM localization and separation using interaural level and phase cues. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 275–278, 2007.
J. C. Middlebrooks and D. M. Green. Sound localization by human listeners. Annu Rev Psychol, 42:135–159, 1991.
Brian C. J. Moore. An Introduction to the Psychology of Hearing. Academic Press, fifth edition, April 2003. ISBN 0125056281.
Jens Blauert. Spatial Hearing - Revised Edition: The Psychophysics of Human Sound Localization. The MIT Press, October 1996.
V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano. The CIPIC HRTF database. In Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the, pages 99–102, 2001.