Auditory Scene Analysis Chris Darwin Need for sound segregation Ears receive mixture of sounds We...

  • date post

    21-Dec-2015

Auditory Scene Analysis

Chris Darwin


Need for sound segregation

• Ears receive mixture of sounds

• We hear each sound source as having its own appropriate timbre, pitch, location

• Stored information about sounds (e.g. acoustic/phonetic relations) probably concerns a single source

• Need to make single source properties (e.g. silence) explicit


Mechanisms of segregation

• Primitive grouping mechanisms based on general heuristics such as harmonicity and onset-time - “bottom-up” / “pure audition”

• Schema-based mechanisms based on specific knowledge (general speech constraints?) - “top-down”


Segregation of simple musical sounds

• Successive segregation
– Different frequency (or pitch)
– Different spatial position
– Different timbre

• Simultaneous segregation
– Different onset-time
– Irregular spacing in frequency
– Location (rather unreliable)
– Uncorrelated FM not used


Successive grouping by frequency

Track 8, Track 7

Bugandan xylophone music: “Ssematimba ne Kikwabanga”


Not peripheral channelling

Streaming occurs for sounds

– with same auditory excitation pattern, but different periodicities (Vliegen, J. and Oxenham, A. J. (1999). "Sequential stream segregation in the absence of spectral cues," J. Acoust. Soc. Am. 105, 339-46.)

– with Huggins pitch sounds that are only defined binaurally (Akeroyd, Carlyon & Deeks (2005) JASA, 118, 977-81)


Huggins pitch

[Figure: two panels. Noise on frequency vs. time axes, heard as "a faint tone". Interaural phase difference (∆ø) vs. frequency, with 0 and 500 Hz marked.]
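The Huggins pitch stimulus named on this slide can be approximated in a few lines. The block below is a minimal sketch under stated assumptions, not the stimulus used in the cited studies: the sample rate, duration, relative bandwidth and the linear 0-to-2π phase transition are illustrative choices, and `huggins_pitch` is a hypothetical helper name.

```python
import numpy as np

def huggins_pitch(fc=500.0, rel_bw=0.16, dur=0.5, fs=16000, seed=0):
    """Return (left, right) noise that is identical at the two ears
    except for an interaural phase transition around fc (Hz)."""
    rng = np.random.default_rng(seed)
    n = int(round(dur * fs))
    left = rng.standard_normal(n)
    spec = np.fft.rfft(left)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    lo, hi = fc * (1 - rel_bw / 2), fc * (1 + rel_bw / 2)
    band = (freqs >= lo) & (freqs <= hi)
    # Interaural phase difference: 0 outside the band, rising linearly
    # from 0 to 2*pi across the band (assumed transition shape).
    ipd = np.zeros_like(freqs)
    ipd[band] = 2 * np.pi * (freqs[band] - lo) / (hi - lo)
    right = np.fft.irfft(spec * np.exp(1j * ipd), n=n)
    return left, right

left, right = huggins_pitch()
```

Only the phase is changed, so each channel on its own has the same magnitude spectrum as the original noise; the cue to the tone exists only in the comparison between the two ears.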


Successive grouping by timbre

Wessel illusion



Successive grouping by spatial separation

Track 41


Sach & Bailey - rhythm unmasking by ITD or spatial position?

ITD is sufficient, but sequential segregation is by spatial position rather than by ITD alone.

[Figure: two conditions, each with a target and a masker. Target: ITD = 0, ILD = 0. Target: ITD = 0, ILD = +4 dB.]


Simultaneous: Frequency vs Space

[Figure: Left and Right channel scores]

Tchaikovsky: Intro to 4th movement of Pathétique Symphony

From Diana Deutsch (1999)

Frequency wins!


Build-up of segregation

Horse: -LHL-LHL-LHL-

-->

Morse: --H---H---H--

-L-L-L-L-L-L-L

• Segregation takes a few seconds to build up.

• Then between-stream temporal / rhythmic judgments are very difficult
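The LHL pattern on this slide is easy to synthesise for listening. The following is a minimal sketch: the tone frequencies, tone duration, ramp length and sample rate are illustrative assumptions rather than values from the slides, and `tone` and `lhl_sequence` are hypothetical helper names.

```python
import math

def tone(freq, dur, fs=16000, ramp=0.005):
    """Sine tone as a list of samples with linear onset/offset ramps."""
    n = round(dur * fs)
    r = round(ramp * fs)
    out = []
    for i in range(n):
        gain = min(1.0, (i + 1) / r, (n - i) / r)
        out.append(gain * math.sin(2 * math.pi * freq * i / fs))
    return out

def lhl_sequence(f_low=400.0, f_high=600.0, tone_dur=0.1, n_triplets=10, fs=16000):
    """Repeat the pattern: low tone, high tone, low tone, silent gap."""
    silence = [0.0] * round(tone_dur * fs)
    low, high = tone(f_low, tone_dur, fs), tone(f_high, tone_dur, fs)
    seq = []
    for _ in range(n_triplets):
        seq += low + high + low + silence
    return seq

sequence = lhl_sequence()
```

With these assumed defaults the sequence lasts four seconds, which is in the range over which the slide says segregation builds up. Changing `f_low` and `f_high` changes the frequency separation between the two potential streams.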


Some interesting points:

• Sequential streaming may require attention - rather than being a pre-attentive process.


Attention necessary for build-up of streaming (Carlyon et al, JEP:HPP 2000)

Horse: -LHL-LHL-LHL-

-->

Morse: --H---H---H--

-L-L-L-L-L-L-L

• Horse -> Morse takes a few seconds to segregate

• These have to be seconds spent attending to the tone stream

• Does this also apply to other types of segregation?


Continuity and streaming

Discontinuous frequency changes produce more streaming than do continuous changes


Capturing a component from a mixture by frequency proximity

[Figure: A-B and A-BC stimulus schematics]

Freq separation of AB

Harmonicity & synchrony of BC

Bregman & Pinker, 1978, Canad J Psychol

Disjoint Allocation?


Rhythmic masking release

Simultaneous onset prevents a component from forming part of a sequential stream

Track 22


Simultaneous grouping

What is the timbre / pitch / location of a particular sound source ?

Important grouping cues

• continuity (or repetition) “Old + New”

• onset time

• harmonicity (or regularity of frequency spacing)


Bregman’s Old + New principle

Stimulus: A followed by A+B

-> Percept of:

A as continuous (or repeated)

with B added as separate percept


Old+New Heuristic

[Figure: sequences built from components A, B and M; panels labelled BMAMB and AMAMB.]


Percept

M


Rate of onset and continuity

Rapid increases in level lead to Old+New.

Gradual increases are just heard as an increase.

Track 32


Grouping & vowel quality

[Figure: stimulus schematic, frequency vs. time.]


Grouping & vowel quality (2)

[Figure: frequency-vs-time schematics of the stimuli, including a captor tone. Panel labels: "continuation removed from vowel" and "continuation not removed from vowel".]


Onset-time: allocation is subtractive not exclusive

• Bregman’s Old-plus-New heuristic

[Figure: frequency-vs-time schematics comparing Level-Independent and Level-Dependent allocation.]

• Indicates importance of coding change.


Asynchrony & vowel quality

[Figure: F1 boundary (Hz; axis 440 to 490) vs. onset asynchrony T (ms; 0 to 320). 8 subjects. Reference level: no 500 Hz component. Stimulus schematic: 90 ms, with asynchrony T.]


Mistuning & pitch

[Figure: mean pitch shift (Hz; axis -0.2 to 1) vs. % mistuning of 4th harmonic (0, 1, 2, 3, 5, 8), for vowel and complex. 8 subjects. Stimulus schematic: 90 ms.]


Onset asynchrony & pitch

[Figure: mean pitch shift (Hz; axis -0.2 to 1) vs. onset asynchrony T (ms; 0 to 320), for vowel and complex. 8 subjects, ±3% mistuning. Stimulus schematic: 90 ms, with asynchrony T.]
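The two manipulations on these slides, mistuning one harmonic and starting it early, can be sketched as a stimulus generator. This is an illustration under assumptions: the 155 Hz fundamental, the 12 equal-amplitude harmonics and the sample rate are not taken from the slides, the 90 ms duration and 4th harmonic are read off the figure labels, and `complex_tone` is a hypothetical name.

```python
import math

def complex_tone(f0=155.0, n_harm=12, target=4, mistune_pct=3.0,
                 lead_ms=0.0, dur_ms=90.0, fs=16000):
    """Sum of equal-amplitude harmonics of f0 in which the target harmonic
    is mistuned by mistune_pct percent and starts lead_ms before the rest."""
    n_lead = round(lead_ms * fs / 1000)
    n_main = round(dur_ms * fs / 1000)
    out = [0.0] * (n_lead + n_main)
    for h in range(1, n_harm + 1):
        freq = h * f0
        start = n_lead                      # other harmonics wait for the lead time
        if h == target:
            freq *= 1 + mistune_pct / 100   # e.g. +3 % mistuning
            start = 0                       # target harmonic starts first
        for i in range(start, n_lead + n_main):
            out[i] += math.sin(2 * math.pi * freq * (i - start) / fs)
    return out

synchronous = complex_tone()                # mistuned, no onset asynchrony
asynchronous = complex_tone(lead_ms=80.0)   # target leads by 80 ms
```

Sweeping `mistune_pct` or `lead_ms` reproduces the horizontal axes of the two figures; the pitch shifts themselves are listener judgments and cannot be computed from the waveform alone.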


Some interesting points:

• Sequential streaming may require attention - rather than being a pre-attentive process.

• Parametric behaviour of grouping depends on what it is for.


Grouping for

Effectiveness of a parameter on grouping depends on the task. E.g.

• 10-ms onset time allows a harmonic to be heard out

• 40-ms onset-time needed to remove from vowel quality

• >100-ms needed to remove it from pitch.


Minimum onset needed for:

Harmonic in vowel to be heard out: c. 10 ms

Harmonic to be removed from vowel: 40 ms

Harmonic to be removed from pitch: 200 ms


Grouping not absolute and independent of classification

[Figure: boxes labelled "group" and "classify".]


Apparent continuity

Track 28 - 31

If B would have masked the sound had it been there, then you don’t notice that it is not there.



Continuity & grouping

Harmonic

1. Pulsing complex

1. Pulsing high tone
2. Steady low tone

Enharmonic

Group tones; then decide on continuity.


Some interesting points:

• Sequential streaming may require attention - rather than being a pre-attentive process.

• Parametric behaviour of grouping depends on what it is for.

• Not everything that is obvious on an auditory spectrogram can be used :

• FM of Fo irrelevant for segregation (Carlyon, JASA 1991; Summerfield & Culling 1992)


Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982)

[Figure: score (axis 0 to 100) vs. Fo difference (semitones; 0 to 10). Curve labelled Normal. 40 subjects, 40 sentence pairs. Perfect Fourth ~4:3 marked.]

Two sentences (same talker)

• only voiced consonants

• (with very few stops)

Thus maximising Fo effect

Task: write down target sentence

Replicates & extends Brokx & Nooteboom

Target sentence Fo = 140 Hz

Masking sentence = 140 Hz ± 0, 1, 2, 5, 10 semitones


Carlyon: across-frequency FM coherence

Odd-one in 2 or 3?

[Figure: three intervals (1, 2, 3) on a frequency axis; 5 Hz, 2.5% FM.]

Harmonic (1500, 2000, 2500): Easy

Inharmonic (1500, 2100, 2500): Impossible

Carlyon, R. P. (1991). "Discriminating between coherent and incoherent frequency modulation of complex tones," J. Acoust. Soc. Am. 89, 329-340.


McAdams FM in sung vowels

Bregman demo 24


Role of localisation cues

What role do localisation cues play in helping us to hear one voice in the presence of another ?

• Head shadow increases S/N at the nearer ear (Bronkhorst & Plomp, 1988).

– … but this advantage is reduced if high frequencies inaudible (B & P, 1989)

• But do localisation cues also contribute to selectively grouping different sound sources?


Some interesting points:

• Sequential streaming may require attention - rather than being a pre-attentive process.

• Parametric behaviour of grouping depends on what it is for.

• Not everything that is obvious on an auditory spectrogram can be used :

• FM of Fo irrelevant for segregation (Carlyon, JASA 1991; Summerfield & Culling 1992)

• Although we can group sounds by ear, ITDs by themselves remarkably useless for simultaneous grouping. Group first then localise grouped object.


Separating two simultaneous sound sources

• Noise bands played to different ears group by ear, but...

• Noise bands differing in ITD do not group by ear


Segregation by ear but not by ITD

(Culling & Summerfield 1995)

[Figure: % vowels identified (axis 0 to 100) by lateralisation cue (ear vs. ITD). Stimulus schematics with vowel labels AR, EE, ER, OO, presented by ear or with an interaural delay.]

Task - what vowel is on your left ? (“ee”)


Two models of attention

Attend to common ITD:

1. Peripheral filtering into frequency components
2. Establish ITD of frequency components
3. Attend to common ITD across components

Attend to direction of object:

1. Peripheral filtering into frequency components
2. Establish ITD of frequency components
3. Group components by harmonicity, onset-time etc
4. Establish direction of grouped object
5. Attend to direction of grouped object


Phase Ambiguity

500 Hz: period = 2 ms

500-Hz pure tone leading in Right ear by 1.5 ms

R leads by 1.5 ms, or L leads by 0.5 ms

cross-correlation peaks at +0.5 ms and -1.5 ms

auditory system weighted to one closest to zero

Heard on Left side
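The arithmetic behind this slide can be written out directly. The sketch below assumes a sign convention that is not in the slides (positive = right ear leads, so the signs are the reverse of those quoted for the cross-correlation peaks), and the function names are hypothetical. It lists the interaural delays that are indistinguishable for a pure tone and picks the one closest to zero.

```python
def equivalent_itds(itd_ms, freq_hz, n_periods=2):
    """Interaural delays that a pure tone cannot tell apart from itd_ms:
    they differ from it by whole periods of the tone."""
    period_ms = 1000.0 / freq_hz
    return [itd_ms + k * period_ms for k in range(-n_periods, n_periods + 1)]

def perceived_itd(itd_ms, freq_hz):
    """Pick the equivalent delay closest to zero."""
    return min(equivalent_itds(itd_ms, freq_hz), key=abs)

# Assumed convention: positive = right ear leads, negative = left ear leads.
candidates = equivalent_itds(1.5, 500.0)   # -2.5, -0.5, 1.5, 3.5, 5.5 ms
heard = perceived_itd(1.5, 500.0)          # -0.5 ms, i.e. left leads by 0.5 ms
```

A 1.5 ms right lead at 500 Hz is therefore equivalent to a 0.5 ms left lead, which is the smaller delay and matches the "Heard on Left side" outcome on the slide.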


Disambiguating phase-ambiguity

• Narrowband noise at 500 Hz with ITD of 1.5 ms (3/4 cycle) heard at lagging side.

• Increasing noise bandwidth changes location to the leading side.

Explained by across-frequency consistency of ITD.

(Jeffress, Trahiotis & Stern)


Resolving phase ambiguity

500 Hz: period = 2 ms. L lags by 1.5 ms or L leads by 0.5 ms?

300 Hz: period = 3.3 ms. L lags by 1.5 ms or L leads by 1.8 ms?

[Figure: cross-correlation peaks for noise delayed in one ear by 1.5 ms. Frequency of auditory filter (Hz; 200 to 800) vs. delay of cross-correlator (ms; -2.5 to 3.5). Actual delay marked.]

Left ear actually lags by 1.5 ms
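The across-frequency argument can be illustrated with idealised peak positions. This is a sketch, not a cross-correlation model: it assumes each auditory filter produces peaks at the true delay plus or minus whole periods of its centre frequency, and the lag window, tolerance and function names are illustrative assumptions.

```python
def peak_delays(true_delay_ms, cf_hz, max_lag_ms=4.0):
    """Idealised cross-correlation peak positions for one auditory filter:
    the true delay plus or minus whole periods of the centre frequency."""
    period = 1000.0 / cf_hz
    k_max = int(max_lag_ms / period) + 2
    delays = [true_delay_ms + k * period for k in range(-k_max, k_max + 1)]
    return [d for d in delays if abs(d) <= max_lag_ms]

def consistent_delays(true_delay_ms, cfs_hz, tol_ms=0.05):
    """Delays at which every filter has a peak (within tol_ms)."""
    channels = [peak_delays(true_delay_ms, cf) for cf in cfs_hz]
    return [d for d in channels[0]
            if all(any(abs(d - e) <= tol_ms for e in ch) for ch in channels[1:])]

filters_hz = [200, 300, 400, 500, 600, 700, 800]
common = consistent_delays(1.5, filters_hz)   # only the 1.5 ms delay survives
```

At 500 Hz alone, 1.5 ms and -0.5 ms are both candidates. Across filters the spurious peaks fall at different delays because the periods differ, so only the actual delay lines up at every frequency.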


Segregation by onset-time

[Figure: Synchronous and Asynchronous conditions. Frequency (Hz; 200 to 800) vs. duration (ms). Synchronous: 0 to 400. Asynchronous: 0, 80, 400.]

ITD: ± 1.5 ms (3/4 cycle at 500 Hz)


Segregated tone changes location

[Figure: pointer IID (dB; axis -20 to 20, R to L) vs. onset asynchrony (ms; 0, 20, 40, 80), for Pure and Complex.]


Segregation by mistuning

[Figure: In tune and Mistuned conditions. Frequency (Hz; 200 to 800) vs. duration (ms).]


Mistuned tone changes location


Mechanisms of segregation

• Primitive grouping mechanisms based on general heuristics such as harmonicity and onset-time - “bottom-up” / “pure audition”

• Schema-based mechanisms based on specific knowledge (general speech constraints?) - “top-down”


Hierarchy of sound sources ?

Orchestra

1° Violin section

Leader

Chord

Lowest note

Attack

2° violins…

Corresponding hierarchy of constraints ?


Is speech a single sound source ?

Multiple sources of sound:

• Vocal folds vibrating

• Aspiration

• Frication

• Burst explosion

• Clicks

Nama: Baboon's arse


Tuvan throat music


Tuvan throat music


Mechanisms of grouping / segregation

• Primitive grouping mechanisms based on general heuristics such as harmonicity and onset-time - “bottom-up” / “pure audition”
– Evidence: Fo-diffs on simultaneous speech

• Schema-based mechanisms based on specific knowledge (general speech constraints?) - “top-down” / “segregation by recognition”
– Evidence: sine-wave speech


Sine-wave speech: one is OK... (Bailey et al., Haskins SR 1977; Remez et al., Science 1981)


... but how about two?

Barker & Cooke, Speech Comm 1999

Onset-time provides only bottom-up cue


Both approaches could be true

• Bottom-up processes constrain alternatives considered by top-down processes

e.g. cafeteria model (Darwin, QJEP 1981)

Evidence:

Onset-time segregates a harmonic from a vowel, even if it produces a “worse” vowel (Darwin, JASA 1984)

[Figure: stimulus schematics against time.]


Low-level cues for separating a mixture of two sounds such as speech

[Figure: spectra (dB vs. frequency) of the Mixture and of Source A and Source B.]

Look for:

• harmonic series

• sounds starting at the same time


Plan

• How does Fo help in separating sound sources?
– within vs across-formant grouping

• Effect of localisation cues on grouping & attention
– Grouping by ear & by ITD
– Maintaining attention to sound source (ITD, prosody, VT length)


Broadbent & Ladefoged (1957)

• PAT-generated sentence “What did you say before that?”

F1 F2

• when Fo the same (125 Hz, either natural or monotone),

• listeners heard:

• one voice only 16/18

• in one place 18/18

• when Fo different (125 / 135 Hz, monotone),

• listeners heard:

• two voices 15/18

• in two places 12/18

But as B & L admit ...


... Harvey Fletcher (1953) was there first ! (almost)

p 216 describes experiment (suggested by Arnold).

• Speech fuses

• but polyphonic music sounds weird since different notes are heard at different ears

LP @1kHz HP @1kHz


Conclusion

Common Fo integrates

– broadband frequency regions of a single voice

– coming simultaneously to different ears

– into a single voice heard in one position.

But what about Fo’s ability to separate different voices? (original B & L question)


Fo improves identification

[Figure: % correct (axis 0 to 100) vs. Fo difference (semitones; 0 to 12). Double vowels: Assmann & Summerfield, 200 ms. Sentences: Brokx & Nooteboom.]


Mechanisms of Fo improvement

• A. Across formant grouping by Fo

• B. Better definition of individual formants - especially where harmonics resolved

• B more important than A for double vowels (Culling & Darwin, JASA 1993).

• Also true for sentences?


Fo between two sentences (Bird & Darwin 1998; after Brokx & Nooteboom, 1982)

[Figure: score (axis 0 to 100) vs. Fo difference (semitones; 0 to 10). Curve labelled Normal. 40 subjects, 40 sentence pairs. Perfect Fourth ~4:3 marked.]

Target sentence Fo = 140 Hz

Masking sentence = 140 Hz ± 0, 1, 2, 5, 10 semitones

Two sentences (same talker)

• only voiced consonants

• (with very few stops)

Task: write down target sentence

Replicates & extends Brokx & Nooteboom


Chimeric sentences (Bird & Darwin, Grantham 1998)

100-100 100-106 100-112 100-133 100-178

Fo below 800 Hz Fo above 800 Hz

0 1 2 5 10 semitones
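The number pairs on this slide are consistent with a 100 Hz Fo paired with an Fo raised by 0, 1, 2, 5 and 10 equal-tempered semitones. That reading is an inference from the arithmetic rather than something the slide states, and `shift_semitones` is a hypothetical helper name. A short check:

```python
def shift_semitones(f0_hz, semitones):
    """Frequency lying `semitones` equal-tempered semitones above f0_hz."""
    return f0_hz * 2 ** (semitones / 12)

# Pair a 100 Hz Fo with the shifted Fo, rounded to the nearest Hz.
pairs = [(100, round(shift_semitones(100, s))) for s in (0, 1, 2, 5, 10)]
# pairs == [(100, 100), (100, 106), (100, 112), (100, 133), (100, 178)]

# Five semitones is close to the 4:3 perfect fourth mentioned in the deck.
fourth_ratio = shift_semitones(1.0, 5)
```

The same function applies to the 140 Hz target used for the sentence experiments, where the masking sentence is shifted by the same semitone steps.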


Fo only in low or high freq. regions

[Figure: score (axis 0 to 100) vs. Fo difference (semitones; 0 to 10). Conditions: Normal; Same Fo in High Pass; Same Fo in Low Pass. 40 subjects, 40 sentence pairs.]

• all the action is in the low freq region (<800 Hz)


Segregating Fo-chimeric sentences

[Figure: score (axis 0 to 100) vs. Fo difference (semitones; 0 to 10). Conditions: Normal; Fo-swapped; Same Fo in High Pass; Same Fo in Low Pass. 40 subjects, 40 sentence pairs.]

• inappropriate pairing of Fo only detrimental at / above 4 semitones

• so across-formant grouping only important at / above 4 semitones


Summary of Fo effects in separating competing voices

• Intelligibility increased by small Fo differences in F1 region...

• … but not by Fo differences in higher freq. region.

• Across-formant consistency of Fo only important at larger Fo differences


Hi / Low complementarity

Intelligibility of competing voices increased in:

• Low frequencies: Fo differences allow better estimates of F1

• High frequencies: spatial separation head-shadow -> better S/N ratio (Bronkhorst & Plomp, 1988). But may not be audible?


Harmonicity or regular spacing?

Roberts and Brunstrom: Perceptual coherence of complex tones (2001) J. Acoust. Soc. Am. 110

[Figure: frequency-vs-time schematic with components labelled "adjust" and "mistuned".]

Similar results for harmonic and for linearly frequency-shifted complexes
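The difference between a harmonic complex and a linearly frequency-shifted one is easy to see in the component frequencies. The values below (200 Hz spacing, 50 Hz shift, five components) are illustrative assumptions, not the parameters of Roberts and Brunstrom, and the function names are hypothetical.

```python
def harmonic_complex(f0_hz, n):
    """Component frequencies at integer multiples of f0_hz."""
    return [k * f0_hz for k in range(1, n + 1)]

def shifted_complex(f0_hz, n, shift_hz):
    """Add the same shift (Hz) to every harmonic: the spacing stays f0_hz,
    but the components are no longer integer multiples of f0_hz."""
    return [f + shift_hz for f in harmonic_complex(f0_hz, n)]

harmonic = harmonic_complex(200.0, 5)          # 200, 400, 600, 800, 1000
shifted = shifted_complex(200.0, 5, 50.0)      # 250, 450, 650, 850, 1050
spacing = [b - a for a, b in zip(shifted, shifted[1:])]   # 200 throughout
```

Both complexes keep a regular 200 Hz spacing, but only the first is harmonic. That is why similar results for the two types bear on the question in the slide title.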


Auditory grouping and ICA / BSS

• Do grouping principles work because they provide some degree of statistical independence in a time-frequency space?

• If so, why do the parametric values vary with the task?


Speech music


Speech music


Speech music