Tutorial on Auditory Scene Analysis Perception and...

Tutorial on Auditory Scene Analysis: Perception and Physiology. Shihab Shamma, Institute for Systems Research, Electrical and Computer Engineering, University of Maryland College Park. With additional contribution from Christophe Micheyl, Research Laboratory of Electronics, Massachusetts Institute of Technology.

Transcript of Tutorial on Auditory Scene Analysis Perception and...

  • Tutorial on Auditory Scene Analysis

    Perception and Physiology

    Shihab Shamma, Institute for Systems Research

    Electrical and Computer Engineering, University of Maryland College Park

    With additional contribution from

    Christophe Micheyl, Research Laboratory of Electronics

    Massachusetts Institute of Technology

  • An auditory scene

    [Figure axes: Frequency vs. Time]

  • Singing voice

    [Figure axes: Frequency vs. Time]

  • Two classes of ASA processes

    Simultaneous processes

    Sequential processes

    [Figure axes: Frequency vs. Time]

  • Schema-based: top-down, under attentional control, dependent upon learning

    Primitive: bottom-up, automatic, not dependent on learning

    [Diagram labels: schemas, objects, stimulus]

  • Outline

    Part I: Psychoacoustics of ASA

    Part II: Neural Correlates of Two-Tone Streaming

    Two excellent references:

    A. Bregman’s book (1990), Auditory Scene Analysis

    B. C. J. Moore & H. Gockel, a recent review (2002), “Factors influencing sequential stream segregation”, Acta Acustica 88, 320-332

  • Outline of Part I

    Sequential ASA processes: streaming

    • (What is it?) The perceptual phenomenon

    • (How does it work?) Theories and computational models

    • (How does it really work?) Neural mechanisms

    • (What’s it good for?) Relationships with other aspects of perception

    Simultaneous ASA processes: hearing out concurrent sounds

    • The identification of concurrent vowels

    • Concurrent harmonic complexes: the role of frequency selectivity

  • Auditory streaming

    What is it?

    Description and demonstration of the phenomenon

  • Miller & Heise (1950), Bregman & Campbell (1971), … Bregman (1990), …

    [Figure: alternating tone sequence A B A B … with frequency separation dF; axes: Frequency vs. Time]

  • [Figure: alternating tone sequence A B A B …; axes: Frequency vs. Time]

    “1 stream of sounds jumping up and down in pitch”

  • [Figure: alternating tone sequence A B A B with frequency separation dF; axes: Frequency vs. Time]

  • [Figure: the A tones (A A A A …) and the B tones (B B B B …) drawn as two separate sequences; axes: Frequency vs. Time]

    “2 streams, one high, one low”

    Note: you can only attend to one stream at a time

  • [Figure: repeating tone sequence A B A A B A …; axes: Frequency vs. Time]

  • [Figure: repeating tone sequence A B A A B A …; axes: Frequency vs. Time]

    “1 stream with a galloping rhythm”

  • “2 streams, one high and slow, the other low and fast”

    [Figure: B tones (B B …) and A tones (A A A A …) drawn as two separate sequences; axes: Frequency vs. Time]

    Note: when streamed, the relative timing between A and B tones becomes less important.

  • Streaming also depends on temporal parameters

    [Figure: A B A … sequences at two rates, labeled Slow and Fast, with time interval dt; axes: Frequency vs. Time]

  • Streaming also depends on connectedness (continuation)

    [Figure: two panels of A and B tone sequences; axes: Frequency vs. Time]

  • Dependence of streaming on stimulus parameters

    ABA_ stimulus spectrogram (parameters: dF, dT)

    [Figure, after van Noorden (1975): dF plotted against tone repetition rate (Fast to Slow). Three regions: always 2 streams (above the temporal coherence boundary), 1 or 2 streams (between the two boundaries), always 1 stream (below the fission boundary)]
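
The ABA_ stimulus and its two parameters can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the cited studies: the function name, the choice to express dF in semitones, and the reading of dT as the onset-to-onset interval between successive tones are all assumptions made for the example.

```python
def aba_sequence(f_a_hz, df_semitones, dt_s, n_triplets):
    """Return (onset_s, freq_hz, label) events for a repeating ABA_ pattern.

    Assumptions (not from the slides): the B tone lies df_semitones above
    the A tone, and dt_s is the onset-to-onset interval between tones.
    Each triplet fills four slots of dt_s: A, B, A, then a silent slot.
    """
    f_b_hz = f_a_hz * 2.0 ** (df_semitones / 12.0)
    events = []
    for k in range(n_triplets):
        t0 = 4 * k * dt_s  # start time of the k-th triplet
        events.append((t0, f_a_hz, "A"))
        events.append((t0 + dt_s, f_b_hz, "B"))
        events.append((t0 + 2 * dt_s, f_a_hz, "A"))
    return events


if __name__ == "__main__":
    # Hypothetical values: 1000 Hz A tone, dF = 6 semitones, dT = 100 ms
    for onset, freq, label in aba_sequence(1000.0, 6, 0.1, 2):
        print(f"{onset:.1f} s  {label}  {freq:.1f} Hz")
```

Raising `df_semitones` or shortening `dt_s` moves the stimulus toward the "always 2 streams" region of the diagram; lowering the separation or slowing the rate moves it toward "always 1 stream".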

  • Streaming

    How does it work?

    Theories and computational models

  • The channeling theory

    Hartmann and Johnson (1991) Music Percept.

    [Figure: bank of peripheral auditory filters; axes: Level vs. Frequency]

  • The channeling theory

    Hartmann and Johnson (1991) Music Percept.

    “1 stream”

    [Figure: tones A and B adjacent on the frequency axis; axes: Level vs. Frequency]

  • The channeling theory

    Hartmann and Johnson (1991) Music Percept.

    “2 streams”

    [Figure: tones A and B spaced apart on the frequency axis; axes: Level vs. Frequency]
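
The idea behind the channeling theory, that tones passing through the same peripheral filter are heard as one stream and tones passing through different filters as two, can be made concrete with a toy calculation. The slides do not specify a filter model; the sketch below assumes the ERB-number scale of Glasberg and Moore (1990) as a stand-in for peripheral channels, and the one-ERB threshold is purely hypothetical. It also ignores the dependence on repetition rate shown earlier.

```python
import math


def erb_number(freq_hz):
    """ERB-number (Cam) scale of Glasberg & Moore (1990)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)


def channel_separation(f_a_hz, f_b_hz):
    """Distance between two tones in ERB-number units."""
    return abs(erb_number(f_b_hz) - erb_number(f_a_hz))


def channeling_prediction(f_a_hz, f_b_hz, threshold_erb=1.0):
    """Toy rule (threshold is an assumption, not from the slides):
    tones more than threshold_erb apart are labeled '2 streams'."""
    if channel_separation(f_a_hz, f_b_hz) > threshold_erb:
        return "2 streams"
    return "1 stream"


if __name__ == "__main__":
    for f_b in (1050.0, 2000.0):
        sep = channel_separation(1000.0, f_b)
        print(f"A=1000 Hz, B={f_b:.0f} Hz: {sep:.2f} ERB ->",
              channeling_prediction(1000.0, f_b))
```

A rule of this kind cannot, by construction, separate two sounds that occupy the same frequency region, which is the limitation the following slides address.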

  • Beauvois & Meddis’s model

    Beauvois and Meddis (1996) J. Acoust. Soc. Am.

    Computer simulation of auditory stream segregation in alternating-tone sequences

  • McCabe & Denham’s model

    McCabe and Denham (1997) J. Acoust. Soc. Am.

    A model of auditory streaming

  • Is peripheral channeling the whole story?

  • Sounds that excite the same peripheral channels can yield streaming

    Vliegen & Oxenham (1999)

    Vliegen, Moore, Oxenham (1999)

    Grimault, Micheyl, Carlyon et al. (2001)

    Grimault, Bacon, Micheyl (2002)

    Roberts, Glasberg, Moore (2002)

    ...

  • Streaming with complex tones

    [Figure: two harmonic line spectra; axes: Amplitude vs. Frequency. One complex has F0 = 400 Hz with components at 400 Hz, 800 Hz, 1200 Hz, …; the other has F0 = 150 Hz with components at 150 Hz, 300 Hz, 450 Hz, …]
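
The component frequencies of these complex tones are integer multiples of F0. The short sketch below lists them and, looking ahead to the bandpass-filtered complexes discussed next, picks out the harmonics that fall inside a given frequency region. The function names and the 1000-2000 Hz passband are assumptions chosen for illustration only.

```python
def harmonic_frequencies(f0_hz, max_freq_hz):
    """Frequencies of all harmonics n * F0 up to and including max_freq_hz."""
    n_max = int(max_freq_hz // f0_hz)
    return [n * f0_hz for n in range(1, n_max + 1)]


def in_passband(f0_hz, low_hz, high_hz):
    """Harmonics of F0 that fall inside the region [low_hz, high_hz]."""
    return [f for f in harmonic_frequencies(f0_hz, high_hz) if f >= low_hz]


if __name__ == "__main__":
    print(harmonic_frequencies(400, 1200))  # [400, 800, 1200]
    print(harmonic_frequencies(150, 450))   # [150, 300, 450]
    # Hypothetical passband: both complexes place energy in the same region,
    # but the lower F0 packs more harmonics into it.
    print(in_passband(400, 1000, 2000))
    print(in_passband(150, 1000, 2000))
```

Two complexes filtered into the same region excite the same peripheral channels while still differing in F0, which is what makes them useful for testing whether channeling alone explains streaming.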

  • Spectral Grouping or “Fusion” of Harmonics

    Mistuning a harmonic

    • Fusion is found in humans and many animals alike

    • Fusion also breaks with onset mismatches

  • Streaming based on F0 differences

    [Figure: three panels, each showing a repeating A B A A B A … sequence against Time; vertical axes: Frequency, Frequency, and F0]

    Musical melodies also stream

    Telemann

  • Auditory spectral excitation pattern evoked by bandpass-filtered harmonic complex

    [Figure: excitation pattern for F0 = 400 Hz; axes: Level (dB), 5 to 45, vs. Frequency (Hz), 1000 to 6000]

  • Auditory spectral excitation pattern evoked by bandpass-filtered harmonic complex

    F0A = 100 Hz, F0B = F0A + 1.5 oct = 283 Hz

    [Figure panels: Small ∆FAB, Large ∆FAB]
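
As a check on the figure quoted on this slide, raising a frequency by 1.5 octaves multiplies it by 2^1.5. The same interval in semitones (12 per octave) is included for comparison with the semitone axes used on other slides.

```latex
\[
F0_B = F0_A \cdot 2^{1.5} = 100\ \text{Hz} \times 2.828 \approx 283\ \text{Hz},
\qquad
1.5\ \text{oct} \times 12\ \tfrac{\text{semitones}}{\text{oct}} = 18\ \text{semitones}.
\]
```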

  • F0-based streaming with unresolved harmonics is possible

    Vliegen & Oxenham; Vliegen, Moore, Oxenham (1999)

    Grimault, Micheyl, Carlyon et al. (2000)

    but the effect is weaker than with resolved harmonics

    Grimault, Micheyl, Carlyon et al. (2000)

    [Figure: probability of "2 streams" response (0.0 to 1.0) vs. F0 difference (semitones, -6 to 18); F0(A) = 250 Hz; curve labels: Low region, High region, Resolved, Unresolved]

    From: Grimault et al. (2000) JASA 108, 263-

  • Phase-based streaming

    Roberts, Glasberg, Moore (2002)

    Harmonics in sine phase: φ(n) = 0

    Harmonics in alternating phase: φ(n) = 0 for odd n, φ(n) = 90° for even n
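
The two phase conditions differ only in the starting phase assigned to each harmonic, so the two complexes share the same component frequencies and amplitudes. A minimal sketch of how such stimuli might be constructed follows; the function names, the equal component amplitudes, the sample rate, and the number of harmonics are assumptions for illustration and are not taken from the cited study.

```python
import math


def harmonic_phases(n_harmonics, alternating=False):
    """Starting phase in degrees for harmonics 1..n_harmonics.

    Sine phase: 0 for every harmonic.
    Alternating phase: 0 for odd n, 90 for even n.
    """
    return [90.0 if (alternating and n % 2 == 0) else 0.0
            for n in range(1, n_harmonics + 1)]


def complex_tone(f0_hz, phases_deg, dur_s=0.05, sample_rate=16000):
    """Sum of equal-amplitude sinusoids at n * F0 with the given phases."""
    n_samples = int(round(dur_s * sample_rate))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        samples.append(sum(
            math.sin(2 * math.pi * (k + 1) * f0_hz * t + math.radians(p))
            for k, p in enumerate(phases_deg)))
    return samples


if __name__ == "__main__":
    sine = complex_tone(100.0, harmonic_phases(4))
    alt = complex_tone(100.0, harmonic_phases(4, alternating=True))
    print(len(sine), sine[0], alt[0])
```

Because only the phases change, any streaming between the two conditions cannot be attributed to a difference in the long-term magnitude spectrum.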

  • Streaming Based on Timbre

    Different Spectral Envelopes

    [Figure panels: Ripple A, Ripple B, and an A-B-A sequence]

  • What is it good for?

    Organizing auditory scenes into different sources:

    • foreground-background

    • parsing speakers and speech

    • ignoring distractions

    [Panel labels: Harmonic Segregation, FM Harmonics, Continuity Illusion, Ignoring Distractions]

  • Interim Summary

    • The formation of auditory streams is determined partly by peripheral frequency selectivity

    • Streaming may be produced by sounds that excite the same peripheral channels

    • What matters is the perceptual difference between the streamed sounds

    • Perceptual difference is created by simultaneous (primitive) processes: harmonicity; onset and offset detection; analysis of spectral shape.

    • Curiously … sound localization (e.g., ITD) does not behave as a primitive process

  • Neural Correlates of Two-Tone Streaming

  • A basic prerequisite for any neural correlate of streaming:

    it must depend on both dF and dT

    [Figure: dF plotted against tone repetition rate; regions: always 2 streams, 1 or 2 streams, always 1 stream; boundaries: fission boundary, temporal coherence boundary]

  • Single/few/multi-unit intra-cortical recordings

    Monkeys: Fishman et al. (2001) Hear. Res. 151, 167-187

    Bats: Kanwal, Medvedev, Micheyl (2003) Neural Networks

    At low repetition rates, units respond to both on- and off-BF tones (BF: best frequency)

    At high repetition rates, only the on-BF tone response is visible

  • Maybe, but:

    That neural responses in auditory cortex depend on both ∆F and ∆T is hardly a surprise

    This is insufficient evidence that streaming is reflected in neural responses in the auditory cortex

    A much more convincing correlate of streaming would be obtained if neural responses were shown to co-vary with the percept while the physical stimulus remains unchanged

    ...

  • Ambiguous stimuli, bi-stable percepts

    Necker’s cube, Rubin’s vase-faces

    have been used successfully in the past to demonstrate single-unit correlates of visual percepts (not just stimulus parameters)

    e.g., Logothetis & Schall (1989) Science

    Leopold & Logothetis (1996) Nature

    [Figure: Necker's cube]

  • The build-up of auditory streaming:

    a systematic change in the auditory percept over time during prolonged listening to repeating sequences

    [Figure: probability of '2 streams' response (0.0 to 1.0) vs. Time (s), 0 to 9; curves labeled 1 ST, 3 ST, 6 ST, 9 ST]

    ST: semitone

  • The break-down of apparent motion

    Wertheimer (1912), Anstis et al. (1985)

    [Figure: two dots, A and B, shown in alternation]

    fast rates or large distances: two dots lit alternately

    slow rates & small distances: one dot moving

    intermediate parameters: apparent movement at first, then steady dots

  • Explanations for perceptual breakdown/buildup effects

    Neurophysiological explanation: neural adaptation of coherence/pitch-motion detectors (Anstis & Saida, 1985)

    « Cognitive » explanation: the default is integration (1 stream); the brain needs to accumulate evidence that there is more than 1 stream before declaring « 2 streams » (Bregman, 1978, 1990, …)

    Other explanations coming up …

  • Alternate Models & Experimental Paradigms

    • It is essential that neural recordings and perception occur simultaneously

    • Human fMRI and MEG studies are valuable - up to a point!

    • Animal studies are physiologically versatile - but introspective behavioral measures are not an option!

    Therefore, we critically need …

    1. Cortical representations of perception that integrate both spectral and dynamic features - to account for all perceptual distances

    2. Objective psychoacoustic measures to facilitate animal experimentation

    3. Characterization of adaptive processes during perception

  • Spectro-Temporal Models of Streaming

  • Acknowledgment

    Cortical Physiology and Auditory Computations: Jonathan Fritz, Didier Depireux, David Klein, Jonathan Simon

    Auditory Speech and Music Processing: Taishih Chi, Mounya ElHilali, Powen Ru, Nima Mesgarani

    Supported by:

    MURI # N00014-97-1-0501 from the Office of Naval Research

    # NIDCD T32 DC00046-01 from the NIDCD

    # NSFD CD8803012 from the National Science Foundation
