introauditorysystem
marcelomagnasco
woods hole, august 2009; c.a.b., octubre 2009
whatexactlyisthecolor“brown”?
our senses parse entire scenes at a time, separating attributes of the objects being looked at (albedo of surfaces) from the attributes of the scene (direction of incident light) from the attributes of our own relationship to the scene (“shiny” reflectivity) into separate percepts.
therefore concepts such as “color” end up being extremely complex because they are anchored in the relationship of an object to an entire scene to the viewer
wemixaudioandvideo
• localiza:onofsimultaneousclickandflash(whicharedisplacedfromoneanother)showwetrusthearing~1/3andvision~2/3
• manipula:onofvideostreamofhumanvocaliza:onshowswe“hear”differentsyllableswhenthevideoandaudioaremismatched
arichauditoryworldIheartherainpaHeringontheroofaboveme,drippingdownthewallstomyleJandright,splashingfromthedrainpipeatgroundlevelonmyleJ,whilefurtherovertotheleJthereisalighterpatchastherainfallsalmostinaudiblyuponalargeleafyshrub.Ontheright,itisdrumming,withadeeper,steadiersounduponthelawn.Icanevenmakeoutthecontoursofthelawn,whichrisestotherightinaliHlehill.Thesoundoftherainisdifferentandshapesoutthecurvatureforme.S:llfurtherto the right, I hear the rain sounding upon the fencewhich divides our propertyfromthatnextdoor.Infront,thecontoursofthepathandthestepsaremarkedout,right down to the garden gate. Here the rain is striking the concrete, here it issplashing into the shallow pools which have already formed. Here and there is alightcascadeasitdripsfromsteptostep.Thesoundonthepathisquitedifferentfromthesoundoftheraindrummingintothelawnontheright,andthisisdifferentagainfromtheblanketed,heavy,soddenfeelofthelargebushontheleJ.Furtherout, the sounds are less detailed. I can hear the rain falling on the road, and theswishofthecarsthatpassupanddown.IcanheartherushingofthewaterinthefloodedguHerontheedgeoftheroad.
JohnHull,TouchingtheRock
numbersgiveclues
• 200’000’000photoreceptorsintwore:nas• 100’000’000olfactoryreceptors• 20’000’000tac:le‐noci‐andpropio‐receptors
• 7’000innerhaircells
verydifferentdensityofinforma:oninbits/sec/cell
what’s in a voice?
the text the volume of the utterance the emotional stance the identity of the speaker the speaker’s accent the distance to the speaker the position of the speaker the orientation of the speaker an impression of the room
a multitude of percepts!
I can tell when other things are moving by the sounds they make. Cars swish past, feet patter along, leaves rustle, but a silent nature is immobile. So it is that, for me, the clouds do not move.
John Hull, Touching the Rock
we hear things that move
many sounds have survival and evolutionary importance: we perceive things that make noise
that includes living beings, especially predators, prey, offspring and mates,
and many inanimate things on which our survival hinges, such as fire, water, wind and earth.
why sounds wake us up
we can even hear things that do not make sounds
a current project: sound textures
sounds such as rain, fire crackling, brook babbling, windblown leaves
they are important because: • they have great survival (ecological) value, • we inattentively recognize them extremely fast, • they are powerful modulators of emotional state.
:meinvariance
/2 /4 normal speed *2 *4
introauditorypathways
thepathwayofsound
earlycodingandtransduc:on
thecochleaislikeacamera
the cochlear partition: the organ of Corti sits atop the basilar membrane
oval window sound waves enter
sound waves exit and dissipate
impenetrable boundary condition(bone)
basal end
apical end
round window
for acoustical purposes, the cochlea is a fluid filled cavity encased in bone, and separated into two distinct partitions by a membrane of exponentially varying stiffness
sound is focused by an acoustical lens onto a sound-sensitive film
thecochleaislikeacamera
high frequency
low frequency
the basilar membrane vibrates with different amplitudes in different places according to the frequency of sound
fluid mechanics can be formulated as a free-boundary problem with the b.m. as boundary—long-range kernel and other issues.
traveling wave—not standing wave!—from basal to apical
theearisac:ve
live cochlea
dead cochlea
both the cochlear high gain and the sharpness of its frequency tuning depend on a biological power supply. a power interruption (by pressing the carotid artery, for instance) reversibly abolishes both.
in addition, the cochlea displays nonlinear responses which are similarly abolished by interrupting the power supply. the discovery by rhodes of the active nonlinearity (1971) ushered a new era in auditory biophysics.
as far as we can gather, the active elements are the hair cells.
thecochleaislikeacamera
the acoustical lens is in the organ of Corti
the sound-sensitive film is also in the organ of Corti
in this camera, the lens and the film are one and the same: the hair cells of the organ of Corti are responsible for both sensitivity and frequency selectivity!
theorganofcor:
kandel
stereociliaarejoinedby:p
links
bechara kachar et al
tuningcurves
pick any observable—spike rates in fibers of the eighth nerve, velocity, suppression of a SOA line. Now, apply a sinusoidal input and vary the amplitude until the response reaches a prescribed level; record that amplitude as a function of the frequency and sweep the frequency.
I
f
many tuning curves (across species and observables) have a stereotyped shape:
cochleartuningcurve
ruggero ‘00
240
db/o
ctav
e= fr
eq40
=20
pole
s
fourcharacteris:csoftheac:veprocess
• gain• frequencyselec:vity• compressivenonlinearity
• spontaneousemissions
all four characteristics have been well documented in basilar-membrane motion from lower vertebrates to mammals.
no experimental manipulation has been able to abolish one without abolishing all.
geometryofhopf
neartheresonance
when is big then
on the contrary, exactly at the resonance,
which is an “essential nonlinearity”. V. M. Eguiluz , M. Ospeck , Y. Choe , A. J. Hudspeth and M. O. Magnasco, PRL 84 5232-5236 (2000)
thehopfresonance
cochlearvelocimetryhopf behaviour
nonlinearly compressive for all frequencies > center frequency!
ruggero et al ‘00
propaga:on+hopf=
parallel active elements version; the series version is more complex but also has more interesting behaviour
geometryoftheefferents
gaincontrol
along basilar membrane
amplitude of BM motion (log)
gain
spot having f as BF
best place to control
three:mesacochlea?The tympanic ear evolved three times independently from the ancestor fish ear. E.g., ossicles and collumella are not homologous. A number of convergent evolutionary features were evolved which indicates certain features are evolutionarily stable. Mammals, birds and lizards all evolved elongated papillae/cochleae and two types of hair cells.
Recent evidence suggests the two types of hair cells are actually homologous, and represents a specialization in two classes, one for sensing and the other one for amplifying.
from Manley, PNAS 2002
introauditorycoding
auditorynervefibers
the fibers of the eighth cranial nerve fire in response to sounds. each fiber has a frequency to which it is most sensitive. near the threshold of the fiber, the number of action potentials evoked by the stimulus increases with stimulus intensity, but it rapidly saturates.
for all stimuli intensity that evoke potentials, the fiber fires always at specific phases of the stimulus. they may not fire every cycle but in subharmonic lock, particularly at high frequencies. this phase lock has been shown in mammals up to 4 kHz, and up to 9 kHz in owls.
therefore any given fiber carefully encodes the driving frequency, and encodes very little information about intensity.
encodingintheauditorynerve
sopranos
• cherylstuder
• editagruberova
• florencefoster‐jenkins
michale fee
phaseandintensity
theauditorysystempreservesphaseinforma:onmuchmorecarefullythanintensityinforma:on:
• discrimina:oninintensity:1dB(26%changeinenergy)
• discrimina:oninfrequency:1/12thsemitone(0.5%changeinf).
• discrimina:oninheading:15degrees=20µs.(needsheadphones–echoesinterfere)
445.7
441.9
12% v 26% E
26% v 58% E
€
χ(ω,t) = eiω(t− t ' )∫ e−( t− t ' )2
2σ 2 x(t') dt '
theshort:meFouriertransform(a.k.a. Gabor transform, a.k.a. single wavelet)
Fourier
time localiz
signal
amplitude
phase
theshort:meFouriertransform
1/σ
σ
ω
t
the stft depends only on nearby data: a neighbourhood of size σ in time, and of size 1/σ in frequency. therefore all nearby points are strongly correlated, and so the amplitude of the stft appears to be “out of focus”. making σ smaller only succeeds, of course, in making 1/σ bigger, so the area of the “grain” is constant.
simplest thing one can do with the phase is take derivatives
instantaneous:me‐frequency
instantaneous:mepicture
simplest thing one can do with the phase is take derivatives
now that we have these two estimates, the easiest thing to do is to plot one against the other and dispense with (w,t) entirely!
instantaneous:me‐frequency
€
ω ins(ω,t) =∂φ∂t
tins(ω,t) = t − ∂φ∂ω
simpleelements
€
x = eiω 0t x = δ(t − t0) x = eiαt2 / 2
ω ins(ω,t) =∂φ∂t
ω0 ω αtins
tins(ω,t) = t − ∂φ∂ω
t t0 ...
any linear element in the time-frequency plane, like a tone, a click, or a linear frequency sweep (calculation mercifully spared and available upon request) is mapped by the instantaneous reassignment to a one-dimensional, perfectly thin line. meanwhile, the original stft is still out of focus, with this line blurred by (σ,1/σ). no linear transform has this property. only the (bilinear) wigner-ville distribution localizes these signals exactly.
the implications for the fourier uncertainty theorem are clear. uncertainty in the time-frequency plane refers to resolution, i.e., the ability to distinguish two objects as distinct. any single object can be tracked in both frequency and time to arbitrary accuracy.
K. Kodera, R. Gendrin, and C. de Villedary, ``Analysis of time-varying signals with small BT values”, IEEE Trans. ASSP, 26.1 64-76 (1978). F. Auger and P. Flandrin, ``Improving the readability of time-frequency and time-scale representations by the reassignment method,'' IEEE Trans. on Signal Proc. {\bf 43}, 1068-1089 (1995). E. Chassande-Mottin, I. Daubechies, F. Auger, P. Flandrin, ``Differential reassignment,” IEEE Signal Proc. Lett., 410, 293-294 (1997).
themap
ωins
t
tins
ω
the instantaneous estimates define a transformation
ωins
t
tins
ω
the instantaneous estimates define a transformation taking points from ωt to points in ωtins
ωins
t
tins
ω
taking a grid of points from ωt to a distorted grid of points in ωtins
ωins
t
tins
ω
taking a grid of points from ωt to a distorted grid of points in ωtins, which can then be put on a two-dimensional histogram
ωins
t
tins
ω
taking a grid of points from ωt to a distorted grid of points in ωtins, which can then be put on a two-dimensional histogram and counted
remappingif one takes a fine grid of points, maps it through the instantaneous time map, and then “histograms” the imaged points, one obtains a measure µ (or better stated, the derivative of µ wrt lebesgue (λ), which is of course singular):
if we denote by γ the measure weighted by the STFT
then we can also define
which is equivalent to making a histogram of the points in the grid, mapped by the it map, and weighted by the stfft at the original point.
whitenoise
• attheoppositeendof“singlelinearelements”wehavewhitenoise:astreamofuncorrelatedgaussianrandomnumbers.
• whitenoiseisthe“densest”signal:i.e.,there’sstuffhappeningatall:mesandallfrequencies.
• whiletheexpecta:onvalueofthespectrumofwhitenoiseisaconstant,theactualspectrumforanygivenrealiza:onhasstochas:cvaluesandfluctuatesasmuchinfrequencydomainasitdoesinthe:medomain.
zebra finch, regular sonogram
zebra finch, reassigned multiband
?
Top Related