Speech Perception
Richard Wright
Linguistics 453
Posted 21-Dec-2015
Class Overview
– Physiology
– Auditory Shaping of the Signal
– Auditory Cues
– Normalization and Context
– Experiment Types
Physiology 1: The Ear
Outer: Pinna, Ear Canal, Ear Drum
Middle: Ossicles, Oval Window
Inner: Cochlea — Basilar Membrane, Tectorial Membrane, Hair Cells
[Figure: diagram of the ear labeling the pinna, ear canal (external auditory meatus), ear drum (tympanic membrane), ossicular chain, oval window, cochlea, and auditory nerve]
Physiology 1: The Outer Ear
Pinna: directional hearing
Ear Canal: high-frequency emphasis (very short resonator closed at one end)
Ear Drum: the membrane's vibrations convert pressure fluctuations into mechanical movement
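The ear canal's high-frequency emphasis follows from treating it as a tube closed at one end (by the ear drum). Such a tube resonates at odd multiples of a quarter wavelength. Here is a minimal sketch. It assumes a speed of sound of 343 m/s and a canal length of about 2.5 cm; both are typical adult values, not figures from the slides.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: speed of sound in air at about 20 °C

def quarter_wave_resonances(length_m, n_modes=3, c=SPEED_OF_SOUND_M_S):
    """Resonances (Hz) of a tube closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_modes + 1)]

# Assumed adult ear canal length of about 2.5 cm (illustrative value)
for n, f in enumerate(quarter_wave_resonances(0.025), start=1):
    print(f"mode {n}: {f:.0f} Hz")
```

With these assumed values the first resonance lands near 3.4 kHz. A shorter canal would push it higher.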
Physiology 1: The Middle Ear
Ossicles (Malleus, Incus, Stapes):
– Convert eardrum movement into movement of the oval window, overcoming the air-to-fluid impedance mismatch
– Emphasize lower frequencies (500-4000 Hz)
– Lessen the impact of very loud noises by stiffening (damping)
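The middle ear's impedance-matching role is often summarized as a pressure gain. Force collected over the large ear drum is delivered to the much smaller oval window, and the ossicular lever adds a small extra boost. This sketch assumes commonly cited textbook approximations: an effective ear drum area of about 55 mm², an oval window of about 3.2 mm², and a lever ratio of about 1.3. These numbers are illustrative and do not come from the slides.

```python
import math

def middle_ear_pressure_gain(drum_area_mm2, window_area_mm2, lever_ratio):
    """Pressure gain = (area ratio) x (ossicular lever ratio); returns (ratio, dB)."""
    ratio = (drum_area_mm2 / window_area_mm2) * lever_ratio
    # Pressure is an amplitude quantity, so the dB conversion uses 20 * log10
    return ratio, 20.0 * math.log10(ratio)

# Assumed approximate anatomical values
ratio, gain_db = middle_ear_pressure_gain(55.0, 3.2, 1.3)
print(f"pressure ratio ~{ratio:.1f}, gain ~{gain_db:.1f} dB")
```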
Physiology 1: The Inner Ear
Cochlea: fluid-filled cavity; movement of the oval window causes waves to propagate through the fluid
Basilar Membrane: stiff and narrow at the base, wide and flaccid at the apex. The base responds to high frequencies and the apex to low frequencies, so the membrane acts like a series of band-pass filters. Most of the membrane is devoted to sounds below 5000 Hz.
Shearing between the basilar and tectorial membranes displaces the hair cells, exciting the cochlear nerve endings
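The place-to-frequency mapping along the basilar membrane is often approximated with the Greenwood (1990) function, F = A(10^(a·x) - k). Here x is the proportion of membrane length measured from the apex. The slides do not name a model. This sketch assumes Greenwood's human parameters (A = 165.4, a = 2.1, k = 0.88). It then inverts the function to estimate how much of the membrane lies below 5000 Hz.

```python
import math

# Greenwood (1990) parameters for the human cochlea (assumed model, not from the slides)
A, a, k = 165.4, 2.1, 0.88

def place_to_freq(x):
    """Characteristic frequency (Hz) at proportion x (0..1) of membrane length from the apex."""
    return A * (10 ** (a * x) - k)

def freq_to_place(f):
    """Inverse mapping: proportion of membrane length from the apex for frequency f (Hz)."""
    return math.log10(f / A + k) / a

print(f"apex: {place_to_freq(0.0):.0f} Hz, base: {place_to_freq(1.0):.0f} Hz")
print(f"share of membrane below 5000 Hz: {freq_to_place(5000.0):.0%}")
```

Under these assumptions roughly 71% of the membrane, measured from the apex, responds to frequencies below 5000 Hz. That is consistent with the claim on the slide.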
Physiology 2: Neural Pathway
Cochlear Nerve → Cochlear Nucleus → Lateral Lemniscus → Auditory Cortex
[Figure: the ascending auditory pathway from the cochlear nerve through the cochlear nucleus, superior olive, lateral lemniscus, inferior colliculus, and medial geniculate to the cortex via the auditory radiations; also labeled: mid-line, CIC, and fiber bundles named for Probst, Monakow, and Held]
Auditory Shaping of the Signal
– Frequency selectivity: equal changes in stimulus frequency do not produce equivalent changes in sensitivity
– Non-linear loudness sensitivity
– Phase locking and noise reduction
– Lateral inhibition and tuning
– Onsets and neural spikes
Frequency Selectivity
[Figure: the Bark function, plotting auditory frequency in Bark (0 to 18) against frequency in Hz (0 to 5000)]
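The Bark function compresses frequency: equal steps in Hz cover progressively fewer Bark as frequency rises. The slide does not say which Bark formula was plotted. This sketch assumes Traunmüller's (1990) approximation, z = 26.81f/(1960 + f) - 0.53. Traunmüller also proposed corrections at the low and high ends of the scale, which are omitted here.

```python
def hz_to_bark(f_hz):
    """Traunmüller (1990) approximation of the Bark scale (assumed formula, no end corrections)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Equal 1000 Hz steps map to shrinking Bark intervals
prev = hz_to_bark(0.0)
for f in range(1000, 6000, 1000):
    z = hz_to_bark(f)
    print(f"{f:>5} Hz -> {z:5.2f} Bark (step {z - prev:.2f})")
    prev = z
```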
Onset Advantage
[Figure: schematic of a speech signal (F1, F2) with a consonant release transient and formant transitions, aligned with an auditory nerve fiber's response: rapid adaptation, short-term adaptation, and a steady state (saturated response) above the fiber's spontaneous level]
Delgutte and Kiang (1984)
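The schematic shows a fiber responding strongly at stimulus onset. The response then decays through rapid and short-term adaptation toward a steady state. One common way to sketch this is a steady-state rate plus two decaying exponentials; that is a modeling assumption, not something stated on the slide. All rates and time constants below are hypothetical illustration values.

```python
import math

def adapted_rate(t_ms, spont=20.0, steady=100.0, rapid_amp=600.0, rapid_tau=2.0,
                 st_amp=150.0, st_tau=50.0):
    """Hypothetical firing rate (spikes/s) of one fiber t_ms after stimulus onset.

    Before onset the fiber fires at its spontaneous rate. After onset the rate is a
    steady-state (saturated) level plus a rapidly adapting and a short-term adapting
    component. All default values are illustrative, not measured.
    """
    if t_ms < 0:
        return spont
    return (steady
            + rapid_amp * math.exp(-t_ms / rapid_tau)
            + st_amp * math.exp(-t_ms / st_tau))

for t in (-5, 0, 1, 5, 20, 100, 300):
    print(f"{t:>4} ms: {adapted_rate(t):6.1f} spikes/s")
```

In this sketch the response in the first few milliseconds is several times the steady-state rate. That illustrates why cues at a signal onset may receive a neural advantage.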
What are Cues?
Cues: information in the signal that listeners use in recovering the segmental content of the utterance
– Place cues
– Manner cues
– Voicing cues
– Vowel quality cues
Distribution of Cues
Place cues
[Figure: schematic of F1, F2, and F3 marking place cues: stop release burst, fricative noise, F2 transitions, nasal pole and zero]
Manner cues
[Figure: schematic of F1, F2, and F3 marking manner cues: stop release burst, fricative noise, nasal pole and zero, abruptness and degree of attenuation, slope of formant transitions, nasalization of vowel]
Voicing cues
[Figure: schematic of F1, F2, and F3 marking voicing cues: release burst amplitude, aspiration noise, VOT, periodicity, stricture duration, and vowel duration]
Stop release bursts are very brief and difficult to recover: stops rely on formant transition cues
Fricative noise, particularly sibilant, contains robust cues: fricatives may be recovered in the absence of formant transitions
Nasals contain strong manner cues but weak place cues
Onset Advantage
Redundancy advantage: onset stops automatically have both a release burst and a set of formant transitions
Coda stops may be unreleased and therefore have less cue redundancy
Onset Advantage
[Figure: onset consonant with flanking vowels; cues marked: F2 transitions, release burst, abrupt attenuation, VOT, vowel length, and constriction duration]
Experimental Tasks
– Identification
– Discrimination
– Rating
– Method of Adjustment (MOA)
Exp. Tasks 1: Identification
Listeners are asked to identify stimuli as speech sounds...
– Open set: response options are open
– Forced choice: listeners' choices are constrained
Experiment 1: Onset vs Coda
Stimuli
– male speaker of American English
– /ba, da, ga, ab, ad, ag/, bursts excised
– 16 bit, 22 kHz
– mixed in three levels of white noise:
• no noise
• noise at 2 dB above RMS of signal
• noise at 2 dB below RMS of signal
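Mixing white noise at a level defined relative to the signal's RMS means scaling the noise to hit a target signal-to-noise ratio. Noise 2 dB above the signal RMS gives an SNR of -2 dB, and noise 2 dB below gives +2 dB. Below is a minimal standard-library sketch. It assumes the signal is a list of float samples; the function names are hypothetical, and a tone stands in for a recorded syllable.

```python
import math
import random

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def add_white_noise(signal, noise_re_signal_db, seed=None):
    """Return (mixed, noise), where noise is Gaussian white noise whose RMS is
    noise_re_signal_db dB relative to the signal RMS (+2.0 -> 2 dB above, -2.0 -> 2 dB below)."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    # The amplitude ratio for a dB difference is 10 ** (dB / 20)
    scale = rms(signal) * 10 ** (noise_re_signal_db / 20.0) / rms(raw)
    noise = [scale * n for n in raw]
    return [s + n for s, n in zip(signal, noise)], noise

# Hypothetical stand-in for a recorded syllable: 100 ms of a 440 Hz tone at 22050 Hz
fs = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * i / fs) for i in range(fs // 10)]
for level in (+2.0, -2.0):
    mixed, noise = add_white_noise(tone, level, seed=1)
    snr_db = 20 * math.log10(rms(tone) / rms(noise))
    print(f"noise {level:+.0f} dB re signal -> SNR {snr_db:+.2f} dB")
```

The no-noise condition is simply the unmodified signal. A real script would also need to check for clipping before writing the mixture back to 16-bit samples, since adding noise raises the peak amplitude.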
Experiment 1: Onset vs Coda
Task
– onsets and codas mixed and randomized
– presented binaurally over headphones
– 3-way forced-choice task: “B D G”
– labeled button press
– self-paced
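The design above uses six syllables, three noise levels, onsets and codas interleaved, and a three-way "B D G" response. It can be sketched as a randomized trial list plus percent correct per cell. The repetition count and the simulated listener below are hypothetical; in a real session the responses would come from the button presses.

```python
import random
from collections import Counter

# Stimulus -> (correct response, syllable position); noise levels as on the slide
SYLLABLES = {"ba": ("B", "onset"), "da": ("D", "onset"), "ga": ("G", "onset"),
             "ab": ("B", "coda"), "ad": ("D", "coda"), "ag": ("G", "coda")}
NOISE_LEVELS = ("no noise", "+2 dB re signal", "-2 dB re signal")
CHOICES = ("B", "D", "G")

def make_trials(reps, seed=None):
    """Every syllable in every noise level, `reps` times, with onsets and codas shuffled together."""
    trials = [(syl, noise) for syl in SYLLABLES for noise in NOISE_LEVELS] * reps
    random.Random(seed).shuffle(trials)
    return trials

def score(trials, responses):
    """Percent correct per (position, noise) cell and confusion counts per (position, target, response)."""
    confusions = Counter()
    correct, total = Counter(), Counter()
    for (syl, noise), resp in zip(trials, responses):
        target, position = SYLLABLES[syl]
        confusions[(position, target, resp)] += 1
        total[(position, noise)] += 1
        correct[(position, noise)] += int(resp == target)
    percent = {cell: 100.0 * correct[cell] / total[cell] for cell in total}
    return percent, confusions

# Hypothetical listener: always right on onsets, guessing on codas
trials = make_trials(reps=10, seed=0)
rng = random.Random(1)
responses = [SYLLABLES[s][0] if SYLLABLES[s][1] == "onset" else rng.choice(CHOICES)
             for s, _ in trials]
percent, confusions = score(trials, responses)
for cell in sorted(percent):
    print(cell, f"{percent[cell]:.0f}%")
```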
Exp. Tasks 2: Discrimination
Listeners are asked to respond “same” or “different” to presented sets of stimuli
AX discrimination: fixed initial stimulus (A), variable second stimulus (X); listeners respond same or different
ABX discrimination: two fixed initial stimuli (A, B), variable third stimulus (X); listeners respond whether X is the same as A or the same as B
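AX responses are usually summarized with two rates. The hit rate is "different" responses on different trials, and the false-alarm rate is "different" responses on same trials. These are often combined into the sensitivity index d'. This sketch uses the simple yes/no formula d' = z(H) - z(F). Signal detection theory gives different values for an AX task under its differencing model, so treat these numbers as a relative index only. The trial counts are hypothetical.

```python
from statistics import NormalDist

def ax_summary(n_diff, n_diff_said_diff, n_same, n_same_said_diff):
    """Hit rate, false-alarm rate, and a yes/no-style d' for AX (same/different) data."""
    # Log-linear correction keeps rates away from 0 and 1, where z() is infinite
    hit = (n_diff_said_diff + 0.5) / (n_diff + 1)
    fa = (n_same_said_diff + 0.5) / (n_same + 1)
    z = NormalDist().inv_cdf
    return hit, fa, z(hit) - z(fa)

# Hypothetical counts for one stimulus pair
hit, fa, d = ax_summary(n_diff=40, n_diff_said_diff=32, n_same=40, n_same_said_diff=8)
print(f"H = {hit:.2f}, F = {fa:.2f}, d' = {d:.2f}")
```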
Experiment 2: Vowel Discrimination
Stimuli
– synthetic vowel continuum
– equal steps: 2.37 Bark along the F1-F2 dimension
– 16 bit, 11 kHz
– variable AX design
Task
– same/different response to vowel pairs
– presented binaurally over headphones
– labeled button press
– speeded (limited time to decide)
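A continuum with equal steps in Bark can be built in three moves. Convert the endpoint formants to Bark, step along a straight line in that space, and convert back to Hz for the synthesizer. The endpoint formant values and the number of steps below are hypothetical. The conversion assumes Traunmüller's (1990) formula and its algebraic inverse, since the slide does not say which conversion was used.

```python
import math

def hz_to_bark(f):
    """Traunmüller (1990) Bark approximation (assumed conversion)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark; valid for z < 26.28."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def bark_continuum(start_hz, toward_hz, step_bark, n_steps):
    """n_steps (F1, F2) pairs in Hz, spaced step_bark apart in F1-F2 Bark space,
    starting at start_hz and moving in a straight line toward toward_hz."""
    s = [hz_to_bark(f) for f in start_hz]
    e = [hz_to_bark(f) for f in toward_hz]
    length = math.dist(s, e)
    unit = [(ei - si) / length for si, ei in zip(s, e)]
    return [tuple(bark_to_hz(si + i * step_bark * ui) for si, ui in zip(s, unit))
            for i in range(n_steps)]

def bark_distance(p, q):
    """Euclidean distance in Bark between two (F1, F2) points given in Hz."""
    return math.dist([hz_to_bark(f) for f in p], [hz_to_bark(f) for f in q])

# Hypothetical endpoints: an [i]-like vowel moving toward an [a]-like vowel
steps = bark_continuum((300.0, 2300.0), (700.0, 1200.0), step_bark=2.37, n_steps=3)
for f1, f2 in steps:
    print(f"F1 = {f1:6.1f} Hz, F2 = {f2:6.1f} Hz")
```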
Exp. Tasks 3: Ratings
Listeners are asked to rate a stimulus in some way: goodness, similarity, accentedness
Example: Effect of intonational contour on naturalness: listeners hear sentences with and without f0 contour and rate naturalness on a 1-5 scale.
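Rating data like these are typically summarized per condition. For this example, that could be the mean and standard deviation of the 1-5 naturalness scores with and without the f0 contour. Below is a minimal sketch with hypothetical ratings; no data are given on the slide.

```python
from statistics import mean, stdev

def summarize_ratings(ratings_by_condition, scale=(1, 5)):
    """Mean and SD per condition; rejects responses outside the rating scale."""
    lo, hi = scale
    summary = {}
    for condition, ratings in ratings_by_condition.items():
        if any(not lo <= r <= hi for r in ratings):
            raise ValueError(f"{condition}: rating outside {lo}-{hi}")
        summary[condition] = (mean(ratings), stdev(ratings))
    return summary

# Hypothetical naturalness ratings from one listener
ratings = {"with f0 contour": [5, 4, 4, 5, 3, 4],
           "without f0 contour": [2, 1, 3, 2, 2, 1]}
for cond, (m, sd) in summarize_ratings(ratings).items():
    print(f"{cond}: mean {m:.2f}, SD {sd:.2f}")
```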
Exp. Tasks 4: MOA
Listeners are asked to adjust a stimulus along one or more dimensions until it fits some criterion: it matches another stimulus, sounds most natural, matches a category, etc. (MOA can be used as an identification, discrimination, or rating experiment.)
Advantages and shortcomings 1
Open identification
– Good: most natural; subjects understand the task
– Bad: time consuming, little control of variables, statistics difficult (responses are not comparable across subjects)
Forced-choice identification
– Good: less time consuming, control of response variables
– Bad: not as natural
Advantages and shortcomings 2
Discrimination
– Good: allows the experimenter to map the relationship between classification and discrimination
– Bad: very time consuming, not at all natural, unintuitive to subjects
Advantages and shortcomings 3
Rating
– Good: allows the experimenter to map preferences in a multidimensional space; allows correlation between one or more aspects of the stimulus
– Bad: hard to control interactions between preferences and stimulus variables; not very natural
Advantages and shortcomings 4
Method of adjustment (MOA)
– Good: much quicker method of mapping multidimensional perceptual space
– Bad: not natural; complex interaction of stimulus variables