Perception + Vocal Tract Physiology November 25, 2014.

Perception + Vocal Tract Physiology

November 25, 2014

Stuff to Remember

• The final homework is due on Thursday!

• Categorical perception

• The final interim course project report is due on Tuesday of next week.

• We’ll do our palatography demo on Tuesday of next week, as well.

• You won’t have to put up with me on Thursday!

• We have a mystery spectrogram to solve!

More Evidence for Modularity

• It has also been observed that speech is perceived multi-modally.

• i.e.: we can perceive it through vision, as well as hearing (or some combination of the two).

• We’re perceiving “gestures”

• …and the gestures are abstract.

• Interesting evidence: McGurk Effect

McGurk Effect, revealed

Audio     Visual     Perceived
ba     +  ga         da
ga     +  ba         bga

• Some interesting facts:

• The McGurk Effect is exceedingly robust.

• Adults show the McGurk Effect more than children.

• Americans show the McGurk Effect more than Japanese.

Original McGurk Data

• Stimulus: auditory ba-ba + visual ga-ga

• Response types:

Auditory: ba-ba     Fused: da-da

Visual: ga-ga       Combo: gabga, bagba

Age      Auditory   Visual   Fused   Combo
3-5      19%        0        81      0
7-8      36         0        64      0
18-40    2          0        98      0

Original McGurk Data

• Stimulus: auditory ga-ga + visual ba-ba

• Response types:

Auditory: ga-ga     Fused: da-da

Visual: ba-ba       Combo: gabga, bagba

Age      Auditory   Visual   Fused   Combo
3-5      57%        10       0       19
7-8      36         21       11      32
18-40    11         31       0       54
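
The age trend in the two tables above can be checked with a few lines of Python. This is only a sketch: the names `fused_pct`, `combo_pct`, and `strongest_group` are made up for illustration, and the numbers are simply the Fused column of the first table and the Combo column of the second.

```python
# Percent of responses that merge the audio and visual signals, by age group.
# Values are copied from the two data tables; all names are illustrative.
fused_pct = {"3-5": 81, "7-8": 64, "18-40": 98}  # audio ba-ba + visual ga-ga
combo_pct = {"3-5": 19, "7-8": 32, "18-40": 54}  # audio ga-ga + visual ba-ba


def strongest_group(table):
    """Return the age group with the highest percentage in `table`."""
    return max(table, key=table.get)


for label, table in (("fused", fused_pct), ("combo", combo_pct)):
    print(label, "responses peak in age group", strongest_group(table))
```

In both conditions the adult group gives the most merged responses, which is the pattern behind "Adults show the McGurk Effect more than children."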

Audio-Visual Sidebar

• Visual cues affect the perception of speech in non-mismatched conditions, as well.

• Scientific studies of lipreading date back to the early twentieth century

• The original goal: improve the speech perception skills of the hearing-impaired

• Note: visual speech cues often complement audio speech cues

• In particular: place of articulation

• However, training people to become better lipreaders has proven difficult…

• Some people get it; some people don’t.

Sumby & Pollack (1954)

• The first to investigate the influence of visual information on the perception of speech by normal-hearing listeners.

• Method:

• Presented individual word tokens to listeners in noise, with simultaneous visual cues.

• Task: identify spoken word

• Clear:

• +10 dB SNR:

• +5 dB SNR:

• 0 dB SNR:
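
The +10, +5, and 0 dB conditions above are signal-to-noise ratios in decibels. As a rough sketch of how a stimulus at a given SNR could be built (this is not Sumby & Pollack's actual procedure; `rms`, `scale_noise_for_snr`, `measured_snr_db`, and the toy sample lists are all hypothetical), the noise can be rescaled until 20·log10(rms of speech / rms of noise) equals the target:

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scale_noise_for_snr(speech, noise, target_db):
    """Return a copy of `noise` rescaled so the speech-to-noise ratio is target_db."""
    target_noise_rms = rms(speech) / (10 ** (target_db / 20))
    gain = target_noise_rms / rms(noise)
    return [n * gain for n in noise]


def measured_snr_db(speech, noise):
    """Signal-to-noise ratio in dB, computed from RMS amplitudes."""
    return 20 * math.log10(rms(speech) / rms(noise))


# Toy signals standing in for real recordings (assumed values).
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, -0.2, 0.3, -0.1]

for target in (10, 5, 0):
    scaled = scale_noise_for_snr(speech, noise, target)
    mixed = [s + n for s, n in zip(speech, scaled)]  # what the listener would hear
    print(target, "dB requested, measured:", round(measured_snr_db(speech, scaled), 6))
```

At 0 dB SNR the speech and the noise have the same RMS level.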

Sumby & Pollack data

[Figure: Sumby & Pollack results, Auditory-Only vs. Audio-Visual]

• Visual cues provide an intelligibility boost equivalent to a 12 dB increase in signal-to-noise ratio.
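
To get a feel for the size of a 12 dB boost, decibels can be converted back to linear ratios. A minimal sketch, assuming the standard definitions (10·log10 for power ratios, 20·log10 for amplitude ratios); the function names are illustrative:

```python
def db_to_power_ratio(db):
    """Convert a decibel difference to a ratio of powers."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Convert a decibel difference to a ratio of amplitudes."""
    return 10 ** (db / 20)


print(round(db_to_power_ratio(12), 1))      # about 15.8
print(round(db_to_amplitude_ratio(12), 1))  # about 4.0
```

In other words, seeing the talker helps about as much as cutting the noise power to roughly one sixteenth.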

Tadoma Method

• Some deaf-blind people learn to perceive speech through the tactile modality, by using the Tadoma method.

Audio-Tactile Perception

• Fowler & Dekle: tested ability of (naive) college students to perceive speech through the Tadoma method.

• Presented synthetic stops auditorily

• Combined with mismatched tactile information:

• Ex: audio /ga/ + tactile /ba/

• Also combined with mismatched orthographic information:

• Ex: audio /ga/ + orthographic /ba/

• Task: listeners reported what they “heard”

• Tactile condition biased listeners more towards “ba” responses

Fowler & Dekle data

[Figure: responses in the orthographic mismatch condition (read “ba”) and in the tactile mismatch condition (felt “ba”)]

Another Piece of the Puzzle

• Another interesting finding which has been used to argue for the “speech is special” theory is duplex perception.

• Take an isolated F3 transition and present it to one ear…

Do the Edges First!

• While presenting this spectral frame to the other ear:

Two Birds with One Spectrogram

• The resulting combo is perceived in duplex fashion:

• One ear hears the F3 “chirp”;

• The other ear hears the combined stimulus as “da”.
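
The dichotic setup can be sketched in code. This only illustrates the layout of the stimulus, not the original materials: the sample rate, durations, and frequencies are made-up values, and the "base" here is a single steady tone standing in for the full spectral frame.

```python
import math

SAMPLE_RATE = 16000  # Hz (assumed)


def chirp(f_start, f_end, duration):
    """Sine wave whose frequency glides linearly from f_start to f_end (Hz)."""
    n = round(SAMPLE_RATE * duration)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        # Phase is the integral of the instantaneous frequency over time.
        phase = 2 * math.pi * (f_start * t + (f_end - f_start) * t * t / (2 * duration))
        samples.append(math.sin(phase))
    return samples


def tone(freq, duration):
    """Steady sine wave at `freq` Hz."""
    n = round(SAMPLE_RATE * duration)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def dichotic(left, right):
    """Pair two mono signals into (left, right) frames, zero-padding the shorter."""
    n = max(len(left), len(right))
    left = left + [0.0] * (n - len(left))
    right = right + [0.0] * (n - len(right))
    return list(zip(left, right))


f3_chirp = chirp(2800, 2500, 0.05)  # hypothetical 50 ms F3 transition
base = tone(500, 0.25)              # hypothetical stand-in for the spectral frame
frames = dichotic(f3_chirp, base)   # chirp to one ear, base to the other
print(len(f3_chirp), len(frames))
```

The key design point is that the two pieces never share a channel: each ear gets only one of them, and any combination has to happen in the listener.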

Duplex Interpretation

• Check out the spectrograms in Praat.

• Mann and Liberman (1983) found:

• Discrimination of the F3 chirps is gradient when they’re in isolation…

• but categorical when combined with the spectral frame.

• (Compare with the F3 discrimination experiment with Japanese and American listeners)

• Interpretation: the “special” speech processor puts the two pieces of the spectrogram together.

fMRI data

• Benson et al. (2001)

• Non-Speech stimuli = notes, chords, and chord progressions on a piano

fMRI data

• Benson et al. (2001)

• Difference in activation for natural speech stimuli versus activation for sinewave speech stimuli

Mirror Neurons

• In the 1990s, researchers in Italy discovered what they called mirror neurons in the brains of macaques.

• Macaques had been trained to make grasping motions with their hands.

• Researchers recorded the activity of single neurons while the monkeys were making these motions.

• Serendipity:

• the same neurons fired when the monkeys saw the researchers making grasping motions.

• a neurological link between perception and action.

• Motor theory claim: same links exist in the human brain, for the perception of speech gestures

Moving On…

• One important lesson to take from the motor theory perspective is:

• The dynamics of speech are generally more important to perception than static acoustic cues.

• Note: visual chimerism and March Madness.

Auditory Chimeras

• Speech waveform + music spectrum:

• Music waveform + speech spectrum:

Frequency bands (for each chimera): 1, 2, 4, 8, 16, 32

Source: http://research.meei.harvard.edu/chimera/chimera_demos.html

Originals:

Auditory Chimeras

• Speech1 waveform + speech2 spectrum:

• Speech2 waveform + speech1 spectrum:

Frequency bands (for each chimera): 1, 2, 4, 6, 8, 16

Originals:

Motor Theory, in a nutshell

• The big idea:

• We perceive speech as abstract “gestures”, not sounds.

• Evidence:

1. The perceptual interpretation of speech differs radically from the acoustic organization of speech sounds

2. Speech perception is multi-modal

3. Direct (visual, tactile) information about gestures can influence/override indirect (acoustic) speech cues

4. Limited top-down access to the primary, acoustic elements of speech

Vocal Tract Physiology

November 25, 2014

The Toolkit

• There are four primary active articulators in speech.

• (articulators we can move around)

1. The lips

2. The lower jaw (mandible)

3. The tongue

4. The velum

• The pharynx can also be constricted, to some extent.

• Separate sets of muscles control each articulator...

Articulatory Speed

• The gold medal goes to the tongue tip...

• which is capable of 7.2 - 9.6 movements per second.

• The rest:

• Mandible 5.9 - 8.4 movements per second

• Back of tongue 5.4 - 8.9

• Velum 5.2 - 7.8

• Lips 5.7 - 7.7

• Note: lips can be raised and lowered faster than they can be protruded and rounded.
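
These rates are easier to compare with segment durations when they are turned into time per movement. A small sketch, assuming only that one movement takes 1/rate seconds; the dictionary and function names are illustrative, and the numbers are copied from the list above:

```python
# Movements per second as (low, high), copied from the list above.
rates = {
    "tongue tip": (7.2, 9.6),
    "mandible": (5.9, 8.4),
    "back of tongue": (5.4, 8.9),
    "velum": (5.2, 7.8),
    "lips": (5.7, 7.7),
}


def ms_per_movement(rate_per_second):
    """Time taken by a single movement, in milliseconds."""
    return 1000.0 / rate_per_second


for name, (low, high) in rates.items():
    quickest = ms_per_movement(high)
    slowest = ms_per_movement(low)
    print(f"{name}: {quickest:.0f}-{slowest:.0f} ms per movement")
```

For the tongue tip this works out to roughly 104-139 ms per movement.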

1. The Lips

• The orbicularis oris muscle surrounds the lips.

• Contraction compresses and rounds the lips.

• A muscle called the mentalis also protrudes the lips.

• Contraction of the risorius muscle retracts the corners of the lips...

• and spreads them.

By the way...

• The vowel [i] is typically produced with active lip spreading.

• “Say cheese!”

• What acoustic effect would this have?

• Lips Normal:

• Lips Spread:

• Check ‘em out in Praat.
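
One way to reason about the question before opening Praat: spreading the lips pulls the corners back and slightly shortens the vocal tract. As a rough sketch, assuming the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips (a simplification, and a poor match for the actual shape of [i]), the resonances are F_n = (2n - 1)c / 4L, so a shorter tube has higher formants. The tube lengths and the speed of sound below are illustrative values, not measurements.

```python
SPEED_OF_SOUND = 35000.0  # cm/s, a common round figure (assumed)


def tube_formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_cm) for n in range(1, count + 1)]


normal = tube_formants(17.5)  # hypothetical tract length with neutral lips
spread = tube_formants(16.5)  # hypothetical, slightly shorter with spread lips

for n, (f_normal, f_spread) in enumerate(zip(normal, spread), start=1):
    print(f"F{n}: {f_normal:.0f} Hz -> {f_spread:.0f} Hz")
```

Under these assumptions every formant moves up when the tube gets shorter; the real recordings will show how closely [i] follows that prediction.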

2. The Jaw

• Several different muscles are used to both lower and raise the mandible.

• Primary raisers:

• Masseter

• Temporalis

• Internal pterygoid

• Lowerers:

• Anterior belly of the digastricus

• Geniohyoid

• Mylohyoid

• Note: in lowering, the mandible also retracts.