Phonetic Context Effects Major Theories of Speech Perception Motor Theory: Specialized module (later...

Phonetic Context Effects

Major Theories of Speech Perception

Motor Theory: A specialized module (later version) represents speech sounds in terms of intended gestures, through a model or knowledge of the vocal tract.

Direct Realism: The perceptual system recovers (phonetically relevant) gestures by picking up the specifying information in the speech signal.

General Approaches: Speech is processed in the same way as other sounds. Representation is a function of the auditory system and experience with language.

Explanatory level = gesture (Motor Theory, Direct Realism)

Explanatory level = sound (General Approaches)

Fluent Speech Production

Adjacent speech becomes more similar

The vocal tract is subject to physical constraints...

Mass

Inertia

Coarticulation = Assimilation

Radical Context Dependency

Also a result of the motor plan

An Example: Place of Articulation in Stops

Say /da/ Say /ga/

Anterior Posterior


Say /al/ Say /ar/

Anterior Posterior


Say /al da/ Say /ar da/

Say /al ga/ Say /ar ga/

Place of articulation changes = Coarticulation


Say /ar da/ Say /al ga/

Coarticulation has acoustical consequences

[Figure: spectrograms of /al da/, /ar da/, /al ga/, and /ar ga/]

How does the listener deal with this?

Speech Perception

Identifying in Context

[Figure: percent “g” responses across a 7-step [ga]-[da] series, with no precursor, /al/, or /ar/]

Direction of Effect

Production

/al/: More /da/-like

/ar/: More /ga/-like

Perception

/al/: More /ga/-like

/ar/: More /da/-like

Perceptual Compensation for Coarticulation

What happens when there is no coarticulation?

AT&T Natural Voices Text-To-Speech Engine

“ALL DA” “ARE GA”

Further Findings

4½-month-old infants (Fowler et al., 1990)

Native Japanese listeners who do not discriminate /al/ from /ar/ (Mann, 1986)

“There may exist a universally shared level where representation of speech sounds more closely corresponds to articulatory gestures that give rise to the speech signal.” (Mann, 1986)

“Presumably human listeners possess implicit knowledge of coarticulation.” (Repp, 1982)

Theoretical Interpretations


Motor Theory: “Knowledge” of coarticulation allows the perceptual system to compensate for its predicted effects on the speech signal.

Direct Realism: Coarticulation is information for the gestures involved. The signal is parsed along gestural lines, and coarticulation is assigned to its gesture.

General Approaches: Those other guys are wrong.

Theoretical Interpretations

Common Thread: Detailed correspondence between speech production and perception

Special Module for Speech Perception

Two Predictions:
• Talker-specific
• Speech-specific

Testing Hypothesis #1: Talker-specific

Compensation should occur only for speech coming from a single speaker

Testing Hypothesis #1: Talker-specific

Precursors: Male /al/, Male /ar/, Female /al/, Female /ar/

Target: Male /da/-/ga/

[Figure: mean context shift for Male vs. Female precursors]
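The talker-specific test is summarized as a “mean context shift.” The slide does not define that measure, so the sketch below assumes it is the average difference in percent “ga” responses between the /al/ and /ar/ identification curves, taken across all steps of the series. The function name `context_shift` and the response values are hypothetical, invented for illustration only. Under the talker-specific hypothesis, the shift for female precursors (a different talker from the male target) should be near zero.

```python
def context_shift(pct_ga_after_al, pct_ga_after_ar):
    """Mean gap, in percentage points, between two identification curves.

    Assumption: the shift is the /al/ curve minus the /ar/ curve, averaged
    over every step of the series. Positive values mean more "ga" after /al/.
    """
    if len(pct_ga_after_al) != len(pct_ga_after_ar) or not pct_ga_after_al:
        raise ValueError("curves must be non-empty and the same length")
    gaps = [al - ar for al, ar in zip(pct_ga_after_al, pct_ga_after_ar)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Invented percent-"ga" values for a 7-step series (NOT data from the study).
    after_al = [98, 95, 90, 75, 50, 20, 5]
    after_ar = [95, 85, 60, 35, 15, 5, 2]
    print(f"context shift: {context_shift(after_al, after_ar):.1f} points")
```

Averaging over all steps is only one choice; a shift in the category boundary (the 50% crossover) would be an equally reasonable summary.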

Testing Hypothesis #2: Speech-specific

Compensation should occur only for speech sounds

[Figure: /al/ and /ar/ precursors, each as SPEECH and as TONES]

Testing Hypothesis #2

[Figure: mean % /ga/ responses by condition (Speech vs. Non-Speech) after /al/ and /ar/ precursors]

Does this rule out motor theory?

It may be that the special speech module is broadly tuned: if a sound acts like speech, it goes through the speech module; if not, it does not.

[Figure: /al/ and /ar/ SPEECH PRECURSORS]

Training the Quail

CV series varying in F3 onset frequency, from /da/ to /ga/ (steps 1-7)

Training: steps 1, 2, 6, 7

Withheld from training: steps 3, 4, 5 (in both the /al/ and /ar/ conditions)

Context-Dependent Speech Perception by an Avian Species

[Figure: normalized response (pecks or “GA” responses)]

Conclusions

1. Links to speech production are not necessary: neither speech-specific nor species-specific
2. Learning is not necessary: quail had no experience with covariation
3. General auditory processes play a substantive role in maintaining perceptual compensation for coarticulation


Motor Theory: “Knowledge” of coarticulation allows the perceptual system to compensate for its predicted effects on the speech signal.

Direct Realism: Coarticulation is information for the gestures involved. The signal is parsed along gestural lines, and coarticulation is assigned to its gesture.

General Approaches: General Auditory Processes (GAP)

Effects of Context: A Familiar Example

How well does this analogy hold up for context effects in speech?

Effects of Context: A Familiar Example

[Figure: spectrograms of /al da/, /ar da/, /al ga/, and /ar ga/]

Hypothesis: Spectral Contrast, the Case of [ar]

Production: F3 assimilated toward lower frequency

[Figure: schematic F3 trajectory (frequency vs. time) for /ar da/]

Perception: F3 is perceived as a higher frequency

[Figure: percent /ga/ responses across F3 steps 1-7, with no precursor, /al/, or /ar/]
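To make the direction of the spectral contrast hypothesis concrete, here is a toy model. It is a sketch under loud assumptions, not the model tested in these studies: the contrast weight, the /ga/-/da/ boundary, and the F3 values assigned to /al/, /ar/, and the target are all hypothetical. The only claim it encodes is the slide's: the target's F3 onset is perceived as shifted away from the F3 of the preceding context, so a low-F3 /ar/ pushes an ambiguous target toward /da/ and a high-F3 /al/ pushes it toward /ga/.

```python
# Toy spectral-contrast model. Every number is HYPOTHETICAL and chosen only to
# show the direction of the effect, not to fit any data.
AL_F3_HZ = 2700.0     # hypothetical F3 offset of /al/ (high)
AR_F3_HZ = 1800.0     # hypothetical F3 offset of /ar/ (low)
BOUNDARY_HZ = 2500.0  # hypothetical /ga/-/da/ boundary on F3 onset
WEIGHT = 0.2          # hypothetical strength of the contrast


def perceived_f3(target_hz, context_hz=None, weight=WEIGHT):
    """Shift the target's F3 onset away from the context's F3 by a fixed fraction."""
    if context_hz is None:
        return target_hz  # no precursor, so no contrast
    return target_hz + weight * (target_hz - context_hz)


def label(target_hz, context_hz=None):
    """Lower effective F3 onset is heard as 'ga', higher as 'da'."""
    f3 = perceived_f3(target_hz, context_hz)
    if f3 > BOUNDARY_HZ:
        return "da"
    if f3 < BOUNDARY_HZ:
        return "ga"
    return "ambiguous"


if __name__ == "__main__":
    ambiguous = 2500.0  # hypothetical mid-series target
    for name, ctx in [("none", None), ("/al/", AL_F3_HZ), ("/ar/", AR_F3_HZ)]:
        print(f"{name:5s} {perceived_f3(ambiguous, ctx):7.1f} Hz -> {label(ambiguous, ctx)}")
```

A linear shift is simply the smallest rule that produces contrast; nothing here implies the auditory system computes it this way.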

Evidence for General Approach

The Empire Strikes Back

Fowler et al. (2000)

Video (visual cue): face saying “AL” or “AR”

Audio: ambiguous precursor, then a test syllable from a /ga/-/da/ series

Precursor conditions differed only in visual information

Results of Fowler et al. (2000)

• More /ga/ responses when video cued /al/

[Figure: percent /ga/ responses across test syllables 1-10, video /al/ vs. video /ar/]

Experiment 1: Results

• No context effect on test syllable: F(1,8) = 3.2, p = .111

[Figure: %ga responses by condition (video /al/ vs. video /ar/) across test syllables 1-10 for 9 participants]

A closer look…

• 2 videos: /alda/ and /arda/

• Video information during test syllable presentation

• Should be the same in both conditions

…more consistent with /ga/?

…more consistent with /da/?

/alda/ video /arda/ video

Results

[Figure: % GA responses across test syllables 1-10, video /da/ from /alda/ vs. video /da/ from /arda/]

Comparisons

[Figure: two panels across test syllables 1-10; percent /ga/ responses for video /al/ vs. video /ar/, and % GA responses for video /da/ from /alda/ vs. video /da/ from /arda/]

Conclusions

1. No evidence of a visually mediated phonetic context effect
2. No evidence that gestural information is required
3. Spectral contrast is the best current account

But what about backwards effects???

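The Stimulus Paradigm

Target speech stimulus (/da/-/ga/) + noise burst (/t/ or /k/) + sine-wave tone context (high or low frequency)

Resulting words: “got”/“dot” and “gawk”/“dock”

[Figure: schematic of the stimuli, frequency (Hz) vs. time (ms), with low and high tone contexts]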

Speaker Normalization

Ladefoged & Broadbent (1957)

CARRIER SENTENCE: “Please say what this word is…” (original version and two F1-manipulated versions)

+

TARGET: “bit”, “bet”, “bat”, “but”

• TARGET acoustics were constant
• TARGET perception shifted with changes in “speaker”
• Spectral characteristics of the preceding sentence predicted perception

‘Talker/Speaker Normalization’: sensitivity to accent, etc.

Experiment Model

Natural speech: F2 and F3 onsets edited to create a 9-step series varying perceptually from /ga/ (step 1) to /da/ (step 9)

Speech token: 589 ms

No Effect of Adjacent Context with Intermediate Spectral Characteristics

Trial: silent interval (50 ms), standard tone (2300 Hz, 70 ms), speech token (589 ms)

PILOT TEST: No context effect on speech perception (t(9) = 1.35, p = .21)

Acoustic Histories

ACOUSTIC HISTORY: The critical context stimulus for these experiments is not a single sound but a distribution of sounds.

21 70-ms tones sampled from a distribution, with a 30-ms silent interval between tones

Trial: acoustic history (2100 ms), silent interval (50 ms), standard tone (70 ms), speech token (589 ms)
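The durations on the slide can be checked with a little arithmetic. If each history tone occupies a 100-ms slot (the 70-ms tone plus the 30-ms silence after it, counting a trailing silence after the last tone, which is an assumption), 21 slots give exactly the 2100-ms history. The sketch below lays out one trial end to end; `Event` and `build_trial` are hypothetical helpers, not part of any stimulus software used in these studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    onset_ms: int
    duration_ms: int

    @property
    def offset_ms(self) -> int:
        return self.onset_ms + self.duration_ms


def build_trial(segments):
    """Lay (name, duration_ms) segments end to end, starting at 0 ms."""
    events, t = [], 0
    for name, duration in segments:
        events.append(Event(name, t, duration))
        t += duration
    return events


TONE_MS, GAP_MS, N_TONES = 70, 30, 21
# Assumption: every tone, including the last, is followed by a 30-ms gap.
HISTORY_MS = N_TONES * (TONE_MS + GAP_MS)  # 21 x 100 ms = 2100 ms

TRIAL = build_trial([
    ("acoustic history", HISTORY_MS),
    ("silent interval", 50),
    ("standard tone", 70),
    ("speech token", 589),
])

if __name__ == "__main__":
    for e in TRIAL:
        print(f"{e.name:17s} {e.onset_ms:5d}-{e.offset_ms:5d} ms")
```

Replacing the 70-ms standard tone with the 300-ms /al/ or /ar/ used in the joint-effects trials lengthens the trial from 2809 ms to 3039 ms.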

Acoustic History Distributions

[Figure: frequency of presentation vs. tone frequency (1300-3300 Hz); High Mean = 2800 Hz, Low Mean = 1800 Hz]
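The figure gives only the frequency range (1300-3300 Hz) and the two means. Here is a minimal sampling sketch, assuming each distribution is flat, spans its mean ± 500 Hz in 50-Hz steps (which yields exactly 21 frequencies, matching the 21 tones), and is played once per trial in a freshly shuffled order. The step size, the flat shape, and the function name `history_frequencies` are assumptions, not details confirmed by the slides.

```python
import random

STEP_HZ = 50         # assumed spacing between tone frequencies
HALF_RANGE_HZ = 500  # assumed half-width of each distribution


def history_frequencies(mean_hz, rng=random):
    """Return 21 tone frequencies (Hz) for one trial's acoustic history.

    Assumption: a flat distribution over mean +/- 500 Hz in 50-Hz steps,
    each value played once, in a new random order on every trial.
    """
    freqs = list(range(mean_hz - HALF_RANGE_HZ, mean_hz + HALF_RANGE_HZ + 1, STEP_HZ))
    rng.shuffle(freqs)
    return freqs


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the demo is repeatable
    high = history_frequencies(2800, rng)  # High Mean condition
    low = history_frequencies(1800, rng)   # Low Mean condition
    print("high:", high[:5], "... mean", sum(high) / len(high))
    print("low: ", low[:5], "... mean", sum(low) / len(low))
```

Reshuffling per trial keeps the mean fixed while the precise tone sequence changes from trial to trial, the property emphasized under “Characteristics of the Context.”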

Example Stimuli

[Figure: spectrograms, frequency (Hz) vs. time (ms); A: 2800 Hz mean, B: 1800 Hz mean]

Characteristics of the Context

• Context is not local: the standard tone immediately precedes each stimulus, independent of condition. On its own, it has no context effect on /ga/-/da/ stimuli.
• Context is defined by distribution characteristics: sampling of the distribution varies on each trial, so precise acoustic characteristics vary from trial to trial.
• Context unfolds over a broad time course.


Results

[Figure: percent GA responses across stimulus steps 1-9 after High Mean vs. Low Mean acoustic histories]

Contrastive (p < .0001)

Notched Noise Histories

[Figure: spectrograms A and B, frequency (Hz) vs. time (ms)]

100-Hz bandwidth for each notch

Results (N = 10)

[Figure: percent "GA" responses across stimulus steps 1-9, High Mean vs. Low Mean; one panel for Tones, one for Notched Noise]

Reported significance: p < .04, p < .01

Joint Effects?

History only: acoustic history (2100 ms), silent interval (50 ms), standard tone (70 ms), speech token (589 ms)

History + speech: acoustic history (2100 ms), silent interval (50 ms), /al/ or /ar/ (300 ms), speech token (589 ms)

Conflicting, e.g., High Mean + /ar/; Cooperating, e.g., High Mean + /al/
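The cooperating and conflicting labels follow from the direction each context is expected to push the target under spectral contrast: high-frequency energy (the 2800-Hz history or /al/) favors “ga”, and low-frequency energy (the 1800-Hz history or /ar/) favors “da”. The sketch encodes those directions in a hypothetical +1/-1 table and classifies each pairing; the names `DIRECTION` and `pairing` are illustrative only.

```python
# Hypothetical coding: +1 = pushes the target toward "ga", -1 = toward "da".
# Under spectral contrast, a high-frequency context makes the following F3
# onset sound lower (more "ga"); a low-frequency context, higher (more "da").
DIRECTION = {
    "high mean": +1,  # 2800-Hz acoustic history
    "low mean": -1,   # 1800-Hz acoustic history
    "/al/": +1,       # high F3 at offset
    "/ar/": -1,       # low F3 at offset
}


def pairing(history, speech_context):
    """Label a history + speech-precursor pair as cooperating or conflicting."""
    if DIRECTION[history] == DIRECTION[speech_context]:
        return "cooperating"
    return "conflicting"


if __name__ == "__main__":
    for h in ("high mean", "low mean"):
        for s in ("/al/", "/ar/"):
            print(f"{h} + {s}: {pairing(h, s)}")
```

The classification says nothing about magnitude: an equal-weight sum of the two directions would predict no net effect for conflicting pairs, whereas the conflicting results reported here follow the non-speech spectra.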

Interaction of Speech/N.S.

[Figure: percent "ga" responses across speech target steps 1-9 for /al/ vs. /ar/, in three panels: Speech Only, Cooperating, Conflicting]

Speech Only: p = .007

Cooperating: p < .0001; significantly greater than speech alone

Conflicting: p = .009; same magnitude as Speech Only, opposite direction; follows the non-speech (N.S.) spectra
