Phonetic Context Effects
Major Theories of Speech Perception
Motor Theory: Specialized module (later version) represents speech sounds in terms of intended gestures through a model of, or knowledge of, vocal tracts.
Direct Realism: Perceptual system recovers (phonetically-relevant) gestures by picking up the specifying information in the speech signal.
General Approaches: Speech is processed in the same way as other sounds. Representation is a function of the auditory system and experience with language.
Explanatory level = gesture (Motor Theory, Direct Realism)
Explanatory level = sound (General Approaches)
Fluent Speech Production
Adjacent speech becomes more similar
The vocal tract is subject to physical constraints...
Mass
Inertia
Coarticulation = Assimilation
Radical Context Dependency
Also a result of the motor plan
An Example: Place of Articulation in Stops
Say /da/ (anterior). Say /ga/ (posterior).
Say /al/ (anterior). Say /ar/ (posterior).
Say /al da/. Say /ar da/.
Say /al ga/. Say /ar ga/.
Place of articulation changes = Coarticulation
Say /ar da/. Say /al ga/.
Coarticulation has acoustical consequences
[Figure: spectrograms of /al da/, /ar da/, /al ga/, and /ar ga/]
How does the listener deal with this?
Speech Perception
Contexts: /al/, /ar/. Targets: /ga/, /da/.
[Figure: Identifying in Context. Percent "g" responses (0 to 100) across a 7-step [ga]-[da] series, with preceding context none, /al/, or /ar/]
Direction of Effect
Production
/al/: More /da/-like
/ar/: More /ga/-like
Perception
/al/: More /ga/-like
/ar/: More /da/-like
Perceptual Compensation For Coarticulation
What happens when there is no coarticulation?
AT&T Natural Voices Text-To-Speech Engine
“ALL DA” “ARE GA”
Further Findings
4½-month-old infants (Fowler et al., 1990)
Native Japanese listeners who do not discriminate /al/ from /ar/ (Mann, 1986)
“There may exist a universally shared level where representation of speech sounds more closely corresponds to articulatory gestures that give rise to the speech signal.” (Mann, 1986)
“Presumably human listeners possess implicit knowledge of coarticulation.” (Repp, 1982)
Theoretical Interpretations
Motor Theory
Major Theories of Speech Perception
Motor Theory: “Knowledge” of coarticulation allows perceptual system to compensate for its predicted effects on the speech signal.
Direct Realism: Coarticulation is information for the gestures involved. Signal is parsed along the gestural lines. Coart. is assigned to gesture.
General Approaches: Those other guys are wrong.
Theoretical Interpretations
Common Thread: Detailed correspondence between speech production and perception
Special Module for Speech Perception
Two Predictions:
• Talker-Specific
• Speech-Specific
Testing Hypothesis #1: Talker-specific
Should only compensate for the speech coming from a single speaker
Stimuli: Male /al/, Male /ar/, Female /al/, Female /ar/, Male /da/ - /ga/
[Figure: Mean context shift (0 to 40) for Male and Female]
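The "mean context shift" on this slide is not defined in the transcript. A minimal sketch of one plausible definition follows: the average difference, across series steps, between percent /ga/ responses after /al/ and after /ar/. The function name and the response percentages are hypothetical illustrations, not data from the study.

```python
def mean_context_shift(pct_ga_after_al, pct_ga_after_ar):
    """Average (/al/ minus /ar/) difference in percent /ga/ responses.

    Assumed definition: the slides do not state how the shift was computed.
    """
    if not pct_ga_after_al or len(pct_ga_after_al) != len(pct_ga_after_ar):
        raise ValueError("need two non-empty series of equal length")
    diffs = [a - r for a, r in zip(pct_ga_after_al, pct_ga_after_ar)]
    return sum(diffs) / len(diffs)


# Hypothetical identification data for a 7-step series (made up for illustration).
al_context = [98, 95, 85, 65, 40, 15, 5]
ar_context = [95, 85, 60, 35, 15, 5, 2]

shift = mean_context_shift(al_context, ar_context)
print(f"mean context shift: {shift:.1f} percentage points")
```

A positive value means more /ga/ responses after /al/ than after /ar/, which is the direction of the perceptual effect described earlier in the deck.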
Testing Hypothesis #2: Speech-specific
Compensation should only occur for speech sounds
Contexts: /al/ (speech, tones), /ar/ (speech, tones)
[Figure: Mean % /ga/ responses (0 to 80) by condition (Speech, Non-Speech) for /al/ and /ar/ contexts]
Does this rule out motor theory?
It may be that the special speech module is broadly tuned. If it acts like speech, it went through the speech module. If not, not.
/al/ /ar/
SPEECH PRECURSORS
Training the Quail
[Figure: training design. CV series (/da/, /ga/) varying in F3 onset frequency: steps 1, 2, 6, 7 shown; steps 3, 4, 5 withheld from training. /al/ and /ar/ withheld from training]
Context-Dependent Speech Perception by an avian species
[Figure: Normalized response (pecks or "GA" responses)]
Conclusions
1. Links to speech production are not necessary
   Neither speech-specific nor species-specific
2. Learning is not necessary
   Quail had no experience with covariation
3. General auditory processes play a substantive role in maintaining perceptual compensation for coarticulation
Major Theories of Speech Perception
Motor Theory: “Knowledge” of coarticulation allows perceptual system to compensate for its predicted effects on the speech signal.
Direct Realism: Coarticulation is information for the gestures involved. Signal is parsed along the gestural lines. Coart. is assigned to gesture.
General Approaches: General Auditory Processes
GAP
Effects of Context: a familiar example
How well does this analogy hold up for context effects in speech?
[Figure: spectrograms of /al da/, /ar da/, /al ga/, and /ar ga/]
Hypothesis: Spectral Contrast, the case of [ar]
Production: F3 assimilated toward lower frequency
[Figure: spectrogram of /ar da/, frequency by time]
Perception: F3 is perceived as a higher frequency
[Figure: Percent /ga/ responses (0 to 100) by F3 step (1 to 7), with context none, /al/, or /ar/]
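The spectral contrast idea can be illustrated with a toy rule in which the perceived F3 of a target is pushed away from the F3 of the preceding context. This is only a sketch under stated assumptions: the linear rule, the gain of 0.1, the 2500 Hz category boundary, and all frequency values are hypothetical and are not taken from the slides.

```python
def perceived_f3(target_f3_hz, context_f3_hz, contrast_gain=0.1):
    """Toy contrast rule: shift the target away from the context frequency.

    The linear form and the gain are illustrative assumptions.
    """
    return target_f3_hz + contrast_gain * (target_f3_hz - context_f3_hz)


def label(f3_hz, boundary_hz=2500.0):
    """Higher F3 onset is treated as /da/, lower as /ga/ (boundary is hypothetical)."""
    return "da" if f3_hz >= boundary_hz else "ga"


ambiguous_target = 2500.0  # hypothetical mid-series F3 onset
low_f3_context = 1800.0    # stands in for [ar] (low F3)
high_f3_context = 2800.0   # stands in for [al] (high F3)

after_ar = label(perceived_f3(ambiguous_target, low_f3_context))
after_al = label(perceived_f3(ambiguous_target, high_f3_context))
print("after low-F3 context:", after_ar)
print("after high-F3 context:", after_al)
```

With these made-up values, the same target is labeled /da/ after the low-F3 context and /ga/ after the high-F3 context, matching the direction of the perceptual effect on the slides.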
Evidence for General Approach
The Empire Strikes Back
Fowler, et al. (2000)
Video: visual cue, face “AL” or “AR”
Audio: ambiguous precursor; test syllable from /ga/-/da/ series
Precursor conditions differed only in visual information
Results of Fowler, et al. (2000)
• More /ga/ responses when video cued /al/
[Figure: Percent /ga/ responses (0 to 100) by test syllable (1 to 10) for video /al/ and video /ar/]
Experiment 1: Results
• No context effect on test syllable: F(1,8) = 3.2, p = .111
[Figure: Percent /ga/ responses (0 to 100) by test syllable (1 to 10) for video /al/ and video /ar/. %ga responses by condition for 9 participants]
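The reported p-value can be sanity-checked from the F statistic alone. An F test with 1 numerator degree of freedom is equivalent to a two-tailed t test with t = sqrt(F), so the sketch below integrates the Student t density numerically using only the standard library. The helper names are illustrative, and Simpson's rule is a convenience choice here, not the method the authors used.

```python
import math


def t_pdf(t, df):
    """Student t probability density with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, n=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; n must be even."""
    t = abs(t)
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * area


def p_from_f_with_one_df(f_value, df_error):
    """p-value for F(1, df_error), using F = t**2."""
    return two_tailed_p(math.sqrt(f_value), df_error)


p = p_from_f_with_one_df(3.2, 8)
print(f"F(1,8) = 3.2 gives p = {p:.3f}")
```

The result should agree with the slide's p = .111 to rounding.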
A closer look…
• 2 videos: /alda/, /arda/
• Video information during test syllable presentation
• Should be the same in both conditions
…more consistent with /ga/?
…more consistent with /da/?
[Video stills: /alda/ video, /arda/ video]
Results
[Figure: % GA responses (0 to 100) by test syllable (1 to 10) for video /da/ from /alda/ and video /da/ from /arda/]
Comparisons
[Figure: Percent /ga/ responses (0 to 100) by test syllable (1 to 10) for video /al/ and video /ar/]
[Figure: % GA responses (0 to 100) by test syllable (1 to 10) for video /da/ from /alda/ and video /da/ from /arda/]
Conclusions
1. No evidence of visually mediated phonetic context effect
2. No evidence that gestural information is required
3. Spectral contrast is best current account
But what about backwards effects???
The Stimulus Paradigm
[Figure: schematic spectrogram, frequency (Hz) by time (ms). Components: target speech stimulus (/da-ga/), noise burst (/t/ or /k/), sine-wave tone context (high or low frequency). Labels: Got, Dot, Gawk, Dock; Low, High]
Speaker Normalization
Ladefoged & Broadbent (1957)
CARRIER SENTENCE: “Please say what this word is…”
Original, F1, F1
+
TARGET: “bit”, “bet”, “bat”, “but”
• TARGET acoustics were constant
• TARGET perception shifted with changes in “speaker”
• Spectral characteristics of the preceding sentence predicted perception
‘Talker/Speaker Normalization’: Sensitivity to Accent, Etc.
Experiment Model
Natural speech F2 & F3 onset edited to create 9-step series varying perceptually from /ga/ to /da/
[Figure: speech token (589 ms) over time; series steps 1 to 9, /ga/ to /da/]
No Effect of Adjacent Context with intermediate spectral characteristics
[Figure: trial timeline. Silent interval (50 ms), standard tone (70 ms, 2300 Hz), speech token (589 ms)]
PILOT TEST: No context effect on speech perception (t(9) = 1.35, p = .21)
Acoustic Histories
ACOUSTIC HISTORY: The critical context stimulus for these experiments is not a single sound, but a distribution of sounds
• 21 70-ms tones, sampled from a distribution
• 30-ms silent interval between tones
[Figure: trial timeline. Acoustic history (2100 ms), silent interval (50 ms), standard tone (70 ms), speech token (589 ms)]
Acoustic History Distributions
[Figure: two histograms of frequency of presentation by tone frequency (Hz), axis marked at 1300, 1800, 2300, 2800, 3300]
High Mean = 2800 Hz
Low Mean = 1800 Hz
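The trial structure described on these slides can be laid out as a simple event timeline. The durations (21 tones of 70 ms, 30-ms gaps, 50-ms silent interval, 70-ms standard tone at 2300 Hz, 589-ms speech token) and the two means come from the slides; note that 21 × (70 + 30) ms gives the stated 2100 ms. Everything else in this sketch is an assumption: the 21 frequencies are taken as one tone every 50 Hz within ±500 Hz of the mean, presented once each in shuffled order, and the events are ordered as the labels are listed in the transcript.

```python
import random

TONE_MS = 70      # duration of each history tone and of the standard tone
GAP_MS = 30       # silent interval between history tones
SILENT_MS = 50    # silent interval after the acoustic history
SPEECH_MS = 589   # speech token duration
STANDARD_HZ = 2300


def history_frequencies(mean_hz, rng, half_range_hz=500, step_hz=50):
    """21 tone frequencies centered on mean_hz (range and step are assumptions)."""
    freqs = list(range(mean_hz - half_range_hz, mean_hz + half_range_hz + 1, step_hz))
    rng.shuffle(freqs)  # order varies from trial to trial
    return freqs


def trial_timeline(mean_hz, rng):
    """Return (onset_ms, duration_ms, frequency_hz_or_None, label) tuples."""
    events = []
    t = 0
    for f in history_frequencies(mean_hz, rng):
        events.append((t, TONE_MS, f, "history tone"))
        t += TONE_MS + GAP_MS
    events.append((t, SILENT_MS, None, "silent interval"))
    t += SILENT_MS
    events.append((t, TONE_MS, STANDARD_HZ, "standard tone"))
    t += TONE_MS
    events.append((t, SPEECH_MS, None, "speech token"))
    return events


events = trial_timeline(2800, random.Random(1))
for onset, dur, freq, name in events[-3:]:
    print(onset, dur, freq, name)
```

Passing 1800 instead of 2800 gives the Low Mean condition; the seeded `random.Random` makes a given trial reproducible.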
Example Stimuli
[Figure: spectrograms, frequency (Hz) by time (ms). A: 2800 Hz mean. B: 1800 Hz mean]
Characteristics of the Context
Context is not local
• Standard tone immediately precedes each stimulus, independent of condition. On its own, this has no effect of context on /ga/-/da/ stimuli.
Context is defined by distribution characteristics
• Sampling of the distribution varies on each trial
• Precise acoustic characteristics vary with trial
Context unfolds over a broad time course
[Figure: trial timeline. Acoustic history (2100 ms), silent interval (50 ms), standard tone (70 ms), speech token (589 ms)]
Results
[Figure: Percent GA responses (0 to 100) by stimulus step (1 to 9) for High Mean and Low Mean; p < .0001]
Contrastive
Notched Noise Histories
[Figure: spectrograms A and B, frequency (Hz) by time (ms). 100 Hz BW for each notch]
Results
[Figure: two panels, Tones and Notched Noise. Percent "GA" responses (0 to 100) by stimulus step (1 to 9) for High Mean and Low Mean. Reported: p < 0.04, p < 0.01; N = 10]
Joint Effects?
[Figure: two trial timelines. First: acoustic history (2100 ms), silent interval (50 ms), standard tone (70 ms), speech token (589 ms). Second: acoustic history (2100 ms), silent interval (50 ms), /al/ or /ar/ (300 ms), speech token (589 ms)]
Conflicting: e.g., High Mean + /ar/
Cooperating: e.g., High Mean + /al/
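The cooperating/conflicting distinction can be written down as a small lookup: a pairing cooperates when the acoustic history and the speech precursor are expected to push responses in the same direction. The directions for /al/ and /ar/ follow the earlier "Perception" slide, and High Mean is placed with /al/ because the slide gives High Mean + /al/ as cooperating. The Low Mean direction is inferred by symmetry, and the names used here are hypothetical.

```python
# Expected direction of the perceptual shift for each context.
# "low_mean" is inferred by symmetry and is not stated on the slides.
PUSH_TOWARD = {
    "high_mean": "ga",
    "low_mean": "da",
    "al": "ga",
    "ar": "da",
}


def pairing(history, precursor):
    """Classify a history + precursor pairing as cooperating or conflicting."""
    same = PUSH_TOWARD[history] == PUSH_TOWARD[precursor]
    return "cooperating" if same else "conflicting"


for history in ("high_mean", "low_mean"):
    for precursor in ("al", "ar"):
        print(history, "+", precursor, "->", pairing(history, precursor))
```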
Interaction of Speech/N.S.
[Figure: three panels of percent "ga" responses (0 to 100) by speech target (1 to 9) for /al/ and /ar/]
Speech Only: p = .007
Cooperating: p < .0001. Significantly greater than speech alone
Conflicting: p = .009. Same magnitude as Speech Only, opposite direction. Follows NS spectra