Cross-modal Prediction in Speech Perception
Carolina Sánchez, Agnès Alsius, James T. Enns & Salvador Soto-Faraco
Multisensory Research Group
Universitat Pompeu Fabra
Barcelona
Auditory + visual performance: MSI enhancement
Background
Visual + auditory information improves speech perception: multisensory integration
Background
• Prediction within one sensory modality
• Many levels of information processing
– Phonological prediction: "This morning I went to the library and borrowed a … book" (DeLong, 2005; Pickering, 2007)
– Visual prediction: visual search (Enns, 2008; Dambacher, 2009)
– Sensorimotor prediction: forward model (Wolpert, 1997)
Predictive coding
Pickering, 2007
Hypothesis
• If prediction exists within a single modality, and if predictive coding models can account for prediction at the phonological level, then …
predictive coding could occur across different sensory modalities too.
Indirect evidence of cross-modal transfer in speech
van Wassenhove et al., 2005
ERPs over time:
• Amplitude reduction
• Latency shortening
/pa/: high visual saliency
/ka/: low visual saliency
Our study
• Visual prediction
• Auditory prediction
• Visual-to-auditory cross-modal prediction
• Auditory-to-visual cross-modal prediction
Visual prediction
[Schematic: visual (V) and auditory (A) streams]
With informative visual context
Without informative context
Task: AV Match vs. AV Mismatch
Target fragment
Context fragment
speech / non-speech
Results
[Figure: reaction time (msec), match vs. mismatch, with vs. without informative visual context; * marks the significant difference]
* With previous context, participants responded faster than without it.
VISUAL PREDICTION
Auditory prediction
[Schematic: visual (V) and auditory (A) streams]
With informative auditory context
Without informative context
speech / non-speech
Task: AV Match vs. AV Mismatch
Target fragment
Context fragment
Results
[Figure: reaction time (msec), match vs. mismatch, with vs. without informative auditory context; * marks the significant difference]
* With previous context, participants responded faster than without it.
AUDITORY PREDICTION
Visual vs. auditory prediction
[Figure: reaction times (msec), congruent vs. incongruent, with vs. without informative context; left panel: visual context, right panel: auditory context; * marks the informative-context advantage in both panels]
Conclusions
• Visual prediction
• Auditory prediction
Is this prediction cross-modal?
Predictability of vision-to-audition: design of the experiment
[Design schematic: visual (V) and auditory (A) streams]
• Unimodal continued — Match
• Unimodal continued — Mismatch
• Discontinued — Match
• Discontinued — Mismatch
• Cross-modal continued — Mismatch
Predictability of vision-to-audition: stimuli
[Stimulus examples for the three mismatch conditions: unimodal continued, discontinued, cross-modal continued]
Results
Participants were faster in the cross-modal condition than in the completely incongruent one.
VISUAL-TO-AUDITORY PREDICTION
[Figure: reaction times (msec, 700–1000 range) for the unimodal continued, discontinued, and cross-modal continued conditions; * marks the cross-modal advantage]
Predictability of audition-to-vision: design of the experiment
[Design schematic: visual (V) and auditory (A) streams]
• Unimodal continued — Match
• Unimodal continued — Mismatch
• Discontinued — Match
• Discontinued — Mismatch
• Cross-modal continued — Mismatch
[Figure: reaction times (msec) for the unimodal continued, discontinued, and cross-modal continued conditions (visual and auditory)]
Results
We did not find any difference between the mismatch conditions.
NO AUDITORY-TO-VISUAL PREDICTION
Conclusions
• There is some form of prediction from the visual to the auditory modality
• There is no prediction from the auditory to the visual modality
Does this prediction depend on the language?
Canadian participants with English sentences
VISUAL-TO-AUDITORY PREDICTION IN NATIVE LANGUAGE
[Figures: reaction times (msec, 700–1000 range) for the unimodal continued, discontinued, and cross-modal continued conditions; * marks the cross-modal advantage in both groups: Canadian participants with English sentences, Spanish participants with Spanish sentences]
Results (L1)
Canadian participants with English sentences
[Figure: reaction times (msec) for the unimodal continued, discontinued, and cross-modal continued conditions]
No differences between the mismatch conditions
No prediction from auditory-to-visual modality in native language
Spanish participants with Spanish sentences
[Figure: reaction times (msec) for the unimodal continued, discontinued, and cross-modal continued conditions (visual and auditory)]
Conclusions
• There is some form of prediction from the visual to the auditory modality in L1
• There is no prediction from the auditory to the visual modality in L1
What happens with an unknown language?
Unknown language: visual to auditory
Canadian participants with Spanish sentences
NO VISUAL-TO-AUDITORY PREDICTION IN AN UNKNOWN LANGUAGE
[Figure: reaction times (msec, 700–1200 range) for the unimodal continued, discontinued, and cross-modal continued conditions]
Unknown language: auditory to visual
Spanish participants with English sentences
Canadian participants with Spanish sentences
[Figures: reaction times (msec) for the unimodal continued, discontinued, and cross-modal continued conditions in both groups]
No differences between the mismatch conditions
No prediction from the auditory to the visual modality in an unknown language
Conclusions
• No visual-to-auditory cross-modal prediction in an unknown language…
it seems that some knowledge of the language's articulatory phonetics is required to obtain the predictive-coding advantage
• No auditory-to-visual cross-modal prediction
General Conclusions
• Unimodal prediction: from visual to visual modality, and from auditory to auditory modality
• L1: ASYMMETRY
– Cross-modal prediction from the visual to the auditory modality
– No cross-modal prediction from the auditory to the visual modality
• Unknown language: prior knowledge of the language is necessary to make the prediction
– No cross-modal prediction from the visual to the auditory modality
– No cross-modal prediction from the auditory to the visual modality
Thanks to…
- Agnès Alsius, Postdoc, Queen's University
- Antonia Najas, MA / Research Assistant, Universitat Pompeu Fabra
- Phil Jaekl, Postdoc, Universitat Pompeu Fabra
- All the people of the Vision Lab, UBC, Vancouver
Thanks for your attention!!