The role of pitch range and facial gestures in conveying prosodic meaning. Joan Borràs-Comes Ph.D. Project. Supervisor: Dr. Pilar Prieto.
Transcript

  • Slide 1
  • Slide 2
  • Joan Borràs-Comes Ph.D. Project. Supervisor: Dr. Pilar Prieto
  • Slide 3
  • Vision has a strong influence on speech perception in normal verbal communication. Gesture and speech form a fully integrated system: gestures are framed by speech, and only by looking at both can we predict how people learn, remember, and solve problems (Goldin-Meadow 2005). Gestures and speech are co-expressive but not redundant; gesture allows speakers to convey thoughts that may not easily fit into the categorical system that their conventional language offers (McNeill 1992/2005). Vision also plays a clear role in the perception of various aspects typically associated with verbal prosody: audiovisual cues for prosodic functions such as focus (Dohen & Loevenbruck 2009) and question intonation (Srinivasan & Massaro 2003) have been successfully investigated.
  • Slide 4
  • Visual cues (eyebrow flashes, head nods, beat gestures) boost the perceived prominence of the words they occur with (Cavé et al. 1996, Erickson et al. 1998, Hadar et al. 1983, Krahmer & Swerts 2004/2007, Swerts & Krahmer 2008). Audiovisual cues have been explored for traditional prosodic functions such as phrasing (Barkhuysen et al. 2008), face-to-face grounding (Nakano et al. 2003), and question intonation (Srinivasan & Massaro 2003), as have the audiovisual expressions of affective functions such as signaling basic emotions (Barkhuysen et al. 2009, de Gelder et al. 1999), uncertainty (Krahmer & Swerts 2005, Swerts & Krahmer 2005), and frustration (Barkhuysen, Krahmer & Swerts 2005). In sign language prosody there is no audio component; the articulatory effort has shifted to the hands. Does visual prosody (eyebrow movements, eye blinks) work in similar ways across signed and non-signed languages? Different visual signs have specific prosodic functions, and the combination of these visual signs gives rise to a subtle yet meaningful layer on top of the signing (Dachkovsky & Sandler 2009, for Israeli SL; Wilbur 2009, for American SL).
  • Slide 5
  • Most of the work has described a correlated mode of processing: vision partially duplicates acoustic information and provides a powerful assist in decoding speech, e.g., in noisy environments. Many studies have found a weak visual effect relative to a robustly strong auditory effect. Dohen (2009), on the production and perception of contrastive informational focus: suprasegmental perception of speech is multimodal; production reveals visible correlates of contrastive focus (Dohen et al. 2006); prosodic contrastive focus is detectable from the visual modality alone (Krahmer & Swerts 2006); perception cues partly correspond to those used in production (Dohen & Loevenbruck 2005). Srinivasan & Massaro (2003), on the discrimination between statements and questions: a much larger influence of auditory cues than visual cues; visual cues do not strongly signal interrogative intonation (House 2002). In whispered speech (no F0), auditory-only perception is degraded, and adding vision clearly improves prosodic focus detection; reaction times show that adding vision also reduces processing time (Dohen & Loevenbruck 2009).
  • Slide 6
  • Vision provides a powerful assist in noisy environments and in whispered speech. So what happens when acoustic information is ambiguous? In Catalan intonation, the nuclear pattern L+H* L% can be used to express a statement, a contrastive focus, or an echo question. Production analyses show that these three sentence types differ in their pitch accent height and may be distributed in three well-differentiated areas of the pitch range; the same L+H* L% configuration thus serves for both contrastive foci and echo questions. Main hypothesis: for more ambiguous or underspecified parts of the speech stream, a complementary mode of processing is possible, whereby vision provides information more efficiently than hearing.
  • Slide 7
  • 1. How do participants identify 3 pragmatic meanings across an auditory continuum? 2. Does the categorical contrast between 2 intonational contours elicit a specific MMN (mismatch negativity)? 3. What is the contribution of visual and acoustic cues in perceiving an acoustically ambiguous intonational contrast? 4. Which gestural elements guide speakers' interpretations? 5. Future projects
  • Slide 8
  • In Catalan, the same nuclear configuration L+H* L% is used to express 3 sentence types: statements, contrastive foci, and echo questions; the peak height indicates the sentence type. Our initial hypothesis was that these three sentence types may be distributed in three well-differentiated areas of the pitch range. a. Com la vols, la cullera? Petita[, sisplau]. ('How do you want it, the spoon? Small[, please].') b. Volies una cullera gran, no? PETITA[, la vull, i no gran]. ('You wanted a big spoon, didn't you? SMALL[, is what I want, not big].') c. Jo la vull petita, la cullera. Petita?[, n'estàs segur?] ('I want it small, the spoon. Small?[, are you sure?]')
  • Slide 9
  • 20 native speakers, 2 semantically motivated identification tasks. Congruency test: participants rated the acceptance of each stimulus occurring within each of the three communicative contexts. Identification test: participants had to identify which of the three meanings each isolated stimulus conveyed. Stimuli: a continuum of 11 steps created by modifying the F0 height of the noun phrase petita, with a distance of 1.2 semitones between adjacent steps.
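The 11-step pitch continuum can be sketched numerically. A semitone corresponds to a frequency ratio of 2^(1/12), so step n lies n × 1.2 semitones above the base. The base F0 of 180 Hz below is an illustrative assumption, not a value from the slides, and the actual stimuli would be resynthesized with a speech tool such as Praat rather than computed by hand.

```python
# Sketch: target F0 peak values for an 11-step continuum spaced 1.2 semitones
# apart, as in the stimuli description. The base frequency (180 Hz) is an
# illustrative assumption; only the step size and step count come from the text.

def semitone_continuum(base_hz: float, n_steps: int = 11, step_st: float = 1.2):
    """Return F0 targets where step n lies n * step_st semitones above base_hz."""
    return [base_hz * 2 ** (n * step_st / 12) for n in range(n_steps)]

targets = semitone_continuum(180.0)
print([round(f, 1) for f in targets])
```

Note that 10 intervals of 1.2 semitones span exactly one octave, so the final step of this sketch is twice the base frequency.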
  • Slide 10
  • Congruency test: a one-way ANOVA showed an effect of linguistic context on sentence interpretation (F(2, 3582) = 16.579, p