Using paragraph- and discourse-based prosodic cues to improve speech synthesis expressiveness -...

Using Paragraph- and Discourse- based Prosodic Cues to Improve Speech Synthesis Expressiveness Mireia Farrús AI With the Best, 25/09/2016

Outline

Over the last decade, automatically generated speech has significantly improved in terms of voice quality and expressiveness. However, multi-sentential synthesized speech still suffers from a high degree of unnaturalness.

To overcome this, an approach that is more aware of paragraph and communicative structure is needed to make real improvements in speech synthesis.

Text-to-Speech (TTS) Systems

TTS systems

Context: current TTS systems

• Preceding and following phonemes
• Position of segment in syllable
• Position of syllable in word & phrase
• Position of word in phrase
• Stress/accent/length features of current/preceding/following syllables
• Distance from stressed/accented syllables

• POS of current/preceding/following word
• Length of current/preceding/following phrase
• End tone of phrase
• Length of utterance measured in syllables/words/phrases

(King, 2010)
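
The contextual factors listed above are usually bundled into one record per segment. A minimal sketch of such a record follows, assuming hypothetical field names (none of them come from the talk or from King, 2010); it only illustrates that every field stays within the utterance.

```python
from dataclasses import dataclass, asdict

@dataclass
class SegmentContext:
    """Hypothetical per-phoneme context record; all field names are illustrative."""
    phoneme: str
    prev_phoneme: str
    next_phoneme: str
    pos_in_syllable: int      # position of the segment in its syllable
    syll_pos_in_word: int     # position of the syllable in its word
    word_pos_in_phrase: int   # position of the word in its phrase
    stressed: bool            # stress of the current syllable
    pos_tag: str              # part of speech of the current word
    phrase_len_words: int     # length of the current phrase in words
    utt_len_words: int        # length of the utterance in words

def to_feature_dict(ctx: SegmentContext) -> dict:
    """Flatten the record into a plain dict, e.g. for a downstream model."""
    return asdict(ctx)

example = SegmentContext("ae", "k", "t", 1, 0, 2, True, "NN", 4, 9)
features = to_feature_dict(example)
```

Nothing in this record looks beyond the utterance, which is the gap the following slides address.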

BUT human speech also relies on…

• Paragraph structure
• Communicative structure
• Discourse structure

Paragraph structure

• “Paragraph-based Prosodic Cues for Speech Synthesis Applications”. Mireia Farrús, Catherine Lai, Johanna D. Moore

Prosody & Paragraph Structure

• ~ 1400 TED talks

Paragraph structure - Conclusions

• There is clear evidence of prosodic resets over paragraph breaks
• We can also observe a steady declination in prosodic level over the paragraph
• Difference features are more discriminative of boundaries than sentence-based features
• Paragraphs have an identifiable suprasentential prosodic structure that can be described in terms of relative changes in F0, intensity, and timing
• The classification experiments support the idea that utterance-intrinsic features related to paragraph position exist
• Pause duration is the most robust predictor of paragraph breaks

We should be able to employ paragraph declination, pause and prosodic reset features to improve the naturalness of longer synthesized speech.
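
The cues named in these conclusions (pause duration and prosodic reset) can be expressed as difference features between adjacent sentences. The sketch below is only an illustration: the `Sentence` fields, the thresholds, and the rule-based decision are assumptions made here, not the features or classifier used in the study.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Sentence:
    mean_f0_start: float  # Hz, mean F0 near the start of the sentence
    mean_f0_end: float    # Hz, mean F0 near the end of the sentence
    pause_before: float   # seconds of silence preceding the sentence

def boundary_features(prev: Sentence, curr: Sentence) -> Dict[str, float]:
    """Difference features across the boundary between two sentences."""
    return {
        "f0_reset": curr.mean_f0_start - prev.mean_f0_end,
        "pause": curr.pause_before,
    }

def predict_breaks(sentences: List[Sentence],
                   pause_threshold: float = 0.8,
                   reset_threshold: float = 20.0) -> List[int]:
    """Return indices of sentences guessed to open a new paragraph.

    Both thresholds are arbitrary placeholders, not values from the talk.
    """
    breaks = []
    for i in range(1, len(sentences)):
        f = boundary_features(sentences[i - 1], sentences[i])
        if f["pause"] >= pause_threshold and f["f0_reset"] >= reset_threshold:
            breaks.append(i)
    return breaks

toy = [Sentence(220.0, 180.0, 0.0),
       Sentence(190.0, 170.0, 0.3),
       Sentence(230.0, 185.0, 1.2)]
toy_breaks = predict_breaks(toy)
```

In the toy data only the third sentence follows a long pause together with a large F0 reset, so it is the only one flagged.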

Information/Communicative structure

• “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner

Theoretical background - Motivation

• Influence of information structure on intonation

• Steedman’s theory relating
– Theme/rheme
– Intonation patterns

Theoretical background - Handicaps

• Based on short sentences with a simple structure and a default word order (SVO for English)

• What if we have…

ToBI labels

Tones and Break Indices
• high (H) and low (L) tones
• pitch accents (starred tones, e.g. H* and L*)
• bitonal pitch accents (L+H*, etc.)
• phrase accents (H- and L- tones)
• boundary tones (H% and L%)
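
The label categories on this slide can be told apart by their diacritics alone. The function below is a minimal sketch written for this transcript, not part of any ToBI tool; it handles single labels only, so a combined label such as "L-L%" would have to be split first.

```python
def classify_tobi(label: str) -> str:
    """Classify one ToBI tone label by its diacritic.

    '%' marks a boundary tone, '-' a phrase accent, and '*' a pitch accent;
    a '+' inside a pitch accent makes it bitonal.
    """
    if label.endswith("%"):
        return "boundary tone"
    if label.endswith("-"):
        return "phrase accent"
    if "*" in label:
        return "bitonal pitch accent" if "+" in label else "pitch accent"
    raise ValueError(f"not a recognised tone label: {label!r}")

tobi_kinds = {lab: classify_tobi(lab) for lab in ["H*", "L+H*", "L-", "H%"]}
```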

Theoretical background – Our work

• “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner

• Objectives
– Validate Steedman’s theory
– Proposal for more complex syntactic structures

Theoretical background - Mel’čuk

Steedman
• Linearity
• Intonation ~ theme/rheme

Mel’čuk
• Hierarchy
• Intonation ~ Thematicity
– theme/rheme
– specifiers
– embeddedness

Preliminary experiments

• Wall Street Journal corpus (Penn Treebank)
• American English recordings
• Native speakers
• 109 sentences
• AuToBI labelling + reduction model
• Manual annotation of Thematicity

Validating the classic interface

• To what extent can the classic approaches be applied to general discourse with more complex sentences?

• Examples matching the expected THEME patterns…

… but not the expected RHEMES.

• We have found that…

– Themes usually match, although ~40% do not.
– Steedman’s approach of grouping everything apart from the theme into a flat rheme span lacks accuracy.

• We need a more accurate IS-prosody interface.

Towards a more accurate IS-Prosody interface

• Our hypothesis:
– Applying Mel’čuk’s hierarchical tripartite thematicity structure, we will be able to:
• Propose a more accurate model of the intonation-thematicity correlation for the ~40% non-coincident patterns in theme spans.
• Find a justification for the discrepancies observed in the rheme patterns.

• Specifier

Example with the annotation suggested by Mel’čuk (1)

• Specifier

Example with the annotation suggested by Mel’čuk (2)

• Hierarchy

Example with the annotation suggested by Mel’čuk

rising pattern ↔ theme

Embedded themes behave as main themes in terms of intonation.
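
The observation that embedded themes pattern like main themes suggests walking a hierarchical thematicity annotation and treating theme spans at every depth alike. The sketch below assumes a hypothetical `Span` tree and an invented example sentence; only the "rising pattern ↔ theme" pairing comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    """Hypothetical node in a hierarchical thematicity annotation."""
    kind: str                      # "proposition", "theme", "rheme" or "specifier"
    text: str = ""
    children: List["Span"] = field(default_factory=list)

def theme_texts(span: Span) -> List[str]:
    """Collect theme spans at any depth, in left-to-right order.

    Main and embedded themes are returned together because both are
    candidates for the rising pattern.
    """
    found = [span.text] if span.kind == "theme" and span.text else []
    for child in span.children:
        found.extend(theme_texts(child))
    return found

# Invented example: a main proposition whose rheme embeds another proposition.
root = Span("proposition", children=[
    Span("theme", "The analyst"),
    Span("rheme", "said", children=[
        Span("proposition", children=[
            Span("theme", "the shares"),
            Span("rheme", "would fall"),
        ]),
    ]),
])
rising_candidates = theme_texts(root)
```

A flat theme/rheme split would see only the first theme; the tree also exposes the embedded one.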

Classification experiments

• Combining Acoustic and Linguistic Levels in Phrase-Oriented Prosody Modelling

• Testing acoustic parameters

• Testing linguistic features

Conclusions

• Information Structure determines the “communicative” segmentation of the meaning of an utterance.

• It is central to the semantics-syntax-intonation interface, and to NLP.

• Descriptive study attempting to determine which intonation patterns better characterize thematicity in real utterances.

• Flat theme/rheme interpretation prevailing in classical approaches fails to explain complex linguistic structures.

• Accounting for hierarchical structures and specifiers yields positive results.

Prosody & discourse structure

• Rhetorical Structure Theory (RST) (Mann & Thompson, 1988)

Describes the organizational structure of texts by defining relations between two text spans, the nucleus (N) and the satellite (S).
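
An RST relation between a nucleus and a satellite can be held in a very small data structure. This is a minimal sketch with invented example text; the class and function names are assumptions made here, and "Elaboration" is used only as one familiar relation name.

```python
from dataclasses import dataclass

@dataclass
class RSTRelation:
    """One relation between two text spans; names here are illustrative."""
    name: str        # relation label, e.g. "Elaboration"
    nucleus: str     # the more central span (N)
    satellite: str   # the supporting span (S)

def describe(rel: RSTRelation) -> str:
    """Render the relation in a compact N/S notation."""
    return f"{rel.name}(N={rel.nucleus!r}, S={rel.satellite!r})"

rel = RSTRelation("Elaboration",
                  nucleus="The talk covers prosody.",
                  satellite="It focuses on paragraph-level cues.")
rel_text = describe(rel)
```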

Conclusions

• Prosody prediction from:
– Type of sentence
– Discourse structure
– Discourse markers
– Information structure

… to improve expressiveness and naturalness of automatically generated speech

Thank you for your attention!