Using Paragraph- and Discourse-based Prosodic Cues to Improve Speech Synthesis Expressiveness
Mireia Farrús
AI With the Best, 25/09/2016
Outline
Over the last decade, automatically generated speech has significantly improved in terms of voice quality and expressiveness. However, multi-sentential synthesized speech still suffers from a high degree of unnaturalness.
Outline
To overcome this, an approach that is more aware of paragraph and communicative structure is needed to make real improvements in speech synthesis.
Text-to-Speech (TTS) Systems
TTS systems
TTS systems
Context: current TTS systems
• Preceding and following phonemes
• Position of segment in syllable
• Position of syllable in word & phrase
• Position of word in phrase
• Stress/accent/length features of current/preceding/following syllables
• Distance from stressed/accented syllables
Context: current TTS systems
• POS of current/preceding/following word
• Length of current/preceding/following phrase
• End tone of phrase
• Length of utterance measured in syllables/words/phrases
(King, 2010)
BUT human speech also relies on…
• Paragraph structure
• Communicative structure
• Discourse structure
Paragraph structure
• “Paragraph-based Prosodic Cues for Speech Synthesis Applications”. Mireia Farrús, Catherine Lai, Johanna D. Moore
Paragraph structure
Paragraph structure
Prosody & Paragraph Structure
• ~ 1400 TED talks
Conclusions
Paragraph structure
• There is clear evidence of prosodic resets over paragraph breaks
• We can also observe a steady declination in prosodic level over the paragraph
• Difference features are more discriminative of boundaries than sentence-based features
• Paragraphs have an identifiable suprasentential prosodic structure that can be described in terms of relative changes in F0, intensity, and timing
• The classification experiments support the idea that utterance-intrinsic features indicating paragraph position exist
• Pause duration is the most robust predictor of paragraph breaks
We should be able to employ paragraph declination, pause and prosodic reset features to improve the naturalness of longer synthesized speech.
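The cues named in these conclusions (pause duration, and F0 difference features that capture a prosodic reset) can be illustrated with a small sketch. This is not the classifier used in the study: the `Sentence` record, the function names and both thresholds are hypothetical, chosen only to show how a long pause combined with an upward F0 jump between adjacent sentences could flag a candidate paragraph break.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    mean_f0_hz: float      # mean fundamental frequency of the sentence
    pause_before_s: float  # silence preceding the sentence, in seconds


def difference_features(sentences: List[Sentence]) -> List[float]:
    """F0 change from each sentence to the next (positive = upward reset)."""
    return [
        nxt.mean_f0_hz - cur.mean_f0_hz
        for cur, nxt in zip(sentences, sentences[1:])
    ]


def candidate_paragraph_breaks(
    sentences: List[Sentence],
    min_pause_s: float = 0.8,       # hypothetical threshold
    min_f0_reset_hz: float = 15.0,  # hypothetical threshold
) -> List[int]:
    """Return indices i where a break is hypothesised before sentence i."""
    deltas = difference_features(sentences)
    breaks = []
    for i, delta in enumerate(deltas, start=1):
        long_pause = sentences[i].pause_before_s >= min_pause_s
        f0_reset = delta >= min_f0_reset_hz
        if long_pause and f0_reset:
            breaks.append(i)
    return breaks


# Invented values: F0 declines over three sentences, then resets upward
# after a long pause.
talk = [
    Sentence(210.0, 0.0),
    Sentence(200.0, 0.3),
    Sentence(190.0, 0.4),
    Sentence(215.0, 1.2),
    Sentence(205.0, 0.3),
]
print(candidate_paragraph_breaks(talk))
```

A real system would learn such decision boundaries from data and would also use intensity and timing; the fixed thresholds here only keep the example readable.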
Information/Communicative structure
• “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner
Theoretical background - Motivation
• Influence of information structure on intonation
• Steedman’s theory relating
– Theme/rheme
– Intonation patterns
Theoretical background - Handicaps
• Based on short sentences with a simple structure and a default word order (SVO for English)
• What if we have…
ToBI labels
Tones and Break Indices
• high (H) and low (L) tones
• pitch accents (the L* tones)
• bitonal pitch accents (L+H*, etc.)
• phrase accents (H- and L- tones)
• boundary tones (H% and L%)
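As a rough illustration of the label inventory on this slide, the sketch below sorts a ToBI tone label into one of the four categories by its diacritic: `*` for pitch accents, `-` for phrase accents and `%` for boundary tones. The function name `classify_tobi` is hypothetical, and the patterns cover only plain H and L tones; downstepped tones, break indices and combined phrase-accent/boundary labels are deliberately left out.

```python
import re

_TONE = r"[HL]"

# Checked in order; each pattern must match the whole label.
_PATTERNS = [
    ("bitonal pitch accent",
     re.compile(rf"^(?:{_TONE}\*\+{_TONE}|{_TONE}\+{_TONE}\*)$")),
    ("pitch accent", re.compile(rf"^{_TONE}\*$")),
    ("phrase accent", re.compile(rf"^{_TONE}-$")),
    ("boundary tone", re.compile(rf"^{_TONE}%$")),
]


def classify_tobi(label: str) -> str:
    """Map a single tone label to its category, or 'unknown'."""
    for name, pattern in _PATTERNS:
        if pattern.match(label):
            return name
    return "unknown"


for label in ["H*", "L+H*", "L-", "H%"]:
    print(label, "->", classify_tobi(label))
```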
Theoretical background – Our work
• “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner
• Objectives
– Validate Steedman’s theory
– Proposal for more complex syntactic structures
Theoretical background - Mel’čuk
Steedman
• Linearity
• Intonation ~ theme/rheme
Mel’čuk
• Hierarchy
• Intonation ~ Thematicity
– theme/rheme
– specifiers
– embeddedness
Preliminary experiments
• Wall Street Journal corpus (Penn Treebank)
• American English recordings
• Native speakers
• 109 sentences
• AuToBI labelling + reduction model
• Manual annotation of Thematicity
Validating the classic interface
• To what extent can the classic approaches be applied to general discourse with more complex sentences?
• Examples matching the expected THEME patterns…
… but not the expected RHEMES.
Validating the classic interface
• We have found that…
– Themes usually match, although ~40% do not.
– Steedman’s approach of including everything apart from the theme in a flat rheme span lacks accuracy.
• We need a more accurate IS-prosody interface.
Towards a more accurate IS-Prosody interface
• Our hypothesis:
– Applying Mel’čuk’s hierarchical tripartite thematicity structure, we will be able to:
• Propose a more accurate model of the intonation-thematicity correlation for the ~40% non-coincident patterns in theme spans.
• Find a justification for the discrepancies observed in the rheme patterns.
Towards a more accurate IS-Prosody interface
• Specifier
Example with the annotation suggested by Mel’čuk (1)
Towards a more accurate IS-Prosody interface
• Specifier
Example with the annotation suggested by Mel’čuk (2)
Towards a more accurate IS-Prosody interface
• Hierarchy
Example with the annotation suggested by Mel’čuk
rising pattern ↔ theme
Embedded themes behave as main themes in terms of intonation.
Classification experiments
• Combining Acoustic and Linguistic Levels in Phrase-Oriented Prosody Modelling
Classification experiments
• Testing acoustic parameters
Classification experiments
• Testing linguistic features
Conclusions
• Information Structure determines the “communicative” segmentation of the meaning of an utterance.
• It is central to the semantics-syntax-intonation interface, and to NLP.
Conclusions
• Descriptive study attempting to determine which intonation patterns better characterize thematicity in real utterances.
• Flat theme/rheme interpretation prevailing in classical approaches fails to explain complex linguistic structures.
• Hierarchical structures and specifiers yield positive results.
Prosody & discourse structure
• Rhetorical Structure Theory (RST) (Mann & Thompson, 1988)
Describes the organizational structure of texts via definitions of relations between two text spans, the nucleus (N) and the satellite (S)
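One way to picture the nucleus/satellite description is as a small tree of relations over text spans. The sketch below is an illustrative data structure only, not an RST parser and not part of the talk: the class names, the `nucleus_text` helper and the example sentences are hypothetical, and it models only nucleus-satellite relations.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Span:
    text: str


@dataclass
class Relation:
    name: str                              # relation label, e.g. "Elaboration"
    nucleus: Union["Span", "Relation"]     # the more central part
    satellite: Union["Span", "Relation"]   # the supporting part


def nucleus_text(node: Union[Span, Relation]) -> str:
    """Follow nucleus links down to the most central text span."""
    while isinstance(node, Relation):
        node = node.nucleus
    return node.text


# Invented example text, for illustration only.
tree = Relation(
    name="Elaboration",
    nucleus=Relation(
        name="Evidence",
        nucleus=Span("The new voice sounds more natural."),
        satellite=Span("Listeners rated it higher."),
    ),
    satellite=Span("The ratings were collected online."),
)
print(nucleus_text(tree))
```

A prosody model could then read features such as relation name or nucleus/satellite role from each span, which is the kind of discourse information the slide points to.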
Conclusions
• Prosody prediction from:
– Type of sentence
– Discourse structure
– Discourse markers
– Information structure
… to improve expressiveness and naturalness of automatically generated speech
Thank you for your attention!