SSML Extension for Expressive Mandarin TTS
Shuang Li, Hongwu Yang, Lianhong Cai
Tsinghua University
Outline
• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion
Motivation (1/3)
• Sentences with the same text can be expressed in different styles, emotions, and moods
• Current TTS systems lack this variability
Motivation (2/3)
• Current SSML cannot specify speaking style, emotion, or mood
– Good news: 生日快乐 "Happy birthday", expressed with happiness (emotion)
– Bad news: 张总去世了 "Director Zhang passed away", expressed with sadness (emotion)
– Information providing: 飞往纽约的飞机将要起飞 "The flight to New York is about to take off", expressed in a mild mood
– Dialog: 是中国队赢了吗? "Did the Chinese team win?", with emphasis on "Chinese" and an interrogative mood
• With current SSML, it is hard to mark the differences between these expressions
Motivation (3/3)
Expressive speech combines three components:
• Emotion: positive, neutral, negative
• Style: news, sports commentary, dialog, information providing, ……
• Characteristic
Emotion, style, and characteristic are relatively independent but cannot be separated. Characteristic and style are relatively stable, global features; emotion is a short-time, local feature.
Expressing pattern and existing SSML support:
• Characteristic: physiological/social characteristics; voice tag
• Emotion: physiological reactions; no tag
• Style: different speaking styles that represent the speaker's attitude, purpose, and emotion and fit the circumstances better; no tag
Outline
• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion
Expression of Speech
• Style: speaking style (dialog, news, information providing…)
• Mood: speaking mood (request, acquisition, affirmation, apology…)
• Emotion: emotional activity (neutral, negative, positive)
[Figure: style, mood, and emotion are expressed through intonation, emphasis, speaking rate, breaks, and spectral features, which rest on the acoustic parameters duration, energy, and pitch]
Hierarchical framework of prosody
• Break levels
– B0: no break
– B1: syllable
– B2: prosodic word
– B3: prosodic phrase
– B4: breath group
– B5: prosodic group
• Chiu-yu Tseng, et al. Fluent speech prosody: Framework and modeling. Speech Communication, 46 (2005) 284-399
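As a sketch of how a synthesis front end might represent these levels, the Python below encodes B0 to B5 and maps each level to the unit whose boundary it marks, reusing the element names (syl, pw, pph, bg, utterance) proposed in this extension. The names `BreakLevel`, `UNIT_FOR_LEVEL`, and `closed_units` are hypothetical, and the rule that a higher-level break also closes every lower-level unit is an assumption drawn from the strictly nested hierarchy, not something stated on the slide.

```python
from enum import IntEnum


class BreakLevel(IntEnum):
    """Break levels of the hierarchical prosody framework (B0 to B5)."""
    B0 = 0  # no break
    B1 = 1  # syllable boundary
    B2 = 2  # prosodic word boundary
    B3 = 3  # prosodic phrase boundary
    B4 = 4  # breath group boundary
    B5 = 5  # prosodic group boundary


# Hypothetical mapping: unit whose boundary each level marks, named after the
# proposed elements (utterance = prosodic group). B0 marks no boundary.
UNIT_FOR_LEVEL = {
    BreakLevel.B0: None,
    BreakLevel.B1: "syl",
    BreakLevel.B2: "pw",
    BreakLevel.B3: "pph",
    BreakLevel.B4: "bg",
    BreakLevel.B5: "utterance",
}


def closed_units(level: BreakLevel) -> list[str]:
    """Return every unit that ends at a break of the given level.

    Assumes strictly nested units, so a break at level n also ends every
    lower-level unit (a B3 break ends a syl, a pw and a pph).
    """
    return [UNIT_FOR_LEVEL[BreakLevel(i)] for i in range(1, level + 1)]
```

Under this assumption, `closed_units(BreakLevel.B3)` yields `["syl", "pw", "pph"]`, and a B0 position closes nothing.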
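我永远忘不了 <B3/25ms> 一张对日抗战时的新闻照片, <B3/507ms> 轰炸后的废墟焦土上,<B3/272ms> 一个衣不蔽体、 <B3/384ms> 满身尘土灰烟的幼儿 <B3/100ms> 坐在地上 <B3/75ms> 无助的大哭着。 <B5/1110ms> 那是一再令我热泪盈眶的镜头。 <B3/507ms> 新闻摄影中的战争传真 <B3/276ms> 已不能只称是照片了。 <B5/802ms>
(English: "I will never forget a news photograph from the War of Resistance against Japan: on the scorched ruins after a bombing, a barely clothed toddler covered in dust and ash sat on the ground, crying helplessly. It is a scene that has brought me to tears again and again. The record of war in news photography can no longer be called mere photographs.")
• From Chiu-yu Tseng's talk at Beijing University, Oct 11, 2005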
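The annotation above pairs each boundary with a break level and a measured pause. A minimal Python sketch, assuming the `<Bn/xxxms>` notation is used exactly as shown, can split such a passage into segments and render the pauses as standard SSML `<break time="..."/>` elements. `parse_annotated`, `Segment`, and `to_ssml` are illustrative names, not part of any published tool.

```python
import re
from dataclasses import dataclass
from xml.sax.saxutils import escape

# Matches annotations such as "<B3/25ms>": a break level and a pause length.
BREAK_MARK = re.compile(r"<B(\d)/(\d+)ms>")


@dataclass
class Segment:
    text: str      # text preceding the break mark
    level: int     # break level, 0 to 5
    pause_ms: int  # annotated pause duration in milliseconds


def parse_annotated(passage: str) -> list[Segment]:
    """Split a break-annotated passage into segments, one per break mark.

    Any text after the last mark has no break and is ignored here.
    """
    segments = []
    pos = 0
    for match in BREAK_MARK.finditer(passage):
        text = passage[pos:match.start()].strip()
        segments.append(Segment(text, int(match.group(1)), int(match.group(2))))
        pos = match.end()
    return segments


def to_ssml(segments: list[Segment]) -> str:
    """Render segments as escaped text plus standard SSML break elements."""
    return "".join(
        f'{escape(s.text)}<break time="{s.pause_ms}ms"/>' for s in segments
    )
```

Note that `to_ssml` keeps only the pause duration: standard SSML `<break>` has no notion of B-levels, which is exactly the information the proposed hierarchical elements are meant to carry.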
Outline
• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion
Proposed tags (1/2)
• Utterance: a prosodic group that expresses a complete meaning
– Attributes:
– style: speaking style. Values: news, reading, information provider, dialog, etc.
– emotion: speaking emotion. Values: happy, sad, angry, calm, despair, etc., or a polarity: +1 for positive, 0 for neutral, -1 for negative
– mood: speaking mood. Values: given, request, acquisition, affirmation, apology, etc.
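One way to check utterance attributes before synthesis is sketched below in Python with the standard `xml.etree.ElementTree` module. Because the slides allow both emotion labels and a +1/0/-1 polarity without defining how one maps to the other, the `EMOTION_POLARITY` table is a hypothetical assignment (for example, treating calm as neutral), and `emotion_polarity` and `make_utterance` are illustrative names.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical label-to-polarity table: the slides list both labels and a
# +1/0/-1 scale but do not define how the labels map onto the scale.
EMOTION_POLARITY = {
    "happy": 1,
    "calm": 0,
    "neutral": 0,
    "sad": -1,
    "angry": -1,
    "despair": -1,
}


def emotion_polarity(value: str) -> int:
    """Accept a numeric polarity ("+1", "0", "-1") or an emotion label."""
    value = value.strip().lower()
    if value in {"+1", "1", "0", "-1"}:
        return int(value)
    return EMOTION_POLARITY[value]  # raises KeyError for unknown labels


def make_utterance(style: str, emotion: str, mood: str) -> ET.Element:
    """Build an <utterance> element carrying the three expressive attributes."""
    emotion_polarity(emotion)  # validate the emotion value before using it
    return ET.Element(
        f"{{{SSML_NS}}}utterance",
        {"style": style, "emotion": emotion, "mood": mood},
    )
```

With this sketch, `emotion_polarity("Sad")` returns -1 and `emotion_polarity("+1")` returns 1, while a misspelled label such as `"angery"` raises `KeyError`, so typos in markup surface before synthesis rather than being silently ignored.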
Proposed tags (2/2)
• BG: breath group
– Attribute: intonation. Values: indicative, interrogative, imperative
• PPh: prosodic phrase
• PW: prosodic word
• Syl: syllable
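The slides define the five units but not formal nesting rules. The Python sketch below assumes that each prosodic element may appear only inside a strictly larger one (levels may be skipped, as when a pw sits directly in a bg), and that standard SSML elements such as emphasis, prosody, and break are transparent to the hierarchy. `RANK`, `local_name`, and `nesting_errors` are hypothetical names, and the rule itself is an illustration rather than part of the proposal.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Size rank of each proposed prosodic element (larger unit, higher rank).
RANK = {"utterance": 5, "bg": 4, "pph": 3, "pw": 2, "syl": 1}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def nesting_errors(elem: ET.Element, parent: Optional[str] = None) -> list[str]:
    """List prosodic elements that sit inside an equal or smaller unit.

    `parent` is the nearest enclosing prosodic element (None at the top).
    Elements not in RANK (speak, emphasis, prosody, break) are transparent:
    their children are checked against the same enclosing unit.
    """
    errors = []
    for child in elem:
        name = local_name(child.tag)
        if name not in RANK:
            errors += nesting_errors(child, parent)  # transparent element
            continue
        if parent is not None and RANK[name] >= RANK[parent]:
            errors.append(f"<{name}> inside <{parent}>")
        errors += nesting_errors(child, name)
    return errors
```

Calling `nesting_errors(ET.fromstring(ssml_text))` on a whole `<speak>` document returns an empty list when every unit is properly nested; namespaced tags are handled by `local_name`.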
Some examples (1/3)
```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                           http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="information provider" emotion="-1" mood="apology">
    <bg intonation="indicative">
      <!-- "Flight 1121 has been delayed for an hour" -->
      <pph>1121次航班</pph>
      <pph>延误<pw><emphasis level="strong">1小时</emphasis></pw></pph>
      <break strength="medium" time="215ms"/>
      <!-- "Passengers, please go to waiting room G6 and wait" -->
      <pph>请旅客们到</pph>
      <pw><emphasis level="moderate">G6</emphasis></pw>
      <pph>候机厅等候</pph>
    </bg>
  </utterance>
</speak>
```
Some examples (2/3)
```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                           http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="dialog" emotion="neutral" mood="acquisition">
    <bg intonation="interrogative">
      <!-- "Zhang Wei" -->
      <pph><pw><emphasis level="strong">张威</emphasis></pw></pph>
      <break strength="medium" time="75ms"/>
      <!-- "is worried that Xiao Yin will get dizzy while driving" -->
      <pph>担心肖荫开车发晕</pph>
    </bg>
  </utterance>
</speak>
```
Some examples (3/3)
```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                           http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
       xml:lang="zh-CN">
  <utterance style="dialog" emotion="angry">
    <bg intonation="interrogative">
      <!-- "Isn't it your fault?" -->
      <prosody rate="x-fast">难道不是你的错吗?</prosody>
      <break strength="medium" time="520ms"/>
    </bg>
    <bg intonation="imperative">
      <!-- "Be careful next time" -->
      以后你小心一点
    </bg>
  </utterance>
</speak>
```
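Because these examples place utterance and bg in the default SSML namespace, a processor built on Python's ElementTree has to query them with that namespace URI. The sketch below parses a trimmed, well-formed copy of the third example (comments and schema attributes dropped) and lists each breath group with its intonation and the enclosing utterance's attributes; it only reads the markup and does not synthesize anything. `breath_groups` and `DOC` are illustrative names. Also note that utterance and bg are not defined in the W3C SSML 1.0 schema, so a validator using synthesis.xsd would likely reject these documents unless the schema is extended.

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://www.w3.org/2001/10/synthesis"}

# A trimmed copy of example 3; the custom elements inherit the default namespace.
DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-CN">
  <utterance style="dialog" emotion="angry">
    <bg intonation="interrogative"><prosody rate="x-fast">难道不是你的错吗?</prosody><break strength="medium" time="520ms"/></bg>
    <bg intonation="imperative">以后你小心一点</bg>
  </utterance>
</speak>"""


def breath_groups(ssml: str) -> list[tuple[dict, str, str]]:
    """Return (utterance attributes, bg intonation, bg text) for every bg."""
    root = ET.fromstring(ssml)
    result = []
    for utt in root.findall("s:utterance", NS):
        for bg in utt.findall("s:bg", NS):
            # itertext() also collects text inside nested elements such as prosody.
            text = "".join(bg.itertext()).strip()
            result.append((dict(utt.attrib), bg.get("intonation"), text))
    return result
```

Querying with an unprefixed path such as `root.findall("utterance")` would return nothing here, since the parsed tags carry the namespace URI.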
Outline
• Motivation
• Expression of Speech
• Proposed SSML extension
• Conclusion
Conclusion & Questions
• 5 elements for the hierarchical prosodic structure: utterance, bg, pph, pw, syl
• 3 expressive attributes for utterance
– style
– emotion
– mood
• 1 attribute for bg
– intonation