Producing Emotional Speech Thanks to Gabriel Schubiner.

Papers

Generation of Affect in Synthesized Speech

Corpus-based approach to synthesis

Expressive visual speech using talking head

Demos

Affect Editor Quiz/Demo

Synface Demo

Affect in Speech

Goals

Addition of Emotion to Synthetic speech

Acoustic Model

Typology of parameters of emotional speech

Quantification

Addresses problem of expressiveness

What benefit is gained from expressive speech?

Emotion Theory/Assumptions

Emotion -> Nervous System -> Speech Output

Binary distinction

Parasympathetic vs Sympathetic

based on physical changes

universal emotions

Approaches to Affect

Generative

Emotion -> Physical -> Acoustic

Descriptive

Observed acoustic params imposed

Descriptive Framework

4 Parameter groups

Pitch

Timing

Voice Quality

Articulation

Assumption of independence

How could this affect design and results?

Pitch

Accent Shape

Average Pitch

Contour Slope

Final Lowering

Pitch Range

Reference Line

Timing

Exaggeration (not used)

Fluent Pauses

Hesitation Pauses

Speech Rate

Stress Frequency

Stressed / Stressable

Voice Quality

Breathiness

Brilliance

Loudness

Pause Discontinuity

Pitch Discontinuity

Tremor

Laryngealization

Articulation

Precision

Implementation

Each parameter has scale

Each scale is independent

from other parameters

between positive and negative

Implementation

Settings grouped into preset conditions for each emotion

based on prior studies
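The two Implementation slides can be read as a simple data structure: one independent scale per parameter, and a named preset per emotion. The sketch below is only an illustration of that idea. The scale limits (-10 to 10), the function name `make_setting`, and the values in `demo_affect` are assumptions made for the example, not settings taken from the paper.

```python
# Assumed scale limits (hypothetical): the slides only say that each
# parameter has its own scale with a negative and a positive side.
SCALE_MIN, SCALE_MAX = -10, 10

# Parameter names taken from the four groups listed on the slides
# (Exaggeration is left out because the slides mark it as not used).
PARAMETER_GROUPS = {
    "pitch": ["accent_shape", "average_pitch", "contour_slope",
              "final_lowering", "pitch_range", "reference_line"],
    "timing": ["fluent_pauses", "hesitation_pauses", "speech_rate",
               "stress_frequency"],
    "voice_quality": ["breathiness", "brilliance", "loudness",
                      "pause_discontinuity", "pitch_discontinuity",
                      "tremor", "laryngealization"],
    "articulation": ["precision"],
}
ALL_PARAMETERS = [p for group in PARAMETER_GROUPS.values() for p in group]


def make_setting(**overrides):
    """Return a full setting: every parameter is 0 (neutral) unless overridden."""
    setting = {name: 0 for name in ALL_PARAMETERS}
    for name, value in overrides.items():
        if name not in setting:
            raise KeyError(f"unknown parameter: {name}")
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{name}={value} is outside the assumed scale")
        # Each scale is set on its own; no other parameter is touched.
        setting[name] = value
    return setting


# Placeholder presets. The numbers are illustrative only, NOT from the paper.
PRESETS = {
    "neutral": make_setting(),
    "demo_affect": make_setting(speech_rate=-5, pitch_range=3),
}
```

Because every parameter is stored and validated separately, the independence assumption from the Descriptive Framework slide is built into the representation itself.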

Program Flow: Input

Emotion -> parameter representation

Utterance -> clauses

Agent, Action, Object, Locative

Clause and lexeme annotations

Finds all possible locations of affect and chooses whether or not to use each one

Program Flow

Utterance -> Tree structure -> linear phonology

“compiled” for specific synthesizer with software to simulate affects not available in hardware

Perception

30 Utterances

5 sentences * 6 affects

Forced choice of one of six affects

magnitude and comments
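As a small sketch of the design above, the stimulus set is the cross product of sentences and affects. The sentence texts come from the Elicitation Sentences slide and the affect labels from the quiz slide. Treating exactly these five sentences as the test set is an assumption, and `build_stimuli` is a hypothetical helper name.

```python
from itertools import product

# Sentence texts as listed on the Elicitation Sentences slide (assumed to be
# the five used in the test).
SENTENCES = [
    "I'm almost finished",
    "I'm going to the city",
    "I saw your name in the paper",
    "I thought you really meant it",
    "Look at that picture",
]
# The six affects offered in the forced choice.
AFFECTS = ["anger", "disgust", "fear", "gladness", "sadness", "surprise"]


def build_stimuli(sentences, affects):
    """Pair every sentence with every affect."""
    return [{"sentence": s, "affect": a} for s, a in product(sentences, affects)]


stimuli = build_stimuli(SENTENCES, AFFECTS)
print(len(stimuli))  # 30
```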

Elicitation Sentences

Intro

I’m almost finished

I’m going to the city

I saw your name in the paper X

I thought you really meant it

Look at that picture

Pop Quiz!!!

Pop Quiz Solutions

I’m almost finished
Disgust : Surprise : Sadness : Gladness : Anger : Fear

I’m going to the city
Surprise : Gladness : Anger : Disgust : Sadness : Fear

I thought you really meant it
Anger : Disgust : Gladness : Sadness : Fear : Surprise

Look at that picture
Anger : Fear : Disgust : Sadness : Gladness : Surprise

Results

approx 50% recognition rate

91% sadness
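To put these rates in context, a six-way forced choice has a chance level of 1/6. The sketch below works out that baseline and the ratio to the reported rates. The function names are hypothetical, and the only figures used are the ones on the slide.

```python
def chance_level(n_choices):
    """Expected recognition rate when guessing uniformly at random."""
    return 1.0 / n_choices


def times_over_chance(rate, n_choices):
    """How many times higher than chance a recognition rate is."""
    return rate * n_choices


N_AFFECTS = 6
print(round(chance_level(N_AFFECTS), 3))   # 0.167
print(times_over_chance(0.50, N_AFFECTS))  # 3.0 for the approx. 50% overall rate
print(round(times_over_chance(0.91, N_AFFECTS), 2))  # 5.46 for sadness
```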

Conclusions

Effective?

Thoughts?

Corpus-based Approach to

Expressive Speech Synthesis

Corpus

Collect utterances in each emotion

emotion-dependent semantics

One speaker

Good news, Bad news, Question

Model: Feature Vector

Features

Lexical stress

Phrase-level stress

Distance from beginning of phrase

Distance from end of phrase

POS

Phrase-type

End of syllable pitch

Model: Classification

Predicts F0

5 syllable window

Uses feature vector to predict observation vector

observation vector: log(p), Δp

p = end of syllable pitch

Decision Tree
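A minimal sketch of the two data-preparation steps named above: building the observation vector (log(p), Δp) from end-of-syllable pitch, and gathering a 5-syllable context window. The slide does not define Δp or the window alignment, so the sketch assumes Δp is the change in Hz from the previous syllable and that the window is centered on the current syllable. Both function names are hypothetical.

```python
import math


def observation_vectors(end_pitches_hz):
    """Return one (log p, delta p) pair per syllable.

    Assumption: delta p is the difference from the previous syllable's
    end-of-syllable pitch, with 0.0 for the first syllable.
    """
    vectors = []
    previous = None
    for p in end_pitches_hz:
        delta = 0.0 if previous is None else p - previous
        vectors.append((math.log(p), delta))
        previous = p
    return vectors


def context_window(items, index, size=5, pad=None):
    """Return `size` items centered on `index`, padded at the edges.

    Assumption: the 5-syllable window is centered on the current syllable.
    """
    half = size // 2
    return [items[j] if 0 <= j < len(items) else pad
            for j in range(index - half, index + half + 1)]


obs = observation_vectors([100.0, 110.0, 90.0])
win = context_window(["s1", "s2", "s3", "s4", "s5", "s6"], 0)
```

A decision tree would then be trained to map the windowed feature vectors to these observation vectors; that training step is not shown here.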

Model: Target Duration

Similar to predicting F0

build tree with goal of providing Gaussian at leaves

Use mean of class as target duration

discretization
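The leaf idea above can be sketched as follows: each leaf summarizes the durations that reached it as a Gaussian, and the Gaussian's mean is used as the target duration. The class name `DurationLeaf`, the single split on lexical stress, and all duration values are assumptions chosen for illustration, not data from the paper.

```python
from statistics import mean, pstdev


class DurationLeaf:
    """A tree leaf that models the durations it holds as one Gaussian."""

    def __init__(self, durations_ms):
        self.mu = mean(durations_ms)       # Gaussian mean
        self.sigma = pstdev(durations_ms)  # Gaussian standard deviation

    def target_duration(self):
        # The class mean is used directly as the target duration.
        return self.mu


# Toy one-question tree (hypothetical): split on lexical stress only.
# The durations are made-up numbers for the example.
toy_tree = {
    True: DurationLeaf([180, 200, 220]),   # stressed syllables
    False: DurationLeaf([90, 100, 110]),   # unstressed syllables
}


def predict_duration(is_stressed):
    """Walk the toy tree and return the leaf's target duration in ms."""
    return toy_tree[is_stressed].target_duration()
```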

Models

Uses acoustic analogue of n-grams

captures sense of context

compared to describing full emotion as sequence

compare to Affect Editor

Uses only F0 and length (compare Affect Editor)

Includes information about which utterance the features are derived from

intentional bias, justified?

Model: Synthesis

Data tagged with original expression and emotion

expression-cost matrix

noted trade-off:

emotional intensity vs. smoothness

Paralinguistic events
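One way to picture the expression-cost matrix and the noted trade-off is as an extra term in unit selection. In the sketch below, each candidate unit carries the expression it was recorded in, and a mismatch with the requested expression adds a penalty. The matrix values, the `acoustic_cost` field, the `weight` parameter, and `select_unit` are all hypothetical and are not taken from the paper.

```python
# Hypothetical expression-cost matrix: cost of using a unit recorded in the
# column expression when the row expression is requested. Values are made up.
EXPRESSION_COST = {
    "neutral":   {"neutral": 0.0, "good_news": 0.5, "bad_news": 0.5},
    "good_news": {"neutral": 0.5, "good_news": 0.0, "bad_news": 1.0},
    "bad_news":  {"neutral": 0.5, "good_news": 1.0, "bad_news": 0.0},
}


def total_cost(unit, requested, weight):
    """Acoustic (smoothness) cost plus weighted expression mismatch cost."""
    mismatch = EXPRESSION_COST[requested][unit["expression"]]
    return unit["acoustic_cost"] + weight * mismatch


def select_unit(candidates, requested, weight=1.0):
    """Pick the candidate with the lowest combined cost."""
    return min(candidates, key=lambda u: total_cost(u, requested, weight))


candidates = [
    {"id": "a", "expression": "neutral", "acoustic_cost": 0.1},
    {"id": "b", "expression": "good_news", "acoustic_cost": 0.4},
]
smooth = select_unit(candidates, "good_news", weight=0.0)
expressive = select_unit(candidates, "good_news", weight=2.0)
```

With a low weight the smoother neutral unit wins. With a high weight the unit in the requested expression wins even though it joins less smoothly, which is the intensity versus smoothness trade-off in miniature.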

SSML

Compare to Cahn’s typology

Abstraction layers

Perception Experiment

Distinguish same utterance spoken with neutral and affected prosody

Semantic content problematic?

Results

Binary decision

Reasonable gain over baseline?

Conclusion

Major contributions?

Paths forward?

Synthesis of Expressive Visual Speech on a

Talking Head

Not these Talking Heads...

Synthesis Background

Manipulation of video images

Virtual model with deformation parameters

Synchronized with time-aligned transcription

Articulatory Control Model

Cohen & Massaro (1993)

Data

Single actor

Given specific emotion as instruction

6 emotions + neutral

Facial Animation Parameters

Face independent

FAP Matrix * scaling factor + position0

Weighted deformations of distance between vertices and feature point
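The formula on this slide (FAP value times a scaling factor, added to the rest position) can be sketched for a single feature point as below. The linear distance falloff, the `radius` parameter, and the function names are assumptions made for the example. The slide only says that deformations are weighted by the distance between vertices and the feature point.

```python
import math


def falloff(vertex, feature_point, radius):
    """Assumed weighting: 1 at the feature point, falling linearly to 0 at `radius`."""
    distance = math.dist(vertex, feature_point)
    return max(0.0, 1.0 - distance / radius)


def apply_fap(vertices, feature_point, fap_displacement, scale, radius):
    """Return new vertex positions: rest position + weight * scale * FAP displacement."""
    moved = []
    for vertex in vertices:
        w = falloff(vertex, feature_point, radius)
        moved.append(tuple(p0 + w * scale * d
                           for p0, d in zip(vertex, fap_displacement)))
    return moved


rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
new_positions = apply_fap(rest, feature_point=(0.0, 0.0, 0.0),
                          fap_displacement=(0.0, 1.0, 0.0),
                          scale=2.0, radius=2.0)
```

The scaling factor is what makes the parameters face independent: the same FAP value can drive faces of different sizes by changing `scale` alone.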

Modeling

Phonetic segments assigned target parameter vector

temporal blending over dominance functions

Principal components
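Temporal blending over dominance functions can be sketched in the style of the Cohen & Massaro (1993) model cited earlier: each segment has a target value and a dominance curve that peaks at the segment's center, and the output is the dominance-weighted average of the targets. The negative-exponential shape and the parameter values (`alpha`, `theta`, `c`) are assumptions for illustration, and a single scalar stands in for the full parameter vector.

```python
import math


def dominance(t, center, alpha=1.0, theta=10.0, c=1.0):
    """Assumed dominance shape: peaks at `center`, decays exponentially with distance."""
    return alpha * math.exp(-theta * abs(t - center) ** c)


def blend(t, segments):
    """Dominance-weighted average of the segment targets at time t."""
    weights = [dominance(t, seg["center"]) for seg in segments]
    total = sum(weights)
    return sum(w * seg["target"] for w, seg in zip(weights, segments)) / total


# Two hypothetical segments with scalar targets (for example one lip parameter).
segments = [
    {"center": 0.0, "target": 0.0},
    {"center": 1.0, "target": 1.0},
]
midpoint_value = blend(0.5, segments)
start_value = blend(0.0, segments)
```

Halfway between the two centers both segments have equal dominance, so the value is the average of the two targets. Near a segment's center its own target dominates, which produces the coarticulation-like smoothing between phonetic segments.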

ML

Separate models for each emotion

6:1 training:testing ratio

models -> PC traj -> FAP traj * emotion param matrix

Results

More extreme emotions easier to perceive

73% sad, 60% angry, 40% sad

Synface Demo

Discussion

Changes in approach from Cahn to Eide

Production compared to Detection