Modelling Personality Features by Changing Prosody in Synthetic Speech


Jürgen Trouvain (1, 2), Sarah Schmidt (3), Marc Schröder (4), Michael Schmitz (3) & Bill Barry (2)

(1) Phonetik-Büro Trouvain, Saarbrücken
(2) Institute of Phonetics, Saarland University
(3) Institute of Computer Science, Saarland University
(4) DFKI GmbH, Saarbrücken


Dimensions of human personality

Five factor model:

Personality dimension     High level                Low level
Neuroticism               sensitive, nervous        secure, confident
Extraversion              outgoing, energetic       shy, withdrawn
Openness to experience    inventive, curious        cautious, conservative
Agreeableness             friendly, compassionate   competitive, outspoken
Conscientiousness         efficient, organized      easy-going, careless

Features of personality in synthetic speech

• Nass & Lee (2001)
  – "introverted~extroverted" (among others)
  – manipulated parameters in synthetic speech:
    • F0 range
    • F0 mean
    • tempo
  – listeners perceive degree of introversion as predicted

Dimensions of brand personality

Personality dimension   Attributes                                   Examples
Sincerity               down-to-earth, honest, wholesome, cheerful   plant
Excitement              daring, spirited, imaginative, up-to-date    snowboard
Competence              reliable, intelligent, successful            lexicon
Sophistication          upper class, charming                        fragrance
Ruggedness              outdoorsy, tough                             tractor

Aaker (1997)

Prosody of brand personality

                 F0 mean   F0 range   tempo
Sincerity                  +
Excitement       +         +          +
Competence       –         +          +
Sophistication   –         +
Ruggedness       –                    –

findings of possible correlates in literature

Synthetic speech

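• MARY speech synthesis (mary.dfki.de)
• two voices
  – male voice (Mbrola de6)
  – female voice (Mbrola de7)
• one utterance
  – "Hallo, ich bin Produkt XY. Ich möchte mich kurz vorstellen. Ich werde nun meine Eigenschaften erläutern."
    (English: "Hello, I am product XY. I would like to introduce myself briefly. I will now explain my features.")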

Parametrisation of prosody

                 F0 mean   F0 range   tempo
lowered ("–")    –30%      2 st       0%
baseline         0%        4 st       +15%*
raised ("+")     +30%      8 st       +30%

* default rather slow
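The percentage and semitone values above can be turned into concrete F0 figures with a little arithmetic. The sketch below is only an illustration: `BASE_F0_HZ` is a made-up reference value (not a figure from the study), the function names are hypothetical, and it assumes that the F0 mean shift applies to the value in Hz and that the range is centred on the mean in semitones.

```python
from typing import Tuple

# Hypothetical reference F0 for illustration only; not taken from the study.
BASE_F0_HZ = 100.0

# Values from the parametrisation table (relative mean shift, range in semitones).
F0_MEAN_SHIFT = {"lowered": -0.30, "baseline": 0.0, "raised": 0.30}
F0_RANGE_ST = {"lowered": 2.0, "baseline": 4.0, "raised": 8.0}


def shifted_mean(base_hz: float, level: str) -> float:
    """Apply the relative F0 mean shift of the given level to a base F0."""
    return base_hz * (1.0 + F0_MEAN_SHIFT[level])


def range_ratio(level: str) -> float:
    """Frequency ratio spanned by the level's range (12 semitones = factor 2)."""
    return 2.0 ** (F0_RANGE_ST[level] / 12.0)


def range_bounds(mean_hz: float, level: str) -> Tuple[float, float]:
    """Lower and upper F0, assuming the range is centred on the mean in semitones."""
    half = F0_RANGE_ST[level] / 2.0
    return mean_hz * 2.0 ** (-half / 12.0), mean_hz * 2.0 ** (half / 12.0)


if __name__ == "__main__":
    for level in ("lowered", "baseline", "raised"):
        mean = shifted_mean(BASE_F0_HZ, level)
        low, high = range_bounds(mean, level)
        print(f"{level}: mean {mean:.1f} Hz, range {low:.1f} to {high:.1f} Hz")
```

With these assumptions, a range of 8 st corresponds to a frequency ratio of 2^(8/12), roughly 1.59, between the lowest and highest F0 value.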


Manipulating prosody

                 F0 mean   F0 range   tempo
Sincerity        0         +1         0
Excitement       +1        +1         +1
Competence       -1        +1         +1
Sophistication   -1        +1         0
Ruggedness       -1        0          -1
Baseline         0         0          0
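One way to read the table is as a lookup: each model is a triple of levels, and each level maps to a concrete value from the parametrisation slide. The sketch below assumes that -1, 0 and +1 correspond to "lowered", "baseline" and "raised"; the names `LEVEL_VALUES`, `MODELS` and `settings_for` are hypothetical and do not come from MARY or from the study.

```python
# Level-to-value mapping, taken from the "Parametrisation of prosody" slide.
# Assumption: -1 = lowered, 0 = baseline, +1 = raised.
LEVEL_VALUES = {
    "f0_mean": {-1: "-30%", 0: "0%", 1: "+30%"},
    "f0_range": {-1: "2 st", 0: "4 st", 1: "8 st"},
    "tempo": {-1: "0%", 0: "+15%", 1: "+30%"},
}

# Brand personality models as (F0 mean, F0 range, tempo) levels.
MODELS = {
    "Sincerity": (0, 1, 0),
    "Excitement": (1, 1, 1),
    "Competence": (-1, 1, 1),
    "Sophistication": (-1, 1, 0),
    "Ruggedness": (-1, 0, -1),
    "Baseline": (0, 0, 0),
}


def settings_for(model: str) -> dict:
    """Resolve a model's three levels into concrete parameter values."""
    f0_mean, f0_range, tempo = MODELS[model]
    return {
        "f0_mean": LEVEL_VALUES["f0_mean"][f0_mean],
        "f0_range": LEVEL_VALUES["f0_range"][f0_range],
        "tempo": LEVEL_VALUES["tempo"][tempo],
    }


if __name__ == "__main__":
    for name in MODELS:
        print(name, settings_for(name))
```

Under this reading, "Ruggedness" resolves to an F0 mean of -30%, an F0 range of 4 st and a tempo of 0%.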


Listening test

Schmidt (2005)

• judging on scale from 1 (does not fit at all) to 5 (fits very well)

• 36 native speakers of German

• online test

Judgements female voice

judged as    baseline   as intended   best rated version
sincere      3.4        3.5
excited      2.9        4.1
competent    3.5        3.3           3.6 (sincere)
sophistic.   3.1        3.2           3.4 (sincere)
rugged       2.6        2.7           3.3 (sophist.)*

1 = "does not fit at all" – 5 = "fits very well"
** = p < 0.01; * = p < 0.05; (*) = p < 0.06

Further significance marks on the slide (row positions not recoverable): **, **

Judgements male voice

judged as    baseline   as intended   best rated version
sincere      3.2        3.5           3.7 (sophistic.)
excited      2.5        3.4
competent    3.5        4.0           4.1 (sophistic.)
sophistic.   2.7        3.6
rugged       3.0        3.6           4.1 (competent)

1 = "does not fit at all" – 5 = "fits very well"
** = p < 0.01; * = p < 0.05; (*) = p < 0.06

Significance marks on the slide (row positions not recoverable):
shown with the "as intended" column: **, **, *, (*)
added with the "best rated version" column: **, *, (*)

Summary

• tendency for statistically significant differences
  – between baseline and models
  – between baseline and best versions
• different preferences for different voices
  – "excited" 3.4 (male) vs. 4.1 (female)
  – "rugged" 4.1 (male) vs. 3.3 (female)
• improved default settings for synthesis
  – male: "sophisticated" model
  – female: "sincere" model
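The size of the effects behind this summary can be read off the judgement tables by subtracting the baseline rating from the "as intended" rating. The sketch below does only that; the ratings are copied from the slides, while the names `RATINGS` and `gains` are hypothetical. It says nothing about statistical significance.

```python
# Mean ratings (baseline, as intended) copied from the two judgement tables.
RATINGS = {
    "female": {
        "sincere": (3.4, 3.5),
        "excited": (2.9, 4.1),
        "competent": (3.5, 3.3),
        "sophistic.": (3.1, 3.2),
        "rugged": (2.6, 2.7),
    },
    "male": {
        "sincere": (3.2, 3.5),
        "excited": (2.5, 3.4),
        "competent": (3.5, 4.0),
        "sophistic.": (2.7, 3.6),
        "rugged": (3.0, 3.6),
    },
}


def gains(voice: str) -> dict:
    """Difference 'as intended' minus 'baseline' per trait, rounded to one decimal."""
    return {
        trait: round(intended - baseline, 1)
        for trait, (baseline, intended) in RATINGS[voice].items()
    }


if __name__ == "__main__":
    for voice in RATINGS:
        print(voice, gains(voice))
```

For the female voice only "excited" moves by more than 0.2 points (+1.2), whereas for the male voice every model gains between 0.3 and 0.9 points over the baseline.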


Conclusions

• modelling personality in synthesis is possible
• more research needed, e.g. with respect to "excited" (also important for emotional synthesis)
• parametric synthesis vs. unit selection
• applications:
  – talking objects
  – speech prostheses for voice-handicapped
  – tuning of a synthetic corporate voice

Outlook

www.icphs2007.de
