Modelling Personality Features by Changing Prosody in Synthetic Speech
Jürgen Trouvain (1, 2), Sarah Schmidt (3), Marc Schröder (4), Michael Schmitz (3) & Bill Barry (2)
1 Phonetik-Büro Trouvain, Saarbrücken
2 Institute of Phonetics, Saarland University
3 Institute of Computer Science, Saarland University
4 DFKI GmbH, Saarbrücken
Dimensions of human personality
Five factor model:
Personality dimension    High level                 Low level
Neuroticism              sensitive, nervous         secure, confident
Extraversion             outgoing, energetic        shy, withdrawn
Openness to experience   inventive, curious         cautious, conservative
Agreeableness            friendly, compassionate    competitive, outspoken
Conscientiousness        efficient, organized       easy-going, careless
Features of personality in synthetic speech
• Nass & Lee (2001)
  – "introverted vs. extroverted" (among other dimensions)
  – manipulated parameters in synthetic speech:
    • F0 range
    • F0 mean
    • tempo
  – listeners perceived the degree of introversion as predicted
Dimensions of brand personality
Personality dimension   Attributes                                    Example
Sincerity               down-to-earth, honest, wholesome, cheerful    plant
Excitement              daring, spirited, imaginative, up-to-date     snowboard
Competence              reliable, intelligent, successful             lexicon
Sophistication          upper class, charming                         fragrance
Ruggedness              outdoorsy, tough                              tractor
Aaker (1997)
Prosody of brand personality
Personality dimension   F0 mean   F0 range   Tempo
Sincerity                         +
Excitement              +         +          +
Competence              –         +          +
Sophistication          –         +
Ruggedness              –                    –
+ = raised, – = lowered; possible correlates based on findings in the literature
Synthetic speech
• MARY speech synthesis (mary.dfki.de)
• two voices
  – male voice (Mbrola de6)
  – female voice (Mbrola de7)
• one utterance
  – "Hallo, ich bin Produkt XY. Ich möchte mich kurz vorstellen. Ich werde nun meine Eigenschaften erläutern."
    ("Hello, I am product XY. I would like to briefly introduce myself. I will now explain my characteristics.")
Parametrisation of prosody
Level            F0 mean   F0 range   Tempo
lowered ("–")    –30%      2 st       0%
baseline         0%        4 st       +15%*
raised ("+")     +30%      8 st       +30%
st = semitones; * the synthesizer's default tempo is rather slow, hence the +15% baseline
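The table mixes units: F0 range is given in semitones, while F0 mean and tempo are given as percentages. A minimal Python sketch for relating the two, assuming that a range of n semitones means a frequency ratio of 2^(n/12) between the highest and lowest F0, and that the F0 mean percentages are relative changes of the mean F0 in Hz. The function names are illustrative only.

```python
import math


def st_to_ratio(semitones: float) -> float:
    """Frequency ratio spanned by an interval of the given size in semitones."""
    return 2.0 ** (semitones / 12.0)


def pct_to_st(percent: float) -> float:
    """Size in semitones of a relative F0 change given in percent (assumed Hz-based)."""
    return 12.0 * math.log2(1.0 + percent / 100.0)


if __name__ == "__main__":
    # F0 range levels from the parametrisation table (lowered, baseline, raised).
    for st in (2, 4, 8):
        print(f"{st} st -> frequency ratio {st_to_ratio(st):.3f}")
    # F0 mean levels from the parametrisation table (lowered, raised).
    for pct in (-30, 30):
        print(f"{pct:+d}% -> {pct_to_st(pct):+.2f} st")
```

Under these assumptions, the 2, 4 and 8 st ranges correspond to frequency ratios of about 1.12, 1.26 and 1.59, and the two F0 mean settings are not symmetric on a semitone scale: +30% is about 4.5 st, whereas –30% is about 6.2 st.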
Manipulating prosody
Model            F0 mean   F0 range   Tempo
Sincerity        0         +1         0
Excitement       +1        +1         +1
Competence       -1        +1         +1
Sophistication   -1        +1         0
Ruggedness       -1        0          -1
Baseline         0         0          0
-1 = lowered, 0 = baseline, +1 = raised
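Combining the two tables gives concrete settings per model: each -1/0/+1 level in the manipulation table selects a row of the parametrisation table. A sketch of that lookup in Python; LEVELS, MODELS and prosody_settings are hypothetical names, percentages are assumed to be relative to the synthesizer default, and the code only resolves the numbers; it does not drive MARY or Mbrola.

```python
from typing import Dict, Tuple

# Hypothetical encoding of the parametrisation table: level -> value.
# F0 mean and tempo: % change relative to the synthesizer default (assumed);
# F0 range: size in semitones.
LEVELS: Dict[str, Dict[int, float]] = {
    "f0_mean_pct": {-1: -30.0, 0: 0.0, 1: 30.0},
    "f0_range_st": {-1: 2.0, 0: 4.0, 1: 8.0},
    "tempo_pct": {-1: 0.0, 0: 15.0, 1: 30.0},
}

# Hypothetical encoding of the manipulation table:
# model -> (F0 mean level, F0 range level, tempo level).
MODELS: Dict[str, Tuple[int, int, int]] = {
    "sincerity": (0, 1, 0),
    "excitement": (1, 1, 1),
    "competence": (-1, 1, 1),
    "sophistication": (-1, 1, 0),
    "ruggedness": (-1, 0, -1),
    "baseline": (0, 0, 0),
}


def prosody_settings(model: str) -> Dict[str, float]:
    """Resolve a model's three levels into concrete prosody values."""
    mean_lvl, range_lvl, tempo_lvl = MODELS[model]
    return {
        "f0_mean_pct": LEVELS["f0_mean_pct"][mean_lvl],
        "f0_range_st": LEVELS["f0_range_st"][range_lvl],
        "tempo_pct": LEVELS["tempo_pct"][tempo_lvl],
    }


if __name__ == "__main__":
    for name in MODELS:
        print(name, prosody_settings(name))
```

For example, the "ruggedness" model resolves to an F0 mean of –30%, a 4 st range and a tempo of 0%, i.e. the (rather slow) default tempo, which is slower than the +15% baseline.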
Listening test
Schmidt (2005)
• ratings on a scale from 1 ("does not fit at all") to 5 ("fits very well")
• 36 native speakers of German
• online test
Judgements female voice
Judged as        Baseline   As intended   Best-rated version
sincere          3.4        3.5
excited          2.9        4.1
competent        3.5        3.3           3.6 (sincere model)
sophisticated    3.1        3.2           3.4 (sincere model)
rugged           2.6        2.7           3.3* (sophisticated model)
**  **
1 = "does not fit at all" to 5 = "fits very well"; ** = p < 0.01; * = p < 0.05; (*) = p < 0.06
Judgements male voice
Judged as        Baseline   As intended   Best-rated version
sincere          3.2        3.5           3.7 (sophisticated model)
excited          2.5        3.4
competent        3.5        4.0           4.1 (sophisticated model)
sophisticated    2.7        3.6
rugged           3.0        3.6           4.1 (competent model)
(*)  *  **  **  **  (*)  *
1 = "does not fit at all" to 5 = "fits very well"; ** = p < 0.01; * = p < 0.05; (*) = p < 0.06
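One way to read the two judgement tables side by side is to subtract the baseline rating from the rating of the intended model for each attribute. A small Python sketch over the mean ratings listed above; it does not reproduce the significance tests, whose procedure is not described here, and RATINGS and gains are illustrative names.

```python
from typing import Dict, Tuple

# Mean ratings copied from the two judgement tables:
# attribute -> (baseline, intended model).
RATINGS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "female": {
        "sincere": (3.4, 3.5),
        "excited": (2.9, 4.1),
        "competent": (3.5, 3.3),
        "sophisticated": (3.1, 3.2),
        "rugged": (2.6, 2.7),
    },
    "male": {
        "sincere": (3.2, 3.5),
        "excited": (2.5, 3.4),
        "competent": (3.5, 4.0),
        "sophisticated": (2.7, 3.6),
        "rugged": (3.0, 3.6),
    },
}


def gains(voice: str) -> Dict[str, float]:
    """Intended-model rating minus baseline rating, rounded to one decimal."""
    return {
        attr: round(intended - baseline, 1)
        for attr, (baseline, intended) in RATINGS[voice].items()
    }


if __name__ == "__main__":
    for voice in RATINGS:
        print(voice, gains(voice))
```

For the female voice only "excited" gains clearly (+1.2), while "competent" even drops by 0.2; for the male voice every intended model is rated above the baseline, most of all "excited" and "sophisticated" (+0.9 each).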
Summary
• tendency towards statistically significant differences
  – between baseline and models
  – between baseline and best-rated versions
• different preferences for different voices
  – "excited": 3.4 (male) vs. 4.1 (female)
  – "rugged": 4.1 (male) vs. 3.3 (female)
• improved default settings for synthesis
  – male: "sophisticated" model
  – female: "sincere" model
Conclusions
• modelling personality in speech synthesis is possible
• more research is needed, e.g. with respect to "excited" (also important for emotional speech synthesis)
• parametric synthesis vs. unit selection
• applications:
  – talking objects
  – speech prostheses for the voice-handicapped
  – tuning of a synthetic corporate voice