Modeling of Nonverbal Characteristics of Persuasive Speech


Yukiko Hirabayashi*1, Yusuke Fujita*1, Tomoaki Yoshinaga*1 and Yoshinori Kitahara*2

Abstract - With the objective of developing a persuasive voice-interaction system for making presentations to large groups, we analyzed the nonverbal characteristics, especially the prosody and face motion, of 35 Japanese speakers and used the results to model persuasive prosody and face motion for the system. Regarding prosody, the persuasive speakers had higher maximum and average voice pitches and a wider pitch dynamic range than the unpersuasive speakers. Their maximum and average silent-pause lengths were also longer, and the dynamic range of their pause lengths was wider. Regarding face motion, the persuasive speakers mainly moved their faces from side to side and moved them only sparingly while speaking. We reproduced these prosodic and face-motion characteristics with synthesized voice and computer-generated (CG) animation and confirmed that they enhanced the speakers’ persuasiveness.

Keywords: Persuasiveness, Nonverbal, Prosody, Gesture, Multimodal

1.

[1][2]

[3]

[3] CG

[4]

[5]

( )

[5,6]

( )

[6]

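Mehrabian reported that when the verbal and nonverbal content of a message about feelings and attitudes is inconsistent, 55% of the listener’s impression comes from visual cues such as facial expression, 38% from vocal cues such as tone of voice, and 7% from the words themselves [4,7].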


*1: Research & Development Group, Hitachi, Ltd.

*2: The Graduate School of Engineering, Tokyo University of Agriculture and Technology

CG

3 3.1

3.2 3.3

4 4.1

4.2

CG

2.

Mehrabian

[8]

LaCrosse

[9] Burgoon 60

[10] Pearce

[11]

[12]

Miller

[13]

[14]

[15]

ICT

Park

[16]

Ramanarayanan

[17].

[21]

[18]

1 1

Kitahara

[19]

[19] Pelachaud

CG

[20] Sumi

[21]

Huang

[22]

CG

3.

3.1

35

1

1

Figure 1 Speaker’s Image

1 20-30

WAV

3.2

3.2.1

35 (20 )125

64

61 2 (A B)

19 20

5

4

5 1

3.5

2.5

4 (

) A 2.8 1.0 3.2 0.9

4.3 0.8 2.1 0.9 B 2.8 1.0 3.3 1.0

4.8 0.5 2.3 0.9 3

1 (4.3 4.8 ) (Cohen’s d) 0.68 1%

13 9
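Cohen’s d, the effect size reported above (0.68), expresses a difference in means in units of the pooled standard deviation; values around 0.8 and above are conventionally read as large effects. The paper does not spell out which variant of d it used, so the sketch below assumes the common pooled-SD form for two independent samples. The function name is hypothetical and the ratings are made-up illustrations, not study data.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, standardized by the pooled SD.

    Assumption: pooled-SD (independent-samples) form; the paper's exact
    variant is not stated.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the sample (n - 1) denominator.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical 5-point persuasiveness ratings (not data from this study).
ratings_a = [5, 4, 5, 4, 5, 3]
ratings_b = [4, 3, 4, 3, 4, 4]
print(f"d = {cohens_d(ratings_a, ratings_b):.2f}")
```

d is undefined when both groups have zero variance. A paired design (the same listeners rating both voices) would call for a different standardizer.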

3.2.2

Praat [23] Praat

70-400 Hz

Praat

1

1

(

)

(

) 1

( / )
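F0 was extracted with Praat over a 70-400 Hz range. Praat’s own pitch tracker is considerably more elaborate (windowing, multiple candidates per frame, path finding), so the following is only a simplified autocorrelation sketch. It shows how such a search range constrains the candidate periods: lags shorter than 1/400 s or longer than 1/70 s are never considered. The function name and the 0.45 voicing threshold are illustrative assumptions, not Praat settings.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=70.0, f0_max=400.0, voicing_threshold=0.45):
    """Rough F0 estimate (Hz) for one frame via autocorrelation.

    Only lags corresponding to f0_min..f0_max are searched, mirroring the
    70-400 Hz analysis range. Returns 0.0 for frames judged unvoiced.
    Assumption: simplified method, not Praat's algorithm.
    """
    n = len(frame)
    dc = sum(frame) / n
    x = [s - dc for s in frame]                      # remove DC offset
    energy = sum(s * s for s in x)                   # autocorrelation at lag 0
    if energy == 0.0:
        return 0.0
    lag_min = max(1, int(sample_rate / f0_max))      # shortest period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)  # longest period searched
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return 0.0                                   # no clear periodicity: unvoiced
    return sample_rate / best_lag


sr = 16000
tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(800)]  # 50 ms of a 200 Hz tone
print(round(estimate_f0(tone, sr), 1))
```

The frame must be longer than the longest searched period (16000 / 70 ≈ 228 samples here), or the low end of the range is silently cut off.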

3.2.3

(Text-To-Speech, TTS) [24,25] Praat

Audacity [26]

Praat Audacity

3.3

3.3.1

35 119 (20

)

64 55 2 (A B) 3

2

19 20

5

4

5 1

3.5 2.5

4

A 3.4 0.9 2.5 0.7 3.7

0.7 2.2 0.8 B 3.5 0.9 2.7 0.9 3.8

0.9 2.0 0.8 4

15 7

3.3.2

[27,28]

5

3

3

4.

4.1

4.1.1

(a)

( 1(b)) t

t 20 1(b)

(Cohen’s d)

0.8

p < 0.01

( )

1%

Table 1 A Comparison of Fundamental Frequencies of Persuasive and Unpersuasive Groups.
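Table 1 compares F0 measures between the persuasive and unpersuasive groups with t-tests. The "t 20" in the text suggests 20 degrees of freedom, which would match 13 + 9 - 2 if 13 and 9 are the group sizes from Sec. 3.2.1; that reading is an inference. The sketch below assumes per-speaker maximum, mean, and dynamic range computed from voiced frames, with the range expressed in semitones, and a pooled-variance Student's t statistic. All function names and the semitone definition of range are hypothetical.

```python
import math
from statistics import mean, variance


def f0_summary(contour_hz):
    """Maximum, mean, and dynamic range of the voiced part of an F0 contour.

    Assumptions: unvoiced frames are coded as 0 Hz, and the dynamic range
    is measured in semitones as 12 * log2(max / min).
    """
    voiced = [f for f in contour_hz if f > 0]
    f_max, f_min = max(voiced), min(voiced)
    return {"max_hz": f_max, "mean_hz": mean(voiced),
            "range_st": 12.0 * math.log2(f_max / f_min)}


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance, plus its df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


print(f0_summary([0, 180, 220, 0, 260, 200]))
```

In use, one would call `f0_summary` once per speaker and pass, for example, the lists of `max_hz` values of the two groups to `student_t`. For a p-value, `scipy.stats.ttest_ind(a, b)` with its default `equal_var=True` computes the same pooled statistic.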

4.1.2

2

t

t 20 2

0.8

p > 0.01

[14]

[15] [14]

8.43 7.05 5.47 mora/s ( /

1 1 mora )

11.72 10.75 mora/s [14]

5.73 5.99 mora/s

Table 2 A Comparison of Speech Rates of Persuasive and Unpersuasive Groups.
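Speech rate is reported in mora/s (e.g., 8.43, 7.05, and 5.47 mora/s above). The following minimal sketch illustrates that unit. It assumes a kana transcription of each utterance and divides by whatever duration the caller supplies; whether the study excluded pauses from the duration is left open. The mora rules encoded are the standard ones for Japanese: small ya/yu/yo and small vowels attach to the preceding kana, while the geminate っ, the moraic nasal ん, and the long-vowel mark ー each count as one mora. The function names are hypothetical.

```python
# Small kana merge with the preceding kana into a single mora (e.g. きょ = 1 mora).
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")
PUNCTUATION = set("、。・「」")


def count_morae(kana):
    """Count morae in a kana transcription.

    Every kana counts as one mora except the small kana above; the sokuon
    (っ/ッ), the moraic nasal (ん/ン), and the long-vowel mark (ー) each count
    as a mora of their own. Whitespace and punctuation are ignored.
    """
    return sum(1 for ch in kana
               if ch not in SMALL_KANA and ch not in PUNCTUATION and not ch.isspace())


def speech_rate(kana, duration_s):
    """Speech rate in mora/s. Assumption: duration_s is chosen by the caller
    (total utterance time or articulation time excluding pauses)."""
    return count_morae(kana) / duration_s


print(count_morae("きょうは がっこうへ いきます"))
```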

4.1.3

3

t

20 3

p > 0.01

4

t

20 4

0.8

p < 0.01

( )

Table 3 A Comparison of Sounding Durations of Persuasive and Unpersuasive Groups.

Table 4 A Comparison of Silent Durations of Persuasive and Unpersuasive Groups.
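Tables 3 and 4 compare sounding and silent durations, and the abstract characterizes pauses by their maximum, average, and dynamic range. The sketch below shows how such durations can be derived from a frame-wise intensity track. It rests on several assumptions. A frame is silent when it is more than 25 dB below the track's maximum. Silences shorter than 0.1 s are treated as part of speech. The dynamic range is max minus min. These parameter values and the function names are hypothetical, not the paper's settings.

```python
from itertools import groupby
from statistics import mean


def sounding_silent_durations(intensity_db, frame_s, rel_threshold_db=-25.0, min_pause_s=0.1):
    """Split a frame-wise intensity track (dB) into sounding and silent intervals.

    Assumptions: frames quieter than (max intensity + rel_threshold_db) are
    silent; silent runs shorter than min_pause_s are absorbed into speech;
    leading/trailing silence is dropped because it is not a pause between
    utterances. Returns (sounding_durations_s, pause_durations_s).
    """
    threshold = max(intensity_db) + rel_threshold_db
    # Run-length encode the frames as (is_silent, n_frames).
    runs = [(key, len(list(g))) for key, g in groupby(v < threshold for v in intensity_db)]
    # Too-short silences become part of the surrounding speech.
    runs = [(is_silent and n * frame_s >= min_pause_s, n) for is_silent, n in runs]
    # Merge adjacent runs that now share a label.
    merged = [(key, sum(n for _, n in g)) for key, g in groupby(runs, key=lambda r: r[0])]
    while merged and merged[0][0]:
        merged.pop(0)
    while merged and merged[-1][0]:
        merged.pop()
    sounding = [n * frame_s for is_silent, n in merged if not is_silent]
    pauses = [n * frame_s for is_silent, n in merged if is_silent]
    return sounding, pauses


def pause_stats(pauses):
    """Maximum, mean, and dynamic range (max - min) of pause durations in seconds."""
    return {"max_s": max(pauses), "mean_s": mean(pauses), "range_s": max(pauses) - min(pauses)}


# Hypothetical 10 ms frames: leading silence, speech, 50 ms dip, speech, 400 ms pause, speech, tail.
track = [-60] * 20 + [0] * 30 + [-60] * 5 + [0] * 20 + [-60] * 40 + [0] * 25 + [-60] * 15
print(sounding_silent_durations(track, 0.01))
```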

4.1.4

5

( 5 % )

1

5 % 2

Praat

Audacity

( 2(5) ) 1

TTS

TTS

DP

(TTS (Control) 2(1) ) TTS (Control)

TTS (Control)

TTS (Short

silence)( 2(2) ) TTS (Short silence)

Figure 2 Speech waveforms and F0 contours of the TTS voices and the model real voice. (1) TTS (Control), (2) TTS (Short silence), (3) TTS (Narrow pitch), (4) TTS (Persuasive), and (5) model real voice.

TTS (Control)

TTS (Narrow pitch)( 2(3)

) TTS (Narrow pitch)

TTS (Persuasive)( 2(4) ) TTS (Short silence)

TTS (Narrow pitch)

TTS

2 2(1)-(5)
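The TTS (Narrow pitch) stimulus differs from the others in its F0 range. The paper edited its stimuli with Praat and Audacity, and the exact operations are not given here. The following therefore only illustrates one common way to narrow (or widen) a pitch range: each voiced frame's deviation from the contour's mean log-frequency is scaled by a factor. The function name and the log-domain choice are assumptions.

```python
import math


def scale_pitch_range(contour_hz, factor):
    """Scale F0 excursions around the contour's mean log-frequency.

    factor < 1 narrows the pitch range, factor > 1 widens it; the geometric
    mean of the voiced frames is preserved. Unvoiced frames (0 Hz) stay 0.
    Assumption: illustrative manipulation, not the paper's procedure.
    """
    voiced = [f for f in contour_hz if f > 0]
    center = sum(math.log2(f) for f in voiced) / len(voiced)
    return [2.0 ** (center + factor * (math.log2(f) - center)) if f > 0 else 0.0
            for f in contour_hz]


narrowed = scale_pitch_range([100.0, 0.0, 400.0], 0.5)
print([round(f, 1) for f in narrowed])
```

In Praat, an edited pitch contour of this kind is usually applied through a Manipulation object and overlap-add resynthesis.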

7

4

3.2.1 116

(20 ) 63 52 2

4 5

3

( )

(1) TTS (Control) (2) TTS (Short silence) (3) TTS (Narrow pitch) (4) TTS (Persuasive)

F(3, 460)=106.17, p < 0.001

1%

(Tukey-Kramer test) (2) TTS (Short silence) (3) TTS (Narrow pitch)

p < 0.01 (** ) (2) (3)

1%

Figure 3 Comparison of the persuasiveness of the TTS voices.
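The reported F(3, 460) is consistent with a one-way between-groups ANOVA over four voice conditions with 116 ratings each: df_between = 4 - 1 = 3 and df_within = 4 × 116 - 4 = 460. That design is an inference from the numbers; a repeated-measures analysis of the same ratings would have different degrees of freedom. A minimal sketch of the F statistic follows. The function name is hypothetical and the listener ratings are synthetic.

```python
from statistics import mean


def one_way_anova(groups):
    """F statistic and (df_between, df_within) for a one-way between-groups ANOVA."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual ratings around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Synthetic 5-point ratings: four hypothetical voice conditions, 116 ratings each.
groups = [[(i + j) % 5 + 1 for i in range(116)] for j in range(4)]
print(one_way_anova(groups))
```

`scipy.stats.f_oneway` returns the same F together with a p-value. `scipy.stats.tukey_hsd` (SciPy 1.8 or later) provides Tukey-style pairwise comparisons of the kind used for Figure 3.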

4.1.5

[8,11]

[8,11]

[14]

[15]

4.2

4.2.1

3 3

A

4

(Plot: panels (a) and (b) show face direction on a -60 to 60 scale, Left/Front/Right and Down/Front/Up respectively, against frame number 1-1201; panel (c) shows the voice waveform; time axis 0-40 s.)

Figure 4 Time-series variations of face direction of a persuasive speaker. (a) Horizontal direction, (b) Vertical direction, and (c) Voice waveform.

4(a) (b)

0

0

(c)

1

(a)(b)(c) A

(a) (c)

5 3 3

(A)~(C)

A B C (D)~(F)

D E F

(A) A 4(a)(b) 3

B C

A

30 10 2.6

10 0.8

D E F

30

0 10 5

[5,8]
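Figures 4 and 5 contrast horizontal and vertical face direction over time, and the persuasive speakers are described as moving mainly from side to side. One simple way to quantify that is sketched here under assumptions: the share of total angular path length contributed by the horizontal (yaw) angle, after light smoothing to suppress tracking jitter. The angle units (presumably degrees), the smoothing window, and all function names are hypothetical. The paper's own measure is not reproduced.

```python
def moving_average(xs, window):
    """Centered moving average; window=1 leaves the series unchanged."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        seg = xs[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def path_length(angles):
    """Total angular distance travelled: sum of absolute frame-to-frame changes."""
    return sum(abs(b - a) for a, b in zip(angles, angles[1:]))


def horizontal_share(yaw, pitch, window=5):
    """Fraction of face motion that is side to side (yaw) rather than up/down (pitch).

    Assumption: smoothing window of `window` frames; 0.0 for a motionless face.
    """
    h = path_length(moving_average(yaw, window))
    v = path_length(moving_average(pitch, window))
    return h / (h + v) if h + v > 0 else 0.0


yaw_demo = [0, 10, 20, 10, 0, -10, -20, -10, 0]
pitch_demo = [0, 1, 0, -1, 0, 1, 0, -1, 0]
print(round(horizontal_share(yaw_demo, pitch_demo, window=1), 2))
```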

4.2.2

4.2.1 3

(a) (b)

2

5


Figure 5 Time-series variations of face directions of (A-C) persuasive and (D-F) unpersuasive speakers.

6

F(t)

0

F(t)

M(t)

S(t)

Figure 6 Model of Face Direction and Speech of Persuasive Speaker.

M(t) S(t)

(1) M(t) = 0 and S(t) = 0

(2) M(t) = 0 and S(t) ≠ 0

(3) M(t) ≠ 0 and S(t) = 0

(4) M(t) ≠ 0 and S(t) ≠ 0

(5) α ≤ L ≤ β

(6) L < α or β < L

6 /

7

S0 S1 S2

S3

M(t) S(t) L 0 t=0

S0 t

(Diagram: states S0, S1, S2, and S3; transitions are labeled with conditions (1)-(6) and with updates of L, e.g., "M(t)=0 (L=L+1)", "M(t)=0 and L ≤ β (L=L+1)", and "M(t)≠0 and S(t)=0 and α ≤ L ≤ β (L=0)"; other transitions are labeled "M(t)≠0 and S(t)≠0 or β<L or M(t)≠0 and L<α" and "all".)

Figure 7 Finite automaton of Face Direction and Speech Model.
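The face-motion model combines face motion M(t), speech S(t), and a counter L with bounds α and β. The sketch below does not attempt the full S0-S3 transition structure, so it is only an approximation of the model. It assumes, as the transition labels suggest, that L counts frames without face motion. It further assumes that a movement fits the persuasive pattern when it starts during silence (condition (3)) with α ≤ L ≤ β. Resetting L at every movement onset is a further simplification, and the function names are hypothetical.

```python
def condition(m, s):
    """Map one frame to conditions (1)-(4) from face motion m = M(t) and speech s = S(t)."""
    if m == 0:
        return 1 if s == 0 else 2
    return 3 if s == 0 else 4


def check_movements(m_seq, s_seq, alpha, beta):
    """For every onset of face motion, report (frame, fits_pattern).

    Assumptions: a movement fits when it starts in silence (condition 3) after
    L still frames with alpha <= L <= beta; L is reset whenever the face moves.
    This approximates, and does not implement, the S0-S3 automaton.
    """
    still = 0            # L: frames since the face last moved; 0 at t = 0
    prev_moving = False
    results = []
    for t, (m, s) in enumerate(zip(m_seq, s_seq)):
        moving = m != 0
        if moving and not prev_moving:      # a new face movement starts here
            results.append((t, condition(m, s) == 3 and alpha <= still <= beta))
        still = 0 if moving else still + 1
        prev_moving = moving
    return results


m = [0, 0, 0, 1, 1, 0, 0, 1, 0]
s = [0, 0, 0, 0, 0, 0, 1, 1, 0]
print(check_movements(m, s, alpha=2, beta=5))
```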

4.2.3

4.2.2

/ CG

8 CG

Figure 8 Processing flow of CG images with voice.

3

3

6 7 A

A

3D

Poser®

CG

( 9(a))

CG ( 9(b))

Figure 9 CG images of (a) the face motion model and (b) the face front model.

(1) (2) CG

(20 65 )

3.3.1 5

(

) 10 (1)

(2)

1.28 1%

Figure 10 Verification of Face Motion Model by CG Movies.

4.2.4

[8,11,16]

5.

CG

( )

CG

CG

[3]

[1] , : (HCS) 2014 3 , (9) (2014).

[2] : (HCS)2014 10 , (1) (2014).

[3] , , , , , P. Reisert, : 2015 ( 29 ), 3M3-2in (2015).

[4] : ; (2010).

[5] : ; (2010).

[6] : ; (2006).

[7] A. Mehrabian: Silent Messages; Wadsworth, Belmont, California (1971).

[8] A. Mehrabian and M. Williams: Nonverbal Concomitants of Perceived and Intended Persuasiveness; Journal of Personality and Social Psychology, 13, 1, pp. 37-58 (1969).

[9] M. B. LaCrosse: Nonverbal Behavior and Perceived Counselor Attractiveness and Persuasiveness; Journal of Counseling Psychology, 22, 6, pp. 563-566 (1975).

[10] J. K. Burgoon, T. Birk, and M. Pfau: Nonverbal Behaviors, Persuasion, and Credibility; Human Communication Research, 17, 1, pp. 140-169 (1990).

[11] W. Pearce and B. Brommel: Vocalic Communication in Persuasion; Quarterly Journal of Speech, 58, 3, pp. 298-306 (1972).

[12] : ; D-II J79-D-II, 12, pp. 2154-2162 (1996).

[13] N. Miller, G. Maruyama, R. J. Beaber, and K. Valone: Speed of Speech and Persuasion; Journal of Personality and Social Psychology, 34, pp. 615 (1976).

[14] : ; 57, pp. 200 (1986).

[15] : ; 8, pp. 65-70 (2008).

[16] S. Park, H. S. Shim, M. Chatterjee, K. Sagae, and L. P. Morency: Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach; ICMI'14, Proceedings of the 16th International Conference on Multimodal Interaction, pp. 50-57 (2014).

[17] V. Ramanarayanan, C. W. Leong, L. Chen, G. Feng, and D. Suendermann-Oeft: Evaluating Speech, Face, Emotion and Body Movement Time-series Features for Automated Multimodal Presentation Scoring; ICMI'15, Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 23-30 (2015).

[18] : Sensitive Agent: ; 2004, pp. 11-18 (2004).

[19] Y. Kitahara and Y. Tohkura: Prosodic Control to Express Emotions for Man-Machine Speech Interaction; IEICE Trans. Fundamentals, Vol. E75-A, No. 2 (1992).

[20] C. Pelachaud: Studies on Gesture Expressivity for a Virtual Agent; Speech Communication, 51, 7, pp. 630-639 (2008).

[21] K. Sumi and R. Ebata: Sensitive Agent: Human Agent Interaction for Learning Service-Minded Communication; The 1st International Conference on Human-Agent Interaction, III-3-3 (2013).

[22] S. Huang and F. Lin: The Design and Evaluation of an Intelligent Sales Agent for Online Persuasion and Negotiation; Electronic Commerce Research and Applications, 6, 3, pp. 285-296 (2007).

[23] http://www.fon.hum.uva.nl/praat/.

[24] : ; vol. 88, No. 06, pp. 60-65 (2006).

[25] : ; 2-P-4 (2004).

[26] http://audacity.sourceforge.net/.

[27] : ; 2005, pp. 419 (2005).

[28] : ; 15 (MIRU), IS3-30 (2012).

1990 1992

( )

2002

2012

2015

( )

2003 2005

2015

2002

2004

( )

2015

1979 1981

( )

1986

1989 ( )

2014

1987

1989

2001

( )

(C) Human Interface Society (NPO)