Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences Agustín Gravano Columbia University Columbia University Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue


Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

Agustín Gravano

Columbia University

Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue


Agustín Gravano - Thesis Defense - Jan 28, 2009


Special thanks to:

Julia Hirschberg

Committee Members: Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent.

The Speech Lab: Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg.

Collaborators: Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, and Lauren Wilcox.

Family and friends


Interactive Voice Response Systems

Quickly spreading. Mostly simple functionality.

Examples of IVR systems:
Let’s Go!: Bus scheduling information (CMU).
GOOG-411: Local information (Google).

Most visible components of IVR systems:
Automatic Speech Recognition (ASR).
Text-To-Speech (TTS).

Introduction


ASR+TTS account for most IVR problems. ASR: Up to 60% word error rate. TTS: Described as ‘odd’ or ‘mechanical’.

As ASR and TTS improve, other problems begin to show: Coordination of system-user exchanges. Frequent words overloaded with multiple functions.

Introduction

Interactive Voice Response Systems


Coordination of Exchanges

Let’s Go! Demo (http://www.speech.cs.cmu.edu/letsgo/)

S: Thank you for calling the CMU Let's Go! Bus Information System.

[…] What can I do for you?

U: I would like to go to the airport tomorrow morning. [silence]

S: To the airport. When do you want to arrive?

U: I'd like to arrive at 10:30. [silence]

S: Arriving at around 10:30 AM. Where do you want to leave from?

U: I'd like to leave from Carnegie Mellon. [silence]

S: From Carnegie Mellon. There is a 28X leaving Forbes Avenue […]

Turn boundary detection is currently based on silence detection. Problems: latencies and false positives.
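The two problems named above can be illustrated with a minimal sketch of silence-based end-of-turn detection. The frame size, the threshold values and the function name `detect_turn_end` are assumptions made for illustration; the slides do not describe how Let’s Go! implements this step.

```python
from typing import List, Optional


def detect_turn_end(voiced: List[bool], frame_ms: int = 10,
                    silence_threshold_ms: int = 700) -> Optional[int]:
    """Return the time (in ms) at which the end of turn is declared, or None.

    voiced[i] is True when frame i contains speech. The turn is considered
    finished once the user has spoken and then stayed silent for
    silence_threshold_ms (a hypothetical value, not taken from the source).
    """
    frames_needed = silence_threshold_ms // frame_ms
    silent_run = 0
    has_spoken = False
    for i, is_speech in enumerate(voiced):
        if is_speech:
            has_spoken = True
            silent_run = 0
        elif has_spoken:
            silent_run += 1
            if silent_run >= frames_needed:
                return (i + 1) * frame_ms
    return None


# 1.0 s of speech, a 0.4 s mid-turn pause, 0.5 s of speech, then silence.
utterance = [True] * 100 + [False] * 40 + [True] * 50 + [False] * 100

# Long threshold: correct boundary, but declared 700 ms after speech ends.
print(detect_turn_end(utterance, silence_threshold_ms=700))  # 2600
# Short threshold: the mid-turn pause is mistaken for the end of the turn.
print(detect_turn_end(utterance, silence_threshold_ms=300))  # 1300
```

A long threshold adds latency to every exchange, while a short one cuts the user off at hesitation pauses, which is the trade-off that motivates looking for cues other than silence.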

Introduction


Overloaded Cue Words

Cue words: expressions such as by the way, however, after all.
Frequent in dialogue, used for structuring discourse and shaping conversation.

Affirmative cue words: okay, alright, etc.
Convey acknowledgment, start a new topic, display continued attention, inter alia.
Frequent in task-oriented dialogue.

IVR systems: understanding and generation.

Introduction


Motivation

Understand and incorporate these and other phenomena into IVR systems, aiming at gradually approaching human-like behavior.

Descriptions of associations between observed phenomena (e.g. turn exchange types) and measurable events (e.g. variations in acoustic features).

No strong claims about the degree of awareness of speakers and listeners.

Introduction


(1) Columbia Games Corpus

(2) Study of Turn-Taking

(3) Study of Affirmative Cue Words


Columbia Games Corpus

Task-oriented spontaneous dialogues. Two subjects, each with a laptop computer. Series of collaborative computer games. Soundproof booth; head-mounted mics. No eye contact; only verbal communication. No restrictions; subjects could speak freely.


Cards Game, Part 1

Columbia Games Corpus

Player 1: Describer Player 2: Searcher


Cards Game, Part 2

Player 1: Describer Player 2: Searcher

Columbia Games Corpus


Objects Game

Player 1: Describer Player 2: Follower

Columbia Games Corpus


Columbia Games Corpus

12 sessions, 13 subjects (6 female, 7 male). 9 hours of dialogue.

Orthographic transcription and alignment. 70K words, 2K unique words. Non-word vocalizations (laughs, coughs, etc.).

Prosodic transcription (ToBI conventions).

Automatically generated session logs.


(1) Columbia Games Corpus

(2) Study of Turn-Taking

(3) Study of Affirmative Cue Words


Goals

Speech understanding:
Detection of the end of the user’s turn.
Detection of points in the user’s turn where a backchannel response would be welcome.

Speech generation:
Display of cues signalling the end of system’s turn.
Display of cues inviting the user to produce a backchannel response.

Turn-Taking


Previous Work

Sacks, Schegloff & Jefferson 1974.
General characterization of turn-taking in conversation between two or more persons.
Transition-relevance place: The current speaker may either yield the turn, or continue speaking.

Duncan 1972, 1973, 1974, inter alia.
Six turn-yielding cues in face-to-face dialogue.
Linear relation between the number of displayed cues and the likelihood of a turn-taking attempt.

Turn-Taking


Previous Work

Corpus and perception studies.
Formalized and verified some of the turn-yielding cues hypothesized by Duncan.
Ford & Thompson 1996; Wennerstrom & Siegel 2003; Cutler & Pearson 1986; Wichmann & Caspers 2001.

Implementations of turn-boundary detection.
Simulations (Ferrer et al. 2002, 2003; Edlund et al. 2005; Schlangen 2006; Atterer et al. 2008; Baumann 2008).
Actual systems (Raux & Eskenazi 2008, on Let’s Go!).
Exploiting turn-yielding cues improves performance.

Turn-Taking


Turn-Yielding Cues

Cues displayed by the speaker when approaching a potential turn boundary.

Turn-Taking


Method

Smooth switch: Speaker A finishes her utterance; speaker B takes the turn with no overlapping speech.

Trained annotators distinguished Smooth switches from Interruptions and Backchannels using a scheme based on Ferguson 1977, Beattie 1982.

Turn-Yielding Cues

IPU (Inter Pausal Unit): Maximal sequence of words from the same speaker surrounded by silence ≥ 50ms.
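The IPU definition above can be sketched as a small segmentation routine over time-aligned words from a single speaker. The `(token, start, end)` tuple layout and the function name `segment_ipus` are assumptions for illustration; only the 50 ms silence threshold comes from the slide.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (token, start time in s, end time in s)


def segment_ipus(words: List[Word], min_silence_s: float = 0.050) -> List[List[Word]]:
    """Group one speaker's words into IPUs.

    A new IPU starts whenever the silence between two consecutive words
    is at least min_silence_s (50 ms in the definition above).
    """
    ipus: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[1]):
        previous_end = ipus[-1][-1][2] if ipus else None
        if previous_end is not None and word[1] - previous_end < min_silence_s:
            ipus[-1].append(word)   # gap too short: same IPU
        else:
            ipus.append([word])     # first word, or silence >= threshold
    return ipus


words = [("okay", 0.00, 0.30), ("so", 0.32, 0.45),
         ("the", 0.80, 0.90), ("lion", 0.91, 1.30)]
for ipu in segment_ipus(words):
    print([token for token, _, _ in ipu])
# ['okay', 'so']
# ['the', 'lion']
```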

[Diagram: Speaker A produces IPU1 and IPU2, separated by a Hold; Speaker B then produces IPU3 after a Smooth switch.]


Compare IPUs preceding Holds and IPUs preceding Smooth switches.

Assumption: Cues are more likely to occur before Smooth switches than before Holds.
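As a rough illustration of this comparison, the sketch below pairs each IPU with the transition that follows it. It is a deliberate simplification: in the study the transition types were labeled by trained annotators, and telling Backchannels and Interruptions apart needs human judgment. The `IPU` record, the function name and the catch-all label "Other" are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class IPU(NamedTuple):
    speaker: str
    start: float  # seconds
    end: float    # seconds


def label_transitions(ipus: List[IPU]) -> List[Tuple[IPU, str]]:
    """Pair every IPU (except the last) with the transition that follows it."""
    ordered = sorted(ipus, key=lambda u: u.start)
    labeled = []
    for current, following in zip(ordered, ordered[1:]):
        if following.speaker == current.speaker:
            label = "Hold"            # same speaker continues after a silence
        elif following.start >= current.end:
            label = "Smooth switch"   # other speaker starts, no overlap
        else:
            label = "Other"           # overlapping speech: needs manual labeling
        labeled.append((current, label))
    return labeled


dialogue = [IPU("A", 0.0, 1.0), IPU("A", 1.4, 2.5), IPU("B", 2.8, 3.5)]
for ipu, label in label_transitions(dialogue):
    print(ipu.speaker, ipu.start, ipu.end, "->", label)
```

Features extracted from the IPUs labeled "Hold" and "Smooth switch" can then be compared group against group.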

[Diagram: Speaker A produces IPU1 and IPU2, separated by a Hold; Speaker B then produces IPU3 after a Smooth switch.]

Turn-Yielding Cues

Method


1. Final intonation

2. Speaking rate

3. Intensity level

4. Pitch level

5. Textual completion

6. Voice quality

7. IPU duration

Individual Turn-Yielding Cues


Individual Turn-Yielding Cues

Boundary tone       Smooth switch   Hold
H-H%                22.1%           9.1%
[!]H-L%             13.2%           29.9%
L-H%                14.1%           11.5%
L-L%                47.2%           24.7%
No boundary tone    0.7%            22.4%
Other               2.6%            2.4%
Total               100%            100%

(χ² test: p ≈ 0)

1. Final Intonation

Falling, high-rising: turn-final. Plateau: turn-medial. Examination of final pitch slope shows same results.


Individual Turn-Yielding Cues

2. Speaking Rate

Reduced final lengthening before turn boundaries.

[Figure: z-scores of speaking rate (syllables per second, phonemes per second), measured over the entire IPU and over the final word, for IPUs preceding a Smooth switch (S) vs. a Hold (H). (*) ANOVA: p < 0.01.]
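The acoustic comparisons in this part of the talk are reported as z-scores. A minimal sketch of how such values could be obtained is given below, assuming that each feature is normalized per speaker; the slides do not state how the normalization was done, and the function name and data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def zscore_by_speaker(samples: List[Tuple[str, float]]) -> List[float]:
    """Convert (speaker, raw value) pairs into z-scores.

    Each value is normalized with the mean and standard deviation of its
    own speaker (an assumption), so that values from different speakers
    become comparable.
    """
    by_speaker: Dict[str, List[float]] = defaultdict(list)
    for speaker, value in samples:
        by_speaker[speaker].append(value)
    stats = {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}

    z_scores = []
    for speaker, value in samples:
        mu, sigma = stats[speaker]
        z_scores.append((value - mu) / sigma if sigma > 0 else 0.0)
    return z_scores


# Hypothetical speaking rates (syllables per second) for two speakers.
rates = [("A", 4.0), ("A", 6.0), ("B", 2.0), ("B", 4.0)]
print(zscore_by_speaker(rates))  # [-1.0, 1.0, -1.0, 1.0]
```

Averaging the z-scores separately over the IPUs that precede Smooth switches and those that precede Holds gives the kind of group means shown in the figures.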


Individual Turn-Yielding Cues

3/4. Intensity and Pitch Levels

Lower intensity, pitch levels before turn boundaries.

[Figure: z-scores of intensity and pitch, measured over the entire IPU, its final 1.0 s and its final 0.5 s, for IPUs preceding a Smooth switch (S) vs. a Hold (H). (*) ANOVA: p < 0.01.]


5. Textual Completion

Syntactic/semantic/pragmatic completion independent of intonation and gesticulation.

Automatic computation of textual completion.

(1) Manually annotated a portion of the data. 3 labelers; 400 IPUs; Fleiss’ κ = 0.814.

(2) Trained an SVM classifier. 80% accuracy; baseline: 55%; human: 91%.
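The agreement figure reported for the manual annotation step is Fleiss’ kappa. The sketch below computes that statistic from a table of label counts; the toy ratings are invented for illustration and are not the thesis data.

```python
from typing import List


def fleiss_kappa(ratings: List[List[int]]) -> float:
    """Fleiss' kappa for a fixed number of raters per item.

    ratings[i][j] is the number of labelers who assigned item i to
    category j; every row must sum to the same number of labelers.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_category = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement on each item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    observed = sum(p_item) / n_items
    expected = sum(p * p for p in p_category)
    return (observed - expected) / (1 - expected)


# Toy example: 4 IPUs, 3 labelers, categories (complete, incomplete).
toy = [[3, 0], [2, 1], [1, 2], [0, 3]]
print(round(fleiss_kappa(toy), 3))  # 0.333
```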

Individual Turn-Yielding Cues


5. Textual Completion

Labeled all IPUs in the corpus with the SVM model.

Individual Turn-Yielding Cues

              Smooth switch   Hold
Incomplete    18%             47%
Complete      82%             53%

(χ² test, p ≈ 0)

Textual completion seems to be almost a necessary condition before switches, but not before holds.


Individual Turn-Yielding Cues

6. Voice Quality

Higher jitter, shimmer, NHR before turn boundaries.

[Figure: z-scores of jitter, shimmer and NHR, measured over the entire IPU, its final 1.0 s and its final 0.5 s, for IPUs preceding a Smooth switch (S) vs. a Hold (H). (*) ANOVA: p < 0.01.]


7. IPU Duration

Individual Turn-Yielding Cues

Longer IPUs before turn boundaries.

[Figure: z-scores of IPU duration and IPU word count, for IPUs preceding a Smooth switch vs. a Hold. (*) ANOVA: p < 0.01.]


1. Final intonation

2. Speaking rate

3. Intensity level

4. Pitch level

5. Textual completion

6. Voice quality

7. IPU duration

Individual Cues

Turn-Yielding Cues


Combined Cues

Turn-Yielding Cues

[Figure: percentage of turn-taking attempts (0% to 70%) as a function of the number of cues conjointly displayed (0 to 7); fit with r² = 0.969.]
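The relation summarized by the r² value can be sketched in two steps: compute the rate of turn-taking attempts for each cue count, then fit a least-squares line and measure how much variance it explains. This assumes the reported fit is a simple linear regression, which the slide does not state explicitly; the function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def attempt_rate_by_cue_count(ipus: List[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each cue count to the share of IPUs followed by a turn-taking attempt."""
    totals: Dict[int, int] = defaultdict(int)
    attempts: Dict[int, int] = defaultdict(int)
    for n_cues, attempted in ipus:
        totals[n_cues] += 1
        attempts[n_cues] += int(attempted)
    return {n: attempts[n] / totals[n] for n in sorted(totals)}


def linear_fit_r2(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares line y = slope * x + intercept, plus its r^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Toy data: (number of cues displayed, was a turn-taking attempt made?)
toy = [(0, False), (0, False), (1, True), (1, False), (2, True), (2, True)]
rates = attempt_rate_by_cue_count(toy)
slope, intercept, r2 = linear_fit_r2(list(rates.keys()), list(rates.values()))
print(rates, slope, intercept, r2)
```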


Backchannel-Inviting Cues

Cues displayed by the speaker inviting the listener to produce a backchannel response.

Turn-Taking


Compare IPUs preceding Holds and IPUs preceding Backchannels.

Assumption: Cues are more likely to occur before Backchannels than before Holds.

Backchannel-Inviting Cues

Method

[Diagram: timeline of Speakers A and B over IPU1 to IPU4, marking a Hold and a Backchannel.]


Backchannel-Inviting Cues

Individual Cues

1. Final rising intonation: H-H% or L-H%.

2. Higher intensity level.

3. Higher pitch level.

4. Longer IPU duration.

5. Lower NHR.

6. Final POS bigram: DT NN, JJ NN, or NN NN.
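The six cues listed above could be counted for a given IPU roughly as follows. This is only a sketch: the `IPUFeatures` record is hypothetical, and treating "higher", "longer" and "lower" as a z-score above or below zero is an assumption, since the slides do not give the thresholds used in the study.

```python
from dataclasses import dataclass
from typing import Tuple

RISING_TONES = {"H-H%", "L-H%"}
INVITING_POS_BIGRAMS = {("DT", "NN"), ("JJ", "NN"), ("NN", "NN")}


@dataclass
class IPUFeatures:
    boundary_tone: str                 # ToBI label, e.g. "H-H%"
    intensity_z: float                 # z-scored intensity level
    pitch_z: float                     # z-scored pitch level
    duration_z: float                  # z-scored IPU duration
    nhr_z: float                       # z-scored noise-to-harmonics ratio
    final_pos_bigram: Tuple[str, str]  # POS tags of the last two words


def count_backchannel_cues(f: IPUFeatures) -> int:
    """Count how many of the six backchannel-inviting cues are present."""
    cues = [
        f.boundary_tone in RISING_TONES,           # 1. final rising intonation
        f.intensity_z > 0,                         # 2. higher intensity (assumed cutoff)
        f.pitch_z > 0,                             # 3. higher pitch (assumed cutoff)
        f.duration_z > 0,                          # 4. longer IPU (assumed cutoff)
        f.nhr_z < 0,                               # 5. lower NHR (assumed cutoff)
        f.final_pos_bigram in INVITING_POS_BIGRAMS,  # 6. final POS bigram
    ]
    return sum(cues)


example = IPUFeatures("H-H%", 0.4, 0.2, 0.1, -0.3, ("DT", "NN"))
print(count_backchannel_cues(example))  # 6
```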


Backchannel-Inviting Cues

Combined Cues

[Figure: percentage of IPUs followed by a backchannel (BC) as a function of the number of cues conjointly displayed (0 to 6); two fitted curves, with r² = 0.812 and r² = 0.993.]


Overlapping Speech

95% of overlaps start during the turn-final intermediate phrase (ip).

We look for turn-yielding cues in the second-to-last intermediate phrase (e.g., ip2).

[Diagram: Speaker A produces intermediate phrases ip1, ip2 and ip3; a Hold and an Overlap with Speaker B are marked.]

Turn-Taking


Overlapping Speech

Cues found in second-to-last ips: Higher speaking rate. Lower intensity. Higher jitter, shimmer, NHR.

All cues match the corresponding cues found in (non-overlapping) smooth switches.

Cues seem to extend further back in the turn, becoming more prominent toward turn endings.

Future research: Generalize the model of discrete turn-yielding cues.

Turn-Taking


(1) Columbia Games Corpus

(2) Study of Turn-Taking

(3) Study of Affirmative Cue Words


Affirmative Cue Words

8% of the words in the Columbia Games Corpus: okay, right, yeah, mm-hm, alright, uh-huh, gotcha, huh, yep, yes, yup.

10 discourse/pragmatic functions: Acknowledgment/agreement, Literal modifier, Backchannel, Cue beginning/ending discourse segment, Check with the interlocutor, Stall/Filler, Back from a task, Pivot beginning/ending (Ack+Cue).

Labeled by 3 trained annotators. Fleiss’ κ = 0.69: ‘Substantial’ agreement.


Examples

Affirmative Cue Words

Literal modifier:
that’s pretty much okay

Backchannel:
Speaker 1: between the yellow mermaid and the whale
Speaker 2: okay
Speaker 1: and it is

Cue beginning discourse segment:
okay we’re gonna be placing the blue moon


Interactive Voice Response Systems

Speech understanding: Must interpret the user’s input correctly.

Speech generation: Need to convey potentially ambiguous terms with the appropriate parameters for the intended meaning.

Affirmative Cue Words


Previous Work

Disambiguation of single-word cue phrases.
well, now, say, so, like, really, …
Discourse vs. sentential senses.
Hirschberg & Litman 1987, 1993; Litman 1994, 1996; Zufferey & Popescu-Belis 2004; Lai 2008.

Affirmative cue words.
Hockey 1991, 1992; Kowtko 1997: Intonational differences across discourse/pragmatic functions.
Jurafsky et al. 1998: Lexical identity is a strong cue to word function.

Affirmative Cue Words


Descriptive statistics

Large contextual differences.
Backchannels occur always as separate turns.
Cue beginnings occur mostly in turn-initial position.
Modifier instances of right occur in all positions within the turn, but rarely as separate turns.
Acknowledgments occur in turn initial, medial and final positions, and also as separate turns.

Affirmative Cue Words


Descriptive statistics

Final intonation
Backchannel: Rising (H-H%, L-H%)
Cue beginning: Falling (L-L%)
Check: High-rising (H-H%)

Intensity
Backchannel: High
Cue beginning: High
Cue ending: Low

Affirmative Cue Words


Perception study of okay

Okay is the most frequent ACW in the corpus.

How do hearers disambiguate its meaning? Acoustic/prosodic/phonetic vs. contextual info?

20 subjects classified 54 tokens of okay into {Ack, BC, CueBeg} in two conditions:
No context available: only the word okay.
Context available: 2 full speaker turns.

Affirmative Cue Words

[Diagram: a contextualized ‘okay’, shown within the turns of Speaker A and Speaker B.]


Perception study of okay

No context available
Very low inter-subject agreement.
Correlations of word function with acoustic/prosodic/phonetic features.

Context available
Higher inter-subject agreement.
Contextual features trump ac/pr/ph features of okay.
Exception: Final intonation of okay.

Affirmative Cue Words


Automatic Classification

Identify automatically the function of ACWs.

Classification into discourse vs. sentential function insufficient for ACWs.
right: 15% discourse, 85% sentential.
All other ACWs: 99% discourse, 1% sentential.

New classification tasks:
Detection of an acknowledgment function: Acknowledgment vs. No acknowledgment.
Detection of a discourse segment boundary function: SegBeg vs. SegEnd vs. None.

Affirmative Cue Words


Automatic Classification

Lexical features: Lexical id, POS tags, n-grams.

Discourse features: Position of target word in IPU, turn, conversation.

Timing features: Duration of word, IPU, turn; amount of overlaps; latencies.

Acoustic features: Pitch, intensity, pitch slope, voice quality.

Phonetic features: Id, duration of each phone.

Affirmative Cue Words


Automatic Classification

                                   Discourse Boundary   Acknowledgment
                                   Error Rate           Error Rate
Baseline (1)                       18.6%                15.3%
SVM: Word-only                     14.4%                15.0%
SVM: Online (up to current IPU)    10.1%                6.7%
SVM: Full model                    6.9%                 4.5%
Human labelers                     5.7%                 3.3%

(1) Baselines. Discourse Boundary: majority class (no boundary). Acknowledgment: {right, huh} → no ACK; all others → ACK.

(*) Significantly different (Wilcoxon signed rank sum test; p < 0.05)

Affirmative Cue Words
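The acknowledgment baseline described in footnote (1) is a pure word-identity rule, and its error rate can be computed as sketched below. The function names and the toy tokens are hypothetical; the 15.3% figure in the table comes from the corpus, not from this example.

```python
from typing import List, Tuple

# Words that the baseline always labels as "no acknowledgment".
NON_ACK_WORDS = {"right", "huh"}


def baseline_is_ack(word: str) -> bool:
    """Baseline rule: every affirmative cue word except right/huh is an ACK."""
    return word.lower() not in NON_ACK_WORDS


def error_rate(tokens: List[Tuple[str, bool]]) -> float:
    """Share of (word, gold is-ACK label) pairs the baseline gets wrong."""
    errors = sum(1 for word, is_ack in tokens if baseline_is_ack(word) != is_ack)
    return errors / len(tokens)


# Toy gold labels, invented for illustration.
toy_tokens = [("okay", True), ("right", False), ("yeah", True),
              ("right", True), ("okay", False)]
print(error_rate(toy_tokens))  # 0.4
```

Because the rule ignores context, it cannot recover cases such as an acknowledging right or a segment-initial okay, which is where the contextual, timing and acoustic features of the SVM models can help.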


Affirmative Cue Words

Speaker Entrainment

In conversation, people adapt the way they speak to match their partner.
Referring expressions (Brennan 1996).
Syntactic constructions (Reitter et al. 2006).
Intensity (Coulston et al. 2002, Ward & Litman 2007).

Entrainment at different levels (lex, syn, sem):
Key for both production and understanding, and facilitates interaction (Pickering & Garrod 2004, Goleman 2006).
Predictor of task success (MapTask; Reitter & Moore 2007).


Affirmative Cue Words

Speaker Entrainment

Two novel measures of entrainment based on usage of high-frequency words (HFW), including ACW.
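The slide does not define the two measures. As a hedged illustration of what a usage-based entrainment score can look like, the sketch below compares how often two speakers use each word of a given class, relative to everything they said. It is an assumption made for illustration and not necessarily either of the measures proposed in the thesis; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def usage_similarity(words_a: List[str], words_b: List[str],
                     word_class: Iterable[str]) -> float:
    """Negated sum of differences in relative usage of each word in word_class.

    0.0 means both speakers use the words at identical rates; more
    negative values mean the speakers' usage differs more.
    """
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    total = sum(
        abs(counts_a[w] / len(words_a) - counts_b[w] / len(words_b))
        for w in word_class
    )
    return -total


speaker_a = ["okay", "the", "okay", "lion"]
speaker_b = ["okay", "the", "yeah", "lion"]
print(usage_similarity(speaker_a, speaker_b, ["okay", "yeah", "the"]))  # -0.5
```

A per-session score of this kind could then be correlated with outcome variables such as game score or the latency of smooth switches.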

Entrainment of HFW correlates with:

Task success:
(+) Game score

Dialogue coordination:
(+) Proportion of overlaps
(–) Proportion of interruptions
(–) Latency of smooth switches

Future work: Establish causality relation. Impact on IVR system design and/or evaluation.


(1) Columbia Games Corpus

(2) Study of Turn-Taking

(3) Study of Affirmative Cue Words


Contributions

Columbia Games Corpus
Valuable dataset for studying spontaneous task-oriented dialogue.

Study of Turn-Taking
Turn-yielding cues. Backchannel-inviting cues.
Objective, automatically computable.
Combined cues.
Improve turn-taking decisions of IVR systems.


Contributions

Study of Affirmative Cue Words
Descriptive statistics and perceptual results.
Automatic classification.
Speaker entrainment.
Understanding and generation in IVR systems.

Results drawn from task-oriented dialogues, thus not necessarily generalizable, but suitable for most IVR domains.

Necessary steps towards the ambitious, long-term goal of human-like speech systems.


Future Work

Additional turn-taking cues. Voice quality?
Novel ways to combine cues. Weights?
Study cues that extend over entire turns, increasing near potential turn boundaries.
Characterize interruptions.

Speaker entrainment
Affirmative cue words. Turn-taking behavior. Acoustic/prosodic variation.
