
Classification of Discourse Functions of Affirmative Words in Spoken Dialogue

Agustín Gravano, Stefan Benus, Julia Hirschberg

Shira Mitchell, Ilia Vovsha

INTERSPEECH, Antwerp, August 2007

Spoken Language Processing Group, Columbia University

Agustín Gravano INTERSPEECH 2007


Cue Words

Ambiguous linguistic expressions that may be used either to make a semantic contribution or to convey a pragmatic function.

Examples: now, well, so, alright, and, okay, first, by the way, on the other hand.

Single affirmative cue words

Examples: alright, okay, mm-hm, right, uh-huh, yes.

May be used to convey acknowledgment or agreement, to change the topic, to backchannel, etc.



Research Goals

Learn which features best characterize the different functions of single affirmative cue words.

Determine how these can be identified automatically.

Important for spoken dialogue systems, which must understand user input and produce appropriate output.



Previous Work

Classification of cue words into discourse vs. sentential use: Hirschberg & Litman ’87, ’93; Litman ’94; Heeman, Byron & Allen ’98; Zufferey & Popescu-Belis ’04.

In our corpus:

right: 15% discourse, 85% sentential.

All other affirmative cue words: 99% discourse, 1% sentential.

The discourse vs. sentential distinction is insufficient; new classification tasks need to be defined.



Talk Overview

Columbia Games Corpus
Classification tasks
Experimental features
Results



The Columbia Games Corpus

12 spontaneous task-oriented dyadic conversations in Standard American English.

2 subjects playing computer games; no eye contact.



The Columbia Games Corpus: Functions of Affirmative Cue Words

Cue words: alright, gotcha, huh, mm-hm, okay, right, uh-huh, yeah, yep, yes, yup

Functions:
Acknowledgment / Agreement
Backchannel
Cue beginning discourse segment
Cue ending discourse segment
Check with the interlocutor
Stall / Filler
Back from a task
Literal modifier
Pivot beginning: Ack/Agree + Cue begin
Pivot ending: Ack/Agree + Cue end

These cue words make up 7.9% of the words in our corpus.



Literal modifier: that’s pretty much okay

Backchannel:
Speaker 1: between the yellow mermaid and the whale
Speaker 2: okay
Speaker 1: and it is

Cue beginning discourse segment: okay we gonna be placing the blue moon





3 trained labelers.

Inter-labeler agreement: Fleiss’ kappa = 0.69 (Fleiss ’71).

In this study we use the majority label for each affirmative cue word. Majority label: the label chosen by at least two of the three labelers.
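A minimal Python sketch of the two agreement notions above, assuming each item's annotations arrive as a list of three label strings (the label strings are hypothetical). `majority_label` applies the at-least-two-of-three rule; `fleiss_kappa` follows the standard Fleiss (1971) formulation. This is an illustration, not the authors' code, and the toy ratings are made up.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by at least two of three labelers, or None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels from n raters.

    Assumes every item has the same number of raters (n >= 2) and that at
    least two distinct categories occur overall (otherwise the expected
    agreement is 1 and kappa is undefined).
    """
    n = len(ratings[0])                      # raters per item
    N = len(ratings)                         # number of items
    categories = sorted({lab for item in ratings for lab in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: per-item fraction of agreeing rater pairs, averaged.
    p_items = [(sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
               for c in counts]
    p_bar = sum(p_items) / N

    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(c[cat] for c in counts) / (N * n) for cat in categories]
    p_e = sum(p * p for p in p_cat)

    return (p_bar - p_e) / (1 - p_e)


# Toy ratings (made up, not from the corpus).
ratings = [
    ["Ack/Agree", "Ack/Agree", "Ack/Agree"],
    ["Ack/Agree", "Backchannel", "Ack/Agree"],
    ["Cue Begin", "Cue Begin", "Ack/Agree"],
    ["Backchannel", "Cue Begin", "Check"],
]
print([majority_label(r) for r in ratings])
print(round(fleiss_kappa(ratings), 3))
```

Items with no majority (like the last toy item) get `None`, which is one plausible way to represent words that cannot receive a majority label.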



Method: Two new classification tasks

Identification of a discourse segment boundary function: segment beginning vs. segment end vs. no discourse segment boundary function.

Identification of an acknowledgment function: acknowledgment vs. no acknowledgment.
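One way to derive the class labels of the two tasks from the ten function labels, sketched in Python. The label strings and the mapping itself are assumptions, not stated on the slides: pivots count as both a boundary and an acknowledgment, and backchannels count as acknowledgments. With the label counts in the table at the end of the slides (leaving out the "?" row), this mapping reproduces the 19.0% NO BOUNDARY baseline.

```python
# Hypothetical label strings for the ten functions; the slides do not
# specify an encoding.
BOUNDARY = {
    "Cue Begin": "BEGIN",
    "Pivot Begin": "BEGIN",   # Ack/Agree + Cue begin
    "Cue End": "END",
    "Pivot End": "END",       # Ack/Agree + Cue end
}

# Assumption: functions that carry an acknowledgment component.
ACK_FUNCTIONS = {"Ack/Agree", "Backchannel", "Pivot Begin", "Pivot End"}


def boundary_class(function):
    """Task 1: BEGIN, END, or NO_BOUNDARY."""
    return BOUNDARY.get(function, "NO_BOUNDARY")


def ack_class(function):
    """Task 2: ACK or NO_ACK."""
    return "ACK" if function in ACK_FUNCTIONS else "NO_ACK"


for f in ["Ack/Agree", "Backchannel", "Pivot End", "Back from Task", "Literal Modifier"]:
    print(f, boundary_class(f), ack_class(f))
```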



Method: Machine Learning Experiments

ML algorithm: JRip, Weka’s implementation of the propositional rule learner Ripper (Cohen ’95). We also tried J4.8, Weka’s implementation of the decision tree learner C4.5 (Quinlan ’93, ’96), with similar results.

10-fold cross-validation in all experiments.
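The 10-fold cross-validation protocol can be sketched in plain Python. JRip is not reimplemented here: `train_majority` is a hypothetical stand-in learner that always predicts the most frequent training label, and the folds are shuffled but not stratified. Errors are pooled over folds, so every item is tested exactly once.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle the item indices and deal them into k nearly equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_majority(train_X, train_y):
    """Stand-in learner (not JRip): always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def cross_validated_error(X, y, train, k=10, seed=0):
    """Pooled k-fold error rate: each item is predicted once, by a model
    trained on the other k-1 folds."""
    errors = 0
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [lab for i, lab in enumerate(y) if i not in held_out]
        model = train(train_X, train_y)
        errors += sum(model(X[i]) != y[i] for i in fold)
    return errors / len(y)


# Toy data: 81 NO_BOUNDARY items and 19 BEGIN items with no useful features.
y = ["NO_BOUNDARY"] * 81 + ["BEGIN"] * 19
X = [None] * len(y)
print(cross_validated_error(X, y, train_majority))
```

Any learner with the same `train(train_X, train_y) -> predict` shape can be swapped in for `train_majority`.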



Method: Experimental features

IPU (inter-pausal unit): maximal sequence of words delimited by pauses > 50 ms.

Conversational turn: maximal sequence of IPUs by the same speaker, with no contribution from the other speaker.
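Both definitions can be sketched in Python, assuming a hypothetical time-aligned `Word` record with times in seconds. `split_ipus` starts a new IPU whenever the silence before a word exceeds 50 ms; `group_turns` orders all IPUs by start time and merges consecutive IPUs by the same speaker, so any IPU by the other speaker (even a short backchannel) ends the current turn. Overlapping speech is handled only in this simplified way.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A time-aligned word (hypothetical record; times in seconds)."""
    speaker: str
    text: str
    start: float
    end: float


def split_ipus(words, max_pause=0.050):
    """Split one speaker's time-ordered words into IPUs.

    A new IPU starts whenever the silence since the previous word's end
    exceeds max_pause (50 ms).
    """
    ipus = []
    for w in words:
        if ipus and w.start - ipus[-1][-1].end <= max_pause:
            ipus[-1].append(w)
        else:
            ipus.append([w])
    return ipus


def group_turns(ipus):
    """Group IPUs from both speakers into turns: maximal runs of consecutive
    IPUs (ordered by start time) produced by the same speaker."""
    turns = []
    for ipu in sorted(ipus, key=lambda u: u[0].start):
        if turns and turns[-1][-1][0].speaker == ipu[0].speaker:
            turns[-1].append(ipu)
        else:
            turns.append([ipu])
    return turns


a_words = [Word("A", "okay", 0.00, 0.30), Word("A", "we", 0.32, 0.40),
           Word("A", "gonna", 0.60, 0.80), Word("A", "and", 1.50, 1.60)]
b_words = [Word("B", "mm-hm", 1.00, 1.20)]

ipus = split_ipus(a_words) + split_ipus(b_words)
turns = group_turns(ipus)
print([[" ".join(w.text for w in ipu) for ipu in turn] for turn in turns])
```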



Method: Experimental features

Text-based features: extracted from the text transcriptions. Lexical identity; POS tags; position of the word in its IPU / turn; etc.

Timing features: extracted from the time alignment of the transcriptions. Word / IPU / turn duration; amount of overlap; etc.

Acoustic features: {min, mean, max, stdev} × {pitch, intensity}; slope of pitch, stylized pitch, and intensity over the whole word and over its last 100, 200, and 300 ms; acoustic features from the end of the other speaker’s previous turn.
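A minimal Python sketch of the per-word acoustic summaries, assuming the pitch or intensity track has already been extracted upstream (not shown) as `(time_s, value)` frames with unvoiced frames removed. Slopes are ordinary least-squares slopes in units per second. Stylized pitch, context normalization, and the other-speaker features are not covered, and the feature names are hypothetical.

```python
import statistics


def ls_slope(times, values):
    """Least-squares slope of values against times (units per second)."""
    mt, mv = statistics.fmean(times), statistics.fmean(values)
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den


def track_features(frames, name, word_end, windows_ms=(100, 200, 300)):
    """Summary features for one acoustic track (e.g. pitch or intensity) over a word.

    frames: (time_s, value) pairs for the word, with unvoiced/undefined
    frames already removed; at least two frames are required.
    """
    times = [t for t, _ in frames]
    values = [v for _, v in frames]
    feats = {
        f"{name}_min": min(values),
        f"{name}_mean": statistics.fmean(values),
        f"{name}_max": max(values),
        f"{name}_stdev": statistics.stdev(values),
        f"{name}_slope": ls_slope(times, values),
    }
    # Slope over the last 100/200/300 ms of the word, when enough frames exist.
    for w in windows_ms:
        tail = [(t, v) for t, v in frames if t >= word_end - w / 1000]
        if len(tail) >= 2:
            feats[f"{name}_slope_last{w}ms"] = ls_slope(
                [t for t, _ in tail], [v for _, v in tail])
    return feats


# Toy pitch track rising 10 Hz every 50 ms (200 Hz/s) over a 300 ms word.
pitch = [(i * 0.05, 100 + 10 * i) for i in range(7)]
feats = track_features(pitch, "pitch", word_end=0.30)
print(round(feats["pitch_slope"], 6), round(feats["pitch_slope_last100ms"], 6))
```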



Results: Discourse segment boundary function

Feature Set           Error Rate   F-Measure
                                   Begin     End
Text-based            11.6 %       .77       .30
Timing                11.3 %       .73       .52
Acoustic              14.2 %       .66       .19
Text-based + Timing    9.8 %       .81       .53
Full set               9.6 %       .81       .57
Baseline (1)          19.0 %       .00       .00
Human labelers (2)     5.7 %       .94       .71

(1) Majority-class baseline: NO BOUNDARY.
(2) Calculated with respect to each labeler’s agreement with the majority labels.
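The majority-class baseline can be recomputed from the Total column of the label-count table at the end of the slides. This sketch assumes that items labeled "?" (taken here to lack a majority label) are excluded and that pivots count as boundaries; under those assumptions the result matches the 19.0% reported above.

```python
from collections import Counter

# "Total" column of the label-count table at the end of the slides,
# excluding the 407 "?" items (assumed to have no majority label).
FUNCTION_TOTALS = {
    "Ack/Agree": 2370, "Backchannel": 763, "Cue Begin": 641, "Cue End": 18,
    "Pivot Begin": 73, "Pivot End": 298, "Back from Task": 43, "Check": 68,
    "Stall": 19, "Literal Modifier": 1118,
}
# Assumed mapping from functions to boundary classes.
BOUNDARY = {"Cue Begin": "BEGIN", "Pivot Begin": "BEGIN",
            "Cue End": "END", "Pivot End": "END"}


def majority_baseline(function_counts, mapping, default):
    """Collapse function counts into task classes; return (majority class, error rate)."""
    classes = Counter()
    for function, n in function_counts.items():
        classes[mapping.get(function, default)] += n
    label, hits = classes.most_common(1)[0]
    return label, 1 - hits / sum(classes.values())


label, error = majority_baseline(FUNCTION_TOTALS, BOUNDARY, "NO_BOUNDARY")
print(label, f"{error:.1%}")
```

Because the baseline never predicts BEGIN or END, both of those classes have no true positives, which is why their F-measures in the table are .00.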



Results: Acknowledgment function

Feature Set           Error Rate   F-Measure
Text-based             8.3 %       .94
Timing                11.0 %       .92
Acoustic              17.2 %       .87
Text-based + Timing    6.2 %       .95
Full set               6.5 %       .95
Baseline (1)          16.7 %       .88
Human labelers (2)     5.5 %       .98

(1) Baseline based on lexical identity: {huh, right} → NO ACK; all other words → ACK.
(2) Calculated with respect to each labeler’s agreement with the majority labels.
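The lexical baseline and the per-class F-measure used in these tables can be sketched as follows. The F-measure is the harmonic mean of precision and recall for one class; the word/label data below are made up purely to exercise the functions.

```python
def lexical_baseline(word):
    """Baseline from the slide: huh and right -> NO_ACK; every other word -> ACK."""
    return "NO_ACK" if word in {"huh", "right"} else "ACK"


def f_measure(gold, pred, cls):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == cls and p == cls for g, p in pairs)
    fp = sum(g != cls and p == cls for g, p in pairs)
    fn = sum(g == cls and p != cls for g, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data (not from the corpus).
words = ["okay", "right", "mm-hm", "okay", "right"]
gold = ["ACK", "NO_ACK", "ACK", "NO_ACK", "ACK"]
pred = [lexical_baseline(w) for w in words]
error = sum(g != p for g, p in zip(gold, pred)) / len(gold)
print(error, round(f_measure(gold, pred, "ACK"), 3))
```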



Best-performing features

Discourse segment boundary function:
• Lexical identity
• POS tag of the following word
• Number and proportion of succeeding words in the turn
• Context-normalized mean intensity

Acknowledgment function:
• Lexical identity
• POS tag of the preceding word
• Number and proportion of preceding words in the turn
• IPU and turn length
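The position-in-turn features listed above can be sketched for a word given its turn as a list of tokens. Normalizing the proportions by the total number of words in the turn is an assumption, since the slides do not give the exact definition. POS tagging is not shown; the neighboring words mark where the POS tags of the preceding and following words would be computed.

```python
def turn_position_features(turn_words, index):
    """Number and proportion of words before and after turn_words[index] in its turn.

    Proportions are relative to the total number of words in the turn
    (an assumption; the slides do not give the exact normalization).
    """
    n = len(turn_words)
    preceding, succeeding = index, n - index - 1
    return {
        "word": turn_words[index],
        "prev_word": turn_words[index - 1] if index > 0 else None,
        "next_word": turn_words[index + 1] if index < n - 1 else None,
        "n_preceding": preceding,
        "n_succeeding": succeeding,
        "prop_preceding": preceding / n,
        "prop_succeeding": succeeding / n,
    }


turn = "okay we gonna be placing the blue moon".split()
print(turn_position_features(turn, 0))
```

For a turn-initial "okay" like this one, all other words in the turn follow it, which is the kind of configuration the succeeding-words features can capture for a segment-beginning cue.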



Results: Classification of individual words

Classification of each individual word into its most common functions:
alright: Ack/Agree, Cue Begin, Other
mm-hm: Ack/Agree, Backchannel
okay: Ack/Agree, Backchannel, Cue Begin, Ack+CueBegin, Ack+CueEnd, Other
right: Ack/Agree, Check, Literal Modifier
yeah: Ack/Agree, Backchannel



Results: Classification of the word ‘okay’

Feature Set           Error Rate   F-Measure
                                   Ack/Agree   Backchannel   Cue Begin   Ack/Agree + Cue Begin   Ack/Agree + Cue End
Text-based            31.7 %       .76         .16           .77         .09                     .33
Acoustic              40.2 %       .69         .24           .64         .03                     .25
Text-based + Timing   25.6 %       .79         .31           .82         .18                     .67
Full set              25.5 %       .80         .46           .83         .21                     .66
Baseline (1)          48.3 %       .68         .00           .00         .00                     .00
Human labelers (2)    14.0 %       .89         .78           .94         .56                     .73

(1) Majority-class baseline: ACK/AGREE.
(2) Calculated with respect to each labeler’s agreement with the majority labels.
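The baseline row for ‘okay’ can be checked against the "okay" column of the label-count table at the end of the slides, assuming the "?" items are excluded. Predicting Ack/Agree for every token gives that class a recall of 1, so its F-measure reduces to 2TP / (2TP + FP); this reproduces the 48.3% error and the .68 F-measure reported above.

```python
# "okay" column of the label-count table at the end of the slides,
# excluding its 235 "?" items (assumed to have no majority label).
OKAY_COUNTS = {
    "Ack/Agree": 1137, "Backchannel": 121, "Cue Begin": 548, "Cue End": 10,
    "Pivot Begin": 68, "Pivot End": 232, "Back from Task": 33, "Check": 6,
    "Stall": 15, "Literal Modifier": 29,
}

total = sum(OKAY_COUNTS.values())
majority = max(OKAY_COUNTS, key=OKAY_COUNTS.get)
tp = OKAY_COUNTS[majority]   # majority-class tokens, all predicted correctly
fp = total - tp              # every other token is also predicted as the majority class

error = fp / total
# Recall for the majority class is 1, so F = 2TP / (2TP + FP).
f_majority = 2 * tp / (2 * tp + fp)
print(majority, f"{error:.1%}", round(f_majority, 2))
```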



Summary

The discourse/sentential distinction is insufficient for affirmative cue words in spoken dialogue.

Two new classification tasks: detection of an acknowledgment function; detection of a discourse boundary function.

The best-performing ML models are based on textual and timing features; adding acoustic features yields at most a slight improvement.



Further Work

Gravano et al., 2007. On the role of context and prosody in the interpretation of ‘okay’. ACL 2007, Prague, Czech Republic, June 2007.

Benus et al., 2007. The prosody of backchannels in American English. ICPhS 2007, Saarbrücken, Germany, August 2007.




Function          alright  mm-hm  okay  right  uh-huh  yeah  Other  Total
Ack / Agree       99       61     1137  114    18      808   133    2370
Backchannel       6        402    121   14     143     72    5      763
Cue Begin         89       0      548   2      0       2     0      641
Cue End           8        0      10    0      0       0     0      18
Pivot Begin       5        0      68    0      0       0     0      73
Pivot End         13       12     232   2      0       22    17     298
Back from Task    9        1      33    0      0       0     0      43
Check             0        0      6     53     0       1     8      68
Stall             1        0      15    1      0       2     0      19
Literal Modifier  9        0      29    1079   0       0     1      1118
?                 56       27     235   10     3       65    11     407
Total             295      503    2434  1275   164     972   175    5818
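Assuming the Literal Modifier label corresponds to sentential use, the table above approximately reproduces the Previous Work figures for right (85% sentential) and for all other affirmative cue words (1% sentential). This sketch keeps the "?" items in the denominators; the original figures may have been computed differently.

```python
# "Literal Modifier" row and "Total" row of the table above.
LITERAL = {"alright": 9, "mm-hm": 0, "okay": 29, "right": 1079,
           "uh-huh": 0, "yeah": 0, "Other": 1}
TOTAL = {"alright": 295, "mm-hm": 503, "okay": 2434, "right": 1275,
         "uh-huh": 164, "yeah": 972, "Other": 175}

right_share = LITERAL["right"] / TOTAL["right"]
rest_literal = sum(n for w, n in LITERAL.items() if w != "right")
rest_total = sum(n for w, n in TOTAL.items() if w != "right")
rest_share = rest_literal / rest_total
print(f"right: {right_share:.0%} literal; others: {rest_share:.0%} literal")
```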
