Predicting Student Emotions in Computer-Human Tutoring Dialogues

Diane J. Litman & Kate Forbes-Riley
University of Pittsburgh
Department of Computer Science
Learning Research and Development Center

In Proceedings of ACL 2004


Abstract

• Examine the utility of speech and lexical features for predicting student emotions in computer-human spoken tutoring dialogues.

• Compare the results of machine learning experiments using these features alone or in combination to predict student emotions.


Introduction

• Intelligent tutoring dialogue systems have become more prevalent in recent years.

• Our long-term goal is to merge dialogue and affective tutoring research by enhancing our intelligent tutoring spoken dialogue system to automatically predict and adapt to student emotions, and to investigate whether this improves learning and other measures of performance.


Computer-Human Dialogue Data

• ITSPOKE (Intelligent Tutoring SPOKEn dialogue system) is a spoken dialogue tutor built on top of the Why2-Atlas conceptual physics text-based tutoring system (VanLehn et al., 2002).

– The Why2-Atlas back-end uses 3 different approaches (symbolic, statistical, and hybrid) to parse the student essay into propositional representations and to resolve temporal and nominal anaphora.

– During the dialogue, student speech is digitized from microphone input and sent to the Sphinx2 recognizer. Sphinx2's best "transcription" is then sent to the Why2-Atlas back-end for syntactic, semantic, and dialogue analysis. Finally, the text response produced by Why2-Atlas is sent to the Cepstral text-to-speech system and played to the student. After the dialogue, the student revises the essay, thereby either ending the tutoring or triggering another round of tutoring and essay revision.
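The turn cycle described above (speech in, recognition, analysis, spoken response out) can be sketched as a control loop. This is a minimal runnable sketch, not ITSPOKE's real code: the classes are hypothetical stand-ins for Sphinx2 (ASR) and the Why2-Atlas back-end, stubbed so the flow can be followed end to end.

```python
class StubASR:
    def best_hypothesis(self, audio):
        # Stand-in for Sphinx2: pretend the "audio" is already text
        return audio

class StubBackend:
    def __init__(self, turns_needed):
        self.remaining = turns_needed

    def dialogue_step(self, transcription):
        # Stand-in for Why2-Atlas syntactic/semantic/dialogue analysis
        self.remaining -= 1
        return f"tutor response to: {transcription}"

    def dialogue_done(self):
        return self.remaining <= 0

def run_dialogue(student_turns, asr, backend):
    """One tutoring dialogue: recognize each student turn, analyze it,
    and collect the tutor responses (which ITSPOKE would synthesize
    with TTS and play back) until the back-end is satisfied."""
    responses = []
    for audio in student_turns:
        if backend.dialogue_done():
            break
        text = asr.best_hypothesis(audio)
        responses.append(backend.dialogue_step(text))
    return responses

responses = run_dialogue(["the ball falls", "gravity acts on it"],
                         StubASR(), StubBackend(turns_needed=2))
```

In the real system, the loop would end when the back-end hands control back for essay revision rather than after a fixed turn count.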


ITSPOKE

• In ITSPOKE, a student first types an essay answering a qualitative physics problem. ITSPOKE then analyzes the essay and engages the student in spoken dialogue to correct misconceptions and to elicit complete explanations.


Computer-Human Dialogue Data

• Corpus collected from November 2003 to April 2004.

• Subjects are University of Pittsburgh students who have never taken college physics, and who are native English speakers. Subjects first read a small document of background physics material, then work through 5 problems with ITSPOKE.

• The corpus contains 100 dialogues from 20 subjects; 15 dialogues (333 turns in total) have been annotated for emotion.


Preliminary Experiments

• Used the RIPPER rule-induction machine learning program, which takes as input the classes to be learned.
• 25-fold cross validation.
• Text + speech features yield a rule set such as:
– If (duration > 0.65) ^ (text has "i") then neg
– Else if (duration >= 2.98) then neg
– Else if (duration >= 2.98) ^ (start time >= 297.62) then pos
– Else if (text has "right") then pos
– Else neutral
• Estimated mean error and standard error: 33.03% ± 2.45%
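A RIPPER rule set like the one above is a first-match decision list. The sketch below transcribes the slide's text+speech rules directly; the thresholds are the ones shown, and the function is illustrative rather than the paper's actual learned model.

```python
def classify_turn(duration, start_time, text):
    """First-match decision list following the slide's text+speech rules
    (illustrative transcription, not the paper's learned model)."""
    tokens = text.lower().split()
    if duration > 0.65 and "i" in tokens:
        return "neg"
    if duration >= 2.98:
        return "neg"
    # As written on the slide, this rule is shadowed by the one above it.
    if duration >= 2.98 and start_time >= 297.62:
        return "pos"
    if "right" in tokens:
        return "pos"
    return "neutral"
```

Rule order matters in a decision list: the first condition that fires determines the label, which is why the third rule as written can never fire after the second.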


Preliminary Experiments (cont.)

• Text only:
– If (text has "i") ^ (text has "don't") then neg
– Else if (text has "um") ^ (text has "<hn>") then neg
– Else if (text has "the") ^ (text has "<fs>") then neg
– Else if (text has "right") then pos
– Else if (text has "so") then pos
– Else if (text has "(laugh)") ^ (text has "that's") then pos
– Else neutral
• Estimated mean error and standard error: 39.03% ± 2.45%


Annotating Student Turns

• These emotions are viewed along a linear scale, shown and defined as:

– Negative: a student turn expressing emotions such as confusion, boredom, or irritation. Evidence of a negative emotion can come from many knowledge sources, such as lexical items (e.g., "I don't know") and/or acoustic-prosodic features (e.g., prior-turn pausing).

– Neutral: a student turn expressing neither a negative nor a positive emotion. Evidence comes from moderate loudness, pitch, and tempo.

– Positive: a student turn expressing emotions such as confidence. Evidence comes from louder speech and faster tempo.

– Mixed: a student turn expressing both positive and negative emotions.


Annotating Student Turns (cont.)

• Negative, Positive, Neutral (NPN):
– Mixeds are conflated with neutrals.
– Agreement between the 2 annotators: 0.4

• Negative, Non-Negative (NnN):
– Positives, mixeds, and neutrals are conflated as non-negatives.
– Agreement between the 2 annotators: 0.5

• Emotional, Non-Emotional (EnE):
– Negatives, positives, and mixeds are conflated as Emotional; neutrals are Non-Emotional.
– Agreement between the 2 annotators: 0.5
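The three conflation schemes are simple label mappings from the original four-way annotation. A sketch of the mappings as described on the slide:

```python
# Each scheme maps the original four-way annotation onto its own classes.
CONFLATIONS = {
    "NPN": {"negative": "negative", "positive": "positive",
            "neutral": "neutral", "mixed": "neutral"},
    "NnN": {"negative": "negative", "positive": "non-negative",
            "neutral": "non-negative", "mixed": "non-negative"},
    "EnE": {"negative": "emotional", "positive": "emotional",
            "mixed": "emotional", "neutral": "non-emotional"},
}

def conflate(label, scheme):
    """Map one of the four original emotion labels onto a scheme's classes."""
    return CONFLATIONS[scheme][label]
```

For example, a "mixed" turn counts as "neutral" in NPN but as "emotional" in EnE, which is why the three tasks have different class distributions and baselines.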


Extracting Features from Turns

• Extracted the set of features, restricted to those that can be computed in real time.

• Acoustic-prosodic features:
– 4 fundamental frequency (f0): max, min, mean, standard deviation
– 4 energy (RMS): max, min, mean, standard deviation
– 4 temporal: amount of silence in turn, turn duration, duration of pause prior to turn (start_time), speaking rate

• Lexical features:
– Human-transcribed lexical items in the turn
– ITSPOKE-recognized lexical items in the turn

• Identifier features:
– Subject ID, gender, problem ID
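The 12 acoustic-prosodic features reduce to per-turn statistics over pitch and energy tracks plus four timing values. A minimal sketch, assuming per-frame f0 and RMS tracks are already available; the slide does not define speaking rate, so syllables per second is assumed here.

```python
def _stats(values):
    # max, min, mean, standard deviation (population) of a track
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [max(values), min(values), mean, std]

def turn_features(f0_track, rms_track, turn_duration,
                  silence, prior_pause, n_syllables):
    """The 12 acoustic-prosodic features listed above, from per-frame pitch
    (f0) and energy (RMS) tracks plus turn timings. Speaking rate is
    assumed to be syllables per second."""
    temporal = [silence, turn_duration, prior_pause,
                n_syllables / turn_duration]  # speaking rate (assumed)
    return _stats(f0_track) + _stats(rms_track) + temporal
```

All of these can be computed incrementally as audio arrives, which is what makes the feature set usable in real time.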


Predicting Student Emotions

• Created 10 feature sets to study the effects that various feature combinations had on predicting emotion:
– sp: 12 acoustic-prosodic features
– lex: human-transcribed lexical items
– asr: ITSPOKE-recognized lexical items
– sp+lex: combined sp and lex features
– sp+asr: combined sp and asr features
– +id: each above set + 3 identifier features

• Used the Weka machine learning software (Witten and Frank, 1999) with boosted decision trees to automatically learn our emotion prediction models.
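Weka is Java software; an analogous setup in Python, not the paper's actual pipeline, is AdaBoost over decision stumps (scikit-learn's default base learner), which plays the role of Weka's boosted decision trees. `X` would be one of the feature sets above (e.g., sp's 12 acoustic-prosodic features) and `y` the conflated emotion labels.

```python
from sklearn.ensemble import AdaBoostClassifier

def train_emotion_model(X, y, n_rounds=10):
    """Boosted decision trees: AdaBoost over decision stumps (its default
    base learner), analogous to boosting trees in Weka. Illustrative only;
    n_rounds is an assumed hyperparameter, not taken from the paper."""
    model = AdaBoostClassifier(n_estimators=n_rounds)
    return model.fit(X, y)
```

Swapping in a different feature set only changes the columns of `X`, which is what makes the sp/lex/asr comparisons straightforward to run.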


Predicting Agreed Turns

• 10 runs of 10-fold cross validation.

• Baseline (MAJ): always predict the majority class.

• Compare each result statistically to the relevant baseline accuracy, computed automatically in Weka using a 2-tailed t-test (p < 0.05).
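The MAJ baseline takes only a few lines; a learned model's cross-validation accuracy has to beat this number significantly (the two-tailed t-test over fold accuracies at p < 0.05, not shown here) for its features to count as useful.

```python
from collections import Counter

def majority_class(train_labels):
    """The MAJ baseline: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]

def baseline_accuracy(train_labels, test_labels):
    """Accuracy on a test fold of always predicting the majority class."""
    maj = majority_class(train_labels)
    return sum(label == maj for label in test_labels) / len(test_labels)
```

Because the conflation schemes skew the class distributions differently, each scheme has its own majority class and hence its own baseline to beat.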


Predicting Consensus Turns

• Consensus labels increase the usable data set for prediction and include the more difficult annotation cases.


Comparison with Human Tutoring

• Used the same experimental methodology for spoken human tutoring dialogues (same subject pool, physics problems, web and audio interface, etc.).

• 2 annotators annotated the human tutoring data:
– NnN agreement: 88.96% [77.8%]
– EnE agreement: 77.26% [66.1%]
– NPN agreement: 75.06% [60.7%]

• The table shows the results for the agreed data.


Discussion

• We hypothesize that such differences arise in part from differences between the 2 corpora:

– (1) Student turns with the computer tutor are much shorter than with the human tutor, and thus contain less emotional content, making both annotating and predicting more difficult.

– (2) Students respond to the computer tutor differently, and perhaps more idiosyncratically, than to the human tutors.

– (3) The computer tutor is less flexible than the human tutor, which also affects student emotional response and its expression.


Conclusion and Current Directions

• Our feature sets outperform a majority baseline, and lexical features outperform acoustic-prosodic features. While adding identifier features typically also improves performance, combining lexical and speech features does not.

• The results also suggest that prediction on consensus-labeled turns is harder than on agreed turns.

• Continuing work is exploring partial automation via semi-supervised machine learning.

• Manual annotation might also improve reliability.

• Include features available in our logs (e.g., semantic analysis).