A Comparison of Tutor and Student Behavior in Speech Versus Text Based Tutoring

Carolyn P. Rosé, Diane Litman, Dumisizwe Bhembe, Kate Forbes, Scott Silliman, Ramesh Srivastava, Kurt VanLehn

May 31, 2003

HLT-NAACL Workshop on Educational Applications of NLP


Overview

Hypothesis: speech based tutorial dialogue systems may be more effective than text based ones

Research Context: WHY2 Conceptual physics tutoring project

Parallel corpus collection effort: speech versus text based human tutoring

– Comparing features that correlate with learning gains in text based tutoring (Rosé et al., 2003)

Parallel system development: speech versus text based intelligent tutoring systems

– ITSPOKE (Litman et al., 2003) adds speech input and output to WHY2-Atlas system (Vanlehn et al., 2002; Jordan and Vanlehn, 2002; Rosé et al., 2002)

Benefits of Tutorial Dialogue

Human tutoring more effective than classroom instruction

– 2 Sigma effect sizes (Cohen et al., 1982; Bloom, 1984)

– Conjecture: effective because of collaborative dialogue (Fox, 1993; Graesser et al., 1995; Merrill et al., 1992; Chi et al., 2001)

Student self-explanation enhances learning (Chi et al., 1981; Renkl, 1997; Pressley et al., 1992; Lemke, 1990)

– Motivates an “Ask, don’t tell” tutoring strategy (Vanlehn et al., 1998)

– Trend in favor of Socratic versus Didactic tutoring (Rosé et al., 2001)

Student language makes student thinking visible

Tutors may tailor the presentation of material to the needs of each student

Tutorial Dialogue Systems

Tutorial dialogue systems are typically text based

– (Evans et al., 2001; Rosé et al., 2001; Aleven et al., 2001; Zinn et al., 2002; Vanlehn et al., 2002)

Many have been evaluated successfully with students

– (Rosé et al., 2001; Heffernan and Koedinger, 2002; Ashley et al., 2002; Graesser et al., 2001)

Some effective tutoring systems with speech capabilities

– (Mostow and Aist, 2001; Aist et al., 2003; Fry et al., 2001)

Benefits of Speech

Students do more self-explanation in speech versus text (Hausmann and Chi, 2002)

Our preliminary results show a larger effect size for speech based human tutoring in about half the time (1.23 sigma vs. .68 sigma)

Speech monopolizes fewer cognitive resources than typing

– Leaves more resources for self-explanation and learning

– Potential for hands-free interaction (Smith, 1992)

Speech contains prosodic and acoustic information text lacks

– Useful for predicting emotional states (Litman et al., 2003; Ang et al., 2002; Batliner et al., 2000)

– Improves accuracy at detection and correction of misrecognized utterances (Litman et al., 2000 & 2001; Hirschberg et al., 2001)


Why2 Conceptual Physics Tutoring

WHY2-Atlas Typed Human-Human Tutoring Corpus

Pretest measures prior knowledge of physics

– Focus on basic concepts and common misconceptions such as “heavier objects exert more force” (Hake, 1998; Halloun and Hestenes, 1985)

Students read 9-page mini-textbook

Students work through up to 10 essay problems designed to elicit the expression of common misconceptions

– Two balls are released in a vacuum, one of which has twice the mass of the other. Which will hit the ground first? Explain.

Post-test, similar to pretest, allows us to measure gain in knowledge

Conditions: Text/Speech human tutoring, Atlas, Targeted Text


Targeted Text Versus Dialogue

Here are a few things to keep in mind when calculating acceleration for a body at rest. Acceleration is change in velocity over time.

If velocity is not changing, then there is zero acceleration.

Now, if for a finite time interval the velocity remains zero, then it is true that it is not changing.

Therefore, when a body is at rest during a time interval, its acceleration within that time interval is zero.

Tutor: Here are a few things to keep in mind when calculating acceleration for a body at rest. Acceleration is change of what over time?

Student: velocity

Tutor: Right. If the velocity is not changing, what is the magnitude of the acceleration?

Student: zero.

Tutor: Good. If velocity remains zero within a time interval, how much is it changing?

Student: zero

Tutor: Super. So if a body is at rest during a time interval, what is the...

Extended Student Explanations

Correlation between average turn length and learning

– Effect of pretest score regressed out

– R=.565, p<.05, N=17

Detailed analysis of complete transcripts of 7 students who completed the study (Rosé et al., 2003)

– Coded for question types and negative feedback types

Length of answer correlates with likelihood of receiving negative feedback (Kappa = .78)

– R=.8065, p<.01, N=9

Longer answers more opportunities for learning
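One common way to "regress out" a pretest score is a partial correlation: fit a line predicting each variable from the pretest, then correlate the residuals. The sketch below illustrates that reading only. The slides do not say exactly which procedure was used, and the function names and the sample numbers are hypothetical, not the study's data.

```python
import math

def residuals(y, x):
    """Residuals of the least-squares line y ~ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def partial_correlation(x, y, control):
    """Correlation of x and y after removing the linear effect of control."""
    return pearson(residuals(x, control), residuals(y, control))

# Hypothetical values for illustration only (not from the corpus).
pretest = [0.40, 0.55, 0.35, 0.60, 0.50, 0.45]
avg_turn_length = [9.0, 16.0, 7.5, 12.0, 18.0, 10.0]
posttest = [0.55, 0.75, 0.45, 0.70, 0.80, 0.60]

r = partial_correlation(avg_turn_length, posttest, pretest)
print(f"partial r = {r:.3f}")
```

Working with residuals keeps students who simply started out knowing more from inflating the relationship between turn length and post-test score.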

ITSPOKE: Intelligent Tutoring SPOKEn Dialogue System

“Back-end” is Why2-Atlas

Speech input via Sphinx2 speech recognizer

- 55 dialogue-dependent language models created from 4551 typed student utterances (Why2-Atlas corpus)

- about to be enhanced with spoken data

Speech output via Festival text-to-speech synthesizer
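As a rough illustration of what "dialogue-dependent language models" means, the sketch below groups typed utterances by the dialogue state in which they occurred and collects bigram counts separately for each state. The slide does not describe the actual training procedure or the format Sphinx2 requires. The state names, the sample utterances, and `build_bigram_counts` are all hypothetical.

```python
from collections import Counter, defaultdict

def build_bigram_counts(utterances):
    """Collect bigram counts per dialogue state.

    utterances: iterable of (dialogue_state, text) pairs.
    Returns a dict mapping each state to a Counter of (word, next_word) pairs.
    """
    models = defaultdict(Counter)
    for state, text in utterances:
        # Sentence-boundary markers let the counts capture how answers start and end.
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        models[state].update(zip(tokens, tokens[1:]))
    return models

# Hypothetical typed answers tagged with hypothetical dialogue states.
typed_utterances = [
    ("acceleration_definition", "velocity"),
    ("acceleration_definition", "change in velocity"),
    ("acceleration_at_rest", "zero"),
    ("acceleration_at_rest", "it is zero"),
]

models = build_bigram_counts(typed_utterances)
for state, counts in models.items():
    print(state, counts.most_common(2))
```

Keeping one model per state narrows what the recognizer expects at each point in the dialogue, which is the motivation for making the models dialogue-dependent.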


ITSPOKE Screen Shot

ITSPOKE (Spoken) Human-Human Tutoring Corpus

Data

- Target size: 20 subjects

- Current size (May 20): 10 subjects / 86 dialogues / 62 transcribed & turn-annotated

WHY2-Atlas (text) vs. ITSPOKE (speech) data

- same experimental procedures

- input and output dialogue modalities differ

- strict turn-taking in typed; overlaps in speech

Typed vs. Spoken Human Tutoring: Overview of Results

Intelligent Tutoring evaluation (learning gains, reading control)

Spoken Dialogue evaluation (efficiency)

Dialogue Phenomena/Learning correlations

- larger student turn lengths and student-tutor word ratios correlate with learning in text (Rosé et al., 2003; Core et al., 2002)

- are turn length and word ratios similar in text and speech?

Post-Test Results (new!)

Condition                Mean   SD    N
Spoken Dialogue          .70    .15   7
Typed Dialogue           .67    .12   20
Reading Targeted Text    .57    .13   20

Spoken > Reading Targeted (p<.03, sigma=1.23)

Typed > Reading Targeted (p<.01, sigma=0.70)
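For readers unfamiliar with effect sizes expressed in sigma, the sketch below shows two standard formulas applied to the rounded means and standard deviations in the table: Glass's delta (difference divided by the control group's SD) and Cohen's d (difference divided by the pooled SD). The slides do not say which formula or which adjustments produced the reported sigma values, so these outputs need not reproduce 1.23 or 0.70. Treating Reading Targeted Text as the control group is an assumption.

```python
import math

def glass_delta(mean_treatment, mean_control, sd_control):
    """Effect size using the control group's standard deviation."""
    return (mean_treatment - mean_control) / sd_control

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Effect size using the pooled standard deviation."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)

# Rounded values from the post-test table: (mean, SD, N).
spoken = (0.70, 0.15, 7)
typed = (0.67, 0.12, 20)
reading = (0.57, 0.13, 20)  # assumed control condition

for name, (mean, sd, n) in [("Spoken", spoken), ("Typed", typed)]:
    delta = glass_delta(mean, reading[0], reading[1])
    d = cohens_d(mean, sd, n, *reading)
    print(f"{name} vs. Reading: Glass's delta = {delta:.2f}, Cohen's d = {d:.2f}")
```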

Time on Task Results (new!)

Condition                Mean   SD    N
Reading Targeted Text    85     38    20
Spoken Dialogue          170    56    7
Typed Dialogue           430    160   20

Reading Targeted Text > Spoken Dialogue (p<.001)

Spoken Dialogue > Typed Dialogue (p<.05)

Differences: Text/Speech (updated)

condition   participant   #turns/dialog   #words/dialog   #words/turn
text        student       12.8            184.9           14.37
speech      student       49.45           281.48          5.71
text        tutor         14.4            393.3           39.04
speech      tutor         48              1185            26.40

Speech (n=62 dialogues); Text (n=166)

Mean student turn length (correlated with learning gains in text) is shorter in speech (p<.001)
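The per-dialogue measures compared here (turns, words, words per turn, and the student-to-tutor word ratio) can be computed from a turn-annotated transcript along the following lines. This is a minimal sketch: whitespace tokenization and the function names are assumptions, since the slides do not say how words were counted. The sample turns are taken from the acceleration dialogue on the Targeted Text Versus Dialogue slide.

```python
def turn_stats(dialogue, speaker):
    """Count turns, words, and mean words per turn for one speaker.

    dialogue: list of (speaker, text) pairs in order of occurrence.
    """
    turns = [text for who, text in dialogue if who == speaker]
    n_words = sum(len(text.split()) for text in turns)  # whitespace tokens
    words_per_turn = n_words / len(turns) if turns else 0.0
    return {"turns": len(turns), "words": n_words, "words_per_turn": words_per_turn}

def student_tutor_ratios(dialogue):
    """Student-to-tutor ratios of turns and of words for one dialogue."""
    student = turn_stats(dialogue, "Student")
    tutor = turn_stats(dialogue, "Tutor")
    return {
        "turn_ratio": student["turns"] / tutor["turns"],
        "word_ratio": student["words"] / tutor["words"],
    }

dialogue = [
    ("Tutor", "Right. If the velocity is not changing, what is the magnitude of the acceleration?"),
    ("Student", "zero."),
    ("Tutor", "Good. If velocity remains zero within a time interval, how much is it changing?"),
    ("Student", "zero"),
]

print(turn_stats(dialogue, "Student"))
print(turn_stats(dialogue, "Tutor"))
print(student_tutor_ratios(dialogue))
```

Corpus-level figures would then be averages of these per-dialogue values, which is why a mean words-per-turn figure need not equal mean words per dialogue divided by mean turns per dialogue.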

Differences (continued)

Relative proportion of student to tutor words does differ across conditions (t(27)=3.68, p<.001)

                             Text Condition   Speech Condition
#Student trns/#Tutor trns    0.88             1.00
#Student wrds/#Tutor wrds    0.45             0.27

Why2-Atlas Ex. and ITSPOKE Ex. (Human-Human Corpora)

Why2-Atlas Ex.

Problem: Suppose that you released 3 identical balls of clay in a vacuum at exactly the same instant. They would all hit the ground at the same instant. Now you stick two of the balls together, forming one ball that is twice as heavy as the remaining, untouched clay ball. Both balls are released in a vacuum at exactly the same instant. Which ball hits the ground first?

Original Essay: Both balls will hit the ground at the same time. The balls are in free fall (only gravitational forces). The ratio of the masses and weight are equal.

--------------------------2 min into dialogue--------------------------------

Tutor: You have correctly said the balls are in free fall. What do you conclude from this fact?

Student: There is no air resistance and the balls’ accelerations are constant in regards to one another

Tutor: Right, so the conclusion is that they both have the same acceleration, the free fall acceleration. Now you have to show that time taken by them to reach the ground will be the same. How do you do that?

Student: F (net forces) / mass = f/m because F = m*a therefore ratio should equal one another

Tutor: But that does not tell you anything about the time taken. What physical quantities will determine the time taken. Will the initial velocity of the balls matter?

Student: yes, assuming starting at rest? sorry, kind of at a loss at the moment

ITSPOKE Ex.

Problem: If a car is able to accelerate at 2 m/s^2, what acceleration can it attain if it is towing another car of equal mass?

Original Essay: If the car is towing another car of equal mass, the maximum acceleration would be the same because the car would be towed behind and the friction caused would only be by the front of the first car.

-----------------------------6.5 min into dialogue--------------------------

Tutor: Now this law that force is equal to mass times acceleration, what's this law called? This is uh since this it is a very important basic uh fact uh it is it is a law of physics. Um you have you have read it in the background material. Can you recall it?

Student: Um no it was one of Newton's laws but I don't- remember which one. (laugh)

Tutor: Right, right- That- is Newton's second law of motion.

Student: he I- Ok, because I remember one, two, and three, but I didn't know if there was a different name

Tutor: Yeah that's right you know Newton was a genius-

Student: (laugh)

Tutor: and uh he looked at a large number of experiments and experimental data that was available and from that he could come to this general law and it is known as Newton's second law of motion. Um many many other scientists before him had seen all this data which was collected by scientists but had not concluded this now it looks very simple but to come to the conclusion from a mass of data was something which required the genius of Newton.

Student: mm hm

Current and Future Directions

Further analysis of dialogue differences

- more student and fewer tutor questions in speech

- characterization of overlaps in speech

- coding of questions and other phenomena

Further learning gains analyses

Additional tutors

Other dialogue evaluation metrics

Human-computer corpus collection and analysis

Summary

Goal: develop and apply language and speech technology to yield the next generation of intelligent tutoring systems

Contributions:

– Empirical comparisons between typed and spoken tutorial dialogue

– Correlation of dialogue behavior with learning gains


ITSPOKE: Human-Human Corpus Transcription and Annotation


Knowledge Construction Dialogues

Motivation: test “Ask, Don’t Tell” strategy in an ITS

Interactive directed lines of reasoning: analogies, concrete illustrations

Finite state dialogue structure (Freedman, 2000)

Robust parsing with LCFLex (Rosé and Lavie, 2001)

KCD Authoring Tool Suite (Jordan, Rosé, and VanLehn, 2001)

– 55 KCDs fully implemented and pilot tested in 3 months

Extensively evaluated with students

– KCDs are better than hints but not reliably better than minilessons

– (Rosé et al., 2001; Siler et al., 2002; Rosé et al., 2003)
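A finite state dialogue structure of this kind can be pictured as a table of states, each with a tutor question, the answers it accepts, and the state to move to next. The sketch below is only an illustration built from the acceleration dialogue on the Targeted Text Versus Dialogue slide, and it is not the KCD implementation. Plain keyword matching stands in for LCFLex parsing, and the state names, the remediation-then-advance policy, and the functions `step` and `run` are assumptions.

```python
# Each state asks a question; a wrong answer is answered with the "tell" version.
KCD = {
    "ask_definition": {
        "prompt": "Acceleration is change of what over time?",
        "expected": {"velocity"},
        "remediation": "Acceleration is change in velocity over time.",
        "next": "ask_magnitude",
    },
    "ask_magnitude": {
        "prompt": "If the velocity is not changing, what is the magnitude of the acceleration?",
        "expected": {"zero", "0"},
        "remediation": "If velocity is not changing, then there is zero acceleration.",
        "next": "done",
    },
}

def step(state, answer):
    """Evaluate one student answer; return (next_state, tutor_feedback)."""
    node = KCD[state]
    words = {w.strip(".,!?").lower() for w in answer.split()}
    correct = bool(words & node["expected"])  # keyword match, not real parsing
    feedback = "Right." if correct else node["remediation"]
    return node["next"], feedback

def run(answers, start="ask_definition"):
    """Walk the dialogue with a list of student answers; return (state, log)."""
    state, log = start, []
    for answer in answers:
        if state == "done":
            break
        log.append(("Tutor", KCD[state]["prompt"]))
        log.append(("Student", answer))
        state, feedback = step(state, answer)
        log.append(("Tutor", feedback))
    return state, log

final_state, log = run(["velocity", "zero."])
for speaker, text in log:
    print(f"{speaker}: {text}")
```

Falling back to the statement form of each step on a wrong answer mirrors the contrast between the dialogue and the targeted text: the same line of reasoning, asked where possible and told where necessary.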