Carnegie Mellon Mostow 12/7/2015, p. 1 The Sounds of Silence: Towards Automated Evaluation of...


Page 1:

The Sounds of Silence:
Towards Automated Evaluation of Student Learning in a Reading Tutor that Listens

Jack Mostow and Gregory Aist

Project LISTEN, Carnegie Mellon University

http://www.cs.cmu.edu/~listen

Page 2:

Pilot study in urban elementary school

Goals:
• Analyze extended use of Reading Tutor
• Identify opportunities for improvement

Protocol:
• Principal chose 8 lowest third-grade readers
• Aide took each kid daily to use Reading Tutor in small room
• Kid chose text to read (Weekly Reader, poems, …)

Milestones:
• Oct. 96: deployed Pentium, trained users, refined design
• Nov. 96: school pre-tested individually
• June 97: school post-tested individually

Page 3:

User-Tutor interaction (11/7/96 version used in pilot study)

User may:
• click Back
• click Help
• click Go
• click word
• read

Tutor may:
• go on
• read word
• recue word
• read phrase

Page 4:

Data recorded by Reading Tutor

Sessions from Nov. 96 to May 97 (excluding outliers)
• 29 to 57 sessions per kid, averaging 14 minutes
• Not used during vacations, downtime, absences

6 gigabytes of data
• .WAV files of kids' spoken utterances
• .SEG files of time-aligned speech recognizer output
• .LOG files of Reading Tutor events

Page 5:

What to evaluate?

Usability (can kids use it?)
• 1993 Wizard of Oz experiments
• Lab and in-school user tests of successive versions

Assistiveness (do kids perform better with than without?)
• 1994 Reading Coach boosted comprehension by ~20%
• But: evaluation obtrusive, costly, sparse, subjective, noisy

Learning (do kids improve over time?)
• Within tutor: this talk
• On unassisted reading: pre-/post-test by school
• More than with alternatives: future studies

Page 6:

How should the Reading Tutor evaluate learning?

Evaluation should be
• Ecologically valid -- based on normal system use
• Authentic -- student chooses material
• Unobtrusive -- invisible to student
• Automatic -- objective, cheap
• Fast -- computable in real-time on PC
• Robust -- to student, recognizer, and tutor behavior
• Data-rich -- based on many observations
• Sensitive -- detect subtle effects

So estimate improvement in assisted performance

Page 7:

How to estimate performance?

Accuracy = % of text words matched by recognizer output
• Coarse-grained
• Sensitive to missed words
• Doesn't penalize requests for help

Inter-word latency = time interval between aligned text words
• Finer-grained
• Sensitive to hesitations, insertions
• Robust to many speech recognizer errors
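As a rough illustration of the two measures defined above, here is a minimal Python sketch. It assumes the alignment has already been done and that each text word carries the start and end time (in centiseconds) of the recognizer word matched to it, or `None` if nothing matched. The `AlignedWord` type, its field names, and the choice to measure latency from the end of the previous matched word are all assumptions made for this example, not the Reading Tutor's actual data format or definition.

```python
from typing import List, NamedTuple, Optional


class AlignedWord(NamedTuple):
    """One text word plus the time span of the recognizer word aligned to it."""
    text_word: str
    start_cs: Optional[int]  # None when no recognizer word matched (assumed convention)
    end_cs: Optional[int]


def accuracy(words: List[AlignedWord]) -> float:
    """Fraction of text words matched by recognizer output."""
    matched = sum(1 for w in words if w.start_cs is not None)
    return matched / len(words)


def inter_word_latencies(words: List[AlignedWord]) -> List[Optional[int]]:
    """Gap (cs) from the end of the previous matched word to the start of each matched word."""
    latencies: List[Optional[int]] = []
    prev_end: Optional[int] = None
    for w in words:
        if w.start_cs is None or prev_end is None:
            # Undefined for the first word and for unmatched words in this sketch.
            latencies.append(None)
        else:
            latencies.append(w.start_cs - prev_end)
        if w.end_cs is not None:
            prev_end = w.end_cs
    return latencies


# Made-up timings, for illustration only.
example = [
    AlignedWord("if", 0, 20),
    AlignedWord("the", 30, 45),
    AlignedWord("need", None, None),
    AlignedWord("help", 300, 340),
]
print(accuracy(example))
print(inter_word_latencies(example))
```

Because the gap is measured between matched text words, any hesitation or inserted speech between them lengthens the latency, which is the sensitivity the slide describes.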

Page 8:

Estimation of accuracy and latency

(Nov. 96 example from video)

Text:
If the computer thinks you need help, it talks to you.

Student said:
if the computer...takes your name...help it...take...s to you

Recognizer heard:
IF THE COMPUTER THINKS YOU IF THE HELP IT TO TO YOU

Tutor estimated 81% accuracy; inter-word latencies (cs):

If   the   computer   thinks   you   need   …help,   it   talks   ...to   you.
?    43    39         1        60    41     226      7    1       242     1
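The accuracy figure in this example can be approximated by aligning the text against the recognizer output and counting matched text words. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the slides do not say which alignment method the Reading Tutor uses, so this is an assumption, as is the `normalize` helper.

```python
from difflib import SequenceMatcher

text = "If the computer thinks you need help, it talks to you."
heard = "IF THE COMPUTER THINKS YOU IF THE HELP IT TO TO YOU"


def normalize(s):
    """Lowercase and strip punctuation so text and recognizer words compare equal."""
    return [w.strip(".,!?;:").lower() for w in s.split()]


text_words = normalize(text)
heard_words = normalize(heard)

# Stand-in aligner (assumption): matching blocks of identical words, in order.
matcher = SequenceMatcher(a=text_words, b=heard_words, autojunk=False)
matched = sum(block.size for block in matcher.get_matching_blocks())
accuracy = matched / len(text_words)
print(f"{matched} of {len(text_words)} text words matched: {accuracy:.1%}")
```

Under this alignment, 9 of the 11 text words are matched ("need" and "talks" are not), giving about 81.8%, which agrees with the slide's 81% if the percentage is truncated rather than rounded.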

Page 9:

Improvement in accuracy and latency

(same kid reads “help” in May 97)

Text:
When some kids jump rope, they help other people too.

Student said:
when some kids jump rope they help other people too

Recognizer heard:
WHEN SOME KIDS JUMP ROPE THEY HELP OTHER PEOPLE TOO

Tutor estimated 100% accuracy; inter-word latencies (cs):

When   some   kids   jump   rope,   they   help   other   people   too.
?      1      10     34     19      77     9      1       34       1

Page 10:

Which performance improvements count?

Echoing the sentence doesn't count.
• So look only at the first try.

Picking stories with easier words doesn't count.
• So look at changes on the same word.

Memorizing the story doesn't count.
• So look only at encounters of words in new contexts.

Remembering recent words doesn't count.
• So look only at the first time a word is seen that day.
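The four filters above can be sketched as a single pass over a chronological log of sentence readings. Everything in this Python example is hypothetical: the `Reading` record, its field names, the interpretation of "new context" as a sentence the student has not read before, and the sample data. It is meant only to show how the rules combine, not how the Reading Tutor's logs are actually structured.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Reading(NamedTuple):
    """One attempt at one sentence (all field names are hypothetical)."""
    student: str
    day: str                      # ISO date, so string order is time order
    sentence_id: str
    attempt: int                  # 1 for the first try
    words: List[Tuple[str, int]]  # (text word, latency in centiseconds)


def countable_encounters(readings: Iterable[Reading]) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (student, day, word, latency) for encounters passing all filters.

    Readings must be supplied in chronological order.
    """
    seen_sentences = set()  # (student, sentence_id) already read at least once
    seen_today = set()      # (student, day, word) already encountered that day
    for r in readings:
        new_context = (r.student, r.sentence_id) not in seen_sentences
        eligible = r.attempt == 1 and new_context  # first try, unseen sentence
        for word, latency in r.words:
            key = (r.student, r.day, word)
            if eligible and key not in seen_today:  # first time seen that day
                yield (r.student, r.day, word, latency)
            seen_today.add(key)
        seen_sentences.add((r.student, r.sentence_id))


def first_vs_last(encounters):
    """Map (student, word) to (first-day, last-day) latency when seen on 2+ days."""
    by_word = defaultdict(list)
    for student, day, word, latency in encounters:
        by_word[(student, word)].append((day, latency))
    return {
        key: (obs[0][1], obs[-1][1])
        for key, obs in by_word.items()
        if obs[0][0] != obs[-1][0]
    }


# Made-up log for one hypothetical student.
readings = [
    Reading("kid_a", "1996-11-07", "s1", 1, [("help", 226), ("it", 7)]),
    Reading("kid_a", "1996-11-07", "s1", 2, [("help", 50), ("it", 5)]),  # echo
    Reading("kid_a", "1996-11-07", "s2", 1, [("help", 40)]),  # already seen today
    Reading("kid_a", "1997-05-01", "s3", 1, [("help", 9)]),   # counts
    Reading("kid_a", "1997-05-02", "s1", 1, [("help", 5)]),   # old sentence
]
result = first_vs_last(countable_encounters(readings))
print(result)
```

Comparing the same word on its first and last qualifying day is what implements "changes on the same word"; words that qualify on only one day, such as "it" here, drop out of the comparison.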

Page 11:

Accuracy increased 16% on same word from first to last day seen in new context

[Chart: vertical axis 50% to 90%; horizontal axis labels mjt, mtw, mmd, mrt, mdc, mgt, mcr, fbw]

Page 12:

Latency decreased 35% on same word from first to last day read in new context

[Chart: vertical axis 0 cs to 100 cs; horizontal axis labels mjt, mtw, mmd, mrt, mdc, mgt, mcr, fbw]

Page 13:

Is accuracy and latency estimation...

Ecologically valid? Reading Tutor used in school

Authentic? kids choose stories

Unobtrusive? evaluate assisted reading invisibly

Automatic? align recognizer output against text

Fast? real-time on Pentium

Robust? to much student, recognizer, and tutor behavior

Data-rich? 10498 utterances, 139133 aligned words

Sensitive? detects significant but subtle effects (< 0.1 sec)

Page 14:

Conclusion

Does the Reading Tutor help?
• Yes, with assisted reading
• Transfers to unassisted reading!

Research questions:
• Who benefits how much, when, and why?
• How should we improve the Tutor?

For more information:
• http://www.cs.cmu.edu/~listen