
Automatic Scoring of Children's Read-Aloud Text Passages and Word Lists

Klaus Zechner, John Sabatini and Lei Chen

Educational Testing Service

Confidential and Proprietary. Copyright © 2008 Educational Testing Service.


Motivation (1)

• Reading evaluations of middle school population gaining importance

• Traditional evaluation of reading: off-line, post-hoc answering of questions on passage

• Oral reading assessment: on-line, can capture additional dimensions such as fluency, pronunciation etc.

• High correlation between oral reading performance measures and traditional reading proficiency measures



Motivation (2)

• Goal: read-aloud assessment with fully automatic means

• Using automatic speech recognition (ASR)

• Corpus of text passages and word lists

• Correlations between automatic and manual performance measure CWPM (“correct words per minute”)



CWPM

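• Main read-aloud proficiency measure in this paper: “correctly read words per minute” (CWPM)

$$\mathrm{CWPM} = \frac{W - S - D}{T}$$

• W: number of words, S: substitutions, D: deletions, T: reading time in minutes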
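The measure can be sketched in a few lines of Python, assuming the formula reads CWPM = (W - S - D) / T with T in minutes. The function name, the seconds-based argument, and the numbers in the example are illustrative assumptions, not taken from the slides.

```python
def cwpm(num_words: int, substitutions: int, deletions: int,
         reading_time_sec: float) -> float:
    """Correct words per minute: (W - S - D) / T, with T converted to minutes."""
    if reading_time_sec <= 0:
        raise ValueError("reading time must be positive")
    correct_words = num_words - substitutions - deletions
    return correct_words / (reading_time_sec / 60.0)


# Hypothetical reading: 120 words, 4 substitutions, 6 deletions, 75 seconds.
example_score = cwpm(120, 4, 6, 75.0)
```

Insertions do not appear in the computation, in line with the annotation scheme, where inserted words are recorded but do not affect CWPM.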



Related work

• Project LISTEN (Mostow et al., 1994ff.): a reading tutor for children that listens

• Project TBALL (Alwan, 2007): assessment of children’s language skills



Challenges in Recognizing Children's Speech

• Variations in acoustics
– Sentence duration decreases almost linearly between ages 7 and 14
– Higher fundamental frequency for children

• Variations in syntax
– Children tend to ignore sentence boundaries or pause at positions in the text where no pause is warranted

Consequently:
- Training of specific acoustic and language models



Data sets

• Passages:
– Training: 600+ passages (3 different texts)
– Evaluation: 101 passages

• Word lists:
– Training: 500+ word lists
– Evaluation: 42 word lists



Annotation of reading errors

• Annotators listen to read-aloud recordings

• Enter reading errors into spreadsheet (1 word per line)

• Main annotation types: deletions and substitutions of words

• Further: insertions of words (not relevant in CWPM formula)



ASR training and word accuracy

1. Acoustic Model (AM):

- combined ETS data with OGI and CMU Kids data

2. Language Model (LM):

- Passages: interpolated LM, strongly biased to transcribed passages

- Word lists: uniform LM due to difficulty of automatically locating words in signal (noises)

Word Accuracy (on unseen test data):

- Passages: 72%; Word lists: 50%
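A toy sketch of what an interpolated LM biased toward the passage text could look like. This is a simplification under stated assumptions: it uses unigram probabilities only, and the weight of 0.9, the function names, and the example sentences are all hypothetical. The slides do not give the actual model order or interpolation weights.

```python
from collections import Counter


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities for a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(passage_lm, background_lm, passage_weight=0.9):
    """Linear interpolation: weight * P_passage(w) + (1 - weight) * P_background(w)."""
    vocab = set(passage_lm) | set(background_lm)
    return {
        word: passage_weight * passage_lm.get(word, 0.0)
        + (1.0 - passage_weight) * background_lm.get(word, 0.0)
        for word in vocab
    }


# Hypothetical data: the passage a child is asked to read, plus generic text.
passage_lm = unigram_probs("the fox ran".split())
background_lm = unigram_probs("the dog sat on the mat".split())
mixed_lm = interpolate(passage_lm, background_lm)
```

With a high passage weight, words from the expected text dominate, while off-text words keep a small nonzero probability so that misreadings can still be recognized.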



Computing CWPM

• We don’t know the “true” number of deletions and substitutions, so we estimate them by comparing the recognizer’s output with the true reference passage (using NIST’s sclite package)

• Pearson Correlations: r=0.86 (passages), r=0.80 (word lists; we do not use the reading time here, i.e. “cw” instead of “cwpm”)

• Spearman Rank Correlations: r>=0.7
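The estimation step can be illustrated with a plain minimum-edit-distance word alignment. This is a stand-in for what sclite does, not its actual implementation or interface. The function name and the example sentences are assumptions for illustration only.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) from a word-level
    minimum-edit-distance alignment of hypothesis against reference."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimum edits to turn reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,  # match or substitution
                             cost[i - 1][j] + 1,             # deletion
                             cost[i][j - 1] + 1)             # insertion
    # Walk back through the table to count each edit type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


# Hypothetical reference text and recognizer output.
reference = "the boy asks for a red ball".split()
recognized = "the boy ask for red ball".split()
subs, dels, ins = align_counts(reference, recognized)
correct_words = len(reference) - subs - dels  # "cw", as used for word lists
```

Dividing `correct_words` by the reading time in minutes would give the passage-level CWPM; for word lists the count is used without the time normalization.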



Cohort prediction experiment

• Selected all 27 speakers from passage evaluation set who read all 3 passages

• Placed them into 3 equal-sized cohorts based on manually determined CWPM measure

• Predicted rank of all speakers by ASR and automatic CWPM computation

• Result: All 27 speakers placed in correct cohort (within-cohort rankings differ from human rankings)
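A minimal sketch of the cohort comparison, assuming each speaker has one manual and one automatic CWPM value and that cohorts are equal-sized rank bands. The function name, speaker ids, and scores below are made up for illustration; the slides do not state how scores across the 3 passages were combined.

```python
def assign_cohorts(scores, num_cohorts=3):
    """Map speaker id -> cohort index (0 = lowest scores), splitting the
    score-ranked speakers into equal-sized bands."""
    if len(scores) < num_cohorts:
        raise ValueError("need at least one speaker per cohort")
    ranked = sorted(scores, key=scores.get)
    band_size = len(ranked) // num_cohorts
    return {speaker: min(rank // band_size, num_cohorts - 1)
            for rank, speaker in enumerate(ranked)}


# Hypothetical CWPM values for six speakers.
manual = {"s1": 55, "s2": 62, "s3": 90, "s4": 97, "s5": 130, "s6": 141}
automatic = {"s1": 58, "s2": 51, "s3": 95, "s4": 88, "s5": 139, "s6": 128}

manual_cohorts = assign_cohorts(manual)
automatic_cohorts = assign_cohorts(automatic)
agreement = sum(manual_cohorts[s] == automatic_cohorts[s]
                for s in manual) / len(manual)
```

In this toy data the automatic scores swap the order of speakers inside each cohort, yet every speaker still lands in the same cohort as under the manual scores, which mirrors the kind of result reported above.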



Children’s typical reading errors

• Passages (substitutions):
– mostly morphological variants, e.g., asks ==> ask
– wrong determiners or prepositions

• Word lists (substitutions):
– many part-of-speech errors (e.g., nature ==> natural, equality ==> equally)



Typical speech recognition errors

• Passages (substitutions):
– Also some morphological variants, but more closed-class word errors (e.g., determiners, conjunctions, prepositions)

• Word lists (substitutions):
– Mix of morphological variants, POS-cognates, and sound-related (e.g., simple ==> example)



Matched errors (S+D)

Recall of students’ errors by ASR system:

• Passages: 47.7%

• Word lists: 16.8%
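One way such a recall figure could be computed is sketched below. It assumes that an error counts as matched when the ASR-based alignment flags a substitution or deletion at the same reference word position as the human annotation; the slides do not spell out the matching criterion, and the function name and positions are hypothetical.

```python
def error_recall(human_error_positions, asr_error_positions):
    """Fraction of human-annotated error positions (S+D) that the ASR-based
    alignment also flags as errors."""
    human = set(human_error_positions)
    if not human:
        raise ValueError("no human-annotated errors to recall")
    return len(human & set(asr_error_positions)) / len(human)


# Hypothetical reference-word indices marked as substituted or deleted.
human_errors = {3, 10, 17, 25}
asr_errors = {3, 8, 17, 30, 41}
recall = error_recall(human_errors, asr_errors)
```

Recall alone ignores positions the ASR flags that annotators did not, so it complements the CWPM correlations rather than replacing them.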



Summary

• Showed feasibility of automatically scoring children’s read-aloud speech; word lists harder than passages

• High correlation of predicted CWPM with human CWPM (Pearson r>=0.8, Spearman r>=0.7)

• Very successful cohort assignment experiment

• Major ASR problems with word lists due to audio quality issues



Future work

• Improve accuracy of ASR system (e.g., more data for AM and LM training)

• Add new passages and word lists to corpus

• Better recording conditions needed, particularly for word lists (e.g., pre-recording sound calibration, on-line monitoring, storing of time stamps when words are presented on screen)

• Investigate using fluency-related and other features from ASR output for improved CWPM prediction
