Automatic Scoring of Children's Read-Aloud Text Passages and
Word Lists
Klaus Zechner, John Sabatini and Lei Chen
Educational Testing Service
Confidential and Proprietary. Copyright © 2008 Educational Testing Service.
Motivation (1)
• Reading evaluations of middle school population gaining importance
• Traditional evaluation of reading: off-line, post-hoc answering of questions on passage
• Oral reading assessment: on-line, can capture additional dimensions such as fluency, pronunciation etc.
• High correlation between oral reading performance measures and traditional reading proficiency measures
Motivation (2)
• Goal: read-aloud assessment with fully automatic means
• Using automatic speech recognition (ASR)
• Corpus of text passages and word lists
• Correlations between automatic and manual performance measure CWPM (“correct words per minute”)
CWPM
• Main read-aloud proficiency measure in this paper:
“correctly read words per minute”
$$\mathrm{CWPM} = \frac{W - S - D}{T}$$
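As a minimal sketch of the measure, the function below assumes the usual reading of the symbols: W is the number of words in the text, S and D are the counts of substituted and deleted words, and T is the reading time in minutes. The function name, the parameter names, and the choice to pass the time in seconds are illustrative assumptions, not taken from the slides.

```python
def cwpm(words_in_text: int, substitutions: int, deletions: int,
         reading_time_seconds: float) -> float:
    """Correct words per minute: (W - S - D) / T, with T in minutes."""
    if reading_time_seconds <= 0:
        raise ValueError("reading time must be positive")
    # Insertions are deliberately ignored; only S and D reduce the count.
    correct_words = words_in_text - substitutions - deletions
    return correct_words / (reading_time_seconds / 60.0)


# Made-up example: a 150-word passage read in 75 seconds with 4 + 6 errors.
print(cwpm(150, 4, 6, 75.0))  # 112.0
```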
Related work
• Project LISTEN (Mostow et al., 1994ff.): a reading tutor for children that listens
• Project TBALL (Alwan, 2007): assessment of children’s language skills
Challenges in Recognizing Children's Speech
• Variations in acoustics
– Sentence duration decreases almost linearly between age 7 and 14
– Higher fundamental frequency for children
• Variations in syntax
– Children tend to ignore sentence boundaries or pause at positions in the text where no pause is warranted
Consequently:
- Training of specific acoustic and language models
Data sets
• Passages:
– Training: 600+ passages (3 different texts)
– Evaluation: 101 passages
• Word lists:
– Training: 500+ word lists
– Evaluation: 42 word lists
Annotation of reading errors
• Annotators listen to read-aloud recordings
• Enter reading errors into spreadsheet (1 word per line)
• Main annotation types: deletions and substitutions of words
• Further: insertions of words (not relevant in CWPM formula)
ASR training and word accuracy
1. Acoustic Model (AM):
- combined ETS data with OGI and CMU Kids data
2. Language Model (LM):
- Passages: interpolated LM, strongly biased to transcribed passages
- Word lists: uniform LM due to difficulty of automatically locating words in signal (noises)
Word Accuracy (on unseen test data):
- Passages: 72%; Word lists: 50%
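The two language-model choices above can be illustrated with a toy sketch. It assumes simple unigram models and a linear interpolation weight of 0.9 toward the passage text; the slides give neither the model order nor the weight, so both are placeholders, and all function names and the example sentences are hypothetical.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_passage, p_general, weight):
    """Linear interpolation: weight * passage model + (1 - weight) * general model."""
    vocab = set(p_passage) | set(p_general)
    return {
        word: weight * p_passage.get(word, 0.0)
        + (1.0 - weight) * p_general.get(word, 0.0)
        for word in vocab
    }


def uniform_model(word_list):
    """Every distinct word on the list is equally likely."""
    vocab = set(word_list)
    return {word: 1.0 / len(vocab) for word in vocab}


# Toy data, invented for illustration only.
passage = "the cat sat on the mat".split()
general = "a dog and a cat ran to the park".split()
lm = interpolate(unigram_model(passage), unigram_model(general), weight=0.9)
word_list_lm = uniform_model(["nature", "equality", "simple"])
```

A weight close to 1 keeps the recognizer strongly biased toward the words of the passage being read, while the general component leaves some probability for words outside it.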
Computing CWPM
• We don’t know the “true” number of deletions and substitutions, so we estimate them by comparing the recognizer’s output with the reference passage (using NIST’s sclite package)
• Pearson Correlations: r=0.86 (passages), r=0.80 (word lists; we do not use the reading time here, i.e. “cw” instead of “cwpm”)
• Spearman Rank Correlations: r>=0.7
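The estimation step can be sketched as a word-level alignment between the reference text and the recognizer output. The code below is a stand-in for illustration, not sclite itself: it assumes a plain Levenshtein alignment with unit costs, which may differ from the weighting sclite applies, and the function name and example sentences are made up.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) from a
    minimum-edit-distance alignment of two word lists."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion (word missing in hypothesis)
                cost[i][j - 1] + 1,             # insertion (extra word in hypothesis)
            )
    # Walk back from the end of both sequences and count each edit type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


reference = "the boy asks for a red ball".split()
hypothesis = "the boy ask for red ball".split()
print(align_counts(reference, hypothesis))  # (1, 1, 0)
```

The resulting S and D counts, together with the passage length and the reading time, are the inputs needed for an automatic CWPM estimate.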
Cohort prediction experiment
• Selected all 27 speakers from passage evaluation set who read all 3 passages
• Placed them into 3 equal-sized cohorts based on manually determined CWPM measure
• Predicted rank of all speakers by ASR and automatic CWPM computation
• Result: All 27 speakers placed in correct cohort (within-cohort rankings differ from human rankings)
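A small sketch of how such a cohort comparison could be set up, assuming cohorts are formed purely by rank order of each speaker's CWPM. The six speakers and their scores are invented for illustration, and the function names are hypothetical.

```python
def assign_cohorts(scores, n_cohorts=3):
    """Map each speaker to a cohort index (0 = lowest scores),
    splitting the rank order into equal-sized groups."""
    if len(scores) < n_cohorts:
        raise ValueError("need at least one speaker per cohort")
    ranked = sorted(scores, key=scores.get)
    size = len(ranked) // n_cohorts
    # Any leftover speakers fall into the top cohort.
    return {
        speaker: min(rank // size, n_cohorts - 1)
        for rank, speaker in enumerate(ranked)
    }


def cohort_agreement(human, automatic, n_cohorts=3):
    """Fraction of speakers placed in the same cohort by both score sets."""
    by_human = assign_cohorts(human, n_cohorts)
    by_asr = assign_cohorts(automatic, n_cohorts)
    return sum(by_human[s] == by_asr[s] for s in by_human) / len(by_human)


# Invented CWPM values for six speakers.
human = {"s1": 60, "s2": 75, "s3": 90, "s4": 105, "s5": 120, "s6": 140}
automatic = {"s1": 58, "s2": 50, "s3": 95, "s4": 88, "s5": 130, "s6": 118}
print(cohort_agreement(human, automatic))  # 1.0
```

In this toy data every speaker lands in the same cohort under both score sets even though the order inside each cohort differs, which mirrors the kind of outcome reported above.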
Children’s typical reading errors
• Passages (substitutions):
– mostly morphological variants, e.g., asks ==> ask
– wrong determiners or prepositions
• Word lists (substitutions):
– many part-of-speech errors (e.g., nature ==> natural, equality ==> equally)
Typical speech recognition errors
• Passages (substitutions):
– Also some morphological variants, but more closed class word errors (e.g., determiners, conjunctions, prepositions)
• Word lists (substitutions):
– Mix of morphological variants, POS-cognates, and sound-related (e.g., simple ==> example)
Matched errors (S+D)
Recall of students’ errors by ASR system
Passages 47.7%
Word lists 16.8%
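Recall here is the share of human-annotated reading errors that the ASR-based comparison also flags. The sketch below assumes an error is "matched" when both sources mark the same word position in the reference text; the slides do not spell out the matching criterion, so this is one plausible reading, and the positions are invented.

```python
def error_recall(human_error_positions, asr_error_positions):
    """Fraction of human-annotated error positions also flagged by the ASR system."""
    human = set(human_error_positions)
    if not human:
        raise ValueError("no human-annotated errors to recall")
    matched = human & set(asr_error_positions)
    return len(matched) / len(human)


# Invented word positions marked as substitution or deletion errors.
human_errors = {3, 7, 12, 20}
asr_errors = {3, 9, 12, 15, 21}
print(error_recall(human_errors, asr_errors))  # 0.5
```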
Summary
• Showed feasibility of automatically scoring children’s read-aloud speech; word lists harder than passages
• High correlation of predicted CWPM with human CWPM (Pearson r>=0.8, Spearman r>=0.7)
• Very successful cohort assignment experiment
• Major ASR problems with word lists due to audio quality issues
Future work
• Improve accuracy of ASR system (e.g., more data for AM and LM training)
• Add new passages and word lists to corpus
• Better recording conditions needed, particularly for word lists (e.g., pre-recording sound calibration, on-line monitoring, storing of time stamps when words are presented on screen)
• Investigate using fluency-related and other features from ASR output for improved CWPM prediction