Catia Cucchiarini
Quantitative assessment of second language learners’ fluency in read and spontaneous speech
Radboud University Nijmegen
Context
Research on automatic assessment of oral proficiency in Dutch as a second language
Fluency
• Important construct in evaluation of second language proficiency
• Also relevant for pathological speech
Fluency
• Frequently applied notion, but not clearly defined
• Various interpretations
  • Overall language proficiency
  • Oral command of a language
  • Temporal aspect of oral proficiency
Two experiments
• Exp1: read speech
• Exp2: spontaneous speech
• Human fluency judgements
related to
• Objective temporal measures (CSR)
Aim of these experiments
To explore the relationship between objective properties of speech and perceived fluency in read and spontaneous speech, with a view to determining whether such quantitative measures can be used to develop objective fluency tests.
Method: speakers
Exp1: 60 non-native speakers
• 3 proficiency levels
  • beginner (PL1)
  • intermediate (PL2)
  • advanced (PL3)
• different mother tongues
• different gender
Method: speakers
Exp2: 60 non-native speakers
• 2 proficiency levels
  • beginner level (BL)
  • intermediate level (IL)
• different mother tongues
• different gender
Method: speech material
Exp1:
• 2 sets of 5 phonetically rich sentences
• read over the telephone
Method: speech material
Exp2:
existing test of Dutch as a second language (DSL) → Profieltoets
• 8 items from BL version
  • short tasks, 15 s to answer
  • candidates can answer immediately
• 8 items from IL version
  • long tasks, 30 s to answer
  • candidates have to reflect to provide motivations
Method: raters
Exp1:
• 3 phoneticians (PH)
• 3 speech therapists (ST1)
• 3 speech therapists (ST2)
Exp2:
• 5 DSL teachers for BL (RBL)
• 5 DSL teachers for IL (RIL)
Method: automatic scoring
• Speech orthographically transcribed
• CSR: 38 monophones + lexicon
• Viterbi alignment of speech signals and orthographic transcriptions
• Segmentation at phone level
Method: some definitions
• silent pause: a stretch of silence of no less than 200 ms
• dur1 = duration speech without pauses (s)
• dur2 = duration speech with pauses (s)
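The definitions above can be sketched in code. This is a minimal illustration, not the procedure used in the experiments: the segment layout `(label, start_ms, end_ms)`, the silence label `"sil"`, and the decision to discard leading and trailing silence are all assumptions made for the example. Silences shorter than 200 ms are not counted as pauses and therefore stay inside dur1.

```python
from typing import List, Tuple

# Hypothetical segment layout: (label, start_ms, end_ms), in time order.
Segment = Tuple[str, int, int]

SILENCE = "sil"      # assumed label for silence in the alignment output
MIN_PAUSE_MS = 200   # threshold from the definition of a silent pause


def trim_silence(segments: List[Segment]) -> List[Segment]:
    """Drop leading and trailing silence (an assumption of this sketch)."""
    speech = [i for i, (label, _, _) in enumerate(segments) if label != SILENCE]
    if not speech:
        return []
    return segments[speech[0]: speech[-1] + 1]


def silent_pause_durations(segments: List[Segment]) -> List[float]:
    """Durations (s) of utterance-internal silences of at least 200 ms."""
    return [
        (end - start) / 1000
        for label, start, end in trim_silence(segments)
        if label == SILENCE and end - start >= MIN_PAUSE_MS
    ]


def durations(segments: List[Segment]) -> Tuple[float, float]:
    """Return (dur1, dur2): speech time without and with silent pauses (s)."""
    trimmed = trim_silence(segments)
    if not trimmed:
        return 0.0, 0.0
    dur2 = (trimmed[-1][2] - trimmed[0][1]) / 1000
    dur1 = dur2 - sum(silent_pause_durations(segments))
    return dur1, dur2
```

Working in integer milliseconds keeps the 200 ms comparison exact; converting to seconds happens only at the end.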
Method: objective measures
Primary variables
• art = # phones / dur1
• ros = # phones / dur2
• ptr = 100% * dur1 / dur2
• mlr = mean # phones between 2 pauses
• mlp = mean length silent pauses
• dsp = tot. dur. sil. pauses / (dur2 / 60)
• # sp = # sil. pauses / (dur2 / 60)
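As a rough sketch of how the primary variables follow from these formulas, the function below takes hypothetical inputs: `run_lengths` (number of phones in each stretch of speech between pauses), `pause_durs` (silent pause durations in seconds) and `dur1`. The names, the input layout, and the choice to derive dur2 as dur1 plus total pause time are assumptions for illustration only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class PrimaryMeasures:
    art: float   # phones per second, pauses excluded
    ros: float   # phones per second, pauses included
    ptr: float   # percentage of time spent speaking
    mlr: float   # mean number of phones between two pauses
    mlp: float   # mean silent pause length (s)
    dsp: float   # seconds of silent pause per minute
    n_sp: float  # silent pauses per minute


def primary_measures(run_lengths: List[int],
                     pause_durs: List[float],
                     dur1: float) -> PrimaryMeasures:
    """Compute the primary variables from hypothetical per-speaker inputs."""
    n_phones = sum(run_lengths)
    dur2 = dur1 + sum(pause_durs)   # assumption: dur2 = dur1 + pause time
    minutes = dur2 / 60
    return PrimaryMeasures(
        art=n_phones / dur1,
        ros=n_phones / dur2,
        ptr=100 * dur1 / dur2,
        mlr=mean(run_lengths),
        mlp=mean(pause_durs) if pause_durs else 0.0,
        dsp=sum(pause_durs) / minutes,
        n_sp=len(pause_durs) / minutes,
    )
```

The same per-minute normalisation, count / (dur2 / 60), applies to any other event counted per speaker.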
Method: objective measures
Secondary variables
• # fp = # filled pauses / (dur2 / 60)
• # disf = # disfluencies / (dur2 / 60)
Method: fluency ratings
• Sentences scored on fluency on the basis of a ten-point scale
• Raters received no special training
Method: rating procedure
• Exp1: each group of raters judged speakers of different proficiency levels
• Exp2: each group of raters judged speakers of the same proficiency level
Results: reliability
raters inter-rater reliability Cronbach’s alpha
PH 0.96
ST1 0.88
ST2 0.83
RBL 0.86
RIL 0.82
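For reference, Cronbach's alpha for a group of raters can be computed by treating each rater as an "item" and each speaker as a case. The sketch below uses the standard formula alpha = k/(k-1) * (1 - sum of per-rater variances / variance of summed scores); the function name and the data layout `ratings[rater][speaker]` are assumptions, and no scores from the experiments are reproduced here.

```python
from statistics import variance
from typing import List


def cronbach_alpha(ratings: List[List[float]]) -> float:
    """Cronbach's alpha, with ratings[r][s] = score of rater r for speaker s."""
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each rater must score the same (>= 2) speakers")
    # Sum of the per-rater (sample) variances across speakers.
    item_var = sum(variance(r) for r in ratings)
    # Variance of the per-speaker total over all raters.
    totals = [sum(col) for col in zip(*ratings)]
    return k / (k - 1) * (1 - item_var / variance(totals))
```

Raters who rank the speakers identically yield an alpha of 1; disagreement lowers the value.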
Results: raw fluency ratings
        read speech (rs)                            spontaneous speech (ss)
        PL1 [10]   PL2 [27]   PL3 [23]   all-rs     BL [28]    IL [29]    all-ss
mean    4.65       5.00       7.36       5.85       5.64       4.80       5.21
s.d.    2.01       1.81       0.95       1.96       0.88       1.06       1.06
Results: objective measures
        read speech (rs)                                        spontaneous speech (ss)
col. 1  2             3             4             5             6             7             8             9
        PL1 [10]      PL2 [27]      PL3 [23]      all-rs        BL [28]       IL [29]       all-ss        ss/rs
art     10.87         11.15 (1.38)  12.47 (0.82)  11.61 (1.37)  12.25 (1.25)  11.85 (0.81)  12.04 (1.06)  1.0
ros     8.54 (1.88)   8.95 (1.87)   11.03 (1.16)  9.68 (1.94)   5.99 (0.96)   5.31 (1.17)   5.65 (1.12)   0.6
ptr     77.97 (7.69)  79.62 (8.68)  88.28 (5.42)  82.66 (8.57)  49.33 (8.71)  44.92 (9.51)  47.09 (9.32)  0.6
mlr     16.51 (7.67)  18.10 (7.44)  27.73 (7.13)  21.5 (8.77)   9.50 (2.22)   9.33 (2.27)   9.41 (2.23)   0.4
mlp     0.40 (0.08)   0.40 (0.12)   0.34 (0.16)   0.38 (0.13)   0.92 (0.20)   1.02 (0.28)   0.97 (0.25)   2.6
dsp     9.29 (4.48)   8.67 (5.15)   3.97 (2.96)   6.97 (4.87)   27.90 (5.52)  31.02 (6.04)  29.49 (5.95)  4.2
#sp     22.33 (8.45)  20.11 (9.45)  10.18 (6.45)  16.67 (9.65)  31.00 (5.56)  31.41 (4.77)  31.21 (5.13)  1.9
#fp     0.31 (0.50)   0.35 (0.94)   0.32 (0.77)   0.33 (0.81)   10.83 (8.24)  10.55 (7.84)  10.69 (7.97)  32.4
#disf   1.82 (2.20)   1.78 (1.75)   1.04 (1.53)   1.50 (1.76)   2.39 (1.82)   2.19 (2.27)   2.29 (2.04)   1.5
Results: disfluencies
• Repetitions: exact repetitions of words
• Repairs: corrections
• Restarts: repetitions of the initial parts of words
Results: disfluencies
                    repetitions  repairs  restarts
read speech         12%          51%      37%
spontaneous speech  65%          23%      12%
Results: correlations
        read speech   spontaneous speech
        all-RS [60]   RBL [28]    RIL [29]
art     0.83 **       0.07        0.05
ros     0.92 **       0.57 **     0.39 *
ptr     0.86 **       0.46 **     0.39 *
mlr     0.85 **       0.49 **     0.65 **
mlp     -0.53 **      -0.08       -0.01
dsp     -0.84 **      -0.45 **    -0.40 *
# sp    -0.84 **      -0.33 *     -0.49 **
# fp    -0.25         -0.21       -0.21
#disf   -0.15         -0.07       -0.27
Results: correlations
        PL1 [10]    PL2 [27]    PL3 [23]
art     0.85 **     0.76 **     0.66 **
ros     0.92 **     0.91 **     0.73 **
ptr     0.82 **     0.86 **     0.58 **
mlr     0.91 **     0.86 **     0.57 **
mlp     -0.50       -0.68 **    -0.50 **
dsp     -0.71 *     -0.85 **    -0.61 **
# sp    -0.83 **    -0.83 **    -0.57 **
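The slides do not state which correlation coefficient was used. Assuming a Pearson product-moment correlation between each objective measure and the mean fluency rating per speaker, the computation could look like the sketch below; the function name and the input layout are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

Here `x` would hold one value of a measure (for example ros) per speaker and `y` the corresponding mean fluency rating; a negative result, as for the pause measures, means that higher values go with lower perceived fluency.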
Discussion
• Reliable fluency scoring is possible
• Fluency scores related to task performed
• Role of objective variables in rs and ss
  • similarities: weak relation sec. var. / fluency
  • differences: varying roles prim. var.
Discussion
Read speech:
• strong relation: art, ros, ptr, #sp, dsp, mlr
• weaker relation: mlp
For perceived fluency, pause frequency is more important than pause length
Two factors are important for fluency in rs:
• articulation rate
• pause frequency
Discussion
Spontaneous speech:
• strong relation: ros, ptr, #sp, dsp, mlr
• weaker relation: art, mlp
The higher frequency of pauses possibly overshadows the importance of art
Fluency in ss is particularly related to variables that contain information on pause frequency
Conclusions
• Reliable fluency scoring by human raters is possible
• Objective fluency scoring is possible
• Fluency scores vary with speech type
• Fluency scores vary with task performed
Conclusions
• Read speech: fluency scores strongly related to art and pause frequency
• Spontaneous speech: fluency scores strongly related to pause frequency and distribution
• Expert fluency ratings can be predicted more accurately on the basis of objective measures in rs than in ss
Conclusions
• Temporal measures of fluency may be used to develop objective fluency tests
• Selection of variables to be employed should be dependent on material investigated and task performed