
Catia Cucchiarini

Quantitative assessment of second language learners’ fluency in read and spontaneous speech

Radboud University Nijmegen

Context

Research on automatic assessment of oral proficiency in Dutch as a second language

Fluency

• Important construct in evaluation of second language proficiency

• Also relevant for pathological speech

Fluency

• Frequently applied notion, but not clearly defined

• Various interpretations:
  • overall language proficiency
  • oral command of a language
  • temporal aspect of oral proficiency

Two experiments

• Exp1: read speech

• Exp2: spontaneous speech

• Human fluency judgements

related to

• Objective temporal measures computed with a continuous speech recognizer (CSR)

Aim of these experiments

To explore the relationship between objective properties of speech and perceived fluency in read and spontaneous speech, with a view to determining whether such quantitative measures can be used to develop objective fluency tests.

Method: speakers

Exp1: 60 non-native speakers

• 3 proficiency levels
  • beginner (PL1)
  • intermediate (PL2)
  • advanced (PL3)

• different mother tongues

• both genders

Method: speakers

Exp2: 60 non-native speakers

• 2 proficiency levels
  • beginner level (BL)
  • intermediate level (IL)

• different mother tongues

• both genders

Method: speech material

Exp1:

• 2 sets of 5 phonetically rich sentences

• read over the telephone

Method: speech material

Exp2:

existing test of Dutch as a second language (DSL): the Profieltoets

• 8 items from the BL version
  • short tasks, 15 s to answer
  • candidates can answer immediately

• 8 items from the IL version
  • long tasks, 30 s to answer
  • candidates have to reflect to provide motivations

Method: raters

Exp1:

• 3 phoneticians (PH)

• 3 speech therapists (ST1)

• 3 speech therapists (ST2)

Exp2:

• 5 DSL teachers for BL (RBL)

• 5 DSL teachers for IL (RIL)

Method: automatic scoring

• Speech orthographically transcribed

• CSR: 38 monophones + lexicon

• Viterbi alignment of speech signals and orthographic transcriptions

• Segmentation at phone level
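A toy version of the alignment step can make the idea concrete. This is a minimal sketch, not the recognizer used in the study: it assumes a hypothetical matrix `loglik`, where `loglik[t][k]` is the log-likelihood of frame `t` under the `k`-th phone of the orthographic transcription (in a real system these scores would come from the monophone acoustic models and the lexicon), and a strict left-to-right path in which every phone covers at least one frame. Viterbi dynamic programming then picks the phone boundaries with the highest total score.

```python
import math
from typing import List, Tuple


def force_align(loglik: List[List[float]]) -> List[Tuple[int, int]]:
    """Viterbi forced alignment of a known phone sequence to frames.

    loglik[t][k]: hypothetical log-likelihood of frame t given phone k of
    the transcription. Returns (start_frame, end_frame_exclusive) per phone.
    """
    n_frames, n_phones = len(loglik), len(loglik[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phone")
    neg_inf = -math.inf
    # best[t][k]: best score for frames 0..t with frame t assigned to phone k
    best = [[neg_inf] * n_phones for _ in range(n_frames)]
    # came_from[t][k]: phone assigned to frame t-1 on that best path
    came_from = [[0] * n_phones for _ in range(n_frames)]
    best[0][0] = loglik[0][0]  # the path must start in the first phone
    for t in range(1, n_frames):
        for k in range(n_phones):
            stay = best[t - 1][k]                               # remain in phone k
            advance = best[t - 1][k - 1] if k > 0 else neg_inf  # enter phone k
            if stay >= advance:
                best[t][k], came_from[t][k] = stay, k
            else:
                best[t][k], came_from[t][k] = advance, k - 1
            best[t][k] += loglik[t][k]
    # Backtrace from the last frame, which must belong to the last phone.
    labels = [0] * n_frames
    k = n_phones - 1
    for t in range(n_frames - 1, -1, -1):
        labels[t] = k
        k = came_from[t][k]
    # Collapse the frame labels into one contiguous segment per phone.
    segments = []
    start = 0
    for t in range(1, n_frames + 1):
        if t == n_frames or labels[t] != labels[t - 1]:
            segments.append((start, t))
            start = t
    return segments
```

For example, with six frames and three phones where frames 0-1 favour the first phone, frames 2-4 the second and frame 5 the third, `force_align` returns the segments `(0, 2)`, `(2, 5)` and `(5, 6)`. Multiplying frame indices by the frame shift gives the phone boundaries in seconds.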

Method: some definitions

• silent pause: a stretch of silence of no less than 200 ms

• dur1 = total duration of speech excluding silent pauses (s)

• dur2 = total duration of speech including silent pauses (s)
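Given a phone-level segmentation, these definitions can be applied mechanically. A minimal sketch, assuming a hypothetical segment format of `(label, start_ms, end_ms)` tuples in which silences carry the label `"sil"` and adjacent silences are already merged. Two further assumptions, not stated on the slides: silences shorter than 200 ms are counted as part of the speech (so they stay inside dur1), and leading and trailing silence is excluded from dur2. Times are kept in integer milliseconds so that the 200 ms threshold is not affected by floating-point rounding.

```python
from typing import List, Tuple

# Hypothetical segment format: (label, start_ms, end_ms) from a phone-level
# alignment; adjacent silences are assumed to be merged already.
Segment = Tuple[str, int, int]
SIL = "sil"         # hypothetical label used for silence
MIN_PAUSE_MS = 200  # silent pause: silence of no less than 200 ms


def trim(segments: List[Segment]) -> List[Segment]:
    """Drop leading and trailing silence (assumed not to count towards dur2)."""
    idx = [i for i, (label, _, _) in enumerate(segments) if label != SIL]
    return segments[idx[0]: idx[-1] + 1] if idx else []


def silent_pauses(segments: List[Segment]) -> List[float]:
    """Durations in seconds of the internal silences lasting >= 200 ms."""
    return [(end - start) / 1000 for label, start, end in trim(segments)
            if label == SIL and end - start >= MIN_PAUSE_MS]


def durations(segments: List[Segment]) -> Tuple[float, float]:
    """Return (dur1, dur2) in seconds: speech without and with silent pauses."""
    core = trim(segments)
    if not core:
        return 0.0, 0.0
    dur2 = (core[-1][2] - core[0][1]) / 1000
    dur1 = dur2 - sum(silent_pauses(segments))
    return dur1, dur2
```

In a segmentation such as `sil d a sil(150 ms) t sil(300 ms) a sil`, only the 300 ms silence counts as a silent pause: dur2 runs from the first to the last phone, and dur1 is dur2 minus 0.3 s.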

Method: objective measures

Primary variables

• art = # phones / dur1 (articulation rate)

• ros = # phones / dur2 (rate of speech)

• ptr = 100% * dur1 / dur2 (phonation/time ratio)

• mlr = mean # phones between two silent pauses (mean length of runs)

• mlp = mean length of silent pauses

• dsp = total duration of silent pauses / (dur2 / 60)

• # sp = # silent pauses / (dur2 / 60)
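The primary variables follow directly from a handful of counts and durations. The function below is a hypothetical sketch, not the authors' scoring code: it assumes the phone count, dur1, dur2, the number of phones in each run between silent pauses, and the silent-pause durations have already been extracted from the alignment. Returning 0.0 for mlp when there are no pauses is an assumption, since the measure is undefined in that case.

```python
from statistics import mean
from typing import Dict, List


def primary_measures(n_phones: int, dur1: float, dur2: float,
                     run_lengths: List[int], pauses: List[float]) -> Dict[str, float]:
    """Primary temporal variables, following the definitions on the slide.

    n_phones: number of phones in the utterance
    dur1, dur2: speech duration without / with silent pauses (s)
    run_lengths: phones in each stretch between silent pauses (hypothetical input)
    pauses: durations of the silent pauses (s)
    """
    minutes = dur2 / 60  # express counts and durations per minute of speech
    return {
        "art": n_phones / dur1,          # phones/s, silent pauses excluded
        "ros": n_phones / dur2,          # phones/s, silent pauses included
        "ptr": 100 * dur1 / dur2,        # percentage of time spent speaking
        "mlr": mean(run_lengths),        # mean number of phones per run
        # mlp is undefined without pauses; returning 0.0 is an assumption
        "mlp": mean(pauses) if pauses else 0.0,
        "dsp": sum(pauses) / minutes,    # seconds of silent pause per minute
        "#sp": len(pauses) / minutes,    # silent pauses per minute
    }


m = primary_measures(120, 10.0, 12.0, [40, 50, 30], [0.8, 1.2])
```

With 120 phones, 10 s of speech and two pauses totalling 2 s, art is 12 phones/s, ros is 10 phones/s and ptr is about 83%. Note that ros always equals art × ptr / 100 for a single speaker, which is why these measures tend to move together.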

Method: objective measures

Secondary variables

• # fp = # filled pauses / (dur2 / 60)

• # disf = # disfluencies / (dur2 / 60)
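The secondary variables are plain event rates per minute of speech. A minimal sketch, assuming a hypothetical transcription convention in which filled pauses appear as tokens such as `uh` or `ehm`; the slides do not specify how filled pauses or disfluencies were marked.

```python
from typing import Iterable

# Hypothetical transcription convention: filled pauses written as these tokens.
FILLED_PAUSE_TOKENS = {"uh", "uhm", "eh", "ehm"}


def per_minute(count: int, dur2: float) -> float:
    """Turn an event count into a rate per minute of speech (dur2 in s)."""
    return count / (dur2 / 60)


def filled_pause_rate(tokens: Iterable[str], dur2: float) -> float:
    """# fp: filled pauses per minute, counted in a word-level transcript."""
    n_fp = sum(1 for tok in tokens if tok.lower() in FILLED_PAUSE_TOKENS)
    return per_minute(n_fp, dur2)
```

For instance, two filled pauses in a 30 s answer give a rate of 4 per minute; # disf is computed the same way with `per_minute` applied to the disfluency count.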

Method: fluency ratings

• Sentences scored for fluency on a ten-point scale

• Raters received no special training

Method: rating procedure

• Exp1: each group of raters judged speakers of different proficiency levels

• Exp2: each group of raters judged speakers of the same proficiency level

Results: reliability

raters   inter-rater reliability (Cronbach's alpha)
PH       0.96
ST1      0.88
ST2      0.83
RBL      0.86
RIL      0.82
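Cronbach's alpha treats the raters in a group as the "items" of a scale: alpha = k / (k − 1) × (1 − Σ var(rater) / var(summed scores)), where k is the number of raters. The sketch below is a generic implementation of that formula, not the software used in the study; the input layout (one list of scores per rater, one entry per speaker) is an assumption.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha with raters as items and speakers as cases.

    scores[r][s]: fluency score given by rater r to speaker s.
    """
    k = len(scores)
    if k < 2:
        raise ValueError("need at least two raters")
    rater_vars = sum(variance(rater) for rater in scores)
    totals = [sum(col) for col in zip(*scores)]  # summed score per speaker
    return k / (k - 1) * (1 - rater_vars / variance(totals))
```

Two raters who give identical scores yield alpha = 1.0; values such as 0.96 for PH indicate that the summed ratings are dominated by variance the raters share.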

Results: raw fluency ratings

Mean fluency ratings on the ten-point scale, with standard deviations; [n] = number of speakers

        read speech (rs)                        spontaneous speech (ss)
        PL1 [10]  PL2 [27]  PL3 [23]  all-rs    BL [28]  IL [29]  all-ss
mean    4.65      5.00      7.36      5.85      5.64     4.80     5.21
s.d.    2.01      1.81      0.95      1.96      0.88     1.06     1.06
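The pooled columns (all-rs, all-ss) appear to be speaker-weighted averages of the group means. This is an observation from the numbers, not a procedure stated on the slides; the snippet recomputes the pooled means from the group means and group sizes in the ratings table.

```python
from typing import Dict


def pooled_mean(group_means: Dict[str, float], group_sizes: Dict[str, int]) -> float:
    """Mean over all speakers, weighting each group mean by its number of speakers."""
    total = sum(group_sizes.values())
    return sum(group_means[g] * group_sizes[g] for g in group_means) / total


rs = pooled_mean({"PL1": 4.65, "PL2": 5.00, "PL3": 7.36}, {"PL1": 10, "PL2": 27, "PL3": 23})
ss = pooled_mean({"BL": 5.64, "IL": 4.80}, {"BL": 28, "IL": 29})
print(round(rs, 2), round(ss, 2))
```

Both values round to the reported all-rs and all-ss means (5.85 and 5.21).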

Results: objective measures

Means with standard deviations in parentheses; [n] = number of speakers; ss/rs = all-ss mean / all-rs mean
Units: art, ros in phones/s; ptr in %; mlr in phones; mlp in s; dsp in s per minute; #sp, #fp, #disf per minute

        read speech (rs)                                          spontaneous speech (ss)
        PL1 [10]      PL2 [27]      PL3 [23]      all-rs        BL [28]       IL [29]       all-ss        ss/rs
art     10.87         11.15 (1.38)  12.47 (0.82)  11.61 (1.37)  12.25 (1.25)  11.85 (0.81)  12.04 (1.06)  1.0
ros     8.54 (1.88)   8.95 (1.87)   11.03 (1.16)  9.68 (1.94)   5.99 (0.96)   5.31 (1.17)   5.65 (1.12)   0.6
ptr     77.97 (7.69)  79.62 (8.68)  88.28 (5.42)  82.66 (8.57)  49.33 (8.71)  44.92 (9.51)  47.09 (9.32)  0.6
mlr     16.51 (7.67)  18.10 (7.44)  27.73 (7.13)  21.5 (8.77)   9.50 (2.22)   9.33 (2.27)   9.41 (2.23)   0.4
mlp     0.40 (0.08)   0.40 (0.12)   0.34 (0.16)   0.38 (0.13)   0.92 (0.20)   1.02 (0.28)   0.97 (0.25)   2.6
dsp     9.29 (4.48)   8.67 (5.15)   3.97 (2.96)   6.97 (4.87)   27.90 (5.52)  31.02 (6.04)  29.49 (5.95)  4.2
#sp     22.33 (8.45)  20.11 (9.45)  10.18 (6.45)  16.67 (9.65)  31.00 (5.56)  31.41 (4.77)  31.21 (5.13)  1.9
#fp     0.31 (0.50)   0.35 (0.94)   0.32 (0.77)   0.33 (0.81)   10.83 (8.24)  10.55 (7.84)  10.69 (7.97)  32.4
#disf   1.82 (2.20)   1.78 (1.75)   1.04 (1.53)   1.50 (1.76)   2.39 (1.82)   2.19 (2.27)   2.29 (2.04)   1.5
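The ss/rs column is the ratio of the pooled spontaneous-speech mean to the pooled read-speech mean for each measure. The snippet reproduces it from the all-rs and all-ss means in the table; the dictionary names are illustrative only.

```python
# all-rs and all-ss means copied from the objective-measures table
ALL_RS = {"art": 11.61, "ros": 9.68, "ptr": 82.66, "mlr": 21.5, "mlp": 0.38,
          "dsp": 6.97, "#sp": 16.67, "#fp": 0.33, "#disf": 1.50}
ALL_SS = {"art": 12.04, "ros": 5.65, "ptr": 47.09, "mlr": 9.41, "mlp": 0.97,
          "dsp": 29.49, "#sp": 31.21, "#fp": 10.69, "#disf": 2.29}

# Ratio > 1: the measure is higher in spontaneous than in read speech.
ratios = {m: round(ALL_SS[m] / ALL_RS[m], 1) for m in ALL_RS}
print(ratios)
```

The ratios make the contrast explicit: articulation rate is practically unchanged (1.0), whereas pauses are about 2.6 times longer, more than four times as much pause time per minute occurs, and filled pauses are over 30 times as frequent in spontaneous speech.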

Results: disfluencies

• Repetitions: exact repetitions of words

• Repairs: corrections

• Restarts: repetitions initial parts of words
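Once a disfluent stretch has been delimited, the three categories can be separated with simple string rules. This is a hypothetical sketch: it assumes each disfluency is given as the abandoned or repeated material (the reparandum) plus the material that follows it, optionally with a trailing hyphen marking a word fragment. The slides do not describe how disfluencies were annotated or classified.

```python
def classify_disfluency(reparandum: str, continuation: str) -> str:
    """Label a disfluency as 'repetition', 'restart' or 'repair'.

    reparandum: material that is repeated or abandoned (hypothetical input)
    continuation: material that follows it
    """
    first = reparandum.strip().lower()
    second = continuation.strip().lower()
    if first == second:
        return "repetition"  # exact repetition of the word(s)
    fragment = first.rstrip("-")
    if fragment and second != fragment and second.startswith(fragment):
        return "restart"     # initial part of a word, then the full word
    return "repair"          # any other change counts as a correction
```

Under these rules, "ik ik" is a repetition, "ge- gegeven" a restart, and "de het" a repair.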

Results: disfluencies

                     repetitions   repairs   restarts
read speech          12%           51%       37%
spontaneous speech   65%           23%       12%

Results: correlations

Correlations between objective measures and fluency ratings (* and ** mark significant correlations); [n] = number of speakers

        read speech    spontaneous speech
        all-rs [60]    RBL [28]    RIL [29]
art     0.83 **        0.07        0.05
ros     0.92 **        0.57 **     0.39 *
ptr     0.86 **        0.46 **     0.39 *
mlr     0.85 **        0.49 **     0.65 **
mlp     -0.53 **       -0.08       -0.01
dsp     -0.84 **       -0.45 **    -0.40 *
# sp    -0.84 **       -0.33 *     -0.49 **
# fp    -0.25          -0.21       -0.21
#disf   -0.15          -0.07       -0.27
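The slides do not name the correlation coefficient; assuming it is Pearson's r, each cell relates one objective measure to the mean fluency rating over the same speakers. The sketch below computes r and the usual t statistic for testing r against zero (n − 2 degrees of freedom); the significance thresholds behind * and ** are not given on the slides, so none are hard-coded here.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)
```

With per-speaker lists of, say, ros values and mean ratings, `pearson_r(ros, ratings)` gives the table entry, and `t_statistic(r, n)` can be compared with a t table for the chosen significance level.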

Results: correlations

Read speech, per proficiency level; [n] = number of speakers

        PL1 [10]    PL2 [27]    PL3 [23]
art     0.85 **     0.76 **     0.66 **
ros     0.92 **     0.91 **     0.73 **
ptr     0.82 **     0.86 **     0.58 **
mlr     0.91 **     0.86 **     0.57 **
mlp     -0.50       -0.68 **    -0.50 **
dsp     -0.71 *     -0.85 **    -0.61 **
# sp    -0.83 **    -0.83 **    -0.57 **

Discussion

• Reliable fluency scoring is possible

• Fluency scores related to task performed

• Role of objective variables in rs and ss
  • similarities: weak relation between secondary variables and fluency
  • differences: primary variables play different roles

Discussion

Read speech:

• strong relation: art, ros, ptr, #sp, dsp, mlr

• weaker relation: mlp

for perceived fluency, pause frequency is more important than pause length

two factors are important for fluency in rs:
  • articulation rate
  • pause frequency

Discussion

Spontaneous speech:

• strong relation: ros, ptr, #sp, dsp, mlr

• weaker relation: art, mlp

possibly, the higher frequency of pauses in ss overshadows the role of art

fluency in ss is particularly related to variables that contain information on pause frequency

Conclusions

• Reliable fluency scoring by human raters is possible

• Objective fluency scoring is possible

• Fluency scores vary with speech type

• Fluency scores vary with task performed

Conclusions

• Read speech: fluency scores strongly related to art and pause frequency

• Spontaneous speech: fluency scores strongly related to pause frequency and distribution

• Expert fluency ratings can be predicted more accurately on the basis of objective measures in rs than in ss

Conclusions

• Temporal measures of fluency may be used to develop objective fluency tests

• Selection of variables to be employed should depend on the material investigated and the task performed