ELIS-DSSP, Sint-Pietersnieuwstraat 41, B-9000 Gent
SPACE Symposium - 05/02/09

Objective intelligibility assessment of pathological speakers

Catherine Middag, Gwen Van Nuffelen, Jean-Pierre Martens, Marc De Bodt

Introduction

• Intelligibility = popular measure for pathological speech assessment

• Perceptual assessment affected by non-speech information :
  – familiarity with speaker and type of disorder
  – usage of linguistic context

• Word intelligibility tests designed to eliminate bias due to linguistic context

• Replacing the human listener by an automatic speech recognizer (ASR) can solve the other problems, but is the ASR sufficiently reliable?
  – test case : automation of the Dutch Intelligibility Assessment (DIA)

Dutch Intelligibility Assessment (DIA)

• 50 isolated CVC words
• intelligibility = percent phonemes correct

[Example from the slide] item 1 : .op (ø b d f g h j k l m n p r s t v w z); words : 1. dop 2. nuis 3. top
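The scoring rule "intelligibility = percent phonemes correct" can be illustrated with a minimal sketch. It assumes that each test item contributes exactly one target phoneme and that the listener writes down one phoneme per item; the function name and the example responses are hypothetical and are not data from the study.

```python
def percent_phonemes_correct(targets, responses):
    """Return the percentage of items whose target phoneme was identified."""
    if len(targets) != len(responses):
        raise ValueError("one response is needed per target")
    correct = sum(1 for t, r in zip(targets, responses) if t == r)
    return 100.0 * correct / len(targets)


# Hypothetical listener responses for three items (not data from the study).
targets = ["d", "n", "t"]
responses = ["d", "n", "d"]
score = percent_phonemes_correct(targets, responses)
print(f"{score:.1f} % phonemes correct")
```

With the full test, the same function would be applied to the 50 items of one speaker.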

How to apply ASR in the DIA?

• Two approaches
  – let ASR recognize the words and count the percentage of correct decisions
  – let ASR check how well the acoustics match with the phonetic transcription of the target word (= alignment)

• Our experience
  – intelligibility emerging from first approach insufficiently reliable
  – therefore we developed a system based on alignment

System architecture : flow chart

[Flow chart: acoustic feature sequence Xt + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score]

System architecture : flow chart

Two systems:
• complex state-of-the-art HMM-based system (ASR-ESAT)
• simple system with phonological layer (ASR-ELIS) (points more directly to articulatory problems)

System architecture : flow chart

Two feature sets:
• Phonemic features (patient has trouble pronouncing a certain phoneme)
• Phonological features (patient has problems with voicing, manner or place of articulation)

Extraction of phonemic features (PMF)

Speech aligner = ASR-ESAT

Frame  Phoneme  P(st|Xt)
1      #        0.7
2      #        0.5
3      /p/      0.4
4      /p/      0.8
5      /o/      0.6
6      /o/      0.8
7      /l/      0.6
8      #        0.3

Phonemic features
#   : (0.7+0.5+0.3)/3
/p/ : (0.4+0.8)/2
/o/ : (0.6+0.8)/2
/l/ : 0.6
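The averaging shown on this slide can be sketched in a few lines: each phonemic feature is the mean posterior over all frames aligned with that phoneme. The frame values are copied from the slide; the function name `phonemic_features` is an illustrative choice, not the name used in the actual system.

```python
from collections import defaultdict

# Frame-level alignment from the slide: (aligned phoneme, posterior P(s_t | X_t)).
frames = [("#", 0.7), ("#", 0.5), ("/p/", 0.4), ("/p/", 0.8),
          ("/o/", 0.6), ("/o/", 0.8), ("/l/", 0.6), ("#", 0.3)]


def phonemic_features(frames):
    """Average the posterior over all frames aligned with each phoneme."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phoneme, posterior in frames:
        sums[phoneme] += posterior
        counts[phoneme] += 1
    return {phoneme: sums[phoneme] / counts[phoneme] for phoneme in sums}


pmf = phonemic_features(frames)
for phoneme, value in pmf.items():
    print(phoneme, round(value, 2))
```

In the real system the frames of all 50 words of a speaker would presumably be pooled before averaging; the sketch only covers the single word of the slide.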

Extraction of phonological features (PLF)

Speech aligner = ASR-ELIS

Frame  Phone  voiced P(K1|Xt)  back P(K2|Xt)  burst P(K3|Xt)
1      #      0.1              0.1            0.2
2      #      0.1              0.1            0.1
3      /pcl/  0.2              0.1            0.1
4      /p/    0.2              0.2            0.6
5      /o/    0.8              0.7            0.2
6      /o/    0.6              0.9            0.0
7      /l/    0.5              0.5            0.1
8      #      0.1              0.1            0.0

Phonological features
Voiced : (0.8+0.6+0.5)/3
Back   : (0.7+0.9)/2
Burst  : 0.6

Extraction of phonological features (PLF)

Phonological features (same frame table, speech aligner = ASR-ELIS)
Not burst  : (0.2+0.1+…
Not back   : (0.1+0.1+…
Not voiced : (0.1+0.1+…

Extraction of phonological features (PLF)

Irrelevant features for these phones [cells highlighted in the same frame table; the highlighting is not preserved in this transcript]
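The positive and negative phonological features can be sketched as follows: for every phonological class, the posterior is averaged separately over the frames where the class should be present and over the frames where it should be absent, while frames for which the class is irrelevant are skipped. The frame values come from the slides. The `canonical` table is an assumption: its positive entries follow the averages shown on the slides, but which entries are irrelevant (0) is a guess made for illustration only.

```python
# Frame-level output from the slides: aligned phone and class posteriors.
frames = [
    ("#",     {"voiced": 0.1, "back": 0.1, "burst": 0.2}),
    ("#",     {"voiced": 0.1, "back": 0.1, "burst": 0.1}),
    ("/pcl/", {"voiced": 0.2, "back": 0.1, "burst": 0.1}),
    ("/p/",   {"voiced": 0.2, "back": 0.2, "burst": 0.6}),
    ("/o/",   {"voiced": 0.8, "back": 0.7, "burst": 0.2}),
    ("/o/",   {"voiced": 0.6, "back": 0.9, "burst": 0.0}),
    ("/l/",   {"voiced": 0.5, "back": 0.5, "burst": 0.1}),
    ("#",     {"voiced": 0.1, "back": 0.1, "burst": 0.0}),
]

# ASSUMPTION: canonical value of each class per phone
# (+1 = should be present, -1 = should be absent, 0 = irrelevant).
# The 0 entries are hypothetical; the slides do not list them.
canonical = {
    "#":     {"voiced": -1, "back": -1, "burst": -1},
    "/pcl/": {"voiced": -1, "back": 0,  "burst": -1},
    "/p/":   {"voiced": -1, "back": 0,  "burst": +1},
    "/o/":   {"voiced": +1, "back": +1, "burst": -1},
    "/l/":   {"voiced": +1, "back": 0,  "burst": -1},
}


def phonological_features(frames, canonical):
    """Mean posterior per class, split into 'present' and 'absent' frames."""
    sums, counts = {}, {}
    for phone, posteriors in frames:
        for name, posterior in posteriors.items():
            sign = canonical[phone][name]
            if sign == 0:
                continue  # class is irrelevant for this phone
            key = name if sign > 0 else "not " + name
            sums[key] = sums.get(key, 0.0) + posterior
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


plf = phonological_features(frames, canonical)
for name in sorted(plf):
    print(name, round(plf[name], 3))
```

The "not ..." values depend on the assumed table, so only the three positive features can be checked against the slides.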

Intelligibility prediction model (IPM)

• Objective : map speaker features (PMF, PLF or combinations) to a speaker intelligibility score

• Model training
  – train on DIA recordings
  – pathological speakers (+ some normal control speakers)

• Model type and size
  – limited number of pathological speakers
  – high number of features
  therefore : linear regression model, feature selection
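A linear regression model with feature selection can be sketched in plain Python as below. The slides do not say which selection procedure was used; greedy forward selection on the training error is an assumption chosen for brevity, and the data are synthetic (random features, with scores generated from features 1 and 4), not data from the study.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(rows, y, cols):
    """Least-squares weights (bias first) using only the columns in cols."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row, cols):
    return weights[0] + sum(w * row[c] for w, c in zip(weights[1:], cols))


def sse(rows, y, cols):
    """Sum of squared training errors of the model restricted to cols."""
    w = fit(rows, y, cols)
    return sum((predict(w, r, cols) - t) ** 2 for r, t in zip(rows, y))


def forward_selection(rows, y, max_features):
    """Greedily add the feature that lowers the training error the most."""
    selected, remaining = [], list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda c: sse(rows, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data: 40 "speakers", 6 features, score driven by features 1 and 4.
rng = random.Random(0)
features = [[rng.random() for _ in range(6)] for _ in range(40)]
scores = [50 + 30 * f[1] + 10 * f[4] + rng.gauss(0, 1) for f in features]

selected = forward_selection(features, scores, max_features=2)
weights = fit(features, scores, selected)
print("selected features:", selected)
```

In practice the selection criterion would more likely be a cross-validated error than the training error used here, since the latter always decreases when a feature is added.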

Reference material (DIA)

• 211 speakers :
  – 51 normals
  – 60 dysarthric
  – 12 clefts
  – 42 hearing impaired
  – 37 with laryngectomy
  – 7 with dysphonia
  – 2 others

• Pathological speakers : mean of 78.7 %

• Normals : mean of 93.3 %
• Few with very low score

[Figure: histogram of the human scores (x-axis: human score, y-axis: number of patients)]

Results : individual systems

• Based on five-fold cross validation
• Measure = Pearson Correlation Coefficient (PCC)

[Scatter plots: computed score versus perceptual score]
• ELIS : PLF : PCC = 0.78
• ESAT : PMF : PCC = 0.80
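The evaluation protocol (five-fold cross validation, scored with the PCC) can be sketched as follows. The slides do not state how speakers were assigned to folds or whether the PCC was computed per fold or over all held-out predictions; the sketch assumes a round-robin split and a single PCC over the pooled predictions. The one-feature line model and the toy data are placeholders, not the model or data of the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; item i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def cross_validated_pcc(features, scores, fit, predict, k=5):
    """Predict every speaker with a model that never saw that speaker."""
    predictions = [0.0] * len(scores)
    for train, test in k_fold_indices(len(scores), k):
        model = fit([features[i] for i in train], [scores[i] for i in train])
        for i in test:
            predictions[i] = predict(model, features[i])
    return pearson(scores, predictions)


def fit_line(xs, ys):
    """Placeholder model: ordinary least-squares line on one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Toy data (not from the study): a linear trend with an alternating disturbance.
xs = [i / 20 for i in range(20)]
ys = [40 + 50 * x + (2 if i % 2 else -2) for i, x in enumerate(xs)]
pcc = cross_validated_pcc(xs, ys, fit_line, predict_line)
print(f"cross-validated PCC = {pcc:.2f}")
```

Any model with a compatible `fit` / `predict` pair could be passed in place of the line model.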

Results : combined system

[Scatter plot: computed score versus perceptual score]

PMF + PLF : PCC = 0.86

Results : pathology-specific IPM

• Instead of creating one general IPM, one can create IPMs for specific pathologies :
  – still trained on all speakers (enough speakers)
  – model selection based on performance on speakers of that pathology (importance of features depends on type of disorder)

      Dysarthria   Laryngectomy   Hearing impairment
PCC   0.94         0.91           0.97

Results : pathology-specific IPM

[Scatter plot: computed score versus perceptual score]

• Dysarthria : 0.94 (red circles)

• Dispersion of other speakers is increased

• Largest deviations in low intelligibility area :
  – scarce data in that area
  – can be solved by adding more weight to patients with very low intelligibility
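One possible reading of "adding more weight to patients with very low intelligibility" is a weighted least-squares fit. The sketch below is an assumption about how such weighting could be done, not the authors' procedure: the threshold, the boost factor and the toy data are hypothetical, and a single feature is used to keep the formulas short.

```python
def weighted_line_fit(xs, ys, weights):
    """Weighted least-squares line: minimizes sum of w * (y - a*x - b)**2."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return slope, my - slope * mx


def low_score_weights(scores, threshold=50.0, boost=3.0):
    """Hypothetical weighting: speakers below the threshold count `boost` times."""
    return [boost if s < threshold else 1.0 for s in scores]


def residual(model, x, y):
    slope, intercept = model
    return y - (slope * x + intercept)


# Toy data (not from the study): one feature, one speaker with a very low score.
feature = [0.2, 0.6, 0.7, 0.8, 0.9]
human = [30.0, 80.0, 85.0, 90.0, 95.0]

plain = weighted_line_fit(feature, human, [1.0] * len(human))
boosted = weighted_line_fit(feature, human, low_score_weights(human))

print("error on low-score speaker, unweighted:", abs(residual(plain, 0.2, 30.0)))
print("error on low-score speaker, weighted:  ", abs(residual(boosted, 0.2, 30.0)))
```

Raising the weight of the low-score speaker pulls the fitted line towards that speaker, at the cost of a slightly worse fit for the others.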

Development of DIA-tool

• PMF and PLF can predict intelligibility of pathological speech:
  – Combining PMF and PLF yields high PCCs:
    • 0.86 for general model
    • over 0.91 for pathology specific model
  – PCCs for specific pathologies compete with subjective inter-rater agreements (0.91)

• This opens up possibilities for development of an automated version of the DIA (see demonstration later) based on PLF + PMF

New feature set : Context-dependent phonological features (CD-PLF)

• Until now:
  – PMF : Does the patient have trouble pronouncing a certain phoneme?
  – PLF : Does the patient have problems with voicing, manner or place of articulation?

• New : Does the patient have problems with a desired change of voicing, manner or place of articulation?

CD-PLFs : how well is change in PLF realized?

Extraction of context-dependent phonological features (CD-PLF)

Speech aligner = ASR-ELIS

Segment  Phone  voiced  burst  …
2        #      0.1     0.2
3        /pcl/  0.2     0.2
4        /p/    0.2     0.6
6        /o/    0.6     0.1
7        /s/    0.4     0.3
8        #      0.2     0.1
9        /m/    0.7     0.3
10       /A/    0.8     0.0
11       /l/    0.6     0.1
12       #      0.1     0.1

CD-PLF features

voicing                 burst
Off, on, off : +0.6     Yes, no, no : +0.1
On, on, on   : +0.8     No, no, no  : +0.0
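The four contributions on this slide are consistent with the following reading: for every segment, the canonical value of a class in the previous, current and next segment defines a context, and the posterior observed in the current segment is accumulated for that context. The sketch below implements this reading. The `canonical` table is an assumption, and so is the final averaging per context (the slide only shows the accumulation).

```python
from collections import defaultdict

# Segment-level values from the slide: (phone, voiced posterior, burst posterior).
segments = [("#", 0.1, 0.2), ("/pcl/", 0.2, 0.2), ("/p/", 0.2, 0.6),
            ("/o/", 0.6, 0.1), ("/s/", 0.4, 0.3), ("#", 0.2, 0.1),
            ("/m/", 0.7, 0.3), ("/A/", 0.8, 0.0), ("/l/", 0.6, 0.1),
            ("#", 0.1, 0.1)]

# ASSUMPTION: canonical (expected) class values per phone, for illustration.
canonical = {
    "#":     {"voiced": False, "burst": False},
    "/pcl/": {"voiced": False, "burst": False},
    "/p/":   {"voiced": False, "burst": True},
    "/o/":   {"voiced": True,  "burst": False},
    "/s/":   {"voiced": False, "burst": False},
    "/m/":   {"voiced": True,  "burst": False},
    "/A/":   {"voiced": True,  "burst": False},
    "/l/":   {"voiced": True,  "burst": False},
}


def cd_plf(segments, canonical):
    """Collect posteriors per (class, previous/current/next canonical value)."""
    bins = defaultdict(list)
    for i in range(1, len(segments) - 1):
        prev_phone = segments[i - 1][0]
        phone, voiced, burst = segments[i]
        next_phone = segments[i + 1][0]
        for name, posterior in (("voiced", voiced), ("burst", burst)):
            context = tuple(canonical[p][name]
                            for p in (prev_phone, phone, next_phone))
            bins[(name, context)].append(posterior)
    # Averaging per context is an assumption, by analogy with PMF and PLF.
    return {key: sum(values) / len(values) for key, values in bins.items()}


features = cd_plf(segments, canonical)
print(features[("voiced", (False, True, False))])  # /o/ between /p/ and /s/
print(features[("voiced", (True, True, True))])    # /A/ between /m/ and /l/
print(features[("burst", (True, False, False))])   # /o/ right after the burst of /p/
```

The first and last segments are skipped because they lack a left or right neighbour; how the real system treats utterance boundaries is not stated on the slide.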

Results for CD-PLF

• CD-PLFs alone compete with previous best (PLF+PMF : 0.86)

• CD-PLF+PMF : 0.90 new best!
• Pathology-specific results for CD-PLF+PMF :

      Dysarthria   Laryngectomy   Hearing impairment
PCC   0.95         0.94           0.98

Conclusions and future work

• PMF, PLF and CD-PLF can predict intelligibility of pathological speech
  – CD-PLFs seem to play an important role :
    • CD-PLF : PCC = 0.87
    • CD-PLF + PMF : PCC = 0.90
    not the articulation pattern but the change in the articulation pattern matters?
  – More research is needed before adding this feature set to the tool

• High PCCs open up new possibilities for :
  – more profound articulatory assessment, which is directly related to determination of appropriate therapy
  – monitoring of effectiveness of chosen therapy tool
  – using more natural speech (words, phrases) in tests

• Questions?