ELIS-DSSP, Sint-Pietersnieuwstraat 41, B-9000 Gent · SPACE Symposium · 05/02/09
Objective intelligibility assessment of pathological speakers
Catherine Middag, Gwen Van Nuffelen,
Jean-Pierre Martens, Marc De Bodt
Introduction
• Intelligibility = popular measure for pathological speech assessment
• Perceptual assessment affected by non-speech information :
  – familiarity with speaker and type of disorder
  – usage of linguistic context
• Word intelligibility tests designed to eliminate bias due to linguistic context
• Replacing the human listener by an automatic speech recognizer (ASR) can solve the other problems, but is the ASR sufficiently reliable?
  – test case : automation of the Dutch Intelligibility Assessment (DIA)
Dutch Intelligibility Assessment (DIA)
• 50 isolated CVC words
• intelligibility = percent phonemes correct
[Slide example: test item "1. .op" with the choices ø b d f g h j k l m n p r s t v w z; example words : 1. dop, 2. nuis, 3. top]
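The scoring rule on this slide is plain arithmetic. A minimal Python sketch, assuming each test item yields one target phoneme and one perceived phoneme; the item format and the example responses are hypothetical and not taken from the DIA material:

```python
def percent_phonemes_correct(responses):
    """Return the percentage of items whose perceived phoneme equals the target."""
    if not responses:
        raise ValueError("at least one response is required")
    correct = sum(1 for target, perceived in responses if target == perceived)
    return 100.0 * correct / len(responses)


# Hypothetical (target, perceived) phoneme pairs for three test items.
example_responses = [("d", "d"), ("n", "n"), ("t", "p")]
example_score = percent_phonemes_correct(example_responses)
print(f"{example_score:.1f} % phonemes correct")
```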
How to apply ASR in the DIA?
• Two approaches
  – let ASR recognize the words and count the percentage of correct decisions
  – let ASR check how well the acoustics match with the phonetic transcription of the target word (= alignment)
• Our experience
  – intelligibility emerging from first approach insufficiently reliable
  – therefore we developed a system based on alignment
System architecture : flow chart
[Flow chart: acoustic feature sequence Xt + target speech transcription → Speech aligner → speaker features → Intelligibility Prediction Model → objective score]
Two systems serve as speech aligner :
• complex state-of-the-art HMM-based system (ASR-ESAT)
• simple system with phonological layer (ASR-ELIS), which points more directly to articulatory problems
Two sets of speaker features :
• Phonemic features (patient has trouble pronouncing a certain phoneme)
• Phonological features (patient has problems with voicing, manner or place of articulation)
Extraction of phonemic features (PMF)
Speech aligner = ASR-ESAT

Frame  Phoneme  P(st|Xt)
1      #        0.7
2      #        0.5
3      /p/      0.4
4      /p/      0.8
5      /o/      0.6
6      /o/      0.8
7      /l/      0.6
8      #        0.3

Phonemic features :
# : (0.7+0.5+0.3)/3
/p/ : (0.4+0.8)/2
/o/ : (0.6+0.8)/2
/l/ : 0.6
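The averaging shown on this slide can be sketched in a few lines of Python. The frame table is copied from the slide; the function name and the data layout are illustrative choices, and the posteriors would in practice come from the aligner rather than a hard-coded list:

```python
from collections import defaultdict

# (phoneme, P(s_t | X_t)) for each frame, copied from the slide's table.
PMF_FRAMES = [
    ("#", 0.7), ("#", 0.5),
    ("/p/", 0.4), ("/p/", 0.8),
    ("/o/", 0.6), ("/o/", 0.8),
    ("/l/", 0.6),
    ("#", 0.3),
]


def extract_pmf(frames):
    """Average the frame posteriors over all frames aligned with each phoneme."""
    grouped = defaultdict(list)
    for phoneme, posterior in frames:
        grouped[phoneme].append(posterior)
    return {phoneme: sum(values) / len(values) for phoneme, values in grouped.items()}


pmf = extract_pmf(PMF_FRAMES)
for phoneme, value in pmf.items():
    print(f"{phoneme} : {value:.2f}")
```

Note that the silence frames (#) are pooled even though they are not contiguous, which matches the slide's (0.7+0.5+0.3)/3.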
Extraction of phonological features (PLF)
Speech aligner = ASR-ELIS

Frame  Phone  voiced P(K1|Xt)  back P(K2|Xt)  burst P(K3|Xt)
1      #      0.1              0.1            0.2
2      #      0.1              0.1            0.1
3      /pcl/  0.2              0.1            0.1
4      /p/    0.2              0.2            0.6
5      /o/    0.8              0.7            0.2
6      /o/    0.6              0.9            0.0
7      /l/    0.5              0.5            0.1
8      #      0.1              0.1            0.0

Phonological features :
Voiced : (0.8+0.6+0.5)/3
Back : (0.7+0.9)/2
Burst : 0.6
Phonological features for properties that should be absent, computed from the same frame table :
Not voiced : (0.1+0.1+…
Not back : (0.1+0.1+…
Not burst : (0.2+0.1+…
Some features are irrelevant for certain phones (marked in the frame table on the original slide).
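The three PLF slides can be summarised as: average each feature's posterior over the frames where the feature should be on, average it separately over the frames where it should be off, and skip frames where it is irrelevant. A minimal Python sketch follows. The frame posteriors are copied from the slide; the `CANONICAL` table of on/off/irrelevant values is an assumption made for illustration (chosen so that the "on" averages reproduce the slide's arithmetic), since the transcript does not show which phones were marked as irrelevant:

```python
from collections import defaultdict

FEATURES = ("voiced", "back", "burst")

# (phone, P(voiced|Xt), P(back|Xt), P(burst|Xt)) per frame, copied from the slide.
PLF_FRAMES = [
    ("#", 0.1, 0.1, 0.2),
    ("#", 0.1, 0.1, 0.1),
    ("pcl", 0.2, 0.1, 0.1),
    ("p", 0.2, 0.2, 0.6),
    ("o", 0.8, 0.7, 0.2),
    ("o", 0.6, 0.9, 0.0),
    ("l", 0.5, 0.5, 0.1),
    ("#", 0.1, 0.1, 0.0),
]

# ASSUMED canonical values: True = should be on, False = should be off,
# None = irrelevant for this phone (hypothetical choice, not from the slides).
CANONICAL = {
    "#": {"voiced": False, "back": False, "burst": False},
    "pcl": {"voiced": False, "back": None, "burst": False},
    "p": {"voiced": False, "back": None, "burst": True},
    "o": {"voiced": True, "back": True, "burst": False},
    "l": {"voiced": True, "back": None, "burst": False},
}


def extract_plf(frames, canonical):
    """Mean posterior per feature, split into 'X' (on) and 'not X' (off) frames."""
    grouped = defaultdict(list)
    for phone, *posteriors in frames:
        for feature, posterior in zip(FEATURES, posteriors):
            target = canonical[phone][feature]
            if target is None:
                continue  # feature is irrelevant for this phone
            key = feature if target else f"not {feature}"
            grouped[key].append(posterior)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


plf = extract_plf(PLF_FRAMES, CANONICAL)
for name in sorted(plf):
    print(f"{name} : {plf[name]:.2f}")
```

With these assumed canonical values the "on" features match the slide: voiced = (0.8+0.6+0.5)/3, back = (0.7+0.9)/2, burst = 0.6. The "not" values depend on the assumed table and should be read as illustrative only.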
Intelligibility prediction model (IPM)
• Objective : map speaker features (PMF, PLF or combinations) to speaker intelligibility score
• Model training
  – train on DIA recordings
  – pathological speakers (+ some normal control speakers)
• Model type and size
  – limited number of pathological speakers → linear regression model
  – high number of features → feature selection
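The slide names a linear regression model with feature selection but does not say how the features are selected. The sketch below uses greedy forward selection on the training error, which is an assumption for illustration and not necessarily the authors' procedure. It is written in plain Python, and the toy feature matrix and scores are invented:

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, scores, cols):
    """Least-squares weights (bias first) using only the feature columns in cols."""
    x = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * s for r, s in zip(x, scores)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row, cols):
    return weights[0] + sum(w * row[c] for w, c in zip(weights[1:], cols))


def squared_error(rows, scores, cols):
    weights = fit(rows, scores, cols)
    return sum((predict(weights, r, cols) - s) ** 2 for r, s in zip(rows, scores))


def forward_select(rows, scores, max_features):
    """Greedily add the feature that lowers the squared error the most."""
    chosen, remaining = [], list(range(len(rows[0])))
    while remaining and len(chosen) < max_features:
        best = min(remaining, key=lambda c: squared_error(rows, scores, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Invented toy data: 6 speakers x 3 features; the score depends on features 0 and 2.
toy_rows = [[0.9, 0.2, 0.8], [0.8, 0.9, 0.6], [0.6, 0.1, 0.7],
            [0.5, 0.7, 0.3], [0.3, 0.4, 0.4], [0.2, 0.8, 0.1]]
toy_scores = [60 * r[0] + 40 * r[2] for r in toy_rows]

selected = forward_select(toy_rows, toy_scores, max_features=2)
ipm_weights = fit(toy_rows, toy_scores, selected)
print("selected features:", selected)
```

In a real setting the selection criterion would be evaluated on held-out speakers rather than on the training error, because the number of features is large relative to the number of speakers.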
Reference material (DIA)
• 211 speakers :
  – 51 normals
  – 60 dysarthric
  – 12 clefts
  – 42 hearing impaired
  – 37 with laryngectomy
  – 7 with dysphonia
  – 2 others
• Pathological speakers : mean of 78.7 %
• Normals : mean of 93.3 %
• Few with very low score
[Figure: histogram of the human scores; x-axis : human score, y-axis : number of patients]
Results : individual systems
• Based on five-fold cross validation
• Measure = Pearson Correlation Coefficient (PCC)
[Figure: scatter plots of computed score versus perceptual score]
• ELIS : PLF : PCC = 0.78
• ESAT : PMF : PCC = 0.80
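The evaluation protocol (five-fold cross validation, scored with the PCC) can be sketched as follows. To keep the example short, a single-feature straight-line fit stands in for the intelligibility prediction model; that stand-in, the fold assignment and the toy data are assumptions for illustration and are not taken from the slides:

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one feature."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def cross_validated_pcc(feature, human, k=5):
    """Predict every speaker from a model trained without that speaker's fold."""
    predicted = [0.0] * len(human)
    for fold in k_fold_indices(len(human), k):
        held_out = set(fold)
        train = [i for i in range(len(human)) if i not in held_out]
        slope, intercept = fit_line([feature[i] for i in train], [human[i] for i in train])
        for i in fold:
            predicted[i] = slope * feature[i] + intercept
    return pearson(human, predicted)


# Invented toy data: one speaker feature and the human intelligibility score.
toy_feature = [0.35, 0.42, 0.50, 0.55, 0.61, 0.68, 0.74, 0.80, 0.88, 0.93]
toy_human = [38.0, 51.0, 49.0, 62.0, 70.0, 66.0, 81.0, 85.0, 90.0, 96.0]
print(f"cross-validated PCC = {cross_validated_pcc(toy_feature, toy_human):.2f}")
```

Pooling the held-out predictions of all folds before computing one PCC is one common convention; averaging per-fold PCCs is another, and the slides do not say which was used.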
Results : combined system
[Figure: scatter plot of computed score versus perceptual score]
• PMF + PLF : PCC = 0.86
Results : pathology-specific IPM
• Instead of creating one general IPM, one can create IPMs for specific pathologies :
  – still trained on all speakers (enough speakers)
  – model selection based on performance of speakers of that pathology (importance of features depends on type of disorder)

Pathology           PCC
Dysarthria          0.94
Laryngectomy        0.91
Hearing impairment  0.97
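The selection step described above (train on all speakers, then pick the model that performs best on one pathology) might look like the sketch below. The candidate model names, their predictions and the speaker labels are invented for illustration; only the idea of scoring each candidate on the target pathology's speakers comes from the slide:

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def select_model_for_pathology(predictions_by_model, human, pathologies, target):
    """Return (name, PCC) of the candidate with the highest PCC on the target group."""
    group = [i for i, label in enumerate(pathologies) if label == target]
    reference = [human[i] for i in group]

    def group_pcc(name):
        return pearson(reference, [predictions_by_model[name][i] for i in group])

    best = max(predictions_by_model, key=group_pcc)
    return best, group_pcc(best)


# Invented data: six speakers, two candidate models trained on all speakers.
toy_human = [40.0, 60.0, 80.0, 90.0, 95.0, 70.0]
toy_pathologies = ["dysarthria"] * 3 + ["laryngectomy"] * 3
toy_predictions = {
    "model_a": [42.0, 61.0, 79.0, 70.0, 99.0, 90.0],
    "model_b": [70.0, 50.0, 75.0, 89.0, 96.0, 71.0],
}

for pathology in ("dysarthria", "laryngectomy"):
    name, pcc = select_model_for_pathology(toy_predictions, toy_human, toy_pathologies, pathology)
    print(f"{pathology} : {name} (PCC = {pcc:.2f})")
```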
Results : pathology-specific IPM
[Figure: scatter plot of computed score versus perceptual score; dysarthric speakers shown as red circles]
• Dysarthria : PCC = 0.94 (red circles)
• Dispersion of other speakers is increased
• Largest deviations in low intelligibility area :
  – scarce data in that area
  – can be solved by adding more weight to patients with very low intelligibility
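One way to "add more weight" to speakers with very low intelligibility is a weighted least-squares fit. The sketch below shows this for a single feature; the threshold of 50 and the boost factor of 3 are hypothetical values chosen for illustration, and the slides do not specify a weighting scheme:

```python
def low_score_weights(human, threshold=50.0, boost=3.0):
    """Give speakers below the (hypothetical) threshold a larger training weight."""
    return [boost if score < threshold else 1.0 for score in human]


def weighted_line_fit(x, y, weights):
    """Weighted least-squares slope and intercept for one feature."""
    total = sum(weights)
    mx = sum(w * a for w, a in zip(weights, x)) / total
    my = sum(w * b for w, b in zip(weights, y)) / total
    sxy = sum(w * (a - mx) * (b - my) for w, a, b in zip(weights, x, y))
    sxx = sum(w * (a - mx) ** 2 for w, a in zip(weights, x))
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented toy data: one speaker feature and the human intelligibility score.
toy_feature = [0.20, 0.30, 0.55, 0.70, 0.80, 0.90]
toy_human = [25.0, 45.0, 62.0, 78.0, 84.0, 95.0]

toy_weights = low_score_weights(toy_human)
slope, intercept = weighted_line_fit(toy_feature, toy_human, toy_weights)
print(f"weighted fit : score = {slope:.1f} * feature + {intercept:.1f}")
```

Raising the weight of the scarce low-intelligibility speakers pulls the fit towards them, at the cost of a slightly worse fit for the many speakers with high scores.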
Development of DIA-tool
• PMF and PLF can predict intelligibility of pathological speech :
  – Combining PMF and PLF yields high PCCs :
    • 0.86 for general model
    • over 0.91 for pathology-specific model
  – PCCs for specific pathologies compete with subjective inter-rater agreements (0.91)
• This opens up possibilities for development of an automated version of the DIA (see demonstration later) based on PLF + PMF
New feature set : Context-dependent phonological features (CD-PLF)
• Until now :
  – PMF : Does the patient have trouble pronouncing a certain phoneme?
  – PLF : Does the patient have problems with voicing, manner or place of articulation?
• New : Does the patient have problems with a desired change of voicing, manner or place of articulation?
  – CD-PLFs : how well is change in PLF realized?
Extraction of context-dependent phonological features (CD-PLF)
Speech aligner = ASR-ELIS

Segment  Phone  voiced  burst  …
2        #      0.1     0.2
3        /pcl/  0.2     0.2
4        /p/    0.2     0.6
6        /o/    0.6     0.1
7        /s/    0.4     0.3
8        #      0.2     0.1
9        /m/    0.7     0.3
10       /A/    0.8     0.0
11       /l/    0.6     0.1
12       #      0.1     0.1

CD-PLF features :
Voicing : Off, on, off : +0.6
Voicing : On, on, on : +0.8
Burst : Yes, no, no : +0.1
Burst : No, no, no : +0.0
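Reading the example above, each segment appears to contribute its own feature value to a bin labelled by the expected feature values of the previous, current and next segment: /p/, /o/, /s/ gives "off, on, off" for voicing and adds the 0.6 of /o/. The sketch below implements that reading in Python. The segment values are copied from the slide; the `CANONICAL` on/off table and the averaging of contributions per bin are assumptions made for illustration:

```python
from collections import defaultdict

CD_FEATURES = ("voiced", "burst")

# (phone, voiced, burst) per segment, copied from the slide's table.
SEGMENTS = [
    ("#", 0.1, 0.2), ("pcl", 0.2, 0.2), ("p", 0.2, 0.6), ("o", 0.6, 0.1),
    ("s", 0.4, 0.3), ("#", 0.2, 0.1), ("m", 0.7, 0.3), ("A", 0.8, 0.0),
    ("l", 0.6, 0.1), ("#", 0.1, 0.1),
]

# ASSUMED canonical (voiced, burst) values per phone; not given in the slides.
CANONICAL = {
    "#": (False, False), "pcl": (False, False), "p": (False, True),
    "o": (True, False), "s": (False, False), "m": (True, False),
    "A": (True, False), "l": (True, False),
}


def collect_cd_plf(segments, canonical):
    """Collect the middle segment's value per (feature, previous/current/next) context."""
    contributions = defaultdict(list)
    for prev, cur, nxt in zip(segments, segments[1:], segments[2:]):
        for k, feature in enumerate(CD_FEATURES):
            context = tuple(canonical[seg[0]][k] for seg in (prev, cur, nxt))
            contributions[(feature, context)].append(cur[1 + k])
    return contributions


def mean_cd_plf(contributions):
    """Average the collected contributions per context bin."""
    return {key: sum(values) / len(values) for key, values in contributions.items()}


cd_contributions = collect_cd_plf(SEGMENTS, CANONICAL)
cd_plf = mean_cd_plf(cd_contributions)
print(cd_contributions[("voiced", (False, True, False))])  # the slide's "Off, on, off"
```

The first and last segments have no complete context and therefore contribute nothing as a middle segment in this sketch.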
Results for CD-PLF
• CD-PLFs alone compete with previous best PLF+PMF : 0.86
• CD-PLF+PMF : 0.90, new best!
• Pathology-specific results for CD-PLF+PMF :

Pathology           PCC
Dysarthria          0.95
Laryngectomy        0.94
Hearing impairment  0.98
Conclusions and future work
• PMF, PLF and CD-PLF can predict intelligibility of pathological speech
  – CD-PLFs seem to play an important role :
    • CD-PLF : PCC = 0.87
    • CD-PLF + PMF : PCC = 0.90
    • not the articulation pattern but the change in the articulation pattern matters?
  – More research is needed before adding this feature set to the tool
• High PCCs open up new possibilities for :
  – more profound articulatory assessment, which is directly related to determination of appropriate therapy
  – monitoring of effectiveness of chosen therapy tool
  – using more natural speech (words, phrases) in tests
• Questions?