Delft University of Technology: Man–Machine Interaction
Lexical Stress in Speech Recognition
Master’s thesis presentation
Rogier van Dalen
13th May 2005
Topics
• Objective
• What is lexical stress?
• Properties of lexical stress
• Model
• System
• Results
Objective
Can lexical stress be used in a speech recogniser to make it perform better?
• Find properties of lexical stress
• Model speech recogniser
• Implement speech recogniser
• Test speech recogniser
Garden-variety speech recognition
Input modelled as a concatenation of phonemes
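To make "a concatenation of phonemes" concrete, here is a minimal sketch that expands a word sequence into the single phoneme stream a conventional recogniser models. The pronunciations are the phonemic forms from the example lexicon later in this talk, split into one symbol per phoneme (that tokenisation, the name `LEXICON` and the function `to_phonemes` are illustrative assumptions, not the thesis's code).

```python
from typing import Dict, List

# Hypothetical lexicon: phonemic forms from the example lexicon slide,
# written as one SAMPA-style symbol per list entry.
LEXICON: Dict[str, List[str]] = {
    "a": ["eI"],
    "garden": ["g", "A:", "d", "@", "n"],
    "table": ["t", "eI", "b", "@", "l"],
    "variety": ["v", "@", "r", "aI", "@", "t", "i"],
}


def to_phonemes(words: List[str], lexicon: Dict[str, List[str]] = LEXICON) -> List[str]:
    """Concatenate the pronunciations of `words` into one phoneme sequence."""
    phonemes: List[str] = []
    for word in words:
        phonemes.extend(lexicon[word])  # word boundaries are not kept
    return phonemes


print(to_phonemes(["a", "garden", "table"]))
```

The output is a flat list with no word boundaries, so recovering the words from it is a search problem for the recogniser.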
What is lexical stress?
/ho:rIkda:r@nka:nOn/
Hoor ik daar een kanon?
kanón ‘gun’ or kánon ‘song’?
Use of lexical stress
• Minimal pairs
  (a) súbject – (to) subjéct
  Du. áánbod ‘offer’ – aan bód ‘first in line’
  Du. voorkómen ‘prevent’ – vóórkomen ‘happen’
  Portuguese falara ‘I had spoken’ – falará ‘he will speak’
Use of lexical stress
• Word recognition
  Du. october – octopus
  tigress – digress
Use of lexical stress
• Segmentation
  conduct ascends uphill
  ‘a doctor sends a pill’?
Properties of lexical stress
• Stress works on the syllable level
A lexicon with stress marks
a /eI/ [@]
are /A:/ [@]
the /Di:/ [D@]
garden /"gA:d@n/ ["gA:dn=]
ordinary /"O:dInEri/ ["O:dn=ri]
table /"teIb@l/ ["theIbl=]
variety /v@"raI@ti/ [v="raI@ti]
Properties of lexical stress
• Stress works on the syllable level
• Unstressed syllables are reduced
/ka:"nOn/ ‘gun’ or /"ka:nOn/ ‘song’?
[Figure: pitch contours of the two utterances, Pitch (Hz) 0–400 against Time (s); first panel 0–0.532517 s, second panel 0–0.486281 s; both segmented |k|a:|n|O|n|. Labels, in order: kanón /ka:"nOn/ ‘gun’, kánon /"ka:nOn/ ‘song’]
/"ka:nOn?/ ‘song?’ or /ka:"nOn?/ ‘gun?’?
[Figure: pitch contours of the two question utterances, Pitch (Hz) 0–400 against Time (s); first panel 0–0.5 s, second panel 0–0.565397 s; both segmented |k|a:|n|O|n|. Labels, in order: kanón /ka:"nOn/ ‘gun’, kánon /"ka:nOn/ ‘song’]
/"ka:nOn/ ‘song’ or /ka:"nOn/ ‘gun’?
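[Figure: waveforms of the two utterances against Time (s); first panel 0–0.532517 s, amplitude −0.6068 to 0.6289; second panel 0–0.486281 s, amplitude −0.5964 to 0.6173; both segmented |k|a:|n|O|n|. Labels, in order: kanón /ka:"nOn/ ‘gun’, kánon /"ka:nOn/ ‘song’]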
Properties of lexical stress
• Stress works on the syllable level
• Unstressed syllables are reduced
• Stressed syllables have longer durations
• Stressed syllables are louder
/ka:"nOn/ ‘gun’ or /"ka:nOn/ ‘song’?
[Figure: spectrograms of the two utterances, Frequency (Hz) 0–5000 against Time (s); first panel 0–0.532517 s, second panel 0–0.486281 s; both segmented |k|a:|n|O|n|. Labels, in order: kanón /ka:"nOn/ ‘gun’, kánon /"ka:nOn/ ‘song’]
/"ka:nOn?/ ‘song?’ or /ka:"nOn?/ ‘gun?’?
[Figure: spectrograms of the two question utterances, Frequency (Hz) 0–5000 against Time (s); first panel 0–0.5 s, second panel 0–0.565397 s; both segmented |k|a:|n|O|n|. Labels, in order: kanón /ka:"nOn/ ‘gun’, kánon /"ka:nOn/ ‘song’]
Properties of lexical stress
• Stress works on the syllable level
• Unstressed syllables are reduced
• Stressed syllables have longer durations
• Stressed syllables are louder
• Stressed syllables have more energy at high frequencies
Distinguishing phonemes
/@/
/a:/ /"a:/
/y:/ /"y:/
/u:/ /"u:/
/O/ /"O/
Integration in a speech recogniser
• /A a: p t O v V/
• Stressed and unstressed versions of phonemes: /A "A a: "a: p "p t "t O "O v "v V "V/
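The inventory change on this slide amounts to giving every phoneme a stressed twin, so the number of HMMs doubles. A minimal sketch of building that inventory, using the slide's `"` prefix as the stress mark (the function name is hypothetical):

```python
from typing import List

STRESS_MARK = '"'  # SAMPA-style primary-stress mark, as used on the slides


def stress_enabled_inventory(phonemes: List[str]) -> List[str]:
    """Return each phoneme followed by its stressed counterpart."""
    inventory: List[str] = []
    for p in phonemes:
        inventory += [p, STRESS_MARK + p]
    return inventory


print(stress_enabled_inventory(["A", "a:", "p", "t", "O", "v", "V"]))
```

One consequence of this design is that each model is trained only on the tokens that carry its stress value, so the training data per model shrinks.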
Integration in the lexicon
are @
a @
the D @
garden "g "A: d n
ordinary "O: d n r i
table "t "eI b l
variety v "r "aI @ t i
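In this lexicon every phoneme of the stressed syllable receives the stress mark, and every phoneme of an unstressed syllable stays plain. A sketch of that conversion, assuming the pronunciation is already syllabified into lists of phonemes (as with /"laI-@n/ on the model slides) and the stressed syllable's index is known; `mark_stress` is a hypothetical helper:

```python
from typing import List, Optional


def mark_stress(syllables: List[List[str]], stressed: Optional[int]) -> List[str]:
    """Flatten syllables into phonemes, prefixing '"' to each phoneme of the stressed syllable.

    `stressed` is the index of the stressed syllable, or None for unstressed
    function words such as "a" and "the".
    """
    phonemes: List[str] = []
    for i, syllable in enumerate(syllables):
        prefix = '"' if i == stressed else ""
        phonemes.extend(prefix + p for p in syllable)
    return phonemes


# variety: v | r aI | @ | t i, with the second syllable stressed
print(mark_stress([["v"], ["r", "aI"], ["@"], ["t", "i"]], 1))
```

This reproduces the slide's entry `v "r "aI @ t i`.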
Integration in feature vectors
MFCCs
Spectral tilt features
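The slide names the features appended to the MFCCs but not how spectral tilt is computed. As a minimal sketch, assuming one simple tilt measure (energy above a hypothetical 1 kHz split minus energy below it, in dB, via a naive DFT), the extended feature vector could be formed as follows. The MFCCs are assumed to be computed elsewhere, and `band_energy_db`, `spectral_tilt`, `extend_features` and the split frequency are illustrative choices, not the thesis's.

```python
import math
from typing import List, Sequence


def band_energy_db(frame: Sequence[float], sample_rate: float, lo_hz: float, hi_hz: float) -> float:
    """Energy (dB) of the DFT bins of `frame` whose frequency lies in [lo_hz, hi_hz)."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        if lo_hz <= k * sample_rate / n < hi_hz:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
            im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
            energy += re * re + im * im
    return 10.0 * math.log10(energy + 1e-12)  # small floor avoids log(0)


def spectral_tilt(frame: Sequence[float], sample_rate: float, split_hz: float = 1000.0) -> float:
    """High-band minus low-band energy in dB (assumed measure): larger means a flatter spectrum."""
    low = band_energy_db(frame, sample_rate, 0.0, split_hz)
    high = band_energy_db(frame, sample_rate, split_hz, sample_rate / 2 + 1.0)
    return high - low


def extend_features(mfccs: Sequence[float], frame: Sequence[float], sample_rate: float) -> List[float]:
    """Append the tilt value to an MFCC vector computed elsewhere."""
    return list(mfccs) + [spectral_tilt(frame, sample_rate)]
```

Under this measure, a frame with relatively more high-frequency energy (as the properties slide attributes to stressed syllables) gets a larger tilt value.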
Modelling duration
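[Figure: five-state left-to-right HMM; transition probabilities a12, a23, a34, a45, self-loop probabilities a22, a33, a44, and output distributions b2, b3, b4 on the emitting states 2–4]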
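In an HMM like this one, duration is modelled only implicitly through the self-loops: the time spent in state i follows a geometric distribution P(d) = a_ii^(d−1)(1 − a_ii), with mean 1/(1 − a_ii) frames. A small sketch, assuming (as the b2–b4 labels suggest) that only states 2–4 emit and that there are no skip transitions; the function names and the example self-loop values are illustrative:

```python
from typing import List


def state_duration_pmf(self_loop: float, frames: int) -> float:
    """P(a state is occupied for exactly `frames` frames) = a^(d-1) * (1 - a)."""
    return self_loop ** (frames - 1) * (1.0 - self_loop)


def expected_model_frames(self_loops: List[float]) -> float:
    """Expected frames spent in a left-to-right model without skips: sum of 1 / (1 - a_ii)."""
    return sum(1.0 / (1.0 - a) for a in self_loops)


# Hypothetical self-loops a22 = a33 = a44 = 0.5 give a mean of 6 frames.
print(expected_model_frames([0.5, 0.5, 0.5]))
```

The geometric distribution always peaks at one frame per state, which is a known weakness of this implicit duration model for phones whose durations cluster well above the minimum.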
Model — baseline
Feature extraction → Viterbi: phoneme level → Viterbi: word level → ‘alien’
HMMs (phoneme level): aI @ eI i l m n
Phoneme hypotheses: {@ eI} l {i aI} @ {m n}
Lexicon (word level): a /@/, alien /eIli@n/, lion /laI@n/
Model — stress-enabled
Feature extraction, Stress feature extraction → Viterbi: phoneme level → Duration analysis → Viterbi: word level → ‘a lion’
HMMs (phoneme level): aI "aI @ eI "eI i "i l "l m "m n "n
Phoneme hypotheses: {@ eI} {l "l} {"i aI "aI} @ {m n}
Lexicon (word level): a /@/, alien /"eI-li-@n/, lion /"laI-@n/
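The two model slides differ in what the word-level search can accept. A sketch of that step, assuming the phoneme hypotheses form a simple sequence of alternative sets (our reading of the diagrams: {@ eI} l {i aI} @ {m n} for the baseline, {@ eI} {l "l} {"i aI "aI} @ {m n} with stress), and that per-phoneme stress marks follow the lexicon convention above. It only enumerates the word sequences consistent with the hypotheses; a real Viterbi search would also rank them by acoustic and language-model scores. `segmentations` is a hypothetical helper.

```python
from typing import Dict, List, Set

Lattice = List[Set[str]]  # one set of competing phoneme hypotheses per position


def segmentations(lattice: Lattice, lexicon: Dict[str, List[str]], start: int = 0) -> List[List[str]]:
    """All word sequences whose concatenated pronunciations fit the lattice exactly."""
    if start == len(lattice):
        return [[]]
    results: List[List[str]] = []
    for word, phones in lexicon.items():
        end = start + len(phones)
        if end <= len(lattice) and all(p in lattice[start + i] for i, p in enumerate(phones)):
            for rest in segmentations(lattice, lexicon, end):
                results.append([word] + rest)
    return results


baseline_lattice = [{"@", "eI"}, {"l"}, {"i", "aI"}, {"@"}, {"m", "n"}]
baseline_lexicon = {"a": ["@"], "alien": ["eI", "l", "i", "@", "n"], "lion": ["l", "aI", "@", "n"]}

stress_lattice = [{"@", "eI"}, {"l", '"l'}, {'"i', "aI", '"aI'}, {"@"}, {"m", "n"}]
stress_lexicon = {"a": ["@"], "alien": ['"eI', "l", "i", "@", "n"], "lion": ['"l', '"aI', "@", "n"]}

print(segmentations(baseline_lattice, baseline_lexicon))  # both readings remain possible
print(segmentations(stress_lattice, stress_lexicon))      # only ‘a lion’ fits
```

Without stress marks both ‘alien’ and ‘a lion’ fit the phonemes, so the choice rests on the scores; with stress marks, ‘alien’ would need a stressed /"eI/, which is not among the hypotheses in this example.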
Model — implemented
Feature extraction, Stress feature extraction → Viterbi: phoneme level → Viterbi: word level → ‘a lion’
HMMs (phoneme level): aI "aI @ eI "eI i "i l "l m "m n "n
Phoneme hypotheses: {@ eI} {l "l} {"i aI "aI} @ {m n}
Lexicon (word level): a /@/, alien /"eI-li-@n/, lion /"laI-@n/
System
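Tools shown: sox, HCopy, Praat, cvf, HCompV, HERest, HVite
Transcriptions, labelled in stress-marked phonemes: f o: "n "i: m I k t r A: n "s "k "r "I "p S n "z
HCompV and HERest produce the trained HMMs; HVite uses them to output the ‘recognised words’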
System
• Hidden Markov Model Toolkit (HTK)
• Corpus Gesproken Nederlands (Spoken Dutch Corpus)
• 772 recordings
• 54 842 files
• 775 034 words
• 53 hours
Time
Using 6 to 8 computers:
• Training: 1 – 8 hours per iteration
• Evaluation: 4 – 10 hours per iteration
• 60 training iterations for 2 recognisers
Experimental set-up
Conventional speech recogniser
Stress-enabled speech recogniser
Results — duration /"i:/–/i:/
[Figure: duration distributions of stressed /"i:/ and unstressed /i:/; horizontal axis 0–300, vertical axis 0–0.16]
Results — duration /"n/–/n/
[Figure: duration distributions of stressed /"n/ and unstressed /n/; horizontal axis 0–300, vertical axis 0–0.35]
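The duration plots compare how often stressed and unstressed tokens of a phoneme take each duration. A sketch of how such normalised histograms could be computed from per-token durations, assuming durations in milliseconds (the slides do not state the unit) and a hypothetical 10 ms bin width; the values in the usage line are toy numbers, not data from the thesis.

```python
from collections import Counter
from typing import Dict, List


def normalised_histogram(durations_ms: List[float], bin_ms: float = 10.0) -> Dict[float, float]:
    """Map each bin's lower edge to the fraction of tokens falling in that bin."""
    counts = Counter(int(d // bin_ms) * bin_ms for d in durations_ms)
    total = len(durations_ms)
    return {edge: n / total for edge, n in sorted(counts.items())}


def mean_shift(stressed_ms: List[float], unstressed_ms: List[float]) -> float:
    """Difference between mean stressed and mean unstressed duration."""
    return sum(stressed_ms) / len(stressed_ms) - sum(unstressed_ms) / len(unstressed_ms)


print(normalised_histogram([12.0, 18.0, 25.0]))  # toy durations
```

Normalising each histogram by its own token count lets the stressed and unstressed curves be compared on one axis even though the two classes occur with different frequencies.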
Results — spectral tilt /"a:/–/a:/
[Figure: spectral tilt distributions of stressed /"a:/ and unstressed /a:/; horizontal axis −15 to 30, vertical axis 0–0.16]
Results — spectral tilt /"d/–/d/
[Figure: spectral tilt distributions of stressed /"d/ and unstressed /d/; horizontal axis −20 to 30, vertical axis 0–0.1]
Results — training
[Figure: Recognition accuracy (%) against Training iteration (0–60); vertical axis ticks at 0 and 25–45 %]
Results — Recognition improvement
Conventional speech recogniser: word error rate 56.72 %
Stress-enabled speech recogniser: word error rate 55.27 %
An absolute reduction of 1.45 percentage points, or a 2.6 % relative improvement.
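Word error rate counts the substitutions, deletions and insertions in a minimum-edit alignment of the recognised words against the reference, divided by the number of reference words. The sketch below computes it with a word-level Levenshtein distance and checks the relative improvement above: (56.72 − 55.27) / 56.72 ≈ 0.0256, about 2.6 %. The function names are illustrative; the scoring tool used in the thesis is not shown on the slides.

```python
from typing import List


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum substitutions + deletions + insertions (Levenshtein distance over words)."""
    prev = list(range(len(hypothesis) + 1))  # row for an empty reference prefix
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                           # deletion of ref_word
                curr[j - 1] + 1,                       # insertion of hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    return word_errors(reference, hypothesis) / len(reference)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    return (baseline_wer - new_wer) / baseline_wer


print(round(100 * relative_improvement(56.72, 55.27), 1))  # 2.6
```

Because insertions count as errors, the word error rate can exceed 100 %, which is why it is reported alongside, rather than as the complement of, a plain "words correct" percentage.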
Conclusion
Using lexical stress in an automatic speech recogniser for continuous, large-vocabulary speech can improve the recognition rate.
• Consonants
Future work
• Model duration
• Model phrasal stress