Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic...
![Page 1: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/1.jpg)
Institute of Information Science, Academia Sinica
12 July, 2011 @ IIS, Academia Sinica
Automatic Detection-based Phone Recognition on TIMIT
Hung-Shin Lee (李鴻欣)
Based on Chen and Wang in ISCSLP’08 and Interspeech’09
![Page 2: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/2.jpg)
Page-2
Detection-Based ASR
• Human SR: Knowledge Detection → Integration → Knowledge (Higher Level)
• DB ASR: Detectors → Integrator → Results
• Knowledge Detection / Detectors
  – Phonological attr.
  – Prosodic attr.
  – Acoustic attr.
  – …
• Integration / Integrator
  – HMM
  – CRF
  – …
• Knowledge (Higher Level) / Results
  – Phone
  – Syllable
  – Word
  – Sentence
  – Semantic info
  – …
![Page 3: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/3.jpg)
Page-3
Phonological Systems

| Phonological System | SPE (Sound Pattern of English) | MV (Multi-valued Feature) | GP (Government Phonology) |
|---|---|---|---|
| Literature | (N. Chomsky & M. Halle, 1968) | (S. King, 2000)? | (J. Harris, 1994) |
| Feature Types | Production-based, binary | Production-based, 2-10 values | Sound structure primes, binary |
| Feature Number | 13 | 6 | 11 |
| Examples | anterior, nasal, round | centrality, front-back, manner, phonation, place, roundness | |
![Page 4: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/4.jpg)
Page-4
Phonological Feature Detection (1)
• Input layer: 13 MFCCs × 9 frames (i-4 … i+4, centered on frame i)
• MLP (Detectors): hidden layer; recurrent, time-delay
• Output: posterior probability per feature, followed by quantization
  – SPE_14: 0101...01
  – GP_11: 011..01
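The input and output stages on this slide can be sketched in a few lines. This is only an illustration, not the Nico Toolkit setup used in the talk: the function names `stack_context` and `quantize`, the edge handling (clamping to the first or last frame) and the 0.5 threshold are all assumptions. The MLP itself is left out.

```python
from typing import List, Sequence

Frame = Sequence[float]


def stack_context(frames: Sequence[Frame], left: int = 4, right: int = 4) -> List[List[float]]:
    """For each frame i, concatenate frames i-left .. i+right into one vector.

    Indices outside the utterance are clamped to the first/last frame
    (an assumption; the slide does not say how edges are handled).
    """
    last = len(frames) - 1
    stacked: List[List[float]] = []
    for i in range(len(frames)):
        row: List[float] = []
        for offset in range(-left, right + 1):
            j = min(max(i + offset, 0), last)
            row.extend(frames[j])
        stacked.append(row)
    return stacked


def quantize(posteriors: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map detector posteriors to a 0/1 vector (the threshold value is assumed)."""
    return [1 if p >= threshold else 0 for p in posteriors]
```

With 13 MFCCs per frame and a window of 9 frames, each stacked vector has 9 × 13 = 117 values.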
![Page 5: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/5.jpg)
Page-5
Phonological Feature Detection (2)
• Input: 13 MFCCs × 9 frames (i-4 … i+4, centered on frame i)
• One time-delay MLP per MV feature (6 MV features), e.g.
  – MLP (Centrality): 0100
  – MLP (Front-Back): 100
  – MLP (Roundness): 010
• The per-feature outputs are concatenated into MV_29: 0100100.........010
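A minimal sketch of how per-feature MLP outputs could be turned into one concatenated vector like the MV_29 code shown on the slide. The hard arg-max decision, the function names and the feature keys are assumptions for illustration; only three of the six MV features are shown because the slide gives example codes for just those three.

```python
from typing import Dict, List, Sequence


def one_hot_argmax(posteriors: Sequence[float]) -> List[int]:
    """Return a one-hot vector marking the most probable value of one feature."""
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return [1 if k == best else 0 for k in range(len(posteriors))]


def mv_vector(per_feature: Dict[str, Sequence[float]], order: Sequence[str]) -> List[int]:
    """Concatenate the one-hot codes of all multi-valued features in a fixed order."""
    out: List[int] = []
    for name in order:
        out.extend(one_hot_argmax(per_feature[name]))
    return out


# Illustrative posteriors (made up) that reproduce the slide's 0100 / 100 / 010 codes.
example = {
    "centrality": [0.1, 0.7, 0.1, 0.1],
    "front_back": [0.8, 0.1, 0.1],
    "roundness": [0.2, 0.5, 0.3],
}
code = mv_vector(example, ["centrality", "front_back", "roundness"])
```

Keeping a fixed feature order matters because the CRF sees only bit positions, not feature names.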
![Page 6: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/6.jpg)
Page-6
Conditional Random Field (CRF) Integrator
• General Chain CRF

$$p(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z(\mathbf{x})}\exp\Big(\sum_i\Big[\sum_j \lambda_j\, s_j(y_i,\mathbf{x}) + \sum_k \mu_k\, t_k(y_{i-1},y_i,\mathbf{x})\Big]\Big)$$

• s_j: state feature function; t_k: transition feature function
• λj, μk : feature function weight parameters
• Input X (phonological features): …, xi-1, xi, xi+1, …
• Output Y (phone): …, yi-1, yi, …
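To make the chain CRF on this slide concrete, the sketch below computes log p(y|x) from precomputed scores. It assumes the weighted state feature functions have already been summed into `state[i][label]` and the weighted transition feature functions into a position-independent table `trans[prev][cur]`; both names and that simplification are assumptions, and this is not how CRF++ is invoked. The normalizer Z(x) is obtained with the forward algorithm.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(state: List[List[float]], trans: List[List[float]], y: Sequence[int]) -> float:
    """Unnormalized log-score of label sequence y: state scores plus transition scores."""
    score = state[0][y[0]]
    for i in range(1, len(y)):
        score += trans[y[i - 1]][y[i]] + state[i][y[i]]
    return score


def log_partition(state: List[List[float]], trans: List[List[float]]) -> float:
    """log Z(x), summing over all label sequences with the forward algorithm."""
    n_labels = len(trans)
    alpha = list(state[0])
    for i in range(1, len(state)):
        alpha = [
            state[i][b] + logsumexp([alpha[a] + trans[a][b] for a in range(n_labels)])
            for b in range(n_labels)
        ]
    return logsumexp(alpha)


def log_prob(state: List[List[float]], trans: List[List[float]], y: Sequence[int]) -> float:
    """log p(y | x) for a linear-chain CRF."""
    return sequence_score(state, trans, y) - log_partition(state, trans)
```

A quick sanity check is that exp(log_prob) summed over every possible label sequence equals 1.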
![Page 7: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/7.jpg)
Page-7
CRF Integrator – Training Issues
• Required Label for CRF Training
  – Phone: y
  – Phonological features: x
• Detected-data trained CRF (DT CRF)
  – Training Data → Speech → MLP Detectors → Phonological features (with errors)
  – Phonological features (with errors) + Phone labels → DT CRF
• Oracle-data trained CRF (OT CRF)
  – Training Data → Phone labels → Mapping (phones → phonological features) → Phonological features
  – Phonological features + Phone labels → OT CRF
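The difference between the two training setups lies only in where the features x come from. The sketch below shows the oracle side: features are looked up from the phone labels through a mapping table, so they contain no detection errors. The table `TOY_TABLE` is a made-up three-bit example, not the mapping used in the talk, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Features = Tuple[int, ...]

# Hypothetical toy mapping table; the bit values carry no phonological meaning.
TOY_TABLE: Dict[str, Features] = {
    "h#": (0, 0, 0),
    "n": (1, 1, 0),
    "eh": (0, 0, 1),
}


def oracle_features(frame_phones: Sequence[str], table: Dict[str, Features]) -> List[Features]:
    """Oracle data: derive per-frame features directly from the phone labels."""
    return [table[p] for p in frame_phones]


def training_pairs(features: Sequence[Features], frame_phones: Sequence[str]) -> List[Tuple[Features, str]]:
    """Pair each feature vector x with its phone label y for CRF training."""
    if len(features) != len(frame_phones):
        raise ValueError("features and phone labels must have the same length")
    return list(zip(features, frame_phones))


phones = ["h#", "n", "n", "eh"]
ot_data = training_pairs(oracle_features(phones, TOY_TABLE), phones)
```

For detected-data training, the same `training_pairs` call would instead receive the quantized detector outputs for the training speech, errors included.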
![Page 8: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/8.jpg)
Page-8
Experiments
• Corpus: TIMIT
  – No SA1, SA2
  – Training set (3296 utts), Dev set (400 utts)
  – Test set (1344 utts)
• Phone set: TIMIT61
  – Evaluation: CMU/MIT 39
• Baseline
  – CI-HMM
• Toolkits
  – Nico Toolkit (for MLP), CRF++ (for CRF)
![Page 9: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/9.jpg)
Page-9
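Results (1)

Model: OT CRF, Test: OD Features

| Features | Phone Corr. % | Phone Acc. % |
|---|---|---|
| SPE14 | 93.28 | 93.20 |
| GP11 | 98.39 | 98.36 |
| MV29 | 88.75 | 88.56 |

Model: OT/DT CRF, Test: DD Features

| Model | Features | Phone Corr. % | Phone Acc. % |
|---|---|---|---|
| HMM-baseline | | 69.02 | 63.45 |
| OT CRF | SPE14 | 66.19 | 29.68 |
| OT CRF | GP11 | 69.03 | 31.38 |
| OT CRF | MV29 | 59.24 | 30.33 |
| DT CRF | SPE14 | 56.56 | 55.27 |
| DT CRF | GP11 | 55.74 | 54.53 |
| DT CRF | MV29 | 51.84 | 50.68 |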
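The large gap between Phone Corr. and Phone Acc. for the OT CRF rows is easier to read with the usual definitions in mind: Corr = H / N and Acc = (H - I) / N, where N is the number of reference phones, H the number of hits and I the number of insertions. The sketch below counts these with a plain minimum-edit-distance alignment using unit costs. That cost choice and the function names are assumptions, so the numbers need not match the scoring tool used for the slides.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) for ref vs. hyp."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (edit distance, H, S, D, I) for ref[:i] against hyp[:j]
    cost = [[(0, 0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, 0, i, 0)
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, h, s, dl, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (d, h + 1, s, dl, ins)
            else:
                diag = (d + 1, h, s + 1, dl, ins)
            d, h, s, dl, ins = cost[i - 1][j]
            delete = (d + 1, h, s, dl + 1, ins)
            d, h, s, dl, ins = cost[i][j - 1]
            insert = (d + 1, h, s, dl, ins + 1)
            cost[i][j] = min(diag, delete, insert, key=lambda c: c[0])
    return cost[n][m][1:]


def corr_acc(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Phone correct rate and phone accuracy, both in percent."""
    if not ref:
        raise ValueError("reference must not be empty")
    hits, _subs, _dels, ins = align_counts(ref, hyp)
    n = len(ref)
    return 100 * hits / n, 100 * (hits - ins) / n
```

A hypothesis that contains every reference phone plus many extra ones keeps Corr high while Acc falls, which is the pattern the OT CRF rows show on detected features.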
![Page 10: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/10.jpg)
Page-10
Results (2): System Fusion

| Methods | # Systems | Phone Corr. (%) | Phone Acc. (%) |
|---|---|---|---|
| HMM baseline | 1 | 69.02 | 63.45 |
| OT: SPE+GP+MV | 3 | 61.97 | 60.65 |
| DT: SPE+GP+MV | 3 | 52.90 | 52.06 |
| OT+DT: SPE+GP+MV | 6 | 60.81 | 59.20 |
| OT: SPE+GP+MV +HMM | 4 | 65.53 | 64.31 |
| DT: SPE+GP+MV +HMM | 4 | 59.57 | 58.64 |
| OT+DT: SPE+GP+MV +HMM | 7 | 64.22 | 62.59 |
![Page 11: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/11.jpg)
Page-11
System Fusion with CRF
• Input X (phone sequences from each system): …, xi-1, xi, xi+1, …
  – SPE Sys.
  – MV Sys.
  – GP Sys.
  – HMM Sys.
• Output Y (combined results, phone): …, yi-1, yi, …
• Feature function weights: λj, μk
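One way to feed several systems' phone outputs to CRF++ is to write them as columns of its training file, one position per line, with the reference phone in the last column and a blank line ending each utterance. The sketch below builds such rows plus a matching feature template. It assumes all system outputs are already aligned position by position (for example per frame), which the slide does not state, and the helper names are hypothetical.

```python
from typing import List, Sequence


def to_crfpp_rows(system_outputs: Sequence[Sequence[str]], labels: Sequence[str]) -> List[str]:
    """Build CRF++ data rows: one column per system, reference label last."""
    n = len(labels)
    if any(len(s) != n for s in system_outputs):
        raise ValueError("all systems must be aligned to the same number of positions")
    rows = [" ".join([*(s[i] for s in system_outputs), labels[i]]) for i in range(n)]
    rows.append("")  # a blank line separates sequences in CRF++ data files
    return rows


def make_template(n_systems: int) -> str:
    """Feature template: one unigram feature per system column, plus label bigrams."""
    lines = [f"U{c:02d}:%x[0,{c}]" for c in range(n_systems)]
    lines.append("B")
    return "\n".join(lines) + "\n"


# Made-up outputs of four systems for a three-position utterance.
spe = ["h#", "n", "eh"]
mv = ["h#", "n", "ae"]
gp = ["h#", "n", "eh"]
hmm = ["h#", "m", "eh"]
rows = to_crfpp_rows([spe, mv, gp, hmm], ["h#", "n", "eh"])
```

Adding neighbouring rows to the template (for example `%x[-1,0]`) would give the fusion CRF some context, at the cost of more features.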
![Page 12: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/12.jpg)
Page-12
Two Types of AFDT Imperfection
• Phone: h# n eh ow kcl k w eh ae eh s tcl t ix n
• [Figure: AF tracks AF(A) and AF(A’) plotted against the phone sequence]
• The two types: AF asynchrony and AFDT errors
![Page 13: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/13.jpg)
Page-13
CRF Training (1)
• Oracle Data Training
  – Phone → Mapping Table → AFs
  – CRF trained on Phone y and AFs x over time t
• Detected Data Training
  – AFDT → AFs (with Detected Errors)
  – CRF trained on Phone y and AFs x over time t
![Page 14: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/14.jpg)
Page-14
CRF Training (2)
• Aligned Data Training
  – AFDT → AF Sequence
  – CRF trained on Phone y and AFs x over time t
![Page 15: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/15.jpg)
Page-15
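Results (3)

| Condition | System | Phone Corr. (%) | Phone Acc. (%) |
|---|---|---|---|
| Upper Bound | OT CRF | 98.31 | 98.28 |
| Upper Bound | AT CRF | 71.49 | 70.31 |
| Real Case | OT CRF | 70.55 | 34.38 |
| Real Case | DT CRF | 57.30 | 56.14 |
| Real Case | AT CRF | 64.87 | 62.32 |

• Introducing AF asynchrony lowers accuracy by 27.97 points (98.28 → 70.31)
• Detection errors cause a further 7.99-point accuracy drop (70.31 → 62.32)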
![Page 16: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/16.jpg)
Page-16
AF Asynchrony Compensation
• AF asynchrony is caused by context variation
• We can reduce AF asynchrony by letting our systems learn context variation directly
  – Long-Term information
• [Figure: long-term feature extraction. Labels: 23 dim Mel, 310ms, Left Context, Right Context, Windows + DCTs (×2), MLP (×3), 144Dim, 72Dim (×3)]
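The "Windows + DCTs" stage for left and right context can be sketched as follows: for each Mel band, the temporal trajectory on one side of the centre frame is weighted by half of a window and compressed with a DCT. Everything specific here is an assumption rather than a detail from the slides: the 10 ms frame shift (so 31 frames span 310 ms), the Hamming window, the number of DCT coefficients kept, and the function names. The resulting dimensions are therefore not meant to match the 72 or 144 shown in the figure.

```python
import math
from typing import List, Sequence, Tuple


def half_hamming(length: int, rising: bool) -> List[float]:
    """One half of a Hamming window of size 2*length - 1, peaking at the centre frame."""
    full = 2 * length - 1
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (full - 1)) for n in range(full)]
    return w[:length] if rising else w[length - 1:]


def dct2(x: Sequence[float], n_coef: int) -> List[float]:
    """First n_coef DCT-II coefficients (unnormalized)."""
    size = len(x)
    return [
        sum(x[n] * math.cos(math.pi / size * (n + 0.5) * k) for n in range(size))
        for k in range(n_coef)
    ]


def context_features(
    mel: Sequence[Sequence[float]], center: int, half: int = 15, n_coef: int = 3
) -> Tuple[List[float], List[float]]:
    """Return (left, right) long-term vectors around frame `center`."""
    if half < 1:
        raise ValueError("half must be at least 1")
    if center - half < 0 or center + half >= len(mel):
        raise ValueError("not enough context around the centre frame")
    left_w = half_hamming(half + 1, rising=True)
    right_w = half_hamming(half + 1, rising=False)
    left: List[float] = []
    right: List[float] = []
    for b in range(len(mel[0])):
        # Trajectory of band b over the frames before / after the centre frame.
        l_traj = [mel[center - half + n][b] * left_w[n] for n in range(half + 1)]
        r_traj = [mel[center + n][b] * right_w[n] for n in range(half + 1)]
        left.extend(dct2(l_traj, n_coef))
        right.extend(dct2(r_traj, n_coef))
    return left, right
```

In a split-context setup, each of the two vectors would go to its own MLP, and their outputs would then be combined.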
![Page 17: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/17.jpg)
Page-17
Results (4)

| Test Data Type | System | Corr | Acc |
|---|---|---|---|
| - | CI-HMM | 69.02 | 63.45 |
| - | CD-HMM | 75.76 | 65.78 |
| Detected (real case) | OT CRF (±3) | 75.24 | 47.97 |
| Detected (real case) | Long Term AFDT + DT CRF (±3) | 64.58 | 63.12 |
| Ideal (upper bound) | Long Term AFDT + AT CRF | 74.96 | 73.64 |
| Ideal (upper bound) | MFCC AFDT + AT CRF (±3) | 72.87 | 71.62 |
| Ideal (upper bound) | Long Term AFDT + AT CRF (±3) | 76.83 | 74.97 |
| Detected (real case) | Long Term AFDT + AT CRF | 69.83 | 66.97 |
| Detected (real case) | MFCC AFDT + AT CRF (±3) | 66.21 | 63.16 |
| Detected (real case) | Long Term AFDT + AT CRF (±3) | 71.01 | 67.67 |
![Page 18: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/18.jpg)
Page-18
Conclusions
• A well-designed phonological feature system is important
  – AF asynchrony minimization training and AF-phone synchronization could also be investigated
• The oracle-trained CRF is able to retrieve more phonological information from speech
  – High phone correct rate (but sensitive to detection errors)
  – Helpful for combination
• Detection-based ASR is promising
  – The front-end detector is a major issue
![Page 19: Institute of Information Science, Academia Sinica 12 July, 2011 @ IIS, Academia Sinica Automatic Detection-based Phone Recognition on TIMIT Hung-Shin Lee.](https://reader035.fdocuments.us/reader035/viewer/2022062305/5697bfae1a28abf838c9c6c7/html5/thumbnails/19.jpg)
Page-19
AF and Phone Alignment Using AFDT
• [Figure: phone sequence and AF sequence aligned along the time axis t]