Chinese Spoken Dialogue Systems
Speech Activities in CST

Thomas Fang Zheng
Center of Speech Technology (CST)
State Key Lab of Intelligent Technology and Systems
Department of Computer Science & Technology, Tsinghua University
http://sp.cs.tsinghua.edu.cn/~fzheng/
30 Oct. 2001, Communications Research Lab, Kyoto

Center of Speech Technology, Tsinghua University, Slide 2
Outline
- Brief Introduction to CST
- Speech R&D Activities (with paper references)
- A Flight Spoken Dialogue System: EasyFlight
  - System Overview
  - Keyword-Based Robust Parser
  - Powerful Dialogue Manager
- Demonstrations
  - EasyFlight: flight inquiry & reservation dialogue system
  - EasyNav: THU campus navigation dialogue system
- Thanks

Center of Speech Technology
- Founded in 1979 as the Speech Laboratory
- Joined the State Key Laboratory of Intelligent Technology and Systems in 1999 and was renamed the Center of Speech Technology
- http://sp.cs.tsinghua.edu.cn/

Members of CST in 2001
Figure: Members of CST in 2001 by category (Faculty, Post Doctors, Doctoral Students, Master Students; values shown: 5, 1, 13, 7).

Funding Resources
- State fundamental research plans:
  - NSF
  - 863
  - 973
  - 985 (Tsinghua University)
- Collaboration with industry:
  - Analog Devices, Inc.
  - IBM
  - Intel
  - Keysun Information Technology Limited
  - Lucent Technologies
  - Microsoft
  - Nokia
  - SoundTek Technology Limited
  - Weniwen Technologies Limited
  - ...

Speech R&D Activities
- Acoustic Modeling
  - Feature Extraction and Selection
  - Acoustic Modeling
  - Accurate & Fast AM Search
  - Robustness
    - Speech Enhancement
    - Fractals
    - Speaker Adaptation
    - Speaker Normalization
    - Chinese Pronunciation Modeling
- Language Modeling
  - Characteristics of Chinese
  - Language Modeling and Search
  - LM Adaptation & New Word Induction
- Natural/Spoken Language Understanding (NLU/SLU)
  - NLU: GLR-based parsing
  - SLU: keyword-based robust parsing
  - Dialogue Manager
- Applications
  - Command and control
  - Keyword spotting
  - Language learning
  - Input method editor
  - Chinese dictation machine
  - Spoken dialogues
  - Speaker identification and verification
- Resources

Feature Extraction and Selection
1. Fan Wang, Fang Zheng, and Wenhu Wu, “An MCE Based Classification Tree Using Hierarchical Feature-Weighting in Speech Recognition,” EuroSpeech’01, 3:1947-1950, Sept. 3-7, 2001, Aalborg, Denmark
2. Xinyan Zhang, “Subband analysis based robust speech recognition,” Graduate Project, Tsinghua University, Beijing, June 2001

Acoustic Modeling
1. Jiyong Zhang, Fang Zheng, Jing Li, Chunhua Luo, and Guoliang Zhang, “Improved Context-Dependent Acoustic Modeling for Continuous Chinese Speech Recognition,” EuroSpeech’01, 3:1617-1620, Sept. 3-7, 2001, Aalborg, Denmark
2. Fang Zheng, Wenhu Wu, and Ditang Fang, “Center-Distance Continuous Probability Models and the Distance Measure,” J. of Computer Science and Technology, 13(5): 426-437, Sept. 1998
3. Fang Zheng, Xiaolong Mou, Wenhu Wu, and Ditang Fang, “On the Embedded Multiple-Model Scoring Scheme for Speech Recognition,” International Symposium on Chinese Spoken Language Processing (ISCSLP’98), ASR-A3, pp. 49-53, Dec. 7-9, 1998, Singapore
4. Qing Guo, Fang Zheng, Jian Wu, and Wenhu Wu, “A new method used in HMM for modeling frame correlation,” ICASSP’99, pp. I-169~172, March 15-19, 1999, Phoenix

Accurate & Fast AM Search
1. Guoliang Zhang, Fang Zheng, and Wenhu Wu, “A Two-Layer Lexical Tree Based Beam Search in Continuous Chinese Speech Recognition,” EuroSpeech’01, 3:1801-1804, Sept. 3-7, 2001, Aalborg, Denmark
2. Jian Wu and Fang Zheng, “Reducing time-synchronous beam search effort using stage based look-ahead and language model rank based pruning,” ICSLP’00, pp. IV-262~265
3. Zhanjiang Song, Fang Zheng, and Wenhu Wu, “Statistical knowledge based frame synchronous search strategies in continuous speech recognition,” ICASSP’00, pp. III-1583~1586
4. Jiyong Zhang, Fang Zheng, Shu Du, Zhanjiang Song, and Mingxing Xu, “Merging based syllable detection automaton in continuous Chinese speech recognition,” J. of Software, 10(11): 1212~1215, Nov. 1999 (in Chinese)
5. Fang Zheng, Zhanjiang Song, Mingxing Xu, et al., “EasyTalk: A Large-Vocabulary Speaker-Independent Chinese Dictation Machine,” EuroSpeech’99, Vol. 2, pp. 819-822, Sept. 1999, Budapest, Hungary
6. Fang Zheng, Mingxing Xu, and Wenhu Wu, “Search strategies in continuous speech recognition,” 5th National Conference on Man-Machine Speech Communications (NCMMSC’98), pp. 138-143, Jul. 26-31, 1998, Harbin (in Chinese)

Speech Enhancement
1. Dali Yang, Mingxing Xu, Wenhu Wu, and Fang Zheng, “A Noise Cancellation Method Based on Wavelet Transform,” International Symposium on Chinese Spoken Language Processing (ISCSLP’00), pp. 211-214, Oct. 13-15, 2000, Beijing

Fractals
1. Fan Wang, Fang Zheng, and Wenhu Wu, “A C/V segmentation method for Mandarin speech based on multiscale fractal dimension,” International Conference on Spoken Language Processing (ICSLP’00), pp. IV-648~651, Oct. 16-20, 2000, Beijing
2. Fan Wang, Fang Zheng, and Wenhu Wu, “A self-adapting endpoint detection algorithm for speech recognition in noisy environments based on 1/f process,” ISCSLP’00, pp. 327-330, Oct. 13-15, 2000, Beijing

Speaker Adaptation
1. Lei He, Jian Wu, Ditang Fang, and Wenhu Wu, “Speaker adaptation based on combination of MAP estimation and weighted neighbor regression,” IEEE ICASSP’00, pp. II-981~984, June 5-9, 2000, Istanbul, Turkey

Speaker Normalization
1. Lei He, Ditang Fang, and Wenhu Wu, “Speaker normalization training and adaptation for speech recognition,” ICSLP’00, pp. IV-342~345, Oct. 16-20, 2000, Beijing
2. Tranzai Lee, Fang Zheng, and Wenhu Wu, “Reference point alignment frequency warp method for speaker adaptation,” International Conference on Signal Processing, pp. II-756~759, Aug. 21-25, 2000, Beijing

Chinese Pronunciation Modeling
1. Fang Zheng, Zhanjiang Song, Pascale Fung, and William Byrne, “Mandarin Pronunciation Modeling Based on CASS Corpus,” Sino-French Symposium on Speech and Language Processing, pp. 47-53, Oct. 16, 2000, Beijing
2. Pascale Fung, William Byrne, Thomas Fang Zheng, Terri Kamm, Yi Liu, Zhanjiang Song, Veera Venkataramani, and Umar Ruhi, “Pronunciation modeling of Mandarin casual speech,” Workshop 2000 on Speech and Language Processing: Final Report for MPM Group, http://www.clsp.jhu.edu/index.shtml
3. Fang Zheng, Zhanjiang Song, Pascale Fung, and William Byrne, “Modeling Pronunciation Variation Using Context-Dependent Weighting and B/S Refined Acoustic Modeling,” EuroSpeech’01, 1:57-60, Sept. 3-7, 2001, Aalborg, Denmark

Language Modeling
1. Jian Wu and Fang Zheng, “On enhancing Katz-smoothing based back-off language model,” ICSLP’00, pp. I-198~201, Oct. 16-20, 2000, Beijing
2. Xiaolong Mou, Jinming Zhan, Fang Zheng, and Wenhu Wu, “The N-Gram Language Model Based on the Back-off Estimation Algorithm,” 5th National Conference on Man-Machine Speech Communications (NCMMSC’98), pp. 206-209, July 26-31, 1998, Harbin (in Chinese)

LM Search
1. Fang Zheng, Jian Wu, and Zhanjiang Song, “Improving the Syllable-Synchronous Network Search Algorithm for Word Decoding in Continuous Chinese Speech Recognition,” J. of Computer Science and Technology, 15(5): 461-471, Sept. 2000
2. Fang Zheng, “A Syllable-Synchronous Network Search Algorithm for Word Decoding in Chinese Speech Recognition,” ICASSP’99, pp. II-601~604, March 15-19, 1999, Phoenix
3. Fang Zheng, Jian Wu, and Wenhu Wu, “Input Chinese sentences using digits,” ICSLP’00, pp. III-127~130, Oct. 16-20, 2000, Beijing

LM Adaptation & New Word Induction
1. Genqing Wu, Fang Zheng, Ling Jin, and Wenhu Wu, “An online incremental language model adaptation,” EuroSpeech’01, 3:2139-2142, Sept. 3-7, 2001, Aalborg, Denmark

NLU: GLR-Based Parsing
1. Yinfei Huang, Fang Zheng, Yi Su, Fang Li, and Wenhu Wu, “A Theme Structure Method for the Ellipsis Resolution,” EuroSpeech’01, 3:2153-2156, Sept. 3-7, 2001, Aalborg, Denmark
2. Yi Su, Fang Zheng, and Yinfei Huang, “Design of a Semantic Parser with Support to Ellipsis Resolution in a Chinese Spoken Language Dialogue System,” EuroSpeech’01, 3:2161-2164, Sept. 3-7, 2001, Aalborg, Denmark
3. Yinfei Huang, Fang Zheng, Mingxing Xu, Pengju Yan, and Wenhu Wu, “Language understanding component for Chinese dialogue system,” ICSLP’00, pp. III-1053~1056, Oct. 16-20, 2000, Beijing
4. Pengju Yan, Fang Zheng, Mingxing Xu, and Yinfei Huang, “Word-class stochastic model in a spoken language dialogue system,” ISCSLP’00, pp. 141-144, Oct. 13-15, 2000, Beijing

SLU: Keyword-Based Robust Parsing
1. Pengju Yan, Fang Zheng, Hui Sun, and Mingxing Xu, “Parsing spontaneous speech in the dialogue systems,” EuroSpeech’01, 3:2149-2152, Sept. 3-7, 2001, Aalborg, Denmark

Dialogue Manager (DM)
1. Xiaojun Wu, Fang Zheng, and Mingxing Xu, “TOPIC Forest: A plan-based dialogue management structure,” ICASSP’01, Vol. I, May 7-11, 2001, Salt Lake City, USA
2. Fang Li, Fang Zheng, Wenhu Wu, and Yinfei Huang, “Dynamic Query Organization and Response Generation in Spoken Dialogue System,” 19th International Conference on Computer Processing of Oriental Languages, May 14-16, Seoul, Korea

Applications & References
- Command and control
  - Fang Zheng, Qixiu Hu, Xiang Deng, et al., “An introduction to a kind of voice dialers for dummies,” 4th National Conference on Man-Machine Speech Communications (NCMMSC’96), pp. 165-168, Oct. 1996, Beijing (in Chinese)
  - Yinfei Huang, Fang Zheng, and Wenhu Wu, “EasyCmd: Navigation by Voice Commands,” International Symposium on Chinese Spoken Language Processing (ISCSLP’00), pp. 145-148, Oct. 13-15, 2000, Beijing

- Keyword spotting
  - Fang Zheng, Mingxing Xu, Xiaolong Mou, et al., “HarkMan - A Vocabulary-Independent Keyword Spotter for Spontaneous Chinese Speech,” J. of Computer Science and Technology (JCST), 14(1): 18-26, Jan. 1999

- Language learning (pronunciation scoring)
  - Zhanjiang Song, Fang Zheng, Mingxing Xu, and Wenhu Wu, “An Effective Scoring Method for Speaking Skill Evaluation System,” EuroSpeech’99, Vol. 1, pp. 187-190, Sept. 1999, Budapest, Hungary

- Input method editor (IME)
  - Fang Zheng, Jian Wu, and Wenhu Wu, “Input Chinese sentences using digits,” International Conference on Spoken Language Processing (ICSLP’00), pp. III-127~130, Oct. 16-20, 2000, Beijing
  - Ling Jin, Genqing Wu, Fang Zheng, and Wenhu Wu, “Improved strategies for intelligent sentence input method engine system,” International Symposium on Chinese Spoken Language Processing (ISCSLP’00), pp. 247-250, Oct. 13-15, 2000, Beijing

- Chinese dictation machine (CDM)
  - Fang Zheng, Zhanjiang Song, Mingxing Xu, et al., “EasyTalk: A Large-Vocabulary Speaker-Independent Chinese Dictation Machine,” EuroSpeech’99, Vol. 2, pp. 819-822, Sept. 1999, Budapest, Hungary
  - Jian Wu and Fang Zheng, “Reducing time-synchronous beam search effort using stage based look-ahead and language model rank based pruning,” ICSLP’00, pp. IV-262~265

- Spoken dialogues
  - Yinfei Huang, Fang Zheng, Mingxing Xu, et al., “Language understanding component for Chinese dialogue system,” ICSLP’00, pp. III-1053~1056, Oct. 16-20, 2000, Beijing
  - Pengju Yan, Fang Zheng, Mingxing Xu, et al., “Word-class stochastic model in a spoken language dialogue system,” ISCSLP’00, pp. 141-144, Oct. 13-15, 2000, Beijing
  - Pengju Yan, Fang Zheng, Hui Sun, et al., “Parsing spontaneous speech in the dialogue systems,” EuroSpeech’01, 3:2149-2152, Sept. 3-7, 2001, Aalborg, Denmark
  - Xiaojun Wu, Fang Zheng, and Mingxing Xu, “TOPIC Forest: A plan-based dialogue management structure,” ICASSP’01, Vol. I, May 7-11, 2001, Salt Lake City, USA

- Speaker identification and verification
- Language identification
- …

Resources
- Chinese speech databases
  - Standard Chinese (25 CD-ROMs)
  - Chinese with Yue accent (41 CD-ROMs)
  - Real-world spontaneous telephone dialogues (200 hours)
  - Chinese Annotated Spontaneous Speech (CASS) corpus (6 hours)
  - 863 Speech Recognition Database (40 CD-ROMs)
  - 863 Speech Synthesis Database (8 CD-ROMs)
- Chinese text database
  - People’s Daily

EasyFlight
A Spoken Dialogue System for
Flight Information Inquiry and Flight Reservation

System Overview
- EasyFlight is a spoken dialogue system providing
  - flight information inquiry, and
  - flight reservation.
- EasyFlight features:
  - Context-dependent understanding (with a remembering-and-forgetting scheme to support ellipsis (省略))
  - Robust parsing (to handle spoken-language phenomena)
  - Changeable topics (users can shift freely among topics)
  - Mixed initiative (混合主导): both the user and the machine can steer the conversation at any time

Figure: System block diagram. Modules: Keyword Spotter, Syntactic Analyzer, Semantic Analyzer, Dialogue Manager, Response Generator, Text-to-Speech, Domain Database, and Dialog History & Status Maintenance. Data passed between modules: user utterance, keyword lattice, syntax tree, semantic frame, contexts, dynamic vocabulary, dynamic rule set, inquiry & update, results, response focus, text response, texts/tags, and speech response.

Keyword-Based Robust Parser
- We use a keyword-based parser with a context-free grammar (CFG) for spoken language understanding.
  - The grammar symbols are semantically relevant items.
- Why keywords?
- Why grammar?
- How do we do it?

- Why keywords?
  - In spoken dialogues there are often:
    - Speech recognition errors: deletion, substitution, and insertion
    - Spontaneous speech phenomena: garbage, hesitation, repetition, correction, fragments, ellipsis, word disordering, ill-formed sentences, and so on
  - So it is difficult to obtain a fully correct recognized sentence for full-sentence parsing.
  - An alternative: keyword spotting, a semantics-based grammar, and partial parsing (every partial result is maintained).

- Why grammar?
  - The sentence structure can be viewed as a deterministic tree, e.g., for 我买明天的票 (“I buy tomorrow’s ticket”):
      S
        NP
          PRON 我 (I)
        VP
          V 买 (buy)
          NP
            NADJ 明天的 (tomorrow’s)
            票 (ticket)

- Why grammar? (cont’d)
  - The structure of the underlying semantics (语义) and/or the domain knowledge can also be viewed as a deterministic tree:
      QUERY_FOR_FLIGHTS
        DATE_TIME
          WEEKLY_DATE: next Monday
        ROUTE
          ARRIVAL_CITY: Beijing
          DEPARTURE_CITY: Shenzhen

- Problems when using a grammar
  - Chinese is an ideographic (表意的) language:
    - Chinese sentences are more casual than English ones,
    - so they are difficult to model with syntactic grammars.
  - In dialogue systems, ungrammatical phenomena are common:
    - ellipses or missing words/phrases
    - repetitions
    - garbage
    - fragments
    - disordering
    - ill-formed sentences

- Solutions
  - Define special types of CFG rules to deal with spoken-language phenomena.
  - Unlike traditional grammars, which use parts of speech (POS) as terminal symbols, use keyword categories as terminal symbols and semantic units as non-terminal symbols, forming a semantics-based grammar.
  - Enhance and modify the traditional chart parser into the Marionette parser.

- A keyword-based robust parser includes:
  - Keyword list: the lexicon for the recognizer and the terminal symbols of the semantic grammar
  - Grammar definition: four types of rules are defined
  - Grammar transcription: a semantic grammar based on analysis of a real-world domain corpus
  - Marionette parser: a chart parser that uses the above grammar and eliminates ambiguities by pruning/optimizing

- Keyword list
  - ~700 lexical words
  - ~70 semantic categories
  - 3 larger classes:
    - Material class (实体类): each word carries real domain-specific information.
    - Tag class (标记类): each category plays a different role in identifying the user’s intention.
    - Atom class (原子类): the words have no substantial meaning of their own but can be combined into larger constituents (成分).
Keyword category examples (explanation or example: category)
  - “六” (digit suffix for weekday): ato_1to6
  - “一” (digit for ID spelling): ato_0to9_yao
  - “元” (January prefix): ato_january_prefix
  - “礼拜” (weekday prefix): ato_week
  - “多少” (“how many”): tag_how_many
  - “有没有” (“exist or not”): tag_exit_or_not
  - “到” (“to”): tag_to
  - “从这儿” (“from here”): tag_from_here
  - “上午” (“morning”): mat_time_of_the_day
  - “波音747” (“Boeing 747”): mat_aircraft_type
  - “CA” (“Air China”): mat_airline_code
  - “北京” (“Beijing”): mat_city_name
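
The keyword list can be pictured as a lexicon from surface words to categories. The minimal Python sketch below assumes the recognizer output is already available as text (the real system spots keywords in speech and produces a keyword lattice) and uses a greedy longest-match scan; the dictionary is a hypothetical subset of the table above, with 上海 added as an assumed mat_city_name entry.

```python
# Hypothetical subset of the ~700-word keyword list (entries from the table above);
# "上海" is an assumed extra member of mat_city_name.
KEYWORDS = {
    "礼拜": "ato_week",
    "六": "ato_1to6",
    "多少": "tag_how_many",
    "有没有": "tag_exit_or_not",
    "到": "tag_to",
    "从这儿": "tag_from_here",
    "上午": "mat_time_of_the_day",
    "波音747": "mat_aircraft_type",
    "CA": "mat_airline_code",
    "北京": "mat_city_name",
    "上海": "mat_city_name",
}


def spot_keywords(text, lexicon):
    """Scan left to right, taking the longest keyword that starts at each position.

    Returns (position, word, category) triples. Characters that start no keyword
    are skipped, which is how fillers and recognition garbage are tolerated here.
    """
    longest = max(len(word) for word in lexicon)
    found = []
    i = 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            word = text[i:i + n]
            if word in lexicon:
                found.append((i, word, lexicon[word]))
                i += n
                break
        else:  # no keyword starts at position i
            i += 1
    return found


print(spot_keywords("我要礼拜六上午从北京到上海的票", KEYWORDS))
```

Here 从 alone is skipped, because only the longer keyword 从这儿 is in the sample lexicon.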

- Grammar definition
  - 4 types of grammar rules cope with spontaneous speech phenomena:
    - Up-tying type (苛刻型): the sub-constituents are strictly tied together, as in a conventional grammar.
    - By-passing type (跳跃型): the sub-constituents are combined even if there are gap words between them.
    - Up-messing type (无序型): the sub-constituents can appear in any order.
    - Over-crossing type (交叉型): the occupations of the sub-constituents can overlap with each other.
  - Overall features
    - Keywords are taken as terminal symbols.
    - All constituents are semantic rather than syntactic categories,
    - so the grammar is a semantic one.
    - The grammar size is over 250.

- Semantic grammar transcription (examples)
  - Up-tying (苛刻型) rules
    - Some crucial information must not be mixed with or interrupted by other terms, e.g., a personal ID number.
    - E.g. (in China, an ID number is 15 or 18 digits long):
      - sub_id_card_head *→ ato_0to9_yao + ato_0to9_yao + … + ato_0to9_yao (15 identical terms)
      - id_card_no → sub_id_card_head
      - id_card_no *→ sub_id_card_head + ato_0to9_yao + ato_0to9_yao + ato_0to9_yao
    - This is the traditional rule type.
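
As a sketch of how an up-tying rule behaves, the Python function below (the name `match_id_card_no` is hypothetical) applies the id_card_no rules to a sequence of keyword categories: it accepts only 15 or 18 strictly adjacent ato_0to9_yao tokens, so a single inserted filler breaks the match.

```python
DIGIT = "ato_0to9_yao"


def match_id_card_no(categories, start):
    """Up-tying (*→) match for id_card_no beginning at index `start`.

    sub_id_card_head needs 15 strictly adjacent digits; id_card_no is either the
    head alone or the head plus exactly 3 more adjacent digits (18 in total).
    Any intervening non-digit token breaks the run. Returns the end index
    (exclusive) of the longest match, or None.
    """
    run = 0
    while start + run < len(categories) and categories[start + run] == DIGIT:
        run += 1
    if run >= 18:
        return start + 18  # prefer the larger constituent
    if run >= 15:
        return start + 15
    return None


print(match_id_card_no([DIGIT] * 18, 0))                    # 18
print(match_id_card_no([DIGIT] * 7 + ["garbage"] + [DIGIT] * 11, 0))  # None
```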

- Semantic grammar transcription (examples)
  - By-passing (跳跃型) rules
    - Conversely, some utterances may contain recognition garbage/fillers or meaningless parts, e.g., “星期啊三嗯星期四” (星期三 “Wednesday” and 星期四 “Thursday”, interrupted by the fillers 啊 and 嗯).
    - E.g.
      - sub_week_day → ato_week + ato_1to6
      - sub_week_day_list → sub_week_day
      - sub_week_day_list → sub_week_day + sub_week_day_list
      - sub_date → sub_week_day_list
    - Many rules are of this type.
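
By contrast, a by-passing rule tolerates gap words. The sketch below (the helper names are hypothetical) finds the right-hand-side categories in order while skipping any tokens in between, then applies the recursive sub_week_day_list rule to the example above. Treating 星期 as ato_week and marking the fillers as `garbage` are assumptions.

```python
def match_bypassing(categories, rhs, start=0):
    """By-passing match: find the rhs symbols in order at or after `start`,
    skipping any gap tokens (fillers, garbage) between them.
    Returns (first_index, last_index) of the matched tokens, or None."""
    positions = []
    i = start
    for symbol in rhs:
        while i < len(categories) and categories[i] != symbol:
            i += 1  # by-passing: ignore gap words
        if i == len(categories):
            return None
        positions.append(i)
        i += 1
    return positions[0], positions[-1]


def week_day_list(categories):
    """Collect successive sub_week_day matches (ato_week + ato_1to6)."""
    days = []
    i = 0
    while (span := match_bypassing(categories, ["ato_week", "ato_1to6"], i)) is not None:
        days.append(span)
        i = span[1] + 1
    return days


# 星期 啊 三 嗯 星期 四
tokens = ["ato_week", "garbage", "ato_1to6", "garbage", "ato_week", "ato_1to6"]
print(week_day_list(tokens))  # [(0, 2), (4, 5)]
```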

- Semantic grammar transcription (examples)
  - Up-messing (无序型) rules
    - Some information, such as time, city names, and plane types, can appear in no predefined order.
    - E.g.
      - timeloc_info_cond @→ info_date_time_cond + info_fromto
      - plane_info @→ mat_airline_code + mat_aircraft_type
      - flight_info_cond @→ timeloc_info_cond + plane_info
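
An up-messing rule only requires that all sub-constituents be present. The Python sketch below (the name `match_any_order` is hypothetical) counts the required categories with `collections.Counter` and ignores the order in which they were spoken.

```python
from collections import Counter


def match_any_order(constituents, rhs):
    """Up-messing (@→) match: every rhs symbol must be present, in any order.

    constituents: list of (category, (start, end)) in sentence order.
    Returns the (start, end) span covering the constituents used, or None.
    """
    needed = Counter(rhs)
    used = []
    for category, span in constituents:
        if needed[category] > 0:
            needed[category] -= 1
            used.append(span)
    if +needed:  # unary plus drops zero counts; anything left is missing
        return None
    return min(s for s, _ in used), max(e for _, e in used)


# plane_info @→ mat_airline_code + mat_aircraft_type, with the aircraft type said first
spoken = [("mat_aircraft_type", (0, 1)), ("mat_airline_code", (2, 3))]
print(match_any_order(spoken, ["mat_airline_code", "mat_aircraft_type"]))  # (0, 3)
```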

- Semantic grammar transcription (examples)
  - Over-crossing (交叉型) rules
    - Some phrases/constituents, such as “是…吗” (“is it …?”), can have other constituents in between, e.g., “是到北京吗” (“Is it to Beijing?”) and “是两张吗” (“Is it two tickets?”).
    - E.g.
      - mark_q_is → tag_is_or_not
      - mark_q_is → tag_is + tag_question_mark
      - mark_q_is → tag_is_q
      - confirm_request #→ mark_q_is + confirm_c
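
For over-crossing rules, the sketch below (the name `combine_over_crossing` is hypothetical) lets two sub-constituents interleave, as 是…吗 wraps around 到北京. It assumes that the only thing forbidden is sharing a leaf word; positions are word indices.

```python
def combine_over_crossing(a_leaves, b_leaves):
    """Over-crossing (#→) combination: the two sub-constituents may interleave
    (their ranges overlap) as long as they share no leaf word.
    Returns the sorted union of leaf positions, or None on a shared leaf."""
    a, b = set(a_leaves), set(b_leaves)
    if a & b:
        return None
    return sorted(a | b)


# 是 到 北京 吗 -> word positions 0 1 2 3
mark_q_is = [0, 3]  # 是 … 吗
confirm_c = [1, 2]  # 到 北京
print(combine_over_crossing(mark_q_is, confirm_c))  # [0, 1, 2, 3]
```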

- When ambiguities (歧义) arise, the candidates are evaluated by several criteria and the highest-ranked constituent survives:
  - position (in the sentence)
  - occupation (number of leaf nodes)
  - depth, etc.
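
The evaluation step can be sketched as a ranking key. The criteria come from the slide, but the order and direction used below are assumptions: later end position first (in line with the parser's preference for later sub-constituents), then larger occupation, then a shallower subtree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    label: str
    end: int         # position of the constituent's last leaf in the sentence
    occupation: int  # number of leaf nodes covered
    depth: int       # depth of the constituent's subtree


def survivor(candidates: List[Candidate]) -> Candidate:
    """Return the highest-ranked candidate under the assumed ordering:
    later position, then larger occupation, then shallower depth."""
    return max(candidates, key=lambda c: (c.end, c.occupation, -c.depth))


cands = [Candidate("sub_to", end=5, occupation=2, depth=2),
         Candidate("info_fromto", end=5, occupation=4, depth=3)]
print(survivor(cands).label)  # info_fromto
```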

- Marionette parser: an enhanced chart parser
  - Maintains all partial results
  - Combines non-adjacent constituents (by-passing rules)
  - Considers all possible orders of the constituents (up-messing rules)
  - Groups constituents whether or not their occupations overlap (over-crossing rules)
  - Gives later sub-constituents precedence over earlier ones
  - Gives larger sub-constituents precedence over smaller ones

  - A part of the parsing algorithm. For a constituent $C$ spanning positions $(p_1, p_2)$:
    a) for each arc $Y \to Y_1 Y_2 \cdots Y_k \circ C\, Y_{k+1} \cdots Y_n$ spanning $(p_0, p_1')$ with $p_1' \le p_1$, add a new arc $Y \to Y_1 Y_2 \cdots Y_k\, C \circ Y_{k+1} \cdots Y_n$ spanning $(p_0, p_2)$;
    b) for each arc $Y \;@\!\!\to\; Y_1 Y_2 \cdots Y_k \circ Y_{k+1} \cdots Y_n$ where $C$ is compatible and not yet applied, add a new arc of the same rule with $C$ applied, at the calculated actual position;
    c) for each arc $Y \;\#\!\!\to\; Y_1 Y_2 \cdots Y_k \circ Y_{k+1} \cdots Y_n$ where $C$ is compatible, not yet applied, and no overlapping is met, add a new arc of the same rule with $C$ applied, at the calculated actual position.
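
The three extension cases can be sketched in Python as follows. This is an illustrative reading, not the Marionette parser itself: arcs record which right-hand-side symbols are applied and which leaf positions they cover. `->` arcs fill symbols left to right with gaps allowed (case a), `*` arcs additionally require strict adjacency, `@` arcs accept symbols in any order (case b; requiring non-interleaving ranges there is an assumption), and `#` arcs accept interleaving constituents that share no leaf (case c).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Arc:
    lhs: str
    rhs: Tuple[str, ...]
    kind: str                              # "->", "*", "@" or "#"
    applied: FrozenSet[int] = frozenset()  # indices of rhs symbols already matched
    leaves: FrozenSet[int] = frozenset()   # leaf positions covered so far

    def complete(self) -> bool:
        return len(self.applied) == len(self.rhs)


def extend(arc: Arc, category: str, leaves: Iterable[int]) -> Optional[Arc]:
    """Try to extend `arc` with a finished constituent; return the new arc or None."""
    new = frozenset(leaves)
    if arc.kind in ("->", "*"):
        k = len(arc.applied)  # ordered rules fill their rhs left to right
        if k == len(arc.rhs) or arc.rhs[k] != category:
            return None
        if arc.leaves:
            if arc.kind == "*":
                ok = min(new) == max(arc.leaves) + 1  # up-tying: strictly adjacent
            else:
                ok = min(new) > max(arc.leaves)       # case a: p1' <= p1, gaps by-passed
            if not ok:
                return None
        return Arc(arc.lhs, arc.rhs, arc.kind, arc.applied | {k}, arc.leaves | new)
    for i, symbol in enumerate(arc.rhs):
        if symbol != category or i in arc.applied:
            continue  # not compatible, or already applied
        if arc.kind == "#" and arc.leaves & new:
            return None  # case c: interleaving allowed, shared leaves are not
        if arc.kind == "@" and arc.leaves and not (
            max(new) < min(arc.leaves) or min(new) > max(arc.leaves)
        ):
            return None  # case b (assumed): any order, but ranges must not interleave
        return Arc(arc.lhs, arc.rhs, arc.kind, arc.applied | {i}, arc.leaves | new)
    return None


# confirm_request #→ mark_q_is + confirm_c over 是 到 北京 吗 (word positions 0-3)
arc = Arc("confirm_request", ("mark_q_is", "confirm_c"), "#")
arc = extend(arc, "confirm_c", [1, 2])
arc = extend(arc, "mark_q_is", [0, 3])
print(arc.complete(), sorted(arc.leaves))  # True [0, 1, 2, 3]
```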

  - In a semantic tree (as in a syntactic tree):
    - Statically, each rule node corresponds to a semantic function.
    - Dynamically, each constituent node has a pointer to a semantic function.
    - Semantic analysis is a procedure of calling semantic functions:
      - At the very beginning, the topmost node’s semantic function is called;
      - each node calls its children’s semantic functions recursively,
      - until the semantics is finally obtained.
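
The recursive call scheme can be sketched as below. The lookup table stands in for the per-node semantic-function pointers; the function names, the slot names `departure_city`/`arrival_city`, and the simplified tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaf (keyword) nodes


def sem_leaf(node):
    return node.word


def sem_merge(node):
    """Default for inner nodes: merge the frame fragments returned by the children."""
    frame = {}
    for child in node.children:
        frame.update(evaluate(child))
    return frame


def sem_sub_from(node):
    return {"departure_city": evaluate(node.children[-1])}


def sem_sub_to(node):
    return {"arrival_city": evaluate(node.children[-1])}


# Hypothetical rule-label -> semantic-function table
SEMANTIC_FUNCTIONS = {"sub_from": sem_sub_from, "sub_to": sem_sub_to}


def evaluate(node):
    """Call the node's semantic function; inner nodes call their children recursively."""
    if node.word is not None:
        return sem_leaf(node)
    return SEMANTIC_FUNCTIONS.get(node.label, sem_merge)(node)


tree = Node("info_fromto", [
    Node("sub_from", [Node("tag_from", word="从"), Node("mat_city_name", word="北京")]),
    Node("sub_to", [Node("tag_to", word="到"), Node("mat_city_name", word="上海")]),
])
print(evaluate(tree))  # {'departure_city': '北京', 'arrival_city': '上海'}
```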

Figure: A parsing example. A spontaneous utterance containing fillers such as 那个 and 呃 is parsed into a tree with nodes such as flight_info, key_info, tp_info, info_fromto, sub_from, sub_to, sub_city_name_list, and info_data_time, built over keywords such as tag_from (从), mat_city_name (北京, 上海), and tag_to (到); some parts are marked ×. Phenomena handled: garbage, repetition, various orders, ellipses, fragments, and ill-formed input.

Powerful Dialogue Manager (DM)
- Role:
  - Maintain dialogue contexts and states
  - Direct the dialogue
  - Accept parsing results and generate responses
- Desired features:
  - Able to deal with multiple topics
  - Topics can be changed freely
  - Able to make full use of information shared by different topics and to support ellipsis (when the topic changes from one to another)
  - User/machine mixed initiative (混合主导)
  - Adaptive to users’ interests and parlance
  - Domain-transparent to the user (easy to port to new systems)

- Problems
  - Representation problems
    - complex relationships among topic items
    - items of different importance
    - information items shared among topics
  - Runtime problems
    - distinguishing different user interests
    - handling free topic changes
- Solution
  - A plan-based Topic Forest (TF) structure with a Shared Information Index (SII); and
  - a finite-state controller.

- DM overview
  - Input: semantic frame
  - Main structures: Topic Forest; Shared Info Index
  - Output: text response
  - Reasoning engine: strategies
  - Dialogue control: finite-state controller

- DM input: semantic frame
  - Topic information
    - current topic
    - semantic slots (info items)
  - Non-topic information
    - statement/question
    - question focus
    - reference items
Figure: Semantic frame layout: Topic; Semantic slot 1, Semantic slot 2, Semantic slot 3, …; Sentence type; Question focus; Reference item 1, Reference item 2, ….
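
One way to hold the semantic frame is a small record type. The Python dataclass below mirrors the fields listed above; the field names, slot names, and the example sentence 从北京到上海的航班几点起飞 (“What time does the flight from Beijing to Shanghai take off?”) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SemanticFrame:
    # Topic information
    topic: str
    slots: Dict[str, str] = field(default_factory=dict)  # semantic slots (info items)
    # Non-topic information
    sentence_type: str = "statement"      # "statement" or "question"
    question_focus: Optional[str] = None  # the item the user asks about
    references: List[str] = field(default_factory=list)  # reference items


# 从北京到上海的航班几点起飞？
frame = SemanticFrame(
    topic="flight_information",
    slots={"departure_city": "北京", "arrival_city": "上海"},
    sentence_type="question",
    question_focus="departure_time",
)
print(frame.question_focus)  # departure_time
```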

- DM main structure: Topic Forest (TF)
  - Consists of topic trees (one per topic domain)
  - Represents domain topics and maintains the dialogue history
  - Designed off-line and loaded dynamically
Primary Property (PP): dominant info items
Secondary Property (SP): detail info items
Additional Property (AP): optional info items
Figure: Part of the Topic Forest of EasyFlight (3 topic trees for 3 topics: Flight Information, Airline Information, and Time Difference). Flight Information has PP (AND), AP (AND), and SP (OR) branches; Airline Information has a PP (OR) branch; Time Difference has PP (AND) and AP (AND) branches. Leaf items shown include Date, Departure Time, Flight Number, Departure City, Arrival City, Arrival Time, Plane Type, Airline, Full Name, Code, Abbreviation, Time A, Time B, Time Difference, City A, and City B. Legend: topic node = domain topic info; mid node = relation among child nodes (AND/OR), used for adapting to users’ interests; leaf node = topic information item.
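
The AND/OR mid nodes make it possible to decide whether a topic has enough information and what to ask next. In the sketch below, the primary property of Flight Information is assumed (hypothetically) to be an AND over Departure City, Arrival City, and Date. With both cities filled, the next focus is the date, which is consistent with the demonstration later in this deck where the machine asks for the date.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicNode:
    name: str
    op: Optional[str] = None  # "AND" / "OR" for mid nodes; None for leaf info items
    children: List["TopicNode"] = field(default_factory=list)


def satisfied(node, filled):
    """A leaf is satisfied when its item is filled; AND/OR nodes combine their children."""
    if node.op is None:
        return node.name in filled
    results = [satisfied(child, filled) for child in node.children]
    return all(results) if node.op == "AND" else any(results)


def next_focus(node, filled):
    """Return the first unfilled leaf under an unsatisfied node (a candidate item to ask for)."""
    if satisfied(node, filled):
        return None
    if node.op is None:
        return node.name
    for child in node.children:
        item = next_focus(child, filled)
        if item is not None:
            return item
    return None


# Hypothetical primary property (PP) of the Flight Information topic
pp = TopicNode("PP", "AND", [
    TopicNode("Departure City"),
    TopicNode("Arrival City"),
    TopicNode("Date"),
])
print(next_focus(pp, {"Departure City", "Arrival City"}))  # Date
```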

- DM main structure: Shared Info Index (SII)
  - A collection of one-to-many mappings
  - Helps deal with ellipsis across topics
  - Generated automatically after the Topic Forest is loaded
Figure: The Shared Info Index connects all topics. Shared items such as Flight Number, Departure City, and Arrival City each map to the topic trees that contain them, e.g., Flight Information (which also has Date) and Ticket Price; a Special Semanteme entry is also shown.
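
The SII can be derived mechanically from the loaded forest by inverting the topic-to-item relation. The sketch below flattens each topic tree to a set of item names (an assumption that ignores the AND/OR structure) and keeps only items shared by two or more topics; the hypothetical `carry_over` shows how values given under one topic could fill an elided item of another.

```python
from collections import defaultdict


def build_sii(forest):
    """forest: topic -> set of leaf item names (a flattened topic forest).
    Returns the Shared Info Index: item -> set of topics, for items shared by 2+ topics."""
    index = defaultdict(set)
    for topic, items in forest.items():
        for item in items:
            index[item].add(topic)
    return {item: topics for item, topics in index.items() if len(topics) > 1}


def carry_over(values, new_topic, sii):
    """Resolve cross-topic ellipsis: reuse values of items that new_topic shares."""
    return {item: v for item, v in values.items() if new_topic in sii.get(item, set())}


forest = {
    "Flight Information": {"Flight Number", "Departure City", "Arrival City", "Date"},
    "Ticket Price": {"Flight Number", "Departure City", "Arrival City"},
}
sii = build_sii(forest)
print(sorted(sii))  # ['Arrival City', 'Departure City', 'Flight Number']
print(carry_over({"Departure City": "北京", "Date": "明天"}, "Ticket Price", sii))
# {'Departure City': '北京'}
```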

- DM output: text response
  - Response generator: response generation functions
    - one leaf node, one focus, one function
    - different responses according to the dialogue status, e.g., inquiry failure, confirmation of topic information, ...
    - learning users’ parlance
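
Response generation functions can be organized as a dispatch table keyed by the focus leaf. The function names, the dialogue status labels (`ask`, `confirm`, anything else treated as inquiry failure), and the English prompts below are hypothetical.

```python
# Hypothetical response functions: one leaf node, one focus, one function.
def respond_date(status, value=None):
    if status == "ask":
        return "Which day would you like to fly?"
    if status == "confirm":
        return f"You want to fly on {value}, is that right?"
    return "Sorry, no flights were found for that date."  # inquiry failure


def respond_departure_city(status, value=None):
    if status == "ask":
        return "Where are you flying from?"
    if status == "confirm":
        return f"Departing from {value}, is that right?"
    return "Sorry, no flights depart from that city."  # inquiry failure


RESPONSE_FUNCTIONS = {"Date": respond_date, "Departure City": respond_departure_city}


def generate(focus, status, value=None):
    """Dispatch on the response focus chosen by the reasoning engine."""
    return RESPONSE_FUNCTIONS[focus](status, value)


print(generate("Departure City", "confirm", "Beijing"))  # Departing from Beijing, is that right?
```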

- DM reasoning engine: strategies
  - Maintaining the dialogue state history
    - by filling the topic forest with the obtained information
    - Information sources:
      - user information: what the user expressed
      - inquiry results: what was obtained from the database
    - Filling operations:
      - appending: keep (remember) previous information
      - replacing: erase (forget) previous information
  - Ellipsis processing
    - ellipsis within the same topic: by maintaining a list of the most recent topic information items (stored in the corresponding topic tree)
    - ellipsis across topics: by using the Shared Information Index (SII)
  - Reasoning strategy
    - Topic Forest based
    - querying the database
    - determining the response focus
    - adapting to users’ interests
    - independent of domain knowledge
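
The two filling operations and the recent-item list can be sketched as one small class. The semantics below are assumptions: `append` keeps earlier items and lets a newer value overwrite the same item (as when the user changes from one ticket to two), while `replace` clears the topic before filling; `recent` is a bounded list of the most recently filled items, used for ellipsis within a topic.

```python
from collections import deque


class TopicState:
    """Values of one topic tree plus a bounded list of the most recent items."""

    def __init__(self, history_size=5):
        self.values = {}
        self.recent = deque(maxlen=history_size)

    def fill(self, items, mode="append"):
        if mode == "replace":
            self.values = {}        # forget previous information
        self.values.update(items)   # append: keep previous, newer values win per item
        for item in items:
            if item in self.recent:
                self.recent.remove(item)
            self.recent.append(item)  # most recent item goes last


state = TopicState()
state.fill({"Departure City": "北京", "Arrival City": "上海"})
state.fill({"Ticket Amount": "1"})
state.fill({"Ticket Amount": "2"})  # the user changes their mind
print(state.values["Ticket Amount"])  # 2
```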

- Dialogue control: finite-state controller
  - A finite-state method controls the dialogue progress; it suits item-by-item confirmation.
  - Built on the Topic Forest structure:
    - the existing structure is unchanged (only some leaf nodes are added)
    - other topics are unaffected
    - intelligent inquiry is supported
Figure: Flight Information topic with added leaf nodes (under AP (AND): Plane Type, Airline, Personal ID, Ticket Amount, State, …).

  - The state transition network
Figure: State transition network of the finite-state controller. States: Flight Inquiry, Flight Confirmation, Ticket Asking, Ticket Confirmation, PersonalID Asking, PersonalID Confirmation, and Ticket Reservation. Transition conditions shown: flight undecided, flight concluded, yes, no, yes & ticket amount known, yes & ticket amount unknown, yes & personal ID known, yes & personal ID unknown, extracted, unextracted, and correction.
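
The controller can be sketched as a transition table keyed by (state, condition). The edges below are one plausible reading of the figure, not a confirmed reconstruction, and starting in Flight Inquiry is an assumption; the event labels are the condition labels from the figure.

```python
# One plausible reading of the transition figure; the exact edges are an assumption.
TRANSITIONS = {
    ("Flight Inquiry", "flight undecided"): "Flight Inquiry",
    ("Flight Inquiry", "flight concluded"): "Flight Confirmation",
    ("Flight Confirmation", "no"): "Flight Inquiry",
    ("Flight Confirmation", "yes & ticket amount unknown"): "Ticket Asking",
    ("Flight Confirmation", "yes & ticket amount known"): "Ticket Confirmation",
    ("Ticket Asking", "unextracted"): "Ticket Asking",
    ("Ticket Asking", "extracted"): "Ticket Confirmation",
    ("Ticket Confirmation", "no"): "Ticket Asking",
    ("Ticket Confirmation", "correction"): "Ticket Confirmation",
    ("Ticket Confirmation", "yes & personal ID unknown"): "PersonalID Asking",
    ("Ticket Confirmation", "yes & personal ID known"): "PersonalID Confirmation",
    ("PersonalID Asking", "unextracted"): "PersonalID Asking",
    ("PersonalID Asking", "extracted"): "PersonalID Confirmation",
    ("PersonalID Confirmation", "no"): "PersonalID Asking",
    ("PersonalID Confirmation", "correction"): "PersonalID Confirmation",
    ("PersonalID Confirmation", "yes"): "Ticket Reservation",
}


def run(events, state="Flight Inquiry"):
    """Feed condition labels to the controller; unknown events leave the state unchanged."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


print(run(["flight concluded", "yes & ticket amount known", "correction",
           "yes & personal ID known", "yes"]))  # Ticket Reservation
```

The "correction" loop on Ticket Confirmation mirrors the demonstration in which the user switches to two tickets and the machine asks for confirmation again.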
Demonstrations

EasyFlight
- Flight information inquiry, and
- Flight reservation.
Greetings.
User wants to book a ticket. Machine takes the initiative by asking for the departure city.
User responds and also gives the arrival city. Machine asks for the date.
Machine lists 14 flights and asks for the time.
User wants the earliest flight. Machine asks the user to confirm.
User changes the topic to ask about the plane type.
User switches back to the previous topic and decides to buy two tickets instead. Machine asks for confirmation.
User finally confirms.
Thanks.
EasyNav
q THU Campus Navigation system.
q Selected for demonstration to President Jiang Zemin during the 2000 Spring Festival.
q The ONLY system selected for demonstration at the THU Information Center during THU’s 90th Anniversary.
How long does it take to walk from X to Y?
Where can something be done?
Different ways to ask “Which X is closer to Loc_A?” (1/5)
Different ways to ask “Which X is closer to Loc_A?” (2/5)
Different ways to ask “Which X is closer to Loc_A?” (3/5)
Different ways to ask “Which X is closer to Loc_A?” (4/5)
Different ways to ask “Which X is closer to Loc_A?” (5/5)
What is to the east of Loc_A?
What is there to the east of Loc_A?
Where is Loc_A?
What X is nearby? (Ellipsis)
Which is better? (Ellipsis)
Which is cheaper? (Ellipsis)
How do I get to Loc_A?
How do I get to Loc_B from Loc_A?
How long does it take to get to X?
How far is it from Loc_A to the nearest X?
(When there is no information in the database.)
(When two places are the same.)
(When there is no information in the database.)
Good-bye.
Thanks for listening
Thomas Fang Zheng
Center of Speech Technology
State Key Lab of Intelligent Technology and Systems
Department of Computer Science & Technology
Tsinghua University
[email protected], http://sp.cs.tsinghua.edu.cn/~fzheng/