Spoken Language Understanding, the Research/Industry Chasm
© 2002 IBM Corporation
Natural Language Technologies
May 5, 2004 | SLU for Conversational Systems
Spoken Language Understanding, the Research/Industry Chasm
Roberto Pieraccini, IBM T.J. Watson Research Center
Natural Language Technologies
SLU, The Research Industry Chasm | Roberto Pieraccini | HLT/NAACL 2004 © 2002 IBM Corporation
A Brief History of SLU
Timeline: 1970, 1980, 1990, 2000
RESEARCH
FSM-based SLU
Phrase Structure, ATN, Semantic Grammars, mainly for understanding text
Robust Parsing, Spontaneous Speech
Call Routing, Sentence Classification, Statistical Parsing
ARPA SUR, DARPA RESOURCE MANAGEMENT, ATIS, COMMUNICATOR
INDUSTRY
FSM-based SLU
Call Routing
VoiceXML, SRGS
SLU RESEARCH
Open NL understanding
Few deployed systems
Little data available
Artificial tasks
Lack of evaluation paradigm
Little funding for SLU research
COMMERCIAL SLU
Mostly directed dialog
100s of deployed systems
Lots of proprietary data
Customer driven tasks
Task completion evaluation
Revenue based on license or per-minute
EuroSpeech 2003 – Paper Breakdown
BASIC (309): Signal Processing, Speech Modeling, Acoustics, Speech Enhancement, Prosody, Emotions, Speech Coding, Corpora, Phonetics
NOISE (93): Noise Robustness, Robust ASR
ENABLING (237): Speech Recognition, Synthesis, Language Modeling, Speaker/Language ID, Speaker Verif.
APP (28): non Dialog applications
DIALOG (86): Dialog and Multimodal systems
NL (9): Summarization, Title extraction, topic detection, NE recognition, ...
TRANS (14): Speech to Speech Translation
UND (24): Spoken Language Understanding
Spoken Language Understanding papers: 24/800 = 3% (14 academic, 10 industrial)
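The 3% figure follows directly from the category counts. A minimal Python sketch of that arithmetic is given here; the names `paper_counts` and `share` are illustrative and not part of the original slides.

```python
# Paper counts per category, as listed in the EuroSpeech 2003 breakdown.
paper_counts = {
    "BASIC": 309, "NOISE": 93, "ENABLING": 237, "APP": 28,
    "DIALOG": 86, "NL": 9, "TRANS": 14, "UND": 24,
}

# Total number of papers across all categories.
total = sum(paper_counts.values())


def share(category: str) -> float:
    """Return the percentage of all papers that fall in `category`."""
    return 100.0 * paper_counts[category] / total
```

Calling `share("UND")` gives the fraction of papers on spoken language understanding, and the same function can be applied to any other category for comparison.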
SLU is difficult
SLU is difficult to evaluate
End-to-end evaluation
– Based on task completion measures
– Needs the full conversational system
– Needs real, motivated users
Semantic evaluation
– Based on semantic annotation
– Costly
– Subjective
– Needs interpretation principles
– Highly domain/application dependent
The ATIS evaluation
I am flying between New York and Washington tomorrow, early in the afternoon
DEP. CITY  DEP. AIRPORT  ARR. CITY  ARR. AIRPORT  AIRLINE  FLIGHT #  DEP. TIME
NNYC       JFK           DDEN       DEN           CO       156       12:37
NNYC       LGA           DDEN       DEN           DL       8901      12:58
NNYC       LGA           DDEN       DEN           DL       8903      13:45
NNYC       JFK           DDEN       DEN           AA       578       13:57
NNYC       LGA           DDEN       DEN           UA       187       14:15
NNYC       JFK           DDEN       DEN           DL       987       15:27
Systems evaluated on the basis of the data retrieved from the relational database
Reference min and max answers
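One way to read "reference min and max answers" is as a containment test: a system answer counts as correct if it includes everything in the minimal reference answer and nothing outside the maximal one. The sketch below assumes that reading and treats each answer as a set of database values; the function name and the toy values are illustrative and are not taken from the ATIS scoring software.

```python
from typing import Hashable, Iterable


def is_correct(answer: Iterable[Hashable],
               ref_min: Iterable[Hashable],
               ref_max: Iterable[Hashable]) -> bool:
    """Check that ref_min is contained in answer and answer in ref_max."""
    answer_set, min_set, max_set = set(answer), set(ref_min), set(ref_max)
    return min_set <= answer_set <= max_set


# Toy reference answers: the minimal answer holds only flight numbers,
# the maximal answer also tolerates the airline codes.
REF_MIN = {"156", "8901", "8903", "578", "187", "987"}
REF_MAX = REF_MIN | {"CO", "DL", "AA", "UA"}
```

Under this reading, an answer that adds the airline codes is still accepted, while one that drops a flight or returns an unrelated column is not.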
Evaluation regulated by Principles of Interpretation (PofI)
Edited by the PofI committee over the 5 years of the project
Regular weekly meetings
About 100 principles
2.2.1 A flight "between X and Y" means a flight "from X to Y".
2.2.8 The location of a departure, stop, or arrival should always be taken to be an airport.
2.2.3.3 "Stopovers" will mean "stops" unless the context clearly indicates that the subject intended "stopover", as in "Can I make a two day stopover on that flight?". In that case the query is answered using the stopover column of the restrictions table.
2.2.6 A "red-eye" flight is one that leaves between 9 P.M. and 3 A.M. and arrives between 5 A.M. and 12 noon.
PERIOD         START  END
morning        0000   1200
afternoon      1200   1800
evening        1800   2200
day            600    1800
night          1800   600
early morning  0000   800
mid-morning    800    1000
TERM                INCLUDES ENDPOINTS?
before T1           No
after T1            No
between T1 and T2   Yes
arriving by T1      Yes
departing by T1     Yes
periods of the day  Yes
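The period and endpoint tables, together with principle 2.2.6, can be turned into small predicates. The following Python sketch is only an illustration: times are assumed to be integers in 24-hour HHMM form, "between" in the red-eye definition is assumed to include its endpoints as the endpoint table states, and all function names are hypothetical.

```python
# Period boundaries in 24-hour HHMM form, copied from the period table.
PERIODS = {
    "morning": (0, 1200),
    "afternoon": (1200, 1800),
    "evening": (1800, 2200),
    "day": (600, 1800),
    "night": (1800, 600),
    "early morning": (0, 800),
    "mid-morning": (800, 1000),
}


def in_period(hhmm: int, period: str) -> bool:
    """True if the time falls in the named period, endpoints included."""
    start, end = PERIODS[period]
    if start <= end:
        return start <= hhmm <= end
    # A period such as "night" wraps past midnight.
    return hhmm >= start or hhmm <= end


def before(hhmm: int, t1: int) -> bool:
    """'before T1' excludes the endpoint."""
    return hhmm < t1


def between(hhmm: int, t1: int, t2: int) -> bool:
    """'between T1 and T2' includes both endpoints."""
    return t1 <= hhmm <= t2


def is_red_eye(departure: int, arrival: int) -> bool:
    """Principle 2.2.6: leaves 9 P.M. to 3 A.M., arrives 5 A.M. to 12 noon."""
    leaves_late = departure >= 2100 or departure <= 300
    return leaves_late and between(arrival, 500, 1200)
```

Note that a noon departure satisfies both "morning" and "afternoon" under these rules, which is the kind of edge case the principles had to settle explicitly.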
Do we need SLU in commercial applications?
Disfluencies = self-corrections, false starts, repetitions, filled pauses
Disfluency rate more than doubles going from constrained to unconstrained interactions.
Disfluency rate grows linearly with the length of the utterance.
S. Oviatt, "Predicting spoken disfluencies during human-computer interaction," Computer Speech and Language, 1995, 9:19-35.
Do we need SLU in commercial applications?
Apps with one-time users do poorly with open-prompt systems.
Apps with repeat users do almost as well with open prompts as with directed dialog.
OPEN: What would you like to do?
DIRECTED: ...choose from the following options: web password reset, course enrollment, direct deposit or benefits.
S.M. Witt, J.D. Williams, "Two Studies of Open vs. Directed Dialog Strategies in Spoken Dialog Systems," Proc. of EUROSPEECH 2003, Geneva, CH, September 2003.
Do we need SLU in commercial applications?
[Figure: frequency (0 to 60) against number of words per utterance (1, 2, 3, 4, 5, 6, >6) for three data sets]
Human to Human – AMEX (SRI) (2082)
DARPA Communicator Data: Dec 2000 (11168)
SpeechWorks deployed applications - Directed dialog + NL (136447)
AVERAGE NUMBER OF WORDS PER SENTENCE: 2.0 2.9 6.0
Do we need SLU in commercial applications?
Yes, but it depends on the application
Do we need SLU in commercial applications?
[Figure: applications placed on two axes]
Sentence Structure (natural-language-ness): Simple Phrases to Natural Language
Dialog Structure (initiative): System Initiative to Mixed Initiative
Applications shown: PIZZA ORDERING, FLIGHT STATUS, STOCK TRADING, BANKING, HELP DESK, PROBLEM SOLVING, ROUTING
An architectural chasm?
Research Conversational Architecture
SPEECH RECOGNIZER (Language Model)
NATURAL LANGUAGE UNDERSTANDING (Semantic Model)
DIALOG MANAGER
Commercial Conversational Architecture
VoiceXML Browser
Application Server
Grammar 1, Grammar 2, Grammar 3, Grammar 4, Grammar 5
Diagram labels: GRAMMAR 3, ASR result, GRAMMAR 2
Current industrial SLU
Grammar rules:
$ROOT = $ITINERARY;
$ITINERARY = $FROM $TO;
$FROM = from $AIRPORT;
$TO = to $AIRPORT;
$NY = (new york) | (J F K) | kennedy;
$BOS = boston | logan;
$AIRPORT = ($NY | $BOS) [airport];
Semantic tags attached to the rules:
$NY: airport = "JFK";
$BOS: airport = "BOS";
$FROM: origin = airport;
$TO: destination = airport;
$ROOT: direction = "";
       if (origin == "JFK" && destination == "BOS") { direction = "north"; }
       else if (origin == "BOS" && destination == "JFK") { direction = "south"; }
Example parse, bottom-up:
From Boston to New York
$BOS $NY             airport = "BOS"; airport = "JFK";
$AIRPORT $AIRPORT
$FROM $TO            origin = "BOS"; destination = "JFK";
$ITINERARY
$ROOT                direction = "south";
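The grammar and its semantic tags can be mimicked in a few lines of Python to show how little "understanding" is involved. This is only a sketch: it replaces the grammar compiler of a real speech platform with a regular expression, and the names `AIRPORT_PHRASES` and `interpret` are illustrative.

```python
import re
from typing import Dict, Optional

# Phrases accepted by $NY and $BOS, with the value each assigns to `airport`.
AIRPORT_PHRASES = {
    "new york": "JFK", "j f k": "JFK", "kennedy": "JFK",
    "boston": "BOS", "logan": "BOS",
}

# $AIRPORT = ($NY | $BOS) [airport]
_AIRPORT = ("(" + "|".join(sorted(AIRPORT_PHRASES, key=len, reverse=True))
            + ")(?: airport)?")
# $ITINERARY = $FROM $TO, with $FROM = from $AIRPORT and $TO = to $AIRPORT
_ITINERARY = re.compile(rf"from {_AIRPORT} to {_AIRPORT}")


def interpret(utterance: str) -> Optional[Dict[str, str]]:
    """Return the semantic slots, or None if the utterance is out of grammar."""
    match = _ITINERARY.fullmatch(utterance.lower().strip())
    if match is None:
        return None
    origin = AIRPORT_PHRASES[match.group(1)]
    destination = AIRPORT_PHRASES[match.group(2)]
    # Tag attached to $ROOT.
    direction = ""
    if origin == "JFK" and destination == "BOS":
        direction = "north"
    elif origin == "BOS" and destination == "JFK":
        direction = "south"
    return {"origin": origin, "destination": destination,
            "direction": direction}
```

For "From Boston to New York" the function returns origin "BOS", destination "JFK" and direction "south", matching the parse on the slide. Anything outside the grammar, such as "I want to fly from Boston to New York", is simply rejected.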
SRGS Standard for grammars
<?xml version='1.0' encoding='ISO-8859-1'?>
<grammar version='1.0' xml:lang='en-us' root="ROOT">
  <rule id="ROOT" scope="public">
    <ruleref uri="#ITINERARY"
             tag="direction = '';
                  if (ITINERARY.origin == 'JFK' &amp;&amp; ITINERARY.destination == 'BOS') { direction = 'north'; }
                  else if (ITINERARY.origin == 'BOS' &amp;&amp; ITINERARY.destination == 'JFK') { direction = 'south'; }"/>
  </rule>
  <rule id="ITINERARY" scope="public">
    <ruleref uri="#FROM" tag="origin = FROM.airport;"/>
    <ruleref uri="#TO" tag="destination = TO.airport;"/>
  </rule>
  <rule id="FROM">
    <item>from</item>
    <ruleref uri="#AIRPORT" tag="airport = AIRPORT.airport;"/>
  </rule>
  <rule id="NY">
    <one-of tag="airport = 'JFK';">
      <item>new york</item>
      <item>JFK</item>
      <item>kennedy</item>
    </one-of>
  </rule>
  <!-- The TO, AIRPORT and BOS rules are not shown on the slide. -->
</grammar>
Difficult problems for commercial systems
No data for training in the design/development phase
System development with no data
Tools for fast grammar handcrafting
Tools for content word normalization/speech-ification
Oodles of data after deployment
Tools for automatic or semi-automatic adaptation/learning
The problem of content words
Sentence structure:
I need to go to Phoenix from New York leaving on February 4th
I need to go from New York to Phoenix on February 4th
On February 4th leaving from New York and going to Phoenix
Content word variations:
Newark, Boston, Denver, Dallas, Baltimore, San Francisco, Los Angeles, Philadelphia, …
The problem of content words
Large lists of content words need to have priors
How to estimate priors with no data (or even if you have data)?
e.g. airport names, flight numbers, street names
Large lists of content words often come from proprietary databases
Spelling to phonemes
Acronym expansion
Word normalization
14" display w/ anti-glr scrn
A fourteen-inch display with an anti-glare screen
Synonym/paraphrase generation
A display fourteen inches in size with an anti-reflection screen
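Word normalization of database entries can be pictured as token-by-token expansion. The Python sketch below is only an illustration of the idea on the display example: the expansion table is hypothetical and tiny, number spelling covers 0 to 99 only, and a production system would need far larger resources and would also insert articles.

```python
# Hypothetical abbreviation table for catalogue-style text.
EXPANSIONS = {
    "w/": "with",
    "anti-glr": "anti-glare",
    "scrn": "screen",
}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 99."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 are supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def normalize(text: str) -> str:
    """Expand abbreviations and inch marks into speakable words."""
    words = []
    for token in text.split():
        if token.endswith('"') and token[:-1].isdigit():
            # A number followed by a double quote is read as inches.
            words.append(number_to_words(int(token[:-1])) + " inch")
        else:
            words.append(EXPANSIONS.get(token.lower(), token))
    return " ".join(words)
```

Applied to the catalogue string `14" display w/ anti-glr scrn`, this yields "fourteen inch display with anti-glare screen". Generating paraphrases such as "anti-reflection screen" is a separate and harder step.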
Exploiting real data
[Figure: network in which Flight Number (e.g. Flight 679) is linked to Area Code (Area Code 1, 2, 3, …), Time (7 AM, 8 AM, 9 AM, 10 AM, …) and Day (Mon, Tue, Wed, Thu)]
P(Flight Number | Area Code), P(Flight Number | Time), P(Flight Number | Day)
[Figure: Flight Number Identification accuracy, axis from 75.00% to 90.00%, on Dev Set and Eval Set, for N-Best Result, Pure BN result and Global Result]
TRAINING: 2.8 M utterances
TEST: 1485 utterances
Wai, C., Pieraccini, R., Meng, H., "A Dynamic Semantic Model for Rescoring Recognition Hypotheses," Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2001
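The idea of rescoring recognizer hypotheses with context can be sketched as follows. This is not the model of Wai et al.: it simply assumes that each N-best flight-number hypothesis carries a recognizer log score, and that log P(Flight Number | Area Code), log P(Flight Number | Time) and log P(Flight Number | Day) can be added to it. All probabilities, the competing flight number "697" and the names `P_FLIGHT_GIVEN`, `FLOOR` and `rescore` are made up for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical conditional tables P(flight number | context value).
P_FLIGHT_GIVEN: Dict[str, Dict[str, Dict[str, float]]] = {
    "area_code": {"Area Code 1": {"679": 0.30, "697": 0.05}},
    "time": {"8 AM": {"679": 0.40, "697": 0.10}},
    "day": {"Mon": {"679": 0.25, "697": 0.20}},
}

# Probability assumed for combinations never seen in the data.
FLOOR = 1e-4


def rescore(nbest: List[Tuple[str, float]],
            context: Dict[str, str]) -> List[Tuple[str, float]]:
    """Add context log-probabilities to each hypothesis and re-rank."""
    rescored = []
    for flight, asr_log_score in nbest:
        total = asr_log_score
        for variable, value in context.items():
            table = P_FLIGHT_GIVEN.get(variable, {}).get(value, {})
            total += math.log(table.get(flight, FLOOR))
        rescored.append((flight, total))
    # Best (highest) combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

With an N-best list of `[("697", -1.0), ("679", -1.5)]` and the context Area Code 1, 8 AM, Mon, the recognizer's second choice "679" moves to the top, which is the kind of correction that data collected after deployment makes possible.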
Conclusions
There is very little research in SLU today
– lack of data, funding, motivation
SLU is difficult and difficult to evaluate
– semantic vs. task completion
Certain speech-based applications do not need SLU, others do.
– be aware of competing technologies, even if they are not so advanced
There are difficult problems in commercial SLU that are not addressed by the research community.
– realignment of academic and industrial research