Spoken Language Understanding, the Research/Industry Chasm

© 2002 IBM Corporation http://w3.ibm.com/ibm/presentations Natural Language Technologies May 5, 2004 | SLU for Conversational Systems Spoken Language Understanding, the Research/Industry Chasm Roberto Pieraccini IBM T.J.Watson Research Center [email protected]

Transcript of Spoken Language Understanding, the Research/Industry Chasm

© 2002 IBM Corporation

Natural Language Technologies

May 5, 2004 | SLU for Conversational Systems

Spoken Language Understanding, the Research/Industry Chasm

Roberto Pieraccini

IBM T.J. Watson Research Center

[email protected]

Natural Language Technologies

SLU, The Research Industry Chasm | Roberto Pieraccini | HLT/NAACL 2004 © 2002 IBM Corporation

A Brief History of SLU

[Timeline figure: 1970, 1980, 1990, 2000]

RESEARCH

– FSM-based SLU

– Phrase structure, ATN, semantic grammars (mainly for understanding text)

– Robust parsing, spontaneous speech

– Call routing, sentence classification, statistical parsing

INDUSTRY

– FSM-based SLU

– Call routing

Programs and standards on the timeline: ARPA SUR, DARPA Resource Management, ATIS, Communicator, VoiceXML, SRGS

SLU RESEARCH

Open NL understanding

Few deployed systems

Little data available

Artificial tasks

Lack of evaluation paradigm

Little funding for SLU research

COMMERCIAL SLU

Mostly directed dialog

100s of deployed systems

Lots of proprietary data

Customer driven tasks

Task completion evaluation

Revenue based on license or per-minute

EuroSpeech 2003 – Paper Breakdown

BASIC (309 papers): Signal Processing, Speech Modeling, Acoustics, Speech Enhancement, Prosody, Emotions, Speech Coding, Corpora, Phonetics

NOISE (93 papers): Noise Robustness, Robust ASR

ENABLING (237 papers): Speech Recognition, Synthesis, Language Modeling, Speaker/Language ID, Speaker Verification

APP (28 papers): Non-dialog applications

DIALOG (86 papers): Dialog and multimodal systems

NL (9 papers): Summarization, title extraction, topic detection, NE recognition, ...

TRANS (14 papers): Speech-to-speech translation

UND (24 papers): Spoken Language Understanding

Spoken Language Understanding papers: 24/800 = 3% (14 academic, 10 industrial)
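The 3% figure follows directly from the category counts on the slide. A minimal sketch of that arithmetic, with the counts copied from the breakdown above (the function and variable names are illustrative, not from the talk):

```python
# Paper counts per category, copied from the EuroSpeech 2003 breakdown above.
PAPERS = {
    "BASIC": 309, "NOISE": 93, "ENABLING": 237, "APP": 28,
    "DIALOG": 86, "NL": 9, "TRANS": 14, "UND": 24,
}


def share(category: str, counts: dict[str, int] = PAPERS) -> float:
    """Return the fraction of all papers that fall in `category`."""
    return counts[category] / sum(counts.values())


if __name__ == "__main__":
    # Print categories from largest to smallest with their share of the total.
    for name, n in sorted(PAPERS.items(), key=lambda kv: -kv[1]):
        print(f"{name:<9}{n:>4}  {share(name):6.1%}")
    print(f"TOTAL    {sum(PAPERS.values()):>4}")
```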

SLU is difficult

SLU is difficult to evaluate

End-to-end evaluation

– Based on task completion measures

– Needs the full conversational system

– Needs real, motivated users

Semantic evaluation

– Based on semantic annotation

– Costly

– Subjective

– Needs interpretation principles

– Highly domain/application dependent

The ATIS evaluation

I am flying between New York and Washington tomorrow, early in the afternoon

DEP. CITY  DEP. AIRPORT  ARR. CITY  ARR. AIRPORT  AIRLINE  FLIGHT #  DEP. TIME
NNYC       JFK           DDEN       DEN           CO       156       12:37
NNYC       LGA           DDEN       DEN           DL       8901      12:58
NNYC       LGA           DDEN       DEN           DL       8903      13:45
NNYC       JFK           DDEN       DEN           AA       578       13:57
NNYC       LGA           DDEN       DEN           UA       187       14:15
NNYC       JFK           DDEN       DEN           DL       987       15:27

Systems evaluated on the basis of the data retrieved from the relational database

Reference min and max answers

Evaluation regulated by Principles of Interpretation (PofI)

Edited by the PofI committee over the 5 years of the project

Regular weekly meetings

About 100 principles
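The "reference min and max answers" comparison can be sketched as a containment check: a system answer is accepted if it includes everything in the minimal reference answer and nothing outside the maximal one. The sketch below is a simplification under stated assumptions: answers are modeled as flat sets of strings, whereas the actual ATIS comparator worked on database tuples, and the function name and example values are illustrative.

```python
def is_correct(hypothesis: set[str], ref_min: set[str], ref_max: set[str]) -> bool:
    """Accept a hypothesis that covers ref_min and stays inside ref_max."""
    return ref_min <= hypothesis <= ref_max


# Illustrative references: the minimal answer lists only the flights, the
# maximal answer also allows departure airports and times to be reported.
REF_MIN = {"CO 156", "DL 8901"}
REF_MAX = REF_MIN | {"JFK", "LGA", "12:37", "12:58"}

if __name__ == "__main__":
    print(is_correct({"CO 156", "DL 8901", "12:37"}, REF_MIN, REF_MAX))  # extra allowed field
    print(is_correct({"CO 156"}, REF_MIN, REF_MAX))                      # misses a required flight
    print(is_correct({"CO 156", "DL 8901", "UA 187"}, REF_MIN, REF_MAX)) # reports too much
```

The gap between the two references is what lets systems differ in how much they display without being penalized.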

2.2.1 A flight "between X and Y" means a flight "from X to Y".

2.2.8 The location of a departure, stop, or arrival should always be taken to be an airport.

2.2.3.3 "Stopovers" will mean "stops" unless the context clearly indicates that the subject intended "stopover", as in "Can I make a two day stopover on that flight?". In that case the query is answered using the stopover column of the restrictions table.

2.2.6 A "red-eye" flight is one that leaves between 9 P.M. and 3 A.M. and arrives between 5 A.M. and 12 noon.
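Principle 2.2.6 is precise enough to write down as a predicate. A minimal sketch, assuming both windows include their endpoints and that the departure window wraps past midnight (the helper names are illustrative):

```python
from datetime import time


def in_window(t: time, start: time, end: time) -> bool:
    """True if t lies in [start, end]; the window may wrap past midnight."""
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end


def is_red_eye(departure: time, arrival: time) -> bool:
    """Principle 2.2.6: leaves 9 P.M. to 3 A.M., arrives 5 A.M. to 12 noon."""
    return (in_window(departure, time(21, 0), time(3, 0))
            and in_window(arrival, time(5, 0), time(12, 0)))


if __name__ == "__main__":
    print(is_red_eye(time(23, 30), time(6, 15)))
    print(is_red_eye(time(12, 37), time(15, 0)))
```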

PERIOD         START  END
morning        0000   1200
afternoon      1200   1800
evening        1800   2200
day            0600   1800
night          1800   0600
early morning  0000   0800
mid-morning    0800   1000

TERM                INCLUDES ENDPOINTS?
before T1           No
after T1            No
between T1 and T2   Yes
arriving by T1      Yes
departing by T1     Yes
periods of the day  Yes

Do we need SLU in commercial applications?

S. Oviatt, "Predicting spoken disfluencies during human-computer interaction," Computer Speech and Language, 1995, 9:19-35.

Disfluencies = self-corrections, false starts, repetitions, filled pauses

The disfluency rate more than doubles going from constrained to unconstrained interactions.

The disfluency rate grows linearly with the length of the utterance.

Do we need SLU in commercial applications?

S.M. Witt, J.D. Williams, "Two Studies of Open vs. Directed Dialog Strategies in Spoken Dialog Systems," Proc. of EUROSPEECH 2003, Geneva, CH, September 2003

Apps with one-time users do poorly with open-prompt systems.

Apps with repeat users do almost as well with open prompts as with directed dialog.

OPEN: What would you like to do?

DIRECTED: ...choose from the following options: web password reset, course enrollment, direct deposit or benefits.

Do we need SLU in commercial applications?

[Figure: histogram of utterance length. X axis: number of words per utterance (1, 2, 3, 4, 5, 6, >6). Y axis: frequency (0 to 60).]

Corpora compared:

– Human to Human – AMEX (SRI) (2082)

– DARPA Communicator Data: Dec 2000 (11168)

– SpeechWorks deployed applications - Directed dialog + NL (136447)

Average number of words per sentence (values shown on the figure): 2.0, 2.9, 6.0

Do we need SLU in commercial applications?

Yes, but it depends on the application

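Do we need SLU in commercial applications?

[Figure: applications placed on two axes]

Sentence structure (natural-language-ness): from simple phrases to natural language

Dialog structure (initiative): from system initiative to mixed initiative

Quadrants: Simple Phrases / System Initiative; Natural Language / System Initiative; Simple Phrases / Mixed Initiative; Natural Language / Mixed Initiative

Applications shown: pizza ordering, flight status, stock trading, banking, help desk, problem solving, routing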

An architectural chasm?

Research Conversational Architecture

[Diagram components]

SPEECH RECOGNIZER

Language Model

NATURAL LANGUAGE UNDERSTANDING

Semantic Model

DIALOG MANAGER

Commercial Conversational Architecture

[Diagram components]

VoiceXML Browser

Application Server

Grammar 1, Grammar 2, Grammar 3, Grammar 4, Grammar 5

Labels on the connections: GRAMMAR 2, GRAMMAR 3, ASR result

Current industrial SLU

Grammar rules:

$ROOT = $ITINERARY;
$ITINERARY = $FROM $TO;
$FROM = from $AIRPORT;
$TO = to $AIRPORT;
$NY = (new york) | (J F K) | kennedy;
$BOS = boston | logan;
$AIRPORT = ($NY | $BOS) [airport];

Semantic tags attached to the rules:

$NY: airport = "JFK";
$BOS: airport = "BOS";
$FROM: origin = airport;
$TO: destination = airport;
$ROOT:
direction = "";
if (origin == "JFK" && destination == "BOS") {
  direction = "north";
} else if (origin == "BOS" && destination == "JFK") {
  direction = "south";
}

Example: From Boston to New York

Parse, bottom-up: $BOS $NY, then $AIRPORT $AIRPORT, then $FROM $TO, then $ITINERARY, then $ROOT

Tag results: airport = "BOS"; airport = "JFK"; origin = "BOS"; destination = "JFK"; direction = "south"

SRGS standard for grammars

<?xml version='1.0' encoding='ISO-8859-1'?>
<grammar version='1.0' xml:lang='en-us' root="ROOT">
  <rule id="ROOT" scope="public">
    <ruleref uri="#ITINERARY"
             tag="direction = '';
                  if (ITINERARY.origin == 'JFK' &amp;&amp; ITINERARY.destination == 'BOS') { direction = 'north'; }
                  else if (ITINERARY.origin == 'BOS' &amp;&amp; ITINERARY.destination == 'JFK') { direction = 'south'; }"/>
  </rule>

  <rule id="ITINERARY" scope="public">
    <ruleref uri="#FROM" tag="origin = FROM.airport;"/>
    <ruleref uri="#TO" tag="destination = TO.airport;"/>
  </rule>

  <rule id="FROM">
    <item>from</item>
    <ruleref uri="#AIRPORT" tag="airport = AIRPORT.airport;"/>
  </rule>

  <rule id="NY">
    <one-of tag="airport = 'JFK';">
      <item>new york</item>
      <item>JFK</item>
      <item>kennedy</item>
    </one-of>
  </rule>

  <!-- The remaining rules (TO, AIRPORT, BOS) are not shown on the slide. -->
</grammar>

Difficult problems for commercial systems

No data for training in the design/development phase

– System development with no data

– Tools for fast grammar handcrafting

– Tools for content word normalization/speech-ification

Oodles of data after deployment

– Tools for automatic or semi-automatic adaptation/learning

The problem of content words

Sentence structure

– I need to go to Phoenix from New York leaving on February 4th

– I need to go from New York to Phoenix on February 4th

– On February 4th leaving from New York and going to Phoenix

Content word variations

– Newark, Boston, Denver, Dallas, Baltimore, San Francisco, Los Angeles, Philadelphia

The problem of content words

Large lists of content words need to have priors

– How to estimate priors with no data (or even if you have data)?

– e.g. airport names, flight numbers, street names

Large lists of content words often come from proprietary databases

– Spelling to phonemes

– Acronym expansion

– Word normalization

– Synonym/paraphrase generation

Example

– Database entry: 14" display w/ anti-glr scrn

– Normalized: A fourteen inches display with anti-glare screen

– Paraphrase: A display of fourteen inches size with an anti-reflection screen
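The word normalization step for the display example can be sketched with two small lookup tables. This is only an illustration of the idea: the tables below are hypothetical and tiny, whereas a deployed system would need them to cover a whole proprietary catalog, which is exactly the tooling problem the slide points at. The sketch also does not add the leading article that appears in the slide's normalized form.

```python
import re

# Hypothetical lookup tables, just large enough for the example on the slide.
ABBREVIATIONS = {"w/": "with", "anti-glr": "anti-glare", "scrn": "screen"}
NUMBER_WORDS = {14: "fourteen", 15: "fifteen", 17: "seventeen"}


def normalize(entry: str) -> str:
    """Turn a terse catalog entry into text that can be spoken or recognized."""

    def spell_inches(match: re.Match) -> str:
        # 14" -> fourteen inches (the digits are kept if the number is unknown)
        digits = match.group(1)
        return f"{NUMBER_WORDS.get(int(digits), digits)} inches"

    text = re.sub(r'(\d+)"', spell_inches, entry)
    words = [ABBREVIATIONS.get(word.lower(), word) for word in text.split()]
    return " ".join(words)


if __name__ == "__main__":
    print(normalize('14" display w/ anti-glr scrn'))
```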

Exploiting real data

[Figure: Bayesian network linking Flight Number (e.g. 679) to Area Code (Area Code 1, 2, 3, …), Time (7 AM, 8 AM, 9 AM, 10 AM) and Day (Mon, Tue, Wed, Thu), with the conditional probabilities P(Flight Number | Area Code), P(Flight Number | Time) and P(Flight Number | Day)]

[Figure: bar chart "Flight Number Identification", accuracy axis from 75% to 90%, Dev Set and Eval Set, comparing N-Best Result, Pure BN result and Global Result]

Wai, C., Pieraccini, R., Meng, H., "A Dynamic Semantic Model for Rescoring Recognition Hypothesis," Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2001

TRAINING: 2.8 M utterances

TEST: 1485 utterances
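The idea of rescoring recognition hypotheses with context can be illustrated with a deliberately naive sketch: each N-best flight number gets its recognizer score plus the log of the conditional probabilities named on the slide, treated as independent. This is not the model from the cited paper; the independence assumption, the weighting, the probability floor, and all numbers below are made up for illustration.

```python
import math

# Hypothetical conditional tables, keyed by (flight_number, context_value).
EXAMPLE_TABLES = {
    "area_code": {("679", "212"): 0.60, ("619", "212"): 0.05},
    "day": {("679", "Mon"): 0.50, ("619", "Mon"): 0.20},
}
# Hypothetical N-best list: (flight_number, recognizer log score).
EXAMPLE_NBEST = [("619", -1.0), ("679", -1.2)]
EXAMPLE_CONTEXT = {"area_code": "212", "day": "Mon"}


def rescore(nbest, context, tables, weight=1.0, floor=1e-6):
    """Re-rank hypotheses by recognizer score plus weighted semantic log score."""
    rescored = []
    for flight, asr_log_score in nbest:
        # Sum of log P(flight | value) over the observed context variables;
        # unseen combinations fall back to a small floor probability.
        semantic = sum(
            math.log(tables[variable].get((flight, value), floor))
            for variable, value in context.items()
        )
        rescored.append((flight, asr_log_score + weight * semantic))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for flight, score in rescore(EXAMPLE_NBEST, EXAMPLE_CONTEXT, EXAMPLE_TABLES):
        print(flight, round(score, 3))
```

In this toy example the recognizer prefers 619, but the caller's area code and the day make 679 far more likely, so the combined score reverses the ranking.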

Conclusions

There is very little research in SLU today

– lack of data, funding, motivation

SLU is difficult, and difficult to evaluate

– semantic vs. task completion

Certain speech-based applications do not need SLU, others do

– be aware of competing technologies, even if they are not so advanced

There are difficult problems in commercial SLU that are not addressed by the research community

– realignment of academic and industrial research

Advertising Campaign for SLU on Google