SIG IL 2000
Evaluation of a Practical Interlingua for Task-Oriented Dialogue
Lori Levin, Donna Gates, Alon Lavie, Fabio Pianesi, Dorcas Wallace, Taro
Watanabe, Monika Woszczyna
Interchange Format Design
The CSTAR II Interchange Format was designed and developed by all of the CSTAR II partners: CMU, IRST, ETRI, UKA, CLIPS, and ATR.
www.c-star.org
Expressivity vs Simplicity
• If it is not expressive enough, components of meaning will be lost.
• If it is not simple enough, it can’t be used reliably across sites.
• If it is not simple enough, it will not be quickly portable to new domains.
Task Oriented Sentences
• Perform an action in the domain.
• Are not descriptive.
• Contain fixed expressions that cannot be translated literally.
Domain Actions: Extended, Domain-Specific Speech Acts
Examples:
c:request-information+availability+room
a:give-information+personal-data
c:give-information+temporal+arrival
Components of the Interchange Format
speaker      a: (agent)
speech act   give-information
concept*     +availability+room
argument*    (room-type=(single & double), time=md12)
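The four components compose into strings such as c:request-information+availability+room (room-type=single). As an illustration only (this is a hypothetical helper, not part of the CSTAR II tools), a small Python sketch that splits such a string back into its components:

```python
import re

def parse_if(dialogue_act):
    """Split an IF dialogue act into speaker, speech act, concepts, and arguments.

    Illustrative parser, not the official CSTAR II implementation; it handles
    only the surface pattern  speaker:speech-act(+concept)* (arguments).
    """
    m = re.match(r"([ca]):([a-z-]+)((?:\+[a-z-]+)*)\s*(?:\((.*)\))?$", dialogue_act)
    if not m:
        raise ValueError("not a well-formed IF expression: %r" % dialogue_act)
    speaker, speech_act, concept_str, args = m.groups()
    return {
        "speaker": {"c": "client", "a": "agent"}[speaker],
        "speech-act": speech_act,
        "concepts": [c for c in concept_str.split("+") if c],
        "arguments": args or "",
    }

parsed = parse_if("c:request-information+availability+room")
# parsed["speech-act"] == "request-information"
# parsed["concepts"] == ["availability", "room"]
```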
Examples
no that’s not necessary → c:negate
yes I am → c:affirm
and I was wondering what you have in the way of rooms available during that time → c:request-information+availability+room
my name is alex waibel → c:give-information+personal-data (person-name=(given-name=alex, family-name=waibel))
and how will you be paying for this → a:request-information+payment (method=question)
I have a mastercard → c:give-information+payment (method=mastercard)
Not Covered or Not Represented in IF
• Relative clauses
• Comparatives (in general)
• Tense
• Number (but quantity is represented)
Expressivity: Coverage Experiment
• Development data was tagged with interlingua representations by human experts.
• Sentences that are not intended to be covered by the interlingua (as judged by human experts) were given the tag “no-tag.”
• Test data was tagged by human experts.
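Given such tags, interlingua coverage reduces to the fraction of DA units not labeled “no-tag.” A minimal sketch of that computation (a hypothetical helper; tag values are assumed to be one string per DA unit):

```python
def coverage(tags):
    """Percent of DA units covered by the interlingua, i.e. not tagged 'no-tag'.

    Illustrative only: `tags` is one human-expert tag per DA unit.
    """
    covered = sum(tag != "no-tag" for tag in tags)
    return 100.0 * covered / len(tags)

coverage(["c:affirm", "no-tag", "c:negate", "c:thank"])  # 75.0
```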
Coverage Experiment: Development and Test Data
Languages          Dialogue Type                                    Number of DA Units
Development Data:
English            monolingual                                      2698
Italian            monolingual                                      234
Korean             bilingual (only Korean utterances are included)  1142
Test Data:
Japanese-English   bilingual                                        6069
The Interchange Format Database
61.2.3 olang I lang I Prv IRST “telefono per prenotare delle stanze per quattro colleghi”
61.2.3 olang I lang E Prv IRST “I’m calling to book some rooms for four colleagues”
61.2.3 IF Prv IRST c:request-action+reservation+features+room (for-whom= (associate, quantity=4))
61.2.3 comments: dial-oo5-spkB-roca0-02-3
d.u.sdu olang X lang Y Prv Z “sdu-in-language-Y on one line”
d.u.sdu olang X lang E Prv Z “sdu-in-English on one line”
d.u.sdu IF Prv Z dialogue-act-on-one-line
d.u.asdu comments: your comments
d.u.asdu comments: go here
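The record layout above (a dialogue.utterance.sdu id, olang/lang codes, Prv naming the providing site) can be read mechanically. An illustrative reader, assuming whitespace-separated fields and quote-delimited transcriptions — a sketch, not the official database tooling:

```python
def parse_db_line(line):
    """Rough reader for one Interchange Format database record.

    Field layout is inferred from the slide's examples; this is an
    illustrative sketch, not the project's actual database reader.
    """
    parts = line.split()
    record = {"id": parts[0]}
    rest = parts[1:]
    if rest[0] == "olang":                      # transcription record
        record["original-lang"] = rest[1]
        record["lang"] = rest[3]                # rest[2] == "lang"
        record["site"] = rest[5]                # rest[4] == "Prv"
        record["text"] = " ".join(rest[6:]).strip('"')
    elif rest[0] == "IF":                       # interlingua record
        record["site"] = rest[2]                # rest[1] == "Prv"
        record["if"] = " ".join(rest[3:])
    elif rest[0] == "comments:":
        record["comments"] = " ".join(rest[1:])
    return record

rec = parse_db_line('61.2.3 olang I lang E Prv IRST "I am calling to book some rooms"')
# rec["lang"] == "E", rec["site"] == "IRST"
```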
Coverage of Top 10 Dialogue Acts in Development Data
Cumulative %   Percent   Count   DA
               5.9       244     no-tag
15.7           15.7      652     acknowledge
19.8           4.1       172     affirm
23.3           3.4       143     thank
26.0           2.7       113     introduce-self
28.0           2.0       85      give-info+price
30.1           2.0       85      greeting
31.9           1.9       78      give-info+temp
33.7           1.8       75      give-info+num
35.5           1.8       73      give-info+price+room
37.2           1.7       70      req-info+payment
Coverage of Top 10 Speech Acts in Development Data
Cumulative %   Percent   Count   Speech Act
30.1           30.1      1250    give-information
45.8           15.7      655     acknowledge
57.7           11.9      493     req-information
62.7           5.0       209     req-verif-give-inf.
67.6           4.9       203     request-action
71.7           4.1       172     affirm
75.1           3.4       143     thank
77.9           2.7       113     introduce-self
80.2           2.4       98      offer
82.4           2.1       89      accept
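The Percent and Cumulative % columns in these coverage tables follow directly from the raw counts. A sketch of that arithmetic (the total DA-unit count would come from the data set; the labels and numbers in the example call below are made up for illustration):

```python
def coverage_table(counts, total):
    """Recompute the (cumulative %, percent, count, label) rows from raw counts.

    `counts` is a list of (label, count) pairs sorted by frequency;
    `total` is the number of DA units in the data set.
    """
    rows, cumulative = [], 0.0
    for label, count in counts:
        pct = 100.0 * count / total
        cumulative += pct
        rows.append((round(cumulative, 1), round(pct, 1), count, label))
    return rows

rows = coverage_table([("give-information", 50), ("acknowledge", 25)], 100)
# rows == [(50.0, 50.0, 50, "give-information"), (75.0, 25.0, 25, "acknowledge")]
```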
Coverage of Top 10 Dialogue Acts in Test Data
Cumulative %   Percent   Count   DA
               4.6       263     no-tag
15.6           15.6      885     acknowledge
20.2           4.6       260     thank
23.7           3.5       200     introduce-self
27.0           3.4       191     affirm
29.7           2.7       153     apologize
32.3           2.6       147     greeting
34.6           2.3       128     closing
36.3           1.7       98      give-info+personal
38.0           1.7       95      give-info+temp.
39.5           1.6       89      give-info+price
Coverage of Top 10 Speech Acts in Test Data
Cumulative %   Percent   Count   Speech Act
25.6           25.6      1454    give-information
41.7           16.1      916     acknowledge
53.6           11.9      677     req-information
58.2           4.6       260     thank
62.0           3.7       213     req-verif-give-info
65.5           3.5       200     introduce-self
68.8           3.4       191     affirm
72.0           3.2       181     request-action
74.8           2.8       159     accept
77.5           2.7       153     apologize
Simplicity:Consistency of Use Across Sites
• Successful international demo.
• After testing English-Italian and English-Korean, Italian-Korean worked without extra effort.
• Inter-coder agreement experiment
• Cross-site evaluation experiment
Inter-coder Agreement Experiment
• 84 DA units from Japanese-English data
• Some dialogue fragments and some isolated sentences
• Coded at CMU and IRST
• Results reported in percent agreement
Inter-Coder Agreement Results
Speech Act      82.14%
Concept List    88.00%
Dialogue Act    65.48%
Argument List   85.79%
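These scores are plain percent agreement: matching tags divided by total units. A minimal sketch, assuming one tag per DA unit from each coder (the component-level figures above would compare only the relevant component of each tag):

```python
def percent_agreement(tags_a, tags_b):
    """Percent agreement between two coders over the same DA units.

    Illustrative: `tags_a` and `tags_b` are parallel lists, one tag
    per DA unit from each coder.
    """
    assert len(tags_a) == len(tags_b), "coders must tag the same units"
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return 100.0 * matches / len(tags_a)

score = percent_agreement(["c:affirm", "c:negate", "c:thank"],
                          ["c:affirm", "c:negate", "c:greeting"])
# score ≈ 66.67 (2 of 3 units agree)
```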
Inter-Coder Agreement Error Analysis of 33 Sentences
• 6 are equivalent due to ambiguity in the IF specification.
• 16 are similar enough to produce output with equivalent meaning.
  – offer-search+availability: Let me check the availability
  – give-information+search+availability: I will check the availability
• 4 contain differences where the input sentence was ambiguous and taggers chose different meanings.
  – 6 o’clock could be 6:00 or 18:00
• 5 contain errors by one or more taggers and would produce outputs with different meanings.
Cross-Site Evaluation
• Analysis and generation grammars were written at different sites (CMU and IRST).
• Analysis at CMU produces IF.
• IF is sent to IRST.
• Generation at IRST produces Italian sentences.
Intra-Site Evaluation
• Analysis and generation are both performed at CMU by researchers in constant contact with each other.
• English-IF-English, English-German, and English-Japanese
Cross Site Evaluation Data
• 130 utterances from a user study performed at CMU
• Speech input
• “Traveller” is a second-time user.
• “Agent” is a system developer.
• Traveller and agent cannot see or hear each other.
• All communication is through English-IF-English paraphrase.
Evaluation Scoring
• OK: meaning is preserved
• Perfect: meaning is preserved and the output is fluent
• Bad: meaning is not preserved
• Acceptable: sum of Perfect and OK
• English-German was graded at CMU, IRST, and CLIPS.
• English-IF-English was graded at CMU and CLIPS.
• English-Japanese was graded at CMU.
• English-Italian was graded at IRST.
• English-French was graded at CLIPS.
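The Acceptable figure is simply the Perfect and OK counts combined, as a share of all graded utterances. A sketch of that aggregation (grade labels are lowercased strings here by assumption; averaging across multiple graders is not shown):

```python
from collections import Counter

def acceptable_rate(grades):
    """Percent Acceptable = Perfect + OK, over one grader's judgments.

    Illustrative: `grades` holds one of "perfect", "ok", "bad" per utterance.
    """
    counts = Counter(grades)
    total = sum(counts.values())
    return 100.0 * (counts["perfect"] + counts["ok"]) / total

acceptable_rate(["perfect", "ok", "bad", "ok"])  # 75.0
```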
End-to-End Evaluation Results
Method            Output Lang.   % Acceptable   Grader           Number of Graders
1. Recognition    English        78%            CMU              3
2. Transcr.       English        75%            CMU+CLIPS        4
3. Rec.           English        61%            CMU+CLIPS        4
4. Transcr.       Japanese       77%            CMU              2
5. Rec.           Japanese       62%            CMU              2
6. Transcr.       German         69%            CMU+IRST+CLIPS   5
7. Rec.           German         60%            CMU+IRST+CLIPS   5
8. Transcr.       Italian        73%            IRST             6
9. Rec.           Italian        61%            IRST             6
10. Transcr.      French         66%            IRST             2
11. Rec.          French         56%            IRST             2
End-to-End Evaluation Results
Method            Output Lang.   % Acceptable   Grader   Number of Graders
1. Recognition    English        78%            CMU      3
2. Transcr.       English        74%            CMU      3
3. Rec.           English        59%            CMU      3
4. Transcr.       Japanese       77%            CMU      2
5. Rec.           Japanese       62%            CMU      2
6. Transcr.       German         70%            CMU      2
7. Rec.           German         58%            CMU      2
8. Transcr.       German         67%            IRST     2
9. Rec.           German         59%            IRST     2
10. Transcr.      Italian        73%            IRST     6
11. Rec.          Italian        61%            IRST     6
Conclusions
• Coverage is surprisingly good for a certain type of data: role playing for flight reservations, hotel reservations, greetings, and payment.
• Cross-site evaluation is about as good as intra-site evaluation.
• Inter-coder agreement could be improved, but not all errors affect translation quality.