August 20, 2011
Stéphane HUET and Philippe LANGLAIS
NLPCS 2011 - Copenhagen
Identifying the Translations of Idiomatic Expressions using TransSearch
Identifying Translations of Idiomatic Expressions 2 S. HUET
Idiomatic Expressions
• Oxford Companion to the English Language
– Idioms are expressions of a given language, whose sense is not predictable from the meanings and arrangement of their elements
– To fight like cat and dog
– It rains cats and dogs
The problem of idiomatic expressions
• Numerous in most languages
• Have idiosyncratic meanings that cause problems for
– Non-native speakers
– NLP
• In Machine Translation (MT)
– Group multi-word expressions before the alignment process [Lambert and Banchs 05]
– Add a new feature encoding the fact that a phrase is a multi-word expression [Carpuat and Diab 10]
Idiomatic expressions and MT
Objectives of the study
• The ability of the bilingual concordancer TSRali to retrieve the translations of idiomatic expressions
• Practical issues in querying such a system
Outline
• Introduction
• TransSearch
• Experimental setup
• Evaluations
• Conclusion
TransSearch
• Available on the Web since 1996
• Developed by the Université de Montréal
• Used by many professional translators in Canada, by subscription
• 7.2 M queries over 6 years
• Exploits an English-French translation memory
• Incorporates word alignment technology
User interface
1. Retrieve sentence pairs
2. Spot translations
3. Identify the list of translations
Alignment and translation
• Word-based alignment (IBM)
• Translation spotting
– Constrained to contiguous word alignment
• Example sentence pair
– EN: This is in keeping with that strategy .
– FR: La présente mesure est conforme à cette stratégie .
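To make the contiguity constraint concrete, here is a minimal sketch, not TransSearch's actual algorithm. Given word-alignment links, it returns the smallest contiguous target span that covers every target word linked to the query. The helper `spot_translation` is hypothetical, and the alignment links are written by hand for this example rather than produced by an IBM model.

```python
def spot_translation(src_tokens, tgt_tokens, links, query_tokens):
    """Return the contiguous target span aligned to the query, or None.

    links is a set of (source_index, target_index) word-alignment pairs.
    """
    n = len(query_tokens)
    # Locate the first occurrence of the query in the source sentence.
    start = next(
        (i for i in range(len(src_tokens) - n + 1)
         if src_tokens[i:i + n] == query_tokens),
        None,
    )
    if start is None:
        return None
    query_positions = set(range(start, start + n))
    # Collect the target positions linked to any query word.
    targets = [t for s, t in links if s in query_positions]
    if not targets:
        return None
    # Contiguity constraint: keep everything between the two extremes.
    return tgt_tokens[min(targets):max(targets) + 1]


src = "This is in keeping with that strategy .".split()
tgt = "La présente mesure est conforme à cette stratégie .".split()
# Hand-written links, for illustration only.
links = {(0, 0), (0, 1), (0, 2), (1, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)}
spotted = spot_translation(src, tgt, links, "in keeping with".split())
print(" ".join(spotted))
```

With these links, the query "in keeping with" is mapped to the span "conforme à".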
Post-processing steps
• Objective: to have relevant and informative translations at the top of the list
• Filtering of bad translations
– Supervised classifier
– Features: alignment probabilities, POS tags
• Merging of similar translations
– Inflectional forms of the same canonical words: conforme à / conforme aux
– Differences in grammatical words or punctuation: à l'encontre de / à l'encontre
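The second merging rule can be sketched as follows, under simplifying assumptions: candidates are grouped when they become identical once grammatical words and punctuation tokens are dropped. The word list `GRAMMATICAL_WORDS` is a tiny hypothetical stand-in, the frequencies are invented, and merging inflectional forms (conforme à / conforme aux) would need a lemmatizer that is not shown here.

```python
import string
from collections import defaultdict

# Tiny illustrative list of French grammatical words (an assumption).
GRAMMATICAL_WORDS = {"de", "du", "des", "la", "le", "les"}


def merge_key(translation):
    """Key shared by translations that differ only by grammatical words or punctuation."""
    tokens = translation.lower().split()
    kept = [
        t for t in tokens
        if t not in GRAMMATICAL_WORDS
        and not all(c in string.punctuation for c in t)
    ]
    return tuple(kept)


def merge_translations(counts):
    """Group similar translations; return (representative, total, variants) by decreasing total."""
    groups = defaultdict(list)
    for translation, freq in counts.items():
        groups[merge_key(translation)].append((translation, freq))
    merged = []
    for variants in groups.values():
        # The most frequent variant represents the group.
        representative = max(variants, key=lambda v: v[1])[0]
        total = sum(freq for _, freq in variants)
        merged.append((representative, total, sorted(t for t, _ in variants)))
    return sorted(merged, key=lambda m: -m[1])


# Invented frequencies, for illustration only.
counts = {"à l'encontre de": 5, "à l'encontre": 2, "à l'encontre ,": 1, "contre": 4}
merged = merge_translations(counts)
for representative, total, variants in merged:
    print(representative, total, variants)
```

Merging raises the combined entry above its individual variants, which is one way to keep the top of the list informative.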
Types of queries
• Verbatim queries: “normal” queries
– is still in its infancy
• Ellipses: for discontinuous expressions
– is .. in its infancy
• Dictionary queries: for morphological expansions
– be+ still in its+ infancy
• Bilingual queries: to check translations
– En: is still in its infancy
– Fr: en est encore à ses premiers balbutiements
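One way to picture the first three query types is to compile each query into a regular expression. The sketch below is an assumption about how the operators could behave, not a description of TransSearch's internals: `..` is taken to allow up to `max_gap` intervening words, and `+` is taken to expand a word to the forms listed in a hypothetical `INFLECTIONS` table. Bilingual queries are not covered.

```python
import re

# Hypothetical inflection table; a real system would use a morphological lexicon.
INFLECTIONS = {
    "be": ["be", "am", "is", "are", "was", "were", "been", "being"],
    "its": ["its", "his", "her", "their", "my", "your", "our"],
}


def compile_query(query, max_gap=5):
    """Turn a query with '..' and '+' operators into a compiled regex."""
    parts = []
    gap_pending = False
    for token in query.split():
        if token == "..":
            gap_pending = True
            continue
        if token.endswith("+") and len(token) > 1:
            # Morphological expansion: any listed form of the lemma.
            lemma = token[:-1]
            forms = INFLECTIONS.get(lemma, [lemma])
            piece = "(?:" + "|".join(re.escape(f) for f in forms) + ")"
        else:
            piece = re.escape(token)
        if parts:
            if gap_pending:
                # Ellipsis: allow up to max_gap words between the two pieces.
                parts.append(r"\s+(?:\S+\s+){0,%d}" % max_gap)
            else:
                parts.append(r"\s+")
        parts.append(piece)
        gap_pending = False
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


verbatim = compile_query("is still in its infancy")
ellipsis = compile_query("is .. in its infancy")
dictionary = compile_query("be+ still in its+ infancy")

print(bool(verbatim.search("This industry is still in its infancy .")))
print(bool(ellipsis.search("The field is still very much in its infancy .")))
print(bool(dictionary.search("They were still in their infancy .")))
```

The verbatim pattern misses the last two sentences, which is the practical reason for the ellipsis and dictionary operators.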
Outline
• Introduction
• TransSearch
• Experimental setup
• Evaluations
• Conclusion
Resources
• Translation memory
– Canadian Hansards (1986-2007)
– 8.3 M sentence pairs
• Idiom lexicon
– French-English phrase book
– 1,467 expressions
– Some entries with 2 or 3 translations
Types of idiomatic expressions
• 2% are expressed in informal language
– She's well-upholstered.
– Il roule des mécaniques.
• 99% are used in the context of a sentence
– It's fantastic to bop till you drop.
• 80% are verbal phrases used in their inflected forms
– I slept like a log.
• 20% are fixed expressions
– When there's a will, there's a way
Manual preprocessing
• Annotation of words judged as extra information
– They put the new salesman through his paces.
• Types of extra-information words
– Modal verbs: can, must
– Semi-modal verbs: am going to, are likely to
– Catenative verbs: want to, keep
– Adverbial phrases: in Italy, when he heard the news
– Noun phrases: this poet, his latest book
Number of queries found in the TM

| Query type | Bilingual | EN | FR |
| --- | --- | --- | --- |
| Verbatim queries | 36 | 136 | 248 |
| + manual removal of extra words | 91 | 302 | 410 |
| + removal of extra pronoun | 106 | 381 | 509 |
| + verb lemmatization | 210 | 624 | 650 |
| + pronoun and determiner lemmatization | 238 | 700 | 705 |

Example queries at each step
• Verbatim queries
– EN: I have no axe to grind
– FR: Je ne prêche pas pour ma paroisse
• + manual removal of extra words
– EN: I have .. axe to grind
– FR: Je .. prêche .. pour ma paroisse
• + removal of extra pronoun
– EN: have .. axe to grind
– FR: prêche .. pour ma paroisse
• + verb lemmatization
– EN: have+ .. axe to grind
– FR: prêcher+ .. pour sa paroisse
• + pronoun and determiner lemmatization
– EN: have+ .. axe to grind
– FR: prêcher+ .. pour sa+ paroisse
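To read these counts as coverage, they can be divided by the size of the phrase book. The short sketch below assumes that every count is out of the 1,467 expressions of the idiom lexicon; the function name `coverage` is only illustrative.

```python
PHRASE_BOOK_SIZE = 1467  # expressions in the idiom lexicon (Resources slide)

# (bilingual, EN, FR) counts copied from the slides.
found = {
    "verbatim queries": (36, 136, 248),
    "+ manual removal of extra words": (91, 302, 410),
    "+ removal of extra pronoun": (106, 381, 509),
    "+ verb lemmatization": (210, 624, 650),
    "+ pronoun and determiner lemmatization": (238, 700, 705),
}


def coverage(count, total=PHRASE_BOOK_SIZE):
    """Share of phrase-book entries found, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


for step, counts in found.items():
    bilingual, en, fr = (coverage(c) for c in counts)
    print(f"{step}: bilingual {bilingual}%, EN {en}%, FR {fr}%")
```

Under that assumption, the fully relaxed queries reach a little under half of the phrase book on each monolingual side.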
Outline
• Introduction
• TransSearch
• Experimental setup
• Evaluations
• Conclusion
Evaluation using the phrase book
• 700 English queries found in the TM
– 36 sentence pairs per query
– 13 suggested translations
• 705 French queries found in the TM
– 32 sentence pairs per query
– 15 suggested translations
• Evaluation restricted to the 238 entries whose English and French sides occur in the same sentence pair
Recall measured using the phrase book
• For many queries, TransSearch displays relevant translations absent from the reference
– est nébuleux displayed after the reference être dans un état second for to be in a daze
– 34 correct translations displayed for to be around the corner

| Rank | 1 | 3 | 5 | all |
| --- | --- | --- | --- | --- |
| English queries | 41.6 | 59.2 | 65.1 | 74.8 |
| French queries | 41.6 | 54.6 | 62.6 | 76.5 |
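The slides do not spell out how recall at a given rank is computed. A plausible reading, sketched below as an assumption, counts a query as a hit when at least one phrase-book translation appears among the top k suggestions, using exact string match. The function `recall_at_k` and the two ranked lists are hypothetical toy data, not results from the study.

```python
def recall_at_k(ranked_by_query, references, k=None):
    """Percentage of queries with a reference translation in the top k suggestions.

    k=None considers the whole list, like the "all" column.
    """
    hits = 0
    for query, ranked in ranked_by_query.items():
        top = ranked if k is None else ranked[:k]
        if any(translation in references[query] for translation in top):
            hits += 1
    return 100.0 * hits / len(ranked_by_query)


# Invented ranked suggestions and references, for illustration only.
ranked_by_query = {
    "to be in a daze": ["être dans un état second", "est nébuleux"],
    "to sleep like a log": ["bien dormir", "dormir profondément",
                            "dormir comme une souche"],
}
references = {
    "to be in a daze": {"être dans un état second"},
    "to sleep like a log": {"dormir comme une souche"},
}

for k in (1, 3, None):
    print(k, recall_at_k(ranked_by_query, references, k))
```

Because only reference translations count as hits, a measure of this kind understates quality whenever the system displays correct translations that the phrase book does not list.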
Manual evaluation
• 100 French queries
• 5 annotators who judged 50 queries each
• 3 labels: “correct”, “wrong”, “partial”
• Low Fleiss inter-annotator agreement (0.25)

| Q: manger à tous les râteliers | J1 | J2 | J3 |
| --- | --- | --- | --- |
| slurps at everyone's trough | correct | correct | correct |
| double-dipper | partial | correct | partial |
| them pot lickers and accusing them of being at the trough and pork barrelling | wrong | partial | wrong |
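For readers unfamiliar with the agreement measure, here is a minimal sketch of the standard Fleiss' kappa formula. It is applied only to the three example rows shown for manger à tous les râteliers, so the value it prints is not the 0.25 reported for the full evaluation. The function name `fleiss_kappa` is illustrative.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of judges."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of ratings")
    totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Proportion of agreeing judge pairs for this item.
        agreement_sum += (
            sum(c * c for c in counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))
    observed = agreement_sum / n_items
    # Agreement expected by chance from the overall label proportions.
    expected = sum((c / (n_items * n_raters)) ** 2 for c in totals.values())
    return (observed - expected) / (1 - expected)


# The three example rows (judges J1, J2, J3).
ratings = [
    ["correct", "correct", "correct"],
    ["partial", "correct", "partial"],
    ["wrong", "partial", "wrong"],
]
kappa = fleiss_kappa(ratings)
print(round(kappa, 3))
```

Values close to 0 indicate agreement near chance level and 1 indicates perfect agreement, which is why 0.25 is described as low.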
Manual evaluation
• Average rank of the 1st translation labeled as correct by 1 annotator: 1.4
• For 97/100 queries, a correct translation is displayed
• Distribution of labels: correct 42%, partial 22%, wrong 36%
Conclusion
• About 50% of the idioms of the phrase book were found in the TM of TransSearch
• Users should use morphological (+) and proximity (..) operators for idioms
• Only 36% of the displayed translations were clearly wrong
Thank you for your attention