Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.
![Page 1: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/1.jpg)
Language Divergences and Solutions
Advanced Machine Translation Seminar
Alison Alvarez
![Page 2: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/2.jpg)
Overview
- Introduction
- Morphology Primer
- Translation Mismatches
  - Types
  - Solutions
- Translation Divergences
  - Types
  - Solutions
- Different MT Systems
  - Generation Heavy Machine Translation
  - DUSTer
![Page 3: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/3.jpg)
Source ≠ Target
- Languages don’t encode the same information in the same way
- Makes MT complicated
- Keeps all of us employed
![Page 4: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/4.jpg)
Morphology in a Nutshell
- Morphemes are word parts
  - Work +er
  - Iki +ta +ku +na +ku +na +ri +ma +shi +ta
- Types of Morphemes
  - Derivational: makes a new word
  - Inflectional: adds information to an existing word
![Page 5: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/5.jpg)
Morphology in a Nutshell
- Analytic/Isolating
  - Little or no inflectional morphology, separate words
  - Vietnamese, Chinese
  - I was made to go
- Synthetic
  - Lots of inflectional morphology
  - Fusional vs. Agglutinating
  - Romance Languages, Finnish, Japanese, Mapudungun
  - Ika (to go) +se (to make/let) +rare (passive) +ta (past tense)
  - He need +s (3rd person singular) it.
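The agglutinating example can be written out as data to make the one-morpheme-one-meaning pattern visible. This is only an illustrative sketch: the morphemes and glosses are the ones on the slide, while the function names are made up for the example.

```python
# Morpheme/gloss pairs copied from the slide's Japanese example.
MORPHEMES = [
    ("ika", "go"),
    ("se", "make/let"),
    ("rare", "passive"),
    ("ta", "past"),
]


def surface_form(morphemes):
    """Concatenate the morphemes into the single agglutinated word."""
    return "".join(form for form, _ in morphemes)


def interlinear_gloss(morphemes):
    """Join the glosses with hyphens, one gloss per morpheme."""
    return "-".join(gloss for _, gloss in morphemes)


print(surface_form(MORPHEMES))       # the whole clause is one word
print(interlinear_gloss(MORPHEMES))  # each morpheme carries one meaning
```

An analytic language spreads the same four pieces of information over separate words, as in "I was made to go".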
![Page 6: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/6.jpg)
Translation Differences
Types
- Translation Mismatches
  - Different information from source to target
- Translation Divergences
  - Same information from source to target, but the meaning is distributed differently in each language
![Page 7: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/7.jpg)
Translation Mismatches
“…the information that is conveyed is different in the source and target languages”
Types:
- Lexical level
- Typological level
![Page 8: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/8.jpg)
Lexical Mismatches
A lexical item in one language may have more distinctions than in another
Brother
- 弟 (otouto): younger brother
- 兄さん (nii-san): older brother
![Page 9: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/9.jpg)
Typological Mismatches
Mismatch between languages with different levels of grammaticalization
One language may be more structurally complex
Source marking, Obligatory Subject
![Page 10: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/10.jpg)
Typological Mismatches
- Source: Quechua vs. English
  - (they say) s/he was singing --> takisharansi
  - taki (sing) +sha (progressive) +ra (past) +n (3rd sg) +si (reportative)
- Obligatory Arguments: English vs. Japanese
  - Kusuri wo nonda --> (I, you, etc.) took medicine.
  - Makasemasu! --> (I’ll) leave (it) to (you)
![Page 11: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/11.jpg)
Translation Mismatch Solutions
- More information --> Less information (easy)
- Less information --> More information (hard)
- Context clues
- Language Models
- Generalization
- Formal representations
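The asymmetry between the two directions can be illustrated with the "brother" example from the lexical mismatch slide. The sketch below is a toy under stated assumptions: the two-entry lexicon, the function names, and the "older"/"younger" context hints are all hypothetical, chosen only to show why one direction is a plain lookup and the other needs outside information.

```python
# Hypothetical toy lexicon: Japanese distinguishes brothers by relative age.
JA_TO_EN = {"弟": "brother", "兄さん": "brother"}

# Hypothetical context clues that could disambiguate the English word.
CLUE_TO_JA = {"younger": "弟", "older": "兄さん"}


def ja_to_en(word):
    """More information --> less information: a plain lookup is enough."""
    return JA_TO_EN[word]


def en_to_ja_candidates(word):
    """Less information --> more information: several targets are possible."""
    return sorted(ja for ja, en in JA_TO_EN.items() if en == word)


def en_to_ja(word, context):
    """Choose a candidate only if the surrounding context contains a clue."""
    candidates = en_to_ja_candidates(word)
    for clue, ja in CLUE_TO_JA.items():
        if clue in context.lower().split() and ja in candidates:
            return ja
    return None  # undecidable without more information


print(ja_to_en("弟"))
print(en_to_ja_candidates("brother"))
print(en_to_ja("brother", "my older brother lives in Osaka"))
print(en_to_ja("brother", "my brother lives in Osaka"))
```

When the context holds no clue the function returns `None`; a real system would fall back on a language model or a default choice at that point.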
![Page 12: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/12.jpg)
Translation Divergences
“…the same information is conveyed in source and target texts”
- Divergences are quite common
  - Occur in about one out of every three sentences in the TREC El Norte Newspaper corpus (Spanish-English)
- Sentences can have multiple kinds of divergences
![Page 13: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/13.jpg)
Translation Divergence Types
- Categorial Divergence
- Conflational Divergence
- Structural Divergence
- Head Swapping Divergence
- Thematic Divergence
![Page 14: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/14.jpg)
Categorial Divergence
Translation that uses different parts of speech
Tener hambre (have hunger) --> be hungry
Noun --> adjective
![Page 15: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/15.jpg)
Conflational Divergence
- The translation of two words using a single word that combines their meaning
- Can also be called a lexical gap
  - X stab Z --> X dar puñaladas a Z (X give stabs to Z)
  - glastuinbouw --> cultivation under glass
![Page 16: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/16.jpg)
Structural Divergence
A difference in the realization of incorporated arguments
- PP to Object
  - X entrar en Y (X enter in Y) --> X enter Y
  - X ask for a referendum --> X pedir un referendum (ask-for a referendum)
![Page 17: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/17.jpg)
Head Swapping Divergence
Involves the demotion of a head verb and the promotion of a modifier verb to head position
- Yo entro en el cuarto corriendo: [S [NP [N Yo]] [VP [V entro] [PP en el cuarto] [VP corriendo]]]
- I ran into the room: [S [NP [N I]] [VP [V ran] [PP into the room]]]
![Page 18: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/18.jpg)
Thematic Divergence
This divergence occurs when sentence arguments switch argument roles from one language to another
X gustar a Y (X please to Y) --> Y like X
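A hand-written rule for this single case might look like the sketch below. The frame layout (`verb`, `subject`, `a_object`) and the function name are hypothetical, and only the gustar/like pair from the slide is covered.

```python
def gustar_to_like(frame):
    """Rewrite 'X gustar a Y' as 'Y like X' by exchanging the two arguments."""
    if frame["verb"] != "gustar":
        raise ValueError("this sketch only covers 'gustar'")
    return {
        "verb": "like",
        "subject": frame["a_object"],  # Y: the experiencer becomes the subject
        "object": frame["subject"],    # X: the thing liked becomes the object
    }


spanish = {"verb": "gustar", "subject": "X", "a_object": "Y"}
english = gustar_to_like(spanish)
print(english)
```

The lexical content is unchanged; only the mapping from arguments to grammatical positions is swapped.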
![Page 19: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/19.jpg)
Divergence Solutions and Statistical/EBMT Systems
- Not really addressed explicitly in SMT
- Covered in EBMT only if it is covered extensively in the data
![Page 20: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/20.jpg)
Divergence Solutions and Transfer Systems
- Hand-written transfer rules
- Automatic extraction of transfer rules from bi-texts
- Problematic with multiple divergences
![Page 21: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/21.jpg)
Divergence Solutions and Interlingua Systems
- Mel’čuk’s Deep Syntactic Structure
- Jackendoff’s Lexical Semantic Structure
- Both require “explicit symmetric knowledge” from both source and target language
- Expensive
![Page 22: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/22.jpg)
Divergence Solutions and Interlingua Systems
John swam across a river
Juan cruza el río nadando
[event CAUSE JOHN
[event GO JOHN [path ACROSS JOHN [position AT JOHN RIVER]]]
[manner SWIM+INGLY]]
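To see how one language-neutral structure can yield both sentences, the bracketed representation can be held as nested data and realized twice. This is a rough sketch, not how any particular interlingua system is implemented: the tuple encoding, the `find` helper, and the four small realization tables are assumptions made for the example.

```python
# The slide's structure as nested tuples: (type, primitive, *arguments).
LCS = (
    "event", "CAUSE", "JOHN",
    ("event", "GO", "JOHN",
     ("path", "ACROSS", "JOHN",
      ("position", "AT", "JOHN", "RIVER"))),
    ("manner", "SWIM+INGLY"),
)

# Hypothetical realization tables for this single example.
EN_MANNER_VERB = {"SWIM+INGLY": "swam"}
EN_PATH_PREP = {"ACROSS": "across"}
ES_PATH_VERB = {"ACROSS": "cruza"}
ES_MANNER_GERUND = {"SWIM+INGLY": "nadando"}


def find(node, node_type):
    """Depth-first search for the first sub-structure of the given type."""
    if isinstance(node, tuple):
        if node[0] == node_type:
            return node
        for child in node[1:]:
            hit = find(child, node_type)
            if hit is not None:
                return hit
    return None


def realize_english(lcs):
    """English puts the manner in the main verb and the path in a preposition."""
    manner = find(lcs, "manner")[1]
    path = find(lcs, "path")[1]
    return f"John {EN_MANNER_VERB[manner]} {EN_PATH_PREP[path]} a river"


def realize_spanish(lcs):
    """Spanish puts the path in the main verb and the manner in a gerund."""
    manner = find(lcs, "manner")[1]
    path = find(lcs, "path")[1]
    return f"Juan {ES_PATH_VERB[path]} el río {ES_MANNER_GERUND[manner]}"


print(realize_english(LCS))
print(realize_spanish(LCS))
```

Both generators read the same `manner` and `path` nodes; they differ only in which node is realized as the head verb. The cost noted on the previous slide shows up in the tables: each language needs its own lexical knowledge keyed to the shared primitives.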
![Page 23: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/23.jpg)
Generation-Heavy MT
- Built to address language divergences
- Designed for source-poor/target-rich translation
- Non-Interlingual
- Non-Transfer
- Uses symbolic overgeneration to account for different translation divergences
![Page 24: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/24.jpg)
Generation-Heavy MT
- Source language
  - syntactic parser
  - translation lexicon
- Target language
  - lexical semantics, categorial variations & subcategorization frames for overgeneration
  - Statistical language model
![Page 25: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/25.jpg)
GHMT System
![Page 26: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/26.jpg)
Analysis Stage
- Independent of Target Language
- Creates a deep syntactic dependency
  - Only argument structure, top-level conceptual nodes & thematic-role information
- Should normalize over syntactic & morphological phenomena
![Page 27: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/27.jpg)
Translation Stage
- Converts SL lexemes to TL lexemes
- Maintains dependency structure
![Page 28: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/28.jpg)
Analysis/Translation Stage
GIVE (v) [cause go]
- agent: I
- theme: STAB (n)
- goal: JOHN
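The translation stage can be pictured as a lexeme-for-lexeme rewrite over the dependency tree. The sketch below assumes a Spanish source tree built around "dar puñaladas", which the slide does not show; the `Node` class and the four-entry lexicon are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lexeme: str
    role: str = "root"
    children: list = field(default_factory=list)


# Hypothetical SL-to-TL translation lexicon for this one example.
LEXICON = {"DAR": "GIVE", "YO": "I", "PUÑALADA": "STAB", "JUAN": "JOHN"}


def translate(node):
    """Swap each SL lexeme for a TL lexeme; roles and tree shape are untouched."""
    return Node(
        lexeme=LEXICON[node.lexeme],
        role=node.role,
        children=[translate(child) for child in node.children],
    )


# Assumed source-side dependency for "dar puñaladas".
source = Node("DAR", children=[
    Node("YO", "agent"),
    Node("PUÑALADA", "theme"),
    Node("JUAN", "goal"),
])

target = translate(source)
for child in target.children:
    print(f"{target.lexeme} --{child.role}--> {child.lexeme}")
```

The output tree still has the shape of "give stabs to John"; turning it into "stab John" is left to the generation stage.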
![Page 29: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/29.jpg)
Generation Stage
- Lexical & Structural Selection
  - Conversion to a thematic dependency
    - Uses syntactic-thematic linking map
    - “loose” linking
  - Structural expansion
    - Addresses conflation & head-swapped divergences
  - Turn thematic dependency to TL syntactic dependency
    - Addresses categorial divergence
![Page 30: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/30.jpg)
Generation Stage: Structural Expansion
![Page 31: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/31.jpg)
Generation Stage
- Linearization Step
  - Creates a word lattice to encode different possible realizations
  - Implemented using oxyGen engine
- Sentences ranked & extracted
  - Nitrogen’s statistical extractor
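The overgenerate-then-rank idea can be sketched with a toy bigram model. Everything here is an assumption for illustration: the four-sentence corpus, the explicit candidate list (standing in for a word lattice), and the add-one smoothing. It does not use or reproduce the oxyGen or Nitrogen software.

```python
import math
from collections import Counter

# Hypothetical target-language corpus used to estimate bigram counts.
CORPUS = [
    "I stabbed John",
    "I saw John",
    "John stabbed Mary",
    "I gave Mary a book",
]


def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram_model(sentences):
    """Count bigrams and their left contexts; also return the vocabulary size."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a candidate sentence."""
    contexts, bigrams, vocab_size = model
    tokens = tokenize(sentence)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def rank(candidates, model):
    """Order the overgenerated candidates from most to least probable."""
    return sorted(candidates, key=lambda s: log_prob(s, model), reverse=True)


MODEL = train_bigram_model(CORPUS)
# Overgenerated realizations of the GIVE / STAB / JOHN dependency.
CANDIDATES = ["I gave stabs to John", "I gave John stabs", "I stabbed John"]
ranked = rank(CANDIDATES, MODEL)
print(ranked[0])
```

The symbolic component only has to propose the conflated variant "I stabbed John" alongside the literal ones; the statistical model decides which reads most like the target language.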
![Page 32: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/32.jpg)
Generation Stage
![Page 33: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/33.jpg)
GHMT Results
4 of 5 Spanish-English divergences “can be generated using structural expansion & categorial variations”
The remaining 1 out of 5 needed more world knowledge or idiom handling
SL syntactic parser can still be hard to come by
![Page 34: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/34.jpg)
Divergences and DUSTer
Helps to overcome divergences for word alignment & improve coder agreement
Changes an English sentence structure to resemble another language
More accurate alignment and projection of dependency trees without training on dependency tree data
![Page 35: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/35.jpg)
DUSTer
Motivation for the development of automatic correction of divergences
1. “Every Language Pair has translation divergences that are easy to recognize”
2. “Knowing what they are and how to accommodate them provides the basis for refined word level alignment”
3. “Refined word-level alignment results in improved projection of structural information from English to another language”
![Page 36: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/36.jpg)
DUSTer
![Page 37: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/37.jpg)
DUSTer
- Bi-text parsed on English side only
- “Linguistically Motivated” & common search terms
- Conducted on Spanish & Arabic (and later Chinese & Hindi)
- Uses all of the divergences mentioned before, plus a “light verb” divergence
  - Try --> put to trying --> poner a prueba
![Page 38: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/38.jpg)
DUSTer Rule Development Methods
- Identify canonical transformations for each divergence type
- Categorize English sentences into divergence type or “none”
- Apply appropriate transformations
- Humans align E, E’, and the foreign language
![Page 39: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/39.jpg)
DUSTer Rules
    # "kill" => "LightVB kill(N)" (LightVB = light verb)
    # Presumably, this will work for "kill" => "give death to"
    # "borrow" => "take lent (thing) to"
    # "hurt" => "make harm to"
    # "fear" => "have fear of"
    # "desire" => "have interest in"
    # "rest" => "have repose on"
    # "envy" => "have envy of"
    type1.B.X [English{2 1 3} Spanish{2 1 3 4 5} ]
    [ Verb<1,i,CatVar:V_N> [ Noun<2,j,Subj> ] [ Noun<3,k,Obj> ] ]
    <-->
    [ LightVB<1,Verb> [ Noun<2,j,Subj> ] [ Noun<3,i,Obj> ]
      [ Oblique<4,Pred,Prep> [ Noun<5,k,PObj> ] ] ]
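Read as a rewrite, the rule turns an English verb-subject-object clause into an E′ clause whose word order follows the Spanish side: subject, light verb, nominalized verb, preposition, original object. The sketch below is one hypothetical way to apply that idea; the table entries are taken from the comments in the rule, but the function and data layout are not DUSTer's actual implementation.

```python
# (light verb, noun, preposition) triples taken from the rule's comments.
LIGHT_VERB_TABLE = {
    "kill": ("give", "death", "to"),
    "hurt": ("make", "harm", "to"),
    "fear": ("have", "fear", "of"),
    "envy": ("have", "envy", "of"),
}


def apply_light_verb_rule(subj, verb, obj):
    """Rewrite [Subj Verb Obj] as [Subj LightVB Noun Prep Obj].

    Returns None when the verb has no light-verb entry, i.e. the
    sentence is left untransformed.
    """
    entry = LIGHT_VERB_TABLE.get(verb)
    if entry is None:
        return None
    light_verb, noun, prep = entry
    # Order follows Spanish{2 1 3 4 5}: subject, light verb, noun, prep, object.
    return [subj, light_verb, noun, prep, obj]


print(apply_light_verb_rule("Mary", "fear", "dogs"))
print(apply_light_verb_rule("Mary", "see", "dogs"))
```

The transformed E′ has one token per Spanish token in the matching construction, which is what makes the subsequent word alignment closer to one-to-one.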
![Page 40: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/40.jpg)
DUSTer Results
![Page 41: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/41.jpg)
Conclusion
- Divergences are common
- They are not handled well by most MT systems
- GHMT can account for divergences, but still needs development
- DUSTer can handle divergences through structure transformations, but requires a great deal of linguistic knowledge
![Page 42: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/42.jpg)
The End
Questions?
![Page 43: Language Divergences and Solutions Advanced Machine Translation Seminar Alison Alvarez.](https://reader035.fdocuments.us/reader035/viewer/2022070308/551c36e05503460d398b4742/html5/thumbnails/43.jpg)
References
Dorr, Bonnie J., "Machine Translation Divergences: A Formal Description and Proposed Solution," Computational Linguistics, 20:4, pp. 597--633, 1994.
Dorr, Bonnie J. and Nizar Habash, "Interlingua Approximation: A Generation-Heavy Approach," In Proceedings of Workshop on Interlingua Reliability, Fifth Conference of the Association for Machine Translation in the Americas, AMTA-2002, Tiburon, CA, pp. 1--6, 2002.
Dorr, Bonnie J., Clare R. Voss, Eric Peterson, and Michael Kiker, "Concept Based Lexical Selection," Proceedings of the AAAI-94 fall symposium on Knowledge Representation for Natural Language Processing in Implemented Systems, New Orleans, LA, pp. 21--30, 1994.
Dorr, Bonnie J., Lisa Pearl, Rebecca Hwa, and Nizar Habash, "DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment," Proceedings of the Fifth Conference of the Association for Machine Translation in the Americas, AMTA-2002, Tiburon, CA, pp. 31--43, 2002.
Habash, Nizar and Bonnie J. Dorr, "Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation," In Proceedings of the Fifth Conference of the Association for Machine Translation in the Americas, AMTA-2002, Tiburon, CA, pp. 84--93, 2002.
Haspelmath, Martin. Understanding Morphology. Oxford University Press, 2002.
Kameyama, Megumi, Ryo Ochitani, and Stanley Peters, "Resolving Translation Mismatches With Information Flow," Annual Meeting of the Association for Computational Linguistics, 1991.