Morphological Processing for Statistical Machine...
![Page 1: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/1.jpg)
Morphological Processing for
Statistical Machine Translation
Fabienne Cap
May 11th, 2016
![Page 2: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/2.jpg)
Goals for Today
Why Morphological Processing?
Morphological Processing in SMT
A closer look at Compound Merging
Fabienne Cap Morphological Processing for Statistical Machine Translation
![Page 4: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/4.jpg)
Data Sparsity
What is data sparsity?
→ rarely occurring words cause problems in statistical applications
Why is this problematic for SMT?
→ fewer occurrences → less reliable translations
→ unseen words cannot be translated
What can we do about it?
→ make the most out of the available training data!
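A minimal sketch of how unseen words show up in practice, assuming a toy training corpus of one sentence and a toy test sentence (both borrowed from the German to English example in these slides). The variable names are illustrative only.

```python
from collections import Counter

# Toy data: one training sentence and one test sentence (whitespace-tokenised).
train = "viele händler verkaufen obst in papiertüten .".split()
test = "papierhändler verkaufen tüten .".split()

# Count how often each word form was seen during training.
train_counts = Counter(train)

# A test word with count 0 is out-of-vocabulary (OOV):
# a purely word-based model has no translation for it.
oov = [w for w in test if train_counts[w] == 0]
oov_rate = len(oov) / len(test)

print(oov)       # ['papierhändler', 'tüten']
print(oov_rate)  # 0.5
```

Half of the test tokens are unseen here, although both "papier" and "händler" occur inside words of the training sentence.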
![Page 11: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/11.jpg)
Hands on Data Sparsity
There are two kinds of sparse data in parallel corpora for SMT:
1 unseen/rarely seen simplex words
→ use more data
2 unseen/rarely seen complex words
→ decomposition into seen words and word parts
![Page 16: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/16.jpg)
Practical Exercise
The Revenge of the Sith:
Sith language is morphologically richer than you thought!
![Page 20: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/20.jpg)
Morphology
Inflection: modification of words
→ Declension: case, gender and number marking of e.g. adjectives, nouns, determiners (röd, röda)
→ Conjugation: person, number, tense, mood and aspect marking of verbs (se, ser, sett)
→ Comparison: comparison of adjectives: positive, comparative, superlative (stor, större, störst)
Word Formation: creation of new words
→ Compounding: combination of free morphemes into new lexemes (frukt + korg = fruktkorg)
→ Derivation: combination of a free morpheme with one or more bound morphemes (äta + -bar = ätbar)
Previous work on morphological processing for SMT has mostly dealt with declension, comparison and compounding.
The goal is to make the source and the target language as similar as possible prior to word alignment,
e.g. through lemmatisation or compound splitting.
![Page 21: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/21.jpg)
Morphological Processing: Lemmatisation
Das Haus ist blau - The house is blue
| Number | Case | Definite | Indefinite |
|---|---|---|---|
| Singular | Nominative | das blaue Haus | ein blaues Haus |
| Singular | Genitive | des blauen Hauses | eines blauen Hauses |
| Singular | Accusative | in das blaue Haus | in ein blaues Haus |
| Singular | Dative | in dem blauen Haus | in einem blauen Haus |
| Plural | Nominative | die blauen Häuser | einige blaue Häuser |
| Plural | Genitive | der blauen Häuser | einiger blauer Häuser |
| Plural | Accusative | in die blauen Häuser | in einige blaue Häuser |
| Plural | Dative | in den blauen Häusern | in einigen blauen Häusern |
blau, blaue, blaues, blauen, blauer → blue
but: keep differences that are made in both languages!
Haus, Hauses → house
Häuser, Häusern → houses
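The idea of collapsing case and gender marking while keeping number can be sketched as follows. The lookup table `ANALYSES` and the function `normalise` are hypothetical and hand-written for this illustration; a real system would take the analyses from a morphological analyser.

```python
# Hypothetical hand-written analyses: surface form -> (lemma, number).
# Number is None where English makes no corresponding distinction.
ANALYSES = {
    "blau": ("blau", None), "blaue": ("blau", None), "blaues": ("blau", None),
    "blauen": ("blau", None), "blauer": ("blau", None),
    "Haus": ("Haus", "Sg"), "Hauses": ("Haus", "Sg"),
    "Häuser": ("Haus", "Pl"), "Häusern": ("Haus", "Pl"),
}

def normalise(token: str) -> str:
    """Drop case and gender marking, but keep number, which English also marks."""
    lemma, number = ANALYSES.get(token, (token, None))
    return f"{lemma}+{number}" if number else lemma

phrase = "in den blauen Häusern".split()
print([normalise(t) for t in phrase])  # ['in', 'den', 'blau', 'Haus+Pl']
```

All five adjective forms map to one token, while singular and plural nouns stay distinct, mirroring house vs. houses.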
![Page 30: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/30.jpg)
Morphological Processing: Compound Splitting
Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz
beef labelling monitoring task transfer law
→ unsplit, the alignment is 1:6
Rindfleisch Etikettierung Überwachung Aufgaben Übertragung Gesetz
beef labelling monitoring task transfer law
→ after splitting, the alignment is 1:1
This is a real example!
→ more compound splitting for SMT in a student project!
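A minimal dictionary-based splitter, given here only as a sketch: it is not the method used in the lecture or the student project. It assumes a small hypothetical vocabulary `VOCAB` of known words and allows an optional linking "s" between parts.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical vocabulary of known words (lowercased); in practice this
# would come from corpus frequencies or a morphological analyser.
VOCAB = {"rindfleisch", "etikettierung", "überwachung",
         "aufgaben", "übertragung", "gesetz"}
LINKERS = ("", "s")  # assumed subset of German linking elements

def split_compound(word: str) -> Optional[List[str]]:
    """Return a segmentation of `word` into VOCAB entries, or None."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(word):
            return ()
        for end in range(len(word), start, -1):  # try long parts first
            part = word[start:end]
            if part not in VOCAB:
                continue
            for linker in LINKERS:
                nxt = end + len(linker)
                # A linking element may only occur between two parts.
                if linker and nxt >= len(word):
                    continue
                if word.startswith(linker, end):
                    rest = solve(nxt)
                    if rest is not None:
                        return (part,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None

print(split_compound(
    "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz"))
```

With this vocabulary the call returns the six parts rindfleisch, etikettierung, überwachung, aufgaben, übertragung and gesetz, which can then be aligned 1:1 with the six English words.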
![Page 44: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/44.jpg)
German to English SMT Example
Baseline
training: viele händler verkaufen obst in papiertüten . ↔ many traders sell fruit in paper bags .
testing (Moses SMT decoder)
German input: papierhändler verkaufen tüten .
English output: papierhändler sell tüten .
Our system
training: viele händler verkaufen obst in papier tüten . ↔ many traders sell fruit in paper bags .
testing (Moses SMT decoder)
German input: papier händler verkaufen tüten .
English output: paper traders sell bags .
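The coverage effect in this example can be imitated with a toy word-for-word lookup. This is only an illustration: Moses learns phrase pairs with probabilities, whereas the tables `BASELINE_TABLE` and `SPLIT_TABLE` below are hypothetical dictionaries read off the single training pair by hand.

```python
# Toy "translation tables" read off the single training sentence pair.
BASELINE_TABLE = {
    "viele": "many", "händler": "traders", "verkaufen": "sell",
    "obst": "fruit", "in": "in", "papiertüten": "paper bags", ".": ".",
}
# After compound splitting, papiertüten is seen as papier + tüten.
SPLIT_TABLE = {k: v for k, v in BASELINE_TABLE.items() if k != "papiertüten"}
SPLIT_TABLE.update({"papier": "paper", "tüten": "bags"})

def translate(tokens, table):
    # Unknown words are copied through unchanged, as in the baseline output.
    return " ".join(table.get(t, t) for t in tokens)

print(translate("papierhändler verkaufen tüten .".split(), BASELINE_TABLE))
# papierhändler sell tüten .
print(translate("papier händler verkaufen tüten .".split(), SPLIT_TABLE))
# paper traders sell bags .
```

Splitting both the training data and the test input lets the parts seen in papiertüten and händler cover the unseen compound papierhändler.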
![Page 45: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/45.jpg)
Now: opposite translation direction!!!
Pay Attention You Must!!
Fabienne Cap Morphological ProcessingforStatistical Machine Translation
![Page 46: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/46.jpg)
English to German SMT Example
training:
"many traders sell fruit in paper bags ." / "viele händler verkaufen obst in papiertüten ."
"I find them too expensive ." / "mir sind die zu teuer ."
![Page 56: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/56.jpg)
English to German SMT Example
[Figure, build step: our system, output of the decoder]
training (compounds split), phrase pairs include: "many traders" / "viele händler", "paper" / "papier", "bags" / "tüten", "find them too expensive" / "sind die zu teuer"
testing, English input: "many paper traders find them too expensive ."
German output of the Moses SMT decoder: "viele papier händler sind die zu teuer ."
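How the split training data lets the decoder cover "paper" can be sketched with a toy phrase table. The sketch below is a greedy, monotone longest-match lookup and not the Moses search algorithm; the tables, the single-word entries and the function name `decode` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

PhraseTable = Dict[Tuple[str, ...], str]

# Baseline: "paper" was only seen inside "paper bags" / "papiertüten".
BASELINE_TABLE: PhraseTable = {
    ("many",): "viele",
    ("traders",): "händler",
    ("paper", "bags"): "papiertüten",
    ("find", "them", "too", "expensive"): "sind die zu teuer",
    (".",): ".",
}
# Split system: "papier" and "tüten" are separate tokens, so "paper" gets its own entry.
SPLIT_TABLE: PhraseTable = {
    ("many",): "viele",
    ("traders",): "händler",
    ("paper",): "papier",
    ("bags",): "tüten",
    ("find", "them", "too", "expensive"): "sind die zu teuer",
    (".",): ".",
}


def decode(sentence: str, table: PhraseTable) -> str:
    """Greedy monotone decoding: take the longest known phrase at each position."""
    tokens = sentence.split()
    max_len = max(len(src) for src in table)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            src = tuple(tokens[i:i + n])
            if src in table:
                out.append(table[src])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown word: copy it through untranslated
            i += 1
    return " ".join(out)


test_input = "many paper traders find them too expensive ."
baseline_out = decode(test_input, BASELINE_TABLE)
split_out = decode(test_input, SPLIT_TABLE)
print(baseline_out)
print(split_out)
```

In the toy baseline, "paper" is copied through because it never occurred on its own; in the split system the output still contains the compound parts as separate words, which is why a merging step is needed afterwards.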
![Page 58: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/58.jpg)
English to German SMT Example
[Figure, build step: our system, compound parts merged]
testing, English input: "many paper traders find them too expensive ."
German output: "viele papierhändler sind die zu teuer ."
![Page 60: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/60.jpg)
English to German SMT Example
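[Figure: two pipelines, each training and testing a Moses SMT decoder]
Baseline
training: "many traders sell fruit in paper bags ." / "viele händler verkaufen obst in papiertüten ." and "I find them too expensive ." / "mir sind die zu teuer ."
phrase pairs used: "many" / "viele", "traders" / "händler", "find them too expensive" / "sind die zu teuer"; "paper" has no translation of its own
testing, English input: "many paper traders find them too expensive ."
German output: "viele", "händler" and "sind die zu teuer ." are produced, while "paper" stays untranslated
Our system
training (compounds split): "many traders sell fruit in paper bags ." / "viele händler verkaufen obst in papier tüten ." and "I find them too expensive ." / "mir sind die zu teuer ."
phrase pairs used: "many traders" / "viele händler", "paper" / "papier", "bags" / "tüten", "find them too expensive" / "sind die zu teuer"
testing, English input: "many paper traders find them too expensive ."
German output: "vielen papierhändlern sind die zu teuer ." (compound merged and inflected)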
![Page 61: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/61.jpg)
German to English SMT Example
Morphological processing ...
makes it possible to translate compounds that have not occurred in the training data:
  provided that they have been properly split
  their parts must have occurred in the training data
  it is irrelevant how the parts occurred: as simplex words, compound modifiers or heads
increases the word counts of simplex words and thus makes their translations more reliable as well
can produce unseen inflectional variants of seen words
can produce coherent inflected sequences of words
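The point about word counts can be made concrete with a small counting sketch. The two-sentence corpus is taken from the examples above; the split dictionary `SPLITS` and the function name `count_words` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical splitter output: compound -> parts.
SPLITS: Dict[str, List[str]] = {
    "papiertüten": ["papier", "tüten"],
    "papierhändler": ["papier", "händler"],
}


def count_words(sentences: List[str],
                splits: Optional[Dict[str, List[str]]] = None) -> Counter:
    """Count tokens, optionally replacing each compound by its parts first."""
    counts: Counter = Counter()
    for sentence in sentences:
        for token in sentence.split():
            counts.update(splits.get(token, [token]) if splits else [token])
    return counts


corpus = [
    "viele händler verkaufen obst in papiertüten .",
    "papierhändler verkaufen tüten .",
]
before = count_words(corpus)
after = count_words(corpus, SPLITS)
for word in ("papier", "händler", "tüten"):
    print(word, before[word], after[word])
```

Without splitting, "papier" is never observed as a word of its own; after splitting, every occurrence inside a compound adds to the count of the simplex word.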
![Page 62: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/62.jpg)
Goals for Today
Why Morphological Processing?
Morphological Processing in SMT
A closer look at Compound Merging
![Page 64: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/64.jpg)
Compound Merging Approaches
1) List-based approach
store compounds and their parts after splitting
only words that are on this list are merged into compounds
2) POS-based approach
POS-markup for compound modifiers: Inflations|N-Part + Rate|N = Inflationsrate|N
restricts the POS of candidate heads for merging
use CRFs for the merging decision
3) Morphological approach
use a rule-based morphological analyser for analysis and generation of compounds
use CRFs for the merging decision
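The first two approaches can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the stored split list and the rule of lowercasing the head are hypothetical, and the POS-based sketch merges wherever the markup allows it, whereas the slides leave the actual merging decision to a CRF.

```python
from typing import Dict, List, Optional, Tuple


def merge_list_based(tokens: List[str], splits: Dict[str, List[str]]) -> List[str]:
    """Merge consecutive tokens only if they match parts stored at splitting time."""
    merge_list: Dict[Tuple[str, ...], str] = {tuple(p): c for c, p in splits.items()}
    max_len = max((len(k) for k in merge_list), default=1)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in merge_list:
                out.append(merge_list[key])
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def merge_pos_based(tagged: List[str]) -> List[str]:
    """Merge word|POS-Part modifiers with a following head of the same POS."""
    out: List[str] = []
    prefix, prefix_pos = "", None  # type: str, Optional[str]
    for token in tagged:
        word, tag = token.rsplit("|", 1)
        if tag.endswith("-Part"):
            pos = tag[: -len("-Part")]
            if prefix and pos != prefix_pos:
                out.append(f"{prefix}|{prefix_pos}")  # flush a modifier without head
                prefix = ""
            prefix += word.lower() if prefix else word
            prefix_pos = pos
        elif prefix and tag == prefix_pos:
            out.append(f"{prefix}{word.lower()}|{tag}")  # modifier(s) + head
            prefix, prefix_pos = "", None
        else:
            if prefix:
                out.append(f"{prefix}|{prefix_pos}")  # head POS does not match
                prefix, prefix_pos = "", None
            out.append(token)
    if prefix:
        out.append(f"{prefix}|{prefix_pos}")
    return out


stored = {"papiertüten": ["papier", "tüten"]}
seen = merge_list_based("obst in papier tüten .".split(), stored)
unseen = merge_list_based("viele papier händler".split(), stored)
pos_merged = merge_pos_based(["die|ART", "Inflations|N-Part", "Rate|N", "steigt|V"])
print(seen, unseen, pos_merged)
```

The list-based variant restores "papiertüten" but leaves "papier händler" unmerged, because that compound was never stored; the POS-based variant is not tied to a list and can therefore also produce compounds that were not seen.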
![Page 73: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/73.jpg)
Compound Merging for English to German SMT
Compound merging is a challenging task:
not every pair of consecutive words that could be merged should be merged:
“kind” + “punsch” = “kinderpunsch” (punch for children)
but: “darf ein kind punsch trinken?” (may a child drink punch?)
solution: use linear-chain Conditional Random Fields (CRFs).
machine learning technique
learn context-dependent merging decisions
based on features assigned to each word
features can be derived from the target and/or the source language
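Framed as sequence labelling, each word gets a label that says whether it should be merged with the following word, and the CRF predicts these labels from per-word features. The sketch below only shows the data preparation; the label names MERGE and KEEP, the feature names and the hand-assigned labels are assumptions for illustration, and the CRF training itself (with any CRF toolkit) is left out.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Features for one word: the word itself plus its left and right neighbour."""
    return {
        "word": tokens[i],
        "prev": tokens[i - 1] if i > 0 else "<s>",
        "next": tokens[i + 1] if i + 1 < len(tokens) else "</s>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """One feature dictionary per word, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Hand-labelled toy data: MERGE = join this word with the following one.
compound_case = ["kind", "punsch"]
compound_labels = ["MERGE", "KEEP"]
sentence_case = ["darf", "ein", "kind", "punsch", "trinken", "?"]
sentence_labels = ["KEEP"] * len(sentence_case)

kind_in_compound = sentence_features(compound_case)[0]
kind_in_sentence = sentence_features(sentence_case)[2]
print(kind_in_compound)
print(kind_in_sentence)
```

The word "kind" is followed by "punsch" in both cases, so the decision cannot be made from the word pair alone; the surrounding context (here the preceding article "ein") is what a context-dependent model can pick up on.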
![Page 82: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/82.jpg)
Compound Merging for English to German SMT
Examples of CRF features derived from the target language:
part of speech: some POS patterns are more likely to form compounds than others
modifier vs. head position: some words occur much more often as modifiers than as heads (and vice versa)
productivity of a modifier: some words are more productive than others; for each modifier, I count the number of different head types
However, as all these features are derived from the (often disfluent) target language, they might not be very reliable
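The two frequency-based features can be computed from a list of split compounds, for example as sketched below. The input format (one tuple of parts per compound token, head last), the toy data and the function name `compound_stats` are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def compound_stats(split_compounds: List[Tuple[str, ...]]):
    """Count modifier and head occurrences and the productivity of each modifier."""
    modifier_count: Counter = Counter()
    head_count: Counter = Counter()
    heads_per_modifier: Dict[str, Set[str]] = defaultdict(set)
    for parts in split_compounds:
        *modifiers, head = parts  # the last part of a compound is its head
        head_count[head] += 1
        for modifier in modifiers:
            modifier_count[modifier] += 1
            heads_per_modifier[modifier].add(head)
    # Productivity: number of different head types seen with a modifier.
    productivity = {m: len(heads) for m, heads in heads_per_modifier.items()}
    return modifier_count, head_count, productivity


data = [
    ("papier", "tüten"),
    ("papier", "händler"),
    ("obst", "händler"),
    ("papier", "tüten"),
]
modifier_count, head_count, productivity = compound_stats(data)
print(modifier_count, head_count, productivity)
```

Repeated compound tokens raise the modifier and head counts but not the productivity, since productivity counts head types rather than occurrences.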
Fabienne Cap Morphological ProcessingforStatistical Machine Translation
![Page 83: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/83.jpg)
Compound Merging for English to German SMT
Examples of CRF features derived from the target language:
part of speech some POS patterns are more likely toform compounds than others
modifier vs. head position some words occur much more often asmodifiers than as heads (and vice versa)
productivity of a modifier some words are more productive thanothers: for each modifier, I count thenumber of different head types
However, as all these features are derived from the (oftendisfluent) target language, they might not be very reliable
![Page 88: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/88.jpg)
Compound Merging for English to German SMT
In contrast, the source sentence is fluent language, and sometimes the English source sentence structure may help the decision:
may a child have a punch ?
(S (SQ (MD may) (NP (DT a) (NN child)) (VP (V have) (NP (DT a) (NN punch)))) ($. ?))
should not be merged: darf ein kind punsch trinken?
![Page 94: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/94.jpg)
Compound Merging for English to German SMT
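In contrast, the source sentence is fluent language, and sometimes the English source sentence structure may help the decision:
may a child have a punch ?
(S (SQ (MD may) (NP (DT a) (NN child)) (VP (V have) (NP (DT a) (NN punch)))) ($. ?))
lowest node covering both “child” and “punch”: SQ
should not be merged: darf ein kind punsch trinken?
everyone may have punch for children !
(S (NP (NN everyone)) (VP (MD may) (VP (V have) (NP (NN punch) (PP (P for) (NP (NN children)))))) ($. !))
lowest node covering both “punch” and “children”: NP
should be merged: jeder darf kind punsch haben!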
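The structural cue from the two trees can be checked mechanically. The sketch below is a minimal illustration, not the author's implementation: the bracketed strings are a reconstruction of the two trees on the slide, the helper names (`parse_brackets`, `leaves`, `share_np`) are hypothetical, and the mapping from the German words to their English counterparts (for example via word alignment) is assumed to be given. Words are matched by string, which would not be enough for sentences with repeated words.

```python
def parse_brackets(text):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def share_np(node, word_a, word_b):
    """True if some NP node covers both words."""
    if isinstance(node, str):
        return False
    label, children = node
    if label == "NP":
        covered = leaves(node)
        if word_a in covered and word_b in covered:
            return True
    return any(share_np(child, word_a, word_b) for child in children)


question = parse_brackets(
    "(S (SQ (MD may) (NP (DT a) (NN child)) "
    "(VP (V have) (NP (DT a) (NN punch)))) ($. ?))"
)
statement = parse_brackets(
    "(S (NP (NN everyone)) (VP (MD may) (VP (V have) "
    "(NP (NN punch) (PP (P for) (NP (NN children)))))) ($. !))"
)

no_merge_cue = share_np(question, "child", "punch")        # different NPs
merge_cue = share_np(statement, "punch", "children")       # one NP covers both
```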
![Page 95: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/95.jpg)
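Compound Merging for English to German SMT
CRF Training: learn binary merging decisions
German features: word, POS, MOD, HEAD, PROD; English feature: EN:NP; label: MERGE?

| word | POS | MOD | HEAD | PROD | EN:NP | MERGE? |
|---|---|---|---|---|---|---|
| darf | VM | 0 | 0 | 0 | 0 | 0 |
| ein | DET | 0 | 0 | 0 | 0 | 0 |
| kind | NN | 16,126 | 1,195 | 1,824 | 0 | 0 |
| punsch | NN | 2 | 13 | 2 | 0 | 0 |
| trinken | VV | 0 | 0 | 0 | 0 | 0 |
| ? | ? | 0 | 0 | 0 | 0 | 0 |

| word | POS | MOD | HEAD | PROD | EN:NP | MERGE? |
|---|---|---|---|---|---|---|
| jeder | PRO | 0 | 0 | 0 | 0 | 0 |
| darf | VM | 0 | 0 | 0 | 0 | 0 |
| kind | NN | 16,126 | 1,195 | 1,824 | 1 | 1 |
| punsch | NN | 2 | 13 | 2 | 0 | 0 |
| haben | VV | 0 | 0 | 0 | 0 | 0 |
| ! | ! | 0 | 0 | 0 | 0 | 0 |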
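As a rough sketch, the rows of such a table could be turned into one feature dictionary and one label per token, a shape that many CRF toolkits accept as a training sequence. The values below are copied from the second sentence of the table; the function name `to_crf_sequence`, the dictionary keys, and the extra `next_pos` context feature are assumptions made for illustration and are not taken from the slides.

```python
# Columns: word, POS, MOD, HEAD, PROD, EN:NP, MERGE?
ROWS = [
    ("jeder", "PRO", 0, 0, 0, 0, 0),
    ("darf", "VM", 0, 0, 0, 0, 0),
    ("kind", "NN", 16126, 1195, 1824, 1, 1),
    ("punsch", "NN", 2, 13, 2, 0, 0),
    ("haben", "VV", 0, 0, 0, 0, 0),
    ("!", "!", 0, 0, 0, 0, 0),
]


def to_crf_sequence(rows):
    """Turn table rows into per-token feature dicts and string labels."""
    features, labels = [], []
    for i, (word, pos, mod, head, prod, en_np, merge) in enumerate(rows):
        # Hypothetical context feature: POS of the following token.
        next_pos = rows[i + 1][1] if i + 1 < len(rows) else "EOS"
        features.append({
            "word": word,
            "pos": pos,
            "next_pos": next_pos,
            "mod": mod,
            "head": head,
            "prod": prod,
            "en_np": en_np,
        })
        labels.append(str(merge))
    return features, labels


features, labels = to_crf_sequence(ROWS)
```

Each sentence becomes one such sequence, so the model can learn a merging decision for every token in context rather than in isolation.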
![Page 97: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/97.jpg)
Compound Merging for English to German SMT
CRF Training: learn binary merging decisions
In the training data...
→ “Kind” occurred more often as a modifier than as a head
→ the opposite applies to “Punsch”!
![Page 98: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/98.jpg)
Compound Merging for English to German SMT
CRF Training: learn binary merging decisions
English feature determines merging decision!
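Once a merging decision is available for every token, applying it is mechanical. The following is a minimal sketch under a strong simplification: a token flagged 1 is glued directly to the token after it. The function name `apply_merges` is hypothetical, and real German compounds often need a linking element as well (the actual word here would be "Kinderpunsch"), which this sketch ignores.

```python
def apply_merges(tokens, merge_flags):
    """Join each token flagged 1 with the token that follows it."""
    output, pending = [], ""
    for token, flag in zip(tokens, merge_flags):
        pending += token
        if not flag:
            output.append(pending)
            pending = ""
    if pending:                       # trailing flag with nothing left to merge into
        output.append(pending)
    return output


statement = apply_merges(
    ["jeder", "darf", "kind", "punsch", "haben", "!"],
    [0, 0, 1, 0, 0, 0],
)
question = apply_merges(
    ["darf", "ein", "kind", "punsch", "trinken", "?"],
    [0, 0, 0, 0, 0, 0],
)
```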
![Page 99: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/99.jpg)
To thank you I want
![Page 100: Morphological Processing for Statistical Machine Translationsara/kurser/MT16/slides/f7b-morph.pdf · Statistical Machine Translation Fabienne Cap May 11th, 2016. Goals for Today Why](https://reader034.fdocuments.us/reader034/viewer/2022050405/5f82fd8c055d2a75864991d7/html5/thumbnails/100.jpg)
Where are we?