Bridging the Gap: Machine Translation for Lesser Resourced Languages

Post on 21-Jan-2016

Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst

2

Inupiaq: 100s of Speakers

Quechua: 6 Million Speakers

Mapudungun: 900,000 Speakers

Katrina: 100s of Speakers

3

Machine Translation (MT)

Source Language

Target Language

4

Machine Translation (MT)

Source Language

Target Language

Direct: Statistical MT, Example Based MT

5

Machine Translation (MT)

Text Generation

Source Language

Target Language

Transfer: Rule Based MT

Direct: Statistical MT, Example Based MT

Syntactic Parsing

Morphological Analysis+

6

Machine Translation (MT)

Semantic Analysis

Sentence

Planning

Text Generation

Source Language

Target Language

Transfer: Rule Based MT

Direct: Statistical MT, Example Based MT

Interlingua

Syntactic Parsing

Morphological Analysis+

7

Machine Translation (MT)

Semantic Analysis

Text Generation

Source Language

Target Language

Transfer: Rule Based MT

Interlingua

Direct: Statistical MT, Example Based MT

+ High quality

- Expertise intensive development cycle

Syntactic Parsing

Morphological Analysis+

8

Machine Translation (MT)

Semantic Analysis

Text Generation

Source Language

Target Language

Transfer: Rule Based MT

Interlingua

Direct: Statistical MT, Example Based MT

+ Short development time

- Requires large bilingual corpus

Syntactic Parsing

Morphological Analysis+

9

Machine Translation (MT)

Syntactic Parsing

Semantic Analysis

Text Generation

Source Language

Target Language

Interlingua

Morphological Analysis+

Transfer: Rule Based MT

Direct: Statistical MT, Example Based MT

Our Approach

10

Machine Translation (MT)

Syntactic Parsing

Semantic Analysis

Text Generation

Source Language

Target Language

Interlingua

Morphological Analysis+

Transfer: Rule Based MT

Direct: Statistical MT, Example Based MT

+ High quality

- Expertise intensive development cycle

11

Machine Translation (MT)

Syntactic Parsing

Semantic Analysis

Text Generation

Source Language

Target Language

Interlingua

Morphological Analysis+

Automate the development of deep-analysis MT

+ High quality

- Expertise intensive development cycle

12

Our Position

Linguistic Structure

and

Bilingual Informants

help automate the development of

deep-analysis machine translation systems

13

Sub-Problems

1. Morphology Induction

2. Syntax Refinement

14

Morphology Induction

1. Linguistic Structure

2. Bilingual Informants

16

Paradigms Organize Morphology

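Mapudungun verb suffix slots, each listed with its alternatives (Ø = no suffix):

Loc: pa, pu, Ø

Asp: tu, ka, Ø

Hab: ke, Ø

Mode: pe, Ø

Report: (ü)rke, Ø

Pol / Mood: la, ki, nu, Ø

Tense: a, fu, afu, Ø

Obj Agr: fi, Ø

Subj Agr / Mood: (ü)n, li, chi, yu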

17

Paradigm Discovery in 3 Steps

1. Search out partial paradigms in a network of candidates

2. Cluster overlapping partial paradigms

3. Filter the clusters, keeping the largest clusters most likely to model true paradigms

A portion of a Spanish paradigm candidate network (candidate suffix set, number of stems, example stems):

e.er.erá.ido.ieron.ió (28 stems): deb, escog, ofrec, reconoc, vend, ...

e.ido.ieron.ir.irá.ió (28 stems): asist, dirig, exig, ocurr, sufr, ...

e.erá.ido.ieron.ió (28 stems): deb, escog, ...

e.er.ido.ieron.ió (46 stems): deb, parec, recog, ...

e.ido.ieron.irá.ió (28 stems): asist, dirig, ...

e.ido.ieron.ir.ió (39 stems): asist, bat, sal, ...

e.er.erá.ieron.ió (32 stems): deb, padec, romp, ...

e.ido.ieron.ió (86 stems): asist, deb, hund, ...

e.erá.ieron.ió (32 stems): deb, padec, ...

er.ido.ieron.ió (58 stems): ascend, ejerc, recog, ...

ido.ieron.ir.ió (44 stems): interrump, sal, ...

azar.e.ido.ieron.ir.ió (1 stem): sal
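The first of the three steps can be illustrated with a small sketch. This is not the authors' system: the function name candidate_schemes, the min_stems threshold, and the eight-word list are assumptions made up for illustration. It only builds candidate suffix sets from every stem + suffix split of each word. It does not cluster or filter them.

```python
from collections import defaultdict

def candidate_schemes(words, min_stems=2):
    """Map each candidate suffix set to the stems that occur with exactly that set."""
    # Every split of a word into stem + suffix is a candidate analysis;
    # an empty suffix is written "Ø".
    suffixes_of = defaultdict(set)
    for word in words:
        for i in range(1, len(word) + 1):
            suffixes_of[word[:i]].add(word[i:] or "Ø")
    # Group the stems that share the same set of candidate suffixes.
    schemes = defaultdict(set)
    for stem, suffixes in suffixes_of.items():
        if len(suffixes) > 1:
            schemes[frozenset(suffixes)].add(stem)
    # Keep only suffix sets supported by several stems (hypothetical threshold).
    return {s: stems for s, stems in schemes.items() if len(stems) >= min_stems}

# Toy word list (an assumption, not the slide's corpus).
WORDS = ["debe", "debido", "debieron", "debió",
         "vende", "vendido", "vendieron", "vendió"]

for suffixes, stems in candidate_schemes(WORDS).items():
    print(".".join(sorted(suffixes)), len(stems), sorted(stems))
```

On this toy list the sketch finds e.ido.ieron.ió with stems deb and vend, but also the spurious do.eron.ó with stems debi and vendi. Candidates like the second one are why the later clustering and filtering steps are needed.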

18

Morpho Challenge 2007

Unsupervised Morphology Induction Competition

English
• 3rd Place Overall
• Bested the Strong Baseline Morfessor (Creutz, 2006)

German
• 1st Place when Combined with Morfessor

19

Morpho Challenge 2007

Unsupervised Morphology Induction Competition

English
• 3rd Place Overall
• Bested the Strong Baseline Morfessor (Creutz, 2006)

German
• 1st Place when Combined with Morfessor

No Mapudungun yet

Agglutinative sequences of suffixes coming soon

20

Our Machine Translation Architecture

INPUT TEXT

21

Our Machine Translation Architecture

INPUT TEXT

Morphology Analysis

Morphology Analysis Lexicon

22

Our Machine Translation Architecture

INPUT TEXT

Grammar

&

Lexicon

Machine Translation

System

Morphology Analysis

Morphology Analysis Lexicon

23

Morphology Generation

Our Machine Translation Architecture

INPUT TEXT

Grammar

&

Lexicon

Morphology Analysis

Morphology Analysis Lexicon

Morphology Generation

Lexicon

Machine Translation

System

24

Morphology Generation

Our Machine Translation Architecture

INPUT TEXT

Grammar

&

Lexicon

OUTPUT TEXT

Morphology Analysis

Morphology Analysis Lexicon

Morphology Generation

Lexicon

Machine Translation

System

27

Sub-Problems

1. Morphology Induction

2. Syntax Refinement

28

Syntax Refinement

1. Linguistic Structure

2. Bilingual Informants

30

Mapudungun

pelafiñ Maria

Spanish

No vi a María

English

I didn’t see Maria

Linguistic Structure: Syntax

31

Mapudungun

pelafiñ Maria
pe -la -fi -ñ Maria
see -neg -3.obj -1.subj.indicative Maria

Spanish

No vi a María
No vi a María
neg see.1.subj.past.indicative acc Maria

English

I didn’t see Maria

Linguistic Structure: Syntax
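The Mapudungun gloss line above can be reproduced with a toy analyzer. This is a minimal sketch, assuming a hand-written lexicon that covers only this one example. STEMS, SUFFIXES, and analyze are hypothetical names, not part of the authors' morphological analyzer.

```python
# Hypothetical toy lexicon: only the morphemes of the slide's example.
STEMS = {"pe": "see"}
SUFFIXES = {"la": "neg", "fi": "3.obj", "ñ": "1.subj.indicative"}

def analyze(word):
    """Return (morphemes, glosses) for a stem plus known suffixes, or None."""
    for stem, stem_gloss in STEMS.items():
        if not word.startswith(stem):
            continue
        morphemes, glosses = [stem], [stem_gloss]
        rest = word[len(stem):]
        while rest:
            # Greedy longest match against the suffix inventory.
            match = max((s for s in SUFFIXES if rest.startswith(s)),
                        key=len, default=None)
            if match is None:
                break
            morphemes.append(match)
            glosses.append(SUFFIXES[match])
            rest = rest[len(match):]
        if not rest:
            return morphemes, glosses
    return None

morphemes, glosses = analyze("pelafiñ")
print(" -".join(morphemes))   # pe -la -fi -ñ
print(" -".join(glosses))     # see -neg -3.obj -1.subj.indicative
```

A real analyzer would also need slot ordering and allomorphy, as the paradigm slide suggests. The greedy match here is only enough for this one word.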

32

V

pe

pe-la-fi-ñ Maria

33

V

pe

pe-la-fi-ñ Maria

VSuff

la

Negation = +

34

V

pe

pe-la-fi-ñ Maria

VSuff

la

VSuffG

Pass all features up

35

V

pe

pe-la-fi-ñ Maria

VSuff

la

VSuffG VSuff

fi

object person = 3

36

V

pe

pe-la-fi-ñ Maria

VSuff

la

VSuffG VSuff

fi

VSuffG

Pass all features up from both children

37

V

pe

pe-la-fi-ñ Maria

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

person = 1, number = sg, mood = ind

38

V

pe

pe-la-fi-ñ Maria

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

Pass all features up from both children

VSuffG

39

V

pe

pe-la-fi-ñ Maria

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

Pass all features up from both children

VSuffG

V

Check that:
1) negation = +
2) tense is undefined

40

V

pe

pe-la-fi-ñ Maria

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

V NP

N

Maria

N

person = 3, number = sg, human = +

41

V

pe

pe-la-fi-ñ Maria

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

Check that NP is human = +

Pass features up from V

VP
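The bottom-up feature passing in the tree-building slides can be sketched as dictionary unification. This is an illustrative simplification, not the authors' grammar formalism. The function names unify and build_sentence and the flat dictionary layout are assumptions. The feature values are the ones shown on the slides.

```python
def unify(*feature_sets):
    """Merge feature dicts; return None if two of them disagree on a value."""
    merged = {}
    for features in feature_sets:
        for name, value in features.items():
            if merged.get(name, value) != value:
                return None
            merged[name] = value
    return merged

# Feature values as shown on the slides.
LA = {"negation": "+"}                                    # suffix la
FI = {"object person": 3}                                 # suffix fi
N_SUFFIX = {"person": 1, "number": "sg", "mood": "ind"}   # suffix ñ
MARIA = {"person": 3, "number": "sg", "human": "+"}       # noun Maria

def build_sentence():
    # VSuffG nodes: pass all features up from the children.
    suffix_group = unify(LA, FI, N_SUFFIX)
    if suffix_group is None:
        return None
    # V node: check that negation = + and that tense is undefined.
    if suffix_group.get("negation") != "+" or "tense" in suffix_group:
        return None
    # VP node: check that the object NP is human = +, then pass V features up.
    if MARIA.get("human") != "+":
        return None
    return {**suffix_group, "object": MARIA}

print(build_sentence())
```

Keeping the checks separate from the merging mirrors the slides, where some nodes only pass features up and others test them.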

42

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

43

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

Pass all features to Spanish side

44

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

Pass all features down

45

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

Pass object features down

46

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

Accusative marker on objects is introduced because human = +

VP

NP “a” V

47

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

VP::VP [VBar NP] -> [VBar "a" NP]
(
  (X1::Y1)
  (X2::Y3)
  ((X2 type) = (*NOT* personal))
  ((X2 human) =c +)
  (X0 = X1)
  ((X0 object) = X2)
  (Y0 = X0)
  ((Y0 object) = (X0 object))
  (Y1 = Y0)
  (Y3 = (Y0 object))
  ((Y1 objmarker person) = (Y3 person))
  ((Y1 objmarker number) = (Y3 number))
  ((Y1 objmarker gender) = (Y3 gender))
)
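The effect of this transfer rule can be approximated in a few lines. This is a loose sketch under stated assumptions, not the authors' transfer engine. The function transfer_vp and the dictionary representation of nodes are hypothetical. The reading of =c as "must be present and equal" and of *NOT* as "must not have this value" is an interpretation of the rule as displayed.

```python
def transfer_vp(vbar, np):
    """Map source [VBar NP] to target [VBar "a" NP], or return None if the
    source-side constraints of the rule do not hold."""
    # ((X2 type) = (*NOT* personal)) and ((X2 human) =c +)
    if np.get("type") == "personal" or np.get("human") != "+":
        return None
    # ((Y1 objmarker ...) = (Y3 ...)): copy object agreement onto the target verb.
    target_vbar = dict(vbar)
    target_vbar["objmarker"] = {
        name: np.get(name) for name in ("person", "number", "gender")
    }
    # Target side of the rule: [VBar "a" NP]
    return [target_vbar, "a", dict(np)]

verb = {"lemma": "ver", "negation": "+"}
maria = {"lemma": "María", "person": 3, "number": "sg", "human": "+"}
print(transfer_vp(verb, maria))
```

With a non-human object the function returns None, so a different VP rule without the "a" marker would have to apply.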

48

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

V “no”

Pass person, number, and mood features to Spanish Verb

Assign tense = past

49

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

V “no”

Introduced because negation = +

50

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

V “no”

ver

51

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

V “no”

ver

vi

person = 1, number = sg, mood = indicative, tense = past
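The morphology generation step on this slide maps the lemma "ver" plus the transferred features to the surface form "vi". It can be pictured as a lookup in a generation lexicon. The table below is a hypothetical two-entry toy, not the authors' lexicon, and generate is an assumed name.

```python
# Hypothetical generation lexicon: (lemma, person, number, mood, tense) -> form.
GENERATION_LEXICON = {
    ("ver", 1, "sg", "indicative", "past"): "vi",
    ("ver", 3, "sg", "indicative", "past"): "vio",
}

def generate(lemma, features):
    """Look up the inflected form for a lemma and its feature bundle."""
    key = (lemma, features["person"], features["number"],
           features["mood"], features["tense"])
    return GENERATION_LEXICON.get(key)

features = {"person": 1, "number": "sg", "mood": "indicative", "tense": "past"}
print(generate("ver", features))
```

Note that tense = past does not come from the Mapudungun side. It was assigned during transfer because the source verb left tense undefined.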

52

V

pe

Transfer to Spanish: Top-Down

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

V “no”

vi N

María

N

Pass features over to Spanish side

53

V

pe

I didn’t see Maria

VSuff

la

VSuffG VSuff

fi

VSuffG VSuff

ñ

VSuffG

NP

N

Maria

N

S

V

VP

S

VP

NP “a” V

V “no”

vi N

María

N

54

Syntax Refinement

1. Linguistic Structure

2. Bilingual Informants

55

Morphology Generation

Syntax Refinement Architecture

INPUT TEXT

Grammar

&

Lexicon

Run-Time MT

System

OUTPUT TEXT

Morphology Analysis

Morphology Analysis Lexicon

Morphology Generation

Lexicon

56

Morphology Generation

INPUT TEXT

Grammar

&

Lexicon

Run-Time MT

System

Rule Refinement

OUTPUT TEXT

Morphology Analysis

Online

Translation

Correction

Tool

Syntax Refinement Architecture

57

INPUT TEXT

Grammar

&

Lexicon

Run-Time MT

System

Rule Refinement

Morphology Analysis

Online

Translation

Correction

Tool

Syntax Refinement Architecture

58

INPUT TEXT

Grammar

&

Lexicon

Run-Time MT

System

Rule Refinement

OUTPUT TEXT

Morphology Analysis

Online

Translation

Correction

Tool

Syntax Refinement Architecture

Morphology Generation

59

Children played a game

60

61

62

The children played a game

63

VP

Det

NP

NP

N

niños

N

VP

S

PolP

V

jugaron

V

un N

juego

N

Refining the Grammar

64

VP

Det

NP

NP

N

niños

N

VP

S

PolP

V

jugaron

V

un N

juego

N

los

Refining the Grammar
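One way to picture how an informant's correction becomes a refinement signal is to diff the system output against the corrected sentence. This sketch assumes the correction shown in the tree is the insertion of "los" before "niños". The corrections function is hypothetical, and the actual Rule Refinement module is not described on the slides.

```python
import difflib

def corrections(mt_output, corrected):
    """List (operation, removed_words, added_words) between two translations."""
    a, b = mt_output.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

print(corrections("niños jugaron un juego", "los niños jugaron un juego"))
```

An insertion at the start of the subject NP suggests the NP rule is missing a determiner. Mapping such an edit back to a specific grammar rule is the hard part, and this sketch does not attempt it.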

66

Syntax Refinement Summary

• Increases translation quality on unseen data
  - English-Spanish experiments (Font Llitjós et al., 2007, MT Summit)

• Generalizes to a Mapudungun-Spanish machine translation system

67

Overall Summary

Linguistic Structure

and

Bilingual Informants

help automate the development of

deep-analysis machine translation systems:

Morphology Induction

and

Syntax Refinement

68

Thank You!