Thai-English MT Project: Transfer Module

Prachya Boonkwan (IPA: /pratʃəˑjaː bunˑkʰʷan/)

NECTEC, Thailand

April 11, 2007, CERDEC, NJ

Outline

Introduction

Analysis Module

Transfer Module

Generation Module

Conclusion

Introduction

Thai-English Machine Translation

Thai Sentence --> Thai Analysis --> Transfer --> English Generation --> English Sentence

Introduction (cont’d)

Characteristics of Thai

Analytic language

Subject-Verb-Object pattern

Words written consecutively without space

Serial verb construction

No articles and no mass/concrete classification

Use of classifiers

Auxiliary words to express number, voice, tense, and aspect

Introduction (cont’d)

Issues of Thai-English translation (summarized from Monthika’s observation)

Different orderings between Thai and English

Different verb arguments between Thai and English

Implicit relations in Thai serial noun construction

Semantic duplication in Thai serial verb construction

No plural inflection in Thai

No inflection to express voices, tenses, and aspects in Thai

Introduction (cont’d)

Issue 1: Different orderings between Thai and English

Noun phrase
sûer/N mài/ADJ sõrng/NUM tua/CL nán/DET
Lit: shirt/N new/ADJ two/NUM body/CL that/DET
Trans: Those/DET two/NUM new/ADJ shirts/N

Verb phrase
mâekrua/N triam/V aahãan/N yàang/ADVMARK rûatrew/ADJ
Lit: female-cook/N prepare/V meal/N ADVMARK rapid/ADJ
Trans: The female cook/N rapidly/ADJ+ADVMARK prepares/V meal/N

Introduction (cont’d)

Issue 2: Different verb arguments between Thai and English

chãn/PR pai/V fràngsès/N
Lit: I/PR go/V France/N
Trans: I/PR go/V to France/N

phaanrowng/N1 yùt/V1 sùub/V2 bùrìi/N2
Lit: janitor/N1 stop/V1 smoke/V2 cigarette/N2
Trans: The janitor/N1 stopped/V1 smoking/V2 cigarette/N2

Introduction (cont’d)

Issue 3: Implicit relations in Thai serial noun construction

Adjective
châonâathîi/N1 kamphuuchaa/N2
Lit: officer/N1 Cambodia/N2
Trans: Cambodian/N2 officer/N1

Possession
phuumpanyaa/N1 banphábùrùt/N2
Lit: intelligence/N1 ancestor/N2
Trans: ancestor/N2’s intelligence/N1

Apposition
phûusàmàk/N1 khonnôrk/N2
Lit: applicant/N1 outsider/N2
Trans: applicant/N1, who is an outsider/N2,

Introduction (cont’d)

Issue 4: Semantic duplication in Thai serial verb construction

raayngaan/V1 hâi/P sâab/V2
Lit: report/V1 to/P know/V2
Trans: report/V1 (to know/V2)

khâa/V1 hâi/P taay/V2
Lit: kill/V1 to/P die/V2
Trans: kill/V1 (to die/V2)

phûut/V1 hâi/P fang/V2
Lit: tell/V1 to/P listen/V2
Trans: tell/V1 (to listen/V2)

Introduction (cont’d)

Issue 5: No plural inflection in Thai

Pluralization methods and examples:

Numeral phrase
sùnák/N sãam/NUM tua/CL
Lit: dog/N three/NUM body/CL
Trans: three/NUM dogs/N

Collective phrase
plaa/N sãam/NUM fũung/COL
Lit: fish/N three/NUM school/COL
Trans: three/NUM schools/COL of fish/N

Duplication
dèk-dèk/N
Lit: child-child/N
Trans: children/N

Pluralization marker
phûak/PLU nák-rian/N
Lit: group/PLU student/N
Trans: students/N
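The four pluralization methods above can be read as cues that a transfer rule might look for. The following is a minimal sketch, assuming a noun phrase arrives as a list of (romanized word, POS tag) pairs; the function name `plural_cue`, the `SINGULAR_NUMERALS` set, and the representation of duplication as two identical adjacent N tokens are illustrative assumptions, not part of the presented system.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (romanized word, POS tag)

# Hypothetical: numerals that should not trigger a plural ("nùeng" = one).
SINGULAR_NUMERALS = {"nùeng"}


def plural_cue(tokens: List[Token]) -> Optional[str]:
    """Return the name of the first pluralization cue found, or None."""
    for i, (word, tag) in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tag == "PLU":
            return "pluralization marker"
        if tag == "N" and nxt == (word, "N"):
            return "duplication"
        if tag == "NUM" and nxt is not None:
            if nxt[1] == "COL":
                return "collective phrase"
            if nxt[1] == "CL" and word not in SINGULAR_NUMERALS:
                return "numeral phrase"
    return None


dogs = [("sùnák", "N"), ("sãam", "NUM"), ("tua", "CL")]
fish = [("plaa", "N"), ("sãam", "NUM"), ("fũung", "COL")]
children = [("dèk", "N"), ("dèk", "N")]
students = [("phûak", "PLU"), ("nák-rian", "N")]
```

Each of the four example phrases from the slide yields its own cue name, while a phrase with the numeral for "one" yields no cue.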

Introduction (cont’d)

Issue 6: No inflection to express voices, tenses, and aspects in Thai

Tns/Asp/Mod and examples:

Past tense
chãn/PR koey/PAST pai/V fràngsès/N
Lit: I/PR PAST go/V France/N
Trans: I/PR went/V+PAST to France/N

Progressive aspect
chãn/PR kamlang/PROG dùem/V náam/N
Lit: I/PR PROG drink/V water/N
Trans: I/PR [am drinking]/V+PROG water/N

Passive voice
chãn/PR thùuk/PASS khruu/N thamthôt/V
Lit: I/PR PASS teacher/N punish/V
Trans: I/PR [am punished]/V+PASS by the teacher/N

Analysis Module

Overview of Analysis Module

Thai Sentence --> Thai Word Segmentor --> List of words and POSes (w1 w2 w3) --> Thai Parser --> Thai Dep. Tree

Resource: Thai Grammar

Analysis Module (cont’d)

Thai Parser: input and output

แบลร์ ส่ง ทหาร อังกฤษ ไป อิรัก ใน ปี 2003

Blair send soldier Britain to Iraq in 2003

‘Blair sent British soldiers to Iraq in 2003.’

[Figure: Thai dependency tree over the nodes below]

แบลร์ (Blair, N)
ส่ง (send, VT)
ทหาร (soldier, N)
อังกฤษ (Britain, N)
ไป (to, P)
อิรัก (Iraq, N)
ใน (in, P)
ปี 2003 (2003, N)

Transfer Module

Overview of Transfer Module

Thai Dep. Tree --> Thai-Eng Transformation --> English Dep. Tree --> Leaf-Node Collection --> List of lemmas annotated with syntactic attributes (w1 w2 w3)

Resource: Mapping Tables

Transfer Module (cont’d)

Attributes of Thai nouns

Syntax
Number: singular, plural
Person: first, second, third
Gender: masculine, feminine, neuter
Definiteness: definite, indefinite
Type: antecedent, anaphora

Semantics
Concept: human, place, etc.
Role: organization, etc.
Domain: military, criminal, etc.
Reference: human, place, etc.

Transfer Module (cont’d)

Attributes of Thai verbs

Syntax
Number: singular, plural
Time: present, past, future
Gender: masculine, feminine, neuter
Type: antecedent, anaphora

Semantics
Concept: dynamic_place, etc.
Domain: military, criminal, etc.
Direction: inward, outward, etc.
Reference: dynamic_place, etc.
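One way to carry the noun and verb attributes through the transfer module is as small attribute records. The sketch below is an assumption about representation only: the class names `NounAttributes` and `VerbAttributes`, the `ALLOWED` table, and the choice to validate only the closed value sets (leaving the "etc." attributes open) are hypothetical and not taken from the presented system.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Closed value sets listed on the slides; attributes ending in "etc." stay open.
ALLOWED = {
    "number": {"singular", "plural"},
    "person": {"first", "second", "third"},
    "gender": {"masculine", "feminine", "neuter"},
    "definiteness": {"definite", "indefinite"},
    "type": {"antecedent", "anaphora"},
    "time": {"present", "past", "future"},
}


class _Checked:
    """Mixin that rejects values outside the closed sets above."""

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None and f.name in ALLOWED and value not in ALLOWED[f.name]:
                raise ValueError(f"{f.name}={value!r} is not an allowed value")


@dataclass
class NounAttributes(_Checked):
    # Syntax
    number: Optional[str] = None
    person: Optional[str] = None
    gender: Optional[str] = None
    definiteness: Optional[str] = None
    type: Optional[str] = None
    # Semantics (open-ended on the slide)
    concept: Optional[str] = None
    role: Optional[str] = None
    domain: Optional[str] = None
    reference: Optional[str] = None


@dataclass
class VerbAttributes(_Checked):
    # Syntax
    number: Optional[str] = None
    time: Optional[str] = None
    gender: Optional[str] = None
    type: Optional[str] = None
    # Semantics (open-ended on the slide)
    concept: Optional[str] = None
    domain: Optional[str] = None
    direction: Optional[str] = None
    reference: Optional[str] = None


soldier = NounAttributes(number="plural", concept="human", domain="military")
send = VerbAttributes(time="past", direction="outward")
```

Leaving every attribute optional reflects that the analysis may not determine all of them before transfer; that, too, is an assumption of this sketch.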

Transfer Module (cont’d)

Transfer operations

Noun phrase: reordering, adjectivization, possessive insertion (-’s), ‘of’ insertion, appositivization, classifier dropping, collective restructuring, number assignment, possessivization

Verb phrase: reordering, VP structure selection, adverbialization, participialization (present, past), infinitivization (with ‘to’/without ‘to’), tense/aspect assignment

Sentence: reordering, sentence structure selection

Transfer Module (cont’d)

Reordering (R): relocates constituents, resulting in a quasi-English dependency tree

Attribute assignment (A): assigns English syntactic attributes to the quasi-English tree

Insertion (I) & Deletion (D): insert constituents into, or delete them from, the quasi-English dependency tree, resulting in the English tree

Transfer Module (cont’d)

Transfer operations classified into groups

Reordering: constituent reordering, collective restructuring, VP structure selection, sentence structure selection

Attribute assignment: number assignment, tense/aspect assignment, adjectivization, possessivization, participialization, adverbialization

Insertion: possessive insertion (-’s), ‘of’ insertion, appositivization, infinitivization

Deletion: classifier dropping, serial verb fusion

Transfer Module (cont’d)

Steps of transfer operations

Thai Dep. Tree --> Reordering --> Quasi-English Dep. Tree --> Attribute assignment --> Insertion & Deletion --> English Dep. Tree
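The stage order above, combined with the operation groups from the previous slide, suggests a simple scheduling step: whatever operations apply, reordering runs first, attribute assignment second, and insertion and deletion last. The sketch below only illustrates that ordering; the names `GROUPS`, `STAGE`, and `schedule` are assumptions, and the operations are plain strings rather than real tree transformations.

```python
# Operation groups as listed on the "Transfer operations classified into groups" slide.
GROUPS = {
    "R": ["constituent reordering", "collective restructuring",
          "VP structure selection", "sentence structure selection"],
    "A": ["number assignment", "tense/aspect assignment", "adjectivization",
          "possessivization", "participialization", "adverbialization"],
    "I": ["possessive insertion", "'of' insertion", "appositivization",
          "infinitivization"],
    "D": ["classifier dropping", "serial verb fusion"],
}

# Stage order from the "Steps of transfer operations" slide:
# reordering, then attribute assignment, then insertion and deletion together.
STAGE = {"R": 0, "A": 1, "I": 2, "D": 2}

# Reverse lookup: operation name -> group letter.
GROUP_OF = {op: group for group, ops in GROUPS.items() for op in ops}


def schedule(operations):
    """Stable-sort operation names so that each stage precedes the next."""
    return sorted(operations, key=lambda op: STAGE[GROUP_OF[op]])


plan = schedule(["classifier dropping", "number assignment",
                 "constituent reordering", "'of' insertion"])
```

Because the sort is stable, operations within the same stage keep the order in which they were requested.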

Transfer Module (cont’d)

Graphical notations: tree pattern

[Figure: two arc styles, one matching at any depth and one matching at only one depth, with two example tree patterns]

W1 < *W2

(*W3 > (*W4 > *W1)) < *W2

Transfer Module (cont’d)

Graphical notation: transfer operation

[Figure: a source tree over N, V, and ADV is rewritten into a target tree by an OPERATION, here R]

(N > V) < *ADV --> N > (*ADV > V) {R}
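The rule above can be read as a structural rewrite on a binary tree. The following is a minimal sketch of that single rule, assuming a tree is built from `Leaf` and `Arc` records that mirror the bracketing of the pattern; these class names, the function `move_adverb`, and the example words are hypothetical. The transcript does not define the `*` prefix or the depth-sensitive arc styles, so the sketch ignores them and matches leaves only.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Leaf:
    word: str
    tag: str


@dataclass(frozen=True)
class Arc:
    op: str        # ">" or "<", exactly as written in the pattern
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Arc]


def is_leaf(tree: Tree, tag: str) -> bool:
    return isinstance(tree, Leaf) and tree.tag == tag


def words(tree: Tree) -> List[str]:
    """Read the words off the tree from left to right."""
    if isinstance(tree, Leaf):
        return [tree.word]
    return words(tree.left) + words(tree.right)


def move_adverb(tree: Tree) -> Tree:
    """Apply (N > V) < ADV --> N > (ADV > V) at the root; otherwise return the tree unchanged."""
    if (isinstance(tree, Arc) and tree.op == "<" and is_leaf(tree.right, "ADV")
            and isinstance(tree.left, Arc) and tree.left.op == ">"
            and is_leaf(tree.left.left, "N") and is_leaf(tree.left.right, "V")):
        noun, verb, adverb = tree.left.left, tree.left.right, tree.right
        return Arc(">", noun, Arc(">", adverb, verb))
    return tree


# Hypothetical example words for the three positions.
thai_order = Arc("<", Arc(">", Leaf("cook", "N"), Leaf("prepare", "V")), Leaf("rapidly", "ADV"))
english_order = move_adverb(thai_order)
```

Reading the words off `thai_order` gives the verb before the adverb, and reading them off `english_order` gives the adverb before the verb, which is the effect the {R} rule describes.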

Transfer Module (cont’d)

Demonstration

dog/N body/CL big/ADJ three/NUM body/CL this/DET -/PROG bark/V I/PR

Transfer Module (cont’d)

Demonstration (cont’d)

body/CL big/ADJ dog/N three/NUM body/CL this/DET -/PROG bark/V I/PR

N < ADJ --> ADJ > N {R}

Transfer Module (cont’d)

Demonstration (cont’d)

three/NUM body/CL body/CL big/ADJ dog/N this/DET -/PROG bark/V I/PR

N < NUM --> NUM > N {R}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET three/NUM body/CL body/CL big/ADJ dog/N -/PROG bark/V I/PR

N < DET --> DET > N {R}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V I/PR

NUM > N --> NUM > N[+plu] {A}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET[+plu] three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V I/PR

DET > N[+plu] --> DET[+plu] > N[+plu] {A}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET[+plu] three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V I/PR[+acc]

V < PR --> V < PR[+acc] {A}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET[+plu] three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V[+prog] I/PR[+acc]

PROG > V --> PROG > V[+prog] {A}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET[+plu] three/NUM body/CL body/CL big/ADJ dog/N[+plu] -/PROG bark/V[+prog] at I/PR[+acc]

bark < PR --> bark < (at < PR) {I}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET[+plu] three/NUM body/CL big/ADJ dog/N[+plu] -/PROG bark/V[+prog] at I/PR[+acc]

NUM < CL --> NUM {D}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET[+plu] three/NUM big/ADJ dog/N[+plu] -/PROG bark/V[+prog] at I/PR[+acc]

CL > ADJ --> ADJ {D}

Transfer Module (cont’d)

Demonstration (cont’d)

this/DET[+plu] three/NUM big/ADJ dog/N[+plu] bark/V[+prog] at I/PR[+acc]

PROG > V[+prog] --> V[+prog] {D}

Transfer Module (cont’d)

Demonstration (cont’d)

English dependency tree: this/DET[+plu] three/NUM big/ADJ dog/N[+plu] bark/V[+prog] at I/PR[+acc]

Leaf-Node Collection: this[+plu] three big dog[+plu] bark[+prog] at I[+acc]

These three big dogs are barking at me.
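The demonstration can be replayed in a few lines of code. The sketch below is an illustration under stated assumptions, not the presented implementation: the `Node` class, the four helper functions, the shape of the initial Thai tree, the tag `P` for the inserted "at", and the choice to place each moved dependent at the front of the head's left dependents are all hypothetical and were chosen so that the word order after each step matches the slides.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    lemma: str
    tag: str
    left: list = field(default_factory=list)   # dependents placed before the head
    right: list = field(default_factory=list)  # dependents placed after the head
    feats: set = field(default_factory=set)

    def walk(self):
        """Yield the nodes of this subtree in surface order."""
        for child in self.left:
            yield from child.walk()
        yield self
        for child in self.right:
            yield from child.walk()


def surface(root):
    return " ".join(node.lemma for node in root.walk())


def reorder(root, head_tag, dep_tag):
    """{R} HEAD < DEP --> DEP > HEAD: move DEP and its subtree in front of HEAD."""
    for head in list(root.walk()):
        if head.tag == head_tag:
            for dep in [d for d in head.right if d.tag == dep_tag]:
                head.right.remove(dep)
                head.left.insert(0, dep)


def assign(root, head_tag, dep_tag, feat, to_head, head_needs=None):
    """{A} Add feat to the head or to the dependent (the side of the arc is ignored)."""
    for head in list(root.walk()):
        if head.tag != head_tag or (head_needs and head_needs not in head.feats):
            continue
        for dep in head.left + head.right:
            if dep.tag == dep_tag:
                (head if to_head else dep).feats.add(feat)


def insert_before(root, head_lemma, dep_tag, lemma, tag):
    """{I} HEAD < DEP --> HEAD < (NEW < DEP): wrap a right dependent in a new node."""
    for head in list(root.walk()):
        if head.lemma == head_lemma:
            head.right = [Node(lemma, tag, right=[d]) if d.tag == dep_tag else d
                          for d in head.right]


def delete(root, head_tag, dep_tag):
    """{D} Drop dependents tagged dep_tag from heads tagged head_tag."""
    for head in list(root.walk()):
        if head.tag == head_tag:
            head.left = [d for d in head.left if d.tag != dep_tag]
            head.right = [d for d in head.right if d.tag != dep_tag]


# Assumed shape of the Thai tree for "dog body big three body this - bark I".
dog = Node("dog", "N", right=[
    Node("big", "ADJ", left=[Node("body", "CL")]),
    Node("three", "NUM", right=[Node("body", "CL")]),
    Node("this", "DET"),
])
root = Node("bark", "V", left=[dog, Node("-", "PROG")], right=[Node("I", "PR")])

steps = [surface(root)]
for dep_tag in ("ADJ", "NUM", "DET"):                             # reordering
    reorder(root, "N", dep_tag)
    steps.append(surface(root))
assign(root, "N", "NUM", "plu", to_head=True)                     # N[+plu]
assign(root, "N", "DET", "plu", to_head=False, head_needs="plu")  # DET[+plu]
assign(root, "V", "PR", "acc", to_head=False)                     # PR[+acc]
assign(root, "V", "PROG", "prog", to_head=True)                   # V[+prog]
insert_before(root, "bark", "PR", "at", "P")                      # insertion
for head_tag, dep_tag in (("NUM", "CL"), ("ADJ", "CL"), ("V", "PROG")):  # deletion
    delete(root, head_tag, dep_tag)
    steps.append(surface(root))

# Leaf-node collection: lemmas in surface order with their attributes.
leaves = [(n.lemma, n.tag, sorted(n.feats)) for n in root.walk()]
```

Here `steps` records the word order after each reordering and deletion step, and `leaves` is the annotated lemma list that would be handed to the generation module.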

Generation Module

Overview of generation module

List of lemmas annotated with syntactic attributes (w1 w2 w3) --> Surface Word Generation --> Surface Word List (W’1 W’2 W’3) --> Article Insertion --> English Output Sentence (E1 E2 E3)

Resources: Lemma Mapping Tables, Discourse Stack DB
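As an illustration of the surface word generation step only, the sketch below turns an annotated lemma list into a sentence. Everything in it is hypothetical: the `LEMMA_MAP` table, the naive "-ing" rule, and the crude agreement check stand in for the lemma mapping tables, and article insertion with the discourse stack is not sketched at all.

```python
# Hypothetical lemma mapping table: (lemma, attribute) -> surface word.
LEMMA_MAP = {
    ("this", "plu"): "these",
    ("dog", "plu"): "dogs",
    ("I", "acc"): "me",
}


def surface_word(lemma, tag, feats, subject_plural):
    """Pick a surface form for one annotated lemma."""
    if tag == "V" and "prog" in feats:
        # Naive progressive: no spelling rules, third person only.
        return ("are " if subject_plural else "is ") + lemma + "ing"
    for feat in feats:
        if (lemma, feat) in LEMMA_MAP:
            return LEMMA_MAP[(lemma, feat)]
    return lemma


def generate(leaves):
    """Join surface words into a capitalized sentence ending in a period."""
    subject_plural = False
    out = []
    for lemma, tag, feats in leaves:
        out.append(surface_word(lemma, tag, feats, subject_plural))
        if tag == "N" and "plu" in feats:
            subject_plural = True  # crude agreement: a plural noun seen before the verb
    sentence = " ".join(out)
    return sentence[0].upper() + sentence[1:] + "."


# Annotated lemma list from the end of the transfer demonstration.
leaves = [
    ("this", "DET", ["plu"]), ("three", "NUM", []), ("big", "ADJ", []),
    ("dog", "N", ["plu"]), ("bark", "V", ["prog"]), ("at", "P", []),
    ("I", "PR", ["acc"]),
]
sentence = generate(leaves)
```

On the demonstration's lemma list this reproduces the sentence shown at the end of the transfer walkthrough; a real generator would need proper morphology and agreement rather than these shortcuts.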

Conclusion

Thai-English Machine Translation

Thai Sentence --> Thai Analysis --> Thai Dep. Tree --> Transfer --> List of lemmas annotated with syntactic attributes (w1 w2 w3) --> English Generation --> English Sentence

Conclusion (cont’d)

Issues of Thai-English translation

Attributes of Thai lexical units

Generalized transfer operations

Reordering

Attribute assignment

Insertion

Deletion

Upcoming Work

Prologizing transfer operations

Generation Module