Statistical Phrase Alignment Model Using Dependency Relation Probability Toshiaki Nakazawa and Sadao Kurohashi Kyoto University

Page 1:

Statistical Phrase Alignment Model Using Dependency Relation Probability

Toshiaki Nakazawa and Sadao Kurohashi

Kyoto University

Page 2:

Outline

Background
Tree-based Statistical Phrase Alignment Model
Model Training
Experiments
Conclusions

Page 3:

Conventional Word Sequence Alignment

[Figure: word-by-word alignment between the Japanese sentence 受 (accept) 光 (light) 素子 (device) に (ni) は (ha) フォト (photo) ゲート (gate) を (wo) 用いた (used) and the English sentence "A photogate is used for the photodetector".]

Page 4:

[Figure: alignment matrix produced by grow-diag-final-and for the English fragment "... exhibited a strong inhibitory effect on tumor growth in the castrated mice as in the non-castrated mice ..." and the Japanese fragment 非去勢マウスと同様に去勢マウスの腫ようの成長に対し強い抑制効果を示した. Filled squares mark aligned word pairs.]

Page 5:

Conventional Word Sequence Alignment

[Figure: the word sequence alignment of 受光素子にはフォトゲートを用いた and "A photogate is used for the photodetector", as on page 3.]

Proposed Model

1. Dependency trees

[Figure: the same sentence pair, with each sentence drawn as a dependency tree.]

Page 6:

Proposed Model

1. Dependency trees
2. Phrase alignment
3. Bi-directional agreement

[Figure: phrase alignment between the dependency trees of 受光素子にはフォトゲートを用いた and "A photogate is used for the photodetector".]

Page 7:

[Figure: two alignment matrices for the sentence pair from page 4 ("... exhibited a strong inhibitory effect on tumor growth in the castrated mice as in the non-castrated mice ..." and 非去勢マウスと同様に去勢マウスの腫ようの成長に対し強い抑制効果を示した). The first, labelled "grow-diag-final-and", is drawn over the word sequences; the second, labelled "Proposed model", is drawn over the dependency trees of both sentences.]

Page 8:

Related Work

Using tree structures: [Cherry and Lin, 2003], [Quirk et al., 2005], [Galley et al., 2006], ITG, …

Considering phrase alignment: [Zhang and Vogel, 2005], [Ion et al., 2006], …

Using two directed models simultaneously: [Liang et al., 2006], [Graca et al., 2008], …

Page 9:

Tree-based Statistical Phrase Alignment Model

Page 10:

Dependency Analysis of Sentences

Source (Japanese): 受光素子にはフォトゲートを用いた
Target (English): A photogate is used for the photodetector

[Figure: dependency trees of both sentences. The Japanese words 受 (accept), 光 (light), 素子 (device), に (ni), は (ha), フォト (photo), ゲート (gate), を (wo), 用いた (used) and the English words are laid out in word order, and the head node of each tree is marked.]

Page 11:

Overview of the Proposed Model (in comparison to the IBM models)

f: source sentence
e: target sentence
a: alignment

IBM models find the best alignment by

\[ \hat{a} = \arg\max_{a} p(f, a \mid e) = \arg\max_{a} p(f \mid e, a) \cdot p(a \mid e) \]

p(f | e, a): lexical prob. (word translation)
p(a | e): alignment prob. (word reordering)

Proposed model

The same decomposition is used, with p(f | e, a) as the phrase translation probability and p(a | e) as the dependency relation probability:

\[ \hat{a} = \arg\max_{a} p(f, a \mid e) = \arg\max_{a} p(f \mid e, a) \cdot p(a \mid e) \]

Both directions are then combined, each with its own phrase translation and dependency relation probability:

\[ \hat{a} = \arg\max_{a} p(f, a \mid e) \cdot p(e, a \mid f) = \arg\max_{a} p(f \mid e, a) \cdot p(a \mid e) \cdot p(e \mid f, a) \cdot p(a \mid f) \]
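To make the effect of the bi-directional objective concrete, here is a minimal sketch that scores two candidate alignments. The candidate names and all four component probabilities are made-up numbers for illustration, not values from the experiments, and a real system would search over alignments rather than enumerate them.

```python
import math

def log_score(p_f_given_ea, p_a_given_e, p_e_given_fa, p_a_given_f):
    """log of p(f|e,a) * p(a|e) * p(e|f,a) * p(a|f) for one candidate alignment."""
    return sum(math.log(p) for p in (p_f_given_ea, p_a_given_e, p_e_given_fa, p_a_given_f))

# Made-up component probabilities for two hypothetical candidate alignments.
candidates = {
    "a1": (0.020, 0.30, 0.015, 0.25),
    "a2": (0.050, 0.20, 0.002, 0.10),
}

# One direction only: p(f|e,a) * p(a|e).
best_one_way = max(candidates, key=lambda k: candidates[k][0] * candidates[k][1])

# Both directions, as in the bi-directional objective.
best_two_way = max(candidates, key=lambda k: log_score(*candidates[k]))

print(best_one_way, best_two_way)
```

With these numbers the one-directional score prefers a2, while the product over both directions prefers a1, because a2 is very unlikely in the reverse direction.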

Page 12:

Phrase Translation Probability

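Page 13:

Phrase Translation Probability

Note that the sentences are not previously segmented into phrases

IBM Model:

\[ p(f \mid e, a) = \prod_{j=1}^{J} p(f_j \mid e_{a_j}) \]

Proposed model:

\[ p(f \mid e, A) = \prod_{j=1}^{J} p(F_{s(j)} \mid E_{A_{s(j)}}) \]

Here s(j) is the index of the source phrase that contains word f_j, and A_k is the index of the target phrase aligned to source phrase F_k (0 means NULL).

Example:

source: words f1 ... f5 grouped into phrases F1, F2, F3, with s(1) = 1, s(2) = 2, s(3) = 2, s(4) = 3, s(5) = 1
target: words e1 ... e4 grouped into phrases E1, E2, E3
A: A1 = 2, A2 = 3, A3 = 0

\[ p(f \mid e, A) = p(F_1 \mid E_2) \cdot p(F_2 \mid E_3) \cdot p(F_2 \mid E_3) \cdot p(F_3 \mid \mathrm{NULL}) \cdot p(F_1 \mid E_2) \]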
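The example on this slide can be checked numerically. The sketch below uses the slide's s(j) and A values; the probability table is made up, and the function name `phrase_translation_prob` is only an illustrative choice.

```python
from math import prod

def phrase_translation_prob(s, A, table):
    """p(f | e, A): one factor p(F_s(j) | E_A_s(j)) per source word j."""
    return prod(table[(k, A[k])] for k in s)

# s(j) for j = 1..5 and the phrase alignment A, as given on the slide.
s = [1, 2, 2, 3, 1]
A = {1: 2, 2: 3, 3: 0}          # 0 stands for NULL

# Made-up phrase translation probabilities, keyed by (source phrase, target phrase).
table = {
    (1, 2): 0.5,                # p(F1 | E2)
    (2, 3): 0.4,                # p(F2 | E3)
    (3, 0): 0.1,                # p(F3 | NULL)
}

p = phrase_translation_prob(s, A, table)
print(p)
```

Because the product runs over words rather than phrases, a phrase that covers two words, such as F1 or F2 here, contributes its probability twice.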

Page 14:

Dependency Relation Probability

Page 15:

Dependency Relations

For a source word f_c and its parent f_p, rel(f_c, f_p) describes how E_{A_s(c)}, the target phrase aligned to the child's phrase F_s(c), is related to E_{A_s(p)}, the target phrase aligned to the parent's phrase F_s(p):

Parent-child: rel(f_c, f_p) = c
Grandparent-child: rel(f_c, f_p) = c;c
Inverted parent-child: rel(f_c, f_p) = p
NULL-aligned: rel(f_c, f_p) = NULL_p

[Figure: source and target dependency trees illustrating each of these cases.]
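One way to read these labels is as the path between the two target phrases in the target tree: "c" for each step up from a child to its parent, "p" for each step down. The sketch below computes such a label under that reading; the function names, the child-to-parent dictionary and the toy English tree are assumptions for illustration, and NULL-aligned cases are not handled.

```python
def path_to_root(node, parent):
    """Return [node, parent(node), ..., root] for a tree stored as a child -> parent map."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def rel(child_image, parent_image, parent):
    """Label the tree path from child_image to parent_image.

    'c' is one step up (child to parent), 'p' is one step down (parent to child).
    An empty string means both images are the same node.
    """
    up = path_to_root(child_image, parent)
    down = path_to_root(parent_image, parent)
    common = next(n for n in up if n in down)   # lowest common ancestor
    steps = ["c"] * up.index(common) + ["p"] * down.index(common)
    return ";".join(steps)

# Toy target-side tree (an assumed parse, not parser output).
target_parent = {
    "A": "photogate", "photogate": "used", "is": "used",
    "for": "used", "photodetector": "for", "the": "photodetector",
}

print(rel("photogate", "used", target_parent))   # parent-child
print(rel("A", "used", target_parent))           # grandparent-child
print(rel("used", "photogate", target_parent))   # inverted parent-child
```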

Page 16:

Dependency Relation Probability

D_{s-pc} is a set of parent-child word pairs in the source sentence

\[ p(A \mid e) = \prod_{(c,p) \in D_{s\text{-}pc}} p_t(\mathrm{rel}(f_c, f_p)) \]

Source-side dependency relation probability is defined in the same manner

\[ p(A \mid f) = \prod_{(c,p) \in D_{t\text{-}pc}} p_s(\mathrm{rel}(e_c, e_p)) \]
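As a rough illustration of the product above, the following sketch sums log-probabilities over the relation labels of all parent-child pairs in one sentence. The relation probabilities reuse the example values p(c) = 0.4, p(c;c) = 0.3, p(p) = 0.2 shown on the training slide; the list of labels and the `floor` value for unseen labels are assumptions.

```python
import math

def dependency_relation_log_prob(relations, p_rel, floor=1e-6):
    """Sum of log p(rel) over the parent-child pairs of one sentence.

    `floor` is an assumed fallback for labels missing from the table.
    """
    return sum(math.log(p_rel.get(r, floor)) for r in relations)

# Example relation probabilities (values as on the training slide).
p_t = {"c": 0.4, "c;c": 0.3, "p": 0.2}

# Hypothetical labels rel(f_c, f_p) for four source parent-child pairs.
relations = ["c", "c", "c;c", "p"]

log_p = dependency_relation_log_prob(relations, p_t)
print(math.exp(log_p))
```

Working in log space avoids underflow for long sentences, where the product has one factor per parent-child pair.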

Page 17:

Model Training

Page 18:

Model Training

Step 1 (word base):
  Estimate word translation prob. (IBM Model 1)
  Initialize dependency relation prob.

Step 2 (tree base): Estimate phrase translation prob. and dependency relation prob.
  E-step
    1. Create initial alignment
    2. Modify the alignment by hill-climbing
  Generate possible phrases
  M-step: Parameter estimation

Example parameters:
  Word translation prob.: p(コロラド | Colorado) = 0.7, p(大学 | university) = 0.6, …
  Dependency relation prob.: p(c) = 0.4, p(c;c) = 0.3, p(p) = 0.2, …
  Phrase translation prob.: p(コロラド | Colorado) = 0.7, p(大学 | university) = 0.6, p(コロラド 大学 | university of Colorado) = 0.9, …
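Step 1 relies on IBM Model 1, whose EM training is standard. The sketch below is a textbook-style version on a three-sentence toy corpus, not the authors' implementation; the function name, the uniform initialization and the explicit "NULL" target token are the usual conventions and are assumptions here.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=5):
    """EM for IBM Model 1. pairs: list of (source_words, target_words).

    Returns t with t[(f, e)] = p(f | e).
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))    # uniform start
    for _ in range(iterations):
        count = defaultdict(float)                   # expected counts of (f, e)
        total = defaultdict(float)                   # expected counts of e
        for fs, es in pairs:
            es = ["NULL"] + list(es)
            for f in fs:
                norm = sum(t[(f, e)] for e in es)    # E-step: posterior over e for this f
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():              # M-step: renormalize per target word
            t[(f, e)] = c / total[e]
    return t

# Toy corpus built around the slide's example words.
corpus = [
    (["コロラド", "大学"], ["university", "of", "Colorado"]),
    (["大学"], ["university"]),
    (["コロラド"], ["Colorado"]),
]
t = train_ibm_model1(corpus)
print(t[("大学", "university")], t[("コロラド", "Colorado")])
```

On this toy corpus the single-word sentence pairs pull 大学 toward "university" and コロラド toward "Colorado", which is the kind of word-level estimate Step 2 starts from.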

Page 19:

Step 2 (E-step)

Initial alignment is greedily created

Modify the initial alignment with the operations: Swap, Reject, Add, Extend

Objective:

\[ \hat{a} = \arg\max_{a} p(f, a \mid e) \cdot p(e, a \mid f) = \arg\max_{a} p(f \mid e, a) \cdot p(a \mid e) \cdot p(e \mid f, a) \cdot p(a \mid f) \]

[Figure: Example of Hill-climbing for 受光素子にはフォトゲートを用いた and "A photogate is used for the photodetector": the Initial Alignment is modified by Swap and Reject operations.]
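The search procedure can be sketched as follows. This is a simplified, word-level illustration with one-to-one links and a made-up one-directional score; it implements Add, Reject and Swap only (Extend needs phrases and is left out), and all function names and probabilities are assumptions rather than the authors' code.

```python
import math
from itertools import combinations

def score(links, n_src, logp, null_logp):
    """Sum of link log-probs; each unaligned source word pays null_logp."""
    aligned = {i for i, _ in links}
    return sum(logp[l] for l in links) + null_logp * (n_src - len(aligned))

def greedy_initial(logp):
    """Take links in order of decreasing probability, keeping them one-to-one."""
    links, used_i, used_j = set(), set(), set()
    for i, j in sorted(logp, key=logp.get, reverse=True):
        if i not in used_i and j not in used_j:
            links.add((i, j)); used_i.add(i); used_j.add(j)
    return frozenset(links)

def neighbours(links, n_src, n_tgt):
    """Alignments reachable with one Add, Reject or Swap."""
    used_i = {i for i, _ in links}
    used_j = {j for _, j in links}
    for i in range(n_src):                                   # Add
        for j in range(n_tgt):
            if i not in used_i and j not in used_j:
                yield links | {(i, j)}
    for link in links:                                       # Reject
        yield links - {link}
    for (i1, j1), (i2, j2) in combinations(sorted(links), 2):  # Swap
        yield (links - {(i1, j1), (i2, j2)}) | {(i1, j2), (i2, j1)}

def hill_climb(links, n_src, n_tgt, logp, null_logp):
    """Move to the best neighbour until no neighbour improves the score."""
    current = frozenset(links)
    best = score(current, n_src, logp, null_logp)
    while True:
        cands = list(neighbours(current, n_src, n_tgt))
        if not cands:
            return current
        cand = max(cands, key=lambda a: score(a, n_src, logp, null_logp))
        cand_score = score(cand, n_src, logp, null_logp)
        if cand_score <= best:
            return current
        current, best = frozenset(cand), cand_score

# Made-up link probabilities for a 3 x 3 toy sentence pair.
P = [[0.5, 0.4, 0.01], [0.45, 0.05, 0.01], [0.01, 0.01, 0.005]]
logp = {(i, j): math.log(P[i][j]) for i in range(3) for j in range(3)}
initial = greedy_initial(logp)
final = hill_climb(initial, 3, 3, logp, math.log(0.02))
print(sorted(initial), sorted(final))
```

On this toy input the greedy alignment links the diagonal; hill-climbing first swaps the two strong links and then rejects the weak third link, mirroring the Swap and Reject sequence in the slide's example.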

Page 20:

Generate Possible Phrases

Generate new possible phrases by merging the NULL-aligned nodes into their parent or child non-NULL-aligned nodes

The new possible phrases are taken into consideration from the next iteration

[Figure: the aligned dependency trees of 受光素子にはフォトゲートを用いた and "A photogate is used for the photodetector", with NULL-aligned nodes merged into neighbouring nodes.]
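A minimal sketch of this merging step, assuming a tree stored as a child-to-parent map over word indices and a set of indices that currently have a non-NULL alignment. The function name and data layout are hypothetical, and merged phrases here contain only two nodes.

```python
def candidate_phrases(words, parent, aligned):
    """Pair every NULL-aligned node with each aligned parent or child.

    words:   list of tokens
    parent:  child index -> parent index
    aligned: indices of nodes that have a non-NULL alignment
    Returns a list of two-word candidate phrases in sentence order.
    """
    children = {}
    for c, p in parent.items():
        children.setdefault(p, []).append(c)
    out = []
    for n in range(len(words)):
        if n in aligned:
            continue
        around = ([parent[n]] if n in parent else []) + children.get(n, [])
        for m in around:
            if m in aligned:                 # merge only into non-NULL-aligned nodes
                lo, hi = sorted((n, m))
                out.append((words[lo], words[hi]))
    return out

# Toy tree: "of" depends on "university", "Colorado" depends on "of";
# "of" is NULL-aligned in this made-up example.
words = ["university", "of", "Colorado"]
parent = {1: 0, 2: 1}
aligned = {0, 2}
print(candidate_phrases(words, parent, aligned))
```

The candidates produced this way would only be added to the phrase inventory; whether they are actually used is decided by the alignment search in the next iteration.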


Page 22:

Experiments

Page 23:

Alignment Experiments

Training: JST Ja-En paper abstract corpus (1M sentences, Ja: 36.4M words, En: 83.6M words)
Test: 475 sentences with gold-standard alignments annotated by hand
Parsers: KNP for Japanese, MSTParser for English
Evaluation criteria: Precision, Recall, F1
For the proposed model, we ran 5 iterations in each step
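For reference, precision, recall and F1 over alignment links can be computed as below. This sketch assumes links are (source index, target index) pairs and that every gold link counts equally; the slides do not say whether the gold standard distinguishes sure and possible links, so that simplification is an assumption.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of predicted alignment links against gold links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1

def f1_from_scores(precision, recall):
    """Harmonic mean of precision and recall (works for fractions or percentages)."""
    return 2 * precision * recall / (precision + recall)

# Toy link sets: 2 of 3 predicted links are correct, 2 of 4 gold links are found.
predicted = {(0, 0), (1, 1), (2, 3)}
gold = {(0, 0), (1, 1), (2, 2), (3, 3)}
p, r, f = precision_recall_f1(predicted, gold)
print(p, r, f)
```

The F value is the harmonic mean of precision and recall, so a system with very high precision but low recall still gets a modest F.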

Page 24:

Experimental Results

                       Pre.    Rec.    F
Proposed               87.75   50.27   63.92
intersection           90.34   34.28   49.71
grow-final-and         81.32   48.85   61.04
grow-diag-final-and    79.39   51.15   62.22

F of the proposed model vs. grow-diag-final-and: +1.7

Page 25:

Effectiveness of Phrase and Tree

                             Pre.    Rec.    F
Trees + Phrases (Proposed)   85.54   51.00   63.90
Trees                        89.77   39.47   54.83
Phrases                      84.41   47.33   60.65
None                         85.07   38.06   52.59

Without trees: positional relations instead of dependency relations

[Figure: a word pair c, p labelled with the positional relations -1 and +1.]

Page 26:

Discussions

Parsing errors
  Parsing accuracy is basically good, but the parsers still sometimes produce incorrect results
  Possible remedy: incorporate parsing probability into the model

Search errors
  Hill-climbing sometimes gets stuck in local optima
  Possible remedy: random restart

Function words
  Behave quite differently in different languages (e.g. case markers in Japanese, articles in English)
  Possible remedy: post-processing

Page 27:

Post-processing for Function Words

Reject correspondences between Japanese particles and English "be" or "have"
Reject correspondences of English articles
Japanese "する" and "れる" or English "be" and "have" are merged into their parent verb or adjective if they are NULL-aligned

                               Pre.    Rec.    F
Proposed                       87.75   50.27   63.92
Proposed + modify              87.83   58.40   70.16   (+6.2)
grow-diag-final-and            79.39   51.15   62.22
grow-diag-final-and + modify   80.46   51.15   62.54   (+0.3)
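The two rejection rules could be sketched as a simple filter over alignment links. The word lists, the per-word particle flags and the function name below are assumptions for illustration (a real system would take part-of-speech tags and lemmas from the parsers), and the third rule, merging NULL-aligned words into their parent, is not implemented here.

```python
# Assumed surface forms; the slides only say "be", "have" and "articles".
ARTICLES = {"a", "an", "the"}
BE_HAVE = {"be", "am", "is", "are", "was", "were", "been", "being",
           "have", "has", "had", "having"}

def filter_function_word_links(links, ja_is_particle, en_words):
    """Drop links to English articles and links between Japanese particles and be/have.

    links:          list of (ja_index, en_index)
    ja_is_particle: one bool per Japanese word (assumed to come from a tagger)
    en_words:       English tokens
    """
    kept = []
    for i, j in links:
        en = en_words[j].lower()
        if en in ARTICLES:
            continue                          # rule 2: reject article links
        if ja_is_particle[i] and en in BE_HAVE:
            continue                          # rule 1: reject particle to be/have links
        kept.append((i, j))
    return kept

# Toy input: フォト, ゲート, を (particle), 用いた against "A photogate is used".
en_words = ["A", "photogate", "is", "used"]
ja_is_particle = [False, False, True, False]
links = [(0, 0), (0, 1), (1, 1), (2, 2), (3, 3)]
print(filter_function_word_links(links, ja_is_particle, en_words))
```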

Page 28:

Conclusion and Future Work

Linguistically motivated phrase alignment
  1. Dependency trees
  2. Phrase alignment
  3. Bi-directional agreement

Significantly better results compared to conventional word alignment models

Future work:
  Apply the proposed model to other language pairs (Japanese-Chinese and so on)
  Incorporate parsing probability into our model
  Investigate the contribution of our alignment results to the translation quality