
Page 1: Linguistically-motivated Tree-based Probabilistic Phrase Alignment Toshiaki Nakazawa, Sadao Kurohashi (Kyoto University)

Linguistically-motivated Tree-based Probabilistic Phrase Alignment

Toshiaki Nakazawa, Sadao Kurohashi (Kyoto University)

Page 2

Outline
- Background
- Tree-based Probabilistic Phrase Alignment Model
- Model Training
- Symmetrization Algorithm
- Experiments
- Conclusions

2 05/03/23

Page 3

Background
- Many state-of-the-art SMT systems are based on "word-based" alignment results
  - Phrase-based SMT [Koehn et al., 2003]
  - Hierarchical Phrase-based SMT [Chiang, 2005]
  - and so on
- Some of them incorporate syntactic information "after" word-based alignment
  - [Quirk et al., 2005], [Galley et al., 2006], and so on
- Is it enough? Can it achieve "practical" translation quality?

Page 4

Background (cont.)
- The word-based alignment model works well for structurally similar language pairs
- It is not effective for language pairs with large differences in linguistic structure, such as Japanese and English (SOV versus SVO)
- For such language pairs, syntactic information is necessary even during the alignment process

Page 5

Related Work
- Syntactic tree-based models: [Yamada and Knight, 2001], [Gildea, 2003], ITG by Wu
  - They incorporate operations that manipulate subtrees (re-order, insert, delete, clone) to reproduce the opposite tree structure
  - Our model does not require any such operations
  - Our model utilizes dependency trees
- Dependency tree-based model: [Cherry and Lin, 2003]
  - Word-to-word, one-to-one alignment
  - Our model makes phrase-to-phrase alignments and can make many-to-many links

Page 6

Features of the Proposed Tree-based Probabilistic Phrase Alignment Model
- A generative model similar to the IBM models
- Uses phrase dependency structures
  - "phrase" means a linguistic phrase (cf. phrase-based SMT)
- Phrase-to-phrase alignment model
  - Each phrase (node) basically consists of one content word and zero or more function words
  - Source-side content words can be aligned only to target-side content words (the same holds for function words)
- Generation starts from the root node and ends at one of the leaf nodes (cf. the IBM models generate from the first word to the last word)

Page 7

Outline
- Background
- Tree-based Probabilistic Phrase Alignment Model
- Model Training
- Symmetrization Algorithm
- Experiments
- Conclusions

Page 8

Dependency Analysis of Sentences

Source (Japanese): プロピレングリコールは血中グルコースインスリンを上昇させ、血中NEFA 濃度を減少させる
Target (English): Propylene glycol increases in blood glucose and insulin and decreases in NEFA concentration in the blood

[figure: the source and target phrase dependency trees, with word order, head nodes, and root nodes marked]

Page 9

IBM Model vs. Tree-based Model

IBM Model [Brown et al., 93]:

  \hat{a} = \arg\max_a p_\theta(f \mid e, a)\, p_\theta(a \mid e)

  \hat{\theta} = \arg\max_\theta \prod_{s=1}^{S} \sum_a p_\theta(f_s \mid e_s, a)\, p_\theta(a \mid e_s)

Tree-based Model:

  \hat{a} = \arg\max_a p_\theta(T_f \mid T_e, a)\, p_\theta(a \mid T_e)

  \hat{\theta} = \arg\max_\theta \prod_{s=1}^{S} \sum_a p_\theta(T_{f,s} \mid T_{e,s}, a)\, p_\theta(a \mid T_{e,s})

where f: source sentence, e: target sentence, a: alignment, \theta: parameters, T_f: source tree, T_e: target tree.

Page 10

Model Decomposition: Lexicon Probability

Suppose T_f consists of J nodes and T_e consists of I nodes. p(T_f \mid T_e, a) is calculated as a product of phrase translation probabilities:

  p(T_f \mid T_e, a) = \prod_{j=1}^{J} p(f_j \mid e_{a_j})

and each phrase translation probability is a product of two probabilities:

  p(f_j \mid e_{a_j}) = p_{cont.}(f_j \mid e_{a_j}) \cdot p_{func.}(f_j \mid e_{a_j})

Ex) 濃度 を - "in concentration":

  p_{cont.}(濃度 \mid concentration) \cdot p_{func.}(を \mid in)

上昇 さ せ - "increase":

  p_{cont.}(上昇 \mid increase) \cdot p_{func.}(さ せ \mid EMPTY)

These are the phrase translation probabilities in \hat{a} = \arg\max_a p(T_f \mid T_e, a)\, p(a \mid T_e).
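The lexicon decomposition above can be sketched in code. This is a minimal illustration, not the authors' implementation: the probability tables, the (content, function) node encoding, and the 1e-9 smoothing floor are assumptions made for the sketch.

```python
def lexicon_prob(src_nodes, tgt_nodes, align, p_cont, p_func):
    """p(T_f | T_e, a) as a product over source nodes of
    p_cont(content | aligned content) * p_func(function | aligned function).
    Nodes are (content, function) pairs; tgt_nodes[0] is the NULL phrase."""
    prob = 1.0
    for (f_cont, f_func), a_j in zip(src_nodes, align):
        e_cont, e_func = tgt_nodes[a_j]
        prob *= p_cont.get((f_cont, e_cont), 1e-9)  # content-to-content only
        prob *= p_func.get((f_func, e_func), 1e-9)  # function-to-function only
    return prob
```

With the slide's example entry p_cont(濃度|concentration) = 0.5 and p_func(を|in) = 0.4, the node 濃度 を aligned to "in concentration" contributes 0.5 × 0.4 = 0.2.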

Page 11

Model Decomposition: Alignment Probability

Define the parent node of f_j as \tilde{f_j}. p(a \mid T_e) is decomposed as a product of target-side dependency relation probabilities conditioned on the source-side relation:

  p(a \mid T_e) = \prod_{j=1}^{J} p_e(\mathrm{rel}(e_{a_j}, e_{a_{\tilde{j}}}) \mid \mathrm{rel}(f_j, \tilde{f_j}))

If the parent node \tilde{f_j} has been aligned to NULL, \tilde{f_j} instead indicates the grandparent of f_j, and this continues until \tilde{f_j} is aligned to something other than NULL.

p(a \mid T_e), the dependency relation probability in \hat{a} = \arg\max_a p(T_f \mid T_e, a)\, p(a \mid T_e), models tree-based reordering.

Page 12

Outline
- Background
- Tree-based Probabilistic Phrase Alignment Model
- Model Training
- Symmetrization Algorithm
- Experiments
- Conclusions

Page 13

Model Training
- The proposed model is trained by the EM algorithm
- First, the phrase translation probability is learned (Model 1)
  - Model 1 can be learned efficiently without approximation (cf. IBM Models 1 and 2)
- Next, the dependency relation probability is learned (Model 2), using the probabilities learned in Model 1 as initial parameters
  - Model 2 needs some approximation (cf. IBM Model 3 or greater); we use a beam-search algorithm
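The Model 1 training step can be illustrated with an IBM-Model-1-style EM loop operating on phrases. This is a sketch under simplifying assumptions (corpus as plain phrase lists, uniform initialization, explicit NULL target), not the paper's implementation:

```python
from collections import defaultdict

def train_model1(corpus, iterations=5):
    """corpus: list of (source_phrases, target_phrases) pairs.
    Returns phrase translation probabilities t[(f, e)] ~ p(f | e)."""
    # Uniform (un-normalized) initialization over co-occurring pairs.
    t = {(f, e): 1.0
         for fs, es in corpus for f in fs for e in es + ["NULL"]}
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts (E step)
        total = defaultdict(float)
        for fs, es in corpus:
            targets = es + ["NULL"]  # every phrase may also align to NULL
            for f in fs:
                z = sum(t[(f, e)] for e in targets)
                for e in targets:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M step: renormalize per target phrase.
        t = {(f, e): count[(f, e)] / total[e] for (f, e) in count}
    return t
```

After a few iterations, co-occurrence statistics concentrate the probability mass on consistently co-occurring phrase pairs.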

Page 14

Model 1
- Each phrase f_j (1 \le j \le J) on the source side can correspond to an arbitrary phrase e_i (1 \le i \le I) on the target side, or to the NULL phrase (e_0)
- The probability of one possible alignment is:

  p(a, T_f \mid T_e) = \prod_{j=1}^{J} p_{cont.}(f_j \mid e_{a_j}) \cdot p_{func.}(f_j \mid e_{a_j})

- Then the tree translation probability is:

  p(T_f \mid T_e) = \sum_a p(a, T_f \mid T_e)

- Efficiently calculated as:

  \sum_a \prod_{j=1}^{J} p(f_j \mid e_{a_j}) = \prod_{j=1}^{J} \sum_{i=0}^{I} p(f_j \mid e_i)
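The last identity is the standard Model-1 rearrangement: the exponential sum over all alignments factorizes into one sum per source node. A quick numeric check, with made-up probabilities p[j][i] standing in for p(f_j | e_i):

```python
import math
from itertools import product

def brute_force(p):
    """Sum over every alignment a of prod_j p[j][a_j]: O((I+1)^J) terms."""
    J, width = len(p), len(p[0])
    return sum(math.prod(p[j][a[j]] for j in range(J))
               for a in product(range(width), repeat=J))

def factored(p):
    """prod_j sum_i p[j][i]: the efficient form from the slide."""
    return math.prod(sum(row) for row in p)

# Illustrative values only; rows are source nodes, columns target phrases.
p = [[0.1, 0.5, 0.2], [0.3, 0.1, 0.4], [0.2, 0.2, 0.1]]
assert abs(brute_force(p) - factored(p)) < 1e-12
```

The equality holds by the distributive law, which is why Model 1 needs no approximation.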

Page 15

Model 2 (imaginary ROOT node)
- The root node of a sentence is supposed to depend on an imaginary ROOT node, which works like the start-of-sentence (SOS) symbol in a word-based model
- The ROOT node in the source tree always corresponds to that of the target tree

[figure: Japanese dependency tree (確認 した / ポイント を / 必要な / 視点 に / 援助 の / 事例 を 通して) and English dependency tree (was confirmed / the point / necessary / in the viewpoint / of the assist / through the case), each headed by an imaginary ROOT node]

Page 16

Model 2 (beam-search algorithm)
- It is impossible to enumerate all possible alignments
- Consider only a subset of "good-looking" alignments using a beam-search algorithm
- Ex) beam width = 4

[figure: the example sentence pair from the previous slide, with NULL added as a possible alignment target]
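The beam search over alignment hypotheses can be sketched generically. This is an illustrative simplification, not the paper's algorithm: the traversal order and the local score function (standing in for the model's phrase and relation probabilities) are assumptions.

```python
def beam_search_align(src_nodes, tgt_options, score, beam_width=4):
    """Keep only the beam_width most promising partial alignments while
    assigning each source node a target phrase (or NULL) in turn.
    score(f, e) is a stand-in for the model's local probabilities."""
    beam = [((), 1.0)]  # (partial alignment, probability)
    for f in src_nodes:
        candidates = [(align + (e,), p * score(f, e))
                      for align, p in beam for e in tgt_options]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]  # prune to the beam width
    return beam
```

With beam width 4, at most four hypotheses survive each extension step, exactly as in the slide's example.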

Page 17

Model 2 (beam-search algorithm)

[figure: four beam candidates (beam width = 4), each a partial alignment between the Japanese and English dependency trees of the running example, including NULL alignments]

Page 18

Model 2 (parameter notations)

The dependency relation rel(P_1, P_2) between two phrases P_1 and P_2 is defined as a path from P_1 to P_2, using the following notations:
- "c-" if P_2 is a pre-child of P_1
- "c+" if P_2 is a post-child of P_1
- "p-" if P_1 is a post-child of P_2
- "p+" if P_1 is a pre-child of P_2
- "INCL" if P_1 and P_2 are the same phrase
- "ROOT" if P_2 is the imaginary ROOT node
- "NULL" if P_1 is aligned to NULL

[figure: small tree diagrams illustrating the c-, c+, p-, p+, and ROOT relations between P_1 and P_2]
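The path notation can be sketched as a small tree walk. This is a hypothetical simplification of the slide's definition: the tree is encoded as a parent map plus a set of pre-children, and the single-node NULL/ROOT cases are omitted.

```python
def rel(p1, p2, parent, pre_child):
    """Path notation from p1 to p2 in a dependency tree.
    parent: child -> parent map; pre_child: nodes that precede their head.
    Going up: 'p+' from a pre-child, 'p-' from a post-child.
    Going down: 'c-' to a pre-child, 'c+' to a post-child."""
    if p1 == p2:
        return "INCL"
    def chain(n):  # the node and all its ancestors, bottom-up
        out = [n]
        while n in parent:
            n = parent[n]
            out.append(n)
        return out
    up, down = chain(p1), chain(p2)
    common = next(n for n in up if n in down)  # lowest common ancestor
    steps = []
    n = p1
    while n != common:          # climb from p1 to the common ancestor
        steps.append("p+" if n in pre_child else "p-")
        n = parent[n]
    for n in reversed(down[:down.index(common)]):  # descend toward p2
        steps.append("c-" if n in pre_child else "c+")
    return ";".join(steps)
```

Combined relations such as "p+;p+;c+" fall out naturally when p1 and p2 are two or more nodes apart, as on the next slide.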

Page 19

Model 2 (parameter notations, cont.)

When P_1 and P_2 are two or more nodes distant from each other, the relation is described by combining the notations.

Ex) "c-;c+", "p-;c+;c-"

[figure: tree diagrams illustrating the combined relations between P_1 and P_2]

Page 20

Dependency Relation Probability Examples

[figure: the four beam candidates from the beam-search example, each annotated with its dependency relation probabilities]

  p(ROOT|ROOT) · p(c-|c-)
  p(ROOT;c-|ROOT) · p(p-|c-)
  p(ROOT|ROOT) · p(c-;c+|c-)
  p(NULL|ROOT) · p(ROOT|ROOT;c-)

Page 21

Example

[figure: the fully aligned Japanese and English dependency trees of the running example, both headed by ROOT]

p(a, T_f|T_e) = p(確認|was confirmed) · p(した|EMPTY) · p(ROOT|ROOT)
  × p(事例|case) · p(を 通して|through) · p(c-;c+|c-)
  × p(ポイント|point) · p(を|EMPTY) · p(c-|c-)
  × p(必要な|necessary) · p(EMPTY|EMPTY) · p(c-|c-)
  × p(視点|viewpoint) · p(に|in) · p(c+|c-)
  × p(援助|assist) · p(の|of) · p(c+|c-)

Page 22

Outline
- Background
- Tree-based Probabilistic Phrase Alignment Model
- Model Training
- Symmetrization Algorithm
- Experiments
- Conclusions

Page 23

Symmetrization Algorithm
- Since our model is directional, we run it bi-directionally and symmetrize the two alignment results heuristically
- The symmetrization algorithm is similar to [Koehn et al., 2003], which uses the 1-best GIZA++ word alignment result of each direction
- Our algorithm exploits the n-best alignment results of each direction
- Three steps: superimposition, growing, handling isolations

Page 24

Symmetrization Algorithm: 1. Superimposition

[figure: the source-to-target 5-best and target-to-source 5-best alignments are superimposed into a single alignment score matrix]

Page 25

Symmetrization Algorithm: 1. Superimposition (cont.)
- Definitive alignment points are adopted: the points which don't have a point with the same or higher score in their row or column
- Conflicting points are discarded: the points which are in the same row or column as an adopted point and are not contiguous to the adopted point on the tree

[figure: the superimposed score matrix, with definitive points adopted and conflicting points discarded]

Page 26

Symmetrization Algorithm: 2. Growing
- Adopt points contiguous to already adopted points in both the source and target tree
  - In descending order of score
  - From top to bottom, from left to right
- Discard conflicting points: the points which have an adopted point both in the same row and in the same column

[figure: the score matrix before and after growing]

Page 27

Symmetrization Algorithm: 3. Handling Isolations
- Adopt points for phrases which are not yet aligned to any phrase, in both the source and target language

[figure: the score matrix before and after handling isolated phrases]
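The three steps can be sketched on a plain score matrix. This is a simplified illustration, not the paper's algorithm: in particular it uses matrix adjacency where the paper checks contiguity on the dependency trees, and it ignores traversal-order tie-breaking.

```python
def symmetrize(points):
    """points: {(i, j): score} from superimposing the n-best alignments of
    both directions. Returns the set of adopted alignment points."""
    def sharers(p):  # other points sharing a row or column with p
        return [q for q in points if q != p and (q[0] == p[0] or q[1] == p[1])]

    # 1. Superimposition: adopt definitive points, i.e. points with no
    #    equal-or-higher scored point in their row or column.
    adopted = {p for p in points
               if all(points[q] < points[p] for q in sharers(p))}

    # 2. Growing: in descending score order, adopt points adjacent to an
    #    already adopted point, unless adopted points already occupy both
    #    the same row and the same column (a conflict).
    for p in sorted(set(points) - adopted, key=points.get, reverse=True):
        adjacent = any(abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1 for q in adopted)
        row_taken = any(q[0] == p[0] for q in adopted)
        col_taken = any(q[1] == p[1] for q in adopted)
        if adjacent and not (row_taken and col_taken):
            adopted.add(p)

    # 3. Handling isolations: adopt the best remaining point for any phrase
    #    still unaligned on both sides.
    for p in sorted(set(points) - adopted, key=points.get, reverse=True):
        if not any(q[0] == p[0] for q in adopted) and \
           not any(q[1] == p[1] for q in adopted):
            adopted.add(p)
    return adopted
```

On a small matrix, step 1 fixes the unambiguous points, step 2 extends them to neighbors, and step 3 rescues phrases that would otherwise stay unaligned.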

Page 28

Alignment Experiment
- Training corpus: a Japanese-English paper abstract corpus provided by JST, consisting of about 1M parallel sentences
- Gold-standard alignment: 100 manually annotated sentence pairs from the training corpus; Sure (S) alignments only [Och and Ney, 2003]
- Evaluation unit: morpheme-based for Japanese, word-based for English
- Iterations: 5 iterations for Model 1, then 5 iterations for Model 2

Page 29

Alignment Experiment (cont.)
- Comparative experiment (word-based alignment): GIZA++ and various symmetrization heuristics [Koehn et al., 2007]
- Default settings for GIZA++
- Original forms of words are used for both Japanese and English

Page 30

Results

  Method                           Precision  Recall  F-measure
  proposed  1-best-intersection    90.92      41.69   57.17
  proposed  1-best-grow            83.30      54.33   65.76
  proposed  3-best-grow            81.21      56.52   66.65
  proposed  5-best-grow            80.59      57.33   67.00
  GIZA++    intersection           88.14      40.18   55.20
  GIZA++    grow                   83.50      49.65   62.27
  GIZA++    grow-final             67.19      56.91   61.63
  GIZA++    grow-final-and         78.00      52.93   63.06
  GIZA++    grow-diag              77.34      53.18   63.03
  GIZA++    grow-diag-final        67.24      56.63   61.48
  GIZA++    grow-diag-final-and    74.95      54.26   62.95
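The F-measure column is the balanced harmonic mean of precision and recall; a quick sanity check against two rows of the table above (tolerance accounts for rounding in the slide):

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

assert abs(f_measure(90.92, 41.69) - 57.17) < 0.005  # 1-best-intersection
assert abs(f_measure(80.59, 57.33) - 67.00) < 0.005  # 5-best-grow
```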

Page 31

Example of Alignment Improvement

[figure: alignment matrices for an example sentence pair, produced by the proposed model and by word-based alignment]

Page 32

Translation Experiments
- Training corpus: same as the alignment experiments
- Test corpus: 500 paper abstract sentences
- Decoder: Moses [Koehn et al., 2007]
  - Default options, except for the phrase table limit (20 → 10) and the distortion limit (6 → -1)
  - No minimum error rate training
- Evaluation: BLEU, with no punctuation and case-insensitive

Page 33

Results

  Method                           Pre    Rec    F      BLEU
  proposed  1-best-intersection    90.92  41.69  57.17  12.73
  proposed  5-best-grow            80.59  57.33  67.00  15.40
  GIZA++    intersection           88.14  40.18  55.20  16.35
  GIZA++    grow-diag              77.34  53.18  63.03  17.89
  GIZA++    grow-diag-final-and    74.95  54.26  62.95  17.76

- The definition of function words is improper (articles? auxiliary verbs? ...)
- A tree-based decoder is necessary: BLEU is essentially insensitive to syntactic structure
- Translation quality is potentially improved

Page 34

Potentially Improved Example

Input: これ は LB 膜 の 厚み が アビジン を 吸着 する こと で 増加 した こと に よる 。
Proposed (BLEU 30.13): this is due to the increase in the thickness of the lb film avidin adsorb
GIZA++ (BLEU 33.78): the thickness of the lb film avidin to adsorption increased by it
Reference: this was due to increased thickness of the lb film by adsorbing avidin

Page 35

Conclusion
- A tree-based probabilistic phrase alignment model using dependency tree structures
  - Phrase translation probability
  - Dependency relation probability
- An n-best symmetrization algorithm
- Achieves high alignment accuracy compared to word-based models: syntactic information is useful during the alignment process
- BUT: unable to improve the BLEU scores of translation

Page 36

Future Work
- A more flexible model: content words sometimes correspond to function words and vice versa
- Integrate parsing probabilities into the model
  - Parsing errors easily lead to alignment errors
  - By integrating parsing probabilities, parsing results and alignments can be revised complementarily
- More syntactic information: use POS or phrase categories in the model

Page 37

Thank You!