CS 479, section 1: Natural Language Processing
Transcript of CS 479, section 1: Natural Language Processing
Lecture #35: Word Alignment Models (cont.)
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Content by Eric Ringger, partially based on earlier slides from Dan Klein of U.C. Berkeley.
Announcements
Project #4: Your insights into treebank grammars?
Project #5: Model 2 discussed today!
Propose-your-own: Reminder: no presentation, unless you really want to give one! Check the schedule.
Plan enough time to succeed! Don't get or stay blocked. Get your questions answered early. Get the help you need to keep moving forward. No late work accepted after the last day of instruction.
Announcements (2)
Project Report: Early: Wednesday; Due: Friday
Homework 0.4: Due today
Reading Report #14 (phrase-based MT paper): Due next Monday (online again)
EM Revisited
1. What are the four steps of the Expectation Maximization (EM) algorithm? Think of document clustering and/or training IBM Model 1!
2. What are the two primary purposes of EM?
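As one way to think about that question, the four steps (here assumed to be: initialize, E-step, M-step, check convergence and repeat) can be sketched as a generic skeleton; all function names and the convergence test below are illustrative, not from the lecture:

```python
def em(data, init_params, e_step, m_step, max_iters=20, tol=1e-6):
    """Generic EM skeleton: (1) initialize parameters, (2) E-step
    computes expected counts and the log-likelihood, (3) M-step
    re-estimates parameters from counts, (4) stop when the
    log-likelihood improvement falls below tol."""
    params = init_params(data)              # 1. initialize
    prev_ll = float("-inf")
    for _ in range(max_iters):
        counts, ll = e_step(data, params)   # 2. expected counts + log-likelihood
        params = m_step(counts)             # 3. re-estimate parameters
        if ll - prev_ll < tol:              # 4. converged?
            break
        prev_ll = ll
    return params
```

EM's two primary purposes in this setting: estimating model parameters from incomplete data, and inferring the hidden variables (cluster labels, alignments) along the way.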
Objectives
Observe problems with IBM Model 1
Model ordering issues with IBM Model 2!
“Monotonic Translation”
Le Japon secoué par deux nouveaux séismes
Japan shaken by two new quakes
(The English sentence is augmented with a special NULL token, to which otherwise unaligned French words can align.)
How would you implement a monotone decoder (to translate the French)?
MT System
You could now build a simple MT system using:
An English language model
An English-to-French alignment model (IBM Model 1), trained on Canadian Hansard data
A monotone decoder (greedy or Viterbi)
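A minimal sketch of the greedy monotone decoder, assuming a translation table `t[f][e] = P(f | e)` (IBM Model 1 style) and a bigram language model `lm[(prev, e)] = P(e | prev)`; all table layouts and names here are illustrative assumptions, not from the lecture:

```python
def greedy_monotone_decode(french, t, lm, english_vocab):
    """Translate left to right, one English word per French word,
    greedily picking at each step the English word e maximizing
    P(f_j | e) * P(e | previous English word)."""
    output = []
    prev = "<s>"  # sentence-start symbol for the bigram LM
    for f in french:
        best_e, best_score = None, 0.0
        for e in english_vocab:
            score = t.get(f, {}).get(e, 0.0) * lm.get((prev, e), 0.0)
            if score > best_score:
                best_e, best_score = e, score
        output.append(best_e if best_e is not None else f)  # pass unknowns through
        prev = output[-1]
    return output
```

A Viterbi decoder would instead keep, for each position, the best score per language-model state and backtrace at the end, rather than committing greedily.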
IBM Model 1
Model 1 generates each French word f_j from the English word at its aligned position a_j (position 0 is the NULL word):

P(f, a \mid e) = \prod_{j=1}^{J} \frac{1}{I+1} \, t(f_j \mid e_{a_j})

P(f \mid e) = \sum_{a} P(f, a \mid e)

\hat{a} = \arg\max_{a} P(f, a \mid e)

Here f = f_1 \ldots f_J is the French sentence, e = e_1 \ldots e_I the English sentence, and a = a_1 \ldots a_J the alignment vector.
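Because Model 1 scores each French position independently, the Viterbi alignment decomposes into a per-word argmax. A minimal sketch, assuming a translation table `t` keyed by `(french_word, english_word)` (an illustrative layout):

```python
def model1_best_alignment(french, english, t):
    """For each French word f_j, pick the English position i
    (0 = NULL) maximizing t(f_j | e_i); under Model 1 these
    choices are independent, so together they form a-hat."""
    eng = ["NULL"] + english
    return [max(range(len(eng)), key=lambda i: t.get((f, eng[i]), 0.0))
            for f in french]

def model1_joint(french, english, alignment, t):
    """P(f, a | e) = prod_j t(f_j | e_{a_j}) / (I + 1)."""
    eng = ["NULL"] + english
    p = 1.0
    for j, f in enumerate(french):
        p *= t.get((f, eng[alignment[j]]), 0.0) / (len(english) + 1)
    return p
```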
One-to-Many Alignments
But there are other problems to think about, as the following examples show:
Problem: Many-to-One Alignments
Problem: Many-to-Many Alignments
Problem: Local Order Change
Le Japon est au confluent de quatre plaques tectoniques
Japan is at the junction of four tectonic plates
“Distortions”
Problem: More Distortions
Le tremblement de terre a fait 39 morts et 3,183 blessés.
The earthquake killed 39 and wounded 3,183.
Insights
How to include “distortion” in the model?
How to prefer nearby distortions over long-distance distortions?
IBM Model 2
Reminder: Model 1 treats all alignments as equally likely.
We could model distortions, without any strong assumptions about where they occur, as a distribution over target-language positions: a(i | j, I, J).
Or we could build a model as a distribution over distortion distances, i.e., over how far position i falls from the diagonal position j · I / J.
Matrix View of an Alignment
Preference for the Diagonal
But alignments for some language pairs tend toward the diagonal in general: we can use a normal distribution, centered on the diagonal, for the distortion model.
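The diagonal preference can be sketched as a discretized normal distribution over English positions, renormalized for each French position; the standard deviation `sigma` and the exact centering below are illustrative assumptions:

```python
import math

def diagonal_distortion(j, I, J, sigma=1.0):
    """Distribution over English positions i = 1..I for French
    position j, peaked at the diagonal position j * I / J."""
    center = j * I / J
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
               for i in range(1, I + 1)]
    z = sum(weights)
    return [w / z for w in weights]
```

A smaller `sigma` penalizes long-distance distortions more sharply, which matches the insight above that nearby distortions should be preferred.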
EM for Model 2
Model 2 parameters:
Translation probabilities: t(f | e)
Distortion parameters: a(i | j, I, J)
Initialize t(f | e) with Model 1; initialize a(i | j, I, J) as uniform.
E-step: For each pair of sentences (f, e):
For each French position j:
1. Calculate the posterior over English positions i:
2. Increment the count of word f_j with word e_i by these amounts:
3. Similarly, for each English position i, update the distortion counts:
P(i \mid j, f, e) = \frac{a(i \mid j, I, J) \, t(f_j \mid e_i)}{\sum_{i'=0}^{I} a(i' \mid j, I, J) \, t(f_j \mid e_{i'})}

C(f_j, e_i) \mathrel{+}= P(i \mid j, f, e)

C(i \mid j, J, I; f, e) \mathrel{+}= P(i \mid j, f, e)
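The E-step for one sentence pair can be sketched as follows, assuming dictionaries `t[(f, e)] = t(f | e)` and `a[(i, j, I, J)] = a(i | j, I, J)` with English position 0 reserved for NULL (these table layouts are illustrative assumptions):

```python
def e_step_pair(french, english, t, a, t_counts, d_counts):
    """Accumulate expected counts for one (french, english) pair."""
    eng = ["NULL"] + english
    I, J = len(english), len(french)
    for j, f in enumerate(french, start=1):
        # Unnormalized posterior over English positions i = 0..I
        scores = [a.get((i, j, I, J), 0.0) * t.get((f, eng[i]), 0.0)
                  for i in range(I + 1)]
        z = sum(scores)
        if z == 0.0:
            continue  # no mass for this French word; skip it
        for i in range(I + 1):
            p = scores[i] / z            # posterior P(i | j, f, e)
            t_counts[(f, eng[i])] += p   # expected word-pair count
            d_counts[(i, j, I, J)] += p  # expected distortion count
```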
EM for Model 2 (cont.)
M-step:
Re-estimate a(i | j, I, J) by normalizing these distortion counts: one conditional distribution for each context (j, I, J).
Re-estimate t(f | e) by normalizing the earlier word-pair counts: one conditional distribution per English word e.
Iterate until convergence of the log-likelihood, or for a handful of iterations.
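The M-step normalizations can be sketched as follows, using the same illustrative count-table layouts as the E-step (keys `(f, e)` for word pairs and `(i, j, I, J)` for distortions):

```python
from collections import defaultdict

def m_step(t_counts, d_counts):
    """Turn expected counts into conditional distributions."""
    # t(f | e): one distribution per English word e
    totals_e = defaultdict(float)
    for (f, e), c in t_counts.items():
        totals_e[e] += c
    t = {(f, e): c / totals_e[e] for (f, e), c in t_counts.items()}
    # a(i | j, I, J): one distribution per context (j, I, J)
    totals_ctx = defaultdict(float)
    for (i, j, I, J), c in d_counts.items():
        totals_ctx[(j, I, J)] += c
    a = {(i, j, I, J): c / totals_ctx[(j, I, J)]
         for (i, j, I, J), c in d_counts.items()}
    return t, a
```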
See the directions for Project #5 on the course wiki for a more detailed version of this EM algorithm, including implementation tips.
Next
Even better alignment models
Evaluating alignment models
Evaluating translation end-to-end!