CS 479, section 1: Natural Language Processing
Transcript of CS 479, section 1: Natural Language Processing
Lecture #35: Word Alignment Models (cont.)
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Content by Eric Ringger, partially based on earlier slides from Dan Klein of U.C. Berkeley.
Announcements
Project #4: Your insights into treebank grammars?
Project #5: Model 2 discussed today!
Propose-your-own: Reminder: no presentation, unless you really want to give one! Check the schedule.
Plan enough time to succeed! Don't get or stay blocked. Get your questions answered early. Get the help you need to keep moving forward. No late work accepted after the last day of instruction.
Announcements (2)
Project Report: Early: Wednesday; Due: Friday
Homework 0.4: Due today
Reading Report #14 (phrase-based MT paper): Due next Monday (online again)
EM Revisited
1. What are the four steps of the Expectation Maximization (EM) algorithm? Think of document clustering and/or training IBM Model 1!
2. What are the two primary purposes of EM?
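As one way to think about that question, the four steps (here assumed to be: initialize, E-step, M-step, check convergence and repeat) can be sketched as a generic skeleton; all function names and the convergence test below are illustrative, not from the lecture:

```python
def em(data, init_params, e_step, m_step, max_iters=20, tol=1e-6):
    """Generic EM skeleton: (1) initialize parameters, (2) E-step
    computes expected counts and the log-likelihood, (3) M-step
    re-estimates parameters from counts, (4) stop when the
    log-likelihood improvement falls below tol."""
    params = init_params(data)              # 1. initialize
    prev_ll = float("-inf")
    for _ in range(max_iters):
        counts, ll = e_step(data, params)   # 2. expected counts + log-likelihood
        params = m_step(counts)             # 3. re-estimate parameters
        if ll - prev_ll < tol:              # 4. converged?
            break
        prev_ll = ll
    return params
```

EM's two primary purposes in this setting: estimating model parameters from incomplete data, and inferring the hidden variables (cluster labels, alignments) along the way.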
Objectives
Observe problems with IBM Model 1
Model ordering issues with IBM Model 2!
“Monotonic Translation”
Le Japon secoué par deux nouveaux séismes
Japan shaken by two new quakes
(The English sentence is augmented with a special NULL token, to which otherwise unaligned French words can align.)
How would you implement a monotone decoder (to translate the French)?
MT System
You could now build a simple MT system using:
An English language model
An English-to-French alignment model (IBM Model 1), trained on Canadian Hansard data
A monotone decoder (greedy or Viterbi)
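A minimal sketch of the greedy monotone decoder, assuming a translation table `t[f][e] = P(f | e)` (IBM Model 1 style) and a bigram language model `lm[(prev, e)] = P(e | prev)`; all table layouts and names here are illustrative assumptions, not from the lecture:

```python
def greedy_monotone_decode(french, t, lm, english_vocab):
    """Translate left to right, one English word per French word,
    greedily picking at each step the English word e maximizing
    P(f_j | e) * P(e | previous English word)."""
    output = []
    prev = "<s>"  # sentence-start symbol for the bigram LM
    for f in french:
        best_e, best_score = None, 0.0
        for e in english_vocab:
            score = t.get(f, {}).get(e, 0.0) * lm.get((prev, e), 0.0)
            if score > best_score:
                best_e, best_score = e, score
        output.append(best_e if best_e is not None else f)  # pass unknowns through
        prev = output[-1]
    return output
```

A Viterbi decoder would instead keep, for each position, the best score per language-model state and backtrace at the end, rather than committing greedily.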
IBM Model 1
Model 1 generates each French word f_j from the English word at its aligned position a_j (position 0 is the NULL word):

P(f, a \mid e) = \prod_{j=1}^{J} \frac{1}{I+1} \, t(f_j \mid e_{a_j})

P(f \mid e) = \sum_{a} P(f, a \mid e)

\hat{a} = \arg\max_{a} P(f, a \mid e)

Here f = f_1 \ldots f_J is the French sentence, e = e_1 \ldots e_I the English sentence, and a = a_1 \ldots a_J the alignment vector.
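Because Model 1 scores each French position independently, the Viterbi alignment decomposes into a per-word argmax. A minimal sketch, assuming a translation table `t` keyed by `(french_word, english_word)` (an illustrative layout):

```python
def model1_best_alignment(french, english, t):
    """For each French word f_j, pick the English position i
    (0 = NULL) maximizing t(f_j | e_i); under Model 1 these
    choices are independent, so together they form a-hat."""
    eng = ["NULL"] + english
    return [max(range(len(eng)), key=lambda i: t.get((f, eng[i]), 0.0))
            for f in french]

def model1_joint(french, english, alignment, t):
    """P(f, a | e) = prod_j t(f_j | e_{a_j}) / (I + 1)."""
    eng = ["NULL"] + english
    p = 1.0
    for j, f in enumerate(french):
        p *= t.get((f, eng[alignment[j]]), 0.0) / (len(english) + 1)
    return p
```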
One-to-Many Alignments
But there are other problems to think about, as the following examples show:
Problem: Many-to-One Alignments
Problem: Many-to-Many Alignments
Problem: Local Order Change
Le Japon est au confluent de quatre plaques tectoniques
Japan is at the junction of four tectonic plates
“Distortions”
Problem: More Distortions
Le tremblement de terre a fait 39 morts et 3,183 blessés.
The earthquake killed 39 and wounded 3,183.
Insights
How to include “distortion” in the model?
How to prefer nearby distortions over long-distance distortions?
IBM Model 2
Reminder: Model 1 treats all alignments as equally likely.
We could model distortions, without any strong assumptions about where they occur, as a distribution over target-language positions: a(i | j, I, J).
Or we could build a model as a distribution over distortion distances, i.e., over how far position i falls from the diagonal position j · I / J.
Matrix View of an Alignment
Preference for the Diagonal
But alignments for some language pairs tend toward the diagonal in general: we can use a normal distribution, centered on the diagonal, for the distortion model.
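The diagonal preference can be sketched as a discretized normal distribution over English positions, renormalized for each French position; the standard deviation `sigma` and the exact centering below are illustrative assumptions:

```python
import math

def diagonal_distortion(j, I, J, sigma=1.0):
    """Distribution over English positions i = 1..I for French
    position j, peaked at the diagonal position j * I / J."""
    center = j * I / J
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
               for i in range(1, I + 1)]
    z = sum(weights)
    return [w / z for w in weights]
```

A smaller `sigma` penalizes long-distance distortions more sharply, which matches the insight above that nearby distortions should be preferred.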
EM for Model 2
Model 2 parameters:
Translation probabilities: t(f | e)
Distortion parameters: a(i | j, I, J)
Initialize t(f | e) with Model 1; initialize a(i | j, I, J) as uniform.
E-step: For each pair of sentences (f, e):
For each French position j:
1. Calculate the posterior over English positions i:
2. Increment the count of word f_j with word e_i by these amounts:
3. Similarly, for each English position i, update the distortion counts:
P(i \mid j, f, e) = \frac{a(i \mid j, I, J) \, t(f_j \mid e_i)}{\sum_{i'=0}^{I} a(i' \mid j, I, J) \, t(f_j \mid e_{i'})}

C(f_j, e_i) \mathrel{+}= P(i \mid j, f, e)

C(i \mid j, J, I; f, e) \mathrel{+}= P(i \mid j, f, e)
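The E-step for one sentence pair can be sketched as follows, assuming dictionaries `t[(f, e)] = t(f | e)` and `a[(i, j, I, J)] = a(i | j, I, J)` with English position 0 reserved for NULL (these table layouts are illustrative assumptions):

```python
def e_step_pair(french, english, t, a, t_counts, d_counts):
    """Accumulate expected counts for one (french, english) pair."""
    eng = ["NULL"] + english
    I, J = len(english), len(french)
    for j, f in enumerate(french, start=1):
        # Unnormalized posterior over English positions i = 0..I
        scores = [a.get((i, j, I, J), 0.0) * t.get((f, eng[i]), 0.0)
                  for i in range(I + 1)]
        z = sum(scores)
        if z == 0.0:
            continue  # no mass for this French word; skip it
        for i in range(I + 1):
            p = scores[i] / z            # posterior P(i | j, f, e)
            t_counts[(f, eng[i])] += p   # expected word-pair count
            d_counts[(i, j, I, J)] += p  # expected distortion count
```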
EM for Model 2 (cont.)
M-step:
Re-estimate a(i | j, I, J) by normalizing these distortion counts: one conditional distribution for each context (j, I, J).
Re-estimate t(f | e) by normalizing the earlier word-pair counts: one conditional distribution per English word e.
Iterate until convergence of the log-likelihood, or for a handful of iterations.
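The M-step normalizations can be sketched as follows, using the same illustrative count-table layouts as the E-step (keys `(f, e)` for word pairs and `(i, j, I, J)` for distortions):

```python
from collections import defaultdict

def m_step(t_counts, d_counts):
    """Turn expected counts into conditional distributions."""
    # t(f | e): one distribution per English word e
    totals_e = defaultdict(float)
    for (f, e), c in t_counts.items():
        totals_e[e] += c
    t = {(f, e): c / totals_e[e] for (f, e), c in t_counts.items()}
    # a(i | j, I, J): one distribution per context (j, I, J)
    totals_ctx = defaultdict(float)
    for (i, j, I, J), c in d_counts.items():
        totals_ctx[(j, I, J)] += c
    a = {(i, j, I, J): c / totals_ctx[(j, I, J)]
         for (i, j, I, J), c in d_counts.items()}
    return t, a
```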
See the directions for Project #5 on the course wiki for a more detailed version of this EM algorithm, including implementation tips.
Next
Even better alignment models
Evaluating alignment models
Evaluating translation end-to-end!