Jhu mt lecture_2014-04-15
-
Upload
alopezfoo -
Category
Technology
-
view
151 -
download
1
Embed Size (px)
description
Transcript of Jhu mt lecture_2014-04-15

Representing Huge Translation Models

Statistical Machine Translation
parallel text + alignment

Statistical Machine Translation
extractrules
parallel text + alignment

Statistical Machine Translation
extractrules
scorerules
parallel text + alignment

Statistical Machine Translation
extractrules
scorerules
联合国 安全 理事会 的 五个 常任 理事 国都
load rules into memory
decoder
parallel text + alignment

Statistical Machine Translation
extractrules
scorerules
联合国 安全 理事会 的 五个 常任 理事 国都
load rules into memory
decoder
parallel text + alignment
number of rulesdepends on
corpus size...

Statistical Machine Translation
extractrules
scorerules
联合国 安全 理事会 的 五个 常任 理事 国都
load rules into memory
decoder
parallel text + alignment
... and modelcomplexity

Statistical Machine Translation
parallel text + alignment
extractrules
scorerules
联合国 安全 理事会 的 五个 常任 理事 国都
load filtered rules
into memory
decoding algorithm
filter rules for test set

Baseline Translation Model
• Hierarchical Phrase-based translation (Chiang 2007)
• 1M parallel sentences (27M words)
• GIZA++ alignments (Och & Ney 2003, Koehn et al. 2003)
• alignments are dense
• Heuristics used to restrict number of extracted rules
• 67M rules, 6.1Gb of data
• cf. 225M (Zens & Ney 2007), 55M (DeNeefe et al. 2007)

Some Possible Improvements
• 3.5M sentences (2.5M out-of-domain), 100M words
• Discriminatively trained alignments (Ayan & Dorr 2006)
• Key difference: alignments are sparse
• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements
• 3.5M sentences (2.5M out-of-domain), 100M words
• Discriminatively trained alignments (Ayan & Dorr 2006)
• Key difference: alignments are sparse
• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements
• 3.5M sentences (2.5M out-of-domain), 100M words
• Discriminatively trained alignments (Ayan & Dorr 2006)
• Key difference: alignments are sparse
• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements
• 3.5M sentences (2.5M out-of-domain), 100M words
• Discriminatively trained alignments (Ayan & Dorr 2006)
• Key difference: alignments are sparse
• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements
• 3.5M sentences (2.5M out-of-domain), 100M words
• Discriminatively trained alignments (Ayan & Dorr 2006)
• Key difference: alignments are sparse
• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements
• Rule extraction time: 77 CPU days
• does not include sorting or scoring!
• Rules counted: 20 billion
• 2 orders of magnitude larger than state of the art
• Estimated unique rules: 6.6 billion
• Estimated extract file size: 917Gb
• Estimated phrase table size: 600Gb

The Problem
• Current models are bounded by resource limitations.
• We’re already pushing the edge of what’s possible.
• Parallel data aren’t getting any smaller.
• Models aren’t getting any less complex.

The Solution
• Translation by pattern matching.
• Novel pattern matching algorithms.
• Exploit ideas developed in bioinformatics, IR
• Support for tera-scale translation models.

Idea: Translation by Pattern Matching(Callison-Burch et al. 05, Zhang & Vogel 05)
联合国 安全 理事会 的 五个 常任 理事 国都
decodingalgorithm
patternmatchingalgorithm
parallel text + alignment in memory
sentence-specific
rulesextract andscore

it persuades him and it disheartens him
Exact Pattern MatchingInput Pattern

it persuades him and it disheartens him
Exact Pattern MatchingInput Pattern=Query Pattern

it persuades him and it disheartens him
Pattern Matching for Phrase-Based MTInput Pattern

itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartenshim and it disheartens him
Pattern Matching for Phrase-Based MTit persuades him and it disheartens himInput Pattern
Query Patterns

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
it mars him , it sets him on and it takes him off . #4
Text T
Suffix 4

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
it makes him and it mars him , it sets him on and it takes him ...makes him and it mars him , it sets him on and it takes him off . #him and it mars him , it sets him on and it takes him off . #and it mars him , it sets him on and it takes him off . #it mars him , it sets him on and it takes him off . #mars him , it sets him on and it takes him off . #him , it sets him on and it takes him off . #, it sets him on and it takes him off . #
01234567
...

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
31221510604
and it mars him , it sets him on and it takes him off . #and it takes him off . #him and it mars him , it sets him on and it takes him off . #him off . #him on and it takes him off . #him , it sets him on and it takes him off . #it makes him and it mars him , it sets him on and it takes him ...it mars him , it sets him on and it takes him off . #...

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
and it mars him , it sets him on and it takes him off . #and it takes him off . #him and it mars him , it sets him on and it takes him off . #him off . #him on and it takes him off . #him , it sets him on and it takes him off . #it makes him and it mars him , it sets him on and it takes him ...it mars him , it sets him on and it takes him off . #...
31221510604
...

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18
Text T
Suffix Array SA

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T
Suffix Array SA
Query Pattern w
him and it
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T
Suffix Array SA
Query Pattern w
him and it
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T
Suffix Array SA
Query Pattern w
him and it
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T
Suffix Array SA
Query Pattern w
him and it
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T
Suffix Array SA
Query Pattern w
him and it
O(|w| log |T |)
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T
Suffix Array SA
Query Pattern w
him and it (Manber & Myers, 93)
O(|w| log |T |)
O(|w| + log |T |)
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T
Suffix Array SA
Query Pattern w
him and it (Manber & Myers, 93)
O(|w| log |T |)
O(|w| + log |T |)
O(|w|) (Abouelhoda et al., 04)
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text T
Suffix Array SA
Query Pattern w
him and it (Manber & Myers, 93)
O(|w| log |T |)
O(|w| + log |T |)
O(|w|) (Abouelhoda et al., 04)
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18
on baseline model: 0.009 seconds/sentence
(not including extraction/scoring)

Problem: Phrases with Gaps
• Hierarchical phrase-based translation (Chiang 2005, 2007)
• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007
it persuades him and it disheartens him
it X himSource Phrase
Input

Hierarchical Phrases: Phrases with Gaps
• Hierarchical phrase-based translation (Chiang 2005, 2007)
• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007
it persuades him and it disheartens him
it X himSource Phrase
Input

Hierarchical Phrases: Phrases with Gaps
• Hierarchical phrase-based translation (Chiang 2005, 2007)
• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007
it persuades him and it disheartens him
it X himSource Phrase
Input

Hierarchical Phrases: Phrases with Gaps
• Hierarchical phrase-based translation (Chiang 2005, 2007)
• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007
it persuades him and it disheartens him
it X himSource Phrase
Input

Hierarchical Phrases: Phrases with Gaps
• Hierarchical phrase-based translation (Chiang 2005, 2007)
• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007
it persuades him and it disheartens him
it X and X himSource Phrase
Input

Given an input sentence, efficiently find all hierarchical phrase-based translation rules for that
sentence in the training corpus.
Problem Statement

it persuades him and it disheartens him
Pattern Matching for Hierachical PBMTInput Pattern

itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartenshim and it disheartens him
Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern
Query Patterns

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern
Query Patternsit X and
it X itit X disheartens
it X himpersuades X it
persuades X disheartenspersuades X himit persuades X it
it persuades X disheartensit persuades X him
it X and itit X it disheartens
it X disheartens himit X and X him
persuades him X disheartenspersuades him X him
persuades X it disheartenspersuades X disheartens him
him and X himhim X disheartens him
it persuades him X disheartensit persuades him X him
it persuades X it disheartensit persuades X disheartens him

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern
Query Patternsit X and it disheartensit X it disheartens him
persuades him and X himpersuades him X disheartens him
persuades X it disheartens himit persuades him and X him
it persuades him X disheartens himit persuades X it disheartens him
it X and it disheartens him

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern
Query Patternsit X and it disheartensit X it disheartens him
persuades him and X himpersuades him X disheartens him
persuades X it disheartens himit persuades him and X him
it persuades him X disheartens himit persuades X it disheartens him
it X and it disheartens him
This is a variant of approximate pattern matching (Navarro ‘01)

Pattern Matching with Gaps31221510604813
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...1
him X it
Query pattern
...
!

Pattern Matching with Gaps
him X it
Query pattern !
312215106048131
...
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps
him X it
Query pattern !
312215106048131
...
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps
him X it
Query pattern
himit
!
Subpatterns wi
312215106048131
...
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps
him X it
Query pattern
himit
!
Subpatterns wi
312215106048131
...
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps
him X it
Query pattern
himit
!
Subpatterns wi
ni Occurrences
312215106048131
...
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps312215106048131
...
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...
215106
04813

Pattern Matching with Gaps
215106
04813

Pattern Matching with Gaps
215106
04813
(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

Pattern Matching with Gaps
215106
04813
RILMS (Rahman et al., 06)
(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

Pattern Matching with Gaps
215106
04813
RILMS (Rahman et al., 06)
O(!
i
ni)linear in number of occurrences of subpatterns:
(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

221 seconds
Baseline Timing Result
per sentence
compare: 0.009 seconds per sentencefor contiguous phrases

137 5 27
!
!=w1X...XwI
I!
i=1
(|wi| + log |T | + ni)
!
w
(|w| + log |T |)
2825 3 5 27 82069
contiguous
discontiguous
Complexity Analysis

137 5 27
!
!=w1X...XwI
I!
i=1
(|wi| + log |T | + ni)
!
w
(|w| + log |T |)
2825 3 5 27 82069
contiguous
discontiguous
Complexity Analysis

Exploiting Redundancyit persuades him and it disheartens himInput Pattern
Query Patternsit X and
it X itit X disheartens
it X himpersuades X it
persuades X disheartenspersuades X himit persuades X it
it persuades X disheartensit persuades X him
it X and itit X it disheartens
it X disheartens himit X and X him
persuades him X disheartenspersuades him X him
persuades X it disheartenspersuades X disheartens him
him and X himhim X disheartens him
it persuades him X disheartensit persuades him X him
it persuades X it disheartensit persuades X disheartens him

Exploiting Redundancyit persuades him and it disheartens himInput Pattern
Query Patternsit X and
it X itit X disheartens
it X himpersuades X it
persuades X disheartenspersuades X himit persuades X it
it persuades X disheartensit persuades X him
it X and itit X it disheartens
it X disheartens himit X and X him
persuades him X disheartenspersuades him X him
persuades X it disheartenspersuades X disheartens him
him and X himhim X disheartens him
it persuades him X disheartensit persuades him X him
it persuades X it disheartensit persuades X disheartens him

Exploiting Redundancy
it persuades X disheartens himQuery Pattern

Exploiting Redundancy
it persuades X disheartens himQuery Patternit persuades X disheartensMaximal Prefix
(Zhang & Vogel 2005)

Exploiting Redundancy
it persuades X disheartens himQuery Patternit persuades X disheartens
persuades X disheartens himMaximal PrefixMaximal Suffix

Prefix Tree with Suffix Links
it
persuadeshim
Xhim
him
persuades
X
him
him

221
Baseline
seconds/sentence
Timing Results

177
221
Baseline PrefixTree
seconds/sentence
Timing Results

137 5 27
!
!=w1X...XwI
I!
i=1
(|wi| + log |T | + ni)
!
w
(|w| + log |T |)
2825 3 5 27 82069
contiguous
discontiguous
Complexity Analysis

137 5 27
!
!=w1X...XwI
I!
i=1
(|wi| + log |T | + ni)
!
w
(|w| + log |T |)
2825 3 5 27 82069
contiguous
discontiguous
Complexity Analysis

computations (ranked by time)
cum
ulat
ive
time
(s)
Empirical Analysis

Distribution of Patterns in Training DataFr
eque
ncy
Pattern types (in descending order of frequency)

Distribution of Patterns in Training DataFr
eque
ncy
Pattern types (in descending order of frequency)

Analysis of Problem
• The expensive computations involve at least one frequent subpattern. There are two cases.
• A frequent pattern paired with an infrequent pattern
• Two frequent patterns paired with each other

Frequent × Infrequent Subpatterns

Frequent × Infrequent Subpatterns

Frequent × Infrequent Subpatterns

Frequent × Infrequent Subpatterns

Double Binary SearchBaeza-Yates, 04

Double Binary SearchBaeza-Yates, 04
Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04
Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04
Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04
Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04
Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04
Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04
Queryset Q Dataset D
complexity: |Q| log |D|Upper bound

Obtaining Sorted Sets

Obtaining Sorted Sets
Sort via Stratified Tree(van Emde Boas et al. 1977)

Obtaining Sorted Sets
Problem: complexity increases to O(|Q| log |D| + (|Q| + |D|) log log |T |)
Sort via Stratified Tree(van Emde Boas et al. 1977)

Obtaining Sorted Sets
Problem: complexity increases to
Solution: cache sorted set in prefix
tree
O(|Q| log |D| + (|Q| + |D|) log log |T |)
Sort via Stratified Tree(van Emde Boas et al. 1977)

177
221
Baseline PrefixTree
+ double binary
seconds/sentence
Timing Results

174177
221
Baseline PrefixTree
+ double binary
seconds/sentence
Timing Results

Obtaining Sorted Sets
Sort via Stratified Tree
Problem: sort complexityis still very high for very
frequent patterns

Obtaining Sorted Sets
Solution: precompute theinverted index for 1000 most
frequent contiguous patterns

174177
221
Baseline PrefixTree
+ double binary
seconds/sentence
Timing Results

44
174177
221
Baseline PrefixTree
+ double binary
seconds/sentence
Timing Results
+ inverted indices

Frequent × Frequent Subpatterns

Frequent × Frequent Subpatterns
Problem:There is no clever algorithm to
solve this problem

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #
Most Frequent Patterns
it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text
it (4)him (4)
Precomputed Pattern Matchesit X him him X it
it X it him X him

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #
Most Frequent Patterns
it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Text
it (4)him (4)
Precomputed Pattern Matchesit X him him X it
it X it him X him
(0, 2)
(0, 4)
(0, 6)
(0, 8)
(2, 4)
(2, 6)
(2, 8)
(2, 10)
(4, 6)
(4, 8)
(4, 10)
(4, 13)
(6, 8)
(6, 10)
(6, 13)
(6, 15)
(8, 10)
(8, 13)
(8, 15)(10, 13)
(10, 15)
(13, 15)

44
174177
221
Baseline PrefixTree
+ double binary
seconds/sentence
Timing Results
+ inverted indices

1
44
174177
221
Baseline PrefixTree
+ double binary
seconds/sentence
Timing Results
+ inverted indices
+ precomp

Analysis of Fixed Memory Usage
• Source Text: |T|
• Suffix Array: |T|
• Alignments: |T|
• Target Text: |T|
• Total Cost: 4 |T|
• For 27M words: about 700M
• including indices for 1000 words: about 2.1 Gb
• for 100 words: 1.1Gb, increases time to 1.6 secs/sent

Longer Spans, Longer Phrases
1520253035
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1520253035
1 2 3 4 5 6 7 8 9 10
Maximum Span Length
Maximum Phrase Length
BLEU
BLEU

The Tera-Scale Translation Model
• Task: NIST Chinese-English 2005
• Baseline Model: 30.7
• Tera-Scale Model: 32.6
• All modifications contribute to overall score
• With better language model and number translation:
• Baseline Model: 31.9
• Tera-Scale Model: 34.5

Open Questions
• Can we improve speed?
• Can we improve memory use? Compressed self-indexes?
• Uses for arbitrarily large translation models?
• Context-sensitive models (Chan et al. 2007, Carpuat & Wu 2007)
• Factored models (Koehn et al. 2007)
• Syntax-based model (DeNeefe et al. 2007)
• What other algorithms can we use from bioinformatics?

Compact Adaptable TMs on GPUs
• Technique: massively parallel string pattern matching algorithms for...
• Streaming MT (Levenberg et al. 2010)
• Interactive MT (Denkowski et al. 2014)
• Low-memory MT (Lopez, 2008)
• Online learning (Simianer et al. 2012)
• New in this talk:
• New translation models
• New algorithms
• State-of-the-art translation accuracy

Streaming Translation Models
• Align new training data.
• Append to existing training data.
• Extract new phrase pairs as needed.
(Levenberg et al. 2010)

Streaming Translation Models
• Align new training data → online EM. FAST!
• Append to existing training data. FAST!
• Extract new phrase pairs as needed. SLOW...
(Levenberg et al. 2010)
For each phrase pair J:For each occurrence of J:
Extract its translation E (if possible)For each phrase pair J,E:
Compute p(E|J)

Streaming Translation Models
• Align new training data → online EM. FAST!
• Append to existing training data. FAST!
• Extract new phrase pairs as needed. SLOW...
(Levenberg et al. 2010)
For each phrase pair J:For each occurrence of J:
Extract its translation E (if possible)For each phrase pair J,E:
Compute p(E|J)

Streaming Translation Models
• Align new training data → online EM. FAST!
• Append to existing training data. FAST!
• Extract new phrase pairs as needed. SLOW...
(Levenberg et al. 2010)
For each phrase pair J:For each occurrence of J:
Extract its translation E (if possible)For each phrase pair J,E:
Compute p(E|J)
... But easily parallelizable. GPUs to the rescue!

Streaming Translation Models(Levenberg et al. 2010)
For each phrase pair J:For each occurrence of J:
Extract its translation E (if possible)For each phrase pair J,E:
Compute p(E|J)
In parallel on GPU://find all
// extract all

GPUs
• Graphics Processing Unit
• Single Instruction, Multiple Data
• Limited Memory

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartens
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartens
6, 10
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartens
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
6, 10
2, 3
0, 2

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartens
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
6, 10
2, 3
0, 2
Parallel pass 1:Find longest substrings
simultaneously

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartens
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
6, 10
2, 6
2, 3
2, 3
0, 2
0, 2

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartens
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
6, 10
2, 6
2, 3
2, 3
0, 2
0, 2

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #
3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
itpersuades
himand
disheartensit persuades
persuades himhim and
and itit disheartens
disheartens him
it persuades himpersuades him and
him and itand it disheartensit disheartens him
it persuades him andpersuades him and it
him and it disheartensand it disheartens himit persuades him and it
persuades him and it disheartens
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
6, 10
2, 6
2, 3
2, 3
0, 2
0, 2
Parallel pass 2:Find all substrings
simultaneously

Pattern Matching with Gaps312215106048131
...
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...
him _ it
Query pattern:
himit
Subpatterns:

Pattern Matching with Gaps312215106048131
...
and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...
215106
04813

Pattern Matching with Gaps
215106
04813

Pattern Matching with Gaps
215106
04813
(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)
it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
on CPU

Pattern Matching with Gaps
215106
04813
(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)
it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
on CPU

Pattern Matching with Gaps
215106
04813
(2, 4)(10, 13)
it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
on CPU

Pattern Matching with Gaps on GPU
215106
it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Pattern Matching with Gaps on GPU
215106
it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Thread 1
Thread 4Thread 3
Thread 2

Pattern Matching with Gaps on GPU
215106
it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Thread 1
Thread 4Thread 3
Thread 2(2,4)
(10,13)

Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu

Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
akemasu / open

Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
watashi wa / I

Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
hako wo / the box

Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
hako wo / open the box✘

Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
hako wo akemasu / open the box

Hierarchical Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
_ akemasu / open _

Hierarchical Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
_ akemasu / open _

Hierarchical Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
_1 _2 akemasu/ _1 open _2

Hierarchical Phrase ExtractionI open the box
watashi
wa
hako
wo
akemasu
watashi wa hako wo _ / I _ the box

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
L,R
L’,R’
0
0
1 2 3
1
2
3
4
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
L,R
L’,R’
watashi wa / ???
0
0
1 2 3
1
2
3
4
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
L,R
L’,R’
watashi wa / ???
0
0
1 2 3
1
2
3
4
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu0
0
1 2 3
1
2
3
4
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
watashi wa / ???
L,R
L’,R’
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu0
0
1 2 3
1
2
3
4
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
watashi wa / ???
L,R
L’,R’
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu0
0
1 2 3
1
2
3
4
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
watashi wa / I
L,R
L’,R’
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu0
0
1 2 3
1
2
3
4
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
wo akemasu / ???
L,R
L’,R’
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu0
0
1 2 3
1
2
3
4
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
wo akemasu / ???
L,R
L’,R’
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu0
0
1 2 3
1
2
3
4
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
wo akemasu / ???
L,R
L’,R’
45

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu0
0
1 2 3
1
2
3
4
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
wo akemasu / ???
L,R
L’,R’
45
✘

Phrase Extraction on GPUsI open the box
watashi
wa
hako
wo
akemasu0
0
1 2 3
1
2
3
4
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
wo akemasu / ???
L,R
L’,R’
45
✘
Compute i’,j’ simultaneously for all (sampled) occurrences

Representing L and R ArraysI open the box
watashi
wa
hako
wo
akemasu
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
L,R
L’,R’
0
0
1 2 3
1
2
3
4
45
Fiction!

Representing L and R ArraysI open the box
watashi
wa
hako
wo
akemasu
0,1
0,1
3,4
3,4
1,2
0,2 4,5 -,- 2,4
L,R
L’,R’
7240736
7240737
7240738
Reality!
7240739
7240740
72407417582971
7582972
7582973
7582974
7582975

Representing L and R Arrays
watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2
7240736 7240737 7240738 7240739 7240740
wordR,L
i

Representing L and R Arrays
watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2
7240736 7240737 7240738 7240739 7240740
wordR,L
i
#7240735

Representing L and R Arrays
watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2
7240736 7240737 7240738 7240739 7240740
wordR,L
i
#7240735
P 1 2 3 4 5

Representing L and R Arrays
watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2
7240736 7240737 7240738 7240739 7240740
wordR,L
i
#7240735
P 1 2 3 4 5
7582971

Representing L and R Arrays
watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2
7240736 7240737 7240738 7240739 7240740
wordR,L
i
#7240735
P 1 2 3 4 5
7582971

Representing L and R Arrays
watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2
7240736 7240737 7240738 7240739 7240740
wordR,L
i
#7240735
P 1 2 3 4 5
7582971

Experiments
• NIST ’05-’06 Chinese English, train: Xinhua + UN
• Baseline: very fast CPU implementation mostly in C (Lopez 2007, 2008; Chahuneau et al. 2012)
• State-of-the art cdec MT system (Dyer et al. 2010)
• GPU: NVIDIA Tesla K20c (2496 cores, 5Gb memory)
• CPU: dual 2.90GHz Intel Xeon E5-2690

Phrase Extraction Speed
0
750
1500
2250
3000
CPU GPU (2000)
GPU (4000)
GPU (6000)
GPU (8000)
571514454400
5.4
20011794
16131356
14.2
Xinhua (27M words) Xinhua + UN (108M words)
wor
ds p
er se
cond

Translation Accuracy
• Without hierarchical phrases: 30.7
• With hierarchical phrases on CPU: 33.37
• With hierarchical phrases on GPU: 33.46

End-to-end Translation Speed
• With hierarchical phrases on CPU: 180.5s
• With hierarchical phrases on GPU: 98.9s

Conclusions• GPU pattern matching & extraction algorithms
• Fast adaptable MT with streaming models
• Low-memory footprint MT
• State of the art translation accuracy
• Language models on GPUs?
• Decoding on GPUs?