Jhu mt lecture_2014-04-15

of 165 /165
Representing Huge Translation Models

Embed Size (px)

description

JHU MT class: Representing huge translation models

Transcript of Jhu mt lecture_2014-04-15

Page 1: Jhu mt lecture_2014-04-15

Representing Huge Translation Models

Page 2: Jhu mt lecture_2014-04-15

Statistical Machine Translation

parallel text + alignment

Page 3: Jhu mt lecture_2014-04-15

Statistical Machine Translation

extractrules

parallel text + alignment

Page 4: Jhu mt lecture_2014-04-15

Statistical Machine Translation

extractrules

scorerules

parallel text + alignment

Page 5: Jhu mt lecture_2014-04-15

Statistical Machine Translation

extractrules

scorerules

联合国 安全 理事会 的 五个 常任 理事 国都

load rules into memory

decoder

parallel text + alignment

Page 6: Jhu mt lecture_2014-04-15

Statistical Machine Translation

extractrules

scorerules

联合国 安全 理事会 的 五个 常任 理事 国都

load rules into memory

decoder

parallel text + alignment

number of rulesdepends on

corpus size...

Page 7: Jhu mt lecture_2014-04-15

Statistical Machine Translation

extractrules

scorerules

联合国 安全 理事会 的 五个 常任 理事 国都

load rules into memory

decoder

parallel text + alignment

... and modelcomplexity

Page 8: Jhu mt lecture_2014-04-15

Statistical Machine Translation

parallel text + alignment

extractrules

scorerules

联合国 安全 理事会 的 五个 常任 理事 国都

load filtered rules

into memory

decoding algorithm

filter rules for test set

Page 9: Jhu mt lecture_2014-04-15

Baseline Translation Model

• Hierarchical Phrase-based translation (Chiang 2007)

• 1M parallel sentences (27M words)

• GIZA++ alignments (Och & Ney 2003, Koehn et al. 2003)

• alignments are dense

• Heuristics used to restrict number of extracted rules

• 67M rules, 6.1Gb of data

• cf. 225M (Zens & Ney 2007), 55M (DeNeefe et al. 2007)

Page 10: Jhu mt lecture_2014-04-15

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Page 11: Jhu mt lecture_2014-04-15

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Page 12: Jhu mt lecture_2014-04-15

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Page 13: Jhu mt lecture_2014-04-15

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Page 14: Jhu mt lecture_2014-04-15

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Page 15: Jhu mt lecture_2014-04-15

Some Possible Improvements

• Rule extraction time: 77 CPU days

• does not include sorting or scoring!

• Rules counted: 20 billion

• 2 orders of magnitude larger than state of the art

• Estimated unique rules: 6.6 billion

• Estimated extract file size: 917Gb

• Estimated phrase table size: 600Gb

Page 16: Jhu mt lecture_2014-04-15

The Problem

• Current models are bounded by resource limitations.

• We’re already pushing the edge of what’s possible.

• Parallel data aren’t getting any smaller.

• Models aren’t getting any less complex.

Page 17: Jhu mt lecture_2014-04-15

The Solution

• Translation by pattern matching.

• Novel pattern matching algorithms.

• Exploit ideas developed in bioinformatics, IR

• Support for tera-scale translation models.

Page 18: Jhu mt lecture_2014-04-15

Idea: Translation by Pattern Matching(Callison-Burch et al. 05, Zhang & Vogel 05)

联合国 安全 理事会 的 五个 常任 理事 国都

decodingalgorithm

patternmatchingalgorithm

parallel text + alignment in memory

sentence-specific

rulesextract andscore

Page 19: Jhu mt lecture_2014-04-15

it persuades him and it disheartens him

Exact Pattern MatchingInput Pattern

Page 20: Jhu mt lecture_2014-04-15

it persuades him and it disheartens him

Exact Pattern MatchingInput Pattern=Query Pattern

Page 21: Jhu mt lecture_2014-04-15

it persuades him and it disheartens him

Pattern Matching for Phrase-Based MTInput Pattern

Page 22: Jhu mt lecture_2014-04-15

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartenshim and it disheartens him

Pattern Matching for Phrase-Based MTit persuades him and it disheartens himInput Pattern

Query Patterns

Page 23: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Page 24: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

it mars him , it sets him on and it takes him off . #4

Text T

Suffix 4

Page 25: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

it makes him and it mars him , it sets him on and it takes him ...makes him and it mars him , it sets him on and it takes him off . #him and it mars him , it sets him on and it takes him off . #and it mars him , it sets him on and it takes him off . #it mars him , it sets him on and it takes him off . #mars him , it sets him on and it takes him off . #him , it sets him on and it takes him off . #, it sets him on and it takes him off . #

01234567

...

Page 26: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

31221510604

and it mars him , it sets him on and it takes him off . #and it takes him off . #him and it mars him , it sets him on and it takes him off . #him off . #him on and it takes him off . #him , it sets him on and it takes him off . #it makes him and it mars him , it sets him on and it takes him ...it mars him , it sets him on and it takes him off . #...

Page 27: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

and it mars him , it sets him on and it takes him off . #and it takes him off . #him and it mars him , it sets him on and it takes him off . #him off . #him on and it takes him off . #him , it sets him on and it takes him off . #it makes him and it mars him , it sets him on and it takes him ...it mars him , it sets him on and it takes him off . #...

31221510604

...

Page 28: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Text T

Suffix Array SA

Page 29: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Page 30: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Page 31: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Page 32: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Page 33: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

O(|w| log |T |)

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Page 34: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it (Manber & Myers, 93)

O(|w| log |T |)

O(|w| + log |T |)

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Page 35: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it (Manber & Myers, 93)

O(|w| log |T |)

O(|w| + log |T |)

O(|w|) (Abouelhoda et al., 04)

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Page 36: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it (Manber & Myers, 93)

O(|w| log |T |)

O(|w| + log |T |)

O(|w|) (Abouelhoda et al., 04)

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

on baseline model: 0.009 seconds/sentence

(not including extraction/scoring)

Page 37: Jhu mt lecture_2014-04-15

Problem: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X himSource Phrase

Input

Page 38: Jhu mt lecture_2014-04-15

Hierarchical Phrases: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X himSource Phrase

Input

Page 39: Jhu mt lecture_2014-04-15

Hierarchical Phrases: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X himSource Phrase

Input

Page 40: Jhu mt lecture_2014-04-15

Hierarchical Phrases: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X himSource Phrase

Input

Page 41: Jhu mt lecture_2014-04-15

Hierarchical Phrases: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X and X himSource Phrase

Input

Page 42: Jhu mt lecture_2014-04-15

Given an input sentence, efficiently find all hierarchical phrase-based translation rules for that

sentence in the training corpus.

Problem Statement

Page 43: Jhu mt lecture_2014-04-15

it persuades him and it disheartens him

Pattern Matching for Hierachical PBMTInput Pattern

Page 44: Jhu mt lecture_2014-04-15

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartenshim and it disheartens him

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern

Query Patterns

Page 45: Jhu mt lecture_2014-04-15

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern

Query Patternsit X and

it X itit X disheartens

it X himpersuades X it

persuades X disheartenspersuades X himit persuades X it

it persuades X disheartensit persuades X him

it X and itit X it disheartens

it X disheartens himit X and X him

persuades him X disheartenspersuades him X him

persuades X it disheartenspersuades X disheartens him

him and X himhim X disheartens him

it persuades him X disheartensit persuades him X him

it persuades X it disheartensit persuades X disheartens him

Page 46: Jhu mt lecture_2014-04-15

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern

Query Patternsit X and it disheartensit X it disheartens him

persuades him and X himpersuades him X disheartens him

persuades X it disheartens himit persuades him and X him

it persuades him X disheartens himit persuades X it disheartens him

it X and it disheartens him

Page 47: Jhu mt lecture_2014-04-15

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern

Query Patternsit X and it disheartensit X it disheartens him

persuades him and X himpersuades him X disheartens him

persuades X it disheartens himit persuades him and X him

it persuades him X disheartens himit persuades X it disheartens him

it X and it disheartens him

This is a variant of approximate pattern matching (Navarro ‘01)

Page 48: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps31221510604813

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...1

him X it

Query pattern

...

!

Page 49: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

him X it

Query pattern !

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Page 50: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

him X it

Query pattern !

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Page 51: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

him X it

Query pattern

himit

!

Subpatterns wi

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Page 52: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

him X it

Query pattern

himit

!

Subpatterns wi

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Page 53: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

him X it

Query pattern

himit

!

Subpatterns wi

ni Occurrences

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Page 54: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

215106

04813

Page 55: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

215106

04813

Page 56: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

215106

04813

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

Page 57: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

215106

04813

RILMS (Rahman et al., 06)

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

Page 58: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

215106

04813

RILMS (Rahman et al., 06)

O(!

i

ni)linear in number of occurrences of subpatterns:

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

Page 59: Jhu mt lecture_2014-04-15

221 seconds

Baseline Timing Result

per sentence

compare: 0.009 seconds per sentencefor contiguous phrases

Page 60: Jhu mt lecture_2014-04-15

137 5 27

!

!=w1X...XwI

I!

i=1

(|wi| + log |T | + ni)

!

w

(|w| + log |T |)

2825 3 5 27 82069

contiguous

discontiguous

Complexity Analysis

Page 61: Jhu mt lecture_2014-04-15

137 5 27

!

!=w1X...XwI

I!

i=1

(|wi| + log |T | + ni)

!

w

(|w| + log |T |)

2825 3 5 27 82069

contiguous

discontiguous

Complexity Analysis

Page 62: Jhu mt lecture_2014-04-15

Exploiting Redundancyit persuades him and it disheartens himInput Pattern

Query Patternsit X and

it X itit X disheartens

it X himpersuades X it

persuades X disheartenspersuades X himit persuades X it

it persuades X disheartensit persuades X him

it X and itit X it disheartens

it X disheartens himit X and X him

persuades him X disheartenspersuades him X him

persuades X it disheartenspersuades X disheartens him

him and X himhim X disheartens him

it persuades him X disheartensit persuades him X him

it persuades X it disheartensit persuades X disheartens him

Page 63: Jhu mt lecture_2014-04-15

Exploiting Redundancyit persuades him and it disheartens himInput Pattern

Query Patternsit X and

it X itit X disheartens

it X himpersuades X it

persuades X disheartenspersuades X himit persuades X it

it persuades X disheartensit persuades X him

it X and itit X it disheartens

it X disheartens himit X and X him

persuades him X disheartenspersuades him X him

persuades X it disheartenspersuades X disheartens him

him and X himhim X disheartens him

it persuades him X disheartensit persuades him X him

it persuades X it disheartensit persuades X disheartens him

Page 64: Jhu mt lecture_2014-04-15

Exploiting Redundancy

it persuades X disheartens himQuery Pattern

Page 65: Jhu mt lecture_2014-04-15

Exploiting Redundancy

it persuades X disheartens himQuery Patternit persuades X disheartensMaximal Prefix

(Zhang & Vogel 2005)

Page 66: Jhu mt lecture_2014-04-15

Exploiting Redundancy

it persuades X disheartens himQuery Patternit persuades X disheartens

persuades X disheartens himMaximal PrefixMaximal Suffix

Page 67: Jhu mt lecture_2014-04-15

Prefix Tree with Suffix Links

it

persuadeshim

Xhim

him

persuades

X

him

him

Page 68: Jhu mt lecture_2014-04-15

221

Baseline

seconds/sentence

Timing Results

Page 69: Jhu mt lecture_2014-04-15

177

221

Baseline PrefixTree

seconds/sentence

Timing Results

Page 70: Jhu mt lecture_2014-04-15

137 5 27

!

!=w1X...XwI

I!

i=1

(|wi| + log |T | + ni)

!

w

(|w| + log |T |)

2825 3 5 27 82069

contiguous

discontiguous

Complexity Analysis

Page 71: Jhu mt lecture_2014-04-15

137 5 27

!

!=w1X...XwI

I!

i=1

(|wi| + log |T | + ni)

!

w

(|w| + log |T |)

2825 3 5 27 82069

contiguous

discontiguous

Complexity Analysis

Page 72: Jhu mt lecture_2014-04-15

computations (ranked by time)

cum

ulat

ive

time

(s)

Empirical Analysis

Page 73: Jhu mt lecture_2014-04-15

Distribution of Patterns in Training DataFr

eque

ncy

Pattern types (in descending order of frequency)

Page 74: Jhu mt lecture_2014-04-15

Distribution of Patterns in Training DataFr

eque

ncy

Pattern types (in descending order of frequency)

Page 75: Jhu mt lecture_2014-04-15

Analysis of Problem

• The expensive computations involve at least one frequent subpattern. There are two cases.

• A frequent pattern paired with an infrequent pattern

• Two frequent patterns paired with each other

Page 76: Jhu mt lecture_2014-04-15

Frequent × Infrequent Subpatterns

Page 77: Jhu mt lecture_2014-04-15

Frequent × Infrequent Subpatterns

Page 78: Jhu mt lecture_2014-04-15

Frequent × Infrequent Subpatterns

Page 79: Jhu mt lecture_2014-04-15

Frequent × Infrequent Subpatterns

Page 80: Jhu mt lecture_2014-04-15

Double Binary SearchBaeza-Yates, 04

Page 81: Jhu mt lecture_2014-04-15

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Page 82: Jhu mt lecture_2014-04-15

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Page 83: Jhu mt lecture_2014-04-15

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Page 84: Jhu mt lecture_2014-04-15

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Page 85: Jhu mt lecture_2014-04-15

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Page 86: Jhu mt lecture_2014-04-15

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Page 87: Jhu mt lecture_2014-04-15

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

complexity: |Q| log |D|Upper bound

Page 88: Jhu mt lecture_2014-04-15

Obtaining Sorted Sets

Page 89: Jhu mt lecture_2014-04-15

Obtaining Sorted Sets

Sort via Stratified Tree(van Emde Boas et al. 1977)

Page 90: Jhu mt lecture_2014-04-15

Obtaining Sorted Sets

Problem: complexity increases to O(|Q| log |D| + (|Q| + |D|) log log |T |)

Sort via Stratified Tree(van Emde Boas et al. 1977)

Page 91: Jhu mt lecture_2014-04-15

Obtaining Sorted Sets

Problem: complexity increases to

Solution: cache sorted set in prefix

tree

O(|Q| log |D| + (|Q| + |D|) log log |T |)

Sort via Stratified Tree(van Emde Boas et al. 1977)

Page 92: Jhu mt lecture_2014-04-15

177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

Page 93: Jhu mt lecture_2014-04-15

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

Page 94: Jhu mt lecture_2014-04-15

Obtaining Sorted Sets

Sort via Stratified Tree

Problem: sort complexityis still very high for very

frequent patterns

Page 95: Jhu mt lecture_2014-04-15

Obtaining Sorted Sets

Solution: precompute theinverted index for 1000 most

frequent contiguous patterns

Page 96: Jhu mt lecture_2014-04-15

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

Page 97: Jhu mt lecture_2014-04-15

44

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

+ inverted indices

Page 98: Jhu mt lecture_2014-04-15

Frequent × Frequent Subpatterns

Page 99: Jhu mt lecture_2014-04-15

Frequent × Frequent Subpatterns

Problem:There is no clever algorithm to

solve this problem

Page 100: Jhu mt lecture_2014-04-15

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text

Page 101: Jhu mt lecture_2014-04-15

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #

Most Frequent Patterns

it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text

it (4)him (4)

Precomputed Pattern Matchesit X him him X it

it X it him X him

Page 102: Jhu mt lecture_2014-04-15

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #

Most Frequent Patterns

it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text

it (4)him (4)

Precomputed Pattern Matchesit X him him X it

it X it him X him

(0, 2)

(0, 4)

(0, 6)

(0, 8)

(2, 4)

(2, 6)

(2, 8)

(2, 10)

(4, 6)

(4, 8)

(4, 10)

(4, 13)

(6, 8)

(6, 10)

(6, 13)

(6, 15)

(8, 10)

(8, 13)

(8, 15)(10, 13)

(10, 15)

(13, 15)

Page 103: Jhu mt lecture_2014-04-15

44

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

+ inverted indices

Page 104: Jhu mt lecture_2014-04-15

1

44

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

+ inverted indices

+ precomp

Page 105: Jhu mt lecture_2014-04-15

Analysis of Fixed Memory Usage

• Source Text: |T|

• Suffix Array: |T|

• Alignments: |T|

• Target Text: |T|

• Total Cost: 4 |T|

• For 27M words: about 700M

• including indices for 1000 words: about 2.1 Gb

• for 100 words: 1.1Gb, increases time to 1.6 secs/sent

Page 106: Jhu mt lecture_2014-04-15

Longer Spans, Longer Phrases

1520253035

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

1520253035

1 2 3 4 5 6 7 8 9 10

Maximum Span Length

Maximum Phrase Length

BLEU

BLEU

Page 107: Jhu mt lecture_2014-04-15

The Tera-Scale Translation Model

• Task: NIST Chinese-English 2005

• Baseline Model: 30.7

• Tera-Scale Model: 32.6

• All modifications contribute to overall score

• With better language model and number translation:

• Baseline Model: 31.9

• Tera-Scale Model: 34.5

Page 108: Jhu mt lecture_2014-04-15

Open Questions

• Can we improve speed?

• Can we improve memory use? Compressed self-indexes?

• Uses for arbitrarily large translation models?

• Context-sensitive models (Chan et al. 2007, Carpuat & Wu 2007)

• Factored models (Koehn et al. 2007)

• Syntax-based model (DeNeefe et al. 2007)

• What other algorithms can we use from bioinformatics?

Page 109: Jhu mt lecture_2014-04-15

Compact Adaptable TMs on GPUs

• Technique: massively parallel string pattern matching algorithms for...

• Streaming MT (Levenberg et al. 2010)

• Interactive MT (Denkowski et al. 2014)

• Low-memory MT (Lopez, 2008)

• Online learning (Simianer et al. 2012)

• New in this talk:

• New translation models

• New algorithms

• State-of-the-art translation accuracy

Page 110: Jhu mt lecture_2014-04-15

Streaming Translation Models

• Align new training data.

• Append to existing training data.

• Extract new phrase pairs as needed.

(Levenberg et al. 2010)

Page 111: Jhu mt lecture_2014-04-15

Streaming Translation Models

• Align new training data → online EM. FAST!

• Append to existing training data. FAST!

• Extract new phrase pairs as needed. SLOW...

(Levenberg et al. 2010)

For each phrase pair J:For each occurrence of J:

Extract its translation E (if possible)For each phrase pair J,E:

Compute p(E|J)

Page 112: Jhu mt lecture_2014-04-15

Streaming Translation Models

• Align new training data → online EM. FAST!

• Append to existing training data. FAST!

• Extract new phrase pairs as needed. SLOW...

(Levenberg et al. 2010)

For each phrase pair J:For each occurrence of J:

Extract its translation E (if possible)For each phrase pair J,E:

Compute p(E|J)

Page 113: Jhu mt lecture_2014-04-15

Streaming Translation Models

• Align new training data → online EM. FAST!

• Append to existing training data. FAST!

• Extract new phrase pairs as needed. SLOW...

(Levenberg et al. 2010)

For each phrase pair J:For each occurrence of J:

Extract its translation E (if possible)For each phrase pair J,E:

Compute p(E|J)

... But easily parallelizable. GPUs to the rescue!

Page 114: Jhu mt lecture_2014-04-15

Streaming Translation Models(Levenberg et al. 2010)

For each phrase pair J:For each occurrence of J:

Extract its translation E (if possible)For each phrase pair J,E:

Compute p(E|J)

In parallel on GPU://find all

// extract all

Page 115: Jhu mt lecture_2014-04-15

GPUs

• Graphics Processing Unit

• Single Instruction, Multiple Data

• Limited Memory

Page 116: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Page 117: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

6, 10

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Page 118: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 3

0, 2

Page 119: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 3

0, 2

Parallel pass 1:Find longest substrings

simultaneously

Page 120: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 6

2, 3

2, 3

0, 2

0, 2

Page 121: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 6

2, 3

2, 3

0, 2

0, 2

Page 122: Jhu mt lecture_2014-04-15

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 6

2, 3

2, 3

0, 2

0, 2

Parallel pass 2:Find all substrings

simultaneously

Page 123: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

him _ it

Query pattern:

himit

Subpatterns:

Page 124: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

215106

04813

Page 125: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

215106

04813

Page 126: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

215106

04813

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

on CPU

Page 127: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

215106

04813

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

on CPU

Page 128: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps

215106

04813

(2, 4)(10, 13)

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

on CPU

Page 129: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps on GPU

215106

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Page 130: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps on GPU

215106

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Thread 1

Thread 4Thread 3

Thread 2

Page 131: Jhu mt lecture_2014-04-15

Pattern Matching with Gaps on GPU

215106

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Thread 1

Thread 4Thread 3

Thread 2(2,4)

(10,13)

Page 132: Jhu mt lecture_2014-04-15

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

Page 133: Jhu mt lecture_2014-04-15

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

akemasu / open

Page 134: Jhu mt lecture_2014-04-15

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

watashi wa / I

Page 135: Jhu mt lecture_2014-04-15

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

hako wo / the box

Page 136: Jhu mt lecture_2014-04-15

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

hako wo / open the box✘

Page 137: Jhu mt lecture_2014-04-15

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

hako wo akemasu / open the box

Page 138: Jhu mt lecture_2014-04-15

Hierarchical Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

_ akemasu / open _

Page 139: Jhu mt lecture_2014-04-15

Hierarchical Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

_ akemasu / open _

Page 140: Jhu mt lecture_2014-04-15

Hierarchical Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

_1 _2 akemasu/ _1 open _2

Page 141: Jhu mt lecture_2014-04-15

Hierarchical Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

watashi wa hako wo _ / I _ the box

Page 142: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

0

0

1 2 3

1

2

3

4

45

Page 143: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

watashi wa / ???

0

0

1 2 3

1

2

3

4

45

Page 144: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

watashi wa / ???

0

0

1 2 3

1

2

3

4

45

Page 145: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

watashi wa / ???

L,R

L’,R’

45

Page 146: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

watashi wa / ???

L,R

L’,R’

45

Page 147: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

watashi wa / I

L,R

L’,R’

45

Page 148: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Page 149: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Page 150: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Page 151: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Page 152: Jhu mt lecture_2014-04-15

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Compute i’,j’ simultaneously for all (sampled) occurrences

Page 153: Jhu mt lecture_2014-04-15

Representing L and R ArraysI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

0

0

1 2 3

1

2

3

4

45

Fiction!

Page 154: Jhu mt lecture_2014-04-15

Representing L and R ArraysI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

7240736

7240737

7240738

Reality!

7240739

7240740

72407417582971

7582972

7582973

7582974

7582975

Page 155: Jhu mt lecture_2014-04-15

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

Page 156: Jhu mt lecture_2014-04-15

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

Page 157: Jhu mt lecture_2014-04-15

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

P 1 2 3 4 5

Page 158: Jhu mt lecture_2014-04-15

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

P 1 2 3 4 5

7582971

Page 159: Jhu mt lecture_2014-04-15

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

P 1 2 3 4 5

7582971

Page 160: Jhu mt lecture_2014-04-15

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

P 1 2 3 4 5

7582971

Page 161: Jhu mt lecture_2014-04-15

Experiments

• NIST ’05-’06 Chinese English, train: Xinhua + UN

• Baseline: very fast CPU implementation mostly in C (Lopez 2007, 2008; Chahuneau et al. 2012)

• State-of-the art cdec MT system (Dyer et al. 2010)

• GPU: NVIDIA Tesla K20c (2496 cores, 5Gb memory)

• CPU: dual 2.90GHz Intel Xeon E5-2690

Page 162: Jhu mt lecture_2014-04-15

Phrase Extraction Speed

0

750

1500

2250

3000

CPU GPU (2000)

GPU (4000)

GPU (6000)

GPU (8000)

571514454400

5.4

20011794

16131356

14.2

Xinhua (27M words) Xinhua + UN (108M words)

wor

ds p

er se

cond

Page 163: Jhu mt lecture_2014-04-15

Translation Accuracy

• Without hierarchical phrases: 30.7

• With hierarchical phrases on CPU: 33.37

• With hierarchical phrases on GPU: 33.46

Page 164: Jhu mt lecture_2014-04-15

End-to-end Translation Speed

• With hierarchical phrases on CPU: 180.5s

• With hierarchical phrases on GPU: 98.9s

Page 165: Jhu mt lecture_2014-04-15

Conclusions• GPU pattern matching & extraction algorithms

• Fast adaptable MT with streaming models

• Low-memory footprint MT

• State of the art translation accuracy

• Language models on GPUs?

• Decoding on GPUs?