Language Model Algorithms - Carnegie Mellon...
Estimating | Querying | Decoding
Language Model Algorithms
Kenneth Heafield
Carnegie Mellon, University of Edinburgh
March 7, 2013
MT is Expensive
- “Since decoding is very time-intensive” [Jehl et al, 2012]
- “Based on the amount of memory we can afford” [Wuebker et al, 2012]
Much of the computational cost is due to the language model.
Desiderata
- More data
- Less CPU time
- Less RAM
- High Quality
- Search Accuracy
- BLEU
Language Model Algorithms
Estimating: Text → ARPA file with probabilities.
Querying: ARPA file → efficient data structure.
Decoding: Searching for high-scoring translations.
Sorting | Modified Kneser-Ney
Estimating: The Problem
SRILM uses too much memory ⇒ limit on data size. Also, annoying to compile.
IRSTLM does not estimate modified Kneser-Ney models. Also, segfaults.
Stupid Backoff [Brants et al, 2007]
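1. Count n-grams offline: $\mathrm{count}(w_1^n)$
2. Compute pseudo-probabilities at runtime:

$$
s(w_n \mid w_1^{n-1}) =
\begin{cases}
\dfrac{\mathrm{count}(w_1^n)}{\mathrm{count}(w_1^{n-1})} & \text{if } \mathrm{count}(w_1^n) > 0 \\
0.4\, s(w_n \mid w_2^{n-1}) & \text{if } \mathrm{count}(w_1^n) = 0
\end{cases}
$$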
NB: s does not sum to 1.
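A minimal Python sketch of the two steps may help make the recursion concrete. The function names and the toy sentence are illustrative. The base case (a unigram scored as its count divided by the total token count, and 0 for an unknown word) is an assumption taken from the usual description of Stupid Backoff rather than from the slide itself.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor from the slide


def count_ngrams(tokens, max_order):
    """Step 1: count every n-gram up to max_order."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens):
    """Step 2: score `word` after `context` (a tuple of preceding words)."""
    ngram = context + (word,)
    if counts[ngram] > 0:
        # Assumed base case: a unigram is scored against the corpus size.
        denominator = counts[context] if context else total_tokens
        return counts[ngram] / denominator
    if not context:
        return 0.0  # assumed score for a word that was never seen
    # Drop the earliest context word and pay the 0.4 penalty.
    return ALPHA * stupid_backoff(word, context[1:], counts, total_tokens)


tokens = "<s> Australia is one of the few".split()
counts = count_ngrams(tokens, max_order=3)
seen = stupid_backoff("one", ("Australia", "is"), counts, len(tokens))
backed_off = stupid_backoff("of", ("Australia", "is"), counts, len(tokens))
```

Here `seen` uses the trigram count directly, while `backed_off` falls through two levels and is therefore multiplied by 0.4 twice.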
Counting n-grams
<s> Australia is one of the few
5-gram                     Count
<s> Australia is one of    1
Australia is one of the    1
is one of the few          1
Hash table?
Runs out of RAM.
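The in-memory approach can be sketched as follows. It assumes whitespace tokenization, and the helper name `extract_ngrams` is illustrative; the point is only that every distinct 5-gram occupies a hash table entry, which is what exhausts RAM on large corpora.

```python
from collections import Counter


def extract_ngrams(tokens, order=5):
    """Return every n-gram of the given order as a tuple of words."""
    return [tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)]


tokens = "<s> Australia is one of the few".split()
# Counter is a hash table from n-gram to count, held entirely in memory.
counts = Counter(extract_ngrams(tokens))
```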
Spill to Disk When RAM Runs Out
[Diagram: Text → Hash Table (a new one each time RAM fills) → Sort → File; the sorted files → Merge Sort → File]
Split Data
[Diagram: Text is split between two Hash Tables; each → Sort → File; the two files → Merge Sort → File]
Split and Merge
[Diagram: Text is split between two Hash Tables; each → Sort → File; the two files → Merge Sort → File]
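The pipeline in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation behind the slides: n-grams are plain strings without tabs, `max_entries` stands in for "RAM is full", and the names `spill`, `read_run`, and `count_with_spills` are hypothetical.

```python
import heapq
import itertools
import os
import tempfile
from collections import Counter


def spill(table, directory):
    """Sort one in-memory hash table and write it to disk as a run."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for ngram in sorted(table):
            out.write(f"{ngram}\t{table[ngram]}\n")
    return path


def read_run(path):
    """Stream (ngram, count) pairs back from a sorted run."""
    with open(path, encoding="utf-8") as run:
        for line in run:
            ngram, count = line.rstrip("\n").split("\t")
            yield ngram, int(count)


def count_with_spills(ngrams, max_entries, directory):
    """Count n-grams, spilling to disk whenever the table gets too big."""
    runs, table = [], Counter()
    for ngram in ngrams:
        table[ngram] += 1
        if len(table) >= max_entries:  # stand-in for running out of RAM
            runs.append(spill(table, directory))
            table = Counter()
    if table:
        runs.append(spill(table, directory))
    # Merge the sorted runs; equal n-grams become adjacent, so sum them.
    merged = heapq.merge(*(read_run(path) for path in runs))
    for ngram, group in itertools.groupby(merged, key=lambda pair: pair[0]):
        yield ngram, sum(count for _, count in group)


with tempfile.TemporaryDirectory() as scratch:
    stream = ["a b", "b c", "a b", "c d", "a b", "b c"]
    totals = dict(count_with_spills(stream, max_entries=2, directory=scratch))
```

With `max_entries=2` the toy stream is spilled three times, and the merge step adds the partial counts for each n-gram back together.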
Fanout: Merging More Than 2 Files
$\log_{\text{fanout}} |\text{data}|$ passes, each of which reads and writes |data|.
Small fanout ⇒ more passes. Large fanout ⇒ more disk seeks.
Compromise: 64 MB buffer per file; fanout = (RAM / 64 MB) − 1.
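The arithmetic behind the compromise can be checked with a small sketch. The helper names are illustrative, and counting passes over the number of sorted files (rather than the raw data size) is an assumption about how the logarithm is meant to be applied.

```python
def fanout_for(ram_bytes, buffer_bytes=64 * 2**20):
    """fanout = (RAM / 64 MB) - 1: one buffer is reserved for output."""
    return ram_bytes // buffer_bytes - 1


def merge_passes(num_files, fanout):
    """Passes needed when each pass merges up to `fanout` files into one."""
    if fanout < 2:
        raise ValueError("merging needs a fanout of at least 2")
    passes = 0
    while num_files > 1:
        num_files = -(-num_files // fanout)  # ceiling division
        passes += 1
    return passes


fanout = fanout_for(2**30)           # 1 GiB of RAM
passes = merge_passes(1000, fanout)  # 1000 sorted files on disk
```

With 1 GiB of RAM the fanout is 15, so 1000 files shrink to 67, then 5, then 1: three passes, against ten passes for a two-way merge.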
Optimizing Merge Sort
- Laziness: do the last merge as the file is being read
- Sharding: partition the data by key
- Combine counts during merge passes
Now we can estimate Stupid Backoff models.
What about modified Kneser-Ney?
Modified Kneser-Ney
[Diagram: Text → Extract 5-Grams → Sort in Suffix Order → Adjust Counts → Sort in Context Order → Normalize → Sort in Suffix Order → Interpolate → ARPA file]
Sorting Orders
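Suffix Order (the header row gives each word position's sort priority)
5 4 3 2 1
A A A A A
C A A A A
A Y A B A
A Y Z B A
A A A A Y
C A B A Z

Context Order (the header row gives each word position's sort priority)
4 3 2 1 5
A A A A A
A A A A Y
C A A A A
C A B A Z
A Y A B A
A Y Z B A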
Each step is a streaming algorithm in suffix or context order.
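Both orders can be expressed as sort keys. The sketch below reads the priority rows of the tables as follows: suffix order compares the last word first and works backwards; context order compares the context (the first four words) from its last word backwards, then the final word. The function names are illustrative.

```python
def suffix_key(ngram):
    """Compare the last word first, then the one before it, and so on."""
    return tuple(reversed(ngram))


def context_key(ngram):
    """Compare the context back to front, then the predicted word."""
    return tuple(reversed(ngram[:-1])) + (ngram[-1],)


rows = ["A A A A A", "C A A A A", "A Y A B A",
        "A Y Z B A", "A A A A Y", "C A B A Z"]
ngrams = [tuple(row.split()) for row in rows]

suffix_order = [" ".join(g) for g in sorted(ngrams, key=suffix_key)]
context_order = [" ".join(g) for g in sorted(ngrams, key=context_key)]
```

Sorting the six 5-grams with these keys reproduces the row order of the two tables.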
Adjust Counts
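$$
\mathrm{adjusted}(w_1^n) =
\begin{cases}
\mathrm{count}(w_1^n) & \text{if } n = 5 \text{ or } w_1 = \texttt{<s>} \\
|\{v : \mathrm{count}(v\, w_1^n) > 0\}| & \text{otherwise}
\end{cases}
$$

Suffix order makes $v\, w_1^n$ consecutive.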
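As a rough in-memory illustration of the definition (the slides describe a streaming version over suffix-sorted data), the sketch below uses a maximum order of 3 instead of 5 to keep the example small. The helper names and the toy sentences are assumptions, and no end-of-sentence marker is added.

```python
from collections import Counter, defaultdict


def raw_counts(sentences, order):
    """Count all n-grams up to `order`, with <s> prepended to each sentence."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for n in range(1, order + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def adjusted_counts(counts, order):
    """Keep raw counts for the highest order and for n-grams starting with <s>;
    otherwise count the distinct words v that precede the n-gram."""
    left_extensions = defaultdict(set)
    for ngram in counts:
        if len(ngram) > 1:
            left_extensions[ngram[1:]].add(ngram[0])
    adjusted = {}
    for ngram, count in counts.items():
        if len(ngram) == order or ngram[0] == "<s>":
            adjusted[ngram] = count
        else:
            adjusted[ngram] = len(left_extensions[ngram])
    return adjusted


counts = raw_counts(["a b c", "d b c", "a b d"], order=3)
adjusted = adjusted_counts(counts, order=3)
```

In this toy corpus "c" occurs twice but only ever follows "b", so its adjusted count is 1, while "b c" follows both "a" and "d" and gets 2.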
Normalize
$$
\mathrm{normalized}(w_1^n) = \frac{\mathrm{adjusted}(w_1^n) - \mathrm{discount}(\mathrm{adjusted}(w_1^n))}{\sum_v \mathrm{adjusted}(w_1^{n-1} v)}
$$

Context order makes $w_1^{n-1} v$ consecutive.
The denominator is computed by a thread that reads ahead.
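The grouping that context order provides can be sketched as below. The constant `flat_discount` is purely hypothetical (the slides do not give discount values), all n-grams are assumed to have the same order, and collecting each group into a list stands in for the read-ahead thread.

```python
from itertools import groupby


def context_key(ngram):
    """Context order: context back to front, then the predicted word."""
    return tuple(reversed(ngram[:-1])) + (ngram[-1],)


def normalize(adjusted, discount):
    """Apply the normalization formula one context at a time."""
    normalized = {}
    ordered = sorted(adjusted, key=context_key)
    for _, group in groupby(ordered, key=lambda ngram: ngram[:-1]):
        group = list(group)  # read ahead over every n-gram sharing the context
        denominator = sum(adjusted[ngram] for ngram in group)
        for ngram in group:
            count = adjusted[ngram]
            normalized[ngram] = (count - discount(count)) / denominator
    return normalized


def flat_discount(count):
    """Hypothetical placeholder discount, not a value from the slides."""
    return 0.5


example = normalize({("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2}, flat_discount)
```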
Interpolate
$$
p(w_n \mid w_1^{n-1}) = \mathrm{normalized}(w_1^n) + \mathrm{backoff}(w_1^{n-1})\, p(w_n \mid w_2^{n-1})
$$

Suffix order means $p(w_n \mid w_2^{n-1})$ and $p(w_n \mid w_1^{n-1})$ are close.
Estimation Results
lmplz: 132 billion tokens, 4.3 days, one machine (140 GB RAM), lossless
Google 2007: 31 billion tokens, 2 days, 400 machines, pruned singleton words
Lookup | Optimizing
Querying: The Problem
The LM is queried ≈2.7 million times per sentence translated. ⇒ Represent the LM to make queries fast.
The University of Edinburgh owns a machine with 1 TB RAM. ⇒ Find more training data!
Example Language Model

Unigrams
Words   log p   log b
<s>     -∞      -2.0
iran    -4.1    -0.8
is      -2.5    -1.4
one     -3.3    -0.9
of      -2.5    -1.1

Bigrams
Words      log p   log b
<s> iran   -3.3    -1.2
iran is    -1.7    -0.4
is one     -2.0    -0.9
one of     -1.4    -0.6

Trigrams
Words         log p
<s> iran is   -1.1
iran is one   -2.0
is one of     -0.3
Example Queries (using the example language model above)

Query: <s> iran is
  log p(is | <s> iran)   = -1.1

Query: iran is of
  log p(of)                -2.5
  log backoff(is)          -1.4
  log backoff(iran is)   + -0.4
  log p(of | iran is)    = -4.3
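The two queries can be reproduced with a short sketch over the example model. The dictionaries hold the values from the tables; the function name `log_prob` is illustrative, and treating a missing backoff entry as 0.0 (log of 1) is the usual ARPA convention rather than something the slide states.

```python
import math

LOG_PROB = {
    ("<s>",): -math.inf, ("iran",): -4.1, ("is",): -2.5,
    ("one",): -3.3, ("of",): -2.5,
    ("<s>", "iran"): -3.3, ("iran", "is"): -1.7,
    ("is", "one"): -2.0, ("one", "of"): -1.4,
    ("<s>", "iran", "is"): -1.1, ("iran", "is", "one"): -2.0,
    ("is", "one", "of"): -0.3,
}
LOG_BACKOFF = {
    ("<s>",): -2.0, ("iran",): -0.8, ("is",): -1.4,
    ("one",): -0.9, ("of",): -1.1,
    ("<s>", "iran"): -1.2, ("iran", "is"): -0.4,
    ("is", "one"): -0.9, ("one", "of"): -0.6,
}


def log_prob(context, word):
    """Return log p(word | context), backing off to shorter contexts."""
    ngram = context + (word,)
    if ngram in LOG_PROB:
        return LOG_PROB[ngram]
    if not context:
        raise KeyError(f"unknown word: {word}")  # not covered by the slides
    # Charge the backoff of the full context, then drop its first word.
    return LOG_BACKOFF.get(context, 0.0) + log_prob(context[1:], word)


found = log_prob(("<s>", "iran"), "is")
backed_off = log_prob(("iran", "is"), "of")
```

The first query is found directly as a trigram; the second misses the trigram and the bigram "is of", so it pays both backoff penalties before reaching the unigram.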
Lookup I: Giant Hash Table
Hash every n-gram to a 64-bit integer. Ignore collisions.
Store n-grams in custom linear probing hash tables.
Fastest.
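A toy version of the idea, as a sketch only: the table stores the 64-bit hash in place of the n-gram itself, so two n-grams with equal hashes would be confused (the "ignore collisions" part). The hash function, class name, and capacity are illustrative choices, and the table must stay larger than the number of entries or probing would never terminate.

```python
import hashlib


def hash64(ngram):
    """Map an n-gram (tuple of words) to a 64-bit integer."""
    digest = hashlib.blake2b(" ".join(ngram).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ProbingTable:
    def __init__(self, capacity):
        self.keys = [None] * capacity    # 64-bit hashes, not the words
        self.values = [None] * capacity

    def insert(self, ngram, value):
        key = hash64(ngram)
        slot = key % len(self.keys)
        while self.keys[slot] is not None and self.keys[slot] != key:
            slot = (slot + 1) % len(self.keys)  # linear probing
        self.keys[slot] = key
        self.values[slot] = value

    def find(self, ngram):
        key = hash64(ngram)
        slot = key % len(self.keys)
        while self.keys[slot] is not None:
            if self.keys[slot] == key:
                return self.values[slot]
            slot = (slot + 1) % len(self.keys)
        return None  # reached an empty slot: n-gram was not stored


table = ProbingTable(capacity=8)
table.insert(("iran", "is"), (-1.7, -0.4))
table.insert(("is", "one"), (-2.0, -0.9))
table.insert(("one", "of"), (-1.4, -0.6))
```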
Lookup II: Minimal Perfect Hash [Talbot and Brants, 2008]
Function from seen n-grams to integers:
Perfect: Each seen n-gram has a unique integer.
Minimal: Integers are in [0, |n-grams|).
Use the integer to index an array of probability and backoff.
False Positives
Problem: Unseen n-grams collide with seen n-grams.
Solution: Store an f-bit signature ⇒ $2^{-f}$ false positives.
Lowest memory but has false positives.
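The signature check can be illustrated without building a real minimal perfect hash. In the sketch below the slot function is only emulated (seen n-grams get their rank, unseen n-grams land on an arbitrary slot), which is an assumption made to keep the example short; a real structure would not store the keys at all. All names are hypothetical.

```python
import hashlib


def hash64(ngram, salt):
    data = (salt + " ".join(ngram)).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class SignatureTable:
    def __init__(self, entries, f_bits):
        self.mask = (1 << f_bits) - 1
        ngrams = sorted(entries)
        # Emulation only: a real minimal perfect hash needs no key dictionary.
        self._rank = {ngram: i for i, ngram in enumerate(ngrams)}
        self.signatures = [hash64(g, "sig") & self.mask for g in ngrams]
        self.values = [entries[g] for g in ngrams]

    def _slot(self, ngram):
        if ngram in self._rank:
            return self._rank[ngram]  # perfect and minimal on seen n-grams
        return hash64(ngram, "slot") % len(self.values)  # arbitrary for unseen

    def find(self, ngram):
        slot = self._slot(ngram)
        # An unseen n-gram passes this check with probability about 2**-f.
        if self.signatures[slot] == hash64(ngram, "sig") & self.mask:
            return self.values[slot]
        return None


entries = {("iran", "is"): -1.7, ("is", "one"): -2.0, ("one", "of"): -1.4}
table = SignatureTable(entries, f_bits=8)
false_positive_rate = 2.0 ** -8
```

With 8-bit signatures roughly one unseen n-gram in 256 would be answered with some seen n-gram's values.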
Lookup III: Reverse Trie
[Diagram: a trie whose nodes are the words <s>, Australia, are, is, of, and one, with n-grams stored in reverse word order]
Most popular format. Exact, middle memory usage.
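A nested-dictionary sketch shows why storing n-grams last word first suits backoff queries: one walk from the root visits the unigram, then the bigram, then the trigram, and stops at the longest stored match. The class and method names are illustrative, and a production trie would use sorted arrays instead of dictionaries.

```python
class ReverseTrie:
    def __init__(self):
        self.root = {"children": {}, "value": None}

    def insert(self, ngram, value):
        node = self.root
        for word in reversed(ngram):  # last word is nearest the root
            node = node["children"].setdefault(
                word, {"children": {}, "value": None})
        node["value"] = value

    def longest_match(self, ngram):
        """Return (length, value) for the longest stored suffix of ngram."""
        node = self.root
        best = (0, None)
        for depth, word in enumerate(reversed(ngram), start=1):
            node = node["children"].get(word)
            if node is None:
                break
            if node["value"] is not None:
                best = (depth, node["value"])
        return best


trie = ReverseTrie()
trie.insert(("of",), -2.5)
trie.insert(("one", "of"), -1.4)
trie.insert(("is", "one", "of"), -0.3)
```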
Optimizing Storage
Bit-Level Packing
Use only as many bits as needed.
Chop Bits [Raj and Whittaker, 2003]
In a sorted array, encode bits by the offset where they roll over.
Quantization [Whittaker and Raj, 2001] [IRSTLM]
Cluster floats into $2^q$ bins, then store q bits per float.
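One simple way to picture quantization is equal-population binning: sort the values, cut them into at most $2^q$ groups, and replace each value by the index of the nearest group mean. This binning scheme is an assumption chosen for brevity and need not match the cited work; the function name is illustrative and the input is assumed to be non-empty.

```python
def quantize(values, q):
    """Return (centers, codes): at most 2**q centers and one code per value."""
    bins = 1 << q
    ordered = sorted(values)
    size = -(-len(ordered) // bins)  # ceiling division: values per bin
    centers = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        centers.append(sum(chunk) / len(chunk))
    # Each value is stored as the q-bit index of its nearest center.
    codes = [min(range(len(centers)), key=lambda i: abs(centers[i] - v))
             for v in values]
    return centers, codes


log_probs = [-0.3, -1.1, -1.4, -1.7, -2.0, -2.0, -2.5, -3.3]
centers, codes = quantize(log_probs, q=2)
decoded = [centers[code] for code in codes]
```

Each entry of `codes` fits in 2 bits, and `decoded` shows the approximate values a query would see after quantization.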
Storing Less
Filtering
Remove n-grams that will not be queried during decoding.
Pruning
Remove low-count n-grams.
Encode Less Than Probability and Backoff
- Stupid backoff has one value: count.
- Collapse probability and backoff (requires more CPU time).
Moses Benchmarks: Single Threaded
[Plot: Time (h), 0 to 4, against Memory (GB), 0 to 12. Legend: Trie, Probing, Chop, SRI, IRST, Rand Backoff ($2^{-8}$ false positives).]
Querying Takeaways
- Optimizing the LM optimizes the decoder.
- Approximations can have negligible impact on quality: 27.29 BLEU → 27.09 BLEU by quantizing to 4 bits.
- There is no single best data structure.
Decoding
March 5: Cube Pruning
Can we do better?
Recall: Cube Pruning
- Each constituent is visited in bottom-up (topological) order.
- Generate a fixed number of hypotheses per constituent.
- Estimated probabilities for words on the edge.
A Beam
X:X
(⊣ ends the left state; ⊢ begins the right state)
countries that ⊣ maintain diplomatic relations ⊢ with North Korea .
countries that have ⊣ an embassy in ⊢ DPR Korea .
country ⊣ that maintains some diplomatic ties ⊢ in North Korea .
nations which has ⊣ some diplomatic ties ⊢ with DPR Korea .
country ⊣ that maintains some diplomatic ties ⊢ with DPR Korea .
⋄ denotes words omitted by state.
Rule “is a X:X”
Cube pruning tests variants of the same bad idea.
A Beam With State
X:X
Left State ⊣ ⋄ ⊢ Right State                  Score
(countries that ⊣ ⋄ ⊢ with North Korea .)     -2
(nations which has ⊣ ⋄ ⊢ with DPR Korea .)    -4
(countries that have ⊣ ⋄ ⊢ DPR Korea .)       -5
(country ⊣ ⋄ ⊢ in North Korea .)              -8
(country ⊣ ⋄ ⊢ with DPR Korea .)              -9
⋄ denotes words omitted by state.
Rule “is a X:X”
Cube pruning tests variants of the same bad idea.
High Level Idea of Incremental Search
Group hypotheses by common outer words.
Score outer words first, see how well they do.
Make a tree
(ε ⋄ ε)
  (country ⊣ ⋄ Korea .)
    (country ⊣ ⋄ ⊢ with DPR Korea .)
    (country ⊣ ⋄ ⊢ in North Korea .)
  (nations which has ⊣ ⋄ ⊢ with DPR Korea .)
  (countries that ⋄ Korea .)
    (countries that have ⊣ ⋄ ⊢ DPR Korea .)
    (countries that ⊣ ⋄ ⊢ with North Korea .)
[1]
Use the Tree
Try “in (country ⊣ ⋄ ⊢ Korea .)” once and see if you like it.
Pop top candidate off the priority queue, expand children, push.
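The pop, expand, push loop can be sketched with a binary heap over the tree of hypotheses. This is a simplification under stated assumptions: leaves carry the beam scores from the earlier table, an internal node is scored by its best descendant, and no rule or language model is actually applied. The labels abbreviate the hypotheses, and all names are hypothetical.

```python
import heapq


class Node:
    def __init__(self, label, score=None, children=()):
        self.label = label
        self.children = list(children)
        # Assumption: a group is as promising as its best member.
        self.score = score if score is not None else max(
            child.score for child in self.children)


def best_first(root, limit):
    """Pop the top candidate, expand its children, push them back."""
    queue = [(-root.score, 0, root)]  # heapq is a min-heap, so negate scores
    pushed = 1                        # tie-breaker so nodes are never compared
    results = []
    while queue and len(results) < limit:
        _, _, node = heapq.heappop(queue)
        if not node.children:
            results.append((node.label, node.score))
            continue
        for child in node.children:
            heapq.heappush(queue, (-child.score, pushed, child))
            pushed += 1
    return results


tree = Node("root", children=[
    Node("country ... Korea .", children=[
        Node("country ... with DPR Korea .", -9),
        Node("country ... in North Korea .", -8),
    ]),
    Node("nations which has ... with DPR Korea .", -4),
    Node("countries that ... Korea .", children=[
        Node("countries that have ... DPR Korea .", -5),
        Node("countries that ... with North Korea .", -2),
    ]),
])
top = best_first(tree, limit=3)
```

Only the groups that look promising are expanded: with `limit=3` the "country ... Korea ." subtree is never opened.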
Results