Transcript of: Algorithms for NLP, Lecture 2: Language Models I
(demo.clab.cs.cmu.edu/11711fa18/slides/lecture_2_language_models_1.pdf)
Language Modeling I
Algorithms for NLP
Taylor Berg-Kirkpatrick (CMU)
Slides: Dan Klein (UC Berkeley)
The Noisy-Channel Model
§ We want to predict a sentence given acoustics:
§ The noisy-channel approach:
Acoustic model: HMMs over word positions with mixtures of Gaussians as emissions
Language model: Distributions over sequences of words (sentences)
ASR Components

source P(w) → w → channel P(a|w) → a (observed) → decoder → w_best

argmax_w P(w|a) = argmax_w P(a|w) P(w)

Language Model: P(w)    Acoustic Model: P(a|w)
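The decode rule above can be sketched in a few lines. The acoustic scores are taken from the confusion list on the next slide; the language-model log-probabilities are invented for illustration.

```python
# Noisy-channel rescoring sketch: choose the candidate transcription w
# maximizing log P(a|w) + log P(w). Acoustic scores come from the slide's
# confusion list; the LM log-probabilities are made up for illustration.
candidates = {
    "the station signs are in deep in english": (-14732.0, -41.2),
    "the station signs are indeed in english": (-14757.0, -12.3),
}

def decode(cands):
    # argmax over combined channel (acoustic) + source (LM) log-scores
    return max(cands, key=lambda w: sum(cands[w]))

best = decode(candidates)
# Here the LM's strong preference for "indeed" overcomes the small
# acoustic preference for "in deep".
```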
Acoustic Confusions

the station signs are in deep in english        -14732
the stations signs are in deep in english       -14735
the station signs are in deep into english      -14739
the station's signs are in deep in english      -14740
the station signs are in deep in the english    -14741
the station signs are indeed in english         -14757
the station's signs are indeed in english       -14760
the station signs are indians in english        -14790
the station signs are indian in english         -14799
the stations signs are indians in english       -14807
the stations signs are indians and english      -14815
Translation: Codebreaking?

"Also knowing nothing official about, but having guessed and inferred considerable about, the powerful new mechanized methods in cryptography—methods which I believe succeed even when one does not know what language has been coded—one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"

Warren Weaver (1947)
MT System Components

source P(e) → e → channel P(f|e) → f (observed) → decoder → e_best

argmax_e P(e|f) = argmax_e P(f|e) P(e)

Language Model: P(e)    Translation Model: P(f|e)
Other Noisy Channel Models?
§ We're not doing this only for ASR (and MT)
§ Grammar/spelling correction
§ Handwriting recognition, OCR
§ Document summarization
§ Dialog generation
§ Linguistic decipherment
§ …
Language Models
§ A language model is a distribution over sequences of words (sentences)

P(w) = P(w_1 … w_n)

§ What's w? (closed vs open vocabulary)
§ What's n? (must sum to one over all lengths)
§ Can have rich structure or be linguistically naive
§ Why language models?
§ Usually the point is to assign high weights to plausible sentences (cf. acoustic confusions)
§ This is not the same as modeling grammaticality
N-Gram Models
N-Gram Models
§ Use chain rule to generate words left-to-right:

P(w_1 … w_n) = Π_i P(w_i | w_1 … w_{i−1})

§ Can't condition on the entire left context

P(??? | Turn to page 134 and look at the picture of the)

§ N-gram models make a Markov assumption:

P(w_i | w_1 … w_{i−1}) ≈ P(w_i | w_{i−k+1} … w_{i−1})
Empirical N-Grams
§ How do we know P(w | history)?
§ Use statistics from data (examples using Google N-Grams)
§ E.g. what is P(door | the)?
§ This is the maximum likelihood estimate

Training counts:
198015222   the first
194623024   the same
168504105   the following
158562063   the world
…
14112454    the door
-----------------
23135851162 the *
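A minimal sketch of the MLE computation, using the counts quoted above (the denominator is the "the *" total, hard-coded here since only a few bigrams are listed):

```python
from collections import Counter

# Maximum likelihood estimate: P(w | h) = count(h, w) / count(h, *).
bigram_counts = Counter({
    ("the", "first"): 198015222,
    ("the", "same"): 194623024,
    ("the", "door"): 14112454,
})
history_total = 23135851162  # count of "the *" from the slide

def p_mle(history, word):
    return bigram_counts[(history, word)] / history_total

# P(door | the) comes out around 0.0006, as on the next slide.
```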
Increasing N-Gram Order
§ Higher orders capture more dependencies

Bigram model:
198015222   the first
194623024   the same
168504105   the following
158562063   the world
…
14112454    the door
-----------------
23135851162 the *

P(door | the) = 0.0006

Trigram model:
197302  close the window
191125  close the door
152500  close the gap
116451  close the thread
87298   close the deal
-----------------
3785230 close the *

P(door | close the) = 0.05
Increasing N-Gram Order
Sparsity

3380  please close the door
1601  please close the window
1164  please close the new
1159  please close the gate
…
0     please close the first
-----------------
13951 please close the *

Please close the first door on the left.
Sparsity
§ Problems with n-gram models:
§ New words (open vocabulary)
§ Synaptitute
§ 132,701.03
§ multidisciplinarization
§ Old words in new contexts
§ Aside: Zipf's Law
§ Types (words) vs. tokens (word occurrences)
§ Broadly: most word types are rare ones
§ Specifically:
§ Rank word types by token frequency
§ Frequency inversely proportional to rank
§ Not special to language: randomly generated character strings have this property (try it!)
§ This law qualitatively (but rarely quantitatively) informs NLP
[Figure: fraction of n-grams seen vs. number of training words, for unigrams and bigrams.]
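The "try it!" aside is easy to check: sample random characters over a tiny alphabet (the alphabet and length here are arbitrary choices), treat space as the word delimiter, and inspect the rank/frequency curve, which falls off in the Zipf-like way described above.

```python
import random
from collections import Counter

# Random "monkey typing": short words are exponentially more likely than
# long ones, which already yields a Zipf-like rank/frequency relationship.
random.seed(0)
alphabet = "abc "  # three letters plus space as the word delimiter
text = "".join(random.choice(alphabet) for _ in range(200_000))
counts = Counter(w for w in text.split(" ") if w)

freqs = [c for _, c in counts.most_common()]
# Frequency drops steeply with rank: the top type is far more frequent
# than the type at rank 100.
```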
N-Gram Estimation
Smoothing
§ We often want to make estimates from sparse statistics:

P(w | denied the): 3 allegations, 2 reports, 1 claims, 1 request (7 total)

§ Smoothing flattens spiky distributions so they generalize better:

P(w | denied the): 2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other (7 total)

§ Very important all over NLP, but easy to do badly
Likelihood and Perplexity
§ How do we measure LM "goodness"?
§ Shannon's game: predict the next word

When I eat pizza, I wipe off the _________

grease 0.5
sauce  0.4
dust   0.05
…
mice   0.0001
…
the    1e-100

3516  wipe off the excess
1034  wipe off the dust
547   wipe off the sweat
518   wipe off the mouthpiece
…
120   wipe off the grease
0     wipe off the sauce
0     wipe off the mice
-----------------
28048 wipe off the *

§ Formally: define test set (log) likelihood

log P(X | θ) = Σ_{w ∈ X} log P(w | θ)

§ Perplexity: "average per-word branching factor"

perp(X, θ) = exp(−log P(X | θ) / |X|)
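The two formulas above translate directly into code (the per-token probabilities in the example are a hypothetical model's predictions, not from the slide):

```python
import math

# perp(X, θ) = exp(-log P(X|θ) / |X|), with
# log P(X|θ) = sum over test tokens of log P(w|θ).
def perplexity(word_probs):
    log_likelihood = sum(math.log(p) for p in word_probs)
    return math.exp(-log_likelihood / len(word_probs))

# Sanity check: a model that assigns every token probability 1/10
# has a per-word "branching factor" (perplexity) of exactly 10.
uniform_perp = perplexity([0.1] * 20)
```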
Measuring Model Quality (Speech)
§ We really want better ASR (or whatever), not better perplexities
§ For speech, we care about word error rate (WER)
§ Common issue: intrinsic measures like perplexity are easier to use, but extrinsic ones are more credible

Correct answer:    Andy saw a part of the movie
Recognizer output: And he saw apart of the movie

WER = (insertions + deletions + substitutions) / true sentence size = 4/7 = 57%
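WER is word-level edit distance divided by reference length; a standard dynamic-programming sketch (not from the slides) reproduces the 4/7 above:

```python
# Word error rate: (insertions + deletions + substitutions) / |reference|,
# computed with the standard Levenshtein dynamic program over words.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```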
Key Ideas for N-Gram LMs
Idea 1: Interpolation

Please close the first door on the left.

4-Gram (specific but sparse):
3380  please close the door
1601  please close the window
1164  please close the new
1159  please close the gate
…
0     please close the first
-----------------
13951 please close the *
P(first | please close the) = 0.0

3-Gram:
197302  close the window
191125  close the door
152500  close the gap
116451  close the thread
…
8662    close the first
-----------------
3785230 close the *
P(first | close the) = 0.002

2-Gram (dense but general):
198015222   the first
194623024   the same
168504105   the following
158562063   the world
…
-----------------
23135851162 the *
P(first | the) = 0.009
(Linear) Interpolation
§ Simplest way to mix different orders: linear interpolation

P(w | w_{−2} w_{−1}) = λ_3 · P_MLE(w | w_{−2} w_{−1}) + λ_2 · P_MLE(w | w_{−1}) + λ_1 · P_MLE(w),  with λ_i ≥ 0 and λ_1 + λ_2 + λ_3 = 1

§ How to choose lambdas?
§ Should lambda depend on the counts of the histories?
§ Choosing weights: either grid search or EM using held-out data
§ Better methods have interpolation weights connected to context counts, so you smooth more when you know less
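A sketch of the mix, with placeholder weights and estimator functions (the lambda values are arbitrary illustrations; the slide's point is that they should be tuned on held-out data):

```python
# Linear interpolation of trigram, bigram, and unigram estimates.
LAMBDAS = (0.6, 0.3, 0.1)  # arbitrary illustrative weights; must sum to 1

def p_interp(w, h2, h1, p3, p2, p1):
    # p3, p2, p1 are probability estimators of decreasing order.
    l3, l2, l1 = LAMBDAS
    return l3 * p3(w, h2, h1) + l2 * p2(w, h1) + l1 * p1(w)

# Even when the sparse high-order estimate is 0, the lower orders keep
# the interpolated estimate nonzero (placeholder probabilities below):
p = p_interp("first", "close", "the",
             lambda w, a, b: 0.0,   # high-order MLE: context unseen
             lambda w, a: 0.002,    # lower-order estimate
             lambda w: 0.009)       # unigram estimate
```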
Train, Held-Out, Test
§ Want to maximize likelihood on test, not training data
§ Empirical n-grams won't generalize well
§ Models derived from counts / sufficient statistics require generalization parameters to be tuned on held-out data to simulate test generalization
§ Set hyperparameters to maximize the likelihood of the held-out data (usually with grid search or EM)

Training Data: counts / parameters from here
Held-Out Data: hyperparameters from here
Test Data: evaluate here
Idea 2: Discounting
§ Observation: N-grams occur more in training data than they will later

Empirical Bigram Counts (Church and Gale, 91):
Count in 22M Words | Future c* (Next 22M)
1 | 0.45
2 | 1.25
3 | 2.24
4 | 3.23
5 | 4.21
Absolute Discounting
§ Absolute discounting
§ Reduce numerator counts by a constant d (e.g. 0.75)
§ Maybe have a special discount for small counts
§ Redistribute the "shaved" mass to a model of new events
§ Example formulation:

P_abs(w | h) = max(c(h, w) − d, 0) / c(h) + α(h) · P(w)
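A bigram sketch of this formulation, where the shaved mass d × (number of seen continuations of h) is redistributed through a unigram model (the corpus and d below are illustrative):

```python
from collections import Counter

# Absolute discounting: P(w|h) = max(c(h,w) - d, 0)/c(h) + alpha(h) * P_uni(w),
# with alpha(h) = d * (distinct continuations of h) / c(h) so mass balances.
def abs_discount_model(bigram_counts, unigram_probs, d=0.75):
    history_total, history_types = Counter(), Counter()
    for (h, w), c in bigram_counts.items():
        history_total[h] += c
        history_types[h] += 1

    def p(w, h):
        c = bigram_counts.get((h, w), 0)
        alpha = d * history_types[h] / history_total[h]  # shaved mass
        return max(c - d, 0) / history_total[h] + alpha * unigram_probs.get(w, 0.0)

    return p
```

If every seen count exceeds d and the unigram model sums to one over the vocabulary, the discounted conditional distributions still normalize.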
Idea 3: Fertility
§ Shannon game: "There was an unexpected _____"
§ "delay"?
§ "Francisco"?
§ Context fertility: number of distinct context types that a word occurs in
§ What is the fertility of "delay"?
§ What is the fertility of "Francisco"?
§ Which is more likely in an arbitrary new context?
Kneser-Ney Smoothing
§ Kneser-Ney smoothing combines two ideas
§ Discount and reallocate like absolute discounting
§ In the backoff model, word probabilities are proportional to context fertility, not frequency:

P(w) ∝ |{w′ : c(w′, w) > 0}|

§ Theory and practice
§ Practice: KN smoothing has been repeatedly proven both effective and efficient
§ Theory: KN smoothing as approximate inference in a hierarchical Pitman-Yor process [Teh, 2006]
Kneser-Ney Details
§ All orders recursively discount and back off:

P_k(w | prev_{k−1}) = max(c′(prev_{k−1}, w) − d, 0) / Σ_v c′(prev_{k−1}, v) + α(prev_{k−1}) · P_{k−1}(w | prev_{k−2})

§ Alpha is computed to make the probability normalize (see if you can figure out an expression).
§ For the highest order, c′ is the token count of the n-gram. For all others it is the context fertility of the n-gram:

c′(x) = |{u : c(u, x) > 0}|

§ The unigram base case does not need to discount.
§ Variants are possible (e.g. different d for low counts)
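A bigram-only sketch of the recursion above: the top order discounts token counts, and the base distribution uses context fertility rather than frequency (d and the toy counts in the test are illustrative).

```python
from collections import Counter

# Bigram Kneser-Ney: discount token counts at the top order; back off to a
# continuation distribution proportional to context fertility,
# P_cont(w) ∝ |{w' : c(w', w) > 0}|.
def kneser_ney_bigram(bigram_counts, d=0.75):
    history_total, history_types, fertility = Counter(), Counter(), Counter()
    for (h, w), c in bigram_counts.items():
        history_total[h] += c
        history_types[h] += 1   # distinct continuations of history h
        fertility[w] += 1       # distinct left contexts of word w
    n_bigram_types = len(bigram_counts)

    def p(w, h):
        c = bigram_counts.get((h, w), 0)
        alpha = d * history_types[h] / history_total[h]  # mass to back off
        p_cont = fertility[w] / n_bigram_types           # fertility-based base
        return max(c - d, 0) / history_total[h] + alpha * p_cont

    return p
```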
What Actually Works?
§ Trigrams and beyond:
§ Unigrams, bigrams generally useless
§ Trigrams much better
§ 4-, 5-grams and more are really useful in MT, but gains are more limited for speech
§ Discounting
§ Absolute discounting, Good-Turing, held-out estimation, Witten-Bell, etc…
§ Context counting
§ Kneser-Ney construction of lower-order models
§ See [Chen + Goodman] reading for tons of graphs…

[Graph from Joshua Goodman]
![Page 30: Algorithms for NLPdemo.clab.cs.cmu.edu/11711fa18/slides/lecture_2_language_models_1.pdfIncreasing N-GramOrder §Higher orders capture more dependencies 198015222 the first 194623024](https://reader034.fdocuments.us/reader034/viewer/2022050608/5faf3d588e4cd66772678a0d/html5/thumbnails/30.jpg)
Idea 4: Big Data

There's no data like more data.
Data >> Method?
§ Having more data is better…
§ …but so is using a better estimator
§ Another issue: N > 3 has huge costs in speech recognizers

[Graph: test entropy vs. n-gram order (1 through 20) for Katz and Kneser-Ney smoothing, trained on 100,000 / 1,000,000 / 10,000,000 / all words.]
Tons of Data?

[Brants et al., 2007]
What about…
Unknown Words?
§ What about totally unseen words?
§ Most LM applications are closed vocabulary
§ ASR systems will only propose words that are in their pronunciation dictionary
§ MT systems will only propose words that are in their phrase tables (modulo special models for numbers, etc)
§ In principle, one can build open vocabulary LMs
§ E.g. models over character sequences rather than word sequences
§ Back-off needs to go down into a "generate new word" model
§ Typically if you need this, a high-order character model will do
What's in an N-Gram?
§ Just about every local correlation!
§ Word class restrictions: "will have been ___"
§ Morphology: "she ___", "they ___"
§ Semantic class restrictions: "danced the ___"
§ Idioms: "add insult to ___"
§ World knowledge: "ice caps have ___"
§ Pop culture: "the empire strikes ___"
§ But not the long-distance ones
§ "The computer which I had just put into the machine room on the fifth floor ___."
Linguistic Pain?
§ The N-Gram assumption hurts one's inner linguist!
§ Many linguistic arguments that language isn't regular
§ Long-distance dependencies
§ Recursive structure
§ Answers
§ N-grams only model local correlations, but they get them all
§ As N increases, they catch even more correlations
§ N-gram models scale much more easily than structured LMs
§ Not convinced?
§ Can build LMs out of our grammar models (later in the course)
§ Take any generative model with words at the bottom and marginalize out the other variables
What Gets Captured?
§ Bigram model:
§ [texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen]
§ [outside, new, car, parking, lot, of, the, agreement, reached]
§ [this, would, be, a, record, november]
§ PCFG model:
§ [This, quarter, 's, surprisingly, independent, attack, paid, off, the, risk, involving, IRS, leaders, and, transportation, prices, .]
§ [It, could, be, announced, sometime, .]
§ [Mr., Toseland, believes, the, average, defense, economy, is, drafted, from, slightly, more, than, 12, stocks, .]
Scaling Up?
§ There's a lot of training data out there…
…next class we'll talk about how to make it fit.
Other Techniques?
§ Lots of other techniques
§ Maximum entropy LMs (soon)
§ Neural network LMs (soon)
§ Syntactic / grammar-structured LMs (much later)