Transcript of TTIC 31190: Natural Language Processing, Lecture 7
TTIC 31190: Natural Language Processing
Kevin Gimpel, Winter 2016
Lecture 7: Sequence Models
Announcements
• Assignment 2 has been posted, due Feb. 3
• Midterm scheduled for Thursday, Feb. 18
• Project proposal due Tuesday, Feb. 23
• Thursday's class will be more like a lab/flipped class
  – we will use the whiteboard and implement things in class, so bring paper, laptop, etc.
Roadmap
• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• syntax and syntactic parsing
• neural network methods in NLP
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications
Language Modeling
• goal: compute the probability of a sequence of words:

  P(w_1, w_2, …, w_n)
Markov Assumption for Language Modeling
• approximate the full history with a fixed window of previous words:

  P(w_i | w_1 … w_{i−1}) ≈ P(w_i | w_{i−k} … w_{i−1})

(Andrei Markov)
J&M/SLP3
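As a concrete sketch of this assumption, a bigram model (k = 1) scores each word using only its immediate predecessor. The toy corpus and the maximum-likelihood estimates below are invented for illustration; they are not from the lecture.

```python
from collections import Counter

def train_bigram_mle(corpus):
    """Maximum-likelihood bigram estimates from a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])                # context counts
        bigrams.update(zip(toks[:-1], toks[1:]))  # (previous word, word) counts
    def p(w, prev):
        # P(w | prev) = C(prev, w) / C(prev)
        return bigrams[(prev, w)] / unigrams[prev]
    return p

# toy corpus, invented for illustration
corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
p = train_bigram_mle(corpus)
print(p("dog", "the"))  # count(the, dog) / count(the) = 1/2 = 0.5
```

With no smoothing, any unseen bigram gets probability zero, which motivates the smoothing methods discussed next.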
Intuition of smoothing (from Dan Klein)
• When we have sparse statistics:

  P(w | denied the):  3 allegations, 2 reports, 1 claims, 1 request  (7 total)

• Steal probability mass to generalize better:

  P(w | denied the):  2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other  (7 total)
"Add-1" estimation
• also called Laplace smoothing
• just add 1 to all counts!

  P_Add-1(w_n | w_{n−1}) = (C(w_{n−1} w_n) + 1) / (C(w_{n−1}) + V)

J&M/SLP3
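The add-1 rule can be sketched directly as counts-plus-one. The toy counts below (echoing the "denied the" example) and the vocabulary size are invented for illustration.

```python
from collections import Counter

def add_one_bigram_prob(bigrams, unigrams, vocab_size):
    """Laplace-smoothed estimate: P(w | prev) = (C(prev, w) + 1) / (C(prev) + V)."""
    def p(w, prev):
        return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)
    return p

# toy counts, invented for illustration
unigrams = Counter({"denied": 7})
bigrams = Counter({("denied", "allegations"): 3, ("denied", "reports"): 2})
p = add_one_bigram_prob(bigrams, unigrams, vocab_size=10)
print(p("allegations", "denied"))  # (3 + 1) / (7 + 10) = 4/17
print(p("outcome", "denied"))      # unseen bigram: (0 + 1) / (7 + 10) = 1/17
```

Note how every unseen word now receives the same nonzero mass, which is exactly the "too much probability mass is moved to the zeros" problem discussed below.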
Backoff and Interpolation
• sometimes it helps to use less context
  – condition on less context for contexts you haven't learned much about
• backoff:
  – use trigram if you have good evidence, otherwise bigram, otherwise unigram
• interpolation:
  – mixture of unigram, bigram, trigram (etc.) models
• interpolation works better
J&M/SLP3
Linear Interpolation
• simple interpolation:

4.4 Smoothing (J&M/SLP3, p. 15)

The sharp change in counts and probabilities occurs because too much probability mass is moved to all the zeros.

4.4.2 Add-k smoothing

One alternative to add-one smoothing is to move a bit less of the probability mass from the seen to the unseen events. Instead of adding 1 to each count, we add a fractional count k (.5? .05? .01?). This algorithm is therefore called add-k smoothing.

  P*_Add-k(w_n | w_{n−1}) = (C(w_{n−1} w_n) + k) / (C(w_{n−1}) + kV)    (4.23)

Add-k smoothing requires that we have a method for choosing k; this can be done, for example, by optimizing on a devset. Although add-k is useful for some tasks (including text classification), it turns out that it still doesn't work well for language modeling, generating counts with poor variances and often inappropriate discounts (Gale and Church, 1994).

4.4.3 Backoff and Interpolation

The discounting we have been discussing so far can help solve the problem of zero-frequency N-grams. But there is an additional source of knowledge we can draw on. If we are trying to compute P(w_n | w_{n−2} w_{n−1}) but we have no examples of a particular trigram w_{n−2} w_{n−1} w_n, we can instead estimate its probability by using the bigram probability P(w_n | w_{n−1}). Similarly, if we don't have counts to compute P(w_n | w_{n−1}), we can look to the unigram P(w_n).

In other words, sometimes using less context is a good thing, helping to generalize more for contexts that the model hasn't learned much about. There are two ways to use this N-gram "hierarchy". In backoff, we use the trigram if the evidence is sufficient, otherwise we use the bigram, otherwise the unigram. In other words, we only "back off" to a lower-order N-gram if we have zero evidence for a higher-order N-gram. By contrast, in interpolation, we always mix the probability estimates from all the N-gram estimators, weighing and combining the trigram, bigram, and unigram counts.

In simple linear interpolation, we combine different order N-grams by linearly interpolating all the models. Thus, we estimate the trigram probability P(w_n | w_{n−2} w_{n−1}) by mixing together the unigram, bigram, and trigram probabilities, each weighted by a λ:

  P̂(w_n | w_{n−2} w_{n−1}) = λ1 P(w_n | w_{n−2} w_{n−1}) + λ2 P(w_n | w_{n−1}) + λ3 P(w_n)    (4.24)

such that the λs sum to 1:

  Σ_i λ_i = 1    (4.25)

In a slightly more sophisticated version of linear interpolation, each λ weight is computed in a more sophisticated way, by conditioning on the context. This way, if we have particularly accurate counts for a particular bigram, we assume that the counts of the trigrams based on this bigram will be more trustworthy, so we can make the λs for those trigrams higher and thus give that trigram more weight in […]
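Equations 4.24 and 4.25 can be sketched in a few lines. The λ values below are invented for illustration; in practice they are tuned on held-out data.

```python
def interpolate(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """Simple linear interpolation (eq. 4.24): mix trigram, bigram, and unigram
    estimates with weights that must sum to 1 (eq. 4.25).

    The lambda values here are illustrative, not tuned."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9  # eq. 4.25
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

# toy component probabilities, invented for illustration
print(interpolate(p_tri=0.5, p_bi=0.2, p_uni=0.01))  # ≈ 0.361
```

Because every component is a valid distribution and the weights sum to 1, the mixture is also a valid distribution, unlike stupid backoff below.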
J&M/SLP3
Kneser-Ney Smoothing
• better estimate for probabilities of lower-order unigrams!
  – Shannon game: I can't see without my reading ___________?
  – "Francisco" is more common than "glasses"
  – … but "Francisco" always follows "San"
• unigram is most useful when we haven't seen the bigram!
• so instead of unigram P(w) ("How likely is w?")
• use P_CONTINUATION(w) ("How likely is w to appear as a novel continuation?")
  – for each word, count the # of bigram types it completes:

  P_CONTINUATION(w) ∝ |{w_{i−1} : c(w_{i−1}, w) > 0}|

J&M/SLP3
Kneser-Ney Smoothing
• how many times does w appear as a novel continuation?

  P_CONTINUATION(w) ∝ |{w_{i−1} : c(w_{i−1}, w) > 0}|

• normalize by the total number of word bigram types:

  P_CONTINUATION(w) = |{w_{i−1} : c(w_{i−1}, w) > 0}| / |{(w_{j−1}, w_j) : c(w_{j−1}, w_j) > 0}|

J&M/SLP3
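The continuation probability can be sketched by counting bigram types rather than tokens. The counts below are toy values invented to mirror the "Francisco"/"glasses" example; they are not real corpus statistics.

```python
from collections import defaultdict

def continuation_probs(bigram_counts):
    """P_CONTINUATION(w): fraction of all bigram *types* that w completes."""
    preceders = defaultdict(set)  # w -> set of distinct words seen before w
    all_types = set()
    for (prev, w), c in bigram_counts.items():
        if c > 0:
            preceders[w].add(prev)
            all_types.add((prev, w))
    return {w: len(ps) / len(all_types) for w, ps in preceders.items()}

# toy counts: "Francisco" is frequent but only ever follows "San",
# while "glasses" follows several different words
counts = {("San", "Francisco"): 100,
          ("reading", "glasses"): 3, ("my", "glasses"): 2, ("new", "glasses"): 1}
p_cont = continuation_probs(counts)
print(p_cont["Francisco"])  # completes 1 of 4 bigram types -> 0.25
print(p_cont["glasses"])    # completes 3 of 4 bigram types -> 0.75
```

Note that the token count of 100 for ("San", "Francisco") is irrelevant: only the number of distinct contexts matters, which is why "glasses" wins despite being rarer.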
N-gram Smoothing Summary
• add-1 estimation:
  – OK for text categorization, not for language modeling
• for very large N-gram collections like the Web:
  – stupid backoff
• most commonly used method:
  – modified interpolated Kneser-Ney
J&M/SLP3
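Stupid backoff, as described by Brants et al. (2007), can be sketched as follows; it returns scores rather than true probabilities. The counts are toy values invented for illustration (the 0.4 multiplier follows the paper).

```python
def stupid_backoff(w, context, counts, total, alpha=0.4):
    """Stupid backoff score: use the raw count ratio when the n-gram was seen,
    otherwise back off to the shorter context with a fixed multiplier alpha.

    `counts` maps n-gram tuples to counts; `total` is the corpus token count.
    Scores need not sum to 1, which is why this is not a probability."""
    if not context:
        return counts.get((w,), 0) / total  # unigram base case
    ngram = context + (w,)
    if counts.get(ngram, 0) > 0:
        return counts[ngram] / counts[context]
    return alpha * stupid_backoff(w, context[1:], counts, total, alpha)

# toy counts, invented for illustration (n-grams stored as tuples)
counts = {("the",): 4, ("cat",): 2, ("the", "cat"): 2}
total = 6
print(stupid_backoff("cat", ("the",), counts, total))  # seen bigram: 2/4 = 0.5
print(stupid_backoff("cat", ("big",), counts, total))  # backs off: 0.4 * (2/6)
```

Skipping normalization is what makes this method cheap enough for Web-scale N-gram collections.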
Roadmap
• classification
• words
• lexical semantics
• language modeling
• sequence labeling
• syntax and syntactic parsing
• neural network methods in NLP
• semantic compositionality
• semantic parsing
• unsupervised learning
• machine translation and other applications
Linguistic phenomena: summary so far…
• words have structure (stems and affixes)
• words have multiple meanings (senses) → word sense ambiguity
  – senses of a word can be homonymous or polysemous
  – senses have relationships:
    • hyponymy ("is a")
    • meronymy ("part of", "member of")
• variability/flexibility of linguistic expression
  – many ways to express the same meaning (as you saw in Assignment 1)
  – word vectors tell us when two words are similar
• today: part-of-speech
Part-of-Speech Tagging

Some        questioned   if     Tim     Cook    's     first  product
determiner  verb (past)  prep.  proper  proper  poss.  adj.   noun

would  be    a     breakaway  hit   for    Apple   .
modal  verb  det.  adjective  noun  prep.  proper  punc.
Part-of-Speech Tagging

Some        questioned   if     Tim   Cook  's     first  product
determiner  verb (past)  prep.  noun  noun  poss.  adj.   noun

would  be    a     breakaway  hit   for    Apple  .
modal  verb  det.  adjective  noun  prep.  noun   punc.
Part-of-Speech (POS)
• functional category of a word:
  – noun, verb, adjective, etc.
  – how is the word functioning in its context?
• dependent on context like word sense, but different from sense:
  – sense represents word meaning, POS represents word function
  – sense uses a distinct category of senses per word, POS uses the same set of categories for all words
Penn Treebank tagset
Universal Tag Set
• many use smaller sets of coarser tags
• e.g., "universal tag set" containing 12 tags:
  – noun, verb, adjective, adverb, pronoun, determiner/article, adposition (preposition or postposition), numeral, conjunction, particle, punctuation, other

Petrov, Das, McDonald (2011)
Twitter Part-of-Speech Tagging

ikr/intj  smh/other  he/pronoun  asked/verb  fir/prep  yo/article  last/adj  name/noun  so/prep  he/pronoun  can/verb  add/verb  u/pronoun  on/prep  fb/proper noun  lololol/intj  =D/emoticon  #lolz/hashtag

adj = adjective, prep = preposition, intj = interjection

• we removed some fine-grained POS tags, then added Twitter-specific tags:
  – hashtag
  – @-mention
  – URL/email address
  – emoticon
  – Twitter discourse marker
  – other (multi-word abbreviations, symbols, garbage)
word sense vs. part-of-speech

• semantic or syntactic?
  – word sense: semantic; indicates meaning of word in its context
  – part-of-speech: syntactic; indicates function of word in its context
• number of categories
  – word sense: |V| words, ~5 senses each → 5|V| categories!
  – part-of-speech: typical POS tag sets have 12 to 45 tags
• inter-annotator agreement
  – word sense: low; some sense distinctions are highly subjective
  – part-of-speech: high; relatively few POS tags and function is relatively shallow/surface-level
• independent or joint classification of nearby words?
  – word sense: independent; can classify a single word based on context words; structured prediction is rarely used
  – part-of-speech: joint; strong relationship between tags of nearby words; structured prediction often used
How might POS tags be useful?
• text classification
• machine translation
• question answering
Classification Framework
• learning: choose the parameters that minimize loss on training data
• modeling: define score function
• inference: solve for the highest-scoring label (argmax)
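The three pieces can be sketched for a linear model as follows. The feature scheme (word paired with label) and the parameter values are invented for illustration, not the course's exact formulation.

```python
def score(x, y, theta):
    """Modeling: a linear score, the dot product of parameters theta with a
    feature vector; here a toy scheme with one feature per (word, label) pair."""
    return sum(theta.get((w, y), 0.0) for w in x)

def classify(x, labels, theta):
    """Inference: solve the argmax over the output space."""
    return max(labels, key=lambda y: score(x, y, theta))

# toy parameters, invented for illustration (learning would choose these)
theta = {("great", "positive"): 1.0, ("awful", "negative"): 1.5}
print(classify(["a", "great", "movie"], ["positive", "negative"], theta))  # positive
```

Learning is the missing third piece here: it would set `theta` by minimizing a loss over training data, as the empirical risk minimization slides below spell out.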
Applications of our Classification Framework

text classification:

| x | y |
|---|---|
| the hulk is an anger fueled monster with incredible strength and resistance to damage. | objective |
| in trying to be daring and original, it comes off as only occasionally satirical and never fresh. | subjective |

output space = {objective, subjective}
Applications of our Classification Framework

word sense classifier for bass:

| x | y |
|---|---|
| he's a bass in the choir. | bass3 |
| our bass is line-caught from the Atlantic. | bass4 |

output space = {bass1, bass2, …, bass8}
Applications of our Classification Framework

skip-gram model as a classifier:

| x | y |
|---|---|
| agriculture | <s> |
| agriculture | is |
| agriculture | the |

output space = V (the entire vocabulary)

corpus (English Wikipedia):
  agriculture is the traditional mainstay of the cambodian economy .
  but benares has been destroyed by an earthquake .
  …
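Under the skip-gram-as-classifier view, each (word, context word) pair in the corpus is one training example. A sketch of pair extraction (the window size of 2 and the `<s>`/`</s>` padding are assumptions for this example):

```python
def skipgram_pairs(sentence, window=2):
    """Enumerate (input word x, context word y) training pairs: every word
    within `window` positions of x becomes a separate example."""
    toks = ["<s>"] + sentence + ["</s>"]
    pairs = []
    for i, x in enumerate(toks):
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if j != i:
                pairs.append((x, toks[j]))
    return pairs

sent = "agriculture is the traditional mainstay".split()
pairs = skipgram_pairs(sent)
# the pairs for x = "agriculture" match the slide's table:
print([p for p in pairs if p[0] == "agriculture"])
# [('agriculture', '<s>'), ('agriculture', 'is'), ('agriculture', 'the')]
```

Each pair is then a classification instance whose output space is the whole vocabulary, which is what makes this classifier's output space size |V|.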
Applications of our Classifier Framework so far

• text classification
  – input (x): a sentence
  – output (y): gold standard label for x
  – output space: pre-defined, small label set (e.g., {positive, negative}); size 2-10
• word sense disambiguation
  – input (x): instance of a particular word (e.g., bass) with its context
  – output (y): gold standard word sense of x
  – output space: pre-defined sense inventory from WordNet; size 2-30 for bass
• learning skip-gram word embeddings
  – input (x): instance of a word in a corpus
  – output (y): a word in the context of x in a corpus
  – output space: vocabulary; size |V|
• part-of-speech tagging
  – input (x): a sentence
  – output (y): gold standard part-of-speech tags for x
  – output space: all possible part-of-speech tag sequences with same length as x; size |P|^|x|
note: for part-of-speech tagging, the output space is exponential in the size of the input! → "structured prediction"
Simplest kind of structured prediction: Sequence Labeling

Part-of-Speech Tagging:

Some        questioned   if     Tim   Cook  's     first  product
determiner  verb (past)  prep.  noun  noun  poss.  adj.   noun

would  be    a     breakaway  hit   for    Apple  .
modal  verb  det.  adjective  noun  prep.  noun   punc.

Named Entity Recognition:

Some questioned if [Tim Cook]_PERSON 's first product would be a breakaway hit for [Apple]_ORGANIZATION .
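One common way to cast NER as sequence labeling (not necessarily the scheme used in this course) is BIO encoding, sketched below on the slide's Tim Cook example. The `bio_encode` helper and its span format are invented for illustration.

```python
def bio_encode(tokens, entities):
    """Turn entity spans into per-token BIO tags, so NER becomes sequence
    labeling: B- marks the beginning of an entity, I- its continuation, O outside.

    `entities` maps (start, end) token spans (end exclusive) to entity types."""
    tags = ["O"] * len(tokens)
    for (start, end), etype in entities.items():
        tags[start] = "B-" + etype
        for i in range(start + 1, end):
            tags[i] = "I-" + etype
    return tags

tokens = "Some questioned if Tim Cook 's first product".split()
tags = bio_encode(tokens, {(3, 5): "PERSON"})
print(list(zip(tokens, tags))[3:5])  # [('Tim', 'B-PERSON'), ('Cook', 'I-PERSON')]
```

After this encoding, NER has exactly the same shape as POS tagging: one tag per token, with strong dependencies between adjacent tags.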
Learning
• learning: choose the parameters that minimize loss on training data
Empirical Risk Minimization with Surrogate Loss Functions
• given training data: {(x(i), y(i))} for i = 1, …, N, where each y(i) is a label
• we want to solve the following:

  θ̂ = argmin_θ Σ_i loss(x(i), y(i), θ)

• many possible loss functions to consider
Loss Functions

| name | where used |
|---|---|
| cost ("0-1") | intractable, but underlies "direct error minimization" |
| perceptron | perceptron algorithm (Rosenblatt, 1958) |
| hinge | support vector machines, other large-margin algorithms |
| log | logistic regression, conditional random fields, maximum entropy models |
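The surrogate losses named in the table can be sketched numerically. This is a simplified two-label view, written in terms of the margin between the gold label's score and the best competing score; the function names and toy scores are invented for illustration.

```python
import math

def losses(score_gold, score_best_other):
    """Surrogate losses for a linear model as a function of the margin
    (gold score minus best competing score); simplified two-label view."""
    margin = score_gold - score_best_other
    return {
        "perceptron": max(0.0, -margin),            # zero whenever gold wins
        "hinge": max(0.0, 1.0 - margin),            # demands a margin of at least 1
        "log": math.log(1.0 + math.exp(-margin)),   # two-label log loss, never zero
    }

print(losses(score_gold=2.0, score_best_other=0.5))
# margin 1.5: perceptron 0.0, hinge 0.0, log small but positive
```

The comparison shows why these are "surrogates" for the 0-1 cost: each is a convex upper-bound-style stand-in that stays optimizable where the 0-1 cost is flat.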
(Sub)gradients of Losses for Linear Models

| name | entry j of (sub)gradient of loss for linear model |
|---|---|
| cost ("0-1") | not subdifferentiable in general |
| perceptron | |
| hinge | |
| log | |

• whatever loss is used during training, classify (NOT costClassify) is used to predict labels for dev/test data!
(Sub)gradients of Losses for Linear Models (continued)
• for the log loss, the (sub)gradient involves the expectation of the feature value with respect to the distribution over y (where the distribution is defined by θ)
Sequence Models
• models that assign scores (could be probabilities) to sequences
• general category that includes many models used widely in practice:
  – n-gram language models
  – hidden Markov models
  – "chain" conditional random fields
  – maximum entropy Markov models
Hidden Markov Models (HMMs)
• HMMs define a joint probability distribution over input sequences x and output sequences y: p(x, y)
• conditional independence assumptions ("Markov assumption") are used to factorize this joint distribution into small terms
• widely used in NLP, speech recognition, bioinformatics, many other areas
Hidden Markov Models (HMMs)
• HMMs define a joint probability distribution over input sequences x and output sequences y: p(x, y)
• assumption: output sequence y "generates" input sequence x:

  p(x, y) = p(y) p(x | y)

• these are too difficult to estimate, let's use Markov assumptions
Markov Assumption for Language Modeling

trigram model:

  P(w_i | w_1 … w_{i−1}) ≈ P(w_i | w_{i−2} w_{i−1})

(Andrei Markov)
Independence and Conditional Independence
• Independence: two random variables X and Y are independent if:

  P(X = x, Y = y) = P(X = x) P(Y = y)   (or P(X = x | Y = y) = P(X = x))

  for all values x and y
• Conditional Independence: two random variables X and Y are conditionally independent given a third variable Z if

  P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)

  for all values of x, y, and z (or P(X = x | Y = y, Z = z) = P(X = x | Z = z))
Conditional Independence Assumptions of HMMs
• two y's are conditionally independent given the y's between them
• an x at position i is conditionally independent of other y's given the y at position i
Graphical Model for an HMM (for a sequence of length 4)

  y1 → y2 → y3 → y4
   ↓    ↓    ↓    ↓
  x1   x2   x3   x4

a graphical model is a graph in which:
• each node corresponds to a random variable
• each directed edge corresponds to a conditional probability distribution of the target node given the source node
• conditional independence statements among random variables are encoded by the edge structure
Graphical Model for an HMM (for a sequence of length 4)

  y1 → y2 → y3 → y4
   ↓    ↓    ↓    ↓
  x1   x2   x3   x4

conditional independence statements among random variables are encoded by the edge structure → we only have to worry about local distributions:
• transition parameters: p(y_i | y_{i−1})
• emission parameters: p(x_i | y_i)
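Given transition and emission parameters, the HMM joint probability is just the product of the local terms. The sketch below uses a toy tag set (D = determiner, N = noun) and invented probabilities.

```python
def hmm_joint_prob(xs, ys, trans, emit, start="<s>", stop="</s>"):
    """Joint probability p(x, y) of an HMM: the product of transition
    probabilities p(y_i | y_{i-1}) and emission probabilities p(x_i | y_i),
    with start/stop transitions at the sequence boundaries.

    `trans` and `emit` are toy parameter dictionaries, invented for illustration."""
    p = 1.0
    prev = start
    for x, y in zip(xs, ys):
        p *= trans[(prev, y)] * emit[(y, x)]
        prev = y
    return p * trans[(prev, stop)]

trans = {("<s>", "D"): 0.5, ("D", "N"): 0.8, ("N", "</s>"): 0.4}
emit = {("D", "the"): 0.6, ("N", "dog"): 0.1}
print(hmm_joint_prob(["the", "dog"], ["D", "N"], trans, emit))
# 0.5*0.6 * 0.8*0.1 * 0.4 ≈ 0.0096
```

Tagging then amounts to finding the y sequence maximizing this product for a given x, which the Viterbi algorithm does without enumerating all |P|^|x| sequences.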
Class-Based n-gram Models of Natural Language

Peter F. Brown*, Peter V. deSouza*, Robert L. Mercer*
IBM T. J. Watson Research Center
Vincent J. Della Pietra*, Jenifer C. Lai*
We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
1. Introduction
In a number of natural language processing tasks, we face the problem of recovering a string of English words after it has been garbled by passage through a noisy channel. To tackle this problem successfully, we must be able to estimate the probability with which any particular string of English words will be presented as input to the noisy channel. In this paper, we discuss a method for making such estimates. We also discuss the related topic of assigning words to classes according to statistical behavior in a large body of text.
In the next section, we review the concept of a language model and give a definition of n-gram models. In Section 3, we look at the subset of n-gram models in which the words are divided into classes. We show that for n = 2 the maximum likelihood assignment of words to classes is equivalent to the assignment for which the average mutual information of adjacent classes is greatest. Finding an optimal assignment of words to classes is computationally hard, but we describe two algorithms for finding a suboptimal assignment. In Section 4, we apply mutual information to two other forms of word clustering. First, we use it to find pairs of words that function together as a single lexical entity. Then, by examining the probability that two words will appear within a reasonable distance of one another, we use it to find classes that have some loose semantic coherence.
In describing our work, we draw freely on terminology and notation from the mathematical theory of communication. The reader who is unfamiliar with this field or who has allowed his or her facility with some of its concepts to fall into disrepair may profit from a brief perusal of Feller (1950) and Gallagher (1968). In the first of these, the reader should focus on conditional probabilities and on Markov chains; in the second, on entropy and mutual information.
* IBM T. J. Watson Research Center, Yorktown Heights, New York 10598.
© 1992 Association for Computational Linguistics
Peter F. Brown and Vincent J. Della Pietra Class-Based n-gram Models of Natural Language
Table 2: Classes from a 260,741-word vocabulary.

Friday Monday Thursday Wednesday Tuesday Saturday Sunday weekends Sundays Saturdays
June March July April January December October November September August
people guys folks fellows CEOs chaps doubters commies unfortunates blokes
down backwards ashore sideways southward northward overboard aloft downwards adrift
water gas coal liquid acid sand carbon steam shale iron
great big vast sudden mere sheer gigantic lifelong scant colossal
man woman boy girl lawyer doctor guy farmer teacher citizen
American Indian European Japanese German African Catholic Israeli Italian Arab
pressure temperature permeability density porosity stress velocity viscosity gravity tension
mother wife father son husband brother daughter sister boss uncle
machine device controller processor CPU printer spindle subsystem compiler plotter
John George James Bob Robert Paul William Jim David Mike
anyone someone anybody somebody
feet miles pounds degrees inches barrels tons acres meters bytes
director chief professor commissioner commander treasurer founder superintendent dean custodian
liberal conservative parliamentary royal progressive Tory provisional separatist federalist PQ
had hadn't hath would've could've should've must've might've
asking telling wondering instructing informing kidding reminding bothering thanking deposing
that tha theat
head body hands eyes voice arm seat eye hair mouth
we include no more than the ten most frequent words of any class (the other two months would appear with the class of months if we extended this limit to twelve). The degree to which the classes capture both syntactic and semantic aspects of English is quite surprising given that they were constructed from nothing more than counts of bigrams. The class {that tha theat} is interesting because although tha and theat are not English words, the computer has discovered that in our data each of them is most often a mistyped that.
Table 4 shows the number of class 1-, 2-, and 3-grams occurring in the text with various frequencies. We can expect from these data that maximum likelihood estimates will assign a probability of 0 to about 3.8 percent of the class 3-grams and to about .02 percent of the class 2-grams in a new sample of English text. This is a substantial improvement over the corresponding numbers for a 3-gram language model, which are 14.7 percent for word 3-grams and 2.2 percent for word 2-grams, but we have achieved this at the expense of precision in the model. With a class model, we distinguish between two different words of the same class only according to their relative frequencies in the text as a whole. Looking at the classes in Tables 2 and 3, we feel that
Computational Linguistics, 1992
“Brown Clustering”
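The excerpt above describes the key tradeoff of a class model: the probability of a word given its predecessor factors into a class transition times an emission, P(w₂ | w₁) = P(c₂ | c₁) · P(w₂ | c₂), so within a class words are distinguished only by relative frequency. Here is a minimal sketch of that factorization under the one-class-per-word setting; the toy corpus, class names, and class assignments are ours.

```python
from collections import Counter

class ClassBigramModel:
    """Class-based bigram model: P(w2 | w1) = P(c2 | c1) * P(w2 | c2),
    with one class per word and maximum likelihood estimates."""
    def __init__(self, tokens, cluster_of):
        self.cluster_of = cluster_of
        classes = [cluster_of[w] for w in tokens]
        self.trans = Counter(zip(classes, classes[1:]))  # class bigram counts
        self.hist = Counter(classes[:-1])                # class history counts
        self.cls = Counter(classes)                      # class occurrence counts
        self.wrd = Counter(tokens)                       # word occurrence counts

    def prob(self, w2, w1):
        c1, c2 = self.cluster_of[w1], self.cluster_of[w2]
        p_trans = self.trans[(c1, c2)] / self.hist[c1]
        # within a class, words are distinguished only by relative frequency:
        p_emit = self.wrd[w2] / self.cls[c2]
        return p_trans * p_emit

tokens = "on monday i saw john on tuesday i saw mary".split()
clusters = {"monday": "DAY", "tuesday": "DAY", "john": "NAME", "mary": "NAME",
            "on": "on", "i": "i", "saw": "saw"}
m = ClassBigramModel(tokens, clusters)
print(m.prob("tuesday", "on"))  # P(DAY | on) * P(tuesday | DAY) = 1.0 * 0.5
```

Because the transition is shared by the whole class, a day word that never followed "on" in the data still gets probability mass through P(DAY | on), which is the generalization the excerpt contrasts with word n-gram models.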
hidden Markov model with one-cluster-per-word constraint
justin bieber for president
y1 y2 y3 y4
Brown Clustering (Brown et al., 1992)
algorithm:
• initialize each word as its own cluster
• greedily merge clusters to improve data likelihood
outputs hierarchical clustering
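The greedy procedure on the slide can be sketched at toy scale: start with one cluster per word, then repeatedly apply the merge that loses the least data likelihood under a class bigram model; the merge sequence defines the hierarchy. This is a brute-force illustrative sketch, not the efficient windowed algorithm of Brown et al., and all names below are ours.

```python
from collections import Counter
from math import log2

def class_bigram_loglik(tokens, cluster_of):
    """Log-likelihood (base 2) of the data under an ML class bigram model:
    sum of log P(c_i | c_{i-1}) plus sum of log P(w_i | c_i)."""
    classes = [cluster_of[w] for w in tokens]
    big = Counter(zip(classes, classes[1:]))
    hist = Counter(classes[:-1])
    cls = Counter(classes)
    wrd = Counter(tokens)
    ll = sum(n * log2(n / hist[c1]) for (c1, _), n in big.items())
    ll += sum(n * log2(n / cls[cluster_of[w]]) for w, n in wrd.items())
    return ll

def brown_clusters(tokens, n_clusters):
    """Greedy agglomerative sketch: each word starts in its own cluster;
    repeatedly take the likelihood-best merge. Brute force over all pairs,
    so only suitable for a toy corpus."""
    cluster_of = {w: w for w in tokens}
    merges = []  # the merge sequence defines a hierarchy over words
    while len(set(cluster_of.values())) > n_clusters:
        cs = sorted(set(cluster_of.values()))
        best, best_ll = None, float("-inf")
        for i in range(len(cs)):
            for j in range(i + 1, len(cs)):
                trial = {w: (cs[i] if c == cs[j] else c)
                         for w, c in cluster_of.items()}
                ll = class_bigram_loglik(tokens, trial)
                if ll > best_ll:
                    best, best_ll = (cs[i], cs[j]), ll
        merges.append(best)
        cluster_of = {w: (best[0] if c == best[1] else c)
                      for w, c in cluster_of.items()}
    return cluster_of, merges

tokens = "on monday i saw john on tuesday i saw mary".split()
final, merges = brown_clusters(tokens, 5)
print(final)   # monday/tuesday and john/mary end up sharing clusters
```

On this toy corpus the first merges are likelihood-neutral (words with identical contexts lose nothing when merged), which is why the distributionally interchangeable pairs are grouped first.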
we induced 1000 Brown clusters from 56 million English tweets (1 billion words)
only words that appeared at least 40 times
(Owoputi, O’Connor, Dyer, Gimpel, Schneider, and Smith, 2013)
Example Cluster
missed loved hated misread admired underestimated resisted adored disliked regretted missd fancied luved prefered luvd overdid mistyped misd misssed looooved misjudged lovedd loooved loathed lurves lovd
spelling variation
“really”
really rly realy genuinely rlly reallly realllly reallyy rele realli relly reallllly reli reali sholl rily reallyyy reeeeally realllllly reaally reeeally rili reaaally reaaaally reallyyyy rilly reallllllly reeeeeally reeally shol realllyyy reely relle reaaaaally shole really2 reallyyyyy _really_ realllllllly reaaly realllyy reallii reallt genuinly relli realllyyyy reeeeeeally weally reaaallly reallllyyy reallllllllly reaallly realyy /really/ reaaaaaally reallu reaaaallly reeaally rreally reallyreally eally reeeaaally reeeaaally reaallyy reallyyyyyy –really- reallyreallyreally rilli reallllyyyy relaly reallllyy really-really r3ally reeli reallie realllllyyy rli realllllllllly reaaaly reeeeeeeally
“going to”
gonna gunna gona gna guna gnna ganna qonna gonnna gana qunna gonne goona gonnaa g0nna goina gonnah goingto gunnah gonaa gonan gunnna going2 gonnnna gunnaa gonny gunaa quna goonna qona gonns goinna gonnae qnna gonnaaa gnaa
“so”
soo sooo soooo sooooo soooooo sooooooo soooooooo sooooooooo soooooooooo sooooooooooo soooooooooooo sooooooooooooo soso soooooooooooooo sooooooooooooooo soooooooooooooooo sososo superrr sooooooooooooooooo ssooo so0o superrrr so0 soooooooooooooooooo sosososo sooooooooooooooooooo ssoo sssooo soooooooooooooooooooo #too s0o ssoooo s00
Food-Related Adjectives
hot fried peanut homemade grilled spicy soy cheesy coconut veggie roasted leftover blueberry icy dunkin mashed rotten mellow boiling crispy peppermint fruity toasted crunchy scrambled creamy boiled chunky funnel soggy clam steamed cajun steaming chewy steamy nacho mince reese's shredded salted glazed spiced venti pickled powdered butternut miso beet sizzling
Adjective Intensifiers/Qualifiers
kinda hella sorta hecka kindof kindaa kinna hellla propa helluh kindda justa #slick helllla hela jii sortof hellaa kida wiggity hellllla hekka hellah kindaaa hellaaa kindah knda kind-of slicc wiggidy helllllla jih jye kinnda odhee kiinda heka sorda ohde kind've kidna baree rle hellaaaa jussa