CS378: Natural Language Processing Lecture 10: Seq 3...
Transcript of CS378: Natural Language Processing Lecture 10: Seq 3...
![Page 1: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/1.jpg)
CS378: Natural Language Processing Lecture 10: Seq 3 / Syntax I
Greg Durrett
![Page 2: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/2.jpg)
Announcements
‣ Midterm: list of topics next week. Covers content up to March 7
‣ A2 due today
‣ CRFs will NOT be on the midterm, a couple other topics too
‣ A3 out tomorrow
![Page 3: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/3.jpg)
Today
‣ Conditional random fields
‣ Named entity recognition
‣ Syntax and constituency parsing
![Page 4: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/4.jpg)
CRFs and NER
![Page 5: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/5.jpg)
Named Entity Recognition
Barack Obama will travel to Hangzhou today for the G20 meeting .
Entities: Barack Obama = PERSON, Hangzhou = LOC, G20 = ORG
BIO tags: B-PER I-PER O O O B-LOC O O O B-ORG O O
‣ Frame as a sequence problem with a BIO tagset: begin, inside, outside
‣ Why might an HMM not do so well here?
  ‣ Lots of O's, so tags aren't as informative about context
  ‣ Need sub-word features on unknown words
‣ CRFs are discriminative models that will solve these problems
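To make the BIO framing concrete, here is a minimal Python sketch that turns a BIO tag sequence back into labeled chunks. The function name `bio_to_spans` and the choice to treat a stray `I-` tag as the start of a chunk are assumptions made for illustration, not something the slides specify.

```python
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags to (label, start, end) chunks, end exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last chunk
        # An I- tag with the same label continues the open chunk
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the chunk that is currently open
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # B- opens a new chunk; a stray I- is also treated as a chunk start
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return spans

tokens = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O", "O", "O", "B-ORG", "O", "O"]
for label, start, end in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

On the slide's sentence this recovers the three chunks PER (Barack Obama), LOC (Hangzhou), and ORG (G20).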
![Page 6: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/6.jpg)
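Conditional Random Fields
‣ HMMs are expressible as Bayes nets (factor graphs)
[Figure: chain-structured graphical model with tags y_1, y_2, …, y_n, each generating its word x_1, x_2, …, x_n]
‣ This reflects the following decomposition:
$$P(\mathbf{y}, \mathbf{x}) = P(y_1)\, P(x_1 \mid y_1)\, P(y_2 \mid y_1)\, P(x_2 \mid y_2) \cdots$$
‣ Locally normalized model: each factor is a probability distribution that normalizes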
![Page 7: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/7.jpg)
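Conditional Random Fields
‣ HMMs:
$$P(\mathbf{y}, \mathbf{x}) = P(y_1)\, P(x_1 \mid y_1)\, P(y_2 \mid y_1)\, P(x_2 \mid y_2) \cdots$$
‣ CRFs: discriminative models with the following globally-normalized form:
$$P(\mathbf{y} \mid \mathbf{x}) = \frac{\prod_k \exp(\phi_k(\mathbf{x}, \mathbf{y}))}{\sum_{\mathbf{y}'} \prod_k \exp(\phi_k(\mathbf{x}, \mathbf{y}'))}$$
The denominator is the normalizer $Z$; each $\phi_k$ can be any real-valued scoring function of its arguments.
‣ Naive Bayes : logistic regression :: HMMs : CRFs; local vs. global normalization <-> generative vs. discriminative
‣ How do we max over y? Requires considering an exponential number of sequences in general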
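As a rough illustration of global normalization, the sketch below computes P(y|x) by brute force, enumerating every tag sequence to build the normalizer Z. The names `brute_force_crf` and `toy_score`, and the scoring function itself, are made up for this example; the point is only that the enumeration grows exponentially with sentence length.

```python
import itertools
import math

def brute_force_crf(tags, n, score):
    """Return {y: P(y|x)} by enumerating all len(tags)**n sequences.

    `score(y)` is a hypothetical callable returning sum_k phi_k(x, y).
    """
    seqs = list(itertools.product(tags, repeat=n))
    weights = [math.exp(score(y)) for y in seqs]
    Z = sum(weights)  # the normalizer Z sums over every candidate sequence y'
    return {y: w / Z for y, w in zip(seqs, weights)}

# Toy scoring function (made up for illustration): reward each "B-PER" tag
def toy_score(y):
    return sum(1.0 for tag in y if tag == "B-PER")

dist = brute_force_crf(["O", "B-PER", "I-PER"], n=4, score=toy_score)
best = max(dist, key=dist.get)
print(len(dist))  # 81 sequences (3**4) had to be scored
print(best)
```

With 3 tags and 4 tokens there are already 81 sequences; a 12-token sentence would need 3**12 = 531441, which is why the chain structure matters for inference.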
![Page 8: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/8.jpg)
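Sequential CRFs
‣ HMMs:
$$P(\mathbf{y}, \mathbf{x}) = P(y_1)\, P(x_1 \mid y_1)\, P(y_2 \mid y_1)\, P(x_2 \mid y_2) \cdots$$
‣ CRFs:
$$P(\mathbf{y} \mid \mathbf{x}) \propto \prod_k \exp(\phi_k(\mathbf{x}, \mathbf{y}))$$
[Figure: chain factor graph over y_1, y_2, …, y_n and x_1, x_2, …, x_n, with an initial factor φ_o on y_1, transition factors φ_t between adjacent tags, and emission factors φ_e between each y_i and x_i]
$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp(\phi_o(y_1)) \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(x_i, y_i))$$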
![Page 9: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/9.jpg)
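Sequential CRFs
[Figure: chain factor graph over y_1, y_2, …, y_n and x_1, x_2, …, x_n with factors φ_o, φ_t, φ_e]
$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp(\phi_o(y_1)) \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(x_i, y_i))$$
‣ We condition on x, so every factor can depend on all of x
‣ y can't depend arbitrarily on x in a generative model
$$\prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$
Here $i$ is the token index, which lets us look at the current word.
[Figure: chain over y_1, y_2, …, y_n only, with factors φ_o, φ_t, φ_e]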
![Page 10: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/10.jpg)
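Sequential CRFs
[Figure: chain over y_1, y_2, …, y_n with factors φ_o, φ_t, φ_e]
‣ Don't include initial distribution, can bake into other factors
Sequential CRFs:
$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$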
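Although Z sums over exponentially many tag sequences, the chain structure lets it be computed with the forward algorithm. The sketch below is one way to do that in log space and checks the result against brute-force enumeration; the names `log_partition` and `sequence_score`, the array layout, and the potentials are all assumptions for illustration.

```python
import itertools
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(emit, trans):
    """Forward algorithm: log Z for a chain CRF.

    emit[i][s]  = phi_e(s, i, x), score of tag s at position i
    trans[r][s] = phi_t(r, s), score of moving from tag r to tag s
    """
    n, S = len(emit), len(trans)
    alpha = list(emit[0])  # no separate initial factor, as on the slide
    for i in range(1, n):
        alpha = [
            emit[i][s] + logsumexp([alpha[r] + trans[r][s] for r in range(S)])
            for s in range(S)
        ]
    return logsumexp(alpha)

def sequence_score(y, emit, trans):
    """Unnormalized log score: sum of emission and transition potentials."""
    total = sum(emit[i][s] for i, s in enumerate(y))
    total += sum(trans[r][s] for r, s in zip(y, y[1:]))
    return total

# Made-up potentials for 3 positions and 2 tags
emit = [[0.5, 1.0], [2.0, -1.0], [0.0, 0.3]]
trans = [[0.2, -0.4], [1.5, 0.0]]

log_z = log_partition(emit, trans)
brute = logsumexp([sequence_score(y, emit, trans)
                   for y in itertools.product(range(2), repeat=3)])
print(abs(log_z - brute) < 1e-9)  # forward algorithm agrees with enumeration
```

The log-probability of any one sequence is then `sequence_score(y, emit, trans) - log_z`.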
![Page 11: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/11.jpg)
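Feature Functions
[Figure: chain over y_1, y_2, …, y_n with factors φ_e and φ_t]
$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$
‣ Phis can be almost anything! Here we use linear functions of sparse features
$$\phi_e(y_i, i, \mathbf{x}) = w^\top f_e(y_i, i, \mathbf{x}) \qquad \phi_t(y_{i-1}, y_i) = w^\top f_t(y_{i-1}, y_i)$$
$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$$
‣ Looks like our single weight vector multiclass logistic regression model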
![Page 12: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/12.jpg)
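Basic Features for NER
$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$$
Barack Obama will travel to Hangzhou today for the G20 meeting .
Tags shown: to = O, Hangzhou = B-LOC
Transitions: f_t(y_{i-1}, y_i) = Ind[y_{i-1} & y_i] = Ind[O → B-LOC]
Emissions: f_e(y_6, 6, x) = Ind[B-LOC & Current word = Hangzhou], Ind[B-LOC & Prev word = to]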
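The indicator features above can be sketched as sparse feature dictionaries. In this minimal Python example the feature-name strings, the helper names, and the weights in `w` are invented for illustration; note that the slide's position 6 is index 5 here because Python is 0-based.

```python
from collections import Counter

def f_t(prev_tag, tag):
    """Transition features: a single indicator on the tag pair."""
    return Counter({f"T:{prev_tag}->{tag}": 1})

def f_e(tag, i, x):
    """Emission features: indicators pairing the tag with the current and previous word."""
    feats = Counter({f"E:{tag}&cur={x[i]}": 1})
    if i > 0:
        feats[f"E:{tag}&prev={x[i - 1]}"] = 1
    return feats

def sequence_features(y, x):
    """Sum of f_e over all positions plus f_t over all adjacent tag pairs."""
    total = Counter()
    for i in range(len(x)):
        total.update(f_e(y[i], i, x))
        if i > 0:
            total.update(f_t(y[i - 1], y[i]))
    return total

def score(w, feats):
    """w^T f for a sparse feature vector; unseen features have weight 0."""
    return sum(w.get(name, 0.0) * value for name, value in feats.items())

x = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
y = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O", "O", "O", "B-ORG", "O", "O"]

print(sorted(f_e(y[5], 5, x)))  # the two emission features firing at "Hangzhou"
print(list(f_t(y[4], y[5])))    # ['T:O->B-LOC']

# Made-up weights; every other feature implicitly has weight 0
w = {"E:B-LOC&cur=Hangzhou": 2.0, "E:B-LOC&prev=to": 1.0, "T:O->B-LOC": 0.5}
print(score(w, sequence_features(y, x)))  # 3.5
```

Because the whole-sequence score is a single dot product with summed features, the model has the same shape as multiclass logistic regression over sequences.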
![Page 13: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/13.jpg)
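CRFs Outline
‣ Model:
$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$
$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]$$
‣ Inference: argmax P(y|x) from Viterbi
‣ Learning: requires running forward-backward (sum-product Viterbi) to compute posterior probabilities P(y_i = s | x) at each step i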
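For the inference step, a minimal Viterbi sketch over a chain CRF might look like the following. The table layout (`emit[i][s]`, `trans[r][s]`) and the numeric potentials are assumptions for illustration; in a real tagger they would come from w^T f_e and w^T f_t.

```python
def viterbi(emit, trans):
    """Return (best score, best tag-index sequence) for a chain CRF.

    emit[i][s]  = phi_e(s, i, x), score of tag s at position i
    trans[r][s] = phi_t(r, s), score of moving from tag r to tag s
    """
    n, S = len(emit), len(trans)
    best = list(emit[0])  # best[s]: score of the best prefix ending in tag s
    backptrs = []
    for i in range(1, n):
        new_best, ptrs = [], []
        for s in range(S):
            # Best previous tag for landing in tag s at position i
            r_star = max(range(S), key=lambda r: best[r] + trans[r][s])
            new_best.append(best[r_star] + trans[r_star][s] + emit[i][s])
            ptrs.append(r_star)
        best = new_best
        backptrs.append(ptrs)
    # Follow back-pointers from the best final tag
    last = max(range(S), key=lambda s: best[s])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return best[last], path

# Made-up potentials for 3 positions and 2 tags
emit = [[0.5, 1.0], [2.0, -1.0], [0.0, 0.3]]
trans = [[0.2, -0.4], [1.5, 0.0]]

best_score, path = viterbi(emit, trans)
print(path)  # [1, 0, 0]
```

Replacing each `max` with a log-sum-exp turns the same recurrence into the forward pass used for learning, which is the sense in which the slide calls it sum-product Viterbi.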
![Page 14: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/14.jpg)
Features for NER
‣ Context features (can't use in HMM!)
  ‣ Words before/after
  ‣ Tags before/after
‣ Word features (can use in HMM)
  ‣ Capitalization
  ‣ Word shape
  ‣ Prefixes/suffixes
  ‣ Lexical indicators
‣ Gazetteers
‣ Word clusters
Examples:
Leicestershire
Boston
Apple released a new version…
According to the New York Times…
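A few of these feature types can be sketched as a simple extractor. Everything here is illustrative: the shape scheme (X/x/d with repeats collapsed), the 3-character prefix and suffix length, the feature-name strings, and the `gazetteer` argument are assumptions rather than the lecture's exact feature set.

```python
import re

def word_shape(word):
    """Map characters to X/x/d classes and collapse repeats, e.g. 'G20' -> 'Xd'."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)

def word_features(x, i, gazetteer=frozenset()):
    """Feature names for token i; `gazetteer` is a hypothetical set of known entity names."""
    word = x[i]
    feats = ["word=" + word.lower(), "shape=" + word_shape(word),
             "prefix=" + word[:3], "suffix=" + word[-3:]]
    if word[:1].isupper():
        feats.append("capitalized")
    if word in gazetteer:
        feats.append("in_gazetteer")
    # Context features: neighboring words, which an HMM emission cannot see
    feats.append("prev=" + (x[i - 1].lower() if i > 0 else "<s>"))
    feats.append("next=" + (x[i + 1].lower() if i + 1 < len(x) else "</s>"))
    return feats

x = "Apple released a new version".split()
print(word_shape("Leicestershire"))  # Xx
print(word_features(x, 0))
```

For "Apple" the word-level features alone are ambiguous (it could be a fruit at the start of a sentence); the `next=released` context feature is the kind of evidence that points toward ORG.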
![Page 15: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/15.jpg)
Evaluating NER
Barack Obama will travel to Hangzhou today for the G20 meeting .
Entities: Barack Obama = PERSON, Hangzhou = LOC, G20 = ORG
BIO tags: B-PER I-PER O O O B-LOC O O O B-ORG O O
‣ Prediction of all Os still gets 66% accuracy on this example!
‣ What we really want to know: how many named entity chunk predictions did we get right?
  ‣ Precision: of the ones we predicted, how many are right?
  ‣ Recall: of the gold named entities, how many did we find?
  ‣ F-measure: harmonic mean of these two
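The gap between token accuracy and chunk-level scores can be checked with a short sketch. The helper names `bio_chunks` and `chunk_prf` are made up here, and the exact-match criterion (same label, same start, same end) is an assumption about how chunk correctness is counted.

```python
def bio_chunks(tags):
    """Set of (label, start, end) chunks in a BIO sequence, end exclusive."""
    chunks, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last chunk
        if tag.startswith("I-") and label == tag[2:]:
            continue
        if label is not None:
            chunks.add((label, start, i))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return chunks

def chunk_prf(gold_tags, pred_tags):
    """Chunk-level precision, recall, and F1 under exact span-and-label match."""
    gold, pred = bio_chunks(gold_tags), bio_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1

gold = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O", "O", "O", "B-ORG", "O", "O"]
all_o = ["O"] * len(gold)
token_acc = sum(g == p for g, p in zip(gold, all_o)) / len(gold)
print(round(token_acc, 3))     # 0.667: 8 of the 12 tokens are O
print(chunk_prf(gold, all_o))  # (0.0, 0.0, 0.0): no chunk is found

# A partial prediction: PER span cut short, LOC right, ORG missed
partial = ["B-PER", "O", "O", "O", "O", "B-LOC", "O", "O", "O", "O", "O", "O"]
p, r, f1 = chunk_prf(gold, partial)
```

In the partial prediction only the LOC chunk matches exactly, giving precision 1/2, recall 1/3, and F1 0.4 even though 10 of 12 tokens are tagged correctly.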
![Page 16: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/16.jpg)
NER
‣ CRF with lexical features can get around 85 F1 on this problem
‣ Other pieces of information that many systems capture
‣ World knowledge:
The delegation met the president at the airport, Tanjug said.
![Page 17: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/17.jpg)
Nonlocal Features
The delegation met the president at the airport, Tanjug said.
Tanjug: ORG? PER?
The news agency Tanjug reported on the outcome of the meeting.
‣ More complex factor graph structures can let you capture this, or just decode sentences in order and use features on previous sentences
Finkel and Manning (2008), Ratinov and Roth (2009)
![Page 18: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/18.jpg)
How well do NER systems do?
Ratinov and Roth (2009)
Lample et al. (2016)
BiLSTM-CRF + ELMo, Peters et al. (2018): 92.2
![Page 19: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/19.jpg)
Takeaways
‣ CRFs are structured feature-based models
‣ Efficient to do inference and learning using dynamic programs
‣ Looks like logistic regression, but requires more effort to implement
![Page 20: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/20.jpg)
Constituency Parsing
![Page 21: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/21.jpg)
Syntax
‣ Study of word order and how words form sentences
‣ Why do we care about syntax?
  ‣ Recognize verb-argument structures (who is doing what to whom?)
  ‣ Multiple interpretations of words (noun or verb? Fed raises… example)
  ‣ Higher level of abstraction beyond words: some languages are SVO, some are VSO, some are SOV, parsing can canonicalize
![Page 22: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/22.jpg)
Constituency Parsing
‣ Tree-structured syntactic analyses of sentences
‣ Common things: noun phrases, verb phrases, prepositional phrases
‣ Bottom layer is POS tags
‣ Examples will be in English. Constituency makes sense for a lot of languages but not all
![Page 23: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/23.jpg)
[Figure: example parse trees, annotated with: sentential complement, whole embedded sentence, adverbial phrase]
![Page 24: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/24.jpg)
Constituency Parsing
Examples
![Page 25: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/25.jpg)
Challenges
‣ PP attachment
  § If we do no annotation, these trees differ only in one rule:
    § VP → VP PP
    § NP → NP PP
  § Parse will go one way or the other, regardless of words
  § Lexicalization allows us to be sensitive to specific words
same parse as “the cake with some icing”
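The PP attachment ambiguity can be seen by counting parses under a toy grammar. The sketch below uses a CKY-style chart over binary rules; the grammar, the lexicon, and the example sentence "they ate the cake with a spoon" are assumptions assembled for illustration, not the grammar from the lecture.

```python
from collections import defaultdict

# Toy binary grammar (assumed); the last two VP/NP rules are the ones
# that distinguish the two attachments
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase
]
LEXICON = {
    "they": ["NP"], "ate": ["V"], "the": ["Det"], "a": ["Det"],
    "cake": ["N"], "spoon": ["N"], "with": ["P"],
}

def count_parses(words, root="S"):
    """Number of distinct trees rooted in `root` that span the whole sentence."""
    n = len(words)
    chart = defaultdict(int)  # chart[(i, j, X)]: trees with label X over words[i:j]
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[(i, i + 1, tag)] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for parent, left, right in BINARY:
                    chart[(i, j, parent)] += chart[(i, k, left)] * chart[(k, j, right)]
    return chart[(0, n, root)]

print(count_parses("they ate the cake with a spoon".split()))  # 2
print(count_parses("they ate the cake".split()))               # 1
```

The two parses differ only in whether `VP → VP PP` or `NP → NP PP` is used, and nothing in this unlexicalized grammar can prefer "with a spoon" attaching to the verb over "with some icing" attaching to the noun.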
![Page 26: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/26.jpg)
Challenges
‣ NP internal structure: tags + depth of analysis
![Page 27: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/27.jpg)
Constituency
‣ How do we know what the constituents are?
‣ Constituency tests:
  ‣ Substitution by proform (e.g., pronoun)
  ‣ Clefting (It was with a spoon that…)
  ‣ Answer ellipsis (What did they eat? the cake) (How? with a spoon)
‣ Sometimes constituency is not clear, e.g., coordination: she went to and bought food at the store
![Page 28: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/28.jpg)
Context-Free Grammars, CKY
![Page 29: CS378: Natural Language Processing Lecture 10: Seq 3 ...gdurrett/courses/sp2019/lectures/lec10-1pp.pdf · Barack Obama will travel to Hangzhou today for the G20 mee=ng . PERSON LOC](https://reader035.fdocuments.us/reader035/viewer/2022071011/5fc912df14ef0f43a247fda2/html5/thumbnails/29.jpg)
Survey
‣ 1. The pace of the first few lectures (naive Bayes, logistic regression, perceptron, etc.) was [too fast / too slow / just right]
‣ 2. The pace of the last few lectures (tagging, Viterbi, parsing) was [too fast / too slow / just right]
‣ 3. The homeworks overall are [too hard / too easy / just right]
‣ 4. I would prefer A3 be due on [Friday March 8 / Monday March 11] (midterm is on Thursday, March 14)
‣ 5. Other comments (likes/dislikes)