CS395T: Structured Models for NLP
Lecture 10: Trees 4
Greg Durrett
Administrivia
‣ Project 1 graded by late this week / this weekend
Recall: Eisner's Algorithm
‣ Left and right children are built independently; heads are at the edges of spans
‣ Complete item: all children are attached, head is at the "tall end"
‣ Incomplete item: arc from "tall end" to "short end", may still expect children
[Figure: example dependency parse of "the dog ran to the house" with POS tags (DT, NN, VBD, TO) and a ROOT arc]
Recall: MST Algorithm
‣ Eisner: search over the space of projective trees, O(n^3)
‣ MST: find the maximum directed spanning tree; finds nonprojective trees as well as projective trees, O(n^2)
‣ MST is restricted to features on single dependencies; Eisner can be generalized to incorporate higher-order features (grandparents, siblings, etc.) at a time-complexity cost, or with beaming
Recall: Transition-Based Parsing
‣ Arc-standard system: three operations
‣ Start: stack contains [ROOT], buffer contains [I ate some spaghetti bolognese]
‣ Shift: top of buffer -> top of stack
‣ Left-Arc: σ|w-2, w-1 → σ|w-1; w-2 is now a child of w-1
‣ Right-Arc: σ|w-2, w-1 → σ|w-2; w-1 is now a child of w-2
‣ End: stack contains [ROOT], buffer is empty []
‣ Must take 2n steps for n words (n Shifts, n LA/RA)
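As a concrete sketch (not the course's reference implementation), the three arc-standard operations fit in a few lines of Python; here the state is a hypothetical (stack, buffer, arcs) triple with arcs stored as (head, child) pairs:

    def apply_action(state, action):
        stack, buffer, arcs = state
        if action == "S":                      # Shift: top of buffer -> top of stack
            return (stack + [buffer[0]], buffer[1:], arcs)
        w2, w1 = stack[-2], stack[-1]          # the top two items on the stack
        if action == "LA":                     # Left-Arc: w-2 becomes a child of w-1
            return (stack[:-2] + [w1], buffer, arcs + [(w1, w2)])
        if action == "RA":                     # Right-Arc: w-1 becomes a child of w-2
            return (stack[:-2] + [w2], buffer, arcs + [(w2, w1)])
        raise ValueError("unknown action: " + action)

A real parser would also track word indices and check preconditions (e.g., LA/RA need two items on the stack).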
Recall: Transition-Based Parsing
[Figure: step-by-step arc-standard derivation for "I ate some spaghetti bolognese": S, S takes (stack [ROOT ate], buffer [some spaghetti bolognese]) to (stack [ROOT ate some spaghetti], buffer [bolognese]); LA then attaches "some" to "spaghetti", giving (stack [ROOT ate spaghetti], buffer [bolognese])]
‣ S: top of buffer -> top of stack
‣ LA: pop two, left arc between them
‣ RA: pop two, right arc between them
This Lecture
‣ Global decoding
‣ Early updating
‣ Connections to reinforcement learning, dynamic oracles
‣ State-of-the-art dependency parsers, related tasks
Greedy Training: Static States
‣ Greedy: each box forms a training example (s, a*)
[Figure: state space from start state to gold end state; boxes along the gold path, with bad alternative decisions branching off]
Global Decoding
‣ Greedy parser: trained to make the right decision (S, LA, RA) from any gold state we might come to
‣ Why might this be bad?
‣ What are we optimizing when we decode each sentence?
‣ Nothing… we're just repeatedly executing:
    a_best = argmax_a w^T f(s, a);  s <- a_best(s)
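In code, greedy decoding is just this argmax in a loop; a minimal sketch assuming a learned score(s, a) = w^T f(s, a), a legal_actions helper, and the apply_action sketch above (all hypothetical names):

    def greedy_parse(sentence, score, legal_actions):
        state = (["ROOT"], list(sentence), [])   # start: stack [ROOT], buffer holds the words
        for _ in range(2 * len(sentence)):       # 2n transitions for n words
            # a_best = argmax_a w^T f(s, a), then s <- a_best(s)
            a_best = max(legal_actions(state), key=lambda a: score(state, a))
            state = apply_action(state, a_best)
        return state[2]                          # the predicted arcs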
Global Decoding
‣ Example: "I gave him dinner", from the state (stack [ROOT gave him], buffer [dinner])
‣ Correct: Right-arc, Shift, Right-arc, Right-arc
[Figure: the resulting states: (stack [ROOT gave], buffer [dinner]) with him attached; (stack [ROOT gave dinner], buffer []); (stack [ROOT gave], buffer []) with him and dinner attached]
Global Decoding: A Cartoon
[Figure: from (stack [ROOT gave him], buffer [dinner]) for "I gave him dinner", the three successor states under S, LA, and RA]
‣ S, LA: both wrong! Also both probably low scoring!
‣ RA then S: correct, high-scoring option
Global Decoding: A Cartoon
[Figure: the same state (stack [ROOT gave him], buffer [dinner]) for "I gave him dinner"]
‣ Lookahead can help us avoid getting stuck in bad spots
‣ Global model: maximize the sum of scores over all decisions
‣ Similar to how Viterbi works: we maintain uncertainty over the current state so that if another one looks more optimal going forward, we can use that one
Global Shift-Reduce Parsing
[Figure: state (stack [ROOT gave him], buffer [dinner]) for "I gave him dinner"]
‣ Greedy: repeatedly execute a_best = argmax_a w^T f(s, a); s <- a_best(s)
‣ Global: argmax over full derivations of sum_{i=1}^{2n} w^T f(s_i, a_i), where s_{i+1} = a_i(s_i)
‣ Can we do search exactly? How many states s are there?
‣ No! Use beam search
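A sketch of the beam-search approximation, reusing the hypothetical helpers above: keep the k highest-scoring partial derivations at each step instead of a single state:

    import heapq

    def beam_parse(sentence, score, legal_actions, beam_size=8):
        # Each hypothesis is (cumulative score, state); scores add up w^T f(s_i, a_i)
        beam = [(0.0, (["ROOT"], list(sentence), []))]
        for _ in range(2 * len(sentence)):
            successors = [(total + score(state, a), apply_action(state, a))
                          for total, state in beam
                          for a in legal_actions(state)]
            beam = heapq.nlargest(beam_size, successors, key=lambda h: h[0])
        return max(beam, key=lambda h: h[0])     # argmax = top of the last beam

With beam_size=1 this reduces exactly to the greedy loop.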
Global Shift-Reduce Parsing
[Figure: two steps of beam search from (stack [ROOT gave him], buffer [dinner]); hypotheses carry cumulative scores (-1.2, +0.9, -3.0, -2.0); the hypothesis reached via RA then S, (stack [ROOT gave dinner] with him attached, buffer []), scores highest at +2.0]
‣ Beam search gave us the lookahead to make the right decision
Training Global Parsers
‣ Can compute approximate maxes with beam search:
    argmax over derivations of sum_{i=1}^{2n} w^T f(s_i, a_i)
‣ Structured perceptron: normal decode, gradient = gold feats - guess feats
‣ Structured SVM: do loss-augmented decode, gradient = gold feats - guess feats
‣ What happens if we set beam size = 1?
Global Training
for each epoch:
    for each sentence:
        for i = 1 … 2*len(sentence):                # 2n transitions in arc-standard
            beam[i] = compute_successors(beam[i-1])
        prediction = beam[2*len(sentence)][0]        # argmax = top of the last beam
        apply_gradient_update(feats(gold) - feats(prediction))   # feats are cumulative over the whole sentence
Global Training
[Figure: state space with start state, gold end state, and a predicted end state on a diverging path]
‣ In global training, we keep going if we screw up!
‣ Learn negative weights for features in these states; greedy training would never see these states
Global vs. Greedy
[Figure: state space with start state and gold end state]
‣ Greedy: 2n local training examples
‣ Global: one global example
‣ In global training, we keep going if we screw up!
Early Updating

Early Updating (Collins and Roark, 2004)
[Figure: state space from start state to gold end state; one early decision was bad, but the decisions after it might've been good! Hard to tell]
Early Updating (Collins and Roark, 2004)
[Figure: parsing "I gave him dinner". In a wrong state (stack [ROOT gave dinner], him attached, buffer []), RA puts in the good arc gave -> dinner, leaving stack [ROOT gave]]
‣ Wrong state: we already messed up!
‣ Made the best of a bad situation by putting a good arc in (gave -> dinner)
‣ Ideally we don't want to penalize this decision (update away from it); instead just penalize the decision that was obviously wrong
Early Updating
‣ Solution: make an update as soon as the gold parse falls off the beam
‣ Update: gold feats - guess feats, computed up to this point
Early Updating
[Figure: beam search for "I gave him dinner" from (stack [ROOT gave him], buffer [dinner]); the hypotheses on the gold path now score low (-2.0, -3.0) while wrong hypotheses score higher (+0.9, +1.0)]
‣ Gold has fallen off the beam!
‣ Update: gold feats - pred feats
Training with Early Updating
for each epoch:
    for each sentence:
        gold_fell_off = false
        for i = 1 … 2*len(sentence):                # 2n transitions in arc-standard
            beam[i] = compute_successors(beam[i-1])
            if beam[i] does not contain gold:
                apply_gradient_update(feats(gold[0:i]) - feats(beam[i][0]))   # feats are cumulative up until this point
                gold_fell_off = true
                break
        if not gold_fell_off:
            apply_gradient_update(feats(gold) - feats(beam[2*len(sentence)][0]))   # gold survived to the end but may still not be one-best
Connections to Reinforcement Learning
Motivation
‣ Part of the benefit is that we see states we wouldn't have seen during greedy decoding
‣ (Still true even with early updating, due to beam search)
Better Greedy Algorithm
for each epoch:
    for each sentence:
        parse the sentence with the current weights
        for each state s in the parse:
            determine what the right action a* was
            train on this example (update towards f(s, a*), away from f(s, a_pred))
‣ How do we determine this?
Dynamic Oracles (Goldberg and Nivre, 2012)
‣ When you make some bad decisions, how do you dig yourself out?
‣ Score of decision a in state s leading to s': loss(a) = loss(best_possible_tree(s')) - loss(best_possible_tree(s))
‣ best_possible_tree(s): computes the optimal decision sequence from state s to the end, resulting in the lowest overall loss
‣ Implemented by a bunch of logic that looks at the tree: "if we put a right-arc from a -> b, we can't give b any more children, so lose a point for every unbound child; also lose a point if a isn't b's head…"
‣ a* = argmin_a loss(a)
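Plugging the oracle into the better greedy algorithm above gives something like this sketch (oracle_cost is a hypothetical stand-in for Goldberg and Nivre's tree-inspection logic; the other helper names are the ones assumed earlier):

    def train_with_dynamic_oracle(sentence, gold_tree, score, legal_actions,
                                  oracle_cost, update):
        state = (["ROOT"], list(sentence), [])
        for _ in range(2 * len(sentence)):
            actions = legal_actions(state)
            a_pred = max(actions, key=lambda a: score(state, a))
            # a* = argmin_a loss(a): the cheapest action from the *current* state,
            # even if earlier mistakes have already ruled out the gold tree
            a_star = min(actions, key=lambda a: oracle_cost(state, a, gold_tree))
            if a_pred != a_star:
                update(state, a_star, a_pred)    # toward f(s, a*), away from f(s, a_pred)
            state = apply_action(state, a_pred)  # keep following the model's own path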
Connections to Reinforcement Learning
‣ Markov Decision Process: states s, actions a, transitions T, rewards r, discount factor γ
‣ T is deterministic for us, γ = 1 (no discount)
‣ One reward system: r = 1 if an action is what the dynamic oracle says, 0 otherwise
‣ Maximize the sum of rewards over the parse
‣ Using the "better greedy algorithm" corresponds to on-policy learning here
‣ But dynamic oracles are hard to build :(
Searn (Daumé et al., 2009)
‣ Searn: framework for turning structured problems into classification problems
‣ What if we just had a loss function l(y, y*) that scored whole predictions? I.e., all reward comes at the end
‣ Take the current policy (= weights), generate states s by running that policy on a given example
‣ Evaluate action a in state s by taking a, then following your current policy to completion and computing the loss (= the best possible loss is approximated by the current policy)
‣ DAGGER algorithm from the RL literature
Motivation
[Figure: run the current policy to generate states s; evaluate actions a by rolling out to completed predictions y1, y2, y3 and computing losses ℓ(y1, y*), ℓ(y2, y*), ℓ(y3, y*) against the gold y*]
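A sketch of the rollout in the figure, again with hypothetical names: to evaluate action a in state s, take it, let the current policy finish the parse, and score the completed prediction with l(y, y*):

    def rollout_loss(state, action, policy, sentence_loss, gold_tree, steps_left):
        # Take the candidate action, then let the current policy finish the parse
        state = apply_action(state, action)
        for _ in range(steps_left - 1):
            state = apply_action(state, policy(state))
        return sentence_loss(state[2], gold_tree)   # l(y, y*) on the completed arcs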
Global Models vs. RL
‣ Structured prediction problems aren't really "RL" in that the environment dynamics are understood
‣ RL techniques are usually not the right thing to do unless your loss function and state space are really complicated
‣ Otherwise, best to use dynamic oracles or global models
‣ These issues arise far beyond parsing! Coreference, machine translation, dialogue systems, …
State-of-the-art Parsers
State-of-the-art Parsers
‣ 2005: MSTParser got solid performance (~91 UAS)
‣ 2010: Koo's 3rd-order parser was SOTA for graph-based (~93 UAS)
‣ 2012: Maltparser was SOTA for transition-based (~90 UAS), similar to what you'll build
‣ 2014: Chen and Manning got 92 UAS with a transition-based neural model
State-of-the-art Parsers
[Figure: the neural architecture of Chen and Manning (2014)]
Parsey McParseFace (Andor et al., 2016)
‣ Current state of the art, released publicly by Google
‣ 94.61 UAS on the Penn Treebank using a global transition-based system with early updating
‣ Feedforward neural nets looking at words and POS associated with:
    ‣ words at the top of the stack
    ‣ those words' children
    ‣ words in the buffer
‣ Feature set pioneered by Chen and Manning (2014); Google fine-tuned it
‣ Additional data harvested via "tri-training"
Stack LSTMs (Dyer et al., 2015)
‣ Use LSTMs over the stack, buffer, and past action sequence. Trained greedily
‣ Slightly less good than Parsey
Semantic Role Labeling
‣ Another kind of tree-structured annotation, like a subset of dependency parsing
‣ Verb roles from PropBank (Palmer et al., 2005); nominal predicates too
[Figure from He et al. (2017): example roles, e.g. for the predicate "quicken"]
Abstract Meaning Representation (Banarescu et al., 2014)
‣ Graph-structured annotation
[Figure: AMR graph for "The boy wants to go"]
‣ Superset of SRL: full-sentence analyses; contains coreference and multi-word expressions as well
‣ F1 scores in the 60s: hard!
‣ So comprehensive that it's hard to predict, but it still doesn't handle tense or some other things…
Takeaways
‣ Global training is an alternative to greedy training
‣ Use beam search for inference, combined with early updating, for best results
‣ Dynamic oracles + following the predicted path in the state space looks like reinforcement learning
Survey
‣ Pace of last lecture + this lecture: [too slow] [just right] [too fast]
‣ Pace of the class overall: [too slow] [just right] [too fast]
‣ Write one thing you like about the class
‣ Write one thing you don't like about the class