CS388: Natural Language Processing
Lecture 10: Syntax I
Greg Durrett
Slides adapted from Dan Klein, UC Berkeley
Recall: CNNs vs. LSTMs
‣ Both LSTMs and convolutional layers transform the input using context
[Figure: "the movie was good" processed two ways — a CNN (n×k input, c filters of size m×k each, n×c output) and a BiLSTM with hidden size c (n×k input, n×2c output)]
‣ LSTM: "globally" looks at the entire sentence (but local for many problems)
‣ CNN: local, depending on filter width + number of layers
Recall: CNNs
[Figure: "the movie was good" — n×k input, c filters of size m×k each, n×c feature map, max pooling over the sentence gives a c-dimensional vector, then projection W + softmax gives P(y|x)]
‣ Max pooling: return the max activation of a given filter over the entire sentence; like a logical OR (sum pooling is like a logical AND)
Recall: Neural CRFs
Barack Obama will travel to Hangzhou today for the G20 meeting.
[Figure: entity spans PERSON (Barack Obama), LOC (Hangzhou), ORG (G20), with BIO tags B-PER I-PER O O O B-LOC O O O B-ORG O over the sentence]
1) Compute f(x)
2) Run forward-backward
3) Compute error signal
4) Backprop (no knowledge of sequential structure required)
This Lecture
‣ Constituency formalism
‣ Context-free grammars and the CKY algorithm
‣ Refining grammars
‣ Discriminative parsers
Constituency
Syntax
‣ Study of word order and how words form sentences
‣ Why do we care about syntax?
‣ Recognize verb-argument structures (who is doing what to whom?)
‣ Multiple interpretations of words (noun or verb? Fed raises… example)
‣ Higher level of abstraction beyond words: some languages are SVO, some are VSO, some are SOV; parsing can canonicalize
Constituency Parsing
‣ Tree-structured syntactic analyses of sentences
‣ Common constituents: noun phrases, verb phrases, prepositional phrases
‣ Bottom layer is POS tags
‣ Examples will be in English. Constituency makes sense for a lot of languages, but not all
[Figure: example tree with a sentential complement, a whole embedded sentence, and an adverbial phrase labeled]
Constituency Parsing
The rat the cat chased squeaked
I raced to Indianapolis, unimpeded by traffic
Challenges
‣ PP attachment
§ If we do no annotation, these trees differ only in one rule: VP → VP PP vs. NP → NP PP
§ Parse will go one way or the other, regardless of words
§ Lexicalization allows us to be sensitive to specific words
same parse as "the cake with some icing"
Challenges
‣ NP internal structure: tags + depth of analysis
Constituency
‣ How do we know what the constituents are?
‣ Constituency tests:
‣ Substitution by proform (e.g., pronoun)
‣ Clefting (It was with a spoon that…)
‣ Answer ellipsis (What did they eat? the cake) (How? with a spoon)
‣ Sometimes constituency is not clear, e.g., coordination: she went to and bought food at the store
Context-Free Grammars, CKY
CFGs and PCFGs
§ Write symbolic or logical rules:
§ Use deduction systems to prove parses from words
§ Minimal grammar on "Fed raises" sentence: 36 parses
§ Simple 10-rule grammar: 592 parses
§ Real-size grammar: many millions of parses
§ This scaled very badly and didn't yield broad-coverage tools
Grammar (CFG)
ROOT → S
S → NP VP
NP → DT NN
NP → NN NNS
NP → NP PP
VP → VBP NP
VP → VBP NP PP
PP → IN NP

Lexicon
NN → interest
NNS → raises
VBP → interest
VBZ → raises
…
‣ Context-free grammar: symbols which rewrite as one or more symbols
‣ Lexicon consists of "preterminals" (POS tags) rewriting as terminals (words)
‣ A CFG is a tuple (N, T, S, R): N = nonterminals, T = terminals, S = start symbol (generally a special ROOT symbol), R = rules
‣ PCFG: probabilities associated with rewrites, normalized by source symbol
[Figure: the rules above annotated with probabilities (1.0, 0.5, 0.3, 0.7, 0.2, …) that normalize over rules sharing a left-hand-side symbol]
Estimating PCFGs
‣ Maximum likelihood PCFG: count and normalize! Same as HMMs / Naive Bayes
S → NP VP 1.0
NP → PRP 0.5
NP → DT NN 0.5
…
‣ Tree T is a series of rule applications r: P(T) = ∏_{r∈T} P(r | parent(r))
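As a minimal sketch of count-and-normalize estimation (the nested-tuple tree encoding here is illustrative, not from the lecture):

```python
from collections import Counter

def estimate_pcfg(trees):
    """Maximum likelihood PCFG: count each rule application, then
    normalize counts by their left-hand-side (parent) symbol."""
    rule_counts = Counter()
    parent_counts = Counter()

    def collect(node):
        label, children = node  # node = (label, [subtrees or words])
        if isinstance(children[0], tuple):
            rule_counts[(label, tuple(c[0] for c in children))] += 1
            for child in children:
                collect(child)
        else:
            # preterminal rewriting as a word: a lexicon entry
            rule_counts[(label, tuple(children))] += 1
        parent_counts[label] += 1

    for tree in trees:
        collect(tree)
    return {rule: count / parent_counts[rule[0]]
            for rule, count in rule_counts.items()}

# Example: (S (NP (PRP She)) (VP (VBD slept)))
tree = ("S", [("NP", [("PRP", ["She"])]), ("VP", [("VBD", ["slept"])])])
print(estimate_pcfg([tree]))  # every rule has probability 1.0 here
```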
Binarization
‣ To parse efficiently, we need our PCFGs to be at most binary (not CNF)
Example: VP → VBD NP PP PP, as in "sold the book to her for $3", with P(VP → VBD NP PP PP) = 0.2
‣ Lossless: VP → VBD VP-[NP PP PP], VP-[NP PP PP] → NP VP-[PP PP], VP-[PP PP] → PP PP
‣ Lossy: VP → VBD VP, VP → NP VP, VP → PP PP
P(VP → VBZ PP) = 0.1 …
Chomsky Normal Form
‣ Lossless: P(VP → VBD VP-[NP PP PP]) = 0.2, P(VP-[NP PP PP] → NP VP-[PP PP]) = 1.0, P(VP-[PP PP] → PP PP) = 1.0
‣ Deterministic symbols make this the same as before
‣ Lossy: P(VP → VBD VP) = 0.2, P(VP → NP VP) = 0.03, P(VP → PP PP) = 0.001
‣ Makes different independence assumptions; not the same PCFG
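A minimal sketch of the lossless binarization above (the bracketed intermediate symbol names mirror the slide; the function itself is illustrative):

```python
def binarize(lhs, rhs):
    """Lossless right-binarization: intermediate symbols record the
    children still to be generated, so no information is lost."""
    parent, cur, items, rules = lhs, lhs, list(rhs), []
    while len(items) > 2:
        rest = items[1:]
        new_sym = f"{parent}-[{' '.join(rest)}]"
        rules.append((cur, (items[0], new_sym)))
        cur, items = new_sym, rest
    rules.append((cur, tuple(items)))
    return rules

print(binarize("VP", ["VBD", "NP", "PP", "PP"]))
# [('VP', ('VBD', 'VP-[NP PP PP]')),
#  ('VP-[NP PP PP]', ('NP', 'VP-[PP PP]')),
#  ('VP-[PP PP]', ('PP', 'PP'))]
```

Because each intermediate symbol is deterministic given the original rule, assigning its rewrite probability 1.0 leaves the PCFG unchanged; collapsing the intermediate symbols back to VP gives the lossy variant.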
CKY
[Figure: CKY chart over "He wrote a long report on Mars" with NP and PP entries combining into a larger NP]
‣ Find argmax_T P(T|x) = argmax_T P(T, x)
‣ Dynamic programming: chart maintains the best way of building symbol X over span (i, j)
‣ Loop over all split points k, apply rules X → Y Z to build X in every possible way
‣ CKY (Cocke-Kasami-Younger) = Viterbi; there is also an algorithm called inside-outside = forward-backward
[Figure: X over span (i, j) built from Y over (i, k) and Z over (k, j)]
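A minimal Viterbi CKY sketch (the `lexicon` and `grammar` dictionary formats are assumptions for illustration, not the lecture's code):

```python
import math
from collections import defaultdict

def cky(words, lexicon, grammar):
    """chart[(i, j)][X] = best log-prob of building X over span (i, j);
    back holds the split point and children for recovering the tree.
    lexicon: word -> {tag: log-prob}; grammar: (Y, Z) -> [(X, log-prob)]."""
    n = len(words)
    chart = defaultdict(dict)
    back = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = dict(lexicon[word])
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # all split points
                for Y, py in chart[(i, k)].items():
                    for Z, pz in chart[(k, j)].items():
                        for X, pr in grammar.get((Y, Z), []):
                            score = py + pz + pr
                            if score > chart[(i, j)].get(X, -math.inf):
                                chart[(i, j)][X] = score
                                back[(i, j, X)] = (k, Y, Z)
    return chart, back
```

Following the backpointers down from (0, n, ROOT) recovers the argmax tree; the runtime is O(n^3) times a grammar constant.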
Unary Rules
[Figure: unary chains in treebank trees, e.g., SBAR → S over "the rat the cat chased squeaked" and NP → NNS over "mice"]
‣ Unary productions in the treebank need to be dealt with by parsers
‣ Binary trees over n words have at most n-1 nodes, but you can have unlimited numbers of nodes with unaries (S → SBAR → NP → S → …)
‣ In practice: enforce at most one unary over each span and modify CKY accordingly (see the sketch below)
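One common way to realize this, as a hedged sketch extending the CKY chart above: after filling a span's binary entries, make a single unary pass over that cell (`unary_rules` maps a child symbol Y to candidate parents X with log-probs; the format is illustrative):

```python
def apply_unaries_once(cell, unary_rules):
    """One pass of unary rules X -> Y over a chart cell {Y: log-prob}.
    Running this exactly once per span caps unary chains at length 1."""
    updates = {}
    for Y, py in cell.items():
        for X, pr in unary_rules.get(Y, []):
            score = py + pr
            if score > max(cell.get(X, float("-inf")),
                           updates.get(X, float("-inf"))):
                updates[X] = score
    cell.update(updates)
```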
Results
Klein and Manning (2003)
‣ Standard dataset for English: Penn Treebank (Marcus et al., 1993)
‣ Evaluation: F1 over labeled constituents of the sentence
‣ Vanilla PCFG: ~75 F1
‣ Best PCFGs for English: ~90 F1
‣ Other languages: results vary widely depending on annotation + complexity of the grammar
‣ SOTA: 95 F1
Refining Generative Grammars
PCFG Independence Assumptions
Rewrite   All NPs   NPs under S   NPs under VP
NP PP     11%       9%            23%
DT NN     9%        9%            7%
PRP       6%        21%           4%
‣ Language is not context-free: NPs in different contexts rewrite differently
‣ Can we make the grammar "less context-free"?
Rule Annotation
‣ Vertical (parent) annotation: add the parent symbol to each node; can do grandparents too
§ Vertical Markov order: rewrites depend on past k ancestor nodes (cf. parent annotation)
[Figure: parsing F1 (roughly 72-79%) and grammar symbol counts (up to ~25,000) as vertical Markov order varies over 1, 2v, 2, 3v, 3; higher order improves F1 but greatly increases the number of symbols]
Klein and Manning (2003)
‣ Like a trigram HMM tagger, incorporates more context
[Figure: parsing F1 (roughly 70-74%) and grammar symbol counts (up to ~12,000) as horizontal Markov order varies over 0, 1, 2v, 2, ∞]
Klein and Manning (2003)
‣ Horizontal annotation: remember the states of multi-arity rules during binarization (see the sketch below)
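As an illustrative sketch (using the remaining-children encoding from the earlier binarization example, which is an assumption; treebank parsers typically track previously generated siblings instead):

```python
def markovize(parent, remaining, k):
    """Horizontal Markov order k: an intermediate binarization symbol
    remembers only k children, so many distinct symbols merge.
    k = 0 collapses everything to the bare parent (the lossy extreme);
    remembering everything is the lossless extreme."""
    if k == 0:
        return parent
    kept = remaining[:k]
    suffix = " ..." if len(remaining) > k else ""
    return f"{parent}-[{' '.join(kept)}{suffix}]"

print(markovize("VP", ["NP", "PP", "PP"], 0))  # VP
print(markovize("VP", ["NP", "PP", "PP"], 1))  # VP-[NP ...]
```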
Annotated Tree
Klein and Manning (2003)
‣ 75 F1 with a basic PCFG => 86.3 F1 with this highly customized PCFG (SOTA was 90 F1 at the time, but with more complex methods)
Lexicalized Parsers
§ What's different between basic PCFG scores here?
§ What (lexical) correlations need to be scored?
‣ Even with parent annotation, these trees have the same rules. Need to use the words
Lexicalized Parsers
§ Add "head words" to each phrasal node
§ Syntactic vs. semantic heads
§ Headship not in (most) treebanks
§ Usually use head rules, e.g.:
§ NP: take leftmost NP; take rightmost N*; take rightmost JJ; take right child
§ VP: take leftmost VB*; take leftmost VP; take left child
‣ Annotate each grammar symbol with its "head word": most important word of that constituent
‣ Rules for identifying headwords (e.g., the last word of an NP before a preposition is typically the head); see the sketch below
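A minimal sketch of applying head rules like the NP/VP heuristics above (the dictionary format and fallback defaults are illustrative assumptions):

```python
# Each rule: try the tests in order; the first match is the head child.
HEAD_RULES = {
    "NP": [("leftmost", "NP"), ("rightmost", "N"), ("rightmost", "JJ")],
    "VP": [("leftmost", "VB"), ("leftmost", "VP")],
}

def find_head(label, child_labels):
    """Return the index of the head child for a node, given the labels
    of its children (e.g., NP with children [DT, JJ, NN])."""
    for direction, prefix in HEAD_RULES.get(label, []):
        indices = range(len(child_labels))
        if direction == "rightmost":
            indices = reversed(indices)
        for i in indices:
            if child_labels[i].startswith(prefix):
                return i
    # Fallback: right child for NP, left child otherwise
    return len(child_labels) - 1 if label == "NP" else 0

print(find_head("NP", ["DT", "JJ", "NN"]))   # 2: rightmost N*
print(find_head("VP", ["VBD", "NP", "PP"]))  # 0: leftmost VB*
```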
‣ Collins and Charniak (late 90s): ~89 F1 with these
Discriminative Parsers
CRF Parsing
[Figure: two parses of "He wrote a long report on Mars." — the PP "on Mars" attaches either to the NP ("report on Mars") or to the VP ("wrote ... on Mars")]
CRF Parsing
Taskar et al. (2004); Hall, Durrett, and Klein (2014); Durrett and Klein (2015)
‣ Score each anchored rule application: score(NP → NP PP, (2, 5, 7)) = w⊤ f(NP → NP PP, (2, 5, 7)) for the NP spanning "a long report on Mars" in "He wrote a long report on Mars."
‣ Example feature: [left child last word = report ∧ rule = NP → NP PP]
‣ Can learn that report [PP] is common due to reporting on things
‣ Can "neuralize" this as well, like neural CRFs for NER
Joint Discrete and Continuous Parsing
Durrett and Klein (ACL 2015)
Parsing a sentence:
‣ Discrete feature computation
‣ Feedforward pass on nets
‣ Run CKY dynamic program
‣ Chart remains discrete!
[Figure: chart over "He wrote a long report on Mars" where each anchored rule gets a discrete + continuous score]
Neural CRF Parsing
Stern et al. (2017), Kitaev et al. (2018)
‣ Simpler version: score constituents rather than rule applications
‣ score(NP, (2, 7)) = w⊤ f(NP, (2, 7)), e.g., for the NP "a long report on Mars" in "He wrote a long report on Mars."
‣ Use BiLSTMs (Stern) or self-attention (Kitaev) to compute span embeddings (see the sketch below)
‣ 91-93 F1; 95 F1 with ELMo (SOTA). Great on other languages too!
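A hedged sketch of the span-scoring idea in PyTorch (shapes, layer sizes, and the exact span representation are assumptions; real models add boundary features and a structured loss):

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    """Score (span, label) pairs from BiLSTM states, in the spirit of
    Stern et al. (2017): span (i, j) is represented by differences of
    forward states (f_j - f_i) and backward states (b_i - b_j)."""
    def __init__(self, dim_word, dim_hidden, num_labels):
        super().__init__()
        self.lstm = nn.LSTM(dim_word, dim_hidden,
                            bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * dim_hidden, dim_hidden),
                                 nn.ReLU(),
                                 nn.Linear(dim_hidden, num_labels))

    def forward(self, word_embs, i, j):
        # word_embs: (1, n, dim_word); assumes 0 < i < j < n for brevity
        out, _ = self.lstm(word_embs)        # (1, n, 2 * dim_hidden)
        fwd, bwd = out[0].chunk(2, dim=-1)   # forward / backward halves
        span = torch.cat([fwd[j] - fwd[i], bwd[i] - bwd[j]], dim=-1)
        return self.mlp(span)                # one score per label
```

These scores plug directly into the same CKY dynamic program as before, since the chart only needs a score for each labeled span.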
Takeaways
‣ PCFGs estimated generatively can perform well if sufficiently engineered
‣ Neural CRFs work well for constituency parsing
‣ Next time: revisit lexicalized parsing as dependency parsing
Survey
‣ Write one thing you like about the class
‣ Write one thing you don't like about the class