Syntactic Parsing - Sameer...

Post on 16-Oct-2020


Syntactic Parsing

Prof. Sameer Singh
CS 295: STATISTICAL NLP

WINTER 2017

February 7, 2017

Based on slides from Nathan Schneider, Noah Smith, Marine Carpuat, Dan Jurafsky, and everyone else they copied from.

Outline

CS 295: STATISTICAL NLP (WINTER 2017)

Syntactic Parsing

Context Free Grammars

Parsing: CKY Algorithm


Limitations of Sequence Tags

John Smith shot Bill in his pajamas.

What happened? Who shot who? Who was wearing the pajamas?

Using http://nlp.stanford.edu:8080/corenlp/process

Constituents

• Constituents behave as a unit that can be rearranged:

John talked [to the children] [about drugs].
John talked [about drugs] [to the children].
*John talked drugs to the children about

• Or substituted/expanded:

John talked [to the children taking the drugs] [about alcohol].

Harry the Horse
a high-class spot such as Mindy’s
the Broadway coppers
the reason he comes into the Hot Box
they
three parties from Brooklyn

X

arrive(s)
attract(s)
love(s)
sit(s)

“Noun phrases appear before verbs in English.”

Constituents and Grammars

Grammar

• Tells you how the constituents can be arranged
• Implicit knowledge for us (we often can’t tell why something is wrong)
• Generate all, and only, the possible sentences of the language
• Different from meaning:

Colorless green ideas sleep furiously.

• The words are in the right order,
• And that ideas are green and colorless,
• And that ideas sleep,
• And that sleeping is done furiously,
• As opposed to: “sleep green furiously ideas colorless”

Uses of Parsing

• Grammar checkers
• Dialog systems
• High precision question answering
• Named entity recognition
• Sentence compression
• Extracting opinions about products
• Improved interaction in computer games
• Helping linguists find data
• Machine translation
• Relation extraction systems

[send [the text message from James] [to Sharon]]

[translate [the message] [from Hindi] [to English]]


Basic Grammar: Regular Expr.

• You can capture individual words:
  • (man|dog|cat)

• Simple sentences:
  • (man|dog|cat) (ate|loves|consumed) (.|food|lunch)

• Infinite length? Yes!
  • men (who like (cats|dogs))* cry.

Finite State Machine

[Figure: finite state machine with states Start, S1, S2, S3, and End; transitions labeled men, who like, cats, dogs, and cry]

But too weak for English.
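The "infinite length" pattern on this slide can be tried directly with a regular-expression engine. The sketch below is only an illustration: the function name `accepts` is made up here, and the exact spacing inside the pattern is an assumption, since the transcript lost the original whitespace.

```python
import re

# Assumed rendering of the slide's pattern: "men (who like (cats|dogs))* cry."
# The starred group may repeat any number of times, so sentence length is unbounded.
PATTERN = re.compile(r"men( who like (cats|dogs))* cry\.")


def accepts(sentence):
    """Return True if the whole sentence matches the pattern."""
    return PATTERN.fullmatch(sentence) is not None


if __name__ == "__main__":
    for s in ["men cry.",
              "men who like cats who like dogs cry.",
              "men who like cry."]:
        print(s, "->", accepts(s))
```

The repetition is flat: each "who like ..." clause is appended one after another, which is the kind of structure a finite state machine can track.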

Context-Free Grammars

Grammar, G
Terminal Symbols
Non-terminal Symbols
Rules

The grammar applies rules recursively. If we can construct the input sentence, it is in the grammar; otherwise it is not.
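To make "applies rules recursively" concrete, here is a minimal sketch that stores a grammar as a dictionary of rules and enumerates every sentence it can construct. The toy grammar, the names `RULES`, `expand`, and `in_grammar` are all hypothetical and not taken from the slides; the grammar is deliberately non-recursive so that the enumeration terminates.

```python
from itertools import product

# Hypothetical toy grammar. Keys are non-terminals; any symbol that is
# not a key is treated as a terminal (a word).
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
    "Det": [["the"]],
    "Noun": [["dog"], ["cat"]],
    "Verb": [["sleeps"], ["sees"]],
}


def expand(symbol):
    """Yield every word list derivable from `symbol` (finite grammars only)."""
    if symbol not in RULES:          # terminal: nothing left to rewrite
        yield [symbol]
        return
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]


LANGUAGE = {" ".join(words) for words in expand("S")}


def in_grammar(sentence):
    """A sentence is in the grammar if it can be constructed from S."""
    return sentence in LANGUAGE


if __name__ == "__main__":
    print(len(LANGUAGE), "sentences")
    print(in_grammar("the dog sees the cat"))
    print(in_grammar("dog the sees"))
```

Enumerating the whole language only works for tiny grammars like this one; a recursive rule would make the language infinite, which is why parsing algorithms work from the input sentence instead.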

Example CFG

Example Parse Tree

I prefer a morning flight.

Example Parse Tree: Brackets

I prefer a morning flight.
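The bracketed form of a parse tree is just a linearization of the tree. The sketch below shows one way to produce it from a nested tuple. The particular tree and its labels (Pro, Nom, and so on) are an assumed textbook-style analysis of the sentence, not something recovered from the slide, and the helper names `to_brackets` and `leaves` are illustrative.

```python
# Assumed analysis of "I prefer a morning flight" as (label, child, ...) tuples.
# Leaves are plain strings.
TREE = ("S",
        ("NP", ("Pro", "I")),
        ("VP",
         ("Verb", "prefer"),
         ("NP",
          ("Det", "a"),
          ("Nom", ("Noun", "morning"), ("Nom", ("Noun", "flight"))))))


def to_brackets(tree):
    """Render a tree as a bracketed string such as [NP [Det a] ...]."""
    if isinstance(tree, str):        # a word
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(to_brackets(c) for c in children) + "]"


def leaves(tree):
    """Return the words of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


if __name__ == "__main__":
    print(to_brackets(TREE))
    print(" ".join(leaves(TREE)))
```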

More details: Noun Phrases

Simple Noun Phrases

NP → ProperNoun
NP → Det Nominal
Nominal → Noun | Noun Nominal

Complex Noun Phrases

“all the morning flights from Denver to Tampa leaving before 10”

Recursive Noun Phrases

this is the house

this is the house that Jack built

this is the cat that lives in the house that Jack built

this is the dog that chased the cat that lives in the house that Jack built

this is the flea that bit the dog that chased the cat that lives in the house that Jack built

this is the virus that infected the flea that bit the dog that chased the cat that lives in the house that Jack built

More details: Verb Phrases

Simple Verb Phrases

VP → Verb          disappear
VP → Verb NP       prefer a morning flight
VP → Verb NP PP    leave Boston in the morning
VP → Verb PP       leave in the morning

But all verbs are not the same! (this grammar overgenerates)

Sneezed: John sneezed.
Find: Please find a flight to NY.
Give: Give me a cheaper fare.
Help: Can you help me with a flight?
Prefer: I prefer to leave earlier.
Told: I was told United has a flight.

Solution: subcategorize!
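One way to picture subcategorization is a lexicon that records, per verb, which complement sequences it accepts. The sketch below is a rough illustration only: the frame labels (such as `VPto` and `S`) and the table `SUBCAT` are assumptions read off the example sentences above, not frames given on the slide.

```python
# Assumed complement frames, inferred from the example sentences:
# each verb maps to the set of complement sequences it licenses.
SUBCAT = {
    "sneezed": {()},                 # John sneezed.
    "find": {("NP",)},               # find [a flight to NY]
    "give": {("NP", "NP")},          # give [me] [a cheaper fare]
    "help": {("NP", "PP")},          # help [me] [with a flight]
    "prefer": {("VPto",)},           # prefer [to leave earlier]
    "told": {("S",)},                # told [United has a flight]
}


def licenses(verb, complements):
    """Return True if `verb` accepts exactly this complement sequence."""
    return tuple(complements) in SUBCAT.get(verb, set())


if __name__ == "__main__":
    print(licenses("sneezed", []))           # intransitive use
    print(licenses("sneezed", ["NP"]))       # "*John sneezed the book"
    print(licenses("give", ["NP", "NP"]))
```

With such a table, a rule like VP → Verb NP would apply only to verbs whose entry contains the frame ("NP",), which is how the overgeneration is reduced.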

Types of Sentences

Declarative         A plane left.                S → NP VP

Imperative          Show the plane.              S → VP

Yes/no Questions    Did the plane leave?         S → Aux NP VP

Wh-Questions        When did the plane leave?    S → WhNP Aux NP VP

Source of Grammar?

Write symbolic grammar (CFG or often richer) and lexicon

S → NP VP         NN → interest
NP → (DT) NN      NNS → rates
NP → NN NNS       NNS → raises
NP → NNP          VBP → interest
VP → V NP         VBZ → rates

Used grammar/proof systems to prove parses from words

Fed raises interest rates 0.5% in effort to control inflation
◦ Minimal grammar: 36 parses
◦ Simple 10 rule grammar: 592 parses
◦ Real-size broad-coverage grammar: millions of parses

Manual

Noam Chomsky

Source of Grammar?

From data!

The Penn Treebank

Building a treebank seems a lot slower and less useful than building a grammar

But a treebank gives us many things
• Reusability of the labor
  ◦ Many parsers, POS taggers, etc.
  ◦ Valuable resource for linguistics
• Broad coverage
• Frequencies and distributional information
• A way to evaluate systems

[Marcus et al. 1993, Computational Linguistics]

( (S
    (NP-SBJ (DT The) (NN move))
    (VP (VBD followed)
      (NP
        (NP (DT a) (NN round))
        (PP (IN of)
          (NP
            (NP (JJ similar) (NNS increases))
            (PP (IN by)
              (NP (JJ other) (NNS lenders)))
            (PP (IN against)
              (NP (NNP Arizona) (JJ real) (NN estate) (NNS loans))))))
      (, ,)
      (S-ADV
        (NP-SBJ (-NONE- *))
        (VP (VBG reflecting)
          (NP
            (NP (DT a) (VBG continuing) (NN decline))
            (PP-LOC (IN in)
              (NP (DT that) (NN market)))))))
    (. .)))

Some of the rules, with counts

40717 PP → IN NP
33803 S → NP-SBJ VP
22513 NP-SBJ → -NONE-
21877 NP → NP PP
20740 NP → DT NN
14153 S → NP-SBJ VP .
12922 VP → TO VP
11881 PP-LOC → IN NP
11467 NP-SBJ → PRP
11378 NP → -NONE-
11291 NP → NN
...
989 VP → VBG S
985 NP-SBJ → NN
983 PP-MNR → IN NP
983 NP-SBJ → DT
969 VP → VBN VP

100 VP → VBD PP-PRD
100 PRN → : NP :
100 NP → DT JJS
100 NP-CLR → NN
99 NP-SBJ-1 → DT NNP
98 VP → VBN NP PP-DIR
98 VP → VBD PP-TMP
98 PP-TMP → VBG NP
97 VP → VBD ADVP-TMP VP
...
10 WHNP-1 → WRB JJ
10 VP → VP CC VP PP-TMP
10 VP → VP CC VP ADVP-MNR
10 VP → VBZ S , SBAR-ADV
10 VP → VBZ S ADVP-TMP

4500 rules for VP!
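Rule counts like these can be obtained by reading each bracketed tree and tallying one rule per internal node. The sketch below is a minimal, dependency-free illustration under stated assumptions: the helper names `parse` and `count_rules` are made up here, the input is a shortened tree without the treebank's empty outer bracket, and rules that rewrite a tag to a word are skipped.

```python
import re
from collections import Counter


def parse(text):
    """Parse one bracketed tree into (label, children) tuples; words are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()


def count_rules(tree, counts):
    """Add one count per phrase-level rule (parent -> child labels)."""
    label, children = tree
    if all(isinstance(c, tuple) for c in children):   # skip tag -> word
        counts[(label, tuple(c[0] for c in children))] += 1
        for child in children:
            count_rules(child, counts)
    return counts


# Shortened example tree, adapted for illustration.
EXAMPLE = "(S (NP-SBJ (DT The) (NN move)) (VP (VBD followed) (NP (DT a) (NN round))) (. .))"

if __name__ == "__main__":
    for (lhs, rhs), n in count_rules(parse(EXAMPLE), Counter()).items():
        print(n, lhs, "→", " ".join(rhs))
```

Run over a whole treebank, the same counter would accumulate totals of the kind listed above.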

Evaluating Parses

Each parse tree is represented by a list of tuples:

Use this to estimate precision/recall!
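As a rough sketch of how tuples lead to precision and recall, assume each tuple is (label, start, end) for one constituent; the tuple layout shown on the original slide is not preserved in this transcript, so this format is an assumption. The function name `bracket_prf` and the two example parses are hypothetical, and duplicate constituents are ignored for simplicity.

```python
def bracket_prf(gold, predicted):
    """Precision, recall, and F1 over labeled spans (label, start, end)."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical parses of "Book that flight" (word positions 0..3).
GOLD = [("S", 0, 3), ("VP", 0, 3), ("NP", 1, 3)]
PRED = [("S", 0, 3), ("NP", 0, 1), ("NP", 1, 3)]

if __name__ == "__main__":
    p, r, f = bracket_prf(GOLD, PRED)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of the three predicted constituents appear in the gold parse, and two of the three gold constituents were found, so precision and recall are both 2/3.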

Evaluating Parses: Example


The Parsing Problem

Given sentence x and grammar G,

Recognition
Is sentence x in the grammar? If so, prove it.
“Proof” is a deduction, valid parse tree.

Parsing
Show one or more derivations for x in G.

Even with small grammars, brute force grows exponentially!

“Book that flight”

Top Down Parsing

“Book that flight”

Considers only valid trees
But are inconsistent with the words!
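A small recursive recognizer makes the top-down trade-off visible: it starts from S and only ever builds trees the grammar allows, but it may expand rules that the words immediately contradict. The grammar below is a hypothetical toy (no left recursion, so the search terminates), and the name `derive` is illustrative.

```python
# Hypothetical toy grammar; symbols that are not keys are words.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "VP": [["Verb", "NP"], ["Verb"]],
    "NP": [["Det", "Noun"]],
    "Det": [["that"], ["the"]],
    "Noun": [["book"], ["flight"]],
    "Verb": [["book"]],
}


def derive(symbols, words):
    """True if the symbol sequence can be rewritten into exactly `words`."""
    if not symbols:
        return not words                     # success only if all words are used
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:                 # terminal: must match the next word
        return bool(words) and words[0] == first and derive(rest, words[1:])
    # Non-terminal: try every rule, leftmost symbol first.
    return any(derive(rhs + rest, words) for rhs in GRAMMAR[first])


if __name__ == "__main__":
    print(derive(["S"], "book that flight".split()))
    print(derive(["S"], "flight that book".split()))
```

For "book that flight" the first attempt, S → NP VP, is a valid tree shape that fails as soon as Det is compared with "book"; the recognizer then backtracks to S → VP.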

Bottom-up Parsing

Builds only consistent trees
But most of them are invalid (don’t go anywhere)!

“Book that flight”

Chomsky Normal Form

Context free grammar where all non-terminals go to:
- 2 non-terminals, or
- A single terminal

A → B C
D → w

Converting to CNF

Case 1

A → B
B → C D
B → w

becomes

A → C D
A → w

Case 2

A → B C D E

becomes

A → X E
X → Y D
Y → B C
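The two cases above can be sketched as two small functions. This is an illustrative sketch, not a full CNF converter: the names `eliminate_units` and `binarize` are made up, rules are (left-hand side, right-hand-side tuple) pairs, and the fresh symbols X and Y are supplied by the caller.

```python
def eliminate_units(rules, nonterminals):
    """Case 1: replace unit rules A -> B by copying B's non-unit rules up to A."""
    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    result = set()
    for a in nonterminals:
        # Collect every non-terminal reachable from `a` through unit rules.
        reach, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for lhs, rhs in rules:
                if lhs == x and is_unit(rhs) and rhs[0] not in reach:
                    reach.add(rhs[0])
                    stack.append(rhs[0])
        for lhs, rhs in rules:
            if lhs in reach and not is_unit(rhs):
                result.add((a, rhs))
    return result


def binarize(lhs, rhs, fresh):
    """Case 2: split a long rule into binary rules, folding from the left."""
    rules, rhs = [], list(rhs)
    while len(rhs) > 2:
        new = next(fresh)                    # fresh non-terminal name
        rules.append((new, (rhs[0], rhs[1])))
        rhs = [new] + rhs[2:]
    rules.append((lhs, tuple(rhs)))
    return rules


if __name__ == "__main__":
    case1 = [("A", ("B",)), ("B", ("C", "D")), ("B", ("w",))]
    print(sorted(eliminate_units(case1, {"A", "B", "C", "D"})))
    print(binarize("A", ["B", "C", "D", "E"], iter(["Y", "X"])))
```

The unit-rule function keeps the rules for B as well, since B may still be used elsewhere in a larger grammar.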

Original Grammar    Chomsky Normal Form

Dynamic Programming

table[i,j] = Set of all valid non-terminals for the constituent span (i,j)

Recursion

Rule: A → B C

[Figure: A spans (i,j); its children B and C span (i,k) and (k,j)]

If you find a k such that B is in table[i,k], and C is in table[k,j], then A should be in table[i,j]

Base case

Rule: A → word[j]

[Figure: A spans (j-1,j), directly above word[j]]

A should be in table[j-1,j]

CKY Algorithm

Book the flight through TWA
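The table-filling recursion can be sketched as a short recognizer. The grammar here is a hypothetical CNF fragment chosen only so that the example sentence parses; it is not the grammar from the lecture, and the names `BINARY`, `LEXICAL`, `cky`, and `recognizes` are illustrative.

```python
from collections import defaultdict

# Hypothetical CNF grammar. BINARY maps (B, C) to every A with a rule A -> B C.
BINARY = {
    ("Verb", "NP"): {"S", "VP"},
    ("VP", "PP"): {"S", "VP"},
    ("Det", "Nominal"): {"NP"},
    ("Nominal", "PP"): {"Nominal"},
    ("Prep", "NP"): {"PP"},
}
# LEXICAL maps a word to every A with a rule A -> word.
LEXICAL = {
    "book": {"Verb"},
    "the": {"Det"},
    "flight": {"Nominal"},
    "through": {"Prep"},
    "twa": {"NP"},
}


def cky(words):
    """Fill table[i, j] with all non-terminals that can span words i..j."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):
        # Base case: word[j] (1-indexed on the slide) is words[j - 1] here.
        table[j - 1, j] |= LEXICAL.get(words[j - 1].lower(), set())
        # Recursion: widen spans ending at j, trying every split point k.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b in table[i, k]:
                    for c in table[k, j]:
                        table[i, j] |= BINARY.get((b, c), set())
    return table


def recognizes(sentence):
    """True if S covers the whole sentence."""
    words = sentence.split()
    return "S" in cky(words)[0, len(words)]


if __name__ == "__main__":
    print(recognizes("Book the flight through TWA"))
    print(recognizes("the flight book"))
```

Each cell is computed once from smaller cells, so the work is cubic in sentence length rather than exponential. Storing back-pointers alongside each non-terminal would turn this recognizer into a parser.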


Upcoming…

Homework

• Homework 2 is due in a week: February 13, 2017
• Homework 1 grades will be available tonight

Project

• Proposal is due tonight
• Only 2 pages

Summaries

• Paper summaries: February 17, February 28, March 14
• Only 1 page each