Syntactic Parsing

• Produce the correct syntactic parse tree for a sentence.

Programming languages

    printf ("/charset [%s", (re_opcode_t) *(p - 1) == charset_not ? "^" : "");
    assert (p + *p < pend);
    for (c = 0; c < 256; c++)
      if (c / 8 < *p && (p[1 + (c/8)] & (1 << (c % 8))))
        {
          /* Are we starting a range? */
          if (last + 1 == c && ! inrange)
            {
              putchar ('-');
              inrange = 1;
            }
          /* Have we broken a range? */
          else if (last + 1 != c && inrange)
            {
              putchar (last);
              inrange = 0;
            }
          if (! inrange)
            putchar (c);
          last = c;
        }

Easy to parse.

Designed that way!

Natural languages

• No {} () [] to indicate scope & precedence

• Lots of overloading (arity varies)

• Grammar isn't known in advance!

• Ambiguity

The same code, stripped of its delimiters, is far harder to parse:

    printf "/charset %s", re_opcode_t *p - 1 == charset_not ? "^" : "";
    assert p + *p < pend; for c = 0; c < 256; c++ if c / 8 < *p &&
    p1 + c/8 & 1 << c % 8 Are we starting a range? if last + 1 == c &&
    ! inrange putchar '-'; inrange = 1; Have we broken a range? else if
    last + 1 != c && inrange putchar last; inrange = 0; if ! inrange
    putchar c; last = c;

The parsing problem

[Figure: the parsing problem: test sentences and a grammar feed into a PARSER; the trees it outputs are compared by a scorer against the correct test trees to yield an accuracy figure.]

Recent parsers quite accurate

… good enough to help NLP tasks!

Applications of parsing (1/2)

Machine translation (Alshawi 1996, Wu 1997, ...)

[Figure: an English parse tree is mapped by tree operations to a Chinese tree.]

Speech synthesis from parses (Prevost 1996)

"The government plans to raise income tax." vs. "The government plans to raise income tax the imagination."

Speech recognition using parsing (Chelba et al. 1998)

Put the file in the folder.

Put the file and the folder.

Applications of parsing (2/2)

Grammar checking (Microsoft)

Information extraction (Hobbs 1996)

[Figure: information extracted from the NY Times archive populates a database, which can then be queried.]

Context Free Grammars (CFG)

• N, a set of non-terminal symbols (or variables)

• Σ, a set of terminal symbols (disjoint from N)

• R, a set of productions or rules of the form A → β, where A is a non-terminal and β is a string of symbols from (Σ ∪ N)*

• S, a designated non-terminal called the start symbol

Simple CFG for English

Grammar:

    S → NP VP
    S → Aux NP VP
    S → VP
    NP → Pronoun
    NP → Proper-Noun
    NP → Det Nominal
    Nominal → Noun
    Nominal → Nominal Noun
    Nominal → Nominal PP
    VP → Verb
    VP → Verb NP
    VP → VP PP
    PP → Prep NP

Lexicon:

    Det → the | a | that | this
    Noun → book | flight | meal | money
    Verb → book | include | prefer
    Pronoun → I | he | she | me
    Proper-Noun → Houston | NWA
    Aux → does
    Prep → from | to | on | near | through
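To make this concrete, here is the toy grammar and lexicon above written down as a Python dictionary (a sketch of our own; the name GRAMMAR and the encoding are our choices, not part of the slides). Later sketches in this transcript reuse it.

    # The toy English grammar above as Python data.
    # Each non-terminal maps to a list of possible right-hand sides;
    # the lexicon is folded in as pre-terminals expanding to single words.
    GRAMMAR = {
        "S":       [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
        "NP":      [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal"]],
        "Nominal": [["Noun"], ["Nominal", "Noun"], ["Nominal", "PP"]],
        "VP":      [["Verb"], ["Verb", "NP"], ["VP", "PP"]],
        "PP":      [["Prep", "NP"]],
        "Det":         [["the"], ["a"], ["that"], ["this"]],
        "Noun":        [["book"], ["flight"], ["meal"], ["money"]],
        "Verb":        [["book"], ["include"], ["prefer"]],
        "Pronoun":     [["I"], ["he"], ["she"], ["me"]],
        "Proper-Noun": [["Houston"], ["NWA"]],
        "Aux":         [["does"]],
        "Prep":        [["from"], ["to"], ["on"], ["near"], ["through"]],
    }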

Sentence Generation

• Sentences are generated by recursively rewriting the start symbol using the productions until only terminal symbols remain.

Derivation (parse tree) for "book the flight through Houston":

    [S [VP [Verb book]
           [NP [Det the]
               [Nominal [Nominal [Noun flight]]
                        [PP [Prep through]
                            [NP [Proper-Noun Houston]]]]]]]
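A sketch (ours, not from the slides) of this rewriting process over the GRAMMAR dictionary above: expand each non-terminal with a randomly chosen production until only words remain.

    import random

    def generate(symbol="S"):
        """Recursively rewrite `symbol` until only terminal words remain."""
        if symbol not in GRAMMAR:              # a terminal: an actual word
            return [symbol]
        rhs = random.choice(GRAMMAR[symbol])   # pick one production for this symbol
        words = []
        for child in rhs:
            words.extend(generate(child))
        return words

    # e.g. print(" ".join(generate())) might yield "does the flight include a meal".
    # Recursive rules (Nominal -> Nominal PP, VP -> VP PP) mean generation can
    # occasionally recurse deeply before terminating.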

Parsing

• Given a string of terminals and a CFG, determine if the string can be generated by the CFG.
  – Also return a parse tree for the string.
  – Also return all possible parse trees for the string.

• Must search the space of derivations for one that derives the given string.
  – Top-Down Parsing: start searching the space of derivations from the start symbol.
  – Bottom-Up Parsing: start searching the space of reverse derivations from the terminal symbols in the string.

Parsing Example

Parse tree for "book that flight":

    [S [VP [Verb book]
           [NP [Det that]
               [Nominal [Noun flight]]]]]

Top-Down Parsing

• Expand rules, starting with S and working down to the leaves.
  – Replace the left-most non-terminal with each of its possible expansions.

• While we guarantee that any parse in progress will be S-rooted, the parser will expand non-terminals that can't lead to the existing input.
  – None of the trees take the properties of the lexical items into account until the last stage.

Expansion techniques

• Breadth-First Expansion: all the nodes at each level are expanded once before going to the next (lower) level.
  – This is memory-intensive when many grammar rules are involved.

• Depth-First Expansion: expand a particular node at a level, only considering an alternate node at that level if the parser fails as a result of the earlier expansion.
  – i.e., expand the tree all the way down until you can't expand any more.

Top-Down Depth-First Parsing

• There are still some choices that have to be made:

• 1. Which leaf node should be expanded first?
  – A left-to-right strategy moves through the leaf nodes in a left-to-right fashion.

• 2. Which rule should be applied first?
  – There are multiple NP rules; which should be used first?
  – Can just use the textual order of rules from the grammar.
  – There may be reasons to take rules in a particular order (e.g., probabilities).

Top-Down Depth-First Parsing with an Agenda

• Search states are kept in an agenda.
  – Search states consist of partial trees and a pointer to the next input word in the sentence.

• Based on what we've seen before, apply the next item on the agenda to the current tree.

• Add new items to (the front of) the agenda, based on the rules in the grammar which can expand at the (leftmost) node.
  – We maintain the depth-first strategy by adding new hypotheses (rules) to the front of the agenda.
  – If we added them to the back, we would have a breadth-first strategy (see the sketch below).
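A minimal sketch (our illustration, not from the slides) of this agenda-driven search over the GRAMMAR dictionary above. Search states here are (frontier, i) pairs: the symbols still to be matched, and a pointer to the next input word. Pushing new hypotheses onto the front of the agenda gives depth-first search; appending to the back gives breadth-first.

    from collections import deque

    def top_down_parse(words, depth_first=True, max_steps=100000):
        """Agenda-based top-down recognition. Returns True if words can be derived."""
        agenda = deque([(("S",), 0)])
        for _ in range(max_steps):
            if not agenda:
                return False                       # search space exhausted
            frontier, i = agenda.popleft()
            if not frontier:                       # nothing left to expand
                if i == len(words):
                    return True                    # matched the whole input
                continue
            sym, rest = frontier[0], frontier[1:]
            if sym in GRAMMAR:                     # expand the leftmost non-terminal
                new = [(tuple(rhs) + rest, i) for rhs in GRAMMAR[sym]]
                if depth_first:
                    agenda.extendleft(reversed(new))   # front of agenda: depth-first
                else:
                    agenda.extend(new)                 # back of agenda: breadth-first
            elif i < len(words) and words[i] == sym:   # terminal matches next word
                if depth_first:
                    agenda.appendleft((rest, i + 1))
                else:
                    agenda.append((rest, i + 1))
        return False   # gave up: left-recursive rules (VP -> VP PP) can loop forever

    # top_down_parse("book that flight".split()) -> True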

Top-down (depth-first) parsing

[Figure: trace of a top-down depth-first parse of "Does this flight include a meal?": the parser first tries S → NP VP, which fails, then S → Aux NP VP, matching "Does" as Aux and expanding NP → Det Nom to match "this", and so on.]

Top-down (breadth-first) parsing

[Figure: trace of top-down breadth-first parsing: in each ply, every S rule (S → NP VP, S → Aux NP VP, S → VP) is expanded, and then every applicable rule for NP (Det Nom, PropN), Aux, and VP (V NP, V) in turn.]

Bottom-Up Parsing

• Bottom-up parsing is input-driven: start from the words and move up to form a tree.

• Here we match one or more nodes on the upper fringe of the parse tree against the right-hand side of a CFG rule, building the left-hand side as a parent node of those nodes (see the sketch below).
  – We can also have breadth-first and depth-first approaches.
  – The example on the next slide (p. 362, Fig. 10.4) moves in a breadth-first fashion.

• While any parse in progress will be tied to the input, many may not lead to an S!
  – e.g., the left-most trees in plies 1–4 of the next figure.
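As a rough sketch (ours) of the fringe-matching idea: a naive breadth-first bottom-up recognizer repeatedly looks for a span of the current fringe that matches the right-hand side of some rule and replaces it with the rule's left-hand side, until only S remains.

    def bottom_up_parse(words):
        """Naive breadth-first bottom-up recognition over the GRAMMAR dictionary."""
        agenda = [tuple(words)]        # fringes: sequences of symbols covering the input
        seen = set(agenda)
        while agenda:
            fringe = agenda.pop(0)
            if fringe == ("S",):
                return True
            for lhs, rhss in GRAMMAR.items():
                for rhs in rhss:
                    n = len(rhs)
                    for i in range(len(fringe) - n + 1):
                        if list(fringe[i:i + n]) == rhs:       # RHS matches this span
                            reduced = fringe[:i] + (lhs,) + fringe[i + n:]
                            if reduced not in seen:            # avoid revisiting fringes
                                seen.add(reduced)
                                agenda.append(reduced)
        return False

    # bottom_up_parse("book that flight".split()) -> True
    # Note how many intermediate fringes never lead to an S, as the slide says.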

Bottom-up parsing

[Figure 10.4: ply-by-ply bottom-up parse of "Book that flight".]

Comparing Top-Down and Bottom-Up Parsing

• Top-Down: while we guarantee that any parse in progress will be S-rooted, the parser will expand non-terminals that can't lead to the existing input, e.g., the first 4 trees in the third ply.

• Bottom-Up: while any parse in progress will be tied to the input, many may not lead to an S, e.g., the left-most trees in plies 1–4 of the figure.

• So, both pure top-down and pure bottom-up approaches are highly inefficient.

Left-Corner Parsing

• Motivation:
  – Both pure top-down and bottom-up approaches are inefficient.
  – The correct top-down parse has to be consistent with the left-most word of the input.

• Left-corner parsing: a way of using bottom-up constraints as part of a top-down strategy.
  – Left-corner rule: expand a node with a grammar rule only if the current input can serve as the left corner of that rule.
  – Left corner of a rule: the first word along the left edge of a derivation from the rule.

• Put the left corners into a table, which can then guide parsing.

Left-Corner Example

Grammar:

    S → NP VP
    S → VP
    S → Aux NP VP
    NP → Det Nominal | ProperNoun
    Nominal → Noun Nominal | Noun
    VP → Verb | Verb NP
    Noun → book | flight | meal | money
    Verb → book | include | prefer
    Aux → does
    ProperNoun → Houston | TWA

Left corners:

    S       => Det, ProperNoun (via NP), Verb (via VP), Aux
    NP      => Det, ProperNoun
    VP      => Verb
    Nominal => Noun
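A sketch (ours) of how such a table can be computed: the left-corner relation is the transitive closure of "the first symbol of some production for A", read directly off the grammar.

    def left_corners(grammar):
        """Map each non-terminal to everything that can begin a derivation from it."""
        # Direct left corners: the first symbol of each right-hand side.
        corners = {A: {rhs[0] for rhs in rhss} for A, rhss in grammar.items()}
        changed = True
        while changed:                 # transitive closure:
            changed = False            # if B begins A and C begins B, then C begins A
            for A, cs in corners.items():
                for B in list(cs):
                    for C in corners.get(B, ()):
                        if C not in cs:
                            cs.add(C)
                            changed = True
        return corners

    # left_corners(GRAMMAR)["S"] contains Det, Proper-Noun, Verb and Aux (and, since
    # the lexicon is folded into GRAMMAR, also the words those categories expand to).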

Other problems: Left-Recursion

• Left-corner parsers are still guided by top-down parsing.

• Consider rules like:

    S → S and S
    NP → NP PP

  – A top-down, left-to-right, depth-first parser could apply a rule to expand a node (e.g., S), and then apply that same rule again, and again, ad infinitum.

• Left recursion: a grammar is left-recursive if a non-terminal leads to a derivation that includes itself as its leftmost immediate or non-immediate child (i.e., along its leftmost branch).

• PROBLEM: top-down parsers may not terminate on a left-recursive grammar.
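For illustration (ours): with the left-corner relation above, left recursion is easy to detect, since a grammar is left-recursive exactly when some non-terminal is its own left corner.

    def is_left_recursive(grammar):
        """True if some non-terminal can derive itself along its leftmost branch."""
        return any(A in cs for A, cs in left_corners(grammar).items())

    # is_left_recursive(GRAMMAR) -> True, because of Nominal -> Nominal Noun and
    # VP -> VP PP; this is why the top-down sketch above needed a step limit.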

Other problems: Repeated Parsing of Subtrees

• When the parser backtracks to an alternative expansion of a non-terminal, it loses all parses of sub-constituents that it built.
  – There is a good chance that it will rebuild the parses of some of those constituents again.
  – This can occur repeatedly.

• a flight from Indianapolis to Houston on TWA
  – NP → Det Nom
    • Will build an NP for "a flight", before failing when the parser realizes the input PPs aren't covered.
  – NP → NP PP
    • Will again build an NP for "a flight", before failing when the parser realizes the two remaining PPs in the input aren't covered.

[Figure: duplicated effort caused by backtracking in top-down parsing.]

Other problems: Ambiguity

• Repeated parsing of sub-trees is even more of a problem for ambiguous sentences.

• 2 kinds of ambiguities: attachment, coordination.
  – PP attachment:
    • NP or VP: I shot an elephant in my pajamas.
    • NP bracketing: the meal [on flight 286] [from SF] [to Denver]
  – Coordination:
    • [old men] and women vs. old [men and women]

• Parsers have to disambiguate between lots of valid parses or return all parses.
  – Using statistical, semantic, and pragmatic knowledge as the source of disambiguation.

• Local ambiguity: even if the sentence isn't ambiguous, parsing can be inefficient because of local ambiguity, e.g., the word "Book" in "Book that flight" (noun vs. verb).

Ambiguity (PP-attachment)

[Figure: the two parse trees for the PP-attachment ambiguity, built with the rules VP → VP PP and NP → NP PP.]

Addressing the problems: Dynamic Programming Parsing

• To avoid extensive repeated work, we must cache intermediate results, i.e., completed phrases.

• Caching (memoizing) is critical to obtaining a polynomial-time parsing (recognition) algorithm for CFGs.

• Dynamic programming algorithms based on both top-down and bottom-up search can achieve O(n³) recognition time, where n is the length of the input string.
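To make the caching idea concrete, here is a naive sketch (ours): compute, to a fixpoint, the set of spans each symbol can cover, so every completed phrase is derived once and then reused. CKY (next) organizes essentially the same computation so that it runs in O(n³).

    from collections import defaultdict

    def chart_recognize(grammar, words, start="S"):
        """Cache completed phrases: ends[(sym, i)] = all j with sym =>* words[i:j]."""
        n = len(words)
        ends = defaultdict(set)
        for i, w in enumerate(words):
            ends[(w, i)].add(i + 1)                  # a word covers itself
        changed = True
        while changed:                               # iterate to a fixpoint
            changed = False
            for A, rhss in grammar.items():
                for rhs in rhss:
                    for i in range(n + 1):
                        frontier = {i}               # thread positions through the RHS
                        for child in rhs:
                            frontier = {k for j in frontier for k in ends[(child, j)]}
                        for j in frontier:
                            if j not in ends[(A, i)]:
                                ends[(A, i)].add(j)  # a newly completed phrase
                                changed = True
        return n in ends[(start, 0)]

    # chart_recognize(GRAMMAR, "book that flight".split()) -> True, and unlike the
    # backtracking parsers it never rebuilds a phrase it has already completed.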

Dynamic Programming Parsing Methods

• The CKY (Cocke–Kasami–Younger) algorithm is based on bottom-up parsing and requires first normalizing the grammar.

• The Earley parser is based on top-down parsing and does not require normalizing the grammar, but is more complex.

• More generally, chart parsers retain completed phrases in a chart and can combine top-down and bottom-up search.

CKY

• First the grammar must be converted to Chomsky normal form (CNF), in which productions have either exactly 2 non-terminal symbols on the RHS or 1 terminal symbol (lexicon rules).

• Parse bottom-up, storing the phrases formed from all substrings in a triangular table (chart), as sketched below.
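A compact sketch (ours) of the CKY recognizer. The grammar is assumed to be in CNF and stored in two tables of our own devising: `lexical` maps a word to the non-terminals that can produce it, and `binary` maps a pair (B, C) to the non-terminals A with a rule A → B C.

    def cky_recognize(words, lexical, binary, start="S"):
        """CKY recognition: table[i][j] holds non-terminals covering words i+1..j."""
        n = len(words)
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for j in range(1, n + 1):
            table[j - 1][j] = set(lexical.get(words[j - 1], ()))   # lexicon rules
            for i in range(j - 2, -1, -1):        # widen the span, right edge fixed at j
                for k in range(i + 1, j):         # try every split point
                    for B in table[i][k]:
                        for C in table[k][j]:
                            table[i][j] |= binary.get((B, C), set())
        return start in table[0][n]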

English Grammar Conversion

Original Grammar:

    S → NP VP
    S → Aux NP VP
    S → VP
    NP → Pronoun
    NP → Proper-Noun
    NP → Det Nominal
    Nominal → Noun
    Nominal → Nominal Noun
    Nominal → Nominal PP
    VP → Verb
    VP → Verb NP
    VP → VP PP
    PP → Prep NP

Chomsky Normal Form:

    S → NP VP
    S → X1 VP
    X1 → Aux NP
    S → book | include | prefer
    S → Verb NP
    S → VP PP
    NP → I | he | she | me
    NP → Houston | NWA
    NP → Det Nominal
    Nominal → book | flight | meal | money
    Nominal → Nominal Noun
    Nominal → Nominal PP
    VP → book | include | prefer
    VP → Verb NP
    VP → VP PP
    PP → Prep NP
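The converted grammar above, encoded as the two tables the sketch expects (again our own encoding):

    LEXICAL = {
        "book":    {"S", "VP", "Verb", "Nominal", "Noun"},
        "include": {"S", "VP", "Verb"},
        "prefer":  {"S", "VP", "Verb"},
        "I": {"NP", "Pronoun"}, "he": {"NP", "Pronoun"},
        "she": {"NP", "Pronoun"}, "me": {"NP", "Pronoun"},
        "Houston": {"NP", "ProperNoun"}, "NWA": {"NP", "ProperNoun"},
        "the": {"Det"}, "a": {"Det"}, "that": {"Det"}, "this": {"Det"},
        "flight": {"Nominal", "Noun"}, "meal": {"Nominal", "Noun"},
        "money": {"Nominal", "Noun"},
        "does": {"Aux"},
        "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"},
        "near": {"Prep"}, "through": {"Prep"},
    }
    BINARY = {
        ("NP", "VP"):        {"S"},
        ("X1", "VP"):        {"S"},
        ("Aux", "NP"):       {"X1"},
        ("Verb", "NP"):      {"S", "VP"},
        ("VP", "PP"):        {"S", "VP"},
        ("Det", "Nominal"):  {"NP"},
        ("Nominal", "Noun"): {"Nominal"},
        ("Nominal", "PP"):   {"Nominal"},
        ("Prep", "NP"):      {"PP"},
    }

    # cky_recognize("book the flight through Houston".split(), LEXICAL, BINARY) -> True
    # Filling the chart by hand with these tables reproduces the walkthrough below.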

CKY Parser

"Book the flight through Houston"

Cell [i, j] contains all constituents (non-terminals) covering words i+1 through j. The chart is filled one cell at a time, column by column, left to right; the completed chart:

    [0,1] Book:    S, VP, Verb, Nominal, Noun
    [1,2] the:     Det
    [2,3] flight:  Nominal, Noun
    [3,4] through: Prep
    [4,5] Houston: NP, ProperNoun

    [0,2]: None
    [1,3]: NP       (Det Nominal)
    [2,4]: None
    [3,5]: PP       (Prep NP)

    [0,3]: VP, S    (Verb NP)
    [1,4]: None
    [2,5]: Nominal  (Nominal PP)

    [0,4]: None
    [1,5]: NP       (Det Nominal)

    [0,5]: VP, S    (each derived two ways, via Verb NP and via VP PP, yielding Parse Tree #1 and Parse Tree #2)

Complexity of CKY (recognition)

• There are n(n+1)/2 = O(n²) cells.

• Filling each cell requires looking at every possible split point between the two non-terminals needed to introduce a new phrase.

• There are O(n) possible split points.

• Total time complexity is O(n³).

Complexity of CKY (all parses)

• The previous analysis assumes the number of phrase labels in each cell is fixed by the size of the grammar.

• If we compute all derivations for each non-terminal, the number of cell entries can expand combinatorially.

• Since the number of parses can be exponential, so is the complexity of finding all parse trees.

Effect of CNF on Parse Trees

• The parse trees produced are for the CNF grammar, not the original grammar.

• A post-process can repair the parse tree to return a parse tree for the original grammar.

Syntactic Ambiguity

• CKY just produces all possible parse trees.

• It does not address the important issue of ambiguity resolution.

Earley Parsing

• Uses dynamic programming to implement top-down search.

• A single left-to-right pass fills a table called the chart (N+1 entries).

• 3 kinds of information in each entry:
  – A subtree corresponding to a single grammar rule
  – Information about the progress made in completing this subtree
  – The position of the subtree with respect to the input

Earley Parsing Representation

• The parser uses a representation of parse state based on dotted rules, e.g., S → • NP VP.

• Dotted rules distinguish what has been seen so far from what has not yet been seen (i.e., the remainder).
  – The constituents seen so far are to the left of the dot in the rule; the remainder is to the right.

• Parse information is stored in a chart, represented as a graph.
  – The nodes represent word positions.
  – The labels represent the portion (using the dot notation) of the grammar rule that spans those word positions.

• In other words, at each position there is a set of labels (each of which is a dotted rule, also called a state), indicating the partial parse trees produced so far.
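As a sketch (ours), a dotted rule plus its span can be represented as a small record; the Earley sketch further below uses this.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class State:
        """A dotted rule lhs -> rhs with `dot` symbols seen, spanning [start, end]."""
        lhs: str        # e.g. "S"
        rhs: tuple      # e.g. ("NP", "VP")
        dot: int        # number of RHS symbols seen so far
        start: int      # where this constituent begins in the input
        end: int        # current position of the dot in the input

        def complete(self):
            return self.dot == len(self.rhs)

        def next_symbol(self):
            return None if self.complete() else self.rhs[self.dot]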

Example: Chart for A Dog

• Given a trivial grammar:

    NP → D N
    D → a
    N → dog

• Here's the chart for the complete parse of "a dog":

    NP → • D N    [0,0]   (predict)
    D → a •       [0,1]   (scan)
    NP → D • N    [0,1]   (complete)
    N → dog •     [1,2]   (scan)
    NP → D N •    [0,2]   (complete)

More Earley Parsing Terminology

• A state is complete if it has a dot at the right-hand end of its rule. Otherwise, it is incomplete.

• At each position, there is a list (actually, a queue) of states.
  – The parser moves through the N+1 sets of states in the chart left-to-right, processing the states in each set in order.
  – States are stored in a FIFO (first-in first-out) queue at each start position.

• The processing applies one of three operators, each of which takes a state and produces new states added to the chart:
  – Scanner, Predictor, Completer
  – There is no backtracking.

Earley Parsing Algorithm

• In the top-level loop, for each position, for each state, it calls the predictor, or else the scanner, or else the completer.
  – The algorithm never backtracks and never removes states, so we don't redo any work.
  – The goal is to have S → α • as the last chart entry, i.e., the dot has moved over the entire input to derive an S.

The Earley Algorithm

[Figure: pseudocode of the Earley algorithm.]

Prediction

Procedure PREDICTOR((A → α • B β, [i, j]))
    For each (B → γ) in grammar do
        Enqueue((B → • γ, [j, j]), chart[j])
End

• Predicting is the task of saying what kinds of input we expect to see.
  – Add a rule to the chart saying that we have not seen γ, but when we do, it will form a B.
  – The rule covers no input, so it goes from j to j.

• Such rules provide the top-down aspect of the algorithm.

Scanning

Procedure SCANNER((A → α • B β, [i, j]))
    If B is a part-of-speech for word[j] then
        Enqueue((B → word[j] •, [j, j+1]), chart[j+1])
End

• Scanning reads in lexical items.
  – We add a dotted rule indicating that a word has been seen between j and j+1.
  – This is then added to the following (j+1) chart.

• Such a completed dotted rule can be used to complete other dotted rules.

• These rules also show how the Earley parser has a bottom-up component.

Completion

Procedure COMPLETER((B → γ •, [j, k]))
    For each (A → α • B β, [i, j]) in chart[j] do
        Enqueue((A → α B • β, [i, k]), chart[k])
End

• Completion combines two rules in order to move the dot, i.e., to indicate that something has been seen.
  – A rule covering B has been seen, so any rule A which refers to B in its RHS moves its dot.
  – Instead of spanning from i to j, A now spans from i to k, which is where B ended.

• Once the dot is moved, the rule will not be created again.
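Putting the three operators together, a compact Earley recognizer might look like this (our sketch, reusing the State record and the GRAMMAR dictionary from earlier; the dummy GAMMA start symbol is our own device).

    def earley_recognize(words, grammar=GRAMMAR, start="S"):
        """Earley recognition: no backtracking, no state is ever removed."""
        # Pre-terminals: symbols all of whose expansions are single words.
        pos = {A for A, rhss in grammar.items()
               if all(len(r) == 1 and r[0] not in grammar for r in rhss)}
        n = len(words)
        chart = [[] for _ in range(n + 1)]

        def enqueue(state, k):
            if state not in chart[k]:      # never create the same state twice
                chart[k].append(state)

        enqueue(State("GAMMA", (start,), 0, 0, 0), 0)   # dummy start state
        for j in range(n + 1):
            for state in chart[j]:          # chart[j] grows as we process it
                nxt = state.next_symbol()
                if nxt in pos:              # SCANNER: read a word of category nxt
                    if j < n and [words[j]] in grammar[nxt]:
                        enqueue(State(nxt, (words[j],), 1, j, j + 1), j + 1)
                elif nxt in grammar:        # PREDICTOR: expect nxt starting at j
                    for rhs in grammar[nxt]:
                        enqueue(State(nxt, tuple(rhs), 0, j, j), j)
                elif state.complete():      # COMPLETER: advance rules waiting on lhs
                    for s in chart[state.start]:
                        if s.next_symbol() == state.lhs:
                            enqueue(State(s.lhs, s.rhs, s.dot + 1, s.start, j), j)
        return any(s.lhs == "GAMMA" and s.complete() for s in chart[n])

    # earley_recognize("book that flight".split()) -> True. Unlike the naive top-down
    # sketch, the left-recursive VP -> VP PP and Nominal rules cause no looping here.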

Example: "Book that flight"

[Figures: the Earley chart for "Book that flight", built up entry by entry across several slides.]

Earley parsing

• The Earley algorithm is efficient, running in polynomial time.

• Technically, however, it is a recognizer, not a parser.
  – To make it a parser, each state needs to be augmented with pointers to the states that its rule covers.
  – For example, a VP would point to the state where its V was completed and the state where its NP was completed.

Conclusions

• Syntactic parse trees specify the syntactic structure of a sentence, which helps determine its meaning.
  – John ate the spaghetti with meatballs with chopsticks.
  – How did John eat the spaghetti? What did John eat?

• CFGs can be used to define the grammar of a natural language.

• Dynamic programming algorithms allow computing a single parse tree in cubic time, or all parse trees in exponential time.