LIN 6932: Topics in Computational Linguistics, Hana Filip
LIN 6932 1
LIN6932: Topics in Computational Linguistics
Hana Filip
LIN 6932 2
• Parsing with context-free grammars
LIN 6932 3
Grammar Equivalence and Chomsky Normal Form
• Weak equivalence
• Strong equivalence
LIN 6932 4
Grammar Equivalence and Chomsky Normal Form (CNF)
Many proofs in the field of languages and computability make use of the Chomsky Normal Form.
There are algorithms that decide whether a given string can be generated by a given grammar and that use the Chomsky Normal Form: e.g., the CYK (Cocke-Younger-Kasami) algorithm.
LIN 6932 5
Grammar Equivalence and Chomsky Normal Form (CNF)
• Chomsky Normal Form (CNF) is one of the most basic normal forms (roughly: in the context of computing and rewriting systems, a form that cannot be further reduced to a simpler form). In CNF each production (rewriting rule) has the form
A → B C or A → α
where
– A, B and C are nonterminal symbols
– α is a terminal symbol (i.e., a symbol that represents a constant value)
– productions (rewriting rules) are expansive: throughout the derivation of a string, each string of terminals and nonterminals is always either the same length or one element longer than the previous such string
LIN 6932 6
Grammar Equivalence and Chomsky Normal Form (CNF)
• For grammars in Chomsky Normal Form the parse tree is always a binary tree.
• We can talk about the relationship between:
– the depth of the parse tree, and
– the length of its yield.
LIN 6932 7
Grammar Equivalence and Chomsky Normal Form (CNF)
• If a parse tree for a word string w is generated by a CNF grammar and the parse tree
– has a path length of at most i,
– then the length of w is at most 2^(i-1).
LIN 6932 8
Grammar Equivalence and Chomsky Normal Form (CNF)
LIN 6932 9
Grammar Equivalence and Chomsky Normal Form (CNF)
Every grammar in Chomsky normal form is context-free, and conversely,
every context-free grammar can be efficiently transformed into an equivalent one which is in Chomsky normal form.
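One step of that transformation can be sketched in code (an assumed illustration, not from the slides; the rule encoding and the helper name to_cnf_binarize are made up): rules whose right-hand side is longer than two symbols are binarized by introducing fresh non-terminals.

```python
def to_cnf_binarize(rules):
    """Split every rule A -> X1 X2 ... Xk (k > 2) into binary rules
    by chaining through fresh non-terminals."""
    out = []
    fresh = 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new = f"{lhs}_{fresh}"            # fresh non-terminal
            out.append((lhs, [rhs[0], new]))  # A -> X1 A_1
            lhs, rhs = new, rhs[1:]           # continue with A_1 -> X2 ... Xk
        out.append((lhs, rhs))
    return out

binary = to_cnf_binarize([("S", ["NP", "VP", "PP"])])
# S -> NP S_1 ; S_1 -> VP PP
```

A full CNF conversion would also remove ε-productions and unit productions and split terminals out of long rules; this shows only the binarization step.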
LIN 6932 10
Grammar Equivalence and Chomsky Normal Form (CNF)
LIN 6932 11
Grammar Equivalence and Chomsky Normal Form (CNF)
If a CFG is in Chomsky Normal Form, you know exactly how many steps it takes to generate a string; in particular, you have an upper limit (deriving a string of length n takes exactly 2n - 1 rule applications).
LIN 6932 12
CFG for Fragment of English: G0
LIN 6932 13
Parse Tree for ‘Book that flight’ using G0
LIN 6932 14
FSA and Syntactic Parsing with CFGs
(see previous lecture: the types of formal grammar in the Chomsky Hierarchy, the class of languages they generate, and the types of finite state automata that recognize each class)
CFG rule: NP → (Det) Adj* N
LIN 6932 15
Parsing as a Search Problem
• parsing (linguistics: syntax analysis) is the process of analyzing a sequence of tokens to determine its grammatical structure with respect to a given formal grammar.
LIN 6932 16
Parsing as a Search Problem
• Searching FSAs
– Finding the right path through the automaton
– Search space defined by structure of FSA
• Searching CFGs
– Finding the right parse tree among all possible parse trees
– Search space defined by the grammar
• Constraints provided by
– the input sentence and
– the automaton or grammar
LIN 6932 17
Two Search Strategies
How can we use G0 to assign the correct parse tree(s) to a given string of words?
• Constraints provided by
– the input sentence and
– the automaton or grammar
Give rise to two search strategies:
• Top-Down (Hypothesis-Directed) Search– Search for tree starting from S until input words covered.
• Bottom-Up (Data-Directed) Search– Start with words and build upwards toward S
LIN 6932 18
Two Search Strategies
Search strategies and epistemology (the study of knowledge and justified belief; philosophy of science)
• Top-Down (Hypothesis-Directed) Search
– Search for tree starting from S until all input words covered
– Rationalist tradition: emphasizes the use of prior knowledge
• Bottom-Up (Data-Directed) Search
– Start with words and build upwards toward S
– Empiricist tradition: emphasizes the data
The rationalist vs. empiricist controversy concerns the extent to which we are dependent upon sense experience in our effort to gain knowledge
LIN 6932 19
Top-Down Parser
• Builds from the root S node down to the leaves
• Assuming we build all trees in parallel:
– Find all trees with root S
– Next expand all constituents in these trees/rules
– Continue until leaves are part of speech (POS) categories
– Candidate trees failing to match POS of input string are rejected
• Top-Down: Rationalist Tradition
– Expectation- or Theory-driven
– Goal: Build tree for input starting with S
LIN 6932 20
Top-Down Search Space for G0
LIN 6932 21
Bottom-Up Parsing
• The earliest known parsing algorithm (suggested by Yngve 1955)
• Parser begins with words of input and builds up trees, applying G0 rules whose right-hand sides match
• Book that flight
N Det N        V Det N
Book that flight    Book that flight
– ‘Book’ ambiguous
– Parse continues until an S root node is reached or no further node expansion is possible
• Bottom-Up: Empiricist Tradition
– Data driven
– Primary consideration: Lowest sub-trees of final tree must hook up with words in input.
LIN 6932 22
Expanding Bottom-Up Search Space for ‘Book that flight’
LIN 6932 23
Comparing Top-Down and Bottom-Up
• Top-Down parsers: never explore illegal parses (e.g. parses that can’t form an S) -- but waste time on trees that can never match the input
• Bottom-Up parsers: never explore trees inconsistent with input -- but waste time exploring illegal parses (no S root)
• For both: how to explore the search space?
– Pursuing all parses in parallel or …?
– Which node to expand next?
– Which rule to apply next?
LIN 6932 24
A Possible Top-Down Parsing Strategy
• Depth-first search:
– start at the root (selecting some node as the root in the graph case) and expand as far as possible until
– you reach a state (tree) inconsistent with the input, then backtrack to the most recent unexplored state (tree)
• Which node to expand?
– Leftmost
• Which grammar rule to use?
– Order in the grammar
LIN 6932 25
Basic Algorithm for Top-Down, Depth-First, Left-Right Strategy
• Initialize agenda with ‘S’ tree, point to the first word, and make this the current search state (cur)
• Loop until successful parse or empty agenda
– Apply the next applicable grammar rule to the leftmost unexpanded node (n) of the current tree (t) on the agenda and push the resulting tree (t’) onto the agenda
• If n is a POS category and matches the POS of cur, push new tree (t’’) onto agenda
• Else pop t’ from agenda
– Final agenda contains history of successful parse
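The strategy above can be sketched as a recursive, depth-first, left-to-right recognizer (assumed illustrative code, not the slides' own algorithm; the toy GRAMMAR and LEXICON below stand in for G0):

```python
# Toy grammar and lexicon (assumptions, not the course's actual G0).
GRAMMAR = {
    "S":  [["NP", "VP"], ["VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}

def expand(symbols, words):
    """Try to derive exactly `words` from the symbol list, depth-first,
    expanding the leftmost symbol first and trying rules in grammar order."""
    if not symbols:
        return not words                  # success iff all input consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                  # non-terminal: try each rule in order
        return any(expand(rhs + rest, words) for rhs in GRAMMAR[first])
    if words and first in LEXICON.get(words[0], set()):
        return expand(rest, words[1:])    # POS category matched the next word
    return False

print(expand(["S"], ["book", "that", "flight"]))  # True
```

Note that this sketch loops forever on a left-recursive rule such as NP → NP PP, which is exactly the problem discussed under Left Recursion later in these slides.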
LIN 6932 26
Example: Does this flight include a meal?
LIN 6932 27
Example continued …
LIN 6932 28
Augmenting Top-Down Parsing with Bottom-Up Filtering
• We saw: Top-Down, depth-first, L-to-R parsing
– Expands non-terminals along the tree’s left edge down to leftmost leaf of tree
– Moves on to expand down to next leftmost leaf…
• In a successful parse, the current input word will be the first word in the derivation of the unexpanded node that the parser is currently processing
• So … look ahead to the left-corner of the tree
– B is a left-corner of A if A ⇒* B α (B is the leftmost symbol in some derivation from A)
– Build a table with the left-corners of all non-terminals in the grammar and consult it before applying a rule
LIN 6932 29
Left Corners
Pre-compute all POS that can serve as the leftmost POS in the derivations of each non-terminal category
LIN 6932 30
Left-Corner Table for G0
Previous Example:
Category Left Corners
S NP, Det, PropN, Aux, V
NP Det, PropN
Nom N
VP V
LIN 6932 31
Summing Up Parsing Strategies
• Parsing is a search problem which may be implemented with many search strategies
• Top-Down vs. Bottom-Up Parsers
– Both generate too many useless trees
– Combine the two to avoid over-generation: Top-Down Parsing with Bottom-Up look-ahead
• Left-corner table provides more efficient look-ahead
– Pre-compute all POS that can serve as the leftmost POS in the derivations of each non-terminal category
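The left-corner pre-computation can be sketched as a fixpoint iteration (assumed code, not from the slides; the grammar fragment below only approximates G0):

```python
# Toy rule set in the spirit of G0 (an assumption, not the actual grammar).
G0_RULES = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"]],
    "VP":  [["V"], ["V", "NP"]],
}

def left_corners(grammar):
    """Fixpoint computation: table[A] = set of POS that can begin
    a derivation of A (i.e., the left-corner table)."""
    table = {nt: set() for nt in grammar}
    changed = True
    while changed:                        # iterate until no set grows
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                first = rhs[0]
                # a POS is its own left corner; a non-terminal contributes
                # its (current) left corners
                new = table[first] if first in grammar else {first}
                if not new <= table[nt]:
                    table[nt] |= new
                    changed = True
    return table

lc = left_corners(G0_RULES)
print(sorted(lc["S"]))  # ['Aux', 'Det', 'PropN', 'V']
```

The result for S matches the table above: Det, PropN, Aux, and V can all begin a derivation of S.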
LIN 6932 32
Three Critical Problems in Parsing
• Left Recursion
• Ambiguity
• Repeated Parsing of Sub-trees
LIN 6932 33
Left Recursion
• A long-standing issue regarding algorithms that manipulate context-free grammars (CFGs) in a "top-down" left-to-right fashion is that left recursion can lead to nontermination, i.e., an infinite loop.
• Direct Left Recursion happens when you have a rule that calls itself before anything else. Examples: NP → NP PP, NP → NP and NP, VP → VP PP, S → S and S
• Indirect Left Recursion: Example:
NP → Det Nominal
Det → NP ’s
LIN 6932 34
Left Recursion
• Indirect Left Recursion:
Example: NP → Det Nominal
Det → NP ’s
(tree diagram: NP expands to Det Nominal, and Det expands to NP ’s, re-introducing NP on the left)
LIN 6932 35
Solutions to Left Recursion
• Don't use recursive rules
• Rule ordering
• Limit depth of recursion in parsing to some analytically or empirically set limit
• Don't use top-down parsing
LIN 6932 36
Solution: Grammar Rewriting
• Rewrite a left-recursive grammar to a weakly equivalent one which is not left-recursive.
• How?
– By Hand (ick) or …
– Automatically
LIN 6932 37
Solution: Grammar Rewriting
I saw the man on the hill with a telescope.
N V NP PP PP
NP: noun phrase
PP: prepositional phrase
Phrase: characterized by its head (N, V, P)
Ambiguous: 5 possible parses
LIN 6932 38
Solution: Grammar Rewriting
I saw the man on the hill with a telescope.
(1) S
NP VP
V NP
N PP PP
LIN 6932 39
Solution: Grammar Rewriting
I saw the man on the hill with a telescope.
(2) S
NP VP PP
V NP
N PP
LIN 6932 40
Solution: Grammar Rewriting
I saw the man on the hill with a telescope.
(3) S
NP VP PP PP
V NP
LIN 6932 41
Solution: Grammar Rewriting
I saw the man on the hill with the telescope…
NP → NP PP (recursive)
NP → N PP (nonrecursive)
NP → N
…becomes…
NP → N NP’
NP’ → PP NP’
NP’ → e
• Not so obvious what these rules mean…
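One way to see what the rewritten rules mean (an assumed sketch, not from the slides): both the left-recursive original and the right-recursive NP’ version generate exactly an N followed by zero or more PPs. Enumerating the yields of the rewritten grammar:

```python
def np_prime(max_pps):
    """Yields of NP' under NP' -> PP NP' | e, up to max_pps PPs."""
    return [["PP"] * n for n in range(max_pps + 1)]

def np(max_pps):
    """Yields of NP under NP -> N NP': an N followed by the PPs."""
    return [["N"] + tail for tail in np_prime(max_pps)]

print(np(2))  # [['N'], ['N', 'PP'], ['N', 'PP', 'PP']]
```

The rewriting changes the trees (they now branch to the right) but not the string language, which is why the two grammars are only weakly equivalent.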
LIN 6932 42
Rule Ordering
• Bad:
– NP → NP PP
– NP → Det N
• Rule ordering: non-recursive rules first
– First: NP → Det N
– Then: NP → NP PP
LIN 6932 43
Depth Bound
• Set an arbitrary bound
• Set an analytically derived bound
• Run tests and derive reasonable bound empirically
LIN 6932 44
Ambiguity
• Lexical Ambiguity
– Leads to hypotheses that are locally reasonable but eventually lead nowhere
– “Book that flight”
• Structural Ambiguity
– Leads to multiple parses for the same input
LIN 6932 45
Lexical Ambiguity: Word Sense Disambiguation (WSD) as Text Categorization
• Each sense of an ambiguous word is treated as a category.
– “play” (verb)
• play-game
• play-instrument
• play-role
– “pen” (noun)
• writing-instrument
• enclosure
• Treat current sentence (or preceding and current sentence) as a document to be classified.
– “play”:
• play-game: “John played soccer in the stadium on Friday.”
• play-instrument: “John played guitar in the band on Friday.”
• play-role: “John played Hamlet in the theater on Friday.”
– “pen”:
• writing-instrument: “John wrote the letter with a pen in New York.”
• enclosure: “John put the dog in the pen in New York.”
LIN 6932 46
Structural ambiguity
• Multiple legal structures– Attachment (e.g. I saw a man on a hill with a telescope)– Coordination (e.g. younger cats and dogs)– NP bracketing (e.g. Spanish language teachers)
LIN 6932 47
Two Parse Trees for Ambiguous Sentence
LIN 6932 48
Humor and Ambiguity
• Many jokes rely on the ambiguity of language:– Groucho Marx: One morning I shot an elephant in my pajamas. How he
got into my pajamas, I’ll never know.
– She criticized my apartment, so I knocked her flat.
– Noah took all of the animals on the ark in pairs. Except the worms, they came in apples.
– Policeman to little boy: “We are looking for a thief with a bicycle.” Little boy: “Wouldn’t you be better using your eyes?”
– Why is the teacher wearing sun-glasses? Because the class is so bright.
LIN 6932 49
Ambiguity is Explosive
• Ambiguities compound to generate enormous numbers of possible interpretations.
• In English, a sentence ending in n prepositional phrases has over 2^n syntactic interpretations.
– “I saw the man with the telescope”: 2 parses
– “I saw the man on the hill with the telescope.”: 5 parses
– “I saw the man on the hill in Texas with the telescope”: 14 parses
– “I saw the man on the hill in Texas with the telescope at noon.”: 42 parses
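The counts above (2, 5, 14, 42) are the Catalan numbers C(n+1) for n = 1…4 prepositional phrases, where C(n) = (2n choose n) / (n+1). A quick check (illustrative code, not from the slides):

```python
from math import comb

def catalan(n):
    """The n-th Catalan number: C(n) = (2n choose n) // (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# parses for sentences ending in 1, 2, 3, 4 PPs
print([catalan(n + 1) for n in range(1, 5)])  # [2, 5, 14, 42]
```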
LIN 6932 50
What’s the solution?
Return all possible parses and
disambiguate using “other methods”
LIN 6932 51
Summing Up
• Parsing is a search problem which may be implemented with many control strategies
– Top-Down or Bottom-Up approaches each have problems
• Combining the two solves some but not all issues
– Left recursion
– Syntactic ambiguity
• Next time: Making use of statistical information about syntactic constituents
LIN 6932 52
Dynamic Programming
• Create table of solutions to sub-problems (e.g. subtrees) as parse proceeds
• Look up subtrees for each constituent rather than re-parsing
• Since all parses implicitly stored, all available for later disambiguation
• Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms
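The CYK algorithm listed above is the classic dynamic-programming use of Chomsky Normal Form. A minimal recognizer sketch (the toy CNF grammar and lexicon are assumptions, not the course's G0; the imperative reading is captured by letting Verb NP complete both VP and S):

```python
from itertools import product

# Toy CNF grammar (assumed): binary rules and a POS lexicon.
BINARY = {("Det", "Noun"): {"NP"}, ("Verb", "NP"): {"VP", "S"}}
UNARY = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}

def cyk(words):
    """CYK recognition: table[i][j] = categories deriving words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                 # length-1 spans from lexicon
        table[i][i + 1] = set(UNARY.get(w, set()))
    for span in range(2, n + 1):                  # longer spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return "S" in table[0][n]

print(cyk(["book", "that", "flight"]))  # True
```

table[i][j] memoizes every category spanning words i..j, so each sub-span is analyzed only once; this is exactly the "look up subtrees rather than re-parsing" idea above.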
LIN 6932 53
Earley Algorithm
• Jay Earley (1970)
• A type of chart parser that uses dynamic programming to do parallel top-down search
• Can parse all context-free languages
• Dot notation
– Given a production A → B C D where B, C, and D are symbols in the grammar (terminals or nonterminals), the notation A → B • C D represents a condition in which B has already been parsed and the sequence C D is expected.
LIN 6932 54
Earley Algorithm
• A left-to-right pass fills out a chart with N+1 entries (state sets)
– Think of chart entries as sitting between words in the input string, keeping track of states of the parse at these positions
0 Book 1 that 2 flight 3
– For each word position, the chart contains the set of states representing all partial parse trees generated to date.
E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence
LIN 6932 55
Chart Entries
Represent three types of constituents:
• completed constituents
We keep track of what we have built with what we call complete edges in the chart, noting where a constituent starts and where it stops. For the lexical complete edges in
0 Book 1 that 2 flight 3
we want:
Verb [0,1] Det [1,2] Noun [2,3]
• in-progress constituents
• predicted constituents
We keep track of what we are looking for with incomplete edges, saying what we are looking for and where it starts (we don't know where it ends yet). We start out looking for an S at 0: S [0]
LIN 6932 56
Progress in parse represented by Dotted Rules
• Position of • indicates the progress made in recognizing a grammar rule
0 Book 1 that 2 flight 3
S → • VP, [0,0] (predicted)
NP → Det • Nom, [1,2] (in progress)
VP → V NP •, [0,3] (completed)
• [x,y] tells us what portion of the input is spanned so far by this rule
• Each state si: <dotted rule>, [<back pointer>, <current position>]
LIN 6932 57
S → • VP, [0,0]
– First 0 means the S constituent begins at the start of the input
– Second 0 means the dot is here too
– So, this is a top-down prediction
NP → Det • Nom, [1,2]
– the NP begins at position 1
– the dot is at position 2
– so, Det has been successfully parsed
– Nom predicted next
0 Book 1 that 2 flight 3
LIN 6932 58
0 Book 1 that 2 flight 3 (continued)
VP → V NP •, [0,3]
– Successful VP parse of entire input
LIN 6932 59
Some Earley edges (parser states)
1. S → • NP VP [0,0]: Incomplete. We're trying to build an S that starts at 0 using the NP VP rule. We've found nothing yet (the dot is in first position); we're currently looking for an NP that starts at 0.
2. S → NP • VP [0,1]: Incomplete. We're trying to build an S that starts at 0 using the NP VP rule. We've found an NP (the dot is in second position); we're currently looking for a VP that starts at 1.
3. S → NP VP • [0,3]: Complete. We've succeeded in building an S that starts at 0 using the NP VP rule. It ends at 3 (the dot is in the last position); nothing further is expected.
LIN 6932 60
Successful Parse
• Final answer found by looking at last entry in chart
• If the entry is γ → S •, [nil, N], then the input was parsed successfully
• Chart will also contain record of all possible parses of input string, given the grammar
LIN 6932 61
Parsing Procedure for the Earley Algorithm
• Move through each set of states in order, applying one of three operators to each state:
– predictor: adds predictions (creates new states) to the chart
– scanner: reads input words and enters states corresponding to those words into the chart
– completer: moves the dot to the right when a new constituent is found
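The three operators can be combined into a compact recognizer (a minimal assumed sketch, not the textbook's pseudocode; the toy grammar, lexicon, and (lhs, rhs, dot, start) state encoding are illustrative):

```python
# Toy grammar and lexicon (assumptions, not the course's actual G0).
GRAMMAR = {
    "S":   [["NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"]],
    "Nom": [["Noun"]],
    "VP":  [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}

def earley(words):
    """Return True iff `words` is in the language. chart[i] holds states
    (lhs, rhs, dot, start) reached after reading i words."""
    chart = [[] for _ in range(len(words) + 1)]
    def add(state, i):
        if state not in chart[i]:
            chart[i].append(state)
    add(("gamma", ("S",), 0, 0), 0)            # dummy start edge
    for i in range(len(words) + 1):
        for lhs, rhs, dot, start in chart[i]:  # list grows as we iterate
            if dot < len(rhs) and rhs[dot] in GRAMMAR:       # PREDICTOR
                for prod in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], tuple(prod), 0, i), i)
            elif dot < len(rhs):                             # SCANNER (POS)
                if i < len(words) and rhs[dot] in LEXICON.get(words[i], set()):
                    add((lhs, rhs, dot + 1, start), i + 1)
            else:                                            # COMPLETER
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2), i)
    return ("gamma", ("S",), 1, 0) in chart[len(words)]

print(earley(["book", "that", "flight"]))  # True
```

Iterating over chart[i] while appending to it makes each entry double as a worklist, so states added by one operator are processed by the others in the same pass.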
LIN 6932 62
Earley Algorithm from Book
LIN 6932 63
Earley Algorithm: Essential Ideas
Initialize
• To look for an S at 0, we add the following incomplete edge at 0, which uses a dummy category gamma:
• gamma → • S [0,0]
LIN 6932 64
Earley Algorithm: Essential Ideas
Predictor
• Intuition: create new state for top-down prediction of new phrase
• Applied when non-part-of-speech non-terminals are to the right of a dot: S → • VP [0,0]
• Adds new states to current chart
– One new state for each expansion of the non-terminal in the grammar:
VP → • V [0,0]
VP → • V NP [0,0]
LIN 6932 65
Earley Algorithm: Essential Ideas
Scanner
• Intuition: create new states for rules matching the part of speech of the next word.
• Applicable when a part of speech is to the right of a dot: VP → • V NP [0,0] ‘Book…’
• Looks at the current word in the input
• If it matches, adds state(s) to the next chart
– If VP → • Verb NP, [0,0] is processed, the scanner consults the current word in the input; since book can be a verb, it matches the expectation in the current state. This results in the creation of a new state, which is added to the chart: Verb → book •, [0,1]
LIN 6932 66
Earley Algorithm: Essential Ideas
Completer
• Intuition: the parser has finished a new phrase, so it must find and advance all states that were waiting for this constituent
• Applied when the dot has reached the right end of a rule: NP → Det Nom • [1,3]
• Find all states with the dot at 1 that are expecting an NP: VP → V • NP [0,1]
• Adds new (completed) state(s) to the current chart: VP → V NP • [0,3]
0 Book 1 that 2 flight 3
LIN 6932 67
Example: State Set S0 for Parsing “Book that flight” using Grammar G0
LIN 6932 68
Example: State Set S1 for Parsing “Book that flight”
VP → • V and VP → • V NP are both passed to the Scanner, which adds them to Chart[1] with the dots moved to the right
LIN 6932 69
Last Two States
γ → S • [nil,3] (added by the Completer)
LIN 6932 70
Error Handling
Valid sentences will leave the state
γ → S •, [nil, N]
• What happens when we look at the contents of the last table column and don't find an S rule?
– Is it a total loss? No...
– The chart contains every constituent and combination of constituents possible for the input, given the grammar
• Also useful for partial parsing or shallow parsing used in information extraction
LIN 6932 71
How do we retrieve the parses at the end?
• The representation of each state must be augmented with an additional field to store information about the completed states that generated its constituents
– i.e. what state did we advance here?
– Read the pointers back from the final state
LIN 6932 72
Earley’s Keys to Efficiency
• Left-recursion, ambiguity, and repeated re-parsing of subtrees
– Solution: dynamic programming
• Combine top-down predictions with bottom-up look-ahead to avoid unnecessary expansions
• Earley is still one of the most efficient parsers
• All efficient parsers avoid re-computation in a similar way.
LIN 6932 73
Next Time
* Chapter 1 of Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2007.
* D. Moldovan, S. Harabagiu, M. Pasca, R. Mihalcea, R. Goodrum, R. Girju, and V. Rus. 1999. LASSO: A tool for surfing the answer net. In Proceedings of the Eighth Text Retrieval Conference (TREC-8), 1999.
* E. Brill, S. Dumais and M. Banko. 2002. An analysis of the AskMSR question-answering system. Proceedings of EMNLP 2002