INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and...

88
INF4820: Algorithms for Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology Group (LTG) October 28, 2015 University of Oslo : Department of Informatics

Transcript of INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and...

Page 1: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

INF4820: Algorithms for

Artificial Intelligence and

Natural Language Processing

Chart Parsing

Stephan Oepen & Erik Velldal

Language Technology Group (LTG)

October 28, 2015

University of Oslo : Department of Informatics

Page 2: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Last Time

◮ Mid-Way Evaluation

◮ Forward Algorithm

◮ Quiz & Bonus Points

◮ Syntactic Structure

Overview

Page 3: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Last Time

◮ Mid-Way Evaluation

◮ Forward Algorithm

◮ Quiz & Bonus Points

◮ Syntactic Structure

Today

◮ Context-Free Grammar

◮ Treebanks

◮ Probabilistic CFGs◮ Syntactic Parsing

◮ Naıve: Recursive-Descent◮ Dynamic Programming: CKY

Overview

Page 4: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Group members at the Language Technology Groupsupervise a variety of topics for MSc projects

in natural language processing.Many candidate projects are available on-line.

Please make contact with us.

(2) What is the probability of the bi-gram

language technology

when ignoring case and punctuation,

and using Laplace smoothing?

Recall: Question (2): Language Modelling

Page 5: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

? technology following right after language → P(B|A)

? language technology occuring somewhere → P(A,B)

Recall: Interpreting the Questions?

Page 6: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

? technology following right after language → P(B|A)

? language technology occuring somewhere → P(A,B)

? language and technology occuring somewhere → P(A,B)

Recall: Interpreting the Questions?

Page 7: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

? technology following right after language → P(B|A)

? language technology occuring somewhere → P(A,B)

? language and technology occuring somewhere → P(A,B)

Recall: Interpreting the Questions?

Page 8: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

? technology following right after language → P(B|A)

? language technology occuring somewhere → P(A,B)

? language and technology occuring somewhere → P(A,B)

Recall: Joint and Conditional Probabilities

P(A,B) = P(A) × P(B|A)

A ≡{

wi−1 = language}

B ≡{

wi = technology}

Recall: Interpreting the Questions?

Page 9: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

? technology following right after language → P(B|A)

? language technology occuring somewhere → P(A,B)

? language and technology occuring somewhere → P(A,B)

Recall: Joint and Conditional Probabilities

P(A,B) = P(A) × P(B|A)

A ≡{

wi−1 = language}

B ≡{

wi = technology}

Alternatively: A Complex Event

A ≡{

wi−1 = language ∧ wi = technology}

Recall: Interpreting the Questions?

Page 10: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Constituency

◮ Words tends to lump together into groups that behave likesingle units: we call them constituents.

◮ Constituency tests give evidence for constituent structure:◮ interchangeable in similar syntactic environments.◮ can be co-ordinated◮ can be moved within a sentence as a unit

Recall: Syntactic Structures

Page 11: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Constituency

◮ Words tends to lump together into groups that behave likesingle units: we call them constituents.

◮ Constituency tests give evidence for constituent structure:◮ interchangeable in similar syntactic environments.◮ can be co-ordinated◮ can be moved within a sentence as a unit

(4) Kim read [a very interesting book about grammar]NP.

Kim read [it]NP.

(5) Kim [read a book]VP, [gave it to Sandy]VP, and [left]VP.

(6) [Interesting books about grammar] I like.

Examples from Linguistic Fundamentals for NLP: 100 Essentials from Morphology and Syntax. Bender (2013)

Recall: Syntactic Structures

Page 12: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formal grammars describe a language, giving us a way to:

◮ judge or predict well-formedness

Kim was happy because passed the exam.

Kim was happy because final grade was an A.

Recall: Grammar Aids Understanding

Page 13: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formal grammars describe a language, giving us a way to:

◮ judge or predict well-formedness

Kim was happy because passed the exam.

Kim was happy because final grade was an A.

◮ make explicit structural ambiguities

Have her report on my desk by Friday!

I like to eat sushi with { chopsticks | tuna }.

Recall: Grammar Aids Understanding

Page 14: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formal grammars describe a language, giving us a way to:

◮ judge or predict well-formedness

Kim was happy because passed the exam.

Kim was happy because final grade was an A.

◮ make explicit structural ambiguities

Have her report on my desk by Friday!

I like to eat sushi with { chopsticks | tuna }.

◮ derive abstract representations of meaning

Kim gave Sandy a book.

Kim gave a book to Sandy.

Sandy was given a book by Kim.

Recall: Grammar Aids Understanding

Page 15: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

A Grossly Simplified Example

Page 16: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

A Grossly Simplified Example

Page 17: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

A Grossly Simplified Example

Page 18: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 19: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 20: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 21: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 22: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 23: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 24: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 25: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 26: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 27: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP

VP→ V NP

VP→ VP PP

PP→ P NP

NP→ “nieve”

NP→ “Juan”

NP→ “Oslo”

V→ “amo”

P→ “en”

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 28: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Grammar of Spanish✬

S→ NP VP {VP ( NP ) }

VP→ V NP {V ( NP ) }

VP→ VP PP {PP ( VP ) }

PP→ P NP {P ( NP ) }

NP→ “nieve” { snow }

NP→ “Juan” { John }

NP→ “Oslo” {Oslo }

V→ “amo” {λbλa adore ( a, b ) }

P→ “en” {λdλc in ( c, d ) }

S

NP

Juan

VP

VP

V

amo

NP

nieve

PP

P

en

NP

Oslo✞✝ ☎✆Juan amo nieve en Oslo

A Grossly Simplified Example

Page 29: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

S: {in ( adore ( John , snow ) , Oslo )}

NP: {John}

Juan

VP: {λa in ( adore ( a, snow ) , Oslo )}

VP: {λa adore ( a, snow )}

V:{λbλa adore ( a, b )}

amo

NP:{snow}

nieve

PP:{λc in ( c,Oslo )}

P:{λdλc in ( c, d )}

en

NP:{Oslo}

Oslo✎✍

☞✌VP→ V NP { V ( NP ) }

Meaning Composition (Still Very Simplified)

Page 30: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

S: {adore (John, in ( snow ,Oslo )}

NP: {John}

Juan

VP: {λa adore (a, in ( snow,Oslo )}

V:{λbλa adore ( a, b )}

amo

NP:{in ( snow,Oslo )}

NP:{snow}

nieve

PP:{λc in ( c,Oslo )}

P:{λdλc in ( c, d )}

en

NP:{Oslo}

Oslo✎✍

☞✌NP→ NP PP { PP ( NP ) }

Another Interpretation

Page 31: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

◮ Formal system for modeling constituent structure.

◮ Defined in terms of a lexicon and a set of rules

Context Free Grammars (CFGs)

Page 32: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

◮ Formal system for modeling constituent structure.

◮ Defined in terms of a lexicon and a set of rules

◮ Formal models of ‘language’ in a broad sense◮ natural languages, programming languages,

communication protocols, . . .

Context Free Grammars (CFGs)

Page 33: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

◮ Formal system for modeling constituent structure.

◮ Defined in terms of a lexicon and a set of rules

◮ Formal models of ‘language’ in a broad sense◮ natural languages, programming languages,

communication protocols, . . .

◮ Can be expressed in the ‘meta-syntax’ of the Backus-NaurForm (BNF) formalism.

◮ When looking up concepts and syntax in the Common LispHyperSpec, you have been reading (extended) BNF.

Context Free Grammars (CFGs)

Page 34: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

◮ Formal system for modeling constituent structure.

◮ Defined in terms of a lexicon and a set of rules

◮ Formal models of ‘language’ in a broad sense◮ natural languages, programming languages,

communication protocols, . . .

◮ Can be expressed in the ‘meta-syntax’ of the Backus-NaurForm (BNF) formalism.

◮ When looking up concepts and syntax in the Common LispHyperSpec, you have been reading (extended) BNF.

◮ Powerful enough to express sophisticated relations amongwords, yet in a computationally tractable way.

Context Free Grammars (CFGs)

Page 35: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formally, a CFG is a quadruple: G = 〈C,Σ,P,S〉

CFGs (Formally, this Time)

Page 36: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formally, a CFG is a quadruple: G = 〈C,Σ,P,S〉◮ C is the set of categories (aka non-terminals),

◮ {S,NP,VP,V}

CFGs (Formally, this Time)

Page 37: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formally, a CFG is a quadruple: G = 〈C,Σ,P,S〉◮ C is the set of categories (aka non-terminals),

◮ {S,NP,VP,V}

◮ Σ is the vocabulary (aka terminals),◮ {Kim, snow, adores, in}

CFGs (Formally, this Time)

Page 38: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formally, a CFG is a quadruple: G = 〈C,Σ,P,S〉◮ C is the set of categories (aka non-terminals),

◮ {S,NP,VP,V}

◮ Σ is the vocabulary (aka terminals),◮ {Kim, snow, adores, in}

◮ P is a set of category rewrite rules (aka productions)

S→ NP VP NP→ KimVP→ V NP NP→ snow

V→ adores

CFGs (Formally, this Time)

Page 39: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formally, a CFG is a quadruple: G = 〈C,Σ,P,S〉◮ C is the set of categories (aka non-terminals),

◮ {S,NP,VP,V}

◮ Σ is the vocabulary (aka terminals),◮ {Kim, snow, adores, in}

◮ P is a set of category rewrite rules (aka productions)

S→ NP VP NP→ KimVP→ V NP NP→ snow

V→ adores

◮ S ∈ C is the start symbol, a filter on complete results;

CFGs (Formally, this Time)

Page 40: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Formally, a CFG is a quadruple: G = 〈C,Σ,P,S〉◮ C is the set of categories (aka non-terminals),

◮ {S,NP,VP,V}

◮ Σ is the vocabulary (aka terminals),◮ {Kim, snow, adores, in}

◮ P is a set of category rewrite rules (aka productions)

S→ NP VP NP→ KimVP→ V NP NP→ snow

V→ adores

◮ S ∈ C is the start symbol, a filter on complete results;

◮ for each rule α→ β1, β2, ..., βn ∈ P: α ∈ C and βi ∈ C ∪ Σ

CFGs (Formally, this Time)

Page 41: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Top-down view of generative grammars:

◮ For a grammar G, the language LG is defined as the set ofstrings that can be derived from S.

◮ To derive wn1

from S, we use the rules in P to recursivelyrewrite S into the sequence wn

1where each wi ∈ Σ

Generative Grammar

Page 42: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Top-down view of generative grammars:

◮ For a grammar G, the language LG is defined as the set ofstrings that can be derived from S.

◮ To derive wn1

from S, we use the rules in P to recursivelyrewrite S into the sequence wn

1where each wi ∈ Σ

◮ The grammar is seen as generating strings.

◮ Grammatical strings are defined as strings that can begenerated by the grammar.

Generative Grammar

Page 43: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Top-down view of generative grammars:

◮ For a grammar G, the language LG is defined as the set ofstrings that can be derived from S.

◮ To derive wn1

from S, we use the rules in P to recursivelyrewrite S into the sequence wn

1where each wi ∈ Σ

◮ The grammar is seen as generating strings.

◮ Grammatical strings are defined as strings that can begenerated by the grammar.

◮ The ‘context-freeness’ of CFGs refers to the fact that werewrite non-terminals without regard to the overall contextin which they occur.

Generative Grammar

Page 44: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Generally

◮ A treebank is a corpus paired with ‘gold-standard’(syntactic) analyses

◮ Can be created by manual annotation or selection amongoutputs from automated processing (plus correction).

Treebanks

Page 45: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Generally

◮ A treebank is a corpus paired with ‘gold-standard’(syntactic) analyses

◮ Can be created by manual annotation or selection amongoutputs from automated processing (plus correction).

Penn Treebank (Marcus et al., 1993)

◮ About one million tokens of Wall Street Journal text

◮ Hand-corrected PoS annotation using 45 word classes

◮ Manual annotation with (somewhat) coarse constituentstructure

Treebanks

Page 46: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

S

advp

rb

Still

,

,

np-sbj-1

np

nnp

Time

pos

’s

nn

move

vp

vbz

is

vp

vbg

being

vbn

received

np

-none-

*-1

advp-mnr

rb

well

.

.

Still, Time’s move is being received well.

[WSJ 2350]

One Example from the Penn Treebank

Page 47: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

S

advp

rb

Still

,

,

np-sbj-1

np

nnp

Time

pos

’s

nn

move

vp

vbz

is

vp

vbg

being

vbn

received

np

-none-

*-1

advp-mnr

rb

well

.

.

Still, Time’s move is being received well.

[WSJ 2350]

One Example from the Penn Treebank

Page 48: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

S

advp

rb

Still

,

,

np-sbj-1

np

nnp

Time

pos

’s

nn

move

vp

vbz

is

vp

vbg

being

vbn

received

np

-none-

*-1

advp-mnr

rb

well

.

.

Still, Time’s move is being received well.

[WSJ 2350]

One Example from the Penn Treebank

Page 49: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

S

advp

rb

Still

,

,

np

np

nnp

Time

pos

’s

nn

move

vp

vbz

is

vp

vbg

being

vbn

received

advp

rb

well

.

.

Still, Time’s move is being received well.

[WSJ 2350]

Elimination of Traces and Functions

Page 50: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

◮ We are interested, not just in which trees apply to asentence, but also to which tree is most likely.

Probabilitic Context-Free Grammars

Page 51: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

◮ We are interested, not just in which trees apply to asentence, but also to which tree is most likely.

◮ Probabilistic context-free grammars (PCFGs) augmentCFGs by adding probabilities to each production, e.g.

◮ S→ NP VP 0.6◮ S→ NP VP PP 0.4

◮ These are conditional probabilities — the probability of theright hand side (RHS) given the left hand side (LHS)

◮ P(S→ NP VP) = P(NP VP|S)

Probabilitic Context-Free Grammars

Page 52: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

◮ We are interested, not just in which trees apply to asentence, but also to which tree is most likely.

◮ Probabilistic context-free grammars (PCFGs) augmentCFGs by adding probabilities to each production, e.g.

◮ S→ NP VP 0.6◮ S→ NP VP PP 0.4

◮ These are conditional probabilities — the probability of theright hand side (RHS) given the left hand side (LHS)

◮ P(S→ NP VP) = P(NP VP|S)

◮ We can learn these probabilities from a treebank, againusing Maximum Likelihood Estimation.

Probabilitic Context-Free Grammars

Page 53: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

S

advp

rb

Still

,

,

np

np

nnp

Time

pos

’s

nn

move

vp

vbz

is

vp

vbg

being

vbn

received

advp

rb

well

.

.

Still, Time’s move is being received well.

[WSJ 2350]

Estimating PCFGs (1/3)

Page 54: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

Estimating PCFGs (2/3)

Page 55: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1

Estimating PCFGs (2/3)

Page 56: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1

Estimating PCFGs (2/3)

Page 57: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1

Estimating PCFGs (2/3)

Page 58: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1

Estimating PCFGs (2/3)

Page 59: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1POS→ ’s 1

Estimating PCFGs (2/3)

Page 60: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1

Estimating PCFGs (2/3)

Page 61: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1

Estimating PCFGs (2/3)

Page 62: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1

Estimating PCFGs (2/3)

Page 63: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1

Estimating PCFGs (2/3)

Page 64: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1

Estimating PCFGs (2/3)

Page 65: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1VBN→ received 1

Estimating PCFGs (2/3)

Page 66: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 1|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1VBN→ received 1RB→well 1

Estimating PCFGs (2/3)

Page 67: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 2|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1VBN→ received 1RB→well 1

Estimating PCFGs (2/3)

Page 68: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 2|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1VBN→ received 1RB→well 1VP→ VBN ADVP 1

Estimating PCFGs (2/3)

Page 69: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 2|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1VBN→ received 1RB→well 1VP→ VBN ADVP 1VP→ VBG VP 1

Estimating PCFGs (2/3)

Page 70: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 2|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1VBN→ received 1RB→well 1VP→ VBN ADVP 1VP→ VBG VP 1\. → . 1

Estimating PCFGs (2/3)

Page 71: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 2|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1VBN→ received 1RB→well 1VP→ VBN ADVP 1VP→ VBG VP 1\. → . 1S→ ADVP |,| NP VP \. 1

Estimating PCFGs (2/3)

Page 72: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

(S

(ADVP (RB "Still"))

(|,| ",")

(NP

(NP (NNP "Time") (POS "’s"))

(NN "move"))

(VP

(VBZ "is")

(VP

(VBG "being")

(VP

(VBN "received")

(ADVP (RB "well")))))

(\. "."))

RB→ Still 1AVP→ RB 2|,| → , 1NNP→ Time 1POS→ ’s 1NP→ NNP POS 1NN→move 1NP→ NP NN 1VBZ→ is 1VBG→ being 1VBN→ received 1RB→well 1VP→ VBN ADVP 1VP→ VBG VP 1\. → . 1S→ ADVP |,| NP VP \. 1START→ S 1

Estimating PCFGs (2/3)

Page 73: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Once we have counts of all the rules, we turn them intoprobabilities.

Estimating PCFGs (3/3)

Page 74: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Once we have counts of all the rules, we turn them intoprobabilities.

S→ ADVP |,| NP VP \. 50 S→ NP VP \. 400S→ NP VP PP \. 350 S→ VP ! 100S→ NP VP S \. 200 S→ NP VP 50

Estimating PCFGs (3/3)

Page 75: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Once we have counts of all the rules, we turn them intoprobabilities.

S→ ADVP |,| NP VP \. 50 S→ NP VP \. 400S→ NP VP PP \. 350 S→ VP ! 100S→ NP VP S \. 200 S→ NP VP 50

P(S→ ADVP |, | NP VP \.) ≈C(S→ ADVP |, | NP VP \.)

C(S)

Estimating PCFGs (3/3)

Page 76: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Once we have counts of all the rules, we turn them intoprobabilities.

S→ ADVP |,| NP VP \. 50 S→ NP VP \. 400S→ NP VP PP \. 350 S→ VP ! 100S→ NP VP S \. 200 S→ NP VP 50

P(S→ ADVP |, | NP VP \.) ≈C(S→ ADVP |, | NP VP \.)

C(S)

=50

1150

= 0.0435

Estimating PCFGs (3/3)

Page 77: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Parsing with CFGs: Moving to a Procedural View

S→ NP VP

VP→ V | V NP | VP PP

NP→ NP PP

PP→ P NP

NP→ Kim | snow | Oslo

V→ adores

P→ in

All Complete Derivations

• are rooted in the start symbol S;

• label internal nodes with cate-

gories ∈ C, leafs with words ∈ Σ;

• instantiate a grammar rule ∈ P at

each local subtree of depth one.

S

NP

Kim

VP

VP

V

adores

NP

snow

PP

P

in

NP

Oslo

S

NP

Kim

VP

V

adores

NP

NP

snow

PP

P

in

NP

oslo

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (18)

Page 78: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Parsing with CFGs: Moving to a Procedural View

S→ NP VP

VP→ V | V NP | VP PP

NP→ NP PP

PP→ P NP

NP→ Kim | snow | Oslo

V→ adores

P→ in

All Complete Derivations

• are rooted in the start symbol S;

• label internal nodes with cate-

gories ∈ C, leafs with words ∈ Σ;

• instantiate a grammar rule ∈ P at

each local subtree of depth one.

S

NP

Kim

VP

VP

V

adores

NP

snow

PP

P

in

NP

Oslo

S

NP

Kim

VP

V

adores

NP

NP

snow

PP

P

in

NP

oslo

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (18)

Page 79: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Recursive Descend: A Naıve Parsing Algorithm

Control Structure

• top-down: given a parsing goal α, use all grammar rules that rewrite α;

• successively instantiate (extend) the right-hand sides of each rule;

• for each βi in the RHS of each rule, recursively attempt to parse βi;

• termination: when α is a prefix of the input string, parsing succeeds.

(Intermediate) Results

• Each result records a (partial) tree and remaining input to be parsed;

• complete results consume the full input string and are rooted in S;

• whenever a RHS is fully instantiated, a new tree is built and returned;

• all results at each level are combined and successively accumulated.

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (19)

Page 80: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Recursive Descend: A Naıve Parsing Algorithm

Control Structure

• top-down: given a parsing goal α, use all grammar rules that rewrite α;

• successively instantiate (extend) the right-hand sides of each rule;

• for each βi in the RHS of each rule, recursively attempt to parse βi;

• termination: when α is a prefix of the input string, parsing succeeds.

(Intermediate) Results

• Each result records a (partial) tree and remaining input to be parsed;

• complete results consume the full input string and are rooted in S;

• whenever a RHS is fully instantiated, a new tree is built and returned;

• all results at each level are combined and successively accumulated.

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (19)

Page 81: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The Recursive Descent Parser✬

(defun parse (input goal)

(if (equal (first input) goal)

(let ((edge (make-edge :category (first input))))

(list (make-parse :edge edge :input (rest input))))

(loop

for rule in (rules-deriving goal)

append (extend-parse (rule-lhs rule) nil (rule-rhs rule) input))))

(defun extend-parse (goal analyzed unanalyzed input)

(if (null unanalyzed)

(let ((tree (cons goal analyzed)))

(list (make-parse :tree tree :input input)))

(loop

for parse in (parse input (first unanalyzed))

append (extend-parse

goal (append analyzed (list (parse-tree parse)))

(rest unanalyzed)

(parse-input parse)))))

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (20)

Page 82: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Quantifying the Complexity of the Parsing Task

1 2 3 4 5 6 7 8

Number of Prepositional Phrases (n)

0

250000

500000

750000

1000000

1250000

1500000

Recursive Function Calls

• • • • • ••

Kim adores snow (in Oslo)n

n trees calls

0 1 46

1 2 170

2 5 593

3 14 2,093

4 42 7,539

5 132 27,627

6 429 102,570

7 1430 384,566

8 4862 1,452,776... ... ...

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (21)

Page 83: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Top­Down vs. Bottom­Up Parsing

Top-Down (Goal-Oriented)

• Left recursion (e.g. a rule like ‘VP→ VP PP’) causes infinite recursion;

• search is uninformed by the (observable) input: can hypothesize many

unmotivated sub-trees, assuming terminals (words) that are not present;

→ assume bottom-up as basic search strategy for remainder of the course.

Bottom-Up (Data-Oriented)

• unary (left-recursive) rules (e.g. ‘NP→ NP’) would still be problematic;

• lack of parsing goal: compute all possible derivations for, say, the input

adores snow ; however, it is ultimately rejected since it is not sentential;

• availability of partial analyses desirable for, at least, some applications.

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (22)

Page 84: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Top­Down vs. Bottom­Up Parsing

Top-Down (Goal-Oriented)

• Left recursion (e.g. a rule like ‘VP→ VP PP’) causes infinite recursion;

• search is uninformed by the (observable) input: can hypothesize many

unmotivated sub-trees, assuming terminals (words) that are not present;

→ assume bottom-up as basic search strategy for remainder of the course.

Bottom-Up (Data-Oriented)

• unary (left-recursive) rules (e.g. ‘NP→ NP’) would still be problematic;

• lack of parsing goal: compute all possible derivations for, say, the input

adores snow ; however, it is ultimately rejected since it is not sentential;

• availability of partial analyses desirable for, at least, some applications.

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (22)

Page 85: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

A Key Insight: Local Ambiguity

• For many substrings, more than one way of deriving the same category;

• NPs: 1 | 2 | 3 | 6 | 7 | 9 ; PPs: 4 | 5 | 8 ; 9 ≡ 1 + 8 | 6 + 5 ;

• parse forest — a single item represents multiple trees [Billot & Lang, 89].

✪2 3 4 5 6 7

boys with hats from France

1 2 3

4 5

6 7

8

9

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (23)

Page 86: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

The CKY (Cocke, Kasami, & Younger) Algorithm

for (0 ≤ i < |input |) do

chart [i,i+1]← {α |α→ input i ∈ P};for (1 ≤ l < |input |) do

for (0 ≤ i < |input | − l) do

for (1 ≤ j ≤ l) do

if (α→ β1 β2 ∈ P ∧ β1 ∈ chart [i,i+j] ∧ β2 ∈ chart [i+j,i+l+1]) then

chart [i,i+l+1]← chart [i,i+l+1] ∪ {α};

[0,2]← [0,1] + [1,2]· · ·

[0,5]← [0,1] + [1,5][0,5]← [0,2] + [2,5][0,5]← [0,3] + [3,5][0,5]← [0,4] + [4,5]

1 2 3 4 5

0 NP S S

1 V VP VP

2 NP NP

3 P PP

4 NP

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (24)

Page 87: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Limitations of the CKY Algorithm

Built-In Assumptions

• Chomsky Normal Form grammars: α→ β1β2 or α→ γ (βi ∈ C, γ ∈ Σ);

• breadth-first (aka exhaustive): always compute all values for each cell;

• rigid control structure: bottom-up, left-to-right (one diagonal at a time).

Generalized Chart Parsing

• Liberate order of computation: no assumptions about earlier results;

• active edges encode partial rule instantiations, ‘waiting’ for additional

(adjacent and passive) constituents to complete: [1, 2,VP→ V •NP];

• parser can fill in chart cells in any order and guarantee completeness.

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (25)

Page 88: INF4820: Algorithms for Artificial Intelligence and Natural ... · Artificial Intelligence and Natural Language Processing Chart Parsing Stephan Oepen & Erik Velldal Language Technology

Limitations of the CKY Algorithm

Built-In Assumptions

• Chomsky Normal Form grammars: α→ β1β2 or α→ γ (βi ∈ C, γ ∈ Σ);

• breadth-first (aka exhaustive): always compute all values for each cell;

• rigid control structure: bottom-up, left-to-right (one diagonal at a time).

Generalized Chart Parsing

• Liberate order of computation: no assumptions about earlier results;

• active edges encode partial rule instantiations, ‘waiting’ for additional

(adjacent and passive) constituents to complete: [1, 2,VP→ V •NP];

• parser can fill in chart cells in any order and guarantee completeness.

inf4820 — -oct- ([email protected])

Chart Parsing for Context-Free Grammars (25)