Joint Ambiguity Modeling in NLP

Woodley Packard

Modeling Ambiguity

Syntax Disambiguation

Maximum Entropy Models

MaxEnt for the ERG

Evaluation Metrics

Joint Ambiguity Modeling in NLP

Woodley Packard

Universitet i Oslo

March 21, 2011

Introduction

Ambiguity is a central phenomenon in natural language, affecting accuracy and efficiency in most if not all types of NLP.

Types of Ambiguity

◮ Word boundaries
  ◮ Difficult in some languages, e.g. Chinese
  ◮ Norwegian examples lifted from recent mailing list activity:
    ◮ Lege-ring, sei-del, sel-skap, bru-sau-tomat, sports-av-iser...

◮ Sentence boundaries
  ◮ Again, not marked in some languages
  ◮ Even in English, the markers are overloaded:
    ◮ The CEO had the P.R. Department leaders make risky moves.
    ◮ The citizens voted in the U.S. Presidential Election polls.

◮ Word senses
  ◮ English word “bank”:
    ◮ (n) Financial institution
    ◮ (n) Side of a river / stream
    ◮ (n) Repository of resources (tree bank, bank of switches)
    ◮ (v) To bet everything (on ...)
    ◮ (v) Tilting to make a turn in an airplane
    ◮ (v) To do business at a bank (financial institution)
    ◮ ... others.

◮ Syntactic ambiguity
  ◮ I saw the man with the telescope.
◮ Anaphora
  ◮ Jessie and Alex grew up together, but eventually he moved to the West coast and she moved to the East coast.
◮ ... others.

Traditional Approaches to Dealing with Ambiguity

◮ How well can we resolve some particular type of ambiguity?

◮ Rich literature about how to handle each type of ambiguity.

◮ None are completely solved problems.

◮ Some have been solved fairly well (word boundaries, sentence boundaries).

◮ But most have room for improvement.

An Emerging Trend in Research

◮ Results about how diverse types of information can be helpful in ambiguity resolution.

◮ Large body of research about using syntax as a guide for resolving anaphora.

Some others:

◮ “Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity” D. Lin, ACL 1997.

◮ “Improving Parsing and PP attachment Performance with Sense Information” E. Agirre, T. Baldwin, D. Martinez, ACL 2008.

Joint Modeling of Ambiguity

Idea: Instead of modeling individual marginal distributions for each type of ambiguity, or conditional models involving two types of ambiguity, what if we model a joint distribution for all the types of ambiguity we are interested in at once?

◮ Gain: modeling flexibility

◮ Cost: greater complexity

◮ Possible framework: graphical models

◮ Inference and parameter estimation: sometimes tractable, sometimes not.
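To make the contrast concrete, here is a toy sketch of decoding two ambiguity types independently versus jointly. Everything in it is invented for illustration: the label sets, the scores, and the function names are assumptions, not part of the talk. The only point is that a pairwise term can change the best overall answer.

```python
from itertools import product

# Hypothetical candidate labels for two ambiguity types in one sentence
# ("I saw the man with the telescope."). These label sets are assumptions.
ATTACHMENTS = ["verb", "noun"]          # where the PP attaches
SENSES = ["instrument", "possession"]   # reading of "with"

# Invented per-type scores plus one pairwise table that rewards compatible
# combinations. None of these numbers come from the talk.
unary_attach = {"verb": 0.2, "noun": 0.0}
unary_sense = {"instrument": 0.0, "possession": 0.3}
pairwise = {("verb", "instrument"): 1.0, ("noun", "possession"): 1.0}


def joint_score(attach, sense):
    """Score of a complete assignment under the toy joint model."""
    return unary_attach[attach] + unary_sense[sense] + pairwise.get((attach, sense), 0.0)


def independent_decode():
    """Pick each label on its own, ignoring the other ambiguity type."""
    return (max(ATTACHMENTS, key=unary_attach.get), max(SENSES, key=unary_sense.get))


def joint_decode():
    """Pick the pair of labels that maximizes the joint score."""
    return max(product(ATTACHMENTS, SENSES), key=lambda pair: joint_score(*pair))


print("independent:", independent_decode())
print("joint:      ", joint_decode())
```

In this toy setup the independent decoder returns an incompatible pair, while the joint decoder pays the cost of enumerating all combinations, which is the flexibility/complexity trade-off listed above.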

A Graphical Model for Ambiguity

[Figure: graphical model over the nodes Anaphora, Syntax, and WSD. Edge labels: Hobbs 1978; Lin 1997; Agirre et al. 2008.]

Other Dependencies Seem Likely

◮ Information about anaphora may be helpful in disambiguating syntax.

◮ Word Priming: information about the senses of words in a given sentence may be helpful for determining word senses in nearby subsequent sentences.

◮ Syntactic Priming: information about the constructions used in a given sentence may be helpful for determining the syntax of nearby subsequent sentences.

◮ Maybe others too!

A Revised Graphical Model for Ambiguity

[Figure: revised graphical model with nodes Anaphora; Syntax1, Syntax2, Syntax3; WSD1, WSD2, WSD3.]

A Roadmap for my Project

◮ Build basic disambiguation systems for a few types of ambiguity

◮ Set baselines for how well we can disambiguate without a joint model

◮ Learn how to combine information from disparate systems to form a joint model

◮ Evaluate the joint model’s performance on each type of ambiguity vs. the baselines

Where am I on that roadmap?

◮ Not very far along, really. I’ve built a syntax disambiguation system and set a strong baseline for syntax disambiguation in isolation.

◮ Started some experiments into using more global information, but so far nothing worth reporting.

Next, shifting gears...

I’ll describe the work I’ve done exploring the space of syntax disambiguation for HPSG grammars.

Syntax Disambiguation

◮ Given an utterance, find the best analysis licensed by your grammar.

◮ With broad-coverage grammars, there can be a lot of candidate analyses!

◮ ERG licenses more than 10,000 distinct analyses for:

  I would still have an appointment slot free on Tuesday, the sixth of April, but only in the afternoon.

The different analyses are usually not all semantically equivalent. How do we know which meaning was intended by the speaker?

Common solution: annotate the intended meaning on a sufficiently large corpus of example utterances, and then apply machine learning techniques to build a model that will allow us to guess the intended meaning on unseen data.

Layout

Modeling Ambiguity

Syntax Disambiguation
  Maximum Entropy Models
  MaxEnt for the ERG
  Evaluation Metrics

Conditional log-linear models

$$p(y \mid x) = \frac{e^{w \cdot f(x, y)}}{\sum_{y'} e^{w \cdot f(x, y')}}$$

where $w$ is a vector of feature weights, typically learned from the training data.

◮ Conditional probability model for classification/ranking problems.

◮ Describe relationship of class $y$ to input $x$ by $n$ real-valued feature functions $f_j(x, y)$.

◮ Equivalently, a vector-valued feature function $f(x, y)$.

Since the denominator is a function only of $x$ and does not depend on $y$, determining $\arg\max_y p(y \mid x)$ amounts to maximizing $w \cdot f(x, y)$.
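A minimal sketch of the formula above, assuming dense feature vectors stored as plain Python lists. The function names and the toy numbers are illustrative assumptions. It computes $p(y \mid x)$ over a candidate set and shows that ranking by the raw score $w \cdot f(x, y)$ selects the same candidate.

```python
import math


def dot(w, f):
    """Inner product w . f(x, y) for two equal-length lists."""
    return sum(wj * fj for wj, fj in zip(w, f))


def log_linear_probs(w, feats):
    """Return p(y | x) for every candidate; feats[i] is f(x, y_i)."""
    scores = [dot(w, f) for f in feats]
    m = max(scores)                      # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                        # the normalizer depends on x only
    return [e / z for e in exps]


def best_candidate(w, feats):
    """Index of the arg max, computed from raw scores without normalizing."""
    scores = [dot(w, f) for f in feats]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy numbers, invented for illustration.
w = [1.0, -0.5]
feats = [[1, 0], [0, 1], [1, 1]]
probs = log_linear_probs(w, feats)
print(probs, best_candidate(w, feats))
```

Skipping the normalizer matters in practice when the candidate set is large, since only the ranking is needed at decoding time.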

Maximum Entropy Models

◮ One common way of selecting $w$.

◮ Also known as Multinomial Logistic Regression.

◮ Corresponds to maximum likelihood estimation: maximize the conditional likelihood of the training data.

◮ Alternate description: model with maximum entropy subject to $E_p[f(x, y)] = \tilde{f}(x, y)$, the empirical expectation of $f(x, y)$ on the training data.

◮ Popular applications: POS tagging, NER, parse disambiguation, ...

◮ Other techniques for picking $w$: SVMs, Perceptrons, ...
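The two descriptions above meet in the gradient of the conditional log-likelihood, which is the empirical feature count minus the model's expected feature count. The sketch below is one simple way to fit $w$, assuming plain unregularized gradient ascent with a fixed step size. Real systems typically use a dedicated optimizer and a regularizer, and the talk does not say which estimator was used. All names and the toy data are illustrative.

```python
import math


def probs(w, feats):
    """p(y | x) for each candidate feature vector in feats."""
    scores = [sum(wj * fj for wj, fj in zip(w, f)) for f in feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def log_likelihood(w, data):
    """Conditional log-likelihood of the gold candidates."""
    return sum(math.log(probs(w, feats)[gold]) for feats, gold in data)


def gradient(w, data):
    """Empirical feature counts minus expected feature counts under the model."""
    grad = [0.0] * len(w)
    for feats, gold in data:
        p = probs(w, feats)
        for j in range(len(w)):
            expected = sum(pi * f[j] for pi, f in zip(p, feats))
            grad[j] += feats[gold][j] - expected
    return grad


def train(data, dim, rate=0.5, steps=200):
    """Fixed-step gradient ascent starting from w = 0 (an assumed, simple optimizer)."""
    w = [0.0] * dim
    for _ in range(steps):
        g = gradient(w, data)
        w = [wj + rate * gj for wj, gj in zip(w, g)]
    return w


# Toy training set: (candidate feature vectors, index of the gold candidate).
DATA = [
    ([[1, 0], [0, 1]], 0),
    ([[1, 1], [0, 1], [1, 0]], 2),
]
w = train(DATA, dim=2)
print(w, log_likelihood(w, DATA))
```

When the gradient reaches zero, the expected and empirical feature counts agree, which is exactly the constraint in the maximum entropy formulation.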

The ERG and WeScience

◮ English Resource Grammar (ERG): broad-coverage precision computational English grammar based on HPSG

◮ WeScience: a treebank of gold ERG analyses for around 9000 sentences from Wikipedia in the domain of Computational Linguistics.

How well can we disambiguate when parsing WeScience inputs just by looking at ERG analyses in isolation?

◮ Use Maximum Entropy modeling; $x$ = an input sentence and $y$ = a candidate analysis.

◮ $f(x, y)$ = a vector of features describing a candidate analysis, possibly making reference to the input item.

◮ Train a MaxEnt model on WeScience.

◮ To disambiguate an unseen input, select the candidate analysis with the largest $p(y \mid x)$.

◮ Scoring: on a held-out test set, for what proportion of the inputs can our model identify the correct gold analysis? (exact match)

◮ Use 10-fold cross-validation to reduce measurement noise.
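The scoring procedure can be sketched as follows. This is an assumed harness, not the one used in the talk: `train_fn` is a hypothetical callback that takes training items and returns a function mapping an input to its predicted analysis, and folds are formed by simple striding rather than by any particular split of WeScience.

```python
def k_folds(items, k=10):
    """Split items into k disjoint folds by striding (assumes len(items) >= k)."""
    return [items[i::k] for i in range(k)]


def exact_match(predict, test_items):
    """Proportion of test inputs whose predicted analysis equals the gold one."""
    hits = sum(1 for x, gold in test_items if predict(x) == gold)
    return hits / len(test_items)


def cross_validate(items, train_fn, k=10):
    """Mean exact-match accuracy over k train/test splits."""
    folds = k_folds(items, k)
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        predict = train_fn(train)
        scores.append(exact_match(predict, test))
    return sum(scores) / k
```

Averaging over the ten held-out folds uses every item for testing exactly once, which is what reduces the noise of a single train/test split.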

An ERG analysis consists of a derivation tree and an MRS meaning representation.

Simplified candidate analysis for “The very large cat meowed.”:

SB-HD_MC_C
  SP-HD_N_C
    D_-_THE_LE  The
    AJ-HDN_NORM_C
      SP-HD_HC_C
        AV_-_DG-V_LE  very
        AJ_-_I_LE  large
      N_SG_ILR
        N_-_C_LE  cat
  W_PERIOD_PLR
    V_PST_OLR
      V_-_LE  meowed.

{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }

What’s there left to do?

Define $f(x, y)$.

◮ Note that since the analysis $y$ includes all the information needed to reconstruct $x$, we may as well just talk about $f(y)$ instead of $f(x, y)$.

◮ We’ll make $f(y)$ a sparse and very high dimensional vector, with a few 1’s and 2’s here and there.

◮ Each dimension is called a feature.
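Because almost every dimension of $f(y)$ is zero, a natural representation is a mapping from feature name to count, with the score $w \cdot f(y)$ summed over the non-zero entries only. The sketch below assumes that representation. The feature names and weights are hypothetical placeholders, not features from the talk.

```python
from collections import Counter


def score(weights, features):
    """Compute w . f(y), touching only the non-zero dimensions of f(y)."""
    return sum(weights.get(name, 0.0) * count for name, count in features.items())


# Hypothetical feature names and weights, for illustration only.
weights = {"rule:A": 0.5, "rule:B": -1.0}
candidate_1 = Counter({"rule:A": 2, "rule:C": 1})   # "rule:C" has no learned weight
candidate_2 = Counter({"rule:A": 1, "rule:B": 1})

best = max([candidate_1, candidate_2], key=lambda f: score(weights, f))
print(best)
```

A feature that never received a weight contributes zero, so unseen patterns are simply ignored at scoring time.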

Cookie Cutter Features

◮ In previous work on HPSG parse disambiguation, each feature records the number of times some particular pattern of tree nodes or MRS predications/variables is found in the candidate analysis y.

◮ Defining f(y) amounts to making a list of subgraphs to look for in the tree and MRS for y.

◮ A straightforward way of defining a bunch of “interesting” subgraphs to look for is to decide on a “cookie-cutter” shape, and use that to cut out sections of all the trees in the treebank.

◮ This allows us to quickly enumerate tens or hundreds of thousands of subgraphs that can occur in analyses, and build feature vectors out of them.

Baseline Features

One of the simplest useful cookie cutters looks like this:

Cookie cutter: a parent node with two daughters, ?(? ?)

Example subgraph: SP-HD_HC_C(AV_-_DG-V_LE, AJ_-_I_LE)

This cookie-cutter matches about 57,000 distinct subgraphs from WeScience.
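The baseline cookie cutter can be sketched as a walk over a derivation tree that records each node together with its daughters' labels. This is a minimal illustration, not the talk's implementation: the tuple-based tree encoding and the function names are assumptions, and the toy tree for "the very large cat meowed." uses some node labels (SP-HD_N_C, D_-_THE_LE, AJ-HDN_NORM_C) that do not appear on the slides.

```python
from collections import Counter

# Toy derivation tree: (label, children). A lexical leaf has the surface
# string in place of a child list. Labels not on the slides are assumed.
TREE = ("SB-HD_MC_C", [
    ("SP-HD_N_C", [
        ("D_-_THE_LE", "the"),
        ("AJ-HDN_NORM_C", [
            ("SP-HD_HC_C", [
                ("AV_-_DG-V_LE", "very"),
                ("AJ_-_I_LE", "large"),
            ]),
            ("N_SG_ILR", [("N_-_C_LE", "cat")]),
        ]),
    ]),
    ("W_PERIOD_PLR", [("V_PST_OLR", [("V_-_LE", "meowed.")])]),
])

def local_trees(tree):
    """Yield (parent_label, child_label, ...) for every internal node."""
    label, children = tree
    if isinstance(children, str):      # lexical leaf: nothing below to cut out
        return
    yield (label,) + tuple(child[0] for child in children)
    for child in children:
        yield from local_trees(child)

def baseline_features(tree):
    """Count how often each cut-out subgraph occurs in one analysis."""
    return Counter(local_trees(tree))

features = baseline_features(TREE)
```

Running the same cutter over every tree in a treebank and collecting the distinct keys gives the feature inventory; each analysis then maps to a sparse count vector over that inventory.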

Baseline Features (cont’d)

◮ So, we get a 57,000 dimensional feature space.

◮ MaxEnt model accuracy: 40.4%

◮ For comparison, random choice accuracy: 8.1%

◮ It turns out 40.4% is a fairly strong baseline.

◮ We’ll take this as our baseline against which to evaluate other ideas.

Can we do better?

◮ Of course!

◮ I tried about 60 different combinations of feature sets.

◮ I’ll show you several of the most interesting ones.

◮ The baseline features are included in addition to those I’ll describe.

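Grandparenting

The single most helpful feature type just adds some parents to the baseline cookie cutter:

[Figure: the baseline cookie cutter extended upward with one, two, and three ancestor nodes, labeled GP[1], GP[2], and GP[3]]

GP[1]: 42.97%
GP[2]: 44.46%
GP[3]: 44.9%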
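A grandparented cookie cutter can be sketched by carrying the chain of ancestor labels down the tree and attaching the nearest n of them to each local subgraph. This is an illustrative reading of GP[n] only: the tree encoding, the function name, and the choice to use a shorter context for nodes near the root (rather than padding or skipping them) are assumptions.

```python
from collections import Counter

# Toy derivation tree for "the cat meowed."; (label, children), with the
# surface string in place of the child list at lexical leaves.
TREE = ("SB-HD_MC_C", [
    ("SP-HD_N_C", [
        ("D_-_THE_LE", "the"),
        ("N_SG_ILR", [("N_-_C_LE", "cat")]),
    ]),
    ("W_PERIOD_PLR", [("V_PST_OLR", [("V_-_LE", "meowed.")])]),
])

def gp_features(tree, n, ancestors=()):
    """Count (ancestor_context, local_subgraph) pairs with up to n ancestors."""
    label, children = tree
    feats = Counter()
    if isinstance(children, str):
        return feats
    local = (label,) + tuple(child[0] for child in children)
    # ancestors[-0:] would return the whole tuple, so n == 0 is handled apart.
    context = ancestors[-n:] if n > 0 else ()
    feats[(context, local)] += 1
    for child in children:
        feats.update(gp_features(child, n, ancestors + (label,)))
    return feats

gp0 = gp_features(TREE, 0)   # equivalent to the baseline cookie cutter
gp1 = gp_features(TREE, 1)
gp2 = gp_features(TREE, 2)
```

Larger n makes each feature more specific, so the inventory grows and individual features become rarer, which is one plausible reason the gains on the slide flatten out between GP[2] and GP[3].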

Uncles

[Figure: the uncles cookie-cutter shape, drawn as a tree of ? nodes]

Accuracy: 42.97%

Syntactic Dependencies

◮ Converts the tree into a list of syntactic dependencies

◮ e.g.: SB-HD_MC_C (meowed.: V_-_LE, cat: N_-_C_LE)

◮ Each such dependency is considered a feature.

◮ Accuracy: 42.97%
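One way to picture the tree-to-dependency conversion is to find the lexical head of each daughter and emit one (rule, head, dependent) triple per non-head daughter. The sketch below is hypothetical: in particular, it guesses the head daughter from the position of "HD" in the rule name, which is a stand-in heuristic and not how the grammar or the talk's code identifies heads.

```python
# Toy derivation tree for "the cat meowed."; (label, children), with the
# surface string in place of the child list at lexical leaves.
TREE = ("SB-HD_MC_C", [
    ("SP-HD_N_C", [
        ("D_-_THE_LE", "the"),
        ("N_SG_ILR", [("N_-_C_LE", "cat")]),
    ]),
    ("W_PERIOD_PLR", [("V_PST_OLR", [("V_-_LE", "meowed.")])]),
])

def head_index(label, n_children):
    """Assumed heuristic: the slot named HD in e.g. 'SB-HD' is the head."""
    if n_children == 1:
        return 0
    parts = label.split("_")[0].split("-")
    for i, part in enumerate(parts):
        if part.startswith("HD"):
            return min(i, n_children - 1)
    return 0

def lexical_head(tree):
    """Follow head daughters down to a leaf; return (surface, lexical_type)."""
    label, children = tree
    if isinstance(children, str):
        return (children, label)
    return lexical_head(children[head_index(label, len(children))])

def dependencies(tree):
    """Emit (rule, head, dependent) for every non-head daughter."""
    label, children = tree
    if isinstance(children, str):
        return []
    deps = []
    if len(children) > 1:
        h = head_index(label, len(children))
        head = lexical_head(children[h])
        for i, child in enumerate(children):
            if i != h:
                deps.append((label, head, lexical_head(child)))
    for child in children:
        deps.extend(dependencies(child))
    return deps

deps = dependencies(TREE)
```

On the toy tree the first triple corresponds to the slide's example, SB-HD_MC_C (meowed.: V_-_LE, cat: N_-_C_LE).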

Lexicalizations

◮ Decorate each node in the tree with information about the lexical head of that subtree.

◮ Some decoration choices: part of speech, lexical type, HEAD value, lexeme name, surface form, stem

◮ Best performing decoration: lexeme name

◮ Accuracy with baseline cookie cutter: 44.16%

◮ We can do this with other cookie cutters as well.

◮ Accuracy with GP[2] cookie cutter: 45.89%

MRS Features

{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }

So far, we’ve only described features extracted from the derivation tree portion of the analysis.

◮ What about the MRS?

◮ Tried two schemes for encoding MRS into features: variable-centric and predication-centric

Variable-centric MRS Features

{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }

For each MRS variable, features for all (unordered) pairs and triples from the set of relations that are known to apply to that variable.

Example triple for x1: (large_a.ARG1, meow_v.ARG1, cat_n.ARG0)

Example pair for e2: (large_a.ARG0, very_x_deg.ARG1)

Accuracy: 41.40%
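The variable-centric scheme can be sketched by grouping predicate roles by the variable they point at and then enumerating unordered pairs and triples within each group. The MRS encoding below is an assumption: the first argument of each predication is read as ARG0 and the second as ARG1, which matches the slide's examples, and the function name is made up.

```python
from collections import Counter
from itertools import combinations

# The example MRS, encoded as (predicate, {role: variable}).
MRS = [
    ("the_q", {"ARG0": "x1"}),
    ("cat_n", {"ARG0": "x1"}),
    ("large_a", {"ARG0": "e2", "ARG1": "x1"}),
    ("meow_v", {"ARG0": "e3", "ARG1": "x1"}),
    ("very_x_deg", {"ARG0": "e4", "ARG1": "e2"}),
]

def variable_centric_features(mrs):
    """One feature per unordered pair or triple of roles sharing a variable."""
    roles_by_var = {}
    for pred, args in mrs:
        for role, var in args.items():
            roles_by_var.setdefault(var, []).append(f"{pred}.{role}")
    feats = Counter()
    for roles in roles_by_var.values():
        for size in (2, 3):
            # Sorting makes the tuple independent of the order of predications.
            for combo in combinations(sorted(roles), size):
                feats[combo] += 1
    return feats

vc_features = variable_centric_features(MRS)
```

In this encoding x1 carries four roles, giving six pairs and four triples, while e2 contributes a single pair; variable names themselves never appear in the features, so renaming variables leaves the vector unchanged.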

Predication-centric MRS Features

{ the_q(x1), cat_n(x1), large_a(e2, x1), meow_v(e3, x1), very_x_deg(e4, e2) }

One feature for each elementary predication in the MRS, describing the relations pointed to by the non-ARG0 roles.

Example feature: very_x_deg(ARG1=large_a)

Accuracy: 42.61%

Both MRS feature sets combined: 42.63% ...
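A predication-centric feature can be sketched by first recording which predication introduces each variable (via its ARG0) and then rewriting every non-ARG0 argument as the name of that predication. The encoding and names below are assumptions, as is the choice to skip quantifiers (recognized here by a "_q" suffix) when deciding which predication a variable belongs to.

```python
from collections import Counter

# The example MRS, encoded as (predicate, {role: variable}).
MRS = [
    ("the_q", {"ARG0": "x1"}),
    ("cat_n", {"ARG0": "x1"}),
    ("large_a", {"ARG0": "e2", "ARG1": "x1"}),
    ("meow_v", {"ARG0": "e3", "ARG1": "x1"}),
    ("very_x_deg", {"ARG0": "e4", "ARG1": "e2"}),
]

def predication_centric_features(mrs):
    """One feature per predication, naming what its non-ARG0 roles point to."""
    owner = {}
    for pred, args in mrs:
        if not pred.endswith("_q"):          # assumed: quantifiers do not own variables
            owner.setdefault(args["ARG0"], pred)
    feats = Counter()
    for pred, args in mrs:
        parts = [
            role + "=" + owner.get(var, "?")
            for role, var in sorted(args.items())
            if role != "ARG0"
        ]
        feats[pred + "(" + ", ".join(parts) + ")"] += 1
    return feats

pc_features = predication_centric_features(MRS)
```

For the example MRS this yields one feature per predication, including the slide's very_x_deg(ARG1=large_a).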

Combining Several Feature Sets

◮ Information contained in different feature templates is not orthogonal

◮ Subadditivity of performance improvements

◮ However, modest improvements are possible.

◮ Best combination I’ve found: all the MRS features, lexeme name lexicalization, GP[2], and a few other features I didn’t describe that don’t perform well in isolation.

◮ Accuracy: 47.52%

MaxEnt Disambiguation for WeScience: Summary

Picking the right analysis is hard.

Random choice    8.1%
Strong baseline  40.4%
Best model       47.5%

Layout

Modeling Ambiguity

Syntax Disambiguation
  Maximum Entropy Models
  MaxEnt for the ERG
  Evaluation Metrics

Doesn’t 47% seem kind of sad?

Well, yes, in a way.

◮ But there are other ways to evaluate that give us much cheerier numbers!

◮ 47% of the time we get an exactly correct answer.

◮ But we don’t assign ourselves any partial credit for getting a partially correct answer.

◮ Many other metrics do.

Some Other Metrics

Metric               Random   Baseline   Best Model
Exact Tree Match      8.1%     40.4%      47.5%
Exact MRS Match       8.8%     41.3%      48.5%
Unlabeled PARSEVAL   80.7%     93.3%      94.7%
Labeled PARSEVAL     70.3%     88.0%      90.5%
Unlabeled Syn-Deps   79.2%     92.1%      93.8%
Labeled Syn-Deps     71.0%     89.1%      91.3%
Elementary Deps      82.1%     94.2%      95.4%
Leaf Ancestor        79.0%     92.4%      93.7%
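To illustrate why partial-credit metrics score so much higher than exact match, here is a minimal sketch of a labeled PARSEVAL-style F1 over bracket spans. It is an assumption-laden toy: lexical leaves are not counted as brackets, the tree encoding and function names are invented, and the competing analysis uses a made-up rule label (OTHER_RULE).

```python
from collections import Counter

def labeled_brackets(tree, start=0):
    """Return (Counter of (label, start, end) spans, end position)."""
    label, children = tree
    if isinstance(children, str):          # lexical leaf covers one token
        return Counter(), start + 1
    brackets = Counter()
    pos = start
    for child in children:
        child_brackets, pos = labeled_brackets(child, pos)
        brackets.update(child_brackets)
    brackets[(label, start, pos)] += 1
    return brackets, pos

def parseval_f1(gold, test):
    """Harmonic mean of bracket precision and recall."""
    g, _ = labeled_brackets(gold)
    t, _ = labeled_brackets(test)
    matched = sum((g & t).values())        # multiset intersection
    if matched == 0:
        return 0.0
    precision = matched / sum(t.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)

def make_tree(noun_rule):
    """Toy analysis of 'the cat meowed.' with a configurable noun rule."""
    return ("SB-HD_MC_C", [
        ("SP-HD_N_C", [
            ("D_-_THE_LE", "the"),
            (noun_rule, [("N_-_C_LE", "cat")]),
        ]),
        ("W_PERIOD_PLR", [("V_PST_OLR", [("V_-_LE", "meowed.")])]),
    ])

gold = make_tree("N_SG_ILR")
guess = make_tree("OTHER_RULE")            # hypothetical wrong analysis
exact_match = gold == guess
f1 = parseval_f1(gold, guess)
```

The wrong analysis differs from the gold one in a single node, so it earns nothing under exact match but keeps four of five brackets under the bracket metric.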

Do metrics agree with each other?

What if models that perform well under Exact Tree Match don’t necessarily perform well under, say, PARSEVAL?

◮ For an arbitrary pair of evaluation metrics, that could happen.

◮ Different metrics can evaluate different aspects of a model’s performance.

◮ But how about for the metrics that people commonly employ as overall figures of merit?

Fortunately, this doesn’t turn out to be much of an issue.

◮ Conducted two experiments

◮ First: optimizing MaxEnt meta-parameter

◮ Second: picking feature combinations

Optimizing MaxEnt meta-parameter

◮ MaxEnt has a regularization parameter

◮ Controls the trade-off between generalization and overfitting

◮ Given an evaluation metric, we can determine the “optimal” value for the regularization parameter through cross-validation.

◮ How does this optimal value vary as a function of which metric is used?

Optimizing MaxEnt meta-parameter (cont’d)

[Figure: “Regularized Performance of pcfg baseline”. x-axis: Regularization Variance Parameter, log scale from 0.001 to 1000; y-axis: Exact Match Accuracy (%), 24 to 42; series: pcfg baseline]

Optimizing MaxEnt meta-parameter (cont’d)

[Figure: “Z-Score Comparison of Metrics”. x-axis: Regularization, log scale from 0.001 to 1000; y-axis: Z-Scores, -1.5 to 1]
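Metrics with very different ranges (roughly 40% for exact match, roughly 90% for PARSEVAL) can be put on one plot by converting each metric's curve over regularization settings to z-scores. The sketch below shows that normalization; the numbers are invented for illustration and are not read off the talk's figure, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Center a curve on its mean and scale by its standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Made-up scores at five regularization settings, one curve per metric.
curves = {
    "exact_match": [30.0, 36.0, 40.0, 38.0, 32.0],
    "labeled_parseval": [86.0, 87.5, 88.0, 87.8, 86.5],
}
normalized = {name: z_scores(vals) for name, vals in curves.items()}

# Index of the best setting under each metric.
best_index = {name: z.index(max(z)) for name, z in normalized.items()}
```

After normalization the curves share a scale, so it is easy to see whether their peaks fall at the same regularization value.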

Optimizing MaxEnt meta-parameter (cont’d)

Evidently, in practice the optimum meta-parameter is almost exactly the same for all the different metrics.

◮ Maximum error rate increase from optimizing with a different metric from the set listed a few slides ago, on the baseline feature configuration: 0.41%.

◮ Average of that figure over all the feature set configurations: 0.81%.

◮ Conclusion: it doesn’t really matter what metric you use to optimize the meta-parameter.
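The comparison behind these figures can be sketched as follows: pick the regularization value that is best under one metric, then measure how much worse the error rate is under another metric than it would have been at that metric's own optimum. The slides do not say whether the increase is absolute or relative, so the absolute percentage-point definition used here is an assumption, and all names and numbers are made up.

```python
def best_setting(curve):
    """Regularization value with the highest score on this metric."""
    return max(curve, key=curve.get)

def error_increase(curves, target, selector):
    """Extra error (in points) on `target` when tuning with `selector`."""
    chosen = best_setting(curves[selector])
    optimal = best_setting(curves[target])
    err_chosen = 100.0 - curves[target][chosen]
    err_optimal = 100.0 - curves[target][optimal]
    return err_chosen - err_optimal

# Made-up cross-validation scores (%), keyed by regularization value.
curves = {
    "exact_match": {0.1: 38.0, 1.0: 40.0, 10.0: 39.5},
    "labeled_parseval": {0.1: 87.0, 1.0: 87.9, 10.0: 88.0},
}

cost = error_increase(curves, target="exact_match", selector="labeled_parseval")
worst = max(
    error_increase(curves, target=t, selector=s)
    for t in curves for s in curves if s != t
)
```

Taking the maximum over all metric pairs gives a single worst-case figure per feature configuration, which can then be averaged across configurations.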

Selecting a feature set combination

◮ Above, I described a handful of feature configurations; I actually tested about 60 different combinations.

◮ In principle, the metric that I used to decide which one was best matters.

◮ However, in fact the metrics all ranked the same configuration as the best.

Metrics: Summary

There are many different syntax disambiguation evaluation metrics available.

However, from the point of view of optimizing a model, there is little difference between the 6 or so most commonly used metrics.

Syntax Disambiguation: Conclusion

My best combination model appears to represent a decent approximation of the best performance available from current techniques when viewed through any of the commonly used metrics.

Hence it is a suitable baseline for judging the success of future forays into joint disambiguation.