Joint Ambiguity Modeling in NLP

Woodley Packard

Universitet i Oslo

March 21, 2011

Modeling Ambiguity
Syntax Disambiguation
Maximum Entropy Models
MaxEnt for the ERG
Evaluation Metrics

Introduction

Ambiguity is a central phenomenon in natural language, affecting accuracy and efficiency in most if not all types of NLP.

Types of Ambiguity

◮ Word boundaries
  ◮ Difficult in some languages, e.g. Chinese
  ◮ Norwegian examples lifted from recent mailing list activity:
    ◮ Lege-ring, sei-del, sel-skap, bru-sau-tomat, sports-av-iser...

◮ Sentence Boundaries
  ◮ Again, not marked in some languages
  ◮ Even in English, the markers are overloaded:
    ◮ The CEO had the P.R. Department leaders make risky moves.
    ◮ The citizens voted in the U.S. Presidential Election polls.

◮ Word Senses
  ◮ English word bank:
    ◮ (n) Financial institution
    ◮ (n) Side of a river / stream
    ◮ (n) Repository of resources (tree bank, bank of switches)
    ◮ (v) To bet everything (on ...)
    ◮ (v) Tilting to make a turn in an airplane
    ◮ (v) To do business at a bank (financial institution)
    ◮ ... others.

◮ Syntactic Ambiguity
  ◮ I saw the man with the telescope.

◮ Anaphora
  ◮ Jessie and Alex grew up together, but eventually he moved to the West coast and she moved to the East coast.
◮ ... others.

Traditional Approaches to Dealing with Ambiguity

◮ How well can we resolve some particular type of ambiguity?
◮ Rich literature about how to handle each type of ambiguity.
◮ None are completely solved problems.
◮ Some have been solved fairly well (word boundaries, sentence boundaries).
◮ But most have room for improvement.

An Emerging Trend in Research

◮ Results about how diverse types of information can be helpful in ambiguity resolution.
◮ Large body of research about using syntax as a guide for resolving anaphora.

Some others:
◮ “Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity”, D. Lin, ACL 1997.
◮ “Improving Parsing and PP attachment Performance with Sense Information”, E. Agirre, T. Baldwin, D. Martinez, ACL 2008.

Joint Modeling of Ambiguity

Idea: Instead of modeling individual marginal distributions for each type of ambiguity, or conditional models involving two types of ambiguity, what if we model a joint distribution for all the types of ambiguity we are interested in at once?

◮ Gain: modeling flexibility
◮ Cost: greater complexity
◮ Possible framework: graphical models
◮ Inference and parameter estimation: sometimes tractable, sometimes not.
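The contrast between separate and joint decisions can be sketched with a toy example. Everything below is an assumption made for illustration: the two variables (a PP attachment and a sense of "bank"), the score tables, and the numbers are invented, and brute-force enumeration stands in for real graphical-model inference.

```python
from itertools import product

# Hypothetical local scores (log-potentials) for two ambiguity types.
# All numbers are invented for illustration only.
syntax_scores = {"attach_to_verb": 0.6, "attach_to_noun": 0.5}
sense_scores = {"bank/institution": 0.7, "bank/river": 0.4}

# Hypothetical pairwise scores linking the two decisions.
compat = {
    ("attach_to_verb", "bank/institution"): -1.0,
    ("attach_to_verb", "bank/river"): 0.2,
    ("attach_to_noun", "bank/institution"): 0.5,
    ("attach_to_noun", "bank/river"): -0.3,
}

def independent_decode():
    """Pick each variable separately, ignoring the interaction."""
    syn = max(syntax_scores, key=syntax_scores.get)
    sen = max(sense_scores, key=sense_scores.get)
    return syn, sen

def joint_decode():
    """Pick the pair that maximizes the total unnormalized log score."""
    def total(pair):
        syn, sen = pair
        return syntax_scores[syn] + sense_scores[sen] + compat[pair]
    # Enumerates every combination: the cost grows exponentially
    # with the number of ambiguity variables.
    return max(product(syntax_scores, sense_scores), key=total)

print(independent_decode())
print(joint_decode())
```

With these invented scores the two strategies disagree on the attachment, which is the kind of flexibility the joint model buys; the exhaustive loop in `joint_decode` hints at the complexity cost.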

A Graphical Model for Ambiguity

[Figure: graphical model linking types of ambiguity. Recoverable node labels: Anaphora, Syntax. Recoverable edge labels: Hobbs 1978; Lin 1997; Agirre et al. 2008.]

Other Dependencies Seem Likely

◮ Information about anaphora may be helpful in disambiguating syntax.
◮ Word Priming: information about the senses of words in a given sentence may be helpful for determining word senses in nearby subsequent sentences.
◮ Syntactic Priming: information about the constructions used in a given sentence may be helpful for determining the syntax of nearby subsequent sentences.
◮ Maybe others too!

A Revised Graphical Model for Ambiguity

[Figure: revised graphical model with nodes Anaphora; Syntax1, Syntax2, Syntax3; WSD1, WSD2, WSD3.]

A Roadmap for my Project

◮ Build basic disambiguation systems for a few types of ambiguity
◮ Set baselines for how well we can disambiguate without a joint model
◮ Learn how to combine information from disparate systems to form a joint model
◮ Evaluate the joint model’s performance on each type of ambiguity vs. the baselines

Where am I on that roadmap?

◮ Not very far along, really. I’ve built a syntax disambiguation system and set a strong baseline for syntax disambiguation in isolation.
◮ Started some experiments into using more global information, but so far nothing worth reporting.

Next, shifting gears...

I’ll describe the work I’ve done exploring the space of syntax disambiguation for HPSG grammars.

Syntax Disambiguation

◮ Given an utterance, find the best analysis licensed by your grammar.
◮ With broad-coverage grammars, there can be a lot of candidate analyses!
◮ ERG licenses more than 10,000 distinct analyses for:

I would still have an appointment slot free on Tuesday, the sixth of April, but only in the afternoon.

Syntax Disambiguation (cont’d)

The different analyses are usually not all semantically equivalent. How do we know which meaning was intended by the speaker?

Common solution: annotate the intended meaning on a sufficiently large corpus of example utterances, and then apply machine learning techniques to build a model that will allow us to guess the intended meaning on unseen data.

Layout

Modeling Ambiguity

Syntax Disambiguation
  Maximum Entropy Models
  MaxEnt for the ERG
  Evaluation Metrics

Conditional log-linear models

$$p(y \mid x) = \frac{e^{w \cdot f(x, y)}}{\sum_{y'} e^{w \cdot f(x, y')}}$$

where $w$ is a vector of feature weights, typically learned from the training data.

◮ Conditional probability model for classification/ranking problems.
◮ Describe relationship of class $y$ to input $x$ by $n$ real-valued feature functions $f_j(x, y)$.
◮ Equivalently, a vector valued feature function $f(x, y)$.

Since the denominator is a function only of $x$ and does not depend on $y$, determining $\arg\max_y p(y \mid x)$ amounts to maximizing $w \cdot f(x, y)$.
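The formula and the argmax shortcut can be sketched in a few lines. This is a minimal illustration, not the talk's implementation; the sparse-dictionary representation, the function names, and the toy weights are assumptions.

```python
import math

def dot(w, f):
    """Sparse dot product: w and f map feature names to real values."""
    return sum(w.get(name, 0.0) * value for name, value in f.items())

def conditional_probs(w, candidates):
    """p(y|x) for each candidate y, given its feature vector f(x, y)."""
    scores = [dot(w, f) for f in candidates]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)    # the normalizer, shared by all candidates
    return [e / z for e in exps]

def best_candidate(w, candidates):
    """argmax_y p(y|x), computed from w . f(x, y) without the normalizer."""
    return max(range(len(candidates)), key=lambda i: dot(w, candidates[i]))

# Toy weights and candidate feature vectors (invented for illustration).
w = {"a": 1.0, "b": -0.5}
candidates = [{"a": 1}, {"a": 2, "b": 1}, {"b": 2}]
probs = conditional_probs(w, candidates)
print(probs, best_candidate(w, candidates))
```

Ranking by the raw score and ranking by the normalized probability select the same candidate, so the normalizer is only needed when actual probabilities are wanted, for example during training.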

Maximum Entropy Models

◮ One common way of selecting $w$.
◮ Also known as Multinomial Logistic Regression.
◮ Corresponds to maximum likelihood estimation: maximize the conditional likelihood of the training data.
◮ Alternate description: model with maximum entropy subject to $E_p(f(x, y)) = \tilde{f}(x, y)$, the empirical expectation of $f(x, y)$ on the training data.
◮ Popular applications: POS tagging, NER, parse disambiguation, ...
◮ Other techniques for picking $w$: SVMs, Perceptrons, ...
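The link between maximum likelihood and the expectation constraint shows up in the gradient of the conditional log-likelihood, which is the empirical feature counts minus the model's expected counts. The sketch below uses plain gradient ascent with a Gaussian prior; the optimizer, step size, `variance` value, and toy data are all assumptions, and real systems typically use a dedicated optimizer.

```python
import math

def dot(w, f):
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def probs(w, cands):
    scores = [dot(w, f) for f in cands]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train(data, steps=500, rate=0.5, variance=100.0):
    """data: list of (candidates, gold_index); candidates are sparse dicts."""
    w = {}
    for _ in range(steps):
        grad = {}
        for cands, gold in data:
            for k, v in cands[gold].items():          # empirical counts
                grad[k] = grad.get(k, 0.0) + v
            for p, f in zip(probs(w, cands), cands):  # expected counts
                for k, v in f.items():
                    grad[k] = grad.get(k, 0.0) - p * v
        for k in set(grad) | set(w):
            # Gaussian prior with the given variance acts as a regularizer.
            g = grad.get(k, 0.0) - w.get(k, 0.0) / variance
            w[k] = w.get(k, 0.0) + rate * g
    return w

# Toy training set (invented): each item lists candidate analyses as
# feature dicts, plus the index of the gold candidate.
data = [
    ([{"A": 1}, {"B": 1}], 0),
    ([{"A": 1, "C": 1}, {"B": 2}], 0),
    ([{"B": 1}, {"C": 1}], 1),
]
w = train(data)
predictions = [max(range(len(c)), key=lambda i: dot(w, c[i])) for c, _ in data]
print(w, predictions)
```

At the optimum the gradient vanishes, so (up to the prior term) expected and empirical feature counts agree, which is the constraint in the maximum entropy description.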


The ERG and WeScience

◮ English Resource Grammar (ERG): broad-coverage precision computational English grammar based on HPSG
◮ WeScience: a treebank of gold ERG analyses for around 9000 sentences from Wikipedia in the domain of Computational Linguistics.

How well can we disambiguate when parsing WeScience inputs just by looking at ERG analyses in isolation?

◮ Use Maximum Entropy modeling; $x$ = an input sentence and $y$ = a candidate analysis.
◮ $f(x, y)$ = a vector of features describing a candidate analysis, possibly making reference to the input item.
◮ Train a MaxEnt model on WeScience.
◮ To disambiguate an unseen input, select the candidate analysis with the largest $p(y \mid x)$.
◮ Scoring: on a held-out test set, for what proportion of the inputs can our model identify the correct gold analysis? (exact match)
◮ Use 10-fold cross-validation to reduce measurement noise.
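The scoring protocol can be sketched as two small helpers. The fold assignment (every k-th item) and the function names are assumptions; the talk does not say how the folds were drawn.

```python
def k_folds(n_items, k=10):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_items))
    for i in range(k):
        test = indices[i::k]  # every k-th item, offset by i (an assumption)
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        yield train, test

def exact_match_accuracy(predicted, gold):
    """Fraction of inputs whose top-ranked analysis equals the gold one."""
    assert len(predicted) == len(gold)
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

folds = list(k_folds(25, k=10))
print(len(folds), folds[0][1])
print(exact_match_accuracy(["a", "b", "c", "d"], ["a", "x", "c", "y"]))
```

Each item is tested exactly once across the folds, and the reported figure would be the exact match accuracy averaged over the ten held-out portions.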

An ERG analysis consists of a derivation tree and an MRS meaning representation.

Simplified candidate analysis for The very large cat meowed.:

SB-HD_MC_C
  SP-HD_N_C
    D_-_THE_LE
      The
    AJ-HDN_NORM_C
      SP-HD_HC_C
        AV_-_DG-V_LE
          very
        AJ_-_I_LE
          large
      N_SG_ILR
        N_-_C_LE
          cat
  W_PERIOD_PLR
    V_PST_OLR
      V_-_LE
        meowed.

{_the_q(x1), _cat_n(x1), _large_a(e2, x1), _meow_v(e3, x1), _very_x_deg(e4, e2)}

What’s there left to do?

Define $f(x, y)$.

◮ Note that since the analysis $y$ includes all the information needed to reconstruct $x$, we may as well just talk about $f(y)$ instead of $f(x, y)$.
◮ We’ll make $f(y)$ a sparse and very high dimensional vector, with a few 1’s and 2’s here and there.
◮ Each dimension is called a feature.

Cookie Cutter Features

◮ In previous work on HPSG parse disambiguation, each feature records the number of times some particular pattern of tree nodes or MRS predications/variables is found in the candidate analysis $y$.
◮ Defining $f(y)$ amounts to making a list of subgraphs to look for in the tree and MRS for $y$.
◮ A straightforward way of defining a bunch of “interesting” subgraphs to look for is to decide on a “cookie-cutter” shape, and use that to cut out sections of all the trees in the treebank.
◮ This allows us to quickly enumerate tens or hundreds of thousands of subgraphs that can occur in analyses, and build feature vectors out of them.
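The cookie-cutter idea can be sketched for one simple shape, a node together with its immediate daughters. The nested-tuple tree encoding, the function names, and the underscore-joined label spelling (e.g. SB-HD_MC_C) are assumptions made for this illustration.

```python
from collections import Counter

# Derivation tree as nested tuples: (label, child, ...); a lexical entry is
# (lexical_type, "surface form"). The tree is the talk's example analysis.
TREE = ("SB-HD_MC_C",
        ("SP-HD_N_C",
         ("D_-_THE_LE", "The"),
         ("AJ-HDN_NORM_C",
          ("SP-HD_HC_C", ("AV_-_DG-V_LE", "very"), ("AJ_-_I_LE", "large")),
          ("N_SG_ILR", ("N_-_C_LE", "cat")))),
        ("W_PERIOD_PLR", ("V_PST_OLR", ("V_-_LE", "meowed."))))

def local_subtrees(node):
    """Yield (parent, daughter, ...) label tuples, one per internal node."""
    children = [c for c in node[1:] if not isinstance(c, str)]
    if children:
        yield (node[0],) + tuple(c[0] for c in children)
    for child in children:
        yield from local_subtrees(child)

def feature_vector(tree):
    """Sparse count vector f(y): how often each cut-out shape occurs."""
    return Counter(local_subtrees(tree))

fv = feature_vector(TREE)
for shape, count in fv.items():
    print(count, shape)
```

Running the same cutter over every tree in a treebank and collecting the distinct shapes gives the feature inventory; each candidate analysis is then a sparse vector of counts over that inventory.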

Baseline Features

One of the simplest useful cookie cutters looks like this:

[Figure: the cookie-cutter shape, shown next to an example subgraph it cuts out.]

Example subgraph:

SP-HD_HC_C
  AV_-_DG-V_LE
  AJ_-_I_LE

This cookie-cutter matches about 57,000 distinct subgraphs from WeScience.

Baseline Features (cont’d)

◮ So, we get a 57,000 dimensional feature space.
◮ MaxEnt model accuracy: 40.4%
◮ For comparison, random choice accuracy: 8.1%
◮ It turns out 40.4% is a fairly strong baseline.
◮ We’ll take this as our baseline against which to evaluate other ideas.

Can we do better?

◮ Of course!
◮ I tried about 60 different combinations of feature sets.
◮ I’ll show you several of the most interesting ones.
◮ Baseline features included in addition to those I’ll describe.

Grandparenting

The single most helpful feature type just adds some parents to the baseline cookie cutter:

[Figure: cookie cutters GP[1], GP[2], GP[3].]

Accuracy: GP[1] 42.97%, GP[2] 44.46%, GP[3] 44.9%
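Grandparenting can be sketched as the local-subtree cutter with a chain of ancestor labels attached. One assumption here is that GP[n] emits a feature for every ancestor depth from 0 up to n; the talk does not spell this out, and the encoding and names are likewise illustrative.

```python
from collections import Counter

# A fragment of the talk's example tree, as nested tuples:
# (label, child, ...); a lexical entry is (lexical_type, "surface form").
FRAGMENT = ("AJ-HDN_NORM_C",
            ("SP-HD_HC_C", ("AV_-_DG-V_LE", "very"), ("AJ_-_I_LE", "large")),
            ("N_SG_ILR", ("N_-_C_LE", "cat")))

def gp_features(node, n, ancestors=()):
    """Yield (ancestor_context, local_shape) pairs with 0..n ancestors."""
    children = [c for c in node[1:] if not isinstance(c, str)]
    if children:
        shape = (node[0],) + tuple(c[0] for c in children)
        for depth in range(min(n, len(ancestors)) + 1):
            # The nearest `depth` ancestors, outermost first.
            context = ancestors[len(ancestors) - depth:]
            yield (context, shape)
    for child in children:
        yield from gp_features(child, n, ancestors + (node[0],))

gp0 = Counter(gp_features(FRAGMENT, 0))
gp1 = Counter(gp_features(FRAGMENT, 1))
print(len(gp0), len(gp1))
```

With n = 0 this reduces to the baseline shape; larger n makes features more specific and therefore sparser, which is one plausible reason the gains in the talk flatten out by GP[3].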

Uncles

[Figure: the uncles cookie cutter.]

Accuracy: 42.97%

Syntactic Dependencies

◮ Converts the tree into a list of syntactic dependencies
◮ e.g.: SB-HD_MC_C (meowed. : V_-_LE, cat : N_-_C_LE)
◮ Each such dependency is considered a feature.
◮ Accuracy: 42.97%
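The conversion from tree to dependencies can be sketched as follows. The head-finding rule is a loud assumption: it reads the head daughter's position off the "HD" part of the rule name (so SB-HD means the second daughter is the head), which is a guess for illustration and not the talk's actual procedure.

```python
# Derivation tree as nested tuples: (label, child, ...); a lexical entry is
# (lexical_type, "surface form"). The tree is the talk's example analysis.
TREE = ("SB-HD_MC_C",
        ("SP-HD_N_C",
         ("D_-_THE_LE", "The"),
         ("AJ-HDN_NORM_C",
          ("SP-HD_HC_C", ("AV_-_DG-V_LE", "very"), ("AJ_-_I_LE", "large")),
          ("N_SG_ILR", ("N_-_C_LE", "cat")))),
        ("W_PERIOD_PLR", ("V_PST_OLR", ("V_-_LE", "meowed."))))

def head_index(rule):
    """ASSUMPTION: the daughter slot named 'HD...' in the rule prefix is the head."""
    parts = rule.split("_")[0].split("-")
    for i, part in enumerate(parts):
        if part.startswith("HD"):
            return i
    return 0  # unary rules and unrecognized names: first daughter

def lexical_head(node):
    """Return (surface form, lexical type) of the node's lexical head."""
    if isinstance(node[1], str):
        return node[1], node[0]
    children = node[1:]
    return lexical_head(children[min(head_index(node[0]), len(children) - 1)])

def dependencies(node):
    """Yield (rule, head, dependent) for every binary node, top down."""
    if isinstance(node[1], str):
        return
    children = node[1:]
    if len(children) == 2:
        h = min(head_index(node[0]), 1)
        yield node[0], lexical_head(children[h]), lexical_head(children[1 - h])
    for child in children:
        yield from dependencies(child)

DEPS = list(dependencies(TREE))
for dep in DEPS:
    print(dep)
```

Under this assumed head rule, the first dependency produced for the example tree is the one quoted in the talk: rule SB-HD_MC_C with head "meowed." and dependent "cat".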

Lexicalizations

◮ Decorate each node in the tree with information about the lexical head of that subtree.
◮ Some decoration choices: part of speech, lexical type, HEAD value, lexeme name, surface form, stem
◮ Best performing decoration: lexeme name
◮ Accuracy with baseline cookie cutter: 44.16%
◮ We can do this with other cookie cutters as well.
◮ Accuracy with GP[2] cookie cutter: 45.89%

MRS Features

So far, we’ve only described features extracted from the derivation tree portion of the analysis.

◮ What about the MRS?
◮ Tried two schemes for encoding MRS into features: variable-centric and predication-centric

Variable-centric MRS Features

For each MRS variable, features for all (unordered) pairs and triples from the set of relations that are known to apply to that variable.

Example triple for x1: (_large_a.ARG1, _meow_v.ARG1, _cat_n.ARG0)

Example pair for e2: (_large_a.ARG0, _very_x_deg.ARG1)

Accuracy: 41.40%
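A variable-centric extractor can be sketched over the example MRS. The encoding is an assumption: each predication is a (predicate, argument list) pair, the i-th argument is taken to fill role ARGi, and predicate names are written in underscore-joined form.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Simplified MRS for "The very large cat meowed." as (predicate, arguments);
# ASSUMPTION: the i-th argument fills role ARGi.
MRS = [
    ("_the_q", ["x1"]),
    ("_cat_n", ["x1"]),
    ("_large_a", ["e2", "x1"]),
    ("_meow_v", ["e3", "x1"]),
    ("_very_x_deg", ["e4", "e2"]),
]

def variable_centric_features(mrs):
    """Count unordered pairs and triples of relations sharing a variable."""
    by_var = defaultdict(list)
    for pred, args in mrs:
        for i, var in enumerate(args):
            by_var[var].append(f"{pred}.ARG{i}")
    feats = Counter()
    for rels in by_var.values():
        rels = sorted(rels)  # canonical order, since the tuples are unordered
        for size in (2, 3):
            for combo in combinations(rels, size):
                feats[combo] += 1
    return feats

vc = variable_centric_features(MRS)
for combo in vc:
    print(combo)
```

For this MRS, x1 carries four relations (six pairs and four triples) and e2 carries two (one pair), so eleven features fire; sorting makes the talk's example triple appear as (_cat_n.ARG0, _large_a.ARG1, _meow_v.ARG1).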

Predication-centric MRS Features

One feature for each elementary predication in the MRS, describing the relations pointed to by the non-ARG0 roles.

Example feature: _very_x_deg(ARG1=_large_a)

Accuracy: 42.61%

Both MRS feature sets combined: 42.63% ...
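A predication-centric extractor can be sketched on the same example MRS. Two assumptions are made loudly here: the i-th argument fills role ARGi, and when a variable is introduced by both a quantifier and another predicate, the non-quantifier predicate is the one a role is said to point to. The talk specifies neither detail.

```python
from collections import Counter

# Simplified MRS for "The very large cat meowed." as (predicate, arguments);
# ASSUMPTION: the i-th argument fills role ARGi.
MRS = [
    ("_the_q", ["x1"]),
    ("_cat_n", ["x1"]),
    ("_large_a", ["e2", "x1"]),
    ("_meow_v", ["e3", "x1"]),
    ("_very_x_deg", ["e4", "e2"]),
]

def predication_centric_features(mrs):
    """One feature per predication, naming what its non-ARG0 roles point to."""
    # ASSUMPTION: quantifiers (names ending in "_q") are skipped when
    # resolving which predicate introduces a variable.
    intro = {args[0]: pred for pred, args in mrs if not pred.endswith("_q")}
    feats = Counter()
    for pred, args in mrs:
        roles = [f"ARG{i}={intro.get(var, var)}"
                 for i, var in enumerate(args) if i > 0]
        feats[f"{pred}({', '.join(roles)})"] += 1
    return feats

pc = predication_centric_features(MRS)
for name in pc:
    print(name)
```

Under these assumptions the extractor reproduces the talk's example feature, _very_x_deg(ARG1=_large_a), alongside one feature for each of the other four predications.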

Combining Several Feature Sets

◮ Information contained in different feature templates is not orthogonal
◮ Subadditivity of performance improvements
◮ However, modest improvements are possible.
◮ Best combination I’ve found: all the MRS features, lexeme name lexicalization, GP[2], and a few other features I didn’t describe that don’t perform well in isolation.
◮ Accuracy: 47.52%

MaxEnt Disambiguation for WeScience: Summary

Picking the right analysis is hard.

Random choice      8.1%
Strong baseline   40.4%
Best model        47.5%


Doesn’t 47% seem kind of sad?

Well, yes, in a way.

◮ But there are other ways to evaluate that give us much cheerier numbers!
◮ 47% of the time we get an exactly correct answer.
◮ But we don’t assign ourselves any partial credit for getting a partially correct answer.
◮ Many other metrics do.

Some Other Metrics

Metric               Random   Baseline   Best Model
Exact Tree Match       8.1%      40.4%        47.5%
Exact MRS Match        8.8%      41.3%        48.5%
Unlabeled PARSEVAL    80.7%      93.3%        94.7%
Labeled PARSEVAL      70.3%      88.0%        90.5%
Unlabeled Syn-Deps    79.2%      92.1%        93.8%
Labeled Syn-Deps      71.0%      89.1%        91.3%
Elementary Deps       82.1%      94.2%        95.4%
Leaf Ancestor         79.0%      92.4%        93.7%
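To see how a partial-credit metric differs from exact match, here is a minimal sketch of a PARSEVAL-style bracket F1. Conventions vary between implementations (for example whether lexical entries or unary chains count), so the choices below, along with the tree encoding and the toy trees, are assumptions.

```python
from collections import Counter

def brackets(node, start=0, labeled=True):
    """Return (end_position, multiset of (label, start, end)) for a tree."""
    if isinstance(node[1], str):  # lexical entry: covers one token, no bracket
        return start + 1, Counter()
    spans = Counter()
    pos = start
    for child in node[1:]:
        pos, child_spans = brackets(child, pos, labeled)
        spans += child_spans
    spans[(node[0] if labeled else None, start, pos)] += 1
    return pos, spans

def parseval_f1(predicted, gold, labeled=True):
    """Harmonic mean of bracket precision and recall."""
    _, p = brackets(predicted, labeled=labeled)
    _, g = brackets(gold, labeled=labeled)
    matched = sum((p & g).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(p.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)

# Toy trees (invented): the prediction mislabels one constituent.
GOLD = ("S", ("NP", ("D", "the"), ("N", "cat")),
        ("VP", ("V", "saw"), ("NP", ("D", "a"), ("N", "dog"))))
PRED = ("S", ("NP", ("D", "the"), ("N", "cat")),
        ("VP", ("V", "saw"), ("XP", ("D", "a"), ("N", "dog"))))

print(PRED == GOLD)
print(parseval_f1(PRED, GOLD, labeled=True))
print(parseval_f1(PRED, GOLD, labeled=False))
```

The mislabeled prediction scores zero under exact match but still earns credit for the three correctly labeled brackets, and full credit when labels are ignored, which is why the bracket-based rows in the table sit so much higher than the exact match rows.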

Do metrics agree with each other?

What if models that perform well under Exact Tree Match don’t necessarily perform well under, say, PARSEVAL?

◮ For an arbitrary pair of evaluation metrics, that could happen.
◮ Different metrics can evaluate different aspects of a model’s performance.
◮ But how about for the metrics that people commonly employ as overall figures of merit?

Fortunately, this doesn’t turn out to be much of an issue.

◮ Conducted two experiments
  ◮ First: optimizing MaxEnt meta-parameter
  ◮ Second: picking feature combinations

Optimizing MaxEnt meta-parameter

◮ MaxEnt has a regularization parameter
◮ Controls the trade-off between generalization and overfitting
◮ Given an evaluation metric, we can determine the “optimal” value for the regularization parameter through cross-validation.
◮ How does this optimal value vary as a function of which metric is used?

Optimizing MaxEnt meta-parameter (cont’d)

[Figure: “Regularized Performance of pcfg baseline”. x-axis: Regularization Variance Parameter, log scale from 0.001 to 1000. Plotted series: pcfg baseline.]

Optimizing MaxEnt meta-parameter (cont’d)

[Figure: “Z-Score Comparison of Metrics”. x-axis: Regularization, log scale from 0.001 to 1000.]

Optimizing MaxEnt meta-parameter (cont’d)

Evidently, in practice the optimum meta-parameter is almost exactly the same for all the different metrics.

◮ Maximum error rate increase from optimizing with a different metric from the set listed a few slides ago, on baseline feature configuration: 0.41%.
◮ Average of that figure over all the feature set configurations: 0.81%.
◮ Conclusion: it doesn’t really matter what metric you use to optimize the meta-parameter.
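The comparison behind these figures can be sketched as follows. Several things are assumptions: "error rate increase" is read here as the relative rise in a metric's error (one minus its score) when the regularization value is chosen by a different metric, and the score curves are invented numbers, not results from the talk.

```python
def best_setting(curve):
    """curve maps a regularization value to a score in [0, 1]; higher is better."""
    return max(curve, key=curve.get)

def error_rate_increase(curves, target, tuned_on):
    """Relative rise in `target` error when the setting is tuned on `tuned_on`.

    Assumes the target metric's own best error is nonzero.
    """
    own = 1.0 - curves[target][best_setting(curves[target])]
    other = 1.0 - curves[target][best_setting(curves[tuned_on])]
    return (other - own) / own

def max_error_rate_increase(curves):
    """Worst case over all ordered pairs of distinct metrics."""
    return max(error_rate_increase(curves, t, s)
               for t in curves for s in curves if t != s)

# Invented cross-validation scores per metric and regularization value.
curves = {
    "exact": {0.1: 0.40, 1: 0.44, 10: 0.43},
    "parseval": {0.1: 0.90, 1: 0.92, 10: 0.93},
}
worst = max_error_rate_increase(curves)
print(best_setting(curves["exact"]), best_setting(curves["parseval"]), worst)
```

In this invented example the two metrics prefer different settings and the penalty is visible; the talk's finding is that on the real data the metrics' optima nearly coincide, so the corresponding penalty stays below one percent.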

Selecting a feature set combination

◮ Above, I described a handful of feature configurations; I actually tested about 60 different combinations.
◮ In principle, the metric that I used to decide which one was best matters.
◮ However, in fact the metrics all ranked the same configuration as the best.

Metrics: Summary

There are many different syntax disambiguation evaluation metrics available.

However, from the point of view of optimizing a model, there is little difference between the 6 or so most commonly used metrics.

Syntax Disambiguation: Conclusion

My best combination model appears to represent a decent approximation of the best performance available from current techniques when viewed through any of the commonly used metrics.

Hence it is a suitable baseline for judging the success of future forays into joint disambiguation.
