Question Classification II


Page 1: Question Classification II

Question Classification II

Ling573: NLP Systems and Applications

April 30, 2013

Page 2: Question Classification II

Roadmap

Question classification variations:

SVM classifiers

Sequence classifiers

Sense information improvements

Question series


Page 5: Question Classification II

Question Classification with Support Vector Machines

Hacioglu & Ward 2003
Same taxonomy, training, and test data as Li & Roth

Approach:

Shallow processing

Simpler features

Strong discriminative classifiers


Page 8: Question Classification II

Features & Processing

Contrast (Li & Roth): POS and chunk info; NE tagging; other sense info

Preprocessing: letters only, lowercased, stopwords removed, stemmed

Terms:
Most informative 2000 word n-grams
IdentiFinder NE tags (7 or 9 tags)
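As a concrete illustration (not Hacioglu & Ward's actual pipeline), a minimal Python sketch of this preprocessing using NLTK:

```python
# Minimal sketch of the described preprocessing with NLTK (a stand-in,
# not the authors' code). Assumes the 'stopwords' corpus is installed.
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STOP = set(stopwords.words("english"))
STEM = PorterStemmer()

def preprocess(question):
    """Letters only, lowercased, stopwords removed, stemmed."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [STEM.stem(t) for t in tokens if t not in STOP]

print(preprocess("How many dogs pull a sled at the Iditarod?"))
# e.g. ['mani', 'dog', 'pull', 'sled', 'iditarod']
```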


Page 11: Question Classification II

Classification & Results

Employs support vector machines for classification
Best results: bigrams, 7 NE classes
Better than Li & Roth with POS + chunk features, but without semantics
Fewer NE categories worked better: more categories means more errors
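A rough reconstruction of this setup with scikit-learn; the paper used its own SVM and selected the most informative n-grams, so the training data and the frequency-based cutoff below are illustrative:

```python
# Sketch of a bigram + linear-SVM question classifier (illustrative only;
# max_features keeps the most frequent, not the most informative, n-grams).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

questions = ["Who wrote Hamlet ?", "How much does a rhino weigh ?"]
labels = ["HUM:ind", "NUM:weight"]          # Li & Roth fine-grained classes

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), max_features=2000),  # uni- and bigrams
    LinearSVC(),                            # one-vs-rest linear SVM
)
clf.fit(questions, labels)
print(clf.predict(["How much does an elephant weigh ?"]))    # -> NUM:weight
```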


Page 22: Question Classification II

Enhanced Answer Type Inference … Using Sequential Models

Krishnan, Das, and Chakrabarti 2005
Improves question classification with CRF extraction of ‘informer spans’

Intuition: humans identify the answer type from a few tokens, with little syntax:
Who wrote Hamlet?
How many dogs pull a sled at the Iditarod?
How much does a rhino weigh?

An informer span is a single contiguous span of tokens, e.g.:
How much does a rhino weigh?
Who is the CEO of IBM?


Page 29: Question Classification II

Informer Spans as Features

Sensitive to question structure: What is Bill Clinton’s wife’s profession?

Idea: augment question-classifier word n-grams with informer-span (IS) information

Informer span features:
IS n-grams
Hypernyms of informer n-grams: generalize over words or compounds
WSD? No


Page 32: Question Classification II

Effect of Informer Spans

Classifier: linear SVM, multiclass
Notable improvement from IS hypernyms
Better than using hypernyms of all words: filters out sources of noise
Biggest improvements on ‘what’ and ‘which’ questions

Page 33: Question Classification II

Perfect vs. CRF Informer Spans (comparison table not transcribed)


Page 38: Question Classification II

Recognizing Informer Spans

Idea: informer spans are contiguous and syntactically governed
Use a sequential learner with syntactic information:
Tag spans with B(egin), I(nside), O(utside)
Employ syntax to capture long-range factors

Matrix of features derived from the parse tree
Cell x[i,l]: i is the token position, l is the depth in the parse tree (only 2 levels used)
Values:
Tag: POS or constituent label at that position
Num: number of preceding chunks with the same tag
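A toy illustration of the x[i,l] table built with nltk.Tree over a hand-written parse; the level choice and the chunk counting are simplified relative to the paper:

```python
# Toy illustration of the x[i, l] table (token positions x 2 parse levels).
from nltk import Tree

parse = Tree.fromstring(
    "(SQ (WHNP (WP Who)) (VP (VBD wrote) (NP (NNP Hamlet))) (. ?))")

def feature_matrix(tree):
    rows = []
    for i, _tok in enumerate(tree.leaves()):
        path = tree.leaf_treeposition(i)
        rows.append({
            "i": i + 1,
            "tag_l1": tree[path[:-1]].label(),  # level 1: POS tag
            "tag_l2": tree[path[:1]].label(),   # level 2: top constituent (simplified)
        })
    # Num: count of preceding chunks (maximal runs) with the same level-2 tag
    runs, prev = [], None
    for r in rows:
        if r["tag_l2"] != prev:
            runs.append(r["tag_l2"])
            prev = r["tag_l2"]
        r["num_l2"] = runs[:-1].count(r["tag_l2"])
    return rows

for row in feature_matrix(parse):
    print(row)
```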

Page 39: Question Classification II

Parser Output

Example parse (tree figure not transcribed)

Page 40: Question Classification II

Parse Tabulation

Encoding and table (not transcribed)


Page 44: Question Classification II

CRF Indicator Features

Cell features:
IsTag, IsNum: e.g., y_4 = 1 and x[4,2].tag = NP
Also IsPrevTag, IsNextTag

Edge features:
IsEdge(u,v): y_{i-1} = u and y_i = v
IsBegin, IsEnd

All features improve accuracy
Question accuracy: oracle informer spans 88%; CRF spans 86.2%
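A hedged sketch of the BIO tagger with sklearn-crfsuite; the paper used its own CRF, and the feature names, toy cells, and the informer label on ‘Hamlet’ are illustrative assumptions:

```python
# Sketch of BIO informer-span tagging with sklearn-crfsuite (illustrative).
import sklearn_crfsuite

# One question, "Who wrote Hamlet ?", as cells from a 2-level parse table
cells = [
    {"tag_l1": "WP",  "tag_l2": "WHNP", "num": 0},
    {"tag_l1": "VBD", "tag_l2": "VP",   "num": 0},
    {"tag_l1": "NNP", "tag_l2": "VP",   "num": 0},
    {"tag_l1": ".",   "tag_l2": ".",    "num": 0},
]

def token_features(cells, i):
    f = {
        "tag_l1": cells[i]["tag_l1"],    # IsTag at parse level 1 (POS)
        "tag_l2": cells[i]["tag_l2"],    # IsTag at parse level 2 (constituent)
        "num": str(cells[i]["num"]),     # IsNum
        "is_begin": i == 0,              # IsBegin
        "is_end": i == len(cells) - 1,   # IsEnd
    }
    if i > 0:
        f["prev_tag_l2"] = cells[i - 1]["tag_l2"]   # IsPrevTag
    if i + 1 < len(cells):
        f["next_tag_l2"] = cells[i + 1]["tag_l2"]   # IsNextTag
    return f

X = [[token_features(cells, i) for i in range(len(cells))]]
y = [["O", "O", "B", "O"]]  # assumed informer span: "Hamlet" (one token)

# IsEdge(u, v) indicators correspond to the label transitions that
# crfsuite models automatically.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```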


Page 49: Question Classification II

Question Classification Using Headwords and Their Hypernyms

Huang, Thint, and Qin 2008

Questions:
Why didn’t WordNet/hypernym features help in Li & Roth?
Best results in Li & Roth used ~200,000 features (~700 active); can we do as well with fewer features?

Approach:
Refine features: restrict use of WordNet to headwords
Employ WSD techniques
SVM and MaxEnt classifiers


Page 56: Question Classification II

Head Word Features

Head words: chunks and spans can be noisy
E.g., Bought a share in which baseball team?
Type: HUM:group (not ENTY:sport); the head word is more specific

Employ rules over parse trees to extract head words

Issue: vague heads
E.g., What is the proper name for a female walrus? Head = ‘name’?
Apply fix patterns to extract a sub-head instead (e.g., ‘walrus’)

Also, simple regular expressions for other feature types
E.g., ‘what is’ as a cue to the definition type
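A toy sketch of head-rule extraction plus a fix pattern, using nltk.Tree; HEAD_RULES and VAGUE_HEADS are illustrative stand-ins for the paper's modified Collins rules:

```python
# Toy head-rule extraction over a constituency parse (illustrative rules).
from nltk import Tree

HEAD_RULES = {            # per category: preferred child labels, in order
    "SBARQ": ["SQ", "VP"],
    "SQ": ["NP", "VP"],
    "NP": ["NN", "NNS", "NNP", "NP"],
}
VAGUE_HEADS = {"name", "kind", "type", "part"}   # example vague heads

def headword(tree):
    if isinstance(tree, str):                    # reached a token
        return tree
    for label in HEAD_RULES.get(tree.label(), []):
        for child in tree:
            if isinstance(child, Tree) and child.label() == label:
                return headword(child)
    return headword(tree[-1])                    # default: rightmost child

def fixed_headword(tree):
    head = headword(tree)
    if head in VAGUE_HEADS:                      # fix pattern (simplified):
        nps = list(tree.subtrees(lambda t: t.label() == "NP"))
        if nps:                                  # take head of last NP instead
            return headword(nps[-1])
    return head

q = Tree.fromstring(
    "(SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (NP (DT the) (JJ proper) "
    "(NN name)) (PP (IN for) (NP (DT a) (JJ female) (NN walrus))))) (. ?))")
print(headword(q), "->", fixed_headword(q))      # name -> walrus
```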


Page 65: Question Classification II

WordNet Features

Hypernyms:
Enable generalization: dog -> … -> animal
Can generate noise: also dog -> … -> person

Adding low-noise hypernyms:
Which senses? Restrict to senses matching the WordNet POS
Which word senses? Use the Lesk algorithm: overlap between the question and the WordNet gloss
How deep? Based on a validation set: depth 6

Question-type similarity: compute similarity between the headword and each type; use the most similar type as a feature
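A small sketch with NLTK's WordNet interface and Lesk implementation; following only the first hypernym path is a simplification, and the exact synset names may vary:

```python
# Sketch of headword hypernym features via Lesk + WordNet (depth 6 per slide).
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

def hypernym_features(question_tokens, head, depth=6):
    sense = lesk(question_tokens, head, pos=wn.NOUN)   # gloss-overlap WSD
    feats = []
    while sense and len(feats) < depth:
        hypers = sense.hypernyms()
        if not hypers:
            break
        sense = hypers[0]               # simplification: first hypernym path only
        feats.append(sense.name())
    return feats

q = "What is the proper name for a female walrus ?".lower().split()
print(hypernym_features(q, "walrus"))
# e.g. ['pinniped.n.01', 'aquatic_mammal.n.01', ...]
```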


Page 68: Question Classification II

Other Features

Question wh-word: what, which, who, where, when, how, why, and rest

N-grams: unigrams, bigrams, trigrams

Word shape: case features: all upper, all lower, mixed, all digit, other
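One plausible reading of the case buckets as a tiny shape function:

```python
# Tiny sketch of the word-shape (case) feature described above.
def word_shape(token: str) -> str:
    if token.isupper():
        return "all_upper"
    if token.islower():
        return "all_lower"
    if token.isdigit():
        return "all_digit"
    if any(c.isupper() for c in token):
        return "mixed"
    return "other"

print([word_shape(t) for t in ["IBM", "rhino", "30", "Iditarod", "?"]])
# ['all_upper', 'all_lower', 'all_digit', 'mixed', 'other']
```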

Page 69: Question Classification II

Results

Per-feature-type results (table not transcribed)

Page 70: Question Classification II

Results: Incremental

Additive improvement (table not transcribed)


Page 73: Question Classification II

Error Analysis

Inherent ambiguity:
What is mad cow disease? ENTY:disease or DESC:def?

Inconsistent labeling:
What is the population of Kansas? NUM:other
What is the population of Arcadia, FL? NUM:count

Parser errors


Page 82: Question Classification II

Question Classification: Summary

Issue: integrating rich features / deeper processing
Errors in processing introduce noise
Noise in added features increases error
Large numbers of features can be problematic for training

Alternative solutions:
Use more accurate shallow processing and a stronger classifier
Restrict the addition of features to informer spans or headwords
Filter the features to be added