Page 1

Ling573 NLP Systems and Applications
May 7, 2013

Page 4

Local Context and SMT for Question Expansion

"Statistical Machine Translation for Query Expansion in Answer Retrieval", Riezler et al., 2007

Investigates data-driven approaches to query expansion:
- Local context analysis (pseudo-relevance feedback)
- Contrasts: collection-global measures
- Terms identified by statistical machine translation
- Terms identified by automatic paraphrasing

Page 8

Motivation

Fundamental challenge in QA (and IR): bridging the "lexical chasm"
- Divide between the user's info need and the author's lexical choice
- Result of linguistic ambiguity

Many approaches:
- QA: question reformulation, syntactic rewriting; ontology-based expansion; MT-based reranking
- IR: query expansion with pseudo-relevance feedback

Page 10

Task & Approach

Goal: answer retrieval from FAQ pages
- IR problem: matching queries to documents of Q-A pairs
- QA problem: finding answers in a restricted document set

Approach: bridge the lexical gap with statistical machine translation
- Perform query expansion
- Expansion terms identified via phrase-based MT

Page 14

Challenge

Varied results for query expansion:
- IR:
  - External resources: inconclusive (Voorhees), some gain (Fang)
  - Local and global collection statistics: substantial gains
- QA:
  - External resources: mixed results
  - Local and global collection statistics: sizable gains

Page 21

Creating the FAQ Corpus

Prior FAQ collections limited in scope and quality:
- Web search and scraping for 'FAQ' in title/URL
- Search in proprietary collections
- 1-2.8M Q-A pairs; inspection shows poor quality

Extracted from a 4B-page corpus (they're Google), with precision-oriented extraction:
- Search for 'faq', train an FAQ page classifier: ~800K pages
- Q-A pairs: trained labeler. Features? Punctuation, HTML tags (<p>, ...), markers (Q:), lexical cues (what, how)
- 10M pairs (98% precision)
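
To make the feature list concrete, here is a minimal Python sketch of the kinds of surface cues named above; the feature names and the `qa_candidate_features` helper are illustrative assumptions, not the actual feature set of the trained labeler.

```python
import re

# Wh-words the slide cites as lexical cues.
WH_WORDS = {"what", "how", "why", "when", "where", "who"}

def qa_candidate_features(line: str) -> dict:
    """Surface features of the kind listed above. A hypothetical sketch;
    not Riezler et al.'s actual feature set."""
    tokens = line.strip().lower().split()
    return {
        "ends_with_question_mark": line.rstrip().endswith("?"),
        "has_q_marker": bool(re.match(r"\s*Q\s*[:.)]", line)),  # e.g. "Q: ..."
        "has_a_marker": bool(re.match(r"\s*A\s*[:.)]", line)),  # e.g. "A: ..."
        "has_paragraph_tag": "<p>" in line.lower(),
        "starts_with_wh_word": bool(tokens) and tokens[0] in WH_WORDS,
        "token_count": len(tokens),
    }

print(qa_candidate_features("Q: How do I groom my cat?"))
```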

Page 27

Machine Translation Model

SMT query expansion builds on alignments from SMT models.

Basic noisy channel machine translation model:
- e: English; f: French
- p(e): 'language model'; p(f|e): translation model
- Calculated from relative frequencies of phrases
- Phrases: larger blocks of aligned words
- Translation decomposes over a sequence of phrases (see the equations below)
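
The slide's equations did not survive transcription; what follows is the standard textbook formulation the bullets describe, reconstructed rather than copied from the original slide (distortion/reordering terms omitted).

```latex
% Noisy-channel decomposition:
\hat{e} \;=\; \arg\max_{e}\, p(e \mid f) \;=\; \arg\max_{e}\, p(f \mid e)\, p(e)

% Phrase translation probabilities from relative frequencies of aligned
% phrase pairs:
\phi(\bar{f} \mid \bar{e}) \;=\;
  \frac{\operatorname{count}(\bar{f}, \bar{e})}
       {\sum_{\bar{f}'} \operatorname{count}(\bar{f}', \bar{e})}

% Translation model over a sequence of I phrases:
p(\bar{f}_1^{I} \mid \bar{e}_1^{I}) \;=\; \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)
```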

Page 33

Question-Answer Translation

View Q-A pairs from FAQ pages as translation pairs: Q as a translation of A (and vice versa).

Goal: learn alignments between question words and synonymous answer words
- Not interested in fluency, so that part of the MT model is ignored

Issues: differences from typical MT
- Length differences: modify null-alignment weights
- Less important words: use the intersection of bidirectional alignments (sketched below)
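
Intersecting bidirectional alignments is a standard symmetrization heuristic; here is a minimal sketch assuming alignments are sets of index pairs (the toy links below are made up for illustration).

```python
def intersect_alignments(q_to_a, a_to_q):
    """Keep only the links found in both alignment directions.

    q_to_a: set of (q_index, a_index) links from aligning Q -> A
    a_to_q: set of (a_index, q_index) links from aligning A -> Q
    """
    return q_to_a & {(q, a) for (a, q) in a_to_q}

# Toy links (made up) for Q "how to live with cat allergies" vs. an answer:
q_to_a = {(0, 0), (4, 2), (5, 7)}
a_to_q = {(2, 4), (7, 5), (9, 3)}
print(sorted(intersect_alignments(q_to_a, a_to_q)))  # [(4, 2), (5, 7)]
```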

Page 34

Example

Q: "How to live with cat allergies"
- Add expansion terms: translations not seen in the original query

Page 38

SMT-based Paraphrasing

Key approach intuition: identify paraphrases by translating to and from a 'pivot' language
- Paraphrase rewrites yield phrasal 'synonyms'
- E.g., translate E -> C -> E: find the E phrases aligned to a C phrase
- Given a paraphrase pair (trg, syn): pick the best pivot (formulated below)
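
As a plausible reconstruction of the underlying math: Bannard and Callison-Burch's (2005) pivot formulation marginalizes over pivot phrases, and the slide's "pick the best pivot" reads as the max variant; neither equation is copied from the original slide.

```latex
% Marginalizing over pivot phrases (Bannard & Callison-Burch, 2005):
p(\mathrm{syn} \mid \mathrm{trg}) \;=\;
  \sum_{\mathrm{pivot}} p(\mathrm{pivot} \mid \mathrm{trg})\,
                        p(\mathrm{syn} \mid \mathrm{pivot})

% "Pick the best pivot" as a max:
\mathrm{pivot}^{*} \;=\; \arg\max_{\mathrm{pivot}}\;
  p(\mathrm{pivot} \mid \mathrm{trg})\, p(\mathrm{syn} \mid \mathrm{pivot})
```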

Page 41

SMT-based Paraphrasing

Features employed: phrase translation probabilities, lexical translation probabilities, reordering score, # words, # phrases, LM score

Trained on NIST multiple Chinese-English translations

Page 42

Example

Q: "How to live with cat allergies"
- Expansion approach: add new terms from the n-best paraphrases

Page 46

Retrieval Model

Weighted linear combination of vector similarity values, computed between the query and the fields of each Q-A pair.

8 Q-A pair fields: 1) full FAQ text; 2) question text; 3) answer text; 4) title text; 5-8) fields 1-4 without stopwords

Highest weights: raw Q text; then stopped full text and stopped Q text; then stopped A text and stopped title text

No phrase matching or stemming. (A sketch of the scoring follows.)
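
A minimal sketch of such field-weighted scoring, assuming plain cosine similarity over term counts; the weight values and field names are illustrative assumptions, not the tuned weights of the paper.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical weights mirroring the ordering on the slide (raw question
# text highest, then stopped full/question text, then stopped answer/title).
FIELD_WEIGHTS = {
    "question": 1.0,
    "full_stopped": 0.7, "question_stopped": 0.7,
    "answer_stopped": 0.4, "title_stopped": 0.4,
    "full": 0.1, "answer": 0.1, "title": 0.1,
}

def score(query: Counter, fields: dict) -> float:
    """Weighted linear combination of per-field similarities."""
    return sum(w * cosine(query, fields[name])
               for name, w in FIELD_WEIGHTS.items() if name in fields)
```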

Page 51

Query Expansion

SMT term selection:
- New terms from the 50-best paraphrases: 7.8 terms added
- New terms from the 20-best translations: 3.1 terms added
- Why the difference? Paraphrasing is more constrained, less noisy
- Weighting: paraphrase terms: same; translation terms: higher weight on A text

Local expansion (Xu and Croft):
- Top 20 docs, terms weighted by tf-idf of the answers
- Uses answer-preference weighting for retrieval
- 9.25 terms added (a pseudo-relevance feedback sketch follows)
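
A minimal sketch of Xu-and-Croft-style local feedback under simplifying assumptions (whitespace tokens, a precomputed document-frequency table); not the paper's exact weighting.

```python
import math
from collections import Counter

def local_expansion(query, top_answers, doc_freq, n_docs, n_terms=9):
    """Rank terms from the top-retrieved answers by tf-idf and append the
    best ones to the query.

    query:       list of query terms
    top_answers: token lists of the top-20 retrieved answers
    doc_freq:    term -> document frequency over the collection
    n_docs:      collection size
    """
    tf = Counter(t for answer in top_answers for t in answer)
    def tfidf(term):
        return tf[term] * math.log(n_docs / (1 + doc_freq.get(term, 0)))
    qset = set(query)
    candidates = [t for t in tf if t not in qset]
    return list(query) + sorted(candidates, key=tfidf, reverse=True)[:n_terms]
```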

Page 55

Experiments

Test queries from MetaCrawler query logs: 60 well-formed NL questions

Issue: systems fail on 1/3 of the questions (no relevant answers retrieved)
- E.g., "how do you make a cornhusk doll?", "what does 8x certification mean", etc.
- A serious recall problem in the QA DB

Retrieve 20 results; compute evaluation measures @10 and @20

Page 60

Evaluation

Manually label the top 20 answers, by 2 judges.

Quality rating: 3-point scale
- adequate (2): includes the answer
- material (1): some relevant information, no exact answer
- unsatisfactory (0): no relevant info

Compute 'Success_type @ n': type = 2, 1, or 0 as above; n = # of documents returned

Why not MRR? To reduce sensitivity to high rank and reward recall improvement
- MRR rewards systems that put an answer at rank 1 but do poorly on everything else (contrast sketched below)
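
A minimal sketch contrasting the two metrics, assuming each query's results are rated 2/1/0 on the scale above; `rated_queries` and the toy data are illustrative.

```python
def success_at_n(rated_queries, rating=2, n=10):
    """Fraction of queries with at least one answer rated >= `rating`
    among the top n returned documents."""
    hits = sum(any(r >= rating for r in ratings[:n])
               for ratings in rated_queries)
    return hits / len(rated_queries)

def mrr(rated_queries, rating=2, n=20):
    """Mean reciprocal rank of the first answer rated >= `rating`."""
    total = 0.0
    for ratings in rated_queries:
        for rank, r in enumerate(ratings[:n], start=1):
            if r >= rating:
                total += 1.0 / rank
                break
    return total / len(rated_queries)

# A query whose first adequate answer sits at rank 5 contributes a full
# point to success@10 but only 0.2 to MRR -- the rank-1 sensitivity the
# slide objects to.
queries = [[0, 0, 1, 0, 2], [2, 0, 0, 0, 0]]
print(success_at_n(queries), mrr(queries))  # 1.0 0.6
```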

Page 61

Results

[Results table from the slide not transcribed]

Page 63

Discussion

Compare baseline retrieval to:
- Local RF expansion
- Combined translation- and paraphrase-based expansion

Both forms of query expansion improve on the baseline:
- Success_2 @ 10: Local: +11%; SMT: +40.7%
- Success_2,1 (the easier task): little change

Page 64

Example Expansions

[Table of example expansions, marked good (+) or bad (-), not transcribed]

Page 66

Observations

Expansion improves results under the rigorous criterion, more for SMT than for local RF.

Why?
- Both can introduce some good terms
- Local RF introduces more irrelevant terms; SMT is more constrained
- Challenge: balancing introduced information vs. noise

Page 67: Ling573 NLP Systems and Applications May 7, 2013

Comparing Question Reformulations

“Exact Phrases in Information Retrieval for Question Answering”, Stoyanchev et al, 2008

InvestigatesRole of ‘exact phrases’ in retrieval for QAOptimal query construction through document

retrievalFrom Web or AQUAINT collection

Impact of query specificity on passage retrieval

Page 69

Motivation

Retrieval bottleneck in question answering:
- Retrieval provides the source for answer extraction
- If retrieval fails to return answer-containing documents, downstream answer processing is guaranteed to fail

Focus on recall in the information retrieval phase
- Consistent relationship between the quality of IR and the quality of QA

Main factor in retrieval: the query
- Approaches vary from simple to complex processing, or expansion with external resources

Page 73

Approach

Focus on the use of 'exact phrases' from a question
- Analyze the impact of different linguistic components of the question
- Relate them to answer candidate sentences

Evaluate query construction for Web and TREC retrieval
- Optimize query construction

Evaluate query construction for sentence retrieval
- Analyze specificity

Page 77

Data Sources & Resources

Documents:
- TREC QA AQUAINT corpus
- Web

Questions: TREC 2006, non-empty questions

Gold standard: NIST-provided relevant documents and answer key; 3.5 docs/question

Resources:
- IR: Lucene
- NLTK, LingPipe: phrase and NE annotation (also hand-corrected)

Page 82

Query Processing Approach

Exploit less resource-intensive methods: chunking, NER
- Applied only to questions and candidate sentences, not to the full collection, so usable on the Web

Exact phrase motivation: phrases can improve retrieval
- "In what year did the movie win academy awards?"
- As an exact phrase: can rank documents higher
- As a disjunction of words: can dilute the pool

Page 89

Query Processing

NER on question and target
- target: "1991 eruption on Mt. Pinatubo" vs. "Nirvana"
- Uses LingPipe: ORG, LOC, PER

Phrases (NLTK): NP, VP, PP

Converted q-phrases: heuristic paraphrases of the question as a declarative
- E.g., "Who was|is NOUN|PRONOUN VBD" => "NOUN|PRONOUN was|is VBD"
- q-phrase: the expected form of a simple answer, e.g., "When was Mozart born?" => "Mozart was born"
- How likely are we to see a q-phrase? Unlikely. How likely is it to be right if we do see it? Very. (A toy conversion rule is sketched below.)
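
A toy, hypothetical version of one such conversion rule; the actual system used POS-based patterns rather than this single regex.

```python
import re

def convert_when_question(question: str):
    """ "When was|is SUBJ VBD?" -> "SUBJ was|is VBD",
    e.g. "When was Mozart born?" -> "Mozart was born". """
    m = re.match(r"When (was|is) (.+) (\w+)\?$", question, re.IGNORECASE)
    if m is None:
        return None
    copula, subject, verb = m.groups()
    return f"{subject} {copula} {verb}"

print(convert_when_question("When was Mozart born?"))  # Mozart was born
```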

Page 93

Comparing Query Forms

Baseline: words from the question and target

Experimental: words, quoted exact phrases, quoted named entities
- Backoff (Lucene): weight based on type
- Backoff (Web): 1) converted q-phrases; 2) phrases; 3) without phrases -- until 20 documents are retrieved
- Combined with the target in all cases

Max 20 documents: downstream processing is expensive

Sentences split, ranked. (The Web backoff is sketched below.)
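
A minimal sketch of the Web backoff described above; `web_search` is a hypothetical function returning a ranked document list for a query string, not an API from the paper.

```python
def backoff_retrieve(q_phrases, phrases, words, target, web_search, k=20):
    """Issue progressively less specific Web queries until k documents
    are retrieved."""
    tiers = (
        [f'"{p}" {target}' for p in q_phrases],  # 1) converted q-phrases
        [f'"{p}" {target}' for p in phrases],    # 2) quoted exact phrases
        [f'{" ".join(words)} {target}'],         # 3) plain words, no phrases
    )
    docs = []
    for tier in tiers:
        for query in tier:
            docs += [d for d in web_search(query) if d not in docs]
            if len(docs) >= k:
                return docs[:k]
    return docs
```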

Page 94

Query Components

Page 97

Query Components in Supporting Sentences

Highest precision: converted q-phrases, then phrases, ...
- Words are likely to appear, but don't discriminate

Page 98

Results

Document and sentence retrieval. Metrics:
- Document retrieval: average recall; MRR; overall document recall (% of questions with >= 1 correct document)
- Sentence retrieval: sentence MRR; overall sentence recall; average precision of the first sentence; # correct in top 10, top 50


Page 100

Results

[Results table from the slide not transcribed]

Page 103

Discussion

Document retrieval:
- About half of the correct documents are retrieved, at ranks 1-2

Sentence retrieval:
- Lower: the correct sentence appears around rank 3

Little difference for exact phrases in AQUAINT

Web:
- Retrieval improved by exact phrases
- Manual phrases more than automatic (20-30% relative)
- Precision affected by tagging errors