Natural Language Based Reformulation Resource and Web Exploitation for Question Answering

Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu

University of Southern California

Presented by: Soobia Afroz

Introduction

The degree of difficulty of a question depends on how closely a given corpus matches the question, NOT on the question itself.

Q: When was the UN founded?

A: The UN was formed in January 1942.

A: The name "United Nations", coined by United States President Franklin D. Roosevelt, was first used in the "Declaration by United Nations" of 1 January 1942, during the Second World War, when representatives of 26 nations pledged their Governments to continue fighting together against the Axis Powers.

Larger text => Good Answers => Validation in original text

Paraphrasing questions:

• Create semantically equivalent paraphrases of the questions

• Match the answer string with any of the paraphrases

• Question paraphrases + retrieval engine: find documents containing correct answers

• Rank and select better answers

• Questions are paraphrased automatically by TextMap

Example:

“How did Mahatma Gandhi die?”

“How deep is Crater Lake?”

“Who invented the cotton gin?”

Automatic Paraphrases of questions:

How the system works:

• Parse questions

• Identify the answer type of the question

• Reformulate the question (average reformulations: 3.14)

• Match at parse-tree level

1. Syntactic reformulations

• Turn a question into declarative form, e.g.,

2. Inference Reformulations

3. Reformulation Chains

4. Generation
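As a rough illustration of a syntactic reformulation, the sketch below turns one of the example questions into declarative patterns with an answer slot and matches them against candidate sentences. It is a string-level stand-in only: the system described in these slides matches at the parse-tree level, and the names `reformulate_who_question`, `match_answer`, and the `<ANSWER>` slot marker are hypothetical, chosen for this example.

```python
import re
from typing import List, Optional

ANSWER = "<ANSWER>"  # hypothetical slot marker, not taken from the slides


def reformulate_who_question(question: str) -> List[str]:
    """Return declarative patterns for questions of the form 'Who <verb>ed <object>?'."""
    m = re.fullmatch(r"Who (\w+ed) (.+)\?", question.strip())
    if m is None:
        return []  # this toy rule covers only one question shape
    verb, obj = m.group(1), m.group(2)
    return [
        f"{ANSWER} {verb} {obj}",         # active declarative form
        f"{obj} was {verb} by {ANSWER}",  # passive paraphrase
    ]


def match_answer(pattern: str, sentence: str) -> Optional[str]:
    """Match a pattern against a sentence and return the text filling the slot."""
    before, after = pattern.split(ANSWER)
    # Assumption: the answer is a run of capitalized words (a crude name detector).
    slot = r"(?P<answer>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"
    # Literal parts are matched case-insensitively; the slot stays case-sensitive.
    left = f"(?i:{re.escape(before)})" if before else ""
    right = f"(?i:{re.escape(after)})" if after else ""
    m = re.search(left + slot + right, sentence)
    return m.group("answer") if m else None


if __name__ == "__main__":
    patterns = reformulate_who_question("Who invented the cotton gin?")
    for sentence in (
        "In 1793 Eli Whitney invented the cotton gin.",
        "The cotton gin was invented by Eli Whitney in 1793.",
    ):
        for pattern in patterns:
            answer = match_answer(pattern, sentence)
            if answer:
                print(pattern, "->", answer)
```

A sentence that matches one of the patterns closely is the kind of "strong match with a reformulation" that the slides treat as extra evidence for an answer candidate.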

Information Retrieval and the Web

TREC (Text Retrieval Conference)

IR system for Webclopedia

Web

Web based IR system

Query Reformulation module

Web Search engine

Sentence Ranking module

1. Query Reformulation module

Previous attempts:

• Simple, exhaustive string-based manipulations

• Transformation grammars

• Learning algorithms

Current attempt:

• Analyze how people naturally form queries to find answers

• Randomly selected 50 TREC8 questions

• Manually produced the simplest queries that yield the most Web pages containing answers

• Analyzed the manually produced queries and categorized them into seven ‘natural’ techniques that were used to form a natural language question

• Derived algorithms that replicate each of the observed techniques

Query Reformulation Techniques

2. Sentence Ranking module

• Produce a list of Boolean queries for each question using all the query reformulation techniques

• Retrieve the top ten results for each query using a Web search engine

• Retrieve the documents, strip HTML, segment the text into sentences

• Each sentence is ranked according to two schemes:

Score w.r.t. query terms:
-- Each word in the query is assigned a weight
-- Each quoted term in the query has a weight equal to the sum of the weights of its words
-- Each sentence has a weight equal to its weighted overlap with the query terms

Score w.r.t. answers:
-- Tag sentences using BBN’s IdentiFinder (a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities)
-- Score sentences according to their overlap with the answer type, checking the question's answer type against the semantic entities found by IdentiFinder
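The first scheme (score w.r.t. query terms) can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the slides do not say how word weights are chosen, so the weights here are supplied by the caller with a default of 1.0, Boolean operators are simply ignored, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def parse_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a Boolean query into quoted terms and single words."""
    phrases = re.findall(r'"([^"]+)"', query)
    rest = re.sub(r'"[^"]+"', " ", query)
    # Assumption: AND/OR operators carry no weight of their own.
    words = [w for w in tokenize(rest) if w not in {"and", "or"}]
    return phrases, words


def score_sentence(sentence: str, query: str, word_weight: Dict[str, float]) -> float:
    """Weighted overlap between a sentence and the terms of one query."""
    tokens = tokenize(sentence)
    padded = " " + " ".join(tokens) + " "
    phrases, words = parse_query(query)
    score = 0.0
    for phrase in phrases:
        p_tokens = tokenize(phrase)
        # A quoted term counts only if its words appear contiguously;
        # its weight is the sum of the weights of its words.
        if " " + " ".join(p_tokens) + " " in padded:
            score += sum(word_weight.get(t, 1.0) for t in p_tokens)
    for word in words:
        if word in tokens:
            score += word_weight.get(word, 1.0)
    return score


def rank_sentences(sentences: List[str], query: str,
                   word_weight: Dict[str, float]) -> List[Tuple[float, str]]:
    """Return (score, sentence) pairs, best first."""
    scored = [(score_sentence(s, query, word_weight), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    weights = {"cotton": 2.0, "gin": 2.0, "invented": 1.0}  # illustrative values
    candidates = [
        "The gin was made from cotton.",
        "Eli Whitney invented the cotton gin in 1793.",
    ]
    for score, sentence in rank_sentences(candidates, '"cotton gin" AND invented', weights):
        print(score, sentence)
```

The second scheme would add a separate score from a named-entity tagger such as IdentiFinder; it is left out here because it depends on that external tool.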

Evaluation of the results:

Reformulations led to more correct answers when used in conjunction with a large corpus like the Web.

Conclusion

Likelihood of finding correct answers is increased by QR

IR module produces higher quality answer candidates

Scoring precision is increased for answer candidates

A strong match with a reformulation provides additional confidence in the correctness of the answer