Natural Language Based Reformulation Resource and Web Exploitation for Question Answering
Ulf Hermjakob, Abdessamad Echihabi, Daniel Marcu
University of Southern California
Presented by: Soobia Afroz
Introduction
The degree of difficulty depends on how closely a given corpus matches the question, NOT on the question itself.
Q: When was the UN founded?
A: The UN was formed in January 1942.
A: The name "United Nations", coined by United States President Franklin D. Roosevelt, was first used in the "Declaration by United Nations" of 1 January 1942, during the Second World War, when representatives of 26 nations pledged their Governments to continue fighting together against the Axis Powers.
Larger corpus (e.g., the Web) => more good answers => validate those answers in the original text collection
Paraphrasing questions:
• Create semantically equivalent paraphrases of the question
• Match the answer string against any of the paraphrases
• Question paraphrases + retrieval engine => find documents containing correct answers
• Rank and select better answers
• Automatically paraphrase questions with TextMap
Examples:
“How did Mahatma Gandhi die?”
“How deep is Crater Lake?”
“Who invented the cotton gin?”
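To make the matching step concrete, here is a minimal Python sketch, assuming each paraphrase is written as a regular expression with a named `answer` slot. The two patterns for the cotton-gin question and the helper `match_paraphrases` are hypothetical illustrations; the system described here matches at the parse-tree level, not with regexes.

```python
import re
from typing import Optional, Sequence

# Hypothetical paraphrases of "Who invented the cotton gin?".
# The named group "answer" marks where the answer must appear.
NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"  # a run of capitalized words
PARAPHRASES = [
    rf"(?P<answer>{NAME}) invented the cotton gin",
    rf"[Tt]he cotton gin was invented by (?P<answer>{NAME})",
]


def match_paraphrases(sentence: str, patterns: Sequence[str]) -> Optional[str]:
    """Return the answer from the first paraphrase that matches the sentence."""
    for pattern in patterns:
        m = re.search(pattern, sentence)
        if m:
            return m.group("answer")
    return None  # no paraphrase matched, so no answer is proposed


print(match_paraphrases("The cotton gin was invented by Eli Whitney.", PARAPHRASES))
# -> Eli Whitney
```

Because every paraphrase carries its own answer slot, a sentence that matches any one of them both supports and pinpoints the answer, so more paraphrases mean more chances of a hit.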
Automatic Paraphrases of questions:
How the system works:
• Parse questions
• Identify the answer type of the question
• Reformulate the question (average number of reformulations per question: 3.14)
• Match at parse-tree level
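The answer-type step can be sketched as follows, assuming simple wh-word rules applied to the question string. The rules and type labels (`DISTANCE`, `MANNER_OF_DEATH`, and so on) are hypothetical; the actual system derives the answer type from its parse of the question.

```python
import re

# Hypothetical answer-type rules: (regex on the lower-cased question, type label).
# More specific patterns come first because the first match wins.
ANSWER_TYPE_RULES = [
    (r"^how (deep|tall|high|long|far|wide)\b", "DISTANCE"),
    (r"^how did .+ die\b", "MANNER_OF_DEATH"),
    (r"^how (many|much)\b", "QUANTITY"),
    (r"^who\b", "PERSON"),
    (r"^(when|what year)\b", "DATE"),
    (r"^where\b", "LOCATION"),
]


def answer_type(question: str) -> str:
    """Guess the expected answer type from the question's wording."""
    q = question.strip().lower()
    for pattern, label in ANSWER_TYPE_RULES:
        if re.search(pattern, q):
            return label
    return "UNKNOWN"


for q in ["How did Mahatma Gandhi die?", "How deep is Crater Lake?",
          "Who invented the cotton gin?", "When was the UN founded?"]:
    print(f"{q} -> {answer_type(q)}")
```

The answer type found here is what later lets a sentence be rejected when it mentions the right topic but contains no entity of the expected kind.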
1. Syntactic reformulations
• Turn a question into declarative form, e.g.,
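A toy, string-level sketch of this rewrite, assuming only questions of the shape "Who <verb>ed <object>?". The function `declarative_forms` and the `<ANSWER>` placeholder are hypothetical; the real reformulations operate on parse trees and cover far more constructions.

```python
import re
from typing import List

ANSWER = "<ANSWER>"  # hypothetical placeholder for the answer slot


def declarative_forms(question: str) -> List[str]:
    """Rewrite 'Who <verb>ed <object>?' into active and passive declaratives."""
    m = re.match(r"^[Ww]ho (\w+ed) (.+?)\?$", question.strip())
    if m is None:
        return []  # this toy rule covers only one question shape
    verb, obj = m.groups()
    return [
        f"{ANSWER} {verb} {obj}",         # active:  <ANSWER> invented the cotton gin
        f"{obj} was {verb} by {ANSWER}",  # passive: the cotton gin was invented by <ANSWER>
    ]


print(declarative_forms("Who invented the cotton gin?"))
# -> ['<ANSWER> invented the cotton gin', 'the cotton gin was invented by <ANSWER>']
```

The passive form reuses the simple past as the past participle and always uses "was", so it only holds for regular verbs and singular objects; working on a parse tree avoids exactly these shortcuts.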
2. Inference Reformulations
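A minimal sketch of inference reformulations, assuming they can be approximated by a hand-written rule table. Unlike the syntactic case, the templates are not variants of the question's own wording but rely on lexical knowledge (deep => depth, invent => inventor). The rules and the function `inference_reformulations` are hypothetical examples, not the system's actual rule set.

```python
import re
from typing import List

ANSWER = "<ANSWER>"  # hypothetical placeholder for the answer slot

# Hypothetical inference rules: question regex -> answer-bearing templates that
# use a related word the question itself does not contain.
INFERENCE_RULES = [
    (r"^How deep is (?P<x>.+)\?$",
     ["the depth of {x} is {a}", "{x} has a depth of {a}"]),
    (r"^Who invented (?P<x>.+)\?$",
     ["{a} is the inventor of {x}", "the inventor of {x} was {a}"]),
]


def inference_reformulations(question: str) -> List[str]:
    """Collect the templates of every rule whose pattern matches the question."""
    out: List[str] = []
    for pattern, templates in INFERENCE_RULES:
        m = re.match(pattern, question.strip())
        if m:
            out.extend(t.format(x=m.group("x"), a=ANSWER) for t in templates)
    return out


print(inference_reformulations("How deep is Crater Lake?"))
# -> ['the depth of Crater Lake is <ANSWER>', 'Crater Lake has a depth of <ANSWER>']
```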
3. Reformulation Chains
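Chaining can be sketched as a breadth-first closure: each reformulation's output is fed back into the rule set until a depth limit is reached. The function `reformulation_chains`, the `max_depth` limit, and the two single-step rules are hypothetical; the sketch only shows how chaining reaches forms that no single rule produces.

```python
from collections import deque
from typing import Callable, Iterable, List

Rule = Callable[[str], Iterable[str]]  # one reformulation step: text -> new texts


def reformulation_chains(question: str, rules: List[Rule], max_depth: int = 2) -> List[str]:
    """Apply rules breadth-first, feeding each output back in, up to max_depth steps."""
    seen = {question}
    results: List[str] = []
    queue = deque([(question, 0)])
    while queue:
        text, depth = queue.popleft()
        if depth == max_depth:
            continue  # chain is already as long as allowed
        for rule in rules:
            for new_text in rule(text):
                if new_text not in seen:  # avoid cycles and duplicates
                    seen.add(new_text)
                    results.append(new_text)
                    queue.append((new_text, depth + 1))
    return results


# Two hypothetical single-step rules.
def deep_to_depth(q: str) -> Iterable[str]:
    prefix = "How deep is "
    if q.startswith(prefix) and q.endswith("?"):
        yield f"What is the depth of {q[len(prefix):-1]}?"


def what_is_to_declarative(q: str) -> Iterable[str]:
    prefix = "What is "
    if q.startswith(prefix) and q.endswith("?"):
        yield f"{q[len(prefix):-1]} is <ANSWER>"


print(reformulation_chains("How deep is Crater Lake?", [deep_to_depth, what_is_to_declarative]))
# -> ['What is the depth of Crater Lake?', 'the depth of Crater Lake is <ANSWER>']
```

Neither rule alone turns the original question into the declarative "the depth of Crater Lake is <ANSWER>"; only the two-step chain does. The `seen` set keeps rules that undo each other from looping forever.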
4. Generation
Information Retrieval and the Web
• TREC (Text Retrieval Conference): IR system for Webclopedia
• Web: Web-based IR system, consisting of
-- Query Reformulation module
-- Web search engine
-- Sentence Ranking module
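The Web-based IR system can be pictured as three pluggable stages. The sketch below is a hypothetical wiring, assuming each module is a plain function; `web_ir_pipeline`, `toy_search`, `FAKE_WEB`, and the lambda stand-ins are illustrative only, and a real system would call an actual search engine.

```python
import re
from typing import Callable, List


def web_ir_pipeline(
    question: str,
    reformulate: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    rank: Callable[[str, List[str], List[str]], List[str]],
    top_k: int = 10,
) -> List[str]:
    """Query Reformulation -> Web search -> Sentence Ranking, as pluggable stages."""
    queries = reformulate(question)
    pages: List[str] = []
    for query in queries:
        pages.extend(search(query)[:top_k])  # keep the top_k hits per query
    pages = list(dict.fromkeys(pages))  # drop pages returned by several queries
    sentences = [s for page in pages
                 for s in re.split(r"(?<=[.!?])\s+", page.strip()) if s]
    return rank(question, queries, sentences)


# Toy stand-ins (hypothetical) for the Web and the three modules.
FAKE_WEB = [
    "The cotton gin was invented by Eli Whitney. Cotton gins separate fiber from seed.",
    "Crater Lake is in Oregon.",
]


def toy_search(query: str) -> List[str]:
    words = query.replace('"', " ").lower().split()
    return [p for p in FAKE_WEB if any(w in p.lower().split() for w in words)]


result = web_ir_pipeline(
    "Who invented the cotton gin?",
    reformulate=lambda q: ['"cotton gin" invented', "cotton gin inventor"],
    search=toy_search,
    rank=lambda q, qs, sents: [s for s in sents if "invented" in s],
)
print(result)
# -> ['The cotton gin was invented by Eli Whitney.']
```

Passing the modules in as functions keeps each one replaceable, e.g. swapping the search engine without touching reformulation or ranking.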
1. Query Reformulation module
Previous attempts:
• Simple, exhaustive string-based manipulations
• Transformation grammars
• Learning algorithms
Current attempt:
• Analyze how people naturally form queries to find answers
• Randomly selected 50 TREC-8 questions
• Manually produced the simplest queries that yield the most Web pages containing answers
• Analyzed the manually produced queries and categorized them into seven ‘natural’ techniques for forming queries from a natural language question
• Derived algorithms that replicate each of the observed techniques
Query Reformulation Techniques
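The seven techniques themselves are not recoverable from this transcript, so the sketch below uses two hypothetical stand-ins (ANDing content words, and quoting multi-word names as phrases) to show how several techniques can each turn one question into a Boolean query. The stop-word list, the `NAME_RUN` pattern, and the `AND`/quote syntax are assumptions; real search engines differ in query syntax.

```python
import re
from typing import Callable, List

STOP_WORDS = {"who", "what", "when", "where", "how", "did", "was", "is", "the", "a", "an", "of"}
NAME_RUN = r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"  # two or more capitalized words


def content_words(text: str) -> List[str]:
    return [w for w in re.findall(r"[A-Za-z0-9']+", text) if w.lower() not in STOP_WORDS]


def keywords_and(question: str) -> str:
    """Hypothetical technique: AND together the content words."""
    return " AND ".join(content_words(question))


def quote_names(question: str) -> str:
    """Hypothetical technique: quote multi-word names as phrases, AND the rest."""
    names = re.findall(NAME_RUN, question)
    rest = re.sub(NAME_RUN, " ", question)
    return " AND ".join([f'"{n}"' for n in names] + content_words(rest))


def boolean_queries(question: str, techniques: List[Callable[[str], str]]) -> List[str]:
    """Apply every technique and keep the distinct, non-empty queries."""
    queries = [t(question) for t in techniques]
    return [q for q in dict.fromkeys(queries) if q]


print(boolean_queries("How did Mahatma Gandhi die?", [keywords_and, quote_names]))
# -> ['Mahatma AND Gandhi AND die', '"Mahatma Gandhi" AND die']
```

When two techniques yield the same query (as for "Who invented the cotton gin?", which contains no multi-word name), the duplicate is dropped so the search engine is not queried twice.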
2. Sentence Ranking module
• Produce a list of Boolean queries for each question using all the query reformulation techniques
• Retrieve the top ten results for each query using a web search engine
• Retrieve the documents, strip HTML, and segment the text into sentences
• Rank each sentence according to two schemas:
Score w.r.t. query terms:
-- Each word in the query is assigned a weight
-- Each quoted term in the query has a weight equal to the sum of the weights of its words
-- Each sentence has a weight equal to its weighted overlap with the query terms
Score w.r.t. answers:
-- Tag sentences using BBN’s IdentiFinder (a hidden Markov model that learns to recognize and classify names, dates, times, and numerical quantities)
-- Score each sentence by how well the semantic entities found by IdentiFinder match the expected answer type of the question
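A minimal sketch of the two scoring schemas, under stated assumptions: word weights are supplied by the caller (e.g. IDF values; unknown words default to 1.0), "weighted overlap" is read as the sum of the weights of the query terms present in the sentence, and `toy_tagger` is a hypothetical stand-in for IdentiFinder that only tags capitalized word pairs as PERSON. The slide does not say how the two scores are combined; sorting by answer score first and query-term score second is an assumption.

```python
import re
from typing import Callable, Dict, List, Tuple

Tagger = Callable[[str], List[Tuple[str, str]]]  # sentence -> [(entity text, type)]


def query_terms(query: str) -> List[str]:
    """Split a Boolean query into quoted phrases and single words (AND is dropped)."""
    phrases = re.findall(r'"([^"]+)"', query)
    words = [w for w in re.findall(r"\w+", re.sub(r'"[^"]+"', " ", query)) if w != "AND"]
    return phrases + words


def term_score(sentence: str, query: str, word_weight: Dict[str, float]) -> float:
    """Weighted overlap between the sentence and the query terms."""
    text = sentence.lower()
    tokens = set(re.findall(r"\w+", text))
    score = 0.0
    for term in query_terms(query):
        words = term.lower().split()
        weight = sum(word_weight.get(w, 1.0) for w in words)  # quoted term = sum of its words
        present = (term.lower() in text) if len(words) > 1 else (words[0] in tokens)
        if present:
            score += weight
    return score


def answer_score(sentence: str, expected_type: str, tagger: Tagger) -> float:
    """1.0 if the tagger finds an entity of the expected answer type, else 0.0."""
    return 1.0 if any(t == expected_type for _, t in tagger(sentence)) else 0.0


def toy_tagger(sentence: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for IdentiFinder: capitalized word pairs -> PERSON."""
    return [(m, "PERSON") for m in re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", sentence)]


def rank_sentences(sentences, query, expected_type, word_weight, tagger):
    # Assumed combination: answer-type score first, query-term score as tie-breaker.
    return sorted(sentences,
                  key=lambda s: (answer_score(s, expected_type, tagger),
                                 term_score(s, query, word_weight)),
                  reverse=True)


sentences = [
    "The cotton gin separates cotton fibers from seeds.",
    "The cotton gin was invented by Eli Whitney.",
]
ranked = rank_sentences(sentences, '"cotton gin" AND invented', "PERSON",
                        {"invented": 2.0}, toy_tagger)
print(ranked[0])
# -> The cotton gin was invented by Eli Whitney.
```

Both sentences share the quoted term "cotton gin", but only the second also contains "invented" and an entity of the expected PERSON type, so it ranks first on both schemas.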
Evaluation of the results:
Reformulations led to more correct answers when used in conjunction with a large corpus like the Web.
Conclusion
Likelihood of finding correct answers is increased by QR
IR module produces higher quality answer candidates
Scoring precision is increased for answer candidates
A strong match with a reformulation provides additional confidence in the correctness of the answer