Natural Language Question Answering
Gary Geunbae Lee, NLP Lab.
Professor @ POSTECH, Chief Scientist @ DiQuest
Dan Jurafsky
2
Question Answering
One of the oldest NLP tasks (punched card systems in 1961)
Simmons, Klein, McConlogue. 1964. Indexing and Dependency Logic for Answering English Questions. American Documentation 15:30, 196-204
Dan Jurafsky
Question Answering: IBM’s Watson
• Won Jeopardy on February 16, 2011!
3
WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL
Bram Stoker
Dan Jurafsky
Apple’s Siri
4
Dan Jurafsky
Wolfram Alpha
5
Gary G. Lee, Postech
6
Task Definition
Open-domain QA: find answers to open-domain natural language questions from large collections of documents. Questions are usually factoid; documents include texts, web pages, databases, multimedia, maps, etc.
Canned QA: map new questions onto predefined questions for which answers already exist (e.g., FAQ Finder).
Gary G. Lee, Postech
7
Example (1)
Q: What is the fastest car in the world?
Correct answer: …, the Jaguar XJ220 is the dearest (Pounds 415,000), fastest (217 mph / 350 kmh) and most sought-after car in the world.
Wrong answer: … will stretch Volkswagen’s lead in the world’s fastest-growing vehicle market. Demand is expected to soar in the next few years as more Chinese are able to afford their own cars.
Gary G. Lee, Postech
8
Example (2)
Q: How did Socrates die?
Answer: … Socrates drank poisoned wine.
Needs world knowledge and reasoning. (Anyone drinking or eating something that is poisoned is likely to die.)
Gary G. Lee, Postech
9
A taxonomy of QA systems
1. Factual: Q: What is the largest city in Germany? A: “… Berlin, the largest city in Germany, …”
2. Simple reasoning: Q: How did Socrates die? A: “… Socrates poisoned himself …”
3. Fusion (list): Q: What are the arguments for and against prayer in school? Answer drawn from several texts
4. Interactive (context): clarification questions
5. Speculative: Q: Should the Fed raise interest rates at their next meeting? Answer provides analogies to past actions
Moldovan et al., ACL 2002
Gary G. Lee, Postech
10
Enabling Technologies and Applications
Enabling technologies: POS tagger, parser, WSD, named-entity tagger, information retrieval, information extraction, inference engine, ontology (WordNet), knowledge acquisition, knowledge classification, language generation
Applications: smart agents, situation management, e-commerce, summarization, tutoring/learning, personal assistant in business, on-line documentation, on-line troubleshooting, Semantic Web
Gary G. Lee, Postech
11
Question Taxonomies
Wendy Lehnert’s taxonomy: 13 question categories, based on Schank’s Conceptual Dependency
Arthur Graesser’s taxonomy: 18 question categories, expanding Lehnert’s categories, based on speech act theory
Question types in LASSO [Moldovan et al., TREC-8]: combines question stems, question focus, and phrasal heads; unifies the question class with the answer type via the question focus
Gary G. Lee, Postech
12
Earlier QA Systems
The QUALM system (Wendy Lehnert, 1978): reads stories and answers questions about what was read
The LUNAR system (W. Woods, 1977): one of the first user evaluations of question answering systems
The STUDENT system (Daniel Bobrow, 1964): solved high school algebra word problems
The MURAX system (Julian Kupiec, 1993): uses an online encyclopedia for closed-class questions
The START system (Boris Katz, 1997): uses annotations to process questions from the Web
FAQ-Finder (Robin Burke, 1997): uses Frequently Asked Questions from Usenet newsgroups
Gary G. Lee, Postech
13
Recent QA Conferences
TREC-8: 200 questions; 2 GB collection; questions from text; 41 participating systems; performance 64.5%; answer format 50 & 250 bytes; evaluation: top 5, MRR
TREC-9: 693 questions; 3 GB collection; questions from query logs; 75 systems; 76%; 50 & 250 bytes; top 5, MRR
TREC-10: 500 questions; 3 GB collection; query logs; 92 systems; 69%; 50 bytes; top 5, MRR (main task, list task, context task, null answers)
TREC-11: 500 questions; 3 GB (AQUAINT) collection; query logs; 76 systems; 85.6%; exact answers; top 1, CW (main task, list task, null answers)
NTCIR-3: 200 questions; 280 MB collection; questions from text; 36 systems; 60.8%; exact answers; top 5, MRR (Tasks 1-3)
Dan Jurafsky
14
Types of Questions in Modern Systems
• Factoid questions
  • Who wrote “The Universal Declaration of Human Rights”?
  • How many calories are there in two slices of apple pie?
  • What is the average age of the onset of autism?
  • Where is Apple Computer based?
• Complex (narrative) questions:
  • In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
  • What do scholars think about Jefferson’s position on dealing with pirates?
Dan Jurafsky
Commercial systems: mainly factoid questions
Where is the Louvre Museum located?
In Paris, France
What’s the abbreviation for limited partnership?
L.P.
What are the names of Odin’s ravens?
Huginn and Muninn
What currency is used in China?
The yuan
What kind of nuts are used in marzipan?
almonds
What instrument does Max Roach play?
drums
What is the telephone number for Stanford University?
650-723-2300
Dan Jurafsky
Paradigms for QA
• IR-based approaches
  • TREC; IBM Watson; Google
• Knowledge-based and Hybrid approaches
  • IBM Watson; Apple Siri; Wolfram Alpha; True Knowledge (Evi)
16
Dan Jurafsky
Many questions can already be answered by web search
17
Dan Jurafsky
IR-based Question Answering
18
Dan Jurafsky
19
IR-based Factoid QA
Dan Jurafsky
IR-based Factoid QA
• QUESTION PROCESSING
  • Detect question type, answer type, focus, relations
  • Formulate queries to send to a search engine
• PASSAGE RETRIEVAL
  • Retrieve ranked documents
  • Break into suitable passages and rerank
• ANSWER PROCESSING
  • Extract candidate answers
  • Rank candidates using evidence from the text and external sources
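A minimal end-to-end sketch of this three-stage pipeline in Python. Everything here (the function names, the toy keyword heuristics, the in-memory document list) is an illustrative assumption, not the API of any real system:

```python
# Illustrative three-stage IR-based factoid QA pipeline; all names and
# heuristics here are toy assumptions, not any particular system's API.

def process_question(question):
    """Question processing: detect a (toy) answer type, pick keywords."""
    answer_type = "PERSON" if question.lower().startswith("who") else "UNKNOWN"
    stop = {"who", "what", "is", "was", "the", "a", "in", "of"}
    keywords = [w.strip("?.,") for w in question.split() if w.lower() not in stop]
    return answer_type, keywords

def retrieve_passages(keywords, documents):
    """Passage retrieval: rank documents by keyword overlap."""
    scored = [(sum(k.lower() in d.lower() for k in keywords), d) for d in documents]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

def extract_answer(answer_type, passages):
    """Answer processing: return a capitalized word pair for PERSON."""
    for passage in passages:
        words = passage.split()
        for w1, w2 in zip(words, words[1:]):
            if answer_type == "PERSON" and w1.istitle() and w2.istitle():
                return f"{w1} {w2}"
    return None

docs = ["Bram Stoker wrote the novel Dracula in 1897."]
atype, keywords = process_question("Who wrote Dracula?")
print(extract_answer(atype, retrieve_passages(keywords, docs)))  # Bram Stoker
```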
Dan Jurafsky
Knowledge-based approaches (Siri)
• Build a semantic representation of the query
  • Times, dates, locations, entities, numeric quantities
• Map from this semantics to query structured data or resources
  • Geospatial databases
  • Ontologies (Wikipedia infoboxes, DBpedia, WordNet, Yago)
  • Restaurant review sources and reservation services
  • Scientific databases
21
Dan Jurafsky
Hybrid approaches (IBM Watson)
• Build a shallow semantic representation of the query
• Generate answer candidates using IR methods
  • Augmented with ontologies and semi-structured data
• Score each candidate using richer knowledge sources
  • Geospatial databases
  • Temporal reasoning
  • Taxonomical classification
22
Dan Jurafsky
Factoid Q/A
23
Dan Jurafsky
Question Processing: Things to extract from the question
• Answer Type Detection
  • Decide the named entity type (person, place) of the answer
• Query Formulation
  • Choose query keywords for the IR system
• Question Type classification
  • Is this a definition question, a math question, a list question?
• Focus Detection
  • Find the question words that are replaced by the answer
• Relation Extraction
  • Find relations between entities in the question
24
Dan Jurafsky
Question Processing
“They’re the two states you could be reentering if you’re crossing Florida’s northern border”
• Answer Type: US state
• Query: two states, border, Florida, north
• Focus: the two states
• Relations: borders(Florida, ?x, north)
25
Dan Jurafsky
Answer Type Detection: Named Entities
• Who founded Virgin Airlines?
  • PERSON
• What Canadian city has the largest population?
  • CITY
Dan Jurafsky
Answer Type Taxonomy
• 6 coarse classes
  • ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC
• 50 finer classes
  • LOCATION: city, country, mountain…
  • HUMAN: group, individual, title, description
  • ENTITY: animal, body, color, currency…
27
Xin Li, Dan Roth. 2002. Learning Question Classifiers. COLING'02
Dan Jurafsky
28
Part of Li & Roth’s Answer Type Taxonomy
Dan Jurafsky
29
Answer Types
Dan Jurafsky
30
More Answer Types
Dan Jurafsky
Answer types in Jeopardy
• 2,500 answer types in a 20,000-question sample of Jeopardy questions
• The most frequent 200 answer types cover < 50% of the data
• The 40 most frequent Jeopardy answer types: he, country, city, man, film, state, she, author, group, here, company, president, capital, star, novel, character, woman, river, island, king, song, part, series, sport, singer, actor, play, team, show, actress, animal, presidential, composer, musical, nation, book, title, leader, game
31
Ferrucci et al. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine. Fall 2010. 59-79.
Dan Jurafsky
Answer Type Detection
• Hand-written rules
• Machine learning
• Hybrids
Dan Jurafsky
Answer Type Detection
• Regular expression-based rules can get some cases:
  • Who {is|was|are|were} PERSON
  • PERSON (YEAR – YEAR)
• Other rules use the question headword (the headword of the first noun phrase after the wh-word):
  • Which city in China has the largest number of foreign financial companies?
  • What is the state flower of California?
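A hedged sketch of such rules in Python; the patterns below are simplified illustrations of the rule shapes above, not a production rule set:

```python
import re

# Toy regular-expression rules for answer type detection; the patterns
# and the type labels are simplified illustrations.
RULES = [
    (re.compile(r"^who\s+(is|was|are|were)\b", re.I), "PERSON"),
    (re.compile(r"\b[A-Z][a-z]+\s+\(\d{4}\s*-\s*\d{4}\)"), "PERSON"),  # e.g. "Mozart (1756 - 1791)"
    (re.compile(r"^(which|what)\s+city\b", re.I), "CITY"),
]

def rule_based_answer_type(question):
    for pattern, answer_type in RULES:
        if pattern.search(question):
            return answer_type
    return None

print(rule_based_answer_type("Who was the 23rd president of the United States?"))  # PERSON
print(rule_based_answer_type("Which city in China has the largest number of foreign financial companies?"))  # CITY
```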
Dan Jurafsky
Answer Type Detection
• Most often, we treat the problem as machine learning classification
  • Define a taxonomy of question types
  • Annotate training data for each question type
  • Train classifiers for each question class using a rich set of features
    • Those features include the hand-written rules!
34
Dan Jurafsky
Features for Answer Type Detection
• Question words and phrases
• Part-of-speech tags
• Parse features (headwords)
• Named entities
• Semantically related words
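A toy sketch of the machine-learning route using scikit-learn with word and bigram features; the six-question training set is fabricated for illustration and far too small for real use (Li & Roth trained on thousands of labeled questions):

```python
# Toy answer-type classifier over word/bigram features; the tiny
# training set here is fabricated purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_questions = [
    "Who founded Virgin Airlines?", "Who wrote Dracula?",
    "Where is the Louvre Museum located?", "What city has the largest population?",
    "How many calories are in apple pie?", "What is the average age of onset of autism?",
]
train_labels = ["HUMAN", "HUMAN", "LOCATION", "LOCATION", "NUMERIC", "NUMERIC"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),  # question words and phrases
    LogisticRegression(max_iter=1000),
)
clf.fit(train_questions, train_labels)
print(clf.predict(["Who was Queen Victoria's second son?"]))  # ['HUMAN'], with luck
```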
35
Dan Jurafsky
Factoid Q/A
36
Dan Jurafsky
Keyword Selection Algorithm
1. Select all non-stop words in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with their adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select all adverbs
9. Select the QFW word (skipped in all previous steps)
10. Select all other words
Dan Moldovan, Sanda Harabagiu, Marius Paşca, Rada Mihalcea, Richard Goodrum, Roxana Girju and Vasile Rus. 1999. LASSO: A Tool for Surfing the Answer Net. Proceedings of TREC-8.
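A simplified sketch of these heuristics; it assumes the question arrives pre-tagged as (token, POS, in-quotes) triples, and it collapses steps 3-6 into a single noun tier, so the tier numbers only roughly mirror the algorithm:

```python
# Simplified keyword-selection tiers in the spirit of Moldovan et al.;
# the pre-tagged input format and the STOP list are assumptions.
STOP = {"the", "a", "an", "of", "in", "is", "was", "his"}

def select_keywords(tagged_tokens):
    tiers = {1: [], 2: [], 5: [], 7: [], 10: []}
    for token, pos, in_quotes in tagged_tokens:
        low = token.lower()
        if in_quotes and low not in STOP:
            tiers[1].append(token)   # step 1: non-stop words in quotations
        elif pos == "NNP":
            tiers[2].append(token)   # step 2: proper nouns / named entities
        elif pos.startswith("NN"):
            tiers[5].append(token)   # steps 3-6 collapsed: nominals and nouns
        elif pos.startswith("VB") and low not in STOP:
            tiers[7].append(token)   # step 7: verbs
        elif low not in STOP:
            tiers[10].append(token)  # step 10: everything else
    return [(t, tier) for tier in sorted(tiers) for t in tiers[tier]]

question = [("Who", "WP", False), ("coined", "VBD", False), ("the", "DT", False),
            ("term", "NN", False), ("cyberspace", "NN", True),
            ("in", "IN", False), ("his", "PRP$", False), ("novel", "NN", False),
            ("Neuromancer", "NNP", True)]
print(select_keywords(question))
# [('cyberspace', 1), ('Neuromancer', 1), ('term', 5), ('novel', 5), ('coined', 7), ('Who', 10)]
```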
Dan Jurafsky
Choosing keywords from the query
38
Who coined the term “cyberspace” in his novel “Neuromancer”?
Keywords, with the selection step that chose each: cyberspace/1, Neuromancer/1, term/4, novel/4, coined/7
Slide from Mihai Surdeanu
Dan Jurafsky
Factoid Q/A
39
Dan Jurafsky
40
Passage Retrieval
• Step 1: IR engine retrieves documents using query terms
• Step 2: Segment the documents into shorter units (something like paragraphs)
• Step 3: Passage ranking
  • Use answer type to help rerank passages
Dan Jurafsky
Features for Passage Ranking
• Number of named entities of the right type in the passage
• Number of query words in the passage
• Number of question N-grams also in the passage
• Proximity of query keywords to each other in the passage
• Longest sequence of question words
• Rank of the document containing the passage
Used either in rule-based classifiers or with supervised machine learning.
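A hedged sketch combining a few of these features into a rule-style passage score; the weights are arbitrary illustrations, not tuned values:

```python
# Toy passage scorer over a subset of the features above; weights are
# arbitrary illustrations, not tuned values from any system.
def score_passage(passage, query_words, answer_type_entities, doc_rank):
    words = passage.lower().split()
    matched = [w for w in query_words if w.lower() in words]
    named_entities = sum(1 for e in answer_type_entities if e.lower() in passage.lower())
    # Proximity: span (in words) covering all matched query words; smaller is better.
    positions = [words.index(w.lower()) for w in matched]
    span = (max(positions) - min(positions) + 1) if positions else len(words)
    return (3.0 * named_entities   # named entities of the right type
            + 1.0 * len(matched)   # query words in the passage
            - 0.1 * span           # keyword proximity
            - 0.5 * doc_rank)      # rank of the source document

p = "Bram Stoker , author of Dracula , was born in Dublin ."
print(score_passage(p, ["Stoker", "Dracula"], ["Bram Stoker"], doc_rank=0))
```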
Dan Jurafsky
Factoid Q/A
42
Dan Jurafsky
Answer Extraction
• Run an answer-type named-entity tagger on the passages
  • Each answer type requires a named-entity tagger that detects it
  • If the answer type is CITY, the tagger has to tag CITY
  • Can be full NER, simple regular expressions, or a hybrid
• Return the string with the right type:
  • Who is the prime minister of India? (PERSON)
    “Manmohan Singh, Prime Minister of India, had told left leaders that the deal would not be renegotiated.”
  • How tall is Mt. Everest? (LENGTH)
    “The official height of Mount Everest is 29035 feet.”
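A sketch of the regular-expression end of this spectrum: per-type patterns pull candidates of the expected answer type out of a passage. The two patterns are deliberately naive illustrations:

```python
import re

# Toy per-type extraction patterns; illustrative only, not a complete
# answer type system.
TYPE_PATTERNS = {
    "PERSON": re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+"),          # naive capitalized name
    "LENGTH": re.compile(r"\d[\d,]*\s*(?:feet|foot|meters?|km|miles?)"),
}

def extract_candidates(answer_type, passage):
    pattern = TYPE_PATTERNS.get(answer_type)
    return pattern.findall(passage) if pattern else []

print(extract_candidates("LENGTH", "The official height of Mount Everest is 29035 feet."))
# ['29035 feet']
```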
Dan Jurafsky
Ranking Candidate Answers
• But what if there are multiple candidate answers!
Q: Who was Queen Victoria’s second son?• Answer Type: Person
• Passage:The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert
Dan Jurafsky
Use machine learning: features for ranking candidate answers
• Answer type match: the candidate contains a phrase with the correct answer type
• Pattern match: a regular expression pattern matches the candidate
• Question keywords: number of question keywords in the candidate
• Keyword distance: distance in words between the candidate and the query keywords
• Novelty factor: a word in the candidate is not in the query
• Apposition features: the candidate is an appositive to question terms
• Punctuation location: the candidate is immediately followed by a comma, period, quotation marks, semicolon, or exclamation mark
• Sequences of question terms: the length of the longest sequence of question terms that occurs in the candidate answer
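A sketch of what feature extraction for such a learned ranker might look like; each candidate becomes a feature vector for a downstream classifier. Only a few of the features above are implemented, in simplified form:

```python
# Toy feature extraction for candidate ranking; implements a simplified
# subset of the features listed above.
def candidate_features(candidate, passage, question_keywords, expected_type, candidate_type):
    words = passage.lower().split()
    cand_pos = passage.lower().find(candidate.lower())
    after = passage[cand_pos + len(candidate):cand_pos + len(candidate) + 1]
    matched = [k for k in question_keywords if k.lower() in words]
    return {
        "answer_type_match": candidate_type == expected_type,
        "question_keywords": len(matched),
        "novelty": candidate.lower() not in {k.lower() for k in question_keywords},
        "punctuation_follows": after in {",", ".", ";", "!", '"'},
    }

passage = ("The Marie biscuit is named after Marie Alexandrovna, the daughter of "
           "Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria")
print(candidate_features("Alfred", passage, ["Queen", "Victoria", "second", "son"],
                         expected_type="PERSON", candidate_type="PERSON"))
```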
Dan Jurafsky
Candidate Answer scoring in IBM Watson
• Each candidate answer gets scores from >50 components (from unstructured text, semi-structured text, triple stores)
  • logical form (parse) match between question and candidate
  • passage source reliability
  • geospatial location: California is “southwest of Montana”
  • temporal relationships
  • taxonomic classification
47
Dan Jurafsky
48
Common Evaluation Metrics
1. Accuracy (does the answer match the gold-labeled answer?)
2. Mean Reciprocal Rank (MRR)
  • For each query, return a ranked list of M candidate answers
  • The query’s score is 1/rank of the first correct answer
  • Take the mean over all N queries
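MRR is simple enough to state directly in code; this is a straightforward implementation of the definition above:

```python
# Mean Reciprocal Rank over N queries: each query scores 1/rank of its
# first correct answer (0 if no correct answer appears in the top M).
def mean_reciprocal_rank(ranked_answers, gold):
    total = 0.0
    for answers, correct in zip(ranked_answers, gold):
        for rank, ans in enumerate(answers, start=1):
            if ans == correct:
                total += 1.0 / rank
                break
    return total / len(gold)

systems_output = [["Paris", "Lyon"], ["yen", "yuan"], ["Berlin"]]
gold_answers = ["Paris", "yuan", "Vienna"]
print(mean_reciprocal_rank(systems_output, gold_answers))  # (1 + 0.5 + 0) / 3 = 0.5
```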
Gary G. Lee, Postech
49
A Generic QA Architecture
Question → Question Processing → (query) → Document Retrieval → (documents) → Answer Extraction & Formulation → Answers
All stages consult shared resources (dictionary, ontology, …); retrieval runs over the document collection.
Gary G. Lee, Postech
50
IBM: Statistical QA (Ittycheriah et al., TREC-9, TREC-10)
• 31 answer categories
• Named entity tagging
• Focus expansion using WordNet
• Query expansion using an encyclopedia
• Dependency relation matching
• A maximum entropy model is used in both answer type prediction and answer selection
Pipeline: Question → Query & Focus Expansion → Answer Type Prediction → IR over the documents (with named entities marked by answer type) → Answer Selection → Answers
Gary G. Lee, Postech
51
USC/ISI: Webclopedia (Hovy et al., TREC-9, TREC-10)
• Parses questions and top document segments with CONTEX
• Query expansion using WordNet
• Scores each sentence using word scores
• Matches constraint patterns and parse trees
• Matches semantic types and parse tree elements
Pipeline: Question → Parsing → Create Query → IR → Select & rank sentences → Parsing → Matching (with inference and ranking), guided by a QA typology and constraint patterns → Answers
Gary G. Lee, Postech
52
LCC: PowerAnswer (Moldovan et al., ACL-02)
Modules:
• M1: Keyword pre-processing (split/bind/spell)
• M2: Construction of the question representation
• M3: Derivation of the expected answer type
• M4: Keyword selection
• M5: Keyword expansion
• M6: Actual retrieval of documents and passages
• M7: Passage post-filtering
• M8: Identification of candidate answers
• M8': Logic proving
• M9: Answer ranking
• Answer formulation
Three feedback loops (Loop 1, Loop 2, Loop 3) feed retrieval results back into query adjustment and keyword expansion (described below).
Gary G. Lee, Postech
53
LCC: PowerAnswer (cont’d)
Keyword alternations:
• Morphological alternations: invent → inventor
• Lexical alternations: “How far” → distance
• Semantic alternations: like → prefer
Three feedback loops:
• Loop 1: adjust Boolean queries
• Loop 2: lexical alternations
• Loop 3: semantic alternations
Logic proving: unification between the question and answer logic forms
Dan Jurafsky
Relation Extraction
• Answers: databases of relations
  • born-in(“Emma Goldman”, “June 27 1869”)
  • author-of(“Cao Xue Qin”, “Dream of the Red Chamber”)
  • Drawn from Wikipedia infoboxes, DBpedia, FreeBase, etc.
• Questions: extracting relations in questions
  • Whose granddaughter starred in E.T.?
    (acted-in ?x “E.T.”) (granddaughter-of ?x ?y)
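A toy sketch of this idea: the relation database is a small in-memory triple store and the parsed question becomes a conjunctive query over it. The store contents and the query helper are illustrative assumptions:

```python
# Toy triple store and conjunctive query for the example above; the
# contents and helper are illustrative assumptions.
TRIPLES = {
    ("acted-in", "Drew Barrymore", "E.T."),
    ("granddaughter-of", "Drew Barrymore", "John Barrymore"),
}

def query(relation, subj=None, obj=None):
    """Return (subj, obj) pairs matching a triple pattern; None = wildcard."""
    return [(s, o) for r, s, o in TRIPLES
            if r == relation and subj in (None, s) and obj in (None, o)]

# Whose granddaughter starred in E.T.?  ->  (acted-in ?x "E.T."), (granddaughter-of ?x ?y)
for x, _ in query("acted-in", obj="E.T."):
    for _, y in query("granddaughter-of", subj=x):
        print(y)  # John Barrymore
```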
54
Dan Jurafsky
Temporal Reasoning
• Relation databases (and obituaries, biographical dictionaries, etc.)
• IBM Watson: “In 1594 he took a job as a tax collector in Andalusia”
  Candidates:
  • Thoreau is a bad answer (born in 1817)
  • Cervantes is possible (was alive in 1594)
55
Dan Jurafsky
Geospatial knowledge (containment, directionality, borders)
• Beijing is a good answer for “Asian city”
• California is “southwest of Montana”
• geonames.org
56
Dan Jurafsky
Context and Conversation in Virtual Assistants like Siri
• Coreference helps resolve ambiguities
  U: “Book a table at Il Fornaio at 7:00 with my mom”
  U: “Also send her an email reminder”
• Clarification questions:
  U: “Chicago pizza”
  S: “Did you mean pizza restaurants in Chicago or Chicago-style pizza?”
57
Dan Jurafsky
Answering harder questions
Q: What is water spinach?
A: Water spinach (Ipomoea aquatica) is a semi-aquatic leafy green plant with long hollow stems and spear- or heart-shaped leaves, widely grown throughout Asia as a leaf vegetable. The leaves and stems are often eaten stir-fried flavored with salt or in soups. Other common names include morning glory vegetable, kangkong (Malay), rau muong (Viet.), ong choi (Cant.), and kong xin cai (Mand.). It is not related to spinach, but is closely related to sweet potato and convolvulus.
Dan Jurafsky
Answering harder questions
Q: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
A: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. (PubMedID: 1621668, Evidence Strength: A)
Dan Jurafsky
Answering harder questions via query-focused summarization
• The (bottom-up) snippet method
  • Find a set of relevant documents
  • Extract informative sentences from the documents (using tf-idf, MMR)
  • Order and modify the sentences into an answer
• The (top-down) information extraction method
  • Build specific answerers for different question types:
    • definition questions
    • biography questions
    • certain medical questions
Dan Jurafsky
The Information Extraction method
• A good biography of a person contains:
  • the person’s birth/death, fame factor, education, nationality, and so on
• A good definition contains:
  • genus or hypernym
    • The Hajj is a type of ritual
• A medical answer about a drug’s use contains:
  • the problem (the medical condition),
  • the intervention (the drug or procedure), and
  • the outcome (the result of the study)
Dan Jurafsky
Information that should be in the answer for 3 kinds of questions
Dan Jurafsky
Architecture for complex question answering: definition questions
S. Blair-Goldensohn, K. McKeown and A. Schlaikjer. 2004. Answering Definition Questions: A Hybrid Approach.
Gary G. Lee, Postech
64
POSTECH QA: Answer Types (top 18 of 84 in total)
Top-level types: DATE, TIME, PROPERTY, LANGUAGE_UNIT, LANGUAGE, SYMBOLIC_REP, ACTION, ACTIVITY, LIFE_FORM, QUANTITY, NATURAL_OBJECT, LOCATION, SUBSTANCE, ARTIFACT, GROUP, PHENOMENON, STATUS, BODY_PART, …
QUANTITY subtypes: AGE, DISTANCE, DURATION, LENGTH, MONETARY_VALUE, NUMBER, PERCENTAGE, POWER, RATE, SIZE, SPEED, TEMPERATURE, VOLUME, WEIGHT
SiteQ system
Gary G. Lee, Postech
65
Layout of SiteQ/J
Pipeline: part-of-speech tagging (POSTAG/J) → document retrieval (POSNIR/J, over the index DB) → passage selection (divide documents into dynamic passages; score passages) → chunking & category processing (NP/VP chunking; syntactic/semantic category check) → detection of answer candidates (LSP matching) → answer filtering, scoring & ranking → answer
In parallel, the question goes through identification of the answer type (locate the question word; LSP matching; date constraint check).
Resources: semantic category dictionary, LSP grammar for questions, LSP grammar for answers, index DB
SiteQ system
Gary G. Lee, Postech
66
Question Processing
question → Tagger → NP Chunker → Normalizer → RE Matcher → Answer Type Determiner / Query Formatter → answer type, query
Resources: Qnorm dictionary, LSP grammar
SiteQ system
Gary G. Lee, Postech
67
Lexico-Semantic Patterns
An LSP is a pattern expressed with lexical entries, POS tags, syntactic categories, and semantic categories; it is more flexible than a surface pattern.
Used for:
• identification of the answer type of a question
• detection of answer candidates in selected passages
SiteQ system
Gary G. Lee, Postech
68
Answer Type Determination (1)
Who was the 23rd president of the United States?
Tagger: Who/WP was/VBD the/DT 23rd/JJ president/NN of/IN the/DT United/NP States/NPS ?/SENT
NP Chunker: Who/WP was/VBD [[ the/DT 23rd/JJ president/NN ] of/IN [ the/DT United/NP States/NPS ]] ?/SENT
Normalizer: %who%be@person
RE Matcher: (%who)(%be)(@person) → PERSON
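A sketch of LSP-style answer type determination following this example: tokens are normalized to lexical (%…) and semantic (@…) codes, and the normalized string is matched against regular-expression rules. The code tables below are tiny illustrations of what SiteQ's dictionaries might contain:

```python
import re

# Toy LSP matching: normalize tokens to lexical/semantic codes, then
# match regex rules over the normalized string. The dictionaries are
# tiny illustrative stand-ins for SiteQ's real resources.
SEMANTIC_CATEGORIES = {"president": "@person", "city": "@location"}
LEXICAL_CODES = {"who": "%who", "was": "%be", "is": "%be"}

LSP_RULES = [(re.compile(r"(%who)(%be)(@person)"), "PERSON")]

def normalize(tokens):
    out = []
    for tok in tokens:
        low = tok.lower()
        out.append(LEXICAL_CODES.get(low) or SEMANTIC_CATEGORIES.get(low, ""))
    return "".join(out)

def answer_type(question_tokens):
    normalized = normalize(question_tokens)  # e.g. "%who%be@person"
    for pattern, atype in LSP_RULES:
        if pattern.search(normalized):
            return atype
    return None

tokens = ["Who", "was", "the", "23rd", "president", "of", "the", "United", "States"]
print(answer_type(tokens))  # PERSON
```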
SiteQ system
Gary G. Lee, Postech
69
Dynamic Passage Selection (1)
Analysis of answer passages for TREC-10 questions: about 80% of query terms occurred within a 50-word (or 3-sentence) window with the answer string at its center.
Definition of passage:
• prefer a sentence window to a word window
• prefer dynamic passages to static passages
• a passage is defined as any three consecutive sentences
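A sketch of dynamic passage selection under these definitions: every window of three consecutive sentences is a candidate passage, scored here by simple query-term overlap (the real system scores passages more carefully, and the sentence splitter below is deliberately naive):

```python
# Toy dynamic passage selection: three-sentence sliding windows scored
# by query-term overlap; the sentence splitter is deliberately naive.
def dynamic_passages(text, query_terms, window=3):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    passages = []
    for i in range(max(1, len(sentences) - window + 1)):
        passage = ". ".join(sentences[i:i + window])
        score = sum(t.lower() in passage.lower() for t in query_terms)
        passages.append((score, passage))
    return sorted(passages, reverse=True)

text = ("Light travels extremely fast. The velocity of light is 300,000 km per second. "
        "This figure appears in physics textbooks. Sound is far slower.")
print(dynamic_passages(text, ["velocity", "light", "physics"])[0])
```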
SiteQ system
Gary G. Lee, Postech
70
Answer extraction components: pivot creation; query-term detection (with query-term filtering against the term list); noun-phrase chunking; an answer detection core (answer detection); answer scoring; answer filtering; AAD processing
Resources: term list, stemming (Porter’s), WordNet, stop-word list
SiteQ system
Gary G. Lee, Postech
71
Answer Detection (1)
… The/DT velocity/NN of/IN light/NN is/VBZ 300,000/CD km/NN per/IN second/JJ in/IN the/DT physics/NN textbooks/NNS …
Matching LSP: cd@unit_length%per@unit_time → speed|1|4|4
SiteQ system
Gary G. Lee, Postech
72
Answer Scoring
Based on:
• passage score
• number of unique terms matched in the passage
• distances from each matched term

AScore = PScore × LSPwgt × (ptuc / qtuc) × 1 / (1 + avgdist / ptuc)

LSPwgt: the weight of the LSP grammar; avgdist: the average distance between the answer candidate and each matched term; qtuc: the number of unique question terms; ptuc: the number of unique question terms matched in the passage
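The same scoring function as code. Since the formula on the original slide is garbled, the functional form above, and therefore this sketch, is a best-effort reconstruction from the quantities the slide defines:

```python
# Best-effort reconstruction of the SiteQ answer score; the exact
# functional form on the original slide is garbled, so this follows
# the reconstructed formula above.
def answer_score(pscore, lsp_wgt, ptuc, qtuc, avgdist):
    """pscore: passage score; lsp_wgt: LSP grammar weight;
    ptuc/qtuc: matched vs. total unique query terms;
    avgdist: average distance from candidate to each matched term."""
    return pscore * lsp_wgt * (ptuc / qtuc) * (1.0 / (1.0 + avgdist / ptuc))

print(answer_score(pscore=2.0, lsp_wgt=1.5, ptuc=3, qtuc=4, avgdist=6.0))  # 0.75
```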
SiteQ system
Gary G. Lee, Postech
73
Performance in TREC-10

MRR by question type (strict / lenient):
• how + adj/adv (31 questions): 0.316 / 0.332
• how do (2): 0.250 / 0.250
• what do (24): 0.050 / 0.050
• what is (242): 0.308 / 0.320
• what/which noun (88): 0.289 / 0.331
• when (26): 0.362 / 0.362
• where (27): 0.515 / 0.515
• who (46): 0.464 / 0.471
• why (4): 0.125 / 0.125
• name a (2): 0.500 / 0.500
• Total: 492 questions

Answers by rank, number of questions (strict / lenient), vs. the average of 67 runs:
• rank 1: 121 / 124 (avg. 88.58)
• rank 2: 45 / 49 (avg. 28.24)
• rank 3: 24 / 29 (avg. 20.46)
• rank 4: 15 / 16 (avg. 12.57)
• rank 5: 11 / 14 (avg. 12.46)
• no answer: 276 / 260 (avg. 329.7)
• MRR: 0.320 / 0.335 (avg. 0.234)
SiteQ system
Gary G. Lee, Postech
74
Performance in QAC-1
• Returned 994 answer candidates, among which 183 responses were judged correct
• MRR is 0.608, the best performance (the average is 0.303)

Questions: 200; answers: 288; output: 994; correct: 183
Recall: 0.635; precision: 0.184; F-measure: 0.285; MRR: 0.608

Correct answers by rank:
• rank 1: 98
• rank 2: 27
• rank 3: 14
• rank 4: 6
• rank 5: 4
• Total: 149
SiteQ system
Gary G. Lee, Postech
75
Contents
• Introduction: QA as high-precision search
• Architecture of QA systems
• Main components for QA
• A QA system: SiteQ
• Issues for advanced QA
Gary G. Lee, Postech
76
Question Interpretation
Use pragmatic knowledge:
“Will Prime Minister Mori survive the political crisis?”
The analyst does not intend the literal meaning (“Will Prime Minister Mori still be alive when the political crisis is over?”) but expresses the belief that the current political crisis might cost the Japanese Prime Minister his job.
• Decompose complex questions into series of simpler questions
• Expand the question with contextual information
Gary G. Lee, Postech
77
Semantic Parsing
Use the explicit representation of semantic roles as encoded in FrameNet:
• FrameNet provides a database of words with associated semantic roles
• Enables an efficient, computable semantic representation of questions and texts
Filling in semantic roles:
• prominent role for named entities
• generalization across domains
• FrameNet coverage and fallback techniques
Gary G. Lee, Postech
78
Information Fusion
Combining answer fragments into a concise response. Technologies:
• learning paraphrases, syntactic and lexical
• detecting important differences
• merging descriptions
• learning and formulating content plans
Gary G. Lee, Postech
79
Answer Generation
Compare request fills from several answer extractions to derive the answer:
• merge / remove duplicates
• detect ambiguity
• reconcile answers (secondary search)
• identify possible contradictions
• apply utility measures
Gary G. Lee, Postech
80
And more …
• Real-time QA
• Multi-lingual QA
• Interactive QA
• Advanced reasoning for QA
• User profiling for QA
• Collaborative QA
• Spoken language QA