Is Question Answering an Acquired Skill?
Soumen Chakrabarti, IIT Bombay
With Ganesh Ramakrishnan, Deepa Paranjpe, and Pushpak Bhattacharyya
The query-response gap
Language models for the Web corpus and for Web queries are radically different (Church, 2003-04)
Not surprising, because
• Users are conditioned to drop verbs, prepositions and articles (anything interesting)
• Queries inherently seek to express a “missing piece”; documents don’t
IR vs. DB
• DB queries clearly indicate what’s given and what’s missing in a query
• IR systems do not (yet)
Web search and QA
Information need – words relating “things” + “thing” aliases = telegraphic Web queries
• Cheapest laptop with wireless → best price laptop 802.11
• Why is the sky blue? → sky blue because
• When was the Space Needle built? → “Space Needle” history
People are used to asking telegraphic queries
• Fix keywords you are sure of
• Guess document features that will answer the missing piece in your query
Factoid QA
Specialize a given domain to a token related to ground constants in the query
• What animal is Winnie the Pooh? → hyponym(“animal”) NEAR “Winnie the Pooh”
• When was television invented? → instance-of(“time”) NEAR “television” NEAR synonym(“invented”)
FIND x “NEAR” GroundConstants(question) WHERE x IS-A Atype(question)
• Ground constants: Winnie the Pooh, television
• Atypes: animal, time
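Read operationally, the FIND…WHERE template amounts to: scan passage tokens, keep those that pass the IS-A test against the question’s atype and lie near a ground constant. A minimal sketch, assuming NLTK’s WordNet interface; the fixed NEAR window and the function names are illustrative, not the system described in this talk:

```python
from nltk.corpus import wordnet as wn

def is_a(token, atype):
    """True if any noun sense of `token` has a hypernym ancestor among `atype`'s senses."""
    atype_senses = set(wn.synsets(atype, pos=wn.NOUN))
    for sense in wn.synsets(token, pos=wn.NOUN):
        ancestors = set(sense.closure(lambda s: s.hypernyms()))
        if ancestors & atype_senses:
            return True
    return False

def find_candidates(passage_tokens, ground_constants, atype, window=5):
    """FIND x 'NEAR' GroundConstants WHERE x IS-A Atype, over one passage."""
    anchors = [i for i, t in enumerate(passage_tokens) if t in ground_constants]
    return [t for i, t in enumerate(passage_tokens)
            if any(abs(i - j) <= window for j in anchors) and is_a(t, atype)]
```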
A relational view of QA
Entity class or atype may be expressed by
• A finite IS-A hierarchy (e.g. WordNet, TAP)
• A surface pattern matching infinitely many strings (e.g. “digit+”, “Xx+”, “preceded by a preposition”); see the regex sketch after this slide
Match selectors, specialize atype to answer tokens
[Figure: the question’s atype clues and selectors run against a relation of candidate answers. Question words direct a syntactic match to the “answer zone” in an answer passage; IS-A (entity class) links limit the search to certain rows; the attribute or column name locates which column to read.]
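The surface-pattern atypes above (“digit+”, “Xx+”) can be illustrated with ordinary regular expressions; the encodings here are our guesses:

```python
import re

# Illustrative encodings of the pattern-style atypes named on the slide.
SURFACE_ATYPES = {
    "digit+": re.compile(r"^\d+$"),          # e.g. "1962", "90"
    "Xx+":    re.compile(r"^[A-Z][a-z]+$"),  # capitalized word, e.g. "Texas"
}

def surface_atypes(token):
    """Names of the pattern atypes this token matches."""
    return {name for name, pat in SURFACE_ATYPES.items() if pat.match(token)}
```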
But who provides is-a info?
Compiled knowledge bases (WordNet, CYC)
Automatic “soft” compilations
• Google Sets
• KnowItAll
• BioText
Basic tricks
• Do jordan and basketball cooccur more often than you’d expect?
• Small phrase probes like “actor Willis”
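The cooccurrence trick (jordan and basketball) is essentially a pointwise mutual information test over corpus counts. A sketch, where df() and N are hypothetical document-frequency counts and corpus size:

```python
import math

def pmi(df_x, df_y, df_xy, n_docs):
    """log P(x,y) / (P(x) P(y)); positive means cooccurrence above chance.
    Assumes df_xy > 0."""
    p_x, p_y, p_xy = df_x / n_docs, df_y / n_docs, df_xy / n_docs
    return math.log(p_xy / (p_x * p_y))

# e.g. pmi(df("jordan"), df("basketball"), df("jordan basketball"), N) >> 0
# suggests a relatedness/is-a link worth compiling into the soft knowledge base.
```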
Benefits of the relational view
“Scaling up by dumbing down”
• Next stop after vector-space
• Far short of real knowledge representation and inference
• Barely getting practical at (near) Web scale
Can set up as a learning problem: train with questions (query logs) and answers in context
Transparent, self-tuning, easy to deploy
• Feature extractors used in entity taggers
• Relational/graphical learning on features
Broad strategy
Learn soft patterns of correlation between question features and answer context
Use models to index the corpus with atype annotations
Given a query, assign a soft reward to all atype patterns
Search efficiently for passages containing promising tokens
Score passages and report best token sequences
What TREC QA feels like
How to assemble chunker, parser, POS and NE tagger, WordNet, WSD, … into a QA system?
Experts get much insight from old QA pairs
• Matching an upper-cased term adds a 60% bonus … for multi-word terms and 30% for single words
• Matching a WordNet synonym … discounts by 10% (lower case) and 50% (upper case)
• Lower-case term matches after Porter stemming are discounted 30%; upper-case matches 70%
Talk outline
Relational interpretation of QA
Motivation for a “clean-room” IE+ML system
Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
• Can handle a prominent finite set of atypes: person, place, time, measurements, …
Extending to arbitrary atype specializations
• Required for what… and which… questions
Ongoing work and concluding remarks
Feature + Soft match
FIND x “NEAR” GroundConstants(question) WHERE x IS-A Atype(question)
No fixed question or answer type system
Convert “x IS-A Atype(question)” to a soft match DoesAtypeMatch(x, question)
[Figure: the question goes through IE-style surface feature extractors to a question feature vector; answer tokens in the passage go through IE-style surface and WordNet hypernym feature extractors to a snippet feature vector; a joint distribution over the two is learned.]
Feature extraction: Intuition
Question prefixes such as how fast, how many, how far, how rich, who wrote, who … first carry atype clues.
How fast can a cheetah run? → A cheetah can chase its prey at up to 90 km/h
How fast does light travel? → Nothing moves faster than 186,000 miles per hour, the speed of light
[Figure: WordNet hypernym features connect such answers to the questions, e.g. rate#n#2, magnitude_relation#n#1, abstraction#n#6 and mile#n#3, linear_unit#n#1, measure#n#3, definite_quantity#n#1 (contrast paper_money#n#1, currency#n#1); surface features include NNS, NNP, person; who-questions map to atypes like writer, composer, artist, musician, explorer.]
Feature extractors
Question features: 1-, 2-, 3-token sequences starting with standard wh-words
Passage surface features: hasCap, hasXx, isAbbrev, hasDigit, isAllDigit, lpos, rpos, …
Passage WordNet features: all noun hypernym ancestors of all senses of the token
Get top 300 passages from an IR engine
For each token, invoke the feature extractors
Label = 1 if the token is in the answer span, 0 otherwise
Question vector xq, passage vector xp
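A sketch of the token-level extractors just listed, assuming NLTK’s WordNet; the exact surface-feature definitions (e.g. what counts as isAbbrev) are our guesses, not the system’s:

```python
from nltk.corpus import wordnet as wn

def surface_features(token):
    """Surface features named on the slide (definitions assumed)."""
    flags = {
        "hasCap":     token[:1].isupper(),
        "hasXx":      token[:1].isupper() and token[1:].islower(),
        "isAbbrev":   token.isupper() and 2 <= len(token) <= 5,
        "hasDigit":   any(c.isdigit() for c in token),
        "isAllDigit": token.isdigit(),
    }
    return {name for name, on in flags.items() if on}

def hypernym_features(token):
    """All noun hypernym ancestors of all senses of the token."""
    feats = set()
    for sense in wn.synsets(token, pos=wn.NOUN):
        for anc in sense.closure(lambda s: s.hypernyms()):
            feats.add(anc.name())   # e.g. 'linear_unit.n.01' for 'mile'
    return feats
```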
Preliminary likelihood ratio tests
[Figure: likelihood ratio tests for surface patterns and for WordNet hypernyms.]
Joint feature-vector design
Obvious “linear” juxtaposition x = (xp, xq)
• Does not expose pairwise dependencies
“Quadratic” form x = xq ⊗ xp
• All pairwise products of elements
Model has a parameter for every pair
• Can discount for redundancy in pair info
If xq (xp) is fixed, what xp (xq) will yield the largest Pr(Y=1|x)? (linear iceberg query)
Pr(Y = 1 | x) = 1 / (1 + exp(−w·x))
[Figure: example question features (how_far, when, what_city) paired against passage features (region#n#3, entity#n#1).]
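The two designs and the logistic link are easy to write down; a NumPy sketch with assumed shapes and names:

```python
import numpy as np

def joint_linear(xq, xp):
    """Linear juxtaposition x = (xp, xq): no pairwise terms."""
    return np.concatenate([xp, xq])

def joint_quadratic(xq, xp):
    """Quadratic form x = xq (outer) xp: one model parameter per
    (question feature, passage feature) pair."""
    return np.outer(xq, xp).ravel()

def prob_answer(w, x):
    """Pr(Y=1 | x) = 1 / (1 + exp(-w.x))."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))
```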
Classification accuracy
[Figure: ROC (true positive vs. false positive rate) and precision-recall curves for the linear and quadratic models.]
Pairing is more accurate than the linear model
Steep learning curve; linear never “gets it” beyond “prior” atypes like proper nouns (common in TREC)
Are the estimated w parameters meaningful?
Parameter anecdotes
Surface and WordNet features complement each other
General concepts get negative parameters: use in predictive annotation
Learning is symmetric (Q↔A)
Query-driven information extraction
“Basis” of atypes A; a ∈ A could be a synset, a surface pattern, or a feature of a parse tree
Question q “projected” to a vector (wa : a ∈ A) in atype space via a learned conditional model
• E.g. if q is “when…” or “how long…”, whasDigit and wtime_period#n#1 are large, wregion#n#1 is small
Each corpus token t has associated indicator features a(t) for every a ∈ A
• E.g. hasDigit(3,000) = is-a(region#n#1)(Japan) = 1
Can also learn [0,1]-valued is-a proximity
Single token scoring
A token t is a candidate answer if Σa∈A a(t) wa(q) > 0
Hq(t): reward tokens appearing “near” selectors matched from the question
• 0/1: appears within a fixed window of matched selector/s
• Activation in a linear token sequence model
• Proximity in chunk sequences, parse trees, …
Order tokens by decreasing Hq(t) · Σa∈A a(t) wa(q)
(a(t): atype indicator features of the token; wa(q): projection of the question to “atype space”)
Example passage: …the armadillo, found in Texas, is covered with strong horny plates
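A hypothetical scorer implementing that ordering; the data structures (the feature set a(t), the weight map wa(q), and the proximity reward Hq) are stand-ins:

```python
def token_score(t, atype_feats, w_q, H_q):
    """Score Hq(t) * sum over a in A of a(t) * wa(q).

    atype_feats(t): set of atypes a with a(t) = 1
    w_q:            dict mapping atype a -> wa(q), the question's projection
    H_q(t):         selector-proximity reward, e.g. 0/1 within a fixed window
    """
    return H_q(t) * sum(w_q.get(a, 0.0) for a in atype_feats(t))

# Candidates are tokens whose atype sum is positive; rank them by token_score.
```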
Mean reciprocal rank (MRR)
nq = smallest rank among answer passages
MRR = (1/|Q|) Σq∈Q (1/nq)
• Dropping a passage from rank 1 to rank 2 is as bad as dropping it from rank 2 to ∞
TREC requires MRR5: round up nq > 5 to ∞
• Improving rank from 20 to 6 is as useless as improving it from 20 to 15
Aggregate score is influenced by many complex subsystems
• Complete description rarely available
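MRR and MRR5 as defined above, in a few lines; representing “no answer returned” as None is our convention:

```python
def mrr(ranks, cutoff=None):
    """ranks: one n_q per question (None if no answer passage was found).
    cutoff=5 gives TREC's MRR5: n_q > 5 contributes 0."""
    total = 0.0
    for n_q in ranks:
        if n_q is None or (cutoff is not None and n_q > cutoff):
            continue                      # reciprocal rank 0
        total += 1.0 / n_q
    return total / len(ranks)

# mrr([1, 2, None, 7], cutoff=5) == (1 + 0.5 + 0 + 0) / 4
```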
Effect of eliminating non-answers
300 top IR-score hits
If Pr(Y=1 | token) < threshold, reject the token
If all tokens are rejected, reject the passage
Present survivors in IR order
[Figure: rank after filtering vs. IR rank for TREC 2000 and TREC 2002; MRR and MRR5 vs. acceptance threshold against the IR baseline, peaking at 0.491 (MRR) and 0.336 (MRR5) on TREC 2000, and 0.334 and 0.224 on TREC 2002.]
Drill-down and ablation studies
Scale average MRR improvement to 1
• What, Which < average
• Who ≈ average
Atype of what… and which… not captured well by 3-grams starting at wh-words
Atype ranges over an essentially infinite set with relatively little training data
[Figure: relative MRR gain by question type (what, which, name, where, how, when, who) on TREC 2002, roughly between 0.8 and 1.2.]
Talk outline
Relational interpretation of QA
Motivation for a “clean-room” IE+ML system
Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
• Can handle a prominent finite set of atypes: person, place, time, measurements, …
Extending to arbitrary atype specializations
• Required for what… and which… questions
Ongoing work and concluding remarks
What…, which…, name… atype clues
Assumption: the question sentence has a wh-word and a main/auxiliary verb
Observation: atype clues are embedded in a noun phrase (NP) adjoining the main or auxiliary verb
Heuristic: atype clue = head of this NP
• Use a shallow parser and apply the rule
The head can have attributes
• Which (American (general)) is buried in Salzburg?
• Name (Saturn’s (largest (moon)))
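A sketch of the NP-head heuristic, using spaCy’s chunker as a stand-in for the shallow parser; the adjacency test is our simplification of the rule:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def atype_clue(question):
    """Head of the NP adjoining the main/auxiliary verb, else None."""
    doc = nlp(question)
    verb_idxs = [t.i for t in doc if t.pos_ in ("VERB", "AUX")]
    for np in doc.noun_chunks:
        # NP immediately before or after a main/auxiliary verb
        if any(v == np.end or v == np.start - 1 for v in verb_idxs):
            return np.root.text           # head of the NP
    return None

# e.g. atype_clue("Which American general is buried in Salzburg?")
# should yield "general" (head of the NP "Which American general").
```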
Atype clue extraction stats

Question type   #Questions   #Extracted correctly
what            630          612
which           29           28
name            23           20

The simple heuristic is quite effective
If successful, the extracted atype is mapped to a WordNet synset (moon → celestial body etc.)
If no atype of this form is available, try the “self-evident” atypes (who, when, where, how_X etc.)
New boolean feature for a candidate token: is the token a hyponym of the atype synset?
The last piece: Learning selectors
Which question words are likely to appear (almost) unchanged in an answer passage?
• Constants in select-clauses of SQL queries
• Guides the backoff policy for the keyword query
Arises in Web search sessions too
• Opera login fails
• Opera problem with login
• Opera login accept password
• Opera account authentication
• …
Features for identifying selectors
Local and global features
• POS of the word, POS of adjacent words (POS@-1, POS@0, POS@+1), case info, proximity to the wh-word
• Suppose the word is associated with synset set S
  • NumSense: size of S (how polysemous is the word?)
  • NumLemma: average #lemmas describing s ∈ S
Model as a sequential learning problem
• Each token has local context and global features
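NumSense and NumLemma fall directly out of WordNet; a sketch with NLTK:

```python
from nltk.corpus import wordnet as wn

def num_sense(word):
    """NumSense: how polysemous the word is (|S|)."""
    return len(wn.synsets(word))

def num_lemma(word):
    """NumLemma: average number of lemmas describing each sense in S."""
    senses = wn.synsets(word)
    if not senses:
        return 0.0
    return sum(len(s.lemmas()) for s in senses) / len(senses)

# Intuition: highly polysemous words make poor selectors; rare, specific
# words tend to survive unchanged into answer passages.
```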
Selector results
Global features (IDF, NumSense, NumLemma) essential for accuracy
• Best F1 accuracy with local features alone: 71-73%
• With local and global features: 81%
Decision trees better than logistic regression
• F1 = 81% as against LR F1 = 75%
• Intuitive decision branches
• But logistic regression gives scores for query backoff
[Figure: a learned decision tree branching on NumLemma@0 (thresholds 2.5, 1.82), NumSense@0 (threshold 9), and POS@0/POS@-1 (noun, adjective, verb).]
Putting together a QA system
[Figure: build-time components: WordNet, a POS tagger, a shallow parser, a named-entity tagger, and learning tools, applied to the training corpus.]
[Figure: runtime pipeline. The question is tokenized and POS-tagged; the shallow parser marks nouns and verbs for the atype extractor, which emits atype clues; the keyword query generator, with the selector learner, produces a keyword query. The query runs against a passage index built by a sentence splitter and passage indexer over the corpus. Candidate passages are tokenized, POS-tagged, and entity-extracted, then reranked by logistic regression (“is QA pair?”). Sample reranking features: Do selectors match? How many? Is some non-selector passage token a specialization of the question’s atype clue? Min, avg linear token distance between the candidate token and matched selectors.]
Learning to re-rank passages
Remove passage tokens matching selectors
• User already knows these are in the passage
Find passage token/s specializing the atype
For each candidate token collect
• Atype of the question, original rank of the passage
• Min, avg linear distances to matched selectors
• POS and entity tag of the token if available
Example: “How many inhabitants live in the town of Ushuaia?” vs. “Ushuaia, a port of about 30,000 dwellers set between the Beagle Channel and …”: Ushuaia is a selector match; 30,000 has the surface pattern hasDigit and lies 5 tokens from the matched selector; dwellers is a WordNet match for inhabitants.
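The per-candidate features above (as in the Ushuaia example) might be assembled like this; the field names and -1 sentinel are our choices, and the logistic-regression scorer itself is omitted:

```python
def rerank_features(cand_idx, selector_idxs, specializes_atype, ir_rank):
    """Feature dict for one candidate token in one passage."""
    dists = [abs(cand_idx - j) for j in selector_idxs]
    return {
        "num_selectors_matched": len(selector_idxs),
        "specializes_atype":     int(specializes_atype),  # e.g. dwellers IS-A inhabitant
        "min_selector_dist":     min(dists) if dists else -1,
        "avg_selector_dist":     sum(dists) / len(dists) if dists else -1,
        "orig_ir_rank":          ir_rank,
    }
```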
Re-ranking results
Categorical and numeric attributes
Logistic regression
Good precision, poor recall
Use the logit score to re-rank passages
Rank of the first correct passage shifts substantially
[Figure: frequency (log scale) of the rank of the first correct answer, baseline vs. reranked; reranking concentrates answers at rank 1.]
MRR gains from what, which, name
Substantial gain in MRR
What/which now show above-average MRR gains
TREC 2000 top MRRs: 0.76, 0.71, 0.46, 0.46, 0.31

Ranking strategy              TREC 2000   TREC 2002
IR score (Lucene)             0.377       0.249
Conditional model             0.491       0.334
Atype for what/which/name     0.71        0.565

[Figure: MRR by question type (when, what, where, how, which, how many, how much), pre- vs. post-reranking.]
Generalization across corpora
Across-year numbers close to train/test split on a single year
Features and model seem to capture corpus-independent linguistic Q+A artifacts
Conclusion
Clean-room QA = feature extraction + learning
• Recover structure info from the question
• Learn correlations between question structure and passage features
Competitive accuracy with negligible domain expertise or manual intervention
Ongoing work
• Use model coefficients for predictive annotation
• Combine token scores into better passage scores
• Treat all question types uniformly
• Use redundancy available from the Web