QUALIFIER in TREC-12 QA Main Task
Hui Yang, Hang Cui, Min-Yen Kan, Mstislav Maslennikov, Long Qiu, Tat-Seng Chua
School of Computing, National University of Singapore
Email: [email protected]
Outline
Introduction
Factoid Subsystem
List Subsystem
Definition Subsystem
Results
Conclusion and Future Work
Introduction
Given a question and a large text corpus, return an “answer” rather than relevant “documents”.
QA lies at the intersection of IR, IE, and NLP.
Our system, QUALIFIER:
- consists of 3 subsystems
- uses external resources: the Web, WordNet, and an ontology
- performs event-based question answering
- introduces new modules
Outline
Introduction
Factoid Subsystem
List Subsystem
Definition Subsystem
Results
Conclusion and Future Work
Factoid System Overview
[Architecture diagram. A question (definition, factoid, or list) enters Question Analysis (Question Classification, Query Parsing), yielding the question class and original query terms. QA Event Analysis combines Web pre-retrieval documents/snippets, WordNet, and the ontology to build a Structured Query. Document Retrieval runs over the TREC corpus; Anaphora Resolution produces refined documents; Passage Retrieval applies association rules over an N-sentence window. Answer Extraction (named entity tagging, canonicalization resolution), Answer Selection with Successive Constraint Relaxation (SCR), and Answer Justification produce the final answer.]
Factoid Subsystem
Main steps:
- Detailed Question Analysis
- QA Event Construction
- QA Event Mining
- Answer Selection
- Answer Justification
Related modules:
- Fine-grained Named Entity Recognition
- Anaphora Resolution
- Canonicalization / Coreference Resolution
- Successive Constraint Relaxation
Why Event-based QA - I
The world consists of two basic types of things, entities and events, and people often ask questions about both.
From question answering's point of view: questions = “enquiries about entities or events”.
Why Event-based QA - II
QA Entities: “anything having existence (living or nonliving)”
- E.g. “What is the democratic party symbol?”
QA Events: “something that happens at a given place and time”
- E.g. “How did the donkey become the democratic party symbol?”
[Figure: Thomas Nast's 1870 Harper's Weekly cartoon]
Why Event-based QA - III
Entity Questions
- ask about properties, or about the entities themselves (definition questions)
Event Questions
- ask about elements of events: Location, Time, Subject, Object, Quantity, Description, Action, etc.
Table 1: Correspondence of WH-Questions & Event Elements

WH-Question   QA Event Elements
Who           Subject, Object
Where         Location
When          Time
What          Subject, Object, Description, Action
Which         Subject, Object
How           Quantity, Description
question        ::= event | event_element | entity | entity_property
event           ::= { event_element }
event_element   ::= time | location | subject | object | quantity | description | action | other
entity          ::= object | subject
entity_property ::= quantity | description | other
Event-based QA Hypothesis
Equivalency: for QA events Ei and Ej, if all_elements(Ei) = all_elements(Ej), then Ei = Ej, and vice versa.
Generality: if all_elements(Ei) is a subset of all_elements(Ej), then Ei is more general than Ej.
Cohesiveness: if elements a and b both belong to an event Ei, and a and c do not belong to a common known event, then co-occurrence(a, b) is greater than co-occurrence(a, c).
Predictability: if elements a and b both belong to an event Ei, then a => b and b => a.
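To make the hypothesis concrete, here is a minimal sketch (not the QUALIFIER implementation) that models a QA event as a set of typed elements and tests the Equivalency and Generality relations; the element types come from the grammar above, while the event contents are invented for illustration.

```python
# Illustrative sketch: a QA event as a set of (element_type, value) pairs,
# with the Equivalency and Generality hypotheses as set operations.

def all_elements(event):
    """Return the known (type, value) element pairs of an event."""
    return frozenset(event)

# Hypothetical events about the de Soto / Mississippi question.
e_general = {("subject", "Hernando de Soto"), ("object", "Mississippi River")}
e_specific = e_general | {("time", "1541"), ("description", "first European")}

# Equivalency: events with identical element sets are the same event.
assert all_elements(e_general) != all_elements(e_specific)

# Generality: a proper subset of elements denotes a more general event.
is_more_general = all_elements(e_general) < all_elements(e_specific)
print("e_general is more general than e_specific:", is_more_general)  # True
```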
QA Event Space
Consider an event to be a point in a multi-dimensional QA event space.
If we know all the elements of an event, we can easily answer different questions about it, e.g. “When did Bob Marley die?”
Since elements of the same event have innate associations (Cohesiveness), we can use what is already known:
- to narrow the search scope
- to find the rest of the unknown event elements, i.e., the answer (Predictability)
Problems to be Solved
However, in most cases it is difficult to find the correct unknown element(s), i.e., the correct answer.
Two major problems:
- insufficient known elements
- inexact known elements
Solution:
- explore world knowledge (the Web and WordNet glosses) to find more known elements
- exploit lexical knowledge (WordNet synsets and morphology) to find exact forms
How to Find a QA Event
Using the Web
- From the original query terms q^(0), retrieve the top N web documents.
- For each term q_i^(0) ∈ q^(0), extract nearby non-trivial words (within one sentence, or n words away) into C_q, and rank them by their probability of correlation with q_i^(0).
Using WordNet
- For each q_i^(0) ∈ q^(0), extract terms lexically related to q_i^(0) by locating them in its gloss G_q and synset S_q.
Combine the external knowledge sources to form the term collection: K_q = C_q + (G_q ∪ S_q)
The correlation weight of a candidate term t_ik with respect to q_i^(0) is:

weight(t_ik) = d_s(t_ik, q_i^(0)) / Σ_k d_s(t_ik, q_i^(0))

where d_s(·) measures the co-occurrence of the two terms within the extraction window.
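As an illustration of this web-expansion step, the following sketch counts co-occurrences of candidate terms near a query term in retrieved snippets and normalizes the counts into correlation weights. The window size, stoplist, and normalization are assumptions for demonstration, not the paper's exact settings.

```python
# Hedged sketch of web-based term expansion: rank words that co-occur
# near the original query term in web snippets.
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "and", "is", "was"}

def correlated_terms(snippets, query_term, window=5):
    counts = Counter()
    for snippet in snippets:
        tokens = snippet.lower().split()
        for i, tok in enumerate(tokens):
            if tok == query_term:
                lo, hi = max(0, i - window), i + window + 1
                for neighbor in tokens[lo:hi]:
                    if neighbor != query_term and neighbor not in STOPWORDS:
                        counts[neighbor] += 1
    total = sum(counts.values()) or 1
    # weight(t) ~ normalized co-occurrence with the query term
    return {t: c / total for t, c in counts.items()}

snippets = ["Hernando de Soto was the first European explorer "
            "to reach the Mississippi in 1541"]
print(sorted(correlated_terms(snippets, "mississippi").items(),
             key=lambda kv: -kv[1])[:5])
```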
QA Event Construction
Structured Query Formulation
We perform structural analysis on K_q to form semantic groups of terms.
Given any two distinct terms t_i, t_j ∈ K_q, we compute their:
- lexical correlation
- co-occurrence correlation
- distance correlation
QA Event Construction
For example, “What Spanish explorer discovered the Mississippi River?”
[Figure: event elements extracted for this question, grouped into: Mississippi; Spanish | French; Hernando & Soto & De; 1541; explorer; first | European | river]
The final Boolean query becomes: “(Mississippi) & (French | Spanish) & (Hernando & Soto & De) & (1541) & (explorer) & (first | European | river)”.
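A small sketch of how such a structured query could be assembled from the semantic groups; the grouping convention (lists for OR-groups, tuples for AND-groups) is an illustrative choice, not QUALIFIER's actual data format.

```python
# Illustrative sketch: semantic term groups -> structured Boolean query.
def to_boolean_query(groups):
    """Each inner list is OR-ed ('|'), each tuple is AND-ed ('&'),
    and the groups themselves are AND-ed together."""
    clauses = []
    for group in groups:
        if isinstance(group, tuple):           # terms that must co-occur
            clauses.append("(" + " & ".join(group) + ")")
        else:                                  # alternative terms
            clauses.append("(" + " | ".join(group) + ")")
    return " & ".join(clauses)

groups = [
    ["Mississippi"],
    ["French", "Spanish"],
    ("Hernando", "Soto", "De"),
    ["1541"],
    ["explorer"],
    ["first", "European", "river"],
]
print(to_boolean_query(groups))
# (Mississippi) & (French | Spanish) & (Hernando & Soto & De)
#   & (1541) & (explorer) & (first | European | river)
```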
QA Event Mining
Extract important association rules among the elements using data mining techniques (see the filtering sketch below).
Given a QA event Ei, we define X and Y as two sets of event elements.
Event mining studies rules of the form X => Y, where X and Y are QA event element sets, X ∩ Y = ∅, and Y ∩ {element_original} ≠ ∅:
- if X ∩ Y ≠ ∅, ignore X => Y
- if cardinality(Y) > 1, ignore X => Y
- if Y ∩ {element_original} = ∅, ignore X => Y
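The filters above can be applied as a simple post-processing pass over mined rules, as in this sketch; the rule mining itself (e.g. Apriori) is assumed to have been done already, and the rule representation is illustrative.

```python
# Illustrative sketch of the rule filters for QA event mining.
def filter_rules(rules, original_element):
    """Keep only rules X => Y with X and Y disjoint, a singleton
    consequent, and the original question element as that consequent."""
    kept = []
    for X, Y in rules:
        if X & Y:                        # X and Y must be disjoint
            continue
        if len(Y) > 1:                   # cardinality(Y) must be 1
            continue
        if Y != {original_element}:      # Y must be the original element
            continue
        kept.append((X, Y))
    return kept

rules = [
    (frozenset({"1541", "explorer"}), {"Mississippi"}),
    (frozenset({"explorer"}), {"Mississippi", "1541"}),  # dropped: |Y| > 1
    (frozenset({"Mississippi"}), {"1541"}),              # dropped: wrong Y
]
print(filter_rules(rules, "Mississippi"))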
Passage & Answer Selection
Select passages from the relevant documents in the QA corpus based on the Answer Event Score (AES):

Support(X => Y) = d_w(X ∪ Y) / N
Confidence(X => Y) = d_w(X ∪ Y) / d_w(X)
AES(P) = Σ_{i=1..N_r} ( (M_ele,i / N_ele,i) × Support(rule_i) × Confidence(rule_i) )

where d_w(S) is the number of retrieval windows containing all elements of S (matching both original and expanded term forms), N is the total number of windows, N_r is the number of mined rules, and M_ele,i / N_ele,i is the fraction of rule_i's elements matched in passage P. The weight for answer candidate j is then defined as:

weight(j) = AES(P_j) × Σ_{i=1..|Y_j|} Support(rule_i)

where Y_j is the set of rules whose consequent matches candidate j.
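Under the reconstruction above, passage scoring reduces to summing support × confidence over the rules a passage satisfies. The following sketch is one hedged reading of that computation; the containment test and the rule representation are simplifying assumptions.

```python
# Hedged sketch of Answer Event Score (AES) computation: a passage is
# scored by the mined rules whose antecedent terms it contains.
def aes(passage_tokens, rules):
    """rules: list of (X, support, confidence) with X a set of terms."""
    tokens = set(passage_tokens)
    score = 0.0
    for X, support, confidence in rules:
        if X <= tokens:                  # passage contains all of X
            score += support * confidence
    return score

rules = [({"explorer", "1541"}, 0.4, 0.8), ({"soto"}, 0.3, 0.9)]
passage = "in 1541 the explorer hernando de soto reached the river".split()
print(aes(passage, rules))               # 0.4*0.8 + 0.3*0.9 = 0.59
```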
Related Modules: Fine-grained Named Entity Recognition
Fine-grained NE tagging
Non-ASCII character remover
Number format converter, e.g. “one hundred eleven” => 111 (see the sketch after this list)
Rule conflict resolver, using:
- longer match length
- the ontology
- handcrafted priorities

Ontology classes:
HUMAN: Basic, Organization, Person
TIME: Basic, Day, Month, Year
LOCATION: Basic, Body, City, Continent, Country, County, Island, Lake, Mountain, Ocean, Planet, Province, River, Town
NUMBER: Basic, Age, Area, Count, Degree, Distance, Frequency, Money, Percent, Period, Range, Size, Speed
CODE: URL, Telephone, Post code, Email address, Product index
OBJECT: Basic, Animal, Breed, Color, Currency, Entertainment, Game, Language, Music, Plant, Profession, Religion, War, Works
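A minimal sketch of the number format converter mentioned above, handling spelled-out numbers such as “one hundred eleven”; its coverage (up to millions, English only) is an assumption, since the slides do not specify the real module's range.

```python
# Illustrative sketch: spelled-out English numbers -> integers.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 1000, "million": 10**6}

def words_to_number(text):
    total, current = 0, 0
    for word in text.lower().replace("-", " ").split():
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:             # thousand/million: flush group
            total += current * SCALES[word]
            current = 0
        elif word == "and":
            continue
    return total + current

print(words_to_number("one hundred eleven"))   # 111
```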
Related Modules: Answer Justification
We generate axioms based on our manually constructed ontology. For example, q1425: “What is the population of Maryland?”
Sentence: “Maryland's population is 50,000 and growing rapidly.”
Ontology axiom (OA): Maryland(c1) & population(c1, c2) -> 5,000,000(c2)
In this way, we can identify “50,000”, the answer appearing in the surface text, as wrong (see the sketch below).
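The justification step can be pictured as a consistency check of a candidate answer against the axiom's expected value, as in this sketch; the axiom store and the 50% tolerance are illustrative assumptions, not the system's actual mechanism.

```python
# Illustrative sketch of axiom-based answer justification: reject a
# numeric candidate that is far from the ontology's expected value.
ONTOLOGY = {("Maryland", "population"): 5_000_000}   # assumed axiom value

def justify(entity, attribute, candidate, tolerance=0.5):
    expected = ONTOLOGY.get((entity, attribute))
    if expected is None:
        return True                    # no axiom: cannot refute candidate
    # reject candidates differing from the axiom by more than `tolerance`
    return abs(candidate - expected) / expected <= tolerance

print(justify("Maryland", "population", 50_000))     # False: surface text
print(justify("Maryland", "population", 5_300_000))  # True
```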
Factoid Results
Run 1: focus on answer coverage
- with anaphora resolution; more successive constraint relaxation loops
Run 2: focus on answer precision
- without anaphora resolution; fewer successive constraint relaxation loops
Factoid Results
                              Run 1    Run 2
# correct                       232      225
# unsupported                    24       20
# inexact                        13       12
# wrong                         144      156
Accuracy                      0.562    0.545
Precision of recognizing NIL  0.160    0.158
Recall of recognizing NIL     0.400    0.767
Outline
Introduction
Factoid Subsystem
List Subsystem
Definition Subsystem
Results
Conclusion and Future Work
List System Overview
[Architecture diagram: the same system overview as in the Factoid System Overview; the list subsystem shares this pipeline.]
List Subsystem
Multiple answers from the same paragraph
Canonicalization resolution
- map variants to a unique answer: “the States”, “USA”, “United States”, etc.
Pattern-based answer extraction (see the sketch after this list):
- <same_type_NE>, <same_type_NE> and <same_type_NE> + verb
- … include: <same_type_NE>, <same_type_NE>, <same_type_NE> …
- “list of …”
- “top” + number + adj-superlative
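A sketch combining the two steps above: a small alias table for canonicalization and a regex for the “… include: A, B and C” enumeration pattern. The alias table and regex are illustrative stand-ins, not the actual pattern repository.

```python
# Illustrative sketch: canonicalization + one enumeration pattern.
import re

CANONICAL = {"the states": "United States", "usa": "United States",
             "united states": "United States"}

def canonicalize(answer):
    return CANONICAL.get(answer.lower().strip(), answer)

def extract_enumeration(sentence):
    """Match '... include: A, B(,) and C' style enumerations."""
    m = re.search(r"include:?\s+(.+)", sentence, re.IGNORECASE)
    if not m:
        return []
    items = re.split(r",\s*|\s+and\s+", m.group(1).rstrip("."))
    return [canonicalize(item) for item in items if item]

s = "Member countries include: Canada, Mexico and the States."
print(extract_enumeration(s))   # ['Canada', 'Mexico', 'United States']
```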
List Results
Runs nusmmlr1, nusmmlr2, nusmmlr3:
- Average precision 0.568
- Average recall 0.264
- Average F1 0.317
Outline
Introduction
Factoid Subsystem
List Subsystem
Definition Subsystem
Results
Conclusion and Future Work
System Overview
[Architecture diagram: the same system overview as shown for the factoid and list subsystems.]
Definition Subsystem
[Flow diagram. Input: relevant sentences, matched against a Definitional Pattern Repository. Sentence Ranking combines statistics for words in the sentences with the most co-occurring words in Web snippets (obtained by constructing Web queries). Sentence Selection (Progressive MMR) outputs the definition.]
Definition Subsystem
Pre-processing
- document filtering
- anaphora resolution
- splitting sentences into a “positive set” and a “negative set”
Sentence Ranking (see the weighting sketch below)
- Sentence weighting in the corpus:
  Weight_Corpus(s) = Σ_{w∈s} [ log(1 + SF_Positive(w)) − log(1 + SF_Negative(w) / #NegativeSentences) ]
- Sentence weighting on the Web:
  Weight_Web(s) = Σ_{w∈s} log(1 + SF_Web(w)) × log(1 + SF_Positive(w) / #PositiveSentences)
- Overall weighting:
  Weight(s) = λ × Weight_Corpus(s) + (1 − λ) × Weight_Web(s)
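A hedged sketch of the combined weighting, following the formulas as reconstructed above (the reconstruction itself is a best-effort reading of the slide); SF_* maps a word to its sentence frequency in the corresponding set.

```python
# Hedged sketch of the combined corpus + Web sentence weighting.
import math

def sentence_weight(words, sf_pos, sf_neg, sf_web, n_neg, n_pos, lam=0.6):
    # corpus evidence: reward positive-set words, penalize negative-set ones
    w_corpus = sum(math.log(1 + sf_pos.get(w, 0))
                   - math.log(1 + sf_neg.get(w, 0) / max(n_neg, 1))
                   for w in words)
    # web evidence: combine Web and positive-set sentence frequencies
    w_web = sum(math.log(1 + sf_web.get(w, 0))
                * math.log(1 + sf_pos.get(w, 0) / max(n_pos, 1))
                for w in words)
    return lam * w_corpus + (1 - lam) * w_web

sf_pos = {"maryland": 3}; sf_neg = {"maryland": 1}; sf_web = {"maryland": 10}
print(sentence_weight(["maryland"], sf_pos, sf_neg, sf_web,
                      n_neg=20, n_pos=15))
```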
Definition Subsystem
Answer Generation (Progressive Maximal Marginal Relevance; sketched below)
1. Order all sentences in descending order of weight.
2. Add the first sentence to the summary.
3. Examine the following sentences: if Weight(stc) − Weight(next_stc) > avg_sim(stc), add next_stc to the summary.
4. Repeat step 3 until the length limit of the target summary is reached.
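A sketch of the progressive MMR loop as described; the word-overlap similarity and the interpretation of avg_sim (average similarity of the candidate to the sentences already selected) are assumptions.

```python
# Illustrative sketch of Progressive MMR sentence selection.
def avg_sim(candidate, selected):
    """Average word-overlap (Jaccard) similarity to the summary so far."""
    if not selected:
        return 0.0
    c = set(candidate.split())
    sims = [len(c & set(s.split())) / max(len(c | set(s.split())), 1)
            for s in selected]
    return sum(sims) / len(sims)

def progressive_mmr(weighted_sentences, max_sentences):
    """weighted_sentences: non-empty list of (sentence, weight)."""
    ranked = sorted(weighted_sentences, key=lambda sw: -sw[1])
    summary = [ranked[0][0]]                 # step 2: take the top sentence
    prev_weight = ranked[0][1]
    for sentence, weight in ranked[1:]:      # step 3: scan the rest
        if len(summary) >= max_sentences:    # step 4: length limit
            break
        if prev_weight - weight > avg_sim(sentence, summary):
            summary.append(sentence)
            prev_weight = weight
    return summary

sentences = [("Bob Marley was a Jamaican reggae musician.", 5.0),
             ("Marley was a Jamaican musician.", 4.9),   # near-duplicate
             ("He died of cancer in 1981.", 3.0)]
print(progressive_mmr(sentences, max_sentences=2))
```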
Definition Results
We empirically set the length of the summary for People and Objects based on question classification results.
Run #   # sentences               Algorithm        Result
1       People: 12, Object: 10    Full sentences   0.471
2       People: 12, Object: 10    Text fragments   0.479
3       People: 10, Object: 8     Text fragments   0.460
Outline
Introduction
Factoid Subsystem
List Subsystem
Definition Subsystem
Results
Conclusion and Future Work
Overall Performance
nusmmlr1: 0.471
nusmmlr2: 0.479
nusmmlr3: 0.460
Conclusion and Future Work
Conclusion
- Event-based question answering
- Factoid and list questions explore the power of event-based QA
- Definition question answering combines IR and summarization
- The ontology boosts the performance of our NE and answer justification modules
Future Work
- Give a formal proof of our QA event hypothesis
- Work towards an online question answering system
- Interactive QA
- Analysis and opinion questions
- VideoQA: question answering on news video