
AQUAINT

BBN’s AQUA Project

Ana Licuanan, Jonathan May, Scott Miller, Ralph Weischedel, Jinxi Xu

3 December 2002

BBN’s Approach to QA

• Theme: Use document retrieval, entity recognition, & proposition recognition

• Analyze the question

– Reduce question to propositions and a bag of words

– Predict the type of the answer

• Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)

• Other knowledge sources (e.g., the Web) are optionally used to re-rank answers

• Re-rank candidates based on propositions

• Estimate confidence for answers

System Diagram

[System diagram: a Question flows through Question Classification, Document Retrieval, Passage Retrieval, Name Extraction, Parsing, Proposition Finding, Description Classification, NP Labeling, and Confidence Estimation to produce an Answer & Confidence Score. Supporting components and training resources: Web Search, Treebank, Name Annotation, Regularization, and Proposition Bank.]

Question Classification

• A hybrid approach based on rules, statistical parsing, and question templates
– Match question templates against statistical parses
– Back off to statistical bag-of-words classification

• Example features used for classification
– The type of WHNP starting the question (e.g., “Who”, “What”, “When”, …)
– The headword of the core NP
– WordNet definition
– Bag of words
– Main verb of the question

• Performance
– TREC 8 & 9 questions for training
– ~85% accuracy when testing on TREC 10
(A toy sketch of these features follows the examples below.)

Examples of Question Analysis

• Where is the Taj Mahal?

– WHNP=where

– Answer type: Location or GPE

• Which pianist won the last International Tchaikovsky Competition?

– Headword of core NP=pianist

– WordNet definition=person

– Answer type: Person
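As a rough illustration of the features above, here is a minimal sketch of rule-based answer-type prediction with a bag-of-words backoff. The rules, word lists, and function names are illustrative assumptions, not BBN’s actual templates or classifier.

```python
# Toy answer-type classifier: WHNP rules first, core-NP headword next,
# bag-of-words backoff last. All rules and lexicons are illustrative.

# Tiny lexicon standing in for WordNet definition lookups.
PERSON_WORDS = {"pianist", "author", "president", "actor"}
PLACE_WORDS = {"city", "country", "river", "state"}

def classify_question(question):
    tokens = question.rstrip("?").lower().split()
    whnp = tokens[0]  # the type of WHNP starting the question

    # Rule layer: the WHNP alone often determines the answer type.
    if whnp == "where":
        return "LOCATION or GPE"
    if whnp == "when":
        return "DATE"
    if whnp == "who":
        return "PERSON"

    # For "which/what X ...", use the headword of the core NP
    # (here crudely approximated by the next token).
    if whnp in ("which", "what") and len(tokens) > 1:
        headword = tokens[1]
        if headword in PERSON_WORDS:  # WordNet-style definition: a person
            return "PERSON"
        if headword in PLACE_WORDS:
            return "GPE"

    # Backoff: a bag-of-words heuristic standing in for the
    # statistical classifier.
    return "DATE" if "year" in tokens else "OTHER"

print(classify_question("Where is the Taj Mahal?"))  # LOCATION or GPE
print(classify_question("Which pianist won the last "
                        "International Tchaikovsky Competition?"))  # PERSON
```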

Question-Answer Types

Type            Subtypes
ORGANIZATION    CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
LOCATION        CONTINENT, LAKE_SEA_OCEAN, OTHER, REGION, RIVER, BORDER
FAC             AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
GAME            (none)
PRODUCT         DRUG, OTHER, VEHICLE, WEAPON
NATIONALITY     NATIONALITY, OTHER, POLITICAL, RELIGION
LANGUAGE        (none)
FAC_DESC        AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
MONEY           (none)
GPE_DESC        CITY, COUNTRY, OTHER, STATE_PROVINCE
ORG_DESC        CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
CONTACT_INFO    ADDRESS, OTHER, PHONE
WORK_OF_ART     BOOK, OTHER, PAINTING, PLAY, SONG

*Thanks to USC/ISI and IBM groups for sharing the conclusions of their analyses.

Question-Answer Types (cont’d)

Type            Subtypes
PRODUCT_DESC    OTHER, VEHICLE, WEAPON
PERSON          (none)
EVENT           HURRICANE, OTHER, WAR
SUBSTANCE       CHEMICAL, DRUG, FOOD, OTHER
PER_DESC        (none)
PRODUCT         OTHER
ORDINAL         (none)
ANIMAL          (none)
QUANTITY        1D, 1D_SPACE, 2D, 2D_SPACE, 3D, 3D_SPACE, ENERGY, OTHER, SPEED, WEIGHT, TEMPERATURE
GPE             CITY, COUNTRY, OTHER, STATE_PROVINCE
DISEASE         (none)
CARDINAL        (none)
AGE             (none)
TIME            (none)
PLANT           (none)
PERCENT         (none)
LAW             (none)
DATE            AGE, DATE, DURATION, OTHER

Frequency of Q Types

[Bar chart: number of questions of each type in TREC 8, 9, and 10 (y-axis: # in TREC 8, 9, 10, from 0 to 250). Types along the x-axis: Person, Quantity, Money, Percent, Organization, Organization-Desc, Product-Name, Product-Desc, Facility, Disease, Reason, GPE, GPE-Desc, Work-of-Art, Date, Event, Time, Language, Nationality, Location-Name, Definition, Use, Other, Cardinal, Ordinal, Game, Contact Info, Animal, Plant, Bio, Cause-Effect-Influence, Law.]

Interpretation

IdentiFinder™ Status

• Current IdentiFinder performance on types

• IdentiFinder easily trainable for other languages, e.g., Arabic and Chinese

[Bar chart: IdentiFinder recall, precision, and F-measure on a 0-100 scale. Category level: recall 88, precision 89, F 88.4; subcategory level: recall 87, precision 88, F 87.3.]

Proposition Indexing

• A shallow semantic representation

– Deeper than bags of words

– But broad enough to cover all the text

• Characterizes documents by

– The entities they contain

– Propositions involving those entities

• Resolves all references to entities

– Whether named, described, or pronominal

• Represents all propositions that are directly stated in the text (one possible index layout is sketched below)
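One way to picture the resulting index is sketched below. The schema and field names are assumptions for illustration, not BBN’s actual representation.

```python
# Hypothetical shape of one document's entry in a proposition index,
# after all references to entities have been resolved.
from dataclasses import dataclass, field

@dataclass
class Entity:
    eid: str        # canonical id after reference resolution
    mentions: list  # surface forms: named, described, or pronominal
    etype: str      # e.g., ORGANIZATION, PERSON, GPE

@dataclass
class Proposition:
    predicate: str  # head word of the stated proposition
    args: dict = field(default_factory=dict)  # role -> entity id

@dataclass
class DocIndex:
    doc_id: str
    entities: dict      # eid -> Entity
    propositions: list  # every proposition directly stated in the text

# Example entry (content invented for illustration):
doc = DocIndex(
    doc_id="doc-1",
    entities={"e1": Entity("e1", ["Dell", "the company", "it"],
                           "ORGANIZATION")},
    propositions=[Proposition("sold", {"subj": "e1"})],
)
```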

Proposition Finding Example

• Question: Which company sold the most PCs in 2001?

• Text: Dell, beating Compaq, sold the most PCs in 2001.

Propositions

• (e1: “Dell”)

• (e2: “Compaq”)

• (e3: “the most PCs”)

• (e4: “2001”)

• (sold subj:e1, obj:e3, in:e4)

• (beating subj:e1, obj:e2)

• Passage retrieval alone would select the wrong answer; the sold proposition identifies e1 (“Dell”) as the answer (see the matching sketch below)
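Continuing the example, the sketch below (illustrative code, not BBN’s matcher) treats the question’s wh-phrase as a variable and unifies the question proposition against the indexed ones; for brevity, the question’s other constituents are pre-linked to the document entities e3 and e4.

```python
# Index for "Dell, beating Compaq, sold the most PCs in 2001."
entities = {"e1": "Dell", "e2": "Compaq",
            "e3": "the most PCs", "e4": "2001"}
doc_props = [
    ("sold",    {"subj": "e1", "obj": "e3", "in": "e4"}),
    ("beating", {"subj": "e1", "obj": "e2"}),
]

# "Which company sold the most PCs in 2001?" with "?x" for the wh-phrase.
q_pred, q_args = "sold", {"subj": "?x", "obj": "e3", "in": "e4"}

def solve(q_pred, q_args, doc_props):
    """Bind '?x' from document propositions that match the question."""
    answers = []
    for pred, args in doc_props:
        if pred != q_pred:
            continue
        binding, ok = None, True
        for role, val in q_args.items():
            if val == "?x":
                binding = args.get(role)
                ok = ok and binding is not None
            elif args.get(role) != val:
                ok = False
        if ok and binding:
            answers.append(entities[binding])
    return answers

print(solve(q_pred, q_args, doc_props))  # ['Dell']
```

Bag-of-words overlap cannot separate Dell from Compaq in this sentence; the subj role of the sold proposition can.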

Proposition Recognition Strategy

• Start with a lexicalized probabilistic context-free grammar (LPCFG) parsing model

• Distinguish names by replacing NP labels with NPP

• Currently, rules normalize the parse tree to produce propositions (a sketch follows this list)

• At a later date, extend the statistical model to

– Predict argument labels for clauses

– Resolve references to entities
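A minimal sketch of such rule-based normalization over toy parse trees follows; the tree encoding, the crude head finder, and the single S -> NP VP rule are all assumptions for illustration, not BBN’s actual rules.

```python
# Normalize simple clauses into (verb, {subj, obj}) propositions.
# Trees are (label, children...) tuples; names get NPP instead of NP.

def head_word(tree):
    """Crude head finder: the rightmost leaf of the constituent."""
    while isinstance(tree, tuple):
        tree = tree[-1]
    return tree

def find_propositions(tree):
    props = []
    label, children = tree[0], tree[1:]
    if label == "S":
        nps = [c for c in children if c[0] in ("NP", "NPP")]
        vps = [c for c in children if c[0] == "VP"]
        if nps and vps:
            kids = vps[0][1:]
            verb = next((head_word(c) for c in kids
                         if c[0].startswith("VB")), None)
            obj = next((head_word(c) for c in kids
                        if c[0] in ("NP", "NPP")), None)
            if verb:
                props.append((verb, {"subj": head_word(nps[0]),
                                     "obj": obj}))
    for c in children:
        if isinstance(c, tuple):
            props.extend(find_propositions(c))
    return props

# "Dell sold the most PCs" with the name relabeled NPP.
tree = ("S",
        ("NPP", ("NNP", "Dell")),
        ("VP", ("VBD", "sold"),
               ("NP", ("DT", "the"), ("JJS", "most"), ("NNS", "PCs"))))
print(find_propositions(tree))
# [('sold', {'subj': 'Dell', 'obj': 'PCs'})]
```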

Confidence Estimation

• Compute the probability P(correct|Q,A) from the following features:
P(correct|Q,A) ≈ P(correct | type(Q), <m,n>, PropSat)
– type(Q): question type
– m: question length
– n: number of matched question words in the answer context
– PropSat: whether the answer satisfies the propositions in the question

• Confidence for answers found on the Web:
P(correct|Q,A) ≈ P(correct | Freq, InTrec)
– Freq: number of Web hits, using Google
– InTrec: whether the answer was also a top answer from the AQUAINT corpus

(A sketch of table-based estimation follows.)
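One simple way to realize such an estimator is a smoothed conditional-frequency table over held-out question-answer pairs; the sketch below uses invented data and an assumed add-prior smoothing scheme.

```python
# P(correct | type(Q), <m, n>, PropSat) as a smoothed relative frequency.
from collections import defaultdict

counts = defaultdict(lambda: [0, 0])  # feature tuple -> [correct, total]

def observe(qtype, m, n, prop_sat, correct):
    key = (qtype, m, n, prop_sat)
    counts[key][0] += int(correct)
    counts[key][1] += 1

def confidence(qtype, m, n, prop_sat, prior=0.25, strength=2.0):
    """Smoothed estimate; falls back toward `prior` for unseen tuples."""
    c, t = counts[(qtype, m, n, prop_sat)]
    return (c + strength * prior) / (t + strength)

# Invented held-out observations for one feature tuple.
observe("PERSON", 4, 3, True, True)
observe("PERSON", 4, 3, True, True)
observe("PERSON", 4, 3, True, False)
print(round(confidence("PERSON", 4, 3, True), 3))  # 0.5
```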

Dependence of Answer Correctness on Question Type

[Bar chart: P(correct|Type) by question type, on a 0-0.5 scale.]

Dependence on Proposition Satisfaction

[Bar chart: P(correct|PropSat) for PropSat=True vs. PropSat=False, on a 0-0.6 scale.]

Dependence on Number of Matched Words

[Line chart: p(correct) vs. number of matched words (0-6), with separate curves for question lengths 3, 4, and 5; y-axis from 0 to 0.5.]

Dependence of Answer Correctness on Web Frequency

[Line chart: P(correct|F, InTrec) vs. frequency of the answer in Google summaries (0-150), with separate curves for InTrec=true and InTrec=false; y-axis from 0 to 1.]

Official Results of TREC 2002 QA

Run Tag     Unranked Average Precision   Ranked Average Precision   Upper Bound
BBN2002A    0.186                        0.257                      0.498
BBN2002B    0.288                        0.468                      0.646
BBN2002C    0.284                        0.499                      0.641

• BBN2002A did not use the Web

• BBN2002B and BBN2002C used the Web

• Unranked average precision: the percentage of questions for which the first answer is correct

• Ranked average precision: the confidence-weighted score, the official metric for TREC 2002 (a sketch of the computation follows)

• Upper bound: the confidence-weighted score given perfect confidence estimation
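The confidence-weighted score ranks all of a run’s answers by decreasing confidence and averages precision-at-rank over every rank, so confident correct answers count the most. A small sketch of the computation:

```python
# Confidence-weighted score, the official TREC 2002 QA metric.

def confidence_weighted_score(results):
    """results: one (confidence, is_correct) pair per question."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    correct_so_far, total = 0, 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / rank
    return total / len(ranked)

# Three questions: (1/1 + 1/2 + 2/3) / 3 ~= 0.722
print(confidence_weighted_score([(0.9, True), (0.5, False), (0.2, True)]))
```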

Recent Progress

• In the last six months, we have:

– Retrained our name tagger (IdentiFinder™) for roughly 29 question types

– Distributed the re-trained English version of IdentiFinder to other sites

– Participated in the Question Answering track of TREC 2002

– Participated in a pilot evaluation of automatically answering definitional/biographical questions

– Developed a demonstration of our question answering system AQUA against streaming news