Evaluating Answer Validation in multi-stream Question
Answering
Álvaro Rodrigo, Anselmo Peñas, Felisa Verdejo
UNED NLP & IR group
nlp.uned.es
The Second International Workshop on Evaluating Information Access (EVIA-NTCIR 2008)
Tokyo, 16 December 2008
Content
1. Context and motivation
   • Question Answering at CLEF
   • Answer Validation Exercise at CLEF
2. Evaluating the validation of answers
3. Evaluating the selection of answers
   • Correct selection
   • Correct rejection
4. Analysis and discussion
5. Conclusion
Evolution of the CLEF-QA Track
2003 → 2009:

• Target languages: 3 (2003) → 7 → 8 → 9 → 10 → 11
• Collections: News 1994; + News 1995; + Wikipedia Nov. 2006; JRC-Acquis (EU official documents)
• Type of questions: 200 factoid; + temporal restrictions; + definitions; - type of question; + lists; + linked questions; + closed lists; factoid / definition / motive / purpose / procedure
• Supporting information: document → snippet → paragraph
• Pilots and exercises: temporal restriction; lists; AVE, Real Time, WiQA; AVE, QAST; AVE, QAST, WSDQA; GikiCLEF, QAST
Evolution of Results
2003 - 2006 (Spanish):
• Overall: best result < 60%
• Definitions: best result > 80% (NOT an IR approach)
Pipeline Upper Bounds
Use Answer Validation to break the pipeline.

Question → Question analysis → Passage Retrieval → Answer Extraction → Answer Ranking → Answer

Stage accuracies multiply along the pipeline: 1.0 × 0.8 × 0.8 = 0.64

Answer Validation breaks the pipeline when there is not enough evidence for the answer.
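As a minimal sketch of this upper-bound arithmetic (the stage accuracies are the slide's illustrative values; the function name is ours):

```python
# Minimal sketch: a pipelined QA system's accuracy is bounded by the
# product of its stages' accuracies (values from the slide's example).

def pipeline_upper_bound(stage_accuracies):
    """Return the product of the per-stage accuracies."""
    bound = 1.0
    for accuracy in stage_accuracies:
        bound *= accuracy
    return bound

# Question analysis (1.0) x Passage retrieval (0.8) x Answer extraction (0.8)
print(pipeline_upper_bound([1.0, 0.8, 0.8]))  # ~0.64
```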
Results in CLEF-QA 2006 (Spanish)
• Perfect combination of systems: 81%
• Best single system: 52.5%
• Different systems were the best with ORGANIZATION, PERSON, and TIME questions
Collaborative architectures
Different systems answer different types of questions better:
• Specialisation
• Collaboration

QA sys1, QA sys2, QA sys3, ..., QA sysn receive the question and produce candidate answers; an Answer Validation & Selection module returns the final answer.

Evaluation Framework
Collaborative architectures
How to select the good answer?
• Redundancy
• Voting (a toy sketch follows below)
• Confidence score
• Performance history

Why not deeper analysis?
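A toy sketch of the voting strategy listed above (the function name and data shapes are ours, not part of AVE; confidence scores or per-system performance histories would enter as weights):

```python
# Toy sketch: select the candidate answer returned by the most QA streams.
from collections import Counter

def select_by_voting(candidate_answers):
    """candidate_answers: one answer string per QA stream."""
    counts = Counter(answer.strip().lower() for answer in candidate_answers)
    best_answer, _ = counts.most_common(1)[0]
    return best_answer

print(select_by_voting(["Rome", "rome", "Paris"]))  # -> "rome"
```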
Answer Validation Exercise (AVE)
Objective
Validate the correctness of the answers given by real QA systems (the participants at CLEF QA)
Answer Validation Exercise (AVE)
A QA system provides: question, candidate answer, supporting text.

• AVE 2006: validation cast as Textual Entailment
• AVE 2007 - 2008: automatic hypothesis generation (question + candidate answer → hypothesis), then Answer Validation

Decision: "answer is correct" vs. "answer is not correct or not enough evidence"
Techniques in AVE 2007
From the AVE 2007 overview (number of participant systems using each technique):

• Generates hypotheses: 6
• WordNet: 3
• Chunking: 3
• n-grams, longest common subsequences: 5
• Phrase transformations: 2
• NER: 5
• Numeric expressions: 6
• Temporal expressions: 4
• Coreference resolution: 2
• Dependency analysis: 3
• Syntactic similarity: 4
• Functions (subj, obj, etc.): 3
• Syntactic transformations: 1
• Word-sense disambiguation: 2
• Semantic parsing: 4
• Semantic role labeling: 2
• First-order logic representation: 3
• Theorem prover: 3
• Semantic similarity: 2
Evaluation linked to main QA task
• The Question Answering Track provides the questions, the systems' answers, and the systems' supporting texts.
• AVE participants validate each answer (YES, NO).
• Human judgements from the QA Track (R, W, X, U) are mapped to (YES, NO) and used as the gold standard.
• Evaluation against the QA Track results produces the AVE Track results: human assessments are reused.
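A sketch of that mapping. The slides only state that the QA judgements (R, W, X, U) are mapped to (YES, NO); reading R as right, W as wrong, X as inexact, U as unsupported, and sending only R to YES is our assumption:

```python
# Hypothetical sketch of the (R, W, X, U) -> (YES, NO) mapping.
# Assumption: only answers judged R (right) should be validated;
# W (wrong), X (inexact) and U (unsupported) should be rejected.

JUDGEMENT_TO_GOLD = {
    "R": "YES",
    "W": "NO",
    "X": "NO",
    "U": "NO",
}

def gold_validation(judgement: str) -> str:
    """Map a human QA judgement to the gold YES/NO validation label."""
    return JUDGEMENT_TO_GOLD[judgement]
```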
Content
1. Context and motivation
2. Evaluating the validation of answers
3. Evaluating the selection of answers
4. Analysis and discussion
5. Conclusion
Participant systems at CLEF QA (QA sys1, QA sys2, QA sys3, ..., QA sysn) receive the question and produce candidate answers; the Answer Validation & Selection module returns the final answer.

The evaluation proposed here targets this Answer Validation & Selection step.
Collections
<q id="116" lang="EN"><q_str> What is Zanussi? </q_str><a id="116_1" value="">
<a_str> was an Italian producer of home appliances </a_str><t_str doc="Zanussi">Zanussi For the Polish film director, see Krzysztof Zanussi. For the hot-air balloon, see Zanussi (balloon). Zanussi was an Italian producer of home appliances that in 1984 was bought</t_str>
</a><a id="116_2" value="">
<a_str> who had also been in Cassibile since August 31 </a_str><t_str doc="en/p29/2998260.xml">Only after the signing had taken place was Giuseppe Castellano informed of the additional clauses that had been presented by general Ronald Campbell to another Italian general, Zanussi, who had also been in Cassibile since August 31.</t_str>
</a><a id="116_4" value="">
<a_str> 3 </a_str><t_str doc="1618911.xml">(1985) 3 Out of 5 Live (1985) What Is This?</t_str>
</a></q>
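A minimal sketch of reading this collection format with the Python standard library; element and attribute names follow the excerpt above, while the file name is hypothetical:

```python
# Sketch: iterate over questions and candidate answers in an AVE-style
# collection. Element/attribute names follow the excerpt above.
import xml.etree.ElementTree as ET

root = ET.parse("ave_collection.xml").getroot()
for q in root.iter("q"):
    question = q.find("q_str").text.strip()
    for a in q.findall("a"):
        answer = a.find("a_str").text.strip()
        support = a.find("t_str").text.strip()
        print(q.get("id"), a.get("id"), answer[:40], "|", support[:40])
```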
Evaluating the Validation
Validation: decide whether each candidate answer is correct or not
• YES | NO

Collections are not balanced.

Approach: detect whether there is enough evidence to accept an answer
Measures: precision, recall and F over correct answers
Baseline system: accept all answers
Evaluating the Validation
                   Correct Answer   Incorrect Answer
Answer Accepted    n_CA             n_WA
Answer Rejected    n_CR             n_WR

recall = n_CA / (n_CA + n_CR)
precision = n_CA / (n_CA + n_WA)
F = 2 · precision · recall / (precision + recall)
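A minimal sketch of these measures in code, assuming nonzero denominators (the function name is ours):

```python
# Validation measures over correct answers, using the contingency-table
# counts above: n_ca/n_wa = correct/wrong answers accepted,
# n_cr/n_wr = correct/wrong answers rejected.

def validation_scores(n_ca, n_wa, n_cr, n_wr):
    recall = n_ca / (n_ca + n_cr)       # correct answers that were accepted
    precision = n_ca / (n_ca + n_wa)    # accepted answers that were correct
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# The "accept all answers" baseline has recall = 1.0 and precision equal
# to the proportion of correct answers in the collection.
```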
Evaluating the Selection
Quantify the potential gain of Answer Validation in Question Answering
• Compare AV systems with QA systems
Develop measures more comparable to QA accuracy
qa_accuracy = n_correctly_answered_questions / n_questions
Evaluating the Selection
Given a question with several candidate answers, there are two options:

Selection: select an answer ≡ try to answer the question
• Correct selection: the selected answer was correct
• Incorrect selection: the selected answer was incorrect

Rejection: reject all candidate answers ≡ leave the question unanswered
• Correct rejection: all candidate answers were incorrect
• Incorrect rejection: not all candidate answers were incorrect
Evaluating the Selection
n questions: n = n_CA + n_WA + n_WS + n_WR + n_CR

                                               Question with     Question without
                                               Correct Answer    Correct Answer
Question Answered Correctly (one selected)     n_CA              -
Question Answered Incorrectly                  n_WA              n_WS
Question Unanswered (all answers rejected)     n_WR              n_CR

qa_accuracy = n_CA / n
recall = n_CA / (n_CA + n_WA + n_WR)
precision = n_CA / (n_CA + n_WA)
% of best selection = recall · 100

Not comparable to qa_accuracy.
Evaluating the Selection
n questions: n = n_CA + n_WA + n_WS + n_WR + n_CR (same table as above)

qa_accuracy = n_CA / n
rej_accuracy = n_CR / n
Evaluating the Selection
qa_accuracy = n_CA / n
rej_accuracy = n_CR / n
accuracy = (n_CA + n_CR) / n

Rewards rejection (collections are not balanced).
Interpretation for QA: all questions correctly rejected by AV will be answered correctly.
Evaluating the Selection
estimated = n_CA / n + (n_CR / n) · (n_CA / n) = (n_CA / n) · (1 + n_CR / n)

qa_accuracy = n_CA / n
rej_accuracy = n_CR / n

Interpretation for QA: questions correctly rejected by AV will be answered correctly in qa_accuracy proportion.
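A minimal sketch putting the selection measures together (counts as in the table above; the example numbers are invented for illustration):

```python
# Selection measures over n = n_ca + n_wa + n_ws + n_wr + n_cr questions:
# n_ca = answered correctly, n_wa/n_ws = answered incorrectly (with/without
# a correct candidate), n_wr/n_cr = unanswered (with/without a correct one).

def selection_scores(n_ca, n_wa, n_ws, n_wr, n_cr):
    n = n_ca + n_wa + n_ws + n_wr + n_cr
    qa_accuracy = n_ca / n
    rej_accuracy = n_cr / n
    accuracy = (n_ca + n_cr) / n              # rewards rejection
    estimated = qa_accuracy * (1 + n_cr / n)  # correctly rejected questions
                                              # re-answered in qa_accuracy proportion
    return qa_accuracy, rej_accuracy, accuracy, estimated

# Invented example: 100 questions.
print(selection_scores(40, 20, 10, 10, 20))
# -> approximately (0.4, 0.2, 0.6, 0.48)
```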
Content
1. Context and motivation
2. Evaluating the validation of answers
3. Evaluating the selection of answers
4. Analysis and discussion
5. Conclusion
Analysis and discussion (AVE 2007 English)
Validation
Selection
qa_accuracy is correlated with recall (R); the "estimated" measure adjusts for it.
Multi-stream QA performance (AVE 2007 English)
Analysis and discussion (AVE 2007 Spanish)
Validation
Selection
Comparing AV & QA
Conclusion
Evaluation framework for Answer Validation & Selection systems
Measures that reward not only Correct Selection but also Correct Rejection
• Promote improvement of QA systems
Allow comparison between AV and QA systems
• Under what conditions multi-stream QA performs better
• Room for improvement just using multi-stream QA
• Potential gain that AV systems can provide to QA
Thanks!
http://nlp.uned.es/clef-qa/ave
http://www.clef-campaign.org
Acknowledgement: EU project T-CLEF (ICT-1-4-1 215231)