ResPubliQA 2010: QA on European Legislation

Anselmo Peñas, UNED, Spain
Pamela Forner, CELCT, Italy
Richard Sutcliffe, U. Limerick, Ireland
Alvaro Rodrigo, UNED, Spain

http://celct.isti.cnr.it/ResPubliQA/
ResPubliQA 2010, 22 September, Padua, Italy
Outline

- The Multiple Language Question Answering Track at CLEF: a bit of history
- ResPubliQA this year: what is new
- Participation, runs and languages
- Assessment and metrics
- Results
- Conclusions
Multiple Language Question Answering at CLEF

Started in 2003: this is the eighth year.

- Era I (2003-2006): ungrouped, mainly factoid questions asked against monolingual newspapers; exact answers returned
- Era II (2007-2008): grouped questions asked against newspapers and Wikipedia; exact answers returned
- Era III (2009-2010): ResPubliQA: ungrouped questions against multilingual parallel-aligned EU legislative documents; passages returned
ResPubliQA 2010 – Second Year

Key points:
- same set of questions in all languages
- same document collections: parallel-aligned documents

Same objectives:
- to move towards a domain of potential users
- to allow direct comparison of performance across languages
- to allow QA technologies to be evaluated against IR approaches
- to promote the use of validation technologies

But also some novelties…
What's new

1. New task (Answer Selection)
2. New document collection (EuroParl)
3. New question types
4. Automatic evaluation
The Tasks

- Paragraph Selection (PS): extract a relevant paragraph of text that completely satisfies the information need expressed by a natural language question
- Answer Selection (AS) [NEW]: demarcate the shorter string of text corresponding to the exact answer, supported by the entire paragraph
The Collections

Subset of JRC-Acquis (10,700 docs per language):
- EU treaties, EU legislation, agreements and resolutions
- between 1950 and 2006
- parallel-aligned at the document level (not always at the paragraph level)
- XML-TEI.2 encoding

Small subset of EuroParl (~150 docs per language) [NEW]:
- proceedings of the European Parliament
- translations into Romanian from January 2009
- Debates (CRE) from 2009 and Texts Adopted (TA) from 2007
- parallel-aligned at the document level (not always at the paragraph level)
- XML encoding
EuroParl Collection

- compatible with the Acquis domain
- allows the scope of the questions to be widened

Unfortunately:
- small number of texts
- documents are not fully translated

The specific fragments of JRC-Acquis and EuroParl used by ResPubliQA are available at http://celct.isti.cnr.it/ResPubliQA/Downloads
Questions

Two new question categories:
- OPINION: What did the Council think about the terrorist attacks on London?
- OTHER: What is the e-Content program about?

Reason and Purpose categories merged into one:
- Why was Perwiz Kambakhsh sentenced to death?

And also Factoid, Definition and Procedure.
ResPubliQA Campaigns

Task | Registered groups | Participant groups | Submitted runs | Organizing people
ResPubliQA 2009 | 20 | 11 | 28 (+ 16 baseline runs) | 9
ResPubliQA 2010 | 24 | 13 | 49 (42 PS and 7 AS) | 6 (+ 6 additional translators/assessors)

More participants and more submissions.
ResPubliQA 2010 Participants

System name | Team | Reference
bpac | SZTAKI, HUNGARY | Nemeskey
dict | Dhirubhai Ambani Institute of Information and Communication Technology, INDIA | Sabnani et al.
elix | University of the Basque Country, SPAIN | Agirre et al.
icia | RACAI, ROMANIA | Ion et al.
iles | LIMSI-CNRS, FRANCE | Tannier et al.
ju_c | Jadavpur University, INDIA | Pakray et al.
loga | University of Koblenz, GERMANY | Glöckner and Pelzer
nlel | U. Politecnica Valencia, SPAIN | Correa et al.
prib | Priberam, PORTUGAL | -
uaic | Al. I. Cuza University of Iasi, ROMANIA | Iftene et al.
uc3m | Universidad Carlos III de Madrid, SPAIN | Vicente-Díez et al.
uiir | University of Indonesia, INDONESIA | Toba et al.
uned | UNED, SPAIN | Rodrigo et al.

13 participants, 8 countries, 4 new participants
Submissions by Task and Language

Cells show total runs, with (PS, AS) counts in parentheses.

Source \ Target | DE | EN | ES | FR | IT | PT | RO | Total
DE | 4 (4,0) | | | | | | | 4 (4,0)
EN | | 19 (16,3) | | | | | 2 (2,0) | 21 (18,3)
ES | | | 7 (6,1) | | | | | 7 (6,1)
EU | | 2 (2,0) | | | | | | 2 (2,0)
FR | | | | 7 (5,2) | | | | 7 (5,2)
IT | | | | | 3 (2,1) | | | 3 (2,1)
PT | | | | | | 1 (1,0) | | 1 (1,0)
RO | | | | | | | 4 (4,0) | 4 (4,0)
Total | 4 (4,0) | 21 (18,3) | 7 (6,1) | 7 (5,2) | 3 (2,1) | 1 (1,0) | 6 (6,0) | 49 (42,7)
System Output

Two options:
- give an answer (paragraph or exact answer)
- return NOA as the response (no answer is given): the system is not confident about the correctness of its answer

Objective:
- avoid returning incorrect answers
- reduce only the portion of wrong answers
Evaluation Measure

c@1 = (1/n) * (nR + nU * (nR / n))

where:
- nR: number of questions correctly answered
- nU: number of questions unanswered
- n: total number of questions (200 this year)

If nU = 0, then c@1 = nR/n (accuracy).
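The measure can be sketched in a few lines of code (an illustrative implementation; the function name and the example figures are mine, not from the track):

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1: credits unanswered questions in proportion to the
    accuracy shown on the questions the system did answer."""
    return (n_correct + n_unanswered * n_correct / n_total) / n_total

# With no unanswered questions, c@1 reduces to plain accuracy:
#   c_at_1(100, 0, 200) == 0.5
# Declining to answer 50 questions (instead of answering them
# wrongly) raises the score:
#   c_at_1(100, 50, 200) == 0.625
```

A system that withholds answers it would have gotten wrong therefore scores higher than one that guesses, which is the behaviour the metric is meant to encourage.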
Assessment

Two steps:

1) Automatic evaluation
- responses are automatically compared against a manually produced Gold Standard
- answers that exactly match the Gold Standard are given the value correct (R)
- correctness of a response requires an exact match of the document identifier, the paragraph identifier, and the text retrieved by the system with respect to those in the Gold Standard

2) Manual assessment
- non-matching paragraphs/answers are judged by human assessors
- anonymous and simultaneous for the same question

31% of the answers were automatically marked as correct.
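The automatic step reduces to a strict comparison of three fields; a minimal sketch of the idea (the field names and the dictionary representation are my assumptions, not the track's actual run format):

```python
def auto_judge(response: dict, gold: dict) -> str:
    """Return "R" only when document id, paragraph id and the
    retrieved text all exactly match the Gold Standard entry;
    everything else is deferred to manual assessment."""
    keys = ("doc_id", "par_id", "text")
    if all(response[k] == gold[k] for k in keys):
        return "R"
    return "manual"
```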
Assessment for Paragraph Selection (PS)

Binary assessment:
- Right (R)
- Wrong (W)

NOA answers:
- automatically filtered and marked as U (Unanswered)
- discarded candidate answers were also evaluated:
  - NoA R: NoA, but the candidate answer was correct
  - NoA W: NoA, and the candidate answer was incorrect
  - NoA Empty: NoA, and no candidate answer was given

Evaluators were guided by the initial "gold" paragraph, but only as a hint.
Assessment for Answer Selection (AS)

- R (Right): the answer string is an exact and correct answer, supported by the returned paragraph.
- X (ineXact): the answer string contains either part of a correct answer present in the returned paragraph, or all of a correct answer plus unnecessary additional text.
- M (Missed): the answer string does not contain a correct answer even in part, but the returned paragraph does contain one.
- W (Wrong): the answer string does not contain a correct answer and the returned paragraph does not contain one either, or it contains an unsupported answer.
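As a rough approximation, the four judgements can be expressed with string containment (real assessment is done by human assessors; this simplified decision procedure is mine, not the track's):

```python
def judge_as(answer: str, paragraph: str, gold: str) -> str:
    """Approximate the R/X/M/W judgements for Answer Selection.
    `gold` stands in for a correct answer string."""
    supported = gold in paragraph
    if supported and answer == gold:
        return "R"  # exact, supported answer
    if supported and (gold in answer or answer in gold):
        return "X"  # the whole answer plus extra text, or only part of it
    if supported:
        return "M"  # the paragraph contains the answer, the string misses it
    return "W"      # neither the string nor the paragraph is correct
```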
Monolingual Results for PS

c@1 per target language (DE, EN, ES, FR, IT, PT, RO):

Combination: 0.75, 0.94, 0.82, 0.74, 0.73, 0.56, 0.70

Individual runs (a run listed with several scores was evaluated in several target languages):
uiir101 0.73
dict102 0.68
bpac102 0.68
loga102 0.62
loga101 0.59
prib101 0.56
nlel101 0.49, 0.65, 0.56, 0.55, 0.63
bpac101 0.65
elix101 0.65
IR baseline (uned) 0.65, 0.54
uned102 0.54
uc3m102 0.52
uc3m101 0.51
dict101 0.64
uiir102 0.64
uned101 0.63
elix102 0.62
nlel102 0.59, 0.62, 0.20, 0.55, 0.53
ju_c101 0.50
iles102 0.48, 0.36
uaic102 0.46, 0.24, 0.55
uaic101 0.43, 0.30, 0.52
icia102 0.49
Improvement in the Performance

Monolingual PS Task:

Edition | BEST | AVERAGE
ResPubliQA 2009 | 0.68 | 0.39
ResPubliQA 2010 | 0.73 | 0.54

2010 Collections | BEST | AVERAGE
JRC-Acquis | 0.71 | 0.53
EuroParl | 0.77 | 0.55
Cross-language Results for PS

Run | Source→Target | c@1
elix102 | EU→EN | 0.36
elix101 | EU→EN | 0.33
icia101 | EN→RO | 0.29
icia102 | EN→RO | 0.29

In comparison to ResPubliQA 2009:
- more cross-language runs (+2)
- improvement in the best performance: from c@1 = 0.18 to 0.36
Results for the AS Task

System | c@1 | #R | #W | #M | #X | #NoA | #NoA R | #NoA W | #NoA M | #NoA X | #NoA empty
combination | 0.30 | 60 | 140 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
ju_c101ASenen | 0.26 | 31 | 12 | 10 | 8 | 139 | 0 | 40 | 24 | 0 | 75
iles101ASenen | 0.09 | 17 | 124 | 6 | 44 | 9 | 0 | 0 | 0 | 0 | 9
iles101ASfrfr | 0.08 | 14 | 128 | 7 | 36 | 15 | 0 | 0 | 0 | 0 | 15
nlel101ASenen | 0.07 | 10 | 97 | 20 | 6 | 67 | 0 | 0 | 0 | 0 | 67
nlel101ASeses | 0.06 | 12 | 138 | 21 | 1 | 28 | 0 | 0 | 0 | 0 | 28
nlel101ASitit | 0.03 | 6 | 139 | 18 | 7 | 30 | 0 | 0 | 0 | 0 | 30
nlel101ASfrfr | 0.02 | 4 | 132 | 13 | 11 | 40 | 0 | 0 | 0 | 0 | 40
Conclusions

- Successful continuation of ResPubliQA 2009
- AS task: few groups and poor results
- Overall improvement of results
- New document collection and new question types
- The c@1 evaluation metric encourages the use of a validation module
More on System Analyses and Approaches

MLQA'10 Workshop on Wednesday, 14:30 – 18:00
ResPubliQA 2010: QA on European Legislation

Thank you!