CLEF 2008 Multilingual Question Answering Track
UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo
CELCT: Danilo Giampiccolo, Pamela Forner
QA 2008 Tasks and Exercises
QA Main Task (6th edition)
Pilot: QA WSD, English newswire collections with Word Sense Disambiguation
Answer Validation Exercise, AVE (3rd edition)
QA on Speech Transcripts, QAST (2nd edition)
Main Task QA 2008: Organizing Committee
CELCT (D. Giampiccolo, P. Forner): Italian
UNED (A. Peñas): Spanish
U. Groningen (G. Bouma): Dutch
U. Limerick (R. Sutcliffe): English
DFKI (B. Sacaleanu): German
ELDA/ELRA (N. Moreau): French
Linguateca (P. Rocha): Portuguese
♦ Bulgarian Academy of Sciences (P. Osenova): Bulgarian
♦ IASI (C. Forascu): Romanian
♦ U. Basque Country (I. Alegria): Basque
♦ ILSP (P. Prokopidis): Greek
Evolution of the Track (2003 to 2008)
Target languages: 3 (2003), 7 (2004), 8 (2005), 9 (2006), 10 (2007), 11 (2008)
Collections: News 1994; + News 1995; + Wikipedia Nov. 2006
Type of questions: 200 factoid; + temporal restrictions; + definitions; - type of question; + lists; + linked questions; + closed lists
Supporting information: document; snippet
Pilots and exercises: temporal restrictions; lists; AVE, Real Time, WiQA; AVE, QAST; AVE, QAST, WSD QA
200 questions
FACTOID (loc, mea, org, per, tim, cnt, obj, oth: location, measure, organization, person, time, count, object, other)
DEFINITION (per, org, obj, oth)
CLOSED LIST: "Who were the components of The Beatles?" "Who were the last three presidents of Italy?"
LINKED QUESTIONS: "Who was called the 'Iron-Chancellor'?" "When was he born?" "Who was his first wife?"
♦ Temporal restrictions: by date, by period, by event
♦ NIL questions (no known answer in the collection)
43 Activated Language Combinations (at least one registered participant)
Activated Tasks
Year       Monolingual  Cross-lingual  Total
CLEF 2003  3            5              8
CLEF 2004  6            13             19
CLEF 2005  8            15             23
CLEF 2006  7            17             24
CLEF 2007  8            29             37
CLEF 2008  10           33             43
Submitted runs
Year       Submitted runs  Monolingual  Cross-lingual
CLEF 2003  17              6            11
CLEF 2004  48 (+182%)      20           28
CLEF 2005  67 (+40%)       43           24
CLEF 2006  77 (+15%)       42           35
CLEF 2007  37 (-52%)       20           17
CLEF 2008  51 (+38%)       31           20
Participant groups
Year       Newcomers  Veterans  Total        Registered
CLEF 2003  -          -         8            -
CLEF 2004  13         5         18 (+125%)   22
CLEF 2005  9          15        24 (+33%)    27
CLEF 2006  10         20        30 (+25%)    36
CLEF 2007  8          14        22 (-26%)    29
CLEF 2008  8          13        21           33
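The percentages in the Submitted runs and Participant groups tables are changes relative to the previous campaign. The Python sketch below recomputes them from the yearly totals; rounding to the nearest whole percent is an assumption (the slides do not state a rounding rule), and `yoy_change` is a hypothetical helper name used only for illustration.

```python
# Yearly totals copied from the Submitted runs and Participant groups tables.
runs = {2003: 17, 2004: 48, 2005: 67, 2006: 77, 2007: 37, 2008: 51}
groups = {2003: 8, 2004: 18, 2005: 24, 2006: 30, 2007: 22, 2008: 21}


def yoy_change(series):
    """Percent change of each year against the previous one.

    Assumption: rounded to the nearest whole percent with Python's round().
    """
    years = sorted(series)
    return {
        cur: round(100 * (series[cur] - series[prev]) / series[prev])
        for prev, cur in zip(years, years[1:])
    }


print(yoy_change(runs))    # {2004: 182, 2005: 40, 2006: 15, 2007: -52, 2008: 38}
print(yoy_change(groups))  # {2004: 125, 2005: 33, 2006: 25, 2007: -27, 2008: -5}
```

With that rule every percentage on the slides is reproduced except the 2007 change in groups (30 to 22, about -26.7%), which the slides give as -26%; the 2008 change in groups is not shown on the slides.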
List of Participants (random order) [figure not reproduced]
Groups per year and target collection, 2003 to 2008 [chart; y axis: number of groups; target languages: Greek, Finnish, French, Spanish, English, Italian, Dutch, Bulgarian, Basque, Romanian, German, Portuguese; annotations: "Task change", "Natural selection?", "Above 20 groups"]
Groups per target collection, 2003 to 2008 [chart; y axis: number of groups; target languages: English, Spanish, French, Portuguese, German, Romanian, Italian, Bulgarian, Dutch, Basque, Finnish, Greek]
2008 participation: comparative evaluation?
Weakness from an evaluation perspective: in 4 languages there was no comparison between different groups
Breakout session
Language    Runs  Different groups
Portuguese  9     6
Spanish     10    4
English     5     4
German      11    3
Romanian    4     2
Dutch       4     1
Basque      4     1
French      3     1
Bulgarian   1     1
Italian     0     0
Greek       0     0
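The "4 languages" figure counts target languages with runs from exactly one group. A short Python check over the table above (the variable names are illustrative only) also confirms the totals quoted in the conclusion: 51 runs across 9 target languages with participation.

```python
# (runs, distinct groups) per target language, copied from the 2008 table.
participation = {
    "Portuguese": (9, 6), "Spanish": (10, 4), "English": (5, 4),
    "German": (11, 3), "Romanian": (4, 2), "Dutch": (4, 1),
    "Basque": (4, 1), "French": (3, 1), "Bulgarian": (1, 1),
    "Italian": (0, 0), "Greek": (0, 0),
}

# Languages evaluated by a single group: no cross-group comparison possible.
single_group = [lang for lang, (_, n_groups) in participation.items() if n_groups == 1]
# Languages that received at least one run.
with_runs = [lang for lang, (n_runs, _) in participation.items() if n_runs > 0]
total_runs = sum(n_runs for n_runs, _ in participation.values())

print(single_group)    # ['Dutch', 'Basque', 'French', 'Bulgarian']
print(len(with_runs))  # 9
print(total_runs)      # 51
```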
Results: best and average scores, 2003 to 2008 [chart; y axis 0 to 80; series: Best Bilingual, Average Bilingual, Best Monolingual, Average Monolingual]
Best scores by language, 2004 to 2008 [chart; y axis 0 to 80; languages: German, English, Spanish, French, Italian, Dutch, Portuguese, Romanian; series: Best2004, Best2005, Best2006, Best2007, Best2008]
Best scores by participant, 2004 to 2008 [chart; y axis 0 to 80; participants: DFKI, HAGEN, INAOE, GRONINGEN, PRIBERAM, SYNAPSE]
Results depend on the type of question
Definitions: almost solved for several systems (80-95%)
Factoids: 50-65% for several systems
Temporal restrictions: same level of difficulty as factoids for some systems
Closed lists: still very difficult
Linked questions: still very difficult
Wikipedia now provides more answers
Conclusion
Same task as in 2007
Same level of participation (slightly better): 11 target languages (9 with participation), 43 activated subtasks, 21 participants, 51 runs
Same results (slightly better)
Future direction
Fewer participants per language, hence poor comparison: change methodology to one task for all
Criticisms of QA over Wikipedia: easier to find questions with IR; no user model. Change collection
QA proposal for 2009: SC and breakout session