
DLT

Cross-Language French-English Question Answering using the DLT System at CLEF 2003

Aoife O’Gorman

Igal Gabbay

Richard F.E. Sutcliffe

Documents and Linguistic Technology Group

University of Limerick


Outline

• Objectives

• System architecture

• Key components

• Task performance evaluation

• Findings


Objectives

• Learn about the issues involved in multilingual QA

• Combine the components of our existing English and French monolingual QA systems


System architecture

• Query classification

• Query translation (Google) & re-formulation

• Text retrieval (dtSearch)

• Named entity recognition

• Answer entity selection
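The stages above run in sequence, each feeding the next. A minimal Python sketch of that wiring follows, assuming each component can be wrapped as a plain function; the QAPipeline class and its stage signatures are hypothetical, not the DLT code, and in the real system translation went through Google and retrieval through dtSearch. When no candidate entity survives, the sketch answers NIL, the CLEF convention for a question with no answer in the collection.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QAPipeline:
    """Hypothetical wiring of the five stages; each stage is a plain function."""
    classify: Callable[[str], str]                # French query -> query category
    translate: Callable[[str], str]               # French query -> English query
    reformulate: Callable[[str], List[str]]       # English query -> search terms
    retrieve: Callable[[List[str]], List[str]]    # search terms -> matching documents
    recognise: Callable[[str, str], List[str]]    # (document, category) -> candidate entities
    select: Callable[[List[str]], Optional[str]]  # candidates -> one answer, or None

    def answer(self, query_fr: str) -> str:
        category = self.classify(query_fr)  # classification works on the French query
        terms = self.reformulate(self.translate(query_fr))
        candidates: List[str] = []
        for doc in self.retrieve(terms):
            candidates.extend(self.recognise(doc, category))
        best = self.select(candidates)
        return "NIL" if best is None else best  # no candidate -> NIL answer
```

Keeping the stages as interchangeable functions means one component, for example the translation service, can be swapped without touching the others.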


Query classification

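• Categories based on TREC 2002 queries translated into French

• Keyword-based classification, e.g. what_country:

De quel pays le jeu de croquet est-il originaire? (From which country does the game of croquet originate?)

De quelle nation...? (From which nation...?)

• Unknown: queries that match no category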
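Keyword-based classification of this kind can be sketched as an ordered table of patterns matched against the lower-cased French query, with Unknown as the fallback. The pattern table, the classify_query name, and the what_year and who categories are illustrative assumptions; the slides name only what_country and Unknown.

```python
import re
from typing import Dict, List

# Hypothetical pattern table: only what_country appears in the slides; the
# other categories are illustrative. Order matters: the first match wins.
CATEGORY_PATTERNS: Dict[str, List[str]] = {
    "what_country": [r"\bquel pays\b", r"\bquelle nation\b"],
    "what_year": [r"\ben quelle année\b"],
    "who": [r"^qui\b"],
}


def classify_query(query_fr: str) -> str:
    """Return the first category whose keyword pattern matches, else 'unknown'."""
    q = query_fr.lower().strip()
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(re.search(p, q) for p in patterns):
            return category
    return "unknown"  # fallback category, like Unknown on the slide
```

A table like this is easy to extend, but any phrasing not anticipated by a pattern falls straight through to unknown.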


Query translation and re-formulation

• Submitting the French query, in its original form, to the Google Language Tools page

• Tokenisation

• Selective removal of stopwords

• Example:

French query: Qui a été élu gouverneur de la California?

Google translation: Who was elected governor of California?

Search terms: ['elected', 'governor', 'California']
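The tokenisation and stopword-removal step applied to the Google output might look like the following sketch. The STOPWORDS list and the reformulate name are assumptions, with the list chosen so that the slide's example comes out as shown; capitalisation is preserved because the retrieval stage relies on it to spot names.

```python
import re
from typing import List

# Hypothetical stopword list: question words and function words only, so that
# content words such as "elected" survive.
STOPWORDS = {"who", "what", "which", "when", "where", "how", "was", "were", "is",
             "are", "did", "does", "the", "a", "an", "of", "to", "in", "on"}


def reformulate(query_en: str) -> List[str]:
    """Tokenise an English query and drop stopwords, keeping original capitalisation."""
    tokens = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", query_en)
    return [t for t in tokens if t.lower() not in STOPWORDS]
```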


Text Retrieval: Submitting queries to dtSearch

• dtSearch indexed the document collection, treating each <DOC> element as one document

• Inserting a w/1 (within one word) connector between adjacent capitalised words

• Submitting untranslated quotations for exact match

• Inserting an AND connector between all other terms (Boolean)

• Limited verb expansion based on common verbs used in TREC questions
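A sketch of how the search terms could be assembled into a dtSearch request under these rules. It assumes dtSearch's w/N proximity connector, Boolean AND/OR, and quotation marks for exact phrases; the build_dtsearch_query name and the VERB_EXPANSIONS table are hypothetical, since the slides do not list the verbs that were expanded.

```python
from typing import Dict, List, Sequence

# Hypothetical expansion table; the slides say only that a few common
# TREC question verbs were expanded.
VERB_EXPANSIONS: Dict[str, List[str]] = {
    "born": ["born", "birth"],
    "invented": ["invented", "invention"],
}


def build_dtsearch_query(terms: Sequence[str], quotations: Sequence[str] = ()) -> str:
    """Join search terms into a Boolean request in dtSearch-style syntax."""
    parts: List[str] = [f'"{q}"' for q in quotations]  # untranslated quotations, exact phrase
    i = 0
    while i < len(terms):
        if terms[i][:1].isupper():
            # Bind a run of adjacent capitalised words with the w/1 proximity connector.
            j = i
            while j < len(terms) and terms[j][:1].isupper():
                j += 1
            run = " w/1 ".join(terms[i:j])
            parts.append(f"({run})" if j - i > 1 else run)
            i = j
        else:
            alts = VERB_EXPANSIONS.get(terms[i].lower())
            parts.append("(" + " OR ".join(alts) + ")" if alts else terms[i])
            i += 1
    return " AND ".join(parts)  # AND between all remaining units
```

Wrapping each w/1 group and each expansion in parentheses keeps the request unambiguous regardless of how the engine ranks w/N against AND.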


Named Entity Recognition: General Names

• Captures any instances of general names in cases where we are not sure what to look for.

• A general_name is defined in our system to be up to five capitalised terms interspersed with optional prepositions.

• Examples: Limerick City

University of Limerick
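The general_name definition maps naturally onto a regular expression: one capitalised term followed by up to four more, each optionally preceded by a preposition. The preposition list and the find_general_names name are assumptions. As written, the pattern also matches any capitalised sentence-initial word.

```python
import re
from typing import List

CAP = r"[A-Z][A-Za-z'-]*"           # one capitalised term
PREP = r"(?:of|for|in|on|at)"       # hypothetical list of optional prepositions
# One capitalised term plus up to four more (five in total), each optionally
# preceded by a preposition.
GENERAL_NAME = re.compile(rf"\b{CAP}(?:\s+(?:{PREP}\s+)?{CAP}){{0,4}}")


def find_general_names(text: str) -> List[str]:
    """Return every general_name match, left to right."""
    return GENERAL_NAME.findall(text)
```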


Answer entity selection

• highest_scoring

Question: What year was Robert Frost born?

Tagged text: in entity(date,[1,8,7,5],[[],[],[],[],[1,8,7,5]],[],[],[]), poet target([Robert]) target([Frost]) was target([born]) in San Francisco

• most_frequent

Question: When did “The Simpsons” first appear on television?

Tagged text: When target([The]) target([Simpsons]) was target([first]) broadcast in entity(date,[1,9,8,9],[[],[],[],[],[],[1,9,8,9],[],[]])
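The slides do not define the two strategies. A plausible reading, sketched below under that assumption, is that highest_scoring takes the candidate of the expected entity type from the best-scoring sentence, while most_frequent takes the value that occurs most often among the candidates. The Candidate type and its score field are hypothetical, and how sentences are scored (for example by counting matched target terms) is left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    value: str        # entity text, e.g. "1875"
    entity_type: str  # NE type, e.g. "date"
    score: float      # hypothetical: score of the sentence the entity came from


def highest_scoring(cands: List[Candidate], wanted: str) -> Optional[str]:
    """Pick the entity of the wanted type with the highest sentence score."""
    typed = [c for c in cands if c.entity_type == wanted]
    return max(typed, key=lambda c: c.score).value if typed else None


def most_frequent(cands: List[Candidate], wanted: str) -> Optional[str]:
    """Pick the entity value of the wanted type that occurs most often."""
    counts = Counter(c.value for c in cands if c.entity_type == wanted)
    return counts.most_common(1)[0][0] if counts else None
```

The two strategies can disagree: one strongly matching sentence wins under highest_scoring, while a value repeated across several weaker sentences wins under most_frequent.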


Task performance evaluation

Group    Run name      MRR               Q. with ≥1 right answer   NIL questions
                       strict  lenient   strict  lenient           returned  correct
CS-CMU   lumoex031bf   .153    .170      38      42                92        8
CS-CMU   lumoex032bf   .131    .149      31      35                91        7
DLTG     dltgex031bf   .115    .120      23      24                119       10
DLTG     dltgex032bf   .110    .115      22      23                119       10
RALI     udemex032bf   .140    .160      38      42                3         1

Adapted from Magnini (2003)
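MRR averages, over all questions, the reciprocal of the rank at which the first correct answer appears, counting 0 when none does: MRR = (1/|Q|) Σ 1/rank_q. The sketch below assumes the usual CLEF judgement labels R (right), U (unsupported) and W (wrong), with strict scoring accepting only R and lenient scoring also accepting U; the function names are illustrative.

```python
from typing import Optional, Sequence


def first_correct_rank(judgements: Sequence[str], accepted: set) -> Optional[int]:
    """1-based rank of the first answer whose judgement is accepted, else None."""
    for rank, label in enumerate(judgements, start=1):
        if label in accepted:
            return rank
    return None


def mrr(run: Sequence[Sequence[str]], lenient: bool = False) -> float:
    """Mean reciprocal rank over all questions; questions with no accepted answer add 0."""
    accepted = {"R", "U"} if lenient else {"R"}
    ranks = [first_correct_rank(j, accepted) for j in run]
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks) if ranks else 0.0


def questions_answered(run: Sequence[Sequence[str]], lenient: bool = False) -> int:
    """Number of questions with at least one accepted answer (the 'Q. with ≥1 right answer' columns)."""
    accepted = {"R", "U"} if lenient else {"R"}
    return sum(first_correct_rank(j, accepted) is not None for j in run)
```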


Findings

• Query classification: queries were phrased in unexpected ways, and there were too few categories

• Translation: problems with names and titles

- We need better query-specific translation

- Localisation of names/titles

- Possibly limit translation to search terms



Findings

• Text retrieval: allow relaxation and more sophisticated expansion of search queries

• Named entity recognition: find better alternatives to answer questions of type Unknown

• Answer entity selection: take into account distance and density of query terms

• Usability issue: answers may need to be translated back to French
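One way to take distance and density into account, offered only as a sketch since the slides do not specify a method, is to credit each query term near a candidate entity with 1 / (1 + distance): closer terms count more, and more terms nearby raise the total. The proximity_score function and its default window of 10 tokens are hypothetical.

```python
from typing import Sequence


def proximity_score(tokens: Sequence[str], cand_pos: int,
                    query_terms: Sequence[str], window: int = 10) -> float:
    """Sum 1 / (1 + distance) over query-term tokens within `window` of the candidate."""
    wanted = {t.lower() for t in query_terms}
    start = max(0, cand_pos - window)
    end = min(len(tokens), cand_pos + window + 1)
    score = 0.0
    for i in range(start, end):
        if i != cand_pos and tokens[i].lower() in wanted:
            score += 1.0 / (1 + abs(i - cand_pos))  # nearer terms weigh more
    return score
```

With this weighting, a date sitting right next to "Robert Frost was born" outscores one mentioned several words away.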