Overview of QAST 2008

- Question Answering on Speech Transcriptions -

CLEF 2008 Workshop, Aarhus, September 17, 2008

J. Turmo, P. Comas (1), L. Lamel, S. Rosset (2), N. Moreau, D. Mostefa (3)

(1) UPC, Spain (2) LIMSI, France (3) ELDA, France

QAST Website: http://www.lsi.upc.edu/~qast/




Outline

1. Objectives
2. Description of the tasks
3. Participants
4. Results
5. Future work


Objectives of QAST 2008

- Development of robust QA for speech transcripts

- Measure loss due to ASR inaccuracies: manual transcriptions, automatic transcriptions

- Measure loss at different ASR word error rates

- Test with different kinds of speech: spontaneous speech, prepared speech

- Development of QA for languages other than English: English, French, Spanish


QAST 2008 Organization

Task jointly organized by:

- UPC, Spain (Coordinator): J. Turmo, P. Comas

- ELDA, France: N. Moreau, D. Mostefa

- LIMSI-CNRS, France: S. Rosset, L. Lamel


Evaluation Data

Corpus            Lang.    Description                         Tasks                         WER
CHIL (QAST 2007)  English  Lectures (~25h)                     T1(a): Manual transcriptions  -
                                                               T1(b): ASR transcriptions     20%
AMI (QAST 2007)   English  Meetings (~100h)                    T2(a): Manual transcriptions  -
                                                               T2(b): ASR transcriptions     38%
ESTER             French   Broadcast News (~10h)               T3(a): Manual transcriptions  -
                                                               T3(b): ASR transcriptions     11.9% / 23.9% / 35.4%
EPPS-EN           English  European Parliament sessions (~3h)  T4(a): Manual transcriptions  -
                                                               T4(b): ASR transcriptions     10.6% / 14.0% / 24.1%
EPPS-ES           Spanish  European Parliament sessions (~3h)  T5(a): Manual transcriptions  -
                                                               T5(b): ASR transcriptions     11.5% / 12.7% / 13.7%


Questions

                     Development set            Evaluation set
Task                 Data         # questions   Data          # questions
T1 (CHIL, English)   10 seminars  50            15 seminars   100
T2 (AMI, English)    50 meetings  50            118 meetings  100
T3 (ESTER, French)   6 shows      50            12 shows      100
T4 (EPPS, English)   3 sessions   50            3 sessions    100
T5 (EPPS, Spanish)   1 session    50            5 sessions    100

• Factual questions: ~75%
  Expected answers = named entities (10 types: person, location, organization, language, system, measure, time, color, shape, material)

• Definition questions: ~25%
  4 types of answers: person, organization, object, other

• ‘NIL’ questions: ~10%


Submissions

• Participants could submit up to:
  – 2 submissions per task (and per WER)
  – 5 answers per question

• Answers for ‘manual transcriptions’ tasks: Answer_string + Doc_ID

• Answers for ‘automatic transcriptions’ tasks: Answer_string + Doc_ID + Time_start + Time_end
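The submission constraints on this slide can be sketched as a small record type with a validity check. This is an illustrative sketch only, not the official QAST submission format: the class name `Answer`, the function `validate_answers`, the use of floating-point seconds for the time fields, and the wording of the error messages are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_ANSWERS_PER_QUESTION = 5  # limit stated on the slide


@dataclass(frozen=True)
class Answer:
    """One ranked answer (hypothetical layout, not the official file format)."""
    answer_string: str
    doc_id: str
    # Time span is only required for 'automatic transcriptions' tasks.
    # Representing it as seconds is an assumption.
    time_start: Optional[float] = None
    time_end: Optional[float] = None


def validate_answers(answers: List[Answer], asr_task: bool) -> List[str]:
    """Return a list of problems found in the ranked answers to one question."""
    problems = []
    if len(answers) > MAX_ANSWERS_PER_QUESTION:
        problems.append("too many answers")
    for rank, answer in enumerate(answers, start=1):
        has_times = answer.time_start is not None and answer.time_end is not None
        if asr_task and not has_times:
            problems.append(f"answer {rank}: missing time span")
        if has_times and answer.time_start > answer.time_end:
            problems.append(f"answer {rank}: time_start after time_end")
    return problems


if __name__ == "__main__":
    ranked = [Answer("Aarhus", "doc1", 12.0, 12.6), Answer("Denmark", "doc1")]
    print(validate_answers(ranked, asr_task=True))
```

The same record serves both task types: for manual transcriptions the two time fields are simply left unset.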


Assessments

• Four possible judgments (as in QA@CLEF): Correct / Incorrect / Inexact / Unsupported

• ‘Manual transcriptions’ tasks: manual assessment with the QASTLE interface

• ‘Automatic transcriptions’ tasks: automatic assessment (script) + manual check

• 2 metrics:
  – Mean Reciprocal Rank (MRR): measures how well right answers are ranked on average
  – Accuracy: fraction of correct answers ranked in the first position
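The two metrics can be sketched as follows. This is a minimal illustration, not the official scoring script. It assumes that each question has a ranked list of judgments using the four labels above, that only "Correct" counts as a right answer, that a question with no right answer contributes 0 to MRR, and that accuracy is the share of questions whose first-ranked answer is correct. The function names are chosen for the example.

```python
from typing import List, Sequence


def reciprocal_rank(judgments: Sequence[str]) -> float:
    """1/rank of the first 'Correct' answer, or 0.0 if there is none."""
    for rank, judgment in enumerate(judgments, start=1):
        if judgment == "Correct":
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_judgments: List[Sequence[str]]) -> float:
    """Average reciprocal rank over all questions."""
    return sum(reciprocal_rank(j) for j in all_judgments) / len(all_judgments)


def accuracy(all_judgments: List[Sequence[str]]) -> float:
    """Fraction of questions whose first-ranked answer is 'Correct'."""
    top_correct = sum(1 for j in all_judgments if j and j[0] == "Correct")
    return top_correct / len(all_judgments)


if __name__ == "__main__":
    # Four questions: right at rank 1, right at rank 2, never right, no answer.
    runs = [
        ["Correct"],
        ["Incorrect", "Correct"],
        ["Inexact", "Unsupported"],
        [],
    ]
    print(mean_reciprocal_rank(runs))  # (1 + 1/2 + 0 + 0) / 4 = 0.375
    print(accuracy(runs))              # 1 / 4 = 0.25
```

MRR rewards a right answer at any of the up to five ranks, while accuracy only looks at the first, so MRR is always at least as high as accuracy expressed as a fraction.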


Participants

49 submissions from 5 participants:

Participant           T1a  T1b  T2a  T2b  T3a  T3b  T4a  T4b  T5a  T5b
Univ. Chemnitz (CUT)  2    -    -    -    -    -    2    -    -    -
INAOE                 -    -    -    -    -    -    1    2    -    -
LIMSI                 1    1    1    1    2    3    1    3    2    3
Univ. Alicante (UA)   -    -    -    -    -    -    1    3    -    -
UPC                   1    2    1    2    -    -    1    6    1    6
TOTAL:                4    3    2    3    2    3    6    14   3    9


Best results for manual transcriptions

       Factual        Definitional   All
Task   MRR   Acc(%)   MRR   Acc(%)   MRR   Acc(%)
T1a    0.53  47.4     0.18  18.2     0.45  41.0
T2a    0.47  37.8     0.22  19.2     0.40  33.0
T3a    0.50  45.3     0.47  44.0     0.49  45.0
T4a    0.44  40.0     0.16  16.0     0.37  34.0
T5a    0.32  29.3     0.44  36.0     0.35  31.0


Best results for ASR transcriptions

              All            All (manual)
Task   WER    MRR   Acc(%)   MRR   Acc(%)
T1b    20.0%  0.34  31.0     0.45  41.0
T2b    38.0%  0.20  18.0     0.40  33.0
T3b    11.9%  0.45  41.0     0.49  45.0
       23.9%  0.30  25.0
       35.4%  0.24  21.0
T4b    10.6%  0.33  30.0     0.37  34.0
       14.0%  0.24  20.0
       24.1%  0.23  19.0
T5b    11.5%  0.26  24.0     0.35  31.0
       12.7%  0.23  20.0
       13.7%  0.25  23.0
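One way to read this slide is as a relative loss in accuracy when moving from manual to ASR transcriptions. The sketch below computes that loss from the figures on the slide. It assumes the ASR results are listed in the same order as the WER values, and that comparing best runs is meaningful even though the best manual and best ASR runs may come from different systems. The helper name `relative_loss` is chosen for the example.

```python
# Best overall accuracy (%) on manual transcriptions, per task.
manual_acc = {"T1": 41.0, "T2": 33.0, "T3": 45.0, "T4": 34.0, "T5": 31.0}

# Best overall accuracy (%) on ASR transcriptions, keyed by WER (%).
# Pairing of WER and result follows the order in which they appear on the slide.
asr_acc = {
    "T1": {20.0: 31.0},
    "T2": {38.0: 18.0},
    "T3": {11.9: 41.0, 23.9: 25.0, 35.4: 21.0},
    "T4": {10.6: 30.0, 14.0: 20.0, 24.1: 19.0},
    "T5": {11.5: 24.0, 12.7: 20.0, 13.7: 23.0},
}


def relative_loss(manual: float, asr: float) -> float:
    """Share of the manual-transcription accuracy lost on ASR output."""
    return (manual - asr) / manual


for task, by_wer in asr_acc.items():
    for wer, acc in sorted(by_wer.items()):
        loss = relative_loss(manual_acc[task], acc)
        print(f"{task}b  WER {wer:4.1f}%  acc {acc:4.1f}%  "
              f"manual {manual_acc[task]:4.1f}%  relative loss {loss:.1%}")
```

Because only the best run per condition is reported, these numbers are indicative of the trend rather than a controlled per-system comparison.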


Conclusion

• 5 participants (as in 2007)

• 4 different countries (vs. 5 in 2007): Germany, Spain, France, Mexico

• 49 submitted runs (vs. 28 runs in 2007)

• Loss in accuracy with ASR transcribed speech (performance falls when WER rises)

• QAST 2009: Written & Oral Questions...