LREC 2008, Marrakech, May 29, 2008

Question Answering on Speech Transcriptions: the QAST evaluation in CLEF

L. Lamel (LIMSI), S. Rosset (LIMSI), C. Ayache (ELDA), D. Mostefa (ELDA)



Outline

1. Motivations

2. Objectives

3. QAST 2007
   1. Tasks
   2. Participants
   3. Results

4. QAST 2008

5. Conclusion



QAST Organization

The evaluation campaign is jointly organized by:

- UPC, Spain (J. Turmo, P. Comas), coordinator

- ELDA, France (N. Moreau, C. Ayache, D. Mostefa)

- LIMSI, France (S. Rosset, L. Lamel)



Motivations

• Much of human interaction is via spoken language
• QA research has developed techniques for written texts with correct syntactic and semantic structures
• Spoken data is very different from textual data:
  – Speech phenomena: false starts, speech corrections, truncated words, etc.
  – The grammatical structure of spontaneous speech is very particular
  – No punctuation and no capitalization
  – In meetings, interaction creates run-on sentences where the distance between the first part and the last can be very long



Objectives

• In general, motivating and driving the design of novel and robust factual QA architectures for automatic speech transcriptions.

• Comparing the performance of systems dealing with both types of transcriptions (manual and automatic) and both types of questions (factual and definitional).

• Measuring the loss of each system due to ASR output degradation.


QAST 2007: Resources and tasks

• Corpora:
  – The CHIL corpus:
    • 25 seminars of 1 hour each
    • Spontaneous speech
    • English spoken by non-native speakers
    • Domain of the lectures: speech and language processing
    • Manual transcriptions done by ELDA
    • Automatic transcriptions provided by LIMSI
  – The AMI corpus:
    • 168 meetings (100 hours)
    • Spontaneous speech
    • English
    • Domain of the meetings: design of a television remote control
    • Manual transcriptions done by AMI
    • Automatic transcriptions provided by AMI

• 4 tasks:
  – T1: QA in manual transcriptions of lectures
  – T2: QA in automatic transcriptions of lectures
  – T3: QA in manual transcriptions of meetings
  – T4: QA in automatic transcriptions of meetings



QAST 2007: Development and evaluation sets

For each task, two sets of questions were provided:

• Development set:
  – Lectures: 10 seminars, 50 questions
  – Meetings: 50 meetings, 50 questions

• Evaluation set:
  – Lectures: 15 seminars, 100 questions
  – Meetings: 118 meetings, 100 questions

• Factual questions only; no definition questions.

• Expected answers are named entities. List of NE types: person, location, organization, language, system/method, measure, time, color, shape, material.
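As a concrete illustration of the answer-type constraint (hypothetical examples in the style of the task, not items from the official question sets): a question such as "Who gave this seminar?" expects a person as its answer, while "Which company designed the remote control?" expects an organization.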



QAST 2007: Human judgment

• Assessors used QASTLE, an evaluation tool developed by ELDA, to evaluate the submissions.



QAST 2007: Scoring

• Four possible judgments:
  – Correct
  – Incorrect
  – Non-exact
  – Unsupported

• Two metrics were used:
  – Mean Reciprocal Rank (MRR): measures how highly ranked the first correct answer is.
  – Accuracy: the fraction of questions whose correct answer is ranked first in the list of 5 possible answers.

• Participants could submit up to 2 submissions per task, with up to 5 answers per question.
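As a concrete reading of the two metrics, here is a minimal Python sketch of their definitions. It assumes each question has been reduced to the rank (1 to 5) of its first correct answer, or None when none of the five returned answers was judged correct; this illustrates the formulas only and is not the QASTLE scoring code.

def mrr(first_correct_ranks):
    # Mean Reciprocal Rank: average of 1/rank of the first correct answer,
    # counting 0 for questions with no correct answer among the 5 returned.
    total = sum(1.0 / r for r in first_correct_ranks if r is not None)
    return total / len(first_correct_ranks)

def accuracy(first_correct_ranks):
    # Accuracy: fraction of questions whose rank-1 answer is correct.
    return sum(1 for r in first_correct_ranks if r == 1) / len(first_correct_ranks)

# Hypothetical run over 5 questions: first correct answers at ranks 1, 3, none, 2, 1.
ranks = [1, 3, None, 2, 1]
print(f"MRR = {mrr(ranks):.2f}")            # (1 + 1/3 + 0 + 1/2 + 1) / 5 = 0.57
print(f"Accuracy = {accuracy(ranks):.2f}")  # 2 / 5 = 0.40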



Participants

• Five teams submitted results for one or more QAST tasks:
  – CLT, Center for Language Technology, Australia;
  – DFKI, Germany;
  – LIMSI, Laboratoire d'Informatique et de Mécanique des Sciences de l'Ingénieur, France;
  – TOKYO, Tokyo Institute of Technology, Japan;
  – UPC, Universitat Politècnica de Catalunya, Spain.

• In total, 28 submission files were evaluated:

                CHIL corpus          AMI corpus
                T1         T2        T3         T4
  Submissions   8          9         5          6



Results for CHIL lectures (T1 and T2)

          Manual               Automatic
System    MRR     Accuracy     MRR     Accuracy
S1        0.09    0.06         0.06    0.03
S2        0.09    0.05         0.05    0.02
S3        0.17    0.15         0.09    0.09
S4        0.37    0.32         0.23    0.20
S5        0.46    0.39         0.24    0.21
S6        0.19    0.14         0.12    0.08
S7        0.20    0.14         0.12    0.08
S8        0.53    0.51         0.37    0.36
S9        –       –            0.25    0.24

(– = no submission for that condition)
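Reading across the table: every system loses a substantial fraction of its score when moving from manual to automatic transcriptions; for example, the best system (S8) drops from 0.51 to 0.36 in accuracy, and S5 from 0.39 to 0.21.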



Results for AMI meetings (T3 and T4)

          Manual               Automatic
System    MRR     Accuracy     MRR     Accuracy
S1        0.23    0.16         0.10    0.06
S2        0.25    0.20         0.13    0.08
S3        0.28    0.25         0.19    0.18
S4        0.31    0.25         0.19    0.17
S5        0.26    0.25         0.22    0.21
S6        –       –            0.15    0.13

(– = no submission for that condition)



QAST 2008

• Extension of QAST 2007:
  – 3 languages: French, English, Spanish
  – 4 domains: broadcast news, parliament speeches, lectures, meetings
  – Different levels of WER (10%, 20% and 30%)
  – Factual and definition questions

• 5 corpora:
  – CHIL lectures
  – AMI meetings
  – TC-STAR05 EPPS English corpus
  – TC-STAR05 EPPS Spanish corpus
  – ESTER French broadcast news corpus

• Evaluation from June 15 to June 30
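For reference, the word error rate (WER) cited above is the standard ASR measure computed from the alignment of the hypothesis with the reference transcript (general background, not a QAST-specific definition):

\mathrm{WER} = \frac{S + D + I}{N}

where S, D and I are the numbers of substituted, deleted and inserted words, and N is the number of words in the reference.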



QAST 2008 tasks

• T1a: Question Answering in manual transcriptions of lectures (CHIL corpus)
• T1b: Question Answering in automatic transcriptions of lectures (CHIL corpus)
• T2a: Question Answering in manual transcriptions of meetings (AMI corpus)
• T2b: Question Answering in automatic transcriptions of meetings (AMI corpus)
• T3a: Question Answering in manual transcriptions of French broadcast news (ESTER corpus)
• T3b: Question Answering in automatic transcriptions of French broadcast news (ESTER corpus)
• T4a: Question Answering in manual transcriptions of European Parliament Plenary sessions in English (EPPS English corpus)
• T4b: Question Answering in automatic transcriptions of European Parliament Plenary sessions in English (EPPS English corpus)
• T5a: Question Answering in manual transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus)
• T5b: Question Answering in automatic transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus)



Conclusion and future work (1/2)

• We presented the framework of the Question Answering on Speech Transcripts (QAST) evaluation campaigns.

• QAST 2007:
  – 5 participants from 5 different countries (France, Germany, Spain, Australia and Japan), 28 runs
  – Encouraging results
  – High loss in accuracy on ASR output



Conclusion and future work (2/2)

• QAST 2008 is an extension of QAST 2007 (3 languages, 4 domains, definition and factual questions, multiple ASR outputs with different WERs).

• There is still time to join QAST 2008 (participation is free).

• Future work aims at including:
  – cross-lingual tasks,
  – oral questions,
  – other domains.



For more information

The QAST website: http://www.lsi.upc.edu/~qast/