LREC 2008, Marrakech, May 29, 2008

Question Answering on Speech Transcriptions: the QAST evaluation in CLEF

L. Lamel (LIMSI), S. Rosset (LIMSI), C. Ayache (ELDA), D. Mostefa (ELDA)



Outline

1. Motivations

2. Objectives

3. QAST 2007
   1. Tasks
   2. Participants
   3. Results

4. QAST 2008

5. Conclusion



QAST Organization

The evaluation campaign is jointly organized by:

- UPC, Spain (J. Turmo, P. Comas), coordinator

- ELDA, France (N. Moreau, C. Ayache, D. Mostefa)

- LIMSI, France (S. Rosset, L. Lamel)



Motivations

• Much of human interaction is via spoken language
• QA research has developed techniques for written texts with correct syntactic and semantic structures
• Spoken data is very different from textual data:
  – Speech phenomena: false starts, speech corrections, truncated words, etc.
  – The grammatical structure of spontaneous speech is very particular
  – No punctuation and no capitalization
  – In meetings, interaction creates run-on sentences where the distance between the first part and the last can be very long



Objectives

• In general, motivating and driving the design of novel and robust factual QA architectures for automatic speech transcriptions.

• Comparing the performance of systems dealing with both types of transcriptions (manual and automatic) and both types of questions (factual and definitional).

• Measuring the loss of each system due to ASR output degradation.


QAST 2007: Resources and tasks

• Corpora:
  – The CHIL corpus:
    • 25 seminars of 1 hour each
    • Spontaneous speech
    • English spoken by non-native speakers
    • Domain of the lectures: speech and language processing
    • Manual transcriptions done by ELDA
    • Automatic transcriptions provided by LIMSI
  – The AMI corpus:
    • 168 meetings (100 hours)
    • Spontaneous speech
    • English
    • Domain of the meetings: design of a television remote control
    • Manual transcriptions done by AMI
    • Automatic transcriptions provided by AMI

• 4 tasks:
  – T1: QA in manual transcriptions of lectures
  – T2: QA in automatic transcriptions of lectures
  – T3: QA in manual transcriptions of meetings
  – T4: QA in automatic transcriptions of meetings



QAST 2007: Development and evaluation sets

For each task, two sets of questions were provided:

• Development set:
  – Lectures: 10 seminars, 50 questions
  – Meetings: 50 meetings, 50 questions

• Evaluation set:
  – Lectures: 15 seminars, 100 questions
  – Meetings: 118 meetings, 100 questions

• Factual questions only; no definition questions.

• Expected answers are named entities. List of NE types: person, location, organization, language, system/method, measure, time, color, shape, material.
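As a concrete illustration of the answer-type constraint (hypothetical examples in the style of the task, not items from the official question sets): a question such as "Who gave this seminar?" expects a person as its answer, while "Which company designed the remote control?" expects an organization.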



QAST 2007: Human judgment

• Assessors used QASTLE, an evaluation tool developed by ELDA, to evaluate the submissions.



QAST 2007: Scoring

• Four possible judgments:
  – Correct
  – Incorrect
  – Non-exact
  – Unsupported

• Two metrics were used:
  – Mean Reciprocal Rank (MRR): measures how highly ranked the first correct answer is.
  – Accuracy: the fraction of questions whose correct answer is ranked first in the list of 5 possible answers.

• Participants could submit up to 2 submissions per task, with up to 5 answers per question.
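As a concrete reading of the two metrics, here is a minimal Python sketch of their definitions. It assumes each question has been reduced to the rank (1 to 5) of its first correct answer, or None when none of the five returned answers was judged correct; this illustrates the formulas only and is not the QASTLE scoring code.

def mrr(first_correct_ranks):
    # Mean Reciprocal Rank: average of 1/rank of the first correct answer,
    # counting 0 for questions with no correct answer among the 5 returned.
    total = sum(1.0 / r for r in first_correct_ranks if r is not None)
    return total / len(first_correct_ranks)

def accuracy(first_correct_ranks):
    # Accuracy: fraction of questions whose rank-1 answer is correct.
    return sum(1 for r in first_correct_ranks if r == 1) / len(first_correct_ranks)

# Hypothetical run over 5 questions: first correct answers at ranks 1, 3, none, 2, 1.
ranks = [1, 3, None, 2, 1]
print(f"MRR = {mrr(ranks):.2f}")            # (1 + 1/3 + 0 + 1/2 + 1) / 5 = 0.57
print(f"Accuracy = {accuracy(ranks):.2f}")  # 2 / 5 = 0.40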



Participants

• Five teams submitted results for one or more QAST tasks:
  – CLT, Center for Language Technology, Australia;
  – DFKI, Germany;
  – LIMSI, Laboratoire d'Informatique et de Mécanique des Sciences de l'Ingénieur, France;
  – TOKYO, Tokyo Institute of Technology, Japan;
  – UPC, Universitat Politècnica de Catalunya, Spain.

• In total, 28 submission files were evaluated:

                CHIL corpus          AMI corpus
                T1         T2        T3         T4
  Submissions   8          9         5          6



Results for CHIL lectures (T1 and T2)

          Manual               Automatic
System    MRR     Accuracy     MRR     Accuracy
S1        0.09    0.06         0.06    0.03
S2        0.09    0.05         0.05    0.02
S3        0.17    0.15         0.09    0.09
S4        0.37    0.32         0.23    0.20
S5        0.46    0.39         0.24    0.21
S6        0.19    0.14         0.12    0.08
S7        0.20    0.14         0.12    0.08
S8        0.53    0.51         0.37    0.36
S9        –       –            0.25    0.24

(– = no submission for that condition)
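Reading across the table: every system loses a substantial fraction of its score when moving from manual to automatic transcriptions; for example, the best system (S8) drops from 0.51 to 0.36 in accuracy, and S5 from 0.39 to 0.21.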



Results for AMI meetings (T3 and T4)

          Manual               Automatic
System    MRR     Accuracy     MRR     Accuracy
S1        0.23    0.16         0.10    0.06
S2        0.25    0.20         0.13    0.08
S3        0.28    0.25         0.19    0.18
S4        0.31    0.25         0.19    0.17
S5        0.26    0.25         0.22    0.21
S6        –       –            0.15    0.13

(– = no submission for that condition)



QAST 2008

• Extension of QAST 2007:
  – 3 languages: French, English, Spanish
  – 4 domains: broadcast news, parliament speeches, lectures, meetings
  – Different levels of WER (10%, 20% and 30%)
  – Factual and definition questions

• 5 corpora:
  – CHIL lectures
  – AMI meetings
  – TC-STAR05 EPPS English corpus
  – TC-STAR05 EPPS Spanish corpus
  – ESTER French broadcast news corpus

• Evaluation from June 15 to June 30
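For reference, the word error rate (WER) cited above is the standard ASR measure computed from the alignment of the hypothesis with the reference transcript (general background, not a QAST-specific definition):

\mathrm{WER} = \frac{S + D + I}{N}

where S, D and I are the numbers of substituted, deleted and inserted words, and N is the number of words in the reference.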



QAST 2008 tasks

• T1a: Question Answering in manual transcriptions of lectures (CHIL corpus)
• T1b: Question Answering in automatic transcriptions of lectures (CHIL corpus)
• T2a: Question Answering in manual transcriptions of meetings (AMI corpus)
• T2b: Question Answering in automatic transcriptions of meetings (AMI corpus)
• T3a: Question Answering in manual transcriptions of French broadcast news (ESTER corpus)
• T3b: Question Answering in automatic transcriptions of French broadcast news (ESTER corpus)
• T4a: Question Answering in manual transcriptions of European Parliament Plenary sessions in English (EPPS English corpus)
• T4b: Question Answering in automatic transcriptions of European Parliament Plenary sessions in English (EPPS English corpus)
• T5a: Question Answering in manual transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus)
• T5b: Question Answering in automatic transcriptions of European Parliament Plenary sessions in Spanish (EPPS Spanish corpus)



Conclusion and future work (1/2)

• We presented the framework of the Question Answering on Speech Transcripts (QAST) evaluation campaigns.

• QAST 2007:
  – 5 participants from 5 different countries (France, Germany, Spain, Australia and Japan), 28 runs
  – Encouraging results
  – High loss in accuracy on ASR output



Conclusion and future work (2/2)

• QAST 2008 is an extension of QAST 2007 (3 languages, 4 domains, definition and factual questions, multiple ASR outputs with different WERs).

• There is still time to join QAST 2008 (participation is free).

• Future work aims at including:
  – cross-lingual tasks,
  – oral questions,
  – other domains.



For more information

The QAST website: http://www.lsi.upc.edu/~qast/