Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan...

22
Entity-Driven Desiderata Computational Linguistics: Jordan Boyd-Graber University of Maryland COREFERENCE Adapted from slides by Phil Blunsom Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 1/8

Transcript of Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan...

Page 1: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Entity-Driven Desiderata

Computational Linguistics: Jordan Boyd-GraberUniversity of MarylandCOREFERENCE

Adapted from slides by Phil Blunsom

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 1 / 8

Page 2: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Machine Reading Framework

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 2 / 8

Page 3: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

CNN/Daily Mail Datasets

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 3 / 8

Page 4: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

CNN/Daily Mail Datasets

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 3 / 8

Page 5: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

CNN/Daily Mail Datasets

Good

we aimed to factor out worldknowledge through entityanonymisation so models could notrely on correlations rather thanunderstanding.

Bad

The generation process and entityanonymisation reduced the task tomultiple choice and introducedadditional noise.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 3 / 8

Page 6: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

CNN/Daily Mail Datasets

Good

posing reading comprehension as alarge scale conditional modelling taskmade it accessible to machinelearning researchers, generating agreat deal of subsequent research.

Bad

while this approach is reasonable forbuilding applications, it is entirely thewrong way to develop and evaluatenatural language understanding.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 3 / 8

Page 7: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Narrative QA

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 4 / 8

Page 8: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Narrative QA

Good

A challenging evaluation that tests arange of language understanding,particularly temporal aspects ofnarrative, and also scalability ascurrent models cannot represent andreason over full narratives

Bad

Performing well on this task is clearlywell beyond current models, bothrepresentationally andcomputationally. As such it will behard for researchers to hill climb onthis evaluation.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 4 / 8

Page 9: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Narrative QA

Good

The relatively small number ofnarratives for training models forcesresearchers to approach this taskfrom a transfer learning perspective.

Bad

The relatively small number ofnarratives means that this dataset isnot of immediate use for thosewanting to build supervised modelsfor applications.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 4 / 8

Page 10: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

MS Marco

Questions are mined from a search engine and matched with candidateanswer passages using IR techniques.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 5 / 8

Page 11: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

MS Marco

Good

The reliance on real queries createsa much more useful resource forthose interested in applications.

Bad

People rarely ask interestingquestions of search engines, and theuse of IR techniques to collectcandidate passages limits theusefulness of this dataset forevaluating language understanding.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 5 / 8

Page 12: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

MS Marco

Good

Unrestricted answers allow a greaterrange of questions.

Bad

How to evaluate freeform answers isan unsolved problem. Bleu is not theanswer!

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 5 / 8

Page 13: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

SQuAD

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8

Page 14: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

SQuAD

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8

Page 15: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

SQuAD

Good

Very scalable annotation process thatcan cheaply generate large numbersof questions per article.

Bad

Annotating questions directly fromthe context passages strongly skewsthe data distribution. The task thenbecomes reverse engineering theannotators, rather than languageunderstanding.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8

Page 16: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

SQuAD

Good

The online leaderboard allows easybenchmarking of systems andmotivates competition.

Bad

Answers as spans reduces the taskto multiple choice, and doesnâAZtallow questions with answers latent inthe text.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8

Page 17: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

SQuAD

Good

Human upperbound sets reasonablegoal.

Bad

Allows mischaracterization of what itmeans to “read”.

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8

Page 18: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Quiz Bowl

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 7 / 8

Page 19: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Quiz Bowl

Good

Free data from experts

Bad

Sometimes can be trivially solvedwith pattern matching

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 7 / 8

Page 20: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Quiz Bowl

Good

Based on already known knowledge

Bad

Not tied to readable data

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 7 / 8

Page 21: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Quiz Bowl

Good

Human comparison makes sense

Bad

More cumbersome computerevaluation

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 7 / 8

Page 22: Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan Boyd-Graber jUMD Entity-Driven Desiderata 4 / 8. Narrative QA Good The relatively small number

Possible Projects

� Improve selection of answer spans

� Improve IR search for context

� Visualizing reader spans

� Domain adaptation (Wikipedia/Questions/Books)

Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 8 / 8