Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan...
Transcript of Entity-Driven Desideratajbg/teaching/CMSC_723/15c_qa.pdfComputational Linguistics: Jordan...
Entity-Driven Desiderata
Computational Linguistics: Jordan Boyd-GraberUniversity of MarylandCOREFERENCE
Adapted from slides by Phil Blunsom
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 1 / 8
Machine Reading Framework
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 2 / 8
CNN/Daily Mail Datasets
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 3 / 8
CNN/Daily Mail Datasets
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 3 / 8
CNN/Daily Mail Datasets
Good
we aimed to factor out worldknowledge through entityanonymisation so models could notrely on correlations rather thanunderstanding.
Bad
The generation process and entityanonymisation reduced the task tomultiple choice and introducedadditional noise.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 3 / 8
CNN/Daily Mail Datasets
Good
posing reading comprehension as alarge scale conditional modelling taskmade it accessible to machinelearning researchers, generating agreat deal of subsequent research.
Bad
while this approach is reasonable forbuilding applications, it is entirely thewrong way to develop and evaluatenatural language understanding.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 3 / 8
Narrative QA
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 4 / 8
Narrative QA
Good
A challenging evaluation that tests arange of language understanding,particularly temporal aspects ofnarrative, and also scalability ascurrent models cannot represent andreason over full narratives
Bad
Performing well on this task is clearlywell beyond current models, bothrepresentationally andcomputationally. As such it will behard for researchers to hill climb onthis evaluation.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 4 / 8
Narrative QA
Good
The relatively small number ofnarratives for training models forcesresearchers to approach this taskfrom a transfer learning perspective.
Bad
The relatively small number ofnarratives means that this dataset isnot of immediate use for thosewanting to build supervised modelsfor applications.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 4 / 8
MS Marco
Questions are mined from a search engine and matched with candidateanswer passages using IR techniques.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 5 / 8
MS Marco
Good
The reliance on real queries createsa much more useful resource forthose interested in applications.
Bad
People rarely ask interestingquestions of search engines, and theuse of IR techniques to collectcandidate passages limits theusefulness of this dataset forevaluating language understanding.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 5 / 8
MS Marco
Good
Unrestricted answers allow a greaterrange of questions.
Bad
How to evaluate freeform answers isan unsolved problem. Bleu is not theanswer!
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 5 / 8
SQuAD
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8
SQuAD
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8
SQuAD
Good
Very scalable annotation process thatcan cheaply generate large numbersof questions per article.
Bad
Annotating questions directly fromthe context passages strongly skewsthe data distribution. The task thenbecomes reverse engineering theannotators, rather than languageunderstanding.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8
SQuAD
Good
The online leaderboard allows easybenchmarking of systems andmotivates competition.
Bad
Answers as spans reduces the taskto multiple choice, and doesnâAZtallow questions with answers latent inthe text.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8
SQuAD
Good
Human upperbound sets reasonablegoal.
Bad
Allows mischaracterization of what itmeans to “read”.
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 6 / 8
Quiz Bowl
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 7 / 8
Quiz Bowl
Good
Free data from experts
Bad
Sometimes can be trivially solvedwith pattern matching
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 7 / 8
Quiz Bowl
Good
Based on already known knowledge
Bad
Not tied to readable data
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 7 / 8
Quiz Bowl
Good
Human comparison makes sense
Bad
More cumbersome computerevaluation
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 7 / 8
Possible Projects
� Improve selection of answer spans
� Improve IR search for context
� Visualizing reader spans
� Domain adaptation (Wikipedia/Questions/Books)
Computational Linguistics: Jordan Boyd-Graber | UMD Entity-Driven Desiderata | 8 / 8