Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is...

41

Upload
others
Category

Documents
view
1
download
0

Embed Size (px):

Transcript of Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is...

Page 1: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Page 2: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Percy Liang

Questioning the Question Answering Dataset

Page 3: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Questioning the Question Answering Dataset

Percy Liang

Microsoft Faculty Summit — July 17, 2017

Page 4: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Datasets drive progress

1

Page 5: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

What about question answering?

2

Page 6: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

What about question answering?

Desiderata: datasets that can’t be ”solved” by ”cheap tricks”

2

Page 7: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Executable semantic parsing

What is the largest city in Europe by population?[database]

3

Page 8: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Executable semantic parsing

What is the largest city in Europe by population?

semantic parsing

Cities

[database]

3

Page 9: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Executable semantic parsing

What is the largest city in Europe by population?

semantic parsing

Cities Europe

[database]

3

Page 10: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Executable semantic parsing

What is the largest city in Europe by population?

semantic parsing

Cities ContainedBy(Europe)

[database]

3

Page 11: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Executable semantic parsing

What is the largest city in Europe by population?

semantic parsing

Cities ∩ ContainedBy(Europe)

[database]

3

Page 12: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Executable semantic parsing

What is the largest city in Europe by population?

semantic parsing

Cities ∩ ContainedBy(Europe) Population

[database]

3

Page 13: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Executable semantic parsing

What is the largest city in Europe by population?

semantic parsing

argmax(Cities ∩ ContainedBy(Europe),Population)

[database]

3

Page 14: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Executable semantic parsing

What is the largest city in Europe by population?

semantic parsing

argmax(Cities ∩ ContainedBy(Europe),Population)

execute

Istanbul

[database]

3

Page 15: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

WebQuestions (2013)

what art did wassily kandinsky do?

what boarding school did mark zuckerberg go to?

what book is mark twain famous for? BarackObama

Person

Type

Politician

Profession

1961.08.04

DateOfBirth

HonoluluPlaceOfBirth

Hawaii

ContainedBy

City

Type

UnitedStates

ContainedBy

USState

Type

Event8

Marriage

MichelleObama

Spouse

Type

FemaleGender

1992.10.03

StartDate

Event3

PlacesLived

Chicago

Location

Event21

PlacesLived

Location

ContainedBy

• Realistic: user queries from search engine

• Most questions can’t be answered by Freebase

• Small dataset: only 6K questions (started with 100K)

• Noisy: ceiling is 60-70% (current best is 55%)

• Not much compositionality (can be solved without semantic parsing)

[Berant et al; 2013]

4

Page 16: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

WikiTableQuestions (2015)[Pasupat & Liang 2015]

5

Page 17: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

WikiTableQuestions (2015)

• Compositionality: requires reasoning

• Clean: crowdworkers made sure questions were answerable

• Gave up on realism

• State-of-the-art is 45%, ceiling is around 90%

• Dataset is too small given the complexity (22K questions)?

• A bit niche for the NLP community

[Pasupat & Liang 2015]

6

Page 18: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

SQuAD (2016)[Rajpurkar et al. 2016]

7

Page 19: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

SQuAD (2016)

• Desiderata: large and clean

• 100K examples from 536 articles

• Answer is span of paragraph

• Train and test have disjoint articles

• Gave up on guarding against cheap tricks

• Questions don’t require much reasoning

[Rajpurkar et al. 2016]

7

Page 20: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

SQuAD (2016)

• Desiderata: large and clean

• 100K examples from 536 articles

• Answer is span of paragraph

• Train and test have disjoint articles

• Gave up on guarding against cheap tricks

• Questions don’t require much reasoning

• ”necessary but not sufficient” dataset

[Rajpurkar et al. 2016]

7

Page 21: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

worksheets.codalab.org

submit code to run on test set ⇒ public models

8

Page 22: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

• over 1 year

• 30 teams

• best system: 84.7%

• humans: 91.2%

• Advances in attention mechanisms

9

Page 23: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Other datasets

Fill in the blank:

• CNN/DailyMail (1.4M): cloze-style predict entity

• Children’s Book Test (688K), BookTest (14M): cloze-style predict word

10

Page 24: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Other datasets

Fill in the blank:

• CNN/DailyMail (1.4M): cloze-style predict entity

• Children’s Book Test (688K), BookTest (14M): cloze-style predict word

• LAMBADA: predict word (dataset constructed to need longer context)

• ROCStories (100K): predict wrong/right ending, model discourse

10

Page 25: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Other datasets

Fill in the blank:

• CNN/DailyMail (1.4M): cloze-style predict entity

• Children’s Book Test (688K), BookTest (14M): cloze-style predict word

• LAMBADA: predict word (dataset constructed to need longer context)

• ROCStories (100K): predict wrong/right ending, model discourse

Question answering via maching reading:

• MCTest: 2640 questions, multiple choice, reasoning

• AI2 science questions: 855 questions, knowledge, reasoning

10

Page 26: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Other datasets

Fill in the blank:

• CNN/DailyMail (1.4M): cloze-style predict entity

• Children’s Book Test (688K), BookTest (14M): cloze-style predict word

• LAMBADA: predict word (dataset constructed to need longer context)

• ROCStories (100K): predict wrong/right ending, model discourse

Question answering via maching reading:

• MCTest: 2640 questions, multiple choice, reasoning

• AI2 science questions: 855 questions, knowledge, reasoning

• NewsQA (100K): questions crowdsourced based on CNN summaries

• MS-MARCO (100K): real questions from users of search engines

• TriviaQA (650K): trivia questions made by enthusiasts

• Quasar (80K): cloze-style from StackOverflow, trivia questions10

Page 27: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Are these datasets enough?

11

Page 28: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Adversarial evaluation

Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldestquarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory inSuper Bowl XXXIII at age 38 and is currently Denver’s Executive Vice President of Football Operations and General Manager.

What is the name of the quarterback who was 38 in Super Bowl XXXIII?

BiDAF

John Elway

[with Robin Jia; EMNLP 2017]

12

Page 29: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Adversarial evaluation

Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldestquarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory inSuper Bowl XXXIII at age 38 and is currently Denver’s Executive Vice President of Football Operations and General Manager.Jeff Dean is the name of the quarterback who was 37 in Champ Bowl XXXIV.

What is the name of the quarterback who was 38 in Super Bowl XXXIII?

BiDAF

[with Robin Jia; EMNLP 2017]

12

Page 30: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Adversarial evaluation

Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldestquarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory inSuper Bowl XXXIII at age 38 and is currently Denver’s Executive Vice President of Football Operations and General Manager.Jeff Dean is the name of the quarterback who was 37 in Champ Bowl XXXIV.

What is the name of the quarterback who was 38 in Super Bowl XXXIII?

BiDAF

Jeff Dean

[with Robin Jia; EMNLP 2017]

12

Page 31: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

13

Page 32: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Results on public models

Model Original F1 Adversarial F1

ReasoNet-E 81.1 49.8

SEDT-E 80.1 46.5

BiDAF-E 80.0 46.9

Mnemonic-E 79.1 55.3

Ruminating 78.8 47.7

jNet 78.6 47.0

Mnemonic-S 78.5 56.0

ReasoNet-S 78.2 50.3

MPCM-S 77.0 50.0

RaSOR 76.2 49.5

BiDAF-S 75.5 45.7

14

Page 33: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Results on public models

Model Original F1 Adversarial F1

ReasoNet-E 81.1 49.8

SEDT-E 80.1 46.5

BiDAF-E 80.0 46.9

Mnemonic-E 79.1 55.3

Ruminating 78.8 47.7

jNet 78.6 47.0

Mnemonic-S 78.5 56.0

ReasoNet-S 78.2 50.3

MPCM-S 77.0 50.0

RaSOR 76.2 49.5

BiDAF-S 75.5 45.7

Humans 92.6 89.2 14

Page 34: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Training versus testing

Peyton Manning became the first quarterback ever to lead two different teams to multiple Super Bowls. He is also the oldestquarterback ever to play in a Super Bowl at age 39. The past record was held by John Elway, who led the Broncos to victory inSuper Bowl XXXIII at age 38 and is currently Denver’s Executive Vice President of Football Operations and General Manager.Jeff Dean is the name of the quarterback who was 37 in Champ Bowl XXXIV.

• If retrain model on adversarial examples, 74.3 ⇒ 70.0

• If test this model on prepended sentences, 70.0 ⇒ 36.9

• Think of this as test-set evaluation only

15

Page 35: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Proposal: testing properties

Properties:

• Should be invariant under adding distracting sentences

test exampleproperty 1

implied test examples

16

Page 36: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Proposal: testing properties

Properties:

• Should be invariant under adding distracting sentences

• Should be invariant under paraphrase

• If replace ’John Elway’ with entity X, should return X

test exampleproperty 1

implied test examples

test exampleproperty 2

implied test examples

test exampleproperty 3

implied test examples

16

Page 37: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Proposal: testing properties

Properties:

• Should be invariant under adding distracting sentences

• Should be invariant under paraphrase

• If replace ’John Elway’ with entity X, should return X

test exampleproperty 1

implied test examples

test exampleproperty 2

implied test examples

test exampleproperty 3

implied test examples

• Ongoing: automatically generate much wider space of adversarial examples (adversaries= teachers)

16

Page 38: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Summary

• Tradeoff between clean dataset and realistic dataset

• Progression of datasets, each at the cusp of what’s solvable

• Stronger testing of models based on adversaries and properties

17

Page 39: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Jonathan Berant Panupong Pasupat Pranav Rajpurkar Robin Jia

DARPA NSF Future of Life Microsoft Facebook

Thank you!

18

Page 40: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

Page 41: Questioning the Question Answering Dataset · 2018-01-04 · Super Bowl XXXIII at age 38 and is currently Denver's Executive Vice President of Football Operations and General Manager.

El Vocero Virtual XXXIII

El Vocero Virtual XXXIII

DENVER'S FOOD SYSTEM

DENVER'S FOOD SYSTEM

An Evaluation of Denver's SchoolChoice Process, 2012-2014

An Evaluation of Denver's SchoolChoice Process, 2012-2014

Volume XXXIII, Issue 13

Volume XXXIII, Issue 13

LEGALLY SPEAKING XXXIII

LEGALLY SPEAKING XXXIII

2007-2008: DU Profiles, the University of Denver's fact book

2007-2008: DU Profiles, the University of Denver's fact book

MISAS | MASS SCHEDULE XXXIII Sunday in Ordinary Time| XXXIII Domingo …saintmichael.cc/wp-content/uploads/2020/11/022302... · 2020. 11. 11. · XXXIII Sunday in Ordinary Time| XXXIII

MISAS | MASS SCHEDULE XXXIII Sunday in Ordinary Time| XXXIII Domingo …saintmichael.cc/wp-content/uploads/2020/11/022302... · 2020. 11. 11. · XXXIII Sunday in Ordinary Time| XXXIII

Caps xxvii xxxiii

Caps xxvii xxxiii

Denver's South Platte River Vision Implementation Plan - Jeff Shoemaker, The Greenway Foundation

Denver's South Platte River Vision Implementation Plan - Jeff Shoemaker, The Greenway Foundation

Crafting an Innovation School: Findings from Denver's first eight Innovation schools

Crafting an Innovation School: Findings from Denver's first eight Innovation schools

2009-2010: DU Profiles, the University of Denver's fact book

2009-2010: DU Profiles, the University of Denver's fact book

33. Obra Periodística XXXIII

33. Obra Periodística XXXIII

Inglés v (Xxxiii)

Inglés v (Xxxiii)

Cornell Review XXXIII #6

Cornell Review XXXIII #6

XXXIII UNESCO-SEG-SGA Latin American … · XXXIII UNESCO-SEG-SGA Latin American Metallogeny Course XXXIII Curso Latinoamericano de Metalogenia UNESCO-SEG-SGA Campus of the University

XXXIII UNESCO-SEG-SGA Latin American … · XXXIII UNESCO-SEG-SGA Latin American Metallogeny Course XXXIII Curso Latinoamericano de Metalogenia UNESCO-SEG-SGA Campus of the University

XXXIII Revelation and the Creation

XXXIII Revelation and the Creation

Volume 6, Issue 3 Windsor Gardens, Denver's Premier Active ...

Volume 6, Issue 3 Windsor Gardens, Denver's Premier Active ...

Thick and Thin Questions QUESTIONING QUESTIONING.

Thick and Thin Questions QUESTIONING QUESTIONING.

2010-2011: DU Profiles, the University of Denver's fact book

2010-2011: DU Profiles, the University of Denver's fact book

XXXIII SUNDAY IN ORDINARY TIME/XXXIII DOMINGO EN …u4b.468.myftpupload.com/wp-content/uploads/2018/11/... · 11/18/2018 · XXXIII SUNDAY IN ORDINARY TIME/XXXIII DOMINGO EN TIEMPO

XXXIII SUNDAY IN ORDINARY TIME/XXXIII DOMINGO EN …u4b.468.myftpupload.com/wp-content/uploads/2018/11/... · 11/18/2018 · XXXIII SUNDAY IN ORDINARY TIME/XXXIII DOMINGO EN TIEMPO

Languages

Pages

Legal

Copyright © 2022 FDOCUMENTS