Representation Learning for Reading Comprehension

Representation Learning for Reading Comprehension. Russ Salakhutdinov, Machine Learning Department, Carnegie Mellon University, and Canadian Institute for Advanced Research. Joint work with Bhuwan Dhingra, Zhilin Yang, Ye Yuan, Junjie Hu, Hanxiao Liu, and William Cohen.

Transcript of Representation Learning for Reading Comprehension

Page 1: Representation Learning for Reading Comprehension

Representation Learning for Reading Comprehension

Russ Salakhutdinov, Machine Learning Department
Carnegie Mellon University, Canadian Institute for Advanced Research

Joint work with Bhuwan Dhingra, Zhilin Yang, Ye Yuan, Junjie Hu,
Hanxiao Liu, and William Cohen

Page 2

Talk Roadmap

•  Multiplicative and Fine-grained Attention

•  Incorporating Knowledge as Explicit Memory for RNNs

•  Generative Domain-Adaptive Nets

Page 3

Who-Did-What Dataset

•  Document: "…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…"

•  Query: President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama's senate seat.

•  Answer: Rod Blagojevich

Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP 2016

Page 4

Introduction

• The cloze-style Q/A task involves tuples of the form (d, q, a):
  - d is a document (paragraph/context)
  - q is a question over the contents of that document
  - a is the answer to this query

• The answer comes from a fixed vocabulary.

• Task: given a document/query pair (d, q), find the answer a.

Page 5

Prior Work

• LSTMs with Attention: Hermann et al., 2015; Hill et al., 2015; Chen et al., 2016; Bahdanau et al., 2014

• Memory Networks: Weston et al., 2014; Sukhbaatar et al., 2015; Bajgar et al., 2016

• Attention Sum Reader: Kadlec et al., 2016; Cui et al., 2016

• Dynamic Entity Representations: Kobayashi et al., 2016

• Neural Semantic Encoders: Munkhdalai & Yu, 2016

• Iterative Attentive Reader: Sordoni et al., 2016

• ReasoNet: Shen et al., 2016

Page 6

Recurrent Neural Network

x1 x2 x3
h1 h2 h3

h_t = f(W x_t + U h_{t-1}): a nonlinearity f applied to the input x_t at time step t and the hidden state h_{t-1} at the previous time step.
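The update on this slide can be written out as a minimal pure-Python sketch (the names `rnn_step`, `W`, `U`, `b` are illustrative, not from the slides):

```python
import math

def rnn_step(x, h_prev, W, U, b):
    # One step of a vanilla RNN: h_t = tanh(W x_t + U h_{t-1} + b),
    # with vectors as lists and matrices as lists of rows.
    def matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]
    Wx = matvec(W, x)
    Uh = matvec(U, h_prev)
    return [math.tanh(a + c + bi) for a, c, bi in zip(Wx, Uh, b)]
```

In practice this runs over the whole token sequence, feeding each output state back in as `h_prev`.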

Page 7

Multiplicative Integration

• Replace f(W x_t + U h_{t-1} + b)

• With f(W x_t ⊙ U h_{t-1} + b)

• Or more generally f(α ⊙ W x_t ⊙ U h_{t-1} + β1 ⊙ U h_{t-1} + β2 ⊙ W x_t + b)

Wu et al., NIPS 2016

Page 8

Multiplicative Integration

• Or more generally f(α ⊙ W x_t ⊙ U h_{t-1} + β1 ⊙ U h_{t-1} + β2 ⊙ W x_t + b)

Wu et al., NIPS 2016
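A sketch of the general multiplicative-integration update, following the formula in Wu et al., 2016 (function and parameter names are illustrative):

```python
import math

def mi_step(x, h_prev, W, U, alpha, beta1, beta2, b):
    # General multiplicative-integration update (Wu et al., 2016):
    #   h_t = tanh(alpha ⊙ Wx ⊙ Uh + beta1 ⊙ Uh + beta2 ⊙ Wx + b)
    def matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]
    Wx, Uh = matvec(W, x), matvec(U, h_prev)
    return [math.tanh(a * wx * uh + b1 * uh + b2 * wx + bb)
            for a, wx, uh, b1, b2, bb in zip(alpha, Wx, Uh, beta1, beta2, b)]
```

With alpha = 0 and beta1 = beta2 = 1 this reduces exactly to the additive update on the previous slide.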

Page 9

Representing Document/Query

• Let d_1, …, d_|D| denote the token embeddings of the document.

• Let q_1, …, q_|Q| denote the token embeddings of the query.

• |D| and |Q| denote the document and query lengths respectively.

Page 10

Representing Document/Query

• A forward RNN reads the sentences from left to right.

• A backward RNN reads the sentences from right to left.

• The hidden states are then concatenated: h_t = [forward h_t ; backward h_t].

Page 11

Representing Document/Query

• Use GRUs to encode a document D and a query Q.

• Note that, for example, Q is a matrix (one column per query token).

• We can then use the Gated Attention mechanism:
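For concreteness, one GRU step of the kind used to encode the document and query might look like this (a textbook GRU, not necessarily the authors' exact implementation; parameter names are assumptions):

```python
import math

def gru_step(x, h, p):
    # One GRU step; p holds weights/biases for the reset gate r, the
    # update gate z, and the candidate state h~:
    #   r = sigmoid(Wr x + Ur h + br),  z = sigmoid(Wz x + Uz h + bz)
    #   h~ = tanh(Wh x + Uh (r ⊙ h) + bh),  h' = (1 - z) ⊙ h + z ⊙ h~
    def matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]
    def sig(v):
        return 1.0 / (1.0 + math.exp(-v))
    r = [sig(a + c + d) for a, c, d in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    z = [sig(a + c + d) for a, c, d in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    htil = [math.tanh(a + c + d) for a, c, d in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    return [(1 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h, htil)]
```

When the update gate z is near zero the state is carried over unchanged, which is what lets GRUs keep information across long documents.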

Page 12

Gated Attention Mechanism

• For each token d_i in D, we form a token-specific representation of the query, q̃_i:
  - use the element-wise multiplication operator to model the interactions between d_i and q̃_i: x_i = d_i ⊙ q̃_i

Dhingra, Liu, Yang, Cohen, Salakhutdinov, ACL 2017
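The token-specific query representation and the gate can be sketched as follows (a simplified single-hop version; dot-product attention scores are assumed):

```python
import math

def gated_attention(d_tokens, q_tokens):
    # For each document token d_i: soft attention over query tokens,
    # token-specific query summary q~_i, then the gate x_i = d_i ⊙ q~_i.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    def softmax(s):
        m = max(s)
        e = [math.exp(v - m) for v in s]
        z = sum(e)
        return [v / z for v in e]
    out = []
    for d in d_tokens:
        alpha = softmax([dot(q, d) for q in q_tokens])
        q_tilde = [sum(a * q[k] for a, q in zip(alpha, q_tokens))
                   for k in range(len(d))]
        out.append([di * qi for di, qi in zip(d, q_tilde)])
    return out
```

The multiplicative gate scales each dimension of the document token by how strongly the query attends to it, rather than just appending query information.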

Page 13

Multi-hop Architecture

• Many QA tasks require reasoning over multiple sentences.
• Need to perform several passes over the context.

Dhingra, Liu, Yang, Cohen, Salakhutdinov, ACL 2017

Page 14

Output Model

• Probability that a particular token in the document answers the query:
  - Take an inner product between the query embedding and the output of the last layer: s_i ∝ exp(q · d_i)

• The probability of a particular candidate c is then aggregated over all document tokens which appear in c:
  P(c | d, q) = Σ_{i ∈ I(c, d)} s_i, where I(c, d) is the set of positions where a token in c appears in the document d.

Pointer Sum Attention of Kadlec et al., 2016
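The pointer-sum aggregation can be sketched like this (candidate positions are passed in explicitly; names are illustrative):

```python
import math

def attention_sum(doc_states, q_emb, candidates):
    # Token probabilities s_i ∝ exp(q · h_i), then summed over the
    # positions where each candidate appears in the document.
    # candidates: dict mapping candidate -> list of token positions.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    scores = [dot(h, q_emb) for h in doc_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    agg = {c: sum(probs[i] for i in pos) for c, pos in candidates.items()}
    pred = max(agg, key=agg.get)
    return pred, agg
```

Summing over repeated mentions is what makes frequently mentioned entities accumulate probability mass.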

Page 15

Output Model

• The probability of a particular candidate c is aggregated over all document tokens which appear in c.

• The candidate with maximum probability is selected as the predicted answer: a* = argmax_c P(c | d, q).

• Use the cross-entropy loss between the predicted probabilities and the true answers.

Page 16

Datasets

• We have looked at 6 datasets.

• The word lookup table was initialized with GloVe vectors (Pennington et al., 2014).

Page 17

Effect of Multiplicative Gating

• Performance of different gating functions on the "Who did What" (WDW) dataset.

Page 18
Page 19

Analysis of Attention

•  Context: "…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…"

•  Query: "President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama's senate seat."

•  Answer: Rod Blagojevich

Attention heatmaps: Layer 1, Layer 2

Page 20

Analysis of Attention

•  Context: "…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…"

•  Query: "President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama's senate seat."

•  Answer: Rod Blagojevich

Attention heatmaps: Layer 1, Layer 2

Code + Data: https://github.com/bdhingra/ga-reader

Page 21

Words vs. Characters

• Word-level representations are good at learning the semantics of the tokens.

• Character-level representations are more suitable for modeling sub-word morphologies ("cat" vs. "cats").
  - Word-level representations are obtained from a learned lookup table.
  - Character-level representations are usually obtained by applying an RNN or CNN.

• Hybrid word-character models have been shown to be successful in various NLP tasks (Yang et al., 2016a; Miyamoto & Cho, 2016; Ling et al., 2015).

• A commonly used method is to concatenate these two representations.

Page 22

Fine-Grained Gating

• Fine-grained gating mechanism: h = g ⊙ c + (1 − g) ⊙ w, where w is the word-level representation, c is the character-level representation, and the gate g is computed from additional features: named entity tags, part-of-speech tags, document frequency vectors, word look-up representations.

Yang et al., ICLR 2017
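A minimal sketch of the gate, assuming a linear-plus-sigmoid gate over the token features (parameter names are mine, not from the paper):

```python
import math

def fine_grained_gate(word_rep, char_rep, features, Wg, bg):
    # Per-dimension gate g = sigmoid(Wg f + bg) from token features;
    # h = g ⊙ char + (1 - g) ⊙ word  (high gate -> character level).
    def matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]
    g = [1.0 / (1.0 + math.exp(-(s + b)))
         for s, b in zip(matvec(Wg, features), bg)]
    return [gi * ci + (1.0 - gi) * wi
            for gi, ci, wi in zip(g, char_rep, word_rep)]
```

Because the gate is a function of linguistic features such as NER and POS tags, rare or morphologically rich tokens can lean on the character representation while frequent ones keep the word embedding.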

Page 23

Children's Book Test (CBT) Dataset

Yang et al., ICLR 2017

Page 24

Words vs. Characters

• High gate values: character-level representations.
• Low gate values: word-level representations.

Yang et al., ICLR 2017

Page 25

Talk Roadmap

•  Multiplicative and Fine-grained Attention

•  Linguistic Knowledge as Explicit Memory for RNNs

•  Generative Domain-Adaptive Nets

Page 26: Representaon Learning for Reading Comprehension

Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.'' She gave me a quick nod and turned back to X

Broad-ContextLanguageModeling

LAMBADA dataset, Paperno et al., 2016

Page 27: Representaon Learning for Reading Comprehension

Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.'' She gave me a quick nod and turned back to X

Broad-ContextLanguageModeling

LAMBADA dataset, Paperno et al., 2016

Page 28: Representaon Learning for Reading Comprehension

Her plain face broke into a huge smile when she saw Terry. “Terry!” she called out. She rushed to meet him and they embraced. “Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.'' She gave me a quick nod and turned back to X

X = Terry

Broad-ContextLanguageModeling

LAMBADA dataset, Paperno et al., 2016

Page 29: Representaon Learning for Reading Comprehension

HerplainfacebrokeintoahugesmilewhenshesawTerry.“Terry!”shecalledout.Sherushedtomeethimandtheyembraced.“Hon,Iwantyoutomeetanoldfriend,OwenMcKenna.Owen,pleasemeetEmily.'’ShegavemeaquicknodandturnedbacktoX

CoreferenceDependencyParses

En)tyrela)ons

Wordrela)ons

CoreNLP

Freebase

WordNet

RecurrentNeuralNetwork

TextRepresenta)on

Incorpora)ngPriorKnowledge

Page 30: Representaon Learning for Reading Comprehension

there ball the left She

kitchen the to went She

football the got Mary

CoreferenceHyper/Hyponymy

RNN

Dhingra, Yang, Cohen, Salakhutdinov 2017

Incorpora)ngPriorKnowledge

Page 31: Representaon Learning for Reading Comprehension

MemoryasAcyclicGraphEncoding(MAGE)-RNN

there ball the left She

kitchen the to went She

football the got Mary

CoreferenceHyper/Hyponymy

RNN

RNN

xt

Mt

h0h1...

ht�1

e1 e|E|. . .

ht

Mt+1gt

Dhingra, Yang, Cohen, Salakhutdinov 2017

Incorpora)ngPriorKnowledge
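A rough sketch of reading memory along incoming edges before the recurrent update (a simplification: MAGE-RNN keeps a separate memory slot per edge type, here the incoming states are just averaged):

```python
def edge_memory_input(x_t, history, incoming, dim):
    # Augment the RNN input at step t with hidden states of earlier
    # tokens linked to token t by incoming edges (e.g. a coreferent
    # antecedent). history: hidden states h_0 .. h_{t-1};
    # incoming: indices of edge sources; dim: hidden state size.
    if incoming:
        g = [sum(history[j][k] for j in incoming) / len(incoming)
             for k in range(dim)]
    else:
        g = [0.0] * dim
    return x_t + g  # concatenated [x_t ; g_t], fed to the RNN update
```

The point is that a coreference edge gives the RNN direct access to the antecedent's state instead of forcing the information through every intermediate step.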

Page 32: Representaon Learning for Reading Comprehension

there ball the left she

kitchen the to went she

football the got Mary

ForwardSubgraph

Forward/BackwardDAG

there ball the left she

kitchen the to went she

football the got Mary

BackwardSubgraph

Page 33: Representaon Learning for Reading Comprehension

there ball the left she

kitchen the to went she

football the got Mary

? football the is Where

Mul)pleSequences

Page 34: Representaon Learning for Reading Comprehension

ResultswithCoreference

• Plantoincorporaterela)onsbeyondcoreference

Model bAbiQA(1ktraining)

LAMBADA CNN

PreviousBest 90.1 49.0 77.9GA 75.2 50.0 77.9GA+MAGE 91.3 51.6 78.6

• AJen)onoveredgetypes

Page 35

Learned Representation

Page 36

Talk Roadmap

•  Multiplicative and Fine-grained Attention

•  Linguistic Knowledge as Explicit Memory for RNNs

•  Generative Domain-Adaptive Nets

Page 37: Representaon Learning for Reading Comprehension

Inmeteorology,precipita)onisanyproductofthecondensa)onofatmosphericwatervaporthatfallsundergravity.Themainformsofprecipita)onincludedrizzle,rain,sleet,snow,andhail…Precipita)onformsassmallerdropletscoalesceviacollisionwithotherraindropsoricecrystalswithinacloud.Short,intenseperiodsofraininscaJeredloca)onsarecalled“showers”

Whatcausesprecipita)ontofall?gravity

• Givenaparagraph/ques)on,extractaspanoftextastheanswer• Expensivetoobtainlargelabeleddatasets• SOTAapproachesrelyonlargelabeleddatasets

Extrac)veQues)onAnswering

SQuAD Dataset, Rajpurkar et al., 2016

Page 38

Leverage Unlabeled Text

• Almost unlimited unlabeled text.

Page 39

Semi-Supervised QA

Diagram: labeled QA pairs and unlabeled text both feed into the QA model.

Page 40

Extractive Question Answering

• Use POS/NER/parsing to extract possible answer chunks
• Anything can be an answer
• We will assume that answers are available.

(Precipitation paragraph and question as on Page 37.)

Page 41

Generating Questions

• Labeled data: (p, q, a)
• Unlabeled data: (p, a), with the question q missing

Generator G: from (p, a), generate q. Seq2seq with copy mechanism.
Discriminator D: from (p, q), predict a. GA reader.
Combine the two to train a QA model.

Page 42

Baseline 1: Context Questions

Generate a context question for the answer "gravity" (precipitation paragraph from above):

What causes precipitation to fall? gravity

Context question: "water vapor that falls under ... The main forms of"
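The baseline on this slide can be sketched with a hypothetical helper that simply returns the words surrounding the answer span:

```python
def context_question(tokens, start, end, window=5):
    # "Context question" baseline sketch (hypothetical helper): the
    # question is just the window of tokens around the answer span
    # [start, end), with the answer itself removed.
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return left + right
```

For the answer "gravity" in the precipitation paragraph this yields roughly "water vapor that falls under ... The main forms of", as on the slide.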

Page 43: Representaon Learning for Reading Comprehension

Baseline2:GANs

paragraph,answer

paragraph,ques)on

Answer(reconstruc)on)

TrueorFakeques)on?

G

DD’

Goodfellow et al., 2014, Ganin et al. 2014 , Xia et al., 2016

Page 44: Representaon Learning for Reading Comprehension

Johnson et al., 2016; Chu et al., 2017

Genera)veDomain-Adap)veNets(GDANs)

LabeledData

TrainD TrainG

Yang, Hu, Salakhutdinov, Cohen, ACL 2017

UnlabeledData

TrainD

Page 45: Representaon Learning for Reading Comprehension

Johnson et al., 2016; Chu et al., 2017

Genera)veDomain-Adap)veNets(GDANs)

LabeledData

UnlabeledData

TrainD TrainD TrainG

GeneratorasaDataDomain

Condi)onDiscriminatorDonDomainsAdversarialtrainingforG

Yang, Hu, Salakhutdinov, Cohen, ACL 2017
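The schedule on these slides can be sketched at a high level (the callables and domain tags below are hypothetical stand-ins, not the paper's actual API):

```python
def gdan_schedule(labeled, unlabeled, train_d, train_g, generate_q):
    # One GDAN round: D is the QA model, G generates questions for
    # unlabeled (p, a) pairs, and a domain tag tells D whether a
    # question is human-written or generated.
    for p, q, a in labeled:                      # train D on labeled data
        train_d((p, q, a), domain="true")
    generated = [(p, generate_q(p, a), a) for p, a in unlabeled]
    for ex in generated:                         # train D on generated data
        train_d(ex, domain="gen")
    train_g(generated)                           # update G using D's feedback
    return generated
```

Conditioning D on the domain tag lets it benefit from generated questions without being misled into treating them as real ones.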

Page 46: Representaon Learning for Reading Comprehension

ExamplesContext:“…anaddi)onalwarmingoftheEarth’ssurface.Theycalculatewith

confidencethatC02hasbeenresponsibleforoverhalftheenhancedgreenhouseeffect.Theypredictthatundera“businessasusual”scenario,…”

Answer:overhalfQuesJon:whattheenhancedgreenhouseeffectthatCO2beenresponsiblefor?GroundTrueQ:Howmuchofthegreenhouseeffectisduetocarbondioxide?

Context:“…in0000,bankamericardwasrenamedandspunoffintoaseparatecompanyknowntodayasvisainc.”

Answer:visainc.QuesJon:whatwastheseparatecompanybankamericard?GroundTrueQ:whatpresent-daycompanydidbankamericardturninto?

Yang, Hu, Salakhutdinov, Cohen, ACL 2017

Page 47: Representaon Learning for Reading Comprehension

SQuADdataset

Labelingrate Method TestF1 ExactMatching

0.1 Supervised 0.3815 0.24920.1 Context 0.4515 0.29660.1 Gen+GAN 0.4373 0.28850.1 GDAN 0.4802 0.32180.5 Supervised 0.5722 0.41870.5 Context 0.5740 0.41950.5 Gen+GAN 0.5590 0.40440.5 GDAN 0.5831 0.4267

• SQuADdataset:87,636training,10,600developmentinstances• Use50Kunlabelledexamples.

Page 48: Representaon Learning for Reading Comprehension

Varia)onalAutoencoder(VAE)• Transformsamplesfromsomesimpledistribu)on(e.g.normal)tothedatamanifold:

Determinis)cneuralnetwork

Genera)veProcess

Knigma and Welling, 2014

Themoviewasawfulandboring

Page 49: Representaon Learning for Reading Comprehension

VAEforTextGenera)on

Hu, Yang, Liang, Salakhutdinov, Xing, 2017

• Samplec,fixz.

Page 50: Representaon Learning for Reading Comprehension

VAEforTextGenera)on

Hu, Yang, Liang, Salakhutdinov, Xing, 2017

• Samplez,fixc.

Page 51

Thank you