Post on 18-Aug-2020

Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision

Nazneen Rajani
PhD Proposal

November 7, 2016
Committee members: Ray Mooney, Katrin Erk, Greg Durrett and Ken Barker

Outline
• Introduction
• Background & Related Work
• Completed Work
– Stacked Ensembles of Information Extractors for Knowledge Base Population (ACL 2015)
– Stacking With Auxiliary Features (Under review)
– Combining Supervised and Unsupervised Ensembles for Knowledge Base Population (EMNLP 2016)
• Proposed Work
– Short-term proposals
– Long-term proposals

Introduction
• Ensembling: used by the $1M winning team for the Netflix competition

[Figure: System 1, System 2, ..., System N-1 and System N each receive the input; a function f( ) combines their results into a single output.]

Introduction
• Make auxiliary information accessible to the ensemble

[Figure: System 1, System 2, ..., System N-1 and System N each receive the input; f( ) combines their results into a single output and also receives auxiliary information about the task and systems.]

Background and Related Work

Cold Start Slot Filling (CSSF)
• Knowledge Base Population (KBP) is the task of discovering facts about entities and adding them to a KB
• Relation extraction, a KBP sub-task, using a fixed ontology is slot filling
• CSSF is an annual NIST evaluation of building a KB from scratch
- query entities and pre-defined slots
- text corpus

Cold Start Slot Filling (CSSF)
• Some slots are single-valued (per:age) while some are list-valued (per:children)
• Entity types: PER, ORG, GPE
• Along with fills, systems must provide
- confidence score
- provenance: docid:startoffset-endoffset

Cold Start Slot Filling (CSSF)

Query entity: org: Microsoft
Slots:
1. city_of_headquarters:
2. website:
3. subsidiaries:
4. employees:
5. shareholders:

Source text: Microsoft is a technology company, headquartered in Redmond, Washington that develops…

city_of_headquarters: Redmond
provenance:
confidence score: 1.0

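Cold Start Slot Filling (CSSF)

[Figure: CSSF system pipeline. Pipeline stages shown: Query, Query Expansion, Document level IR, Aliasing, Answer, operating over the Source Corpus. Training Data approaches shown: Distant Supervision, Bootstrapping, Universal Schema, Multi-Instance Multi-Learning.]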

Entity Discovery and Linking (EDL)
• KBP sub-task involving two NLP problems
- Named Entity Recognition (NER)
- Disambiguation
• EDL is an annual NIST evaluation in 3 languages: English, Spanish and Chinese
• Tri-lingual Entity Discovery and Linking (TEDL)

Tri-lingual Entity Discovery and Linking (TEDL)
• Detect all entity mentions in corpus
• Link mentions to English KB (FreeBase)
• If no KB entry found, cluster into a NIL ID
• Entity types: PER, ORG, GPE, FAC, LOC
• Systems must also provide confidence score

Tri-lingual Entity Discovery and Linking (TEDL)

Source Corpus Document: Hillary Clinton Not Talking About ’92 Clinton-Gore Confederate Campaign Button..

FreeBase entry: Hillary Diane Rodham Clinton is a US Secretary of State, U.S. Senator, and First Lady of the United States. From 2009 to 2013, she was the 67th Secretary of State, serving under President Barack Obama. She previously represented New York in the U.S. Senate.

FreeBase entry: William Jefferson "Bill" Clinton is an American politician who served as the 42nd President of the United States from 1993 to 2001. Clinton was Governor of Arkansas from 1979 to 1981 and 1983 to 1992, and Arkansas Attorney General from 1977 to 1979.

Tri-lingual Entity Discovery and Linking (TEDL)

[Figure: TEDL system pipeline. Pipeline stages shown: Query, Query Expansion, Candidate Generation and Ranking, Answer, operating over the FreeBase KB. Approaches shown: Unsupervised Similarity, Supervised Classification, Graph Based, Joint Approach.]

ImageNet Object Detection
• Widely known annual competition in CV for large-scale object recognition
• Object detection
- detect all instances of object categories (total 200) in images
- localize using axis-aligned Bounding Boxes (BB)
• Object categories are WordNet synsets
• Systems also provide confidence scores

ImageNet Object Detection

Ensemble Algorithms
• Stacking (Wolpert, 1992)

[Figure: System 1, System 2, ..., System N-1 and System N emit confidences conf 1, conf 2, ..., conf N-1 and conf N, which feed a trained classifier that decides "Accept?".]

Ensemble Algorithms
• Bipartite Graph-based Consensus Maximization (BGCM) (Gao et al., 2009)
- ensembling -> optimization over bipartite graph
- combining supervised and unsupervised models
• Mixtures of Experts (ME) (Jacobs et al., 1991)
- partition the problem into sub-spaces
- learn to switch experts based on input using a gating network
- Deep Mixtures of Experts (Eigen et al., 2013)

Completed Work: I. Stacked Ensembles of Information Extractors for Knowledge Base Population (ACL 2015)

Stacking (Wolpert, 1992)

For a given proposed slot-fill, e.g. spouse(Barack, Michelle), combine confidences from multiple systems:

[Figure: System 1, System 2, ..., System N-1 and System N emit conf 1, conf 2, ..., conf N-1 and conf N, which feed a trained linear SVM that decides "Accept?".]

Stacking with Features

For a given proposed slot-fill, e.g. spouse(Barack, Michelle), combine confidences from multiple systems:

[Figure: conf 1, conf 2, ..., conf N-1 and conf N from System 1 to System N, together with the Slot Type, feed a trained linear SVM that decides "Accept?".]

Stacking with Features

For a given proposed slot-fill, e.g. spouse(Barack, Michelle), combine confidences from multiple systems:

[Figure: conf 1, conf 2, ..., conf N-1 and conf N from System 1 to System N, together with the Slot Type and Provenance, feed a trained linear SVM that decides "Accept?".]
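The input to the stacker can be pictured as one row per proposed slot fill. The sketch below is only an illustration of that layout, not the proposal's implementation: the function name `build_stacker_features`, the use of 0.0 for a system that did not propose the fill, and the one-hot slot-type encoding are all assumptions.

```python
from typing import Dict, List


def build_stacker_features(
    confidences: Dict[str, float],
    provenance_scores: Dict[str, float],
    slot_type: str,
    systems: List[str],
    slot_types: List[str],
) -> List[float]:
    """Build one meta-classifier input row for a single proposed slot fill."""
    # One confidence per system; 0.0 marks a system that did not
    # propose this fill (an assumption of this sketch).
    conf_part = [confidences.get(name, 0.0) for name in systems]
    # One-hot encoding of the slot type (also an assumption).
    slot_part = [1.0 if slot_type == s else 0.0 for s in slot_types]
    # One provenance score per system, 0.0 when the system is silent.
    prov_part = [provenance_scores.get(name, 0.0) for name in systems]
    return conf_part + slot_part + prov_part


systems = ["sys1", "sys2", "sys3"]            # hypothetical system names
slot_types = ["per:age", "per:spouse"]        # tiny hypothetical ontology
row = build_stacker_features(
    confidences={"sys1": 0.9, "sys3": 0.4},
    provenance_scores={"sys1": 0.5, "sys3": 0.5},
    slot_type="per:spouse",
    systems=systems,
    slot_types=slot_types,
)
print(row)
```

Rows like this, labelled correct or incorrect, would then be used to train the linear SVM; the training step itself is not shown here.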

Document Provenance Feature
• For a given query and slot, for each system, i, there is a feature DP_i:
- N systems provide a fill for the slot.
- Of these, n give the same provenance docid as i.
- DP_i = n/N is the document provenance score.
• Measures the extent to which systems agree on the document provenance of the slot fill.
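A minimal sketch of the DP_i = n/N computation follows. It assumes that n counts system i itself, which the slide does not state explicitly; the function name and the docids are hypothetical.

```python
from collections import Counter
from typing import Dict


def document_provenance(docids: Dict[str, str]) -> Dict[str, float]:
    """Map each system to DP_i = n / N for one query and slot.

    `docids` maps every system that provided a fill to the docid it
    cites as provenance. N is the number of such systems; n is the
    number citing the same docid as system i (including i itself,
    which is an assumption of this sketch).
    """
    total = len(docids)                      # N
    per_doc = Counter(docids.values())       # how many systems cite each docid
    return {system: per_doc[docid] / total for system, docid in docids.items()}


dp = document_provenance({"sys1": "doc_A", "sys2": "doc_A", "sys3": "doc_B"})
print(dp)
```

Here two of the three systems cite doc_A, so they each get 2/3, while the system citing doc_B gets 1/3.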

Offset Provenance Feature
• Degree of overlap between systems’ provenance strings.
• Uses the Jaccard similarity coefficient.
• Systems with different docid have zero OP

Offset Provenance Feature

Offsets        System 1   System 2   System 3
Start Offset   1          4          5
End Offset     9          7          12

$OP_1 = \frac{1}{2} \times \left( \frac{4}{9} + \frac{5}{12} \right)$

[Figure: character offsets 1 to 13 on a number line, with the spans of System 2 and System 3 marked.]
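The worked example can be reproduced with a short sketch. It assumes inclusive character offsets and averages the Jaccard coefficient over the other systems, which matches the 1/2 factor in the example with three systems; the exact normalization used in the completed work is not given on the slide, and the names below are hypothetical.

```python
from typing import Dict, Tuple

Span = Tuple[str, int, int]  # (docid, start offset, end offset), inclusive


def jaccard(a: Span, b: Span) -> float:
    """Jaccard overlap of two provenance spans; 0.0 for different docids."""
    if a[0] != b[0]:
        return 0.0
    intersection = min(a[2], b[2]) - max(a[1], b[1]) + 1
    if intersection <= 0:
        return 0.0
    union = (a[2] - a[1] + 1) + (b[2] - b[1] + 1) - intersection
    return intersection / union


def offset_provenance(system: str, spans: Dict[str, Span]) -> float:
    """Average Jaccard overlap of `system` with every other system."""
    others = [name for name in spans if name != system]
    if not others:
        return 0.0
    return sum(jaccard(spans[system], spans[o]) for o in others) / len(others)


spans = {
    "sys1": ("doc_A", 1, 9),
    "sys2": ("doc_A", 4, 7),
    "sys3": ("doc_A", 5, 12),
}
op1 = offset_provenance("sys1", spans)
print(op1)  # equals 1/2 * (4/9 + 5/12)
```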

Results
• Using the 10 common systems between 2013 and 2014

Approach                              Precision  Recall  F1
Union                                 0.176      0.647   0.277
Voting (>=3)                          0.694      0.256   0.374
Best ESF system in 2014 (Stanford)    0.585      0.298   0.395
Stacking                              0.606      0.402   0.483
Stacking + Relation                   0.607      0.406   0.486
Stacking + Provenance + Relation      0.541      0.466   0.501
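As a quick consistency check, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values in the table; because those inputs are already rounded, the recomputed value can differ from the reported one in the third decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (approach, precision, recall, reported F1) copied from the table.
rows = [
    ("Union", 0.176, 0.647, 0.277),
    ("Voting (>=3)", 0.694, 0.256, 0.374),
    ("Best ESF system in 2014 (Stanford)", 0.585, 0.298, 0.395),
    ("Stacking", 0.606, 0.402, 0.483),
    ("Stacking + Relation", 0.607, 0.406, 0.486),
    ("Stacking + Provenance + Relation", 0.541, 0.466, 0.501),
]
for name, p, r, reported in rows:
    print(f"{name}: recomputed {f1_score(p, r):.4f}, reported {reported}")
```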

Takeaways
• Stacked meta-classifier beats the best performing 2014 KBP SF system by an F1 gain of 11 points.
• Features that utilize auxiliary information improve stacking performance.
• Ensembling has clear advantages but naive approaches such as voting do not perform as well.
• Although systems change every year, there are advantages in training on past data.

Completed Work: II. Stacking With Auxiliary Features (under review)

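Stacking With Auxiliary Features (SWAF)
• Stacking using two types of auxiliary features:
- Provenance Features
- Instance Features

[Figure: System 1, System 2, ..., System N-1 and System N emit conf 1, conf 2, ..., conf N-1 and conf N; these, together with the Auxiliary Features (Provenance Features and Instance Features), feed a trained meta-classifier that decides "Accept?".]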

Instance Features
• Enables the stacker to discriminate between input instance types
• Some systems are better at certain input types
• CSSF: slot type (per:age)
• TEDL: entity type (PER/ORG/GPE/FAC/LOC)
• Object detection: object category and SIFT feature descriptors

Provenance Features
• Enables the stacker to discriminate between systems
• Output is reliable if systems agree on source
• CSSF: same as slot filling
• TEDL: measures overlap of a mention

Provenance Features
• Object detection: measure BB overlap
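One common way to measure bounding-box overlap is intersection over union (IoU). The slide does not say which overlap measure is used, so the sketch below is an assumption, as is the (x_min, y_min, x_max, y_max) box layout.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


# Two 2x2 boxes sharing a 1x1 corner: intersection 1, union 7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```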

Post-processing
• CSSF
- single-valued slot fills: resolve conflicts
- list-valued slot fills: always include
• TEDL
- KB ID: include in output
- NIL ID: merge across systems if at least one overlap
• Object detection
- For each system, measure maximum sum overlap with other systems
- Union/intersection: penalized by evaluation metric
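The NIL merging rule for TEDL can be sketched as repeatedly joining clusters that share at least one mention. This is only an illustration under the assumption that each NIL cluster is a set of mention identifiers; the function name and the identifiers are hypothetical.

```python
from typing import Iterable, List, Set


def merge_nil_clusters(clusters: Iterable[Set[str]]) -> List[Set[str]]:
    """Merge NIL clusters from all systems that share at least one mention."""
    merged: List[Set[str]] = []
    for cluster in clusters:
        current = set(cluster)
        kept: List[Set[str]] = []
        for existing in merged:
            if existing & current:
                current |= existing      # overlap: absorb the existing cluster
            else:
                kept.append(existing)    # no overlap: keep it unchanged
        kept.append(current)
        merged = kept
    return merged


# NIL clusters pooled from several systems (hypothetical mention ids).
pooled = [{"m1", "m2"}, {"m3"}, {"m2", "m4"}, {"m5", "m3"}]
result = merge_nil_clusters(pooled)
print(sorted(sorted(c) for c in result))
```

Because the clusters kept in `merged` are always pairwise disjoint, a single pass over them per new cluster is enough.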

Results
• 2015 CSSF: 10 shared systems

Approach                                   Precision  Recall  F1
ME (Jacobs et al., 1991)                   0.479      0.184   0.266
Oracle voting (>=3)                        0.438      0.272   0.336
Top ranked system (Angeli et al., 2015)    0.399      0.306   0.346
Stacking                                   0.497      0.282   0.359
Stacking + instance features               0.498      0.284   0.360
Stacking + provenance features             0.508      0.286   0.366
SWAF                                       0.466      0.331   0.387

Results
• 2015 TEDL: 6 shared systems

Approach                                   Precision  Recall  F1
Oracle voting (>=4)                        0.514      0.601   0.554
ME (Jacobs et al., 1991)                   0.721      0.494   0.587
Top ranked system (Sil et al., 2015)       0.693      0.547   0.611
Stacking                                   0.729      0.528   0.613
Stacking + instance features               0.783      0.511   0.619
Stacking + provenance features             0.814      0.508   0.625
SWAF                                       0.814      0.515   0.630

Results
• 2015 ImageNet object detection: 3 shared systems

Approach                                          Mean AP  Median AP
Oracle voting (>=1)                               0.366    0.368
Best standalone system (VGG + selective search)   0.434    0.430
Stacking                                          0.451    0.441
Stacking + instance features                      0.461    0.45
Mixtures of Experts (Jacobs et al., 1991)         0.494    0.489
Stacking + provenance features                    0.502    0.494
SWAF                                              0.506    0.497

Results on object detection

Takeaways
• SWAF produced SOTA on CSSF and TEDL; significant improvements on object detection
• Our approach is more robust than ME in terms of number of component systems
• Works well for images with multiple instances of the same object

Completed Work: III. Combining Supervised and Unsupervised Ensembles for Knowledge Base Population (EMNLP 2016)

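Combining supervised & unsupervised ensembles

[Figure: Sup System 1, Sup System 2, ..., Sup System N emit conf 1, conf 2, ..., conf N. Unsup System 1, Unsup System 2, ..., Unsup System M are combined by Constrained Optimization (Wang et al., 2013) into a Calibrated conf. The confidences, the Calibrated conf and the Auxiliary Features feed a trained linear SVM that decides "Accept?".]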

Unsupervised ensemble (Wang et al., 2013)
• Approach to aggregate raw confidence values
• Re-weight the confidence score of an instance
- number of systems that produce it
- rank of those systems
• Uniform weights for all systems
• Our work extends to entity linking

Results
• 2015 CSSF: # sup systems = 10, # unsup systems = 13

Approach                                                       Precision  Recall  F1
Constrained optimization                                       0.1712     0.3998  0.2397
Oracle voting (>=3)                                            0.4384     0.2720  0.3357
Top ranked system (Angeli et al., 2015)                        0.3989     0.3058  0.3462
SWAF                                                           0.4656     0.3312  0.3871
BGCM for combining sup + unsup                                 0.4902     0.3363  0.3989
Stacking for combining sup + unsup (BGCM)                      0.5901     0.3021  0.3996
Stacking for combining sup + unsup (constrained optimization)  0.4676     0.4314  0.4489

Results
• 2015 TEDL: # sup systems = 6, # unsup systems = 4

Approach                                                       Precision  Recall  F1
Constrained optimization                                       0.176      0.445   0.252
Oracle voting (>=4)                                            0.514      0.601   0.554
Top ranked system (Sil et al., 2015)                           0.693      0.547   0.611
SWAF                                                           0.813      0.515   0.630
BGCM for combining sup + unsup                                 0.810      0.517   0.631
Stacking for combining sup + unsup (BGCM)                      0.803      0.525   0.635
Stacking for combining sup + unsup (constrained optimization)  0.686      0.624   0.653

Takeaways
• Many high-ranking systems w/o training data
• Approximately 1/3 of possible outputs produced by unsupervised ensemble
• Combination improves recall substantially

Proposed Work: I. Short-term proposals: Semantic Instance-level Features

Instance-level features
• Completed work included only superficial instance features
• Focus more on the instance features: task specific
• Specifically, more semantic features
• Based on the results, these features:
- help improve performance by themselves,
- used along with provenance

EDL instance-level features (Francis et al., 2016)
• Used contextual information to disambiguate entity mentions using CNNs for EDL
• Computes similarities between a mention's source document and its potential entity targets at multiple granularities.
• CNNs: text block to topic vector

EDL instance-level features

Source
Mention: Hillary Clinton
Context: For a host of reasons, Hillary Clinton somehow has failed to develop the rhetorical or interpersonal skills that made her husband, and Barack Obama, so appealing on the campaign trail.
Document: I don't disagree with the gist of Dowd's article. For a host of reasons, Hillary Clinton somehow has failed to develop the rhetorical or interpersonal skills that made her husband, and Barack Obama, so appealing on the campaign trail. Clinton has also, for reasons good and bad, made a number of errors in judgement in her run-up to her current campaign...

Target
Title: Hillary Diane Rodham Clinton
Article: Hillary Clinton is a former United States Secretary of State, U.S. Senator, and First Lady of the United States. From 2009 to 2013, she was the 67th Secretary of State, serving under President Barack Obama. She previously represented New York in the U.S. Senate. Before that, as the wife of President Bill Clinton, she was First Lady from 1993 to 2001. In the 2008 election, Clinton was a leading candidate for the Democratic presidential nomination. A native of Illinois, Hillary Rodham was the first student commencement speaker at Wellesley College in 1969. She then earned a J.D. from Yale Law School in 1973. After a brief stint as a Congressional legal counsel, she moved to Arkansas and married Bill Clinton in 1975. Rodham cofounded the Arkansas Advocates for Children and Families in 1977..

Source granularities: s_mention, s_context, s_document
Target granularities: t_title, t_article

• Example source and target granularities for an instance in the 2016 NIST KBP dataset.

Object detection instance-level features
• ImageNet provides an attributes dataset for certain categories
• Annotated with pre-defined sets of attributes:
- Color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow
- Pattern: spotted, striped
- Shape: long, round, rectangular, square
- Texture: furry, smooth, rough, shiny, metallic, vegetation, wooden, wet

Proposed Work: I. Short-term proposals: Improve Foreign Language KBP

Foreign language features
• This work will only apply to the KBP tasks
• Results on the 2016 TEDL task

Language   Precision  Recall  F1
English    0.805      0.508   0.623
Spanish    0.79       0.443   0.568
Chinese    0.792      0.495   0.609
Combined   0.789      0.481   0.597

Foreign language features
• TEDL: foreign language training data
• Auxiliary features do not translate to Chinese and Spanish
• Straightforward feature: language indicator
• Use language independent features
- non-lexical

Language Independent Entity Linking (LIEL) solution to TEDL (Sil and Florian, 2016)
• Entity category PMI
• Categorical relation frequency
• Title co-occurrence frequency

Proposed Work: II. Long-term proposals: Visual Question Answering

Visual Question Answering (VQA) (Antol et al., 2015)
• Understand how DNNs do object detection

Visual Question Answering (VQA)
• VQA involves both language and vision
• Demonstrate SWAF on VQA
• Ensemble based on the answers
- Multiple choice questions
- Open-ended answers: 90% one-word answers
• Use explanations as auxiliary features

Proposed Work: II. Long-term proposals: Explanations as auxiliary features

Explanation as auxiliary features
• Completed work focused on using provenance
• Captured “where” aspect of the output
• Recent work on generating explanations to interpret DNNs:
- Towards Transparent AI systems
- Generating visual explanations
- Visual Question Answering (VQA)
• DARPA program for explainable AI (XAI)

Explanation as auxiliary features
• Use explanations as auxiliary features
• Capture “why” aspect of the output
• Two types of explanations:
- Textual
- Visual

Text as Explanation (Hendricks et al., 2016)
• Generating visual explanations
• Jointly predict visual class and generate text as explanation
• Uses descriptive properties visible in the image

Text as Explanation

[Figure: an input image with the textual explanations generated by System A (Berkeley) and System B.]
This is a Kentucky warbler because this is a yellow bird with a short tail
This is a Kentucky warbler because this is a yellow bird with a black cheek patch and a black crown

Text as Explanation
• Trust agreement between systems with similar explanations
• MT metrics: BLEU/METEOR for similarity
• Minimum Bayes Risk (MBR) decoding
• Embeddings of words in the explanation
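To make the idea of explanation similarity concrete, the sketch below scores two explanations by token-level Jaccard overlap. This is a deliberately simple stand-in chosen for illustration; it is not BLEU, METEOR or MBR decoding, and the function names are hypothetical.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercased set of alphabetic tokens in an explanation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def token_jaccard(explanation_a: str, explanation_b: str) -> float:
    """Token-set Jaccard overlap, a crude proxy for an MT-style metric."""
    a, b = tokens(explanation_a), tokens(explanation_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


first = "This is a Kentucky warbler because this is a yellow bird with a short tail"
second = ("This is a Kentucky warbler because this is a yellow bird "
          "with a black cheek patch and a black crown")
similarity = token_jaccard(first, second)
print(similarity)
```

A score like this could be supplied to the stacker as an auxiliary feature for each pair of systems, with a proper MT metric substituted for the Jaccard proxy.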

Images as Explanation
• DNNs attend to relevant parts of the image while doing VQA (Goyal et al., 2016)
• Heat-map to visualize attention in images
• Humans trust systems with better explanations more even when they all predict the same output (Selvaraju et al., 2016)
• Enable the stacker to learn to rely on systems that “look” at the right region of the image while predicting the answer

Images as Explanation

[Figure: an input image alongside the visual explanations of System A and System B.]
Q: What color is the cat?
A (System A): Brown
A (System B): Brown

Images as Explanation
• Use visual explanation to improve VQA
• Measure agreement between systems’ heat-maps
- KL-divergence
- Measure correlation
• Using visual explanation
- improve performance
- model with better explanations
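Agreement between two heat-maps could be measured, for instance, by normalizing each map into a probability distribution over pixels and computing the KL-divergence. The sketch below assumes heat-maps are same-sized 2D lists of non-negative values and adds a small `eps` for smoothing; both choices, and the function names, are assumptions made for illustration.

```python
import math
from typing import List

HeatMap = List[List[float]]


def to_distribution(heat_map: HeatMap, eps: float = 1e-8) -> List[float]:
    """Flatten a heat-map and normalize it to sum to 1 (with smoothing)."""
    flat = [value + eps for row in heat_map for value in row]
    total = sum(flat)
    return [value / total for value in flat]


def kl_divergence(map_p: HeatMap, map_q: HeatMap) -> float:
    """KL(P || Q) between two heat-maps viewed as pixel distributions."""
    p, q = to_distribution(map_p), to_distribution(map_q)
    if len(p) != len(q):
        raise ValueError("heat-maps must have the same number of pixels")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


system_a = [[0.9, 0.1], [0.0, 0.0]]   # attends to the top-left region
system_b = [[0.0, 0.0], [0.1, 0.9]]   # attends to the bottom-right region
print(kl_divergence(system_a, system_a))  # identical maps give 0.0
print(kl_divergence(system_a, system_b))  # disagreeing maps give a large value
```

KL-divergence is asymmetric, so a symmetric variant or a correlation measure may be preferable when neither system is treated as the reference.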

Conclusion
• General problem of combining outputs from diverse systems
• SWAF on three difficult tasks
• Provenance captures “where” of the output
• Combining supervised and unsupervised ensembles improves recall
• Short-term: better auxiliary features
• Long-term: focus on “why” of the output

Thank You! Questions?

Backup slides

Results on CSSF

Results on TEDL

Number of systems in 2016

        Supervised                   Unsupervised
        English  Chinese  Spanish    English  Chinese  Spanish
TEDL    5        4        4          7        3        3
CSSF    8        2        3          8        1        0

Learning Curve
• Systems change each year.
• Still useful to train on past data.

Incremental Training on Systems
• Sort the common systems based on their performance.
• Train the classifier adding one system at each step.
• Test on 2014 data.

Unsupervised ensemble
• Mutual exclusion property
• For list-valued slot fills, replace 1 by (avg no. of correct slot fills) / (total no. of slot fills)
• For entity linking, 1 is replaced with (avg no. of correct mentions for an entity type) / (total no. of mentions for that entity type)

Ratio of sup and unsup systems
• Unsupervised ~1/3 of the combination
• Common output: 22% for CSSF and 15% for TEDL

KBP instance-level features
• Embed the words in a d-dimensional space
- d = 300 with window size = 21
• Words vector using a conv-net filter Mg
• Similar semantic features between query document and provenance document for the CSSF task

Language Independent Entity Linking (LIEL) solution to TEDL (Sil and Florian, 2016)
• Entity category PMI
- Calculates the PMI between a pair of entities (e1, e2) that co-occur in a document
• Categorical relation frequency
- Counts the number of KB relations that exist between a pair of entities (e1, e2)
• Title co-occurrence frequency
- For every pair of consecutive entities (e, e’), computes the number of times e’ appears as a link in the KB page for e
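The PMI feature can be illustrated with a small sketch that estimates probabilities from document frequencies. How LIEL actually estimates these probabilities is not stated on the slide, so the estimation scheme, the function name and the toy documents are assumptions.

```python
import math
from typing import List, Set


def entity_pmi(e1: str, e2: str, documents: List[Set[str]]) -> float:
    """PMI(e1, e2) = log( p(e1, e2) / (p(e1) * p(e2)) ).

    Probabilities are estimated as the fraction of documents that
    mention the entity (or both entities); this is an assumption.
    """
    n = len(documents)
    count_1 = sum(1 for doc in documents if e1 in doc)
    count_2 = sum(1 for doc in documents if e2 in doc)
    count_both = sum(1 for doc in documents if e1 in doc and e2 in doc)
    if count_both == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log((count_both / n) / ((count_1 / n) * (count_2 / n)))


# Each set holds the entities linked in one document (toy data).
docs = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]
pmi_ab = entity_pmi("A", "B", docs)
print(pmi_ab)  # log(0.5 / (0.75 * 0.5)) = log(4/3)
```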