-
Introduction to Bayesian Additive Regression Trees for Causal Inference
Nicole Bohme Carnegie, Montana State University, nicole.carnegie@montana.edu
-
Roadmap
◦ What are additive regression trees?
◦ Why Bayesian?
◦ BART for causal inference
-
Regression trees
The building block of BART is the regression tree:
◦ Algorithmic partition of the data into non-overlapping subsets
◦ Goal is to minimize the variance in the response variable within subsets
◦ Resulting regression fit is the mean of the response in each subset
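The variance-minimizing split can be illustrated with a one-covariate search over candidate split points. This is a minimal sketch; `best_split` is a hypothetical helper, not a function from any BART package:

```python
import numpy as np

def best_split(x, y):
    """Find the split point on a single covariate that minimizes the
    sum of squared errors around the subset means (illustrative only)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_point, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical covariate values
        left, right = ys[:i], ys[i:]
        # within-subset variance criterion: SSE around each subset mean
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_point, best_sse = (xs[i - 1] + xs[i]) / 2, sse
    return best_point
```

With data resembling the tree-fitting example on the next slide (a jump near X = 80), the search recovers a split at 80.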
-
Tree fitting example
An example of the construction of a single tree. The data are partitioned first at X = 80, and then at X = 90 among those observations with X > 80. The fit for the tree is the mean of the observations that fall in each terminal node (shaded in grey), and is shown as horizontal line segments on the scatterplot. (from Carnegie & Wu 2020)
-
Why *Additive* regression trees?
◦ A single regression tree will over-emphasize interactions between variables and have difficulty finding linear relationships.
◦ Alternative: fit many small trees using a back-fitting algorithm.
◦ Fit a small tree
◦ Get the fitted values from that tree
◦ Subtract the fitted values from the observed values of the response
◦ Fit another small tree to the residuals
◦ Repeat until some number of small trees have been fit.
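The back-fitting loop above can be sketched as follows, using a fixed one-split "small tree" per iteration to keep the example short. `stump_fit` and `backfit` are hypothetical names for illustration, not the algorithm as implemented in BART software:

```python
import numpy as np

def stump_fit(x, r, split):
    """One-split 'small tree': predict the mean of the residuals r on
    each side of a fixed split point (illustrative stand-in for a tree)."""
    mask = x <= split
    return np.where(mask, r[mask].mean(), r[~mask].mean())

def backfit(x, y, splits):
    """Back-fitting sketch: fit a small tree to the current residuals,
    add its fit to the running total, subtract it from the residuals,
    and repeat -- one small tree per entry in `splits`."""
    residual = y.astype(float).copy()
    fitted = np.zeros_like(residual)
    for s in splits:
        tree_fit = stump_fit(x, residual, s)
        fitted += tree_fit      # accumulate the sum-of-trees fit
        residual -= tree_fit    # the next tree sees only what is left over
    return fitted
```

Each pass explains a little more of the response, so the residual shrinks as trees are added.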
-
Caveat: Overfitting
Without some mechanism to control it, any tree-based model can easily overfit the data.
◦ Boosted regression trees limit tree depth and use shrinkage by multiplying the fit of each tree by some small constant, chosen by cross-validation
◦ A single-tree model might use cross-validation to "prune" branches in the decision tree that are not robust to removal of a few observations from the data
◦ BART sets intelligent priors for the depth of each tree and the shrinkage factor
-
Advantage: Complex response surfaces
From Hill (2011): Left panel is a single binary tree fit to the data; right panel shows the true response curve, the single-tree fit, and the BART fit.
-
The BART model
BART sum-of-trees model:
Y = g(z, x; T1, M1) + … + g(z, x; Tm, Mm) + ε = f(z, x) + ε
◦ Tree model (T, M)
◦ T is a binary tree
◦ M = {µ1, µ2, …, µb} is the vector of means in the b terminal nodes of the tree
◦ g(z, x; T, M): value obtained by following observation (z, x) down the tree and returning the mean for the terminal node in which it lands
◦ ε ~ N(0, σ²)
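The sum-of-trees fit can be sketched as follows. Each tree is stored here as a hypothetical nested dict (an assumption for illustration, not the representation any BART package uses), and the covariate vector x is taken to include the treatment indicator z:

```python
def g(x, tree):
    """Follow an observation x down one tree (T, M) and return the mean
    mu stored in the terminal node where it lands."""
    while "mu" not in tree:  # internal node: keep descending
        branch = "left" if x[tree["var"]] <= tree["cut"] else "right"
        tree = tree[branch]
    return tree["mu"]

def f(x, trees):
    """Sum-of-trees fit: f(x) = g(x; T1, M1) + ... + g(x; Tm, Mm)."""
    return sum(g(x, t) for t in trees)
```

A two-tree example: one stump splitting on the first covariate at 0.5, plus a trivial tree that is a single terminal node.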
BART regularization prior:
◦ Prior preference for trees to be small (few terminal nodes)
◦ Prior shrinking the means Mj toward 0
◦ Prior on σ² suggesting it is somewhat smaller than the residual variance from an OLS regression
-
Why a Bayesian framework?
◦ Data can overcome assumptions about the depth of trees and the shrinkage needed
◦ Number of trees remains as a tuning parameter
◦ Computational benefits from avoiding cross-validation
◦ Embeds what is normally an algorithmic approach in a likelihood framework to produce coherent uncertainty intervals, unusual for machine learning approaches
-
BART and causal inference: why?
Precise modeling of the response surface
◦ More thorough control for confounding than with traditional parametric models.
Straightforward estimation of causal effects from posterior distributions
◦ Average treatment effects
◦ Heterogeneous causal effects
-
◦ Figure from Dorie et al. (2019): Results from a causal inference competition using automated methods.
◦ All methods from the left (excluding Oracle) to SuperLearner use a nonparametric method to fit the response surface.
-
Obtaining posterior distributions
1. Set up a "test set" of covariates
◦ To estimate a SATE: the covariates of all observations, with treatment set to the opposite of observed
2. Fit BART on the observed covariates and response
3. Compute the causal effect of interest from draws of the estimated response for the test set
◦ To estimate a SATE: compute the difference of the estimate for the observed treatment and the counterfactual treatment (changing sign so that the difference is treated − untreated). Average across observations for each draw from the posterior.
◦ Can then plot the posterior, and compute the mean and a credible interval for the ATE
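Step 3 can be sketched as follows, assuming the posterior draws of the estimated response under the observed and counterfactual treatments are already available as arrays. `sate_posterior` is a hypothetical helper for illustration (bartCause performs these steps internally):

```python
import numpy as np

def sate_posterior(y_obs_draws, y_cf_draws, z):
    """Per-draw SATE sketch. y_obs_draws and y_cf_draws are
    (n_draws, n_obs) arrays of posterior predictions under the observed
    and counterfactual treatment; z is the observed treatment indicator.
    The sign is chosen so each difference is treated minus untreated."""
    diff = np.where(z == 1,
                    y_obs_draws - y_cf_draws,   # treated units
                    y_cf_draws - y_obs_draws)   # control units (sign flipped)
    return diff.mean(axis=1)  # average over observations: one SATE per draw

# Posterior summary from the per-draw SATEs:
#   draws = sate_posterior(...)
#   estimate = draws.mean()
#   lo, hi = np.percentile(draws, [2.5, 97.5])   # 95% credible interval
```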
Or use the bartCause package in R, which has a wrapper that completes these steps for you!
-
Ex: Heterogeneous treatment effects
From Carnegie et al. (2019): Left: means and 95% credible intervals of posterior distributions for the ATE of a mindset intervention on student achievement, for each level of an ordered categorical variable (student expectations for success). Right: posterior distribution for the difference in mean effects between the two top levels of future success and the rest. (Simulated data challenge)
-
Tools for BART modeling
There are a number of R packages that fit BART models:
◦ BayesTree: basic BART model
◦ dbarts: expands to include random-effects models and automatic cross-validation
◦ bartCause: wrapper functions using the dbarts implementation that specifically target causal inference
◦ treatSens: includes sensitivity analysis methods for BART models
-
Sensitivity analysis with BART
Figure from Carnegie et al. (2019).
-
In conclusion
◦ Combines the advantages of machine learning with the advantages of formal statistical inference
◦ Computationally (relatively) efficient
◦ Robust implementations for ease of use
◦ Demonstrated success in causal inference challenges
◦ Tools available to assess sensitivity to unmeasured confounding not captured by flexible response surface modeling
-
References
◦ Carnegie, NB, Dorie, V, and Hill, JL. (2019) Examining treatment effect heterogeneity using BART. Observational Studies; 5:52-70.
◦ Carnegie, NB, and Wu, J. (2020) Variable selection and parameter tuning for BART modeling in the Fragile Families Challenge. Socius; in press.
◦ Chipman, H., George, E., and McCulloch, R. (2007) Bayesian ensemble learning. In Advances in Neural Information Processing Systems 19 (B. Scholkopf, J. Platt and T. Hoffman, eds.). MIT Press, Cambridge, MA.
◦ Chipman, H.A., George, E.I., and McCulloch, R.E. (2010) BART: Bayesian additive regression trees. Annals of Applied Statistics; 4:266-298.
◦ Chipman, H, and McCulloch, R. (2010) BayesTree: Bayesian Methods for Tree Based Models. Available from: http://CRAN.R-project.org/package=BayesTree
◦ Dorie, V, Chipman, H, and McCulloch, R. dbarts: Discrete Bayesian Additive Regression Trees Sampler. Available from: http://CRAN.R-project.org/package=dbarts
◦ Dorie, V, and Hill, JL. bartCause: Causal inference using Bayesian Additive Regression Trees. Available from: http://cran.r-project.org/package=bartcause
◦ Dorie, V, Hill, JL, Shalit, U, Scott, M, and Cervone, D. (2019) Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. Statistical Science; 34(1):43-68.
◦ Green, D.P., and Kern, H.L. (2012) Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly; 76:491-511.
◦ Hill, J. (2011) Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics; 20(1):217-240.
◦ Hill, J, and Su, YS. (2013) Assessing lack of common support in causal inference using Bayesian nonparametrics: Implications for evaluating the effect of breastfeeding on children's cognitive outcomes. Annals of Applied Statistics; 7(3):1386-1420.