-
Introduction to Bayesian Additive Regression Trees for Causal Inference
Nicole Bohme Carnegie, Montana State University, nicole.carnegie@montana.edu
-
Roadmap
◦ What are additive regression trees?
◦ Why Bayesian?
◦ BART for causal inference
-
Regression trees
The building block of BART is the regression tree:
◦ Algorithmic partition of the data into non-overlapping subsets
◦ Goal is to minimize the variance in the response variable within subsets
◦ Resulting regression fit is the mean of the response in each subset
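The variance-minimizing split can be illustrated with a one-covariate search over candidate split points. This is a minimal sketch; `best_split` is a hypothetical helper, not a function from any BART package:

```python
import numpy as np

def best_split(x, y):
    """Find the split point on a single covariate that minimizes the
    sum of squared errors around the subset means (illustrative only)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_point, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical covariate values
        left, right = ys[:i], ys[i:]
        # within-subset variance criterion: SSE around each subset mean
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_point, best_sse = (xs[i - 1] + xs[i]) / 2, sse
    return best_point
```

With data resembling the tree-fitting example on the next slide (a jump near X = 80), the search recovers a split at 80.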
-
Tree fitting example
An example of the construction of a single tree. The data are partitioned first at X = 80, and then at X = 90 among those observations with X > 80. The fit for the tree is the mean of the observations that fall in each terminal node (shaded in grey), and is shown as horizontal line segments on the scatterplot. (from Carnegie & Wu 2020)
-
Why *Additive* regression trees?
◦ A single regression tree will over-emphasize interactions between variables and have difficulty finding linear relationships.
◦ Alternative: fit many small trees using a back-fitting algorithm.
◦ Fit a small tree
◦ Get the fitted values from that tree
◦ Subtract the fitted values from the observed values of the response
◦ Fit another small tree to the residuals
◦ Repeat until some number of small trees have been fit.
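The back-fitting loop above can be sketched as follows, using a fixed one-split "small tree" per iteration to keep the example short. `stump_fit` and `backfit` are hypothetical names for illustration, not the algorithm as implemented in BART software:

```python
import numpy as np

def stump_fit(x, r, split):
    """One-split 'small tree': predict the mean of the residuals r on
    each side of a fixed split point (illustrative stand-in for a tree)."""
    mask = x <= split
    return np.where(mask, r[mask].mean(), r[~mask].mean())

def backfit(x, y, splits):
    """Back-fitting sketch: fit a small tree to the current residuals,
    add its fit to the running total, subtract it from the residuals,
    and repeat -- one small tree per entry in `splits`."""
    residual = y.astype(float).copy()
    fitted = np.zeros_like(residual)
    for s in splits:
        tree_fit = stump_fit(x, residual, s)
        fitted += tree_fit      # accumulate the sum-of-trees fit
        residual -= tree_fit    # the next tree sees only what is left over
    return fitted
```

Each pass explains a little more of the response, so the residual shrinks as trees are added.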
-
Caveat: Overfitting
Without some mechanism to control it, any tree-based model can easily overfit the data.
◦ Boosted regression trees limit tree depth and use shrinkage by multiplying the fit of each tree by some small constant, chosen by cross-validation
◦ A single-tree model might use cross-validation to "prune" branches in the decision tree that are not robust to removal of a few observations from the data
◦ BART sets intelligent priors for the depth of each tree and the shrinkage factor
-
Advantage: Complex response surfaces
From Hill (2011): Left panel is a single binary tree fit to the data; right panel shows the true response curve, the single-tree fit, and the BART fit.
-
The BART model
BART sum-of-trees model:
Y = g(z, x; T1, M1) + … + g(z, x; Tm, Mm) + ε = f(z, x) + ε
◦ Tree model (T, M)
◦ T is a binary tree
◦ M = {µ1, µ2, …, µb} is the vector of means in the b terminal nodes of the tree
◦ g(z, x; T, M): value obtained by following observation (z, x) down the tree and returning the mean for the terminal node in which it lands
◦ ε ~ N(0, σ²)
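The sum-of-trees fit can be sketched as follows. Each tree is stored here as a hypothetical nested dict (an assumption for illustration, not the representation any BART package uses), and the covariate vector x is taken to include the treatment indicator z:

```python
def g(x, tree):
    """Follow an observation x down one tree (T, M) and return the mean
    mu stored in the terminal node where it lands."""
    while "mu" not in tree:  # internal node: keep descending
        branch = "left" if x[tree["var"]] <= tree["cut"] else "right"
        tree = tree[branch]
    return tree["mu"]

def f(x, trees):
    """Sum-of-trees fit: f(x) = g(x; T1, M1) + ... + g(x; Tm, Mm)."""
    return sum(g(x, t) for t in trees)
```

A two-tree example: one stump splitting on the first covariate at 0.5, plus a trivial tree that is a single terminal node.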
BART regularization prior:
◦ Prior preference for trees to be small (few terminal nodes)
◦ Prior shrinking the means Mj toward 0
◦ Prior on σ² suggesting it is somewhat smaller than the residual variance from an OLS regression
-
Why a Bayesian framework?
◦ Data can overcome assumptions about the depth of trees and the shrinkage needed
◦ Number of trees remains as a tuning parameter
◦ Computational benefits from avoiding cross-validation
◦ Embeds what is normally an algorithmic approach in a likelihood framework to produce coherent uncertainty intervals, unusual for machine learning approaches
-
BART and causal inference: why?
Precise modeling of the response surface
◦ More thorough control for confounding than with traditional parametric models.
Straightforward estimation of causal effects from posterior distributions
◦ Average treatment effects
◦ Heterogeneous causal effects
-
◦ Figure from Dorie et al. (2019): Results from a causal inference competition using automated methods.
◦ All methods from the left (excluding Oracle) to SuperLearner use a nonparametric method to fit the response surface.
-
Obtaining posterior distributions
1. Set up a "test set" of covariates
◦ To estimate a SATE: the covariates of all observations, with treatment set to the opposite of observed
2. Fit BART on the observed covariates and response
3. Compute the causal effect of interest from draws of the estimated response for the test set
◦ To estimate a SATE: compute the difference of the estimate for the observed treatment and the counterfactual treatment (changing sign so that the difference is treated − untreated). Average across observations for each draw from the posterior.
◦ Can then plot the posterior, and compute the mean and a credible interval for the ATE
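Step 3 can be sketched as follows, assuming the posterior draws of the estimated response under the observed and counterfactual treatments are already available as arrays. `sate_posterior` is a hypothetical helper for illustration (bartCause performs these steps internally):

```python
import numpy as np

def sate_posterior(y_obs_draws, y_cf_draws, z):
    """Per-draw SATE sketch. y_obs_draws and y_cf_draws are
    (n_draws, n_obs) arrays of posterior predictions under the observed
    and counterfactual treatment; z is the observed treatment indicator.
    The sign is chosen so each difference is treated minus untreated."""
    diff = np.where(z == 1,
                    y_obs_draws - y_cf_draws,   # treated units
                    y_cf_draws - y_obs_draws)   # control units (sign flipped)
    return diff.mean(axis=1)  # average over observations: one SATE per draw

# Posterior summary from the per-draw SATEs:
#   draws = sate_posterior(...)
#   estimate = draws.mean()
#   lo, hi = np.percentile(draws, [2.5, 97.5])   # 95% credible interval
```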
Or use the bartCause package in R, which has a wrapper that completes these steps for you!
-
Ex: Heterogeneous treatment effects
From Carnegie et al. (2019): Left: means and 95% credible intervals of posterior distributions for the ATE of a mindset intervention on student achievement, for each level of an ordered categorical variable (student expectations for success). Right: posterior distribution for the difference in mean effects between the two top levels of future success and the rest. (Simulated data challenge)
-
Tools for BART modeling
There are a number of R packages that fit BART models:
◦ BayesTree: basic BART model
◦ dbarts: expands to include random-effects models and automatic cross-validation
◦ bartCause: wrapper functions using the dbarts implementation that specifically target causal inference
◦ treatSens: includes sensitivity analysis methods for BART models
-
Sensitivity analysis with BART
Figure from Carnegie et al. (2019).
-
In conclusion
◦ Combines the advantages of machine learning with the advantages of formal statistical inference
◦ Computationally (relatively) efficient
◦ Robust implementations for ease of use
◦ Demonstrated success in causal inference challenges
◦ Tools available to assess sensitivity to unmeasured confounding not captured by flexible response surface modeling
-
References
◦ Carnegie, NB, Dorie, V, and Hill, JL. (2019) Examining treatment effect heterogeneity using BART. Observational Studies; 5:52-70.
◦ Carnegie, NB, and Wu, J. (2020) Variable selection and parameter tuning for BART modeling in the Fragile Families Challenge. Socius; in press.
◦ Chipman, H., George, E., and McCulloch, R. (2007) Bayesian ensemble learning. In Advances in Neural Information Processing Systems 19 (B. Scholkopf, J. Platt and T. Hoffman, eds.). MIT Press, Cambridge, MA.
◦ Chipman, H.A., George, E.I., and McCulloch, R.E. (2010) BART: Bayesian additive regression trees. Annals of Applied Statistics; 4:266-298.
◦ Chipman, H, and McCulloch, R. (2010) BayesTree: Bayesian Methods for Tree Based Models. Available from: http://CRAN.R-project.org/package=BayesTree
◦ Dorie, V, Chipman, H, and McCulloch, R. dbarts: Discrete Bayesian Additive Regression Trees Sampler. Available from: http://CRAN.R-project.org/package=dbarts
◦ Dorie, V, and Hill, JL. bartCause: Causal inference using Bayesian Additive Regression Trees. Available from: http://cran.r-project.org/package=bartcause
◦ Dorie, V, Hill, JL, Shalit, U, Scott, M, and Cervone, D. (2019) Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. Statistical Science; 34(1):43-68.
◦ Green, D.P., and Kern, H.L. (2012) Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly; 76:491-511.
◦ Hill, J. (2011) Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics; 20(1):217-240.
◦ Hill, J, and Su, YS. (2013) Assessing lack of common support in causal inference using Bayesian nonparametrics: Implications for evaluating the effect of breastfeeding on children's cognitive outcomes. Annals of Applied Statistics; 7(3):1386-1420.