
Intro to Causality
David Madras

October 22, 2019

Simpson’s Paradox

The Monty Hall Problem

1. Three doors – 2 have goats behind them, 1 has a car (you want to win the car)
2. You choose a door, but don’t open it
3. The host, Monty, opens another door (not the one you chose), and shows you that there is a goat behind that door
4. You now have the option to switch your door from the one you chose to the other unopened door
5. What should you do? Should you switch? (See the simulation sketch below.)
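A quick Monte Carlo sketch settles the question empirically. The `play` helper and the door encoding are invented for illustration:

```python
# Monte Carlo check of the Monty Hall problem: a minimal sketch.
# Simulates many games and compares win rates for staying vs. switching.
import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    choice = random.choice(doors)
    # Monty opens a door that is neither the player's choice nor the car.
    opened = random.choice([d for d in doors if d != choice and d != car])
    if switch:
        # Switch to the remaining unopened door.
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == car

n = 100_000
print("stay  :", sum(play(False) for _ in range(n)) / n)  # ~1/3
print("switch:", sum(play(True) for _ in range(n)) / n)   # ~2/3
```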


What’s Going On?

Causation != Correlation

• In machine learning, we try to learn correlations from data
  • “When can we predict X from Y?”
• In causal inference, we try to model causation
  • “When does X cause Y?”
• These are not the same!
  • Ice cream consumption correlates with murder rates
  • Ice cream does not cause murder (usually)

Correlations Can Be Misleading

https://www.tylervigen.com/spurious-correlations

Causal Modelling

• Two options:
  1. Run a randomized experiment
  2. Make assumptions about how our data is generated

Causal DAGs

• Pioneered by Judea Pearl
• Describes the (stochastic) generative process of the data

Causal DAGs

[DAG: X → T, X → Y, T → Y]

• T is a medical treatment
• Y is a disease
• X are other features about the patients (say, age)
• We want to know the causal effect of our treatment on the disease.

Causal DAGs

• Experimental data: from a randomized experiment
  • We decide which people should take T
• Observational data: no experiment
  • People chose whether or not to take T
• Experiments are expensive and rare
• Observational data can be biased
  • E.g. what if mostly young people choose T?

Asking Causal Questions

• Suppose T is binary (1: received treatment, 0: did not)
• Suppose Y is binary (1: disease cured, 0: disease not cured)
• We want to know: “If we give someone the treatment (T = 1), what is the probability they are cured (Y = 1)?”
• This is not equal to P(Y = 1 | T = 1)
  • Suppose mostly young people take the treatment, and most were cured, i.e. P(Y = 1 | T = 1) is high
  • Is this because the treatment is good? Or because they are young? (A simulation sketch follows below.)
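To make this concrete, here is a minimal simulation sketch of that story; all mechanisms and numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical generative process for the DAG X -> T, X -> Y, T -> Y.
X = rng.binomial(1, 0.5, n)                       # X = 1: young, X = 0: old
T = rng.binomial(1, np.where(X == 1, 0.8, 0.2))   # young people take T more often
Y = rng.binomial(1, 0.2 + 0.1 * T + 0.5 * X)      # T helps a little; youth helps a lot

print("P(Y=1 | T=1) =", Y[T == 1].mean())  # ~0.70
print("P(Y=1 | T=0) =", Y[T == 0].mean())  # ~0.30
# The naive gap is ~0.40, but the true effect of T is only +0.10;
# the rest is the age confounder.
```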

Correlation vs. Causation

• Correlation: P(Y = 1 | T = 1)
  • In the observed data, how often do people who take the treatment become cured?
  • The observed data may be biased!!

Correlation vs. Causation

• Let’s simulate a randomized experiment, i.e.
  • Cut the arrow from X to T
  • This is called a do-operation
• Then, we can estimate causation: P(Y = 1 | do(T = 1))

Correlation vs. Causation

• Correlation: P(Y = 1 | T = 1)
• Causation: P(Y = 1 | do(T = 1)) – the treatment is independent of X (a simulation sketch follows below)
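Continuing the hypothetical simulation from above, the do-operation corresponds to assigning T by coin flip, which cuts the arrow X → T:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Same hypothetical mechanisms as before, but now we intervene on T:
# do(T) replaces T's mechanism with a coin flip, cutting the arrow X -> T.
X = rng.binomial(1, 0.5, n)
T = rng.binomial(1, 0.5, n)                       # randomized, independent of X
Y = rng.binomial(1, 0.2 + 0.1 * T + 0.5 * X)

print("P(Y=1 | do(T=1)) =", Y[T == 1].mean())  # ~0.55
print("P(Y=1 | do(T=0)) =", Y[T == 0].mean())  # ~0.45
# The gap now matches the true causal effect, +0.10.
```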

Inverse Propensity Weighting

• Can calculate this using inverse propensity scores
  • Rather than adjusting for X, it is sufficient to adjust for the propensity score P(T | X)
• Weighting samples by P(T) / P(T | X) rather than 1 / P(T | X) gives the so-called stabilized weights (a sketch follows below)
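A minimal sketch of the IPW estimate on the same hypothetical data; since X is binary here, the propensities are estimated by simple group averages:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical confounded observational data, as in the earlier sketches.
X = rng.binomial(1, 0.5, n)
T = rng.binomial(1, np.where(X == 1, 0.8, 0.2))
Y = rng.binomial(1, 0.2 + 0.1 * T + 0.5 * X)

# Propensity scores e(X) = P(T=1 | X), estimated by group averages.
e = np.where(X == 1, T[X == 1].mean(), T[X == 0].mean())

# Inverse propensity weighting: E[1{T=t} * Y / P(T=t | X)] = P(Y=1 | do(T=t)).
p1 = np.mean(T * Y / e)
p0 = np.mean((1 - T) * Y / (1 - e))
print("IPW estimate of the effect:", p1 - p0)  # ~0.10, the true effect
# Stabilized weights would use P(T=t) / P(T=t | X) instead of 1 / P(T=t | X),
# which lowers the variance of the weights.
```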

Matching Estimators

• Match up samples with different treatments that are near to each other
• Similar to reweighting (a sketch follows below)
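A sketch of a nearest-neighbour matching estimator on hypothetical data with a continuous confounder (age); everything here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: age confounds treatment choice and outcome.
age = rng.uniform(20, 80, n)
T = rng.binomial(1, 1 / (1 + np.exp((age - 50) / 10)))   # younger -> more likely treated
Y = 0.1 * T + 0.01 * (80 - age) + rng.normal(0, 0.1, n)  # true effect of T: +0.1

treated = np.where(T == 1)[0]
control = np.where(T == 0)[0]
# Match each treated unit to the control unit nearest in age,
# then average the within-pair outcome differences.
nearest = np.abs(age[treated][:, None] - age[control][None, :]).argmin(axis=1)
print("matching estimate:", np.mean(Y[treated] - Y[control[nearest]]))  # ~0.10
```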

Review: What to do with a causal DAG

The causal effect of T on Y is

P(Y | do(T = t)) = Σ_x P(Y | T = t, X = x) P(X = x)

This is great! But we’ve made some assumptions.

Simpson’s Paradox, Explained

[DAG: Size → Trmt, Size → Y, Trmt → Y]

• Size is a confounder: it affects both which treatment is chosen and the outcome, so the trend in the aggregated data can be the reverse of the trend within every size group. (A numeric sketch follows below.)
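Hypothetical counts, in the style of the classic kidney-stone example, that produce exactly this reversal:

```python
# cured[(treatment, size)] = (number cured, number treated); all counts hypothetical.
cured = {("A", "small"): (81, 87),   ("B", "small"): (234, 270),
         ("A", "large"): (192, 263), ("B", "large"): (55, 80)}

for size in ("small", "large"):
    for t in ("A", "B"):
        c, n = cured[(t, size)]
        print(f"{t}, {size}: {c}/{n} = {c/n:.0%}")     # A wins in BOTH strata

for t in ("A", "B"):
    c = sum(cured[(t, s)][0] for s in ("small", "large"))
    n = sum(cured[(t, s)][1] for s in ("small", "large"))
    print(f"{t}, overall: {c}/{n} = {c/n:.0%}")        # ...yet B wins overall
```

The reversal happens because treatment A is given mostly to the hard (large) cases: Size influences both Trmt and Y.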

Monty Hall Problem, Explained

• Boring explanation: enumerate the cases; staying wins the car 1/3 of the time, switching wins 2/3 of the time.

• Causal explanation: my door location is correlated with the car location, conditioned on which door Monty opens!
  • This is because Monty won’t show me the car
  • If he’s also guessing, then the correlation disappears

[DAG: Car Location → Monty’s Door ← My Door; Monty’s door is a collider, and conditioning on it induces the correlation]

https://twitter.com/EpiEllie/status/1020772459128197121

Structural Assumptions

• All of this assumes that our assumptions about the DAG that generated our data are correct
• Specifically, we assume that there are no hidden confounders
  • Confounder: a variable which causally affects both the treatment (T) and the outcome (Y)
  • No hidden confounders means that we have observed all confounders
• This is a strong assumption!

Hidden Confounders

[DAG: X → T, X → Y, T → Y, plus an unobserved confounder U with U → T and U → Y]

• Cannot calculate P(Y | do(T)) here, since U is unobserved
• We say in this case that the causal effect is unidentifiable
  • Even in the case of infinite data and computation, we can never calculate this quantity

What Can We Do with Hidden Confounders?

• Instrumental variables
  • Find some variable which affects only the treatment
• Sensitivity analysis
  • Essentially, assume some maximum amount of confounding
  • Yields a confidence interval
• Proxies
  • Other observed features give us information about the hidden confounder

Instrumental Variables

• Find an instrument – a variable which only affects the treatment
  • Decouples the treatment and outcome variation
• With linear functions, solve analytically
  • But can also use any function approximator (a sketch of the linear case follows below)
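A minimal sketch of the linear (analytic) case, with invented coefficients; Z is the instrument and U the hidden confounder:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear model: Z affects only T; U confounds T and Y.
Z = rng.normal(size=n)                        # instrument
U = rng.normal(size=n)                        # hidden confounder
T = 1.0 * Z + 1.0 * U + rng.normal(size=n)
Y = 0.5 * T + 1.0 * U + rng.normal(size=n)    # true causal effect: 0.5

# Naive regression of Y on T is biased upward by U.
print("OLS slope:", np.polyfit(T, Y, 1)[0])                   # ~0.83

# IV (Wald) estimator: only the Z-driven variation in T is used.
print("IV  slope:", np.cov(Z, Y)[0, 1] / np.cov(Z, T)[0, 1])  # ~0.50
```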

Sensitivity Analysis

[DAG: Gene → Smoking and Gene → Cancer (the hypothesized confounding path), Smoking → Cancer (the effect in question), plus observed features X]

• Determine the relationship between the strength of confounding and the causal effect
• Example: does smoking cause lung cancer? (We now know: yes)
  • There may be a gene that causes lung cancer and smoking
  • We can’t know for sure!
  • However, we can figure out how strong this gene would need to be to produce the observed effect
  • Turns out – very strong

Sensitivity Analysis

• The idea is: parametrize your uncertainty, and then decide which values of that parameter are reasonable (a simulation sketch follows below)
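A crude simulation sketch of that idea for the smoking example: posit a gene of varying strength with no true smoking effect, and see how large an apparent risk ratio the gene alone can manufacture. All numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

for strength in (2, 5, 10, 20):
    G = rng.binomial(1, 0.3, n)                       # hypothetical gene
    # The gene multiplies both the chance of smoking and the risk of cancer;
    # smoking itself has NO effect on cancer in this simulated world.
    smokes = rng.binomial(1, np.where(G == 1, 0.5, 0.5 / strength))
    cancer = rng.binomial(1, np.where(G == 1, 0.02 * strength, 0.02))
    rr = cancer[smokes == 1].mean() / cancer[smokes == 0].mean()
    print(f"gene strength {strength:2d}x -> apparent risk ratio {rr:.1f}")
# Even a 20x gene yields an apparent risk ratio of only ~4, far below the
# very large observed smoking-cancer association, so in this parametrization
# pure confounding is implausible.
```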

Using Proxies

[DAG: as above, plus U → V; U is unobserved, but the proxy V is observed]

• Instead of measuring the hidden confounder, measure some proxies (V = f_prox(U))
  • Proxies: variables that are caused by the confounder
  • If U is a child’s age, V might be their height
• If f_prox is known or linear, we can estimate the effect (a sketch follows below)
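A minimal sketch of the linear-proxy case, with invented mechanisms: since f_prox is known and invertible here, we can recover U from V and adjust for it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setup: U (a child's age) is hidden; V = f_prox(U) = 2U + 5
# (height, say) is observed, and f_prox is known and linear.
U = rng.normal(size=n)
V = 2.0 * U + 5.0
T = rng.binomial(1, 1 / (1 + np.exp(-U)))      # U drives the treatment
Y = 0.5 * T + 1.0 * U + rng.normal(size=n)     # true effect of T: 0.5

# Invert the known proxy map and adjust for the recovered confounder.
U_hat = (V - 5.0) / 2.0
design = np.column_stack([np.ones(n), T, U_hat])
beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
print("effect estimate adjusting for the proxy:", beta[1])  # ~0.50
```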

Using Proxies

• If f_prox is non-linear, we might try the Causal Effect VAE
  • Learn a posterior distribution P(U | V) with variational methods
  • However, this method does not provide theoretical guarantees
  • Results may be unverifiable: proceed with caution!

Causality and Other Areas of ML

• Reinforcement learning
  • A natural combination – RL is all about taking actions in the world
  • Off-policy learning already has elements of causal inference
• Robust classification
  • Causality can be a natural language for specifying distributional robustness
• Fairness
  • If the dataset is biased, ML outputs might be unfair
  • Causality helps us think about dataset bias, and mitigate unfair effects

Quick Note on Fairness and Causality

• Many fairness problems (e.g. loans, medical diagnosis) are actually causal inference problems!
• We talk about the label Y – however, this is not always observable
  • For instance, we can’t know if someone would return a loan if we don’t give one to them!
• This means that if we just train a classifier on historical data, our estimate will be biased
  • Biased in the fairness sense and the technical sense
• General takeaway: if your data is generated by past decisions, think very hard about the output of your ML model!

Feedback Loops

• Takes us to part 2… feedback loops
• When ML systems are deployed, they make many decisions over time
• So our past predictions can impact our future predictions!
• Not good

Unfair Feedback Loops

• We’ll look at “Fairness Without Demographics in Repeated Loss Minimization” (Hashimoto et al., ICML 2018)
• Domain: recommender systems
• Suppose we have a majority group (A = 1) and a minority group (A = 0)
• Our recommender system may have high overall accuracy but low accuracy on the minority group
• This can happen due to empirical risk minimization (ERM)
• It can also be due to repeated decision-making

Repeated Loss Minimization

• When we give bad recommendations, people leave our system
• Over time, the low-accuracy group will shrink (a toy simulation follows below)
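A toy sketch of this dynamic; the retention model and all numbers are invented. Per-group accuracy tracks the group’s share of the data, and users stay in proportion to their accuracy:

```python
# Toy feedback loop: ERM serves the larger group better, badly-served
# users leave, and the minority's share keeps shrinking.
minority_share = 0.30
for step in range(6):
    acc_min = 0.5 + 0.4 * minority_share         # crude stand-ins for the
    acc_maj = 0.5 + 0.4 * (1 - minority_share)   # per-group accuracy of ERM
    stay_min = minority_share * acc_min          # users stay in proportion
    stay_maj = (1 - minority_share) * acc_maj    # to their accuracy
    minority_share = stay_min / (stay_min + stay_maj)
    print(f"step {step}: minority share = {minority_share:.2f}")
```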

Distributionally Robust Optimization

• Upweight examples with high loss in order to improve the worst case
  • In the long run, this will prevent clusters from being underserved
• This ends up being equal to a dual objective of the form min_η C · sqrt(E[(ℓ - η)₊²]) + η, where C depends on the assumed minimum group size (see Hashimoto et al. for the exact dual; a sketch of the reweighting idea follows below)
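A minimal sketch of the reweighting idea, not the paper’s exact objective; the threshold `eta` and the per-example losses are invented:

```python
import numpy as np

def dro_surrogate(losses: np.ndarray, eta: float) -> float:
    """Squared-hinge surrogate: only losses above eta contribute."""
    return float(np.mean(np.maximum(losses - eta, 0.0) ** 2))

losses = np.array([0.1, 0.2, 0.15, 2.0, 1.8])   # hypothetical per-example losses
print("ERM (average) loss:", losses.mean())
print("DRO surrogate (eta = 0.5):", dro_surrogate(losses, 0.5))
# Gradients of the surrogate come only from the two high-loss examples,
# i.e. the underserved group gets upweighted relative to ERM.
```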

Conclusion

• Your data is not what it seems
  • ML models only work if your training/test set actually looks like the environment you deploy them in
  • This can make your results unfair
  • Or just incorrect
• So examine your model assumptions and data collection carefully!