Reasoning with Cause and Effect

Judea Pearl

AI Magazine Volume 23, Number 1 (2002). Copyright © 2002, American Association for Artificial Intelligence. All rights reserved.

■ This article is an edited transcript of a lecture given at IJCAI-99, Stockholm, Sweden, on 4 August 1999. The article summarizes concepts, principles, and tools that were found useful in applications involving causal modeling. The principles are based on structural-model semantics in which functional (or counterfactual) relationships representing autonomous physical processes are the fundamental building blocks. The article presents the conceptual basis of this semantics, illustrates its application in simple problems, and discusses its ramifications to computational and cognitive problems concerning causation.

The subject of my lecture this evening is causality.1 It is not an easy topic to speak about, but it is a fun topic to speak about.

It is not easy because, like religion, sex, and intelligence, causality was meant to be practiced, not analyzed. It is fun because, like religion, sex, and intelligence, emotions run high; examples are plenty; there are plenty of interesting people to talk to; and above all, the experience of watching our private thoughts magnified under the microscope of formal analysis is exhilarating.

From Hume to AI

The modern study of causation begins with the Scottish philosopher David Hume (figure 1). Hume introduced to philosophy three revolutionary ideas that, today, are taken for granted by almost everybody, not only philosophers: First, he made a sharp distinction between analytic and empirical claims—analytic claims are the product of thoughts; empirical claims are matters of fact. Second, he classified causal claims as empirical rather than analytic. Third, he identified the source of all empirical claims with human experience, namely, sensory input.

Putting ideas 2 and 3 together has left philosophers baffled for over two centuries over two major riddles: First, what empirical evidence legitimizes a cause-effect connection? Second, what inferences can be drawn from causal information and how?

We in AI have the audacity to hope that today, after two centuries of philosophical debate, we can say something useful on this topic because for us, the question of causation is not purely academic. We must build machines that make sense of what goes on in their environment so they can recover when things do not turn out exactly as expected. Likewise, we must build machines that understand causal talk when we have the time to teach them what we know about the world because the way we communicate about the world is through this strange language called causation.

This pressure to build machines that both learn about, and reason with, cause and effect, something that David Hume did not experience, now casts new light on the riddles of causation, colored with an engineering flavor: How should a robot acquire causal information from the environment? How should a robot process causal information received from its creator-programmer?

I do not touch on the first riddle because David Heckerman (1999) covered this topic earlier, both eloquently and comprehensively. I want to discuss primarily the second problem—how we go from facts coupled with causal premises to conclusions that we could not obtain from either component alone.

On the surface, the second problem sounds trivial: One can simply take the causal rules, apply them to the facts, and derive the conclusions by standard logical deduction. However, it is not as trivial as it sounds. The exercise of drawing the proper conclusions from causal input has met with traumatic experiences in AI. One of my favorite examples is described in the following hypothetical man-machine dialogue:



Input:
1. If the grass is wet, then it rained.
2. If we break this bottle, the grass will get wet.
Output: If we break this bottle, then it rained.

Another troublesome example (Lin 1995) is illustrated in the following dialogue:

Input:
1. A suitcase will open iff both locks are open.
2. The right lock is open.
3. The suitcase is closed.
Query: What if we open the left lock?
Output: The right lock might get closed.

In these two examples, the strange output is derived from solid logical principles, chaining in the first, constraint satisfaction in the second, yet we feel that there is a missing ingredient that the computer did not quite grasp, something to do with causality. Evidently there is some valuable information conveyed by causal vocabulary that is essential for correct understanding of the input. What is this information, and what is that magic logic that should permit a computer to select the right information, and what is the semantics behind such logic? It is this sort of question that I would like to address in this talk because I know that many in this audience are dealing with such questions and have made promising proposals for answering them. Most notable are people working in qualitative physics, troubleshooting, planning under uncertainty, modeling behavior of physical systems, constructing theories of action and change, and perhaps even those working in natural language understanding because our language is loaded with causal expressions. Since 1990, I have examined many (though not all) of these proposals, together with others that have been suggested by philosophers and economists, and I have extracted from them a set of basic principles that I would like to share with you tonight.

Figure 1. David Hume, 1711–1776.

The Basic Principles

I am now convinced that the entire story of causality unfolds from just three basic principles: (1) causation encodes behavior under interventions, (2) interventions are surgeries on mechanisms, and (3) mechanisms are stable functional relationships.

The central theme is to view causality as a computational scheme devised to facilitate prediction of the effects of actions. I use the term intervention here, instead of action, to emphasize that the role of causality can best be understood if we view actions as external entities, originating from outside our theory, not as a mode of behavior within the theory.

To understand the three principles, it is better to start from the end and go backwards: (3) The world is modeled as an assembly of stable mechanisms, or physical laws, that are sufficient for determining all events that are of interest to the modeler. The mechanisms are autonomous, like mechanical linkages in a machine or logic gates in electronic circuits—we can change one without changing the others. (2) Interventions always involve the breakdown of mechanisms. I call this breakdown a surgery to emphasize its dual painful-remedial character. (1) Causal relationships tell us which mechanism is to be surgically modified by any given action.

Causal Models

These principles can be encapsulated neatly and organized in a mathematical object called a causal model. In general, the purpose of a model is to assign truth values to sentences in a given language. If models in standard logic assign truth values to logical formulas, causal models embrace a wider class of sentences, including phrases that we normally classify as “causal.” These include:

Actions: B will be true if we do A.

Counterfactuals: B would be different if A were true.

Explanation: B occurred because of A.

There could be more, but I concentrate on these three because they are commonly used and because I believe that all other causal sentences can be reduced to these three. The difference between actions and counterfactuals is merely that in counterfactuals, the clash between the antecedent and the current state of affairs is explicit.

To allay any fear that a causal model is some formidable mathematical object, let me exemplify the beast with two familiar examples. Figure 2 shows a causal model we all remember from high school—a circuit diagram. There are four interesting points to notice in this example:

First, it qualifies as a causal model because it contains the information to confirm or refute all action, counterfactual, and explanatory sentences concerning the operation of the circuit. For example, anyone can figure out what the output would be if we set Y to zero, if we replace a certain OR gate with a NOR gate, or if we perform any of the billions of combinations of such actions.

Second, Boolean formulas are insufficient for answering such action queries. For example, the Boolean formula associated with the circuit {Y = OR(X, Z), W = NOT(Y)} is equivalent to the formula associated with {W = NOR(X, Z), Y = NOT(W)}. However, setting Y to zero in the first circuit entails W = 1; not so in the second circuit, where Y has no effect on W.

Third, the actions about which we can reason, given the circuit, were not specified in advance; they do not have special names, and they do not show up in the diagram. In fact, the great majority of the action queries that this circuit can answer have never been considered by the designer of this circuit.

Fourth, how does the circuit convey this extra information? It is done through two encoding tricks: (1) The symbolic units correspond to stable physical mechanisms (that is, the logical gates). (2) Each variable has precisely one mechanism that determines its value. In our example, the electric voltage in each junction of the circuit is the output of one and only one logic gate (otherwise, a value conflict can arise under certain input combinations).
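The second point can be checked mechanically. The following is a minimal Python sketch (not from the lecture) of the two circuits; the `solve` helper and its conventions are illustrative assumptions, not an established API:

```python
# Two structurally different models with the same Boolean constraint set:
#   M1: Y = OR(X, Z), W = NOT(Y)      M2: W = NOR(X, Z), Y = NOT(W)
# Interventions distinguish them even though their formulas are logically equivalent.

def solve(model, inputs, do=None):
    """Evaluate a structural model given root inputs; `do` overrides a variable's mechanism."""
    vals = dict(inputs)
    if do:
        vals.update(do)
    for _ in range(len(model)):            # enough passes for these tiny acyclic models
        for var, f in model.items():
            if do and var in do:
                continue                   # surgically removed mechanism
            try:
                vals[var] = f(vals)
            except KeyError:
                pass                       # a parent was not computed yet; retry next pass
    return vals

M1 = {"Y": lambda v: int(v["X"] or v["Z"]), "W": lambda v: int(not v["Y"])}
M2 = {"W": lambda v: int(not (v["X"] or v["Z"])), "Y": lambda v: int(not v["W"])}

inputs = {"X": 1, "Z": 0}
print(solve(M1, inputs), solve(M2, inputs))            # identical passive behavior
print(solve(M1, inputs, do={"Y": 0})["W"],             # setting Y = 0 makes W = 1 in M1 ...
      solve(M2, inputs, do={"Y": 0})["W"])             # ... but leaves W = 0 in M2
```

Passively, the two models agree on every input; only the surgery do(Y = 0) exposes the difference.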

As another example, figure 3 displays the first causal model that was put down on paper: Sewall Wright’s (1921) path diagram. It shows how the fur pattern of the guinea pigs in a litter is determined by various genetic and environmental factors.

Again, (1) it qualifies as a causal model, (2) the algebraic equations in themselves do not qualify as a causal model, and (3) the extra information comes from having each variable determined by a stable functional relationship connecting it to its parents in the diagram. (As in O = eD + hH + eE, where O stands for the percentage of black area on the guinea pig fur.)

Now that we are on familiar ground, let us observe more closely the way a causal model encodes the information needed for answering causal queries. Instead of a formal definition that is given shortly (definition 1), I illustrate the working of a causal model through a vivid example. It describes a tense moment in the life of a gentleman facing a firing squad (figure 4). The captain (C) awaits the court order (U); riflemen A and B obey the captain’s signal; the prisoner dies iff any of the riflemen shoot. The meaning of the symbols is obvious from the story; the only new symbol is the functional equality =, which is borrowed here from Euler (around 1730), meaning that the left-hand side is determined by the right-hand side and not the other way around.


Figure 2. Causal Models: Why They Are Needed. (A circuit diagram with inputs X and Z.)


Answering Queries with Causal Models

Assume that we want to evaluate (mechanically) the following sentences:

S1 (Prediction)—If rifleman A did not shoot, then the prisoner is alive: ¬A ⇒ ¬D.

S2 (Abduction)—If the prisoner is alive, then the captain did not signal: ¬D ⇒ ¬C.

S3 (Transduction)—If rifleman A shot, then B shot as well: A ⇒ B.

S4 (Action)—If the captain gave no signal, and rifleman A decides to shoot, then the prisoner will die, and B will not shoot: ¬C ⇒ D_A and ¬B_A.

S5 (Counterfactual)—If the prisoner is dead, then the prisoner would still be dead even if rifleman A had not shot: D ⇒ D_¬A.

S6 (Explanation)—The prisoner died because rifleman A shot: Caused(A, D).

The simplest sentences are S1 to S3, which can be evaluated by standard deduction; they involve standard logical connectives because they deal with inferences from beliefs to beliefs about a static world.

Next in difficulty is action sentence S4, requiring some causal information; then comes counterfactual S5, requiring more detailed causal information; and the hardest is the explanation sentence (S6), whose semantics is still not completely settled (to be discussed later).

Sentence S4 offers us the first chance to witness what information a causal model provides on top of a logical model.

Shooting with no signal constitutes a blatant violation of one mechanism in the story: rifleman A’s commitment to follow the captain’s signal. Violation renders this mechanism inactive; hence, we must excise the corresponding equation from the model, using a surgeon’s knife, and replace it with a new mechanism: A = TRUE (figure 5). To indicate that the consequent part of our query (the prisoner’s death, D) is to be evaluated in a modified model, with A = TRUE overriding A = C, we use the subscript notation D_A. Now we can easily verify that D holds true in the modified model; hence, S4 evaluates to true. Note that the surgery suppresses abduction: from seeing A shoot, we can infer that B shot as well (recall A ⇒ B), but from making A shoot, we can no longer infer what B does, unless we know whether the captain signaled.

Everything we do with graphs, we can, of course, do with symbols. We need to be careful, however, in distinguishing facts from rules (domain constraints) and mark the privileged element in each rule (the left-hand side), as in figure 6.

Here we see for the first time the role of causal order: Which mechanism should be excised by an action whose direct effect is A?


Figure 3. Genetic Models (S. Wright, 1920).

Figure 4. Causal Models at Work (the Impatient Firing Squad).
U: Court orders the execution. C: Captain gives a signal. A: Rifleman A shoots. B: Rifleman B shoots. D: Prisoner dies. =: functional equality (new symbol).
Model: C = U, A = C, B = C, D = A ∨ B.


(Note that the symbol A appears in two equations.) The answer is, Excise the equation in which A is the privileged variable (A = C in figure 6). Once we create the mutilated model M_A, we draw the conclusions by standard deduction and easily confirm the truth of S4: The prisoner will be dead, denoted D_A, because D is true in M_A.
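The mutilate-and-solve cycle behind S4 can be emulated in a few lines of Python. This is only a sketch, with the mechanisms of figure 4 hard-coded and the surgery expressed as an override of A's equation:

```python
# The firing-squad model of figure 4: C = U, A = C, B = C, D = A or B.
# Submodel M_A replaces the mechanism for A with the constant A = True.

def solve(u, do=None):
    do = do or {}
    C = do.get("C", u)
    A = do.get("A", C)
    B = do.get("B", C)
    D = do.get("D", A or B)
    return {"U": u, "C": C, "A": A, "B": B, "D": D}

# S4: no court order (hence no signal), yet rifleman A decides to shoot.
world = solve(u=False, do={"A": True})
assert world["D"] and not world["B"]   # the prisoner dies, and B does not shoot:
print(world)                           # abduction from "A shot" is suppressed by the surgery
```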

Consider now our counterfactual sentence S5: If the prisoner is dead, he would still be dead if A were not to have shot, D ⇒ D_¬A. The antecedent ¬A should still be treated as interventional surgery but only after we fully account for the evidence given: D.

This calls for three inferential steps, as shown in figure 7: (1) abduction: interpret the past in light of the evidence; (2) action: bend the course of history (minimally) to account for the hypothetical antecedent (¬A); and (3) prediction: project the consequences to the future.

Note that this three-step procedure can be combined into one. If we use an asterisk to distinguish postmodification from premodification variables, we can combine M and M_A into one logical theory and prove the validity of S5 by purely logical deduction in the combined theory. To illustrate, we write S5 as D ⇒ D*_¬A* (thus, if D is true in the actual world, then D would also be true in the hypothetical world created by the modification ¬A*) and prove the validity of D* in the combined theory, as shown in figure 8.

Suppose now that we are not entirely ignorant of U but can assess the degree of belief P(u). The same three steps apply to the computation of the counterfactual probability (that the prisoner be dead if A were not to have shot). The only difference is that we now use the evidence to update P(u) into P(u | D) and draw probabilistic, instead of logical, conclusions (figure 9).

Graphically, the combined theories of figure 8 can be represented by two graphs sharing the U variables (called a twin network) (figure 10). The twin network is particularly useful in probabilistic calculations because we can simply propagate evidence (using Bayes’s network techniques) from the actual to the hypothetical network.
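A rough Python rendering of the three steps is sketched below, assuming a hypothetical prior P(U = true) = 0.7 on the court order (the number is invented); enumeration over u here plays the role that evidence propagation plays in the twin network:

```python
# Three-step counterfactual: abduction (update P(u) given evidence D),
# action (mutilate: set A = False), prediction (compute D in the mutilated model).

def solve(u, do=None):
    do = do or {}
    C = do.get("C", u)
    A = do.get("A", C)
    B = do.get("B", C)
    D = do.get("D", A or B)
    return {"C": C, "A": A, "B": B, "D": D}

p_u = {True: 0.7, False: 0.3}          # hypothetical prior on the court order U

# Abduction: P(u | D) by Bayes's rule over the actual (unmutilated) model.
weights = {u: p_u[u] * (1.0 if solve(u)["D"] else 0.0) for u in p_u}
norm = sum(weights.values())
p_u_given_D = {u: w / norm for u, w in weights.items()}

# Action + prediction: evaluate D in the submodel M_{not A} for each u.
p_counterfactual = sum(p_u_given_D[u] * (1.0 if solve(u, do={"A": False})["D"] else 0.0)
                       for u in p_u)
print("P(D_notA | D) =", p_counterfactual)   # 1.0 here: given D, the court order alone kills via B
```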

Let us now summarize the formal elements involved in this causal exercise.

Definition 1: Causal Models
A causal model is a 3-tuple

M = ⟨V, U, F⟩

with a mutilation operator do(x): M → M_x, where

(1) V = {V1, …, Vn}, the endogenous variables;

(2) U = {U1, …, Um}, the background variables;

(3) F = a set of n functions, f_i: V \ V_i × U → V_i, each of the form v_i = f_i(pa_i, u_i), with PA_i ⊆ V \ V_i and U_i ⊆ U; and

(4) M_x = ⟨U, V, F_x⟩, X ⊆ V, x ∈ X, where F_x = {f_i: V_i ∉ X} ∪ {X = x}

(replace all functions f_i corresponding to X with the constant functions X = x).


Figure 5. Why Causal Models? Guide for Surgery.
S4 (action): If the captain gave no signal and rifleman A decides to shoot, the prisoner will die, ¬C ⇒ D_A, and B will not shoot, ¬C ⇒ ¬B_A. (The mechanism A = C is replaced by A = true.)

Figure 6. Mutilation in Symbolic Causal Models.
Model M_A (modify A = C): retain C = U (C), B = C (B), D = A ∨ B (D); replace A = C (A) with A.
Facts: ¬C. Conclusions: A, D, ¬B, ¬U, ¬C.
S4 (action): If the captain gave no signal and A decides to shoot, the prisoner will die and B will not shoot, ¬C ⇒ D_A & ¬B_A.

Figure 7. Three Steps to Computing Counterfactuals (abduction, action, prediction).
S5: If the prisoner is dead, he would still be dead if A were not to have shot: D ⇒ D_¬A.


Definition 2: Counterfactual
The sentence “Y would be y had X been x,” denoted Y_x(u) = y, is interpreted to mean “the solution for Y in M_x is equal to y under the current conditions U = u.”

The Role of Causality

We have seen that action queries can be answered in one step: standard deduction on a mutilated submodel. Counterfactual queries, however, required a preparatory stage of abduction. The questions naturally arise: Who needs counterfactuals? Why spend time on computing such convoluted sentences? It turns out that counterfactuals are commonplace, and abduction-free action sentences are a fiction. Action queries are brought into focus by certain undesired observations, potentially modifiable by the actions. The step of abduction, which is characteristic of counterfactual queries, cannot be disposed of and must therefore precede the surgery step. Thus, most action queries are semantically identical to counterfactual queries.

Consider an example from troubleshooting, where we observe a low output and we ask:

Action query: Will the output get higher if we replace the transistor?

Counterfactual query: Would the output be higher had the transistor been replaced?

The two sentences in this example are equivalent: Both demand an abductive step to account for the observation, which complicates things a bit. In probabilistic analysis, a functional specification is needed; conditional probabilities alone are not sufficient for answering observation-triggered action queries. In symbolic analysis, abnormalities must be explicated in functional detail; the catch-all symbol ¬ab(p) (standing for “p is not abnormal”) is not sufficient.

Thus, we come to the million-dollar question: Why causality?

To this point, we have discussed actions, counterfactuals, surgeries, mechanisms, abduction, and so on, but is causality really necessary? Indeed, if we know which mechanisms each action modifies, and the nature of the modification, we can avoid all talk of causation—the ramification of each action can be obtained by simply mutilating the appropriate mechanisms, then simulating the natural course of events. The price we pay is that we need to specify an action not by its direct effects but, rather, by the mechanisms that the action modifies.


Figure 8. Symbolic Evaluation of Counterfactuals.
Prove: D ⇒ D_¬A.
Combined theory (actual | hypothetical), sharing U:
  C = U,        C* = U         (C)
  A = C,        ¬A*            (A)
  B = C,        B* = C*        (B)
  D = A ∨ B,    D* = A* ∨ B*   (D)
Facts: D. Conclusions: U, A, B, C, D, ¬A*, C*, B*, D*.

Figure 9. Computing Probabilities of Counterfactuals (abduction, action, prediction).
P(S5): The prisoner is dead. How likely is it that he would be dead if A were not to have shot? P(D_¬A | D) = ? (Abduction updates P(u) to P(u | D); action sets A to false; prediction yields P(D_¬A | D).)

Figure 10. Probability of Counterfactuals: The Twin Network.
P(Alive had A not shot | A shot, Dead) = P(¬D) in model ⟨M_¬A, P(u, w | A, D)⟩ = P(¬D* | D) in the twin network.


For example, instead of saying “this action moves the coffee cup to location x,” I would need to say, “This action neutralizes the static friction of the coffee cup and replaces it with a forward acceleration a for a period of 1 second, followed by deceleration for a period of 2 seconds....”

This statement is awfully clumsy: Most mechanisms do not have names in nontechnical languages, and when they do, the names do not match the granularity of ordinary language. Causality enables us to reason correctly about actions while we keep the mechanism implicit. All we need to specify is the action’s direct effects; the rest follows by mutilation and simulation.

However, to figure out which mechanism deserves mutilation, there must be a one-to-one correspondence between variables and mechanisms. Is this a realistic requirement? In general, no. An arbitrary collection of n equations on n variables would not normally enjoy this property. Even a typical resistive network (for example, a voltage divider) does not enjoy it. But because causal thinking is so pervasive in our language, we can conclude that our conceptualization of the world is more structured and that it does enjoy the one-to-one correspondence. We say “raise taxes,” “clean your face,” “make him laugh,” or, in general, do(p), and miraculously, people understand us without asking for a mechanism name.2

Perhaps the best “AI proof” of the ubiquity of the modality do(p) is the existence of the language STRIPS (Fikes and Nilsson 1971), in which actions are specified by their effects—the ADD-LIST and DELETE-LIST. Let us compare causal surgeries to STRIPS surgeries. Both accept actions as modalities, and both perform surgeries, but STRIPS performs the surgery on propositions, whereas causal theories first identify the mechanism to be excised and then perform the surgery on mechanisms, not on propositions. The result is that direct effects suffice, and indirect ramifications of an action need not be specified; they can be inferred from the direct effects using the mutilate-simulate cycle of figure 5.

Applications

Thus, we arrive at the midpoint of our story. I have talked about the story of causation from Hume to robotics, I have discussed the semantics of causal utterances and the principles behind the interpretation of action and counterfactual sentences, and now it is time to ask about the applications of these principles. I talk about two types of applications: The first relates to the evaluation of actions and the second to finding explanations.

Inferring Effects of Actions

Let us start with the evaluation of actions. We saw that if we have a causal model M, then predicting the ramifications of an action is trivial—mutilate and solve. If instead of a complete model we only have a probabilistic model, it is again trivial: We mutilate and propagate probabilities in the resultant causal network. The important point is that we can specify knowledge using causal vocabulary and can handle actions that are specified as modalities.

However, what if we do not have even a probabilistic model? This is where data come in. In certain applications, we are lucky to have data that can supplement missing fragments of the model, and the question is whether the data available are sufficient for computing the effect of actions.

Let us illustrate this possibility in a simple example taken from economics (figure 11). Economic policies are made in a manner similar to the way actions were taken in the firing squad story: Viewed from the outside, they are taken in response to economic indicators or political pressure, but viewed from the policy maker’s perspective, the policies are decided under the pretense of free will.

Like rifleman A in figure 5, the policy maker considers the ramifications of nonroutine actions that do not conform to the dictates of the model, as shown in the mutilated graph of figure 11. If we knew the model, there would be no problem calculating the ramifications of each pending decision—mutilate and predict—but being ignorant of the functional relationships among the variables, and having only the skeleton of the causal graph in our hands, we hope to supplement this information with what we can learn from economic data.

Unfortunately, economic data are taken under a “wholesome” model, with tax levels responding to economic conditions, and we need to predict ramifications under a mutilated model, with all links influencing tax levels removed. Can we still extract useful information from such data? The answer is yes. As long as we can measure every variable that is a common cause of two or more other measured variables, it is possible to infer the probabilities of the mutilated model directly from those of the nonmutilated model, regardless of the underlying functions. The transformation is given by the manipulation theorem described in the book by Spirtes, Glymour, and Scheines (1993) and further developed by Pearl (2000, 1995).
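The kind of transformation involved can be illustrated with a small Python sketch, under the assumption that the common cause (economic conditions, Z) is measured; the binary parametrization and the numbers are invented for the purpose of the check:

```python
import itertools

# Hypothetical binary economy for figure 11: Z = economic conditions,
# X = tax policy (responds to Z in the observational regime), Y = economic consequences.
P_Z = {0: 0.6, 1: 0.4}
P_X_given_Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}             # P(X=x | Z=z)
P_Y_given_ZX = {(0, 0): 0.5, (0, 1): 0.7, (1, 0): 0.2, (1, 1): 0.6}  # P(Y=1 | z, x)

# Observational joint P(z, x, y), as it would be estimated from data.
joint = {(z, x, y): P_Z[z] * P_X_given_Z[z][x] *
         (P_Y_given_ZX[(z, x)] if y == 1 else 1 - P_Y_given_ZX[(z, x)])
         for z, x, y in itertools.product([0, 1], repeat=3)}

def p_y_do_x_truth(x):
    # Ground truth: simulate the mutilated model (links into X removed).
    return sum(P_Z[z] * P_Y_given_ZX[(z, x)] for z in [0, 1])

def p_y_do_x_adjusted(x):
    # The same quantity recovered from the observational joint by adjusting for Z:
    #   P(y | do(x)) = sum_z P(y | z, x) P(z)
    total = 0.0
    for z in [0, 1]:
        pz = sum(joint[(z, xx, y)] for xx in [0, 1] for y in [0, 1])
        pzx = sum(joint[(z, x, y)] for y in [0, 1])
        total += (joint[(z, x, 1)] / pzx) * pz
    return total

print(p_y_do_x_truth(1), p_y_do_x_adjusted(1))   # the two numbers coincide (0.66)
```

The adjusted estimate uses only quantities estimable from nonexperimental data, yet it reproduces the probability that would prevail under the mutilated, policy-evaluation model.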


Remarkably, the effect of certain policies can be inferred even when some common factors are not observable, as is illustrated in the next example (Pearl 2000).

Smoking and the Genotype Theory

In 1964, the surgeon general issued a report linking cigarette smoking to death, cancer, and most particularly, lung cancer. The report was based on nonexperimental studies in which a strong correlation was found between smoking and lung cancer, and the claim was that the correlation found is causal; namely, if we ban smoking, the rate of cancer cases will be roughly the same as the rate we find today among nonsmokers in the population (model A, figure 12). These studies came under severe attacks from the tobacco industry, backed by some very prominent statisticians, among them Sir Ronald Fisher. The claim was that the observed correlations can also be explained by a model in which there is no causal connection between smoking and lung cancer. Instead, an unobserved genotype might exist that simultaneously causes cancer and produces an inborn craving for nicotine (see model B, figure 12).

Formally, this claim would be written in our notation as

P(cancer | do(smoke)) = P(cancer | do(not_smoke)) = P(cancer)

stating that making the population smoke or stop smoking would have no effect on the rate of cancer cases. Controlled experiments could decide between the two models, but these are impossible, and now also illegal, to conduct.

This is all history. Now we enter a hypothetical era where representatives of both sides decide to meet and iron out their differences. The tobacco industry concedes that there might be some weak causal link between smoking and cancer, and representatives of the health group concede that there might be some weak links to genetic factors. Accordingly, they draw this combined model (model C in figure 12), and the question boils down to assessing, from the data, the strengths of the various links. In mutilation language, the question boils down to assessing the effect of smoking in the mutilated model shown here (model D) from data taken under the wholesome model shown before (model C). They submit the query to a statistician, and the answer comes back immediately: impossible. The statistician means that there is no way to estimate the strength of the causal links from the data because any data whatsoever can perfectly fit either one of the extreme models shown in model A and model B; so, they give up and decide to continue the political battle as usual.

Before parting, a suggestion comes up: Perhaps we can resolve our differences if we measure some auxiliary factors. For example, because the causal-link model is based on the understanding that smoking affects lung cancer through the accumulation of tar deposits in the lungs, perhaps we can measure the amount of tar deposits in the lungs of sampled individuals, which might provide the necessary information for quantifying the links?


Figure 11. Intervention as Surgery (Example: Policy Analysis).
Left: the model underlying the data (economic conditions influence both tax levels and economic consequences). Right: the model for policy evaluation, with the links influencing tax levels removed.


Both sides agree that this suggestion is reasonable, so they submit a new query to the statistician: Can we find the effect of smoking on cancer assuming that an intermediate measurement of tar deposits is available?

Sure enough, the statistician comes back with good news: It is computable! In other words, it is possible now to infer the effect of smoking in the mutilated model shown here (model D) from data taken under the original wholesome model (model C). This inference is valid as long as the data contain measurements of all three variables: smoking, tar, and cancer. Moreover, the solution can be derived in closed mathematical form, using symbolic manipulations that mimic logical derivation but are governed by the surgery semantics. How can this derivation be accomplished?

Causal Calculus

To reduce an expression involving do(x) to those involving ordinary probabilities, we need a calculus for doing, a calculus that enables us to deduce behavior under intervention from behavior under passive observations. Do we have such a calculus?

If we look at the history of science, we find to our astonishment that such a calculus does not in fact exist. It is true that science rests on two components: one consisting of passive observations (epitomized by astronomy) and the other consisting of deliberate intervention (represented by engineering and craftsmanship). However, algebra was not equally fair to these two components. Mathematical techniques were developed exclusively to support the former (seeing), not the latter (doing). Even in the laboratory, a place where the two components combine, the seeing part enjoys the benefits of algebra, whereas the doing part is at the mercy of the scientist’s judgment. True, when a chemist pours the content of one test tube into another, a new set of equations becomes applicable, and algebraic techniques can be used to solve the new equations. However, there is no algebraic operation to represent the transfer from one test tube to another and no algebra for selecting the correct set of equations when conditions change. Such selection has thus far relied on unaided scientific judgment.

Let me convince you of this imbalance using a simple example. If we want to find the chance it rained, given that we see wet grass, we can express our question in a formal sentence, P(rain | wet), and use the machinery of probability theory to transform the sentence into other expressions that are more convenient or informative. However, suppose we ask a different question: What is the chance it rained if we make the grass wet? We cannot even express our query in the syntax of probability because the vertical bar is already taken to mean “given that we see.” We know intuitively what the answer should be—P(rain | do(wet)) = P(rain)—because making the grass wet does not change the chance of rain. However, can this intuitive answer, and others like it, be derived mechanically to comfort our thoughts when intuition fails?

The answer is yes, but it takes a new algebra to manage the do(x) operator. To make it into a genuine calculus, we need to translate the surgery semantics into rules of inference, and these rules consist of the following (Pearl 1995):

Rule 1 (Ignoring observations):
P(y | do(x), z, w) = P(y | do(x), w)   if (Y ⊥⊥ Z | X, W) in G_X̄ (the graph with all arrows pointing into X deleted)

Rule 2 (Action-observation exchange):
P(y | do(x), do(z), w) = P(y | do(x), z, w)   if (Y ⊥⊥ Z | X, W) in G_X̄Z̲ (arrows into X and arrows out of Z deleted)
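A small simulation, with invented parameters and not tied to any particular library, can illustrate the graphical condition behind Rule 2: when deleting the arrow out of X leaves Y and X separated, seeing and doing coincide; adding an unobserved confounder breaks the condition, and the two quantities diverge.

```python
import random

random.seed(0)

def sample(confounded, do_x=None, n=100_000):
    """Draw (x, y) from a toy model; `do_x` forces X by surgery (its mechanism is ignored)."""
    data = []
    for _ in range(n):
        u = random.random() < 0.5                        # background factor
        if do_x is None:
            x = random.random() < (0.8 if (confounded and u) else 0.3)
        else:
            x = do_x                                     # arrows into X removed
        y = random.random() < (0.2 + 0.5 * x + (0.3 * u if confounded else 0.0))
        data.append((x, y))
    return data

def p_y_given_x(data, x):
    rows = [y for (xx, y) in data if xx == x]
    return sum(rows) / len(rows)

for confounded in (False, True):
    seeing = p_y_given_x(sample(confounded), True)                 # P(y | x)
    doing = p_y_given_x(sample(confounded, do_x=True), True)       # P(y | do(x))
    print(f"confounded={confounded}:  P(y|x)={seeing:.3f}  P(y|do(x))={doing:.3f}")
# Without the confounder, Y and X are separated once X's outgoing arrow is deleted,
# so Rule 2 licenses replacing do(x) by x; with the confounder it does not.
```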


Figure 12. Predicting the Effects of Banning Cigarette Smoking.
A. Model proposed by the surgeon general (1964): Smoking → Cancer; P(c | do(s)) = P(c | s).
B. Model proposed by the tobacco industry: Genotype (unobserved) → {Smoking, Cancer}; P(c | do(s)) = P(c).
C. Combined model: P(c | do(s)) noncomputable.
D. Mutilated combined model, with one additional observation (Tar): P(c | do(s)) computable.


Rule 3 (Ignoring actions):
P(y | do(x), do(z), w) = P(y | do(x), w)   if (Y ⊥⊥ Z | X, W) in G_X̄,Z̄(W) (arrows into X deleted, along with arrows into those Z-nodes that are not ancestors of any W-node in G_X̄)

These three rules permit us to transform expressions involving actions and observations into other expressions of this type. Rule 1 allows us to ignore an irrelevant observation, the third to ignore an irrelevant action, the second to exchange an action with an observation of the same fact. The if statements on the right are separation conditions in various subgraphs of the diagram that indicate when the transformation is legal.3 We next see them in action in the smoking-cancer example that was discussed earlier.

Figure 13 shows how one can prove that the effect of smoking on cancer can be determined from data on three variables: (1) smoking, (2) tar, and (3) cancer.

The question boils down to computing the expression P(cancer | do(smoke)) from nonexperimental data, namely, expressions involving no actions. Therefore, we need to eliminate the do symbol from the initial expression. The elimination proceeds like an ordinary solution of an algebraic equation—in each stage, a new rule is applied, licensed by some subgraph of the diagram, until eventually we achieve a formula involving no do symbols, meaning an expression computable from nonexperimental data.

If I were not a modest person, I would say that this result is amazing. Behold, we are not given any information whatsoever on the hidden genotype: It might be continuous or discrete, unidimensional or multidimensional, yet measuring an auxiliary variable (for example, tar) someplace else in the system enables us to discount the effect of this hidden genotype and predict what the world would be like if policy makers were to ban cigarette smoking.

Thus, data from the visible allow us to account for the invisible. Moreover, using the same technique, a person can even answer such intricate and personal questions as, “I am about to start smoking—should I?”

I think this trick is amazing because I cannot do such calculations in my head. It demonstrates the immense power of having formal semantics and symbolic machinery in an area that many respectable scientists have surrendered to unaided judgment.

Learning to Act by Watching Other Actors

The common theme in the past two examples was the need to predict the effect of our actions by watching the behavior of other actors (past policy makers in the case of economic decisions and past smokers-nonsmokers in the smoking-cancer example). This problem recurs in many applications, and here are a couple of additional examples.

In the example of figure 14, we need to predict the effect of a plan (sequence of actions) after watching an expert control a production process. The expert observes dials that we cannot observe, although we know what quantities these dials represent.


Figure 13. Derivation in Causal Calculus.
(Graph: Smoking → Tar → Cancer, with the unobserved Genotype affecting both Smoking and Cancer.)

P(c | do{s}) = Σt P(c | do{s}, t) P(t | do{s})                 (Probability Axioms)
            = Σt P(c | do{s}, do{t}) P(t | do{s})              (Rule 2)
            = Σt P(c | do{s}, do{t}) P(t | s)                  (Rule 2)
            = Σt P(c | do{t}) P(t | s)                         (Rule 3)
            = Σs′ Σt P(c | do{t}, s′) P(s′ | do{t}) P(t | s)   (Probability Axioms)
            = Σs′ Σt P(c | t, s′) P(s′ | do{t}) P(t | s)       (Rule 2)
            = Σs′ Σt P(c | t, s′) P(s′) P(t | s)               (Rule 3)
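As a numeric sanity check of this derivation (not part of the original lecture), one can posit an arbitrary distribution for the unobserved genotype, form the observational joint of smoking, tar, and cancer, and compare the final do-free expression with the ground-truth intervention; all parameters below are invented:

```python
import itertools

# Hypothetical parameters (all binary): U = genotype (unobserved), S = smoking,
# T = tar deposits, C = cancer. Assumed graph: U -> S, U -> C, S -> T, T -> C.
P_U = {0: 0.5, 1: 0.5}
P_S_given_U = {0: 0.3, 1: 0.8}                     # P(S=1 | u)
P_T_given_S = {0: 0.1, 1: 0.9}                     # P(T=1 | s)
P_C_given_TU = {(0, 0): 0.1, (0, 1): 0.4,          # P(C=1 | t, u)
                (1, 0): 0.3, (1, 1): 0.7}

def p(u, s, t, c):
    pu = P_U[u]
    ps = P_S_given_U[u] if s else 1 - P_S_given_U[u]
    pt = P_T_given_S[s] if t else 1 - P_T_given_S[s]
    pc = P_C_given_TU[(t, u)] if c else 1 - P_C_given_TU[(t, u)]
    return pu * ps * pt * pc

# Observational joint over the *visible* variables only.
P_STC = {(s, t, c): sum(p(u, s, t, c) for u in (0, 1))
         for s, t, c in itertools.product((0, 1), repeat=3)}

def marg(**fixed):
    return sum(v for k, v in P_STC.items()
               if all(k["stc".index(name)] == val for name, val in fixed.items()))

def p_c_do_s_truth(s):
    # Ground truth: mutilate S's mechanism and average over the hidden genotype.
    return sum(P_U[u] * (P_T_given_S[s] * P_C_given_TU[(1, u)] +
                         (1 - P_T_given_S[s]) * P_C_given_TU[(0, u)]) for u in (0, 1))

def p_c_do_s_frontdoor(s):
    # Last line of figure 13: sum_t P(t | s) sum_s' P(c | t, s') P(s')
    total = 0.0
    for t in (0, 1):
        p_t_given_s = marg(s=s, t=t) / marg(s=s)
        inner = sum(marg(s=s2, t=t, c=1) / marg(s=s2, t=t) * marg(s=s2) for s2 in (0, 1))
        total += p_t_given_s * inner
    return total

print(p_c_do_s_truth(1), p_c_do_s_frontdoor(1))    # the two values agree (0.475)
```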


The example in figure 15 (owed to J. Robins) comes from sequential treatment of AIDS patients.

The variables X1 and X2 stand for treatments that physicians prescribe to a patient at two different times; Z represents observations that the second physician consults to determine X2, and Y represents the patient’s survival. The hidden variables U1 and U2 represent, respectively, part of the patient history and the patient’s disposition to recover. Doctors used the patient’s earlier PCP history (U1) to prescribe X1, but its value was not recorded for data analysis.

The problem we face is as follows: Assume we have collected a large amount of data on the behavior of many patients and physicians, which is summarized in the form of (an estimated) joint distribution P of the observed four variables (X1, Z, X2, Y). A new patient comes in, and we want to determine the impact of the plan (do(x1), do(x2)) on survival (Y), where x1 and x2 are two predetermined dosages of bactrim, to be administered at two prespecified times.

Many of you have probably noticed the similarity of this problem to Markov decision processes, where it is required to find an optimal sequence of actions to bring about a certain response. The problem here is both simpler and harder. It is simpler because we are only required to evaluate a given strategy, and it is harder because we are not given the transition probabilities associated with the elementary actions—these need to be learned from data, and the data are confounded by spurious correlations. As you can see on the bottom line of figure 15, this task is feasible—the calculus reveals that our modeling assumptions are sufficient for estimating the desired quantity from the data.
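A sketch of that calculation, on a made-up binary version of figure 15, follows; the particular edge set and parameters below are assumptions introduced only to make the check runnable, not a transcription of the original model:

```python
import itertools

# Hypothetical binary version of figure 15. Assumed graph: U1 -> {X1, Z},
# U2 -> {Z, Y}, X1 -> {Z, Y}, Z -> X2, X2 -> Y; U1 and U2 are unobserved.
pU1, pU2 = 0.4, 0.6                                    # P(U=1)
pX1 = {0: 0.3, 1: 0.7}                                 # P(X1=1 | u1)
pZ  = {(x1, u1, u2): 0.1 + 0.3*x1 + 0.2*u1 + 0.3*u2
       for x1, u1, u2 in itertools.product((0, 1), repeat=3)}   # P(Z=1 | x1,u1,u2)
pX2 = {0: 0.2, 1: 0.8}                                 # P(X2=1 | z)
pY  = {(x1, x2, u2): 0.1 + 0.2*x1 + 0.3*x2 + 0.3*u2
       for x1, x2, u2 in itertools.product((0, 1), repeat=3)}   # P(Y=1 | x1,x2,u2)

def bern(p, v):                       # P(V=v) for a Bernoulli(p) variable
    return p if v else 1 - p

# Observational joint over the visible variables (X1, Z, X2, Y).
P = {}
for u1, u2, x1, z, x2, y in itertools.product((0, 1), repeat=6):
    w = (bern(pU1, u1) * bern(pU2, u2) * bern(pX1[u1], x1) *
         bern(pZ[(x1, u1, u2)], z) * bern(pX2[z], x2) * bern(pY[(x1, x2, u2)], y))
    P[(x1, z, x2, y)] = P.get((x1, z, x2, y), 0.0) + w

def cond(num_idx, num_val, den_idx, den_val):
    """Conditional probability over the visible joint, indexed as (X1, Z, X2, Y)."""
    num = sum(v for k, v in P.items()
              if all(k[i] == val for i, val in zip(num_idx + den_idx, num_val + den_val)))
    den = sum(v for k, v in P.items()
              if all(k[i] == val for i, val in zip(den_idx, den_val)))
    return num / den

def plan_effect_truth(x1, x2):
    # Mutilate the mechanisms of X1 and X2, average over the unobservables.
    return sum(bern(pU1, u1) * bern(pU2, u2) * pY[(x1, x2, u2)]
               for u1, u2 in itertools.product((0, 1), repeat=2))

def plan_effect_formula(x1, x2):
    # Figure 15: P(y | do(x1), do(x2)) = sum_z P(y | z, x1, x2) P(z | x1)
    return sum(cond((3,), (1,), (1, 0, 2), (z, x1, x2)) * cond((1,), (z,), (0,), (x1,))
               for z in (0, 1))

print(plan_effect_truth(1, 1), plan_effect_formula(1, 1))   # the two agree (0.78)
```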

Deciding Attribution

I now demonstrate how causal calculus can answer questions of attribution, namely, finding causes of effects rather than effects of causes. The U.S. Army conducted many nuclear experiments in Nevada in the period from 1940 to 1955. Data taken over a period of 12 years indicate that fallout radiation apparently has resulted in a high number of deaths from leukemia in children residing in southern Utah. A lawsuit was filed. Is the Army liable for these deaths?

According to a fairly common judicial standard, damage should be paid if and only if it is more probable than not that death would not have occurred but for the action. Can we calculate this probability PN that event y would not have occurred but for event x? (PN stands for probability of necessity.) The answer is yes; PN is given by the formula

PN = [P(y | x) − P(y | x′)] / P(y | x) + [P(y | x′) − P(y | do(x′))] / P(x, y)

where x′ stands for the complement of x. However, to get this formula, we must assume a condition called monotonicity; that is, radiation cannot prevent leukemia. In the absence of monotonicity, the formula provides a lower bound on the probability of necessity (Pearl 2000; Tian and Pearl 2000).

This result, although it is not mentioned explicitly in any textbooks on epidemiology, statistics, or law, is rather startling. First, it shows that the first term, which generations of epidemiologists took to be the formal interpretation of PN, by the power of which millions of dollars were awarded (or denied) to plaintiffs in lawsuits, is merely a crude approximation of what lawmakers had in mind by the legal term but for (Greenland 1999). Second, it tells us what assumptions must be ascertained before this traditional criterion coincides with lawmakers’ intent. Third, it shows us precisely how to account for confounding bias P(y | x′) − P(y | do(x′)) or nonmonotonicity. Finally, it demonstrates (for the first time, to my knowledge) that data from both experimental and nonexperimental studies can be combined to yield information that neither study alone can provide.
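As a sketch of how such a combination might look in practice, the following plugs invented numbers into the formula; the variable names and values are illustrative only:

```python
# Hypothetical numbers for the "but for" criterion: x = exposure to fallout,
# y = death from leukemia. Nonexperimental data supply P(y|x), P(y|x'), P(x, y);
# an experimental (or do-calculus-derived) estimate supplies P(y | do(x')).
p_y_given_x  = 0.002     # risk among the exposed
p_y_given_xp = 0.0005    # risk among the unexposed
p_x_and_y    = 0.001     # joint probability of exposure and death
p_y_do_xp    = 0.0004    # risk if exposure were prevented (experimental estimate)

excess_risk_ratio = (p_y_given_x - p_y_given_xp) / p_y_given_x
confounding_corr  = (p_y_given_xp - p_y_do_xp) / p_x_and_y

PN = excess_risk_ratio + confounding_corr
print(f"ERR = {excess_risk_ratio:.3f}, correction = {confounding_corr:.3f}, PN >= {PN:.3f}")
# Under monotonicity, PN equals this value; otherwise the value is only a lower bound.
```

The first term is the excess-risk ratio that courts have traditionally relied on; the second term is the correction contributed by the experimental data.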


Figure 14. Learning to Act by Watching Other Actors (Process Control).
Control knobs X1, X2; hidden dials U1, U2; visible dial Z; output Y. Problem: find the effect of (do(x1), do(x2)) on Y from data on X1, Z, X2, and Y.

Figure 15. Learning to Act by Watching Other Actors (Treatment Management).
For example, drug management (Pearl and Robins 1995): X1, X2 are dosages of bactrim; Z records episodes of PCP; U1 is the patient’s history; U2 is the patient’s immune status; Y is recovery/death.
Solution: P(y | do(x1), do(x2)) = Σz P(y | z, x1, x2) P(z | x1).


AI and Peripheral Sciences

Before I go to the topic of explanation, I would like to say a few words on the role of AI in such applications as statistics, public health, and social science. One of the reasons that I find these areas to be fertile ground for trying out new ideas in causal reasoning is that, unlike AI, tangible rewards can be reaped from solving relatively small problems. Problems involving barely four to five variables, which we in AI regard as toy problems, carry tremendous payoffs in public health and social science. Billions of dollars are invested each year on various public health studies: Is chocolate ice cream good for you or bad for you? Would red wine increase or decrease your heart rate? The same applies to the social sciences. Would increasing the police budget decrease or increase crime rates? Is the Columbine school incident owed to TV violence or failure of public education? The Interuniversity Consortium for Political and Social Research distributed about 800 gigabytes worth of such studies in 1993 alone.

Unfortunately, the causal-analytic methodology currently available to researchers in these fields is rather primitive, and every innovation can make a tremendous difference. Moreover, the major stumbling block has not been statistical but, rather, conceptual—lack of semantics and lack of formal machinery for handling causal knowledge and causal queries—perfect for AI involvement. Indeed, several hurdles have recently been removed by techniques that emerged from AI laboratories. I predict that a quiet revolution will take place in the next decade in the way causality is handled in statistics, epidemiology, social science, economics, and business. Although news of this revolution will never make it to the Defense Advanced Research Projects Agency newsletter, and even the National Science Foundation might not be equipped to manage it, it will nevertheless have an enormous intellectual and technological impact on our society.

Causes and Explanations

We now come to one of the grand problems in AI: generating meaningful explanations. It is the hardest of all causal tasks considered thus far because the semantics of explanation is still debatable, but some promising solutions are currently in the making (see Pearl [2000] and Halpern and Pearl [2001a, 2001b]).

The art of generating explanations is as old as mankind (figure 16).

According to the Bible, it was Adam who first discovered the ubiquitous nature of causal explanation when he answered God’s question with “she handed me the fruit and I ate.” Eve was quick to catch on: “The serpent deceived me, and I ate.” Explanations here are used for exonerating one from blame, passing on the responsibility to others. The interpretation therefore is counterfactual: “Had she not given me the fruit, I would not have eaten.”

Counterfactuals and Actual Causes

The modern formulation of this concept starts again with David Hume, who stated (1748): “We may define a cause to be an object followed by another, …, where, if the first object has not been, the second never had existed.”


Figure 16. Causal Explanation. “She handed me the fruit and I ate.” “The serpent deceived me, and I ate.”


This counterfactual definition was given a possible-world semantics by David Lewis (1973) and an even simpler semantics using our structural interpretation of definition 2. Note how we write, in surgery language, Hume’s sentence “If the first object (x) had not been, the second (y) never had existed”:

(1) X(u) = x

(2) Y(u) = y

(3) Y_x′(u) ≠ y for x′ ≠ x

meaning that given situation u, the solution for Y in a model mutilated by the operator do(X = x′) is not equal to y.

However, this definition of cause is known to be ridden with problems: It ignores aspects of sufficiency, it fails in the presence of alternative causes, and it ignores the structure of intervening mechanisms. I first demonstrate these problems by examples and then provide a solution using a notion called sustenance that is easy to formulate in our structural-model semantics.

Let us first look at the aspect of sufficiency (or production), namely, the capacity of a cause to produce the effect in situations where the effect is absent. For example, both match and oxygen are necessary for fire, and neither is sufficient alone. Why is match considered an adequate explanation and oxygen an awkward explanation? The asymmetry surfaces when we compute the probability of sufficiency:

P(x is sufficient for y) = P(Y_x = y | X ≠ x, Y ≠ y)

for which we obtain:

P(oxygen is sufficient for fire) = P(match) = low

P(match is sufficient for fire) = P(oxygen) = high

Thus, we see that human judgment of explanatory adequacy takes into account not merely how necessary a factor was for the effect but also how sufficient it was.

Another manifestation of sufficiency occurs in a phenomenon known as overdetermination. Suppose in the firing squad example (figure 4), rifleman B is instructed to shoot if and only if rifleman A does not shoot by 12 noon. If rifleman A shoots before noon and kills, we would naturally consider his shot to be the cause of death. Why? The prisoner would have died without A’s shot. The answer lies in an argument that goes as follows: We cannot exonerate an action that actually took place by appealing to hypothetical actions that might or might not materialize. If for some strange reason B’s rifle gets stuck, then death would not have occurred were it not for A’s shot. This argument instructs us to imagine a new world, contrary to the scenario at hand, in which some structural contingencies are introduced, and in this contingency-inflicted world, we are to perform Hume’s counterfactual test. I call this type of argument sustenance (Pearl 2000), formulated as follows:

Definition 3: Sustenance
Let W be a set of variables, and let w, w′ be specific realizations of these variables. We say that x causally sustains y in u relative to contingencies in W if and only if

(1) X(u) = x;

(2) Y(u) = y;

(3) Y_xw(u) = y for all w; and

(4) Y_x′w′(u) = y′ ≠ y for some x′ ≠ x and some w′.

The last condition, Y_x′w′(u) = y′, weakens necessity by allowing Y to differ from y (under x′ ≠ x) under a special contingency, when W is set to some w′. However, the third condition, Y_xw(u) = y, carefully screens the set of permitted contingencies by insisting that Y retain its value y (under x) for every setting of W = w.

We now come to the second difficulty with the counterfactual test—its failure to incorporate structural information.

Consider the circuit in figure 17. If someone were to ask us which switch causes the light to be on in this circuit, we would point to switch 1. After all, switch 1 (S1) causes the current to flow through the light bulb, but switch 2 (S2) is totally out of the game. However, the overall functional relationship between the switches and the light is deceptively symmetric:

Light = S1 ∨ S2

Turning switch 1 off merely redirects the current but keeps the light on, but turning switch 2 off has no effect whatsoever—the light turns off if and only if both switches are off. Thus, we see that the internal structure of a process affects our perception of actual causation. Evidently, our mind takes into consideration not merely input-output relationships but also the inner structure of the process leading from causes to effects. How?


Figure 17. Preemption: How the Counterfactual Test Fails.
Which switch is the actual cause of light? S1! Deceiving symmetry: Light = S1 ∨ S2.


Causal Beam

The solution I would like to propose here is based on local sustenance relationships. Given a causal model and a specific scenario in this model, we construct a new model by pruning away, from every family, all parents except those that minimally sustain the value of the child. I call the new model a causal beam (Pearl 2000). In this new model, we conduct the counterfactual test, and we proclaim an event X = x the actual cause of Y = y if y depends on x in the new model. I now demonstrate this construction using a classical example owed to P. Suppes (figure 18). It is isomorphic to the two-switch problem but more bloodthirsty.

A desert traveler T has two enemies. Enemy 1 poisons T’s canteen, and enemy 2, unaware of enemy 1’s action, shoots and empties the canteen. A week later, T is found dead, and the two enemies confess to action and intention. A jury must decide whose action was the actual cause of T’s death. Enemy 1 claims T died of thirst, and enemy 2 claims to have only prolonged T’s life.

Now let us construct the causal beam associated with the natural scenario in which we have death (Y = 1), dehydration (D = 1), and no poisoning (C = 0). Consider the C-family (figure 19).

Because emptying the canteen is sufficient for sustaining no cyanide intake, regardless of poisoning, we label the link P → C inactive and the link X → C sustaining. The link P → C is inactive in the current scenario, which allows us to retain just one parent of C, with the functional relationship C = ¬X (figure 20).

Next, consider the Y-family (in the situation D = 1, C = 0) (figure 21). Because dehydration would sustain death regardless of cyanide intake, we label the link C → Y inactive and the link D → Y sustaining.

We drop the link C → Y, and we end up with a causal beam leading from shooting to death through dehydration. In this final model, we conduct the counterfactual test and find that the test is satisfied because Y = X. Thus, we have the license to classify the shooter as the cause of death, not the poisoner, though neither meets the counterfactual test for necessity on a global scale—the asymmetry emanates from structural information.
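The pruning step can be sketched in Python. The check below is a simplification of the beam construction—it keeps any parent that by itself sustains its child's value against all settings of the other parents—and the mechanisms are transcribed from figures 18–21:

```python
from itertools import product

# Structural model of the desert traveler (all booleans):
#   D = X (the shot empties the canteen), C = (not X) and P (cyanide is ingested
#   only if the canteen was not emptied), Y = D or C (death by either route).
mechanisms = {
    "D": (("X",),      lambda X: X),
    "C": (("X", "P"),  lambda X, P: (not X) and P),
    "Y": (("D", "C"),  lambda D, C: D or C),
}
scenario = {"X": True, "P": True, "D": True, "C": False, "Y": True}

def sustaining_parents(var):
    """Parents that keep var at its actual value no matter how the other parents are set."""
    parents, f = mechanisms[var]
    keep = []
    for p in parents:
        others = [q for q in parents if q != p]
        if all(f(**{p: scenario[p]}, **dict(zip(others, vals))) == scenario[var]
               for vals in product([False, True], repeat=len(others))):
            keep.append(p)
    return keep

beam = {var: sustaining_parents(var) for var in mechanisms}
print(beam)          # {'D': ['X'], 'C': ['X'], 'Y': ['D']}

# In the pruned model, Y depends on X through D (the shooter passes the
# counterfactual test) but is insensitive to P (the poisoner does not).
```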

Things will change, of course, if we do not know whether the traveler craved water before or after the shot. Our uncertainty can be modeled by introducing a background variable, U, to represent the time when the traveler first reached for drink (figure 22).


Figure 18. The Desert Traveler (The Actual Scenario).
Enemy 1 poisons the water (P = 1); enemy 2 shoots the canteen (X = 1). Actual values: dehydration D = 1, cyanide intake C = 0, death Y = 1.

Figure 19. The Desert Traveler (Constructing a Causal Beam-1).
In the C-family (C = ¬X ∧ P), the link X → C is labeled sustaining and the link P → C inactive.

Figure 20. The Desert Traveler (Constructing a Causal Beam-2).
The inactive link P → C is pruned, leaving C = ¬X.


If the canteen was emptied before T needed to drink, we have the dehydration scenario, as before (figure 21). However, if T drank before the canteen was emptied, we have a new causal beam in which enemy 1 is classified as the cause of death. If U is uncertain, we can use P(u) to compute the probability P(x caused y) because the sentence “x was the actual cause of y” receives a definite truth value for every value u of U. Thus,

P(x caused y) = Σ{u: x caused y in u} P(u)

Temporal Preemption

We come now to resolve a third objection against the counterfactual test: temporal preemption. Consider two fires advancing toward a house. If fire 1 burned the house before fire 2, we (and many juries nationwide) would consider fire 1 the actual cause of the damage, although fire 2 would have done the same if it were not for fire 1. If we simply write the structural model as

H = F1 ∨ F2,

where H stands for "house burns down," the beam method would classify each fire equally as a contributory cause, which is counterintuitive. Here, the second cause becomes ineffective only because the effect has already happened, a temporal notion that cannot be expressed in the static causal model we have used thus far. Remarkably, the idea of a causal beam still gives us the correct result if we use a dynamic model of the story, as shown in figure 23.
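Before turning to the dynamic model, a two-line check of why the static model cannot break the tie (Boolean coding assumed):

    H = lambda F1, F2: int(F1 or F2)   # static model: the house burns if either fire reaches it

    print(H(1, 1))            # 1: both fires were set and the house burns
    # Each fire alone sustains H = 1 regardless of the other, so neither link is
    # pruned, and undoing either fire still leaves the house burnt:
    print(H(0, 1), H(1, 0))   # 1 1 -> the static beam cannot single out fire 1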

Dynamic structural equations are obtained when we index variables by time and ask for the mechanisms that determine their values. For example, we can designate by S(x, t) the state of the fire in location x and time t and describe each variable S(x, t) as dependent on three other variables: (1) the previous state of the adjacent region to the north, (2) the previous state of the adjacent region to the south, and (3) the previous state at the same location (figure 24).

To test which fire was the cause of the damage, we simulate the two actions at their corresponding times and locations and compute the scenario that unfolds from these actions. Applying the process equations recursively, from left to right, simulates the propagation of the two fires and gives us the actual value of each variable in this spatiotemporal domain. In figure 24, white represents unconsumed regions, black represents regions on fire, and grey represents burned regions.
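The following sketch simulates this propagation on a one-dimensional strip. The grid size, the ignition sites (fire 1 placed nearer the house), and the numeric coding of the white/black/grey states are illustrative assumptions, not values taken from the figure.

    UNBURNED, ON_FIRE, BURNED = 0, 1, 2      # white, black, grey in figure 24
    N = 9                                    # illustrative 1-D strip of locations x = 0..8
    HOUSE = 4                                # x*: location of the house
    FIRE1, FIRE2 = 2, 8                      # do(Fire-1), do(Fire-2); fire 1 is set nearer the house

    def step(state):
        """One application of S(x, t) = f[S(x, t - 1), S(x + 1, t - 1), S(x - 1, t - 1)]."""
        new = []
        for x in range(N):
            if state[x] in (ON_FIRE, BURNED):
                new.append(BURNED)                           # a burning region burns out
            else:
                north = state[x + 1] if x + 1 < N else UNBURNED
                south = state[x - 1] if x - 1 >= 0 else UNBURNED
                new.append(ON_FIRE if ON_FIRE in (north, south) else UNBURNED)
        return new

    state = [UNBURNED] * N
    state[FIRE1] = state[FIRE2] = ON_FIRE                    # the two actions at t = 0
    for t in range(1, N):
        state = step(state)
        if state[HOUSE] == ON_FIRE:
            print(f"house catches fire at t* = {t}")         # the nearer fire (fire 1) gets there first
            break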



Figure 21. The Desert Traveler (The Final Beam). The pruned equations are C = ¬X and Y = D, hence Y = X.

Figure 22. The Enigmatic Desert Traveler (Uncertain Scenario). The background variable U represents the time to first drink.

Figure 23. Dynamic Model under Action: do(Fire-1), do(Fire-2).

We are now ready to construct the beam and conduct the test for causation. The resulting beam is unique and is shown in figure 25.

The symmetry is clearly broken: there is a dependence between fire 1 and the state of the house (in location x*) at all times t ≥ t*; no such dependence exists for fire 2. Thus, the earlier fire is proclaimed the actual cause of the house burning.
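Continuing the same illustrative simulation, the sketch below records, for each region that ignites, the neighbor that actually set it on fire (its sustaining parent) and traces the house's burning back through these links; the trace ends at fire 1's ignition site, mirroring the beam of figure 25.

    UNBURNED, ON_FIRE, BURNED = 0, 1, 2
    N, HOUSE, FIRE1, FIRE2 = 9, 4, 2, 8          # same illustrative strip as in the previous sketch

    state = [UNBURNED] * N
    state[FIRE1] = state[FIRE2] = ON_FIRE
    ignited_by = {FIRE1: None, FIRE2: None}      # the ignition actions have no sustaining parent
    house_t = None

    for t in range(1, N):
        new = list(state)
        for x in range(N):
            if state[x] in (ON_FIRE, BURNED):
                new[x] = BURNED
            else:
                for nb in (x - 1, x + 1):
                    if 0 <= nb < N and state[nb] == ON_FIRE:
                        new[x] = ON_FIRE
                        ignited_by[x] = nb       # the beam keeps only this sustaining link
                        break
        state = new
        if house_t is None and state[HOUSE] == ON_FIRE:
            house_t = t

    print(f"house ignites at t* = {house_t}")
    # Trace the beam backward from the house's ignition to one of the two actions.
    x = HOUSE
    while ignited_by[x] is not None:
        x = ignited_by[x]
    print("actual cause:", "Fire-1" if x == FIRE1 else "Fire-2")   # Fire-1

Placing fire 2 nearer the house in this sketch makes the same trace end at fire 2 instead, as the temporal account demands.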


Conclusions

I would like to conclude this article by quoting two great scientists. The first is Democritus (460–370 B.C.), the father of the atomic theory of matter, who said: "I would rather discover one causal relation than be King of Persia." Although the political situation in Persia has changed somewhat from the time he made this statement, I believe Democritus had a valid point in reminding us of the many application areas that could benefit from the discovery of even one causal relation, namely, from the solution of one toy problem on an AI scale. As I discussed earlier, these applications include medicine, biology, economics, and social science, and I believe AI is in a unique position to help these areas because only AI enjoys the combined strength of model searching, learning, and automated reasoning within the logic of causation.

The second quote is from Albert Einstein who, a year before his death, attributed the progress of Western science to two fortunate events: "Development of Western science is based on two great achievements: The invention of the formal logical system (in Euclidean geometry) by the Greek philosophers, and the discovery of the possibility to find out causal relationships by systematic experiment (during the Renaissance)."

I have tried to convince you that experimental science has not fully benefited from the power of formal methods: formal mathematics was used primarily for analyzing passive observations under fixed boundary conditions, but the choice of actions, the design of new experiments, and the transitions between boundary conditions have been managed by the unaided human intellect. The development of autonomous agents and intelligent robots requires a new type of analysis in which the doing component of science enjoys the benefit of formal mathematics side by side with its observational component. A short glimpse of these benefits was presented here. I am convinced that the meeting of these two components will eventually bring about another scientific revolution, perhaps equal in impact to the one that took place during the Renaissance. AI will be the major player in this revolution, and I hope each of you takes part in seeing it off the ground.

Acknowledgments

I want to acknowledge the contributions of a wonderful team of colleagues and students with whom I was fortunate to collaborate: Alex Balke, David Chickering, Adnan Darwiche, Rina Dechter, Hector Geffner, David Galles, Moises Goldszmidt, Sander Greenland, Joseph Halpern, David Heckerman, Jin Kim, Jamie Robins, and Tom Verma. Most of these names should be familiar to you from other stages and other shows, except perhaps Greenland and Robins, two epidemiologists who are carrying the banner of causal analysis in epidemiology (Greenland et al. 1999).

I have borrowed many ideas from other authors; the most influential ones are listed in the references, but others are cited in Pearl (2000). I only mention that the fundamental idea that actions be conceived as modifiers of mechanisms goes back to Jacob Marschak and Herbert Simon. Strotz and Wold (1960) were the first to represent actions by "wiping out" equations, and I would never have taken seriously the writings of these "early" economists if it were not for Peter Spirtes's lecture,4 where I first learned about manipulations and manipulated graphs.




Figure 24. The Resulting Scenario. S(x, t) = f [S(x, t – 1), S(x + 1, t – 1), S(x – 1, t – 1)].

Figure 25. The Dynamic Beam. Actual cause: Fire-1.



Craig Boutilier deserves much credit for turning my unruly lecture into a readable article.

Notes

1. The lecture was delivered as part of the IJCAI Research Excellence Award for 1999. Color slides and the original transcript can be viewed at bayes.cs.ucla.edu/IJCAI99/. Detailed technical discussion of this material can be found in Pearl (2000).

2. H. Simon (1953) devised a test for deciding when a one-to-one correspondence exists; see Pearl (2000).

3. The notation is defined in Pearl (2000). For example, (Y ⊥⊥ Z | X) in G_X̄Z̲ states that in the subgraph formed by deleting all arrows entering X and leaving Z, the nodes of X d-separate those of Y from those of Z.

4. Given at the International Congress of Philosophy of Science, Uppsala, Sweden, 1991.

Bibliography

Druzdzel, M. J., and Simon, H. A. 1993. Causality in Bayesian Belief Networks. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, eds. D. Heckerman and A. Mamdani, 3–11. San Francisco, Calif.: Morgan Kaufmann.

Fikes, R. E., and Nilsson, N. J. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence 2(3–4): 189–208.

Good, J. 1961. A Causal Calculus, I. British Journal for the Philosophy of Science 11:305–318.

Greenland, S. 1999. Relation of the Probability of Causation to the Relative Risk and the Doubling Dose: A Methodologic Error That Has Become a Social Problem. American Journal of Public Health 89:1166–1169.

Greenland, S.; Pearl, J.; and Robins, J. M. 1999. Causal Diagrams for Epidemiologic Research. Epidemiology 10(1): 37–48.

Haavelmo, T. 1943. The Statistical Implications of a System of Simultaneous Equations. Econometrica 11:1–12. Reprinted in The Foundations of Econometric Analysis, eds. D. F. Hendry and M. S. Morgan, 477–490. Cambridge, U.K.: Cambridge University Press, 1995.

Hall, N. 2002. Two Concepts of Causation. In Causation and Counterfactuals, eds. J. Collins, N. Hall, and L. A. Paul. Cambridge, Mass.: MIT Press. Forthcoming.

Halpern, J. Y. 1998. Axiomatizing Causal Reasoning. In Uncertainty in Artificial Intelligence, eds. G. F. Cooper and S. Moral, 202–210. San Francisco, Calif.: Morgan Kaufmann.

Halpern, J. Y., and Pearl, J. 2001a. Causes and Explanations: A Structural-Model Approach, Part 1: Causes. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 194–202. San Francisco, Calif.: Morgan Kaufmann.

Halpern, J. Y., and Pearl, J. 2001b. Causes and Explanations: A Structural-Model Approach, Part 2: Explanations. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 27–34. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Heckerman, D. 1999. Learning Bayesian Networks. Invited Talk presented at the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), 3 August, Stockholm, Sweden.

Heckerman, D., and Shachter, R. 1995. Decision-Theoretic Foundations for Causal Reasoning. Journal of Artificial Intelligence Research 3:405–430.

Hume, D. 1748 (rep. 1988). An Enquiry Concerning Human Understanding. Chicago: Open Court.

Lewis, D. 1973. Counterfactuals. Cambridge, Mass.: Harvard University Press.

Lin, F. 1995. Embracing Causality in Specifying the Indirect Effects of Actions. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1985–1991. Menlo Park, Calif.: International Joint Conferences on Artificial Intelligence.

Michie, D. 1999. Adapting Good's Theory to the Causation of Individual Events. In Machine Intelligence 15, eds. D. Michie, K. Furukawa, and S. Muggleton, 60–86. Oxford, U.K.: Oxford University Press.

Nayak, P. P. 1994. Causal Approximations. Artificial Intelligence 70(1–2): 277–334.

Pearl, J. 2000. Causality: Models, Reasoning, and Inference. New York: Cambridge University Press.

Pearl, J. 1995. Causal Diagrams for Empirical Research. Biometrika 82(4): 669–710.

Reiter, R. 1987. A Theory of Diagnosis from First Principles. Artificial Intelligence 32(1): 57–95.

Shoham, Y. 1988. Reasoning about Change: Time and Causation from the Standpoint of Artificial Intelligence. Cambridge, Mass.: MIT Press.

Simon, H. A. 1953. Causal Ordering and Identifiability. In Studies in Econometric Method, eds. W. C. Hood and T. C. Koopmans, 49–74. New York: Wiley.

Spirtes, P.; Glymour, C.; and Scheines, R. 1993. Causation, Prediction, and Search. New York: Springer-Verlag.

Strotz, R. H., and Wold, H. O. A. 1960. Recursive versus Nonrecursive Systems: An Attempt at Synthesis. Econometrica 28(2): 417–427.

Tian, J., and Pearl, J. 2000. Probabilities of Causation: Bounds and Identification. Annals of Mathematics and Artificial Intelligence 28:287–313.

Wright, S. 1921. Correlation and Causation. Journal of Agricultural Research 20:557–585.

Wright, S. 1920. The Relative Importance of Heredity and Environment in Determining the Piebald Pattern of Guinea Pigs. Proceedings of the National Academy of Sciences 6:320–332.

Judea Pearl received a B.S. in electrical engineering from the Technion, Haifa, Israel, in 1960; an M.S. in physics from Rutgers University; and a Ph.D. in electrical engineering from the Polytechnic Institute of Brooklyn. He joined the faculty at the University of California at Los Angeles in 1970 and is currently director of the Cognitive Systems Laboratory, where he conducts research in knowledge representation, probabilistic and causal reasoning, constraint processing, and learning. A member of the National Academy of Engineering, a Fellow of the Institute of Electrical and Electronics Engineers, and a Fellow of the American Association for Artificial Intelligence, Pearl is a recipient of the IJCAI Research Excellence Award (1999), the AAAI Classic Paper Award (2000), and the Lakatos Award for Outstanding Contribution to the Philosophy of Science (2001).

