
1 Cause and Cancer Epidemiology

STEVEN N. GOODMAN AND JONATHAN M. SAMET

The prevention of disease has long been based implicitly on taking action on the assumption that a disease is caused by a factor that can be controlled. Early examples include the experimental evidence generated by Lind, showing that consumption of oranges and lemons prevented scurvy, and Snow's observations on cholera occurrence in London, showing a disease pattern consistent with water-borne transmission (Rosen, 1993). In these examples, preventive steps followed: after Lind's experiment, the diets of the British navy were supplemented with citrus fruits; and after Snow's observational study, steps were taken to ensure that the source of water was changed in the affected areas of London. Over the ensuing centuries, infectious agents were causally linked to specific diseases, and prevention was accomplished by interrupting transmission and by vaccines. During the twentieth century, public health was threatened by parallel epidemics of chronic diseases, including cancer; and as the causal agents were identified, a broad range of preventive strategies were implemented.

The concept of causation has long had a central role in the application of epidemiologic evidence for controlling cancer. The designation of a risk factor as "causal" has been the starting point for initiating cancer prevention programs based on reducing exposure to the risk factor. Although the concept of causation itself remains a matter of continuing discussion among philosophers and others, use of the term in public health implies that the evidence supporting causality of association has reached a critical threshold of certainty and that reduced exposure can be expected to be followed by reduced disease occurrence. Over the last 50 years, identification of the causes of cancer has been the primary focus of most epidemiologic research on cancer; only recently has attention shifted toward identifying genetic determinants of susceptibility and markers of the early stages of carcinogenesis.

There are numerous examples of how identifying a cause of cancer has led to intervention and reduction of cancer occurrence. Tobacco use and cancer of the lung is a notable example for its historical precedence and for the framework applied to the scientific evidence as the causality of the association was evaluated (US Department of Health Education and Welfare-DHEW, 1964; White, 1990). The range of causal risk factors for cancer is broad, including infectious agents (e.g., human papillomavirus and cervical cancer), physical agents (e.g., ionizing radiation and leukemia), inhaled agents (e.g., radon and lung cancer), pharmaceutical agents and hormones (e.g., diethylstilbestrol and adenocarcinoma of the vagina), food contaminants (e.g., aflatoxin and liver cancer), workplace exposures (e.g., asbestos), lifestyle-related exposures (e.g., alcohol consumption), and genetic mutations (e.g., Li-Fraumeni syndrome). These and other factors considered to be causes of cancer have been given this label only after the accumulation of sufficient evidence, in most instances derived from both epidemiologic and laboratory research.

This chapter provides an overview of causal inference with a focus on the interpretation of epidemiologic data on cancer risk. It begins with an introduction to the centuries-old discussion on cause and causation and next considers the epidemiologic concept of causation, setting the discussion in the context of current understanding of carcinogenesis as a multistep process. The criteria for causation, often attributed to the British medical statistician Sir Austin Bradford Hill (Hill, 1965) or to the 1964 Report of the U.S. Surgeon General on tobacco (US Department of Health Education and Welfare-DHEW, 1964), have provided a framework for evaluating evidence to judge the causality of associations.

These criteria are addressed in depth, and their application is illustrated with the example of smoking, both active and passive, and lung cancer. The chapter concludes with a consideration of emerging issues concerned with causation, including the interpretation of data coming from the new technologies of contemporary "molecular epidemiology" and new approaches to evaluating causation.

CONCEPTS OF CAUSATION

At its foundation, "cause" is not knowable with certainty. This fact underlies much of the methodologic and conceptual confusion that often swirls around claims of causation based on scientific data. The fundamental intuition underlying the causal concept is that event "A" somehow produces another event, "B." However, the "production" of "B" is not observable. The philosopher Bryan Magee summarized this conundrum eloquently.

It seems to be impossible for us to form any conception of an ordered world at all without the idea of there being causal connections between events. But when we pursue this idea seriously we find that causal connection is not anything we ever actually observe, nor ever can observe. We may say that Event A causes Event B, but when we examine the situation we find that what we actually observed is Event A followed by Event B. There is not some third entity between them, a causal link, which we also observe. . . . So we have this indispensable notion of cause at the very heart of our conception of the world, and of our understanding of our own experience, which we find ourselves quite unable to validate by observation or experience . . . It actually purports to tell us how specific material events are related to each other in the real world, yet it is not derived from, nor can it be validated by, observation of that world. This is deeply mysterious. (Magee, 2001)

The fact that causation is not directly observable means that scientists and philosophers have had to develop a set of constructs and heuristics by which to define a "cause" operationally. These constructs typically have two components: a predictive or associational one, determined empirically, and an explanatory one, based on a proposed underlying mechanism. All causal claims rest on these twin pillars; an association with no plausible mechanistic basis is typically not accepted as causal, and a proposed mechanism, however well founded, cannot be accepted as the basis for a causal claim without empirical demonstration that the effect occurs more often in the presence of the purported cause than in its absence. However, these components need not contribute equally, and various causal claims may rest on quite different balances of contributions of empirical and mechanistic information.

Underlying any operational definition of causality must be an ontologic one: that is, how a cause is defined in principle. A particularly useful, widely accepted definition in both philosophy and epidemiology is the "counterfactual" notion of causation. This concept had its origins at least as far back as the Scottish philosopher David Hume (1711-1776) (Hume, 1739). During the twentieth century, this concept was further developed and applied by statisticians, philosophers, and epidemiologists (Bunge, 1959; Lewis, 1973; Rubin, 1974; Robins, 1986, 1987; Greenland, 1990; Neyman, 1990; Greenland et al., 1999; Pearl, 2000). The counterfactual definition holds that something is a cause of a given outcome if, when the same individual is observed with and without a purported cause and without changing any other characteristic of that individual, a different outcome would be observed. For example, the counterfactual state for a smoker is the same individual never having smoked. The state that cannot be observed is called the counterfactual state: literally, counter to the observed facts. The impossibility of observing the counterfactual state is what makes all causal claims subject to uncertainty.

The above definition is deterministic; that is, the outcome always occurs in the presence of the cause and never occurs without it. However, health research rarely deals with either a cause that inevitably produces certain outcomes, or outcomes that cannot occur absent specific causes. For example, smokers do not always get lung cancer, and never-smokers do develop this malignancy. Therefore, the counterfactual definition must be expanded to encompass the notion of a probabilistic outcome. That is, the formal definition of a cause in epidemiology requires that a factor X be associated with a difference in the probability of an outcome. For example, if X may take on two different values, y or z:

Condition 1: observed association

\[ \Pr(\text{outcome} \mid X = y) \neq \Pr(\text{outcome} \mid X = z) \]

Properly designed studies provide a scientific basis for inferring what the outcome of the counterfactual state would be and permit the related uncertainty to be quantified. In a laboratory, scientists are able to predict the outcome in this counterfactual state, generally with a high degree of confidence, by repeating an experimental procedure with every factor tightly controlled, varying only the factor of interest. In observational studies of humans, however, researchers must try to infer what the outcome would be in a counterfactual state by studying another group of persons who, at least on average, are substantively different from the exposed group in only one variable: the exposure under study. The outcome of this second group is used to represent what would have occurred in the original group if it were observed with an exposure different from that which actually existed (Greenland, 1990). In the case of smoking and disease, this comparison is between disease risk in smokers and nonsmokers.

Simply observing a difference in the probability of an outcome between two groups that differ on X is not a sufficient condition for causation because it does not distinguish between causation and spurious or indirect association, produced by "confounders," or ancillary causes. The notion of "causation" requires that the cause somehow actively "produce" its effect, which is captured operationally by the requirement that active manipulation of the cause should produce a change in the probability of the outcome. For example, if one saw that students with poor visual acuity typically sat closer to the front of a classroom, one would not call the seating arrangement a "cause" of their poor eyesight unless it could be shown that seating them farther back improved it. The notation that captures this idea introduces an operator that is not part of traditional statistical notation, "set(X = x)," which corresponds to actively setting a risk factor X equal to some value x, rather than simply observing that the factor is equal to x. Thus the counterfactual notion of probabilistic causation for a risk factor X requires condition 2.

Condition 2: no confounding

\[ \Pr(\text{outcome} \mid \operatorname{set}(X = x)) = \Pr(\text{outcome} \mid X = x) \]

If we put together condition 1 (that there is an observed association between cause and effect) with condition 2 (that there is no other indirect cause responsible for the observed effect), we have the counterfactual condition for probabilistic causation, expressed as follows.

Condition 1 + Condition 2 = Causality condition

\[ \Pr[\text{outcome} \mid \operatorname{set}(X = y)] \neq \Pr[\text{outcome} \mid \operatorname{set}(X = z)] \]

This condition states that if the probability of an outcome changes when risk factor X is actively changed from z to y, then X is regarded as a cause of the outcome.
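The distinction between observing X and setting X can be illustrated with a small exact calculation. The sketch below is only an illustration under stated assumptions: a single binary confounder C, made-up probabilities, and an outcome whose risk depends on C but not on X. The function names and all numbers are hypothetical and do not come from the chapter. Setting X is modeled by leaving the population distribution of C unchanged, which is valid only if C is the sole source of confounding.

```python
# Hypothetical probabilities, chosen only for illustration.
p_c = {0: 0.5, 1: 0.5}                # Pr(C = c) in the population
p_x_given_c = {0: 0.2, 1: 0.8}        # Pr(X = 1 | C = c): C makes exposure more likely
p_y_given_xc = {                      # Pr(outcome | X = x, C = c): no dependence on X
    (0, 0): 0.1, (1, 0): 0.1,
    (0, 1): 0.3, (1, 1): 0.3,
}

def observed_risk(x, p_c, p_x_given_c, p_y_given_xc):
    """Pr(outcome | X = x): risk among those merely observed to have X = x."""
    # Bayes' rule: weight of each confounder level among persons with X = x.
    joint = {
        c: (p_x_given_c[c] if x == 1 else 1 - p_x_given_c[c]) * p_c[c]
        for c in p_c
    }
    total = sum(joint.values())
    return sum(p_y_given_xc[(x, c)] * joint[c] / total for c in p_c)

def set_risk(x, p_c, p_y_given_xc):
    """Pr(outcome | set(X = x)): X is assigned, so C keeps its population distribution."""
    return sum(p_y_given_xc[(x, c)] * p_c[c] for c in p_c)

for x in (0, 1):
    print(x, round(observed_risk(x, p_c, p_x_given_c, p_y_given_xc), 3),
          round(set_risk(x, p_c, p_y_given_xc), 3))
```

With these assumed numbers, condition 1 holds (the observed risks differ between X = 1 and X = 0), but condition 2 fails, and the risks under set(X = 1) and set(X = 0) are equal, so X would not be regarded as a cause.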

In the randomized controlled trial, a risk factor is actively manipulated. Understanding the role of randomization can deepen insights into the interpretation of nonrandomized designs used in epidemiology. Randomization has two critical consequences: (1) it makes exposure to a proposed causal factor independent of potentially confounding factors; and (2) it provides a known probability distribution for the potential outcomes in each group under a given mathematical hypothesis (i.e., the null) (Greenland, 1990). Randomization does not necessarily free the inference from an individual randomized study from unmeasured confounding (it does so only on average). Randomization does imply that measures of uncertainty about causal estimates from randomized studies have an experimental foundation. In the absence of randomization, uncertainty about causal effects depends in part on the confidence that all substantive confounding has been eliminated or controlled by either the study design or the analysis. The level of confidence is ultimately based on scientific judgment and consequently is subject to uncertainty and questioning.

One way to increase that confidence is to repeat the study. Similar results in a series of randomized studies make it increasingly unlikely that unmeasured confounding is accounting for the findings, as the process of randomization makes the mathematical probability of such confounding progressively smaller as the sample size or number of studies increases. In observational studies, however, increasing the number of studies may reduce the random component of uncertainty, but not necessarily the systematic component attributable to confounding. Without randomization, there is no mathematical basis for assuming that an imbalance of unknown confounders decreases with an increase in the number of studies. However, if observational studies are repeated in different settings with different persons, different eligibility criteria, and/or different exposure opportunities, each of which might eliminate another source of confounding from consideration, the confidence that unmeasured confounders are not producing the findings is increased. How many studies need to be done, how diverse they need to be, and how relevant they are to the question at hand are matters of scientific judgment, and explicit criteria cannot be offered.
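The claim that randomization balances unmeasured confounders only on average, with the imbalance shrinking as sample size grows, can be checked with a short simulation. This is a minimal sketch with assumed values: a binary unmeasured confounder with a hypothetical prevalence of 0.3, equal-sized arms, and a function name invented for the illustration.

```python
import random

def mean_abs_imbalance(n, prevalence=0.3, replications=500, seed=0):
    """Average absolute difference in confounder prevalence between randomized arms."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(replications):
        # An unmeasured binary confounder, fixed before any assignment is made.
        confounder = [rng.random() < prevalence for _ in range(n)]
        # Randomize: shuffle the subjects and expose the first half.
        order = list(range(n))
        rng.shuffle(order)
        half = n // 2
        exposed = sum(confounder[i] for i in order[:half]) / half
        unexposed = sum(confounder[i] for i in order[half:]) / (n - half)
        total += abs(exposed - unexposed)
    return total / replications

for n in (20, 200, 800):
    print(n, round(mean_abs_imbalance(n), 3))
```

Any single small trial can show a noticeable imbalance, but the average imbalance falls as n increases. No comparable guarantee exists when exposure is not assigned at random, which is the point made above about observational studies.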

Confidence that unmeasured confounding is not producing the observed results is further increased by understanding the biologic process by which the exposure might affect the outcome. This understanding allows better identification and measurement of relevant confounders, making it more unlikely that unmeasured factors are of concern. Biologic understanding can also serve as the basis for a judgment that the observed difference in outcome frequency could be produced only by an implausible degree of confounder imbalance between exposed and unexposed groups. Thus, causal conclusions from observational studies typically require more and stronger biologic evidence to support plausibility and to exclude confounding than is needed for causal inferences based on randomized studies.

COMPONENT CAUSE MODEL

Causes defined in the manner described above can be viewed as working together in many ways. In 1976 Rothman proposed a useful framework for considering multiple-cause diseases (Rothman, 1976) that has ready extension to the causation of cancer, as most cancers have several causes. Rothman proposed that a disease may have several sufficient causes, each accounting for some proportion of the cases in the population, and that each cause may have several components (Fig. 1-1) (Rothman, 1976). In this model, disease develops only when every component of at least one complete cause is present. For example, sufficient cause I would be incomplete if A were not present; and because A is a component of each of the three complete causes, it is a necessary cause.
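The logic of sufficient and necessary causes can be expressed with sets. In the sketch below, the component letters assigned to causes I, II, and III are assumptions made for illustration; the text states only that A belongs to all three causes and that A and B both belong to causes I and II. The function names are likewise hypothetical.

```python
# Assumed component sets for three sufficient causes (illustrative only).
SUFFICIENT_CAUSES = {
    "I": frozenset("ABCDE"),
    "II": frozenset("ABFGH"),
    "III": frozenset("ACFIJ"),
}

def necessary_components(sufficient_causes):
    """Components that appear in every sufficient cause."""
    return frozenset.intersection(*sufficient_causes.values())

def completed_causes(present, sufficient_causes):
    """Names of the sufficient causes whose components are all present."""
    have = set(present)
    return [name for name, parts in sufficient_causes.items() if parts <= have]

print(sorted(necessary_components(SUFFICIENT_CAUSES)))
print(completed_causes("ABCDE", SUFFICIENT_CAUSES))  # all components of cause I
print(completed_causes("BCDE", SUFFICIENT_CAUSES))   # A missing: no cause is complete
```

Under these assumed sets, A is the only necessary component, and removing A leaves every sufficient cause incomplete.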

Rothman's model is useful for considering causation of malignancy, particularly for most of the cancers for which multiple genetic and environmental risk factors may play a role in a multistage process that transforms a normal cell into a malignant cell. There may be multiple ways to complete this sequence of changes, involving the actions of different complexes of factors, analogous to complete causes I, II, and III in Figure 1-1. The sufficient causes might include multiple environmental and genetic risk factors, including environmental exposures and genes determining carcinogen metabolism and DNA repair. Individual cases would result from having the full complement of components for a complete cause. We know, for example, that cigarette smoking is a powerful cause of lung cancer, but not all smokers develop lung cancer, implying that this factor may need to act in combination with other factors, perhaps genetic, to complete one sufficient cause for lung cancer. Some sufficient causes for lung cancer do not include smoking, as some percentage of lung cancer cases occur in never-smokers (about 5%-10% of cases in the United States at present) (Alberg and Samet, 2003).

Figure 1-1. Conceptual scheme for the causes of a hypothetical disease. (Source: Rothman, 1976.)

Rothman's model has one significant implication for considering the burden of cancer attributable to a particular risk factor, a calculation often made when assigning priorities to prevention initiatives. The presence of several components in the same complete causes (e.g., A and B in causes I and II) implies that the cases associated with these causes might be prevented by eliminating exposure to either A or B. Summed across factors, the burden of disease that could be prevented can therefore exceed 100%, as the attributable risk estimates for A and B would include some of the same cases. In some past reviews of the burden of preventable cancer, the assumption was made incorrectly that the attributable risks associated with various causal factors should add up to 100% (Samet and Lerchen, 1984).
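A small exact calculation shows how attributable fractions can sum to more than 100%. The sketch assumes a deliberately simple structure that is not taken from the chapter: one sufficient cause requiring both A and B, a second consisting of C alone, components occurring independently, and made-up prevalences. The attributable fraction for a factor is computed here as the proportion of cases that would not occur if that factor were removed.

```python
from itertools import product

def disease_probability(prevalence, sufficient_causes, removed=frozenset()):
    """Exact probability of disease when components occur independently."""
    names = sorted(prevalence)
    total = 0.0
    for pattern in product([False, True], repeat=len(names)):
        weight = 1.0
        have = set()
        for name, is_present in zip(names, pattern):
            # A removed factor is treated as having prevalence zero.
            p = 0.0 if name in removed else prevalence[name]
            weight *= p if is_present else 1.0 - p
            if is_present:
                have.add(name)
        # Disease occurs when every component of some sufficient cause is present.
        if any(cause <= have for cause in sufficient_causes):
            total += weight
    return total

def attributable_fraction(prevalence, sufficient_causes, factor):
    """Share of cases that would be prevented by removing one factor."""
    baseline = disease_probability(prevalence, sufficient_causes)
    without = disease_probability(prevalence, sufficient_causes, removed={factor})
    return (baseline - without) / baseline

# Hypothetical prevalences and causes, for illustration only.
prevalence = {"A": 0.5, "B": 0.4, "C": 0.1}
causes = [frozenset("AB"), frozenset("C")]

fractions = {f: attributable_fraction(prevalence, causes, f) for f in prevalence}
print({f: round(v, 3) for f, v in fractions.items()})
print(round(sum(fractions.values()), 3))
```

Cases arising from the cause that contains both A and B are counted once for A and again for B, so the fractions overlap and their sum exceeds 1 under these assumptions.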

CRITERIA FOR CAUSALITY

Epidemiologists and other public health researchers have needed pragmatic definitions of causation to support the translation of research evidence into interventions directed at reducing the exposure to causal risk factors (Susser, 1973, 1991; Olsen, 2003). The epidemiologic literature has consequently placed great emphasis on the approach to evaluating evidence to determine if a factor can be considered to cause disease. The approaches that have been developed for evaluating causality of associations also draw on multiple lines of scientific evidence; epidemiologic evidence alone is generally not regarded as sufficient for establishing causality (Last, 2000).

Making causal inferences from observational data, in combination with other relevant forms of data, can be a challenging task that requires expert judgment regarding the likely sources and magnitude of confounding, together with judgment about how well the existing constellation of study designs, results, and analyses address this potential threat to inferential validity. This judgment also needs to incorporate a broader assessment of the evidence, evaluating whether a causal effect has support in the existing knowledge of the underlying biologic process. To aid this judgment, criteria for determining a cause have been proposed by many philosophers and scientists over the centuries.

In biomedical research, the first criteria came following the discovery of bacteria during the nineteenth century. A method was then needed for judging if an organism caused a particular disease. The first criteria put forward for making this judgment are generally attributed to Robert Koch and his mentor Jacob Henle, although Koch also acknowledged Eugene Klebs. Evans (1993) provided a full accounting of the elaboration of these criteria, now referred to as the Henle-Koch postulates (Table 1-1). The criteria proved valuable for linking infectious agents to infectious diseases, which often have specific clinical features and unique, specific causal agents (e.g., pulmonary tuberculosis and Mycobacterium tuberculosis).

Table 1-1. Henle-Koch Postulates

1. The parasite occurs in every case of the disease in question and under circumstances that can account for the pathologic changes and clinical course of the disease.

2. It occurs in no other disease as a fortuitous and nonpathogenic parasite.

3. After being fully isolated from the body and repeatedly grown in pure culture, it can induce the disease anew.

Source: Evans (1993).

These criteria, however, proved unsuitable for establishing the causes of the epidemics of "chronic disease," including coronary heart disease, chronic lung disease, and cancer, that became the dominant causes of death spanning the twentieth century, as infectious diseases were controlled. Unlike many infectious diseases, these diseases were often found to be associated with multiple factors, and many cases could not be clearly linked to any risk factors. The limitations of the Henle-Koch postulates were recognized as the results of the first wave of epidemiologic studies on the chronic diseases were reported. In 1959, Yerushalmy and Palmer proposed criteria for evaluating possible etiologic risk factors for chronic diseases that acknowledged the need for evidence of increased risk in exposed persons and for handling the nonspecific causation of these diseases. Lilienfeld (1959) and Sartwell (1960), discussing the article, added the consideration of dose-response, the strength of the association, its consistency, and its biologic plausibility.

The most widely cited criteria in epidemiology and public health more generally were set forth by Sir Austin Bradford Hill in 1965 (Table 1-2) (Hill, 1965). Five of the nine criteria he listed were also put forward in the 1964 Surgeon General's report (US Department of Health Education and Welfare-DHEW, 1964) as the criteria for causal judgment: consistency, strength, specificity, temporality, and coherence of an observed association. Hill also listed biologic gradient (dose-response), plausibility, experiment (or natural experiment), and analogy. Many of these criteria had been cited in earlier epidemiologic writings (Lilienfeld, 1959; Yerushalmy and Palmer, 1959; Sartwell, 1960); and Susser and others have refined them extensively by exploring their justification, merits, and interpretations (Susser, 1973, 1977; Kaufman and Poole, 2000).

Hill (1965) clearly stated that these criteria were not intended to serve as a checklist:

Here are then nine different viewpoints from all of which we should study association before we cry causation. What I do not believe . . . is that we can usefully lay down some hard-and-fast rules of evidence that must be obeyed before we accept cause and effect. None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis, and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to make up our minds on the fundamental question - is there any other way of explaining the facts before us, is there any other answer equally, or more, likely than cause and effect? (Hill 1965, p. 299)

Table 1-2. Sir Austin Bradford Hill's Causal Criteria: Aspects of Association to Be Considered Before Deciding on Causation

Strength
Consistency
Specificity
Temporality
Biologic gradient
Plausibility
Coherence
Experiment
Analogy

Source: Hill (1965).

All of these criteria were meant to be applied to evidence related to an already established statistical association; if no association has been observed, these criteria are not relevant. Hill explained how if a given criterion were satisfied it strengthened a causal claim. Each of these nine criteria served one of two purposes: as evidence against competing noncausal explanations or as positive support for causal ones. Noncausal explanations for associations include chance; residual or unmeasured confounding; model misspecification; selection bias; errors in measurement of exposure, confounders, or outcome; and issues regarding missing data (which can also include missing studies, such as publication bias). The criteria are discussed below.

Consistency

The criterion consistency refers to the persistent finding of an association between exposure and outcome in multiple studies of adequate power and carried out by different investigators in studies involving different persons, places, circumstances, and times. Consistency can have two implications for causal inference. First, consistent findings make unmeasured confounding an unlikely alternative explanation to causation for an observed association. Such confounding would have to persist across diverse populations, exposure opportunities, and measurement methods. Confounding remains possible if the exposure of interest is strongly and universally tied to an alternative cause, as was claimed in the form of the "constitutional hypothesis" put forward in the early days of the smoking-disease debate (US Department of Health Education and Welfare-DHEW, 1964). This hypothesis held that there was a constitutional (i.e., genetic) factor that made people more likely to both smoke and develop cancer. Thus, consistency serves mainly to exclude the possibility that the association is produced by an ancillary factor that differs across studies; it cannot exclude a factor that is common to all or most of them (Rothman and Greenland, 1998).

The second implication of the consistency criterion is to reduce the possibility of a chance effect by increasing the statistical strength of an association through the accumulation of a large body of data. Consistency does not include the qualitative strength of such studies, which Susser subsumed under his subsidiary concept of "survivability," relating to the rigor and severity of tests of association (Susser, 1991).
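How accumulating data reduces the role of chance can be sketched numerically. The Python fragment below pools relative risks from several studies by inverse-variance (fixed-effect) weighting on the log scale. This is a standard meta-analytic technique that the chapter itself does not describe; the three study estimates and the function name `pool_fixed_effect` are hypothetical and serve only as an illustration.

```python
import math

def pool_fixed_effect(studies, z=1.96):
    """Pool (rr, ci_low, ci_high) tuples by inverse-variance weighting of log(RR).

    Assumes each interval is a 95% CI that is symmetric on the log scale.
    """
    log_sum = 0.0
    weight_sum = 0.0
    for rr, lo, hi in studies:
        # Recover the standard error of log(RR) from the CI width.
        se = (math.log(hi) - math.log(lo)) / (2 * z)
        weight = 1.0 / se ** 2
        log_sum += weight * math.log(rr)
        weight_sum += weight
    pooled_log = log_sum / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )

# Hypothetical studies: each is individually compatible with no effect
# (lower confidence limit below 1.0), but all point in the same direction.
studies = [(1.30, 0.90, 1.88), (1.20, 0.80, 1.80), (1.25, 0.95, 1.64)]
pooled_rr, pooled_lo, pooled_hi = pool_fixed_effect(studies)
print(f"pooled RR = {pooled_rr:.2f} (95% CI {pooled_lo:.2f}-{pooled_hi:.2f})")
```

With these made-up inputs the pooled interval is narrower than any single study's interval and excludes 1.0. Pooling of this kind addresses chance only; it does nothing about a bias or confounder shared by all of the studies.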

Strength of Association

Strength of association includes two dimensions: the magnitude of the association and its statistical strength. An association strong in both aspects makes the alternative explanations of chance and confounding unlikely. The larger the measured effect, the less likely it is that an unmeasured or poorly controlled confounder could account for it completely. Associations that have a small magnitude or weak statistical strength are more likely to reflect chance, a modest degree of bias, or unmeasured weak confounding. However, the magnitude of association is reflective of underlying biologic processes and should be consistent with understanding the role of the risk factor in these processes. Either a strong or a weak effect might be considered plausible based on knowledge of the underlying processes.
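The two dimensions can be illustrated with a short calculation. The sketch below computes a relative risk and an approximate 95% confidence interval from cohort counts (the usual log-scale approximation), and then the E-value, a sensitivity-analysis measure from later literature that is not part of this chapter: the minimum relative risk that an unmeasured confounder would need to have with both exposure and outcome to fully account for the observed association. The counts and function names are hypothetical.

```python
import math

def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Relative risk with an approximate 95% CI computed on the log scale."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

def e_value(rr):
    """E-value for a relative risk: RR + sqrt(RR * (RR - 1)), taken for RR >= 1."""
    if rr < 1:
        rr = 1 / rr  # protective associations are inverted first
    return rr + math.sqrt(rr * (rr - 1))

# Hypothetical cohort: 100 cases among 1000 exposed, 10 among 1000 unexposed.
rr, lo, hi = relative_risk(100, 1000, 10, 1000)
print(f"RR = {rr:.1f} (95% CI {lo:.1f}-{hi:.1f}), E-value = {e_value(rr):.1f}")

# A weak association needs only a modest confounder to be explained away.
print(f"E-value for RR = 1.2: {e_value(1.2):.2f}")
```

Under these assumptions, a relative risk near 10 could be explained away only by a confounder far stronger than any recognized cause of the disease, whereas a relative risk near 1.2 could be produced by quite modest confounding.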

In the example of active smoking and lung cancer, the relative risks listed in the first Surgeon General's Report (US Department of Health Education and Welfare-DHEW, 1964) were notably elevated in men, reaching as high as 10 or more. At that time, other causes of lung cancer, including air pollution and occupational agents, had been identified. However, for the general population, the risks from these factors were far lower, making them unsatisfactory as potential confounders, [illegible] to the observed association of active smoking with lung cancer.

Passive smoking, by contrast, has a far smaller effect on lung cancer risk when persons with greater and lesser exposures are compared (e.g., never-smoking women married to smokers compared with never-smoking women married to never-smokers). The 1986 report of the U.S. Surgeon General (US Department of Health and Human Services-USDHHS, 1986) concluded that passive smoking does cause lung cancer. The magnitude of the effect was small in most of the studies, as anticipated on a biologic basis, but within a plausible range. The relative risk associated with marriage to a smoker has been estimated to be 1.2 in a recent meta-analysis (International Agency for Research on Cancer-IARC, 2002).

Specificity

Specificity has been interpreted to mean either a single (or few) effect(s) of one cause or no more than one possible cause for one effect. In addition to specific infectious diseases caused by specific infectious agents, other examples include asbestos exposure and mesothelioma and thalidomide exposure during gestation and the resulting unusual constellation of birth defects. This criterion is rarely used as it was originally proposed, having been derived primarily from the Henle-Koch postulates for infectious causes of disease (Susser, 1991). When specificity exists, it can strengthen a causal claim, but its absence does not weaken it (Sartwell, 1960). For example, most cancers are known to have multifactorial etiologies; many cancer-causing agents can cause several types of cancer, and these agents can also have noncancerous effects.

The 1964 Surgeon General's report (US Department of Health Education and Welfare-DHEW 1964) provides a rich discussion of specificity in relation to the smoking-lung cancer association. The committee recognized the linkage between this criterion and strength of association and offered a symmetrical formulation of specificity in the relation between exposure and disease; that is, a particular exposure always results in a particular disease, and the disease always results from the exposure. The committee acknowledged that smoking does not always result in lung cancer and that lung cancer has other causes. The report noted the extremely high relative risk for lung cancer in smokers and the high attributable risk, and it concluded that the association between smoking and lung cancer has "a high degree of specificity."

Temporality

Temporality refers to the occurrence of a cause before its purported effect. Temporality is the sine qua non of causality, as a cause clearly cannot occur after its purported effect. Rothman (1986) emphasized that temporality is the only one of the criteria that must be fulfilled for an association to be considered causal. Any question about a temporal sequence seriously weakens a causal claim; but establishing temporal precedence is by itself not strong evidence in favor of causality.

Coherence, Plausibility, and Analogy

Although the original definitions of coherence, plausibility, and analogy were subtly different, in practice they have been treated essentially as one idea: that a proposed causal relation should not violate known scientific principles, and that it be consistent with experimentally demonstrated biologic mechanisms and other relevant data, such as ecologic patterns of disease (Rothman and Greenland, 1998). In addition, if biologic understanding can be used to set aside explanations other than a causal association, it offers further support for causality. Together, these criteria can serve to both support a causal claim (by supporting the proposed mechanism) and refute it (by showing that the proposed mechanism is unlikely).

Biologic understanding, of course, is always evolving as scientific advances make possible ever deeper exploration of disease pathogenesis. For example, in 1964 the Surgeon General's committee found the causal association of smoking with lung cancer to be biologically plausible based on knowledge of the presence of carcinogens in tobacco smoke and animal experiments. Nearly 40 years later, this association remains biologically plausible, but that determination rests not only on the earlier evidence but on more recent findings that address the genetic and molecular basis of carcinogenesis, providing a level of understanding that could not have been anticipated in 1964.

Biologic Gradient (Dose-Response)

The finding of a graded increase in effect with an increase in the strength of the possible cause provides strong positive support in favor of a causal hypothesis. This is not just because such an observation is predicted by many cause-and-effect models and biologic processes but, more importantly, because it makes most noncausal explanations highly unlikely. If some factor other than that of interest explains the observed gradient, the unmeasured factor must change in the same manner as the exposure of interest. Except for confounders that are closely related to a causal factor, it is extremely difficult for such a pattern to be created by virtually any of the noncausal explanations for an association listed earlier. The finding of a dose-response relation has long been a mainstay of causal arguments in smoking investigations; virtually all health outcomes causally linked to smoking have shown an increase in risk and/or severity with an increase in the lifetime smoking history. This criterion is not based on any specific shape of the dose-response relation.

Experiment

The criterion "experiment" refers to situations where natural conditions might plausibly be thought to imitate conditions of a randomized experiment, producing a "natural experiment" whose results might have the force of a true experiment. An experiment is typically a situation in which a scientist controls who is exposed in a way that does not depend on any of the subject's characteristics. Sometimes nature produces similar exposure patterns. The reduced risk after smoking cessation serves as one such situation that approximates an experiment; an alternative noncausal explanation might posit that an unmeasured causal factor of that health outcome was more frequent among those who did not stop smoking than among those who did. The causal interpretation is further strengthened if risk continues to decline in former smokers with increasing time since quitting. Similar to the dose-response criteria, observations of risk reduction after quitting smoking have the dual effects of making most noncausal explanations unlikely and supporting the biologic model that underlies the causal claim.

APPLYING THE CAUSAL CRITERIA

The greater the extent to which an association fulfills the previous criteria, the more difficult it is to offer a more compelling alternative explanation. Which of these criteria may be more important and whether some can be unfulfilled and still justify the causal claim is a matter of judgment. Temporality, however, cannot be violated. When there is a still incompletely understood pathogenic mechanism, the causal claim might still be justified by strong direct empirical evidence of higher lung cancer rates in smokers (i.e., strong, consistent associations). Moderate associations (e.g., relative risk of 1-2) in only a few studies, without adequate understanding of potential confounders or with weak designs, might result in a suspicion of causal linkage.

The process of applying the criteria extends beyond simply lining up the evidence against each criterion, although there is evidence that epidemiologists tend to use the evidence in neither a consistent nor comprehensive manner (Weed and Gorelic, 1996). Rather, the criteria should be used to integrate multiple lines of evidence coming from chemical and toxicologic characterizations of tobacco smoke and its components, epidemiologic approaches, and clinical investigations. Those applying the criteria weigh the totality of the evidence in a decision-making process that synthesizes and, of necessity, involves a multidisciplinary judgment.

The 1964 Surgeon General's report still stands as one of the finest examples of the power of applying these criteria systematically and comprehensively. Starting with the criterion for consistency, the committee noted that all 29 retrospective studies (i.e., case-control) and 7 prospective studies (i.e., cohort) at the time reported highly significant smoking-lung cancer relations. They further noted that all of the studies comparing smokers to nonsmokers showed high relative risks for lung cancer (~10). Dose-response effects were also observed in every prospective study and in all retrospective studies where it could be calculated. The temporal sequence was reported to be not absolutely certain but seemed highly unlikely to be in the lung cancer-smoking direction, as cancer typically appears many years or decades after the onset of smoking. With regard to coherence of the association with known facts, the studies noted the ecologic increase in lung cancer rates with increased smoking in the population; the gender differential in lung cancer, which at the time was consistent with more smoking by men; an urban-rural difference, which air pollution could not completely explain; socioeconomic differentials in lung cancer, for which smoking seemed to be the strongest explanation; and the localization of cancer in the respiratory tract in relation to the type of smoking. The studies also cited the known reduction in risk among former smokers, with greater risk reductions correlated with more time spent not smoking. These observations, in combination with histopathologic evidence, basic biologic observations, and an in-depth discussion of each competing non-smoking-related explanation (e.g., occupation, constitutional hypothesis, infections, environmental factors such as pollution), produced a case for causation that proved irrefutable.

The 1986 Report of the Surgeon General (US Department of Health and Human Services-USDHHS, 1986) concluded that passive smoking causes lung cancer, a conclusion that has proved momentous in its implications. This report also based its evaluation of the evidence on the causal criteria. A clear distinction was made between the evidence on active smoking and that expected from the much lower carcinogen doses arising from passive smoking. Biologic plausibility was emphasized, including the substantial evidence on lung cancer risk in active smokers. This causal conclusion has been reaffirmed in all subsequent reports (Samet and Wang, 2000; International Agency for Research on Cancer-IARC, 2002).

EMERGING ISSUES IN CAUSAL INFERENCE AND CANCER

Perhaps the most challenging and exciting issue facing cancer scientists now and in the future is the prospect of understanding the processes of cancer development at the molecular level. However, with a richer understanding of basic mechanisms comes concomitant complexity in the concept and determination of cause. Biomarkers can serve as indicators of exposure, dose, susceptibility, or effect, each of which can elevate the cancer risk, albeit via quite different routes (Links et al., 1995). Similarly, the mechanisms by which various genes affect cancer risk are diverse, from genes that directly modulate tumor growth to others responsible for cellular homeostasis, DNA repair, genetic stability, or a host of interrelated functions that protect the cell against damage from somatic or environmental factors or affect its repair capacity when damage occurs (Vineis and Porta, 1996; Hussain and Harris, 1998).

It is interesting to consider the implications of this kind of knowledge for causal inference in cancer. The most obvious change is that we now understand the biologic basis of action of long-established carcinogens, such as smoking, chemotherapy, and various chemical agents. Mechanistic explanations of how environmental exposures have a carcinogenic effect provide the basis for increased confidence that any given association between the exposure and cancer incidence is justifiably labeled causal. Molecular "signatures" of specific exposures (e.g., p53 CpG hotspots) (Greenblatt et al., 1994) are defining the relevant effects of certain exposures more precisely and are also making increasingly possible what could not be done before: establishing causal connections between exposure and disease on the individual level.

Second, by understanding better what mediates risk due to exposures, we are increasingly able to identify subpopulations of individuals at substantially different degrees of risk from an exposure (Shields and Harris, 2000). This risk heterogeneity can be due to genetic or somatic variations that affect the absorption, metabolism, or cellular effect of an agent or the host's susceptibility to those effects (Links et al., 1995). The smaller size of these subpopulations, however, poses greater challenges to epidemiologic approaches to risk assessment: the finer the risk stratification based on mechanistic criteria, the more difficult it is to confirm it with epidemiologic methods (Slattery, 2002; Moore et al., 2003).

Third, with the opening of the cancer "black box" comes greater recognition of the extraordinary complexity of the carcinogenic process at the molecular level. This is seen not only with external agents dependent on genes for their effects but with environmental modification of gene expression (e.g., through DNA methylation) (Moore et al., 2003), gene functions interacting in myriad ways, and perhaps most intriguingly genetic determinants of exposure, as shown with developing understanding of the biologic bases of nicotine, alcohol, and drug addiction (Shields et al., 1993; Greenblatt et al., 1994; Hemby, 1999). This complexity poses significant problems for population-based risk assessments and causal inference because it raises, more acutely than with conventional risk factors, questions about which genes lie in the causal pathway (those that do should not be considered covariates), which genes or biomarkers are necessary for the effects or function of others (i.e., there may be many biologically based high level interaction terms), whether an observed association represents a direct or indirect causal effect, and whether it is meaningful to talk about binary exposure-outcome or gene-outcome relations when the components of these complex networks either cannot be changed individually or changed at all (Taioli and Garte, 2002).

Fourth, although the functions of many genes and gene products are understood, many more are not. The current ability to perform mass screening of potential causal factors via high throughput, genomic and proteomic technologies is far outstripping our ability to explain what we find; and severe problems related to multiplicity ("data dredging") arise in many of these investigations (Reiner et al., 2003). Epidemiologists have long faced this problem but never on this scale, with literally hundreds or even thousands of potential markers measured and analyzed, in all combinations, sometimes in only tens or hundreds of subjects. The weakness of findings subject to these problems is not often fully appreciated, and researchers are making claims about possibilities of screening, treatment, or prevention before findings are replicated in independent data sets.
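Reiner et al. (2003) address multiplicity with false discovery rate controlling procedures. A minimal sketch of the best known of these, the Benjamini-Hochberg step-up procedure, is given below. The p-values and the function name `benjamini_hochberg` are hypothetical, and the procedure's guarantee is usually stated for independent or positively dependent tests, an assumption that may not hold for correlated markers.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for 10 markers tested in the same data set.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)

naive_hits = sum(p < 0.05 for p in p_values)
print(f"nominally significant at 0.05: {naive_hits}")
print(f"retained after FDR control:    {sum(rejected)}")
```

In this made-up example, several markers that pass the nominal 0.05 threshold do not survive the correction. Statistical correction of this kind complements replication in independent data sets and does not replace it.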

A final issue that is often neglected in this rush toward exploring new potential causes is that of measurement (Little et al., 2002). Many of the techniques and assays used to identify metabolic products, genes, and gene products are relatively new and not standardized. Understanding the reliability and validity of these techniques is an arduous process that often does not get sufficient attention, yet it is critical for distinguishing likely spurious claims from those that are well grounded.

As we come closer to understanding cancer on the molecular level, many have raised the possibility of individual risk prediction (Vineis, 1997; Hussain et al., 2001). Some commentators have suggested that mechanistic understanding may ultimately render population studies irrelevant. However, it is instructive to consider the case of infectious diseases. Understanding the basic mechanisms of infectious disease has not eliminated the need to study disease patterns on a population level, and the same will likely be true for cancer (Nevins et al., 2003). There are many reasons to believe that population-based studies will be as important in the future as they are now, albeit perhaps focused on different kinds of questions. To understand a mechanism after a cancer occurs is not to predict it; early steps in the process, when the disease can be prevented, will necessarily be less than 100% predictive; and defining optimal groups for screening and early intervention will still require population-based data. Risk groups must be defined using far fewer factors than we know are operating at the molecular level, the latter being almost unique for an individual.

The molecular revolution in cancer may ultimately force a merging of two "schools" of causal inference: the probabilistic, chronic disease model that the Hill criteria addressed and the more mechanistic, deterministic models used for infectious disease, for which the Henle-Koch criteria were devised (Fredericks and Relman, 1996). Nowhere is this seen more clearly than with viral carcinogenesis. The Henle-Koch criteria were based on the nineteenth century understanding of bacterial disease causation and are poorly suited for viral disease mechanisms or even for infectious disease outcomes. Efforts to refashion the Henle-Koch criteria for the new era of molecular medicine (Fredericks and Relman, 1996; Vineis and Porta, 1996) have shown that it is extraordinarily difficult to outline a set of experimental conditions that all known pathogens, not to mention new pathogens with different mechanisms, must satisfy to justify causal claims for new infectious diseases.

In the case of viral carcinogenesis, the situation is even more complex because the final disease is not a direct manifestation of an infectious process. Numerous viral agents have been linked to cancer with varying degrees of certainty: Epstein-Barr virus and Burkitt's lymphoma (Pagano, 1999), human papillomavirus and cervical cancer (Bosch and de Sanjose, 2003), SV40 and mesothelioma (Carbone et al., 1997; Klein et al., 2002). The evidential basis of these claims includes findings that might implicate the virus in individual cancer cases (e.g., finding viral genetic sequences or viral-specific proteins in tumor tissue) and traditional epidemiologic evidence (e.g., high incidence in persons with evidence of viral exposure). However, as both these and other examples have shown, there is no single molecular finding that definitively implicates a virus as a cause of cancer. The variety and complexity of mechanisms by which viruses can directly (by inducing oncogenic changes) or indirectly (by increasing host susceptibility to exogenous agents) raise cancer risk seem to defy a set of causal criteria designed specifically for viral agents or that are based on any specific mechanism. Therefore, from the standpoint of causal inference, it may be best to consider viral agents under the same umbrella as toxic exposures and other environmental causes of cancer. The fact that the pathways from viral infection to cancer appearance often share components (e.g., p53 inactivation) with those of noninfectious carcinogenic exposures supports this view.

It seems unlikely that the need for population-based risk estimates and causal inference will disappear from cancer research, just as it has not in infectious disease. However, we are entering an era where the relative strength of the "twin pillars" of causal inference (knowledge derived from empirical, population-based patterns and that based on understanding of biologic mechanisms in individuals) will tilt farther toward the mechanistic end, requiring less proof from populations and more from the laboratory. One challenge for causal inference in the future will be how best to integrate these various forms of evidence and how to assemble groups with sufficient interdisciplinary expertise to assess them.

References

Alberg AJ, Samet JM. 2003. Epidemiology of lung cancer. Chest 123:21S-49S.

Bosch FX, de Sanjose S. 2003. Chapter 1: Human papillomavirus and cervical cancer-burden and assessment of causality. J Natl Cancer Inst Monogr 31:3-13.

Bunge M. 1959. Causality: The Place of the Causal Principle in Modern Science. Cambridge, MA: Harvard University Press.

Carbone M, Rizzo P, Pass HI. 1997. Simian virus 40, poliovaccines and human tumors: a review of recent developments. Oncogene 15:1877-1888.

Evans AS. 1993. Causation and Disease: A Chronological Journey. New York: Plenum.

Fredericks DN, Relman DA. 1996. Sequence-based identification of microbial pathogens: a reconsideration of Koch's postulates. Clin Microbiol Rev 9:18-33.

Greenblatt MS, Bennett WP, Hollstein M, Harris CC. 1994. Mutations in the p53 tumor suppressor gene: clues to cancer etiology and molecular pathogenesis. Cancer Res 54:4855-4878.

Greenland S. 1990. Randomization, statistics, and causal inference. Epidemiology 1:421-429.

Greenland S, Robins JM, Pearl J. 1999. Confounding and collapsibility in causal inference. Stat Sci 14:29-46.

Hemby SE. 1999. Recent advances in the biology of addiction. Curr Psychiatry Rep 1:159-165.

Hill AB. 1965. The environment and disease: association or causation? Proc R Soc Med 58:295-300.


Hume D. 1739. A Treatise of Human Nature. London: Oxford University Press.

Hussain SP, Harris CC. 1998. Molecular epidemiology of human cancer: contribution of mutation spectra studies of tumor suppressor genes. Cancer Res 58:4023-4037.

Hussain SP, Hofseth LJ, Harris CC. 2001. Tumor suppressor genes: at the crossroads of molecular carcinogenesis, molecular epidemiology and human risk assessment. Lung Cancer 34(Suppl 2):S7-S15.

International Agency for Research on Cancer (IARC). 2002. Tobacco smoke and involuntary smoking. IARC Monograph 83. Lyon: IARC.

Kaufman JS, Poole C. 2000. Looking back: causal thinking in the health sciences. Annu Rev Public Health 21:101-119.

Klein G, Powers A, Croce C. 2002. Association of SV40 with human tumors. Oncogene 21:1141-1149.

Last JM. 2000. A Dictionary of Epidemiology. New York: Oxford University Press.

Lewis D. 1973. Counterfactuals. Cambridge, MA: Harvard University Press.

Lilienfeld AM. 1959. On the methodology of investigations of etiologic factors in chronic diseases: some comments. J Chronic Dis 10:41-46.

Links JM, Kensler TW, Groopman JD. 1995. Biomarkers and mechanistic approaches in environmental epidemiology. Annu Rev Public Health 16:83-103.

Little J, Bradley L, Bray MS, Clyne M, Dorman J, Ellsworth DL, Hanson J, Khoury M, Lau J, O'Brien TR, Rothman N, Stroup D, Taioli E, Thomas D, Vainio H, Wacholder S, Weinberg C. 2002. Reporting, appraising, and integrating data on genotype prevalence and gene-disease associations. Am J Epidemiol 156:300-310.

Magee B. 2001. The Great Philosophers: An Introduction to Western Philosophy. Oxford, UK: Oxford University Press.

Moore LE, Huang WY, Chung J, Hayes RB. 2003. Epidemiologic considerations to assess altered DNA methylation from environmental exposures in cancer. Ann NY Acad Sci 983:181-196.

Nevins JR, Huang ES, Dressman H, Pittman J, Huang AT, West M. 2003. Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Hum Mol Genet 12(Spec. No. 2):R153-R157.

Neyman J. 1990. On the application of probability theory to agricultural experiments: essay on principles (1923). Stat Sci 5:463-572.

Olsen J. 2003. What characterises a useful concept of causation in epidemiology? J Epidemiol Community Health 57:86-88.

Pagano JS. 1999. Epstein-Barr virus: the first human tumor virus and its role in cancer. Proc Assoc Am Physicians 111:573-580.

Pearl J. 2000. Causality: Models, Reasoning and Inference. Cambridge, UK: Cambridge University Press.

Reiner A, Yekutieli D, Benjamini Y. 2003. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19:368-375.

Robins J. 1986. A new approach to causal inference in mortality studies with sustained exposure periods: applications to control of the healthy worker survivor effect. Mathematical Modelling 7:1393-1512.

Robins J. 1987. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. J Chronic Dis 40:139S-161S.

Rosen G. 1993. A History of Public Health. Baltimore, MD: The Johns Hopkins University Press.

Rothman KJ. 1976. Causes. Am J Epidemiol 104:587-592.

Rothman KJ. 1986. Interactions Between Causes. Modern Epidemiology. Boston: Little, Brown.

Rothman KJ, Greenland S. 1998. Modern Epidemiology. Philadelphia: Lippincott-Raven.

Rubin D. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 66:688-701.

Samet JM, Lerchen ML. 1984. Proportion of lung cancer caused by occupation: a critical review. In: Bernard J, Gee L, Keith W, Morgan C, Brooks SM, editors. Occupational Lung Disease. New York: Raven Press, pp. 5547.

Samet JM, Wang SS. 2000. Environmental tobacco smoke. In: Lippmann M, editor. Environmental Toxicants: Human Exposures and Their Health Effects. New York: Van Nostrand Reinhold, pp. 319-375.

Sartwell PE. 1960. On the methodology of investigations of etiologic factors in chronic diseases: further comments. J Chronic Dis 11:61-63.

Shields PG, Harris CC. 2000. Cancer risk and low-penetrance susceptibility genes in gene-environment interactions. J Clin Oncol 18:2309-2315.

Shields PG, Caporaso NE, Falk RT, Sugimura H, Trivers GE, Trump BF, Hoover RN, Weston A, Harris CC. 1993. Lung cancer, race, and a CYP1A1 genetic polymorphism. Cancer Epidemiol Biomarkers Prev 2:481-485.

Slattery ML. 2002. The science and art of molecular epidemiology. J Epidemiol Community Health 56:728-729.

Susser M. 1973. Causal Thinking in the Health Sciences: Concepts and Strategies in Epidemiology. New York: Oxford University Press.

Susser M. 1977. Judgement and causal inference: criteria in epidemiologic studies. Am J Epidemiol 105:1-15.

Susser M. 1991. What is a cause and how do we know one? A grammar for pragmatic epidemiology. Am J Epidemiol 133:635-648.

Taioli E, Garte S. 2002. Covariates and confounding in epidemiologic studies using metabolic gene polymorphisms. Int J Cancer 100:97-100.

US Department of Health and Human Services (USDHHS). 1986. The Health Consequences of Involuntary Smoking: A Report of the Surgeon General. DHHS Publ. No. (CDC) 87-8398. Washington, DC: U.S. Government Printing Office.

US Department of Health Education and Welfare (DHEW). 1964. Smoking and Health. Report of the Advisory Committee to the Surgeon General. DHEW Publ. No. (PHS) 1103. Washington, DC: U.S. Government Printing Office.

Vineis P. 1997. Sources of variation in biomarkers. IARC Sci Publ 142:59-71.

Vineis P, Porta M. 1996. Causal thinking, biomarkers, and mechanisms of carcinogenesis. J Clin Epidemiol 49(9):951-956.

Weed DL, Gorelic LS. 1996. The practice of causal inference in cancer epidemiology. Cancer Epidemiol Biomarkers Prev 5:303-311.

White C. 1990. Research on smoking and lung cancer: a landmark in the history of chronic disease epidemiology. Yale J Biol Med 63:29-46.

Yerushalmy J, Palmer C. 1959. On the methodology of investigations of etiologic factors in chronic diseases. J Chronic Dis 10:27-40.