Case control study – part 1

1

Case control study - Part 1

Dr. Rizwan S A, M.D.,

2

Outline of presentation

• Some history
• Planning and conducting a study
• Matching
• Sources of bias
• Applications

3

A scenario

• Assume you are the senior health advisor to the GOI

• Recently, several isolated reports of neurological illness following DPT vaccination have come up in the country

• Media is adding fuel to the fire

• Parents and doctors are reluctant to vaccinate and the vaccination rates are going down the drain!

• What will you do?

4

Some history

• Early-to-mid 1800s - Early concepts found in the works of the Parisian physician P.C.A. Louis

• 1843 - First explicit description by William Augustus Guy (occupational exposure and pulmonary disease)

• 1862 - Baker, case control comparisons of marriage and fertility in breast cancer patients

• 1926 - Lane-Claypon's breast cancer study

• 1950 - Levin et al.; Wynder & Graham; Schrek et al.; and Doll & Hill (smoking and lung cancer)

5

Planning and conducting

• Research question
• Definition of case
• Definition of control
• Selecting the cases & controls
• Research instrument

6

Case control studies | Cohort studies
Proceeds from effect to cause | Proceeds from cause to effect
Starts with the disease | Starts with people exposed to the risk factor or suspected cause
Tests whether the suspected cause occurs more frequently in those with the disease than in those without it | Tests whether disease occurs more frequently in those exposed than in those not exposed
Usually the first approach to testing a hypothesis, but also useful for exploratory studies | Reserved for testing precisely formulated hypotheses
Involves fewer study subjects | Involves a larger number of subjects
Yields results relatively quickly | Long follow-up, delayed results
Suitable for the study of rare diseases | Inappropriate when the disease under investigation is rare (though well suited to rare exposures)
Generally yields only an estimate of relative risk (odds ratio) | Yields incidence rates, relative risk, attributable risk
Cannot yield information about diseases other than the one under study | Can give information about more than one disease outcome
Relatively inexpensive | Expensive
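The point that a case control study yields an odds ratio as its estimate of relative risk can be illustrated with a small sketch. The numbers below are hypothetical (not from any study), and the function name is an assumption made for illustration. The sketch only shows that the odds ratio is close to the risk ratio when the disease is rare and drifts away from it when the disease is common.

```python
def risk_ratio_and_odds_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Return (risk ratio, odds ratio) for a hypothetical, fully enumerated cohort."""
    risk_exposed = exp_cases / exp_total
    risk_unexposed = unexp_cases / unexp_total
    risk_ratio = risk_exposed / risk_unexposed
    odds_exposed = exp_cases / (exp_total - exp_cases)
    odds_unexposed = unexp_cases / (unexp_total - unexp_cases)
    return risk_ratio, odds_exposed / odds_unexposed

# Hypothetical rare disease: 30 vs 10 cases per 10,000 people
rare = risk_ratio_and_odds_ratio(30, 10_000, 10, 10_000)
# Hypothetical common disease: 3,000 vs 1,000 cases per 10,000 people
common = risk_ratio_and_odds_ratio(3_000, 10_000, 1_000, 10_000)

print("rare:   RR = %.2f, OR = %.2f" % rare)
print("common: RR = %.2f, OR = %.2f" % common)
```

In both scenarios the risk ratio is 3; only in the rare-disease scenario does the odds ratio stay close to it.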

7

Research question

• Begin with a broad and ambitious question
• Later, make it narrower and more precise
• Consider time and cost
• Eg.

1. Does tobacco cause cancer?

2. Does smoking tobacco cause bronchogenic CA?

3. Do persons with bronchogenic CA have a history of greater exposure to tobacco smoking than persons without the disease?

• Poor questions can spoil the entire study

8

Definition of case - 1

• Eligibility
• Definition of disease

9


• Eligibility (2 components)
– Objective criteria for diagnosis
– Stating the eligibility criteria

• Eligibility criteria – should reflect ‘potentially at risk for exposure’ for both cases and controls
– Eg. recent OCP use and MI: exclude women who are sterilized, post-menopausal, or have contraindications (CIs) to OCPs

10


• Cases should have a reasonable possibility of having had their disease induced by the exposure
– Eg. OCP and thromboembolism – should exclude postpartum and postoperative cases (Why?)

• Incident cases
– Will be more uniform
– Recall more accurate
– More certain that exposure preceded the disease
– Berkson and Neyman

11


• Definition of disease
– Objective criteria to reliably diagnose the disease
– Eg. Rh. arthritis (several diagnostic criteria, causing confusion)
– To reduce misclassification

• Sources of cases
– Hospital lists, special reporting systems like cancer registries, disease surveillance, death certificates

12

Definition of control - 1

• Eligibility criteria
– Should be similar to the cases with regard to potential for exposure
– Problems arise in hospital based controls

• We want to select controls that are likely to reflect the exposure rate in the population

• We sh. exclude those hospital controls whose condition is associated with the exposure (Eg. Aspirin and MI; controls with chronic pain/peptic ulcer)

– One solution – include controls with a variety of diagnoses not associated with exposure

13


• Sources of controls
– Hospital based
– Dead controls
– Controls with similar diseases
– Neighborhood controls
  • Population based
  • Best friend control/Sibling control

14


• Hospital based
– Referral pattern is similar to cases (form the same study base)
– Similar quality of information
– Convenience
– May not be representative of the population

• Dead controls
– In a study where the case is death from a particular cause
– Information obtained from ‘proxy’ informants
– But dead controls differ from living controls

• Controls with similar diseases
– Cancer (of different type) controls for cancer cases
– Minimize recall bias, interviewer bias, examine specificity of exposure

15


• Neighborhood controls
– Best friend control/Sibling control
  • Inexpensive, easy and quick
  • Ability to match on a number of variables that are associated with neighborhood/friendship
  • May introduce selection bias related to the exposure (‘smoking’ cases nominate ‘smoking’ friends) and overmatching
– Population based
  • Truly representative sample
  • From tax lists, voting lists, telephone directories

16

Definition of control - 5

Source | Advantage | Disadvantage
Hospital based | Easily identified. Available for interview. More willing to cooperate. Tend to give complete and accurate information (reduces recall bias). | Not typical of general population. Possess more risk factors for disease. Some diseases may share risk factors with disease under study. Berksonian bias.
Population based | Most representative of the general population. Generally healthy. | Time, money, energy. Opportunity of exposure may not be same as that of cases (location, occupation).
Neighbourhood controls/ Telephone exchange random dialing | Controls and cases similar in residence. Easier than sampling the population. | Non cooperation. Not representative of general population.
Best friend control/ Sibling control | Accessible, cooperative. Similar to cases in most aspects. | Overmatching.

17

Selection process - 1

[Diagram: total population, reference population, cases, controls]

18

Selection process - 2

• Cases
– In practice, we use all eligible cases within a defined time period
  • From disease registry or hospital
  • We are implicitly sampling from a subset of the total population of cases

• Controls
– Sampling is most pertinent here because, in rare diseases, the number of potential controls greatly exceeds the number of cases

19

Selection of cases - 1

• Representativeness
– Ideally, cases should be a random sample of all cases of interest in the source population (e.g. from vital data, registry data)
– But commonly they are a selection of available cases from a medical care facility (e.g. from hospitals, clinics)

• Method of selection
– Selection may be from incident or prevalent cases
– Incident cases are those derived from ongoing ascertainment of cases over time
– Prevalent cases are derived from a cross-sectional survey

20


• Incident cases are preferable
• These should be all newly diagnosed cases over a given period of time in a defined population (however, this excludes patients who died before diagnosis)

• Prevalent cases do not include patients with a short course of disease (patients who recovered early and those who died will not be included)

• Can be partly overcome by including deceased cases as well as those alive

21


• Validity is more important than generalizability i.e. the need to establish an etiologic relationship is more important than to generalise results to the population

• Eg.
– In a study on breast cancer, we can include all cases or only premenopausal women with lobular cancer
  • If we take the latter group as cases, we can elicit the etiology better

– Studies done in nurses for OCP use

22

Selection of controls - 1

• The four principles of Wacholder

1. The study base

2. De-confounding

3. Comparable accuracy

4. Efficiency

23


• Should the controls be similar to the cases in all respects other than having the disease? i.e. comparable

• Should the controls be representative of all non-diseased people in the population from which the cases are selected? i.e. representative

24


• Representativeness
– Should be representative of the general population in terms of probability of exposure to the risk factor

• Comparability
– Should also have had the same opportunity to be exposed as the cases have

• Not that both cases and controls are equally exposed; but only that they have had the same opportunity for exposure.

25


• Usually, cases are not a random sample of all cases in the population. So, the controls must be selected in the same way (and with the same biases) as the cases.

• It follows from the above that a pool of potential controls must be defined. This is the universe of people from whom controls may be selected (the study base)

26


• The study base is composed of a population at risk of exposure over a period

• Cases emerge within a study base. Controls should also emerge from the same study base, except that they are not cases.

• Eg. If cases are selected exclusively from hospitalized patients, controls must also be selected from hospitalized patients.

27


• Comparability is more important than representativeness in the selection of controls

• The control should resemble the case in all respects except for the presence of disease

28


• Number of controls
– Large study: equal numbers
– Small study: multiple controls

• Use of multiple controls
– Controls of same type
– Multiple controls of different types
  • Hospital and neighborhood controls
  • e.g. cases – children with brain tumors; controls – children with other cancers, normal children

29


[Diagram: exposure to radiation compared across three groups (children with brain tumors, children with other cancers, children without cancer) to distinguish "radiation causes cancers" from "radiation causes brain cancers only"]

Multiple controls of different types are valuable for exploring alternative hypotheses and for taking possible recall bias into account.

30

Sampling for cases/controls - 1

• Frame – list of all potentially eligible cases and controls in the target population (a subset of the general pop. both at risk of exposure and disease development)

• The frame sh. not be biased in any manner, else the sample will also be biased even if random

• Types of sampling
– SRS
– Systematic
– Stratified
– Matched

• The objective is to avoid bias in selection: each eligible case or control should have an equal chance of being selected

31


• If we use all incident cases occurring in a defined area and time period, then controls selected at random from the general population are the best choice (a sound basis for calculating RR, AR, etiologic fraction)

• If cases are selected from hospital(s), population controls are not necessarily the best choice; a carefully chosen hospital control series can also be valid

• However, hospital controls, though chosen for cost and practicality, often leave room for doubt about the validity of the comparison

32


• Random digit dialing
– Prerequisite: extensive telephone coverage
– Used either to screen for potential controls or to conduct telephone interviews

• Method
– All area codes and prefix numbers are obtained
– Add all possible two digit numbers
– The first 8 digits form the primary sampling unit (PSU)
– Select a PSU at random; if a response is obtained, retain the PSU
– Then the last two digits are randomly selected, and this is continued until the required sample is reached
– The number of PSUs and total houses depend on the design
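The two-stage method above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 6-digit prefixes (area code plus exchange), the function names, and `fake_is_household` (a stand-in for the real outcome of dialing a number) are all hypothetical.

```python
import random

def random_digit_dialing(prefixes, is_household, n_psu, per_psu, seed=0):
    """Two-stage random digit dialing sketch.

    prefixes: 6-digit strings (area code + prefix), assumed for illustration.
    is_household: callable standing in for the result of an actual call.
    """
    rng = random.Random(seed)
    # Stage 1: every prefix plus every two-digit number gives an 8-digit PSU.
    candidates = [p + f"{i:02d}" for p in prefixes for i in range(100)]
    rng.shuffle(candidates)
    sample, retained = [], 0
    for psu in candidates:
        if retained == n_psu:
            break
        # Dial one random number in the PSU; keep the PSU only if it responds.
        first = psu + f"{rng.randrange(100):02d}"
        if not is_household(first):
            continue
        retained += 1
        numbers = [first]
        # Stage 2: draw further last-two-digit endings within the retained PSU.
        suffixes = [f"{i:02d}" for i in range(100)]
        rng.shuffle(suffixes)
        for suffix in suffixes:
            if len(numbers) == per_psu:
                break
            number = psu + suffix
            if number not in numbers and is_household(number):
                numbers.append(number)
        sample.extend(numbers)
    return sample

def fake_is_household(number):
    # Hypothetical rule: one number in four is a residential line.
    return int(number[-3:]) % 4 == 0

sample = random_digit_dialing(["011265", "011266"], fake_is_household,
                              n_psu=3, per_psu=5)
print(sample)
```

In a real survey the number of PSUs and households per PSU would come from the study design, as the slide notes.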

33


• Examples
– Artificial sweeteners and bladder cancer
  • Cases: 21-84 years, newly diagnosed bladder cancer in 10 designated counties in metropolitan areas
  • Controls: age-sex stratified random sample of the general population in the ten counties, frequency matched at a 2:1 ratio
– Oral contraception and congenital malformations
  • Cases: malformations among all newborns and stillborns delivered at five major hospitals between 1974 and 1976
  • Controls: all unaffected newborns in the five hospitals; sampling days were rotated to represent all 7 days

34

Matching - 1

• Matching is defined as the process of selecting controls so that they are similar to cases in certain characteristics such as age, sex, race, socioeconomic status and occupation

• What is post-matching?
– Pairing controls to cases from unmatched data during analysis

• We often want a constant case control ratio, but sometimes matching is incomplete so that we end up with a variable ratio

35

Matching - 2

• Objective – to eliminate biased comparison between cases and controls

• Two step process

1. The matched design

2. The matched analysis

• One immediate effect of matching is the balance between no. of cases and controls

• Sometimes we can deliberately match on a factor that lies on the causal pathway to confirm or refute its role (Eg. smoking and MI, matched on cholesterol)

36

Matching - 3

• What variables to match?
– Factors which are independent risk factors for the disease
– Assoc. with the exposure but non-causally
– May not be directly a risk factor, but may be assoc. with other causal factors excluding the study exposure

• Similar to something?

37

Matching - 4

• Situations to match or not?

[Diagram: four numbered scenarios (1 to 4) showing causal and non-causal links among exposure (E), factor (F) and disease (D)]

38

Matching - 5

• Examples

• 1. E = alcohol, F = smoking, D = lung CA
– Implication if not matched?

• 2. E = OCP, F = smoking, D = MI
– Implication if not matched?

• 3. E = blood grp O, F = age, sex, D = thrombosis
– Implication if matched?

• 4. E = OCP, F = prescribing physician, D = MI
– Implication if not matched or matched?

39

Matching - 6

• In summary, the decision to match or not depends on the residual association of the factor with disease and exposure after controlling other variables

• Overmatching
– Reduces validity or statistical efficiency
– Two general meanings
  • Unmatched analysis in matched studies
  • Matching for unnecessary variables
– If one matches on a factor that is associated with exposure but not the disease
  • Paired analysis may correctly estimate the odds ratio, but the variance will be larger than in an unmatched study of the same size (overmatching increases the frequency of exposure-concordant pairs, which are discarded in paired analysis)
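The remark that exposure-concordant pairs are discarded in a paired analysis can be made concrete with a minimal sketch. It uses the standard matched-pair estimate (ratio of the two kinds of discordant pairs); the pair counts are hypothetical and the function name is an assumption for illustration.

```python
def matched_pair_odds_ratio(pairs):
    """pairs: list of (case_exposed, control_exposed) booleans, one per matched pair.

    Returns (odds ratio, number of concordant pairs that carry no information).
    """
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    concordant = len(pairs) - case_only - control_only
    if control_only == 0:
        raise ValueError("no pairs with only the control exposed; OR undefined")
    return case_only / control_only, concordant

# Hypothetical data: 100 matched pairs
pairs = (
    [(True, True)] * 10      # both exposed (concordant)
    + [(True, False)] * 30   # only the case exposed
    + [(False, True)] * 10   # only the control exposed
    + [(False, False)] * 50  # neither exposed (concordant)
)
odds_ratio, discarded = matched_pair_odds_ratio(pairs)
print(odds_ratio, discarded)
```

Here 60 of the 100 pairs contribute nothing to the estimate, which is why a design that inflates concordance loses efficiency.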

40

Matching - 7

• If one matches a factor that is causally or non-causally assoc. with disease but not exposure then OR will be biased towards unity

• If one matches a factor which is assoc. with disease but not exposure then OR will be correctly estimated whether or not pairing is retained
– Paired analysis will be less efficient than unpaired one

• Matching on highly correlated variables is also unnecessary

• Finally, matching should be done for factors which have the strongest relationship to the disease and are least correlated

41

Matching - 8

• Alternatives to matching
– At the sampling phase
  • Stratified sampling
  • Frequency matching
– At analysis phase
  • Post-stratification
  • Regression analysis

• Stratified sampling
– Pre-determined number of cases and controls in each subgroup created by the cross-classification
– Eg. Age (4 groups), sex (2), race (4 groups): total 32 subgroups

42

Matching - 9

• Frequency matching
– Controls are taken from the corresponding subgroups in proportion to the no. of cases
– Eg. If 30% of cases are males of Hindu religion aged 60-65 years, then 30% of controls are taken from the same subgroup
– More practical than stratified sampling, but it requires one to continually track the distribution of accumulating cases to maintain a fixed case-control ratio

• Post-stratification
– Stratify the subgroups and analyze
– Very flexible in that variables need not be pre-specified
– Limitation: the number of variables that can be stratified is restricted by lack of numbers

• Regression analysis
– Most useful when the number of variables/subgroups increases
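Frequency matching, as described above, amounts to setting a control quota per subgroup in proportion to the cases seen so far. A minimal sketch, with hypothetical strata and a hypothetical 2:1 control-to-case ratio (the function name is an assumption for illustration):

```python
from collections import Counter

def controls_per_stratum(case_strata, controls_per_case=2):
    """Return the number of controls to recruit in each stratum.

    case_strata: one stratum label per accumulated case, e.g. (sex, age group).
    """
    case_counts = Counter(case_strata)
    return {stratum: n * controls_per_case for stratum, n in case_counts.items()}

# Hypothetical accumulating case series
cases = [("M", "60-65")] * 3 + [("F", "60-65")] * 5 + [("F", "66-70")] * 2
quota = controls_per_stratum(cases)
print(quota)
```

Re-running the function as cases accumulate keeps the case-control ratio fixed within each subgroup, which is the bookkeeping burden the slide mentions.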

43

Matching - 10

• Effectiveness of matching
– Removal of bias
– Reduction of variance

• Matched design only gives a modest increase in efficiency
• Greatest improvement is when there is strong assoc. between disease and the confounder
• Also efficient when only a small proportion of the target population is exposed to the study factor

• The added cost and complexity of matching should be weighed against any expected gains in precision

44

Matching - 11

• Advantages
– Cases and controls will be comparable with respect to the matched variables
– Provides the best means to investigate a very specific hypothesis

• Disadvantages
– One can no longer study the matched variable in relation to the risk of disease
– Increase in cost, time and labor
– A certain fraction of cases is discarded as a result of failure to find a matching control

45

Matching - 12

• Summary
– Unless one has very good reason to match, one is better off avoiding it
– Frequency matching within rather broad categories of the matching variables will suffice for most studies

46

Sources of bias - 1

• Bias – systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of the risk measure

1. Ascertainment and selection bias
a) Surveillance
b) Diagnosis
c) Referral
d) Selection
e) Non-response
f) Length of stay
g) Survival
h) Admission diagnoses

2. Bias in estimation of exposure
a) Recall
b) Interviewer
c) Prevarication
d) Improper analysis

3. Misclassification

4. Other sources

47

Sources of bias - 2

1. Ascertainment and selection bias - Not peculiar to case-control, can occur in cohort studies also

a) Differential surveillance
– In asymptomatic/mild diseases, cases are more likely to be detected in persons who are closely examined
– Eg. OCP and endometrial cancer/phlebitis
  • Women taking OCPs were more thoroughly evaluated
  • Based on preliminary reports of OCP use and phlebitis, clinicians started looking for phlebitis in such exposed patients

– Exposed cases would have a greater likelihood of being diagnosed as compared to unexposed cases

– This bias can be checked by doing a stratified analysis in subgroups having equal surveillance (based on some index of medical care) or restrict the study to time prior to publication of such finding

48

Sources of bias - 3

1. Ascertainment and selection bias

b) Diagnosis
• In conditions like cervical dysplasia, knowledge of exposure may alter the assessment
• This is most likely to occur in cases of uncertain diagnosis

c) Differential referral
• OR' = b × OR, where b = (s1 × s4)/(s2 × s3)
– s1, s2, s3 and s4 are the selection probabilities (sampling fractions) of exposed cases, unexposed cases, exposed controls and unexposed controls, respectively
– No bias arises (b = 1) if a biased selection of cases is compensated by an equally biased selection of controls
– Nor does it arise if the probability of selecting an exposed case equals that of an unexposed case, and likewise for controls
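The relation OR' = b × OR can be checked numerically with a short sketch. The selection probabilities below are hypothetical; s1 to s4 are taken in the order exposed cases, unexposed cases, exposed controls, unexposed controls, and the function name is an assumption for illustration.

```python
def observed_odds_ratio(true_or, s1, s2, s3, s4):
    """Apply the selection bias factor b = (s1 * s4) / (s2 * s3) to the true OR."""
    b = (s1 * s4) / (s2 * s3)
    return b * true_or

true_or = 2.0  # hypothetical true odds ratio

# Selection does not depend on exposure within cases or within controls: no bias
unbiased = observed_odds_ratio(true_or, s1=0.8, s2=0.8, s3=0.1, s4=0.1)

# Exposed cases are referred more often than unexposed cases: OR inflated
inflated = observed_odds_ratio(true_or, s1=0.9, s2=0.6, s3=0.1, s4=0.1)

# The same preference for exposed subjects among controls compensates
compensated = observed_odds_ratio(true_or, s1=0.9, s2=0.6, s3=0.3, s4=0.2)

print(unbiased, inflated, compensated)
```

The third scenario mirrors the slide's point that a biased selection of cases is offset when controls are selected with the same bias.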

49


c) Differential referral (cont.)
• Eg.
– A study of alcohol and kidney failure, where income is assoc. with alcohol intake
– A hospital only admits wealthy patients, so cases of kidney failure in this hospital will be more exposed to alcohol than patients in the gen. pop.
– But if patients with other diseases, who have similar income characteristics, are taken as controls, bias won’t occur
– If controls are taken from the gen. pop., then we have to match/stratify on income to eliminate income as a source of selection bias

50


d) Selection
• Eg. Interviewer ‘keying’ on cases who are exposed (one particular nurse was searching out all the cases of ectopic pregnancy with IUD usage)
• To avoid this, we must specify precisely and in advance the methods by which cases and controls are selected, carefully train staff, and maintain quality control

e) Non-response
• A worst case analysis, taking all non-responding cases as unexposed and all non-responding controls as exposed, will show whether the non-response is likely to bias the estimates
• If the exposure rates are equal between responders and non-responders, there will be no bias
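The worst case analysis for non-response can be sketched as below. All counts are hypothetical, and the function names are assumptions for illustration; a, b, c, d denote exposed cases, exposed controls, unexposed cases and unexposed controls among responders.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: a, b = exposed cases, controls; c, d = unexposed."""
    return (a * d) / (b * c)

def worst_case_odds_ratio(a, b, c, d, nonresp_cases, nonresp_controls):
    """Count non-responding cases as unexposed and non-responding controls as exposed."""
    return odds_ratio(a, b + nonresp_controls, c + nonresp_cases, d)

# Hypothetical responders: 60/40 exposed/unexposed cases, 30/70 controls
observed = odds_ratio(60, 30, 40, 70)
# Hypothetical non-response: 10 cases and 10 controls
worst = worst_case_odds_ratio(60, 30, 40, 70, nonresp_cases=10, nonresp_controls=10)

print(observed, worst)
```

If the association survives this deliberately unfavourable allocation, non-response alone is unlikely to explain it.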

51


f) Length of stay
• In a hospital study, incident cases should be selected rather than prevalent cases; otherwise,
– Patients who stay longer will have a higher probability of being selected
– Cases of short duration would be under-represented
• We check this by stratifying the analysis on the duration between admission and selection

g) Survival
• Arises when a disease accompanied by mortality is studied only in survivors
• Eg. A study in survivors of MI may reveal factors that are assoc. with surviving an MI rather than sustaining one
• Unless one can justify that exposure is not related to duration/survival, one should take only incident cases
• This bias can be checked by stratifying on date of onset

52


h) Admission diagnoses
• Eg. In a hospital based study of the assoc. between smoking and MI, if controls are lung cancer patients, this will underestimate the effect
• To avoid this bias, we must select controls with a variety of diseases which are believed to be unrelated to the study exposure (neither + nor -)

2. Bias in the estimation of exposure

a) Recall
• Eg. A mother with a malformed baby will try with more care and intensity to recall a pelvic X-ray than a woman with a normal baby

53

Sources of bias - 8

2. Bias in the estimation of exposure

a) Recall (cont.)
• Sometimes, the disease itself affects memory (dementia)
• This bias can be reduced by using controls with another disease, who will also keep thinking of reasons for their disease
• Independent verification of the history of exposure can be sought

b) Interviewer
• Interviewer may probe cases more intensely than controls for histories of exposure if they know the hypothesis
• Reduced by training staff, keeping staff ignorant of the hypothesis (ideal but unobtainable), keeping interview time constant

c) Prevarication
• Subjects may have ulterior motives for deliberately overestimating or underestimating exposure
• Eg. A worker who may receive disability pay may exaggerate his exposure; if it means loss of job, he may minimize it
• May be overcome by several independent raters

54

Sources of bias - 9

2. Bias in the estimation of exposure

d) Improper analysis
• Unmatched analysis for a matched study

3. Misclassification
– The disease/exposure status classification may be erroneous
– Some controls may actually have the study disease, but this is very improbable with rare diseases
– The most likely source of misclassification is the determination of exposure
– Any measure to reduce misclassification should be addressed at the design stage; a pilot study will reveal many errors

4. Other sources of error
– Insufficient sample size, errors of interpretation, not accounting for the effect of extraneous variables

55

Sources of bias - 10

4. Other sources of error

– Cases and controls sh. be similar with respect to factors that might have affected both the development of disease and the opportunity for past exposure

– For eg. Medical conditions like HTN, DM preclude the use of OCPs, thus users of these would inherently be at a lower risk

– An agent found in assoc. with study disease was prescribed due to an early manifestation of the disease

– For eg. Estrogens prescribed for irregular bleeding that was the first symptom of undetected endometrial cancer. If this was the case then later diagnosis of the cancer would find an apparent assoc. with estrogen usage.

56

Sources of bias - 11

Summary
– Before starting a study, one should list the likely sources of bias and plan the investigation and analyses so as to prevent/minimize them

57

Specific limitations of Case control study

• Is not useful to study weak associations (OR < 1.5)

• Participation rates are frequently low and differ between cases and controls

• Differential recall bias

58

Applications of Case control study

1. Vaccine effectiveness

2. Evaluation of treatment and program efficacy

3. Evaluation of screening programs

4. Outbreak investigations

5. Demography

6. Genetic epidemiology

7. Occupational epidemiology

59

Pertussis vacc. in UK - 1

Year | Event(s)
1906 | Bordet and Gengou of the Pasteur Institute grow the pertussis bacterium in artificial media
1912-14 | Pertussis vaccine used by many researchers
Next few years | Many versions of vaccine developed
1942 | Several local authorities in UK start vaccination
1947-48 | First published reports appear of irreversible brain damage after whole-cell pertussis vaccine
1957 | 85,000 cases of pertussis reported. Vaccination expanded to national scale
1975 | Cases came down to 8,900. Pertussis incidence peaks every 4 years. The peaks became smaller and smaller; the smallest was in 1974-75. The next peak, in 1978, should have been the smallest, but was it?

60

Pertussis vacc. in UK - 2

[Charts: vaccine acceptance rates; pertussis incidence rates]

61


Year | Event(s)
1974-75 | Adverse publicity by media about the side effects of pertussis vaccine. Parents and doctors hesitated to give vaccine
1976-79 | National Childhood Encephalopathy Study (NCES) commissioned by the Dept. of Health and Social Security
1974 | Vaccine acceptance rate came down (from 78% in 1971) to 37%
1977-79 | An epidemic of pertussis occurs in Great Britain: more than 100,000 cases and 36 deaths
1979 | Vaccine Damage Payment Act passed in Great Britain. The act provides a mechanism for government compensation to those with vaccine-associated injuries

62


• Findings of the NCES study;

• Attributable risk
– Serious neurological disorders = 1 in 110,000 injections
– Persistent neurological sequelae = 1 in 310,000 injections

63


Year | Event(s)
1982 | British Child Health and Education Study: long-term neurologic problems are not found to be related to pertussis immunizations.
1983 | Communicable Diseases Surveillance Centre Study, or North West Thames Study, followed a large group of children after pertussis vaccination and finds no convincing evidence relating DPT vaccine to neurologic damage.
1988 | Loveday judgment in Great Britain's High Court rules that there is insufficient evidence to demonstrate that pertussis vaccine can cause permanent brain damage. Considered a "test case", meaning that other lawsuits claiming permanent neurologic effects from pertussis vaccine are effectively excluded.

64

Pertussis vacc. in UK - 6

1990-

Happy ending?

65

Critical Appraisal of NCES - 1

• Research question
– Intended and actual

• Study design
– Case control – reasons for choosing
– Cohort – reasons for not choosing

• Case selection

66

Critical Appraisal of NCES - 2

• Only hospital admitted cases were selected as cases – any comments?

• Control selection

– Comments?

67

Critical Appraisal of NCES - 3

• Exposure measurement

68

Critical Appraisal of NCES - 4

• Results

• There was no noticeable clustering in any area

69

Critical Appraisal of NCES - 5

• Results
• 3.5% of cases and 1.7% of controls had been immunized
• OR of 2.4, p value < 0.001
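As a rough check on the figures above, a crude odds ratio can be computed directly from the two exposure proportions. This is only a sketch (the function name is an assumption for illustration): it ignores any matching or adjustment used in the study, so the crude value need not equal the reported 2.4.

```python
def crude_odds_ratio(p_cases, p_controls):
    """Crude odds ratio from the proportion exposed among cases and among controls."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls

# Proportions quoted on the slide: 3.5% of cases, 1.7% of controls immunized
crude_or = crude_odds_ratio(0.035, 0.017)
print(round(crude_or, 2))
```

The crude value comes out near 2.1, somewhat below the reported figure; a gap like this would be expected if the published estimate came from a matched analysis, which is an assumption here rather than something the slide states.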

70

Critical Appraisal of NCES - 6

• Results
• There was no significant association between serious neurological illness and diphtheria and tetanus vaccine

• Confounders
• History of fits
– Is a known contraindication to immunization; including such cases will underestimate the OR
– A separate analysis limited to normal children with no past history of fits gave a RR of 3.2
• Social class
– Could not be controlled
– But analysis in those pairs of children in which both the affected and control were of the same social class – no differences

71

Critical Appraisal of NCES - 7

• Causation vs. association

A) clinically distinctive

B) restricted to immunized children

C) closely related in time to immunization

D) biologically plausible

E) without alternative explanation

• Attributable risk
– Can this be calculated in a case control study?
– Covered an entire national population (in theory represents the total incidence of serious neurological illnesses; assumption about immunization coverage)
– Serious neurological disorders = 1 in 110,000 injections
– Persistent neurological sequelae = 1 in 310,000 injections
– Is this appropriate?

72

Thank you
