Case control study – part 1
1
Case control study - Part 1
Dr. Rizwan S A, M.D.,
2
Outline of presentation
• Some history
• Planning and conducting a study
• Matching
• Sources of bias
• Applications
3
A scenario
• Assume you are the senior health advisor to the GOI
• Recently, several isolated reports of neurological illness following DPT vaccination have come up in the country
• Media is adding fuel to the fire
• Parents and doctors are reluctant to vaccinate and the vaccination rates are going down the drain!
• What will you do?
4
Some history
• 1788 - Early concepts found in works of Parisian physician PCA Louis
• 1843 - First explicit description by William Augustus Guy (occupational exposure and pulmonary disease)
• 1862 - Baker, case control comparisons of marriage and fertility in breast cancer patients
• 1926 - Lane-Claypon’s breast cancer study
• 1950 - Levin et al ; Wynder & Graham ; Schrek et al. and Doll & Hill; (smoking and lung cancer)
5
Planning and conducting
• Research question
• Definition of case
• Definition of control
• Selecting the cases & controls
• Research instrument
6
Case control studies | Cohort studies
Proceeds from effect to cause | Proceeds from cause to effect
Starts with the disease | Starts with people exposed to the risk factor or suspected cause
Tests whether the suspected cause occurs more frequently in those with disease than those without disease | Tests whether disease occurs more frequently in those exposed than in those not exposed
Usually the 1st approach to the testing of hypothesis, but also useful for exploratory studies | Reserved for the testing of precisely formulated hypothesis
Involves fewer study subjects | Involves larger number of subjects
Yields results relatively quickly | Long follow-up, delayed results
Suitable for study of rare diseases | Inappropriate when disease or exposure under investigation is rare
Generally, yields only estimate of relative risk (odds ratio) | Yields incidence rates, relative risk, attributable risk
Cannot yield information about disease other than that under study | Can give information about more than one disease outcome
Relatively inexpensive | Expensive
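The odds ratio mentioned in the comparison is usually computed as the cross-product ratio of an unmatched 2x2 table. A minimal sketch follows; the function name and the counts are hypothetical, chosen only for illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product (exposure) odds ratio for an unmatched 2x2 table."""
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when an off-diagonal cell is zero")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, for illustration only (not from any study).
example_or = odds_ratio(exposed_cases=40, unexposed_cases=60,
                        exposed_controls=20, unexposed_controls=80)
print(f"OR = {example_or:.2f}")
```

When the disease is rare, this odds ratio approximates the relative risk that a cohort study would estimate directly.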
7
Research question
• Begin with broad and ambitious question
• Later, narrow and more precise
• Considerations of time, cost
• Eg.
1. Does tobacco cause cancer?
2. Does smoking tobacco cause bronchogenic CA?
3. Do persons having broncho. CA have h/o greater exposure to tobacco smoking as compared to persons w/o the disease?
• Poor questions can spoil the entire study
8
Definition of case - 1
• Eligibility
• Definition of disease
9
Definition of case - 2
• Eligibility (2 components)
– Objective criteria for diagnosis
– Stating the eligibility criteria
• Eligibility criteria – should reflect ‘potentially at risk for exposure’ both for the case & control
– Eg. recent OCP use and MI (exclude women who are sterilized, post-menopausal or have CIs to OCPs)
10
Definition of case - 3
• Cases should have a reasonable possibility of having had their disease induced by the exposure
– Eg. OCP and thromboembolism – should exclude postpartum and postoperative cases (Why?)
• Incident cases
– Will be more uniform
– Recall more accurate
– More certain that exposure preceded the disease
– Berkson and Neyman
11
Definition of case - 4
• Definition of disease
– Objective criteria to reliably diagnose the disease
– Eg. Rh. arthritis (several diagnostic criteria, causing confusion)
– To reduce misclassification
• Sources of cases
– Hospital lists, special reporting systems like cancer registries, disease surveillance, death certificates
12
Definition of control - 1
• Eligibility criteria
– Should be similar to the cases with regard to potential for exposure
– Problems arise in hospital based controls
• We want to select controls that are likely to reflect the exposure rate in the population
• We sh. exclude those hospital controls whose condition is associated with the exposure (Eg. Aspirin and MI; controls with chronic pain/peptic ulcer)
– One solution – include controls with a variety of diagnoses not associated with exposure
13
Definition of control - 2
• Sources of controls
– Hospital based
– Dead controls
– Controls with similar diseases
– Neighborhood controls
– Population based
– Best friend control/Sibling control
14
Definition of control - 3
• Hospital based
– Referral pattern is similar to cases (from the same study base)
– Similar quality of information
– Convenience
– May not be representative of the population
• Dead controls
– In a study where the case is death from a particular cause
– Information obtained from ‘proxy’ informants
– But dead controls differ from living controls
• Controls with similar diseases
– Cancer (of different type) controls for cancer cases
– Minimize recall bias, interviewer bias, examine specificity of exposure
15
Definition of control - 4
• Neighborhood controls
– Best friend control/Sibling control
  • Inexpensive, easy and quick
  • Ability to match on a number of variables that are associated with neighborhood/friendship
  • May introduce selection bias related to the exposure (‘smoking’ cases nominate ‘smoking’ friends) and overmatching
– Population based
  • Truly representative sample
  • From tax lists, voting lists, telephone directories
16
Definition of control - 5
Source | Advantage | Disadvantage
Hospital based | Easily identified. Available for interview. More willing to cooperate. Tend to give complete and accurate information (recall bias). | Not typical of general population. Possess more risk factors for disease. Some diseases may share risk factors with disease under study. Berksonian bias.
Population based | Most representative of the general population. Generally healthy. | Time, money, energy. Opportunity of exposure may not be same as that of cases (location, occupation).
Neighbourhood controls/Telephone exchange random dialing | Controls and cases similar in residence. Easier than sampling the population. | Non-cooperation. Not representative of general population.
Best friend control/Sibling control | Accessible, cooperative. Similar to cases in most aspects. | Overmatching.
17
Selection process - 1
[Figure: diagram relating the total population, the reference population, cases and controls]
18
Selection process - 2
• Cases
– In practice, we use all eligible cases within a defined time period
  • From disease registry or hospital
  • We are implicitly sampling from a subset of the total population of cases
• Controls
– Sampling is most pertinent here because in rare diseases, the no. of potential controls greatly exceeds the no. of cases
19
Selection of cases - 1
• Representativeness
– Ideally, cases should be a random sample of all cases of interest in the source population (e.g. from vital data, registry data)
– But commonly they are a selection of available cases from a medical care facility (e.g. from hospitals, clinics)
• Method of selection
– Selection may be from incident or prevalent cases
– Incident cases are those derived from ongoing ascertainment of cases over time
– Prevalent cases are derived from a cross-sectional survey
20
Selection of cases - 2
• Incident cases are preferable
• These should be all newly diagnosed cases over a given period of time in a defined population (however, we are excluding patients who died before diagnosis)
• Prevalent cases do not include patients with a short course of disease (patients who recovered early and those who died will not be included)
• Can be partly overcome by including deceased cases as well as those alive
21
Selection of cases - 3
• Validity is more important than generalizability i.e. the need to establish an etiologic relationship is more important than to generalise results to the population
• Eg.
– In a study on breast cancer, we can include all cases or we can include only premenopausal women with lobular cancer
  • If we take the latter group as cases, we can elicit the etiology better
– Studies done in nurses for OCP use
22
Selection of controls - 1
• The four principles of Wacholder
1. The study base
2. De-confounding
3. Comparable accuracy
4. Efficiency
23
Selection of controls - 2
• Should the controls be similar to the cases in all respects other than having the disease? i.e. comparable
• Should the controls be representative of all non-diseased people in the population from which the cases are selected? i.e. representative
24
Selection of controls - 3
• Representativeness
– Should be representative of the general population in terms of probability of exposure to the risk factor
• Comparability
– Should also have had the same opportunity to be exposed as the cases have
• Not that both cases and controls are equally exposed; but only that they have had the same opportunity for exposure.
25
Selection of controls - 4
• Usually, cases are not a random sample of all cases in the population. So, the controls must be selected in the same way (and with the same biases) as the cases.
• It follows from the above that a pool of potential controls must be defined. This is a universe of people from whom controls may be selected (study base)
26
Selection of controls - 5
• The study base is composed of a population at risk of exposure over a period
• Cases emerge within a study base. Controls should also emerge from the same study base, except that they are not cases.
• Eg. If cases are selected exclusively from hospitalized patients, controls must also be selected from hospitalized patients.
27
Selection of controls - 6
• Comparability is more important than representativeness in the selection of controls
• The control should resemble the case in all respects except for the presence of disease
28
Selection of controls - 7
• Number of controls
– Large study: equal numbers
– Small study: multiple controls
• Use of multiple controls
– Controls of same type
– Multiple controls of different types
  • Hospital and neighborhood controls
  • e.g. cases: children with brain tumor; controls: children with other cancer, normal children
29
Selection of controls - 8
[Figure: exposure to radiation compared across children with brain tumors, children with other cancers and children without cancer, to separate the hypotheses "radiation causes cancers" and "radiation causes brain cancers only"]
Multiple controls of different types are valuable for exploring alternative hypotheses and for taking into account potential recall bias.
30
Sampling for cases/controls - 1
• Frame – list of all potentially eligible cases and controls in the target population (a subset of the general pop. both at risk of exposure and disease development)
• The frame sh. not be biased in any manner, else the sample will also be biased even if random
• Types of sampling
– SRS
– Systematic
– Stratified
– Matched
• The objective is to avoid bias in selection, each case or control has equal chance of being selected
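The first three sampling types listed above can be sketched in a few lines. This is an illustrative sketch only: the frame, the stratum variable (sex) and the sample sizes are hypothetical.

```python
import random


def simple_random_sample(frame, n, rng):
    """SRS: every unit in the frame has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Systematic: random start, then every k-th unit (k = len(frame) // n)."""
    step = len(frame) // n
    start = rng.randrange(step)
    return [frame[start + i * step] for i in range(n)]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Stratified: an independent SRS of fixed size within each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {key: rng.sample(units, n_per_stratum) for key, units in strata.items()}


# Hypothetical frame of 100 eligible subjects: (id, sex).
frame = [(i, "F" if i % 2 == 0 else "M") for i in range(100)]
rng = random.Random(2024)
srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(frame, lambda unit: unit[1], 5, rng)
```

A biased frame stays biased under any of these schemes; the random draw only protects against bias in choosing units from the frame.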
31
Sampling for cases/controls - 2
• If we are using all incident cases occurring in a defined area and time period, then controls selected at random from the gen. pop. are the best choice (sound basis for calculating RR, AR, etiologic fraction)
• If cases are selected from hospital(s), population controls are not necessarily the best choice; a control series drawn from the hospital can also be valid
• However, hosp. controls often leave room for doubt about validity of comparison (cost and practicality)
32
Sampling for cases/controls - 3
• Random digit dialing
– Prerequisite: extensive telephone coverage
– Used either to screen for potential controls or for telephone interviews
• Method
– All area codes and prefix numbers are obtained
– Add all possible two digit numbers
– The first 8 digits – PSU
– Select a PSU at random – if response obtained then retain PSU
– Then the last two digits are randomly selected and continued until required sample is reached
– The no. of PSUs and total houses depend on design
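The method above can be sketched as a two-stage procedure. This is a rough sketch under stated assumptions: numbers have 10 digits (6-digit area code plus prefix, 2 more digits to form the 8-digit PSU, 2 final digits), and `is_residential` is a hypothetical stand-in for actually dialling a number and getting a household response.

```python
import random


def random_digit_dialing(prefixes, is_residential, n_psus, per_psu, rng):
    """Two-stage sketch: retain a PSU only if its first dialled number responds."""
    # Each 6-digit area code + prefix plus every 2-digit suffix gives an 8-digit PSU.
    candidate_psus = [prefix + f"{two:02d}" for prefix in prefixes for two in range(100)]
    rng.shuffle(candidate_psus)
    sample, retained = [], 0
    for psu in candidate_psus:
        if retained == n_psus:
            break
        # Random order of the 100 numbers (last two digits) inside this PSU.
        numbers = [psu + f"{last:02d}" for last in range(100)]
        rng.shuffle(numbers)
        if not is_residential(numbers[0]):
            continue  # no household response on the first number: drop the PSU
        # Keep dialling within the retained PSU until its quota is filled.
        households = [n for n in numbers if is_residential(n)][:per_psu]
        sample.extend(households)
        retained += 1
    return sample


def toy_is_residential(number):
    """Hypothetical rule for the demo: numbers ending in an even digit respond."""
    return int(number[-1]) % 2 == 0


rdd_sample = random_digit_dialing(["212555", "212556"], toy_is_residential,
                                  n_psus=3, per_psu=4, rng=random.Random(7))
```

Screening PSUs on the first number concentrates dialling effort in blocks that contain households, which is the point of retaining only responding PSUs.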
33
Sampling for cases/controls - 4
• Examples
– Artificial sweeteners and bladder cancer
  • Cases: 21-84 years, newly diagnosed bladder cancer in 10 designated counties in metropolitan areas
  • Controls: age-sex stratified random sample of the general population in the ten counties, frequency matched at 2:1 ratio
– Oral contraception and congenital malformations
  • Cases of malformation from all newborns and stillborns delivered at five major hospitals between 1974-76
  • Controls: all unaffected newborns in the five hospitals; sampling days were rotated to represent all 7 days
34
Matching - 1
• Matching is defined as the process of selecting controls so that they are similar to cases in certain characteristics such as age, sex, race, socioeconomic status and occupation
• What is post-matching?
– Pairing controls to cases from unmatched data during analysis
• We often want a constant case control ratio, but sometimes matching is incomplete so that we end up with a variable ratio
35
Matching - 2
• Objective – to eliminate biased comparison between cases and controls
• Two step process
1. The matched design
2. The matched analysis
• One immediate effect of matching is the balance between no. of cases and controls
• Sometimes we can deliberately match on a factor which comes in the causal path to confirm or refute its role. (Eg. Smoking and MI, matched on cholesterol)
36
Matching - 3
• What variables to match?
– Factors which are independent risk factors for the disease
– Assoc. with the exposure but non-causally
– May not be directly a risk factor, but may be assoc. with other causal factors excluding the study exposure
• Similar to something?
37
Matching - 4
• Situations to match or not?
[Figure: four diagrams, numbered 1 to 4, showing causal and non-causal relationships among exposure (E), factor (F) and disease (D)]
38
Matching - 5
• Examples
• 1. E = alcohol, F = smoking, D = lung CA
– Implication if not matched?
• 2. E = OCP, F = smoking, D = MI
– Implication if not matched?
• 3. E = blood grp O, F = age, sex, D = thrombosis
– Implication if matched?
• 4. E = OCP, F = prescribing physician, D = MI
– Implication if not matched or matched?
39
Matching - 6
• In summary, the decision to match or not depends on the residual association of the factor with disease and exposure after controlling other variables
• Overmatching
– Reduces validity or statistical efficiency
– Two general meanings
  • Unmatched analysis in matched studies
  • Matching for unnecessary variables
– If one matches on a factor that is associated with exposure but not the disease
  • Paired analysis may correctly estimate the odds ratio, but the variance will be larger than in an unmatched study of the same size (overmatching increases the frequency of exposure-concordant pairs, which are discarded in paired analysis)
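Why concordant pairs are discarded becomes clearer from the usual paired estimate for 1:1 matching, which is the ratio of the two kinds of discordant pairs. A minimal sketch, with hypothetical pair counts chosen only for illustration:

```python
def matched_pair_odds_ratio(pairs):
    """pairs: (case_exposed, control_exposed) booleans, one tuple per 1:1 matched pair.

    Returns the paired odds ratio and the number of concordant pairs,
    which contribute nothing to the estimate.
    """
    pairs = list(pairs)
    case_only = sum(1 for case_exp, control_exp in pairs if case_exp and not control_exp)
    control_only = sum(1 for case_exp, control_exp in pairs if control_exp and not case_exp)
    concordant = len(pairs) - case_only - control_only
    if control_only == 0:
        raise ValueError("paired odds ratio is undefined without control-only exposed pairs")
    return case_only / control_only, concordant


# Hypothetical data: 100 matched pairs.
pairs = ([(True, True)] * 10 + [(True, False)] * 25 +
         [(False, True)] * 10 + [(False, False)] * 55)
matched_or, n_concordant = matched_pair_odds_ratio(pairs)
print(matched_or, n_concordant)
```

Here 65 of the 100 pairs are concordant and carry no information, so anything that raises concordance (such as matching on a correlate of exposure) shrinks the effective sample size.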
40
Matching - 7
• If one matches on a factor that is causally or non-causally assoc. with exposure but not disease and then ignores the matching in the analysis, the OR will be biased towards unity
• If one matches on a factor which is assoc. with disease but not exposure, then the OR will be correctly estimated whether or not pairing is retained
– Paired analysis will be less efficient than unpaired one
• Matching on highly correlated variables is also unnecessary
• Finally, matching should be done for factors which have the strongest relationship to the disease and are least correlated
41
Matching - 8
• Alternatives to matching
– At the sampling phase
  • Stratified sampling
  • Frequency matching
– At analysis phase
  • Post-stratification
  • Regression analysis
• Stratified sampling
– Pre-determined number of cases and controls in each subgroup created by the cross-classification
– Eg. Age (4 groups), sex (2), race (4 groups): total 32 subgroups
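The subgroup count in the example is just the product of the category counts (4 x 2 x 4). A small sketch of the cross-classification, with placeholder category labels that are hypothetical:

```python
from itertools import product

# Placeholder labels; only the number of categories matters here.
age_groups = ["age 1", "age 2", "age 3", "age 4"]
sexes = ["male", "female"]
race_groups = ["race 1", "race 2", "race 3", "race 4"]

# Every combination of age group, sex and race is one sampling stratum.
strata = list(product(age_groups, sexes, race_groups))
print(len(strata))  # 4 * 2 * 4 = 32
```

Adding one more matching variable multiplies the number of strata again, which is why stratified sampling quickly becomes impractical.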
42
Matching - 9
• Frequency matching
– Controls being taken from the corresponding subgroups in proportion to the no. of cases
– Eg. If 30% of cases are males of Hindu religion aged 60-65 years, then we take 30% of the controls from the same subgroup
– More practical than stratified sampling, but it requires one to continually update the distribution of accumulating cases to maintain a fixed case-control ratio
• Post-stratification
– Stratify the subgroups and analyze
– Very flexible in that variables need not be pre-specified
– Limitation: the number of variables that can be stratified is restricted by lack of numbers
• Regression analysis
– Most useful when the number of variables/subgroups increases
Matching - 10
• Effectiveness of matching
– Removal of bias
– Reduction of variance
  • Matched design only gives a modest increase in efficiency
  • Greatest improvement is when there is a strong assoc. between disease and the confounder
  • Also efficient when only a small proportion of the target population is exposed to the study factor
• The added cost and complexity of matching should be weighed against any expected gains in precision
44
Matching - 11
• Advantages
– Cases and controls will be comparable with respect to the matched variables
– Provides the best means to investigate a very specific hypothesis
• Disadvantages
– One can no longer study the matched variable in relation to the risk of disease
– Increase in cost, time and labor
– A certain fraction of cases are discarded as a result of failure to find a matching control
45
Matching - 12
• Summary
– Unless one has very good reason to match, one is better off avoiding it
– Frequency matching within rather broad categories of the matching variables will suffice for most studies
46
Sources of bias - 1
• Bias – systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of the risk measure
1. Ascertainment and selection bias
a) Surveillance
b) Diagnosis
c) Referral
d) Selection
e) Non-response
f) Length of stay
g) Survival
h) Admission diagnoses
2. Bias in estimation of exposure
a) Recall
b) Interviewer
c) Prevarication
d) Improper analysis
3. Misclassification
4. Other sources
47
Sources of bias - 2
1. Ascertainment and selection bias - Not peculiar to case-control, can occur in cohort studies also
a) Differential surveillance
– In asymptomatic/mild diseases, cases are more likely to be detected in persons who are closely examined
– Eg. OCP and endometrial cancer/phlebitis
  • Women taking OCPs were more thoroughly evaluated
  • Based on preliminary reports of OCP use and phlebitis, clinicians started looking for phlebitis in such exposed patients
– Exposed cases would have a greater likelihood of being diagnosed as compared to unexposed cases
– This bias can be checked by doing a stratified analysis in subgroups having equal surveillance (based on some index of medical care) or by restricting the study to the time prior to publication of such findings
48
Sources of bias - 3
1. Ascertainment and selection bias
b) Diagnosis
• In conditions like cervical dysplasia, knowledge of exposure may alter the assessment
• This is most likely to occur in cases of uncertain diagnosis
c) Differential referral
• OR’ = b × OR; b = (s1 × s4)/(s2 × s3)
– Where s1, s2, s3, s4 are the selection proportions of exposed cases, unexposed cases, exposed controls and unexposed controls resp.
– A biased selection of cases can be compensated by a similarly biased selection of controls
– There is no bias (b = 1) if the probability of selecting an exposed case equals that of an unexposed case, and likewise for controls
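The bias factor b can be explored numerically. In this sketch the selection proportions and the true odds ratio are hypothetical values chosen only to show the two situations described above.

```python
def selection_bias_factor(s1, s2, s3, s4):
    """b = (s1 * s4) / (s2 * s3).

    s1, s2: selection proportions of exposed and unexposed cases.
    s3, s4: selection proportions of exposed and unexposed controls.
    """
    return (s1 * s4) / (s2 * s3)


def observed_odds_ratio(true_or, s1, s2, s3, s4):
    """OR' = b * OR."""
    return selection_bias_factor(s1, s2, s3, s4) * true_or


# Hypothetical: exposed cases are over-selected, controls are selected evenly.
biased = observed_odds_ratio(2.0, s1=0.9, s2=0.6, s3=0.3, s4=0.3)

# Hypothetical: controls are over-selected on exposure to the same degree (0.9/0.6 = 0.45/0.3).
compensated = observed_odds_ratio(2.0, s1=0.9, s2=0.6, s3=0.45, s4=0.3)
print(round(biased, 2), round(compensated, 2))
```

In the first scenario the observed odds ratio is inflated by a factor of 1.5; in the second the two selection biases cancel and b returns to 1.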
49
Sources of bias - 4
1. Ascertainment and selection bias
c) Differential referral (cont.)
• Eg. A study of alcohol and kidney failure, where income is assoc. with alcohol intake
• A hospital only admits wealthy patients, so cases of kidney failure in this hospital will be more exposed to alcohol than patients in the gen. pop.
• But if patients with other diseases also have similar income characteristics and they are taken as controls, bias won’t occur
• If controls are taken from the gen. pop., then we have to match/stratify on income to eliminate income as a source of selection bias
50
Sources of bias - 5
1. Ascertainment and selection bias
d) Selection
• Eg. Interviewer ‘keying’ on cases who are exposed (one particular nurse was searching out all the cases of ectopic pregnancy with IUD usage)
• To avoid this, we must specify precisely and in advance the methods by which cases and controls are selected, carefully train staff and apply quality control
e) Non-response
• A worst case analysis taking all non-responding cases as unexposed and all non-responding controls as exposed will show if the non-response is likely to bias the estimates
• If the exposure rates were equal between responders and non-responders, there will be no bias
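The worst case analysis for non-response can be sketched as below. The counts are hypothetical, and the cell naming (a, b, c, d) is an assumption of this sketch rather than notation from the slides.

```python
def odds_ratio(a, b, c, d):
    """a, b: exposed and unexposed cases; c, d: exposed and unexposed controls."""
    return (a * d) / (b * c)


def worst_case_odds_ratio(a, b, c, d, nonresponding_cases, nonresponding_controls):
    """Count all non-responding cases as unexposed and all non-responding controls as exposed."""
    return odds_ratio(a, b + nonresponding_cases, c + nonresponding_controls, d)


# Hypothetical responders: 60/40 exposed/unexposed cases, 30/70 exposed/unexposed controls,
# with 10 non-responding cases and 10 non-responding controls.
observed = odds_ratio(60, 40, 30, 70)
worst = worst_case_odds_ratio(60, 40, 30, 70, nonresponding_cases=10, nonresponding_controls=10)
print(round(observed, 2), round(worst, 2))
```

If the worst case estimate still points in the same direction as the observed one, as in this example, non-response alone is unlikely to explain the association.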
51
Sources of bias - 6
1. Ascertainment and selection bias
f) Length of stay
• In a hospital study, incident cases should be selected rather than prevalent cases; otherwise,
– Patients who stay longer will have a higher probability of being selected
– Cases of short duration would be under-represented
• We check this by stratifying the analysis on the basis of the duration b/w admission and selection
g) Survival
• Arises when a disease accompanied by mortality is studied only in survivors
• Eg. A study in survivors of MI may reveal factors that are assoc. with surviving an MI rather than sustaining one
• Unless one can justify that exposure is not related to duration/survival, one should take only incident cases
• This bias can be checked by stratifying on date of onset
52
Sources of bias - 7
1. Ascertainment and selection bias
h) Admission diagnoses
• Eg. In a hospital based study of the assoc. b/w smoking and MI, if controls are lung cancer patients, this will underestimate the effect
• To avoid this bias we must select controls with a variety of diseases which are believed to be unrelated to the study exposure (neither + nor -)
2. Bias in the estimation of exposure
a) Recall
• Eg. A mother with a malformed baby will try with more care and intensity to recall a pelvic X-ray compared to a woman with a normal baby
53
Sources of bias - 8
2. Bias in the estimation of exposure
a) Recall (cont.)
• Sometimes, the disease itself affects memory (dementia)
• This bias can be reduced by using controls with another disease who will also keep thinking of reasons for their disease
• Independent verification of h/o exposure can be sought
b) Interviewer
• Interviewer may probe cases more intensely for histories of exposure than controls if they know the hypothesis
• Reduced by training staff, keeping staff ignorant of hypothesis (ideal but unobtainable), keeping interview time constant
c) Prevarication
• Subjects may have ulterior motives for deliberately overestimating or underestimating exposure
• Eg. A worker who may receive disability pay may exaggerate his exposure; if it means loss of job, he may minimize it
• May be overcome by several independent raters
54
Sources of bias - 9
2. Bias in the estimation of exposure
d) Improper analysis
• Unmatched analysis for a matched study
3. Misclassification
– The disease/exposure status classification may be erroneous
– Some controls may actually have the study disease but this is very improbable with rare diseases
– The most likely source of misclassification is the determination of exposure
– Any measure to reduce misclassification should be addressed at the design stage; a pilot study will reveal many errors
4. Other sources of error
– Insufficient sample size, errors of interpretation, not accounting for effect of extraneous variables
55
Sources of bias - 10
4. Other sources of error
– Cases and controls sh. be similar with respect to factors that might have affected both the development of disease and the opportunity for past exposure
– For eg. Medical conditions like HTN, DM preclude the use of OCPs, thus users of these would inherently be at a lower risk
– An agent found in assoc. with the study disease may have been prescribed because of an early manifestation of the disease
– For eg. Estrogens prescribed for irregular bleeding that was the first symptom of undetected endometrial cancer. If this was the case then later diagnosis of the cancer would find an apparent assoc. with estrogen usage.
56
Sources of bias - 11
Summary
– Before starting a study, one should list the likely sources of bias and plan the investigation and analyses so as to prevent/minimize them
57
Specific limitations of Case control study
• Is not useful to study weak associations (OR < 1.5)
• Non-participation rates are freq. high and differential for cases and controls
• Differential recall bias
58
Applications of Case control study
1. Vaccine effectiveness
2. Evaluation of treatment and program efficacy
3. Evaluation of screening programs
4. Outbreak investigations
5. Demography
6. Genetic epidemiology
7. Occupational epidemiology
59
Pertussis vacc. in UK - 1
Year | Event(s)
1906 | Bordet and Gengou of the Pasteur Institute grow the pertussis bacterium in artificial media
1912-14 | Pert. vaccine used by many researchers
Next few years | Many versions of vaccine developed
1942 | Several local authorities in UK start vaccination
1947-48 | First published reports appear of irreversible brain damage after whole-cell pertussis vaccine
1957 | 85,000 cases of pertussis reported. Vaccination expanded to national scale
1975 | Cases came down to 8,900. Pertussis incidence peaks every 4 years. The peaks became smaller and smaller; the smallest was in 1974-75. The next peak, in 1978, should have been the smallest, but was it?
60
Pertussis vacc. in UK - 2
[Figure: vaccine acceptance rates and pertussis incidence rates]
61
Pertussis vacc. in UK - 3
Year | Event(s)
1974-75 | Adverse publicity by media about the side effects of pert. vaccine. Parents and doctors hesitated to give vaccine
1976-79 | National Childhood Encephalopathy Study (NCES) commissioned by the Dept. of Health and Social Security
1974 | Vaccine acceptance rate came down (from 78% in 1971) to 37%
1977-79 | An epidemic of pertussis occurs in Great Britain: > 100,000 cases and 36 deaths
1979 | Vaccine Damage Payment Act passed in Great Britain. The act provides a mechanism for government compensation to those with vaccine-associated injuries
62
Pertussis vacc. in UK - 4
• Findings of the NCES study
• Attributable risk
– Serious neurological disorders = 1 in 110,000 injections
– Persistent neurological sequelae = 1 in 310,000 injections
63
Pertussis vacc. in UK - 5
Year | Event(s)
1982 | British Child Health and Education Study: long-term neurologic problems are not found to be related to pertussis immunizations.
1983 | The Communicable Diseases Surveillance Centre Study, or North West Thames Study, which followed a large group of children after pertussis vaccination, finds no convincing evidence relating DPT vaccine to neurologic damage.
1988 | Loveday judgment in Great Britain's High Court rules that there is insufficient evidence to demonstrate that pertussis vaccine can cause permanent brain damage. Considered a "test case", meaning that other lawsuits claiming permanent neurologic effects from pertussis vaccine are effectively excluded.
64
Pertussis vacc. in UK - 6
1990- | Happy ending?
65
Critical Appraisal of NCES - 1
• Research question
– Intended and actual
• Study design
– Case control – reasons for choosing
– Cohort – reasons for not choosing
• Case selection
66
Critical Appraisal of NCES - 2
• Only patients admitted to hospital were selected as cases – any comments?
• Control selection
– Comments?
67
Critical Appraisal of NCES - 3
• Exposure measurement
68
Critical Appraisal of NCES - 4
• Results
• There was no noticeable clustering in any area
69
Critical Appraisal of NCES - 5
• Results
• 3.5% of cases and 1.7% of controls had been immunized
• OR of 2.4, p value < 0.001
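As a rough check, a crude (unmatched) exposure odds ratio can be computed from the two rounded percentages alone. This is only a sketch: it comes out at roughly 2.1, and it need not equal the reported 2.4, which presumably (an assumption here) comes from the study's own matched analysis on unrounded data.

```python
def exposure_odds_ratio(p_cases, p_controls):
    """Crude odds ratio from the proportion exposed among cases and among controls."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls


# Rounded proportions immunized, taken from the slide.
crude_or = exposure_odds_ratio(0.035, 0.017)
print(round(crude_or, 2))
```

The gap between a crude and a matched estimate is one reason the slides list unmatched analysis of a matched study as a source of bias.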
70
Critical Appraisal of NCES - 6
• Results
• There was no significant association between serious neurological illness and diphtheria and tetanus vaccine
• Confounders
• History of fits
– Is a known contraindication to immunization; including such cases will underestimate the OR
– A separate analysis limited to normal children with no past history of fits gave a RR of 3.2
• Social class
– Could not be controlled
– But analysis in those pairs of children in which both the affected child and the control were of the same social class showed no differences
71
Critical Appraisal of NCES - 7
• Causation vs. association
A) clinically distinctive
B) restricted to immunized children
C) closely related in time to immunization
D) biologically plausible
E) without alternative explanation
• Attributable risk
– Can this be calculated in a case control study?
– Covered an entire national population (in theory represents the total incidence of serious neurological illnesses, assumption about immunization coverage)
– Serious neurological disorders = 1 in 110,000 injections
– Persistent neurological sequelae = 1 in 310,000 injections
– Is this appropriate?
72
Thank you