Public Health Information Network (PHIN)
Series II
Outbreak Investigation Methods:
From Mystery to Mastery
Access Series Files Online http://www.vdh.virginia.gov/EPR/Training.asp
Session slides
Session activities (when applicable)
Session evaluation forms
Speaker biographies
Alternate Web site: http://www.sph.unc.edu/nccphp/phtin/index.htm
Site Sign-in Sheet
Please submit your site sign-in sheet and session evaluation forms to:
Suzi Silverstein, Director, Education and Training
Emergency Preparedness & Response Programs
FAX: (804) 225-3888
Series II, Session VI
“Data Analysis”
Series II Sessions
1. “Recognizing an Outbreak”
2. “Risk Communication”
3. “Study Design”
4. “Designing Questionnaires”
5. “Interviewing Techniques”
6. “Data Analysis”
7. “Writing and Reviewing Epidemiological Literature”
Today’s Presenters
Amy Nelson, PhD, Consultant, NC Center for Public Health Preparedness
Sarah Pfau, MPH, Consultant, NC Center for Public Health Preparedness
“Analyzing Data” Learning Objectives
Upon completion of this session, you will:
• Understand what an analytic study contributes to an epidemiological outbreak investigation
• Understand the importance of data cleaning as a part of analysis planning
“Analyzing Data” Learning Objectives
• Know why and how to generate descriptive statistics to assess trends in your data
• Know how to generate and interpret epi curves to assess trends in your outbreak data
• Understand how to interpret measures of central tendency
“Analyzing Data” Learning Objectives (cont’d.)
• Know why and how to generate measures of association for cohort and case-control studies
• Understand how to interpret measures of association (risk ratios, odds ratios) and corresponding confidence intervals
• Know how to generate and interpret selected descriptive and analytic statistics in Epi Info software
Lecturer
Amy Nelson, PhD
Consultant,
NC Center for Public Health Preparedness
Analyzing Data: Session Overview
• Analysis planning
• Descriptive epidemiology
  – Epi curves
  – Spot maps
  – Measures of central tendency
  – Attack rates
• Analytic epidemiology
  – Measures of association
• Case study analysis using Epi Info software
Analysis Planning
– An invaluable investment of time
– Helps you select the most appropriate epidemiologic methods
– Helps ensure that the work leading up to analysis yields the database structure and content your preferred analysis software needs to run its analysis programs
Analysis Planning
Several factors influence—and sometimes limit—your approach to data analysis:
– Research question
– Exposure and outcome variables
– Study design
– Sample population
Analysis Planning
Three key considerations as you plan your analysis:
1. Work backwards from the research question(s) to design the most efficient data collection instrument
2. Study design will determine which statistical tests and measures of association you evaluate in the analysis output
3. Consider the need to present, graph, or map data
Analysis Planning
1. Work backwards from the research question(s) to design the most efficient data collection instrument
• Develop a sound data collection instrument
• Collect pieces of information that can be counted, sorted, and recoded or stratified
• Analysis phase is not the time to realize that you should have asked questions differently!
Analysis Planning
2. Study design will determine which statistical tools you will use
• Use the risk ratio (RR) with cohort studies and the odds ratio (OR) with case-control studies; Epi Info and SAS generate both at the same time, so you need to know which one to evaluate
• Some sampling methods (e.g., matching in case-control studies) require special types of analysis
Analysis Planning
3. Consider the need to present, graph, or map data
• Even if you collect continuous data, you may later categorize it so you can generate a bar graph and assess frequency distributions
• If you plan to map data, you may need X- and Y-coordinate data or denominator data
Basic Steps of an Outbreak Investigation
1. Verify the diagnosis and confirm the outbreak
2. Define a case and conduct case finding
3. Tabulate and orient data: time, place, person
4. Take immediate control measures
5. Formulate and test hypotheses
6. Plan and execute additional studies
7. Implement and evaluate control measures
8. Communicate findings
Descriptive Epidemiology
Step 3: Tabulate and orient data: time, place, person
Descriptive epidemiology:
• Familiarizes the investigator with the data
• Comprehensively describes the outbreak
• Is essential for hypothesis generation (step #5)
Data Cleaning
• Check for accuracy
  – Outliers
• Check for completeness
  – Missing values
• Determine whether or not to create or collapse data categories
• Get to know the basic descriptive findings
Data Cleaning: Outliers
• Outliers can be cases at the very beginning or end of an outbreak that may not appear to be related
  – First make certain they are not due to a collection, coding, or data entry error
• If they are not an error, they may represent:
  – Baseline level of illness
  – Outbreak source
  – A case exposed earlier than the others
  – An unrelated case
  – A case exposed later than the others
  – A case with a long incubation period
Data Cleaning: Distribution of Variables
[Figure: Illness Onset for Outbreak of Gastrointestinal Illness at a Nursing Home. X-axis: Day of Onset; y-axis: Number of Cases; one case is labeled “Outlier”.]
Data Cleaning: Missing Values
• The investigator can distinguish missing values that are expected from those that are due to problems in data collection or entry
• The number of missing values for each variable can also be learned from frequency distributions
Data Cleaning: Frequency Distributions
Data Cleaning: Data Categories
• Which variables are continuous versus categorical?
• Collapse existing categories into fewer?
• Create categories from continuous variables (e.g., age)?
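A minimal Python sketch of two of the cleaning checks above: counting missing values per variable and collapsing a continuous variable (age) into categories. The records, field names, and age cut points are hypothetical and chosen only for illustration; in practice, frequency output from your analysis software would supply the same counts.

```python
from collections import Counter
from typing import Optional

# Hypothetical records for illustration; None marks a missing value.
records = [
    {"age": 7, "sex": "M", "ill": "Yes"},
    {"age": None, "sex": "F", "ill": "No"},
    {"age": 37, "sex": None, "ill": "Yes"},
    {"age": 62, "sex": "M", "ill": None},
]


def count_missing(rows, fields):
    """Count missing (None) values per field, as a frequency check would."""
    return {field: sum(1 for row in rows if row.get(field) is None) for field in fields}


def age_group(age: Optional[int]) -> str:
    """Collapse continuous age into categories; the cut points are illustrative only."""
    if age is None:
        return "Missing"
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


missing = count_missing(records, ["age", "sex", "ill"])
age_freq = Counter(age_group(row["age"]) for row in records)

print(missing)         # missing values per variable
print(dict(age_freq))  # frequency distribution of the new age categories
```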
Descriptive Epidemiology
• Comprehensively describes the outbreak
  – Time
  – Place
  – Person
Descriptive Epidemiology
Time
Descriptive Epidemiology: Time
[Figure: epi curve; x-axis: Day; y-axis: # of Cases]
Descriptive Epidemiology: Time
• What is an epidemic curve and how can it help in an outbreak?
– An epidemic curve (epi curve) is a graphical depiction of the number of cases of illness by the date of illness onset
Descriptive Epidemiology: Time
• An epi curve can provide information on the following characteristics of an outbreak:
  – Pattern of spread
  – Magnitude
  – Outliers
  – Time trend
  – Exposure and / or disease incubation period
Epidemic Curves
The overall shape of the epi curve can reveal the type of outbreak (the pattern of spread)
• Common source
  – Intermittent
  – Continuous
  – Point source
• Propagated
Epidemic Curves: Common Source
• People are exposed to a common harmful source
• Period of exposure may be brief (point source), long (continuous) or intermittent
Epi Curve: Common Source Outbreak with Intermittent Exposure
Pattern of Spread
Epi Curve: Common Source Outbreak with Continuous Exposure
Pattern of Spread
Epi Curve: Point Source Outbreak
Pattern of Spread
Epi Curve: Propagated Outbreak
Pattern of Spread
Epidemic Curves
Magnitude
Epidemic Curves: Time Trend
Provide information about the time trend of the outbreak
• Consider:
  – Date of illness onset for the first case
  – Date when the outbreak peaked
  – Date of illness onset for the last case
Epidemic Curves
Time Trend
Epidemic Curves: Incubation Period
• If the timing of the exposure is known, epi curves can be used to estimate the incubation period of the disease
• The time between the exposure and the peak of the epi curve approximates the median incubation period
Epidemic Curves: Incubation Period
• In common source outbreaks caused by an organism with a known incubation period, epi curves can help determine the probable period of exposure
  – Find the average incubation period for the organism and count backwards from the peak of the epi curve
Epidemic Curves
• The same can be done with the minimum incubation period
  – Find the minimum incubation period for the organism and count backwards from the earliest case on the epi curve
Exposure / Outbreak Incubation Period
• The two estimates (from the average and the minimum incubation periods) should be close; together they mark the probable period of exposure
• Widen the estimated exposure period by 10% to 20%
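A rough Python sketch of the counting-back procedure described above. It assumes onset times are already rounded to the plotting interval, so the peak is simply the most frequent onset time. It also assumes one interpretation of "widen by 10% to 20%": padding each end of the window by that fraction of the average incubation period. The onset times and incubation periods are hypothetical.

```python
from datetime import datetime, timedelta


def exposure_window(onsets, avg_incubation, min_incubation, widen=0.15):
    """Estimate a probable exposure window from illness onset times.

    1. Count back the average incubation period from the peak (most frequent) onset.
    2. Count back the minimum incubation period from the earliest onset.
    3. ASSUMPTION: "widen by 10% to 20%" is interpreted here as padding each end
       of the window by `widen` x the average incubation period.
    """
    counts = {}
    for t in onsets:
        counts[t] = counts.get(t, 0) + 1
    top = max(counts.values())
    peak = min(t for t, c in counts.items() if c == top)  # earliest of any tied peaks

    from_peak = peak - avg_incubation
    from_first = min(onsets) - min_incubation
    start, end = sorted([from_peak, from_first])

    pad = avg_incubation * widen
    return start - pad, end + pad


# Hypothetical onset times, already rounded to the hour.
onsets = [
    datetime(2024, 1, 1, 21),
    datetime(2024, 1, 1, 22),
    datetime(2024, 1, 1, 22),
    datetime(2024, 1, 1, 23),
]
start, end = exposure_window(onsets, timedelta(hours=4), timedelta(hours=2))
print(start, end)
```

If the two back-counted estimates are far apart, that is itself a signal to revisit the assumed pathogen or look for outliers.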
Calculating Incubation Period
Onset of illness among cases of E. coli O157:H7 Infection, Massachusetts, December, 1998.
Creating an Epidemic Curve
Provide a descriptive titleLabel each axisPlot the number of cases of disease
reported during an outbreak on the y-axisPlot the time or date of illness onset on the
x-axisInclude the pre-epidemic period to show
the baseline number of cases
[Figure: Epi Curve for a Common Source Outbreak with Continuous Exposure, with the y-axis and x-axis labeled]
Creating an Epidemic Curve: X-axis Considerations
Choice of time unit for the x-axis depends upon the incubation period
• Begin with a unit approximately one quarter the length of the incubation period
Example:
1. Mean incubation period for influenza = 36 hours
2. 36 × ¼ = 9
3. Use 9-hour intervals on the x-axis for an outbreak of influenza lasting several days
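The one-quarter rule and the binning behind an epi curve can be sketched in Python as below. The function names and onset times are hypothetical; the sketch only counts cases per interval and prints a text histogram, standing in for the graph an analysis package would draw.

```python
from collections import Counter
from datetime import datetime, timedelta


def suggested_interval(incubation: timedelta) -> timedelta:
    """Rule of thumb from the slide: roughly one quarter of the incubation period."""
    return incubation / 4


def bin_onsets(onsets, start, interval):
    """Return case counts per x-axis interval, beginning at `start`.

    Assumes every onset is at or after `start` (e.g., the start of the pre-epidemic period).
    """
    counts = Counter((t - start) // interval for t in onsets)  # timedelta // timedelta -> int
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


interval = suggested_interval(timedelta(hours=36))  # influenza example: 36 h / 4 = 9 h
start = datetime(2024, 1, 1, 0)
# Hypothetical onset times (hours after `start`: 2, 10, 11, 20, 30).
onsets = [start + timedelta(hours=h) for h in (2, 10, 11, 20, 30)]
bins = bin_onsets(onsets, start, interval)

for i, n in enumerate(bins):
    label = start + i * interval
    print(f"{label:%m/%d %H:%M}  {'#' * n}")
```

Re-running `bin_onsets` with a few different intervals is a quick way to follow the advice to graph several epi curves when the incubation period is not known.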
Creating an Epidemic Curve
X-axis considerations
• If the incubation period is not known, graph several epi curves with different time units
• Usually the day of illness onset is the best unit for the x-axis
Epi Curve X-Axis Considerations
[Figure: the same outbreak plotted with two x-axis units of time. One panel uses a unit of time = 1 week (Week of Onset: 10/1-10/7, 10/8-10/14, 10/15-10/21, 10/22-10/28; y-axis: # of Cases); the other uses a unit of time = 1 day.]
Descriptive Epidemiology
Place
Descriptive Epidemiology: Place
• Spot map
– Shows where cases live, work, spend time
– If population size varies between locations being compared, use location-specific attack rates instead of number of cases
Descriptive Epidemiology: Place
Source: http://www.phppo.cdc.gov/PHTN/catalog/pdf-file/LESSON4.pdf
Descriptive Epidemiology
Person
Descriptive Epidemiology: Person
Data summarization for descriptive epidemiology of the population:
• Line listings
• Graphs
  – Bar graphs
  – Histograms
Line Listing
Column groups: Signs/Symptoms (N, V, J); Lab (HAIgM); Demographics (Sex, Age)
Case # | Report Date | Onset Date | Physician Diagnosis | N | V | J | HAIgM | Sex | Age
1 | 10/12/02 | 10/5/02 | Hepatitis A | 1 | 1 | 1 | 1 | M | 37
2 | 10/12/02 | 10/4/02 | Hepatitis A | 1 | 0 | 1 | 1 | M | 62
3 | 10/13/02 | 10/4/02 | Hepatitis A | 1 | 0 | 1 | 1 | M | 38
4 | 10/13/02 | 10/9/02 | NA | 0 | 0 | 0 | NA | F | 44
5 | 10/15/02 | 10/13/02 | Hepatitis A | 1 | 1 | 0 | 1 | M | 17
6 | 10/16/02 | 10/6/02 | Hepatitis A | 0 | 0 | 1 | 1 | F | 43
Bar Graph
Descriptive Epidemiology
• Measures of central tendency
  – Mean
  – Median
  – Mode
• Measure of spread
  – Range
Measures of Central Tendency
Mean (Average): The sum of all values divided by the number of values
Example:
1. Cases aged 7, 10, 8, 5, 5, 37, and 9 years
2. Mean = (7+10+8+5+5+37+9) / 7 = 81 / 7
3. Mean = 11.6 years of age
Measures of Central Tendency
Median (50th percentile)
The value that falls in the middle position when the measurements are ordered from smallest to largest
Example:
1. Ages 7, 10, 8, 5, 5, 37, 9
2. Ages sorted: 5, 5, 7, 8, 9, 10, 37
3. Median age = 8
Calculate a Median Value
If the number of measurements is odd:
Median = value with rank (n + 1) / 2
Where n = the number of values
• 5, 5, 7, 8, 9, 10, 37
• n = 7; (n + 1) / 2 = (7 + 1) / 2 = 4
• The 4th value = 8
Calculate a Median Value
If the number of measurements is even:
Median = average of the two values with:
a. rank of n / 2 and
b. rank of (n / 2) + 1
Where n = the number of values
• Example: drop the 37-year-old from the ages above, leaving 5, 5, 7, 8, 9, 10
• n = 6; n / 2 = 3, so “7” (the 3rd value) is the first value
• (n / 2) + 1 = 4, so “8” (the 4th value) is the second value
• (7 + 8) / 2 = 7.5
• The median value = 7.5
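The mean and median rules above can be checked with a short Python sketch. The `median` function follows the rank-based rules on the slides (ranks counted from 1); Python's standard `statistics.median` is used only as a cross-check. The even-sized list 5, 5, 7, 8, 9, 10 is an illustrative input.

```python
import statistics


def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data, using the rank rules (ranks start at 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                # rank (n + 1) / 2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # ranks n / 2 and (n / 2) + 1


ages = [7, 10, 8, 5, 5, 37, 9]
print(round(mean(ages), 1), median(ages))
print(median([5, 5, 7, 8, 9, 10]))

# Cross-check against the standard library.
assert median(ages) == statistics.median(ages)
```

Note how the single 37-year-old pulls the mean (11.6) well above the median (8); this is why the median is often preferred for skewed data such as ages.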
Measures of Central Tendency: Mode [Modal Value]
• The value that occurs most frequently
  – Example: 5, 5, 7, 8, 9, 10, 37
  – Mode = 5
• It is possible to have more than one mode
  – Example: 5, 5, 7, 8, 10, 10, 37
  – Modes = 5 and 10
Measures of Central Tendency
Mode [Modal Value]:
The value for the variable in which the greatest frequency of records fall
Epi Info limitation: If multiple values share the same frequency that is also the highest frequency, Epi Info will identify only the first value it encounters as “Mode” as it scans the table in ascending order
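One way to avoid this limitation is to report every value tied for the highest frequency. This Python sketch does that with `collections.Counter`; `first_mode` is a hypothetical helper that imitates reporting only the first (lowest) tied value. Python 3.8+ also offers `statistics.multimode`, which returns the tied values in the order they are first encountered rather than sorted.

```python
from collections import Counter


def all_modes(values):
    """Return every value tied for the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def first_mode(values):
    """Report only the lowest tied value, imitating the behavior described above."""
    return all_modes(values)[0]


print(all_modes([5, 5, 7, 8, 9, 10, 37]))    # single mode
print(all_modes([5, 5, 7, 8, 10, 10, 37]))   # two modes
print(first_mode([5, 5, 7, 8, 10, 10, 37]))  # only one of them is reported
```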
Measures of Central Tendency: Mode Software Limitation
The ages 11, 17, 35, and 62 all qualify as the “mode,” but Epi Info identifies age 11 as the mode in the analysis output for MEANS AGE in viewOswego.
Modal Values
Measures of Central Tendency
[Figure: Epi Info MEANS AGE output with callouts: Min = 3, Max = 77, Mode = 11, Median (50th percentile) = 36.0, Mean (average) = 36.8]
Activity:Calculate Mean and Median
Completion time: 5 minutes
Calculate Mean and Median Age
Case # | Age (Years)
1 | 5
2 | 9
3 | 7
4 | 6
5 | 8
6 | 5
For an even number of measurements, Median = the average of the two values ranked:
a. n / 2
b. (n / 2) + 1
Calculate Mean and Median Age
Mean age:
• 5 + 9 + 7 + 6 + 8 + 5 = 40
• 40 / 6 = 6.67 years
Median age:
• Sorted: 5, 5, 6, 7, 8, 9
• Average of the values ranked n / 2 and (n / 2) + 1
• Ranks (6 / 2) = 3 and (6 / 2) + 1 = 4, i.e., the values 6 and 7
• (6 + 7) / 2 = 6.5 years
Question & Answer Opportunity
5 minute break
Attack Rates
Attack Rate (AR) = # of cases of a disease / # of people at risk (for a limited period of time)
Food-specific AR = # people who ate a food and became ill / # people who ate that food
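The attack rate formulas translate directly into code. In this Python sketch, `attack_rate` returns a percentage, and `food_specific_ars` computes the rates for people who did and did not eat an item from hypothetical person-level records (the field names `salad` and `ill` are illustrative). Applied to the green salad row of the table that follows, `attack_rate(42, 54)` rounds to 78% and `attack_rate(3, 21)` to 14%.

```python
def attack_rate(cases, at_risk):
    """Attack rate (%) = cases / people at risk x 100."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    return 100 * cases / at_risk


def food_specific_ars(records, food):
    """Return (AR among those who ate `food`, AR among those who did not)."""
    ate = [r for r in records if r[food]]
    did_not_eat = [r for r in records if not r[food]]
    return (
        attack_rate(sum(r["ill"] for r in ate), len(ate)),
        attack_rate(sum(r["ill"] for r in did_not_eat), len(did_not_eat)),
    )


# Hypothetical person-level records: True/False for ate the food and for illness.
records = [
    {"salad": True, "ill": True},
    {"salad": True, "ill": True},
    {"salad": True, "ill": False},
    {"salad": False, "ill": False},
]
ar_ate, ar_not = food_specific_ars(records, "salad")
print(f"Ate: {ar_ate:.1f}%  Did not eat: {ar_not:.1f}%")
```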
Food-Specific Attack Rates
CDC. Outbreak of foodborne streptococcal disease. MMWR 23:365, 1974.
Item | Consumed item: Ill | Total | AR(%) | Did not consume item: Ill | Total | AR(%)
Chicken | 12 | 46 | 26 | 17 | 29 | 59
Cake | 26 | 43 | 61 | 20 | 32 | 63
Water | 10 | 24 | 42 | 33 | 51 | 65
Green Salad | 42 | 54 | 78 | 3 | 21 | 14
Asparagus | 4 | 6 | 67 | 42 | 69 | 61
This food is probably not the source of infection
Stratified Attack Rates
Sex | Ill | Well | Total | AR(%)
Women | 13 | 16 | 29 | 45
Men | 5 | 27 | 32 | 16
Attack rate in women: 13 / 29 = 45%
Attack rate in men: 5 / 32 = 16%
Hypothesis Generation vs. Hypothesis Testing
Formulate hypotheses
  – Occurs after having spoken with some case-patients and public health officials
  – Based on information from the literature review
  – Based on descriptive epidemiology (step #3)
Test hypotheses
  – Occurs after hypotheses have been generated
  – Based on analytic epidemiology
Descriptive Epidemiology | Analytic Epidemiology
Search for clues | Clues available
Formulate hypotheses | Test hypotheses
No comparison group | Comparison group
Answers: How much, who, what, when, where | Answers: How, why
Analytic Epidemiology
Measures of Association
• Risk Ratio (cohort study)
• Odds Ratio (case-control study)
Cohort versus Case-Control Study
Analysis Output
Cohort Study
Measure of Association
Risk Ratio
          | Ill | Not Ill | Total
Exposed   | A   | B       | A+B
Unexposed | C   | D       | C+D
Risk Ratio = [A / (A+B)] / [C / (C+D)]
Risk Ratio Example
                            | Ill | Well | Total
Ate alfalfa sprouts         | 43  | 11   | 54
Did not eat alfalfa sprouts | 3   | 18   | 21
Total                       | 46  | 29   | 75
RR = (43 / 54) / (3 / 21) = 5.6
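The risk ratio formula can be written as a small Python function; the cell labels follow the 2 x 2 layout on the slide (A, B, C, D), and the call reproduces the alfalfa sprout example. This is a sketch for checking arithmetic, not a replacement for Epi Info or SAS output.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2 x 2 table laid out as on the slide.

    a = exposed, ill      b = exposed, not ill
    c = unexposed, ill    d = unexposed, not ill
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


rr = risk_ratio(43, 11, 3, 18)  # alfalfa sprout example
print(f"RR = {rr:.1f}")
```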
Interpreting a Risk Ratio
• RR=1.0 = no association between exposure and disease
• RR>1.0 = positive association
• RR<1.0 = negative association
Case-Control Study
Measure of Association
Odds Ratio
          | Cases | Controls
Exposed   | A     | B
Unexposed | C     | D
Odds Ratio = (A/C) / (B/D) = (A*D) / (B*C)
Odds Ratio Example
                            | Case | Control | Total
Ate at restaurant X         | 60   | 25      | 85
Did not eat at restaurant X | 18   | 55      | 73
Total                       | 78   | 80      | 158
OR = (60 / 18) / (25 / 55) = 7.3
Interpreting an Odds Ratio
The odds ratio is interpreted in the same way as a risk ratio:
• OR=1.0 = no association between exposure and disease
• OR>1.0 = positive association
• OR<1.0 = negative association
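A matching Python sketch for the odds ratio, using the cross-product form (A*D)/(B*C), which is algebraically the same as (A/C)/(B/D). The call reproduces the restaurant X example.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (A*D) / (B*C).

    a = exposed cases     b = exposed controls
    c = unexposed cases   d = unexposed controls
    """
    return (a * d) / (b * c)


or_ = odds_ratio(60, 25, 18, 55)  # restaurant X example
print(f"OR = {or_:.1f}")
```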
What to do with a Zero Cell
                            | Case | Control | Total
Ate at restaurant X         | 60   | 0       | 60
Did not eat at restaurant X | 18   | 55      | 73
Total                       | 78   | 55      | 133
• Try to recruit more study participants
• Add 1 to each cell*
*Remember to document / report this!
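The add-one fix can be sketched in Python as below. The function name is hypothetical, and it returns a flag so the correction can be documented as advised. Many texts instead add 0.5 to each cell (the Haldane correction); this sketch follows the add-1 rule given here.

```python
def odds_ratio_with_zero_cell_fix(a, b, c, d):
    """Return (odds ratio, corrected), adding 1 to every cell if any cell is 0."""
    corrected = 0 in (a, b, c, d)
    if corrected:
        a, b, c, d = a + 1, b + 1, c + 1, d + 1
    return (a * d) / (b * c), corrected


or_, corrected = odds_ratio_with_zero_cell_fix(60, 0, 18, 55)
# Report that the correction was applied.
print(f"OR = {or_:.1f} (1 added to each cell: {corrected})")
```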
Confidence Intervals
Confidence Intervals
• Allow the investigator to:
  – Evaluate statistical significance (a 95% CI that includes 1.0 is consistent with no association at the 0.05 significance level)
  – Assess the precision of the estimate (the odds ratio or risk ratio)
• Consist of a lower bound and an upper bound
  – Example: RR=1.9, 95% CI: 1.1-3.1
Confidence Intervals
• Provide information on the precision of the estimate
  – Narrow confidence intervals = more precise
  – Wide confidence intervals = less precise
• Example (wide, includes 1.0): OR=10, 95% CI: 0.9-44.0
• Example (narrow): OR=10, 95% CI: 9.0-11.0
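A common way to compute a 95% CI for a risk ratio is on the log scale (often called the Taylor series method). The Python sketch below assumes that method and that no cell is zero. Using the counts 43/11 (exposed ill/not ill) and 3/18 (unexposed ill/not ill), which are the vanilla ice cream counts in the case study, it gives roughly 1.9 to 16.0, the interval the case study reports. That agreement suggests, but does not confirm, that Epi Info's default uses this method.

```python
import math


def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with an approximate 95% CI on the log scale (Taylor series method).

    a, b = exposed ill / not ill; c, d = unexposed ill / not ill. All cells must be > 0.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio_ci(43, 11, 3, 18)
significant = lower > 1.0 or upper < 1.0  # the CI excludes 1.0
print(f"RR = {rr:.1f}, 95% CI: {lower:.1f}-{upper:.1f}, excludes 1.0: {significant}")
```

The width of the interval here (about 1.9 to 16.0) reflects the small unexposed group: only 3 ill people among 21 drive most of the standard error.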
Plan and Execute Additional Studies
• To gather more specific information
  – Example: Salmonella muenchen
• Intervention study
  – Example: implement intensive hand-washing
Question & Answer Opportunity
5 minute break
Epi Info Analysis
Case Study
Download Epi Info software for free at:
http://www.cdc.gov/epiinfo
Oswego Tutorial
1. Epi Info Main Menu
2. “Help”
3. “Tutorials”
4. “Oswego Tutorial”
Case Study Overview
• Oswego County, New York: 1940
• 80 people attended a church supper on April 18
• 46 people who attended the supper suffered from gastrointestinal illness beginning April 18 and ending April 19
• 75 people (ill and non-ill) interviewed
• Investigation focus: church supper as source of infection
Church Supper
• Supper held in the church basement.
• Foods contributed by numerous families.
• Supper from 6:00 PM to 11:00 PM, so food was consumed over a period of several hours.
Case Study: Descriptive Epidemiology
Investigators needed to determine:
a) The type of outbreak occurring;
b) The pathogen causing the acute gastrointestinal illness; and
c) The source of infection
Data Cleaning
Know your data! Know the:
• Number of records
• Field formats and contents
• Special properties
• Table relationships
Data Cleaning
Tell Epi Info which records to include in analyses
“Set” command in Analyze Data
Case Study: Line Listing
• Organize and review data about time, person, and place that were collected via hypothesis generating interviews.
Case Study: Line Listing
Code for generating output:
Line Listing Windows Commands
1. Read (viewOswego in Sample.MDB)
2. Sort (on AGE, in ascending order)
3. Select (only the cases where ILL=“Yes”)
4. List (generate a line listing with the fields AGE, SEX, and DATEONSET)
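The Read / Sort / Select / List sequence has a simple equivalent in general-purpose code. This Python sketch uses the field names from the slide (AGE, SEX, ILL, DATEONSET), but the records are hypothetical and illustrative, not the viewOswego data. Because selecting and sorting commute, the order of those two steps does not change the listing.

```python
# Hypothetical records using the field names from the slide; values are illustrative.
records = [
    {"AGE": 62, "SEX": "Male", "ILL": "Yes", "DATEONSET": "04/18/1940 11:00 PM"},
    {"AGE": 11, "SEX": "Female", "ILL": "No", "DATEONSET": None},
    {"AGE": 17, "SEX": "Female", "ILL": "Yes", "DATEONSET": "04/19/1940 12:30 AM"},
]


def line_listing(rows, fields):
    """Select ill cases, sort by AGE (ascending), and list only the requested fields."""
    ill = [row for row in rows if row["ILL"] == "Yes"]     # Select
    ill.sort(key=lambda row: row["AGE"])                  # Sort
    return [tuple(row[f] for f in fields) for row in ill]  # List


for line in line_listing(records, ["AGE", "SEX", "DATEONSET"]):
    print(*line, sep=" | ")
```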
Case Study: Means
Windows command:
Means (of AGE)
Distribution: Frequency by Gender
Windows command:
Frequencies (by SEX)
Case Study: Epidemic Curve
Variable of Interest:
DATEONSET (date of onset of illness)
– Entered into database mm/dd/yyyy/hh/mm/ss/AM PM
Case Study: Epidemic Curve
Point-Source Outbreak
[Figure panels: ‘Textbook’ distribution | Case study distribution]
Case Study: Epidemic Curve
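[Figure: case study epi curve annotated with the maximum incubation period, the average incubation period, the overlap between them, and a possible outlier (“Outlier?”)]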
Using Epi Info to Create Epi Curves
Step-by-Step Instructions
1. Open the Analyze Data component
2. Use the “Read” command to access your data table
3. Click on the “Graph” command
4. Choose “Histogram” as the “Graph Type”
5. Choose your date / time of illness onset variable as the x-axis main variable
6. Choose “count” from the “Show value of” option beneath the y-axis option
7. Choose weeks, days, hours, or minutes for the x-axis interval from the “interval” dropdown menu
8. Type in the graph title where it says “Page title”
9. Click “OK”
Determine Incubation Period
Alternative: Create a temporary variable called “Incubation” in Analyze Data:
INCUBATION = DATEONSET - TIMESUPPER
where both fields have the identical format:
Date / time: mm/dd/yyyy/hh/mm/ss/AM PM
Means INCUBATION
Analysis Output
Calculate Mean Incubation in Epi Info
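The INCUBATION calculation can be sketched in Python with `datetime` subtraction, which yields a `timedelta` that is converted to hours for each person before averaging. The three supper and onset times are hypothetical, chosen only to show the arithmetic.

```python
from datetime import datetime
from statistics import mean

# Hypothetical (TIMESUPPER, DATEONSET) pairs; real values would come from the data table.
records = [
    (datetime(1940, 4, 18, 19, 0), datetime(1940, 4, 18, 22, 0)),
    (datetime(1940, 4, 18, 20, 0), datetime(1940, 4, 19, 0, 0)),
    (datetime(1940, 4, 18, 21, 30), datetime(1940, 4, 19, 2, 30)),
]

# INCUBATION = DATEONSET - TIMESUPPER, expressed in hours for each person.
incubation_hours = [
    (onset - supper).total_seconds() / 3600 for supper, onset in records
]
print(incubation_hours, f"mean = {mean(incubation_hours):.1f} h")
```

Computing the difference per person, rather than from a single supper time, matters here because food was eaten over several hours.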
Identify the Pathogen. . .
CDC’s Foodborne Outbreak Response and Surveillance Unit
“Guide to Confirming the Diagnosis in Foodborne Diseases”
http://www.cdc.gov/foodborneoutbreaks/guide_fd.htm
Case Study: Attack Rates
Obtain the information that you need to calculate food-specific attack rates via:
A. Stratified Frequency Tables
B. Line Listings
C. 2 x 2 Tables
Food-specific AR = # people who ate a food and became ill / # people who ate that food
Stratified Frequency Tables
AR for people who consumed cake: 27 / 40 = 67.5%
40 people ate cake; 27 people who ate cake are ill.
AR for people who did not consume cake: 19 / 35 = 54.3%
35 people did not eat cake; 19 of those people are ill.
Frequencies CAKES; Stratify by ILL
Line Listings
13 + 27 = 40 people ate cakes; 27 people who ate cake are ill
AR for people who consumed cake: 27 / 40 = 67.5%
[Figure: line listing of cake eaters grouped into Not Ill and Ill]
Tables Analysis Output
Windows command: Tables (Exposure = CAKES; Outcome = ILL)
2 x 2 Table
Activity: Interpreting Output
What percentage of people who ate cake did not get ill?
Activity: Interpreting Output
Answer: 32.5% (13 / 40) of the people who ate cake did not get ill.
Case Study Attack Rates
Item | Consumed item: Ill | Total | AR(%) | Did not consume item: Ill | Total | AR(%)
Baked Ham | 29 | 46 | 63% | 17 | 29 | 59%
Cabbage Salad | 18 | 28 | 64% | 28 | 47 | 60%
Cakes | 27 | 40 | 68% | 19 | 35 | 54%
Chocolate Ice Cream | 25 | 47 | 53% | 20 | 27 | 74%
Vanilla Ice Cream | 43 | 54 | 80% | 3 | 21 | 14%
We should further investigate the association of vanilla ice cream consumption and illness
Generate and Test a Hypothesis!
• The epi curve is indicative of a Point-Source outbreak
• Based on the incubation period, we suspect Staphylococcus aureus as the pathogen
• The food-specific attack rates lead us to believe that vanilla ice cream may be the source of infection
Case Study
Tables Analysis Output
2 x 2 Table Shell | Epi Info 2 x 2 Table
Windows command: Tables (for VANILLA)
Tables Analysis Output
“The risk of becoming ill was more than five times greater for people who consumed vanilla ice cream than for people who did not consume vanilla ice cream.”
Case Study: Analytic Results
- Point-Source Outbreak
- Staphylococcus aureus suspected pathogen based on 4.3 hr average incubation period
- Vanilla ice cream suspected source of infection (highest food-specific AR of 80%)
- Vanilla ice cream RR = 5.6
- Vanilla ice cream 95% CI = 1.9-16.0
Online Epi Info Instruction
http://www.sph.unc.edu/nccphp/training/all_trainings/at_epi_info.htm
8 Self-Instructional Training Modules for various screen components, functions, and commands in Analyze Data
Question & Answer Opportunity
Next Session December 1st, 1:00 p.m. – 3:00 p.m.
Topic: “Writing and Reviewing Epidemiological Literature”
Session VI Summary
Analysis planning can: be an invaluable investment of time; help you select the most appropriate epidemiologic methods; and help assure that the work leading up to analysis yields a database structure and content that your preferred analysis software needs to successfully run analysis programs.
As you plan your analysis: 1) Work backwards from the research question(s) to design the most efficient data collection instrument; 2) Consider your study design to guide which statistical tests and measures of association you evaluate in the analysis output; and 3) Consider the need to present, graph, or map data.
Session VI Summary
Descriptive epidemiology: 1) Familiarizes the investigator with data about time, place, and person; 2) Comprehensively describes the outbreak; and 3) Is essential for hypothesis generation.
Data cleaning is the first step in preparing to generate descriptive statistics, as it contributes to the accuracy and completeness of the data.
Measures of central tendency (mean, median, and mode), together with the range, provide a means of assessing the distribution of data.
Epi curves, spot maps, and line listings are ways to generate and review, respectively, the time, place, and person elements of descriptive epidemiology.
Session VI Summary
Attack rates are descriptive statistics that are useful for comparing the risk of disease in groups with different exposures (such as consumption of individual food items).
Analytic epidemiology allows you to test the hypotheses generated via review of descriptive statistics and the medical literature.
The measures of association for case-control and cohort analytic studies are, respectively, odds ratios and risk ratios.
Confidence intervals that accompany measures of association allow you to evaluate the statistical significance of the measures and assess the precision of the estimates.
References and Resources
Centers for Disease Control and Prevention. (1992). Principles of Epidemiology (2nd ed.). Atlanta, GA: Public Health Practice Program Office.
Division of Public Health Surveillance and Informatics, Epidemiology Program Office, Centers for Disease Control and Prevention. (January 2003). Epi Info Support Manual. [Included with installation of the software, available at: http://www.cdc.gov/epiinfo/index.htm]
Gordis, L. (1996). Epidemiology. Philadelphia: WB Saunders.
Rothman, K.J. (2002). Epidemiology: An Introduction. New York: Oxford University Press.
Stehr-Green, J., and Stehr-Green, P. (2004). Hypothesis Generating Interviews. Module 3 of a Field Epidemiology Methods course being developed at the NC Center for Public Health Preparedness, UNC Chapel Hill.
Torok, M. (2004). “Epidemic Curves.” FOCUS on Field Epidemiology, Volume 1, Issue 5. NC Center for Public Health Preparedness.