Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza...
-
Upload
danny-stanard -
Category
Documents
-
view
219 -
download
3
Transcript of Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza...
![Page 1: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/1.jpg)
Preparing Data for Analysis
National Center for Immunization & Respiratory Diseases
Influenza Division
Nishan Ahmed
Regional Training Workshop on Influenza Data Management
Phnom Penh, Cambodia
July 27 – August 2, 2013
![Page 2: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/2.jpg)
• Check for accuracy of observations and correct or eliminate inaccuracies– Important for both simple and complex data
• Questions to ask:– Are values outside of what you would normally
observe?– If yes, are values due to inaccuracies in the data
or to real changes in activity (i.e. an outbreak, start of influenza season)
• Values can be inaccurate due to many factors• Data Entry mistake• Incorrect measurement at site• Incorrect analysis
Data Cleaning: What is it?
![Page 3: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/3.jpg)
• To prepare your data for regular analysis– Steps:
• Prepare a copy for temporary cleaning, but also clean the original data source as corrections are validated
• If data is not cleaned at source, cleaning will need to be done each time analysis is attempted (i.e. records can be temporarily deleted until verified or corrected)
• To finalize a dataset for future analysis/create a clean copy to be used for research– Typically a more thorough process than
cleaning during a flu season
Data Cleaning: Why do it?
![Page 4: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/4.jpg)
• To check for validity and consistency of reported variables– Ensures that the data collected makes
sense• Examples:
– # of ILI cases is not greater than the # of patient visits
– The date of onset is before data of death– Only enrolled sites should be reporting &
included in analysis of sentinel data
• To check for data outliers– A facility that normally sees ~100 patient
visits will probably not see 1,000 patients during a week
• To identify and remove duplicate records
Data Cleaning: Why do it?
![Page 5: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/5.jpg)
• How do you find data that has problems?– Eyeball method– Through quick, simple data queries
• Access or Excel queries as you go– Statistical methods – Through pre-programmed automated
processes• Used for elements that are routinely cleaned• Example: Automated process for deleting
duplicate records
Methods to identify problems
![Page 6: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/6.jpg)
Eyeball Method
![Page 7: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/7.jpg)
• To find duplicate records, using Access
Quick and Simple Queries
![Page 8: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/8.jpg)
• To check validity of variables
Quick and Simple Queries
![Page 9: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/9.jpg)
Automated Processes: Duplicates
![Page 10: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/10.jpg)
• Measures of Center– Mean: Sum of the observations divided
by the number of observations.– Median: The middle value in an ordered
list– Mode: The most frequently occurring
value
Basic Statistic Measures
Measures of Variation or Spread
Standard Deviation: measures variation by indication how far, on average, the observations are from the mean
![Page 11: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/11.jpg)
Equations in ExcelMean Median Standard Deviation
![Page 12: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/12.jpg)
• Example: Checking for outliers– The US ILI system uses a statistical
process to check for outliers:• Look at # of patient visits over time from a
given provider• That # should be consistent within a certain
degree of change (i.e. 4 standard deviations from the mean)
• All values above or below this value are selected and checked manually to verify whether or not the values are reasonable and make sense.
Data Cleaning Processes
![Page 13: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/13.jpg)
Data Outliers in Excel
![Page 14: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/14.jpg)
Data Cleaning
01002: Data could not be disproved, left in.
04099: Fixed data based on returned workfolder
04108: Data looked OK to surveillance staff, this was the peak of pandemic, and we would have expected numbers to be high
![Page 15: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/15.jpg)
• List of errors found during the cleaning process
• Helps to keep track of changes made to records during the cleaning process.– Keep track of how the data has changed
over time– Used for follow-up on questions to sites
• May be manual or automated– Based on needs of the data
Error Logs
![Page 16: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/16.jpg)
Example of Error Log
DateStat
eSpecime
n IDPatient
ID FieldPrior Value
Current Value
Reason for Change
Your Initial
s Comments
2/9/11 MDA110091
9399573
1SPECIME
N idA1100919
3A1100919
3bcoinfection H3
and 2009 H1N1 AB
changed one specimen id to 'b' so would be
coded as two separate viruses
2/9/11 MDA110053
7099166
9SPECIME
N idA1100537
0A1100537
0bcoinfection H3
and 2009 H1N1 AB
changed one specimen id to 'b' so would be
coded as two separate viruses
2/9/11 MDA110109
9199761
1SPECIME
N idA1101099
1A1101099
1Bcoinfection B
and 2009 H1N1 AB
changed one specimen id to 'b' so would be
coded as two separate viruses
2/9/11 SDM11VR00
083038818
2SPECIME
N idM11VR00
0830
M11VR000830 (a, b,
c)coinfection 2009 H1N1, H3, and B AB
changed one specimen id to 'b' so would be
coded as two separate viruses
![Page 17: Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.](https://reader035.fdocuments.us/reader035/viewer/2022062421/56649cb05503460f949752c7/html5/thumbnails/17.jpg)
• Preparing data for analysis includes finding and cleaning as many data errors as possible– Statistical methods, the eyeball method,
and simple queries can all be used to find potential data errors
• Data cleaning is important because data errors could alter the interpretation of data (i.e. could cause a perceived increase without a true increase in disease activity)
• Error logs are useful in accounting for errors and how they were dealt with
Conclusions