Post on 14-Dec-2015
Introduction to Secondary Data Analysis
Young Ik Cho, PhD
Research Associate Professor
Survey Research Laboratory
University of Illinois at Chicago
Fall, 2009
What is secondary data?
• Data collected by a person or organization other than the users of the data
Advantages of Secondary Data
• Unobtrusive
• Fast & inexpensive
• Avoid data collection problems
• Provide bases for comparison
Disadvantages of Secondary Data
• Data availability
• Level of observation
• Quality of documentation
• Data quality control
• Outdated data
Data Sources
Inter-university Consortium for Political and Social Research (ICPSR) http://www.icpsr.umich.edu/icpsrweb/ICPSR/
National Center for Health Statistics (NCHS) http://www.cdc.gov/nchs/surveys.htm
Centers for Medicare & Medicaid Services (CMS) http://www.cms.hhs.gov/home/rsds.asp
US Census Bureau http://www.census.gov/main/www/access.html
Data Sources (cont.)
Examples of Directly Downloadable Data from NCHS:
• National Health and Nutrition Examination Survey (NHANES)
• National Ambulatory Medical Care Survey (NAMCS)
• National Hospital Ambulatory Medical Care Survey (NHAMCS)
• National Hospital Discharge Survey (NHDS)
• National Home and Hospice Care Survey (NHHCS)
• National Nursing Home Survey (NNHS)
• National Survey of Ambulatory Surgery (NSAS)
• National Employer Health Insurance Survey (NEHIS)
• National Vital Statistics System (NVSS)
• National Health Interview Survey (NHIS)
Data Sources (cont.)
Data Available for Use with Survey Documentation and Analysis (SDA): http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/sda.jsp
Aging Data: National Archive of Computerized Data on Aging (NACDA) http://www.icpsr.umich.edu/NACDA/
Holds about 160 survey data sets, including:
• Longitudinal Study of Aging, 70 Years and Older, 1984-1990
• National Survey of Self-Care and Aging: Follow-Up, 1994
• National Health and Nutrition Examination Survey II: Mortality Study, 1992
• National Hospital Discharge Survey, 1994-1997
• National Health Interview Survey, 1994, Second Supplement on Aging
Data Sources (cont.)
SDA (continued):
Substance Abuse Data: Substance Abuse and Mental Health Data Archive (http://www.icpsr.umich.edu/SAMHDA/)
• Drug Abuse Warning Network
• Monitoring the Future
• National Household Survey on Drug Abuse
• National Pregnancy and Health Survey
• National Treatment Improvement Evaluation Study
• Treatment Episode Data Set
• Uniform Facility Data Set
Data Sources (cont.)
SDA (continued):
Criminal Justice Data: National Archive of Criminal Justice Data (NACJD) (http://www.icpsr.umich.edu/NACJD/)
• International Crime Data
• Homicide Data
• National Crime Victimization Survey Data
• Corrections Data
Evaluation of Data Sources
• Purpose of the study
• Sponsor/collector of the data
• Mode of data collection
• Sampling procedures
• Consistency of data with other sources
Evaluation of Data Sources (cont.)
• Documentation
• Number of observations
• Number of variables
• Coding scheme
• Summary statistics
Types of Survey Sample Design
• Simple Random Sampling
• Systematic Sampling
• Complex sample designs
▪ stratified designs
▪ cluster designs
▪ mixed mode designs
Types of Survey Sample Design
• Simple Random Sampling
▪ Each member of the population has an equal and known chance of being selected
▪ Simple Random Sample With Replacement (SRSWR)
▪ Simple Random Sample Without Replacement (SRSWOR)
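The difference between SRSWR and SRSWOR can be sketched in a few lines of Python. This is only an illustration: the frame of 10 unit IDs, the sample size of 4, and the seed are made-up assumptions, not values from any survey.

```python
import random

# Hypothetical sampling frame of N = 10 unit IDs (illustrative only).
frame = list(range(1, 11))
n = 4
rng = random.Random(2009)  # fixed seed so the draws are repeatable

# SRSWOR: a unit can be selected at most once.
srswor = rng.sample(frame, n)

# SRSWR: every draw is made from the full frame, so repeats are possible.
srswr = rng.choices(frame, k=n)

print("SRSWOR:", srswor)
print("SRSWR: ", srswr)
```

In both cases every unit has the same known chance of selection on each draw; only the treatment of already-selected units differs.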
Types of Survey Sample Design
• Systematic Random Sampling
▪ The selection of every kth element from a sampling frame, starting from a random point in the first interval, with sampling interval k = N/n
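A minimal Python sketch of systematic selection follows. It assumes a hypothetical frame of N = 100 units and, for simplicity, that N is an exact multiple of n; the function name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(frame, n, rng):
    """Select every kth element after a random start, with k = N // n."""
    k = len(frame) // n        # sampling interval (assumes N is a multiple of n)
    start = rng.randrange(k)   # random start within the first interval
    return frame[start::k][:n]

frame = list(range(1, 101))    # hypothetical frame, N = 100
sample = systematic_sample(frame, n=10, rng=random.Random(1))
print(sample)
```

With N = 100 and n = 10 the interval is k = 10, so consecutive selected units are always 10 positions apart.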
Types of Survey Sample Design
• Stratified sample
▪ The population is first divided into non-overlapping subpopulations (strata), such as gender, race or SES.
▪ A sample is then drawn from each stratum.
▪ Works most effectively when the variance is smaller within the strata than in the sample as a whole.
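As a rough sketch, stratified selection amounts to splitting the frame by stratum and drawing a separate sample within each one. The population of 60 women and 40 men, the 10% sampling fraction, and the helper name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw an SRSWOR of the given fraction within each stratum."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)   # non-overlapping groups
    sample = {}
    for name, members in strata.items():
        n_h = max(1, round(fraction * len(members)))  # stratum sample size
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical population: 60 women ("F") and 40 men ("M").
population = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
result = stratified_sample(population, lambda u: u[0], 0.1, random.Random(7))
print({name: len(units) for name, units in result.items()})
```

Using the same fraction in every stratum is proportional allocation, one of several possible allocation choices.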
Types of Survey Sample Design
• Cluster sample
▪ Elements are selected in groups or clusters.
▪ PSU (Primary Sampling Unit): the first unit that is sampled in the design. For example, school districts from Chicago may be sampled, and then schools within districts may be sampled.
▪ Homogeneity within clusters is measured by the Intracluster Correlation Coefficient (ICC).
Why complex survey design?
• Increased efficiency
• Decreased costs
Sample Weights
• Selection weight: the inverse of the probability of selection, used to adjust for differing probabilities of selection (= N/n under simple random sampling).
• In theory, simple random samples are self-weighted
• In practice, simple random samples are likely to also require adjustments for non-response
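The selection weight can be illustrated with a small Python sketch. The two strata and their counts below are invented for the example; the point is only that units sampled at a lower rate receive a larger weight.

```python
# Hypothetical design: two strata sampled at different rates.
design = {
    "urban": {"N": 8000, "n": 400},
    "rural": {"N": 2000, "n": 400},
}

for name, d in design.items():
    d["prob"] = d["n"] / d["N"]    # probability of selection
    d["weight"] = d["N"] / d["n"]  # selection weight = 1 / probability = N / n
    print(name, d["prob"], d["weight"])

# Summing the weights over all sampled units recovers the population size.
total = sum(d["n"] * d["weight"] for d in design.values())
print(total)
```

Each sampled urban unit stands for 20 population members and each rural unit for 5, so an unweighted analysis would over-represent rural units.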
Types of Sample Weights
• Post-stratification weights: designed to bring the sample proportions in demographic subgroups into agreement with the population proportions in those subgroups.
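One simple way to compute such a weight is to divide each subgroup's population proportion by its sample proportion. The sketch below uses made-up sample counts and population shares purely for illustration.

```python
# Hypothetical sample counts and population proportions by subgroup.
sample_counts = {"male": 40, "female": 60}
population_share = {"male": 0.49, "female": 0.51}

n = sum(sample_counts.values())

# Post-stratification weight = population proportion / sample proportion.
ps_weight = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

# After weighting, the sample proportions match the population proportions.
weighted_total = sum(sample_counts[g] * ps_weight[g] for g in sample_counts)
weighted_share = {g: sample_counts[g] * ps_weight[g] / weighted_total for g in sample_counts}
print(ps_weight)
print(weighted_share)
```

Here men are under-represented in the sample (40% versus 49%), so their weight is above 1, while the weight for women is below 1.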
Types of Sample Weights (cont.)
• Non-response weights: designed to inflate the weights of survey respondents to compensate for nonrespondents with similar characteristics.
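A common form of this adjustment, sketched here under assumptions, divides the base weight by the response rate within a weighting class. The classes, counts, and base weights are hypothetical.

```python
# Hypothetical weighting classes with sampled and responding counts.
classes = {
    "age 18-44": {"sampled": 200, "responded": 100, "base_weight": 50.0},
    "age 45+":   {"sampled": 200, "responded": 160, "base_weight": 50.0},
}

for name, c in classes.items():
    # Dividing by the response rate is the same as multiplying by sampled / responded.
    c["adjusted_weight"] = c["base_weight"] * c["sampled"] / c["responded"]
    print(name, c["adjusted_weight"])
```

Within each class the respondents' adjusted weights sum to the same total as the base weights of everyone sampled, so respondents stand in for similar nonrespondents.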
Types of Sample Weights (cont.)
• “Blow-up” (expansion) weights: provide estimates for the total population of interest
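As an illustrative sketch, an expansion weight says how many population members a respondent represents, so weighted sums estimate population totals. The four respondents and their weights below are invented.

```python
# Hypothetical respondents: (expansion weight, 1 if the respondent has the condition).
respondents = [(1200.0, 1), (800.0, 0), (1500.0, 1), (500.0, 0)]

# Sum of weights estimates the size of the population of interest.
population_estimate = sum(w for w, _ in respondents)

# Weighted sum of the indicator estimates the number of people with the condition.
total_with_condition = sum(w * y for w, y in respondents)

print(population_estimate, total_with_condition)
```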
Types of Sample Weights (cont.)
• Replicate weights: a series of weight variables that are used instead of PSUs and strata in an effort to protect the respondents' identity. Both the selection weight and the replicate weights must be used to calculate the point estimate and its standard error correctly.
Complex Survey Design Effect
• Complex designs with clustering and unequal selection probabilities generally increase the sampling variance.
• Not accounting for the impact of complex sample design can lead to biased point estimates and understated standard errors.
Complex Survey Design Effect
• The ratio of the design-based variance to the SRS variance of an estimate, that is, the square of the ratio of the standard errors:
• Deff = Var(des)/Var(srs) = [SE(des)/SE(srs)]²
• Deff = 1 + ρ(n - 1), where ρ is the intraclass correlation and n is the number of elements in each cluster.
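The cluster formula can be worked through in a short Python sketch. It assumes that Deff is taken as a ratio of variances, so standard errors scale by its square root; the values of ρ, the cluster size, the SRS standard error, and the total sample size are all hypothetical.

```python
import math

def design_effect(rho, cluster_size):
    """Deff = 1 + rho * (n - 1) for clusters of equal size n."""
    return 1 + rho * (cluster_size - 1)

# Hypothetical values: intraclass correlation 0.05, 21 elements per cluster.
deff = design_effect(rho=0.05, cluster_size=21)

srs_se = 1.5                              # standard error computed as if SRS (made up)
design_se = srs_se * math.sqrt(deff)      # assumes Deff is a variance ratio
effective_n = 2100 / deff                 # effective sample size for 2,100 respondents

print(deff, design_se, effective_n)
```

Even a small intraclass correlation roughly doubles the variance here, which is why a clustered sample of 2,100 carries about as much information as an SRS of about half that size.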
How can we adjust for the design effects?
• Find variables identifying the primary sampling units (psu), the strata, and the weight(s).
• Use appropriate software to adjust for the design effect.
Syntax Examples of Design-based Analysis in SAS, STATA & SUDAAN
SAS
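/* Design-based linear regression */
proc surveyreg data=nhanes;
  strata strata;                  /* stratum identifier */
  cluster psu;                    /* primary sampling unit */
  class sex race;                 /* categorical predictors */
  model fatintk = age sex race;
  weight finalwt;                 /* final sampling weight */
run;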
STATA
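* Declare the design: PSU, sampling weight and strata (syntax for Stata 9 and later)
svyset psu [pweight=finalwt], strata(strata)
* Design-based linear regression
svy: regress fatintk age male black hispanic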
Syntax Examples of Design-based Analysis in STATA, SUDAAN & SAS
SUDAAN
proc regress data="c:\nhanes.sav" filetype=spss design=wr;
nest strata psu;
weight finalwt;
subgroup sex race;
levels 2 3;
model fatintk = age sex race;