Complex Survey Analysis
2010 Workshop of the Association of Public Health Epidemiologists of Ontario
Toronto, Ontario
September 20-21, 2010
Susan Bondy, PhD
Dalla Lana School of Public Health
Source: core.apheo.ca/resources/events/2010/Session2C - Bondy.pdf

Outline
• Survey analysis in health context
• Review of survey samples
  – Complex design elements
  – Issues and implications
• Working with software
• Tips / Q&A (all around)

What we report from surveys
• Descriptive statistics
  – Means and rates (e.g., % prevalence)
  – TOTALS
• Measures of difference, association and effect
  – % diff, risk diff, OR, RR, rho, etc.
  – These test hypotheses
• Always reported with expression of variance
  – Margin of Error (MOE or +/- part)
  – Confidence intervals

Analytic concerns (health surveys)
• Representativeness, ‘representivity’
• Reliability / precision
  – Impact of design elements on precision
• Privacy and confidentiality

Understanding Complex Samples
You will need to understand the jargon to use the software.

Simple Random Sample
• Selection is entirely at random
• Everyone has same selection probability
  – No unequal or over-sampling
  – No stratification
  – Independent selection; not in groups
• Self-weighting (no probability weights)
• Theoretically “With Replacement”
• Statistically efficient
  – But field costs might be a killer
  – Rare in multi-agenda public health surveys

Strata
• Mutually-exclusive categories (layers)
• COMPREHENSIVE (add up to whole pop, or universe)
• These are NOT sampled
• Sampling (of some other unit) is done WITHIN these layers

Strata - examples
• All of Ontario
  – Samples of households within EACH LHIN (Local Health Integration Network)
  – LHIN is the stratum
• All school boards
  – Sample of classrooms within each board
  – Board is the stratum
• All ages
  – Sample separately for different age groups
  – Age group is the stratum

Strata vs. clusters – tougher examples
• E.g., health services research
  – 7 clinics offer ALL care for Ontario
  – Sampling done within each of the 7 centres
  – Data used to describe all Ontario care
• Clusters or strata?

A: Strata
• Because:
  – They add up to the universe to be described (“comprehensive”)
  – Not selected at random
  – Layering fixed by design

Forms of stratification
• EXplicit stratification
  – Also known as “over-sampling”
  – For planned “Domain analysis”
    • E.g., LHIN-specific results
• Example
  – LHINs not equal in true population
  – Samples equal (for same precision in each)
  – Higher sampling fraction in smaller LHINs
• Creates need for sampling weights

Example
Ontario, SRS. Ends up with:
• n=3000 in Toronto
  – 1000 would have been plenty
  – Wasted money
• n=300 in North
  – Poor estimates
  – Suppressed data
  – Wasted money
Ontario, equal regional samples. On purpose:
• 1000 in Toronto
• 1000 in North
• Good, usable data
• Cost-efficient

Forms of stratification
• IMplicit stratification
  – Sampling again specific to each layer, but
  – Now the sampling is done to KEEP the sample structured like the population
  – Sampling with Probability Proportionate to Size (“PPS”)
    • Reduces need for sampling weights!
  – Done to avoid a ‘bad sample’

“Bad sample?”
• A good sample:
  – Has the same distribution of characteristics as the real population
    • E.g., same proportions by age and sex
• Large enough samples are good ‘on average’
• But random is random, and you will get weird samples

Risk for ‘bad samples’
• Imagine a survey of babies
  – Triplets+ rare
  – VERY high rates of bad outcome
  – So, number of multiples will drive this year’s estimates
• SRS 1 – no triplets+
  – Low death rate estimated
• SRS 2 – accidentally 3 times the usual number of triplets+
  – High death rate estimated
• Net effect?
  – High sample-to-sample random error

Implicit stratification
• FORCE a few multiples into the sample
  – Same small % as actual pop
  – Too few for a specific report
  – BUT, total survey less prone to random error, year over year
PRINCIPLE:
• Find factors strongly associated with outcome
• Force this into design and analysis to gain precision
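
The principle above can be checked with a small simulation. This is only a sketch under invented assumptions: the population, the size of the high-risk group (5%) and both death rates are deliberately exaggerated illustrative values, not real perinatal data. It compares repeated simple random samples with samples that force the high-risk group in at its population share.

```python
import random
import statistics

random.seed(2010)

# Hypothetical population: a small high-risk group and a large low-risk group.
# All sizes and rates are illustrative assumptions.
N_HIGH, N_LOW = 5_000, 95_000
P_DEATH_HIGH, P_DEATH_LOW = 0.50, 0.005

high = [1 if random.random() < P_DEATH_HIGH else 0 for _ in range(N_HIGH)]
low = [1 if random.random() < P_DEATH_LOW else 0 for _ in range(N_LOW)]
everyone = high + low

n = 2_000
n_high = n * N_HIGH // (N_HIGH + N_LOW)  # same share as the population
n_low = n - n_high

def srs_estimate():
    """Death rate from one simple random sample of the whole population."""
    return statistics.mean(random.sample(everyone, n))

def stratified_estimate():
    """Death rate when the high-risk share is fixed by design."""
    drawn = random.sample(high, n_high) + random.sample(low, n_low)
    return statistics.mean(drawn)

REPEATS = 500
sd_srs = statistics.stdev(srs_estimate() for _ in range(REPEATS))
sd_strat = statistics.stdev(stratified_estimate() for _ in range(REPEATS))

print(f"SD of estimates, SRS:        {sd_srs:.5f}")
print(f"SD of estimates, stratified: {sd_strat:.5f}")
```

Because the stratified draw is proportionate, the plain sample mean needs no weights here; the gain is only in sample-to-sample stability.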

Another bad sample – sampling natural groups of unequal size
• Example: 346 Municipalities in Ontario
  – One Toronto
  – 8 Cities > 200,000 population
  – 337 small centres
  – Systematic size bias:
    • Geo/politics
    • Smaller areas of governance where pop is spread thin
• Choosing SRS of communities would create a disproportionately rural sample!
• So selection is PPS (proportionate to size)
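
One common way to select with probability proportionate to size is systematic PPS: lay the units end to end along a cumulative population line and take hits at a fixed interval from a random start. The sketch below assumes that method and uses made-up municipality sizes (only the counts 1, 8 and 337 come from the slide); `pps_systematic` is a hypothetical helper name.

```python
import bisect
import itertools
import random

def pps_systematic(sizes, n, rng):
    """Pick n units with probability proportional to size (systematic PPS).

    sizes: dict mapping unit name -> population size.
    A unit larger than the sampling interval can be selected more than once.
    """
    names = list(sizes)
    cumulative = list(itertools.accumulate(sizes[name] for name in names))
    interval = cumulative[-1] / n
    start = rng.random() * interval
    picks = []
    for k in range(n):
        # Index of the first unit whose cumulative size exceeds the target.
        idx = bisect.bisect_right(cumulative, start + k * interval)
        picks.append(names[min(idx, len(names) - 1)])
    return picks

# Illustrative sizes only; not real census figures.
sizes = {"Toronto": 2_500_000}
sizes.update({f"City {i}": 400_000 for i in range(1, 9)})
sizes.update({f"Small {i}": 15_000 for i in range(1, 338)})

picks = pps_systematic(sizes, 20, random.Random(1))
n_small = sum(name.startswith("Small") for name in picks)
print(f"Toronto hits: {picks.count('Toronto')}, small centres: {n_small} of 20")
```

With these invented sizes, small centres hold a little under half the population and get about that share of the hits, whereas an SRS of municipalities would be expected to draw small centres 337 times out of 346.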

“Group” sampling
• E.g., people by FAMILY, students by CLASS, teeth by MOUTH, babies including TWINS, etc.
• Common in health studies
  – Population and clinic-based surveys
  – Also experimental designs
  – May be used naïvely
• Used because of relative cost-efficiency

“PSU” or “cluster”?
Classic WHO household surveys
• One country is divided into thousands of PSUs, close to equal population size
  – E.g., communities or parts of large cities
• Stage 1: 50 to 100 of these “PSU” sampled
  – Note the large number!
• Stage 2: Sub-regions sampled within each
  – Now ‘SSU’, often called “Clusters”
• Stage 3: Households within SSUs

Better jargon
• Primary Sampling Unit (PSU)
• Secondary Sampling Unit (SSU)
• Tertiary … you get the idea
• For most complex-survey software, it is ideal to understand each stage:
  – Element sampled, sampling method and fraction

Stratum or cluster?
• 7 hospitals agreed to take part in some project; not at random, out of say 24
• Discuss…

Analysis

Analysis 1: Preparation of the data set
• Field staff have to finalize the dataset
• Documenting numbers
  – Complete observations
  – Final dispositions
  – Response and participation rates
• Data cleaning and documentation
  – Acting on skip patterns, etc.

Preparation of sampling weights
• “Sampling” or “Stratification” weights
  – These undo the effects of oversampling
  – Calculated by figuring out:
    • The true proportions in categories used in sampling (known for pop and sample, before selection)
    • The raw proportions in the sample
  – Use weights to make sample apply to pop
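
As a rough illustration of the calculation, the sketch below builds design weights as population count divided by sample count in each stratum, which is the same ratio of true to raw proportions up to a constant scale. The stratum names echo the earlier Toronto/North example, but every count is an invented assumption.

```python
# Hypothetical stratum sizes: population counts (N_h) and completed interviews (n_h).
population = {"Toronto": 2_500_000, "North": 500_000}
sample = {"Toronto": 1000, "North": 1000}

def sampling_weights(population, sample):
    """Return the design weight N_h / n_h for each stratum."""
    return {h: population[h] / sample[h] for h in population}

weights = sampling_weights(population, sample)
for h, w in weights.items():
    print(f"{h}: each respondent stands for {w:.0f} people")

# Hypothetical counts of 'yes' answers per stratum.
yes = {"Toronto": 100, "North": 200}

unweighted_prev = sum(yes.values()) / sum(sample.values())
weighted_prev = (
    sum(weights[h] * yes[h] for h in yes)
    / sum(weights[h] * sample[h] for h in sample)
)
print(f"Unweighted: {unweighted_prev:.1%}, weighted: {weighted_prev:.1%}")
```

The unweighted figure leans toward the over-sampled North; the weighted figure pulls it back toward the population mix.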

Post-stratification weights
• Use information about people that you couldn’t know before recruitment
  – E.g., education; smoking status
  – Again work out wanted percentages
  – Add further adjustment to weights
  – Only beneficial if correctly associated with outcome of interest
Necessary? Better? Needs to be considered.
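
A minimal sketch of one post-stratification step, assuming a single adjustment variable with known population shares. The function name `post_stratify`, the record layout and the 20% smoking target are all hypothetical.

```python
def post_stratify(records, targets, key="group", weight="weight"):
    """Scale weights so weighted group shares match known population shares.

    records: list of dicts, each with a design weight and a group label.
    targets: dict mapping group label -> population proportion (sums to 1).
    Returns the adjusted weights in the same order as records.
    """
    total = sum(r[weight] for r in records)
    group_totals = {}
    for r in records:
        group_totals[r[key]] = group_totals.get(r[key], 0.0) + r[weight]
    factors = {g: targets[g] * total / group_totals[g] for g in group_totals}
    return [r[weight] * factors[r[key]] for r in records]

# Toy data: smokers are 30% of the weighted sample, but the assumed target is 20%.
records = (
    [{"group": "smoker", "weight": 100.0} for _ in range(3)]
    + [{"group": "non-smoker", "weight": 100.0} for _ in range(7)]
)
targets = {"smoker": 0.2, "non-smoker": 0.8}

adjusted = post_stratify(records, targets)
smoker_total = sum(w for r, w in zip(records, adjusted) if r["group"] == "smoker")
print(f"Weighted smoker share after adjustment: {smoker_total / sum(adjusted):.0%}")
```

The total weight is unchanged; only its distribution across groups moves.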

Survey estimation – two parts
• Prevalence = 13.0 (95% CI = 10.0-16.0)
• Odds ratio = 2.1 (95% CI = 1.6-4.0)
Point estimates: weighted to correct for over-sampling
Variances: calculated and applied using full design information BY SMART SOFTWARE

Jargon: Design Effect (DEFF)
• A statistic showing how much less efficient a complex sampling design is, relative to SRS of identical size
• DEFF = 1: same efficiency as SRS
• DEFF > 1: less efficient than SRS
• DEFF = 2: as efficient as SRS of ½ size
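
The DEFF arithmetic is short enough to write down. The sketch below treats DEFF as the ratio of the design-based variance to the SRS variance for the same estimate; the variances and sample size are made-up numbers.

```python
import math

def design_effect(var_complex, var_srs):
    """DEFF: variance under the actual design over variance under SRS."""
    return var_complex / var_srs

def effective_sample_size(n, deff):
    """Size of the SRS that would give the same precision."""
    return n / deff

# Hypothetical variances of the same prevalence estimate.
deff = design_effect(var_complex=0.0002, var_srs=0.0001)
n_eff = effective_sample_size(2000, deff)

# Standard errors (and CI half-widths) grow by the square root of DEFF.
se_inflation = math.sqrt(deff)
print(f"DEFF = {deff:.1f}, effective n = {n_eff:.0f}, SE x {se_inflation:.2f}")
```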

95% C.I.
Analysis type                        Estimate   95% CI
Model-based (assume SRS)             13%        11.0 – 15.2
Account for weights                  10%        8.0 – 12.3
Account for weights and clustering   10%        7.5 – 13.0
• Point estimate: affected by weights IF population mixed
• Confidence interval: affected by weighting AND by clustering

2 most common approaches for complex survey variance estimation
• “Taylor-Series”, aka “Linearized” variance estimation
• “Bootstrap”
  – Includes tools such as bootstrap weights

Bootstrapping approaches
• Sampling variability “observed”, not calculated from a fixed formula
  – Felt to reflect “true” sampling variability
  – Chance alone if survey really repeated an infinite number of times
• Virtually free of assumptions
  – Tends to be more appropriate and conservative
• Very broadly applicable
  – E.g., to smaller sample sizes
  – Sometimes to analyses that other software can’t do

Bootstrapping
Custom-bootstrapping
• Advanced programming
• Draw many (e.g., 1000) samples from your overall N
  – Respect stratification and clustering
  – Reweight each time
  – Save 1000 point estimates
• Variance in 1000 estimates is new corrected variance
Bootstrap weights files
• (example StatCan)
• Resampling done once to produce a set of resampling weights
  – 1000 weights per observation
• Point estimate calculated once with each weight var (1000 times)
• Variance within 1000 estimates is new variance
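
The bootstrap-weights route can be sketched in a few lines: re-run the same weighted estimate once per replicate weight column, then take the variance across those estimates. The four replicate weight vectors below are toy values invented for the example (a real file would supply hundreds), and the variance is centred on the mean of the replicate estimates; some software centres on the full-sample estimate instead.

```python
import statistics

def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def bootstrap_variance(values, main_weights, replicate_weights):
    """Point estimate from the main weight, variance from replicate weights.

    replicate_weights: list of weight vectors, one per bootstrap replicate.
    """
    estimate = weighted_mean(values, main_weights)
    replicates = [weighted_mean(values, bw) for bw in replicate_weights]
    return estimate, statistics.pvariance(replicates)

# Toy data: a 0/1 outcome, one main weight, four made-up replicate weights.
outcome = [1, 0, 1, 0]
main_w = [10, 20, 30, 40]
boot_w = [
    [20, 0, 30, 50],
    [0, 40, 30, 30],
    [10, 20, 60, 10],
    [10, 20, 0, 70],
]

estimate, variance = bootstrap_variance(outcome, main_w, boot_w)
print(f"Estimate = {estimate:.2f}, bootstrap SE = {variance ** 0.5:.3f}")
```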

Taylor Series
Software uses complex linear equations to calculate corrected variance for every estimate
• Requires assumptions about data!
  – E.g., pretty large sample sizes
• Very difficult for user to know:
  – when limits are being pushed
• Need to tell software full sampling design
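
To give a feel for what the software does with the declared design, here is a sketch of the textbook linearized variance for a weighted mean, assuming PSUs sampled with replacement within strata (no finite population correction) and at least two PSUs per stratum. It is an illustration on toy data, not the code of any particular package, and `linearized_mean` is a hypothetical name.

```python
from collections import defaultdict

def linearized_mean(rows):
    """Weighted mean and Taylor-linearized variance.

    rows: iterable of (stratum, psu, weight, y) tuples.
    """
    rows = list(rows)
    w_total = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / w_total

    # Linearized score for the ratio sum(w*y)/sum(w), totalled within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in rows:
        psu_totals[stratum][psu] += w * (y - mean) / w_total

    # Between-PSU variability, computed separately in each stratum and summed.
    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        z_bar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return mean, variance

# Toy data: 2 strata, 2 PSUs each, 2 people per PSU, perfectly clustered outcome.
rows = [
    ("A", 1, 1.0, 0), ("A", 1, 1.0, 0), ("A", 2, 1.0, 1), ("A", 2, 1.0, 1),
    ("B", 3, 1.0, 0), ("B", 3, 1.0, 0), ("B", 4, 1.0, 1), ("B", 4, 1.0, 1),
]
mean, variance = linearized_mean(rows)
print(f"Mean = {mean:.2f}, linearized SE = {variance ** 0.5:.3f}")
```

With only two PSUs per stratum the large-sample assumption is clearly being pushed, which is the point of the warning above.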

Software options
• Epi Info
  – Linearized estimation only, with very limited analysis options
  – NB: use only procedures for surveys
• SPSS
  – Linearized estimation only (most recent versions may add!)
  – Several analyses available
  – NB: use only the stand-alone module for complex surveys
• Stata
  – Linearized or BS Weights (called via BRR routine)
  – Good range of ‘canned’ complex analyses
  – NB: use the ‘svy’ commands provided
• SAS
  – Linearized: means, prop., linear and logistic (more in v10)
  – NB: use only “PROC SURVEY___” commands (e.g., PROC SURVEYMEANS, PROC SURVEYFREQ, PROC SURVEYLOGISTIC)
• Wesvar
  – Linearized or BS Weights (called via BRR routine)
  – Good range of ‘canned’ complex analyses
• Statistics Canada Bootvar
  – BS Weights + bonuses: CV and suppression rules
  – Somewhat limited analysis options (can request more)
  – NB: programs are macros for SAS or SPSS

Tell your software
1. Clustered sampling
   – Correct method WIDENS 95% CIs
2. Stratification
   – Correct method might narrow CIs (a bit)
3. Weights
   – Correct method WIDENS CIs
4. Finite population correction
   – Never allow this to shrink your CIs

Epi / public health norms:
• Always use population-weighted analyses
  – Only these are sure to reflect the actual pop
• Never use the “finite population correction”
  – Well, it’s bloody unlikely
  – Small samples from small true groups are tough, statistically; ‘nuff said.
• Always use vetted COMMANDS specifically designed for complex samples

Using “Taylor-series” type software
1) Use syntax (or dialogue boxes) to declare:
   • Weight variable
   • Stratification variable
   • Group unit for cluster sampling
     – Primary sampling unit (PSU)
   • Usually ignore requests for finite population info
2) Run your analysis using ONLY special commands for complex samples

Software specific
• SAS – proc survey commands
  – Declare strata, weights, cluster for the first sampling stage
  – Options are within each proc statement
• Stata – svy-utilities
  – Can set design options once
  – Can include all stages, separate post-stratification weights, and standardization weights

Software specific
• SPSS – only separate CS module!
  – Possibly least intuitive
• Set-up profile, then analyse
  – Read examples etc.
  – Allows multi-stage
  – DO ensure first stage set at “with replacement”

Sampling method jargon
• “Sampling with replacement” (WR)
  – In theory done with SRS
  – Not actually done (we don’t interview twice)
  – What we really do is sampling WOR (without replacement)
• More conservative assumption is to pretend it was “WR” and from a theoretically infinitely large potential sample
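
Why “pretend WR” is the conservative choice can be seen from the finite population correction itself: under without-replacement sampling the variance is multiplied by (1 − n/N), which can only shrink it. The sketch below uses the simple large-sample formula for a proportion and invented numbers; `se_proportion` is a hypothetical helper.

```python
import math

def se_proportion(p, n, N=None):
    """Approximate SRS standard error of a proportion.

    If a population size N is given, apply the finite population
    correction sqrt(1 - n/N); otherwise treat the population as infinite.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt(1 - n / N)
    return se

p, n = 0.13, 1000
se_wr = se_proportion(p, n)            # "with replacement" assumption
se_wor = se_proportion(p, n, N=5000)   # fpc applied, small hypothetical population

print(f"SE ignoring fpc: {se_wr:.4f}")
print(f"SE with fpc:     {se_wor:.4f}")
```

For a sample that is a tiny fraction of the population the two are practically identical, so ignoring the correction costs almost nothing.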

Selecting your procedures
• Ratio commands
  – Create dummy vars for numerator and denominator, then use to calculate proportions
• Proportion and table commands
  – Act like table analyses, varying niceness

Selecting your procedures
• Means commands
  – Obviously for continuous vars
  – ALSO COMMON default for proportions
  – Try recoding 0/1 vars as 0/100 and spit out %s
  – Taylor series is ‘large sample technique’ so using large sample analysis to get mean (and limits) for binary vars as continuous is consistent.
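
The 0/100 recoding trick is easy to verify by hand. In this sketch the indicator values and weights are invented; the point is only that the weighted mean of a 0/100 variable is the weighted percentage.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical 0/1 indicator (e.g., current smoker) and survey weights.
smoker = [1, 0, 0, 1, 0]
weights = [120.0, 80.0, 200.0, 50.0, 50.0]

# Recode 0/1 as 0/100 so a means command reports a percentage directly.
smoker_pct = [100 * s for s in smoker]

proportion = weighted_mean(smoker, weights)
percent = weighted_mean(smoker_pct, weights)
print(f"Proportion = {proportion:.2f}, percent = {percent:.1f}%")
```

The same scaling carries through to the standard error and confidence limits that a means command reports.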

Total commands?
• Wgtd % of all obs’ns: Yes 40%, No 40%, DK 20%
• Weighted totals: Yes 40,000, No 40,000, DK 20,000
• Wgtd % of valid responses: Yes 50%, No 50%
Are you happy reporting this as the population total?
Alternative is to apply the percent estimate (and its upper and lower limits) to the pop denominator, and use this to estimate pop numbers.
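
The alternative described above is plain arithmetic. In this sketch the 50% estimate comes from the slide's valid-response column, while the confidence limits (45% to 55%) and the population denominator of 100,000 are assumptions added for illustration.

```python
def total_from_percent(percent, lower, upper, pop_denominator):
    """Apply a percent estimate and its CI limits to a population count."""
    scale = pop_denominator / 100
    return tuple(round(value * scale) for value in (percent, lower, upper))

# 50% 'Yes' among valid responses; limits and denominator are hypothetical.
estimate, low, high = total_from_percent(50.0, 45.0, 55.0, 100_000)
print(f"Estimated 'Yes' in population: {estimate:,} ({low:,} to {high:,})")
```

This gives 50,000 rather than the 40,000 weighted total, because the “don't know” answers are no longer silently counted as not-Yes.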

Survey regress / survey logist
• Commands are least weird to look at
• A big challenge is that you can’t use favourite tests for adding/dropping vars
  – Likelihood ratio tests are now N/A
  – Have to use Wald tests to test hypotheses about coefficients
  – Some come on output; may need custom tests

Additional stuff with Stata
• Can include extra sets of weights
  – Post-stratification weights
  – Standard pop values for standardization

Sub-group analyses
• Survey stats all about LARGE samples
  – Many PSUs, many people per PSU
  – Analysis of small subsets can lack precision and result in ‘bad samples’
• Probably less harm when studying narrow age group (for example)
  – People still come from lots of PSUs
• Risky to study sub-geography
  – Too few PSUs, unless survey engineered for that level of geography
• Use “domain” option in commands
  – Not “if” or “where”
  – There are limits

Privacy and precision

Rules for release or suppression of data
• Always use confidence intervals
• Apply rules to suppress estimates that lack minimum precision
  – E.g., Statistics Canada
    • Minimum observations in numerators
    • Coefficient of Variation or Relative Standard Error cut-points (warnings and ‘do not release’)
• Rules for confidentiality
  – Usually 5+ minimum obs per cell
  – Suppress zero cells
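
Release rules of this kind are easy to automate. The sketch below is illustrative only: the minimum cell size of 5 follows the slide, but the two CV cut-points are placeholder parameters, so always substitute the thresholds published by your own data provider. `release_flag` is a hypothetical helper.

```python
def coefficient_of_variation(estimate, std_error):
    """CV (relative standard error) as a percentage of the estimate."""
    return 100 * std_error / estimate

def release_flag(estimate, std_error, n_cell,
                 min_cell=5, cv_warn=16.6, cv_suppress=33.3):
    """Classify an estimate as release / release with warning / suppress.

    cv_warn and cv_suppress are placeholder cut-points (assumptions).
    """
    if n_cell < min_cell or estimate == 0:
        return "suppress"
    cv = coefficient_of_variation(estimate, std_error)
    if cv > cv_suppress:
        return "suppress"
    if cv > cv_warn:
        return "release with warning"
    return "release"

# Hypothetical estimates: (percent, standard error, observations in the cell).
for est, se, n in [(13.0, 1.5, 120), (13.0, 3.0, 40), (13.0, 5.0, 12), (13.0, 1.0, 3)]:
    print(est, se, n, "->", release_flag(est, se, n))
```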

The perennial FAQs
• When/why do I have to use complex survey software?
  – E.g., I have no clusters, just weights
• When/why do I have to bootstrap instead of using SAS/SPSS/Stata?
• Others?