Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001.

Post on 13-Dec-2015


Survey Methodology EPID 626

Sampling, Part II

Manya Magnus, Ph.D.

Fall 2001

Lecture overview

• Comments about Assignment I

• More sampling techniques

• Sampling error

• Sample sizes

Comments about Assignment I

• Late policy

• Location of mailbox

• Randomization vs. random selection

• Validity, reliability

• Sampling frames

• Physician responses =?= “gold standard”

• Research questions vs. survey questions

• Registering for class

Comments about Assignment I

• Grading

Looked for completeness in answering questions, care in discussion of survey, effort, basically correct information, not just cut-n-paste, synthesis.

• Questions about grade: email manyadm@tulane.edu

Comments about Assignment I

• Grading:

– ++ 90-100%

– + 80-89%

– 70-79%

– - 60-69%

– -- <60%

– 0 not turned in

Random digit dialing (1)

• Delineate the geographic boundaries of the sampling area

• Identify all of the exchanges used in the geographic area

• Identify the distribution of prefixes within the sampling area

– Example: There may be 8 exchanges, but you may find that 3 of them are used for nearly two-thirds of residential lines.

Random digit dialing (2)

• You may stratify based on the distribution of prefixes

– Ex. Take more samples of the 3 exchanges that account for the most residential lines

• Try to identify vacuous suffixes

– These are suffixes not yet assigned or assigned in large groups to a business

– Usually consider suffixes in 100s

– ex. 0000-0099, 0100-0199

Random digit dialing (3)

• May randomly select the four-digit suffixes

– ex. use a random-numbers table

• Alternatively, you may use a plus-one approach

– When you reach a residence, use the number as a seed, and add fixed digits (one or two) to get the next sample
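The random digit dialing steps on the slides above can be sketched in Python. This is only an illustration under stated assumptions: the function names `rdd_sample` and `plus_one`, the exchange shares, and the "EEE-SSSS" number format are hypothetical and do not come from the lecture.

```python
import random

def rdd_sample(exchange_weights, vacuous_blocks, n, seed=None):
    """Draw n distinct numbers formatted as 'EEE-SSSS'.

    exchange_weights: dict mapping an exchange (prefix) to its assumed share
        of residential lines, so busier exchanges are sampled more often.
    vacuous_blocks: set of (exchange, hundreds_block) pairs to skip, where
        hundreds_block 1 stands for suffixes 0100-0199.
    Assumes n is well below the number of usable suffixes.
    """
    rng = random.Random(seed)
    exchanges = list(exchange_weights)
    weights = [exchange_weights[e] for e in exchanges]
    numbers = set()
    while len(numbers) < n:
        exchange = rng.choices(exchanges, weights=weights, k=1)[0]
        suffix = rng.randrange(10000)  # random four-digit suffix
        if (exchange, suffix // 100) in vacuous_blocks:
            continue  # skip suffix blocks believed to be unassigned or business-only
        numbers.add(f"{exchange}-{suffix:04d}")
    return sorted(numbers)

def plus_one(number, step=1):
    """Plus-one approach: add a fixed step to a number that reached a residence."""
    exchange, suffix = number.split("-")
    return f"{exchange}-{(int(suffix) + step) % 10000:04d}"
```

For example, `plus_one("555-0199")` gives `"555-0200"`, the next number to try after a residence was reached at 555-0199.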

Random digit dialing (4)

• Provides a nonzero chance of reaching any household within a sampling area that has a telephone line regardless of whether the number is listed

• Is the probability of reaching every household equal?

– No. Households with more than one phone line will have a greater probability than households with one phone line.

– Adjust for unequal probability by weighting
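One common way to do this weighting is to weight each household by the inverse of its number of phone lines. The sketch below assumes that approach and a yes/no (1/0) answer. The function name and data layout are hypothetical.

```python
def weighted_proportion(responses):
    """Estimate a proportion with inverse-probability weights.

    responses: list of (answer, n_lines) pairs, where answer is 1 or 0 and
        n_lines is the number of phone lines in the household (assumed to be
        reported by the respondent). A household with two lines is twice as
        likely to be dialed, so it gets half the weight.
    """
    weights = [1.0 / n_lines for _, n_lines in responses]
    weighted_yes = sum(w * answer for w, (answer, _) in zip(weights, responses))
    return weighted_yes / sum(weights)
```

With responses `[(1, 2), (0, 1)]` the unweighted proportion is 0.5, while the weighted estimate is one third, because the two-line household counts for half as much.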

Random Digit Dialing (5)

• Advantages: Inexpensive and easy to do

• Disadvantages:

1. Large number of unfruitful calls

2. Will exclude individuals without phones

3. May be difficult to ascertain geographic area

Sampling distributions

• The central limit theorem: Across repeated samples from a population, a particular estimate (say, a mean) will be approximately normally distributed around the true population value

• As sample size increases, the distribution of the estimate becomes increasingly normal

• This variation around the true value is the sampling error—it stems from the fact that, by chance, samples may differ from the population as a whole.

• The larger the sample size and the less variance of what is being measured, the more tightly the sample estimates will “bunch” around the true population value, and the more accurate the sample-based estimate will be.

Example (1) (adapted from Babbie)

• Survey at TUSPHTM

• Approval of new Lundi Gras holiday

• Dichotomous outcome: approve/disapprove

• Survey population: aggregation of students

• Sampling frame: student list

• Random sample of students; representative sample of student body

Example (2) (adapted from Babbie)

• Extremes and all combinations in between are possible: 100% approve, 100% disapprove, 1% approve and 99% disapprove, etc.

• First random sample: 48% approve, 52% disapprove

• Second random sample: 20% approve, 80% disapprove

• And so forth

Example (3) (adapted from Babbie)

• What results from this exercise is a distribution of samples, or a sampling distribution.

• As more independent random samples are selected, the sample statistics obtained will be distributed around the true population value in a known way.

Example (4) (adapted from Babbie)

• They will be clustered about the true value within a certain range.

• The range is given by the standard error.

• We do not know if the value in our sample is within the range, just that if many similar samples were taken in the same fashion, X% would fall within the specified range; this one may or may not.

Example (5) (adapted from Babbie)

• Probability theory says that about 68% of sample estimates will fall within one standard error of the parameter and about 95% will fall within two standard errors of the parameter (the standard error is the standard deviation of the sampling distribution)

• Increasing confidence with increasing range

• Note difference between standard errors & standard deviations
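A small simulation can make the sampling-distribution example concrete. The sketch below assumes a hypothetical true approval rate of 30% and samples of 200 students (neither figure is from the lecture). It draws many independent samples and counts how often the sample proportion lands within one and two standard errors of the true value.

```python
import math
import random

def simulate_coverage(p_true=0.30, n=200, n_samples=2000, seed=626):
    """Return the share of simulated samples whose estimate falls within
    one and within two standard errors of the true proportion."""
    rng = random.Random(seed)
    se = math.sqrt(p_true * (1 - p_true) / n)  # standard error of a proportion
    within_one = 0
    within_two = 0
    for _ in range(n_samples):
        # One simple random sample: each student approves with probability p_true.
        approvals = sum(rng.random() < p_true for _ in range(n))
        error = abs(approvals / n - p_true)
        within_one += error <= se
        within_two += error <= 2 * se
    return within_one / n_samples, within_two / n_samples
```

Calling `simulate_coverage()` should return shares close to the 68% and 95% figures quoted above, though not exactly, because the number of approvals is discrete and the number of simulated samples is finite.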

Standard error of a mean

$$SE = \sqrt{\frac{Var}{n}}$$

Standard error of a mean

• The standard deviation of the distribution of sample estimates of the mean that would be formed if an infinite number of samples of a given size were drawn.
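As a minimal sketch, the standard error of a mean can be estimated from a single sample by plugging the sample variance into SE = sqrt(Var / n). The function name is hypothetical, and using the sample variance (n - 1 denominator) as the estimate of Var is an assumption of this example.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Estimate SE = sqrt(Var / n), using the sample variance for Var."""
    return math.sqrt(statistics.variance(values) / len(values))
```

Because n sits in the denominator, quadrupling the sample size halves the standard error when the variance stays the same.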

Proportions

• Mean of a two-value (binomial) distribution

• Var of a proportion = p(1-p)

• So the standard error of a proportion is

$$SE = \sqrt{\frac{p(1-p)}{n}}$$

Table 2.1: Confidence Ranges for Variability Attributable to Sampling

• Trends

• If sample size=75 and p=0.20,

$$SE = \sqrt{\frac{(0.20)(0.80)}{75}} = \sqrt{\frac{0.16}{75}} = 0.046188$$

$$2 \times SE = 2(0.046188) \approx 0.092$$

$$95\%\ CI \approx (0.11, 0.29)$$

Confidence intervals

• In a survey of 100 respondents, 20% say yes. What is the confidence interval for a 95% confidence level?

• In a survey of 250 respondents, 10% say yes. What is the confidence interval for a 95% confidence level? What if 50% said yes?

• In a survey of 100 respondents, 20% say yes. What is the confidence interval for a 95% confidence level?

• Interval is ±8 percentage points.

• 95% CI=(12%, 28%)

• In a survey of 250 respondents, 10% say yes. What is the confidence interval for a 95% confidence level? What if 50% said yes?

• Interval is about ±3.8 percentage points.

• 95% CI is about (6.2%, 13.8%)

• If 50% said yes, CI is about (43.7%, 56.3%)
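The exercise answers can be checked with a short Python sketch. It uses the same rule as the worked examples, estimate plus or minus two standard errors, so the multiplier `z=2.0` is a rounded stand-in for 1.96. The function name is hypothetical.

```python
import math

def proportion_ci(p, n, z=2.0):
    """Approximate 95% confidence interval for a proportion.

    p: sample proportion, n: number of respondents,
    z: number of standard errors (2.0 follows the slides; 1.96 is more exact).
    """
    se = math.sqrt(p * (1 - p) / n)
    margin = z * se
    return p - margin, p + margin

for p, n in [(0.20, 100), (0.10, 250), (0.50, 250)]:
    low, high = proportion_ci(p, n)
    print(f"n={n}, p={p:.2f}: ({low:.1%}, {high:.1%})")
```

The intervals are widest at p = 0.50 because p(1-p) is largest there, which is why the 50% case has a larger margin than the 10% case at the same sample size.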

Sampling error and sampling strategy

• For SRS, sampling error is approximated by the standard error

• Systematic sampling

– If not stratified, sampling error is the same as in SRS.

– If stratified, errors are lower than those associated with an SRS of the same size for variables that differ (on average) by stratum, if rates of selection are constant across strata.

Sampling error and sampling strategy (2)

• Unequal rates of selection decrease sampling error for oversampled groups.

• Unequal rates of selection will generally produce sampling errors for the whole sample that are higher than those associated with an SRS of the same size for variables that differ by stratum.

Sampling error and sampling strategy (3)

• Clusters will produce sampling errors that are higher than those of an SRS of the same size for variables that are more homogeneous within clusters than in the population as a whole.

• You must look at the nature of the clusters to evaluate the effect on the sampling error.
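One standard way to quantify the cluster effect, not given in the lecture and brought in here as an assumption, is Kish's design effect approximation, deff = 1 + (m - 1)ρ, where m is the average cluster size and ρ is the intraclass correlation (how alike members of the same cluster are). The sketch below applies it to the standard error of a proportion. Both function names are hypothetical.

```python
import math

def design_effect(cluster_size, rho):
    """Kish's approximation: variance inflation relative to an SRS of equal size."""
    return 1 + (cluster_size - 1) * rho

def clustered_se(p, n, cluster_size, rho):
    """SRS standard error of a proportion, inflated by sqrt(design effect)."""
    srs_se = math.sqrt(p * (1 - p) / n)
    return srs_se * math.sqrt(design_effect(cluster_size, rho))
```

When ρ is zero (clusters are no more homogeneous than the population) the result equals the SRS standard error. As ρ or the cluster size grows, the standard error grows with it.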

Caveats

• Sampling error is in no way the only source of error.

• Non-sampling error, bias, error resulting from incorrect specification of sampling frame, etc., etc., are also sources of error.

• Often the latter are more insidious as they are seldom quantifiable

• Total survey approach useful in this regard.

Sample size (1)

• Very important to consider prior to undertaking study

• Consult a biostatistician

• Many references in texts, available spreadsheets, statistical programs, EpiInfo, etc.

• Never feel bad asking for assistance

Sample size (2)

• What not to do

1. Sample size should not be based on the fraction of the population that is sampled, nor on the size of the population you want to describe.

2. Sample size should not be decided solely based on what others have previously done.

3. Sample size should not be based on the desired level of precision for just one estimate.
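Point 1 can be illustrated numerically. The sketch below assumes the usual finite population correction, sqrt((N - n) / (N - 1)), which is not on the slides, and compares the standard error of a proportion for the same sample size drawn from a small and a very large population. The function name and the population sizes are hypothetical.

```python
import math

def se_proportion_fpc(p, n, population_size):
    """Standard error of a proportion with the finite population correction."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

# Same sample of 400, very different population sizes.
for population_size in (10_000, 10_000_000):
    se = se_proportion_fpc(0.5, 400, population_size)
    print(f"N={population_size:>10,}: SE = {se:.4f}")
```

Both standard errors come out close to 0.025, so a sample of 400 is about as precise for ten million people as for ten thousand. The absolute sample size drives precision, not the sampling fraction.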

Sample size (3)

• What to do

– develop analysis plan

– desired precision of estimates for subgroups,

– consider research questions

– affordability,

– feasibility,

– and to some extent, previous studies

Sample size (5)

• Parameters required to calculate sample size:

– Null hypothesis: what precisely are you asking/testing?

– α [Pr(type I error)]

– β [Pr(type II error)], usually included as 1-β = power

– What difference between groups do you want to observe? (e.g., μ1-μ2)

– What is a good estimate of variance in population?
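To show how these parameters combine, here is a sketch using one common textbook formula for comparing two group means with a two-sided test and equal group sizes: n per group = 2σ²(z₁₋α/₂ + z₁₋β)² / δ². The lecture does not specify a formula, so this choice, the function name, and the default values are assumptions. A biostatistician or dedicated software should be used for real studies.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for detecting a difference in means.

    delta: difference between groups you want to detect
    sigma: estimated standard deviation in the population (same in both groups)
    alpha: Pr(type I error), two-sided
    power: 1 - Pr(type II error)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)
```

The required sample size rises when the difference to detect shrinks, when the variability grows, or when more power is requested. For instance, `n_per_group(0.5, 1.0)` needs more subjects than `n_per_group(1.0, 1.0)`.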

Sample size (6)

• How sample size works—some examples

Sample size (7): sample size, power

[Figure: Group A and Group B]

Sample size (8): sample size, power

[Figure: panels A and B]

Sample size (9): variability, power

[Figure: panels A and B]

Sample size (10): variability, power

[Figure: panels A and B]

Non-response (1)

• Very big issue

• Source of non-sampling error

• Can lead to bias, uninterpretability of results

• Violates whole point of probability sample, yet unavoidable

Non-response (2)

• Issue in probability as well as non-probability samples

• Exists on many levels

Non-response (3)

Whole sample

– Reached

– Not reached

Non-response (4)

Reached

– Can participate

– Cannot participate

Non-response (5)

Reached

– Enrolled

– Refused

Non-response (6)

Participated

– Answered individual question

– Did not answer individual question
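The levels of non-response above multiply together, which the following sketch illustrates. The function name, the stage labels, and all counts are hypothetical and chosen only to show the arithmetic.

```python
def stage_rates(sampled, reached, can_participate, enrolled, answered_item):
    """Proportion retained at each level of the non-response cascade."""
    return {
        "reached": reached / sampled,
        "can_participate": can_participate / reached,
        "enrolled": enrolled / can_participate,
        "answered_item": answered_item / enrolled,
        "overall": answered_item / sampled,
    }

# Hypothetical counts: 1000 sampled, 800 reached, 720 able to participate,
# 540 enrolled, 486 answered a particular question.
rates = stage_rates(1000, 800, 720, 540, 486)
for stage, rate in rates.items():
    print(f"{stage}: {rate:.1%}")
```

Even when each stage retains 75% to 90% of the previous one, fewer than half of the original sample ends up answering the question in this made-up example.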