Survey Sampling Formula Sheet


description

Formula sheet for Elementary Survey Sampling, 7th Edition, by Richard L. Scheaffer, William Mendenhall III, R. Lyman Ott, and Kenneth G. Gerow. Brief descriptions of some of the notation and estimators are included. There may be typos, and the formatting could be better. It may be useful for someone taking an introductory course in survey sampling. Note that some of the formulas do not display correctly when the document is viewed in a web browser, but they appear fine in the downloaded PDF.


Page 1: Survey Sampling Formula Sheet

Survey Sampling Formula Sheet Prepared by Robert D’Agostino


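Formulas and Descriptions Obtained from: Elementary Survey Sampling, 7th Edition, by Richard L. Scheaffer, William Mendenhall III, R. Lyman Ott, and Kenneth G. Gerow

Population Parameters / Sample Statistics

$$\mu = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2$$

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2\right]$$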

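Simple Random Sampling

If a sample of size $n$ is drawn from a population of size $N$ such that every possible sample of size $n$ has the same chance of being selected, the sampling procedure is called simple random sampling.

Estimators / Bound on the error:

Population mean $\mu$:

$$\hat{\mu} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\left(\frac{s^2}{n}\right)$$

$$2\sqrt{\hat{V}(\bar{y})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\left(\frac{s^2}{n}\right)}$$

Population total $\tau$:

$$\hat{\tau} = N\bar{y}, \qquad \hat{V}(\hat{\tau}) = \hat{V}(N\bar{y}) = N^2\left(1 - \frac{n}{N}\right)\left(\frac{s^2}{n}\right)$$

$$2\sqrt{\hat{V}(N\bar{y})} = 2\sqrt{N^2\left(1 - \frac{n}{N}\right)\left(\frac{s^2}{n}\right)}$$

Population proportion $p$ (with $y_i = 1$ if element $i$ has the characteristic of interest and $y_i = 0$ otherwise, and $\hat{q} = 1 - \hat{p}$):

$$\hat{p} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\left(\frac{\hat{p}\hat{q}}{n-1}\right)$$

$$2\sqrt{\hat{V}(\hat{p})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\left(\frac{\hat{p}\hat{q}}{n-1}\right)}$$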
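The simple random sampling estimators can be tried out with a short Python sketch. It assumes the usual textbook forms: the sample mean, the finite population correction $1 - n/N$, and a bound of two estimated standard errors. The function names and the data in the usage note are hypothetical and are not part of the formula sheet.

```python
import math


def srs_mean_estimate(sample, N):
    """Return (estimate of mu, bound on the error) for a simple random sample."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)  # sample variance s^2
    var_hat = (1 - n / N) * s2 / n  # includes the finite population correction
    return ybar, 2 * math.sqrt(var_hat)


def srs_total_estimate(sample, N):
    """Return (estimate of tau, bound): both scale the mean results by N."""
    ybar, bound = srs_mean_estimate(sample, N)
    return N * ybar, N * bound


def srs_proportion_estimate(indicators, N):
    """Return (estimate of p, bound); `indicators` holds 0/1 values."""
    n = len(indicators)
    p_hat = sum(indicators) / n
    var_hat = (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)
    return p_hat, 2 * math.sqrt(var_hat)
```

For example, `srs_mean_estimate([2, 4, 6, 8], N=40)` gives an estimate of 5 with an estimated variance of 1.5 under these assumptions.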

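Sample size required to estimate $\mu$ with a bound on the error of estimation $B$:

$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad D = \frac{B^2}{4}$$

Sample size required to estimate $\tau$ with a bound on the error of estimation $B$:

$$n = \frac{N\sigma^2}{(N-1)D + \sigma^2}, \qquad D = \frac{B^2}{4N^2}$$

Sample size required to estimate $p$ with a bound on the error of estimation $B$:

$$n = \frac{Npq}{(N-1)D + pq}, \qquad q = 1 - p, \qquad D = \frac{B^2}{4}$$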
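A small Python sketch of the sample size calculations for simple random sampling follows. It assumes the forms $n = N\sigma^2 / ((N-1)D + \sigma^2)$ and $n = Npq / ((N-1)D + pq)$ with $D = B^2/4$ for a mean or proportion and $D = B^2/(4N^2)$ for a total. The function names are hypothetical, and rounding up to the next integer is a choice made here rather than something stated on the sheet.

```python
import math


def srs_sample_size_mean(N, sigma2, B):
    """Sample size to estimate mu within bound B (sigma2 is a guess at sigma^2)."""
    D = B ** 2 / 4
    return math.ceil(N * sigma2 / ((N - 1) * D + sigma2))


def srs_sample_size_total(N, sigma2, B):
    """Sample size to estimate tau within bound B."""
    D = B ** 2 / (4 * N ** 2)
    return math.ceil(N * sigma2 / ((N - 1) * D + sigma2))


def srs_sample_size_proportion(N, B, p=0.5):
    """Sample size to estimate p within bound B; p=0.5 is the conservative default."""
    q = 1 - p
    D = B ** 2 / 4
    return math.ceil(N * p * q / ((N - 1) * D + p * q))
```

With the made-up inputs `N=1000`, `sigma2=100`, `B=2`, the mean formula gives $100000/1099 \approx 90.99$, so the sketch returns 91.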

If no estimate of $p$ is available for the sample size formula, substitute $p = 0.5$ to obtain a conservative sample size, which will likely be larger than required.

Stratified Random Sampling

A stratified random sample is one obtained by separating the population elements into non-overlapping groups, called strata, and then selecting a simple random sample from each stratum.

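Notation: $L$ = # of strata, $N_i$ = # of sampling units in stratum $i$, $N = \sum_{i=1}^{L} N_i$. Within stratum $i$, $n_i$ is the sample size, $\bar{y}_i$ the sample mean, and $s_i^2$ the sample variance.

$$\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i \bar{y}_i$$

$$\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1 - \frac{n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$

$$\hat{\tau} = N\bar{y}_{st} = \sum_{i=1}^{L} N_i \bar{y}_i$$

$$\hat{V}(N\bar{y}_{st}) = N^2\hat{V}(\bar{y}_{st}) = \sum_{i=1}^{L} N_i^2\left(1 - \frac{n_i}{N_i}\right)\left(\frac{s_i^2}{n_i}\right)$$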
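The stratified estimator of the mean can be sketched in Python as below. The sketch assumes $\bar{y}_{st} = \frac{1}{N}\sum N_i \bar{y}_i$ and $\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum N_i^2 (1 - n_i/N_i)(s_i^2/n_i)$; the function name and the input layout (a list of stratum size and sample pairs) are hypothetical.

```python
import math


def stratified_mean_estimate(strata):
    """strata: list of (N_i, sample_i) pairs, one per stratum.

    Returns (y_bar_st, bound on the error of estimation).
    """
    N = sum(N_i for N_i, _ in strata)
    ybar_st = 0.0
    var_hat = 0.0
    for N_i, sample in strata:
        n_i = len(sample)
        ybar_i = sum(sample) / n_i
        s2_i = sum((y - ybar_i) ** 2 for y in sample) / (n_i - 1)
        ybar_st += (N_i / N) * ybar_i  # weight each stratum mean by N_i / N
        var_hat += (N_i / N) ** 2 * (1 - n_i / N_i) * s2_i / n_i
    return ybar_st, 2 * math.sqrt(var_hat)
```

Multiplying both returned values by $N$ gives the corresponding estimate and bound for the population total.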

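Approximate sample size required to estimate $\mu$ or $\tau$ with a bound $B$ on the error of estimation:

$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2 / a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$

where $a_i$ is the fraction of observations allocated to stratum $i$, $D = B^2/4$ when estimating $\mu$, and $D = B^2/(4N^2)$ when estimating $\tau$.

Approximate allocation that minimizes cost for a fixed value of $V(\bar{y}_{st})$ or minimizes $V(\bar{y}_{st})$ for a fixed cost, where $c_i$ is the cost of obtaining one observation from stratum $i$:

$$n_i = n\left(\frac{N_i\sigma_i/\sqrt{c_i}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}}\right)$$

And the total sample size for optimal allocation that minimizes cost for a fixed value of $V(\bar{y}_{st})$ or minimizes $V(\bar{y}_{st})$ for a fixed cost:

$$n = \frac{\left(\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}\right)\left(\sum_{i=1}^{L} N_i\sigma_i\sqrt{c_i}\right)}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$

Neyman Allocation:

$$n_i = n\left(\frac{N_i\sigma_i}{\sum_{k=1}^{L} N_k\sigma_k}\right), \qquad n = \frac{\left(\sum_{i=1}^{L} N_i\sigma_i\right)^2}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}$$

Special Case of Neyman Allocation (Proportional Allocation): If the stratum variances are approximately equal, the Neyman Allocation formulas reduce to:

$$n_i = n\left(\frac{N_i}{N}\right), \qquad n = \frac{\sum_{i=1}^{L} N_i\sigma_i^2}{ND + \frac{1}{N}\sum_{i=1}^{L} N_i\sigma_i^2}$$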
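To make the allocation rules concrete, here is a minimal Python sketch of Neyman and proportional allocation. It assumes $n_i = n\,N_i\sigma_i / \sum_k N_k\sigma_k$, $n = (\sum_i N_i\sigma_i)^2 / (N^2 D + \sum_i N_i\sigma_i^2)$ with $D = B^2/4$ for estimating a mean, and $n_i = n\,N_i/N$ for proportional allocation. The function names and the numbers in the usage note are hypothetical.

```python
import math


def neyman_sample_size_mean(N_sizes, sigmas, B):
    """Total n needed to estimate mu within bound B under Neyman allocation."""
    N = sum(N_sizes)
    D = B ** 2 / 4
    numerator = sum(N_i * s_i for N_i, s_i in zip(N_sizes, sigmas)) ** 2
    denominator = N ** 2 * D + sum(N_i * s_i ** 2 for N_i, s_i in zip(N_sizes, sigmas))
    return math.ceil(numerator / denominator)


def neyman_allocation(n, N_sizes, sigmas):
    """Split n across strata in proportion to N_i * sigma_i (unrounded)."""
    weights = [N_i * s_i for N_i, s_i in zip(N_sizes, sigmas)]
    total = sum(weights)
    return [n * w / total for w in weights]


def proportional_allocation(n, N_sizes):
    """Split n across strata in proportion to N_i (unrounded)."""
    N = sum(N_sizes)
    return [n * N_i / N for N_i in N_sizes]
```

With stratum sizes `[155, 62, 93]`, guessed standard deviations `[5, 15, 10]`, and `B=2`, the ratio is $6943225/123225 \approx 56.35$, so the sketch asks for 57 observations in total. The allocations are left unrounded so that the caller can decide how to round while keeping the total fixed.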

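Estimation of the Population Proportion:

$$\hat{p}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\hat{p}_i$$

$$\hat{V}(\hat{p}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1 - \frac{n_i}{N_i}\right)\left(\frac{\hat{p}_i\hat{q}_i}{n_i - 1}\right)$$

Approximate sample size required to estimate $p$ with a bound $B$ on the error of estimation:

$$n = \frac{\sum_{i=1}^{L} N_i^2 p_i q_i / a_i}{N^2 D + \sum_{i=1}^{L} N_i p_i q_i}, \qquad D = \frac{B^2}{4}$$

Approximate allocation that minimizes cost for a fixed value of $V(\hat{p}_{st})$ or minimizes $V(\hat{p}_{st})$ for a fixed cost:

$$n_i = n\left(\frac{N_i\sqrt{p_i q_i / c_i}}{\sum_{k=1}^{L} N_k\sqrt{p_k q_k / c_k}}\right)$$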

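Stratification after Selection of the Sample:

With $W_i = N_i/N$ the known proportion of the population in stratum $i$, and $\bar{y}_i$, $s_i^2$ computed from the sampled elements that fall in stratum $i$:

$$\bar{y}_{st} = \sum_{i=1}^{L} W_i\bar{y}_i$$

$$\hat{V}_p(\bar{y}_{st}) = \frac{1}{n}\left(1 - \frac{n}{N}\right)\sum_{i=1}^{L} W_i s_i^2 + \frac{1}{n^2}\sum_{i=1}^{L} (1 - W_i)s_i^2$$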

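Ratio, Regression, and Difference Estimation

Estimator of the population ratio $R$:

$$r = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$

$$\hat{V}(r) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{\mu_x^2}\right)\left(\frac{s_r^2}{n}\right), \qquad s_r^2 = \frac{\sum_{i=1}^{n} (y_i - r x_i)^2}{n - 1}$$

Note: We can estimate $\mu_x^2$ by $\bar{x}^2$ if needed.

Estimator of the population total $\tau_y$:

$$\hat{\tau}_y = r\tau_x, \qquad \hat{V}(\hat{\tau}_y) = \tau_x^2\hat{V}(r) = N^2\left(1 - \frac{n}{N}\right)\left(\frac{s_r^2}{n}\right)$$

Estimator of the population mean $\mu_y$:

$$\hat{\mu}_y = r\mu_x, \qquad \hat{V}(\hat{\mu}_y) = \mu_x^2\hat{V}(r) = \left(1 - \frac{n}{N}\right)\left(\frac{s_r^2}{n}\right)$$

Sample size required to estimate $R$ with a bound on the error of estimation $B$:

$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2\mu_x^2}{4}$$

Sample size required to estimate $\mu_y$ with a bound on the error of estimation $B$:

$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2}{4}$$

Sample size required to estimate $\tau_y$ with a bound on the error of estimation $B$:

$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2}{4N^2}$$

In each case $\sigma^2$ can be estimated by $s_r^2$ from a preliminary sample.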
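A rough Python sketch of ratio estimation is given here. It assumes $r = \sum y_i / \sum x_i$, $s_r^2 = \sum (y_i - r x_i)^2/(n-1)$, and $\hat{V}(r) = (1 - n/N)\,s_r^2/(n\mu_x^2)$, with $\bar{x}$ standing in for $\mu_x$ when the population mean of $x$ is not supplied. The function name and signature are hypothetical.

```python
import math


def ratio_estimate(x, y, N, mu_x=None):
    """Return (r, bound on the error) for paired samples x and y of equal length."""
    n = len(x)
    r = sum(y) / sum(x)
    s2_r = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    if mu_x is None:
        mu_x = sum(x) / n  # assumption: fall back to the sample mean of x
    var_r = (1 - n / N) * s2_r / (n * mu_x ** 2)
    return r, 2 * math.sqrt(var_r)
```

Under the same assumptions, the estimate of $\mu_y$ is `r * mu_x` and the estimate of $\tau_y$ is `r * tau_x`, with bounds obtained by multiplying the bound on `r` by $\mu_x$ or $\tau_x$.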

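Ratio Estimation in Stratified Random Sampling:

Separate ratio estimator: Estimate the ratio within each stratum by $r_i = \bar{y}_i / \bar{x}_i$ and then form a weighted average of these separate estimates as a single estimate of the population ratio. The resulting estimator of $\mu_y$ is denoted by $\hat{\mu}_{yRS}$.

$$\hat{\mu}_{yRS} = \sum_{i=1}^{L} \left(\frac{N_i}{N}\right) r_i\,\mu_{xi}$$

$$\hat{V}(\hat{\mu}_{yRS}) = \sum_{i=1}^{L} \left(\frac{N_i}{N}\right)^2\left(1 - \frac{n_i}{N_i}\right)\left(\frac{s_{Ri}^2}{n_i}\right), \qquad s_{Ri}^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - r_i x_{ij})^2}{n_i - 1}$$

Combined ratio estimator: We estimate $\mu_y$ by the stratified estimate $\bar{y}_{st}$ as usual, and then estimate $\mu_x$ by $\bar{x}_{st}$, leading to the combined ratio estimate denoted by $\hat{\mu}_{yRC}$.

$$\hat{\mu}_{yRC} = \left(\frac{\bar{y}_{st}}{\bar{x}_{st}}\right)\mu_x = r_c\,\mu_x$$

$$\hat{V}(\hat{\mu}_{yRC}) = \sum_{i=1}^{L} \left(\frac{N_i}{N}\right)^2\left(1 - \frac{n_i}{N_i}\right)\left(\frac{s_{Rci}^2}{n_i}\right)$$

$$s_{Rci}^2 = \frac{\sum_{j=1}^{n_i} \left[(y_{ij} - \bar{y}_i) - r_c(x_{ij} - \bar{x}_i)\right]^2}{n_i - 1}$$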

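Regression Estimation:

$$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$$

$$b = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x}$$

$$\hat{V}(\hat{\mu}_{yL}) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{n}\right)\left(\frac{1}{n-2}\right)\sum_{i=1}^{n} \left(y_i - (a + b x_i)\right)^2 = \left(\frac{N-n}{N}\right)\left(\frac{MSE}{n}\right)$$

Difference Estimation:

$$\hat{\mu}_{yD} = \bar{y} + (\mu_x - \bar{x}) = \mu_x + \bar{d}, \qquad d_i = y_i - x_i, \qquad \bar{d} = \bar{y} - \bar{x}$$

$$\hat{V}(\hat{\mu}_{yD}) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{n}\right)\left(\frac{1}{n-1}\right)\sum_{i=1}^{n} (d_i - \bar{d})^2$$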

Systematic Sampling

A sample obtained by randomly selecting one element from the first $k$ elements in the frame and every $k$th element thereafter is called a 1-in-$k$ systematic sample with a random start. Similarly, one can perform a repeated systematic sampling procedure, in which several random starts are selected and a systematic sample is taken from each.

$$\hat{\mu} = \bar{y}_{sy} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \hat{V}(\bar{y}_{sy}) = \left(1 - \frac{n}{N}\right)\left(\frac{s^2}{n}\right)$$
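Drawing a 1-in-$k$ systematic sample with a random start can be sketched in a few lines of Python. The function name and the optional `rng` parameter are hypothetical conveniences; the frame is assumed to be an ordered Python list.

```python
import random


def systematic_sample(frame, k, rng=random):
    """Return a 1-in-k systematic sample from an ordered frame.

    A start is chosen at random among the first k positions (0-based),
    then every k-th element after it is taken.
    """
    start = rng.randrange(k)
    return frame[start::k]
```

For instance, `systematic_sample(list(range(100)), 10)` returns 10 elements spaced 10 apart. Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.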

The systematic sampling formulas for the mean are exactly the same as those for simple random sampling, and the same holds for proportions and totals. Sample size calculations are also done using the simple random sampling formulas.

Cluster Sampling

A cluster sample is a probability sample in which each sampling unit is a collection, or cluster, of elements. In these formulas $N$ = # of clusters in the population, $n$ = # of clusters selected in a SRS, $m_i$ = # of elements in cluster $i$, $i = 1, \dots, N$, $\bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$ = average cluster size for the sample, $M = \sum_{i=1}^{N} m_i$ = # of elements in the population, $\bar{M} = M/N$ = average cluster size for the population, $y_i$ = total of all observations in cluster $i$.

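Estimator of the population mean $\mu$:

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i}, \qquad \hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\left(\frac{s_r^2}{n\bar{M}^2}\right)$$

$$s_r^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y}m_i)^2}{n - 1}$$

Estimator of the population total $\tau$:

$$\hat{\tau} = M\bar{y}, \qquad \hat{V}(M\bar{y}) = M^2\hat{V}(\bar{y}) = N^2\left(1 - \frac{n}{N}\right)\left(\frac{s_r^2}{n}\right)$$

Estimator of the population total, when $M$ is unknown:

$$\hat{\tau} = N\bar{y}_t, \qquad \bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$\hat{V}(N\bar{y}_t) = N^2\hat{V}(\bar{y}_t) = N^2\left(1 - \frac{n}{N}\right)\left(\frac{s_t^2}{n}\right), \qquad s_t^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_t)^2}{n - 1}$$

Approximate sample size required to estimate $\mu$ with a bound $B$ on the error of estimation:

$$n = \frac{N\sigma_r^2}{ND + \sigma_r^2}, \qquad D = \frac{B^2\bar{M}^2}{4}$$

Approximate sample size required to estimate $\tau$ using $M\bar{y}$ with a bound $B$ on the error of estimation:

$$n = \frac{N\sigma_r^2}{ND + \sigma_r^2}, \qquad D = \frac{B^2}{4N^2}$$

Approximate sample size required to estimate $\tau$ using $N\bar{y}_t$ with a bound $B$ on the error of estimation:

$$n = \frac{N\sigma_t^2}{ND + \sigma_t^2}, \qquad D = \frac{B^2}{4N^2}$$

Here $\sigma_r^2$ and $\sigma_t^2$ are estimated by $s_r^2$ and $s_t^2$.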
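The cluster sampling estimator of the mean can be illustrated with the Python sketch below. It assumes $\bar{y} = \sum y_i / \sum m_i$, $s_r^2 = \sum (y_i - \bar{y}m_i)^2/(n-1)$, and $\hat{V}(\bar{y}) = (1 - n/N)\,s_r^2/(n\bar{M}^2)$, using the sample average cluster size in place of $\bar{M}$ when the population size $M$ is not supplied. The function name and signature are hypothetical.

```python
import math


def cluster_mean_estimate(cluster_totals, cluster_sizes, N, M=None):
    """Return (y_bar, bound) from a simple random sample of n clusters.

    cluster_totals[i] is y_i and cluster_sizes[i] is m_i for sampled cluster i.
    """
    n = len(cluster_totals)
    ybar = sum(cluster_totals) / sum(cluster_sizes)
    # Assumption: estimate M_bar by the sample average cluster size if M is unknown.
    M_bar = M / N if M is not None else sum(cluster_sizes) / n
    s2_r = sum((y - ybar * m) ** 2
               for y, m in zip(cluster_totals, cluster_sizes)) / (n - 1)
    var_hat = (1 - n / N) * s2_r / (n * M_bar ** 2)
    return ybar, 2 * math.sqrt(var_hat)
```

When $M$ is known, multiplying the returned estimate and bound by $M$ gives the corresponding values for the population total.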

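Estimator of the population proportion $p$: Let $a_i$ denote the total number of elements in cluster $i$ that possess the characteristic of interest.

$$\hat{p} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} m_i}, \qquad \hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\left(\frac{s_p^2}{n\bar{M}^2}\right)$$

$$s_p^2 = \frac{\sum_{i=1}^{n} (a_i - \hat{p}m_i)^2}{n - 1}$$

Sample size required to estimate $p$ with a bound $B$ on the error of estimation:

$$n = \frac{N\sigma_p^2}{ND + \sigma_p^2}, \qquad D = \frac{B^2\bar{M}^2}{4}$$

where $\sigma_p^2$ is estimated by $s_p^2$.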

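Cluster Sampling Combined with Stratification:

Within stratum $i$, let $N_i$ be the number of clusters, $n_i$ the number of clusters sampled, $y_{ij}$ and $m_{ij}$ the total and size of the $j$th sampled cluster, $\bar{y}_{ti}$ the sample mean of the cluster totals, and $\bar{m}_i$ the sample mean of the cluster sizes.

$$\bar{y}_c = \frac{\sum_{i} N_i\bar{y}_{ti}}{\sum_{i} N_i\bar{m}_i}$$

$$\hat{V}(\bar{y}_c) = \frac{1}{M^2}\sum_{i} \left(\frac{N_i(N_i - n_i)}{n_i}\right)s_{ci}^2$$

$$s_{ci}^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} \left[(y_{ij} - \bar{y}_c m_{ij}) - (\bar{y}_{ti} - \bar{y}_c\bar{m}_i)\right]^2$$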

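Cluster Sampling with Probabilities Proportional to Size:

Here $\bar{y}_i$ denotes the mean of the observations in the $i$th sampled cluster.

$$\hat{\mu}_{pps} = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i$$

$$\hat{V}(\hat{\mu}_{pps}) = \left(\frac{1}{n(n-1)}\right)\sum_{i=1}^{n} (\bar{y}_i - \hat{\mu}_{pps})^2$$

$$\hat{\tau}_{pps} = \frac{M}{n}\sum_{i=1}^{n} \bar{y}_i$$

$$\hat{V}(\hat{\tau}_{pps}) = \left(\frac{M^2}{n(n-1)}\right)\sum_{i=1}^{n} (\bar{y}_i - \hat{\mu}_{pps})^2$$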

Two-Stage Cluster Sampling

A two-stage cluster sample is obtained by first selecting a probability sample of clusters, and then selecting a probability sample of elements from each sampled cluster. For this chapter we denote $N$ = # of clusters in the population, $n$ = # of clusters selected in a simple random sample, $M_i$ = # of elements in cluster $i$, $m_i$ = # of elements selected in a simple random sample from cluster $i$, $M = \sum_{i=1}^{N} M_i$ = # of elements in the population, $\bar{M} = M/N$ = average cluster size for the population, $y_{ij}$ = the $j$th observation in the sample from the $i$th cluster, $\bar{y}_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$ = sample mean for the $i$th cluster.

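Estimator of the population mean $\mu$:

$$\hat{\mu} = \left(\frac{N}{M}\right)\frac{\sum_{i=1}^{n} M_i\bar{y}_i}{n}$$

$$\hat{V}(\hat{\mu}) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{n\bar{M}^2}\right)s_b^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} \left(M_i^2\left(1 - \frac{m_i}{M_i}\right)\left(\frac{s_i^2}{m_i}\right)\right)$$

$$s_b^2 = \frac{\sum_{i=1}^{n} (M_i\bar{y}_i - \bar{M}\hat{\mu})^2}{n - 1}, \qquad s_i^2 = \frac{\sum_{j=1}^{m_i} (y_{ij} - \bar{y}_i)^2}{m_i - 1}$$

Estimator of the population total $\tau$:

$$\hat{\tau} = M\hat{\mu} = \frac{N}{n}\sum_{i=1}^{n} M_i\bar{y}_i$$

$$\hat{V}(\hat{\tau}) = M^2\hat{V}(\hat{\mu}) = \left(1 - \frac{n}{N}\right)\left(\frac{N^2}{n}\right)s_b^2 + \frac{N}{n}\sum_{i=1}^{n} \left(M_i^2\left(1 - \frac{m_i}{M_i}\right)\left(\frac{s_i^2}{m_i}\right)\right)$$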

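Ratio Estimation (Use when $M$ is unknown):

$$\hat{\mu}_r = \frac{\sum_{i=1}^{n} M_i\bar{y}_i}{\sum_{i=1}^{n} M_i}$$

$$\hat{V}(\hat{\mu}_r) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{n\bar{M}^2}\right)s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} \left(M_i^2\left(1 - \frac{m_i}{M_i}\right)\left(\frac{s_i^2}{m_i}\right)\right)$$

$$s_r^2 = \frac{\sum_{i=1}^{n} M_i^2(\bar{y}_i - \hat{\mu}_r)^2}{n - 1}$$

Here $\bar{M}$ can be estimated by the average size of the sampled clusters, $\frac{1}{n}\sum_{i=1}^{n} M_i$.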

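Estimation of a Population Proportion:

With $\hat{p}_i$ the sample proportion in the $i$th sampled cluster and $\hat{q}_i = 1 - \hat{p}_i$:

$$\hat{p} = \frac{\sum_{i=1}^{n} M_i\hat{p}_i}{\sum_{i=1}^{n} M_i}$$

$$\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{n\bar{M}^2}\right)s_r^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{n} \left(M_i^2\left(1 - \frac{m_i}{M_i}\right)\left(\frac{\hat{p}_i\hat{q}_i}{m_i - 1}\right)\right)$$

$$s_r^2 = \frac{\sum_{i=1}^{n} M_i^2(\hat{p}_i - \hat{p})^2}{n - 1}$$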

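Sampling Equal-Sized Clusters: Suppose that each cluster contains $M$ elements; that is, $M_1 = M_2 = \dots = M_N = M$. In this case, it is common to take samples of equal size from each cluster, so that $m_1 = m_2 = \dots = m_n = m$. Under these conditions we can obtain estimates:

$$\hat{\mu} = \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}$$

$$\hat{V}(\hat{\mu}) = (1 - f_1)\left(\frac{MSB}{nm}\right) + f_1(1 - f_2)\left(\frac{MSW}{nm}\right), \qquad f_1 = \frac{n}{N}, \qquad f_2 = \frac{m}{M}$$

$$MSB = \frac{m}{n - 1}\sum_{i=1}^{n} (\bar{y}_i - \hat{\mu})^2$$

$$MSW = \left(\frac{1}{n(m-1)}\right)\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij} - \bar{y}_i)^2$$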

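Two-Stage Cluster Sampling with Probabilities Proportional to Size: Since the number of elements in a cluster may vary from cluster to cluster, it may be advantageous to sample clusters with probabilities proportional to their sizes. These estimates are for two-stage cluster sampling in which the first-stage sampling is carried out with probabilities proportional to size.

$$\hat{\mu}_{pps} = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i$$

$$\hat{V}(\hat{\mu}_{pps}) = \left(\frac{1}{n(n-1)}\right)\sum_{i=1}^{n} (\bar{y}_i - \hat{\mu}_{pps})^2$$

$$\hat{\tau}_{pps} = \frac{M}{n}\sum_{i=1}^{n} \bar{y}_i$$

$$\hat{V}(\hat{\tau}_{pps}) = \left(\frac{M^2}{n(n-1)}\right)\sum_{i=1}^{n} (\bar{y}_i - \hat{\mu}_{pps})^2$$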

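Estimating the Population Size

Direct sampling: First, a random sample of size $t$ is drawn from the population, and its members are tagged and released. At a later date a second sample of size $n$ is drawn. Let $s$ denote the number of tagged individuals observed in the second sample.

$$\hat{N} = \frac{nt}{s}, \qquad \hat{V}(\hat{N}) = \frac{t^2 n(n - s)}{s^3}$$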

Inverse sampling: First, an initial sample of $t$ individuals is drawn, tagged, and released. Later, random sampling is conducted until exactly $s$ tagged animals are recaptured. If the sample contains $n$ individuals, the proportion of tagged individuals is estimated by $\hat{p} = s/n$. Note that $s$ is fixed, and $n$ is random.

$$\hat{N} = \frac{t}{\hat{p}} = \frac{nt}{s}$$

$$\hat{V}(\hat{N}) = \frac{t^2 n(n - s)}{s^2(s + 1)}$$
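The two tag-recapture estimators differ only in the variance denominator, which the following Python sketch makes explicit. It assumes $\hat{N} = nt/s$ in both cases, with $\hat{V}(\hat{N}) = t^2 n(n-s)/s^3$ for direct sampling and $t^2 n(n-s)/(s^2(s+1))$ for inverse sampling. The function names and example counts are hypothetical.

```python
def direct_sampling_estimate(t, n, s):
    """t tagged initially, n in the second sample, s tagged recaptures (s random)."""
    N_hat = n * t / s
    var_hat = t ** 2 * n * (n - s) / s ** 3
    return N_hat, var_hat


def inverse_sampling_estimate(t, n, s):
    """t tagged initially, sampling continues until s tagged are seen (n random)."""
    N_hat = n * t / s
    var_hat = t ** 2 * n * (n - s) / (s ** 2 * (s + 1))
    return N_hat, var_hat
```

With `t=100`, `n=50`, and `s=10`, both functions estimate a population of 500; the direct-sampling variance estimate is 20000, and the inverse-sampling one is smaller because its denominator uses $s^2(s+1)$ instead of $s^3$.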

Choosing Sample Sizes for Direct and Inverse Sampling:

Let $p_1 = t/N$ and $p_2 = n/N$. It turns out that

$$\frac{V(\hat{N})}{N} \approx \frac{1 - p_1}{p_1 p_2}.$$

Thus, given an estimate of $N$ and a target value of the variance, one can determine either of the sampling fractions once the other has been chosen.

Estimating Population Density and Size from Quadrat Samples:

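Let $m_i$ denote the number of elements in quadrat $i$, $M$ = # of total elements in the population (having combined area $A$), and $\lambda = M/A$ = density of elements. Suppose the element counts are obtained from $n$ independently and randomly selected quadrats, each of area $a$.

$$\hat{\lambda} = \frac{\bar{m}}{a} = \frac{1}{an}\sum_{i=1}^{n} m_i, \qquad \hat{V}(\hat{\lambda}) = \frac{\hat{\lambda}}{an}$$

$$\hat{M} = \hat{\lambda}A, \qquad \hat{V}(\hat{M}) = A^2\hat{V}(\hat{\lambda}) = A^2\left(\frac{\hat{\lambda}}{an}\right)$$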

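Estimating Population Density and Size from Stocked Quadrats: A quadrat that contains the species of interest is said to be stocked. For a sample of $n$ quadrats, each of area $a$, from a population of area $A$, let $y$ denote the number of sampled quadrats that are NOT stocked.

$$\hat{\lambda} = -\left(\frac{1}{a}\right)\ln\left(\frac{y}{n}\right)$$

$$\hat{V}(\hat{\lambda}) = \frac{1}{na^2}\left(e^{\hat{\lambda}a} - 1\right)$$

$$\hat{M} = A\hat{\lambda}$$

$$\hat{V}(\hat{M}) = A^2\hat{V}(\hat{\lambda})$$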
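Both quadrat-based density estimates are easy to compute directly, as in this Python sketch. It assumes $\hat{\lambda} = \bar{m}/a$ with $\hat{V}(\hat{\lambda}) = \hat{\lambda}/(an)$ for counted quadrats, and $\hat{\lambda} = -\ln(y/n)/a$ with $\hat{V}(\hat{\lambda}) = (e^{\hat{\lambda}a} - 1)/(na^2)$ for stocked quadrats. The function names and example values are hypothetical.

```python
import math


def quadrat_density(counts, a):
    """counts[i] is the number of elements in sampled quadrat i, each of area a."""
    n = len(counts)
    lam_hat = sum(counts) / (n * a)
    var_hat = lam_hat / (a * n)
    return lam_hat, var_hat


def stocked_quadrat_density(n, y, a):
    """n sampled quadrats of area a, of which y are NOT stocked (0 < y <= n)."""
    lam_hat = -math.log(y / n) / a
    var_hat = (math.exp(lam_hat * a) - 1) / (n * a ** 2)
    return lam_hat, var_hat
```

In either case, multiplying the density estimate by the total area $A$ and the variance estimate by $A^2$ gives the estimate of population size $\hat{M}$ and its estimated variance. Note that the stocked-quadrat estimate is undefined when every sampled quadrat is stocked ($y = 0$).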

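Adaptive Sampling: Let $m_i$ denote the number of cells in the network associated with the $i$th sampled cell, and $y_i$ denote the total count of points of interest in that network, so that $\bar{y}_i = y_i / m_i$.

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i, \qquad \hat{V}(\hat{\mu}) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{n(n-1)}\right)\sum_{i=1}^{n} (\bar{y}_i - \hat{\mu})^2$$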

Supplemental Topics

Interpenetrating Subsamples: An experimenter is interested in obtaining information from a simple random sample of $n$ people selected from a population of size $N$. She has $k$ interviewers available to do the fieldwork, but the interviewers differ in their manner of interviewing and hence obtain slightly different responses from identical subjects. A good estimate of the population mean can be obtained by randomly dividing the $n$ sample elements into $k$ subsamples of $m$ elements each and assigning one interviewer to each of the $k$ subsamples. We consider the first subsample to be a simple random sample of size $m$ selected from the $n$ elements in the total sample. The second subsample is then a simple random sample of size $m$ selected from the remaining $(n - m)$ elements. This process is continued until the $n$ elements have been randomly divided into $k$ subsamples. The subsamples are called interpenetrating samples. Now, let $y_{ij}$ denote the $j$th observation in the $i$th subsample, where $j = 1, \dots, m$ and $i = 1, \dots, k$, and let $\bar{y}_i$ denote the mean of the $i$th subsample.

$$\hat{\mu} = \bar{y} = \frac{1}{k}\sum_{i=1}^{k} \bar{y}_i, \qquad \hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\left(\frac{s_k^2}{k}\right)$$

$$s_k^2 = \frac{\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2}{k - 1}$$

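Estimation of Means and Totals over Subpopulations: Let $N$ denote the number of elements in the population, and $N_1$ the number of elements in the subpopulation. A simple random sample of $n$ elements is selected from the population of $N$ elements. Let $n_1$ denote the number of sampled elements from the subpopulation. Let $y_{1j}$ denote the $j$th sampled observation that falls in the subpopulation. We will consider the sample mean for elements from the subpopulation.

$$\bar{y}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} y_{1j}, \qquad \hat{V}(\bar{y}_1) = \left(1 - \frac{n_1}{N_1}\right)\left(\frac{s_1^2}{n_1}\right)$$

$$s_1^2 = \frac{\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2}{n_1 - 1}$$

Estimator of the subpopulation total when $N_1$ is known:

$$\hat{\tau}_1 = N_1\bar{y}_1, \qquad \hat{V}(\hat{\tau}_1) = N_1^2\hat{V}(\bar{y}_1) = N_1^2\left(1 - \frac{n_1}{N_1}\right)\left(\frac{s_1^2}{n_1}\right)$$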

Estimator of the subpopulation total when $N_1$ is unknown:

$$\hat{\tau}_1 = \frac{N}{n}\sum_{j=1}^{n_1} y_{1j}, \qquad \hat{V}(\hat{\tau}_1) = N^2\left(1 - \frac{n}{N}\right)\left(\frac{s'^2}{n}\right)$$

Here $s'^2$ is the sample variance calculated from an adjusted sample, formed by replacing all the observations NOT from the subpopulation of interest with zeros. The sample variance is then calculated from all $n$ "observations".

Random-Response Model: It allows respondents to answer questions on sensitive issues (such as criminal behavior or sexuality) while maintaining confidentiality. Designate the people in the population with and without the characteristic of interest as groups A and B, respectively. Let $p$ be the proportion of people in group A, that is, with the characteristic of interest. We wish to estimate this parameter by starting with a stack of cards that are identical except that a fraction, $\theta$, are marked A and the remaining fraction, $(1 - \theta)$, are marked B. A simple random sample of size $n$ is selected from the population. Each person is asked to randomly draw a card from the deck and to state "yes" if the letter on the card agrees with the group to which they belong, or "no" if it does not. The card is replaced before the next person draws, and the interviewer does not see the cards, just the responses of yes and no. Let $n_1$ be the number of people in the sample who responded yes. We have the following unbiased estimator of $p$, valid for $\theta \neq 1/2$:

$$\hat{p} = \left(\frac{1}{2\theta - 1}\right)\left(\frac{n_1}{n}\right) - \left(\frac{1 - \theta}{2\theta - 1}\right)$$

$$\hat{V}(\hat{p}) = \frac{1}{(2\theta - 1)^2}\left(\frac{1}{n}\right)\left(\frac{n_1}{n}\right)\left(1 - \frac{n_1}{n}\right)$$
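The random-response estimator follows from the fact that a respondent says "yes" with probability $p\theta + (1 - p)(1 - \theta)$, which can be solved for $p$. A minimal Python sketch is given below. It assumes $\hat{p} = (n_1/n - (1 - \theta))/(2\theta - 1)$ and $\hat{V}(\hat{p}) = (n_1/n)(1 - n_1/n)/(n(2\theta - 1)^2)$; the function name and the example numbers are hypothetical.

```python
def randomized_response_estimate(n1, n, theta):
    """n1 'yes' answers out of n respondents; theta is the fraction of cards marked A."""
    if theta == 0.5:
        # With half the cards marked A, the yes-rate carries no information about p.
        raise ValueError("theta must differ from 0.5")
    yes_rate = n1 / n
    p_hat = (yes_rate - (1 - theta)) / (2 * theta - 1)
    var_hat = yes_rate * (1 - yes_rate) / (n * (2 * theta - 1) ** 2)
    return p_hat, var_hat
```

For example, with 80 "yes" answers from 200 respondents and $\theta = 0.75$, the yes-rate is 0.4 and the estimate is $(0.4 - 0.25)/0.5 = 0.3$. Values of $\theta$ close to $1/2$ protect privacy better but inflate the variance through the $(2\theta - 1)^2$ term.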