SJS SDI_16 1
Design of Statistical Investigations
Stephen Senn
Random Sampling I
SJS SDI_16 2
Simple Random Sample
Definition
If a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected the sampling procedure is called simple random sampling. The sample thus obtained is called a simple random sample.
Scheaffer, Mendenhall and Ott,
Elementary Survey Sampling, Fourth Edition
SJS SDI_16 3
Typical Use of Sample
• The typical use of a sample is to say something about a population mean or proportion
• Point estimate
  – mean, proportion
• Confidence interval
  – 95%, 99% etc.
• Occasionally we are interested in estimating totals
  – total weight, total value etc.
SJS SDI_16 4
[Figure: 95% CI for the mean for 100 samples of size 10 from a Normal(50, 4) population. Horizontal axis: sample (0 to 100); vertical axis: y (40 to 60).]
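A figure of this kind can be reproduced with a short simulation. The following is a minimal sketch in R (the open-source dialect of the S language), not the code behind the original plot. It assumes that the 4 in the caption is the variance, so the standard deviation is 2, and it uses t-based intervals; the seed and object names are arbitrary.

```r
# Sketch: 100 samples of size 10 from a normal population with mean 50.
# ASSUMPTION: the "4" in the figure caption is the variance, so sd = 2.
set.seed(1)                      # arbitrary seed, for reproducibility only
n.samples <- 100
n <- 10
mu <- 50
sigma <- 2

# One row per sample: lower and upper limits of a t-based 95% interval.
ci <- t(sapply(seq_len(n.samples), function(k) {
  y <- rnorm(n, mean = mu, sd = sigma)
  half <- qt(0.975, df = n - 1) * sd(y) / sqrt(n)   # interval half-width
  c(lower = mean(y) - half, upper = mean(y) + half)
}))

covered <- ci[, "lower"] <= mu & mu <= ci[, "upper"]
sum(covered)   # number of the 100 intervals that contain the true mean
```

In the long run about 95 of every 100 such intervals should cover the true mean, although any single run of 100 will vary.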
SJS SDI_16 5
With or without replacement?
The above definition is slightly more general than that which we encountered previously. It allows for sampling without replacement.
Our previous definition stressed independence. Strictly speaking, for any finite population, draws are not quite independent if they occur without replacement.
Why is this? Consider a sample of size two drawn from N without replacement. There are
$$\binom{N}{2} = \frac{N!}{2!\,(N-2)!} = \frac{N(N-1)}{2}$$
ways of choosing the sample. Hence, the probability of a given sample being chosen is
$$\frac{2}{N(N-1)}$$
SJS SDI_16 6
But an independence argument would produce a different answer. The probability of a given item being chosen is 1/N. Hence the probability of two given items being independently chosen in any order is
$$2 \times \frac{1}{N} \times \frac{1}{N} = \frac{2}{N^2} \neq \frac{2}{N(N-1)}$$
Note, however, that provided N is large compared to n, the distinction between sampling with and without replacement is unimportant.
This is fortunate, since correcting for sampling without replacement from finite populations involves a lot of tedious but elementary algebra!
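To see how quickly the distinction fades, the two probabilities for a sample of size two can be tabulated for a few population sizes. This is a small illustrative sketch in R; the helper name `pair.prob` and the chosen values of N are made up for the example.

```r
# Probability that a given pair is the sample of size two:
# without replacement versus the independence argument.
pair.prob <- function(N) {
  without <- 1 / choose(N, 2)        # 2 / (N (N - 1))
  independent <- 2 / N^2             # 2 * (1/N) * (1/N)
  c(N = N,
    without = without,
    independent = independent,
    ratio = independent / without)   # equals (N - 1) / N
}

t(sapply(c(5, 50, 500, 5000), pair.prob))
```

The ratio column is (N - 1)/N, which approaches 1 as N grows.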
SJS SDI_16 7
How Not to Draw a Simple Random Sample
• Do not use own judgement
  – This is haphazard sampling
  – Subject to psychological bias
  – Human beings are not good randomisers
• Do not use systematic sampling
  – There may be cyclic patterns or other trends in the population
SJS SDI_16 8
The Swiss Lottery
• Draw 6 from 45
  – 45C6 = 8,145,060 combinations
• Professor Hans Riedwyl’s study of a given draw
  – 16,862,596 tickets sold
  – approximately two tickets per choice
  – There were over 5000 combinations that were chosen more than 50 times!
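The arithmetic behind these figures can be checked directly, and it also shows how far the observed choices are from random. The sketch below is in R. The Poisson approximation is an assumption added for illustration and is not part of Riedwyl's study: it supposes every ticket is an independent, uniformly random choice among the combinations.

```r
combos <- choose(45, 6)            # number of ways to draw 6 from 45
tickets <- 16862596                # tickets sold in the draw studied
lambda <- tickets / combos         # mean number of tickets per combination

# ASSUMPTION (not from the slides): if punters chose uniformly at random,
# the count for one combination would be roughly Poisson(lambda).
p.over.50 <- ppois(50, lambda, lower.tail = FALSE)   # P(count > 50)
expected.over.50 <- combos * p.over.50               # expected such combinations

c(combos = combos, lambda = lambda, expected.over.50 = expected.over.50)
```

Under that assumption the expected number of combinations chosen more than 50 times is effectively zero, which is what makes the observed figure of over 5000 so striking.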
SJS SDI_16 9
The UK Lottery
• This is a 6/49 lottery
• In the first 282 draws
  – average jackpot £2 million
  – maximum £22.6 million
• Draw 9, January 1995
  – 133 people bought the winning combination
    • 7, 17, 23, 32, 38, 42
  – £122,510 each
Source: John Haigh, Taking Chances, Oxford
SJS SDI_16 10
Random Pattern?
 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45
46 47 48 49
Random, from the point of view of the lottery machine but evidently not to the punter!
SJS SDI_16 11
How to Choose a Random Sample
• Sampling frame of N population units with each item identified by a unique number 1 to N
• ‘Generate’ random number between 0 and 1
  – Using computer, random number table, randomising device
• Multiply by N and round up
• Select population member indicated
• Repeat n times
  – For sampling without replacement draw again if number is chosen twice
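The steps above can be sketched literally in R. The function name `draw.sample` and its arguments are hypothetical, introduced only for this illustration; in practice the built-in `sample` function does the same job.

```r
# Hypothetical helper (not from the slides): draw n labels from 1..N
# by the 'multiply by N and round up' recipe.
draw.sample <- function(N, n, replace = FALSE) {
  if (!replace && n > N) stop("cannot draw n > N units without replacement")
  chosen <- integer(0)
  while (length(chosen) < n) {
    u <- runif(1)                          # random number between 0 and 1
    id <- ceiling(u * N)                   # multiply by N and round up: 1..N
    if (!replace && id %in% chosen) next   # draw again if already chosen
    chosen <- c(chosen, id)
  }
  chosen
}

set.seed(16)                       # arbitrary seed, for reproducibility only
sort(draw.sample(N = 20, n = 10))  # ten distinct labels from 1..20
```

Redrawing on a repeat is simple but becomes slow when n is close to N, since most draws are then rejected.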
SJS SDI_16 12
The S-PLUS Approach
> # To illustrate different approaches to sampling
> N <- 20   # Size of population
> n <- 10   # Size of sample
> Identify <- c(1:N)   # Population identifiers
> Identify
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> # Sample with replacement
> sort(sample(Identify, size = n, replace = T))
 [1]  6  7  7  7 12 14 16 16 19 19
> # Sample without replacement
> sort(sample(Identify, size = n, replace = F))
 [1]  2  8 11 12 13 14 16 17 18 20
SJS SDI_16 13
Finite Population Correction Factors
In practice we very rarely carry out sampling with replacement.
EXCEPTION The bootstrap - re-sampling investigation of properties of statistics.
However, often populations are large compared to samples. Hence we can behave as if draws were independent. The theory of sampling with replacement applies. In the next few slides we consider what happens when the population is not large.
Let $y_1, y_2, \ldots, y_n$ be a simple random sample from a population of values $u_1, u_2, \ldots, u_N$.
SJS SDI_16 14
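$$E(y_i) = \sum_{i=1}^{N} u_i / N = \mu, \qquad V(y_i) = \sum_{i=1}^{N} (u_i - \mu)^2 / N = \sigma^2$$
$$E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \mu$$
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} u_i^2 - \mu^2, \qquad \mu^2 = \left(\frac{1}{N}\sum_{i=1}^{N} u_i\right)^2$$
For $i \neq j$,
$$\begin{aligned}
\mathrm{Cov}(y_i, y_j) &= E[(y_i - \mu)(y_j - \mu)] = E(y_i y_j) - E(y_i)E(y_j) = E(y_i y_j) - \mu^2 \\
&= \frac{1}{N(N-1)}\sum_{i \neq j} u_i u_j - \mu^2 \\
&= \frac{1}{N(N-1)}\sum_{i=1}^{N}\sum_{j \neq i} u_i u_j - \left(\frac{1}{N}\sum_{i=1}^{N} u_i\right)^2 \\
&= \frac{1}{N(N-1)}\left[\left(\sum_{i=1}^{N} u_i\right)^2 - \sum_{i=1}^{N} u_i^2\right] - \frac{1}{N^2}\left(\sum_{i=1}^{N} u_i\right)^2
\end{aligned}$$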
This section based closely on Scheaffer, Mendenhall and Ott
SJS SDI_16 15
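$$\begin{aligned}
\mathrm{Cov}(y_i, y_j) &= \frac{1}{N(N-1)}\left[\left(\sum_{i=1}^{N} u_i\right)^2 - \sum_{i=1}^{N} u_i^2\right] - \frac{1}{N^2}\left(\sum_{i=1}^{N} u_i\right)^2 \\
&= \left[\frac{1}{N(N-1)} - \frac{1}{N^2}\right]\left(\sum_{i=1}^{N} u_i\right)^2 - \frac{1}{N(N-1)}\sum_{i=1}^{N} u_i^2 \\
&= \frac{1}{N^2(N-1)}\left(\sum_{i=1}^{N} u_i\right)^2 - \frac{1}{N(N-1)}\sum_{i=1}^{N} u_i^2 \\
&= -\frac{1}{N-1}\left[\frac{1}{N}\sum_{i=1}^{N} u_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} u_i\right)^2\right] \\
&= -\frac{\sigma^2}{N-1}
\end{aligned}$$
We can use this fact to find the variance of $\bar{y}$.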
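The covariance result, Cov(y_i, y_j) = -sigma^2/(N - 1) for two different draws made without replacement, can be checked by brute force on a tiny population. This is a sketch in R; the population values are made up for the illustration.

```r
# Sketch: enumerate every ordered pair of distinct units and compare the
# exact covariance of the two draws with -sigma^2 / (N - 1).
u <- c(2, 3, 5, 7, 11)                  # made-up population values
N <- length(u)
mu <- mean(u)
sigma2 <- mean((u - mu)^2)              # population variance, divisor N

# All N * (N - 1) equally likely ordered pairs (first draw, second draw).
pairs <- expand.grid(first = seq_len(N), second = seq_len(N))
pairs <- pairs[pairs$first != pairs$second, ]
cov.exact <- mean((u[pairs$first] - mu) * (u[pairs$second] - mu))

c(enumerated = cov.exact, formula = -sigma2 / (N - 1))
```

The negative sign reflects the fact that drawing a large value first leaves slightly smaller values, on average, for the second draw.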
SJS SDI_16 16
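$$\begin{aligned}
V(\bar{y}) &= \frac{1}{n^2} V\left(\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2}\left[\sum_{i=1}^{n} V(y_i) + 2\sum_{i<j} \mathrm{Cov}(y_i, y_j)\right] \\
&= \frac{1}{n^2}\left[n\sigma^2 - 2\,\frac{n(n-1)}{2}\,\frac{\sigma^2}{N-1}\right] \\
&= \frac{\sigma^2}{n}\left[1 - \frac{n-1}{N-1}\right] \\
&= \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)
\end{aligned}$$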
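The formula V(ybar) = (sigma^2/n)(N - n)/(N - 1) can be verified exactly on a small population by listing every possible sample. The following R sketch uses made-up population values and relies on `combn` to enumerate the samples.

```r
# Sketch: exact variance of the sample mean under simple random sampling
# without replacement, by enumerating all choose(N, n) samples.
u <- c(2, 3, 5, 7, 11, 13)              # made-up population values
N <- length(u)
n <- 3
sigma2 <- mean((u - mean(u))^2)         # population variance, divisor N

ybar <- combn(u, n, FUN = mean)         # mean of every possible sample
v.exact <- mean((ybar - mean(ybar))^2)  # each sample is equally likely

v.formula <- (sigma2 / n) * (N - n) / (N - 1)
v.with.replacement <- sigma2 / n        # what independence would give

c(exact = v.exact, formula = v.formula, with.replacement = v.with.replacement)
```

With n equal to half of N, as here, the correction is far from negligible.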
SJS SDI_16 17
Variance estimation
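$$\begin{aligned}
E(s^2) &= E\left[\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2\right] = \frac{1}{n-1} E\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right] \\
&= \frac{1}{n-1} E\left[\sum_{i=1}^{n} (y_i - \mu)^2 - n(\bar{y} - \mu)^2\right] \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i - \mu)^2 - nE(\bar{y} - \mu)^2\right] \\
&= \frac{1}{n-1}\left[n\sigma^2 - nV(\bar{y})\right] = \frac{1}{n-1}\left[n\sigma^2 - n\,\frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)\right]
\end{aligned}$$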
SJS SDI_16 18
When Can we Ignore FPCFs?
• Large population relative to sample
  – N/n is large
• The sample does not form part of the population for which we are issuing the prediction
  – Destructive sampling of manufacturing output
• From now on we shall ignore FPCFs
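To give a feel for what "N/n is large" means, the factor (N - n)/(N - 1) that multiplies sigma^2/n can be tabulated. This is an illustrative R sketch; the function name `fpcf`, the sample size, and the population sizes are all made up for the example.

```r
# Sketch: size of the correction factor (N - n) / (N - 1) for a fixed
# sample size and a range of population sizes.
fpcf <- function(N, n) (N - n) / (N - 1)

n <- 100
N <- c(200, 1000, 10000, 1e6)
data.frame(N = N, ratio = N / n, fpcf = round(fpcf(N, n), 4))
```

Once the population is a hundred or more times the sample size, the factor is within about one percent of 1.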
SJS SDI_16 19
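$$\begin{aligned}
E(s^2) &= \frac{1}{n-1}\left[n\sigma^2 - \sigma^2\left(\frac{N-n}{N-1}\right)\right] \\
&= \frac{\sigma^2}{n-1}\left[\frac{n(N-1) - (N-n)}{N-1}\right] = \frac{\sigma^2}{n-1}\cdot\frac{N(n-1)}{N-1} = \frac{N}{N-1}\,\sigma^2 \\
E\left[\frac{s^2}{n}\left(\frac{N-n}{N}\right)\right] &= \frac{N-n}{Nn}\cdot\frac{N}{N-1}\,\sigma^2 = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = V(\bar{y})
\end{aligned}$$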
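Both results, E(s^2) = N sigma^2/(N - 1) and the unbiasedness of (s^2/n)(N - n)/N for V(ybar), can be confirmed exactly by enumeration. The R sketch below uses made-up population values; `var` is R's sample variance with divisor n - 1.

```r
# Sketch: exact expectation of s^2 and of the variance estimator
# (s^2 / n) * (N - n) / N over all choose(N, n) equally likely samples.
u <- c(2, 3, 5, 7, 11, 13)              # made-up population values
N <- length(u)
n <- 3
sigma2 <- mean((u - mean(u))^2)         # population variance, divisor N

s2 <- combn(u, n, FUN = var)            # s^2 for every possible sample
e.s2 <- mean(s2)                        # exact E(s^2)

v.hat <- (s2 / n) * (N - n) / N         # estimator of V(ybar), per sample
v.ybar <- (sigma2 / n) * (N - n) / (N - 1)

c(E.s2 = e.s2, N.sigma2.over.N.minus.1 = N * sigma2 / (N - 1),
  E.v.hat = mean(v.hat), V.ybar = v.ybar)
```

The first pair of numbers agree, as do the second pair, so s^2 slightly overestimates sigma^2 while the corrected estimator is on target for V(ybar).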
SJS SDI_16 20
Error Bounds and Sample Size
It is traditional to use error bounds of two standard errors.
This is a way of giving an impression of the precision of the survey.
The desired error bound, $\delta$, can be used to fix the size of the sample.
$$\delta = \frac{2\sigma}{\sqrt{n}}, \qquad \delta^2 = \frac{4\sigma^2}{n}, \qquad n = \frac{4\sigma^2}{\delta^2}$$
This is the appropriate formula for the sample size given a desired bound on the mean.
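As a worked illustration, setting the bound equal to two standard errors, 2 sigma / sqrt(n), and solving for n gives n = 4 sigma^2 / bound^2. A minimal R sketch follows; the function name `sample.size` and the input values are hypothetical, and the finite population correction is ignored as stated earlier.

```r
# Hypothetical helper: smallest n for which 2 * sigma / sqrt(n) <= bound,
# ignoring the finite population correction.
sample.size <- function(sigma, bound) {
  ceiling(4 * sigma^2 / bound^2)   # round up to a whole number of units
}

sample.size(sigma = 2, bound = 0.5)   # made-up inputs
```

In practice sigma is unknown and has to be replaced by a guess, for example from a pilot study or earlier survey, so the resulting n is only as good as that guess.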
SJS SDI_16 21
Questions
• How many people do you have to have in a room before the probability that at least two share the same birthday is at least 1/2?
• Suppose we want to estimate the total in a population rather than the mean. What is the error bound on the total?
• What is the error bound for a population proportion?
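For the birthday question, a direct calculation is one way to check an answer. The R sketch below assumes 365 equally likely birthdays, no leap years, and independence between people; these assumptions are not stated in the question.

```r
# ASSUMPTIONS: 365 equally likely birthdays, no leap years, independent people.
k <- 1:60                                # candidate numbers of people
# P(all k birthdays differ) = (365/365) * (364/365) * ... * ((365 - k + 1)/365)
p.all.differ <- sapply(k, function(m) prod((365 - seq_len(m) + 1) / 365))
p.shared <- 1 - p.all.differ             # P(at least two share a birthday)

min(k[p.shared >= 0.5])                  # smallest room size reaching 1/2
```

The same complement trick, computing the probability that all draws differ, also gives the chance of a repeat when sampling with replacement from a frame of N units.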