Stats Lecture 06 Sampling Distributions
8/3/2019 Stats Lecture 06 Sampling Distributions
Sampling Distributions
for
Means and Proportions
Quantitative Methods for Economics
Dr. Katherine Sauer
Metropolitan State College of Denver
-
Chapter Overview:
I. Sampling Distributions: Means
II. The Central Limit Theorem
III. The Normal Distribution
IV. Sampling Distributions: Proportions
V. Desirable Properties of Estimators
-
We can use sample statistics to infer things about the population
parameters.
Sometimes the sample statistic (e.g. mean) will be close to the population parameter, sometimes it will not.
Recall: Greek letters are used for the population, English letters are
used for the corresponding sample characteristic.
-
Let's start with some review.
Suppose our population consists of five numbers.
3, 1, 5, 6, 2
Calculate the population mean, variance, and standard deviation.
$\mu = 3.4$
$\sigma^2 = 3.44$
$\sigma = 1.8547$
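The review calculation can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative and not part of the lecture.

```python
from statistics import fmean, pvariance, pstdev

population = [3, 1, 5, 6, 2]

mu = fmean(population)            # population mean
sigma_sq = pvariance(population)  # population variance (divides by N, not N - 1)
sigma = pstdev(population)        # population standard deviation

print(round(mu, 4), round(sigma_sq, 4), round(sigma, 4))
```

Note that the population versions of the functions are used, because the five numbers are the whole population rather than a sample.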
-
I. Sampling Distributions: Means
Suppose we want a sample of size 2. In the table, list all possible combinations of samples of size 2.
Repeat for a sample of size 3.
Sample of 2    Sample of 3
3,1            3,1,5
3,5            3,1,6
3,6            3,1,2
3,2            3,5,6
1,5            3,5,2
1,6            3,6,2
1,2            1,5,6
5,6            1,5,2
5,2            1,6,2
6,2            5,6,2
-
Now, calculate each sample's mean.
Sample of 2 Mean Sample of 3 Mean
3,1 2 3,1,5 3
3,5 4 3,1,6 3.33
3,6 4.5 3,1,2 2
3,2 2.5 3,5,6 4.67
1,5 3 3,5,2 3.33
1,6 3.5 3,6,2 3.67
1,2 1.5 1,5,6 4
5,6 5.5 1,5,2 2.67
5,2 3.5 1,6,2 3
6,2 4 5,6,2 4.33
Notice that the sample means vary from the population mean.
from 1.5 to 5.5 for sample of n=2
from 2 to 4.67 for sample of n=3
Depending on the sample chosen, the sample mean could be a
good estimate of the population mean, or not.
-
Let's calculate the mean of all the sample means ($\mu_{\bar{x}}$) and the standard
deviation of the sample means ($\sigma_{\bar{x}}$).
                 Samples of size 2    Samples of size 3
mean             3.4                  3.4
standard dev.    1.1358               0.7572
The mean of all the sample means is the same as the population mean.
The standard deviation of all the sample means decreases as the
sample size increases.
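The enumeration above can be reproduced with a minimal sketch, assuming sampling without replacement where order does not matter (which is what the tables list). The helper name `sample_means` is hypothetical.

```python
from itertools import combinations
from statistics import fmean, pstdev

population = [3, 1, 5, 6, 2]

def sample_means(n):
    """Means of every possible sample of size n drawn without replacement."""
    return [fmean(sample) for sample in combinations(population, n)]

for n in (2, 3):
    means = sample_means(n)
    # pstdev treats the full list of sample means as a complete population
    print(n, len(means), round(fmean(means), 4), round(pstdev(means), 4))
```

Each row printed gives the sample size, the number of possible samples, the mean of the sample means, and their standard deviation.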
-
The standard deviation of all the sample means is called the
standard error of the mean.
It can be calculated directly from the samples (as we just did) or
by using the formula when the population standard deviation is
known:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
$\sqrt{\frac{N-n}{N-1}}$ is the finite population correction factor.
When N is large, this factor is approximately 1.
- if the sample size is less than 5% of the
population size, you don't need the
correction factor (finite populations)
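As a rough check of the formula against the five-number population, the sketch below applies the correction factor only when a population size is supplied. The function name `standard_error` and its optional `N` argument are assumptions made for illustration.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population correction when N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

sigma = sqrt(3.44)  # population standard deviation of 3, 1, 5, 6, 2

print(round(standard_error(sigma, 2, N=5), 4))
print(round(standard_error(sigma, 3, N=5), 4))
```

The two printed values should match the standard deviations found directly from the lists of sample means.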
-
The difference between the population mean and its point estimate
is called the sampling error.
If point estimates are the same as the population parameters, there
is no sampling error and the standard error is zero.
-
A probability distribution is a list of every possible outcome with the
corresponding probability.
For our example, there are 10 possible samples of size 2. The
probability of each sample being selected is 0.10.
Let's plot the probability distribution of our sample means.
Step 1: Construct a frequency distribution table.
- 3 intervals is probably appropriate
- 1.5 to less than 3, 3 to less than 4.5, 4.5 to less than 6
Interval                 Frequency
$1.5 \le \bar{x} < 3$    3
$3 \le \bar{x} < 4.5$    5
$4.5 \le \bar{x} < 6$    2
-
Step 2: Calculate the relative frequencies.
relative frequency = probability of a particular sample × frequency = 0.10 × frequency
Interval                 Frequency    Relative Frequency
$1.5 \le \bar{x} < 3$    3            0.3
$3 \le \bar{x} < 4.5$    5            0.5
$4.5 \le \bar{x} < 6$    2            0.2
Step 3: Plot the probability histogram.
[Figure: probability histogram "Distribution of Sample Means"; horizontal axis: Sample Mean, in the three intervals from the table; vertical axis: Relative Frequency, from 0 to 0.6]
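Steps 1 and 2 can be sketched in code as follows. The interval boundaries are taken from the slide, with each interval assumed to include its lower bound and exclude its upper bound; the variable names are illustrative.

```python
from itertools import combinations
from statistics import fmean

population = [3, 1, 5, 6, 2]
means = [fmean(s) for s in combinations(population, 2)]

# Intervals are closed on the left and open on the right: lower <= mean < upper
intervals = [(1.5, 3.0), (3.0, 4.5), (4.5, 6.0)]
frequencies = [sum(lo <= m < hi for m in means) for lo, hi in intervals]

# Each of the 10 samples is equally likely, so each carries probability 1/10
relative = [f / len(means) for f in frequencies]

for (lo, hi), f, r in zip(intervals, frequencies, relative):
    print(f"{lo} <= mean < {hi}: frequency {f}, relative frequency {r}")
```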
-
[Figure: the "Distribution of Sample Means" probability histogram, shown again; horizontal axis: Sample Mean; vertical axis: Relative Frequency]
Notice, even for this very small population and sample size, the
probability distribution is tending toward the bell shape of the
Normal Distribution.
-
II. The Central Limit Theorem says that the probability distribution
of the sample means
- for samples of size 30 or greater
- selected from any population whose mean and variance
are known
approaches a Normal distribution
with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
The distribution of sample means for sample
sizes of $n \ge 30$:
$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
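A small simulation can illustrate the theorem. This is only a sketch under stated assumptions: the population is taken to be exponential with rate 1 (a skewed distribution with mean 1 and standard deviation 1, chosen here purely for illustration), and the sample size, number of samples, and seed are arbitrary.

```python
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 36              # sample size (at least 30)
num_samples = 2000  # number of samples drawn

# Exponential population with rate 1: strongly skewed, mean 1, standard deviation 1
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

print(round(fmean(sample_means), 3))  # should be near mu = 1
print(round(stdev(sample_means), 3))  # should be near sigma / sqrt(n) = 1/6
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from Normal.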
-
In addition, the Central Limit Theorem applies for small samples
from Normal populations, when the population variance is
known.
$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ for samples of any size from a Normal
distribution with known variance.
The Central Limit Theorem allows us to calculate
- probabilities regarding sample means
- the limits that contain various percentages of sample
means
(later it will also help us construct confidence intervals)
-
III. The Normal probability distribution
It has long been recognized that large numbers of measurements,
when sorted and plotted in a histogram, tend to take on a bell shape.
This bell-shaped curve is the Normal probability distribution
curve.
Formula:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
This formula would trace out a bell-curve, symmetrical around
the mean of $\mu$.
The area under the curve sums to 1.
- true of any probability distribution
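To see that the formula behaves as described, the sketch below evaluates it and approximates the area under the curve numerically. The functions `normal_pdf` and `area`, the midpoint rule, and the example values of the mean and standard deviation are all assumptions made for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the Normal curve at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

def area(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate area under the curve between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

mu, sigma = 10.0, 2.0  # arbitrary example values

total = area(mu - 8 * sigma, mu + 8 * sigma, mu, sigma)  # essentially the whole curve
left_half = area(mu - 8 * sigma, mu, mu, sigma)          # everything left of the mean

print(round(total, 6), round(left_half, 6))
```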
-
The probability that a random variable, X, has a value between x = a
and x = b is given by the area under the curve between x = a and
x = b:
$P(a \le X \le b) = \int_a^b f(x)\,dx$, where $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
However, we actually don't need to do the integration because the
Normal curve has some special characteristics that let us find the
area from a single table.
-
Special properties of the Normal distribution:
1. Total area under the curve is one. (true of any probability
distribution)
2. The curve is symmetrical about the mean.
- the area to the left of the mean is 0.5
- the area to the right of the mean is 0.5
3. The area under the curve between the mean and any point
depends on the number of standard deviations between the point and the mean.
- the Z-score is the number of standard deviations
between the point and the mean
-
The area between the mean and a point which is one standard
deviation from the mean is 0.3413.
68.26% of the total area is within one standard deviation
The area between the mean and a point which is two standard
deviations from the mean is 0.4772.
95.44% of the total area is within two standard deviations
The area between the mean and a point which is three standard
deviations from the mean is 0.4986.
99.72% of the total area is within three standard deviations
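These areas can also be computed directly instead of read from a table. The sketch below assumes Python's `statistics.NormalDist`; because it does not round intermediate table values, its results may differ from the figures above in the last digit.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    one_side = z.cdf(k) - 0.5      # area between the mean and k standard deviations
    within = z.cdf(k) - z.cdf(-k)  # area within k standard deviations on both sides
    print(k, round(one_side, 4), round(within, 4))
```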
-
[Figure: Normal curve with the area in each one-standard-deviation band on either side of the mean: 0.3413 (34%), 0.135 (13.5%), 0.0235 (2.35%), 0.0015 (0.15%)]
-
The Z-score is calculated as
$Z = \frac{x - \mu}{\sigma}$
Z is the number of standard deviations between the point (x) and
the mean.
Calculate Z to two decimal places.
Once you have Z, use a Normal probability distribution table to find the area under the curve.
-
Here is an excerpt from the table.
Ex: Z = 1.00
Area in upper tail = 0.1587
Area between $\mu$ and $\mu + \sigma$ = 0.5 - 0.1587 = 0.3413
-
Example: Suppose the time it takes to process an email inquiry is
normally distributed with a mean time of 500 seconds and a
standard deviation of 10 seconds. What is the probability that a
selected email will be processed in more than 505 seconds?
Step 1: Sketch the curve and indicate relevant information.
Step 2: Calculate Z.
Z = (505 - 500) / 10 = 0.5
Step 3: Look up in table.
-
When Z = 0.5, the area in the upper tail is 0.3085.
-
The probability that an email will take more than 505 seconds to
process is 0.3085.
-
What if instead we wanted to know the probability that processing
an email will take less than 485 seconds?
Step 1: Sketch the curve and indicate relevant information.
Step 2: Calculate Z.
Z = (485 - 500) / 10 = -1.5
Step 3: Look up in table.
For Z = 1.5 the area in the tail is 0.0668. By symmetry, the area in the lower tail below Z = -1.5 is the same.
The probability that processing an email will take less than 485
seconds is 0.0668.
-
What if instead we wanted to know the probability that processing
an email will take between 485 and 505 seconds?
Step 1: Sketch the curve and indicate relevant information.
Step 2: Calculate Z.
Z1 = (485 - 500) / 10 = -1.5    Z2 = (505 - 500) / 10 = 0.5
Step 3: Look up in the table.
For Z = 1.5, area in tail is 0.0668
For Z = 0.5, area in tail is 0.3085
Subtract both tail areas from 1: 1 - 0.0668 - 0.3085 = 0.6247
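The three email probabilities can be reproduced without a table. This is a sketch assuming Python's `statistics.NormalDist`, with the mean of 500 seconds and standard deviation of 10 seconds from the example; the variable names are illustrative.

```python
from statistics import NormalDist

processing_time = NormalDist(mu=500, sigma=10)  # seconds

p_more_than_505 = 1 - processing_time.cdf(505)  # upper tail, Z = 0.5
p_less_than_485 = processing_time.cdf(485)      # lower tail, Z = -1.5
p_between = processing_time.cdf(505) - processing_time.cdf(485)

print(round(p_more_than_505, 4), round(p_less_than_485, 4), round(p_between, 4))
```

Here the area between the two points is found as a difference of cumulative areas, which is equivalent to subtracting both tails from 1.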
-
Example: An importer of Herbs and Spices claims that the average
weight of packets of saffron is 20 grams. However, packets are
actually filled to an average weight of 19.5 grams with a standard
deviation of 1.8 grams. A random sample of 36 packets is selected.
Find the probability that the average weight is 20 grams or more.
In this example we are dealing with a sample of size n > 30. We'll
apply the CLT and calculate the mean and standard error of the distribution of means.
For our sample, $\mu_{\bar{x}} = \mu = 19.5$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.8}{\sqrt{36}} = 0.3$
-
Step 1: Sketch the curve and indicate relevant information.
Step 2: Calculate Z (using the calculated mean and std error).
Z = (20 - 19.5) / 0.3 = 1.67
Step 3: Look up Z in the table.
For Z = 1.67, the area in the tail is 0.0475
This is the probability that the average weight is 20 grams or more.
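The saffron calculation can be sketched as below, assuming Python's `statistics.NormalDist`. Because the code does not round Z to two decimals before finding the area, its probability may differ slightly from the table value of 0.0475.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 19.5, 1.8, 36
standard_error = sigma / sqrt(n)  # standard error of the mean

z = (20 - mu) / standard_error
p_20_or_more = 1 - NormalDist(mu, standard_error).cdf(20)

print(round(standard_error, 4), round(z, 2), round(p_20_or_more, 4))
```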
-
Instead, let's find the lower and upper limits between which the weights of 95% of all
packets fall.
In this case, we are dealing with the population, not the sample. Use
the population mean and standard deviation.
Step 1: Sketch the curve and indicate relevant information.
[Figure: Normal curve centered at $\mu = 19.5$, with an area of 0.025 in the tail above the upper limit and an area of 0.025 in the tail below the lower limit]
Step 2: Look up the Z that corresponds to a tail area of 0.025.
-
When the area of the upper tail is 0.025, Z = 1.96.
-
[Figure: Normal curve centered at $\mu = 19.5$ with an area of 0.025 in each tail; the lower limit 15.972 and the upper limit 23.028 are marked]
Step 3: Find the upper and lower limits.
Z × $\sigma$ = number of units from the mean
1.96 × 1.8 = 3.528 grams
19.5 + 3.528 = 23.028 grams is the upper limit
19.5 - 3.528 = 15.972 grams is the lower limit
95% of the packets of saffron are between 15.972 and 23.028
grams in weight.
-
Instead, let's calculate the two limits within which 95% of all
average weights fall.
Now we are dealing with the sample of n = 36.
The methodology is the same as when we use the entire population,
except we'll use the standard error of the means instead of the
standard deviation for the population.
Step 1: Sketch the curve and indicate relevant information.
[Figure: Normal curve centered at $\mu_{\bar{x}} = 19.5$, with an area of 0.025 in the tail above the upper limit and an area of 0.025 in the tail below the lower limit]
$\mu_{\bar{x}} = 19.5$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.8}{\sqrt{36}} = 0.3$
-
Step 2: Look up the Z that corresponds to a tail area of 0.025.
Z = 1.96
Step 3: Find the upper and lower limits.
Z × $\sigma_{\bar{x}}$ = number of units from the mean
1.96 × 0.3 = 0.588 grams
19.5 + 0.588 = 20.088 grams is the upper limit
19.5 - 0.588 = 18.912 grams is the lower limit
95% of the samples' average weights are between 18.912 and 20.088
grams.
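Both sets of 95% limits follow the same recipe and differ only in the spread used. The sketch below assumes Python's `statistics.NormalDist` for the Z lookup; the helper `central_limits` is a hypothetical name introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def central_limits(mean, spread, coverage=0.95):
    """Limits that enclose the central `coverage` share of a Normal distribution."""
    tail = (1 - coverage) / 2
    z = NormalDist().inv_cdf(1 - tail)  # about 1.96 for a tail area of 0.025
    return mean - z * spread, mean + z * spread

mu, sigma, n = 19.5, 1.8, 36

packet_limits = central_limits(mu, sigma)          # individual packets
mean_limits = central_limits(mu, sigma / sqrt(n))  # averages of 36 packets

print([round(v, 3) for v in packet_limits])
print([round(v, 3) for v in mean_limits])
```

The limits for sample averages are much narrower than those for individual packets because the standard error is the standard deviation divided by the square root of n.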
-
IV. Sampling Distributions: Proportions
A proportion is the number of elements with a given characteristic
divided by the total number of elements in the group.
ex: The proportion of people who vote in an election is
the number who vote divided by the number eligible to
vote.
Population proportion: $\pi = \frac{X}{N}$    Sample proportion: $p = \frac{x}{n}$
X or x are the number of elements with a given characteristic.
Often proportions are quoted as percentages.
The sample proportion is a point estimate of the population
proportion.
-
Example: Suppose we have the following population of data.
3, 1, 5, 6, 2
Calculate the population proportion of even numbers.
$\pi = \frac{2}{5} = 0.4$
Referring back to our samples of size 2 and 3, calculate the sample
proportion of even numbers.
-
Sample of 2    Sample Proportion    Sample of 3    Sample Proportion
3,1            0                    3,1,5          0
3,5            0                    3,1,6          1/3 = 0.33
3,6            1/2 = 0.5            3,1,2          1/3 = 0.33
3,2            1/2 = 0.5            3,5,6          1/3 = 0.33
1,5            0                    3,5,2          1/3 = 0.33
1,6            1/2 = 0.5            3,6,2          2/3 = 0.67
1,2            1/2 = 0.5            1,5,6          1/3 = 0.33
5,6            1/2 = 0.5            1,5,2          1/3 = 0.33
5,2            1/2 = 0.5            1,6,2          2/3 = 0.67
6,2            2/2 = 1              5,6,2          2/3 = 0.67
Calculate the mean of all sample proportions for each sample size.
The mean of all the sample proportions is the same as the population proportion.
For samples of size 2: $\mu_p = 0.4$
For samples of size 3: $\mu_p = 0.4$
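The sample proportions and their mean can be checked with a short sketch. The helper `proportion_even` is a hypothetical name, and the samples are again enumerated without replacement.

```python
from itertools import combinations
from statistics import fmean

population = [3, 1, 5, 6, 2]

def proportion_even(values):
    """Share of the values that are even numbers."""
    return sum(v % 2 == 0 for v in values) / len(values)

pi = proportion_even(population)  # population proportion

for n in (2, 3):
    proportions = [proportion_even(s) for s in combinations(population, n)]
    print(n, round(fmean(proportions), 4))
```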
-
The standard deviation of all the sample proportions decreases as
the sample size increases.
The standard error of all sample proportions is given by
$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \sqrt{\frac{N-n}{N-1}}$
(when N is large, we can omit the finite population correction
factor)
For samples of size 2:
$\sigma_p = \sqrt{\frac{0.4(1-0.4)}{2}} \sqrt{\frac{5-2}{5-1}}$
= (0.3464)(0.8660)
= 0.29998
≈ 0.3
-
For samples of size 3:
$\sigma_p = \sqrt{\frac{0.4(1-0.4)}{3}} \sqrt{\frac{5-3}{5-1}}$
= (0.2828)(0.7071)
= 0.199967
≈ 0.2
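As a check, the formula can be compared with the standard deviation computed directly from the lists of sample proportions. The function name `se_proportion` is an assumption made for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import pstdev

population = [3, 1, 5, 6, 2]
N = len(population)
pi = sum(v % 2 == 0 for v in population) / N  # population proportion of even numbers

def se_proportion(pi, n, N):
    """Standard error of the sample proportion with the finite population correction."""
    return sqrt(pi * (1 - pi) / n) * sqrt((N - n) / (N - 1))

for n in (2, 3):
    proportions = [sum(v % 2 == 0 for v in s) / n for s in combinations(population, n)]
    # The formula and the direct calculation should agree
    print(n, round(se_proportion(pi, n, N), 4), round(pstdev(proportions), 4))
```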
-
The list of every possible sample proportion with its probability
is called the sampling distribution of proportions.
Let's plot the probability distribution of our proportions for the
samples of size 2.
Step 1: Construct a frequency distribution table.
- we only have 3 values for p (0, 0.5, 1)
p      Frequency
0      3
0.5    6
1      1
-
Step 2: Calculate the relative frequency distribution. (probability
distribution)
relative frequency = probability of a particular sample × frequency = 0.10 × frequency
p      Frequency    Relative Frequency
0      3            0.3
0.5    6            0.6
1      1            0.1
Step 3: Plot the probability histogram.
-
Notice, even for this very small population and sample size, the
probability distribution is tending toward the bell shape of the
Normal Distribution.
For samples of size 30 or greater the distribution of
sample proportions is approximately Normal with
mean $\pi$ and standard deviation $\sqrt{\frac{\pi(1-\pi)}{n}}$.
The distribution of sample proportions
for sample sizes of $n \ge 30$:
$p \sim N\left(\pi, \sqrt{\frac{\pi(1-\pi)}{n}}\right)$
-
Example: In a certain neighborhood, it is known that 12% of
people age 16 to 24 are unemployed. If a random sample of 150
people age 16 to 24 is selected, what is the probability that the
sample contains at most 10% unemployed?
Step 1: Calculate $\mu_p$ and $\sigma_p$.
In this case, n = 150.
If 12% of the population is unemployed then $\pi = 0.12$.
$\mu_p = \pi = 0.12$
$\sigma_p = \sqrt{\frac{0.12(1-0.12)}{150}} = 0.0265$
-
Step 2: Sketch the curve and indicate relevant information.
[Figure: Normal curve centered at $\mu_p = 0.12$, with the point 0.10 marked]
Step 3: Calculate Z
$Z = \frac{p - \mu_p}{\sigma_p} = \frac{0.10 - 0.12}{0.0265} = -0.7547 \approx -0.75$
Step 4: Look up Z in the table.
Area in tail is 0.2266.
The probability that at most 10% of the sample is unemployed is
0.2266.
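The unemployment example can be sketched as below, assuming Python's `statistics.NormalDist` and the Normal approximation for proportions used in the lecture. The code keeps the unrounded standard error and Z, so its probability may differ slightly from the table value of 0.2266.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.12, 150
sigma_p = sqrt(pi * (1 - pi) / n)  # standard error of the sample proportion

z = (0.10 - pi) / sigma_p
p_at_most_10_percent = NormalDist(pi, sigma_p).cdf(0.10)

print(round(sigma_p, 4), round(z, 2), round(p_at_most_10_percent, 4))
```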
-
Instead, let's calculate the probability that the sample contains at
most 25 unemployed people.
Step 1: Convert the number into a proportion.
25 / 150 = 0.16667
Step 2: Calculate $\mu_p$ and $\sigma_p$.
$\mu_p = \pi = 0.12$    $\sigma_p = 0.0265$
-
Step 3: Sketch the curve and indicate relevant information.
Pr(p
-
Many times the value of the population proportion is unknown.
We can approximate the mean and standard error of the
proportions by:
$\mu_p \approx p$    $s_p = \sqrt{\frac{p(1-p)}{n}}$
-
V. Some desirable properties of estimators
1. Estimators should be unbiased (accurate).
An estimator is unbiased if the average value of all the point estimates is equal to the population parameter being estimated.
To prove that $\bar{x}$ is an unbiased estimator of $\mu$ we would need to
show that the expected value of the sample mean is equal to the
population mean.
$E(\bar{x}) = \mu$.
-
2. The values of sample statistics vary around the population
parameter. It is desirable to keep this variance at a minimum (minimum variance, precise).
An estimator is precise when the values of the estimates are
close to one another.
-
Concepts:
- Central Limit Theorem
- Normal distribution
- desirable properties of estimators
Skills:
For both means and proportions:
- calculate the mean of all the sample means (proportions) and the
standard deviation of the sample means (proportions)
- construct a probability distribution table
- calculate the probability of an event