PRBE002 Slides Session 05
-
Upload
rony-rahman -
Category
Documents
-
view
218 -
download
0
Transcript of PRBE002 Slides Session 05
-
7/26/2019 PRBE002 Slides Session 05
1/18
6/02/2014
1
Chapter 10Sampling distributions
3
Introduction
In real life, calculating the parametersof populations is prohibitive becausepopulations are very large.
Rather than investigating the wholepopulation, we take a sample, calculatea statisticrelated to theparameterofinterest and make an inference.
The sampling distribution of the statisticis the tool that tells us how close thestatistic is to the parameter.
-
7/26/2019 PRBE002 Slides Session 05
2/18
6/02/2014
2
4
Sampling Distributions
A sampling distribution is created by, as thename suggests, s am p l i n g .
The method we will employ to derive thesampling distribution uses the r u l e s o f p r o b a b i l i t y and the l a w s o f e x p e c t ed v al u e a n d v a r ia n ce .
For example, consider the roll of one and twodice
5
x 1 2 3 4 5 6
P(x) 1/6 1/6 1/6 1/6 1/6 1/6
10.1 Sampling Distribution of theSample Mean
Sampling distribution of a single dieA fair die is thrown infinitely many times,with the random variable X = Number of spotsshowing on any throw.The probability distribution of X is:
and the mean and variance are calculated as:
6
Sampling Distribution of Two Dice
A sampling distribution of the sample mean iscreated by looking at all samples of size n=2(i.e. two dice) and their means
While there are 36 possible samples of size2, there are only 11 values for , and some(e.g. =3.5) occur more frequently thanothers (e.g. =1).
-
7/26/2019 PRBE002 Slides Session 05
3/18
6/02/2014
3
7
The s am p l i n g d i s t r i b u t i o n of is shown below:
P( )
Sampling Distribution of Two Dice
8
Compare
Compare the distribution of X
with the sampling distribution of .
As well, note that:
9
Generalise
We can generalise the mean and variance of thesampling of two dice:
to n-dice:
The standard deviation ofthe sampling distribution ofthe sample mean is calledthe s t a n d a r d e r r o r :
-
7/26/2019 PRBE002 Slides Session 05
4/18
6/02/2014
4
10
1
1
1
6
6
6
Notice that is smaller
thanx. The larger the sample
size the smaller . Therefore,
tends to fall closer to, as the
sample size increases.
11
The variance of the sample mean is smallerthan the variance of the population.
1 2 3
Also,
Expected value of the population = (1 + 2 + 3)/3 = 2
Mean = 1.5 Mean = 2.5Mean = 2.
Population 1.51.51.51.51.51.51.51.51.51.51.51.51.5
2.52.52.52.52.52.52.52.52.52.52.52.52.5
22222222222
Expected value of the sample mean = (1.5 + 2 + 2.5)/3 = 2
Compare the variability of the population
to the variability of the sample mean.Let us take samplesof two observations.
12
Central Limit Theorem
If a random sample is drawn from a normalpopulation, then the sampling distribution ofthe sample mean is normally distributed forall values of n (sample size).
If a random sample is drawn from anypopulation, the sampling distribution of thesample mean is approximately normal for asufficiently large sample size.
-
7/26/2019 PRBE002 Slides Session 05
5/18
6/02/2014
5
13
Central Limit Theorem
The larger the sample size, the more closely
the sampling distribution of will resemble anormal distribution.
In most practical situations, a sample size of30 may be sufficiently large to allow us to usethe normal distribution as an approximationfor the sampling distribution of .
14
Sampling Distribution of the Sample Mean
15
We can standardise the sampling distribution ofthe sample mean as
Sampling Distribution of the Sample Mean
-
7/26/2019 PRBE002 Slides Session 05
6/18
6/02/2014
6
16
The summaries above assume that the population
is infinitely large. However, if the population is finitethe standard error is
where N is the population size and
is the f i n i t e p o p u l a t i o n c o r r e c t i o n f a c t o r .
Sampling Distribution of the Sample Mean
17
Sampling Distribution of the Sample Mean
If the population size is large relative to thesample size the finite population correctionfactor is close to 1 and can be ignored.
We will treat any population that is at least 20times larger than the sample size as large.
In practice, most applications involve
populations that qualify as large. As a consequence the finite population
correction factor is usually omitted.
18
The weight of each 32g chocolate bar isnormally distributed with a mean of 32.2 gand a standard deviation of 0.3 g.
Find the probability that, if a customer buysone chocolate bar, that bar will weigh morethan 32 g.
Solution The random variable X is the weight of a
chocolate bar.
= 32.2
0.7486
x = 32
Example 10.1
-
7/26/2019 PRBE002 Slides Session 05
7/18
6/02/2014
7
19
= 32.2
0.7486
x = 32
Find the probability that, if a customer buysa pack of 4 bars, the mean weight of the 4
bars will be more than 32 g.Solution
The random variable here is the meanweight per chocolate bar, . We want
0.9082
20
Example
The average weekly income of graduates oneyear after graduation is $600. Suppose thedistribution of weekly income has a standarddeviation of $100. What is the probability that25 randomly-selected graduates have anaverage weekly income of less than $550?
Solution
Let X be the weekly income of graduates one
year after graduation.
21
ExampleThe average weekly income of graduatesone year after graduation is $600. Supposethe distribution of weekly income has astandard deviation of $100.
1. What is the probability that 25 randomly-selected graduates have an average weeklyincome of less than $550?
2. If a random sample of 25 graduates actuallyhad an average weekly income of $550, whatwould you conclude about the validity of theclaim that the average weekly income is$600?
-
7/26/2019 PRBE002 Slides Session 05
8/18
6/02/2014
8
22
Solution
Let X be the weekly income of graduates one
year after graduation.1.
2. With = 600, the probability of having asample mean of 550 is very low (0.0062).So the claim that the average weeklyincome is $600 is probably unjustified.
It would be more reasonable toassume that is smaller than $600,because then a sample mean of $550becomes more probable.
23
To make inferences about population parameterswe use sampling distributions.
The symmetry of the normal distribution along withthe sample distribution of the mean lead to:
- Z.025 Z.025
Using Sampling Distributions forInference
24
1.96 1.960
0.025 0.025
0.025 0.025
Standard normal distribution Z
Normal distribution of
-
7/26/2019 PRBE002 Slides Session 05
9/18
6/02/2014
9
25
Conclusion
There is a 95% chance that the samplemean falls within the interval [560.8,639.2] if the population mean is 600.
Since the sample mean was 550, thepopulation mean is probably not 600.
26
In general
27
Creating the Sampling Distributionby Computer Simulation
By producing data sets of randomnumbers that come from givendistributions, we can verify probabilisticand statistical characteristics.
We simulate a dice-tossing experiment(creating the distribution of theaverage).
Effects of an increasing sample size onthe distribution of the mean are shown.
-
7/26/2019 PRBE002 Slides Session 05
10/18
6/02/2014
10
28
Simulation of Dice Tossing
n = 2 n = 5
n = 10Mean = 3.494
Stand. dev. = 0.544
Mean = 3.486
Stand. dev. = 1.215
Mean = 3.495
Stand. dev. = 0.749
29
Type in the variable values and the probability (type 1/6).
Create a histogram
for the distribution
of the mean.
Type the bin.
Excel
Creating a
simulated
distribution
of the mean
Calculate the means.
Create samples
of size two.
30
The parameter of interest for nominal datais theproportion of times a particularoutcome (success) occurs.
To estimate the population proportionp weuse the sample proportion .
The sampling distribution of is binomial.
We prefer to use normal approximation tothe binomial distribution to makeinferences about .
p
10.2 Sampling Distribution of the
Sample Proportion
p
p
-
7/26/2019 PRBE002 Slides Session 05
11/18
6/02/2014
11
31
Normal Approximation to BinomialBinomial distribution with n=20 and p=.5 with anormal approximation superimposed (=10 and
=2.24)
32
Normal Approximation to Binomial
Binomial distribution with n=20 and p=.5 with a normalapproximation superimposed ( =10 and =2.24).
where did these values come from?!
From Section 7.6 we saw that:
Hence:
and
33
Normal approximation to the binomialworks best when
the number of experiments (sample size) islarge, and
the probability of successp is close to 0.5.
For the approximation to provide goodresults:
np 5; n(1 p) 5
Normal Approximation to Binomial
-
7/26/2019 PRBE002 Slides Session 05
12/18
6/02/2014
12
34
To calculate P(X=10) usingthe normal distribution, we
can find the area under thenormal curve between 9.5and 10.5
P(X = 10) P(9.5 < Y < 10.5)
where Y is a normal random variable approximatingthe binomial random variable X.
Normal Approximation to Binomial
35
P(X = 10) P(9.5 < Y < 10.5)
where Y is a normal random variable approximatingthe binomial random variable X.
In fact:
P(X = 10) = .176
while
P(9.5 < Y < 10.5) = .1742
t h e a p p r o x i m a t i on i s q u i t e
g o o d .
Normal Approximation to Binomial
36
Approximate the binomial probabilityP(X = 10) when n = 20 and p = 0.5.
The parameters of the normaldistribution used to approximate thebinomial are:
= np; = np(1 p)
Example
-
7/26/2019 PRBE002 Slides Session 05
13/18
6/02/2014
13
37
109.5 10.5
P(XBinomial = 10)~= P(9.5
-
7/26/2019 PRBE002 Slides Session 05
14/18
6/02/2014
14
40
The Laurier companys brand has a market
share of 30%. In a survey, 1 000consumers were asked which brand theyprefer. What is the probability that morethan 32% of the respondents say theyprefer the Laurier brand?
Example
41
Solution
The number of respondents who prefer Laurieris binomial with n = 1000 and p = 0.30.
Also, np = 1000(0.3) = 300 > 5n(1 p) = 1000(1 0.3) = 700 > 5.
Therefore, is normal with mean p = 0.30and standard error
Hence
42
Sampling Distribution of theDifference between Two Means,
The difference between two means canbecome a parameter of interest when thecomparison between two populations isstudied.
To make an inference about 1 2 weobserve the distribution of .
-
7/26/2019 PRBE002 Slides Session 05
15/18
6/02/2014
15
43
Applying the laws of expected valueand variance we have:
The distribution of is normalwith mean 1 2 and standarddeviation of if
the two samples are independent
the original populations are normallydistributed.
44
If the original populations are notnormally distributedbut the samplesizes are 30 or more, the distributionof is approximately normal.
45
The mean salaries of MBA graduates fromtwo universities (1 and 2), after 5 years,are $62 000 (stand. dev. = $14 500) and$60 000 (stand. dev. = $18 300).What isthe probability that a sample mean ofUniversity-1 graduates will exceed thesample mean of University-2 graduates?(n1 = 50; n2 = 60)
Solution
As the sample sizes are more than 30, thedistribution of is approximatelynormal.
Example
-
7/26/2019 PRBE002 Slides Session 05
16/18
6/02/2014
16
1 2 = 62 000 60 000 = $2 000
Th e r e i s a b o u t a 7 4% c h a n c e t h a t t h e s am p l e m e a n
s t a r t i n g s a l ar y o f U . # 1 g r a d s w i l l e x c e e d t h a t o f U .
# 2 . 46
47
In Chapters 7 and 8 we introducedprobability distributions, which allowed usto make probability statements aboutvalues of the random variable.
A prerequisite of this calculation isknowledge of the distribution and the
relevant parameters.
From Here to Inference
48
In Example 7.9, we needed to know thatthe probability that Pat Statsdud guessesthe correct answer is 20% (p = .2) andthat the number of correct answers(successes) in 10 questions (trials) is abinomial random variable.
We then could compute the probability ofany number of successes.
From Here to Inference
-
7/26/2019 PRBE002 Slides Session 05
17/18
6/02/2014
17
In Example 8.3, we needed to know thatthe amount of time to assemble acomputer is normally distributed with amean of 60 minutes and a standarddeviation of 8 minutes.
These three bits of information allowed usto calculate the probability of variousvalues of the random variable.
From Here to Inference
49
50
The figure below symbolically represents the use ofprobability distributions.
Simply put, knowledge of the population and itsparameter(s) allows us to use the probabilitydistribution to make probability statements aboutindividual members of the population.
Probability Distribution ---------- Individual
From Here to Inference
51
In this chapter we developed the samplingdistribution, wherein knowledge of the parameter(s)and some information about the distribution allowus to make probability statements about a samplestatistic.
----- Statistic
From Here to Inference
-
7/26/2019 PRBE002 Slides Session 05
18/18
6/02/2014
52
Statistics works by reversing the direction of
the flow of knowledge in the previous figure.The next figure displays the character of statistical inference.
Starting in Chapter 11, we will assume thatmost population parameters are unknown. Thestatistics practitioner will sample from thepopulation and compute the required statistic.The sampling distribution of that statistic willenable us to draw inferences about theparameter.
From Here to Inference
53
From Here to Inference
Statistic ------ Parameter