PRBE002 Slides Session 05

7/26/2019 PRBE002 Slides Session 05

1/18

6/02/2014

1

Chapter 10Sampling distributions

3

Introduction

In real life, calculating the parametersof populations is prohibitive becausepopulations are very large.

Rather than investigating the wholepopulation, we take a sample, calculatea statisticrelated to theparameterofinterest and make an inference.

The sampling distribution of the statisticis the tool that tells us how close thestatistic is to the parameter.


2/18

6/02/2014

2

4

Sampling Distributions

A sampling distribution is created by, as thename suggests, s am p l i n g .

The method we will employ to derive thesampling distribution uses the r u l e s o f p r o b a b i l i t y and the l a w s o f e x p e c t ed v al u e a n d v a r ia n ce .

For example, consider the roll of one and twodice

5

x 1 2 3 4 5 6

P(x) 1/6 1/6 1/6 1/6 1/6 1/6

10.1 Sampling Distribution of theSample Mean

Sampling distribution of a single dieA fair die is thrown infinitely many times,with the random variable X = Number of spotsshowing on any throw.The probability distribution of X is:

and the mean and variance are calculated as:

6

Sampling Distribution of Two Dice

A sampling distribution of the sample mean iscreated by looking at all samples of size n=2(i.e. two dice) and their means

While there are 36 possible samples of size2, there are only 11 values for , and some(e.g. =3.5) occur more frequently thanothers (e.g. =1).


3/18

6/02/2014

3

7

The s am p l i n g d i s t r i b u t i o n of is shown below:

P( )

Sampling Distribution of Two Dice

8

Compare

Compare the distribution of X

with the sampling distribution of .

As well, note that:

9

Generalise

We can generalise the mean and variance of thesampling of two dice:

to n-dice:

The standard deviation ofthe sampling distribution ofthe sample mean is calledthe s t a n d a r d e r r o r :


4/18

6/02/2014

4

10

1

1

1

6

6

6

Notice that is smaller

thanx. The larger the sample

size the smaller . Therefore,

tends to fall closer to, as the

sample size increases.

11

The variance of the sample mean is smallerthan the variance of the population.

1 2 3

Also,

Expected value of the population = (1 + 2 + 3)/3 = 2

Mean = 1.5 Mean = 2.5Mean = 2.

Population 1.51.51.51.51.51.51.51.51.51.51.51.51.5

2.52.52.52.52.52.52.52.52.52.52.52.52.5

22222222222

Expected value of the sample mean = (1.5 + 2 + 2.5)/3 = 2

Compare the variability of the population

to the variability of the sample mean.Let us take samplesof two observations.

12

Central Limit Theorem

If a random sample is drawn from a normalpopulation, then the sampling distribution ofthe sample mean is normally distributed forall values of n (sample size).

If a random sample is drawn from anypopulation, the sampling distribution of thesample mean is approximately normal for asufficiently large sample size.


5/18

6/02/2014

5

13

Central Limit Theorem

The larger the sample size, the more closely

the sampling distribution of will resemble anormal distribution.

In most practical situations, a sample size of30 may be sufficiently large to allow us to usethe normal distribution as an approximationfor the sampling distribution of .

14

Sampling Distribution of the Sample Mean

15

We can standardise the sampling distribution ofthe sample mean as



6/18

6/02/2014

6

16

The summaries above assume that the population

is infinitely large. However, if the population is finitethe standard error is

where N is the population size and

is the f i n i t e p o p u l a t i o n c o r r e c t i o n f a c t o r .


17


If the population size is large relative to thesample size the finite population correctionfactor is close to 1 and can be ignored.

We will treat any population that is at least 20times larger than the sample size as large.

In practice, most applications involve

populations that qualify as large. As a consequence the finite population

correction factor is usually omitted.

18

The weight of each 32g chocolate bar isnormally distributed with a mean of 32.2 gand a standard deviation of 0.3 g.

Find the probability that, if a customer buysone chocolate bar, that bar will weigh morethan 32 g.

Solution The random variable X is the weight of a

chocolate bar.

= 32.2

0.7486

x = 32

Example 10.1


7/18

6/02/2014

7

19

= 32.2

0.7486

x = 32

Find the probability that, if a customer buysa pack of 4 bars, the mean weight of the 4

bars will be more than 32 g.Solution

The random variable here is the meanweight per chocolate bar, . We want

0.9082

20

Example

The average weekly income of graduates oneyear after graduation is $600. Suppose thedistribution of weekly income has a standarddeviation of $100. What is the probability that25 randomly-selected graduates have anaverage weekly income of less than $550?

Solution

Let X be the weekly income of graduates one

year after graduation.

21

ExampleThe average weekly income of graduatesone year after graduation is $600. Supposethe distribution of weekly income has astandard deviation of $100.

1. What is the probability that 25 randomly-selected graduates have an average weeklyincome of less than $550?

2. If a random sample of 25 graduates actuallyhad an average weekly income of $550, whatwould you conclude about the validity of theclaim that the average weekly income is$600?


8/18

6/02/2014

8

22

Solution

Let X be the weekly income of graduates one

year after graduation.1.

2. With = 600, the probability of having asample mean of 550 is very low (0.0062).So the claim that the average weeklyincome is $600 is probably unjustified.

It would be more reasonable toassume that is smaller than $600,because then a sample mean of $550becomes more probable.

23

To make inferences about population parameterswe use sampling distributions.

The symmetry of the normal distribution along withthe sample distribution of the mean lead to:

- Z.025 Z.025

Using Sampling Distributions forInference

24

1.96 1.960

0.025 0.025

0.025 0.025

Standard normal distribution Z

Normal distribution of


9/18

6/02/2014

9

25

Conclusion

There is a 95% chance that the samplemean falls within the interval [560.8,639.2] if the population mean is 600.

Since the sample mean was 550, thepopulation mean is probably not 600.

26

In general

27

Creating the Sampling Distributionby Computer Simulation

By producing data sets of randomnumbers that come from givendistributions, we can verify probabilisticand statistical characteristics.

We simulate a dice-tossing experiment(creating the distribution of theaverage).

Effects of an increasing sample size onthe distribution of the mean are shown.


10/18

6/02/2014

10

28

Simulation of Dice Tossing

n = 2 n = 5

n = 10Mean = 3.494

Stand. dev. = 0.544

Mean = 3.486

Stand. dev. = 1.215

Mean = 3.495

Stand. dev. = 0.749

29

Type in the variable values and the probability (type 1/6).

Create a histogram

for the distribution

of the mean.

Type the bin.

Excel

Creating a

simulated

distribution

of the mean

Calculate the means.

Create samples

of size two.

30

The parameter of interest for nominal datais theproportion of times a particularoutcome (success) occurs.

To estimate the population proportionp weuse the sample proportion .

The sampling distribution of is binomial.

We prefer to use normal approximation tothe binomial distribution to makeinferences about .

p

10.2 Sampling Distribution of the

Sample Proportion

p

p


11/18

6/02/2014

11

31

Normal Approximation to BinomialBinomial distribution with n=20 and p=.5 with anormal approximation superimposed (=10 and

=2.24)

32

Normal Approximation to Binomial

Binomial distribution with n=20 and p=.5 with a normalapproximation superimposed ( =10 and =2.24).

where did these values come from?!

From Section 7.6 we saw that:

Hence:

and

33

Normal approximation to the binomialworks best when

the number of experiments (sample size) islarge, and

the probability of successp is close to 0.5.

For the approximation to provide goodresults:

np 5; n(1 p) 5



12/18

6/02/2014

12

34

To calculate P(X=10) usingthe normal distribution, we

can find the area under thenormal curve between 9.5and 10.5

P(X = 10) P(9.5 < Y < 10.5)

where Y is a normal random variable approximatingthe binomial random variable X.


35

P(X = 10) P(9.5 < Y < 10.5)

where Y is a normal random variable approximatingthe binomial random variable X.

In fact:

P(X = 10) = .176

while

P(9.5 < Y < 10.5) = .1742

t h e a p p r o x i m a t i on i s q u i t e

g o o d .


36

Approximate the binomial probabilityP(X = 10) when n = 20 and p = 0.5.

The parameters of the normaldistribution used to approximate thebinomial are:

= np; = np(1 p)

Example


13/18

6/02/2014

13

37

109.5 10.5

P(XBinomial = 10)~= P(9.5


14/18

6/02/2014

14

40

The Laurier companys brand has a market

share of 30%. In a survey, 1 000consumers were asked which brand theyprefer. What is the probability that morethan 32% of the respondents say theyprefer the Laurier brand?

Example

41

Solution

The number of respondents who prefer Laurieris binomial with n = 1000 and p = 0.30.

Also, np = 1000(0.3) = 300 > 5n(1 p) = 1000(1 0.3) = 700 > 5.

Therefore, is normal with mean p = 0.30and standard error

Hence

42

Sampling Distribution of theDifference between Two Means,

The difference between two means canbecome a parameter of interest when thecomparison between two populations isstudied.

To make an inference about 1 2 weobserve the distribution of .


15/18

6/02/2014

15

43

Applying the laws of expected valueand variance we have:

The distribution of is normalwith mean 1 2 and standarddeviation of if

the two samples are independent

the original populations are normallydistributed.

44

If the original populations are notnormally distributedbut the samplesizes are 30 or more, the distributionof is approximately normal.

45

The mean salaries of MBA graduates fromtwo universities (1 and 2), after 5 years,are $62 000 (stand. dev. = $14 500) and$60 000 (stand. dev. = $18 300).What isthe probability that a sample mean ofUniversity-1 graduates will exceed thesample mean of University-2 graduates?(n1 = 50; n2 = 60)

Solution

As the sample sizes are more than 30, thedistribution of is approximatelynormal.

Example


16/18

6/02/2014

16

1 2 = 62 000 60 000 = $2 000

Th e r e i s a b o u t a 7 4% c h a n c e t h a t t h e s am p l e m e a n

s t a r t i n g s a l ar y o f U . # 1 g r a d s w i l l e x c e e d t h a t o f U .

# 2 . 46

47

In Chapters 7 and 8 we introducedprobability distributions, which allowed usto make probability statements aboutvalues of the random variable.

A prerequisite of this calculation isknowledge of the distribution and the

relevant parameters.

From Here to Inference

48

In Example 7.9, we needed to know thatthe probability that Pat Statsdud guessesthe correct answer is 20% (p = .2) andthat the number of correct answers(successes) in 10 questions (trials) is abinomial random variable.

We then could compute the probability ofany number of successes.



17/18

6/02/2014

17

In Example 8.3, we needed to know thatthe amount of time to assemble acomputer is normally distributed with amean of 60 minutes and a standarddeviation of 8 minutes.

These three bits of information allowed usto calculate the probability of variousvalues of the random variable.


49

50

The figure below symbolically represents the use ofprobability distributions.

Simply put, knowledge of the population and itsparameter(s) allows us to use the probabilitydistribution to make probability statements aboutindividual members of the population.

Probability Distribution ---------- Individual


51

In this chapter we developed the samplingdistribution, wherein knowledge of the parameter(s)and some information about the distribution allowus to make probability statements about a samplestatistic.

----- Statistic



18/18

6/02/2014

52

Statistics works by reversing the direction of

the flow of knowledge in the previous figure.The next figure displays the character of statistical inference.

Starting in Chapter 11, we will assume thatmost population parameters are unknown. Thestatistics practitioner will sample from thepopulation and compute the required statistic.The sampling distribution of that statistic willenable us to draw inferences about theparameter.


53


Statistic ------ Parameter

PRBE002 Slides Session 05

Documents

Transcript of PRBE002 Slides Session 05