Introduction to Sampling Distributions


Chapter 7. Introduction to Sampling Distributions. Understandable Statistics, Ninth Edition, by Brase and Brase. Prepared by Yixun Shi, Bloomsburg University of Pennsylvania. Terms, Statistics & Parameters. Terms: Population, Sample, Parameter, Statistics. Why Sample?

Transcript of Introduction to Sampling Distributions

Page 1: Introduction to Sampling Distributions


Section 7.1-2

Sampling Distributions and the Central Limit Theorem

© 2012 Pearson Education, Inc. All rights reserved.

Page 2: Introduction to Sampling Distributions

Section 7.1 Objectives

• Describe sampling distributions and verify their properties

• Explain the Central Limit Theorem

• Apply the Central Limit Theorem

Page 3: Introduction to Sampling Distributions

Terms, Statistics, and Parameters

Page 4: Introduction to Sampling Distributions

Why Sample?

We often want to know something about a population, but our time, resources, and effort are limited, so we take a sample and use it to learn about the population.

Page 5: Introduction to Sampling Distributions

Sampling Demonstration (size n = 2)

• Write each of the following numbers on a card: 1, 3, 5, 7

• Repeat the following experiment:
  1. Randomly select card #1 and replace it
  2. Randomly select card #2 and replace it

Page 6: Introduction to Sampling Distributions

Types of Inference

• Estimation: We estimate the value of a population parameter

• Testing: We formulate a decision about the value of a population parameter

• Regression: We make predictions or forecasts about the value of a statistical variable

Page 7: Introduction to Sampling Distributions

Sampling Distributions

• A sampling distribution is the distribution of values taken on by the statistic
• It is based on all possible samples of the same size from a given population
  (we sample with replacement, so the same value can be used over again)
• A sampling distribution is a sample space
• The distribution describes everything that can happen when we sample

Page 8: Introduction to Sampling Distributions

Sampling Distributions

• To evaluate the reliability of our inference, we need to know about the probability distribution of the statistic we are using.

• Typically, we are interested in the sampling distributions for sample means and sample proportions.

Page 9: Introduction to Sampling Distributions

Sampling Distributions

Sampling distribution
• The probability distribution of a sample statistic
• Formed when samples of size n are repeatedly taken from a population
• e.g., the sampling distribution of sample means

Page 10: Introduction to Sampling Distributions

Sampling Distribution of Sample Means

[Figure: Samples 1 through 5 drawn from a population with mean μ and standard deviation σ, with sample means \(\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5\)]

The sampling distribution consists of the values of the sample means, \(\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \bar{x}_5, \ldots\)

Page 11: Introduction to Sampling Distributions

Checkpoint (contd.)

4. What is the sample characteristic corresponding to each sample?
   The sample statistic is the sample mean length \(\bar{x}\).

5. What is the sampling distribution?
   The sampling distribution is the probability distribution of all possible values of \(\bar{x}\).

6. To which population parameter does this sampling distribution correspond?
   This sampling distribution relates to the population mean length µ of all the trout in the tank.

Page 12: Introduction to Sampling Distributions

Exercise 1: Sampling Distribution of Sample Means

The population values {1, 3, 5, 7} are written on slips of paper and put in a box. Two slips of paper are randomly selected, with replacement.

a. Find the mean, variance, and standard deviation of the population.

Solution:

Mean: \(\mu = \frac{\sum x}{N} = 4\)

Variance: \(\sigma^2 = \frac{\sum (x - \mu)^2}{N} = 5\)

Standard deviation: \(\sigma = \sqrt{5} \approx 2.236\)

Page 13: Introduction to Sampling Distributions

Exercise 1: Sampling Distribution of Sample Means

b. Graph the probability histogram for the population values {1, 3, 5, 7}.

Solution:

[Figure: Probability Histogram of Population of x. Horizontal axis: population values x = 1, 3, 5, 7; vertical axis: probability P(x), with every bar at 0.25]

All values have the same probability of being selected (uniform distribution).

Page 14: Introduction to Sampling Distributions

Exercise 1: Sampling Distribution of Sample Means

c. List all the possible samples of size n = 2 and calculate the mean of each sample.

Solution:

Sample   Mean      Sample   Mean
1, 1     1         5, 1     3
1, 3     2         5, 3     4
1, 5     3         5, 5     5
1, 7     4         5, 7     6
3, 1     2         7, 1     4
3, 3     3         7, 3     5
3, 5     4         7, 5     6
3, 7     5         7, 7     7

These means form the sampling distribution of sample means.

Page 15: Introduction to Sampling Distributions

Exercise 1: Sampling Distribution of Sample Means

d. Construct the probability distribution of the sample means.

Mean   f   Probability
1      1   0.0625
2      2   0.1250
3      3   0.1875
4      4   0.2500
5      3   0.1875
6      2   0.1250
7      1   0.0625
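Parts c and d can be checked by brute force. The following is a minimal sketch, not part of the original slides: it enumerates every ordered sample of size 2 drawn with replacement from {1, 3, 5, 7} and tallies the sample means. The variable names are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 3, 5, 7]
n = 2  # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
samples = list(product(population, repeat=n))

# How often each sample mean occurs among the equally likely samples.
frequency = Counter(Fraction(sum(s), n) for s in samples)

# Probability of each sample mean = frequency / number of samples.
distribution = {
    mean: Fraction(count, len(samples))
    for mean, count in sorted(frequency.items())
}

for mean, prob in distribution.items():
    print(f"mean = {mean}, f = {frequency[mean]}, P = {float(prob):.4f}")
```

Exact fractions are used so that the probabilities can be compared with the table without rounding concerns.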

Page 16: Introduction to Sampling Distributions

Exercise 1: Sampling Distribution of Sample Means

e. Use your calculator to find the mean, variance, and standard deviation of the sampling distribution of the sample means.

Solution:

The mean, variance, and standard deviation of the 16 sample means are:

\(\mu_{\bar{x}} = 4\), \(\sigma_{\bar{x}}^2 = \frac{5}{2} = 2.5\), \(\sigma_{\bar{x}} = \sqrt{2.5} \approx 1.581\)

These results satisfy the properties of sampling distributions of sample means:

\(\mu_{\bar{x}} = \mu = 4\), \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{5}}{\sqrt{2}} \approx \frac{2.236}{\sqrt{2}} \approx 1.581\)
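The calculator step can also be reproduced in code. This is a small sketch under the assumption that Python's standard `statistics` module stands in for the calculator; `pstdev` divides by N, which matches the population formulas used in part a.

```python
import math
import statistics
from itertools import product

population = [1, 3, 5, 7]
n = 2  # sample size

mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation (divides by N)

# Means of all 16 equally likely samples drawn with replacement.
sample_means = [statistics.mean(s) for s in product(population, repeat=n)]

mu_xbar = statistics.mean(sample_means)        # mean of the sample means
sigma_xbar = statistics.pstdev(sample_means)   # standard deviation of the sample means

print("mean of sample means:", mu_xbar, "population mean:", mu)
print("sd of sample means:", sigma_xbar, "sigma / sqrt(n):", sigma / math.sqrt(n))
```

Both comparisons should agree, which is the pair of properties the slide states.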

Page 17: Introduction to Sampling Distributions

Exercise 1: Sampling Distribution of Sample Means

f. Graph the probability histogram for the sampling distribution of the sample means.

Solution:

[Figure: Probability Histogram of Sampling Distribution of \(\bar{x}\). Horizontal axis: sample mean \(\bar{x}\); vertical axis: probability P(\(\bar{x}\)), from 0.05 to 0.25]

The shape of the graph is symmetric and bell shaped. It approximates a normal distribution.

Page 18: Introduction to Sampling Distributions

Exercise 2: Sampling Distribution of Sample Means

Class Exercise: Each group of three students should repeat the following experiment five times.

• Generate random number pairs using the command RandInt(1,7,2). Reject any pair that contains an even number. Complete the following table and report your five mean values to the class recorder.

Trial #    1    2    3    4    5
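For readers without the calculator, the class exercise can be imitated in code. This is a rough sketch, assuming that `random.randint(1, 7)` is an acceptable stand-in for one value from RandInt(1,7,2); the helper name `draw_odd_pair` and the fixed seed are illustrative choices, not part of the slides.

```python
import random


def draw_odd_pair(rng):
    """Draw pairs of integers from 1 to 7 until both numbers are odd."""
    while True:
        pair = (rng.randint(1, 7), rng.randint(1, 7))
        # Reject any pair that contains an even number.
        if all(value % 2 == 1 for value in pair):
            return pair


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [draw_odd_pair(rng) for _ in range(5)]
means = [sum(pair) / 2 for pair in trials]

for number, (pair, mean) in enumerate(zip(trials, means), start=1):
    print(f"Trial {number}: pair = {pair}, mean = {mean}")
```

Rejecting pairs with an even number leaves only the values 1, 3, 5, 7, so each accepted pair behaves like a sample of size 2 from the population in Exercise 1.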

Page 19: Introduction to Sampling Distributions

Properties of Sampling Distributions of Sample Means

1. The mean of the sample means, \(\mu_{\bar{x}}\), is equal to the population mean μ.

   \(\mu_{\bar{x}} = \mu\)

2. The standard deviation of the sample means, \(\sigma_{\bar{x}}\), is equal to the population standard deviation, σ, divided by the square root of the sample size, n.

   \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\)

   • Called the standard error of the mean.

Page 20: Introduction to Sampling Distributions

The Central Limit Theorem 7.1

If the population itself is normally distributed, then the sampling distribution of sample means is normally distributed for any sample size n.

[Figure: dot plot of sample means \(\bar{x}\)]

Page 21: Introduction to Sampling Distributions

The Central Limit Theorem 7.2

If samples of size n ≥ 30 are drawn from any population with mean µ and standard deviation σ, then the sampling distribution of sample means approximates a normal distribution. The greater the sample size, the better the approximation.

[Figure: dot plot of sample means \(\bar{x}\)]
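A quick simulation can make the theorem concrete. The sketch below is illustrative only: the exponential population (mean 1, standard deviation 1), the seed, and the number of repetitions are assumptions chosen for the demonstration, not values from the slides.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n draws from a right-skewed exponential population with mean 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


n = 30
means = [sample_mean(n) for _ in range(5000)]

center = statistics.fmean(means)   # expected to be near the population mean, 1
spread = statistics.stdev(means)   # expected to be near 1 / sqrt(30)

print("mean of sample means:", round(center, 3))
print("sd of sample means:", round(spread, 3), "vs sigma / sqrt(n):", round(1 / math.sqrt(n), 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the population itself is strongly skewed.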

Page 22: Introduction to Sampling Distributions

The Central Limit Theorem

• In either case, the sampling distribution of sample means has a mean equal to the population mean.

  Mean: \(\mu_{\bar{x}} = \mu\)

• The sampling distribution of sample means has a variance equal to 1/n times the variance of the population and a standard deviation equal to the population standard deviation divided by the square root of n.

  Variance: \(\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\)

  Standard deviation (standard error of the mean): \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\)

Page 23: Introduction to Sampling Distributions

The Central Limit Theorem

Case: Any Population Distribution
[Figure: Distribution of Sample Means, n ≥ 30]

Case: Normal Population Distribution
[Figure: Distribution of Sample Means, any n]

Page 24: Introduction to Sampling Distributions

Exercise 3: Interpreting the Central Limit Theorem

Cellular phone bills for residents of a city have a mean of $63 and a standard deviation of $11. Random samples of 100 cellular phone bills are drawn from this population and the mean of each sample is determined. Find the mean and standard error of the mean of the sampling distribution. Then sketch a graph of the sampling distribution of sample means.

Page 25: Introduction to Sampling Distributions

Solution: Interpreting the Central Limit Theorem

• The mean of the sampling distribution is equal to the population mean.

  \(\mu_{\bar{x}} = \mu = 63\)

• The standard error of the mean is equal to the population standard deviation divided by the square root of n.

  \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{11}{\sqrt{100}} = 1.1\)

Page 26: Introduction to Sampling Distributions

Solution: Interpreting the Central Limit Theorem

• Since the sample size is greater than 30, the sampling distribution can be approximated by a normal distribution with

  \(\mu_{\bar{x}} = \$63\), \(\sigma_{\bar{x}} = \$1.10\)

Page 27: Introduction to Sampling Distributions

Exercise 4: Interpreting the Central Limit Theorem

Suppose the training heart rates of all 20-year-old athletes are normally distributed, with a mean of 135 beats per minute and standard deviation of 18 beats per minute. Random samples of size 4 are drawn from this population, and the mean of each sample is determined. Find the mean and standard error of the mean of the sampling distribution. Then sketch a graph of the sampling distribution of sample means.

Page 28: Introduction to Sampling Distributions

Solution: Interpreting the Central Limit Theorem

• The mean of the sampling distribution is equal to the population mean.

  \(\mu_{\bar{x}} = \mu = 135\)

• The standard error of the mean is equal to the population standard deviation divided by the square root of n.

  \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{4}} = 9\)

Page 29: Introduction to Sampling Distributions

Solution: Interpreting the Central Limit Theorem

• Since the population is normally distributed, the sampling distribution of the sample means is also normally distributed.

  \(\mu_{\bar{x}} = 135\), \(\sigma_{\bar{x}} = 9\)

Page 30: Introduction to Sampling Distributions

Exercise 5: Probabilities for Sampling Distributions

The graph shows the length of time people spend driving each day. You randomly select 50 drivers age 15 to 19. What is the probability that the mean time they spend driving each day is between 24.7 and 25.5 minutes? Assume that σ = 1.5 minutes.

Larson/Farber 4th ed 30

Page 31: Introduction to Sampling Distributions

Solution: Probabilities for Sampling Distributions

From the Central Limit Theorem (sample size is greater than 30), the sampling distribution of sample means is approximately normal with

\(\mu_{\bar{x}} = \mu = 25\), \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.5}{\sqrt{50}} \approx 0.21213\)

Page 32: Introduction to Sampling Distributions

Solution: Probabilities for Sampling Distributions

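\(z_1 = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{24.7 - 25}{1.5 / \sqrt{50}} \approx -1.41\)

\(z_2 = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{25.5 - 25}{1.5 / \sqrt{50}} \approx 2.36\)

[Figure: Normal distribution with μ = 25, σ = 0.21213, shaded between 24.7 and 25.5 for P(24.7 < \(\bar{x}\) < 25.5); standard normal distribution with μ = 0, σ = 1, shaded between z = -1.41 and z = 2.36 for P(-1.41 < z < 2.36), with cumulative areas 0.0793 and 0.9909]

\(P(24.7 < \bar{x} < 25.5) = P(-1.41 < z < 2.36) = \text{normalcdf}(-1.41, 2.36) = 0.9116\)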
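The same probability can be computed without a calculator's normalcdf. This sketch assumes Python's `statistics.NormalDist` as the substitute; because it does not round the z-scores to two decimals, the result is close to, but not exactly, the 0.9116 shown on the slide.

```python
import math
from statistics import NormalDist

mu, sigma, n = 25.0, 1.5, 50

# Sampling distribution of the sample mean: normal with mean mu and sd sigma / sqrt(n).
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))

# Area under the curve between the two sample-mean values.
probability = xbar_dist.cdf(25.5) - xbar_dist.cdf(24.7)

print("standard error:", round(xbar_dist.stdev, 5))
print("P(24.7 < x-bar < 25.5):", round(probability, 4))
```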

Page 33: Introduction to Sampling Distributions

Solution: Interpreting the Central Limit Theorem

• The mean of the sampling distribution is equal to the population mean.

  \(\mu_{\bar{x}} = \mu = 135\)

• The standard error of the mean is equal to the population standard deviation divided by the square root of n.

  \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{18}{\sqrt{4}} = 9\)

Page 34: Introduction to Sampling Distributions

Exercise 6: Probabilities for \(x\) and \(\bar{x}\)

Suppose a team of biologists has been studying the Pinedale Children’s fishing pond. Let x represent the length of a single trout taken at random from the pond. Assume x has a normal distribution with μ = 10.2 inches and standard deviation σ = 1.4 inches.

a) What is the probability that a single trout taken at random from the pond is between 8 and 12 inches?

b) What is the probability that the mean length of 5 trout taken at random is between 8 and 12 inches?

c) Explain the difference between parts a) and b).

Page 35: Introduction to Sampling Distributions

Exercise 6: Probabilities for \(x\) and \(\bar{x}\)

a) What is the probability that a single trout taken at random from the pond is between 8 and 12 inches?

b) What is the probability that the mean length of 5 trout taken at random is between 8 and 12 inches?

\(\mu_{\bar{x}} = \mu = 10.2\), \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.4}{\sqrt{5}} \approx 0.63\)
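The worked answers for parts a) and b) did not survive in this transcript, so the following is only a sketch of how they could be computed, assuming `statistics.NormalDist` and the values μ = 10.2, σ = 1.4, n = 5 given in the exercise. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 10.2, 1.4, 5

# Part a: distribution of a single trout length x.
single = NormalDist(mu, sigma)
p_single = single.cdf(12) - single.cdf(8)

# Part b: distribution of the mean length of n trout (standard error sigma / sqrt(n)).
mean_of_n = NormalDist(mu, sigma / math.sqrt(n))
p_mean = mean_of_n.cdf(12) - mean_of_n.cdf(8)

print("P(8 < x < 12):", round(p_single, 4))
print("P(8 < x-bar < 12):", round(p_mean, 4))
```

The probability for the sample mean comes out much larger than the one for a single trout, because the distribution of the mean is narrower around 10.2.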

Page 36: Introduction to Sampling Distributions

Exercise 6: Probabilities for \(x\) and \(\bar{x}\)

c) Explain the difference between parts a) and b).

In part a, we are computing the probability that a single trout will be between 8 and 12 inches in length.

In part b, we are computing the probability that the average length for a sample of size 5 will be between 8 and 12 inches.

Page 37: Introduction to Sampling Distributions

Exercise 6: Probabilities for \(x\) and \(\bar{x}\)

NOTES:
1. Both curves use the same scale on the horizontal axis.
2. The means are the same.
3. The shaded area is above the interval from 8 to 12 on each graph.
4. The smaller standard deviation of the \(\bar{x}\) distribution has the effect of gathering together much more of the total probability into the region over its mean.

Page 38: Introduction to Sampling Distributions

Exercise 7: Probabilities for \(x\) and \(\bar{x}\)

An education finance corporation claims that the average credit card debts carried by undergraduates are normally distributed, with a mean of $3173 and a standard deviation of $1120. (Adapted from Sallie Mae)

a) What is the probability that a randomly selected undergraduate, who is a credit card holder, has a credit card balance less than $2700?

Solution:
You are asked to find the probability associated with a certain value of the random variable x.

Page 39: Introduction to Sampling Distributions

Solution: Probabilities for \(x\) and \(\bar{x}\)

\(z = \frac{x - \mu}{\sigma} = \frac{2700 - 3173}{1120} \approx -0.42\)

[Figure: Normal distribution with μ = 3173, σ = 1120, shaded for P(x < 2700); standard normal distribution with μ = 0, σ = 1, shaded for P(z < -0.42), area 0.3372]

P(x < 2700) = P(z < -0.42) = 0.3372

Page 40: Introduction to Sampling Distributions

Example: Probabilities for \(x\) and \(\bar{x}\)

b) You randomly select 25 undergraduates who are credit card holders. What is the probability that their mean credit card balance is less than $2700?

Solution:
You are asked to find the probability associated with a sample mean \(\bar{x}\).

\(\mu_{\bar{x}} = \mu = 3173\), \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1120}{\sqrt{25}} = 224\)

Page 41: Introduction to Sampling Distributions

Solution: Probabilities for \(x\) and \(\bar{x}\)

\(z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{2700 - 3173}{1120 / \sqrt{25}} = \frac{-473}{224} \approx -2.11\)

[Figure: Normal distribution of the sample means, μ = 3173, shaded for P(\(\bar{x}\) < 2700); standard normal distribution with μ = 0, σ = 1, shaded for P(z < -2.11), area 0.0174]

P(\(\bar{x}\) < 2700) = P(z < -2.11) = 0.0174
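Both parts of Exercise 7 can be checked side by side. This is a sketch that assumes `statistics.NormalDist` in place of the z-table; since the z-scores are not rounded, the results land near the slide values 0.3372 and 0.0174 rather than matching them digit for digit.

```python
import math
from statistics import NormalDist

mu, sigma, n = 3173.0, 1120.0, 25

# Part a: one randomly selected balance x.
p_single = NormalDist(mu, sigma).cdf(2700)

# Part b: the mean balance of n = 25 students, with standard error sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)
p_mean = NormalDist(mu, standard_error).cdf(2700)

print("P(x < 2700):", round(p_single, 4))
print("standard error:", standard_error)
print("P(x-bar < 2700):", round(p_mean, 4))
```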

Page 42: Introduction to Sampling Distributions

Solution: Probabilities for \(x\) and \(\bar{x}\)

c) Write interpretive statements for the two calculations above.

1. There is about a 34% chance that an undergraduate will have a balance less than $2700.
2. There is only about a 2% chance that the mean balance of a sample of 25 will be less than $2700. If the mean balance of a sample of 25 actually was less than $2700, we would consider this to be an unusual event.

Page 43: Introduction to Sampling Distributions

Population Variability vs. Standard Error

Variability – The spread of the sampling distribution indicates the variability of the statistic

Example 1: Americans’ incomes are quite widely distributed, from $0 to Bill Gates’

Large population variability → large standard error (sample means will be quite variable)

Page 44: Introduction to Sampling Distributions

Population Variability vs. Standard Error

Variability – The spread of the sampling distribution indicates the variability of the statistic

Example 2: Americans’ car values are less widely distributed, from about $500 to about $50K

Smaller population variability → smaller standard error (sample means will be less variable)

Page 45: Introduction to Sampling Distributions

Section 7.2 Summary

• Found sampling distributions and verified their properties

• Interpreted the Central Limit Theorem

• Applied the Central Limit Theorem to find the probability of a sample mean

Page 46: Introduction to Sampling Distributions

Section 7.3 Objectives

• Compute the mean and standard deviation for the sample proportion

• Use the normal approximation to compute probabilities for proportions

Page 47: Introduction to Sampling Distributions

Sampling Distribution for the Proportion

Page 48: Introduction to Sampling Distributions

Sampling Distribution for the Proportion

• We consider the sampling distribution for \(\hat{p} = r/n\), where r is the number of successes in the binomial distribution

• The standard error for the \(\hat{p}\) distribution is the standard deviation of the \(\hat{p}\) sampling distribution

• The \(\hat{p}\) distribution is discrete, while the normal distribution of x is continuous

• To adjust for this, we will need to apply an appropriate continuity correction

Page 49: Introduction to Sampling Distributions

Sampling Distribution for the Proportion

Page 50: Introduction to Sampling Distributions

Exercise 1: Sampling distribution of \(\hat{p}\)

The annual crime rate in the Capital Hill neighborhood of Denver is 111 victims per 1000 residents. This means that 111 out of 1000 residents have been the victim of at least one crime. These crimes range from relatively minor crimes (stolen hubcaps or purse snatching) to major crimes (murder). The Arms is an apartment building in Capital Hill. It has 50 year-round residents. Suppose we view each of the n residents as a binomial trial. The random variable r (which takes on values 0, 1, 2, . . . , 50) represents the number of victims of at least one crime in the next year.

Page 51: Introduction to Sampling Distributions

Exercise 1: Sampling distribution of \(\hat{p}\)

a) What is the population probability p that a resident in the Capital Hill neighborhood will be the victim of a crime next year? What is the probability q that a resident will not be a victim?

Solution:
Probability = relative frequency

p = 111/1000 = 0.111
q = 1 – p = 0.889

Page 52: Introduction to Sampling Distributions

Exercise 1: Sampling distribution of \(\hat{p}\)

b) Can we approximate the \(\hat{p}\) distribution with a normal distribution? Explain.

Solution:
Consider the random variable \(\hat{p} = r/n\).

np = 50(0.111) = 5.55
nq = 50(0.889) = 44.45

Since both np and nq are greater than 5, we can approximate the \(\hat{p}\) distribution with a normal distribution.
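The np and nq check is easy to wrap in a small helper. The sketch below is illustrative; the function name `normal_approximation_ok` and the `threshold` parameter are hypothetical, and the cutoff of 5 is the one stated on the slide.

```python
def normal_approximation_ok(n, p, threshold=5):
    """Return True when both n*p and n*q exceed the threshold (q = 1 - p)."""
    q = 1 - p
    return n * p > threshold and n * q > threshold


# The Arms apartment building: n = 50 residents, p = 0.111.
print(50 * 0.111, 50 * (1 - 0.111), normal_approximation_ok(50, 0.111))

# A small counterexample: five coin tosses give np = nq = 2.5.
print(normal_approximation_ok(5, 0.5))
```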

Page 53: Introduction to Sampling Distributions

Exercise 1: Sampling distribution of \(\hat{p}\)

[Figure: Sampling Distribution for p-hat (n=50, p=0.111). Horizontal axis: phat, from 0 to 3/10 in steps of 1/50; vertical axis: P(phat), from 0.000 to 0.200]

Page 54: Introduction to Sampling Distributions

Exercise 1: Sampling distribution of \(\hat{p}\)

c) What are the mean and standard deviation for the \(\hat{p}\) distribution?

Solution:

Page 55: Introduction to Sampling Distributions

Exercise 1: Sampling distribution of \(\hat{p}\)

d) What is the probability that between 10% and 20% of the Arms residents will be victims of a crime next year? Interpret the results.

Continuity correction: 0.5/n = 0.5/50 = 0.01
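The worked solutions for parts c) and d) are missing from this transcript, so the following is a hedged sketch rather than the slide's own calculation. It assumes the standard results that the sample proportion has mean p and standard deviation sqrt(pq/n), and it applies the continuity correction 0.5/n by widening the interval at both ends.

```python
import math
from statistics import NormalDist

n, p = 50, 0.111
q = 1 - p

mu_phat = p                         # mean of the sample proportion
sigma_phat = math.sqrt(p * q / n)   # standard error of the sample proportion
correction = 0.5 / n                # continuity correction, 0.01 here

# Normal approximation to P(0.10 <= p-hat <= 0.20), interval widened by the correction.
phat_dist = NormalDist(mu_phat, sigma_phat)
probability = phat_dist.cdf(0.20 + correction) - phat_dist.cdf(0.10 - correction)

print("mean:", mu_phat, "standard deviation:", round(sigma_phat, 4))
print("approximate probability:", round(probability, 4))
```

The result is in line with the "about a 67% chance" quoted in the interpretive statement.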

Page 56: Introduction to Sampling Distributions

Exercise 1: Sampling distribution of \(\hat{p}\)

Interpretive Statement: There is about a 67% chance that between 10% and 20% of the Arms residents will be crime victims next year.

Page 57: Introduction to Sampling Distributions

Exercise 2: Sampling distribution of \(\hat{p}\)

Consider tossing a fair coin 5 times. Calculate the proportion of the 5 tosses that result in heads. Calculate the sampling distribution of \(\hat{p}\).

a) Compute the possible values of \(\hat{p}\).

r    \(\hat{p}\) = r/n
0    0
1    1/5
2    2/5
3    3/5
4    4/5
5    1

Page 58: Introduction to Sampling Distributions

Exercise 2: Sampling distribution of \(\hat{p}\)

b) Compute the probability of each possible value of \(\hat{p}\).

r    \(\hat{p}\)    P(\(\hat{p}\)) = binompdf(5,0.5,r)
0    0      0.031
1    1/5    0.156
2    2/5    0.313
3    3/5    0.313
4    4/5    0.156
5    1      0.031
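The binompdf column can be reproduced from the binomial formula. This sketch assumes a hand-written `binomial_pmf` helper (a hypothetical name) built on `math.comb`, in place of the calculator command.

```python
import math


def binomial_pmf(n, p, r):
    """Probability of exactly r successes in n independent trials with success probability p."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 5, 0.5

# One row per possible count of heads: (r, p-hat = r/n, probability).
table = [(r, r / n, binomial_pmf(n, p, r)) for r in range(n + 1)]

for r, phat, prob in table:
    print(f"r = {r}, p-hat = {phat:.1f}, P = {prob:.5f}")
```

The exact values are 1/32, 5/32, and 10/32, which the slide rounds to three decimals.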

Page 59: Introduction to Sampling Distributions

Exercise 2: Sampling distribution of \(\hat{p}\)

c) G

[Figure: Sampling Distribution for p-hat (n=5, p=0.5). Horizontal axis: 0, 1/5, 2/5, 3/5, 4/5, 1; vertical axis: probability, from 0.000 to 0.350]

Page 60: Introduction to Sampling Distributions

Exercise 3: Sampling distribution of \(\hat{p}\)

According to a study by the U.S. Department of Transportation, 44% of college students drive while distracted. Professor Baker surveyed 244 students at her college and 36% of them admitted to driving while distracted in the past week.

Do these results seem reasonable? Compute the probability that in a sample of 244 students, 36% or less have engaged in distracted driving.

Page 61: Introduction to Sampling Distributions

Solution: Sampling distribution of \(\hat{p}\)

n = 244, p = 0.44, q = 0.56

np = 244(0.44) = 107.36
nq = 244(0.56) = 136.64

Since both np and nq are greater than 5, we can approximate the \(\hat{p}\) distribution with a normal distribution.

Page 62: Introduction to Sampling Distributions

Solution: Sampling distribution of \(\hat{p}\)

Continuity correction: 0.5/n = 0.5/244 ≈ 0.002

= =
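The equations on this slide were lost in extraction, so the following is a sketch of one plausible calculation, not a reconstruction of the slide. It assumes mean p and standard deviation sqrt(pq/n) for the sample proportion and adds the continuity correction 0.5/n to the upper endpoint; without rounding, it gives roughly 0.7%, in line with the 0.73% quoted in the interpretive statement.

```python
import math
from statistics import NormalDist

n, p = 244, 0.44
q = 1 - p

sigma_phat = math.sqrt(p * q / n)   # standard error of the sample proportion
correction = 0.5 / n                # continuity correction, about 0.002

# Normal approximation to P(p-hat <= 0.36), upper endpoint widened by the correction.
probability = NormalDist(p, sigma_phat).cdf(0.36 + correction)

print("standard error:", round(sigma_phat, 4))
print("approximate probability:", round(probability, 4))
```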

Page 63: Introduction to Sampling Distributions

Solution: Sampling distribution of \(\hat{p}\)

Interpretive Statement: Theoretically, there is only about a 0.73% chance that in a sample of 244 students, 36% or less will have engaged in distracted driving. It does not appear that Professor Baker’s students are being very honest in response to this survey!

Page 64: Introduction to Sampling Distributions

Bias in Sampling

Definition: A sample statistic is unbiased if the mean of its sampling distribution equals the value of the parameter being estimated.

• The sample mean \(\bar{x}\) is an unbiased estimator of the mean µ when n ≥ 30

• The sample proportion \(\hat{p}\) is an unbiased estimator of the population proportion of successes p in binomial experiments with sufficiently large numbers of trials n

• Sample standard deviation is a biased estimator of population standard deviation (bias is introduced by the non-linear square root function).

Page 65: Introduction to Sampling Distributions

Bias in Sampling

Definition: A sample statistic is unbiased if the mean of its sampling distribution equals the value of the parameter being estimated.

• The sample variance \(s^2\) is an unbiased estimator of population variance \(\sigma^2\)

• Sample standard deviation s is a biased estimator of population standard deviation σ (bias is introduced by the non-linear square root function).

https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/sample-standard-deviation-and-bias
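The two claims about bias can be checked exactly on a tiny population. This sketch reuses the population {1, 3, 5, 7} and samples of size 2 from Exercise 1 in Section 7.1, which is an illustrative choice and not part of this slide; `statistics.variance` and `statistics.stdev` use the n - 1 divisor.

```python
import statistics
from itertools import product

population = [1, 3, 5, 7]

sigma2 = statistics.pvariance(population)   # population variance (divides by N)
sigma = statistics.pstdev(population)       # population standard deviation

# All 16 equally likely samples of size 2 drawn with replacement.
samples = list(product(population, repeat=2))

# Mean of the sampling distribution of s^2 and of s (both use the n - 1 divisor).
mean_s2 = statistics.fmean(statistics.variance(s) for s in samples)
mean_s = statistics.fmean(statistics.stdev(s) for s in samples)

print("population variance:", sigma2, "mean of sample variances:", mean_s2)
print("population sd:", round(sigma, 4), "mean of sample sds:", round(mean_s, 4))
```

The average sample variance matches the population variance, while the average sample standard deviation falls below the population standard deviation.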

Page 66: Introduction to Sampling Distributions

Variability of Distribution

• Spread of sampling distribution is an indication of the variability of the statistic

• Spread is affected by sample size
  - The variability of \(\bar{x}\) decreases as sample size increases
  - The variability of \(\hat{p}\) decreases as sample size increases

Page 67: Introduction to Sampling Distributions

Population Parameter as a Target

Page 68: Introduction to Sampling Distributions

Population Parameter as a Target

Both bias and variability describe what happens when we take many shots at the target.

Bias means that our aim is off and we consistently miss the bulls-eye in the same direction.

Our sample values do not center on the population value.

Page 69: Introduction to Sampling Distributions

Population Parameter as a Target

High variability means that repeated shots are widely scattered on the target.

Repeated samples do not give very similar results.

For best results, choose a sample statistic with
• Low bias
• Minimum variability

Page 70: Introduction to Sampling Distributions

Spread: Low Variability is Better!

Larger samples are more likely to produce an estimate close to the true value of the parameter.

Sample size: larger n → smaller standard error
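The effect of sample size on the standard error of the mean can be tabulated directly. This sketch borrows σ = 11 from the cellular phone bill exercise in Section 7.1 purely as an illustrative value; the sample sizes are arbitrary choices.

```python
import math

sigma = 11.0  # population standard deviation, borrowed from the phone bill exercise

# Standard error of the mean, sigma / sqrt(n), for increasing sample sizes.
standard_errors = {n: sigma / math.sqrt(n) for n in (25, 100, 400)}

for n, se in standard_errors.items():
    print(f"n = {n}: standard error = {se:.2f}")
```

Each fourfold increase in n cuts the standard error in half, since the standard error shrinks with the square root of the sample size.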

Page 71: Introduction to Sampling Distributions

Section 7.3 Summary

• Computed the mean and standard deviation for the sample proportion

• Used normal approximation to compute probabilities for proportions

• Discussed the concepts of statistical bias and variability