Exercise 19: Sample Size


Part One

• Explore how sample size affects the distribution of sample proportions

• This was done by first drawing 20 random samples of size n=10 and then 20 random samples of size n=40. Each sample was then summarized by its sample proportion (p-hat).
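The sampling procedure described above can be sketched in Python. This is a minimal illustration, not the Minitab session used in the exercise: the function name `sample_proportions`, the seed, and the choice of "on" as the category of interest are assumptions, and the population is rebuilt from the tally counts (222 on, 223 off).

```python
import random

# Population rebuilt from the tally: 222 "on" and 223 "off" (N = 445).
population = ["on"] * 222 + ["off"] * 223


def sample_proportions(population, n, repeats=20, category="on", seed=None):
    """Draw `repeats` simple random samples of size n (without replacement)
    and return the sample proportion (p-hat) of `category` in each sample."""
    rng = random.Random(seed)
    return [rng.sample(population, n).count(category) / n for _ in range(repeats)]


# Hypothetical seed, used only so the sketch is repeatable.
phat_n10 = sample_proportions(population, n=10, seed=1)
phat_n40 = sample_proportions(population, n=40, seed=1)

print(phat_n10)
print(phat_n40)
```

Each run with a different seed gives a different set of 20 p-hat values, which is the sampling variability the exercise explores.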

Tally for Discrete Variable: Live

Live  Count  Percent
off     223    50.11
on      222    49.89

N = 445   * = 1

This verifies that the proportions of students living on campus and off campus are each approximately 50%.

This would be the population proportion (p).

Mean, Shape & Standard Deviation

• What would you expect if 20 random samples of 10 were taken?

• What would you expect if 20 random samples of 40 were taken?

Results from 20 samples where n=10, resulting in phatlive…

0.3000  0.4000  0.5000  0.4000  0.5000  0.4000  0.5000  0.3000  0.5000  0.6000
0.6000  0.5000  0.5000  0.4000  0.5000  0.5556  0.7000  0.4000  0.6000  0.8000

Descriptive Statistics: phatlive=10

Variable   N  N*    Mean  SE Mean   StDev
Phatlive  20   0  0.4978   0.0278  0.1242

Minimum      Q1  Median      Q3  Maximum
 0.3000  0.4000  0.5000  0.5889   0.8000
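As a cross-check, the summary statistics can be recomputed from the 20 listed p-hat values. The sketch below uses Python's standard `statistics` module rather than Minitab; the variable names are illustrative. The default "exclusive" quantile method is assumed here because it reproduces the quartiles reported above.

```python
import statistics

# The 20 sample proportions reported for n=10.
phat_n10 = [
    0.3000, 0.4000, 0.5000, 0.4000, 0.5000, 0.4000, 0.5000, 0.3000, 0.5000, 0.6000,
    0.6000, 0.5000, 0.5000, 0.4000, 0.5000, 0.5556, 0.7000, 0.4000, 0.6000, 0.8000,
]

mean = statistics.mean(phat_n10)
stdev = statistics.stdev(phat_n10)        # sample standard deviation (divides by N - 1)
se_mean = stdev / len(phat_n10) ** 0.5    # standard error of the mean
q1, median, q3 = statistics.quantiles(phat_n10, n=4)  # default "exclusive" method

print(f"Mean={mean:.4f}  SE Mean={se_mean:.4f}  StDev={stdev:.4f}")
print(f"Min={min(phat_n10):.4f}  Q1={q1:.4f}  Median={median:.4f}  "
      f"Q3={q3:.4f}  Max={max(phat_n10):.4f}")
```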

Let’s Look At A Stem Plot

Stem-and-leaf of phatlive=10 (N = 20)
Leaf Unit = 0.010

3  00
3
4  00000
4
5  0000000
5  5
6  000
6
7  0
7
8  0

Sample Proportions…

• What are the center, spread and shape of these sample proportions?

• Center = mean of the p-hat values = 0.4978
• Spread = standard deviation = 0.1242
• Shape: np and n(1-p) are both about 5 (10 × 0.5), which is less than 10, so the guidelines for normality are not met. However, as shown in the stem plot, the results appear relatively normal because the population proportions are almost perfectly balanced at about 0.5 and 0.5.
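The rule of thumb used for shape (both np and n(1-p) at least 10) can be checked for the two sample sizes in the exercise. This is a small sketch; the helper name `normal_guideline_met` and its `threshold` parameter are illustrative, and p is taken from the tally (222 of 445).

```python
def normal_guideline_met(n, p, threshold=10):
    """Return (np, n(1-p), met) for the rule of thumb np >= 10 and n(1-p) >= 10."""
    expected_in = n * p          # expected count in the category of interest
    expected_out = n * (1 - p)   # expected count outside the category
    return expected_in, expected_out, (expected_in >= threshold and expected_out >= threshold)


p = 222 / 445  # population proportion from the tally, about 0.4989

for n in (10, 40):
    expected_in, expected_out, met = normal_guideline_met(n, p)
    print(f"n={n}: np={expected_in:.2f}, n(1-p)={expected_out:.2f}, guideline met: {met}")
```

With p near 0.5, the guideline fails at n=10 (both counts are about 5) and holds at n=40 (both counts are about 20).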

What if the sample size increases…

Results from 20 samples where n=40, resulting in phatlive…

0.5750  0.4750  0.4500  0.4250  0.4750  0.3250  0.4250  0.4000  0.4250  0.3500
0.5500  0.5000  0.5385  0.4359  0.4500  0.5000  0.4750  0.4250  0.4500  0.4750

Descriptive Statistics: phatlive=40

Variable      N  N*    Mean  SE Mean   StDev
Phatlive=40  20   0  0.4562   0.0137  0.0611

Minimum      Q1  Median      Q3  Maximum
 0.3250  0.4250  0.4500  0.4938   0.5750

Stem-plot for phatlive=40
N = 20, Leaf Unit = 0.010

3  2
3  5
3
3
4  0
4  22223
4  555
4  7777
4
5  00
5  3
5  5
5  7

Sample Proportions for phatlive=40

• What are the center, spread and shape of these sample proportions?

• Center = mean = 0.4562
• Spread = standard deviation = 0.0611
• Shape: np and n(1-p) are both about 20 (40 × 0.5), which is greater than 10, so the guidelines for normality are satisfied.

Let’s compare them simultaneously

Descriptive Statistics: phatlive=40, phatlive=10

Variable      N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
phatlive=40  20   0  0.4562   0.0137  0.0611   0.3250  0.4250  0.4500  0.4938   0.5750
phatlive=10  20   0  0.4978   0.0278  0.1242   0.3000  0.4000  0.5000  0.5889   0.8000

How do their centers, spreads and shapes compare?

Box-plots

What does this mean?

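• In theory, the mean of the sample proportions should be close to the population proportion (about 0.50) at either sample size. In these 20 samples, the n=10 mean (0.4978) actually landed closer to it than the n=40 mean (0.4562).

• The spread is smaller for n=40 (standard deviation 0.0611 versus 0.1242).
• The shape is more normal for n=40.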
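The smaller spread at n=40 matches the standard textbook formula for the standard deviation of a sample proportion, sqrt(p(1-p)/n). That formula is not stated in the transcript, so treat the sketch below as a supplementary check. It assumes p = 0.5 and ignores the finite-population correction; the function name `sd_of_phat` is illustrative.

```python
import math


def sd_of_phat(p, n):
    """Theoretical standard deviation of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


p = 0.5  # assumed population proportion (the tally gives about 0.50)

sd10 = sd_of_phat(p, 10)
sd40 = sd_of_phat(p, 40)
theoretical_ratio = sd10 / sd40    # quadrupling n halves the spread
observed_ratio = 0.1242 / 0.0611   # StDev values reported in the exercise

print(f"theoretical: sd(n=10)={sd10:.4f}, sd(n=40)={sd40:.4f}, ratio={theoretical_ratio:.2f}")
print(f"observed:    sd(n=10)=0.1242, sd(n=40)=0.0611, ratio={observed_ratio:.2f}")
```

The observed standard deviations are somewhat smaller than the theoretical ones, which is plausible for only 20 samples, but their ratio is close to the theoretical factor of 2.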

As outlined in Chapter 6

• A random variable X for count of sampled individuals in the category of interest is binomial with parameters n and p if…

1. There is a fixed sample size n.
2. Each selection is independent of the others.
3. Each individual sampled takes just two possible values.
4. The probability of each individual falling in the category of interest is always p.

However…

• The second condition isn’t really met when sampling without replacement. But as long as the population is at least 10n, approximate independence can still be assumed.

• Since the population (about 445 students) is greater than 400 = 10 × 40, both sample sizes of 10 and 40 satisfy this rule.
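The 10n rule above is simple arithmetic and can be checked directly. In this sketch the helper name `population_large_enough` and its `factor` parameter are illustrative, and the population size of 445 comes from the Live tally.

```python
def population_large_enough(population_size, n, factor=10):
    """Rule of thumb: treat selections as approximately independent
    when the population is at least `factor` times the sample size."""
    return population_size >= factor * n


population_size = 445  # N from the Live tally

for n in (10, 40):
    ok = population_large_enough(population_size, n)
    print(f"n={n}: need population >= {10 * n}, have {population_size}, rule satisfied: {ok}")
```

By the same rule, the largest sample size this population supports is 44, since 10 × 45 = 450 exceeds 445.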

Part 2

• Explore how population shape affects the distribution of sample proportions.

• First, 20 random samples of 10 were taken and then 20 random samples of 40 were taken. The results were compared.

Handedness

Tally for Discrete Variables: Handed

Handed  Count  Percent
ambid      13     2.91
left       40     8.97
right     393    88.12

N = 446

• The population is very skewed with respect to ambidexterity: only about 3% of the population is ambidextrous versus about 97% who are not.

For Handedness n=10

Variable         N  N*    Mean  SE Mean   StDev
phathandedn=10  20   0  0.0300   0.0164  0.0733

Min.    Q1  Median    Q3    Max.
0.00  0.00    0.00  0.00  0.3000

Stem-plot n=10

Stem-and-leaf of phathandedn=10
N = 20, Leaf Unit = 0.010

0  0000000000000000
1  000
2
3  0

What does this data show?

• The center or mean is 0.0300
• The spread is 0.0733
• The shape is not normal because the guidelines of np and n(1-p) being at least 10 are not met (np = 10 × 0.03 = 0.3)
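To see how far a skewed population is from meeting the guideline, the sketch below computes np for the ambidextrous proportion and the smallest sample size at which np would reach 10. The proportion 13/446 comes from the Handed tally; the variable names are illustrative.

```python
import math

p_ambid = 13 / 446  # proportion ambidextrous from the tally, about 0.0291

n = 10
expected_count = n * p_ambid  # expected number of ambidextrous students per sample

# np is the binding condition here because p is small; n(1-p) is far larger.
min_n = math.ceil(10 / p_ambid)  # smallest n with n * p >= 10

print(f"n={n}: np={expected_count:.2f}")
print(f"smallest n with np >= 10: {min_n}")
```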

Handedness n=40

Descriptive Statistics: phathandedn=40

Variable         N  N*     Mean  SE Mean    StDev
phathandedn=40  20   0  0.04000  0.00612  0.02739

Minimum       Q1   Median       Q3  Maximum
0.00000  0.02500  0.03750  0.05000  0.10000

Stem-plot n=40

Stem-and-leaf of phathandedn=40
N = 20, Leaf Unit = 0.0010

 0  000
 1
 2  5555555
 3
 4
 5  000000
 6
 7  555
 8
 9
10  0

What does this mean?

• The center or mean is 0.0400
• The spread is 0.02739
• The shape is closer to normal than for n=10, but the guidelines are still not met: np = 40 × 0.03 ≈ 1.2, which is well below 10.

Let’s compare them…

Variable         N  N*    Mean  SE Mean    StDev  Minimum       Q1   Median       Q3  Maximum
phathandedn=40  20   0  0.0400  0.00612  0.02739  0.00000  0.02500  0.03750  0.05000  0.10000
phathandedn=10  20   0  0.0300   0.0164   0.0733   0.0000   0.0000   0.0000   0.0000   0.3000

Let’s compare them…

What does it mean?

• By increasing the sample size, the box plot became less skewed.

• There was less spread and fewer outliers.
• The center remained close to the population proportion of about 0.03 (mean 0.03 for n=10, 0.04 for n=40).
• The shape became more normal.

Overall

• Live seemed to be more normal than handedness. This was because the population was not skewed for the live variable as it was for handedness.

• In both situations, n=40 caused the distributions to be more normal.