Exercise 19: Sample Size
Uploaded by sydni-thomason
Part One
• Explore how sample size affects the distribution of sample proportions
• To do this, 20 random samples of size n=10 were taken, followed by 20 random samples of size n=40. Each sample was then summarized by its sample proportion (p-hat).
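The sampling procedure above can be sketched as a short simulation. This is a minimal sketch, not the software used in the exercise. It assumes a 0/1 population built from the Live tally counts that follow (222 "on" coded as 1, 223 "off" coded as 0), and `sample_proportions` is a hypothetical helper name.

```python
import random

def sample_proportions(population, n, reps=20, seed=None):
    """Draw `reps` simple random samples of size n without replacement
    and return the sample proportion (p-hat) of 1s in each sample."""
    rng = random.Random(seed)
    return [sum(rng.sample(population, n)) / n for _ in range(reps)]

# Assumed 0/1 population from the Live tally: 1 = "on" campus, 0 = "off" campus.
population = [1] * 222 + [0] * 223

phat_10 = sample_proportions(population, n=10, seed=1)
phat_40 = sample_proportions(population, n=40, seed=2)
print(len(phat_10), len(phat_40))  # 20 20
```

`random.sample` draws without replacement, which matches sampling students from a fixed list; the seeds only make a run repeatable.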
Tally for Discrete Variable: Live
Live   Count  Percent
off      223    50.11
on       222    49.89
N = 445   * = 1 (missing)
This confirms that the proportions of students living on campus and off campus are each approximately 50% (49.89% on, 50.11% off).
This is the population proportion, p ≈ 0.5.
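The percentages in the tally are just each count divided by the non-missing total. A minimal sketch; `tally_proportions` is a hypothetical helper name.

```python
def tally_proportions(counts):
    """Convert category counts into proportions of the non-missing total."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

live = {"off": 223, "on": 222}  # counts from the Live tally (1 missing value excluded)
props = tally_proportions(live)
print({k: round(v, 4) for k, v in props.items()})  # {'off': 0.5011, 'on': 0.4989}
```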
Mean, Shape & Standard Deviation
• What would you expect if 20 random samples of 10 were taken?
• What would you expect if 20 random samples of 40 were taken?
Results from 20 samples where n=10, giving phatlive:
0.3000 0.4000 0.5000 0.4000 0.5000 0.4000 0.5000 0.3000 0.5000 0.6000
0.6000 0.5000 0.5000 0.4000 0.5000 0.5556 0.7000 0.4000 0.6000 0.8000
Descriptive Statistics: phatlive=10
Variable      N  N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
phatlive=10  20   0  0.4978  0.0278   0.1242  0.3000   0.4000  0.5000  0.5889  0.8000
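The summary statistics above can be recomputed from the 20 listed p-hat values. This is a sketch using Python's standard `statistics` module, not the software used in the exercise; `describe` is a hypothetical helper. With the module's default `'exclusive'` quartile method, the results reproduce the Q1 = 0.4000 and Q3 = 0.5889 in the table.

```python
import math
import statistics

phat_10 = [0.3, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5, 0.3, 0.5, 0.6,
           0.6, 0.5, 0.5, 0.4, 0.5, 0.5556, 0.7, 0.4, 0.6, 0.8]

def describe(values):
    """Return the same summary columns as the Descriptive Statistics table."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method='exclusive'
    return {"N": n, "Mean": mean, "SE Mean": sd / math.sqrt(n), "StDev": sd,
            "Minimum": min(values), "Q1": q1, "Median": median,
            "Q3": q3, "Maximum": max(values)}

stats_10 = describe(phat_10)
for name, value in stats_10.items():
    print(f"{name:8} {value:.4f}")
```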
Let’s Look At A Stem Plot
Stem-and-leaf of phatlive=10 (N = 20)
Leaf Unit = 0.010
3  00
3
4  00000
4
5  0000000
5  5
6  000
6
7  0
7
8  0
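A stem plot splits each value at the leaf unit. A minimal sketch, assuming a leaf unit of 0.01 and truncation of extra digits (so 0.5556 becomes stem 5, leaf 5). Unlike the display above, which splits each stem over two lines (leaves 0 to 4 and 5 to 9), this version puts each stem on one line and has no depth column; `stem_and_leaf` is a hypothetical name.

```python
from collections import defaultdict

phat_10 = [0.3, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5, 0.3, 0.5, 0.6,
           0.6, 0.5, 0.5, 0.4, 0.5, 0.5556, 0.7, 0.4, 0.6, 0.8]

def stem_and_leaf(values, leaf_unit=0.01):
    """Map each stem to its leaves; the leaf is the digit at `leaf_unit`, truncated."""
    rows = defaultdict(list)
    for v in sorted(values):
        scaled = int(v / leaf_unit + 1e-9)  # small epsilon guards against 0.3/0.01 = 29.999...
        rows[scaled // 10].append(scaled % 10)
    lo, hi = min(rows), max(rows)
    # Include empty stems so gaps in the data stay visible.
    return {stem: "".join(str(d) for d in rows.get(stem, [])) for stem in range(lo, hi + 1)}

plot = stem_and_leaf(phat_10)
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}")
```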
Sample Proportions…
• What is the center, spread and shape for this sample proportion?
• Center: mean = 0.4978 (the average of the 20 p-hat values)
• Spread: standard deviation = 0.1242
• Shape: np = n(1-p) = 10 × 0.5 = 5, which is less than 10, so the guidelines for normality are not met. However, as the stem plot shows, the results still appear roughly normal, because the population proportions are almost perfectly balanced (about 0.5 and 0.5).
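The normality guideline used here is a simple check on both expected counts. A sketch assuming p = 0.5 as above; `normal_guideline` is a hypothetical name.

```python
def normal_guideline(n, p):
    """Rule of thumb: p-hat is approximately normal when np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

p = 0.5
for n in (10, 40):
    print(n, n * p, n * (1 - p), normal_guideline(n, p))
# 10 5.0 5.0 False
# 40 20.0 20.0 True
```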
What if the sample size increases…
Results from 20 samples where n=40, giving phatlive:
0.5750 0.4750 0.4500 0.4250 0.4750 0.3250 0.4250 0.4000 0.4250 0.3500
0.5500 0.5000 0.5385 0.4359 0.4500 0.5000 0.4750 0.4250 0.4500 0.4750
Descriptive Statistics: phatlive=40
Variable      N  N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
phatlive=40  20   0  0.4562  0.0137   0.0611  0.3250   0.4250  0.4500  0.4938  0.5750
Stem-plot for phatlive=40 (N = 20)
Leaf Unit = 0.010
3  2
3  5
3
3
4  0
4  22223
4  555
4  7777
4
5  00
5  3
5  5
5  7
Sample Proportions for phatlive=40
• What is the center, spread and shape for this sample proportion?
• Center: mean = 0.4562
• Spread: standard deviation = 0.0611
• Shape: np = n(1-p) = 40 × 0.5 = 20, which is at least 10, so the guidelines for normality are satisfied.
Let’s compare them side by side
Descriptive Statistics: phatlive=40, phatlive=10
Variable      N  N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
phatlive=40  20   0  0.4562  0.0137   0.0611  0.3250   0.4250  0.4500  0.4938  0.5750
phatlive=10  20   0  0.4978  0.0278   0.1242  0.3000   0.4000  0.5000  0.5889  0.8000
How do their centers, spreads and shapes compare?
What does this mean?
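• Center: both means are near the population proportion p ≈ 0.5 (0.4978 for n=10, 0.4562 for n=40). In these particular 20 samples the n=40 mean landed slightly farther from p, but the n=40 sample proportions cluster much more tightly around their center.
• Spread: the standard deviation is about half as large for n=40 (0.0611 vs. 0.1242).
• Shape: the distribution is closer to normal for n=40.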
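The smaller spread for n=40 matches what theory predicts: the standard deviation of p-hat is sqrt(p(1-p)/n), so quadrupling n from 10 to 40 halves it. A sketch assuming p = 0.5 and ignoring the small finite-population effect of sampling without replacement; `sd_phat` is a hypothetical name.

```python
import math

def sd_phat(n, p):
    """Theoretical standard deviation of the sample proportion for sample size n."""
    return math.sqrt(p * (1 - p) / n)

p = 0.5
sd10, sd40 = sd_phat(10, p), sd_phat(40, p)
print(round(sd10, 4), round(sd40, 4), round(sd10 / sd40, 2))  # 0.1581 0.0791 2.0
```

The observed standard deviations (0.1242 and 0.0611) are both somewhat below these values, but their ratio, about 2.03, is close to the predicted factor of 2.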
As outlined in Chapter 6
• A random variable X that counts the sampled individuals falling in the category of interest is binomial with parameters n and p if:
1. There is a fixed sample size n.
2. Each selection is independent of the others.
3. Each individual sampled takes just two possible values.
4. The probability of each individual falling in the category of interest is always p.
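Under these four conditions, P(X = k) = C(n, k) p^k (1-p)^(n-k), and the sample proportion p-hat = X/n takes the values k/n. A sketch for n = 10 and p = 0.5, chosen to match Part One; `binomial_pmf` is a hypothetical name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
# Sampling distribution of p-hat: P(p-hat = k/n) = P(X = k)
dist = {k / n: binomial_pmf(k, n, p) for k in range(n + 1)}
print(dist[0.5])  # 0.24609375
```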
However…
• The second condition isn’t strictly met when sampling without replacement. But as long as the population is at least 10 times the sample size (N ≥ 10n), the selections can be treated as approximately independent.
• Since the population (N = 445) is greater than 10 × 40 = 400, both sample sizes, 10 and 40, satisfy this rule.
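The 10n rule is a one-line check. A sketch using the Live population size N = 445; `approx_independent` is a hypothetical name.

```python
def approx_independent(population_size, n):
    """10n rule: sampling without replacement is close enough to independent if N >= 10n."""
    return population_size >= 10 * n

for n in (10, 40):
    print(n, approx_independent(445, n))
# 10 True
# 40 True
```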
Part Two
• Explore how the shape of the population affects the distribution of sample proportions.
• First, 20 random samples of 10 were taken and then 20 random samples of 40 were taken. The results were compared.
Handedness
Tally for Discrete Variables: Handed
Handed  Count  Percent
ambid      13     2.91
left       40     8.97
right     393    88.12
N = 446
• The population is very skewed with respect to being ambidextrous: only about 3% of the population is ambidextrous (p ≈ 0.029), versus about 97% who are not.
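Applying the same normality guidelines to the ambidextrous proportion shows how lopsided this population is. A sketch using the Handed tally counts and treating p = 13/446 as the population proportion:

```python
counts = {"ambid": 13, "left": 40, "right": 393}
p = counts["ambid"] / sum(counts.values())  # 13/446, about 0.029

results = {}
for n in (10, 40):
    np_, nq = n * p, n * (1 - p)
    results[n] = np_ >= 10 and nq >= 10  # both expected counts must be at least 10
    print(n, round(np_, 2), round(nq, 2), results[n])
# 10 0.29 9.71 False
# 40 1.17 38.83 False
```

For both sample sizes np is far below 10, so neither sampling distribution is expected to be close to normal by this guideline.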
For Handedness n=10
Variable         N  N*  Mean    SE Mean  StDev   Minimum  Q1    Median  Q3    Maximum
phathandedn=10  20   0  0.0300  0.0164   0.0733  0.00     0.00  0.00    0.00  0.3000
Stem-plot n=10
Stem-and-leaf of phathandedn=10 (N = 20)
Leaf Unit = 0.010
0  0000000000000000
1  000
2
3  0
What does this data show?
• The center (mean) is 0.0300
• The spread (standard deviation) is 0.0733
• The shape is not normal: np = 10 × 0.029 ≈ 0.29 is far below 10, so the guidelines of np and n(1-p) being at least 10 are not met.
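The spread for n=10 can be checked against the stem plot. A sketch that rebuilds the 20 p-hat values from the stem plot (16 at 0, three at 0.1, one at 0.3); the original sample order is not recoverable but does not affect these statistics.

```python
import statistics

# Reconstructed from the stem plot, not the original sample order.
phat_hand_10 = [0.0] * 16 + [0.1] * 3 + [0.3]

mean_10 = statistics.mean(phat_hand_10)
sd_10 = statistics.stdev(phat_hand_10)  # sample standard deviation
print(round(mean_10, 4), round(sd_10, 4))  # 0.03 0.0733
```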
Handedness n=40
Descriptive Statistics: phathandedn=40
Variable         N  N*  Mean     SE Mean  StDev    Minimum  Q1       Median   Q3       Maximum
phathandedn=40  20   0  0.04000  0.00612  0.02739  0.00000  0.02500  0.03750  0.05000  0.10000
Stem-plot n=40
Stem-and-leaf of phathandedn=40 (N = 20)
Leaf Unit = 0.0010
0  000
1
2  5555555
3
4
5  000000
6
7  555
8
9
10  0
What does this mean?
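• The center (mean) is 0.0400
• The spread (standard deviation) is 0.02739
• The shape is closer to normal than for n=10, but the guidelines are still not fully met: n(1-p) ≈ 38.8 is at least 10, while np = 40 × 0.029 ≈ 1.2 is still well below 10.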
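The n=40 summary can likewise be recomputed from its stem plot. A sketch that rebuilds the values (three at 0, seven at 0.025, six at 0.05, three at 0.075, one at 0.1) and uses the default `'exclusive'` quartile method of Python's `statistics` module:

```python
import statistics

# Reconstructed from the stem plot, not the original sample order.
phat_hand_40 = [0.0] * 3 + [0.025] * 7 + [0.05] * 6 + [0.075] * 3 + [0.1]

mean_40 = statistics.mean(phat_hand_40)
sd_40 = statistics.stdev(phat_hand_40)
q1, median, q3 = statistics.quantiles(phat_hand_40, n=4)  # method='exclusive'
print(round(mean_40, 4), round(sd_40, 5), round(q1, 4), round(median, 4), round(q3, 4))
```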
Let’s compare them…
Variable         N  N*  Mean    SE Mean  StDev    Minimum  Q1       Median   Q3       Maximum
phathandedn=40  20   0  0.0400  0.00612  0.02739  0.00000  0.02500  0.03750  0.05000  0.10000
phathandedn=10  20   0  0.0300  0.0164   0.0733   0.0000   0.0000   0.0000   0.0000   0.3000
What does it mean?
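• Increasing the sample size made the box plot less skewed.
• There was less spread (standard deviation 0.02739 vs. 0.0733) and there were fewer outliers.
• The center stayed close to the population proportion p ≈ 0.03 (means of 0.0300 for n=10 and 0.0400 for n=40).
• The shape became more normal.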
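The reduced skew can also be seen from theory. Under the binomial model (which the 10n rule justifies as an approximation here), the count X has skewness (1 - 2p)/sqrt(np(1-p)), and dividing by n to get p-hat leaves the skewness unchanged. So quadrupling n from 10 to 40 halves the skewness. A sketch using p = 13/446; `phat_skewness` is a hypothetical name.

```python
import math

def phat_skewness(n, p):
    """Skewness of p-hat = X/n when X ~ Binomial(n, p); scaling by 1/n does not change skewness."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

p = 13 / 446  # population proportion of ambidextrous students
skew_10 = phat_skewness(10, p)
skew_40 = phat_skewness(40, p)
print(round(skew_10, 2), round(skew_40, 2), round(skew_10 / skew_40, 2))
```

Both values are well above 0 (right-skewed), which is consistent with the np guideline still failing at n=40.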