Unit 6 Sampling Distributions and Statistical Inference - 1 FPP Chapters 16-18, 20-21, 23
6-1Stats
Unit 6
Sampling Distributions and Statistical Inference - 1
FPP Chapters 16-18, 20-21, 23
The Law of Averages (Ch 16)
Box Models (Ch 16)
Sampling Distribution
Probability Histogram (Ch 17)
Central Limit Theorem (Ch 17, 18)
Expected Value (Ch 17, 18) for average (mean), sum, percentage, count
Standard Error (Ch 17, 18) for average (mean), sum, percentage, count
Chance Error
Confidence Intervals (Ch 21)
6-2Stats
The Law of Averages
• Toss a coin 10,000 times.
• On each toss, the chance of heads is 50%.
• After each toss, let’s note
 – the number of heads
 – the percentage of heads
6-3Stats
Number of heads
[Figure: number of heads minus expected number of heads (y axis) against the number of tosses (x axis, 10 to 10,000)]
6-4Stats
Percentage of heads
[Figure: percentage of heads minus 50% (y axis) against the number of tosses (x axis, 10 to 10,000)]
6-5Stats
The Law of Averages
With a large number of tosses, the percentage of heads is likely to be close to 50%, although it is not likely to be exactly equal to 50%.
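Here is a minimal Python sketch of the experiment on the preceding slides. It assumes a fair coin simulated with Python's `random` module, and the seed and checkpoints are arbitrary choices. At each checkpoint it reports two things: the chance error in the number of heads, and the percentage of heads minus 50%.

```python
import random

def toss_record(n_tosses, checkpoints, seed=0):
    """Toss a fair coin n_tosses times; at each checkpoint record
    (tosses so far, heads - expected heads, % heads - 50%)."""
    rng = random.Random(seed)
    heads = 0
    record = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True adds 1 head
        if toss in checkpoints:
            count_error = heads - toss / 2
            pct_error = 100 * heads / toss - 50
            record.append((toss, count_error, pct_error))
    return record

for n, count_err, pct_err in toss_record(10_000, {10, 50, 100, 500, 1000, 5000, 10_000}):
    print(f"{n:>6} tosses: heads - expected = {count_err:+7.1f}, % heads - 50% = {pct_err:+6.2f}")
```

In a typical run, the first error tends to grow in absolute size while the second tends to shrink. This is what the law of averages describes.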
6-6Stats
The Law of Averages does NOT say …
“The ___________________ team has had such a long string of losses, they are due to get a win. Therefore their chances of winning the next game are greater.”
“I have tossed a coin many times, and now have a string of 5 heads. So the chances of getting tails on the next toss must be greater than 50%.”
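A quick simulation can check the second statement. It assumes independent tosses of a fair coin, and the helper `tails_after_heads_run` is hypothetical, written only for this illustration. It looks only at tosses that come right after 5 heads in a row.

```python
import random

def tails_after_heads_run(n_tosses, run_length=5, seed=0):
    """Fraction of tails among tosses that immediately follow
    run_length heads in a row, for a simulated fair coin."""
    rng = random.Random(seed)
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]  # True = heads
    followers = [
        tosses[i]
        for i in range(run_length, n_tosses)
        if all(tosses[i - run_length:i])  # previous run_length tosses all heads
    ]
    tails = sum(1 for t in followers if not t)
    return tails / len(followers), len(followers)

frac_tails, n_cases = tails_after_heads_run(200_000)
print(f"{n_cases} tosses followed 5 heads; {100 * frac_tails:.1f}% were tails")
```

The fraction of tails should come out close to 50%. The coin has no memory of the streak.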
6-7Stats
Number of Heads, Chance Error
• Number of heads = 50% of the number of tosses + chance error
• Can we assess how large the chance error is likely to be?
6-8Stats
Coin toss example
• It turns out that
 – after 100 tosses, the chance error is typically about 5;
 – after 10,000 tosses, the chance error is typically about 50;
 – increasing the number of tosses by 100 times increases the chance error _______ times.
$\sqrt{100} = 10$
•Why does the percentage go to 50%?
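The numbers above follow the square-root law. This is a sketch assuming a fair coin. The typical chance error in the number of heads is $\sqrt{\text{number of tosses}} \times 0.5$, where 0.5 is the SD of a box with one 0 and one 1. Dividing by the number of tosses converts that error to a percentage.

```python
import math

def se_count_heads(n_tosses):
    """Typical size of the chance error in the number of heads
    for a fair coin: sqrt(n) x 0.5."""
    return math.sqrt(n_tosses) * 0.5

def se_percent_heads(n_tosses):
    """The same chance error, as a percentage of the number of tosses."""
    return 100 * se_count_heads(n_tosses) / n_tosses

for n in (100, 10_000):
    print(n, se_count_heads(n), se_percent_heads(n))
# 100 5.0 5.0
# 10000 50.0 0.5
```

The chance error in the count grows like the square root of the number of tosses. As a percentage of the tosses, however, it shrinks, which is why the percentage of heads settles near 50%.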
6-9Stats
Example
We have the choice of tossing a coin 10 times or 100 times. We win if
 – we get more than 60% heads.
 – we get more than 40% heads.
 – we get between 40% and 60% heads.
 – we get exactly 50% heads.
Should we toss 10 or 100 times?
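The four games can be compared exactly with the binomial formula. This sketch assumes a fair coin and reads "between 40% and 60%" as including both endpoints. Percentages are compared with integer arithmetic to avoid rounding.

```python
from fractions import Fraction
from math import comb

def prob_heads(n, predicate):
    """Exact chance that the number of heads k in n fair tosses
    satisfies predicate(k, n)."""
    return sum(Fraction(comb(n, k), 2 ** n) for k in range(n + 1) if predicate(k, n))

# Compare 100*k with 60*n (etc.) so no percentages are rounded.
rules = {
    "more than 60% heads": lambda k, n: 100 * k > 60 * n,
    "more than 40% heads": lambda k, n: 100 * k > 40 * n,
    "between 40% and 60% (inclusive)": lambda k, n: 40 * n <= 100 * k <= 60 * n,
    "exactly 50% heads": lambda k, n: 2 * k == n,
}
for name, rule in rules.items():
    p10, p100 = prob_heads(10, rule), prob_heads(100, rule)
    better = 10 if p10 > p100 else 100
    print(f"{name:34s} 10 tosses: {float(p10):.3f}  100 tosses: {float(p100):.3f}  -> toss {better}")
```

For these rules, 10 tosses is better for "more than 60%" and "exactly 50%". For the other two, 100 tosses is better, because more tosses pull the percentage toward 50%.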
6-10Stats
Baseball series
• Team A believes that on any day they have a 60% chance of beating Team B.
• They have the option of playing
 – 1 game, or
 – best 2 out of 3.
• Which format should they choose?
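Here is a sketch of the comparison. It assumes each game is won independently with chance 0.6, the slide's 60%. Team A wins a best-of-3 series in one of two ways: by taking the first two games, or by splitting them in either order and then winning the third.

```python
from fractions import Fraction

def win_best_of_three(p):
    """Chance of winning 2 of 3 independent games, each won with chance p:
    win games 1 and 2, or split them (2 orders) and win game 3."""
    return p ** 2 + 2 * p * (1 - p) * p

p = Fraction(3, 5)
print("one game:", float(p))                        # 0.6
print("best 2 of 3:", float(win_best_of_three(p)))  # 0.648
```

81/125 = 0.648 beats 0.6, so under this assumption the longer format favors the stronger team.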
6-11Stats
Where we are headed
• We want to perform a political survey and randomly sample citizens.
• We want to quantify the chance variability of our sample. (For example, we would not want a sample that, by chance alone, is all Republicans.)
• We can solve variability questions like these by analogy with drawing from a box.
6-12Stats
Making a Box Model
In specifying a box model, we would like to know
- What numbers go into the box
- How many of each kind
- How many draws (sample size)
In practice, what do we really know / not know?
Why do we make box models?
6-13Stats
Variability in the box model
Tickets in the box: 1 2 3 4 5 6
• Sample 25 tickets with replacement.
• Record the sum of the 25 tickets.
3 2 3 2 6 4 6 5 1 5 6 1 5 3 1 3 5 2 4 2 2 6 5 3 4
•Their sum is 89.
6-14Stats
Try again
4 4 6 1 4 1 6 1 5 2 1 4 5 2 1 4 5 2 2 5 4 3 3 2 6
•sum is 83
3 2 3 5 1 4 4 6 5 1 2 1 5 2 1 2 4 3 4 6 1 6 3 1 3
•sum is 78
• Other tries: 82, 92, 71, 73, 90
• The possible range is 25 to 150 (all 1s up to all 6s), but we only observed sums from 71 to 92.
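Repeating the draw many times shows the same spread. This is a minimal sketch using Python's `random` module; the seed and the 1,000 repetitions are arbitrary. It also rechecks the first draw listed above.

```python
import random

BOX = [1, 2, 3, 4, 5, 6]

def sample_sum(box, n_draws, rng):
    """Sum of n_draws tickets drawn at random with replacement."""
    return sum(rng.choice(box) for _ in range(n_draws))

rng = random.Random(2024)
sums = [sample_sum(BOX, 25, rng) for _ in range(1000)]
print("possible range:", 25 * min(BOX), "to", 25 * max(BOX))
print("observed range:", min(sums), "to", max(sums))

# The first draw listed on the slide, checked by machine
first_try = [3, 2, 3, 2, 6, 4, 6, 5, 1, 5, 6, 1, 5, 3, 1, 3, 5, 2, 4, 2, 2, 6, 5, 3, 4]
print(len(first_try), sum(first_try))  # 25 89
```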
6-15Stats
Roulette
• A roulette wheel has 38 pockets:
 – 18 red numbers
 – 18 black numbers
 – 2 green (0 and 00)
• We put a dollar on red. What are the chances of winning?
• What numbers are in the box?
6-16Stats
Net gain
• Net gain is the amount that we have won or lost.
• Let’s play 10 times…
Play        1   2   3   4   5   6   7   8   9  10
Outcome     R   R   R   B   G   R   R   B   B   R
Win/loss   +1  +1  +1  –1  –1  +1  +1  –1  –1  +1
Net gain   +1  +2  +3  +2  +1  +2  +3  +2  +1  +2
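The running net gain in the table can be reproduced from the outcomes. This sketch assumes the usual box for a $1 bet on red: 18 tickets marked +1 (red) and 20 marked −1 (18 black plus 2 green).

```python
from itertools import accumulate

# Box for $1 on red: 18 tickets worth +1, 20 worth -1
ROULETTE_BOX = [+1] * 18 + [-1] * 20
GAIN = {"R": +1, "B": -1, "G": -1}

outcomes = "R R R B G R R B B R".split()
wins_losses = [GAIN[o] for o in outcomes]
net_gain = list(accumulate(wins_losses))  # running total after each play
print(net_gain)  # [1, 2, 3, 2, 1, 2, 3, 2, 1, 2]
print("chance of winning one play:", ROULETTE_BOX.count(+1) / len(ROULETTE_BOX))
```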
6-17Stats
So, Our Box Model is …
6-18Stats
6-19Stats
Which game?
You win if you draw a “1”.
• A box has 1 “0” ticket and 9 “1” tickets.
Or
• A box has 10 “0” tickets and 90 “1” tickets.
Or
• You draw 10 times with replacement. If the sum is 10, then you win.
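The single-draw chances follow directly from the boxes. For the third game the slide does not say which box is used, so this sketch ASSUMES the first box (1 “0”, 9 “1”s). With that box, the sum of 10 draws is 10 only if every draw is a 1.

```python
from fractions import Fraction

def chance_one_draw(n_zeros, n_ones):
    """Chance of drawing a 1 in a single draw from a 0-1 box."""
    return Fraction(n_ones, n_zeros + n_ones)

game_a = chance_one_draw(1, 9)         # 1 "0", 9 "1"s, one draw
game_b = chance_one_draw(10, 90)       # 10 "0"s, 90 "1"s, one draw
game_c = chance_one_draw(1, 9) ** 10   # ASSUMED: 10 draws from the first box, all must be 1
print(float(game_a), float(game_b), round(float(game_c), 4))  # 0.9 0.9 0.3487
```

Under that assumption, the first two games each give a 90% chance, while the third gives only about 35%.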
6-20Stats
Our Box Model is …
6-21Stats
Expected Value (Ch 17)
“The expected value for the sum of draws made at random with replacement from a box”
equals the expected value for a sample sum
equals
$\text{EV(sum)} = \text{number of draws} \times (\text{average of box})$.
A sample sum is likely to be around its expected value, but to be off by a chance error similar in size to the standard error for sum.
6-22Stats
Standard Error for Sum
The standard error for sum, SE(sum), for a random sample of a given sample size is
$\text{SE(sum)} = \sqrt{\text{sample size}} \times (\text{population SD})$.
In FPP, this is
$\text{SE(sum)} = \sqrt{\text{number of draws}} \times (\text{SD of box})$.
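The EV and SE formulas for a sum translate directly into code. This sketch uses Python's `statistics` module, with `pstdev` (which divides by the number of tickets) playing the role of the SD of the box. The example is the die box with 25 draws from the variability slides.

```python
import math
from statistics import mean, pstdev

def ev_sum(box, n_draws):
    """EV(sum) = number of draws x average of box."""
    return n_draws * mean(box)

def se_sum(box, n_draws):
    """SE(sum) = sqrt(number of draws) x SD of box (SD divides by number of tickets)."""
    return math.sqrt(n_draws) * pstdev(box)

die = [1, 2, 3, 4, 5, 6]
print(ev_sum(die, 25), round(se_sum(die, 25), 2))  # 87.5 8.54
```

The observed sums 71 to 92 all lie within about 2 SEs (about 17) of 87.5.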
6-23Stats
A Sample Sum is Likely ...
The sample sum is likely to be around
____________, give or take
____________ or so.
The expected value for the sum, EV(sum), fills the first blank.
The standard error for sum, SE(sum), fills the second blank.
Observed values are rarely more than 2 or 3 SE’s away from the expected value.
6-24Stats
A Reminder
The formulas here are for simple random samples. They likely do not apply to other kinds of samples.
6-25Stats
Example - Keno
In Keno, suppose you bet $1 on one number. If you win, you get $2; if you lose, you lose your $1.
The chance of winning is ________.
What does the box model look like?
What is the expected net gain after 100 plays?
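Here is a sketch of the 100-play calculation. The slide leaves the chance of winning blank. This example ASSUMES that chance is 1/4 (20 of the 80 Keno numbers are drawn) and that a win is a net gain of $2.

```python
import math
from statistics import mean, pstdev

# ASSUMED box for one $1 bet on one number: one ticket +2, three tickets -1
KENO_BOX = [+2, -1, -1, -1]
n_plays = 100

ev_net = n_plays * mean(KENO_BOX)               # EV(sum) = draws x average of box
se_net = math.sqrt(n_plays) * pstdev(KENO_BOX)  # SE(sum) = sqrt(draws) x SD of box
print(ev_net, round(se_net, 2))  # -25.0 12.99
```

Under these assumptions, the expected net gain for 100 plays is −$25, give or take about $13.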
6-26Stats
6-27Stats
Example: Washington State Lottery
In MegaMillions, you pay $1 to play. You select 5 numbers between 1 and 56, and one MegaBall number between 1 and 46. If you match all 5 numbers AND the MegaBall number, you win the jackpot (starts at $12 million).
The chance of winning is _____.
What does the box model look like?
What is the expected net gain after 100 plays?
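This sketch counts the equally likely tickets under the rules described above and builds the box for one play. It ASSUMES a $12 million jackpot and ignores the smaller prizes, so the expected value it prints is only illustrative.

```python
from fractions import Fraction
from math import comb

# Equally likely tickets: choose 5 of 56 numbers, times 46 MegaBall choices
combos = comb(56, 5) * 46
p_jackpot = Fraction(1, combos)

JACKPOT = 12_000_000  # ASSUMED: starting jackpot; smaller prizes ignored
# Box for one $1 play: one ticket worth JACKPOT - 1, all others worth -1
avg_box = p_jackpot * (JACKPOT - 1) + (1 - p_jackpot) * (-1)
print(combos)                          # 175711536
print(round(float(avg_box), 4))        # expected net gain per play
print(round(float(100 * avg_box), 2))  # expected net gain for 100 plays
```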
6-28Stats
6-29Stats
Washington State Lottery (continued)
Today’s jackpot is ___________.
Suppose you play 10 times. We want to know about your net gain.
What is the relevant box model?
6-30Stats
Washington State Lottery (continued)
What is the expected net gain if you buy 100 tickets?
What does that mean?
What is the standard error for your net gain?
What does that tell us?
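The standard error can be sketched the same way. The function `lottery_ev_se` is hypothetical. It uses the two-valued box (one jackpot ticket, every other ticket worth −$1) and ignores smaller prizes. Because today's amount is left blank on the slide, it takes the jackpot as a parameter.

```python
import math
from math import comb

def lottery_ev_se(jackpot, n_tickets):
    """EV and SE of the net gain from n_tickets $1 tickets, using a box with
    one ticket worth jackpot - 1 and all others worth -1 (smaller prizes ignored)."""
    p = 1 / (comb(56, 5) * 46)
    avg = p * (jackpot - 1) + (1 - p) * (-1)
    var = p * (jackpot - 1 - avg) ** 2 + (1 - p) * (-1 - avg) ** 2
    return n_tickets * avg, math.sqrt(n_tickets) * math.sqrt(var)

ev, se = lottery_ev_se(jackpot=12_000_000, n_tickets=100)
print(f"EV about {ev:.2f} dollars, SE about {se:,.0f} dollars")
```

With a $12 million jackpot, the SE comes out roughly a hundred times larger than the expected loss. With 100 tickets, almost every outcome is exactly −$100 and a very rare one is enormous. For that reason, the normal curve is likely a poor guide here.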
6-31Stats
Probability histogram
Earlier in the course we displayed data in histograms.
[Figure: histogram for rolling a die, x from 1 to 6]
• Probability histograms represent the true (as opposed to the data) chance of an outcome.
• Example: rolling a die
6-32Stats
Sum of two dice
[Figure: four histograms of the sum of two dice, x from 2 to 12: empirical histograms after 100, 1,000, and 10,000 rolls, and the true probability histogram]
6-33Stats
Empirical vs. truth
After rolling 100 times we see that we never rolled a 2. But we know a 2 is possible.
After rolling 1,000 times, the distribution seems more symmetric.
After 10,000 rolls, the histogram is nearly symmetric.
The empirical histogram converges to the true histogram.
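This is a sketch of the convergence for the sum of two dice. The true probability histogram comes from counting the 36 equally likely outcomes. The empirical histograms come from simulated rolls, and the seed is arbitrary.

```python
import random
from collections import Counter
from fractions import Fraction

# True probability histogram: count the 36 equally likely outcomes
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))
truth = {total: Fraction(n, 36) for total, n in sorted(ways.items())}

def empirical(n_rolls, seed=0):
    """Fraction of n_rolls (two dice each) that give each total 2..12."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
    return {s: counts[s] / n_rolls for s in range(2, 13)}

for n in (100, 1_000, 10_000):
    emp = empirical(n)
    worst = max(abs(emp[s] - float(truth[s])) for s in truth)
    print(f"{n:>6} rolls: largest gap from the true histogram = {worst:.3f}")
```

The largest gap usually shrinks as the number of rolls grows, though any single run can be uneven.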
6-34Stats
Caution
There are two counts that may be confused:
 – the number of things added together
 – the number of repetitions of the experiment
As the number of repetitions increases, the empirical distribution converges to the true histogram.
What happens when the number of things added together increases?
6-35Stats
Expected Value (Ch 23)
“The expected value for the average of draws made at random with replacement from a box”
equals the expected value for a sample mean
equals
$\text{EV(avg)} = \text{average of box}$.
A sample average (mean) is likely to be around its expected value, but to be off by a chance error similar in size to the standard error for average.
6-36Stats
Standard Error for Average
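The standard error for average, SE(avg), for a random sample of a given sample size is
$\text{SE(avg)} = \dfrac{\text{population SD}}{\sqrt{\text{sample size}}}$.
In FPP, this is
$\text{SE(avg)} = \dfrac{\text{SD of box}}{\sqrt{\text{number of draws}}}$.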
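Here are the same formulas in code, as a sketch that uses `statistics.pstdev` for the SD of the box. EV(avg) does not depend on the number of draws, while SE(avg) shrinks like $1/\sqrt{\text{number of draws}}$.

```python
import math
from statistics import mean, pstdev

def ev_avg(box):
    """EV(avg) = average of box, whatever the number of draws."""
    return mean(box)

def se_avg(box, n_draws):
    """SE(avg) = SD of box / sqrt(number of draws)."""
    return pstdev(box) / math.sqrt(n_draws)

die = [1, 2, 3, 4, 5, 6]
for n in (25, 100):
    print(n, ev_avg(die), round(se_avg(die, n), 3))
```

Quadrupling the draws from 25 to 100 halves SE(avg), but leaves EV(avg) at 3.5.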
6-37Stats
A Sample Average is Likely ...
The sample average is likely to be around
__________, give or take
____________ or so.
The expected value for the average, EV(avg), fills the first blank.
The standard error for average, SE(avg), fills the second blank.
Observed values are rarely more than 2 or 3 SE’s away from the expected value.
6-38Stats
A Warning
The formulas here are for simple random samples. They likely do not apply to other kinds of samples.
6-39Stats
Probability histograms and the normal curve
Toss a coin 100 times.
[Figure: probability histogram for the number of heads in 100 tosses, x from 35 to 65]
Average = 50
SD = 5
6-40Stats
Using the Normal
• A coin is tossed 100 times. Use the normal curve to estimate the chances of
 – exactly 50 heads (7.96%)
 – between 45 and 55 heads inclusive (72.87%)
 – between 45 and 55 heads exclusive (63.19%)
• Probability histograms can be difficult to compute, but the normal curve is easy.
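This sketch compares normal-curve estimates with exact binomial chances, assuming a fair coin. The normal curve uses EV = 50 and SE = 5 with the usual continuity correction, in which the bar for k heads runs from k − 0.5 to k + 0.5.

```python
import math

def normal_cdf(x, ev, se):
    """Normal-curve area (center ev, spread se) to the left of x."""
    return 0.5 * (1 + math.erf((x - ev) / (se * math.sqrt(2))))

def normal_area(lo, hi, ev, se):
    return normal_cdf(hi, ev, se) - normal_cdf(lo, ev, se)

def binomial_exact(k_lo, k_hi, n):
    """Exact chance of k_lo..k_hi heads (inclusive) in n fair tosses."""
    return sum(math.comb(n, k) for k in range(k_lo, k_hi + 1)) / 2 ** n

ev, se = 50, 5  # 100 tosses: EV = 100 x 0.5, SE = sqrt(100) x 0.5
cases = {  # (normal lo, normal hi, exact k_lo, exact k_hi)
    "exactly 50": (49.5, 50.5, 50, 50),
    "45 to 55 inclusive": (44.5, 55.5, 45, 55),
    "45 to 55 exclusive": (45.5, 54.5, 46, 54),
}
for name, (lo, hi, k_lo, k_hi) in cases.items():
    approx = normal_area(lo, hi, ev, se)
    exact = binomial_exact(k_lo, k_hi, 100)
    print(f"{name:20s} normal {100 * approx:5.2f}%   exact {100 * exact:5.2f}%")
```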
6-41Stats
Drawing from a lopsided box
Assume that the box has tickets 1, 9, 5, 5, 5.
[Figure: three probability histograms for draws from this box, with x axes spanning roughly 2 to 8, 400 to 550, and 4,800 to 5,300]
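This sketch computes the exact probability histogram for the sum of draws from this box by repeated convolution. It uses 100 draws for illustration, which gives EV(sum) = 100 × 5 = 500 and SE(sum) = √100 × 2.53 ≈ 25.3.

```python
import math
from collections import defaultdict

BOX = [1, 9, 5, 5, 5]

def sum_distribution(box, n_draws):
    """Exact probability histogram for the sum of n_draws made at random
    with replacement from box, built one draw at a time."""
    one_draw = defaultdict(float)
    for ticket in box:
        one_draw[ticket] += 1 / len(box)
    dist = {0: 1.0}
    for _ in range(n_draws):
        new = defaultdict(float)
        for total, p in dist.items():
            for ticket, q in one_draw.items():
                new[total + ticket] += p * q
        dist = new
    return dict(dist)

def mean_sd(dist):
    """Average and SD of a probability histogram given as {value: chance}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, math.sqrt(var)

dist = sum_distribution(BOX, 100)
mean, sd = mean_sd(dist)
within_1se = sum(p for x, p in dist.items() if abs(x - mean) <= sd)
print(round(mean, 3), round(sd, 3), round(within_1se, 3))
```

For a single draw, the histogram is a spike at 5 with small bars at 1 and 9. For the sum of 100 draws, the bars follow a bell shape centered at 500.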
6-42Stats
6-43Stats
Central Limit Theorem
When drawing a LARGE sample at random with replacement from a box,
and computing the sample sum of draws (net gain), the sample count (# heads), the sample average, or the sample percent,
the probability histogram will approximately follow a normal curve.
6-44Stats
Central Limit Theorem
When the sample size is large enough, to use a normal curve to make probability calculations we simply need
 – the expected value of the sum (This can tell us about the ________.)
 – the standard error of the sum (This can tell us about the ________.)
6-45Stats
Central Limit Theorem
When drawing a LARGE sample at random with replacement from a box,
the probability histogram for the sample sum will approximately follow a normal curve.
The average of this probability histogram is EV(sum),
and the SD of this probability histogram is SE(sum).
6-46Stats
Central Limit Theorem
When drawing a LARGE sample at random with replacement from a box,
and computing the average of draws,
the probability histogram for the sample average (mean) will approximately follow a normal curve.
The average of this probability histogram is
EV(avg) = the population mean,
and the SD of this probability histogram is SE(avg).
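Here is a simulation sketch of this statement, using the box 1, 9, 5, 5, 5 from the lopsided-box slide. The sample size of 100 draws, the 2,000 repetitions, and the seed are arbitrary choices.

```python
import math
import random
from statistics import mean, pstdev

BOX = [1, 9, 5, 5, 5]
n_draws, n_reps = 100, 2000
rng = random.Random(7)

# Each entry is the average of one sample of n_draws taken with replacement
averages = [mean(rng.choices(BOX, k=n_draws)) for _ in range(n_reps)]
print("EV(avg) =", mean(BOX), " mean of simulated averages:", round(mean(averages), 3))
print("SE(avg) =", round(pstdev(BOX) / math.sqrt(n_draws), 3),
      " SD of simulated averages:", round(pstdev(averages), 3))
```

A histogram of `averages` should look close to a normal curve centered at 5.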
6-47Stats
Using the normal curve
In practice,
about 68% of the time the observed sum will be within 1 SE of the expected value (EV ± 1 SE);
about 95% of the time the observed sum will be within 2 SEs of the expected value (EV ± 2 SEs).
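The 68% and 95% figures are areas under the normal curve, and `math.erf` can check them. This is a minimal sketch.

```python
import math

def chance_within(k):
    """Normal-curve area within k SEs of the expected value."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * chance_within(k), 1))  # about 68.3, 95.4, 99.7
```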
6-48Stats
Using Normal Curves to Figure Probabilities
Example: Roulette
There are 161 students, 3 TAs, and one professor for this course.
Suppose that we each play ten $1 games of roulette, always betting on red.
Recall that a roulette wheel has 18 red, 18 black, and 2 green pockets.
If the ball lands in a red pocket, we get back our $1 and win an additional $1.
If the ball lands in a black or green pocket, we lose our $1.
6-49Stats
Roulette example
• Box model
• Expected value of sum
• Standard error
• Probability
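Here is a sketch of the four steps. The slide leaves the probability question open, so this sketch ASSUMES the question is whether the whole group comes out ahead after 165 people × 10 plays = 1,650 plays. The normal curve is used without a continuity correction.

```python
import math

def normal_cdf(z):
    """Standard normal area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Box for $1 on red: 18 tickets +1, 20 tickets -1 (18 black + 2 green)
box = [1] * 18 + [-1] * 20
avg_box = sum(box) / len(box)
sd_box = math.sqrt(sum((x - avg_box) ** 2 for x in box) / len(box))

games_each = 10
people = 161 + 3 + 1
for label, n in (("one person", games_each), ("whole group", games_each * people)):
    print(f"{label}: EV = {n * avg_box:.2f}, SE = {math.sqrt(n) * sd_box:.2f}")

# ASSUMED question: chance the whole group ends up ahead (net gain above 0)
n = games_each * people
z = (0 - n * avg_box) / (math.sqrt(n) * sd_box)
chance_ahead = 1 - normal_cdf(z)
print(f"chance the group is ahead: about {100 * chance_ahead:.1f}%")
```

For one person, the expected net gain is about −$0.53, give or take about $3.16. For the whole group it is about −$87, give or take about $41, so the group comes out ahead only roughly 1.6% of the time.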
6-50Stats
A short cut to SE
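When there are only two different numbers in the box,
$\text{SD of box} = (\text{big number} - \text{small number}) \times \sqrt{(\text{fraction with big number}) \times (\text{fraction with small number})}$
$\text{SE(sum)} = \sqrt{\text{number of draws}} \times \text{SD of box}$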
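This sketch checks the shortcut against a direct SD computation with `statistics.pstdev`. The helper names `sd_shortcut` and `se_sum_shortcut` are hypothetical, and the roulette box is used as the example.

```python
import math
from statistics import pstdev

def sd_shortcut(big, small, frac_big):
    """SD of a box holding only two numbers:
    (big - small) x sqrt(fraction with big x fraction with small)."""
    return (big - small) * math.sqrt(frac_big * (1 - frac_big))

def se_sum_shortcut(big, small, frac_big, n_draws):
    """SE for the sum = sqrt(number of draws) x SD of box."""
    return math.sqrt(n_draws) * sd_shortcut(big, small, frac_big)

roulette_box = [1] * 18 + [-1] * 20  # $1 on red
print(round(sd_shortcut(1, -1, 18 / 38), 4), round(pstdev(roulette_box), 4))
print(round(se_sum_shortcut(1, -1, 18 / 38, 10), 2))
```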
6-51Stats
Classifying & Counting
For percentages or counts (number of occurrences of something), we can use a special Box Model.
For classifying and counting (looking at percentages or counts) use a box with 0’s and 1’s on the tickets.
Tickets marked ‘1’ signify a “special” item.
Tickets marked ‘0’ signify a “non-special” item.
6-52Stats
Classifying & Counting (continued)
What is the average of all of the ticket values in a 0-1 box?
What is the SD of all of the ticket values in a 0-1 box?
6-53Stats
Classifying & Counting (continued further)
What is the sum of a sample of n draws from a 0-1 box?
What is the Expected Value for the sum of a sample of n draws from a 0-1 box?
What is the SD for the sum of a sample of n draws from a 0-1 box?
6-54Stats
Expected Value and Standard Error for Sample Counts
What is the Expected Value of the number of 1’s drawn from a 0-1 box? (This is the Expected Value for a sample count drawn from a population with _____ “special” items and _______ “non-special” items.)
What is the Standard Error for the count of 1’s drawn from a 0-1 box?
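This is a sketch of the 0-1 box formulas. The average of the box is the fraction of 1s, and the SD comes from the two-number shortcut. The 30% fraction and 400 draws are hypothetical numbers chosen for illustration.

```python
import math

def zero_one_box_stats(frac_ones):
    """Average and SD of a 0-1 box: average = fraction of 1s,
    SD = sqrt(fraction of 1s x fraction of 0s)."""
    return frac_ones, math.sqrt(frac_ones * (1 - frac_ones))

def ev_se_count(frac_ones, n_draws):
    """EV(count) = n x fraction of 1s; SE(count) = sqrt(n) x SD of box."""
    avg, sd = zero_one_box_stats(frac_ones)
    return n_draws * avg, math.sqrt(n_draws) * sd

# Hypothetical example: 30% "special" items, 400 draws
ev, se = ev_se_count(0.30, 400)
print(round(ev, 1), round(se, 2))  # 120.0 9.17
```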
6-55Stats
A Sample Count is Likely ...
The sample count is likely to be around
__________, give or take ____________ or so.
The expected value for the count, EV(count), fills the first blank.
The standard error for count, SE(count), fills the second blank.
Observed values are rarely more than 2 or 3 SE’s away from the expected value.
6-56Stats
Remember ...
The formulas here are for simple random samples. They likely do not apply to other kinds of samples.
6-57Stats
Expected Value and Standard Error for Sample Proportions
What is the Expected Value of the percentage of 1’s drawn from a 0-1 box? (This is the Expected Value for a sample percentage drawn from a population with _____ “special” items and _______ “non-special” items.)
What is the Standard Error for the percentage of 1’s drawn from a 0-1 box?
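The percentage version divides SE(count) by the number of draws and converts the result to a percent. This sketch uses a hypothetical box with 30% “special” items at two sample sizes.

```python
import math

def ev_se_percent(frac_ones, n_draws):
    """EV(%) = percentage of 1s in the box;
    SE(%) = SE(count) / number of draws x 100%."""
    sd_box = math.sqrt(frac_ones * (1 - frac_ones))
    se_count = math.sqrt(n_draws) * sd_box
    return 100 * frac_ones, 100 * se_count / n_draws

# Hypothetical example: 30% "special" items
for n in (400, 1600):
    ev, se = ev_se_percent(0.30, n)
    print(n, round(ev, 1), round(se, 2))
```

Quadrupling the number of draws halves SE(%).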
6-58Stats
A Sample Percentage is Likely ...
The sample percentage is likely to be around
__________, give or take ____________ or so.
The expected value for the percentage, EV(%), fills the first blank.
The standard error for the percentage, SE(%), fills the second blank.
Observed values are rarely more than 2 or 3 SE’s away from the expected value.
6-59Stats
Central Limit Theorem for Percentages & Counts
When drawing a LARGE sample at random with replacement from a box, the probability histogram for the sample percentage will approximately follow a normal curve.
The average of this probability histogram is
EV(%) = the population %,
and the SD of this probability histogram is
$\text{SE}(\%) = \dfrac{\text{SE(count)}}{\text{number of draws}} \times 100\%$.
6-60Stats
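Central Limit Theorem for Percentages & Counts
When drawing a LARGE sample at random with replacement from a box, the probability histogram for the sample count will approximately follow a normal curve.
The average of this probability histogram is
$\text{EV(count)} = \text{number of draws} \times (\text{fraction of 1's in the box})$,
and the SD of this probability histogram is
$\text{SE(count)} = \sqrt{\text{number of draws}} \times \sqrt{(\text{fraction of 1's}) \times (\text{fraction of 0's})}$.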
6-61Stats
Summarizing …
Expected Values and Standard Errors
6-62Stats
Shape of the Sampling Distribution and Sample Size
What happens to the shape of the sampling distribution as the sample size gets large?
6-63Stats
Expected Values, Standard Errors, and Sample Size
What happens to Expected Values and Standard Errors as Sample Size increases?
6-64Stats
Summarizing the Central Limit Theorem
As the sample size (# of draws from the box, n) gets large, …
6-65Stats
Estimation
Box models: If we know what goes in the box, then we can say how likely various outcomes are.
In practice, we do not know what is in the box. That is, we do not know the population parameters. Instead, we use data to estimate the population parameters, such as the average, %, SD, …
6-66Stats
Confidence Intervals
Point estimate:
To estimate the population average (mean) with a single value, use
The likely size of your estimation error is
Interval estimate:
To estimate the population average (mean) with an interval of values, the width of your interval depends upon how confident you want to be that your interval includes the population mean.
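This is a sketch of the interval estimate. It assumes the sample is a simple random sample large enough for the normal curve, and the SD of the sample stands in for the unknown SD of the box. The data are simulated, not real, and the helper `ci_for_mean` is hypothetical.

```python
import math
import random
from statistics import mean, pstdev

def ci_for_mean(sample, multiplier=2):
    """Sample average +/- multiplier x estimated SE(avg), where the
    SE is estimated as (SD of the sample) / sqrt(sample size)."""
    avg = mean(sample)
    se = pstdev(sample) / math.sqrt(len(sample))
    return avg - multiplier * se, avg + multiplier * se

# Simulated (not real) data: 100 draws from a box of tickets 1 to 10
rng = random.Random(5)
sample = [rng.randint(1, 10) for _ in range(100)]
print("about 68% confidence:", ci_for_mean(sample, multiplier=1))
print("about 95% confidence:", ci_for_mean(sample, multiplier=2))
```

Asking for more confidence widens the interval: here the 95% interval is twice as wide as the 68% one.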
6-67Stats
Confidence Intervals
A confidence interval is used when estimating an unknown parameter from sample data. The interval gives a range for the parameter, together with a confidence level: how often intervals built by this procedure cover the true value.
Chances are in the sampling procedure, not in the parameter.
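This simulation sketch shows what the confidence level means. It draws many samples from a box whose average is known to be 5, builds an average ± 2 SE interval from each, and counts how often the intervals cover 5. The box, sample size, and seed are illustrative choices.

```python
import math
import random
from statistics import mean, pstdev

BOX = [1, 9, 5, 5, 5]  # the population; its average is 5
TRUE_MEAN = mean(BOX)

def covers(rng, n_draws=100, multiplier=2):
    """Draw one sample, build avg +/- multiplier x estimated SE,
    and report whether that interval covers the true average."""
    sample = rng.choices(BOX, k=n_draws)
    avg = mean(sample)
    se = pstdev(sample) / math.sqrt(n_draws)
    return avg - multiplier * se <= TRUE_MEAN <= avg + multiplier * se

rng = random.Random(11)
n_reps = 2000
coverage = sum(covers(rng) for _ in range(n_reps)) / n_reps
print(f"{100 * coverage:.1f}% of the intervals covered the true average")
```

The number 5 never moves. What varies from sample to sample is the interval, and about 95% of the intervals should cover 5.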
6-68Stats
Confidence Interval Example
Pennies
6-69Stats
Confidence Intervals
Point estimate:
To estimate the population percentage with a single value, use
The likely size of your estimation error is
Interval estimate:
To estimate the population percentage with an interval of values, the width of your interval depends upon how confident you want to be that your interval includes the population percentage.
6-70Stats
Confidence Interval Example
Pennies
6-71Stats
The Bootstrap
When estimating a population percentage (i.e., when sampling from a 0-1 box), the fraction of 0’s and 1’s in the box is unknown.
The SD of the box can be estimated by substituting the fraction of 0’s and 1’s in the sample for the unknown fractions in the box.
The estimate is good when the sample is reasonably large.
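This is a sketch of the bootstrap for a percentage: the sample fraction replaces the unknown fraction of 1s in the SD formula. The counts (160 special items in 400 draws) are hypothetical, and the interval uses ±2 SEs for roughly 95% confidence.

```python
import math

def ci_for_percentage(n_special, n_draws, multiplier=2):
    """Bootstrap estimate of SE(%): plug the sample fraction into the
    0-1 box SD formula, then return sample % +/- multiplier x SE(%)."""
    p_hat = n_special / n_draws
    sd_box_est = math.sqrt(p_hat * (1 - p_hat))  # estimated SD of the box
    se_pct = 100 * math.sqrt(n_draws) * sd_box_est / n_draws
    pct = 100 * p_hat
    return pct - multiplier * se_pct, pct + multiplier * se_pct, se_pct

# Hypothetical survey: 160 "special" items among 400 draws
lo, hi, se = ci_for_percentage(160, 400)
print(f"SE about {se:.2f}%, interval about {lo:.1f}% to {hi:.1f}%")
```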
6-72Stats
Basic Method for Constructing Confidence Intervals
6-73Stats
Interpreting a Confidence Interval
6-74Stats
Margin of Error
6-75Stats
Sample Size Computations
6-76Stats