kelly-dummitt -
Sampling Distributions
Suppose I throw a dice 10000 times and count the number of times each face turns up:
[Figure: Frequency of Single Dice Throws. x-axis: Dice Face (1 to 6); y-axis: Frequency]
Each score has a similar frequency (uniform distribution)
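The single-throw experiment can be tried out with a short simulation. This is only a sketch: the seed and the variable names are arbitrary choices, not part of the original slides.

```python
import random
from collections import Counter

random.seed(1)  # arbitrary fixed seed so that the run is repeatable

N_THROWS = 10_000

# Throw one fair six-sided dice N_THROWS times and count each face.
throws = [random.randint(1, 6) for _ in range(N_THROWS)]
counts = Counter(throws)

for face in range(1, 7):
    print(f"face {face}: {counts[face]}")
```

Each face should come up roughly 10000/6 times (about 1667), which is the flat, uniform shape described above.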
If instead you throw the dice 10 times (or throw ten dice) and take the average score each time, you get something like this:
[Figure: Average of 10 Throws. x-axis: Average of 10 Throws (1 to 6); y-axis: Frequency. Each average is the sum of ten scores divided by 10.]
Compare averaging 10 vs 20 throws each go:
[Figure: two histograms, Average of 10 Throws and Average of 20 Throws. x-axis: average score (1 to 6); y-axis: Frequency]
[Figure: three histograms side by side: Frequency of Single Dice Throws (1 x), Average of 10 Throws (10 x) and Average of 20 Throws (20 x). x-axis: score or average score (1 to 6); y-axis: Frequency]
Note what happens to the spread and shape of the distribution of average scores.
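The narrowing of the spread can be checked numerically. The sketch below repeats each experiment 10000 times for averages of 1, 10 and 20 throws and reports the standard deviation of the averages. The helper name `average_of_throws` and the seed are illustrative assumptions.

```python
import random
import statistics

random.seed(2)  # arbitrary fixed seed so that the run is repeatable

def average_of_throws(n_throws):
    """Throw a fair dice n_throws times and return the average score."""
    return sum(random.randint(1, 6) for _ in range(n_throws)) / n_throws

N_REPEATS = 10_000
spread = {}  # maps throws-per-average to the SD of the averages

for n in (1, 10, 20):
    averages = [average_of_throws(n) for _ in range(N_REPEATS)]
    spread[n] = statistics.pstdev(averages)
    print(f"{n:>2} x: mean = {statistics.fmean(averages):.2f}, SD = {spread[n]:.2f}")
```

The mean stays near 3.5 in all three cases, while the SD shrinks as more throws go into each average.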
1. Central Limit Theorem
This is a theorem of statistics and probability which implies that the distribution of a sum (or average) of independent scores from almost any population approaches a Normal distribution as the number of scores involved in the sum (or average) gets larger and larger.
[Figure: four panels: Single Throws, Average of N Throws, Light Bulb Life, Average Life]
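The light-bulb case can be illustrated with a strongly skewed population. The sketch below assumes, purely for illustration, that bulb lives follow an exponential distribution with a mean of 1000 hours (the slides do not say which distribution is used). It compares the skewness of single lives with the skewness of averages of 50 lives, where a skewness of 0 indicates a symmetric shape such as the Normal.

```python
import random
import statistics

random.seed(3)  # arbitrary fixed seed so that the run is repeatable

MEAN_LIFE = 1000.0  # hypothetical mean bulb life in hours

def skewness(values):
    """Skewness of the values: 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def average_life(n_bulbs):
    """Average life of n_bulbs bulbs with exponentially distributed lives."""
    lives = [random.expovariate(1 / MEAN_LIFE) for _ in range(n_bulbs)]
    return statistics.fmean(lives)

single = [random.expovariate(1 / MEAN_LIFE) for _ in range(5000)]
averaged = [average_life(50) for _ in range(5000)]

skew_single = skewness(single)
skew_averaged = skewness(averaged)
print(f"skewness of single lives:      {skew_single:.2f}")
print(f"skewness of averages of 50:    {skew_averaged:.2f}")
```

Single lives are heavily skewed, while the averages are much closer to symmetric, which is the behaviour the theorem describes.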
2. Relation between the variation between individual scores and the variation between the averages of several scores.
If the individual scores (values) in a population have a variance of X, then the averages of samples of size n have a variance of X/n.
This is intuitive – think of individual heights:
[Figure: Population distribution of individual heights, centred on 5’5”, on an axis from 4’0” to 7’0”. Population SD is approximately 7”.]
Population distribution of raw scores
68% of scores lie within 1 standard deviation of the mean.
68% of people have a height between 4’10” and 6’0”.
[Figure: population distribution of heights centred on 5’5”, on an axis from 4’0” to 7’0”, with 4’10” and 6’0” marked]
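The 68% figure can be reproduced with Python's standard library, assuming that heights follow a Normal distribution with the mean (5’5”) and SD (7”) shown on the slides. The Normal shape is an assumption of this sketch, and `feet_inches` is just a hypothetical formatting helper.

```python
from statistics import NormalDist

MEAN_IN = 5 * 12 + 5  # 5'5" expressed in inches
SD_IN = 7             # population SD in inches, from the slide

heights = NormalDist(mu=MEAN_IN, sigma=SD_IN)

# One standard deviation either side of the mean.
low, high = MEAN_IN - SD_IN, MEAN_IN + SD_IN
share = heights.cdf(high) - heights.cdf(low)

def feet_inches(inches):
    """Format a whole number of inches as feet and inches."""
    feet, rest = divmod(inches, 12)
    return f"{feet}'{rest}\""

print(f"{feet_inches(low)} to {feet_inches(high)}: {share:.1%}")
# prints: 4'10" to 6'0": 68.3%
```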
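Suppose we take a random sample of 10 people and measure their heights.
The mean of the sample (x̄) will tend to be quite close to the average height:
[Figure: population distribution of heights centred on 5’5”, on an axis from 4’0” to 7’0”, with the sample mean x̄ marked]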
Keep taking samples of 10 people and measure average height:
[Figure: population distribution of heights centred on 5’5”, on an axis from 4’0” to 7’0”, with the sample means x̄ marked]
Keep taking samples of 50 people and measure average height:
[Figure: population distribution of heights centred on 5’5”, on an axis from 4’0” to 7’0”, with the sample means x̄ marked]
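The sample means (x̄) cluster around the population mean more closely than the raw scores do:
[Figure: distribution of sample means inside the wider distribution of raw scores, both centred on 5’5”, on an axis from 4’0” to 7’0”]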
The degree of spread (standard deviation of the sample means) around the population mean depends on the number (n) in each sample.
[Figure: distributions of sample means for n=10, n=20 and n=30, centred on 5’5”, on an axis from 4’0” to 7’0”]
Variance and SD
As we observed before, the variance of the sample means is the variance of the population of individual scores divided by the sample size.
Because the Standard Deviation is the square root of the Variance, the Standard Deviation of the sample means is equal to the Standard Deviation of the individual scores divided by the square root of the sample size.
The amount of variation (standard deviation of the sample means) around the population mean depends on the number (n) in each sample.
The standard deviation of the means of samples of size n around the population mean is equal to the population standard deviation divided by √n, and is called the standard error of the mean (se).
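For reference, the relation can be written out as a short derivation. It assumes the n scores in a sample are independent and each has variance σ². The symbols σ, x_i and x̄ are notation introduced here rather than taken from the slides.

```latex
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{n\,\sigma^{2}}{n^{2}}
  = \frac{\sigma^{2}}{n},
\qquad
\mathrm{se} = \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
```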
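Raw scores: SD = 7”
Samples of size 10: SD of the sample means = 7/sqrt(10) = 7/3.16 = 2.2”
[Figure: distribution of raw scores (SD 7”) and distribution of means of samples of size 10 (SD 2.2”), both centred on 5’5”, on an axis from 4’0” to 7’0”]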
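The same arithmetic can be repeated for the other sample sizes that appear in these slides. This is a small sketch, and `standard_error` is a helper name chosen here.

```python
import math

POPULATION_SD = 7.0  # inches, from the slide

def standard_error(sd, n):
    """Standard deviation of the means of samples of size n."""
    return sd / math.sqrt(n)

for n in (10, 20, 30, 50):
    print(f"n = {n:>2}: se = {standard_error(POPULATION_SD, n):.2f} inches")
```

For n = 10 this gives 2.21”, matching the 2.2” above. Because of the square root, the sample size has to be quadrupled to halve the standard error.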
Quick Summary
We get an idea of the amount of variation in the population of individual scores from the variation within our sample (i.e. the data).
Given that our sample average is based on n scores, we know how much the sample averages would be expected to vary from one sample to the next.
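In practice the population SD is not known, so it is estimated from the sample itself. The sketch below uses made-up heights for one sample of 10 people. The data are hypothetical and serve only to show the calculation.

```python
import math
import statistics

# Hypothetical heights (in inches) of one random sample of 10 people.
sample = [62, 71, 66, 58, 69, 64, 75, 60, 67, 63]

n = len(sample)
sample_mean = statistics.fmean(sample)

# stdev() divides by n - 1, giving an estimate of the population SD.
sample_sd = statistics.stdev(sample)

# Expected SD of sample means from one sample of this size to the next.
estimated_se = sample_sd / math.sqrt(n)

print(f"mean = {sample_mean:.1f}, SD = {sample_sd:.2f}, se = {estimated_se:.2f}")
```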
T-Test
The T-Test works by assuming the data collected in two conditions is equivalent to collecting two samples from the same ‘parent’ population (this is the null hypothesis).
The variation within the data is a good estimate of the variation in the parent population. This, together with the size of the samples, allows one to predict how much variation to expect in the means from one sample to the next.
If the two sample means obtained in the experiment conditions vary by more than we’d expect from this simple relation between the variation of individual scores and sample averages then it is unlikely that the data in the two conditions is equivalent to two samples from the same parent population.
It is more likely that they reflect two samples from different parent populations (i.e. ones with different means).
I.e. if the data does reflect samples from the same population, we expect the means of our samples, say of size 10, to cluster around the population mean quite closely:
[Figure: parent population of individual scores (centred on 5’5”, on an axis from 4’0” to 7’0”) with the expected variation of samples of size 10 shown inside it]
Not:
[Figure: parent population of individual scores (centred on 5’5”, on an axis from 4’0” to 7’0”) and the expected variation of samples of size 10]
It is more likely that the real situation is that the two samples come from different parent populations:
[Figure: two parent populations, one centred on 5’5” and one centred on 6’5”, on an axis from 4’0” to 7’0”]
So an experiment selects 8 babies at random and feeds half Marmite and half Bovril. Heights are measured at 20 years.
[Figure: Marmite vs. Bovril]
It is more likely that the real situation is that the two samples come from different parent populations:
[Figure: two parent populations, one centred on 5’5” and one centred on 6’5”, on an axis from 4’0” to 7’0”]
T-Test & ANOVA
The T-Test works by computing the likelihood of getting a certain difference between two sample means.
If you have experiments with more than 2 conditions there is no single distance between two means. Instead you can examine the ‘average’ distance or variation between them. The Variance of those condition means is just such a measure.
ANOVA works out how likely it is to get the observed amount of variation (Variance) between several sample means if they really had been drawn from the same parent population.
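The idea of asking "how likely is this much variation between means under a single parent population?" can be illustrated by simulation. The sketch below is not the ANOVA calculation itself: it assumes the parent population is known (Normal, mean 65”, SD 7”, as in the height slides), and the three observed condition means and the sample size per condition are hypothetical.

```python
import random
import statistics

random.seed(4)  # arbitrary fixed seed so that the run is repeatable

POP_MEAN, POP_SD = 65.0, 7.0         # assumed parent population, in inches
N_PER_CONDITION = 10                 # hypothetical scores per condition
OBSERVED_MEANS = [62.0, 65.0, 70.0]  # hypothetical condition means

def variance_of_means(n_conditions, n_per_condition):
    """Draw every condition from the same parent population and
    return the variance between the condition means."""
    means = []
    for _ in range(n_conditions):
        scores = [random.gauss(POP_MEAN, POP_SD) for _ in range(n_per_condition)]
        means.append(statistics.fmean(scores))
    return statistics.variance(means)

observed = statistics.variance(OBSERVED_MEANS)
simulated = [
    variance_of_means(len(OBSERVED_MEANS), N_PER_CONDITION) for _ in range(5000)
]

# Share of simulated experiments whose means vary at least as much as observed.
share_as_large = sum(v >= observed for v in simulated) / len(simulated)
print(f"observed variance of means = {observed:.2f}")
print(f"share of simulations at least as large = {share_as_large:.3f}")
```

A small share suggests the observed means vary more than a single parent population would usually produce.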
In a nutshell
The data from the conditions of an experiment can be conceptualised as samples from a parent population.
The null hypothesis assumes that these samples have been drawn from a single population.
If the variation (or just the difference, in the case of a T-Test) between the means of these ‘samples’ is greater than we would expect given the sample sizes used, then we conclude that it is unlikely that they can be thought of as having been drawn from a single population, but instead come from separate ones (i.e. ones that have different means).
Some minor details:
The T-test actually works out the sampling distribution of the difference between two means. When the probability of getting a difference at least as large as the observed one is less than 5%, H0 is rejected – i.e. the two populations from which the samples were drawn are assumed not to have equal means.
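A hand calculation of the t statistic shows the steps involved. The sketch below uses the pooled-variance (Student's) form, which assumes equal variances in the two groups. The heights are hypothetical values for the Marmite and Bovril groups of four, and 2.447 is the standard two-sided 5% critical value of t with 6 degrees of freedom.

```python
import math
import statistics

# Hypothetical adult heights in inches, four per condition (8 babies in total).
marmite = [70, 73, 68, 71]
bovril = [65, 67, 63, 66]

def pooled_t(a, b):
    """Student's two-sample t statistic, assuming equal group variances."""
    na, nb = len(a), len(b)
    # Pool the within-group variation to estimate the parent population variance.
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    # Standard error of the difference between the two sample means.
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se_diff

T_CRITICAL = 2.447  # two-sided 5% critical value, 4 + 4 - 2 = 6 degrees of freedom

t = pooled_t(marmite, bovril)
reject_h0 = abs(t) > T_CRITICAL
print(f"t = {t:.2f}, reject H0 at 5%: {reject_h0}")
```

With these made-up numbers the difference between the means is about 3.9 standard errors, so H0 would be rejected.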
ANOVA works out:
1. How the sample means vary, and
2. How they should vary given their size and the individual variation.
If these two estimates differ widely then H0 is rejected.
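The two estimates can be computed by hand for a small one-way design. The sketch below assumes equal group sizes, and the scores are hypothetical. Estimate 1 scales the variance of the condition means by n, since under H0 that variance should be about the individual variance divided by n. Estimate 2 is the average variance within the conditions. Their ratio is the F statistic, and 4.26 is the standard 5% critical value of F with 2 and 9 degrees of freedom.

```python
import statistics

# Hypothetical scores for three conditions with four scores each.
conditions = [
    [70, 73, 68, 71],
    [65, 67, 63, 66],
    [66, 69, 64, 67],
]

n = len(conditions[0])  # scores per condition (equal group sizes assumed)
means = [statistics.fmean(c) for c in conditions]

# 1. How the sample means vary, scaled to estimate the individual variance.
between = n * statistics.variance(means)

# 2. The individual variation: the average variance within the conditions.
within = statistics.fmean([statistics.variance(c) for c in conditions])

F_CRITICAL = 4.26  # 5% critical value of F with (3 - 1, 12 - 3) = (2, 9) df

f_ratio = between / within
print(f"between = {between:.2f}, within = {within:.2f}, F = {f_ratio:.2f}")
print(f"reject H0 at 5%: {f_ratio > F_CRITICAL}")
```

If H0 were true the two estimates would be similar and F would be close to 1. Here the between-conditions estimate is several times larger.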