Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring...
-
Upload
marcus-mckinney -
Category
Documents
-
view
216 -
download
3
Transcript of Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring...
![Page 1: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/1.jpg)
Benjamin OlkenMIT
Sampling and Sample Size
![Page 2: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/2.jpg)
Course Overview
1. What is evaluation?2. Measuring impacts (outcomes,
indicators)3. Why randomize?4. How to randomize5. Threats and Analysis6. Sampling and sample size7. RCT: Start to Finish8. Cost Effectiveness Analysis and
Scaling Up
![Page 3: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/3.jpg)
Course Overview
1. What is evaluation?2. Measuring impacts (outcomes,
indicators)3. Why randomize?4. How to randomize5. Threats and Analysis6. Sampling and sample size7. RCT: Start to Finish8. Cost Effectiveness Analysis and
Scaling Up
![Page 4: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/4.jpg)
What’s the average result?
• If you were to roll a die once, what is the “expected result”? (i.e. the average)
![Page 5: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/5.jpg)
Possible results & probability: 1 die
1 2 3 4 5 60
1/6
![Page 6: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/6.jpg)
Rolling 1 die: possible results & average
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.00
1/6
Frequency Average
![Page 7: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/7.jpg)
What’s the average result?
• If you were to roll two dice once, what is the expected average of the two dice?
![Page 8: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/8.jpg)
Rolling 2 dice:Possible totals & likelihood
2 3 4 5 6 7 8 9 10 11 12
Fre-quency
0.027777777777777
9
0.055555555555555
5
0.083333333333333
3
0.111111111111111
0.138888888888889
0.166666666666667
0.138888888888889
0.111111111111111
0.083333333333333
3
0.055555555555555
5
0.027777777777777
9
0
1/8
1/5
![Page 9: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/9.jpg)
Rolling 2 dice: possible totals 12 possible totals, 36 permutations
Die 1
Die 2
2 3 4 5 6 7
3 4 5 6 7 8
4 5 6 7 8 9
5 6 7 8 9 10
6 7 8 9 10 11
7 8 9 10 11 12
![Page 10: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/10.jpg)
Rolling 2 dice:Average score of dice & likelihood
1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6
Fre-quency
0.02777777777777
79
0.05555555555555
55
0.08333333333333
33
0.11111111111111
1
0.13888888888888
9
0.16666666666666
7
0.13888888888888
9
0.11111111111111
1
0.08333333333333
33
0.05555555555555
55
0.02777777777777
79
0
1/8
1/5
Lik
elih
oo
d
![Page 11: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/11.jpg)
Outcomes and Permutations
• Putting together permutations, you get:1. All possible outcomes2. The likelihood of each of those
outcomes– Each column represents one possible
outcome (average result)– Each block within a column represents one
possible permutation (to obtain that average)
2.5
![Page 12: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/12.jpg)
Rolling 3 dice:16 results 318, 216 permutations
1 1 1/3 1 2/3 2 2 1/3 2 2/3 3 3 1/3 3 2/3 4 4 1/3 4 2/3 5 5 1/3 5 2/3 6
Fre-quency
0.00462962962962964
0.01388888888888
89
0.02777777777777
79
0.04629629629629
64
0.06944444444444
45
0.09722222222222
22
0.11574074074074
1
0.125 0.125 0.11574074074074
1
0.09722222222222
22
0.06944444444444
45
0.04629629629629
64
0.02777777777777
79
0.01388888888888
89
0.00462962962962964
0
1/8
![Page 13: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/13.jpg)
Rolling 4 dice: 21 results, 1296 permutations
1 1.25 1.5 1.75 2 2.25 2.5 2.75 3 3.25 3.5 3.75 4 4.25 4.5 4.75 5 5.25 5.5 5.75 60%
2%
4%
6%
8%
10%
12%
14%
![Page 14: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/14.jpg)
Rolling 5 dice:26 results, 7776 permutations
1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 3 3.2 3.4 3.6 3.8 4 4.2 4.4 4.6 4.8 5 5.2 5.4 5.6 5.8 60%
2%
4%
6%
8%
10%
12%
![Page 15: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/15.jpg)
Looks like a bell curve, or a normal distribution
1
1.16
6666
6666
6667
1.33
3333
3333
3333 1.
5
1.66
6666
6666
6667
1.83
3333
3333
3333 2
2.16
6666
6666
6667
2.33
3333
3333
3333 2.
5
2.66
6666
6666
6667
2.83
3333
3333
3333 3
3.16
6666
6666
6667
3.33
3333
3333
3334 3.
5
3.66
6666
6666
6667
3.83
3333
3333
3334 4
4.16
6666
6666
6667
4.33
3333
3333
3334 4.
5
4.66
6666
6666
6667
4.83
3333
3333
3334 5
5.16
6666
6666
6667
5.33
3333
3333
3334 5.
5
5.66
6666
6666
6666
5.83
3333
3333
3334
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
Rolling 10 dice:50 results, >60 million permutations
![Page 16: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/16.jpg)
>99% of all rolls will yield an average between 3 and 4
1 1.72222222222222 2.44444444444444 3.16666666666666 3.88888888888888 4.6111111111111 5.333333333333320.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
Rolling 30 dice:150 results, 2 x 10 23 permutations*
![Page 17: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/17.jpg)
>99% of all rolls will yield an average between 3 and 4
1 1.73333333333333 2.46666666666666 3.19999999999999 3.93333333333332 4.66666666666669 5.400000000000060.0%
0.5%
1.0%
1.5%
2.0%
Rolling 100 dice: 500 results, 6 x 10 77 permutations
![Page 18: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/18.jpg)
Rolling dice: 2 lessons
1. The more dice you roll, the closer most averages are to the true average(the distribution gets “tighter”)-THE LAW OF LARGE NUMBERS-
2. The more dice you roll, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution)-THE CENTRAL LIMIT THEOREM-
![Page 19: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/19.jpg)
Which of these is more accurate?
I. II.
A. B. C.
0% 0%0%
A. I.B. II.C. Don’t know
![Page 20: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/20.jpg)
Accuracy versus Precision
Precision
(Sampl
e Size)
Accuracy (Randomization)
truth
estimates
![Page 21: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/21.jpg)
Accuracy versus Precision
Precision
(Sampl
e Size)
Accuracy (Randomization)
truth truthestimates
estimates
truth truthestimates
estimates
![Page 22: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/22.jpg)
THE basic questions in statistics
• How confident can you be in your results?
• How big does your sample need to be?
![Page 23: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/23.jpg)
THAT WAS JUST THE INTRODUCTION
![Page 24: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/24.jpg)
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
![Page 25: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/25.jpg)
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
![Page 26: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/26.jpg)
Baseline test scores
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
frequency
test scores
![Page 27: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/27.jpg)
Mean = 26
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
26 frequency
mean
test scores
![Page 28: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/28.jpg)
Standard Deviation = 20
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
0
100
200
300
400
500
600
26 frequency sd
mean
test scores
1 Standard Deviation
![Page 29: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/29.jpg)
Let’s do an experiment
• Take 1 Random test score from the pile of 16,000 tests
• Write down the value• Put the test back• Do these three steps again• And again• 8,000 times• This is like a random sample of
8,000 (with replacement)
![Page 30: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/30.jpg)
Good, the average of the sample is about 26…
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26mean
frequency
freq (N=1)
test scores
What can we say about this sample?
![Page 31: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/31.jpg)
But…
• … I remember that as my sample goes, up, isn’t the sampling distribution supposed to turn into a bell curve?
• (Central Limit Theorem)• Is it that my sample isn’t large
enough?
![Page 32: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/32.jpg)
This is the distribution of my sample of 8,000 students!
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26mean
frequency
freq (N=1)
test scores
Population v. sampling distribution
![Page 33: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/33.jpg)
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
![Page 34: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/34.jpg)
How do we get from here…
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
100200300400500
test scores
To here…
This is the distribution of the population(Population Distribution)
This is the distribution of Means from all Random Samples(Sampling distribution)
![Page 35: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/35.jpg)
Draw 10 random students, take the average, plot it: Do this 5 & 10 times.
Inadequate sample size
No clear distribution around population mean
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520123456789
10
Frequency of Means With 5 Samples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520123456789
10
Frequency of Means With 10 Samples
![Page 36: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/36.jpg)
More sample means around population mean
Still spread a good deal
Draw 10 random students: 50 and 100 times
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520123456789
10
Frequency of Means With 50 Samples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520123456789
10
Frequency of Means with 100 Samples
![Page 37: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/37.jpg)
Distribution now significantly more normal
Starting to see peaks
Draws 10 random students: 500 and 1000 times
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520
10
20
30
40
50
60
70
80
Frequency of Means With 500 Samples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520
10
20
30
40
50
60
70
80
Frequency of Means With 1000 Samples
![Page 38: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/38.jpg)
Draw 10 Random students
• This is like a sample size of 10• What happens if we take a sample
size of 50?
![Page 39: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/39.jpg)
N = 10 N = 50
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
Frequency of Means With 5 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
Frequency of Means With 10 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
Frequency of Means With 5 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
Frequency of Means With 10 Samples
![Page 40: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/40.jpg)
Draws of 10 Draws of 50
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
12
14
Frequency of Means With 50 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
5
10
15
20
Frequency of Means with 100 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
12
14
Frequency of Means With 50 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
4
8
12
16
Frequency of Means With 100 Samples
![Page 41: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/41.jpg)
Draws of 10 Draws of 50
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
102030405060708090
Frequency of Means With 500 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
20406080
100120140160
Frequency of Means With 1000 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
102030405060708090
Frequency of Means With 500 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
20406080
100120140160
Frequency of Means With 1000 Samples
![Page 42: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/42.jpg)
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
![Page 43: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/43.jpg)
Population & sampling distribution: Draw 1 random student (from 8,000)
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26mean
frequency
freq (N=1)
test scores
![Page 44: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/44.jpg)
Sampling Distribution: Draw 4 random students (N=4)
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
4.5%
26mean
frequency
freq (N=4)
test scores
![Page 45: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/45.jpg)
Law of Large Numbers : N=9
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-1.0%
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
7.0%
26mean
frequency
freq (N=9)
test scores
![Page 46: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/46.jpg)
Law of Large Numbers: N =100
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
26mean
frequency
freq (N=100)
test scores
![Page 47: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/47.jpg)
The white line is a theoretical distribution
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26
mean
frequency
dist_1
freq (N=1)
test scores
Central Limit Theorem: N=1
![Page 48: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/48.jpg)
Central Limit Theorem : N=4
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
4.5%
26
mean
fre-quency
dist_4
freq (N=4)
test scores
![Page 49: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/49.jpg)
Central Limit Theorem : N=9
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-1.0%
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
7.0%
26
mean
frequency
dist_9
freq (N=9)
test scores
![Page 50: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/50.jpg)
Central Limit Theorem : N =100
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
26
mean
frequency
dist_100
freq (N=100)
test scores
![Page 51: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/51.jpg)
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
![Page 52: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/52.jpg)
Standard deviation/error
• What’s the difference between the standard deviation and the standard error?
• The standard error = the standard deviation of the sampling distributions
![Page 53: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/53.jpg)
Variance and Standard Deviation
• Variance = 400
• Standard Deviation = 20
• Standard Error = SE =
![Page 54: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/54.jpg)
Standard Deviation/ Standard Error
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26
mean
frequency
sd
dist_1
test scores
![Page 55: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/55.jpg)
Sample size ↑ x4, SE ↓ ½
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
4.5%
26
mean
frequency
sd
se4
dist_4
test scores
![Page 56: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/56.jpg)
Sample size ↑ x9, SE ↓ ?
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-1.0%
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
7.0%
26mean
frequency
sd
se9
dist_9
test scores
![Page 57: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/57.jpg)
Sample size ↑ x100, SE ↓?
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
26
mean
frequency
sd
se100
dist_100
test scores
![Page 58: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/58.jpg)
Outline
• Sampling distributions• Detecting impact
– significance– effect size – power– baseline and covariates– clustering– stratification
![Page 59: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/59.jpg)
Baseline test scores
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
test scores
![Page 60: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/60.jpg)
We implement the Balsakhi Program (lecture 3)
Case 2: Remedial Education in IndiaEvaluating the Balsakhi Program
Incorporating random assignment into the program
Case 2: Remedial Education in IndiaEvaluating the Balsakhi Program
Incorporating random assignment into the program
![Page 61: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/61.jpg)
After the balsakhi programs, these are the endline test scores
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
20
40
60
80
100
120
140
160
test scores
Endline test scores
![Page 62: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/62.jpg)
0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100
0
20
40
60
80
100
120
140
160
Endline test scores
The impact appears to be?
A. PositiveB. NegativeC. No impactD. Don’t know
0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100
0
50
100
150
200
250
300
350
400
450
500
Baseline test scores
![Page 63: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/63.jpg)
Stop! That was the control group. The treatment group is red.
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
20
40
60
80
100
120
140
160
control
treatment
test scores
Post-test: control & treatment
![Page 64: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/64.jpg)
Is this impact statistically significant?
A. B. C.
0% 0%0%A. YesB. NoC. Don’t know
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
20
40
60
80
100
120
140
160
control
treatment
control μ
treatment μ
test scores
Average Difference = 6 points
![Page 65: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/65.jpg)
One experiment: 6 points
![Page 66: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/66.jpg)
One experiment
![Page 67: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/67.jpg)
Two experiments
![Page 68: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/68.jpg)
A few more…
![Page 69: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/69.jpg)
A few more…
![Page 70: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/70.jpg)
Many more…
![Page 71: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/71.jpg)
A whole lot more…
![Page 72: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/72.jpg)
…
![Page 73: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/73.jpg)
Running the experiment thousands of times…
By the Central Limit Theorem, these are normally distributed
![Page 74: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/74.jpg)
Hypothesis testing
• In criminal law, most institutions follow the rule: “innocent until proven guilty”
• The presumption is that the accused is innocent and the burden is on the prosecutor to show guilt– The jury or judge starts with the “null
hypothesis” that the accused person is innocent
– The prosecutor has a hypothesis that the accused person is guilty
74
![Page 75: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/75.jpg)
Hypothesis testing
• In program evaluation, instead of “presumption of innocence,” the rule is: “presumption of insignificance”
• The “Null hypothesis” (H0) is that there was no (zero) impact of the program
• The burden of proof is on the evaluator to show a significant effect of the program
![Page 76: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/76.jpg)
Hypothesis testing: conclusions
• If it is very unlikely (less than a 5% probability) that the difference is solely due to chance: – We “reject our null hypothesis”
• We may now say: – “our program has a statistically
significant impact”
![Page 77: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/77.jpg)
What is the significance level?
• Type I error: rejecting the null hypothesis even though it is true (false positive)
• Significance level: The probability that we will reject the null hypothesis even though it is true
![Page 78: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/78.jpg)
Hypothesis testing: 95% confidence
YOU CONCLUDE
Effective No Effect
THE TRUTH
Effective Type II Error
(low power)
No Effect
Type I Error(5% of the time)
78
![Page 79: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/79.jpg)
What is Power?
• Type II Error: Failing to reject the null hypothesis (concluding there is no difference), when indeed the null hypothesis is false.
• Power: If there is a measureable effect of our intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis)
![Page 80: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/80.jpg)
Assume two effects: no effect and treatment effect β
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
Before the experiment
H0Hβ
![Page 81: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/81.jpg)
Anything between lines cannot be distinguished from 0
Impose significance level of 5%
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significanceHβH0
![Page 82: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/82.jpg)
Shaded area shows % of time we would find Hβ true if it was
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Can we distinguish Hβ from H0 ?
HβH0
![Page 83: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/83.jpg)
What influences power?
• What are the factors that change the proportion of the research hypothesis that is shaded—i.e. the proportion that falls to the right (or left) of the null hypothesis curve?
• Understanding this helps us design more powerful experiments
83
![Page 84: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/84.jpg)
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
![Page 85: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/85.jpg)
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
![Page 86: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/86.jpg)
Effect Size: 1*SE
• Hypothesized effect size determines distance between means
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
1 Standard Deviation
HβH0
![Page 87: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/87.jpg)
Effect Size = 1*SE
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significanceH0
Hβ
![Page 88: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/88.jpg)
The Null Hypothesis would be rejected only 26% of the time
Power: 26%If the true impact was 1*SE…
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
HβH0
![Page 89: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/89.jpg)
Bigger hypothesized effect size distributions farther apart
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
Effect Size: 3*SE
3*SE
![Page 90: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/90.jpg)
Bigger Effect size means more power
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Effect size 3*SE: Power= 91%
H0
Hβ
![Page 91: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/91.jpg)
What effect size should you use when designing your experiment?
A. B. C. D.
25% 25%25%25%A. Smallest effect size that is still cost effective
B. Largest effect size you estimate your program to produce
C. BothD. Neither
![Page 92: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/92.jpg)
• Let’s say we believe the impact on our participants is “3”
• What happens if take up is 1/3?• Let’s show this graphically
Effect size and take-up
![Page 93: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/93.jpg)
Let’s say we believe the impact on our participants is “3”
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
Effect Size: 3*SE
3*SE
![Page 94: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/94.jpg)
Take up is 33%. Effect size is 1/3rd
• Hypothesized effect size determines distance between means
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
1 Standard Deviation
HβH0
![Page 95: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/95.jpg)
Take-up is reflected in the effect size
Back to: Power = 26%
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
HβH0
![Page 96: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/96.jpg)
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
![Page 97: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/97.jpg)
By increasing sample size you increase…
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 91%
A. B. C. D. E.
0% 0% 0%0%0%
A. AccuracyB. PrecisionC. BothD. NeitherE. Don’t know
![Page 98: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/98.jpg)
Power: Effect size = 1SD, Sample size = N
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significance
![Page 99: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/99.jpg)
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significance
Power: Sample size = 4N
![Page 100: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/100.jpg)
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 64%
![Page 101: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/101.jpg)
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significance
Power: Sample size = 9
![Page 102: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/102.jpg)
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 91%
![Page 103: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/103.jpg)
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
![Page 104: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/104.jpg)
What are typical ways to reduce the underlying variance
A. B. C. D. E. F.
0% 0% 0%0%0%0%
A.Include covariates
B.Increase the sample
C.Do a baseline survey
D.All of the aboveE.A and BF.A and C
![Page 105: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/105.jpg)
Variance
• There is sometimes very little we can do to reduce the noise
• The underlying variance is what it is• We can try to “absorb” variance:
– using a baseline– controlling for other variables
• In practice, controlling for other variables (besides the baseline outcome) buys you very little
![Page 106: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/106.jpg)
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
![Page 107: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/107.jpg)
Sample split: 50% C, 50% T
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significanceH0
Hβ
Equal split gives distributions that are the same “fatness”
![Page 108: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/108.jpg)
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 91%
![Page 109: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/109.jpg)
If it’s not 50-50 split?
• What happens to the relative fatness if the split is not 50-50.
• Say 25-75?
![Page 110: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/110.jpg)
Sample split: 25% C, 75% T
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significanceH0
Hβ
Uneven distributions, not efficient, i.e. less power
![Page 111: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/111.jpg)
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 83%
![Page 112: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/112.jpg)
Allocation to T v C
2
2
1
2
21 )(nn
XXsd
12
2
2
1
2
1)( 21 XXsd
15.13
4
1
1
3
1)( 21 XXsd
![Page 113: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/113.jpg)
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
![Page 114: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/114.jpg)
Clustered design: intuition
• You want to know how close the upcoming national elections will be
• Method 1: Randomly select 50 people from entire Indian population
• Method 2: Randomly select 5 families, and ask ten members of each family their opinion
114
![Page 115: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/115.jpg)
Low intra-cluster correlation (Rho)
![Page 116: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/116.jpg)
HIGH intra-cluster correlation (rho)
![Page 117: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/117.jpg)
All uneducated people live in one village. People with only primary education live in another. College grads live in a third, etc. Rho on education will be..
A. B. C. D.
25% 25%25%25%A. HighB. LowC. No effect on rhoD. Don’t know
![Page 118: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/118.jpg)
If rho is high, what is a more efficient way of increasing power?
A. B. C. D.
0% 0%0%0%
A. Include more clusters in the sample
B. Include more people in clusters
C. BothD. Don’t know
![Page 119: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/119.jpg)
Testing multiple treatments
Control Group Balsakhi
CAL program Balsakhi + CAL
←0.15 SD→
↖
0.25 SD↘
↑0.15 SD
↓
↑0.10 SD
↓
←0.10 SD→
↗
0.05 SD↙
50 50
50 100100
100200
200
![Page 120: Benjamin Olken MIT Sampling and Sample Size. Course Overview 1.What is evaluation? 2.Measuring impacts (outcomes, indicators) 3.Why randomize? 4.How to.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649f565503460f94c79aa4/html5/thumbnails/120.jpg)
END