Sampling and Sample Size

Benjamin Olken MIT Sampling and Sample Size


Transcript of Sampling and Sample Size

Page 1: Sampling and Sample Size

Benjamin Olken, MIT

Sampling and Sample Size

Page 2: Sampling and Sample Size

Course Overview

1. What is evaluation?
2. Measuring impacts (outcomes, indicators)
3. Why randomize?
4. How to randomize
5. Threats and Analysis
6. Sampling and sample size
7. RCT: Start to Finish
8. Cost Effectiveness Analysis and Scaling Up

Page 3: Sampling and Sample Size

Course Overview

1. What is evaluation?
2. Measuring impacts (outcomes, indicators)
3. Why randomize?
4. How to randomize
5. Threats and Analysis
6. Sampling and sample size
7. RCT: Start to Finish
8. Cost Effectiveness Analysis and Scaling Up

Page 4: Sampling and Sample Size

What’s the average result?

• If you were to roll a die once, what is the “expected result”? (i.e. the average)

Page 5: Sampling and Sample Size

Possible results & probability: 1 die

[Figure: bar chart of the possible results 1 to 6 on one die; each has probability 1/6]

Page 6: Sampling and Sample Size

Rolling 1 die: possible results & average

[Figure: bar chart of the frequency of each result (1/6 each) on an axis from 0.5 to 6.0, with the average marked; legend: Frequency, Average]
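The expected result of one roll can be checked by weighting each face by its probability. The snippet below is a minimal sketch assuming a fair six-sided die; the variable names are illustrative and not part of the slides.

```python
from fractions import Fraction

# Each face of a fair six-sided die has probability 1/6.
faces = range(1, 7)
prob = Fraction(1, 6)

# Expected value = sum of (outcome x probability).
expected = sum(face * prob for face in faces)
print(expected)         # 7/2
print(float(expected))  # 3.5
```

The average, 3.5, is not itself a result any single roll can produce.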

Page 7: Sampling and Sample Size

What’s the average result?

• If you were to roll two dice once, what is the expected average of the two dice?

Page 8: Sampling and Sample Size

Rolling 2 dice: possible totals & likelihood

[Figure: bar chart of the likelihood of each total from 2 to 12]

Page 9: Sampling and Sample Size

Rolling 2 dice: 11 possible totals, 36 permutations

                 Die 2
Die 1     1    2    3    4    5    6
  1       2    3    4    5    6    7
  2       3    4    5    6    7    8
  3       4    5    6    7    8    9
  4       5    6    7    8    9   10
  5       6    7    8    9   10   11
  6       7    8    9   10   11   12
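The table of totals can be rebuilt by enumerating every ordered roll of two dice. This is a small sketch assuming two fair six-sided dice; it counts how many of the 36 permutations produce each total (the totals run from 2 to 12, so there are 11 distinct values) and reports the matching average and likelihood.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 ordered rolls (permutations) of two fair dice.
rolls = list(product(range(1, 7), repeat=2))

# Count how many permutations produce each total.
totals = Counter(a + b for a, b in rolls)

for total in sorted(totals):
    likelihood = Fraction(totals[total], len(rolls))
    print(f"total {total:2d}  average {total / 2:3.1f}  likelihood {likelihood}")
```

The most likely total is 7 (average 3.5), reached by 6 of the 36 permutations.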

Page 10: Sampling and Sample Size

Rolling 2 dice: average score of dice & likelihood

[Figure: bar chart of the likelihood of each average score from 1 to 6 in steps of 0.5; vertical axis: Likelihood]

Page 11: Sampling and Sample Size

Outcomes and Permutations

• Putting together permutations, you get:
  1. All possible outcomes
  2. The likelihood of each of those outcomes
     – Each column represents one possible outcome (average result)
     – Each block within a column represents one possible permutation (to obtain that average)

[Figure: example column for the average 2.5]

Page 12: Sampling and Sample Size

Rolling 3 dice: 16 results (totals 3 to 18), 216 permutations

[Figure: bar chart of the likelihood of each average from 1 to 6 in steps of 1/3]

Page 13: Sampling and Sample Size

Rolling 4 dice: 21 results, 1296 permutations

[Figure: bar chart of the likelihood (0% to 14%) of each average from 1 to 6 in steps of 0.25]

Page 14: Sampling and Sample Size

Rolling 5 dice: 26 results, 7776 permutations

[Figure: bar chart of the likelihood (0% to 12%) of each average from 1 to 6 in steps of 0.2]

Page 15: Sampling and Sample Size

Looks like a bell curve, or a normal distribution

[Figure: bar chart of the likelihood (0% to 6%) of each possible average from 1 to 6]

Rolling 10 dice: 51 results, >60 million permutations

Page 16: Sampling and Sample Size

About 90% of all rolls will yield an average between 3 and 4

[Figure: bar chart of the likelihood (0% to 3.5%) of each possible average from 1 to 6]

Rolling 30 dice: 151 results, 2 x 10^23 permutations

Page 17: Sampling and Sample Size

>99% of all rolls will yield an average between 3 and 4

[Figure: bar chart of the likelihood (0% to 2%) of each possible average from 1 to 6]

Rolling 100 dice: 501 results, 6 x 10^77 permutations
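Statements about how often the average lands between 3 and 4 can be checked exactly rather than read off a chart. The sketch below assumes fair six-sided dice and builds the distribution of the total one die at a time; the function names are illustrative. Under these assumptions the share of rolls averaging between 3 and 4 comes out near 90% for 30 dice and above 99% for 100 dice.

```python
def sum_distribution(n_dice):
    """Return {total: probability} for the sum of n fair six-sided dice."""
    dist = {0: 1.0}
    for _ in range(n_dice):
        new = {}
        for total, p in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + p / 6
        dist = new
    return dist


def prob_average_between(n_dice, low=3.0, high=4.0):
    """Probability that the average of n dice lies in [low, high]."""
    dist = sum_distribution(n_dice)
    # Compare totals so that the boundaries are hit exactly.
    return sum(p for total, p in dist.items()
               if low * n_dice <= total <= high * n_dice)


for n in (1, 2, 5, 10, 30, 100):
    n_results = len(sum_distribution(n))
    print(f"{n:3d} dice: {n_results:3d} possible results, "
          f"P(3 <= average <= 4) = {prob_average_between(n):.3f}")
```

The number of possible results is 5n + 1, because the total can be anything from n to 6n.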

Page 18: Sampling and Sample Size

Rolling dice: 2 lessons

1. The more dice you roll, the closer most averages are to the true average (the distribution gets “tighter”)
   - THE LAW OF LARGE NUMBERS -

2. The more dice you roll, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution)
   - THE CENTRAL LIMIT THEOREM -
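The first lesson can be seen in a quick simulation. This is a rough sketch, assuming fair dice and an arbitrary choice of 2,000 repetitions per setting (neither number comes from the slides): it rolls n dice many times and measures how spread out the resulting averages are.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def average_of_dice(n_dice):
    """Roll n fair dice once and return their average."""
    return sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice


spread = {}
for n_dice in (1, 10, 100):
    averages = [average_of_dice(n_dice) for _ in range(2000)]
    spread[n_dice] = statistics.stdev(averages)
    print(f"{n_dice:3d} dice: mean of averages {statistics.mean(averages):.2f}, "
          f"spread (SD) {spread[n_dice]:.2f}")
```

The mean of the averages stays near 3.5 in every case, while the spread shrinks as more dice are rolled.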

Page 19: Sampling and Sample Size

Which of these is more accurate?

[Figure: two panels, labelled I and II]

A. I
B. II
C. Don’t know

Page 20: Sampling and Sample Size

Accuracy versus Precision

[Figure: diagram of estimates scattered around the truth; one axis labelled Precision (Sample Size), the other Accuracy (Randomization)]

Page 21: Sampling and Sample Size

Accuracy versus Precision

[Figure: four panels showing estimates relative to the truth, arranged by Precision (Sample Size) and Accuracy (Randomization)]

Page 22: Sampling and Sample Size

THE basic questions in statistics

• How confident can you be in your results?

• How big does your sample need to be?

Page 23: Sampling and Sample Size

THAT WAS JUST THE INTRODUCTION

Page 24: Sampling and Sample Size

Outline

• Sampling distributions
  – population distribution
  – sampling distribution
  – law of large numbers/central limit theorem
  – standard deviation and standard error

• Detecting impact

Page 25: Sampling and Sample Size

Outline

• Sampling distributions
  – population distribution
  – sampling distribution
  – law of large numbers/central limit theorem
  – standard deviation and standard error

• Detecting impact

Page 26: Sampling and Sample Size

Baseline test scores

[Figure: histogram of baseline test scores; horizontal axis: test scores (0 to 100); vertical axis: frequency (0 to 500)]

Page 27: Sampling and Sample Size

Mean = 26

[Figure: histogram of baseline test scores (0 to 100) with the mean marked at 26; vertical axis: frequency (0 to 500)]

Page 28: Sampling and Sample Size

Standard Deviation = 20

[Figure: histogram of baseline test scores (0 to 100) with the mean marked at 26 and a band of 1 standard deviation; legend: mean, frequency, sd]

Page 29: Sampling and Sample Size

Let’s do an experiment

• Take 1 random test score from the pile of 16,000 tests
• Write down the value
• Put the test back
• Do these three steps again
• And again
• 8,000 times
• This is like a random sample of 8,000 (with replacement)
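The draw, record, replace procedure above can be sketched in a few lines. The actual test scores are not available here, so the population below is a made-up stand-in: a right-skewed distribution with a mean near 26 and a standard deviation near 20, capped at 100. Only the sampling step mirrors the slide.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the pile of 16,000 tests (not the real data):
# gamma-distributed scores with mean about 26 and SD about 20, capped at 100.
population = [min(100.0, random.gammavariate(1.69, 15.4)) for _ in range(16000)]

# Draw one score, write it down, put it back: 8,000 times.
# random.choices samples with replacement.
sample = random.choices(population, k=8000)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean {pop_mean:.1f}, sample mean {sample_mean:.1f}")
print(f"population SD {statistics.pstdev(population):.1f}, "
      f"sample SD {statistics.stdev(sample):.1f}")
```

A sample this large resembles the population it was drawn from: its histogram has the population's shape, not a bell curve. The bell curve only appears when plotting the means of many repeated samples.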

Page 30: Sampling and Sample Size

Good, the average of the sample is about 26…

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; series: frequency (count axis, 0 to 500) and freq (N=1) (percentage axis, 0% to 4%)]

What can we say about this sample?

Page 31: Sampling and Sample Size

But…

• … I remember that as my sample size goes up, isn’t the sampling distribution supposed to turn into a bell curve?

• (Central Limit Theorem)
• Is it that my sample isn’t large enough?

Page 32: Sampling and Sample Size

This is the distribution of my sample of 8,000 students!

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; series: frequency (count axis, 0 to 500) and freq (N=1) (percentage axis, 0% to 4%)]

Population v. sampling distribution

Page 33: Sampling and Sample Size

Outline

• Sampling distributions
  – population distribution
  – sampling distribution
  – law of large numbers/central limit theorem
  – standard deviation and standard error

• Detecting impact

Page 34: Sampling and Sample Size

How do we get from here…

[Figure: histogram of test scores (0 to 100); vertical axis: frequency (0 to 500)]
This is the distribution of the population (Population Distribution)

To here…

This is the distribution of Means from all Random Samples (Sampling distribution)

Page 35: Sampling and Sample Size

Draw 10 random students, take the average, plot it: Do this 5 & 10 times.

Inadequate sample size

No clear distribution around population mean

[Figure: two histograms of sample means (horizontal axis 1 to 52, frequency 0 to 10): Frequency of Means With 5 Samples; Frequency of Means With 10 Samples]

Page 36: Sampling and Sample Size

More sample means around population mean

Still spread a good deal

Draw 10 random students: 50 and 100 times

[Figure: two histograms of sample means (horizontal axis 1 to 52, frequency 0 to 10): Frequency of Means With 50 Samples; Frequency of Means With 100 Samples]

Page 37: Sampling and Sample Size

Distribution now significantly more normal

Starting to see peaks

Draw 10 random students: 500 and 1000 times

[Figure: two histograms of sample means (horizontal axis 1 to 52, frequency 0 to 80): Frequency of Means With 500 Samples; Frequency of Means With 1000 Samples]

Page 38: Sampling and Sample Size

Draw 10 Random students

• This is like a sample size of 10
• What happens if we take a sample size of 50?

Page 39: Sampling and Sample Size

N = 10 versus N = 50

[Figure: two columns of histograms of sample means (horizontal axis 1 to 52, frequency 0 to 10), one column for N = 10 and one for N = 50; each column shows Frequency of Means With 5 Samples and Frequency of Means With 10 Samples]

Page 40: Sampling and Sample Size

Draws of 10 versus draws of 50

[Figure: two columns of histograms of sample means (horizontal axis 1 to 52), one column for draws of 10 and one for draws of 50; each column shows Frequency of Means With 50 Samples and Frequency of Means With 100 Samples]

Page 41: Sampling and Sample Size

Draws of 10 versus draws of 50

[Figure: two columns of histograms of sample means (horizontal axis 1 to 52), one column for draws of 10 and one for draws of 50; each column shows Frequency of Means With 500 Samples and Frequency of Means With 1000 Samples]

Page 42: Sampling and Sample Size

Outline

• Sampling distributions
  – population distribution
  – sampling distribution
  – law of large numbers/central limit theorem
  – standard deviation and standard error

• Detecting impact

Page 43: Sampling and Sample Size

Population & sampling distribution: Draw 1 random student (from 8,000)

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, freq (N=1)]

Page 44: Sampling and Sample Size

Sampling Distribution: Draw 4 random students (N=4)

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, freq (N=4)]

Page 45: Sampling and Sample Size

Law of Large Numbers: N=9

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, freq (N=9)]

Page 46: Sampling and Sample Size

Law of Large Numbers: N=100

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, freq (N=100)]

Page 47: Sampling and Sample Size

Central Limit Theorem: N=1

The white line is a theoretical distribution

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, dist_1, freq (N=1)]

Page 48: Sampling and Sample Size

Central Limit Theorem: N=4

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, dist_4, freq (N=4)]

Page 49: Sampling and Sample Size

Central Limit Theorem: N=9

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, dist_9, freq (N=9)]

Page 50: Sampling and Sample Size

Central Limit Theorem: N=100

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, dist_100, freq (N=100)]

Page 51: Sampling and Sample Size

Outline

• Sampling distributions
  – population distribution
  – sampling distribution
  – law of large numbers/central limit theorem
  – standard deviation and standard error

• Detecting impact

Page 52: Sampling and Sample Size

Standard deviation/error

• What’s the difference between the standard deviation and the standard error?

• The standard error = the standard deviation of the sampling distribution

Page 53: Sampling and Sample Size

Variance and Standard Deviation

• Variance = 400

• Standard Deviation = 20

• Standard Error = SE = Standard Deviation / √N
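As a worked example, the standard error of a sample mean is the standard deviation divided by the square root of the sample size. The sketch below uses the standard deviation of 20 from this slide and the sample sizes shown in the charts that follow (1, 4, 9 and 100); the function name is illustrative.

```python
import math

sd = 20.0  # population standard deviation from the slide (variance = 400)


def standard_error(sd, n):
    """Standard error of the mean of n independent draws."""
    return sd / math.sqrt(n)


for n in (1, 4, 9, 100):
    print(f"N = {n:3d}: SE = {standard_error(sd, n):.2f}")
```

Quadrupling the sample size halves the standard error; multiplying it by 9 cuts the standard error to a third, and by 100 to a tenth.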

Page 54: Sampling and Sample Size

Standard Deviation / Standard Error

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, sd, dist_1]

Page 55: Sampling and Sample Size

Sample size ↑ x4, SE ↓ ½

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, sd, se4, dist_4]

Page 56: Sampling and Sample Size

Sample size ↑ x9, SE ↓ ?

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, sd, se9, dist_9]

Page 57: Sampling and Sample Size

Sample size ↑ x100, SE ↓?

[Figure: histogram of test scores (0 to 100) with the mean marked at 26; legend: mean, frequency, sd, se100, dist_100]

Page 58: Sampling and Sample Size

Outline

• Sampling distributions
• Detecting impact
  – significance
  – effect size
  – power
  – baseline and covariates
  – clustering
  – stratification

Page 59: Sampling and Sample Size

Baseline test scores

[Figure: histogram of baseline test scores; horizontal axis: test scores (0 to 100); vertical axis: frequency (0 to 500)]

Page 60: Sampling and Sample Size

We implement the Balsakhi Program (lecture 3)

Case 2: Remedial Education in India
Evaluating the Balsakhi Program

Incorporating random assignment into the program

Page 61: Sampling and Sample Size

After the Balsakhi program, these are the endline test scores

[Figure: histogram of endline test scores; horizontal axis: test scores (0 to 100); vertical axis: frequency (0 to 160)]

Page 62: Sampling and Sample Size

The impact appears to be?

A. Positive
B. Negative
C. No impact
D. Don’t know

[Figure: two histograms of test scores (0 to 100): Endline test scores (frequency 0 to 160) and Baseline test scores (frequency 0 to 500)]

Page 63: Sampling and Sample Size

Stop! That was the control group. The treatment group is red.

[Figure: Post-test: control & treatment. Overlaid histograms of test scores (0 to 100, frequency 0 to 160); legend: control, treatment]

Page 64: Sampling and Sample Size

Is this impact statistically significant?

A. Yes
B. No
C. Don’t know

[Figure: overlaid histograms of control and treatment test scores (0 to 100, frequency 0 to 160) with both means marked; legend: control, treatment, control μ, treatment μ]

Average Difference = 6 points

Page 65: Sampling and Sample Size

One experiment: 6 points

Page 66: Sampling and Sample Size

One experiment

Page 67: Sampling and Sample Size

Two experiments

Page 68: Sampling and Sample Size

A few more…

Page 69: Sampling and Sample Size

A few more…

Page 70: Sampling and Sample Size

Many more…

Page 71: Sampling and Sample Size

A whole lot more…

Page 72: Sampling and Sample Size

Page 73: Sampling and Sample Size

Running the experiment thousands of times…

By the Central Limit Theorem, these are normally distributed
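Re-running the experiment many times can be imitated with a simulation. Everything numeric below is an assumption made for illustration: a true impact of 6 points, normally distributed scores with a standard deviation of 20 around a control mean of 26, 100 students per arm, and 2,000 repetitions. The sketch only shows the mechanics of collecting one estimated difference per experiment.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 6.0   # assumed true impact, in test-score points
SD = 20.0           # assumed standard deviation of scores
N_PER_ARM = 100     # hypothetical number of students per arm


def one_experiment():
    """Run one simulated experiment and return treatment mean minus control mean."""
    control = [random.gauss(26.0, SD) for _ in range(N_PER_ARM)]
    treatment = [random.gauss(26.0 + TRUE_EFFECT, SD) for _ in range(N_PER_ARM)]
    return statistics.mean(treatment) - statistics.mean(control)


estimates = [one_experiment() for _ in range(2000)]
print(f"mean of estimated impacts: {statistics.mean(estimates):.2f}")
print(f"spread (SD) of estimated impacts: {statistics.stdev(estimates):.2f}")
```

Under these assumptions the estimates centre on the true effect, and their spread is close to the theoretical standard error of the difference, 20 × √(2/100) ≈ 2.83.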

Page 74: Sampling and Sample Size

Hypothesis testing

• In criminal law, most institutions follow the rule: “innocent until proven guilty”

• The presumption is that the accused is innocent and the burden is on the prosecutor to show guilt
  – The jury or judge starts with the “null hypothesis” that the accused person is innocent
  – The prosecutor has a hypothesis that the accused person is guilty

Page 75: Sampling and Sample Size

Hypothesis testing

• In program evaluation, instead of “presumption of innocence,” the rule is: “presumption of insignificance”

• The “Null hypothesis” (H0) is that there was no (zero) impact of the program

• The burden of proof is on the evaluator to show a significant effect of the program

Page 76: Sampling and Sample Size

Hypothesis testing: conclusions

• If it is very unlikely (less than a 5% probability) that the difference is solely due to chance:
  – We “reject our null hypothesis”
• We may now say:
  – “our program has a statistically significant impact”
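The decision rule can be illustrated with a simple z-test on a difference in means. This is a sketch under stated assumptions, not the analysis used in the evaluation: the 6-point difference comes from the earlier slide, the standard deviation of 20 is borrowed from the baseline scores, and the 100 students per arm is a hypothetical number chosen for the example.

```python
import math
from statistics import NormalDist

difference = 6.0   # observed difference in means (from the earlier slide)
sd = 20.0          # assumed standard deviation of scores in each arm
n_per_arm = 100    # hypothetical sample size per arm

# Standard error of a difference between two independent means.
se = sd * math.sqrt(2 / n_per_arm)
z = difference / se

# Two-sided p-value: chance of a difference at least this large if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < 0.05

print(f"SE = {se:.2f}, z = {z:.2f}, p = {p_value:.3f}, reject H0: {reject_null}")
```

With a smaller sample the same 6-point difference could fail to reach significance, which is why sample size matters for detecting impact.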

Page 77: Sampling and Sample Size

What is the significance level?

• Type I error: rejecting the null hypothesis even though it is true (false positive)

• Significance level: The probability that we will reject the null hypothesis even though it is true

Page 78: Sampling and Sample Size

Hypothesis testing: 95% confidence

                          YOU CONCLUDE
                          Effective                        No Effect
THE TRUTH   Effective                                      Type II Error (low power)
            No Effect     Type I Error (5% of the time)

Page 79: Sampling and Sample Size

What is Power?

• Type II Error: Failing to reject the null hypothesis (concluding there is no difference), when indeed the null hypothesis is false.

• Power: If there is a measurable effect of our intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis)

Page 80: Sampling and Sample Size

Assume two effects: no effect and treatment effect β

[Figure: Before the experiment. Two bell curves on an axis from -4 to 6, labelled H0 (control) and Hβ (treatment)]

Page 81: Sampling and Sample Size

Impose significance level of 5%

Anything between lines cannot be distinguished from 0

[Figure: bell curves for H0 (control) and Hβ (treatment) on an axis from -4 to 6, with the significance cut-offs marked]

Page 82: Sampling and Sample Size

Can we distinguish Hβ from H0?

Shaded area shows % of time we would find Hβ true if it was

[Figure: bell curves for H0 (control) and Hβ (treatment) on an axis from -4 to 6, with the power region shaded]

Page 83: Sampling and Sample Size

What influences power?

• What are the factors that change the proportion of the research hypothesis that is shaded—i.e. the proportion that falls to the right (or left) of the null hypothesis curve?

• Understanding this helps us design more powerful experiments


Page 84: Sampling and Sample Size

Power: main ingredients

1. Effect Size
2. Sample Size
3. Variance
4. Proportion of sample in T vs. C
5. Clustering

Page 85: Sampling and Sample Size

Power: main ingredients

1. Effect Size
2. Sample Size
3. Variance
4. Proportion of sample in T vs. C
5. Clustering

Page 86: Sampling and Sample Size

Effect Size: 1*SE

• Hypothesized effect size determines distance between means

[Figure: bell curves for H0 (control) and Hβ (treatment) on an axis from -4 to 6, with the distance between the means marked “1 Standard Deviation”]

Page 87: Sampling and Sample Size

Effect Size = 1*SE

[Figure: bell curves for control (H0) and treatment on an axis from -4 to 6, with the significance cut-off marked]

Page 88: Sampling and Sample Size

Power: 26%

If the true impact was 1*SE…

The Null Hypothesis would be rejected only 26% of the time

[Figure: bell curves for H0 (control) and Hβ (treatment) on an axis from -4 to 6, with the power region shaded]

Page 89: Sampling and Sample Size

Effect Size: 3*SE

Bigger hypothesized effect size: distributions farther apart

[Figure: bell curves for control and treatment on an axis from -4 to 6, with the means 3*SE apart]

Page 90: Sampling and Sample Size

Bigger effect size means more power

Effect size 3*SE: Power = 91%

[Figure: bell curves for control (H0) and treatment on an axis from -4 to 6, with the power region shaded]
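The power figures on these slides (26% for an effect of 1*SE, 91% for 3*SE) can be reproduced with the normal distribution. The sketch below assumes a one-sided test at the 5% level and a normally distributed estimate, which is the setting that matches the quoted numbers; the function name is illustrative.

```python
from statistics import NormalDist


def power(effect_in_se, alpha=0.05):
    """Power of a one-sided z-test when the true effect equals effect_in_se standard errors."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # cut-off under H0 (about 1.645)
    # Probability that the estimate lands beyond the cut-off when H-beta is true.
    return 1 - NormalDist().cdf(z_crit - effect_in_se)


for effect in (1, 2, 3):
    print(f"effect = {effect}*SE: power = {power(effect):.0%}")
```

Power is the area of the treatment curve that lies beyond the significance cut-off, so moving the curves farther apart raises it.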

Page 91: Sampling and Sample Size

What effect size should you use when designing your experiment?

A. Smallest effect size that is still cost effective
B. Largest effect size you estimate your program to produce
C. Both
D. Neither

Page 92: Sampling and Sample Size

Effect size and take-up

• Let’s say we believe the impact on our participants is “3”

• What happens if take up is 1/3?
• Let’s show this graphically

Page 93: Sampling and Sample Size

Let’s say we believe the impact on our participants is “3”

Effect Size: 3*SE

[Figure: bell curves for control and treatment on an axis from -4 to 6, with the means 3*SE apart]

Page 94: Sampling and Sample Size

Take up is 33%. Effect size is 1/3rd

• Hypothesized effect size determines distance between means

[Figure: bell curves for H0 (control) and Hβ (treatment) on an axis from -4 to 6, with the distance between the means marked “1 Standard Deviation”]

Page 95: Sampling and Sample Size

Take-up is reflected in the effect size

Back to: Power = 26%

[Figure: bell curves for H0 (control) and Hβ (treatment) on an axis from -4 to 6, with the power region shaded]
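The take-up arithmetic can be written out directly. This sketch assumes, as the slides do, that the average effect across everyone offered the program is the effect on participants multiplied by the take-up rate, and it reuses a one-sided 5% power calculation; the variable names are illustrative.

```python
from statistics import NormalDist


def power(effect_in_se, alpha=0.05):
    """Power of a one-sided z-test for a true effect measured in standard errors."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect_in_se)


effect_on_participants = 3.0   # in standard-error units, from the slide
take_up = 1 / 3                # share of the treatment group that takes part

# Non-participants dilute the average effect in the treatment group.
diluted_effect = effect_on_participants * take_up

# The SE shrinks with the square root of N, so restoring the original
# effect-to-SE ratio needs 1 / take_up**2 times the sample.
sample_multiplier = 1 / take_up ** 2

print(f"power at full take-up: {power(effect_on_participants):.0%}")
print(f"power at 1/3 take-up:  {power(diluted_effect):.0%}")
print(f"sample needed to compensate: about {sample_multiplier:.0f} times larger")
```

A take-up of one third therefore costs far more than a third of the power: the sample would have to be roughly nine times larger to get back to 91%.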

Page 96: Sampling and Sample Size

Power: main ingredients

1. Effect Size
2. Sample Size
3. Variance
4. Proportion of sample in T vs. C
5. Clustering

Page 97: Sampling and Sample Size

By increasing sample size you increase…

A. Accuracy
B. Precision
C. Both
D. Neither
E. Don’t know

[Figure: bell curves for control and treatment on an axis from -4 to 6, with the power region shaded; Power: 91%]

Page 98: Sampling and Sample Size

Power: Effect size = 1SD, Sample size = N

[Figure: bell curves for control and treatment on an axis from -4 to 6, with the significance cut-off marked]

Page 99: Sampling and Sample Size

Power: Sample size = 4N

[Figure: narrower bell curves for control and treatment on an axis from -4 to 6, with the significance cut-off marked]

Page 100: Sampling and Sample Size

Power: 64%

[Figure: bell curves for control and treatment on an axis from -4 to 6, with the power region shaded]

Page 101: Sampling and Sample Size

Power: Sample size = 9N

[Figure: narrower bell curves for control and treatment on an axis from -4 to 6, with the significance cut-off marked]

Page 102: Sampling and Sample Size

Power: 91%

[Figure: bell curves for control and treatment on an axis from -4 to 6, with the power region shaded]
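The jump from 26% to 64% to 91% follows from how the standard error shrinks with sample size. In this sketch the effect starts at 1 standard error with sample size N; multiplying the sample by k divides the standard error by √k, so the same effect is worth √k standard errors. A one-sided 5% test is assumed, and the names are illustrative.

```python
import math
from statistics import NormalDist


def power(effect_in_se, alpha=0.05):
    """Power of a one-sided z-test for a true effect measured in standard errors."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect_in_se)


base_effect_in_se = 1.0  # effect size relative to the SE at sample size N

power_by_multiplier = {}
for k in (1, 4, 9):
    # k times the sample -> SE divided by sqrt(k) -> effect is sqrt(k) SEs.
    power_by_multiplier[k] = power(base_effect_in_se * math.sqrt(k))
    print(f"sample size {k}N: power = {power_by_multiplier[k]:.0%}")
```

Each further gain in power costs more sample than the last: going from N to 4N adds about 38 percentage points, while going from 4N to 9N adds about 27.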

Page 103: Sampling and Sample Size

Power: main ingredients

1. Effect Size
2. Sample Size
3. Variance
4. Proportion of sample in T vs. C
5. Clustering

Page 104: Sampling and Sample Size

What are typical ways to reduce the underlying variance?

A. Include covariates
B. Increase the sample
C. Do a baseline survey
D. All of the above
E. A and B
F. A and C

Page 105: Sampling and Sample Size

Variance

• There is sometimes very little we can do to reduce the noise

• The underlying variance is what it is
• We can try to “absorb” variance:
  – using a baseline
  – controlling for other variables

• In practice, controlling for other variables (besides the baseline outcome) buys you very little

Page 106: Sampling and Sample Size

Power: main ingredients

1. Effect Size
2. Sample Size
3. Variance
4. Proportion of sample in T vs. C
5. Clustering

Page 107: Sampling and Sample Size

Sample split: 50% C, 50% T

Equal split gives distributions that are the same “fatness”

[Figure: bell curves of equal width for control (H0) and treatment on an axis from -4 to 6, with the significance cut-off marked]

Page 108: Sampling and Sample Size

Power: 91%

[Figure: bell curves for control and treatment on an axis from -4 to 6, with the power region shaded]

Page 109: Sampling and Sample Size

If it’s not 50-50 split?

• What happens to the relative fatness if the split is not 50-50?

• Say 25-75?

Page 110: Sampling and Sample Size

Sample split: 25% C, 75% T

Uneven distributions, not efficient, i.e. less power

[Figure: bell curves of unequal width for control (H0) and treatment on an axis from -4 to 6, with the significance cut-off marked]

Page 111: Sampling and Sample Size

Power: 83%

[Figure: bell curves for control and treatment on an axis from -4 to 6, with the power region shaded]

Page 112: Sampling and Sample Size

Allocation to T v C

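\[ sd(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} \]

\[ sd(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{1}{2} + \frac{1}{2}} = \sqrt{\frac{2}{2}} = 1 \]

\[ sd(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{1}{3} + \frac{1}{1}} = \sqrt{\frac{4}{3}} = 1.15 \]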
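The allocation arithmetic can be checked numerically. The sketch below takes the standard deviation of a difference in two independent means, with a common σ set to 1, and compares a 2/2 split of four units with a 3/1 split. It then feeds the resulting loss of precision into a one-sided 5% power calculation, starting from the 3*SE effect used earlier; function names are illustrative.

```python
import math
from statistics import NormalDist


def sd_of_difference(n1, n2, sigma=1.0):
    """Standard deviation of the difference between two independent sample means."""
    return sigma * math.sqrt(1 / n1 + 1 / n2)


def power(effect_in_se, alpha=0.05):
    """Power of a one-sided z-test for a true effect measured in standard errors."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect_in_se)


even = sd_of_difference(2, 2)     # 50-50 split of 4 units
uneven = sd_of_difference(3, 1)   # 75-25 split of 4 units
print(f"sd with even split:   {even:.2f}")
print(f"sd with uneven split: {uneven:.2f}")

# An effect worth 3 SEs under the even split is worth less under the uneven one.
effect_even = 3.0
effect_uneven = effect_even * even / uneven
print(f"power, 50-50 split: {power(effect_even):.0%}")
print(f"power, 25-75 split: {power(effect_uneven):.0%}")
```

For a fixed total sample and equal variances in both arms, the 50-50 split gives the smallest standard deviation of the difference and therefore the most power.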

Page 113: Sampling and Sample Size

Power: main ingredients

1. Effect Size
2. Sample Size
3. Variance
4. Proportion of sample in T vs. C
5. Clustering

Page 114: Sampling and Sample Size

Clustered design: intuition

• You want to know how close the upcoming national elections will be

• Method 1: Randomly select 50 people from entire Indian population

• Method 2: Randomly select 5 families, and ask ten members of each family their opinion


Page 115: Sampling and Sample Size

Low intra-cluster correlation (Rho)

Page 116: Sampling and Sample Size

HIGH intra-cluster correlation (rho)

Page 117: Sampling and Sample Size

All uneducated people live in one village. People with only primary education live in another. College grads live in a third, etc. Rho on education will be..

A. High
B. Low
C. No effect on rho
D. Don’t know

Page 118: Sampling and Sample Size

If rho is high, what is a more efficient way of increasing power?

A. Include more clusters in the sample
B. Include more people in clusters
C. Both
D. Don’t know
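One common way to put numbers on this question is the design effect, 1 + (m − 1)ρ, where m is the number of people per cluster and ρ is the intra-cluster correlation. That formula is a standard result for equal-sized clusters and is not stated on the slides. The sketch applies it to the family example from the earlier slide (5 families of 10) with a hypothetical ρ of 0.8.

```python
def design_effect(cluster_size, rho):
    """Variance inflation from sampling in equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * rho


def effective_sample_size(n_clusters, cluster_size, rho):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, rho)


rho = 0.8  # hypothetical: family members tend to vote alike

base = effective_sample_size(5, 10, rho)            # 5 families x 10 members
more_people = effective_sample_size(5, 20, rho)     # double the members per family
more_clusters = effective_sample_size(10, 10, rho)  # double the number of families

print(f"5 families x 10 members:  effective n = {base:.1f}")
print(f"5 families x 20 members:  effective n = {more_people:.1f}")
print(f"10 families x 10 members: effective n = {more_clusters:.1f}")
```

With a high ρ, adding people inside existing clusters barely changes the effective sample size, while adding clusters raises it almost in proportion.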

Page 119: Sampling and Sample Size

Testing multiple treatments

[Figure: 2x2 diagram of treatment arms (Control Group, Balsakhi, CAL program, Balsakhi + CAL) with arrows between arms labelled with effect sizes of 0.05, 0.10, 0.15 and 0.25 SD, and sample sizes of 50, 100 and 200 marked]

Page 120: Sampling and Sample Size

END