Variance Estimation

Optimal Number of Replicates for Variance Estimation. Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy. Third International Conference on Establishment Surveys (ICES-III), June 21, 2007.


Page 1: Variance Estimation

1

Optimal Number of Replicates for Variance Estimation

Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy

Third International Conference on Establishment Surveys (ICES-III)

June 21, 2007

Page 2: Variance Estimation

2

Variance Estimation

Two general approaches for variance estimation with weighted data obtained under complex designs:

Linearization

Replication

Page 3: Variance Estimation

3

Linearization

Approximate complex statistics in terms of L linear statistics

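Estimate the variance of $\hat{\theta}$ from:

$$E(\hat{\theta} - \theta)^2 \approx E\left[\sum_{l=1}^{L} a_l (\hat{\theta}_l - \theta_l)\right]^2$$

where $\hat{\theta}_l$, $l = 1, \ldots, L$, are the linear statistics and $a_l$ are the linearization coefficients.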
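As a small illustration of linearization, the sketch below treats a ratio of two weighted totals (so L = 2 linear statistics) and computes its Taylor-linearized variance. It assumes a single stratum with units sampled with replacement, which is far simpler than the designs discussed in this talk; the function name `linearized_ratio_variance` and the simulated data are hypothetical.

```python
import numpy as np

def linearized_ratio_variance(y, x, w):
    """Taylor-linearized variance of the ratio sum(w*y) / sum(w*x).

    Assumes a single stratum with units sampled with replacement,
    which is a simplification of a real complex design.
    """
    y, x, w = (np.asarray(a, dtype=float) for a in (y, x, w))
    n = y.size
    x_total = np.sum(w * x)
    ratio = np.sum(w * y) / x_total
    # Weighted linearized value per unit: the ratio is replaced by a
    # linear combination of its two weighted totals.
    z = w * (y - ratio * x) / x_total
    # With-replacement variance estimator for the total of z.
    variance = n / (n - 1) * np.sum((z - z.mean()) ** 2)
    return ratio, variance

rng = np.random.default_rng(0)
y = rng.normal(50.0, 10.0, size=400)   # simulated analysis variable
x = np.ones_like(y)                    # x = 1 turns the ratio into a weighted mean
w = np.full(y.size, 2.5)               # hypothetical equal weights
ratio, variance = linearized_ratio_variance(y, x, w)
```

With equal weights and x = 1 the result reduces to the familiar s²/n for a sample mean, which is a convenient sanity check.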

Page 4: Variance Estimation

4

Replication

Partition the full sample into R subsamples (replicates)

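Obtain separate estimates for $\theta$ from each replicate:

$$\hat{\theta}_r, \quad r = 1, \ldots, R$$

Estimate the variance of $\hat{\theta}$ by:

$$v(\hat{\theta}) = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta})^2$$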
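The replication recipe on this slide can be sketched in a few lines: re-estimate the statistic with each set of replicate weights, then average the squared deviations from the full-sample estimate (a divisor of R is assumed here). The replicate weights below come from a plain multinomial resampling of units, which is only a stand-in and not the bootstrap scheme used for the surveys in this talk; all names and data are hypothetical.

```python
import numpy as np

def replicate_variance(theta_full, theta_reps):
    """Mean squared deviation of replicate estimates from the full-sample estimate."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    return float(np.mean((theta_reps - theta_full) ** 2))

def weighted_mean(y, w):
    """Weighted mean, the statistic being estimated in this example."""
    return float(np.sum(w * y) / np.sum(w))

rng = np.random.default_rng(2007)
n, R = 500, 64
y = rng.normal(50.0, 10.0, size=n)        # simulated analysis variable
w_full = rng.uniform(1.0, 3.0, size=n)    # hypothetical full-sample weights

# Hypothetical replicate weights: scale the full weights by multinomial
# resampling counts, one row of counts per replicate (shape R x n).
counts = rng.multinomial(n, np.full(n, 1.0 / n), size=R)
w_reps = w_full * counts

theta_full = weighted_mean(y, w_full)
theta_reps = [weighted_mean(y, w_r) for w_r in w_reps]
var_hat = replicate_variance(theta_full, theta_reps)
se_hat = var_hat ** 0.5
```

The same `replicate_variance` function works for any statistic, as long as it can be recomputed from each set of replicate weights.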

Page 5: Variance Estimation

5

How Many Replicates?

Recommendations regarding the optimal number of replicates for variance estimation are at variance:

Computational resources required can be intensive

For certain statistics a larger number of replicates might be needed to produce stable estimates of variance

What is the point of diminishing returns?

Page 6: Variance Estimation

6

Research Methodology

Relying on two complex establishment surveys, this work presents an array of empirical results regarding the number of bootstrap replicates for variance estimation:

National Study of Postsecondary Faculty (NSOPF:04)

National Postsecondary Student Aid Study (NPSAS:04)

Page 7: Variance Estimation

7

General Design Specifications: National Study of Postsecondary Faculty (NSOPF:04)

Survey of about 35,000 faculty and instructional staff

Across a sample of 1,080 institutions

In the 50 States and the District of Columbia

Page 8: Variance Estimation

8

Sampling Methodology

Institutions selected with probability proportional to a measure of size to over-represent:

Hispanic

Non-Hispanic Black

Asian and Pacific Islander

Full-time other female

Used RTI’s cost/variance optimization procedure for sample allocation
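For readers unfamiliar with selection proportional to size, here is a generic systematic PPS draw. It is not the NSOPF:04 selection procedure or RTI's allocation method; the function name `pps_systematic` and the measures of size are hypothetical, and the sketch assumes every unit's size is smaller than the sampling interval so that no unit is hit twice.

```python
import numpy as np

def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS)."""
    sizes = np.asarray(sizes, dtype=float)
    cum = np.cumsum(sizes)
    step = cum[-1] / n                    # sampling interval on the size scale
    start = rng.uniform(0.0, step)        # random start within the first interval
    points = start + step * np.arange(n)  # equally spaced selection points
    idx = np.searchsorted(cum, points, side="right")
    # Guard against a floating-point point landing exactly on the total.
    return np.minimum(idx, sizes.size - 1)

rng = np.random.default_rng(8)
sizes = np.array([120, 40, 150, 15, 80, 60, 100, 25])  # hypothetical measures of size
sample = pps_systematic(sizes, n=3, rng=rng)

# Expected inclusion probabilities, valid when each value is at most 1.
incl_prob = 3 * sizes / sizes.sum()
```

A measure of size that gives extra weight to the groups of interest raises the selection probability of institutions where those groups are concentrated, which is the over-representation idea on this slide.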

Page 9: Variance Estimation

9

Institution Sampling Frame

Degree Granting   Carnegie Code                Public   Private   Total
Doctor’s          15, 16, 52                      190       110     300
Master’s          21, 22                          270       320     590
Bachelor’s        31, 32, 33                       90       480     570
Associate’s       40, 60                        1,030       150   1,180
Other/Unknown     51, 53 – 59, unclassified       110       620     730
Total                                           1,700     1,680   3,380

Page 10: Variance Estimation

10

Institution Sample

Degree Granting   Public   Private   Total
Doctor’s             190       110     300
Master’s             120        80     200
Bachelor’s            30       130     160
Associate’s          340        10     350
Other                 10        60      70
Total                680       400   1,080

Page 11: Variance Estimation

11

Expected Faculty Counts From Sampled Institutions by Strata

NSOPF stratum                            Black Hispanic    Asian     OFTF     OFTM      OPT    Total
Public, doctor’s                        10,720    8,660   32,630   58,870  115,830   51,110  277,820
Public, master’s                         4,670    3,150    4,950   14,120   20,440   22,130   69,460
Public, bachelor’s                         810      340      520    1,430    2,110    3,880    9,090
Public, associate’s                     12,250    9,240    6,100   21,100   21,700   82,570  152,960
Public, other                              150       80      170      290      630      830    2,150
Private not-for-profit, doctor’s         6,060    3,760   13,110   21,490   47,370   33,280  125,080
Private not-for-profit, master’s         1,110      950    1,020    4,930    7,020   12,530   27,550
Private not-for-profit, bachelor’s       1,360      390      670    3,920    6,270    5,440   18,050
Private not-for-profit, associate’s         20       20       40      180      450      480    1,180
Private not-for-profit, other              330      120      250      790    1,680    2,700    5,880
Total                                   37,480   26,710   59,460  127,120  223,500  214,940  689,210

Page 12: Variance Estimation

12

Target Number of Respondents by Institution and Faculty Strata

Institution stratum                  Respondents   Faculty stratum          Respondents
Public doctor’s                            6,200   Non-Hispanic Black             1,600
Public master’s                            2,700   Hispanic                       1,300
Public bachelor’s                            600   Asian                            900
Public associate’s                         7,500   Other full-time female         4,600
Public other                                 500   Other full-time male           8,300
Private not-for-profit doctor’s            2,600   Other part-time                7,800
Private not-for-profit master’s            1,900
Private not-for-profit bachelor’s          1,700
Private not-for-profit associate’s           100
Private not-for-profit other                 700
Total                                     24,500   Total                         24,500

Page 13: Variance Estimation

13

Distribution of Respondents (by institution and faculty strata)

Institution stratum                  Respondents   Faculty stratum          Respondents
Public doctor’s                            7,460   Non-Hispanic Black             2,060
Public master’s                            2,680   Hispanic                       1,700
Public bachelor’s                            450   Asian                          1,610
Public associate’s                         6,410   Other full-time female         5,850
Public other                                 110   Other full-time male           8,500
Private not-for-profit doctor’s            3,160   Other part-time                6,380
Private not-for-profit master’s            2,270
Private not-for-profit bachelor’s          2,520
Private not-for-profit associate’s           190
Private not-for-profit other                 850
Total                                     26,110   Total                         26,110

Page 14: Variance Estimation

14

Variance Estimation Methodology (NSOPF:04)

Used methodology developed by Kaufman (2004) to create bootstrap replicate weights:

Reflected finite population correction adjustment for the first stage (institution) selection.

Second stage (faculty selection) finite population correction factors were close to one and not reflected.

Produced 65 bootstrap replicates to meet Data Analysis System (DAS) requirements of NCES.

Calculated standard error of several statistics using the above bootstrap replicates and Taylor linearization method in SUDAAN.

Page 15: Variance Estimation

15

Comparisons of Variance Estimates: SE of Percent Teaching as Principal Activity by Rank (Bootstrap vs. Linearization)

[Figure. Vertical axis: Standard Error (0.0 to 1.0). Categories: Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title.]

Page 16: Variance Estimation

16

Comparisons of Variance Estimates: SE of Percent Research as Principal Activity by Rank (Bootstrap vs. Linearization)

[Figure. Vertical axis: Standard Error (0.0 to 1.0). Categories: Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title.]

Page 17: Variance Estimation

17

Comparisons of Variance Estimates: SE of Percent Administration as Principal Activity by Rank (Bootstrap vs. Linearization)

[Figure. Vertical axis: Standard Error (0.0 to 1.0). Categories: Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title.]

Page 18: Variance Estimation

18

Comparisons of Variance Estimates: SE of Percent Full-time by Institution Type (Bootstrap vs. Linearization)

[Figure. Vertical axis: Standard Error (0.0 to 20.0). Categories: Public Ph.D., Public MS, Public BA, Public Asso., Public Other, Private Ph.D., Private MS, Private BA, Private Asso., Private Other.]

Page 19: Variance Estimation

19

Revised Variance Estimation Methodology (NSOPF:04)

Used methodology developed by Kaufman (2004) to create 200 bootstrap replicate weights.

Used 10, 11, …, 200 replicates to estimate relative standard error (RSE) of different statistics.

Repeated the above using 9 random permutations of replicates to estimate RSE of the same statistics.

Used Taylor linearization to estimate relative standard error of estimates via SUDAAN.
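The replicate-count experiment described on this slide can be sketched as follows: compute the RSE from the first R replicates for R = 10, …, 200, then repeat after randomly reordering the replicates. The replicate estimates here are simulated from a normal distribution purely for illustration; the function name `rse_by_replicate_count` and all values are hypothetical, not results from NSOPF:04.

```python
import numpy as np

def rse_by_replicate_count(theta_full, theta_reps, r_min=10):
    """RSE computed from the first R replicates, for R = r_min, ..., len(theta_reps)."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    sq_dev = (theta_reps - theta_full) ** 2
    counts = np.arange(1, sq_dev.size + 1)
    running_var = np.cumsum(sq_dev) / counts      # variance using the first R replicates
    rse = np.sqrt(running_var) / abs(theta_full)
    return counts[r_min - 1:], rse[r_min - 1:]

rng = np.random.default_rng(4)
theta_full = 25.0                                   # hypothetical full-sample estimate
theta_reps = rng.normal(theta_full, 0.5, size=200)  # simulated replicate estimates

# One curve for the original replicate order, plus 9 random permutations.
counts, rse = rse_by_replicate_count(theta_full, theta_reps)
curves = [rse]
for _ in range(9):
    shuffled = rng.permutation(theta_reps)
    curves.append(rse_by_replicate_count(theta_full, shuffled)[1])
curves = np.vstack(curves)

# Width of the band of curves at each replicate count.
spread = curves.max(axis=0) - curves.min(axis=0)
```

All curves meet at R = 200 because every permutation uses the same 200 replicates, so the width of the band at smaller R shows how much the RSE depends on which replicates happen to be included.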

Page 20: Variance Estimation

20

RSE of Percent Asians by Number of Replicates

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.60% to 1.00%).]

Page 21: Variance Estimation

21

RSE of Percent Asians by Number of Replicates (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.40% to 3.40%).]

Page 22: Variance Estimation

22

RSE of Percent Age < 35 by Number of Replicates

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (2.00% to 2.60%).]

Page 23: Variance Estimation

23

RSE of Percent Age < 35 by Number of Replicates (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (1.2% to 3.6%).]

Page 24: Variance Estimation

24

RSE of Percent Citizen by Number of Replicates

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.15% to 0.25%).]

Page 25: Variance Estimation

25

RSE of Percent Citizen by Number of Replicates (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.10% to 0.25%).]

Page 26: Variance Estimation

26

RSE of Percent Full-time by Number of Replicates

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.0000001% to 0.0000008%).]

Page 27: Variance Estimation

27

RSE of Percent Full-time by Number of Replicates (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.00% to 1.20%).]

Page 28: Variance Estimation

28

RSE of Percent Master’s by Number of Replicates

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.80% to 1.60%).]

Page 29: Variance Estimation

29

RSE of Percent Master’s by Number of Replicates (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.80% to 1.80%).]

Page 30: Variance Estimation

30

RSE of Percent Teaching as Principal Activity by Number of Replicates

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.35% to 0.55%).]

Page 31: Variance Estimation

31

RSE of Percent Teaching as Principal Activity by Number of Replicates (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.25% to 0.55%).]

Page 32: Variance Estimation

32

RSE of Mean Income by Number of Replicates

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.30% to 0.60%).]

Page 33: Variance Estimation

33

RSE of Mean Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.20% to 1.20%).]

Page 34: Variance Estimation

34

RSE of Median Income by Number of Replicates

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (1.00% to 1.50%).]

Page 35: Variance Estimation

35

RSE of Median Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.10% to 2.00%).]

Page 36: Variance Estimation

36

RSE of Regression Intercept: Income = Hours + Race + Hours × Race

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (0.60% to 1.00%).]

Page 37: Variance Estimation

37

RSE of Regression Intercept: Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Replicates (0 to 200). Vertical axis: Relative Standard Error (0.60% to 1.10%).]

Page 38: Variance Estimation

38

RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (6.00% to 10.00%).]

Page 39: Variance Estimation

39

RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Replicates (0 to 200). Vertical axis: Relative Standard Error (5.0% to 13.0%).]

Page 40: Variance Estimation

40

RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (18.0% to 28.0%).]

Page 41: Variance Estimation

41

RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Replicates (0 to 200). Vertical axis: Relative Standard Error (15.0% to 30.0%).]

Page 42: Variance Estimation

42

RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race

[Figure. Horizontal axis: Number of Bootstrap Replicates (0 to 200). Vertical axis: Relative Standard Error (30.0% to 60.0%).]

Page 43: Variance Estimation

43

RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

[Figure. Horizontal axis: Number of Replicates (0 to 200). Vertical axis: Relative Standard Error (20.0% to 110.0%).]

Page 44: Variance Estimation

44

Conclusions (Rough & Interim)

Complex statistics do require more replicates for stable variance estimation

It seems that:

64 replicates might be inadequate

200 replicates seem to be overkill

Somewhere between 100 and 200 replicates might be sufficient
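A rough textbook heuristic, not a result from this study, gives a feel for the diminishing returns. If the R replicate estimates were independent and normally distributed, the replicate variance would be a scaled chi-square with R degrees of freedom, so the coefficient of variation of the resulting standard error would be about 1/sqrt(2R). The sketch below evaluates that approximation; the function name `approx_cv_of_se` is hypothetical, and the normality assumption is likely to be optimistic for complex statistics such as medians or interaction slopes.

```python
import math

def approx_cv_of_se(num_replicates):
    """Approximate CV of a replicate-based SE, assuming independent normal replicate estimates."""
    return 1.0 / math.sqrt(2.0 * num_replicates)

for r in (64, 100, 200):
    # Print the replicate count and the approximate CV of the SE.
    print(r, round(approx_cv_of_se(r), 3))
```

Under this approximation the instability of the standard error falls only with the square root of R, so going from 100 to 200 replicates buys less than going from 64 to 100 did in proportion to the added computation.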