Optimal Number of Replicates for Variance Estimation
Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy
Third International Conference on Establishment Surveys (ICES-III)
June 21, 2007
2
Variance Estimation
Two general approaches to variance estimation with weighted data obtained under complex designs:
Linearization
Replication
3
Linearization
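Approximate a complex statistic $\hat{\theta}$ in terms of $L$ linear statistics $\hat{\theta}_1, \ldots, \hat{\theta}_L$
Estimate the variance of $\hat{\theta}$ from:
$v(\hat{\theta}) \approx E\left[\sum_{l=1}^{L}\left(\hat{\theta}_l - E(\hat{\theta}_l)\right)\right]^2$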
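As a concrete case of the linearization approach, the sketch below treats a weighted mean $\sum w y / \sum w$ as a ratio of two linear statistics, forms its linearized values, and applies the common with-replacement approximation for first-stage units (PSUs) within strata, in the spirit of the Taylor series option in SUDAAN. The function name and inputs are hypothetical; this is an illustration, not the NSOPF:04 production computation.

```python
import numpy as np

def linearized_variance_of_mean(y, w, strata, psu):
    """Hypothetical helper: Taylor-linearization variance of the weighted mean
    sum(w*y)/sum(w), treating PSUs as sampled with replacement within strata."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    strata, psu = np.asarray(strata), np.asarray(psu)
    total_w = w.sum()
    mean = (w * y).sum() / total_w
    # Linearized (influence) values for a ratio estimator with denominator sum(w)
    z = w * (y - mean) / total_w
    variance = 0.0
    for h in np.unique(strata):
        in_h = strata == h
        # Aggregate linearized values to PSU totals within stratum h
        _, idx = np.unique(psu[in_h], return_inverse=True)
        psu_totals = np.bincount(idx, weights=z[in_h])
        n_h = psu_totals.size
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
        dev = psu_totals - psu_totals.mean()
        # Between-PSU variance of the stratum total, with-replacement form
        variance += n_h / (n_h - 1) * (dev ** 2).sum()
    return mean, variance
```

For example, with two strata of two single-unit PSUs each, unit weights, and y = 1, 2, 3, 4, it returns a mean of 2.5 and a variance of 0.125.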
4
Replication
Partition the full sample into R subsamples (replicates)
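Obtain separate estimates of $\theta$ from each replicate: $\hat{\theta}_r,\ r = 1, \ldots, R$
Estimate the variance of $\hat{\theta}$ by:
$v(\hat{\theta}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\theta}_r - \hat{\theta}\right)^2$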
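The replicate variance estimator $v(\hat{\theta}) = \frac{1}{R}\sum_{r=1}^{R}(\hat{\theta}_r - \hat{\theta})^2$ translates directly into code. This sketch assumes the replicate estimates $\hat{\theta}_r$ have already been computed from the replicate weights; some replication methods (jackknife variants, or bootstraps with rescaled weights) use a different multiplier than $1/R$, so the constant should be checked against the method actually used. The function name is hypothetical.

```python
import numpy as np

def replicate_variance(theta_full, theta_reps):
    """Hypothetical helper: v(theta_hat) = (1/R) * sum_r (theta_r - theta_hat)^2."""
    reps = np.asarray(theta_reps, dtype=float)
    # Mean squared deviation of replicate estimates around the full-sample estimate
    return float(np.mean((reps - theta_full) ** 2))
```

For instance, a full-sample estimate of 10 with replicate estimates 9, 11, 10, 12 gives a variance of 1.5 (squared deviations 1, 1, 0, 4 averaged over R = 4).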
5
How Many Replicates?
Recommendations regarding the optimal number of replicates for variance estimation are at variance:
The computational resources required can be substantial
For certain statistics a larger number of replicates might be needed to produce stable estimates of variance
What is the point of diminishing returns?
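One rough way to frame the point of diminishing returns is an idealized calculation, not a result of this study. Suppose the $R$ replicate deviations behave like independent normal draws, so that $R\,v(\hat{\theta})/V(\hat{\theta})$ is approximately $\chi^2_R$. Then the coefficient of variation (CV) of the variance estimate is about $\sqrt{2/R}$, and by the delta method the CV of the standard error is about half of that, $\sqrt{1/(2R)}$. Real replicate estimates are not independent, and skewed statistics such as medians or regression coefficients typically behave worse. The function name is hypothetical.

```python
import math

def approx_cv_of_se(n_replicates: int) -> float:
    """Hypothetical helper: approximate CV of a replicate-based standard error,
    assuming R * v / V ~ chi-square with R degrees of freedom (idealized)."""
    cv_variance = math.sqrt(2.0 / n_replicates)  # CV of the variance estimate
    return cv_variance / 2.0                     # delta method: CV(sqrt(v)) ~ CV(v) / 2

for r in (10, 64, 100, 200):
    print(f"R = {r:3d}: CV(SE) ~ {approx_cv_of_se(r):.1%}")
```

Under this idealization, the CV of the standard error falls from about 8.8% at R = 64 to 7.1% at R = 100 and 5.0% at R = 200, so each doubling of R buys less.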
6
Research Methodology
Drawing on two complex establishment surveys, this work presents empirical results on the number of bootstrap replicates needed for variance estimation:
National Study of Postsecondary Faculty (NSOPF:04)
National Postsecondary Student Aid Study (NPSAS:04)
7
General Design Specifications: National Study of Postsecondary Faculty (NSOPF:04)
Survey of about 35,000 faculty and instructional staff
Across a sample of 1,080 institutions
In the 50 States and the District of Columbia
8
Sampling Methodology
Institutions selected with probability proportional to a measure of size to over-represent:
Hispanic
Non-Hispanic Black
Asian and Pacific Islander
Full-time other female
Used RTI’s cost/variance optimization procedure for sample allocation
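As a generic illustration of probability-proportional-to-size (PPS) selection, the sketch below draws a systematic PPS sample of institutions from a list with positive size measures. It is not RTI's procedure: the slide does not describe how the size measure was built or how the cost/variance optimization allocated the sample. The function name and toy sizes are hypothetical. Units whose size exceeds the sampling interval would normally be taken with certainty first; the sketch simply rejects that case.

```python
import numpy as np

def systematic_pps(sizes, n, seed=None):
    """Hypothetical helper: systematic PPS selection of n units.
    Returns selected indices and their inclusion probabilities."""
    sizes = np.asarray(sizes, dtype=float)
    total = sizes.sum()
    interval = total / n
    if np.any(sizes > interval):
        raise ValueError("units exceed the sampling interval; select them as certainties first")
    rng = np.random.default_rng(seed)
    start = rng.uniform(0, interval)           # random start within the first interval
    points = start + interval * np.arange(n)   # equally spaced selection points
    cum = np.cumsum(sizes)
    # Unit i is selected when a point falls in (cum[i-1], cum[i]]
    selected = np.searchsorted(cum, points, side="left")
    pi = n * sizes / total                     # inclusion probability of each unit
    return selected, pi[selected]
```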
9
Institution Sampling Frame
Degree granting | Carnegie code | Public | Private | Total
Doctor’s | 15, 16, 52 | 190 | 110 | 300
Master’s | 21, 22 | 270 | 320 | 590
Bachelor’s | 31, 32, 33 | 90 | 480 | 570
Associate’s | 40, 60 | 1,030 | 150 | 1,180
Other/Unknown | 51, 53–59, unclassified | 110 | 620 | 730
Total | | 1,700 | 1,680 | 3,380
10
Institution Sample
Degree granting | Public | Private | Total
Doctor’s | 190 | 110 | 300
Master’s | 120 | 80 | 200
Bachelor’s | 30 | 130 | 160
Associate’s | 340 | 10 | 350
Other | 10 | 60 | 70
Total | 680 | 400 | 1,080
11
Expected Faculty Counts from Sampled Institutions, by Stratum
NSOPF stratum | Black | Hispanic | Asian | OFTF | OFTM | OPT | Total
Public, doctor’s | 10,720 | 8,660 | 32,630 | 58,870 | 115,830 | 51,110 | 277,820
Public, master’s | 4,670 | 3,150 | 4,950 | 14,120 | 20,440 | 22,130 | 69,460
Public, bachelor’s | 810 | 340 | 520 | 1,430 | 2,110 | 3,880 | 9,090
Public, associate’s | 12,250 | 9,240 | 6,100 | 21,100 | 21,700 | 82,570 | 152,960
Public, other | 150 | 80 | 170 | 290 | 630 | 830 | 2,150
Private not-for-profit, doctor’s | 6,060 | 3,760 | 13,110 | 21,490 | 47,370 | 33,280 | 125,080
Private not-for-profit, master’s | 1,110 | 950 | 1,020 | 4,930 | 7,020 | 12,530 | 27,550
Private not-for-profit, bachelor’s | 1,360 | 390 | 670 | 3,920 | 6,270 | 5,440 | 18,050
Private not-for-profit, associate’s | 20 | 20 | 40 | 180 | 450 | 480 | 1,180
Private not-for-profit, other | 330 | 120 | 250 | 790 | 1,680 | 2,700 | 5,880
Total | 37,480 | 26,710 | 59,460 | 127,120 | 223,500 | 214,940 | 689,210
(OFTF = other full-time female; OFTM = other full-time male; OPT = other part-time)
12
Target Number of Respondents by Institution and Faculty Strata
Institution stratum | Respondents
Public doctor’s | 6,200
Public master’s | 2,700
Public bachelor’s | 600
Public associate’s | 7,500
Public other | 500
Private not-for-profit doctor’s | 2,600
Private not-for-profit master’s | 1,900
Private not-for-profit bachelor’s | 1,700
Private not-for-profit associate’s | 100
Private not-for-profit other | 700
Total | 24,500
Faculty stratum | Respondents
Non-Hispanic Black | 1,600
Hispanic | 1,300
Asian | 900
Other full-time female | 4,600
Other full-time male | 8,300
Other part-time | 7,800
Total | 24,500
13
Distribution of Respondents (by institution and faculty strata)
Institution stratum | Respondents
Public doctor’s | 7,460
Public master’s | 2,680
Public bachelor’s | 450
Public associate’s | 6,410
Public other | 110
Private not-for-profit doctor’s | 3,160
Private not-for-profit master’s | 2,270
Private not-for-profit bachelor’s | 2,520
Private not-for-profit associate’s | 190
Private not-for-profit other | 850
Total | 26,110
Faculty stratum | Respondents
Non-Hispanic Black | 2,060
Hispanic | 1,700
Asian | 1,610
Other full-time female | 5,850
Other full-time male | 8,500
Other part-time | 6,380
Total | 26,110
14
Variance Estimation Methodology (NSOPF:04)
Used the methodology developed by Kaufman (2004) to create bootstrap replicate weights:
Incorporated a finite population correction (FPC) adjustment for the first-stage (institution) selection.
Did not incorporate the second-stage (faculty) FPC factors, which were close to one.
Produced 65 bootstrap replicates to meet the requirements of the NCES Data Analysis System (DAS).
Calculated standard errors of several statistics using both these bootstrap replicates and the Taylor linearization method in SUDAAN.
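Kaufman's (2004) bootstrap, with its first-stage FPC adjustment, is not reproduced here. As a simpler stand-in that shows how bootstrap replicate weights are built and then used, the sketch below implements the Rao-Wu rescaling bootstrap with $n_h - 1$ PSUs resampled with replacement in each stratum and no FPC, so it would tend to overstate variance where first-stage sampling fractions are large. Both function names are hypothetical.

```python
import numpy as np

def rao_wu_replicate_weights(weights, strata, psu, n_reps, seed=None):
    """Hypothetical helper: Rao-Wu rescaling bootstrap (m_h = n_h - 1, no FPC).
    Returns an (n_reps, n_units) array of replicate weights."""
    weights = np.asarray(weights, dtype=float)
    strata, psu = np.asarray(strata), np.asarray(psu)
    rng = np.random.default_rng(seed)
    rep_w = np.empty((n_reps, weights.size))
    for b in range(n_reps):
        factor = np.empty(weights.size)
        for h in np.unique(strata):
            in_h = strata == h
            _, idx = np.unique(psu[in_h], return_inverse=True)
            n_h = idx.max() + 1
            if n_h < 2:
                raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
            # Resample n_h - 1 PSUs with replacement; count how often each is drawn
            counts = np.bincount(rng.integers(0, n_h, size=n_h - 1), minlength=n_h)
            factor[in_h] = n_h / (n_h - 1) * counts[idx]
        rep_w[b] = weights * factor
    return rep_w

def bootstrap_variance_of_mean(y, weights, rep_w):
    """Weighted mean and its bootstrap variance, (1/R) * sum_r (theta_r - theta)^2."""
    y, weights = np.asarray(y, dtype=float), np.asarray(weights, dtype=float)
    full = np.sum(weights * y) / np.sum(weights)
    reps = rep_w @ y / rep_w.sum(axis=1)   # one weighted mean per replicate
    return full, float(np.mean((reps - full) ** 2))
```

The design choice worth noting is that replicate weights are computed once and stored, so any statistic (percentages, medians, regression coefficients) can be re-estimated per replicate with the same weights.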
15
Comparisons of Variance Estimates: SE of Percent Teaching as Principal Activity by Rank
(Bootstrap vs. Linearization)
[Figure: standard error (y-axis, 0.0 to 1.0) by rank (x-axis): Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title]
16
Comparisons of Variance Estimates: SE of Percent Research as Principal Activity by Rank
(Bootstrap vs. Linearization)
[Figure: standard error (y-axis, 0.0 to 1.0) by rank (x-axis): Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title]
17
Comparisons of Variance Estimates: SE of Percent Administration as Principal Activity by Rank
(Bootstrap vs. Linearization)
[Figure: standard error (y-axis, 0.0 to 1.0) by rank (x-axis): Total, Professor, Associate professor, Assistant professor, Instructor, Lecturer, Other title]
18
Comparisons of Variance Estimates: SE of Percent Full-time by Institution Type
(Bootstrap vs. Linearization)
[Figure: standard error (y-axis, 0.0 to 20.0) by institution type (x-axis): Public Ph.D., Public MS, Public BA, Public Asso., Public Other, Private Ph.D., Private MS, Private BA, Private Asso., Private Other]
19
Revised Variance Estimation Methodology (NSOPF:04)
Used the methodology developed by Kaufman (2004) to create 200 bootstrap replicate weights.
Used 10, 11, …, 200 replicates to estimate the relative standard error (RSE) of various statistics.
Repeated the above for 9 random permutations of the replicates to estimate the RSE of the same statistics.
Used Taylor linearization in SUDAAN to estimate the RSE of the same statistics.
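The replicate-count experiment can be sketched as follows, assuming the 200 replicate estimates of a statistic are already available and that "using R replicates" means using the first R replicates in a given order (the slides do not say how subsets were chosen). It applies the $1/R$ replicate variance formula, defines RSE as $\mathrm{SE}(\hat{\theta})/|\hat{\theta}|$, and repeats the calculation for the original order plus a number of random permutations. The function name is hypothetical.

```python
import numpy as np

def rse_by_num_replicates(theta_full, theta_reps, r_values, n_perms=9, seed=None):
    """Hypothetical helper: RSE of theta_full from the first R replicates, for each
    R in r_values, in the original order and in n_perms random orders.
    Returns an array of shape (1 + n_perms, len(r_values))."""
    reps = np.asarray(theta_reps, dtype=float)
    rng = np.random.default_rng(seed)
    orders = [np.arange(reps.size)] + [rng.permutation(reps.size) for _ in range(n_perms)]
    out = np.empty((len(orders), len(r_values)))
    for i, order in enumerate(orders):
        sq_dev = (reps[order] - theta_full) ** 2
        # Running mean of squared deviations = replicate variance using R replicates
        running_var = np.cumsum(sq_dev) / np.arange(1, reps.size + 1)
        for j, r in enumerate(r_values):
            out[i, j] = np.sqrt(running_var[r - 1]) / abs(theta_full)
    return out
```

Calling it with `r_values=range(10, 201)` yields one RSE curve per ordering, which is the kind of curve plotted against the number of replicates in the slides that follow.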
20
RSE of Percent Asians by Number of Replicates
[Figure: relative standard error (y-axis, 0.60% to 1.00%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
21
RSE of Percent Asians by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 0.40% to 3.40%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
22
RSE of Percent Age < 35 by Number of Replicates
[Figure: relative standard error (y-axis, 2.00% to 2.60%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
23
RSE of Percent Age < 35 by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 1.2% to 3.6%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
24
RSE of Percent Citizen by Number of Replicates
[Figure: relative standard error (y-axis, 0.15% to 0.25%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
25
RSE of Percent Citizen by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 0.10% to 0.25%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
26
RSE of Percent Full-time by Number of Replicates
[Figure: relative standard error (y-axis, 0.0000001% to 0.0000008%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
27
RSE of Percent Full-time by Number of Replicates
(Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 0.00% to 1.20%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
28
RSE of Percent Master’s by Number of Replicates
[Figure: relative standard error (y-axis, 0.80% to 1.60%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
29
RSE of Percent Master’s by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 0.80% to 1.80%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
30
RSE of Percent Teaching as Principal Activity by Number of Replicates
[Figure: relative standard error (y-axis, 0.35% to 0.55%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
31
RSE of Percent Teaching as Principal Activity by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 0.25% to 0.55%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
32
RSE of Mean Income by Number of Replicates
[Figure: relative standard error (y-axis, 0.30% to 0.60%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
33
RSE of Mean Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 0.20% to 1.20%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
34
RSE of Median Income by Number of Replicates
[Figure: relative standard error (y-axis, 1.00% to 1.50%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
35
RSE of Median Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 0.10% to 2.00%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
36
RSE of Regression Intercept: Income = Hours + Race + Hours × Race
[Figure: relative standard error (y-axis, 0.60% to 1.00%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
37
RSE of Regression Intercept: Income = Hours + Race + Hours × Race
(Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 0.60% to 1.10%) vs. number of replicates (x-axis, 0 to 200)]
38
RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race
[Figure: relative standard error (y-axis, 6.00% to 10.00%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
39
RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race
(Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 5.0% to 13.0%) vs. number of replicates (x-axis, 0 to 200)]
40
RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race
[Figure: relative standard error (y-axis, 18.0% to 28.0%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
41
RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race
(Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 15.0% to 30.0%) vs. number of replicates (x-axis, 0 to 200)]
42
RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race
[Figure: relative standard error (y-axis, 30.0% to 60.0%) vs. number of bootstrap replicates (x-axis, 0 to 200)]
43
RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race
(Taylor Linearization and Permutations of Replicates)
[Figure: relative standard error (y-axis, 20.0% to 110.0%) vs. number of replicates (x-axis, 0 to 200)]
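The regression results apply the same replication logic to model coefficients: refit the weighted regression with each set of replicate weights, then plug the replicate coefficients into the $1/R$ replicate variance formula. A minimal sketch for the model Income = Hours + Race + Hours × Race, assuming Race is coded as a 0/1 indicator (the slides do not say how it was coded) and using hypothetical function names; the replicate weights could come from any bootstrap scheme.

```python
import numpy as np

def wls_coefficients(income, hours, race, w):
    """Weighted least squares for income ~ 1 + hours + race + hours*race."""
    X = np.column_stack([np.ones_like(hours), hours, race, hours * race])
    sw = np.sqrt(w)
    # Scale rows by sqrt(weight) so ordinary least squares gives the WLS solution
    beta, *_ = np.linalg.lstsq(X * sw[:, None], income * sw, rcond=None)
    return beta

def replicate_rse_of_coefficients(income, hours, race, w, rep_w):
    """Hypothetical helper: full-sample coefficients and their replicate-based RSEs."""
    income, hours, race, w = (np.asarray(a, dtype=float) for a in (income, hours, race, w))
    full = wls_coefficients(income, hours, race, w)
    reps = np.array([wls_coefficients(income, hours, race, wr) for wr in rep_w])
    se = np.sqrt(np.mean((reps - full) ** 2, axis=0))   # (1/R) replicate variance
    return full, se / np.abs(full)
```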
44
Conclusions (Rough and Interim)
Complex statistics do require more replicates for stable variance estimation
It seems that:
64 replicates might be inadequate
200 replicates might be overkill
Somewhere between 100 and 200 replicates might be sufficient