Using unequal probability sampling to limit anticipated variances of regression estimators
Anders Holmberg ICES III 07
Anders Holmberg
Department of Research & Development
Statistics Sweden
SE-701 89 Örebro
Sweden
Tel: +46 19 176905
Fax: +46 19 177084
E-mail: [email protected]
Outline
• Background
• The problem
• Some theory
• Auxiliary Information
• An application in a business survey
• Comparisons and Results
• Comments
Background(1)
• Prepare the sampling frame
• Derive and analyse diagnostic data
• Decide on a sampling design, sampling scheme and estimator
• Launch the survey
Background(2)
• Prerequisites
  – A well-defined business population
  – Several parameters of interest
  – Design-based inference
  – An up-to-date frame from the business register
  – Admin. data available as auxiliary information
  – Attempt to find the most efficient/(robust) design
Background(6)
(t-2)
(1) Number of employees (u1)
(2) Turnover (u2)
(3) Personnel expenses (u3)
(4) Investments (u4)
(t-1)
(1) Number of employees (u1)
(2) Turnover (u2)
(3) Personnel expenses (u3)
(4) Investments (u4)
(t)
(1) Number of employees (y1)
(2) Turnover (y2)
(3) Personnel expenses (y3)
(4) Investments (y4)
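Optimal design in the single variable case
A design that minimizes
$ANV(\hat t_{y_q r}) = \sum_U (\pi_k^{-1} - 1)\,\sigma_{qk}^2$
is such that
$\pi_k \propto \sigma_{qk}$ (i.e. $\pi_k = \pi_{k(q\,opt)} = n\,\sigma_{qk} / \sum_U \sigma_{qk}$)
Minimum of $ANV(\hat t_{y_q r})$ is
$ANV_{\min}(\hat t_{y_q r}) = \frac{1}{n}\left(\sum_U \sigma_{qk}\right)^2 - \sum_U \sigma_{qk}^2$
Brewer, Hajek, Cassel et al., Rosén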
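The single-variable result can be checked numerically. The following is a minimal sketch, assuming hypothetical values for the model standard deviations (the list `sigma` and the sample size `n` are illustrative, not taken from the survey) and assuming that no unit ends up with an inclusion probability above 1.

```python
def optimal_inclusion_probs(sigma, n):
    """pi_k = n * sigma_k / sum_U sigma_k (assumes no pi_k exceeds 1)."""
    total = sum(sigma)
    pi = [n * s / total for s in sigma]
    if max(pi) > 1.0:
        raise ValueError("some pi_k > 1; such units need take-all treatment")
    return pi


def anticipated_variance(pi, sigma):
    """ANV = sum_U (1/pi_k - 1) * sigma_k**2."""
    return sum((1.0 / p - 1.0) * s * s for p, s in zip(pi, sigma))


# Hypothetical sigma_qk values for one study variable (assumption).
sigma = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
n = 3

pi_opt = optimal_inclusion_probs(sigma, n)
anv_opt = anticipated_variance(pi_opt, sigma)

# Closed-form minimum: (sum sigma)^2 / n - sum sigma^2.
anv_closed = sum(sigma) ** 2 / n - sum(s * s for s in sigma)

# Equal-probability design with the same expected sample size, for comparison.
pi_equal = [n / len(sigma)] * len(sigma)
anv_equal = anticipated_variance(pi_equal, sigma)
```

Under these assumed values the proportional-to-sigma design reproduces the closed-form minimum and has a smaller anticipated variance than the equal-probability design.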
Population plot
[Figure: population scatter plot; axis label "nypers2"]
E.g. if: $\sigma_{qk}^2 \propto (u_{qk}^{\gamma})^2$, then $\pi_k \propto u_{qk}^{\gamma}$
’Guesstimate’ $\gamma$ to find size measures: $\tilde\pi_k \propto \tilde\sigma_{qk} \propto u_{qk}^{\tilde\gamma}$
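A guesstimated gamma turns an auxiliary variable into size measures and then into inclusion probabilities. The sketch below is one way to do this; the function name `pips_from_size` and the take-all step for units whose probability would exceed 1 are assumptions added for illustration and are not described on the slide.

```python
def pips_from_size(u, gamma, n):
    """Inclusion probabilities proportional to u_k**gamma, summing to n.

    Assumption: units whose probability would reach 1 become take-all
    (pi_k = 1) and the rest of the sample is re-allocated among the others.
    """
    if not 0 < n < len(u):
        raise ValueError("need 0 < n < N")
    size = [x ** gamma for x in u]
    pi = [0.0] * len(u)
    free = set(range(len(u)))
    remaining = n
    while True:
        total = sum(size[k] for k in free)
        over = [k for k in free if remaining * size[k] / total >= 1.0]
        if not over:
            break
        for k in over:
            pi[k] = 1.0
            free.remove(k)
        remaining = n - (len(u) - len(free))
    for k in free:
        pi[k] = remaining * size[k] / total
    return pi


# Hypothetical auxiliary values; the last unit is very large.
u = [4.0, 9.0, 16.0, 25.0, 10000.0]
pi = pips_from_size(u, gamma=0.5, n=2)
```

With gamma = 0.5 the size measure is the square root of the auxiliary variable, so the very large unit is selected with certainty and the remaining expected sample size of 1 is spread over the other four units.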
The multivariate case?
(t-2)
(1) Number of employees (u1)
(2) Turnover (u2)
(3) Personnel expenses (u3)
(4) Investments (u4)
(t-1)
(1) Number of employees (u1)
(2) Turnover (u2)
(3) Personnel expenses (u3)
(4) Investments (u4)
(t)
(1) Number of employees (y1)
(2) Turnover (y2)
(3) Personnel expenses (y3)
(4) Investments (y4)
The multivariate case
The least we should do is to analyse the various designs’ possible effects on different estimators, before we make the design choice.
Derive inclusion probabilities as a function of standardized (univariate) size measures
Maximal Brewer selection
The multivariate case
There is no evident criterion of optimality, but some are better than others.
Minimize
$f\big(g(ANV(\hat t_{y_1})), \ldots, g(ANV(\hat t_{y_Q}))\big)$
under the restrictions
$g(ANV(\hat t_{y_q})) \le v_q \quad (q = 1, \ldots, Q)$
Try to find a design that in some sense is optimal for all important parameters?
Scale effects are neutralized: the relations between the $ANV_q$ values and the corresponding single-parameter minimum values (the Brewer selection) are used.
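The multivariate case: some optimisation approaches
Minimizing a weighted sum of relative efficiency losses:
$ANOREL = \sum_{q=1}^{Q} H_q \, \frac{ANV_{p_i}(\hat t_{y_q})}{ANV_{\min}(\hat t_{y_q})}$
is minimized when
$\pi_k \propto \left[ \sum_{q=1}^{Q} H_q \, \frac{\tilde\sigma_{qk}^2}{\sum_U (\tilde\pi_{k(q\,opt)}^{-1} - 1)\,\tilde\sigma_{qk}^2} \right]^{1/2}$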
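The ANOREL-minimizing probabilities can be computed directly from the single-variable optima. The sketch below assumes two study variables with hypothetical sigma values and equal importance weights H; all names and numbers are illustrative, and the sketch assumes no inclusion probability exceeds 1.

```python
from math import sqrt


def anv(pi, sigma):
    """Anticipated variance: sum_U (1/pi_k - 1) * sigma_k**2."""
    return sum((1.0 / p - 1.0) * s * s for p, s in zip(pi, sigma))


def prop_to(size, n):
    """Probabilities proportional to size, summing to n."""
    total = sum(size)
    return [n * s / total for s in size]


def anorel(pi, sigmas, anv_min, H):
    """Weighted sum of ANV ratios relative to each single-variable minimum."""
    return sum(h * anv(pi, s) / m for h, s, m in zip(H, sigmas, anv_min))


# Hypothetical sigma_qk for Q = 2 variables and N = 5 units (assumption).
sigmas = [[1.0, 2.0, 3.0, 4.0, 5.0],
          [5.0, 1.0, 4.0, 2.0, 3.0]]
H = [0.5, 0.5]
n = 2
N = len(sigmas[0])

# Single-variable optimal designs and their minimum anticipated variances.
single = [prop_to(s, n) for s in sigmas]
anv_min = [anv(p, s) for p, s in zip(single, sigmas)]

# Combined size measure: sqrt( sum_q H_q * sigma_qk^2 / ANV_min_q ).
size = [sqrt(sum(h * s[k] ** 2 / m for h, s, m in zip(H, sigmas, anv_min)))
        for k in range(N)]
pi_anorel = prop_to(size, n)

anorel_combined = anorel(pi_anorel, sigmas, anv_min, H)
anorel_single = [anorel(p, sigmas, anv_min, H) for p in single]
```

In this toy setup the combined design gives a lower ANOREL than either single-variable optimum, at the price of a small loss for each individual variable.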
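The multivariate case: some optimisation approaches
If we want to put restrictions on certain parameters, e.g.
$\frac{ANV_{p_i}(\hat t_{y_q r})}{ANV_{\min}(\hat t_{y_q r})} \le v_q, \quad q = 1, \ldots, Q$
Optimization model:
$\min f(\pi) = \sum_{q=1}^{Q} H_q \, \frac{\sum_U (\pi_k^{-1} - 1)\,\sigma_{qk}^2}{\sum_U (\pi_{k(q\,opt)}^{-1} - 1)\,\sigma_{qk}^2}$
subject to
$0 < \pi_k \le 1, \quad k = 1, \ldots, N$
$g_0(\pi) = \sum_U \pi_k = n$
$g_q(\pi) = \frac{\sum_U (\pi_k^{-1} - 1)\,\sigma_{qk}^2}{\sum_U (\pi_{k(q\,opt)}^{-1} - 1)\,\sigma_{qk}^2} \le v_q, \quad q = 1, \ldots, Q$
Then a design that minimizes ANOREL can be obtained through non-linear programming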
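The structure of the constrained solution can be seen from the Lagrangian of the optimization model. The derivation below is a sketch under the assumption that the bounds $0 < \pi_k \le 1$ are not active; the multipliers $\lambda_0, \lambda_q$ and the shorthand $A_q = \sum_U (\pi_{k(q\,opt)}^{-1} - 1)\,\sigma_{qk}^2$ are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
L(\pi,\lambda) &= \sum_{q=1}^{Q} H_q\, g_q(\pi)
  + \lambda_0\Big(\sum_U \pi_k - n\Big)
  + \sum_{q=1}^{Q} \lambda_q\big(g_q(\pi) - v_q\big),
  \qquad g_q(\pi) = \frac{\sum_U (\pi_k^{-1}-1)\,\sigma_{qk}^2}{A_q} \\
\frac{\partial L}{\partial \pi_k} &= -\frac{1}{\pi_k^{2}}
  \sum_{q=1}^{Q} (H_q+\lambda_q)\,\frac{\sigma_{qk}^2}{A_q} + \lambda_0 = 0 \\
\Rightarrow\quad \pi_k &\propto
  \Big[\sum_{q=1}^{Q} (H_q+\lambda_q)\,\frac{\sigma_{qk}^2}{A_q}\Big]^{1/2},
  \qquad \lambda_q \ge 0,\quad \lambda_q\,\big(g_q(\pi) - v_q\big) = 0
\end{aligned}
```

Read this way, a binding restriction on variable q acts like an increase of its importance weight from $H_q$ to $H_q + \lambda_q$; with all $\lambda_q = 0$ the unrestricted ANOREL solution is recovered.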
An Application
The 4 variables studied for three branches (strata):
SNI25: Manufacturers of food products & beverages, N = 749
SNI28: Manufacturers of metal goods (except machines and devices), N = 2292
SNI33: Manufacturers of optical instruments, N = 323
$E(n_s) = 290$
$E(n_s) = 112$
$E(n_s) = 64$
Analysis and comparisons were made on admin. data from previous reference times: plots, estimated correlations and estimated $\gamma$ coefficients.
An Application
• A common ratio model pictures the relationships reasonably well if the corresponding older variable is used as the regressor variable (strongest pairwise correlation over branches and time, although doubts exist for the investment variable).
• Estimates of the $\gamma$ coefficient are sensitive.
• Estimates ranged between 0.2 and 0.9 and sometimes deteriorated!?
• For investments, very weak or no heteroscedasticity.
• For the other three variables, $\tilde\gamma = 0.5$ “cannot be ruled out” and is simple as a guesstimate.
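The slides do not say how the gamma coefficients were estimated. One simple diagnostic, shown here purely as an illustrative assumption, fits the common ratio model and regresses the log absolute residuals on the log of the auxiliary variable; the slope is a rough estimate of gamma. The function name and the toy data are hypothetical.

```python
from math import log


def estimate_gamma(y, u):
    """Slope of log|residual| on log(u) under the ratio model y = R*u + e.

    Assumes u > 0 and at least two distinct u values with non-zero residuals.
    """
    R = sum(y) / sum(u)                      # common ratio estimate
    pts = [(log(uk), log(abs(yk - R * uk)))  # skip exact zero residuals
           for yk, uk in zip(y, u) if yk != R * uk]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    sxx = sum((a - mx) ** 2 for a, _ in pts)
    return sxy / sxx


# Toy data built so that |residual| = sqrt(u), i.e. gamma = 0.5 (assumption).
u = [4.0, 4.0, 9.0, 9.0, 16.0, 16.0, 25.0, 25.0]
y = [10.0, 6.0, 21.0, 15.0, 36.0, 28.0, 55.0, 45.0]
gamma_hat = estimate_gamma(y, u)
```

On real admin. data a slope of this kind is driven heavily by a few large units, which is in line with the remark above that the estimates are sensitive.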
An Application
Guesstimated $\tilde\gamma$ by study variable and stratum; size measures follow from $\tilde\pi_{k(q\,opt)} \propto u_{qk}^{\tilde\gamma}$.
                                          Strata
Study variable  Auxiliary/size variable   Food  Metal  Optic
employees       $u_{1k}^{\tilde\gamma}$   0.5   0.5    0.5
turnover        $u_{2k}^{\tilde\gamma}$   0.5   0.5    0.5
P-costs         $u_{3k}^{\tilde\gamma}$   0.5   0.5    0.5
investment      $u_{4k}^{\tilde\gamma}$   0.2   0      0.2
An Application
• Computations of inclusion probabilities and the anticipated variances using the Brewer selection (Maximal Brewer selection)
• Computation of the optimisation based approaches, with the extra condition that
$\frac{ANV_{p_i}(\hat t_{y_q})}{ANV_{\min}(\hat t_{y_q})} \le 1.15$
Food & Beverages: $100\,[(ANV_{p_i}(\hat t_{y_q}) / ANV_{\min}(\hat t_{y_q})) - 1]$
                                             Study variables
Design  Considered Design                    Employees  Turnover  P-cost  Invest  Mean
p1      Opt. on Empl                                 0      24.3     3.5    24.4  13.0
p2      Opt. on Turn                              24.5         0    19.1    74.4  29.5
p3      Opt. on P-cost                             3.3      16.4       0    43.0  13.0
p4      Opt. on Invest                            34.4      91.7    45.9       0  43.0
p5      Minimizing Anorel                          2.8      13.9     2.9    19.5   9.8
p6      Minimizing Anorel with restrictions        5.7      15.0     6.5    15.0  10.6
Optical Instruments: $100\,[(ANV_{p_i}(\hat t_{y_q}) / ANV_{\min}(\hat t_{y_q})) - 1]$
                                             Study variables
Design  Considered Design                    Employees  Turnover  P-cost  Invest  Mean
p1      Opt. on Empl                                 0      13.0     3.4    30.6  11.8
p2      Opt. on Turn                              11.0         0     8.2    51.6  17.7
p3      Opt. on P-cost                             3.0       8.3       0    37.1  12.1
p4      Opt. on Invest                            44.7      73.3    53.4       0  42.8
p5      Minimizing Anorel                          2.9       7.7     3.1    20.1   8.5
p6      Minimizing Anorel with restrictions        4.5      10.9     5.3    15.0   8.9
Metal goods: $100\,[(ANV_{p_i}(\hat t_{y_q}) / ANV_{\min}(\hat t_{y_q})) - 1]$
                                             Study variables
Design  Considered Design                    Employees  Turnover  P-cost  Invest  Mean
p1      Opt. on Empl                                 0       7.3     2.0    21.0   7.6
p2      Opt. on Turn                               6.1         0     4.3    33.0  10.9
p3      Opt. on P-cost                             1.8       5.0       0    24.4   7.8
p4      Opt. on Invest                            31.6      51.2    36.1       0  29.7
p5      Minimizing Anorel                          1.7       4.9     1.9    14.0   5.6
p6      Minimizing Anorel with restrictions        3.4       7.0     4.0    10.0   6.1
Maximal Brewer selection satisfies the criteria, but with a 25% larger sample: $E(n_s) = 364.3$
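The slides do not define the maximal Brewer selection. The sketch below assumes it means giving each unit the largest of its single-variable optimal inclusion probabilities; under that assumption every anticipated variance is at or below its single-variable minimum, but the expected sample size grows. The sigma values are hypothetical.

```python
def anv(pi, sigma):
    """Anticipated variance: sum_U (1/pi_k - 1) * sigma_k**2."""
    return sum((1.0 / p - 1.0) * s * s for p, s in zip(pi, sigma))


def prop_to(size, n):
    """Probabilities proportional to size, summing to n."""
    total = sum(size)
    return [n * s / total for s in size]


# Hypothetical sigma_qk for Q = 2 variables and N = 5 units (assumption).
sigmas = [[1.0, 2.0, 3.0, 4.0, 5.0],
          [5.0, 1.0, 4.0, 2.0, 3.0]]
n = 2
N = len(sigmas[0])

# Single-variable (Brewer) optima, each with expected sample size n.
single = [prop_to(s, n) for s in sigmas]

# Assumed definition: unit-wise maximum over the single-variable optima.
pi_max = [max(p[k] for p in single) for k in range(N)]

expected_n = sum(pi_max)  # expected sample size of the combined design
anv_max = [anv(pi_max, s) for s in sigmas]
anv_min = [anv(p, s) for p, s in zip(single, sigmas)]
```

In this toy example the expected sample size rises from 2 to about 2.67, which illustrates the same trade-off as the larger sample reported above.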
Does it work on the estimator variances?
• In most cases we will never know
• However, for these variables we can check against admin. data (coming in 1.5 years later)
• Using
$100\left(\frac{V_{T,\,p_i}^{(PO)}(\hat t_{y_q r})}{V_q^{*}(\hat t_{y_q r})} - 1\right)$
• Where
$V_T^{(PO)}(\hat t_{y_q r}) = \sum_U (\pi_k^{-1} - 1)\,E_{qk}^2$
is the Taylor expanded variance of the ratio estimator under Poisson sampling
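The check against later admin. data can be sketched as follows. The slide does not define E, so the code assumes it is the population residual from the ratio model, y_k - R u_k with R the ratio of the population totals; the data and the two designs compared are hypothetical.

```python
def taylor_variance_poisson(y, u, pi):
    """sum_U (1/pi_k - 1) * E_k**2, assuming E_k = y_k - R*u_k, R = t_y / t_u."""
    R = sum(y) / sum(u)
    return sum((1.0 / p - 1.0) * (yk - R * uk) ** 2
               for yk, uk, p in zip(y, u, pi))


def relative_loss(v_design, v_best):
    """Percentage excess of a design's variance over the smallest variance."""
    return 100.0 * (v_design / v_best - 1.0)


# Toy population (assumption): residuals are +/- sqrt(u).
u = [4.0, 4.0, 9.0, 9.0, 16.0, 16.0, 25.0, 25.0]
y = [10.0, 6.0, 21.0, 15.0, 36.0, 28.0, 55.0, 45.0]
n = 4

# Design A: equal probabilities. Design B: proportional to sqrt(u).
pi_equal = [n / len(u)] * len(u)
size = [x ** 0.5 for x in u]
pi_sqrt = [n * s / sum(size) for s in size]

v_equal = taylor_variance_poisson(y, u, pi_equal)
v_sqrt = taylor_variance_poisson(y, u, pi_sqrt)
loss_equal = relative_loss(v_equal, min(v_equal, v_sqrt))
```

Here the design with probabilities proportional to the square root of u has the smaller Taylor variance, so the equal-probability design shows a positive percentage loss, which is the kind of figure reported in the table of ratios for Metal goods.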
Metal goods: Ratios of the Taylor expanded variances to the smallest variance of each estimator (%)
                                             Study variables
Design  Considered Design                    Employees  Turnover  P-cost  Invest  Mean loss
p1      Opt. on Empl                                 0        19       9      72         25
p2      Opt. on Turn                                 9         0      36      67         28
p3      Opt. on P-cost                               8         6       0      68         21
p4      Opt. on Invest                             146       146     222      24        134
p5      Minimizing Anorel                            2         2      31       0          9
p6      Minimizing Anorel with restrictions         10         8      45       8         18
Summary
• Carefully choosing appropriate size measures to get $\pi_k \propto \tilde\sigma_{qk}$ limits anticipated variances of regression estimators, and Brewer’s results can be extended to a multivariate situation.
• If there is a multivariate issue and you intend to use auxiliary information in the design, diagnostic computations are important.
• With an optimization approach we know what we are aiming to minimize, and with the non-linear programming approach some practical trouble in designing a pps-sample is avoided.