Using unequal probability sampling to limit anticipated variances of regression estimators


Page 1

Using unequal probability sampling to limit anticipated variances of regression estimators

Anders Holmberg ICES III 07

Anders Holmberg

Department of Research & Development

Statistics Sweden

SE-701 89 Örebro

Sweden

Tel: +46 19 176905

Fax: +46 19 177084

E-mail: [email protected]

Page 2

Outline

• Background

• The problem

• Some theory

• Auxiliary information

• An application in a business survey

• Comparisons and results

• Comments

Page 3

Background (1)

• Prepare the sampling frame

• Derive and analyse diagnostic data

• Decide on a sampling design, sampling scheme and estimator

• Launch the survey

Page 4

Background (2)

• Prerequisites
– A well-defined business population
– Several parameters of interest
– Design-based inference
– An up-to-date frame from the business register
– Admin. data available as auxiliary information
– Attempt to find the most efficient (robust) design

Page 5

Background (6)

[Diagram: the same four variables at three reference times]

Auxiliary variables, reference times (t-2) and (t-1):

(1) Number of employees (u1)

(2) Turnover (u2)

(3) Personnel expenses (u3)

(4) Investments (u4)

Study variables, reference time (t):

(1) Number of employees (y1)

(2) Turnover (y2)

(3) Personnel expenses (y3)

(4) Investments (y4)

Page 6

Optimal design in the single variable case

A design that minimizes

$$ANV(\hat{t}_{y_q r}) = \sum_U (\pi_k^{-1} - 1)\,\sigma_{qk}^2$$

is such that

$$\pi_k = \pi_{k(q\,opt)} \propto \sigma_{qk}, \quad \text{i.e.} \quad \pi_{k(q\,opt)} = \frac{n\,\sigma_{qk}}{\sum_U \sigma_{qk}}.$$

The minimum of $ANV(\hat{t}_{y_q r})$ is

$$ANV_{min}(\hat{t}_{y_q r}) = \frac{1}{n}\Big(\sum_U \sigma_{qk}\Big)^2 - \sum_U \sigma_{qk}^2.$$

(Brewer, Hájek, Cassel et al., Rosén)
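As a numerical illustration of the single-variable result, the following minimal sketch computes the Brewer inclusion probabilities and checks that the resulting anticipated variance equals the stated minimum. The function names and the `sigma` values are hypothetical and chosen only for illustration; `sigma` stands for the $\sigma_{qk}$ of one study variable.

```python
def brewer_probabilities(sigma, n):
    """Inclusion probabilities proportional to sigma_k, summing to n."""
    total = sum(sigma)
    pi = [n * s / total for s in sigma]
    if max(pi) > 1.0:
        # Units with pi_k > 1 would need separate treatment; not handled in this sketch.
        raise ValueError("some pi_k exceed 1")
    return pi


def anticipated_variance(pi, sigma):
    """ANV = sum over U of (1/pi_k - 1) * sigma_k^2."""
    return sum((1.0 / p - 1.0) * s * s for p, s in zip(pi, sigma))


def minimum_anv(sigma, n):
    """Closed-form minimum: (sum sigma_k)^2 / n - sum sigma_k^2."""
    return sum(sigma) ** 2 / n - sum(s * s for s in sigma)


# Hypothetical population of five units and expected sample size n = 2.
sigma = [1.0, 2.0, 3.0, 4.0, 10.0]
n = 2

pi_opt = brewer_probabilities(sigma, n)
pi_equal = [n / len(sigma)] * len(sigma)

anv_opt = anticipated_variance(pi_opt, sigma)
anv_equal = anticipated_variance(pi_equal, sigma)
anv_min = minimum_anv(sigma, n)
```

In this toy population the equal-probability design gives a larger anticipated variance than the design with $\pi_k \propto \sigma_{qk}$, which attains the closed-form minimum.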

Page 7

Population plot

[Figure: population scatter plot; axis label "nypers2", axis ticks from 0 to 120 and from 0 to 100]

E.g. if $\sigma_{qk}^2 \propto (u_{qk}^{\gamma})^2$, then $\pi_k \propto u_{qk}^{\gamma}$.

'Guesstimate' $\tilde{\gamma}$ to find size measures: $\pi_k \propto \tilde{\sigma}_{qk} \propto u_{qk}^{\tilde{\gamma}}$.
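The step from a guesstimated exponent to size measures can be sketched as follows. This is only an illustration under stated assumptions: the auxiliary values `u` are invented, the exponent 0.5 is used as an example guesstimate, and units whose probability would exceed 1 are not handled.

```python
def size_measures(u, gamma):
    """Size measure u_k ** gamma for each unit (gamma is a guesstimate)."""
    return [x ** gamma for x in u]


def probabilities_from_sizes(sizes, n):
    """Inclusion probabilities proportional to the size measures, summing to n."""
    total = sum(sizes)
    return [n * z / total for z in sizes]


# Hypothetical auxiliary values for one variable, e.g. last year's turnover.
u = [4.0, 9.0, 16.0, 25.0, 100.0]
gamma_guess = 0.5
n = 2

z = size_measures(u, gamma_guess)
pi = probabilities_from_sizes(z, n)
```

With this exponent a unit that is 25 times larger on the auxiliary variable gets a probability only 5 times larger, which shows how the guesstimate dampens the spread of the inclusion probabilities.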

Page 8

The multivariate case?

[Diagram repeated from Page 5: auxiliary variables u1 to u4 at reference times (t-2) and (t-1), study variables y1 to y4 at reference time (t)]

Page 9

The multivariate case

The least we should do is to analyse the various designs’ possible effects on different estimators, before we make the design choice.

Derive inclusion probabilities as a function of standardized (univariate) size measures

Maximal Brewer selection
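The slide does not spell out the maximal Brewer selection. The sketch below assumes one plausible reading: for each unit, take the largest of the single-variable Brewer probabilities computed from the standardized size measures. That reading, the function names and the numbers are assumptions made for illustration only.

```python
def brewer_probabilities(sizes, n):
    """Single-variable probabilities proportional to size, summing to n."""
    total = sum(sizes)
    return [n * z / total for z in sizes]


def maximal_brewer(sizes_by_variable, n):
    """Assumed reading: pi_k is the maximum over variables of the univariate pi_k."""
    per_variable = [brewer_probabilities(sizes, n) for sizes in sizes_by_variable]
    return [max(unit_probs) for unit_probs in zip(*per_variable)]


# Two hypothetical size measures for the same four units.
sizes_by_variable = [
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
]
n = 2

pi_max = maximal_brewer(sizes_by_variable, n)
expected_sample_size = sum(pi_max)
```

Under this reading every study variable gets at least its own optimal probabilities, so the expected sample size can only grow beyond `n`; in the toy data it rises from 2 to 2.8.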

Page 10

The multivariate case

There is no evident criterion of optimality, but some are better than others.

Minimize

$$f\big(g_1(ANV(\hat{t}_{y_1 r})), \ldots, g_Q(ANV(\hat{t}_{y_Q r}))\big)$$

under the restrictions

$$g_q(ANV(\hat{t}_{y_q r})) \le v_q \quad (q = 1, \ldots, Q).$$

Try to find a design that is in some sense optimal for all important parameters?

Page 11

The multivariate case: some optimisation approaches

Minimizing a weighted sum of relative efficiency losses:

$$ANOREL = \sum_{q=1}^{Q} H_q \frac{ANV(\hat{t}_{y_q r p_i})}{ANV_{min}(\hat{t}_{y_q r})}$$

where $p_i$ denotes the considered design. ANOREL is minimized when

$$\pi_k \propto \left( \sum_{q=1}^{Q} H_q \frac{\tilde{\sigma}_{qk}^2}{\sum_U (\tilde{\pi}_{k(q\,opt)}^{-1} - 1)\,\tilde{\sigma}_{qk}^2} \right)^{1/2}.$$

Scale effects are neutralized: the ratios between the $ANV_q$'s and the corresponding single-parameter minimum values (the Brewer selection) are used.
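The closed-form ANOREL solution can be checked numerically with a short sketch. The weights `H`, the `sigmas` and the function names below are hypothetical, and the sketch does not handle the case where a resulting probability exceeds 1.

```python
import math


def anv(pi, sigma):
    """Anticipated variance: sum over U of (1/pi_k - 1) * sigma_k^2."""
    return sum((1.0 / p - 1.0) * s * s for p, s in zip(pi, sigma))


def min_anv(sigma, n):
    """Single-variable minimum, attained by the Brewer selection."""
    return sum(sigma) ** 2 / n - sum(s * s for s in sigma)


def anorel(pi, sigmas, H, n):
    """Weighted sum of ANV ratios relative to the single-variable minima."""
    return sum(h * anv(pi, s) / min_anv(s, n) for h, s in zip(H, sigmas))


def anorel_probabilities(sigmas, H, n):
    """pi_k proportional to sqrt(sum_q H_q * sigma_qk^2 / ANV_min,q), scaled to sum to n."""
    size = len(sigmas[0])
    raw = [
        math.sqrt(sum(h * s[k] ** 2 / min_anv(s, n) for h, s in zip(H, sigmas)))
        for k in range(size)
    ]
    total = sum(raw)
    return [n * r / total for r in raw]


# Two hypothetical study variables on four units, equal weights, n = 2.
sigmas = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
H = [0.5, 0.5]
n = 2

pi_anorel = anorel_probabilities(sigmas, H, n)
pi_brewer_1 = [n * s / sum(sigmas[0]) for s in sigmas[0]]  # optimal for variable 1 only
pi_equal = [n / 4.0] * 4

value_anorel = anorel(pi_anorel, sigmas, H, n)
value_brewer_1 = anorel(pi_brewer_1, sigmas, H, n)
value_equal = anorel(pi_equal, sigmas, H, n)
```

In this toy example the compromise design has a lower ANOREL than both the design tuned to the first variable and the equal-probability design.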

Page 12

The multivariate case: some optimisation approaches

If we want to put restrictions on certain parameters, e.g.

$$\frac{ANV(\hat{t}_{y_q r p_i})}{ANV_{min}(\hat{t}_{y_q r})} \le v_q, \quad q = 1, \ldots, Q,$$

then a design that minimizes ANOREL can be obtained through non-linear programming.

Optimization model:

$$\min_{\pi} f(\pi) = \sum_{q=1}^{Q} H_q \frac{\sum_U (\pi_k^{-1} - 1)\,\sigma_{qk}^2}{\sum_U (\pi_{k(q\,opt)}^{-1} - 1)\,\sigma_{qk}^2}$$

subject to

$$0 < \pi_k \le 1, \quad k = 1, \ldots, N,$$

$$g_0(\pi) = \sum_U \pi_k - n = 0,$$

$$g_q(\pi) = \frac{\sum_U (\pi_k^{-1} - 1)\,\sigma_{qk}^2}{\sum_U (\pi_{k(q\,opt)}^{-1} - 1)\,\sigma_{qk}^2} \le v_q, \quad q = 1, \ldots, Q.$$
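The pieces of the optimization model that a non-linear programming routine would evaluate can be sketched as plain functions. No solver is included here; the function names, the tolerance and the toy data are assumptions for illustration.

```python
def anv(pi, sigma):
    """Anticipated variance: sum over U of (1/pi_k - 1) * sigma_k^2."""
    return sum((1.0 / p - 1.0) * s * s for p, s in zip(pi, sigma))


def min_anv(sigma, n):
    """Denominator of each ratio: the single-variable minimum ANV."""
    return sum(sigma) ** 2 / n - sum(s * s for s in sigma)


def relative_anvs(pi, sigmas, n):
    """The ratios g_q(pi) for q = 1, ..., Q."""
    return [anv(pi, s) / min_anv(s, n) for s in sigmas]


def objective(pi, sigmas, H, n):
    """f(pi): the weighted sum of the ratios (ANOREL)."""
    return sum(h * r for h, r in zip(H, relative_anvs(pi, sigmas, n)))


def is_feasible(pi, sigmas, n, limits, tol=1e-9):
    """Check 0 < pi_k <= 1, sum(pi) = n and g_q(pi) <= v_q for every q."""
    in_range = all(0.0 < p <= 1.0 for p in pi)
    size_ok = abs(sum(pi) - n) <= tol
    ratios = relative_anvs(pi, sigmas, n)
    ratios_ok = all(r <= v + tol for r, v in zip(ratios, limits))
    return in_range and size_ok and ratios_ok


# Hypothetical data: two study variables, four units, n = 2.
sigmas = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
H = [0.5, 0.5]
n = 2

pi_equal = [0.5, 0.5, 0.5, 0.5]
pi_brewer_1 = [0.2, 0.4, 0.6, 0.8]  # optimal for variable 1 alone

equal_ok_loose = is_feasible(pi_equal, sigmas, n, [1.5, 1.5])
equal_ok_tight = is_feasible(pi_equal, sigmas, n, [1.15, 1.15])
brewer_ok_loose = is_feasible(pi_brewer_1, sigmas, n, [1.5, 1.5])
```

A design tuned to one variable can violate the restriction for another, which is the situation the constrained model is meant to rule out; a solver would search over `pi` using `objective` and the same constraint functions.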

Page 13

An Application

The 4 variables studied for three branches (strata):

SNI25: Manufacturers of food products & beverages, N = 749

SNI28: Manufacturers of metal goods (except machines and devices), N = 2292

SNI33: Manufacturers of optical instruments, N = 323

Expected sample sizes: $E(n_s) = 290$, $E(n_s) = 112$, $E(n_s) = 64$

Analysis and comparisons were made on admin. data from previous reference times: plots, estimated correlations and gamma coefficients.

Page 14

An Application

• A common ratio model describes the relationships reasonably well if the corresponding older variable is used as the regressor variable. (It has the strongest pairwise correlation over branches and time, although doubts exist for the investment variable.)

• Estimates of the gamma coefficient are sensitive.

• Estimates ranged between 0.2 and 0.9 and sometimes deteriorated!?

• For investments there is very weak or no heteroscedasticity.

• For the other three variables, $\tilde{\gamma} = 0.5$ "cannot be ruled out" and is simple as a guesstimate.

Page 15

An Application

Guesstimates $\tilde{\gamma}$ by study variable and stratum, with $\tilde{\pi}_{k(q\,opt)} \propto u_{qk}^{\tilde{\gamma}}$:

                                                    Strata
Study variable   Auxiliary/size variable      Food   Metal   Optic
employees        $u_{1k}^{\tilde{\gamma}}$    0.5    0.5     0.5
turnover         $u_{2k}^{\tilde{\gamma}}$    0.5    0.5     0.5
P-costs          $u_{3k}^{\tilde{\gamma}}$    0.5    0.5     0.5
investment       $u_{4k}^{\tilde{\gamma}}$    0.2    0       0.2

Page 16

An Application

• Computation of inclusion probabilities and the anticipated variances using the Brewer selection (maximal Brewer selection)

• Computation of the optimisation-based approaches, with the extra condition that

$$\frac{ANV(\hat{t}_{y_q r p_i})}{ANV_{min}(\hat{t}_{y_q r})} \le 1.15$$

Page 17

Food & Beverages: $100\,[(ANV(\hat{t}_{y_q r p_i}) / ANV_{min}(\hat{t}_{y_q r})) - 1]$

                                                        Study variables
Design  Considered design                      Employees  Turnover  P-cost  Invest   Mean
p1      Opt. on Empl                                   0      24.3     3.5    24.4   13.0
p2      Opt. on Turn                                24.5         0    19.1    74.4   29.5
p3      Opt. on P-cost                               3.3      16.4       0    43.0   13.0
p4      Opt. on Invest                              34.4      91.7    45.9       0   43.0
p5      Minimizing ANOREL                            2.8      13.9     2.9    19.5    9.8
p6      Minimizing ANOREL with restrictions          5.7      15.0     6.5    15.0   10.6

Page 18

Optical Instruments: $100\,[(ANV(\hat{t}_{y_q r p_i}) / ANV_{min}(\hat{t}_{y_q r})) - 1]$

                                                        Study variables
Design  Considered design                      Employees  Turnover  P-cost  Invest   Mean
p1      Opt. on Empl                                   0      13.0     3.4    30.6   11.8
p2      Opt. on Turn                                11.0         0     8.2    51.6   17.7
p3      Opt. on P-cost                               3.0       8.3       0    37.1   12.1
p4      Opt. on Invest                              44.7      73.3    53.4       0   42.8
p5      Minimizing ANOREL                            2.9       7.7     3.1    20.1    8.5
p6      Minimizing ANOREL with restrictions          4.5      10.9     5.3    15.0    8.9

Page 19

Metal goods: $100\,[(ANV(\hat{t}_{y_q r p_i}) / ANV_{min}(\hat{t}_{y_q r})) - 1]$

                                                        Study variables
Design  Considered design                      Employees  Turnover  P-cost  Invest   Mean
p1      Opt. on Empl                                   0       7.3     2.0    21.0    7.6
p2      Opt. on Turn                                 6.1         0     4.3    33.0   10.9
p3      Opt. on P-cost                               1.8       5.0       0    24.4    7.8
p4      Opt. on Invest                              31.6      51.2    36.1       0   29.7
p5      Minimizing ANOREL                            1.7       4.9     1.9    14.0    5.6
p6      Minimizing ANOREL with restrictions          3.4       7.0     4.0    10.0    6.1

Maximal Brewer selection satisfies the criteria, but with a 25% larger sample: $E(n_s) = 364.3$

Page 20

Does it work on the estimator variances?

• In most cases we will never know.

• However, for these variables we can check against admin. data (which arrive 1.5 years later).

• Using

$$100\left(\frac{V_T^{(PO)}(\hat{t}_{y_q r p_i})}{V^{*}(\hat{t}_{y_q r})} - 1\right)$$

• where

$$V_T^{(PO)}(\hat{t}_{y_q r}) = \sum_U (\pi_k^{-1} - 1)\,E_{qk}^2$$

is the Taylor expanded variance of the ratio estimator under Poisson sampling.
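The check against later admin. data can be sketched in a few lines. The sketch assumes that $E_{qk}$ is the population residual $y_{qk} - R_q u_{qk}$ with $R_q = \sum_U y_{qk} / \sum_U u_{qk}$, which is the usual linearization residual for a ratio estimator but is not stated on the slide. The data and function names are hypothetical.

```python
def taylor_variance_poisson(y, u, pi):
    """Sum over U of (1/pi_k - 1) * E_k^2, with E_k = y_k - R * u_k (assumed residual)."""
    ratio = sum(y) / sum(u)
    residuals = [yk - ratio * uk for yk, uk in zip(y, u)]
    return sum((1.0 / p - 1.0) * e * e for p, e in zip(pi, residuals))


def percent_above_smallest(variances):
    """100 * (V / V_smallest - 1) for each considered design."""
    smallest = min(variances)
    return [100.0 * (v / smallest - 1.0) for v in variances]


# Hypothetical population: study variable y, auxiliary variable u, two designs.
y = [2.0, 4.0, 7.0, 7.0]
u = [1.0, 2.0, 3.0, 4.0]
pi_equal = [0.5, 0.5, 0.5, 0.5]
pi_unequal = [0.2, 0.4, 0.6, 0.8]

v_equal = taylor_variance_poisson(y, u, pi_equal)
v_unequal = taylor_variance_poisson(y, u, pi_unequal)
losses = percent_above_smallest([v_equal, v_unequal])
```

The design with the smallest variance gets a loss of 0, and every other design is expressed as a percentage above it, matching the way the comparison across designs is reported.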

Page 21

Metal goods: Ratios of the Taylor expanded variances to the smallest variance of each estimator (%)

                                                        Study variables
Design  Considered design                      Employees  Turnover  P-cost  Invest   Mean loss
p1      Opt. on Empl                                   0        19       9      72          25
p2      Opt. on Turn                                   9         0      36      67          28
p3      Opt. on P-cost                                 8         6       0      68          21
p4      Opt. on Invest                               146       146     222      24         134
p5      Minimizing ANOREL                              2         2      31       0           9
p6      Minimizing ANOREL with restrictions           10         8      45       8          18

Page 22

Summary

• Carefully choosing appropriate size measures, so as to get $\pi_k \propto \tilde{\sigma}_{qk}$, limits anticipated variances of regression estimators, and Brewer's results can be extended to a multivariate situation.

• If there is a multivariate issue and you intend to use auxiliary information in the design, diagnostic computations are important.

• With an optimization approach we know what we are aiming to minimize, and with the non-linear programming approach some practical difficulties in designing a pps sample are avoided.