Imputation in EU-SILC survey

15
Imputation in EU- SILC survey Andris Fisenko and Vita Kozirkova Central Statistical Bureau of Latvia

description

Imputation in EU-SILC survey. Andris Fisenko and Vita Kozirkova Central Statistical Bureau of Latvia. EU-SILC. "EU Statistics on Income and Living Conditions (EU-SILC)". This survey is conducted in all countries of the European Union. Variables. Total housing cost - PowerPoint PPT Presentation

Transcript of Imputation in EU-SILC survey

Page 1: Imputation in EU-SILC  survey

Imputation in EU-SILC survey

Andris Fisenko and Vita KozirkovaCentral Statistical Bureau of Latvia

Page 2: Imputation in EU-SILC  survey

EU-SILC

"EU Statistics on Income and Living Conditions (EU-SILC)".

This survey is conducted in all countries of the European Union.

Page 3: Imputation in EU-SILC  survey

Variables• Total housing cost• Family/Children related allowances• Social exclusion• Housing allowances• Income received by people aged under 16• Regular taxes on wealth• Regular inter-household cash transfer paid• Regular inter-household cash transfer received• Other

Page 4: Imputation in EU-SILC  survey

Structure

• Housholds level– Income per year is asked.

– If refusal, then ask average and how often.

– If answer is “ I don’t know” then IMPUTATION

• Personal level– Differences for persons level

– Average in one time or interval.

Page 5: Imputation in EU-SILC  survey

Imputation methods

• RMV (Replace Missing Values)• Hot deck & Cold deck• Mean/Median imputation• Nearest Neighbour• Regression• Other

Page 6: Imputation in EU-SILC  survey

SPSS RMV- 1

Page 7: Imputation in EU-SILC  survey

RMV - results

Nr. Lint Mean Median Smean Trend17121 44,0 31,8 24,5 35,7 31,032651 29,0 36,7 40,0 35,7 36,748681 55,0 40,5 35,5 35,7 40,258701 . . . 35,7 41,158721 . . . 35,7 41,3

Page 8: Imputation in EU-SILC  survey

Cold-deck

• Does not bring new information• Very good benchmarking method

• There is not data from the past• Method will be analysed in the future

Page 9: Imputation in EU-SILC  survey

Random donor (Hot-deck)• The donor is chosen randomly within same group• Random overall imputation is not sensible• But random donor within imputation classes (or

clusters, groups, ..) is important method

Page 10: Imputation in EU-SILC  survey

Random donor (Hot-deck)

• Firstly, the data is divided into homogenous sub groups by a a suitable method, e.g. easily sorting into categories by some explanatory variable. Secondly, the donors are chosen randomly within these new groups.

• Many clustering methods are using the idea of nearest neighbours when determining a new grouping of the data.

Page 11: Imputation in EU-SILC  survey

Multiple imputation

• Creates m imputed data sets.

• Hot deck+Multiple imputation

Page 12: Imputation in EU-SILC  survey

Some results

0 0 1673,3 1511,520 1 1670,8 1511,2100 1,5 1672,9 1515,750 2 1670,7 1508,721 3 1677,6 1529,4100 4 1676,3 1529,150 5 1670,7 1521,9100 5 1670,1 1531,450 7 1682,1 1534,420 10 1659,4 1499,820 15 1657,5 1457,450 20 1693,4 1567,8

Number of simulations

MeanStandard Deviation

 Percentage of missing data

Page 13: Imputation in EU-SILC  survey

Some results

0.0

1.0

2.0

3.0

4.0

5.0

6.0

1%

1.5% 2% 3% 4% 5% 5% 7% 10%

15%

20%

Page 14: Imputation in EU-SILC  survey

What's next

• Software• Linear regression• Auxiliary information from

– State Revenue Service

– State Social Insurance Agency

– Regional governments

– Banks

Page 15: Imputation in EU-SILC  survey

Thank you for your attention!!!