Working with EU-SILC: data files, variables and data management
Practical computing session I – Part 1
Heike Wirth
GESIS – Leibniz Institut für Sozialwissenschaften
DwB Training Course on EU-SILC, February 13-15, 2013
Romanian Social Data Archive at the Department of Sociology
University of Bucharest, Romania
Overview
• EU-SILC datasets
• EU-SILC variables
• Differences between data collected & anonymised User Database (UDB)
• Hands on
  • Transform CSV file into SPSS/Stata system file
  • Number of households/persons in the file
EU-SILC Data
• Four separate files
  • Household (= 1 observation per household)
    • Register data (D)
    • Household data (H)
  • Individuals (= 1 observation per person)
    • Register data (R)
    • Personal data (P)
• Since cross-sectional & longitudinal data are provided separately => 8 files
EU-SILC Data
For example:
• UDB_c10D_ver 2010-1 from 01-03-12.csv
• UDB_c10H_ver 2010-1 from 01-03-12.csv
• UDB_c10R_ver 2010-1 from 01-03-12.csv
• UDB_c10P_ver 2010-1 from 01-03-12.csv
Reading the file name:
• _c = cross-sectional; _l = longitudinal
• 10 = year of the survey = 2010
• D = Household Register File
• H = Household Data File
• R = Personal Register File
• P = Personal Data File
• 2010-1 = version of the data (e.g. 1st version of the 2010 data)
• csv = type of data (= comma separated values)
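The naming convention can be checked mechanically. The following is a minimal sketch, assuming file names follow exactly the pattern of the examples on this slide; the function name `parse_udb_filename` and the rule that two-digit years mean 20xx are illustrative assumptions, not part of the EU-SILC documentation.

```python
import re
from typing import NamedTuple


class UdbFileName(NamedTuple):
    design: str     # "cross-sectional" or "longitudinal"
    year: int       # four-digit survey year
    file_type: str  # D, H, R or P
    version: str    # e.g. "2010-1"


# Pattern modelled on the slide's examples, e.g.
# "UDB_c10D_ver 2010-1 from 01-03-12.csv"
_PATTERN = re.compile(r"^UDB_([cl])(\d{2})([DHRP])_ver (\d{4}-\d+)\b")


def parse_udb_filename(name: str) -> UdbFileName:
    """Split a UDB file name into design, year, file type and version."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised UDB file name: {name!r}")
    design_code, yy, file_type, version = match.groups()
    design = "cross-sectional" if design_code == "c" else "longitudinal"
    # Assumption: two-digit years refer to 20xx.
    return UdbFileName(design, 2000 + int(yy), file_type, version)


info = parse_udb_filename("UDB_c10D_ver 2010-1 from 01-03-12.csv")
print(info)
```

A name that does not fit the pattern raises `ValueError`, which helps catch renamed or truncated files before they are read.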
EU-SILC Data
• Household Register File (D)
  • one record for every household, including information on sample units, household weights, etc.
  • e.g. UDB_c10D_ver 2010-2: N = 225 972 households
• Household Data File (H)
  • one record for every household, including household data
  • e.g. UDB_c10H_ver 2010-2: N = 225 972 households
• Personal Register File (R)
  • one record for every person currently living in the household or temporarily absent
  • e.g. UDB_c10R_ver 2010-2: N = 576 531 persons
• Personal Data File (P)
  • reference population: members of the household aged 16 and over
  • e.g. UDB_c10P_ver 2010-2: N = 476 705 persons
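The record counts suggest two simple consistency checks: the D and H files should hold the same households, and the persons in the P file (aged 16 and over) should all appear in the R file. A minimal sketch of such a check follows. The helper `compare_id_sets` is hypothetical, the identifiers are made up, and pairing each identifier with its country code assumes that identifiers are unique only within a country.

```python
def compare_id_sets(register_ids, data_ids):
    """Compare the unit identifiers found in a register file and a data file."""
    register, data = set(register_ids), set(data_ids)
    return {
        "n_register": len(register),
        "n_data": len(data),
        "only_in_register": len(register - data),
        "only_in_data": len(data - register),
    }


# Made-up identifiers as (country, id) pairs; not taken from the UDB.
d_households = [("AT", 1), ("AT", 2), ("BG", 1)]
h_households = [("AT", 1), ("AT", 2), ("BG", 1)]
r_persons = [("AT", 101), ("AT", 102), ("AT", 201), ("BG", 101)]
p_persons = [("AT", 101), ("AT", 201), ("BG", 101)]  # persons aged 16+ only

households = compare_id_sets(d_households, h_households)
persons = compare_id_sets(r_persons, p_persons)
print(households)
print(persons)
```

In this toy example the household files match exactly, while one person appears only in the register file, which is the pattern expected when P covers a narrower population than R.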
Domains & Areas - Households
Source: Guidelines_Doc65_2010.pdf, p.73
Domains & Areas - Persons
Source: Guidelines_Doc65_2010.pdf, p.73
EU-SILC Variables
• Variable names in EU-SILC are composed of 3 parts:
  • 1st character refers to the dataset (D; H; R; P)
  • 2nd character refers to the domain
  • 3 digits represent a sequential number
  • e.g. PE040 = Highest ISCED level attained
• Most important piece of data documentation:
  • Guideline ‘Description of Target Variables’
  • refers to variables delivered by the NSIs to EUROSTAT
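The three-part naming rule lends itself to a small parser. This is a sketch under the assumption that every variable name is one file letter, one domain letter and three digits, optionally followed by `_F` for a flag variable (as in HH021_F); the function `parse_variable_name` is illustrative and not part of any EU-SILC tool.

```python
import re
from typing import NamedTuple

FILE_LABELS = {
    "D": "Household Register File",
    "H": "Household Data File",
    "R": "Personal Register File",
    "P": "Personal Data File",
}


class VariableName(NamedTuple):
    file: str      # D, H, R or P
    domain: str    # second character of the name
    number: int    # sequential number (three digits in the name)
    is_flag: bool  # True when the name ends in "_F"


_VARIABLE = re.compile(r"^([DHRP])([A-Z])(\d{3})(_F)?$")


def parse_variable_name(name: str) -> VariableName:
    """Split an EU-SILC variable name into file, domain and number."""
    match = _VARIABLE.match(name)
    if match is None:
        raise ValueError(f"not an EU-SILC style variable name: {name!r}")
    file, domain, digits, flag = match.groups()
    return VariableName(file, domain, int(digits), flag is not None)


parsed = parse_variable_name("PE040")
print(parsed, "->", FILE_LABELS[parsed.file])
```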
Guidelines – Target Variables (collected)
Guidelines – Target Variables (derived)
(...)
Different variable names but same labels
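Check HH020 & HH021 (using flag variables)
Cross-tabulation of HH020_F (rows) by HH021_F (columns)

HH020_F \ HH021_F          -5       -1        1    Total
-5                          0        1    17245    17246
-1 missing                  1        0        0        1
1 filled                 1499        0     1500     2999
Total                    1500        1    18745    20246

Value labels:
• HH020_F: -5 = m.v. of HH020 because HH021 is used; -1 = missing; 1 = filled
• HH021_F: -5 = m.v. of HH020 because HH021 is still used; -1 = missing; 1 = filled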
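A cross-tabulation of two flag variables like this one can be reproduced with the standard library. The sketch below uses a hypothetical helper `crosstab` and a handful of made-up flag values; it does not reproduce the counts from the training file.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Count joint occurrences of two equally long sequences of codes."""
    if len(row_values) != len(col_values):
        raise ValueError("both sequences must have the same length")
    cells = Counter(zip(row_values, col_values))
    row_totals = Counter(row_values)
    col_totals = Counter(col_values)
    return cells, row_totals, col_totals


# Made-up flag values for five households (illustration only).
hh020_f = [1, 1, -5, -5, -1]
hh021_f = [-5, 1, 1, 1, -5]

cells, row_totals, col_totals = crosstab(hh020_f, hh021_f)
for row_code in sorted(row_totals):
    counts = [cells[(row_code, col_code)] for col_code in sorted(col_totals)]
    print(row_code, counts, row_totals[row_code])
```

Each printed line gives one row code, the cell counts in column-code order, and the row total, which mirrors the layout of the SPSS table.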
Additional important information
• DIFFERENCES BETWEEN DATA COLLECTED (as described in the guidelines) AND THE ANONYMISED USER DATABASE
  • All income variables are in € (EURO)
  • Variables removed
  • Top/bottom coding
  • Variables added
• In addition: country-specific rules
Anonymised User Database – Variables added
• Names of variables added
  • 1st character refers to the file (D; H; R; P)
  • 2nd character is ‘X’
  • 3 digits represent a sequential number
• e.g.
  • HX040: Household size
  • HX060: Household type
  • HX080: Poverty indicator
  • (….)
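Because added variables always carry ‘X’ as their second character, they can be picked out of a file's column header automatically. A minimal sketch, assuming the header is available as a list of names; the helper `added_variables` and the example header are illustrative.

```python
import re

# File letter, then 'X', then three digits (the rule stated on the slide).
_ADDED = re.compile(r"^[DHRP]X\d{3}$")


def added_variables(columns):
    """Return the column names that follow the 'variables added' rule."""
    return [name for name in columns if _ADDED.match(name)]


# Example header; a real H-file contains many more columns.
header = ["HB030", "HX040", "HY020", "HX060", "HS120"]
print(added_variables(header))
```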
Hands on – Exercise 1
• Step 1: Open the 4 SPSS and/or Stata system files
• Step 2: Check the data
  • How many households are included in the data (H- & D-File)?
    • total
    • by country
  • How many persons are included in the data (P- & R-File)?
    • total (any differences between the P- & R-File?)
    • by country
• There are 15 countries in the training files. Fill in the table (next slide)
  • What are the main differences across countries?
  • Are there differences in the % of unemployed depending on whether you use RB210 or PL031? If so, why?
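For readers working outside SPSS or Stata, the counts in Step 2 can also be taken directly from the CSV files. The sketch below assumes the country code is stored in a column named DB020 in the D-file (with HB020, RB020 and PB020 as the counterparts in the other files); check these names against the guidelines before relying on them. The helper `count_by_country` and the three-row sample are illustrative only.

```python
import csv
import io
from collections import Counter


def count_by_country(csv_file, country_column):
    """Count the records per country in an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row[country_column] for row in reader)


# Made-up stand-in for a D-file; a real file would be opened with
# open(path, newline="") instead.
sample = io.StringIO(
    "DB010,DB020,DB030\n"
    "2010,AT,1\n"
    "2010,AT,2\n"
    "2010,BG,1\n"
)

counts = count_by_country(sample, "DB020")
print("total:", sum(counts.values()))
for country, n in sorted(counts.items()):
    print(country, n)
```

Running the same function on the H-file (and on R and P for persons) gives the figures to compare across files.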
Exercise 1.3: Fill in the table
Columns (mean per country):
• Total household gross income (HY010)
• Total disposable household income before social transfers (HY023)
• Total disposable household income (HY020)
• Average household size (HX040)
• % of households with difficulties or great difficulties making ends meet (HS120)
• % of households living in a densely populated area (DB100)
• Basic activity status (RB210): % unemployed
• Self-defined current economic status (PL031): % unemployed
Rows (country):
• AT Oesterreich
• BG Bulgaria
• CY Cyprus
• CZ Czech Republic
• DK Danmark
• EE Estonia
• ES Espana
• FI Suomi
• FR France
• GR Ellada
• HU Hungary
• LT Lithuania
• MT Malta
• PL Poland
Exercise 1.3: Fill in the table

Country              HY010   HY023   HY020   HX040   HS120   DB100   RB210   PL031
AT Oesterreich      50.377  24.768  36.627     2,2   12,5%   36,2%    2,8%    3,7%
BG Bulgaria          6.466   4.074   5.869     2,4   68,3%   44,4%    9,2%    9,4%
CY Cyprus           39.736  25.781  35.040     2,8   47,5%   55,6%    3,3%    4,1%
CZ Czech Republic   14.252   8.216  12.271     2,3   26,4%   32,5%    3,5%    4,3%
DK Danmark          75.904  36.981  49.636     2,5    6,0%   32,7%    3,1%    3,9%
EE Estonia          12.466   7.530  10.735     2,5   22,9%   34,3%    8,0%    9,2%
ES Espana           29.468  17.985  25.251     2,6   29,9%   48,8%    9,2%   11,1%
FI Suomi            55.202  30.691  41.284     2,3    5,5%   23,5%    5,0%    6,0%
FR France           48.291  25.807  39.117     2,3   18,3%   46,6%    4,5%    5,5%
GR Ellada           28.565  14.162  21.930     2,5   53,7%   37,9%    5,1%    6,1%
HU Hungary           9.865   4.439   7.773     2,4   54,6%   31,5%    5,2%    6,0%
LT Lithuania         9.627   5.223   8.240     2,4   40,4%   43,5%    8,0%    8,9%
MT Malta            23.704  14.322  20.199     2,7   43,3%   90,0%    2,8%    3,3%
PL Poland           11.543   6.030   8.991     2,6   37,7%   40,9%    4,8%    5,8%

Columns:
• HY010: Total household gross income
• HY023: Total disposable household income before social transfers
• HY020: Total disposable household income
• HX040: Average household size
• HS120: % of households with difficulties or great difficulties making ends meet
• DB100: % of households living in a densely populated area
• RB210: Basic activity status, % unemployed
• PL031: Self-defined current economic status, % unemployed
Figures are printed as on the slide: "." separates thousands and "," marks decimals.
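The table combines two kinds of statistics per country: means (income, household size) and percentages of records with a given code (for example, unemployed). A minimal sketch of both follows, using made-up records. The helpers `mean_by_group` and `percent_by_group` are hypothetical, the calculation is unweighted, and treating code 2 of RB210 as "unemployed" is an assumption to verify against the guidelines.

```python
from collections import defaultdict


def mean_by_group(records, group_key, value_key):
    """Unweighted mean of value_key per group, skipping missing values."""
    sums, counts = defaultdict(float), defaultdict(int)
    for record in records:
        value = record.get(value_key)
        if value is None:
            continue
        sums[record[group_key]] += value
        counts[record[group_key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


def percent_by_group(records, group_key, value_key, codes):
    """Unweighted % of non-missing records whose value is in codes."""
    hits, counts = defaultdict(int), defaultdict(int)
    for record in records:
        value = record.get(value_key)
        if value is None:
            continue
        counts[record[group_key]] += 1
        if value in codes:
            hits[record[group_key]] += 1
    return {group: 100.0 * hits[group] / counts[group] for group in counts}


# Made-up records (illustration only).
households = [
    {"country": "AT", "HX040": 2},
    {"country": "AT", "HX040": 4},
    {"country": "BG", "HX040": 3},
]
persons = [
    {"country": "AT", "RB210": 1},
    {"country": "AT", "RB210": 2},
    {"country": "AT", "RB210": None},  # missing value, excluded
    {"country": "BG", "RB210": 2},
]

print(mean_by_group(households, "country", "HX040"))
# Assumption: code 2 marks "unemployed" in RB210.
print(percent_by_group(persons, "country", "RB210", {2}))
```

Both helpers exclude missing values from the denominator. Which records enter the denominator is one choice that affects a percentage, so it is worth keeping in mind when comparing RB210 with PL031.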