
SHARE data cleaning meeting, Frankfurt, December 6

Some suggestions from the Italian experience

Paccagnella Omar

Omar Paccagnella Data cleaning meeting December 6, 2007


How to proceed?

Data cleaning as a whole can be divided into two stages:

1) The frame: all about the identification of households and/or individuals (ids, demographic characteristics, household composition)

2) The picture: all about individual characteristics and answers (those that can be checked!)


The frame (1)

Are you sure that the interviewed household/individual is the one you want?

Longitudinal sample: the interviewer (IWER) has to contact the same household as in wave 1 (wrong address? selection errors made by the IWER in the Sample Management System, SMS?) … check by looking at the coverscreen, CV (name, gender, year of birth, children)

Refresher sample: at the end of the CV the IWER has to fill in the selected individual (are there other people aged 50+ in the household?) … check by looking at the CV (name, gender, year of birth) … check the representativeness of the sample; oversampling of people born in 1955/1956.
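The identity check on this slide can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the procedure used for SHARE: the `Person` record, its fields and the gender coding are hypothetical, and the rule is the one given in this section, namely that at least one household member in the CV must share the selected individual's gender and year of birth.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical minimal record; field names and coding are assumptions.
    name: str
    gender: int  # e.g. 1 = male, 2 = female (assumed coding)
    yob: int     # year of birth


def household_is_plausible(selected: Person, cv_members: list[Person]) -> bool:
    """True if at least one CV member has the selected person's gender and year of birth."""
    return any(m.gender == selected.gender and m.yob == selected.yob for m in cv_members)


def flag_households(selected_by_hh: dict[str, Person],
                    cv_by_hh: dict[str, list[Person]]) -> list[str]:
    """Ids of households whose CV has no plausible match (a missing CV is also flagged)."""
    return sorted(hh for hh, sel in selected_by_hh.items()
                  if not household_is_plausible(sel, cv_by_hh.get(hh, [])))


selected = {"H1": Person("Anna", 2, 1940), "H2": Person("Mario", 1, 1935)}
cv = {"H1": [Person("Anna", 2, 1940), Person("Luigi", 1, 1938)],
      "H2": [Person("Carla", 2, 1950)]}
print(flag_households(selected, cv))  # ['H2']
```

A flagged household is a candidate for manual checking (wrong address, wrong SMS row), not for automatic deletion.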


The frame (2)

Are you sure that ALL eligible individuals are reported in the CV of the household?

Longitudinal sample: you need to know what happened to all wave 1 individuals:
… wave 1 individuals not in the wave 2 CV: deceased? moved out?
… wave 1 individuals recorded as both deceased and moved out in the wave 2 CV: check for linking errors (exit instead of longitudinal interview)
… wave 1 individuals recorded as moved in in the wave 2 CV: check the id and the type of interview (baseline vs longitudinal)
… wave 1 individuals recorded as “new household members” after the wave 2 CV: check the id and the type of interview (baseline vs longitudinal)
… wave 2 individuals not in the wave 1 CV: were the moved-in questions completed?
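The longitudinal checks above map naturally onto a small rule table. The sketch below assumes hypothetical inputs: the set of wave 1 person ids, the status flags recorded for each person in the wave 2 CV (`deceased`, `moved_out`, `moved_in`, `new_member`), the interview type per person, and the ids whose moved-in questions were completed. It only lists cases to inspect; it does not decide what actually happened.

```python
from __future__ import annotations


def wave2_issues(w1_ids: set[str],
                 w2_cv: dict[str, set[str]],
                 w2_itype: dict[str, str],
                 movein_done: set[str]) -> dict[str, list[str]]:
    """Map person ids to problems to check by hand.

    w2_cv:       id -> status flags in the wave 2 CV (hypothetical flags:
                 "deceased", "moved_out", "moved_in", "new_member")
    w2_itype:    id -> interview type, e.g. "baseline" or "longitudinal"
    movein_done: ids of new wave 2 persons whose moved-in questions are complete
    """
    issues: dict[str, list[str]] = {}

    def add(pid: str, msg: str) -> None:
        issues.setdefault(pid, []).append(msg)

    for pid in sorted(w1_ids):
        flags = w2_cv.get(pid)
        if flags is None:
            add(pid, "not in w2 CV: deceased or moved out?")
            continue
        if {"deceased", "moved_out"} <= flags:
            add(pid, "both deceased and moved out: linking error (exit vs longitudinal)?")
        if flags & {"moved_in", "new_member"}:
            itype = w2_itype.get(pid, "unknown")
            add(pid, f"w1 person marked as moved in/new member ({itype} interview): check id")

    # Persons in the wave 2 CV who were not in wave 1.
    for pid in sorted(set(w2_cv) - w1_ids):
        if pid not in movein_done:
            add(pid, "new in w2: moved-in questions not completed")
    return issues


w1 = {"A", "B", "C", "D"}
w2_cv = {"A": set(), "B": {"deceased", "moved_out"}, "C": {"moved_in"},
         "E": set(), "F": set()}
for pid, msgs in sorted(wave2_issues(w1, w2_cv, {"C": "baseline"}, {"E"}).items()):
    print(pid, msgs)  # flags B, C, D and F
```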


Refresher sample: you need to know whether all household members are reported … this can be checked only when the sample selection is based on households instead of individuals, or when other household information is available.


The frame (3)

Are you sure that demographic information matches correctly within and between waves?

Within waves: check for mixed-up respondents, e.g. the husband's interview was done in the wife's SMS row (refresher vs longitudinal)
… gender & year of birth must be the same in the CV, DN and XT sections and in the drop-off (where available)

Between waves: check for mixed-up respondents, e.g. the husband's name was linked with his wife's name
… gender & year of birth must be the same in the CV, DN and XT sections and in the drop-off (where available)
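A sketch of the consistency check, assuming each source has already been reduced to a hypothetical `(gender, year_of_birth)` pair. The same function serves within a wave (CV, DN, XT, drop-off) and between waves (the same person's CV in wave 1 and wave 2); the second helper flags a husband/wife mix-up by testing whether the two partners' wave 2 values are exchanged relative to wave 1. Function names and labels are illustrative only.

```python
from __future__ import annotations


def inconsistent_fields(sources: dict[str, tuple[int, int] | None]) -> list[str]:
    """sources: label -> (gender, year_of_birth), or None where not available.

    Labels can be sections of one wave ("CV", "DN", "XT", "drop-off") or the
    same person's record in different waves ("w1 CV", "w2 CV").
    """
    available = [v for v in sources.values() if v is not None]
    problems = []
    if len({g for g, _ in available}) > 1:
        problems.append("gender")
    if len({y for _, y in available}) > 1:
        problems.append("year_of_birth")
    return problems


def spouses_swapped(a_w1: tuple[int, int], a_w2: tuple[int, int],
                    b_w1: tuple[int, int], b_w2: tuple[int, int]) -> bool:
    """True if partners a and b have exchanged (gender, year_of_birth) between waves."""
    return a_w1 != b_w1 and a_w2 == b_w1 and b_w2 == a_w1


print(inconsistent_fields({"CV": (2, 1940), "DN": (2, 1940), "XT": None, "drop-off": (2, 1941)}))
# ['year_of_birth']
print(spouses_swapped((1, 1938), (2, 1940), (2, 1940), (1, 1938)))  # True
```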


The frame: summing up

Check that no household other than the selected one was interviewed (this also means that, in every wave, at least one household member must have the same gender and year of birth as the selected individual)

Check that eligible wave 1 individuals are not “forgotten” in wave 2

Check that the ids of the eligible individuals are properly merged
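For the last point, a simple merge report is often enough. In this sketch the inputs are assumed to be plain lists of person ids, one from the CV and one from a file to be merged into it; duplicated ids and ids present in only one of the two files are listed for manual review. An eligible individual with no record in a section may be genuine nonresponse rather than an id error.

```python
from __future__ import annotations

from collections import Counter


def merge_report(cv_ids: list[str], section_ids: list[str]) -> dict[str, list[str]]:
    """Compare person ids in the CV with those in a file to be merged into it."""
    def dups(ids: list[str]) -> list[str]:
        return sorted(i for i, n in Counter(ids).items() if n > 1)

    cv, sec = set(cv_ids), set(section_ids)
    return {
        "duplicated_in_cv": dups(cv_ids),
        "duplicated_in_section": dups(section_ids),
        "in_section_not_in_cv": sorted(sec - cv),  # data that cannot be linked to the frame
        "in_cv_not_in_section": sorted(cv - sec),  # nonresponse, or a wrong id?
    }


report = merge_report(["01", "02", "02", "03"], ["01", "02", "04"])
for key, ids in report.items():
    print(key, ids)
```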


To complete the frame …

… check and clean interviewer characteristics!

The CV contains the “org” variable, but the characteristics of the IWER who completed the interview are only in the IV section:

- Make sure that each IWER has a unique id number (watch for lower/upper case letters, spaces, digits, etc.)
- Check age, gender and education for the same IWER (in wave 1 there were some interviews where the IWER reported the respondent’s characteristics instead of his/her own)
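Both interviewer checks can be sketched as follows. The normalization rule (drop whitespace, ignore case) is an assumption that only covers the case and spacing problems; typing errors in the digits would still need a manual look-up. The tolerance of one year on the interviewer's age is also an assumption, to allow for a birthday during fieldwork. The record layout is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def normalize_iwer_id(raw: str) -> str:
    """Assumed rule: remove all whitespace and ignore letter case."""
    return "".join(raw.split()).upper()


def inconsistent_iwers(iv_records: list[tuple[str, int, int, int]]) -> list[str]:
    """iv_records: (raw IWER id, age, gender, education) per interview, from the IV section.

    Returns normalized ids whose gender or education differ across interviews,
    or whose reported age varies by more than one year (assumed tolerance).
    """
    by_id: dict[str, list[tuple[int, int, int]]] = defaultdict(list)
    for raw_id, age, gender, edu in iv_records:
        by_id[normalize_iwer_id(raw_id)].append((age, gender, edu))

    flagged = []
    for iwer, rows in by_id.items():
        ages = [age for age, _, _ in rows]
        if (len({g for _, g, _ in rows}) > 1
                or len({e for _, _, e in rows}) > 1
                or max(ages) - min(ages) > 1):
            flagged.append(iwer)
    return sorted(flagged)


records = [("it 101", 45, 2, 3), ("IT101", 45, 2, 3), ("It101 ", 46, 2, 3),
           ("IT102", 50, 1, 4), ("IT102", 78, 2, 1)]  # 2nd IT102 row looks like respondent data
print(inconsistent_iwers(records))  # ['IT102']
```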


The picture

Check outliers, DK (don't know) and RF (refusal) answers, and all values that can be compared with other sources

Amounts: too large/small values; “0” values; results by IWER

Physical & cognitive tests: values that are too large; a value of 1 in the “Ten words recall test” (the total number of words recorded instead of the words cited); tests not completed; rounded-off results; identical results across trials; results by IWER

Children: are their ages, and the years of some events, compatible with the age of the respondents?

“Other” in answer categories: can the answer be recoded into one of the categories already defined? A large number of “other” answers: are we missing something?
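A few of these checks, sketched in Python. The interquartile-range rule and its cut-off `k`, the plausible parental-age bounds for children, and all function names are assumptions chosen for illustration, not rules taken from the SHARE cleaning procedures; every flagged value is meant to be inspected, not overwritten.

```python
from __future__ import annotations

import statistics
from collections import defaultdict
from typing import Callable


def iqr_outliers(values: list[float], k: float = 3.0) -> list[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; needs at least two values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def rate_by_iwer(rows: list[tuple[str, float]],
                 is_suspicious: Callable[[float], bool]) -> dict[str, float]:
    """Share of suspicious answers (e.g. 0, DK, RF codes) per interviewer."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [suspicious, total]
    for iwer, value in rows:
        counts[iwer][0] += int(is_suspicious(value))
        counts[iwer][1] += 1
    return {iwer: s / n for iwer, (s, n) in counts.items()}


def identical_trials(trials: list[float]) -> bool:
    """True when all repeated measurements of a test (e.g. two grip-strength trials) coincide."""
    return len(trials) > 1 and len(set(trials)) == 1


def implausible_children(resp_yob: int, child_yobs: list[int],
                         min_age: int = 14, max_age: int = 60) -> list[int]:
    """Children's birth years implying a parental age outside [min_age, max_age] (assumed bounds)."""
    return [y for y in child_yobs if not min_age <= y - resp_yob <= max_age]


print(iqr_outliers([10, 12, 11, 13, 12, 11, 500]))  # [500]
print(rate_by_iwer([("A", 0), ("A", 100), ("B", 0), ("B", 0)], lambda v: v == 0))
# {'A': 0.5, 'B': 1.0}
print(identical_trials([24.0, 24.0]), identical_trials([24.0, 26.0]))  # True False
print(implausible_children(1940, [1960, 1950, 2005]))  # [1950, 2005]
```

Comparing `rate_by_iwer` across interviewers is one way to spot the “results by IWER” patterns mentioned above: an interviewer with a much higher share of zeros or DK answers than colleagues is worth a closer look.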


Some final thoughts

Data cleaning is not only the correction of some errors, but also a way to check and evaluate the quality of our datasets: we can find the sections where data are of lower quality (compared to other similar surveys) and the variables that need more attention (both when analyzing the data and when preparing the briefings).

Good data cleaning starts at the beginning of the fieldwork.
