SHARELIFE Meeting, Vienna, November 5-6: The Italian experience in SHARE data cleaning. Omar Paccagnella, SHARELIFE meeting, November 6, 2007


Page 1:

SHARELIFE Meeting, Vienna, November 5-6

The Italian experience in SHARE data cleaning

Omar Paccagnella, SHARELIFE meeting, November 6, 2007

Page 2:

In general …

Topics of this presentation are based ONLY on the Italian experience of data cleaning in the 2 waves of SHARE

Checks are divided into 4 groups

Page 3:

In general …

No general rules: every household is a different story

Use remarks!

Strong cooperation with survey agency (interviewers…)

Check data that can be compared with information available from other sources (e.g. gross sample, booklet, other administrative data?)

Be conservative!

Page 4:

Group 1: demographic info - id matching

Matching within wave:

Gender and year of birth must be the same in CV, DN & XT sections and drop-off.

At least one household member must have the same gender and year of birth as the selected individual (gross sample information).

Check for mixing up of respondents in a household, e.g. in a couple the husband's interview was done in the SMS row of the wife.
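The within-wave checks above could be sketched as follows. This is a minimal illustration, not the procedure actually used for the Italian data: the record layout (one dict per respondent holding a (gender, year of birth) pair per source) and the function names are assumptions made for the example.

```python
# Hypothetical layout: each respondent is a dict with one (gender, year of birth)
# pair per source; None means that source is missing for that person.
SOURCES = ("CV", "DN", "XT", "dropoff")


def inconsistent_sources(person):
    """Return the sources that disagree with the first available source."""
    available = [(s, person[s]) for s in SOURCES if person.get(s) is not None]
    if not available:
        return []
    reference = available[0][1]
    return [s for s, value in available[1:] if value != reference]


def household_matches_gross_sample(members, selected):
    """True if at least one member's CV data equals the selected individual's."""
    return any(m.get("CV") == selected for m in members)


# Example couple whose drop-off questionnaires appear to have been swapped.
household = [
    {"id": "A", "CV": ("M", 1940), "DN": ("M", 1940), "XT": None, "dropoff": ("F", 1943)},
    {"id": "B", "CV": ("F", 1943), "DN": ("F", 1943), "XT": None, "dropoff": ("M", 1940)},
]
flags = {p["id"]: inconsistent_sources(p) for p in household}
print(flags)
print(household_matches_gross_sample(household, ("F", 1943)))
```

A flag only says where to look; in line with "Be conservative!", the correction itself would still be decided case by case, using remarks and the survey agency.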

Page 5:

Group 1: demographic info - id matching

Matching between waves:

Gender and year of birth must be the same in CV, DN & XT sections and drop-off of both waves.

If respondents in a household were mixed up, check whether the error was made when linking the respondents (baseline vs longitudinal interview) or when selecting the wrong individual row in the SMS (preload info).

Household composition (eligible & non-eligible individuals who moved in, moved out or died between waves).
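A between-waves comparison along these lines might look like the sketch below. The layout (one dict per wave mapping a person id to a (gender, year of birth) pair) and the function names are hypothetical and only serve the illustration.

```python
def compare_waves(wave1, wave2):
    """Compare two household rosters given as {person_id: (gender, year_of_birth)}."""
    ids1, ids2 = set(wave1), set(wave2)
    return {
        # present in wave 1 only: moved out or died between waves
        "left": sorted(ids1 - ids2),
        # present in wave 2 only: moved in between waves
        "entered": sorted(ids2 - ids1),
        # present in both waves but with different demographics
        "mismatched": sorted(p for p in ids1 & ids2 if wave1[p] != wave2[p]),
    }


def looks_swapped(wave1, wave2, a, b):
    """True if persons a and b seem to have exchanged rows between the waves."""
    return (
        wave1[a] != wave1[b]
        and wave1[a] == wave2[b]
        and wave1[b] == wave2[a]
    )


wave1 = {"01": ("M", 1938), "02": ("F", 1941), "03": ("F", 1915)}
wave2 = {"01": ("F", 1941), "02": ("M", 1938), "04": ("M", 1970)}

report = compare_waves(wave1, wave2)
print(report)
print(looks_swapped(wave1, wave2, "01", "02"))
```

The sketch cannot tell whether a swap came from the linkage or from the wrong SMS row being selected; that still has to be checked against the preload information.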

Page 6:

Group 2: Amounts

In all questions where an amount is asked:

Check too large values: typing errors? Pre-Euro currency?

Check too small values: are they plausible? Typing errors?

Zero values (financial questions): a way to avoid UBs? A way to report a very small value?

Compare the distribution of the variable with distributions from other sources: are there important differences? Could there be a problem in the text of the question?

Results by interviewer
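As an illustration of the amount checks, the sketch below flags values outside a plausible range and tests whether an overly large value becomes plausible once read as Italian lire (the fixed conversion rate was 1936.27 lire per euro). The bounds, labels, and function names are assumptions for the example; plausible ranges have to be chosen per question.

```python
from collections import defaultdict

LIRE_PER_EURO = 1936.27  # fixed conversion rate of the Italian lira to the euro


def flag_amount(value, low, high):
    """Classify an amount against analyst-chosen plausibility bounds (in euro)."""
    if value == 0:
        return "zero"
    if value < low:
        return "too_small"
    if value > high:
        # Would the value be plausible if it had been reported in lire?
        if low <= value / LIRE_PER_EURO <= high:
            return "possible_lire"
        return "too_large"
    return "ok"


def zero_share_by_interviewer(records):
    """records: iterable of (interviewer_id, amount). Returns the share of zeros."""
    counts = defaultdict(lambda: [0, 0])  # interviewer -> [zeros, total]
    for interviewer, value in records:
        counts[interviewer][0] += value == 0
        counts[interviewer][1] += 1
    return {k: zeros / total for k, (zeros, total) in counts.items()}


# Hypothetical bounds for a monthly amount: 100 to 10000 euro.
labels = [flag_amount(v, 100, 10_000) for v in (900, 0, 5, 1_500_000, 50_000_000)]
print(labels)

shares = zero_share_by_interviewer(
    [("int1", 0), ("int1", 0), ("int1", 800), ("int2", 950), ("int2", 1200)]
)
print(shares)
```

An interviewer with an unusually high share of zeros is a candidate for a closer look, not proof of an error.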

Page 7:

Group 3: Physical and cognitive test results

For all tests whose results were reported in the booklet:

Check too large values: typing errors?

Check the value of 1 in the “Ten words recall test”: the total number of words may have been recorded instead of the cited words.

Results by interviewer (tests not completed, rounding of the results, same result, etc.)
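The "results by interviewer" check could be summarised with a small table per interviewer, as in this sketch. The input layout, the choice of multiples of 5 as a rounding indicator, and the field names are assumptions made for the example.

```python
from collections import Counter, defaultdict


def interviewer_summary(records, round_to=5):
    """records: iterable of (interviewer_id, result), result is None if not completed."""
    by_interviewer = defaultdict(list)
    for interviewer, result in records:
        by_interviewer[interviewer].append(result)

    summary = {}
    for interviewer, results in by_interviewer.items():
        done = [r for r in results if r is not None]
        # Frequency of the most common result, to spot "same result" patterns.
        modal_count = Counter(done).most_common(1)[0][1] if done else 0
        summary[interviewer] = {
            "n": len(results),
            "not_completed": len(results) - len(done),
            "share_rounded": sum(r % round_to == 0 for r in done) / len(done) if done else None,
            "share_modal": modal_count / len(done) if done else None,
        }
    return summary


records = [
    ("int1", 30), ("int1", 30), ("int1", 30), ("int1", None),
    ("int2", 27), ("int2", 33), ("int2", 41),
]
summary = interviewer_summary(records)
print(summary)
```

Here the hypothetical interviewer int1 reports only the rounded value 30 and has one test not completed, which is the kind of pattern worth raising with the survey agency.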

Page 8:

Group 4: Other checks

All issues that could be misunderstood (by respondent and/or interviewer):

Answer category: if the question also includes the “Other” option, check whether some of the answers can be recoded into one of the categories already defined. A large number of “other” answers: are we missing something?

Year/age of some events: are they compatible with the age of respondents?
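Both checks on this slide lend themselves to a short sketch. The function names, the labels, and the optional minimum age are hypothetical; what counts as an implausible age depends on the event asked about.

```python
def event_year_problem(birth_year, event_year, interview_year, min_age=0):
    """Return a label if the event year is incompatible with the respondent's age."""
    if event_year < birth_year:
        return "before_birth"
    if event_year - birth_year < min_age:
        return "implausibly_young"
    if event_year > interview_year:
        return "after_interview"
    return None


def other_share(answers, other_code="other"):
    """Share of answers coded as 'other' for one question."""
    return sum(a == other_code for a in answers) / len(answers)


checks = [
    event_year_problem(1940, 1935, 2006),              # event before birth
    event_year_problem(1940, 1950, 2006, min_age=14),  # respondent aged 10 at event
    event_year_problem(1940, 2010, 2006),              # event after the interview
    event_year_problem(1940, 1965, 2006, min_age=14),  # compatible
]
print(checks)
print(other_share(["employed", "other", "other", "retired"]))
```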

Page 9:

Some final thoughts

Data cleaning is not only the correction of some errors, but also a way to check and evaluate the quality of our datasets:

we can find the sections where data are less good (compared to other similar surveys),

and the variables that need more attention (both when analyzing the data and when preparing the briefings).

Good data cleaning begins at the beginning of the fieldwork