Transcript of "Imputation in the 2011 Census", NILS Brownbag Talk – 6 May 2014, Richard Elliott

Page 1

Imputation in the 2011 Census

NILS Brownbag Talk – 6 May 2014

Richard Elliott

Page 2

Overview

• Background
• What is imputation?
• How did we impute the 2011 Census?
• Strategy
• Process
• Implementation
• Considerations
• Information
• Next steps

Page 3

Background

• Legal obligation on the public to complete a Census questionnaire accurately
• A minority didn't provide such information
• Item non-response
  – Leave questions unanswered
  – Make mistakes (e.g. neglect to follow questionnaire instructions)
  – Provide values that are out of range (e.g. born in 1791)
• Item inconsistency
  – Captured values not consistent with other values on the questionnaire (e.g. a 6-year-old mother)
• Non-response
  – Don't fill in the questionnaire at all

Page 4

Background

• It is NISRA's policy to report estimates for the entire population. Imputation was therefore used to:
  – Correct for non-response: estimate the missing persons and households
  – Correct for item non-response: fill the gaps left by unanswered questions
  – Correct for item inconsistency: ensure that the information provided is consistent
• These types of data quality issues apply equally to any data collection exercise; they are not specific to the Census
• Census Office recognises that users need to be aware of them

Page 5

Background

• While imputation was used to "fill the gaps", its strength comes from the information that was recorded
• It is therefore important to recognise the following:
  – Responses to the Census represent a self-assessment of a respondent's circumstances (proxy responses may be given by the main householder)
  – Respondents didn't always complete the questionnaire correctly
  – 85% of questionnaires were completed on paper forms: handwriting had to be captured using an electronic character recognition system, and while service levels were in place for capture, errors were still possible

Page 6

Two Types of Imputation

• Item Edit and Imputation
  – Correcting a dataset for inconsistencies and item non-response
  – Making each record "complete and consistent"
• Record Imputation
  – The addition of whole records to a dataset
  – Estimates and adjusts for persons missed, duplicated and counted in the wrong place
  – Increases the accuracy of the overall estimates

Page 7

Item Edit and Imputation Strategy

• Primary objective: to produce a complete and consistent database where unobserved distributions were estimated accurately by the imputation process
• There were three key principles:
  1. All changes made maintain the quality of the data
  2. The number of changes to inconsistent data is minimised
  3. As far as possible, missing data should be imputed for all variables to provide a complete and consistent database

Page 8

Item Edit and Imputation Strategy

• In adhering to these principles, the following key aims were defined:
  – Editing must not introduce bias or distortion into the data
  – Editing facilitates the production of output data that is fit for purpose
  – Editing methods help to ensure that pre-determined levels of data quality are met (highest priority given to variables which define population bases, e.g. age and sex)
  – Editing supports the production of the population estimates by ensuring that the basic population estimates are accurate

Page 9

Item Edit and Imputation Strategy

• Used a similar but enhanced version of the framework adopted in 2001 (the One Number Census process), tried and tested in 2001
• Undertaken as part of the Downstream Processing (DSP) project at ONS
  – Included both item and record imputation
  – Supplemented by detailed QA at every stage by NISRA Census Office
  – NISRA benefitted from enhancements to the system found through ONS data processing
  – Ultimately NISRA was responsible for processing of NI data and any parameter tweaking / re-runs

Page 10

Imputation Process – 4 Key Stages

1. Cleansing the Data – Capture and Coding; RMR; FRDVP
2. Item Imputation – Edits; Donor Imputation; Manual Imputation
3. Coverage Assessment and Adjustment
4. Post-Coverage Item Imputation


Page 12

Implementation – Capture and Coding

• Capture and coding rules: turning tick and text responses into data that could be edited and imputed
  – Complex coding used to assign numerical values to written text and ticked boxes (e.g. occupation and industry coding)
  – Invalid responses flagged for imputation (V, W, Y and Z)
• Determinations made on responses to resolve combinations of tick and text
  – Ticks that could not be determined set to W (failed multi-tick)
  – Text that was uncodeable set to V (uncodeable text response)
• Data subject to checks to ensure each question response was within a predefined range (e.g. no year of birth before 1896 or after 2011)
  – Invalid responses set to Z (out of range)
• Missing data flagged as Y (missing, requires imputation)

Page 13

Implementation – Capture and Coding

• Determining combinations of ticks – single tick

Coding rules for a four-box tick question (the text column is N/A throughout):

  RESPONSE (Tick)        CODING RULE                                     OUTPUT CODE
  Single tick            Accept single tick (1–4)                        1–4
  Two ticks              1 + 2, code as 2                                2
                         2 + 3, code as 3                                3
                         3 + 4, code as 4                                4
                         Any other combination of two ticks, code as W   W
  Three or more ticks    Code as W                                       W
  None                   Code as Y                                       Y
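The tick-coding rules above can be sketched in code. A minimal illustration (a hypothetical helper, not the production capture system) for a four-box question:

```python
# Sketch of the tick-coding rules in the table above for a 4-box question.
# Output codes: "1"-"4" accepted value, "W" failed multi-tick, "Y" missing.

def code_ticks(ticks):
    """ticks: list of box numbers (1-4) the respondent marked."""
    ticks = sorted(set(ticks))
    if len(ticks) == 0:
        return "Y"                      # no response: flag as missing
    if len(ticks) == 1:
        return str(ticks[0])            # single tick accepted as-is
    if len(ticks) == 2:
        # Adjacent pairs resolve to the higher box; anything else fails.
        if ticks in ([1, 2], [2, 3], [3, 4]):
            return str(ticks[1])
        return "W"                      # irresolvable two-tick combination
    return "W"                          # three or more ticks: failed multi-tick

print(code_ticks([2]))        # "2"
print(code_ticks([1, 2]))     # "2"  resolvable multi-tick
print(code_ticks([1, 3]))     # "W"  irresolvable, will be imputed
print(code_ticks([]))         # "Y"  missing, will be imputed
```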

Page 14

Implementation – Capture and Coding

• Determining combinations of ticks – resolvable multi-tick

(Coding rules as on the previous slide: two adjacent ticks, e.g. 1 + 2, are resolved to the higher box.)

Page 15

Implementation – Capture and Coding

• Determining combinations of ticks – irresolvable multi-tick

(Coding rules as on the previous slides: any other combination of two ticks, or three or more ticks, is coded W.)

This will be assumed missing and imputed.

Page 16

Implementation – Capture and Coding

• Missing data

(Coding rules as on the previous slides: no tick at all is coded Y.)

This will be imputed.

Page 17

Implementation – Capture and Coding

• Resolving write-ins – numbers

Coding rules for a two-character constrained numeric write-in field (no tick boxes):

  RESPONSE (Text)    CODING RULE                                              OUTPUT CODE
  Text               Code text; accept 00 to 99. Any number above 99,
                     or invalid text, code as ZZ                              00–99, ZZ
  None               Code as YY                                               YY

If the answer is written as a word within the two-character constrained field, input the equivalent numeric value. If only one digit is entered with a space or a non-numeric character, right-justify and precede with a zero.

Example: a write-in of "1" is captured as 01.
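These write-in rules can likewise be sketched. A minimal illustration, assuming a small hypothetical word list (the capture system's actual dictionary is not documented here):

```python
# Sketch of the two-character write-in coding rules above (hypothetical
# helper, not the production system): accept 00-99, right-justify single
# digits with a leading zero, ZZ for out-of-range/invalid, YY for missing.

WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
# (illustrative word list only)

def code_write_in(raw):
    raw = raw.strip().lower()
    if raw == "":
        return "YY"                       # no response: code as missing
    if raw in WORDS:
        raw = str(WORDS[raw])             # word answers get the numeric value
    if raw.isdigit():
        value = int(raw)
        if 0 <= value <= 99:
            return f"{value:02d}"         # right-justify, pad with a zero
        return "ZZ"                       # above 99: out of range
    return "ZZ"                           # uncodeable text

print(code_write_in("1"))     # "01"
print(code_write_in("two"))   # "02"
print(code_write_in("199"))   # "ZZ" -> assumed missing and imputed
print(code_write_in(""))      # "YY"
```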

Page 18

Implementation – Capture and Coding

• Resolving write-ins – numbers

(Coding rules as on the previous slide.)

Example: a write-in of "two" is captured as 02.

Page 19

Implementation – Capture and Coding

• Resolving write-ins – range check

(Coding rules as on the previous slides.)

Example: a write-in of "199" is above 99, so it is coded ZZ. This will be assumed missing and imputed.

Page 20

Implementation – Capture and Coding

• Resolving write-ins – codeable response

Example: the write-in "FRANCE" is coded to 250.

Page 21

Implementation – Capture and Coding

• Resolving write-ins – uncodeable response

Example: the write-in "SUGAR" is clearly not a country, so it is set to VVV. This will be assumed missing and imputed.

Page 22

Imputation Process – 4 Key Stages (recap): 1. Cleansing the Data (Capture and Coding; RMR; FRDVP) → 2. Item Imputation (Edits; Donor Imputation; Manual Imputation) → 3. Coverage Assessment and Adjustment → 4. Post-Coverage Item Imputation

Page 23

Implementation – RMR

• Reconcile Multiple Responses (RMR)
  – Removal of false persons: persons generated by capture anomalies, for example strike-throughs or inadequately completed questionnaires
  – Removal of duplicates (multiple persons / households): individuals who included themselves more than once; separated parents who included their children at both addresses
  – Creating households / communals from multiple questionnaires: consolidating H4 / HC4 / I4 etc.
  – Validation: renumbering person records within households / communals

Page 24

Imputation Process – 4 Key Stages (recap): 1. Cleansing the Data (Capture and Coding; RMR; FRDVP) → 2. Item Imputation (Edits; Donor Imputation; Manual Imputation) → 3. Coverage Assessment and Adjustment → 4. Post-Coverage Item Imputation

Page 25

Implementation – FRDVP

• Filter Rules and Derived Variables for Processing (FRDVP)
  – Correct data by applying edits for questionnaire routing errors
  – Apply hard edits to keep individual records consistent (minimal at this stage; mostly applied within the imputation system)
  – Information not required is set to X; no imputation is done on any variable set to X
  – Create high-level variables to be used within the item imputation system (blocking variables for donor searching, making it easier to find donors)

Page 26

Implementation – FRDVP

• Surplus information – questionnaire routing

In this scenario the respondent should have skipped question 6 and gone straight to question 7. Since question 6 should not have been answered, it is set to X (not required).

Page 27

Implementation – FRDVP

• Surplus information – questionnaire consistency

Example (captured fields): usual address one year ago "9 LORD WARDENS CRESCENT, BT19 1YJ"; date of birth 01 01 2011.

In this scenario, since the respondent is aged under 1 on Census Day and therefore did not have a usual address one year ago, the captured address information is set to X.
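A routing edit of this kind can be sketched as follows; the field names (`age`, `address_one_year_ago`) are hypothetical, chosen only for illustration:

```python
# Minimal sketch of an FRDVP-style consistency edit (field names are
# hypothetical): a person aged under 1 on Census Day cannot have had a usual
# address one year ago, so any captured value for it is surplus and set to X.

def apply_routing_edit(record):
    """record: dict of captured question responses for one person."""
    if record.get("age") is not None and record["age"] < 1:
        # Information not required -> X; X-coded items are never imputed.
        record["address_one_year_ago"] = "X"
    return record

rec = {"age": 0, "address_one_year_ago": "9 LORD WARDENS CRESCENT BT19 1YJ"}
print(apply_routing_edit(rec)["address_one_year_ago"])  # "X"
```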

Page 28

Imputation Process – 4 Key Stages (recap): 1. Cleansing the Data (Capture and Coding; RMR; FRDVP) → 2. Item Imputation (Edits; Donor Imputation; Manual Imputation) → 3. Coverage Assessment and Adjustment → 4. Post-Coverage Item Imputation

Page 29

Implementation – Item Imputation

• Achieved using CANCEIS, the CANadian Census Edit and Imputation System
  – Developed specifically for Census-type data, i.e. a mix of categorical and numerical variables
  – A donor-based edit and imputation system that can simultaneously apply nearest-neighbour donor imputation and apply deterministic edits to maintain consistency
• Evaluated and endorsed as the 2011 Census imputation tool
  – Faster
  – Less resource-intensive
  – Allowed for more joint imputation

Page 30

Implementation – CANCEIS

• How did CANCEIS work in practice?
• The database was divided into processing units, for resource management and to maximise donor pools
  – Three geographic units
  – Household questions: individual imputation; donor unit = household
  – Person questions, in three groups:
    – Household persons 1 to 6: joint household imputation (between-person edits and relationships); donor unit = household of the same size
    – Household persons 7 to 30: individual imputation (relationship to Person 1); donor unit = individual person
    – Communal persons: individual imputation; donor unit = individual person

Page 31

Delivery Groups (Processing Units)

Page 32

Implementation – CANCEIS

(Processing-unit breakdown as on Page 30.)

Page 33

Implementation – CANCEIS

• How did CANCEIS work in practice?
• The household questions were imputed within a single module
• Person data was divided into 4 modules; the aim was to group variables that help predict each other and to maximise the number of donors for a given group:
  – Demographics: e.g. age, sex, marital status, student, activity last week
  – Culture: e.g. ethnicity, country of birth, language, passports
  – Health: e.g. general health, disability, long-term condition
  – Labour Market: e.g. economic activity, hours worked, qualifications

Page 34

Implementation – CANCEIS

• How were the donors selected?
  – Within each module, a number of matching variables were used to select donors
  – Matching variables were weighted according to several factors: how well they would predict other values, and how highly they should be prioritised when resolving inconsistencies
  – For example, age is often a good predictor of other demographic variables, so age was given a high weight; observed ages were therefore prioritised over other values when there was an inconsistency and changes were required
  – Northings and eastings were used to control for geographical differences and find donors from similar areas; these were given a small weight compared to demographic characteristics such as age, sex and marital status
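The weighted nearest-neighbour matching described above can be sketched as a distance function; the weights, field names and coordinates below are purely illustrative, not the production parameters:

```python
# Hedged sketch of weighted nearest-neighbour donor matching: categorical
# mismatches carry large weights, geographic distance a deliberately small
# one, so a demographically similar donor beats a merely nearby one.

def donor_distance(recipient, donor, weights, geo_weight=0.0001):
    """Smaller distance = better donor."""
    d = 0.0
    for var, w in weights.items():   # e.g. age, sex, marital status
        if recipient[var] != donor[var]:
            d += w                   # penalise each mismatch by its weight
    # Northings/eastings control for geography with a small weight
    d += geo_weight * (abs(recipient["east"] - donor["east"])
                       + abs(recipient["north"] - donor["north"]))
    return d

weights = {"age": 10.0, "sex": 8.0, "marital": 5.0}   # illustrative weights
recipient = {"age": 30, "sex": "F", "marital": "Married",
             "east": 330000, "north": 370000}
donors = [
    # nearby but different marital status
    {"age": 30, "sex": "F", "marital": "Single", "east": 331000, "north": 370500},
    # further away but a full demographic match
    {"age": 30, "sex": "F", "marital": "Married", "east": 335000, "north": 372000},
]
best = min(donors, key=lambda d: donor_distance(recipient, d, weights))
print(best["marital"])   # "Married": the matching donor wins despite distance
```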

Page 35

Implementation – CANCEIS

• Matching variables (example)
  – Suppose someone omitted to fill in their occupation details
  – The record would be flagged for imputation under the Labour Market module
  – The donor pool would be identified by matching on (for example): Economic Activity, Industry, Hours worked, Qualifications
  – These variables were deemed to influence Occupation
  – Occupation information was then imputed from a donor with similar labour-market characteristics

Page 36

Implementation – CANCEIS

• Editing and imputing were done simultaneously
  – Each record was checked for consistency before imputation
  – Any items that failed the checks were marked for imputation along with the missing items
• A single donor was selected to resolve inconsistencies and non-response
  – Only values which satisfied the edit constraints were imputed into the recipient record
  – CANCEIS sought to minimise the number of changes required to repair a record when edit constraints were in place
• There were 31 edit rules, broadly based on 2001
  – e.g. if aged between 5 and 15 then must be in full-time education
• Some rules had to be updated to account for changes since 2001
  – e.g. removal of the rule that did not allow same-sex couples, replaced with rules that said married couples had to be opposite-sex and civil partners had to be same-sex

Page 37

Implementation – CANCEIS

• Say we have the following (oversimplified) example:

  Record ID   Age   Marital Status   Student
  Obs1        10    Married          -9 (Missing)

• Student is missing, so it requires imputation under the Demographics module
• This record is subject to two edit constraints:
  A. Must be aged 16+ to be married
  B. Aged 5 to 15 must be a student
• It fails Rule A, since the person is aged 10 and married
• Therefore, both Age and Marital Status are also flagged for imputation


Page 39

Implementation – CANCEIS

• The system searches for potential donors, matching on demographic variables and using northings and eastings to find a donor in the area
• The following records are returned:

  Record ID   Age   Marital Status   Student
  Donor1      4     Single           No
  Donor2      12    Single           Yes

Page 40

Implementation – CANCEIS

A. Must be aged 16+ to be married
B. Aged 5 to 15 must be a student

  Record ID   Age   Marital Status   Student
  Obs1        10    Married          -9 (Missing)
  Donor1      4     Single           No
  Donor2      12    Single           Yes
  New1        –     –                –

Page 41

Implementation – CANCEIS

A. Must be aged 16+ to be married
B. Aged 5 to 15 must be a student

• Donor1: using Donor1 would mean that "Single" is taken as well as "No"

  Record ID   Age   Marital Status   Student
  Obs1        10    Married          -9 (Missing)
  Donor1      4     Single           No
  Donor2      12    Single           Yes
  New1        10    Single           No

Page 42

Implementation – CANCEIS

A. Must be aged 16+ to be married
B. Aged 5 to 15 must be a student

• Donor1: using Donor1 would mean that "Single" is taken as well as "No"
• The new record fails Rule B, so Age is taken from the donor as well

  Record ID   Age   Marital Status   Student
  Obs1        10    Married          -9 (Missing)
  Donor1      4     Single           No
  Donor2      12    Single           Yes
  New1        10    Single           No

Page 43

Implementation – CANCEIS

A. Must be aged 16+ to be married
B. Aged 5 to 15 must be a student

• Donor1: using Donor1 would mean that "Single" is taken as well as "No"
• The new record fails Rule B, so Age is taken from the donor as well

  Record ID   Age   Marital Status   Student
  Obs1        10    Married          -9 (Missing)
  Donor1      4     Single           No
  Donor2      12    Single           Yes
  New1        4     Single           No

  → Two observed value changes

Page 44

Implementation – CANCEIS

A. Must be aged 16+ to be married
B. Aged 5 to 15 must be a student

• Donor2: using Donor2 would mean that "Single" is taken as well as "Yes"

  Record ID   Age   Marital Status   Student
  Obs1        10    Married          -9 (Missing)
  Donor1      4     Single           No
  Donor2      12    Single           Yes
  New1        10    Single           Yes

Page 45

Implementation – CANCEIS

A. Must be aged 16+ to be married
B. Aged 5 to 15 must be a student

• Donor2: using Donor2 would mean that "Single" is taken as well as "Yes"
• The new record passes both Rule A and Rule B

  Record ID   Age   Marital Status   Student
  Obs1        10    Married          -9 (Missing)
  Donor1      4     Single           No
  Donor2      12    Single           Yes
  New1        10    Single           Yes

  → Only one observed value change

Page 46

Implementation – CANCEIS

A. Must be aged 16+ to be married
B. Aged 5 to 15 must be a student

• Donor2: using Donor2 would mean that "Single" is taken as well as "Yes"
• The new record passes both Rule A and Rule B
• Donor2 is therefore given a higher probability of selection

  Record ID   Age   Marital Status   Student
  Obs1        10    Married          -9 (Missing)
  Donor1      4     Single           No
  Donor2      12    Single           Yes
  New1        10    Single           Yes

  → Only one observed value change
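The donor comparison in this worked example can be reproduced with a toy re-implementation. This is not CANCEIS itself, just the minimum-change idea applied to Rules A and B:

```python
# Toy sketch of the worked example above: both candidate donors can repair
# the record, but CANCEIS-style selection favours the donor that needs the
# fewest changes to *observed* values.

def passes_edits(rec):
    age, marital, student = rec
    if marital == "Married" and age < 16:      # Rule A
        return False
    if 5 <= age <= 15 and student != "Yes":    # Rule B
        return False
    return True

def changes_needed(obs, donor):
    """Impute the missing Student item, then take further donor values
    until the record passes the edits; count changes to observed values."""
    age, marital, _ = obs
    new = [age, marital, donor[2]]             # fill the missing item first
    changed = 0
    for i in (1, 0):                           # then marital status, then age
        if passes_edits(new):
            break
        new[i] = donor[i]
        changed += 1
    return changed, tuple(new)

obs1 = (10, "Married", None)                   # fails Rule A
donor1 = (4, "Single", "No")
donor2 = (12, "Single", "Yes")

print(changes_needed(obs1, donor1))   # (2, (4, 'Single', 'No'))
print(changes_needed(obs1, donor2))   # (1, (10, 'Single', 'Yes'))
```

Donor2 repairs the record with a single change to an observed value, so it would be favoured, exactly as on the slides.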

Page 47

Implementation – CANCEIS

• Points to note:
  – Variables were imputed in blocks of similar variables (modules); there was no individual model for any one question
  – There is independence between the modules; for example, cultural characteristics might come from a different donor to employment characteristics
  – Imputed person data was combined in a way that maintained relationship consistency within a household
  – Given the processing approach, quality was maintained at the geographic-unit level

Page 48

Imputation Process – 4 Key Stages (recap): 1. Cleansing the Data (Capture and Coding; RMR; FRDVP) → 2. Item Imputation (Edits; Donor Imputation; Manual Imputation) → 3. Coverage Assessment and Adjustment → 4. Post-Coverage Item Imputation

Page 49

Implementation – Manual Imputation

• Manual imputation was kept to a minimum, but was necessary
• Manual imputation – QA checks
  – Quality assurance at every stage of processing
  – Distributional checks and checks against comparator data sources
  – Edits made through Data File Amendments (DFAs)
  – DFAs not taken lightly: they involved detailed questionnaire-image analysis, mostly correcting for capture errors (e.g. centenarians)
• Manual imputation to increase the donor pool
  – Temporary changes were sometimes required when the donor pool was too small
  – e.g. postcode matching (would have been done later in processing but was brought forward)
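A distributional check of the kind mentioned above can be sketched as follows (illustrative data, not NISRA's actual QA tooling): compare a variable's category shares before and after imputation to confirm that item imputation left the marginal distribution essentially unchanged.

```python
# Sketch of a distributional QA check: the share of each category among
# observed values before imputation should closely match the share among
# all values after imputation.

from collections import Counter

def shares(values):
    """Category proportions among non-missing values."""
    observed = [v for v in values if v is not None]
    n = len(observed)
    return {k: c / n for k, c in Counter(observed).items()}

# Illustrative data: None marks a missing item before imputation
before = ["Good", "Good", "Fair", None, "Bad", "Good", None, "Fair"]
after  = ["Good", "Good", "Fair", "Good", "Bad", "Good", "Fair", "Fair"]

b, a = shares(before), shares(after)
drift = {k: round(a.get(k, 0) - b.get(k, 0), 3) for k in set(a) | set(b)}
print(drift)   # small differences suggest the marginal distribution held
```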

Page 50

Imputation Process – 4 Key Stages (recap): 1. Cleansing the Data (Capture and Coding; RMR; FRDVP) → 2. Item Imputation (Edits; Donor Imputation; Manual Imputation) → 3. Coverage Assessment and Adjustment → 4. Post-Coverage Item Imputation

Page 51

Implementation – Coverage

• Coverage Assessment and Adjustment (Record Imputation)
  – Estimating wholly missed households and/or missing persons within households:
    – Enumerated persons (92%)
    – Census Under-enumeration project (CUE) (4%)
    – Census Coverage Survey (CCS) (5%)
• Further information can be found at http://www.nisra.gov.uk/Census/pop_QA_2011.pdf

Page 52

Imputation Process – 4 Key Stages (recap): 1. Cleansing the Data (Capture and Coding; RMR; FRDVP) → 2. Item Imputation (Edits; Donor Imputation; Manual Imputation) → 3. Coverage Assessment and Adjustment → 4. Post-Coverage Item Imputation

Page 53

Implementation – Post-Coverage

• Post-Coverage Item Imputation
  – Making wholly imputed records complete and consistent
  – Using the same methods as the initial item imputation
  – Required only basic demographic information to be available for each record
  – Final check for consistency

Page 54

Considerations

• Self-completion
  – Incorrect information provided (e.g. a mother putting down the wrong age for her baby)
  – Poor understanding of the question or layout (marital status / relationships)
• Some capture errors exist
  – e.g. a date of birth captured as 1961 instead of 1981 – still valid within the family
  – Strike-throughs
• Item imputation assumes data are missing at random (MAR)
  – It has to – no other assumption can be made
  – Attempts to control for dependency by using modules
  – Negligible change to marginal distributions
• Record imputation doesn't assume MAR
  – Designed to correct for under-coverage, which is not uniform
  – This imputation will change variable distributions
  – Extent of change driven by the CCS and CUE

Page 55

Considerations

• While based on a similar approach to 2001, some differences exist that can affect imputation rates
  – Changes to definitions (e.g. marital status)
  – Some questions are quite similar but subtly different (e.g. religion / qualifications)
  – Change in processing ability: workplace postcode matching was much easier in 2011
• Census QA undertaken at every stage
  – Census assessed against various comparator datasets
  – However, unable to compare Census to Census unit records: the 2001 to 2011 link was not available when processing

Page 56

Information

• Information already available:
  – ONS paper on the Item Edit and Imputation process
  – ONS evaluation report on Item Edit and Imputation
  – 2011 NI Census Methodology Overview: http://www.nisra.gov.uk/Census/pop_meth_2011.pdf
  – Details on the NI Census Under-enumeration project: 2011 Census Under Enumeration Project: Methodology paper
  – 2011 NI Census quality papers:
    http://www.nisra.gov.uk/Census/pop_QA_2011.pdf
    http://www.nisra.gov.uk/Census/pop_QA_2_2011.pdf
    http://www.nisra.gov.uk/Census/key_QA_2011.pdf
  – Census Quality Survey: http://www.nisra.gov.uk/archive/census/2011/census_quality_survey.pdf

Page 57

Next Steps

• Census imputation rates will be published in due course
• Change rates are available on the NILS website
  – Note that these are change rates, not imputation rates
  – Imputation rates are expressed as a percentage of expected response rather than total response
• Most people filled in most of the questionnaire; a small proportion didn't; robust procedures were applied to "fill the gaps"

Page 58

Questions