2010 World Population and Housing Census Programme United Nations Statistics Division
1
Population and Housing Census Editing
Department of Economic and Social Development
United Nations Statistics Division
Studies in Methods, Series F, No.82
2
U.S. Census Bureau, International Programs Center
www.census.gov/ipc/www
Microcomputer Processing of Census and Surveys (using the Census and Survey Processing System – CSPro)
3
Processing of Census, Survey or Other Form
[Diagram: Forms to Computer to Products to Data User]
4
Two ways of thinking
[Diagram: Information (Questionnaire), Computer (Data File), Products (Reports, Tables, Thematic Maps, Graphs, …)]
5
Data Processing Stages:
1. Get ready for enumeration
2. Monitor and evaluate enumeration
3. Capture the data
4. Validate the data [edit]
5. Produce products
6
Is there a Magic Button to help us?
7
8
Data Capture
[Diagram: Forms to Data Capture to Data File]
9
Data Products
Tabulations
Graphs
Maps
10
The Goal
Produce useful products from census/survey information.
Useful products are those that meet the needs of the user community.
Produce these products in a quick and efficient manner.
11
Resource Criteria for Census and Survey Processing
Time
Accuracy
Money
Staff
Regularity
Products
12
Data Processing: Easy as 1-2-3?
1. Capture the information
2. Validate the information
3. Produce the data products
13
What software?
There is a lot of data processing software available!
Which best fits your needs? Do you need training? Do you need money? Do you need help?
14
Why use CSPro?
Designed for census & survey processing
Easy to use
Modular in design
Can be used by novices and/or experts
Free
Excellent support
Windows environment
15
Census and Survey Processing Software (CSPro)
Tabulations
File descriptions (dictionary)
Data entry applications
Edit applications
Dissemination Products
16
CSPro (Census and Survey Processing) is a public-domain software package for
Entering, Tabulating, Editing, and Mapping
Census and Survey data
1. Create Products [tables, maps, etc.]
2. Disseminate the results
CSPro was designed and implemented through a joint effort among the developers of IMPS and ISSA: the United States Census Bureau, Macro International, and Serpro, S.A. Funding for the development is provided by the Office of Population of the United States Agency for International Development.
CSPro is designed to eventually replace both IMPS and ISSA.
17
Data Dictionary
The data dictionary is the base for most of the parts of CSPro.
These parts include:
Data entry (CSEntry)
Data editing (CSBatch)
Tabulation
18
Data File Design
How are data stored in the data file?
What is a case? What is on a record? How many records?
19
Objectives:
1. Understand elements of a data file
2. Describe a field, record, and questionnaire
3. Describe data file structures
4. Learn how the CSPro data dictionary defines these elements
20
Needed information about data file
Need identification fields
Need information / data fields
Need “SIZE” [how many characters]
Need valid values/codes
21
Data File Structure: ASCII/text
ALL data on ONE record/line
Different types of data on DIFFERENT record/lines
22
Data Processing/Data File Terminology
Item/variable/field
Record
Questionnaire (Case)
Data file
23
Item/variable/field
Is a single piece of information
Has the attributes of:
Size
Type: Numeric/Alphanumeric
Examples:
Age: 51
Sex: M
Income: -987.65
24
Record
A collection of related items forming a single line of information.
For example:
Housing Record: contains information about the house
Population Record: contains information about each person in the house
25
Case/Questionnaire
all the records of all types for a processing unit such as a household
26
A data file is
a collection of all the questionnaires (cases)
27
CSPro Data Dictionary
1. Field names/labels
2. Field size
3. Field location
4. Field attributes
5. Record names
6. Record types
7. Record IDs
8. Records allowed by type
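As a rough illustration only, written in Python rather than in CSPro's own dictionary format, the items listed above can be pictured as a small table of field definitions that is used to cut a fixed-width text line into named values. The record layout, field names, and column positions below are hypothetical and are not taken from any real census dictionary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str      # field name
    start: int     # 1-based starting column (field location)
    size: int      # number of characters (field size)
    numeric: bool  # field attribute: numeric or alphanumeric

# Hypothetical layout: column 1 is the record type, columns 2-5 the household ID.
POPULATION_RECORD = [
    Field("RECTYPE", 1, 1, False),
    Field("HHID", 2, 4, True),
    Field("AGE", 6, 2, True),
    Field("SEX", 8, 1, False),
]

def parse_record(line, fields):
    """Cut one text line into named values using the field definitions."""
    values = {}
    for f in fields:
        raw = line[f.start - 1 : f.start - 1 + f.size]
        if f.numeric:
            # A blank numeric field is kept as None so an edit can find it later.
            values[f.name] = int(raw) if raw.strip() else None
        else:
            values[f.name] = raw
    return values

person = parse_record("P000151M", POPULATION_RECORD)
```

Here `person` holds the record type, household ID, age, and sex as separate values; a blank age would come back as `None` rather than as a guessed number.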
28
Questionnaire sections
29
One Section ==> One record
30
From the questionnaire to the data file
(one record type)
31
From the questionnaire to the data file
(What are the data?)
32
From the questionnaire to the data file
(Where are the data?)
33
Data Dictionary describes the data file
34
CSPro Support
Web site: http://www.census.gov/ipc/www/
E-mail: [email protected]
[End of CSPro demonstration]
35
UN Editing Handbook
Uses Principles and Recommendations as base
Covers how editing fits into whole process
Describes different types of edits
Gives examples
36
Purpose of Handbook
No census data are ever perfect
Changes are made -- little documentation
Promote communication between subject specialists and programmers
“Cookbook” of suggestions -- presents possible resolutions
But country edit teams must decide
37
Major Elements in a Census
Preparatory work
Enumeration
Data processing -- keying, editing and tabulations
Building data bases and dissemination
Evaluation of results
Analysis of results
38
Errors in Census Process
Coverage errors
Questionnaire design
Enumerator/respondent errors
Coding errors
Data entry errors
Computer editing errors
Tabulation errors
39
Errors Generated During Census Processing
Activity            Type of Error
Enumeration         Respondent errors, Enumerator errors
Field Editing       Field checking, Office checking
Office Coding       Miscodes
Data Capture        Miskeys
Computer Editing    Logic errors, Misallocation, Miscorrection
Tabulation          Distribution of unknowns
Publication         Misprints
(Activities run in sequence, from Enumeration down to Publication.)
40
Editing in Historical Perspective
Before computers: manual editing
With computers:
Increased complexity
Automated changes
Generalized editing packages
New philosophies of editing
Personal computers
Appropriate levels of computer editing
41
Editing Team
Appropriate internal subject matter specialists
Computer programmers
Work together as a team
Edit specs as means of communication
Outside experts -- academicians
Outside experts -- private sector
42
WHAT CENSUS EDITING SHOULD DO
1. Give users measures of the quality of the data
2. Identify the types and sources of error, and
3. Provide adjusted census results
43
TABLE 1. SAMPLE POPULATION BY 15-YEAR AGE GROUP AND SEX, USING UNEDITED AND EDITED DATA
                      Unedited data                        Edited data
Age group             Total   Male   Female   Not reported   Total   Male   Female
Total                 4147    2033   2091     23             4147    2045   2102
Less than 15 years    1639    799    825      15             1743    855    888
15 to 29 years        1256    612    643      1              1217    603    614
30 to 44 years        727     356    369      2              695     338    357
45 to 59 years        360     194    166      0              341     182    159
60 to 74 years        116     54     59       3              114     53     61
75 years and over     34      12     22       0              37      14     23
Not reported          15      6      7        2
Sample table with & without unknowns
44
TABLE 2. POPULATION AND POPULATION CHANGE BY 15-YEAR AGE GROUP WITH UNKNOWNS: 1990 AND 2000
                      Numbers          Number    Per cent   Per cent
Age group             2000    1990     change    change     2000    1990
Total                 4147    3319     828       24.9       100.0   100.0
Less than 15 years    1639    1348     291       21.6       39.5    40.6
15 to 29 years        1256    902      354       39.2       30.3    27.2
30 to 44 years        727     538      189       35.1       17.5    16.2
45 to 59 years        360     200      160       80.0       8.7     6.0
60 to 74 years        116     89       27        30.3       2.8     2.7
75 years and over     34      25       9         36.0       0.8     0.8
Not reported          15      217      -202      -93.1      0.4     6.5
Table showing trends with unknowns
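The arithmetic behind Table 2 can be checked with a few lines of Python. This is only a sketch of the calculation (the function names are made up for the example); the counts are copied from the table. It shows how the drop in "Not reported" from 217 to 15 produces a change of -93.1 per cent, which is why unknowns left in the data make trends hard to read.

```python
# Counts copied from Table 2: age group -> (2000, 1990)
COUNTS = {
    "Total": (4147, 3319),
    "Less than 15 years": (1639, 1348),
    "15 to 29 years": (1256, 902),
    "30 to 44 years": (727, 538),
    "45 to 59 years": (360, 200),
    "60 to 74 years": (116, 89),
    "75 years and over": (34, 25),
    "Not reported": (15, 217),
}

def change(count_2000, count_1990):
    """Return (number change, per cent change rounded to one decimal)."""
    diff = count_2000 - count_1990
    return diff, round(100.0 * diff / count_1990, 1)

def share(count, total):
    """Per cent of the total, rounded to one decimal."""
    return round(100.0 * count / total, 1)

total_change = change(*COUNTS["Total"])
not_reported_change = change(*COUNTS["Not reported"])
```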
45
Basics of Census Editing
Systematic inspection and change (not always correction)
Fatal edits -- invalid or missing entries
Query edits -- inconsistencies
Must preserve the original data as much as possible
Quality enumeration more important than editing
Edit does not improve data quality -- makes the data more esthetic
Team must determine how far to go
46
More of Basics
Over-editing is harmful
Treatment of unknowns
Spurious changes
Determining tolerances
Learning from the edit process
Quality assurance
Costs of editing
Imputation
Archiving
47
How Over-editing is Harmful
Timeliness
Finances
Distortion of true values
A false sense of security
48
Editing Applications
Manual versus automatic correction
Guidelines for correcting data
Validity and consistency checks
Methods of correcting and imputing data
Other editing systems
49
Manual versus Automatic Correction
Manual correction: takes a long time and very subject to error
Automatic correction: faster and consistent. Not necessarily correct, just consistent.
Can look at many variables at the same time
Can keep an audit trail
50
Guidelines for Correcting Data
Make the fewest required changes possible to the originally collected data
Eliminate obvious inconsistencies among the entries
Systematically supply entries for erroneous or missing items by using other entries for the housing unit, person, or other persons in the household or group
When appropriate, use “not reported”
51
52
Dangers in editing
Male with fertility – so fertility deleted
Second male in spouse pair made female
Then, female without fertility – so fertility imputed
So, before one error – now the initial error remains, but we have three MORE errors
53
Example of B-A-D Edit Changes
Person Relationship Sex Children ever born
Unedited data
1 Head of household Male 03
2 Spouse Male BLANK
Data after editing for sex
1 Head of household Female 03
2 Spouse Male BLANK
54
Sample house for hotdeck example
ID number   Relationship   Sex   Age
1           1              1     39
2           2              2     35
3           3              1     13
4           3              [ ]   10
5           4              2     40
6           4              1     [ ]
7           4              2     13
8           5              [ ]   [ ]
9           5              1     44
10          5              2     36
55
Initial and Final Hot Deck Values for single family
Initial values Relationships
Head of household Spouse Son/daughter Other relative Non-relative
(1) (2) (3) (4) (5)
Male (1) 35 35 12 40 40
Female (2) 32 32 12 37 37
Values after changes
Relationships
Head of household Spouse Son/daughter Other relative Non-relative
(1) (2) (3) (4) (5)
Male (1) 39 35 13 40 44
Female (2) 32 35 12 13 36
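The step from the initial values to the values after changes can be traced with a short Python sketch. Two things here are assumptions rather than facts from the slides: the household listing is one reading of the sample household (relationship, sex, age, with blanks as `None`), and the update rule (a person updates the deck only when both sex and age are valid) is a guess that happens to reproduce the "values after changes" table.

```python
# Hot deck of ages indexed by (sex, relationship); initial values from the slide.
# Sex: 1 = male, 2 = female. Relationship: 1 = head ... 5 = non-relative.
deck = {
    (1, 1): 35, (1, 2): 35, (1, 3): 12, (1, 4): 40, (1, 5): 40,
    (2, 1): 32, (2, 2): 32, (2, 3): 12, (2, 4): 37, (2, 5): 37,
}

# Sample household as (relationship, sex, age); None marks a blank entry.
household = [
    (1, 1, 39), (2, 2, 35), (3, 1, 13), (3, None, 10), (4, 2, 40),
    (4, 1, None), (4, 2, 13), (5, None, None), (5, 1, 44), (5, 2, 36),
]

imputed = []  # (person number, imputed age)
for number, (rel, sex, age) in enumerate(household, start=1):
    if sex is None:
        continue  # sex would be edited first; this sketch leaves it alone
    if age is None:
        imputed.append((number, deck[(sex, rel)]))  # take the stored donor value
    else:
        deck[(sex, rel)] = age  # a valid response becomes the new donor value

male_row = [deck[(1, rel)] for rel in range(1, 6)]
female_row = [deck[(2, rel)] for rel in range(1, 6)]
```

Under these assumptions person 6 (male, other relative, age blank) receives the stored age 40, and the two rows end up matching the second table above.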
56
Validity and Consistency Checks
1. Top-down editing approach
2. Multiple variable edit
3. Coding considerations
57
1. Top Down Approach: Order of Edits
HOUSING VARIABLES ON QUESTIONNAIRE
Type of Dwelling, Rooms, Walls, Roof, Tenure
HOUSING VARIABLES – ORDER OF EDITS
Tenure, Type of Dwelling, Rooms, Walls, Roof
58
2. Multiple Variable Approach – Young Widowed Head with 3 Children
Number Rule Relation Sex Age MarStat Fertility
1 Head of household should be 15 years or older 1 1
2 Spouse should be 15 years or older
3 A “spouse” should be married
4 If spouse present, head of household should be married
5 If spouse present, head of household and spouse should be opposite sex
6 Person less than 15 years old should be never married 1 1
7 Male should have no fertility
8 Female less than 15 years old should have no fertility 1 1
9 For female 15 years or older fertility entry should not be blank
10 A “child” should be younger than head of household
11 A “parent” should be older than head of household
Totals 1 1 3 1 0
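One way to read the totals row is as a tally of how often each variable appears in a failed rule, with the most frequent variable being the first candidate for change. The Python sketch below reproduces that tally. Which variables each failed rule is charged to is an assumption inferred from the totals row (the column positions did not survive in this transcript), and "change the variable with the highest count" is a common convention, not something the slide states outright.

```python
from collections import Counter

# Failed rules for the young widowed head with 3 children, with the variables
# each rule is charged to. The assignment is inferred from the totals row.
FAILED_RULES = {
    1: ("Relation", "Age"),   # head of household should be 15 years or older
    6: ("Age", "MarStat"),    # person under 15 should be never married
    8: ("Sex", "Age"),        # female under 15 should have no fertility
}

VARIABLES = ["Relation", "Sex", "Age", "MarStat", "Fertility"]

# Count how many failed rules involve each variable.
tally = Counter(v for variables in FAILED_RULES.values() for v in variables)
totals = [tally[v] for v in VARIABLES]

# The variable involved in the most failures is the first candidate to change.
to_change = max(VARIABLES, key=lambda v: tally[v])
```

Changing the one item involved in all three failures (age) resolves them together, instead of changing relationship, marital status, and fertility separately.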
59
3. Common Codes Assist in Editing
Group Birthplace Citizenship Language Ethnicity
France/French 10 10 10 10
Spain/Spanish 20 20 20 20
Latin America 25 25 20 25
Philippines/Filipino 30 30 30
Ilokano 32
Tagalog 32
England/English 40 40 40 40
Canada 50 50 40 50
USA 52 52 40 52
60
Methods of Correcting and Imputing Data
1. Change to unknown
2. Static or “Cold Deck” imputation
3. Dynamic or “Hot deck” imputation
61
1. Changing to Unknown – When you don’t have enough information
Usually in censuses, we don’t have enough information to get a good estimate of paid occupations and industries:
if not OCCUPATION in 001:997 then
    errmsg("Occupation is invalid, assign unknown");
    OCCUPATION = 998;
endif;
62
Changing to unknown: Countries choosing not to impute
These days, most countries impute at least items needed for planning and policy determination
If a country still decides not to impute
Then, staff might assign “unknown” even to items used for planning:
if not SEX in 1:2 then
    SEX = 9;
endif;
63
2. Static Imputation – Making young people “Never married”
In Static Imputation, the same value or values are always assigned:
if AGE < 15 then
    if MARITAL_STATUS <> NEVER_MARRIED then
        errmsg("Young person not never married");
        MARITAL_STATUS = NEVER_MARRIED;
    endif;
endif;
64
A kind of static imputation: changing using logical values
Since we have only two sexes, we can alternate between them when invalid or inconsistent values appear:
Keep a cell in the computer’s buffer: XSEX
If SEX is unknown then
    let SEX = XSEX
    change XSEX to the other SEX for the next usage
endif
65
3. Hot Deck Imputation
Geographic considerations
Use of related items
Sequence of the items
Complexity of the matrices
Standardized hot decks
Size of hot decks -- too big, audit trail, too small, difficult items
66
Types of Edits
Structure edits – Bookkeeping, getting each locality within each minor civil division within each major civil division
Content edits – Housing items
Content edits – Population items
Content edits – Inter-record checking
67
Standard Edit: Language Edit
If this is the head and language is missing, first look for someone else in the house with language, and assign that.
If this is the head without language, no one else has language, use neighboring head of similar characteristics to assign a best guess.
If this is someone else in the house and language is missing, assign the head’s language.
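The three rules above can be sketched compactly in Python before looking at the CSPro version. This is an illustration under stated assumptions, not the workshop's code: the head is assumed to be the first person in the list, the valid codes are taken to be 1 to 17, the hot deck is assumed to already hold a value for the head's age group and sex, and the dictionary keys (`age10`, `sex`, `language`) are hypothetical names.

```python
VALID = range(1, 18)  # language codes 1-17 are treated as valid

def edit_language(household, hot_deck):
    """household: list of dicts with 'age10', 'sex', 'language' (head first).
    hot_deck: dict keyed by (age10, sex) holding the last valid head language."""
    head = household[0]
    if head["language"] in VALID:
        # Valid head: remember the value as a donor for later households.
        hot_deck[(head["age10"], head["sex"])] = head["language"]
    else:
        donors = [p["language"] for p in household[1:] if p["language"] in VALID]
        if donors:
            head["language"] = donors[0]  # someone else in the house has it
        else:
            # Nobody in the house has it: use a neighboring head of similar type.
            head["language"] = hot_deck[(head["age10"], head["sex"])]
    for person in household[1:]:
        if person["language"] not in VALID:
            person["language"] = head["language"]  # others take the head's value
    return household

deck = {(3, 1): 2}
house = [
    {"age10": 3, "sex": 1, "language": None},
    {"age10": 3, "sex": 2, "language": 6},
    {"age10": 0, "sex": 1, "language": None},
]
edit_language(house, deck)
```

In this usage the head takes language 6 from the second person, and the third person then takes it from the head; the hot deck is not touched because the head's own value was imputed.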
68
PROC LANGUAGE
errmsg (" ******* Language ************ "), summary;
{ ************** Language edit ************** }
if LANGUAGE in 1:17 then
    if RELAT = 1 then
        ALANGUAGE (AGE10,SEX) = LANGUAGE;
    endif;
else
    if RELAT = 1 then
        PERSONPTR = 0;
        do varying i = 1 until i > TOTOCC (POP_EDT)
            if LANGUAGE (i) in 1:17 then
                PERSONPTR = i;
            endif;
        enddo;
        if PERSONPTR = 0 then
            errmsg("*D05-2* LANGUAGE imputed from Age and Sex, pn= %02d, lang= %01d", PERSNUM,LANGUAGE) denom = denomPop summary;
            F1F2();
            write("*D05-2* LANGUAGE imputed from Age and Sex, pn= %02d, lang= %01d", PERSNUM,LANGUAGE);
            impute( LANGUAGE , ALANGUAGE (AGE10,SEX));
        else
            errmsg("*D05-3* Head's LANGUAGE imputed from Other's LANGUAGE, pn= %02d, lang= %01d, personptr= %02d, pr-lang %01d", PERSNUM,LANGUAGE,PERSONPTR,LANGUAGE(PERSONPTR)) denom = denomPop summary;
            F1F2();
            write("*D05-3* Head's LANGUAGE imputed from Other's LANGUAGE, pn= %02d, lang= %01d, personptr= %02d, pr-lang %01d", PERSNUM,LANGUAGE,PERSONPTR,LANGUAGE(PERSONPTR));
            impute (LANGUAGE , LANGUAGE (PERSONPTR));
        endif;
    else
        F1F2();
        errmsg("*D05-4* LANGUAGE imputed from Head's LANGUAGE, lang %d langhd %d", LANGUAGE,LANGUAGE(headpt)) denom = denomPop summary;
        write("*D05-4* LANGUAGE imputed from Head's LANGUAGE, pn= %02d, lang= %01d, langhd= %01d",persnum,LANGUAGE,LANGUAGE(headpt));
        impute (LANGUAGE , LANGUAGE (1));
    endif;
endif;
69
Language OK and head, update the hotdeck
For the Standard edit, if the variable is valid, we update the hot deck.
This is the code:
if LANGUAGE in 1:17 then
    if RELAT = 1 then
        ALANGUAGE (AGE10,SEX) = LANGUAGE;
    endif;
70
Single person house, get language from nearby house
Normally, we want to look for others in the house with the variable.
But, in one-person houses, no one else to look at, so we have to impute:
if RELAT = 1 then
    if TOTOCC (POP_EDT) = 1 then
        errmsg("*D05-2A* Single person house: Language imputed from Age and Sex, pn= %02d, lang= %01d",PERSNUM,LANGUAGE) denom = denomPop summary;
        F1F2();
        write("*D05-2A* Single person house: Language imputed from Age and Sex, pn= %02d, lang= %01d",PERSNUM,LANGUAGE);
        impute( LANGUAGE , ALANGUAGE (AGE10,SEX));
71
Someone else in house has language, assign that to head
Assign the first other person’s language to head:
    else
        PERSONPTR = 0;
        do varying i = 1 until i > TOTOCC (POP_EDT)
            if LANGUAGE (i) in 1:17 then
                PERSONPTR = i;
            endif;
        enddo;
        if PERSONPTR = 0 then
            errmsg("*D05-2* LANGUAGE imputed from Age and Sex, pn= %02d, lang= %01d", PERSNUM,LANGUAGE) denom = denomPop summary;
            F1F2();
            write("*D05-2* LANGUAGE imputed from Age and Sex, pn= %02d, lang= %01d", PERSNUM,LANGUAGE);
            impute( LANGUAGE , ALANGUAGE (AGE10,SEX));
72
If no one else has language, get from nearby head same age and sex
No one else has a valid entry for this item, so impute from the nearest neighbor with a valid entry:
errmsg("*D05-3* Head's LANGUAGE imputed from Other's LANGUAGE, pn= %02d, lang= %01d, personptr= %02d, pr-lang %01d", PERSNUM,LANGUAGE,PERSONPTR,LANGUAGE(PERSONPTR)) denom = denomPop summary;
F1F2();
write("*D05-3* Head's LANGUAGE imputed from Other's LANGUAGE, pn= %02d, lang= %01d, personptr= %02d, pr-lang %01d", PERSNUM,LANGUAGE,PERSONPTR,LANGUAGE(PERSONPTR));
impute (LANGUAGE , LANGUAGE (PERSONPTR));
73
For others in house, assign head’s language
Once the head has a valid entry for the variable, others can obtain theirs from the head:
F1F2();
errmsg("*D05-4* LANGUAGE imputed from Head's LANGUAGE, lang %d langhd %d", LANGUAGE,LANGUAGE(headpt)) denom = denomPop summary;
write("*D05-4* LANGUAGE imputed from Head's LANGUAGE, pn= %02d, lang= %01d, langhd= %01d",persnum,LANGUAGE,LANGUAGE(headpt));
impute (LANGUAGE , LANGUAGE (1));
74
Language Edit: Within House
Example of WRITE Statement in CSPro to assist in finding the error
Note: Before and after edit displays, with what is done in edit in the middle
Assigning Head’s language from other people
91200217 Population Group Case = 0009
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 034 01 1 55 1 09 1 1
02 2 023 02 1 06 55 1 07 1 1
03 2 005 03 1 06 55 1 09 1 1
04 2 003 03 1 06 55 1 09 1 1
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = Oth lang = 06
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = 06 Oth lang = 06
V.14c: P07 invalid for head, imputing from other PN = 01 Lang = 06 Oth lang = 06
end
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 034 01 1 06 55 1 09 1 1
02 2 023 02 1 06 55 1 07 1 1
03 2 005 03 1 06 55 1 09 1 1
04 2 003 03 1 06 55 1 09 1 1
75
Language Edit: Imputed Head from Previous Household Head
No one has language, so first head gets language from previous head of same age and sex
Then the others in the house get their language from the head
91200697 Language Case = 0027
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 027 01 1 1 09 1 1
02 2 027 02 1 1 09 1 1
03 1 005 03 1 1 09 1 1
V.14d: P07 invalid, imputing from deck ALANGUAGE PN = 01 Lang =
V.15d: P08 invalid for head, impute from deck ARELIGIO PN = 01 Head Relig =
V.14f: P07 invalid, imputing from head PN = 02 Lang = Head's lang = 06
V.15f: P08 invalid, imputing from head's religion PN = 02 Relig = Head's relig = 38
V.14f: P07 invalid, imputing from head PN = 03 Lang = Head's lang = 06
V.15b: imputing P08 from mother's religion PN = 03 Relig = Mo relig = 38
end
ID1 ID2 PN SEX AGE RTN GRP LAN REL RSA PRV CNT CTZ URS PERMPLAC SM
01 1 027 01 1 06 38 1 09 1 1
02 2 027 02 1 06 38 1 09 1 1
03 1 005 03 1 06 38 1 09 1 1
76
Series of Edit Problems
We are going to do three exercises that will look at different kinds of editing problems
Note: these are all simplified – most of the time edits must be more complicated
But these cover the basics
77
Countries choosing not to impute
Exercise 1 in the packet: simple edits for population items
These days, most countries impute at least items needed for planning and policy determination
If a country still decides not to impute
Then, staff might assign “unknown” even to items used for planning:
if not SEX in 1:2 then
    SEX = 9;
endif;
78
A simple kind of edit: when SEX is invalid
Since we only have two sexes, the easiest way to edit is to alternate between the sexes:
SEXCHANGE = 2;
. . .
if not SEX in 1:2 then
    SEX = SEXCHANGE;
    SEXCHANGE = 3 - SEXCHANGE;
endif;
The program will assign the stored sex and then will change the holding variable to await the next instance of “bad” sex.
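The same alternation can be tried out in a few lines of Python. This is a sketch for experimenting with the idea, not a replacement for the CSPro logic; the function names are invented for the example, and the starting value 2 simply mirrors the holding variable used on the slide.

```python
def make_sex_alternator(first=2):
    """Return an edit function that hands out 2, 1, 2, 1, ... for invalid entries."""
    state = {"next": first}

    def edit(sex):
        if sex in (1, 2):
            return sex                # valid: keep what was reported
        assigned = state["next"]
        state["next"] = 3 - assigned  # flip 1 <-> 2 for the next bad value
        return assigned

    return edit

edit_sex = make_sex_alternator()
edited = [edit_sex(s) for s in [1, 9, None, 2, 0]]
```

Valid entries pass through unchanged; only the invalid ones (9, blank, 0) consume a value from the alternating sequence.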
79
What if this person has complete fertility information?
Use other intra-record variables to assist in an edit.
If this person has an invalid entry for sex but has fertility information:
if not SEX in 1:2 then
    if FERTILITY <> NOTAPPL then
        errmsg("Has Fertility info so Female");
        SEX = 2;
    endif;
endif;
80
But what if this is the Spouse and we know the Head’s Sex
For Programmers: Use inter-record information when it is available.
So when Head has sex reported, but the Spouse does not:
if SEX (1) in 1:2 then
    errmsg("Sex of spouse from sex of head");
    SEX = 3 - SEX(1);
endif;
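Taken together, the sex edits form a cascade: use the person's own record first, then the head's record, and only then fall back to a default. The Python sketch below shows one possible ordering. The order of the checks, the use of `None` for "fertility not applicable", and relationship code 2 for spouse are assumptions made for the example.

```python
NOTAPPL = None  # hypothetical marker for "fertility not applicable"

def edit_sex(person, head_sex, fallback):
    """person: dict with 'relat', 'sex', 'fertility'. fallback: default sex code."""
    if person["sex"] in (1, 2):
        return person["sex"]          # valid: no change
    if person["fertility"] is not NOTAPPL:
        return 2                      # fertility reported, so female
    if person["relat"] == 2 and head_sex in (1, 2):
        return 3 - head_sex           # spouse: opposite of the head's sex
    return fallback                   # nothing else to go on

spouse = {"relat": 2, "sex": 9, "fertility": NOTAPPL}
spouse_sex = edit_sex(spouse, head_sex=1, fallback=1)
```

The fallback could be the alternating value described earlier; it is passed in as a plain argument here to keep the sketch short.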
81
Exercise 2: Housing edits
Since housing does not usually require crosstabulations, except by geography, edits tend to be simpler
But still must edit for invalids and certain inconsistencies
82
Housing edits: Rooms and bedrooms
When a census collects both rooms and bedrooms, the number of bedrooms should not be more than the number of rooms
Some countries collect the information independently – rooms except bedrooms, and then bedrooms, so this edit would not work
Edit: If Bedrooms > rooms, then make them the same
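A minimal Python sketch of this edit follows. The slide says only to "make them the same"; lowering bedrooms to the number of rooms, rather than raising rooms, is an assumption that a country's edit team would need to confirm.

```python
def edit_bedrooms(rooms, bedrooms):
    """Cap the number of bedrooms at the number of rooms."""
    if bedrooms > rooms:
        bedrooms = rooms  # assumption: rooms is trusted, bedrooms is changed
    return rooms, bedrooms

fixed = edit_bedrooms(rooms=3, bedrooms=5)
```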
83
Housing edits: Walls and Roof
Each variable needs a separate edit
If you use hotdeck, then invalids need to be assigned from nearest neighbor with similar characteristics
Then, you need to check for inconsistencies
For example, if you have a house with a concrete roof but thatch walls, the roof would collapse the walls, so you need an edit to correct for this
84
Inter-record checking – one record type
Sometimes you need to look between records, not just within a record
For example, each household should have one and only one head [This is exercise 3]
So you need to look through house counting the heads
Need to make sure you have exactly one head
So at least one head and not more than one head!!
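Counting the heads and repairing the two failure cases might look like this in Python. The repair choices are assumptions for illustration only: with no head, the first person listed is promoted; with several heads, all but the first are recoded to "other relative" (code 4, following the relationship codes used in the hot deck example). An edit team could reasonably choose differently, for example by using age or sex.

```python
def edit_heads(relationships, head=1, other_relative=4):
    """Return a copy of the relationship codes with exactly one head."""
    edited = list(relationships)
    if not edited:
        return edited                 # empty household: nothing to edit
    heads = [i for i, r in enumerate(edited) if r == head]
    if not heads:
        edited[0] = head              # no head: promote the first person
    for i in heads[1:]:
        edited[i] = other_relative    # extra heads: demote all but the first
    return edited

no_head = edit_heads([2, 3, 3])
two_heads = edit_heads([1, 1, 3])
```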
85
Inter-record checking for spouses
Does every household have to have one and only one spouse?
Consider polygamous houses … do multiple spouses even live together ?
What about other types of household structures?
86
Other types of inter-record checking
If a spouse is present, the sex of the head and the sex of the spouse should be opposite [This may no longer hold in some countries]
If a spouse is present, both the head and the spouse should be reported as “married” or in “common-law” arrangement – and these should be the same
87
Inter-record checking for population edits
Age of head and age of spouse
Figure 4. Example of household with potential inconsistencies in age reporting
Father
Head of household (age 43), Spouse (age 70)
Son (age 10), Daughter (age 8)
88
Figure 4. Example of household with potential inconsistencies in age reporting
Father
Head of household (age 43), Spouse (age 70)
Son (age 10), Daughter (age 8)
WHAT IS WRONG HERE?
Note: Head is 43 years old, Spouse is 70
Note: Children are 10 and 8
SO: need to change age of spouse
89
Inter-record checking for age
Need to use a hot deck – you have choices
You could have a hot deck with age of head and age of spouse for previous households
OR, you could have a hot deck with age differences between heads and spouses
In either case, you should have separate categories for males and females – because they act differently
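The second option, a hot deck of age differences kept separately by sex of head, can be sketched in Python as follows. The starting gaps and the 25-year tolerance are placeholders chosen for the example, not recommended values, and changing the spouse's age (rather than the head's) simply follows the conclusion drawn for Figure 4.

```python
# Hot deck of (head age - spouse age), kept separately by sex of head.
# Starting values and the tolerance are hypothetical placeholders.
age_gap = {1: 3, 2: -3}
MAX_GAP = 25

def edit_spouse_age(head_sex, head_age, spouse_age):
    """Return the spouse's age, imputed from the donor gap if implausible."""
    gap = head_age - spouse_age
    if abs(gap) <= MAX_GAP:
        age_gap[head_sex] = gap              # plausible pair: becomes the donor
        return spouse_age
    return head_age - age_gap[head_sex]      # implausible: apply the donor gap

first = edit_spouse_age(1, 40, 36)    # previous household, consistent ages
second = edit_spouse_age(1, 43, 70)   # the household in Figure 4
```

Storing the difference rather than the spouse's age keeps the imputed value tied to the head's own age, which is the practical advantage of the second option.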
90
Inter-record checking: Between Record Types
Until now we looked at one record type – Population or Housing – but sometimes we need to compare them
Vacant houses should have no people and occupied houses should have people
CSPro code:
if TENURE = 5 then                      {Vacant unit}
    if TOTOCC (POP) <> 0 then           {people in vacant unit}
        [determine tenure – owning or renting – code not shown here]
    endif;
else                                    {For owned or rented units}
    if TOTOCC(POP) = 0 then             {no people in an occupied unit}
        impute (TENURE,5);              {make this a vacant unit}
    endif;
endif;
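The same check between record types can be sketched in Python. Because the slide leaves out how tenure is determined for a "vacant" unit that has people in it, that step is passed in as a callable here; the `impute_tenure` parameter and the function name are hypothetical stand-ins for whatever method the edit team chooses (a hot deck, for example).

```python
VACANT = 5  # tenure code for a vacant unit, as in the CSPro example

def edit_tenure(tenure, persons, impute_tenure):
    """impute_tenure: callable returning a tenure code for an occupied unit."""
    if tenure == VACANT and persons > 0:
        return impute_tenure()   # people present: the unit cannot be vacant
    if tenure != VACANT and persons == 0:
        return VACANT            # nobody present: treat the unit as vacant
    return tenure                # housing and population records agree

occupied_but_vacant = edit_tenure(5, 3, lambda: 1)
empty_but_owned = edit_tenure(1, 0, lambda: 1)
```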
91
THANK YOU
UN Editing Specifications Workshop