Statistical Disclosure Control for the 2011 UK Census Keith Spicer Office for National Statistics.
Date posted: 18-Dec-2015
Statistical Disclosure Control for the 2011 UK Census
Keith Spicer
Office for National Statistics
Overview
• Disclosure Risk
• UK Census – context
• Evaluation of methods
• Proposed strategy
• Further work
What is disclosure risk?
There is a disclosure risk when information is published that could allow an intruder to deduce the identity or particulars of:
• an individual
• a household or family
• a business
• or another statistical unit
Statistical Disclosure Control
• Statistical Disclosure Control (SDC) involves either:
• introducing sufficient ambiguity / damage into, or reducing the level of detail of, published statistics so that the risk of disclosing confidential information is reduced to an acceptable level
• and / or:
• controlling access to data
Risk – Utility balance
[Figure: risk-utility map. One axis: Disclosure Risk (information about confidential units), low to high. Other axis: Data Utility (information about legitimate items), low to high. Marked points: Original Data, Released Data, No Data. Marked line: Maximum Tolerable Risk.]
UK Census - Context (1)
• 2001
– random record swapping
– small cell adjustment (SCA) applied in England, Wales and Northern Ireland, not in Scotland
– lack of harmonisation and late changes
– SCA protected individual tables, but some risk remained through differencing
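The differencing risk mentioned above can be illustrated with a minimal sketch. All counts and category names here are invented for illustration; the point is only that two tables which each look safe can expose a small count when one is subtracted from the other.

```python
# Hypothetical published counts by economic activity for two nested areas.
# All numbers are invented for illustration only.
table_large_area = {"Student": 41, "Active": 310, "Retired": 95}
table_sub_area = {"Student": 40, "Active": 296, "Retired": 95}


def difference_tables(outer, inner):
    """Subtract the inner-area table from the outer-area table, cell by cell."""
    return {category: outer[category] - inner[category] for category in outer}


# The residual describes the sliver of the large area outside the sub-area.
residual = difference_tables(table_large_area, table_sub_area)

# Cells of 1 or 2 in the residual are potential disclosures by differencing.
small_cells = {c: n for c, n in residual.items() if 0 < n <= 2}

print(residual)
print(small_cells)
```

Protecting each table independently does not remove this risk, because the small cell only appears once the two tables are combined.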
UK Census - Context (2)
• RsG (Registrars General) agreement, November 2006
– Small cell counts permitted as long as there is ‘sufficient uncertainty’
– Main risk is attribute disclosure – finding out something new about an individual
• Evaluation to short-list
– Qualitative – including user acceptability, additivity, consistency, feasibility
– 3 methods:
• Record swapping
• Over imputation
• IACP method (post-tabular) based on ABS
UK Census - Context (3)
• Short-list of 3 methods evaluated
• Quantitative assessment using 2001 Census data, using different measures of risk and utility
– Protection against disclosure (and differencing)
– Measures of association
– Effect on totals & sub-totals
– Variances
– Rankings
• Revisit qualitative aspects
• Proposed Strategy – Record Swapping
Proposed Strategy: Record Swapping
• Swap the geographical location of a small number of households
• Households are paired according to similar characteristics (to avoid too much data distortion)
• Creates uncertainty in the data
• Can target risky records
Record swapping – example
[Figure: two areas, Area A and Area B, with one record in each.]
Treatment: find a different geographical area, identify another individual in that area with the same characteristics on the matching variables, and swap the two records.
Characteristics of the first record: Age: 22, Sex: Male, Marital Status: Single, Economic activity: Student, Tenure: Rented
Characteristics of the second record: Age: 22, Sex: Male, Marital Status: Single, Economic activity: Active, Tenure: Owned
The records match on all variables except economic activity and tenure, so they are swapped.
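The swap illustrated above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure ONS used: the field names, the choice of matching variables, the exact-match rule and the `swap_rate` parameter are all hypothetical.

```python
import random

# Hypothetical matching variables; partners must agree on all of them.
MATCH_VARS = ("age", "sex", "marital_status")


def swap_records(records, swap_rate, seed=0):
    """Swap the 'area' of a random sample of records with matched partners.

    A partner must sit in a different area and agree on every matching
    variable. Records that find no partner are left unchanged.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # work on copies, leave the input intact
    swapped = set()
    order = list(range(len(out)))
    rng.shuffle(order)
    n_target = round(swap_rate * len(out))  # number of records to move
    for i in order:
        if len(swapped) >= n_target:
            break
        if i in swapped:
            continue
        key = tuple(out[i][v] for v in MATCH_VARS)
        partners = [
            j for j in range(len(out))
            if j != i and j not in swapped
            and out[j]["area"] != out[i]["area"]
            and tuple(out[j][v] for v in MATCH_VARS) == key
        ]
        if partners:
            j = rng.choice(partners)
            out[i]["area"], out[j]["area"] = out[j]["area"], out[i]["area"]
            swapped.update((i, j))
    return out


records = [
    {"area": "A", "age": 22, "sex": "Male", "marital_status": "Single",
     "economic_activity": "Student", "tenure": "Rented"},
    {"area": "B", "age": 22, "sex": "Male", "marital_status": "Single",
     "economic_activity": "Active", "tenure": "Owned"},
]
result = swap_records(records, swap_rate=1.0)
print([r["area"] for r in result])  # prints ['B', 'A']
```

Because partners agree on the matching variables, area-level tables of age, sex and marital status are unchanged by the swap; only tables involving the non-matching variables (here economic activity and tenure) are perturbed.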
Record swapping
• Pre-tabular method protects underlying microdata
• Protected tables will be additive and consistent
• Minimise bias by use of matching variables
• Vary swap rates by geographical level
• Relatively simple to understand and implement
• Some risks from population uniques at higher geographies (in microdata)
• Need consideration for ‘special outputs’
Record swapping – further work
• Determine swapping rates
– Set tolerable risk threshold
– Vary by geographical level
• Targeted or random
– How to determine ‘risky’ records
• Take into account levels of imputation
• Interaction with output design
– Flexible table / hypercube solutions – how much detail can we have in a hypercube?
– Additional ‘rules’ around table design
– Geography – providing ‘exact fit’?
Record swapping – further work
• Protecting outputs for special populations
– Workplace zones
– Communal establishments
• Origin-destination tables
– Protection of most detailed via licensing
– Consideration of what can be ‘public use’
• Microdata
– Suite of products
– Detailed content
• Record swapping will be ‘smarter’ in 2011 – targeting risky records at low geographies
Summary
• Extensive evaluation of SDC methods
• Record swapping primary strategy for tabular outputs
• ‘Smarter’
• Further work continues
Output Geography
Andy Tait / Ian Coady
ONS Geography
Overview
• Background
– 2001 Output Geography - OAs
– Neighbourhood Geographies - SOAs
• What has changed since 2001?
• 2011 Requirements
– 2007 Geography Consultation – what you said
– Resulting Policy
• Work in progress
– OA/SOA Maintenance Research project
– Workplace Zones
• 2009 Geography Consultation
2001 Output Areas - why
• Census output geography separated from data collection geography
• a geography created from Census data
• consistent size in population / number of households
• socially homogeneous
• meets confidentiality thresholds
• aligns with administrative boundaries
• Consistent throughout UK
2001 Output Areas
• 175,000 output areas
• Mean 297 persons; 123 households
• Freely available digital boundary data
• Building blocks for “neighbourhood” geographies: Super Output Areas (LSOAs, MSOAs)
Image courtesy of David Martin. This work is based on data provided through EDINA UKBORDERS with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown.
2001 Output Areas – achieved size
[Figure: two histograms of the number of 2001 Output Areas by achieved size. Households (hhds): household range in bands from 40 - 49 to 200+, count axis 0 to 70000. Population (Pop): population range in bands from 100 - 124 to 500+, count axis 0 to 40000.]
Super Output Areas (SOAs)
• created 2004, for Neighbourhood Statistics
• groupings of Output Areas
• layered hierarchy – lower, middle, upper layers
• each layer has size thresholds and targets, offering different levels of statistical reporting
• Lower SOAs: approx. 35,000 areas, average population approx. 1,500; created automatically
• Middle SOAs: approx. 7,000 areas, average population approx. 7,200; created automatically, then modified locally
• Upper SOAs not created
[Maps: Wards 1998; Index of Deprivation 1998; Index of Deprivation 2004; Lower Layer SOAs 2004]
Changes since 2001 - population
• Population growth, especially migration
• More and smaller households
• Newly built properties
– Greenfield/new land
– Brownfield/in-filling
• Sub-division of existing properties
• Changing socio-economic characteristics of areas
Changes since 2001 - geography
• Postcodes
• Census address register
• Ward/parish changes since 2003
• Administrative re-organisation
How much change by 2011
Population thresholds
Level    Lower threshold    Upper threshold
OAs      100 people         625 people (2 * target)
LSOAs    1000 people        3000 people (2 * target)
MSOAs    5000 people        15000 people (2 * target)
Population thresholds are 2.5 * household thresholds.
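The thresholds above lend themselves to a simple classification of each area as below, within or above its population range. The following is a minimal sketch; the function name is hypothetical, and treating a population exactly equal to a threshold as "within" is an assumption, since the slides do not say how boundary cases were handled.

```python
# Population thresholds from the table above: (lower, upper) in people.
POPULATION_THRESHOLDS = {
    "OA": (100, 625),
    "LSOA": (1000, 3000),
    "MSOA": (5000, 15000),
}


def classify(level, population):
    """Return 'below', 'within' or 'above' for an area's population.

    Assumption: a population exactly on a threshold counts as 'within'.
    """
    lower, upper = POPULATION_THRESHOLDS[level]
    if population < lower:
        return "below"
    if population > upper:
        return "above"
    return "within"


print(classify("OA", 297))     # mean 2001 OA population
print(classify("OA", 80))
print(classify("LSOA", 3500))
```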
How much change by 2011?
2001-2005 threshold breaches, based on mid-year population estimates
Output Areas:
               2005 below   2005 within   2005 above   2001 totals
2001 below            221           228            1           450
2001 within           147        173553          682        174382
2001 above              0            78          506           584
2005 totals           368        173859         1189        175416
99.1% of Output Areas within thresholds in 2005
How much change by 2011?
Lower Layer Super Output Areas:
               2005 below   2005 within   2005 above   2001 totals
2001 below              6             8            0            14
2001 within            34         34242           58         34334
2001 above              0             3           27            30
2005 totals            40         34253           85         34378
99.6% of LSOAs within thresholds in 2005
How much change by 2011?
Middle Layer Super Output Areas:
               2005 below   2005 within   2005 above   2001 totals
2001 below              3             4            0             7
2001 within             8          7178            0          7186
2001 above              0             0            1             1
2005 totals            11          7182            1          7194
99.8% of MSOAs within thresholds in 2005
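The totals and percentages in the three tables above can be checked directly from the cell counts. The sketch below recomputes the row totals, column totals and the share of areas within thresholds in 2005; the variable and function names are illustrative only.

```python
# Rows: 2001 status; columns: 2005 status; both ordered (below, within, above).
# Cell counts are taken from the three tables above.
TRANSITIONS = {
    "OA": [[221, 228, 1], [147, 173553, 682], [0, 78, 506]],
    "LSOA": [[6, 8, 0], [34, 34242, 58], [0, 3, 27]],
    "MSOA": [[3, 4, 0], [8, 7178, 0], [0, 0, 1]],
}


def summarise(matrix):
    """Return 2001 totals, 2005 totals, grand total and % within in 2005."""
    totals_2001 = [sum(row) for row in matrix]
    totals_2005 = [sum(col) for col in zip(*matrix)]
    grand_total = sum(totals_2001)
    pct_within_2005 = round(100 * totals_2005[1] / grand_total, 1)
    return totals_2001, totals_2005, grand_total, pct_within_2005


summary = {level: summarise(m) for level, m in TRANSITIONS.items()}
for level, values in summary.items():
    print(level, values)
```

The percentage quoted under each table is the 2005 "within" column total divided by the grand total, for example 173859 / 175416 for Output Areas.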
Key messages
• Most output areas (and LSOAs, MSOAs) unlikely to have breached thresholds by 2011
• BUT, changes clustered geographically, so could breach badly in some areas
• Some areas already known to be problematic in 2001
Small Area Geography Consultation 2007
Strong support for:
• Stability with 2001 (but reflect change!)
• Easy/free licensing of boundaries
• Mean high water boundary set
• England/Scotland alignment
Some support (in descending order) for:
• Aligning boundaries to real world features
• Separating communal establishments
• Retaining postcode blocks v street blocks
• Building a separate set of zones based on workplace
• Building separate OAs with no population
• Building an Upper layer of SOAs
Resulting in ONS policy for 2011 Geography:
• Change only where there is significant population change:
– split where population too big
– merge where population too small
• No more than 5% overall change (could be well under)
• Assess methods of splitting/merging
• No real world alignment for its own sake
• Consider redesign of extreme cases where unfit as statistical zone
• No separate “empty” OAs
• Align Scotland and England at the border
• Mean high water boundaries as well
• Investigate new workplace geography linked to OAs
• Keep licensing free, get better deal for commercial use
• Exact count outputs for OAs and other geographies, e.g. wards – a matter for disclosure control
OA/SOAs – some “not fit for purpose”?
OA/SOAs – “not fit for purpose”?
Challenges for 2011 output geography design
• Stability at what level? OA, LSOA, MSOA?
• Building blocks? Postcodes or street blocks?
• Constrain within wards, LADs?
• Same design criteria as 2001?
• BUT: balance against licensing issues
• Automation of processes
Census2011Geog project – Southampton University
• ESRC funded project
• Develop automated procedures for maintaining (splitting, merging, re-designing) 2001 output geographies to create 2011 output geographies for E&W
• Assess implications of using different building blocks (e.g. postcodes, street blocks) for maintenance
• Work extended to January 2010
Automated maintenance procedures
[Flowchart: for a 2001 LAD/UA, the 2001 OAs and 2001 LSOAs are sorted into three groups: above upper threshold, within thresholds, below lower threshold. Above upper threshold: split (aggregate postcodes / street blocks). Below lower threshold: merge (merge 2001 OAs). Each of the three branches yields 2011 OAs, which are appended together. Finally, all 2011 OAs from all LADs/UAs are merged.]
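The split / merge / keep routing in the flowchart can be sketched as follows. This is a toy illustration only, not the Census2011Geog project's algorithm: each OA is reduced to a list of building-block populations, list order stands in for spatial adjacency, and `TARGET` is inferred from the "2 * target" note in the thresholds table. All function names are hypothetical.

```python
LOWER, UPPER = 100, 625   # OA population thresholds from the slides
TARGET = UPPER / 2        # assumption: upper threshold is 2 * target


def split_area(block_pops, target=TARGET, lower=LOWER):
    """Greedily re-aggregate building-block populations into new areas.

    A new area is closed once it reaches the target population. A final
    undersized remainder is folded into the previous area.
    """
    areas, current = [], []
    for pop in block_pops:
        current.append(pop)
        if sum(current) >= target:
            areas.append(current)
            current = []
    if current:
        if areas and sum(current) < lower:
            areas[-1].extend(current)
        else:
            areas.append(current)
    return areas


def maintain(oas, lower=LOWER, upper=UPPER):
    """Route each 2001 OA (a list of block populations) to split, merge or keep.

    An undersized OA is merged with the next OA in the list, a stand-in
    for choosing a neighbouring OA; real adjacency is not modelled.
    """
    result, carry = [], []
    for blocks in oas:
        blocks = carry + blocks
        carry = []
        population = sum(blocks)
        if population < lower:
            carry = blocks                      # merge with the next OA
        elif population > upper:
            result.extend(split_area(blocks))   # split via building blocks
        else:
            result.append(blocks)               # within thresholds: keep
    if carry:                                   # trailing undersized OA
        if result:
            result[-1] = result[-1] + carry
        else:
            result.append(carry)
    return result


oas_2001 = [[60, 20], [150, 140], [200, 180, 150, 170, 90]]
oas_2011 = maintain(oas_2001)
print([sum(area) for area in oas_2011])  # prints [370, 380, 410]
```

The sketch preserves total population (1160 before and after) but does not re-check merged areas against the upper threshold or handle building blocks that are themselves oversized; a real procedure would need both.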
Absolute population change 2001-2005 (mid-year estimates): Camden
[Map; legend: Increase / Decrease]
This work is based on data provided through EDINA UKBORDERS with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown.
Absolute population change 2001-2005 (mid-year estimates): Liverpool
[Map; legend: Increase / Decrease]
Absolute population change 2001-2005 (mid-year estimates): Manchester
[Map; legend: Increase / Decrease]
More information on OA Maintenance project at
http://census2011geog.census.ac.uk
Workplace Zones
• OAs based on where people live not work – can be unsuitable for workplace statistics
• Some OAs contain no/few businesses; some contain many businesses or large employer, e.g. business parks, City of London
• Workplace Zones project looking at splitting/merging OAs for a new geography nesting with OAs
• User Group established
• Pilot WZs to be created/evaluated 2010 Q2
2009 Output Geography consultation
• Need for an Upper layer SOA
• Workplace Zone requirements
• Provide instances of OAs/SOAs that are unfit as a statistical geography
– Priority instances
– Not useful for analysis due to their design
– ONS panel to consider redesign
2009 Output Geography consultation
• Census Geography consultation part of Census Outputs consultation
• Runs for three months from November 2009
• Follow up submissions January to May 2010
Conclusions contd
5. Greater flexibility in outputs
  i. Hypercube research
6. Multiple population bases
7. Geography
  i. Workplace zones
  ii. Possible production of data on two geographical bases
8. Application Programming Interface (API)
  i. Access to census data
  ii. Functionality of census data
Conclusions contd
9. Increased user input in consultation process
  i. Rounds of consultation
  ii. Online survey / persona research
  iii. Methods of engaging users
• Topic group experts
• Advisory groups
• Working groups
• Consulting users and distributors of census data
• Academic groups
• Direct consultation including output consultation events and internet