Statistical Disclosure Control for the 2011 UK Census Keith Spicer Office for National Statistics.

Date posted: 18-Dec-2015

Page 1:

Statistical Disclosure Control for the 2011 UK Census

Keith Spicer

Office for National Statistics

Page 2:

Overview

• Disclosure Risk

• UK Census – context

• Evaluation of methods

• Proposed strategy

• Further work

Page 3:

What is disclosure risk?

There is a disclosure risk when information is published that could allow an intruder to determine the identity or particulars of:

• an individual

• a household or family

• a business

• or another statistical unit

Page 4:

Statistical Disclosure Control

• Statistical Disclosure Control (SDC) involves either:
  – introducing sufficient ambiguity / damage into, or reducing the level of detail of, published statistics so that the risk of disclosing confidential information is reduced to an acceptable level

• and / or:
  – controlling access to data

Page 5:

Risk – Utility balance

[Figure: risk-utility map. Axes: Disclosure Risk (information about confidential units) and Data Utility (information about legitimate items), each running from Low to High. Marked on the map: Original Data, Released Data, No data, and the Maximum Tolerable Risk level.]

Page 6:

UK Census - Context (1)

• 2001
  – Random record swapping
  – Small cell adjustment (SCA) applied in England, Wales and Northern Ireland, not in Scotland
  – Lack of harmonisation and late changes
  – SCA protected individual tables, but some risk remained through differencing
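The differencing risk mentioned above can be illustrated with a minimal sketch. It assumes two published tables for nested areas; the counts, category names and the "fewer than 3" small-cell cut-off are all hypothetical and are not taken from the slides.

```python
def difference_tables(larger, smaller):
    """Subtract one published table from another, cell by cell.

    Assumes the area covered by `smaller` nests inside the area covered
    by `larger`, so the result describes the leftover sliver of geography.
    """
    return {category: larger[category] - smaller.get(category, 0)
            for category in larger}


# Hypothetical counts: a ward, and the same ward minus one street.
ward = {"student": 40, "employed": 310, "retired": 95}
ward_minus_one_street = {"student": 39, "employed": 302, "retired": 95}

sliver = difference_tables(ward, ward_minus_one_street)

# Cells that are non-zero but very small are the ones an intruder could exploit.
small_cells = {category: count for category, count in sliver.items()
               if 0 < count < 3}
```

Each table may look safe when published alone; the small cell only appears once one table is subtracted from the other.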

Page 7:

UK Census - Context (2)

• Registrars General (RsG) agreement, November 2006
  – Small cell counts allowed as long as there is ‘sufficient uncertainty’
  – Main risk is attribute disclosure: finding out something new about an individual

• Evaluation to short-list
  – Qualitative, including user acceptability, additivity, consistency, feasibility
  – 3 methods:
    • Record swapping
    • Over-imputation
    • IACP method (post-tabular) based on ABS

Page 8:

UK Census - Context (3)

• Short-list of 3 methods evaluated

• Quantitative assessment using 2001 Census data, using different measures of risk and utility
  – Protection against disclosure (and differencing)
  – Measures of association
  – Effect on totals & sub-totals
  – Variances
  – Rankings

• Revisit qualitative aspects

• Proposed Strategy – Record Swapping

Page 9:

Proposed Strategy: Record Swapping

• Swap the geographical location of a small number of households

• Households are paired according to similar characteristics (to avoid too much data distortion)

• Creates uncertainty in the data

• Can target risky records

Page 10:

Record swapping

[Diagram: a record in Area A is swapped with a matched record in Area B]

Treatment:
• Find a different geographical area
• Identify another individual in a different area with the same characteristics on matching variables
• Swap the two records

Characteristics (first record):
Age: 22, Sex: Male, Marital Status: Single, Economic activity: Student, Tenure: Rented

Characteristics (second record):
Age: 22, Sex: Male, Marital Status: Single, Economic activity: Active, Tenure: Owned

Matches all variables except economic activity and tenure

Swap records
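The swap illustrated above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the ONS implementation: the function name `swap_records`, the field names, the `swap_rate` parameter and the random choice of partner are all hypothetical, and placing the first record in Area A and the second in Area B is an assumption made for the example.

```python
import random


def swap_records(records, match_keys, swap_rate, seed=0):
    """Swap the area of a share of records with matched records elsewhere.

    A partner must sit in a different area and agree on every matching
    variable. Only the "area" field is exchanged, so area totals stay intact.
    """
    rng = random.Random(seed)
    result = [dict(r) for r in records]        # leave the input untouched
    target = round(swap_rate * len(result))    # number of records to swap
    order = list(range(len(result)))
    rng.shuffle(order)
    swapped = set()
    for i in order:
        if len(swapped) >= target:
            break
        if i in swapped:
            continue
        key = tuple(result[i][k] for k in match_keys)
        partners = [
            j for j in range(len(result))
            if j != i and j not in swapped
            and result[j]["area"] != result[i]["area"]
            and tuple(result[j][k] for k in match_keys) == key
        ]
        if not partners:
            continue                           # no match found: leave as is
        j = rng.choice(partners)
        result[i]["area"], result[j]["area"] = result[j]["area"], result[i]["area"]
        swapped.update((i, j))
    return result, swapped


# The two records from the slide (area assignment is an assumption).
records = [
    {"area": "A", "age": 22, "sex": "Male", "marital_status": "Single",
     "economic_activity": "Student", "tenure": "Rented"},
    {"area": "B", "age": 22, "sex": "Male", "marital_status": "Single",
     "economic_activity": "Active", "tenure": "Owned"},
]
protected, swapped = swap_records(
    records, match_keys=("age", "sex", "marital_status"), swap_rate=1.0)
```

Because the two records agree on age, sex and marital status, tables built from those variables are unchanged by the swap, while tables involving economic activity or tenure gain some uncertainty at the area level.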

Page 11:

Record swapping

• Pre-tabular method protects underlying microdata

• Protected tables will be additive and consistent

• Minimise bias by use of matching variables

• Vary swap rates by geographical level

• Relatively simple to understand and implement

• Some risks from population uniques at higher geographies (in microdata)

• Need consideration for ‘special outputs’

Page 12:

Record swapping – further work

• Determine swapping rates
  – Set tolerable risk threshold
  – Vary by geographical level

• Targeted or random
  – How to determine ‘risky’ records

• Take into account levels of imputation

• Interaction with output design
  – Flexible table / hypercube solutions – how much detail can we have in a hypercube?
  – Additional ‘rules’ around table design
  – Geography – providing ‘exact fit’?
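The slides leave open how ‘risky’ records would be determined. One common heuristic, shown here purely as an assumption and not as the ONS definition, is to flag records that are unique within their area on a set of key variables. The function name `flag_risky` and the field names are hypothetical.

```python
from collections import Counter


def flag_risky(records, key_vars, area_var="area"):
    """Return one boolean per record: True if the record is the only one
    in its area with that combination of key variables."""
    def cell(record):
        return (record[area_var],) + tuple(record[k] for k in key_vars)

    counts = Counter(cell(r) for r in records)
    return [counts[cell(r)] == 1 for r in records]


# Hypothetical records: two share a combination, one is unique in area A.
people = [
    {"area": "A", "age_band": "20-24", "sex": "Male"},
    {"area": "A", "age_band": "20-24", "sex": "Male"},
    {"area": "A", "age_band": "85+", "sex": "Female"},
]
flags = flag_risky(people, key_vars=("age_band", "sex"))
```

A targeted scheme could then give flagged records a higher chance of being swapped than unflagged ones.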

Page 13:

Record swapping – further work

• Protecting outputs for special populations
  – Workplace zones
  – Communal establishments

• Origin-destination tables
  – Protection of most detailed via licensing
  – Consideration of what can be ‘public use’

• Microdata
  – Suite of products
  – Detailed content

• Record swapping will be ‘smarter’ in 2011 – targeting risky records at low geographies

Page 14:

Summary

• Extensive evaluation of SDC methods

• Record swapping primary strategy for tabular outputs

• ‘Smarter’

• Further work continues

Page 15:

Output Geography

Andy Tait/Ian Coady

ONS Geography

Page 16:

Overview

• Background
  – 2001 Output Geography - OAs
  – Neighbourhood Geographies - SOAs

• What has changed since 2001?

• 2011 Requirements
  – 2007 Geography Consultation – what you said
  – Resulting Policy

• Work in progress
  – OA/SOA Maintenance Research project
  – Workplace Zones

• 2009 Geography Consultation

Page 17:

2001 Output Areas - why

• Census output geography separated from data collection geography

• a geography created from Census data

• consistent size in population/no of households

• socially homogeneous

• meets confidentiality thresholds

• aligns with administrative boundaries

• Consistent throughout UK

Page 18:

2001 Output Areas

• 175,000 output areas

• Mean 297 persons; 123 households

• Freely available digital boundary data

• Building blocks for “neighbourhood” geographies: Super Output Areas (LSOAs, MSOAs)

Image courtesy of David Martin. This work is based on data provided through EDINA UKBORDERS with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown.

Page 19:

2001 Output Areas – achieved size

[Figure: two histograms of the number of 2001 Output Areas by achieved size. First panel: by household range (40-49 up to 200+). Second panel: by population range (100-124 up to 500+).]

Page 20:

Super Output Areas (SOAs)

• created 2004, for Neighbourhood Statistics

• groupings of Output Areas

• layered hierarchy – lower, middle, upper layers

• each layer, with its own size thresholds and targets, offers a level of statistical reporting

• Lower SOAs: approx. 35,000 areas, average population ≈ 1,500 - created automatically

• Middle SOAs: approx. 7,000 areas, average population ≈ 7,200 - created automatically, then modified locally

• Upper SOAs not created

Page 21:

Wards 1998

Index of Deprivation 1998

Page 22:

Index of Deprivation 2004

Lower Layer SOAs 2004

Page 23:

Changes since 2001 - population

• Population growth, especially migration

• More and smaller households

• Newly built properties
  – Greenfield/new land
  – Brownfield/in-filling

• Sub-division of existing properties

• Changing socio-economic characteristics of areas

Page 24:

Changes since 2001 - geography

• Postcodes

• Census address register

• Ward/parish changes since 2003

• Administrative re-organisation

Page 25:

How much change by 2011?

         Lower threshold   Upper threshold             Population threshold
OAs      100 people        625 people (2 * target)     2.5 * household thresholds
LSOAs    1000 people       3000 people (2 * target)    2.5 * household thresholds
MSOAs    5000 people       15000 people (2 * target)   2.5 * household thresholds
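As a minimal sketch of how the population thresholds above could be applied, the following classifies an area as below, within or above its thresholds. The function name `classify` is hypothetical, and treating the threshold values themselves as "within" is an assumption, since the slide does not say whether the bounds are inclusive.

```python
# Lower and upper population thresholds from the slide, by geography level.
POPULATION_THRESHOLDS = {
    "OA": (100, 625),
    "LSOA": (1000, 3000),
    "MSOA": (5000, 15000),
}


def classify(level, population):
    """Return 'below', 'within' or 'above' for an area of the given level.

    Assumption: an area exactly on a threshold counts as 'within'.
    """
    lower, upper = POPULATION_THRESHOLDS[level]
    if population < lower:
        return "below"
    if population > upper:
        return "above"
    return "within"


mean_oa = classify("OA", 297)        # the mean 2001 OA size quoted earlier
small_oa = classify("OA", 99)
large_lsoa = classify("LSOA", 3001)
```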

Page 26:

How much change by 2011?

2001-2005 threshold breaches, based on mid-year population estimates

Output Areas:

              2005 below   2005 within   2005 above   2001 totals
2001 below           221           228            1           450
2001 within          147        173553          682        174382
2001 above             0            78          506           584
2005 totals          368        173859         1189        175416

99.1% (share of OAs within thresholds in 2005)

Page 27:

How much change by 2011?

Lower Layer Super Output Areas:

              2005 below   2005 within   2005 above   2001 totals
2001 below             6             8            0            14
2001 within           34         34242           58         34334
2001 above             0             3           27            30
2005 totals           40         34253           85         34378

99.6% (share of LSOAs within thresholds in 2005)

Page 28:

How much change by 2011?

Middle Layer Super Output Areas:

              2005 below   2005 within   2005 above   2001 totals
2001 below             3             4            0             7
2001 within            8          7178            0          7186
2001 above             0             0            1             1
2005 totals           11          7182            1          7194

99.8% (share of MSOAs within thresholds in 2005)
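The percentages quoted under the three tables can be reproduced from the cell counts. The sketch below reads each percentage as the share of areas within thresholds in 2005 (the "2005 within" column total divided by the grand total); that reading is an inference from the arithmetic, and the function name is hypothetical.

```python
# Rows: 2001 below / within / above. Columns: 2005 below / within / above.
TRANSITIONS = {
    "OA":   [[221, 228, 1], [147, 173553, 682], [0, 78, 506]],
    "LSOA": [[6, 8, 0], [34, 34242, 58], [0, 3, 27]],
    "MSOA": [[3, 4, 0], [8, 7178, 0], [0, 0, 1]],
}


def share_within_2005(matrix):
    """Percentage of areas that sit within thresholds in 2005."""
    total = sum(sum(row) for row in matrix)
    within = sum(row[1] for row in matrix)   # middle column: '2005 within'
    return 100 * within / total


shares = {level: round(share_within_2005(m), 1)
          for level, m in TRANSITIONS.items()}
```

For Output Areas this gives 173859 / 175416, which rounds to the 99.1% shown on the slide.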

Page 29:

Key messages

• Most output areas (and LSOAs, MSOAs) unlikely to have breached thresholds by 2011

• BUT, changes clustered geographically, so could breach badly in some areas

• Some areas already known to be problematic in 2001

Page 30:

Small Area Geography Consultation 2007

Strong support for:
• Stability with 2001 (but reflect change!)
• Easy/free licensing of boundaries
• Mean high water boundary set
• England/Scotland alignment

Some support (in descending order) for:
• Aligning boundaries to real world features
• Separating communal establishments
• Retaining postcode blocks v street blocks
• Building a separate set of zones based on workplace
• Building separate OAs with no population
• Building an Upper layer of SOAs

Page 31:

Resulting in ONS policy for 2011 Geography

• Change only where there is significant population change:
  – split where population too big
  – merge where population too small
• No more than 5% overall change (could be well under)
• Assess methods of splitting/merging
• No real world alignment for its own sake
• Consider redesign of extreme cases where unfit as statistical zone
• No separate “empty” OAs
• Align Scotland and England at the border
• Mean high water boundaries as well
• Investigate new workplace geography linked to OAs
• Keep licensing free, get better deal for commercial use
• Exact count outputs for OAs and other geographies, e.g. wards – a matter for disclosure control

Page 32:

OA/SOAs – some “not fit for purpose”?

Page 33:

OA/SOAs – “not fit for purpose”?

Page 34:

Challenges for 2011 output geography design

• Stability at what level? OA, LSOA, MSOA?
• Building blocks? Postcodes or street blocks?
• Constrain within wards, LADs?
• Same design criteria as 2001?
• BUT: balance against licensing issues
• Automation of processes

Page 35:

Census2011Geog project – Southampton University

• ESRC funded project

• Develop automated procedures for maintaining (splitting, merging, re-designing) 2001 output geographies to create 2011 output geographies for E&W

• Assess implications of using different building blocks (e.g. postcodes, street blocks) for maintenance

• Work extended to January 2010

Page 36:

Automated maintenance procedures

[Flowchart] For a 2001 LAD/UA, starting from 2001 OAs / 2001 LSOAs and postcodes/street blocks:
• Above upper threshold: Split (aggregate postcodes/street blocks) → 2011 OAs
• Within thresholds → 2011 OAs
• Below lower threshold: Merge (merge 2001 OAs) → 2011 OAs
• Append 2011 OAs
• Merge all 2011 OAs from all LADs/UAs
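The split/merge flow above can be sketched roughly as follows. This is a simplified illustration, not the Census2011Geog procedure: it treats list order as a stand-in for spatial adjacency, ignores social homogeneity and boundary constraints, and derives the split target as half the upper threshold (from the "2 * target" note on the thresholds slide). All function names and the example populations are hypothetical.

```python
def population(blocks):
    """Total population of a list of (block_id, population) building blocks."""
    return sum(pop for _, pop in blocks)


def split(blocks, lower, upper):
    """Split an oversized area by aggregating building blocks up to a target."""
    target = upper / 2                     # assumption: upper = 2 * target
    parts, current = [], []
    for block in blocks:
        current.append(block)
        if population(current) >= target:
            parts.append(current)
            current = []
    if current:                            # leftover blocks after the loop
        if parts and population(current) < lower:
            parts[-1] = parts[-1] + current    # too small to stand alone
        else:
            parts.append(current)
    return parts


def maintain(areas, lower, upper):
    """Merge undersized areas into the next one, then split oversized ones.

    Assumption: consecutive areas in the list are neighbours.
    """
    merged, pending = [], []
    for blocks in areas:
        pending = pending + blocks
        if population(pending) >= lower:
            merged.append(pending)
            pending = []
    if pending:                            # trailing undersized area
        if merged:
            merged[-1] = merged[-1] + pending
        else:
            merged.append(pending)
    result = []
    for blocks in merged:
        if population(blocks) > upper:
            result.extend(split(blocks, lower, upper))
        else:
            result.append(blocks)
    return result


# Hypothetical 2001 OAs for one LAD/UA, each a list of postcode blocks.
oas_2001 = [
    [("p1", 60)],                               # below lower threshold
    [("p2", 150), ("p3", 100)],                 # within thresholds
    [("p4", 400), ("p5", 300), ("p6", 200)],    # above upper threshold
]
oas_2011 = maintain(oas_2001, lower=100, upper=625)
sizes = [population(area) for area in oas_2011]
```

In this toy case the undersized area is merged into its neighbour and the oversized area is split in two, so every resulting area sits within the OA thresholds while the total population is preserved.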

Page 37:

Absolute population change 2001-2005 (mid-year estimates): Camden

Increase

Decrease

This work is based on data provided through EDINA UKBORDERS with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown.

Page 38:

Absolute population change 2001-2005 (mid-year estimates): Liverpool

Increase

Decrease

This work is based on data provided through EDINA UKBORDERS with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown.

Page 39:

Absolute population change 2001-2005 (mid-year estimates): Manchester

Increase

Decrease

This work is based on data provided through EDINA UKBORDERS with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown.

Page 40:

More information on OA Maintenance project at

http://census2011geog.census.ac.uk

Page 41:

Workplace Zones

• OAs based on where people live not work – can be unsuitable for workplace statistics

• Some OAs contain no/few businesses; some contain many businesses or large employer, e.g. business parks, City of London

• Workplace Zones project looking at splitting/merging OAs for a new geography nesting with OAs

• User Group established

• Pilot WZs to be created/evaluated 2010 Q2

Page 42:

2009 Output Geography consultation

• Need for an Upper layer SOA

• Workplace Zone requirements

• Provide instances of OAs/SOAs that are unfit as a statistical geography
  – Priority instances
  – Not useful for analysis due to their design
  – ONS panel to consider redesign

Page 43:

2009 Output Geography consultation

• Census Geography consultation part of Census Outputs consultation

• Runs for three months from November 2009

• Follow up submissions January to May 2010

Page 44:

Conclusions contd

5. Greater flexibility in outputs
   i. Hypercube research
6. Multiple population bases
7. Geography
   i. Workplace zones
   ii. Possible production of data on two geographical bases
8. Application Programming Interface (API)
   i. Access to census data
   ii. Functionality of census data

Page 45:

Conclusions contd

9. Increased user input in consultation process
   i. Rounds of consultation
   ii. Online survey / persona research
   iii. Methods of engaging users
       • Topic group experts
       • Advisory groups
       • Working groups
       • Consulting users and distributors of census data
       • Academic groups
       • Direct consultation including output consultation events and internet