Www.ihsn.org Geoffrey Greenwell, IHSN/PARIS21 IASSIST Conference Tampere, Finland, May 2009...
www.ihsn.org
• International Household Survey Network
• A network of international agencies
• Based in Paris at the OECD at PARIS21
• A coordinating mechanism to:
– Improve quality and use of household survey data in developing countries
– Harmonize international recommendations for survey design, data analysis, etc
– Produce and disseminate international good practices
…
About IHSN
Accelerated Data Program
• Implementing the IHSN tools in the countries
• Technical and financial support to establish national data archives (in > 50 countries)
• Many datasets documented (DDI)
• Improved access to data by researchers, but not yet satisfactory. We can measure demand through the NADA
• The need to anonymize data remains the most frequently expressed concern and obstacle to data access.
• The ADP has provided some guidance, but there is a lack of simple and intuitive tools and guidelines available to ADP countries.
ADP/IHSN in the world
Map legend: ADP country; Expected ADP in 2009; By partners
Focus Nigeria
Effects of data availability on MDG 7: halving the population without sustainable access to safe drinking water.
Providing robust estimates to inform policy makers and sector monitoring.
Water and Sanitation Sector. Workshop with WHO/UNICEF
Effects of Data Availability
• Nigeria and the MDG: Rural access to improved water source
Resistance in the countries
• Nigeria Statistics Law: Statistical Act of 2007 obliges microdata release after due anonymization. The legal framework exists.
• Willing institution (the NBS in Nigeria)
• However, current anonymization strategies are limited to removal of direct identifiers
• Other countries are unable to articulate a proper policy for dissemination and tend to use confidentiality as a barrier to mask political resistance or inertia.
• IHSN anonymization tools will be a way to deal with both real ethical concerns and political resistance
Better use of survey data
• Lots of survey data remain under-exploited because they are not accessible to researchers/users
• Obstacles:
– Technical
– Psychological
– Financial Support by many sponsors
– Legal
– Ethical
– Political
… ? …
IHSN data documentation and cataloguing tools and guidelines
• Direct identifiers, which are variables such as names, addresses, or identity card numbers. They permit direct identification of a respondent but are not needed for statistical or research purposes, and should thus be removed from the published dataset.
• Indirect identifiers, which are characteristics that may be shared by several respondents, and whose combination could lead to the re-identification of one of them. For example, the combination of variables such as district of residence, age, sex, and profession would be identifying if only one individual of that particular sex, age and profession lived in that particular district. Such variables are needed for statistical purposes, and should thus not be removed from the published data files.
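The indirect-identifier example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy records, the `KEYS` tuple and the function names are hypothetical, and the count is taken within the sample, whereas the slide's example refers to the whole district population. It is not the IHSN plug-in code.

```python
from collections import Counter

# Hypothetical toy records; variable names follow the slide's example.
records = [
    {"district": "A", "age": 34, "sex": "F", "profession": "teacher"},
    {"district": "A", "age": 34, "sex": "F", "profession": "teacher"},
    {"district": "A", "age": 61, "sex": "M", "profession": "doctor"},
    {"district": "B", "age": 34, "sex": "F", "profession": "teacher"},
    {"district": "B", "age": 34, "sex": "F", "profession": "teacher"},
]

# Indirect identifiers whose combination may single out a respondent.
KEYS = ("district", "age", "sex", "profession")


def key_of(record, keys=KEYS):
    """Return the combination of indirect identifiers for one record."""
    return tuple(record[k] for k in keys)


def records_below_k(records, k, keys=KEYS):
    """Return indices of records whose key combination occurs fewer than k times."""
    counts = Counter(key_of(r, keys) for r in records)
    return [i for i, r in enumerate(records) if counts[key_of(r, keys)] < k]


# With k=2 this lists the records that are unique on the key combination.
print(records_below_k(records, k=2))  # prints [2]
```

Only the 61-year-old male doctor in district A is unique on the four variables, so that record would need attention before release.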
Anonymize: Process
Once all direct identifiers have been removed, a disclosure problem can remain: the indirect identifiers still have to be dealt with.
The IHSN anonymization tools will approach these problems by building on a great deal of technical work undertaken by experts in the field.
The IHSN hosted an expert meeting in October 2008 to present its tools and acknowledges the work done by:
University of Manchester
ISTAT (Italian Statistics)
Cornell University
ICPSR
Defining the problem
Developing SDC tools
• Building on existing work
• Not an integrated software package
• A collection of specialized tools for:
– Measuring the risk
– Reducing the risk
– Assessing the information loss
12 plug-ins developed in C++ that interface with SPSS, STATA or direct Server (Windows/Linux). Need to be thoroughly tested.
12 Plug-ins
• 12 plug-ins
1. The μ-argus risk for weighted sample
2. Re-identification rate to individual risk threshold
3. Individual risk to household risk
4. L-diversity for unweighted data
5. SUDA2: DIS-sample data
6. Kanon: Micro-aggregation
7. Local recoding
8. Fixed length micro aggregation
9. Noise Addition
10. Pram: Post Randomization
11. Rank Swapping
12. Sampling
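To make one of the listed techniques concrete, here is a minimal sketch of fixed-size univariate micro-aggregation: values are sorted, cut into consecutive groups of k, and each value is replaced by its group mean. The function name, the handling of the short tail and the sample incomes are assumptions made for illustration. The actual plug-in algorithm is not described in the slides and may differ.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group of k nearest-ranked values.

    Groups are formed on the sorted values; a tail shorter than k is
    folded into the last full group so that no group is smaller than k
    (unless there are fewer than k values in total).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # remaining tail too short: absorb it
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


# Hypothetical household incomes.
incomes = [10, 200, 30, 20, 220, 240, 1000]
print(microaggregate(incomes, k=3))
# prints [20.0, 415.0, 20.0, 20.0, 415.0, 415.0, 415.0]
```

The total is preserved (1720 before and after), while the extreme value 1000 can no longer be distinguished from three other records.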
Risk Measures & Intruder Scenarios
What does the intruder know?
Risk Reduction
What does the intruder want?
Based on CENEX Handbook on Statistical Disclosure Control Version 1.01
Individual risk methodology
Poisson model
Individual
Hierarchical
K-anonymity
l-diversity
t-completeness
SUDA
Record linkage
Distance-based
Probabilistic
Others
Measuring Disclosure Risk
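As an illustration of one measure named in the diagram, the sketch below computes distinct l-diversity: within each group of records sharing the same key values, it counts the distinct values of a sensitive variable and reports the smallest count. The records, variable names and function name are hypothetical, and this simple "distinct" variant is only one of several definitions of l-diversity. It is not taken from the IHSN plug-in.

```python
from collections import defaultdict


def distinct_l_diversity(records, keys, sensitive):
    """Return the smallest number of distinct sensitive values in any key group."""
    if not records:
        raise ValueError("records must not be empty")
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[k] for k in keys)].add(r[sensitive])
    return min(len(values) for values in groups.values())


# Hypothetical records: two key groups, one sensitive variable.
records = [
    {"district": "A", "sex": "F", "illness": "flu"},
    {"district": "A", "sex": "F", "illness": "asthma"},
    {"district": "B", "sex": "M", "illness": "diabetes"},
    {"district": "B", "sex": "M", "illness": "diabetes"},
]

print(distinct_l_diversity(records, keys=("district", "sex"), sensitive="illness"))
# prints 1
```

The group (B, M) is 2-anonymous but has a single illness value, so knowing that someone belongs to it reveals the illness. This is the situation l-diversity is meant to detect.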
Based on CENEX Handbook on Statistical Disclosure Control Version 1.01
Masking data Synthetic data file
Perturbative
Sampling
Global recoding
Top/bottom coding
Local suppression
Non perturbative
MASCC
Fixed/variable group
Uni-/Multivariate
Uncorrelated
Correlated
Non-linear
Noise addition
Multiplicative noise
Micro-aggregation
Data swapping
Rank swapping
Rounding
Resampling
PRAM
Local recoding
Reducing disclosure risk
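Two of the non-perturbative methods in the diagram, top coding and global recoding, can be sketched as follows. The function names, the ceiling of 80 and the ten-year bands are illustrative assumptions, not values recommended by the IHSN or the CENEX handbook.

```python
def top_code(values, ceiling):
    """Top coding: replace every value above the ceiling by the ceiling."""
    return [min(v, ceiling) for v in values]


def recode_to_band(value, width=10):
    """Global recoding: map a non-negative integer to a band label such as '30-39'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical ages.
ages = [3, 17, 34, 35, 61, 98]

print(top_code(ages, ceiling=80))
# prints [3, 17, 34, 35, 61, 80]
print([recode_to_band(a) for a in ages])
# prints ['0-9', '10-19', '30-39', '30-39', '60-69', '90-99']
```

Both methods reduce detail without adding noise: top coding hides rare extreme values, and recoding makes more respondents share each category.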
Categorical data Continuous data
Entropy-based measures Mean variation
Direct comparison
Comparison of contingency tables
Mean square error
Mean absolute error
Based on CENEX Handbook on Statistical Disclosure Control Version 1.01
Measuring Information Loss
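For continuous data, the mean square error and mean absolute error listed above compare the original values with the masked values record by record. The sketch below assumes both files keep the same record order; the function names and the sample values are hypothetical.

```python
def mean_square_error(original, masked):
    """Average of squared differences between original and masked values."""
    _check(original, masked)
    return sum((o - m) ** 2 for o, m in zip(original, masked)) / len(original)


def mean_absolute_error(original, masked):
    """Average of absolute differences between original and masked values."""
    _check(original, masked)
    return sum(abs(o - m) for o, m in zip(original, masked)) / len(original)


def _check(original, masked):
    """Both lists must be non-empty and describe the same records."""
    if not original or len(original) != len(masked):
        raise ValueError("inputs must be non-empty and of equal length")


# Hypothetical values before and after masking.
original = [10, 200, 30, 20]
masked = [20, 200, 20, 20]

print(mean_square_error(original, masked))    # prints 50.0
print(mean_absolute_error(original, masked))  # prints 5.0
```

A value of zero means the masked file reproduces the original exactly. Larger values indicate more information loss, to be weighed against the reduction in disclosure risk.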
• In Stata (SPSS, SAS) using C++ plugins
– Stata version 9 or later
– Log file for easy replication of procedure
– Informative output
• Or command-line (plugins with “data server”)
• Why Stata (SPSS/SAS)?
– Because most countries use/know this software
– Can use all tabulation and analysis functions
Developing SDC tools: Proposal
Beta Interface
• Large, imperfect datasets in under-resourced countries
• For use by official data producers in developing countries (IHSN objective)
• Relevant for other users as well
• Free to all; public source code
Target use
• Testing, “calibrating” and documenting
– Cornell + IHSN + selected countries
• Development/implementation of training and TA program
– Detailed documentation and guidelines
– Reference manual and training materials
• Possibly launched before end of the year (IHSN website)
• Participation of others welcome
Work Program for 2009
• Adding to the tools to facilitate data access in developing countries:
– Tools
• Metadata Editor
• CDROM/HTML developer
• Web Based National Data Archives
• Question Bank
– Guidelines
• Data Dissemination
• Documentation Guide
• Survey Quality Assessment Framework