Folie 1© Federal Statistical Office, Research Data Centre, Maurice Brandt
Analytical validity and confidentiality protection of anonymised longitudinal enterprise microdata –
Survey of a German Project
Maurice Brandt¹, Michael Konold², Rainer Lenz³ and Martin Rosemann⁴
¹ Research Data Centre of the Federal Statistical Office
² Research Data Centre of the Statistical Offices of the Länder
³ University of Applied Sciences Mainz
⁴ Institute for Applied Economic Research
Work Session on Statistical Data Confidentiality, Manchester, 17-19 December 2007
Overview
1. Introduction
2. The data sets of the project
3. Anonymisation methods and analytical validity
4. Approaches to assessing anonymity
5. Conclusions
1. Introduction
“Business Panel data and de facto anonymisation”: a new project running since the beginning of 2006
Improve the data infrastructure in Germany regarding longitudinal data on local units and enterprises
Guarantee the access of the scientific community to the panel data of economic statistics
The former project “De facto anonymisation of business microdata” showed that de facto anonymisation can be achieved on a cross-section basis
1. Introduction
In this project, different business statistics are linked to form longitudinal data sets
It is planned to complement the data with information from the official business register
The data sets can already be used for scientific work
The final aim is to produce a scientific use file
2.1 The data sets of the project
Units of analysis are the local units in manufacturing and mining
Complete enumeration of local units with 20 or more employees
Monthly reports (1995 to 2005): information about employees, wages, salaries, turnover
Survey of investments (1995 to 2005): information on highly different types of investments
Survey of small units (1995 to 2002): local units with 19 or fewer employees
2.2 The data sets of the project
Cost Structure Survey
Stratified sample of enterprises with 20 or more employees in the manufacturing and mining sector
Years 1995 to 2005: over 43,000 enterprises altogether
Information on output, production factors, employees
From 1999 to 2002: 13,300 enterprises available in the whole period
Studies regarding investments in research and development are possible
2.3 The data sets of the project
Turnover Tax Statistics
Very large data set with a total of 4.3 million enterprises
Years 2000 to 2004 (1.8 million enterprises for the whole period)
Information on all taxable turnovers, turnover tax, prior tax and tax liability
IAB Panel of local units
Information on employment trend, staff structure, hours worked, turnover, export share, investments and innovation
Various waves since 1993, covering about 4,300 to a maximum of 16,000 local units
3. Anonymisation methods and analytical validity
Anonymisation methods
Methods reducing the information (suppression of variables or presenting key variables in broader categories)
Methods modifying the values of numerical data (data perturbation methods)
Data perturbation methods for panel data
Microaggregation:
(a) separately for all variables and all periods (individual ranking)
(b) separately for all variables but jointly for all periods
(c) separately for all periods but jointly for all variables
(d) jointly for all periods and all variables
Multiplicative stochastic noise: mixture distribution (approach of Höhne)
Multiple imputation
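To make two of the perturbation methods listed above more concrete, here is a minimal Python sketch, not the project's implementation. `individual_ranking` illustrates variant (a) of microaggregation for a single variable in a single period. `mixture_noise` illustrates multiplicative noise drawn from a two-component mixture, under the assumption that each unit keeps one direction (up or down) in all periods. The function names, the group size `k`, and the parameters `delta` and `sd` are hypothetical and do not come from the slides.

```python
import random
from statistics import mean


def individual_ranking(values, k=3):
    """Microaggregate one variable in one period.

    Sort the values, form groups of k neighbours and replace each value
    by its group mean. A tail shorter than k joins the previous group.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    starts = list(range(0, len(order), k))
    if len(starts) > 1 and len(order) - starts[-1] < k:
        starts.pop()  # merge the short tail into the previous group
    out = [0.0] * len(values)
    for g, start in enumerate(starts):
        end = starts[g + 1] if g + 1 < len(starts) else len(order)
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            out[i] = group_mean
    return out


def mixture_noise(panel, delta=0.1, sd=0.02, seed=0):
    """Multiply each unit's series by factors from a two-component mixture.

    Assumption (not from the slides): every unit draws one direction that
    holds in all periods, and the factors are normally distributed around
    1 + delta or 1 - delta with standard deviation sd.
    """
    rng = random.Random(seed)
    noisy = []
    for series in panel:  # one list of period values per unit
        centre = 1 + delta if rng.random() < 0.5 else 1 - delta
        noisy.append([x * rng.gauss(centre, sd) for x in series])
    return noisy


turnover = [10, 1, 2, 3, 11, 12, 100]
print(individual_ranking(turnover, k=3))

panel = [[100.0, 110.0, 120.0], [50.0, 55.0, 60.0]]
print(mixture_noise(panel))
```

Microaggregation preserves the total of the variable, because every group is replaced by its own mean. The mixture noise keeps every factor away from 1, so no value is released almost unchanged.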
3. Anonymisation methods and analytical validity
In focus
Impacts of data perturbation methods on
descriptive distribution measures
the estimation of econometric panel models, particularly on the within-estimator used to control for individual unobservable heterogeneity
First results
The within-estimator is consistent in the case of anonymisation by individual ranking
The project team derived consistent within-estimators for the case of anonymisation by multiplicative stochastic noise (including the method of Höhne) and no autocorrelation
Case of autocorrelation: work in progress
Multiple imputation: separate talk at this conference
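For readers unfamiliar with the within-estimator, the following minimal sketch shows the idea for a model with one regressor, y_it = alpha_i + beta * x_it + e_it. Subtracting each unit's own mean removes the unobserved effect alpha_i, and beta is then estimated by least squares on the demeaned data. The function name and the toy data are illustrative assumptions. The corrected estimators that the project team derived for noise-perturbed data are not reproduced here.

```python
from statistics import mean


def within_estimator(y, x):
    """Slope estimate from unit-demeaned data.

    y and x are lists with one series (list of period values) per unit.
    """
    numerator = 0.0
    denominator = 0.0
    for y_i, x_i in zip(y, x):
        y_bar, x_bar = mean(y_i), mean(x_i)
        for y_it, x_it in zip(y_i, x_i):
            numerator += (x_it - x_bar) * (y_it - y_bar)
            denominator += (x_it - x_bar) ** 2
    return numerator / denominator


# Two units with different fixed effects (5 and -3) and a common slope of 2.
x = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
y = [[5 + 2 * v for v in x[0]], [-3 + 2 * v for v in x[1]]]
print(within_estimator(y, x))
```

Running the same function on original and on anonymised series is one simple way to see how much a perturbation method shifts the estimate.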
4. Approaches to assessing anonymity
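We calculate coefficients $d(a_i, b_j)$ between the external data $\{a_1, \ldots, a_n\}$ and the target data $\{b_1, \ldots, b_n\}$ and obtain:

$$
\text{(AP)} \quad \text{Minimize } \sum_{i=1}^{n} \sum_{j=1}^{n} d(a_i, b_j)\, x_{ij}
$$

$$
\text{s.t. } x_{ij} \in \{0, 1\} \text{ for } i, j = 1, \ldots, n, \quad \sum_{j=1}^{n} x_{ij} = 1 \text{ for } i = 1, \ldots, n \quad \text{and} \quad \sum_{i=1}^{n} x_{ij} = 1 \text{ for } j = 1, \ldots, n.
$$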
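The assignment problem (AP) can be illustrated with a small Python sketch. It is a brute-force search over all permutations, swapped in for a proper assignment solver, and is only practical for very small n. For real data one would use a polynomial-time method such as the Hungarian algorithm, for instance `scipy.optimize.linear_sum_assignment`. The function names, the cost matrix, and the re-identification measure are illustrative assumptions, not taken from the slides.

```python
from itertools import permutations


def solve_assignment(cost):
    """Return (total, matching) minimising the sum of cost[i][matching[i]].

    cost[i][j] plays the role of d(a_i, b_j). Every permutation is one
    feasible x: each external record gets exactly one target record.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


def reidentification_rate(matching, true_match):
    """Share of external records assigned to their true target record."""
    hits = sum(1 for m, t in zip(matching, true_match) if m == t)
    return hits / len(matching)


cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, matching = solve_assignment(cost)
print(total, matching)
print(reidentification_rate(matching, true_match=[1, 0, 2]))
```

In a matching experiment of this kind, a high share of correct assignments would indicate that the anonymised target data still allow re-identification.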
4. Approaches to assessing anonymity
Four approaches are used to estimate the coefficients of the linear program (AP):
Conventional distance based approach
Correlation based approach
Distribution based approach
Collinearity based approach
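The slides do not define the four approaches. As a hedged illustration of the first one only, the sketch below computes coefficients d(a_i, b_j) as Euclidean distances on standardised variables. The pooled standardisation, the function names, and the toy records are assumptions made for this example, not the project's definition.

```python
from math import sqrt
from statistics import mean, pstdev


def standardise(records_a, records_b):
    """Scale every variable by its pooled mean and standard deviation."""
    pooled = records_a + records_b
    n_vars = len(pooled[0])
    mus = [mean(r[v] for r in pooled) for v in range(n_vars)]
    # A constant variable has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [pstdev([r[v] for r in pooled]) or 1.0 for v in range(n_vars)]

    def scale(record):
        return [(record[v] - mus[v]) / sds[v] for v in range(n_vars)]

    return [scale(r) for r in records_a], [scale(r) for r in records_b]


def distance_coefficients(records_a, records_b):
    """Matrix of Euclidean distances between standardised records."""
    z_a, z_b = standardise(records_a, records_b)
    return [
        [sqrt(sum((p - q) ** 2 for p, q in zip(a, b))) for b in z_b]
        for a in z_a
    ]


# Hypothetical external records (employees, turnover) and perturbed,
# shuffled target records.
external = [[10.0, 200.0], [50.0, 900.0], [30.0, 400.0]]
target = [[31.0, 410.0], [11.0, 190.0], [52.0, 880.0]]
for row in distance_coefficients(external, target):
    print([round(d, 3) for d in row])
```

The resulting matrix can serve as the cost coefficients of (AP). Standardising first keeps a variable with large values, such as turnover, from dominating the distance.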
5. Conclusions
Within the scope of the project, the panel data sets can be used via remote data processing and at safe scientific workstations in the statistical offices
They are already used in some research projects
The first scientific use files, for use on one's own workstation, will probably be available at the beginning of 2009
Thank you for your attention