Microdata Simulation for Confidentiality of Tax Returns Using Quantile Regression and Hot Deck
Jennifer Huckett
Iowa State University
June 20, 2007
Outline
• Motivation
• Disclosure Limitation Methods
• Risk Assessment
• Simulation Study
• Results & Conclusions
Motivation
• Iowa Department of Revenue (IDR)
  – Collects and maintains individual tax return data
• Legislative Services Agency (LSA)
  – Examines impact of tax law changes on liability
• Current system
  – LSA submits requests to IDR
  – IDR computes liability, reports to LSA
  – Occurs several times each year
  – Inefficient for both IDR and LSA
• Solutions
  – Secure/remote access server
    • Data are not released
    • Some analyses suppressed
  – Statistical disclosure limitation (SDL)
    • Tabular
    • Microdata
      – Enable IDR to provide LSA with a data set
      – Allow LSA to compute liability with ease and accuracy
      – MUST ENSURE CONFIDENTIALITY of RECORDS!
Establishment Connection
• Very skewed distributions, unusual associations among distributions
• Groups of variables are related to one another in unusual ways
• Similar to business tax data or business expenditure/revenue data
• Confidentiality is critical
Traditional Approaches
• Recoding (e.g. aggregation)
• Noise addition
• Data swapping
• Data suppression
• Imputation
• Combinations of these
Our Approach
• Synthetic microdata simulation
  – Retain key demographic variables
  – Simulate values for some variables
    • Quantile regression conditional on key variables
    • Compute fitted values at selected quantiles
  – Impute values for remaining variables
    • Hot deck + rank swap
    • Hot deck based on simulated income variables
Quantile Regression
• $\min_{\beta} \sum_i \rho_\tau\big(y_i - \xi(x_i, \beta)\big)$
  – $\rho_\tau(y_i - \hat{y})$ = “tilted absolute value function” for the $\tau$th quantile
  – $\xi(x_i, \beta)$ = linear function of predictors ($x_i$)
• Performed in R
  – quantreg package
  – rq function
Quantile Regression, Koenker (2005)
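The quantile regression objective sums Koenker's check function $\rho_\tau(u) = u\,(\tau - I[u < 0])$ over residuals. As an illustration only (not the quantreg implementation, which solves the general linear model by linear programming), the pure-Python sketch below minimizes $\sum_i \rho_\tau(y_i - b)$ over a single intercept $b$ and recovers a sample $\tau$-quantile. The function names are hypothetical.

```python
def check_loss(u: float, tau: float) -> float:
    """Koenker's 'tilted absolute value' function rho_tau(u) = u * (tau - I[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def intercept_only_quantile(y: list[float], tau: float) -> float:
    """Minimize sum_i rho_tau(y_i - b) over a constant b.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimizer is always attained at an observed value; checking
    every observed value is therefore enough (O(n^2), fine for a demo).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    best_b, best_obj = y[0], float("inf")
    for b in sorted(y):
        obj = sum(check_loss(yi - b, tau) for yi in y)
        if obj < best_obj:
            best_b, best_obj = b, obj
    return best_b


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6, 5]
    print(intercept_only_quantile(data, 0.5))  # sample median: 4
    print(intercept_only_quantile(data, 0.9))  # 9
```

With predictors, the same loss is minimized over the coefficients of the linear function $\xi(x_i, \beta)$ instead of a single constant.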
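Simulate via Quantile Regression
• Estimate $\hat{\beta}(\tau)$ for quantiles $\tau$ from the set $T = \{0.01, 0.02, \dots, 0.99\}$
• For each record $i$ on variable $y$
  – Randomly select $\tau^* \sim \text{Uniform}(0,1)$
  – Compute fitted $\hat{y}^* = \xi(x_i, \hat{\beta}(\tau))$ given $x_i$ at the $\tau \in T$ just above and below $\tau^*$
  – Interpolate to obtain $y^{**}$ = simulated value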
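A minimal sketch of the interpolation step, assuming the fitted values $\hat{y}(\tau)$ for one record have already been computed on the grid $T$ from $\hat{\beta}(\tau)$ and $x_i$. The slides do not say how a draw below 0.01 or above 0.99 is handled; clamping it to the nearest grid end is an assumption here, as are the function names.

```python
import bisect
import random


def interpolate_fitted(taus: list[float], fitted: list[float], u: float) -> float:
    """Linearly interpolate a record's fitted quantiles at level u.

    taus   -- increasing quantile grid, e.g. [0.01, 0.02, ..., 0.99]
    fitted -- fitted values y_hat(tau) for this record, one per grid point
    Assumption: u outside [taus[0], taus[-1]] is clamped to the end values.
    """
    if u <= taus[0]:
        return fitted[0]
    if u >= taus[-1]:
        return fitted[-1]
    k = bisect.bisect_right(taus, u)  # taus[k - 1] <= u < taus[k]
    t0, t1 = taus[k - 1], taus[k]
    w = (u - t0) / (t1 - t0)
    return fitted[k - 1] + w * (fitted[k] - fitted[k - 1])


def simulate_value(taus: list[float], fitted: list[float], rng: random.Random) -> float:
    """Draw tau* ~ Uniform(0, 1) and return the interpolated simulated value y**."""
    return interpolate_fitted(taus, fitted, rng.random())


if __name__ == "__main__":
    grid = [k / 100 for k in range(1, 100)]  # {0.01, ..., 0.99}
    # Hypothetical fitted quantiles for one record (increasing in tau).
    fitted = [20000 + 400 * k for k in range(1, 100)]
    rng = random.Random(2007)
    print([round(simulate_value(grid, fitted, rng)) for _ in range(3)])
```

If fitted quantile curves cross (values not increasing in $\tau$), sorting the fitted values before interpolating is one common remedy; the slides do not say whether this was needed.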
IDR Application: Key Demographic Variables
• Number of dependents: 0, 1, 2, …
  – Categorized into 0, 1, ≥2
• County: 1, …, 99
  – Categorized into 4 population size groups
• State filing status
  1. Single
  2. Married filing joint
  3. Married filing separate on combined return
  4. Married filing separate returns
  5. Head of household
  6. Widow(er) with dependent child
  – Categorized into {1}, {2, 3}, {4, 5, 6}
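IDR Application: Quantile Regression for wages
$$
\begin{aligned}
\text{wages} ={}& \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3 + \beta_4\,\text{age}^4 \\
&+ \beta_5\, I[\#\text{dep} = 1] + \beta_6\, I[\#\text{dep} \ge 2] \\
&+ \beta_7\, I[\text{sfs} \in \{2,3\}] + \beta_8\, I[\text{sfs} \in \{4,5,6\}] \\
&+ \beta_9\, I[\text{county} = 2] + \beta_{10}\, I[\text{county} = 3] + \beta_{11}\, I[\text{county} = 4]
\end{aligned}
$$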
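The wages model can be made concrete by building each record's predictor vector in coefficient order $\beta_0, \dots, \beta_{11}$; the fitted value at quantile $\tau$ is then its dot product with $\hat{\beta}(\tau)$. A minimal sketch: the function names and group codes are hypothetical, and because the county-to-population-group mapping is not given, the group (1 to 4) is taken as an input. Baselines are 0 dependents, filing status 1, and county group 1.

```python
def dependents_group(n_dep: int) -> int:
    """Collapse number of dependents into 0, 1, or 2 (meaning >= 2)."""
    if n_dep < 0:
        raise ValueError("number of dependents cannot be negative")
    return min(n_dep, 2)


def filing_status_group(sfs: int) -> int:
    """Collapse state filing status 1..6 into 1 = {1}, 2 = {2, 3}, 3 = {4, 5, 6}."""
    groups = {1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 3}
    if sfs not in groups:
        raise ValueError(f"unknown filing status: {sfs}")
    return groups[sfs]


def design_row(age: float, n_dep: int, sfs: int, county_group: int) -> list[float]:
    """Predictors for the wages model, ordered to match beta_0 ... beta_11.

    county_group is the population-size group (1..4) assumed to be already
    assigned to the record's county.
    """
    if county_group not in (1, 2, 3, 4):
        raise ValueError("county_group must be 1, 2, 3, or 4")
    dep = dependents_group(n_dep)
    sfs_g = filing_status_group(sfs)
    age = float(age)
    return [
        1.0, age, age**2, age**3, age**4,          # intercept, age polynomial
        float(dep == 1), float(dep == 2),          # dependents: 1, >= 2
        float(sfs_g == 2), float(sfs_g == 3),      # filing status: {2,3}, {4,5,6}
        float(county_group == 2), float(county_group == 3), float(county_group == 4),
    ]


def fitted_quantile(beta_tau: list[float], row: list[float]) -> float:
    """Linear predictor xi(x_i, beta(tau)) = sum_k beta_k(tau) * x_ik."""
    if len(beta_tau) != len(row):
        raise ValueError("coefficient vector and design row differ in length")
    return sum(b * x for b, x in zip(beta_tau, row))
```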
IDR Application: Hot Deck and Rank Swap for Federal Tax
• Hot deck
  – Mahalanobis distance: $d(i,j) = (x_i - x_j)^\top S_{xx}^{-1} (x_i - x_j)$
  – Closest 20 records
• Rank swap
  – Compute sample rank, $r$
  – Draw random rank, $r^*$, from discrete Uniform$[r-10, r+10]$
  – Impute value from record with rank $r^*$
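A minimal sketch of both steps. It assumes the inverse covariance matrix $S_{xx}^{-1}$ of the matching variables (for example the simulated income variables) is computed elsewhere, e.g. with `numpy.linalg.inv`. The function names, the uniform random choice of donor within the 20-record pool, and the clipping of $r^*$ at the ends of the rank range are assumptions not stated on the slide. The quantity $(x_i - x_j)^\top S_{xx}^{-1}(x_i - x_j)$ is the squared Mahalanobis distance, so it ranks neighbours the same way as its square root.

```python
import random


def mahalanobis_sq(xi: list[float], xj: list[float], s_inv: list[list[float]]) -> float:
    """(x_i - x_j)' S^{-1} (x_i - x_j), with S^{-1} supplied by the caller."""
    d = [a - b for a, b in zip(xi, xj)]
    p = len(d)
    return sum(d[r] * s_inv[r][c] * d[c] for r in range(p) for c in range(p))


def hot_deck_donor(i: int, X: list[list[float]], s_inv: list[list[float]],
                   rng: random.Random, k: int = 20) -> int:
    """Index of a donor drawn uniformly from the k records closest to record i.

    Assumption: the donor is picked at random from the pool; the slide only
    says 'closest 20 records'. Distance ties are broken by record index.
    """
    dists = sorted(
        (mahalanobis_sq(X[i], X[j], s_inv), j) for j in range(len(X)) if j != i
    )
    pool = [j for _, j in dists[:k]]
    return rng.choice(pool)


def rank_swap(values: list[float], rng: random.Random, window: int = 10) -> list[float]:
    """Replace each value with the value held at a random nearby rank.

    For a record with rank r (1 = smallest), draw r* from the discrete uniform
    on {r - window, ..., r + window}, clipped to {1, ..., n} (the clipping is
    an assumption), and take the value of the record with rank r*.
    """
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])  # order[r - 1] has rank r
    rank = [0] * n
    for pos, idx in enumerate(order):
        rank[idx] = pos + 1
    swapped = []
    for idx in range(n):
        r = rank[idx]
        r_star = rng.randint(max(1, r - window), min(n, r + window))
        swapped.append(values[order[r_star - 1]])
    return swapped
```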
Disclosure Risk Measurement
• Using methods detailed in Reiter (2005) and Duncan and Lambert (1986, 1989)
• Examine specific records
  – Original records
  – Released records
  – Model intruder behavior to assess disclosure risk
• Simulation Study
Original and Released Records
Intruder Behavior
• Target record, $t$
  – Intruder has information on target
  – Attempts to match $t$ in released records
• Released records $j = 1, \dots, r$ in $Z$
• Probability that record $j$ belongs to target $t$ is $\Pr(J = j \mid t, Z)$
• As this probability decreases, disclosure risk decreases
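In the simplest case, where every variable the intruder knows is released unperturbed, $\Pr(J = j \mid t, Z)$ is one over the number of released records that agree with the target on those variables, and zero for all others. The sketch below implements only that exact-matching case; Reiter (2005) extends the calculation to perturbed variables, which this sketch does not attempt. The function name and the dict-based record layout are hypothetical.

```python
from typing import Any


def match_probabilities(target: dict[str, Any], released: list[dict[str, Any]],
                        keys: list[str]) -> dict[int, float]:
    """Pr(J = j | t, Z) for an intruder who matches exactly on `keys`.

    Records agreeing with the target on every key share the probability
    equally; all other records get probability 0 (omitted from the result).
    """
    matches = [j for j, rec in enumerate(released)
               if all(rec[k] == target[k] for k in keys)]
    if not matches:
        return {}
    p = 1.0 / len(matches)
    return {j: p for j in matches}


if __name__ == "__main__":
    Z = [
        {"age": 34, "marital": "married", "minority": False},
        {"age": 34, "marital": "single", "minority": False},
        {"age": 34, "marital": "married", "minority": False},
        {"age": 61, "marital": "widowed", "minority": True},
    ]
    t = {"age": 34, "marital": "married", "minority": False}
    print(match_probabilities(t, Z, ["age", "marital", "minority"]))
    # {0: 0.5, 2: 0.5}
```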
Simulation Study
Each SDL scheme determines how the set $A$ of variables available to the intruder is divided into $A_p$ (available, perturbed) and $A_d$ (available, unperturbed).
SDL Schemes in Simulation Study
• No SDL
• Swap 30% marital status
• Swap 30% marital status and minority
• Recode age into 5-year intervals
• Recode age into 5-year intervals and swap 30% marital status and minority
• Simulation via quantile regression and hot deck
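Two of the traditional schemes are easy to state in code. This sketch assumes that "recode age into 5-year intervals" means fixed intervals 0-4, 5-9, and so on, and that "swap 30%" means randomly permuting the values of a randomly chosen 30% of records among themselves. The slides do not specify the swapping mechanism, and the function names are hypothetical.

```python
import random


def recode_age_5yr(age: int) -> str:
    """Map an age to a fixed 5-year interval label such as '35-39'."""
    if age < 0:
        raise ValueError("age cannot be negative")
    lo = (age // 5) * 5
    return f"{lo}-{lo + 4}"


def swap_fraction(values: list, frac: float, rng: random.Random) -> list:
    """Randomly permute the values of round(frac * n) randomly chosen records.

    Records not chosen keep their original value; a chosen record may
    occasionally receive its own value back.
    """
    out = list(values)
    n_swap = round(frac * len(values))
    chosen = rng.sample(range(len(values)), n_swap)
    pool = [out[i] for i in chosen]
    rng.shuffle(pool)
    for i, v in zip(chosen, pool):
        out[i] = v
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [23, 37, 40, 58, 71]
    print([recode_age_5yr(a) for a in ages])  # ['20-24', '35-39', '40-44', '55-59', '70-74']
    marital = ["married", "single", "married", "divorced", "widowed",
               "single", "married", "married", "single", "divorced"]
    print(swap_fraction(marital, 0.30, rng))
```

Because values are only permuted among records, the marginal distribution of the swapped variable is unchanged; what changes is its association with the other variables.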
Targets
• Intruder has information on target, $t$, and wants to match it with released records
• Consider a few targets
  – Unique record
  – Rare record
  – Common record
Results from Simulation Study
$\Pr(J = j \mid t, Z)$

| Target | No SDL | Marital swap | Marital and minority swap | Age recode | Swaps and recode | Quantile regression and hot deck |
|---|---|---|---|---|---|---|
| Unique | 1 | 1 | 0.1046 | 1 | 0.0178 | 0.0895 |
| Rare | 0.3333 | 0.1044 | 0.1304 | 0.0526 | 0.0225 | 0.0016 |
| Common | 0.0385 | 0.0320 | 0.0320 | 0.0068 | 0.0055 | 0.0008 |
Conclusions & Future Work
• Risk behaves as we expect: increased SDL generally yields decreased disclosure risk (except for the unique target!)
• Apply SDL techniques to American Community Survey data at the US Census Bureau
• Compare traditional techniques to quantile regression and hot deck by computing risk
• Measure utility of released data
Acknowledgements
• Iowa Department of Revenue
• Iowa’s Legislative Services Agency
• National Institute of Statistical Sciences
• US Census Bureau Dissertation Fellowship Award
References
• Duncan, G.T. and Lambert, D. 1986. “Disclosure-Limited Data Dissemination,” Journal of the American Statistical Association, 81, 10-28.
• Duncan, G.T. and Lambert, D. 1989. “The Risk of Disclosure for Microdata,” Journal of Business and Economic Statistics, 7, 207-217.
• Koenker, R. 2005. “Introduction,” Quantile Regression, Econometric Society Monograph Series, Cambridge University Press.
• Reiter, J.P. 2005. “Estimating Risks of Identification Disclosure in Microdata,” Journal of the American Statistical Association, 100(472), 1103-1113.