An Investigation of Data Privacy and Utility Preservation Using KNN Classification as a Gauge
Kato Mivule
Doctoral Candidate, Computer Science Department
Bowie State University
Advisor: Claude Turner, Ph.D.
Associate Professor, Computer Science Department
Bowie State University
REFERENCES
1. R. C.-W. Wong, A. W.-C. Fu, K. Wang, and J. Pei, “Minimality Attack in Privacy
Preserving Data Publishing,” in Proceedings of the 33rd International Conference on
Very Large Data Bases, 2007, pp. 543–554.
2. A. Krause and E. Horvitz, “A Utility-Theoretic Approach to Privacy in Online Services,”
Journal of Artificial Intelligence Research, vol. 39, pp. 633–662, 2010.
3. J. Kim, “A Method for Limiting Disclosure in Microdata Based on Random Noise and
Transformation,” in Proceedings of the Survey Research Methods Section, American
Statistical Association, 1986, pp. 370–374.
4. M. Banerjee, “A Utility-Aware Privacy Preserving Framework for Distributed Data Mining
with Worst Case Privacy Guarantee,” Ph.D. dissertation, University of Maryland,
Baltimore County, 2011.
5. B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Data-Centric
Systems and Applications. Springer, 2011, pp. 124–125.
6. C. Dwork, “Differential Privacy,” in Automata, Languages and Programming, vol. 4052,
M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, Eds. Springer, 2006, pp. 1–12.
7. K. Mivule, C. Turner, and S.-Y. Ji, “Towards a Differential Privacy and Utility Preserving
Machine Learning Classifier,” Procedia Computer Science, vol. 12, pp. 176–181, 2012.
8. K. Mivule, “Utilizing Noise Addition for Data Privacy, an Overview,” in Proceedings of
the International Conference on Information and Knowledge Engineering (IKE 2012),
2012, pp. 65–71.
9. A. Frank and A. Asuncion, Iris Data Set, UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml/datasets/Iris]. Department of Information and Computer
Science, University of California, Irvine, CA, 2010.
Acknowledgement
Special thanks to Dr. Soo-Yeon Ji, Dr. Hoda El-Sayed, Dr. Darsana Josyula, and the
Computer Science Department at Bowie State University.
THE EXPERIMENT
• Organizations are required by law to safeguard the privacy of
individuals when handling data containing personally
identifiable information (PII).
• During the process of data privatization, the utility or
usefulness of the privatized data diminishes.
Data Privacy versus Data Utility
• Achieving an optimal balance between data privacy
and utility is an intractable problem.
• “Perfect privacy can be achieved by publishing
nothing at all, but this has no utility; perfect utility can
be obtained by publishing the data exactly as received,
but this offers no privacy” (Cynthia Dwork, 2006)
PRELIMINARIES
KNN classification of the original Iris dataset with classification error at
0.0400 (4 percent misclassified data)
KNN classification of the privatized Iris dataset, with noise addition
parameterized by the original data’s mean and standard deviation.
KNN classification of the privatized Iris dataset, with reduced noise addition
using mean = 0 and standard deviation = 0.1.
A second run of the KNN classification of the privatized Iris
dataset, with reduced noise addition using mean = 0 and
standard deviation = 0.1.
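The experiment described in the captions above can be sketched in code. This is a minimal sketch, assuming scikit-learn’s KNeighborsClassifier on the UCI Iris dataset [9]; the train/test split ratio, k = 5, and the random seeds are illustrative assumptions, not values stated on the poster.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def knn_error(X, y, seed=0):
    """Train KNN (k = 5, an assumed setting) and return the
    misclassification rate on a held-out 30 percent split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    return 1.0 - clf.score(X_te, y_te)

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

# Baseline: KNN on the original (non-privatized) Iris data.
err_orig = knn_orig = knn_error(X, y)

# Setting 1: Gaussian noise parameterized by each attribute's
# mean and standard deviation (heavier perturbation).
noisy_1 = X + rng.normal(X.mean(axis=0), X.std(axis=0), size=X.shape)
err_1 = knn_error(noisy_1, y)

# Setting 2: reduced noise, mean = 0 and standard deviation = 0.1.
noisy_2 = X + rng.normal(0.0, 0.1, size=X.shape)
err_2 = knn_error(noisy_2, y)

print(f"original: {err_orig:.4f}  heavy noise: {err_1:.4f}  "
      f"light noise: {err_2:.4f}")
```

In a typical run, the light-noise error stays close to the baseline while heavier noise degrades it, which is the utility loss the poster measures via the KNN classification error.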
• The initial results from our investigation show that a
reduction in noise levels does affect the classification
error rate.
• However, this reduction in noise levels could lead to
lower, riskier privacy levels.
• Finding the optimal balance between data privacy and
utility requirements remains problematic.
• The level of noise does affect the classification error.
• Adjusting noise parameters is essential for lower
classification error.
• More precise classification (better utility) may come at
the cost of lower privacy.
• Tradeoffs must be made between privacy and
utility.
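The tradeoff summarized in the bullets above can be illustrated by sweeping the noise level and watching the KNN classification error grow. This is a sketch assuming scikit-learn with 5-fold cross-validation; the sigma values are illustrative and not taken from the poster.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X, y = load_iris(return_X_y=True)

errors = {}
for sigma in (0.0, 0.1, 0.5, 1.0, 2.0):
    # Higher sigma = stronger privatization, but less utility.
    X_noisy = X + rng.normal(0.0, sigma, size=X.shape) if sigma > 0 else X
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X_noisy, y, cv=5).mean()
    errors[sigma] = 1.0 - acc
    print(f"sigma={sigma:4.1f}  classification error={errors[sigma]:.4f}")
```

Choosing sigma is exactly the tradeoff the poster describes: small sigma preserves utility (low error) at the price of weaker privacy, and large sigma does the reverse.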
IKE 2013: International Conference on Information
and Knowledge Engineering
Las Vegas, NV, USA