Lit Review Talk by Kato Mivule: Protecting DNA Sequence Anonymity with Generalization Lattices (Malin, 2005)


Page 1

Bioinformatics Literature Review

Protecting DNA Sequence Anonymity with Generalization Lattices

(Malin, 2005)

Literature Review by Kato Mivule

COSC891 – Bioinformatics, Spring 2014

Bowie State University

Reference: Bradley A. Malin, "Protecting genomic sequence anonymity with generalization lattices," Methods of Information in Medicine, vol. 44, no. 5, pp. 687-692, 2005.

Bowie State University Department of Computer Science

Image Source: U.S. National Library of Medicine

Page 2

Outline

• The Problem

• Methodology

• Conclusion and Future Work

Page 3

The Problem

• Transactions involving DNA data pose serious privacy concerns.

• DNA uniquely identifies an individual.

• DNA data is prone to re-identification and inference attacks.

Page 4

The Problem:


Source: Forbes.com, April 25, 2013

Page 5

Methodology

• Apply k-anonymity

• Apply Generalization

• Apply a generalization lattice to measure the distance between two residues at the same nucleotide position; the lattice gives the most specific generalized value that covers both residues (for example, adenine and guanine are both purines).

• DNALA (DNA Lattice Anonymization): apply k-anonymity by guaranteeing that each individual's published DNA sequence is indistinguishable from that of at least one other individual.
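The lattice idea in the bullets above can be illustrated with a minimal sketch, assuming the lattice is built from the standard IUPAC nucleotide ambiguity codes (R = purine, A or G; Y = pyrimidine, C or T; up to N = any base). Generalizing two residues means taking the smallest code that covers both. The `distance` function, which counts the lattice steps each residue must climb to reach that shared code, is a hypothetical metric and not necessarily the one defined in the paper.

```python
# Assumed nucleotide lattice: IUPAC ambiguity codes, each mapped to the set of bases it covers.
IUPAC = {
    "A": frozenset("A"), "C": frozenset("C"), "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"),  # purine
    "Y": frozenset("CT"),  # pyrimidine
    "S": frozenset("CG"), "W": frozenset("AT"),
    "K": frozenset("GT"), "M": frozenset("AC"),
    "B": frozenset("CGT"), "D": frozenset("AGT"),
    "H": frozenset("ACT"), "V": frozenset("ACG"),
    "N": frozenset("ACGT"),
}
# Reverse lookup: set of bases -> IUPAC symbol (covers every non-empty subset of ACGT).
SYMBOL = {bases: code for code, bases in IUPAC.items()}


def generalize(x: str, y: str) -> str:
    """Return the most specific lattice symbol covering both residues (their join)."""
    return SYMBOL[IUPAC[x] | IUPAC[y]]


def distance(x: str, y: str) -> int:
    """Hypothetical metric: lattice steps both residues climb to reach their join."""
    join = IUPAC[x] | IUPAC[y]
    return (len(join) - len(IUPAC[x])) + (len(join) - len(IUPAC[y]))


print(generalize("A", "G"), distance("A", "G"))  # R 2
print(generalize("A", "Y"), distance("A", "Y"))  # H 3
```

Modeling each symbol as a set of bases makes the join a simple set union, so the most similar generalized concept for any two residues is always defined.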

Page 6

Methodology

• k-anonymity

• k-anonymity uses both generalization and suppression to enforce confidentiality.

• k-anonymity requires that, before a data set is published, every combination of quasi-identifier values appears in at least k records, with k > 1, so that each record is indistinguishable from at least k - 1 others on those attributes.

• Because it relies on generalization and suppression, k-anonymity can be applied to DNA data privacy.
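The k-anonymity requirement above can be checked with a short sketch. The table, its column names, and the choice of quasi-identifiers are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k records."""
    if k < 2:
        raise ValueError("k-anonymity as described requires k > 1")
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical published table: birth decade and ZIP prefix are the quasi-identifiers.
table = [
    {"birth": "197*", "zip": "207**", "diagnosis": "A"},
    {"birth": "197*", "zip": "207**", "diagnosis": "B"},
    {"birth": "198*", "zip": "208**", "diagnosis": "A"},
    {"birth": "198*", "zip": "208**", "diagnosis": "C"},
]
print(is_k_anonymous(table, ["birth", "zip"], 2))  # True
print(is_k_anonymous(table, ["birth", "zip"], 3))  # False
```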

Page 7

Methodology

Generalization

• Generalization is a data privacy method in which attribute values that could lead to identity disclosure are made less informative by replacing them with more general values.

• For example, the birth years of everyone born between 1970 and 1979 can be replaced with the single value 1970, standing for the whole decade.

• Generalization follows a Domain Generalization Hierarchy (DGH), an ordered set of increasingly general levels. For example, a birth date can be generalized to the month at L1 (1970-09), to the year at L2 (1970), and to the decade at L3 (197*).
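The date hierarchy above can be written as a small function. It assumes dates arrive as ISO strings (YYYY-MM-DD); the L0 level (the full date) and the sample day 15 are hypothetical additions, while L1 to L3 follow the slide's example.

```python
def generalize_date(date: str, level: int) -> str:
    """Walk a hypothetical date DGH.

    L0: full date, L1: month, L2: year, L3: decade (last year digit masked).
    """
    year, month, _day = date.split("-")
    if level == 0:
        return date
    if level == 1:
        return f"{year}-{month}"
    if level == 2:
        return year
    if level == 3:
        return year[:3] + "*"
    raise ValueError(f"unsupported DGH level: {level}")


for lvl in range(4):
    print(lvl, generalize_date("1970-09-15", lvl))
# 0 1970-09-15
# 1 1970-09
# 2 1970
# 3 197*
```

Each level discards strictly more detail than the one below it, which is what lets a k-anonymity algorithm climb the hierarchy until enough records share the same value.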

Page 8

Methodology

DNALA – DNA Lattice Anonymization

• Employs k-anonymity for data privacy

• The technique safeguards privacy by generalizing each individual's DNA sequence until it is exactly the same as the sequence of at least one other individual in the published data set.

• When an institution publishes DNA sequence data using DNALA, every published sequence is guaranteed to be indistinguishable from the sequences of at least k - 1 other individuals.
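The DNALA guarantee described above can be illustrated with a minimal sketch. Every procedural detail here is an assumption: sequences are pre-aligned and of equal length, the lattice is the IUPAC ambiguity-code lattice, pairs are chosen greedily by a hypothetical `cost` (total lattice steps needed to merge), and an odd leftover sequence is folded into the cheapest existing group. Malin's actual DNALA algorithm, shown in the figure from Malin (2005), may differ in any of these choices.

```python
from collections import Counter
from itertools import combinations

# Assumed lattice: IUPAC nucleotide codes, each mapped to the set of bases it covers.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
         "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
         "V": "ACG", "N": "ACGT"}
SETS = {code: frozenset(bases) for code, bases in CODES.items()}
CODE_OF = {bases: code for code, bases in SETS.items()}


def merge(x, y):
    """Generalize two aligned sequences position by position into one shared sequence."""
    return "".join(CODE_OF[SETS[a] | SETS[b]] for a, b in zip(x, y))


def cost(x, y):
    """Hypothetical metric: lattice steps both sequences climb to reach their merged form."""
    total = 0
    for a, b in zip(x, y):
        join = SETS[a] | SETS[b]
        total += (len(join) - len(SETS[a])) + (len(join) - len(SETS[b]))
    return total


def dnala_sketch(seqs):
    """Greedy pass: repeatedly merge the cheapest remaining pair of sequences."""
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    remaining = set(range(len(seqs)))
    groups = []  # each entry: (member indices, shared generalized sequence)
    while len(remaining) >= 2:
        i, j = min(combinations(sorted(remaining), 2),
                   key=lambda p: cost(seqs[p[0]], seqs[p[1]]))
        groups.append(([i, j], merge(seqs[i], seqs[j])))
        remaining -= {i, j}
    if remaining:  # odd count: fold the leftover into the group cheapest to extend
        (r,) = remaining
        g = min(range(len(groups)), key=lambda n: cost(groups[n][1], seqs[r]))
        members, shared = groups[g]
        groups[g] = (members + [r], merge(shared, seqs[r]))
    published = [None] * len(seqs)
    for members, shared in groups:
        for m in members:
            published[m] = shared
    return published


def is_k_anonymous(published, k):
    """Every published sequence must occur at least k times."""
    return all(n >= k for n in Counter(published).values())


published = dnala_sketch(["ACGT", "ACGA", "TTGC", "TCGC"])
print(published)  # ['ACGW', 'ACGW', 'TYGC', 'TYGC']
print(is_k_anonymous(published, 2))  # True
```

Pairing the closest sequences first keeps the amount of generalization, and therefore the loss of data utility, low for each merge, although a greedy choice is not guaranteed to minimize the total.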

Page 9

Methodology

DNA Domain Generalization Hierarchy


Image source: Malin (2005)

Page 10

Methodology

DNA Domain Generalization Hierarchy


Image source: Malin (2005)

Page 11

Methodology

DNA Domain Generalization Hierarchy


Image source: Malin (2005)

Page 12

Methodology

DNALA Algorithm


Image source: Malin (2005)

Page 13

Conclusion and Future Work

• k-anonymity remains a promising approach to DNA data privacy.

• Data utility remains a challenge: the more DNA sequence information is generalized, the less useful the published data becomes.

• How do other techniques, such as noise addition and differential privacy, apply?

• Could we generate synthetic and/or obfuscated DNA data with traits similar to those of the original?

Page 14

References

1. B. A. Malin, "Protecting genomic sequence anonymity with generalization lattices," Methods of Information in Medicine, vol. 44, no. 5, pp. 687-692, 2005.

2. K. Mivule and C. Turner, "Applying Data Privacy Techniques on Published Data in Uganda," in International Conference on e-Learning, e-Business, Enterprise Information Systems, and e-Government (EEE), 2012, pp. 110-115.

3. A. Tanner, "Harvard Professor Re-Identifies Anonymous Volunteers In DNA Study," Forbes.com, April 25, 2013. Accessed: February 10, 2014. Available online: http://www.forbes.com/sites/adamtanner/2013/04/25/harvard-professor-re-identifies-anonymous-volunteers-in-dna-study/
