A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf ·...

135
XintaoWu, Xiaowei Ying University of North Carolina at Charlotte A Tutorial of Privacy-Preservation of Graphs and Social Networks

Transcript of A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf ·...

Page 1: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

XintaoWu, Xiaowei Ying

University of North Carolina at Charlotte

A Tutorial of Privacy-Preservation of

Graphs and Social Networks

Page 2: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

National Freedom of Information

2

Page 3: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Data Protection Laws

3

Page 4: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

National Laws USA

HIPAA for health care Passed August 21, 96

lowest bar and the States are welcome to enact more stringent rules California State Bill 1386

Grann-Leach-Bliley Act of 1999 for financial institutions

COPPA for childern’s online privacy

etc.

Canada PIPEDA 2000

Personal Information Protection and Electronic Documents Act

Effective from Jan 2004

European Union (Directive 94/46/EC) Passed by European Parliament Oct 95 and Effective from Oct 98.

Provides guidelines for member state legislation

Forbids sharing data with states that do not protect privacy

4

Page 5: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Privacy Breach AOL's publication of the search histories of more than 650,000 of its users

has yielded more than just one of the year's bigger privacy scandals. (Aug 6, 2006)

That database does not include names or user identities. Instead, it lists only a unique ID number for each user. AOL user 710794

an overweight golfer, owner of a 1986 Porsche 944 and 1998 Cadillac SLS, and a fan of the University of Tennessee Volunteers Men's Basketball team.

interested in the Cherokee County School District in Canton, Ga., and has looked up the Suwanee Sports Academy in Suwanee, Ga., which caters to local youth, and the Youth Basketball of America's Georgia affiliate.

regularly searches for "lolitas," a term commonly used to describe photographs and videos of minors who are nude or engaged in sexual acts.

5

Source: AOL's disturbing glimpse into users' lives By Declan McCullough , CNET News.com,

August 7, 2006, 8:05 PM PDT

Page 6: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Privacy Preserving Data Mining Data mining The goal of data mining is summary results (e.g., classification,

cluster, association rules etc.) from the data (distribution)

Individual Privacy Individual values in database must not be disclosed, or at least

no close estimation can be got by attackers Contractual limitations: privacy policies, corporate agreements

Privacy Preserving Data Mining How to transform data such that we can build a good data mining model (data utility) while preserving privacy at the record level (privacy)?

6

Page 7: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

PPDM on Tabular Data

7

ssn name zip race … age Sex Bal income … IntP

28223 Asian … 20 M 10k 85k … 2k

28223 Asian … 30 F 15k 70k … 18k

28262 Black … 20 M 50k 120k … 35k

28261 White … 26 M 45k 23k … 134k

. . … . . . . … .

28223 Asian … 20 M 80k 110k … 15k

69% unique on zip and birth date

87% with zip, birth date and gender

Generalization (k-anonymity, L-diversity,

t-closeness etc.) and Randomization

Refer to a survey book [Aggarwal, 08]

Page 8: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

PPDM Tutorials on Tabular Data Privacy in data system, Rakesh Agrawal, PODS03

Privacy preserving data mining, Chris Clifton, PKDD02, KDD03

Models and methods for privacy preserving data publishing and analysis, Johannes Gehrke, ICDM05, ICDE06, KDD06

Cryptographic techniques in privacy preserving data mining, HelgerLipmaa, PKDD06

Randomization based privacy preserving data mining, XintaoWu, PKDD06

Privacy in data publishing, Johannes Gehrke & AshwinMachanavajjhala, S&P09

Anonymized data: genertion, models, usage, Graham Cormode & Divesh Srivastava, SIGMOD09

8

Page 9: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Social Network

9

Page 10: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Network of US political books

(105 nodes, 441 edges)

Books about US politics sold by

Amazon.com. Edges represent

frequent co-purchasing of books by

the same buyers. Nodes have been

given colors of blue, white, or red to

indicate whether they are "liberal",

"neutral", or "conservative".

Social Network

10

Page 11: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Social Network

Network of the political blogs on the 2004 U.S. election

(polblogs, 1,222 nodes and 16,714 edges)

11

Page 12: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Social Network

12

• Collaboration network of scientists [Newman, PRE06]

Page 13: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

More Social Network Data

Newman’s collection

http://www-personal.umich.edu/~mejn/netdata/

Enron data

http://www.cs.cmu.edu/~enron/

Stanford large network dataset collection

http://snap.stanford.edu/data/index.html

13

Page 14: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Graph Mining A very hot research area Graph properties such as degree distribution Motif analysis Community partition and outlier detection Information spreading Resiliency/robustness, e.g., against virus propagation Spectral analysis

Research development “Managing and mining graph data” by Aggarwal and Wang,

Springer 2010. “Large graph-mining: power tools and a practitioner’s guide” by

Faloutsos et al. KDD09

14

Page 15: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Network Science and Privacy

15Source: Jeannette Wing, Computing research: a view from DC, SNOWBIRD, 2008

Page 16: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

16

Page 17: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Social Network Data Publishing

17

Data ownerData miner

release

name sex age disease salary

Ada F 18 cancer 25k

Bob M 25 heart 110k

Cathy F 20 cancer 70k

Dell M 65 flu 65k

Ed M 60 cancer 300k

Fred M 24 flu 20k

George M 22 cancer 45k

Harry M 40 flu 95k

Irene F 45 heart 70k

id Sex age disease salary

5 F Y cancer 25k

3 M Y heart 110k

6 F Y cancer 70k

1 M O flu 65k

7 M O cancer 300k

2 M Y flu 20k

9 M Y cancer 45k

4 M M flu 95k

8 F M heart 70k

Page 18: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Threat of Re-identification

18

id Sex age disease salary

5 F Y cancer 25k

3 M Y heart 110k

6 F Y cancer 70k

1 M O flu 65k

7 M O cancer 300k

2 M Y flu 20k

9 M Y cancer 45k

4 M M flu 95k

8 F M heart 70k

Attacker

attack

Ada’s sensitive information is disclosed.

Privacy breachesIdentity disclosure

Link disclosure

Attribute disclosure

Page 19: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Deriving Personal Identifying Information [Gross WPES05]

User profiles (e.g., photo, birth date, residence,

interests, friend links) can be used to estimate personal

identifying information such as SSN.

### - ## - ####

Users should pay attention to (default) privacy

preference settings of online social networks.

19

Determined by zip code

https://secure.ssa.gov/apps10/poms.nsf/lnx/0100201030

Group no Sequential no

Page 20: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Active and Passive Attacks [BackstormWWW07]

u v

20

Active attack outline1. Join the network by creating some new user accounts;

2. Establish a highly distinguishable subgraph H among the

attacking nodes;

3. Send links to targeted individuals from the

attacking nodes;

4. In the released graph, identify the subgraph H

among the attacking nodes;

5. The targeted individuals and their links are

then identified.

Page 21: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Active and Passive Attacks [BackstormWWW07]

u v

21

Active attacks & subgraph HThe active attack is based on the subgraph H among the attackers:

1. No other subgraphs isomorphic to H;

2. Subgraph H has no non-trivial automorphism

3. Efficient to identify H regardless G;

Page 22: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Active and Passive Attacks [BackstormWWW07]

22

Passive attacks outline1. Observation: most nodes in the network already form a

uniquely identifiable subgraph.

2. One adversary recruits k-1 of his neighbors to form the

subgraph H of size k.

3. Work similarly to active attacks.

Drawback: Uniqueness of H is not guaranteed.

Page 23: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

23

Structural queries:A structural query Q represents complete or partial structural

information of a targeted individual that may be available to

adversaries.

Structural queries and identity privacy:

Attacks by Structural Queries [Hay VLDB08]

Page 24: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Attacks by Structural Queries [Hay VLDB08]

24

Degree sequence refinement queries

Page 25: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Attacks by Structural Queries [Hay VLDB08]

25

Subgraph queriesThe adversary is capable of gathering a fixed number of

edges around the targeted individual.

Hub fingerprint queriesA hub is a central node in a network with high degree and

high betweenness centrality. A hub fingerprint of node v is

the node's connections to a set of designated hubs within a

certain distance.

Page 26: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Attacks by Combining Multiple Graphs [Narayanan ISSP09]

26

Attack outline:1. The attacker has two type of auxiliary information:

Aggregate: an auxiliary graph whose members overlap with the

anonymized target graph

Individual: the detailed information on a very small number of

individuals (called seeds) in both the auxiliary graph and the target graph.

2. Identify seeds in the target graph.

3. Identify more nodes by comparing the neighborhoods of the

de-anonymized nodes in the auxiliary graph and the target

graph (propagation).

Page 27: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Deriving Link Structure of Entire Network [Korolova ICDE08]

A different threat in which

An adversary subverts user accounts to get local neighborhoods and pieces them together to build the entire network.

No underlying network is released.

A registered user often can see all the links and nodes incident to him within distance d from him.

d=0 if a user can see who he links to.

d=1 if a user can also see who links to all his friends.

Analysis showed that the number of local neighborhoods needed to cover a fraction of the entire network drops exponentially with increase of the lookahead parameter d.

27

Page 28: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

28

Page 29: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Privacy Preserving Social Network Publishing

Naïve anoymization is not sufficient to prevent privacy

breaches, mainly due to link structure based attacks.

Graph topology has to be modified via

Adding/deleting edges/nodes

Grouping nodes/edges into super-nodes and super-edges

Breaking associations between node attributes and nodes

for rich graphs

How to quantify utility loss and privacy preservation in

the perturbed (and anonymized) graph?

29

Page 30: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Graph Utility

Utility heavily depends on mining tasks.

It is challenging to quantify the information loss in the

perturbed graph data.

Unlike tabular data, we cannot use the sum of the

information loss of each individual record.

We cannot use histograms to approximate the distribution

of graph topology.

It is more challenging when considering both structure

change and node attribute change.

30

Page 31: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

31

Graph Utility

Topological features: Structural characteristics of the graph.

Various measures form different perspectives.

Commonly used.

Spectral features: Defined as eigenvalues of the graph's adjacency matrix or other derived

matrices.

Closely related to many topological features.

Aggregate queries: Calculate the aggregate on some paths or subgraphs satisfying the query

condition.

E.g.: the average distance from a medical doctor vertex to a teacher vertex

in a network.

Page 32: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Topological Features

Topological features of networks

Harmonic mean of shortest distance

Transitivity(cluster coefficient)

Subgraph centrality

Modularity (community structure)

And many others (refer to: F. Costa et al., Characterization of Complex Networks:

A Survey of measurements, 2006)

32

Page 33: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Graph and Matrix

Adjacency matrix

1. For a undirected graph, A is a symmetric;

2. No self-links, diagonal entries of A are all 0

3. For a un-weighted graph, A is a 0-1 matrix

33

Page 34: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Spectral Features

Spectral features of networks

Adjacency spectrum

Laplacian spectrum

34

Page 35: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Topological vs. Spectral Features

Adjacency and Laplacian spectrum:

The maximum degree, chromatic number, clique number etc.

are related to ;

Epidemic threshold of the virus propagates in the network is

related to ;

Laplacian spectrum indicates the community structure:

1. k disconnected communities:

2. k loosely connected communities:

35

Page 36: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Topological vs. Spectral Features

Laplacian spectrum & communitiesDisconnected communities:

Loosely connected communities:

36

0.00 0.00 0.00 1.27 2.59 3.00 3.00 3.00 4.00 4.00

4.00 4.73 5.00 5.41 6.00 6.00 6.00 6.00 6.00

0.00 0.11 0.34 1.31 2.60 3.00 3.10 3.36 4.00 4.13

4.59 4.79 5.31 5.58 6.00 6.00 6.00 6.66 7.12

Page 37: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Eigenspace [Ying SDM09]

37

),,( 21 kiiii xxx

nk

k

k

nn

k

x

x

x

x

x

x

x

x

x

2

1

2

22

21

1

21

11

21

Topological vs. Spectral Features

Page 38: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Topological vs. Spectral Features

Topological & spectral features are related

No. of triangles:

Sub-graph centrality:

Graph diameter:

38

Page 39: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

39

Page 40: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-anonymity Privacy Preservation

K-anonymity (Sweeney 1998)

Each individual is identical with at least K-1 other

individuals

A general definition for network data [Hay, VLDB08]

40

Page 41: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-anonymity

Each node is identical with at least K-1 other nodes

under topology-based attacks.

The adversary is assumed to have some knowledge of the

target user:

node degree (K-degree)

(immediate) neighborhood (K-neighborhood)

arbitrary subgraph (K-automorphism etc.)

K-anonymity approach guarantees that no node in the

released graph can be linked to a target individual with

success prob. greater than 1/K.

41

Page 42: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-degree Anonymity [Liu SIGMOD08]

Attacking model:The attackers know the degree of the targeted individual

42

Page 43: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-degree Anonymity [Liu SIGMOD08]

K-degree anonymous:

Optimize utility: minimize no. of added edges

43

Page 44: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-degree Anonymity [Liu SIGMOD08]

Algorithm outline:

44

Page 45: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-neighborhood Anonymity [Zhou ICDE08]

Attacking modelThe attackers know the immediate neighborhood of a targeted

individual.

45

Page 46: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-neighborhood Anonymity [Zhou ICDE08]

Problem:

46

Page 47: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-neighborhood Anonymity [Zhou ICDE08]

Algorithm outline1. Extract the neighborhoods of all vertices in the network.

2. Compare and test all neighborhoods by neighborhood component coding

3. Organize vertices into groups and anonymize the neighborhoods of

vertices in the same group until the graph satisfies K-neighborhood

anonymity.

47

1-neighborhood of Ada Naively Anonymized

Graph

K-neighborhood

Anonymity

Page 48: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-neighborhood Anonymity [Zhou ICDE08]

Graph utility:

1. The nodes have hierarchical label information.

2. Two ways to anonymize the neighborhoods: generalizing labels

and adding edges.

3. Focus on using the anonymized graph to answer aggregate

network queries as accurate as possible.

48

Page 49: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-automorphism Anonymity [ZouVLDB09]

Attacking model:The attackers can know any subgraph that contains the targeted

individual.

Graph automorphism

49

Page 50: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-automorphism Anonymity [ZouVLDB09]

K-automorphic graph

50

Page 51: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-automorphism Anonymity [ZouVLDB09]

Algorithm outline1. Partition graph G into several groups of subgraphs, each group

contains at least K subgraphs, and no subgraphs share a node.

2. Block Alignment: make subgraphs within each group

isomorphic to each other.

3. Edge Copy: copy the edges across the subgraphs properly.

51

Page 52: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-symmetry Model [Wu EDBT10]

Attacking model:The attackers can know any subgraph that contains the targeted

individual.

K-symmetry approach:1. A concept similar to K-automorphism (equivalent?)

2. Make graph K-symmetry by adding fake nodes

52

Page 53: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-isomorphism Model [Cheng SIGMOD10]

Attacking model:The attackers can know any subgraph that contains the targeted

individual.

Insufficient protection on link privacy by K-

automorphism approach

53

Example: the adversary can not identify Alice or Bob, but there must

be a link between them.

Page 54: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-isomorphism Model [Cheng SIGMOD10]

K-security graph

54

Page 55: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-anonymity

55

Privacy

protection

Utility preservation

K-neighborhood

K-degree

K-automorphism

K-symmetry

K-security

Page 56: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-obfuscation [Bonchi ICDE11]

56

Page 57: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

K-obfuscation [Bonchi ICDE11]

57

Both cases respect 2-candidate anonymity.

K-candidate aims at guaranteeing a lower bound on the amount of

uncertainty.

K-obfuscation measures the uncertainty.

The obfuscation level quantified by means of the entropy is always no

less than the one based on a-posteriori belief probabilities.

Page 58: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

58

Page 59: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Generalization Approach [Hay VLDB08]

Generalize nodes into super nodes and edges into super

edges

59

Page 60: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Generalization Approach [Hay VLDB08]

The size of possible graph world:

Maximize the graph likelihood functionSimulated annealing algorithm

1. Start with a single partition containing all nodes;

2. Update state by splitting/merge partitions or move a node to a new

partition.

60

Page 61: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Anonymizing Rich Graph

Social networks contain much richer information in addition to the graph topology.

individual attributes including demographic information, such as age, gender and location, and sensitive personal data, such as political and religious preferences.

relationship may denote various types of interactions and one interaction can also involve more than two participants.

Queries are on the combinations of properties of entities and patterns of the link structure.

K-anonymous models based only on link structure may still leak privacy, e.g., all nodes in a K-anonymous group are associated with the same sensitive information.

61

Page 62: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Link Protection [Zheleva PinKDD07]

Graph model: one type of nodes but multiple types of edges, which are classified as either sensitive or non-sensitive.

The problem of link re-identification is to infer sensitive relationships from non-sensitive ones.

Cluster-edge anonymization is to aggregate edges by type to protect link privacy.

all the anonymized nodes in an equivalence class are collapsed into a single super-node.

release the number of edges of each type between equivalence classes.

62

Page 63: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Clustering Approach [Campan PinKDD08]

Graph model:

edges are not labeled

nodes are associated with attributes (identifier, quasi-

identifier, and sensitive ones).

K-anonymity model via cluster collapsing and

generalization is for both the quasi-identifier attributes

and the quasi-identifier relationship homogeneity.

Two nodes from any cluster are indistinguishable based on

either their relationships or their attributes.

63

Page 64: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Anonymizing Bipartitite Graph [CormodeVLDB08]

Graph model

two types of entities

an association only exists between two entities of different types, e.g., customers buy products.

The relationship is sensitive while entity attributes are public.

The anonymization method is to preserve the graph structure exactly by masking the mapping from entities to nodes.

Aggregate queries are used to evaluate utility.

the total number of OTC products bought by NJ customers.

64

Page 65: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Anonymizing Rick Interaction Graph [BhagatVLDB09]

Hyper-graphG(V,I,E) represents multiple types of interactions between entities

Attacking modelThe attackers know part of the links and nodes in the graph

65

Page 66: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Anonymizing Rick Graph [BhagatVLDB09]

Algorithm outline1. Sort the nodes according to the attributes;

2. Group nodes into super nodes satisfying class safety property,

and the size of each supper node is greater than K.

Class safety: each node cannot have interactions with two or more nodes from

the same group

3. Replace the node identifiers by the label list.

66

Page 67: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

67

Page 68: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Basic Graph Randomization Operations1. Rand Add/Del: randomly add k false edges and delete k true edges (no. of

edges unchanged)

2. Rand Switch: randomly switch a pair of edges, and repeat it for k times

(nodes’ degree unchanged)

68

1

2 3

4

5

1

2 3

4

5

1

2 3

4

5

1

2 3

4

5

Page 69: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Resilient to Subgraph Attack

69

Links both within and outside attack

subgraph are changed.

Page 70: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Randomization

Randomized response model [Warner 1965]

70

: Cheated in the exam : Didn’t cheat in the exam

Cheated in

exam

Didn’t cheat

AAA

A

Randomization device

Do you belong to A? (p)

Do you belong to ?(1-p)A

)1)(1( pp AA

12

ˆ

12

pp

pAW

1

“Yes” answer

“No” answer

As: Unbiased estimate:

Procedure:

Purpose: Get the proportion( ) of population

members that cheated in the exam.A

Purpose

Page 71: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Randomization [Agarawal SIGMOD00]

71

50 | 40K | ... 30 | 70K | ... ...

...

Randomizer Randomizer

Reconstruct

Distribution

of Age

Reconstruct

Distribution

of Salary

Classification

AlgorithmModel

65 | 20K | ... 25 | 60K | ... ...30

becomes

65

(30+35)

Alice’s

age

Add random

number to

Age

Page 72: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Reconstruction

Given

x1+y1, x2+y2, ..., xn+yn where xi are original values.

the probability distribution of noise Y

Estimate the probability distribution of X.

72

met) criterion (stopping until

1

)())((

)())((1)(

repeat

0

ondistributi uniform

1

1

0

jj

afayxf

afayxf

naf

j

f

n

ij

XiiY

j

XiiYj

X

X

0

200

400

600

800

1000

1200

20 60

Age

Nu

mb

er o

f P

eo

ple

Original

Randomized

Reconstructed

[Agarawal SIGMOD00]

Page 73: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline on Graph Randomization

Link privacy

the prob. of existence a link (i,j) given the

perturbed graph

Feature preservation randomization

Spectrum preserving randomization

Markov chain based feature preserving randomization

Reconstruction from randomized graph

73

?)~

|1( GaP ij

Page 74: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Prior probability:

Posterior probabilities

74

Randomized

graph

i

j

ijm~

Method II:

If similarity is large, link (i. j) is

more likely to be a true link

Method I:

m

kmaaP ijij )1~|1(

)~,1~|1( xmaaP ijijij

2/)1()1(

nn

maP ij

Link Privacy: Posterior Beliefs [Ying PAKDD09]

Page 75: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Posterior probability and similarity measures

75

Similarity and proportion of existing edges – before randomization

Similarity and proportion of existing edges – after randomization

Link Privacy: Posterior Beliefs [Ying PAKDD09]

Page 76: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Link Privacy: Posterior Beliefs [Ying PAKDD09]

Posterior probability and similarity measures

How to calculate posterior probability for general

cases?

76

ji

ijijij

ji

ijij

ji

ij maaPaaPaPm )~,~|1()~|1()1(

prior prob. posterior prob. I posterior prob. II

?)~

|1( GaP ij

Page 77: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Exploit graph space to breach link privacy

Link Privacy: Graph Space [Ying SDM09]

77

Page 78: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Sample the graph space when the space is large

1. Start with the randomized graph, construct a Markov chain,

and uniformly sample the graph space.

2. Generate N uniform graph samples

Empirical evaluations show that node pairs with highest

probabilities have serious link disclosure risk (as high as 90%).

Link Privacy: Graph Space [Ying SDM09]

78

N

k

kij

N

jiGN

a

GGGN

1

21

),(1

)SpaceGraph |1Pr(

SpaceGraph ,,, :samples

Page 79: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Graph Features Under Pure Randomization

Topological and spectral features change significantly

along the randomization.

79(Networks of US political books, 105 nodes and 441 edges)

Can we better preserve

the network structure?

Page 80: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Spectrum Preserving Randomization [Ying SDM08]

Graph spectrum is related to many real graph features.

Preserve graph features by preserving some eigenvalues.

80

Page 81: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Spectrum Preserving Randomization [Ying SDM08]

Spectral Switch (apply to adjacency matrix):

Up-switch to increase the eigenvalue:

Down-switch to decrease the eigenvalue:

81

t (xt)

v (xv)

u (xu)

w (xw)

t (xt) u (xu)

w (xw) v (xv)

Page 82: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Spectral Switch (apply to Laplacian matrix):

Down-switch to decrease the eigenvalue:

Up-switch increase the eigenvalue:

t (yt)

v (yv)

u (yu)

w (yw)

t (yt) u (yu)

w (yw) v (yv)

82

Spectrum Preserving Randomization [Ying SDM08]

Page 83: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Algorithm outline

83

Spectrum Preserving Randomization [Ying SDM08]

Page 84: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Preserve any graph feature S(G) within a small range

Feature range constraint specified by the user

Markov chain with feature range constraint

Markov Chain Based Feature Preserving

Randomization [Ying SDM09]

84

],[)~

( ],,[)( ssRGSssRGS

RGSRGSRGSRGS

GGGG

t

t

)()()()(

~

010

10

(uniformity on accessible graphs)

Page 85: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Feature constraint can be used to breach link privacy

85

Original

Released

Markov Chain Based Feature Preserving

Randomization [Ying SDM09]

Page 86: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Reconstruction from Randomized Graph [Wu SDM10]

Motivation

From the randomized graph, can we reconstruct a graph

whose features are closer to the true features?

86

Page 87: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Low rank approximation approach

1. Best rank r approximation by eigen-decomposition:

2. Discretize the low rank matrix

87

Reconstruction from Randomized Graph [Wu SDM10]

Page 88: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Effect of including significant negative

eigenvalues

8888

Original r = 1

r = 4r = 2

Reconstruction from Randomized Graph [Wu SDM10]

Page 89: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

89

Reconstruction from Randomized Graph [Wu SDM10]

Algorithm Outline

Page 90: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Feature value of the original graph, randomized graph,

and the reconstructed graph

9090

Reconstruction from Randomized Graph [Wu SDM10]

Page 91: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Reconstructed graphs do not jeopardize link privacy for

real-world networks.

-- Privacy measured by the proportion of different edges:

9191

Reconstruction from Randomized Graph [Wu SDM10]

Page 92: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

92

Graphs with low rank may have privacy breached by

reconstruction:

Reconstruction on synthetic low rank graphs

92

Reconstruction from Randomized Graph [Wu SDM10]

Page 93: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

93

Graph and feature data

1. Original Graph

2. Binary feature matrix

3. Two individuals with higher similarity in features are more

likely to be connected in the graph.

Reconstruction problem

The maximum likelihood estimation is adopted to reconstruct

the original graph and features.

93

Reconstructing Randomized Social

Networks & Features [Vuokko SDM10]

Page 94: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Random sparsification [Bonchi ICDE11]

only remove edges from the graph without adding new edges.

outperform random add/del in terms of utility preservation partially due to the small word phenomenon:

adding random long-haul edges brings nodes close,

while removing an edge does not bring nodes so much apart since there exit alternative paths.

utility vs. privacy trade-off for various randomization strategies.

94

Page 95: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

95

Page 96: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Edge-weighted Graph

Edge weights could be sensitive, e.g., trustworthiness of

user A according to user B, or transaction amount

between two accounts.

96

Page 97: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Anonymizing Edge-weighted Graph [Das TR09]

Some properties of edge weights in terms of some

functions are preserved.

Relative distances between nodes for shortest paths or

kNN queries.

A framework for edge weight anonymization of graph

data that preserves linear properties.

A linear property can be expressed by a specific set of

linear inequalities of edge weights.

Finding new weights for each edge is a linear programming

problem.

97

Page 98: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Gaussian Randomization [Liu SDM09]

Perturb edge weights while preserving global and local

utilities. Graph structure is unchanged.

Gaussian randomization multiplication

The original weight of each edge is multiplied by a random

Gaussian noise with mean 1 and some variance.

In the original graph, if the shortest distance d(A,B) is

much smaller than d(C,D), the order is high likely to be

preserved.

Greedy perturbation

Preserve a set of shortest distances.

98

Page 99: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Anonymizing Multi-graphs [Li SDM11]

How to generate an anonymized collection of graphs where each graph corresponds to an individual’s behavior.

XML representation of attributes about an individual

Click-graph in a user-session

Route for a given individual in a time-period

Condensation based approach

create constrained clusters of size at least K

construct a super-template to represent properties of the group

generate anonymized graphs from super-template

99

Page 100: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Computing Privacy Scores [Liu ICDM09]

The privacy score measures the user’s potential privacy risk due to

her online information sharing behaviors. It increases with

sensitivity of the information being shared

visibility of the revealed information in the network

100

Page 101: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Computing Privacy Scores [Liu ICDM09]

Item Response Theory based model

Used to measure the abilities of examinees, the difficulty

of the questions, and the prob. of an examinee to answer a

question correctly.

Each examinee is mapped to a user, and each question is

mapped to a profile item. The difficulty parameter is to

quantify the sensitivity of a profile item.

The true visibility is estimated from observed profiles.

101

Page 102: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

102

Page 103: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Output Perturbation

103

Data ownerData miner

name sex age disease salary

Ada F 18 cancer 25k

Bob M 25 heart 110k

Cathy F 20 cancer 70k

Dell M 65 flu 65k

Ed M 60 cancer 300k

Fred M 24 flu 20k

George M 22 cancer 45k

Harry M 40 flu 95k

Irene F 45 heart 70k

Query f

Query result + noise

Page 104: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Differential Guarantee [Dwork, TCC06]

104

name disease

Ada cancer

Bob heart

Cathy cancer

Dell flu

Ed cancer

Fred flu

f count(#cancer)

f(x) + noise

name disease

Ada cancer

Bob heart

Cathy cancer

Dell flu

Ed cancer

Fred flu

K

K

f count(#cancer)

f(x’) + noise

3 + noise

2 + noise

Two databases (x, x’) differ in only one row.

Page 105: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Differential Guarantee

Require that the prob. distribution is essentially the same independent of whether any individual opts in to, or opts out of the database.

Anything that can be learned about a respondent from a statistical database should be learnable without access to the database. (Tore Dalenius 1977 ad omnia privacy)

Independent of adversary knowledge.

Different from prior work on comparing an adversary's prior and posterior views of an individual.

105

Page 106: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

106

Page 107: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

-differential privacy

is a privacy parameter: smaller = stronger privacy

Differential Privacy [Dwork, TCC06 &Dwork CACM11]

Two neighboring datasets are defined in terms of

• Hamming distance |(x-x’) (x’-x)|=1

• Symmetric distance

107

Page 108: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

-differential privacy

is a privacy parameter: smaller = stronger privacy

Differential Privacy [Dwork, TCC06 &Dwork CACM11]

108

is public and its selection is a

social question.

Page 109: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Calibrating Noise

109

Laplace distribution

Sensitivity of function

global sensitivity

local sensitivity

Multiple queries

Page 111: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Gaussian Distribution

111

Page 112: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Sensitivity

112

name sex age disease salary

Ada F 18 cancer 25k

Bob M 25 heart 110k

Cathy F 20 cancer 70k

Dell M 65 flu 65k

Ed M 60 cancer 300k

Fred M 24 flu 20k

George M 22 cancer 45k

Harry M 40 flu 95k

Irene F 45 heart 70k

Function f sensitivity

Count(#cancer) 1

Sum(salary) u (domain upper bound)

Avg(salary) u/n

Complex functions or data mining tasks can be

decomposed to a sequence of simple functions.

L-1 distance for

vector output

Page 113: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Differential Guarantee

113

name disease

Ada cancer

Bob heart

Cathy cancer

Dell flu

Ed cancer

Fred flu

f count(#cancer)

f(x) + Lap(1/ ) K

=1 =2

Difference of Prob.

Page 114: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Neighboring Datasets

114

• Two neighboring datasets are defined in terms of

• Hamming distance |(x-x’) (x’-x)|=1 --- presence/absence

• Symmetric distance --- hiding the value of an individual row

• How about two data sets differing by k rows?

Page 115: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Multiple Queries

115

• The sensitivity of a sequence of two counting

queries is 2. Adding independent noise Lap(2/ )

to each answer yields -differential privacy.

Page 116: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Histogram Query

SELECT count(*)

FROM table

GROUP BY disease

name sex age disease salary

Ada F 18 cancer 25k

Bob M 25 heart 110k

Ed M 60 cancer 300k

George M 22 cancer 45k

Harry M 40 flu 95k

Irene F 45 heart 70k

[3, 2, 1]

cancer, heart, flu

[3+Lap(1/ ), 2+Lap(1/ ),1+Lap(1/ )]

Page 117: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Complex Data Mining Task

Many data mining tasks can be decomposed

into a series of noisy sum and/or noisy

count queries.

Known data mining algorithms include

SVD, ID3 decision tree, K-means clustering,

association rule etc.

117

Page 118: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Recent Development

[Xiao ICDE10] Dependencies among the queries can be

exploited to improve the accuracy of responses.

[Li PODS10] Matrix mechanism for answering a workload of

predicate counting queries

[Kifer SIGMOD11] Misconceptions of differential privacy.

118

Page 119: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

119

Page 120: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Private Query Answering on Networks[Hay ICDM09]

120

Two neighboring graphs can be defined to differ by a

single edge, K edges, or a single node.

edge -differential privacy

query output is indistinguishable whether any

single edge is present or absent.

K-edge -differential privacy

query output is indistinguishable whether any set

of k edges is present or absent.

node -differential privacy

query output is indistinguishable whether any

single node (and all its edges) is present or absent.

Lap( f/ )

Lap( f K/ )

Page 121: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Degree Sequence

The list of degrees of each node in a graph

The degree sequence of a network may be sensitive as it

can be used to determine the graph structure by

incorporating other graph statistics

121

Page 122: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Two Equivalent Queries

122

Degree sequence D(G)=[1,1,3,3,3,3,2] D(G’)=[1,1,3,3,2,2,2]

D=2, Lap(2/ ) for each component

# of nodes with degree i=0,…n-1

F(G)=[0,2,1,4,0,0,0] F(G’)=[0,2,3,2,0,0,0]

F=4, Lap(4/ ) for each component

Page 123: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Boosting Accuracy [Hay ICDM09]

123

Rewrite query D to get S with constraint Cs

K Submit S

Perturbed answer A(S)

Perform inference on A(S) with

constraints Cs to derive a better estimation

Page 124: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Formulating Query D

124

Degree sequence D(G)=[1,1,3,3,3,3,2] D(G’)=[1,1,3,3,2,2,2]

D=2, Lap(2/ ) for each component

Return the rank i-th degree

S(G)=[1,1,2,3,3,3,3]Perturbed answer could be

[3,2,…]

A new (and more accurate) sequence could be derived by computing the

closest non-decreasing sequence.

+Lap(2/ )

Page 125: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Accurate Motif Analysis

Measures the frequency of occurrence of small

subgraphs in a network, e.g., # of triangles

125

# of triangles is n-2 # of triangles is 0

High sensitivity!

Page 126: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Weakening Privacy

Statistics such as transitivity, clustering coefficient,

centrality, and path-lengths have high sensitivity values.

Possible solutions

[Nissim STOC07] Smooth sensitivity, adopting local

sensitivity, i.e., the max. change between Q(I) and Q(I’) for any

I’ in neighbor(I).

[Rastogi PODS09] Adversary privacy, limiting assumptions

of the priori knowledge of the adversary

More exploration is needed on robust statistics and

differential privacy.

126

Page 127: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Model based Data Publishing

127

Data owner Data miner

name sex age disease salary

Ada F 18 cancer 25k

Bob M 25 heart 110k

Cathy F 20 cancer 70k

Dell M 65 flu 65k

Ed M 60 cancer 300k

Fred M 24 flu 20k

George M 22 cancer 45k

Harry M 40 flu 95k

Irene F 45 heart 70k

K.

.

.

Build models (e.g., contingency table, power-law graph)

Release differentially private model parameters

Generate synthetic data using models with perturbed parameters

Page 128: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Outline

• Attacks on Naively Anonymized Graph

• Privacy Preserving Social Network Publishing

• K-anonymity

• Generalization

• Randomization

• Other Works

• Output Perturbation

• Background on differential privacy

• Accurate analysis of private network data

128

Page 129: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

Challenges

How to quantify graph utility-privacy tradeoff especially for (dynamic) rich graphs?

existing data publication techniques do not provide guarantees of accuracy of graph analysis.

how to define utility?

Scalability is always an issue.

Differential privacy preserving social network mining

sensitivity of statistics and relaxation

129

Page 130: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

References

[Agarawal, SIGMOD00] R. Aggarwal and R. Srikant. Privacy preserving data mining. SIGMOD, 2000.

[Aggarwal, 08] C. C. Aggarwal and P. S. Yu. Privacy-preserving data mining: models and algorithms. Springer, 2008.

[Backstrom, WWW07] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art thou R3579X? Anonymized social networks, hidden patterns and structural steganography. WWW, 2007.

[Bhagat, VLDB09] S. Bhagat, G. Cormode, B. Krishnamurthy, and D. Srivastava. Class-based graph anaonymization for social network data. VLDB, 2009.

[Bonchi, ICDE11] F. Bonchi, A. Gionis, and T. Tassa. Identity Obfuscation in Graphs Through the Information Theoretic Lens. ICDE, 2011.

[Campan, PinKDD08] A. Campan and T. M. Truta. A clustering approach for data and structural anonymity in social network data. PinKDD, 2008.

[Cheng, SIGMOD10] J. Cheng, A. Fu, and J. Liu. K-isomorphism: privacy preserving network publication against structural attacks. SIGMOD, 2010.

[Cormode, VLDB08] G. Cormode, D. Srivastava, T. Yu, and Q. Zhang. Anonymizing bipartite graph data using safe groupings. VLDB, 2008.

130

Page 131: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

References

[Das, TR09] S. Das, Omer Egecioglu, and A. E. Abbadi. Anonymizing edge weighted social network graphs. 2009.

[Dwork, CACM11] C. Dwork. A firm foundation for private data analysis. CACM, 2011.

[Dwork, TCC06] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. TCC, 2006.

[Gross, WPES05] R. Gross and A. Acquisti. Information revelation and privacy in online social networks (the Facebook case). WPES, 2005.

[Hanhijarvi, SDM09] S. Hanhijarvi, G. C. Garriga, and K. Puolamaki. Randomization techniques for graphs. SDM, 2009.

[Hay, VLDB08] M. Hay, G. Miklau, D. Jensen, D. Towsely, and P. Weis. Resisting structural re-identification in anonymized social networks. VLDB, 2008.

[Hay, 07] M. Hay, G. Miklau, D. Jensen, P. Weis, and S. Srivastava. Anonymizing social networks. 2007.

131

Page 132: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

References

[Hay, 09] M. Hay, G. Miklau and D. Jensen. Enabling accurate analysis of private data analysis. 2009.

[Kifer, SIGMOD11] D. Kifer and A. Machanavajjhala. No Free Lunch in Data Privacy. SIGMOD, 2011.

[Korolova, ICDE08] A. Koroloca, R. Motwani, S. Nabar, and Y. Xu. Link privacy in social networks. ICDE, 2008.

[Li, PODS10] C. Li, M. Hay, V. Rastogi, G. Miklau, A. McGregor. Optimizing linear counting queries under differential privacy. PODS, 2010.

[Liu, SIGMOD08] K. Liu and E. Terzi. Towards identity anonymization on graphs. SIGMOD 2008.

[Liu, ICDM09] K. Liu and E. Terzi. A framework for computing the privacy scores of users in online social networks. ICDM 2009.

[Liu, SDM09] L. Liu, J. Wang, J. Liu and J. Zhang. Privacy preserving in social networks against sensitive edge disclosure. SDM 2008.

[McSherry, FOCS07] F. McSherry and K. Talwar. Mechanism design via differential privacy. FOCS, 2007.

132

Page 133: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

References

[Narayanan, 09] A. Narayanan and V. Shmatikov. De-anonymizing social networks. 2009.

[Newman, PRE06] M. Newman. Physical Review E, 2006.

[Vuokko, SDM10] N. Vyokko and E. Terzi. Reconstructing randomized social networks. SDM, 2010.

[Wu, SDM10] L. Wu, X. Ying, and X. Wu. Reconstruction of randomized graph via low rank approximation. SDM, 2010.

[Wu, EDBT11] W. Wu, Y. Xiao, W. Wang, Z. He, and Z. Wang. k-symmetry model for identity anonymization in social networks. EDBT, 2011.

[Wu, 09] X. Wu, X. Ying, K. Liu, and L. Chen. A Survey of Algorithms for Privacy-Preservation of Graphs and Social Networks. 2009.

[Ying, SDM08] X. Ying and X. Wu. Randomizing social networks: a spectrum preserving approach. SDM, 2008.

[Ying, SDM09] X. Ying and X. Wu. Graph generation with prescribed feature constraints. SDM, 2009.

133

Page 134: A Tutorial of Privacy-Preservation of Graphs and Social Networksxintaowu/publ/ppsn-tut.pdf · Community partition and outlier detection Information spreading Resiliency/robustness,

References [Ying, SDM09-2] X. Ying and X. Wu. On randomness measures for social networks. SDM, 2009.

[Ying, PAKDD09] X. Ying and X. Wu. On link privacy in randomizing social networks. PAKDD, 2009.

[Xiao, ICDE10] X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transformation. ICDE,

2010.

[Zheleva, PinKDD07] E. Zhelava and L. Getoor. Preserving the privacy of sensitive relationships in

graph data. PinKDD, 2007.

[Zhou, ICDE08] B. Zhou and J. Pei. Preserving privacy in social networks against neighborhood attacks.

ICDE, 2008.

[Zou, VLDB09] L. Zou, L. Chen, and M. T. Ozsu. K-automorphism: A general framework for privacy

preserving network publication. VLDB, 2009.

134