Jun Zhang - University of Kentuckyjzhang/CS689/PPDM-Chapter9.pdf · 2015. 3. 30. ·...

Jun Zhang, Department of Computer Science, University of Kentucky, USA


Outline

• Social Network Background
• Privacy Challenges
• Data Privacy and Data Utility
• Clustering-Based and Heuristic Algorithms
• Case Study
• Conclusions


What is a “social network”?

• Information circulation: blogs, photos, news
• Information sharing: data, friendship, communication, professional

A social network is a computerized interactive structure that promotes information circulation and sharing, aided by computing devices and internet media.

Section 1: Background


1971: the beginning of the Internet

As a precursor of today's Internet, ARPANET connected just 18 academic and governmental partners.

Source: Richard T. Griffiths. History of the Internet, Internet for Historians


2011, everyone’s Internet


What is social networking?

Many people are virtually connected in one way or another.


Growing Popularity

• More than 160 major social network websites in the world [1]
• In the 3rd quarter of 2014, Facebook had 1.35 billion active users, 728 million of whom logged in daily; each account has an average of 130 friends [3]
• In 2008, new Facebook accounts came mainly from users aged 35 and older [4] (270% increase)
• At the beginning of 2008, Twitter had only 0.5 million users; by the end of 2008 it had 4.43 million [5] (752% increase). In the 4th quarter of 2014, it had 288 million active users
• Top global social networks (2015): Facebook 1,415 million, QQ 829 million, WhatsApp 700 million, Qzone 629 million, Facebook Messenger 500 million, WeChat 468 million, LinkedIn 347 million, Skype 300 million, Google+ 300 million, Instagram 300 million, Baidu Tieba 300 million, Twitter 288 million


What are they doing online?


Why research social networks?

• Essential reason: a social network is an abstract but effective representation of real society.
• This abstract representation can help us understand social development, economic depression, information circulation, epidemic spread, etc.
• A better understanding of social networks can promote social benefits and support policy-making.


Representation of real society

Example 1 [1][2]:
• In some cases, a social network can closely reflect a trend in real society
• The correlation between social networks and real society indicates wide demographic involvement in social networks

[1]. http://openlook.org/blog/2007/12/21/cb-1195/ (in Korean), [2]. Lee et al. Googling hidden interactions: Web search engine based weighted network construction, 2007.


Revolutions and Social Network

[1]. http://socialtuts.cim, [2]. News.mgid.com. [3]. Hellotrade.com.


We are close to each other

Six Degrees of Separation (small-world theory): in a society, any two people can be connected through a chain of no more than 6 friends.

Example 2: In 1967, Dr. Stanley Milgram at Harvard University verified it with a seminal experiment.

On Facebook, the average degree of separation between users is 4.74, and it is shrinking quickly.

Source:http://en.wikipedia.org/wiki/Six_Degrees_of_Separation_%28film%29


Academic Interest


Why study privacy preservation?

• User engagement
Wide demographic engagement benefits social networks, but malicious users, such as hackers, can exploit it.

• Information circulation
Information circulation advances data utilization, but information can be damaged in the process of circulation.

• Information sharing
Information sharing can promote mutual benefits. But how can the level and boundary of sharing be controlled (e.g., against pirating)?


Privacy in social network

• Case 1:

Source: Sweeney. k-anonymity: a model for protecting privacy, 2002.

Dr. Sweeney at CMU was able to pinpoint the Governor's medical record by linking the public voter registration table of Massachusetts, USA (right circle) with the anonymized GIC insurance information (left circle).


Privacy in social network


Privacy in social network


Challenge 1: How to store/handle social data

• Huge volume: 1.415 billion active accounts on Facebook, 50% of active accounts log in daily, each account has an average of 130 friends
• Heterogeneity: numerical (age, salary, frequency), discrete (political affiliation), string (blog, comment), multimedia (photo, audio, video), relational data (friendship, membership)
• Rapid update

Section 2: Challenge


Challenge 2: How to represent background

Any information can be used as background information to aid privacy attacks:

• topological structure
• profiles
• friendship
• membership
• contact frequency
• more…


Challenge 3: How to measure data loss

• To preserve privacy, it is necessary to modify/perturb the original social networks.
• In the process of modification, some original information/patterns will be lost.
• How can data loss be measured mathematically? Where is the baseline of data loss?


Challenge 4: How to design algorithms

• Globalization of data utility
• Balance of data privacy and data utility
• Multi-utilization


What is Data Privacy?

• An open problem
• In the field of social networks, private information is any information that can be used to link a social network user to a real-world identity.
• An attack is any projection (mapping) that can establish such a link between a social network user and a real identity.

Section 3: Privacy & Utility


Purpose

Cut off any information connection


Naïve Privacy

Directly identifiable information:

• Identification numbers (passport, driver's license)
• Real name, address, affiliation
• Special experiences (chairman of a department)


Node Degree Attack

[Figure: anonymized graph with nodes A, B, C, D, E, F]

Background: Bob makes friends with everyone, or Bob is the most popular person in this group.

Haha, C is Bob.

Source: Backstrom et al. WWW '07, Hay et al. VLDB '08, Liu & Terzi SIGMOD '08, Narayanan SP '08, '09
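The degree attack can be sketched in a few lines of Python. The edge list below is hypothetical (the figure's edges are not recoverable from this transcript); it is chosen so that node C is adjacent to every other node, matching the background knowledge that Bob is friends with everyone.

```python
from collections import defaultdict

# Hypothetical anonymized graph; the labels A..F hide the real identities.
edges = [("C", "A"), ("C", "B"), ("C", "D"), ("C", "E"), ("C", "F"),
         ("A", "B"), ("E", "F")]

def degrees(edge_list):
    """Count the neighbors of each node in an undirected graph."""
    deg = defaultdict(int)
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_attack(edge_list, target_degree):
    """Return the nodes whose degree matches the attacker's background knowledge."""
    return sorted(n for n, d in degrees(edge_list).items() if d == target_degree)

# Background knowledge: Bob is friends with all 5 other people in the group.
candidates = degree_attack(edges, target_degree=5)
print(candidates)  # a single candidate means Bob is re-identified
```

If several nodes shared the target degree, the attacker could only narrow Bob down to that candidate set, which is the idea behind degree-based anonymization.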


Neighborhood Attack

[Figure: anonymized graph with nodes A, B, C, D, E, F]

Background: Bob has 2 friends, Alice and Carl, who know each other. Bob has another 2 friends, Dunn and Lily, who do not know each other. Lily has many other friends, but Bob knows none of them.

C is Bob, B is Lily, and A is Dunn.

Source: Zhou et al. Preserving Privacy in Social Networks Against Neighborhood Attacks. ICDE 2008.


Membership Attack

[Figure: graph with nodes A, B, C, D, E, F and two groups, College Congress and Hiking Club]

Background: A={male, 32-year-old}, B={male, 16}, E={female, 21}, F={female, 22}, D={male, 20}, C={??, ??}

“C should be male and around 20.”

Source: Zheleva & Getoor PinKDD '07, Korolova et al. CIKM '08


Link Label Attack

[Figure: graph with nodes A, B, C, D, E, F and labeled links]

Background: Bob is a hearing-impaired person.

Haha, B is Bob.

Source: Cormode et al., VLDB 2008.


Association Attack

“I don't want my friends to know Peter is my brother.”

Sybil Attack

Source: Backstrom et al., WWW 2007. Photo: http://www.problogger.net/archives/2007/06/28/what-social-networking-sites-do-you-use-how-do-they-benefit-your-blog/

Bob

Theory: k = (2+δ) log(n) sybil nodes can breach most node identities.

Experiment: in a real social network with 4 million nodes, just 7 sybil nodes can locate 2,400 real identities.


What is data utility?

• Another open question
• Data utility is any knowledge or pattern obtained by analyzing social networks that helps us understand social networks or society.


Purpose

Maintain usefulness of data

Degree distribution

Source: [1] Newman et al. Email networks and the spread of computer viruses. 2002 (photo). [2] Albert et al. Error and attack tolerance of complex networks. Nature 2000.


Giant community

Source: Newman et al. Email networks and the spread of computer viruses. 2002


Shortest paths (data utility)

Source: [1] Liu et al., Privacy Preservation in Social Networks with Sensitive Edge Weights, SDM 2009. [2] Das et al., Anonymizing Edge-Weighted Social Network Graphs, ICDE 2010.


Eigenvalue

Source: Ying et al. On Randomness Measures for Social Networks, SDM 2009.


SQL query

• SELECT COUNT(DISTINCT node.major)
  FROM social network G
  (count the number of majors in G)

• SELECT AVG(node.age)
  FROM social network G
  WHERE node.interest = 'computer game'
  (compute the average age of users interested in computer games)

• more… (data utility and SQL query)
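As a rough illustration, the two aggregate queries can be run with Python's built-in `sqlite3` module. The table name `node`, its columns, and the four sample rows are assumptions made for this sketch; the slides do not specify a schema.

```python
import sqlite3

# Hypothetical node table standing in for the social network G.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (name TEXT, major TEXT, age INTEGER, interest TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?)",
    [
        ("A", "math", 20, "computer game"),
        ("B", "physics", 24, "hiking"),
        ("C", "math", 22, "computer game"),
        ("D", "biology", 30, "computer game"),
    ],
)

# Number of distinct majors in G.
num_majors = conn.execute("SELECT COUNT(DISTINCT major) FROM node").fetchone()[0]

# Average age of users interested in computer games.
avg_age = conn.execute(
    "SELECT AVG(age) FROM node WHERE interest = ?", ("computer game",)
).fetchone()[0]

print(num_majors, avg_age)
```

Comparing such query answers on the original and the anonymized network is one way to quantify how much utility an anonymization step has cost.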


Clustering-based algorithm

Before generalization (groups: College Congress, Hike Club):
A={male,49}, B={male,16}, C={??,??}, D={male,21}, E={female,22}, F={female,20}

After generalization (groups: College Congress, Hike Club):
A={*,49}, B={*,16}, C={??,??}, D={male,[16-30]}, E={female,[16-30]}, F={female,[16-30]}

Source: Campan PinKDD08, Hay VLDB08, Cormode VLDB08, VLDB09
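A minimal sketch of the age generalization applied to D, E, and F, in Python. The bucket [16-30] is taken from the slide; the second bucket and the function names are hypothetical, and this is not the clustering algorithm of the cited papers, only the attribute-generalization step.

```python
# Age buckets: [16-30] comes from the slide; (31, 60) is a hypothetical filler.
AGE_BUCKETS = [(16, 30), (31, 60)]

def generalize_age(age):
    """Map an exact age to the bucket that contains it, or suppress it."""
    for low, high in AGE_BUCKETS:
        if low <= age <= high:
            return f"[{low}-{high}]"
    return "*"  # suppress values that fall outside every bucket

def generalize(records, names):
    """Return a copy of records where the ages of the listed nodes become ranges."""
    out = {}
    for name, (gender, age) in records.items():
        out[name] = (gender, generalize_age(age) if name in names else age)
    return out

records = {"D": ("male", 21), "E": ("female", 22), "F": ("female", 20)}
anonymized = generalize(records, names={"D", "E", "F"})
print(anonymized)
```

After this step the three nodes share the same age value, so an attacker who knows someone's exact age can no longer tell them apart by age alone.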

Section 4: Algorithms


Heuristic

[Figure: graph with nodes A, B, C, D, E, F, G, H, I]

v(G)={D,C,E,F,B,A,I,G,H}
d(G)={4,4,2,2,2,1,1,1,1}

Goal: the number of nodes with the same degree should be more than 3.

Source: Liu et al. Towards Identity Anonymization on Graphs, SIGMOD 2008.


Heuristic

[Figure: graph with nodes A, B, C, D, E, F, G, H, I]

v(G)={D,C,E,F,B,A,I,G,H}
Before: d(G)={4,4,2,2,2,1,1,1,1}
After: d(G)={4,4,4,4,2,2,2,2,2}

Goal: the number of nodes with the same degree should be more than 3.

Source: Liu et al. Towards Identity Anonymization on Graphs, SIGMOD 2008.


Heuristic

[Figure: graph with nodes A, B, C, D, E, F, G, H, I]

d(G)={4,4,2,2,2,1,1,1,1}

Goal: the number of nodes with the same degree should be more than 3 (k=3).

DA(d[i,j]) = optimal solution to make [d[i], d[j]] satisfy the goal
I(d[i,j]) = optimal solution to make [d[i], d[j]] have the same degree

$I(d[i,j]) = \sum_{\ell=i}^{j} \bigl(d(i) - d(\ell)\bigr)$

For $i < 2k$: $DA(d[1,i]) = I(d[1,i])$

For $i \ge 2k$: $DA(d[1,i]) = \min\Bigl\{ \min_{k \le t \le i-k} \bigl\{ DA(d[1,t]) + I(d[t+1,i]) \bigr\},\ I(d[1,i]) \Bigr\}$
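The recurrence can be sketched as a small dynamic program in Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes the degree sequence is sorted in non-increasing order, that every group of equal degrees must contain at least k nodes, and that degrees may only be raised. It returns the cheapest anonymized sequence and does not check whether that sequence can be realized as a graph, so its output need not match the sequence drawn on the slides.

```python
def anonymize_degrees(d, k):
    """Return (cost, anonymized sequence) for a non-increasing degree sequence d."""
    n = len(d)
    if n < k:
        raise ValueError("need at least k degrees")
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)

    def same_degree_cost(i, j):
        # I(d[i,j]): cost of raising d[i..j] (1-based, inclusive) to d[i].
        return d[i - 1] * (j - i + 1) - (prefix[j] - prefix[i - 1])

    da = [float("inf")] * (n + 1)   # da[i] = DA(d[1,i])
    split = [0] * (n + 1)           # last group is d[split[i]+1 .. i]
    for i in range(k, n + 1):
        da[i] = same_degree_cost(1, i)
        if i >= 2 * k:
            for t in range(k, i - k + 1):
                cand = da[t] + same_degree_cost(t + 1, i)
                if cand < da[i]:
                    da[i], split[i] = cand, t

    # Backtrack through the split points to rebuild the sequence.
    out = list(d)
    i = n
    while i > 0:
        t = split[i]
        for p in range(t, i):
            out[p] = d[t]
        i = t
    return da[n], out

cost, seq = anonymize_degrees([4, 4, 2, 2, 2, 1, 1, 1, 1], k=3)
print(cost, seq)
```

The prefix sums make each I(d[i,j]) an O(1) lookup, so the whole table is filled in O(n²) time.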


Weight privacy


In a weighted social network, a large weight probably implies a close personal relationship which many people do not want to become public

Section 5: Case Study


Social Network Weight Privacy


In a social network, the frequency of communication (chats, messages, e-mails) between users serves as the edge weight. If you have several online friends or girlfriends, you probably do not want to disclose how often you communicate with each of them.

[Figure: weighted social network with edge weights 100, 20, 40, 35]


Social Network in real life

In order to get his child enrolled in a good school, a parent wants to get connected with the Principal.

Spend the minimum amount of money, get the maximum amount of benefit.

[Figure: paths from Parents to Principal through Person 1, Person 2, …, Person n, with edge costs ¥500, ¥100, ¥300, ¥200, ¥400, ¥500]


Data Privacy and Data Utility


In the following presentation:
• Data Privacy: all edge weights
• Data Utility: shortest paths and their lengths, because of their rich applications

• Perturb the weights as much as possible
• Keep the shortest paths (and lengths) as close to the original ones as possible



Weight Privacy in Business

[Figure: supply network from New Supplier to Walmart through Agents A, B, C, D, E; edge weights 40, 10, 43, 60, 85, 50, 90, 70, 66, 48; unit = million dollars/month]

Find the cheapest supply chain from New Supplier to Walmart.

This is a shortest-path problem.


Challenges


Theorem: There does NOT exist a perfect scheme to modify all weights but maintain all shortest paths (and lengths). *

* For the formal proposition and mathematical proof, see Proposition 1 in our paper.

Challenges:
• Data Utility (i.e., the shortest paths and lengths) is a global property.
• Data Privacy (i.e., individual weights) is local information.

How can we carefully change local weights without an unacceptable impact on shortest paths and lengths?



Gaussian Perturbation

[Figure: original supply network from New Supplier to Wal-Mart through Agents 1, 2, 3, 4, 5; edge weights 40, 10, 43, 60, 85, 50, 90, 70, 66; unit = million/month]

$w^*_{i,j} = w_{i,j}\,(1 - x_{i,j})$,

where $x_{i,j}$ is a random number drawn from the Gaussian distribution $N(0, \sigma^2)$.
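A minimal Python sketch of this perturbation rule. The edge names and weights are hypothetical, and the `seed` parameter is an addition for reproducibility that the slides do not mention.

```python
import random

def gaussian_perturb(weights, sigma, seed=None):
    """Return perturbed weights w* = w * (1 - x), with x drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    return {edge: w * (1.0 - rng.gauss(0.0, sigma)) for edge, w in weights.items()}

# Hypothetical three-edge network (not the network drawn on the slide).
weights = {("s", "a"): 40.0, ("a", "t"): 60.0, ("s", "t"): 85.0}
perturbed = gaussian_perturb(weights, sigma=0.15, seed=0)
for edge, w in perturbed.items():
    print(edge, round(w, 2))
```

Each edge is perturbed independently, so the procedure needs no knowledge of the graph's global structure.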

[Figure: perturbed supply network from New Supplier to Wal-Mart through Agents 1, 2, 3, 4, 5; edge labels 35, 32, 33, 70, 70, 65, 67, 70, 70, 48, 36; unit = million/month]

• Privacy: almost all weights are changed.
• Utility: the shortest path between New Supplier and Wal-Mart is the same, and its length is 99.


HOW TO MODIFY WEIGHTS AND KEEP SP

• Gaussian Perturbation

For a path, the weights of its edges may be changed in either a negative or a positive direction, so the total change along the path may be very close to zero.

About 95.4% of the $x_{i,j}$ lie within 2σ of zero, and about 99.7% lie within 3σ.


Analysis on Gaussian perturbation


Claim 2: Let L be the length of a path in the original network and L* the length of the corresponding path in the perturbed network. For a given value of σ:

1. Approximately 68% of L satisfy ,

2. Approximately 98% of L satisfy

3. Approximately 99.7% of L satisfy

* For the formal theorem, corollary, and mathematical proofs, see Theorem 2 and Corollary 3 in our paper, respectively.


Analysis on Gaussian perturbation

Claim 3: Let $d_{i,j}$ be the length of the shortest path between node i and node j, and $d^{second}_{i,j}$ the length of the second shortest path between the same node pair.

For two given nodes i and j, if the ratio $\beta_{i,j} = (d^{second}_{i,j} - d_{i,j}) / d_{i,j}$ is greater than 2σ, the shortest path is highly likely to be preserved after Gaussian perturbation. *

Recall Claim 2. Approximately 98% of L satisfy

If the shortest path and the second shortest path differ by a large length, the shortest path is very likely to be preserved after the perturbation.

* For the formal theorem, corollary, and mathematical proofs, see Theorem 2 and Corollary 3 in our paper, respectively.


An example


The shortest path, length is 21

The second shortest path, length is 30


An example

The shortest path, length is 21

The second shortest path, length is 30

Gaussian perturbation with σ = 0.15

$\beta_{1,6} = (30-21)/21 = 0.429 \ge 2\sigma = 0.30$, so the shortest path between v1 and v6 is very likely to be maintained after the Gaussian perturbation.
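The arithmetic of this example can be checked with a short Python sketch. The function names are hypothetical; the ratio is the one used in the example, (second shortest length minus shortest length) divided by the shortest length.

```python
def beta(shortest, second_shortest):
    """Relative gap between the second shortest and the shortest path length."""
    return (second_shortest - shortest) / shortest

def likely_preserved(shortest, second_shortest, sigma):
    """Rule of thumb from the claim: the path is likely kept when beta exceeds 2*sigma."""
    return beta(shortest, second_shortest) > 2 * sigma

b = beta(21, 30)
print(round(b, 3))                            # 0.429
print(likely_preserved(21, 30, sigma=0.15))   # True
print(likely_preserved(21, 30, sigma=0.25))   # False, since 2*sigma = 0.5
```

The check is a heuristic: it says the shortest path is likely to survive, not that it is guaranteed to.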


RESULTS WITH GAUSSIAN PERTURBATION

[Figure panels: σ=0.1 on EIES; σ=0.15 on EIES; σ=0.2 on EIES]

At x-axis value 0.15, for example, the dashed point (length) is 0.8699 and the solid point (weight) is 0.8565. This means that, with the Gaussian algorithm, 85.65% of the $w^*_{i,j}$ fall into $w_{i,j}(1 \pm 0.15)$ and 86.99% of the $d^*_{i,j}$ fall into $d_{i,j}(1 \pm 0.15)$.


Greedy Perturbation: Discussion


• Gaussian Perturbation is quick and independent of the global structure. But it cannot always keep the same shortest paths when σ is not small.

• We propose a Greedy Perturbation that keeps the exact shortest paths and makes sure that their lengths stay similar to the original ones.


Edge Categorization


[Figure: three copies of a graph on nodes V1 to V6, highlighting the shortest path p1,6, the shortest path p4,6, and the shortest path p3,6]

H={p1,6 , p4,6 , p3,6}.

Constraints: the shortest paths in H cannot be changed after perturbation.


Edge Categorization


[Figure: graph on nodes V1 to V6 with edge weights 6, 9, 6, 7, 5, 13, 25, 10, 10. Panels highlight the shortest paths p1,6, p4,6, and p3,6; edges are marked as non-visited, partially-visited, or all-visited.]
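The three edge categories can be sketched in Python as follows. This assumes the natural reading of the names: an edge is non-visited if no shortest path in H uses it, all-visited if every path in H uses it, and partially-visited otherwise. The topology and the paths below are hypothetical, since the figure's edges are not recoverable from this transcript.

```python
def categorize_edges(all_edges, paths):
    """Label each undirected edge by how many of the given paths traverse it."""
    def norm(u, v):
        return tuple(sorted((u, v)))

    counts = {norm(u, v): 0 for u, v in all_edges}
    for path in paths:
        for u, v in zip(path, path[1:]):
            counts[norm(u, v)] += 1

    labels = {}
    for edge, c in counts.items():
        if c == 0:
            labels[edge] = "non-visited"
        elif c == len(paths):
            labels[edge] = "all-visited"
        else:
            labels[edge] = "partially-visited"
    return labels

# Hypothetical graph on nodes 1..6 and a hypothetical set H of shortest paths.
all_edges = [(1, 2), (2, 3), (2, 5), (4, 5), (5, 6), (1, 4), (3, 6)]
H = [[1, 2, 5, 6], [4, 5, 6], [3, 2, 5, 6]]   # stand-ins for p1,6, p4,6, p3,6
labels = categorize_edges(all_edges, H)
for edge, label in sorted(labels.items()):
    print(edge, label)
```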


Non-Visited Edge


Claim 3: For a non-visited edge, increasing its weight will NOT change any shortest path (or its length) in H. *

* For the formal definition, see Proposition 7.

[Figure: graph on nodes V1 to V6 with edge weights 6, 6, 5, 10, 10, 25, 13, 9; the weight of a non-visited edge is increased from 7 to 10]

P1,6 (no change)
P4,6 (no change)
P3,6 (no change)


All-Visited Edge


Claim 4: For an all-visited edge, decreasing its weight will NOT change any shortest path in H, but it will decrease the lengths of the corresponding shortest paths. *

* For the formal definition, see Proposition 8.

[Figure: graph on nodes V1 to V6 with edge weights 6, 9, 6, 5, 13, 25, 10, 7; the weight of an all-visited edge is decreased from 10 to 5]

P1,6 (no change)
P4,6 (no change)
P3,6 (no change)


Greedy Edge Perturbation (2)


Claim 5: For a partially-visited edge, if we want to increase its weight by t, we must guarantee that the shortest paths that go through it still go through it after perturbation. *

* How to guarantee this (i.e., which constraints to impose on the weight increase) is shown as Proposition 9 in our paper.

[Figure: graph on nodes V1 to V6 with edge weights 6, 9, 6, 7, 13, 25, 10, 10; the weight of the partially-visited edge between V2 and V5 is increased from 5 to 16]

P1,6 (probably changes to $P^-_{1,6}$)
P4,6 (no change)
P3,6 (probably changes to $P^-_{3,6}$)

$P^-_{1,6}$ is the shortest path between V1 and V6 in $G^-$ (G with the edge between V2 and V5 deleted).

Constraint: the weight increment t should be smaller than the difference between $d_{i,j}$ and $d^-_{i,j}$.


Partially-Visited Edge


Claim 6: For a partially-visited edge, if we want to decrease its weight by t, we must guarantee that the shortest paths that do not go through it do not change after perturbation. *

* How to guarantee this (i.e., which constraints to impose on the weight decrease) is shown as Proposition 10 in our paper.

[Figure: graph on nodes V1 to V6 with edge weights 6, 9, 6, 7, 13, 25, 10, 10; the weight of the partially-visited edge between V2 and V5 is decreased from 5 to 2]

P1,6 (no change)
P4,6 (probably changes to $P^+_{4,6}$)
P3,6 (no change)

$P^+_{4,6}$ is the shortest path between V4 and V6 that goes through the edge (V2, V5).

Constraint: the weight decrement t should be smaller than the difference between $d^+_{i,j}$ and $d_{i,j}$; otherwise the path through the edge would become at least as short as the current shortest path.


Greedy Algorithm

1. Increase the weights of non-visited edges and decrease the weights of all-visited edges.

2. Sort all partially-visited edges in descending order of the number of shortest paths going through them.

3. For a given partially-visited edge, whether to increase or decrease its weight depends on the comparison between the real length and the current (perturbed) length.

4. For a given partially-visited edge, the modified value is chosen as the boundary value of the constraint inequalities.

* For the detailed algorithm, please refer to Algorithm 1 in our paper.
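One ingredient of the steps above, the bound on how far a partially-visited edge may be increased, can be sketched in Python with Dijkstra's algorithm. This is only an illustration of the increase constraint (t smaller than the gap between the shortest length in G with the edge deleted and the original shortest length), not the paper's Algorithm 1. The graph, its weights, and the function names are hypothetical.

```python
import heapq

def build_graph(edge_weights):
    """Build an undirected adjacency map from {(u, v): weight}."""
    graph = {}
    for (u, v), w in edge_weights.items():
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def dijkstra(graph, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def increase_bound(edge_weights, edge, pairs):
    """Exclusive upper bound on the increment t for `edge`.

    `pairs` are the node pairs in H whose shortest path uses `edge`.
    """
    full = build_graph(edge_weights)
    reduced = build_graph({e: w for e, w in edge_weights.items() if e != edge})
    bound = float("inf")
    for s, t in pairs:
        d = dijkstra(full, s)[t]
        d_minus = dijkstra(reduced, s).get(t, float("inf"))
        bound = min(bound, d_minus - d)
    return bound

# Hypothetical weighted graph on nodes 1..6.
edge_weights = {(1, 2): 6, (2, 3): 7, (2, 5): 5, (4, 5): 6,
                (5, 6): 10, (1, 4): 13, (3, 6): 25}
bound = increase_bound(edge_weights, (2, 5), pairs=[(1, 6), (3, 6)])
print(bound)  # the increment must stay strictly below this value
```

In this hypothetical graph the pair (3, 6) is the binding one: its detour avoiding the edge is only slightly longer than its shortest path, so it limits how much the edge weight can grow.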


RESULTS WITH GREEDY PERTURBATION (1)

For example, at x-axis value 0.15, the dashed-line point (length) is 60% and the solid-line point (weight) is 54%. This means that, after the greedy perturbation, 54% of the perturbed edge weights $w^*_{i,j}$ fall into $w_{i,j}(1 \pm 0.15)$ and 60% of the perturbed shortest-path lengths $d^*_{i,j}$ fall into $d_{i,j}(1 \pm 0.15)$, in addition to the shortest paths of all targeted pairs in H being exactly preserved.


RESULTS WITH GREEDY PERTURBATION (2)


RESULTS WITH GREEDY PERTURBATION (3)


Discussion on Experiments


Gaussian Perturbation
  Data Utility: lengths of the shortest paths are better preserved, but the exact shortest paths cannot be guaranteed.
  Data Privacy: Low

Greedy Algorithm
  Data Utility: lengths are not as well preserved as with Gaussian, but the shortest paths are exactly maintained.
  Data Privacy: High


Case Study Remarks

(What do we want to do?) Keep weight privacy and shortest-path utility.

(Why do we want to do it?) Weights in some social settings are sensitive and confidential.

(How do we do it?) Gaussian perturbation and greedy perturbation are proposed to balance data utility and data privacy under different conditions.

(Is what we do applicable?) The two strategies appear to meet the expectations of our purpose.


Conclusion

• Social networks and social network research are promising
• Privacy issues in social network analysis should be emphasized
• Social network privacy preservation, data utility, and social network analysis algorithms need further research and study

Section 6: Conclusion


Funding Agencies and Student Researchers

• US National Science Foundation
• Kentucky Science and Engineering Foundation
• US National Institutes of Health


Privacy-Preserving Social Network with Sensitive Information

Contact Information:

Dr. Jun Zhang (张骏)

E-mail: [email protected]

[email protected] (Chinese)

http://www.cs.uky.edu/~jzhang

Phone: 13540021323 (mobile phone in China)
