Curso- Protect Yourself From Curse of Attribute Inference


curso: Protect Yourself from Curse of Attribute Inference

A social network privacy-analyzer

Eunsu Ryu, Yao Rong
Dept. of Electrical & Comp. Engg
Duke University, Durham, NC, USA
{er40, yao.rong}@duke.edu

Jie Li, Ashwin Machanavajjhala
Dept. of Computer Science
Duke University, Durham, NC, USA
{jieli, ashwin}@cs.duke.edu

ABSTRACT

While social networking platforms allow users to control how their private information is shared, recent research has shown that a user’s sensitive attribute can be inferred based on friendship links and group memberships, even when the attribute value is not shared with anyone else. Thus, existing access control mechanisms are unable to protect against such privacy breaches.

Our research goal is to develop tools that help a user Alice be aware of privacy breaches via attribute inference. In this paper, we specifically focus on two problems: (a) whether Alice’s sensitive attribute can be inferred based on public information in Alice’s neighborhood, and (b) whether making Alice’s sensitive attribute public leads to the disclosure of sensitive information of another user Bob in Alice’s neighborhood. We propose three algorithms to detect the aforementioned privacy breaches. We limit our scope to the one-hop neighbors of Alice, that is, information that is visible to an app that can be executed on behalf of Alice. Our results indicate that analyzing local networks is sufficient to extract a significant amount of information about most users.

Categories and Subject Descriptors
H.2 [Database Management]: Data mining

General Terms
Algorithms, Security

Keywords
social networks, attribute inference

1. INTRODUCTION

Social networks have gained wide popularity over the past decade. While the unprecedented success of the social networking industry has established an attractive ecosystem with advertisements and social gaming, the increasing volume of personal information shared in social networks has raised serious privacy concerns [2, 7, 9, 13]. For instance, social networks have been criticized for leaking user privacy [6], and advertisers take advantage of social networks to collect information about users.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DBSocial ’13 New York, NY USA
Copyright 2013 ACM 978-1-2191-4 ...$15.00.

As a remedy, social networking companies allow users to hide a portion of their profiles, or to select specific groups of friends with whom to share sensitive information. Unfortunately, it has been shown that this approach does little to protect users from privacy breaches. Recent work [5, 13] demonstrates that it is still possible to infer sensitive user attributes to an embarrassingly high accuracy using only friendship and group information. The fact that every user publishes different parts of their profile implies that private information is present, can be learned, and even shared subconsciously in the social network. For instance, while Alice may want to keep the fact that she can speak Mandarin private, if all of her friends publicize the fact that they speak Mandarin, then one might infer with high probability that Alice also speaks Mandarin.

Therefore, the access control mechanisms provided by social networks cannot protect against such privacy breaches, and in fact lull users into a false sense of privacy. It is thus important to raise awareness amongst social network users about the possibility of the aforementioned attribute inference attacks. Our research goal is to build tools that can execute on behalf of a user, detect potential attribute inference attacks, and warn the user so that they can make an informed decision. In this paper, we present some initial work toward this goal.

Contributions: In this paper we focus on two concrete problems: (a) whether a user Alice’s sensitive information can be inferred based on public attributes of her friends, and (b) whether making Alice’s attribute value publicly accessible results in the disclosure of a private attribute value for another user Bob in Alice’s neighborhood. While the former problem directly affects Alice’s privacy, the latter problem may inform a conscientious friend of Bob about Bob’s privacy disclosure. We depart from prior work in the following way: rather than analyzing the risks of attribute inference using large global networks that contain millions of users and thousands of attributes [5, 12, 13], we focus our attention on only the one-hop neighborhood of Alice. Not only does this allow our algorithms to be very efficient, but focusing on the immediate neighborhood would also help in building tools (future work) that run on behalf of the user by leveraging the information in the social network that is accessible to the user (via APIs). Using the entire social network would require these tools to be built in cooperation with the social network platform.

We propose three models for detecting and quantifying the aforementioned privacy breach scenarios. Our methods are evaluated on real social network data. Our results indicate that analyzing the one-hop neighborhood is sufficient to infer the values of private attributes for a significant fraction of users. However, it should be noted that the proposed framework cannot prove that Alice’s profile is free of attribute inference, as the adversary can potentially have more information than is available to our tool.

Outline: The remainder of the paper is organized as follows. In Section 2, we discuss related work. Section 3 introduces notation and describes the problem formulation. Section 4 presents three novel models for attribute inference along with inference methods. In Section 5 we discuss experiments and performance evaluations of our models, and we conclude in Section 6.

2. RELATED WORK

Many recent research publications analyze social network structures to infer hidden attributes. Zheleva & Getoor [13] use a large Facebook network to show that friendships and group memberships contain sufficient information to learn end-user hidden attributes with astounding accuracy. Other work along similar lines [7, 9] uses friendships and attribute information to evaluate risks of privacy breach. Backstrom et al. [2] also show active attacks through which adversaries may reidentify and learn sensitive information from anonymized social networks.

Many models have been proposed for inferring links and attributes in social networks. In [1, 5, 12], the authors use social-attribute networks (SANs) to jointly infer latent attributes as well as friendships.

The attribute inference problem can be formulated as a clustering problem [3], a matrix factorization problem [8], or a regression problem. The Restricted Boltzmann Machine (RBM) introduced in [11] also has an interesting application in inferring latent features such as hidden attributes. Using RBMs for social network attribute inference may be an interesting direction for future research.

3. PROBLEM FORMULATION

Figure 1 shows the local social network around an end-user named Alice. Alice has a set of public attributes accessible by all of her friends. She has a hidden attribute, say, her ability to speak Mandarin. She wants to find out, to the best of her knowledge, whether an adversary can guess this information based on her public attributes. She is also worried that publicizing this attribute might breach her friends’ privacy. Specifically, based on the structure of her local network, Alice wants to:

• Task 1: determine whether her secret can be guessed from public information.

• Task 2: determine whether publicizing her hidden attribute would breach her friend’s privacy.

Formally, consider a local network $L = (V, E)$ around a user (Alice). This network is a graph that consists of Alice and her friends; each node $i \in V$ represents a user while the edges in $E$ model friendships. Let $N$ denote the number of Alice’s friends so that $|V| = N + 1$, and $i = 0$ represents Alice herself. Assume each user $i$ has $M$ binary attributes/features $\{x_{ij}\}_{j=1}^{M}$, where $x_{ij} \in \{0, 1\}$. For some user-attribute pairs $(i, j)$, $x_{ij}$ is public, while for others, $x_{ij}$ is hidden. If $x_{ij} = 1$, then we say that user $i$ has a positive $j$th attribute/feature. Conversely, $x_{ij} = 0$ means that user $i$ has a negative $j$th attribute/feature. If $x_{ij}$ is not known (missing), then user $i$ has a hidden (or private) $j$th attribute/feature. Under this construction, Alice has access to $T = (L, X)$ for all public $x_{ij}$s.

Figure 1: Local network around an end-user Alice. Alice only has access to her local network.

Suppose Alice has a hidden attribute $x_{0m} \in \{0, 1\}$. We seek to:

• Quantify the amount of information about $x_{0m}$ that can be inferred by adversaries based on the structure of $T$ (for Task 1).

• Quantify the gain of information about the attribute $m$ across the network $L$ induced by publicizing $x_{0m}$ (for Task 2).

Specifically, we seek to design an estimator/predictor $f(x_{ij})$ for a hidden attribute $j$ of user $i$. Based on $f(x_{ij})$, we compute an error function $E(x_{ij})$ that evaluates the “goodness” of our estimation. We shall design $E(x_{ij})$ to have a low value if:

• An adversary can guess $x_{0m}$ with high accuracy (for Task 1).

• Alice breaches her friend’s privacy by publicizing $x_{0m}$ (for Task 2).

For $\varepsilon > 0$, we declare that privacy is breached at level $\varepsilon$ if $E(x_{ij}) < \varepsilon$.

3.1 Deviation and Error Metrics

In this section we formulate our error function $E(x_{ij})$ associated with the estimator $f(x_{ij})$. We first define the deviation function $g(x_{ij}) = g(i, j)$

$$g(i, j) = |x_{ij} - f(x_{ij})| \tag{1}$$

as the residue in approximating $x_{ij}$ with $f(x_{ij})$.

Suppose Alice knows the value of her hidden attribute $x_{0m} \in \{0, 1\}$. For Task 1, we declare that $x_{0m}$ is breached at level $\varepsilon$ if it is possible to infer $x_{0m}$ up to an error $\varepsilon$ based on Alice’s local network $L$:

$$E_0(m) \equiv g(0, m) = |x_{0m} - f(x_{0m})| < \varepsilon. \tag{2}$$

Since $x_{0m}$ is either 0 or 1, another interpretation of $E_0(m)$ is that an adversary can use $f(x_{0m})$ to correctly guess $x_{0m}$ with probability $1 - E_0(m)$.


For Task 2, we say that the $m$th attribute is breached at level $\varepsilon$ due to $x_{0m}$ if the deviation after publicizing $x_{0m}$ is on average lower than that before publicizing $x_{0m}$. Let $\Omega_m = \{i \mid x_{im} \text{ is public}\}$ denote the set of Alice’s neighbors whose $m$th attribute value is public. Then we have

$$E(m) \equiv \frac{1}{|\Omega_m|} \sum_{i \in \Omega_m} \left[ g(i, m) - g'(i, m) \right] < \varepsilon. \tag{3}$$

Here $g(i, m) = g(x_{im})$ is the deviation without $x_{0m}$, while $g'(i, m) \equiv g(x_{im} \mid x_{0m})$ is the deviation given $x_{0m}$. With the error functions defined as above, we now turn our attention to the design of the estimator function $f(x_{ij})$ to infer attribute values.
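As a rough illustration of the metrics in (1) to (3), the following Python sketch computes the deviation, the Task 1 breach test, and the averaged quantity on the left-hand side of (3). The function names, the dictionary layout, and all numeric values are assumptions made for this example; the estimator outputs are simply taken as given.

```python
from typing import Dict


def deviation(x: int, f_x: float) -> float:
    """g(i, j) = |x_ij - f(x_ij)| as in (1)."""
    return abs(x - f_x)


def task1_breached(x0m: int, f_x0m: float, eps: float) -> bool:
    """Breach test of (2): E_0(m) = g(0, m) < eps."""
    return deviation(x0m, f_x0m) < eps


def task2_error(x: Dict[str, int],
                f_before: Dict[str, float],
                f_after: Dict[str, float]) -> float:
    """Left-hand side of (3): mean of g(i, m) - g'(i, m) over Omega_m.

    The keys of `x` play the role of Omega_m; `f_before` and `f_after`
    hold f(x_im) and f(x_im | x_0m). All inputs are hypothetical.
    """
    if not x:
        raise ValueError("Omega_m must not be empty")
    total = sum(deviation(x[i], f_before[i]) - deviation(x[i], f_after[i])
                for i in x)
    return total / len(x)


# Hypothetical numbers, for illustration only.
e0 = deviation(1, 0.85)
breached = task1_breached(1, 0.85, eps=0.2)
e_m = task2_error({"bob": 1, "carol": 0},
                  {"bob": 0.6, "carol": 0.5},
                  {"bob": 0.8, "carol": 0.4})
```

In this toy input the deviations shrink for both neighbors once $x_{0m}$ is known, so the averaged difference is positive.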

4. ATTRIBUTE INFERENCE

In this section, we present three techniques for inferring private attribute values using a user’s 1-hop neighborhood in a social network. Before we describe our algorithms, we start by describing the social-attribute network model and the utility metrics that will be used in our algorithms.

4.1 Social-Attribute Network Model

We adopt the notion of Social-Attribute Networks (SAN) [12]. A SAN can be constructed by augmenting the original network $L$ with $M$ distinct nodes corresponding to $M$ attributes. The original nodes corresponding to the users are called social nodes, and the new nodes representing the attributes are called attribute nodes. An undirected link between user $i$ and attribute $j$ is formed if $x_{ij}$ is public (positive or negative). Figure 2 shows an example of a SAN (from [5]).

Figure 2: Example of a simple SAN model (from [5])

The plus sign between a social node $u_i$ and an attribute node $j$ means $x_{ij} = 1$, while a minus sign signifies $x_{ij} = 0$. The mutex links tie a set of mutually exclusive attributes together so that no two mutually exclusive attributes are selected simultaneously.

Based on the above description of a SAN, we define a number of useful sets that will be used in this section. As before, we let $i$ represent a user, and $j$ an attribute.

• $I_l$ = {all users connected to user/attribute $l$}.
• $I_i^+$ = {all friends and positive attributes of user $i$}.
• $F_j^z$ = {all users with feature $j$ having value $z$}.
• $G_i$ = {all friends of user $i$ in the network}.
• $M_i = \{m \mid x_{im} \text{ is public}\}$.
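The sets above can be sketched in Python on a toy local network. This is only an illustration: the dictionary representation, the helper names `G`, `I`, `I_plus`, `F` and `M`, and the sample users and attributes are assumptions of the example, and user names are assumed not to collide with attribute names.

```python
from typing import Dict, Set

# Hypothetical toy network; a missing attribute key means "hidden".
friends: Dict[str, Set[str]] = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}
attrs: Dict[str, Dict[str, int]] = {
    "alice": {"duke": 1},
    "bob": {"mandarin": 1, "duke": 1},
    "carol": {"mandarin": 1},
    "dave": {"mandarin": 0, "duke": 1},
}


def G(i: str) -> Set[str]:
    """G_i: all friends of user i."""
    return set(friends.get(i, set()))


def I(l: str) -> Set[str]:
    """I_l: users connected to node l (a user or an attribute)."""
    if l in friends:
        return G(l)
    # An attribute node is linked to every user whose value is public.
    return {u for u, vals in attrs.items() if l in vals}


def I_plus(i: str) -> Set[str]:
    """I_i^+: friends of i plus the attributes on which i is positive."""
    return G(i) | {j for j, z in attrs.get(i, {}).items() if z == 1}


def F(j: str, z: int) -> Set[str]:
    """F_j^z: users whose public value for attribute j equals z."""
    return {u for u, vals in attrs.items() if vals.get(j) == z}


def M(i: str) -> Set[str]:
    """M_i: attributes of user i that are public."""
    return set(attrs.get(i, {}))
```

Here Alice's Mandarin attribute is hidden, so "mandarin" is absent from `M("alice")` while Dave's negative value still links him to the attribute node.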

4.2 Utility Functions

Here, we define three utility functions as weighted sums of common neighbors with lower weights on popular nodes.

4.2.1 Importance of a friend

We define the importance of user $i'$ to user $i$ as:

$$u(i, i') = \frac{1}{\log |G_{i'}|} \sum_{t \in I_i^+ \cap I_{i'}^+} \frac{1}{\log |I_t|}. \tag{4}$$

Note that $t$ runs through all the common friends and attributes between $i$ and $i'$. $|I_t|$ is the number of users connected to the user/attribute $t$, signifying the “popularity” of $t$. $|G_{i'}|$ is the total number of friends associated with $i'$.

The logarithm $\log |I_t|$ is inspired by the Adamic-Adar (AA) notion defined in [7], in which popular friends and attributes are considered less significant. The multiplicative factor $\frac{1}{\log |G_{i'}|}$ takes into account the local nature of our network $L$ by further reducing the significance of a social node with a large number of friends (e.g. celebrities). $u(i, i')$ quantifies the significance of user $i'$ to user $i$, and is larger if $i$ and $i'$ share more friends (or attributes) in common.
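A minimal Python sketch of (4) on a toy network follows. The data, the helper names, and in particular the clamping in `safe_log` are assumptions of this example: the text does not say how nodes with a single connection (where the logarithm is zero) are treated, so the sketch clamps counts at 2 to keep the denominators positive.

```python
import math
from typing import Dict, Set

# Hypothetical toy network; a missing attribute key means "hidden".
friends: Dict[str, Set[str]] = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}
attrs: Dict[str, Dict[str, int]] = {
    "alice": {"duke": 1},
    "bob": {"mandarin": 1, "duke": 1},
    "carol": {"mandarin": 1},
    "dave": {"mandarin": 0, "duke": 1},
}


def neighbors(l: str) -> Set[str]:
    """I_l: users connected to a user node or an attribute node l."""
    if l in friends:
        return set(friends[l])
    return {u for u, vals in attrs.items() if l in vals}


def positive_neighbors(i: str) -> Set[str]:
    """I_i^+: friends of i plus attributes on which i is positive."""
    return set(friends[i]) | {j for j, z in attrs.get(i, {}).items() if z == 1}


def safe_log(n: int) -> float:
    """log(n) with n clamped at 2 (an assumption of this sketch)."""
    return math.log(max(n, 2))


def importance(i: str, i2: str) -> float:
    """u(i, i') from (4): popularity-discounted count of shared nodes."""
    common = positive_neighbors(i) & positive_neighbors(i2)
    total = sum(1.0 / safe_log(len(neighbors(t))) for t in common)
    return total / safe_log(len(friends[i2]))


u_alice_bob = importance("alice", "bob")    # shares carol and "duke"
u_alice_dave = importance("alice", "dave")  # shares only "duke"
```

Bob shares both a friend and a positive attribute with Alice, so he comes out as more important to her than Dave, who shares only one attribute.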

4.2.2 Value of an attribute

Define $v(j, i)$ as the value of attribute $j$ to user $i$:

$$v(j, i) = \sum_{t \in I_i \cap I_j} \frac{1}{\log |I_t^+|}. \tag{5}$$

Observe that $t$ runs through the friends of user $i$ with (positive) feature $j$. $|I_t^+|$ is the number of friends and positive features associated with user $t$. As in the case of $u(i, i')$, we downplay the significance of high-degree social nodes. This utility function is designed so that feature $j$ is more significant to user $i$ if more of $i$’s friends have feature $j$.
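The following Python sketch evaluates (5) on a toy network. Since the set $I_j$ covers every user with a public value while the prose speaks of friends with the positive feature, the sketch exposes both readings through a hypothetical `positive_only` flag. The data, helper names, and the clamping in `safe_log` are likewise assumptions of the example.

```python
import math
from typing import Dict, Set

# Hypothetical toy network; a missing attribute key means "hidden".
friends: Dict[str, Set[str]] = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}
attrs: Dict[str, Dict[str, int]] = {
    "alice": {"duke": 1},
    "bob": {"mandarin": 1, "duke": 1},
    "carol": {"mandarin": 1},
    "dave": {"mandarin": 0, "duke": 1},
}


def positive_neighbors(i: str) -> Set[str]:
    """I_i^+: friends of i plus attributes on which i is positive."""
    return set(friends[i]) | {j for j, z in attrs.get(i, {}).items() if z == 1}


def safe_log(n: int) -> float:
    """log(n) with n clamped at 2 (an assumption of this sketch)."""
    return math.log(max(n, 2))


def value(j: str, i: str, positive_only: bool = True) -> float:
    """v(j, i) from (5), summed over friends of i linked to attribute j."""
    linked = {t for t, vals in attrs.items()
              if j in vals and (vals[j] == 1 or not positive_only)}
    return sum(1.0 / safe_log(len(positive_neighbors(t)))
               for t in friends[i] & linked)


v_positive = value("mandarin", "alice")                 # bob and carol
v_all_public = value("mandarin", "alice", positive_only=False)  # plus dave
```

Either way, the value grows with the number of Alice's friends linked to the attribute, each discounted by how many connections that friend has.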

4.2.3 Power of an attribute

The power of an attribute $j$ (having value $z$) to user $i$ is:

$$w_z(i, j) = \sum_{t \in I_i \cap F_j^z} u(i, t). \tag{6}$$

Here $t$ runs through all the friends of user $i$ having value $z$ for feature $j$ (i.e. $x_{tj} = z$). The power of $x_{ij} = z$ is obtained by adding up the importance of all the friends $i'$ of user $i$ having $x_{i'j} = z$. For example, the power of “the ability to speak Chinese” to Alice is computed by summing the importance of all of her Chinese-speaking friends.

We define the relative power of attribute $j$ to user $i$:

$$\Delta w_{ij} = w_1(i, j) - w_0(i, j). \tag{7}$$

Note $\Delta w_{ij} > 0$ if and only if $x_{ij} = 1$ has more power/significance to user $i$ than does $x_{ij} = 0$. $\Delta w_{ij} = 0$ means $x_{ij} = 1$ possesses equal importance to $x_{ij} = 0$.
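To make (6) and (7) concrete, here is a small Python sketch in which the importance function $u$ is passed in as a callable. The importance values, the network, and the function names are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from typing import Callable, Dict, Set


def power(i: str, j: str, z: int,
          friends: Dict[str, Set[str]],
          attrs: Dict[str, Dict[str, int]],
          u: Callable[[str, str], float]) -> float:
    """w_z(i, j) from (6): summed importance of friends t with x_tj = z."""
    return sum(u(i, t) for t in friends[i]
               if attrs.get(t, {}).get(j) == z)


def relative_power(i: str, j: str,
                   friends: Dict[str, Set[str]],
                   attrs: Dict[str, Dict[str, int]],
                   u: Callable[[str, str], float]) -> float:
    """Delta w_ij from (7): w_1(i, j) - w_0(i, j)."""
    return (power(i, j, 1, friends, attrs, u)
            - power(i, j, 0, friends, attrs, u))


# Hypothetical inputs: Erin's value is hidden, so she contributes nothing.
friends = {"alice": {"bob", "carol", "dave", "erin"}}
attrs = {"bob": {"mandarin": 1}, "carol": {"mandarin": 1},
         "dave": {"mandarin": 0}, "erin": {}}
importance_to_alice = {"bob": 0.9, "carol": 0.4, "dave": 0.7, "erin": 0.5}


def u(i: str, t: str) -> float:
    """Stand-in for (4), reading precomputed (made-up) importances."""
    return importance_to_alice[t]


w1 = power("alice", "mandarin", 1, friends, attrs, u)   # 0.9 + 0.4
w0 = power("alice", "mandarin", 0, friends, attrs, u)   # 0.7
delta_w = relative_power("alice", "mandarin", friends, attrs, u)
```

With these numbers $w_1 = 1.3$ and $w_0 = 0.7$, so $\Delta w = 0.6 > 0$ and the positive value carries more power for Alice.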

Now we present three designs of the estimator $f(x_{ij})$.

4.3 Deterministic Algorithm

For a hidden attribute $x_{ij}$ of interest, we compute the relative power $\Delta w_{ij}$ as defined in (7):

$$\Delta w_{ij} = w_1(i, j) - w_0(i, j).$$

Since $\Delta w_{ij}$ can be any real number, we map it onto $(0, 1)$ to construct the estimator $f(x_{ij})$:

$$f(x_{ij}) = h(\Delta w_{ij}) = 1 / [1 + \exp(-\Delta w_{ij})], \tag{8}$$

where $h(\alpha) = 1 / (1 + \exp(-\alpha))$ is the sigmoid function. We say that $x_{ij} = 1$ is more likely than $x_{ij} = 0$ if $x_{ij} = 1$ has more power relative to $x_{ij} = 0$ (i.e. $\Delta w_{ij} > 0$). Conversely, $x_{ij} = 0$ is more likely than $x_{ij} = 1$ if $\Delta w_{ij} < 0$. In short, $\Delta w_{ij} > 0$ implies that $x_{ij} = 1$ is a better guess than $x_{ij} = 0$.
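The mapping in (8) can be sketched in a few lines of Python. The function names are assumptions of this example, and the branch on the sign of the argument is an implementation detail added here to avoid overflow for large negative inputs; it computes the same sigmoid.

```python
import math
from typing import Optional


def sigmoid(a: float) -> float:
    """h(a) = 1 / (1 + exp(-a)), evaluated without overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def deterministic_estimate(delta_w: float) -> float:
    """f(x_ij) = h(Delta w_ij) as in (8)."""
    return sigmoid(delta_w)


def best_guess(delta_w: float) -> Optional[int]:
    """1 if Delta w > 0, 0 if Delta w < 0, None when both are tied."""
    if delta_w > 0:
        return 1
    if delta_w < 0:
        return 0
    return None


# Hypothetical relative power, e.g. w_1 = 1.3 and w_0 = 0.7.
f_hidden = deterministic_estimate(0.6)
guess = best_guess(0.6)
```

A relative power of 0.6 maps to an estimate of roughly 0.65, so the positive value is the better guess, though not by a wide margin.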

4.4 Logistic Regression

Next, we use logistic regression to model an adversary trying to learn a sensitive attribute $x_{im}$ associated with user $i$ and a given feature $m$. Since $x_{im}$ takes on binary values and is currently hidden, we model

$$\Pr[x_{im} = 1] = h\Big( \sum_{i' \neq i} \left[ u(i, i') + v(m, i') \right] \beta_{i'} \Big) \tag{9}$$

using the utility functions $u(i, i')$ and $v(j, i')$ defined in (4) and (5). To learn the coefficients $\beta_{i'}$, we use a regularized maximum likelihood approach with an $\ell_1$ penalty on $\beta$. Specifically, we minimize $\ell(\beta)$ defined as

$$\ell(\beta) = -\sum_{i \in \Omega_m} \left[ x_{im} \log h(y_{im}) + (1 - x_{im}) \log(1 - h(y_{im})) \right] + \lambda \|\beta\|_1.$$

We may use known algorithms (such as gradient methods) to solve the above optimization problem in $\beta$. Once the coefficients $\beta$ are estimated, we may construct $f(x_{im})$ by simply computing the predictive value $y_{im}$

$$y_{im} = \sum_{i' \neq i} \left[ u(i, i') + v(m, i') \right] \beta_{i'}$$

and taking the sigmoid transformation:

$$f(x_{im}) = \Pr(x_{im} = 1) = h(y_{im}).$$
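One possible way to minimize the $\ell_1$-regularized objective is proximal gradient descent (ISTA), sketched below in plain Python. The choice of ISTA, the step size, the iteration count, and the toy feature rows are all assumptions of this example; each row is meant to stand for the precomputed terms $u(i, i') + v(m, i')$ of one training user $i \in \Omega_m$.

```python
import math
from typing import List, Sequence


def sigmoid(a: float) -> float:
    """Numerically stable h(a) = 1 / (1 + exp(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def soft_threshold(b: float, t: float) -> float:
    """Proximal operator of t * |b|: shrink b toward zero by t."""
    return math.copysign(max(abs(b) - t, 0.0), b)


def predict(phi: Sequence[float], beta: Sequence[float]) -> float:
    """f(x_im) = h(y_im) with y_im = sum_k phi_k * beta_k."""
    return sigmoid(sum(p * b for p, b in zip(phi, beta)))


def fit_l1_logistic(rows: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    lam: float = 0.01,
                    step: float = 0.1,
                    iters: int = 500) -> List[float]:
    """Minimize the negative log-likelihood plus lam * ||beta||_1."""
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for phi, x in zip(rows, labels):
            err = predict(phi, beta) - x   # d(loss)/d(y_im)
            for k, p in enumerate(phi):
                grad[k] += err * p
        # Gradient step on the smooth part, then the l1 proximal step.
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta


# Hypothetical feature rows and public labels x_im for four users.
rows = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.8]]
labels = [1, 1, 0, 0]
beta = fit_l1_logistic(rows, labels)
beta_sparse = fit_l1_logistic(rows, labels, lam=100.0)
```

With a very large $\lambda$ the penalty dominates and every coefficient is shrunk to zero, which is the sparsity effect the $\ell_1$ term is meant to provide.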

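4.5 Matrix Factorization

Here we use a Bayesian model to construct the estimator $f(x_{ij})$. For all public $x_{ij}$s, we compute the relative power as in (7)

$$\Delta w_{ij} = w_1(i, j) - w_0(i, j), \qquad w_z(i, j) = \sum_{t \in I_i \cap F_j^z} u(i, t),$$

and organize them in an $(N + 1) \times M$ array $\mathbf{W} = [W_{ij}]_{ij} \in \mathbb{R}^{(N+1) \times M}$. Observe that the matrix $\mathbf{W}$ has missing values. Our goal is to estimate those missing entries $W_{ij}$ associated with the hidden attribute of interest $x_{ij}$.

We first assume that the matrix $\mathbf{W}$ can be represented as the inner product of two latent matrices $\mathbf{D}$ and $\mathbf{S}$ plus some noise:

$$\mathbf{W} = \mathbf{D}^T \mathbf{S} + \mathbf{E} \tag{10}$$

Specifically, we assume

$$W_{ij} \sim \mathcal{N}(\mathbf{d}_i^T \mathbf{s}_j, \alpha^{-1}), \qquad \alpha \sim \mathrm{gamma}(a, b)$$

$$d_{ki} \sim \mathcal{N}(0, 1), \qquad s_{kj} \sim \mathcal{N}(0, \lambda^{-1})$$

$$\lambda \sim \mathrm{gamma}(c, d), \qquad K \sim \mathrm{Uniform}(1, \ldots, K_{\max})$$

Let $\boldsymbol{\theta} = (\alpha, \lambda, K)$. We use the Integrated Nested Laplace Approximation (INLA) [10] to approximate the posterior predictive distribution $p(W^*_{ij} \mid \mathbf{W})$. First, approximate the marginal posterior $p(\boldsymbol{\theta} \mid \mathbf{W})$ by

$$p(\boldsymbol{\theta} \mid \mathbf{W}) \approx \tilde{p}(\boldsymbol{\theta} \mid \mathbf{W}) \propto \left. \frac{p(\mathbf{W}, \mathbf{D}, \mathbf{S}, \boldsymbol{\theta})}{p_G(\mathbf{D}, \mathbf{S} \mid \mathbf{W}, \boldsymbol{\theta})} \right|_{(\mathbf{D}, \mathbf{S}) = (\mathbf{D}^*, \mathbf{S}^*)},$$

where $p_G$ is the Gaussian approximation to $p(\mathbf{D}, \mathbf{S} \mid \mathbf{W}, \boldsymbol{\theta})$ with mode at $(\mathbf{D}^*, \mathbf{S}^*)$. These modes can be approximated by stochastic gradient descent on $-\log p(\mathbf{D}, \mathbf{S} \mid \mathbf{W}, \boldsymbol{\theta})$. For simplicity, we employ

$$p_G(\mathbf{D}, \mathbf{S} \mid \mathbf{W}, \boldsymbol{\theta}) = \prod_i \mathcal{N}(\mathbf{d}_i \mid \mathbf{d}_i^*, \mathbf{H}^{-1}_{\mathbf{d}_i^*}) \times \prod_j \mathcal{N}(\mathbf{s}_j \mid \mathbf{s}_j^*, \mathbf{G}^{-1}_{\mathbf{s}_j^*})$$

where the precisions $\mathbf{H}_{\mathbf{d}_i^*}$ and $\mathbf{G}_{\mathbf{s}_j^*}$ are the Hessians evaluated at the modes. We can use $\tilde{p}(\boldsymbol{\theta} \mid \mathbf{W})$ to approximate $p(W^*_{ij} \mid \mathbf{W})$ for all public $(i, j)$ by approximating the integral:

$$p(W^*_{ij} \mid \mathbf{W}) = \int p(W_{ij} \mid \mathbf{d}_i, \mathbf{s}_j, \boldsymbol{\theta}) \, p(\mathbf{d}_i, \mathbf{s}_j \mid \mathbf{W}, \boldsymbol{\theta}) \, p(\boldsymbol{\theta} \mid \mathbf{W}) \, d\mathbf{d}_i \, d\mathbf{s}_j \, d\boldsymbol{\theta},$$

from which we can approximate the expectation $\mathbb{E}(W^*_{ij} \mid \mathbf{W})$. We can then construct the estimator $f(x_{ij})$ by

$$f(x_{ij}) = h(\mathbb{E}(W^*_{ij} \mid \mathbf{W})). \tag{11}$$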
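The sketch below is a deliberately simplified stand-in for the procedure above, not an implementation of INLA. It fixes the hyperparameters $\alpha$, $\lambda$ and $K$ (rather than integrating over them), finds the mode $(\mathbf{D}^*, \mathbf{S}^*)$ by full-batch gradient descent on the negative log posterior, and plugs the mode into the sigmoid as a point estimate. All names, default values, and the toy matrix are assumptions of the example.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(a: float) -> float:
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def objective(W_obs: Dict[Tuple[int, int], float], D: Matrix, S: Matrix,
              alpha: float = 1.0, lam: float = 1.0) -> float:
    """Negative log posterior of (D, S) up to a constant, theta fixed."""
    fit = sum((w - dot(D[i], S[j])) ** 2 for (i, j), w in W_obs.items())
    prior_d = sum(d * d for row in D for d in row)
    prior_s = sum(s * s for row in S for s in row)
    return 0.5 * alpha * fit + 0.5 * prior_d + 0.5 * lam * prior_s


def fit_map(W_obs: Dict[Tuple[int, int], float], n_users: int, n_attrs: int,
            K: int = 2, alpha: float = 1.0, lam: float = 1.0,
            step: float = 0.01, iters: int = 3000,
            seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Gradient descent toward the posterior mode (D*, S*)."""
    rng = random.Random(seed)
    D = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(n_users)]
    S = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(n_attrs)]
    for _ in range(iters):
        gD = [[d for d in row] for row in D]          # from d ~ N(0, 1)
        gS = [[lam * s for s in row] for row in S]    # from s ~ N(0, 1/lam)
        for (i, j), w in W_obs.items():
            r = w - dot(D[i], S[j])                   # residual on W_ij
            for k in range(K):
                gD[i][k] -= alpha * r * S[j][k]
                gS[j][k] -= alpha * r * D[i][k]
        D = [[d - step * g for d, g in zip(row, grow)]
             for row, grow in zip(D, gD)]
        S = [[s - step * g for s, g in zip(row, grow)]
             for row, grow in zip(S, gS)]
    return D, S


# Hypothetical 4 x 3 relative-power matrix with entry (0, 0) hidden.
a, b = [1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0]
W_obs = {(i, j): a[i] * b[j] for i in range(4) for j in range(3)
         if (i, j) != (0, 0)}
D0, S0 = fit_map(W_obs, 4, 3, iters=0)     # random initialization only
D, S = fit_map(W_obs, 4, 3)
f_hidden = sigmoid(dot(D[0], S[0]))        # point estimate for x_00
```

A fuller treatment would follow the text and average the prediction over $\boldsymbol{\theta}$ and over the Gaussian approximation around the mode instead of using the mode alone.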

5. EXPERIMENTS

5.1 Datasets

5.1.1 Google+ dataset

The Google+ dataset introduced in [5] contains the social and attribute links (SAN) of roughly 5200 users collected separately at three different times of the year 2012. The authors use the education and employment profiles of the targets to construct a vocabulary of attributes. For analysis, we use education and employment attribute values that belong to more than five users.

5.1.2 UCI Facebook data

The Facebook sampling dataset collected at UCI [4] contains the network of nearly one million unique users, their network IDs and their privacy settings. Each person can have zero, one or multiple network IDs, and exactly four privacy settings: add as friend, photo thumbnail, view friends, send message. As more than 90% of users use the default privacy settings (all enabled), we pre-process the dataset to minimize the number of overlapping attributes that do not add much information about the identity of the users. We can also regard nodes with an exceptionally large number of friends as the sensitive attributes and test how well our model predicts these links.

5.1.3 Duke Facebook data

We created a new Facebook dataset corresponding to profiles of students at Duke University. We crawled Facebook pages of Duke students, and retrieved attributes such as gender, education, employment, and likes. We use employment as our sensitive attribute.

Duke Online phonebook is a service available to all Duke students, staff and faculty, which returns a comprehensive set of attributes about Duke affiliates. We use data from the Duke phonebook as ground truth (when the Facebook profiles of Duke students are hidden) and use it to verify the quality of detecting attribute inference using our algorithms.

Table 1 shows the summary of datasets.

| Dataset | Nodes | Attributes | Domain Size |
|---|---|---|---|
| Google+ | 5200 | School, Work | 275 |
| UCI FB | 984K | Popular Nodes | 367 |
| Duke FB | 1475 | Work | 69 |

Table 1: Summary of datasets


5.2 Experimental Setup

In order to test the performance of our proposed approaches, we evaluate the prediction/inference accuracy for each of the three algorithms on held-out test data. Specifically, we randomly take out some public attributes (ground truth), then run our algorithms to reconstruct these values assuming that they are hidden. For the Duke dataset, we use ground truth from the online phonebook when available. For Task 1, if $M_0$ is the set of binary attributes on which we run attribute inference, the average prediction accuracy $A_0$ is computed as follows:

$$A_0 = 1 - \frac{1}{|M_0|} \sum_{m \in M_0} |x_{0m} - f(x_{0m})|. \tag{12}$$

For Task 2, we compute the improvement defined as

$$\Delta B(m) = B'(m) - B(m) \tag{13}$$

$$B(m) = 1 - \frac{1}{|\Omega_m|} \sum_{i \in \Omega_m} |x_{im} - f(x_{im})| \tag{14}$$

$$B'(m) = 1 - \frac{1}{|\Omega'_m|} \sum_{i \in \Omega'_m} |x_{im} - f(x_{im} \mid x_{0m})|, \tag{15}$$

where $B(m)$ is the fraction of correctly predicted instances without the knowledge of $x_{0m}$, and $B'(m) = B(m \mid x_{0m})$ is the same metric computed after $x_{0m}$ is publicized.
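The evaluation metrics (12) to (15) reduce to one helper, as in this Python sketch. The function names and the numbers are hypothetical, and for simplicity the sketch assumes the same set of neighbors before and after publication (that is, $\Omega'_m = \Omega_m$).

```python
from typing import Dict


def average_accuracy(x: Dict[str, int], f: Dict[str, float]) -> float:
    """1 - mean |x - f(x)|, the common form of (12), (14) and (15)."""
    if not x:
        raise ValueError("need at least one held-out value")
    return 1.0 - sum(abs(x[k] - f[k]) for k in x) / len(x)


def improvement(x: Dict[str, int],
                f_before: Dict[str, float],
                f_after: Dict[str, float]) -> float:
    """Delta B(m) = B'(m) - B(m) from (13), assuming Omega'_m = Omega_m."""
    return average_accuracy(x, f_after) - average_accuracy(x, f_before)


# Task 1: Alice's held-out attributes M_0 and their (made-up) estimates.
A0 = average_accuracy({"school": 1, "work": 0},
                      {"school": 0.8, "work": 0.3})

# Task 2: neighbors' values for one attribute m, before and after
# Alice publicizes x_0m (again made-up estimates).
delta_B = improvement({"bob": 1, "carol": 0},
                      {"bob": 0.6, "carol": 0.5},
                      {"bob": 0.8, "carol": 0.4})
```

Here $A_0 = 0.75$, and the neighbors' accuracy rises from 0.55 to 0.70, giving $\Delta B = 0.15$.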

5.3 Algorithms

We evaluate the following algorithms:

• Det: The deterministic method

• Log: Logistic regression based inference

• Mat: Matrix factorization using INLA

• Maj: Majority vote, for baseline

5.4 Results

We now present our evaluation results. Table 2 shows the average prediction accuracy of inferring Alice’s hidden attributes. Higher accuracy means that the estimation is in general accurate. In each of the datasets, the prediction accuracy is averaged over 20 different users (acting as Alice). We can see that the three proposed algorithms have about the same performance on all the datasets, and each exceeds the majority-vote baseline (Maj). Table 3 shows the improvement in prediction accuracy after publicizing a given hidden attribute $x_{0m}$ for Alice.

| Method | Google+ | UCI FB | Duke FB |
|---|---|---|---|
| Det | .6844 ± .1068 | .7490 ± .1233 | .7511 ± .0965 |
| Log | .7635 ± .0788 | .6812 ± .1381 | .7186 ± .0611 |
| Mat | .8073 ± .0917 | .7249 ± .1192 | .7401 ± .0824 |
| Maj | .5082 ± .1385 | .5201 ± .1305 | .6257 ± .0717 |

Table 2: Average prediction accuracy

| Method | Google+ | UCI FB | Duke FB |
|---|---|---|---|
| Det | .0217 ± .0079 | .0091 ± .0038 | .0419 ± .0064 |
| Log | .0225 ± .0062 | .0057 ± .0027 | .0327 ± .0071 |
| Mat | .0334 ± .0093 | .0048 ± .0016 | .0648 ± .0108 |
| Maj | .0119 ± .0027 | .0021 ± .0035 | .0196 ± .0163 |

Table 3: Improvement induced by making $x_{0m}$ public

Figure 5: Scatter plot of degree of the user versus inference error for Task 1 using the Matrix algorithm for the Duke Facebook dataset

Figure 6: Scatter plot of degree of the user versus the number of friends with attribute for the Duke Facebook dataset

We demonstrate our results in greater detail in Figures 3 and 4. Figure 3 plots on the x-axis the inference error δ, and on the y-axis the fraction of users with inference errors less than threshold δ for each of the three datasets used in Task 1. We see, for instance, that for about 20% of the users the inference error is less than 20%, and for more than 75% of the users the inference error is less than 50% (that is, we can do better than random guessing for more than 75% of the users). To further investigate our algorithms, we also plotted the inference error versus the degree of the user for Task 1 (Figure 5). We can see that as the degree of the user increases, the inference error also increases. This can be explained by the fact that users with higher degree tend to have friends who are more diverse, and thus inferring their sensitive attribute is harder using our algorithms. Studying whether this result holds fundamentally for all inference algorithms is an interesting direction for future work.

In Figure 6, we plot the fraction of neighbors with an attribute value against the degree of users. As the number of friends increases, a diverse set of attribute values is observed in Alice’s neighborhood. Hence, the prevalence of the target attribute value decreases, and attribute inference could give higher errors for high-degree nodes.

Figure 4 plots the fraction of users that experience accuracy improvement of at least δ after Alice publicizes her hidden attribute $x_{0m}$.

In summary, our results indicate that even the local network can give a reasonable estimate of hidden attributes: information content in social networks is densely clustered.

Figure 3: Fraction of users with inference errors less than threshold δ for each of the three datasets used in Task 1

Figure 4: Fraction of users that experience accuracy improvement of at least δ after Alice publicizes her hidden attribute $x_{0m}$.

6. CONCLUSION

Social networks are vulnerable to privacy attacks. Since users publish different parts of their profiles, adversaries are capable of inferring their sensitive attributes by exploiting the link structure of the social network. Since sharing even a small, seemingly benign chunk of personal information may be detrimental to privacy, it is important to analyze the risk of publicizing hidden attributes.

Though there has been recent interest in analyzing the privacy risks in social networks, the current trend seems to be toward the use of large networks. However, for an end-user with access to only his/her one-hop neighbors, using such global networks is impractical. Thus we proposed three ways of making the best use of the information locally available to individual end-users.

Throughout the paper, we answered two questions for an end-user Alice:

• Task 1: determine whether her secret can be guessed from her public information.

• Task 2: determine whether publicizing her hidden attribute would breach her friend’s privacy.

We presented three novel schemes to answer the above two questions. While the proposed framework cannot prove that Alice’s profile is free of attribute inference, our results indicate that in some cases, even the local network can give a reasonable estimate of hidden attributes, and thus can be used to warn individuals of such privacy breaches.

7. REFERENCES

[1] L. Adamic and E. Adar. Friends and neighbors on the web. Social Networks, 25:211–230, 2001.

[2] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In WWW, 2007.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, Mar. 2003.

[4] M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in facebook: a case study of unbiased sampling of osns. In INFOCOM, 2010.

[5] N. Z. Gong, A. Talwalkar, L. W. Mackey, L. Huang, E. C. R. Shin, E. Stefanov, E. Shi, and D. Song. Predicting links and inferring attributes using a social-attribute network (san). CoRR, abs/1112.3265, 2011.

[6] R. Gross and A. Acquisti. Information revelation and privacy in online social networks. In WPES, 2005.

[7] J. He, W. W. Chu, and Z. V. Liu. Inferring privacy information from social networks. In ISI, 2006.

[8] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, Aug. 2009.

[9] J. Lindamood, R. Heatherly, M. Kantarcioglu, and B. Thuraisingham. Inferring private information using social network data. In WWW, 2009.

[10] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations. J. Royal Stat. Soc., Series B, 2009.

[11] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted boltzmann machines for collaborative filtering. In ICML, 2007.

[12] Z. Yin, M. Gupta, T. Weninger, and J. Han. Linkrec: a unified framework for link recommendation with user attributes and graph structure. In WWW, 2010.

[13] E. Zheleva and L. Getoor. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In WWW, 2009.
