Crowdsourcing privacy: Bay Area Talks

Crowdsourcing Privacy Preferences through Time and Space Eran Toch Bay Area Talks, June 2013

description

This talk summarizes several works that try to create new ways to help users manage their privacy in online social networks and in other privacy-sensitive applications.

Transcript of Crowdsourcing privacy: Bay Area Talks

Page 1: Crowdsourcing privacy: Bay Area Talks

Crowdsourcing Privacy Preferences through Time and Space

Eran Toch

Bay Area Talks, June 2013

Page 2: Crowdsourcing privacy: Bay Area Talks

Page 3: Crowdsourcing privacy: Bay Area Talks

Privacy: an Old Idea

Privacy in the year 500 BC

Page 4: Crowdsourcing privacy: Bay Area Talks

Mishnah, Baba Batra 2a-b (500 BC - 200 CE) rules: “A person should not open a window to a common yard... A person would not open to a common yard a door against door and a window against window.”

The Rashbam (1080 – 1160 CE) defines the Hezek Re’iya (“the damage of being seen”).

The Talmud contextualizes the text (Bible, Numbers 24-10): “and whose eyes are opened: How beautiful are your tents, O Jacob, your dwelling places, O Israel.”

“What did Balaam see? He saw that their tent openings were not directed at each other, and said: they deserve that the divine spirit will be on them.”

Page 5: Crowdsourcing privacy: Bay Area Talks

Houses and Systems

Privacy challenges in the design of people's houses are not that different from the privacy challenges of online information-sharing systems.

Page 6: Crowdsourcing privacy: Bay Area Talks

Information Sharing Systems

‣ Social Networks

‣ Multi-User Cloud Services

‣ Crowdsourcing

‣ Apps

‣ Enterprise Interoperability

Page 7: Crowdsourcing privacy: Bay Area Talks

The Architecture of Privacy

[Diagram: the relationship between System and User, shown once for Traditional Systems and once for Information Sharing Systems.]

Page 8: Crowdsourcing privacy: Bay Area Talks

What can go wrong?

Page 9: Crowdsourcing privacy: Bay Area Talks
Page 10: Crowdsourcing privacy: Bay Area Talks

Expressing Preferences

Users find it challenging to understand the privacy landscape, to understand preferences, and to express them.

Page 11: Crowdsourcing privacy: Bay Area Talks

Agility and Changes

Sharing preferences will change over time, while the content is still accessible.

Page 12: Crowdsourcing privacy: Bay Area Talks

Where Should Users Start?

Page 13: Crowdsourcing privacy: Bay Area Talks

Agenda

‣ Crowdsourcing preferences.

‣ Predictions and defaults.

‣ Modeling longitudinal privacy.

‣ The future: design and theory.

Page 14: Crowdsourcing privacy: Bay Area Talks

Crowdsourcing Privacy Preferences

Eran Toch, Crowdsourcing Privacy Management in Context-Aware Applications, Personal and Ubiquitous Computing, 2013.

Page 15: Crowdsourcing privacy: Bay Area Talks

Crowdsourcing and Privacy

1. Modeling user behavior by analyzing a crowd of users.

2. Identifying meaningful context and recognizing personal differences.

3. Building mechanisms that can help individuals manage preferences.

Page 16: Crowdsourcing privacy: Bay Area Talks

The Question

‣ Can we model and predict users’ preferences for a given scenario?

‣ In our case, a combination of the location reported and the target of the information.

‣ Can we build mechanisms that help users manage their access control using the predictions?

Page 17: Crowdsourcing privacy: Bay Area Talks

Crowdsourcing Privacy Preferences

‣ Aggregator: Collecting preferences and their underlying context.

‣ Modeler: Building a model for the preference according to a context.

‣ Personalizer: Personalizing the model for a specific, given user.

‣ Application: Using the preference model in a specific application.
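The four stages above can be sketched in a few lines of Python. This is only an illustration of the data flow: the record layout, the use of a mean as the "model", the additive personal offset, and the sharing threshold are all assumptions, not details taken from the talk.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (user, context, willingness on a 1-5 scale).
# The context here is a place label; all values are made up for illustration.
REPORTS = [
    ("ann", "home", 1), ("ann", "work", 3), ("ann", "cafe", 4),
    ("bob", "home", 2), ("bob", "work", 4), ("bob", "cafe", 5),
    ("eve", "home", 1), ("eve", "work", 2),
]

def aggregate(reports):
    """Aggregator: group reported preferences by their context."""
    by_context = defaultdict(list)
    for _user, context, rating in reports:
        by_context[context].append(rating)
    return dict(by_context)

def build_model(by_context):
    """Modeler: one crowd-level preference per context (assumed: the mean)."""
    return {context: mean(ratings) for context, ratings in by_context.items()}

def personalize(model, reports, user):
    """Personalizer: shift the crowd model by the user's average deviation."""
    deviations = [rating - model[context]
                  for u, context, rating in reports if u == user]
    offset = mean(deviations) if deviations else 0.0
    return {context: value + offset for context, value in model.items()}

def should_share(personal_model, context, threshold=3.0):
    """Application: turn the personalized preference into a decision."""
    return personal_model[context] >= threshold

crowd_model = build_model(aggregate(REPORTS))
eve_model = personalize(crowd_model, REPORTS, "eve")
print(should_share(eve_model, "cafe"), should_share(eve_model, "work"))
```

Note that "eve" never reported a preference for the cafe; the prediction for that context comes from the crowd and is only adjusted by her personal offset.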

Page 18: Crowdsourcing privacy: Bay Area Talks

Our User Study

‣ 30 users, 2 weeks.

‣ Smart-Spaces: Tracking locations and activities.

‣ Participants were surveyed three times a day.

‣ Asked about their willingness to share their location on a Likert scale.

Page 19: Crowdsourcing privacy: Bay Area Talks

Meta-Data Survey

Page 20: Crowdsourcing privacy: Bay Area Talks

Place Discrimination

[Figure: places rated on a scale from 1 (less likely to share) to 5 (more likely to share).]

Some places are considered private.

Some places are shared by almost everybody.

Page 21: Crowdsourcing privacy: Bay Area Talks

Distribution by Semantics

[Figure: density (0.0 to 0.5) of willingness to share location (0 = low, 4 = high), one panel per place semantics: Home, None, Public, Transit, Travel, Work.]

Page 22: Crowdsourcing privacy: Bay Area Talks

Regression Model

‣ Predictions for a user u regarding a place k are learned linearly.

‣ Easy to model and to compute.

‣ Provides insight into variability.

Terms of the model: prediction by place, prediction by semantics, personal tendency.
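The equation on this slide did not survive extraction. One additive form that is consistent with the three labeled terms is sketched below; it is an assumption for illustration, not the author's exact model. Here $\bar{r}_{k}$ stands for the crowd's mean rating of place $k$, $\bar{r}_{s(k)}$ for the mean rating of the semantic category $s(k)$ of that place, $b_{u}$ for the personal tendency of user $u$, and $\alpha, \beta$ for learned weights. All of these symbols are hypothetical.

```latex
\hat{r}_{u,k} \;=\; \underbrace{\alpha\,\bar{r}_{k}}_{\text{prediction by place}}
\;+\; \underbrace{\beta\,\bar{r}_{s(k)}}_{\text{prediction by semantics}}
\;+\; \underbrace{b_{u}}_{\text{personal tendency}}
```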

Page 23: Crowdsourcing privacy: Bay Area Talks

Individual Differences

‣ People can be categorized according to their privacy approach:

‣ Privacy "Unconcerned"

‣ Privacy "Pragmatic"

‣ Privacy "Fundamentalist"

[Figure: scatter plot of "university" against "everybody" ratings (1.0 to 4.0), colored by cluster (factor(fit.cluster): 1, 2, 3).]

Page 24: Crowdsourcing privacy: Bay Area Talks

Prediction Methods

[Figure: squared error (0.0 to 1.0) by prediction method: (a) Simple, predicting by place; (b) Semantic, predicting by semantics; (c) Semantic/Biased, semantics and personal tendency. Annotation: "Higher error rate".]
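A minimal sketch of how three such predictors could be compared by squared error follows. The data, the predictor definitions, and the fallback to the overall mean are all assumptions made for illustration; the toy numbers are not the study's data and say nothing about its results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (user, place, semantics, willingness 1-4).
TRAIN = [
    ("ann", "lab", "work", 4), ("ann", "flat", "home", 2),
    ("bob", "lab", "work", 3), ("bob", "house", "home", 1),
    ("eve", "office", "work", 2), ("eve", "flat", "home", 1),
]
TEST = [("ann", "office", "work", 4), ("eve", "lab", "work", 2)]

def group_means(rows, key_index):
    """Mean rating per distinct value of the chosen column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[3])
    return {key: mean(values) for key, values in groups.items()}

overall = mean(row[3] for row in TRAIN)
place_mean = group_means(TRAIN, 1)
semantic_mean = group_means(TRAIN, 2)

# Personal tendency: how far a user's ratings sit from the semantic means.
user_bias = {
    user: mean(r[3] - semantic_mean[r[2]] for r in TRAIN if r[0] == user)
    for user in {row[0] for row in TRAIN}
}

def predict_simple(user, place, semantics):
    return place_mean.get(place, overall)

def predict_semantic(user, place, semantics):
    return semantic_mean.get(semantics, overall)

def predict_semantic_biased(user, place, semantics):
    return predict_semantic(user, place, semantics) + user_bias.get(user, 0.0)

def mse(predict, rows):
    """Mean squared error of a predictor over held-out rows."""
    return mean((predict(u, p, s) - r) ** 2 for u, p, s, r in rows)

for name, fn in [("simple", predict_simple),
                 ("semantic", predict_semantic),
                 ("semantic/biased", predict_semantic_biased)]:
    print(name, round(mse(fn, TEST), 3))
```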

Page 25: Crowdsourcing privacy: Bay Area Talks

Applications of Crowdsourcing

Page 26: Crowdsourcing privacy: Bay Area Talks

Applications

‣ Aggregating user models for:

‣ Automatic decision making.

‣ Preference prediction for semi-manual suggestions.

‣ Defaults for new users.

‣ Providing possibilities for rule modifications.

Page 27: Crowdsourcing privacy: Bay Area Talks

Simulated Access Control

[Diagram: location requests enter a Decision Engine, which returns one of three outcomes: Disclose | Deny | Manual.]

Page 28: Crowdsourcing privacy: Bay Area Talks

Access Control Performance

[Figure: two panels, Automation and Accuracy, plotted against threshold (0.0 to 1.0) for strategies A, M, and SM.]

If we simulate an access control mechanism, we reach 80% accuracy in predicting what a user would do, and 93% accuracy when allowing the worst predictions to be examined by the user.
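The trade-off between automation and accuracy can be illustrated with a small decision engine that hands uncertain predictions back to the user. This is a sketch under stated assumptions: the predicted sharing probability, the `margin` parameter, and the sample requests are hypothetical, and the talk does not specify how its threshold was defined.

```python
def decide(p_share, margin=0.2):
    """Return 'disclose', 'deny', or 'manual' for a predicted sharing probability.

    Predictions within `margin` of 0.5 are treated as too uncertain to automate.
    """
    if p_share >= 0.5 + margin:
        return "disclose"
    if p_share <= 0.5 - margin:
        return "deny"
    return "manual"

def evaluate(requests, margin):
    """Accuracy over automated decisions, and the share of requests automated."""
    decisions = [(decide(p, margin), shared) for p, shared in requests]
    automated = [(d, shared) for d, shared in decisions if d != "manual"]
    correct = sum((d == "disclose") == shared for d, shared in automated)
    accuracy = correct / len(automated) if automated else None
    automation = len(automated) / len(requests)
    return accuracy, automation

# Hypothetical requests: (predicted probability of sharing, what the user chose).
REQUESTS = [(0.9, True), (0.8, True), (0.6, False), (0.55, True),
            (0.45, True), (0.25, False), (0.2, False), (0.1, False)]

for margin in (0.0, 0.2):
    accuracy, automation = evaluate(REQUESTS, margin)
    print(f"margin={margin}: accuracy={accuracy}, automation={automation}")
```

Widening the margin sends more requests to the user, so fewer decisions are automated but the automated ones are more reliable.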

Page 29: Crowdsourcing privacy: Bay Area Talks

The Price of Privacy

How much should businesses pay for a location?

Omer Barak, Gabriella Cohen, Alla Gazit and Eran Toch. The Price Is Right? Economic Value of Location Sharing, Submitted to MCSS: Workshop on Mobile Systems for Computational Social Science, Ubicomp 2013

Page 30: Crowdsourcing privacy: Bay Area Talks

Creating Defaults

People have a tendency to stick to the defaults:

‣ Organ donation choices

‣ Access control policies

‣ Enterprise calendars (L Palen, 1999)

Johnson, Eric J. and Goldstein, Daniel G., Do Defaults Save Lives? (Nov 21, 2003). Science, Vol. 302, 2003.

Page 31: Crowdsourcing privacy: Bay Area Talks

Selecting Defaults

Can we create defaults that would reflect the decision space of existing users?

Eran Toch, Norman M. Sadeh, Jason I. Hong: Generating default privacy policies for online social networks. CHI Extended Abstracts 2010: 4243-4248

Page 32: Crowdsourcing privacy: Bay Area Talks

Clustering Defaults

Policy a: location within the campus

Policy b: location outside of the campus

Page 33: Crowdsourcing privacy: Bay Area Talks

Generalizing Defaults

Developing clustering methods that map a large set of preferences into a manageable number of default preferences.

Ron Hirschprung, Eran Toch, and Oded Maimon. Evaluating Bi-Directional Data Agent Applicability and Design in Cloud Computing Environment, submitted to Information Systems Research
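To make the idea of mapping many preferences onto a few defaults concrete, here is a minimal sketch that uses plain k-means. The talk does not say which clustering method was used, so k-means is a stand-in; the two-dimensional preference vectors (on campus, off campus), the starting centroids, and the 0.5 rounding rule are likewise assumptions.

```python
from statistics import mean

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [sum((a - b) ** 2 for a, b in zip(point, c))
                         for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [tuple(mean(dim) for dim in zip(*cluster)) if cluster else c
                     for cluster, c in zip(clusters, centroids)]
    return centroids, clusters

# Hypothetical preference vectors: (share on campus, share off campus) in [0, 1].
PREFERENCES = [(0.9, 0.8), (1.0, 0.9), (0.8, 0.1),
               (0.9, 0.2), (0.1, 0.0), (0.2, 0.1)]

centroids, clusters = kmeans(
    PREFERENCES,
    centroids=[PREFERENCES[0], PREFERENCES[2], PREFERENCES[4]],
)

# Round each centroid to a yes/no default policy a new user could pick from.
defaults = [tuple(value >= 0.5 for value in c) for c in centroids]
print(defaults)
```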

Page 34: Crowdsourcing privacy: Bay Area Talks

Testing the Defaults

Using two user studies (n=499), we evaluated the performance of the clustering methods.

Page 35: Crowdsourcing privacy: Bay Area Talks

Privacy through Time

Oshrat Rave-Ayalon and Eran Toch. Retrospective Privacy: Managing Longitudinal Privacy in Online Social Networks, accepted to the Symposium on Usable Privacy and Security (SOUPS), 2013

Page 36: Crowdsourcing privacy: Bay Area Talks

Privacy through Time

‣ Digital information is almost never erased.

‣ With search engines and timelines, it becomes more and more accessible.

‣ What are the consequences for privacy?

Page 37: Crowdsourcing privacy: Bay Area Talks

A Note About Forgetting

In the past, forgetting was the default. Now, it's the other way around.

“Forgetting thus affords us a second chance, individually and as a society, to rise above our past mistakes and misdeeds, to accept that humans change over time.”

Page 38: Crowdsourcing privacy: Bay Area Talks

The Question

‣ How does information aging impact users’ sharing preferences in online social networks?

‣ Providing a quantitative model and proof for longitudinal privacy.

[Timeline: publication time (t0), 1 month (tm1), 1 year (ty1); anticipation privacy preferences versus retrospective privacy.]

Page 39: Crowdsourcing privacy: Bay Area Talks

The Question - cont’d

‣ Guiding the design of mechanisms for longitudinal privacy:

‣ Retrospective mechanisms.

‣ Future-facing mechanisms.

Page 40: Crowdsourcing privacy: Bay Area Talks

Our Studies

‣ A within-subject user study (n=193)

‣ A between-subject user study (n=298)

‣ Analyzing differences between users, randomly assigned to four conditions:

‣ 0-1 years

‣ 1-2 years

‣ 2+ years

‣ Control: 0-2+ years

‣ Using a custom Facebook application.

Page 41: Crowdsourcing privacy: Bay Area Talks

Willingness to Share Over Time

Willingness to share decreases with time (Spearman correlation test, ρ = -0.21, p < 0.0001)
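For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation computed on ranks rather than raw values. The sketch below computes it from scratch; the sample data are hypothetical and far more strongly correlated than the ρ = -0.21 reported above, and the code does not compute a p-value.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: post age in months and willingness to share (1-5).
age_months = [1, 3, 6, 12, 18, 24, 36]
willingness = [5, 4, 5, 3, 3, 2, 1]
rho = spearman(age_months, willingness)
print(round(rho, 3))
```

A negative ρ means that older posts tend to receive lower willingness-to-share ratings, without assuming the relationship is linear.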

Page 42: Crowdsourcing privacy: Bay Area Talks

Growth of Variability

The variability of sharing preferences grows considerably after 1 month (Levene test for variance, F = 10.69, p < 0.00001).

Page 43: Crowdsourcing privacy: Bay Area Talks

Time and Self Representation

A post becomes less and less representative of the user over time (Kruskal-Wallis test, p < 0.0001).

Page 44: Crowdsourcing privacy: Bay Area Talks

The Decay of Content

Irrelevance is the main reason users consider hiding a post (61% of the cases in which users declare they wish to hide the post).

[Figure: reasons for hiding the post from friends, by count (0 to 200): offend, inappropriate, other, change, irrelevant.]

Page 45: Crowdsourcing privacy: Bay Area Talks

Life Changes

Some life changes decrease the willingness to share (p < 0.05, Kruskal-Wallis test)

Page 46: Crowdsourcing privacy: Bay Area Talks

Expiry Date for Information

A default expiration time of 1.5 years
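A default expiration like this could be applied as a simple age filter. The sketch below is an assumption about how such a rule might look, not a description of the system in the talk: the post structure and function names are hypothetical, and 1.5 years is approximated as 548 days.

```python
from datetime import datetime, timedelta

# Assumption: 1.5 years approximated as 548 days; the slide gives only "1.5 years".
DEFAULT_EXPIRY = timedelta(days=548)

def is_expired(published_at, now, expiry=DEFAULT_EXPIRY):
    """True if the post is older than the expiry period."""
    return now - published_at > expiry

def visible_posts(posts, now, expiry=DEFAULT_EXPIRY):
    """Keep only the posts that are still within the expiry period."""
    return [post for post in posts
            if not is_expired(post["published_at"], now, expiry)]

# Hypothetical posts.
posts = [
    {"id": 1, "published_at": datetime(2011, 6, 1)},
    {"id": 2, "published_at": datetime(2012, 9, 1)},
]
now = datetime(2013, 6, 1)
visible_ids = [post["id"] for post in visible_posts(posts, now)]
print(visible_ids)
```

Whether an expired post should be hidden outright or merely flagged for the user to review is a design choice that this sketch leaves open.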

Page 47: Crowdsourcing privacy: Bay Area Talks

Retrospective Mechanisms

Page 48: Crowdsourcing privacy: Bay Area Talks

What Would an Engineer Do?

Page 49: Crowdsourcing privacy: Bay Area Talks

Privacy-by-Design

‣ What should an engineer do?

‣ Classic PbD solutions, such as k-anonymity and Differential Privacy, are partial.

‣ A new engineering toolbox is needed for Info-Share systems.

‣ Privacy-by-Design: Identifying Gaps and Overcoming Barriers in the Intersection of Law and Engineering, an Israel Science Foundation project with Prof. Michael Birnhack (TAU) and Dr. Irit Hadar (Haifa).

Page 50: Crowdsourcing privacy: Bay Area Talks

The Info-Share Toolbox

An extension of Langheinrich's Privacy-by-Design framework

[Figure: tools arranged by approach (policy-based to architecture-based) and by privacy guarantee: Data Minimization, Notice and Choice, Nudge, Coaching, Recourse.]

Page 51: Crowdsourcing privacy: Bay Area Talks

Coaching

‣ Ongoing project: Privacy-Peer-Pressure

‣ Empowering users to coach their friends toward privacy behavior.

‣ Allowing users to copy their friends, learn from their friends, and spread good ideas around.

Page 52: Crowdsourcing privacy: Bay Area Talks

Notice

Patrick Gage Kelley et al., Privacy as Part of the App Decision-Making Process. CHI 2013.

[Figure: a privacy label for the fictional "Acme Policy" (acme.com), a grid of the types of information the site collects against how the information is used and who it is shared with. Definitions: http://cups.cs.cmu.edu/privacyLabel]

Page 53: Crowdsourcing privacy: Bay Area Talks

Nudging

Wang, Yang, et al. "Privacy nudges for social media: an exploratory Facebook study." WWW’13, 2013.

Page 54: Crowdsourcing privacy: Bay Area Talks

Choice

Page 55: Crowdsourcing privacy: Bay Area Talks

Theory

‣ We need effective theory to think about privacy.

‣ Softer models of privacy: privacy by obscurity (Hartzog and Stutzman, 2012).

‣ Behavioral economic approaches (Acquisti’s work).

‣ Models that analyze privacy within a social context (e.g., social capital and privacy, Ellison et al., 2011).

‣ Models that analyze the relations between code and norms.

Page 56: Crowdsourcing privacy: Bay Area Talks

A New (Old) Metaphor

Page 57: Crowdsourcing privacy: Bay Area Talks

Acknowledgments

‣ Israel Science Foundation

‣ Israel Ministry of Science

‣ Israel Cyber Bureau

‣ Tel Aviv University

http://toch.tau.ac.il/
