Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department...

28
Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University http:// networks.cs.northwestern.edu

Transcript of Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department...

Page 1: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Mosaic: Quantifying Privacy Leakage in Mobile Networks

Aleksandar Kuzmanovic

EECS DepartmentNorthwestern University

http://networks.cs.northwestern.edu

Page 2: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Buy 1 Get 1 Free

Mosaic– Joint work with

• Ning Xia (Northwestern University)• Han Hee Song (Narus Inc.)• Yong Liao (Narus Inc.)• Marios Iliofotou (Narus Inc.)• Antonio Nucci (Narus Inc.)• Zhi-Li Zhang (University of Minnesota)

Synthoid– Joint work with

• Marcel Flores (Northwestern University)

2

Page 3: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

3

Scenario

IP1

ISP A … IPK IP1

CSP B … IPK

Different services

Dynamic IP,CSP/ISP

Different devices

Different “footprints”

Page 4: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

4

Problem

Other research

work

We are here!Tessellation Mosaic

IP1

ISP A

… IPK

IP1

CSP B

… IPK

Input packet traces

How much private information can be obtained and expanded about end users by monitoring network traffic?

Page 5: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

5

Motivation

IP1

ISP A

… IPK

IP1

CSP B

… IPK

I will know everything about everyone!

Agencies

Bad guys

Mobile Traffic:• Relevant: more personal information• Challenging: frequent IP changes

Page 6: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

6

• How to track users when they hop over different IPs?

Challenges

With Traffic Markers, it is possible to

connect the users’ true identities to their sessions.

Sessions:Flows(5-tuple) are grouped into sessions

timeIP3

timeIP2

timeIP1Traffic Markers:Identifiers in the traffic that can be used to differentiate users

Page 7: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

7

Datasets

• 3h-Dataset: main dataset for most experiments• 9h-Dataset: for quantifying privacy leakage• Ground Truth Dataset: for evaluation of session

attribution• RADIUS: provide session owners

Dataset Source Description

3h-Dataset CSP-A Complete payload

9h-Dataset CSP-A Only HTTP headers

Ground Truth Dataset CSP-B Payload & RADUIS info.

Page 8: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Methodology Overview

TessellationIP1

ISP A

… IPK

IP1

CSP B

… IPK

Mosaic construction

Network data analysis

Public web crawling

Mapping fromsessions to users

Traffic attribution

Viatraffic markers

Via activity fingerprinting

Combine information from both network data and OSN profiles to infer the user mosaic.

8

Page 9: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

9

Traffic Attribution via Traffic Markers

Traffic Markers:

• Identifiers in the traffic to differentiate users• Key/value pairs from HTTP header• User IDs, device IDs or sessions IDs

Domain Keywords Category Source

osn1.com c_user=<OSN1_ID> OSN User ID Cookies

osn2.com oauth_token=<OSN2_ID>-## OSN User ID HTTP header

admob.com X-Admob-ISU Advertising HTTP header

pandora.com user_id User ID Cookies

google.com sid Session ID Cookies

How can we select and evaluate traffic markers from network data?

Page 10: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

10

Traffic Attribution via Traffic Markers

OSN IDs as Anchors:• The most popular user identifiers among all services• Linked to user public profiles

OSN IDs can be used as anchors, but their coverage on sessions is too small

OSN Source Session Coverage

OSN1 ID HTTP URL and cookies 1.3%

OSN2 ID HTTP header 1.0%

Top 2 OSN providers from North America

Only 2.3% sessions contain OSN IDs

Page 11: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Block Generation: Group Sessions into Blocks

11

Traffic Attribution via Traffic Markers

99K session blocks generated from the 12M sessions

Session interval δ • Depends on the CSP• δ=60 seconds in our

study

Other sessions?OSN ID

≥δ

IP 1

Block• Session group on the

same IP from the same user

• Traffic markers shared by the same block

timeIP 1IP

timeIP

Block

Page 12: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Culling the Traffic Markers: OSN IDs are not enough

12

Traffic Attribution via Traffic Markers

• Uniqueness: Can the traffic marker differentiate between users?• Persistency: How long does a traffic marker remain the same?

Traffic markers

mobclix.com

#uid

OS

N1 ID

pandora.com#user_id

mobclix.com

#u

mydas.m

obi#m

ac-id

google.com#sid

craigslist.org#cl_b

Persistency ~= 1The value of Google.com#sid remains the same for the same user nearly all the observation duration

Uniqueness = 1No two users will share the same google.com#sid value

We pick 625 traffic markers with uniqueness = 1, persistency > 0.9

Page 13: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Traffic Attribution: Connecting the Dots

13

Traffic Attribution via Traffic Markers

IP 1

IP 2

IP 3

Same OSN ID

Same traffic marker

Traffic markers are the key in attributing sessions to the same user over different IP addresses

Tessellation User Ti

( )

Page 14: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

14

Traffic Attribution via Activity Fingerprinting

• What if a session block has no traffic markers?

Assumption (Activity Fingerprinting):• Users can be identified from the DNS names of their

favorite services

DNS names:• Extracted 54,000 distinct DNS names• Classified into 21 classes

Serviceclasses

Serviceproviders

Search bing, google, yahoo

Chat skype, mtalk.googl.com

Dating plentyoffish, date

E-commerce amazon, ebay

Email google, hotmail, yahoo

News msnbc, ew, cnn

Picture Flickr, picasa

… …

Activity Fingerprinting:• Favorite (top-k) DNS names as the

user’s “fingerprint”

Page 15: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

15

Traffic Attribution via Activity Fingerprinting

Y-axis:closer to 1, more distinct

the fingerprint is

Mobile users can be identified by the DNS names from their preferred services

Normalized DNS names

• Fi : Top k DNS names from user as “activity fingerprint”• : Uniqueness of the fingerprint

X-axis:normalized by the total number of DNS names

Page 16: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

16

Traffic Attribution Evaluation

Ri

RADIUS user(Ground Truth)

Session

Ri Rj

Ti

Correct (Not complete)

Tj

Not correct

Coverage = -----------------------

identified sessions/users

total sessions/users

= -------------------

correctly identified sessions/users

total identified sessions/users

Accuracy on Covered Set

TiTessellation user

(Correct?)

Page 17: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

17

Traffic Attribution Evaluation

• Evaluation Results

Session

User

Co

vera

ge

78.60%

69.00%

49.80%

43.20%

2.40%

15.70%

OSN ID extraction Via traffic markers Via activity fingerprinting

Session

User

Accu

racy o

n C

ove

red

Se

t

92.50%

96.40%

94.50%

99.30%

100.00%

100.00%

OSN ID extraction Via traffic markers Via activity fingerprinting

Page 18: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

• Mosaic of Real-World User Alice

Example MOSAIC with 12 information classes(tesserae):• Information (Education, affiliation and etc.) from OSN profiles• Information (Locations, devices and etc.) from user sessions

18

Construction of User Mosaic

Least gain

Most gain

Sub-classes:Residence, coordinates, city, state, and etc.

Page 19: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

• Leakage from OSN profiles vs. from Network Data

19

Information from OSN profiles and network data complement and corroborate each other

Analysis on network data provides real-time activities and locations

OSN profiles provide static user information (education, interests)

Quantifying Privacy Leakage

Information from both sides can corroborate to

each other

Page 20: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

• Traffic markers (OSN IDs and etc.) should be limited and encrypted

Preventing User Privacy Leakage20

Protect traffic markers

Restrict3rd parties

• Third party applications/developers should be strongly regulated

Page 21: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Beyond Traffic Encryption

Trackers and Information Aggregators

21

Page 22: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Current Approaches (in the Advertising Domain)

Block or disrupt ad interaction– May disrupt regular site operation

Privacy preserving infrastructures– Requires participation of ad networks

Do Not Track, Opt-out mechanisms– Requires trust in ad networks

22

Page 23: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Synthoid

Endpoint User Profile Control

– User explicitly defines who he wants to be online, i.e., his online profile

– Synthoid imprints this profile into all possible trackers and information aggregators

23

Page 24: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Ad Network Synthoid

24

Page 25: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Synthoid Performance

Volume of Synthoid traffic varied 1% -- 100%

25

It is possible to completely alter the user profile with small amount of artificial traffic

Page 26: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

• Prevalence in the use of OSNs leaves users’ true identities available in the network• Significant portions of flows can be attributed to

users, even without any direct identity leaks

• Ubiquitous encryption is not likely to take place, plus it does not solve the problem

• Our endpoint user profile control lets users explicitly define their online profiles• Works for all possible trackers at once• Requires no changes or consent from trackers

• Currently developing Synthoid for search, mobile, information aggregators, price discriminators, etc.

26

Conclusions

Page 27: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

Thanks!

Questions?

27

http://networks.cs.northwestern.edu

Page 28: Mosaic: Quantifying Privacy Leakage in Mobile Networks Aleksandar Kuzmanovic EECS Department Northwestern University .

28