Like a Pack of Wolves: Community Structure of Web Trackers

Like a Pack of Wolves: Community Structure of Web Trackers V. Kalavri, [email protected] (KTH Royal Institute of Technology) J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research) Passive and Active Measurements Conference 31 March - 1 April 2016, Heraklion, Crete, Greece

Transcript of Like a Pack of Wolves: Community Structure of Web Trackers

Page 1: Like a Pack of Wolves: Community Structure of Web Trackers

Like a Pack of Wolves: Community Structure of Web Trackers

V. Kalavri, [email protected] (KTH Royal Institute of Technology) J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research)

Passive and Active Measurements Conference 31 March - 1 April 2016, Heraklion, Crete, Greece

Page 2: Like a Pack of Wolves: Community Structure of Web Trackers

Browsing the Web

● Ads
● Recommendations

Page 3: Like a Pack of Wolves: Community Structure of Web Trackers

Tracking

[Diagram: two trackers and an ad server. Labels: profiling, cookie exchange, display relevant ads.]

Page 4: Like a Pack of Wolves: Community Structure of Web Trackers

The study's authors defined "creepiness" by the feeling consumers get when they sense an ad is too personal because it uses data the consumer did not agree to provide, such as online-search and browsing history. Consumers are even more creeped out by this because they don't know how and where that information will be used.

Page 5: Like a Pack of Wolves: Community Structure of Web Trackers

Page 6: Like a Pack of Wolves: Community Structure of Web Trackers

Can’t we block them?

[Diagram: a proxy, two trackers, an ad server, and a legitimate site.]

Page 7: Like a Pack of Wolves: Community Structure of Web Trackers

Available Solutions

AdBlock, DoNotTrack, EasyPrivacy: crowd-sourced “black lists” of tracker URLs

● not frequently updated
● unclear who blacklists URLs, or based on what criteria
● miss “hidden” trackers or dual-role nodes
● blocking requires manual matching against the list
● can you buy your way into the whitelist?

Page 8: Like a Pack of Wolves: Community Structure of Web Trackers

Page 9: Like a Pack of Wolves: Community Structure of Web Trackers

Towards Automatic Tracker Detection

Exploit fundamental properties of web tracker operation to automate tracker detection

● Structural attributes: network positions, connections
● Operational aspects: data exchanged, communication patterns

Page 10: Like a Pack of Wolves: Community Structure of Web Trackers

Dataset

6 months (Nov 2014 - April 2015) of augmented Apache logs from a web proxy

● 80m requests
● 2m distinct URLs
● 3k users

Log fields:

● User identification
● URL requested
● Headers
● Performance information, i.e. latency, bytes

● Tagged as Trackers or non-Trackers with EasyPrivacy
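The tagging step can be sketched as a lookup of each host against a set of blacklisted domains. This is a simplification and not the authors' tooling: EasyPrivacy is written in Adblock-style filter syntax, which the sketch does not parse. The `blacklist` entries and the names `is_blacklisted` and `tag_hosts` are illustrative assumptions.

```python
def is_blacklisted(host, blacklist):
    """Return True if the host, or any parent domain of it, is in the blacklist."""
    parts = host.lower().split(".")
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def tag_hosts(hosts, blacklist):
    """Tag each host as tracker ("T") or non-tracker ("NT")."""
    return {h: "T" if is_blacklisted(h, blacklist) else "NT" for h in hosts}


# Hypothetical blacklist entries, for illustration only.
blacklist = {"google-analytics.com", "scorecardresearch.com"}
tags = tag_hosts(
    ["www.google-analytics.com", "b.scorecardresearch.com", "github.com"],
    blacklist,
)
```

Matching on parent domains lets a single entry cover subdomains such as `b.scorecardresearch.com`.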

Page 11: Like a Pack of Wolves: Community Structure of Web Trackers

Web Tracking as a Graph Problem

Referer-Hosts Graph (bipartite)

● U: referers (URLs visited by the user)
● V: hosts (embedded URLs)

[Diagram: example vertices facebook.com, youtube.com, google-analytics.com, b.scorecardresearch.com]
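A minimal sketch of how such a referer-hosts graph might be built from (referer URL, requested URL) pairs. The input format, the reduction of both URLs to host names, and the skipping of same-host requests are assumptions made for illustration. The example URLs and paths are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_referer_hosts_graph(requests):
    """Map each referer host to the set of hosts embedded within it."""
    graph = defaultdict(set)
    for referer_url, requested_url in requests:
        referer = urlsplit(referer_url).hostname
        host = urlsplit(requested_url).hostname
        # Assumption: requests a site makes to itself carry no tracking signal.
        if referer and host and referer != host:
            graph[referer].add(host)
    return dict(graph)


# Hypothetical log entries: (Referer header, requested URL).
requests = [
    ("http://facebook.com/home", "http://google-analytics.com/ga.js"),
    ("http://youtube.com/watch", "http://google-analytics.com/ga.js"),
    ("http://youtube.com/watch", "http://b.scorecardresearch.com/beacon.js"),
    ("http://youtube.com/watch", "http://youtube.com/player.js"),
]
graph = build_referer_hosts_graph(requests)
```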

Page 12: Like a Pack of Wolves: Community Structure of Web Trackers

Referer-Hosts Graph: Connected Components

94% of all trackers belong to the same connected component!
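A figure like this could be computed by finding the connected components of the bipartite graph and checking how many known trackers fall into the largest one. The sketch below uses breadth-first search on a toy graph. The host names and tracker set are hypothetical and the result has no relation to the 94% reported here.

```python
from collections import defaultdict, deque


def connected_components(referer_to_hosts):
    """Return the components of the bipartite graph, largest first."""
    adj = defaultdict(set)
    for referer, hosts in referer_to_hosts.items():
        for host in hosts:
            # Tag each side so a name used as both referer and host stays distinct.
            adj[("r", referer)].add(("h", host))
            adj[("h", host)].add(("r", referer))
    adj = dict(adj)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def tracker_share_in_largest(components, trackers):
    """Fraction of known trackers that lie in the largest component."""
    largest_hosts = {name for kind, name in components[0] if kind == "h"}
    return len(largest_hosts & trackers) / len(trackers)


# Toy data, for illustration only.
graph = {
    "a.example": {"t1.example", "t2.example"},
    "b.example": {"t2.example", "cdn.example"},
    "c.example": {"t3.example"},
}
trackers = {"t1.example", "t2.example", "t3.example"}
components = connected_components(graph)
share = tracker_share_in_largest(components, trackers)
```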

Page 13: Like a Pack of Wolves: Community Structure of Web Trackers

Communities in Graphs

● Densely connected internally
● Sparsely connected with each other

Vertices in the same community are likely to be similar with respect to network position and connectivity

Do trackers form communities?

Page 14: Like a Pack of Wolves: Community Structure of Web Trackers

The Hosts-Projection Graph

[Diagram: a referer-hosts graph with referers r1-r7 and hosts h1-h8, shown next to its hosts-projection graph over h1-h8, whose edges are labeled with referers. Legend: referer, non-tracker host (NT), tracker host (T), unlabeled host (?).]
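The projection can be sketched as follows, assuming (as the referer labels on the projection's edges suggest) that two hosts are linked whenever at least one referer embeds both. The function name `project_onto_hosts` and the toy input are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_hosts(referer_to_hosts):
    """Link every pair of hosts that share at least one referer."""
    projection = defaultdict(set)
    for hosts in referer_to_hosts.values():
        for host in hosts:
            projection.setdefault(host, set())  # keep hosts with no partners
        for a, b in combinations(sorted(hosts), 2):
            projection[a].add(b)
            projection[b].add(a)
    return dict(projection)


# Toy referer-hosts graph, for illustration only.
referer_to_hosts = {
    "r1": {"h1", "h2"},
    "r2": {"h2", "h3"},
    "r3": {"h4"},
}
projection = project_onto_hosts(referer_to_hosts)
```

A host's degree in the projection is then `len(projection[host])`.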

Page 15: Like a Pack of Wolves: Community Structure of Web Trackers

Hosts-Projection Graph: Degrees

#unique referers that tracker / other hosts are embedded within

Page 16: Like a Pack of Wolves: Community Structure of Web Trackers

Hosts-Projection Graph: Tracker Neighbors

Trackers are mainly connected to other Trackers

Page 17: Like a Pack of Wolves: Community Structure of Web Trackers

Web Tracker Communities

[Figure annotations:]

● Popular trackers, e.g. google-analytics
● Smaller trackers
● Ad servers
● Normal webpages

Page 18: Like a Pack of Wolves: Community Structure of Web Trackers

Data Pipeline

1: logs pre-processing (raw logs to cleaned logs)
2: bipartite graph creation
3: largest connected component extraction
4: hosts-projection graph creation
5: community detection
6: results

Example results:

google-analytics.com: T
bscored-research.com: T
facebook.com: NT
github.com: NT
cdn.cxense.com: NT
...

Page 19: Like a Pack of Wolves: Community Structure of Web Trackers

Classification via Neighborhood Analysis

[Diagram: hosts-projection graph over hosts h1-h8. Legend: non-tracker host, tracker host, unlabeled host.]

● ⅖ non-tracker neighbors
● ⅗ tracker neighbors

if % of tracker neighbors > threshold => classify as tracker
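A minimal sketch of this rule, using the 2/5 versus 3/5 split from the slide. The default threshold of 0.5, the specific neighbor labels, and the choice to ignore unlabeled neighbors are assumptions. The slides do not state a threshold value.

```python
def tracker_neighbor_ratio(host, projection, known):
    """Fraction of a host's labeled neighbors that are trackers."""
    labeled = [known[n] for n in projection.get(host, ()) if n in known]
    if not labeled:
        return None  # no labeled neighbors, so no basis for a decision
    return sum(1 for label in labeled if label == "T") / len(labeled)


def classify_by_neighbors(host, projection, known, threshold=0.5):
    """Classify as tracker when the tracker-neighbor ratio exceeds the threshold."""
    ratio = tracker_neighbor_ratio(host, projection, known)
    if ratio is None:
        return None
    return "T" if ratio > threshold else "NT"


# Hypothetical neighborhood: three tracker and two non-tracker neighbors.
projection = {"h5": {"h3", "h4", "h6", "h7", "h8"}}
known = {"h3": "T", "h4": "T", "h6": "T", "h7": "NT", "h8": "NT"}
ratio = tracker_neighbor_ratio("h5", projection, known)
verdict = classify_by_neighbors("h5", projection, known)
```

Raising the threshold trades missed trackers for fewer false positives. With `threshold=0.7` the same host would be classified as a non-tracker.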

Page 20: Like a Pack of Wolves: Community Structure of Web Trackers

Results

Page 21: Like a Pack of Wolves: Community Structure of Web Trackers

Classification via Label Propagation

Iterative Algorithm for Community Detection

[Diagram legend: non-tracker, tracker, unlabeled]

● Vertices propagate their labels to their neighbors and adopt the most popular label in their neighborhood.

● Upon convergence, vertices with the same label belong to the same community.

● If an unlabeled node ends up in a trackers community, it is classified as a tracker
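The three steps above can be sketched as follows. Several choices are assumptions and not taken from the slides: vertices are updated one at a time in sorted order, ties go to the largest label, and a community counts as a trackers community when its known trackers outnumber its known non-trackers. The edge list is read off the neighbor-label sets of the 8-vertex worked example and may not match the original figure exactly. The `known` labels are hypothetical.

```python
from collections import Counter, defaultdict


def label_propagation(adj, max_iters=100):
    """Each vertex repeatedly adopts the most popular label among its neighbors."""
    labels = {v: v for v in adj}  # every vertex starts in its own community
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):
            counts = Counter(labels[u] for u in adj[v])
            if not counts:
                continue
            # Most frequent label; ties broken by the largest label (assumption).
            best = max(counts.items(), key=lambda item: (item[1], item[0]))[0]
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


def classify_unlabeled(labels, known):
    """Give each unlabeled vertex the majority known label of its community."""
    votes = defaultdict(Counter)
    for v, community in labels.items():
        if v in known:
            votes[community][known[v]] += 1
    result = {}
    for v, community in labels.items():
        if v not in known and votes[community]:
            tally = votes[community]
            result[v] = "T" if tally["T"] > tally["NT"] else "NT"
    return result


edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (4, 6), (5, 6), (5, 7), (7, 8)]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

labels = label_propagation(dict(adj))
known = {3: "T", 4: "T", 5: "T", 7: "NT"}  # hypothetical ground truth
predicted = classify_unlabeled(labels, known)
```

With these update rules the intermediate and final label values differ from the ones shown on the slides, but the resulting grouping is the same: vertices 1-6 form one community and vertices 7-8 another.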

Page 22: Like a Pack of Wolves: Community Structure of Web Trackers

Classification via Label Propagation

[Diagram, i=0: eight vertices carrying the initial labels 1-8.]

Page 23: Like a Pack of Wolves: Community Structure of Web Trackers

Classification via Label Propagation

[Diagram, i=1: each vertex receives its neighbors' labels, {2} {1, 3} {2, 4, 5} {3, 5, 6} {4, 5} {3, 4, 6, 7} {5, 8} {7}, and adopts a new label.]

Page 24: Like a Pack of Wolves: Community Structure of Web Trackers

Classification via Label Propagation

[Diagram, i=2: each vertex receives its neighbors' labels, {3} {2, 5} {3, 6, 7} {5, 6, 7} {6, 7} {5, 6, 6, 8} {7, 8} {8}, and adopts a new label.]

Page 25: Like a Pack of Wolves: Community Structure of Web Trackers

Classification via Label Propagation

[Diagram, i=3: each vertex receives its neighbors' labels, {5} {3, 7} {5, 6, 7} {6, 7, 7} {6, 7} {7, 7, 7, 8} {6, 8} {8}, and adopts a new label.]

Page 26: Like a Pack of Wolves: Community Structure of Web Trackers

Classification via Label Propagation

[Diagram, i=4: each vertex receives its neighbors' labels, {7} {5, 7} {7, 7, 7} {7, 7, 7} {7, 7} {7, 7, 7, 8} {7, 8} {8}, and adopts a new label.]

Page 27: Like a Pack of Wolves: Community Structure of Web Trackers

Classification via Label Propagation

[Diagram, converged: six vertices carry label 7 and two vertices carry label 8.]

Page 28: Like a Pack of Wolves: Community Structure of Web Trackers

Results

Page 29: Like a Pack of Wolves: Community Structure of Web Trackers

Conclusions

● Web trackers are well-connected with each other
  ○ 94% of web trackers are in the same connected component
● Web trackers are mainly connected to other trackers
  ○ High clustering, tight communities
● 97% classification accuracy and < 2% FPR with simple methods
  ○ Can be used to build robust and fully automated privacy preservation systems
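For reference, accuracy and false positive rate (FPR) can be computed from predicted and true labels as sketched below, treating "tracker" as the positive class. The toy labels are made up to exercise the function and have no relation to the figures reported above.

```python
def accuracy_and_fpr(predicted, actual):
    """Return (accuracy, false positive rate), with "T" as the positive class."""
    tp = fp = tn = fn = 0
    for host, truth in actual.items():
        guess = predicted[host]
        if truth == "T":
            if guess == "T":
                tp += 1
            else:
                fn += 1
        else:
            if guess == "T":
                fp += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # FPR: share of true non-trackers that were wrongly flagged as trackers.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Toy labels, for illustration only.
actual = {"a": "T", "b": "T", "c": "NT", "d": "NT", "e": "NT"}
predicted = {"a": "T", "b": "NT", "c": "NT", "d": "T", "e": "NT"}
accuracy, fpr = accuracy_and_fpr(predicted, actual)
```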

Page 30: Like a Pack of Wolves: Community Structure of Web Trackers

Like a Pack of Wolves: Community Structure of Web Trackers

V. Kalavri, [email protected] (KTH Royal Institute of Technology) J. Blackburn, M. Varvello, K. Papagiannaki (Telefonica Research)

Passive and Active Measurements Conference 31 March - 1 April 2016, Heraklion, Crete, Greece

Page 31: Like a Pack of Wolves: Community Structure of Web Trackers

Extra Slides

Page 32: Like a Pack of Wolves: Community Structure of Web Trackers

Referer-Hosts Graph: Degrees

#unique referers that tracker / other hosts are embedded within