Diffusion of User Tracking Data in the Online Advertising ... · Diffusion of User Tracking Data in...
Transcript of Diffusion of User Tracking Data in the Online Advertising ... · Diffusion of User Tracking Data in...
Diffusion of User Tracking Data in the Online Advertising Ecosystem
Muhammad Ahmad Bashir and Christo Wilson
Ads Under Real Time Bidding (RTB)
!3
Ad SpaceContent
User
1Supply Side Platforms
(SSPs)Ad Exchanges
2 3
Ads Under Real Time Bidding (RTB)
!3
Ad SpaceContent
User
1Supply Side Platforms
(SSPs)Ad Exchanges
2 3
4
Demand Side Platforms (DSPs)
Ads Under Real Time Bidding (RTB)
!3
Ad SpaceContent
User
1Supply Side Platforms
(SSPs)Ad Exchanges
2 3
4
Demand Side Platforms (DSPs)
Advertisers
5
Ads Under Real Time Bidding (RTB)
!3
Ad SpaceContent
User
1Supply Side Platforms
(SSPs)Ad Exchanges
2 3
4
Demand Side Platforms (DSPs)
6
Advertisers
5
Ads Under Real Time Bidding (RTB)
!3
Ad SpaceContent
User
1Supply Side Platforms
(SSPs)Ad Exchanges
2 3
4
Demand Side Platforms (DSPs)
6
Advertisers
5
Although RightMedia does not win the auction, it still sees the impression
Ads Under Real Time Bidding (RTB)
!3
Ad SpaceContent
User
1Supply Side Platforms
(SSPs)Ad Exchanges
2 3
4
Demand Side Platforms (DSPs)
6
Advertisers
5
Although RightMedia does not win the auction, it still sees the impression
Ads Under Real Time Bidding (RTB)
!3
Ad SpaceContent
User
1Supply Side Platforms
(SSPs)Ad Exchanges
2 3
4
Demand Side Platforms (DSPs)
6
Advertisers
5
Although RightMedia does not win the auction, it still sees the impression
Goal of the Study
• We want to take Real Time Bidding (RTB) into account• This goes beyond the current techniques being used
!5
Model the Diffusion of Impressions in the Advertising Ecosystem
Goal of the Study
• We want to take Real Time Bidding (RTB) into account• This goes beyond the current techniques being used
• We want to answer the following:1. What fraction of user impressions are viewed by ad companies?2. How much ad and tracker blocking extensions help?
!5
Model the Diffusion of Impressions in the Advertising Ecosystem
Goal of the Study
• We want to take Real Time Bidding (RTB) into account• This goes beyond the current techniques being used
• We want to answer the following:1. What fraction of user impressions are viewed by ad companies?2. How much ad and tracker blocking extensions help?
!5
Model the Diffusion of Impressions in the Advertising Ecosystem
Key Terms:
1. Impressions: Page Visits2. Publishers: First party websites visited by users (e.g. cnn, bbc, espn) 3. A&A: Advertising and Analytics related companies / domains
Dataset Used
!71. Bashir et al. Tracing Information Flows Between Ad Exchanges Using Retargeted Ads. Usenix Security 2016
Dataset Used• We use data from our prior work.1
• We trained personas using 738 popular e-commerce websites and solicit retargeted ads through 150 top publishers.
!71. Bashir et al. Tracing Information Flows Between Ad Exchanges Using Retargeted Ads. Usenix Security 2016
Dataset Used• We use data from our prior work.1
• We trained personas using 738 popular e-commerce websites and solicit retargeted ads through 150 top publishers.
• Using this data, we have inclusion chains of all resources
!71. Bashir et al. Tracing Information Flows Between Ad Exchanges Using Retargeted Ads. Usenix Security 2016
Dataset Used• We use data from our prior work.1
• We trained personas using 738 popular e-commerce websites and solicit retargeted ads through 150 top publishers.
• Using this data, we have inclusion chains of all resources
!71. Bashir et al. Tracing Information Flows Between Ad Exchanges Using Retargeted Ads. Usenix Security 2016
Example Inclusion Chain
From Chains to Graph
!8
Publishers A&A (Advertising and Analytics)
Graph RepresentationInclusion Chains
From Chains to Graph
!8
Publishers A&A (Advertising and Analytics)
Graph RepresentationInclusion Chains
From Chains to Graph
!8
1
1Publishers A&A (Advertising and Analytics)
Graph Representation
1
Inclusion Chains
From Chains to Graph
!8
1
1Publishers A&A (Advertising and Analytics)
Graph Representation
1
Inclusion Chains
From Chains to Graph
!8
1 1
1 1Publishers A&A (Advertising and Analytics)
Graph Representation
1
Inclusion Chains
From Chains to Graph
!8
1 1
1 1
2
Publishers A&A (Advertising and Analytics)
Graph RepresentationInclusion Chains
From Chains to Graph
!8
1 1
1 1
2
Publishers A&A (Advertising and Analytics)
Graph Representation
Nodes: Publishers or A&A domains Edges: Publisher —> A&A A&A —> A&A
Inclusion Chains
From Chains to Graph
!8
1 1
1 1
2
Publishers A&A (Advertising and Analytics)
Graph Representation
Nodes: Publishers or A&A domains Edges: Publisher —> A&A A&A —> A&A
Inclusion Chains
Nodes:
• Total: ~1.9K
• A&A: ~1K Edges:
• Total: ~26K
• Pub —> A&A: ~10.5K
• A&A —> A&A: ~15.5K
Some Properties of the Graph• The graph is very dense.
• No distinct communities• Web is not necessary balkanized into distinct groups
!9
Some Properties of the Graph• The graph is very dense.
• No distinct communities• Web is not necessary balkanized into distinct groups
• Expected top nodes with PageRank and Betweenness Centrality
!9
Goal of the Study
1. What fraction of user impressions are viewed by ad companies?
2. How much ad and tracker blocking extensions help?
!10
Model the Diffusion of Impressions in the Advertising Ecosystem
Our Simulation Setup
!11[1]. Burken et al. User centric walk: An integrated approach for modeling the browsing behavior of users on the web. ASS 2005
Our Simulation Setup We simulate browsing traces for 200 users using method from [1].
!11[1]. Burken et al. User centric walk: An integrated approach for modeling the browsing behavior of users on the web. ASS 2005
Our Simulation Setup We simulate browsing traces for 200 users using method from [1].
1. User generates an impression on N selected publishers.
!11[1]. Burken et al. User centric walk: An integrated approach for modeling the browsing behavior of users on the web. ASS 2005
Our Simulation Setup We simulate browsing traces for 200 users using method from [1].
1. User generates an impression on N selected publishers.2. Impressions are forwarded to A&A domains via:
A. Direct Propagation:
• Present on publisher or won RTB auction. Observable (goes through the browser) B. Indirect Propagation:
• A&A domains learn impressions through RTB participation. Non-observable
!11[1]. Burken et al. User centric walk: An integrated approach for modeling the browsing behavior of users on the web. ASS 2005
Our Simulation Setup We simulate browsing traces for 200 users using method from [1].
1. User generates an impression on N selected publishers.2. Impressions are forwarded to A&A domains via:
A. Direct Propagation:
• Present on publisher or won RTB auction. Observable (goes through the browser) B. Indirect Propagation:
• A&A domains learn impressions through RTB participation. Non-observable
!11[1]. Burken et al. User centric walk: An integrated approach for modeling the browsing behavior of users on the web. ASS 2005
Our Simulation Setup We simulate browsing traces for 200 users using method from [1].
1. User generates an impression on N selected publishers.2. Impressions are forwarded to A&A domains via:
A. Direct Propagation:
• Present on publisher or won RTB auction. Observable (goes through the browser) B. Indirect Propagation:
• A&A domains learn impressions through RTB participation. Non-observable
!11[1]. Burken et al. User centric walk: An integrated approach for modeling the browsing behavior of users on the web. ASS 2005
Our Simulation Setup We simulate browsing traces for 200 users using method from [1].
1. User generates an impression on N selected publishers.2. Impressions are forwarded to A&A domains via:
A. Direct Propagation:
• Present on publisher or won RTB auction. Observable (goes through the browser) B. Indirect Propagation:
• A&A domains learn impressions through RTB participation. Non-observable
!11[1]. Burken et al. User centric walk: An integrated approach for modeling the browsing behavior of users on the web. ASS 2005
Our Simulation Setup We simulate browsing traces for 200 users using method from [1].
1. User generates an impression on N selected publishers.2. Impressions are forwarded to A&A domains via:
A. Direct Propagation:
• Present on publisher or won RTB auction. Observable (goes through the browser) B. Indirect Propagation:
• A&A domains learn impressions through RTB participation. Non-observable3. RTB winner is decided based on probability (function of edge weights).
!11[1]. Burken et al. User centric walk: An integrated approach for modeling the browsing behavior of users on the web. ASS 2005
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
Simulation Example (RTB Constrained)100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1 e1
1
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1 e1
a2
a111
1
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1 e1
a2
a1
a5
a4
11
0
1
1
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1
p2
e1
a2
a1
a5
a4
11
0
1
1
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1
p2 e2
e1
a2
a1
a5
a4
1
1
1
0
1
1
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1
p2 e2
e1
a3
a2
a1
a5
a4
1
1
1
0
1
1
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1
p2 e2
e1
a3
a2
a1
a5
a4
1
1
1
2
1
0
1
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!12
PublisherExchange
DSP/Advertiser
Node TypeDirect
Indirect
Activationp1
p2 e2
e1
a3
a2
a1
a5
a4
p1
p2 e2
e1
a3
a2
a1
a5
a4
1
1
1
2
1
2
0
Simulation Example (RTB Constrained)
We manually select 37 exchanges which are
allowed to forward indirect impressions to solicit bids during RTB
100
100
10
90
80
20
5
95
!13
Impressions Observed
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Fra
cti
on
of
A&
A D
om
ain
s (
CD
F)
Fraction of Impressions Observed
Cookie Matching-OnlyRTB Constrained
RTB Relaxed
We have 3 simulation models:1. RTB-Relaxed: Upper-bound2. Cookie-Matching: Lower-bound3. RTB-Constrained: Realistic Scenario
!13
Impressions Observed
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Fra
cti
on
of
A&
A D
om
ain
s (
CD
F)
Fraction of Impressions Observed
Cookie Matching-OnlyRTB Constrained
RTB Relaxed
We have 3 simulation models:1. RTB-Relaxed: Upper-bound2. Cookie-Matching: Lower-bound3. RTB-Constrained: Realistic Scenario
!13
Impressions Observed
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Fra
cti
on
of
A&
A D
om
ain
s (
CD
F)
Fraction of Impressions Observed
Cookie Matching-OnlyRTB Constrained
RTB Relaxed
We have 3 simulation models:1. RTB-Relaxed: Upper-bound2. Cookie-Matching: Lower-bound3. RTB-Constrained: Realistic Scenario
!13
Take Away
1. RTB-Constrained is very close to RTB-Relaxed
2. 10% A&A see more than 90% of impressions in RTB-Constrained
Impressions Observed
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Fra
cti
on
of
A&
A D
om
ain
s (
CD
F)
Fraction of Impressions Observed
Cookie Matching-OnlyRTB Constrained
RTB Relaxed
We have 3 simulation models:1. RTB-Relaxed: Upper-bound2. Cookie-Matching: Lower-bound3. RTB-Constrained: Realistic Scenario
Impressions With Blocking
!15
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Fra
cti
on
of
A&
A D
om
ain
s (
CD
F)
Fraction of Impressions Observed
DisconnectGhostery
AdBlock PlusNo Removal
Impressions With Blocking
!15
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Fra
cti
on
of
A&
A D
om
ain
s (
CD
F)
Fraction of Impressions Observed
DisconnectGhostery
AdBlock PlusNo Removal
Impressions With Blocking
!15
Take Away
• Disconnect list is most effective.
• ABP is not effective at all due to Acceptable Ads program.
• Due to RTB, impressions are leaked to A&A domains even with blocking extensions.
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Fra
cti
on
of
A&
A D
om
ain
s (
CD
F)
Fraction of Impressions Observed
DisconnectGhostery
AdBlock PlusNo Removal
Top 10 Domains
!16
EasyList
Disconnect
Ghostery
AdBlock Plus
No Blocking
Impressions Observed (%)0 10 20 30 40 50 60 70 80 90 100
Top 10 Domains
!16
EasyList
Disconnect
Ghostery
AdBlock Plus
No Blocking
Impressions Observed (%)0 10 20 30 40 50 60 70 80 90 100
Top 10 companies can view majority of user impressions even with (most) blocking extensions installed
Top 10 Domains
!16
EasyList
Disconnect
Ghostery
AdBlock Plus
No Blocking
Impressions Observed (%)0 10 20 30 40 50 60 70 80 90 100
Top 10 companies can view majority of user impressions even with (most) blocking extensions installed
Domain Impression %
google-analytics 97.0youtube 91.7
quantserve 91.6scorecardresearch 91.6
skimresources 91.3twitter 91.1
pinterest 91.0addthis 90.0criteo 90.0
bluekai 90.8
Top 10 domains with most observed impressions under AdBlock Plus
Simulation Limitations • Our simulation models provide approximations
• Different users might have different browsing behaviors• We only simulate with respect to popular publishers
• The ecosystem could have changed from when the dataset was collected (December 2015)
• Not representative of mobile advertising ecosystem
!17
Summary• We are the first to provide a model to study the impact of Real Time Bidding (RTB)
on user privacy.
• Ad Exchanges share user impressions to facilitate RTB• More than 10% A&A domains view up to 90% of user impressions under
realistic conditions.
• Due to RTB, impressions can leak to A&A domains even with blocking extensions• AdBlock Plus is not effective at all due to Acceptable Ads program• Disconnect performed the best in terms of protecting privacy
Summary• We are the first to provide a model to study the impact of Real Time Bidding (RTB)
on user privacy.
• Ad Exchanges share user impressions to facilitate RTB• More than 10% A&A domains view up to 90% of user impressions under
realistic conditions.
• Due to RTB, impressions can leak to A&A domains even with blocking extensions• AdBlock Plus is not effective at all due to Acceptable Ads program• Disconnect performed the best in terms of protecting privacy
http://personalization.ccs.neu.edu/Projects/AdGraphs/
Number of Exchanges for RTB-C
!20
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
CD
F
Fraction of Impressions
5
10
20
30
50
100
Model Validation — Per Publisher
!21
0 50
100 150 200 250 300
Original
RTB-R
RTB-C
CM
# N
od
es A
cti
vate
d
0 1 2 3 4 5 6
Original
RTB-R
RTB-C
CM
Tre
e D
ep
th
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1
Inclusion of resources
a2
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1
a2
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1
a2
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1 a3
a2
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1 a3
a2
Inclusion of resources
Inclusion Chain from DOM
!22
<html><body>
<script src=“a1.com/cookie-match.js” </script><!-- Tracking pixel inserted dynamically by cookie-match.js --><img src=“a2.com/pixel.jpg” />
<iframe src=“a3.com/banner.html”><script src=“a4.com/ad.js”> </script>
</iframe></body>
</html>
DOM Tree for http://p.com/index.html
p
a1 a3
a2 a4
Inclusion of resources
DOM -> Inclusion & Referer Graph
!23
(a) DOM Tree for http://p.com/index.html
<html> <body> <script src=”a1.com/cookie-match.js”></script> <!-- Tracking pixel inserted dynamically by cookie-match.js --> <img src=”a2.com/pixel.jpg”/>
<iframe src=”a3.com/banner.html”> <script src=”a4.com/ads.js”></script> </iframe> </body></html>
(d) Referer Graph(c) Inclusion Graph
a1
a2
a4
a1 a2
a4a3
(b) Inclusion Tree
p.com/index.html
a1.com/cookie-match.js
a2.com/pixel.jpg
a3.com/banner.html
a4.com/ads.js
p
a3
pPublisher
A&A