

Proceedings on Privacy Enhancing Technologies ..; .. (..):1–16

Muhammad Haris Mughees, Zhiyun Qian, and Zubair Shafiq

Detecting Anti Ad-blockers in the Wild

Abstract: The rise of ad-blockers is viewed as an economic threat by online publishers who primarily rely on online advertising to monetize their services. To address this threat, publishers have started to retaliate by employing anti ad-blockers, which scout for ad-block users and react to them by pushing users to whitelist the website or disable ad-blockers altogether. The clash between ad-blockers and anti ad-blockers has resulted in a new arms race on the Web. In this paper, we present an automated machine learning based approach to identify anti ad-blockers that detect and react to ad-block users. The approach is promising with precision of 94.8% and recall of 93.1%. Our automated approach allows us to conduct a large-scale measurement study of anti ad-blockers on Alexa top-100K websites. We identify 686 websites that make visible changes to their page content in response to ad-block detection. We characterize the spectrum of different strategies used by anti ad-blockers. We find that a majority of publishers use fairly simple first-party anti ad-block scripts. However, we also note the use of third-party anti ad-block services that use more sophisticated tactics to detect and respond to ad-blockers.

Keywords: ad-blockers, anti ad-blockers

DOI Editor to enter DOI
Received ..; revised ..; accepted ...

Muhammad Haris Mughees: University of Illinois Urbana-Champaign, E-mail: [email protected]
Zhiyun Qian: University of California-Riverside, E-mail: [email protected]
Zubair Shafiq: The University of Iowa, E-mail: [email protected]

1 Introduction

The online advertising industry has largely fueled the Web for many years. According to the Interactive Advertising Bureau (IAB), annual online advertising revenues in the United States totaled $59.6 billion for 2015 [15]. Online advertising plays a critical role in allowing web content to be offered free of charge to end-users, with the implicit assumption that end-users agree to watch ads to support these “free” services. Unfortunately, the economic magnetism of online advertising has made it an attractive target for various types of abuses. First, the online advertising ecosystem incentivizes the widespread tracking of users across websites, raising privacy [25] and surveillance [34] concerns. To show targeted ads to users, advertisers track users across the web using cookies, beacons, and fingerprinting [22]. Second, the online advertising ecosystem not only lacks transparency but also does not provide any meaningful control for users to limit tracking and the use of personal information. Third, hackers are increasingly launching malvertising campaigns, where they use online advertising to target malware at a large number of users [41]. Finally, many publishers choose to place ads that interfere (e.g. autoplay, pop-ups, and animation) with the organic content and annoy users [23].

Ad-blockers have become popular in recent years and they can block ads and/or trackers seamlessly without requiring any user input. A wide range of ad-block extensions are available for popular web browsers. Adblock Plus is the most popular ad-block extension [2, 31]. 22% of the most active residential broadband users of a major European ISP use Adblock Plus [37]. Another recent study showed that 18% of users in the U.S. and 32% of users in Germany have installed ad-blockers [31]. According to PageFair, more than 600 million people around the world use ad-blockers on desktop and mobile devices [1, 16]. To the online advertising industry and content publishers, ad-blockers are becoming a growing threat to their business model.

To combat ad-blockers, two strategies have emerged: (1) publishers such as Google and Microsoft have enrolled in the acceptable ads program [14] to have their ads whitelisted; and (2) publishers have begun to detect the presence of ad-blockers and may refuse to serve any user with an ad-blocker turned on. The latter strategy has emerged as an increasingly popular solution to counter ad-blockers. For example, Yahoo! Mail [35], WIRED [19], and Forbes [18] reportedly did so recently. The anti ad-block phenomenon has been manually studied by researchers in [36, 38] on a relatively small scale due to the lack of automated anti ad-block detection methods. To fill this gap, in this work we perform automated detection and measurement of the anti ad-block phenomenon in the wild. Specifically, we are interested in understanding: (1) how many websites are reacting to ad-blockers; and (2) what types of technical approaches are used.

Key Contributions. The key contributions and findings of the paper are the following:

– We propose a machine learning based approach to automatically identify websites that use anti ad-blockers. Our key idea is that websites that employ anti ad-blockers will make distinct changes to their web page content for ad-block users (e.g. displaying a pop-up message) as compared to users without ad-blockers. We extract these distinct features and feed them to train machine learning models. The approach is promising with precision of 94.8% and recall of 93.1%.

– Using our proposed approach, we conduct a measurement study of Alexa top-100K websites. The results show that 686 websites visibly react to ad-blockers. Out of these websites, 165 websites show intrusive notifications that cannot be dismissed by users without disabling their ad-blocker or whitelisting the website. We also find that some websites have started to ask ad-block users to pay a subscription fee (42 websites) or make a donation (30 websites).

– We cluster different anti ad-block approaches by analyzing their JavaScript snippets. Our analysis shows that anti ad-block approaches range from fairly simple to much more sophisticated. We find that many websites use simple anti ad-block scripts that are served from first-party domains. Others use third-party anti ad-block services that are much more sophisticated.

2 Background & Related Work

In this section, we first provide a brief background of ad-blockers and anti ad-blockers and then discuss relevant prior work.

2.1 Background

The popularity of ad-blockers. The issues with online ads have resulted in the proliferation of ad-blocking software. Ad-blocking software (or ad-blocker) is an effective tool that blocks ads seamlessly, generally published as extensions in web browsers such as Chrome and Firefox. More recently, Apple has also allowed content blocking plugins on iOS devices [28]. Other popular related tools such as Ghostery [9] are primarily focused on protecting user privacy by blocking trackers. Most ad-blockers include filter lists to remove ads (e.g. EasyList [6]) and/or trackers (e.g. EasyPrivacy [7]). Recent reports have shown that the number of users using ad-blocking software has rapidly increased worldwide. According to PageFair, more than 600 million people around the world use ad-blockers on desktop and mobile devices [1, 16].

Fig. 1. Examples of ad-block detection responses: (a) forbes.com; (b) independent.co.uk.

How do ad-blockers work? Ad-blockers eliminate ads by either page element removal or web request blocking. For page element removal, ad-blockers use various CSS selectors to access the elements and remove them. For web request blocking, ad-blockers look for particular URLs and remove the ones that belong to advertisers. For both of these actions, ad-blockers are dependent on filter lists that contain the set of rules (as regular expressions) specifying the element selectors and domains to remove. There are various kinds of filter lists available which can be included in ad-blockers. Each of these lists serves a different purpose. For example, Adblock Plus by default includes EasyList [6], which provides rules for removing ads from English websites. EasyPrivacy [7] helps ad-blockers to protect user privacy by removing trackers.
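The two mechanisms described above can be illustrated with a minimal sketch. It assumes a drastically simplified filter syntax loosely modeled on Adblock Plus-style lists: a line starting with `##` carries a CSS selector for element hiding, and any other non-comment line is a request-blocking pattern in which `||` anchors a domain, `^` marks a separator, and `*` is a wildcard. The function names and the example rules are hypothetical, and real filter lists support many more options than are handled here.

```javascript
// Simplified sketch of filter-list handling; this is not Adblock Plus's actual implementation.

// Escape regex metacharacters in a literal piece of a pattern.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Translate one request-blocking pattern into a regular expression.
// Assumption: "^" is approximated by a small set of URL separators or end of string.
function patternToRegex(pattern) {
  let anchored = false;
  if (pattern.startsWith("||")) {
    anchored = true;
    pattern = pattern.slice(2);
  }
  const body = pattern
    .split("*")
    .map((part) => part.split("^").map(escapeRegex).join("(?:[/?:&=]|$)"))
    .join(".*");
  const prefix = anchored ? "^https?://(?:[^/]*\\.)?" : "";
  return new RegExp(prefix + body);
}

// Split a filter list into element-hiding selectors and request-blocking regexes.
function parseFilterList(text) {
  const selectors = [];
  const blockers = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("!")) continue; // blank line or comment
    if (line.startsWith("##")) {
      selectors.push(line.slice(2)); // element-hiding rule (CSS selector)
    } else {
      blockers.push(patternToRegex(line)); // request-blocking rule
    }
  }
  return { selectors, blockers };
}

// Web request blocking: does any blocking rule match this URL?
function shouldBlockRequest(url, filters) {
  return filters.blockers.some((regex) => regex.test(url));
}

// Hypothetical example rules, for illustration only.
const filters = parseFilterList(`
! hypothetical example rules
||ads.example.com^
/banner/*/ad_
##.ad-container
`);

console.log(shouldBlockRequest("https://ads.example.com/banner.js", filters));
console.log(shouldBlockRequest("https://example.com/article.html", filters));
console.log(filters.selectors);
```

In a browser, the collected selectors would be applied to the page (for example with `document.querySelectorAll`) to hide or remove matching elements, while the request check would run before each network request is issued.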

The rise of anti ad-blockers. The online advertising industry sees ad-blocking tools as a growing threat to the ad-powered “free” web business model. The widespread use of ad-blockers has prompted a cat-and-mouse game between publishers and ad-blocking software. IAB recently released a script [26] to DEAL (Detect, Explain, Ask, Limit) with ad-blockers. Using such scripts, publishers have started to detect whether users are visiting their websites while using ad-blocking software (detection step). Once detected, publishers notify users to turn off their ad-blocking software (reaction step). These notifications can range from a mild non-intrusive message that is integrated inside website content to more aggressive blocking of website content and/or functionality. Figure 1 shows a couple of examples of ad-block detection responses. To detect the use of ad-blocking software, publishers employ anti ad-block scripts in their pages. When a user with ad-blocking software opens such a website, these scripts typically monitor the visibility of ads on the page to identify the use of ad-blockers. If ads are found hidden or removed by an ad-blocker, publishers take countermeasures according to their policies.

Fig. 2. Web page load evolution for www.vipleague.tv. (a) The original website content is loaded. (b) Ad-blocker removes ads from the page. (c) Anti ad-blocker blocks the content and shows a pop-up notification asking the user to disable ad-blocking software.

Illustration of anti ad-blockers. To understand how anti ad-blockers operate, let’s walk through the complete cycle of a typical anti ad-blocker. Figure 2 shows the web page loading process of http://www.vipleague.tv, which detects the presence of ad-blockers and subsequently reacts to them, on a web browser with Adblock Plus. The publisher employs a JavaScript snippet that attempts to detect the presence of an ad-blocker. Figure 3 shows the JavaScript employed by http://www.vipleague.tv. The functionality of the JavaScript can be divided into three parts: timeout, condition check, and response. In Figure 2, we note that the web browser starts loading the HTML and other resources included in the HTML code. While the content is loading, the ad-block extension kicks in and starts evaluating the HTML code and page content to remove potential ads. Since the ad-blocker starts working after a small delay, the anti ad-block script has to wait some time before monitoring the ads. In Figure 3, the timeout is set at 2000 milliseconds (the script uses setInterval, so the check repeats every 2000 milliseconds). Once the timer expires (typically a few seconds), the condition check is executed to verify the presence/absence of ads, e.g. by checking the height, width, or visibility of ad frames. If the script detects that ads were removed or hidden, then the response step is executed. As discussed earlier, the implementation details of this step vary across publishers. A few publishers gently request users to remove/disable their ad-blockers, while others aggressively show a page-wide notification and/or block content. For example, in Figure 3, the publisher responds by changing CSS properties of a div to show a pop-up message that asks the user to disable the ad-blocker or whitelist the site.

    // step 1: set timeout
    var myVar = setInterval(function () {
      myFunc()
    }, 2000);

    function myFunc() {

      // step 2: condition check
      if (window.iExist === undefined || (!$("#XUinXYCfBvqpyDHOrOAVClxoWJemrlPpfCdWfiyAzNY")
          .is(":visible") && (($(".vip_052x003").height() < 100 && !$("#vipchat").length) &&
          $(".vip_09x827").height() < 25))) {

        // step 3: response
        $("#XUinXYCfBvqpyDHOrOAVClxoWJemrlPpfCdWfiyAzNY")
          .css("width:100%;height:100%;position:fixed;z-index:999999;top:0");
        $("#XUinXYCfBvqpyDHOrOAVClxoWJemrlPpfCdWfiyAzNY").show()
      }
      else if ($("#XUinXYCfBvqpyDHOrOAVClxoWJemrlPpfCdWfiyAzNY").is(":visible") &&
          $(".vip_052x003").height() > 249) {

        $("#XUinXYCfBvqpyDHOrOAVClxoWJemrlPpfCdWfiyAzNY").hide()
      }
    }

Fig. 3. Ad-block detection JavaScript extracted from http://www.vipleague.tv


2.2 Related Work

Below, we discuss prior literature on online advertising, ad-blockers, and anti ad-blockers.

Online Advertising. Online advertising relies on sophisticated tracking of users across the web to target personalized ads. Roesner et al. [39] conducted an active measurement study of third-party tracking on the Web. They found a total of 7264 (524 unique) trackers on the Alexa top 500 websites. They found that a few trackers cover a large fraction of popular websites. More specifically, they reported that Google Analytics and DoubleClick (both owned by Google) are used on 40%–60% of the top 500 websites. Using real users’ browsing history, they reported that a few trackers cover as much as 66% of the pages visited by a user. Metwalley et al. [32] conducted a passive measurement study to determine the extent of tracking on the Web. They found more than 400 tracking services, of which the top 100 regularly track more than 50% of users. They found that 80% of the users are tracked by at least one tracking service within a second after starting web browsing. Lerner et al. [29] conducted a longitudinal active measurement study using the Wayback Machine. The authors found that the scope and complexity of online tracking have dramatically increased over the last 20 years. They showed that popular websites use more third-party trackers now than ever before. The authors reported that the coverage of top trackers on the Web is increasing rapidly, with the top 5 trackers now covering more than 60% of the top 500 websites as compared to less than 30% ten years ago. Englehardt and Narayanan [22] showed that a handful of third parties including Google, Facebook, Twitter, and AdNexus track users across a majority of the top 1 million websites. The authors showed that Google alone tracks users across more than 80% of the top 1 million websites. The authors also showed that stateless tracking based on Canvas, WebRTC, and AudioContext fingerprinting is employed across thousands of websites.

Researchers have also studied security aspects of online advertising. Li et al. conducted the first large-scale study of malicious advertising (called malvertising) on the Web [30]. Their analysis of 90,000 websites showed that malicious ads not only affect top websites but also evade detection using various cloaking techniques. Zarras et al. also conducted a large-scale study to determine the extent to which users are exposed to malicious advertisements [41]. Their measurement study of more than 60,000 ads showed that around 1% of ads exhibit malicious behavior. They also showed that a few ad networks are more prone to malicious advertisements than others.

Ad-blockers. Due to the popularity of ad-blockers, researchers have tried to study the prevalence of ad-blockers. Pujol et al. conducted a measurement study using passive network traces of thousands of users from a European ISP to quantify ad-block usage [37]. Their results show that 22% of users use Adblock Plus. They also found that ad-block users still generate significant ad traffic due to their enrollment in the acceptable ads program. Walls et al. [40] conducted a study of the whitelists used by ad-blockers for allowing the acceptable ads. They analyzed the evolution of ad-block whitelists and performed active measurements on popular websites. Their analysis showed that the whitelist contains around 5,936 filters and 3,545 unique publisher domains. They reported that whitelists are inclined towards top ranked Alexa websites (59% of filters are for the top 5000 websites). Gugelmann et al. [24] proposed a methodology to complement manual filter lists of ad-blockers by automatically blacklisting intrusive ads. They trained a classifier on HTTP traffic statistics and identified around 200 new advertising and tracking services.

Anti Ad-blockers. Since the arms race between ad-blockers and anti ad-blockers is a relatively recent phenomenon, prior research is limited to small scale and manual analysis of anti ad-blockers. Researchers [36, 38] have recently reported anecdotal evidence of ad-block detection and retaliation by publishers. Rafique et al. [38] conducted manual analysis to report that 163 out of the top 1000 free live video streaming aggregators employ anti ad-blockers. Nithyanand et al. [36] clustered JavaScript snippets and manually analyzed the clusters to identify third-party anti ad-blockers. They found that 6.7% of top 5000 Alexa websites employ anti ad-blockers. Unfortunately, such manual analysis is hard to scale up. In contrast, we use machine learning models to conduct large scale analysis of anti ad-blockers. Furthermore, while Nithyanand et al. [36] consider all websites that include anti ad-block scripts, we identify those anti ad-blockers that alter content to restrict access in response to ad-block detection, because users are only affected by anti ad-blockers that retaliate by restricting user access. To the best of our knowledge, we present the first automated large-scale measurement study of anti ad-blockers in the wild.


3 Detecting Anti Ad-blockers

In this section, we design and implement our approach for automatically identifying websites that employ anti ad-blockers. The main premise of our approach is that websites conducting ad-block detection make distinct changes to their web page content for ad-block users as compared to users without ad-blockers. Our goal is to identify, quantify, and extract such distinct features that can be leveraged for training machine learning models to automatically detect websites that employ anti ad-blockers.

3.1 Overview

We want to identify distinct features that capture the changes made by anti ad-blockers to the HTML structure of web pages. One plausible approach is to apply image analysis, which is commonly used to detect phishing websites [17, 21]. However, in pilot experiments, we found that the visible changes made by anti ad-blockers can be non-intrusive and hard to distinguish from other dynamically generated content. For instance, news websites may simply replace one news item with a different one that talks about ad-blocker (Figure 4). More details are given in §3.5. Therefore, we decided to analyze changes in the HTML content of web pages.

The changes made by anti ad-blockers to the HTML content can be categorized into: (1) addition of extra DOM nodes, (2) change in the style of existing DOM nodes, and (3) changes in the textual content. Below, we provide an overview of our proposed features and also discuss how they capture the changes made by anti ad-blockers in response to ad-block detection.

Node changes. In order to show notifications to users with ad-blockers, websites dynamically create and add new DOM nodes. Thus, node additions in the DOM can potentially indicate anti ad-blockers. We can log the total number of DOM elements inserted in a web page.

Style changes. A few websites include notifications that are in their page content but hidden. If these websites detect the use of ad-blockers, they change the visibility of their notification. To cover such cases, we can log attribute changes to DOM elements of a web page.

Text changes. Some websites change the textual content (i.e. text-related nodes) in response to ad-block detection. Therefore, we can log changes in the textual content of a web page and addition of text-related nodes in a web page.

Fig. 4. Non-intrusive changes made by anti ad-blockers on www.pri.org (the top right news item is replaced with a warning about ad-blocker).

Structure changes. In addition to the above-mentioned features, we also consider other features like innerHTML to detect whether the structure is modified and track changes in URL to detect redirection.

3.2 Methodology

Figure 5 provides an overview of our methodology to automatically detect anti ad-blockers. We conduct A/B testing to compare the contents of a web page with and without ad-blocking software. To automate this process, we use the Selenium Web Driver [11] to open separate instances of the Chrome web browser, with and without Adblock Plus (we also attempted different configurations for Adblock Plus). We implement a custom Chrome browser extension to record changes in the content of web pages during the page load process. Our extension records the structure of the DOM tree, all textual content, and HTML code of the web page. We implement a feature extraction script to process the collected data and generate a feature vector for each website. We feed the extracted features to a supervised classification algorithm for training and testing. We train the machine learning model using a labeled set of websites with and without anti ad-blockers. Below we describe these steps in detail.

Ad-blocker configurations. Since different ad-blocker configurations can affect the results (some may trigger more anti ad-blocking than others), we decide to configure Adblock Plus in three different ways (each configuration should catch more anti ad-blocking websites than the previous one):

• EasyList + Acceptable Ads. This represents the configuration for most users. By default, Adblock Plus includes EasyList to remove ads but allows some acceptable ads [14] to be displayed.

• EasyList. In this configuration, we use EasyList but disallow acceptable ads. Adblock Plus will block all ads (including the acceptable ads).

• EasyList - Anti Ad-block rules. In this configuration, we disallow acceptable ads as well as eliminate filter rules in EasyList that can circumvent anti ad-blockers (these rules suppress anti ad-blocking, which makes the websites look as if no anti ad-blocking were employed). EasyList has recently started to include filter rules (1461 at the time of writing) to circumvent anti ad-blockers. These rules are separated in !Anti-Adblock sections of EasyList.

Fig. 5. Overview of our methodology for detecting anti ad-blockers. (Figure labels: Step 1 to Step 4; Nodes Added, Attribute Changes, Text Changes, URL & HTML Changes, shown three times; Feature Set: website name, nodes added, div, h1, h2, h3, a, p, iframe, visibility change, height change, text change; Resampling; Training; Machine Learning Algorithm; Predictive Model.)

A/B testing. We implement a web automation tool using the Selenium Web Driver [11] to conduct measurements. For A/B testing, our tool first loads a website without Adblock Plus, and then opens it with Adblock Plus in a separate browser instance. However, we find that many websites host dynamic content that changes at very small timescales. For example, some websites include dynamic images (e.g. logos), which can introduce noise in our A/B testing. Similarly, many news websites update their content frequently, which can also add noise. Thus, we may incorrectly attribute these changes to the ad-blocker or the anti ad-blocker used by a publisher. To mitigate the impact of such noise, our tool opens two instances of each website without Adblock Plus in parallel (at the same time) and excludes content that changes across both instances.
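The noise-filtering idea can be sketched as a set difference. The sketch assumes each page load has already been reduced to a list of content fingerprints; the string form of the fingerprints and the function names are illustrative assumptions, not the tool's actual representation. Anything that differs between the two loads without Adblock Plus is treated as dynamic content and discarded from the ad-block comparison.

```javascript
// Items present in `left` but not in `right` (duplicates are not counted, for brevity).
function addedItems(left, right) {
  const seen = new Set(right);
  return left.filter((item) => !seen.has(item));
}

// control1, control2: two parallel loads of the same page without an ad-blocker.
// adblock: one load of the same page with an ad-blocker.
function changesAttributableToAdblock(adblock, control1, control2) {
  const adblockDiff = addedItems(adblock, control2);        // ad-block load vs. control
  const noise = new Set(addedItems(control1, control2));    // control vs. control
  return adblockDiff.filter((item) => !noise.has(item));
}

// Hypothetical fingerprints: the logo variant changes between loads (noise),
// while the notice appears only when the ad-blocker is active.
const control1 = ["div.header", "p.story", "img.logo-v1"];
const control2 = ["div.header", "p.story", "img.logo-v2"];
const adblock = ["div.header", "p.story", "img.logo-v1", "div.adblock-notice"];

console.log(changesAttributableToAdblock(adblock, control1, control2));
```

A limitation of this simple form is that dynamic content taking a third, unseen value in the ad-block load is not filtered out; comparing nodes on properties other than their identifiers, as described under feature extraction, reduces that effect.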

Data collection. To collect data while a web page is loading, we use DOM Mutation Observers [13] to track changes in a DOM (e.g. DOMNodeAdded, DOMAttrModified, etc.). The changes we track include addition of new DOM nodes or scripts, node attribute changes like class change or style change, removal of nodes, changes in text, etc. We implement the data collection module as a Chrome extension. The extension is preloaded in the browser instances that are launched by our web automation tool. As soon as a web page starts loading, the extension attaches an observer listener to it. Whenever an event occurs, the listener fires and we record the information. For example, we record the identifier, type, value, name, parent nodes, and attributes of the corresponding node. For each attribute change, in addition to the above-mentioned information, we record the name of the attribute that changes (e.g. style or class) and its old and new values. We also log page level data such as the complete DOM tree, innerText, and innerHTML.
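A minimal sketch of this kind of logging with the standard `MutationObserver` API is given below. The log entry format and the names `summarizeMutation` and `mutationLog` are assumptions made for illustration; the actual extension records more fields (e.g. parent nodes) than shown here. The observer is attached only when a DOM is available, so the summarizing function can also be exercised outside a browser.

```javascript
// Convert one MutationRecord-like object into plain log entries (hypothetical format).
function summarizeMutation(record) {
  if (record.type === "childList") {
    return Array.from(record.addedNodes).map((node) => ({
      kind: "nodeAdded",
      tag: (node.nodeName || "").toLowerCase(),
      id: node.id || null,
    }));
  }
  if (record.type === "attributes") {
    const target = record.target;
    return [{
      kind: "attributeChanged",
      tag: (target.nodeName || "").toLowerCase(),
      name: record.attributeName,
      oldValue: record.oldValue,
      newValue: target.getAttribute ? target.getAttribute(record.attributeName) : null,
    }];
  }
  if (record.type === "characterData") {
    return [{
      kind: "textChanged",
      oldValue: record.oldValue,
      newValue: record.target.data,
    }];
  }
  return [];
}

const mutationLog = [];

// Attach the observer only where the DOM API exists (e.g. in an extension content script).
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  const observer = new MutationObserver((records) => {
    for (const record of records) {
      mutationLog.push(...summarizeMutation(record));
    }
  });
  observer.observe(document.documentElement, {
    childList: true,             // node additions and removals
    subtree: true,               // watch the whole document tree
    attributes: true,            // e.g. style or class changes
    attributeOldValue: true,     // keep the previous attribute value
    characterData: true,         // text changes
    characterDataOldValue: true, // keep the previous text
  });
}
```

Requesting the old values in the observer options is what makes it possible to log both the old and the new value of a changed attribute, as the text describes.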

Feature extraction. We then process the output of the data collector to extract a set of informative features that can distinguish HTML content changes due to anti ad-blockers. Let A denote the data collected with an ad-blocker, and let B and B’ denote the data collected by loading a web page twice without an ad-blocker. We provide details of the feature extraction process below. Table 1 includes the list of all features used in our study.

Table 1. Features used to identify anti ad-blockers

| Feature Set | Description |
|---|---|
| Nodes | Change in number of div nodes |
| Nodes | Change in number of h1 nodes |
| Nodes | Change in number of h2 nodes |
| Nodes | Change in number of h3 nodes |
| Nodes | Change in number of img nodes |
| Nodes | Change in number of table nodes |
| Nodes | Change in number of p nodes |
| Nodes | Change in number of anchor nodes |
| Nodes | Change in number of iframe nodes |
| Nodes | Change in number of text nodes |
| Nodes | Total number of changes in nodes |
| Attributes | Change in number of display attributes |
| Attributes | Change in number of visibility attributes |
| Attributes | Change in number of height attributes |
| Attributes | Change in number of width attributes |
| Attributes | Change in number of opacity attributes |
| Attributes | Change in number of maxheight attributes |
| Attributes | Change in number of background size attributes |
| Attributes | Total number of changes in attributes |
| Textual | Change in number of lines |
| Textual | Change in number of words |
| Textual | Change in number of characters |
| Textual | Keywords (binary as presence/absence) |
| Structural | Cosine similarity of HTML |
| Structural | URL change (binary as yes/no) |

• Node features. For each instance, we extract DOM related nodes because our pilot experiments revealed that websites using anti ad-blockers add only DOM related nodes. More specifically, we extract the list of anchor, div, h1, h2, h3, img, table, p, iframe and text nodes for each instance. Once we have a list of DOM nodes for each instance, we compare A vs. B’ and B vs. B’ to obtain the list of differences between these nodes. We denote these lists as AB’ and BB’ lists. As explained earlier, to remove node differences due to dynamic content of websites, we cross-validate nodes in AB’ with BB’ using their properties. Our key idea is that if a publisher adds random nodes to a web page, they may have different identifiers but most of the other properties will be fairly similar. Thus, we remove the nodes from AB’ that also appear in BB’.

• Attribute features. For each instance, we extract changes in the style of DOM related nodes. More specifically, we focus on changes to the display-related node properties. For instance, we log whether the visibility property of a node changes from hidden to non-hidden. We also log changes to other display properties, e.g. the number of changes in height, width, and opacity of nodes. Similar to node features, we compare A, B, and B’ to eliminate attribute changes from AB’ that also appear in BB’.

• Textual features. We get the list of all text nodes in A, B, and B’. Using the lists, we extract changes in the number of lines, words, and characters. We again compare A, B, and B’ to eliminate changes in textual features from AB’ that also appear in BB’. We also use seven keywords (adblock, ad-block, ad block, whitelist, blockadblock, pagefair, fuckadblock) as binary (presence/absence) features in A.

• Structural features. We compare differences in the overall page HTML using the cosine similarity metric. If the cosine similarity between A and B/B’ is very low, it indicates significant content change. To check for potential URL redirections, we use change in URL as a binary (yes/no) feature.
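The textual and structural features lend themselves to a short sketch. The keyword list is taken from the text (with `blockadblock` read as a single word); everything else is an assumption made for illustration: the function names are hypothetical, and the HTML is vectorized as simple lowercase token counts because the text does not specify which representation is used for the cosine similarity.

```javascript
// Keyword list from the text; the remaining choices below are illustrative assumptions.
const KEYWORDS = ["adblock", "ad-block", "ad block", "whitelist",
                  "blockadblock", "pagefair", "fuckadblock"];

// Counts of lines, words, and characters in a page's text.
function textCounts(text) {
  const trimmed = text.trim();
  return {
    lines: trimmed === "" ? 0 : trimmed.split(/\r?\n/).length,
    words: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    characters: text.length,
  };
}

// One binary (presence/absence) flag per keyword.
function keywordFlags(text) {
  const lower = text.toLowerCase();
  return KEYWORDS.map((keyword) => (lower.includes(keyword) ? 1 : 0));
}

// Assumed vectorization: frequency of each lowercase alphanumeric token.
function termFrequencies(html) {
  const counts = new Map();
  for (const token of html.toLowerCase().match(/[a-z0-9]+/g) || []) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between the token-count vectors of two HTML strings.
function cosineSimilarity(htmlA, htmlB) {
  const a = termFrequencies(htmlA);
  const b = termFrequencies(htmlB);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [token, count] of a) {
    dot += count * (b.get(token) || 0);
    normA += count * count;
  }
  for (const count of b.values()) {
    normB += count * count;
  }
  if (normA === 0 || normB === 0) return 0; // treat an empty page as dissimilar
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical page snapshots with (A) and without (B) an ad-blocker.
const pageB = "<div><p>Top story of the day</p></div>";
const pageA = "<div><p>Please disable your AdBlock or whitelist this site</p></div>";

console.log(keywordFlags(pageA));
console.log(cosineSimilarity(pageA, pageB));
console.log(textCounts("Top story\nof the day"));
```

Changes in lines, words, and characters would then be obtained by subtracting the counts of B (or B’) from those of A, after the noise elimination described above.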

Model training and testing. We feed the extracted features to a machine learning classifier for automatically detecting websites that employ anti ad-blockers. However, in order to train the classification algorithm, we need a sufficient number of labeled examples of websites that detect ad-blockers (i.e. positive samples) and websites that do not detect ad-blockers (i.e. negative samples). To get positive samples, we open websites with Adblock Plus and manually analyze whether they detect and respond to ad-blockers. Specifically, we analyze Alexa top-1K websites and some websites listed in crowdsourced lists [3, 4]. Overall, we identify a total of 200 positive training samples. Since a majority of Alexa top-1K websites do not deploy anti ad-blockers, we use them as negative training samples.

3.3 Feature Analysis

We analyze the extracted features to quantitatively understand their usefulness in detecting anti ad-blockers. We first visualize the distributions of a few features. Figure 6 plots the cumulative distribution functions (CDF) of two features. We observe that websites which employ anti ad-blockers tend to change more lines and add more div nodes than other websites. These distributions confirm our intuition that anti ad-blockers make changes in the page content that are distinguishable.

Fig. 6. Distribution of features used to identify anti ad-blockers

To systematically study the usefulness of different features, we employ the concept of information gain [33], which uses entropy to quantify how our knowledge of a feature reduces the uncertainty in the class variable. The key benefit of information gain over other correlation-based analysis is that it can capture non-monotone dependencies. Let H(X) denote the entropy (i.e. uncertainty) of feature X. H is defined as:

$$H(X) = -\sum_{i} p_i \log p_i$$

Let H(Y) denote the entropy (i.e. uncertainty) of the binary class variable Y. Information gain is computed as:

$$IG(Y|X) = H(Y) - H(Y|X).$$

We can normalize information gain, also called relative information gain, as:

$$\frac{H(Y) - H(Y|X)}{H(Y)}.$$

Table 2. Features ranked based on information gain

| Name | Information Gain |
|---|---|
| Change in number of words | 35.44% |
| Change in number of text nodes | 27.89% |
| Change in number of lines | 18.13% |
| Total number of changes in nodes | 17.37% |
| Change in number of characters | 17.19% |
| Change in number of div nodes | 13.01% |
| Change in number of height attributes | 10.67% |
| Change in number of display attributes | 8.67% |
| Total number of changes in attributes | 7.20% |
| Change in number of img nodes | 5.82% |

Using this, we can quantify what an input feature informs us about the presence of anti ad-blockers. Table 2 ranks the top 10 features based on their information gain. We note that textual features tend to have high information gain. They are followed by node and style based features (e.g. change in number of div nodes, change in number of height attributes).
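As an illustration, relative information gain for a single feature can be computed as in the sketch below. It assumes that the feature has already been discretized (the binning used in the study is not described here), uses base-2 logarithms, and relies on hypothetical function names.

```javascript
// Shannon entropy (base 2) of a list of discrete values.
function entropy(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  let h = 0;
  for (const count of counts.values()) {
    const p = count / values.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Relative information gain: (H(Y) - H(Y|X)) / H(Y).
function relativeInformationGain(feature, labels) {
  const hY = entropy(labels);
  if (hY === 0) return 0; // the class variable carries no uncertainty

  // Group the class labels by feature value to obtain H(Y|X).
  const groups = new Map();
  feature.forEach((x, i) => {
    if (!groups.has(x)) groups.set(x, []);
    groups.get(x).push(labels[i]);
  });
  let hYgivenX = 0;
  for (const group of groups.values()) {
    hYgivenX += (group.length / labels.length) * entropy(group);
  }
  return (hY - hYgivenX) / hY;
}

// Toy data: 1 = website uses an anti ad-blocker, 0 = it does not.
const labels = [1, 1, 0, 0];
const informative = [1, 1, 0, 0];   // feature that mirrors the class
const uninformative = [1, 0, 1, 0]; // feature independent of the class
console.log(relativeInformationGain(informative, labels));
console.log(relativeInformationGain(uninformative, labels));
```

A feature that determines the class completely yields a relative information gain of 1, whereas a feature that is independent of the class yields 0.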

3.4 Classifier Evaluation

We use the standard k-fold cross validation methodology to validate the accuracy of the trained models. For this purpose we select k = 5 and divide the data into 5 folds, where in each round one fold is used for testing while the other four folds are used for training. To quantify the classification accuracy of the trained models, we use standard metrics: precision, recall, and area under the ROC curve (AUC). We try different machine learning classification methods and tune the parameters of each model to optimize its classification performance. Table 3 summarizes the classification accuracy of these classifiers. We note that the random forest classifier, which is a combination of tree classifiers, outperforms the C4.5 decision tree and the naive Bayes classifiers. The random forest classifier achieves 93.1% recall, 94.8% precision, and 96.0% AUC.

Table 3. Effectiveness of different classifiers

| Classifier | Recall | Precision | AUC |
|---|---|---|---|
| Random Forest | 93.1% | 94.8% | 96.0% |
| C4.5 Decision Tree | 87.0% | 89.0% | 91.3% |
| Naive Bayes | 82.0% | 82.4% | 89.0% |
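The evaluation procedure can be sketched as follows. The fold assignment (every k-th sample, without shuffling or stratification) and the function names are assumptions made for illustration; the classifier itself is omitted and only the data split and the precision and recall computations are shown.

```javascript
// Split sample indices 0..n-1 into k train/test partitions.
function kFoldSplits(n, k) {
  const splits = [];
  for (let fold = 0; fold < k; fold++) {
    const train = [];
    const test = [];
    for (let i = 0; i < n; i++) {
      (i % k === fold ? test : train).push(i);
    }
    splits.push({ train, test });
  }
  return splits;
}

// Precision and recall for binary labels (1 = anti ad-blocker present).
function precisionRecall(actual, predicted) {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] === 1 && actual[i] === 1) tp++;
    else if (predicted[i] === 1 && actual[i] === 0) fp++;
    else if (predicted[i] === 0 && actual[i] === 1) fn++;
  }
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

const splits = kFoldSplits(10, 5);
const scores = precisionRecall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]);
console.log(splits[0], scores);
```

With k = 5, each sample is used for testing exactly once and for training in the remaining four rounds.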


Fig. 7. Visualization of decision tree model for anti ad-blockers

To further evaluate the effectiveness of different feature sets in identifying anti ad-blockers, we conduct experiments using standalone feature sets and then evaluate all of their possible combinations. We divide the features into node features, attribute features, textual features, and structural features. Among standalone feature sets, as expected, textual features provide the best classification accuracy. We also observe that using combinations of feature sets does improve the classification accuracy. The best classification performance is achieved when all feature sets are used together.

To further gain some intuition from the trained machine learning models, we visualize a pruned version of the decision tree model trained on labeled data in Figure 7. As expected from the information gain analysis, we note that a text feature (change in number of words) is the root node of the decision tree. Moreover, changes in visibility and number of div nodes are indicative of anti ad-blockers. It is interesting to note that the top three features in the decision tree belong to different feature categories. This shows that different feature sets complement each other, rather than capturing similar information, which we also observed earlier when evaluating different combinations of features.

3.5 In the Wild Evaluation

Detecting anti ad-blockers in the wild. We now apply our trained random forest model on the homepages of Alexa top-100K websites to gain an overall picture of anti ad-blockers in the wild. The experiments were conducted in February 2017. We report the results for the three different Adblock Plus configurations as mentioned in Section 3.2.
1. For the first configuration of EasyList + AcceptableAds, the model predicted that 642 websites detect and respond to ad-blockers. We manually inspect all identified websites to understand the model's detection accuracy. Unfortunately, it is sometimes difficult for us to determine the ground truth when subtle and hard to identify changes (e.g. embedded text) are made, especially when the language is not English. Nevertheless, we conservatively estimate that 556 websites are true positives.
2. For the second configuration of EasyList, the model predicted that 651 websites detect and respond to ad-blockers. Upon manual inspection, 558 websites are true positives. In theory, more websites should react to this configuration as compared to the first configuration because websites are not allowed to display any ads (including the acceptable ads). We surmise that only a small portion of ad-block users change the default setting, which therefore does not cause a significant loss of revenue to publishers. If more users were to disallow the acceptable ads, we expect the results to change.
3. For the third configuration of EasyList - Anti Ad-block rules, the model predicted that 786 websites detect and respond to ad-blockers. Upon manual inspection, 686 websites are true positives. This represents an increase of 130 websites that employ anti ad-blocking compared to the first configuration. It is interesting to note that anti ad-block filter rules in EasyList are only moderately effective (130/686 = 19%) in evading anti ad-blockers.
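The counts above can be related to each other with a short calculation. The share of model predictions confirmed by manual inspection is not reported in the text; it is derived here from the reported counts, and the variable names are hypothetical.

```javascript
// Counts reported in the text: websites flagged by the model and
// websites confirmed as true positives by manual inspection.
const configurations = [
  { name: "EasyList + AcceptableAds", predicted: 642, confirmed: 556 },
  { name: "EasyList", predicted: 651, confirmed: 558 },
  { name: "EasyList - Anti Ad-block rules", predicted: 786, confirmed: 686 },
];

// Percentage rounded to one decimal place.
function percent(part, whole) {
  return Math.round((part / whole) * 1000) / 10;
}

// Share of flagged websites that were confirmed manually.
const confirmedShare = configurations.map((c) => ({
  name: c.name,
  share: percent(c.confirmed, c.predicted),
}));

// Websites that only react once the anti ad-block rules are removed,
// and their share of all confirmed websites in the third configuration.
const additional = configurations[2].confirmed - configurations[0].confirmed;
const evadedShare = percent(additional, configurations[2].confirmed);

console.log(confirmedShare, additional, evadedShare);
```

Under these counts, roughly 86% to 87% of the flagged websites are confirmed in each configuration, and the 130 additional websites correspond to the 19% figure quoted above.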

Due to the limitations of our methodology, as we discuss later, the results represent a lower bound of anti ad-block usage in the wild.

Characterizing ad-block detection responses. We manually analyze ad-block detection responses of the 686 identified websites to categorize how they respond to ad-blockers. We look at two different aspects: (1) how aggressive the response messages are; and (2) what they ask the users to do.

We categorize the aggressiveness of the ad-block detection responses into three types: (a) non-intrusive notification, typically well integrated with the webpage content (409 websites); (b) intrusive notification (e.g. flashy popup) that can be dismissed by the user (112 websites); and (c) intrusive notification that cannot be dismissed by the user without disabling the ad-blocker or whitelisting the website (165 websites). Our findings show that a majority of the websites are still conservative and do not want to lose users by showing intrusive notifications that can annoy users. However, a substantial number of these websites are taking a much more aggressive stance against ad-blockers, e.g. www.forbes.com, by not allowing users to view content if ad-block usage is detected.

Fig. 8. Characteristics of websites that employ anti ad-blockers. (a) Alexa rank distribution: CDF over publisher Alexa rank (0 to 100,000). (b) Categories: fraction of websites per category (Blogs, News, Entertainment, Games, Pornography, Marketing, Illegal Software, Internet Services, Business, Forum, Education, Tech. Forums, Hardware, Sports).

We categorize the content of response messages into three types: (a) users are asked to disable the ad-blocker or whitelist the website (644 websites); (b) users are asked to donate (30 websites); and (c) users are asked to pay a subscription fee (42 websites). Note that the responses of some websites fall into multiple categories. We find that a small but non-trivial fraction of websites are actually considering alternative monetization models such as donations and paid subscriptions.

Characterizing websites that employ anti ad-blockers. We characterize the websites that use anti ad-blockers in terms of their popularity (using Alexa ranks) and categorization (using McAfee’s URL categorization service). Figure 8(a) shows that websites employing anti ad-blockers are fairly uniformly distributed across the Alexa top 100K, without any obvious skews. A likely explanation is that websites that heavily rely on ads to monetize are spread across the top 100K websites. Figure 8(b) shows the top 15 categories of the websites that employ anti ad-blockers. The overall categorization trend is similar to what has been observed for top 5K websites in [36]. The top five categories are blogs, news, entertainment, games, and pornography. The remaining 25% of websites (not shown in the figure) are grouped in the "other" category.

Limitations. Our proposed methodology cannot identify all websites that employ anti ad-blocking. There are several reasons. First, we only visit the homepages of Alexa top-100K websites. Some websites may only employ anti ad-blocking on specific subpages. Second, some websites may detect ad-blockers but may not react to them. Since our methodology relies on detecting changes in the HTML content, we will not detect these websites. Third, the default EasyList is primarily constructed based on ads in English websites, and may not have the best coverage in blocking ads on other websites (although we do observe ads blocked on many non-English websites). Including additional filter lists (such as those that target other languages) may improve the detection rate of anti ad-blocking [10]. Finally, websites can employ anti ad-blockers non-deterministically, e.g. once every 10 site visits, or after a long delay. Our measurements will likely miss these websites as well. In summary, our results represent a lower bound on the number of websites that employ anti ad-blockers, and we plan to address the limitations as future work.

4 Analyzing Anti Ad-block Scripts

In this section, we characterize the functionality of anti ad-blockers. More specifically, we analyze anti ad-block JavaScript code snippets to study different ad-block detection strategies. For systematic analysis, we first automatically cluster anti ad-blockers based on their JavaScript code similarity and then manually analyze different anti ad-block clusters. Through this analysis, we are able to identify several third-party anti ad-block services that are used by multiple publishers.


4.1 Collecting Anti Ad-block JavaScript Snippets

As a first step, we collect the JavaScript snippets on all websites that employ anti ad-blockers. Analyzing the functionality of JavaScript code is non-trivial because the code can be obfuscated and packed inside functions such as eval. To overcome these issues, we leverage the fact that the packed code needs to unpack itself before execution. We attach a debugger between the Chrome V8 JavaScript engine [12] and the web pages. Specifically, we observe the script.parsed function, which is invoked when eval is called or new code is added with <iframe> or <script> tags. We implement the debugger as a Chrome extension and collect all JavaScript snippets parsed on a webpage and manually identify the snippet responsible for ad-block detection.
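A collection hook of this kind could look roughly like the sketch below. It is not the extension used in the study: it assumes that the script.parsed hook mentioned above corresponds to the Debugger.scriptParsed event of the Chrome DevTools Protocol, that the extension holds the debugger permission, and that the Chrome API object is passed in as a parameter (a hypothetical design choice that makes the function testable outside the browser).

```javascript
// Collect the source of every script parsed in a tab, including code
// produced by eval or injected through <script> tags.
// chromeApi is expected to be the extension's `chrome` object.
function collectParsedScripts(chromeApi, tabId, onScript) {
  const target = { tabId: tabId };

  chromeApi.debugger.onEvent.addListener(function (source, method, params) {
    // Ignore events from other tabs and unrelated protocol events.
    if (source.tabId !== tabId || method !== "Debugger.scriptParsed") return;

    // Fetch the (already unpacked) source text of the parsed script.
    chromeApi.debugger.sendCommand(
      target,
      "Debugger.getScriptSource",
      { scriptId: params.scriptId },
      function (result) {
        if (result) onScript({ url: params.url, source: result.scriptSource });
      }
    );
  });

  // Attach to the tab and enable the Debugger domain so that
  // scriptParsed events are delivered.
  chromeApi.debugger.attach(target, "1.3", function () {
    chromeApi.debugger.sendCommand(target, "Debugger.enable");
  });
}
```

Inside an extension, the function would be called as collectParsedScripts(chrome, tabId, callback); the collected snippets would then still need to be inspected manually to find the one responsible for ad-block detection.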

4.2 Clustering

We want to cluster anti ad-block JavaScript snippets into a few groups. To this end, we map JavaScript snippets using a tree representation and then group the ones with similar structure.

To identify similar anti ad-block JavaScript snippets, we first parse them to produce abstract syntax trees (ASTs). ASTs have been used in prior literature for JavaScript malware detection [20, 27]. ASTs allow us to retain the structural and logical properties of the code while ignoring fine-grained details, such as variable names, which are not useful for our analysis. Figures 9 and 10 show two anti ad-block JavaScript snippets and their corresponding AST visualizations. We use the Esprima JavaScript parser [8] to visualize the ASTs. Note that although these JavaScript snippets look fairly different due to different variable names and values, their ASTs have similar logical structure except for minor differences near the leaf nodes.

```javascript
$(document).ready(function () {
  setTimeout(function () {
    // Treat ads as blocked if either ad container is at most 16px tall
    // and localStorage.noad is unset.
    if (localStorage.noad === undefined &&
        (16 >= $("#gAds").height() ||
         16 >= $("#gAd2").height())) {
      $("#Blog1").remove();
      sweetAlert("Oops.. please don't block my ADs", "warning");
    }
  }, 3456);
});
```

Fig. 9. Example 1: AST of anti ad-block JavaScript snippet.

We transform ASTs of all anti ad-block JavaScript snippets to normalized node sequences by performing the pre-order traversal. These sequences are composed of different AST node types such as IfStatement, WhileStatement, and AssignmentExpression. Note that these sequences are of different lengths. We identify 88 distinct node types in the set of anti ad-block JavaScript snippets. To transform variable length normalized node sequences to a fixed number of dimensions, we convert each sequence into an 88-dimensional summary vector. Each anti ad-block JavaScript snippet is represented as an 88-dimensional point, where each dimension corresponds to a node type. The value of each dimension is the occurrence frequency of the node type in the sequence.
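The transformation from an AST to a summary vector can be sketched as follows. The sketch assumes an ESTree-style AST (the object format produced by parsers such as Esprima), interprets occurrence frequency as a raw count, and uses a small hypothetical vocabulary instead of the 88 node types found in the study.

```javascript
// Pre-order traversal: record a node's type before visiting its children.
function preorderNodeTypes(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => preorderNodeTypes(child, out));
  } else if (node && typeof node === "object") {
    if (typeof node.type === "string") out.push(node.type);
    for (const key of Object.keys(node)) {
      if (key !== "type") preorderNodeTypes(node[key], out);
    }
  }
  return out;
}

// Count how often each node type of the vocabulary occurs in a sequence.
function summaryVector(sequence, vocabulary) {
  const index = new Map(vocabulary.map((type, i) => [type, i]));
  const vector = new Array(vocabulary.length).fill(0);
  for (const type of sequence) {
    if (index.has(type)) vector[index.get(type)] += 1;
  }
  return vector;
}

// Hand-written AST for: if (a) { b = 1 } (block wrapper omitted for brevity).
const ast = {
  type: "Program",
  body: [{
    type: "IfStatement",
    test: { type: "Identifier", name: "a" },
    consequent: {
      type: "ExpressionStatement",
      expression: {
        type: "AssignmentExpression",
        operator: "=",
        left: { type: "Identifier", name: "b" },
        right: { type: "Literal", value: 1 },
      },
    },
    alternate: null,
  }],
};

const vocabulary = ["IfStatement", "WhileStatement", "AssignmentExpression", "Identifier"];
const sequence = preorderNodeTypes(ast);
const vector = summaryVector(sequence, vocabulary);
console.log(sequence, vector);
```

Because variable names and literal values never enter the sequence, two snippets that differ only in such details map to the same vector.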

```javascript
jQuery(document).ready(function () {
  setTimeout(function () {
    if (jQuery("#adblock").height() == 0) {
      // Redirect ad-block users to a warning page.
      window.location = "/adblock";
    }
  }, 3456);
});
```

Fig. 10. Example 2: AST of anti ad-block JavaScript snippet.

We use Principal Component Analysis (PCA) to reduce the dimensionality of the summary vector for visualization. Figure 11 separately plots 3-dimensional visualizations of first- and third-party anti ad-block scripts. In Figure 11(a), we note that most first-party anti ad-block scripts are in a dense cluster. The dense cluster of first-party scripts indicates that they use a similar approach for ad-block detection. In Figure 11(b), third-party anti ad-block scripts are spread out from each other. Multiple clusters of third-party anti ad-block scripts indicate that they use different approaches which are potentially more sophisticated than first-party scripts.

4.3 Analysis and Discussions

Next, we analyze ad-block detection strategies used by different anti ad-block clusters. We first study the anti ad-blockers in the dense central cluster of Figure 11(a). Most of these are customized anti ad-block scripts, which are served as first-party by publishers. Our inspection of different clusters in Figure 11(b) revealed that they are mostly third-party anti ad-block scripts. Below, we provide an in-depth analysis of both first-party and third-party anti ad-block scripts.

Fig. 11. Visualization of anti ad-block clusters using PCA: (a) first-party; (b) third-party (legend: PageFair, IAB, Outbrain, Taboola, Hotjar). Most first-party anti ad-blockers are in one dense cluster. In contrast, third-party anti ad-blockers are generally different from each other.

4.3.1 First-Party Anti Ad-Block Scripts

First-party anti ad-block scripts are fairly simple. Most of them are 5-10 lines of code, yet they can successfully detect state-of-the-art ad-blockers. Ad-blockers block ads by removing ad frames and they do not really try to hide their operation. Therefore, such simple first-party anti ad-blockers can easily detect ad-blockers.

Timing. First-party anti ad-blockers typically execute their logic at the beginning of the page load process. Since it may take a few seconds before an ad-blocker removes ads, some websites delay the execution of their logic using setTimeout() or setInterval(). In Figures 9 and 10, we show two example anti ad-block scripts, one inserting delay and another without delay. Since their ad-block detection logic is a one-time check (i.e. it is not invoked periodically), adding a delay helps to ensure that ad-blockers remove ads before anti ad-blockers try to detect ad-blockers.

```javascript
// http://knowlet3389.blogspot.hk
if (localStorage.noad === undefined &&
    (16 >= $("#gAds").height()
     || 16 >= $("#gAd2").height()))
// http://www.elahmad.com
if (jQuery("#adblock").height() == 0)
// http://urlchecker.org
if ($("#adchecker").height() < 10)
// http://www.hentai.to
if (document.getElementById("tester") != undefined)
// http://forum.pac-rom.com
if (!ad || ad.innerHTML.length == 0 ||
    ad.clientHeight === 0)
```

Fig. 12. Examples of detection logic used by first-party anti ad-blockers

Detection logic. First-party anti ad-blockers typically check different HTML elements to detect ad-blockers. In Figure 12, we illustrate the detection logic implemented by several websites. We note that the detection checks are fairly simple and intuitive. For example, consider urlchecker.org, which checks whether the height of the adchecker div is less than 10px. Our further analysis revealed that Adblock Plus blocks the adsbygoogle.js script, due to which the adchecker div is empty and its height is equal to 1px. Other publishers check CSS properties of different div elements. Since the filter lists used by ad-blockers (e.g. EasyList [6]) are publicly available, anti ad-blockers can successfully set up such high-precision detection rules.
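The combination of a delayed, one-time check and a height test can be sketched generically as follows. The function names, the injectable scheduler, and the example threshold are hypothetical; the sketch only mirrors the pattern seen in Figure 12 and is not taken from any of the listed websites.

```javascript
// An ad container that was removed (null) or collapsed below the
// threshold is treated as evidence of an ad-blocker.
function looksBlocked(element, minHeight) {
  return !element || element.clientHeight < minHeight;
}

// Run the check once after delayMs, giving the ad-blocker time to act.
// `schedule` defaults to setTimeout and can be replaced in tests.
function checkOnce(getElement, minHeight, delayMs, onDetect, schedule = setTimeout) {
  schedule(function () {
    if (looksBlocked(getElement(), minHeight)) onDetect();
  }, delayMs);
}
```

In a browser, a call such as checkOnce(() => document.getElementById("adchecker"), 10, 3456, showNotice) would reproduce the urlchecker.org style of check, with showNotice standing in for whatever response the publisher implements.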

Response. Although the detection logic used by first-party anti ad-blockers is fairly similar across the board, their responses to ad-blockers vary widely. Figure 13 lists a few of the responses. For hentai.to, a <p> element requests users to disable the ad-blocker. Since the original content is preserved, this response is not aggressive. However, for knowlet3389.blogspot.hk, the #Blog1 div is removed upon ad-block detection, which indicates that the website hides its content from ad-block users. elahmad.com also aggressively responds by redirecting ad-block users to a warning page. Overall, as discussed earlier in Section 3.5, we find a wide spectrum of responses to ad-block detection, ranging from gentle requests to remove ad-blockers to more aggressive redirection.

```javascript
// http://knowlet3389.blogspot.hk
$("#Blog1").remove();
sweetAlert("Oops.. please don't block my ADs", "warning");
// http://www.elahmad.com
window.location = "/adblock";
// http://urlchecker.org
$("#ads_notify").fadeIn();
$("#getlinks").hide();
$("#adchecker_btn").fadeIn();
// http://www.hentai.to
document.write(
  '<p class="no">Please <u>disable</u> your ' +
  'adblocking software on hentai.to to keep ' +
  'our community <u>FREE</u>! ^.^</p>');
// http://forum.pac-rom.com
alert("We've detected an ad blocker running " +
      "on your browser...");
```

Fig. 13. Examples of responses by first-party anti ad-blockers

Note that some first-party anti ad-block scripts in Figure 11(a) are separated from the dense cluster. Our analysis reveals that these scripts are generally more sophisticated than simple first-party anti ad-blockers. One such example is forbes.com, where the timing, detection logic, and response of its anti ad-blocker are much more sophisticated. First, instead of waiting for a fixed time before running the one-time check, it continuously checks for ad-blockers at periodic intervals. Second, it uses random ad divs as baits and checks their height or display properties for detection. Another unique aspect is that it uses cookies (e.g. forbes_ab) to keep track of users’ detection status across multiple visits.

4.3.2 Third-Party Anti Ad-Block Scripts

Some publishers rely on third-party anti ad-block services. These third-party anti ad-block services not only enable publishers to detect ad-block users but also provide different ways to respond. For example, in addition to quantifying ad-block usage, some third-party anti ad-block services “recoup lost advertising revenue” for publishers. Below we analyze PageFair, which is a popular third-party anti ad-block service that enables publishers to detect ad-block usage and display tailored non-intrusive ads to ad-block users.

Timing. PageFair performs multiple periodic checks at various stages of the web page load process to detect ad-blockers. This approach is much more sophisticated than most first-party anti ad-blockers and makes it harder for ad-blockers to evade detection by simply delaying their activity.

Detection logic. PageFair’s detection logic attempts to actively trap ad-blockers by injecting different baits on web pages. In addition, PageFair attempts to check whether any ad-block plug-in is installed by looking for various browser resources exposed by ad-blocking extensions. The use of these methods makes PageFair’s detection logic difficult to evade. We separately discuss both of these methods below.
• Baiting: Figure 14 shows two different types of baits used by PageFair. PageFair injects these baits with words like “ad” in the element name or URL to trigger ad-blockers. The first example shows an injected div element whose identifier is set to influads_block and which is not visible on the page. The second example shows a script bait whose source is set to adsense.js, which is a common script used by Google AdSense. Note that several other anti ad-blockers also use script baits to detect ad-blockers, which removes the need to check CSS properties of HTML elements. To identify such anti ad-blockers, we rerun our experiments described in Section 3 after removing all page element removal rules (that block ads based on the div names etc.) from EasyList, leaving the remaining rules unchanged, such as the request blocking rules (that, for example, block specific JavaScript files from being downloaded). We find that 149 out of 686 websites can still detect ad-blockers. Thus, anti ad-blockers on these 149 websites use non-CSS based approaches to detect ad-block users (likely by detecting the removal of bait JavaScript files).
• Extension resources: PageFair also attempts to detect the presence of ad-blockers by accessing extension resources exposed by various ad-blockers at chrome-extension://. Figure 15 shows how PageFair accesses extension resources to identify 8 popular ad-blockers including AdBlock, Adblock Plus, AdBlock Pro, AdBlock Premium, Adblock Super, Adguard, AdRemover, and uBlock. For each type of ad-blocker, it includes a unique extension identifier, e.g. gighmmpiobklfepjocnamgkkbiglidom for AdBlock, and the resource file path. Note that Chrome generally does not allow web pages to directly access extension resources unless an extension specifies resources as web_accessible_resources in the manifest file and makes them publicly accessible. Our investigation showed that some ad-block extension resources, such as whitelisted pages shown in Figure 15, are indeed publicly accessible. For example, Adblock Plus exposes block.html, which allows websites to get the list of blocked URLs. Thus, ad-blockers are susceptible to leaking the proof of their presence to anti ad-blockers.

```javascript
// DIV bait
var b = document.createElement("DIV");
// d = "influads_block"
b.id = d;
// c = ''
b.className = c;
// 1x1 div
b.style.width = "1px";
b.style.height = "1px";
// div not located in visible frame
b.style.top = "-1000px";
b.style.left = "-1000px";
document.body.appendChild(b);
// jQuery selector of created div
c = jQuery("#" + d);
// check if this div is hidden by ad-blocker
d = c.is(":hidden") ? 1 : 0;
// removing the created div
c.remove();

// SCRIPT bait
b = document.createElement("SCRIPT");
// d = "295f89b1"
b.id = d;
b.type = "text/javascript";
document.getElementsByTagName("head")[0]
  .appendChild(b);
// a = "http://asset.pagefair.com/adimages/adsense.js"
b.src = a;
```

Fig. 14. PageFair uses different baits to detect ad-blockers.

```javascript
var c = {
  adblock: "chrome-extension://gighmmpiobklfepjocnamgkkbiglidom/img/icon24.png",
  adblock_plus: "chrome-extension://cfhdojbkjhnklbpkdaibdccddilifddb/block.html",
  adblock_pro: "chrome-extension://ocifcklkibdehekfnmflempfgjhbedch/components/block/block.html",
  adblock_premium: "chrome-extension://fndlhnanhedoklpdaacidomdnplcjcpj/img/icon24.png",
  adblock_super: "chrome-extension://knebimhcckndhiglamoabbnifdkijidd/widgets/block/block.html",
  adguard: "chrome-extension://bgnkhhnnamicmpeenaelnjfhikgbkllg/elemhidehit.png",
  adremover: "chrome-extension://mcefmojpghnaceadnghednjhbmphipkb/img/icon24.png",
  ublock: "chrome-extension://epcnnfbjfcgphgdmggkamkmgojdagdnn/document-blocked.html"
};
```

Fig. 15. PageFair tries to access public extension resources to detect ad-blockers.
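How such resource URLs might be probed is sketched below. This is an assumption about one plausible mechanism, not PageFair's actual code: it loads each resource as an image and treats a successful load as evidence that the extension is installed, so it only applies to image resources (the .html entries would need a different loading mechanism). The createImage parameter is a hypothetical injection point; in a browser it would be () => new Image().

```javascript
// Probe a table of { name: resourceUrl } entries and report the names
// whose resource could be loaded.
function probeExtensions(resources, createImage, onDone) {
  const names = Object.keys(resources);
  const detected = [];
  let pending = names.length;
  if (pending === 0) {
    onDone(detected);
    return;
  }
  names.forEach(function (name) {
    const img = createImage();
    function finish(found) {
      if (found) detected.push(name);
      pending -= 1;
      if (pending === 0) onDone(detected);
    }
    // A publicly accessible resource loads; anything else raises an error.
    img.onload = function () { finish(true); };
    img.onerror = function () { finish(false); };
    img.src = resources[name];
  });
}
```

Restricting which files are listed under web_accessible_resources is what prevents this kind of probing, which is why the exposure noted above matters.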

Response. PageFair provides a whitelist ad service under the acceptable ads manifesto [14]. To understand PageFair’s service, we installed PageFair scripts on a test website that uses Google Ads. With an ad-blocker, as expected, we find that Google ads are not served. Instead, PageFair shows a replacement ad which is hosted on PageFair’s domain. It is noteworthy that PageFair’s domain is whitelisted as part of the acceptable ads program. PageFair’s response is reflective of the growing adoption of acceptable ads by many publishers and ad-blockers [40].

In addition to third-party anti ad-block services such as PageFair, several community scripts such as IAB [26] and BlockAdblock [5] are freely available for publishers to detect ad-block users. Publishers can serve these scripts from first-party servers or third-party content delivery networks, or they can be integrated into third-party analytics scripts. Below we analyze how IAB’s community anti ad-block script detects and responds to ad-block users. Note that this script is available both as an independent script and as a Google Analytics wrapper.

```javascript
function attachOrFire() {
  var fireNow = false;
  var fn;

  // function triggers only when document is loaded completely
  if (document.readyState) {
    if (document.readyState == 'complete') {
      fireNow = true;
    }
  }

  fn = function () {
    if (_options.useLocalBait) {
      beginTest(quickBait, false);
    } else {
      log('ignoreing local bait - download remote');
      fetchRemoteLists();
    }
  }
}

function beginTest(bait, isRemote) {
  log('start beginTest');
  if (findResult == true) {
    return; // found bait
  }
  testExecuting = true;
  // function to create a bait
  castBait(bait);

  if (!isRemote) {
    exeResult.quick = 'testing';
  } else {
    exeResult.remote = 'testing';
  }

  timerIds.test = setTimeout(
    function () {
      // function to test bait
      reelIn(bait, 1);
    },
    5);
}
```

Fig. 16. Key functions in IAB’s anti ad-block script

Timing. IAB’s script makes multiple attempts to detect ad-blockers until the maximum_loop limit is reached. Instead of waiting for an arbitrary time, as shown in Figure 16, the script waits for the page to finish loading and then sets the fireNow flag to begin testing.
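The idea of repeating a check until a loop limit is reached can be sketched as below. The function and parameter names are hypothetical and the sketch does not reproduce IAB's implementation; it only illustrates bounded retries, which is also the general pattern behind the periodic checks described for PageFair.

```javascript
// Repeat testOnce() until it reports detection or maxLoop attempts were
// made. `schedule` defaults to setTimeout and can be replaced in tests.
function detectWithRetries(testOnce, maxLoop, intervalMs, onResult, schedule = setTimeout) {
  let attempt = 0;
  function run() {
    attempt += 1;
    if (testOnce()) {
      onResult(true, attempt);   // ad-blocker detected
    } else if (attempt >= maxLoop) {
      onResult(false, attempt);  // give up after the last attempt
    } else {
      schedule(run, intervalMs); // try again later
    }
  }
  run();
}
```

Compared to a single delayed check, repeated attempts still catch an ad-blocker that acts later than the publisher anticipated.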

Detection logic. IAB’s script injects realistic baits to detect ad-blockers. As shown in Figure 16, the script includes an option to specify an external ad-block filter list. If an external ad-block filter list is specified, the script parses the list and then creates baits based on the rules in the list. Otherwise, the default baits are used. The script checks different CSS properties of these baits to detect ad-block users.
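Deriving baits from a filter list can be illustrated with the sketch below. It assumes EasyList-style element hiding rules of the form ##selector (optionally prefixed by a domain) and keeps only simple #id and .class selectors; how IAB's script actually parses lists is not described here, so the function name and the returned bait format are hypothetical.

```javascript
// Turn simple element hiding rules into bait descriptions:
// "###ad-banner" -> { id: "ad-banner" }, "##.adsbox" -> { className: "adsbox" }.
function baitsFromFilterList(text) {
  const baits = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    // Skip blank lines, comments, and exception rules.
    if (line === "" || line.startsWith("!") || line.includes("#@#")) continue;
    const at = line.indexOf("##");
    if (at === -1) continue; // not an element hiding rule
    const selector = line.slice(at + 2);
    const match = /^([#.])([A-Za-z0-9_-]+)$/.exec(selector);
    if (!match) continue; // ignore complex selectors in this sketch
    baits.push(match[1] === "#" ? { id: match[2] } : { className: match[2] });
  }
  return baits;
}

const sampleList = [
  "! Title: sample list",
  "###ad-banner",
  "##.adsbox",
  "example.com##.sponsored",
  "||ads.example.com^",
  "#@#.adsbox",
  "##div[id^=\"ad_\"]",
].join("\n");

const baits = baitsFromFilterList(sampleList);
console.log(baits);
```

Each resulting bait could then be injected as an element with the given id or class and inspected for changes to its CSS properties.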

Response. The script allows publishers to implement custom responses to ad-block detection. For example, publishers can show custom notifications, set cookies to track ad-block users, or load ads from alternate servers.

5 Concluding Remarks

We presented a machine learning based approach to study anti ad-blockers in the wild. Our main observation is that at least 686 websites in the Alexa top-100K currently deploy anti ad-blockers at their home page to detect ad-block users and respond with visible notifications. The notifications ask users to either disable ad-blockers, consider a donation, or pay a subscription fee. Almost a quarter of these notifications cannot be dismissed unless users disable ad-blockers or pay up. Such attempts to undermine ad-blockers could harm their utility.

The arms race between ad-blockers and anti ad-blockers is rapidly escalating. To counter anti ad-blockers, ad-blockers have started to use filter lists to remove anti ad-block scripts and ad-block detection warnings. EasyList [6], which is used by ad-blockers to block ads, now contains rules that specifically target anti ad-blockers. There are also some dedicated filter lists to counter anti ad-blockers. Adblock Warning Removal List [3] mostly contains page element removal rules to remove warning messages. Anti-Adblock Killer list [4] mostly contains web request blocking rules to remove anti ad-block scripts. The rules in these crowdsourced lists are tailored to specific anti ad-blockers. Since anti ad-blockers continuously adjust their behavior, these rules need to be constantly updated to keep up with the changes. We envision the cat-and-mouse game between ad-blockers and anti ad-blockers to continue in the future. We expect our findings and analysis to spur future research along this direction.

Acknowledgements

We thank our shepherd, Vern Paxson, and the anonymous reviewers for their valuable feedback that considerably improved the paper. This work is supported in part by a grant from the Data Transparency Lab.


References[1] 2015 Adblocking Report. The PageFair Team. https://blog.

pagefair.com/2015/ad-blocking-report.[2] AdBlock Plus. https://adblockplus.org.[3] Adblock Warning Removal List. https://easylist-downloads.

adblockplus.org/antiadblockfilters.txt.[4] Anti-Adblock Killer List. https://github.com/reek/anti-

adblock-killer/blob/master/anti-adblock-killer-filters.txt.[5] BlockAdBlock. https://github.com/sitexw/BlockAdBlock.[6] EasyList. https://easylist-downloads.adblockplus.org/

easylist.txt.[7] EasyPrivacy. https://easylist-downloads.adblockplus.org/

easyprivacy.txt.[8] Esprima Javascript Parser. http://esprima.org.[9] Ghostery. https://www.ghostery.com.[10] Other Supplementary Filter Lists and EasyList Variants.

https://easylist.to/pages/other-supplementary-filter-lists-and-easylist-variants.html.

[11] Selenium WebDriver. http://www.seleniumhq.org/projects/webdriver.

[12] V8 JavaScript Engine. https://code.google.com/p/v8/.

[13] W3C DOM4. W3C Recommendation 19 November 2015. http://www.w3.org/TR/dom/#mutationobserver.

[14] Allowing acceptable ads in adblock plus. https://adblockplus.org/acceptable-ads, Nov 2015.

[15] IAB Internet Advertising Revenue Report. http://www.iab.com/wp-content/uploads/2015/05/IAB_Internet_Advertising_Revenue_FY_2014.pdf, April 2015.

[16] 2016 Mobile Adblocking Report. The PageFair Team. https://blog.pagefair.com/2015/ad-blocking-report, 2016.

[17] S. Afroz and R. Greenstadt. Phishzoo: Detecting phishing websites by looking at them. In 2011 IEEE Fifth International Conference on Semantic Computing, 2011.

[18] M. Anderson. Sites that block adblockers seem to be suffering. https://thestack.com/world/2016/04/21/sites-that-block-adblockers-seem-to-be-suffering/, 2016.

[19] J. Barr. Checking in on Wired’s Ad-Blocking Experiment, Including an Ad-Free Version. http://adage.com/article/media/checking-wired-magazine-s-ad-blocking-experiment/303795/, 2016.

[20] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert. ZOZZLE: Fast and Precise In-Browser JavaScript Malware Detection. In USENIX Security Symposium, 2011.

[21] M. Dunlop, S. Groat, and D. Shelly. Goldphish: Using images for content-based phishing analysis. In 2010 Fifth International Conference on Internet Monitoring and Protection, 2010.

[22] S. Englehardt and A. Narayanan. Online tracking: A 1-million-site measurement and analysis. In ACM Conference on Computer and Communications Security (CCS). ACM, 2016.

[23] D. G. Goldstein, R. P. McAfee, and S. Suri. The Cost of Annoying Ads. In WWW, 2013.

[24] D. Gugelmann, M. Happe, B. Ager, and V. Lenders. An Automated Approach for Complementing Ad Blockers’ Blacklists. Privacy Enhancing Technologies (PETS), 2015(2):282–298, 2015.

[25] S. Guha, B. Cheng, and P. Francis. Privad: Practical privacy in online advertising. In NSDI, 2011.

[26] IAB. Ad Block Detection Code Access Request. https://www.iab.com/ad-block-detection-code-access-request/, 2016.

[27] A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna. Revolver: An Automated Approach to the Detection of Evasive Web-based Malware. In USENIX Security Symposium, 2013.

[28] L. Kelion. Apple brings ad-blocker extensions to Safari on iPhones. http://www.bbc.com/news/technology-34173732, September 2015.

[29] A. Lerner, A. K. Simpson, T. Kohno, and F. Roesner. Internet Jones and the Raiders of the Lost Trackers: An Archaeological Study of Web Tracking from 1996 to 2016. In Proceedings of USENIX Security, 2016.

[30] Z. Li, K. Zhang, Y. Xie, F. Yu, and X. Wang. Knowing Your Enemy: Understanding and Detecting Malicious Web Advertising. In ACM CCS, 2012.

[31] M. Malloy, M. McNamara, A. Cahn, and P. Barford. Ad blockers: Global prevalence and impact. In ACM Internet Measurement Conference (IMC), 2016.

[32] H. Metwalley, S. Traverso, M. Mellia, S. Miskovic, and M. Baldi. The Online Tracking Horde: A View from Passive Measurements. In Traffic Monitoring and Analysis. 2015.

[33] T. Mitchell. Machine Learning. McGraw-Hill, 1997.

[34] M. L. Morgan Marquis-Boire, Glenn Greenwald. NSA’s Google for the World’s Private Communications. https://theintercept.com/2015/07/01/nsas-google-worlds-private-communications/.

[35] J. Newman. Yahoo Mail begins blocking users with ad block enabled. http://www.pcworld.com/article/3006981/data-center-cloud/yahoo-mail-begins-blocking-users-with-ad-block-enabled.html.

[36] R. Nithyanand, S. Khattak, M. Javed, N. Vallina-Rodriguez, M. Falahrastegar, J. E. Powles, E. D. Cristofaro, H. Haddadi, and S. J. Murdoch. Ad-Blocking and Counter Blocking: A Slice of the Arms Race. In 6th USENIX Workshop on Free and Open Communications on the Internet (FOCI), 2016.

[37] E. Pujol, O. Hohlfeld, and A. Feldmann. Annoyed Users: Ads and Ad-Block Usage in the Wild. In ACM Internet Measurement Conference (IMC), 2015.

[38] M. Z. Rafique, T. Van Goethem, W. Joosen, C. Huygens, and N. Nikiforakis. It’s free for a reason: Exploring the ecosystem of free live streaming services. In Proceedings of the 23rd Network and Distributed System Security Symposium (NDSS), 2016.

[39] F. Roesner, T. Kohno, and D. Wetherall. Detecting and Defending Against Third-Party Tracking on the Web. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2012.

[40] R. J. Walls, E. D. Kilmer, N. Lageman, and P. D. McDaniel. Measuring the impact and perception of acceptable advertisements. In ACM Internet Measurement Conference (IMC), 2015.

[41] A. Zarras, A. Kapravelos, G. Stringhini, T. Holz, C. Kruegel, and G. Vigna. The Dark Alleys of Madison Avenue: Understanding Malicious Advertisements. In ACM Internet Measurement Conference (IMC), 2014.