Bot detection deck 042514 final

A CLOSER LOOK AT BOTS

A Vindico Investigation – 1Q2014

What’s a Bot?

• An Internet bot is a software application that runs automated tasks over the Internet

• Bots can be used for good (search indexing) or bad (fraudulent ad impressions, hacking, etc.)

• Recent reports indicate that bot traffic now exceeds human traffic on the Internet

• There are 3 main ‘types’ of bots: Crawler/Spider, Covert Crawler, and Zombie Computers (Botnet)

• Bad bots are impacting the video advertising industry

Bot: Crawler / Spider

• USES: Automated data collection, indexing

• HARDWARE: Typically runs on a cluster of virtual machines (VMs) on servers located in a datacenter

• ACTIVITY: Generally just makes ‘GET’ requests to static webpages and analyzes the responses for links, content, etc. Crawlers/spiders do not render the webpage in a browser

• EXAMPLES: Googlebot, Bingbot

• DETECTION: These bots usually identify themselves in their user-agent string

• ADS: Typically do not render ads. In addition, these bots are almost always on the IAB Bot List, and MRC-accredited ad servers exclude them from impression counts by relying on the fact that they identify themselves in their user-agent string

• VIEWABILITY: Not Applicable (ads not rendered, impressions filtered)

Benign
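Because declared crawlers announce themselves in the user-agent header, a first-pass filter can be a simple token match. The sketch below is a minimal illustration, assuming a short, hypothetical token list (`DECLARED_BOT_TOKENS`) in place of the full IAB list; the function names are also hypothetical.

```python
from typing import Dict, List

# Hypothetical, abbreviated token list for illustration; a production filter
# would match against the full IAB bot list rather than a handful of tokens.
DECLARED_BOT_TOKENS = ("googlebot", "bingbot", "crawler", "spider")


def is_declared_bot(user_agent: str) -> bool:
    """True if the user-agent string openly identifies itself as a crawler/spider."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in DECLARED_BOT_TOKENS)


def filter_impressions(impressions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop impressions whose user agent declares a bot, keeping the rest for counting."""
    return [imp for imp in impressions if not is_declared_bot(imp.get("user_agent", ""))]


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
chrome_ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/124.0 Safari/537.36")
print(is_declared_bot(googlebot_ua))  # True
print(is_declared_bot(chrome_ua))     # False
```

This only catches bots that choose to identify themselves. Covert crawlers and zombie computers present real browser user agents, so they pass straight through a filter of this kind.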

Bot: Covert Crawler

• USES: Generally malicious – associated with ad fraud, spam, hacking, scraping

• HARDWARE: Typically runs on a cluster of virtual machines (VMs) on servers located in a datacenter

• ACTIVITY: Mimics a human with full browsing and rendering behavior (plugins, cookies, user agent, mouse movement, time delays, engagement with site pages)

• EXAMPLES: Client Connections Media, VERSA*, DDC*

• ADS: Attempts to trick ad tracking systems into registering a true impression. These crawlers do not identify themselves; instead, they use a variety of real user-agent strings that are indistinguishable from those of real users

• VIEWABILITY: Both geometric and browser-optimization approaches to viewability will register these ads as viewable

Generally Malicious

*Source: detailed within this deck
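Since covert crawlers spoof user agents but typically run on datacenter VMs, one complementary signal is whether an impression's IP address falls inside known hosting ranges. A minimal sketch using Python's standard `ipaddress` module, assuming a hypothetical range list (`DATACENTER_RANGES`) built from the reserved documentation blocks of RFC 5737 rather than real provider data:

```python
import ipaddress

# Hypothetical datacenter ranges for illustration only (RFC 5737 documentation
# blocks). A real list would need to come from a maintained source of
# hosting-provider allocations.
DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def from_datacenter(ip: str) -> bool:
    """True if the IP falls inside any listed datacenter range.

    An IPv6 address never matches an IPv4 network, so it simply returns False here.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)


print(from_datacenter("192.0.2.45"))   # True
print(from_datacenter("203.0.113.7"))  # False
```

The check complements user-agent filtering but does nothing against zombie computers, which inherit real residential IP addresses.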

Bot: Zombie Computer (Botnet)

Real machines ‘infected’ with software (‘virus,’ ‘worm,’ ‘malware’) that allows a remote party to take control of various parts of the system.

• USES: Malicious – associated with ad fraud, hacking (bank accounts, emails, credit cards), Bitcoin mining, ransomware

• HARDWARE: Can take over any PC, smartphone, or other device. Typically created for Windows (PC) and Android (mobile) environments, but not limited to them

• ACTIVITY: ‘Borrows’ the user’s machine (processing power, or Internet connection / IP address as a proxy) to open invisible browser windows, load sites and ads, and snoop on users. Replicates over the network

• EXAMPLES: CryptoLocker, ZeuS, TDSS, ZeroAccess, ASPROX

• ADS: Attempts to trick ad tracking systems so the operators get paid. Uses real user machines and inherits real user IP addresses, real user-agent strings, cookies, etc.

• VIEWABILITY: Exploits geometric viewability flaws

Malicious
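Geometric viewability only checks where the player sits relative to the browser viewport, which is why it can be exploited. A minimal sketch of such a check, assuming the MRC video guideline of at least 50% of pixels in view for two consecutive seconds; the `Rect` type and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rect:
    x: float
    y: float
    width: float
    height: float


def visible_fraction(ad: Rect, viewport: Rect) -> float:
    """Fraction of the ad's area that intersects the viewport (0.0 to 1.0)."""
    overlap_w = max(0.0, min(ad.x + ad.width, viewport.x + viewport.width) - max(ad.x, viewport.x))
    overlap_h = max(0.0, min(ad.y + ad.height, viewport.y + viewport.height) - max(ad.y, viewport.y))
    area = ad.width * ad.height
    return (overlap_w * overlap_h) / area if area > 0 else 0.0


def is_viewable(samples: List[Tuple[float, float]],
                min_fraction: float = 0.5, min_seconds: float = 2.0) -> bool:
    """samples: (timestamp_seconds, visible_fraction) pairs sorted by time.

    True once the fraction stays at or above min_fraction for min_seconds continuously.
    """
    run_start = None
    for t, frac in samples:
        if frac >= min_fraction:
            if run_start is None:
                run_start = t          # start of a continuous in-view run
            if t - run_start >= min_seconds:
                return True
        else:
            run_start = None           # dropping below the threshold resets the run
    return False


viewport = Rect(0, 0, 1280, 720)
print(visible_fraction(Rect(100, 100, 640, 360), viewport))     # 1.0
print(is_viewable([(0.0, 1.0), (1.0, 0.6), (2.0, 0.8)]))         # True
```

Nothing in this calculation distinguishes a window a person is looking at from an invisible browser window opened by malware: if the reported geometry places the player in the viewport, the impression can pass.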

Bots Have Negative Impact on Video Advertising

• Ad fraud has become an incredibly lucrative business for bot operators, especially with the rise of online video, where CPMs are much higher and detection capabilities have historically been much weaker.

• This has driven two major industry trends over the past 2 years: impression volumes have skyrocketed while CPMs have decreased.

• The two parties most negatively impacted are advertisers and real publishers.

• Middlemen can still make their margin, but lower CPMs push them toward cheaper, fraudulent inventory sources, which keeps feeding the beast and grows the problem.

Source: Vindico Adtricity, Q1 2014; annual estimate based on $15 CPM

What Vindico Bot Detection Uncovered

Using the Adtricity system, we identified the top 700,000 bots and zombie machines (botnets) in Q1 2014.

• Initial launch will focus on the top 50% of bots:

• 11.23% of all Vindico-Adtricity VPAID impressions in Q1

• 7.9B Vindico-Adtricity bot impressions in Q1

• $76 million* in fraud in Q1 alone, in US online video only

• 66% of bot impressions were from ‘zombie computers’; 34% were from ‘covert crawlers’

• Affected advertisers: average 10.06% of impressions; highest 52.66%

• Breakdown by publishers: highest 50.9% of impressions; media companies <2%; networks 24%

*Estimate based on $15 CPM

Both the number of bots and the number of affected impressions are rising (see Q1 trend graph above)

Exposing Bots: Covert Crawlers

Covert Crawler ‘Versa’

• Stats: 55 million impressions/month = $825K/month*

• Total sites: 5 core, at least 100 in total

• Notes: sites share the same template; fake display ads; tokenized URLs; VMs spoofing user agents; an exact, fixed number of ads capped per IP; rotated screen resolutions, etc.
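The per-IP ad cap in the notes above suggests a simple statistical tell: organic traffic produces a spread of impression counts per IP, whereas a farm that caps every IP at the same number produces one dominant count. A minimal sketch of that heuristic; the function name and the `min_ips`/`max_share` thresholds are hypothetical, not values from the Adtricity system:

```python
from collections import Counter
from typing import Iterable


def flag_exact_per_ip_cap(impression_ips: Iterable[str],
                          min_ips: int = 100, max_share: float = 0.8) -> bool:
    """Flag traffic where most IPs received exactly the same number of ads.

    impression_ips: one entry (the client IP) per served impression.
    """
    per_ip = Counter(impression_ips)             # impressions served to each IP
    if len(per_ip) < min_ips:
        return False                             # too few IPs to judge
    count_histogram = Counter(per_ip.values())   # number of IPs at each per-IP count
    top_count, ips_at_top = count_histogram.most_common(1)[0]
    # A dominant count of 1 is normal for sites with many one-visit users,
    # so only a repeated, identical cap above 1 is treated as suspicious.
    return top_count > 1 and ips_at_top / len(per_ip) > max_share


capped = [f"10.0.0.{i}" for i in range(200) for _ in range(5)]        # every IP sees exactly 5 ads
organic = [f"10.1.0.{i}" for i in range(150) for _ in range(1 + i % 7)]  # counts vary from 1 to 7
print(flag_exact_per_ip_cap(capped))   # True
print(flag_exact_per_ip_cap(organic))  # False
```

A bot operator can defeat this by randomizing the cap, so it is best treated as one signal among several rather than a verdict.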

Distributed Data Center

• Stats: 150 million impressions/month = $2.2 million/month*

• Top sites: techbrowsing.com (half the size of all of Versa), anchorfree.us, recipeaccess.com

• Total sites: 15-20 core, at least 100 in total

• Datacenters: 7-10 core

• Examples: Host Protocol, EGIHosting, MyPrivateProxy.net, GIGLINX, Alentus, ManageDNS

Generally Malicious

*Estimate based on $15 CPM
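The monthly dollar figures on this slide follow from the standard CPM definition (cost per 1,000 impressions) at the assumed $15 CPM. A small helper reproduces the arithmetic; the function name is illustrative:

```python
def media_cost(impressions: int, cpm_dollars: float) -> float:
    """Dollar cost of an impression volume at a given CPM (price per 1,000 impressions)."""
    return impressions / 1000 * cpm_dollars


ASSUMED_CPM = 15.0  # the deck's stated estimation basis

for name, monthly_imps in [("Versa", 55_000_000), ("Distributed Data Center", 150_000_000)]:
    print(f"{name}: ${media_cost(monthly_imps, ASSUMED_CPM):,.0f}/month")
# Versa: $825,000/month
# Distributed Data Center: $2,250,000/month
```

The second result appears as $2.2 million on the slide.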

Exposing Bots: Botnets

The Asprox / Kuluoz Botnet

• This botnet is currently extremely active

• Current main method of initial infection: phishing emails carrying malware, such as:

• WhatsApp message (via a link)

• Notice to Appear in Court (via an attachment)

• Once installed, it follows the chain below to pay-per-click (PPC) networks*:

*Source: techhelplist.com

Malicious

Exposing Bots: Reality v. Perception

[Image panels: Reality | Perception]

*Source: techhelplist.com

How to Fight Bots in Video Advertising

Viewability alone is not enough.

• Bots can fool viewability

• Good viewability vendors will record bot impressions as non-viewable, but some bots can manipulate viewability metrics for the campaign

Bot filtering alone is not enough.

• 1x1 iframes can still be manipulated

Bot filtering + viewability is not enough.

• Certain sites and measurements can be manipulated (e.g., porn sites, player size, etc.)

A combination of multiple metrics, including viewability, execution, content, and traffic, is the only way to truly protect ad dollars and grow the ecosystem to the point where it can truly complement TV for brand advertisers.
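One way to read "a combination of multiple metrics" is as a gate that an impression must clear on every dimension, so that fooling any single measurement is not enough. A minimal sketch under that assumption; the `ImpressionSignals` fields map to the four metric families named above, but the structure and names are hypothetical, not Vindico's actual scoring:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImpressionSignals:
    viewable: bool     # viewability: the ad met the viewability threshold
    executed: bool     # execution: the player actually loaded and played the ad
    content_ok: bool   # content: the page passed content/category checks
    traffic_ok: bool   # traffic: not flagged by bot, IP, or user-agent filters


def failed_checks(signals: ImpressionSignals) -> List[str]:
    """Names of the metric families this impression failed (empty means it passed)."""
    checks = {
        "viewability": signals.viewable,
        "execution": signals.executed,
        "content": signals.content_ok,
        "traffic": signals.traffic_ok,
    }
    return [name for name, passed in checks.items() if not passed]


def is_billable(signals: ImpressionSignals) -> bool:
    """An impression counts only if it clears every check."""
    return not failed_checks(signals)


# A bot that fools viewability but comes from flagged traffic is still rejected.
bot_imp = ImpressionSignals(viewable=True, executed=True, content_ok=True, traffic_ok=False)
print(failed_checks(bot_imp))  # ['traffic']
print(is_billable(bot_imp))    # False
```

Returning the list of failed checks, rather than a bare yes/no, keeps the reason for each rejection visible for reporting.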

How Vindico Helps

There are 3 strategic components to our Detection System:

1. Data Collection

• Coverage of 40% of all online video, providing more data points than anyone else.

2. Data Processing

• Big Data.

• Adtricity servers process over 1 million events every minute.

• This data must be logged, loaded, and made ready for analysis in real time.

• Even Hadoop, the best-known Big Data framework, was not enough.

• Adtricity utilizes a cutting-edge Big Data framework called Spark.

3. Data Analysis

More data than a human could ever analyze.

Adtricity uses machine learning (artificial intelligence) to detect and block bots in real time. Adtricity is a comprehensive measure of quality, offering a standardized and transparent system of measurement to the industry. Adtricity brings together viewability and verification into a single solution.

Conclusion

Bots have infiltrated the video advertising industry and are growing in scale and impression volume at an alarming rate.

Vindico’s Bot Detection technology was developed to help advertisers combat fraudulent activity in video advertising. Bot detection is most powerful as part of a buy-side platform, because it is integrated from the point of delivery and can be applied across the full scope of the advertiser’s buy.

Appendix: Top 100 Domains Affected by Bots (pg1)

sekindo.com

menscraft.com

recipegroove.com

menswheels.com

tonightsrecipe.com

tophomegardens.com

outfox.tv

sportsfave.com

allsportshub.com

videolulu.com

sportsidea.com

recipeaccess.com

automotiveboss.com

suggestrecipe.com

clipsgo.com

sportspond.com

beautytrend.tv

athletesvenue.com

everymansfitness.com

expertbites.com

Appendix: Top 100 Domains Affected by Bots (pg2)

trendyidea.com

sportsflare.com

cookingniche.com

sportsadvise.com

cooltraveller.com

athleticsplay.com

financeknow.com

hobbymind.com

homesinspiration.com

loveablehomes.com

sheglamour.com

glamourvibe.com

bettermotorcars.com

plantingforum.com

outstandingvacations.com

recipegrandma.com

financesadviser.com

fitnesstrue.com

leisurenook.com

journeyexplorer.com

Appendix: Top 100 Domains Affected by Bots (pg3)

travellersdirect.com

motorcarsplus.com

cliptimes.com

kitchensview.com

womenschatter.com

fancyrides.com

fitnesswow.com

culinaryswap.com

growersgreen.com

cookingkudos.com

femalevogue.com

motherhoodchic.com

babywhat.com

currenciesforum.com

culinaryflare.com

craftseasy.com

planterstime.com

womenhour.com

insiderfoodie.com

lifestyleanswer.com

Appendix: Top 100 Domains Affected by Bots (pg4)

magazinebaby.com

kitchencuisines.com

plantersforum.com

cookingmogul.com

tastekitchens.com

lifestyleselection.com

gardenleisures.com

travelconnoisseurs.com

extendgame.com

sportsmansmag.com

athleticsleague.com

beautykittens.com

chefspoon.com

travelleralert.com

leisurelocator.com

sportsthrive.com

medicineshub.com

greenflourish.com

athleticsinteractive.com

clipsindex.com

Appendix: Top 100 Domains Affected by Bots (pg5)

lifestylereader.com

sportscompete.com

cookinghours.com

travellerstube.com

sportscircular.com

womenconcierge.com

athleteman.com

leisuretourist.com

travelleradventures.com

carsmenu.com

athleteinsight.com

athletestoday.com

gardenswise.com

sportsrevealed.com

sightscenes.com

foodsac.com

sportsyards.com

makeupbag.tv

womenvenue.com

leisureadventure.com
