Bot detection deck 042514 final
Uploaded by vindicogroup
A CLOSER LOOK AT BOTS
A Vindico Investigation – 1Q2014
What’s a Bot?
• An Internet bot is a software application that runs automated tasks over the Internet
• Bots can be used for good (search indexing) or bad (ad impressions, hacking, etc.)
• Reports now indicate there is more bot traffic than human traffic on the Internet
• There are 3 main ‘types’ of bots:
• Crawler/Spider
• Covert Crawler
• Zombie Computers (Botnet)
• Bad bots are impacting the video advertising industry
Bot: Crawler / Spider
• USES: Automated data collection, indexing
• HARDWARE: Typically runs on a cluster of Virtual Machines (VM) on servers located in a datacenter
• ACTIVITY: Generally just makes ‘GET’ requests to static webpages and analyzes responses for links, content, etc. Crawler/spiders do not render the webpage in a browser
• EXAMPLES: GoogleBot, BingBot
• DETECTION: These bots usually identify themselves in their user-agent string
• ADS: Typically would not render an ad. In addition, these bots are almost always on the IAB Bot List, so MRC-accredited ad servers exclude them from impression counts by matching the user-agent string in which they identify themselves
• VIEWABILITY: Not Applicable (ads not rendered, impressions filtered)
Benign
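Because these bots identify themselves, filtering them can be as simple as a substring match on the user-agent string. The following is a minimal sketch in Python, not the method of any particular ad server. The token list, function names, and sample records are assumptions made for illustration; a real filter would use a maintained list such as the IAB Bot List.

```python
# Hypothetical substrings that self-identifying crawlers place in their
# user-agent strings. A production filter would use a maintained list.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot")


def is_declared_bot(user_agent: str) -> bool:
    """Return True if the user-agent string contains a known bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def filter_impressions(impressions):
    """Keep only impressions whose user agent does not declare a bot."""
    return [imp for imp in impressions if not is_declared_bot(imp["user_agent"])]


# Made-up impression records for illustration only.
sample = [
    {"id": 1, "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"id": 2, "user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                            "(KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36"},
]
kept = filter_impressions(sample)
print([imp["id"] for imp in kept])
```

A bot that sends a real browser user-agent string, like the one in the second record, passes this filter untouched. That is why the approach only works for bots that declare themselves.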
Bot: Covert Crawler
• USES: Generally malicious – associated with ad fraud, spam, hacking, scraping
• HARDWARE: Typically runs on a cluster of Virtual Machines (VM) on servers located in a datacenter
• ACTIVITY: Mimics a human with full browsing and rendering behavior (plugins, cookies, user-agent, mouse movement, time delays, engagement with site pages)
• EXAMPLES: Client Connections Media, VERSA*, DDC*
• ADS: Attempts to trick ad tracking systems into registering a true impression. These crawlers do not identify themselves. In fact, they use a variety of real user-agent strings that are indistinguishable from those of real users
• VIEWABILITY: Both geometric and browser optimization approaches to viewability will report these ads as viewable
Generally Malicious
*Source: detailed within this deck
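Since covert crawlers typically run in datacenters, one signal that does not depend on the user-agent string is whether the request IP belongs to a hosting provider. This is a hedged sketch using Python's standard `ipaddress` module. The ranges are RFC 5737 documentation blocks used as stand-ins, and `DATACENTER_NETWORKS` and `is_datacenter_ip` are hypothetical names, not part of any Vindico system.

```python
import ipaddress

# Hypothetical datacenter ranges. These are RFC 5737 documentation blocks
# used as stand-ins; real ranges would come from hosting-provider data.
DATACENTER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip: str) -> bool:
    """Return True if the address falls inside any listed datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_NETWORKS)


print(is_datacenter_ip("198.51.100.7"))  # inside a listed range
print(is_datacenter_ip("203.0.113.9"))   # outside every listed range
```

A check like this says nothing about infected consumer machines, which use real users' IP addresses.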
Bot: Zombie Computer (Botnet)
Real machines ‘infected’ with software (‘virus,’ ‘worm,’ ‘malware’) that allows a remote party to take control of various parts of the system.
• USES: Malicious – associated with ad fraud, hacking (bank accounts, emails, credit cards), Bitcoin Mining, Ransomware
• HARDWARE: Can take over any PC, smart phone, or device. Typically created for Windows (PC) and Android (mobile) environments, but not limited to those
• ACTIVITY: ‘Borrows’ the user’s machine, processing power, or Internet connection / IP address (as a proxy) to open invisible browser windows, load sites and ads, and snoop on users. Replicates over the network
• EXAMPLES: CryptoLocker, ZeuS, TDSS, ZeroAccess, ASPROX
• ADS: Attempts to trick ad tracking systems so the bot operators get paid. Uses real user machines and inherits real user IP addresses, real user-agent strings, cookies, etc.
• VIEWABILITY: Exploits geometric viewability flaws
Malicious
Bots Have Negative Impact on Video Advertising
• Ad fraud has become an incredibly lucrative business for bot operators, especially with the rise of online video where CPMs are much higher and detection capabilities have historically been much lower.
• This has driven two major trends in the industry over the past 2 years:
• The number of impressions has skyrocketed
• CPMs have decreased
• The two parties that are negatively impacted the most are advertisers and real publishers.
• Middlemen are still able to make their margin, but lower CPMs force them to use the (cheaper) fraudulent inventory sources, which in turn feeds the beast and grows the problem.
Source: Vindico Adtricity, Q1 2014; annual estimate based on $15 CPM
What Vindico Bot Detection Uncovered
Using the Adtricity system, we’ve identified the top 700,000 bots and zombie machines (botnets) over Q1 2014.
• Initial launch will focus on the top 50% of Bots:
• 11.23% of all Vindico-Adtricity VPAID Imps in Q1
• 7.9B Vindico-Adtricity Bot Impressions in Q1
• $76 million* in fraud in Q1 alone, just in US online video.
• 66% of bot impressions were from ‘zombie computers’; 34% were from ‘covert crawlers’
• Affected Advertisers
• Avg: 10.06% of impressions
• Highest: 52.66% of impressions
• Breakdown by Publishers:
• Highest: 50.9% of impressions
• Media Companies: <2% of impressions
• Networks: 24% of impressions
*Estimate based on $15 CPM
The number of bots and the number of affected impressions are both rising (see Q1 trend graph above)
Exposing Bots: Covert Crawlers
Covert Crawler ‘Versa’
• Stats: 55 million imps / month = $825k / month*
• Total Sites: 5 Core with at least 100 total
• Notes: Sites share the same template and use fake display ads, tokenized URLs, VMs spoofing user agents, an identical cap on ads per IP, rotated screen resolutions, etc.
Covert Crawler ‘Distributed Data Center’ (DDC)
• 150 million imps / month = $2.2 million / month*
• Top Sites: techbrowsing.com (1/2 the size of all Versa), anchorfree.us, recipeaccess.com
• Total Sites: 15 – 20 Core with at least 100 total
• 7-10 Core datacenters
• Examples: Host Protocol, EGIHosting, MyPrivateProxy.net, GIGLINX, Alentus%, ManageDNS
Generally Malicious
*Estimate based on $15 CPM
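The monthly dollar figures on this slide follow directly from the $15 CPM assumption, where CPM is the cost per 1,000 impressions. A short Python check of that arithmetic follows; the function name `monthly_cost` is illustrative only.

```python
def monthly_cost(impressions: int, cpm_usd: float = 15.0) -> float:
    """Cost in USD of a number of impressions at a given CPM (cost per 1,000)."""
    return impressions / 1000 * cpm_usd


versa = monthly_cost(55_000_000)   # Versa: 55 million imps / month
ddc = monthly_cost(150_000_000)    # Distributed Data Center: 150 million imps / month

print(f"Versa: ${versa:,.0f} / month")
print(f"Distributed Data Center: ${ddc:,.0f} / month")
```

This gives $825,000 per month for Versa and $2,250,000 per month for Distributed Data Center, which the slide states as $2.2 million.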
Exposing Bots: Botnets
The Asprox / Kuluoz Botnet
• Currently this botnet is extremely active
• Current main method of initial infection: malware-phishing emails
• WhatsApp Message (via a link)
• Notice to Appear in Court (via an attachment)
• Once installed, it follows the chain below to PPC networks*:
*Source: techhelplist.com
Malicious
Exposing Bots: Reality v. Perception
Figure panels: Reality, Perception
*Source: techhelplist.com
How to Fight Bots in Video Advertising
Viewability alone is not enough.
• Bots can fool viewability
• Good viewability vendors will record bot impressions as non-viewable, but some bots can manipulate viewability metrics for the campaign
Bot filtering alone is not enough.
• 1x1 iframes can still be manipulated
Bot filtering + viewability is not enough.
• Certain sites and measurements can be manipulated (e.g. porn sites, player size, etc.)
A combination of multiple metrics, including viewability, execution, content, and traffic, is the only way to truly protect ad dollars and grow the ecosystem to the point where it can truly complement TV for brand advertisers.
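One way to read the argument above is that an impression should count only when every check passes, so a weakness in any single metric cannot be exploited on its own. The sketch below illustrates that idea in Python. It is not Vindico's implementation; the field names and the meaning given to each of the four metrics are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ImpressionSignals:
    """Hypothetical per-impression check results; field names are illustrative."""
    viewable: bool    # viewability measurement passed
    executed: bool    # the ad actually executed in a player
    content_ok: bool  # page content is acceptable to the advertiser
    traffic_ok: bool  # traffic source was not flagged as a bot


def is_valid_impression(s: ImpressionSignals) -> bool:
    """An impression counts only if every individual check passes."""
    return s.viewable and s.executed and s.content_ok and s.traffic_ok


# A bot that fools viewability still fails on the traffic check.
bot_fooling_viewability = ImpressionSignals(True, True, True, False)
# A human impression in a 1x1 iframe passes bot filtering but not viewability.
human_in_1x1_iframe = ImpressionSignals(False, True, True, True)
clean = ImpressionSignals(True, True, True, True)

print(is_valid_impression(bot_fooling_viewability))
print(is_valid_impression(human_in_1x1_iframe))
print(is_valid_impression(clean))
```

Requiring all checks to pass is the strictest possible combination. A weighted score is another option, but the weights would have to come from measured data.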
How Vindico Helps
There are 3 strategic components to our Detection System:
1. Data Collection
• 40% of all online videos. More data points than anyone else.
2. Data Processing
• Big Data.
• Adtricity servers process over 1 million events every minute.
• This data has to be logged, loaded, and ready for analysis in real time.
• Even Hadoop, the best-known Big Data framework, was not enough.
• Adtricity utilizes a cutting-edge Big Data framework called Spark.
3. Data Analysis
More data than a human could ever analyze.
Adtricity uses cognitive thinking (artificial intelligence) through machine learning to detect and block bots in real time. Adtricity is a comprehensive measure of quality, offering a standardized and transparent system of measurement to the industry. Adtricity brings together viewability and verification into a single solution.
Conclusion
Bots have infiltrated the video advertising industry and are increasing scale and impressions at an alarming rate.
Vindico’s Bot Detection technology was developed to help advertisers combat fraudulent activity in video advertising. Bot detection is most powerful when part of a buy-side platform as it is organically integrated from the point of delivery and can be used across the full scope of the advertiser’s buy.
Appendix: Top 100 Domains Affected by Bots
sekindo.com
menscraft.com
recipegroove.com
menswheels.com
tonightsrecipe.com
tophomegardens.com
outfox.tv
sportsfave.com
allsportshub.com
videolulu.com
sportsidea.com
recipeaccess.com
automotiveboss.com
suggestrecipe.com
clipsgo.com
sportspond.com
beautytrend.tv
athletesvenue.com
everymansfitness.com
expertbites.com
trendyidea.com
sportsflare.com
cookingniche.com
sportsadvise.com
cooltraveller.com
athleticsplay.com
financeknow.com
hobbymind.com
homesinspiration.com
loveablehomes.com
sheglamour.com
glamourvibe.com
bettermotorcars.com
plantingforum.com
outstandingvacations.com
recipegrandma.com
financesadviser.com
fitnesstrue.com
leisurenook.com
journeyexplorer.com
travellersdirect.com
motorcarsplus.com
cliptimes.com
kitchensview.com
womenschatter.com
fancyrides.com
fitnesswow.com
culinaryswap.com
growersgreen.com
cookingkudos.com
femalevogue.com
motherhoodchic.com
babywhat.com
currenciesforum.com
culinaryflare.com
craftseasy.com
planterstime.com
womenhour.com
insiderfoodie.com
lifestyleanswer.com
magazinebaby.com
kitchencuisines.com
plantersforum.com
cookingmogul.com
tastekitchens.com
lifestyleselection.com
gardenleisures.com
travelconnoisseurs.com
extendgame.com
sportsmansmag.com
athleticsleague.com
beautykittens.com
chefspoon.com
travelleralert.com
leisurelocator.com
sportsthrive.com
medicineshub.com
greenflourish.com
athleticsinteractive.com
clipsindex.com
lifestylereader.com
sportscompete.com
cookinghours.com
travellerstube.com
sportscircular.com
womenconcierge.com
athleteman.com
leisuretourist.com
travelleradventures.com
carsmenu.com
athleteinsight.com
athletestoday.com
gardenswise.com
sportsrevealed.com
sightscenes.com
foodsac.com
sportsyards.com
makeupbag.tv
womenvenue.com
leisureadventure.com