Detecting Fraudulent Clicks From BotNets 2.0

Adam Barth

Joint work with Dan Boneh, Andrew Bortz, Collin Jackson, John Mitchell, Weidong Shao, and Elizabeth Stinson

BotNets, Current and Future

Traditional BotNets
• Permanent malware
• Infect host
– Email attachments
– Drive-by downloads
• Click-fraud, Spam, DDoS, Key-logging
• ~100,000 members

BotNets 2.0
• Ephemeral
• Browser-based
– Malicious advertisements
– Popular web sites
• Click-fraud, Spam, (maybe DDoS)
• Much larger

Browser Security Model

• Same-origin policy for network access
– Origin is scheme://host:port
• Write HTTP anywhere on the network
– Easy using HTML forms
– Except restricted ports, like 25 (SMTP)
• Read from origin only
– Can read some “library” formats from anywhere: JavaScript, CSS, images, applets, etc.
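As a rough illustration of the scheme://host:port definition above, the sketch below compares two URLs by origin. The helper names `origin` and `same_origin` and the `DEFAULT_PORTS` table are hypothetical, and only http and https default ports are handled; real browsers apply more rules than this.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes handled here (illustrative, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def same_origin(url_a, url_b):
    """True when both URLs share scheme, host name, and port."""
    return origin(url_a) == origin(url_b)

print(same_origin("http://example.com/a", "http://example.com:80/b"))  # True
print(same_origin("http://example.com/", "https://example.com/"))      # False
print(same_origin("http://example.com/", "http://attacker.com/"))      # False
```

Note that the comparison is on the host name, not on the IP address that the name resolves to.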

Desired Properties of Policy

• Can’t send spam
– Writes to port 25 blocked
• Can’t click advertisements
– Need to READ a token to make a click count
• Unfortunately…

DNS Rebinding Attacks

• Circumvent browser network access policy
• attacker.com points to attacker and target
• Can read and write sockets to anywhere

[Diagram labels: attacker’s server, target server, rebind DNS; messages shown:]
<policy-file-request/>
<allow-access-from domain="*" to-ports="*" />

An Experiment

• We ran a Flash ad (gains socket access)
– Paid $30
– 50,951 impressions from 44,924 unique IP addresses
• 90.6% of browsers vulnerable
– More if we include other rebinding attacks
• $100 to hijack 100,000 IP addresses
– No click required
– Impressions are cheap
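The cost figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the 90.6% vulnerability rate applies uniformly to the unique IP addresses and that cost scales linearly with the number of impressions bought, neither of which the slides state. All variable names are illustrative.

```python
# Figures reported on the slide.
cost_usd = 30.0
impressions = 50_951
unique_ips = 44_924
vulnerable_fraction = 0.906  # assumed to apply per unique IP

cost_per_impression = cost_usd / impressions
vulnerable_ips = unique_ips * vulnerable_fraction
cost_per_hijacked_ip = cost_usd / vulnerable_ips

# Linear extrapolation to 100,000 hijacked IPs (an assumption).
cost_per_100k = cost_per_hijacked_ip * 100_000

print(f"cost per impression:   ${cost_per_impression:.5f}")
print(f"cost per hijacked IP:  ${cost_per_hijacked_ip:.5f}")
print(f"cost per 100,000 IPs:  ${cost_per_100k:.2f}")
```

Under these assumptions the extrapolated cost lands below the rounded $100 figure quoted on the slide.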

Duration of IP Hijacking

A Long Tail

• Some impressions last for days

Using Rebinding for Click-Fraud

• Enroll as a publisher with ad network A
– Publish pay-per-click ads on your site
• Enroll as an advertiser with ad network B
– Buy pay-per-impression Flash ads
• Buy bots for $0.001 each
– Use 99% just to generate impressions on your site
– Use 1% to generate ad clicks on $0.50-per-click ads
– Multiply your money by 5, repeat
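The "multiply your money by 5" figure follows directly from the numbers on the slide, and seeing the arithmetic makes clear why detection matters. The sketch below assumes the publisher receives the full $0.50 for every click, with no revenue share or fraud filtering; the function name `fraud_round` and its parameters are illustrative.

```python
def fraud_round(budget_usd, bot_cost_usd=0.001, click_fraction=0.01,
                payout_per_click_usd=0.50):
    """Return the payout from one round of the scheme described on the slide."""
    bots = budget_usd / bot_cost_usd          # bots bought with the budget
    clicks = bots * click_fraction            # 1% of bots click an ad
    return clicks * payout_per_click_usd      # assumed full payout per click

budget = 100.0
revenue = fraud_round(budget)
print(f"spent ${budget:.2f}, received ${revenue:.2f}, "
      f"multiplier {revenue / budget:.1f}x")
```

With a $100 budget this is 100,000 bots, 1,000 clicks, and $500 in payouts, which is the factor of 5 quoted above.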

Implications for Click-Fraud Defense

• Simulates IP distribution exactly
– Each bot is an independent sample from web visitors
– Blacklisting IPs as bot-infested is meaningless
• Traffic time-appropriate for IP
– Human at that IP actually surfing the web right now
• HTTP headers appropriate for IP
– Grab real headers from request for Flash ad
– Can’t get cookies, but many networks don’t use them

Distinguish Bots from Humans

• Bots cannot simulate human cognition
• Can’t use traditional CAPTCHAs
– Too disruptive to the user experience
– User has no interest in proving their humanity
• Click-fraud detection is a different problem
– CAPTCHAs determine if this client is a human
– We just need to estimate the proportion of humans

A Straw-Man Design

• Humans click “Yes!”
• Bots click at random
• Ad network stats:
– 3487 Yes clicks
– 1271 No clicks
• How many bots?
– Expectation: 2542
– High-probability bound: an exercise for the reader
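The expectation of 2542 can be reproduced as follows, assuming "at random" means each bot clicks Yes or No with probability 1/2 and every human clicks Yes. Under that model only bots produce No clicks, so the bot count is about twice the number of No clicks. The function name `estimate_bots` is illustrative, and the standard error is a rough plug-in figure added here, not something the slides give.

```python
import math

def estimate_bots(yes_clicks, no_clicks):
    """Estimate the bot count when humans always click Yes and bots flip a fair coin."""
    total = yes_clicks + no_clicks
    est_bots = 2 * no_clicks                  # No clicks are half of all bot clicks
    est_humans = total - est_bots
    # No ~ Binomial(bots, 1/2), so Var(2 * No) = bots; plug in the estimate.
    std_err = math.sqrt(est_bots)
    return {
        "bots": est_bots,
        "humans": est_humans,
        "bot_fraction": est_bots / total,
        "std_err": std_err,
    }

result = estimate_bots(yes_clicks=3487, no_clicks=1271)
print(result["bots"], result["humans"])       # 2542 2216
print(round(result["bot_fraction"], 3))       # 0.534
```

The point of the design is that no individual click is classified; only the aggregate proportion is estimated.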

A Real Advertisement

• Where will humans click?
• Bots cannot simulate
• Can’t trick humans into clicking
– Actually need to process ad

Image Recognition Doesn’t Help

• Suppose the bot can identify the hot spots
– Say by segmenting the image using vision techniques
• In what ratio should the bot click?
– Depends on the relative appeal of the hot spots
– Requires human-level AI to get right
• Any error is a signal of the bot proportion

Fraudster Has to Click on Many Ads

Ad Network can Measure Humans

• At first, run ads on trusted partners
– Record distribution of human click location
– Easy to record (x, y) coordinates of click on web
• Cheap for ad network
– Was going to run ad anyway
• Expensive for attacker to influence
– Must use valuable bot clicks without payout
– Must be clicking everywhere all the time
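One way the recorded human distribution might be used is to compare it with the click locations seen on an untrusted publisher. The sketch below is not the authors' method: it bins (x, y) clicks into a grid and computes the total variation distance between the two histograms. If the observed clicks are a mixture of a fraction p of bots and 1 - p of humans, that distance is at most p, so (ignoring sampling noise) it gives a lower bound on the bot proportion. The function names, the grid size, and the sample data are all hypothetical.

```python
from collections import Counter

def histogram(clicks, ad_width, ad_height, grid=4):
    """Bin (x, y) clicks into a grid x grid histogram of relative frequencies."""
    if not clicks:
        raise ValueError("need at least one click")
    counts = Counter()
    for x, y in clicks:
        # Clamp so a click on the right or bottom edge falls in the last cell.
        col = min(int(x * grid / ad_width), grid - 1)
        row = min(int(y * grid / ad_height), grid - 1)
        counts[(row, col)] += 1
    total = len(clicks)
    return {cell: n / total for cell, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two histograms given as dicts."""
    cells = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cells)

# Made-up data: trusted-partner clicks cluster in one corner of a 100x100 ad;
# the untrusted publisher shows 20% of clicks somewhere humans never click.
reference = histogram([(10, 10)] * 100, ad_width=100, ad_height=100)
observed = histogram([(10, 10)] * 80 + [(90, 90)] * 20, ad_width=100, ad_height=100)
print(round(total_variation(reference, observed), 3))  # 0.2
```

A coarser or finer grid trades sensitivity against the number of clicks needed for a stable histogram; choosing it well would need real data.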

A Work in Progress

• Need to validate diversity in distribution
– Will run real ads and measure click location
– How does distribution vary by screen location of ad?
• Experiment with ad design
– Objective: human click location hard for bot to predict
• Text ads?
– Less area to click and less enticing visuals
– There still might be a valuable signal in click location

Conclusions

• BotNets 2.0 are coming
– Cheap, large-scale, ephemeral bots in the browser
– Don’t require full-machine compromise
– Heuristic click-fraud detection’s days are numbered
• Click location can divide humans from bots
– Accurate simulation requires human cognition
– Easy for ad networks to deploy
– More science needed to determine effectiveness

Thanks!