Internet Outbreaks - University of California, San Diego · 2007. 3. 13. · Content prevalence : W...



Internet Outbreaks
Epidemiology and Defenses

Geoffrey M. Voelker

Collaborative Center for Internet Epidemiology and Defenses (CCIED)
Computer Science and Engineering
UC San Diego

February 28, 2007

With David Anderson, Jay Chen, Cristian Estan, Chris Fleizach, Ranjit Jhala, Flavio Junqueira, Erin Kenneally, Justin Ma, John McCullough, David Moore, Vern Paxson (ICSI), Stefan Savage, Colleen Shannon, Sumeet Singh, Alex Snoeren, Stuart Staniford (Nevis), Amin Vahdat, Erik Vandekeift, George Varghese, Michael Vrable, Nick Weaver (ICSI), Qing Zhang

Paradise Lost

Our Goal
  Develop the understanding and technology to address large-scale subversion of Internet hosts

Yahoo! and UPF 2


Threat Transformation

Traditional threats
  Attacker manually targets high-value system/resource
  Defender increases cost to compromise high-value systems
  Biggest threat: insider attacker

Modern threats
  Attacker uses automation to target all systems at once (can filter later)
  Defender must defend all systems at once
  Biggest threats: software vulnerabilities & naïve users

Large-Scale Enablers

Unrestricted high-performance connectivity
  Large-scale adoption of IP model for networks & apps
  Internet is high-bandwidth, low-latency
  The Internet succeeded!

Software homogeneity & user naiveté
  Single bug → mass vulnerability in millions of hosts
  Trusting users (“ok”) → mass vulnerability in millions of hosts

Lack of meaningful deterrence
  Little forensic attribution/audit capability
  Effective anonymity
  No deterrence, minimal risk


Driving Economic Forces

Emergence of profit-making payloads
  Spam forwarding (MyDoom.A backdoor, SoBig), credit card theft (Korgo), DDoS extortion, (many) etc…
  “Virtuous” economic cycle transforms nature of threat

Commoditization of compromised hosts
  Fluid third-party exchange market (millions)
    » Going rate for spam proxying 3-10 cents/host/week
      Seems small, but 25k botnet gets you $40k-130k/yr
    » Raw bots, $0.01+/host, special orders ($50+)

Hosts effectively becoming a criminal platform
  Innovation in both host substrate and its uses
    Sophisticated infection and command/control networks
    DDoS, SPAM, piracy, phishing, identity theft are all applications
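The yearly figure quoted for a 25k botnet follows from the weekly per-host rate. A minimal sketch of that arithmetic, assuming the botnet is rented out for all 52 weeks of the year (the function name is illustrative, not from the talk):

```python
# Illustrative arithmetic only; the rates and botnet size are the figures
# quoted on the slide, and full-year rental is an assumption.
WEEKS_PER_YEAR = 52

def annual_revenue_usd(hosts: int, cents_per_host_week: float) -> float:
    """Yearly income from renting `hosts` bots at a weekly per-host rate in cents."""
    return hosts * cents_per_host_week / 100 * WEEKS_PER_YEAR

low = annual_revenue_usd(25_000, 3)    # 3 cents/host/week
high = annual_revenue_usd(25_000, 10)  # 10 cents/host/week
print(f"${low:,.0f} to ${high:,.0f} per year")
```

The low end comes out slightly under the slide's rounded $40k.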

Botnet Spammer Rental Rates

>20-30k always online SOCKs4, url is de-duped and updated every
>10 minutes. 900/weekly, Samples will be sent on request.
>Monthly payments arranged at discount prices.
  3.6 cents per bot week

>$350.00/weekly - $1,000/monthly (USD)
>Always Online: 5,000 - 6,000
>Updated every: 10 minutes
  6 cents per bot week

>$220.00/weekly - $800.00/monthly (USD)
>Always Online: 9,000 - 10,000
>Updated every: 5 minutes
  2.5 cents per bot week

September 2004 postings to SpecialHam.com, Spamforum.biz


Why Worms?

All of these “applications” depend on automated mechanisms for subverting large numbers of hosts
Self-propagating programs continue to be the most effective mechanism for host subversion
Prevent automated subversion → severely undermine phishing, DDoS, extortion, etc.

Our Goal: Develop the understanding and technology to address large-scale subversion of Internet hosts

Today

Worm outbreaks
  What are we up against?

Framing the worm problem…and solutions
  What are our options?

Two worm detection and monitoring techniques
  Fundamental basis for understanding and defending against large-scale Internet attacks
  EarlyBird: High-speed network-based content sifting
  Potemkin: Large-scale high-fidelity honeyfarm

Current projects


Network Telescopes

Idea: Unsolicited packets are evidence of global phenomena
  Backscatter: response packets sent by victims provide insight into global prevalence of DoS attacks (and who is getting attacked)
  Scans: request packets can indicate an infection attempt from a worm (and who is currently infected, growth rate, etc.)

Very scalable: CCIED Telescope monitors 17M+ IP addrs (> 1% of all routable addresses of the Internet)

2001: A DoS Odyssey

Inferring global Internet DoS attacks using backscatter
  4,000 DoS attacks/week, everyone a victim, intense, periodic

Moore et al., Inferring Internet Denial of Service Activity, USENIX Security, 2001
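Backscatter analysis turns a local observation into a global estimate. A minimal sketch, assuming the attacker spoofs source addresses uniformly at random across IPv4 so that a telescope covering n addresses sees about n / 2^32 of a victim's responses; the function name and the example numbers are hypothetical:

```python
ADDRESS_SPACE = 2 ** 32  # all IPv4 addresses

def estimated_attack_rate(observed_pps: float, monitored_addrs: int) -> float:
    """Scale the backscatter rate seen at the telescope up to the whole address space.

    Assumes uniformly random spoofed sources; the result is a lower bound
    if the victim drops or fails to answer some attack packets.
    """
    return observed_pps * ADDRESS_SPACE / monitored_addrs

# Hypothetical example: a /8 telescope (2**24 addresses) seeing 10 packets/sec
rate = estimated_attack_rate(10, 2 ** 24)
print(f"victim is receiving at least ~{rate:,.0f} attack packets/sec")
```

A /8 covers 1/256 of the address space, so every packet seen at the telescope stands for roughly 256 sent by the victim.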


2001: A Worm Odyssey

CodeRed worm released in July 2001
  Exploited buffer overflow in Microsoft IIS
  Infects 360,000 hosts in 14 hours (CRv2)
    » Propagation is limited by latency of TCP handshake

Moore et al., CodeRed: a Case study on the Spread of an Internet Worm, IMW 2002, and Staniford et al., How to 0wn the Internet in your Spare Time, USENIX Security 2002

Fast Worms

Slammer/Sapphire released in January 2003
  First ~1 min behaves like classic scanning worm
    » Doubling time of ~8.5 seconds
  >1 min worm saturates access bandwidth
    » Some hosts issue > 20,000 scans/sec
    » Self-interfering
  Peaks at ~3 min
    » >55 million IP scans/sec
  90% of Internet scanned in <10 mins

Moore et al., The Spread of the Sapphire/Slammer Worm, IEEE Security & Privacy, 1(4), 2003


Was Slammer really fast?

Yes, it was orders of magnitude faster than CodeRed
No, it was poorly written and unsophisticated
Who cares? It is literally an academic point
  The current debate is whether one can get < 500ms
  Bottom line: way faster than people!

Staniford et al., The Top Speed of Flash Worms, ACM WORM, 2004

Understanding Worms

Worms are well modeled as infectious epidemics
  Homogeneous random contacts

Classic SI model
  N: population size
  S(t): susceptible hosts at time t
  I(t): infected hosts at time t
  β: contact rate
  i(t): I(t)/N, s(t): S(t)/N

$$ i(t) = \frac{e^{\beta (t - T)}}{1 + e^{\beta (t - T)}} $$

Staniford, Paxson, Weaver, How to 0wn the Internet in Your Spare Time, USENIX Security 2002
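The closed-form curve can be checked against the SI dynamics numerically. A minimal sketch, assuming the standard SI differential equation di/dt = β i (1 − i), which the slide does not spell out, and treating T as the time at which half the population is infected; the values of β and T are hypothetical:

```python
import math

def infected_fraction(t, beta, T):
    """Closed-form SI solution: i(t) = e^(beta (t - T)) / (1 + e^(beta (t - T)))."""
    x = math.exp(beta * (t - T))
    return x / (1.0 + x)

def simulate_si(i0, beta, t_end, dt=0.001):
    """Forward-Euler integration of di/dt = beta * i * (1 - i)."""
    i = i0
    for _ in range(int(round(t_end / dt))):
        i += beta * i * (1.0 - i) * dt
    return i

beta, T = 1.8, 10.0                       # hypothetical contact rate and midpoint time
i0 = infected_fraction(0.0, beta, T)      # tiny initial infected fraction
closed = infected_fraction(T, beta, T)    # one half at t = T
numeric = simulate_si(i0, beta, T)        # should land close to one half
print(i0, closed, numeric)
```

The long, nearly flat start followed by a sudden takeoff is what makes early detection hard: most of the outbreak's lifetime is spent at infected fractions too small to notice.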


What Can We Do?

1) Reduce number of susceptible hosts S(t)
   Prevention
2) Reduce number of infected hosts I(t)
   Treatment
3) Prepare for the inevitable N
   Survival
4) Reduce the contact rate β
   Containment

Prevention

Reduce # of susceptible hosts S(t)

Software quality: eliminate vulnerability
  Static/dynamic testing [e.g., Cowan, Wagner, Engler]
  Active research community, taken seriously in industry
    » Security code review alone for Windows Server 2003 ~ $200M
  Traditional problems: soundness, completeness, usability

Software updating: reduce window of vulnerability
  Most worms exploit known vulnerability (10 days → 6 months)
    » Sapphire: Vulnerability & patch July 2002, worm January 2003
  Some activity (Shield [Wang04]), yet critical problem
  Is finding security holes a good idea? [Rescorla04]

Software heterogeneity: reduce impact of vulnerability
  Artificial heterogeneity [Forrest02]
  Exploit existing heterogeneity [Junqueira05]


Treatment

Reduce # of infected hosts I(t)

Disinfection: Remove worm from infected hosts
  Develop specialized “vaccine” in real-time
  Distribute at competitive rate
    » Counter-worm, anti-worm: Code Green, CRclean, Worm vs. Worm [Castaneda04]
    » Exploit vulnerability, patch host, propagate
  Seems tough [Weaver06]
    » Legal issues of using exploits, even if well-intentioned
    » Propagation race problem

Automatically patch vulnerability [Keromytis03], [Sidiroglou05]
  Auto-generate and test patches in sandbox
  Apply within administration domain
  Requires source, targets known exploits (e.g., overflows)

Survival

Prepare for inevitable
  Game of escalation

Approach: Informed replication
  Worms represent large-scale dependent failures
  Model software configurations → model dependent failures
  Replicate data on hosts with disjoint configurations
  Exploit existing software heterogeneity
  Even with software skew, only need 3 replicas

Phoenix
  Cooperative backup system using informed replication

[Junqueira et al., Surviving Internet Catastrophes, USENIX 2005]
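The idea of replicating on hosts with disjoint configurations can be illustrated with a greedy selection. This is a simplified sketch, not Phoenix's actual algorithm: each host is described by a hypothetical set of software attributes, and a replica is accepted only if it shares no attribute with the owner or with replicas already chosen, so no single vulnerability covers two copies.

```python
def pick_replicas(owner_attrs, candidates, k):
    """Greedily pick up to k hosts whose attributes overlap with nothing chosen so far.

    owner_attrs: set of software attributes of the data owner (hypothetical labels)
    candidates:  list of (host_name, attribute_set) pairs
    """
    used = set(owner_attrs)
    chosen = []
    for name, attrs in candidates:
        if len(chosen) == k:
            break
        if used.isdisjoint(attrs):
            chosen.append(name)
            used |= attrs
    return chosen

owner = {"windows", "iis"}
hosts = [
    ("a", {"windows", "apache"}),   # shares the owner's OS, skipped
    ("b", {"linux", "apache"}),     # disjoint, chosen
    ("c", {"bsd", "apache"}),       # shares a web server with b, skipped
    ("d", {"solaris", "nginx"}),    # disjoint, chosen
]
replicas = pick_replicas(owner, hosts, k=2)
print(replicas)
```

A greedy pass can miss a valid set that a smarter search would find, and strict pairwise disjointness is stronger than strictly needed; it is used here only because it is easy to reason about.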


Reactive Containment

Reduce contact rate β

Slow worm down
  Throttle connection rate to slow spread [Twycross03]
  Important capability, but worm still spreads…

Quarantine
  Detect and block worm
  How feasible is it?

Defense Requirements

Any reactive defense is defined by:
  Reaction time: how long to detect worm, propagate information, and activate response
  Containment strategy: how malicious behavior is identified
  Deployment scenario: who participates in the system

Given these, what are the engineering requirements for any effective defense?

[Moore et al., Internet Quarantine: Requirements for Containing Self-Propagating Code, Infocom 2003]


Containment Requirements

Universal deployment for Code Red
  Address filtering (blacklists), must respond < 25 mins
  Content filtering (signatures), must respond < 3 hours

For faster worms (Slammer): seconds
Worse for non-universal deployment…

Bottom line: very challenging (at global scale)

[Figure: reaction time vs. propagation rate (probes/sec)]
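Why faster worms shrink the reaction window can be seen by inverting the SI curve. A minimal sketch, assuming the simple SI model from the earlier slide with a perfect defense that stops all spread the moment it activates; this is far cruder than the simulations in the cited paper, and the parameter values are hypothetical:

```python
import math

def infected_fraction(t, beta, i0):
    """SI-model infected fraction at time t, starting from fraction i0 at t = 0."""
    growth = i0 * math.exp(beta * t)
    return growth / (1.0 - i0 + growth)

def reaction_deadline(beta, i0, limit):
    """Latest response time that still keeps the infected fraction at or below `limit`."""
    return math.log(limit * (1.0 - i0) / (i0 * (1.0 - limit))) / beta

i0 = 1e-6                                                  # hypothetical initial infected fraction
slow = reaction_deadline(beta=1.0, i0=i0, limit=0.01)      # slower worm
fast = reaction_deadline(beta=100.0, i0=i0, limit=0.01)    # worm with a 100x higher contact rate
print(slow, fast, slow / fast)
```

In this model the deadline is inversely proportional to β, so a worm that spreads 100 times faster leaves one hundredth of the time to react.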

Scalable Detection and Monitoring

Detection and monitoring are fundamental for understanding and defending against worms

Lessons from containment
  Need to detect worms in less than a second
  How can we do this?

Know thy enemy
  What does the worm/virus/bot do?
  Who is controlling it?


Signature Inference

Challenge: In less than a second…
  Detect worm probes
  Characterize worm packets with a byte signature

Approach
  Monitor network
  Identify packets with common strings spreading like a worm
  Use signature for content filtering

Content Sifting

Assume unique, invariant string W for all worm probes
  Works today, but not forever

Consequences
  Content prevalence: W more common in worm traffic
  Address dispersion: traffic with W has many distinct src/dests

Content Sifting
  Identify W with high prevalence and high dispersion
  Use W as filter signature in network

[Singh et al., Automated Worm Fingerprinting, OSDI 2004]
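The two tests, prevalence and dispersion, can be sketched with exact tables. This is a naive illustration under assumed names and thresholds, not the EarlyBird implementation: it counts every fixed-length substring and tracks the distinct sources and destinations that carried it, which is exactly the memory-hungry approach the next slide sets out to avoid.

```python
from collections import defaultdict

def sift(packets, length=8, prevalence_min=3, dispersion_min=3):
    """Return substrings that are both prevalent and widely dispersed.

    packets: iterable of (src, dst, payload_bytes); thresholds are hypothetical.
    """
    count = defaultdict(int)   # substring -> number of occurrences (prevalence)
    srcs = defaultdict(set)    # substring -> distinct source addresses
    dsts = defaultdict(set)    # substring -> distinct destination addresses
    for src, dst, payload in packets:
        for i in range(len(payload) - length + 1):
            w = payload[i:i + length]
            count[w] += 1
            srcs[w].add(src)
            dsts[w].add(dst)
    return {
        w for w in count
        if count[w] >= prevalence_min
        and len(srcs[w]) >= dispersion_min
        and len(dsts[w]) >= dispersion_min
    }

W = b"EXPLOIT!"  # stand-in for the invariant worm string
traffic = [
    ("10.0.0.1", "10.0.1.1", b"aaa" + W + b"x1"),
    ("10.0.0.2", "10.0.1.2", b"bb" + W + b"y22"),
    ("10.0.0.3", "10.0.1.3", b"c" + W + b"zzz3"),
] + [("10.0.0.9", "10.0.1.9", b"GET /index.html")] * 5  # popular but not dispersed
signatures = sift(traffic)
print(signatures)
```

The repeated benign request is prevalent but involves a single source and destination, so only W passes both tests.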


Content Sifting in EarlyBird

Challenges: Time and space
  Must touch every byte in all packets (1 Gbps → 12 us/packet)
  Simple algorithm consumes 100 MB/s of memory

Approach: Careful algorithms and data structures
  Incremental hash functions
  Value-based sampling
  Multi-stage filters and multi-resolution and counting bitmaps
  Combined: 60 us/packet in software

Works well in practice
  Deployed at UCSD CSE for 8 months
  Detected every worm outbreak reported on security lists
  Identified unknown worms (Kibvu, Sasser)
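Two of the listed techniques fit in a few lines. The sketch below uses a polynomial rolling hash as a stand-in for the incremental hash (the base, modulus, and sampling mask are arbitrary assumptions), updating the hash of each substring in constant time from the previous one, and then keeps only substrings whose hash has its low bits clear, which is the essence of value-based sampling.

```python
BASE = 257
MOD = (1 << 61) - 1  # a large prime modulus (assumed choice)

def direct_hash(chunk: bytes) -> int:
    """Hash a substring from scratch, for comparison with the incremental version."""
    h = 0
    for b in chunk:
        h = (h * BASE + b) % MOD
    return h

def rolling_hashes(data: bytes, length: int):
    """Yield (offset, hash) for every `length`-byte substring, O(1) work per step."""
    if len(data) < length:
        return
    top = pow(BASE, length - 1, MOD)     # weight of the byte leaving the window
    h = direct_hash(data[:length])
    yield 0, h
    for i in range(length, len(data)):
        h = ((h - data[i - length] * top) * BASE + data[i]) % MOD
        yield i - length + 1, h

def sampled(data: bytes, length: int, mask: int = 0x3F):
    """Value-based sampling: keep only substrings whose hash has its low bits clear."""
    return [(off, h) for off, h in rolling_hashes(data, length) if h & mask == 0]

payload = b"some packet payload carrying a worm body"
hashes = list(rolling_hashes(payload, 8))
kept = sampled(payload, 8)
```

Because the sampling decision depends only on the substring's value, a given string is either always tracked or never tracked, wherever and whenever it appears, so a repeating worm string is not missed by unlucky random sampling.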

Tech Transfer

Content sifting technologies patented by UC and licensed to startup, Netsift Inc.
Netsift significantly improved performance, features
  Hardware implementation, new capabilities
In June 2005, Netsift was acquired by Cisco


Going Further

Network telescopes, content sifting have limitations
  Passive observation, no interaction with malware
  Lexical domain is limited
    » Evasion through polymorphism, protocol framing, encryption

Want to answer deeper questions
  What does a worm/virus/bot do?
  What vulnerabilities are exploited, and how?
  Who is controlling it, how is it controlled?

Alternative: Endpoint monitoring

Scalability/Fidelity Tradeoff

[Figure: spectrum of monitoring approaches, from most scalable to highest fidelity]
  Network Telescopes (passive)
  Telescopes + Responders (iSink, honeyd, Internet Motion Sensor)
  VM-based Honeynet (e.g., Collapsar)
  Live Honeypot


Can We Achieve Both?

Naïve approach: one machine per IP address
  1M addresses = 1M hosts = $2B+ investment
  Overkill… most resources will be wasted

In truth, only necessary to maintain the illusion of continuously live honeypot systems

Maintain illusion on the cheap using
  Network multiplexing
  Host multiplexing

Network-Level Multiplexing

Most addresses are idle at any given time
  Late bind honeypots to IP addresses

Most traffic does not cause an infection
  Recycle honeypots if can’t detect anything interesting
  Only maintain honeypots of interest for extended periods

One honeypot for every 100-1000 IP addresses
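Late binding and recycling can be sketched as a small bookkeeping structure at the gateway. This is an illustrative model with hypothetical class and method names, not the Potemkin gateway itself: a honeypot slot is assigned to an address only when traffic for it arrives, and is returned to the pool when nothing interesting was observed.

```python
class Gateway:
    """Maps destination IPs to a small pool of honeypot slots on demand."""

    def __init__(self, pool_size: int):
        self.free = list(range(pool_size))  # idle honeypot slots
        self.bound = {}                     # destination IP -> honeypot slot

    def on_packet(self, dst_ip: str):
        """Bind a honeypot to dst_ip on first contact; return None if the pool is exhausted."""
        if dst_ip not in self.bound:
            if not self.free:
                return None
            self.bound[dst_ip] = self.free.pop()
        return self.bound[dst_ip]

    def recycle(self, dst_ip: str) -> None:
        """Release the honeypot for dst_ip, e.g. when no infection was detected."""
        slot = self.bound.pop(dst_ip, None)
        if slot is not None:
            self.free.append(slot)

gw = Gateway(pool_size=2)
first = gw.on_packet("10.0.0.1")      # binds a slot on first contact
again = gw.on_packet("10.0.0.1")      # same slot: the illusion of a persistent host
second = gw.on_packet("10.0.0.2")
refused = gw.on_packet("10.0.0.3")    # pool exhausted
gw.recycle("10.0.0.1")                # nothing interesting seen, free the slot
reused = gw.on_packet("10.0.0.3")     # the freed slot now serves a new address
```

A real gateway would also need timeouts and a policy for keeping infected honeypots alive; both are omitted here.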


Host-level multiplexing

CPU utilization in each honeypot is quite low (<<1%)
  Use VMM to multiplex honeypots on single machine
  Done in practice, but limited by memory bottleneck

Memory coherence property
  Few memory pages are actually modified in input
  Share unmodified pages between VMs → copy-on-write

One physical machine for 100-1000 honeypots

Potemkin: A High-Fidelity, Large-Scale Honeyfarm

Gateway: Multiplexes traffic onto VM honeypots
Potemkin VMM: Multiplexes VMs on servers

Vrable et al., Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm, SOSP 2005


Potemkin VMM

Modified Xen using shadow translate mode
  Integrated into VT for Windows support

Clone manager instantiates frozen VM image and keeps it resident in physical memory
  Flash cloning: memory instantiated via eager copy of PTE pages and lazy faulting of data pages (no software startup)
  Delta virtualization: copy implemented as copy-on-write (no memory overhead for shared code/data)

Supports hundreds of simultaneous VMs per host

Overhead: currently takes 200-500ms to create new VM
  Imperceptible to human user and under TCP handshake timeout
  Wildly unoptimized (e.g., includes multiple Python invocations)
    » Pre-allocated VMs can be invoked in ~5ms
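The memory saving from delta virtualization comes from copy-on-write sharing. A toy model, assuming memory is a sequence of 4 KB pages and using hypothetical class and method names; the real mechanism lives in the VMM's page tables, not in application code:

```python
PAGE_SIZE = 4096

class CloneVM:
    """A clone that shares a read-only reference image and stores only its own writes."""

    def __init__(self, image):
        self.image = image   # shared reference image: a tuple of pages, never modified
        self.delta = {}      # page number -> private copy, created on first write

    def read(self, page: int) -> bytes:
        return self.delta.get(page, self.image[page])

    def write(self, page: int, data: bytes) -> None:
        # Copy-on-write: the clone starts consuming its own memory only here.
        self.delta[page] = data

    def private_bytes(self) -> int:
        return len(self.delta) * PAGE_SIZE

image = tuple(bytes([n % 256]) * PAGE_SIZE for n in range(1024))  # 4 MB reference image
vm_a, vm_b = CloneVM(image), CloneVM(image)   # creating a clone copies no page data
vm_a.write(7, b"\xff" * PAGE_SIZE)            # vm_a diverges on a single page
print(vm_a.private_bytes(), vm_b.private_bytes())
```

Each additional clone costs memory proportional to the pages it modifies rather than to the image size, which is what lets one server hold hundreds of honeypots.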

Summary

Internet hosts are highly vulnerable to worm outbreaks
  Millions of hosts can be “taken” before anyone realizes
  Supports vibrant ecosystem of criminal activity

Containment (Quarantine) requires automated response

Prevention is a critical element, but outbreaks inevitable
  Need scalable detection, can also plan to survive (Phoenix)

Different detection strategies, monitoring approaches
  High-speed network-based content sifting (EarlyBird)
  Large-scale high-fidelity honeyfarm (Potemkin)

Smart bad guys still have a huge advantage
  Escalation: Rapid innovation in both problems and solutions


Underground Economy

Acquisition, trade, liquidation of illicit digital goods
  ccard, phishing, bots, malware, scams, …
  Online markets, market enablers, cash out, …

Hypothesis: Understanding the underground economy will help us develop/target technology
  Where are economic bottlenecks? Where is value-chain brittle? Where are participants exposed? Transaction volume/price dynamism?

Data sources
  Spam, IRC feeds/Web forums, phishing drop sites
  Have one spam feed (200K/day), developing relationships for others…but always looking for more data

Spamscatter

Monitor scam sites advertised in spam
  Extract URLs to scams from spam
  Probe, download pages for a week

Identify multiple sites for the same scam
  Image shingling: tolerates ad rotation, etc.

Workload
  Spam from 4-letter TLD (200,000 spams/day)

What do we find?
  2,300 scams/week
  60% scam sites in U.S.
    » (vs. 13% spam relays)
  Only 10% scams “malicious”
    » (vs. pharm, s/w, merchandise, etc.)
  38% sites hosted multiple scams
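The tolerance to ad rotation comes from comparing pages chunk by chunk rather than as a whole. A rough sketch of the idea, assuming a page screenshot is available as raw bytes and that fixed-size chunks are hashed and compared as sets; the chunk size, the hash, and the similarity measure are all assumptions, not the project's actual parameters:

```python
import hashlib

def shingles(pixels: bytes, chunk: int = 64) -> set:
    """Hash each fixed-size chunk of the image data."""
    return {
        hashlib.sha1(pixels[i:i + chunk]).hexdigest()
        for i in range(0, len(pixels), chunk)
    }

def similarity(a: bytes, b: bytes) -> float:
    """Fraction of chunk hashes shared, relative to the smaller image."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / min(len(sa), len(sb))

# Two hypothetical screenshots that differ only in one region (a rotated ad)
chunks = [bytes([n]) * 64 for n in range(10)]
page_a = b"".join(chunks)
page_b = b"".join(chunks[:3] + [bytes([200]) * 64] + chunks[4:])
score = similarity(page_a, page_b)
print(score)
```

A whole-image hash would treat these two pages as unrelated, while the chunked comparison still scores them as nearly identical, so sites serving the same scam can be grouped.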


Other Projects

Self-moderating outbreaks (get 80% and stop) [Ma05]
Prevalence of polymorphism in exploits [Ma06]
Forensics with honeyfarm
  Network dynamics, network and host support
Data-centric attribution and policy enforcement
  These files should not leave the corporate network
  These files always need to be encrypted on disk/network
  And any objects derived from them (emails w/ attachments, cut-and-paste, etc.)
  Use generalized taint mechanisms and virtual machines
Privacy-preserving packet attribution
  Attribution: routers, hosts can verify packet sources
  Privacy: …but not reveal contents
  Attribution one step towards deterrence

For More Info…

http://www.ccied.org