Generating Client Workloads and High-Fidelity Network Traffic for Controllable, Repeatable Experiments in Computer Security

Charles Wright¹, Chris Connelly¹, Tim Braje¹, Jesse Rabek¹˒², Lee Rossey¹, and Rob Cunningham¹

¹ MIT Lincoln Laboratory

² Palm, Inc.

Presented at RAID, September 16, 2010, Ottawa, CA

CVW 9/14/10

This work was supported by the US Air Force under Air Force contract FA8721-05-C-0002. The opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Problem: How to do science in computer security?

•  Hypotheses must be falsifiable

–  Properties must be observable and measurable

•  Experiments must be controllable

•  Results must be repeatable and reproducible

Image courtesy XKCD: http://store.xkcd.com/

S. Peisert and M. Bishop. “How to Design Computer Security Experiments”, WISE 2007.

Challenges: Incredible Complexity

•  High-dimensional, non-continuous spaces

•  Constantly changing

–  Hard to repeat or reproduce results

•  Hard to control

–  Especially when human users are involved

Photo courtesy Peter Alfred Hess via Flickr http://www.flickr.com/photos/peterhess/

Related Work

•  Network Testbeds

–  PlanetLab [Chun03], Emulab [White02], FlexLab [Ricci07], ModelNet [Vahdat02], VINI [Bavier06], NSF’s GENI

•  Security Testbeds

–  DETER [Benzel06], LARIAT [Rossey01], DARPA’s National Cyber Range

•  Dynamic Application-layer Replay

–  RolePlayer [Cui06], Predator [Small08]

•  Client-side Automation

–  Strider HoneyMonkeys [Wang06], MITRE HoneyClient [Wang05]

Network Security Testbed Overview

[Figure: testbed overview diagram. Labels: Internet; Enterprise Network; Real Sites and Content; Emulated Internet Users and Hosts; LARIAT; Physical Testbed Cluster; Real Operating Systems; Real Apps; Emulated Local Users; User models.]

*All trademarks are property of their respective owners.

Our Contributions

•  Client-side workload generation techniques

–  Statistical models of human users

–  Enables controllable, repeatable experiments with client-side applications

•  Dynamic application-layer replay techniques for emulating the web

–  Provides a façade of connectedness on an isolated testbed network

–  Enables realistic use of client-side applications

•  An example experiment

–  To illustrate the utility of our techniques and highlight some remaining challenges

Client-Side Workload Generation

•  Approach: Drive real, unmodified Windows GUI applications via the Microsoft COM APIs

–  Inject events for mouse clicks and keyboard text input

•  Benefits

–  Realistic network traffic, including all the quirks of real protocol implementations

–  Real applications with real vulnerabilities allow for testing of real attacks and defenses in a controlled environment

•  Challenges

–  Which events to inject, and when?

–  On an isolated testbed, what do networked apps talk to?

Example Scenario: Email Worm

•  W32.Mixor is a mass-mailing worm that

–  Stops security-related processes on the compromised host

–  Harvests email addresses from the file system

–  Sends infectious emails to the addresses

[Figure: layered diagram. Labels: Hardware; Operating Systems; Applications; Virtual Users.]

Workload Generation Overview

Application User State Machines (AUSMs)

•  An AUSM is a Markov chain model for the sequence of COM events generated by a user of the given application

–  Captures probabilities for:

   ▪  State transitions

   ▪  Emissions (outputs)

•  Benefits:

–  Compact representation

–  Well-known algorithms for learning parameters and for synthesizing data from models

–  Repeatable: Same random seed → same outputs

–  Controllable: Experimenter can easily modify AUSM parameters and re-test
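
To make the state-transition and emission probabilities concrete, here is a minimal sketch of a toy AUSM for an email client. The state names, event names, and probabilities are invented for illustration and are not taken from the authors' models; the point is only that a fixed seed yields the same synthetic event trace every time.

```python
import random

# Hypothetical toy AUSM: every state, event, and probability below is
# an illustrative assumption, not a value from the original work.
TRANSITIONS = {
    "idle":    {"read": 0.6, "compose": 0.3, "idle": 0.1},
    "read":    {"idle": 0.5, "compose": 0.2, "read": 0.3},
    "compose": {"idle": 0.7, "read": 0.3},
}
EMISSIONS = {
    "idle":    {"no_op": 1.0},
    "read":    {"open_message": 0.8, "scroll": 0.2},
    "compose": {"type_text": 0.9, "send": 0.1},
}


def draw(rng, dist):
    """Sample one outcome from a {outcome: probability} mapping."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def synthesize(seed, steps, start="idle"):
    """Walk the chain for `steps` steps, returning (state, event) pairs."""
    rng = random.Random(seed)  # private PRNG, so runs are repeatable
    state, trace = start, []
    for _ in range(steps):
        trace.append((state, draw(rng, EMISSIONS[state])))
        state = draw(rng, TRANSITIONS[state])
    return trace
```

Editing a single entry in `TRANSITIONS` or `EMISSIONS` and re-running with the same seed is the kind of controlled change the "Controllable" bullet describes.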

Learning AUSM Parameters

•  Collected data from real users in our research group

–  Used DETOURS framework from Microsoft Research

–  Recorded all COM events on our workstations for one week

•  Split the event stream by application

•  Train a Markov chain model for each application’s events
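
The split-then-train steps can be sketched as follows. This is a simplified illustration that treats each recorded event as its own state and estimates transition probabilities by counting adjacent pairs; the record format `(application, event)` and the sample events are assumptions, and the authors' actual training procedure may differ.

```python
from collections import Counter, defaultdict


def split_by_application(events):
    """Group a merged (application, event) stream into per-app sequences."""
    streams = defaultdict(list)
    for app, event in events:
        streams[app].append(event)
    return dict(streams)


def train_markov_chain(sequence):
    """Estimate P(next | current) from counts of adjacent event pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values())
                  for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical recorded stream; names are illustrative only.
recorded = [
    ("word", "open"), ("browser", "navigate"), ("word", "type"),
    ("word", "type"), ("browser", "click"), ("word", "save"),
    ("browser", "navigate"), ("word", "type"),
]
models = {app: train_markov_chain(seq)
          for app, seq in split_by_application(recorded).items()}
```

With real data, rarely seen transitions would likely need smoothing; that is omitted here to keep the counting step visible.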

Generating Workloads with AUSMs

•  Use a PRNG to traverse the state machine model

–  Seed host’s PRNG with a hash of Host ID and Experiment ID

–  Each host produces a distinct set of actions

–  Actions in an experiment are repeatable across runs

•  Second-level models generate outputs from each state

–  E.g., n-gram model for English as input for Word documents
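
A minimal sketch of the seeding scheme, plus a toy second-level model, is shown below. The choice of SHA-256, the `experiment:host` string layout, and the bigram generator are all assumptions made for illustration; the slides say only that the seed is a hash of the Host ID and Experiment ID and that an n-gram model supplies text.

```python
import hashlib
import random


def host_rng(host_id, experiment_id):
    """Return a PRNG seeded from a hash of the experiment and host IDs."""
    material = f"{experiment_id}:{host_id}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def bigram_text(rng, corpus, length):
    """Generate `length` words from a bigram model built over `corpus`."""
    words = corpus.split()
    followers = {}
    for a, b in zip(words, words[1:]):
        followers.setdefault(a, []).append(b)
    word = rng.choice(words)
    out = [word]
    for _ in range(length - 1):
        # Fall back to any corpus word if the current word has no follower.
        word = rng.choice(followers.get(word) or words)
        out.append(word)
    return " ".join(out)


def first_draws(host_id, experiment_id, n=5):
    """First n PRNG outputs for a host, used to compare runs and hosts."""
    rng = host_rng(host_id, experiment_id)
    return [rng.random() for _ in range(n)]
```

Because the seed depends on both identifiers, two hosts in the same experiment diverge, while re-running the same experiment on the same host reproduces its actions.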

Emulating the Web

1.  Use a HoneyClient to download pages on the real Internet

2.  Capture packets and extract TCP sessions; parse HTTP requests and responses

3.  Replay HTTP responses verbatim to satisfy clients’ requests on the testbed

Use dynamic application-layer replay to construct a reasonable façade of the world-wide web
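
Steps 2 and 3 above can be sketched as a lookup table from parsed requests to recorded response bytes. This is a deliberately simplified illustration: it matches on method, Host header, and request target only, and the 404 fallback for unseen requests is an assumption. The authors' replay is described as dynamic, so it presumably handles more variation than this exact-match store.

```python
# Assumed fallback for requests that were never captured.
NOT_FOUND = b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"


def request_key(raw_request):
    """Extract (method, host, target) from a raw HTTP/1.x request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower()
    return method, host, target


class ReplayStore:
    """Maps captured requests to the response bytes seen on the Internet."""

    def __init__(self):
        self._responses = {}

    def record(self, raw_request, raw_response):
        """Store a captured request/response pair."""
        self._responses[request_key(raw_request)] = raw_response

    def replay(self, raw_request):
        """Return the recorded response verbatim, or the fallback."""
        return self._responses.get(request_key(raw_request), NOT_FOUND)
```

Keying on the parsed request rather than its raw bytes lets a testbed client whose other headers differ from the original capture still receive the recorded page.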

An Example Experiment

•  Goal:

–  Quantify the impact of anti-virus software on system resource consumption

•  Approach

–  Run the same workload with and without AV

   ▪  Only tested an open source AV system so far

   ▪  Can repeat tests with commercial systems in the future

–  Take care to get repeatable results

–  Measure resource consumption for each scenario, and compare

Experiment Testbed Network

Experimental Procedure

1.  Prepare Systems for Test Run

   a)  Revert disk images on SUT and INTERNET

   b)  Revert system clocks on SUT and INTERNET

   c)  Reboot SUT laptop into Windows environment

   d)  Seed PRNGs using master experiment seed

   e)  Start PCP performance logging service

2.  Execute Test Run

   a)  Start AUSM-based client workload generation

   b)  Let workload generation run for 2 hours

   c)  Stop AUSM-based client workload generation

3.  Collect Results

   a)  Stop PCP performance logging service

   b)  Archive performance logs

   c)  Reboot SUT laptop into Linux environment

4.  GOTO 1.
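
The procedure above is a fixed sequence repeated once per trial, which can be sketched as a small driver loop. The step names are shorthand for the slide's steps, the `perform` callback is a hypothetical hook where real testbed commands would go, and the explicit trial count replaces the open-ended "GOTO 1"; none of this reflects the authors' actual tooling.

```python
# Step names abbreviate the procedure on the slide; they are labels only.
PROCEDURE = [
    ("prepare", ["revert_disk_images", "revert_system_clocks",
                 "reboot_sut_into_windows", "seed_prngs",
                 "start_pcp_logging"]),
    ("execute", ["start_workload", "run_for_two_hours", "stop_workload"]),
    ("collect", ["stop_pcp_logging", "archive_logs",
                 "reboot_sut_into_linux"]),
]


def run_trials(num_trials, perform):
    """Run the full procedure num_trials times.

    perform(trial, phase, step) is a caller-supplied callable that
    carries out one step; here it is a hypothetical hook.
    """
    for trial in range(num_trials):
        for phase, steps in PROCEDURE:
            for step in steps:
                perform(trial, phase, step)


# Dry run: record the order of steps instead of touching any machine.
log = []
run_trials(2, lambda trial, phase, step: log.append((trial, phase, step)))
```

Keeping the step list as data makes it easy to check that every trial follows the same order, which is part of what makes the results comparable across runs.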

Empirical Results (1)

•  System with AV consumes free RAM sooner

Empirical Results (2)

•  System with AV taxes the CPU much more heavily

Summary

•  New techniques help conduct controlled, repeatable experiments involving client-side security

–  Client-side workload generation

–  Emulating the web

•  “Hard science” is still hard. Challenges remain in:

–  Formulating good, testable hypotheses

–  Making experiments reproducible by others

–  Reducing test artifacts

–  Instrumentation – How to collect reliable data?

–  Developing better models for human users

Thanks!

Backup Slides