
1 CVW 9/14/10

MIT Lincoln Laboratory

Generating Client Workloads and High-Fidelity Network Traffic for Controllable, Repeatable Experiments in Computer Security

Charles Wright¹, Chris Connelly¹, Tim Braje¹, Jesse Rabek¹,², Lee Rossey¹, and Rob Cunningham¹

¹MIT Lincoln Laboratory

²Palm, Inc.

Presented at RAID, September 16, 2010, Ottawa, Canada

This work was supported by the US Air Force under Air Force contract FA8721-05-C-0002. The opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.


Problem: How to do science in computer security?

•  Hypotheses must be falsifiable

–  Properties must be observable and measurable

•  Experiments must be controllable

•  Results must be repeatable and reproducible

Image courtesy XKCD: http://store.xkcd.com/

S. Peisert and M. Bishop. “How to Design Computer Security Experiments”, WISE 2007.



Challenges: Incredible Complexity

•  High-dimensional, non-continuous spaces

•  Constantly changing

–  Hard to repeat or reproduce results

•  Hard to control

–  Especially when human users are involved

Photo courtesy Peter Alfred Hess via Flickr http://www.flickr.com/photos/peterhess/



Related Work

•  Network Testbeds

–  PlanetLab [Chun03], Emulab [White02], FlexLab [Ricci07], ModelNet [Vahdat02], VINI [Bavier06], NSF’s GENI

•  Security Testbeds

–  DETER [Benzel06], LARIAT [Rossey01], DARPA’s National Cyber Range

•  Dynamic Application-layer Replay

–  RolePlayer [Cui06], Predator [Small08]

•  Client-side Automation

–  Strider HoneyMonkeys [Wang06], MITRE HoneyClient [Wang05]



Network Security Testbed Overview

[Figure: testbed overview diagram. Labels: Internet; Enterprise Network; Real Sites and Content; Emulated Internet Users and Hosts; LARIAT; Physical Testbed Cluster; Real Operating Systems; Real Apps; Emulated Local Users; User models]

*All trademarks are property of their respective owners.



Our Contributions

•  Client-side workload generation techniques

–  Statistical models of human users

–  Enables controllable, repeatable experiments with client-side applications

•  Dynamic application-layer replay techniques for emulating the web

–  Provides a façade of connectedness on an isolated testbed network

–  Enables realistic use of client-side applications

•  An example experiment

–  To illustrate the utility of our techniques and highlight some remaining challenges



Client-Side Workload Generation

•  Approach: Drive real, unmodified Windows GUI applications via the Microsoft COM APIs

–  Inject events for mouse clicks and keyboard text input

•  Benefits

–  Realistic network traffic, including all the quirks of real protocol implementations

–  Real applications with real vulnerabilities allow for testing of real attacks and defenses in a controlled environment

•  Challenges

–  Which events to inject, and when?

–  On an isolated testbed, what do networked apps talk to?



Example Scenario: Email Worm

•  W32.Mixor is a mass-mailing worm that

–  Stops security-related processes on the compromised host

–  Harvests email addresses from the file system

–  Sends infectious emails to the addresses

[Figure: testbed layer diagram. Labels: Hardware; Applications; Virtual Users; Operating Systems]




Workload Generation Overview




Application User State Machines (AUSMs)

•  An AUSM is a Markov chain model for the sequence of COM events generated by a user of the given application

–  Captures probabilities for state transitions and for emissions (outputs)

•  Benefits:

–  Compact representation

–  Well-known algorithms for learning parameters and for synthesizing data from models

–  Repeatable: same random seed → same outputs

–  Controllable: experimenter can easily modify AUSM parameters and re-test
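To make the AUSM idea concrete, here is a minimal Python sketch of a Markov chain with transition and emission probabilities. The states, events, and probabilities are invented for illustration and are not taken from the authors' models; the point is only to show how a fixed seed yields a repeatable event sequence and how editing a table changes the workload.

```python
import random

# Hypothetical AUSM for a mail client. All names and numbers are assumptions.
TRANSITIONS = {
    "idle":    {"idle": 0.5, "read": 0.3, "compose": 0.2},
    "read":    {"idle": 0.4, "read": 0.4, "compose": 0.2},
    "compose": {"idle": 0.6, "read": 0.3, "compose": 0.1},
}
EMISSIONS = {
    "idle":    {"no_op": 1.0},
    "read":    {"click_message": 0.7, "scroll": 0.3},
    "compose": {"type_text": 0.8, "click_send": 0.2},
}

def pick(rng, dist):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def generate(seed, steps, start="idle"):
    """Walk the chain for `steps` steps and return (state, event) pairs."""
    rng = random.Random(seed)  # private PRNG so the run depends only on the seed
    state = start
    events = []
    for _ in range(steps):
        events.append((state, pick(rng, EMISSIONS[state])))
        state = pick(rng, TRANSITIONS[state])
    return events

if __name__ == "__main__":
    for state, event in generate(seed=42, steps=5):
        print(state, event)
```

Calling `generate` twice with the same seed returns the same list, which is the repeatability property; changing an entry in `TRANSITIONS` or `EMISSIONS` and re-running is the controllability property.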



Learning AUSM Parameters

•  Collected data from real users in our research group

–  Used DETOURS framework from Microsoft Research

–  Recorded all COM events on our workstations for one week

•  Split the event stream by application

•  Train a Markov chain model for each application’s events
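The split-and-train steps above can be sketched in Python as follows. This is a simplified illustration, not the authors' training procedure: it assumes the recorded log is a list of hypothetical `(application, event)` pairs, treats each event type as a state, and estimates first-order transition probabilities by relative frequency.

```python
from collections import Counter, defaultdict

def split_by_application(events):
    """Group an interleaved (app, event) log into one ordered stream per app."""
    streams = defaultdict(list)
    for app, event in events:
        streams[app].append(event)
    return dict(streams)

def train_markov_chain(stream):
    """Estimate transition probabilities from consecutive event pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(stream, stream[1:]):
        counts[current][following] += 1
    model = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        model[state] = {s: c / total for s, c in successors.items()}
    return model

# Made-up log for illustration only.
log = [
    ("word", "open"), ("outlook", "open"), ("word", "type"),
    ("word", "type"), ("outlook", "read"), ("word", "save"),
    ("word", "type"), ("outlook", "read"), ("word", "save"),
]
models = {app: train_markov_chain(stream)
          for app, stream in split_by_application(log).items()}

if __name__ == "__main__":
    for app, model in models.items():
        print(app, model)
```

A model with separate hidden states and emissions would need a different estimator; the counting approach shown here is the simplest case.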



Generating Workloads with AUSMs

•  Use a PRNG to traverse the state machine model

–  Seed host’s PRNG with a hash of Host ID and Experiment ID

–  Each host produces a distinct set of actions

–  Actions in an experiment are repeatable across runs

•  Second-level models generate outputs from each state

–  E.g. n-gram model for English as input for Word documents
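One way to realize the seeding scheme above is sketched below in Python. The choice of SHA-256, the way the two IDs are joined, and the tiny bigram table are all assumptions made for illustration; the slides do not say which hash or n-gram order was used. A cryptographic hash from `hashlib` is used instead of Python's built-in `hash()` because the latter is randomized per process for strings by default and so would not be repeatable across runs.

```python
import hashlib
import random

def derive_seed(host_id: str, experiment_id: str) -> int:
    """Map (experiment, host) to a stable 64-bit integer seed."""
    material = f"{experiment_id}/{host_id}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    return int.from_bytes(digest[:8], "big")

def host_rng(host_id: str, experiment_id: str) -> random.Random:
    """Return a PRNG private to one host within one experiment."""
    return random.Random(derive_seed(host_id, experiment_id))

# Toy bigram table standing in for a second-level English model (hypothetical).
BIGRAMS = {
    "the": ["report", "meeting"],
    "report": ["is"],
    "meeting": ["is"],
    "is": ["the"],
}

def generate_text(rng, n_words, start="the"):
    """Emit n_words words by repeatedly sampling a successor of the last word."""
    words = [start]
    while len(words) < n_words:
        words.append(rng.choice(BIGRAMS[words[-1]]))
    return " ".join(words)

if __name__ == "__main__":
    for host in ("host-a", "host-b"):
        print(host, generate_text(host_rng(host, "exp-1"), 8))
```

Re-running with the same host and experiment IDs reproduces the same text, while a new experiment ID gives every host a fresh but still reproducible stream.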



Emulating the Web

1.  Use a HoneyClient to download pages on the real Internet

2.  Capture packets and extract TCP sessions; parse HTTP requests and responses

3.  Replay HTTP responses verbatim to satisfy clients’ requests on the testbed

Use dynamic application-layer replay to construct a reasonable façade of the world-wide web
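The replay step can be illustrated with a small lookup table keyed on the parsed request. This Python sketch is an assumption-laden simplification: the key of (method, host, path), the `ReplayStore` class, and the 404 fallback for unrecorded pages are illustrative choices, not the authors' design, and packet capture and TCP reassembly are left out entirely.

```python
# Fallback for requests that were never recorded (an assumption of this sketch).
NOT_FOUND = b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"

def parse_request(raw_request: bytes):
    """Return (method, host, path) from a raw HTTP/1.x request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower()
    return method, host, path

class ReplayStore:
    """Maps recorded requests to the exact response bytes seen on the Internet."""

    def __init__(self):
        self._responses = {}

    def record(self, raw_request: bytes, raw_response: bytes) -> None:
        self._responses[parse_request(raw_request)] = raw_response

    def replay(self, raw_request: bytes) -> bytes:
        # Responses are returned verbatim, byte for byte.
        return self._responses.get(parse_request(raw_request), NOT_FOUND)

if __name__ == "__main__":
    store = ReplayStore()
    store.record(
        b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
        b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi",
    )
    print(store.replay(b"GET /index.html HTTP/1.1\r\nHost: EXAMPLE.com\r\n\r\n"))
```

Keying on method, host, and path means a testbed client with different headers (for example another User-Agent) still receives the recorded page; content that depends on cookies or query freshness would need a richer key.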



An Example Experiment

•  Goal:

–  Quantify the impact of anti-virus software on system resource consumption

•  Approach

–  Run the same workload with and without AV

   Only tested an open source AV system so far

   Can repeat tests with commercial systems in the future

–  Take care to get repeatable results

–  Measure resource consumption for each scenario, and compare



Experiment Testbed Network




Experimental Procedure

1.  Prepare Systems for Test Run

a)  Revert disk images on SUT and INTERNET

b)  Revert system clocks on SUT and INTERNET

c)  Reboot SUT laptop into Windows environment

d)  Seed PRNGs using master experiment seed

e)  Start PCP performance logging service

2.  Execute Test Run

a)  Start AUSM-based client workload generation

b)  Let workload generation run for 2 hours

c)  Stop AUSM-based client workload generation

3.  Collect Results

a)  Stop PCP performance logging service

b)  Archive performance logs

c)  Reboot SUT laptop into Linux environment

4.  GOTO 1.



Empirical Results (1)

•  System with AV consumes free RAM sooner



Empirical Results (2)

•  System with AV taxes the CPU much more heavily



Summary

•  New techniques help conduct controlled, repeatable experiments involving client-side security

–  Client-side workload generation

–  Emulating the web

•  “Hard science” is still hard. Challenges remain in:

–  Formulating good, testable hypotheses

–  Making experiments reproducible by others

–  Reducing test artifacts

–  Instrumentation: how to collect reliable data?

–  Developing better models for human users



Thanks!



Backup Slides
