Practical Data Confinement

Page 1: Practical Data Confinement

Practical Data Confinement

Andrey Ermolinskiy, Sachin Katti, Scott Shenker, Lisa Fowler, Murphy McCauley

Page 2: Practical Data Confinement

Introduction

Controlling the flow of sensitive information is one of the central challenges in managing an organization:
- Preventing exfiltration (theft) by malicious entities
- Enforcing dissemination policies

Page 5: Practical Data Confinement

Why is it so hard to secure sensitive data?

- Modern software is rife with security holes that can be exploited for exfiltration
- Users must be trusted to remember, understand, and obey dissemination restrictions
  - In practice, users are careless and often inadvertently allow data to leak
    - E-mail sensitive documents to the wrong parties
    - Transfer data to insecure machines and portable devices

Page 7: Practical Data Confinement

Our Goal

- Develop a practical data confinement solution
- Key requirement: compatibility with existing infrastructure and patterns of use
  - Support current operating systems, applications, and means of communication
    - Office productivity apps: word processing, spreadsheets, …
    - Communication: E-mail, IM, VoIP, FTP, DFS, …
  - Avoid imposing restrictions on user behavior
    - Allow access to untrusted Internet sites
    - Permit users to download and install untrusted applications

Page 8: Practical Data Confinement

Our Assumptions and Threat Model

- Users
  - Benign; do not intentionally exfiltrate data
  - Make mistakes and inadvertently violate policies
- Software platform (productivity applications and OS)
  - Non-malicious; does not exfiltrate data in its pristine state
  - Vulnerable to attacks if exposed to external threats
- Attackers
  - Malicious external entities seeking to exfiltrate sensitive data
  - Penetrate security barriers by exploiting vulnerabilities in the software platform

Page 10: Practical Data Confinement

Central Design Decisions

- Policy enforcement responsibilities
  - Cannot rely on human users
  - The system must track the flow of sensitive information and enforce restrictions when the data is externalized
- Granularity of information flow tracking (IFT)
  - Need fine-grained, byte-level tracking and policy enforcement to prevent accidental partial exfiltrations
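The following minimal Python sketch (not PDC's code) shows why byte-level granularity matters. When a user pastes an excerpt of a sensitive report into an e-mail, per-byte tags travel with the copied bytes, so the exit point can still see that the message carries sensitive data. The TaggedBuffer class, its helpers, and the convention that tag 0 means "not sensitive" are illustrative assumptions.

```python
from dataclasses import dataclass

PUBLIC = 0  # assumption for this sketch: tag 0 means "not sensitive"


@dataclass
class TaggedBuffer:
    data: bytes
    tags: list[int]  # one tag per byte of data

    def slice(self, start: int, end: int) -> "TaggedBuffer":
        # copying a byte range carries the per-byte tags along with it
        return TaggedBuffer(self.data[start:end], self.tags[start:end])

    def concat(self, other: "TaggedBuffer") -> "TaggedBuffer":
        return TaggedBuffer(self.data + other.data, self.tags + other.tags)


def tag_text(text: str, tag: int) -> TaggedBuffer:
    raw = text.encode()
    return TaggedBuffer(raw, [tag] * len(raw))


def sensitive_bytes(buf: TaggedBuffer) -> int:
    # an exit point would apply policy whenever this is non-zero
    return sum(1 for t in buf.tags if t != PUBLIC)


SECRET = 7  # hypothetical tag value
report = tag_text("Q3 forecast: revenue down 12%", SECRET)
email = tag_text("Hi Bob, FYI: ", PUBLIC).concat(report.slice(0, 11))
print(sensitive_bytes(email))  # 11
```

With a single tag per document, the pasted excerpt would either lose its tag entirely or taint the whole e-mail. Per-byte tags identify exactly the 11 copied bytes.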

Page 12: Practical Data Confinement

Central Design Decisions

- Placement of functionality
  - PDC inserts a thin software layer (hypervisor) between the OS and hardware
  - The hypervisor implements byte-level IFT and policy enforcement
  - A hypervisor-level solution retains compatibility with existing OSes and applications and has sufficient control over hardware
- Resolving the tension between safety and user freedom
  - Partition the application environment into two isolated components: a “Safe world” and a “Free world”

Page 13: Practical Data Confinement

Partitioning the User Environment

[Figure: the hypervisor, which performs IFT and policy enforcement, runs on the hardware (CPU, memory, disk, NIC, USB, printer, …) and hosts a safe virtual machine with access to sensitive data and an unsafe virtual machine with unrestricted communication and execution of untrusted code.]

Page 14: Practical Data Confinement

Partitioning the User Environment

[Figure labels: sensitive data; non-sensitive data; trusted code/data; untrusted (potentially malicious) code/data; exposure to the threat of exfiltration.]

Page 15: Practical Data Confinement

PDC Use Cases

- Logical “air gaps” for high-security environments
  - VM-level isolation obviates the need for multiple physical networks
- Preventing information leakage via e-mail
  - “Do not disseminate the attached document”
- Digital rights management
  - Keeping track of copies; document self-destruct
- Auto-redaction of sensitive content

Page 16: Practical Data Confinement

Talk Outline

- Introduction
- Requirements and Assumptions
- Use Cases
- PDC Architecture
- Prototype Implementation
- Preliminary Performance Evaluation
- Current Status and Future Work

Page 17: Practical Data Confinement

PDC Architecture: Hypervisor

PDC uses an augmented hypervisor to:
- Ensure isolation between the safe and unsafe VMs
- Track the propagation of sensitive data in the safe VM
- Enforce security policy at exit points
  - Network I/O, removable storage, printer, etc.

Page 18: Practical Data Confinement

PDC Architecture: Tag Tracking in the Safe VM

PDC associates an opaque 32-bit sensitivity tag with each byte of virtual hardware state:
- User-accessible CPU registers
- Volatile memory
- Files on disk

Page 19: Practical Data Confinement

PDC Architecture: Tag Tracking in the Safe VM

- These tags are viewed as opaque identifiers
- The semantics can be tailored to fit the specific needs of administrators/users
- Tags can be used to specify
  - Security policies
  - Levels of security clearance
  - High-level data objects
  - High-level data types within an object

Page 20: Practical Data Confinement

PDC Architecture: Tag Tracking in the Safe VM

An augmented x86 emulator performs fine-grained instruction-level tag tracking (current implementation is based on QEMU)

PDC tracks explicit data flows (variable assignments, arithmetic operations)

add %eax, %ebx

[Figure: the tag of %ebx, the destination operand, becomes the merge of the tags of %eax and %ebx.]

Page 21: Practical Data Confinement

PDC Architecture: Tag Tracking in the Safe VM

An augmented x86 emulator performs fine-grained instruction-level tag tracking (current implementation is based on QEMU)

PDC also tracks flows resulting from pointer dereferencing

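mov %eax, (%ebx)

[Figure: tag merge. The tags of %eax and of the pointer register %ebx are merged into the tag of the destination memory location.]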
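The two propagation rules above (add %eax, %ebx and mov %eax, (%ebx)) can be modeled in a few lines of Python. This is a sketch under stated assumptions, not PDC's tracker. For readability, each tag is modeled as a set of labels with "merge" as set union, because the slides do not say how merged tags are encoded in the opaque 32-bit identifiers. For brevity, each register is also treated as a single unit, whereas PDC tags each byte.

```python
from collections import defaultdict


class TagState:
    """Toy tag state: tags modeled as frozensets, merge = union (assumption)."""

    def __init__(self) -> None:
        self.reg: dict[str, frozenset] = defaultdict(frozenset)  # per-register tag
        self.mem: dict[int, frozenset] = defaultdict(frozenset)  # per-byte tag, keyed by address

    def add_reg_reg(self, src: str, dst: str) -> None:
        # add %src, %dst (AT&T): dst = dst + src, so dst's tag is the merge of both
        self.reg[dst] = self.reg[dst] | self.reg[src]

    def mov_reg_to_mem(self, src: str, ptr: str, addr: int, size: int = 4) -> None:
        # mov %src, (%ptr): the stored bytes are overwritten, and their new tag
        # merges the data's tag with the tag of the pointer used to address them
        merged = self.reg[src] | self.reg[ptr]
        for a in range(addr, addr + size):
            self.mem[a] = merged


s = TagState()
s.reg["eax"] = frozenset({"salary"})
s.reg["ebx"] = frozenset({"employee-id"})
s.mov_reg_to_mem("eax", "ebx", addr=0x1000)  # %ebx is assumed to hold 0x1000
print(sorted(s.mem[0x1000]))  # ['employee-id', 'salary']
s.add_reg_reg("eax", "ecx")
print(sorted(s.reg["ecx"]))  # ['salary']
```

Merging the pointer's tag is what lets a table lookup indexed by sensitive data stay tainted. It is also why tracking through pointers aggravates tag explosion.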

Page 22: Practical Data Confinement

Challenges

- Tag storage overhead in memory and on disk
  - A naïve implementation would incur a 400% overhead
- Computational overhead of online tag tracking
- Tag explosion
  - Tag tracking across pointers exacerbates the problem
- Tag erosion due to implicit flows
- Bridging the semantic gap between application data units and low-level machine state
- Impact of VM-level isolation on user experience
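The 400% figure is simple arithmetic. It assumes a naïve scheme that stores one uncompressed 32-bit tag (Page 18) alongside every 8-bit byte of data:

$$
\frac{\text{tag bits per byte}}{\text{data bits per byte}} = \frac{32}{8} = 4 \quad\Longrightarrow\quad 400\%\ \text{overhead}
$$

This matches the 4x worst-case I/O overhead quoted in the performance summary for 32-bit tag identifiers.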

Page 23: Practical Data Confinement

Talk Outline

- Introduction
- Requirements and Assumptions
- Use Cases
- PDC Architecture
- Prototype Implementation
  - Storing sensitivity tags in memory and on disk
  - Fine-grained tag tracking in QEMU
  - “On-demand” emulation
  - Policy enforcement
- Performance Evaluation
- Current Status and Future Work

Page 24: Practical Data Confinement

PDC Implementation: The Big Picture

[Figure: PDC implementation overview. PDC-Xen (ring 0): shadow page tables, safe VM page tables, PageTagMask, CPU (CR3). Safe VM: App1, App2, VFS, NFS client, Xen-RPC. Dom0: Xen-RPC, NFS server, PDC-ext3, QEMU / tag tracker with the emulated safe VM and PageTagDescriptors, network daemon, policy daemon, NIC. The safe VM and Dom0 are connected by an event channel and a shared ring buffer.]

Page 25: Practical Data Confinement

Storing Tags in Volatile Memory

- PDC maintains a 64-bit PageTagSummary for each page of machine memory
- It uses a 4-level tree data structure to keep PageNumber → PageTagSummary mappings

[Figure: bit fields of the PageNumber index successive levels of the tree; the leaves are arrays of 64-bit PageTagSummary structures.]
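A 4-level radix tree of this kind can be sketched in Python as follows. The 8/8/8/8 split of a 32-bit page number is a hypothetical choice, because the slide's bit boundaries did not survive extraction. Sparse dicts also stand in for the fixed-size arrays a hypervisor would use. Untracked pages are assumed to have a zero summary.

```python
LEVEL_BITS = (8, 8, 8, 8)  # hypothetical split of a 32-bit page number, top level first


class PageTagTree:
    """4-level radix tree: PageNumber -> 64-bit PageTagSummary (sketch)."""

    def __init__(self) -> None:
        self.root: dict = {}

    @staticmethod
    def _indices(page_number: int) -> list[int]:
        # slice the page number into one index per tree level
        idx, shift = [], sum(LEVEL_BITS)
        for bits in LEVEL_BITS:
            shift -= bits
            idx.append((page_number >> shift) & ((1 << bits) - 1))
        return idx

    def set(self, page_number: int, summary: int) -> None:
        if not 0 <= summary < 2**64:
            raise ValueError("PageTagSummary is a 64-bit value")
        node = self.root
        *inner, leaf = self._indices(page_number)
        for i in inner:
            node = node.setdefault(i, {})  # allocate interior nodes on demand
        node[leaf] = summary

    def get(self, page_number: int) -> int:
        node = self.root
        *inner, leaf = self._indices(page_number)
        for i in inner:
            node = node.get(i)
            if node is None:
                return 0  # assumption: untracked page carries no tags
        return node.get(leaf, 0)


t = PageTagTree()
t.set(0x12345, 0xABCD)
print(hex(t.get(0x12345)))  # 0xabcd
print(t.get(0x12346))       # 0
```

The tree only allocates nodes for regions that actually contain tagged pages. This keeps metadata small when most of memory is untagged.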

Page 26: Practical Data Confinement

Storing Tags in Volatile Memory

- PageTagSummary
  - A page-wide tag for uniformly-tagged pages
  - A pointer to a PageTagDescriptor otherwise
- A PageTagDescriptor stores fine-grained (byte-level) tags within a page in one of two formats:
  - A linear array of tags (indexed by page offset)
  - RLE encoding
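The choice among a page-wide tag, a linear array, and an RLE descriptor can be sketched in Python. The class names mirror the slide's terms, but the fields and the selection rule are assumptions. In particular, the sketch picks RLE when two values per run (start offset, tag) take fewer entries than one tag per byte.

```python
from bisect import bisect_right
from dataclasses import dataclass

PAGE_SIZE = 4096


@dataclass
class UniformPage:  # PageTagSummary holding a page-wide tag
    tag: int


@dataclass
class LinearDescriptor:  # PageTagDescriptor: one tag per page offset
    tags: list[int]

    def tag_at(self, off: int) -> int:
        return self.tags[off]


@dataclass
class RleDescriptor:  # PageTagDescriptor: runs as (start offset, tag)
    starts: list[int]
    run_tags: list[int]

    def tag_at(self, off: int) -> int:
        # find the last run starting at or before `off`
        return self.run_tags[bisect_right(self.starts, off) - 1]


def encode_page(tags: list[int]):
    """Pick a representation for the byte-level tags of one page (sketch)."""
    if len(tags) != PAGE_SIZE:
        raise ValueError("expected one tag per byte of the page")
    starts, run_tags = [], []
    for off, t in enumerate(tags):
        if not run_tags or run_tags[-1] != t:
            starts.append(off)
            run_tags.append(t)
    if len(run_tags) == 1:
        return UniformPage(run_tags[0])
    if 2 * len(run_tags) < PAGE_SIZE:  # heuristic: RLE is smaller (assumption)
        return RleDescriptor(starts, run_tags)
    return LinearDescriptor(list(tags))


def tag_at(entry, off: int) -> int:
    return entry.tag if isinstance(entry, UniformPage) else entry.tag_at(off)


page = [0] * 100 + [7] * 50 + [0] * (PAGE_SIZE - 150)
e = encode_page(page)
print(type(e).__name__, e.starts)  # RleDescriptor [0, 100, 150]
print(tag_at(e, 120), tag_at(e, 150))  # 7 0
```

Highly fragmented pages (e.g. alternating tags) fall back to the linear array, where lookup is a direct index.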

Page 27: Practical Data Confinement

Storing Tags on Disk

- PDC-ext3 provides persistent storage for the safe VM
- A new i-node field holds file-level tags
- Leaf indirect blocks store pointers to BlockTagDescriptors
- A BlockTagDescriptor stores the byte-level tags within a block

[Figure: i-node (with FileTag) → indirect block → leaf indirect block → data block; leaf indirect blocks also point to BlockTagDescriptors, stored either as a linear array or in RLE form.]

Page 28: Practical Data Confinement

Back to the Big Picture

[Figure: the implementation overview from Page 24, repeated with the emulated CPU context of the QEMU / tag tracker added.]

Page 29: Practical Data Confinement

Fine-Grained Tag Tracking

- A modified version of QEMU emulates the safe VM and tracks the movement of sensitive data
- QEMU relies on runtime binary recompilation to achieve reasonably efficient emulation
- We augment the QEMU compiler to generate a tag tracking instruction stream from the input stream of x86 instructions

[Figure: stage 1 translates a guest machine code block (x86) into an intermediate representation (TCG); stage 2 produces a host machine code block (x86) and a tag tracking code block.]

Page 30: Practical Data Confinement

Fine-Grained Tag Tracking

- Tag tracking instructions manipulate the tag status of emulated CPU registers and memory
- The tag tracking instruction stream executes asynchronously in a separate thread
- Basic instruction format: Action, Dest. operand, Src. operand
  - Action: {Clear, Set, Merge}
  - Dest. operand: {Reg, Mem}
  - Src. operand: {Reg, Mem}

Page 31: Practical Data Confinement

Fine-Grained Tag Tracking

Problem: some of the instruction arguments are not known at compile time.
- Example: in mov %eax,(%ebx), the destination memory address is not known
- The main emulation thread writes the values of these arguments to a temporary log (a circular memory buffer) at runtime
- The tag tracker fetches the unknown values from this log
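The hand-off between the emulation thread and the tag tracker can be sketched as a single-producer, single-consumer circular buffer. This is an illustration only. It uses Python's threading.Condition for blocking, whereas a real emulator would more likely use a lock-free ring. The ArgumentLog name and its capacity are hypothetical.

```python
import threading


class ArgumentLog:
    """Single-producer/single-consumer circular buffer (sketch)."""

    def __init__(self, capacity: int = 8) -> None:
        self.buf = [0] * capacity
        self.capacity = capacity
        self.head = 0  # next slot the emulator writes
        self.tail = 0  # next slot the tag tracker reads
        self.cond = threading.Condition()

    def put(self, value: int) -> None:
        with self.cond:
            while self.head - self.tail == self.capacity:  # full: emulator waits for tracker
                self.cond.wait()
            self.buf[self.head % self.capacity] = value
            self.head += 1
            self.cond.notify_all()

    def get(self) -> int:
        with self.cond:
            while self.head == self.tail:  # empty: tracker waits for emulator
                self.cond.wait()
            value = self.buf[self.tail % self.capacity]
            self.tail += 1
            self.cond.notify_all()
            return value


log = ArgumentLog(capacity=4)
addresses = [0x1000 + 4 * i for i in range(20)]  # e.g. runtime %ebx for each mov %eax,(%ebx)
seen: list[int] = []


def emulator() -> None:
    for a in addresses:
        log.put(a)


def tag_tracker() -> None:
    for _ in addresses:
        seen.append(log.get())


threads = [threading.Thread(target=emulator), threading.Thread(target=tag_tracker)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(seen == addresses)  # True
```

The bounded capacity means the emulator can run ahead of the tag tracker only by a fixed amount. This is also why QEMU must wait for the tracker to catch up before returning to native execution.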

Page 32: Practical Data Confinement

Binary Recompilation (Example)

Input x86 instruction: mov $123, %eax
Intermediate representation (TCG): movi_i32 tmp0,$123; st_i32 tmp0,env,$0x0
Tag tracking instructions: Clear4 eax

Input x86 instruction: push %ebp
Intermediate representation (TCG): ld_i32 tmp0,env,$0x14; ld_i32 tmp2,env,$0x10; movi_i32 tmp14,$0xfffffffc; add_i32 tmp2,tmp2,tmp14; qemu_st_logaddr tmp0,tmp2; st_i32 tmp2,env,$0x10
Tag tracking instructions: Set4 mem,ebp,0; Merge4 mem,esp,0
Tag tracking argument log: MachineAddr(%esp)
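To make the push %ebp example above concrete, here is a toy Python interpreter for the Clear/Set/Merge instruction format, run on exactly that tag stream. Several interpretations are assumptions, not taken from the slides:
- The trailing field of a mem operand (the "0") is read as an index into the argument log, so both instructions use the single logged MachineAddr(%esp).
- Merge is applied byte by byte.
- Tags are modeled as label sets, with merge as union.

```python
from collections import defaultdict

EMPTY = frozenset()


def parse(insn: str) -> tuple[str, int, list[str]]:
    """Split e.g. 'Merge4 mem,esp,0' into ('Merge', 4, ['mem', 'esp', '0'])."""
    op, operands = insn.split(maxsplit=1)
    return op[:-1], int(op[-1]), [a.strip() for a in operands.split(",")]


class TagTracker:
    def __init__(self, arg_log: list[int]) -> None:
        self.reg = defaultdict(lambda: [EMPTY] * 4)  # per-byte tags of each 32-bit register
        self.mem = defaultdict(lambda: EMPTY)        # per-byte tags of memory, keyed by address
        self.log = arg_log                           # runtime values logged by the emulator

    def run(self, insn: str) -> None:
        action, size, ops = parse(insn)
        if action == "Clear" and len(ops) == 1:
            # destination register receives untagged data (e.g. a constant)
            tags = list(self.reg[ops[0]])
            tags[:size] = [EMPTY] * size
            self.reg[ops[0]] = tags
        elif action in ("Set", "Merge") and ops[0] == "mem":
            _, src, slot = ops
            base = self.log[int(slot)]  # assumption: third field indexes the argument log
            for i in range(size):
                old = self.mem[base + i] if action == "Merge" else EMPTY
                self.mem[base + i] = old | self.reg[src][i]
        else:
            raise NotImplementedError(insn)


esp_after_push = 0x7FF0  # hypothetical MachineAddr(%esp) written by qemu_st_logaddr
t = TagTracker(arg_log=[esp_after_push])
t.reg["eax"] = [frozenset({"secret"})] * 4
t.reg["ebp"] = [frozenset({"frame"})] * 4
t.reg["esp"] = [frozenset({"ptr"})] * 4
for insn in ["Clear4 eax", "Set4 mem,ebp,0", "Merge4 mem,esp,0"]:
    t.run(insn)
print(t.reg["eax"][0] == EMPTY, sorted(t.mem[0x7FF0]))  # True ['frame', 'ptr']
```

Under these assumptions, Set4 copies the tags of %ebp into the four stack bytes. Merge4 then folds in the tag of %esp, the register used to form the address, which matches the pointer-dereference rule on Page 21.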

Page 37: Practical Data Confinement

Binary Recompilation

But things get more complex…
- Switching between operating modes (protected/real/virtual-8086, 16/32-bit)
- Recovering from exceptions in the middle of a translation block
- Multiple memory addressing modes
- Repeating instructions (e.g., rep movs)
- Complex instructions whose semantics are partially determined by the runtime state (e.g., iret)

[Figure: the iret stack frame: saved EIP, saved CS, saved EFLAGS, saved ESP, saved SS. Whether ESP and SS are also popped depends on the privilege level being returned to.]

Page 39: Practical Data Confinement

“On-Demand” Emulation

During virtualized execution, PDC-Xen uses the paging hardware to intercept access to sensitive data:
- PDC-Xen maintains shadow page tables in which all memory pages containing tagged data are marked as not present
- An access to a tagged page from the safe VM causes a page fault and transfers control to the hypervisor

[Figure components: PDC-Xen (ring 0) with the safe VM page tables, shadow page tables, and PageTagMask; QEMU / tag tracker with the PageTagDescriptors.]
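The trap logic above can be summarized in a toy Python model. Under the sketch's assumptions, a shadow PTE is present only if the guest maps the page and the page holds no tagged data. A fault on a tagged page switches the VM to emulation, while any other fault is a genuine guest page fault. The class, its method names, and the string results are hypothetical. The return path reflects the up-to-date PageTagMask that QEMU hands back via hypercall.

```python
class OnDemandMMU:
    """Toy model of PDC-Xen's shadow-page-table trap (hypothetical names)."""

    def __init__(self, guest_mapped: set[int], tagged: set[int]) -> None:
        self.guest_mapped = set(guest_mapped)  # pages present in the safe VM's own page tables
        self.page_tag_mask = set(tagged)       # pages that contain tagged data
        self.mode = "native"

    def shadow_present(self, page: int) -> bool:
        # shadow PTE is present only if the guest maps the page AND it holds no tagged data
        return page in self.guest_mapped and page not in self.page_tag_mask

    def access(self, page: int) -> str:
        if self.mode == "emulated":
            return "emulated"  # QEMU tracks tags in software; no trap needed
        if self.shadow_present(page):
            return "native"  # ordinary hardware-speed access
        if page in self.guest_mapped and page in self.page_tag_mask:
            self.mode = "emulated"  # fault due to tagged data: hand the VCPU context to QEMU
            return "switch to emulation"
        return "guest page fault"  # genuine fault, forwarded to the guest kernel

    def return_to_native(self, tagged: set[int]) -> None:
        # QEMU's hypercall supplies an up-to-date PageTagMask before native execution resumes
        self.page_tag_mask = set(tagged)
        self.mode = "native"


mmu = OnDemandMMU(guest_mapped={1, 2, 3}, tagged={2})
print(mmu.access(1))  # native
print(mmu.access(2))  # switch to emulation
print(mmu.access(3))  # emulated
mmu.return_to_native(tagged=set())  # hypothetical: no tagged data remains
print(mmu.access(2))  # native
print(mmu.access(9))  # guest page fault
```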

Page 40: Practical Data Confinement

“On-Demand” Emulation

- If the page fault is due to tagged data, PDC-Xen suspends the guest domain and transfers control to the emulator (QEMU)
- QEMU initializes the emulated CPU context from the native processor context (saved upon entry to the page fault handler) and resumes the safe VM in emulated mode

[Figure: an access to a tagged page from the safe VM enters the page fault handler; in Dom0, QEMU / tag tracker runs the emulated safe VM CPU on the Dom0 VCPU, using Dom0's mappings of the safe VM memory.]

Page 42: Practical Data Confinement

“On-Demand” Emulation

Returning from emulated execution:
- QEMU terminates the main emulation loop and waits for the tag tracker to catch up
- QEMU then makes a hypercall to PDC-Xen and provides
  - The up-to-date processor context for the safe VM VCPU
  - The up-to-date PageTagMask
- The hypercall awakens the safe VM VCPU (blocked in the page fault handler)
- The page fault handler
  - Overwrites the call stack with up-to-date values of CS/EIP, SS/ESP, and EFLAGS
  - Restores the other processor registers
  - Returns control to the safe VM

Page 45: Practical Data Confinement

“On-Demand” Emulation - Challenges

- Updating PTEs in read-only page table mappings
  - Solution: QEMU maintains local writable “shadow” copies and synchronizes them in the background via hypercalls
- Transferring control to the hypervisor during emulated execution (hypercall and fault handlers)
  - Emulating hypervisor-level code is not an option
  - Solution: a transient switch to native execution
  - Resume native execution at the instruction that causes a jump to the hypervisor (e.g., int 0x82 for hypercalls)

Page 47: Practical Data Confinement

“On-Demand” Emulation - Challenges

- Delivery of timer interrupts (events) in emulated mode
  - The hardware clock advances faster in the emulated context (i.e., each instruction consumes more clock cycles)
  - Xen needs to scale the delivery of timer events accordingly
- Use of the clock cycle counter (rdtsc instruction)
  - The Linux timer interrupt/event handler uses the clock cycle counter to estimate timer jitter
  - After switching from emulated to native execution, the guest kernel observes a sudden jump forward in time

Page 48: Practical Data Confinement

Policy Enforcement

The policy controller module:
- Resides in dom0 and interposes between the front-end and back-end device drivers
- Fetches policies from a central policy server
- Looks up the tags associated with the data in shared I/O request buffers and applies policies

[Figure: in the safe VM, the network interface front-end and block storage front-end; in Dom0, the policy controller sits between these front-ends and the network interface back-end and block storage back-end.]
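The controller's check on an I/O request buffer might look like the following Python sketch. The policy model is entirely hypothetical, since the slides do not describe PDC's policy language. In this model, each tag lists the exit devices its data may leave through, untagged bytes (tag 0) are unrestricted, and unknown tags are denied by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:  # hypothetical policy record
    tag: int
    allowed_devices: frozenset[str]  # exit points where data with this tag may leave


def check_request(buffer_tags: list[int], device: str, policies: dict[int, Policy]) -> bool:
    """Return True if an outgoing I/O request may proceed (sketch)."""
    for tag in set(buffer_tags):
        if tag == 0:
            continue  # assumption: untagged bytes are unrestricted
        policy = policies.get(tag)
        if policy is None or device not in policy.allowed_devices:
            return False  # assumption: default-deny for unknown or disallowed tags
    return True


policies = {7: Policy(tag=7, allowed_devices=frozenset({"block"}))}
req = [0] * 100 + [7] * 20  # per-byte tags of a shared request buffer
print(check_request(req, "block", policies))      # True
print(check_request(req, "net", policies))        # False
print(check_request([0] * 64, "usb", policies))   # True
```

Because the check runs in dom0 on the shared request buffers, a compromised application in the safe VM has no way to skip it.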

Page 49: Practical Data Confinement

Network Communication

- PDC annotates outgoing packets with PacketTagDescriptors carrying the sensitivity tags
- The current implementation transfers annotated packets via a TCP/IP tunnel

[Figure: annotation attaches the tags to the original packet (EthHdr, IPHdr, TCPHdr, Payload); TCP/IP encapsulation then wraps the annotated packet in outer EthHdr, IPHdr, and TCPHdr headers.]
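A Python sketch of annotating a frame before it enters the tunnel is shown below. The wire layout is a hypothetical format, not PDC's actual PacketTagDescriptor. It consists of a 16-bit run count, then one (16-bit offset, 16-bit length, 32-bit tag) record per run of tagged bytes, then the original frame. The resulting blob would become the payload of the outer TCP connection.

```python
import struct

COUNT = struct.Struct("!H")   # number of tag runs (hypothetical layout)
RUN = struct.Struct("!HHI")   # offset, length, 32-bit tag


def tag_runs(tags: list[int]) -> list[tuple[int, int, int]]:
    """Collapse per-byte tags into (offset, length, tag) runs, skipping untagged bytes."""
    runs, start = [], 0
    for i in range(1, len(tags) + 1):
        if i == len(tags) or tags[i] != tags[start]:
            if tags[start] != 0:
                runs.append((start, i - start, tags[start]))
            start = i
    return runs


def annotate(frame: bytes, tags: list[int]) -> bytes:
    runs = tag_runs(tags)
    header = COUNT.pack(len(runs)) + b"".join(RUN.pack(*r) for r in runs)
    return header + frame  # this blob is what the TCP/IP tunnel carries


def parse(blob: bytes) -> tuple[list[tuple[int, int, int]], bytes]:
    (n,) = COUNT.unpack_from(blob, 0)
    pos, runs = COUNT.size, []
    for _ in range(n):
        runs.append(RUN.unpack_from(blob, pos))
        pos += RUN.size
    return runs, blob[pos:]


frame = b"HDR" + b"secret" + b"ok"
tags = [0] * 3 + [7] * 6 + [0] * 2
blob = annotate(frame, tags)
runs, inner = parse(blob)
print(runs, inner == frame, len(blob) - len(frame))  # [(3, 6, 7)] True 10
```

In this format every additional run costs 8 bytes. That is one way to see why annotation overhead grows with tag fragmentation, the parameter varied in the bandwidth experiment.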

Page 50: Practical Data Confinement

Talk Outline

- Introduction
- Requirements and Assumptions
- Use Cases
- PDC Architecture
- Prototype Implementation
- Preliminary Performance Evaluation
  - Application-level performance overhead
  - Filesystem performance overhead
  - Network bandwidth overhead
- Current Status and Future Work

Page 51: Practical Data Confinement

Preliminary Performance Evaluation

Experimental setup:
- Quad-core AMD Phenom 9500, 2.33GHz, 3GB of RAM
- 100Mbps Ethernet
- PDC hypervisor based on Xen v.3.3.0
- Paravirtualized Linux kernel v.2.6.18-8
- Tag tracker based on QEMU v.0.10.0

Page 52: Practical Data Confinement

Application-Level Overhead

- Goal: estimate the overall performance penalty (as perceived by users) in realistic usage scenarios
- First scenario: recursive text search within a directory tree (grep)
  - Input dataset: 1GB sample of the Enron corporate e-mail database (http://www.cs.cmu.edu/~enron)
  - We mark a fraction (F) of the messages as sensitive, assigning them a uniform sensitivity tag
  - We search the dataset for a single-word string and measure the overall running time

Page 53: Practical Data Confinement

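Application-Level Overhead

[Figure: overall running time versus the fraction F (%) of messages marked sensitive, for Linux on “bare metal”, standard Xen with paravirt. Linux, and PDC-Xen with paravirt. Linux and tag tracking.]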

Page 54: Practical Data Confinement

Filesystem Performance Overhead

Configurations:
- C1: Linux on “bare metal”; standard ext3
- C2: Xen, paravirt. Linux; dom0 exposes a paravirt. block device; the guest domain mounts it as ext3
- C3: Xen, paravirt. Linux; dom0 exposes ext3 to the guest domain via NFS/TCP
- C4: Xen, paravirt. Linux; dom0 exposes ext3 to the guest domain via NFS/Xen-RPC
- C5: Xen, paravirt. Linux; dom0 exposes PDC-ext3 to the guest domain via NFS/Xen-RPC

First experiment: sequential file write throughput
- Create a file, write 1GB of data sequentially, close, sync

Page 55: Practical Data Confinement

Filesystem Performance Overhead

Sequential file write (1GB), elapsed time:

Config              C1     C2     C3     C4     C5
Elapsed time (s)    2.56   2.69   3.70   3.40   3.35
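Reading the sequential-write table as slowdowns relative to bare-metal Linux (C1) takes one division per configuration. The short Python snippet below uses only the measured times from the table.

```python
# Elapsed times (seconds) for the 1GB sequential write, from the table
elapsed = {"C1": 2.56, "C2": 2.69, "C3": 3.70, "C4": 3.40, "C5": 3.35}

baseline = elapsed["C1"]  # Linux on bare metal, standard ext3
relative = {cfg: t / baseline for cfg, t in elapsed.items()}

for cfg, r in relative.items():
    print(f"{cfg}: {r:.2f}x bare-metal time")
```

By this measure, PDC-ext3 over NFS/Xen-RPC (C5) takes about 1.31x the bare-metal time, close to plain ext3 over the same transport (C4, about 1.33x). This suggests that in this experiment most of the overhead comes from the NFS/Xen-RPC path rather than from tag storage.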

Page 56: Practical Data Confinement

Filesystem Performance Overhead

Second experiment: metadata operation overhead
- M1: Create a large directory tree (depth=6, fanout=6)
- M2: Remove the directory tree created by M1 (rm -rf *)

Page 57: Practical Data Confinement

Network Bandwidth Overhead

- We used iperf to measure end-to-end bandwidth between a pair of directly connected hosts
- Configurations:
  - NC1: no packet interception
  - NC2: interception and encapsulation
  - NC3: interception, encapsulation, and annotation with sensitivity tags
- The sender assigns sensitivity tags to a random sampling of outgoing packets
- We vary two parameters: Tag Prevalence (P) and Tag Fragmentation (F)

Page 58: Practical Data Confinement

Network Bandwidth Overhead

Page 59: Practical Data Confinement

Performance Evaluation - Summary

- Application performance in the safe VM
  - 10x slowdown in the worst-case scenario
  - We expect to reduce this overhead significantly through a number of optimizations
- Disk and network I/O overhead
  - Proportional to the amount of sensitive data and the degree of tag fragmentation
  - 4x overhead in the worst-case scenario (assuming 32-bit tag identifiers)

Page 60: Practical Data Confinement

Summary and Future Work

- PDC seeks a practical solution to the problem of data confinement
  - Defend against exfiltration by outside attackers
  - Prevent accidental policy violations
- A hypervisor-based architecture provides mechanisms for isolation, information flow tracking, and policy enforcement
- Currently working on
  - Improving the stability and performance of the prototype
  - Studying the issue of taint explosion in Windows and Linux environments and its implications for PDC