Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)

1

Resource-Freeing Attacks:Improve Your Cloud Performance

(at Your Neighbor's Expense)

(Venkat)anathan Varadarajan, Thawan Kooburat,

Benjamin Farley, Thomas Ristenpart,

and Michael Swift

DEPARTMENT OF COMPUTER SCIENCES

2

Public Clouds (EC2, Azure, Rackspace, …)

VM

Multi-tenancyDifferent customers’ virtual machines (VMs) share same server

Why multi-tenancy?Improved resource utilization

VM

VM

VM

VM

VM

VM

Implications of Multi-tenancy

• VMs share many resources– CPU, cache, memory, disk, network, etc.

• Virtual Machine Managers (VMM) – Goal: Provide Isolation

• Deployed VMMs don’t perfectly isolate VMs– Side-channels [Ristenpart et al. ’09, Zhang et al. ’12]

3

Today: Performance degraded by other customers

VM

VM

VMM

4

CPU Net Disk Cache0

100

200

300

400

500

600

Perfo

rman

ce D

egra

datio

n (%

)

Contention in Xen

Local Xen TestbedMachine Intel Xeon E5430,

2.66 GhzCPU 2 packages each

with 2 coresCache Size 6MB per package

VM

VM

Non-work-conserving CPU scheduling

Work-conservingscheduling

3x-6x Performance loss Higher cost

This work: Greedy customer can recover performance by interfering with other tenants

Resource-Freeing Attack

What can a tenant do?

5

Pack up VM and move(See our SOCC 2012 paper)… but, not all workloads cheap to move

VM

VM

Ask provider for better isolation… requires overhaul of the cloud

6

Resource-freeing attacks (RFAs)

• What is an RFA? • RFA case studies

1. Two highly loaded web server VMs2. Last Level Cache (LLC) bound VM and

highly loaded webserver VM• Demonstration on Amazon EC2

7

The Setting

Victim:– One or more VMs– Public interface (eg, http)

Beneficiary:– VM whose performance we want

to improve

Helper:– Mounts the attack

Beneficiary and victim fighting over a target resource

Helper

VM

VM

Victim

Beneficiary

Example: Network Contention

• Beneficiary & Victim– Apache webservers hosting static and

dynamic (CGI) web pages.• Target Resource: Network Bandwidth• Work-conserving scheduler– network bandwidth

8

Net

Clients

What can you do?

VictimBeneficiary

Loca

l Xen

Test

be

d

Ways to Reduce Contention?

Break into victim VM and disable it

9

Net

Clients

Loca

l Xen

Test

be

d

But:• Requires knowledge of

vulnerability• Drastic• Easy to detect

Helper

VictimBeneficiary

The good:frees up resources used by victim

Ways to Reduce Contention?

Do a simple DoS attack?

This may NOT free up target resources

10

Net

Clients

Loca

l Xen

Test

be

dBackfires: May increase the contention

Helper

SYN

floo

d

VictimBeneficiary

11

Recipe for a Successful RFAShift resource away from the target resource towards the bottleneck resource

Shift resource usage via public interface

Proportion of Network usage

CPU intensive dynamic pages

Static pagesProp

ortio

n of

CPU

usa

ge

Push

tow

ards

CPU

bott

lene

ck

Reduce target resource usage

Limits

12

An RFA in Our Example

Net

Helper

CGI R

eque

st

CPU Utilization

Clients

Result in our testbed:Increases beneficiary’s share of bandwidth

No RFA: 1800 page requests/secW/ RFA: 3026 page requests/sec

50% 85%share of

bandwidth

13

Shared CPU Cache:– Ubiquitous: Almost all workloads need cache– Hardware controlled: Not easily isolated via

software– Performance Sensitive: High performance cost!

Resource-freeing attacks 1) Send targeted requests to victim 2) Shift resources use from target to a bottleneck

Can we mount RFAs when targetresource is CPU cache?

14

Cache Contention

1000 2000 30000

50

100

150

200

250

Webserver Request Rate

Cach

e Pe

rform

ance

Deg

rada

tion

(%)

RFA Goal

Case Study: Cache vs. Network

• Victim : Apache webserver hosting static and dynamic (CGI) web pages

• Beneficiary: Synthetic cache bound workload (LLCProbe)

• Target Resource: Cache• No cache isolation:– ~3x slower when sharing

cache with webserver

15

Net

Cache

$$$ Clients

Loca

l Xen

Test

be

d

VictimBeneficiary

CoreCore

16

Net

Cache vs. NetworkVictim webserver frequently interrupts, pollutes the cache– Reason: Xen gives higher

priority to VM consuming less CPU time

Cache

Clients$$$

Cache state time line

Beneficiary starts to run

Core Core

decreased cache efficiency

Webserver receives a

request

Heavily loaded web server

cache state

17

Net

Cache vs. Network w/ RFARFA helps in two ways:1. Webserver loses its

priority.2. Reducing the capacity

of webserver.Cache

Clients$$$

Cache state time line

Core Core

HelperHeavily loaded webserver requests under RFA

CGI R

eque

stBeneficiary starts to run

Webserver receives a

request

Heavily loaded web server

cache state

18

RFA: Performance ImprovementRFA intensities – time in ms per second

196% slowdown

86% slowdown

60%Performance Improvement

19

RFA Effect on InterruptionsBeneficiary: LLCProbe

40%

85%

x+

20

RFA Effect on Victim’s capacity

Decreases with increasing RFA intensity

21

Instance type m1.small

# of co-resident pairs 9 (23 total instances)

Machine type Intel Xeon E5507 with 4MB LLC

Experiments on Amazon EC2

VM

VM

VM

VM

VM

Multiple Accounts

Co-resident VMs from our accounts:Stand-ins for victim and beneficiary

Separate instances for helper and web clients

No direct interact with any other customersIndirect interaction akin to normal usage cases

VM

LLCProbe Synthetic Benchmark

RFA improved performance of LLCProbe on all experimental EC2 instances!

Highest performance improvement of 13%,

recovering 33% of performance lost.

22

Average performance improvement: 6%

23

mcf from SPEC-CPU

10% slowdown

6% slowdown

3% performance improvement = 35% reduction in performance loss

On average RFA improved performance across all SPEC workloads!

24

Discussion: Practical Aspects

RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities

(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds

(Co-resident VMs have predictable internal IP addresses)

No public interface? – Paper discusses possibilities for RFAs

VM

VM

25

Conclusion

Resource-Freeing Attacks– Interfere with victim to shift

resource use – Proof-of-concept of efficacy in

public clouds

Open questions: – Other RFAs? – Countermeasures: Detection, stricter

isolation, smarter scheduling?

VM

VM

26

References[MMSys10] Sean K. Barker and Prashant Shenoy. “Empirical evaluation of latency-sensitive application performance in the cloud.” In MMSys, 2010.

[Security10] Thomas Moscibroda and Onur Mutlu. “Memory performance attacks: Denial of memory service in multi-core systems.” In Usenix Security Symposium, 2007.

[CCS09] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. “Hey, you, get off my cloud: exploring information leakage in third party compute clouds.” In CCS, 2009.

27

Backup Slides

28

Discussion: Countermeasures

Detection?– May be hard to differentiate RFA from legitimate

Stricter Isolation?– Works but expensive

Contention-aware scheduling– Not yet used in public IaaS

29

Discussion: Economies• Cost of RFA

– Helper instance, and– RFA traffic.

• Co-resident helper– An efficient implementation of helper can run inside the

attacker’s VM.– Current helper implementation consumes 15 Kbps of network

bandwidth and a CPU utilization of 0.7%.• Multiplex Singe Helper Instance for many beneficiaries.• Note: Currently, internal EC2 network traffic is free-of-

cost.

30

Identifying Co-resident VMs

• Identifying the public interface:– Predictable numerical distance between internal

IP addresses in public clouds.– Identifying port used by the victim application

(standard ports like http(s), etc.).

31

Experiment: Measuring Resource Contention

• Synthetic workloads

32

Other RFAs

• RFAs are not limited to the presented case studies.

• LLC vs. Disk– Sending spurious, random disk requests

asynchronously to create a bottleneck for the shared disk resource.

• Memory vs. Disk– Similarly to the above RFA

33

Discussion: More on Practical Aspects

• Work-conserving vs. Non-work-conserving schedulers– It is expected that public cloud environment manage

resources in a non-work-conserving fashion.– Eg. Net vs. Net RFA won’t work on Amazon EC2.

• Simulated client workload– What is the effect of RFA in the presence of multiple

independent client requests originating from numerous clients?

34

N/WCore Core Core Core

cache

memory

Disk

Hypervisor

Dom0 Dom0 Dom0 Dom0

VM VM VM VM

VM VM VM VM

Xen Internals

• Domain-0– Privileged Domain, direct

access to I/O devices.– All I/O requests goes

through Dom-0• Xen scheduler internal– Boost priority for

interactive workloads

Inco

min

g re

ques

t

35

Experiment: Measuring Resource Contention

• On a local Xen test bed

Loca

l Xen

Test

be

d

VM

N/WCore Core Core Core

VM

LLC

memory

Disk

VM VM VM

VMVM VM VM

Machine Intel Xeon E5430, 2.66 Ghz

Packages 2, 2 cores per package

LLC Size 6MB per package LLC

CPU Net Disk Memory Cache

0

100

200

300

400

500

600

CPUNetDiskMemoryCache

Conflicting Workloads

Perf

orm

ance

Deg

rada

tion

(%)

Observed Workloads:

Not all resources

conflict

Some have huge performance degradation

36

Boost Priority and Interruptions

Victim: Webserver Beneficiary: LLCProbe

95%

< 30%

40%

85%

Fewer interruptions Higher cache efficiency

37

Demonstration on EC2

• Problem #1: Achieving Co-residence– Launching multiple instances simultaneously from

two or more accounts.• Problem #2: Verifying Co-residency– Numerical distance between internal IP addresses

[CCS09].– Faster packet round-trip times.– Using resource contention experiments.

38

Normalized Performance on EC2

Baseline

Higher is better

Aggregate performance

degradation is within 5 performance points

6%

On an average all SPEC workloads

benefitted from RFA

Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)

Documents

Transcript of Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)