Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)
description
Transcript of Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)
1
Resource-Freeing Attacks:Improve Your Cloud Performance
(at Your Neighbor's Expense)
(Venkat)anathan Varadarajan, Thawan Kooburat,
Benjamin Farley, Thomas Ristenpart,
and Michael Swift
DEPARTMENT OF COMPUTER SCIENCES
2
Public Clouds (EC2, Azure, Rackspace, …)
VM
Multi-tenancyDifferent customers’ virtual machines (VMs) share same server
Why multi-tenancy?Improved resource utilization
VM
VM
VM
VM
VM
VM
Implications of Multi-tenancy
• VMs share many resources– CPU, cache, memory, disk, network, etc.
• Virtual Machine Managers (VMM) – Goal: Provide Isolation
• Deployed VMMs don’t perfectly isolate VMs– Side-channels [Ristenpart et al. ’09, Zhang et al. ’12]
3
Today: Performance degraded by other customers
VM
VM
VMM
4
CPU Net Disk Cache0
100
200
300
400
500
600
Perfo
rman
ce D
egra
datio
n (%
)
Contention in Xen
Local Xen TestbedMachine Intel Xeon E5430,
2.66 GhzCPU 2 packages each
with 2 coresCache Size 6MB per package
VM
VM
Non-work-conserving CPU scheduling
Work-conservingscheduling
3x-6x Performance loss Higher cost
This work: Greedy customer can recover performance by interfering with other tenants
Resource-Freeing Attack
What can a tenant do?
5
Pack up VM and move(See our SOCC 2012 paper)… but, not all workloads cheap to move
VM
VM
Ask provider for better isolation… requires overhaul of the cloud
6
Resource-freeing attacks (RFAs)
• What is an RFA? • RFA case studies
1. Two highly loaded web server VMs2. Last Level Cache (LLC) bound VM and
highly loaded webserver VM• Demonstration on Amazon EC2
7
The Setting
Victim:– One or more VMs– Public interface (eg, http)
Beneficiary:– VM whose performance we want
to improve
Helper:– Mounts the attack
Beneficiary and victim fighting over a target resource
Helper
VM
VM
Victim
Beneficiary
Example: Network Contention
• Beneficiary & Victim– Apache webservers hosting static and
dynamic (CGI) web pages.• Target Resource: Network Bandwidth• Work-conserving scheduler– network bandwidth
8
Net
Clients
What can you do?
VictimBeneficiary
Loca
l Xen
Test
be
d
Ways to Reduce Contention?
Break into victim VM and disable it
9
Net
Clients
Loca
l Xen
Test
be
d
But:• Requires knowledge of
vulnerability• Drastic• Easy to detect
Helper
VictimBeneficiary
The good:frees up resources used by victim
Ways to Reduce Contention?
Do a simple DoS attack?
This may NOT free up target resources
10
Net
Clients
Loca
l Xen
Test
be
dBackfires: May increase the contention
Helper
SYN
floo
d
VictimBeneficiary
11
Recipe for a Successful RFAShift resource away from the target resource towards the bottleneck resource
Shift resource usage via public interface
Proportion of Network usage
CPU intensive dynamic pages
Static pagesProp
ortio
n of
CPU
usa
ge
Push
tow
ards
CPU
bott
lene
ck
Reduce target resource usage
Limits
12
An RFA in Our Example
Net
Helper
CGI R
eque
st
CPU Utilization
Clients
Result in our testbed:Increases beneficiary’s share of bandwidth
No RFA: 1800 page requests/secW/ RFA: 3026 page requests/sec
50% 85%share of
bandwidth
13
Shared CPU Cache:– Ubiquitous: Almost all workloads need cache– Hardware controlled: Not easily isolated via
software– Performance Sensitive: High performance cost!
Resource-freeing attacks 1) Send targeted requests to victim 2) Shift resources use from target to a bottleneck
Can we mount RFAs when targetresource is CPU cache?
14
Cache Contention
1000 2000 30000
50
100
150
200
250
Webserver Request Rate
Cach
e Pe
rform
ance
Deg
rada
tion
(%)
RFA Goal
Case Study: Cache vs. Network
• Victim : Apache webserver hosting static and dynamic (CGI) web pages
• Beneficiary: Synthetic cache bound workload (LLCProbe)
• Target Resource: Cache• No cache isolation:– ~3x slower when sharing
cache with webserver
15
Net
Cache
$$$ Clients
Loca
l Xen
Test
be
d
VictimBeneficiary
CoreCore
16
Net
Cache vs. NetworkVictim webserver frequently interrupts, pollutes the cache– Reason: Xen gives higher
priority to VM consuming less CPU time
Cache
Clients$$$
Cache state time line
Beneficiary starts to run
Core Core
decreased cache efficiency
Webserver receives a
request
Heavily loaded web server
cache state
17
Net
Cache vs. Network w/ RFARFA helps in two ways:1. Webserver loses its
priority.2. Reducing the capacity
of webserver.Cache
Clients$$$
Cache state time line
Core Core
HelperHeavily loaded webserver requests under RFA
CGI R
eque
stBeneficiary starts to run
Webserver receives a
request
Heavily loaded web server
cache state
18
RFA: Performance ImprovementRFA intensities – time in ms per second
196% slowdown
86% slowdown
60%Performance Improvement
19
RFA Effect on InterruptionsBeneficiary: LLCProbe
40%
85%
x+
20
RFA Effect on Victim’s capacity
Decreases with increasing RFA intensity
21
Instance type m1.small
# of co-resident pairs 9 (23 total instances)
Machine type Intel Xeon E5507 with 4MB LLC
Experiments on Amazon EC2
VM
VM
VM
VM
VM
Multiple Accounts
Co-resident VMs from our accounts:Stand-ins for victim and beneficiary
Separate instances for helper and web clients
No direct interact with any other customersIndirect interaction akin to normal usage cases
VM
LLCProbe Synthetic Benchmark
RFA improved performance of LLCProbe on all experimental EC2 instances!
Highest performance improvement of 13%,
recovering 33% of performance lost.
22
Average performance improvement: 6%
23
mcf from SPEC-CPU
10% slowdown
6% slowdown
3% performance improvement = 35% reduction in performance loss
On average RFA improved performance across all SPEC workloads!
24
Discussion: Practical Aspects
RFA case studies used CPU intensive CGI requests– Alternative: DoS vulnerabilities
(Eg. hash-collision attacks)Identifying co-resident victims– Easy on most clouds
(Co-resident VMs have predictable internal IP addresses)
No public interface? – Paper discusses possibilities for RFAs
VM
VM
25
Conclusion
Resource-Freeing Attacks– Interfere with victim to shift
resource use – Proof-of-concept of efficacy in
public clouds
Open questions: – Other RFAs? – Countermeasures: Detection, stricter
isolation, smarter scheduling?
VM
VM
26
References[MMSys10] Sean K. Barker and Prashant Shenoy. “Empirical evaluation of latency-sensitive application performance in the cloud.” In MMSys, 2010.
[Security10] Thomas Moscibroda and Onur Mutlu. “Memory performance attacks: Denial of memory service in multi-core systems.” In Usenix Security Symposium, 2007.
[CCS09] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. “Hey, you, get off my cloud: exploring information leakage in third party compute clouds.” In CCS, 2009.
27
Backup Slides
28
Discussion: Countermeasures
Detection?– May be hard to differentiate RFA from legitimate
Stricter Isolation?– Works but expensive
Contention-aware scheduling– Not yet used in public IaaS
29
Discussion: Economies• Cost of RFA
– Helper instance, and– RFA traffic.
• Co-resident helper– An efficient implementation of helper can run inside the
attacker’s VM.– Current helper implementation consumes 15 Kbps of network
bandwidth and a CPU utilization of 0.7%.• Multiplex Singe Helper Instance for many beneficiaries.• Note: Currently, internal EC2 network traffic is free-of-
cost.
30
Identifying Co-resident VMs
• Identifying the public interface:– Predictable numerical distance between internal
IP addresses in public clouds.– Identifying port used by the victim application
(standard ports like http(s), etc.).
31
Experiment: Measuring Resource Contention
• Synthetic workloads
32
Other RFAs
• RFAs are not limited to the presented case studies.
• LLC vs. Disk– Sending spurious, random disk requests
asynchronously to create a bottleneck for the shared disk resource.
• Memory vs. Disk– Similarly to the above RFA
33
Discussion: More on Practical Aspects
• Work-conserving vs. Non-work-conserving schedulers– It is expected that public cloud environment manage
resources in a non-work-conserving fashion.– Eg. Net vs. Net RFA won’t work on Amazon EC2.
• Simulated client workload– What is the effect of RFA in the presence of multiple
independent client requests originating from numerous clients?
34
N/WCore Core Core Core
cache
memory
Disk
Hypervisor
Dom0 Dom0 Dom0 Dom0
VM VM VM VM
VM VM VM VM
Xen Internals
• Domain-0– Privileged Domain, direct
access to I/O devices.– All I/O requests goes
through Dom-0• Xen scheduler internal– Boost priority for
interactive workloads
Inco
min
g re
ques
t
35
Experiment: Measuring Resource Contention
• On a local Xen test bed
Loca
l Xen
Test
be
d
VM
N/WCore Core Core Core
VM
LLC
memory
Disk
VM VM VM
VMVM VM VM
Machine Intel Xeon E5430, 2.66 Ghz
Packages 2, 2 cores per package
LLC Size 6MB per package LLC
CPU Net Disk Memory Cache
0
100
200
300
400
500
600
CPUNetDiskMemoryCache
Conflicting Workloads
Perf
orm
ance
Deg
rada
tion
(%)
Observed Workloads:
Not all resources
conflict
Some have huge performance degradation
36
Boost Priority and Interruptions
Victim: Webserver Beneficiary: LLCProbe
95%
< 30%
40%
85%
Fewer interruptions Higher cache efficiency
37
Demonstration on EC2
• Problem #1: Achieving Co-residence– Launching multiple instances simultaneously from
two or more accounts.• Problem #2: Verifying Co-residency– Numerical distance between internal IP addresses
[CCS09].– Faster packet round-trip times.– Using resource contention experiments.
38
Normalized Performance on EC2
Baseline
Higher is better
Aggregate performance
degradation is within 5 performance points
6%
On an average all SPEC workloads
benefitted from RFA