XS Oracle 2009 Just Run It
![Page 1: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/1.jpg)
JustRunIt: Experiment-Based Management with Xen
Wei Zheng (1), Ricardo Bianchini (1), Yoshio Turner (2), J. Renato Santos (2), G. Janakiraman (3)
(1) Rutgers University, (2) HP Labs, (3) Skytap
Xen Summit 2009
![Page 2: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/2.jpg)
Data Center Management
• Challenging
  – Variety of tasks: resource allocation, software/hardware upgrades, application and OS configuration…
  – Complex: affect performance, availability, energy consumption
  – Getting worse: larger-scale data centers hosting more diverse and dynamic workloads
• Automation might save us
  – Using analytical models: insight into system behavior, fast exploration of large parameter spaces
• Modeling has some important drawbacks
  – Expensive to develop, often relies on simplifying assumptions, may require re-calibration and re-validation as systems evolve
![Page 3: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/3.jpg)
Our Approach: Experiment-Based Management
• Experiments may be a better approach than modeling for many management tasks
  – Low cost: time and energy consumed by a few machines
  – No simplifying assumptions
  – No need for calibration/validation
• Use of VMs in production facilitates experimentation
JustRunIt – Infrastructure for experiment-based management of virtualized data centers hosting multiple services
![Page 4: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/4.jpg)
JustRunIt Architecture
[Figure: JustRunIt architecture. Components: Management Entity, Driver, Heuristics, Checker, Interpolator, Experimenter. Labeled inputs and flows: Coordinate Range, Time Limit, Coordinates, Experiment results. The results matrix is drawn with cells marked X, I, and T.]
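The architecture names an Interpolator next to the Driver, but the slides do not say which interpolation scheme it uses. As a minimal sketch, assuming plain linear interpolation between measured neighbors along one axis of the results matrix (the function name `fill_row` and the sample values are hypothetical):

```python
from typing import List, Optional


def fill_row(row: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate None entries that lie between two measured values.

    Entries outside the measured range are left as None (no extrapolation).
    """
    filled = list(row)
    known = [i for i, value in enumerate(row) if value is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = row[left] + frac * (row[right] - row[left])
    return filled


# Hypothetical response times (ms) at five CPU allocations; None = not measured.
measured = [40.0, None, 30.0, None, 20.0]
interpolated = fill_row(measured)
print(interpolated)
```

Cells that were run as experiments keep their measured values; only the gaps between them are estimated.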
![Page 8: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/8.jpg)
Experimenter
• Step 1: Clone subset of production system to a sandbox
  – VM cloning: modify Xen live migration to resume the original VM instead of destroying it
  – Storage cloning: LVM copy-on-write snapshot for sandbox VM
  – L2/L3 network address translation: implemented in the driver domain netback driver to prevent network address conflicts
  – Apply configuration changes, e.g., resource allocation
  – Resume execution
[Figure: VM cloning diagram]
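The storage-cloning step above relies on an LVM copy-on-write snapshot. As a rough sketch of what that step could look like, the helper below only builds an `lvcreate` command line and does not run it. The volume names, the snapshot size, and the helper name `lvm_snapshot_cmd` are assumptions for illustration; the slides do not show the actual commands JustRunIt issues.

```python
from typing import List


def lvm_snapshot_cmd(origin: str, name: str, size: str = "1G") -> List[str]:
    """Build an lvcreate command for a copy-on-write snapshot of `origin`.

    `size` is the space reserved for blocks that diverge from the origin.
    """
    return ["lvcreate", "--snapshot", "--size", size, "--name", name, origin]


# Hypothetical volume names; nothing is executed here.
cmd = lvm_snapshot_cmd("/dev/vg0/prod-vm-disk", "sandbox-vm-disk")
print(" ".join(cmd))
```

Because the snapshot is copy-on-write, the sandbox VM can write to its disk without touching the production volume.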
![Page 9: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/9.jpg)
[Figure: Assess performance and energy for different configurations. Production nodes labeled W1–W3, A1–A3, and D1–D3 appear in several placements; sandbox nodes S_W2, S_A2, and S_D2 receive Requests and return Replies.]
![Page 10: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/10.jpg)
Experimenter
• Step 2: Workload replay using network proxies
• Proxies filter requests/replies from the sandbox VM
• Proxies emulate the timing and functional behavior of preceding and following service tiers
  – Application-protocol-level requests/replies (e.g., HTTP)
• In-proxy replays with fixed delay
  – Delay needed if sandbox is faster than production system
[Figure: In-Proxy and Out-Proxy placed around the Tier-N VM and the Sandbox VM]
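The fixed-delay replay can be pictured as shifting every recorded request by the same offset, so the gaps between requests are preserved. This is a minimal sketch under that reading of the slide; the `Request` tuple layout, the function name `replay_schedule`, and the sample trace are hypothetical.

```python
from typing import List, Tuple

Request = Tuple[float, str]  # (arrival time in seconds, payload)


def replay_schedule(recorded: List[Request], delay: float) -> List[Request]:
    """Shift every recorded request by a fixed delay, keeping inter-arrival gaps."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return [(t + delay, payload) for t, payload in recorded]


# Hypothetical trace captured by the in-proxy from the production tier.
recorded = [(0.0, "GET /a"), (0.5, "GET /b"), (2.0, "GET /c")]
schedule = replay_schedule(recorded, delay=1.0)
print(schedule)
```

Since the delay is constant, the sandbox sees the same arrival pattern as production, just later in time.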
![Page 11: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/11.jpg)
Proxies
[Figure: request/response timelines recorded by the In-Proxy and Out-Proxy for the Tier-N VM and the Sandbox VM. Requests Req0…Reqn and responses Resp0…Respn carry timestamps (t0…tn, t0'…tn', ts0'…tsn'). Throughput and mean response time are reported for both the online system and the sandbox.]
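The two metrics named in the figure can be derived from per-request timestamps. A minimal sketch, assuming each completed request is logged as a (request time, response time) pair; the function name `summarize` and the sample timestamps are hypothetical, and the slides do not specify how the proxies compute these values.

```python
from typing import List, Tuple


def summarize(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (throughput in requests/s, mean response time in s)."""
    if not pairs:
        raise ValueError("no completed requests")
    start = min(req for req, _ in pairs)
    end = max(resp for _, resp in pairs)
    if end <= start:
        raise ValueError("observation window has zero length")
    throughput = len(pairs) / (end - start)
    mean_rt = sum(resp - req for req, resp in pairs) / len(pairs)
    return throughput, mean_rt


# Hypothetical timestamps (seconds) for the online system and the sandbox.
online = [(0.0, 0.25), (1.0, 1.25), (2.0, 2.5), (3.0, 4.0)]
sandbox = [(0.0, 0.5), (1.0, 1.5), (2.0, 2.5), (3.0, 4.0)]
print(summarize(online), summarize(sandbox))
```

Computing the same summary for both traces gives a like-for-like comparison between the online system and the sandbox.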
![Page 12: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/12.jpg)
Experiment Driver
• Fill in results matrix within a time limit
• Corners
• Midpoints (recursive)
• Heuristics
  – Cancel experiments if gain for a resource addition falls below a threshold
  – Cancel experiments for tiers that do not produce the largest gains from a resource addition
[Figure: results matrix with axes CPU allocation and CPU frequency, origin at (min, min), with some cells marked X]
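The "corners, then recursive midpoints" order can be sketched as a breadth-first subdivision of the matrix. The slides give only the two bullet points, so the exact traversal below, the function name `experiment_order`, and the use of a cell budget to stand in for the time limit are assumptions.

```python
from collections import deque
from typing import List, Tuple

Cell = Tuple[int, int]


def experiment_order(rows: int, cols: int) -> List[Cell]:
    """Order grid cells: outer corners first, then midpoints of ever-smaller rectangles."""
    if rows < 1 or cols < 1:
        raise ValueError("grid must have at least one cell")
    order: List[Cell] = []
    seen = set()
    queue = deque([(0, rows - 1, 0, cols - 1)])
    while queue:
        r0, r1, c0, c1 = queue.popleft()
        for cell in [(r0, c0), (r0, c1), (r1, c0), (r1, c1)]:
            if cell not in seen:
                seen.add(cell)
                order.append(cell)
        if r1 - r0 <= 1 and c1 - c0 <= 1:
            continue  # no cells left between these corners
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        # Split into four quadrants that share the midpoint row and column.
        queue.extend([(r0, rm, c0, cm), (r0, rm, cm, c1),
                      (rm, r1, c0, cm), (rm, r1, cm, c1)])
    return order


order = experiment_order(3, 3)
budget = 5  # hypothetical stand-in for the time limit
to_run = order[:budget]
print(to_run)
```

Truncating the order at a budget means the coarse structure of the matrix is measured first, and any cells not reached are left for interpolation.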
![Page 13: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/13.jpg)
Evaluation Overview
• Implementation
  – Xen (< 50 lines of Python, < 250 lines of C in netback)
  – Proxies for HTTP, mod_jk, MySQL (800–1500 LoC each)
    • Based on Tinyproxy codebase
• Two services (auction and bookstore)
• Case studies
  – Resource management: CPU
  – Evaluation of hardware upgrade
• Proxy and replay overhead
  – Throughput: no impact
  – Response time: < 5 ms impact
![Page 14: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/14.jpg)
Case Study 1: Resource Management
• Goal: keep response time below 50 ms using minimum resources
• JustRunIt determines performance as a function of resource allocation
• Management entity assigns resources using a bin packing algorithm (simulated annealing)
  – Minimize resource usage and VM migrations
• Compare the experiment-based approach with a highly accurate model
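To make the bin-packing step concrete, here is a small simulated-annealing sketch that places VMs on hosts while penalizing migrations. It is not the management entity's actual algorithm: the cost function, cooling schedule, migration weight, and CPU demands are all assumptions chosen for illustration.

```python
import math
import random
from typing import List


def loads(assign: List[int], demands: List[float]) -> List[float]:
    """CPU load per host for a VM-to-host assignment (one candidate host per VM)."""
    load = [0.0] * len(assign)
    for vm, host in enumerate(assign):
        load[host] += demands[vm]
    return load


def cost(assign: List[int], initial: List[int], migration_weight: float) -> float:
    """Hosts in use plus a penalty for every VM moved off its initial host."""
    moved = sum(1 for a, b in zip(assign, initial) if a != b)
    return len(set(assign)) + migration_weight * moved


def anneal(demands: List[float], capacity: float, initial: List[int],
           steps: int = 5000, migration_weight: float = 0.1, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    n = len(demands)
    current, best = list(initial), list(initial)
    temp = 1.0
    for _ in range(steps):
        cand = list(current)
        cand[rng.randrange(n)] = rng.randrange(n)  # move one VM to a random host
        if max(loads(cand, demands)) <= capacity + 1e-9:  # reject overloaded hosts
            delta = (cost(cand, initial, migration_weight)
                     - cost(current, initial, migration_weight))
            # Always accept improvements; accept worse moves with decaying probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current = cand
                if cost(current, initial, migration_weight) < cost(best, initial, migration_weight):
                    best = list(current)
        temp = max(temp * 0.999, 1e-3)
    return best


# Hypothetical per-VM CPU demands (fraction of one host) from experiment results.
demands = [0.5, 0.3, 0.2, 0.6, 0.4]
initial = [0, 1, 2, 3, 4]  # one VM per host to start
placement = anneal(demands, capacity=1.0, initial=initial)
print(placement, "hosts used:", len(set(placement)))
```

In the setting of this case study, the demands fed to the packer would come from JustRunIt's measured performance as a function of allocation rather than from fixed numbers.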
![Page 15: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/15.jpg)
Case Study 1 Results
[Figure: two panels, "JustRunIt" and "Highly accurate modeling". Each plots response time (ms, 0 to 60) against time (minutes, 1 to 9) for Service 0 through Service 3.]
![Page 16: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/16.jpg)
Case Study 2: Hardware Upgrades
• Sandbox on upgraded hardware
• Bin packing finds the number of new machines to accommodate the workload
• Example scenario: two services, each app server uses 90% of one CPU core on old hardware
  – New machine requires 72%
  – Example too small to show any consolidation benefits of upgrading
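A quick calculation shows why the example yields no consolidation benefit. The sketch assumes CPU shares simply add up and a core cannot exceed 100%, which is a simplification not stated on the slide; the helper name `servers_per_core` is hypothetical.

```python
def servers_per_core(cpu_share_percent: int) -> int:
    """How many app servers fit on one core if each needs this share of it."""
    return 100 // cpu_share_percent


old = servers_per_core(90)   # old hardware: 90% of a core per app server
new = servers_per_core(72)   # upgraded hardware: 72% of a core per app server
print(old, new)              # prints "1 1": still one app server per core
print(servers_per_core(50))  # prints "2": sharing a core needs a share of 50% or less
```

Under these assumptions, dropping from 90% to 72% frees CPU headroom but not enough to place a second app server on the same core.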
![Page 17: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/17.jpg)
Conclusions
• JustRunIt infrastructure can support multiple (automated) management tasks
– Answers “what-if” questions realistically and transparently using actual experiments
• Possible directions include:
– Tier interactions
– Hypothetical workload mix
– Validate administrator actions
– Tradeoff between accuracy and experiment cost (resource usage and experiment time required)
![Page 18: XS Oracle 2009 Just Run It](https://reader035.fdocuments.us/reader035/viewer/2022080209/54820fe0b4af9f730d8b46b3/html5/thumbnails/18.jpg)
Related Work
• Modeling, simulation of data centers
  – Development and validation cost, simulation slow-down
• Scaled-down emulation
  – DieCast [NSDI08] uses VMs and time dilation for emulation (JustRunIt targets a subset of management tasks, native execution, and minimal additional hardware)
• Sandboxing for managing data centers
  – Prior work from Rutgers (validating operator actions: OSDI04, USENIX06; entire data center experiments: EUROSYS07) and file server verification (Tan et al., USENIX05)
  – Very different infrastructure for JustRunIt (extrapolation, verification, application transparency using VM cloning – practicality)