Towards a Smart Workload Generator on RAMP
Archana Ganapathi, David Patterson, Anthony Joseph
{archanag, pattrsn, adj} @ cs.berkeley.edu
RADLAB goals
Help single-operator/developer sites become Google-scale
Eliminate SW/HW obstacles to scaling
Tools to identify and fix problems
Use RAMP to move websites from right to left
[Figure: Top 50 Web Domains, in millions of unique visitors for Feb 2006. Source: Washington Post 3/31/2006]
RADLAB Goals (2)
[Figure: YouTube.com 2006 daily traffic ranking, Jan to Jun, log scale. Source: Alexa.com]
Challenges:
Scalability
Configurability
Single-person operation
Cost-effectiveness
Reproducibility and observability
RAMP for Time Travel
UCB goal for RAMP: data center in a box
Google/Amazon.com: O(10,000) processors in a data center
Anticipate the load 3-6 months in the future for a fast-moving company
Smart Workload Generation: “smart gateware” design informed by SML analysis of workload
Time Dilation on Emulated Machines: try software/config changes and observe behavior prior to deployment
Targeted Component-specific Load Generation: stress-test components in the critical path to determine performance limitations
RAMP as emulation environment + workload generator
Building Blocks
Workload generation engine
Parser to extract data from server response
Workload description language to specify primitives to compile onto an FPGA
Machine learning techniques to discover web interactions
[Figure: block diagram with the labels "ML to determine interactions", "User scripts", "Real System", and "Workload Generator"]
Naïve Workload Generator
(CS252 class project by Lorenzo Orecchia and Madhur Tulsiani)
Generate the data-set using analytical models:
  server file size distribution
  request size distribution
  relative file popularity
Derive URL connectivity graph, load it in memory
Circuit logic performs a random walk on the graph
We achieve our goal of scalability:
  Graph Size: 1048576
  Memory Usage: 21 MB
  Total data size: 21083 MB
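The random walk over a URL connectivity graph can be illustrated in software. The following is a minimal Python sketch, not the project's circuit logic: the graph construction, the Zipf-like link weighting, and all function and parameter names (`build_graph`, `random_walk`, `out_degree`, `alpha`) are assumptions made for illustration.

```python
import random
from itertools import accumulate


def zipf_weights(n, alpha=1.0):
    """Unnormalized Zipf-like weights: the page of rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]


def build_graph(num_pages, out_degree, rng):
    """Hypothetical URL graph: each page links to `out_degree` pages,
    with popular (low-rank) pages chosen as link targets more often."""
    cum = list(accumulate(zipf_weights(num_pages)))
    pages = range(num_pages)
    return [rng.choices(pages, cum_weights=cum, k=out_degree) for _ in pages]


def random_walk(graph, start, steps, rng):
    """Follow one uniformly chosen outgoing link per step; return the visited pages."""
    page = start
    trace = [page]
    for _ in range(steps):
        page = rng.choice(graph[page])
        trace.append(page)
    return trace


if __name__ == "__main__":
    rng = random.Random(0)
    graph = build_graph(num_pages=4096, out_degree=8, rng=rng)
    print(random_walk(graph, start=0, steps=10, rng=rng))
```

Each entry of the returned trace stands for one generated request; a hardware version would run many such walks in parallel against the in-memory graph.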
Scalability: RAMP DRAM limits
Given: 2 GB DRAMs, 4 DRAM banks per FPGA, 100 MHz clock (cycle ~ 10 ns)
Per cycle = 21 bits per walk (21 Mbits for 1M walks)
Assume 10 clock cycles per access => 100 ns per access => 10 million accesses per second per bank per walk
Given four DRAM banks = 40M accesses per second
Compare: Google receives 2000 requests per second
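The DRAM estimate is straightforward arithmetic and can be reproduced as below. This sketch uses only the figures stated on the slide; the constant and function names are illustrative.

```python
CLOCK_HZ = 100_000_000       # 100 MHz clock, i.e. about 10 ns per cycle
CYCLES_PER_ACCESS = 10       # assumed on the slide: 10 cycles (100 ns) per DRAM access
BANKS_PER_FPGA = 4           # four DRAM banks per FPGA
GOOGLE_REQ_PER_SEC = 2000    # comparison figure quoted on the slide


def accesses_per_second(clock_hz, cycles_per_access, banks):
    """Peak accesses per second if every bank is kept busy back to back."""
    return clock_hz // cycles_per_access * banks


per_bank = accesses_per_second(CLOCK_HZ, CYCLES_PER_ACCESS, 1)
per_fpga = accesses_per_second(CLOCK_HZ, CYCLES_PER_ACCESS, BANKS_PER_FPGA)

if __name__ == "__main__":
    print(f"per bank: {per_bank:,} accesses/s")
    print(f"per FPGA: {per_fpga:,} accesses/s")
    print(f"ratio to quoted Google rate: {per_fpga // GOOGLE_REQ_PER_SEC:,}x")
```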
Scalability: RAMP Ethernet limits
Given: 20 10 Gbit/sec Ethernet ports per board
Assume we can generate 100 million accesses/sec
Naïve response-ignorant workload generation: 4 bytes for URL + checksum + header… ~ 32 bytes => ~3 GB for 100 million accesses (per second)
Smart response-driven workload generation:
  Google: 23 KB, Flickr: 45 KB, CNN: 100 KB
  Assume up to 200 KB response
  Can receive 50K responses per second per port = 1M responses handled by 20 10G ports
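The Ethernet figures can be checked with a back-of-the-envelope calculation. The sketch below assumes raw line rate with no framing or protocol overhead and takes 1 KB = 1000 bytes; both are simplifying assumptions, and the names are illustrative. Under these assumptions, 50K responses per second per port is close to the bound for a 23 KB response, while a 200 KB response gives a lower per-port bound.

```python
PORTS = 20
PORT_BITS_PER_SEC = 10_000_000_000            # 10 Gbit/s per port
PORT_BYTES_PER_SEC = PORT_BITS_PER_SEC // 8   # 1.25 GB/s per port, raw line rate
BOARD_BYTES_PER_SEC = PORTS * PORT_BYTES_PER_SEC


def request_bandwidth(requests_per_sec, bytes_per_request):
    """Outbound bytes per second for response-ignorant generation."""
    return requests_per_sec * bytes_per_request


def responses_per_port(response_bytes):
    """Upper bound on whole responses one port can receive per second."""
    return PORT_BYTES_PER_SEC // response_bytes


naive_bytes_per_sec = request_bandwidth(100_000_000, 32)

if __name__ == "__main__":
    print(f"naive generation: {naive_bytes_per_sec / 1e9:.1f} GB/s "
          f"of {BOARD_BYTES_PER_SEC / 1e9:.0f} GB/s available")
    for name, size in [("Google", 23_000), ("Flickr", 45_000),
                       ("CNN", 100_000), ("upper bound", 200_000)]:
        per_port = responses_per_port(size)
        print(f"{name}: {per_port:,} responses/s per port, "
              f"{per_port * PORTS:,} per board")
```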
Ethernet limits cont.
About 1000X Google average with smart (response-driven) generation
Mixed RAMP emulation/workload generation: even higher BW inside box
Have plenty of headroom to trade off speed for greater accuracy of workload
Some Open Questions
Limits on types of workloads?
Workload trace sources? Web services, existing traffic generators, …?
Role of Response-Ignorant trace generation? UDP, error/congestion-free TCP, …?
Required level of fidelity for Response-Driven trace generation? How much of TCP FSM to model?
Questions/Feedback?
Backup Slides
State of the art
Hardware vs. software based: Hammer, Optixia vs. SURGE, SLAMd
Tunability vs. Automation: SPECweb, TPC-W, Harpoon vs. Optixia, SLAMd
Realistic vs. Synthetic: SURGE, SLAMd, Harpoon vs. TPC-W
Generic vs. App-Specific: SLAMd, Harpoon vs. TPC-W, Hammer
Open-loop vs. Closed-loop: partly-open loop is most realistic for web services
Workload Generator Next Steps
Handle server responses
  Include “server response” states in logic
  Parse server response to identify current state
Include think-time distribution
  User think-time + server response time
  What happens when things go wrong?
Improve temporal/spatial locality
  Prefetch other URLs a page is linked to
  Take advantage of Zipfian popularity distribution
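One way to picture the first two next steps is a per-user state machine that waits for the server response and then pauses for a think time before sending the next request. The sketch below is a hypothetical software model, not the planned FPGA logic: the three states, the exponential distributions for response time and think time, and all names are assumptions chosen for illustration.

```python
import random
from enum import Enum, auto


class State(Enum):
    SEND_REQUEST = auto()
    AWAIT_RESPONSE = auto()   # the "server response" state
    THINK = auto()            # user think time before the next request


def simulate_session(num_requests, mean_think_s, mean_response_s, rng):
    """Return the simulated send time (in seconds) of each request in one session."""
    clock = 0.0
    state = State.SEND_REQUEST
    send_times = []
    while True:
        if state is State.SEND_REQUEST:
            if len(send_times) == num_requests:
                return send_times
            send_times.append(clock)
            state = State.AWAIT_RESPONSE
        elif state is State.AWAIT_RESPONSE:
            # Assumed exponential server response time.
            clock += rng.expovariate(1.0 / mean_response_s)
            state = State.THINK
        else:
            # Assumed exponential user think time.
            clock += rng.expovariate(1.0 / mean_think_s)
            state = State.SEND_REQUEST


if __name__ == "__main__":
    times = simulate_session(5, mean_think_s=2.0, mean_response_s=0.1,
                             rng=random.Random(0))
    print([round(t, 3) for t in times])
```

Because the next request is issued only after the response and think time have elapsed, this models a closed loop; error handling ("what happens when things go wrong?") is left out of the sketch.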
Sketch of Random Walk Module
[Figure: block diagram of the random walk module; the only surviving label is MEMORY]
Data-set parameters
Graph Size   Memory Usage   Total data size   Average file size
4096         50 KB          2138 MB           521 KB
65536        1 MB           3121 MB           47.6 KB
1048576      21 MB          21083 MB          20.11 KB
Circuit properties
Device: Virtex-E
Maximum delay
  Walk Module: 3.99 ns
  Memory Module: 5.82 ns
Estimated frequency = 1000 / (3.99 + 5.82) = 1000 / 9.81 = 101.93 MHz
Number of LUTs per walk: 593
Number of slices per walk: 307
Request Size Distribution
[Figure: observed vs. expected number of requests (out of 20000) by request size (KB), for sizes from 8 to 424 KB]
Popularity Distribution
[Figure: observed vs. expected number of requests (out of 20000) by file rank]