1
Online Phase-Adaptive Data Layout Selection
ECOOP, 10 July 2008
Chengliang Zhang, Microsoft (former IBM intern)
Martin Hirzel, IBM
2
Problem Statement
Online Phase-Adaptive Data Layout Selection
Cache line or page A
Cache line or page B
Object 1 Object 2
Object 3 Object 4
try
measure
decide
No training run.
3
Data Layouts from Copying Garbage Collection
BF
HI
4
Layout Performance Comparison
8-processor AMD
HI faster
BF faster
5
Multi-Armed Bandit Problem
BF
HI
Depth-first
Allocation order
Popularity
Random
Size
Type
Thread
6
Layout Auditing
try: Data reorganizer
Program
measure: Profiler
decide: Controller
Data layout
Performance
Profiling decision
Layout decision
Reward
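The try / measure / decide cycle above can be sketched as a small loop. This is a minimal illustration in Python, not the authors' implementation: `audit_loop`, `run_interval`, `decide`, and the toy policy `try_then_best` are hypothetical names, and the reward values are made up purely to exercise the loop.

```python
from statistics import mean

def audit_loop(layouts, run_interval, decide, intervals):
    """Run the try / measure / decide cycle for a fixed number of intervals.

    run_interval(layout) -> reward  (hypothetical: reorganize data, run, profile)
    decide(rewards) -> layout       (hypothetical controller policy)
    """
    rewards = {layout: [] for layout in layouts}
    history = []
    for _ in range(intervals):
        layout = decide(rewards)        # decide: controller picks a layout
        reward = run_interval(layout)   # try the layout and measure its reward
        rewards[layout].append(reward)
        history.append(layout)
    return history, rewards

def try_then_best(rewards):
    """Toy stand-in policy: try each layout once, then keep the best average."""
    for layout, samples in rewards.items():
        if not samples:
            return layout
    return max(rewards, key=lambda layout: mean(rewards[layout]))

# Made-up rewards, for illustration only.
fake_rewards = {"BF": 1.0, "HI": 2.0}
history, rewards = audit_loop(["BF", "HI"], fake_rewards.get, try_then_best, 4)
```

The toy policy never revisits a layout that looked worse once, which is exactly the weakness the later slides on curiosity and phase changes address.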
7
Profiler
measure: Profiler
Profiling decision
Reward
Timeline: Data reorg., Program, Data reorg., Program, Data reorg., Program
Data layout: l3, l4, l5
Physical time (wall clock): r3 e3, r4 e4, r5 e5
Virtual time (allocated bytes): v3, v4, v5
Reward for layout l_i uses historical average of:
• Virtual time v_i / program execution time e_i
• Virtual time v_(i-1) / reorganizer time r_i
(always on)
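The two ratios in the reward definition can be written down directly. The sketch below is a hedged illustration in Python: the class name `LayoutRewardHistory` and its methods are hypothetical, and because the slide does not say how the two historical averages are combined into one reward, the sketch only tracks them side by side.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class LayoutRewardHistory:
    """Per-layout history of the two rates named on the slide (hypothetical helper)."""
    exec_rates: list = field(default_factory=list)   # v_i / e_i samples
    reorg_rates: list = field(default_factory=list)  # v_(i-1) / r_i samples

    def record(self, v_i, e_i, v_prev, r_i):
        # Allocated bytes per unit of program execution time.
        self.exec_rates.append(v_i / e_i)
        # Previous interval's allocated bytes per unit of reorganizer time.
        self.reorg_rates.append(v_prev / r_i)

    def averages(self):
        # Plain historical averages; combining them is left open here.
        return mean(self.exec_rates), mean(self.reorg_rates)

# Made-up measurements, for illustration only.
h = LayoutRewardHistory()
h.record(v_i=100.0, e_i=2.0, v_prev=80.0, r_i=4.0)
h.record(v_i=300.0, e_i=3.0, v_prev=100.0, r_i=5.0)
```

Using allocated bytes as virtual time makes both rates comparable across intervals of different wall-clock length.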
8
Controller: Blind Justice
Profiling decision
Layout decision
Rewards
Goals
• Match performance of best layout
• Online
Challenges
• Confidence vs. Curiosity
• Phase changes vs. Noise
9
Confidence vs. Curiosity
Confidence, Curiosity
• Never tried layout
• Few samples / High variance
• Many samples / Low variance
0 ∞
Pick layout l if either:
• High confidence that l gives best reward
• High curiosity about l's reward
⇒ use simulated annealing
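One way to picture the confidence / curiosity trade-off is a temperature-controlled random choice, in the spirit of simulated annealing. The slides do not give the controller's actual formula, so the following Python sketch is an assumption throughout: `selection_probabilities`, `pick_layout`, and the `temperature` parameter are hypothetical, and Boltzmann weighting is a stand-in technique, not the authors' method.

```python
import math
import random

def selection_probabilities(mean_rewards, temperature):
    """Boltzmann weights over layouts (assumed rule, temperature must be > 0).

    Low temperature concentrates on the best mean reward (confidence);
    high temperature spreads the choice across layouts (curiosity).
    """
    best = max(mean_rewards.values())
    weights = {l: math.exp((m - best) / temperature)
               for l, m in mean_rewards.items()}
    total = sum(weights.values())
    return {l: w / total for l, w in weights.items()}

def pick_layout(samples, temperature, rng):
    """Pick a layout; a never-tried layout is treated as maximally curious."""
    untried = [l for l, s in samples.items() if not s]
    if untried:
        return untried[0]
    means = {l: sum(s) / len(s) for l, s in samples.items()}
    probs = selection_probabilities(means, temperature)
    layouts = list(probs)
    return rng.choices(layouts, weights=[probs[l] for l in layouts], k=1)[0]

# Made-up rewards, for illustration only.
first = pick_layout({"BF": [1.0], "HI": []}, 1.0, random.Random(0))
cold = selection_probabilities({"BF": 1.0, "HI": 2.0}, 0.01)
hot = selection_probabilities({"BF": 1.0, "HI": 2.0}, 1000.0)
```

Here `first` is the untried layout, `cold` puts nearly all probability on the better layout, and `hot` is close to uniform.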
10
Phase Changes vs. Noise
Phase Adaptivity
– When layout performance changes, learn new best layout
⇒ Forget historical rewards
Noise Tolerance
– Perturbation from extraneous causes
⇒ Remember historical rewards
⇒ use exponential decay
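Exponential decay balances forgetting and remembering by weighting each reward by the decay factor raised to its age. The Python sketch below is a minimal illustration under that assumption; the class `DecayedAverage` is a hypothetical helper, and the exact update rule used in the talk's system is not given on the slide.

```python
class DecayedAverage:
    """Exponentially decayed average of rewards (hypothetical helper).

    decay = 1.0 gives the plain historical mean (remember everything);
    smaller values forget old rewards faster (adapt to phase changes).
    """

    def __init__(self, decay):
        self.decay = decay
        self.weighted_sum = 0.0
        self.weight = 0.0

    def add(self, reward):
        # Age every earlier sample by one step, then add the new one at weight 1.
        self.weighted_sum = self.decay * self.weighted_sum + reward
        self.weight = self.decay * self.weight + 1.0

    def value(self):
        # None until at least one reward has been recorded.
        return self.weighted_sum / self.weight if self.weight else None

# Made-up rewards, for illustration only.
no_decay = DecayedAverage(1.0)
for r in (1.0, 2.0, 3.0):
    no_decay.add(r)

fast = DecayedAverage(0.5)
fast.add(0.0)
fast.add(10.0)
```

With no decay the three samples average to their plain mean, while the decayed average of 0 followed by 10 sits closer to the recent value.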
11
SASO Properties of Control Systems
• Stability
• Accuracy
• Settling
• Overshoot
• Phase adaptivity
• Overhead
12
Methodology
20 Java programs (DaCapo suite, SPECjvm98 suite, and a few more)
J9 = IBM’s product Java VM
HI = hierarchical
BF = breadth-first
LA = layout auditing
4 hardware platforms: Intel-2, AMD-2, AMD-4, AMD-8
13
Accuracy and Overhead
Figure: three bar charts comparing BF, HI, and LA on Intel-2, AMD-2, AMD-4, and AMD-8. Panels: average % slowdown vs. best; worst % slowdown vs. best; number of programs not optimal.
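The "% slowdown vs. best" metric can be illustrated with a short Python sketch. It assumes the metric compares each configuration's run time with the fastest configuration for the same program; the function name `percent_slowdown_vs_best` is hypothetical and the timings are made up, not taken from the talk.

```python
def percent_slowdown_vs_best(times):
    """Percent slowdown of each configuration relative to the fastest one.

    times maps a configuration name (e.g. "BF", "HI", "LA") to a run time.
    """
    best = min(times.values())
    return {name: 100.0 * (t / best - 1.0) for name, t in times.items()}

# Made-up run times in seconds, for illustration only.
slowdowns = percent_slowdown_vs_best({"BF": 10.0, "HI": 11.0, "LA": 10.0})
```

Averaging these per-program values over a suite would give an "average % slowdown" figure, and taking the maximum would give the "worst" one.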
14
Stability and Settling
Figure: reward and layout decision (BF or HI) over time, for Decay = 0.9 and for no decay (Decay = 1.0), across phases labeled "BF better" and "HI better". Annotated times: 75s and 20s.
15
Related Work
• Lau/Arnold/Hind/Calder PLDI’06: performance auditing for JIT optimization
• Soman/Krintz/Bacon ISMM’04: switch copy vs. mark-sweep, generations or not
• Chen/Bhansali/Chilimbi/Gao/Chuang PLDI’06: throttle unless miss rate reduced
• Saavedra/Park PACT’96: adapt prefetch distance based on cancellation & latency
16
Conclusions
• Accurate
• Phase adaptive (good settling/stability)
• Negligible overhead profiling
• Online, hardware independent
17
Clustering Layouts by Performance [SIGMETRICS 2007]
BF
HI