R®
Slide 1MICRO 2004
The Fuzzy Correlation between Code and Performance
PredictabilityMurali Annavaram, Ryan Rakvic, Marzia
Polito, Jean-Yves Bouguet, Richard Hankins, Bob Davies
Intel Corporation
Acknowledgements to John Shen, Carole Dulong, Brad Calder
R®
Slide 2MICRO 2004
Why Correlate Code and Performance?
• Predict CPI by observing just EIPs (or PCs)– Assume, similar EIP sequence -> similar CPI– Shown to work well for some CPU2K benchmarks – For improving simulation speed and dynamic optimizations
• Do server workloads exhibit this correlation?– Large code base and non-loopy code path– Processes communicate through inter process communication– Use of OS services
• Regression Trees– Quantify CPI prediction accuracy using EIPs– Find upper bound for correlation
R®
Slide 3MICRO 2004
Workloads & Experimental Infrastructure
• Three representative server workloads– ODB-C
• OLTP benchmark on Oracle 10g RDBMS– ODB-H
• DSS benchmark on Oracle 10g RDBMS– SPECjAppServer (SjAS)
• 3-Tier Application• Focus on middleware application server running BEA Weblogic
• Workloads tuned for maximum CPU utilization• Hardware Configuration
– Itanium-2 processor based system– Red Hat 2.1 + kernel 2.4.9-e.10smp– 16 GB of DDR memory– 34 Ultra320 SCSI 73 GB drives (used in ODB-C and ODB-H)– 200 MHz FS Bus
R®
Slide 4MICRO 2004
Tool Chain: Step 1: Data Collection
Benchmark
• VTUNE: Non-intrusive performance monitoring of physical systems
• Samples hardware counters • No code instrumentation/recompilation• Collects EIP & TSC once every 1M instructions
– 2% execution overhead– Sampling at 100K instructions has negligible effect
• Sampled EIPs are a good approximation for code path
EIP & Time Stamp Counter (TSC)VTUNE
R®
Slide 5MICRO 2004
Step 2: Vector Creation
• Execution divided into 100M instruction interval• Create 1 EIPV per interval
– 100 VTune samples per EIPV– EIPV has sample count of each unique EIP in that interval– Any EIP not sampled in an interval has zero count
• Instantaneous CPI associated with EIPVs– CPI = (End Time stamp - Begin Time Stamp) / 100M
Vector Creation
1.1010EIPV2
2.080EIPV1
1.0010
CPIEIP1EIP0
1.10100
2.020
1.00100EIPV0
CPI
EIP and TSC (EIPV, CPI)
R®
Slide 6MICRO 2004
Step 3: Regression Tree Analysis
0.702080EIPV7
2.580020EIPV6
2.1602020EIPV5
2.0602020EIPV4
0.602080EIPV3
2.680200EIPV2
1.120080EIPV1
1.000100EIPV0
CPIEIP2EIP1EIP0
• Divide EIPVs into clusters where sum of CPI variance minimized– CPI optimally drives EIPV clustering; machine dependent– By construction, EIPVs in the same cluster will have similar CPI
• Example: The CPI variance of regression tree clusters is smallest for all possible clusters of size 4
• K-means comparison: Clustering using distance between vectors – Does not use CPI values in clustering; machine independent
EIP1 <= 0EIP2 > 60
EIP0, 20
EIP2, 60 EIP1, 0
EIPV4, 2.0EIPV5, 2.1
EIPV2, 2.6EIPV6, 2.5
EIPV0, 1.0EIPV1, 1.1
EIPV3, 0.6EIPV7, 0.7
EIP1 > 0
EIP0 > 20EIP0<=20
4 clusters (k=4)
EIP2 <= 60
R®
Slide 7MICRO 2004
Computing Relative Error Metric• In our study, limit the tree size to 50 clusters
– > 50 clusters does not reduce CPI variance• Compute CPI variance for all K clusters for each tree TK
(1<=K<=50)• Relative Error (RE) = weighted sum of cluster CPI
variance / overall CPI variance
EIP0 <= 20 EIP0 > 20
EIPV2, 2.6
EIPV4, 2.0
EIPV5, 2.1
EIPV6, 2.5
EIPV0, 1.0
EIPV1, 1.1
EIPV3, 0.6
EIPV7, 0.7
EIP0, 20
2 clusters
EIP1 <= 0EIP2 > 60
EIP0, 20
EIP2, 60 EIP1, 0
EIPV4, 2.0EIPV5, 2.1
EIPV2, 2.6EIPV6, 2.5
EIPV0, 1.0EIPV1, 1.1
EIPV3, 0.6EIPV7, 0.7
EIP1 > 0
EIP0 > 20EIP0<=20
4 clusters (k=4)
EIP2 <= 60
RE=(.087+.057)/.662=0.22 RE=(.005+.005+.005+.005)/0.662=0.03
CPI Var of RightCPI Var of Left
R®
Slide 8MICRO 2004
Interpreting Relative Error Metric• RE represents CPI variance explained by EIPVs
– RE=0.15 means that 85% of the CPI variance explained by EIPVs.
– If RE~1 then EIPVs have no relationship with CPI• Small RE + Small tree size (K)
– workload behavior exhibits a small number of dominant phases
• If regression tree is large– Irrespective of RE, EIPVs and CPI relationship not
regulated by few dominant phases• If CPI variance is small (uniform CPI)
– No need for regression trees – Simple average is acceptable
R®
Slide 9MICRO 2004
Regression Tree Results - ODB-C and SjAS
• ODB-C– CPI has no correlation with EIPs
• SjAS – Only 20% of CPI variance explained by EIPs
00.20.40.60.8
11.2
1.41.61.8
0 5 10 15 20 25 30K
Rela
tive
Erro
rODB-CSjAS
R®
Slide 10MICRO 2004
A Visual Explanation - ODB-C and SjAS
• Many unique EIPs compared to SPEC– 24K in ODB-C, 31K in SjAS, compared to 646 in MCF from CPU2K
• Small CPI variance – 0.01 in ODB-C and 0.03 in SjAS
• Performance independent of EIPs
ODB-CEIPs
CPI
SjAS
EIPs
CPI
Time (sec) 2000
R®
Slide 11MICRO 2004
0.0
0.5
1.0
1.5
2.0
2.5
Time
CPI
EXEFEOtherWork
CPI breakdown – ODB-C and SjAS
• L3 misses occur frequently and uniformly– 50% of ODB-C CPI, 35% of SjAS CPI due to L3
• L3 misses overwhelmed other microarchitectural bottlenecks– CPI determined by L3 miss latency
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
Time
CP
I
EXEFEOtherWork
ODB-C SjAS
R®
Slide 12MICRO 2004
ODB-H – Q13• Three functions
– Sequential scan, join, sort• Repeated execution on
large input data• Relative error drops to
~0.15– 85% of CPI variance
explained by EIPs• Distinct phases
0
0.2
0.4
0.6
0.8
1
1.2
1 9 13 18 24 28 34 39
K
Rel
ativ
e Er
ror
EIPs
CPI
time
R®
Slide 13MICRO 2004
ODB-H – Q18• Same 3 functions
– but uses index scan instead of sequential scan
– more cache and branch misses
• CPI varies widely for the same code
• Relative error is 1.10
0.2
0.4
0.6
0.8
1
1.2
1 4 6 8 10 12 14 16 18 29 32
K
Rel
ativ
e Er
ror
EIPs
CPI
time
R®
Slide 14MICRO 2004
0.040.02Q11
0.060.03Q6
0.060.06Q12
0.090.08Q14
0.120.03Q20
0.120.02Q8
0.150.02Q13
0.150.02Q210.090.01Q2
0.150.04Q22 0.150.01Q17
0.160.02 Q4
0.17 0.02 Q9
0.180.03 Q15
0.180.02 Q10 0.25 0.01 Q16
0.18 0.03 Q5 0.36 0.01 Q3
0.19 0.04 Q7 0.37 0.01 Q19
0.90 0.03 SjAS 0.78 0.01 Q1
1.00 0.65 Q18 1.01 0.00 ODB-C
RECPI VarBmarkRECPI VarBmark
QI
QII
QIII
QIV
CPI Variance
Rel
ativ
e E
rror
• Classify benchmarks into four quadrants using CPI and RE
• Q-I and Q-II have low CPI variance– Relative Error is irrelevant– Uniform sampling is OK
• Q-III : Predicting CPI from EIPVs alone can not achieve accuracy – Machine dependent
parameters needed to capture CPI variations
• Q-IV benchmarks benefit from simple EIP based phase detection
Quadrant Classification
R®
Slide 15MICRO 2004
Summary• Using regression trees to identify optimal EIP-CPI
relationship in three server workloads
• ODB-C and SjAS– CPI has no correlation with EIPs– Large code segments– Uniform CPI dominated by L3 misses
• ODB-H exhibit a range of behaviors– Small code path– Algorithmic changes significantly impact CPI and EIP relation
• Quadrant based classification – No single sampling technique effective for all– Shows best-suited sampling to accurately capture CPI variance
Top Related