Download - The Fuzzy Correlation between Code and Performance ...Murali Annavaram, Ryan Rakvic, Marzia Polito, Jean-Yves Bouguet, Richard Hankins, Bob Davies Intel Corporation Acknowledgements

R®

Slide 1MICRO 2004

The Fuzzy Correlation between Code and Performance

PredictabilityMurali Annavaram, Ryan Rakvic, Marzia

Polito, Jean-Yves Bouguet, Richard Hankins, Bob Davies

Intel Corporation

Acknowledgements to John Shen, Carole Dulong, Brad Calder

R®

Slide 2MICRO 2004

Why Correlate Code and Performance?

• Predict CPI by observing just EIPs (or PCs)– Assume, similar EIP sequence -> similar CPI– Shown to work well for some CPU2K benchmarks – For improving simulation speed and dynamic optimizations

• Do server workloads exhibit this correlation?– Large code base and non-loopy code path– Processes communicate through inter process communication– Use of OS services

• Regression Trees– Quantify CPI prediction accuracy using EIPs– Find upper bound for correlation

R®

Slide 3MICRO 2004

Workloads & Experimental Infrastructure

• Three representative server workloads– ODB-C

• OLTP benchmark on Oracle 10g RDBMS– ODB-H

• DSS benchmark on Oracle 10g RDBMS– SPECjAppServer (SjAS)

• 3-Tier Application• Focus on middleware application server running BEA Weblogic

• Workloads tuned for maximum CPU utilization• Hardware Configuration

– Itanium-2 processor based system– Red Hat 2.1 + kernel 2.4.9-e.10smp– 16 GB of DDR memory– 34 Ultra320 SCSI 73 GB drives (used in ODB-C and ODB-H)– 200 MHz FS Bus

R®

Slide 4MICRO 2004

Tool Chain: Step 1: Data Collection

Benchmark

• VTUNE: Non-intrusive performance monitoring of physical systems

• Samples hardware counters • No code instrumentation/recompilation• Collects EIP & TSC once every 1M instructions

– 2% execution overhead– Sampling at 100K instructions has negligible effect

• Sampled EIPs are a good approximation for code path

EIP & Time Stamp Counter (TSC)VTUNE

R®

Slide 5MICRO 2004

Step 2: Vector Creation

• Execution divided into 100M instruction interval• Create 1 EIPV per interval

– 100 VTune samples per EIPV– EIPV has sample count of each unique EIP in that interval– Any EIP not sampled in an interval has zero count

• Instantaneous CPI associated with EIPVs– CPI = (End Time stamp - Begin Time Stamp) / 100M

Vector Creation

1.1010EIPV2

2.080EIPV1

1.0010

CPIEIP1EIP0

1.10100

2.020

1.00100EIPV0

CPI

EIP and TSC (EIPV, CPI)

R®

Slide 6MICRO 2004

Step 3: Regression Tree Analysis

0.702080EIPV7

2.580020EIPV6

2.1602020EIPV5

2.0602020EIPV4

0.602080EIPV3

2.680200EIPV2

1.120080EIPV1

1.000100EIPV0

CPIEIP2EIP1EIP0

• Divide EIPVs into clusters where sum of CPI variance minimized– CPI optimally drives EIPV clustering; machine dependent– By construction, EIPVs in the same cluster will have similar CPI

• Example: The CPI variance of regression tree clusters is smallest for all possible clusters of size 4

• K-means comparison: Clustering using distance between vectors – Does not use CPI values in clustering; machine independent

EIP1 <= 0EIP2 > 60

EIP0, 20

EIP2, 60 EIP1, 0

EIPV4, 2.0EIPV5, 2.1




EIP1 > 0

EIP0 > 20EIP0<=20

4 clusters (k=4)

EIP2 <= 60

R®

Slide 7MICRO 2004

Computing Relative Error Metric• In our study, limit the tree size to 50 clusters

– > 50 clusters does not reduce CPI variance• Compute CPI variance for all K clusters for each tree TK

(1<=K<=50)• Relative Error (RE) = weighted sum of cluster CPI

variance / overall CPI variance

EIP0 <= 20 EIP0 > 20

EIPV2, 2.6

EIPV4, 2.0

EIPV5, 2.1

EIPV6, 2.5

EIPV0, 1.0

EIPV1, 1.1

EIPV3, 0.6

EIPV7, 0.7

EIP0, 20

2 clusters

EIP1 <= 0EIP2 > 60

EIP0, 20

EIP2, 60 EIP1, 0





EIP1 > 0

EIP0 > 20EIP0<=20

4 clusters (k=4)

EIP2 <= 60

RE=(.087+.057)/.662=0.22 RE=(.005+.005+.005+.005)/0.662=0.03

CPI Var of RightCPI Var of Left

R®

Slide 8MICRO 2004

Interpreting Relative Error Metric• RE represents CPI variance explained by EIPVs

– RE=0.15 means that 85% of the CPI variance explained by EIPVs.

– If RE~1 then EIPVs have no relationship with CPI• Small RE + Small tree size (K)

– workload behavior exhibits a small number of dominant phases

• If regression tree is large– Irrespective of RE, EIPVs and CPI relationship not

regulated by few dominant phases• If CPI variance is small (uniform CPI)

– No need for regression trees – Simple average is acceptable

R®

Slide 9MICRO 2004

Regression Tree Results - ODB-C and SjAS

• ODB-C– CPI has no correlation with EIPs

• SjAS – Only 20% of CPI variance explained by EIPs

00.20.40.60.8

11.2

1.41.61.8

0 5 10 15 20 25 30K

Rela

tive

Erro

rODB-CSjAS

R®

Slide 10MICRO 2004

A Visual Explanation - ODB-C and SjAS

• Many unique EIPs compared to SPEC– 24K in ODB-C, 31K in SjAS, compared to 646 in MCF from CPU2K

• Small CPI variance – 0.01 in ODB-C and 0.03 in SjAS

• Performance independent of EIPs

ODB-CEIPs

CPI

SjAS

EIPs

CPI

Time (sec) 2000

R®

Slide 11MICRO 2004

0.0

0.5

1.0

1.5

2.0

2.5

Time

CPI

EXEFEOtherWork

CPI breakdown – ODB-C and SjAS

• L3 misses occur frequently and uniformly– 50% of ODB-C CPI, 35% of SjAS CPI due to L3

• L3 misses overwhelmed other microarchitectural bottlenecks– CPI determined by L3 miss latency

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

Time

CP

I

EXEFEOtherWork

ODB-C SjAS

R®

Slide 12MICRO 2004

ODB-H – Q13• Three functions

– Sequential scan, join, sort• Repeated execution on

large input data• Relative error drops to

~0.15– 85% of CPI variance

explained by EIPs• Distinct phases

0

0.2

0.4

0.6

0.8

1

1.2

1 9 13 18 24 28 34 39

K

Rel

ativ

e Er

ror

EIPs

CPI

time

R®

Slide 13MICRO 2004

ODB-H – Q18• Same 3 functions

– but uses index scan instead of sequential scan

– more cache and branch misses

• CPI varies widely for the same code

• Relative error is 1.10

0.2

0.4

0.6

0.8

1

1.2

1 4 6 8 10 12 14 16 18 29 32

K

Rel

ativ

e Er

ror

EIPs

CPI

time

R®

Slide 14MICRO 2004

0.040.02Q11

0.060.03Q6

0.060.06Q12

0.090.08Q14

0.120.03Q20

0.120.02Q8

0.150.02Q13

0.150.02Q210.090.01Q2

0.150.04Q22 0.150.01Q17

0.160.02 Q4

0.17 0.02 Q9

0.180.03 Q15

0.180.02 Q10 0.25 0.01 Q16

0.18 0.03 Q5 0.36 0.01 Q3

0.19 0.04 Q7 0.37 0.01 Q19

0.90 0.03 SjAS 0.78 0.01 Q1

1.00 0.65 Q18 1.01 0.00 ODB-C

RECPI VarBmarkRECPI VarBmark

QI

QII

QIII

QIV

CPI Variance

Rel

ativ

e E

rror

• Classify benchmarks into four quadrants using CPI and RE

• Q-I and Q-II have low CPI variance– Relative Error is irrelevant– Uniform sampling is OK

• Q-III : Predicting CPI from EIPVs alone can not achieve accuracy – Machine dependent

parameters needed to capture CPI variations

• Q-IV benchmarks benefit from simple EIP based phase detection

Quadrant Classification

R®

Slide 15MICRO 2004

Summary• Using regression trees to identify optimal EIP-CPI

relationship in three server workloads

• ODB-C and SjAS– CPI has no correlation with EIPs– Large code segments– Uniform CPI dominated by L3 misses

• ODB-H exhibit a range of behaviors– Small code path– Algorithmic changes significantly impact CPI and EIP relation

• Quadrant based classification – No single sampling technique effective for all– Shows best-suited sampling to accurately capture CPI variance