© 2012 IBM Corporation, Barcelona Supercomputing Center
MICRO 2012 Tuesday, December 4, 2012
Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks
R. Bertran*+, A. Buyuktosunoglu*, M. Gupta*, M. Gonzalez+, P. Bose*
*IBM T.J. Watson Research Center, +Barcelona Supercomputing Center
Why do we need micro-benchmarks?
What is the maximum power consumption? Any performance bugs? Any reliability issues? …
Micro-benchmarks!
Time-consuming, tedious and error-prone task
• Trial-and-error process
• Several micro-benchmarks are required
Deep expertise limited to few designers
– Detailed knowledge of the underlying architecture is required
AUTOMATED SOLUTION NEEDED!
MicroProbe: A Micro-benchmark Generation Framework
MicroProbe Workflow
[Diagram: MicroProbe workflow]
Inputs (from the user):
– Micro-benchmark generation policy, for example:
• Endless loop, 50% INT and 50% FP
• Endless loop for each instruction of the ISA
• Max-power stressmark
– Architecture definition files
MicroProbe Framework
Outputs:
– Micro-benchmarks
External tools:
– Real platforms
– Simulators
– Models
MicroProbe: Distinguishing Features
Feature comparison, previous works vs. MicroProbe (support in previous works shown in parentheses):
ISA queries
– Instruction type, operand length, binary codification, etc. (manual)
Micro-architecture queries
– Functional unit, latency, throughput, energy per instruction, average instruction power, etc. (manual)
Micro-architecture models
– Set-associative cache model (no)
Code generation
– Skeleton and instruction definition passes, memory modeling pass, branch modeling pass, ILP definition pass
– Configurable passes (no)
Design space exploration
– Integrated (no)
– GA-based search
– Exhaustive search (manual)
– Customizable search (manual)
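As an illustration of what a set-associative cache model lets a generator reason about, the sketch below is a minimal stand-alone Python model. It is not MicroProbe code: the class and function names are hypothetical, LRU replacement is an assumption, and the 32 KB, 8-way, 128-byte-line geometry is only an example configuration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, size_bytes, ways, line_bytes):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        # One ordered dict per set: keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Access one byte address; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used tag
        entries[tag] = None
        return False


def hit_ratio(cache, addresses):
    """Fraction of accesses in `addresses` that hit in `cache`."""
    hits = sum(cache.access(a) for a in addresses)
    return hits / len(addresses)


# Example geometry (an assumption for illustration): 32 KB, 8-way, 128 B lines.
cache = SetAssociativeCache(size_bytes=32 * 1024, ways=8, line_bytes=128)
same_set_stride = cache.num_sets * cache.line_bytes  # addresses map to one set

# Cycling over 9 lines of the same set exceeds the 8 ways, so LRU always misses.
thrashing_stream = [k * same_set_stride for k in range(9)] * 10
```

With such a model, a generation policy can check a candidate address stream before emitting it, for example confirming that `hit_ratio(cache, thrashing_stream)` is zero when every load is meant to miss in that cache level.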
MicroProbe Usage and Design Overview
[Diagram: MicroProbe usage and design overview]
Research idea
Micro-benchmark generation policies (user-defined scripts), for example:
• Loop stressing the floating point unit
• Sequence of loads hitting 50% L1 and 50% L2
• Generate a stress-mark for each functional unit of the architecture
• Search for the sequence of 2 loads and 2 integer operations with maximum IPC
MicroProbe Framework (Python API)
– Architecture module
• ISA definitions
• Micro-architecture definitions
• Micro-architecture analytical models
– Code generation module
• Micro-benchmark synthesizer
• Passes
– Design space exploration module
• Search drivers
Other elements shown: properties, micro-benchmarks, automatic bootstrap process, external tools
Max-power Stressmark Generation
Use MicroProbe to generate a max-power stressmark:
• Characterize energy per instruction (EPI) and IPC (Architecture Module)
• Select N instructions with max (IPC * EPI)
• Form a basic endless loop (e.g. 4K) using the selected instructions (Code Generation Module)
• Generate micro-benchmarks with different orders of the selected N instructions
• Evaluate using the Design Space Exploration Module
• Pick the highest-power micro-benchmark
Selected instructions: mulldo, xvnmsubmdp, lxvw4x
Example orderings:
Loop: … mulldo mulldo lxvw4x lxvw4x xvnmsubmdp xvnmsubmdp …
Loop: … mulldo lxvw4x mulldo xvnmsubmdp lxvw4x xvnmsubmdp …
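The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the MicroProbe API: the EPI and IPC values are made-up placeholders, all function names are hypothetical, and `toy_power` stands in for running each loop on the platform and reading its power sensors.

```python
from itertools import permutations

# Hypothetical characterization data: instruction -> (EPI, IPC).
# The numbers are placeholders, not measurements from the slides.
CHARACTERIZATION = {
    "mulldo": (3.0, 1.0),
    "xvnmsubmdp": (2.5, 2.0),
    "lxvw4x": (2.0, 2.0),
    "add": (1.0, 2.0),
    "nop": (0.2, 4.0),
}


def select_instructions(data, n):
    """Return the n instructions with the highest IPC * EPI product."""
    ranked = sorted(data, key=lambda ins: data[ins][0] * data[ins][1], reverse=True)
    return ranked[:n]


def candidate_loops(selected, copies=2):
    """Yield each distinct ordering of `copies` occurrences of every instruction."""
    pool = sorted(list(selected) * copies)
    seen = set()
    for order in permutations(pool):
        if order not in seen:
            seen.add(order)
            yield order


def pick_max_power(loops, measure_power):
    """Evaluate every candidate loop body and return the highest-power one."""
    return max(loops, key=measure_power)


def toy_power(loop):
    """Hypothetical power proxy: count adjacent pairs of different instructions."""
    return sum(1 for a, b in zip(loop, loop[1:]) if a != b)


selected = select_instructions(CHARACTERIZATION, n=3)
best_loop = pick_max_power(candidate_loops(selected), toy_power)
```

Exhaustive enumeration is only practical for small N; for larger spaces a guided search (such as the GA-based search the framework lists) would replace `candidate_loops`.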
CASE STUDIES
MicroProbe: A Micro-benchmark Generation Framework
Experimental Methodology
Platform:
– Processor: POWER7 @ 3 GHz
• 8-core, 4-way SMT
• 32 KB L1, 256 KB L2 and 4 MB L3 per core
– Memory: 32 GB DDR3 SDRAM @ 800 MHz
– OS: RHEL 5.7 + Linux 3.0.1
– EnergyScale architecture
• Power measurements in milliwatts
• Sampling rate up to 1 ms
In-house software collects power and performance counter traces [C. Lefurgy et al., IBM]
Case Study 1: EPI Characterization
Large differences in EPI across instructions stressing different micro-architecture components
Large differences in EPI across instructions stressing the same micro-architecture components at the same rate (IPC)
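For reference, energy per instruction follows from power and instruction throughput as EPI = P / (IPC × f). The sketch below is a minimal illustration of that arithmetic; the slides do not state how the power attributed to a loop is isolated (for example, whether idle power is subtracted), so the inputs and the function name are assumptions.

```python
def energy_per_instruction(power_watts, ipc, freq_hz):
    """Return joules per instruction: power divided by instructions per second."""
    if ipc <= 0 or freq_hz <= 0:
        raise ValueError("ipc and freq_hz must be positive")
    return power_watts / (ipc * freq_hz)


# Hypothetical readings for two loops running at the same IPC on a 3 GHz core.
epi_a = energy_per_instruction(power_watts=6.0, ipc=2.0, freq_hz=3.0e9)
epi_b = energy_per_instruction(power_watts=9.0, ipc=2.0, freq_hz=3.0e9)

# Same rate, different power: the second instruction costs 1.5x the energy.
ratio = epi_b / epi_a
```

Two loops with identical IPC but different power necessarily differ in EPI, which is the situation the second observation above describes.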
Case Study 2: Max-power Stressmark Generation
Approaches compared:
• DAXPY: use a computationally intensive kernel
• Expert manual: use complex instructions accessing different functional units with high IPC
• Expert DSE: generate all possible combinations of complex instructions stressing different units (loops generated with MicroProbe)
• MicroProbe: use MicroProbe
Expert selection:
– Selected instructions: mullw, xvmaddadp, lxvd2x
– Example loops:
Loop: … mullw mullw xvmaddadp xvmaddadp lxvd2x lxvd2x …
Loop: … mullw lxvd2x mullw xvmaddadp xvmaddadp lxvd2x …
Loop: … mullw lxvd2x mullw xvmaddadp lxvd2x xvmaddadp …
MicroProbe selection:
– Heuristic: Max(EPI * IPC)
– Selected instructions: mulldo, xvnmsubmdp, lxvw4x
Max-power Stressmark Generation
[Figure: Max-power results. Normalized power (y-axis, 0.6 to 1.2; min, mean and max) for each method (x-axis): DAXPY, Expert Manual, Expert DSE, MicroProbe.]
Case Study 3: Counter-based Processor Power Model
Bottom-up power modeling method
[Diagram: model construction]
Model components:
– Dynamic power: f(PMCs)
– Intercept SMT1
– Intercept SMT2-4
– SMT effect
– CMP effect: f(CMP)
– Uncore power
Fitting: linear regression
Training sets:
– Functional-unit micro-benchmarks, CMP1–SMT1
– Random micro-benchmarks, CMP1–SMT1
– Random micro-benchmarks, CMP1–SMT2/4
– Random micro-benchmarks, CMP1/8–SMT2/4
Model: sums over k = 1 … #cores and k = 1 … #threads combining the terms dynamic power f(PMCs), SMT effect (when SMT is enabled), CMP effect and uncore power.
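To illustrate the linear-regression step in isolation, the sketch below fits power = intercept + slope × activity for a single counter by ordinary least squares. It is a simplification under stated assumptions: the actual model uses several PMCs, the samples here are synthetic, and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic training samples: counter activity rate vs. measured power.
activity = [0.0, 1.0, 2.0, 3.0]
power = [10.0, 12.0, 14.0, 16.0]

# The intercept captures activity-independent power; the slope is the
# power cost per unit of counter activity.
intercept, slope = fit_line(activity, power)
```

In a bottom-up scheme, a fit like this would be repeated per component, with each training set chosen to isolate the term being estimated.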
Counter-based Processor Power Model Validation
Within acceptable error margins: < 4% on average
[Figure: Model accuracy results on SPEC CPU2006. % error (y-axis, 0 to 10) per CMP-SMT configuration (x-axis): 1-1, 1-2, 1-4, 2-1, 2-2, 2-4, 4-1, 4-2, 4-4, 6-1, 6-2, 6-4, 8-1, 8-2, 8-4, Mean. Legend: Micro trained, Random trained, SPEC trained, Proposed.]
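The slides do not define the error metric precisely. One plausible reading, assumed here, is the mean absolute percentage error between predicted and measured power; the function name and sample values below are hypothetical.

```python
def mean_abs_pct_error(predicted, measured):
    """Mean absolute percentage error of predictions against measurements."""
    if not measured or len(predicted) != len(measured):
        raise ValueError("need equally sized, non-empty sequences")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Hypothetical power samples in watts.
predicted_power = [98.0, 105.0]
measured_power = [100.0, 100.0]
error_pct = mean_abs_pct_error(predicted_power, measured_power)
```

Under this definition, a model meeting the stated margin would yield an average value below 4 across the validation workloads.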
Counter-based Processor Power Model Validation on Corner Cases
Models trained using non-micro-architecture-aware training sets show high errors and variability
Models trained using the micro-architecture-aware training set show acceptable error margins: < 5% on average
[Figure: Model accuracy results. % error (y-axis, 0 to 20; one value is labeled 62%) per validation set (x-axis): FXU High, FXU Low, L1 Loads, Main Memory, VSU High, VSU Low, Mean. Legend: Micro trained, Random trained, SPEC trained, Proposed.]
Conclusions
MicroProbe is a productive micro-benchmark generation framework
– Adaptive and flexible
– Includes micro-architecture semantics
– Integrates design space exploration
Presented three case studies:
– Instruction-based EPI characterization
– Automated max-power stressmark generation
– CMP/SMT-aware bottom-up counter-based processor power model
QUESTIONS?
MicroProbe: A Micro-benchmark Generation Framework