The Salishan Conference on High-Speed Computing

No Free Lunch, No Hidden Cost
How Can Co-Design Help?

X. Sharon Hu
Dept. of Computer Science and Engineering, University of Notre Dame
Theme: Exposing Hidden Execution Costs

Costs of execution, in both performance and power:
- Computation
- Communication
- Data motion
- Synchronization
- ...

How can we strike a balance between the extremes? Hide as much as possible, or explicitly manage "all" costs?

My "position": expose widely and choose wisely, with a focus on power.
Why Take This Position?

Expose widely:
- Better understanding of the contribution of each component
- Allows application-specific tradeoffs
- Provides opportunities for powerful co-design tools

Choose wisely:
- Requires sophisticated co-design tools
- Explores more algorithm/software options
But Easier Said Than Done!

Heterogeneity:
- Compute nodes: (multi-core) CPU, GP-GPU, FPGA, ...
- Memory components: on-chip, on-board, disks, ...
- Communication infrastructure: bus, NoC, networks, ...

Parallelism ("non-determinism"):
- Data access: movement, coherence, ...
- Resource contention
- Synchronization
Outline
Why expose widely?
How to benefit from exposing widely?
How to choose wisely?
Going forward
Why Expose Widely? (1)

Different programs have different power distributions.

[Figure: GPU power distribution (NVIDIA GTX 280), broken down into memory, constant SM, constant cache, texture cache, and GPU cores; Hong and Kim, ISCA 2010]
Why Expose Widely? (2)

Data movement impacts different algorithms differently.

[Figure: energy consumption of three sorting algorithms (Pentium 4 + GeForce 570)]
Why Expose Widely? (3)

Contention effects are application dependent.

[Figure: performance degradation due to memory bus contention; M. Kondo et al., SIGARCH 2007]
How to Benefit from "Exposing Widely"?

Co-design is the key: expose all factors impacting the "execution model":
- Computation: processing resources
- Data motion: memory components and hierarchy
- Communication: bus and network
- Resource contention, synchronization, ...

Some examples:
- Software macromodeling
- Hardware module-based modeling
- Optimization through power management

Keep in mind Amdahl's law.
Macromodeling: Algorithm-Complexity Based

Relate the power/energy of a program to its complexity.

Example: E = C1*S + C2*S^2 + C3*S^3 (Tan et al., DAC'01), where S is the size of the array for a sorting algorithm.

Example: Ecomm = C0 + C1*S (Loghi et al., ACM TECS'07), where S is the size of the exchanged messages.

More sophisticated models account for both computation and communication. But how to handle resource contention?
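As a concrete illustration, the two macromodels above can be evaluated directly once their coefficients are known. A minimal Python sketch; the coefficient values below are illustrative placeholders, not measurements (in practice they are fit per platform from a few profiled runs):

```python
# Evaluating the two macromodels above. The coefficients are illustrative
# placeholders: in practice they are fit per platform from a few profiled
# runs of the algorithm at different input sizes.

def sort_energy(S, c1=2.0e-9, c2=3.0e-12, c3=1.0e-15):
    """Computation macromodel E = C1*S + C2*S^2 + C3*S^3 (Tan et al. style)."""
    return c1 * S + c2 * S**2 + c3 * S**3

def comm_energy(S, c0=5.0e-6, c1=8.0e-9):
    """Communication macromodel Ecomm = C0 + C1*S (Loghi et al. style)."""
    return c0 + c1 * S

# Estimate: sort S elements, then exchange the result as one message
S = 100_000
print(sort_energy(S) + comm_energy(S))   # joules, under the assumed coefficients
```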
Power Modeling of Bus Contention
Penolazzi, Sander and Hemani, DATE'11

Characterization step:
- C%N,1: percentage cycle difference between the N-processor case and the 1-processor case
- Can be done by IP providers on chosen benchmarks

Prediction step:
- t(N) = t(1) * (1 + C%N,1): the cycle count when N processors share the bus
- The added stall cycles, N_stall = t(N) - t(1), are charged at a per-cycle energy E_cycle
- Total energy combines the active energy E_a with the idle energy E_idle accumulated while stalled
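A minimal sketch of the two-step flow, assuming the characterized constant C%N,1 is given. Both the shape of the energy term and all constants below are illustrative assumptions, not values from the paper:

```python
# Two-step bus-contention model sketch (after Penolazzi, Sander and Hemani,
# DATE'11). Characterization: c_pct is the fractional cycle increase of the
# N-processor run over the 1-processor run, measured on benchmarks (e.g. by
# the IP provider). Prediction: scale the uncontended cycle count and charge
# the added stall cycles at a per-cycle energy. All numbers are illustrative.

def predict_cycles(t1_cycles, c_pct):
    """t(N) = t(1) * (1 + C%N,1): cycles when N processors share the bus."""
    return t1_cycles * (1.0 + c_pct)

def predict_energy(t1_cycles, c_pct, e_active, e_stall_cycle):
    """Active energy of the uncontended run plus energy of the stall cycles."""
    n_stall = predict_cycles(t1_cycles, c_pct) - t1_cycles
    return e_active + n_stall * e_stall_cycle

t1 = 1_000_000                       # cycles measured with the bus uncontended
c_pct = 0.25                         # characterized slowdown, e.g. 4 processors
print(predict_cycles(t1, c_pct))     # cycle estimate under contention
print(predict_energy(t1, c_pct, e_active=0.5, e_stall_cycle=1e-9))
```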
Hierarchical Module-Based Power Modeling

Accumulate the energy/power of individual modules. CPU+GPU example:
- Access rate: software dependent
- Data movement contributes to memory power
- Resource contention modifies access rates

P(Mi) = AccessRate(Mi) * ArchScaling(Mi) * MaxP(Mi) + NonGatedP(Mi)
P_total = sum_i Util(Mi) * P(Mi) + P_other
P_total = P_CPU + P_GPU + P_mem = sum_i P(Mi) + P_idle

Adapted from Isci and Martonosi, Micro'03
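The accumulation above can be sketched in a few lines; the module names, access rates, and wattages below are invented for illustration:

```python
# Hierarchical module-based accumulation sketch (in the style of Isci and
# Martonosi, Micro'03): each module's power scales with its software-dependent
# access rate plus a non-gated (always-on) term, and subsystem powers sum
# to the total. Module names and all numbers are illustrative.

def module_power(access_rate, arch_scaling, max_power, non_gated):
    """P(Mi) = AccessRate(Mi) * ArchScaling(Mi) * MaxP(Mi) + NonGatedP(Mi)."""
    return access_rate * arch_scaling * max_power + non_gated

modules = {
    # name: (access_rate, arch_scaling, max_power_W, non_gated_W)
    "cpu_alu":   (0.60, 1.0, 8.0, 0.5),
    "gpu_cores": (0.40, 1.0, 90.0, 4.0),
    "dram":      (0.25, 1.0, 12.0, 1.0),   # data movement shows up here
}

p_idle = 10.0                              # platform idle power (W)
p_total = p_idle + sum(module_power(*m) for m in modules.values())
print(f"P_total = {p_total:.1f} W")
```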
Managing Bus Contention to Reduce Energy
M. Kondo, H. Sasaki and H. Nakamura, 2006

- A counter for memory requests
- A register for PU identification
- Thresholds for selecting which PU uses which Vdd value
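One way the counter/threshold mechanism might be wired up, as a sketch. The policy direction shown (running memory-bound PUs, which mostly stall on the contended bus anyway, at lower Vdd), the thresholds, and the voltage levels are all assumptions for illustration, not taken from the paper:

```python
# Sketch of the counter/threshold mechanism above (after Kondo, Sasaki and
# Nakamura). Each processing unit (PU) keeps a memory-request counter; a
# threshold then selects the Vdd level per PU. The policy shown, slowing
# PUs that mostly stall on the contended bus, is one plausible choice;
# thresholds and voltage levels are illustrative.

VDD_HIGH, VDD_LOW = 1.2, 0.9   # supply-voltage levels (V), illustrative
THRESHOLD = 1000               # memory requests per control interval

def assign_vdd(mem_requests_by_pu):
    """Map each PU id to a Vdd level based on its memory-request counter."""
    return {pu: (VDD_LOW if requests >= THRESHOLD else VDD_HIGH)
            for pu, requests in mem_requests_by_pu.items()}

counters = {0: 2500, 1: 300, 2: 1400, 3: 50}   # per-PU counters, one interval
print(assign_vdd(counters))
```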
Application Mapping to Reduce Energy (1)

Application mapping for heterogeneous systems.

[Figure: jobs J1-J4, each annotated with ([minRi, maxRi], Di), mapped onto PEs 1-4 connected to a shared memory]

R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, "Methods for power optimization in distributed embedded systems with real-time requirements," CASES'06.
Application Mapping to Reduce Energy (2)

Optimization:
- Minimize power/energy dissipation
- Satisfy timing properties (e.g., average path latency, average lateness, etc.)
- ...

Search space:
- Scheduling parameters, traffic shaping, ...
- Task-level DVFS, i.e., task speed assignment
- Resource-level DVFS, i.e., resource speed assignment
- ...
Application Mapping (3): Sensitivity Analysis

[Figure: sensitivity analysis results; R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, "Methods for power optimization in distributed embedded systems with real-time requirements," CASES'06]
Application Mapping (4): GA-Based Approach

[Figure: GA-based optimization loop in which candidate mappings are scheduled (2'. scheduling trace) and evaluated by a PowerAnalyzer (3'. power dissipation)]

A power model is needed.
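The loop can be sketched as a toy genetic algorithm: a chromosome is a task-to-PE mapping, a crude per-PE serialization stands in for the scheduling trace, and a simple energy sum plays the role of the PowerAnalyzer. Every number (costs, deadline, GA parameters) is an illustrative placeholder:

```python
# Toy GA over task-to-PE mappings. A chromosome is a mapping; a crude
# per-PE serialization stands in for the "scheduling trace" and an energy
# sum for the "PowerAnalyzer". Costs, deadline and GA parameters are all
# illustrative placeholders.
import random

random.seed(0)
N_TASKS = 6
PES = [0, 1, 2, 3]
EXEC_TIME = [[random.uniform(1.0, 5.0) for _ in PES] for _ in range(N_TASKS)]
POWER = [[random.uniform(0.5, 2.0) for _ in PES] for _ in range(N_TASKS)]
DEADLINE = 12.0

def fitness(mapping):
    """Energy of a mapping, with a large penalty for missing the deadline."""
    pe_busy = {pe: 0.0 for pe in PES}
    energy = 0.0
    for task, pe in enumerate(mapping):
        pe_busy[pe] += EXEC_TIME[task][pe]   # tasks on one PE serialize
        energy += POWER[task][pe] * EXEC_TIME[task][pe]
    penalty = 1e6 if max(pe_busy.values()) > DEADLINE else 0.0
    return energy + penalty

def evolve(pop_size=20, generations=30, mutation=0.2):
    pop = [[random.choice(PES) for _ in range(N_TASKS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        pop = pop[:pop_size // 2]                  # keep the fitter half
        while len(pop) < pop_size:
            a, b = random.sample(pop[:pop_size // 2], 2)
            cut = random.randrange(1, N_TASKS)     # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation:
                child[random.randrange(N_TASKS)] = random.choice(PES)
            pop.append(child)
    return min(pop, key=fitness)

best = evolve()
print("best mapping:", best, "fitness:", round(fitness(best), 2))
```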
Going Forward: A Systematic Co-Design Effort

Expose more:
- More hardware counters/registers
- More efficient/accurate high-level power models
- Better models for resource contention and synchronization

Choose better:
- Handling parallelism: algorithms, OS, hardware; resource contention; synchronization
- Handling non-determinism: worst-case bounds, statistical analysis, interval-based techniques
ES Design vs. HPCS Design

Differences (maybe):
- Application-specific workloads vs. domain-specific workloads
- Constraints, objectives, desirables? Latency, throughput, energy, cost, reliability, fault tolerance, IP protection/privacy, ToM, ...
- Other issues: homogeneous vs. heterogeneous, levels of complexity, user expertise, ...

Similarities:
- Ever-increasing hardware capability: multi-core, multi-thread, complex communication fabrics, memory hierarchy, ...
- Productivity gap
- Common concerns: latency, throughput, energy, cost, reliability, fault tolerance, ...
Leverage Co-Design for HPC

Systematic performance estimation:
- Formal methods: scenario-based, statistical analysis
- Hybrid approaches: analytical + simulation
- Seamless migration from one abstraction level to the next

Efficient design space exploration:
- Efficient search techniques
- Multiple-level abstraction models
- Multiple-attribute optimization
- Others: memory and communication analysis and design