Lecture 4: Parallel Computing Performance (ece.uprm.edu/~wrivera/ICOM6025/Lecture4.pdf)
Dr. Wilson Rivera
ICOM 6025: High Performance Computing
Electrical and Computer Engineering Department
University of Puerto Rico
Lecture 4: Parallel Computing Performance
Outline

• Goal: understand different methodologies and metrics to monitor, evaluate, and act on performance considerations.
  – Performance challenges
  – Performance metrics
  – Energy efficiency
  – Benchmarking
  – Monitoring tools
Performance Model Challenges

[Diagram: a performance model maps workloads (transactions, video streaming, batch jobs), user behavior, and service level objectives to resource allocation across CPU, memory, network, storage, and power.]
Performance Model Challenges

• Need realistic metrics to measure scalability
• Impact on design of architectures and applications
• Multiple parameters involved

[Diagram: parallel algorithm + parallel machine = parallel system]
Units of Measure

• High Performance Computing (HPC) units are:
  – Flop: floating point operation
  – Flop/s: floating point operations per second
  – Bytes: size of data (a double precision floating point number is 8 bytes)
• Typical sizes are millions, billions, trillions, …

  Mega   Mflop/s = 10^6 flop/sec    Mbyte = 2^20 ≈ 10^6 bytes
  Giga   Gflop/s = 10^9 flop/sec    Gbyte = 2^30 ≈ 10^9 bytes
  Tera   Tflop/s = 10^12 flop/sec   Tbyte = 2^40 ≈ 10^12 bytes
  Peta   Pflop/s = 10^15 flop/sec   Pbyte = 2^50 ≈ 10^15 bytes
  Exa    Eflop/s = 10^18 flop/sec   Ebyte = 2^60 ≈ 10^18 bytes
  Zetta  Zflop/s = 10^21 flop/sec   Zbyte = 2^70 ≈ 10^21 bytes
  Yotta  Yflop/s = 10^24 flop/sec   Ybyte = 2^80 ≈ 10^24 bytes
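As a quick sanity check on these units, a minimal sketch (the function name is illustrative, not from the lecture):

```python
# Sketch: how long a fixed amount of floating point work takes at a
# given machine rate, using the prefixes above.
def time_seconds(total_flops, rate_flops_per_sec):
    """Seconds needed to execute total_flops at the given rate."""
    return total_flops / rate_flops_per_sec

# One exaflop (10^18 flops) of work on a 1 Pflop/s (10^15 flop/s) machine:
print(time_seconds(1e18, 1e15))  # -> 1000.0 seconds
```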
Overhead of Parallelism
• Parallelism overheads include:
  – Cost of starting a thread or process (latency)
  – Cost of communicating or sharing data (bandwidth)
  – Cost of synchronizing
  – Redundant computation
• Each of these can be in the range of milliseconds (= millions of flops) on some systems
• Tradeoff: an algorithm needs sufficiently large units of work (coarse granularity) to run fast in parallel, but not so large that there is not enough parallel work
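The granularity tradeoff can be made concrete with a toy cost model (the model and its constants are assumptions for illustration, not from the lecture):

```python
# Toy cost model: startup latency, work divided p ways, and a
# synchronization cost that grows with the number of workers.
def parallel_time(work, p, startup=1e-3, sync=1e-4):
    """Estimated wall time for 'work' seconds of computation on p workers."""
    return startup + work / p + sync * p

# Adding workers helps at first, then synchronization dominates:
# somewhere in between lies the optimal granularity.
for p in (1, 10, 100, 1000):
    print(p, parallel_time(1.0, p))
```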
Scalability
• Horizontal scalability
  – Add nodes to the system
• Vertical scalability
  – Add resources (CPU, memory) per node
Scalability Models
• Fixed-Problem Size Model
  – Speedup
  – Amdahl's Law
• Memory-Constrained Model
  – Scaled speedup (Gustafson)
  – Scaled speedup is less than linear (Flatt & Kennedy)
  – Isoefficiency (Kumar & Gupta)
• Fixed-Time Scaling Model
  – Isospeed (Sun & Rover)
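Gustafson's scaled speedup from the list above can be sketched as follows (a minimal sketch; the function name is illustrative):

```python
def gustafson_speedup(f, p):
    """Scaled speedup: the serial fraction f stays fixed while the
    parallel part of the workload grows with p, so S = p - f(p - 1)."""
    return p - f * (p - 1)

# With a 5% serial fraction on 100 processors the scaled speedup is
# about 95, far above the fixed-size (Amdahl) bound of 1/f = 20.
print(gustafson_speedup(0.05, 100))
```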
Speedup
Let W be the problem size (total work, so Ts = W) and T0(W, p) the total parallel overhead on p processors:

  Sp = Ts / Tp

  Tp ≅ (W + T0(W, p)) / p

  E = Sp / p = 1 / (1 + T0(W, p) / W)

  p ↑ → E ↓ (for fixed W)
  W ↑ → E ↑ (for fixed p)

[Figure: speedup vs. number of processors, ideal (linear) and actual curves]
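The efficiency relation E = 1/(1 + T0/W) can be explored numerically; a minimal sketch with illustrative numbers:

```python
def efficiency(W, T0):
    """Parallel efficiency E = 1 / (1 + T0/W): overhead T0 relative to useful work W."""
    return 1.0 / (1.0 + T0 / W)

# Fixed overhead, growing problem size: efficiency climbs toward 1,
# which is why growing W with p can hold E constant (isoefficiency).
print(efficiency(1e6, 1e4))  # about 0.990
print(efficiency(1e8, 1e4))  # about 0.9999
```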
Amdahl’s Law
• Let f be the fraction of a program that is sequential; 1-f is the fraction that can be parallelized
• Let T1 be the execution time on 1 processor
• Let Tp be the execution time on p processors
• Sp is the speedup

  Sp = T1 / Tp
     = T1 / (fT1 + (1-f)T1/p)
     = 1 / (f + (1-f)/p)

• As p → ∞, Sp → 1/f
Amdahl’s Law places a strict limit on the speedup that can be realized by using multiple processors.
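A direct transcription of the formula (a sketch; the function name is illustrative):

```python
def amdahl_speedup(f, p):
    """Sp = 1 / (f + (1 - f)/p) for serial fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)

# With f = 0.1 the speedup is bounded by 1/f = 10 no matter how
# many processors are added:
print(amdahl_speedup(0.1, 10))     # about 5.26
print(amdahl_speedup(0.1, 10**6))  # just under 10.0
```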
Amdahl’s Law and Scalability
• Scalability
  – Ability of a parallel algorithm to achieve performance gains proportional to the number of processors and the size of the problem
• When does Amdahl's Law apply?
  – When the problem size is fixed
  – Strong scaling (p → ∞, Sp = S∞ → 1/f)
  – The speedup bound is determined by the degree of sequential execution time in the computation, not the number of processors!
  – Perfect efficiency is hard to achieve
Introduction to Parallel Computing, University of Oregon, IPCC
Isoefficiency
• Relates the problem size to the maximum number of processors that can be used in a cost-optimal fashion
• A parallel system is cost-optimal iff pTp = O(W)
• A parallel system is scalable iff its isoefficiency function exists
  – If W must grow exponentially with p, the parallel system is poorly scalable
  – If W grows nearly linearly with p, the parallel system is highly scalable
Example: Maximum Element
In: a[]
Out: maximum element in a

sequential_maximum(a) {
    n = a.length
    max = a[0]
    for i = 1 to n - 1 {
        if (a[i] > max)
            max = a[i]
    }
    return max
}

Trace on a = [21, 11, 23, 17, 48, 33, 22, 41]:
  running max: 21, 23, 23, 48, 48, 48, 48
Time: O(n)
Example: Parallel Maximum
Reduction tree for a = [21, 11, 23, 17, 48, 33, 22, 41]:

  21  11  23  17  48  33  22  41
    21      23      48      41
        23              48
                48

Time: O(lg n)
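The reduction tree above can be sketched in code (a serial simulation of the parallel combining pattern, not an actual parallel implementation):

```python
# Serial simulation of the O(lg n) parallel maximum: each pass pairs
# up neighbors and keeps the larger, halving the candidates per level.
def tree_max(a):
    level = list(a)
    while len(level) > 1:
        nxt = [max(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element carries over to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

print(tree_max([21, 11, 23, 17, 48, 33, 22, 41]))  # -> 48
```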
Isoefficiency Analysis

• Adding n numbers using p = n processors
  – Ts = Θ(n)
  – Tp = Θ(log n)
  – E = Θ(1/log n)
  – The larger the problem, the less efficiently we use the processors
• Adding n numbers using p < n processors
  – Ts = Θ(n)
  – Tp = Θ(n/p + log n)
  – E = Θ(1/(1 + p(log p)/n))
  – The problem size must grow at least as fast as p log p to balance the overhead of the parallel reduction
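The second case can be checked numerically; a minimal sketch (the function name is illustrative):

```python
import math

def reduction_efficiency(n, p):
    """E = 1 / (1 + p*log2(p)/n) for adding n numbers on p processors."""
    return 1.0 / (1.0 + p * math.log2(p) / n)

# Holding p = 64 and growing n restores efficiency, as isoefficiency predicts:
print(reduction_efficiency(1024, 64))   # about 0.73
print(reduction_efficiency(65536, 64))  # about 0.994
```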
Effectiveness
[Equations garbled in extraction. Recoverable pieces: performance = w(H, p, α, ξ); cost = t(H, p, α, ξ); TA = min over p of Tp; an optimal processor count p_opt ∈ Γ is selected from these quantities.]
Application of Effectiveness
[Figure: Scalable Parallel Genetic Algorithms]
Application of Effectiveness
Energy Efficiency

RACK
• 4 enclosures × 64 blades × 20 VMs ≈ 5,000 VMs
• 10 kW ≈ $20k/month

DATA CENTER
• 2,500 square feet = 100 racks ≈ 500,000 VMs
• 1000 kW ≈ $2M/month

CO2 Emissions
• 1000 kW × 9000 h/year ≈ 9M kWh/year
• 1.3 lb/kWh × 9M kWh/year ≈ 11.7M lb
• 11.7M lb / 2,200 lb per metric ton ≈ 5,300 metric tons
• 5,300 × $40/ton ≈ $212k/year
http://www.environmentalleader.com/2008/07/27/data-centers-and-carbon-pricing/
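The estimate above can be reproduced step by step (constants taken from the slide; note that 5,300 tons × $40/ton works out to roughly $212k):

```python
# Back-of-the-envelope CO2 cost, following the slide's figures.
annual_kwh  = 1000 * 9000        # 1000 kW x 9000 h/year = 9,000,000 kWh/year
co2_lb      = 1.3 * annual_kwh   # 1.3 lb CO2 per kWh -> 11,700,000 lb
co2_tons    = co2_lb / 2200      # ~2,200 lb per metric ton -> ~5,318 tons
annual_cost = co2_tons * 40      # at $40/ton -> ~$212,700/year
print(round(co2_tons), round(annual_cost))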
Energy Efficient Data Centers
• Facebook Oregon data center
• Microsoft GFS Datacenter Tour
• Time Lapse of Data Center Construction
Energy Efficiency
The Green Grid, Using Virtualization to improve data centres Efficiency, 14 January 2009
• 75% of servers were running below 5% utilization
• An idle server consumes more than 40% of the power of a fully utilized server
• At 10% utilization, a server used 173 watts of power
• Energy reduction example
  – 10 servers @ 10% utilization (173 watts each) = 1730 watts
  – 1 server @ 50% utilization = 230 watts
• http://www.spec.org/power_ssj2008/results/power_ssj2008.html
Barroso and Holzle (Google), The Case for Energy-Proportional Computing, IEEE Computer 2007
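The consolidation arithmetic above works out to roughly an 87% power reduction; a short check:

```python
# Server consolidation example from the slide.
before = 10 * 173            # ten servers at 10% utilization: 1730 W
after  = 1 * 230             # one server at 50% utilization:   230 W
saving = 1 - after / before  # fraction of power saved
print(before, after, round(saving, 3))
```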
Data Center Metrics
• WUE = annual water use / IT equipment energy
• PUE = total facility power / IT equipment power
  – Lower is better, e.g. Google = 1.11
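PUE is a simple ratio; a minimal sketch (the numbers are illustrative):

```python
def pue(total_facility_power_kw, it_equipment_power_kw):
    """Power Usage Effectiveness: 1.0 is the ideal lower bound."""
    return total_facility_power_kw / it_equipment_power_kw

# A facility drawing 1110 kW to deliver 1000 kW to IT gear matches
# the Google figure cited above:
print(pue(1110.0, 1000.0))  # -> 1.11
```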
MicroBenchmarks
• Bonnie++ (hard drive performance)
• Stream (memory performance)
• Netperf (network performance)
• LMbench (low level system)
• Netpipe (network performance)
• Intel MPI Benchmarks (low and high level)
MacroBenchmarks
• High Performance Linpack (HPL)
  – Solves a dense linear system using LU factorization with partial pivoting
• Gromacs
  – Molecular dynamics; a good measure of floating point performance and maximum power
• NAS Parallel Suite
  – Computational fluid dynamics kernels with self-checking of results
• Intel's MPI Benchmark
• HPC Challenge Suite
  – Includes several benchmark programs to test computation, communication, and memory bandwidth
NAS Parallel Benchmarks (NPB)
• http://www.nas.nasa.gov/Software/NPB
• Numerical Aerodynamic Simulation Program at NASA Ames Research Center
• Benchmarks run with little or no tuning
• SP and BT are simulated CFD applications that solve systems of equations resulting from an approximately factored implicit finite-difference discretization of the Navier-Stokes equations
  – BT solves block-tridiagonal systems of 5x5 blocks
  – SP solves scalar pentadiagonal systems
Performance Monitoring
• PAPI
  – Performance Application Programming Interface
  – Access to hardware performance counters
• ompP
  – Profiling OpenMP code
• IPM
  – Integrated Performance Monitoring
• Paradyn
  – Monitoring message passing applications
• Other performance monitoring tools
  – Vampir, KOJAK, TAU, Scalasca, Periscope, PerfSuite, HPCToolkit, CrayPat, Spin
  – Intel's Parallel Studio
    • Parallel Advisor
    • Inspector XE
Summary
• Performance challenges
• Performance metrics
• Energy efficiency
• Benchmarking
• Monitoring tools