Computational Physics An Introduction to High-Performance Computing
![Page 1: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/1.jpg)
Introduction to Parallel Processing October 2005, Lecture #1
Computational Physics
An Introduction to High-Performance Computing
Guy Tel-Zur
![Page 2: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/2.jpg)
Talk Outline
• Motivation
• Basic terms
• Methods of Parallelization
• Examples
• Profiling, Benchmarking and Performance Tuning
• Common H/W (GPGPU)
• Supercomputers
• Future Trends
![Page 3: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/3.jpg)
A Definition from the Oxford Dictionary of Science:
A technique that allows more than one process – stream of activity – to be running at any given moment in a computer system, hence processes can be executed in parallel. This means that two or more processors are active among a group of processes at any instant.
![Page 4: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/4.jpg)
![Page 5: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/5.jpg)
The need for Parallel Processing
• Get the solution faster and/or solve a bigger problem
• Other considerations (for and against)
  – Power -> multicores
• Serial processor limits

DEMO:
N=input('Enter dimension: ')
A=rand(N);
B=rand(N);
tic
C=A*B;
toc
![Page 6: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/6.jpg)
Why Parallel Processing
• The universe is inherently parallel, so parallel models fit it best.
Weather forecasting, remote sensing, "computational biology"
![Page 7: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/7.jpg)
The Demand for Computational Speed
There is a continual demand for greater computational speed than computer systems can currently deliver. Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems. Computations must be completed within a “reasonable” time period.
![Page 8: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/8.jpg)
Exercise
• In a galaxy there are 10^11 stars
• Estimate the computing time for 100 iterations assuming O(N^2) interactions on a 1 GFLOPS computer
![Page 9: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/9.jpg)
Solution
• For 10^11 stars there are 10^22 interactions
• x100 iterations: 10^24 operations
• Therefore the computing time:
  $$t = \frac{10^{24}}{10^{9}} = 10^{15}\ \text{sec} \approx 31{,}709{,}791\ \text{years}$$
• Conclusion: Improve the algorithm! Do approximations… hopefully n log(n)
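The estimate can be reproduced with a short script. This is only a sketch under stated assumptions: one floating-point operation per pairwise interaction, a 365-day year, and, for the improved algorithm, an operation count of exactly N log2(N) per iteration (the slide only says "hopefully n log(n)"; the base of the logarithm and the constant factor are assumptions).

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year
SECONDS_PER_DAY = 24 * 3600


def runtime_seconds(operations, flops=1e9):
    """Time to execute `operations` on a machine sustaining `flops` operations per second."""
    return operations / flops


n_stars = 1e11
iterations = 100

# Direct summation: every star interacts with every other star, O(N^2).
direct_ops = iterations * n_stars ** 2
direct_years = runtime_seconds(direct_ops) / SECONDS_PER_YEAR

# Hypothetical improved algorithm: N log2(N) operations per iteration (assumed form).
tree_ops = iterations * n_stars * math.log2(n_stars)
tree_days = runtime_seconds(tree_ops) / SECONDS_PER_DAY

print(f"O(N^2):     {direct_years:,.0f} years")
print(f"O(N log N): {tree_days:.1f} days")
```

Under these assumptions the direct method needs tens of millions of years, while the N log N estimate drops to a few days, which is the point of the slide's conclusion.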
![Page 10: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/10.jpg)
Large Memory Requirements
Use parallel computing for executing larger problems which require more memory than exists on a single computer.
Japan’s Earth Simulator (35TFLOPS)
An Aurora simulation
![Page 11: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/11.jpg)
![Page 12: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/12.jpg)
Source: SciDAC Review, Number 16, 2010
![Page 13: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/13.jpg)
Molecular Dynamics
Source: SciDAC Review, Number 16, 2010
![Page 14: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/14.jpg)
Other considerations
• Development cost
  – Difficult to program and debug
  – Expensive H/W; wait 1.5 years and buy 2x faster H/W
  – TCO, ROI…
![Page 15: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/15.jpg)
24/9/2010
A news item to strengthen the motivation of anyone who is not yet convinced of the importance of the field...
![Page 16: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/16.jpg)
![Page 17: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/17.jpg)
Basic terms
• Buzzwords
• Flynn’s taxonomy
• Speedup and Efficiency
• Amdahl’s Law
• Load Imbalance
![Page 18: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/18.jpg)
Buzzwords
Farming, embarrassingly parallel
Parallel Computing - simultaneous use of multiple processors
Symmetric Multiprocessing (SMP) - a single address space
Cluster Computing - a combination of commodity units
Supercomputing - use of the fastest, biggest machines to solve large problems
![Page 19: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/19.jpg)
Flynn’s taxonomy
• single-instruction single-data streams (SISD)
• single-instruction multiple-data streams (SIMD)
• multiple-instruction single-data streams (MISD)
• multiple-instruction multiple-data streams (MIMD), e.g. SPMD (single program, multiple data)
![Page 20: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/20.jpg)
March 2010, Lecture #1, Introduction to Parallel Processing, PP2010B
http://en.wikipedia.org/wiki/Flynn%27s_taxonomy
![Page 21: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/21.jpg)
“Time” Terms
Serial time, ts = Time of the best serial (1 processor) algorithm.
Parallel time, tP = Time of the parallel algorithm + architecture to solve the problem using p processors.
Note: tP ≤ ts, but the parallel code run on a single processor satisfies t1 ≥ ts; we often assume t1 ≈ ts.
![Page 22: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/22.jpg)
Very important basic terms!
• Speedup: ts / tP ; 0 ≤ speedup ≤ p
• Work (cost): W(p) = p * tP ; ts ≤ W(p) ≤ ∞ (number of numerical operations)
• Efficiency: ε = ts / (p * tP) ; 0 ≤ ε ≤ 1 (w1/wp)
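The three definitions translate directly into code. The sketch below uses hypothetical timings (a 100 s serial run and a 20 s run on 8 processors) purely for illustration; they are not measurements from the lecture.

```python
def speedup(t_serial, t_parallel):
    """Speedup = ts / tP."""
    return t_serial / t_parallel


def work(p, t_parallel):
    """Work (cost) = p * tP."""
    return p * t_parallel


def efficiency(t_serial, p, t_parallel):
    """Efficiency = ts / (p * tP), i.e. speedup divided by the number of processors."""
    return t_serial / (p * t_parallel)


# Hypothetical timings, in seconds
t_s, p, t_p = 100.0, 8, 20.0
print("speedup   :", speedup(t_s, t_p))
print("work      :", work(p, t_p))
print("efficiency:", efficiency(t_s, p, t_p))
```

With these numbers the speedup is 5 on 8 processors, so the efficiency is 5/8: the remaining 3/8 of the processor time is lost to overhead or idling.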
![Page 23: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/23.jpg)
Maximal Possible Speedup
![Page 24: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/24.jpg)
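Amdahl’s Law (1967)
$t_s$ = serial time (time on 1 processor), $f$ = serial code fraction, $t_p$ = parallel time on $n$ processors
$$t_p = f\,t_s + \frac{(1-f)\,t_s}{n} = \frac{t_s\,\bigl(1 + (n-1)f\bigr)}{n}$$
$$S(n) = \frac{t_s}{t_p} = \frac{n}{1 + (n-1)f}$$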
![Page 25: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/25.jpg)
Maximal Possible Efficiency
ε = ts / (p * tP) ; 0 ≤ ε ≤ 1
![Page 26: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/26.jpg)
Amdahl’s Law - continued
$$\lim_{n \to \infty} S(n) = \frac{1}{f}$$
With only 5% of the computation being serial, the maximum speedup is 20
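A short numerical check of the 5% claim, assuming the usual form of Amdahl's Law, S(n) = n / (1 + (n - 1) f), where f is the serial fraction and n the number of processors. The function name and the chosen processor counts are illustrative only.

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Amdahl's Law: S(n) = n / (1 + (n - 1) * f)."""
    return n_processors / (1.0 + (n_processors - 1) * serial_fraction)


f = 0.05  # 5% of the computation is serial
for n in (1, 2, 4, 16, 256, 4096):
    print(f"n = {n:5d}   S(n) = {amdahl_speedup(f, n):6.2f}")
print(f"upper bound 1/f = {1 / f:.2f}")
```

The printed speedups grow quickly at first and then flatten: no matter how many processors are added, S(n) stays below 1/f = 20.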
![Page 27: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/27.jpg)
An Example of Amdahl’s Law
• Amdahl’s Law bounds the speedup due to any improvement.
  – Example: What will the speedup be if 20% of the execution time is spent in interprocessor communications, which we can improve by 10X?
    S = T/T’ = 1 / [0.2/10 + 0.8] ≈ 1.22
  => Invest resources where time is spent. The slowest portion will dominate.
Amdahl’s Law and Murphy’s Law: “If any system component can damage performance, it will.”
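The communication example can be checked numerically. This is a minimal sketch of the general form of Amdahl's Law, S = 1 / (q/k + (1 - q)), where q is the fraction of execution time that is improved and k is the improvement factor; the function and parameter names are illustrative.

```python
def overall_speedup(fraction_improved, improvement_factor):
    """Speedup of the whole run when only part of the execution time is accelerated."""
    remaining = 1.0 - fraction_improved
    return 1.0 / (fraction_improved / improvement_factor + remaining)


# 20% of the time is communication, made 10x faster
print(f"10x faster communication: S = {overall_speedup(0.2, 10):.2f}")

# Limit: communication made infinitely fast
print(f"free communication:       S = {overall_speedup(0.2, float('inf')):.2f}")
```

A 10x improvement of the communication part yields an overall speedup of about 1.22, and even eliminating communication entirely cannot exceed 1/0.8 = 1.25, because the untouched 80% dominates.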
![Page 28: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/28.jpg)
Computation/Communication Ratio
$$\frac{\text{Computation time}}{\text{Communication time}} = \frac{t_{comp}}{t_{comm}}$$
![Page 29: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/29.jpg)
Overhead
$$f_{oh} = \frac{1}{\varepsilon} - 1 = \frac{p\,t_p - t_s}{t_s}$$
$f_{oh}$ = overhead, $\varepsilon$ = efficiency, $p$ = number of processes, $t_p$ = parallel time, $t_s$ = serial time
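As a worked illustration with hypothetical numbers (not taken from the lecture), take a serial time $t_s = 100$ s and a parallel time $t_p = 20$ s on $p = 8$ processes. Writing the overhead as $f_{oh}$ and the efficiency as $\varepsilon = t_s / (p\,t_p)$:

$$\varepsilon = \frac{t_s}{p\,t_p} = \frac{100}{8 \cdot 20} = 0.625$$

$$f_{oh} = \frac{1}{\varepsilon} - 1 = \frac{p\,t_p - t_s}{t_s} = \frac{160 - 100}{100} = 0.6$$

In this example the parallel run spends 60% more processor time in total than the serial run needed.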
![Page 30: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/30.jpg)
Load Imbalance
• Static / Dynamic
![Page 31: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/31.jpg)
Dynamic Partitioning – Domain Decomposition by Quad or Oct Trees
![Page 32: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/32.jpg)
![Page 33: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/33.jpg)
Methods of Parallelization
• Message Passing (PVM, MPI)
• Shared Memory (OpenMP)
• Hybrid
----------------------
• Network Topology
![Page 34: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/34.jpg)
Message Passing (MIMD)
![Page 35: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/35.jpg)
The Most Popular Message Passing APIs
PVM – Parallel Virtual Machine (ORNL)
MPI – Message Passing Interface (ANL)
  – Free SDKs for MPI: MPICH and LAM
  – New: Open MPI (FT-MPI, LAM, LANL)
![Page 36: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/36.jpg)
MPI
• Standardized, with a process to keep it evolving.
• Available on almost all parallel systems (free MPICH is used on many clusters), with interfaces for C and Fortran.
• Supplies many communication variations and optimized functions for a wide range of needs.
• Supports large program development and integration of multiple modules.
• Many powerful packages and tools are based on MPI.
• While MPI is large (125 functions), you usually need very few functions, giving a gentle learning curve.
• Various training materials, tools and aids for MPI.
![Page 37: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/37.jpg)
MPI Basics
• MPI_Send() to send data
• MPI_Recv() to receive it
--------------------
• MPI_Init(&argc, &argv)
• MPI_Comm_rank(MPI_COMM_WORLD, &my_rank)
• MPI_Comm_size(MPI_COMM_WORLD, &num_processors)
• MPI_Finalize()
![Page 38: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/38.jpg)
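A Basic Program
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int my_rank, num_procs, source, tag = 0;
    float value, sum;
    MPI_Status status;

    MPI_Init(&argc, &argv);                     /* initialize */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    value = (float)my_rank;   /* illustrative value; the slide leaves it unspecified */
    if (my_rank == 0) {
        /* rank 0 collects one value from every other process and sums them */
        sum = 0.0f;
        for (source = 1; source < num_procs; source++) {
            MPI_Recv(&value, 1, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &status);
            sum += value;
        }
        printf("sum = %f\n", sum);
    } else {
        MPI_Send(&value, 1, MPI_FLOAT, 0, tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();                             /* finalize */
    return 0;
}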
![Page 39: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/39.jpg)
MPI – Cont’d
• Deadlocks
• Collective Communication
• MPI-2:
  – Parallel I/O
  – One-Sided Communication
![Page 40: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/40.jpg)
Be Careful of Deadlocks
M.C. Escher’s Drawing Hands
Unsafe SEND/RECV
![Page 41: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/41.jpg)
Shared Memory
![Page 42: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/42.jpg)
Shared Memory Computers
IBM p690+: each node has 32 POWER4+ 1.7 GHz processors
Sun Fire 6800: 900 MHz UltraSPARC III processors
A "blue-and-white" (i.e. Israeli) representative
![Page 43: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/43.jpg)
OpenMP
![Page 44: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/44.jpg)
An OpenMP Example
#include <omp.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("Hello parallel world from thread:\n");
    /* every thread in the team prints its own id */
    #pragma omp parallel
    {
        printf("%d\n", omp_get_thread_num());
    }
    printf("Back to the sequential world\n");
    return 0;
}

~> export OMP_NUM_THREADS=4
~> ./a.out
Hello parallel world from thread:
1
3
0
2
Back to the sequential world
~>
![Page 45: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/45.jpg)
Constellation systems
[Figure: three nodes, each with four P/C (processor, cache) pairs sharing one memory M, all connected by an Interconnect]
![Page 46: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/46.jpg)
Network Topology
![Page 47: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/47.jpg)
Network Properties
• Bisection Width – the minimum number of links that must be cut in order to divide the network into two equal parts
• Diameter – the maximum distance between any two nodes
• Connectivity – multiplicity of paths between any two nodes
• Cost – total number of links
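To make these definitions concrete, the sketch below computes cost, diameter and bisection width for a 4D hypercube, one of the topologies shown in this lecture. It is an illustration rather than a general tool: the bisection width is found by brute force over all equal-size splits, which is only feasible for very small networks, and connectivity is not computed. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def hypercube_links(dim):
    """Links of a dim-dimensional hypercube: nodes whose labels differ in exactly one bit."""
    links = []
    for u in range(2 ** dim):
        for bit in range(dim):
            v = u ^ (1 << bit)
            if u < v:  # record each undirected link once
                links.append((u, v))
    return links


def diameter(n_nodes, links):
    """Maximum shortest-path distance between any two nodes (network assumed connected)."""
    neighbours = {u: [] for u in range(n_nodes)}
    for u, v in links:
        neighbours[u].append(v)
        neighbours[v].append(u)
    longest = 0
    for start in range(n_nodes):
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        longest = max(longest, max(dist.values()))
    return longest


def bisection_width(n_nodes, links):
    """Minimum number of links cut over all splits into two equal halves (brute force)."""
    best = len(links)
    for rest in combinations(range(1, n_nodes), n_nodes // 2 - 1):
        half = {0, *rest}  # node 0 is fixed in one half to skip mirrored splits
        cut = sum((u in half) != (v in half) for u, v in links)
        best = min(best, cut)
    return best


links = hypercube_links(4)
print("cost (links)   :", len(links))
print("diameter       :", diameter(16, links))
print("bisection width:", bisection_width(16, links))
```

For a d-dimensional hypercube the results match the closed forms d * 2^(d-1) links, diameter d and bisection width 2^(d-1), i.e. 32, 4 and 8 for d = 4.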
![Page 48: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/48.jpg)
3D Torus
![Page 49: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/49.jpg)
Ciara VXR-3DT
![Page 50: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/50.jpg)
A Binary Fat Tree: Thinking Machines CM-5, 1993
![Page 51: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/51.jpg)
4D Hypercube Network
![Page 52: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/52.jpg)
![Page 53: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/53.jpg)
Example #1
The car of the future
Reference: SC04 S2: Parallel Computing 101 tutorial
![Page 54: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/54.jpg)
A Distributed Car
![Page 55: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/55.jpg)
Halos
![Page 56: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/56.jpg)
Ghost points
![Page 57: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/57.jpg)
Example #2: Collisions of Billiard Balls
• MPI Parallel Code
• MPE library is used for the real-time graphics
• Each process is responsible for a single ball
![Page 58: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/58.jpg)
Example #3: Parallel Pattern Recognition
The Hough Transform
P.V.C. Hough. Method and means for recognizing complex patterns. U.S. Patent 3069654, 1962.
![Page 59: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/59.jpg)
Guy Tel-Zur, Ph.D. Thesis. Weizmann Institute 1996
![Page 60: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/60.jpg)
Ring candidate search by a Hough transformation
![Page 61: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/61.jpg)
Parallel Patterns
• Master / Workers paradigm
• Domain decomposition: Divide the image into slices. Allocate each slice to a process
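The slicing step can be sketched as follows. This is an assumed scheme, not the one from the thesis: the image is cut into contiguous bands of rows, and when the row count does not divide evenly the first few processes receive one extra row each. The function name and the example sizes are hypothetical.

```python
def slice_bounds(n_rows, n_procs):
    """Split n_rows image rows into n_procs contiguous slices of near-equal size.

    Returns a list of (start, stop) row ranges, one per process rank,
    with `stop` exclusive.
    """
    base, extra = divmod(n_rows, n_procs)
    bounds = []
    start = 0
    for rank in range(n_procs):
        rows = base + (1 if rank < extra else 0)  # spread the remainder over the first ranks
        bounds.append((start, start + rows))
        start += rows
    return bounds


# Example: a 10-row image shared by 4 processes
for rank, (start, stop) in enumerate(slice_bounds(10, 4)):
    print(f"rank {rank}: rows {start}..{stop - 1}")
```

In a Master / Workers setup the master would send each worker its row range (or the pixels themselves) and later merge the partial results; keeping the slice sizes within one row of each other limits static load imbalance.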
![Page 62: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/62.jpg)
![Page 63: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/63.jpg)
Profiling, Benchmarking and Performance Tuning
• Profiling: Post mortem analysis
• Benchmarking suite: The HPC Challenge
• PAPI, http://icl.cs.utk.edu/papi/
• By Intel (will be installed at the BGU)
  – VTune
  – Parallel Studio
![Page 64: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/64.jpg)
Profiling
![Page 65: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/65.jpg)
Profiling
MPICH: Java based Jumpshot3
![Page 66: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/66.jpg)
PVM Cluster view with XPVM
![Page 67: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/67.jpg)
Cluster Monitoring
![Page 68: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/68.jpg)
End of Lecture 1
![Page 69: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/69.jpg)
Diagnostics
Microway – Link Checker
![Page 70: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/70.jpg)
Why Performance Modelling?
• Parallel performance is a multidimensional space:
  – Resource parameters: # of processors, computation speed, network size/topology/protocols/etc., communication speed
  – User-oriented parameters: problem size, application input, target optimization (time vs. size)
  – These issues interact and trade off with each other
• Large cost for development, deployment and maintenance of both machines and codes
• Need to know in advance how a given application utilizes the machine’s resources.
![Page 71: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/71.jpg)
Performance Modelling
Basic approach:
Trun = Tcomputation + Tcommunication – Toverlap
Trun = f (T1,#CPUs , Scalability)
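A toy version of this model is sketched below. Only the first function follows the slide's formula directly; the scaling model in the second function is an assumption made for illustration (perfectly divisible computation, communication cost growing with log2 of the CPU count), and every parameter name is hypothetical.

```python
import math


def run_time(t_computation, t_communication, t_overlap=0.0):
    """Basic approach: Trun = Tcomputation + Tcommunication - Toverlap."""
    return t_computation + t_communication - t_overlap


def modelled_run_time(t1, n_cpus, comm_cost_per_stage=0.5, overlap_fraction=0.0):
    """Trun as a function of T1 and #CPUs under assumed scaling behaviour.

    Assumptions (not from the lecture): computation divides perfectly,
    communication costs comm_cost_per_stage * log2(n_cpus), and a fixed
    fraction of the shorter phase is hidden by overlap.
    """
    t_comp = t1 / n_cpus
    t_comm = comm_cost_per_stage * math.log2(n_cpus)
    t_overlap = overlap_fraction * min(t_comp, t_comm)
    return run_time(t_comp, t_comm, t_overlap)


for n in (1, 4, 16, 64, 256):
    print(f"{n:4d} CPUs: Trun = {modelled_run_time(100.0, n):7.2f} s")
```

Even this crude model shows the typical behaviour: the computation term shrinks with more CPUs while the communication term grows, so the benefit of adding processors diminishes.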
![Page 72: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/72.jpg)
HPC Challenge
• HPL - the Linpack TPP benchmark, which measures the floating point rate of execution for solving a linear system of equations.
• DGEMM - measures the floating point rate of execution of double precision real matrix-matrix multiplication.
• STREAM - a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for a simple vector kernel.
• PTRANS (parallel matrix transpose) - exercises the communications where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network.
• RandomAccess - measures the rate of integer random updates of memory (GUPS).
• FFTE - measures the floating point rate of execution of double precision complex one-dimensional Discrete Fourier Transform (DFT).
• Communication bandwidth and latency - a set of tests to measure latency and bandwidth of a number of simultaneous communication patterns; based on b_eff (effective bandwidth benchmark).
![Page 73: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/73.jpg)
Bottlenecks
A rule of thumb that often applies: a contemporary processor, for a spectrum of applications, delivers (i.e., sustains) 10% of peak performance
![Page 74: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/74.jpg)
Processor-Memory Gap
[Figure: performance (log scale, 1 to 1000) of CPU and DRAM versus year, 1980 to 2000]
![Page 75: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/75.jpg)
Memory Access Speed on a DEC 21164 Alpha
– Registers: 2 ns
– L1 On-Chip: 4 ns; ~kB
– L2 On-Chip: 5 ns; ~MB
– L3 Off-Chip: 30 ns
– Memory: 220 ns; ~GB
– Hard Disk: 10 ms; ~100+ GB
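These latencies show why cache behaviour dominates sustained performance. The sketch below computes an average access time from the figures above using a deliberately simplified model: each access is served entirely by one level at that level's listed latency. The hit fractions are hypothetical and chosen only for illustration.

```python
# Latencies in nanoseconds, taken from the list above
LATENCY_NS = {"L1": 4.0, "L2": 5.0, "L3": 30.0, "memory": 220.0}


def average_access_time(hit_fraction):
    """Weighted mean latency; hit_fraction maps each level to its share of all accesses."""
    if abs(sum(hit_fraction.values()) - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(LATENCY_NS[level] * share for level, share in hit_fraction.items())


# Hypothetical access mix for a cache-friendly and a cache-unfriendly code
friendly = {"L1": 0.90, "L2": 0.06, "L3": 0.03, "memory": 0.01}
unfriendly = {"L1": 0.50, "L2": 0.10, "L3": 0.10, "memory": 0.30}

print(f"cache-friendly  : {average_access_time(friendly):.1f} ns")
print(f"cache-unfriendly: {average_access_time(unfriendly):.1f} ns")
```

With these assumed mixes the cache-unfriendly code waits roughly ten times longer per access, even though both run on the same hardware.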
![Page 76: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/76.jpg)
![Page 77: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/77.jpg)
Common H/W
• Clusters
  – Pizzas
  – Blades
  – GPGPUs
![Page 78: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/78.jpg)
“Pizzas”
Tatung Dual Opteron Tyan 2881 dual Opteron board
![Page 79: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/79.jpg)
Blades
4U, holding up to 8 server blades.
Dual Xeon / Xeon with EM64T / Opteron
PCI-X, built-in KVM switch and GbE/FE switch, hot-swappable 6+1 redundant power
![Page 80: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/80.jpg)
GPGPU
![Page 81: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/81.jpg)
Top of the line Networking
• Mellanox InfiniBand
  – Server to Server: 40 Gbps (QDR)
  – Switch to Switch: 60 Gbps
  – ~1 microsecond latency
Bandwidth
![Page 82: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/82.jpg)
IS5600 - 648-port 20 and 40Gb/s InfiniBand Chassis Switch
![Page 83: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/83.jpg)
![Page 84: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/84.jpg)
Supercomputers
• The Top 10
• The Top 500
• Trends (will be covered during the SCxx conference in the autumn semester, or ISCxx in the spring semester)
“An extremely high power computer that has a large amount of main memory and very fast processors… Often the processors run in parallel.”
![Page 85: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/85.jpg)
The Do-It-Yourself Supercomputer
Scientific American, August 2001 issue
Also available online: http://www.sciam.com/2001/0801issue/0801hargrove.html
![Page 86: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/86.jpg)
The Top500
![Page 87: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/87.jpg)
The Top 15
Top 15, June 2009
![Page 88: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/88.jpg)
IBM Blue Gene
![Page 89: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/89.jpg)
Barcelona Supercomputer Centre
![Page 90: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/90.jpg)
• 4,564 PowerPC 970 FX processors, 9 TB of memory, 4 GB per node, 231 TB storage capacity
• 3 networks:
  – Myrinet
  – Gigabit
  – 10/100 Ethernet
• OS: Linux kernel version 2.6
![Page 91: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/91.jpg)
Virginia Tech
1100 Dual 2.3 GHz Apple XServe / Mellanox InfiniBand 4X / Cisco GigE
http://www.tcf.vt.edu/systemX.html
![Page 92: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/92.jpg)
Source: SciDAC Review, Number 16, 2010
![Page 93: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/93.jpg)
Top 500 List
Published twice a year.
Spring Semester: ISC, Germany
Autumn Semester: SC, USA
We will cover these events in our course!
![Page 94: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/94.jpg)
![Page 95: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/95.jpg)
![Page 96: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/96.jpg)
Technology Trends - Processors
![Page 97: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/97.jpg)
![Page 98: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/98.jpg)
Moore’s Law Still Holds
[Figure: transistors per die (log scale, 10^0 to 10^11) versus year (1960 to 2010). Memory: 1K to 4G. Microprocessors: 4004, 8080, 8086, 80286, i386™, i486™, Pentium®, Pentium® II, Pentium® III, Pentium® 4, Itanium®. Source: Intel]
![Page 99: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/99.jpg)
(Very near) Future trends
![Page 100: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/100.jpg)
1997 Prediction
![Page 101: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/101.jpg)
![Page 102: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/102.jpg)
Power dissipation
• Opteron dual core: 95 W
• Human activities:
– Sleeping: 81 W
– Sitting: 93 W
– Conversation: 128 W
– Strolling: 163 W
– Hiking: 407 W
– Sprinting: 1630 W
![Page 103: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/103.jpg)
Power Consumption Trends in Microprocessors
![Page 104: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/104.jpg)
The Power Problem
![Page 105: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/105.jpg)
National Center for Supercomputing Applications
Managing the Heat Load
Liquid cooling system in Apple G5s
Heat sinks in 6XX series Pentium 4s
Source: Thom H. Dunning, Jr., National Center for Supercomputing Applications and Department of Chemistry, University of Illinois at Urbana-Champaign
![Page 106: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/106.jpg)
Dual core (2005)
![Page 107: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/107.jpg)
2009: AMD Istanbul, 6 cores
![Page 108: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/108.jpg)
2009/10 - Nvidia Fermi: 512 cores
![Page 109: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/109.jpg)
System on a Chip
Source: SciDAC Review, Number 16, 2010
![Page 110: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/110.jpg)
Top 500 – Trends Since 1993
![Page 111: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/111.jpg)
![Page 112: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/112.jpg)
![Page 113: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/113.jpg)
![Page 114: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/114.jpg)
![Page 115: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/115.jpg)
Processor Count
![Page 116: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/116.jpg)
[Figure: trend plot over the years 1993 to 2005, annotated "My laptop"]
![Page 117: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/117.jpg)
![Page 118: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/118.jpg)
Price / Performance
• $0.30/MFLOPS (was $0.60 two years ago)
• $300/GFLOPS
• $300,000/TFLOPS
• $30,000,000 for #1
2009: US$0.10/hour/core on Amazon EC2
2010: US$0.085/hour/core on Amazon EC2
Prices fall continuously.
It is impossible to keep the slides up to date.
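The price figures on this slide are one number expressed at different scales. The short Python sketch below rescales the quoted $0.30/MFLOPS to the larger units. The helper names are made up for illustration, and the last value (the performance implied by a $30,000,000 system at that price) is an inference from the slide's numbers, not something the slide states.

```python
# Unit sizes expressed in FLOPS.
MFLOPS = 1e6
GFLOPS = 1e9
TFLOPS = 1e12


def price_per_unit(dollars_per_mflops, unit):
    """Rescale a price quoted in $/MFLOPS to dollars per `unit` FLOPS."""
    return dollars_per_mflops * unit / MFLOPS


def implied_performance_tflops(system_price, dollars_per_mflops):
    """Performance (in TFLOPS) that `system_price` buys at the given $/MFLOPS."""
    return system_price / dollars_per_mflops * MFLOPS / TFLOPS


if __name__ == "__main__":
    price = 0.30  # $/MFLOPS, as quoted on the slide
    print("$/GFLOPS:", price_per_unit(price, GFLOPS))
    print("$/TFLOPS:", price_per_unit(price, TFLOPS))
    # Assumption: the #1 system is priced at the same $/MFLOPS rate.
    print("TFLOPS for $30M:", implied_performance_tflops(30_000_000, price))
```

Under that assumption the $30,000,000 figure corresponds to a machine of roughly 100 TFLOPS.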
![Page 119: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/119.jpg)
The Dream Machine - 2005: Quad dual core
![Page 120: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/120.jpg)
The Dream Machine - 2009: 32 cores
October 2009 Lecture #1
Supermicro 2U Twin2 Servers: 8 × 4-core processors, 375 GFLOPS/kW
![Page 121: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/121.jpg)
The Dream Machine 2010
• AMD 12 cores (16 cores in 2011)
March 2010, Lecture #1, Introduction to Parallel Processing, PP2010B
![Page 122: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/122.jpg)
The Dream Machine 2010
• Supermicro - Double-Density TwinBlade™
• 20 DP servers in 7U, 120 servers in 42U, 240 sockets × 6 cores/CPU = 1,440 cores/rack
• Peak: 1,440 cores × 4 ops/cycle × 2 GHz ≈ 11.5 TF
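The rack arithmetic on this slide can be checked step by step. The sketch below is a minimal Python version, assuming that "DP" means two sockets per server (which matches 240 sockets for 120 servers) and that "4 ops" means four floating-point operations per core per clock cycle. The function name and parameter names are illustrative only.

```python
def rack_peak(servers_per_enclosure=20, enclosure_height_u=7, rack_height_u=42,
              sockets_per_server=2, cores_per_socket=6,
              ops_per_cycle=4, clock_ghz=2.0):
    """Return (cores per rack, theoretical peak in TFLOPS) for one rack."""
    enclosures = rack_height_u // enclosure_height_u  # 7U enclosures per 42U rack
    servers = enclosures * servers_per_enclosure      # servers per rack
    sockets = servers * sockets_per_server            # CPU sockets per rack
    cores = sockets * cores_per_socket                # cores per rack
    # cores * ops/cycle * GHz gives GFLOPS; divide by 1000 for TFLOPS.
    peak_tflops = cores * ops_per_cycle * clock_ghz / 1000.0
    return cores, peak_tflops


if __name__ == "__main__":
    cores, peak = rack_peak()
    print(f"{cores} cores per rack, peak {peak:.2f} TFLOPS")
```

With the slide's numbers this gives 1,440 cores and about 11.5 TFLOPS, consistent with the peak quoted on the slide. It is a theoretical peak; sustained performance on real applications is lower.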
![Page 123: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/123.jpg)
Multi-core Many cores
• Higher performance per watt
• Directly connects the processor cores on a single die, further reducing latencies between processors
• Licensing per socket?
• A short online flash clip from AMD
![Page 124: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/124.jpg)
Another Example: The Cell
By Sony, Toshiba and IBM
• Observed clock speed: > 4 GHz
• Peak performance (single precision): > 256 GFlops
• Peak performance (double precision): > 26 GFlops
• Local storage size per SPU: 256 KB
• Area: 221 mm²
• Technology: 90 nm
• Total number of transistors: 234M
![Page 125: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/125.jpg)
The Cell (cont’)
A heterogeneous chip multiprocessor consisting of a 64-bit Power core, augmented with 8 specialized co-processors based on a novel single-instruction multiple-data (SIMD) architecture called SPU (Synergistic Processor Unit), for data-intensive processing as is found in cryptography, media and scientific applications. The system is integrated by a coherent on-chip bus.
Ref: http://www.research.ibm.com/cell
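One plausible way to arrive at the single-precision peak quoted for the Cell is sketched below in Python. The slides give the clock speed (4 GHz), the number of SPUs (8) and the local store per SPU (256 KB). The per-SPU throughput is an assumption not stated on the slides: 4 single-precision SIMD lanes, each completing one fused multiply-add (counted as 2 floating-point operations) per cycle. The function name is illustrative.

```python
def simd_peak_gflops(units, simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak of `units` identical SIMD processors, in GFLOPS."""
    return units * simd_lanes * flops_per_lane_per_cycle * clock_ghz


# From the slides: 8 SPUs at 4 GHz.
# Assumed (not on the slides): 4 lanes, 2 flops per lane per cycle.
cell_sp_peak = simd_peak_gflops(units=8, simd_lanes=4,
                                flops_per_lane_per_cycle=2, clock_ghz=4.0)

# Total SPU local store: 8 SPUs x 256 KB each.
total_local_store_kb = 8 * 256

if __name__ == "__main__":
    print("Single-precision peak (GFLOPS):", cell_sp_peak)
    print("Total SPU local store (KB):", total_local_store_kb)
```

Under these assumptions the SPUs alone account for 256 GFLOPS, which matches the single-precision figure on the slide.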
![Page 126: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/126.jpg)
Was taught for the first time in October 2005,
The Cell (Cont’)
![Page 127: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/127.jpg)
Virtualization
Virtualization, the use of software to allow workloads to be shared at the processor level by providing the illusion of multiple processors, is growing in popularity. Virtualization balances workloads between underused IT assets, minimizing the requirement to have performance overhead held in reserve for peak situations and the need to manage unnecessary hardware.
Xen….
Our Educational Cluster is based on this technology!!!
![Page 128: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/128.jpg)
Mobile Distributed Computing
![Page 129: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/129.jpg)
Summary
![Page 130: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/130.jpg)
References
• Gordon Moore: http://www.intel.com/technology/mooreslaw/index.htm
• Moore’s Law:
– ftp://download.intel.com/museum/Moores_Law/Printed_Materials/Moores_Law_Backgrounder.pdf
– http://www.intel.com/technology/silicon/mooreslaw/index.htm
• Future processors trends: ftp://download.intel.com/technology/computing/archinnov/platform2015/download/Platform_2015.pdf
![Page 131: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/131.jpg)
References
• My Parallel Processing Course website: http://www.ee.bgu.ac.il/~tel-zur/2011A
• “Parallel Computing 101”, SC04, S2 Tutorial
• HPC Challenge: http://icl.cs.utk.edu/hpcc
• Condor at the Ben-Gurion University: http://www.ee.bgu.ac.il/~tel-zur/condor
![Page 132: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/132.jpg)
References
• MPI: http://www-unix.mcs.anl.gov/mpi/index.html
• Mosix: http://www.mosix.org
• Condor: http://www.cs.wisc.edu/condor
• The Top500 Supercomputers: http://www.top500.org
• Grid Computing: Grid Café: http://gridcafe.web.cern.ch/gridcafe/
• Grid in Israel:
– Israel Academic Grid: http://iag.iucc.ac.il/
– The IGT: http://www.grid.org.il/
• Mellanox: http://www.mellanox.com/
![Page 134: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/134.jpg)
References
• Books: http://www.top500.org/main/Books/
• The Sourcebook of Parallel Computing
![Page 135: Computational Physics An Introduction to High-Performance Computing](https://reader035.fdocuments.us/reader035/viewer/2022062302/56816381550346895dd464a1/html5/thumbnails/135.jpg)
References (a very partial list)
More books at the course website