Parallel computing chapter 3
EENG-630 Chapter 3
Chapter 3: Principles of Scalable Performance
• Performance measures
• Speedup laws
• Scalability principles
• Scaling up vs. scaling down
Performance metrics and measures
• Parallelism profiles
• Asymptotic speedup factor
• System efficiency, utilization and quality
• Standard performance measures
Degree of parallelism
• Reflects the matching of software and hardware parallelism
• Discrete time function – measure, for each time period, the # of processors used
• Parallelism profile is a plot of the DOP as a function of time
• Ideally have unlimited resources
Factors affecting parallelism profiles
• Algorithm structure
• Program optimization
• Resource utilization
• Run-time conditions
• Realistically limited by # of available processors, memory, and other nonprocessor resources
Average parallelism variables
• n – number of homogeneous processors
• m – maximum parallelism in a profile
• Δ – computing capacity of a single processor (execution rate only, no overhead)
• DOP = i – # of processors busy during an observation period
Average parallelism
• Total amount of work performed is proportional to the area under the profile curve
$$W = \Delta \int_{t_1}^{t_2} \mathrm{DOP}(t)\,dt$$

$$W = \Delta \sum_{i=1}^{m} i \cdot t_i$$

where $t_i$ is the total time during which $\mathrm{DOP} = i$, and $\sum_{i=1}^{m} t_i = t_2 - t_1$.
Average parallelism
$$A = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \mathrm{DOP}(t)\,dt$$

$$A = \frac{\sum_{i=1}^{m} i \cdot t_i}{\sum_{i=1}^{m} t_i}$$
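As a quick sketch (not part of the slides), the discrete form of the average parallelism can be evaluated directly from a profile given as the total time $t_i$ spent at each DOP value $i$; the profile values below are made up for illustration:

```python
def average_parallelism(profile):
    """A = (sum_i i * t_i) / (sum_i t_i), where profile maps each
    DOP value i to the total time t_i spent at that DOP."""
    total_time = sum(profile.values())
    weighted_time = sum(i * t_i for i, t_i in profile.items())
    return weighted_time / total_time

# Hypothetical profile: 2 time units at DOP=1, 3 at DOP=4, 1 at DOP=8
A = average_parallelism({1: 2.0, 4: 3.0, 8: 1.0})  # (2 + 12 + 8) / 6
```

The constant $\Delta$ cancels in the ratio, so it never appears in the computation.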
Example: parallelism profile and average parallelism
Asymptotic speedup
$$T(1) = \sum_{i=1}^{m} t_i(1) = \sum_{i=1}^{m} \frac{W_i}{\Delta}$$

$$T(\infty) = \sum_{i=1}^{m} t_i(\infty) = \sum_{i=1}^{m} \frac{W_i}{i\Delta}$$

$$S_\infty = \frac{T(1)}{T(\infty)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} W_i/i} = A \text{ in the ideal case}$$

($T$ here denotes the response time.)
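A minimal illustrative sketch (not from the slides): if the work amounts are taken as $W_i = i\,\Delta\,t_i$ from a profile (with $\Delta = 1$ here), the asymptotic speedup comes out equal to the average parallelism of that profile:

```python
def asymptotic_speedup(W):
    """S_inf = (sum_i W_i) / (sum_i W_i / i), where W maps DOP i -> work W_i."""
    return sum(W.values()) / sum(W_i / i for i, W_i in W.items())

# Work derived from a hypothetical profile with Delta = 1:
# t_1 = 2, t_4 = 3, t_8 = 1  =>  W_i = i * t_i
W = {1: 1 * 2.0, 4: 4 * 3.0, 8: 8 * 1.0}
S = asymptotic_speedup(W)  # equals A = 22/6 for this profile
```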
Performance measures
• Consider n processors executing m programs in various modes
• Want to define the mean performance of these multimode computers:
  – Arithmetic mean performance
  – Geometric mean performance
  – Harmonic mean performance
Arithmetic mean performance
$$R_a = \frac{1}{m} \sum_{i=1}^{m} R_i$$

Arithmetic mean execution rate (assumes equal weighting)

$$R_a^* = \sum_{i=1}^{m} f_i R_i$$

Weighted arithmetic mean execution rate

– proportional to the sum of the inverses of execution times
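A short sketch of both forms (the rates and weights below are made up for illustration):

```python
def arithmetic_mean_rate(R):
    """Unweighted arithmetic mean execution rate over a list of rates R_i."""
    return sum(R) / len(R)

def weighted_arithmetic_mean_rate(f, R):
    """Weighted form: sum_i f_i * R_i, with weights f_i summing to 1."""
    return sum(f_i * R_i for f_i, R_i in zip(f, R))

rates = [10.0, 20.0, 40.0]  # hypothetical execution rates (e.g., Mflops)
Ra = arithmetic_mean_rate(rates)                              # 70/3
Ra_w = weighted_arithmetic_mean_rate([0.5, 0.3, 0.2], rates)  # 5 + 6 + 8 = 19
```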
Geometric mean performance
$$R_g = \prod_{i=1}^{m} R_i^{1/m}$$

Geometric mean execution rate

$$R_g^* = \prod_{i=1}^{m} R_i^{f_i}$$

Weighted geometric mean execution rate

– does not summarize the real performance since it does not have the inverse relation with the total time
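A brief sketch of the weighted geometric mean (equal weights reduce it to the plain geometric mean; example rates are made up):

```python
import math

def weighted_geometric_mean_rate(f, R):
    """R_g* = prod_i R_i**f_i, with weights f_i summing to 1."""
    return math.prod(R_i ** f_i for f_i, R_i in zip(f, R))

rates = [10.0, 20.0, 40.0]
# Equal weights 1/3 give the plain geometric mean: (10*20*40)**(1/3) = 20
Rg = weighted_geometric_mean_rate([1/3, 1/3, 1/3], rates)
```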
Harmonic mean performance
$$T_i = 1/R_i$$

Mean execution time per instruction for program $i$

$$T_a = \frac{1}{m} \sum_{i=1}^{m} T_i = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{R_i}$$

Arithmetic mean execution time per instruction
Harmonic mean performance
$$R_h = \frac{1}{T_a} = \frac{m}{\sum_{i=1}^{m} (1/R_i)}$$

Harmonic mean execution rate

$$R_h^* = \frac{1}{\sum_{i=1}^{m} (f_i/R_i)}$$

Weighted harmonic mean execution rate

– corresponds to the total # of operations divided by the total time (closest to the real performance)
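To see why the harmonic mean is "closest to the real performance," the sketch below (with made-up operation counts and rates) compares the weighted harmonic mean against total operations divided by total time; the two agree:

```python
def weighted_harmonic_mean_rate(f, R):
    """R_h* = 1 / sum_i (f_i / R_i), with weights f_i summing to 1."""
    return 1.0 / sum(f_i / R_i for f_i, R_i in zip(f, R))

# Hypothetical run: two program phases with different execution rates
ops = [100.0, 100.0]   # operation counts per phase
rates = [10.0, 40.0]   # ops per unit time in each phase
total_time = sum(o / r for o, r in zip(ops, rates))  # 10 + 2.5 = 12.5
real_rate = sum(ops) / total_time                    # 200 / 12.5 = 16
f = [o / sum(ops) for o in ops]                      # fraction of ops per phase
Rh = weighted_harmonic_mean_rate(f, rates)           # also 16
```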
Harmonic Mean Speedup
• Ties the various modes of a program to the number of processors used
• Program is in mode i if i processors used
• Sequential execution time T1 = 1/R1 = 1
$$S = T_1/T^* = \frac{1}{\sum_{i=1}^{n} f_i/R_i}$$
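A minimal sketch, assuming $R_i = i$ and $T_1 = 1$ as on this slide (the workload split is made up):

```python
def harmonic_mean_speedup(f):
    """S = 1 / sum_i (f_i / R_i) with R_i = i and T_1 = 1.
    f maps mode i (i processors in use) to the fraction of work in that mode."""
    return 1.0 / sum(f_i / i for i, f_i in f.items())

# Hypothetical workload: 20% sequential, 80% executed on 8 processors
S = harmonic_mean_speedup({1: 0.2, 8: 0.8})  # 1 / (0.2 + 0.1) = 10/3
```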
Harmonic Mean Speedup Performance
Amdahl’s Law
• Assume $R_i = i$, $w = (\alpha, 0, 0, \ldots, 1-\alpha)$
• System is either sequential, with probability $\alpha$, or fully parallel, with probability $1-\alpha$
• Implies $S \to 1/\alpha$ as $n \to \infty$

$$S_n = \frac{n}{1 + (n-1)\alpha}$$
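The formula above is easy to sketch directly; the example below shows the saturation at $1/\alpha$ (the sequential fraction value is made up):

```python
def amdahl_speedup(alpha, n):
    """S_n = n / (1 + (n - 1) * alpha), where alpha is the sequential fraction."""
    return n / (1.0 + (n - 1) * alpha)

# With alpha = 0.1, the speedup approaches 10 no matter how many
# processors are added; with alpha = 0, it scales linearly.
S_plateau = amdahl_speedup(0.1, 10**9)  # very close to 10
S_linear = amdahl_speedup(0.0, 64)      # exactly 64
```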
Speedup Performance
System Efficiency
• O(n) is the total # of unit operations
• T(n) is execution time in unit time steps
• T(n) < O(n) and T(1) = O(1)
$$S(n) = T(1)/T(n)$$

$$E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)}$$
Redundancy and Utilization
• Redundancy signifies the extent of matching software and hardware parallelism
• Utilization indicates the percentage of resources kept busy during execution
$$R(n) = O(n)/O(1)$$

$$U(n) = R(n)\,E(n) = \frac{O(n)}{n\,T(n)}$$
Quality of Parallelism
• Directly proportional to the speedup and efficiency and inversely related to the redundancy
• Upper-bounded by the speedup S(n)
$$Q(n) = \frac{S(n)\,E(n)}{R(n)} = \frac{T^3(1)}{n\,T^2(n)\,O(n)}$$
Example of Performance
• Given $O(1) = T(1) = n^3$, $O(n) = n^3 + n^2\log n$, and $T(n) = 4n^3/(n+3)$:
  – $S(n) = (n+3)/4$
  – $E(n) = (n+3)/(4n)$
  – $R(n) = (n + \log n)/n$
  – $U(n) = (n+3)(n + \log n)/(4n^2)$
  – $Q(n) = (n+3)^2/(16(n + \log n))$
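These closed forms can be spot-checked numerically from the definitions on the preceding slides (a sketch; the log base is not specified on the slide, so natural log is assumed — any fixed base works the same way as long as it is used consistently):

```python
import math

def example_metrics(n):
    """Evaluate S, E, R, U, Q for the example workload from the definitions."""
    T1 = n**3
    O1 = n**3
    On = n**3 + n**2 * math.log(n)  # assuming natural log
    Tn = 4 * n**3 / (n + 3)
    S = T1 / Tn
    E = S / n
    R = On / O1
    U = R * E
    Q = S * E / R
    return S, E, R, U, Q

n = 16
S, E, R, U, Q = example_metrics(n)
# S matches (n+3)/4, R matches (n + log n)/n, and so on
```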
Standard Performance Measures
• MIPS and Mflops – depend on the instruction set and program used
• Dhrystone results – measure of integer performance
• Whetstone results – measure of floating-point performance
• TPS and KLIPS ratings – transaction performance and reasoning power
Parallel Processing Applications
• Drug design
• High-speed civil transport
• Ocean modeling
• Ozone depletion research
• Air pollution
• Digital anatomy
Application Models for Parallel Computers
• Fixed-load model – constant workload
• Fixed-time model – demands constant program execution time
• Fixed-memory model – limited by the memory bound
Algorithm Characteristics
• Deterministic vs. nondeterministic
• Computational granularity
• Parallelism profile
• Communication patterns and synchronization requirements
• Uniformity of operations
• Memory requirement and data structures
Isoefficiency Concept
• Relates workload to machine size n needed to maintain a fixed efficiency
• The smaller the power of n, the more scalable the system
$$E = \frac{w(s)}{w(s) + h(s,n)}$$

where $w(s)$ is the workload and $h(s,n)$ is the overhead.
Isoefficiency Function
• To maintain a constant E, w(s) should grow in proportion to h(s,n)
• C = E/(1-E) is constant for fixed E
$$w(s) = \frac{E}{1-E}\,h(s,n)$$

$$f_E(n) = C \cdot h(s,n)$$
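The relation above inverts easily: given a target efficiency and a known overhead value, the required workload follows directly (a sketch with made-up numbers):

```python
def isoefficiency_workload(E, h):
    """Workload w needed to hold efficiency at E against overhead h = h(s, n),
    from w = (E / (1 - E)) * h."""
    return E / (1.0 - E) * h

# Round trip: the resulting workload really does yield efficiency E.
# With E = 0.8 and overhead h = 10: C = 0.8/0.2 = 4, so w = 40,
# and w / (w + h) = 40/50 = 0.8 again.
w = isoefficiency_workload(0.8, 10.0)
```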
Speedup Performance Laws
• Amdahl's law – for fixed workload or fixed problem size
• Gustafson's law – for scaled problems (problem size increases with increased machine size)
• Memory-bounded speedup model – for scaled problems bounded by memory capacity
Amdahl’s Law
• As the # of processors increases, the fixed load is distributed to more processors
• Minimal turnaround time is the primary goal
• Speedup factor is upper-bounded by a sequential bottleneck
• Two cases: $\mathrm{DOP} \ge n$ and $\mathrm{DOP} < n$
Fixed Load Speedup Factor
• Case 1: $\mathrm{DOP} \ge n$

$$t_i(n) = \frac{W_i}{i\Delta} \left\lceil \frac{i}{n} \right\rceil$$

• Case 2: $\mathrm{DOP} < n$

$$t_i(n) = t_i(\infty) = \frac{W_i}{i\Delta}$$

$$T(n) = \sum_{i=1}^{m} \frac{W_i}{i\Delta} \left\lceil \frac{i}{n} \right\rceil$$

$$S_n = \frac{T(1)}{T(n)} = \frac{\sum_{i=1}^{m} W_i}{\sum_{i=1}^{m} \frac{W_i}{i} \left\lceil \frac{i}{n} \right\rceil}$$
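A sketch of the fixed-load speedup (the workload profile below is made up; $\Delta$ cancels in the ratio, so it is omitted):

```python
from math import ceil

def fixed_load_speedup(W, n):
    """S_n = (sum_i W_i) / (sum_i (W_i / i) * ceil(i / n)).
    W maps DOP value i -> work W_i; n is the number of processors."""
    T1 = sum(W.values())
    Tn = sum(W_i / i * ceil(i / n) for i, W_i in W.items())
    return T1 / Tn

W = {1: 2.0, 4: 12.0, 8: 8.0}  # hypothetical workload profile
S1 = fixed_load_speedup(W, 1)  # = 1: one processor gives no speedup
S8 = fixed_load_speedup(W, 8)  # n >= max DOP: reduces to the asymptotic speedup
```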
Gustafson’s Law
• With Amdahl’s Law, the workload cannot scale to match the available computing power as n increases
• Gustafson’s Law fixes the time, allowing the problem size to increase with higher n
• Not saving time, but increasing accuracy
Fixed-time Speedup
• As the machine size increases, the workload increases, giving a new parallelism profile
• In general, $W_i' > W_i$ for $2 \le i \le m'$ and $W_1' = W_1$
• Assume $T(1) = T'(n)$
Gustafson’s Scaled Speedup
$$\sum_{i=1}^{m} W_i = \sum_{i=1}^{m'} \frac{W_i'}{i} \left\lceil \frac{i}{n} \right\rceil + Q(n)$$

With only sequential and fully parallel modes, $W_1' = W_1$, $W_n' = n W_n$, and $Q(n) = 0$:

$$S_n' = \frac{\sum_{i=1}^{m'} W_i'}{\sum_{i=1}^{m} W_i} = \frac{W_1 + n W_n}{W_1 + W_n}$$
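The two-mode form above is a one-liner to sketch (the workload split is made up for illustration):

```python
def gustafson_scaled_speedup(W1, Wn, n):
    """S'_n = (W1 + n*Wn) / (W1 + Wn): the parallel part of the workload
    is scaled by n while the execution time is held fixed."""
    return (W1 + n * Wn) / (W1 + Wn)

# Hypothetical split: 30% sequential work, 70% parallel work, 16 processors
S = gustafson_scaled_speedup(0.3, 0.7, 16)  # (0.3 + 11.2) / 1.0 = 11.5
```

Unlike Amdahl's fixed-load case, this grows without bound as $n$ increases, which reflects the "increasing accuracy, not saving time" view of scaled problems.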
Memory Bounded Speedup Model
• Idea is to solve the largest problem, limited by memory space
• Results in a scaled workload and higher accuracy
• Each node can handle only a small subproblem for distributed memory
• Using a large # of nodes collectively increases the memory capacity proportionally
Fixed-Memory Speedup
• Let M be the memory requirement and W the computational workload: W = g(M)
• Scaled workload: $W_n^* = g^*(nM) = G(n)\,g(M) = G(n)\,W_n$

$$S_n^* = \frac{W_1 + G(n) W_n}{W_1 + G(n) W_n / n}$$
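A sketch of the memory-bounded model, taking $G(n)$ as a supplied value (workload split is made up); the comments note how it subsumes the other two laws:

```python
def memory_bounded_speedup(W1, Wn, G_n, n):
    """S*_n = (W1 + G(n)*Wn) / (W1 + G(n)*Wn / n), for a given value G_n = G(n)."""
    return (W1 + G_n * Wn) / (W1 + G_n * Wn / n)

n = 16
# G(n) = 1 recovers the fixed-load (Amdahl) case;
# G(n) = n recovers Gustafson's fixed-time case.
amdahl_case = memory_bounded_speedup(0.3, 0.7, 1, n)
gustafson_case = memory_bounded_speedup(0.3, 0.7, n, n)
```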
Relating Speedup Models
• G(n) reflects the increase in workload as memory increases n times
• G(n) = 1 : Fixed problem size (Amdahl)
• G(n) = n : Workload increases n times when memory increased n times (Gustafson)
• G(n) > n : workload increases faster than the memory requirement
Scalability Metrics
• Machine size (n) : # of processors
• Clock rate (f) : determines the basic m/c cycle
• Problem size (s) : amount of computational workload; directly proportional to T(s,1)
• CPU time (T(s,n)) : actual CPU time for execution
• I/O demand (d) : demand in moving the program, data, and results for a given run
Scalability Metrics
• Memory capacity (m) : max # of memory words demanded
• Communication overhead (h(s,n)) : amount of time for interprocessor communication, synchronization, etc.
• Computer cost (c) : total cost of h/w and s/w resources required
• Programming overhead (p) : development overhead associated with an application program
Speedup and Efficiency
• The problem size is the independent parameter
$$S(s,n) = \frac{T(s,1)}{T(s,n) + h(s,n)}$$

$$E(s,n) = \frac{S(s,n)}{n}$$
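A minimal sketch of these two metrics; the timing numbers below are hypothetical measurements, not from the slides:

```python
def speedup(T_s1, T_sn, h_sn):
    """S(s,n) = T(s,1) / (T(s,n) + h(s,n)): overhead counts against the speedup."""
    return T_s1 / (T_sn + h_sn)

def efficiency(T_s1, T_sn, h_sn, n):
    """E(s,n) = S(s,n) / n."""
    return speedup(T_s1, T_sn, h_sn) / n

# Hypothetical run: 100 s serial, 15 s on 8 processors, 5 s communication overhead
S = speedup(100.0, 15.0, 5.0)        # 100 / 20 = 5.0
E = efficiency(100.0, 15.0, 5.0, 8)  # 5 / 8 = 0.625
```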
Scalable Systems
• Ideally, if E(s,n)=1 for all algorithms and any s and n, system is scalable
• Practically, consider the scalability of a m/c
$$\Phi(s,n) = \frac{S(s,n)}{S_I(s,n)} = \frac{T_I(s,n)}{T(s,n)}$$

where $S_I$ and $T_I$ are the speedup and execution time on an ideal machine.