EECS 570: Fall 2003 -- rev1 1
EECS 570
• Notes on Chapter 2 – Parallel Programs
Terminology
• Task
– Programmer-defined sequential piece of work
– Concurrency exists only across tasks
– Amount of work per task may be small or large
• Process (thread)
– Abstract entity that performs tasks
– Equivalent to the OS concept
– Must communicate and synchronize with other processes
– Executes on a processor (typically a one-to-one mapping)
Steps in Creating a Parallel Program
• Decomposition
• Assignment
• Orchestration
• Mapping
Decomposition
• Break up the computation into tasks to be divided among processes
– can be static, quasi-static, or dynamic
– i.e., identify concurrency and decide the level at which to exploit it
• Goal: enough tasks to keep processes busy... but not too many
Amdahl's Law
• Assume a fraction s of sequential execution is inherently serial
– the remainder (1 - s) can be perfectly parallelized
• Speedup with p processors:

Speedup(p) = 1 / (s + (1 - s)/p)

• Limit as p → ∞: Speedup → 1/s
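Amdahl's Law is easy to check numerically; a minimal Python sketch (the function name is illustrative):

```python
def amdahl_speedup(s, p):
    """Amdahl's Law: speedup with serial fraction s on p processors."""
    return 1.0 / (s + (1.0 - s) / p)

# With a 10% serial fraction, even 32 processors fall well short of 32x,
# and no processor count can exceed the 1/s = 10 limit:
print(amdahl_speedup(0.10, 32))
print(amdahl_speedup(0.10, 10**9))   # approaches 10
```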
Aside on Cost-Effective Computing
• Isn't Speedup(P) < P inefficient?
• If only throughput matters, why not use P separate computers instead?
• But much of a computer's cost is NOT in the processor
[Wood & Hill, IEEE Computer 2/95]
• Let Costup(P) = Cost(P) / Cost(1)
• Parallel computing is cost-effective when: Speedup(P) > Costup(P)
• E.g., for an SGI PowerChallenge with 500 MB of memory: Costup(32) = 8.6
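The criterion above reduces to a single comparison; a toy sketch using the slide's PowerChallenge number (the function name is illustrative):

```python
def is_cost_effective(speedup_p, costup_p):
    """Parallel computing pays off when Speedup(P) > Costup(P)."""
    return speedup_p > costup_p

# Costup(32) = 8.6 from the slide: even a modest speedup of 10
# on 32 processors is cost-effective, despite being far below 32.
print(is_cost_effective(10.0, 8.6))   # True
print(is_cost_effective(8.0, 8.6))    # False
```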
Assignment
• Assign tasks to processes
– again, can be static, dynamic, or in between
• Goals:
– balance the workload
– reduce communication
– minimize management overhead
• Decomposition + Assignment = Partitioning
• Mostly independent of architecture/programming model
Orchestration
• How do we achieve task communication, synchronization, and assignment given the programming model?
– data structures (naming)
– task scheduling
– communication: messages, shared-data accesses
– synchronization: locks, semaphores, barriers, etc.
• Goals:
– reduce the cost of communication and synchronization
– preserve data locality (reduce communication, enhance caching)
– schedule tasks to satisfy dependencies early
– reduce the overhead of parallelism management
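A minimal sketch of these orchestration mechanisms using Python threads (assuming a shared-memory programming model; all names are illustrative): a lock guards shared data during communication, and a barrier ensures every partial result is in before any thread reads the total.

```python
import threading

N = 4                             # number of worker threads
data = list(range(100))
partial = [0] * N
total = [0]                       # shared accumulator (one-element list)
lock = threading.Lock()           # protects the shared accumulator
barrier = threading.Barrier(N)    # all threads wait here before reading total

def worker(tid):
    # local computation: cyclic assignment of elements to threads
    local = sum(data[tid::N])
    # communication through shared data, guarded by the lock
    with lock:
        total[0] += local
    # barrier synchronization: every partial sum has been added past this point
    barrier.wait()
    partial[tid] = total[0]       # now safe: all threads see the full total

threads = [threading.Thread(target=worker, args=(t,)) for t in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total[0])                   # 4950 = sum of 0..99
```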
Mapping
• Assign processes to processors
– usually up to the OS, perhaps with user hints/preferences
– usually assumed one-to-one and static
• Terminology:
– space sharing
– gang scheduling
– processor affinity
Parallelizing Computation vs. Data
The view above is centered on computation
• Computation is decomposed and assigned (partitioned)
Partitioning data is often a natural view too
• Computation follows data: owner computes
• Examples: grid computations; data mining; High Performance Fortran (HPF)
But not always sufficient
• Distinction between computation and data is stronger in many applications
– Barnes-Hut, Raytrace
Assignment
• Static assignments (given a decomposition into rows)
– block: row i is assigned to process floor(i / (n/p)), i.e., contiguous blocks of n/p rows
– cyclic: process j is assigned rows j, j+p, j+2p, and so on
• Dynamic
– get a row index, work on the row, get a new row, and so on
• Static assignment reduces concurrency (from n to p)
– block assignment reduces communication by keeping adjacent rows together
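The two static schemes can be sketched as owner functions (a toy illustration for n rows and p processes, assuming p divides n evenly; the function names are illustrative):

```python
def block_owner(i, n, p):
    """Block assignment: process k owns the k-th contiguous chunk of n/p rows."""
    return i // (n // p)

def cyclic_owner(i, p):
    """Cyclic assignment: process j owns rows j, j+p, j+2p, ..."""
    return i % p

n, p = 8, 2
print([block_owner(i, n, p) for i in range(n)])   # [0, 0, 0, 0, 1, 1, 1, 1]
print([cyclic_owner(i, p) for i in range(n)])     # [0, 1, 0, 1, 0, 1, 0, 1]
```

Block assignment keeps adjacent rows on the same process (less boundary communication); cyclic assignment spreads rows evenly, which helps when per-row work varies.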