ECE669 L23: Parallel Compilation April 29, 2004
ECE 669
Parallel Computer Architecture
Lecture 23
Parallel Compilation
Parallel Compilation
° Two approaches to compilation
• Parallelize a program manually
- Sequential code converted to parallel code
° Develop a parallel compiler
• Intermediate form
• Partitioning
- Block based or loop based
• Placement
• Routing
Compilation technologies for parallel machines
Assumptions:
Input: Parallel program
Output: “Coarse” parallel program & directives for:
• Which threads run in 1 task
• Where tasks are placed
• Where data is placed
• Which data elements go in each data chunk
Limitation: No special optimizations for synchronization -- synchronization memory references are treated like any other communication.
Toy example
° Loop parallelization
Example
° Matrix multiply
° Typically,
° Looking to find parallelism...
FORALL i
  FORALL j
    FOR k
      C[i,j] = C[i,j] + A[i,k] * B[k,j]
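The FORALL loops assert that the (i, j) iterations are independent and may run in any order, while the k loop carries an accumulation dependence and stays sequential. A minimal Python sketch of that claim (array sizes and contents are illustrative), computing the cells in a randomized order to stand in for an arbitrary parallel schedule:

```python
import random

N = 4
A = [[i + j for j in range(N)] for i in range(N)]
B = [[i * j + 1 for j in range(N)] for i in range(N)]

def matmul_cell(i, j):
    # The k loop carries a dependence (accumulation), so it stays sequential.
    return sum(A[i][k] * B[k][j] for k in range(N))

# FORALL i, FORALL j: cells are independent, so any execution order
# produces the same C.
cells = [(i, j) for i in range(N) for j in range(N)]
random.shuffle(cells)          # simulate an arbitrary parallel order
C = [[0] * N for _ in range(N)]
for i, j in cells:
    C[i][j] = matmul_cell(i, j)
```

Because no (i, j) cell reads another cell of C, the shuffled order always matches the sequential result.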
Choosing a program representation...
° Dataflow graph
• No notion of storage problem
• Data values flow along arcs
• Nodes represent operations
[Figure: dataflow graph - “+” nodes computing c0 = a0 + b0, c2 = a2 + b2, ..., c5 = a5 + b5, with data values flowing along the arcs]
Compiler representation
° For certain kinds of structured programs
° Unstructured programs
[Figure: structured case - a loop nest (FORALL i, FORALL j) referencing array A[i, j] through index expressions; unstructured case - Task A and Task B connected to data object X, with a communication weight on each edge]
Process reference graph
• Nodes represent threads (processes) - the computation
• Edges represent communication (memory references)
• Can attach weights on edges to represent volume of communication
• Extension: precedence-relation edges can be added too
• Can also try to represent multiple loop-produced threads as one node
FORALL i FROM 0 TO 5
  C[i] = A[i] + B[i]

[Figure: threads P0, P1, P2, ..., P5 all referencing a single monolithic memory, with edge weights such as 2 and 1 - not very useful]
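For the loop above, each iteration i becomes a thread Pi whose memory references are edges to the single memory node. A sketch of that graph as a weighted edge dictionary (node names are illustrative):

```python
# Process reference graph for: FORALL i FROM 0 TO 5: C[i] = A[i] + B[i]
# Nodes: threads P0..P5 plus one monolithic memory node.
# Edge weight = memory references per thread (reads A[i], B[i]; writes C[i]).
edges = {}
for i in range(6):
    thread = f"P{i}"
    edges[(thread, "MEM")] = 3

total_traffic = sum(edges.values())
```

Every thread talks to the same memory node, so the graph exposes no locality at all - the slide's point about why this representation is not very useful.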
Process communication graph
• Allocate data items to nodes as well
• Nodes: Threads, data objects
• Edges: Communication
• Key: Works for shared-memory, object-oriented, and dataflow systems - and message passing!
[Figure: PCG for the loop above - thread P0 connected to its data objects A0, B0, C0 and data chunks d0, d1 with unit edge weights; likewise P1, P2, ..., P5 with their own data]
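The PCG refines the process reference graph by making each data object its own node, so locality becomes visible. A sketch for the same FORALL loop, C[i] = A[i] + B[i] (node names are illustrative):

```python
# Process communication graph for: FORALL i FROM 0 TO 5: C[i] = A[i] + B[i]
# Nodes: threads P0..P5 and data objects A[i], B[i], C[i].
# Each thread touches each of its three data objects once (unit weight).
pcg = {}
for i in range(6):
    p = f"P{i}"
    for d in (f"A{i}", f"B{i}", f"C{i}"):
        pcg[(p, d)] = 1

# Locality is now explicit: P0 communicates only with A0, B0, C0.
neighbors_of_P0 = sorted(d for (p, d) in pcg if p == "P0")
```

A partitioner can exploit this structure: placing P0 with A0, B0, C0 internalizes all of P0's communication, which the monolithic-memory graph could never show.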
PCG for Jacobi relaxation
[Figure: fine PCG for Jacobi - computation nodes and data nodes joined by edges of weight 4; the coarse PCG groups them after partitioning. Legend: computation nodes, data nodes]
Compilation with PCGs
Fine process communication graph
Partitioning
Coarse process communication graph
Compilation with PCGs
Fine process communication graph
  - Partitioning ->
Coarse process communication graph
  - Placement ->
... other phases: scheduling. Dynamic?
MP: Partitioning -> Coarse process communication graph
Parallel Compilation
° Consider loop partitioning
° Create small local computations
° Consider static routing between tiles
° Short neighbor-to-neighbor communication
° Compiler orchestrated
Flow Compilation
° Modulo unrolling
° Partitioning
° Scheduling
Modulo Unrolling – Smart Memory
° Loop unrolling relies on dependencies
° Allow maximum parallelism
° Minimize communication
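The idea behind modulo unrolling can be sketched concretely: when an array is interleaved across N memory banks (element i in bank i mod N), unrolling the loop by a factor of N makes the bank of every reference a compile-time constant, so no dynamic disambiguation is needed. A minimal sketch (the bank count is an illustrative assumption):

```python
# Modulo unrolling sketch: with A[i] stored in bank (i % NBANKS), unrolling
# the loop body NBANKS times makes each reference hit a statically known bank.
NBANKS = 4

def bank(i):
    return i % NBANKS

def banks_touched_by_unrolled_body(start):
    # One unrolled body handles iterations start .. start + NBANKS - 1.
    return [bank(start + u) for u in range(NBANKS)]

# When `start` is a multiple of NBANKS, slot u always hits bank u, so the
# compiler can wire each unrolled reference to a fixed bank at compile time.
```

This maximizes parallelism across banks while keeping the routing static, matching the slide's goals of maximum parallelism with minimal communication.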
Array Partitioning – Smart Memory
° Assign each array line to a separate memory
° Consider exchange of data
° Approach is scalable
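One common realization of this is block distribution: assign contiguous blocks of rows to each memory, so neighboring rows usually share a memory and only block-boundary rows exchange data. A sketch with illustrative sizes:

```python
# Block-partitioning sketch: NROWS rows assigned to NMEMS memories in
# contiguous blocks, so only rows at block boundaries exchange data.
NROWS, NMEMS = 16, 4
rows_per_mem = NROWS // NMEMS

def memory_of_row(r):
    return r // rows_per_mem

# Boundary rows: those whose neighbor (r + 1) lives in a different memory.
boundary = [r for r in range(NROWS - 1)
            if memory_of_row(r) != memory_of_row(r + 1)]
```

The scheme is scalable in the slide's sense: the number of boundary rows grows with the number of memories, not with the array size, so exchanged data stays a small fraction of the whole.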
Communication Scheduling – Smart Memory
° Determine where data should be sent
° Determine when data should be sent
Speedup for Jacobi – Smart Memory
° Virtual wires indicate scheduled paths
° Hard wires are dedicated paths
° Hard wires require more wiring resources
° RAW is a parallel processor from MIT
Partitioning
• Use heuristic for unstructured programs
• For structured programs...
...start from:
• Arrays - a list of arrays: A, B, C
• Loop nests - a list of loops: L0, L1, L2
Notion of Iteration space, data space
E.g.
FORALL i
  FORALL j
    A[i,j] = A[i+2,j+1] + A[i,j+1]
[Figure: data space of matrix A (axes i, j) beside the iteration space; each point of the iteration space represents a “thread” with a given value of (i, j)]
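The relationship between the two spaces can be made concrete: a tile of the iteration space touches a data-space footprint that is the tile shifted by each reference offset, unioned. A sketch using a reconstructed reading of the slide's update, A[i,j] = A[i+2,j+1] + A[i,j+1] (the exact offsets are an assumption):

```python
# Footprint sketch: for a rectangular iteration-space tile, compute the set
# of data-space elements it references, assuming the statement
#   A[i,j] = A[i+2,j+1] + A[i,j+1]   (reconstructed offsets)
def footprint(i0, i1, j0, j1):
    refs = set()
    for i in range(i0, i1):
        for j in range(j0, j1):
            refs.add((i, j))          # write A[i, j]
            refs.add((i + 2, j + 1))  # read
            refs.add((i, j + 1))      # read
    return refs

# A 2x2 iteration tile touches more than 4 data elements.
tile = footprint(0, 2, 0, 2)
```

A partitioner that tiles the iteration space wants tiles whose footprints overlap as little as possible across processors, which is exactly the "how to tile" question on the next slide.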
° Partitioning: How to “tile” the iteration and data spaces for MIMD machines?
[Figure: same iteration and data spaces as above, now annotated - the highlighted iteration-space thread (i, j) affects the marked computation in the data space]
Loop partitioning for caches
° Machine model
• Assume all data is in memory
• Minimize first-time cache fetches
• Ignore secondary effects such as invalidations due to writes
[Figure: machine model - a shared memory holding array A, connected through a network to per-processor caches (one cache per processor P)]
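Under this model, minimizing first-time fetches means minimizing the number of distinct cache lines a processor's tile touches. A sketch comparing a row strip against a column strip for a row-major array (line size and array width are illustrative):

```python
# First-fetch sketch: count distinct cache lines touched when one processor
# executes a tile of iterations reading A[i][j] from a row-major array.
LINE = 4          # elements per cache line
WIDTH = 64        # row length of the array

def lines_touched(iters):
    # iters: iterable of (i, j) iterations, each reading A[i][j]
    return len({(i * WIDTH + j) // LINE for i, j in iters})

# 16 iterations laid out as a row strip vs a column strip:
row_strip = [(0, j) for j in range(16)]
col_strip = [(i, 0) for i in range(16)]
```

The row strip reuses each fetched line for LINE consecutive elements, while the column strip fetches a fresh line per iteration, so loop partitioning along rows costs far fewer first-time fetches here.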
Summary
° Parallel compilation often targets block-based and loop-based parallelism
° Compilation steps address identification of parallelism and its representation
• Graphs are often useful for representing program dependencies
° For static scheduling, both computation and communication can be represented
° Data placement is important for computation efficiency