
CS-421 Parallel Processing BE (CIS) Batch 2005-06 Discussion-05


Granularity

Definition # 1

The level at which work is done in parallel, or the task size, in a parallel processing environment.

    Examples

Job / Program Level - The highest level of parallelism, conducted among programs through multiprogramming/timesharing and multiprocessing. (Coarsest granularity)

Task / Procedure Level - Conducted among tasks of a common program (problem), e.g. multithreading; see the sketch after this list.

Interinstruction Level - Conducted among instructions through superscalar techniques.

Intrainstruction Level - Conducted among different phases of an instruction through pipelining. (Finest granularity)
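The task/procedure level can be made concrete with a small multithreading sketch. This is a minimal Python illustration; the task names and workloads are made up, and in CPython the GIL prevents these threads from computing truly in parallel, so the sketch shows the structure of task-level parallelism rather than a speedup.

```python
import threading

def sum_range(name, lo, hi, results):
    # One task of a common program: sum an independent slice of the work.
    results[name] = sum(range(lo, hi))

results = {}
tasks = [
    threading.Thread(target=sum_range, args=("low_half", 0, 500_000, results)),
    threading.Thread(target=sum_range, args=("high_half", 500_000, 1_000_000, results)),
]
for t in tasks:
    t.start()   # both tasks of the same program proceed concurrently
for t in tasks:
    t.join()    # synchronize before combining partial results

print(sum(results.values()))  # equals the sequential sum(range(1_000_000))
```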

    Definition # 2

Granularity = (Time spent on Computation) / (Time spent on Communication)

Fine-Grained Applications
- Low granularity, i.e. more communication and less computation
- Less opportunity for performance enhancement
- Facilitates load balancing

Coarse-Grained Applications
- High granularity, i.e. a large number of instructions between synchronization and communication points
- More opportunity for performance enhancement
- Harder to balance load
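Definition # 2 can be illustrated with a small computation. The following is a minimal sketch; the function name and the timing numbers are assumed for illustration, not taken from the notes.

```python
def granularity(computation_time, communication_time):
    # Definition # 2: time spent on computation / time spent on communication.
    return computation_time / communication_time

# Fine-grained: little computation between communication points.
print(granularity(computation_time=2.0, communication_time=8.0))   # 0.25

# Coarse-grained: many instructions between synchronization/communication points.
print(granularity(computation_time=80.0, communication_time=2.0))  # 40.0
```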

    *****

Multiple Issue Architectures

These architectures are able to execute multiple instructions in one clock cycle (i.e. performance beyond just pipelining). An N-way or N-issue architecture can achieve an ideal CPI of 1/N.
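As a quick worked check of the 1/N claim, the loop below prints the ideal CPI for a few example issue widths (the chosen widths are arbitrary):

```python
for n in (1, 2, 4, 8):
    # Ideal case: every issue slot is filled every cycle, so N instructions
    # complete per clock and CPI falls to 1/N.
    print(f"{n}-issue: ideal CPI = {1 / n:.3f} ({n} instructions per cycle)")
```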

There are two major methods of implementing a multiple issue processor:

- Static multiple issue
- Dynamic multiple issue

Static Multiple Issue Architecture

The scheduling of instructions into issue slots is done by the compiler. We can think of the instructions issued in a given clock cycle as forming an instruction packet.


It is useful to think of the issue packet as a single instruction allowing several operations in predefined fields. This was the reason behind the original name for this approach: Very Long Instruction Word (VLIW) architecture.

Intel has its own name for this technique, i.e. EPIC (Explicitly Parallel Instruction Computing), used in the Itanium series.

    If it is not possible to find operations that can be done at the same time for all functional units, then the instruction may contain a NOP in the group of fields for unneeded units.
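The NOP-filling idea can be sketched as follows. This is a hypothetical illustration: the slot names (alu, mem, branch) and the operation strings are assumptions for the example, not the fields of any real VLIW machine.

```python
SLOTS = ("alu", "mem", "branch")  # one predefined field per functional unit

def build_packet(scheduled_ops):
    # Place each schedulable operation in its unit's field; any functional
    # unit with no usable operation gets an explicit NOP.
    return tuple(scheduled_ops.get(slot, "NOP") for slot in SLOTS)

# Cycle with work for the ALU and memory unit but nothing for the branch unit:
print(build_packet({"alu": "add r1, r2, r3", "mem": "ld r4, 0(r5)"}))
# -> ('add r1, r2, r3', 'ld r4, 0(r5)', 'NOP')

# Cycle where only a branch could be scheduled:
print(build_packet({"branch": "beq r1, r0, loop"}))
# -> ('NOP', 'NOP', 'beq r1, r0, loop')
```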

Because most instruction words contain some NOPs, VLIW programs tend to be very long. The VLIW architecture requires the compiler to be very knowledgeable of implementation details of the target computer, and may require a program to be recompiled if moved to a different implementation of the same architecture.

Dynamic Multiple Issue Architecture

Such processors are also known as superscalars. The processor (rather than the compiler) decides whether zero, one, or more instructions can be issued in a given clock cycle.

Support from compilers is even more crucial for the performance of superscalars because a superscalar processor can only look at a small window of the program. A good compiler schedules code in such a way that it facilitates the scheduling decisions made by the processor.
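A toy model of dynamic issue may help. The sketch below is an assumption-heavy simplification (an in-order, two-issue processor that stops issuing at the first read-after-write hazard inside a small window); it is not how any particular superscalar works, but it shows the processor deciding, cycle by cycle, how many instructions to issue.

```python
def issue_cycles(program, window=4, width=2):
    # program: list of (dest_reg, (src_regs, ...)) in program order.
    pending = list(program)
    cycles = []
    while pending:
        issued, writes = [], set()
        for dest, srcs in pending[:window]:   # the small window the hardware sees
            if len(issued) == width:
                break                         # all issue slots used this cycle
            if any(s in writes for s in srcs):
                break                         # RAW hazard: stop in-order issue
            issued.append(dest)
            writes.add(dest)
        cycles.append(issued)
        pending = pending[len(issued):]
    return cycles

# r3 = r1+r2 ; r4 = r3+r1 (needs r3) ; r6 = r5+r5 ; r7 = r6+r3
prog = [("r3", ("r1", "r2")), ("r4", ("r3", "r1")),
        ("r6", ("r5", "r5")), ("r7", ("r6", "r3"))]
for i, grp in enumerate(issue_cycles(prog), 1):
    print(f"cycle {i}: issues {grp}")
# cycle 1: issues ['r3']        (r4 must wait for r3)
# cycle 2: issues ['r4', 'r6']
# cycle 3: issues ['r7']
```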

    ******