Parallelism Processing more than one instruction at a time. Pipelining Speed-up due to pipelining...

Parallelism

Processing more than one instruction at a time. Pipelining Speed-up due to pipelining Briefly consider multi-processor systems

Last time

Looked at how instructions are executed in a computer.

Instruction parallelism

One method of improving machine performance is to increase clock speed, but this is a limited, ‘brute force’ approach.

So parallelism (which is doing more than one thing at a time) has been used as an approach to get more performance for a given clock speed.

Parallelism comes in two general forms: instruction-level parallelism and processor-level parallelism.

Instruction-level parallelism exploits parallelism within individual instructions to get more instructions/sec out of a machine, e.g. pipelining.

Pipelining is a key technique used to make faster CPUs.

Instructions are executed in stages, with each stage implemented using separate hardware.

The stages are connected together to form an instruction pipeline, allowing more than one instruction to be processed at the same time.

Tanenbaum, 2006:

The instruction fetch stage (IF): fetches instructions from memory or an instruction cache. It requires use of the fetch unit, which controls the PC and the buses, to gain access to memory.

The instruction decode stage (ID): uses the control unit (CU) to decode instructions and identify any source operands. Immediate operands and operands stored in the register file are moved into temporary ALU registers during this stage.

The execution stage (EX): the Arithmetic Logic Unit (ALU) performs operations on the operands stored in its input registers, and stores the result in temporary ALU output registers.

The write-back stage (WB): the contents of the ALU's temporary output registers are copied to the register file.

Fetch-execute

cycle  fetch          decode         execute        write-back
1      instruction 1  -              -              -
2      -              instruction 1  -              -
3      -              -              instruction 1  -
4      -              -              -              instruction 1
5      instruction 2  -              -              -
6      -              instruction 2  -              -
7      -              -              instruction 2  -
8      -              -              -              instruction 2

Pipelining

cycle  fetch          decode         execute        write-back
1      instruction 1  -              -              -
2      instruction 2  instruction 1  -              -
3      instruction 3  instruction 2  instruction 1  -
4      instruction 4  instruction 3  instruction 2  instruction 1
5      instruction 5  instruction 4  instruction 3  instruction 2
6      instruction 6  instruction 5  instruction 4  instruction 3
7      instruction 7  instruction 6  instruction 5  instruction 4
8      instruction 8  instruction 7  instruction 6  instruction 5
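The pipelined schedule can be reproduced with a short sketch. It assumes an ideal pipeline with no stalls or branches, so instruction i simply sits in stage s (counting from 0) during cycle i + s. The function names `pipeline_schedule` and `print_schedule` are illustrative only and do not come from the slides.

```python
STAGES = ["fetch", "decode", "execute", "write-back"]

def pipeline_schedule(n_instructions, n_cycles, n_stages=len(STAGES)):
    """Return one row per clock cycle; each row lists the instruction
    number occupying each stage, or None if the stage is empty."""
    rows = []
    for cycle in range(1, n_cycles + 1):
        row = []
        for stage in range(n_stages):
            # Assumption: ideal pipeline, so an instruction advances
            # exactly one stage per cycle and never stalls.
            instr = cycle - stage
            row.append(instr if 1 <= instr <= n_instructions else None)
        rows.append(row)
    return rows

def print_schedule(rows):
    """Print the schedule as a cycle-by-stage table."""
    print("cycle  " + "  ".join(f"{name:<13}" for name in STAGES))
    for cycle, row in enumerate(rows, start=1):
        cells = [f"instruction {i}" if i is not None else "-" for i in row]
        print(f"{cycle:<5}  " + "  ".join(f"{c:<13}" for c in cells))

print_schedule(pipeline_schedule(n_instructions=8, n_cycles=8))
```

From cycle 4 onwards every stage is busy and one instruction completes per cycle.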

An analogy of pipelining

Imagine a warehouse packing plant. Worker 1 puts a box on the conveyor belt, worker 2 puts the product into the box and seals it, worker 3 puts an address label on the box, and worker 4 picks up the box to be delivered. However, a worker who has finished their task does not wait for the final worker to finish the whole procedure; they start on the next job straight away (for example, worker 1 puts another empty box on the conveyor belt). Pipelining instructions works like this: each instruction passes through several processing stages before it is completed, but the stages work in parallel.

Speed up

If a k-stage pipeline executes n instructions using a clock with a cycle time t, without overlapping instructions the total time to execute the instructions will be

Ts = nkt

So with 4 stages (k=4), 4 instructions (n=4) and t=1s, Ts = 4 x 4 x 1s = 16s.

If instructions are executed in parallel (overlapped in the pipeline), the total time is

Tp = kt + (n-1)t

where kt is the time to fill the pipeline up to the point where the first instruction completes, and (n-1)t is the time taken for the remaining (n-1) instructions at a rate of one per clock cycle.
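As a quick check of the two timing formulas, Ts = nkt and Tp = kt + (n-1)t, the sketch below evaluates both for the k=4, n=4, t=1s example. The function names are illustrative, and the model assumes an ideal pipeline with no stalls.

```python
def sequential_time(n, k, t):
    """Ts = n*k*t: every instruction passes through all k stages
    before the next one starts."""
    return n * k * t

def pipelined_time(n, k, t):
    """Tp = k*t + (n-1)*t: k cycles until the first instruction
    completes, then one more instruction completes per cycle."""
    return k * t + (n - 1) * t

ts = sequential_time(n=4, k=4, t=1)
tp = pipelined_time(n=4, k=4, t=1)
print(f"Ts = {ts}s, Tp = {tp}s")
```

For this example Ts is 16s and Tp is 7s, which matches the pipelining table: instruction 4 reaches write-back in cycle 7.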

Speedup factor, S=Ts/Tp=(nk)/(k+n-1)

If n=50 (50 instructions in a sequence)

S= (50*4)/(4+50-1)=3.77

If n=100 S=3.88
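The trend in these figures can be explored with a small sketch of S = (nk)/(k+n-1). The function name `speedup` is illustrative. As n grows, the n-1 term dominates the denominator and S approaches k, so a 4-stage pipeline can never give more than a 4x speed-up under this ideal model.

```python
def speedup(n, k):
    """Ideal pipeline speed-up S = Ts/Tp = (n*k) / (k + n - 1).
    The cycle time t cancels, so it does not appear."""
    return (n * k) / (k + n - 1)

# k = 4 stages, as in the examples in the slides
for n in (4, 50, 100, 10_000):
    print(f"n={n:>6}  S={speedup(n, 4):.2f}")
```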

Problems with Pipelining

Unfortunately, branching instructions alter the program flow. A pipeline can become filled with instructions that are no longer needed; these must be flushed from the pipe so that it can be refilled with the new stream of instructions. This ‘wastes’ clock cycles.

Resource hazards: these occur if two stages need to access the same resource (e.g. the ALU or the same register). One solution is to duplicate the hardware, but delays can still occur depending on the order of the instructions.
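The cost of flushing after a branch can be estimated with a rough sketch. It rests on a simplifying assumption that is not in the slides: each taken branch is only resolved in the final stage, so the k-1 instructions fetched behind it are discarded and k-1 cycles are lost. Real processors reduce this cost with techniques such as branch prediction, so treat the numbers as an illustration only.

```python
def pipelined_cycles(n, k, taken_branches=0):
    """Cycles to run n instructions on a k-stage pipeline.
    Assumption: each taken branch flushes k-1 partly processed
    instructions, costing k-1 wasted cycles."""
    fill_and_run = k + (n - 1)
    flush_penalty = taken_branches * (k - 1)
    return fill_and_run + flush_penalty

def effective_speedup(n, k, taken_branches=0):
    """Speed-up over non-pipelined execution (n*k cycles)."""
    return (n * k) / pipelined_cycles(n, k, taken_branches)

print(f"no branches:       S={effective_speedup(100, 4):.2f}")
print(f"10 taken branches: S={effective_speedup(100, 4, 10):.2f}")
```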

Pipelines are included in some CISC processors (e.g. MC68040 and Intel 80486), but software written for these did not in general make effective use of pipelining, and therefore of the speed-up offered by the approach.

In superpipelining, instructions are broken down into even finer steps by lengthening the pipeline (adding stages).

An alternative: if one pipeline is good, why not increase the number of pipelines, allowing multiple instructions to be issued in the same clock cycle? Complex rules are used to determine whether a pair of instructions can be executed in parallel, in a similar way to the scheduling discussed earlier.
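To give a flavour of such rules, here is a deliberately simplified sketch that checks only two register conflicts between a pair of instructions. The `Instr` type and the `can_issue_together` function are hypothetical, and the rules are not the actual pairing rules of any real processor.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    op: str                 # mnemonic, for display only
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read

def can_issue_together(first, second):
    """Simplified pairing check (an assumption, not a real CPU's rules):
    the second instruction must not read the first one's result
    (read-after-write) or write the same register (write-after-write)."""
    read_after_write = first.dest in second.srcs
    write_after_write = first.dest == second.dest
    return not (read_after_write or write_after_write)

add = Instr("add", "r1", ("r2", "r3"))
sub = Instr("sub", "r4", ("r1", "r5"))   # needs r1 produced by add
mul = Instr("mul", "r6", ("r7", "r8"))   # independent of add

print(can_issue_together(add, sub))
print(can_issue_together(add, mul))
```

The add/sub pair cannot be issued together because sub depends on the result of add, whereas add and mul are independent and can go down separate pipelines.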

Superscalar processors such as the Intel Pentium use multiple pipelines, and use scheduling of variable-length blocks of instructions.

Superscalar Architecture

Modified version of figure 2-6, Tanenbaum (2006), pg 65

In one superscalar approach there is a single pipeline but with multiple functional units in the execution stage (e.g. multiple ALUs, a floating-point unit). The operand fetch stage (S3) must be able to issue instructions considerably faster than the functional units can execute them; otherwise no more than one functional unit would be busy at once and the extra units would bring no benefit.

Functional units are implemented in hardware. The Pentium II had a similar structure to that shown previously.

Other forms of Parallelism

Multiprocessors: these consist of a large number of identical processors that operate in parallel, usually sharing a common memory. Since each CPU can read and write to memory, software is usually used to co-ordinate the CPUs to avoid clashes. One design is to give each CPU local memory of its own, as well as the main memory. This extra memory can be used for program code and data not used by other CPUs, reducing the amount of bus traffic to main memory.
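The need for software co-ordination can be illustrated with a small sketch. Python threads stand in for CPUs and a dictionary stands in for shared memory; this is an analogy chosen for illustration, not how any particular multiprocessor is programmed. The lock ensures that only one worker at a time performs the read-modify-write on the shared counter.

```python
import threading

def run_workers(n_workers=4, increments=10_000):
    """Each worker (standing in for a CPU) updates one shared counter.
    The lock co-ordinates access so that updates do not clash."""
    shared = {"counter": 0}      # stands in for shared main memory
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            with lock:           # one worker at a time in this section
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared["counter"]

print(run_workers())
```

Without the lock, two workers could read the same old value and one update would be lost.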

Multi-computers: CPUs communicate over networks. Each CPU has private memory rather than shared memory, although the system can give the illusion of shared memory.

Multiprocessor systems are easier to program, but multi-computer systems are easier to build.

‘Soupercomputer’: WWW.sciam.com/2001/0801issue/0801hargrove.html

Reference and further reading

Tanenbaum, A. S. (2006) Structured Computer Organization, pg 65. ISBN 0-13-148521-0

‘Soupercomputer’: WWW.sciam.com/2001/0801issue/0801hargrove.html