Parallel Processors Ppt

Transcript of "Parallel Processors" presentation (40 slides)

  • 8/13/2019 Parallel Processrs Ppt

    1/40

PRESENTATION ON

PARALLEL PROCESSORS

2/40

    INTRODUCTION

A parallel processor is a processor that performs concurrent data-processing tasks, which results in shorter execution time.

Parallel processing involves simultaneous computations in the CPU for the purpose of increasing its computational speed. Instead of processing each instruction sequentially as in conventional computers, parallel processing is established by distributing the data among multiple functional units.

For example, while an instruction is being executed in the ALU, the next instruction can be read from memory. The arithmetic, logic, and shift operations can be separated into three units and the operands diverted to each unit under the supervision of a control unit.
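The control-unit dispatch described above can be sketched in Python; the unit names and operations are illustrative, not taken from any real ISA:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative "functional units": each handles one class of operation.
UNITS = {
    "arithmetic": lambda a, b: a + b,
    "logic":      lambda a, b: a & b,
    "shift":      lambda a, b: a << b,
}

def control_unit(operations):
    """Divert each operand pair to its functional unit; units run concurrently."""
    with ThreadPoolExecutor(max_workers=len(UNITS)) as pool:
        futures = [pool.submit(UNITS[op], a, b) for op, a, b in operations]
        return [f.result() for f in futures]

results = control_unit([("arithmetic", 2, 3), ("logic", 6, 3), ("shift", 1, 4)])
print(results)  # [5, 2, 16]
```

In real hardware the units run in separate circuits; the thread pool merely stands in for that concurrency.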

3/40

Processor with multiple functional units

[Figure: processor registers, connected to memory, feed eight parallel functional units — adder-subtractor, integer multiply, floating-point add-subtract, incrementer, logic unit, shift unit, floating-point multiply, floating-point divide.]

4/40

The figure shows one possible way of separating the execution unit into eight functional units.

The operands in the registers are applied to one of the units depending on the operation specified by the instruction.

The adder-subtractor and integer multiplier perform the arithmetic operations on integer numbers.

The floating-point operations are separated into three circuits operating in parallel.

The logic, shift, and increment operations can be performed concurrently on different data.

All units are independent of each other, so one number can be incremented while another number is being shifted.

5/40

ADVANTAGES:

Shorter execution time, hence higher throughput, which is the maximum number of results that can be generated per unit time by a processor.

Parallel processing is much faster than sequential processing when it comes to doing repetitive calculations on vast amounts of data. This is because a parallel processor is capable of multithreading on a large scale, and can therefore simultaneously process several streams of data. This makes parallel processors suitable for graphics cards, since the calculations required for generating the millions of pixels per second are all repetitive.

DISADVANTAGES:

More hardware is required, and with it more power; parallel processors are therefore a poor fit for low-power and mobile devices.

6/40

    CLASSIFICATION

There are a variety of ways in which parallel processing can be classified. It can be based on:

the internal organization of the processor

the interconnection structure between processors

the flow of information through the system

7/40

Michael J. Flynn's classification

One of the earliest classification systems for parallel (and sequential) computers and programs, now known as Flynn's taxonomy.

It organizes computer systems by:

o the number of instruction streams and
o the number of data streams that are manipulated simultaneously.

Flynn's classification divides computers into four major groups as follows:

Single Instruction, Single Data (SISD)
Single Instruction, Multiple Data (SIMD)
Multiple Instruction, Single Data (MISD)
Multiple Instruction, Multiple Data (MIMD)

http://en.wikipedia.org/wiki/Flynn's_taxonomy
8/40

Single Instruction, Single Data (SISD)

SISD represents a serial (non-parallel) computer, containing a control unit, a processor unit, and a memory unit.

Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.

Single data: only one data stream is being used as input during any one clock cycle.

9/40

Instructions are executed sequentially, and the system may or may not have internal parallel processing capabilities.

Parallel processing in this case may be achieved by means of multiple functional units or by pipeline processing.

This is the oldest and, even today, the most common type of computer.

Examples: older-generation mainframes, minicomputers and workstations; most modern-day PCs.

10/40

Single Instruction, Multiple Data (SIMD)

A type of parallel computer.

It represents an organization that includes many processing units under the supervision of a common control unit.

Single instruction: all processing units execute the same instruction at any given clock cycle.

Multiple data: each processing unit can operate on a different data element.

The shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously.

Best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing.

Examples:

Processor arrays: Connection Machine CM-2, MasPar MP-1 & MP-2, ILLIAC IV

Vector pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10

Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
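The SIMD idea — one instruction applied to many data elements at once — can be sketched conceptually in Python; real SIMD hardware performs the whole-vector operation in a single cycle, which a plain list comprehension only imitates:

```python
data = [1, 2, 3, 4]

# SISD style: one instruction, one data element per step.
sisd_result = []
for x in data:
    sisd_result.append(x + 5)

# SIMD style (conceptually): the same "add 5" instruction applied to
# every element of the vector in one operation.
simd_result = [x + 5 for x in data]

print(simd_result)  # [6, 7, 8, 9]
```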

11/40

12/40

Multiple Instruction, Single Data (MISD)

The MISD structure is mostly of theoretical interest, since hardly any practical system has been constructed using this organization.

A single data stream is fed into multiple processing units.

Each processing unit operates on the data independently via independent instruction streams.

Few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971).

Some conceivable uses might be: multiple frequency filters operating on a single signal stream, or multiple cryptography algorithms attempting to crack a single coded message.
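The filter example above can be sketched in Python — several independent "instruction streams" (here, made-up filters) all operating on the same single data stream:

```python
# MISD sketch: one data stream, several independent instruction streams.
signal = [0.0, 1.0, 0.5, -0.5, 1.5]

filters = [
    lambda s: [x * 2 for x in s],        # gain stage
    lambda s: [max(x, 0.0) for x in s],  # half-wave rectifier
]

# Each unit sees the identical data; only the instructions differ.
outputs = [f(signal) for f in filters]
print(outputs[0])  # [0.0, 2.0, 1.0, -1.0, 3.0]
print(outputs[1])  # [0.0, 1.0, 0.5, 0.0, 1.5]
```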

13/40

14/40

Multiple Instruction, Multiple Data (MIMD)

The MIMD organization refers to a computer system capable of processing several programs at the same time.

Most multiprocessor and multicomputer systems can be classified in this category.

Currently the most common type of parallel computer; most modern computers fall into this category.

Multiple instruction: every processor may be executing a different instruction stream.

Multiple data: every processor may be working with a different data stream.

Execution can be synchronous or asynchronous, deterministic or non-deterministic.

Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP computers, multi-core PCs.

Note: many MIMD architectures also include SIMD execution sub-components.
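The MIMD pattern — different instruction streams on different data streams, concurrently — can be sketched with Python threads (the worker tasks are illustrative):

```python
import threading

# Each worker runs a *different* instruction stream on *different* data.
results = {}

def worker(name, func, data):
    results[name] = func(data)

tasks = [
    ("sum",  sum,    [1, 2, 3, 4]),  # processor A: summation
    ("max",  max,    [7, 3, 9, 1]),  # processor B: maximum
    ("sort", sorted, [3, 1, 2]),     # processor C: sorting
]

threads = [threading.Thread(target=worker, args=t) for t in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["sum"], results["max"], results["sort"])  # 10 9 [1, 2, 3]
```

Note the asynchronous flavor mentioned above: the threads may finish in any order, which is why each writes to its own slot.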

15/40

16/40

A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently.

Superscalar processors have the ability to initiate multiple instructions during the same clock cycle.

17/40

A superscalar architecture consists of a number of pipelines that are working in parallel.

18/40

PIPELINE

A pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one.

19/40

PIPELINING

Pipelining allows the processor to read a new instruction from memory before it has finished processing the current one. As an instruction goes through each stage, the next instruction follows it; it does not need to wait until the previous one completely finishes.

Pipelining saves time by ensuring that the microprocessor can start the execution of a new instruction before completing the current or previous ones. However, it can still complete at most one instruction per clock cycle.
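The stage-by-stage flow described above can be sketched as a chain of Python generators, where each stage consumes the previous stage's output (the three stage names are illustrative):

```python
# Each pipeline stage is a generator: it consumes items from the previous
# stage and yields its own output, so the stages overlap in time.
def fetch(program):
    for instr in program:
        yield ("fetched", instr)

def decode(stream):
    for _, instr in stream:
        yield ("decoded", instr)

def execute(stream):
    for _, instr in stream:
        yield ("executed", instr)

program = ["ADD", "SUB", "MUL"]
pipeline = execute(decode(fetch(program)))
completed = [instr for _, instr in pipeline]
print(completed)  # ['ADD', 'SUB', 'MUL']
```

As in hardware, instructions leave the pipeline in program order, one per "cycle" of the final stage.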

20/40

ADVANTAGES

Allows the instruction execution rate to exceed the clock rate (a CPI of less than 1).

It thereby allows higher CPU throughput than would otherwise be possible at the same clock rate.

THROUGHPUT: the maximum number of instructions that can be carried out in a given period of time.
21/40

    Superscalar Architectures

A typical superscalar processor fetches and decodes the incoming instruction stream several instructions at a time.

    Superscalar Execution

22/40

Instruction-Level Parallelism

Superscalar processors are designed to exploit more instruction-level parallelism in user programs.

For example:

    load  R1, R2          add   R3, R3, 1
    add   R3, R3, 1       add   R4, R3, R2
    add   R4, R4, R2      store [R4], R0

The three instructions on the left are independent, and in theory all three could be executed in parallel.

The three instructions on the right cannot be executed in parallel because the second instruction uses the result of the first, and the third instruction uses the result of the second.
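The dependency check a superscalar issue unit performs on the example above can be sketched like this; it only models read-after-write hazards, and the "MEM" destination for the store is a stand-in, not a real register:

```python
def independent(instrs):
    """instrs: list of (destination, source_registers) tuples in program order.

    Returns True if no instruction reads a value written by an earlier
    instruction in the group (read-after-write hazard).
    """
    written = set()
    for dest, sources in instrs:
        if written & set(sources):  # reads a result produced earlier
            return False
        written.add(dest)
    return True

# The two instruction groups from the slide:
left = [("R1", ["R2"]), ("R3", ["R3"]), ("R4", ["R4", "R2"])]
right = [("R3", ["R3"]), ("R4", ["R3", "R2"]), ("MEM", ["R4", "R0"])]

print(independent(left))   # True  -> all three can issue together
print(independent(right))  # False -> must issue sequentially
```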

23/40

Fetching and dispatching two instructions per cycle (degree 2)

24/40

One floating-point and two integer operations are issued and executed simultaneously; each unit is pipelined and executes several operations in different pipeline stages.

25/40

Hardware organization of a superscalar processor

26/40

Some Architectures

PowerPC 604
  six independent execution units:
    Branch execution unit
    Load/Store unit
    3 Integer units
    Floating-point unit
  in-order issue
  register renaming

PowerPC 620
  provides, in addition to the 604, out-of-order issue

Pentium
  three independent execution units:
    2 Integer units
    Floating-point unit
  in-order issue

27/40

    Intel P5 Microarchitecture

    Used in initial Pentium processor

    Could execute up to 2 instructions simultaneously

28/40

PIPELINING: Pipelining is a technique of decomposing a sequential process (instruction) into sub-operations, with each sub-operation executed in a special dedicated segment that operates concurrently with all the other segments.

Each segment performs partial processing dictated by the way the task is partitioned. The result obtained from the computation in each segment is transferred to the next segment in the pipeline.

29/40

30/40

SUPERPIPELINING:

Superpipelining is the breaking of the longer stages of a pipeline into smaller stages, which shortens the clock period per instruction. Therefore more instructions can be executed in the same time compared to an ordinary pipelined structure.

Breaking up the stages increases efficiency because the clock time is determined by the longest stage.
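A small worked instance of the longest-stage rule above (the stage latencies are made up for illustration):

```python
# The clock period is set by the slowest stage. Splitting the 4 ns stage
# into two 2 ns stages halves the clock period.
pipeline = [2, 4, 2, 2]          # stage latencies in ns
superpipeline = [2, 2, 2, 2, 2]  # longest stage split in two

clock = max(pipeline)            # 4 ns period  -> 250 MHz clock
super_clock = max(superpipeline) # 2 ns period  -> 500 MHz clock
print(clock, super_clock)  # 4 2
```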

31/40

32/40

TIMING DIAGRAM:

33/40

Comparison of clock time per cycle:

34/40

Some processors which have a superpipelined architecture are: MIPS R4000, Intel NetBurst, ARM11 core.

ARM cores are famous for their simple and cost-effective design. However, ARM cores have also evolved to show superpipelining characteristics in their architectures, with architectural features to hide possible long pipeline stalls. The ARM11 (specifically, the ARM1136JF) is a high-performance, low-power processor equipped with an eight-stage pipeline.

The core consists of two fetch stages, one decode stage, one issue stage, and four stages for the integer pipeline.

35/40

The eight stages of the ARM11 core are:

36/40

DIFFERENCE BETWEEN SUPERSCALING AND SUPERPIPELINING

SUPERSCALING: creates multiple pipelines within a processor, allowing the CPU to execute multiple instructions simultaneously.

SUPERPIPELINING: breaks the instruction pipeline into smaller pipeline stages, allowing the CPU to start executing the next instruction before completing the previous one. The processor can run multiple instructions simultaneously, with each instruction being at a different stage of completion.

37/40

Aspect-by-aspect comparison of superscaling and superpipelining:

1. Approach: superscaling dynamically issues multiple instructions per cycle; superpipelining divides the long-latency stages of the pipeline into shorter stages.

2. Instruction issue rate: multiple in both cases (with superpipelining, different instructions are at different stages of completion).

3. Effect: superscaling affects the clock-per-instruction (CPI) term of the performance equation; superpipelining affects the clock-cycle-time term.

4. Difficulty of design: superscaling raises complex design issues; superpipelining is relatively easier to design.

5. Additional aids: superscaling requires additional hardware units, like the fetch units; superpipelining requires no additional hardware units.

38/40

INSTRUCTION ISSUE STYLE:

Both superscaling and superpipelining follow dynamic instruction scheduling.

In dynamic scheduling, the instructions are fetched sequentially in program order. However, those instructions are decoded and stored in a scheduling window of the processor's execution core. After decoding the instructions, the processor core obtains the dependency information between the instructions and can also identify the instructions which are ready for execution.

*The performance equation of a microprocessor:

Execution Time = IC * CPI * clock cycle time
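A worked instance of the performance equation above (the numbers are illustrative, not from the slides):

```python
# Execution Time = IC * CPI * clock cycle time
instruction_count = 1_000_000  # IC: instructions in the program
cpi = 0.5                      # cycles per instruction; < 1 for a superscalar CPU
clock_cycle_time = 1e-9        # 1 ns cycle, i.e. a 1 GHz clock

execution_time = instruction_count * cpi * clock_cycle_time
print(execution_time)  # 0.0005 seconds (0.5 ms)
```

This also shows why the two techniques differ: superscaling shrinks the `cpi` factor, while superpipelining shrinks `clock_cycle_time`.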

39/40

CONCLUSION:

From all this we can conclude that parallel processing, superscaling and superpipelining are different architectural improvements introduced to increase the efficiency of modern-day computers.

40/40