vikas_doc

  • 8/7/2019 vikas_doc

    2011 Presentation document

    Submitted To:

    Mrs. Harjeet Kaur

    Submitted By:

    Vikas Agarwal

    Roll No: 73

    Reg No: 7070070006

    [PARALLEL ALGORITHM

    FOR MULTIPROCESSORS]

    SUBJECT: COMPUTER SYSTEM ARCHITECTURE (CSE 262)

    Parallel algorithm definition

    A parallel algorithm is an algorithm that has been specifically written for execution on a computer with two or more processing units.

    In computer science, a parallel algorithm or concurrent algorithm, as opposed to a traditional sequential (or serial) algorithm, is an algorithm which can be executed a piece at a time on many different processing devices, with the pieces put back together at the end to get the correct result.

    Some algorithms are easy to divide up into pieces like this. For example, the job of checking all of the numbers from one to a hundred thousand to see which are primes could be split up by assigning a subset of the numbers to each available processor, and then putting the lists of positive results back together.
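    The prime-checking example can be sketched as follows. This is a minimal illustration, not a tuned implementation: the names `is_prime`, `primes_in` and `parallel_primes` and the default of 4 workers are assumptions made for the example. It uses a thread pool to show the partitioning; in CPython, threads do not speed up CPU-bound work, so a process pool would be the usual substitute when real speedup is wanted.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Trial division; adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    factor = 3
    while factor * factor <= n:
        if n % factor == 0:
            return False
        factor += 2
    return True


def primes_in(chunk):
    """Work done by one worker: test every number in its own subset."""
    return [n for n in chunk if is_prime(n)]


def parallel_primes(limit, workers=4):
    """Find the primes in 1..limit (limit >= 1 assumed) using `workers` subsets."""
    size = -(-limit // workers)  # ceiling division: numbers per subset
    chunks = [range(start, min(start + size, limit + 1))
              for start in range(1, limit + 1, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(primes_in, chunks))
    # Put the lists of positive results back together, in order.
    return [p for part in parts for p in part]
```

    Because the subsets are contiguous and `pool.map` preserves their order, concatenating the partial lists already yields the primes in ascending order.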

    Most of the available algorithms to compute pi (π), on the other hand, cannot be easily split up into parallel portions. They require the results from a preceding step to carry on with the next step. Such problems are called inherently serial problems. Iterative numerical methods, such as Newton's method or iterative solutions to the three-body problem, are also inherently serial. Some problems are very difficult to parallelize even though they are recursive. One such example is the depth-first search of graphs.

    Parallel algorithms are valuable because of substantial improvements in multiprocessing systems and the rise of multi-core processors. In general, it is easier to construct a computer with a single fast processor than one with many slow processors with the same throughput. But processor speed is increased primarily by shrinking the circuitry, and modern processors are pushing physical size and heat limits. These twin barriers have flipped the equation, making multiprocessing practical even for small systems.

    Parallel algorithms can also be run on computers with a single processor that has multiple functional units, pipelined functional units, or a pipelined memory system.

    A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor.

    Each functional unit is not a separate CPU core but an execution resource within a single CPU, such as an arithmetic logic unit, a bit shifter, or a multiplier.

    Modelling algorithms

    When designing an algorithm, take into account the cost of communication and the number of processors (efficiency).

    The designer usually uses an abstract model of computation called the parallel random-access machine (PRAM).

    Each CPU operation counts as one step (a step being a logical operation, a memory access, or an arithmetic operation).

    Advantages of the model:

    - an algorithm designer can ignore the details of the machine the algorithm is executed on

    - it neglects issues such as synchronisation and communication

    - no limit on the number of processors in the machine

    - any memory location is uniformly accessible from any processor

    - no limit on the amount of shared memory in the system

    - no conflict in accessing resources

    Programs written for such machines are generally MIMD.

    In computing, MIMD (Multiple Instruction stream, Multiple Data stream) is a technique employed to achieve parallelism.

    Multiprocessor model

    [Figure: the multiprocessor model and its three variants: local memory machine model, modular memory machine models, parallel random-access machine]

    1. LOCAL MEMORY MACHINE MODEL

    A set of n processors, each with its own local memory.

    The processors are connected to a common communication network.

    A processor can access its own memory directly.

    It can also access another processor's memory, but only after requesting it.

    2. MODULAR MEMORY MACHINE MODELS

    Typically the modules (processors and memory) are arranged so that access to memory is uniform for all processors.

    The access time depends on the communication network and the memory access pattern.

    3. PARALLEL RANDOM-ACCESS MACHINE

    A processor can access any word of memory in a single step.

    It is only an abstract model, not a real machine.

    Work-depth model

    Picture: Summing 16 numbers on a tree. The total depth (longest chain of dependencies) is 4 and the total work (number of operations) is 15.

    How can the cost of the algorithm be calculated?

    Work: W (total number of operations)

    Depth: D (longest chain of dependencies)

    Parallelism of the algorithm: P = W/D
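    The tree-summation figures can be checked with a short sketch. The function name `tree_sum` is an assumption made for the example, and the input is assumed to be non-empty. Each pass of the loop is one level of the tree: all additions within a level are independent of each other, so a level adds 1 to the depth while each addition adds 1 to the work.

```python
def tree_sum(values):
    """Sum a non-empty sequence pairwise; return (total, work, depth)."""
    level = list(values)
    work = 0   # W: total number of additions performed
    depth = 0  # D: number of levels, the longest chain of dependencies
    while len(level) > 1:
        next_level = []
        # Additions within one level could all run at the same time.
        for i in range(0, len(level) - 1, 2):
            next_level.append(level[i] + level[i + 1])
            work += 1
        if len(level) % 2:
            next_level.append(level[-1])  # odd element carried up unchanged
        level = next_level
        depth += 1
    return level[0], work, depth


total, work, depth = tree_sum(range(1, 17))  # 16 numbers, as in the picture
parallelism = work / depth                   # P = W/D
```

    For the 16 numbers in the picture this gives W = 15 and D = 4, so P = 15/4 = 3.75.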

    Mergesort

    - input: sequence of n keys

    - output: sorted sequence of n keys

    Conceptually, a merge sort works as follows:

    If the list is of length 0 or 1, then it is already sorted.

    Otherwise:

    Divide the unsorted list into two sublists of about half the size.

    Sort each sublist recursively by re-applying merge sort.

    Merge the two sublists back into one sorted list.
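    The steps above can be sketched as follows. The two sublists are independent of each other, which is what makes the algorithm a candidate for parallel execution. In this illustration only the top-level split is handed to two workers; the function names and the choice of a two-worker thread pool are assumptions made for the example, and in CPython a process pool would be needed for real speedup on CPU-bound sorting.

```python
from concurrent.futures import ThreadPoolExecutor


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(keys):
    """Sequential merge sort."""
    if len(keys) <= 1:
        return list(keys)  # already sorted
    mid = len(keys) // 2
    return merge(merge_sort(keys[:mid]), merge_sort(keys[mid:]))


def parallel_merge_sort(keys):
    """Sort the two halves concurrently, then merge them."""
    if len(keys) <= 1:
        return list(keys)
    mid = len(keys) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(merge_sort, keys[:mid])
        right = pool.submit(merge_sort, keys[mid:])
        return merge(left.result(), right.result())
```

    The final merge is still sequential here, so it bounds the achievable speedup of this particular sketch.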

    Search

    Tasks and channels are created dynamically during program execution.

    The search looks for nodes corresponding to solutions.

    Initially, a single task is created for the root of the tree.

    Each circle represents a node in the search tree which is also a call to the search procedure.

    A task is created for each node in the tree as it is explored.

    At any one time, some tasks are actively engaged in expanding the tree further (these are shaded in the figure); others have reached solution nodes and are terminating, or are waiting for their offspring to report back with solutions.

    The lines represent the channels used to return solutions.
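    The task-per-node structure can be imitated with a small sketch. Everything here is hypothetical: the tree `TREE`, the set `SOLUTIONS`, and the use of `asyncio` tasks, which are concurrent rather than truly parallel. The point is only to show tasks being created dynamically as nodes are explored, with each parent waiting for its offspring to report back; the awaited result stands in for the channel.

```python
import asyncio

# Hypothetical search tree: each node maps to its list of children.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}
SOLUTIONS = {"a2", "b1"}  # assumed solution nodes for the example


async def search(node):
    """One task per node: check the node, then spawn a task for each child."""
    found = [node] if node in SOLUTIONS else []
    # Tasks are created dynamically as the tree is explored.
    child_tasks = [asyncio.create_task(search(child)) for child in TREE[node]]
    # Wait for the offspring to report back with their solutions.
    for task in child_tasks:
        found.extend(await task)
    return found


def run_search():
    """Create the initial task for the root and collect all solutions."""
    return asyncio.run(search("root"))
```

    A task at a leaf has no children to wait for and terminates immediately, matching the terminating tasks described above.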

    Shortest-Path Algorithms

    The all-pairs shortest-path problem involves finding the shortest path between all pairs of vertices in a graph.

    A graph $G = (V, E)$ comprises a set $V$ of $N$ vertices $\{v_i\}$ and a set $E \subseteq V \times V$ of edges.

    In a directed graph, the edges $(v_i, v_j)$ and $(v_j, v_i)$, $i \neq j$, are distinct.

    In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) such that the sum of the weights of its constituent edges is minimized. An example is finding the quickest way to get from one location to another on a road map; in this case, the vertices represent locations and the edges represent segments of road and are weighted by the time needed to travel that segment.

    Formally, given a weighted graph (that is, a set $V$ of vertices, a set $E$ of edges, and a real-valued weight function $f : E \to \mathbb{R}$) and two elements $v$ and $v'$ of $V$, find a path $P$ from $v$ to $v'$ such that $\sum_{p \in P} f(p)$ is minimal among all paths connecting $v$ to $v'$.

    The problem is also sometimes called the single-pair shortest path problem, to distinguish it from the following generalizations:

    * The single-source shortest path problem, in which we have to find shortest paths from a source vertex v to all other vertices in the graph.

    * The single-destination shortest path problem, in which we have to find shortest paths from all vertices in the graph to a single destination vertex v. This can be reduced to the single-source shortest path problem by reversing the edges in the graph.

    * The all-pairs shortest path problem, in which we have to find shortest paths between every pair of vertices v, v' in the graph.
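    The reduction from single-destination to single-source can be sketched as follows. The text does not name a single-source algorithm, so Dijkstra's algorithm is swapped in here as an assumption; it requires non-negative edge weights. The graph `ROADS` and all function names are hypothetical and exist only for the example.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest distances; graph maps vertex -> {neighbour: weight >= 0}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def reverse_edges(graph):
    """Return a graph with every edge (u, v) replaced by (v, u)."""
    reversed_graph = {u: {} for u in graph}
    for u, neighbours in graph.items():
        for v, w in neighbours.items():
            reversed_graph.setdefault(v, {})[u] = w
    return reversed_graph


def single_destination(graph, destination):
    """Shortest distance from every vertex to `destination`."""
    return dijkstra(reverse_edges(graph), destination)


# Hypothetical road map with travel times as weights.
ROADS = {"A": {"B": 4, "C": 1}, "B": {"D": 1}, "C": {"B": 2, "D": 5}, "D": {}}
```

    With this data, `single_destination(ROADS, "D")` reports a distance of 4 from A, which corresponds to the route A, C, B, D in the original graph.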

    Floyd's algorithm

    Floyd's algorithm is a graph analysis algorithm for finding shortest paths in a weighted graph.

    A single execution of the algorithm finds the shortest paths between all pairs of vertices.
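    A minimal sequential sketch of Floyd's algorithm is given below. It assumes the graph is supplied as an N x N matrix of edge weights with `INF` where there is no edge and 0 on the diagonal, and that there are no negative cycles. The names `floyd`, `INF` and the sample matrix `W` are assumptions made for the example. The result holds shortest path lengths rather than the paths themselves.

```python
INF = float("inf")


def floyd(weights):
    """Return the matrix of shortest path lengths between all pairs of vertices."""
    n = len(weights)
    dist = [row[:] for row in weights]  # work on a copy of the input
    for k in range(n):
        # After step k, dist[i][j] is the shortest path from i to j
        # that uses only vertices 0..k as intermediate stops.
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Hypothetical 4-vertex directed graph.
W = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
```

    The three nested loops perform N^3 comparisons in total, and the outer loop over k must run in order because each step builds on the previous one.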

    Parallel Floyd's algorithm

    The first parallel Floyd algorithm is based on a one-dimensional, rowwise domain decomposition of the intermediate matrix I and the output matrix S.

    The algorithm can use at most N processors.

    Each task has one or more adjacent rows of I and is responsible for performing computation on those rows.

    Figure: Parallel version of Floyd's algorithm based on a one-dimensional decomposition of the I matrix. In (a), the data allocated to a single task are shaded: a contiguous block of rows. In (b), the data required by this task in the k-th step of the algorithm are shaded: its own block and the k-th row.
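    The rowwise decomposition can be sketched as follows. This is an illustration under stated assumptions, not the presentation's own code: the matrix called I in the text is named `dist` here, the thread pool and all function names are hypothetical, and copying row k before each step stands in for sending that row to every task. In CPython the threads show the structure but do not give real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

INF = float("inf")


def row_blocks(n, tasks):
    """Split rows 0..n-1 into contiguous blocks, using at most n tasks."""
    tasks = max(1, min(tasks, n))
    base, extra = divmod(n, tasks)
    blocks, start = [], 0
    for t in range(tasks):
        size = base + (1 if t < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def update_block(dist, rows, k, row_k):
    """Step k for one task: it touches only its own rows plus the copy of row k."""
    for i in rows:
        row = dist[i]
        via_k = row[k]
        for j in range(len(row)):
            if via_k + row_k[j] < row[j]:
                row[j] = via_k + row_k[j]


def parallel_floyd(weights, tasks=2):
    """All-pairs shortest path lengths with a one-dimensional row decomposition."""
    n = len(weights)
    if n == 0:
        return []
    dist = [row[:] for row in weights]
    blocks = row_blocks(n, tasks)
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        for k in range(n):
            row_k = dist[k][:]  # the k-th row, made available to every task
            futures = [pool.submit(update_block, dist, rows, k, row_k)
                       for rows in blocks]
            for future in futures:
                future.result()  # all tasks finish step k before step k+1
    return dist


# Hypothetical 4-vertex directed graph.
W = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
```

    Requesting more tasks than rows has no effect: `row_blocks` caps the count at N, which reflects the limit of at most N processors stated above.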

    Another parallel Floyd's algorithm

    An alternative parallel version of Floyd's algorithm uses a two-dimensional decomposition of the various matrices.
