ORGANIZATION OF MULTIPROCESSOR SYSTEMS (Section 12.2)

Presented by: Tan Q. Nguyen

Transcript

Page 1: Title slide (Presented by: Tan Q. Nguyen)

Page 2:

Why do we need multiprocessor systems?

Who will carry this?

Page 3:

Recall the goal of computer architecture

To maximize computer system performance. Within the CPU:

Incorporate an instruction pipeline to increase the number of instructions processed per clock cycle

Include cache memory to reduce the time needed to load and store data

Use a Direct Memory Access (DMA) controller to speed up transfers between memory and I/O devices

Accept interrupts so the CPU can check the status of I/O devices
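The slides give no numbers for the pipeline point, but the standard idealized pipeline model (no stalls or hazards, one instruction issued per cycle) shows why pipelining raises instructions per clock cycle:

```python
# Idealized pipeline model (a textbook sketch, not a real CPU): an
# n-instruction program on a k-stage pipeline finishes in k + n - 1
# cycles, versus k * n cycles when each instruction runs start to
# finish before the next begins.
def unpipelined_cycles(n_instructions, k_stages):
    return k_stages * n_instructions

def pipelined_cycles(n_instructions, k_stages):
    # First instruction takes k cycles; each later one retires
    # one cycle after the previous.
    return k_stages + n_instructions - 1

n, k = 1000, 5
print(unpipelined_cycles(n, k))  # 5000
print(pipelined_cycles(n, k))    # 1004
```

For long programs the speedup approaches the stage count k, which is the point the slide is making.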

Page 4:

The significance of multiprocessor systems

Multiprocessing is just another way to maximize system performance.

Page 5:

Multiport Memory

Multiport memory is designed to handle multiple transfers within the memory itself.
– A multiport memory chip has two sets of address, data, and control pins for simultaneous data transfers.
– The CPU and DMA controller can transfer data concurrently.
– A system with more than one CPU can handle simultaneous requests from two different processors.

Page 6:

Multiport Memory

Advantage
– Can handle two requests to read data from the same location at the same time

Disadvantage
– Cannot process two simultaneous requests to write data to the same memory location, or to read from and write to the same memory location.
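These access rules can be written down as a small model. This is a hypothetical illustration, not a real hardware interface: it assumes a dual-port memory in which two reads of the same location may proceed together, while a write conflicts with any other access to the same location.

```python
# Hypothetical model of dual-port memory access rules (not a real
# hardware API): two simultaneous reads of one location are allowed,
# but write/write and read/write to the same location are not.

def can_serve_concurrently(req_a, req_b):
    """Each request is an (operation, address) pair, where
    operation is 'read' or 'write'."""
    op_a, addr_a = req_a
    op_b, addr_b = req_b
    if addr_a != addr_b:
        return True          # different locations never conflict
    # Same location: only read/read is permitted.
    return op_a == "read" and op_b == "read"

print(can_serve_concurrently(("read", 0x10), ("read", 0x10)))   # True
print(can_serve_concurrently(("write", 0x10), ("read", 0x10)))  # False
print(can_serve_concurrently(("write", 0x10), ("write", 0x20))) # True
```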

Page 7:

The ways to organize processors in multiprocessor systems

There are many divergent, complex computer designs. Three well-known ways of characterizing them are:
– Flynn’s Classification
– System Topologies
– MIMD System Architectures

Page 8:

Flynn’s Classification

Named after the researcher Michael J. Flynn, this classification is based on the flow of instructions and data within the computer.

There are four categories:
– SISD: single instruction, single data
– SIMD: single instruction, multiple data
– MISD: multiple instruction, single data
– MIMD: multiple instruction, multiple data

Picture found on Dr. Lee’s website

Page 9:

Flynn’s Classification (cont’d)

SISD: the classic von Neumann architecture.
MISD: not practical – forget it.
SIMD: practical, but it is unnecessary to use multiple processors to fetch and decode one single instruction.
– The only significance of the SIMD organization is that all of its processors are less complex than traditional CPUs.

Page 10:

The classic von Neumann architecture

[Diagram: a CPU and a memory subsystem joined by address, data, and control buses, with I/O devices attached through an I/O subsystem]

Page 11:

Generic organization of SIMD

[Diagram: a single control unit drives several processors, each paired with its own memory module; the processors reach main memory through a communication network]

Page 12:

Generic organization of MIMD

Each processor has its own control unit

The processors can be assigned to parts of the same task or to completely separate tasks, depending on their topology and architecture.

Page 13:

System Topologies

The topology of a multiprocessor system refers to the pattern of connections between its processors.
– Diameter: the maximum distance between two processors
– Bandwidth: the capacity of a communications link multiplied by the number of such links in the system
– Bisection bandwidth: split the processors into two halves and compute the total bandwidth of the links connecting the two halves
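These three measures can be computed for any small topology given as a graph. This is a sketch under the assumption that every link has unit bandwidth (l = 1), so both bandwidths come out as link counts; the brute-force bisection search is only practical for small node counts.

```python
# Sketch (assumes unit link bandwidth, l = 1): compute the diameter and
# bisection bandwidth of a topology given as an adjacency list.
from itertools import combinations
from collections import deque

def diameter(adj):
    """Longest shortest path between any two nodes (BFS from each node)."""
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def bisection_bandwidth(adj):
    """Minimum number of links crossing any split into two equal halves
    (brute force over all balanced partitions)."""
    nodes = sorted(adj)
    n = len(nodes)
    best = None
    for half in combinations(nodes, n // 2):
        half = set(half)
        crossing = sum(1 for u in half for v in adj[u] if v not in half)
        best = crossing if best is None else min(best, crossing)
    return best

# A 4-processor ring: diameter floor(n/2) = 2, bisection bandwidth 2.
ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diameter(ring4))            # 2
print(bisection_bandwidth(ring4)) # 2
```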

Page 14:

Types of System Topologies

– Shared Bus
– Ring
– Tree
– Mesh
– Hypercube
– Completely Connected

Page 15:

Shared Bus

Processors communicate with each other exclusively via this bus, and the bus can handle only one data transmission at a time.
Its diameter is 1, its total bandwidth is 1*l, and its bisection bandwidth is also 1*l (where l is the bandwidth of a single link).

Page 16:

[Diagram: processors P, each with a local memory M, attached to a shared bus along with a global memory]

Page 17:

Ring

Processors communicate with each other directly rather than over a bus, and all communication links can be active simultaneously.
A ring with n processors has a diameter of ⌊n/2⌋, a total bandwidth of n*l, and a bisection bandwidth of 2*l (where l is the bandwidth of a single link).

Page 18:

[Diagram: six processors connected in a ring]

Page 19:

Tree

Processors communicate with each other directly, as in the ring topology. Each processor has at most three connections.
It has an advantageously low diameter of 2*⌊log n⌋, a total bandwidth of (n-1)*l, and a bisection bandwidth of 1*l (where l is the bandwidth of a single link).

Page 20:

[Diagram: seven processors connected in a binary tree]

Page 21:

Mesh

Every processor connects to the processors above and below it, and to those on its left and right.
For a mesh of n × n processors, the diameter is 2n, the total bandwidth is (2n² - 2n)*l, and the bisection bandwidth is 2n*l (where l is the bandwidth of a single link).

Page 22:

[Diagram: nine processors connected in a 3 × 3 mesh]

Page 23:

Hypercube

A hypercube is a multidimensional mesh. It has n processors, each with log n connections to other processors.
It has a relatively low diameter of log n, a total bandwidth of (n/2)*log n*l, and a bisection bandwidth of (n/2)*l (where l is the bandwidth of a single link).

Page 24:

[Diagram: sixteen processors connected in a four-dimensional hypercube]

Page 25:

Completely Connected

• Every processor has n-1 connections, one to each of the other processors.
• Its diameter is 1, its total bandwidth is (n/2)*(n-1)*l, and its bisection bandwidth is ⌊n/2⌋*⌈n/2⌉*l (where l is the bandwidth of a single link).
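As a quick check, the per-topology formulas quoted on the preceding slides can be evaluated for a small processor count. This sketch assumes unit link bandwidth (l = 1) and omits the mesh, whose formulas are stated in terms of the side length rather than the processor count.

```python
# Sketch: evaluate the topology formulas from the slides for a system
# of n processors, assuming unit link bandwidth (l = 1).
import math

def metrics(topology, n):
    """Return (diameter, total_bandwidth, bisection_bandwidth) for n
    processors, using the formulas stated on the slides."""
    if topology == "shared bus":
        return 1, 1, 1
    if topology == "ring":
        return n // 2, n, 2
    if topology == "tree":
        return 2 * int(math.log2(n)), n - 1, 1
    if topology == "hypercube":
        k = int(math.log2(n))          # dimension = log n
        return k, (n // 2) * k, n // 2
    if topology == "completely connected":
        return 1, (n * (n - 1)) // 2, (n // 2) * ((n + 1) // 2)
    raise ValueError(topology)

for topo in ["shared bus", "ring", "tree", "hypercube",
             "completely connected"]:
    print(topo, metrics(topo, 8))
```

For n = 8 this prints, for example, (3, 12, 4) for the hypercube: diameter log 8 = 3, total bandwidth (8/2)*3 = 12, bisection bandwidth 8/2 = 4.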

Page 26:

[Diagram: eight completely connected processors]

Page 27:

MIMD Architectures

The architecture of an MIMD system refers to its connections with respect to system memory.
A symmetric multiprocessor (SMP) is a computer system that has two or more processors with comparable capabilities.
– The processors are capable of performing the same functions; this is the symmetry of the SMP.

Page 28:

Types of SMP

– Uniform Memory Access (UMA)
– Nonuniform Memory Access (NUMA)
– Cache Coherent NUMA (CC-NUMA)
– Cache Only Memory Access (COMA)

Page 29:

UMA

UMA gives all CPUs equal access to all locations in shared memory.

[Diagram: processors 1 through n access a single shared memory through a common communications mechanism]

Page 30:

NUMA

NUMA architectures do not allow uniform access to all shared memory locations. Each processor can access the memory module closest to it (its local shared memory) faster than the other modules, hence the nonuniform memory access times.

Example: the Cray T3E supercomputer.
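A toy model can make the nonuniform access times concrete. All numbers here are invented for illustration and are not from any real machine: each processor owns one memory module, and touching a remote module costs several times a local access.

```python
# Toy NUMA cost model (illustrative numbers only): each processor owns
# one memory module; remote accesses cost more than local ones.
LOCAL_LATENCY = 1    # arbitrary time units
REMOTE_LATENCY = 4

def access_cost(processor, address, module_size):
    """Cost for `processor` to touch `address`, where module i holds
    addresses [i*module_size, (i+1)*module_size)."""
    owner = address // module_size
    return LOCAL_LATENCY if owner == processor else REMOTE_LATENCY

# Processor 0 touching its own module vs. processor 2's module:
print(access_cost(0, 100, 1024))   # 1 (local)
print(access_cost(0, 2100, 1024))  # 4 (remote: module 2)
```

This is why data placement matters on NUMA machines such as the Cray T3E: the same load instruction is cheap or expensive depending on which module holds the address.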

Page 31:

[Diagram: processors 1 through n, each with its own local memory module (Memory 1 through Memory n), joined by a communications mechanism]

Page 32:

Cache Coherent NUMA (CC-NUMA)

CC-NUMA is similar to the NUMA architecture, except that each processor also includes cache memory.

Example: systems from Silicon Graphics (SGI).

Page 33:

Cache Only Memory Access (COMA)

In this architecture, each processor’s local memory is treated as a cache.

Examples:
1) Kendall Square Research’s KSR1 and KSR2
2) The Swedish Institute of Computer Science’s Data Diffusion Machine (DDM)

Page 34:

Multicomputer

Network of Workstations (NOW) or Cluster of Workstations (COW):
– NOWs and COWs are more than just a group of workstations on a local area network (LAN).
– They have a master scheduler, which matches tasks and processors together.

Page 35:

Massively Parallel Processor (MPP)

An MPP consists of many self-contained nodes, each having a processor, memory, and hardware for implementing internal communications.
The processors communicate with one another through this internal communications hardware.

Example: IBM’s Blue Gene.