Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost...

Multiprocessors and the Interconnect

Scope

- Taxonomy
- Metrics
- Topologies
- Characteristics
  - cost
  - performance

Interconnection

- Carry data between processors and to memory
- Interconnect components
  - switches
  - links (wires, fiber)
- Interconnection network flavors
  - static networks: point-to-point communication links, AKA direct networks
  - dynamic networks: switches and communication links, AKA indirect networks

Static vs. Dynamic

Dynamic Networks

- Switch: maps a fixed number of inputs to outputs
- Number of ports on a switch = degree of the switch
- Switch cost
  - grows as the square of switch degree
  - peripheral hardware grows linearly with switch degree
  - packaging cost grows linearly with the number of pins
- Key property: blocking vs. non-blocking
  - blocking: path from p to q may conflict with path from r to s for independent p, q, r, s
  - non-blocking: disjoint paths between each pair of independent sources and sinks

Network Interface

- Processor node’s link to the interconnect
- Network interface responsibilities
  - packetizing communication data
  - computing routing information
  - buffering incoming/outgoing data
- Network interface connection
  - I/O bus: PCI or PCIx on many modern systems
  - memory bus: e.g. AMD HyperTransport, Intel QuickPath
    - higher bandwidth and tighter coupling than I/O bus
- Network performance depends on relative speeds of I/O and memory buses

Topologies

- Many network topologies
- Tradeoff: performance vs. cost
- Machines often implement hybrids of multiple topologies
  - packaging
  - cost
  - available components

Metrics

- Degree: number of links per node
- Diameter: longest shortest-path distance between two nodes in the network
- Bisection width: minimum number of wire cuts needed to divide the network into two halves
- Cost: number of links or switches
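
To make these metrics concrete, here is a minimal Python sketch that computes them for a small static network given as an adjacency list. The helper names (`ring`, `degree`, `diameter`, `bisection_width`, `link_count`) are illustrative and not from the slides, and the bisection search is brute force, so it only suits tiny examples.

```python
from collections import deque
from itertools import combinations

def ring(p):
    """Adjacency list of a p-node ring (1D torus); illustrative helper."""
    return {i: {(i - 1) % p, (i + 1) % p} for i in range(p)}

def degree(adj):
    """Largest number of links attached to any node."""
    return max(len(nbrs) for nbrs in adj.values())

def hops_from(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest-path distance over all node pairs."""
    return max(max(hops_from(adj, s).values()) for s in adj)

def bisection_width(adj):
    """Fewest links cut by any split into two equal halves (brute force)."""
    nodes = sorted(adj)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        side = set(half)
        cut = sum(1 for u in side for v in adj[u] if v not in side)
        best = cut if best is None else min(best, cut)
    return best

def link_count(adj):
    """Cost measured as the number of undirected links."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

net = ring(8)
print(degree(net), diameter(net), bisection_width(net), link_count(net))
# prints: 2 4 2 8
```

For the 8-node ring, every node has 2 links, the farthest node is 4 hops away, any equal split cuts at least 2 links, and there are 8 links in total.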

Topologies: Bus

- All processors access a common bus for exchanging data
- Used in simplest and earliest parallel machines
- Advantages
  - distance between any two nodes is O(1)
  - provides a convenient broadcast medium
- Disadvantages
  - bus bandwidth is a performance bottleneck

Bus Systems

- A bus system is a hierarchy of buses connecting various system and subsystem components
  - has a complement of control, signal, and power lines
- A variety of buses in a system:
  - Local bus: (usually integral to a system board) connects various major system components (chips)
  - Memory bus: used within a memory board to connect the interface, the controller, and the memory cells
  - Data bus: might be used on an I/O board or VLSI chip to connect various components
  - Backplane: like a local bus, but with connectors to which other boards can be attached

Bridges

The term bridge is used to denote a device that is used to connect two (or possibly more) buses.

The interconnected buses may use the same standards, or they may be different (e.g. PCI in a modern PC).

Bridge functions include
- Communication protocol conversion
- Interrupt handling
- Serving as cache and memory agents

Bus

Since much of the data accessed by processors is local to the processor, cache is critical for the performance of bus-based machines

Bus Replacement: Direct Connect

Intel QuickPath Interconnect (2009 - present)

Direct Connect: 4 Node Configurations

Figure credit: The Opteron CMP NorthBridge Architecture, Now and in the Future, AMD, Pat Conway, Bill Hughes, HOT CHIPS 2006

- 4N SQ: XFIRE BW 14.9 GB/s, Diam 2, Avg 1
- 4N FC: XFIRE BW 29.9 GB/s, Diam 1, Avg 0.75
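
The Diam and Avg values quoted for the two 4-node configurations can be checked with a short sketch. Since the figure itself is not reproduced here, it assumes that 4N SQ is four nodes wired in a square (a ring), that 4N FC is four fully connected nodes, and that the average is taken over all source/destination pairs including a node paired with itself. The function names are illustrative.

```python
def all_pairs_hops(n, links):
    """Floyd-Warshall over an undirected network with unit-cost links."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for a, b in links:
        d[a][b] = d[b][a] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def diam_and_avg(n, links):
    """Diameter and mean hop count over all n*n pairs (self pairs count as 0)."""
    d = all_pairs_hops(n, links)
    flat = [d[i][j] for i in range(n) for j in range(n)]
    return max(flat), sum(flat) / len(flat)

square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # assumed 4N SQ wiring
full = square + [(0, 2), (1, 3)]            # assumed 4N FC wiring

print(diam_and_avg(4, square))  # (2, 1.0)
print(diam_and_avg(4, full))    # (1, 0.75)
```

Under these assumptions the computed values agree with the slide: diameter 2 and average 1 for the square, diameter 1 and average 0.75 for the fully connected layout.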

Direct Connect: 8 Node Configurations

Crossbar Network

A crossbar network uses a p×m grid of switches to connect p inputs to m outputs in a non-blocking manner

Figure: a non-blocking crossbar network connecting p processors to b memory banks

- Cost of a crossbar: O(p^2)
- Generally difficult to scale for large values of p
- Earth Simulator: custom 640-way single-stage crossbar

Assessing Network Alternatives

- Buses
  - excellent cost scalability
  - poor performance scalability
- Crossbars
  - excellent performance scalability
  - poor cost scalability
- Multistage interconnects
  - compromise between these extremes

Multistage Network

Multistage Omega Network

- Organization
  - log p stages
  - p inputs/outputs
- At each stage, input i is connected to output j if:
    j = 2i            for 0 ≤ i ≤ p/2 − 1
    j = 2i + 1 − p    for p/2 ≤ i ≤ p − 1

Omega Network Stage

Each Omega stage is connected in a perfect shuffle
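
A perfect shuffle can be sketched as a one-bit left rotation of the (log p)-bit input label, which is the same as the rule j = 2i for i < p/2 and j = 2i + 1 − p otherwise. The function name `perfect_shuffle` is illustrative, and the sketch assumes p is a power of two.

```python
def perfect_shuffle(i, p):
    """Output index that input i is wired to in a p-way perfect shuffle."""
    n = p.bit_length() - 1            # number of label bits, log2(p)
    if p < 2 or (1 << n) != p:
        raise ValueError("p must be a power of two")
    if not 0 <= i < p:
        raise ValueError("i must be in range(p)")
    # Rotate the n-bit label left by one position.
    return ((i << 1) | (i >> (n - 1))) & (p - 1)

print([perfect_shuffle(i, 8) for i in range(8)])
# prints: [0, 2, 4, 6, 1, 3, 5, 7]
```

Inputs 0 to 3 land on the even outputs and inputs 4 to 7 on the odd outputs, like interleaving the two halves of a deck of cards.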

Omega Network Switches

2×2 switches connect perfect shuffles

Each switch operates in two modes: pass-through and crossover

Multistage Omega Network

Cost: p/2 × log p switching nodes → O(p log p)
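
To see how this compares with the O(p^2) cost of a crossbar, the following sketch tabulates both counts for a few sizes. It assumes p is a power of two and a square p×p crossbar; note that the crossbar figure counts crosspoints while the Omega figure counts 2×2 switches, so the comparison is about growth rate rather than identical units. The function names are illustrative.

```python
def crossbar_crosspoints(p):
    """Crosspoints in a p-by-p crossbar."""
    return p * p

def omega_switches(p):
    """2x2 switches in a p-input Omega network: (p/2) per stage, log2(p) stages."""
    stages = p.bit_length() - 1
    if p < 2 or (1 << stages) != p:
        raise ValueError("p must be a power of two")
    return (p // 2) * stages

for p in (8, 64, 1024):
    print(p, crossbar_crosspoints(p), omega_switches(p))
# prints:
# 8 64 12
# 64 4096 192
# 1024 1048576 5120
```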

Omega Network Routing

- Let
  - s = binary representation of the source processor
  - d = binary representation of the destination processor or memory
- The data traverses the link to the first switching node
  - if the most significant bits of s and d are the same, route data in pass-through mode by the switch
  - else use crossover path
- Strip off leftmost bit of s and d
- Repeat for each of the log p switching stages
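
The routing rule above can be sketched in Python as follows. This is a minimal model, assuming p is a power of two, each stage is a perfect shuffle followed by a column of 2×2 switches, and links are labeled by the position a message occupies after each switch. The names `omega_path` and `first_conflict` are illustrative. The second helper reports the first stage at which two messages would need the same output link, which is how blocking arises in this network.

```python
def omega_path(s, d, p):
    """Return (modes, links): switch mode and output link label at each stage."""
    n = p.bit_length() - 1
    if p < 2 or (1 << n) != p:
        raise ValueError("p must be a power of two")
    if not (0 <= s < p and 0 <= d < p):
        raise ValueError("s and d must be in range(p)")
    pos, modes, links = s, [], []
    for stage in range(n):
        # Perfect shuffle between stages: rotate the n-bit position left by one.
        pos = ((pos << 1) | (pos >> (n - 1))) & (p - 1)
        # Compare the current leading bits of s and d (earlier bits are stripped).
        bit = n - 1 - stage
        if ((s >> bit) & 1) == ((d >> bit) & 1):
            modes.append("pass-through")
        else:
            modes.append("crossover")
            pos ^= 1                  # leave on the switch's other output
        links.append(pos)
    return modes, links

def first_conflict(route_a, route_b, p):
    """First stage where two (s, d) routes share an output link, else None."""
    _, links_a = omega_path(route_a[0], route_a[1], p)
    _, links_b = omega_path(route_b[0], route_b[1], p)
    for stage, (a, b) in enumerate(zip(links_a, links_b)):
        if a == b:
            return stage
    return None

print(omega_path(0b010, 0b111, 8))
# prints: (['crossover', 'pass-through', 'crossover'], [5, 3, 7])
print(first_conflict((0b010, 0b111), (0b110, 0b100), 8))
# prints: 0
```

The last link label always equals the destination, and the two example messages (010 to 111 and 110 to 100) collide on the same link right after the first stage, so one of them must wait.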

Omega Network Routing

Blocking in an Omega Network

Clos Network (non-blocking)

Star Connected Network

- Static counterparts of buses
- Every node connected only to a common node at the center
- Distance between any pair of nodes is O(1)

Completely Connected Network

- Each processor is connected to every other processor
- static counterparts of crossbars
- number of links in the network scales as O(p^2)

Linear Array

Each node has two neighbors: left & right

If connection between nodes at ends: 1D torus (ring)

Meshes and k-d Meshes

- Mesh: generalization of linear array to 2D
  - nodes have 4 neighbors: north, south, east, and west
- k-d mesh: d-dimensional mesh
  - nodes have 2d neighbors
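
The neighbor counts above can be sketched with a small helper. It assumes, for illustration, that k is the number of nodes along each dimension and that a node is identified by a tuple of d coordinates; the name `mesh_neighbors` and the `wrap` flag are not from the slides. Without wraparound links, only interior nodes reach the full 2d neighbors; with wraparound (a torus, k ≥ 3) every node does.

```python
def mesh_neighbors(coord, k, wrap=False):
    """Neighbors of `coord` in a d-dimensional mesh with k nodes per dimension."""
    result = []
    for axis in range(len(coord)):
        for step in (-1, 1):
            c = coord[axis] + step
            if wrap:
                c %= k                # torus: edges connect to the far side
            elif not 0 <= c < k:
                continue              # plain mesh: no link past the boundary
            result.append(coord[:axis] + (c,) + coord[axis + 1:])
    return result

print(mesh_neighbors((1, 2), 4))                    # interior node of a 2D mesh
# prints: [(0, 2), (2, 2), (1, 1), (1, 3)]
print(len(mesh_neighbors((0, 0), 4)))               # corner node, no wraparound
# prints: 2
print(len(mesh_neighbors((0, 0, 0), 4, wrap=True))) # 3D torus node
# prints: 6
```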

Hypercubes

Special d-dimensional mesh: p nodes, d = log p

Hypercube Properties

- Distance between any two nodes is at most log p
- Each node has log p neighbors
- Distance between two nodes = # of bit positions that differ between node numbers
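
These properties follow directly from the binary node numbering, as the following sketch illustrates. It assumes nodes are numbered 0 to p − 1 with d = log p bits; the function names are illustrative.

```python
def hypercube_neighbors(node, d):
    """Neighbors of `node` in a d-dimensional hypercube: flip one bit at a time."""
    return [node ^ (1 << bit) for bit in range(d)]

def hypercube_distance(a, b):
    """Hop count between nodes a and b: number of differing bit positions."""
    return bin(a ^ b).count("1")

print(hypercube_neighbors(0b000, 3))        # prints: [1, 2, 4]
print(hypercube_distance(0b101, 0b110))     # prints: 2
print(hypercube_distance(0b000, 0b111))     # prints: 3
```

In the 3-dimensional case (p = 8), each node has 3 neighbors and the farthest pair, such as 000 and 111, is 3 hops apart, matching the log p bound.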

Trees

Tree Properties

- Distance between any two nodes is no more than 2 log p
- Trees can be laid out in 2D with no wire crossings
- Problem
  - links closer to root carry more traffic than those at lower levels
- Solution: fat tree
  - widen links as depth gets shallower
  - copes with higher traffic on links near root
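
Both the distance bound and the traffic problem can be illustrated with a small sketch. It assumes a complete binary tree whose p leaves are the processors, numbered 0 to p − 1 from left to right, and, for the traffic estimate, that every pair of processors communicates equally often. Both assumptions and the function names are illustrative.

```python
def tree_distance(a, b):
    """Hops between leaves a and b: up to the lowest common ancestor and back down."""
    return 2 * (a ^ b).bit_length()

def pairs_crossing_link(height, p):
    """Leaf pairs whose path uses one link above a subtree of 2**height leaves."""
    below = 2 ** height
    return below * (p - below)

p = 16
print(max(tree_distance(a, b) for a in range(p) for b in range(p)))
# prints: 8
print([pairs_crossing_link(h, p) for h in range(4)])
# prints: [15, 28, 48, 64]
```

With 16 leaves the longest path is 8 hops, which equals 2 log p, and the number of leaf pairs routed over a single link grows from 15 at the leaves to 64 just below the root, which is the imbalance a fat tree addresses by widening the upper links.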

Fat Tree Network

- Figure: fat tree network for 16 processing nodes
- Can judiciously choose “fatness” of links
  - take full advantage of technology and packaging constraints

Metrics for Interconnection Networks

Metrics for Dynamic Interconnection Networks