EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers

Prof. Sherief Reda, School of Engineering, Brown University


Material from:
•  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
•  Computer Architecture: A Quantitative Approach by Hennessy and Patterson

Warehouse-scale computers


[Figure: the data center as a computer (L. Barroso)]

Performance metrics for clusters

•  Supercomputers:
–  Execution time
–  Threads communicate using message passing
–  FLOPS (FLOP/s): theoretical peak or measured with a standard benchmark (e.g., LINPACK is used for the Top-500 supercomputer ranking)
•  Data center scale:
–  Abundant parallelism through request-level parallelism
–  Latency is an important metric because it is seen by users
–  Bing study: users search less as response time increases
–  Service Level Objectives (SLOs)/Service Level Agreements (SLAs), e.g., 99% of requests must complete within 100 ms (see the sketch after this list)
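As a hedged illustration of checking such an SLO, here is a minimal sketch; the percentile helper and the latency samples are my own assumptions, and only the "99% within 100 ms" target comes from the slide.

```python
# Minimal sketch: check a latency SLO of the form
# "99% of requests must complete within 100 ms".
# The latency samples below are made up for illustration.

def percentile(samples_ms, pct):
    """Return the pct-th percentile (nearest-rank) of a list of latencies."""
    ordered = sorted(samples_ms)
    rank = max(1, int(round(pct / 100.0 * len(ordered))))
    return ordered[rank - 1]

def meets_slo(samples_ms, pct=99.0, target_ms=100.0):
    """True if at least pct% of requests finished within target_ms."""
    return percentile(samples_ms, pct) <= target_ms

if __name__ == "__main__":
    # Hypothetical measurements: mostly fast requests with a slow tail.
    latencies_ms = [12, 18, 25, 31, 40, 55, 70, 85, 95, 140]
    print("p99 =", percentile(latencies_ms, 99), "ms")   # 140 ms
    print("SLO met?", meets_slo(latencies_ms))           # False
```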


Anatomy of a data center


•  A 7-ft rack (42U) holds 40 1U servers and a rack-level Ethernet switch
•  Rack switches have uplinks connecting to cluster-level switches

Servers

•  Data centers:
–  1U servers; each has two quad-core processors & 16 GB of DRAM
–  Disk attached to each node; 1 Gbps Ethernet
•  Supercomputers:
–  Use more expensive, high-density blade servers and GPGPUs
–  A blade chassis holds a number of blades (e.g., a Cray XE6 blade has four 16-core AMD Opteron processors and 64 GB)
–  The chassis provides a shared power supply and cooling

[Figure: blade chassis]

Power usage of servers


[Figure: server power usage breakdown. Assumes two-socket x86 servers, 16 DIMMs and 8 disk drives per server, and an average utilization of 80%.]

Storage hierarchy of a data center


Data centers use global distributed storage (disks attached to each compute node), whereas supercomputers use Network Attached Storage (NAS) devices connected directly to the cluster network.

Data center network


Datacenters often reduce networking cost by oversubscribing the network at the top-of-rack switch. In an oversubscribed network, the ratio of server-facing ports to fabric (uplink) ports exceeds 1:1. For example, with 2:1 oversubscription, the uplinks provide only half the aggregate bandwidth of the server ports. Each server can still peak at 10 Gbps, but if all servers transmit simultaneously they can only average 5 Gbps each. In practice, oversubscription ratios of 4 to 10 are common.
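A minimal sketch of the oversubscription arithmetic above; the 2:1 ratio and the 10/5 Gbps figures come from the text, while the 40-server, 20-uplink rack configuration is an illustrative assumption.

```python
# Sketch of top-of-rack oversubscription arithmetic.
# Server NIC speed and port counts are illustrative assumptions.

def oversubscription_ratio(servers, nic_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth."""
    return (servers * nic_gbps) / (uplinks * uplink_gbps)

def per_server_bw_gbps(servers, nic_gbps, uplinks, uplink_gbps):
    """Worst-case average bandwidth per server when every server
    transmits through the rack uplinks at the same time."""
    uplink_total = uplinks * uplink_gbps
    return min(nic_gbps, uplink_total / servers)

if __name__ == "__main__":
    # 40 servers with 10 Gbps NICs and 20 x 10 Gbps uplinks -> 2:1.
    print(oversubscription_ratio(40, 10, 20, 10))          # 2.0
    print(per_server_bw_gbps(40, 10, 20, 10), "Gbps")      # 5.0 Gbps
```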

Quantifying latency, bandwidth and capacity


•  Data center assumption: 2,400 servers, each with 16 GB of DRAM and four 2 TB disk drives
•  Each group of 80 servers is connected through a 1 Gbps link to a rack-level switch that has an additional eight 1 Gbps ports for connecting the rack to the cluster-level switch (see the sketch below)
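The slide's numbers can be totaled in a small sketch; everything below simply restates the stated assumptions (2,400 servers, 16 GB of DRAM and four 2 TB disks per server, 80 servers and eight 1 Gbps uplinks per rack).

```python
# Sketch: aggregate DRAM/disk capacity and rack-level oversubscription
# for the example cluster described on the slide.

SERVERS           = 2400
DRAM_GB_PER_SRV   = 16
DISKS_PER_SRV     = 4
DISK_TB           = 2
SERVERS_PER_RACK  = 80
RACK_UPLINK_PORTS = 8     # 1 Gbps each, same speed as the server links

total_dram_tb = SERVERS * DRAM_GB_PER_SRV / 1024
total_disk_tb = SERVERS * DISKS_PER_SRV * DISK_TB
racks         = SERVERS // SERVERS_PER_RACK
rack_oversub  = SERVERS_PER_RACK / RACK_UPLINK_PORTS   # both sides at 1 Gbps

print(f"racks: {racks}")                                # 30
print(f"total DRAM: {total_dram_tb:.1f} TB")            # 37.5 TB
print(f"total disk: {total_disk_tb} TB")                # 19200 TB
print(f"rack oversubscription: {rack_oversub:.0f}:1")   # 10:1
```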

Support equipment


Power distribution and losses


•  More losses inside the servers to convert from AC to DC
•  Total losses are about 11% (see the sketch below)
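A hedged sketch of how a total loss of roughly 11% arises as a product of per-stage efficiencies; the individual stage values below are illustrative assumptions, not figures from the slide.

```python
# Sketch: end-to-end power-delivery efficiency as a product of stage
# efficiencies. The per-stage values are illustrative assumptions chosen
# to land near the ~11% total loss quoted on the slide.

stages = {
    "UPS":                          0.94,   # assumed
    "PDU/transformer":              0.99,   # assumed
    "low-voltage wiring":           0.99,   # assumed
    "server power supply (AC->DC)": 0.97,   # assumed
}

efficiency = 1.0
for name, eff in stages.items():
    efficiency *= eff

print(f"end-to-end efficiency: {efficiency:.1%}")
print(f"total losses: {1 - efficiency:.1%}")   # close to the ~11% above
```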

Heat flow inside a facility


•  Datacenters employ raised floors. The underfloor area often contains power cables to racks, but its primary purpose is to distribute cool air to the server racks through perforated tiles.

•  Heat recirculation contaminates the cold aisles, which is a main source of inefficiency.

Heat exchange systems


Data center energy efficiency


Power usage effectiveness (PUE) reflects the quality of the datacenter building infrastructure itself, and captures the ratio of total building power to IT power.

[Figure: survey results of over 1,100 datacenters in 2012]

Check Google’s green web http://www.google.com/green/bigpicture/
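A minimal sketch of the PUE ratio defined above; the power figures are made-up inputs.

```python
# Sketch: Power Usage Effectiveness (PUE) = total facility power / IT power.
# The input numbers are made up for illustration.

def pue(total_facility_kw, it_equipment_kw):
    """A PUE of 1.0 would mean all facility power goes to IT equipment."""
    return total_facility_kw / it_equipment_kw

if __name__ == "__main__":
    # e.g., 1.5 MW drawn by the building for 1.0 MW of IT load -> PUE 1.5
    print(pue(1500.0, 1000.0))
```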

Need for energy-proportional computing


[Figure: average activity distribution of a sample of two Google clusters, each containing over 20,000 servers, over a period of 3 months (January-March 2013)]

Variable utilization leaves servers idle → wasted power consumption → need for mechanisms that adjust power consumption to server load.

Software

•  Platform-level software
–  Operating system
–  Virtual machines
•  Cluster-level infrastructure software
–  Resource management
–  Hardware abstraction (e.g., global distributed file systems and message passing)
–  Programming frameworks
•  Applications
–  Transactional → data centers
–  Batch → supercomputers


Supercomputer and data center networks

•  Data centers use Ethernet, which is good enough for request-level parallelism with limited inter-thread communication.

•  Supercomputers use InfiniBand and other proprietary networks, which enable various network topologies that facilitate quick communication and synchronization among threads on different servers.


Indirect and direct networks


[Figure: examples of indirect and direct networks]
•  Direct network: each node combines computing and a switch
•  Indirect network: the network is a “box” of switches with no servers inside it

Network metrics

•  Network size: number of nodes
•  Node degree: number of ports per switch (switch complexity)
•  Network diameter: longest shortest path between any two nodes in the network
•  Network bandwidth: best-case total bandwidth
•  Bisection bandwidth: worst-case total bandwidth when one half of the nodes communicates with the other half (see the sketch after this list)
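To make these metrics concrete, the sketch below evaluates them for an N-node ring; the choice of a ring (one of the direct networks covered later) is my own, not from the slides.

```python
# Sketch: the four metrics above, evaluated for an N-node ring
# (each node has one switch with links to its two ring neighbors).

def ring_metrics(n):
    return {
        "size": n,              # number of nodes
        "degree": 2 + 1,        # two ring links plus one port to the node
        "diameter": n // 2,     # longest shortest path around the ring
        "bisection_links": 2,   # any half/half cut severs exactly 2 links
    }

print(ring_metrics(8))
# {'size': 8, 'degree': 3, 'diameter': 4, 'bisection_links': 2}
```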


Indirect networks: crossbar switch


•  Offers the best latency, as the diameter is just 1
•  Point-to-point connections between N pairs of nodes can be established as long as the pairs are distinct
•  Bandwidth scales linearly with the number of nodes
•  Size is not scalable, as the number of crosspoint switches grows quadratically with the number of nodes

Indirect networks: Omega and butterfly


[Figures: Omega network and butterfly network]

•  Number of switches = (N/2) log N → addresses the size-scalability problem of the crossbar
•  Diameter grows as log N
•  The Omega network uses the perfect-shuffle exchange
•  More prone to contention: consider node 0 communicating with node 5 while node 4 wants to communicate with node 6 in the Omega network (see the sketch below)
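A small sketch, assuming destination-tag self-routing in an 8-input Omega network, that reproduces the contention example above: the requests 0 → 5 and 4 → 6 both need the same line after the first stage.

```python
# Sketch: destination-tag (self-routing) in an N=8 Omega network, used to
# show the contention example from the slide: 0 -> 5 collides with 4 -> 6.

N = 8
NBITS = 3  # log2(N) stages of 2x2 switches

def shuffle(pos):
    """Perfect-shuffle exchange: rotate the 3-bit line address left by one."""
    return ((pos << 1) | (pos >> (NBITS - 1))) & (N - 1)

def route(src, dst):
    """Return the line occupied after each stage for a message src -> dst."""
    pos, trace = src, []
    for stage in range(NBITS):
        pos = shuffle(pos)                        # wiring between stages
        bit = (dst >> (NBITS - 1 - stage)) & 1    # routing bit for this stage
        pos = (pos & ~1) | bit                    # 2x2 switch: upper/lower out
        trace.append(pos)
    assert pos == dst                             # self-routing reaches dst
    return trace

a, b = route(0, 5), route(4, 6)
print("0 -> 5 uses lines", a)                     # [1, 2, 5]
print("4 -> 6 uses lines", b)                     # [1, 3, 6]
conflicts = [s for s, (x, y) in enumerate(zip(a, b)) if x == y]
print("contention at stage(s):", conflicts)       # stage 0: both need line 1
```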

Indirect networks: trees


Binary tree
•  Number of switches = N - 1
•  Simple structure
•  Bisection BW = 1

Fat tree
•  Links do not all have the same BW; links get thicker (i.e., more BW) closer to the root
•  How do we build a fat tree out of skinny switches?

Direct networks: linear arrays and rings


•  Great aggregate BW → can exchange N - 1 messages at a time
•  Bisection BW = 1 for an array and 2 for a ring
•  The ring can be laid out (folded) to keep links short

Direct networks: meshes and tori


[Figures: mesh, 2D torus, 3D torus]

•  Improves bisection BW and diameter
•  Increases the number of links and requires more ports per switch (i.e., the degree increases)
•  What is the diameter? (see the sketch below)
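As a hedged answer to the diameter question, here is a small sketch using the standard formulas for a k x k mesh and torus.

```python
# Sketch: diameters of a k x k 2D mesh and 2D torus (standard formulas,
# offered as a hedged answer to the question on the slide).

def mesh_diameter(k):
    # Farthest pair: opposite corners -> (k - 1) hops in each dimension.
    return 2 * (k - 1)

def torus_diameter(k):
    # Wraparound links halve the worst-case distance per dimension.
    return 2 * (k // 2)

for k in (4, 8):
    print(f"{k}x{k}: mesh {mesh_diameter(k)}, torus {torus_diameter(k)}")
# 4x4: mesh 6, torus 4
# 8x8: mesh 14, torus 8
```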

Designing a torus topology for an HPC cluster
•  Cluster of 192 blades; each 10U chassis holds 16 blades and a switch with 32 ports
•  Racks: 12 10U chassis → 4 chassis per rack for a total of 3 racks
•  2D torus: a 4x3 torus; each switch has 32 ports → 16 ports to the blades and 16 ports to the four neighbors (4 cables to each; port budget checked in the sketch below)


[example from clusterdesign.org]
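A small sketch that re-checks the port budget in this example; all the numbers come from the slide above.

```python
# Sketch: port budget for one chassis switch in the 4x3 2D torus example.

BLADES_PER_CHASSIS  = 16
SWITCH_PORTS        = 32
TORUS_NEIGHBORS     = 4      # up/down/left/right in a 2D torus
CABLES_PER_NEIGHBOR = 4      # trunked links to each neighbor

blade_ports   = BLADES_PER_CHASSIS                      # 16
network_ports = TORUS_NEIGHBORS * CABLES_PER_NEIGHBOR   # 16

assert blade_ports + network_ports == SWITCH_PORTS
print("ports to blades:", blade_ports)
print("ports to torus neighbors:", network_ports)
print("chassis needed for 192 blades:", 192 // BLADES_PER_CHASSIS)  # 12
```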

Direct networks: hypercubes


•  n-dimensional hypercubes
•  Number of nodes = 2^n
•  One hop to n neighbors → n + 1 ports per switch; diameter = n
•  What is the bisection bandwidth? (see the sketch below)
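As a hedged answer to the bisection question, here is a small sketch that splits the hypercube on its most significant address bit and counts the links crossing the cut; the count comes out to 2^(n-1) = N/2 links.

```python
# Sketch: bisection bandwidth of an n-dimensional hypercube, counted as the
# number of links crossing a half/half split on the most significant bit.

def hypercube_bisection_links(n):
    crossing = 0
    for u in range(2 ** n):
        for d in range(n):                # each node links to n neighbors
            v = u ^ (1 << d)              # flip one address bit
            if u < v and (u >> (n - 1)) != (v >> (n - 1)):
                crossing += 1             # link crosses the MSB cut
    return crossing

for n in (2, 3, 4):
    print(f"{n}-cube bisection links:", hypercube_bisection_links(n))
# 2-cube: 2, 3-cube: 4, 4-cube: 8  ->  2^(n-1) = N/2 links
```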

Summary


Supercomputer and data center issues:
•  Performance evaluation
•  Server makeup
•  Storage hierarchy
•  Network topologies
•  Support infrastructure for cooling and power delivery
•  Power consumption concerns