Designing Parallel Operating Systems using Modern Interconnects

Eitan Frachtenberg ([email protected])
MIT, 20-Sep-2004

With Fabrizio Petrini, Juan Fernandez, Dror Feitelson, Jose-Carlos Sancho, Kei Davis

PAL, CCS-3
Computer and Computational Sciences Division
Los Alamos National Laboratory

Ideas that change the world


Cluster Supercomputers

- Growing in prevalence and performance: 7 of the top 10 supercomputers
- Running parallel applications
- Advanced, high-end interconnects


Distributed vs. Parallel

Distributed and parallel applications (including operating systems) may be distinguished by their use of global and collective operations

Distributed—local information, relatively small number of point-to-point messages

Parallel—global synchronization: barriers, reductions, exchanges


System Software Components

[Diagram: system software and its components]
- Job Scheduling
- Fault Tolerance
- Parallel I/O
- Communication Library
- Resource Management


Problems with System Software

Independent single-node OS (e.g. Linux) connected by distributed dæmons:
- Redundant components
- Performance hits
- Scalability issues
- Load balancing issues


OS’s Collective Operations

Many OS tasks are inherently global or collective operations:

- Job launching, data dissemination
- Context switching
- Job termination (normal and forced)
- Load balancing


Global Parallel Operating System

[Diagram: Node 1 and Node 2 each run a local operating system with its own resource management, parallel I/O, fault tolerance, job scheduling, and user-level communication; these per-node stacks are contrasted with a single global parallel operating system that provides job scheduling, fault tolerance, communication, parallel I/O, and resource management]


The Vision

Modern interconnects are very powerful:
- Collective operations
- Programmable NICs
- On-board RAM

Use a small set of network mechanisms as parallel OS infrastructure

Build upon this infrastructure to create unified system software

System software inherits scalability and performance from network features


Example: ASCI Q Barrier [HotI’03]


Parallel OS Primitives

System software built atop three primitives:
- Xfer-And-Signal
  - Transfer block of data to a set of nodes
  - Optionally signal local/remote event upon completion
- Compare-And-Write
  - Compare global variable on a set of nodes
  - Optionally write global variable on the same set of nodes
- Test-Event
  - Poll local event
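The three primitives can be illustrated with a minimal single-process sketch. This is not the QsNet API: the `Node` class and the function names `xfer_and_signal`, `compare_and_write`, and `poll_event` are hypothetical stand-ins, and the rule that the optional write happens only when the comparison holds on every node is an assumption made for this example.

```python
from dataclasses import dataclass, field
import operator


@dataclass
class Node:
    """A simulated node: named memory blocks, global variables, event flags."""
    memory: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)
    events: set = field(default_factory=set)


def xfer_and_signal(src, dests, key, local_event=None, remote_event=None):
    """Copy src.memory[key] to every destination, then optionally signal events."""
    for node in dests:
        node.memory[key] = src.memory[key]
        if remote_event is not None:
            node.events.add(remote_event)
    if local_event is not None:
        src.events.add(local_event)


def compare_and_write(nodes, var, op, value, write_var=None, write_value=None):
    """Return True if op(variable, value) holds on every node.

    Assumption: the optional write is performed only when the comparison
    succeeds everywhere.
    """
    ok = all(op(node.variables[var], value) for node in nodes)
    if ok and write_var is not None:
        for node in nodes:
            node.variables[write_var] = write_value
    return ok


def poll_event(node, event):
    """Test-Event: poll a local event and consume it if it is set."""
    if event in node.events:
        node.events.remove(event)
        return True
    return False


# Usage: disseminate a job image, then check that all nodes are ready.
source = Node(memory={"job": b"binary image"})
dests = [Node(variables={"ready": 1}) for _ in range(4)]
xfer_and_signal(source, dests, "job", local_event="sent", remote_event="arrived")
all_ready = compare_and_write(dests, "ready", operator.eq, 1,
                              write_var="go", write_value=1)
```

In this sketch a job launch reduces to one transfer, one global query, and local polling, which is the pattern the later slides build on.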


Core Primitives on QsNet

System software built atop three primitives. Xfer-And-Signal (QsNet):
- Node S transfers block of data to nodes D1, D2, D3 and D4
- Events triggered at source and destinations

[Diagram: source S sends to D1, D2, D3 and D4; a source event is raised at S and destination events are raised at D1 to D4]


Core Primitives (cont.)

System software built atop three primitives. Compare-And-Write (QsNet):
- Node S compares variable V on nodes D1, D2, D3 and D4
- Query: is V related to Value by the chosen comparison operator (for example >)?

[Diagram: S queries D1, D2, D3 and D4]


Core Primitives (cont.)

System software built atop three primitives. Compare-And-Write (QsNet):
- Node S compares variable V on nodes D1, D2, D3 and D4
- Partial results are combined in the switches

[Diagram: replies from D1, D2, D3 and D4 are combined on the way back to S]
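Combining partial results inside the network can be pictured as a tree reduction. The sketch below is illustrative only: the function names are hypothetical, and the switch fan-out of 4 is an assumption chosen for the example, not a figure taken from the slides.

```python
import operator


def combine_in_switches(results, fanout=4):
    """AND-reduce per-node boolean results level by level.

    Each pass models one stage of switches merging up to `fanout` replies.
    Requires at least one result. Returns (combined_result, stages_used).
    """
    level = list(results)
    stages = 0
    while len(level) > 1:
        level = [all(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        stages += 1
    return level[0], stages


def compare_on_nodes(values, op, target, fanout=4):
    """Evaluate op(V, target) on every node, then combine the replies."""
    partial = [op(v, target) for v in values]
    return combine_in_switches(partial, fanout)
```

With a fixed fan-out the number of combining stages grows logarithmically with the node count, so the source receives a single reply instead of one per node.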


System Software Components

[Diagram: system software and its components, with Resource Management highlighted]
- Job Scheduling
- Fault Tolerance
- Parallel I/O
- Communication Library
- Resource Management (current topic)


Scalable Tool for Resource Management (STORM)

- Inherits scalability from network primitives:
  - Data dissemination and coordination
  - Interactive job launching speeds
  - Context switching at the millisecond level
- Described in [SC’02]


State of the Art in Resource Management

Resource managers (e.g. PBS, LSF, RMS, LoadLeveler, Maui) are typically implemented using:
- TCP/IP, which favors portability over performance
- Poorly scaling algorithms for the distribution and collection of data and control messages
- Design choices that favor development time over performance

Scalable performance is not important for small clusters but is crucial for large ones.

There is a need for fast and scalable resource management.


Experimental Setup

- 64 nodes/256 processors ES40 Alphaserver cluster
- 2 independent network rails of Quadrics Elan3
- Files are placed in ramdisk in order to avoid I/O bottlenecks and expose the performance of the resource management algorithms


Launch Times (Unloaded System)

The launch time is constant when we increase the number of processors.

STORM is highly scalable


Launch Times (Loaded System, 12 MB)

Worst case: 1.5 seconds to launch a 12 MB file on 256 processors


Measured and Estimated Launch Times

The model shows that in an ES40-based Alphaserver a 12MB binary can be launched in 135ms on 16,384 nodes
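As a back-of-the-envelope check, the launch figures can be turned into an effective per-node delivery rate. This is plain arithmetic on the numbers quoted in the slides, not the authors' performance model; the helper name is hypothetical, and launch time includes more than data transfer, so the rate is only a rough lower bound on transfer speed.

```python
def effective_rate_mb_per_s(size_mb, launch_time_s):
    """Binary size divided by total launch time."""
    return size_mb / launch_time_s


# Modeled case: 12 MB binary launched in 135 ms on 16,384 nodes.
modeled_rate = effective_rate_mb_per_s(12, 0.135)

# Measured worst case on the loaded system: 12 MB in 1.5 s on 256 processors.
loaded_rate = effective_rate_mb_per_s(12, 1.5)

# Total data delivered in the modeled case if every node receives a full copy.
aggregate_mb = 12 * 16384
```

The modeled case works out to roughly 89 MB/s per node, with about 196,608 MB delivered in aggregate, which is only plausible if dissemination cost is nearly independent of node count.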


Comparative Evaluation (Measured & Modeled)


System Software Components

[Diagram: system software and its components, with Job Scheduling highlighted]
- Job Scheduling (current topic)
- Fault Tolerance
- Parallel I/O
- Communication Library
- Resource Management


Job Scheduling

Controls the allocation of space and time resources to jobs

HPC apps have special requirements:
- Multiple processing and network resources
- Synchronization (< 1 ms granularity)
- Potentially memory hogs with little locality

Has significant effect on throughput, responsiveness, and utilization


First-Come-First-Serve (FCFS)


Gang Scheduling (GS)


Implicit CoScheduling


Hybrid Methods

- Combine global synchronization & local information
- Rely on scalable primitives for global coordination and information exchange
- First implementation of two novel algorithms:
  - Flexible CoScheduling (FCS)
  - Buffered CoScheduling (BCS)


Flexible CoScheduling (FCS)

- Measure communication characteristics, such as granularity and wait times
- Classify processes based on synchronization requirements
- Schedule processes based on class
- Described in [IPDPS’03]


FCS Classification

[Diagram: processes are classified along two axes, granularity (fine or coarse) and block times (short or long)]

- CS: always gang-scheduled
- F: preferably gang-scheduled
- DC: locally scheduled
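A classifier along these two axes might look like the sketch below. The slide text does not preserve which cell of the grid maps to which class, so the mapping used here (fine granularity with short block times gives CS, fine granularity with long block times gives F, coarse granularity gives DC) is an assumption, and both thresholds are hypothetical values picked for illustration.

```python
from enum import Enum


class ProcClass(Enum):
    CS = "always gang-scheduled"
    F = "preferably gang-scheduled"
    DC = "locally scheduled"


# Hypothetical thresholds, not taken from the slides.
FINE_GRANULARITY_MS = 10.0
SHORT_BLOCK_MS = 1.0


def classify(granularity_ms, block_time_ms):
    """Map measured granularity and block time to a scheduling class.

    Assumed mapping: coarse-grained processes are DC; fine-grained ones
    are CS when they block briefly and F when they block for long.
    """
    if granularity_ms >= FINE_GRANULARITY_MS:
        return ProcClass.DC
    if block_time_ms <= SHORT_BLOCK_MS:
        return ProcClass.CS
    return ProcClass.F
```

A scheduler built on this would re-measure and re-classify processes periodically, since the classification depends on run-time behavior rather than on static job properties.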


Methodology

- Synthetic, controllable MPI programs
- Workload
  - Static: all jobs start together
  - Dynamic: different sizes, arrival and run times
- Various schedulers implemented: FCFS, GS, FCS, SB (ICS), BCS
- Emulation vs. simulation
  - Actual implementation takes into account all the overhead and factors of a real system


Hardware Environment

Environment ported to three architectures and clusters:
- Crescendo: 32x2 Pentium III, 1GB
- Accelerando: 32x2 Itanium II, 2GB
- Wolverine: 64x4 Alpha ES40, 8GB


Synthetic Application

- Bulk synchronous, 3 ms basic granularity
- Can control: granularity, variability and communication pattern
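A bulk-synchronous synthetic application of this kind can be approximated with threads and a barrier. This is a sketch under stated assumptions, not the authors' MPI program: `run_bsp` and its parameters are hypothetical, `time.sleep` stands in for the compute phase, and a barrier stands in for the communication pattern.

```python
import random
import threading
import time


def run_bsp(num_procs=4, iterations=5, granularity_s=0.003, variability=0.0):
    """Run `iterations` compute-then-synchronize steps on `num_procs` threads.

    granularity_s: length of each compute phase (3 ms by default).
    variability: fraction in [0, 1] by which compute time may deviate.
    Returns the number of completed iterations per rank.
    """
    barrier = threading.Barrier(num_procs)
    completed = [0] * num_procs

    def worker(rank):
        rng = random.Random(rank)  # per-rank generator for repeatable jitter
        for _ in range(iterations):
            jitter = 1.0 + rng.uniform(-variability, variability)
            time.sleep(granularity_s * jitter)  # stand-in for computation
            barrier.wait()                      # stand-in for communication
            completed[rank] += 1

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed
```

Raising `variability` makes some ranks arrive late at the barrier, which is one simple way to produce load imbalance between processes.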


Synthetic Scenarios

- Balanced
- Complementing
- Imbalanced
- Mixed


Turnaround Time

[Bar chart: turnaround time (axis 0 to 400) in the Balanced, Imbalanced, Complementing, and Mixed scenarios for FCFS, GS, SB, FCS, and Optimal]


Dynamic Workloads [JSSPP’03]

- Static workloads are simple and offer insights, but are not realistic
- Most real-life workloads are more complex
- Users submit jobs dynamically, with varying time and space requirements


Dynamic Workload Methodology

- Emulation using a workload model [Lublin03]
- 1000 jobs, approx. 12 days, shrunk to 2 hrs
- Varying load by factoring arrival times
- Using same synthetic application, with random:
  - Arrival time, run time, and size, based on model
  - Granularity (fine, medium, coarse)
  - Communication pattern (ring, barrier, none)
- Recent study with scientific apps (yet unpublished)
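The time compression and the load variation are both simple rescalings. The sketch below works through the arithmetic; the job tuple layout and the definition of offered load (requested processor-seconds divided by available processor-seconds over the arrival span) are assumptions made for this example, and the function names are hypothetical.

```python
def shrink_factor(original_days=12, target_hours=2):
    """Compression needed to fit the trace into the emulation window."""
    return original_days * 24 / target_hours


def factor_arrivals(jobs, factor):
    """Multiply every arrival time by `factor`; run times and sizes are kept.

    jobs: list of (arrival_s, run_s, size_in_processors) tuples.
    """
    return [(arrival * factor, run, size) for arrival, run, size in jobs]


def offered_load(jobs, num_procs):
    """Requested processor-seconds over available ones (assumed definition)."""
    arrivals = [arrival for arrival, _, _ in jobs]
    span = max(arrivals) - min(arrivals)
    work = sum(run * size for _, run, size in jobs)
    return work / (span * num_procs)


jobs = [(0, 10, 2), (100, 10, 2), (200, 10, 2)]
base = offered_load(jobs, num_procs=4)
doubled = offered_load(factor_arrivals(jobs, 0.5), num_procs=4)
```

Shrinking 12 days to 2 hours corresponds to a factor of 144, and halving the inter-arrival times doubles the offered load under this definition.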


Load – Response Time


Load – Bounded Slowdown


Timeslice – Response Time


System Software Components

[Diagram: system software and its components, with Communication Library highlighted]
- Job Scheduling
- Fault Tolerance
- Parallel I/O
- Communication Library (current topic)
- Resource Management


Buffered CoScheduling (BCS)

- Buffer all communications
- Exchange information about pending communication every time slice
- Schedule and execute communication
- Implemented mostly on the NIC
- Requires fine-grained heartbeats
- Described in [SC’03]


Design and Implementation

- Global synchronization
  - Strobe sent at regular intervals (time slices)
    - Compare-And-Write + Xfer-And-Signal (Master)
    - Test-Event (Slaves)
  - All system activities are tightly coupled
- Global scheduling
  - Exchange of communication requirements
    - Xfer-And-Signal + Test-Event
  - Communication scheduling
  - Real transmission
    - Xfer-And-Signal + Test-Event
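The per-time-slice cycle (buffer, exchange requirements, schedule, transmit) can be sketched as a small simulation. The class `BcsSim` and its methods are hypothetical, and the scheduling policy shown, at most one message delivered to each destination per slice, is an assumption chosen to make the scheduling step visible, not the policy of the actual implementation.

```python
from collections import deque


class BcsSim:
    """Toy model of buffered communication driven by a global strobe."""

    def __init__(self, num_nodes):
        self.pending = [deque() for _ in range(num_nodes)]  # buffered descriptors
        self.inbox = [[] for _ in range(num_nodes)]         # delivered messages
        self.slices = 0

    def post_send(self, src, dst, payload):
        """Application call: buffer a descriptor, transmit nothing yet."""
        self.pending[src].append((dst, payload))

    def strobe(self):
        """Run one time slice and return the number of messages transmitted."""
        # Phase 1: exchange of communication requirements.
        requirements = [(src, dst, payload)
                        for src, queue in enumerate(self.pending)
                        for dst, payload in queue]
        # Phase 2: communication scheduling (assumed policy: one per destination).
        scheduled, busy = [], set()
        for src, dst, payload in requirements:
            if dst not in busy:
                busy.add(dst)
                scheduled.append((src, dst, payload))
        # Phase 3: real transmission of the scheduled messages.
        for src, dst, payload in scheduled:
            self.pending[src].remove((dst, payload))
            self.inbox[dst].append((src, payload))
        self.slices += 1
        return len(scheduled)
```

Because nothing is transmitted outside a strobe, the state of all communication is known globally at every slice boundary, which is the property the fault-tolerance slides later rely on.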


Design and Implementation

Implementation in the NIC:
- Application processes interact with NIC threads
  - MPI primitive: descriptor posted to the NIC
  - Communications are buffered
- Cooperative threads running in the NIC
  - Synchronize
  - Partial exchange of control information
  - Schedule communications
  - Perform real transmissions and reduce computations
- Computation and communication completely overlapped


Design and Implementation

Non-blocking primitives: MPI_Isend/Irecv


Design and Implementation

Blocking primitives: MPI_Send/Recv


Performance Evaluation

BCS MPI vs. Quadrics MPI

Experimental setup:
- Benchmarks and applications
  - NPB (IS, EP, MG, CG, LU), Class C
  - SWEEP3D, 50x50x50
  - SAGE, timing.input
- Scheduling parameters
  - 500 μs communication scheduling time slice (1 rail)
  - 250 μs communication scheduling time slice (2 rails)


Performance Evaluation

Benchmarks and Applications (C)

Application         Slowdown
IS (32 PEs)         10.40%
EP (49 PEs)         5.35%
MG (32 PEs)         4.37%
CG (32 PEs)         10.83%
LU (32 PEs)         15.04%
SWEEP3D (49 PEs)    -2.23%
SAGE (62 PEs)       -0.42%


Performance Evaluation

SAGE - timing.input (IA32)

0.5% SPEEDUP


Blocking Communication

Blocking vs. non-blocking SWEEP3D (IA32):
- Blocking: MPI_Send/Recv
- Non-blocking: MPI_Isend/Irecv + MPI_Waitall


System Software Components

[Diagram: system software and its components, with Fault Tolerance highlighted]
- Job Scheduling
- Fault Tolerance (current topic)
- Parallel I/O
- Communication Library
- Resource Management


Fault Tolerance Today

Fault tolerance is commonly achieved, if at all, by:
- Checkpointing
- Segmentation of the machine
- Removal of fault-prone components

Massive hardware redundancy is not considered economically feasible


Our Approach to Fault Tolerance

Recent work shows that scalable, system-level fault tolerance is within reach with current technology, and that it can be achieved with low overhead through a global operating system

Two results provide the basis for this claim:
1. Buffered CoScheduling, which enforces frequent, global recovery lines and global control
2. Feasibility of incremental checkpointing


Checkpointing and Recovery

- Simplicity: easy implementation
- Cost-effective: no additional hardware support
- Critical aspect: bandwidth requirements for saving process state


Reducing Bandwidth

Incremental checkpointing: only the memory modified since the previous checkpoint is saved to stable storage

[Diagram: portion of the process state saved by a full checkpoint compared with an incremental checkpoint]
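The idea can be shown with a minimal page-level sketch. It detects modified pages by comparing content hashes, which is a substitute chosen for portability; a real system would more likely rely on page protection or dirty bits. The class name, the 4096-byte page size, and the in-memory list standing in for stable storage are all assumptions, and the sketch assumes the memory image never shrinks.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size


class IncrementalCheckpointer:
    """Save only the pages that changed since the previous checkpoint."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self._digests = {}   # page index -> digest at the last checkpoint
        self.storage = []    # stand-in for stable storage: one delta per checkpoint

    def checkpoint(self, memory):
        """Store modified pages of `memory` (bytes) and return bytes saved."""
        delta = {}
        for start in range(0, len(memory), self.page_size):
            index = start // self.page_size
            page = memory[start:start + self.page_size]
            digest = hashlib.sha256(page).digest()
            if self._digests.get(index) != digest:
                delta[index] = page
                self._digests[index] = digest
        self.storage.append(delta)
        return sum(len(page) for page in delta.values())

    def restore(self):
        """Rebuild the latest memory image by replaying all deltas in order."""
        pages = {}
        for delta in self.storage:
            pages.update(delta)
        return b"".join(pages[i] for i in sorted(pages))
```

The first checkpoint is necessarily a full one; after that, the bytes written per checkpoint track the application's write working set rather than its total footprint.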


Enabling Automatic Checkpointing

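[Diagram: checkpointing can be implemented at the hardware, operating system, run-time library, or application level; the levels are ranked from low to high on two scales, user intervention and checkpoint data, and the automatic approaches are marked]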


The Bandwidth Challenge

Does current technology provide enough bandwidth for checkpointing that is:
- Frequent
- Automatic


Methodology

- Quantifying the bandwidth requirements
  - Checkpoint intervals: 1 s to 20 s
- Comparing with the bandwidth currently available
  - 900 MB/s: sustained network bandwidth (Quadrics QsNet II)
  - 75 MB/s: sustained bandwidth of a single disk (Ultra SCSI controller)
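The comparison amounts to dividing the data modified per interval by the interval length and checking the result against the two ceilings. The sketch below uses the 900 MB/s and 75 MB/s figures from the slide; the function names and the sample workloads are hypothetical.

```python
NETWORK_MB_PER_S = 900.0  # sustained network bandwidth quoted on the slide
DISK_MB_PER_S = 75.0      # sustained single-disk bandwidth quoted on the slide


def required_bandwidth(modified_mb, interval_s):
    """Bandwidth needed to save `modified_mb` once every `interval_s` seconds."""
    return modified_mb / interval_s


def fits(required_mb_per_s):
    """Report whether the requirement is within each available bandwidth."""
    return {
        "network": required_mb_per_s <= NETWORK_MB_PER_S,
        "single_disk": required_mb_per_s <= DISK_MB_PER_S,
    }


# Hypothetical workloads: 100 MB modified per interval.
relaxed = fits(required_bandwidth(100, 5))   # 20 MB/s
tight = fits(required_bandwidth(100, 1))     # 100 MB/s
```

In the second case the requirement fits the network but not a single disk, which is the kind of gap the measurements that follow are meant to quantify.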


Memory Footprint

Application      Memory footprint
Sage-1000MB      954.6 MB
Sage-500MB       497.3 MB
Sage-100MB       103.7 MB
Sage-50MB        55 MB
Sweep3D          105.5 MB
SP Class C       40.1 MB
LU Class C       16.6 MB
BT Class C       76.5 MB
FT Class C       118 MB

Configuration: 64 Itanium II processors

[Diagram annotation: increasing memory footprint]


Bandwidth Requirements

[Chart: maximum and average bandwidth (MB/s, axis 0 to 300) against timeslice length (1, 5, 10, 20 s) for Sage-1000MB; annotated values 78.8 MB/s and 12.1 MB/s]

Bandwidth requirements decrease as the timeslice grows


Bandwidth Requirements for 1 second

[Chart: maximum and average bandwidth (axis 0 to 300) for Sage 1000MB, Sage 500MB, Sage 100MB, Sage 50MB, Sweep3D, SP, LU, BT, and FT, with a reference line for single SCSI disk performance and a "most demanding" annotation]

Bandwidth requirements increase with memory footprint


Increasing Memory Footprint Size

[Chart: average bandwidth (MB/s, axis 0 to 90) against timeslice length (1, 5, 10, 20 s) for memory footprints of 50MB, 100MB, 500MB, and 1000MB]

Average bandwidth increases sublinearly


Increasing Processor Count

[Chart: average bandwidth (MB/s, axis 0 to 90) against timeslice length (1, 5, 10, 20 s) for 8, 16, 32, and 64 processors, weak scaling]

Average bandwidth decreases slightly with processor count


Technological Trends

[Chart: performance improvement per year (axis 0 to 100) for processor, memory, storage, and network; annotations: "Performance of applications bounded by memory improvements" and "Increases at a faster pace"]


Conclusions

As clusters grow, interconnection technology advances:
- Better bandwidth and latency
- On-board programmable processor, RAM
- Hardware support for collective operations

Allows the development of common system infrastructure that is a parallel program in itself


Conclusions (cont.)

On top of this infrastructure we built:
- Scalable resource management (STORM)
- Novel job scheduling algorithms
- Simplified system design and communication library
- Possible basis for transparent fault tolerance


Conclusions (cont.)

Experimental performance evaluation demonstrates:
  Scalable interactive job launching and context-switching
  Multiprogramming parallel jobs is feasible
  Adaptive scheduling algorithms adjust to different job requirements, improving response times and slowdown in various workloads
  Transparent, frequent checkpointing is within current reach

References

Eitan’s web page

http://www.cs.huji.ac.il/~etcs/pubs/

Fabrizio’s web page

http://www.c3.lanl.gov/~fabrizio/publications.html

PAL team web page:

http://www.c3.lanl.gov/par_arch/Publications.html

Resource Overlapping

Turnaround Time

Response Time

Timeslice – Bounded Slowdown

FCFS vs. GS and MPL

FCFS vs. GS and MPL (2)

Backfilling

Backfilling is a technique to move jobs forward in queue

Can be combined with time-sharing schedulers such as GS when all timeslots are full
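
A minimal sketch of the backfilling idea, assuming a purely space-shared machine and an EASY-style policy (only the head of the queue holds a reservation). The slides do not specify the variant used, and the combination with GS timeslots is not modeled here. The names `Job` and `backfill_candidates` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int      # processors requested
    runtime: float  # user-supplied runtime estimate


def backfill_candidates(queue, running, total_procs, now=0.0):
    """Return the names of queued jobs that may start at time `now`.

    queue   -- list of Job in arrival (FCFS) order
    running -- list of (end_time, procs) for jobs currently executing
    """
    running = list(running)
    free = total_procs - sum(p for _, p in running)
    pending = list(queue)
    started = []

    # Plain FCFS: start jobs from the head of the queue while they fit.
    while pending and pending[0].procs <= free:
        job = pending.pop(0)
        free -= job.procs
        running.append((now + job.runtime, job.procs))
        started.append(job.name)
    if not pending:
        return started

    # Reservation for the blocked head job: the earliest time (the
    # "shadow time") at which enough processors will have been released.
    head = pending[0]
    avail = free
    shadow = float("inf")
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            shadow = end
            break
    extra = avail - head.procs  # processors the head job will not need

    # Backfill: a later job may jump ahead if it fits in the free
    # processors and either finishes before the shadow time or uses
    # only processors the head job will not need.
    for job in pending[1:]:
        if job.procs > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.procs
            started.append(job.name)
        elif job.procs <= extra:
            free -= job.procs
            extra -= job.procs
            started.append(job.name)
    return started
```

For example, on an 8-processor machine with one running job holding 6 processors until time 10, a queued 6-processor job must wait, but a 2-processor job with an estimated runtime of 8 behind it can start immediately without delaying it.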

Effect of Backfilling

Characterization

[Figure: Sage-1000MB; annotations: Data initialization, Regular processing bursts]

Communication

[Figure: Sage-1000MB; annotations: Interleaved, Regular communication bursts]