Fault tolerance, malleability and migration for divide-and-conquer
applications on the Grid
Gosia Wrzesińska, Rob V. van Nieuwpoort, Jason Maassen, Henri E. Bal
Vrije Universiteit, Ibis project
Distributed supercomputing
• Parallel processing on geographically distributed computing systems (grids)
• Needed:
– Fault tolerance: survive node crashes (also entire clusters)
– Malleability: add or remove machines at runtime
– Migration: move a running application to another set of machines (comes for free with malleability)
• We focus on divide-and-conquer applications
[Map: clusters in Leiden, Delft, Berlin and Brno, connected via the Internet]
Outline
• The Ibis grid programming environment
• Satin: a divide-and-conquer framework
• Fault tolerance, malleability and migration in Satin
• Performance evaluation
The Ibis system
• Java-centric => portability
– "write once, run anywhere"
• Efficient communication
– Efficient pure-Java implementation
– Optimized solutions for special cases with native code
• High-level programming models:
– Divide & Conquer (Satin)
– Remote Method Invocation (RMI)
– Replicated Method Invocation (RepMI)
– Group Method Invocation (GMI)
http://www.cs.vu.nl/ibis/
Satin: divide-and-conquer on the Grid
• Effective paradigm for Grid applications (hierarchical)
• Satin: Grid-friendly load balancing (aware of cluster hierarchy)
• Missing support for:
– Fault tolerance
– Malleability
– Migration
[Figure: fib(5) execution tree; subtrees are distributed over cpu 1, cpu 2 and cpu 3]
Example: Fibonacci
class Fib {
    int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1);
        int y = fib(n - 2);
        return x + y;
    }
}
Single-threaded Java
Example: Fibonacci
public interface FibInter extends ibis.satin.Spawnable {
    public int fib(int n);
}

class Fib extends ibis.satin.SatinObject implements FibInter {
    public int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1); /* spawned */
        int y = fib(n - 2); /* spawned */
        sync();
        return x + y;
    }
}
Compiling Satin programs
[Diagram: source → Java compiler → bytecode → bytecode rewriter → bytecode → JVMs]
Executing Satin programs
• Spawn: put work in the work queue
• Sync:
– Run work from the queue
– If empty: steal (load balancing)
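The spawn/sync mechanics above can be sketched with a plain double-ended work queue. The class and method names below are illustrative, not Satin's internal API, and a real runtime would steal over the network instead of returning when the queue is empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the spawn/sync work queue; not Satin's
// actual implementation, which also does distributed stealing.
public class WorkQueue {
    private final Deque<Runnable> jobs = new ArrayDeque<>();

    // spawn: put work in the local work queue
    public void spawn(Runnable job) {
        jobs.addLast(job);
    }

    // sync: run work from the queue; a worker whose queue is empty
    // would steal from a victim instead of returning
    public void sync() {
        while (!jobs.isEmpty()) {
            jobs.removeLast().run();   // newest job first (LIFO locally)
        }
    }

    // steal: a thief takes the oldest job (the FIFO end), which is
    // usually the largest remaining subtree
    public Runnable steal() {
        return jobs.pollFirst();
    }
}
```

Running local work LIFO keeps the working set small, while stealing from the FIFO end transfers large subtrees, so steals stay rare.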
Example application: Fibonacci
[Figure: Fib(5) execution tree divided over processor 1 (the master), processor 2 and processor 3; partial results 2, 5 and 8 flow back to the master]
Satin: load balancing for Grids
• Random Stealing (RS)
– Pick a victim at random
– Provably optimal on a single cluster (Cilk)
– Problems on multiple clusters:
• with C clusters, (C-1)/C of steal attempts go over the WAN
• synchronous protocol: the thief waits idle for the reply
Grid-aware load balancing
• Cluster-aware Random Stealing (CRS) [van Nieuwpoort et al., PPoPP 2001]
– When idle:
• Send an asynchronous steal request to a random node in a different cluster
• In the meantime, steal locally (synchronously)
• Only one wide-area steal request at a time
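The CRS rule above amounts to a small piece of bookkeeping. The sketch below is an assumption about how that logic could look in Java; the class and method names are hypothetical, not Satin's actual code.

```java
import java.util.Random;

// Hypothetical sketch of the CRS decision logic: at most one
// asynchronous wide-area steal request is in flight, while local
// synchronous steals continue in the meantime.
public class Crs {
    private final Random rnd;
    private boolean wideAreaStealPending = false;

    public Crs(long seed) { this.rnd = new Random(seed); }

    // Returns true if a new wide-area steal request may be sent now.
    public boolean tryStartWideAreaSteal() {
        if (wideAreaStealPending) return false;
        wideAreaStealPending = true;   // only one request at a time
        return true;
    }

    // Called when the wide-area reply (work or "no work") arrives.
    public void wideAreaReplyReceived() {
        wideAreaStealPending = false;
    }

    // While waiting, pick a random victim in the local cluster for a
    // synchronous steal; never pick yourself.
    public int pickLocalVictim(int clusterSize, int self) {
        int v = rnd.nextInt(clusterSize - 1);
        return v >= self ? v + 1 : v;
    }
}
```

The point of the single-outstanding-request rule is that WAN latency is hidden behind useful local stealing instead of blocking the processor.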
Satin raytracer on a grid
• GridLab testbed: 5 cities in Europe
• 40 cpus
• Distance up to 2000 km
• Factor of 10 difference in CPU speeds
• Latencies:
– 0.2 – 210 ms daytime
– 0.2 – 66 ms at night
• Bandwidth: 9 KB/s – 11 MB/s
• Three orders of magnitude difference in communication speeds
Configuration
CPUs     CPU        OS       Type     Location
1 x 16   MIPS       Irix     SMP      ZIB Berlin, Germany
1 x 4    Alpha      Tru64    SMP      Lecce, Italy
1 x 2    Pentium-3  Linux    SMP      Cardiff, Wales, UK
4 x 2    Xeon       Linux    Cluster  Brno, Czech Republic
1 x 2    Sparc      Solaris  SMP      Amsterdam, The Netherlands
8 x 1    Pentium-3  Linux    Cluster  Amsterdam, The Netherlands
CRS performance on GridLab testbed
[Bar chart: efficiency in % (0–100) for a single cluster, RS at night, CRS at night, RS during daytime, and CRS during daytime]
Satin summary
• Satin allows rapid development of parallel applications that run with high efficiency in geographically distributed, highly heterogeneous environments
• Applications:
– Barnes-Hut, Raytracer, SAT solver, TSP, Knapsack
– Also all master-worker algorithms (a special case of divide-and-conquer)
• Still missing: FT, malleability, migration
Fault tolerance, malleability, migration
• Can be implemented by handling processors joining or leaving the ongoing computation
• Processors may leave either unexpectedly (crash) or gracefully
• Handling joining processors is trivial:
– Let them start stealing jobs
• Handling leaving processors is harder:
– Recompute missing jobs
– Problems: orphan jobs, and partial results from gracefully leaving processors
Crashing processors

[Figure sequence: an execution tree of jobs 1–15 distributed over processors 1, 2 and 3; processor 2 crashes; processor 1 starts recomputing the missing subtree, while jobs that processor 3 had stolen from the crashed processor are left dangling]

Problem: orphan jobs – jobs stolen from crashed processors
Handling orphan jobs
• For each finished orphan, broadcast a (jobID, processorID) tuple; abort the rest
• All processors store the tuples in orphan tables
• Processors perform a lookup in the orphan table for each recomputed job
• If the lookup succeeds: send a result request to the owner (asynchronously) and put the job on the stolen-jobs list
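The orphan-table bookkeeping above can be sketched as a simple map. Job identifiers are simplified to strings here, and the class and method names are illustrative rather than Satin's real internals.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an orphan table: every processor records the broadcast
// (jobID, processorID) tuples and consults them before recomputing.
public class OrphanTable {
    // jobID -> processor that holds the finished orphan's result
    private final Map<String, Integer> table = new HashMap<>();

    // Invoked on every processor when a tuple is broadcast.
    public void recordOrphan(String jobId, int ownerCpu) {
        table.put(jobId, ownerCpu);
    }

    // Before recomputing a job, look it up. On a hit, the caller
    // sends an asynchronous result request to the owner and puts the
    // job on its stolen-jobs list instead of re-executing it.
    public Integer lookup(String jobId) {
        return table.get(jobId);
    }
}
```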
Handling orphan jobs - example

[Figure sequence: processor 2 crashes; processor 3 finishes its orphans 9 and 15 and broadcasts the tuples (9, cpu3) and (15, cpu3); processor 1 stores them in its orphan table, and when it recomputes those jobs it requests the finished results from processor 3 instead of re-executing them]
Processors leaving gracefully

[Figure sequence: processor 2 leaves gracefully and sends its finished jobs to processor 3, which treats them as orphans and broadcasts the tuples (11, cpu3), (9, cpu3) and (15, cpu3); processor 1 then reuses these results while recomputing the tree]

Send results to another processor; treat those results as orphans
A crash of the master
• Master: the processor that started the computation by spawning the root job
• If the master crashes:
– Elect a new master
– Execute normal crash recovery
– The new master restarts the application
– In the new run, all results from the previous run are reused
Some remarks about scalability
• Little data is broadcast (< 1% of jobs; only pointers)
• Message combining
• Lightweight broadcast: no need for reliability, synchronization, etc.
Performance evaluation
• Leiden, Delft (DAS-2) + Berlin, Brno (GridLab)
• Bandwidth: 62 – 654 Mbit/s
• Latency: 2 – 21 ms
Impact of saving partial results
[Bar chart: runtime in seconds (0–450) on the wide-area DAS-2 + GridLab testbed for four scenarios: 1 cluster leaves unexpectedly without saving orphans, 1 cluster leaves unexpectedly with saving orphans, 1 cluster leaves gracefully, and 1.5/3.5 clusters with no crashes; configurations: 16 cpus Leiden + 16 cpus Delft, and 8 cpus Leiden + 8 cpus Delft + 4 cpus Berlin + 4 cpus Brno]
Migration overhead
[Bar chart: runtime in seconds (0–350) with and without migration, on 8 cpus in Leiden, 4 in Berlin and 4 in Brno; the Leiden cpus are replaced by Delft cpus]
Crash-free execution overhead
[Bar chart: speedup on 32 cpus (in Delft) for Raytracer, TSP, SAT solver and Knapsack, comparing plain Satin with malleable Satin]
Summary
• Satin implements fault tolerance, malleability and migration for divide-and-conquer applications
• Partial results are saved by repairing the execution tree
• Applications can adapt to a changing number of cpus and migrate without loss of work (overhead < 10%)
• Outperforms the traditional approach by 25%
• No overhead during crash-free execution
Further information
Publications and a software distribution are available at:
http://www.cs.vu.nl/ibis/
Additional slides
Ibis design
Partial results on leaving cpus
If processors leave gracefully:
• Send all finished jobs to another processor
• Treat those jobs as orphans, i.e. broadcast (jobID, processorID) tuples
• Execute the normal crash recovery procedure
Job identifiers
• rootId = 1
• childId = parentId * branching_factor + child_no
• Problem: need to know maximal branching factor of the tree
• Solution: strings of bytes, one byte per tree level
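The byte-string solution can be sketched as follows. The class is illustrative, but it follows the rule stated above: one byte per tree level instead of childId = parentId * branching_factor + child_no, so no global maximal branching factor is needed, and ancestry becomes a simple prefix test.

```java
import java.util.Arrays;

// Sketch of tree-level job identifiers: a byte string with one byte
// per level replaces the numeric childId encoding.
public class JobId {
    private final byte[] path;   // path[i] = child number at level i

    public JobId() { this.path = new byte[0]; }   // root job, level 0
    private JobId(byte[] path) { this.path = path; }

    // ID of the childNo-th child: append one byte for the new level.
    public JobId child(int childNo) {
        byte[] p = Arrays.copyOf(path, path.length + 1);
        p[p.length - 1] = (byte) childNo;
        return new JobId(p);
    }

    // A job is an ancestor of another iff its path is a proper prefix;
    // this is what recovery needs to locate orphaned subtrees.
    public boolean isAncestorOf(JobId other) {
        if (other.path.length <= path.length) return false;
        for (int i = 0; i < path.length; i++) {
            if (path[i] != other.path[i]) return false;
        }
        return true;
    }

    public int level() { return path.length; }
}
```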
Distributed ASCI Supercomputer (DAS) – 2
Node configuration:
• Dual 1 GHz Pentium-III
• >= 1 GB memory
• 100 Mbit Ethernet + Myrinet
• Linux

Sites, connected by GigaPort (1-10 Gb): VU (72 nodes), UvA (32), Leiden (32), Delft (32), Utrecht (32)
GridLab testbed
[Map of the GridLab testbed sites]
interface FibInter extends ibis.satin.Spawnable {
    public int fib(int n);
}

class Fib extends ibis.satin.SatinObject implements FibInter {
    public int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1);
        int y = fib(n - 2);
        sync();
        return x + y;
    }
}
Java + divide&conquer
Example
Grid results
Program      Sites  CPUs  Efficiency
Compression    3     22     67 %
SAT solver     5     28     88 %
Raytracer      5     40     81 %

• Efficiency based on normalization to a single CPU type (1 GHz P3)