© 2014 IBM Corporation
Writing Fault-Tolerant Applications Using Resilient X10
2014/06/12
Kiyokuni Kawachiya (presentation by Mikio Takeuchi)
IBM Research - Tokyo
X10 Workshop 2014 at Edinburgh, UK
This work was funded in part by the Air Force Office of Scientific Research under Contract No. FA8750-13-C-0052.
Contents
X10 overview
  – Extension for fault tolerance ... Resilient X10
X10 applications with fault tolerance
  – Computing π with the Monte Carlo method ... MontePi
  – Clustering points by K-Means ... KMeans
  – Computing heat diffusion ... HeatTransfer
Evaluation
  – Modification size, performance, and fault tolerance
Summary
Resilient X10 – Extension for Fault Tolerance
Stores activities’ critical information in a “Resilient Storage”
  – The most stable implementation uses Place 0 for this purpose
Throws a new exception, DeadPlaceException (DPE), for a place death
  – If an activity is being moved to the place, the corresponding at throws the DPE
  – If an activity is asynchronously executed in the dead place by async, the governing finish will throw a MultipleExceptions which contains the DPE (→ next slide)
A simple fault-tolerant program which just reports node failures (Fig 3)
    class ResilientExample {
        public static def main(Rail[String]) {
            finish for (pl in Place.places()) async {
                try {
                    at (pl) do_something(); // parallel distributed execution
                } catch (e:DeadPlaceException) {
                    Console.OUT.println(e.place + " died"); // report failure
                }
            } // end of finish, wait for the execution in all places
        }
    }
If the target place (pl) dies, the at statement throws a DPE, which is caught by the catch clause
Utilization of the Rooted Exception Model
How to catch exceptions thrown asynchronously?
In X10, exceptions thrown from asynchronous activities can be caught
  – The finish governing the activity (async) receives the exception(s), and throws a MultipleExceptions ... Rooted Exception Model
  – By enclosing a finish with try~catch, async exceptions can be caught
    • DeadPlaceException can be caught with the same mechanism
The Hello World program (Fig 2), with its finish enclosed in try~catch:
    class HelloWorld {
        public static def main(args:Rail[String]) {
            try {
                finish for (pl in Place.places()) {
                    at (pl) async { // parallel distributed exec in each place
                        Console.OUT.println("Hello from " + here);
                        do_something();
                    }
                } // end of finish, wait for the execution in all places
            } catch (es:MultipleExceptions) { for (e in es.exceptions()) ... }
        }
    }
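The rooted exception model can be imitated outside X10. The following is a minimal Python sketch, not X10 code and not the Resilient X10 runtime: finish(), MultipleExceptions, and DeadPlaceError are hypothetical Python stand-ins, and "places" are plain integers. It only shows the shape of the mechanism: the enclosing finish waits for every asynchronous task, collects whatever they threw, and rethrows the collection once at a single root.

```python
from concurrent.futures import ThreadPoolExecutor


class DeadPlaceError(Exception):
    """Hypothetical stand-in for X10's DeadPlaceException."""

    def __init__(self, place):
        super().__init__(f"place {place} died")
        self.place = place


class MultipleExceptions(Exception):
    """Hypothetical stand-in: bundles the exceptions of all failed tasks."""

    def __init__(self, exceptions):
        super().__init__(f"{len(exceptions)} exception(s)")
        self.exceptions = exceptions


def finish(tasks):
    """Run every task, wait for all of them, then raise the failures together."""
    errors = []
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in tasks]
        for future in futures:
            exc = future.exception()  # blocks until this task has ended
            if exc is not None:
                errors.append(exc)
    if errors:
        raise MultipleExceptions(errors)


def make_task(place, dead):
    """Build the work for one place; a dead place fails with DeadPlaceError."""
    def task():
        if place in dead:
            raise DeadPlaceError(place)
        return place
    return task


def run(num_places, dead):
    """Return the sorted list of places reported dead by the enclosing finish."""
    try:
        finish([make_task(p, dead) for p in range(num_places)])
    except MultipleExceptions as es:
        return sorted(e.place for e in es.exceptions
                      if isinstance(e, DeadPlaceError))
    return []
```

Calling run(8, {2, 7}) reports both dead places from one catch site, which is the property the try~catch around finish relies on.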
Writing Fault-Tolerant Applications
The DeadPlaceException notification (and some support methods) is sufficient to add fault tolerance to existing distributed X10 applications
However, it is necessary to understand the structure of each application
  – How does the application do its distributed processing?
  – How can the execution be continued after a node failure?
We introduce three methods to add fault tolerance
  (a) Ignore failures and use the results from the remaining nodes
  (b) Reassign the failed node’s work to the remaining nodes
  (c) Restore the computation from a periodic snapshot ... (b) + checkpointing
(a) MontePi – Computing π with the Monte Carlo Method
Overview
Try ITERS times at each place, and update the result at Place 0
Place death is simply ignored
  – The result may become less accurate, but it is still valid
Fig 4
    class ResilientMontePi {
        public static def main (args:Rail[String]) {
            :
            finish for (p in Place.places()) async {
                try {
                    at (p) {
                        val rnd = new x10.util.Random(System.nanoTime());
                        var c:Long = 0;
                        for (iter in 1..ITERS) { // ITERS trials per place
                            val x = rnd.nextDouble(), y = rnd.nextDouble();
                            if (x*x + y*y <= 1.0) c++; // if inside the circle
                        }
                        val count = c;
                        at (result) atomic { // update the global result
                            val r = result(); r() = Pair(r().first+count, r().second+ITERS);
                        }
                    }
                } catch (e:DeadPlaceException) { /* just ignore place death */ }
            } // end of finish, wait for the execution in all places
            /* calculate the value of π and print it */
        }
    }
In case of node failure, the DeadPlaceException is caught and ignored
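Why ignoring a failure still gives a valid estimate can be checked with a small model. The following Python sketch is not the X10 program: places are simulated sequentially, DeadPlaceError and the function names are hypothetical, and the per-place seeds are chosen only to make the run repeatable. A dead place contributes neither hits nor trials, so the ratio stays unbiased and only the sample size shrinks.

```python
import random


class DeadPlaceError(Exception):
    """Hypothetical stand-in for X10's DeadPlaceException."""


def trials_at_place(place, iters, dead, seed):
    """Run `iters` random trials at one place; a dead place raises instead."""
    if place in dead:
        raise DeadPlaceError(place)
    rnd = random.Random(seed + place)
    inside = 0
    for _ in range(iters):
        x, y = rnd.random(), rnd.random()
        if x * x + y * y <= 1.0:  # the point is inside the quarter circle
            inside += 1
    return inside


def monte_pi(num_places, iters, dead=frozenset(), seed=0):
    """Estimate pi from the surviving places; returns (estimate, trials used)."""
    inside_total, trials_total = 0, 0
    for place in range(num_places):
        try:
            inside = trials_at_place(place, iters, dead, seed)
        except DeadPlaceError:
            continue  # just ignore place death
        inside_total += inside
        trials_total += iters  # count only trials that actually reported
    if trials_total == 0:
        raise RuntimeError("no place survived")
    return 4.0 * inside_total / trials_total, trials_total
```

With monte_pi(8, 20000, dead={4, 5, 6, 7}) half of the trials are lost, so the estimate is noisier, but it is still computed from a consistent pair of counts.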
(b) KMeans – Clustering Points by K-Means
Overview
Each place processes assigned points, and iterates until convergence
Don’t assign the work to dead place(s)
  – The work is reassigned to remaining places
Place death is ignored
  – Partial results are still utilized
Fig 5
    class ResilientKMeans {
        public static def main(args:Rail[String]) {
            :
            for (iter in 1..ITERATIONS) { // iterate until convergence
                /* deliver current cluster values to other places */
                val numAvail = Place.MAX_PLACES - Place.numDead();
                val div = POINTS / numAvail; // share for each place
                val rem = POINTS % numAvail; // extra share for Place 0
                var start:Long = 0; // next point to be processed
                try {
                    finish for (pl in Place.places()) {
                        if (pl.isDead()) continue; // skip dead place(s)
                        var end:Long = start+div; if (pl==place0) end+=rem;
                        at (pl) async { /* process [start,end), and return the data */ }
                        start = end;
                    } // end of finish, wait for the execution in all places
                } catch (es:MultipleExceptions) { /* just ignore place death */ }
                /* compute new cluster values, and exit if converged */
            } // end of for (iter)
            /* print the result */
        }
    }
Assign the work only to live nodes
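The share computation in the loop above (div, rem, start, end) can be checked in isolation. This is a Python sketch of that arithmetic only, with hypothetical names: places are integers, the dead set is passed in explicitly, and Place 0 is assumed to be alive, since the remainder is handed to it.

```python
def assign_ranges(points, num_places, dead=frozenset(), place0=0):
    """Split [0, points) into half-open ranges, one per live place.

    Every live place gets points // live_count points; Place 0 also takes
    the remainder, mirroring the div/rem logic of the slide.
    """
    if place0 in dead:
        raise ValueError("this sketch assumes Place 0 stays alive")
    live = [p for p in range(num_places) if p not in dead]
    div, rem = divmod(points, len(live))
    ranges = {}
    start = 0  # next point to be processed
    for place in live:  # dead places are skipped entirely
        end = start + div + (rem if place == place0 else 0)
        ranges[place] = (start, end)
        start = end
    return ranges
```

Because the ranges are recomputed in every iteration, a place that died during iteration i simply drops out of the split in iteration i+1, and its points are spread over the survivors.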
(c) HeatTransfer – Computing Heat Diffusion
Overview
A 2D DistArray holds the heat values of grid points
Each place computes heat diffusion for its local elements
Create a snapshot of the DistArray at every 10th iteration
Upon place death, the DistArray is restored from the snapshot
Fig 7
    class ResilientHeatTransfer {
        static val livePlaces = new ArrayList[Place]();
        static val restore_needed = new Cell[Boolean](false);
        public static def main(args:Rail[String]) {
            :
            val A = ResilientDistArray.make[Double](BigD, ...); // create a DistArray
            A.snapshot(); // create the initial snapshot
            for (iter in 1..ITERATIONS) { // iterate until convergence
                try {
                    if (restore_needed()) { // if some places died
                        val livePG = new SparsePlaceGroup(livePlaces.toRail());
                        BigD = Dist.makeBlock(BigR, 0, livePG); // recreate Dist, and
                        A.restore(BigD); // restore elements from the snapshot
                        restore_needed() = false;
                    }
                    finish ateach (z in D_Base) { // distributed processing
                        /* compute new heat values for A's local elements */
                        Temp = ((at (A.dist(x-1,y)) A(x-1,y)) + (at (A.dist(x+1,y)) A(x+1,y))
                              + (at (A.dist(x,y-1)) A(x,y-1)) + (at (A.dist(x,y+1)) A(x,y+1)))/4;
                    }
                    /* if converged, exit the for loop */
                    if (iter % 10 == 0) A.snapshot(); // create a snapshot at every 10th iter.
                } catch (e:Exception) { processException(e); }
            } // end of for (iter)
            /* print the result */
        }
    }
processException: remove the dead place from the livePlaces list and set the restore_needed flag
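The control flow of the loop above (snapshot every 10th iteration, flag on failure, restore at the start of the next iteration) can be exercised with a small single-process model. This Python sketch is not the X10 program: it relaxes a 1D rod instead of a 2D grid, the failure is injected by iteration number, and all names are hypothetical. As in the slide code, the iteration counter keeps running after a failure; only the data is rolled back.

```python
class DeadPlaceError(Exception):
    """Hypothetical stand-in for X10's DeadPlaceException."""


def relax(grid):
    """One Jacobi step on a 1D rod with fixed ends; returns (new_grid, delta)."""
    new = grid[:]
    for i in range(1, len(grid) - 1):
        new[i] = (grid[i - 1] + grid[i + 1]) / 2.0
    delta = max(abs(a - b) for a, b in zip(new, grid))
    return new, delta


def run(n=10, epsilon=1e-5, max_iters=1000, snapshot_every=10, fail_at=()):
    """Iterate to convergence; returns (last iteration, final grid, event log)."""
    grid = [1.0] + [0.0] * (n - 1)  # hot left end, cold elsewhere
    snapshot = grid[:]              # initial snapshot
    restore_needed = False
    log = []
    for it in range(1, max_iters + 1):
        try:
            if restore_needed:      # some place died in an earlier iteration
                grid = snapshot[:]  # roll the data back to the last snapshot
                restore_needed = False
                log.append(("restore", it))
            if it in fail_at:       # injected failure for this iteration
                raise DeadPlaceError(it)
            grid, delta = relax(grid)
            if delta < epsilon:     # converged: leave the loop
                return it, grid, log
            if it % snapshot_every == 0:
                snapshot = grid[:]
                log.append(("snapshot", it))
        except DeadPlaceError:
            restore_needed = True   # handled at the top of the next iteration
    raise RuntimeError("did not converge")
```

With run(fail_at={39}) the data is restored at iteration 40 from the snapshot taken at iteration 30, so the work of iterations 31 to 38 and the failed iteration 39 is repeated and convergence comes nine iterations later, with the same final values as the failure-free run.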
Resilient DistArray – Fault Tolerant DistArray
An extended DistArray which supports snapshot and reconfiguration
Interface overview (Fig 6)
    public class ResilientDistArray[T] ... {
        public static def make[T](dist:Dist, init:(Point)=>T) : ResilientDistArray[T];
        public static def make[T](dist:Dist){T haszero} : ResilientDistArray[T];
        public final operator this(pt:Point) : T; // read element
        public final operator this(pt:Point)=(v:T) : T; // set element
        public final def map[S,U](dst:ResilientDistArray[S], src:ResilientDistArray[U],
                                  filter:Region, op:(T,U)=>S) : ResilientDistArray[S];
        public final def reduce(op:(T,T)=>T, unit:T) : T;
        :
        // Added I/F
        // Create a snapshot
        public def snapshot() { snapshot_try(); snapshot_commit(); }
        public def snapshot_try() : void;
        public def snapshot_commit() : void;
        // Reconstruct the DistArray with new Dist
        public def restore(newDist:Dist) : void;
        public def remake(newDist:Dist, init:(Point)=>T) : void;
        public def remake(newDist:Dist){T haszero} : void;
    }
Normal DistArray interfaces, plus the following:
  snapshot()
    Dump the element values into the Resilient Storage
  restore(newDist)
    Reconstruct the DistArray over live places, and restore the snapshot
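The snapshot()/restore(newDist) pair can be modeled in a few lines. The following Python sketch is an assumption-laden toy, not the ResilientDistArray implementation: make_block, ResilientArraySketch, and kill are hypothetical names, the block layout is one plausible even split rather than the exact behavior of Dist.makeBlock, and the committed snapshot is a plain dictionary standing in for the Resilient Storage.

```python
class DeadPlaceError(Exception):
    """Hypothetical stand-in for X10's DeadPlaceException."""


def make_block(n, places):
    """Map indices 0..n-1 to places in contiguous, nearly equal blocks."""
    div, rem = divmod(n, len(places))
    dist = []
    for k, place in enumerate(places):
        dist.extend([place] * (div + (1 if k < rem else 0)))
    return dist


class ResilientArraySketch:
    def __init__(self, places, values):
        self._committed = None  # models the Resilient Storage
        self._pending = None
        self._layout(places, dict(enumerate(values)))

    def _layout(self, places, items):
        """Distribute the items over the given places."""
        self.dist = make_block(len(items), places)
        self.local = {p: {} for p in places}
        for i, v in items.items():
            self.local[self.dist[i]][i] = v

    def __getitem__(self, i):
        return self.local[self.dist[i]][i]  # KeyError if the owner is dead

    def __setitem__(self, i, v):
        self.local[self.dist[i]][i] = v

    def kill(self, place):
        """Simulate a place death: its local elements are lost."""
        del self.local[place]

    def snapshot_try(self):
        """Collect all elements; fail without touching the old snapshot."""
        pending = {i: v for part in self.local.values() for i, v in part.items()}
        if len(pending) != len(self.dist):
            raise DeadPlaceError("some elements are unreachable")
        self._pending = pending

    def snapshot_commit(self):
        """Make the collected copy the current snapshot."""
        self._committed = self._pending

    def snapshot(self):
        self.snapshot_try()
        self.snapshot_commit()

    def restore(self, live_places):
        """Rebuild the array over the live places from the last snapshot."""
        self._layout(live_places, dict(self._committed))
```

In this model, splitting snapshot() into a try step and a commit step means that a place death during the dump leaves the previous snapshot usable; whether that is the motivation in the real library is an assumption here.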
Evaluation – Modification Amount
Modifications necessary to add fault tolerance to the base codes
    Program               Code size    Mod. size
    MontePi               30 lines     4 lines
    KMeans                93 lines     12 lines
    HeatTransfer          68 lines     24 lines
    (ResilientDistArray)  ~200 lines   -
Consideration
Fault tolerance could be added with very small modifications
  – The modified code can still run on standard X10, as long as node failure does not occur
Modification ratio of HeatTransfer is relatively large, but
  – Larger DistArray applications can be made fault tolerant using the same approach, and the modification ratio will be smaller
  – Half of the modifications are in the exception-handling code of processException, which can be provided as a support library
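To put numbers on "relatively large", the modification ratio can be computed from the table. The short Python sketch below assumes the ratio means modified lines divided by code size; the slide does not define it, so this reading is an assumption.

```python
# (code size, modification size) in lines, taken from the table above
SIZES = {
    "MontePi": (30, 4),
    "KMeans": (93, 12),
    "HeatTransfer": (68, 24),
}


def modification_ratio(code_lines, mod_lines):
    """Modified lines as a percentage of the code size (assumed definition)."""
    return 100.0 * mod_lines / code_lines


for name, (code_lines, mod_lines) in SIZES.items():
    print(f"{name}: {modification_ratio(code_lines, mod_lines):.1f}%")
```

Under that assumption, MontePi and KMeans both come out at about 13%, and HeatTransfer at about 35%.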
Evaluation – Performance
Relative execution times of base code and fault-tolerant code, on Standard and Resilient X10
[Fig 8: bar chart of relative execution time (lower is better) for MontePi, KMeans, and HeatTransfer; bars: Base on X10, FT on X10, Base on ResX10, FT on ResX10]
Environment: 2.7GHz Xeon blades (total 16 cores), 40Gbps InfiniBand, Red Hat Enterprise Linux Server 6.3, Native X10 2.4.2, 8 places, Place-0 resilient storage
Consideration
On Standard X10
  – For MontePi and KMeans, almost no overheads were observed
  – For HeatTransfer, the FT version took 9% more time, because of the cost of the periodic snapshots
On Resilient X10
  – For MontePi and KMeans, there were 2.2~9.0% overheads
  – For HeatTransfer, there was a 6x slowdown, mainly because at is invoked too frequently
Evaluation – Fault Tolerance
Behavior when places are killed
                        Base code       Fault-tolerant code
    On Standard X10     Aborted         Aborted
    On Resilient X10    End with DPE    Survived
Fault tolerance was achieved by the combination of Resilient X10 and fault-tolerant applications
Effects caused by place deaths
  – When 4 places among the 8 places were killed
    • Deviation of MontePi result increased from 0.0008% to 0.002%
    • But the execution time did not increase
  – When Place 2 was killed during the execution of the 17th iteration
    • Execution time increased by 11% in KMeans and 14% in HeatTransfer
    • But the executions still ended with correct results
Conclusions
Summary
Introduced three methods of adding fault tolerance to existing applications
  (a) Ignore failures and use the results from the remaining nodes ... MontePi
  (b) Reassign the failed node’s work to the remaining nodes ... KMeans
  (c) Restore the computation from a periodic snapshot ... HeatTransfer with ResilientDistArray
Evaluated the fault-tolerant applications in a real distributed environment
  – Very small modifications to add fault tolerance
  – 2.2~9.0% execution overhead (but 6x slowdown in case of too frequent at)
  – 11~14% additional overhead when 1 among 8 places was lost
Future work
  – Reduce the overhead – both in Resilient X10 and fault-tolerant applications
  – Make fault-tolerant versions of larger X10 applications [12,19]
Additional Information about Resilient X10
The Resilient X10 function is included as a technology preview in X10 2.4.3 (released in May 2014)
  – Can be enabled by specifying “X10_RESILIENT_MODE=1”
  – Can run with either Native X10 or Managed X10
    • The communication layer is limited to sockets
  – Sample codes exist under “samples/resiliency/”
    • Refer to README.txt in the directory for details
Related papers
[3] Cunningham, D., Grove, D., Herta, B., Iyengar, A., Kawachiya, K., Murata, H., Saraswat, V., Takeuchi, M. and Tardieu, O. Resilient X10: Efficient Failure-Aware Programming. Proceedings of PPoPP ’14, pp. 67–80 (2014).
[2] Crafa, S., Cunningham, D., Saraswat, V., Shinnar, A. and Tardieu, O. Semantics of (Resilient) X10. Proceedings of ECOOP ’14, to appear (2014).
BACKUP
Execution Example – HeatTransfer (10x10)
    $ cd X10/242/samples/resiliency
    $ x10c++ -O -NO_CHECKS ResilientHeatTransfer.x10 -o ResilientHeatTransfer
    $ X10_NPLACES=8 X10_RESILIENT_MODE=1 runx10 ResilientHeatTransfer 10
    HeatTransfer for 10x10, epsilon=1.0E-5
    0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0
    1 1 1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1 1 1
    2 2 2 2 2 2 2 2 2 2 2 2
    2 2 2 2 2 2 2 2 2 2 2 2
    3 3 3 3 3 3 3 3 3 3 3 3
    3 3 3 3 3 3 3 3 3 3 3 3
    4 4 4 4 4 4 4 4 4 4 4 4
    5 5 5 5 5 5 5 5 5 5 5 5
    6 6 6 6 6 6 6 6 6 6 6 6
    7 7 7 7 7 7 7 7 7 7 7 7
    X10_RESILIENT_STORE_MODE=0
    X10_RESILIENT_STORE_VERBOSE=0
    ---- Iteration: 1
    delta=0.25
    ---- Iteration: 2
    delta=0.125
    :
    ---- Iteration: 10
    delta=0.023305892944336
    Create a snapshot at iteration 10
    ---- Iteration: 11
    delta=0.020451545715332
    :
    ---- Iteration: 38
    delta=0.003633990233121
    ---- Iteration: 39          <---- Place 2 was killed
    Place 2 exited unexpectedly with signal: Terminated
    MultipleExceptions size=2
    DeadPlaceException thrown from Place(2)
    DeadPlaceException thrown from Place(2)
    ---- Iteration: 40
    Create new Dist over available 7 places
    0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0
    1 1 1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1 1 1
    3 3 3 3 3 3 3 3 3 3 3 3
    3 3 3 3 3 3 3 3 3 3 3 3
    4 4 4 4 4 4 4 4 4 4 4 4
    4 4 4 4 4 4 4 4 4 4 4 4
    5 5 5 5 5 5 5 5 5 5 5 5
    5 5 5 5 5 5 5 5 5 5 5 5
    6 6 6 6 6 6 6 6 6 6 6 6
    7 7 7 7 7 7 7 7 7 7 7 7
    Restore from a snapshot at iteration 30
    delta=0.005177850168275
    Create a snapshot at iteration 40
    ---- Iteration: 41
    delta=0.004930303185298
    :
    ---- Iteration: 85          <---- Place 7 was killed
    Place 7 exited unexpectedly with signal: Terminated
    MultipleExceptions size=1
    DeadPlaceException thrown from Place(7)
    ---- Iteration: 86
    Create new Dist over available 6 places
    0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0
    1 1 1 1 1 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1 1 1 1 1
    3 3 3 3 3 3 3 3 3 3 3 3
    3 3 3 3 3 3 3 3 3 3 3 3
    4 4 4 4 4 4 4 4 4 4 4 4
    4 4 4 4 4 4 4 4 4 4 4 4
    5 5 5 5 5 5 5 5 5 5 5 5
    5 5 5 5 5 5 5 5 5 5 5 5
    6 6 6 6 6 6 6 6 6 6 6 6
    6 6 6 6 6 6 6 6 6 6 6 6
    Restore from a snapshot at iteration 80
    delta=8.5E-4
    ---- Iteration: 87
    delta=8.1E-4
    :
    ---- Iteration: 194
    delta=9.7E-6
    Result converged
    ---- Result
    0.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000
    0.000 0.491 0.679 0.761 0.799 0.815 0.815 0.799 0.761 0.679 0.491 0.000
    0.000 0.285 0.463 0.566 0.622 0.646 0.646 0.622 0.566 0.463 0.285 0.000
    0.000 0.184 0.324 0.419 0.475 0.501 0.501 0.475 0.419 0.324 0.184 0.000
    0.000 0.126 0.232 0.309 0.358 0.382 0.382 0.358 0.309 0.232 0.126 0.000
    0.000 0.089 0.167 0.227 0.267 0.287 0.287 0.267 0.227 0.167 0.089 0.000
    0.000 0.064 0.121 0.166 0.197 0.212 0.212 0.197 0.166 0.121 0.064 0.000
    0.000 0.045 0.086 0.118 0.141 0.153 0.153 0.141 0.118 0.086 0.045 0.000
    0.000 0.031 0.058 0.081 0.097 0.105 0.105 0.097 0.081 0.058 0.031 0.000
    0.000 0.019 0.036 0.051 0.061 0.066 0.066 0.061 0.051 0.036 0.019 0.000
    0.000 0.009 0.017 0.024 0.029 0.032 0.032 0.029 0.024 0.017 0.009 0.000
    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ResilientKMeans
ResilientHeatTransfer
Programming Language X10
X10 [25] is a programming language with built-in support for parallel/distributed computing
Parallel and distributed Hello World in X10 (Fig 2)
    class HelloWorld {
        public static def main(args:Rail[String]) {
            finish for (pl in Place.places()) {
                at (pl) async { // parallel distributed exec in each place
                    Console.OUT.println("Hello from " + here);
                }
            } // end of finish, wait for the execution in all places
        }
    }
The body of at (pl) async is executed at each node (place) in a parallel and distributed manner
  – Compilation and execution example using Native X10
    $ x10c++ HelloWorld.x10 -o HelloWorld    # compile
    $ X10_NPLACES=4 runx10 HelloWorld        # execution
    Hello from Place(3)
    Hello from Place(0)
    Hello from Place(2)
    Hello from Place(1)
Execution Model of X10 – Asynchronous PGAS
Asynchronous Partitioned Global Address Space
A global address space is divided into multiple places (≒ computing nodes)
  – Each place can contain activities and objects
An activity (≒ thread) is created by async, and can move to another place by at
An object belongs to a specific place, but can be remotely referenced from other places
  – To access a remote reference, activities must move to its home place
DistArray is a data structure whose elements are scattered over multiple places
[Fig 1: diagram of the address space divided into Place 0, Place 1, ..., Place MAX_PLACES-1, labeling activities (creation by async, place shift by at), objects, local and remote references, DistArrays, and immutable data, classes, structs, and functions]
If a Computing Node Failed ...
Consider the case where Place 1 (or its node) dies
Activities, objects, and parts of DistArrays in the dead place are lost
  – This causes the abort of the entire X10 processing in standard X10
However, in the PGAS model, it is relatively easy to localize the impact of place death
  – Objects in other places are still alive, although remote references to the dead place become inaccessible
  – Can continue the execution using the remaining nodes (places) → Resilient X10 [2,3]
[Fig 1: diagram of the address space divided into Place 0, Place 1, ..., Place MAX_PLACES-1, labeling activities (creation by async, place shift by at), objects, local and remote references, DistArrays, and immutable data, classes, structs, and functions]
DistArray – A Distributed Array Library
DistArray – An array whose elements are scattered over multiple places
  – Suitable for SPMD-style processing
Upon a node failure, the DistArray elements in the dead place are lost
Example – Compute 1² + 2² + ··· + 1000² using multiple places
Overview
  – Create a DistArray A with initial values 1~1000
  – Each place processes its local elements (x), and calculates the sum of x²
  – Sum up all the results
    public class DistArrayExample {
        public static def main(Rail[String]) {
            val R = Region.make(1..1000);
            val D = Dist.makeBlock(R, 0, PlaceGroup.WORLD);
            val A = DistArray.make[Long](D, ([i]:Point)=>i);
            val tmp = new Array[Long](Place.MAX_PLACES);
            finish for (pl in Place.places()) async {
                tmp(pl.id) = at (pl) {
                    var s:Long = 0;
                    for (p:Point in D|here) s += A(p)*A(p); // local part of the DistArray is processed here
                    s
                };
            } // end of finish, wait for the execution in all places
            val result = tmp.reduce((a:Long,b:Long)=>a+b, 0);
            Console.OUT.println(result); // -> 333833500
        }
    }
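The printed value can be cross-checked without an X10 installation. The following Python sketch mirrors the three overview steps (block split, per-place partial sum, final reduce); block_ranges is a hypothetical helper that splits the range as evenly as possible and is not claimed to reproduce the exact layout of Dist.makeBlock.

```python
def block_ranges(lo, hi, num_places):
    """Split lo..hi (inclusive) into contiguous blocks, one per place."""
    div, rem = divmod(hi - lo + 1, num_places)
    ranges, start = [], lo
    for k in range(num_places):
        size = div + (1 if k < rem else 0)  # first `rem` places get one extra
        ranges.append(range(start, start + size))
        start += size
    return ranges


def sum_of_squares(lo, hi, num_places):
    """Each place sums x*x over its block; the partial sums are then reduced."""
    tmp = [sum(x * x for x in block) for block in block_ranges(lo, hi, num_places)]
    return sum(tmp)


# Closed form n(n+1)(2n+1)/6 for n = 1000, as an independent check
assert sum_of_squares(1, 1000, 4) == 1000 * 1001 * 2001 // 6 == 333833500
```

The total does not depend on the number of places, which is what makes the per-place partial sums safe to compute independently; it is also why losing one place's block, as on a node failure, silently changes the answer unless the block is recomputed elsewhere.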