
An Analysis of Communication Induced Checkpointing

Lorenzo Alvisi‡   Elmootazbellah Elnozahy†   Sriram Rao‡   Syed Amir Husain‡   Asanka De Mel‡

‡ Department of Computer Sciences, U.T. Austin, Austin, Texas, USA
† IBM Austin Research Lab

    Abstract

Communication induced checkpointing (CIC) allows processes in a distributed computation to take independent checkpoints and to avoid the domino effect. This paper presents an analysis of CIC protocols based on a prototype implementation and validated simulations. Our results indicate that there is sufficient evidence to suspect that much of the conventional wisdom about these protocols is questionable.

    1 Introduction

There are three styles for implementing application-transparent rollback-recovery in message-passing systems, namely coordinated checkpointing, message logging, and communication-induced checkpointing (CIC) [5]. Both coordinated checkpointing and message logging have received considerable analysis in the literature [6, 12, 14, 15], but little is known about the behavior of CIC protocols. This paper presents an experimental analysis of these protocols through a prototype implementation and reveals several of their theoretical and pragmatic characteristics.

CIC protocols are believed to have several advantages over other styles of rollback-recovery. For instance, they allow processes considerable autonomy in deciding when to take checkpoints. A process can thus take a checkpoint at times when saving the state would incur a small overhead [10, 13]. CIC protocols are also believed to scale up well with a larger number of processes, since they do not require the processes to participate in a global checkpoint. But there is a price to pay for these advantages. First, the protocol-specific information piggybacked on application messages occasionally "induces" processes to take forced checkpoints before they can process the messages. Second, processes have to pay the overhead of piggybacking information on top of application messages, and they also need to keep several checkpoints on stable storage. These advantages and disadvantages are clearly qualitative and potentially arguable. A purpose of our work is to shed some light on these issues using quantitative metrics drawn from a real system.

To this end, we have implemented three CIC protocols: the original one by Briatico et al. [2] and two recent protocols [1, 7] based on the Z-path theory [11]. We have examined their performance using two metrics, namely the average number of forced checkpoints a protocol causes and its running time. The first metric is important because forced checkpoints negate the autonomy advantage of CIC protocols, and they also contribute substantially to the performance and resource overheads. A good CIC protocol therefore tries to limit these forced checkpoints to the extent possible.

* This author was supported in part by the National Science Foundation CAREER award CCR-9734185, Research Infrastructure Award CDA-9734185, and DARPA/SPAWAR grant N66001-98-8911.

Figure 1: A distributed computation. C_{i,j} denotes the j-th checkpoint of process p_i.

The experiments use four compute-intensive programs from the NPB 2.3 benchmark suite [3], which is representative of a class of applications that have traditionally been the primary users of checkpointing protocols. We then use the implementation in part to validate a simulation model that we built to study further the scalability of the protocols and their behavior under different communication patterns.

Our results reveal several important properties of CIC protocols and highlight several implementation and theoretical issues that were not addressed in the literature. We hope that our work will be a first step in investigating an area that has thus far been subject only to theoretical research.

    2 Background

This section reviews the three protocols used in the experiments, along with some necessary definitions. The description is inevitably terse and covers only the features necessary to follow the experimental work described later.

    2.1 Definitions

Local checkpoints: A process may take a local checkpoint at any time during the execution. The local checkpoints of different processes are not coordinated to form a global consistent checkpoint [4].

Forced checkpoints: To guard against the domino effect, a CIC protocol piggybacks protocol-specific information on the application messages that processes exchange. Each process examines the information and occasionally is forced to take a checkpoint according to the protocol.

Useless checkpoints: A useless checkpoint of a process is one that will never be part of a global consistent state [18]. In Figure 1, checkpoint C_{2,2} is an example of a useless checkpoint. Useless checkpoints are not desirable because they do not contribute to the recovery of the system from failures, yet they consume resources and cause performance overhead.


Checkpoint intervals: A checkpoint interval is the sequence of events between two consecutive checkpoints in the execution of a process.

    2.2 Z-paths and Z-cycles

Z-paths: A Z-path (zigzag path) is a special sequence of messages that connects two checkpoints. Let → denote Lamport's happened-before relation [9]. Given two local checkpoints C_{i,m} and C_{j,n}, a Z-path exists between C_{i,m} and C_{j,n} if and only if one of the following two conditions holds:

1. m < n and i = j; or

2. There exists a sequence of messages m_0, m_1, ..., m_z, z ≥ 0, such that:

   (a) C_{i,m} → send_i(m_0);

   (b) for all l < z, either deliver_k(m_l) and send_k(m_{l+1}) are in the same checkpoint interval, or deliver_k(m_l) → send_k(m_{l+1}); and

   (c) deliver_j(m_z) → C_{j,n}

where send_i and deliver_i are communication events executed by process p_i. In Figure 1, [m_1, m_2] and [m_1, m_3] are examples of Z-paths.

Z-cycles: A Z-cycle is a Z-path that begins and ends with the same checkpoint. In Figure 1, the Z-path [m_4, m_1, m_3] is a Z-cycle that involves checkpoint C_{2,2}.
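To make the chained condition in case 2 concrete, the sketch below (our illustration, not code from the paper) checks condition (b) for a candidate message sequence. It assumes each message record carries its sender, its receiver, and the checkpoint-interval indices of its send and deliver events, plus a caller-supplied happened-before test; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Msg:
    # Hypothetical per-message metadata (not the paper's data structures).
    sender: int            # process that executed send(m)
    receiver: int          # process that executed deliver(m)
    send_interval: int     # checkpoint-interval index of send(m) at the sender
    deliver_interval: int  # checkpoint-interval index of deliver(m) at the receiver

def chain_ok(msgs: List[Msg],
             happens_before: Callable[[Msg, Msg], bool]) -> bool:
    """Check condition (b): each consecutive pair (m_l, m_{l+1}) chains on a
    common process, and either deliver(m_l) and send(m_{l+1}) lie in the same
    checkpoint interval or deliver(m_l) happened before send(m_{l+1}).
    happens_before(a, b) is assumed to answer exactly that question for
    deliver(a) and send(b)."""
    for m_l, m_next in zip(msgs, msgs[1:]):
        if m_l.receiver != m_next.sender:
            return False  # no common process: the sequence cannot be a Z-path
        same_interval = m_l.deliver_interval == m_next.send_interval
        if not (same_interval or happens_before(m_l, m_next)):
            return False
    return True
```

With conditions (a) and (c) checked against the two endpoint checkpoints, such a sequence is a Z-path; it is a Z-cycle when the two endpoints are the same checkpoint.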

    2.3 Z-cycles and CIC

CIC protocols do not take useless checkpoints. These protocols recognize that the creation of useless checkpoints depends on the occurrence of specific patterns in which processes communicate and take checkpoints [8]. Informally, these protocols recognize potentially dangerous patterns and break them before they occur. This intuition has been formalized in an elegant theory based on the notion of Z-cycles. A key result in this theory is that a local checkpoint is useless if it is involved in a Z-cycle [8, 11]. Hence, to avoid useless checkpoints it suffices that no Z-path ever becomes a Z-cycle. Enforcing this no-Z-cycle (NZC) condition may require that a process take forced checkpoints in addition to its local checkpoints.

There are two approaches to avoiding Z-cycles. The first approach uses a function that associates a timestamp with each checkpoint. The protocol guarantees, through forced checkpoints if necessary, that (i) if there are two checkpoints C_{i,m} and C_{j,n} such that C_{i,m} → C_{j,n}, then ts(C_{j,n}) ≥ ts(C_{i,m}), where ts(C) is the timestamp associated with checkpoint C; and (ii) consecutive local checkpoints of a process have increasing timestamps. The second approach relies instead on preventing the formation of specific checkpoint and communication patterns that may lead to the creation of a Z-cycle. Protocols that follow this approach do not adopt a specific function for associating timestamps with checkpoints. However, for these protocols there always exists an equivalent time-stamping function that would cause the same forced checkpoints [8].

    2.4 Briatico, Ciuffoletti and Simoncini (BCS)

In BCS, each process p_i maintains a logical clock lc_i that functions as p_i's checkpoint timestamp. The timestamp is an integer variable with initial value 0 and is updated according to the following rules:

1. lc_i increases by 1 whenever p_i takes a local checkpoint.

2. p_i piggybacks on every message m it sends a copy of the current value of lc_i. We denote the piggybacked value as m.lc.

3. Whenever p_i receives a message m, it compares lc_i with m.lc. If m.lc > lc_i, then p_i sets lc_i to the value of m.lc and takes a forced checkpoint before processing the message.

The set of checkpoints having the same timestamp in different processes is guaranteed to be a consistent state. Therefore, this protocol guarantees that there is always a recovery line corresponding to the lowest timestamp in the system, and the domino effect is prevented.
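The sketch below (ours, with hypothetical names such as `channel` and `save_state`) restates the three BCS rules for a single process.

```python
class BCSProcess:
    """Illustrative sketch of the BCS rules above; not the paper's implementation."""

    def __init__(self):
        self.lc = 0  # logical clock, used as the checkpoint timestamp

    def take_local_checkpoint(self):
        self.lc += 1            # rule 1: increment on every local checkpoint
        self.save_state()

    def send(self, payload, channel):
        channel.send((payload, self.lc))  # rule 2: piggyback the current lc

    def receive(self, message):
        payload, m_lc = message
        if m_lc > self.lc:      # rule 3: the sender's timestamp is ahead of ours
            self.lc = m_lc
            self.save_state()   # forced checkpoint before processing the message
        self.process(payload)

    # Checkpointing and application layers are assumed.
    def save_state(self): ...
    def process(self, payload): ...
```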

    2.5 Helary, Mostefaoui, Netzer and Raynal (HMNR)

The HMNR protocol uses the observation that if checkpoint timestamps always increase along a Z-path (as opposed to being simply non-decreasing, as required by rule (i) above), then no Z-cycle can ever form. It is thus possible to design functions that take advantage of this observation. Helary et al. start with the following simple scheme, which requires each process to maintain a logical clock, as in BCS, and to apply the following rules:

1. lc_i increases by 1 whenever p_i takes a local or forced checkpoint.

2. Whenever p_i sends a message m, it piggybacks on m a copy of lc_i; we denote this value by m.lc as before.

3. Whenever p_i receives a message m, it compares lc_i with m.lc. If m.lc > lc_i, then p_i sets lc_i to the value of m.lc.

Then, they refine the protocol using more sophisticated observations and requiring that processes append more information. Figure 2, adapted from [7], shows how this simple timestamp function can be used to decide when to take a checkpoint. Let ts(m) denote the timestamp piggybacked on message m. When process p_1 receives message m_0, if ts(m_0) ≤ ts(m_1) then certainly ts(C_{0,0}) < ts(C_{2,1}) and there is no need for a forced checkpoint. If ts(m_0) > ts(m_1), however, delivering m_0 may create the possibility of generating a Z-path along which the timestamps do not increase. A straightforward way to avoid this risk is to force p_1 to take a checkpoint before delivering m_0, thereby breaking the Z-path. It may be possible, however, for p_1 to avoid taking a forced checkpoint by using a more sophisticated timestamp function. For instance, a function that piggybacks on application messages information about the logical clocks of all processes may give process p_1 more information to decide whether a forced checkpoint is really necessary. In Figure 2(b), if p_1 knows that the value of p_2's local clock is at least x when it is about to deliver m_0, then even if ts(m_0) > ts(m_1), p_1 does not need to take a forced checkpoint if ts(C_{0,0}) ≤ ts(m_1) < x ≤ ts(C_{2,1}). HMNR uses this observation and more sophisticated ones to reduce the number of forced checkpoints while still ensuring that the timestamps always increase along a Z-path. In [7], Helary et al. present several CIC algorithms. For our experiments, we have used the most sophisticated one, which reduces as much as possible the number of forced checkpoints.
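As a simplified illustration of the decision discussed above, the sketch below forces a checkpoint when an incoming timestamp exceeds the timestamp piggybacked on a message already sent in the current interval. This is only our reading of the simple scheme, not the refined HMNR algorithm of [7], and all names are hypothetical.

```python
class SimpleTimestampProcess:
    """Sketch of the simple timestamping scheme discussed above (our reading);
    not the refined HMNR protocol."""

    def __init__(self):
        self.lc = 0              # logical clock (checkpoint timestamp)
        self.min_sent_ts = None  # smallest lc piggybacked on a send in this interval

    def checkpoint(self):
        self.lc += 1             # rule 1: increment on local or forced checkpoints
        self.min_sent_ts = None  # a new checkpoint interval begins
        self.save_state()

    def send(self, payload, channel):
        if self.min_sent_ts is None:
            self.min_sent_ts = self.lc
        channel.send((payload, self.lc))   # rule 2: piggyback lc

    def receive(self, message):
        payload, m_lc = message
        # Force a checkpoint if delivering could complete a Z-path along which
        # timestamps do not increase: a message with a smaller timestamp was
        # already sent in the current interval.
        if self.min_sent_ts is not None and m_lc > self.min_sent_ts:
            self.checkpoint()    # forced checkpoint before delivery
        if m_lc > self.lc:
            self.lc = m_lc       # rule 3: advance the clock
        self.process(payload)

    def save_state(self): ...
    def process(self, payload): ...
```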

    2.6 Baldoni, Quaglia and Ciciani (BQC)

The BQC protocol does not prevent Z-cycles by using an explicit function to timestamp checkpoints [1]. Rather, this protocol enforces NZC by preventing the formation of patterns of checkpoints and communication that may result in the creation of a Z-cycle.


Figure 2: To checkpoint or not to checkpoint?

In particular, BQC prevents the creation of suspected Z-cycles. Figure 3 shows the structure of a suspected Z-cycle. In this example, process p_i would take a checkpoint before delivering message m_2, in order to avoid a potential Z-cycle that includes m_0 and m_1 and involves checkpoint C_{0,1}. Note that a suspected Z-cycle is not necessarily a Z-cycle. For instance, there is no Z-cycle in Figure 3 unless there is an actual Z-path starting with m_0 and ending with m_1 between C_{i,0} and C_{0,1}. This information may or may not be available to process p_i when it receives m_2. If the information is not available, the protocol opts for safety and takes a forced checkpoint before delivering m_2. If the information is available, however, the protocol refrains from taking a forced checkpoint. The actual protocol propagates n^2 values on each application message to help processes detect suspected Z-cycles and suppress them using forced checkpoints [1]. The reference also contains a formal characterization of the notion of a suspected Z-cycle and a proof that a protocol that prevents suspected Z-cycles also satisfies NZC [1].

    3 Implementation Issues

An implementation of CIC must deal with several pragmatic issues that are typically left out of protocol specifications. We describe how we resolved these issues in our implementation.

    3.1 Autonomy in Local Checkpoints

A stated advantage of CIC protocols is that they prevent the domino effect while allowing processes considerable autonomy in deciding when to take local checkpoints. An efficient implementation of CIC must therefore adopt a checkpointing policy that exploits this autonomy and translates it into a benefit. In general, this requires a good understanding of the application and of the execution environment. For example, detailed knowledge of the application may allow processes to take checkpoints when the size of the live variables is small [10, 13].

Figure 3: A suspected Z-cycle involving checkpoint C_{0,1}. (m_2 is the first message sent causally after C_{0,1} to reach p_i.)

In our study, we have chosen a set of compute-intensive, long-running applications that have traditionally been the main beneficiaries of checkpointing protocols. We have found that either the complexity of the application program precludes investing the effort in defining a reasonable checkpoint placement policy, or the application structure does not reveal points within the execution where taking a checkpoint is more advantageous than at others. Therefore, we have resolved to use a probabilistic distribution to emulate what an autonomous application would do in deciding where to place the local checkpoints. In addition, we have also used the traditional policy of taking checkpoints at regular intervals. The results were similar; indeed, one of the first conclusions that we reached in our implementation is that even with a deep understanding of the application's structure, no policy for taking local checkpoints can reasonably be adopted without considering the effect of the forced checkpoints that occur because of communication events. For example, a pre-run analysis may decide on the execution points at which it is most "convenient" to schedule local checkpoints within a particular process. But if this policy ignores the forced checkpoints, then it is possible to schedule a local checkpoint immediately after a forced checkpoint. In this case, taking the local checkpoint will result in additional work and overhead, with no substantial reduction of the amount of work at risk. A more reasonable decision may then be to postpone taking the local checkpoint. In any case, the autonomy of deciding when to take checkpoints seems to be limited by the occurrence of forced checkpoints due to interactions with other processes.
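The probabilistic placement policy mentioned above can be emulated with a few lines; the sketch below (ours, illustrative only) draws inter-checkpoint times from an exponential distribution, using the 360-second mean from the experiments in Section 4.1.

```python
import random
import time

def local_checkpoint_scheduler(take_local_checkpoint, mean_interval=360.0):
    """Emulate autonomous local checkpoint placement by drawing the time until
    the next local checkpoint from an exponential distribution."""
    while True:
        wait = random.expovariate(1.0 / mean_interval)  # mean = mean_interval seconds
        time.sleep(wait)
        take_local_checkpoint()
```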

    3.2 Non-blocking Checkpointing in CIC

The benefits of non-blocking checkpointing in reducing the performance overhead of checkpointing protocols have been clearly established [6, 14]. Non-blocking checkpointing allows the application to resume computation as soon as possible and to schedule the actual writing of the checkpoint concurrently with the application execution. The result is that saving the checkpoint to stable storage does not become a bottleneck that impedes application progress. However, using non-blocking checkpointing in CIC introduces potential inconsistencies with the specification of the protocols. To understand why, consider the situation where a process p_i takes a non-blocking checkpoint (local or forced) and then sends a message m to another process p_j; assume furthermore that the CIC protocol being used requires p_j to take a forced checkpoint before delivering m. Recall that all CIC protocols maintain the invariant that no Z-cycle may form and no useless checkpoint is ever taken: hence, the forced checkpoint of p_j should never be useless and no Z-cycle can include it. However, suppose that p_i fails after sending m, but before p_i's checkpoint has been entirely written to stable storage. The failure of p_i makes the checkpoint taken by p_j useless, thereby violating the protocol invariant.


The problem is not just cosmetic, because several such communication events may occur while several non-blocking checkpoints are being written to stable storage. Therefore, many checkpoints on stable storage may be rendered useless because a failure of some process occurred before one or more of its checkpoints were saved on stable storage. Indeed, in situations like this, Z-cycles do form and useless checkpoints are taken. There are two solutions to this problem. The first one blocks any outgoing messages from a process until all its non-blocking checkpoints have been written to stable storage. The messages are buffered and released only when the process receives notification from the checkpointing agent that the checkpoint has been saved. There is a penalty to pay for this modification, but it is preferable to disallowing non-blocking checkpointing altogether. It is interesting to note here that this solution shows how a pragmatic consideration may require a modification of the protocol itself to maintain its invariant. The second solution that we considered is to simply allow these temporary Z-cycles to form, and hope that the checkpoints will be written before a failure actually occurs. In a sense, this is an optimistic implementation of CIC, which may allow Z-cycles to form temporarily while some checkpoints are being written to stable storage. If the optimistic assumption holds, then the invariant of the protocol is preserved and no useless checkpoints are ever taken. However, if the assumption is violated because of a failure, then some of the checkpoints that would otherwise have been part of the recovery line will have to be discarded. The benefit of this optimistic alternative is that the overhead is small and it does not require any modification to the protocol as specified. After an interesting debate among the authors, we resolved to use the second solution for its simplicity and because failures are supposedly rare. It is important, however, for future implementers to understand the subtle issues involved with the pragmatic choices described here and how they may affect the protocol implementation.
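A minimal sketch of the first solution above (ours, with hypothetical names): outgoing messages are buffered while any non-blocking checkpoint write is still in flight and released once the checkpointing agent reports that the checkpoint is on stable storage. The paper's prototype instead adopts the second, optimistic solution.

```python
from collections import deque

class BufferedSender:
    """Sketch of the first solution above: hold outgoing messages while a
    non-blocking checkpoint is still being written to stable storage."""

    def __init__(self, channel):
        self.channel = channel      # assumed messaging layer
        self.pending_writes = 0     # non-blocking checkpoints still in flight
        self.buffer = deque()

    def checkpoint_started(self):
        self.pending_writes += 1

    def checkpoint_committed(self):
        # Called by the checkpointing agent once a checkpoint reaches stable storage.
        self.pending_writes -= 1
        if self.pending_writes == 0:
            while self.buffer:
                self.channel.send(self.buffer.popleft())

    def send(self, message):
        if self.pending_writes > 0:
            self.buffer.append(message)  # defer until all checkpoints are durable
        else:
            self.channel.send(message)
```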

    4 Experiments and Analysis

The testbed for this study consists of four 300-MHz Pentium II-based workstations connected by a 100 Mb/s Ethernet. Each workstation has two processors, 512 MB of RAM, and a 4 GB disk used to implement stable storage. The machines ran Solaris 2.6, and we used Sun's f77 and C compilers. The testbed is part of the Egida tool [16], which includes support for incremental checkpointing and implements non-blocking checkpointing by forking off a child process that writes the checkpoint to stable storage. The applications under study consist of four MPI [17] programs from the NPB 2.3 benchmark suite [3]. These programs represent common computational loads in fluid dynamics applications and typify the kind of applications that have traditionally benefited from checkpointing; their characteristics are given in Table 1. The performance metrics we report are the number of forced checkpoints that a protocol causes and the performance overhead. We use a combination of experiments on the prototype implementation, and we then use the prototype itself to validate a simulator that we built to study CIC protocols further, under different communication patterns and environments.

    4.1 The Measured Performance of CIC Protocols

The first set of experiments consists of running each of the four applications under the three protocols and under two local checkpoint placement policies. The first policy triggers local checkpoints according to an exponential distribution with a mean checkpoint interval of 360 seconds, while the second policy uses the same probabilistic distribution with the mean checkpoint interval set to 480 seconds. Table 2 shows the results of the experiments. It reports the execution time of the entire application, in addition to the per-process average number of local and forced checkpoints. For convenience, the average per-process total number of checkpoints is also reported. The table also reports the per-process average checkpoint size (either local or forced).

Analysis. The results reveal a few issues. In BCS and HMNR, the number of forced checkpoints is essentially the same. In contrast, the BQC protocol shows a comparatively larger number of forced checkpoints than the other two. The reason for this can be ascribed to the communication pattern of the applications under study. These applications use a common iterative structure to solve a computationally intensive problem in which processes exchange partial results and resume. This leads to a communication pattern that mimics a periodic broadcast. Under this pattern, the BQC protocol seems to be too "eager" in preventing Z-cycles compared to BCS and HMNR. Furthermore, two additional effects seem to occur:

1. Many suspected Z-cycles end up causing forced checkpoints without actually being a menace.

2. It is often the case that more than one process "volunteers" in parallel to break the same suspected Z-cycle by forcing checkpoints.

Consider Figure 4. In this example, we see process p_2 take a forced checkpoint because of message m_3, not knowing that process p_1 has already broken the suspected Z-cycle (part (2) of the figure). Similarly, process p_0 takes a forced checkpoint because of message m_2, not knowing that process p_1 has already broken the Z-cycle using checkpoint C_{1,1}. This behavior continues for a while under the communication pattern used by our applications. This suggests that there is a disadvantage to using CIC protocols that suspect Z-cycles or that are eager to prevent a Z-cycle from forming before it is actually clear that one is indeed forming. In contrast, protocols that use time-stamping functions adopt a lazy approach: they prevent the formation of Z-cycles only at the last possible moment, and therefore work better. Additionally, the results show that the alleged benefit of process autonomy in placing local checkpoints does not materialize in practice. Under the best circumstances, a process takes twice as many forced checkpoints as local ones. The curious notion of process autonomy in distributed systems where all processes become inter-dependent seems to be on shaky ground. The results also point to another serious problem with CIC protocols in general, namely the unpredictability of the checkpointing rate. In all experiments, the protocols ended up taking more checkpoints than could be anticipated based on the local distribution of checkpoint placement. For BCS and HMNR, the number of forced checkpoints was generally twice the number of local ones. For BQC, the ratio was worse. The ratio itself is a function of the application, the number of processes, and the checkpoint placement. The fact that it is unpredictable makes the protocols cumbersome to use in practice, because it is difficult to plan ahead of time the actual stable storage requirements and the mean checkpointing interval. Contrast this with consistent checkpointing protocols, where the number of checkpoints and the required stable storage can be estimated with great certainty beforehand [6]. The table also points to another negative aspect of using CIC protocols: the performance overhead in running time was relatively high, reaching between 5% and 20% of the execution time.


Application | NPB Specific Info. | Messages/sec. | Message Size (Avg., KB) | Exec. Time (sec.)
bt          | Class A            | 65            | 0.5                     | 1530
cg          | Class B            | 146           | 0.7                     | 1516
lu          | Class A            | 54            | 3.7                     | 975
sp          | Class A            | 174           | 3.4                     | 1222

Table 1: Characteristics of the benchmarks used in the experiments.

Application | Protocol | Ckpt Interval (sec) | Local ckpts | Forced ckpts | Total ckpts | Ckpt Size (MB) | Exec. Time (sec.)
bt | BCS  | 360 | 6 | 13 | 19 | 69.5 | 1777
bt | BCS  | 480 | 4 |  9 | 13 | 70.9 | 1715
bt | HMNR | 360 | 5 | 11 | 16 | 70.5 | 1709
bt | HMNR | 480 | 4 |  9 | 13 | 71.0 | 1683
bt | BQC  | 360 | 6 | 54 | 60 | 41.5 | 1875
bt | BQC  | 480 | 4 | 39 | 43 | 41.0 | 1819
cg | BCS  | 360 | 5 | 11 | 16 | 17.6 | 1683
cg | BCS  | 480 | 4 |  9 | 13 | 18.9 | 1655
cg | HMNR | 360 | 5 | 11 | 16 | 17.7 | 1655
cg | HMNR | 480 | 3 | 10 | 13 | 19.1 | 1643
cg | BQC  | 360 | 5 | 24 | 29 |  8.5 | 1689
cg | BQC  | 480 | 3 | 16 | 19 | 13.5 | 1665
lu | BCS  | 360 | 4 |  9 | 13 | 10.7 | 1051
lu | BCS  | 480 | 2 |  4 |  6 | 11.0 | 1033
lu | HMNR | 360 | 4 |  9 | 13 | 10.8 | 1035
lu | HMNR | 480 | 2 |  4 |  6 | 11.0 | 1015
lu | BQC  | 360 | 4 | 33 | 37 |  6.6 | 1050
lu | BQC  | 480 | 2 | 14 | 16 |  6.0 | 1036
sp | BCS  | 360 | 5 | 11 | 16 | 20.8 | 1339
sp | BCS  | 480 | 3 |  7 | 10 | 21.2 | 1290
sp | HMNR | 360 | 5 | 11 | 16 | 20.7 | 1320
sp | HMNR | 480 | 3 |  7 | 10 | 21.1 | 1300
sp | BQC  | 360 | 5 | 37 | 42 | 11.8 | 1362
sp | BQC  | 480 | 3 | 31 | 34 | 12.2 | 1329

Table 2: Performance of three CIC protocols for two checkpoint intervals and four applications.

This behavior is actually common in systems where checkpoints are not coordinated and processes communicate frequently [6]. In these situations, when a process takes a local checkpoint independently of the others, it inevitably slows down because of the state saving and memory copying that occur during the checkpoint. This in turn delays the production of the expected partial result that the process will send to others in the next communication round. Consequently, the slowdown affects other processes even if they are not taking checkpoints in the meantime. The resulting slowdowns stagger quickly and have a cumulative effect because many of these independent checkpoints occur at different times [6]. Finally, we would like to note that incremental checkpointing seems to mitigate some of the effects of having to take so many checkpoints (forced or local). The results show that the average per-process checkpoint size goes down as the frequency of checkpointing increases, just as one would expect. In summary, lazy protocols that use time-stamping to break Z-cycles perform better than eager protocols that take forced checkpoints as soon as they suspect a Z-cycle. The unpredictability of the actual number of checkpoints to be taken (forced and local) makes these protocols cumbersome to use in practice, because no reasonable planning of resources and checkpointing frequency can be made without understanding the application and its communication patterns. Also, it seems that any hope of a benefit from allowing processes to take independent checkpoints is thwarted by the fact that a process ends up taking at least twice as many forced checkpoints as local ones. And finally, CIC protocols share some of the negative performance properties of independent checkpointing when used in computations where the processes are tightly coupled and communicate frequently.

    4.2 Scalability and Effects of Communication Patterns

To assess the effect of increasing the number of processes on protocol performance, we constructed a simulator to measure the number of forced checkpoints for each of the three protocols. We first validated the simulator using the measured number of forced checkpoints for 4 processes, and we then used it to estimate the number of forced checkpoints for different numbers of processes and different communication patterns. Figure 5 shows the results for the three protocols as the number of processes in the computation varies. Two different sets of measurements are reported. The first set is for a random distribution of messages with a relatively low load, where each process sends an average of 10 messages between two consecutive local checkpoints. That is, in this simulation processes do not communicate much, and they communicate with different processes equally at random and at random intervals.


Figure 4: Anomalies in detecting suspected Z-cycles.

During this simulation, 119 local checkpoints were taken on average. The second set of measurements shows the same results but with a different communication pattern, in which each process talks to two designated neighbors at uniform intervals. A process sends about 500 messages between two consecutive local checkpoints. This communication pattern is representative of those that occur in distributed over-relaxation algorithms.

Analysis. The results show that, in general, the communication pattern strongly affects the behavior of CIC protocols. This is expected. But the results also show that CIC protocols do not scale well. In both cases, there is an almost linear increase in the number of forced checkpoints per process as the number of processes increases. For this set of applications, at least, the conventional wisdom that these protocols scale better because they do not resort to global coordination does not hold. The results also show that CIC protocols seem to favor random patterns of communication with low loads.

    4.3 An Adaptive Local Checkpointing Policy

The results of the experiments so far suggest that a flurry of forced checkpoints occurs throughout the system as a result of one process taking a local checkpoint. It is plausible that, if forced checkpoints are not taken into account, a local checkpointing policy may take a local checkpoint shortly after a forced checkpoint has been taken. Such a local checkpoint advances the recovery point of the process by a very short amount compared to the previous forced checkpoint. Furthermore, this local checkpoint is likely to trigger more forced checkpoints in other processes, escalating the phenomenon even further. It may be argued that the resulting overhead can be limited by using incremental checkpointing, and that therefore the local checkpoint does not have to save a lot of state on stable storage if a forced checkpoint has been taken recently. But we contend that taking a checkpoint, however small, always has an overhead associated with it, if only to compute the state that must be saved and to arrange for the copy-on-write used to implement non-blocking checkpointing. Be that as it may, this overhead cannot be ignored. Therefore, there is very little to gain by taking this local checkpoint, while there is a potential for larger overhead. To address this problem, we experimented with an adaptive local checkpointing policy that refrains from taking a scheduled local checkpoint if a forced checkpoint has occurred during the last T seconds, where T is a parameter. Figure 6 shows the resulting number of local and forced checkpoints for the four applications and the three protocols under study. We report three measurements, one with the adaptive policy disabled (T = 0) and two for different values of T (60 and 90 seconds). For each T, the figure shows the number of local and forced checkpoints under each of the three protocols. The measurements for different applications are reported separately.
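A minimal sketch of the adaptive policy described above (ours; the hooks into the checkpointing layer are assumed): a scheduled local checkpoint is skipped whenever a forced checkpoint occurred within the last T seconds.

```python
import time

class AdaptiveLocalPolicy:
    """Skip a scheduled local checkpoint if a forced checkpoint occurred
    during the last T seconds (sketch of the policy described above)."""

    def __init__(self, T):
        self.T = T                        # suppression window in seconds (e.g., 60 or 90)
        self.last_forced = float("-inf")  # time of the most recent forced checkpoint

    def note_forced_checkpoint(self):
        # Called by the CIC protocol whenever it takes a forced checkpoint.
        self.last_forced = time.time()

    def should_take_local_checkpoint(self):
        # Called when the local placement policy schedules a checkpoint.
        return (time.time() - self.last_forced) >= self.T
```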

Analysis. The results show that taking forced checkpoints into account reduces the number of local checkpoints, which in turn reduces the number of forced checkpoints. The effect is more pronounced for the BQC protocol. These results show two things:

1. A successful local checkpoint placement policy must contain a dynamic element that accounts for the occurrence of forced checkpoints. The simple policy "let us checkpoint every x seconds" does not work well.

2. A successful local checkpoint placement policy must adapt to the application communication patterns as they change during execution. This would reduce the frequency of local checkpoints when the communication load is heavy and the frequency of forced checkpoints is high, and vice versa.

Our recommendations once more highlight the unpredictability that faces a user of CIC protocols in practice, though they also outline plausible solutions. It is perhaps possible to come up with better placement policies than the one we outlined here, but this is outside the scope of this paper.

    5 Conclusions

We have conducted several experiments to analyze the behavior and characteristics of communication-induced checkpointing for a class of compute-intensive distributed applications. Our results show that:


Figure 5: Number of forced checkpoints for BCS, HMNR, and BQC with 4, 8, and 16 processes: (a) random communication pattern with low load (119 local checkpoints on average); (b) two-neighbor communication pattern with heavy load (118 local checkpoints on average).

1. CIC protocols that use an eager approach to preventing Z-cycles by taking forced checkpoints whenever they suspect the formation of a Z-cycle are bound to perform worse than lazy protocols that use a time-stamping function to prevent a Z-cycle at the last possible second.

2. CIC protocols do not scale well with a larger number of processes. We have found that the number of forced checkpoints increases almost linearly with the number of processes.

3. A process takes at least twice as many forced checkpoints as local ones. Therefore, the touted benefit of autonomy of CIC protocols in allowing the processes to take independent checkpoints does not seem to materialize in practice.

4. There is considerable unpredictability in the way CIC protocols behave in practice. The amount of stable storage required, the performance overhead, and the number of forced checkpoints depend greatly on the number of processes, the application, and the communication pattern. This unpredictability makes the use of CIC protocols in practice more cumbersome than other alternatives.

5. A successful placement policy for local checkpoints must be dynamic, must account for forced checkpoints, and must adapt to changes in the application behavior.

6. CIC protocols seem to perform best when the communication load is low and the pattern is random. Regular, heavy-load communication patterns seem to fare worse.

We would like to stress that the results are valid only for the application set that we have studied; we lay no claim that these results generalize to all applications. Nevertheless, we believe that there is sufficient evidence to suspect that much of the conventional wisdom about these protocols is questionable, and that more experimental work is needed to investigate these protocols further.

    References

[1] R. Baldoni, F. Quaglia, and B. Ciciani. A VP-Accordant Checkpointing Protocol Preventing Useless Checkpoints. In Proceedings of the IEEE Symposium on Reliable Distributed Systems, pages 61-67, 1998.

[2] D. Briatico, A. Ciuffoletti, and L. Simoncini. A Distributed Domino-Effect Free Recovery Algorithm. In Proceedings of the IEEE International Symposium on Reliability in Distributed Software and Database Systems, pages 207-215, December 1984.

[3] NASA Ames Research Center. NAS Parallel Benchmarks. http://science.nas.nasa.gov/Software/NPB, 1997.

[4] K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63-75, February 1985.

[5] E. N. Elnozahy, D. B. Johnson, and Y. M. Wang. A Survey of Rollback-Recovery Protocols in Message-Passing Systems. Technical Report CMU-CS-96-181, Carnegie Mellon University, 1996.

[6] E. N. Elnozahy, D. B. Johnson, and W. Zwaenepoel. The Performance of Consistent Checkpointing. In Proceedings of the Eleventh Symposium on Reliable Distributed Systems, pages 39-47, October 1992.

[7] J. M. Helary, A. Mostefaoui, R. H. B. Netzer, and M. Raynal. Preventing Useless Checkpoints in Distributed Computations. In Proceedings of the IEEE International Symposium on Reliable Distributed Systems, pages 183-190, 1997.

[8] J. M. Helary, A. Mostefaoui, and M. Raynal. Virtual precedence in asynchronous systems: concepts and applications. In Proceedings of the 11th Workshop on Distributed Algorithms. LNCS press, 1997.

[9] L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7):558-565, July 1978.


Figure 6: The effect of adaptive local checkpointing: number of local and forced checkpoints for the three protocols under different values of T for the four applications under study. Panels: (a) BT, (b) CG, (c) LU, (d) SP; bars show local and forced checkpoints for BCS, HMNR, and BQC at T = 0, 60, and 90 seconds.

[10] C. C. Li and W. K. Fuchs. CATCH: Compiler-assisted techniques for checkpointing. In Proceedings of the 20th International Symposium on Fault-Tolerant Computing, pages 74-81, 1990.

[11] R. Netzer and J. Xu. Necessary and Sufficient Conditions for Consistent Global Snapshots. Technical Report 93-32, Department of Computer Science, Brown University, July 1993.

[12] N. Neves and W. K. Fuchs. RENEW: A Tool for Fast and Efficient Implementation of Checkpoint Protocols. In Proceedings of the 28th IEEE Fault-Tolerant Computing Symposium (FTCS), Munich, Germany, June 1998.

[13] J. S. Plank, M. Beck, and G. Kingsley. Compiler-assisted memory exclusion for fast checkpointing. IEEE Technical Committee on Operating Systems Newsletter, Special Issue on Fault Tolerance, pages 62-67, December 1995.

[14] J. S. Plank, M. Beck, G. Kingsley, and K. Li. Libckpt: Transparent checkpointing under Unix. In Proceedings of the USENIX Technical Conference, pages 213-224, January 1995.

[15] J. S. Plank and K. Li. Faster checkpointing with N+1 parity. In Proceedings of the Twenty-Fourth International Symposium on Fault-Tolerant Computing (FTCS-24), pages 288-297, June 1994.

[16] S. Rao, L. Alvisi, and H. M. Vin. Egida: An Extensible Toolkit for Low-overhead Fault-tolerance. In Proceedings of the IEEE Fault-Tolerant Computing Symposium (FTCS-29), Madison, WI, June 1999.

[17] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra. MPI: The Complete Reference. Scientific and Engineering Computation Series. The MIT Press, Cambridge, MA, 1996.

[18] Y. M. Wang and W. K. Fuchs. Optimistic message logging for independent checkpointing in message-passing systems. In Proceedings of the 11th Symposium on Reliable Distributed Systems, pages 147-154, October 1992.