1 Bugs, Bugs and Bugs. 2 Bugs: Run Time Handling Heisenbugs/MandelbugsHeisenbugs/Mandelbugs...

Post on 15-Jan-2016



1

Bugs, Bugs and Bugs

2

Bugs: Run Time Handling

• Heisenbugs/Mandelbugs
– Heisenbugs are easier to take care of during run time
– Higher chance that robust programming mechanisms are successful

• Bohr bugs are typically easier to find and fix…at design time

• But harder to take care of during run time

3

Perturbation Classifications/Coverage

• Persistence
– Transient fault
– Intermittent fault
– Permanent fault

• Creation time
– Design fault
– Operational fault

• Intention
– Accidental fault
– Intentional fault

• Crash failure
– Fail-silent and fail-stop

• Omission failure

• Timing failure
– System fails to respond within a specified time slice
– Both late and early responses might be “bad”
– Late timing failure = performance failure

• Arbitrary failure
– System behaves arbitrarily

4

Robust Programming Mechanisms

Objective: Sustain the delivery of services despite perturbations!

• Process Pairs
• Graceful Degradation
• Selective Retry
• Checkpointing
• Rejuvenation
• Micro-reboots
• Recovery Blocks
• Diversity (NVP, NCP)
• ...

5

Process Pairs (Continual Service)

Implementation variants:
- Active replicas: both process client requests [+ fast; - complex]
- Primary/Backup: state transfer [+ simpler; - delay]

6

Process Pairs

• Process pair scheme is robust to varied types of software faults (crashes, resource shortage/delays, load…):
– Study of print servers with process pair technology (primary/backup)
– 2000 systems; 10 million system hours
– 99.3% of failures affected only one server, i.e., 99.3% of failures were tolerated

7

Simple Process Pair (same host)

Server process:

    ...
    forever {                        // event loop
        wait_for_request(Request);
        process_request(Request);
    }

8

Simple Process Pair (same host)

Server process:

    int ft = backup();               // create backup process; primary returns
    ...
    forever {                        // event loop
        wait_for_request(Request);
        process_request(Request);
    }

9

Simple Process Pair Implementation

[Figure: backup process and event loop]

- Don’t forget that we are assuming that the backup has the “full” state info or that the needed state is stored on (external) stable storage
- Mostly focusing on crash failures… the primary can hang too… watchdog timers
- Transients are OK too, except this model is at a basic concept level…

10

Syscalls

[Figure: timeline of the parent process and the kernel, with repeated fork and waitpid calls]

...

11

man page: fork

fork() creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.

RETURN VALUE
On success, the PID of the child process is returned in the parent's thread of execution, and a 0 is returned in the child's thread of execution. On failure, a -1 will be returned in the parent's context, no child process will be created, and errno will be set appropriately.

ERRORS
EAGAIN  fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
EAGAIN  It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered.
ENOMEM  fork() failed to allocate the necessary kernel structures because memory is tight.

12

man page: waitpid(pid, *status, options)

The waitpid() system call suspends execution of the current process until a child specified by pid argument has changed state. By default, waitpid() waits only for terminated children. The value of pid can be:

< -1  meaning wait for any child process whose process group ID is equal to the absolute value of pid.
-1    meaning wait for any child process.
0     meaning wait for any child process whose process group ID is equal to that of the calling process.
> 0   meaning wait for the child whose process ID is equal to the value of pid.

waitpid(): on success, returns the process ID of the child whose state has changed; on error, -1 is returned.

ERRORS
ECHILD  The process specified by pid does not exist or is not a child of the calling process. (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the LINUX NOTES section about threads.)
EINTR   WNOHANG was not set and an unblocked signal or a SIGCHLD was caught.
EINVAL  The options argument was invalid.

13

Simple Process Pair

    int backup() {
        int ret, restarts = 0;                  // count number of child procs
        for (;; restarts++) {
            ret = fork();                       // create child
            if (ret == 0) {                     // child?
                return restarts;
            }
            while (ret != waitpid(ret, 0, 0))   // parent waits for child to terminate
                ;
        }
    }

waitpid(PID, *status, options)

14

Robust?

15

Failed fork system call (looping?)...

    int backup() {
        int ret, restarts = 0;
        for (;; restarts++) {
            ret = fork();
            if (ret == 0) {                     // child?
                return restarts;
            }
            while (ret != waitpid(ret, 0, 0))
                ;
        }
    }

– fork() returns -1 on error
– waitpid() returns with -1: no child created
– the loop retries fork() until success
– the parent does not return

16

Problem: forked another child

    ...
    fork();                    // fork non-terminating child
    ...
    backup() ...
        fork();                // ret = fork(): fails, returns -1
        waitpid(-1, 0, 0);     // waitpid(ret, 0, 0): waits for any child ... might not return

17

Graceful Degradation

    int backup() {
        int ret, restarts = 0;
        for (;; restarts++) {
            ret = fork();
            if (ret < 0) {                      // process can run without backup:
                log("backup: ...");             // just return if fork fails
                return -1;
            }
            if (ret == 0) {                     // child returns
                return restarts;
            }
            while (ret != waitpid(ret, 0, 0))
                ;
        }
    }

18

Selective Retries

• Retries:
– repeat a call until it succeeds or until we run out of time (timeout) or reach the max. number of retries

• Selective Retries:
– repeat a call only when there is a chance that the retry can succeed
– e.g., memory shortage might disappear
– e.g., invalid argument will typically stay invalid
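The distinction can be sketched in C as a small errno classifier plus a bounded retry loop. The choice of retryable errors, the function names, and the two demo operations are assumptions made for illustration; a real service would tune the list per call.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of errno values worth retrying. */
bool is_retryable(int err) {
    switch (err) {
    case EAGAIN:    /* resource temporarily unavailable */
    case ENOMEM:    /* memory shortage might disappear */
    case EINTR:     /* interrupted by a signal */
        return true;
    default:        /* e.g. EINVAL: an invalid argument stays invalid */
        return false;
    }
}

/* Calls op(arg) until it succeeds (returns >= 0), fails with a
 * non-retryable errno, or max_tries attempts have been made.
 * Returns op's last result; *tries_out receives the attempt count. */
int retry_selectively(int (*op)(void *), void *arg, int max_tries, int *tries_out) {
    int result = -1, tries = 0;
    while (tries < max_tries) {
        tries++;
        result = op(arg);
        if (result >= 0 || !is_retryable(errno))
            break;
    }
    if (tries_out != NULL)
        *tries_out = tries;
    return result;
}

/* Demo operation: fails with EAGAIN until the counter behind arg reaches 0. */
int flaky_op(void *arg) {
    int *failures_left = arg;
    if (*failures_left > 0) {
        (*failures_left)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}

/* Demo operation: always fails with EINVAL. */
int invalid_op(void *arg) {
    (void)arg;
    errno = EINVAL;
    return -1;
}
```

With these demo operations, `flaky_op` is retried until it succeeds, while `invalid_op` is attempted exactly once.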

19

Not always clear if retry could succeed

fork() creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.

RETURN VALUE
On success, the PID of the child process is returned in the parent's thread of execution, and a 0 is returned in the child's thread of execution. On failure, a -1 will be returned in the parent's context, no child process will be created, and errno will be set appropriately.

ERRORS
EAGAIN  fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
EAGAIN  It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered.
ENOMEM  fork() failed to allocate the necessary kernel structures because memory is tight.

20

Selective Retries

    int ft = backup();       // can fail, but might succeed when more memory is available or there are fewer processes
    ...
    forever {
        wait_for_request(Request);
        process_request(Request);
    }

Infinite retries? Delays?

21

Selective Retries

    int ft = backup();
    ...
    forever {
        if (ft < 0) { ft = backup(); }   // retry if no backup
        wait_for_request(Request);
        process_request(Request);
    }

Might be a lot of retries ... state might already be corrupted

22

Retry Questions...

• How often should we retry?
– should we wait between retries?

• Should we retry at some later point in time?
– how many times until we give up?

• At what level should we retry?
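One common answer to the first two questions, offered here as an illustration rather than as the approach of these slides, is capped exponential backoff with a maximum attempt count. The function names and parameters below are hypothetical.

```c
#include <stdint.h>

/* Delay before retry number `attempt` (0-based): base_ms doubled
 * `attempt` times, but never more than cap_ms. */
uint32_t backoff_delay_ms(unsigned attempt, uint32_t base_ms, uint32_t cap_ms) {
    uint64_t delay = base_ms;
    for (unsigned i = 0; i < attempt && delay < cap_ms; i++)
        delay *= 2;
    return delay < cap_ms ? (uint32_t)delay : cap_ms;
}

/* Returns 1 if another retry is allowed, 0 if the caller should give up. */
int should_retry(unsigned attempt, unsigned max_attempts) {
    return attempt < max_attempts;
}
```

With a base of 100 ms and a cap of 5000 ms, the delays grow as 100, 200, 400, 800 ms and then level off at the cap.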

23

Hierarchical Retries

• Potentially: exponential increase in retries!
• Composability: retries should be independent of each other!

[Figure: call chain from function f down to function h, with a retry at each level]
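The growth can be made concrete with a small simulation: if each of k nested levels makes up to r attempts and the innermost call keeps failing, the innermost call runs r^k times. The sketch below counts those calls; all names are hypothetical.

```c
static unsigned long leaf_calls;

/* Innermost operation: always fails, and counts how often it ran. */
static int leaf(void) {
    leaf_calls++;
    return -1;
}

/* A function at `level` retries its callee up to `tries` times. */
static int call_level(unsigned level, unsigned tries) {
    for (unsigned t = 0; t < tries; t++) {
        int r = (level == 0) ? leaf() : call_level(level - 1, tries);
        if (r == 0)
            return 0;
    }
    return -1;
}

/* Number of times the innermost call runs with `levels` nested
 * retrying functions (levels >= 1), each making `tries` attempts. */
unsigned long count_leaf_calls(unsigned levels, unsigned tries) {
    leaf_calls = 0;
    if (levels == 0)
        return 0;
    call_level(levels - 1, tries);
    return leaf_calls;
}
```

Three levels with three attempts each already execute the failing call 27 times, which is one argument for retrying at a single level only.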

24

Selective Retries

• Under high load, calls might fail due to resource shortage
• We can use selective retries to increase the probability of success during resource allocation
• Operating systems like Linux have an out-of-memory (OOM) killer that terminates processes if too few resources exist
• With selective retries, processes that survive get the chance to complete their requests

25

Bohrbugs

26

Continuous Crashing

27

Continuous Crashing

• Finite number of retries by client?
– client will stop sending the request eventually

• But what if we cannot control clients?
– clients might think it is fun to crash the server? DoS attacks take place like this!

What happens if the retried request activates Bohrbugs?

28

Graceful Degradation

• Alternative Approach:
– server needs to make sure that a failed request is only retried a fixed number of times

• Problem:
– how can we know that a request has already been partially processed several times?

• Solution:
– need to keep some state info between request instances!
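Such per-request state can be sketched as a small attempt table. This is a minimal sketch under stated assumptions: requests carry a numeric id, the table sizes are arbitrary, and the table is kept in memory for brevity. In a process-pair setting it would have to live somewhere that survives the crash (for example stable storage), and the attempt must be recorded before processing starts, otherwise a crash during processing is never counted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED  64   /* illustrative table size */
#define MAX_ATTEMPTS 3    /* bound on attempts per request */

struct attempt_entry {
    unsigned long request_id;
    int attempts;
    bool used;
};

static struct attempt_entry attempts_table[MAX_TRACKED];

/* Records one more attempt for request_id. Returns true if the request
 * may be processed, false once it has used up MAX_ATTEMPTS. */
bool begin_attempt(unsigned long request_id) {
    struct attempt_entry *free_slot = NULL;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        struct attempt_entry *e = &attempts_table[i];
        if (e->used && e->request_id == request_id) {
            if (e->attempts >= MAX_ATTEMPTS)
                return false;
            e->attempts++;
            return true;
        }
        if (!e->used && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return false;            /* table full: refuse rather than risk a crash loop */
    free_slot->used = true;
    free_slot->request_id = request_id;
    free_slot->attempts = 1;
    return true;
}

/* Forgets a request once it has been processed successfully. */
void finish_request(unsigned long request_id) {
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (attempts_table[i].used && attempts_table[i].request_id == request_id)
            attempts_table[i].used = false;
    }
}
```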

29

State Handling

30

Using Session State

    int ft = backup();
    ...
    forever {
        wait_for_request(Request);
        get_session_state(Request);
        if (num_retries < N) {
            process_request(Request);
            store_session_state(Request);
        } else { return_error(); }
    }

(slide annotation: updates number of retries)

31

Crash of Parent!

32

What if parent process dies?

Possible reasons:
• Operator might kill the wrong process
• Parent might terminate for some other reason, e.g.,
– Linux: out-of-memory process killer (see earlier slide!)
– Kills processes that use too much memory:
  • “more cpu time decreases the chance of being killed”
  • Parent could get killed

33

Detecting Parent Crashes

34

Detection of Process Crashes

• Pipe used to communicate between procs
– Unix: ls | sort

• Pipe end closed when
– process terminates

• Process B can detect
– when process A terminated

35

Adding Parent Termination Detection

    int fd[2];                   // pipe fd: fd[0] is the read end, fd[1] is the write end
    int backup() { ...
        pipe(fd);
        ret = fork();
        if (ret == 0) {          // child?
            close(fd[1]);        // write end
            return restarts++;
        }
        // parent closes other end:
        close(fd[0]);            // read end
        ...

36

Child can detect parent termination

    int hasParentTerminated() {
        // check if other end of pipe has been closed
        ...
    }

has to be called periodically
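One way to fill in the elided check is a non-blocking poll() on the read end of the pipe, which reports POLLHUP once every copy of the write end has been closed. This is a sketch, not the slide's implementation; the function name and its parameter are assumptions, and it relies on the child having closed its own copy of the write end as in the backup() code above.

```c
#include <poll.h>
#include <stdbool.h>

/* Returns true if all write ends of the pipe behind read_fd have been
 * closed, i.e. the process holding the write end has terminated.
 * Does not block and does not consume data from the pipe. */
bool has_parent_terminated(int read_fd) {
    struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
    int n = poll(&pfd, 1, 0);    /* timeout 0: only sample the current state */
    return n > 0 && (pfd.revents & POLLHUP) != 0;
}
```

If any other process inherited the write end and keeps it open, the hang-up is not reported, so stray descriptors have to be closed after fork().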

37

Problem: State Corruption

38

Parent Replacement

already executed requests

e.g., new parent allocated resources that are never freed

39

Alternative Approach

40

Process Links

Generalized Crash Detection

41

Linking Processes

• We can use a pipe as a failure detector:
– We can detect that a process has terminated

• We can use that for:
– Replacing failed processes
– Providing some “termination atomicity”:
  • If one process fails, some other processes might not be able to work properly anymore
  • One simple way is to terminate all such processes
  • Garbage collection of processes

42

Process Links: “Termination Atomicity”

• Set of cooperating processes
• If some process p terminates, each linked process q must terminate
• We can link processes via “process links”:
– Programming language support: Java, Erlang, …

43

Pipe And Filter

44

Example: Farmer / Worker

45

Asymmetric Link Behavior

46

Master as Process Pair

Mitigates parent crash semantics by avoiding terminations as far as possible, for liveness

Error Recovery in Distributed Systems (DS)

Checkpointing

48

Handling Transients?

• Transient fault: a fault that is no longer present after system restart

• Many flavors:
– SW transients
– OS transients
– Middleware/Protocol transients
– Network transients
– Operational transients
– Power transients

• Need to recover from the effects of transients, and to detect them! … let us assume simple local sanity checks (acceptance tests) exist!

49

So how does one handle these transients?

Objective:
- sustained ops (key driver: sustained performance)
- transparent handling of bugs (to users and application designers)

System Model: Coupled/Distributed/Networked Processes

50

Periodic Checkpointing

51

Checkpointing

    pid_t parent = getpid();
    ...
    for (int nxt_ckpt = 0;; nxt_ckpt--) {
        if (nxt_ckpt <= 0) {
            pid_t newparent = getpid();
            if (backup() >= 0 && parent != newparent) {
                kill(parent, SIGKILL);   // the old parent holds a stale checkpoint
                parent = newparent;
                nxt_ckpt = N;
            }
        }
        wait_for_request(Request);
        process_request(Request);
    }

52

Backup Code Revisited

• Issue:
– If we have multiple generations, we want the ancestors only to take over if none of the children is alive

• Use process links instead of waitpid
– waitpid in an endless loop is dangerous anyhow...

53

Temporal Redundancy

“Redo” tasks on error detection

[Figure: task progress of process P; a transient occurs (and is detected) at X; REDO task]

54

Backward Error Recovery

• Save process state at predetermined (periodic) recovery points
– Called “checkpoints”
– Checkpoints stored on stable storage, not affected by same failures

• Recover by rolling back to a previously saved (error-free) state

[Figure: task progress of process P with checkpoints (chkpt); a transient at X, detected by an acceptance test]

chkpt: complete set of (state) information needed to restart task execution from chkpt.
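A minimal in-memory sketch of backward recovery: take a copy of the state as the checkpoint, apply an operation, run an acceptance test, and roll back when the test fails. The state layout, the acceptance test, and all names are assumptions for illustration; a real checkpoint would go to stable storage.

```c
#include <stdbool.h>

struct state {                  /* hypothetical process state */
    long balance;
    long ops_applied;
};

static struct state checkpoint_copy;

void take_checkpoint(const struct state *s) { checkpoint_copy = *s; }
void roll_back(struct state *s)             { *s = checkpoint_copy; }

/* Hypothetical acceptance test: the balance must never go negative. */
bool acceptance_test(const struct state *s) { return s->balance >= 0; }

/* Applies delta; if the acceptance test fails afterwards, rolls back to
 * the last checkpoint. Returns true if the new state was kept. */
bool apply_with_recovery(struct state *s, long delta) {
    s->balance += delta;
    s->ops_applied++;
    if (!acceptance_test(s)) {
        roll_back(s);
        return false;
    }
    return true;
}
```

Note that the rollback also discards every successful operation applied since the checkpoint, which is why checkpointing is usually combined with a request log.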

55

Advantages of Backward Recovery

+ Requires no knowledge of the errors in the system state

+ Can handle arbitrary / unpredictable faults (as long as they do not affect the recovery mechanism)

+ Can be applied regardless of the sustained damage (the saved state must be error-free, though)

+ General scheme / application independent

+ Particularly suitable for recovering from transient faults

56

Disadvantages of Backward Recovery

―Requires significant resources (e.g. time, computation, stable storage) for checkpointing and recovery

―Checkpointing requires
– identifying consistent states
– the system to be halted / slowed down temporarily

―Care must be taken in concurrent systems to avoid orphan messages, lost messages and the domino effect (will cover later in the lecture...)

57

Forward Error Recovery

• Detect the error
• Damage assessment
• Build a new error-free state from which the system can continue execution
– “Safe stop”
– Degraded mode
– Error compensation
  • E.g., switching to a different component, etc…

[Figure: timeline: fault manifests, fault detected, damage assessment, state reconstruction]

58

Advantages of Forward Recovery

+ Efficient (time / memory)
– If the characteristics of the fault are well understood, forward recovery is a very efficient solution

+ Well suited for real-time applications
– Missed deadlines can be addressed

+ Anticipated faults can be dealt with in a timely way using redundancy

59

Disadvantages of Forward Recovery

- Application-specific
- Can only remove predictable errors from the system state
- Requires knowledge of the actual error
- Depends on the accuracy of error detection, potential damage prediction, and actual damage assessment
- Not usable if the system state is damaged beyond recoverability

60

Error Recovery

• Save process state at predetermined (periodic) recovery points
– Called “checkpoints”
– Checkpoints stored on stable storage, not affected by same failures

• Recover by rolling back to a previously saved (error-free) state

[Figure: task progress of process P with checkpoints (chkpt); a transient at X, detected by an acceptance test]

chkpt: complete set of (state) information needed to restart task execution from chkpt.

61

Logging Requests

    request_no = 0;
    ...
    for (int nxt_ckpt = 0;; nxt_ckpt--) {
        checkpoint(&nxt_ckpt);
        wait_for_request(Request);
        log_to_disk(++request_no, Request);
        process_request(Request);
    }

62

Processing Log

    request_no = 0;
    ...
    for (int nxt_ckpt = 0;; nxt_ckpt--) {
        if (checkpoint(&nxt_ckpt) == recovery) {
            while ((request_no+1, R) in log) {
                process_request(R);
                request_no++;
            }
        }
        wait_for_request(Request);
        log_to_disk(++request_no, Request);
        process_request(Request);
    }
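The replay step can be sketched with an in-memory log. This is a simplified illustration, not the slide's code: requests are plain numbers, "processing" adds them to a running total, and the log and checkpoint live in static memory rather than on disk. All names are hypothetical.

```c
#include <stddef.h>

#define LOG_CAPACITY 128

struct server {
    long total;              /* application state */
    unsigned request_no;     /* number of the last request processed */
};

static long request_log[LOG_CAPACITY];   /* request_log[i] holds request number i + 1 */
static unsigned logged;                   /* number of requests in the log */
static struct server saved;               /* last checkpoint */

void process_request(struct server *s, long req) { s->total += req; }

/* Normal path: log first, then process. Returns 0, or -1 if the log is full. */
int handle_request(struct server *s, long req) {
    if (logged >= LOG_CAPACITY)
        return -1;
    request_log[logged++] = req;
    s->request_no++;
    process_request(s, req);
    return 0;
}

void checkpoint(const struct server *s) { saved = *s; }

/* Recovery: restore the checkpoint, then replay every logged request
 * that is newer than the checkpoint. */
void recover(struct server *s) {
    *s = saved;
    while (s->request_no < logged) {
        process_request(s, request_log[s->request_no]);
        s->request_no++;
    }
}
```

Replay only reproduces the original state when request processing is deterministic, which is exactly the concern raised on the next slide.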

63

Problems: lost updates, corrupted saved states... not easy to fix!

• State diverges from original computation
– results of replayed requests might be different
  • could detect this by keeping a log of replies
– new client requests might not be processed correctly
  • e.g., ids in requests might not make sense to the current server instance

64

Frequency vs Completeness

• Less complete checkpoint
– higher probability that error is purged from saved state
– omitted state needs to be recomputed on recovery

• Less frequent checkpointing
– checkpoint becomes larger
– state information becomes stale
– …

• “Application save” is (in practice) very robust
– might not always contain all info (e.g., window position) for transparent restart

66

Distributed Systems: Checkpointing

So how does one place the chkpts & where?

Should we synchronize process-es(-ors) & checkpoints?

[Figure: timelines of processes P1, P2, P3]

Note: A system can be synchronous though the msg. based comm. can still be async!

67

Options for Checkpoint Storage?

• Key building block: stable storage
– Persistent: survives the failure of the entity that created/initialized/used it
– Reliable: very low probability of losing or corrupting info

• Implementation
– Typically non-volatile media (disks)
– Single disk? Often replicated/multiple volatile memories
– Make sure one replica at least always survives!
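On a single disk, a common way to approximate these properties is to write the checkpoint to a temporary file, fsync() it, and then rename() it over the previous checkpoint, so that a crash leaves either the old or the new checkpoint complete. This technique is not taken from the slides; the function names and the ".tmp" suffix are assumptions, and full durability would additionally require syncing the containing directory and keeping replicas.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes to path via a temporary file and an atomic rename.
 * Returns 0 on success, -1 on failure (the old checkpoint stays intact). */
int save_checkpoint(const char *path, const void *data, size_t len) {
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: retry the write */
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }
    if (fsync(fd) < 0 || close(fd) < 0 || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}

/* Reads up to cap bytes of the checkpoint; returns bytes read or -1. */
ssize_t load_checkpoint(const char *path, void *buf, size_t cap) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t r = read(fd, buf, cap);
    close(fd);
    return r;
}
```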

68

Options for Checkpoint Placement?

• Uncoordinated: processes take checkpoints independently
– Pro: no delays
– Con: consistency?

• Coordinated: have processes coordinate before taking a checkpoint
– Pro: globally consistent checkpoints
– Con: co-ordination delays

• Communication-induced: checkpoint when receiving and prior to processing messages that may introduce conflicts

69

What happens when we don’t synchronize?

Orphan msgs. / Lost msgs.

[Figure: two panels (orphan msgs., lost msgs.), each showing processes P1 and P2 with chkpt C1 and chkpt C2, a message Msg between them, and a fault at X]

Rollback to C1 & C2 gives an inconsistent state
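For a single channel from P1 to P2, the two problem cases can be told apart by comparing counters stored in the checkpoints. The sketch assumes a FIFO channel and that each checkpoint records how many messages had been sent or received when it was taken; these counters and all names are assumptions, not part of the slides.

```c
enum cut_status {
    CUT_CONSISTENT,   /* every recorded receive has a recorded send, none in transit */
    CUT_ORPHAN,       /* C2 records a receive whose send is not in C1 */
    CUT_LOST          /* C1 records a send whose receive is not in C2 */
};

/* sent_in_c1:     messages P1 had sent to P2 when it took C1
 * received_in_c2: messages from P1 that P2 had received when it took C2 */
enum cut_status classify_cut(unsigned sent_in_c1, unsigned received_in_c2) {
    if (received_in_c2 > sent_in_c1)
        return CUT_ORPHAN;
    if (received_in_c2 < sent_in_c1)
        return CUT_LOST;
    return CUT_CONSISTENT;
}
```

An orphan message forces the receiver to roll back further, whereas a lost message can be tolerated if the sender logs its messages and resends them after recovery.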

70

..and more problems... domino effects

[Figure: processes P1 and P2; fault at X]

* problems are fixable, though they require considerable pre-planning

71

• P1 fails, recovers, rolls back to Ca
• P2 finds it received a message (mi) never sent, rolls back to Cb
• P3 finds it received a message (mj) never sent, rolls back to Cc
• …

[Figure: processes P1, P2, P3 with checkpoints Ca, Cb, Cc, messages mi and mj, and the recovery line; “Boom!”]

72

Consistent Checkpoints: No orphans, lost msgs or dominos!

[Figure: processes P1, P2, P3 with a consistent cut]

All messages sent ARE recorded with a consistent cut!

73

Synchronizing Checkpoints (not the processors!)

• Processes co-ordinate (synchronize) to set checkpoints guaranteed to be consistent
– 2-Phase Consistent Checkpointing

Phase I: An initiator node X takes a “tentative” checkpoint and requests all other processes to set checkpoints. All processes inform X when they are willing to checkpoint.

Phase II: If all other processes are willing to checkpoint, then X decides to make its checkpoint permanent; otherwise X decides that all checkpoints shall be discarded. X informs all processes of the decision.

Either all or none take permanent checkpoints!
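The initiator's Phase II decision can be sketched as a pure function over the replies collected in Phase I. Message passing, timeouts, and failures are left out; the enum, the function name, and the array-based interface are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum ckpt_state { CKPT_NONE, CKPT_TENTATIVE, CKPT_PERMANENT };

/* willing[i] is the Phase I reply of process i. If every process is
 * willing, all tentative checkpoints become permanent; otherwise all
 * are discarded. Returns true if the checkpoints were made permanent. */
bool decide_checkpoint(const bool *willing, enum ckpt_state *state, size_t n) {
    bool all_willing = true;
    for (size_t i = 0; i < n; i++) {
        if (!willing[i])
            all_willing = false;
    }
    for (size_t i = 0; i < n; i++)
        state[i] = all_willing ? CKPT_PERMANENT : CKPT_NONE;
    return all_willing;
}
```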

74

2Phase Consistent Checkpoints

[Figure: initiator X sends requests to processes R and S; checkpoints X1, R1, S1 followed by X2, R2, S2]

{X1,R1,S1} preliminary checkpoints
{X2,R2,S2} consistent checkpoints

75

Atomic Commitment and Window of Vulnerability

• So far, recovery of actions that can be individually rolled back…

• Better idea:
– Encapsulate actions in sequences that cannot be undone individually
– Atomic transactions provide this
– Properties: ACID
  • Atomicity: transaction is an indivisible unit of work
  • Consistency: transaction leaves system in correct state or aborts
  • Isolation: transactions’ behavior not affected by other concurrent transactions
  • Durability: transaction’s effects are permanent after it commits
  • (Serializable)

76

Atomic Commit (cont.)

• To implement transactions, processes must coordinate!
– Bundling of related events
– Coordination between processes

• One protocol: two-phase commit

[Figure: two-phase commit with Commit and Abort outcomes]

Q: can this somehow block?

77

Two-phase commit (cont.)

• Problem: coordinator failure after PREPARE & before COMMIT blocks participants waiting for decision (a)

• Three-phase commit overcomes this (b)
– delay final decision until enough processes “know” which decision will be taken
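The blocking window in two-phase commit can be sketched from the participant's point of view: before voting it may still abort on its own, but once it has voted yes it has to wait for the coordinator's decision. The state and action names are hypothetical, and the sketch ignores recovery protocols in which participants ask each other for the outcome.

```c
enum tpc_state  { TPC_INIT, TPC_READY, TPC_COMMITTED, TPC_ABORTED };
enum tpc_action { ACT_ABORT, ACT_BLOCK, ACT_NONE };

/* Coordinator decision: commit only if every participant voted yes.
 * votes_yes[i] is nonzero if participant i voted yes. */
int coordinator_commits(const int *votes_yes, int n) {
    for (int i = 0; i < n; i++) {
        if (!votes_yes[i])
            return 0;
    }
    return 1;
}

/* What a participant may do when it times out waiting for the coordinator. */
enum tpc_action on_coordinator_timeout(enum tpc_state s) {
    switch (s) {
    case TPC_INIT:
        return ACT_ABORT;    /* has not voted yet: may abort unilaterally */
    case TPC_READY:
        return ACT_BLOCK;    /* voted yes: must wait for the decision */
    default:
        return ACT_NONE;     /* decision already known */
    }
}
```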

78

State Transfer

• Reintegrating a failed component requires state transfer!
– If checkpoint/log to stable storage, recovering replica can do incremental transfer
  • Recover first from last checkpoint
  • Get further logs from active replicas
– Goal: minimal interference with remaining replicas
– Problem: state is being updated!
  • Might result in incorrect state transfer (have to coordinate with ongoing messages)
  • Might change such that the new replica can never catch up!
– Solution: give higher priority to state-transfer messages

• Lots of variations…