Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05,...

18
Xavier DÉFAGO School of Information Science, Japan Adv. Inst. of Science & Tech. (JAIST) PRESTO, Japan Science & Tech. Agency (JST) PDCAT’05, Dalian, China, December 5, 2005 Failure Detection in Distributed Systems: Retrospective and recent advances PDCAT’05, Dalian, China, December 5, 2005 Failure Detection Context Two computer programs Exchange messages 2 B <script var a= var xl if(xls A <script var a= var xl if(xls PDCAT’05, Dalian, China, December 5, 2005 Failure Detection Context A crashes B must detect crash 3 B <script var a= var xl if(xls A BOOM A BOOM PDCAT’05, Dalian, China, December 5, 2005 Simple Approach Solution Use time & timeouts Problem How to set timeout? What implications, when too long. when too short. 4 PDCAT’05, Dalian, China, December 5, 2005 Simple Approach Problems Depends on application needs. Depends on system behavior. Needs global time? Outcome Long failover / reconfiguration time Unstable applications Many side-effects: difficult to maintain 5 PDCAT’05, Dalian, China, December 5, 2005 Question 6 Can we do better? How?

Transcript of Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05,...

Page 1: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

Xavier DÉFAGO

School of Information Science,Japan Adv. Inst. of Science & Tech. (JAIST)

PRESTO, Japan Science & Tech. Agency (JST)

PDCAT’05, Dalian, China, December 5, 2005

Failure Detection in Distributed Systems:

Retrospective and recent advances

PDCAT’05, Dalian, China, December 5, 2005

Failure Detection

• Context• Two computer programs

• Exchange messages

2

B

<script

var a=

var xl

if(xls

A

<script

var a=

var xl

if(xls

PDCAT’05, Dalian, China, December 5, 2005

Failure Detection

• Context• A crashes

• B must detect crash

3

B

<script

var a=

var xl

if(xls

A

<script

var a=

var xl

if(xls

BOOM

A BOOM

PDCAT’05, Dalian, China, December 5, 2005

Simple Approach

• Solution

• Use time & timeouts

• Problem

• How to set timeout?

• What implications,

• when too long.

• when too short.

4

PDCAT’05, Dalian, China, December 5, 2005

Simple Approach

• Problems

• Depends on application needs.

• Depends on system behavior.

• Needs global time?

• Outcome

• Long failover / reconfiguration time

• Unstable applications

• Many side-effects: difficult to maintain

5 PDCAT’05, Dalian, China, December 5, 2005

Question

6

Can we do better?How?

Page 2: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Answer

7

• Can we do better?

• YES!

• How?

• Depends on context

PDCAT’05, Dalian, China, December 5, 2005 8

Illustration: Case 1

• Simple illustration• Set of tasks

CPU farm

Job dispatcher

PDCAT’05, Dalian, China, December 5, 2005 9

Illustration: Case 1

• Simple illustration• Set of tasks

• Dispatch tasks; wait for results

CPU farm

Job dispatcher

BOOM

PDCAT’05, Dalian, China, December 5, 2005

Illustration: Case 2

• Dispatcher• Failover

• Must keep consistency

Backup dispatcher

10

CPU farm

Job dispatcher BOOM

PDCAT’05, Dalian, China, December 5, 2005

Outline

• I: Theory

• Agreement problems, Unreliable failure detectors

• II: QoS

• QoS metrics, Comparison of FDs

• III: Implementation

• Basic FDs, Adaptive FDs

• IV: Accrual FDs

• Novel concept

• V: Conclusion

11 PDCAT’05, Dalian, China, December 5, 2005

Part ITheory

Page 3: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Context: Case 2

• Dispatcher• Failover

• Must keep consistency

• Requires agreement between dispatchers

Backup dispatcher

13

CPU farm

Job dispatcher BOOM

PDCAT’05, Dalian, China, December 5, 2005

Outline: Part I

• System Model

• Consensus & Impossibility

• Unreliable FDs

• Solving Consensus w/FD

14

PDCAT’05, Dalian, China, December 5, 2005

Outline: Part I

• System Model

• Consensus & Impossibility

• Unreliable FDs

• Solving Consensus w/FD

15 PDCAT’05, Dalian, China, December 5, 2005 16

• Processes

• represents running program (or state machine)

• Communication

• message driven

• fully connected

System Model

C = {cij |cij : channel from pi to pj}

p1

p2

p3

p4

p5

p6

P = {p1, p2, . . . , pn}

PDCAT’05, Dalian, China, December 5, 2005

Failure Models

• Crash failures

• Failed process stops executing any event.

• Omission failures

• Failed process omits executing some events.

• Arbitrary failures (Byzantine)

• Failed process can do anything.

17 PDCAT’05, Dalian, China, December 5, 2005

Failure Models

• Crash failures

• Failed process stops executing any event.

• Omission failures

• Failed process omits executing some events.

• Arbitrary failures (Byzantine)

• Failed process can do anything.

18

Page 4: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005 19

• Synchronous

• Bound on comm. delays

• Bound on process speed

• Asynchronous

• No bounds

• Semi-synchronous (e.g.)

• GST: global stabilization time

• Unknown bounds after GST

Synchrony

p1

p2

p3

p4

p5

p6

PDCAT’05, Dalian, China, December 5, 2005

Outline: Part I

• System Model

• Consensus & Impossibility

• Unreliable FDs

• Solving Consensus w/FD

20

PDCAT’05, Dalian, China, December 5, 2005 21

• Problem• All (correct) processes decide

• Decision value is same for all processes

• Decision value is one of proposed values

Consensus

Consensus

Application

propose decide

p1

p3

p2 Consensus

propose(vi) decide(v)

v = v2v1

v2

v3

PDCAT’05, Dalian, China, December 5, 2005 22

• Integrity

• Every process decides at most once.

• Validity

• If a process decides v, then v was proposed by some process.

• Agreement

• Two correct processes do not decide differently

• Termination

• Every correct process eventually decides.

Consensus (specification)

PDCAT’05, Dalian, China, December 5, 2005 23

Consensus (specification)

• Agreement• Two correct processes do not decide differently

p1

p3

p2 Consensus

propose(vi) decide(v)

v1

v2

v3

PDCAT’05, Dalian, China, December 5, 2005 24

Consensus (specification)

• Validity• If a process decides v, then v was proposed by some

process.

p1

p3

p2 Consensus

propose(vi) decide(v)

v1

v2

v3

Page 5: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005 25

Consensus (specification)

• Termination• Every correct process eventually decides.

p1

p3

p2 Consensus

propose(vi) decide(v)

v1

v2

v3

PDCAT’05, Dalian, China, December 5, 2005 26

Consensus (specification)

• Integrity• Every process decides at most once.

p1

p3

p2 Consensus

propose(vi) decide(v)

v1

v2

v3

PDCAT’05, Dalian, China, December 5, 2005

Impossibility

• Model

• Asynchronous

• Some process may crash

• Result

• Consensus has no deterministic solution

• Reason

• Impossibility to distinguish crashed vs. slow process

• NB

• Cannot guarantee Termination in all cases.

• Probabilistic Termination possible (w/Prob.=1)

27

from [FLP85]

PDCAT’05, Dalian, China, December 5, 2005

Related Agreement Problems

• Total Order Broadcast

• Deliver messages in same sequence

• See [DSU04]

• Leader Election

• Elect one correct leader

• Group Membership

• Agree on group composition

• See [CKV01]

• Atomic Commit

• Decide on issue of transaction: Commit / Abort

28

Impossible

Impossible

Impossible

Impossible

PDCAT’05, Dalian, China, December 5, 2005

Outline: Part I

• System Model

• Consensus & Impossibility

• Unreliable FDs

• Solving Consensus w/FD

29 PDCAT’05, Dalian, China, December 5, 2005 30

• FD module

• Attached to each process

• Queried locally

• Outputs list of suspected processes

• Failure Detector

• Distributed entity

• Federation of FD modules

• Global properties

Failure Detectors (concept)

Failure Detector

Oracle

p1 p2

p3 p4

FD1 FD2

FD3 FD4

query

FD modulehost

from [CT96]

Page 6: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005 31

• FD Unreliable

• Can make mistakes; can change its mind

• Possible Mistakes

• suspect p and p has not crashed (wrong suspicion)

• trust p and p has crashed

• Properties

• Restrict allowed mistakes

• Completeness; Accuracy

Unreliable Failure Detectors

PDCAT’05, Dalian, China, December 5, 2005 32

• Strong Completeness• Eventually every process that crashes is permanently

suspected by all processes.

Completeness

p1

p3

p2

p4

BOOM p1!Dp2

p1!Dp3

p1!Dp4

suspect

suspect

suspect

PDCAT’05, Dalian, China, December 5, 2005 33

• Strong Accuracy• No process is suspected before it crashes.

Accuracy

Attention: counter!exampl"

p1

p3

p2

p4

BOOM

p1!Dp3

suspect

suspect

suspect

p4!Dp2

p2!Dp4

BOOM

PDCAT’05, Dalian, China, December 5, 2005 34

• Weak Accuracy• Some correct process is never suspected

Accuracy

p1

p3

p2

p4

BOOM

p1!Dp3

suspect

suspect

suspect

p4!Dp2

p2!Dp4

BOOM

PDCAT’05, Dalian, China, December 5, 2005 35

• Eventual Strong Accuracy• There is a time after which correct processes are not

suspected by any correct process.

• Eventual Weak Accuracy• There is a time after which some correct process is never

suspected by any correct process

Eventual Accuracy

p1

p3

p2Chaotic Period

no guarantee

Good Period

accuracy satisfied

PDCAT’05, Dalian, China, December 5, 2005 36

Failure Detector Classes

• Remark• Other classes of failure detectors exist.

Accuracy Perpetual Eventual

Strong

Weak

P !P

!SS

(Perfect) (Eventually Perfect)

(Strong)(Eventually Strong)

Page 7: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Outline: Part I

• System Model

• Consensus & Impossibility

• Unreliable FDs

• Solving Consensus w/FD

37 PDCAT’05, Dalian, China, December 5, 2005 38

• System Model

• Asynchronous System

• Crash permanent failures

• Quasi-Reliable channels

• Failure detector

• Assumptions

• At least majority of processes are correct

Solving Consensus with !Sfrom [CT96]

FD ! !S

t =

!

n ! 1

2

"

where t is max. faulty processes

PDCAT’05, Dalian, China, December 5, 2005 39

• Requirements• Reliable Broadcast, failure detector

• Concept• Use rotating coordinator

• Asynchronous rounds, 4 phases

Solving Consensus with !S

! !S

p1

p3

p2 Consensus

propose(vi) decide(v)

v = v2v1

v2

v3

PDCAT’05, Dalian, China, December 5, 2005 40

• Selection of coordinator

• One coordinator/round

• Suspicion

• Suspect coordinator => reject round => go to next one.

• Estimate

• Processes keep estimate (of decision value)

• Initialized with initial value

• Modified in round i by coordinator of i

• Timestamp

• Estimate modified in what round

Solving Consensus with !S

round i : c = p(1+i mod n)

PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(crash-free case)

41

p1

p2

p3

p4

p5

v5

v4

v3

v2

v1

Phase 1

estimate

PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(crash-free case)

42

p1

p2

p3

p4

p5

v5

v4

v3

v2

v1

Phase 1

estimate

Phase 2

propose

v := v1

Page 8: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(crash-free case)

43

p1

p2

p3

p4

p5

v5

v4

v3

v2

v1

Phase 1

estimate

Phase 2

propose

v := v1

Phase 3

ack

v1

v1

v1

v1

v1

PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(crash-free case)

44

p1

p2

p3

p4

p5

v5

v4

v3

v2

v1

Phase 1

estimate

Phase 2

propose

v := v1

Phase 3

ack

v1

v1

v1

v1

v1

Phase 4

decide

v1

v1

v1

v1

v1

PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(one-crash case)

Phase 1

Phase 2

p1

p2

p3

p4

p5

estimate

Round 1

v5

v4

v3

v2

v1

v := v1

45 PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(one-crash case)

Phase 1

Phase 2

p1

p2

p3

p4

p5

estimate

Round 1

v5

v4

v3

v2

v1

v := v1 BOOM

46

PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(one-crash case)

Phase 1 Phase 3

Phase 2

p1

p2

p3

p4

p5

propose

estimate

Round 1

v5

v4

v3

v2

v1

v := v1 BOOM

47 PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(one-crash case)

Phase 1 Phase 3

Phase 2

p1

p2

p3

p4

p5

propose

estimate

Round 1

v5

v4

v3

v2

v1

v := v1 BOOM

48

Suspect p1

Suspect p1ack

v1

v1

Nack

Page 9: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(one-crash case)

Phase 1 Phase 3

Phase 2

p1

p2

p3

p4

p5

propose

estimate

Round 1

v5

v4

v3

v2

v1

v := v1 BOOM

49

ack

v1

v1

Nack

Round 2

v2

v3

Phase 1

estimate

PDCAT’05, Dalian, China, December 5, 2005

Solving Consensus with !S(one-crash case)

Phase 1 Phase 3

Phase 2

p1

p2

p3

p4

p5

propose

estimate

Round 1

v5

v4

v3

v2

v1

v := v1 BOOM

50

ack

v1

v1

Nack

Round 2

v2

v3

Phase 1

estimate

Phase 2

v := v1

propose

Phase 4

v1

v1

v1

v1

ack

decide

Phase 3

PDCAT’05, Dalian, China, December 5, 2005

Event. Accuracy in Practice

• Observations• Bad period: stagnation (progress possible)

• Good period: progress guaranteed

• Termination => finished

51

Termination

Stagnation

ProgressStagnation

Progress

p1

p3

p2

Chao

tic P

erio

d

Good P

erio

d

Agreement

PDCAT’05, Dalian, China, December 5, 2005

Event. Accuracy in Practice

• In practice• Good period “often enough” => Termination

• FD: must ensure accuracy “often enough”

52

Termination

Stagnation

ProgressStagnation

Progress

p1

p3

p2

Chao

tic P

erio

d

Good P

erio

d

Agreement

PDCAT’05, Dalian, China, December 5, 2005 53

Weakest Failure Detectorfor Consensus

• Properties: !W

• Weak Completeness:

Crash detected by some correct process.

• Eventual Weak Accuracy:

Eventually, some correct process never suspected.

• Equivalence

• ♢W ! ♢S

from [CHT96]

PDCAT’05, Dalian, China, December 5, 2005

": Eventual Leadership

• Definition

• Eventually, one correct process becomes leader for all.

• Application

• Solve Consensus: e.g., PAXOS algorithm of Lamport.

• Equivalence

• ♢W ! !

54

from [Lam98]

Page 10: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Part IIQuality of Service

PDCAT’05, Dalian, China, December 5, 2005

Outline: Part II

• Quality of Service Metrics

• Comparison

56

PDCAT’05, Dalian, China, December 5, 2005

QoS of Failure Detectors

• Latency Metricwhen p faulty:

• Detection time

• “How long to detect?” The shorter the better.

57

trust

suspect

up

down

BOOM

detection time

monitored

process

FD

output

PDCAT’05, Dalian, China, December 5, 2005

QoS of Failure Detectors

• Accuracy Metricswhen p correct:

• Average mistake rate

• Query accuracy prob.

• Good period duration

trust

suspect

up

mistakes!

FD

output

monitored

process

58

PDCAT’05, Dalian, China, December 5, 2005

QoS Tradeoff

• Aggressive FD• Short latency; low accuracy

• Conservative FD• Long latency; high accuracy

detection time

(in

)acc

ura

cy

59

!{d,a}!{d',a'}

!{d",a"}

Aggressive

Conservative

PDCAT’05, Dalian, China, December 5, 2005

detection time

(in

)acc

ura

cy

Requirements vs. Guarantees

• FD QoS• !{d,a} : effect. detection time, effect. mistakes

• Application requirements• "{D,A} : max. detect. time, max. mistakes

60

!{D,A}

!{d,a}!{d',a'}

!{d",a"}

Page 11: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Parametric Failure Detector

• Parametric FD protocol• Parameter value defines FD best QoS

• Tradeoff: accuracy <-> detection latency

(in

)acc

ura

cy

detection time

61

!{d,a}

!{d',a'}

PDCAT’05, Dalian, China, December 5, 2005

QoS Coverage

• Coverage of FD• FD could be tuned to support app. req.

• Measure of FD

detection time

(in

)acc

ura

cy

62

PDCAT’05, Dalian, China, December 5, 2005

detection time

(in

)acc

ura

cy

Dynamic QoS Coverage

• Approximate coverage• Instantiate several QoS sets

• Find minimal set; minimal change

63 PDCAT’05, Dalian, China, December 5, 2005

Comparing Parametric FDs

• Simple case• 2 FDs

• A includes B

• So, A better than B

detection time

(in

)acc

ura

cy

64

FDAFDB

PDCAT’05, Dalian, China, December 5, 2005

Comparing Parametric FDs

• Complex case• 3 FDs: aggressive, intermediate, conservative

• Which one is better?

detection time

(in

)acc

ura

cy

65

FDA

FDBFDC

PDCAT’05, Dalian, China, December 5, 2005

Part IIIImplementations

Page 12: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Outline: Part III

• Basic FDs

• Adaptive FDs

• Experimental Results

67 PDCAT’05, Dalian, China, December 5, 2005

Interrogation

• Advantage• Simplest

• Drawback• Longest detection time

68

!i query interval!to timeout

p

q!i

!to

BOOM

!to

“alive?”

“YES!”

PDCAT’05, Dalian, China, December 5, 2005

Heartbeat

• Advantage• Good with broadcast medium

• Drawback• q takes active role

p

q

!i !i !i !i

BOOM

!to !to

!i heartbeat interval

!to timeout

69 PDCAT’05, Dalian, China, December 5, 2005

Parameters & QoS

• Detection Time

• Heartbeat interval

• Timeout

• Transmission delays

• Accuracy

• Timeout

• Variations in transmission delays

70

PDCAT’05, Dalian, China, December 5, 2005

Heartbeat Interval

• Observation

• Small influence on accuracy

• Limited by network admin.

• Detection time " transmission + interval

• Rule-of-thumb

• Same order as average transmission delay

71 PDCAT’05, Dalian, China, December 5, 2005

Timeout

• Observation

• Influence on detection time

• Influence on accuracy

• Too short = instability

• Problem

• Depends on system behavior

• System behavior changes

• => need adaptive timeout

72

Page 13: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005 73

• Concept

• Periodic heartbeat

• Timeout => suspicion

• Timeout adjusted dynamically

• Parameters

• Period

• Safety margin

Adaptive Approach

Heartbeat

!to!to !to

!i !i !i

FDp

FDq

p suspects q

BOOM

p1

To1 estim.

FDq

QoS

heartbeat

sampling

PDCAT’05, Dalian, China, December 5, 2005

Chen’s Failure Detector

• Context

• Heartbeat failure detector

• Adaptive

• Several variants

• Adaptation

• Freshness points

• Parameters

• Known heartbeat interval

• Safety margin based on QoS

74

from [CTA02]

PDCAT’05, Dalian, China, December 5, 2005

Chen’FD: Freshness Points

• Freshness points• Samples heartbeat arrivals

• Compute normalized distribution

• Estimates future arrivals

• Adds safety margin !

75

p

q

!i !i !i

EAi EAi+1!i+1

!i+2!i

!

!i : heartbeat interval

HB i : ith heartbeat msgEAi : expected arrival for HB i

! : safety margin"i : freshness point for HB i

Samplearrivals

PDCAT’05, Dalian, China, December 5, 2005

Chen’s FD: Mechanism

• Suspicion when...

• Freshness point i past

• No heartbeat k#i received

• QoS tuning

• Adjust safety margin based on QoS requirements

• Other

• variants based on other assumptions (e.g., synchronized clocks, etc.)

76

PDCAT’05, Dalian, China, December 5, 2005

Bertier’s Failure Detector

• Principle

• Based on Chen’s failure detector

• Adaptive, but not tunable

• Safety margin

• Safety margin adjusted dynamically

• Use Van Jacobson’s estimation

• Very aggressive failure detector

77

from [BMS02]

PDCAT’05, Dalian, China, December 5, 2005

PHI-FD

• PHI-FD• Estimates arrival distribution probability

• Increase level based on probability

• Sets threshold for suspicions (based on QoS)

78

App. 3

do action(!)network

Plater (t)last arrival !

Failure Detector

App. 1

! > !1 ! suspect

App. 2

! > !2 ! suspect

heartbeat arrivals

sampling window

estimation

Plater (t)

t

tnow

! log10 Plater (tnow ! Tlast)Tlast

from [HDYK04]

Page 14: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Experimentation: LAN

• LAN

• single FastEther hub

• Parameters

• HB interval: 20$ms

• Duration: 5% hour

• Total HB: 1’000’000

• no loss

1e-05

0.0001

0.001

0.01

0.1

1

0 0.2 0.4 0.6 0.8 1 1.2 1.4

Mis

take r

ate

[1/s

]

Detection time [s]

! FD

Chen FD

Bertier FD ! FDChen FD

Bertier FD

Bertier FD

Chen FD

! accrual FD

79 PDCAT’05, Dalian, China, December 5, 2005

Experimentation: WAN

• WAN

• JAIST (JP) – EPFL (CH)

• Parameters

• HB interval: 100$ms

• Duration: 1 week

• Total HB: ~ 6’000’000

0.001

0.01

0.1

0.4

0 0.5 1 1.5 2 2.5

Mis

take r

ate

[1/s

]

Detection time [s]

Chen FD

!-FD

Bertier FD

! FDChen FD

Bertier FDBertier FD

Chen FD

! accrual FD

80

PDCAT’05, Dalian, China, December 5, 2005

Experimentation: WAN

81

Apr 3

0:00 0:00 0:00 0:00 0:00 0:00 0:00

Apr 4 Apr 5 Apr 6 Apr 7 Apr 8 Apr 9

time (UTC)

0

50

100

150

200

250

300

350

400

450

500

0 5 10 15 20 25

Occ

ure

nce

[#

burs

ts]

Burst length [# lost messages]

2004

W32/Netsky.T@mm

PDCAT’05, Dalian, China, December 5, 2005

Part IVAccrual FDs

PDCAT’05, Dalian, China, December 5, 2005

Outline: Part IV

• Motivation

• Definition (accrual FDs)

• Equivalence: "P

• Relation w/QoS

83 PDCAT’05, Dalian, China, December 5, 2005

Motivation

• Objective (long term)

• Offer failure detection as generic service

• E.g., NTP for clock synchronization

• Open issues

• interaction style

• notification

• self-configuration, deployment

84

Accomodate various usage patterns and

QoS requirements simultaneously

Page 15: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Different Patterns

• Dispatch. – Worker

• Action: release resources

• Needs stability

• => conservative FD

• Dispatch. replica

• Action: failover

• Needs quick reaction

(see Consensus)

• => mid. aggressive FD

85

Backup dispatcher

PDCAT’05, Dalian, China, December 5, 2005

Variable Suspicion Costs

• Case 1:

• Cost varies with time:

• amount work completed

• available resources

• Case 2:

• Important task

• Most likely up machine

86

?

BOOM

BOOM

PDCAT’05, Dalian, China, December 5, 2005

Decoupling

• Failure detection• 2 roles: monitoring, interpretation

• interpretation –> QoS

• => decoupling

87

Failure

Detection

Service

Programs,

Protocols

Monitoring

Interpretation

Action

Interpretation

ActionParametric

Action

suspicion level

suspicionssuspicions

Monitoring

Interpretation

Action Action Action

suspicions

Binary FD Accrual FD

PDCAT’05, Dalian, China, December 5, 2005

Accrual Failure Detectors

• Abstraction

• Provides suspicion level

• Separates monitoring / interpretation

• QoS handled locally (e.g., by thresholds)

• Formally

• Well-defined behavior

• Preserves theoretical characteristics

• Relation between threshold & QoS

• Practically

• Many implementations possible

88

from [DUHK05]

PDCAT’05, Dalian, China, December 5, 2005

Suspicion Level

• Suspicion Level

• function of time to non-negative

• means “confidence that p is faulty”: (0 = trust p)

89

slqp : T !" R+0

slqp(t)

t

slqp(t)

PDCAT’05, Dalian, China, December 5, 2005

Property 1 (Upper bound)

• If p is correct

• slqp(t) is bounded

• bound SLmax is unknown

90

slqp(t)

t

SLmax

slqp(t)

p ! correct(F ) " #SLmax : $t (slqp(t) % SLmax )

Page 16: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Property 2 (Accruement)

• If p is faulty

• slqp(t) is eventually monotonic

• (unknown) minimum increase rate

91

SLmax

slqp(t)

t

slqp(t)

BOOM

monotonic

p! faulty(F ) " #K#Q$k % K

!

slqp(tqryq(k)) & slqp(tqryq

(k+1))'slqp(tqryq

(k)) < slqp(tqryq(k+Q))

"

PDCAT’05, Dalian, China, December 5, 2005

Class !Pac

• Class !Pac

• for all pairs of processes:

- Upper bound holds

- Accruement holds

• Equivalence: !Pac ! !P

• can solve same set of problems

• implementable in same systems

92

PDCAT’05, Dalian, China, December 5, 2005

Equivalence

• Accrual to Binary

• dynamic thresholds

• suspect: threshold & trust

• trust: rate insufficient

• Binary to Accrual

• at each query:

- p suspected => increm. by #

- p trusted => reset to 0

93

slqp(t)

slqp(t)D!Pac

t

Tsusp(t)

trust

suspectD!"P

trust

suspectD!Pslqp(t)

slqp(t)

t

D!"Pac

PDCAT’05, Dalian, China, December 5, 2005

Equivalence

• Means...

• same computational power

- can solve same set of problems

- can be implemented over same set of systems

• but...

• loss of information

• overall performance can be different

94

!Pac ! !P

PDCAT’05, Dalian, China, December 5, 2005

• Threshold function• function of time

• triggers suspicion

• defines binary FD:

slqp(t)

t

slqp(t)

QoS w/multiple thresholds

95

Ti : T !" R+

DTi

T1(t)

trust

DT1suspect

T2(t)

DT2

trust

PDCAT’05, Dalian, China, December 5, 2005

QoS w/multiple thresholds

• Hypotheses

• Hyp.1: At all time, T1(t) & T2(t)

• Theorem

• "T2 suspects p only if DT1 suspects p

• Detection time

• detect. time w/DT1 & detect. time w/DT2

• Query accuracy prob.

• prob. trust w/DT1 & prob. trust w/DT2

96

Page 17: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

• Threshold function• function of time

• triggers suspicion

• defines binary FD:

slqp(t)

t

slqp(t)

QoS w/multiple thresholds

97

Ti : T !" R+

DTi

T1(t)

trust

DT1suspect

T2(t)

DT2

trust

PDCAT’05, Dalian, China, December 5, 2005

• Trust threshold function• shared by all detectors

slqp(t)

t

slqp(t)

T1(t)

trust

DT1suspect

T2(t)

DT2

trust

T0(t)

QoS w/multiple thresholds

98

PDCAT’05, Dalian, China, December 5, 2005

QoS w/multiple thresholds

• Hypotheses

• Hyp.1: At all time, T1(t) & T2(t)

• Hyp.2: D#T1 & D#T2 use same trust threshold

• Theorem

• "#T2 has T-transition => D#T1 has T-transition

• Mistake rate

• mistake rate w/D#T1 >= mistake rate w/D#T2

• Good period duration

• good period duration w/D#T1 <= w/D#T2

• mistake duration: cannot say99 PDCAT’05, Dalian, China, December 5, 2005

Implementations

• Simple implementation• Partially synchronous model

• Susp. level increase with time

• Reset when receive heartbeat

100

slqp(t)

t

slqp(t)

p

q

BOOM

PDCAT’05, Dalian, China, December 5, 2005

Implementations

• Chen-based adaptation [CTA02]

• After “expected arrival” point, increase with time

• Reset when receive heartbeat

• Safety margin ! set with threshold

101

slqp(t)

t

slqp(t)

p

q

BOOM

EA1 EA2 EA3 EA4 EA5

!

!

!5

T=!

PDCAT’05, Dalian, China, December 5, 2005

Implementations

• PHI accrual FD [HDYK04]

• Samples arrival probability

• Estimate prob. receive HB later

• Reset when receive heartbeat

102

Plater (t)

t

App. 3

do action(!)network

Plater (t)last arrival !

Failure Detector

App. 1

! > !1 ! suspect

App. 2

! > !2 ! suspect

heartbeat arrivals

sampling window

estimation

Plater (t)

t

tnow

! log10 Plater (tnow ! Tlast)Tlast

Page 18: Failure Detection in Distributed Systems · PRESTO, Japan Science & Tech . Agency (JST) PDCA TÕ05, Dalian, China, De cembe r 5, 2005 Failure Detection in Distributed Systems: Retrospective

PDCAT’05, Dalian, China, December 5, 2005

Implementations

• Difference• Dominant metric for suspicion level

103

suspicion level related to...

Simple implementationdetection time

Chen-based adaptation

$ accrual FD accuracy

PDCAT’05, Dalian, China, December 5, 2005

Part VConclusion

PDCAT’05, Dalian, China, December 5, 2005

Other Important Issues

• Notification

• Gossiping

• Hierarchy, spanning tree

• Interface

• QoS negotiation

• Evaluation

• Representative environments

• Benchmarks

105 PDCAT’05, Dalian, China, December 5, 2005

Conclusions

• Failure Detection

• Active area of research

• Theory well-developed

• Practice lagging

• Open questions...

• Better abstractions / interfaces

• Better performance (QoS)

• Lower overhead

• Self-configuration

106