Post on 05-Feb-2016
The Complexity of Adding Failsafe Fault-tolerance
Sandeep S. Kulkarni, Ali Ebnenasir
Motivations
• Why automatic addition of fault-tolerance?
• Why begin with a fault-intolerant program?
  • Reuse of the fault-intolerant program
  • Separation of concerns (functionality vs. fault-tolerance)
  • Potential to preserve properties such as efficiency
• One obstacle
  • Adding masking fault-tolerance to distributed programs is NP-hard [FTRTFT, 2000]
Motivation (Continued)
Approaches for dealing with complexity
• Heuristics [SRDS 2001]
• Weaker forms of tolerance
  • Failsafe: only safety is guaranteed in the presence of faults
  • Nonmasking: safety may be temporarily violated
• Restricting the input
  • Programs
  • Specifications
Motivation (Continued)
Why failsafe fault-tolerance?
• Simplify the design of masking fault-tolerance
• Partial automation of masking fault-tolerance (using TSE’98)
[Figure: boxes labeled "Intolerant Program", "Nonmasking fault-tolerant", "Failsafe fault-tolerant", and "Masking fault-tolerant"; two arrows labeled "Automate"]
Outline of the Talk
• Problem of adding fault-tolerance
• Difficulties caused by distribution
• Complexity of failsafe fault-tolerance
• Class of programs and specifications for which polynomial synthesis is possible
Basic Concepts: Programs and Faults
• State space Sp
• Program transitions δp, fault transitions δf
• Invariant S, fault-span T
• Specification spec: safety is specified by a set of transitions (sj, sk) that should not be executed
[Figure: invariant S and fault-span T, with transitions labeled p/f, p, and f]
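To make these concepts concrete, here is a minimal sketch over a toy state space. It assumes the common reading of a fault-span as the set of states reachable from the invariant through program and fault transitions; the integer states, the transition sets, and the helper name `reachable` are illustrative and do not come from the talk.

```python
from collections import deque

def reachable(start, transitions):
    """Return every state reachable from `start` via `transitions`,
    where `transitions` is a set of (source, target) pairs."""
    successors = {}
    for s, t in transitions:
        successors.setdefault(s, set()).add(t)
    seen = set(start)
    queue = deque(start)
    while queue:
        s = queue.popleft()
        for t in successors.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

# Hypothetical toy example: states are the integers 0..4.
delta_p = {(0, 1), (1, 0), (2, 0), (3, 3)}  # program transitions
delta_f = {(1, 2), (2, 3)}                  # fault transitions
S = {0, 1}                                  # invariant, closed under delta_p

# Assumed reading: T is what p and f together can reach from S.
T = reachable(S, delta_p | delta_f)
```

In this toy instance the program alone never leaves S, while faults can push it to states 2 and 3; state 4 lies outside the fault-span and never needs to be considered.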
Problem Statement
• Inputs: program p, invariant S, faults f, specification spec
• Outputs: program p’, invariant S’
• Requirements: only fault-tolerance is added; no new functional behavior is added
[Figure: invariant of the fault-intolerant program and invariant of the fault-tolerant program; labels "No new transition here" and "New transitions may be added here"]
Difficulties with Distribution
Read/write restrictions
• Two Boolean variables, a and b
• The process cannot read b
• Can we include the following transition?
    (a=0, b=0) → (a=1, b=0)
• Only if we also include the transition
    (a=0, b=1) → (a=1, b=1)
Groups of transitions (instead of individual transitions) must be chosen.
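The grouping effect in the a/b example can be sketched as follows. This is an illustration only: the function name `group_of`, the tuple encoding of states, and the assumption that a process never changes a variable it cannot read are choices made here, not details given on the slide.

```python
from itertools import product

def group_of(transition, variables, unreadable, domain=(0, 1)):
    """Return all transitions that a process cannot distinguish from
    `transition` when it cannot read the variables in `unreadable`.
    States are tuples of values ordered like `variables`."""
    src, dst = transition
    hidden = [i for i, name in enumerate(variables) if name in unreadable]
    # Assumption: an unreadable variable is also unwritable, so it keeps
    # its value across the transition.
    if any(src[i] != dst[i] for i in hidden):
        raise ValueError("transition changes an unreadable variable")
    group = set()
    for values in product(domain, repeat=len(hidden)):
        s, d = list(src), list(dst)
        for i, value in zip(hidden, values):
            s[i] = value
            d[i] = value
        group.add((tuple(s), tuple(d)))
    return group

# The slide's example: variables (a, b), and the process cannot read b.
group = group_of(((0, 0), (1, 0)), ("a", "b"), {"b"})
```

Here `group` contains both (a=0, b=0) → (a=1, b=0) and (a=0, b=1) → (a=1, b=1), so a synthesis procedure has to include or exclude the two together.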
Reduction from 3-SAT
[Figure: reduction gadget for a clause cj over the variables xj, xk, and xl. Transition labels: "Included iff x0 is false", "Included iff x0 is true", "Included iff xj is false", "Included iff xk is true", "Included iff xl is false"]
Dealing with the Complexity of Adding Failsafe Fault-tolerance
• For what class of problems can failsafe fault-tolerance be added in polynomial time?
• Restrictions on
  • Fault-tolerant programs
  • Specifications
  • Faults
• Our approach for restrictions: in the absence of faults, preserve all computations of the fault-intolerant program
Restrictions on Programs and Specifications
• Monotonicity requirements
  • Capture the notion that safe assumptions can be made about variables that cannot be read
• Focus on specifications and transitions of fault-intolerant programs
Monotonicity of Specifications
Definition: A specification spec is positive monotonic with respect to a Boolean variable x iff, for every s0, s1, s’0, s’1 such that
• x = false in s0 and s1, and x = true in s’0 and s’1,
• the values of all other variables in s0 and s’0 are the same, and
• the values of all other variables in s1 and s’1 are the same:
If the transition (s0, s1) does not violate safety, then the transition (s’0, s’1) does not violate safety.
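The definition can be checked by brute force on small state spaces. The sketch below assumes that states are tuples of Booleans and that the safety specification is given as a set `bad` of forbidden transitions; the function names and the two example specifications are hypothetical.

```python
from itertools import product

def flip(state, i, value):
    """Return a copy of `state` with position i set to `value`."""
    return state[:i] + (value,) + state[i + 1:]

def spec_is_positive_monotonic(bad, n_vars, x):
    """Check positive monotonicity of a safety specification with
    respect to the Boolean variable at index x. `bad` is the set of
    transitions (s0, s1) that violate safety."""
    x_false = [s for s in product((False, True), repeat=n_vars) if not s[x]]
    for s0, s1 in product(x_false, repeat=2):
        if (s0, s1) not in bad:
            # (s0, s1) is safe with x = false, so the matching
            # transition with x = true must be safe as well.
            if (flip(s0, x, True), flip(s1, x, True)) in bad:
                return False
    return True

# Hypothetical specs over two variables (x, y), with x at index 0.
bad_only_when_x_false = {((False, True), (False, False))}
bad_only_when_x_true = {((True, True), (True, False))}

first = spec_is_positive_monotonic(bad_only_when_x_false, 2, 0)
second = spec_is_positive_monotonic(bad_only_when_x_true, 2, 0)
```

The first specification passes because setting x to true never turns a safe transition into an unsafe one; the second fails for exactly that reason. The check enumerates all state pairs, so it is meant for illustration rather than for large state spaces.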
Monotonicity of Programs
Definition: Program p with invariant S is negative monotonic with respect to a Boolean variable x iff, for every s0, s1, s’0, s’1 such that
• x = true in s0 and s1, and x = false in s’0 and s’1,
• the values of all other variables in s0 and s’0 are the same, and
• the values of all other variables in s1 and s’1 are the same:
If (s0, s1) is a transition of p within the invariant S, then (s’0, s’1) is also a transition of p.
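A brute-force check of negative monotonicity for programs can be sketched in the same style. It assumes the reading in which a transition inside the invariant S with x = true must have a counterpart with x = false among the program transitions; the names and the two toy programs are hypothetical.

```python
def flip(state, i, value):
    """Return a copy of `state` with position i set to `value`."""
    return state[:i] + (value,) + state[i + 1:]

def program_is_negative_monotonic(delta_p, invariant, x):
    """Check negative monotonicity of a program with respect to the
    Boolean variable at index x. `delta_p` is a set of transitions
    (s0, s1) and `invariant` is a set of states."""
    for s0, s1 in delta_p:
        inside = s0 in invariant and s1 in invariant
        if inside and s0[x] and s1[x]:
            # Assumed reading: the same step must exist with x = false.
            if (flip(s0, x, False), flip(s1, x, False)) not in delta_p:
                return False
    return True

# Hypothetical programs over two variables (x, y), with x at index 0.
S = {(True, False), (True, True)}
p_good = {((True, False), (True, True)), ((False, False), (False, True))}
p_bad = {((True, False), (True, True))}

good = program_is_negative_monotonic(p_good, S, 0)
bad = program_is_negative_monotonic(p_bad, S, 0)
```

`p_good` sets y regardless of x, so the process needs no knowledge of x to take the step; `p_bad` offers the step only when x is true and therefore fails the check.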
Theorem
Adding failsafe fault-tolerance can be done in polynomial time if either:
• the program is negative monotonic and the spec is positive monotonic, or
• the program is positive monotonic and the spec is negative monotonic.
If only one of the two monotonicity requirements (on the program or on the specification) is satisfied, adding failsafe fault-tolerance is still NP-hard.
For many problems, these requirements are easily met.
Example: Byzantine Agreement
Processes: a general, g, and three non-generals, j, k, and l
Variables
• d.g : {0, 1}
• d.j, d.k, d.l : {0, 1, ⊥}
• b.g, b.j, b.k, b.l : {true, false}
• f.g, f.j, f.k, f.l : {0, 1}
Fault-intolerant program transitions
• d.j = ⊥ /\ f.j = 0 → d.j := d.g
• d.j ≠ ⊥ /\ f.j = 0 → f.j := 1
Fault transitions
• ¬b.g /\ ¬b.j /\ ¬b.k /\ ¬b.l → b.j := true
• b.j → d.j, f.j := 0|1, 0|1
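The guarded commands above can be written as successor functions. This is a sketch under assumptions made here: a state is a dictionary of three per-process dictionaries, ⊥ is encoded as `None`, and the function names are illustrative.

```python
BOT = None  # stands for the undecided value ⊥
PROCS = ("g", "j", "k", "l")

def update(state, var, proc, value):
    """Return a copy of `state` with var.proc set to `value`."""
    new = {name: dict(values) for name, values in state.items()}
    new[var][proc] = value
    return new

def program_steps(state, j):
    """Successors under the two fault-intolerant actions of non-general j."""
    d, f = state["d"], state["f"]
    if d[j] is BOT and f[j] == 0:
        yield update(state, "d", j, d["g"])   # copy the general's decision
    if d[j] is not BOT and f[j] == 0:
        yield update(state, "f", j, 1)        # finalize

def fault_steps(state, j):
    """Successors under the two fault actions that affect process j."""
    b = state["b"]
    if not any(b.values()):
        yield update(state, "b", j, True)     # j becomes Byzantine
    if b[j]:
        for dv in (0, 1):                     # a Byzantine j may set
            for fv in (0, 1):                 # d.j and f.j arbitrarily
                yield update(update(state, "d", j, dv), "f", j, fv)

initial = {
    "d": {"g": 1, "j": BOT, "k": BOT, "l": BOT},
    "b": {p: False for p in PROCS},
    "f": {p: 0 for p in PROCS},
}
after_copy = next(program_steps(initial, "j"))
after_finalize = next(program_steps(after_copy, "j"))
byzantine_j = next(fault_steps(initial, "j"))
```

Starting from `initial`, process j first copies d.g and then finalizes, after which neither program action is enabled for j. Once j is Byzantine, the second fault action yields four successors, one per combination of d.j and f.j.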
Example: Byzantine Agreement (Continued)
Safety specification
• Agreement: no two non-Byzantine non-generals can finalize with different decisions
• Validity: if g is not Byzantine, no process can finalize with a decision different from that of g
Read/write restrictions
• Readable variables for process j: b.j, d.j, f.j, d.g, d.k, d.l
• Process j can write d.j, f.j
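The agreement and validity conditions can be phrased as a predicate on states. The sketch below is an assumption-laden illustration: it reuses a dictionary encoding of states, treats only finalized non-Byzantine non-generals as constrained (a Byzantine process may hold any values), and uses the hypothetical names `make_state` and `violates_safety`.

```python
NON_GENERALS = ("j", "k", "l")
PROCS = ("g",) + NON_GENERALS

def make_state(d, f, byzantine=()):
    """Build a state from decision and finalization values listed in the
    order g, j, k, l, plus the names of the Byzantine processes."""
    return {
        "d": dict(zip(PROCS, d)),
        "f": dict(zip(PROCS, f)),
        "b": {p: p in byzantine for p in PROCS},
    }

def violates_safety(state):
    """True if the state breaks agreement or validity."""
    d, b, f = state["d"], state["b"], state["f"]
    # Assumption: only non-Byzantine non-generals that have finalized count.
    finalized = [p for p in NON_GENERALS if not b[p] and f[p] == 1]
    # Agreement: finalized non-Byzantine non-generals hold the same decision.
    if len({d[p] for p in finalized}) > 1:
        return True
    # Validity: with a non-Byzantine general, they all hold d.g.
    if not b["g"]:
        return any(d[p] != d["g"] for p in finalized)
    return False

ok = make_state(d=(1, 1, 1, None), f=(0, 1, 1, 0))
disagree = make_state(d=(1, 1, 0, None), f=(0, 1, 1, 0), byzantine=("g",))
invalid = make_state(d=(1, 0, None, None), f=(0, 1, 0, 0))
byz_j = make_state(d=(1, 0, None, None), f=(0, 1, 0, 0), byzantine=("j",))
```

A failsafe synthesis procedure would need to rule out transitions that lead into states where this predicate holds; note that `byz_j` is not a violation because the deviating process is itself Byzantine.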
Example: Byzantine Agreement (Continued)
• Observation 1: positive monotonicity of the specification with respect to b.j
• Observation 2: negative monotonicity of the program, consisting of the transitions of j, with respect to b.k
• Observation 3: negative monotonicity of the specification with respect to f.j
• Observation 4: positive monotonicity of the program, consisting of the transitions of j, with respect to f.k
Summary
• Complexity analysis for failsafe fault-tolerance
  • Reduction from 3-SAT
• Restrictions on specifications and programs for which polynomial synthesis is possible
  • Several problems fall in this category: Byzantine agreement, consensus, commit, …
  • Necessity of these restrictions
Future Work
• Simplifying the design of masking fault-tolerance using the two-step approach
• Refining the boundary between classes for which polynomial synthesis is possible and those for which exponential complexity is inevitable
• Using monotonicity requirements for simplifying masking fault-tolerance
Thank You
Questions?
Conclusion
• Specifying the boundary between problems where
  • fault-tolerance addition can be done in polynomial time, and
  • exponential complexity is inevitable
• Goal: what problems can benefit from automation?
• Necessity and sufficiency of monotonicity requirements
Future Work
• How can we change a non-monotonic program to a monotonic one by modifying its invariant?
• How can we strengthen a non-monotonic specification to a monotonic one?
• How can a nonmasking program be designed manually to satisfy the monotonicity requirements?
Basic Concepts: Fault-tolerant Program
Fault-tolerance in the presence of faults:
Failsafe: Satisfies its safety specification
Nonmasking: Satisfies its liveness specification (safety may be violated temporarily)
Masking: Satisfies safety and liveness specification
The Complexity of Adding Failsafe Fault-tolerance
• Adding (failsafe/nonmasking/masking) fault-tolerance in the high atomicity model is in P
• Adding masking fault-tolerance to distributed programs is in NP
• How about failsafe?
• Adding failsafe fault-tolerance to distributed programs is NP-hard (proof in the paper)
  • Reduction of 3-SAT to the problem of failsafe fault-tolerance addition
Our Approach
• Stepwise towards masking fault-tolerance: automating the addition of failsafe fault-tolerance
• How hard is adding failsafe fault-tolerance?
• Polynomial time boundaries for failsafe tolerance addition?