Anish Arora Ohio State University Mikhail Nesterenko Kent State University Local Tolerance to...
Anish Arora, Ohio State University
Mikhail Nesterenko, Kent State University
Local Tolerance to Unbounded Byzantine Faults
Faults in System of Large Scale
large system size presents unique challenges and opportunities for ensuring dependability
• problem: faults
– occur often
– affect multiple components
– interact unpredictably
– asynchronous execution model
– faults are spatially/temporally unbounded, complex & undetectable
• opportunity
– a fault directly affects a region rather than the whole system
– if faults are contained, the rest of the system continues to function
[figure: system regions labeled faulty, affected, unaffected]
Difficulties Containing Unbounded Faults
• lack of spatial bound
– arbitrary number of processes can be faulty
– cannot rely on limited scope of fault or number of faulty processes
• lack of temporal bound
– faulty process behaves incorrectly arbitrarily long
– cannot wait until fault stops
• contain correctness and tolerance instead of faults
• use execution models that simplify such containment
Outline
• containing correctness and tolerance: strict fault containment and strict stabilization
• execution models and example programs
– reactive program: dining philosophers
– transformational execution models and programs
  – output dependent: k-independent set selection
  – output independent: lightweight spanner construction
Containing Correctness
address specification first
• what does it mean for a system to be correct when an arbitrary portion of it is faulty?
• spec defines correct sequences for each process P
• sequence involves states of P and possibly others
a program is locally containing of faults of class F if there exists a constant l (containment radius) such that every P conforms to its spec if faulty processes are at least l hops away from P
problem: correctness of P depends on every process in the system conforming to spec or F
[figure labels: fault of class F, containment radius l, containment locality]
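The definition of local containment can be made concrete with a small sketch. It assumes the system is given as an adjacency dictionary; the function names `hop_distances` and `must_conform` are hypothetical and only illustrate which processes the definition obliges to conform for a given containment radius.

```python
from collections import deque

def hop_distances(adj, sources):
    """Multi-source BFS: hop distance from each reachable node to the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def must_conform(adj, faulty, radius):
    """Non-faulty processes whose nearest faulty process is at least `radius` hops away.

    Per the definition above, these are the processes that a locally
    containing program requires to conform to their spec.
    """
    dist = hop_distances(adj, faulty)
    return {p for p in adj
            if p not in faulty and dist.get(p, float("inf")) >= radius}

# Illustrative topology (an assumption): a path 0 - 1 - 2 - 3 - 4 - 5.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
print(must_conform(path, faulty={0}, radius=2))
```

With process 0 faulty and radius 2, process 1 falls inside the affected locality, while processes 2 through 5 must still conform.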
Strict Fault Containment
• a strict fault containing (SFC) program is locally containing of unbounded Byzantine faults
• a process satisfies its spec regardless of actions of processes outside its locality
• an SFC-program is containing of bounded and unbounded faults of any class
• for each P, the spec can only mention processes inside the locality
• a problem lacking such specs (e.g. routing) does not have SFC-solutions
[figure label: Byzantine fault]
Strict Stabilization
• additional tolerance properties to faults within locality for a strictly fault-containing program
• strict stabilization: stabilization from transient faults; regardless of actions outside locality, each P eventually satisfies spec
Outline
• containing correctness and tolerance: strict fault containment and strict stabilization
• execution models and example programs
– reactive program: dining philosophers
– transformational execution models and programs
  – output dependent: k-independent set selection
  – output independent: lightweight spanner construction
Dining Philosophers Problem
definition
• network of processes, each may request to eat
properties
– mutual exclusion: no two neighbors eat together
– liveness: each requesting process eats eventually
execution model
– interleaving
– communication via shared registers
– high-atomicity
[figure: cycle for requesting process: thinking (T), hungry (H), eating (E)]
Solution to Dining Philosophers
priority based
actions
• if T & higher priority neighbors thinking, become hungry
• if H & no neighbors are eating, eat (ensures MX)
• if E & done, think & give priority to neighbors (ensures liveness)
waiting chain ≤ 3
optimal containment radius of 2
[figure: processes labeled E, T, H, any, arranged by decreasing priority]
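The three guarded actions can be sketched as a small fault-free simulation. This is an illustration under stated assumptions, not the authors' program: priority is modeled per edge (the dictionary `higher` names the endpoint that currently has priority), initial priorities go to the lower process id, every process always requests to eat, and the scheduler is a simple round-robin interleaving. The names `make_path` and `simulate` are hypothetical, and no Byzantine behavior is modeled.

```python
def make_path(n):
    """Path topology 0 - 1 - ... - (n-1) as an adjacency dictionary."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}

def simulate(adj, rounds):
    """Round-robin interleaving of the three actions; returns meals eaten per process."""
    state = {p: "T" for p in adj}
    # Assumption: per-edge priority, initially held by the lower id (acyclic).
    higher = {frozenset((p, q)): min(p, q) for p in adj for q in adj[p]}
    meals = {p: 0 for p in adj}
    for _ in range(rounds):
        for p in sorted(adj):
            if state[p] == "T":
                # Become hungry only if all higher-priority neighbors are thinking.
                senior = [q for q in adj[p] if higher[frozenset((p, q))] == q]
                if all(state[q] == "T" for q in senior):
                    state[p] = "H"
            elif state[p] == "H":
                # Eat only if no neighbor is eating (mutual exclusion).
                if all(state[q] != "E" for q in adj[p]):
                    state[p] = "E"
                    meals[p] += 1
            else:
                # Done eating: think and give priority to every neighbor.
                state[p] = "T"
                for q in adj[p]:
                    higher[frozenset((p, q))] = q
            # Mutual exclusion is checked after every atomic step.
            assert not any(state[a] == "E" and state[b] == "E"
                           for a in adj for b in adj[a])
    return meals

meals = simulate(make_path(3), rounds=30)
print(meals)
```

Each step reads neighbor states and updates the local state atomically, which mirrors the high-atomicity interleaving model named on the problem slide.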
Fault Containment and Information Propagation
• fault containment leverages limit on information propagation
• idea: abstract from the process of information propagation and highlight the result
[figure: chain of processes a, b, c, d]
process: a sends info to b, b sends a's info to c, c sends a's info to d
result: d reads from a
Execution Models
transformation program: given input, computes output (e.g. leader election)
models for transformation programs: each process reads from processes within range (finite distance)
• output dependent: each process reads all information within range: input and (atomically) output
• output independent: each process reads only input within range
– every program in this model is strictly fault containing
[figure: P reads input & output within range; P reads input only within range]
k-Independent Set Selection (cf. [HHJS01])
problem: select a maximal subset of processes S such that
• for each process in S, each other process of S is at least k hops away
solution actions
• if no member of S is less than k hops away, join S
• if there exists a member of S less than k hops away, leave S
observe:
• only a faulty node P can make another process Q leave S
• if Q leaves S, it can make another process R join S
containment radius is 2k
[figure: 1-independent set; P joins S, Q leaves S, R joins S; P to Q and Q to R are labeled k]
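The two solution actions can be sketched as a sequential fixpoint computation. The sketch assumes a fault-free run in which processes act one at a time in id order and can read membership within range atomically; "less than k hops away" is taken literally as hop distance at most k - 1. The function names `within` and `select_k_independent` are hypothetical.

```python
from collections import deque

def within(adj, p, max_hops):
    """Processes other than p at hop distance <= max_hops from p."""
    dist = {p: 0}
    queue = deque([p])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[p]
    return set(dist)

def select_k_independent(adj, k, members=None):
    """Apply the join/leave actions one process at a time until none is enabled."""
    members = set(members or ())
    changed = True
    while changed:
        changed = False
        for p in sorted(adj):
            close_member = bool(within(adj, p, k - 1) & members)
            if p not in members and not close_member:
                members.add(p)       # join: no member less than k hops away
                changed = True
            elif p in members and close_member:
                members.discard(p)   # leave: some member less than k hops away
                changed = True
    return members

# Illustrative topology (an assumption): a path of 7 processes.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 7} for i in range(7)}
print(select_k_independent(path, k=3))
```

Starting from an inconsistent membership, for example `select_k_independent(path, 3, members={0, 1})`, shows the leave action at work: one of the two adjacent members leaves, after which processes farther along the path join.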
Outline
• containing correctness and tolerance: strict fault containment and strict stabilization
• execution models and example programs
– reactive program: dining philosophers
– transformational execution models and programs
  – output dependent: k-independent set selection
  – output independent: lightweight spanner construction
    • practical problem: fast routing tree construction in sensor networks
    • spanner construction with double range
    • spanner optimization with larger ranges
Experimental Platform: Wireless Sensors
• 4 MHz Atmel processor
• 8 Kb of programming memory
• 512B of data memory
• 916 MHz single-channel, low-power radio
• 10 Kbps of raw bandwidth
• uniform antenna length & orientation
• TinyOS as the runtime system
• fresh AA batteries
Experiment: Fast Routing Tree Construction By Flooding [G+02]
• 156 nodes are arranged in a 13x12 grid in an open parking lot, with grid spacing of 2 feet
• the base station is placed in the middle of the base of the grid and starts the flooding
• each receiving node rebroadcasts the flood message immediately upon receipt and then squelches further broadcasts
• the sender is selected as parent; thus a routing tree to the base station is formed
• expectation: a routing tree with relatively regular structure: # of children, link length, path size, etc.
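The expected regular tree can be sketched with an idealized flood. The radio model here is an assumption and not the experimental conditions: every node hears only its four grid neighbors, no message is lost, and messages are delivered in order. The function name `flood_tree` is hypothetical.

```python
from collections import deque

def flood_tree(cols, rows, root):
    """Idealized flood on a grid: the first sender a node hears becomes its parent."""
    parent = {root: None}
    hops = {root: 0}
    queue = deque([root])
    while queue:
        x, y = queue.popleft()
        # Rebroadcast once; neighbors that already have a parent ignore it.
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < cols and 0 <= ny < rows and (nx, ny) not in parent:
                parent[(nx, ny)] = (x, y)
                hops[(nx, ny)] = hops[(x, y)] + 1
                queue.append((nx, ny))
    return parent, hops

# 13x12 grid with the base station in the middle of the base row.
parent, hops = flood_tree(13, 12, root=(6, 0))
print(len(parent), max(hops.values()))
```

Under these assumptions every link spans one grid step and every path is a shortest path, which is the regular structure the slide expects.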
[figure: flood progress at 1 hop, 2 hops, 3 hops, and the final tree; annotations: backward link, long link, straggler, clustering]
Problems and Solution Approach
problem: routing tree constructed fast over “raw” topology is inadequate
• uneven clustering (some nodes have too many neighbors)
• long links (possibly unreliable)
• suboptimal paths (backward links)
idea: pre-process the topology to mitigate the problem
• weigh links (by length, error rate, node degree, etc.)
• locally construct a connected but lightweight spanner
– link weight may be reflexive (depend on the spanner, e.g. node degree)
Lightweight Spanner Construction Using 2k-Range
• spanner: connected subgraph that includes all nodes (e.g. spanning tree)
• k-local spanner: there is a path within distance ≤ k to each neighbor
problem: given a weighted graph (all weights unique) and 2k-range, build a lightweight k-local spanner
solution: each process P computes the minimum spanning tree for each process Q at distance no more than k and selects the union of incident edges
[figure: P and Q with distances labeled k, k; P can compute the MST for each process Q in this region; MST for Q's region]
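The solution step can be sketched centrally as follows. Several points are assumptions: "Q's region" is read as the subgraph induced by the nodes within k hops of Q, the MST is computed with Kruskal's algorithm, and edges are stored as `frozenset` pairs mapped to unique weights. The names `ball`, `mst_edges` and `local_spanner` are hypothetical. In a distributed setting, P would need the topology within 2k hops to evaluate every such region, which matches the 2k-range in the problem statement.

```python
from collections import deque

def ball(adj, center, k):
    """Nodes within k hops of center, including center itself."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def mst_edges(nodes, weights):
    """Kruskal's algorithm on the subgraph induced by `nodes` (forest if disconnected)."""
    root = {n: n for n in nodes}

    def find(n):
        while root[n] != n:
            root[n] = root[root[n]]  # path halving
            n = root[n]
        return n

    chosen = set()
    for edge in sorted((e for e in weights if e <= nodes), key=weights.get):
        u, v = edge
        ru, rv = find(u), find(v)
        if ru != rv:
            root[ru] = rv
            chosen.add(edge)
    return chosen

def local_spanner(weights, k):
    """Union, over every P, of P's incident edges in the MST of each nearby region."""
    adj = {}
    for edge in weights:
        u, v = edge
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    spanner = set()
    for p in adj:
        for q in ball(adj, p, k):          # every Q at distance <= k from P
            region = ball(adj, q, k)       # assumed meaning of "Q's region"
            spanner |= {e for e in mst_edges(region, weights) if p in e}
    return spanner

# Illustrative input (an assumption): a triangle with unique weights.
ab, bc, ac = frozenset("ab"), frozenset("bc"), frozenset("ac")
print(local_spanner({ab: 1, bc: 2, ac: 3}, k=1))
```

On the triangle, every region is the whole graph, so the heaviest edge is dropped and the two lighter edges remain.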
Spanner Optimization Using Ranges > 2k
• each P computes the spanner's topology in the neighborhood with radius (range − k)
– P knows the complete spanner in this region
• P iteratively repeats the procedure on the resultant spanner
[figure: P and Q with distances labeled k, k, k; P can compute the MST for each process Q in this region]
Conclusion
• complexity and scale of large systems force unorthodox approaches to faults
• we explored the spatial dimension of fault tolerance to complex unbounded faults
– used lack of global info propagation
– stated necessary conditions and impossibility results
– gave first examples of programs
• question: how to solve problems that do have global info propagation? is it possible to contain problems before they spread?