Tolerating Faults in Counting Networks Marc D. Riedel Jehoshua Bruck California Institute of...
Date posted: 20-Dec-2015
Tolerating Faults in Counting Networks
http://www.paradise.caltech.edu
Marc D. Riedel, Jehoshua Bruck
California Institute of Technology
Parallel and Distributed Computing Group
Multiprocessor Coordination
Shared Counting
Processes cooperate to assign successive values.
• scheduling
• load balancing
• resource allocation
[Figure: processes P0 through P4 each obtain distinct successive values (601, 602, ..., 610).]
Multiprocessor Coordination
Centralized Solution
serialized access
[Figure: processes P0 through P4 access a single shared counter (600, 601, 602, ...) one at a time.]
Multiprocessor Coordination
Centralized Solution
Disadvantages:
• high contention
• low throughput
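A minimal sketch of the centralized solution, assuming a lock-protected counter; the names `CentralCounter`, `get_and_increment`, and `run` are hypothetical and only illustrate why access is serialized: every process must take the same lock to obtain its value.

```python
import threading

class CentralCounter:
    """A single shared counter; every request serializes on one lock."""

    def __init__(self, start=600):
        self._value = start
        self._lock = threading.Lock()

    def get_and_increment(self):
        # All processes contend for this one lock (high contention).
        with self._lock:
            value = self._value
            self._value += 1
            return value

def run(n_threads=5, per_thread=100, start=600):
    """Let n_threads workers draw values and return them sorted."""
    counter = CentralCounter(start)
    results = []
    results_lock = threading.Lock()

    def worker():
        mine = [counter.get_and_increment() for _ in range(per_thread)]
        with results_lock:
            results.extend(mine)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Each value is handed out exactly once, but only one process makes progress at a time, which is the throughput limit the slide points to.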
Counting Networks
Data structure for multiprocessor coordination
Aspnes, Herlihy & Shavit (1991)
concurrent data structure
[Figure: tokens from P0 and P1 traverse the network one balancer at a time, toggling the bits they pass; P0 obtains the value 600 and P1 obtains 601.]
depth: $O(\log^2 n)$
width: $n$
Concurrent access by up to n processes
Each process accesses 1/n-th of bits
Advantages:
• low contention
• high throughput
Shared Memory Architectures
Balancer: shared boolean variable.
type balancer
begin
    state: boolean;
    top: ptr to balancer;
    bottom: ptr to balancer;
end
Processes shepherd tokens through the network.
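A minimal sketch of a balancer and of a process shepherding a token through linked balancers. The field names follow the type declaration above; the `route` and `traverse` helpers, the convention that the first token leaves on the top wire, and the use of integers for output wires are assumptions made for illustration. The toggle here is not atomic, so this is a sequential model only.

```python
class Balancer:
    """A toggle with two outgoing pointers, modelled sequentially."""

    def __init__(self, top=None, bottom=None):
        self.state = False    # assumption: False means the next token goes to top
        self.top = top        # next balancer, or an output wire index
        self.bottom = bottom  # next balancer, or an output wire index

    def route(self):
        # Read the state, flip it, and return where the token goes next.
        go_top = not self.state
        self.state = not self.state
        return self.top if go_top else self.bottom

def traverse(entry):
    """Shepherd one token from an entry balancer to an output wire."""
    node = entry
    while isinstance(node, Balancer):
        node = node.route()
    return node

b = Balancer(top=0, bottom=1)
outs = [traverse(b) for _ in range(5)]  # -> [0, 1, 0, 1, 0]
```

A real shared-memory implementation would need the read-and-flip in `route` to be a single atomic operation.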
[Figure: a Counting Network (Aspnes, Herlihy & Shavit, 1991) with n inputs, n outputs, and depth $O(\log^2 n)$. Tokens a through g enter on the input wires and leave on the output wires as a step sequence.]
Counting Network
Isomorphic to Batcher’s Bitonic sorting network.
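The construction can be sketched on token counts rather than individual tokens. This is a minimal sketch, assuming the recursive Bitonic/Merger layout of Aspnes, Herlihy & Shavit, a width that is a power of two, and a quiescent network (all tokens have exited). The function names are hypothetical. A sequence has the step property when $0 \le y_i - y_j \le 1$ for all $i < j$.

```python
def balancer(a, b):
    """Quiescent balancer: a + b tokens in, split as evenly as possible."""
    total = a + b
    return (total + 1) // 2, total // 2

def merger(x, xp):
    """Merge two step sequences of equal length into one step sequence."""
    if len(x) == 1:
        return list(balancer(x[0], xp[0]))
    z = merger(x[0::2], xp[1::2])   # even wires of x, odd wires of xp
    zp = merger(x[1::2], xp[0::2])  # odd wires of x, even wires of xp
    out = []
    for a, b in zip(z, zp):
        out.extend(balancer(a, b))  # final layer of balancers
    return out

def bitonic(x):
    """Token counts on the outputs of a bitonic counting network."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    return merger(bitonic(x[:half]), bitonic(x[half:]))

def has_step_property(y):
    """True when y is non-increasing and its ends differ by at most one."""
    return all(y[i] >= y[i + 1] for i in range(len(y) - 1)) and y[0] - y[-1] <= 1

print(bitonic([5, 0, 0, 0]))  # prints [2, 1, 1, 1]
```

However the five tokens are distributed over the inputs, the outputs come out as the same step sequence, which is what lets the network hand out successive counter values.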
Fault Tolerance
Dynamic faults in the data structure:
• Corrupted data
• Inaccessible data
No errors in control:
• No lost tokens
• No errors in network wiring
Fault Model
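$x, y$: tokens received prior to the fault
$x', y'$: tokens received after the fault
[Figure: before the fault, the balancer emits $\lceil (x+y)/2 \rceil$ and $\lfloor (x+y)/2 \rfloor$ tokens on its two outputs. After the fault, tokens bypass the balancer: $x'$ and $y'$ pass straight through on their own wires.]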
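A minimal sketch of this fault model on token counts, assuming that a balancer behaves correctly until the fault and that every token arriving afterwards stays on the wire it came in on. The function and parameter names are hypothetical.

```python
def balancer_outputs(x, y):
    """Healthy balancer: x + y tokens split as evenly as possible."""
    total = x + y
    return (total + 1) // 2, total // 2

def faulty_balancer_outputs(x, y, x_after, y_after):
    """Balancer that fails after receiving x and y tokens.

    Tokens received before the fault are balanced; tokens received
    after the fault (x_after on top, y_after on bottom) bypass it.
    """
    top, bottom = balancer_outputs(x, y)
    return top + x_after, bottom + y_after

print(faulty_balancer_outputs(3, 2, 4, 0))  # prints (7, 2)
```

With four tokens arriving on the top wire after the fault, the outputs differ by five instead of at most one.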
Fault Tolerance
Naïve approach: replicate every balancer.
Doesn’t work! Imbalance in token counts.
Fault-Tolerant Balancer
k+1 “pseudo-balancers”, two bits of memory each
tolerates k faults
[Figure: a chain of pseudo-balancers between inputs and outputs, one leader (L) followed by followers (F).]
Pseudo-Balancer
two bits of memory:
• state: up or down
• status: leader (L) or follower (F)
Fault Tolerance
1st Solution: Counting Network constructed with FT balancers.
Counting Network: depth $O(\log^2 n)$
FT Counting Network: depth $O(k \log^2 n)$, tolerates k faults
Fault Tolerance
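2nd Solution: Rectify errors with a correction network.
[Figure: inputs $x_1, x_2, \ldots, x_n$ enter a Counting Network of depth $O(\log^2 n)$ with remapped faulty balancers; its outputs feed a Correction Network of depth $O(k^2 \log n)$ built from FT balancers, which produces $y_1, y_2, \ldots, y_n$.]
(better than the 1st solution provided that $k < \log n$)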
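As a rough check of that condition, assume the two depth bounds read $O(k \log^2 n)$ for a network built entirely from FT balancers and $O(\log^2 n) + O(k^2 \log n)$ for a counting network followed by a correction network. Ignoring the constants hidden by the O-notation, the dominant terms compare as:

$$k^2 \log n < k \log^2 n \iff k < \log n \qquad (k > 0,\ \log n > 0)$$

Dividing both sides by $k \log n$ gives the condition directly.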
Remapping Faulty Balancers
Redirect pointers to spare balancer.
[Figure: an inaccessible balancer is replaced by a spare balancer with a random initial state.]
Error Bound
Error bound for the output sequence of a balancing network with remapped balancers:
[Figure: a Balancing Network with k faults maps inputs $x_1, x_2, \ldots, x_n$ to outputs $y_1, y_2, \ldots, y_n$.]
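Distance Measure
Definition: The distance between two sequences $\mathbf{y} = y_1, y_2, \ldots, y_n$ and $\mathbf{y}' = y'_1, y'_2, \ldots, y'_n$ is:
$$D(\mathbf{y}, \mathbf{y}') = \frac{1}{2} \sum_{i=1}^{n} |y_i - y'_i|$$
$D(\mathbf{y}, \mathbf{y}')$ gives the number of “misplaced tokens”.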
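A minimal sketch of the distance measure, reading the definition as half the sum of the absolute differences between corresponding entries. The function name `distance` is hypothetical.

```python
def distance(y, y_other):
    """Half the sum of |y_i - y'_i| over all output wires."""
    if len(y) != len(y_other):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - b) for a, b in zip(y, y_other)) / 2

print(distance([2, 1, 1, 1], [1, 2, 1, 1]))  # prints 1.0
```

In the example one token sits on the second wire instead of the first, so exactly one token is misplaced. When both sequences carry the same total number of tokens, every misplaced token contributes one surplus and one deficit, which is why the sum is halved.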
Error Bound
Two identical balancing networks, given same inputs, one with no faults and one with k faults:
$$D(\mathbf{y}, \mathbf{y}') \le k$$
where $\mathbf{y}$ and $\mathbf{y}'$ are the output sequences of the two networks.
Correction Network
Strategy: Construct a block which reduces error by one.
[Figure: CORRECT[n] takes a step sequence with k errors on $y_1, y_2, \ldots, y_n$ and outputs a step sequence with k−1 errors.]
Correction Network
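To reduce error by one: balance smallest and largest entries.
[Figure: BUTTERFLY[n] takes the step sequence with k errors on $y_1, y_2, \ldots, y_n$ and separates out the largest value and the smallest value among its outputs $z_1, z_2, \ldots, z_n$; balancing those two gives a step sequence with k−1 errors.]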
Butterfly Network
Network which separates out smallest and largest entries:
[Figure: example butterfly network; the largest value and the smallest value each emerge on a designated output wire.]
Butterfly Network
Balance smallest and largest entries:
[Figure: the same example with a final balancer applied to the largest and smallest entries; error reduced.]
Correction Network
Strategy: to correct k faults, append k copies.
[Figure: a step sequence with k errors passes through CORRECT[n]#1, ..., CORRECT[n]#k, each of depth $(k+1)(\log n + 1)$, yielding a smooth sequence in place of the step sequence.]
depth: $k(k+1)(\log n + 1)$
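A minimal sketch of the strategy on token counts. It assumes each correction block can be abstracted as "find the largest and smallest entries, then balance them"; the butterfly wiring that does the finding is not modelled, and the names `balance_extremes`, `correct`, and `is_smooth` are hypothetical. A sequence is called smooth here when its entries differ by at most one.

```python
def balance_extremes(y):
    """One correction step: balance the largest and smallest entries."""
    out = list(y)
    hi = max(range(len(out)), key=lambda i: out[i])  # first largest entry
    lo = min(range(len(out)), key=lambda i: out[i])  # first smallest entry
    total = out[hi] + out[lo]
    out[hi], out[lo] = (total + 1) // 2, total // 2
    return out

def correct(y, k):
    """Apply k correction steps in sequence."""
    for _ in range(k):
        y = balance_extremes(y)
    return y

def is_smooth(y):
    """True when all entries differ by at most one."""
    return max(y) - min(y) <= 1

# The step sequence [3, 3, 2, 2] with two misplaced tokens.
faulty = [5, 3, 1, 1]
print(correct(faulty, 2))  # prints [2, 3, 3, 2]
```

The result is smooth but not a step sequence, so this sketch stops at the smooth sequence shown in the figure and does not model any further reordering.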
Fault Tolerance
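Correction network, constructed with FT balancers, is appended to counting network.
[Figure: inputs $x_1, x_2, \ldots, x_n$ enter a Counting Network of depth $O(\log^2 n)$ with remapped faulty balancers, followed by a Correction Network of depth $O(k^2 \log n)$ built from FT balancers, which produces $y_1, y_2, \ldots, y_n$.]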
Conclusions
• Upper bound on error resulting from faults.
• Practical method for tolerating faults with $O(k^2 \log n)$ extra stages.
Future Work
• Extend concepts to Diffracting Trees (Shavit et al., 1996) and other constructs.
• General framework for fault-tolerant concurrent data structures.
Leader
Accepts tokens on either wire.
Incoming tokens are colored green.
Colors outgoing tokens red.
Two bits of memory.
Follower
Accepts red tokens in order.
Becomes a leader if it receives a green token.
Two bits of memory.
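A minimal sequential sketch of a chain of k+1 pseudo-balancers. Several details are assumptions rather than the authors' stated design: a fault is modelled as tokens bypassing the pseudo-balancer unchanged, "accepts red tokens in order" is read as a follower passing red tokens straight through while updating its own state bit to track the leader, and all class, method, and constant names are hypothetical. Corrupted state bits and concurrent token arrivals are not modelled.

```python
GREEN, RED = "green", "red"
TOP, BOTTOM = 0, 1

class PseudoBalancer:
    def __init__(self, leader=False):
        self.next_wire = TOP   # state bit: wire the next token should take
        self.leader = leader   # status bit: leader or follower
        self.faulty = False    # assumption: a fault means tokens bypass

    def process(self, wire, color):
        if self.faulty:
            return wire, color          # bypass, color unchanged
        if color == GREEN:
            self.leader = True          # no working leader upstream
        if self.leader:
            out = self.next_wire        # balance the token
            self.next_wire = 1 - out
            return out, RED
        # Follower: pass the red token through and track the leader's state.
        self.next_wire = 1 - wire
        return wire, color

class FTBalancer:
    """Chain of k + 1 pseudo-balancers; the first starts as leader."""

    def __init__(self, k):
        self.stages = [PseudoBalancer(leader=(i == 0)) for i in range(k + 1)]

    def route(self, wire):
        color = GREEN                   # incoming tokens are green
        for stage in self.stages:
            wire, color = stage.process(wire, color)
        return wire

ft = FTBalancer(k=2)
outs = []
for t in range(12):
    if t == 3:
        ft.stages[0].faulty = True      # the leader fails
    if t == 8:
        ft.stages[1].faulty = True      # its replacement fails too
    outs.append(ft.route(TOP if t % 3 else BOTTOM))
```

Under these assumptions the outputs keep alternating between the top and bottom wires across both faults, because each follower already holds the state the failed leader would have had.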