Scalable membership management
Scalable membership management and failure detection?
Vinay Setty, INF5360
What is Gossiping?
• Spread of information in a random manner
• Some examples:
  – Human gossiping
  – Epidemic diseases
  – Physical phenomena: wildfire, diffusion, etc.
  – Computer viruses and worms
Gossiping in Computer Science
• Term first coined by Demers et al. (1987)
• Some applications of gossip protocols:
  – Peer sampling
  – Data aggregation
  – Clustering
  – Information dissemination (multicast, pub/sub)
  – Overlay/topology maintenance
  – Failure detection?
Gossip-Based Protocol: Example
[Figure: ten numbered nodes (0–9) exchanging gossip messages]
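A round-based push-gossip dissemination over a set of numbered nodes, as in the example, can be sketched as follows (the topology, fanout, and seed are assumptions for illustration, not from the slides):

```python
import random

def push_gossip(n=10, fanout=2, seed=0):
    """Minimal push-gossip sketch: node 0 knows a rumor; each round,
    every informed node pushes it to `fanout` random peers. Returns
    the number of rounds until all n nodes are informed."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        rounds += 1
        for node in list(informed):
            for peer in rng.sample(range(n), fanout):
                informed.add(peer)
    return rounds

print(push_gossip())  # the rumor reaches all 10 nodes in a handful of rounds
```

The number of informed nodes roughly doubles per round, which is why gossip dissemination completes in O(log n) rounds with high probability.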
Today’s Focus
• Theoretical angle for gossip-based protocols [Allavena et al., PODC 2005]
  – Probability of partitioning
  – Time till partitioning
  – Bounds on in-degree
  – Essential elements of gossiping
  – Simulation results
• Cyclon [Voulgaris et al.]
• Scamp [Ganesh et al.]
• NewsCast [Jelasity et al.]
Membership Service
• Full membership
  – Complete knowledge at each node
  – Random subset used for gossiping
  – Not scalable
  – Hard to maintain
• Partial membership
  – Random subset at each node
  – Gossip partners chosen from the local view
View Selection
[Figure: node u gossips with its neighbors; list L1 is built from the views of u’s neighbors (e.g. s,p,r and t,q,r), list L2 from nodes such as v that requested u’s view (s,p,t); the new view is drawn from L1 and L2, weighted with w]
Essential Elements of Gossiping
• Mixing: construct a list L1 consisting of the local views of the nodes in u’s local view
  – Guarantees non-partitioning
  – “Pull” based
• Reinforcement: construct a list L2 consisting of the local views of nodes that requested the local view of u
  – Balances the network
  – Removes old, possibly dead edges; adds new edges
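The two lists can be combined into a fresh view roughly as follows (a minimal sketch; the parameter names and the exact weighting scheme are assumptions, following the slide’s “weighted with w”):

```python
import random

def new_view(L1, L2, k, w=0.5, rng=random):
    """Select a fresh view of size k for node u.

    L1: ids gathered from the views of u's neighbors (mixing, pull-based).
    L2: ids of nodes that recently requested u's view (reinforcement).
    Each slot is drawn from L2 with probability w, otherwise from L1."""
    candidates = set(L1) | set(L2)
    view = set()
    while len(view) < min(k, len(candidates)):
        source = L2 if (L2 and (not L1 or rng.random() < w)) else L1
        view.add(rng.choice(source))
    return list(view)
```

A larger w puts more weight on reinforcement, which (per the next slides) rebalances the network and flushes dead edges faster.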
Partitioning and Size Estimate
• A and B partition iff x = 1 and y = 0
• Partitioning is least likely when x = y
• The goal of the protocol is to maintain this balance
Size Estimates
• Idea:
  – Assuming edges were drawn uniformly at random, expected x + y ∝ |A|
  – x is the estimate of the size of A by nodes in A
  – y is the estimate of the size of A by nodes in B
• Mixing:
  – Agreeing on the estimates x and y ensures no partition (even if x and y are not accurate)
• Reinforcement:
  – Brings the estimates x and y to the correct value
K-regularity
• View size: k
• Number of nodes: n
• Fraction of nodes in partition: γ
• |A| = γn ≤ |B|
• #edges from A to B: (1−x)γkn
• #edges from B to A: y(1−γ)kn
• Number of edges in the A–B cut:
  – (1−x)γkn + x(1−γ)kn (since x = y)
  – ≥ γkn (assuming γ ≤ ½)
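The last bound follows by one line of algebra (same symbols as above):

```latex
(1-x)\gamma k n + x(1-\gamma) k n
  = \left[\gamma + x(1-2\gamma)\right] k n
  \;\ge\; \gamma k n
  \quad \text{since } x \ge 0 \text{ and } \gamma \le \tfrac{1}{2}.
```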
Time Till Partitioning
• View size: k
• Number of nodes: n
• Fraction of nodes in partition: γ
• Churn rate: μ (μn nodes leave and join)
• Claim: expected time before a partition of size γ happens ≈ 2^(γkn)
  – As long as μ ≪ γkn
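Reading the claim as exponential in the cut size, T ≈ 2^(γkn), even modest parameters give an astronomically long expected time (the numbers below are hypothetical, chosen only for illustration):

```python
# Hypothetical parameters: n = 1000 nodes, view size k = 10, gamma = 0.5.
n, k, gamma = 1000, 10, 0.5
T = 2 ** int(gamma * k * n)   # expected iterations before a partition of size gamma
print(len(str(T)))            # 2**5000 has 1506 decimal digits
```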
Iterations until Partitioning
[Figure: iterations until partitioning; number of nodes n, view size k = log n, churn n/32]
View Size vs Time until Partition
[Figure: view size vs. time until partition; number of nodes n, view size k = log n, churn n/32]
Simplified Model for Proof
• A single randomly chosen element of the view is replaced, instead of the whole view
• Assumption: the out-edges of nodes in A are identically distributed, and the same applies to B
• a = #edges from A to A
• c = #edges from A to B
• b = #edges from B to A
• d = #edges from B to B
Proof Intuition
Partition state: a = γkn and b = 0
In-Degree Analysis
• Load balancing requires a balanced in-degree distribution
• In-degree is governed by the way edges are created, copied, and destroyed
• Copying some edges more than others causes variability in in-degree
• A node that lives longer is expected to have a higher in-degree
• Solution: increase reinforcement and keep track of timestamps, as in Cyclon
• Simulation: max in-degree < 4.5 times that of a random graph, and standard deviation < 3.2 times
Discussion
• Are these theoretical guarantees practically useful?
• The goal is not to provide failure detection
Cyclon
• Consists of the same elements as suggested by [Allavena et al., PODC 2005]
• The [Allavena et al., PODC 2005] analysis holds for Cyclon
• Major differences:
  – Timestamps
  – Shuffling
Basic Shuffling
• Select a random subset of l neighbors (1 ≤ l ≤ c) from P’s own cache, and a random peer, Q, within this subset, where l is a system parameter, called the shuffle length.
• Replace Q’s address with P’s address.
• Send the updated subset to Q.
• Receive from Q a subset of no more than l of Q’s neighbors.
• Discard entries pointing to P, and entries that are already in P’s cache.
• Update P’s cache to include all remaining entries, by
  – first using empty cache slots (if any), and
  – second replacing entries among the ones originally sent to Q.
Shuffling Example
[Figure: step-by-step shuffling example]
Enhanced Shuffling
• Increase by one the age of all neighbors.
• Select neighbor Q with the highest age among all neighbors, and l − 1 other random neighbors.
• Replace Q’s entry with a new entry of age 0 and with P’s address.
• Send the updated subset to peer Q.
• Receive from Q a subset of no more than l of its own entries.
• Discard entries pointing at P and entries already contained in P’s cache.
• Update P’s cache to include all remaining entries, by first using empty cache slots (if any), and second replacing entries among the ones sent to Q.
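Relative to basic shuffling, the only new machinery is the per-entry age; a sketch with (neighbor, age) pairs (representation and cache size are assumptions):

```python
import random

def enhanced_shuffle(caches, p, l, c=8):
    """One enhanced-shuffle step initiated by peer P.
    Caches map peer id -> list of (neighbor_id, age) entries."""
    cache_p = caches[p]
    # Increase by one the age of all neighbors.
    cache_p[:] = [(n, age + 1) for n, age in cache_p]
    # Neighbor Q with the highest age, plus l-1 other random neighbors.
    q, _ = max(cache_p, key=lambda e: e[1])
    others = random.sample([e for e in cache_p if e[0] != q],
                           min(l - 1, len(cache_p) - 1))
    # Q's entry is replaced by a fresh (P, age 0) entry in the sent subset.
    sent = [(p, 0)] + others
    # Receive no more than l of Q's entries; drop P and known neighbors.
    reply = random.sample(caches[q], min(l, len(caches[q])))
    known = {n for n, _ in cache_p}
    incoming = [(n, a) for n, a in reply if n != p and n not in known]
    # Fill empty slots first, then overwrite the entries sent to Q.
    victims = [q] + [n for n, _ in others]
    for entry in incoming:
        if len(cache_p) < c:
            cache_p.append(entry)
        elif victims:
            v = victims.pop(0)
            i = next(i for i, (n, _) in enumerate(cache_p) if n == v)
            cache_p[i] = entry
    # Q would symmetrically merge `sent` into its cache (omitted here).
```

Always shuffling with the oldest neighbor is what bounds the time until dead links are removed, as the next slide shows.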
Time Until Dead Links Removed
Number of Clusters
Tolerance to Partitioning
In-Degree Distribution
SCAMP
• Partial knowledge of the membership: local view
• Fanout automatically set to the size of the local view
• Fanout evolves naturally with the size of the group
  – Size of local views converges towards C·log(n)
Join (Subscription)
[Figure: new node s sends its subscription to a random member; the subscription is forwarded through the overlay, and each node that receives it keeps s with probability P = 1/(size of view) or forwards it with probability (1 − P)]
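The keep-or-forward rule in the figure can be sketched as follows (a simplified, single-copy sketch; the SCAMP paper uses P = 1/(view size + 1) and forwards multiple copies, details that are simplified away here):

```python
import random

def subscribe(views, contact, new_node, max_hops=1000):
    """SCAMP-style subscription forwarding, simplified sketch.
    `views` maps node id -> local view (list of node ids)."""
    views[new_node] = [contact]           # new node starts knowing its contact
    queue = list(views[contact])          # contact forwards to all its neighbors
    hops = 0
    while queue and hops < max_hops:
        hops += 1
        node = queue.pop(0)
        view = views[node]
        p = 1.0 / (len(view) + 1)
        if node != new_node and new_node not in view and random.random() < p:
            view.append(new_node)                 # keep the subscription
        else:
            queue.append(random.choice(view))     # forward to a random neighbor
    return views
```

Because the keep probability shrinks as views grow, larger views absorb proportionally fewer subscriptions, which is what drives view sizes toward C·log(n).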
Join (Subscription) Algorithm
[Figure: worked example on nodes 0–8 — node 6’s subscription is forwarded through the overlay, and its id ends up in several local views]
Load Balancing
• Indirection:
  – Forward the subscription instead of handling the request
• Lease associated with each subscription
• Periodically, nodes have to re-subscribe
  – Nodes that have failed permanently will time out
  – Re-balances the partial views
Unsubscription
[Figure: node 0 unsubscribes by sending Unsub(0) together with its local view [1, 4, 5]; nodes x, y, z, whose local views contain 0, replace their entries for 0 with members of 0’s view]
Degree
• System modelled as a random directed graph
• D(N) = average out-degree for an N-node system
• A subscription adds D(N) + 1 directed arcs, so
  (N + 1) D(N + 1) = N D(N) + D(N) + 1
• The solution of this recursion is
  D(N) = D(1) + 1/2 + 1/3 + … + 1/N ≈ log(N)
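The recursion simplifies to D(N+1) = D(N) + 1/(N+1), so D(N) is D(1) plus a harmonic sum, which grows like log N. A quick numerical check (D(1) = 1 is an assumed initial value):

```python
import math

def avg_out_degree(N, D1=1.0):
    """Average out-degree D(N) via the slide's recursion
    (N+1) D(N+1) = N D(N) + D(N) + 1, i.e. D(N+1) = D(N) + 1/(N+1)."""
    d = D1
    for i in range(2, N + 1):
        d += 1.0 / i
    return d

# D(N) - D(1) is the harmonic sum 1/2 + ... + 1/N, which grows like ln N.
print(avg_out_degree(10_000) - math.log(10_000))  # a small constant offset
```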
Distribution of View Size
[Figure: distribution of view sizes, annotated Log = 12.2 and Log = 13.12]
Reliability: 5000-Node System
[Figure: reliability (0.9–1.0) vs. number of failures (0–2500) for SCAMP and for global membership knowledge with fanout 8 and fanout 9]
NewsCast
• Goal: aggregate information
  – in a large and dynamic distributed environment
  – in a robust and dependable manner
Idea
• Get news from the application, timestamp it, and add the local peer address to the cache entry
• Find a random peer among the cache addresses
  – Send all cache entries to this peer
  – Receive all cache entries from that peer
• Pass on cache entries (containing news items) to the application
• Merge the old cache with the received cache
  – Keep at most the C most recent cache entries
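The merge step can be sketched as follows (a minimal sketch; the entry layout (timestamp, peer_address, news_item) and the dedup-by-address rule are assumptions):

```python
def newscast_exchange(my_cache, peer_cache, C=20):
    """Merge two NewsCast caches after an exchange: keep one (newest)
    entry per peer address, then at most the C most recent overall.
    Entries are (timestamp, peer_address, news_item) tuples."""
    newest = {}
    for ts, addr, item in sorted(my_cache + peer_cache):
        newest[addr] = (ts, addr, item)   # later (newer) entries win
    return sorted(newest.values(), reverse=True)[:C]

a = [(1, 'p1', 'n1'), (5, 'p2', 'n2')]
b = [(3, 'p1', 'n3'), (4, 'p3', 'n4')]
print(newscast_exchange(a, b, C=2))  # [(5, 'p2', 'n2'), (4, 'p3', 'n4')]
```

Since both peers end up with the same merged cache, every exchange both spreads fresh news and refreshes membership information.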
Aggregation
• Each node ni maintains a single number xi
• Every node ni selects a random node nk and sends its value xi to nk
• nk responds with the aggregate (e.g. max(xi, xk)) of the incoming value and its own
• Aggregate values converge “exponentially”
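The steps above, for the max aggregate, can be simulated in a few lines (a minimal sketch; round count and seed are assumptions):

```python
import random

def gossip_max(values, rounds=30, seed=0):
    """Max aggregation by gossip: in each round every node contacts a
    random peer and both adopt the max of their two values."""
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        for i in range(n):
            k = rng.randrange(n)
            x[i] = x[k] = max(x[i], x[k])
    return x

print(gossip_max([3, 1, 4, 1, 5, 9, 2, 6]))  # every entry converges to 9
```

The set of nodes holding the maximum roughly doubles each round, which is the “exponential” convergence the slide refers to.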
Path length under failures
Connectivity Under Failures
Aggregation