A Scalable Content-Addressable Network
1
A Scalable Content-Addressable Network
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker
Pirammanayagam Manickavasagam
2
Overview
Introduction
Design
Design Improvements
Design Review
Related Work
Discussion
3
Introduction
Hash Table Functionality: Maps ‘key’ to a ‘value’.
Content-Addressable Network (CAN):
A distributed infrastructure that provides hash-table-like functionality at Internet-like scale.
Characteristics: scalable, fault-tolerant and completely self-organizing.
4
Introduction (cont..)
Napster: locating a file is centralized.
Gnutella: floods the request for a file; not scalable.
CAN provides a solution:
Scalable: nodes maintain only a small amount of control state.
Distributed: the hash table is spread across all peers.
5
Design
Each node stores a chunk (zone) of the hash table, plus details of the adjacent zones.
Requests are forwarded towards the CAN node whose zone contains the key.
Indexing uses a virtual d-dimensional Cartesian coordinate space. The coordinates are purely logical.
6
Coordinate Space
[Figure: 2-d coordinate space with corners (0,0), (1,0) and (0,1), partitioned among nodes A, B, C and D]
Each node randomly picks a coordinate. The coordinate space is dynamically partitioned among the nodes.
Each node owns its own distinct zone.
7
Design (cont..)
Inserting a pair (key K1, value V1):
Use a hash function to map K1 to a point P1 in the space.
The pair is then stored at the node that owns the zone containing P1.
Retrieving a value:
The requester needs to know the key; hashing the key identifies the point, and hence the node, that holds the value.
Each node learns and maintains a table with the details of its adjacent nodes.
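The insert and retrieve steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the SHA-256-based `key_to_point`, the zone layout, and the `put`/`get` helpers are all assumptions made for the example, and the whole table lives in one process rather than on separate nodes.

```python
import hashlib

def key_to_point(key, d=2):
    """Map a key to a point in the unit space [0, 1)^d (hypothetical hash choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use 4 bytes of the digest per dimension (so d must be at most 8 here).
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32
        for i in range(d)
    )

def owner(point, zones):
    """Return the node whose zone contains the point.

    zones maps a node name to (lows, highs); a zone covers lows[i] <= x < highs[i].
    """
    for name, (lows, highs) in zones.items():
        if all(lo <= x < hi for x, lo, hi in zip(point, lows, highs)):
            return name
    raise ValueError("point is not covered by any zone")

# Four equal zones tiling the unit square (an assumed layout).
zones = {
    "A": ((0.0, 0.5), (0.5, 1.0)),
    "B": ((0.5, 0.5), (1.0, 1.0)),
    "C": ((0.0, 0.0), (0.5, 0.5)),
    "D": ((0.5, 0.0), (1.0, 0.5)),
}
store = {name: {} for name in zones}  # per-node slice of the hash table

def put(key, value):
    store[owner(key_to_point(key), zones)][key] = value

def get(key):
    return store[owner(key_to_point(key), zones)].get(key)
```

Because `put` and `get` hash the key the same way, a lookup always lands on the node that stored the pair.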
8
Routing
Information needed for routing:
Each CAN node holds a routing table that contains the IP address and virtual coordinate zone of each of its neighbors.
Two nodes are neighbors if their coordinate spans overlap along d-1 dimensions and abut along one dimension.
In a d-dimensional coordinate space, an individual node maintains 2d neighbors.
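As a rough sketch of the neighbor rule, the check below treats a zone as a pair `(lows, highs)` of per-dimension bounds. The representation and the name `are_neighbors` are assumptions for illustration, and the sketch ignores any wrap-around at the edges of the coordinate space.

```python
def are_neighbors(zone_a, zone_b):
    """True if the zones abut along exactly one dimension and their spans
    overlap along every other dimension."""
    lows_a, highs_a = zone_a
    lows_b, highs_b = zone_b
    abut = 0
    overlap = 0
    for la, ha, lb, hb in zip(lows_a, highs_a, lows_b, highs_b):
        if ha == lb or hb == la:
            abut += 1          # spans touch end to end
        elif la < hb and lb < ha:
            overlap += 1       # spans share an interval
    return abut == 1 and overlap == len(lows_a) - 1
```

Zones that touch only at a corner abut along two dimensions, so the check rejects them.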
9
In the figure, nodes 5 and 1 are neighbors: 5's span overlaps 1's along the Y axis and abuts it along the X axis.
10
Routing (Cont..)
A CAN message carries the destination coordinates.
Routing proceeds by simple greedy forwarding to the neighbor closest to the destination.
Average path length = (d/4) n^(1/d) hops (n = number of zones).
Because many paths are available, the network keeps working even if some nodes fail.
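A single greedy forwarding step might look like the sketch below. It assumes each neighbor is known by the bounds of its zone and measures closeness from the zone's center. Both choices, and the names `center` and `next_hop`, are illustrative assumptions, and wrap-around at the edges of the space is ignored.

```python
import math

def center(zone):
    """Center point of a zone given as (lows, highs)."""
    lows, highs = zone
    return tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))

def next_hop(dest, neighbors):
    """Pick the neighbor whose zone center is closest to the destination point.

    neighbors maps a node name to its zone.
    """
    return min(neighbors, key=lambda name: math.dist(center(neighbors[name]), dest))
```

Each node repeats this step with its own neighbor table until the message reaches the node whose zone contains the destination.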
11
Construction
1. First the new node must find a node already in the CAN.
2. Next, using the CAN routing mechanisms, it must find a node whose zone will be split.
3. Finally, the neighbors of the split zone must be notified so that routing can include the new node.
12
Bootstrap
A CAN has an associated DNS domain name that resolves to one or more bootstrap nodes.
A bootstrap node maintains a partial list of CAN nodes it believes are currently in the system.
To join a CAN, a new node looks up the CAN domain name in DNS to retrieve a bootstrap node's IP address.
This bootstrap node then supplies the IP addresses of several randomly chosen nodes currently in the system.
13
Finding a zone
The new node randomly chooses a point P in the space.
It sends a JOIN request destined for P.
The request is sent into the CAN via an existing CAN node.
The current occupant node then splits its zone in half and assigns one half to the new node.
Splits follow a fixed ordering of the dimensions.
E.g., in 2-d, a zone is split first along X and then along Y.
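The halving step can be sketched as below. The `depth` argument (how many times this zone has already been split) is an assumed way of tracking the fixed dimension ordering, and `split_zone` is a hypothetical helper name.

```python
def split_zone(zone, depth):
    """Split a zone in half along dimension (depth mod d).

    zone is (lows, highs) with tuple bounds. Returns (kept_half, new_half):
    the occupant keeps the lower half and the joining node gets the upper half.
    """
    lows, highs = zone
    axis = depth % len(lows)            # X first, then Y, and so on
    mid = (lows[axis] + highs[axis]) / 2
    kept_highs = highs[:axis] + (mid,) + highs[axis + 1:]
    new_lows = lows[:axis] + (mid,) + lows[axis + 1:]
    return (lows, kept_highs), (new_lows, highs)
```

Which half goes to the new node is arbitrary in this sketch; the slides only say that one half is assigned to it.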
14
Maintenance
Departure of a Node
Single Node Failure
Multiple Failure
15
Departure of a Node
A departing node explicitly hands over its zone and the associated (key, value) entries to one of its neighbors.
If the zone of one of the neighbors can be merged with the departing node’s zone to produce a valid single zone, then this is done.
If not, then the zone is handed to the neighbor whose current zone is smallest, and that node will then temporarily handle both zones.
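The hand-over rule above can be sketched as follows, assuming zones are axis-aligned boxes stored as `(lows, highs)`. The helper names are hypothetical, and a "valid single zone" is taken here to mean that the union of the two zones is itself a box.

```python
import math

def volume(zone):
    lows, highs = zone
    return math.prod(hi - lo for lo, hi in zip(lows, highs))

def try_merge(zone_a, zone_b):
    """Return the merged zone if the union is a single box, else None."""
    lows_a, highs_a = zone_a
    lows_b, highs_b = zone_b
    differing = [
        i for i in range(len(lows_a))
        if (lows_a[i], highs_a[i]) != (lows_b[i], highs_b[i])
    ]
    if len(differing) != 1:
        return None                      # spans must match in all but one dimension
    i = differing[0]
    if highs_a[i] != lows_b[i] and highs_b[i] != lows_a[i]:
        return None                      # and abut along that dimension
    lows = tuple(min(a, b) for a, b in zip(lows_a, lows_b))
    highs = tuple(max(a, b) for a, b in zip(highs_a, highs_b))
    return lows, highs

def choose_takeover(departing, neighbors):
    """Return (neighbor_name, zones_it_now_handles)."""
    for name, zone in neighbors.items():
        merged = try_merge(departing, zone)
        if merged is not None:
            return name, [merged]
    # No clean merge: the smallest neighbor temporarily handles both zones.
    name = min(neighbors, key=lambda n: volume(neighbors[n]))
    return name, [neighbors[name], departing]
```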
16
Departure of a Node
[Figure: 2-d coordinate space with corners (0,0), (1,0) and (0,1), partitioned among nodes A, B, C, D, E and F]
When node F fails, E's zone is merged with F's zone.
17
Failures
A prolonged absence of update messages from a neighbor indicates that the neighbor has failed.
Each neighbor of the failed node then starts a takeover timer.
When the timer expires, a node sends a TAKEOVER message conveying its own zone volume to all of the failed node's neighbors.
A node that receives a TAKEOVER message accepts it only if the zone volume in the message is smaller than its own zone volume.
Otherwise it replies with its own TAKEOVER message.
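The volume comparison at the heart of the takeover exchange can be sketched as below. Timers and real messaging are left out, and the function names and the returned action strings are assumptions made for the example.

```python
def on_takeover(my_volume, msg_volume):
    """Decide how a node reacts to a TAKEOVER message from another neighbor."""
    if msg_volume < my_volume:
        return "accept"          # the sender has the smaller zone; stand down
    return "send_takeover"       # otherwise answer with our own TAKEOVER

def takeover_winners(volumes):
    """Simulate every neighbor announcing its volume to every other neighbor.

    volumes maps a neighbor name to its zone volume. Returns the sorted names
    of the neighbors that never accepted someone else's TAKEOVER.
    """
    remaining = set(volumes)
    for sender, sender_volume in volumes.items():
        for receiver, receiver_volume in volumes.items():
            if receiver != sender and on_takeover(receiver_volume, sender_volume) == "accept":
                remaining.discard(receiver)
    return sorted(remaining)
```

With distinct volumes, exactly one neighbor, the one with the smallest zone, is left to take over the failed node's zone. Ties are not resolved in this sketch.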
18
Multiple Failure
A node first performs an expanding ring search for nodes beyond the failed (unreachable) region.
It then rebuilds its neighbor state table so that it can take over safely.
19
Design Improvements
Multi-dimensional coordinate spaces
Increasing the number of dimensions of the CAN coordinate space reduces the routing path length, and hence the path latency.
More dimensions => more neighbors per node => more candidate next hops => better routing fault tolerance
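To make the trade-off concrete, the sketch below evaluates the average path length (d/4) n^(1/d) and the 2d neighbor count from the routing slides for a few values of d. The system size of 2^18 zones is an illustrative choice, and the function names are hypothetical.

```python
def average_path_length(n, d):
    """Average hops for n equal zones in d dimensions: (d/4) * n^(1/d)."""
    return (d / 4) * n ** (1 / d)

def neighbors_per_node(d):
    """Each node keeps state for 2d neighbors."""
    return 2 * d

n = 2 ** 18  # illustrative system size
for d in (2, 3, 4, 6, 8, 10):
    hops = average_path_length(n, d)
    print(f"d={d:2d}  neighbors={neighbors_per_node(d):2d}  avg hops={hops:7.1f}")
```

Over this range the path length falls steeply while per-node state grows only linearly in d.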
20
21
Design Improvements
Realities: multiple coordinate spaces
The system maintains multiple, independent coordinate spaces, with each node assigned a different zone in each one.
Each such coordinate space is a “reality”.
A given coordinate is searched for in all realities. This reduces the average path length.
Multiple dimensions vs. multiple realities
Multiple realities give better fault tolerance and data availability than multiple dimensions.
22
Design Improvements
Overloading coordinate zones
Allows multiple nodes to share the same zone. Nodes that share the same zone are termed peers.
MAXPEERS is the maximum number of allowable peers per zone.
Benefits: reduced path length (number of hops), and hence reduced path latency; improved fault tolerance.
Multiple hash functions
Similar in effect to multiple realities.
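One way to picture multiple hash functions is to derive k points for the same key and query whichever point is nearest. The salted SHA-256 construction below is an assumption for illustration (the slides do not say how the k functions are built), and `closest_replica` is a hypothetical helper.

```python
import hashlib
import math

def key_to_points(key, k, d=2):
    """Map a key to k points in [0, 1)^d, one per (salted) hash function."""
    points = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        points.append(tuple(
            int.from_bytes(digest[4 * j: 4 * j + 4], "big") / 2**32
            for j in range(d)
        ))
    return points

def closest_replica(my_point, key, k, d=2):
    """Pick the replica point nearest to the querying node's own position."""
    return min(key_to_points(key, k, d), key=lambda p: math.dist(p, my_point))
```

Storing the pair at all k points adds replicas, and querying the nearest one tends to shorten the route.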
23
Design Improvements
Topologically-sensitive construction of the CAN overlay network
Each CAN node measures its round-trip time to each of a set of landmarks and orders the landmarks by increasing RTT.
With m landmarks, m! such orderings are possible.
The coordinate space is divided into portions, and every portion is assigned a landmark ordering.
A new node joins the CAN at a random point in the portion of the coordinate space associated with its landmark ordering.
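The landmark-ordering idea can be sketched as below. Ranking landmarks by measured RTT follows the slide; numbering the m! portions by the lexicographic position of the ordering is purely an assumption for the example, as are the function names.

```python
from itertools import permutations

def landmark_ordering(rtts):
    """Return landmark indices sorted by increasing round-trip time."""
    return tuple(sorted(range(len(rtts)), key=lambda i: rtts[i]))

def portion_index(ordering):
    """Position of this ordering among all m! permutations (lexicographic)."""
    m = len(ordering)
    return list(permutations(range(m))).index(ordering)

# A node that measured RTTs of 30 ms, 10 ms and 20 ms to landmarks 0, 1 and 2.
ordering = landmark_ordering([30.0, 10.0, 20.0])
portion = portion_index(ordering)
```

Nodes that see the landmarks in the same order are likely to be close in the underlying network, so they end up in the same portion of the coordinate space.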
24
Design Improvements
More uniform partitioning
Before splitting, the occupant node compares the volume of its zone with those of its immediate neighbors in the coordinate space.
The zone with the largest volume is the one that is split.
Without the uniform partitioning feature, a little over 40% of the nodes are assigned to zones with volume V, compared with almost 90% with this feature, and the largest zone volume drops from 8V to 2V.
Not surprisingly, the partitioning of the space improves further with increasing dimensions.
Caching and Replication techniques
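The uniform-partitioning rule on this slide can be sketched as a simple comparison of zone volumes. The zone representation and the name `zone_to_split` are assumptions for illustration.

```python
import math

def volume(zone):
    """Volume of a zone given as (lows, highs)."""
    lows, highs = zone
    return math.prod(hi - lo for lo, hi in zip(lows, highs))

def zone_to_split(occupant, neighbor_names, zones):
    """Among the occupant and its neighbors, pick the node with the largest zone.

    On a tie the occupant is preferred, because it is listed first.
    """
    candidates = [occupant] + list(neighbor_names)
    return max(candidates, key=lambda name: volume(zones[name]))
```

Redirecting the split to the largest nearby zone is what evens out the zone volumes.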
25
26
Design Review
The following metrics were used to evaluate system performance:
Path length: the number of (application-level) hops required to route between two points in the coordinate space.
Neighbor-state: the number of CAN nodes for which an individual node must retain state.
Latency: we consider both the end-to-end latency of the total routing path between two points in the coordinate space and the per-hop latency, i.e., latency of individual application-level hops obtained by dividing the end-to-end latency by the path length.
Volume: the volume of the zone to which a node is assigned that is indicative of the request and storage load a node must handle.
Routing fault tolerance: the availability of multiple paths between two points in the CAN.
Hash table availability: adequate replication of a (key,value) entry to withstand the loss of one or more replicas.
27
Design Review
The key design parameters affecting system performance are:
Dimensionality of the virtual coordinate space: d
Number of realities: r
Number of peer nodes per zone: p
Number of hash functions (i.e. number of points per reality at which a (key, value) pair is stored): k
Use of the RTT-weighted routing metric
Use of uniform partitioning
Test system specification:
A system size of n = 2^18 nodes.
Transit-Stub topology with a delay of 100 ms on intra-transit links, 10 ms on stub-transit links and 1 ms on intra-stub links (i.e. 100 ms on links that connect two transit nodes, 10 ms on links that connect a transit node to a stub node, and so forth).
Transit-stub models explicitly group vertices into domains, and reflect that grouping in the connectivity between vertices.
28
100 node transit-stub topology
29
Bare bones: CAN that does not utilize most of our additional design features
Knobs-on-full: CAN making full use of our added features (without the landmark ordering feature)
30
Related Work
Related Algorithms
Distance vector and link state algorithms
These need widespread topological information. CAN, on the other hand, stores only a small amount of state per node.
Plaxton algorithm
Each node has an n-bit label divided into l levels.
Each level has width w = n/l.
Each node forwards a packet to a neighbor whose label matches the destination label in more digits.
31
Related Work
Algorithms with geographic routing
“Space” in these algorithms refers to physical space.
There is no neighbor discovery problem.
Correctly mimicking the space is a trivial problem.
They do not extend to multiple dimensions.
32
Related Systems
Domain Name System
It stores (domain name, IP address) pairs.
OceanStore
Aims to provide continuous access to persistent information.
Uses Plaxton's algorithm.
Peer-to-peer file sharing systems
Freenet
Each node stores keys (analogous to URLs), the addresses of other nodes, and the data corresponding to each key.
33
Discussion
Addresses two key problems in the design of Content-Addressable Networks: scalable routing and indexing.
Simulation results validate the scalability of our overall design – for a CAN with over 260,000 nodes, we can route with a latency that is less than twice the IP path latency.
Future work:
Secure CAN
Keyword searching