Topics in Reliable Distributed Systems 048961 Fall 2003-2004 Dr. Idit Keidar.

Topics in Reliable Distributed Systems

048961 Fall 2003-2004

Dr. Idit Keidar

Lecture 2: Introduction to Peer-to-Peer (P2P)

Lookup Systemswith Chord as an Example

What Does Peer-to-Peer Mean?

Characterization from CFP of IPTPS 2004: – decentralized, – self-organizing – distributed systems, – in which all or most communication is

symmetric.

Typical Characteristics

• Lots of nodes (e.g., millions).• Dynamic: frequent join, leave, failure.• Little or no infrastructure.

– No central server.

• Communication possible between every pair of nodes (cf. the Internet).

• All nodes are “peers” – have same role; don’t have lots of resources.

The Main Challenge

To design and implement a robust and scalable distributed system

composed of inexpensive, individually unreliable

computers in unrelated administrative domains.

“Looking up Data in P2P Systems”, Balakrishnan et al., CACM Feb 2003.

It All Started with Lookup

• Goal: Make billions of objects available to millions of concurrent users.– E.g., music files.

• Need a mechanism to keep track of them.– Map files to their locations.

• First There was Napster.– Centralized server/database.– Pros and cons?

Traditional Scalability Solution

• Hierarchy.– Tree overlay.– Structured lookup.– E.g., DNS.

• Pros and cons?

Symmetric Lookup Algorithms

• All nodes were created equal.

• No hierarchy.– Overlay not a tree.

• So how does the search go?

Searching in Symmetric Overlays Take I: The Gnutella* Approach

• Build a symmetric unstructured overlay.– Each node has several neighbors;– has several keys in its local database.

• When asked to find a key X– Check local database if X is known.– If yes, return, if not, ask your neighbors.

• What is the communication pattern?• Problems?

*More or less.

Take II: FastTrack / KaZaA

• Improve scalability by re-introducing a hierarchy.– Though not a tree.– Super-peers have more resources, more

neighbors, know more keys.– Search goes through super-peers.

• Pros and Cons?

Structured Lookup Overlays

• Many recent academic systems – – CAN, Chord , D2B, Kademlia, Koorde, Pastry,

Tapestry, Viceroy, …

• Symmetric, no hierarchy.• Decentralized self management.• Structured overlay – data stored in a defined place,

search goes on a defined path.• Implement Distributed Hash Table (DHT)

abstraction.

Distributed Hash Tables (DHTs)

• Nodes store table entries.

• Good abstraction for lookup? Why?

• Requirements for being able to use DHTs?– Data identified with unique keys.– Nodes can store keys for each other.

• Location of object or actual object.

The DHT Service Interface

lookup( key )

returns the location of the node currently responsible for this key.

key usually numeric (in some range).

Using the DHT Interface

• How do you publish a file?

• How do you find a file?

What Does a DHT Implementation Need to Do?

• Map keys to nodes.

• Route a request to the appropriate node.

Lookup Example

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

K V

insert(K1,V1)

K V(K1,V1)

lookup(K1)

Mapping Keys to Nodes

• Goal: load balancing.

• Typical approach: – Give an m-bit identifier to each node and each

key (e.g., using SHA-1 on the key, IP address)– Map key to node whose id is “close” to the key

(need distance function). – How is load balancing achieved?

Routing Issues

• Each node must be able to forward each lookup query to a node closer to the destination.

• Maintain routing tables adaptively.– Each node knows some other nodes.– Must adapt to changes (joins, leaves, failures).– Goals?

Chord

Stoica, Morris, Karger, Kaashoek, and Balakrishnan

Chord Logical Structure

• m-bit ID space, numerical distance mod m.

• Think of nodes as organized in a logical ring according to their IDs.

N1

N8

N10

N14

N21

N30N38

N42

N48

N51N56

K54

Assigning Keys to Nodes

• Key k assigned to first node whose ID equals or follows k – successor(k).

N1

N8

N10

N14

N21

N30N38

N42

N48

N51N56

K54

Moving Keys upon Join/Leave

• When a node joins, it becomes responsible for some keys previously assigned to its successor – Local change.– Assuming load balance, should move O(1/N)

keys.

• And what happens when a node leaves?

Simple Routing Solutions

• Each node knows only its successor. – Routing around the circle.

• Each node knows all other nodes.– O(1) routing.

Chord Skiplist Routing

• Each node has “fingers” to nodes ½ way around the ID space from it, ¼ the way…

• Entry i at n contains successor(n+2i-1).N0

N8

N10

N14

N21

N30N38

N42

N48

N51N56

How many fingers in the finger table?

Forwarding Queries

• Query for key k is forwarded to finger with highest ID not exceeding k.

K54 Lookup(K54)N0

N8

N10

N14

N21

N30N38

N42

N48

N51N56

How long does lookup take?

Chord Data Structures

• Finger table.

• Predecessor.

• Interval – list of (O(log N)) successors.

Koorde

• O(log N) routing with 2 fingers only.– Will be presented.

Joining Chord

• Find your successor.

• Learn the keys that you are responsible for.

• Initialize finger table

• Notify other nodes that need to change their finger table and predecessor pointer.

Failure Handling

• Periodically probing predecessors, successors.

• Interval in addition to fingers.

• Redundant paths.

Topics in Reliable Distributed Systems 048961 Fall 2003-2004 Dr. Idit Keidar.

Documents

Transcript of Topics in Reliable Distributed Systems 048961 Fall 2003-2004 Dr. Idit Keidar.