
Lecture 10

Naming services for flat namespaces

EECE 411: Design of Distributed Software Applications

Logistics / reminders

Project: send Samer and me your group membership by the end of the week.

Quizzes: Q1 next time; Q2 on 11/16.

Implementation options: Flat namespace

Problem: given an essentially unstructured name, how can we design a scalable solution that associates names with addresses?

Possible designs:
  [last time] Simple solutions (broadcasting, forwarding pointers)
  Hash table-like approaches: consistent hashing, Distributed Hash Tables (DHTs)

Functionality to implement

Map: names → access points (addresses)

Similar to a hash table: manage a (huge) list of (name, address), i.e. (key, value), pairs

  put(key, value)
  lookup(key) → value

Key idea: partitioning. Allocate parts of the list to different nodes, as the sketch below illustrates.
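A minimal sketch of this interface and the partitioning idea, in Python (class and variable names are mine, not from the lecture): each key is hashed and lands in one of n per-node tables.

```python
import hashlib

class NaivePartitionedMap:
    """Toy put/lookup store: hash the key, pick a node with modulo-n."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]  # one table per node

    def _node_for(self, key):
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def lookup(self, key):
        return self._node_for(key).get(key)

m = NaivePartitionedMap(num_nodes=4)
m.put("www.example.com", "192.0.2.17")
print(m.lookup("www.example.com"))   # -> 192.0.2.17
```

The catch: with modulo-n placement, changing the number of nodes remaps almost every key. Consistent hashing, introduced below, is the partitioning scheme that avoids this reshuffle.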

Why the put()/get() interface?

The API supports a wide range of applications because it imposes no structure or meaning on keys.

Key/value pairs are persistent and global: one can store keys inside other values (indirection) and thus build complex data structures, as sketched below.
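A quick illustration of the indirection point, assuming only a flat put/lookup interface (the keys and fields here are invented for the example): because values can contain keys, linked structures can be layered on the flat store.

```python
store = {}   # stand-in for the global key/value service

def put(key, value): store[key] = value
def lookup(key): return store.get(key)

put("track:42", {"title": "Some Track"})
put("album:7", {"title": "Some Album", "tracks": ["track:42"]})  # a key stored in a value

album = lookup("album:7")
first = lookup(album["tracks"][0])   # follow the indirection
print(first["title"])                # -> Some Track
```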

Why Might The Design Be Hard?

  Decentralized: no central authority
  Scalable: low network traffic overhead
  Efficient: find items quickly (low latency)
  Dynamic: nodes fail, new nodes join
  General-purpose: flexible naming

The Lookup Problem

[Figure: a publisher somewhere on the Internet performs Put(key="title", value=file data…) into an overlay of nodes N1–N6; a client issues Get(key="title"). Which node answers?]

• This lookup problem is at the heart of all these services

Motivation: Centralized Lookup (Napster)

[Figure: the publisher at N4 stores (key="title", value=file data…) locally and registers SetLoc("title", N4) with a central DB; the client resolves Lookup("title") against that DB, then fetches from N4.]

Simple, but O(N) state at the central server and a single point of failure.

Motivation: Flooded Queries (Gnutella)

[Figure: the publisher at N4 holds (key="title", value=file data…); the client's Lookup("title") is flooded hop by hop through N1–N9 until it reaches N4.]

Robust, but worst case O(N) messages per lookup.

Motivation: FreeDB, Routed DHT Queries (Chord, etc.)

[Figure: the publisher at N4 stores (key=H(audio data), value={artist, album title, track title}); the client's Lookup(H(audio data)) is routed through the overlay directly toward the responsible node.]

Hash table-like approaches: consistent hashing, Distributed Hash Tables

Partition Solution: Consistent hashing

Consistent hashing: the output range of a hash function is treated as a fixed circular space or "ring".

[Figure: a circular ID space (0 to 128, wrapping around) with node IDs N10, N32, N60, N80, N100 and key IDs K5, K11, K30, K33, K52, K99 placed on it.]

Partition Solution: Consistent hashing

Mapping keys to nodes. Advantages: incremental scalability, load balancing.

[Figure: the same ring; each key is stored at its successor node: K5, K10 at N10; K11, K30 at N32; K33, K40, K52 at N60; K65, K70 at N80; K99 at N100.]

Consistent hashing

How do store & lookup work?

[Figure: the same ring; the answer to "What node stores K5?" is found by walking the circular ID space: "Key 5 is at N10".]
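A minimal runnable sketch of the ring (my own Python rendering; SHA-1 is assumed as the hash and the node addresses are made up): node IDs and key IDs share one circular space, and successor(k) is a binary search in a sorted list.

```python
import bisect
import hashlib

def ring_hash(s, bits=32):
    """Hash a string onto the circular ID space [0, 2**bits)."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** bits)

class ConsistentHashRing:
    def __init__(self, node_addrs):
        # Nodes get an identity by hashing their address into the key space.
        self.ring = sorted((ring_hash(a), a) for a in node_addrs)

    def node_for(self, key):
        """successor(k): first node ID equal to or following the key ID."""
        kid = ring_hash(key)
        idx = bisect.bisect_left(self.ring, (kid,))
        return self.ring[idx % len(self.ring)][1]   # wrap past the top of the ring

ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.node_for("K5"))   # analogous to the slide's "Key 5 is at N10"
```

Adding or removing one address only moves the keys between that node and its ring neighbor, which is the incremental-scalability property claimed above.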

Additional trick: Virtual Nodes

Problem: how to do load balancing when nodes are heterogeneous?

Solution idea: each node owns an amount of ID space proportional to its 'power'.

Virtual nodes: each physical node hosts multiple (similar) virtual nodes, and all virtual nodes are treated the same.

Advantages: load balancing, incremental scalability, dealing with failures.
  Dealing with heterogeneity: the number of virtual nodes a physical node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
  When a node joins (if it hosts many virtual nodes), it accepts a roughly equivalent amount of load from each of the other existing nodes.
  If a node becomes unavailable, the load it handled is evenly dispersed across the remaining available nodes.
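A sketch of the trick (the capacities are invented for the example): each physical node is hashed onto the ring several times, in proportion to its capacity, and the successor lookup is unchanged.

```python
import bisect
import hashlib

def ring_hash(s, bits=32):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** bits)

# Made-up capacities: a node twice as powerful hosts twice the virtual nodes.
capacities = {"10.0.0.1": 1, "10.0.0.2": 2, "10.0.0.3": 4}

ring = sorted(
    (ring_hash(f"{addr}#vnode{i}"), addr)      # one ring point per virtual node
    for addr, weight in capacities.items()
    for i in range(8 * weight)                 # 8 virtual nodes per capacity unit
)

def node_for(key):
    idx = bisect.bisect_left(ring, (ring_hash(key),))
    return ring[idx % len(ring)][1]

print(node_for("K5"))   # keys land on physical nodes roughly in proportion to capacity
```

Because a physical node's virtual nodes are scattered around the ring, its failure spreads load over many survivors instead of dumping it all on one neighbor.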

Consistent Hashing – Summary so far

Mechanism:
  Nodes get an identity by hashing their IP address; keys are hashed into the same space.
  A key with (hashed) id k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k).

Advantages: incremental scalability, load balancing.

Theoretical results [N = number of nodes, K = number of keys in the system]:
  [With high probability] each node is responsible for at most (1+ε)K/N keys.
  [With high probability] the joining or leaving of a node relocates O(K/N) keys (and only to or from the newly responsible node).

BUT: Consistent hashing – problem

How large is the state maintained at each node? O(N), where N is the number of nodes: answering "What node stores K5?" with "Key 5 is at N10" in a single step requires every node to know the entire ring membership.

[Figure: the same ring and key placement as before.]

Basic Lookup (non-solution)

[Figure: nodes N5, N10, N20, N32, N40, N60, N80, N99, N110 on the ring; "Where is key 50?" is forwarded node by node until the answer "Key 50 is at N60" comes back.]

• Lookups find the ID's successor
• Correct if successors are correct
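A runnable toy version of this non-solution (node IDs taken from the figure; the Node class is mine): each node knows only its successor, so a query crawls the ring in O(N) hops.

```python
from dataclasses import dataclass

@dataclass
class Node:
    id: int
    successor: "Node" = None

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def linear_lookup(start, key_id):
    """Walk the ring one successor at a time: O(N) hops in the worst case."""
    node, hops = start, 0
    while not in_interval(key_id, node.id, node.successor.id):
        node, hops = node.successor, hops + 1
    return node.successor, hops

# Build the slide's ring: N5 ... N110, linked in a circle.
ids = [5, 10, 20, 32, 40, 60, 80, 99, 110]
nodes = [Node(i) for i in ids]
for a, b in zip(nodes, nodes[1:] + nodes[:1]):
    a.successor = b

owner, hops = linear_lookup(nodes[0], 50)
print(owner.id, hops)   # -> 60 4, i.e. "Key 50 is at N60" after four forwards
```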

Successor Lists Ensure Robust Lookup

• Each node remembers r successors
• Lookup can skip over dead nodes

[Figure: the ring with r = 3: N5 remembers (10, 20, 32), N10 (20, 32, 40), N20 (32, 40, 60), N32 (40, 60, 80), N40 (60, 80, 99), N60 (80, 99, 110), N80 (99, 110, 5), N99 (110, 5, 10), N110 (5, 10, 20).]
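A sketch of the fallback (names are mine; the alive flag stands in for a real network liveness probe): forwarding picks the first live entry in the successor list.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int
    alive: bool = True
    successor_list: list = field(default_factory=list)   # the r nearest successors

def next_hop(node):
    """Forward to the first live successor, skipping dead nodes."""
    for succ in node.successor_list:
        if succ.alive:                    # stand-in for a network probe
            return succ
    raise RuntimeError("all r successors dead: the ring is broken at this node")

# r = 3, as in the figure: N5 remembers N10, N20, N32.
n32 = Node(32)
n20 = Node(20, alive=False)
n10 = Node(10, alive=False)
n5 = Node(5, successor_list=[n10, n20, n32])
print(next_hop(n5).id)                    # -> 32: the lookup skipped two dead nodes
```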

“Finger Table” Accelerates Lookups

[Figure: N80's fingers reach ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring, so each successive finger covers half the distance of the previous one.]
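A small sketch of how a finger table is filled in, using the figures' 128-slot ID space and node set (find_successor here is a centralized stand-in for the distributed lookup): finger i points at successor(n + 2^i), so each finger roughly doubles the distance covered.

```python
import bisect

M = 7                                        # ID space [0, 2**7) = [0, 128), as in the figures
ids = sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])

def find_successor(k):
    """First node ID equal to or following k, wrapping around the ring."""
    return ids[bisect.bisect_left(ids, k) % len(ids)]

def build_fingers(n):
    """finger[i] = successor(n + 2**i) for i = 0..M-1."""
    return [find_successor((n + 2 ** i) % 2 ** M) for i in range(M)]

print(build_fingers(80))   # N80's fingers reach 1, 2, 4, ..., 64 slots ahead (up to half the ring)
```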

Lookups take O(log N) hops

[Figure: Lookup(K19), issued at some node, hops along fingers that halve the remaining distance each step until it reaches N20 = successor(K19).]
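A runnable sketch of the routing rule on the same toy ring (this simulates Chord's closest-preceding-finger step centrally, so it is an illustration of the hop count rather than the full protocol):

```python
import bisect

M = 7
ids = sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])

def find_successor(k):
    return ids[bisect.bisect_left(ids, k) % len(ids)]

def succ_of(n):
    return find_successor((n + 1) % 2 ** M)      # the node right after n

fingers = {n: [find_successor((n + 2 ** i) % 2 ** M) for i in range(M)] for n in ids}

def in_open(x, a, b):                            # x in circular (a, b)
    return (a < x < b) if a < b else (x > a or x < b)

def in_half_open(x, a, b):                       # x in circular (a, b]
    return (a < x <= b) if a < b else (x > a or x <= b)

def chord_lookup(n, key):
    """Route from node n toward successor(key), halving the distance per hop."""
    hops = 0
    while not in_half_open(key, n, succ_of(n)):
        # jump to the farthest finger that still precedes the key
        nxt = next((f for f in reversed(fingers[n]) if in_open(f, n, key)),
                   succ_of(n))
        n, hops = nxt, hops + 1
    return succ_of(n), hops

print(chord_lookup(32, 19))   # -> (20, 3): K19 is at N20, reached in a few hops
```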

Summary of Performance Characteristics

  Efficient: O(log N) messages per lookup
  Scalable: O(log N) state per node
  Robust: survives massive membership changes

Joining the Ring: a three-step process

  1. Initialize all fingers of the new node
  2. Update fingers of existing nodes
  3. Transfer keys from the successor to the new node

Two invariants to maintain to ensure correctness:
  Each node's successor list is maintained
  successor(k) is responsible for key k

Join: Initialize New Node's Finger Table

  Locate any node p already in the ring
  Ask node p to look up the fingers of the new node

[Figure: joining node N36 performs step 1, Lookup(37, 38, 40, …, 100, 164), via an existing node to fill its finger table.]

Join: Update Fingers of Existing Nodes

  The new node calls an update function on existing nodes
  Existing nodes recursively update the fingers of other nodes

[Figure: after N36 joins, existing nodes adjust the fingers that should now point at N36.]

Join: Transfer Keys

Only keys in the new node's range are transferred.

[Figure: N36 joins; keys 21..36 (e.g. K30) are copied from N40 to N36, while the others (e.g. K38) stay at N40.]
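A small runnable rendering of this step (the store contents are invented; in_range is the usual circular interval test): the new node takes over exactly the keys in (predecessor, new_id] from its successor.

```python
def in_range(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def transfer_keys(successor_store, pred_id, new_id):
    moved = {k: v for k, v in successor_store.items() if in_range(k, pred_id, new_id)}
    for k in moved:
        del successor_store[k]            # the other keys stay at the successor
    return moved

n40_store = {30: "value-a", 38: "value-b"}
n36_store = transfer_keys(n40_store, pred_id=20, new_id=36)
print(n36_store, n40_store)               # K30 moves to N36; K38 stays at N40
```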

Handling Failures

Problem: failures could cause incorrect lookups.

[Figure: Lookup(90) reaches N80, whose successor N85 has failed; without extra state the lookup cannot proceed past the gap (ring shown: N10, N80, N85, N102, N113, N120).]

Solution (fallback): keep track of the successor's successor, i.e., keep a list of r successors.

Choosing Successor List Length

r = length of the successor list; N = number of nodes in the system.

Assume 50% of the nodes fail, independently:

  P(the successor list of a specific node is all dead) = (1/2)^r
    i.e., the probability that this node breaks the ring; this depends on the independent-failure assumption

  P(no node breaks the ring) = (1 - (1/2)^r)^N

Choosing r = 2 log2(N) makes this probability about 1 - 1/N, since (1/2)^r then equals 1/N^2 and (1 - 1/N^2)^N ≈ 1 - 1/N.
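A quick numerical check of this claim (plain Python; the node counts are arbitrary):

```python
import math

# With r = 2*log2(N) successors and half the nodes failed,
# the ring almost surely stays connected.
for N in (100, 1_000, 10_000):
    r = math.ceil(2 * math.log2(N))
    p_node_breaks = 0.5 ** r                       # all r successors dead
    p_ring_ok = (1 - p_node_breaks) ** N           # no node breaks the ring
    print(f"N={N:>6} r={r:>2} P(ring survives)={p_ring_ok:.4f}"
          f" (~ 1 - 1/N = {1 - 1/N:.4f})")
```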

DHT – Summary so far

Mechanism:
  Nodes get an identity by hashing their IP address; keys are hashed into the same space.
  A key with (hashed) id k is assigned to the first node whose hashed id is equal to or follows k in the circular space: successor(k).

Properties:
  Incremental scalability, good load balancing
  Efficient: O(log N) messages per lookup
  Scalable: O(log N) state per node
  Robust: survives massive membership changes

Some experimental results


Chord Lookup Cost Is O(log N)

[Figure: average messages per lookup vs. number of nodes; the curve grows logarithmically, with a constant of about 1/2, i.e., roughly (1/2) log2 N messages.]

Failure Experimental Setup

  Start 1,000 CFS/Chord servers; each successor list has 20 entries
  Wait until they stabilize
  Insert 1,000 key/value pairs, with five replicas of each
  Stop X% of the servers
  Immediately perform 1,000 lookups

DHash Replicates Blocks at r Successors

[Figure: a ring of ten nodes (N5, N10, N20, N40, N50, N60, N68, N80, N99, N110); Block 17 is stored at one node (N68 in the figure) and replicated at its r successors.]

• Replicas are easy to find if the successor fails
• Hashed node IDs ensure independent failure
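A sketch of the placement rule on the toy sorted-ID ring (node IDs from the figure; the block ID below is illustrative, and DHash's real block layout has more detail): a block lives at its successor and at the r-1 nodes after it.

```python
import bisect

ids = sorted([5, 10, 20, 40, 50, 60, 68, 80, 99, 110])

def replica_set(block_id, r=3):
    """The block's successor plus the next r-1 nodes along the ring."""
    start = bisect.bisect_left(ids, block_id)
    return [ids[(start + j) % len(ids)] for j in range(r)]

print(replica_set(61))   # -> [68, 80, 99]: if N68 fails, N80 already has a copy
```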

Massive Failures Have Little Impact

[Figure: failed lookups (percent, 0–1.4) vs. failed nodes (percent, 5–50); even with 50% of the nodes failed, only about 1.5% of lookups fail. (1/2)^6 is 1.6%.]

Applications


An Example Application: The CD Database

[Figure: a client computes a disc fingerprint and asks the service "Recognize fingerprint?"; the service returns the album & track titles.]

An Example Application: The CD Database

[Figure: if the service replies "No such fingerprint", the user types in the album and track titles, which are then submitted to the service.]

A DHT-Based FreeDB Cache

FreeDB is a volunteer service:
  It has suffered outages as long as 48 hours
  Service costs are borne largely by volunteer mirrors

Idea: build a cache of FreeDB with a DHT, adding to the availability of the main service.
Goal: explore how easy this is to do.

Cache Illustration

[Figure: disc-fingerprint lookups go to the DHT cache; on a hit the disc info comes straight from the DHT, and new albums are inserted into it.]

Trackerless BitTorrent

First, the tracker-based flow: a client wants to download a file.
  It contacts the tracker identified in the .torrent file (using HTTP).
  The tracker sends the client a (random) list of peers who have, or are downloading, the file.
  The client contacts peers on the list to see which segments of the file they have.
  The client requests segments from those peers.
  The client reports to the other peers it knows about that it now has a segment.
  Other peers start to contact the client to get that segment (while the client is fetching other segments).
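In the trackerless variant, a DHT stands in for the tracker: peers register under the torrent's infohash and look each other up there. A toy sketch, assuming the put/lookup interface from earlier (real trackerless BitTorrent uses a Kademlia-style DHT; the names below are mine):

```python
class ToyDHT:                                   # dict-backed stand-in for the routed DHT
    def __init__(self):
        self.data = {}
    def put(self, key, value):
        self.data[key] = value
    def lookup(self, key):
        return self.data.get(key)

def announce(dht, infohash, my_addr):
    """Register as a member of the torrent's swarm under its infohash."""
    peers = dht.lookup(infohash) or []
    dht.put(infohash, peers + [my_addr])

def find_peers(dht, infohash):
    return dht.lookup(infohash) or []           # a real client samples a subset

dht = ToyDHT()
announce(dht, "infohash-abc", "198.51.100.7:6881")
announce(dht, "infohash-abc", "203.0.113.9:6881")
print(find_peers(dht, "infohash-abc"))          # both peers found, no tracker needed
```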

Next

A distributed system is: a collection of independent computers that appears to its users as a single coherent system.

Components need to communicate and cooperate => support needed for:
  Naming: enables some resource sharing
  Synchronization