1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems...

54
1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay CMPT 771/471: Overlay Networks and P2P Systems Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda Instructor: Dr. Mohamed Hefeeda

Transcript of 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems...

Page 1: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

1

School of Computing Science

Simon Fraser University

CMPT 771/471: Overlay Networks CMPT 771/471: Overlay Networks and P2P Systemsand P2P Systems

Instructor: Dr. Mohamed HefeedaInstructor: Dr. Mohamed Hefeeda

Page 2: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

2

P2P Computing: Definitions

Peers cooperate to achieve desired functions- Peers:

• End-systems (typically, user machines)

• Interconnected through an overlay network

• Peer ≡ Like the others (similar or behave in similar manner)

- Cooperate: • Share resources, e.g., data, CPU cycles, storage, bandwidth

• Participate in protocols, e.g., routing, replication, …

- Functions: • File-sharing, distributed computing, communications,

content distribution, streaming, …

Note: the P2P concept is much wider than file sharing

Page 3: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

3

When Did P2P Start?

Napster (Late 1990’s)- Court shut Napster down in 2001

Gnutella (2000) Then the killer FastTrack (Kazaa, ...) Now BitTorrent, and many others Accompanied by significant research interest Claim

- P2P is much older than Napster!

Proof- The original Internet!

- Remember UUCP (unix-to-unix copy)?

Page 4: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

4

What IS and IS NOT New in P2P?

What is not new- Concepts!

What is new- The term P2P (may be!)

- New characteristics of • Nodes which constitute the

• System that we build

Page 5: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

5

What IS NOT New in P2P?

Distributed architectures Distributed resource sharing Node management (join/leave/fail) Group communications Distributed state management ….

Page 6: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

6

What IS New in P2P?

Nodes (Peers)- Quite heterogeneous

• Several order of magnitudes difference in resources

• Compare bandwidth of dial-up peer vs high-speed LAN peer

- Unreliable• Failure is the norm!

- Offer limited capacity• Load sharing and balancing are critical

- Autonomous• Rational, i.e., maximize their own benefits!

• Motivations should be provided to peers to cooperate in a way that optimizes the system performance

Page 7: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

7

What IS New in P2P? (cont’d)

System - Scale

• Numerous number of peers (millions)

- Structure and topology• Ad-hoc: No control over peer joining/leaving

• Highly dynamic

- Membership/participation• Typically open

- More security concerns• Trust, privacy, data integrity, …

- Cost of building and running• Small fraction of same-scale centralized systems

• How much would it cost to build/run a super computer with processing power of that 3 Million SETI@Home PCs?

Page 8: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

8

What IS New in P2P? (cont’d)

So what? We need to design new lighter-weight

algorithms and protocols to scale to millions (or billions!) of nodes given the new characteristics

Question: why now, not two decades ago?- We did not have such abundant (and

underutilized) computing resources back then!

- And, network connectivity was very limited

Page 9: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

9

Why is it Important to Study P2P?

P2P traffic is a major portion of Internet traffic (50+%), current killer app

P2P traffic has exceeded web traffic (former killer app)!

Direct implications on the design, administration, and use of computer networks and network resources

- Think of ISP designers or campus network administrators

Many potential distributed applications

Page 10: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

10

Sample P2P Applications

File sharing- BitTorrent, Overnet, eDonkey, Gnutella,, …

Distributed cycle sharing- SETI@home, Gnome@home, …

File and storage systems- OceanStore, CFS, Freenet, Farsite, …

Media streaming and content distribution- SopCast, CoolStreaming, …

- SplitStream, CoopNet, PeerCast, Bullet, Zigzag, NICE, …

Page 11: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

11

P2P vs. its Cousin (Grid Computing)

Common Goal:- Aggregate resources (e.g., storage, CPU

cycles, and data) into common pool and provide efficient access to them

Differences along five axes- Target communities and applications

- Type of shared resources

- Scalability of the system

- Services provided

- Software required

Page 12: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

12

P2P vs Grid Computing (cont’d)

Issue Grid P2P

Communities and Applications

Established communities, e.g., scientific institutions Computationally-intensive problems

Grass-root communities (anonymous) Mostly, file-swapping

Resources Shared

Powerful and Reliable machines, clusters High-speed connectivity Specialized instruments

PCs with limited capacity and connectivity Unreliable Very diverse

Page 13: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

13

P2P vs Grid Computing (cont’d)

Issue Grid P2P

System Scalability

Hundreds to thousands of nodes

Hundreds of thousands to Millions of nodes

Services Provided

Sophisticated services: authentication, resource discovery, scheduling, access control, and membership control Members usually trust each others

Limited services: resource discovery limited trust among peers

Software required

Sophisticated suit: e.g., Globus, Condor

Simple:, e.g., BitTorrent, SETI@Home (screen saver)

Page 14: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

14

P2P vs Grid Computing: Discussion

The differences mentioned are based on the traditional view of each paradigm

- It is conceived that both paradigms will converge and will complement each other

Target communities and applications- Grid: is going open

Type of shared resources- P2P: is to include various and more powerful resources

Scalability of the system- Grid: is to increase number of nodes

Services provided- P2P: is to provide authentication, data integrity, trust

management, …

Page 15: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

15

P2P Systems: Simple Model

P2P Substrate

Operating System

Hardware

Middleware

P2P Application

Software architecture model on a peer

System architecture: Peers form an overlay according to the P2P

Substrate

Page 16: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

16

Overlay Network

An abstract layer built on top of physical network

Neighbors in overlay can be several hops away in physical network

Page 17: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

17

Overlay Network (cont’d)

Page 18: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

18

Overlay Network (cont’d)

Why do we need overlays? Flexibility in

- Choosing neighbors

- Forming and customizing topology to fit application’s needs (e.g., short delay, reliability, high BW, …)

- Designing communication protocols among nodes

Get around limitations in legacy networks Enable new (and old!) network services

Page 19: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

19

Overlay Network (cont’d)

Overlay design issues- Select neighbors

- Handle node arrivals, departures

- Detect and handle failures (nodes, links)

- Monitor and adapt to network dynamics

- Match with the underlying physical network

Page 20: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

20

Overlay Network (cont’d)

Some applications that use overlays- Application level multicast, e.g., ESM, Zigzag, NICE, …

• Build multicast tree(s) or mesh(es) in the application (not network) layer

- Reliable inter-domain routing, e.g., RON• Improves BGP by finding robust routes faster

- Content Distribution Networks (CDN)• To distribute bandwidth intensive content (software

updates,…)

- Peer-to-peer file sharing • File exchange among peers

- P2P streaming• Real time streaming

Page 21: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

21

Overlay Network (cont’d)

Example application … Application Level Multicast (ALM)

Let us first see IP Multicast

Page 22: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

22

Overlay Network (cont’d) Recall: IP Multicast

source

Page 23: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

23

Overlay Network (cont’d)

IP Multicast

- Most efficient (packets traverse each link only once)

What is wrong with IP Multicast?

- Not enabled in many routers- Not scalable (core routers need to maintain state for

multicast sessions)

Now let us see ALM …

Page 24: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

24

Overlay Network (cont’d)Application Level Multicast (ALM)

source

Page 25: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

25

Overlay Network (cont’d)

Several algorithms have been proposed to improve the efficiency of ALM

- Get it as close as possible to IP Multicast

- See ESM, NICE, Zigzag papers

Page 26: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

26

Peer Software Model

A software client installed on each peer

Three components:

- P2P Substrate

- Middleware

- P2P Application

P2P Substrate

Operating System

Hardware

Middleware

P2P Application

Software model on peer

Page 27: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

27

Peer Software Model (cont’d)

P2P Substrate (key component)- Overlay management

• Construction

• Maintenance (peer join/leave/fail and network dynamics)

- Resource management• Allocation (storage)

• Discovery (routing and lookup)

Ex: Pastry, CAN, Chord, …

More on this later

Page 28: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

28

Peer Software Model (cont’d)

Middleware- Provides auxiliary services to P2P applications:

• Peer selection

• Trust management

• Data integrity validation

• Authentication and authorization

• Membership management

• Accounting (Economics and rationality)

• …

- Ex: CollectCast, EigenTrust, Micro payment

Page 29: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

29

Peer Software Model (cont’d)

P2P Application- Potentially, there could be multiple applications

running on top of a single P2P substrate

- Applications include• File sharing

• File and storage systems

• Distributed cycle sharing

• Content distribution

- This layer provides some functions and bookkeeping relevant to target application

• File assembly (file sharing)

• Buffering and rate smoothing (streaming)

Ex: Promise, Bullet, CFS

Page 30: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

30

P2P Substrate

Key component, which - Manages the Overlay

- Allocates and discovers objects

P2P Substrates can be - Structured

- Unstructured

- Based on the flexibility of placing objects at peers

Page 31: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

31

P2P Substrates: Classification

Structured (or tightly controlled, DHT) − Objects are rigidly assigned to specific peers

− Looks like as a Distributed Hash Table (DHT)

− Efficient search & guarantee of finding

− Lack of partial name and keyword queries

− Maintenance overhead

− Ex: Chord, CAN, Pastry, Tapestry, Kademila (Overnet)

Page 32: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

32

P2P Substrates: Classification

Unstructured (or loosely controlled)− Objects can be anywhere

− Support partial name and keyword queries

− Inefficient search & no guarantee of finding

− Some heuristics exist to enhance performance

− Ex: Gnutella, Kazaa (super node), GIA [Chawathe et al. 03]

Page 33: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

33

Structured P2P Substrates

Objects are rigidly assigned to peers− Objects and peers have IDs (usually by

hashing some attributes)

− Objects are assigned to peers based on IDs

Peers in overlay form specific geometrical shape, e.g.,

- tree, ring, hypercube, butterfly network

Shape (to some extent) determines − How neighbors are chosen, and

− How messages are routed

Page 34: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

34

Structured P2P Substrates (cont’d)

Substrate provides a Distributed Hash Table (DHT)-like interface

− InsertObject (key, value), findObject (key), …

− In the literature, many authors refer to structured P2P substrates as DHTs

It also provides peer management (join, leave, fail) operations

Most of these operations are done in O(log n) steps, n is number of peers

Page 35: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

35

Structured P2P Substrates (cont’d)

DHTs: Efficient search & guarantee of finding

However, − Lack of partial name and keyword queries

− Maintenance overhead, even O(log n) may be too much in very dynamic environments

Ex: Chord, CAN, Pastry, Tapestry, Kademila (Overnet)

Page 36: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

36

Example: Content Addressable Network (CAN) [Ratnasamy 01]

− Nodes form an overlay in d-dimensional space− Node IDs are chosen randomly from the d-space

− Object IDs (keys) are chosen from the same d-space

− Space is dynamically partitioned into zones

− Each node owns a zone

− Zones are split and merged as nodes join and leave

− Each node stores− Portion of the hash table that belongs to its zone

− Information about its immediate neighbors in the d-space

Page 37: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

37

2-d CAN: Dynamic Space Division

n1

n2

n3

n4

0

0

7

7

n5

Page 38: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

38

2-d CAN: Key Assignment

n1

n2

n3

n4

0

0

7

7

K1K2

K3

K4

n5

Page 39: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

39

2-d CAN: Routing (Lookup)

n1

n2

n3

n4

0

0

7

7

K1K2

K3

K4

n5

K4?

K4?

Page 40: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

40

CAN: Routing

− Nodes keep 2d = O(d) state information (neighbor coordinates, IPs)

− Constant, does not depend on number of nodes n

− Greedy routing - Route to the node that is closest to the

destination

- On average, is done in O(n1/d) = O(log n) when d = log n /2

Page 41: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

41

CAN: Node Join

− New node finds a node already in the CAN− (bootstrap: one (or a few) dedicated nodes outside the

CAN maintain a partial list of active nodes)

− It finds a node whose zone will be split− Choose a random point P (will be its ID)

− Forward a JOIN request to P through the existing node

− The node that owns P splits its zone and sends half of its routing table to the new node

− Neighbors of the split zone are notified

Page 42: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

42

CAN: Node Leave, Fail

− Graceful departure− The leaving node hands over its zone to one of its

neighbors − Failure

− Detected by the absence of heart beat messages sent periodically in regular operation

− Neighbors initiate takeover timers, proportional to the volume of their zones

− Neighbor with smallest timer takes over zone of dead node

− notifies other neighbors so they cancel their timers (some negotiation between neighbors may occur)

− Note: the (key, value) entries stored at the failed node are lost

− Nodes that insert (key, value) pairs periodically refresh (or re-insert) them

Page 43: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

43

CAN: Discussion

− Scalable − O(log n) steps for operations − State information is O(d) at each node

− Locality − Nodes are neighbors in the overlay, not in the physical

network− Suggestion (for better routing)

− Each node measures RTT between itself and its neighbors− Forwards the request to the neighbor with maximum ratio

of progress to RTT

Page 44: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

44

CAN: Discussion

− What is wrong with CAN (and DHTs in general)?

− Maintenance cost− Although logarithmic in number of nodes, still too much

for very dynamic P2P systems

− Peers are joining and leaving all the time

Page 45: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

45

Unstructured P2P Substrates

− Objects can be anywhere Loosely-controlled overlays

− The loose control− Makes overlay tolerate transient behavior of nodes

− For example, when a peer leaves, nothing needs to be done because there is no structure to restore

− Enables system to support flexible search queries− Queries are sent in plain text and every node runs a mini-

database engine

− But, we loose on searching− Usually using flooding, inefficient− Some heuristics exist to enhance performance− No guarantee on locating a requested object (e.g., rarely

requested objects)− Ex: Gnutella, Kazaa (super node), GIA [Chawathe et al.

03]

Page 46: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

46

Example: Gnutella

− Peers are called servents− All peers form an unstructured overlay− Peer join

− Find an active peer already in Gnutella (e.g., contact known Gnutella hosts)

− Send a Ping message through the active peer− Peers willing to accept new neighbors reply with Pong

− Peer leave, fail− Just drop out of the network!

− To search for a file− Send a Query message to all neighbors with a TTL (=7)− Upon receiving a Query message

− Check local database and reply with a QueryHit to requester

− Decrement TTL and forward to all neighbors if nonzero

Page 47: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

Flooding in Gnutella

Scalability Problem

47

Page 48: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

48

Heuristics for Searching [Yang and Garcia-Molina 02]

− Iterative deepening− Multiple BFS with increasing TTLs

− Reduce traffic but increase response time

− Directed BFS− Send to “good” neighbors (subset of your neighbors

that returned many results in the past) need to keep history

− Local Indices− Keep a small index over files stored on neighbors

(within number of hops)

− May answer queries on behalf of them

− Save cost of sending queries over the network

− Index currency?

Page 49: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

49

Heuristics for Searching: Super Node

− Used in recent Gnutella-like networks

− Relatively powerful nodes play special role− maintain indexes over other peers

Super Node (SN)

Ordinary Node (ON)

Page 50: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

50

Super Node Systems

− File search− ON sends a query to its SN

− SN replies with a list of IPs of ONs that have the file

− SN may forward the query to other SNs

− Parallel downloads take place between ONs

Page 51: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

51

Super Node Systems

− Two types of traffic− Signaling

− Handshaking, connection establishment, uploading metadata, …

− Over TCP connections between SN—SN and SN—ON

− Content traffic − Files exchanged

− Mostly through HTTP between ON—ON

Page 52: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

52

Lessons from Deployed P2P Systems

− Distributed design− Exploit heterogeneity− Load balancing − Locality in neighbor selection− Connection Shuffling

− If a peer searches for a file and does not find it, it may try later and gets it!

− Efficient gossiping algorithms− To learn about other SNs and perform shuffling

− Consider peers behind NATs and Firewalls− They are everywhere!

Page 53: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

53

P2P Systems: Summary

P2P is an active research area with many potential applications in industry and academia

In P2P computing paradigm:- Peers cooperate to achieve desired functions

New characteristics- heterogeneity, unreliability, rationality, scale, ad hoc

new and lighter-weight algorithms are needed

Simple model for P2P systems:- Peers form an abstract layer called overlay

- A peer software client may have three components• P2P substrate, middleware, and P2P application

• Borders between components may be blurred

Page 54: 1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

54

Summary (cont’d)

P2P substrate: A key component, which - Manages the Overlay

- Allocates and discovers objects

P2P Substrates can be - Structured (DHT)

• Example: CAN

- Unstructured• Example: Gnutella