Peer-to-Peer (P2P) networks and applications. What is P2P? r “the sharing of computer resources...

Peer-to-Peer (P2P) networks and applications

What is P2P?

“the sharing of computer resources and services by direct exchange of information”

What is P2P?

“P2P is a class of applications that take advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable and unpredictable IP addresses P2P nodes must operate outside the DNS system and have significant, or total autonomy from central servers”

What is P2P?

“A distributed network architecture may be called a P2P network if the participants share a part of their own resources. These shared resources are necessary to provide the service offered by the network. The participants of such a network are both resource providers and resource consumers”

What is P2P?

Various definitions seem to agree on sharing of resources direct communication between equals

(peers) no centralized control

What is a peer?

“…an entity with capabilities similar to other entities in the system.”

Client/Server Architecture

Well known, powerful, reliable server is a data source

Clients request data from server

Very successful model WWW (HTTP), FTP,

Web services, etc.

Server

Client

Client Client

Client

Internet

* Figure from http://project-iris.net/talks/dht-toronto-03.ppt

Client/Server Limitations

Scalability is hard to achieve Presents a single point of failure Requires administration Unused resources at the network edge

P2P systems try to address these limitations

P2P Architecture

All nodes are both clients and servers Provide and consume

data Any node can initiate

a connection

No centralized data source “The ultimate form of

democracy on the Internet”

“The ultimate threat to copy-right protection on the Internet”

* Content from http://project-iris.net/talks/dht-toronto-03.ppt

Node

Node

Node Node

Node

Internet

P2P Network Characteristics

Clients are also servers and routers Nodes contribute content, storage, memory, CPU

Nodes are autonomous (no administrative authority)

Network is dynamic: nodes enter and leave the network “frequently”

Nodes collaborate directly with each other (not through well-known servers)

Nodes have widely varying capabilities

P2P Goals and Benefits

Efficient use of resources Unused bandwidth, storage, processing power at the “edge of the network”

Scalability No central information, communication and computation bottleneck Aggregate resources grow naturally with utilization

Reliability Replicas Geographic distribution No single point of failure

Ease of administration Nodes self-organize Built-in fault tolerance, replication, and load balancing Increased autonomy

Anonymity – Privacy not easy in a centralized system

Dynamism highly dynamic environment ad-hoc communication and collaboration

What is Peer-to-Peer (P2P)?

P2P Applications

File sharing (Napster, Gnutella, Kazaa)

Multiplayer games (Unreal Tournament, DOOM)

Collaborative applications (ICQ, shared whiteboard)

Distributed computation (Seti@home)

Ad-hoc networks

P2P System Taxonomy

Historic Data-centric Computation-centric User-centric Network-centric Platforms

Computation-centric

User-centric

sendMessage receiveMessage sendMessage receiveMessage

User-centric (Common implementation)

sendMessage receiveMessage sendMessage receiveMessage

Network-centric

Platforms

Find Peers … Send Messages

Gnutella Instant Messaging

P2P Goals/Benefits

Cost sharing Resource aggregation Improved scalability/reliability Increased autonomy Anonymity/privacy Dynamism Ad-hoc communication

P2P Challenges

Decentralization Scalability and Performance Anonymity Fairness Dynamism Security Transparency Fault Resilience and Robustness

Peer-to-Peer Content Sharing

Popular file sharing P2P Systems

Napster, Gnutella, Kazaa, Freenet

Large scale sharing of files. User A makes files (music, video, etc.) on

their computer available to others User B connects to the network, searches

for files and downloads files directly from user A

Issues of copyright infringement

Research Areas

Peer discovery and group management Data placement and searching Reliable and efficient file exchange Security/privacy/anonymity/trust

Design Concerns Group Management

Per-node state Load balancing Fault tolerance/resiliency

Search Bandwidth usage Time to locate item Success rate Fault tolerance/resiliency

Approaches

Centralized Unstructured Structured (Distributed Hash Tables)

Centralized index

original “Napster” design

1) when peer connects, it informs central server: IP address content

2) Alice queries for “Hey Jude”

3) Alice requests file from Bob

centralizeddirectory server

peers

Alice

Bob

1

1

1

12

3

Centralized modelBob Alice

JaneJudy

file transfer is decentralized, but locating content is highly centralized

Centralized Benefits:

Low per-node state Limited bandwidth usage Short location time High success rate Fault tolerant

Drawbacks: Single point of failure Limited scale Possibly unbalanced load

copyright infringement

Bob Alice

JaneJudy

Napster

program for sharing files over the Internet a “disruptive” application/technology? history:

5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service

12/99: first lawsuit 3/00: 25% UWisc traffic Napster 2000: est. 60M users 2/01: US Circuit Court of

Appeals: Napster knew users violating copyright laws

7/01: # simultaneous online users:Napster 160K, Gnutella: 40K, Morpheus: 300K

Napster: how does it work

Application-level, client-server protocol over point-to-point TCP

Four steps: Connect to Napster server Upload your list of files (push) to server. Give server keywords to search the full list with. Select “best” of correct answers. (pings)

Napster

napster.com

users

File list is uploaded

1.

Napster

napster.com

user

Requestand

results

User requests search at server.

2.

Napster

napster.com

user

pingspings

User pings hosts that apparently have data.

Looks for best transfer rate.

3.

Napster

napster.com

user

Retrievesfile

User retrieves file

4.

Napster

Central Napster server Can ensure correct results Fast search

Bottleneck for scalability Single point of failure Susceptible to denial of service

• Malicious users• Lawsuits, legislation

Hybrid P2P system – “all peers are equal but some are more equal than others” Search is centralized File transfer is direct (peer-to-peer)

Unstructured

fully distributed no central server

used by Gnutella Each peer indexes

the files it makes available for sharing (and no other files)

overlay network: graph edge between peer X

and Y if there’s a TCP connection

all active peers and edges form overlay net

edge: virtual (not physical) link

given peer typically connected with < 10 overlay neighbors

Gnutella: Query flooding

Query

QueryHit

Query

Query

QueryHit

Query

Query

QueryHit

File transfer:HTTP

Query messagesent over existing TCPconnections peers forwardQuery message QueryHit sent over reversepath

Gnutella: Peer joining

1. joining peer Alice must find another peer in Gnutella network: use list of candidate peers

2. Alice sequentially attempts TCP connections with candidate peers until connection setup with Bob

3. Flooding: Alice sends Ping message to Bob; Bob forwards Ping message to his overlay neighbors (who then forward to their neighbors….) peers receiving Ping message respond to

Alice with Pong message4. Alice receives many Pong messages, and can

then setup additional TCP connections

Unstructured

Bob

Alice

Jane

Judy

Carl

Scalability:limited scopeflooding

Unstructured

Gnutella model Benefits:

Limited per-node state Fault tolerant

Drawbacks: High bandwidth usage Long time to locate item No guarantee on success

rate Possibly unbalanced load

Bob

Alice

Jane

Judy

Carl

Gnutella

Searching by flooding: If you don’t have the file

you want, query 7 of your neighbors.

If they don’t have it, they contact 7 of their neighbors, for a maximum hop count of 10.

Requests are flooded, but there is no tree structure.

No looping but packets may be received twice.

Reverse path forwarding

* Figure from http://computer.howstuffworks.com/file-sharing.htm

Gnutella

fool.* ?

TTL = 2

Gnutella

TTL = 1

TTL = 1

IPX:fool.her

fool.herX

TTL = 1

Gnutella

fool.you

fool.meY

IPY:fool.me fool.you

Gnutella

IPY:fool.me fool.you

Gnutella: strengths and weaknesses pros:

flexibility in query processing complete decentralization simplicity fault tolerance/self-organization

cons: severe scalability problems susceptible to attacks

Pure P2P system

Gnutella: initial problems and fixes 2000: avg size of reachable network only 400-

800 hosts. Why so small? modem users: not enough bandwidth to provide

search routing capabilities: routing black holes

Fix: create peer hierarchy based on capabilities previously: all peers identical, most modem black

holes preferential connection:

• favors routing to well-connected peers• favors reply to clients that themselves serve large

number of files: prevent freeloading

Structured

FreeNet, Chord, CAN, Tapestry, Pastry model

001 012

212

305

332

212 ?

212 ?

Structured

FreeNet, Chord, CAN, Tapestry, Pastry model

Benefits: Manageable per-node state Manageable bandwidth

usage and time to locate item

Guaranteed success

Drawbacks: Possibly unbalanced load Harder to support fault

tolerance

001 012

212

305

332

212 ?

212 ?

Unstructured vs Structured P2P The systems we described do not offer any

guarantees about their performance (or even correctness)

Structured P2P Scalable guarantees on numbers of hops to

answer a query Maintain all other P2P properties (load balance,

self-organization, dynamic nature)

Approach: Distributed Hash Tables (DHT)

Distributed Hash Tables (DHT)

Distributed version of a hash table data structure Stores (key, value) pairs

The key is like a filename The value can be file contents, or pointer to location

Goal: Efficiently insert/lookup/delete (key, value) pairs

Each peer stores a subset of (key, value) pairs in the system

Core operation: Find node responsible for a key Map key to node Efficiently route insert/lookup/delete request to this node

Allow for frequent node arrivals/departures

DHT Desirable Properties

Keys should mapped evenly to all nodes in the network (load balance)

Each node should maintain information about only a few other nodes (scalability, low update cost)

Messages should be routed to a node efficiently (small number of hops)

Node arrival/departures should only affect a few nodes

DHT Routing Protocols

DHT is a generic interface

There are several implementations of this interface Chord [MIT] Pastry [Microsoft Research UK, Rice University] Tapestry [UC Berkeley] Content Addressable Network (CAN) [UC Berkeley]

SkipNet [Microsoft Research US, Univ. of Washington] Kademlia [New York University] Viceroy [Israel, UC Berkeley] P-Grid [EPFL Switzerland] Freenet [Ian Clarke]

Basic Approach

In all approaches: keys are associated with globally unique IDs

integers of size m (for large m) key ID space (search space) is uniformly

populated - mapping of keys to IDs using (consistent) hashing

a node is responsible for indexing all the keys in a certain subspace (zone) of the ID space

nodes have only partial knowledge of other node’s responsibilities

Improvements: SuperPeers KaZaA model Hybrid centralized and unstructured Advantages and disadvantages?

Hierarchical Overlay

between centralized index, query flooding approaches

each peer is either a super node or assigned to a super node TCP connection between

peer and its super node. TCP connections between

some pairs of super nodes.

Super node tracks content in its children

ordinary peer

group-leader peer

neighoring re la tionshipsin overlay network

Kazaa (Fasttrack network)

Hybrid of centralized Napster and decentralized Gnutella hybrid P2P system

Super-peers act as local search hubs Each super-peer is similar to a Napster server for a small

portion of the network Super-peers are automatically chosen by the system based

on their capacities (storage, bandwidth, etc.) and availability (connection time)

Users upload their list of files to a super-peer Super-peers periodically exchange file lists You send queries to a super-peer for files of interest

File Distribution: Server-Client vs P2PQuestion : How much time to distribute file

from one server to N peers?

us

u2d1 d2u1

uN

dN

Server

Network (with abundant bandwidth)

File, size F

us: server upload bandwidth

ui: peer i upload bandwidth

di: peer i download bandwidth

File distribution time: server-client

us

u2d1 d2u1

uN

dN

Server


F server

sequentially sends N copies: NF/us time

client i takes F/di

time to download

increases linearly in N(for large N)

= dcs = max { NF/us, F/min(di) }i

Time to distribute F to N clients using

client/server approach

File distribution time: P2P

us

u2d1 d2u1

uN

dN

Server


F server must send one

copy: F/us time

client i takes F/di time to download

NF bits must be downloaded (aggregate) fastest possible upload rate: us + ui

dP2P = max { F/us, F/min(di) , NF/(us + ui) }i

0

0.5

1

1.5

2

2.5

3

3.5

0 5 10 15 20 25 30 35

N

Min

imu

m D

istr

ibut

ion

Tim

e P2P

Client-Server

Server-client vs. P2P: example

Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us

File distribution: BitTorrent

tracker: tracks peers participating in torrent

torrent: group of peers exchanging chunks of a file

obtain listof peers

trading chunks

peer

P2P file distribution

BitTorrent (1)

file divided into 256KB chunks. peer joining torrent:

has no chunks, but will accumulate them over time

registers with tracker to get list of peers, connects to subset of peers (“neighbors”)

while downloading, peer uploads chunks to other peers.

peers may come and go once peer has entire file, it may (selfishly) leave

or (altruistically) remain

BitTorrent (2)

Pulling Chunks at any given time,

different peers have different subsets of file chunks

periodically, a peer (Alice) asks each neighbor for list of chunks that they have.

Alice sends requests for her missing chunks rarest first

Sending Chunks: tit-for-tat Alice sends chunks to four

neighbors currently sending her chunks at the highest rate re-evaluate top 4 every

10 secs every 30 secs: randomly

select another peer, starts sending chunks newly chosen peer may

join top 4 “optimistically unchoke”

BitTorrent: Tit-for-tat(1) Alice “optimistically unchokes” Bob

(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates(3) Bob becomes one of Alice’s top-four providers

With higher upload rate, can find better trading partners & get file faster!

P2P: searching for information

File sharing (eg e-mule)

Index dynamically tracks the locations of files that peers share.

Peers need to tell index what they have.

Peers search index to determine where files can be found

Instant messaging Index maps user

names to locations. When user starts IM

application, it needs to inform index of its location

Peers search index to determine IP address of user.

Index in P2P system: maps information to peer location(location = IP address & port number).

P2P Case study: Skype

inherently P2P: pairs of users communicate.

proprietary application-layer protocol (inferred via reverse engineering)

hierarchical overlay with SNs

Index maps usernames to IP addresses; distributed over SNs

Skype clients (SC)

Supernode (SN)

Skype login server

Anonymity

Napster, Gnutella, Kazaa don’t provide anonymity Users know who they are downloading from Others know who sent a query

Freenet Designed to provide anonymity among

other features

Freenet

Keys are mapped to IDs Each node stores a cache of keys, and a routing

table for some keys routing table defines the overlay network

Queries are routed to the node with the most similar key

Data flows in reverse path of query Impossible to know if a user is initiating or forwarding a

query Impossible to know if a user is consuming or forwarding

data Keys replicated in (some) of the nodes along the path

Freenet routing

K117 ? TTL = 8

N01

N02

N03

N04

N07

N06

N05

N08

K124 N02 K317 N08 K613 N05

Freenet routing

K117 ?

TTL = 7

N01

N02

N03

N04

N07

N06

N05

N08

K124 N02 K317 N08 K613 N05

K514 N01

TTL = 6

sorry...

Freenet routing

K117 ?

TTL = 7

N01

N02

N03

N04

N07

N06

N05

N08

K124 N02 K317 N08 K613 N05

K514 N01

TTL = 6

K100 N07 K222 N06 K617 N05

TTL = 5

Freenet routing

K117 ?

TTL = 7

N01

N02

N03

N04

N07

N06

N05

N08

K124 N02 K317 N08 K613 N05

K514 N01

TTL = 6

K100 N07 K222 N06 K617 N05

TTL = 5 117

117

Freenet routing

K117 ?

TTL = 7

N01

N02

N03

N04

N07

N06

N05

N08

K124 N02 K317 N08 K613 N05

K514 N01

TTL = 6

K117 N07 K100 N07 K222 N06 K617 N05 TTL = 5

117 117

117

Freenet routing

N01

N02

N03

N04

N07

N06

N05

N08

K117 N07 K124 N02 K317 N08 K613 N05

K514 N01

K117 N07 K100 N07 K222 N06 K617 N05

117 117

117

Freenet routing

N01

N02

N03

N04

N07

N06

N05

N08

K117 N08K124 N02K317 N08 K613 N05

K514 N01

K117 N08 K100 N07 K222 N06 K617 N05

117 117

117

Inserts are performed similarly – they are unsuccessful queries

Freenet strengths and weaknesses pros:

complete decentralization fault tolerance/self-organization anonymity scalability (to some degree)

cons: questionable efficiency & performance rare keys disappear from the system improvement of performance requires high overhead

(maintenance of additional information for routing)

P2P Review

Two key functions of P2P systems Sharing content Finding content

Sharing content Direct transfer between peers

• All systems do this Structured vs. unstructured placement of data Automatic replication of data

Finding content Centralized (Napster) Decentralized (Gnutella) Probabilistic guarantees (DHTs)

Issues with P2P

Free Riding (Free Loading) Two types of free riding

• Downloading but not sharing any data• Not sharing any interesting data

On Gnutella• 15% of users contribute 94% of content• 63% of users never responded to a query

– Didn’t have “interesting” data

No ranking: what is a trusted source? “spoofing”

Peer-to-Peer (P2P) networks and applications. What is P2P? r “the sharing of computer resources...

Documents

Transcript of Peer-to-Peer (P2P) networks and applications. What is P2P? r “the sharing of computer resources...