Peer-to-Peer (P2P) networks and applications. What is P2P? r “the sharing of computer resources...
-
Upload
dwight-logan -
Category
Documents
-
view
225 -
download
0
Transcript of Peer-to-Peer (P2P) networks and applications. What is P2P? r “the sharing of computer resources...
Peer-to-Peer (P2P) networks and applications
What is P2P?
“the sharing of computer resources and services by direct exchange of information”
What is P2P?
“P2P is a class of applications that take advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable and unpredictable IP addresses P2P nodes must operate outside the DNS system and have significant, or total autonomy from central servers”
What is P2P?
“A distributed network architecture may be called a P2P network if the participants share a part of their own resources. These shared resources are necessary to provide the service offered by the network. The participants of such a network are both resource providers and resource consumers”
What is P2P?
Various definitions seem to agree on sharing of resources direct communication between equals
(peers) no centralized control
What is a peer?
“…an entity with capabilities similar to other entities in the system.”
Client/Server Architecture
Well known, powerful, reliable server is a data source
Clients request data from server
Very successful model WWW (HTTP), FTP,
Web services, etc.
Server
Client
Client Client
Client
Internet
* Figure from http://project-iris.net/talks/dht-toronto-03.ppt
Client/Server Limitations
Scalability is hard to achieve Presents a single point of failure Requires administration Unused resources at the network edge
P2P systems try to address these limitations
P2P Architecture
All nodes are both clients and servers Provide and consume
data Any node can initiate
a connection
No centralized data source “The ultimate form of
democracy on the Internet”
“The ultimate threat to copy-right protection on the Internet”
* Content from http://project-iris.net/talks/dht-toronto-03.ppt
Node
Node
Node Node
Node
Internet
P2P Network Characteristics
Clients are also servers and routers Nodes contribute content, storage, memory, CPU
Nodes are autonomous (no administrative authority)
Network is dynamic: nodes enter and leave the network “frequently”
Nodes collaborate directly with each other (not through well-known servers)
Nodes have widely varying capabilities
P2P Goals and Benefits
Efficient use of resources Unused bandwidth, storage, processing power at the “edge of the network”
Scalability No central information, communication and computation bottleneck Aggregate resources grow naturally with utilization
Reliability Replicas Geographic distribution No single point of failure
Ease of administration Nodes self-organize Built-in fault tolerance, replication, and load balancing Increased autonomy
Anonymity – Privacy not easy in a centralized system
Dynamism highly dynamic environment ad-hoc communication and collaboration
What is Peer-to-Peer (P2P)?
P2P Applications
File sharing (Napster, Gnutella, Kazaa)
Multiplayer games (Unreal Tournament, DOOM)
Collaborative applications (ICQ, shared whiteboard)
Distributed computation (Seti@home)
Ad-hoc networks
P2P System Taxonomy
Historic Data-centric Computation-centric User-centric Network-centric Platforms
Computation-centric
User-centric
sendMessage receiveMessage sendMessage receiveMessage
User-centric (Common implementation)
sendMessage receiveMessage sendMessage receiveMessage
Network-centric
Network-centric
Platforms
Find Peers … Send Messages
Gnutella Instant Messaging
P2P Goals/Benefits
Cost sharing Resource aggregation Improved scalability/reliability Increased autonomy Anonymity/privacy Dynamism Ad-hoc communication
P2P Challenges
Decentralization Scalability and Performance Anonymity Fairness Dynamism Security Transparency Fault Resilience and Robustness
Peer-to-Peer Content Sharing
Popular file sharing P2P Systems
Napster, Gnutella, Kazaa, Freenet
Large scale sharing of files. User A makes files (music, video, etc.) on
their computer available to others User B connects to the network, searches
for files and downloads files directly from user A
Issues of copyright infringement
Research Areas
Peer discovery and group management Data placement and searching Reliable and efficient file exchange Security/privacy/anonymity/trust
Design Concerns Group Management
Per-node state Load balancing Fault tolerance/resiliency
Search Bandwidth usage Time to locate item Success rate Fault tolerance/resiliency
Approaches
Centralized Unstructured Structured (Distributed Hash Tables)
Centralized index
original “Napster” design
1) when peer connects, it informs central server: IP address content
2) Alice queries for “Hey Jude”
3) Alice requests file from Bob
centralizeddirectory server
peers
Alice
Bob
1
1
1
12
3
Centralized modelBob Alice
JaneJudy
file transfer is decentralized, but locating content is highly centralized
Centralized Benefits:
Low per-node state Limited bandwidth usage Short location time High success rate Fault tolerant
Drawbacks: Single point of failure Limited scale Possibly unbalanced load
copyright infringement
Bob Alice
JaneJudy
Napster
program for sharing files over the Internet a “disruptive” application/technology? history:
5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service
12/99: first lawsuit 3/00: 25% UWisc traffic Napster 2000: est. 60M users 2/01: US Circuit Court of
Appeals: Napster knew users violating copyright laws
7/01: # simultaneous online users:Napster 160K, Gnutella: 40K, Morpheus: 300K
Napster: how does it work
Application-level, client-server protocol over point-to-point TCP
Four steps: Connect to Napster server Upload your list of files (push) to server. Give server keywords to search the full list with. Select “best” of correct answers. (pings)
Napster
napster.com
users
File list is uploaded
1.
Napster
napster.com
user
Requestand
results
User requests search at server.
2.
Napster
napster.com
user
pingspings
User pings hosts that apparently have data.
Looks for best transfer rate.
3.
Napster
napster.com
user
Retrievesfile
User retrieves file
4.
Napster
Central Napster server Can ensure correct results Fast search
Bottleneck for scalability Single point of failure Susceptible to denial of service
• Malicious users• Lawsuits, legislation
Hybrid P2P system – “all peers are equal but some are more equal than others” Search is centralized File transfer is direct (peer-to-peer)
Unstructured
fully distributed no central server
used by Gnutella Each peer indexes
the files it makes available for sharing (and no other files)
overlay network: graph edge between peer X
and Y if there’s a TCP connection
all active peers and edges form overlay net
edge: virtual (not physical) link
given peer typically connected with < 10 overlay neighbors
Gnutella: Query flooding
Query
QueryHit
Query
Query
QueryHit
Query
Query
QueryHit
File transfer:HTTP
Query messagesent over existing TCPconnections peers forwardQuery message QueryHit sent over reversepath
Gnutella: Peer joining
1. joining peer Alice must find another peer in Gnutella network: use list of candidate peers
2. Alice sequentially attempts TCP connections with candidate peers until connection setup with Bob
3. Flooding: Alice sends Ping message to Bob; Bob forwards Ping message to his overlay neighbors (who then forward to their neighbors….) peers receiving Ping message respond to
Alice with Pong message4. Alice receives many Pong messages, and can
then setup additional TCP connections
Unstructured
Bob
Alice
Jane
Judy
Carl
Scalability:limited scopeflooding
Unstructured
Gnutella model Benefits:
Limited per-node state Fault tolerant
Drawbacks: High bandwidth usage Long time to locate item No guarantee on success
rate Possibly unbalanced load
Bob
Alice
Jane
Judy
Carl
Gnutella
Searching by flooding: If you don’t have the file
you want, query 7 of your neighbors.
If they don’t have it, they contact 7 of their neighbors, for a maximum hop count of 10.
Requests are flooded, but there is no tree structure.
No looping but packets may be received twice.
Reverse path forwarding
* Figure from http://computer.howstuffworks.com/file-sharing.htm
Gnutella
fool.* ?
TTL = 2
Gnutella
TTL = 1
TTL = 1
IPX:fool.her
fool.herX
TTL = 1
Gnutella
fool.you
fool.meY
IPY:fool.me fool.you
Gnutella
IPY:fool.me fool.you
Gnutella: strengths and weaknesses pros:
flexibility in query processing complete decentralization simplicity fault tolerance/self-organization
cons: severe scalability problems susceptible to attacks
Pure P2P system
Gnutella: initial problems and fixes 2000: avg size of reachable network only 400-
800 hosts. Why so small? modem users: not enough bandwidth to provide
search routing capabilities: routing black holes
Fix: create peer hierarchy based on capabilities previously: all peers identical, most modem black
holes preferential connection:
• favors routing to well-connected peers• favors reply to clients that themselves serve large
number of files: prevent freeloading
Structured
FreeNet, Chord, CAN, Tapestry, Pastry model
001 012
212
305
332
212 ?
212 ?
Structured
FreeNet, Chord, CAN, Tapestry, Pastry model
Benefits: Manageable per-node state Manageable bandwidth
usage and time to locate item
Guaranteed success
Drawbacks: Possibly unbalanced load Harder to support fault
tolerance
001 012
212
305
332
212 ?
212 ?
Unstructured vs Structured P2P The systems we described do not offer any
guarantees about their performance (or even correctness)
Structured P2P Scalable guarantees on numbers of hops to
answer a query Maintain all other P2P properties (load balance,
self-organization, dynamic nature)
Approach: Distributed Hash Tables (DHT)
Distributed Hash Tables (DHT)
Distributed version of a hash table data structure Stores (key, value) pairs
The key is like a filename The value can be file contents, or pointer to location
Goal: Efficiently insert/lookup/delete (key, value) pairs
Each peer stores a subset of (key, value) pairs in the system
Core operation: Find node responsible for a key Map key to node Efficiently route insert/lookup/delete request to this node
Allow for frequent node arrivals/departures
DHT Desirable Properties
Keys should mapped evenly to all nodes in the network (load balance)
Each node should maintain information about only a few other nodes (scalability, low update cost)
Messages should be routed to a node efficiently (small number of hops)
Node arrival/departures should only affect a few nodes
DHT Routing Protocols
DHT is a generic interface
There are several implementations of this interface Chord [MIT] Pastry [Microsoft Research UK, Rice University] Tapestry [UC Berkeley] Content Addressable Network (CAN) [UC Berkeley]
SkipNet [Microsoft Research US, Univ. of Washington] Kademlia [New York University] Viceroy [Israel, UC Berkeley] P-Grid [EPFL Switzerland] Freenet [Ian Clarke]
Basic Approach
In all approaches: keys are associated with globally unique IDs
integers of size m (for large m) key ID space (search space) is uniformly
populated - mapping of keys to IDs using (consistent) hashing
a node is responsible for indexing all the keys in a certain subspace (zone) of the ID space
nodes have only partial knowledge of other node’s responsibilities
Improvements: SuperPeers KaZaA model Hybrid centralized and unstructured Advantages and disadvantages?
Hierarchical Overlay
between centralized index, query flooding approaches
each peer is either a super node or assigned to a super node TCP connection between
peer and its super node. TCP connections between
some pairs of super nodes.
Super node tracks content in its children
ordinary peer
group-leader peer
neighoring re la tionshipsin overlay network
Kazaa (Fasttrack network)
Hybrid of centralized Napster and decentralized Gnutella hybrid P2P system
Super-peers act as local search hubs Each super-peer is similar to a Napster server for a small
portion of the network Super-peers are automatically chosen by the system based
on their capacities (storage, bandwidth, etc.) and availability (connection time)
Users upload their list of files to a super-peer Super-peers periodically exchange file lists You send queries to a super-peer for files of interest
File Distribution: Server-Client vs P2PQuestion : How much time to distribute file
from one server to N peers?
us
u2d1 d2u1
uN
dN
Server
Network (with abundant bandwidth)
File, size F
us: server upload bandwidth
ui: peer i upload bandwidth
di: peer i download bandwidth
File distribution time: server-client
us
u2d1 d2u1
uN
dN
Server
Network (with abundant bandwidth)
F server
sequentially sends N copies: NF/us time
client i takes F/di
time to download
increases linearly in N(for large N)
= dcs = max { NF/us, F/min(di) }i
Time to distribute F to N clients using
client/server approach
File distribution time: P2P
us
u2d1 d2u1
uN
dN
Server
Network (with abundant bandwidth)
F server must send one
copy: F/us time
client i takes F/di time to download
NF bits must be downloaded (aggregate) fastest possible upload rate: us + ui
dP2P = max { F/us, F/min(di) , NF/(us + ui) }i
0
0.5
1
1.5
2
2.5
3
3.5
0 5 10 15 20 25 30 35
N
Min
imu
m D
istr
ibut
ion
Tim
e P2P
Client-Server
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, us = 10u, dmin ≥ us
File distribution: BitTorrent
tracker: tracks peers participating in torrent
torrent: group of peers exchanging chunks of a file
obtain listof peers
trading chunks
peer
P2P file distribution
BitTorrent (1)
file divided into 256KB chunks. peer joining torrent:
has no chunks, but will accumulate them over time
registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
while downloading, peer uploads chunks to other peers.
peers may come and go once peer has entire file, it may (selfishly) leave
or (altruistically) remain
BitTorrent (2)
Pulling Chunks at any given time,
different peers have different subsets of file chunks
periodically, a peer (Alice) asks each neighbor for list of chunks that they have.
Alice sends requests for her missing chunks rarest first
Sending Chunks: tit-for-tat Alice sends chunks to four
neighbors currently sending her chunks at the highest rate re-evaluate top 4 every
10 secs every 30 secs: randomly
select another peer, starts sending chunks newly chosen peer may
join top 4 “optimistically unchoke”
BitTorrent: Tit-for-tat(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates(3) Bob becomes one of Alice’s top-four providers
With higher upload rate, can find better trading partners & get file faster!
P2P: searching for information
File sharing (eg e-mule)
Index dynamically tracks the locations of files that peers share.
Peers need to tell index what they have.
Peers search index to determine where files can be found
Instant messaging Index maps user
names to locations. When user starts IM
application, it needs to inform index of its location
Peers search index to determine IP address of user.
Index in P2P system: maps information to peer location(location = IP address & port number).
P2P Case study: Skype
inherently P2P: pairs of users communicate.
proprietary application-layer protocol (inferred via reverse engineering)
hierarchical overlay with SNs
Index maps usernames to IP addresses; distributed over SNs
Skype clients (SC)
Supernode (SN)
Skype login server
Anonymity
Napster, Gnutella, Kazaa don’t provide anonymity Users know who they are downloading from Others know who sent a query
Freenet Designed to provide anonymity among
other features
Freenet
Keys are mapped to IDs Each node stores a cache of keys, and a routing
table for some keys routing table defines the overlay network
Queries are routed to the node with the most similar key
Data flows in reverse path of query Impossible to know if a user is initiating or forwarding a
query Impossible to know if a user is consuming or forwarding
data Keys replicated in (some) of the nodes along the path
Freenet routing
K117 ? TTL = 8
N01
N02
N03
N04
N07
N06
N05
N08
K124 N02 K317 N08 K613 N05
Freenet routing
K117 ?
TTL = 7
N01
N02
N03
N04
N07
N06
N05
N08
K124 N02 K317 N08 K613 N05
K514 N01
TTL = 6
sorry...
Freenet routing
K117 ?
TTL = 7
N01
N02
N03
N04
N07
N06
N05
N08
K124 N02 K317 N08 K613 N05
K514 N01
TTL = 6
K100 N07 K222 N06 K617 N05
TTL = 5
Freenet routing
K117 ?
TTL = 7
N01
N02
N03
N04
N07
N06
N05
N08
K124 N02 K317 N08 K613 N05
K514 N01
TTL = 6
K100 N07 K222 N06 K617 N05
TTL = 5 117
117
Freenet routing
K117 ?
TTL = 7
N01
N02
N03
N04
N07
N06
N05
N08
K124 N02 K317 N08 K613 N05
K514 N01
TTL = 6
K117 N07 K100 N07 K222 N06 K617 N05 TTL = 5
117 117
117
Freenet routing
N01
N02
N03
N04
N07
N06
N05
N08
K117 N07 K124 N02 K317 N08 K613 N05
K514 N01
K117 N07 K100 N07 K222 N06 K617 N05
117 117
117
Freenet routing
N01
N02
N03
N04
N07
N06
N05
N08
K117 N08K124 N02K317 N08 K613 N05
K514 N01
K117 N08 K100 N07 K222 N06 K617 N05
117 117
117
Inserts are performed similarly – they are unsuccessful queries
Freenet strengths and weaknesses pros:
complete decentralization fault tolerance/self-organization anonymity scalability (to some degree)
cons: questionable efficiency & performance rare keys disappear from the system improvement of performance requires high overhead
(maintenance of additional information for routing)
P2P Review
Two key functions of P2P systems Sharing content Finding content
Sharing content Direct transfer between peers
• All systems do this Structured vs. unstructured placement of data Automatic replication of data
Finding content Centralized (Napster) Decentralized (Gnutella) Probabilistic guarantees (DHTs)
Issues with P2P
Free Riding (Free Loading) Two types of free riding
• Downloading but not sharing any data• Not sharing any interesting data
On Gnutella• 15% of users contribute 94% of content• 63% of users never responded to a query
– Didn’t have “interesting” data
No ranking: what is a trusted source? “spoofing”