CS3516-10-P2P - Worcester Polytechnic Institute
Welcome to
CS 3516: Advanced Computer Networks
Prof. Yanhua Li
Time: 9:00am–9:50am M, T, R, and F
Location: Fuller 320
Fall 2016 A-term
Some slides are originally from the course materials of the textbook “Computer Networking: A Top Down Approach”, 6th edition, by
Jim Kurose, Keith Ross, Addison-Wesley March 2012. Copyright 1996-2013 J.F Kurose and K.W. Ross, All Rights Reserved.
Application Layer 2-2
Chapter 2: outline
2.6 P2P applications
  1. P2P vs. client-server
  2. Unstructured peer-to-peer networks: BitTorrent
  3. Structured peer-to-peer networks: Distributed Hash Table (DHT)
Pure P2P architecture
• no always-on server
• arbitrary end systems directly communicate
• peers are intermittently connected and change IP addresses
examples:
  ◦ file distribution (BitTorrent)
  ◦ streaming (KanKan)
  ◦ VoIP (Skype)
File distribution: client-server vs. P2P
Question: how much time does it take to distribute a file (size F) from one server to N peers?
  ◦ peer upload/download capacity is a limited resource
[Figure: a server holding a file of size F and N peers, connected through a network with abundant bandwidth]
  $u_s$: server upload capacity
  $u_i$: peer i upload capacity
  $d_i$: peer i download capacity
File distribution time: client-server
• server transmission: must sequentially send (upload) N file copies:
  ◦ time to send one copy: $F/u_s$
  ◦ time to send N copies: $NF/u_s$
• client: each client must download a file copy
  ◦ $d_{\min}$ = min client download rate
  ◦ download time of the slowest client: $F/d_{\min}$
time to distribute F to N clients using the client-server approach:
  $D_{c\text{-}s} \ge \max\{NF/u_s,\ F/d_{\min}\}$
  ($NF/u_s$ increases linearly in N)
File distribution time: P2P
• server transmission: must upload at least one copy
  ◦ time to send one copy: $F/u_s$
• client: each client must download a file copy
  ◦ download time of the slowest client: $F/d_{\min}$
• clients: in aggregate must download NF bits
  ◦ max upload rate (limiting max download rate) is $u_s + \sum_i u_i$
time to distribute F to N clients using the P2P approach:
  $D_{P2P} \ge \max\{F/u_s,\ F/d_{\min},\ NF/(u_s + \sum_i u_i)\}$
  ($NF$ increases linearly in N, but so does $u_s + \sum_i u_i$, as each peer brings service capacity)
Client-server vs. P2P: example
[Figure: Minimum Distribution Time (y-axis, 0 to 3.5) versus N (x-axis, 0 to 35); curves: P2P and Client-Server]
client upload rate = u, F/u = 1 hour, $u_s = 10u$, $d_{\min} \ge u_s$
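The two lower bounds can be evaluated directly for the example parameters. The following is a minimal sketch, assuming every peer has the same upload rate u and that the bounds are achieved; the function names and the choice of units (F = 1, u = 1, so times come out in hours) are illustrative, not part of the original slides.

```python
def client_server_time(F, u_s, d_min, N):
    """Lower bound on client-server distribution time: max{NF/u_s, F/d_min}."""
    return max(N * F / u_s, F / d_min)


def p2p_time(F, u_s, d_min, peer_uploads):
    """Lower bound on P2P distribution time:
    max{F/u_s, F/d_min, NF/(u_s + sum of peer upload rates)}."""
    N = len(peer_uploads)
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_uploads)))


if __name__ == "__main__":
    # Example parameters from the slide: F/u = 1 hour, u_s = 10u, d_min >= u_s.
    F, u = 1.0, 1.0
    u_s = 10 * u
    d_min = u_s  # smallest value allowed by d_min >= u_s
    for N in (5, 10, 20, 30):
        cs = client_server_time(F, u_s, d_min, N)
        p2p = p2p_time(F, u_s, d_min, [u] * N)
        print(f"N={N:2d}  client-server={cs:.2f} h  P2P={p2p:.2f} h")
```

With these parameters the client-server bound grows linearly (3.0 hours at N = 30), while the P2P bound stays below 1 hour (0.75 hours at N = 30), because the denominator $u_s + \sum_i u_i$ grows with N as well.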
Peer-to-Peer Networks: Unstructured and Structured
• Unstructured peer-to-peer networks
  ◦ Napster
  ◦ Gnutella
  ◦ BitTorrent
• Distributed Hash Tables (DHT) and structured networks
Chapter 2: outline
2.6 P2P applications
  1. P2P vs. client-server
  2. Unstructured peer-to-peer networks
     ◦ Napster
     ◦ Gnutella
     ◦ BitTorrent
  3. Structured peer-to-peer networks: Distributed Hash Table (DHT)
Peer-to-Peer Networks: How Did it Start?
• A killer application: Napster
  ◦ Free music over the Internet
• Key idea: share the content, storage, and bandwidth of individual (home) users
Model
• Each user stores a subset of files
• Each user has access to (can download) files from all users in the system
Other Challenges
• Scale: up to hundreds of thousands or millions of machines
• Dynamicity: machines can come and go at any time
Peer-to-Peer Networks: Napster
[Photo: Shawn Fanning, Northeastern freshman]
• Napster history: the rise
  ◦ January 1999: Napster version 1.0
  ◦ May 1999: company founded
  ◦ September 1999: first lawsuits
  ◦ 2000: 80 million users
• Napster history: the fall
  ◦ Mid 2001: out of business due to lawsuits
  ◦ Mid 2001: dozens of P2P alternatives that were harder to touch, though these have gradually been constrained
  ◦ 2003: growth of pay services like iTunes
• Napster history: the resurrection
  ◦ 2003: Napster reconstituted as a pay service
  ◦ 2006: still lots of file sharing going on
Napster Technology: Directory Service
• User installing the software
  ◦ Download the client program
  ◦ Register name, password, local directory, etc.
• Client contacts Napster (via TCP)
  ◦ Provides a list of music files it will share
  ◦ … and Napster’s central server updates the directory
• Client searches on a title or performer
  ◦ Napster identifies online clients with the file
  ◦ … and provides IP addresses
• Client requests the file from the chosen supplier
  ◦ Supplier transmits the file to the client
  ◦ Both client and supplier report status to Napster
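The directory steps above can be sketched as a small in-memory index. This is only an illustration of the idea of a central directory, not Napster's actual protocol or data format; the class name, method names, and addresses are hypothetical.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy central index mapping a title to the peers currently sharing it."""

    def __init__(self):
        self._index = defaultdict(set)  # title -> set of peer addresses

    def register(self, peer_addr, titles):
        """A client connects and reports the files it will share."""
        for title in titles:
            self._index[title].add(peer_addr)

    def unregister(self, peer_addr):
        """A client goes offline; remove it from every entry."""
        for peers in self._index.values():
            peers.discard(peer_addr)

    def search(self, title):
        """Return addresses of online peers that have the title."""
        return sorted(self._index.get(title, set()))


if __name__ == "__main__":
    directory = CentralDirectory()
    directory.register("10.0.0.1", ["song-a", "song-b"])
    directory.register("10.0.0.2", ["song-b"])
    print(directory.search("song-b"))  # the file itself is then fetched peer-to-peer
```

Only the lookup goes through the central server; the transfer happens directly between the two clients.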
Napster
• Assume a centralized index system that maps files (songs) to machines that are alive
• How to find a file (song)
  ◦ Query the index system → return a machine that stores the required file
    ▪ Ideally this is the closest/least-loaded machine
  ◦ ftp the file
• Advantages:
  ◦ Simplicity, easy to implement sophisticated search engines on top of the index system
• Disadvantages:
  ◦ Robustness, scalability
Napster Technology: Properties
• Server’s directory continually updated
  ◦ Always know what music is currently available
• Peer-to-peer file transfer
  ◦ No load on the server
• As a protocol
  ◦ Login, search, upload, download, and status operations
  ◦ No security: cleartext passwords and other vulnerabilities
• Bandwidth issues
  ◦ Suppliers ranked by apparent bandwidth & response time
Napster: Limitations of Central Directory
File transfer is decentralized, but locating content is highly centralized
• Single point of failure
• Performance bottleneck
• Copyright infringement
• So, later P2P systems were more distributed
Peer-to-Peer Networks: Gnutella
• Gnutella history
  ◦ 2000: J. Frankel & T. Pepper released Gnutella
  ◦ Soon after: many other clients (e.g., Morpheus, Limewire, Bearshare)
  ◦ 2001: protocol enhancements, e.g., “ultrapeers”
• Query flooding
  ◦ Join: contact a few nodes to become neighbors
  ◦ Publish: no need!
  ◦ Search: ask neighbors, who ask their neighbors
  ◦ Fetch: get file directly from another node
Gnutella
• Distribute file location
• Idea: flood the request
• How to find a file:
  ◦ Send request to all neighbors
  ◦ Neighbors recursively multicast the request
  ◦ Eventually a machine that has the file receives the request, and it sends back the answer
• Advantages:
  ◦ Totally decentralized, highly robust
• Disadvantages:
  ◦ Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
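TTL-limited flooding can be sketched as a breadth-first walk over the overlay graph. This is a simplified model, assuming each peer drops queries it has already seen and forwards a new query to all of its neighbors (including the one it came from, which a real client would skip); the function name and the example overlay are hypothetical.

```python
from collections import deque


def flood_query(overlay, holders, start, ttl):
    """Flood a query from `start` for at most `ttl` hops.

    overlay: dict mapping each peer to a list of its neighbors
    holders: set of peers that store the requested file
    Returns (set of peers that would answer, number of Query messages sent).
    """
    seen = {start}
    hits = set()
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL expired: this peer does not forward further
        for neighbor in overlay[peer]:
            messages += 1
            if neighbor in seen:
                continue  # duplicate query is dropped by the receiver
            seen.add(neighbor)
            if neighbor in holders:
                hits.add(neighbor)
            frontier.append((neighbor, remaining - 1))
    return hits, messages


if __name__ == "__main__":
    overlay = {
        "A": ["B", "C"],
        "B": ["A", "D"],
        "C": ["A", "D"],
        "D": ["B", "C", "E"],
        "E": ["D"],
    }
    print(flood_query(overlay, {"E"}, "A", ttl=2))  # E is three hops away: no hit
    print(flood_query(overlay, {"E"}, "A", ttl=3))  # larger TTL finds E, more messages
```

The example shows both sides of the trade-off: a small TTL bounds the traffic but can miss a file that exists in the network.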
Gnutella
• Ad-hoc topology
• Queries are flooded for a bounded number of hops
• No guarantees on recall
[Figure: overlay with Query: “xyz” being flooded; two nodes labeled xyz]
Gnutella: Query Flooding
• Fully distributed
  ◦ No central server
• Public domain protocol
• Many Gnutella clients implementing protocol
Overlay network: graph
• Edge between peer X and Y if there’s a TCP connection
• All active peers and edges form the overlay net
Gnutella: Protocol
• Query message sent over existing TCP connections
• Peers forward Query message
• QueryHit sent over reverse path
Scalability: limited scope flooding
[Figure: Query and QueryHit messages exchanged between peers; File transfer: HTTP]
Gnutella: Pros and Cons
• Advantages
  ◦ Fully decentralized
• Disadvantages
  ◦ Search scope may be quite large
  ◦ Search time may be quite long
  ◦ High overhead, and nodes come and go often
BitTorrent: Simultaneous Downloading
• Divide large file into many pieces
  ◦ Replicate different pieces on different peers
  ◦ A peer with a complete piece can trade with other peers
  ◦ Peer can (hopefully) assemble the entire file
• Allows simultaneous downloading
  ◦ Retrieving different parts of the file from different peers at the same time
BitTorrent Components
• Seed
  ◦ Peer with entire file
  ◦ Fragmented in pieces
• Leech
  ◦ Peer with an incomplete copy of the file
• Torrent file
  ◦ Passive component
  ◦ Stores summaries of the pieces to allow peers to verify their integrity
• Tracker
  ◦ Allows peers to find each other
  ◦ Returns a list of random peers
BitTorrent: Overall Architecture
[Figure, shown over seven animation frames: a web server hosts a web page with a link to the .torrent file; the downloader (“US”) appears together with the tracker and peers A [Leech], B [Seed], and C [Leech]. Labels added in successive frames: “.torrent”, “Shake-hand”, “pieces”.]
Free-Riding Problem in P2P Networks
• Vast majority of users are free-riders
  ◦ Most share no files and answer no queries
  ◦ Others limit # of connections or upload speed
• A few “peers” essentially act as servers
  ◦ A few individuals contributing to the public good
  ◦ Making them hubs that basically act as a server
• BitTorrent discourages free-riding
  ◦ Allow the fastest peers to download from you
  ◦ Occasionally let some freeloaders download
P2P file distribution: BitTorrent
• file divided into 256 KB chunks
• peers in torrent send/receive file chunks
tracker: tracks peers participating in torrent
torrent: group of peers exchanging chunks of a file
Alice arrives …
… obtains list of peers from tracker
… and begins exchanging file chunks with peers in torrent
P2P file distribution: BitTorrent
• peer joining torrent:
  ◦ has no chunks, but will accumulate them over time from other peers
  ◦ registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
• while downloading, peer uploads chunks to other peers
• peer may change peers with whom it exchanges chunks
• churn: peers may come and go
• once peer has entire file, it may (selfishly) leave or (altruistically) remain in torrent
BitTorrent: requesting, sending file chunks
requesting chunks:
• at any given time, different peers have different subsets of file chunks
• periodically, Alice asks each peer for list of chunks that they have
• Alice requests missing chunks from peers, rarest first
sending chunks: tit-for-tat
• Alice sends chunks to those four peers currently sending her chunks at highest rate
  ◦ other peers are choked by Alice (do not receive chunks from her)
  ◦ re-evaluate top 4 every 10 secs
• every 30 secs: randomly select another peer, start sending chunks
  ◦ “optimistically unchoke” this peer
  ◦ newly chosen peer may join top 4
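Rarest-first selection can be sketched as counting how many neighbors hold each chunk and requesting the least-replicated missing chunks first. This is a simplified illustration, assuming Alice already has each neighbor's chunk list; the function name and the tie-breaking by chunk index are assumptions, not something the slides specify.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks Alice is missing from rarest to most common.

    my_chunks: set of chunk indices Alice already has
    neighbor_chunks: dict mapping each neighbor to the set of chunks it has
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)
    missing = [c for c in counts if c not in my_chunks]
    # Fewest copies first; ties broken by chunk index to keep the order stable.
    return sorted(missing, key=lambda c: (counts[c], c))


if __name__ == "__main__":
    neighbors = {
        "bob": {0, 1, 2},
        "carol": {1, 2, 3},
        "dave": {1},
    }
    print(rarest_first({0}, neighbors))  # chunk 3 has one copy, so it comes first
```

Requesting rare chunks first tends to even out the number of copies of each chunk in the torrent.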
BitTorrent: tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
higher upload rate: find better trading partners, get file faster!
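The two selection rules behind tit-for-tat can be sketched as follows: keep the four peers that currently send to Alice fastest, and separately pick one choked peer at random for the optimistic unchoke. The timers (every 10 and 30 seconds) are left out, and the function names and rate values are hypothetical.

```python
import random


def choose_unchoked(rates, top_n=4):
    """Return the top_n peers currently sending us chunks at the highest rate.

    rates: dict mapping each peer to the observed rate at which it sends to us
    """
    ranked = sorted(rates, key=lambda peer: rates[peer], reverse=True)
    return set(ranked[:top_n])


def optimistic_unchoke(rates, unchoked, rng=random):
    """Pick one currently choked peer at random, or None if there is none."""
    choked = sorted(peer for peer in rates if peer not in unchoked)
    return rng.choice(choked) if choked else None


if __name__ == "__main__":
    rates = {"bob": 5, "carol": 50, "dave": 40, "erin": 30, "frank": 20, "grace": 10}
    top4 = choose_unchoked(rates)
    lucky = optimistic_unchoke(rates, top4)
    print(sorted(top4), lucky)
```

If the optimistically unchoked peer starts sending to Alice at a high rate, it shows up in `rates` and can displace a member of the top four at the next re-evaluation, which is the reciprocation step (2) and (3) above.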
Chapter 2: outline
2.6 P2P applications
  1. P2P vs. client-server
  2. Unstructured peer-to-peer networks: BitTorrent
  3. Structured peer-to-peer networks: Distributed Hash Table (DHT)
Distributed Hash Table (DHT)
• Hash table
• DHT paradigm
• Circular DHT and overlay networks
• Peer churn
Simple Database
Simple database with (key, value) pairs:
• key: human name; value: social security #

  Key                   Value
  John Washington       132-54-3570
  Diana Louise Jones    761-55-3791
  Xiaoming Liu          385-41-0902
  Rakesh Gopal          441-89-1956
  Linda Cohen           217-66-5609
  …….                   ………
  Lisa Kobayashi        177-23-0199

• key: movie title; value: IP address
Hash Table
• More convenient to store and search on numerical representation of key
• key = hash(original key)

  Original Key          Key        Value
  John Washington       8962458    132-54-3570
  Diana Louise Jones    7800356    761-55-3791
  Xiaoming Liu          1567109    385-41-0902
  Rakesh Gopal          2360012    441-89-1956
  Linda Cohen           5430938    217-66-5609
  …….                              ………
  Lisa Kobayashi        9290124    177-23-0199
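The mapping key = hash(original key) can be sketched with any deterministic hash function. The version below assumes SHA-1 reduced to a small ID space; the slides do not say which hash produced the numeric keys in the table, so this will not reproduce those values, and the function name and the 6-bit default are illustrative.

```python
import hashlib


def numeric_key(original_key, id_bits=6):
    """Map a string key to an integer in {0, ..., 2**id_bits - 1}."""
    digest = hashlib.sha1(original_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** id_bits)


if __name__ == "__main__":
    for name in ("John Washington", "Linda Cohen", "Lisa Kobayashi"):
        print(name, "->", numeric_key(name))
```

The same original key always hashes to the same numeric key, so any peer can compute where a (key, value) pair belongs without asking anyone.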
Distributed Hash Table (DHT)
• Distribute (key, value) pairs over millions of peers
  ◦ pairs are evenly distributed over peers
• Any peer can query database with a key
  ◦ database returns value for the key
  ◦ To resolve query, small number of messages exchanged among peers
• Each peer only knows about a small number of other peers
• Robust to peers coming and going (churn)
Naïve method: random distribution of (key, value) pairs.
Assign key-value pairs to peers
• rule: assign key-value pair to the peer that has the closest ID.
• convention: closest is the immediate successor of the key.
• e.g., ID space {0,1,2,3,…,63}
• suppose 8 peers: 1,12,13,25,32,40,48,60
  ◦ If key = 51, then assigned to peer 60
  ◦ If key = 60, then assigned to peer 60
  ◦ If key = 61, then assigned to peer 1
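The successor rule can be sketched with a binary search over the sorted peer IDs, wrapping around to the smallest ID when the key is larger than every peer. The function name is hypothetical; the peer IDs and keys are the ones from the example above.

```python
from bisect import bisect_left


def responsible_peer(key, peer_ids):
    """Return the peer whose ID is the immediate successor of key on the ring
    (the peer itself if key equals a peer ID)."""
    ring = sorted(peer_ids)
    i = bisect_left(ring, key)      # index of first peer ID >= key
    return ring[i % len(ring)]      # wrap around past the largest ID


if __name__ == "__main__":
    peers = [1, 12, 13, 25, 32, 40, 48, 60]
    for key in (51, 60, 61):
        print(f"key {key} -> peer {responsible_peer(key, peers)}")
```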
Circular DHT
• each peer only aware of immediate successor and predecessor.
[Figure: “overlay network” of peers 1, 12, 13, 25, 32, 40, 48, 60 arranged in a ring]
Resolving a query
[Figure: ring of peers 1, 12, 13, 25, 32, 40, 48, 60; the query “What is the value associated with key 53?” is passed around the ring and the value is returned]
O(N) messages on average to resolve query, when there are N peers
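Query resolution on the plain ring can be sketched by forwarding the query from successor to successor until it reaches the peer responsible for the key. The sketch counts forwarding messages only (not the reply), and the starting peer 12 in the example is an assumption; the function name is hypothetical.

```python
def resolve_query(key, start_peer, peer_ids):
    """Forward a query around the ring until the responsible peer is reached.

    Returns (responsible peer, number of forwarding messages).
    """
    ring = sorted(peer_ids)

    def owns(peer):
        # A peer is responsible for keys in the ring interval (predecessor, peer].
        pred = ring[ring.index(peer) - 1]
        if pred < peer:
            return pred < key <= peer
        return key > pred or key <= peer  # interval wraps past the largest ID

    current, messages = start_peer, 0
    while not owns(current):
        current = ring[(ring.index(current) + 1) % len(ring)]  # ask successor
        messages += 1
    return current, messages


if __name__ == "__main__":
    peers = [1, 12, 13, 25, 32, 40, 48, 60]
    print(resolve_query(53, 12, peers))
```

Because each peer only knows its immediate neighbors, the number of messages grows with the number of peers between the asker and the responsible peer, which is where the O(N) average comes from.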
Circular DHT with shortcuts
• each peer keeps track of IP addresses of predecessor, successor, and shortcuts.
• reduced from 6 to 3 messages.
• possible to design shortcuts with O(log N) neighbors, O(log N) messages in query
[Figure: ring of peers 1, 12, 13, 25, 32, 40, 48, 60 with shortcut links; the query “What is the value for key 53?” and the returned value]
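One known way to get O(log N) neighbors is to place shortcuts at power-of-two distances around the ring, as the Chord DHT does. The sketch below uses that scheme as an assumption; it is not the particular set of shortcuts drawn on the slide, so its message counts differ from the slide's "6 to 3". All names are hypothetical.

```python
from bisect import bisect_left

ID_BITS = 6                 # ID space {0, ..., 63}
SPACE = 2 ** ID_BITS


def successor(ident, ring):
    """First peer at or clockwise after ident (ring is a sorted list of IDs)."""
    i = bisect_left(ring, ident % SPACE)
    return ring[i % len(ring)]


def in_interval(x, a, b, closed_right):
    """True if x lies clockwise strictly after a and before b (or at b if closed_right)."""
    if a < b:
        return a < x < b or (closed_right and x == b)
    return x > a or x < b or (closed_right and x == b)


def shortcuts(peer, ring):
    """Links to the peers responsible for peer + 1, + 2, + 4, ... (first one is the successor)."""
    return [successor(peer + 2 ** i, ring) for i in range(ID_BITS)]


def lookup(key, start, ring):
    """Route a query greedily over shortcuts; returns (responsible peer, messages)."""
    current, messages = start, 0
    while True:
        predecessor = ring[ring.index(current) - 1]
        if in_interval(key, predecessor, current, closed_right=True):
            return current, messages
        links = shortcuts(current, ring)
        nxt = links[0]  # default: immediate successor
        if not in_interval(key, current, nxt, closed_right=True):
            # Jump to the farthest shortcut that does not overshoot the key.
            for link in reversed(links):
                if in_interval(link, current, key, closed_right=False):
                    nxt = link
                    break
        current, messages = nxt, messages + 1


if __name__ == "__main__":
    ring = [1, 12, 13, 25, 32, 40, 48, 60]
    print(lookup(53, 12, ring))
```

Each peer stores at most ID_BITS links, and every jump moves the query to the farthest known peer that still precedes the key, so the remaining distance shrinks quickly compared with walking the ring one successor at a time.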
Peer churn
handling peer churn:
• peers may come and go (churn)
• each peer knows address of its two successors
• each peer periodically pings its two successors to check aliveness
• if immediate successor leaves, choose next successor as new immediate successor
example: peer 5 abruptly leaves
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]
Peer churn
example: peer 5 abruptly leaves
• peer 4 detects peer 5’s departure; makes 8 its immediate successor
• 4 asks 8 who its immediate successor is; makes 8’s immediate successor its second successor.
[Figure: ring of peers 1, 3, 4, 8, 10, 12, 15 after peer 5 has left]
How about node 3?
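The repair rule can be sketched as a small function over each peer's two-successor list. This is one plausible reading of the procedure, assuming at most one of a peer's two successors has failed and that "asking" a peer means reading its current first successor; the function name and data layout are hypothetical. Under these assumptions, one answer to the question about node 3 is that it keeps 4 as its immediate successor and learns its new second successor (8) from peer 4.

```python
def repair_successors(peer, successors, alive):
    """Update peer's [first, second] successor list after a ping round.

    successors: dict mapping each peer to a list [first, second]
    alive: set of peers that answered the ping
    Assumes at most one of the two successors has failed.
    """
    first, second = successors[peer]
    if first not in alive:
        first = second                   # promote the second successor
        second = successors[first][0]    # ask it for its immediate successor
    elif second not in alive:
        second = successors[first][0]    # ask first successor for its new successor
    successors[peer] = [first, second]


if __name__ == "__main__":
    successors = {
        1: [3, 4], 3: [4, 5], 4: [5, 8], 5: [8, 10],
        8: [10, 12], 10: [12, 15], 12: [15, 1], 15: [1, 3],
    }
    alive = set(successors) - {5}        # peer 5 abruptly leaves
    repair_successors(4, successors, alive)
    repair_successors(3, successors, alive)
    print(successors[4], successors[3])
```

In this sketch peer 3 must run its repair after peer 4 has updated its own list; otherwise peer 4 would still report the departed peer 5 as its successor.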
Chapter 2: summary
our study of network apps now complete!
• application architectures
  ◦ client-server
  ◦ P2P
• application service requirements:
  ◦ reliability, throughput, delay, security
• Internet transport service model
  ◦ connection-oriented, reliable: TCP
  ◦ unreliable, datagrams: UDP
• specific protocols:
  ◦ HTTP
  ◦ FTP
  ◦ SMTP, POP, IMAP
  ◦ DNS
  ◦ P2P: BitTorrent, DHT
Chapter 2: summary
most importantly: learned about protocols!
• typical request/reply message exchange:
  ◦ client requests info or service
  ◦ server responds with data, status code
• message formats:
  ◦ headers: fields giving info about data
  ◦ data: info being communicated
important themes:
• control vs. data msgs
  ◦ in-band, out-of-band
• centralized vs. decentralized
• stateless vs. stateful
• reliable vs. unreliable msg transfer
• “complexity at network edge”