Network Applications and Network Programming: Web and P2P (9/28/2009)
Admin.
Submit programming assignment 0 using the class server; if you want, you can also send an email attachment to the TA.
Recap: FTP, HTTP
FTP: file transfer
- ASCII (human-readable format) requests and responses
- stateful server
- one data channel and one control channel
HTTP:
- Extensibility: ASCII requests, header lines, entity body, and response status line
- Scalability/robustness:
  • stateless server (each request should contain the full information); DNS load balancing
  • client caching, Web caches
- one data channel
Recap: Web Server Flow
[Figure: TCP socket space: one listening socket per port (e.g., {*.6789, *.*} and {*.25, *.*}), each with a completed-connection queue and send/receive buffers, plus an established socket {128.36.232.5:6789, 198.69.10.10:1500}]
Create ServerSocket(6789)
connSocket = accept()
read request from connSocket
read local file
write file to connSocket
close connSocket
Recap: Writing High-Performance Servers
Major issue: many socket/IO operations can cause processing to block, e.g.:
- accept: waiting for a new connection
- read on a socket: waiting for data or close
- write on a socket: waiting for buffer space
- disk I/O: waiting for a read/write to finish
Thus a crucial aspect of network server design is concurrency (non-blocking design), both for high performance and to avoid denial of service.
One technique to avoid blocking: threads
Multi-Threaded Web Server
Create ServerSocket(6789)
connSocket = accept()
Create thread for connSocket
Each thread then runs:
  read request from connSocket
  read local file
  write file to connSocket
  close connSocket
Recap: Writing High-Performance Servers
Problems of multiple threads: too many threads cause throughput meltdown and response-time explosion.
Event-Driven Programming
Event-driven programming, also called asynchronous I/O: tell the OS not to block when accepting/reading/writing on sockets.
Java: asynchronous I/O; for an example see http://www.cafeaulait.org/books/jnp3/examples/12/
Yields efficient and scalable concurrency. Many examples: Click router, Flash web server, TP monitors, etc.
Web Server
If the OS will not block on sockets, how might the program structure look?
Typical Structure of Async I/O
Typically, async I/O programs use a finite state machine (FSM) to monitor the progress of requests. The state info keeps track of the execution stage of processing each request, e.g., reading request, writing reply, ...
The program has a loop to check potential events at each state.
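As a minimal sketch of this structure (the ConnState stages and step function are illustrative, not from any library), the event loop advances each request's FSM when its socket is reported ready:

```java
import java.util.HashMap;
import java.util.Map;

public class ConnFsm {
    // hypothetical execution stages for one request, as described above
    enum ConnState { READING_REQUEST, WRITING_REPLY, DONE }

    // advance one request's FSM when the event loop reports its socket ready
    static ConnState step(ConnState s) {
        switch (s) {
            case READING_REQUEST: return ConnState.WRITING_REPLY; // full request read
            case WRITING_REPLY:   return ConnState.DONE;          // full reply written
            default:              return ConnState.DONE;
        }
    }

    public static void main(String[] args) {
        // state table: connection id -> execution stage
        Map<Integer, ConnState> conns = new HashMap<>();
        conns.put(3, ConnState.READING_REQUEST);
        conns.put(4, ConnState.WRITING_REPLY);
        // one iteration of the event loop, assuming both connections are ready
        for (Map.Entry<Integer, ConnState> e : conns.entrySet())
            e.setValue(step(e.getValue()));
        System.out.println(conns.get(3) + " " + conns.get(4)); // WRITING_REPLY DONE
    }
}
```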
Async I/O in Java
An important class is Selector, which supports the event loop.
A Selector is a multiplexer of selectable channel objects; example channels: DatagramChannel, ServerSocketChannel, SocketChannel.
Use configureBlocking(false) to make a channel non-blocking.
A selector may be created by invoking the open method of this class.
Async I/O in Java
A selectable channel registers events of interest (represented by a SelectionKey) with a selector via the register method.
A SelectionKey object contains two operation sets: the interest set and the ready set.
A SelectionKey object has an attachment which can store data; often the attachment is a buffer.
[Figure: a SelectableChannel registers with a Selector, producing a SelectionKey]
Async I/O in Java
Call select (or selectNow(), or select(int timeout)) to check for ready events, called the selected key set
Iterate over the set to process all ready events
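A minimal, runnable sketch of this select-and-iterate loop, using a Pipe instead of a network socket so it is self-contained:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class SelectorLoop {
    public static String runOnce() throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);          // channels must be non-blocking
        // register interest in read events; attach a buffer, as the slides suggest
        pipe.source().register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(64));
        pipe.sink().write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII)));

        selector.select();                               // wait for ready events
        String result = "";
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isReadable()) {                      // process each ready event
                ByteBuffer buf = (ByteBuffer) key.attachment();
                ((ReadableByteChannel) key.channel()).read(buf);
                buf.flip();
                result = StandardCharsets.US_ASCII.decode(buf).toString();
            }
        }
        selector.selectedKeys().clear();                 // the selected set must be cleared
        selector.close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce());                   // prints "hello"
    }
}
```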
Problems of Event-Driven Servers
- Difficult to engineer, modularize, and tune
- No performance/failure isolation between finite state machines (FSMs)
- FSM code can never block (but page faults, I/O, and garbage collection may still force a block), so multiple threads are still needed
Summary of Traditional C-S Web Servers
Is the application extensible, scalable, robust, secure?
[Figure: n clients contact a single application server, with DNS resolving the server name]
Content Distribution History...
“With 25 years of Internet experience, we’ve learned exactly one way to deal with the exponential growth: caching.”
(Van Jacobson, 1997)
Web Caches (Proxy)
A Web cache (proxy) is placed at the entrance of an ISP.
The client sends all HTTP requests to the Web cache:
- if the object is at the Web cache, the cache immediately returns the object in an HTTP response
- else the cache requests the object from the origin server, then returns the HTTP response to the client
[Figure: clients send HTTP requests to a proxy server, which forwards cache misses to origin servers and relays HTTP responses back]
Web Proxy/Cache
Web caches give good performance because very often
- a single client repeatedly accesses the same document, or
- a nearby client also accesses the same document.
The cache hit ratio increases logarithmically with the number of users.
[Figure: clients 1-3 and clients 4-6 reach the application server through their respective ISP caches]
Benefits of Web Caching
Assume the cache is “close” to the client (e.g., in the same network):
- smaller response time: the cache is “closer” to the client
- decreased traffic to distant servers: the link out of the institutional/local ISP network is often a bottleneck
[Figure: an institutional network (10 Mbps LAN) with an institutional cache reaches origin servers on the public Internet through a 1.5 Mbps access link]
What Went Wrong with Web Caches?
Web protocols evolved extensively to accommodate caching, e.g., HTTP 1.1.
However, Web caching was developed from a strong ISP perspective, leaving content providers out of the picture:
- it is the ISP who places a cache and controls it
- the ISPs' only interest in Web caches is to reduce bandwidth
- in the USA, bandwidth is relatively cheap; in Europe, there were many more Web caches
- ISPs can arbitrarily tune Web caches, even delivering stale content
Content Provider Perspective
Content providers care about:
- user experience (latency)
- content freshness
- accurate access statistics
- avoiding flash crowds
- minimizing bandwidth usage on their access link
Content Distribution Networks
Content Distribution Networks (CDNs) build an overlay network of caches to provide fast, cost-effective, and reliable content delivery, while working tightly with content providers.
Example: Akamai, the original and largest commercial CDN; it operates over 25,000 servers in over 1,000 networks.
Akamai (AH kuh my) is Hawaiian for intelligent, clever, and informally “cool”. Founded April 1999 in Boston, MA by MIT students.
Basics of Akamai Operation
The content provider's server provides the base HTML document; Akamai caches the embedded objects at a set of its cache servers (called edge servers).
“Akamaization” of embedded content, e.g.:
  <IMG SRC="http://www.provider.com/image.gif">
changed to
  <IMG SRC="http://a661.g.akamai.net/hash/image.gif">
Akamai customizes DNS to select the serving edge server based on closeness to the client browser and on server load.
More Akamai Information
URL akamaization is becoming obsolete and is only supported for legacy reasons. Currently most content providers prefer to use DNS CNAME techniques to get all their content served from the Akamai servers; still, content providers need to run their origin servers.
Akamai evolution: files/streaming; secure pages and whole pages; dynamic page assembly at the edge (ESI); distributed applications.
Discussion: Problems of Traditional Content Distribution
[Figure: n clients contact a single application server via DNS, as before]
Objectives of P2P
- Share the resources (storage and bandwidth) of individual clients to improve scalability/robustness
- Bypass DNS to find clients with resources! Examples: instant messaging, Skype
[Figure: P2P clients connect directly to each other across the Internet]
But P2P Is Not New
The original Internet was a P2P system:
- the original ARPANET connected UCLA, Stanford Research Institute, UCSB, and the Univ. of Utah
- there was no DNS or routing infrastructure, just connections over phone lines
- computers also served as routers
P2P is simply an iteration of scalable distributed systems.
P2P Systems
File sharing: BitTorrent, LimeWire. Streaming: PPLive, PPStream, Zattoo, ... Research systems.
Collaborative computing: SETI@Home project
• human genome mapping
• Intel NetBatch: 10,000 computers in 25 worldwide sites for simulations, saved about $500 million
Peer-to-peer computing:
- 40-70% of total traffic in many networks
- upset the music industry and has drawn college students, web developers, recording artists, and universities into court
Source: ipoque Internet study 2008/2009
Recap: P2P Objectives
- Bypass DNS to locate clients with resources! Examples: instant messaging, Skype
- Share the storage and bandwidth of individual clients to improve scalability/robustness
The Lookup Problem
[Figure: nodes N1-N6 across the Internet; a publisher at N4 stores key="title", value=MP3 data; a client elsewhere issues Lookup("title"): which node has the file?]
Find where a particular file is stored; pay particular attention to its equivalence to DNS.
Centralized Database: Napster
A program for sharing music over the Internet. History:
- 5/99: Shawn Fanning (freshman, Northeastern U.) founded the Napster Online music service; wrote the program in 60 hours
- 12/99: first lawsuit
- 3/00: Napster was 25% of UWisc traffic
- 2000: est. 60M users
- 2/01: US Circuit Court of Appeals ruled Napster knew users were violating copyright laws
- 7/01: # of simultaneous online users: Napster 160K
- 9/02: bankruptcy
We are referring to the Napster before its closure in 2000-2002.
Napster: How Does it Work?
Application-level, client-server protocol over TCP.
A centralized index system maps files (songs) to machines that are alive and have the files.
Steps:
1. connect to the Napster server
2. upload your list of files (“push”) to the server
3. give the server keywords to search the full list
4. select the “best” of the hosts with answers
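The central index in these steps can be sketched as a simple keyword-to-hosts map (the class and method names here are illustrative, not Napster's actual protocol):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class NapsterIndex {
    // central index: lower-cased filename -> hosts advertising it
    private final Map<String, Set<String>> index = new HashMap<>();

    // a client uploads ("pushes") its file list to the central server
    public void publish(String host, List<String> files) {
        for (String f : files)
            index.computeIfAbsent(f.toLowerCase(), k -> new HashSet<>()).add(host);
    }

    // the server answers a keyword search against the full list
    public Set<String> search(String keyword) {
        Set<String> hosts = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : index.entrySet())
            if (e.getKey().contains(keyword.toLowerCase()))
                hosts.addAll(e.getValue());
        return hosts;
    }

    public static void main(String[] args) {
        NapsterIndex index = new NapsterIndex();
        index.publish("10.0.0.5", List.of("beatles - yesterday.mp3"));
        index.publish("10.0.0.9", List.of("yesterday.mp3"));
        System.out.println(index.search("yesterday")); // both hosts answer
    }
}
```

The data transfer itself would then happen peer-to-peer, directly between the requesting client and one of the returned hosts.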
Napster Messages: General Packet Format
[chunksize] [chunkinfo] [data...]
CHUNKSIZE: Intel-endian (little-endian) 16-bit integer; size of [data...] in bytes
CHUNKINFO: Intel-endian 16-bit integer (values in hex):
00 - login rejected; 02 - login requested; 03 - login accepted; 0D - challenge? (nuprin1715); 2D - added to hotlist; 2E - browse error (user isn't online!); 2F - user offline
5B - whois query; 5C - whois result; 5D - whois: user is offline!; 69 - list all channels; 6A - channel info; 90 - join channel; 91 - leave channel; ...
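Such a header can be read and written with Java's ByteBuffer in little-endian mode (a sketch; the encode/decode helper names are my own):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class NapsterPacket {
    // build a packet: two little-endian ("Intel-endian") 16-bit header fields + payload
    public static byte[] encode(int type, String data) {
        byte[] payload = data.getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) payload.length);   // CHUNKSIZE
        buf.putShort((short) type);             // CHUNKINFO, e.g. 0x5B = whois query
        buf.put(payload);
        return buf.array();
    }

    // parse a packet back into { type-in-hex, payload }
    public static String[] decode(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet).order(ByteOrder.LITTLE_ENDIAN);
        int size = Short.toUnsignedInt(buf.getShort());
        int type = Short.toUnsignedInt(buf.getShort());
        byte[] payload = new byte[size];
        buf.get(payload);
        return new String[] { Integer.toHexString(type),
                              new String(payload, StandardCharsets.US_ASCII) };
    }

    public static void main(String[] args) {
        String[] parsed = decode(encode(0x5B, "alice"));
        System.out.println(parsed[0] + " " + parsed[1]); // 5b alice
    }
}
```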
Centralized Database: Napster
Summary of features: a hybrid design
- control: client-server (like a special-purpose DNS) for files
- data: peer-to-peer
Advantages: simplicity, easy to implement; sophisticated search engines can be built on top of the index system
Disadvantages: application specific (compared with DNS); lack of robustness and scalability: the central search server is a single point of bottleneck/failure, and easy to sue!
Variation: BitTorrent
The global central index server is replaced by one tracker per file (the peers sharing a file form a swarm). This reduces centralization, but requires other means to locate trackers.
The bandwidth-scalability management technique is more interesting; more later.
Outline
- Recap
- P2P
  - the lookup problem
  - Napster (central query server; distributed data servers)
  - Gnutella
Gnutella
On March 14, 2000, J. Frankel and T. Pepper from AOL's Nullsoft division (also the developers of the popular Winamp MP3 player) released Gnutella.
Within hours, AOL pulled the plug on it.
It was quickly reverse-engineered, and soon many other clients became available: BearShare, Morpheus, LimeWire, etc.
Decentralized Flooding: Gnutella
On startup, a client contacts other servents (server + client) in the network to form interconnection/peering relationships; servent interconnections are used to forward control messages (queries, hits, etc.).
How to find a resource record: decentralized flooding
- send requests to neighbors
- neighbors recursively forward the requests
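This forwarding rule can be sketched as a flood over a peer adjacency map, with a TTL and duplicate suppression (all names illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GnutellaFlood {
    // flood a query from 'origin' with the given TTL over a servent peering graph;
    // returns the set of servents that saw the query (duplicates suppressed)
    public static Set<String> flood(Map<String, List<String>> peers, String origin, int ttl) {
        Set<String> seen = new HashSet<>();
        Deque<Object[]> frontier = new ArrayDeque<>();   // (servent, remaining TTL)
        seen.add(origin);
        frontier.add(new Object[] { origin, ttl });
        while (!frontier.isEmpty()) {
            Object[] cur = frontier.poll();
            String node = (String) cur[0];
            int t = (int) cur[1];
            if (t == 0) continue;                        // TTL exhausted: drop the query
            for (String nbr : peers.getOrDefault(node, new ArrayList<>())) {
                if (seen.add(nbr))                       // forward only to unseen servents
                    frontier.add(new Object[] { nbr, t - 1 });
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> peers = new HashMap<>();
        peers.put("S", List.of("A", "B"));
        peers.put("A", List.of("S", "C"));
        peers.put("B", List.of("S"));
        peers.put("C", List.of("A", "D"));
        peers.put("D", List.of("C"));
        System.out.println(flood(peers, "S", 2));        // reaches S, A, B, C but not D
    }
}
```

In the real protocol the "seen" check is done per message ID, and each servent remembers the neighbor a query arrived from so that QueryHits can follow the reverse path.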
Decentralized Flooding
[Figure: a node S sends a query to its neighbors in an irregular topology of nodes A-N; each node forwards the query to its neighbors other than the one from which it received the query]
Background: Decentralized Flooding
[Figure: the same topology; the query propagates outward from S]
Each node should keep track of forwarded queries to avoid loops! Either
- nodes keep state (which will time out: soft state), or
- the state is carried in the query, i.e., a list of visited nodes.
Decentralized Flooding: Gnutella
Basic message header: unique ID, TTL, hops
Message types:
- Ping: probes the network for other servents
- Pong: response to a Ping; contains IP addr, # of files, etc.
- Query: search criteria + speed requirement of the servent
- QueryHit: successful response to a Query; contains the addr + port to transfer from, speed of the servent, etc.
Pings and Queries are flooded; QueryHits and Pongs follow the reverse path of the previous message.
Advantages and Disadvantages of Gnutella
Advantages: totally decentralized, highly robust
Disadvantages: not scalable; the entire network can be swamped with flood requests
• especially hard on slow clients; at some point, broadcast traffic on Gnutella exceeded 56 kbps
To alleviate this problem, each request has a TTL to limit its scope:
• each query has an initial TTL, and each node forwarding it reduces the TTL by one; if the TTL reaches 0, the query is dropped (consequence?)
Flooding: FastTrack (aka Kazaa)
Modifies the Gnutella protocol into a two-level hierarchy.
Supernodes:
- nodes that have better connections to the Internet
- act as temporary indexing servers for other nodes
- help improve the stability of the network
Standard nodes: connect to supernodes and report their lists of files
Search: broadcast (Gnutella-style) search across supernodes
Disadvantage: kept a centralized registration, so prone to lawsuits
Outline
- Recap
- P2P
  - the lookup problem
  - Napster (central query server; distributed data servers)
  - Gnutella (decentralized, flooding)
  - Freenet
Freenet History
Final-year project of Ian Clarke, Edinburgh University, Scotland, June 1999.
Goals:
- a totally distributed system without a centralized index or broadcast (flooding)
- respond adaptively to usage patterns, transparently moving and replicating files as necessary to provide efficient service
- provide publisher anonymity and security
- free speech: resistant to attacks; a third party shouldn't be able to deny access to a particular file (data item, object), e.g., by deleting it
Basic Structure of Freenet
Each machine stores a set of files; each file is identified by a unique identifier (called a key or id).
Each node maintains a “routing table” with entries of the form:
- id: a file id/key
- next_hop: a node where a file corresponding to the id might be available
- file: the local copy, if one exists
Freenet Query
Search by routing. API: file = query(id);
Upon receiving a query for file id:
- check whether the queried file is stored locally
- check the TTL to limit the search scope
  • each query is associated with a TTL that is decremented each time the query message is forwarded
  • when TTL = 1, the query is forwarded only with some probability
  • the TTL can be initialized to a random value within some bounds to obscure the distance to the originator
- look for the “closest” id in the table with an unvisited next_hop node
  • if one is found, forward the query to the corresponding next_hop
  • otherwise, backtrack
  • this ends up performing a depth-first-search (DFS)-like traversal, with the search direction ordered by closeness to the target
When the file is returned, it is cached along the reverse path (any advantage?)
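A simplified sketch of this forwarding rule (greedy closest-key routing; the backtracking step is omitted for brevity, and all names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FreenetQuery {
    // per-node routing tables: node -> (key -> next_hop); stored: node -> keys held locally
    public static List<String> query(Map<String, Map<Integer, String>> tables,
                                     Map<String, Set<Integer>> stored,
                                     String start, int target, int ttl) {
        List<String> path = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        String node = start;
        while (node != null && ttl-- > 0) {
            path.add(node);
            visited.add(node);
            if (stored.getOrDefault(node, Set.of()).contains(target))
                return path;                              // file found locally
            // forward to the unvisited next_hop whose key is closest to the target
            String best = null;
            int bestDist = Integer.MAX_VALUE;
            for (Map.Entry<Integer, String> e : tables.getOrDefault(node, Map.of()).entrySet()) {
                int dist = Math.abs(e.getKey() - target);
                if (dist < bestDist && !visited.contains(e.getValue())) {
                    bestDist = dist;
                    best = e.getValue();
                }
            }
            node = best;                                  // null = dead end (backtrack omitted)
        }
        return path;                                      // TTL expired or dead end
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, String>> tables = Map.of(
            "n1", Map.of(12, "n2", 5, "n3"),
            "n2", Map.of(14, "n3", 11, "n5"));
        Map<String, Set<Integer>> stored = Map.of("n5", Set.of(10, 11));
        System.out.println(query(tables, stored, "n1", 10, 5)); // [n1, n2, n5]
    }
}
```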
Query Example
[Figure: query(10) is issued at n1 and forwarded through n2, n3, n4, and n5 (steps 1, 2, 3, 4, 4', 5, including one backtrack); each node holds a routing table of (id, next_hop, file) entries, and the query terminates at the node whose table contains the entry (10, n5, f10)]
Besides the routing table, each node also maintains a query table containing the state of all outstanding queries that have traversed it, in order to backtrack.
Insert
API: insert(id, file);
Two steps:
- first attempt a “search” for the file to be inserted
- if found, report a collision
- if not found, insert the file by sending it along the query path
  • thus inserted files are placed on nodes already possessing files with similar keys
  • a node probabilistically replaces the originator with itself (why?)
Insert Example
Assume the query returned failure along the shown path (backtracking slightly complicates things); insert f10.
[Figure: insert(10, f10) is issued at n1; nodes n1-n5 each hold a routing table of (id, next_hop, file) entries]
Insert Example
[Figure: the insert reaches n2 with orig = n1; n1 has added the entry (10, n1, f10) to its table]
Insert Example
[Figure: n2 also adds the entry (10, n1, f10) and forwards the insert]
Insert Example
n2 replaces the originator (n1) with itself, so now orig = n2.
[Figure: the insert continues toward n3 with orig = n2]
Insert Example
[Figure: n3 adds an entry pointing to n2 for id 10 and forwards the insert (orig = n2)]
Insert Example
[Figure: the insert reaches n5; n4 adds the entry (10, n2, f10) and n5 adds (10, n4, f10), so f10 is now stored along the query path]
Freenet Analysis
The authors claim the following effects:
- nodes eventually specialize in locating similar keys
  • if a node is listed in a routing table, it will get queries for related keys
  • thus it will gain “experience” answering those queries
- popular data will be transparently replicated and will exist closer to requestors
- as nodes process queries, connectivity increases
  • nodes will discover other nodes in the network
Caveat: lexicographic closeness of file names/keys may not imply content similarity.
Understanding Freenet Self-Organization: the Freenet Graph
We create a Freenet reference graph by
- creating a vertex for each Freenet node
- adding a directed link from A to B if A refers to an item stored at B
Experiment: Freenet Graph: Init
- Assume a network of 1000 nodes, with node ids 0 to 999
- Each node can store 50 data items and 200 references
- Assume initially each node i stores item i, and knows the storage of nodes i-2, i-1, i+1, i+2 (all mod 1000)
- This gives a regular, locally clustered graph with an average path length of ~ 1000 / 8 = 125
[Figure: node i links to i-2, i-1, i+1, i+2 in a ring]
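The ~125 estimate can be checked by a small BFS over such a ring (a sketch under the stated assumptions; by symmetry it suffices to measure distances from node 0):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class RingPathLength {
    // average shortest-path length from node 0 in a ring of n nodes,
    // where each node links to i-2, i-1, i+1, i+2 (all mod n)
    public static double avgPathLength(int n) {
        int[] dist = new int[n];
        Arrays.fill(dist, -1);
        dist[0] = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int d : new int[] { -2, -1, 1, 2 }) {
                int v = Math.floorMod(u + d, n);
                if (dist[v] == -1) {                 // first visit = shortest distance
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        long sum = 0;
        for (int v = 1; v < n; v++) sum += dist[v];
        return (double) sum / (n - 1);               // average over the other n-1 nodes
    }

    public static void main(String[] args) {
        System.out.println(avgPathLength(1000));     // ~125, matching 1000 / 8
    }
}
```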
Experiment: Evolution of the Freenet Graph
At each step:
- pick a node randomly
- flip a coin to determine search or insert
  • if search, randomly pick a key in the network
  • if insert, pick a random key
We then track the evolution of path length and clustering, where clustering is defined as the percentage of local links.
Freenet Evolves to a Small-World Network
With usage, the regular, highly localized Freenet network evolved into an irregular graph.
A high percentage of highly connected nodes provide shortcuts/bridges that make the world a “small world”: most queries only traverse a small number of hops to find the file.
Small-World
First explored by Milgram in 1967: Milgram mailed 160 letters to a set of randomly chosen people in Omaha, Nebraska.
- goal: pass the letters to a given person in Boston
  • each person could only pass the letter to an intermediary known on a first-name basis
  • pick the person who may make the best progress
- result: 42 letters made it through! The median number of intermediaries was 5.5, hence “six degrees of separation”
A potential explanation: highly connected people with non-local links in mostly locally connected communities improve search performance!
Kleinberg’s Result on Distributed Search
Question: how many long-distance links should a node maintain so that distributed (greedy) search is effective?
Assume that the probability of a long link is some inverse power of the number of lattice steps.
Kleinberg's result: an effective distributed algorithm exists only when the probability is proportional to (lattice steps)^-d, where d is the dimension of the space.
Distributed Search
In other words, when the probability is proportional to (lattice steps)^-d, doubling the distance increases the number of neighbors by a constant factor -> see Chord.
Freenet: Properties
Query uses intelligent routing:
- decentralized architecture: robust
- avoids flooding: low overhead
- DFS search guided by closeness to the target
Integration of query and caching makes it adaptive to usage patterns: it reorganizes the network reference structure.
Free speech: attempts to discover/supplant existing files will just spread the files!
Provides publisher anonymity and security: each node probabilistically replaces the originator with itself.
Freenet: Issues
- Does not always guarantee that a file is found, even if the file is in the network
- Good average-case performance, but a potentially long search path in a large network; approaching small-world...
Summary
All of the previous P2P systems are called unstructured P2P systems.
Advantages of unstructured P2P: the algorithms tend to be simple, and can optimize for properties such as locality.
Disadvantages: hard to make performance guarantees; searches may fail even when files exist.
Aside: All Peers Equal?
[Figure: peers connect with widely varying access bandwidths: 56 kbps modems, 1.5 Mbps DSL, 10 Mbps LAN]
Aside: Network Resilience
[Figure: three cases: a partial topology; random failures (30% of nodes die); targeted failures (4% of nodes die)]
From Saroiu et al., MMCN 2002
A Relay TCP Client: a telnet-like Program
[Figure: the TCP client relays between stdin/stdout (fgets/fputs) and the TCP server (writen/readn)]
http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpclient
Method 1: Process and Thread
Process: fork(), waitpid()
Thread (a lightweight process): pthread_create(), pthread_exit()
pthread

void *copy_to(void *arg);

int main() {
    char recvline[MAXLINE + 1];
    ss = new socketstream(sockfd);

    pthread_t tid;
    if (pthread_create(&tid, NULL, copy_to, NULL)) {
        err_quit("pthread_create()");
    }

    while (ss->read_line(recvline, MAXLINE) > 0) {
        fprintf(stdout, "%s\n", recvline);
    }
}

void *copy_to(void *arg) {
    char sendline[MAXLINE];
    if (debug) cout << "Thread created!" << endl;
    while (fgets(sendline, sizeof(sendline), stdin))
        ss->writen_socket(sendline, strlen(sendline));
    shutdown(sockfd, SHUT_WR);
    if (debug) cout << "Thread done!" << endl;
    pthread_exit(0);
}
Method 2: Asynchronous I/O (Select)
select: deal with blocking system calls
int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);
FD_CLR(int fd, fd_set *set);
FD_ZERO(fd_set *set);
FD_ISSET(int fd, fd_set *set);
FD_SET(int fd, fd_set *set);
Examples of Network Programming
A library to make life easier. Four design examples:
- TCP client
- TCP server using select
- TCP server using process and thread
- reliable UDP
Warning: it will be hard to listen to me reading through the code. Read the code.
Example 2: A Concurrent TCP Server Using Process or Thread
Get a line, and echo it back. Uses select(). For how to use process or thread, see later.
Check the code at:
http://zoo.cs.yale.edu/classes/cs433/programming/examples-c-socket/tcpserver
Are there potential denial-of-service problems with the code?