INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.

Post on 18-Jan-2018

INTERNET TECHNOLOGIES

Week 10 Peer to Peer Paradigm

1

Introduction

• Client-servers will be discussed next week

• Peer to Peer this week.

2

[Figure labels: Server, Clients, Simultaneous Server/Clients]

Introduction

• First instance of peer-to-peer file sharing dates back to December 1987

• Wayne Bell created WWIVnet

• Still exists: http://bbs.filenet.wwiv.net

• Other systems now exist.

3

P2P Networks

• Internet users that are ready to share their resources become peers and form a network

• When a peer in the network has a file to share, it makes it available to the rest of the peers

• An interested peer can connect itself to the computer where the file is stored and download it.

4

Centralised Network

• Hybrid P2P network

• Directory system (listing peers and what they offer) located on a central server (client-server paradigm)

• Storage and downloading occur via the P2P paradigm

• Peer queries the central server

• Server sends IP addresses of nodes holding files

• Peer then downloads files from those nodes

• Directory constantly updated as nodes join and leave the network.
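
The query flow above can be sketched in a few lines of Python. This is only an illustration of the idea, not the protocol of any real system: the class name `CentralDirectory`, its methods, and the example addresses and file names are all hypothetical.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical central index: maps a file name to the peers offering it."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def join(self, peer_ip, files):
        """A peer joins the network and lists what it offers."""
        for name in files:
            self._peers_by_file[name].add(peer_ip)

    def leave(self, peer_ip):
        """A peer leaves; remove it from every listing."""
        for peers in self._peers_by_file.values():
            peers.discard(peer_ip)

    def query(self, name):
        """Return the IP addresses of peers holding the file (possibly none)."""
        return sorted(self._peers_by_file.get(name, ()))


directory = CentralDirectory()
directory.join("10.0.0.1", ["song.mp3", "notes.txt"])  # made-up peers and files
directory.join("10.0.0.2", ["song.mp3"])

holders = directory.query("song.mp3")  # the download itself would then go peer to peer
directory.leave("10.0.0.1")
holders_after_leave = directory.query("song.mp3")
```

Only the lookup goes through the server; the file transfer is left to the peers, which is why the design is called hybrid.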

5

Centralised Network

• Maintenance of directory very simple

• Drawbacks:

• Directory vulnerable to attack

• Whole system fails if servers go down

• Original Napster used a centralised network

• Made them liable for copyright breaches

• New Napster a legal pay per music site.

6

Figure 29.1: Centralised network

7

Decentralised Network

• Peers arrange themselves into an overlay network

• Logical network on top of the physical network

• Can be classified as:

• Unstructured networks

• Structured networks.

8

Unstructured Network

• Nodes linked randomly

• Queries need to flood network

• Can result in high traffic, i.e. not efficient

• Examples include:

• Gnutella

• Freenet.

9

Gnutella

• Unstructured decentralised P2P network

• Directory randomly distributed between nodes

• Node A sends a query (request for file location) to a known neighbour node (e.g. W)

• If Node W knows the location of the requested data, it sends the location back to Node A

• If Node W doesn't know, it sends the query on to all its known neighbours

• Eventually the info gets back to A (if it exists) and Node A can get a copy of the file.
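
A rough sketch of the flooding idea, assuming a small made-up overlay. The node names other than A and W, the file name, and the function `flood_query` are hypothetical. The sketch is a simplification: it only counts forwarded queries, and it leaves out details of the real protocol such as hop limits and routing the answer back to Node A.

```python
from collections import deque


def flood_query(neighbours, holders, start, wanted):
    """Flood a query outwards from `start`.

    Returns (first node found that knows `wanted`, or None; messages sent).
    """
    seen = {start}
    queue = deque([start])
    messages = 0
    while queue:
        node = queue.popleft()
        if wanted in holders.get(node, ()):
            return node, messages
        for nxt in neighbours.get(node, ()):
            messages += 1  # every forwarded query costs one message
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None, messages


# Hypothetical overlay: A knows only W; W knows A, X and Y; and so on.
neighbours = {
    "A": ["W"],
    "W": ["A", "X", "Y"],
    "X": ["W", "Z"],
    "Y": ["W"],
    "Z": ["X"],
}
holders = {"Z": {"file.txt"}}

found, messages = flood_query(neighbours, holders, "A", "file.txt")
not_found, wasted = flood_query(neighbours, holders, "A", "other.txt")
```

Even in this five-node overlay most messages go to nodes that cannot help, which is the traffic problem flooding causes.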

10

Gnutella

• Queries flood the network and can cause a large amount of traffic

• NB each node must have at least 1 neighbour

• On initial software install, a list of peers is included

• Later the commands 'ping' and 'pong' are used to query whether nodes are 'alive'

• Unstructured networks do not scale well

• Gnutella uses a tiered system (ultra nodes and leaves) as well as Query Routing Protocol and Dynamic Querying to reduce overhead.

11

Structured Network

• Predefined set of rules to link nodes

• Queries are resolved effectively and efficiently

• Distributed Hash Table (DHT) most common technique used

• Domain Name System (DNS)

• BitTorrent.

12

Distributed Hash Table (DHT)

• Distributes data among a set of nodes according to some predefined rules

• Each peer in a DHT-based network becomes responsible for a range of data items

• DHT-based networks allow each peer to have partial knowledge about whole network

• Avoids flooding overhead found in unstructured P2P networks.

13

Address Space

• Each data item and responsible peer mapped to a point in a large address space of size 2^m

• Uses modular arithmetic

• Points in address space distributed evenly on a circle with 2^m points (from 0 to 2^m - 1)

• Most DHT implementations use m = 160 (~1.5 × 10^48 points)

• Textbook uses m = 5, 2^5 = 32, in examples for simplification.
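
The modular arithmetic can be tried out directly. This is a minimal sketch using the simplified ring with m = 5; the helper name `clockwise_distance` is hypothetical.

```python
M = 5            # simplified value used in the examples; most DHTs use m = 160
SIZE = 2 ** M    # 32 points, numbered 0 to 31


def clockwise_distance(a, b, size=SIZE):
    """Number of steps from point a to point b, moving clockwise round the ring."""
    return (b - a) % size


wrap = (30 + 5) % SIZE                  # 5 steps on from point 30 wraps past 0
d_forward = clockwise_distance(29, 2)   # crosses the boundary between 31 and 0
d_back = clockwise_distance(2, 29)      # the long way round
full_space = 2 ** 160                   # number of points when m = 160
```

Distances on the ring depend on direction: going from 29 to 2 is a short hop, while going from 2 to 29 covers most of the circle.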

14

Figure 29.2: Address space

15

Hashing Identifiers

• Peers added to address space ring

• Usually use a hash function to encode IP address

• A hash function is any function that can be used to map digital data of arbitrary size to digital data of fixed size

• node ID = hash (Peer IP address)

• Name of object (e.g. filename) also hashed and added to address space ring

• key = hash (Object name)
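
A sketch of how identifiers might be produced. The slides do not name a hash function; SHA-1 is assumed here only because its 160-bit output matches m = 160, and the helper `ring_point` is hypothetical. The values it returns will not match the small textbook numbers, which are chosen for illustration.

```python
import hashlib

M = 160  # SHA-1 gives a 160-bit digest (assumed choice of hash function)


def ring_point(text, m=M):
    """Hash a string onto a ring of 2**m points."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = ring_point("110.34.56.20")      # node ID = hash(peer IP address)
key = ring_point("Liberty")               # key = hash(object name)
small_key = ring_point("Liberty", m=5)    # same idea on a 32-point ring
```

Because node IDs and keys come from the same function, peers and objects share one address space and can be compared directly.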

16

Storing Objects

• Two strategies:

• Direct: object itself stored on the peer whose ID is closest to the key

• Indirect: original peer keeps the object; a reference to it is stored on another peer close to the key

• Indirect is the most common strategy.

17

Example 29.1

• For Figure 29.3, assume several peers have already joined

• Node N5 (IP address 110.34.56.20) has file 'Liberty' to share with peers

• Node makes hash of filename 'Liberty' to get key = 14

• Closest node to key 14 is node N17

• N5 creates reference to filename (key), its IP address, and the port number etc., then sends reference to be stored in node N17

• i.e. file stored in N5, key of file is k14 (a point in the DHT ring), but reference to file stored in node N17.
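
The example can be replayed as a sketch of the indirect strategy. Several things here are assumptions: "closest" is taken to mean the first node at or clockwise after the key, the node IDs other than N5 and N17 are made up, and the port number 5200 and the function names are hypothetical.

```python
from bisect import bisect_left

M = 5  # 32-point ring, as in the simplified examples


def responsible_node(key, node_ids):
    """Return the first node ID at or clockwise after `key` (wrapping past 0)."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key % 2 ** M)
    return ring[i % len(ring)]


nodes = [2, 5, 10, 17, 25, 29]           # hypothetical, apart from N5 and N17
references = {n: {} for n in nodes}      # what each node stores on behalf of others


def publish(key, owner_ip, port):
    """Store a reference (not the file) on the node responsible for `key`."""
    target = responsible_node(key, nodes)
    references[target][key] = (owner_ip, port)
    return target


stored_at = publish(14, "110.34.56.20", 5200)  # N5 shares 'Liberty', key = 14
```

The file never leaves N5; a peer that later looks up key 14 is given N5's address by N17 and downloads from N5 directly.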

18

Figure 29.3: Example 29.1

19

Distributed Hash Table (DHT)

• Main function is to route a query to node responsible for storing reference to an object

• Different routing strategies are used by different systems

• All rely on nodes that have only partial knowledge of the ring, each forwarding a query to a node closer to the responsible node

• All implementations need to handle departures and arrivals of peers in their networks.
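
To make "partial knowledge" concrete, here is a deliberately naive routing sketch in which each node knows only its immediate clockwise successor. It does not reproduce any of the real protocols, which keep more routing entries to cut the number of hops. The node IDs, the rule that a key belongs to the first node at or after it, and the function names are all assumptions.

```python
def in_half_open(x, a, b, size):
    """True if x lies in the ring interval (a, b], moving clockwise from a."""
    span = (b - a) % size or size
    return 0 < (x - a) % size <= span


def route(start, key, ring, size=32):
    """Forward a query clockwise until the node responsible for `key` is reached.

    Returns the list of nodes the query visits, starting with `start`.
    """
    ring = sorted(ring)
    successor = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
    path = [start]
    current = start
    for _ in range(len(ring)):
        if key == current:
            break                      # this node is itself responsible
        nxt = successor[current]
        path.append(nxt)
        if in_half_open(key, current, nxt, size):
            break                      # successor is responsible for the key
        current = nxt
    return path


nodes = [2, 5, 10, 17, 25, 29]          # hypothetical node IDs on a 32-point ring
path_to_14 = route(5, 14, nodes)        # query for key 14 issued at node 5
path_wrapping = route(25, 1, nodes)     # query that has to pass point 0
```

With successor-only knowledge the hop count grows with the number of nodes, which is the cost the different routing strategies aim to reduce.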

20

P2P Networks

• Three P2P protocols that use DHT:

• Chord protocol: simple and elegant approach to routing queries

• Pastry protocol: more complex than Chord

• Kademlia protocol: similar to Pastry, but with a different distance measure.

21

Chord

• Published by Stoica et al. in 2001

• Used in several applications:

• Cooperative File System (CFS)

• ConChord

• Distributive Domain Name System (DDNS).

22

Pastry

• Another popular protocol in the P2P paradigm

• Designed by Rowstron and Druschel in 2001

• Uses DHT

• Some fundamental differences between Pastry and Chord in identifier space and routing process.

23

Pastry

• Used in some applications:

• PAST: distributed file system

• SCRIBE: decentralised publish/subscribe system.

24

Kademlia

• Another DHT peer-to-peer network

• Designed by Maymounkov and Mazières in 2002

• Similar to Pastry, routes messages based on the distance between nodes

• Address space based on a binary tree

• Distance between two identifiers measured with the bitwise XOR function.
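
The XOR metric is easy to try out. This is a small sketch; the 5-bit node IDs and the helper names are hypothetical.

```python
def xor_distance(a, b):
    """Kademlia-style distance between two identifiers: their bitwise XOR."""
    return a ^ b


def closest_node(key, node_ids):
    """Return the node ID with the smallest XOR distance to `key`."""
    return min(node_ids, key=lambda n: xor_distance(n, key))


nodes = [2, 5, 10, 17, 25, 29]       # hypothetical 5-bit node IDs
d = xor_distance(5, 17)              # 00101 XOR 10001 = 10100
nearest = closest_node(14, nodes)    # 01110 differs from 01010 in one bit only
```

The distance is symmetric and is zero only between an identifier and itself. Identifiers that share a long common prefix are close, which is what ties the metric to the binary-tree view of the address space.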

25

Kademlia

26

BitTorrent

• Designed by Bram Cohen (2001) for sharing large files among a set of peers

• Trackerless versions use a DHT based on Kademlia

• Sharing different from other file-sharing protocols

• Instead of one peer allowing another peer to download the whole file, a group of peers takes part in the process to give all peers in the group a copy of the file

• File sharing is a collaborative process called a torrent.
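
The collaborative idea can be illustrated with a toy simulation in which a file is cut into pieces and peers copy missing pieces from one another. This is a heavy simplification under stated assumptions: the peer names, the piece size and the one-piece-per-round rule are made up, and the piece-selection and fairness rules of the real protocol are not modelled.

```python
def split_into_pieces(data, piece_size):
    """Cut the file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def exchange_round(swarm):
    """Each peer copies one missing piece from any peer that has it.

    `swarm` maps a peer name to {piece index: piece bytes}.
    Returns the number of pieces transferred in this round.
    """
    snapshot = {peer: set(have) for peer, have in swarm.items()}
    transfers = 0
    for peer, have in swarm.items():
        for other, other_pieces in snapshot.items():
            missing = sorted(other_pieces - set(have))
            if other != peer and missing:
                have[missing[0]] = swarm[other][missing[0]]
                transfers += 1
                break
    return transfers


data = b"abcdefghij"
pieces = split_into_pieces(data, 3)           # 4 pieces

# Hypothetical swarm: no peer starts with the whole file.
swarm = {
    "A": {0: pieces[0], 1: pieces[1]},
    "B": {2: pieces[2], 3: pieces[3]},
    "C": {},
}

rounds = 0
while exchange_round(swarm) > 0:
    rounds += 1

rebuilt = {peer: b"".join(have[i] for i in sorted(have)) for peer, have in swarm.items()}
```

In this toy swarm A and B complete each other's copies while C collects pieces from both, so every peer ends with the whole file even though none had it at the start.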

27

BitTorrent with a Tracker

• Original BitTorrent

• Another entity in a torrent, called 'the tracker'

• Central server tracks seeds and peers in swarm

• Seeds: peers with the whole file

• Leeches: peers with part of the data (downloading more).
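
A minimal in-memory model of the tracker's bookkeeping, to show the seed/leech distinction. It does not implement the real tracker protocol; the class `Tracker`, its methods and the peer names are hypothetical.

```python
class Tracker:
    """Hypothetical tracker: records how many pieces each peer in the swarm holds."""

    def __init__(self, total_pieces):
        self.total_pieces = total_pieces
        self._pieces_held = {}

    def announce(self, peer, pieces_held):
        """A peer reports its progress (and joins the swarm if it is new)."""
        self._pieces_held[peer] = pieces_held

    def seeds(self):
        """Peers that hold the whole file."""
        return sorted(p for p, n in self._pieces_held.items() if n == self.total_pieces)

    def leeches(self):
        """Peers that hold only part of the file and are still downloading."""
        return sorted(p for p, n in self._pieces_held.items() if n < self.total_pieces)

    def swarm(self):
        """Everyone the tracker currently knows about."""
        return sorted(self._pieces_held)


tracker = Tracker(total_pieces=4)
tracker.announce("peer-1", 4)   # has every piece
tracker.announce("peer-2", 1)
tracker.announce("peer-3", 0)
tracker.announce("peer-2", 4)   # peer-2 finishes downloading and becomes a seed
```

The tracker holds no file data; it only tells peers who else is in the swarm, so the transfer itself stays peer to peer.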

28

29

Figure 29.12: Example of a torrent

Trackerless BitTorrent

• In the original BitTorrent design, if the tracker fails, new peers cannot connect to the network and updating is interrupted

• New implementations of BitTorrent eliminate the need for a centralised tracker.

30

End

31