LowTalk.ppt

Optimal Scheduling in Peer-to-Peer Networks Lee Center Workshop 5/19/06 Mortada Mehyar (with Prof. Steven Low, Netlab)

Transcript of LowTalk.ppt

Page 1: LowTalk.ppt

Optimal Scheduling in Peer-to-Peer Networks

Lee Center Workshop 5/19/06

Mortada Mehyar (with Prof. Steven Low, Netlab)

Page 2: LowTalk.ppt

Outline

Brief description of p2p file sharing and Bittorrent protocol

Our model for Bittorrent-like file sharing

Efficiency of scheduling algorithms with respect to different optimality criteria.

Page 3: LowTalk.ppt

About Bittorrent

A p2p protocol started around 2002.

The most popular p2p system: it accounts for 35% of all Internet traffic! (according to the British Web analysis firm CacheLogic)

Warner Brothers to distribute films through Bittorrent (May 2006)

Page 4: LowTalk.ppt

Bittorrent Basics

Divide file into small pieces (256KB).

Utilize all peers’ upload capacities

[Figure: a server uploading the file to several clients.]

Problem: large files (~GB) and large demand (tens, thousands, or more clients). It is not feasible to set up the infrastructure for traditional client-server download.

Page 5: LowTalk.ppt

Bittorrent schematic

[Figure: a seed (peer with the entire file), several peers, a new peer (with the torrent file), and the tracker.]

Page 6: LowTalk.ppt

Bittorrent algorithms: who to upload to?

Tit-for-tat: upload to the peers from which the most data was downloaded in the last 30 seconds (4 peers by default).

Therefore: incentive to upload in order to be chosen by other peers!
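A minimal sketch of the tit-for-tat selection described above, assuming every download is logged as a hypothetical `(timestamp, peer_id, num_bytes)` record. The 30-second window and the 4 upload slots come from the slide; the function name and data layout are illustrative, and real clients apply further rules that this sketch does not model.

```python
from collections import defaultdict

WINDOW_SECONDS = 30.0  # look-back window from the slide
UPLOAD_SLOTS = 4       # default number of peers we upload to

def choose_upload_peers(receipts, now, window=WINDOW_SECONDS, slots=UPLOAD_SLOTS):
    """Return the peers that sent us the most data in the last `window` seconds.

    receipts: iterable of hypothetical (timestamp, peer_id, num_bytes) records
    describing data we downloaded from each peer.
    """
    downloaded = defaultdict(int)
    for timestamp, peer, num_bytes in receipts:
        if now - timestamp <= window:  # only count recent downloads
            downloaded[peer] += num_bytes
    # Most generous peers first; ties broken by peer id so the result is deterministic.
    ranked = sorted(downloaded, key=lambda peer: (-downloaded[peer], peer))
    return ranked[:slots]


receipts = [(100, "A", 500), (110, "B", 900), (60, "C", 5000),
            (115, "D", 300), (118, "E", 700), (119, "F", 100)]
print(choose_upload_peers(receipts, now=120))  # ['B', 'E', 'A', 'D']
```

Peer C sent the most data overall, but outside the 30-second window, so it is not chosen: only recent uploads earn a slot.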

Page 7: LowTalk.ppt

Bittorrent Algorithms: What piece to send?

Rarest-first: upload the piece that is rarest among your neighbors first

[Figure: neighboring peers holding different subsets of pieces 1, 2 and 3.]
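A minimal sketch of the rarest-first choice, assuming each peer's holdings are represented as a Python set of piece indices (a simplified, illustrative representation). Among the pieces we hold and the target neighbor lacks, it picks the one held by the fewest neighbors.

```python
from collections import Counter

def rarest_piece_to_upload(my_pieces, target_pieces, neighbor_pieces):
    """Choose which piece to upload to a target neighbor using rarest-first.

    my_pieces, target_pieces: sets of piece indices.
    neighbor_pieces: list of sets, one per neighbor (the target included).
    Returns None when we hold nothing the target is missing.
    """
    availability = Counter()
    for pieces in neighbor_pieces:
        availability.update(pieces)  # count how many neighbors hold each piece
    candidates = my_pieces - target_pieces
    if not candidates:
        return None
    # Rarest piece first; a Counter reports 0 for pieces no neighbor holds.
    # Ties are broken by the lower piece index.
    return min(candidates, key=lambda p: (availability[p], p))


neighbors = [{1, 2}, {1}, {1, 2}]
print(rarest_piece_to_upload({1, 2, 3}, neighbors[1], neighbors))  # 3
```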

Page 8: LowTalk.ppt

The ‘Broadcasting Model’

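[Figure: the message spreads from the server starting at t = 0; the number of nodes holding it (server included) doubles at each step until all 7 nodes have it at t = 3.]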

M = 1, N = 7, all upload capacities are 1 piece per unit time
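The doubling in this example can be checked with a few lines of Python: with unit upload capacities, every node holding the message (server included) sends one copy per step, so N nodes are reached after ⌈log2(N+1)⌉ steps. The function name is illustrative.

```python
def single_message_broadcast_time(n_nodes):
    """Steps for a server to deliver one message to n_nodes (unit upload capacities)."""
    holders = 1  # only the server holds the message at t = 0
    t = 0
    while holders < n_nodes + 1:
        holders = min(2 * holders, n_nodes + 1)  # each holder uploads one copy
        t += 1
    return t


print(single_message_broadcast_time(7))  # 3, as in the example
```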

Page 9: LowTalk.ppt

Example: M = 2 , N = 3

[Figure: two schedules for M = 2 pieces and N = 3 nodes, shown step by step from t = 0 to t = 4.]

Time: $M + \log N$ (rarest first!) versus $M \cdot \log N$
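To make the comparison concrete, here is a minimal simulation sketch of a greedy rarest-first schedule in the broadcasting model. It is an illustrative heuristic, not the construction of Bar-Noy et al., and it is not claimed to be optimal in general. Beyond the slides, it assumes each node also receives at most one piece per step. The baseline broadcasts each piece to everyone before starting the next, which takes M·⌈log2(N+1)⌉ steps.

```python
def rarest_first_broadcast_time(m_pieces, n_nodes):
    """Steps for a greedy rarest-first schedule to deliver M pieces to N nodes.

    Node 0 is the server and starts with every piece. Assumptions of this
    sketch: each node uploads at most one piece and receives at most one piece
    per step, and a piece received in a step can be forwarded from the next step.
    """
    have = [set(range(m_pieces))] + [set() for _ in range(n_nodes)]
    steps = 0
    while any(len(pieces) < m_pieces for pieces in have[1:]):
        # How many nodes hold each piece at the start of this step.
        count = [sum(p in pieces for pieces in have) for p in range(m_pieces)]
        busy = set()      # receivers already scheduled in this step
        transfers = []    # (receiver, piece) pairs, applied at the end of the step
        for sender in range(n_nodes + 1):
            options = [
                # Prefer the rarest piece, then the receiver holding the fewest pieces.
                (count[p], len(have[r]), p, r)
                for r in range(1, n_nodes + 1)
                if r != sender and r not in busy
                for p in have[sender] - have[r]
            ]
            if options:
                _, _, piece, receiver = min(options)
                busy.add(receiver)
                transfers.append((receiver, piece))
        for receiver, piece in transfers:
            have[receiver].add(piece)
        steps += 1
    return steps


def one_piece_at_a_time_time(m_pieces, n_nodes):
    """Baseline: broadcast each piece to everyone before starting the next."""
    return m_pieces * n_nodes.bit_length()  # bit_length() equals ceil(log2(N + 1))


print(rarest_first_broadcast_time(2, 3), one_piece_at_a_time_time(2, 3))  # 3 4
```

Because the server is scheduled first in every step and holds every piece, each step makes progress, so the loop always terminates.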

Page 10: LowTalk.ppt

Equal capacities, general M, N

Theorem 1: There exists a schedule for a server to broadcast M messages to N nodes in $M + \log N$ time [Bar-Noy et al., 2000].

However, it is very difficult to extend this result to networks with different capacities.

Page 11: LowTalk.ppt

‘Uplink Sharing Model’

1 server, N peers with possibly different capacities.

Suppose upload capacities are the only bottleneck.

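Suppose $M \gg 1$.

[Figure: a server with upload capacity $C_S$ holding file $F$, and peers with upload capacities $C_1$, $C_2$, $C_3$.]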

Page 12: LowTalk.ppt

Optimal Last Finish Time

Theorem 2: The minimal time for all N peers to obtain a file F from a server (the optimal last finish time) is

$$T_L^* = \max\left\{ \frac{F}{C_S},\ \frac{NF}{C_S + \sum_{j=1}^{N} C_j} \right\}$$

where F is the file size and $C_S, C_1, \ldots, C_N$ are the upload capacities. There always exists a schedule $S_0$ such that the finish time vector is $(T_L^*, T_L^*, \ldots, T_L^*)$.
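A one-function sketch that evaluates $T_L^*$ from Theorem 2; the function and argument names are illustrative. The two examples show each term of the max being the binding one.

```python
def optimal_last_finish_time(file_size, server_cap, peer_caps):
    """T_L^* = max(F / C_S, N F / (C_S + sum_j C_j)) from Theorem 2."""
    n = len(peer_caps)
    server_bound = file_size / server_cap                           # server limits
    capacity_bound = n * file_size / (server_cap + sum(peer_caps))  # total capacity limits
    return max(server_bound, capacity_bound)


print(optimal_last_finish_time(12, 6, [1, 1, 1]))  # 4.0 (capacity-limited)
print(optimal_last_finish_time(12, 2, [4, 4, 4]))  # 6.0 (server-limited)
```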

Page 13: LowTalk.ppt

Example (Zero Peer Capacities)

Suppose all peers have 0 capacity, and consider the following two strategies.

Divide capacity equally among peers: finish times $\left(\frac{NF}{C_S}, \frac{NF}{C_S}, \ldots, \frac{NF}{C_S}\right)$

Upload to peers one by one: finish times $\left(\frac{F}{C_S}, \frac{2F}{C_S}, \ldots, \frac{NF}{C_S}\right)$

The last finish time is the same, but the latter is obviously better: every peer except the last finishes earlier. In fact, the latter can be shown to be ‘average finish time’ optimal.
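A small sketch that builds both finish-time vectors with exact fractions (integer inputs assumed; the function name is illustrative). It confirms the last finish times match while the average drops from $NF/C_S$ to $(N+1)F/(2C_S)$.

```python
from fractions import Fraction

def zero_capacity_finish_times(file_size, server_cap, n_peers):
    """Finish-time vectors of the two strategies when every peer has capacity 0."""
    # Equal split: each peer gets C_S / N, so all finish at N F / C_S.
    equal_split = [Fraction(n_peers * file_size, server_cap)] * n_peers
    # One by one: the k-th peer is served at full rate C_S and finishes at k F / C_S.
    one_by_one = [Fraction(k * file_size, server_cap) for k in range(1, n_peers + 1)]
    return equal_split, one_by_one


equal_split, one_by_one = zero_capacity_finish_times(file_size=1, server_cap=1, n_peers=4)
print(max(equal_split), max(one_by_one))          # 4 4
print(sum(equal_split) / 4, sum(one_by_one) / 4)  # 4 5/2
```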

Page 14: LowTalk.ppt

Optimal Average Finish Time (N=3)

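[Figure: for N = 3, plots against time of an average-finish-time-optimal schedule, with the finish times $t_1, t_2, t_3$ compared with $T_L^*$; labels include $C_S$, $C_1 + C_2 + C_3$ and $F$.]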

Page 15: LowTalk.ppt

Conclusion and Ongoing Work

Simple model with rich structure for understanding efficiency of p2p file sharing

It captures many of the issues Bittorrent addresses (e.g., favoring fast peers, the rarest-first policy)

Lots of questions remain open:

understanding the fairness-efficiency tradeoff

other kinds of optimality criteria

Page 16: LowTalk.ppt

Netlab’s other research projects http://netlab.caltech.edu

More details about this work [email protected]

Page 17: LowTalk.ppt

Thank You!

Page 18: LowTalk.ppt

Backup slides start here

Page 20: LowTalk.ppt

Previous Bittorrent Modeling Work

Qiu & Srikant [SIGCOMM ’04]:

Predator-prey-like fluid models

Assumes equal capacities among peers

Assumes rates of peer joins/leaves and studies equilibrium and stability

Page 21: LowTalk.ppt

Proof of Theorem 2

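$$T_L^* \ge \max\left\{ \frac{F}{C_S},\ \frac{NF}{C_S + \sum_{j=1}^{N} C_j} \right\}$$

First notice that both terms have to be lower bounds on the optimal last finish time: every bit of the file must leave the server at least once, which takes at least $F/C_S$, and the N peers must receive $NF$ in total while the total upload capacity is $C_S + \sum_{j=1}^{N} C_j$.

So it remains to show that equality is achievable. Here’s a strategy for that: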

Page 22: LowTalk.ppt

Proof of Theorem 2

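When $C_S \ge \frac{1}{N-1}\sum_{j=1}^{N} C_j$, the server allocates to peer i:

$$r_i = \frac{C_i}{N-1} + \frac{1}{N}\left(C_S - \frac{1}{N-1}\sum_{j=1}^{N} C_j\right), \qquad \sum_{i=1}^{N} r_i = \frac{1}{N-1}\sum_{j=1}^{N} C_j + C_S - \frac{1}{N-1}\sum_{j=1}^{N} C_j = C_S$$

Peer i forwards the first term, $C_i/(N-1)$, to each of the other $N-1$ peers, which uses exactly its capacity $C_i$; the second term carries the same data to every peer and is sent directly by the server.

Each peer therefore receives:

$$\sum_{j=1}^{N} \frac{C_j}{N-1} + \frac{1}{N}\left(C_S - \frac{1}{N-1}\sum_{j=1}^{N} C_j\right) = \frac{C_S + \sum_{j=1}^{N} C_j}{N}$$

so every peer finishes at

$$\frac{NF}{C_S + \sum_{j=1}^{N} C_j} = \max\left\{ \frac{F}{C_S},\ \frac{NF}{C_S + \sum_{j=1}^{N} C_j} \right\} = T_L^*$$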

Page 23: LowTalk.ppt

Proof of Theorem 2

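When $C_S < \frac{1}{N-1}\sum_{j=1}^{N} C_j$, the server allocates to peer i:

$$r_i = \frac{C_S\, C_i}{\sum_{j=1}^{N} C_j}, \qquad \sum_{i=1}^{N} r_i = \frac{C_S \sum_{i=1}^{N} C_i}{\sum_{j=1}^{N} C_j} = C_S$$

Peer i forwards this stream to each of the other $N-1$ peers, which needs $(N-1)\, r_i = \frac{(N-1)\, C_S}{\sum_{j=1}^{N} C_j}\, C_i < C_i$.

Each peer therefore receives $\sum_{i=1}^{N} r_i = C_S$, so every peer finishes at

$$\frac{F}{C_S} = \max\left\{ \frac{F}{C_S},\ \frac{NF}{C_S + \sum_{j=1}^{N} C_j} \right\} = T_L^*$$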
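A sketch that checks both cases of this proof with exact arithmetic. The function `proof_strategy` and its return layout are illustrative; it assumes N ≥ 2 and integer capacities. In the first case it treats the surplus term as one stream the server sends to every peer, and in the second case each peer forwards its whole server stream.

```python
from fractions import Fraction

def proof_strategy(server_cap, peer_caps):
    """Rates used by the strategy in the proof of Theorem 2.

    Returns (server_rates, peer_uploads, receive_rates). Assumes N >= 2 and
    integer (or Fraction) capacities so all arithmetic is exact.
    """
    n = len(peer_caps)
    total = sum(peer_caps)
    if server_cap * (n - 1) >= total:
        # Case C_S >= sum_j C_j / (N - 1).
        relay = [Fraction(c, n - 1) for c in peer_caps]          # forwarded by peer i to all others
        surplus = Fraction(server_cap * (n - 1) - total, n - 1)  # C_S - sum_j C_j / (N - 1)
        common = surplus / n                                     # same data sent to every peer
        server_rates = [r + common for r in relay]
        peer_uploads = [(n - 1) * r for r in relay]              # equals C_i
        receive = [sum(relay) + common] * n                      # all relay streams + common stream
    else:
        # Case C_S < sum_j C_j / (N - 1).
        server_rates = [Fraction(server_cap * c, total) for c in peer_caps]
        peer_uploads = [(n - 1) * r for r in server_rates]       # strictly below C_i
        receive = [sum(server_rates)] * n                        # equals C_S
    return server_rates, peer_uploads, receive


server_rates, peer_uploads, receive = proof_strategy(6, [1, 1, 1])
print(sum(server_rates), 12 / receive[0])  # 6 4
```

With F = 12, the finish time 4 matches $T_L^* = \max\{12/6,\ 36/9\}$.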

Page 24: LowTalk.ppt

Bittorrent Basics

Torrent file:

Meta data about the file: filename, size, author, etc.

Hash info for each file piece to verify integrity

Link to centralized tracker

Published on the Web

Tracker:

Keeps track of the IPs of peers

‘Bootstraps’ new peers

Centralized, but does not coordinate data transfer among peers
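A sketch of the per-piece hash info: BitTorrent (v1) torrent files store a 20-byte SHA-1 digest for each piece, and a peer checks every downloaded piece against it. The helper names below are illustrative, and the 256KB piece size is the one quoted in this talk.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # 256KB, the piece size used in this talk

def piece_hashes(data, piece_size=PIECE_SIZE):
    """Split data into fixed-size pieces (the last may be shorter) and hash each one."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]

def verify_piece(index, piece, hashes):
    """Check a downloaded piece against the hash recorded for it."""
    return hashlib.sha1(piece).digest() == hashes[index]


data = bytes(600 * 1024)  # 600KB of zeros: pieces of 256KB, 256KB and 88KB
hashes = piece_hashes(data)
print(len(hashes))                                     # 3
print(verify_piece(2, data[2 * PIECE_SIZE:], hashes))  # True
print(verify_piece(2, b"corrupted", hashes))           # False
```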

Page 25: LowTalk.ppt

P2P systems

Napster (centralized directory)

Kazaa (semi-decentralized system with super peers)

Gnutella (decentralized; e.g. LimeWire, BearShare)

Bittorrent (most popular and successful for distribution of large files)

Page 26: LowTalk.ppt

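Another way to look at $T_L^*$

$$T_L^* = \begin{cases} \dfrac{F}{C_S}, & \text{if } C_S < \dfrac{1}{N-1}\sum_{j=1}^{N} C_j,\\[2ex] \dfrac{NF}{C_S + \sum_{j=1}^{N} C_j}, & \text{if } C_S \ge \dfrac{1}{N-1}\sum_{j=1}^{N} C_j. \end{cases}$$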
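A quick numerical check that this piecewise form agrees with the max form of Theorem 2, using exact fractions on random integer capacities. The function names and the random ranges are illustrative.

```python
import random
from fractions import Fraction

def last_finish_time_max(F, cs, caps):
    """Theorem 2: max(F / C_S, N F / (C_S + sum_j C_j))."""
    return max(Fraction(F, cs), Fraction(len(caps) * F, cs + sum(caps)))

def last_finish_time_piecewise(F, cs, caps):
    """Piecewise form, split on whether C_S < sum_j C_j / (N - 1)."""
    n, total = len(caps), sum(caps)
    if cs * (n - 1) < total:  # same test as C_S < sum_j C_j / (N - 1), without division
        return Fraction(F, cs)
    return Fraction(n * F, cs + total)


rng = random.Random(0)
for _ in range(1000):
    caps = [rng.randint(0, 10) for _ in range(rng.randint(2, 6))]
    F, cs = rng.randint(1, 100), rng.randint(1, 10)
    assert last_finish_time_max(F, cs, caps) == last_finish_time_piecewise(F, cs, caps)
print("piecewise and max forms agree")
```

The two forms agree because $F/C_S \ge NF/(C_S + \sum_j C_j)$ exactly when $\sum_j C_j \ge (N-1)\,C_S$.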

Page 27: LowTalk.ppt

Non-Zero Peer Capacities

If the peer capacities are not all 0, then the “upload one by one” strategy can be shown to result in these finish times:

$$\left( \frac{F}{C_S},\ \frac{2F}{C_S + C_1},\ \frac{3F}{C_S + C_1 + C_2},\ \ldots,\ \frac{NF}{C_S + \sum_{j=1}^{N-1} C_j} \right)$$

However… this is not average finish time optimal!
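A short sketch that evaluates these finish times as stated (it does not check the conditions under which the strategy attains them) and confirms that with all peer capacities 0 they reduce to the zero-capacity vector $kF/C_S$. The function name is illustrative, and integer inputs are assumed for exact fractions.

```python
from fractions import Fraction

def one_by_one_finish_times(file_size, server_cap, peer_caps):
    """Finish times k F / (C_S + C_1 + ... + C_{k-1}) for k = 1..N."""
    times = []
    rate = server_cap  # C_S + C_1 + ... + C_{k-1}
    for k, cap in enumerate(peer_caps, start=1):
        times.append(Fraction(k * file_size, rate))
        rate += cap  # peer k's capacity counts toward the later finish times
    return times


print([str(t) for t in one_by_one_finish_times(6, 2, [1, 1, 0])])  # ['3', '4', '9/2']
print([str(t) for t in one_by_one_finish_times(1, 1, [0, 0, 0])])  # ['1', '2', '3']
```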

Page 28: LowTalk.ppt

Comparing finish time vectors

Definition: a finish time vector $v_1$ is strictly better than another finish time vector $v_2$ if no component of $v_1$ is larger than the corresponding component of $v_2$, and at least one component of $v_1$ is smaller.

- (2, 3, 3) is strictly better than (3, 3, 3)

- (1, 2, 3) and (2, 2, 2) cannot be compared with respect to this definition
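The definition translates directly into code; this sketch (illustrative function names) reproduces both examples above.

```python
def strictly_better(v1, v2):
    """v1 is strictly better than v2: no component larger, at least one smaller."""
    if len(v1) != len(v2):
        raise ValueError("finish time vectors must have the same length")
    return all(a <= b for a, b in zip(v1, v2)) and any(a < b for a, b in zip(v1, v2))

def comparable(v1, v2):
    """True if the vectors are equal or one is strictly better than the other."""
    return tuple(v1) == tuple(v2) or strictly_better(v1, v2) or strictly_better(v2, v1)


print(strictly_better((2, 3, 3), (3, 3, 3)))  # True
print(comparable((1, 2, 3), (2, 2, 2)))       # False
```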

Page 29: LowTalk.ppt

The ‘Broadcasting Model’

Assume a discrete-time, synchronous system in which N nodes have equal upload capacity of 1 “message” per unit time

Objective: find a schedule such that every node receives all M messages in minimal time

Page 30: LowTalk.ppt

Assumptions reasonable for p2p

The piece size (256KB for BT) is usually much smaller than the total file size (~GB); that is, the number of pieces $M \gg 1$.

Upload links are usually much slower than download links (e.g. DSL lines), so assume upload capacities are the only bottleneck.
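A one-line check of the $M \gg 1$ claim, assuming "1 GB" means $2^{30}$ bytes and using the 256KB piece size quoted above; the helper name is illustrative.

```python
PIECE_SIZE = 256 * 1024  # 256KB pieces, as quoted for BT

def num_pieces(file_size, piece_size=PIECE_SIZE):
    """Number of pieces M, rounding up for a final partial piece."""
    return -(-file_size // piece_size)  # integer ceiling division


print(num_pieces(2**30))  # 4096 pieces for a 1 GiB file
```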