1 Peer-To-Peer Data Management Hector Garcia-Molina ICDE Conference, February 28, 2002.


1

Peer-To-Peer Data Management

Hector Garcia-Molina

ICDE Conference, February 28, 2002

2

What is P2P?

napster

gnutella

morpheus

kazaa

bearshare

seti@home

folding@home

ebay

limewire

icq

fiorana

mojo nation

jxta

united devices

open cola

uddi

process tree

can

chord

ocean store

farsite

pastry

tapestry

?grove

netmeeting

freenet

popular power

aim

jabber

3

Napster

central index

join

query

answer

get

file

...
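
The flow labelled on this slide (join, query, answer, get file) can be sketched as follows. This is a minimal illustration of the central-index idea, not Napster's actual protocol; the class name `CentralIndex`, its methods, and the sample peers and files are all assumptions made for the example.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central index: maps a file name to the peers that hold it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def join(self, peer, files):
        # A joining peer uploads the list of files it is willing to share.
        for name in files:
            self._holders[name].add(peer)

    def query(self, name):
        # The answer is a list of peers; the file never passes through the index.
        return sorted(self._holders.get(name, set()))


index = CentralIndex()
index.join("peer_a", ["song1.mp3", "song2.mp3"])
index.join("peer_b", ["song2.mp3"])

holders = index.query("song2.mp3")
# "get file" would now be a direct transfer from one of the peers in `holders`.
print(holders)
```

Only the lookup is centralized in this sketch; the transfer itself stays between peers.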

4

Gnutella

query
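
With no central index, a query is flooded from neighbor to neighbor. The sketch below assumes a breadth-first flood limited by a time-to-live (TTL) and counts the messages it costs; the function name `flood_query` and the four-node sample network are hypothetical.

```python
from collections import deque


def flood_query(graph, files, start, wanted, ttl):
    """Breadth-first flood: forward the query to every neighbor until the TTL runs out."""
    seen = {start}
    hits = []
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)
        if remaining == 0:
            continue
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs a message
            if neighbor not in seen:  # duplicate copies are dropped on arrival
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical network: a square a-b-d-c-a, with the wanted file stored at d.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
files = {"d": {"x"}}
print(flood_query(graph, files, "a", "x", ttl=2))
```

Even on this tiny network most of the messages are redundant copies, which is the cost that later slides try to reduce.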

5

Morpheus

...

super peer

6

Seti@Home

satellite dish

...

raw data chunk

analyzed data

central site

7

Lockss

library A

library B

library C

library E

library D

D1

D2

D3

8

PeerCast

before:

Stanford

source

after:

Stanford

source

9

What is a P2P System?

Multiple sites (at edge)

Distributed resources

Sites are autonomous (different owners)

Sites are both clients and servers

Sites have equal functionality

P2P Purity

10

P2P is BAD IDEA!!

Distribution is expensive!

Specialized functionality is good!

11

Example: Distributed Data Management

Distribution is expensive

If you must distribute:

build centralized directory, index

• use backups for reliability

for replicated data, use primary copy

• use backups for reliability

12

Computational Efficiency is NOT Main Goal

Main driving force in a P2P system:

exploiting existing (often free) resources

sharing costs among many

legal protection

autonomy

anonymity

13

Should We Do P2P Research?

Should we help people break the law?

Analogy: Should we develop pillows, knives, hammers, drugs, bath tubs, cars, airplanes, ... ??

14

Should We Do P2P Research?

YES: P2P not exclusively for breaking law

Remember the VCR

YES: P2P can liberate us from culture “plantation owners” (Lessig)

15

Is “Free Culture” Feasible?

Example: Legal texts

Can we afford it?

economic activity

rules of the game

today

16

Should DB community work on P2P?

YES

17

P2P Challenges

Easier to list NON-Research-Topics:

Color schemes for P2P Nodes

Impact of P2P on Moroccan 15th Century Literature

18

P2P Challenges

Search

Resource Management

Security & Privacy

19

Search Taxonomy

[Figure: taxonomy chart]

search type: lookup, content queries

scope of index: single site, regional, global

systems placed: freenet, gnutella, napster, morpheus, can, routing, replicated SP, partial

20

Index Implementation Taxonomy

[Figure: taxonomy chart]

nature of index: centralized, distributed, P2P

index location correlated with content location: yes, no

systems placed: freenet, gnutella, napster, morpheus, can, routing, replicated SP, partial

21

Content Addressable Network (CAN)

1

2

Nodes

Data
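
The idea behind a content addressable network is that both nodes and data map into the same coordinate space, so a key's hash decides which node stores it. The sketch below assumes a two-dimensional unit square split into four fixed zones; the zone layout, node names, and function names are hypothetical, and the hop-by-hop routing between neighboring zones that a real CAN performs is left out.

```python
import hashlib


def key_to_point(key):
    """Hash a key to a point in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return x, y


# Hypothetical zones: node -> (x_min, x_max, y_min, y_max); together they tile the square.
ZONES = {
    "node1": (0.0, 0.5, 0.0, 0.5),
    "node2": (0.5, 1.0, 0.0, 0.5),
    "node3": (0.0, 0.5, 0.5, 1.0),
    "node4": (0.5, 1.0, 0.5, 1.0),
}


def owner(key):
    """Return the node whose zone contains the key's point."""
    x, y = key_to_point(key)
    for node, (x0, x1, y0, y1) in ZONES.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise ValueError("zones do not cover the space")


print(owner("some-file-name"))
```

Because the placement depends only on the hash, any node can compute where a key lives without consulting an index.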

22

Can We Improve Flooding?

[Figure: taxonomy chart]

nature of index: centralized, distributed, P2P

index location correlated with content location: yes, no

systems placed: freenet, gnutella, napster, morpheus, can, routing, replicated SP, partial

23

Directed BFS in Gnutella

Heuristics for Selecting Direction

>RES: Returned most results

<TIME: Shortest satisfaction time

<HOPS: Min hops for results

>MSG: Sent us most messages (all types)

<QLEN: Shortest queue

<LAT: Shortest latency

>DEG: Highest degree

query?...
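
A directed BFS forwards the query to a few well-chosen neighbors instead of all of them. The sketch below shows how three of the listed heuristics could rank neighbors; the statistics, their field names, and the function `select_neighbors` are assumptions for illustration, not measurements from the talk.

```python
# Hypothetical per-neighbor statistics a node might keep from past queries.
stats = {
    "n1": {"results": 40, "latency_ms": 120, "degree": 3},
    "n2": {"results": 5, "latency_ms": 30, "degree": 9},
    "n3": {"results": 22, "latency_ms": 75, "degree": 5},
}

# Each heuristic is (statistic, prefer_larger); ">" on the slide means larger is better.
HEURISTICS = {
    ">RES": ("results", True),
    "<LAT": ("latency_ms", False),
    ">DEG": ("degree", True),
}


def select_neighbors(stats, heuristic, k):
    """Return the k neighbors ranked best under the chosen heuristic."""
    field, prefer_larger = HEURISTICS[heuristic]
    ranked = sorted(stats, key=lambda n: stats[n][field], reverse=prefer_larger)
    return ranked[:k]


print(select_neighbors(stats, ">RES", 2))
print(select_neighbors(stats, "<LAT", 1))
```

The remaining heuristics (<TIME, <HOPS, >MSG, <QLEN) would fit the same table, given the corresponding statistic.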

24

How Does One Evaluate?

Live Gnutella?

Use real Gnutella as “laboratory”

25

Time to Satisfaction for Directed BFS

26

Routing Index

[Figure: nodes A, B, C, and D, each with a routing index table whose columns are DB and AI and whose rows are labelled by node; a query Q(DB) is routed through the network]
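
A routing index records, per neighbor, how many documents on each topic can be reached through that neighbor, so a query such as Q(DB) can be sent toward the most promising one. In the sketch below the counts are invented for illustration and are not the values on the slide; `rank_neighbors` is a hypothetical helper.

```python
# Hypothetical routing index at one node: neighbor -> documents reachable per topic.
routing_index = {
    "B": {"DB": 20, "AI": 65},
    "C": {"DB": 50, "AI": 25},
    "D": {"DB": 0, "AI": 15},
}


def rank_neighbors(routing_index, topic):
    """Order neighbors by how many documents on the topic lie behind them."""
    candidates = [(counts.get(topic, 0), n) for n, counts in routing_index.items()]
    # Drop neighbors with nothing to offer, then sort best first.
    return [n for count, n in sorted(candidates, reverse=True) if count > 0]


print(rank_neighbors(routing_index, "DB"))
```

The query goes to the first neighbor in the ranking and falls back to the next one only if the first does not yield enough results.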

27

Types of Routing Indexes

Compound

Hop Count

Exponential Decay

Strategies for Cycles

Ignore (for Hop-Count, exponential)

Avoid Update Cycles

Detect Update Cycles and Recover
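
One way to read the difference between the hop-count and exponential-decay variants: a hop-count index keeps a separate document count for each distance, while an exponential index collapses those counts into a single score that discounts distant documents. The sketch below assumes the discount is a division by the fanout raised to the hop distance; that formula, the function name, and the sample counts are assumptions, not taken from the slide.

```python
def exponential_aggregate(docs_per_hop, fanout):
    """Collapse per-hop document counts into one score, discounting distant hops.

    docs_per_hop[0] is the count one hop away, docs_per_hop[1] two hops away, etc.
    """
    return sum(count / fanout**j for j, count in enumerate(docs_per_hop))


# Hypothetical hop-count entries for two neighbors (documents at 1, 2, and 3 hops).
near_heavy = [10, 20, 40]
far_heavy = [0, 0, 40]

print(exponential_aggregate(near_heavy, fanout=2))
print(exponential_aggregate(far_heavy, fanout=2))
```

Under this assumption the neighbor with nearby documents scores higher even though both lead to 40 documents at three hops.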

28

Effect of Index Compression

[Figure: chart of Messages (0 to 600) versus Index Compression (0%, 50%, 67%, 75%, 80%, 83%); series: CRI, HRI, ERI, No-RI]

29

Effect of Network Topology

[Figure: chart of Messages (0 to 700) for CRI, HRI, ERI, and No RI; series: Tree, Tree+Cycle, Powerlaw]

30

Resource Management

Resource:

storage (lockss)

CPU processing (seti@home)

bandwidth (PeerCast)

Issues:

fairness

load balancing

31

Example: Data Trading

site 1 site 2 site 3

A1 B1 C1

A2 B2 C2

B1 A1

trade

B2 A2

trade

32

Example: Data Trading

site 1 site 2 site 3

A1 B1 C1

A2 B2 C2

B1 A1

trade

C1

A2 trade

C2 B2

trade

33

Data Trading

Order of trades impacts reliability

Issues:

Swaps vs. Deeds

Fixed price vs. bids

Preference to

• sites with a lot of space?

• reliable sites?

• “desperate” sites?
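
The claim that the order of trades impacts reliability can be illustrated with the three sites from the two preceding example slides. The sketch assumes each site has exactly two free slots and that a swap needs a free slot on both sides; the slot count, the tuple layout, and the function `run_swaps` are assumptions for illustration.

```python
def run_swaps(owned, free_slots, swaps):
    """Apply swaps in order; a swap succeeds only if both sites still have a free slot.

    Returns the set of items that ended up with a remote copy.
    """
    free = dict.fromkeys(owned, free_slots)
    replicated = set()
    for site_a, item_a, site_b, item_b in swaps:
        if free[site_a] > 0 and free[site_b] > 0:
            free[site_a] -= 1  # site_a stores a copy of item_b
            free[site_b] -= 1  # site_b stores a copy of item_a
            replicated.update((item_a, item_b))
    return replicated


owned = {1: ["A1", "A2"], 2: ["B1", "B2"], 3: ["C1", "C2"]}

# Sites 1 and 2 trade twice first, filling up before site 3 gets a turn.
greedy = [(1, "A1", 2, "B1"), (1, "A2", 2, "B2"), (1, "A1", 3, "C1"), (2, "B1", 3, "C2")]
# Each pair of sites trades once.
spread = [(1, "A1", 2, "B1"), (1, "A2", 3, "C1"), (2, "B2", 3, "C2")]

print(sorted(run_swaps(owned, 2, greedy)))
print(sorted(run_swaps(owned, 2, spread)))
```

In the first ordering site 3 finds no partner with space left, so C1 and C2 stay unreplicated; in the second every item gets a remote copy.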

34

Effect of Bid Policies

[Figure: chart of MTTF difference vs. FixedPrice (percent, -50 to 350) versus local storage factor F (2 to 5); series: FreeSpace, InverseRareCollection, RareCollection, UsedSpace]

bid more (ask more in return) when I have more free space

bid more (ask more in return) when I have less free space

35

Effect of One Maverick Site

[Figure: chart of MTTF difference vs. no maverick sites (percent, -40 to 140) versus local storage factor F (2 to 5); legend: Normal, No maverick sites, Maverick]

always bids high

36

Security & Privacy

Issues:

Anonymity

Reputation

Accountability

Information Preservation

Information Quality

Trust

Denial of service attacks

37

Information Preservation

Example Policy: make 3 copies of documents

A1 make copies

What can go wrong?

38

What Can Go Wrong?

“Bad” sites make copies

“Bad” site alters copy

“Bad” site publishes fake

“Bad” site makes many copies of other docs

...

A1 make copies

A’1

A1

39

Conclusion

P2P systems popular today

P2P systems vulnerable and inefficient

Many challenges ahead

Search

Resource Management

Security and Privacy

40

For Additional Information

Google: “Stanford Peers”

http://www-db.stanford.edu/peers/