Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive...

30
www.dvs1.informatik.tu- darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof Leng, Alejandro P. Buchmann Databases and Distributed Systems Group Technische Universität Darmstadt Germany

Transcript of Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive...

Page 1: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

www.dvs1.informatik.tu-darmstadt.de

BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search

Wesley W. Terpstra, Jussi Kangasharju, Christof Leng, Alejandro P. Buchmann

Databases and Distributed Systems Group

Technische Universität Darmstadt

Germany

Page 2: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

2

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Classification of P2P Search

Centralized

EDonkey2000

Chord

CAN

Kademlia

P-Grid

Tapestry Pastry

Napster

Structured Unstructured

Gnutella

Random Walks Gia

BubbleStorm

FastTrack

Page 3: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

4

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Why unstructured search?

Page 4: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

5

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Query processing is independent from routing

This simplifies application development:

Implementing a query language locally distributed implementation “for free”

Reuse existing libraries for query languages SQLite, XPath, Lucene, …

No need to invent a new algorithm per query language

Separation of Concerns

Page 5: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

6

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Expressive Searches are Easy

Any selective query can be supported

One operation in unstructured systems can perform a full-text search range restriction on file size hierarchical type selection

Structured systems break queries into small pieces e.g. DHTs must transform the query into key-value Cannot simply compare the cost of operations

Page 6: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

7

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Example: Full text search

One op. fetches documents May contact more peers

Perform multiple lookups Transfer word lists (not via DHT) Fetch documents

Keyword 1

Keyword 2

Keyword 3

Doc 1

Doc 2

Doc 2

Doc 1

Unstructured Overlay DHT with inverted index

Latency favours the unstructured approach Relative bandwidth requirements highly parameter dependent

Page 7: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

8

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Not everything is uniform

Natural load balancing

P2P applications often handle Zipfian loads Human text has =1 YouTube has =0.5

An unstructured request can be served by any peer Heterogeneity is accommodated by irregular degree In comparison, adding a keyspace creates hot

spots

Page 8: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

9

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Example: Zipfian Load

For natural languages (e.g. full text search) in a keyspace: Expected load on most the loaded peer is 7000x average The loaded peer probably has only average capacity

1000

10000

100

10

1

0.1 100000 10000

Node Rank

1000 100 101

Load

com

pare

d t

o load

avera

ge

alpha=1.0

alpha=0.5

alpha=0.0

100,000 peers1 million documents

Page 9: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

10

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

BubbleStorm Intuition

Replicate both queries and data copies each (hidden constants unequal)

Data and queries rendezvous in the network€

O n( )

Page 10: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

11

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

BubbleStorm: Random Replication

Place data replicas on random nodes Nodes evaluate query replicas on all stored data

Where both data and query go, matches are found

Collisions result from the birthday paradoxData

Query

Page 11: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

12

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

BubbleStorm: Exploiting Heterogeneity

Peers have different capacities Faster peers receive more traffic

This is beneficial! Contribution is squared

Data

Query

Page 12: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

13

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

BubbleStorm Components

Random Topology Allows efficient sampling of peers at random

Topology Measurement Computes network size and statistics

See PODC’07 brief announcement

Bubblecast Replicates queries/data onto peers quickly

Bubble Maintainer Preserves the correct number of replicas

Covered in this talk

Page 13: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

14

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Random Multigraph Topology

Random graphs support the birthday paradox Exploring an edge leads to a randomly sampled peer creation of random node subset (bubble) is cheap

Node degree is chosen proportional to bandwidth As random walks (and bubblecasts) follow edges

with equal probabilityUtilization will be balanced for heterogeneity

Page 14: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

15

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Random Multigraph Topology

The topology is a random permutation of its edges It is modified only when peers join or leave

Topology Eulerian Cycle

Page 15: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

16

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Join Algorithm

Topology Eulerian Cycle

Joining Node

Contact bootstrapping node Random walk finds a random edge Split the edge and insert in between Multiple joins are executed in parallel or

iteratively

Page 16: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

17

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Leave Algorithm

Leave splices two neighboring edges together Join and leave do not change degree of neighbors

Topology Eulerian Cycle

Leaving Node

Page 17: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

18

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Bubblecast Motivation

Random Walk

- high latency- unreliable+ precise length+ balanced link load

Bubblecast

+ low latency+ reliable+ precise node count+ balanced link load

Flooding

+ low latency+ reliable- imprecise node count- unbalanced link load

node counter (not hops)fixed branch factor

branch in every step

Page 18: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

19

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Example Bubblecast Execution

Decrement the counter for matching locally

17

8 8

4 3 4 3

2 1 1 1 2 1 1 1

1 1

- 1

- 1 - 1

- 1 - 1- 1 - 1

- 1 - 1

Split the counter between two neighbors Counters are always integral

Forwarding terminates when counter reaches 0 Final routing depth differs by at most one hop

A counter specifies the number of replicas to create

Page 19: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

20

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Bubblecast Properties

Used for query and data replication

Fixed branch factor balances load Same stationary distribution as a random walk

Counter for edges crossed, not hops Precisely controls replica count

Logarithmic routing depth Slightly deeper than flooding

Message loss reduces replication by log(size)

Samples random nodes… due to random topology

Page 20: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

21

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Complexity and Correctness

BubbleStorm costs roughly bandwidth / op.to provide exhaustive search with

The full equation ( ) is complicated by Heterogeneous peer capacity (H) Dependent sampling (due to repeated withdrawals) Unequal query and post traffic (; BS optimizes this)Full details in the paper

c n

P(failure) < e−c 2

c 1 2 3 4

P(success) 63.21% 98.17% 99.99% 99.99999%

e−c 2 +c 3HΥ

Page 21: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

22

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Heterogeneous System

Simulation parameters 1 million peers with heterogeneous upstream:

60% 16kB/s, 25% 32kB/s,10% 128kB/s, 5% 1.2MB/s

100B query every 5 user minutes (80/20 injection) 2kB meta data stored every 30 user minutes

Exponential lifetime, mean 60 minutes 10% of leaves are crash failures

Target reliability is 98.2% (c=2)

Page 22: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

23

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Apples to Apples Performance: Success

0

0.2

0.4

0.6

0.8

1

100 1000 10000 100000 1e+06

Success Probability

Network size (nodes)

BubbleStormRandom Walk

Gnutella

Search success remains unaffected by increasing the network size

Page 23: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

24

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Apples to Apples Performance: Latency

0.1

1

10

100

1000

100 1000 10000 100000 1e+06

Latency (s)

Network size (nodes)

RW PostRW QueryRW MatchGnu QueryGnu Match

BS PostBS QueryBS Match

BS = BubbleStormGnu = GnutellaRW = Ferreira P2P’05

Post = Data replicated Query = Query completed

Match = First hit found

Page 24: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

25

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Apples to Apples Performance: Bandwidth

0.01

0.1

1

10

100

1000

10000

100 1000 10000 100000 1e+06

Total System Traffic (MB/s)

Network size (nodes)

Uplink CapacityGnutella

TopologyRandom Walk

Bubblecast

Page 25: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

26

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Homogeneous Network: Leave 50%

Give all peers 10kB/s upstream (heterogeneity would help) Set 50% to depart (gracefully) after 1 minute

0.95

0.96

0.97

0.98

0.99

1

27:00 29:00 31:00 33:00 35:00

Probability/Fraction reached

Time (mm:ss)

Unique peersSuccess

Page 26: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

27

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Homogeneous Network: Crash 50%

Crash 50% of peers after 1 minute Echo effect: posted data is missing

0

0.2

0.4

0.6

0.8

1

27:00 29:00 31:00 33:00 35:00

Probability/Fraction reached

Time (mm:ss)

Unique peersSuccess

Page 27: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

28

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

Current and Future Work

Replica preservation in persistent bubbles Sustain bubble sizes under churn Scale bubble sizes with network size / composition

Update content Non-destructive, versioned updates Delete with death certificates

Release implementation

Page 28: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

29

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

BubbleStorm Properties in Recap

Unstructured Queries may be in any language

Heterogeneous Exploits peers with varied capacity

Load Balanced Stationary utilization distribution is flat

Resilient Survives 50% crash fail and 90% leave

Exhaustive All matches of an operation can be retrieved

Probabilistic Success is a tunable guarantee

Page 29: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

30

DA

TA

BA

SE

S A

ND

DIS

TR

IBU

TE

D S

YS

TE

MS

TE

CH

NIS

CH

E U

NIV

ER

SIT

ÄT

DA

RM

ST

AD

T

SIGCOMM’07: 1. Motivation 2. Overview 3. Topology 4. Bubblecast 5. Evaluation

When does BubbleStorm fit?

Complex query languages Keyword search and beyond

Zipfian load with large Partitioning data will create an all-pairs sub-problem

Mostly static data Allows us to trade post traffic for search traffic

Highly volatile networks Unstructured topology recovers quickly

Page 30: Www.dvs1.informatik.tu-darmstadt.de BubbleStorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search Wesley W. Terpstra, Jussi Kangasharju, Christof.

www.dvs1.informatik.tu-darmstadt.de

?Questions

Thanks for listening!