One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems


Prasanna Ganesan, Beverly Yang, Hector Garcia-Molina. Stanford University.

Transcript of One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems

Page 1: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems

Prasanna Ganesan

Beverly Yang

Hector Garcia-Molina

Stanford University

Page 2: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Motivation

P2P Systems
– Dynamic set of nodes
– Dynamic data distributed over nodes
– No centralization
– Traditionally: Simple point queries over data

New P2P applications desire multi-dimensional queries
– Photo Sharing: Find all labels for photos in a geographical area
– Multi-player games: Find all objects in an area

Page 3: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Problem

Devise P2P system to store relation R with:

1. Efficient tuple insertion/deletion
2. Efficient node join/leave
– Minimize #messages
3. Efficient multi-dimensional range queries
– Minimize #nodes processing query
4. Load balance across nodes

A parallel DB on steroids

Page 4: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Challenge 1: Partitioning Problem

Partition data with
1. Locality: Keep nearby tuples on same node
2. Load balance: Equal #tuples on all nodes

Complications
– Dynamic data
– Dynamic nodes

Page 5: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Challenge 2: Routing Problem

Route query/insert/delete to relevant nodes
– No centralization!
– Replicated directory too expensive!
– Trade-off between cost of query and cost of maintaining routing structure

Page 6: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Roadmap

Two Different Approaches
– SCRAP: Space-filling curves with Range Partitions
– MURK: Multi-dimensional Rectangulation with kd-trees

Comparing the two approaches

Page 7: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

SCRAP Partitioning

Two-Step Process
1. Map data to 1-d with space-filling curve
– E.g., <110011, 010101> becomes 101100011011
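The slide does not name the curve, but its example is consistent with bit interleaving (a z-order curve), taking the first attribute's bit before the second's. A minimal sketch under that assumption; the function name `interleave_bits` and the fixed bit width are illustrative, not taken from the talk:

```python
def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y, most significant bit first, x's bit leading."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)  # next bit of the first attribute
        z = (z << 1) | ((y >> i) & 1)  # next bit of the second attribute
    return z


# The slide's example: <110011, 010101> as 6-bit values.
key = interleave_bits(0b110011, 0b010101, bits=6)
print(format(key, "012b"))  # 101100011011
```

Tuples that share high-order bits in both attributes share a prefix of the 1-d key, which is where the curve's locality comes from.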

Page 8: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

SCRAP Partitioning (2)

2. Range partition 1-d data
– Preserves locality!
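Range partitioning of the 1-d keys can be sketched as a sorted list of boundaries with a binary search to find the owning node. The boundary values and the 12-bit key space below are hypothetical, chosen only to match the size of the earlier example key:

```python
from bisect import bisect_right

# Hypothetical boundaries over a 12-bit key space (0..4095).
# Node i owns keys in [boundaries[i-1], boundaries[i]); node 0 starts at 0
# and the last node runs to the end of the key space.
boundaries = [1024, 2048, 3072]


def owner(key: int) -> int:
    """Return the index of the node whose range contains key."""
    return bisect_right(boundaries, key)


print(owner(0b101100011011))  # key 2843 falls in [2048, 3072), so node 2
```

Because each node holds a contiguous key range, keys that are close on the curve land on the same node or on adjacent nodes.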

Page 9: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Load Balancing with SCRAP

Adjust partitions when unbalanced
– Adjust boundary with neighbor
– Migrate to new area
– Guarantees: All loads within factor 4.24. Constant tuple movements per insert/delete [GBGM04]
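The first operation above, adjusting a boundary with a neighbor, can be illustrated with a deliberately simplified sketch. This is not the algorithm of [GBGM04], which also decides when to adjust and when to migrate in order to obtain the stated guarantees; the function name `adjust_boundary` and the even-split policy are assumptions made for illustration:

```python
def adjust_boundary(left: list[int], right: list[int]) -> tuple[list[int], list[int], int]:
    """Evenly re-split the sorted keys of two adjacent range partitions.

    Returns the new left keys, the new right keys, and the new boundary
    (the smallest key now owned by the right-hand node).
    """
    merged = left + right  # still sorted: every key in left precedes every key in right
    if not merged:
        raise ValueError("nothing to rebalance")
    mid = len(merged) // 2
    return merged[:mid], merged[mid:], merged[mid]


# An overloaded node hands part of its range to its right-hand neighbor.
new_left, new_right, boundary = adjust_boundary([1, 2, 3, 4, 5, 6], [9])
print(new_left, new_right, boundary)  # [1, 2, 3] [4, 5, 6, 9] 4
```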

Page 10: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Query Routing

Map multi-dim query to set of 1-d ranges
Send each 1-d range query to relevant node
Use a linked list to interconnect nodes
– Add “skip” pointers for fast routing
– O(log n) messages for routing/node joins/leaves
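The first step, mapping a multi-dimensional query to a set of 1-d ranges, can be sketched by brute force for a small 2-d grid: enumerate the cells in the query rectangle, compute each cell's curve key, and merge consecutive keys into ranges. This assumes the bit-interleaving curve; the names are illustrative, and a practical implementation would decompose the rectangle recursively rather than enumerate every cell:

```python
def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y, most significant bit first, x's bit leading."""
    z = 0
    for i in range(bits - 1, -1, -1):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def query_to_ranges(x_lo: int, x_hi: int, y_lo: int, y_hi: int, bits: int) -> list[tuple[int, int]]:
    """Return sorted, inclusive 1-d key ranges covering the rectangle [x_lo..x_hi] x [y_lo..y_hi]."""
    keys = sorted(
        interleave_bits(x, y, bits)
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    )
    ranges: list[list[int]] = []
    for k in keys:
        if ranges and k == ranges[-1][1] + 1:
            ranges[-1][1] = k       # extend the current run of consecutive keys
        else:
            ranges.append([k, k])   # start a new run
    return [(lo, hi) for lo, hi in ranges]


print(query_to_ranges(0, 1, 0, 1, bits=2))  # [(0, 3)]
print(query_to_ranges(1, 2, 0, 0, bits=2))  # [(2, 2), (8, 8)]
```

The second call shows the cost of a curve: a rectangle only two cells wide maps to two 1-d ranges that are far apart, so they may be routed to different nodes.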

Page 11: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Roadmap

Two Different Approaches
– SCRAP: Space-filling curves with Range Partitions
– MURK: Multi-dimensional Rectangulation with kd-trees

Comparing the two approaches

Page 12: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

MURK

Intuition: Partition data in native space into “Rectangles” (a la kd-trees)

Page 13: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Kd-tree Interpretation

Nodes form leaves of kd-tree

Node Join: Split existing leaf

Node leave
– Sibling takes over
– If no sibling, find someone in sibling sub-tree
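The join step, splitting an existing leaf, can be sketched as follows. The slide does not say which dimension is cut or where; cycling through the dimensions by depth and cutting at the median coordinate are assumptions in the usual kd-tree style, and `Leaf` and `split_on_join` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """One node's rectangle: per-dimension bounds [lo, hi) plus the tuples it stores."""
    lo: tuple
    hi: tuple
    points: list
    depth: int = 0


def split_on_join(leaf: Leaf) -> tuple[Leaf, Leaf]:
    """Split a leaf in two so that a joining node can take over one half."""
    dim = leaf.depth % len(leaf.lo)              # assumed: cycle through dimensions
    if leaf.points:
        coords = sorted(p[dim] for p in leaf.points)
        cut = coords[len(coords) // 2]           # assumed: cut at the median coordinate
    else:
        cut = (leaf.lo[dim] + leaf.hi[dim]) / 2  # empty leaf: cut the rectangle in half
    left_hi = leaf.hi[:dim] + (cut,) + leaf.hi[dim + 1:]
    right_lo = leaf.lo[:dim] + (cut,) + leaf.lo[dim + 1:]
    left = Leaf(leaf.lo, left_hi, [p for p in leaf.points if p[dim] < cut], leaf.depth + 1)
    right = Leaf(right_lo, leaf.hi, [p for p in leaf.points if p[dim] >= cut], leaf.depth + 1)
    return left, right


root = Leaf((0, 0), (8, 8), [(1, 1), (2, 5), (5, 2), (6, 6)])
old_node, new_node = split_on_join(root)
print(old_node.hi, new_node.lo)  # (5, 8) (5, 0)
```

Cutting at the median gives the two nodes roughly equal load at the moment of the split, although many tuples sharing the cut coordinate can leave the halves uneven.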

Page 14: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

MURK Properties

Locality: Rectangulation better than SCRAP

Load Balance
– Ok if data distribution is static
– ??? If data distribution is dynamic

Page 15: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Routing Queries

Build a grid of nodes
– Adjacent nodes link to each other
– Analogous to linked list in higher dimensions

Problems
– Node managing large space has many neighbors!
– Routing on grid is too slow. Need skip pointers
– Not easy to add skip pointers (see paper)

Page 16: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Evaluation

Datasets
– Uniform: 32-bit ints drawn at random
– Skewed: Photo Co-ords from real collection

Nodes join one at a time to build network

Evaluate
– Locality: #nodes that process a query
– Routing: #messages transmitted per query
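For a rectangle-based partitioning such as MURK, the locality metric can be sketched as the number of node rectangles that intersect the query box. The half-open box convention and the function names are assumptions for illustration, not the evaluation code from the talk:

```python
def overlaps(lo, hi, q_lo, q_hi) -> bool:
    """True if the half-open box [lo, hi) intersects the half-open query box [q_lo, q_hi)."""
    return all(l < qh and ql < h for l, h, ql, qh in zip(lo, hi, q_lo, q_hi))


def nodes_processing(partitions, q_lo, q_hi) -> int:
    """Locality metric: how many nodes own a rectangle that the query touches."""
    return sum(overlaps(lo, hi, q_lo, q_hi) for lo, hi in partitions)


# Four nodes, each owning one quadrant of the space [0, 8) x [0, 8).
quadrants = [((0, 0), (4, 4)), ((4, 0), (8, 4)), ((0, 4), (4, 8)), ((4, 4), (8, 8))]
print(nodes_processing(quadrants, (1, 1), (3, 3)))  # 1
print(nodes_processing(quadrants, (3, 3), (5, 5)))  # 4
```

For SCRAP the analogous count would be the number of distinct nodes owning any of the query's 1-d ranges.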

Page 17: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Dimensionality vs. Locality

[Plot; x-axis: Dimensionality]

#nodes = 8192. Ideal locality = 1

Page 18: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Selectivity vs. Locality

Page 19: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Network Size vs. Routing Cost

[Plot; x-axis: Network Size]

Page 20: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

Conclusions

SCRAP
– Simple partitioning and routing
– Excellent load balance
– Issue: Space-filling curve offers poor locality

MURK
– Much better locality than SCRAP
– Routing still ok
– Load balance is more complex and heuristic

Page 21: One Torus to Rule Them All:   Multi-dimensional Queries in P2P Systems

More Information

Load Balancing, Range Queries and P2P
– “Online Balancing of Range-Partitioned Data with Applications to P2P Systems”, VLDB 2004
– “Distributed Balanced Tables: Not Making a Hash of it All”, Stanford Tech Report
– Google: “Prasanna Ganesan”

More work on P2P
– Google: “Stanford Peers”