P2PR-tree: An R-tree-based Spatial Index for P2P Environments

Anirban Mondal, Yi Lifu, Masaru Kitsuregawa
University of Tokyo
E-mail: [email protected] tokyo.ac.jp

Page 2: Presentation Outline

Motivating spatial applications on P2P systems
Existing spatial indexes
Our proposal: the P2PR-tree
Performance analysis
Conclusion and future work

Page 3: Spatial Applications on P2P Systems

Spatial data occurs in several important and diverse applications: Geographic Information Systems (GIS), computer-aided design (CAD), resource management, development planning, emergency planning, and scientific research.

The amount of spatial data available at geographically distributed locations is growing at an unprecedented rate.

Together with the trend toward increased globalization and the popularity of P2P data sharing, this calls for efficient global sharing of distributively owned spatial data in P2P systems.

Page 4: Application Example

Searching for real estate information in Tokyo.

[Figure: a query and its query MBR, together with the results.]
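A query like this is a window (range) query: it asks for every object whose MBR overlaps the query MBR. The minimal Python sketch below shows only the overlap test at the heart of such a query. The listing names and coordinates are invented for illustration, and a linear scan stands in for the index that an R-tree-based structure would use to prune candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def intersects(self, other: "MBR") -> bool:
        # Overlap fails only if one rectangle lies wholly left/right/above/below the other.
        return not (self.x_max < other.x_min or other.x_max < self.x_min
                    or self.y_max < other.y_min or other.y_max < self.y_min)


def window_query(query: MBR, objects: dict[str, MBR]) -> list[str]:
    """Names of the objects whose MBR overlaps the query MBR (linear scan)."""
    return sorted(name for name, mbr in objects.items() if mbr.intersects(query))


# Hypothetical listings, each summarized by an MBR in arbitrary map units.
listings = {
    "flat_a": MBR(1.0, 1.0, 2.0, 2.0),
    "flat_b": MBR(5.0, 5.0, 6.0, 6.0),
    "house_c": MBR(2.5, 0.5, 3.5, 1.5),
}
query_mbr = MBR(0.0, 0.0, 3.0, 3.0)
print(window_query(query_mbr, listings))  # ['flat_a', 'house_c']
```

Rectangles that merely touch count as overlapping here; whether boundary contact should match is a policy choice.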

Page 5: Existing Spatial Indexes

Centralized spatial indexes: R-tree, R*-tree, R+-tree
Distributed spatial indexes: M-Rtree, MC-Rtree

Page 6: MC-Rtree

The MC-Rtree is an R-tree that indexes the covering MBRs of the data stored at the clients. Each client has its own R-tree for managing its own data.

[Figure: a master node connected to several clients.]

Centralization.
Designed for clusters.
Optimizes disk I/Os.
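To make the centralization concrete, here is a hedged Python sketch of an MC-Rtree-style master. For illustration only, it assumes the master keeps one covering MBR per client in a plain dictionary rather than an R-tree; the class and method names are hypothetical. The point is the message counter: every search and every update is handled by the single master.

```python
Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: Rect, b: Rect) -> bool:
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


class Master:
    """Hypothetical MC-Rtree-style master that knows each client's covering MBR."""

    def __init__(self, covering: dict[str, Rect]):
        self.covering = dict(covering)
        self.messages_handled = 0  # load placed on the single master node

    def route_search(self, query: Rect) -> list[str]:
        # Every search goes through the master, which picks the clients to contact.
        self.messages_handled += 1
        return sorted(c for c, mbr in self.covering.items() if intersects(mbr, query))

    def update(self, client: str, new_covering: Rect) -> None:
        # Every change to a client's covering MBR must also reach the master.
        self.messages_handled += 1
        self.covering[client] = new_covering


master = Master({"c1": (0, 0, 5, 5), "c2": (5, 0, 10, 5), "c3": (0, 5, 10, 10)})
print(master.route_search((1, 1, 2, 2)))  # ['c1']
print(master.route_search((4, 4, 6, 6)))  # ['c1', 'c2', 'c3']
master.update("c2", (5, 0, 12, 5))
print(master.messages_handled)  # 3
```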

Page 7: Why Can't We Use Existing R-tree-based Approaches?

They use centralized mechanisms, so they are not scalable.
All updates must pass through the master node, and all searches need to be routed by the master node, which makes the master node a performance bottleneck.
They do not optimize communication time.

Page 8: GRID-Related Projects

GRID Physics Network and European DataGrid: aim to improve scientific research that requires efficient distributed handling of data in the petabyte range.

Earth Systems GRID (ESG): aims to facilitate detailed analysis of huge amounts of climate data by a geographically distributed community via high-bandwidth networks.

NASA Information Power GRID (IPG): aims to improve existing NASA systems for solving complex scientific problems efficiently.

Page 9: How Does Our Proposal Differ from GRID-Related Spatial Work?

GRID:
Data sharing is restricted to scientific and research organizations.
Individual nodes are usually dedicated and expected to be available most of the time.
Some amount of centralized control is possible through collaboration between organizations.

Our proposal:
Normal users can share and upload data.
Individual nodes may join or leave at any time.
Peers are distributively owned, so centralized control is challenging in practice.

Page 10: Existing Search Mechanisms in P2P Systems

Broadcast (Gnutella)
Centralized (Napster)
Routing indices (RIs)
Distributed hash tables (Chord, CAN, Tapestry)

Existing work on P2P systems mostly addresses file sharing.

Page 11: P2PR-tree (Peer-to-Peer R-tree)

A distributed R-tree-based indexing scheme designed for P2P systems.
Parts of the distributed index are built autonomously by each peer.
Hierarchical, with efficient pruning.
Completely decentralized.
Highly scalable.

Page 12: Dividing the Universe

[Figure: the universe is divided into four blocks (Block 1 and Block 2 in the top row, Block 3 and Block 4 in the bottom row), each containing peers. The corresponding P2PR-tree has blocks B1 to B4 as its Level 0 units and groups G1 to G4 as its Level 1 units. Below a group are either its peers or subgroups (SG1, SG2) at Level 2, whose peers (e.g. P20 and P3) are at Level 3.]
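The slides do not say how a peer is placed in a block. One possible reading, sketched below in Python purely as an assumption, treats the four blocks as a regular 2x2 grid over the universe, numbered row by row, and assigns a peer to the block containing the centre of its data MBR. The function names and the example coordinates are hypothetical.

```python
Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def mbr_center(mbr: Rect) -> tuple[float, float]:
    return ((mbr[0] + mbr[2]) / 2, (mbr[1] + mbr[3]) / 2)


def block_of(point: tuple[float, float], universe: Rect, cols: int = 2, rows: int = 2) -> int:
    """1-based block number of the grid cell containing point, numbered row by row.

    Assumes the point lies inside the universe; points on the far edges are
    clamped into the last row/column. Row 0 is the row at the universe's minimum y.
    """
    x, y = point
    x_min, y_min, x_max, y_max = universe
    col = min(int((x - x_min) / (x_max - x_min) * cols), cols - 1)
    row = min(int((y - y_min) / (y_max - y_min) * rows), rows - 1)
    return row * cols + col + 1


universe = (0.0, 0.0, 100.0, 100.0)
peer_data_mbr = (60.0, 10.0, 80.0, 30.0)  # hypothetical extent of one peer's data
print(block_of(mbr_center(peer_data_mbr), universe))  # 2
```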

Page 13: Definitions

Unit: a block, group, or subgroup at any level, or a peer.
UnitMBR: the minimum bounding rectangle of a unit.
Router: to route messages to a unit X, a peer A needs to know at least one peer (say, peer B) that belongs to unit X. We define peer B as peer A's router to unit X.
UnitRouterInfo: the addresses of the routers to a unit.
UnitInfo: the UnitMBR and UnitRouterInfo of a unit.
ChildInfo (Level i): the UnitInfo of the child units at Level i+1 in the P2PR-tree.
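These definitions map naturally onto small record types. The Python sketch below is one hypothetical encoding; the class, field, and helper names are not from the slides, and peer names stand in for network addresses. A UnitInfo bundles a UnitMBR with its UnitRouterInfo, a ChildInfo is a collection of the children's UnitInfo, and a unit's MBR is computed as the smallest rectangle enclosing its members' MBRs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class UnitInfo:
    """UnitInfo = UnitMBR + UnitRouterInfo for one unit (block, group, subgroup, or peer)."""
    unit_id: str                                      # hierarchical ID, e.g. "1.2"
    unit_mbr: Rect                                    # UnitMBR
    routers: List[str] = field(default_factory=list)  # UnitRouterInfo: known routers

    def pick_router(self) -> Optional[str]:
        # A router is any known peer inside the unit; here simply the first one listed.
        return self.routers[0] if self.routers else None


# ChildInfo (Level i): UnitInfo of the child units at Level i+1, keyed by child ID.
ChildInfo = Dict[str, UnitInfo]


def enclosing_mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of several rectangles, e.g. of a unit's members."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


# Hypothetical group "1.1" whose two peers have the data MBRs below; P4 is a known router.
g11 = UnitInfo("1.1", enclosing_mbr([(0, 0, 2, 2), (1, 3, 4, 5)]), ["P4"])
child_info_block1: ChildInfo = {"1.1": g11}
print(g11.unit_mbr, g11.pick_router())  # (0, 0, 4, 5) P4
```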

Page 14: Data Structure at a Peer

A peer of Level $L$ can be specified as $\mathrm{Peer}(i_0.i_1.i_2.\ldots.i_L)$, where $i_j$ is its unit ID at level $j$. It maintains the following information:

AllBlockInfo
$\mathrm{ChildInfo}(i_0)$
$\mathrm{ChildInfo}(i_0.i_1)$
...
$\mathrm{ChildInfo}(i_0.i_1.i_2.\ldots.i_{L-1})$
Local R-tree data structure
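Because every ChildInfo entry is keyed by a proper prefix of the peer's own ID, the list of entries follows mechanically from that ID. The Python sketch below derives it; the function name is hypothetical and entries are returned as labels only. It reproduces the Peer(1.2.1.2) example given for P2 on Page 15, and shows that a peer at Level L keeps AllBlockInfo, L ChildInfo entries, and its local R-tree.

```python
def maintained_info(peer_id: str) -> list[str]:
    """Labels of the index entries kept by Peer(i0.i1. ... .iL)."""
    parts = peer_id.split(".")   # [i0, i1, ..., iL]
    level = len(parts) - 1       # the peer's level L
    entries = ["AllBlockInfo"]
    # One ChildInfo per proper prefix: i0, i0.i1, ..., i0.i1. ... .i(L-1)
    for k in range(1, level + 1):
        entries.append(f"ChildInfo({'.'.join(parts[:k])})")
    entries.append("Local R-tree")
    return entries


print(maintained_info("1.2.1.2"))
# ['AllBlockInfo', 'ChildInfo(1)', 'ChildInfo(1.2)', 'ChildInfo(1.2.1)', 'Local R-tree']
```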

Page 15: Example of Data Structure

[Figure: a P2PR-tree with Level 0 units (blocks B1 to B4), Level 1 units (groups G1 to G4), Level 2 units (peers, or subgroups SG1 and SG2), and Level 3 units (the subgroups' peers).]

P2 can be specified as Peer(1.2.1.2). It maintains:

AllBlockInfo: B1, B2, B3, B4
ChildInfo(1): info of the groups in Block 1
ChildInfo(1.2): info of the subgroups in Group 2
ChildInfo(1.2.1): info of the peers in Subgroup 1
Local R-tree data structure

Page 16: P2PR-tree

[Figure: Block 1 divided into groups G1 to G4 containing peers (P1, P2, P3, ...), alongside the corresponding P2PR-tree (Levels 0 to 2).]

Maintaining information (peer level = 2): (B1, B2, B3, B4), (G1, G2, G3, G4), (P6, P3).
BlockMBR information is stored at every peer.


Page 18: Peer Join Operation in P2PR-tree

[Figure: new peers P20 and P30 join Block 1, and subgroups SG1 and SG2 appear; the P2PR-tree is shown alongside.]

Maintaining information (peer level = 2): (B1, B2, B3, B4), (G1, G2, G3, G4), (P2, P3, P4).

Page 19: Peer Join Operation in P2PR-tree

[Figure: after the join, the P2PR-tree gains a Level 3, with subgroups SG1 and SG2 at Level 2 and their peers (e.g. P20) at Level 3.]

Maintaining information (peer level = 3): (B1, B2, B3, B4), (G1, G2, G3, G4), (SG1, SG2), (P2, P20).
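Comparing Pages 18 and 19, the peer's level rises from 2 to 3 once its group is divided into subgroups, and it then also keeps (SG1, SG2). The slides give neither the split trigger nor the split policy, so the Python sketch below assumes both: a hypothetical capacity of four peers per unit, and a split that sorts members by the x-coordinate of their data-MBR centres and cuts the list in two. Each member of a split group gets one more ID component, i.e. moves one level deeper.

```python
from dataclasses import dataclass

MAX_PEERS_PER_UNIT = 4  # hypothetical capacity; not specified in the slides


@dataclass
class PeerRec:
    name: str
    center_x: float    # x-coordinate of the centre of the peer's data MBR
    unit_id: str = ""  # ID of the lowest unit the peer belongs to


def join(unit_id: str, members: list[PeerRec], newcomer: PeerRec) -> dict[str, list[PeerRec]]:
    """Add newcomer to a unit; split the unit into two subunits if it overflows.

    Returns {unit ID: members}. Updates each PeerRec.unit_id in place.
    """
    newcomer.unit_id = unit_id
    members = members + [newcomer]
    if len(members) <= MAX_PEERS_PER_UNIT:
        return {unit_id: members}
    # Assumed split policy: order by x and cut in two (first half gets any extra peer).
    ordered = sorted(members, key=lambda p: p.center_x)
    half = (len(ordered) + 1) // 2
    result = {}
    for idx, part in enumerate((ordered[:half], ordered[half:]), start=1):
        sub_id = f"{unit_id}.{idx}"  # e.g. group "1.1" -> subgroups "1.1.1", "1.1.2"
        for p in part:
            p.unit_id = sub_id       # the peer now sits one level deeper
        result[sub_id] = part
    return result


group = [PeerRec("P1", 1.0, "1.1"), PeerRec("P2", 2.0, "1.1"),
         PeerRec("P3", 6.0, "1.1"), PeerRec("P4", 7.0, "1.1")]
units = join("1.1", group, PeerRec("P20", 3.0))
print({uid: [p.name for p in ps] for uid, ps in units.items()})
# {'1.1.1': ['P1', 'P2', 'P20'], '1.1.2': ['P3', 'P4']}
```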

Page 20: Routing Issues

Assumption: a peer initially knows at least N routers for a unit.
Each peer's routers are refreshed by piggybacking: during piggybacking, a peer sends the addresses and reliability information of other peers in its own unit.
Each peer maintains the R most reliable routers for each unit.
What if all the routers that a peer knows in a specific unit are unavailable? The peer contacts peers in other blocks to find new routers for that block.
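The router bookkeeping described above can be sketched as a per-unit table that keeps only the R best-scored entries. In the Python sketch below, the class and method names are hypothetical, reliability is assumed to be a number where higher is better, and the newest piggybacked score for an address is assumed to replace the old one; the slides specify none of these details. A `None` from `choose` marks the case where the peer must ask peers in other blocks for new routers.

```python
from typing import Dict, Optional


class RouterTable:
    """Per-unit table of the R most reliable routers known to one peer."""

    def __init__(self, r: int):
        self.r = r  # R: routers kept per unit (value not given in the slides)
        self.routers: Dict[str, Dict[str, float]] = {}  # unit ID -> {address: reliability}

    def merge_piggyback(self, unit_id: str, reports: Dict[str, float]) -> None:
        """Merge addresses and reliability scores piggybacked from a peer in unit_id."""
        table = dict(self.routers.get(unit_id, {}))
        table.update(reports)  # assumed: the newest report for an address wins
        best = sorted(table.items(), key=lambda kv: kv[1], reverse=True)[: self.r]
        self.routers[unit_id] = dict(best)

    def mark_unavailable(self, unit_id: str, address: str) -> None:
        self.routers.get(unit_id, {}).pop(address, None)

    def choose(self, unit_id: str) -> Optional[str]:
        """Most reliable known router, or None if every known router is gone.

        None is the cue to ask peers in other blocks for fresh routers.
        """
        table = self.routers.get(unit_id)
        return max(table, key=table.get) if table else None


rt = RouterTable(r=2)
rt.merge_piggyback("B1", {"P5": 0.9, "P6": 0.4, "P1": 0.7})
print(rt.routers["B1"])  # {'P5': 0.9, 'P1': 0.7}
print(rt.choose("B1"))   # P5
rt.mark_unavailable("B1", "P5")
rt.mark_unavailable("B1", "P1")
print(rt.choose("B1"))   # None
```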

Page 21: Searching the P2PR-tree

[Figure: Block 1 and Block 4 with their groups, subgroups, and peers, alongside the P2PR-tree (Levels 0 to 3).]

Query level = 0; the query comes to P60 (in Block 4).
Maintaining information at P60 (peer level = 2): (P5→B1, P25→B2, P35→B3, B4), (P41→G1, G2, P43→G3, P49→G4), (P45, P46). Here X→U means that peer X is the router to unit U; the peer's own units appear without a router.
Highlighted: B1, whose router is P5.

Page 22: Searching the P2PR-tree

[Figure: Block 1 and Block 4 with their groups, subgroups, and peers, alongside the P2PR-tree.]

Query level = 1; the query, which came to P60, has been forwarded to P5, the router to B1.
Maintaining information at P5 (peer level = 2): (B1, P26→B2, P36→B3, P42→B4), (P4→G1, G2, P8→G3, P9→G4), (P6, P30).
Highlighted: G1, whose router is P4.

Page 23: Searching the P2PR-tree

[Figure: Block 1 and Block 4 with their groups, subgroups, and peers, alongside the P2PR-tree.]

Query level = 2; the query, which came to P60, has been forwarded to P4, the router to G1.
Maintaining information at P4 (peer level = 3): (B1, P27→B2, P37→B3, P43→B4), (G1, P6→G2, P8→G3, P10→G4), (P20→SG1, SG2), (P3).
Highlighted: SG1, whose router is P20.

Page 24: Searching the P2PR-tree

[Figure: Block 1 and Block 4 with their groups, subgroups, and peers, alongside the P2PR-tree.]

Query level = 3; the query, which came to P60, has been forwarded to P20, the router to SG1.
Maintaining information at P20 (peer level = 3): (B1, P28→B2, P38→B3, P45→B4), (G1, P30→G2, P8→G3, P9→G4), (SG1, P3→SG2), (P1, P2).
Highlighted: P1 and P2.
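Taken together, Pages 21 to 24 describe hierarchical routing: at query level k, the current peer compares the query with the MBRs of the level-k units it knows, prunes those that do not overlap, and forwards the query to a router inside each overlapping unit, where it continues at level k+1. The Python sketch below simulates this under stated assumptions. The unit MBRs are invented; one global tree stands in for the per-peer AllBlockInfo/ChildInfo (a router inside unit U holds exactly U's ChildInfo, so it sees the same children); and the router chosen for a unit is simply its first listed peer. With those choices, the trace reproduces the P60 → P5 → P4 → P20 → {P1, P2} path of the example.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: Rect, b: Rect) -> bool:
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


@dataclass
class Unit:
    name: str
    mbr: Rect
    children: list["Unit"] = field(default_factory=list)  # empty => the unit is a peer

    def peers(self) -> list[str]:
        if not self.children:
            return [self.name]
        return [p for c in self.children for p in c.peers()]


def search(units: list[Unit], query: Rect, at_peer: str, level: int,
           trace: list[tuple[int, str, str]]) -> list[str]:
    """Resolve the query among the units of one level, then descend.

    `units` plays the role of AllBlockInfo (level 0) or of one ChildInfo list.
    Each trace entry is (query level, sending peer, receiving peer).
    """
    hits: list[str] = []
    for u in units:
        if not intersects(u.mbr, query):
            continue                           # prune the whole unit
        if not u.children:                     # a peer: it searches its local R-tree
            trace.append((level, at_peer, u.name))
            hits.append(u.name)
            continue
        # Stay local if already inside u; otherwise use a router (hypothetically the first peer).
        router = at_peer if at_peer in u.peers() else u.peers()[0]
        if router != at_peer:
            trace.append((level, at_peer, router))
        hits += search(u.children, query, router, level + 1, trace)
    return hits


# Hypothetical MBRs; names follow the Page 21-24 walkthrough (Block 4 collapsed for brevity).
sg1 = Unit("SG1", (0, 0, 10, 12), [Unit("P20", (0, 6, 10, 12)),
                                   Unit("P1", (0, 0, 5, 5)), Unit("P2", (6, 0, 10, 5))])
sg2 = Unit("SG2", (14, 0, 20, 20), [Unit("P4", (14, 12, 20, 20)), Unit("P3", (14, 0, 20, 10))])
g1 = Unit("G1", (0, 0, 20, 20), [sg2, sg1])
g2 = Unit("G2", (30, 0, 40, 20), [Unit("P5", (30, 0, 40, 10)), Unit("P6", (30, 12, 40, 20))])
blocks = [Unit("B1", (0, 0, 50, 50), [g2, g1]),
          Unit("B4", (50, 50, 100, 100), [Unit("P60", (60, 60, 70, 70))])]

trace: list[tuple[int, str, str]] = []
print(search(blocks, (1, 1, 8, 4), "P60", 0, trace))  # ['P1', 'P2']
print(trace)
# [(0, 'P60', 'P5'), (1, 'P5', 'P4'), (2, 'P4', 'P20'), (3, 'P20', 'P1'), (3, 'P20', 'P2')]
```

Pruning is what keeps the search cheap: G2, SG2, P20, and all of Block 4 are never contacted because their MBRs miss the query.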

Page 25: Performance Evaluation

Investigates the effect of variations in workload skew.
Performance metric: average response time.
Comparison with the centralized MC-Rtree, using 1000 data-providing peers.

Page 26: Effect of variations in workload skew when the query interarrival rate was fixed at 20 queries/second

[Figure: average response time (sec, axis 0 to 100) versus Zipf factor (0, 0.3, 0.5, 0.7, 1) for the P2PR-tree and the MC-Rtree.]

Page 27: Effect of variations in workload skew when the query interarrival rate was fixed at 100 queries/second

[Figure: average response time (sec, axis 0 to 600) versus Zipf factor (0, 0.3, 0.5, 0.7, 1) for the P2PR-tree and the MC-Rtree.]
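The plots vary the Zipf factor from 0 to 1, but the slides do not describe how the skewed queries were generated. Under the usual definition, with Zipf factor θ the i-th most popular item is chosen with probability proportional to 1/i^θ, so θ = 0 gives a uniform workload and θ = 1 a strongly skewed one. The Python sketch below computes these probabilities; treating the items as query hot-spot regions is an assumption made only for illustration.

```python
import random


def zipf_probabilities(n: int, theta: float) -> list[float]:
    """Probability of choosing the i-th most popular region, i = 1..n.

    p_i is proportional to 1 / i**theta: theta = 0 is uniform, larger theta is more skewed.
    """
    weights = [1.0 / (i ** theta) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


for theta in (0, 0.3, 0.5, 0.7, 1):
    print(theta, [round(x, 3) for x in zipf_probabilities(4, theta)])

# Drawing hypothetical query hot-spot regions for a skewed workload:
rng = random.Random(42)
regions = rng.choices(range(4), weights=zipf_probabilities(4, 1.0), k=10)
```

For four regions and θ = 1, the most popular region receives 48% of the queries, versus 25% when θ = 0.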

Page 28: Conclusion

Investigated the problem of spatial indexing in P2P environments.
Proposed the P2PR-tree (Peer-to-Peer R-tree): a scalable, decentralized P2P data structure with an efficient routing scheme.

Page 29: Future Scope of Work

Detailed simulation
Replication
Availability
Load balancing