P2PR-tree: An R-tree-based Spatial Index for P2P Environments
ANIRBAN MONDAL, YI LIFU, MASARU KITSUREGAWA
University of Tokyo
E-mail: [email protected]
PRESENTATION OUTLINE
Motivating Spatial Applications on P2P systems
Existing Spatial Indexes
Our proposal: The P2PR-tree
Performance Analysis
Conclusion and Future Work
Spatial Applications on P2P systems
Spatial data occurs in several important and diverse applications:
Geographic Information Systems (GIS)
Computer-aided design (CAD)
Resource management
Development planning, emergency planning and scientific research
Unprecedented growth of available spatial data at geographically distributed locations.
Trend of increased globalization.
Popularity of P2P data sharing.
Efficient global sharing of distributively owned spatial data in P2P systems.
Application example
Searching for real estate information in Tokyo
[Figure: a query MBR and the query results]
Existing Spatial Indexes
Centralized spatial indexes: R-tree, R*-tree, R+-tree
Distributed spatial indexes: M-Rtree, MC-Rtree
MC-Rtree
Master: R-tree which indexes the covering MBRs of the data stored at the clients
Each client has its own R-tree for managing its own data
[Figure: one Master connected to four clients]
Centralization
Designed for clusters.
Optimize disk I/Os.
Why can’t we use existing R-tree-based approaches?
They use centralized mechanisms → not scalable.
All updates must pass through the Master Node.
All searches need to be routed by the Master Node.
→ Performance bottleneck at the Master Node
They do not optimize communication time.
GRID-Related Projects
GRID Physics Network and European DataGrid: improving scientific research that requires efficient distributed handling of data in the petabyte range.
Earth Systems GRID (ESG): aims at facilitating detailed analysis of huge amounts of climate data by a geographically distributed community via high-bandwidth networks.
NASA Information Power GRID (IPG): improving existing systems at NASA for solving complex scientific problems efficiently.
How does our proposal differ from GRID-related spatial works?
GRID
Restricts data sharing to scientific and research organizations.
Individual nodes are usually dedicated and expected to be available most of the time.
Some amount of centralized control is possible through collaboration between organizations.
Our proposal
Allows normal users to share/upload data.
Individual nodes may join/leave at any time.
Peers are distributively owned, hence centralized control is practically challenging.
Existing Search mechanisms in P2P systems
Broadcast (Gnutella)
Centralized (Napster)
Routing indices (RIs)
Distributed hash tables (Chord, CAN, Tapestry)
Existing works on P2P systems mostly address file-sharing.
P2PR-tree (Peer-to-Peer R-tree)
A distributed R-tree-based indexing scheme designed for P2P systems
Parts of the distributed indexes are built autonomously by each peer.
Hierarchical and performs efficient pruning.
Completely decentralized
Highly Scalable
Dividing the Universe
[Figure: the universe is divided into Block 1, Block 2, Block 3 and Block 4, each containing peers. The corresponding P2PR-tree has blocks B1 to B4 at Level 0, groups G1 to G4 at Level 1, subgroups SG1 and SG2 and peers at Level 2, and peers at Level 3.]
Definitions
Unit: A Block, Group or Subgroup at any level, or a peer
UnitMBR: Minimum Bounding Rectangle of a Unit
Router: In order to route messages to a Unit X, a peer A needs to know at least one peer (say peer B) which belongs to Unit X. We define peer B as peer A’s Router to Unit X.
UnitRouterInfo: The addresses of routers to a Unit
UnitInfo: UnitMBR and UnitRouterInfo of a Unit
ChildInfo (Level i): UnitInfo of Child Units at Level i+1 in the P2PR-tree
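The definitions above can be pictured as a few small record types. The following is only a minimal sketch: the class names mirror the slide's terms, but the field names, the `(xmin, ymin, xmax, ymax)` rectangle encoding, the dotted Unit IDs and the `covering_mbr` helper are assumptions made for illustration, not part of the original design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (xmin, ymin, xmax, ymax); a hypothetical MBR encoding, not from the slides.
MBR = Tuple[float, float, float, float]


@dataclass
class UnitInfo:
    """UnitMBR plus UnitRouterInfo of one Unit."""
    unit_id: str                 # e.g. "1.2" for Group 2 of Block 1 (assumed naming)
    unit_mbr: MBR
    routers: List[str] = field(default_factory=list)  # UnitRouterInfo: router addresses


@dataclass
class ChildInfo:
    """UnitInfo of the child Units (Level i+1) of one Unit at Level i."""
    parent_id: str
    children: Dict[str, UnitInfo] = field(default_factory=dict)

    def add(self, info: UnitInfo) -> None:
        self.children[info.unit_id] = info


def covering_mbr(mbrs: List[MBR]) -> MBR:
    """Smallest rectangle enclosing all given rectangles."""
    return (min(m[0] for m in mbrs), min(m[1] for m in mbrs),
            max(m[2] for m in mbrs), max(m[3] for m in mbrs))
```

A parent's UnitMBR would then be the covering rectangle of its children's UnitMBRs, which is what lets a query be pruned at a high level of the tree.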
Data Structure at a peer
A peer of Level L can be specified as $\mathrm{Peer}(i_0.i_1.i_2.\ldots.i_L)$, where $i_j$ is the Unit ID at level $j$.
It maintains the following information:
All BlockInfo
$\mathrm{ChildInfo}(i_0)$
$\mathrm{ChildInfo}(i_0.i_1)$
...
$\mathrm{ChildInfo}(i_0.i_1.i_2.\ldots.i_{L-1})$
Local R-tree data structure
Example of Data Structure
[Figure: P2PR-tree with Level 0 Units B1 to B4, Level 1 Units G1 to G4, Level 2 Units SG1 and SG2, and peers at Level 3]
P2 can be specified as Peer(1.2.1.2)
P2 maintains:
All BlockInfo: B1, B2, B3, B4
ChildInfo(1): Info of Groups in Block 1
ChildInfo(1.2): Info of Subgroups in Group 2
ChildInfo(1.2.1): Info of Peers in Subgroup 1
Local R-tree data structure
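The rule that Peer(1.2.1.2) keeps ChildInfo(1), ChildInfo(1.2) and ChildInfo(1.2.1) amounts to keeping one ChildInfo per proper prefix of the peer's own ID. A minimal sketch, assuming peer IDs are written as dotted strings; the function names are hypothetical:

```python
from typing import List


def child_info_keys(peer_id: str) -> List[str]:
    """Return the Unit IDs whose ChildInfo a peer keeps.

    For Peer(i0.i1.....iL) these are i0, i0.i1, ..., i0.i1.....i(L-1),
    i.e. every proper prefix of the peer's own path.
    """
    parts = peer_id.split(".")
    return [".".join(parts[:k]) for k in range(1, len(parts))]


def peer_level(peer_id: str) -> int:
    """Level L of a peer, counting the Block as Level 0."""
    return len(peer_id.split(".")) - 1
```

For P2, `child_info_keys("1.2.1.2")` gives the three prefixes listed above, and `peer_level("1.2.1.2")` is 3. The amount of state per peer therefore grows with the depth of the tree, not with the total number of peers.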
P2PR-tree
[Figure: Block 1 of the universe with groups G1 to G4 and their peers, next to the P2PR-tree with blocks B1 to B4 at Level 0, groups G1 to G4 at Level 1 and peers at Level 2]
BlockMBR information stored at every peer
Maintaining information: Peer Level = 2, (B1, B2, B3, B4), (G1, G2, G3, G4), (P6, P3)
Peer Join operation in P2PR-tree
[Figure: Block 1 of the universe with groups G1 to G4, subgroups SG1 and SG2, and peers including P20 and P30, next to the P2PR-tree with Levels 0 to 2]
BlockMBR information stored at every peer
Maintaining information: Peer Level = 2, (B1, B2, B3, B4), (G1, G2, G3, G4), (P2, P3, P4)
Peer Join operation in P2PR-tree
[Figure: Block 1 of the universe with groups G1 to G4, subgroups SG1 and SG2, and peers including P20 and P30, next to the P2PR-tree with Levels 0 to 3]
BlockMBR information stored at every peer
Maintaining information: Peer Level = 3, (B1, B2, B3, B4), (G1, G2, G3, G4), (SG1, SG2), (P2, P20)
Routing Issues
Assumption: A peer initially knows at least N routers for a Unit.
Piggybacking to refresh routers for each peer.
During piggybacking, a peer sends the addresses and reliability information of other peers in its own Unit.
Each peer maintains the most reliable R routers for Units based on reliability.
What if all routers that a peer knows in a specific Unit are unavailable?
Peer contacts the peers in other blocks to find out new routers for that block.
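The router maintenance described above can be sketched as a small table that merges piggybacked information and keeps only the R most reliable routers per Unit. This is an illustration under assumptions: the slides do not say how reliability is measured, so a numeric score in [0, 1] is assumed here, and `RouterTable`, `piggyback` and `pick` are hypothetical names.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple


class RouterTable:
    """Keeps at most `r` routers per Unit, preferring higher reliability."""

    def __init__(self, r: int) -> None:
        self.r = r
        # unit_id -> {router address: reliability score (assumed to be in [0, 1])}
        self.routers: Dict[str, Dict[str, float]] = {}

    def piggyback(self, unit_id: str, advertised: List[Tuple[str, float]]) -> None:
        """Merge (address, reliability) pairs received from a peer of `unit_id`."""
        known = self.routers.setdefault(unit_id, {})
        for addr, reliability in advertised:
            known[addr] = reliability          # newer information overwrites older
        best = heapq.nlargest(self.r, known.items(), key=lambda kv: kv[1])
        self.routers[unit_id] = dict(best)     # drop everything beyond the top r

    def pick(self, unit_id: str, alive: Set[str]) -> Optional[str]:
        """Most reliable known router that is currently reachable, else None."""
        candidates = [(rel, addr)
                      for addr, rel in self.routers.get(unit_id, {}).items()
                      if addr in alive]
        return max(candidates)[1] if candidates else None
```

When `pick` returns `None`, every known router for that Unit is unavailable, which is the case where the peer would have to ask peers in other blocks for new routers.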
Searching the P2PR-tree
[Figure: Block 1 and Block 4 of the universe with their groups, subgroups and peers, next to the P2PR-tree with Levels 0 to 3. Marked in the figure: B1]
BlockMBR information stored at every peer
Query comes to P60. Query Level = 0
Maintaining Information: Peer Level = 2, (P5→B1, P25→B2, P35→B3, B4), (P41→G1, G2, P43→G3, P49→G4), (P45, P46)
Searching the P2PR-tree
[Figure: Block 1 and Block 4 of the universe with their groups, subgroups and peers, next to the P2PR-tree with Levels 0 to 3. Marked in the figure: G1, P4]
BlockMBR information stored at every peer
Query comes to P60. Query Level = 1
Maintaining Information: Peer Level = 2, (B1, P26→B2, P36→B3, P42→B4), (P4→G1, G2, P8→G3, P9→G4), (P6, P30)
Searching the P2PR-tree
[Figure: Block 1 and Block 4 of the universe with their groups, subgroups and peers, next to the P2PR-tree with Levels 0 to 3. Marked in the figure: P4, SG1, P20]
BlockMBR information stored at every peer
Query comes to P60. Query Level = 2
Maintaining Information: Peer Level = 3, (B1, P27→B2, P37→B3, P43→B4), (G1, P6→G2, P8→G3, P10→G4), (P20→SG1, SG2), (P3)
Searching the P2PR-tree
[Figure: Block 1 and Block 4 of the universe with their groups, subgroups and peers, next to the P2PR-tree with Levels 0 to 3. Marked in the figure: P4, P20, P1, P2]
BlockMBR information stored at every peer
Query comes to P60. Query Level = 3
Maintaining Information: Peer Level = 3, (B1, P28→B2, P38→B3, P45→B4), (G1, P30→G2, P8→G3, P9→G4), (SG1, P3→SG2), (P1, P2)
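The search slides descend one level at a time, from block to group to subgroup to peer. The pruning idea behind that descent can be sketched as a recursive walk that skips any Unit whose UnitMBR does not intersect the query MBR. This is a single-process simulation, not the authors' message-passing protocol: in the real system each step would be a message forwarded to a router of the chosen Unit. The `Unit` class, the rectangle encoding and the coordinates of the example tree are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed encoding


@dataclass
class Unit:
    name: str
    mbr: MBR
    children: List["Unit"] = field(default_factory=list)  # empty for a peer


def intersects(a: MBR, b: MBR) -> bool:
    """True if two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(unit: Unit, query: MBR, path: List[str]) -> List[str]:
    """Return names of peers whose MBR intersects the query, pruning by UnitMBR."""
    if not intersects(unit.mbr, query):
        return []                       # prune this whole subtree
    path.append(unit.name)              # record Units the query was forwarded to
    if not unit.children:
        return [unit.name]              # a peer: it would now search its local R-tree
    hits: List[str] = []
    for child in unit.children:
        hits.extend(search(child, query, path))
    return hits


# Hypothetical tree loosely modeled on Block 1 of the slides; coordinates are invented.
sg1 = Unit("SG1", (0, 0, 4, 4), [Unit("P1", (0, 0, 1, 1)),
                                 Unit("P2", (1, 0, 2, 1)),
                                 Unit("P20", (3, 3, 4, 4))])
sg2 = Unit("SG2", (5, 0, 6, 1), [Unit("P3", (5, 0, 6, 1))])
g1 = Unit("G1", (0, 0, 6, 4), [sg1, sg2])
g2 = Unit("G2", (7, 7, 9, 9), [Unit("P6", (7, 7, 9, 9))])
b1 = Unit("B1", (0, 0, 9, 9), [g1, g2])

visited: List[str] = []
result = search(b1, (0.5, 0.2, 1.5, 0.8), visited)
```

With this invented layout the query reaches only B1, G1, SG1 and the two matching peers; G2, SG2 and P20 are never contacted, which is the saving that the hierarchy is meant to provide.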
Performance Evaluation
Investigates the following: effect of variations in workload skew
Performance metric: Average Response Time
Comparison with centralized MC-Rtree
1000 data-providing peers
Effect of variations in workload skew when the query interarrival rate was fixed at 20 queries/second
[Figure: Avg. Resp. Time (sec), axis 0 to 100, versus Zipf Factor (0, 0.3, 0.5, 0.7, 1) for P2P-Rtree and MC-Rtree]
Effect of variations in workload skew when the query interarrival rate was fixed at 100 queries/second
[Figure: Avg. Resp. Time (sec), axis 0 to 600, versus Zipf Factor (0, 0.3, 0.5, 0.7, 1) for P2P-Rtree and MC-Rtree]
Conclusion
Investigation of the problem of spatial indexing in P2P environments.
Proposal of the P2PR-tree (Peer-to-Peer R-tree).
Scalable decentralized P2P data structure
Efficient routing scheme
Future Scope of Work
Detailed simulation
Replication
Availability
Load-balancing