The Near Earth Asteroid Rendezvous (NEAR) Rendezvous Burn Anomaly
Rendezvous Points-Based Scalable Content Discovery with Load Balancing Jun Gao Peter Steenkiste...
-
date post
20-Dec-2015 -
Category
Documents
-
view
223 -
download
0
Transcript of Rendezvous Points-Based Scalable Content Discovery with Load Balancing Jun Gao Peter Steenkiste...
Rendezvous Points-Based Scalable Content Discovery with Load Balancing
Jun Gao Peter Steenkiste
Computer Science DepartmentCarnegie Mellon University
October 24th, 2002
NGC 2002, Boston, USA
Jun Gao Carnegie Mellon University 2
Outline
Content Discovery System (CDS) Existing solutions CDS system design Simulation evaluation Conclusions
Jun Gao Carnegie Mellon University 3
Content Discovery System (CDS)
Example: a highway monitoring service Cameras and sensors monitor
road and traffic status Users issue flexible queries
CDS enables content discovery Locate contents that match
queries
Example services Service discovery; P2P;
pub/sub; sensor networks
Snapshot from traffic.com
Jun Gao Carnegie Mellon University 4
Comparison of Existing Solutions
Searchability
Centralized
Distributed
Registrationflooding
Querybroadcasting Hash-based
Graph-basedTree-based
Scalability
Robustness
yes yes yesHierarchicalnames
Look-up
yes? no no
yes
yes
no yes yes
yes
no
Load balancing yes? no no yes?no
DesignGoals
Solutions
Jun Gao Carnegie Mellon University 5
CDS Design
Attribute-value pair based naming scheme Enable searchability
Peer-to-peer system architecture Robust distributed system
Rendezvous Points-based content discovery Improve scalability
Load Balancing Matrix Dynamic balance load
Jun Gao Carnegie Mellon University 6
Naming Scheme
Based on Attribute-Value pairs CN: {a1=v1, a2=v2,..., an=vn}
Not necessarily hierarchical Attribute can be dynamic
Searchable via subset matching Q CN Number of matches for a CN is large
2n-1
Camera ID = 5562Highway = I-279 Exit = 4City = PittsburghSpeed = 25mphRoad condition = Icy
Highway = I-279 Exit = 4City = Pittsburgh
Q1
Q2
CN1
City = PittsburghSpeed = 25mph
Jun Gao Carnegie Mellon University 7
Distributed Infrastructure
Hash-based overlay substrate Routing, forwarding, management Node ID Hash function H(node)
Application layer publishes contents or issues queries
CDS layer determines where to register contents and send queries Centralized and network-wide
flooding are not scalable Idea: use a small set of nodes
as Rendezvous Points
TCP/IP
Hash-based Overlay
CDS
Application
N1
N6
N4N5
N3N2
N9
N8N7
Jun Gao Carnegie Mellon University 8
RP-based Scheme
Hash each AV-pair to get a set of RPs | RP | = n
RP node stores names that share the same pair Maintain with soft state
Query is sent directly to an RP node Use the least loaded RP RP node fully resolves
locally
N3N5
CN1
N4 N6
CN1: {a1=v1, a2=v2, a3=v3, a4=v4}
RP2
CN2
CN2: {a1=v1, a2=v2, a5=v5, a6=v6}
N2N1
RP1
Ni H(ai=vi)
N7N8
N9
?
Q:{a1=v1, a2=v2}
Jun Gao Carnegie Mellon University 9
System Properties
Efficient registration and query O(n) registration messages; n small O(m) messages for query with probing
Hashing AV-pair individually ensures subset matching Query may contain only 1 AV-pair
No inter-RP node communication for query resolution Tradeoff between CPU and Bandwidth
Load is spread across nodes Different names use different RP set
Jun Gao Carnegie Mellon University 10
Load Concentration Problem
RP node may be overloaded Some AV-pairs more popular than
others Speed=55mph vs. Speed=95mph P2P keyword follows Zipf
distribution However, many nodes are
underutilized Intuition: use a set of nodes to
share load caused by popular pairs
Challenge: accomplish load balancing in a distributed and self-adaptive fashion
10
100
1000
10000
100000
1 10 100 1000 10000
AV-pair rank
#of names
Example Zipf distribution
Jun Gao Carnegie Mellon University 11
Load Balancing Matrix (LBM)
Organize nodes into a logical matrix Each column holds a partition Rows are replicas of each
other
Node IDs are determined by: H(a1=v1, p, r) N1
(p,r)
Matrix expands itself to accommodate extra load Increase P when registration
load reaches threshold Query load R
1,1 2,1 3,1
1,2 2,2 3,2
1,3 2,3 3,3
CN1CN2 CN3 CN4 CN3 CN4
Q1Q2
Q3Q4
Q3
Q4
LBM for AV-pair: {a1=v1}
R
P
Jun Gao Carnegie Mellon University 12
Registration and Query with LBM
Determine LBM size Register with one random
column of each matrix Compute IDs locally
1,1
1,2
2,1
2,2
1,3 2,3
CN1:{a1v1, a2v2, a3v3}
0,0
P=?, R=?
P=3, R=3
LBM1
LBM2
LBM3
3,1
3,2
3,3
Q:{a1v1, a2v2}Load is balanced within an LBM
Registration
Query can be sent to the matrix of any AV-pair Use LBM with min(P)
Sent to one random node in each column
Query
Jun Gao Carnegie Mellon University 13
System Properties with LBM
Registration and query cost for one pair increases O(R) registration messages O(P) query messages Matrix size depends on current load
LBM must be kept small for efficiency Query optimization helps, e.g., large P small R Matrix shrinking mechanism E.g., May query a subset of the partitions
Load on each RP node is upper-bounded Efficient processing
Underutilized nodes are recruited as LBM expands
Jun Gao Carnegie Mellon University 14
Simulation Evaluation
Implement in an event-driven simulator Each node monitors its registration and query load Assume Chord-like underlying routing mechanism
Experiment setup 10,000 nodes in the CDS network 10,000 distinct AV-pairs (50 attributes, 200 values/attribute) Use synthetic registration and query workload
Performance metric: success rate System should maintain high success rate as load increases
Jun Gao Carnegie Mellon University 15
Workload
10
100
1000
10000
100000
1 10 100 1000 10000
Uniform
Skewed
1
10
100
1000
10000
100000
1 10 100 1000 10000
Rank of AV-pairs Rank of AV-pairs
Num
ber
of n
ames
Num
ber
of q
uerie
s
Registration load Query load
Jun Gao Carnegie Mellon University 16
Registration Success Rate Comparison
0
20
40
60
80
100
0.01 0.1 1 10
Uniform w/o LBM
Uniform w. LBM
Skewed w/o LBM
Skewed w. LBM
Avg. registration arrival rate (1000 reg/sec)
Reg
istr
atio
n su
cces
s ra
te (
%)
Threshold = 50 reg/secPoisson arrivalPmax = 10
100,000 names 1 registration every 10 secs.
Jun Gao Carnegie Mellon University 17
40
60
80
100
0.01 0.1 1 10 100
Random
Random w. LBM
With opt.
Opt. with LBM
Query Success Rate Comparison
Avg. query arrival rate (1000q/sec)
Que
ry s
ucce
ss r
ate
(%)
1 million users, 1 query every 10 secs.
Threshold = 200 q/secPoisson arrivalRmax = 10