1
Efficient Placement of Geographical Data Over Broadcast Channel for Spatial Range Query Under Quadratic Cost Model
Jianting Zhang
Le Gruenwald
School of Computer Science
The University of Oklahoma
Norman, Oklahoma, 73019, USA
{jianting, ggruenwald}@ou.edu
2
Outline
• Introduction
• Related Work
• Review of the Cost Model
• The Optimization Method
– General Ideas
– The Approximation Algorithm
– An Example
• Experiments and Results
• Conclusions and Future Work Directions
3
Introduction
What is Geographical Information?
• Mailing Address:
Engineering Laboratory, Room 139
200 Felgar Street,Norman,OK, 73019-6151
• Relative Direction:
4 miles southeast of Norman
• Coordinates: Longitude/Latitude
(-97.443067, 35.194425)
4
Introduction
Why Broadcasting?
Helps solve several key problems in mobile computing:
– Bandwidth: independent of the number of users; excellent scalability
– Power consumption: listen and sleep modes consume less power than send mode
– Mobility: no mobility management is required at either the server side or the client side
5
Introduction
Types of Broadcasting
– Pull based
• Needs explicit client requests
• Only requested data are broadcast
• Needs frequent scheduling
– Push based
• Schedules the broadcast sequence without explicit requests
• Needs prior knowledge for efficient sequencing
• Suitable for pushing data to a large number of users
6
Introduction
Why GI Broadcasting?
• Public information
– Service locations: ATMs, restaurants
– Traffic and road conditions
– Weather information
• Large number of potential users of GI (a metropolitan area, for example)
• Relatively static, with low update frequency
• Mostly read-only
• Privacy is not a big concern
• Distributed in nature
7
Introduction
Disk Access vs. Air Access
8
Introduction
Parameters in Broadcast System
Access Time (Latency), AT:
– The duration from the time the broadcast channel is first accessed to the time when all requested data have been retrieved
– The user may switch to sleep mode between active downloads
9
Introduction
Parameters in Broadcast System
Tune-in Time, TT:
– Time spent downloading data from the broadcast sequence
– Mobile hosts are in active (listen) mode
10
Introduction
Research Objectives: A Big Picture
11
Introduction
Broadcast Scheme Under Consideration
Index and data use separate channels
12
Related Work
• General data broadcasting:
– Tree-indexing, hashing, signature, hybrid, etc.
– Suitable for one-dimensional and/or categorical data
– Allows only one data item per access
– Focuses on the trade-off between TT and AT using replication
• Object-oriented/relational database broadcasting:
– Allows multiple data items per access
– Assumes data access follows predefined orders
13
Related Work
• Geographical data are multi-dimensional and continuous.
• A spatial range query result set may contain multiple data items, and they may not have a predefined order.
• Existing broadcast techniques cannot be applied to geographical data broadcast for efficient spatial range query processing.
14
Cost Model
Brief Review
Assumption 1: It takes unit time to broadcast a single data item. Thus the difference in position (D) between two data items in a broadcast sequence can be used to measure the data access time.
Assumption 2: Data and index are each broadcast exactly once in a broadcast cycle. Replication is not considered in this study.
Assumption 3: The number of range queries (M) requested within a region is proportional to its area (A): M = c*A.
The cost is measured by A*D.
15
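Cost Model
Brief Review
[Figure: point P1 and its region A1, with dimensions labeled qx/2 and qy/2; points P1 and P2 and their region A12]

\[
\mathrm{Cost} = \sum_{1 \le i < j \le n} w_{i,j}\,|\pi(i)-\pi(j)|
+ \sum_{1 \le i < j < k \le n} w_{i,j,k}\,[\max(\pi(i),\pi(j),\pi(k)) - \min(\pi(i),\pi(j),\pi(k))]
+ \dots
+ w_{1,2,\dots,n}\,[\max(\pi(1),\pi(2),\dots,\pi(n)) - \min(\pi(1),\pi(2),\dots,\pi(n))]
\]

\[
w_{i,j} = \sum_{(q_x,q_y) \in Q} \tilde{A}_{i,j}^{(q_x,q_y)}, \qquad
w_{i,j,k} = \sum_{(q_x,q_y) \in Q} \tilde{A}_{i,j,k}^{(q_x,q_y)}, \qquad \dots, \qquad
w_{1,2,\dots,n} = \sum_{(q_x,q_y) \in Q} \tilde{A}_{1,2,\dots,n}^{(q_x,q_y)}
\]

\[
\tilde{A}_{i} = A_{i} - \bigcup_{j=1,\ j \ne i}^{n} A_{i,j}, \qquad
\tilde{A}_{i,j} = A_{i,j} - \bigcup_{k=1,\ k \ne i,\ k \ne j}^{n} A_{i,j,k}, \qquad \dots, \qquad
\tilde{A}_{1,2,\dots,n} = A_{1,2,\dots,n}
\]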
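As a rough illustration of the cost expression on this slide, the sketch below evaluates the cost of one broadcast ordering when a weight is stored for each set of data items. It assumes the cost of a set is its weight times the span (largest position minus smallest position) of its items in the ordering. The function name `ordering_cost` and the toy weights are hypothetical, not values from the talk.

```python
def ordering_cost(order, weights):
    """Sum of w_S * (max position - min position) over all weighted item sets S."""
    # Position of each data item in the broadcast sequence (unit time per item).
    pos = {item: p for p, item in enumerate(order)}
    total = 0
    for items, w in weights.items():
        positions = [pos[i] for i in items]
        total += w * (max(positions) - min(positions))
    return total


# Hypothetical weights for three data items (illustration only).
toy_weights = {
    frozenset({1, 2}): 5,
    frozenset({2, 3}): 1,
    frozenset({1, 2, 3}): 2,
}

cost_a = ordering_cost([1, 2, 3], toy_weights)
cost_b = ordering_cost([2, 1, 3], toy_weights)
print(cost_a, cost_b)
```

With pairwise weights only, this reduces to a weighted sum of position differences, which is the form used by the linear arrangement problem discussed later in the talk.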
16
Optimization
General Ideas
Requirements:
• Low cost: near-real-time scheduling (sequence 100-10000 nodes in 0-5 minutes)
• Trade-off between greedy and non-greedy methods (n! possible orderings)
MinLA algorithm of (Bar-Yehuda, 2001):
• Divide-and-conquer strategy
• Space complexity: $O(2^{depth(T)})$, i.e. $O(n)$
• Time complexity: $O(\sum_{t \in T} 2^{depth(t)})$, i.e. $O(n^2)$
• Examines $2^{n-1}$ orderings in $O(n^2)$ time
17
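Optimization
General Ideas
Graph Minimum Linear Arrangement problem:
\[
la(G) = \sum_{(u,v) \in E} w(u,v) \cdot |\pi(u) - \pi(v)|
\]
Span of the nodes $n_1, n_2, \dots, n_k$:
\[
\max\{\pi(n_1), \pi(n_2), \dots, \pi(n_k)\} - \min\{\pi(n_1), \pi(n_2), \dots, \pi(n_k)\}
\]
Quadratic:
\[
g(L_2) = \frac{1}{L}\left[L^2 - \frac{(L - L_2)(L - L_2 - 1)}{2}\right]
\]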
18
Optimization
General Ideas
Observation: there is a monotonic relationship between $L_2$ and $g(L_2)$.
Motivation: use $L_2$ to approximate $g(L_2)$.
Assumption: an ordering optimized under la(G), which is linear in $L_2$, is also a good ordering under the quadratic cost model $g(L_2)$.
\[
g(L_2) = \frac{1}{L}\left[L^2 - \frac{(L - L_2)(L - L_2 - 1)}{2}\right]
\]
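The monotonic relationship can be checked numerically. The sketch below assumes the quadratic cost has the form g(L2) = (1/L)[L^2 - (L - L2)(L - L2 - 1)/2], with L read as the length of the broadcast cycle and L2 as the span of a query's result set. That reading of the slide's formula, and the names `g` and `cycle_len`, are assumptions made for illustration.

```python
def g(span, cycle_len):
    """Assumed quadratic cost g(L2) for a result-set span L2 in a cycle of length L."""
    rest = cycle_len - span  # part of the cycle outside the span
    return (cycle_len ** 2 - rest * (rest - 1) / 2) / cycle_len


cycle_len = 100
values = [g(span, cycle_len) for span in range(1, cycle_len + 1)]

# g never decreases as the span grows, so a smaller span never costs more.
is_monotonic = all(a <= b for a, b in zip(values, values[1:]))
print(is_monotonic)
```

Under this assumed form, minimizing the span of each result set never increases its quadratic cost, which is the property the approximation relies on.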
19
Optimization
The Approximation Algorithm
[Figure: an example BDT with nodes numbered 0 to 11]
20
Optimization
The Approximation Algorithm
A BDT:
• A binary tree that has all the nodes of a graph as its leaf nodes
• Two options at each internal node: 0-orientation and 1-orientation
• The number of possible orderings is $2^{n-1}$ if the BDT is full and balanced
[Figure labels: 1-orientation, 0-orientation; +, −; LR, RL]
21
Optimization
The Approximation Algorithm
The algorithm:
• Starts at the root of the BDT and recursively computes the costs of the two possible orientations of its two sub-trees.
• Keeps the orientation that has the lower cost and discards the one that has the higher cost.
• The computed orientations at the intermediate nodes of the BDT form an orientation tree that has the same structure as the BDT.
• The orientation tree determines an ordering sequence of all the nodes in the graph.
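To make the search space concrete, the sketch below enumerates every orientation assignment of a small BDT and keeps the cheapest resulting ordering. It is an exhaustive stand-in for illustration, not the O(n^2) divide-and-conquer algorithm described in the talk. The tree and weights are the four-item example from the talk's example slides; the nested-tuple tree encoding, the flip-bit convention (1 means right sub-tree first), and the helper names are assumptions, as is charging each weight times the span of its items.

```python
from itertools import product


def leaves_in_order(tree, flips):
    """Flatten a BDT; `flips` yields one bit per internal node in pre-order."""
    if not isinstance(tree, tuple):
        return [tree]
    flip = next(flips)
    left = leaves_in_order(tree[0], flips)
    right = leaves_in_order(tree[1], flips)
    # Assumed convention: bit 1 places the right sub-tree before the left one.
    return right + left if flip else left + right


def count_internal(tree):
    """Number of internal nodes, i.e. number of orientation choices."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_internal(tree[0]) + count_internal(tree[1])


def span_cost(order, weights):
    """Sum of weight * (max position - min position) over all item sets."""
    pos = {item: p for p, item in enumerate(order)}
    return sum(
        w * (max(pos[i] for i in s) - min(pos[i] for i in s))
        for s, w in weights.items()
    )


def best_orientation(tree, weights):
    """Try all 2^(n-1) orientation assignments and return the cheapest one."""
    candidates = []
    for bits in product((0, 1), repeat=count_internal(tree)):
        order = leaves_in_order(tree, iter(bits))
        candidates.append((span_cost(order, weights), order, bits))
    return min(candidates), len(candidates)


# BDT with root T0 and sub-trees T11 = (1, 2), T12 = (3, 4).
tree = ((1, 2), (3, 4))
weights = {
    (1, 2): 22, (3, 4): 8, (1, 3): 2, (2, 3): 38,
    (2, 4): 3, (1, 2, 3): 14, (2, 3, 4): 4,
}
(best_cost, best_order, best_bits), n_orderings = best_orientation(tree, weights)
print(best_cost, best_order, n_orderings)
```

Exhaustive enumeration is only practical for tiny trees, since the number of assignments doubles with every internal node; that growth is what motivates the recursive cost computation on this slide.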
22
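Optimization
The Approximation Algorithm
\[
\mathrm{Cost}_{L,V(t),R}[\pi] = \sum_{(u,v) \in E}
\begin{cases}
w(u,v) \cdot |\pi(u) - \pi(v)| & \text{if } u \in V(t) \text{ and } v \in V(t) \\
w(u,v) \cdot \pi(u) & \text{if } u \in V(t) \text{ and } v \in L \\
w(u,v) \cdot (|V(t)| - \pi(u)) & \text{if } u \in V(t) \text{ and } v \in R \\
0 & \text{otherwise}
\end{cases}
\]
• In-cut: both endpoints in V(t)
• Left_cut: one endpoint in V(t), the other in L
• Right_cut: one endpoint in V(t), the other in R
[Figure: V(t) between L and R]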
23
Optimization
The Approximation Algorithm
Positions are used implicitly when computing the cost (access time in our application), which makes the computation very efficient.
24
Optimization
An Example
[Figure: BDT with root T0, sub-trees T11 and T12, and leaves 1 2 3 4; legend: $\tilde{A}_{i}$, $\tilde{A}_{i,j}$, $\tilde{A}_{i,j,k}$]
In_cuts:
T11: {1,2}=22
T12: {3,4}=8
T0: {1,3}=2, {2,3}=38, {2,4}=3, {1,2,3}=14, {2,3,4}=4, total=61
25
Optimization
DBW: An Example
[Figure: the sequence 3 4 2 1 annotated with orientations (+, −) of T11 and the cut weights used below]
Left_cut(1)=2+14+22=38
Right_cut(1)=0
Cost(1)=Left_cut(1)=38
Left_cut(2)=38+3+4=45
Right_cut(2)=22
Cost(2)=Left_cut(2)=45
Left_cut(T11)=45+38-22=61
Right_cut(T11)=22+0-22=0
Cost(T11)=45+38+(22-22)*1+(38-22)*1=99
26
Optimization
DBW: An Example
[Figure: the sequence 3 4 1 2 annotated with orientations (+, −) of T11 and the cut weights used below]
Left_cut(1)=2
Right_cut(1)=22
Cost(1)=Left_cut(1)=2
Left_cut(2)=38+4+3+22+14=81
Right_cut(2)=0
Cost(2)=Left_cut(2)=81
Left_cut(T11)=2+81-22=61
Right_cut(T11)=22+0-22=0
Cost(T11)=2+81+(22-22)*1+(81-22)*1=142
27
Optimization
An Example
T11: 1-orientation 99, 0-orientation 142
T12: 1-orientation 15, 0-orientation 66
T0: 1-orientation 114, ordering [4,3,2,1]

T11: 1-orientation 81, 0-orientation 38
T12: 1-orientation 127, 0-orientation 76
T0: 0-orientation 114, ordering [1,2,3,4]
Best among all 4!=24 possible orderings
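The claim about all 24 orderings can be checked by brute force. The sketch below uses the In_cuts weights from the example and assumes each weight is charged once per unit of span (largest position minus smallest position) of its items; the helper name `span_cost` is hypothetical.

```python
from itertools import permutations

# In_cuts weights from the example slide.
weights = {
    (1, 2): 22, (3, 4): 8, (1, 3): 2, (2, 3): 38,
    (2, 4): 3, (1, 2, 3): 14, (2, 3, 4): 4,
}


def span_cost(order, weights):
    """Sum of weight * (max position - min position) over all item sets."""
    pos = {item: p for p, item in enumerate(order)}
    return sum(
        w * (max(pos[i] for i in s) - min(pos[i] for i in s))
        for s, w in weights.items()
    )


# Cost of every one of the 4! = 24 orderings.
costs = {order: span_cost(order, weights) for order in permutations([1, 2, 3, 4])}
best = min(costs.values())
winners = sorted(order for order, c in costs.items() if c == best)
print(best, winners)
```

Under that assumption the minimum is 114, reached only by [1,2,3,4] and its reverse [4,3,2,1], which agrees with the two T0 results above.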
28
Optimization
An Example
How good is the approximation?
Answer from the example:
29
Experiments and Results
Generating Data Sets
• Five synthetic point data sets: 100/200/300/400/500 data points
• Data space: [0,1) × [0,1)
• Query window size: (0.1, 0.1)
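A setup of this kind can be sketched as follows. The code generates points in [0,1) × [0,1) and estimates a weight for each result set by sampling query-window centers on a regular grid. The uniform distribution, the grid sampling, the seed, and the names `make_points` and `estimate_weights` are assumptions for illustration; the talk does not state how its data sets or weights were actually produced.

```python
import random
from collections import Counter


def make_points(n, seed=0):
    """Generate n points in [0,1) x [0,1); uniform distribution is an assumption."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def estimate_weights(points, qx=0.1, qy=0.1, grid=50):
    """Estimate, per result set, the area of query centers that retrieve exactly that set."""
    weights = Counter()
    cell_area = 1.0 / (grid * grid)
    for gx in range(grid):
        for gy in range(grid):
            # Center of one grid cell, used as a sample query-window center.
            cx, cy = (gx + 0.5) / grid, (gy + 0.5) / grid
            hits = frozenset(
                i for i, (x, y) in enumerate(points)
                if abs(x - cx) <= qx / 2 and abs(y - cy) <= qy / 2
            )
            # Sets with fewer than two items have zero span and add no cost.
            if len(hits) >= 2:
                weights[hits] += cell_area
    return weights


points = make_points(100)
weights = estimate_weights(points)
print(len(points), len(weights))
```

A finer grid gives a closer area estimate at the price of more work, since every sampled center is tested against every point.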
30
Experiments and Results
Ordering Heuristics
• Hilbert Space Filling Curve Ordering
• R-Tree Traversal Ordering
[Figure: example items labeled 1 4 3 6 2 5]
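The first heuristic can be sketched as below: each point is mapped to a cell of an n × n grid, and points are sorted by the position of their cell along a Hilbert curve. The index computation is the standard iterative cell-to-index conversion; the grid resolution and the names `hilbert_index` and `hilbert_order` are assumptions, and the talk does not say which implementation it used.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert-curve index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(points, n=16):
    """Return point indices sorted along the Hilbert curve; points lie in [0,1) x [0,1)."""
    def key(i):
        x, y = points[i]
        return hilbert_index(n, int(x * n), int(y * n))
    return sorted(range(len(points)), key=key)


order = hilbert_order([(0.9, 0.1), (0.1, 0.1), (0.1, 0.9)], n=2)
print(order)
```

Points that fall in the same grid cell share an index, so their relative order is simply their input order; a larger n reduces such ties.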
31
Experiments and Results
Comparison of Random Orderings
32
Experiments and Results
Comparison of Two Heuristics
33
Experiments and Results
Optimization of R-Tree Ordering
34
Conclusions
• We observed the structural similarity between the quadratic cost model we previously developed and the MinLA problem, as well as the monotonic relationship between the cost in terms of DPW+DBW and the DBW for a single query.
• We proposed using the access time of DBW to approximate the access time of DPW+DBW, which converts the optimization problem under the quadratic cost model into a MinLA optimization problem.
• Experiments on the five synthetic data sets showed that the optimized ordering is 21%-32% better than the average of 1000 random orderings under the quadratic cost model.
35
Future Work
• Extend our cost models to handle the access time to both the data channel and the index channel
• Explore more ordering heuristics as well as exact and/or approximate optimization methods
• Perform more experiments using both synthetic and real data sets with different sizes, distributions and densities to examine the effectiveness and scalability of the optimization methods
36
Thanks!
Questions?