Distributed Indexing and Querying in Sensor Networks using Statistical Models
Distributed Indexing and Querying in Sensor Networks
using Statistical Models
Arnab [email protected]
Indian Institute of Technology (IIT), Kanpur
Jul 17, 2008 CS, ULB 2
Wireless sensor networks
• “Sensor” is a tiny, cheap communicating device with limited memory, communication bandwidth and battery life
  – Communication is precious
• Provides monitoring of physical phenomena
• Wireless sensor network (WSN): a collection of such sensors
  – Enables spatio-temporal monitoring of events
  – Inter-communication among neighboring sensors
  – Base station as a centralized point of entry
Semantic modeling
• Uses of WSNs
  – How many rooms are occupied?
  – Is there a fire in any room?
  – What is the pattern of birds’ movements?
• Low-level individual sensor readings do not provide semantics
• Content summarization by modeling
• Which models to use?
• Where and when to model?
Outline
• Semantic modeling
  – Which models to use?
  – Where and when to build the models?
• MIST: An index structure
• Query algorithms
• Experiments
• Conclusions
How to model?
• Zebranet
• Track movement of zebras by velocity sensors
• Three discrete states:
  – Grazing (G)
  – Walking (W)
  – Fast-moving (F)
• Zebras’ behavior by state sequence
  – G W W W W F F G G, G G F F F W W W
Statistical models
• Markov Chain (MC)
  – Provides inference about behavior in general
  – τ: transition probabilities
  – π: start state probabilities
• Hidden Markov Model (HMM)
  – Tries to infer the causes of such behavior
  – ξ: emission probabilities
• Use of either model depends on the context
[Figures: Zebra Mobility: HMM; Zebra Mobility: MC]
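As a rough illustration of how a sensor might train an MC on state sequences such as the two zebra sequences shown earlier, the sketch below counts start states and transitions and normalizes the counts. The function name `fit_markov_chain` and the uniform fallback row for states with no outgoing transitions are assumptions made for this example; the talk does not specify a training procedure.

```python
from collections import Counter

STATES = ["G", "W", "F"]  # grazing, walking, fast-moving


def fit_markov_chain(sequences, states=STATES):
    """Estimate start (pi) and transition (tau) probabilities by counting."""
    start_counts = Counter(seq[0] for seq in sequences)
    transition_counts = Counter()
    for seq in sequences:
        transition_counts.update(zip(seq, seq[1:]))

    pi = {s: start_counts[s] / len(sequences) for s in states}
    tau = {}
    for a in states:
        row_total = sum(transition_counts[(a, b)] for b in states)
        # Assumption: a state with no observed outgoing transition gets a uniform row.
        tau[a] = {
            b: (transition_counts[(a, b)] / row_total if row_total else 1 / len(states))
            for b in states
        }
    return pi, tau


# The two example sequences from the Zebranet slide.
pi, tau = fit_markov_chain(["GWWWWFFGG", "GGFFFWWW"])
print(pi)
print(tau["F"])
```

Each row of `tau` sums to 1, so the result is a valid Markov chain over the three states.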
When and where: Queries
• Identify interesting behaviors in the network
  – Example: Identify all zebras (sensors) that observed the behavior pattern FFFF with likelihood > 0.8
    • May denote possible predator attack
• Sequence queries
  – Range query: Return sensors that observed a particular behavior with likelihood > threshold
  – Top-1 query: Which sensor is most likely to observe a given behavior?
• Model queries
  – 1-NN query: Which sensor is most similar to a given pattern (model)?
Centralized solution
• Each sensor
  – Builds a model
  – Transmits the model to the base station (BS)
• Queries come to BS
• BS answers them
  – No query communication
• Each update in a sensor is transmitted
  – Huge update costs
Slack-based centralized solution
• To save update costs
• Introduce slack locally at each sensor
• No update if new parameter is within slack of old parameter
  – Update costs reduced
• BS knows slack
  – Finds range for likelihood from each sensor
  – If a query cannot be answered from the cached models, it is transmitted to the sensor
  – Query communication costs are introduced
Outline
• Semantic modeling
• MIST: An index structure
  – Correlation among models
  – Composition of models
  – Hierarchical aggregation of index
  – Dynamic maintenance
• Query algorithms
• Experiments
• Conclusions
MIST (Model-based Index Structure)
• Overlay a tree on the network
• Each sensor trains a model (MC/HMM) using observed sequences
• Aggregation of child models into parent using correlation among models
• Two types of composite models
• Bottom-up aggregation of index models
• Update in models handled by slack
Correlation among models
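• Models λ1,..., λm are (1-ε)-correlated if for all corresponding parameters σ1,...,σm:

$$\frac{\min\{\sigma_1,\dots,\sigma_m\}}{\max\{\sigma_1,\dots,\sigma_m\}} \ge 1-\varepsilon$$

• ε → 0: High correlation
  – Models are similar
• Example (parameter values of two models λ1 and λ2):

$$\lambda_1:\ \begin{pmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{pmatrix} \qquad \lambda_2:\ \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \end{pmatrix}$$

$$\varepsilon = 1-\min\left\{\frac{0.4}{0.5},\ \frac{0.5}{0.6},\ \frac{0.6}{0.7},\ \frac{0.3}{0.4}\right\} = 0.25$$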
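The correlation definition on this slide can be checked numerically. The sketch below is a minimal illustration that flattens each model into a list of parameters (an assumed representation) and returns the smallest ε for which the models are (1-ε)-correlated. It assumes every parameter is strictly positive, since the ratio is undefined otherwise.

```python
def correlation_epsilon(models):
    """Smallest eps such that the given models are (1 - eps)-correlated.

    Each model is a flat list of parameters in the same order.
    Assumption: all parameters are strictly positive.
    """
    ratios = [min(params) / max(params) for params in zip(*models)]
    return 1 - min(ratios)


# Parameter values of the two example models on this slide.
lambda1 = [0.5, 0.5, 0.4, 0.6, 0.7, 0.3]
lambda2 = [0.4, 0.6, 0.3, 0.7, 0.6, 0.4]

eps = correlation_epsilon([lambda1, lambda2])
print(round(eps, 6))  # worst ratio is 0.3 / 0.4, so eps is about 0.25
```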
Outline
• Semantic modeling
• MIST: An index structure
  – Correlation among models
  – Composition of models
  – Hierarchical aggregation of index
  – Dynamic maintenance
• Query algorithms
• Experiments
• Conclusions
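Average index model
• λavg maintains
  – Average of all corresponding parameters:

$$\sigma_{avg} = \mathrm{avg}\{\sigma_1,\dots,\sigma_m\}$$

  – ε’: Correlation parameter between λavg and any λi
  – βmax, βmin: maximum and minimum of all parameters from constituent models
• Bounds on any constituent parameter σi:

$$\max\{\sigma_{avg}\cdot(1-\varepsilon'),\ \beta_{min}\} \ \le\ \sigma_i \ \le\ \min\{\sigma_{avg}/(1-\varepsilon'),\ \beta_{max}\}$$

• Example:

$$\lambda_1:\ \begin{pmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{pmatrix} \qquad \lambda_2:\ \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \end{pmatrix} \qquad \lambda_{avg}:\ \begin{pmatrix} 0.45 & 0.55 \\ 0.35 & 0.65 \\ 0.65 & 0.35 \end{pmatrix}$$

[Expression for ε’ in terms of m: garbled in the source, not reconstructed]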
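To make the average index model concrete, the following sketch builds λavg for the two example models and derives per-parameter bounds of the form max{σavg·(1-ε’), βmin} ≤ σi ≤ min{σavg/(1-ε’), βmax}. Note one substitution: here ε’ is measured directly from the data as the worst ratio between an averaged parameter and any constituent parameter, which is an assumption of this example and not necessarily the closed form used in the talk. The function names and the flat-list representation are also hypothetical.

```python
def average_model(models):
    """Return (avg, eps_prime, beta_min, beta_max) for a list of flat models."""
    m = len(models)
    avg = [sum(params) / m for params in zip(*models)]
    # Assumption: eps' is measured empirically from the constituent models.
    ratios = [min(a, s) / max(a, s) for model in models for a, s in zip(avg, model)]
    eps_prime = 1 - min(ratios)
    beta_min = min(min(model) for model in models)
    beta_max = max(max(model) for model in models)
    return avg, eps_prime, beta_min, beta_max


def parameter_bounds(avg, eps_prime, beta_min, beta_max):
    """Lower and upper bound on each constituent parameter."""
    return [
        (max(a * (1 - eps_prime), beta_min), min(a / (1 - eps_prime), beta_max))
        for a in avg
    ]


lambda1 = [0.5, 0.5, 0.4, 0.6, 0.7, 0.3]
lambda2 = [0.4, 0.6, 0.3, 0.7, 0.6, 0.4]

avg, eps_prime, beta_min, beta_max = average_model([lambda1, lambda2])
bounds = parameter_bounds(avg, eps_prime, beta_min, beta_max)
print([round(a, 2) for a in avg])
print(round(eps_prime, 4), beta_min, beta_max)
```

Only the averaged parameters plus three scalars need to be sent to the parent, which is where the "n+3 parameters" count for the average model comes from.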
Min-max index models
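• λmin and λmax maintain
  – Minimum and maximum of all corresponding parameters:

$$\sigma_{min} = \min\{\sigma_1,\dots,\sigma_m\}, \qquad \sigma_{min} \le \sigma_i$$

$$\sigma_{max} = \max\{\sigma_1,\dots,\sigma_m\}, \qquad \sigma_i \le \sigma_{max}$$

  – No extra parameter
• Example:

$$\lambda_1:\ \begin{pmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{pmatrix} \qquad \lambda_2:\ \begin{pmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \end{pmatrix}$$

$$\lambda_{min}:\ \begin{pmatrix} 0.4 & 0.5 \\ 0.3 & 0.6 \\ 0.6 & 0.3 \end{pmatrix} \qquad \lambda_{max}:\ \begin{pmatrix} 0.5 & 0.6 \\ 0.4 & 0.7 \\ 0.7 & 0.4 \end{pmatrix}$$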
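The min-max composition is an element-wise minimum and maximum over corresponding parameters. The short sketch below reproduces the example values; the flat-list representation and the function name `min_max_models` are assumptions of this illustration.

```python
def min_max_models(models):
    """Element-wise minimum and maximum over corresponding parameters."""
    lam_min = [min(params) for params in zip(*models)]
    lam_max = [max(params) for params in zip(*models)]
    return lam_min, lam_max


lambda1 = [0.5, 0.5, 0.4, 0.6, 0.7, 0.3]
lambda2 = [0.4, 0.6, 0.3, 0.7, 0.6, 0.4]

lam_min, lam_max = min_max_models([lambda1, lambda2])
print(lam_min)
print(lam_max)
# The first two entries of lam_min sum to about 0.9, not 1:
# the composites bound the constituent models but are not probability distributions.
print(lam_min[0] + lam_min[1])
```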
Comparison
• Statistical properties
  – Average: Valid statistical models
    • Transition and start state probabilities add up to 1
  – Min-max: Pseudo-models
    • Probabilities, in general, do not add up to 1
• Parameters
  – Average: 3 extra parameters
    • Total n+3 parameters
  – Min-max: no extra parameter
    • Total 2n parameters
Outline
• Semantic modeling
• MIST: An index structure
  – Correlation among models
  – Composition of models
  – Hierarchical aggregation of index
  – Dynamic maintenance
• Query algorithms
• Experiments
• Conclusions
Hierarchical index
• Average model
  – Correlation parameter ε’
    • Correlation gets reduced:

$$(1-\varepsilon') = (1-\varepsilon'_1)(1-\varepsilon'_2)$$

  – βmax (βmin)
    • Maximum (minimum) of the βmax (βmin) values of the children
  – Bounds become larger
• Min- (max-) model
  – Aggregation of min- (max-) model parameters
  – Min (max) becomes smaller (larger)
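The multiplicative rule for the correlation parameter can be sketched as follows, reading the two factors as the correlation parameters of two successive levels of the tree. That reading, and the helper name `compose_epsilon`, are assumptions of this example.

```python
def compose_epsilon(level_epsilons):
    """Combine per-level correlation parameters: (1 - eps) = prod(1 - eps_j)."""
    correlation = 1.0
    for eps in level_epsilons:
        correlation *= (1 - eps)
    return 1 - correlation


# Hypothetical values: eps'_1 = 0.1 at one level, eps'_2 = 0.2 at the next.
combined = compose_epsilon([0.1, 0.2])
print(round(combined, 4))  # 1 - 0.9 * 0.8 = 0.28
```

Because every factor is at most 1, the combined correlation can only shrink as levels are added, which matches the observation that bounds become larger higher in the tree.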
Dynamic maintenance
• Observations and therefore models change
• Slack parameter δ
• Models re-built with period d
• Last model update time u
• No update if λ(t+d) is within (1-δ) correlation with λ(u)
• Correlation parameter εslack maintained in the parent as

$$(1-\varepsilon_{slack}) = (1-\varepsilon_{noslack})(1-\delta)$$

• Hierarchical index construction assumes εslack
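A minimal sketch of the slack test at a sensor: after re-building its model, the node transmits an update only if some parameter has drifted outside (1-δ) correlation with the last transmitted value. The function names and the sample numbers are hypothetical.

```python
def within_slack(new_params, last_sent, delta):
    """True if every new parameter is (1 - delta)-correlated with the last sent one."""
    # Written as a product comparison so that zero-valued parameters do not divide by zero.
    return all(
        min(new, old) >= (1 - delta) * max(new, old)
        for new, old in zip(new_params, last_sent)
    )


def eps_with_slack(eps_noslack, delta):
    """Correlation parameter the parent assumes: (1 - eps_slack) = (1 - eps_noslack)(1 - delta)."""
    return 1 - (1 - eps_noslack) * (1 - delta)


last_sent = [0.52, 0.48]
small_drift = within_slack([0.50, 0.50], last_sent, delta=0.1)   # ratios 0.96 and 0.96
large_drift = within_slack([0.60, 0.40], last_sent, delta=0.1)   # ratios 0.87 and 0.83
print(small_drift, large_drift)
print(round(eps_with_slack(0.25, 0.1), 4))  # 1 - 0.75 * 0.9 = 0.325
```

A larger δ suppresses more updates but widens the bounds the parent must assume, which is the trade-off behind the optimal-slack discussion later in the talk.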
Outline
• Semantic modeling
• MIST: An index structure
• Query algorithms
  – Sequence queries
  – Model queries
• Experiments
• Conclusions
Queries
• Sequence queries
  – Query sequence of symbols: q = q1q2...qk
  – Range query: Return sensors that have observed q with a probability > χ
  – Top-1 query: Given q, return the sensor that has the highest probability of observing q
• Model queries
  – Query model: Q = {π,τ}
  – 1-NN query: Return the sensor model that is most similar to Q
Range query
• Probability of observing q from λ is

$$P(q \mid \lambda) = \sigma_1 \cdot \sigma_2 \cdots \sigma_k$$

  – q is of length k
  – σi is the ith parameter in P(q|λ)
  – For MC λ = {π,τ},

$$P(q \mid \lambda) = \pi_{q_1}\,\tau_{q_1 q_2} \cdots \tau_{q_{k-1} q_k}$$

  – For HMM, P(q|λ) is calculated as a sum along all possible state paths, each having 2k terms
• Idea is to bound every parameter σi separately
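For a Markov chain, the likelihood of a query sequence is the start probability of its first symbol multiplied by the transition probabilities along the sequence. The sketch below evaluates this for the pattern FFFF; the model values are made up for illustration and the dictionary layout is an assumption.

```python
def mc_likelihood(q, pi, tau):
    """P(q | lambda) for a Markov chain lambda = {pi, tau}."""
    probability = pi[q[0]]
    for current_state, next_state in zip(q, q[1:]):
        probability *= tau[current_state][next_state]
    return probability


# Hypothetical model of a zebra that tends to keep running once it starts.
pi = {"G": 0.5, "W": 0.3, "F": 0.2}
tau = {
    "G": {"G": 0.7, "W": 0.2, "F": 0.1},
    "W": {"G": 0.3, "W": 0.5, "F": 0.2},
    "F": {"G": 0.05, "W": 0.05, "F": 0.9},
}

p_ffff = mc_likelihood("FFFF", pi, tau)
print(round(p_ffff, 4))  # 0.2 * 0.9 * 0.9 * 0.9 = 0.1458
```

The product has exactly k factors for a sequence of length k, so bounding each factor separately gives a bound on the whole likelihood.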
Bounds
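• Average model
  – Use of δ and εslack to correct for changes after the last update

$$\sigma_i^{lb} = \max\{\sigma_i^{avg}\cdot(1-\varepsilon_{slack}),\ \beta_{min}\cdot(1-\delta)\}$$

$$\sigma_i^{ub} = \min\{\sigma_i^{avg}/(1-\varepsilon_{slack}),\ \beta_{max}/(1-\delta)\}$$

  – Therefore, bounds for P(q|λ) are

$$\prod_{i=1}^{k}\sigma_i^{lb} \ \le\ P(q \mid \lambda) \ \le\ \prod_{i=1}^{k}\sigma_i^{ub}$$

• Min-max model

$$P(q \mid \lambda_{min})\cdot(1-\delta)^k \ \le\ P(q \mid \lambda) \ \le\ P(q \mid \lambda_{max})/(1-\delta)^k$$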
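The min-max bounds translate into a three-way decision at each index node for a range query with threshold χ. The sketch below assumes MC models stored as (π, τ) dictionary pairs and uses hypothetical names. The "accept_all" branch (every sensor below qualifies once the lower bound exceeds χ) is an inference from the bounds, not something the slides state explicitly.

```python
def parameter_product(q, pi, tau):
    """Product of the k parameters that P(q | lambda) uses for a Markov chain."""
    product = pi[q[0]]
    for current_state, next_state in zip(q, q[1:]):
        product *= tau[current_state][next_state]
    return product


def range_query_decision(q, lam_min, lam_max, delta, chi):
    """Decide what to do with a subtree summarized by (lam_min, lam_max)."""
    k = len(q)
    slack_factor = (1 - delta) ** k
    lower = parameter_product(q, *lam_min) * slack_factor
    upper = min(1.0, parameter_product(q, *lam_max) / slack_factor)
    if upper <= chi:
        return "prune"        # no sensor below can exceed the threshold
    if lower > chi:
        return "accept_all"   # assumption: every sensor below qualifies
    return "descend"          # threshold falls inside the bounds


# Hypothetical composite models; only the entries needed for q = "FFFF" are given.
lam_min = ({"F": 0.1}, {"F": {"F": 0.5}})
lam_max = ({"F": 0.2}, {"F": {"F": 0.6}})

decisions = [range_query_decision("FFFF", lam_min, lam_max, 0.05, chi)
             for chi in (0.8, 0.03, 0.005)]
print(decisions)
```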
Top-1 query
• For any internal node
  – Each subtree has a lower bound and an upper bound on the probability of observing q
  – Prune a subtree if its upper bound is lower than the lower bound of some other subtree
    • Guarantees that best answer is not in this subtree
• Requires comparison of bounds across subtrees
• Pruning depends on dissimilarity of subtree models
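Top-1 pruning at an internal node can be sketched as a comparison of bounds across its subtrees: a subtree can be discarded once its upper bound falls below the best lower bound seen among its siblings, because some sensor elsewhere is guaranteed to do at least as well. The subtree names and bound values below are hypothetical.

```python
def surviving_subtrees(bounds):
    """Return the subtrees that may still contain the top-1 answer.

    bounds maps a subtree name to (lower_bound, upper_bound) on P(q | lambda).
    """
    best_lower = max(lower for lower, _ in bounds.values())
    return [name for name, (_, upper) in bounds.items() if upper >= best_lower]


bounds = {
    "A": (0.30, 0.50),
    "B": (0.05, 0.20),  # upper bound 0.20 < 0.30, so B cannot hold the best sensor
    "C": (0.10, 0.35),
}
survivors = surviving_subtrees(bounds)
print(survivors)
```

When sibling models are similar their intervals overlap and little can be pruned, which is why pruning depends on the dissimilarity of the subtree models.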
Model (1-NN) query
• Requires notion of distance between models
• Euclidean distance or L2 norm
  – Corresponding parameters are considered as dimensions

$$d(\lambda_1,\lambda_2) = \sqrt{\sum_{i=1}^{n}\left(\sigma_i^{1}-\sigma_i^{2}\right)^2}$$

• Straightforward for MCs
• For HMMs, state correspondence needs to be established
  – Domain knowledge
  – Matching
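The model distance treats each parameter as one dimension of a vector. A minimal sketch, assuming both models are flattened into lists with their parameters in the same order (for HMMs this presupposes that state correspondence has already been established):

```python
import math


def model_distance(model_a, model_b):
    """Euclidean (L2) distance between two models given as flat parameter lists."""
    if len(model_a) != len(model_b):
        raise ValueError("models must have the same number of parameters")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(model_a, model_b)))


lambda1 = [0.5, 0.5, 0.4, 0.6, 0.7, 0.3]
lambda2 = [0.4, 0.6, 0.3, 0.7, 0.6, 0.4]

distance = model_distance(lambda1, lambda2)
print(round(distance, 4))  # every parameter differs by 0.1, so sqrt(6 * 0.01)
```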
Average models
• M-tree like mechanism
  – 1-nearest-neighbor (1-NN) query
• “Model distance” space is a metric space
• Topology is the overlaid communication tree
• Average model maintains radius as largest possible distance to any model in the subtree
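• For each parameter σi, the largest possible deviation from σi^avg is

$$\max\left\{\ \min\{\sigma_i^{avg}/(1-\varepsilon'),\ \beta_{max}\}-\sigma_i^{avg},\ \ \sigma_i^{avg}-\max\{\sigma_i^{avg}\cdot(1-\varepsilon'),\ \beta_{min}\}\ \right\}$$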
Min-max models
• R-tree like mechanism
  – 1-nearest-neighbor (1-NN) query
• “Model parameter” space is a vector space
• Topology is the overlaid communication tree
• For each parameter σi, there is a lower bound (σi^min·(1-δ)) and an upper bound (σi^max/(1-δ))
• The min-max models thus form a bounding rectangle
  – Similar to minimum bounding rectangles (MBRs)
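In R-tree style search, a subtree is typically pruned using the minimum distance from the query point to its bounding rectangle. The sketch below builds the rectangle from the min-max models widened by the slack δ and computes that distance. It uses the standard MINDIST idea from R-tree nearest-neighbor search as an illustration; the talk does not spell out the exact pruning test, and all names and values are hypothetical.

```python
import math


def bounding_box(lam_min, lam_max, delta):
    """Per-parameter bounds: sigma_min * (1 - delta) and sigma_max / (1 - delta)."""
    lower = [s * (1 - delta) for s in lam_min]
    upper = [s / (1 - delta) for s in lam_max]
    return lower, upper


def min_distance_to_box(query, lower, upper):
    """Smallest Euclidean distance from the query model to any point in the box."""
    total = 0.0
    for x, lo, hi in zip(query, lower, upper):
        gap = max(lo - x, 0.0, x - hi)  # zero when x lies inside [lo, hi]
        total += gap * gap
    return math.sqrt(total)


lower, upper = bounding_box([0.4, 0.5], [0.5, 0.6], delta=0.0)
inside = min_distance_to_box([0.45, 0.55], lower, upper)
outside = min_distance_to_box([0.9, 0.1], lower, upper)
print(inside, round(outside, 4))  # 0.0 and sqrt(0.4**2 + 0.4**2)
```

A subtree whose minimum distance already exceeds the distance to the best model found so far cannot contain the nearest neighbor and need not be visited.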
“Curse of dimensionality”
• Dimensionality = number of model parameters
• No “curse” for sequence queries
  – Each index model computes two bounds of P(q|λ)
  – Pruning depends on whether χ (threshold) falls within these bounds
  – Bounds are real numbers between 0 and 1
  – Single dimensional space: the probability line
• “Curse” exists for model queries
  – R-tree, M-tree like pruning on parameter space
Outline
• Semantic modeling
• MIST: An index structure
• Query algorithms
• Experiments
  – Experimental setup
  – Effects of different parameters
  – Fault-tolerance
• Conclusions
Optimal slack
• Large slack minimizes updates but querying cost goes up
• Reverse for small slack
• Optimal slack can be chosen by analyzing expected total costs
• Non-linear optimization
  – Difficult for local nodes
  – Almost impossible over the entire network
  – Changes in the models require re-computation
• Experimental method
Fault-tolerance
• Periodic heartbeat messages from child to parent
  – Extra messages
• When parent fails or child-parent link fails
  – Child finds another parent
  – Sends model parameters
  – Model, correlation, etc. are calculated afresh in the new parent
• When node or link comes up
  – Child switches to original parent
  – Old parent notified
  – Parents update their models, correlation, etc.
Outline
• Semantic modeling
• MIST: An index structure
• Query algorithms
• Experiments
  – Experimental setup
  – Effects of different parameters
  – Fault-tolerance
• Conclusions
Experimental setup
• Two datasets
  – Real dataset
    • Laboratory sensors
    • Temperature readings
    • Readings every 30 s for 10 days
    • 4 rooms, each having 4 sensors
    • States: C (cold, <25°C), P (pleasant), H (hot, >27°C)
  – Synthetic dataset
    • Network size varied from 16 to 512
    • State size varied from 3 to 11
    • Correlation parameter ε varied from 0.001 to 0.5
• Both MCs and HMMs
• Metric to measure
  – Communication cost in bytes
Compared techniques
• Centralized with no slack
  – Node transmits all updates to BS
  – Zero querying cost
• Centralized with slack
  – Node maintains slack
  – Query sent to sensor nodes if cached models at BS cannot answer
• MIST schemes
  – Average/min-max models
  – With/without slack
Effect of query rate
• Slack-based schemes win at small query rates
• Centralized scheme with no slack is the best at very high query rates
Update costs
• No-slack schemes incur almost double the update costs
• MIST’s slack schemes are better since updates are pruned at every level in the hierarchy
Query costs
• Costs increase with decreasing correlation (1-ε)
• At high correlation (low ε), no-slack schemes (including centralized) perform better
Optimal slack
• Minimum exists for MIST’s schemes
• Centralized: Due to low query rate, update costs dominate querying costs
Network size
• No-slack schemes are better
• Querying cost increases due to higher bounds and longer path lengths to leaf nodes
Number of states: update costs
• Update costs increase with number of states
• MIST schemes are scalable due to hierarchical pruning
Number of states: query costs
• Querying cost decreases
  – Each model parameter σ decreases
  – Probability of observing q, i.e., P(q|λ) decreases
  – Therefore, bounds decrease
Number of states: total costs
• For sequence queries, no “curse of dimensionality”
Number of states: model query
• For model queries, “curse of dimensionality” sets in
  – Scalable up to reasonable state sizes
Fault-tolerance experiments
• Costs increase moderately due to parent switching
• Scalable with probability of failure
Outline
• Semantic modeling
• MIST: An index structure
• Query algorithms
• Experiments
• Conclusions
  – Future work
Conclusions
• A hierarchical in-network index structure for sensor networks using statistical models
• Hierarchical model aggregation schemes
  – Average model
  – Min-max models
• Queries
  – Sequence queries
  – Model query
• Experiments
  – Better than centralized schemes in terms of update, querying and total communication costs
  – Scales well with network size and number of states
Future work
• How to overlay the tree?
  – Similar models should be in the same subtree
    • “Quality” of tree
  – Distributed solutions
  – What happens when models are updated?
• Fault-tolerance
  – How to find the best parent during faults?
  – Whether to switch back or stay after recovery
  – How to replicate information in siblings?
• Deployment