Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations
description
Transcript of Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations
![Page 1: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/1.jpg)
Finding Probabilistic Nearest Neighbors for Query Objects with
Imprecise Locations
Yuichi Iijima and Yoshiharu IshikawaNagoya University, Japan
![Page 2: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/2.jpg)
Outline
• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• Conclusions
2
![Page 3: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/3.jpg)
3
Imprecise Location Information
• Sensor Environments– Measurement Noise– Frequent updates may not be possible
• GPS-based positioning consumes batteries• Robotics
– Localization using sensing andmovement histories
– Probabilistic approach hasvagueness
• Privacy– Location Anonymity
![Page 4: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/4.jpg)
4
Nearest Neighbor Queries
• Nearest Neighbor Queries– Example: Find the closest bus stop– Traditional problem in spatial databases
• Efficient query processing using spatial indices• Extensible to multi-dimensional cases (e.g., image
retrieval)• What happen if the
location of query object is uncertain? q
NN of q
![Page 5: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/5.jpg)
Example Scenario (1)
• Query: Find the nearest gas station
5
Location estimation based on noisy GPS data
35%
10%
55%
Nearest gas stationdepends on the possiblecar locations
0%
NN objects are definedin a probabilistic way
![Page 6: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/6.jpg)
Example Scenario (2)
• Mobile Robotics– Location of the robot is estimated
based on movement histories andsensor data
– Measurements are noisy– Localization based on probabilistic
modeling• Kalman filter, particle filter, etc.
– Estimated location is given as aprobabilistic density function (PDF)• PDF changes on each estimation
6
![Page 7: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/7.jpg)
7
Probabilistic Nearest Neighbor Query (1)
• PNNQ for short• Assumptions
– Location of query object q is specified as a Gaussian distribution
– Target data: static points• Gaussian Distribution
–Σ: Covariance matrix
)()(
21exp
||)2(1)( 1
2/12/ qxqxx ΣΣ
tdqp
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08
-4-2
0 2
4
0 0.02 0.04 0.06 0.08
pq(x)
![Page 8: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/8.jpg)
8
Probabilistic Nearest Neighbor Query (2)
• Definition
• Find objects which satisfy the condition– The probability that
the object is thenearest neighbor of q is greater than or equal to θ
q
),','Pr(),(Pr o'xox ooΟooqNN
}),(Pr,|{),( oqOooqPNNQ NN
![Page 9: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/9.jpg)
Outline
• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• Conclusions
9
![Page 10: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/10.jpg)
Related Work
• Query processing methods for uncertain (location) data– Cheng, Prabhakar, et al. (SIGMOD’03, VLDB’04, …)– Tao et al. (VLDB’05, TODS’07)– Consider arbitrary PDFs or uniform PDFs– Most of the case, target objects are imprecise
• Research related to Gaussian distribution– Gauss-tree [Böhm et al., ICDE’06]
• Target objects are based on Gaussian distributions
• Our former work– Ishikawa, Iijima, Yu (ICDE’09): Probabilistic
range queries10
![Page 11: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/11.jpg)
Outline
• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• Conclusions
11
![Page 12: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/12.jpg)
Naïve Approach (1)
• Use of Voronoi Diagram– Well-known method for standard (non-imprecise)
nearest neighbor queries
12
a b
e
f g
c
d
h
Voronoi region Ve
![Page 13: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/13.jpg)
Naïve Approach (2)
• PrNN(q, o): Nearest neighbor probability– Probability that object o is the nearest
neighbor of query object q– It can be computed by integrating the
probability density function pq(x) over Voronoi region Vo
• Problem– Need to consider all target
objects– Numerical integration (Monte
Carlo method) is quite costly! 13
a b
e
fg
c
d
h
qpq(x)
Ve
oV qNN dpoq xx)(),(Pr
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08
-4-2
0 2
4
0 0.02 0.04 0.06 0.08
pq(x)
![Page 14: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/14.jpg)
Our Approach
• Outline of processing1. Filtering
• Prune non-candidate objects whose PrNN are obviously smaller than the threshold
• Low-cost filtering conditions2. Numerical integration for the remaining
candidate objects• Two strategies
– -region-based Approach– SES-based Approach
• SES: Smallest Enclosing Sphere
14
![Page 15: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/15.jpg)
Strategy 1: -Region-Based Approach (1)
• -region– Similar concepts are often used in query processing for
uncertain spatial databases– Definition: Ellipsoidal region for which the result of the
integration becomes 1 – 2 :
The ellipsoidal region
is the -region• Example: is specified as 1%,
we consider 98% -region15
21 )()(21)(
r qt
dpqxqx
xxΣ
21)()( r
t
qxqx Σ
q
-region
![Page 16: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/16.jpg)
Strategy 1: -Region-Based Approach (2)
• Query Processing1. -region for the query is computed at first
• -region can be derived using r-table:– It is constructed for the normal Gaussian (S = I, q = 0): Given ,
it returns appropriate r
• Final -region can be obtained by transformation2. Derive the bounding box of
-region 3. Objects whose Voronoi
regions overlaps with the box are the candidates
• {a, c, d, f, g}, in this example16
q
b
e
h
θ-region
ac
f
dg
![Page 17: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/17.jpg)
Strategy 1: -Region-Based Approach (3)
• Derivation details• First, we consider the
normal Gaussian– Using r-table, we can get
the appropriate r for given
• Second, a spherical -region is derived based on transformation – Transformation is performed
by analyzing S
• Third, the bonding box is calculated
where (Σ)ii is the (i, i) entry of Σ
17
qr
q
iii
ii rw
)(Σ
qwj
wj
wi wi
xi
xj
![Page 18: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/18.jpg)
Strategy 2: SES-Based Approach (1)
• SES: Smallest Enclosing Sphere– For each Voronoi region Vo, we compute its SES
SESo beforehand
18
SESe: SES for Voronoi region of e
a b
e
f g
c
d
h
![Page 19: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/19.jpg)
Strategy 2: SES-Based Approach (2)
• Integration over SESo gives the upper-bound for PrNN(q, o)
– Integration over a sphere region is more easier to compute• We use a table called
U-catalog constructed beforehand
19
oo SES qV qNN dpdpoq xxxx )()(),(Pr
a b
e
fg
c
d
h
qpq(x)
![Page 20: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/20.jpg)
Strategy 2: SES-Based Approach (3)
• What is U-catalog?– Given two parameters and , it returns corresponding
integral– U-catalog is made for different (, ) pairs by computing
the integral of normal Gaussian over sphere region R
20
xxx
dpR
)(),( norm
R
q
α δ π(α, δ)0.0 0.1 …0.0 0.2 …… … …1.0 0.1… … …
![Page 21: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/21.jpg)
Strategy 2: SES-Based Approach (4)
• To use U-catalog, another approximation is required since it is only useful for normal Gaussian– Use of upper-bounding function pq(x)– pq(x) tightly bounds pq(x) and has spherical isosurfaces– For pq(x), we can easily derive its integral over SESo using
U-catalog• In summary, we use two
approximations:
21
ab
e
fg
c
d
h
q
pq(x)
o
o
o
SES q
SES q
V qNN
dp
dp
dpoq
xx
xx
xx
)(
)(
)(),(Pr
T
pq(x)
T
T
T
T
![Page 22: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/22.jpg)
Strategy 2: SES-Based Approach (5)
• Bounding Function– Original Gaussian PDF
– Upper-Bounding Function
where
22
)()(
21exp
)2(1)( 1
2/12/qxqxx Σ
Σt
dqp
2
2/12/ 2exp
)2(1)( qxx
TT
Σdqp
}min{1
1
i
d
i
tiii
T
vvΣ )()( xx Tqq pp
is satisfied
![Page 23: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/23.jpg)
Outline
• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• and Conclusions
23
![Page 24: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/24.jpg)
24
Setup of Experiments (1)
• Target Data– 2D point data (50K entries)– Based on road line
segments in Long Beach• Computation of Voronoi
regions and SESs– LEDA package was used
• Comparison– Strategies 1, 2, and their hybrid approach– Evaluation metric: Response time
![Page 25: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/25.jpg)
25
Setup of Experiments (2)
• Default Parameters– Covariance matrix
• For this matrix the shape of the isosurface of pq(x) is an ellipse titled at 30 degrees and the major-to-minor axis ratio is 3:1
• γ : Parameter for controlling vagueness (default: γ = 10)– Probability threshold value: θ = 0.01– No. of samples for Monte Carlo method: 1,000,000
732327Σ
qpq(x)
![Page 26: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/26.jpg)
Candidate Objects in Strategy 1
26
Voronoi regions for candidate objects
Voronoi regions for answer objects
Bounding box of the θ-region
Query center
![Page 27: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/27.jpg)
Candidate Objects in Strategy 2
27
Voronoi regions for candidate objects
Voronoi regions for answer objects
Query center
![Page 28: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/28.jpg)
Candidate Objects in Hybrid Strategy
28
Voronoi regions for candidate objects
Voronoi regions for answer objects
Bounding box of the θ-region
Query Center
![Page 29: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/29.jpg)
Experimental Results - Default Parameters
• Most of the processing time is spent in numerical integration
• A strategy that can prune more objects has better performance
29
Number of candidates 179 150 129Number of answers 26
Strategy 1 Strategy 2 Hybrid0.00 5.00
10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00
0.09 0.19 0.13
43.70 38.70
34.13
2.45 2.48
2.47
Filtering Compute Prob. Rest
Res
pons
e Ti
me[
s]
![Page 30: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/30.jpg)
Experimental Results - Different γ Values
30
Number of candidates 24 31 23Number of answers 8
γ = 1(almost exact)
γ = 10 γ = 50(too vague)
Strategy 1
Strategy 2
Hybrid0.00 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00
10.00
0.09 0.13 0.13
6.28 6.05 5.53
2.45 2.48 2.48
Res
pons
e Ti
me[
s]
Strategy 1
Strategy 2
Hybrid0.00 5.00
10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00
0.09 0.19 0.13
43.70 38.70 34.13
2.45 2.48
2.47
Filtering Compute Prob. Rest
Strategy 1
Strategy 2
Hybrid0.00
50.00
100.00
150.00
200.00
250.00
0.09 0.22 0.13
209.22
70.88 66.41
2.45
2.47 2.46
179 150 129
26
847 276 260
15
Query is costly when impreciseness is high
![Page 31: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/31.jpg)
Experimental Results - Different Shapes
32
Number of candidates 115 50 49Number of answers 26
Circle Default Ellipse Narrow Ellipse
179 150 129
26
276 366 250
24
Strategy 1
Strategy 2
Hybrid0.00
5.00
10.00
15.00
20.00
25.00
30.00
35.00
0.09 0.15 0.13
26.97
13.60 13.42
2.45
2.46 2.45
Res
pons
e Ti
me[
s]
Strategy 1
Strategy 2
Hybrid0.00 5.00
10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00
0.09 0.19 0.13
43.70 38.70 34.13
2.45 2.48
2.47
Filtering Compute Prob. Rest
Strategy 1
Strategy 2
Hybrid0.00
10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00
100.00
0.09 0.20 0.13
69.69 84.56
62.56
2.45
2.46
2.46
Narrow ellipse (correlated distribution) needs to consider additional candidates
![Page 32: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/32.jpg)
Conclusions of Experimental Results
• Most of the processing time is spent in numerical integration⇒A strategy that can prune more objects has
better performance Superiority or inferiority of two strategies
depends on the given query and the specified parameters No apparent winner
• The hybrid strategy inherits the benefits of two strategies and shows the best performance
33
![Page 33: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/33.jpg)
Outline
• Background and Problem Formulation• Related Work• Query Processing Strategies• Experimental Results• Conclusions
34
![Page 34: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/34.jpg)
Conclusions
• Nearest neighbor query processing methods for imprecise query objects– Location of query object is represented by
Gaussian distribution– Two strategies and their combination– Reduction of numerical integration is important– Proposal of two pruning strategies and their
evaluation• Future work
– Evaluation for multi-dimensional cases (d 3)
35
![Page 35: Finding Probabilistic Nearest Neighbors for Query Objects with Imprecise Locations](https://reader034.fdocuments.us/reader034/viewer/2022051317/568164a0550346895dd68f76/html5/thumbnails/35.jpg)
Finding Probabilistic Nearest Neighbors for Query Objects with
Imprecise Locations
Yuichi Iijima and Yoshiharu IshikawaNagoya University, Japan