Multimedia DBs
Multimedia dbs
A multimedia database stores text, strings, and images
Similarity queries (content-based retrieval): given an image, find the images in the database that are similar to it (or “describe” the query image instead)
Extract features, index them in feature space, and answer similarity queries using GEMINI
Again, average values help!
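As a rough illustration of the GEMINI idea (filter with a cheap feature distance that lower-bounds the true distance, then refine with the true distance), here is a minimal sketch. It assumes images are flat lists of RGB pixels of equal size and uses pixel-wise Euclidean distance as a stand-in for the real image distance; all function names are hypothetical.

```python
import math

def true_dist(img_a, img_b):
    """Pixel-wise Euclidean distance between two equal-size images
    (each a list of (r, g, b) tuples). A stand-in for the real distance."""
    return math.sqrt(sum((a - b) ** 2
                         for pa, pb in zip(img_a, img_b)
                         for a, b in zip(pa, pb)))

def avg_rgb(img):
    """Feature extraction: the average (r, g, b) of the image."""
    n = len(img)
    return tuple(sum(p[c] for p in img) / n for c in range(3))

def feature_dist(fa, fb, n_pixels):
    """sqrt(n) * ||avg_a - avg_b||. By the Cauchy-Schwarz inequality this
    never exceeds true_dist, so filtering on it causes no false dismissals."""
    return math.sqrt(n_pixels) * math.dist(fa, fb)

def range_query(db, query, eps):
    """Filter on the cheap feature distance, then refine with the true one.
    A real system would precompute the features and store them in an index."""
    n = len(query)
    fq = avg_rgb(query)
    candidates = [img for img in db if feature_dist(avg_rgb(img), fq, n) <= eps]
    return [img for img in candidates if true_dist(img, query) <= eps]
```

The filter step may let through false alarms, which the refine step discards; the lower-bounding property is what guarantees that no true answer is lost.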
Image Features
Features extracted from an image are based on: color distribution, shapes and structure, ...
Images - color
Q: what is an image? A: a 2-d RGB array
Images - color
Color histograms, and distance function
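A color histogram can be sketched as follows. This is a minimal version that assumes 8-bit RGB pixels given as (r, g, b) tuples and a uniform binning of each channel; the function name and the default number of bins are illustrative choices, not taken from the slides.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Fraction of pixels falling into each color bin.
    Returns a list of length bins_per_channel ** 3 that sums to 1."""
    b = bins_per_channel
    hist = [0] * (b ** 3)
    for r, g, bl in pixels:
        # Map each 0..255 channel value to a bin index in 0..b-1.
        ri, gi, bi = r * b // 256, g * b // 256, bl * b // 256
        hist[(ri * b + gi) * b + bi] += 1
    total = len(pixels)
    return [count / total for count in hist]
```

Each image then becomes one vector in histogram space, and similarity search reduces to comparing such vectors with a distance function.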
Images - color
Mathematically, the distance function between a vector x and a query q is:
$$D(x, q) = (x - q)^T A (x - q) = \sum_{i} \sum_{j} a_{ij} (x_i - q_i)(x_j - q_j)$$
A = I ?
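The quadratic-form distance above can be written directly as a double sum. The sketch below assumes plain Python lists for the vectors and a list of lists for the matrix A; the example matrix is made up for illustration.

```python
def quadratic_form_distance(x, q, A):
    """D(x, q) = (x - q)^T A (x - q) = sum_i sum_j a_ij (x_i - q_i)(x_j - q_j)."""
    diff = [xi - qi for xi, qi in zip(x, q)]
    n = len(diff)
    return sum(A[i][j] * diff[i] * diff[j] for i in range(n) for j in range(n))

# With A = I the double sum collapses to the squared Euclidean distance.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical similarity matrix: bins 0 and 1 hold perceptually close colors,
# so mass shifted between them should count for less.
similar_bins = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

The off-diagonal entries of A are what couple different histogram bins to each other.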
Images - color
Problem: ‘cross-talk’: features are not orthogonal -> SAMs (spatial access methods) will not work properly
Q: what to do? A: it is a feature-extraction question
Images - color
Possible answer: avg red, avg green, avg blue
It turns out that the distance on these averages lower-bounds the histogram distance ->
no cross-talk, so SAMs are applicable
Images - color
Performance:
[Plot: time vs. selectivity, comparing ‘w/ avg RGB’ against ‘seq scan’]
Images - shapes
Distance function: Euclidean, on the area, perimeter, and 20 ‘moments’
Q: how to normalize them? A: divide by standard deviation
Q: other ‘features’ / distance functions? A1: turning angle A2: dilations/erosions A3: ...
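The "divide by standard deviation" step can be sketched as below. The sketch assumes each shape is a list of numeric features (area, perimeter, moments, ...) and uses the population standard deviation over the whole collection; the function name is hypothetical.

```python
import statistics

def normalize_by_std(vectors):
    """Divide each feature (column) by its standard deviation across the
    collection, so that no single feature dominates the Euclidean distance."""
    columns = list(zip(*vectors))
    # Guard against constant features, whose standard deviation is 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[v / s for v, s in zip(vec, stds)] for vec in vectors]
```

Without this step a feature measured in large units (for example area in pixels) would swamp the others in the Euclidean distance.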
Q: how to do dimensionality reduction? A: Karhunen-Loeve (= centered PCA/SVD)
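Karhunen-Loeve amounts to centering the data and keeping the directions of largest variance. The pure-Python sketch below finds only the first such direction, using power iteration on the covariance matrix as a stand-in for a full SVD; the function names and the iteration count are assumptions made for illustration.

```python
import math

def first_principal_direction(points, iters=200):
    """Return (mean, direction): the data mean and the leading eigenvector of
    the covariance matrix of the centered points, found by power iteration.
    Assumes the data has nonzero variance and that the all-ones start vector
    is not orthogonal to the leading eigenvector."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(point, mean, direction):
    """Coordinate of a (centered) point along the given direction."""
    return sum((x - m) * u for x, m, u in zip(point, mean, direction))
```

Keeping the first few such coordinates instead of all features gives the reduced representation that is then indexed.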
Images - shapes
Performance: ~10x faster
[Plot: log(# of I/Os) vs. # of features kept; labeled: ‘all kept’]
Dimensionality Reduction
Many problems (like time-series and image similarity) can be expressed as proximity problems in a high-dimensional space
Given a query point, we try to find the points that are close to it...
But in high-dimensional spaces things are different!
Effects of High-dimensionality
Assume a uniformly distributed set of points in high dimensions, $[0,1]^d$
Take a query with side length 0.1 in each dimension -> query selectivity in 100-d is $10^{-100}$
If we want constant selectivity (0.1), the length of the side must be ~1 ($0.1^{1/100} \approx 0.977$)!
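The arithmetic behind these two numbers is short enough to check directly. The sketch assumes uniformly distributed points in the unit hypercube and a hypercube-shaped query; the function names are illustrative.

```python
def selectivity(side, d):
    """Fraction of uniform points in [0,1]^d covered by a hypercube query
    with the given side length in every dimension."""
    return side ** d

def side_for_selectivity(sel, d):
    """Side length needed for a hypercube query to cover a fraction sel."""
    return sel ** (1.0 / d)
```

For d = 100, `selectivity(0.1, 100)` is about 1e-100, while `side_for_selectivity(0.1, 100)` is about 0.977, so a query that returns 10% of the data must span almost the whole range in every dimension.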
Effects of High-dimensionality
Surface is everything!
Probability that a point is closer than 0.1 to a (d-1)-dimensional surface:
d = 2: 0.36
d = 10: ~0.89
d = 100: ~1
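These probabilities follow from the volume of the inner cube that stays at least 0.1 away from every face. A minimal check, assuming uniform points in the unit hypercube (the function name is hypothetical):

```python
def prob_near_surface(d, margin=0.1):
    """Probability that a uniform point in [0,1]^d lies within `margin` of the
    boundary: 1 minus the volume of the inner cube of side (1 - 2 * margin)."""
    return 1.0 - (1.0 - 2.0 * margin) ** d
```

For d = 2 this gives 1 - 0.8^2 = 0.36, and the value approaches 1 quickly as d grows.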
Effects of High-dimensionality
Number of grid cells and surfaces
Number of k-dimensional surfaces in a d-dimensional hypercube
Binary partitioning -> $2^d$ cells
Indexing in high dimensions is extremely difficult: the “curse of dimensionality”
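To get a feel for how fast these counts grow, the sketch below uses the standard combinatorial count of k-dimensional faces of a d-dimensional hypercube, C(d, k) * 2^(d - k). That formula does not appear in the transcript and is supplied here as an assumption about what the slide showed; the function names are illustrative.

```python
from math import comb

def num_k_faces(d, k):
    """Number of k-dimensional faces of a d-dimensional hypercube:
    choose the k free axes, then fix each remaining axis at 0 or 1."""
    return comb(d, k) * 2 ** (d - k)

def num_cells_binary_partition(d):
    """Splitting every dimension once (in half) yields 2^d cells."""
    return 2 ** d
```

A cube (d = 3) has 8 corners, 12 edges, and 6 faces; at d = 100 a single binary split per dimension already produces 2^100 cells, far more than any realistic number of data points.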
Dimensionality Reduction
The main idea: reduce the dimensionality of the space.
Project the d-dimensional points into a k-dimensional space so that:
k << d
distances are preserved as well as possible
Solve the problem in low dimensions (the GEMINI idea of course…)
DR requirements
The ideal mapping should:
1. Be fast to compute: O(N) or O(N log N), but not $O(N^2)$
2. Preserve distances, leading to small discrepancies
3. Provide a fast algorithm to map a new query (why?)
MDS (multidimensional scaling)
Input: a set of N items, the pair-wise (dis)similarities, and the dimensionality k
Optimization criterion:
$$\text{stress} = \left( \frac{\sum_{i,j} \left( D(S_i, S_j) - D(S_i^k, S_j^k) \right)^2}{\sum_{i,j} D(S_i, S_j)^2} \right)^{1/2}$$
where $D(S_i, S_j)$ is the distance between time series $S_i$ and $S_j$, and $D(S_i^k, S_j^k)$ is the Euclidean distance of their k-dim representations
Steepest descent algorithm: start with an assignment (time series to k-dim point), then minimize stress by moving points
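The stress criterion and the steepest-descent loop can be sketched as follows. This is a minimal version under stated assumptions: the input is a full symmetric distance matrix D, the starting assignment is random, and the step size, step count, and seed are arbitrary illustrative values. Since the denominator of stress does not depend on the k-dim points, the descent works on the numerator only.

```python
import math
import random

def stress(D, X):
    """sqrt( sum (D_ij - ||X_i - X_j||)^2 / sum D_ij^2 ), over pairs i < j."""
    n = len(D)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            num += (D[i][j] - math.dist(X[i], X[j])) ** 2
            den += D[i][j] ** 2
    return math.sqrt(num / den)

def mds(D, k, steps=500, lr=0.05, seed=0):
    """Steepest descent on the stress numerator. Each step touches all
    pairs, so one iteration already costs O(N^2)."""
    rng = random.Random(seed)
    n = len(D)
    X = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * k for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dij = math.dist(X[i], X[j]) or 1e-12  # avoid division by zero
                coef = 2.0 * (dij - D[i][j]) / dij
                for c in range(k):
                    grad[i][c] += coef * (X[i][c] - X[j][c])
        # Move every point a small step against its gradient.
        X = [[X[i][c] - lr * grad[i][c] for c in range(k)] for i in range(n)]
    return X
```

Note that the result is only determined up to rotation, reflection, and translation, and that a new item cannot be placed without comparing it against the existing ones.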
MDS
Disadvantages:
Running time is $O(N^2)$, because of slow convergence
Also, it requires O(N) time to insert a new point, which is not practical for queries
FastMap [Faloutsos and Lin, 1995]
Maps objects to k-dimensional points so that distances are preserved well
It is an approximation of Multidimensional Scaling
Works even when only distances are known
Is efficient, and allows efficient query transformation
FastMap
Find two objects that are far away
Project all points on the line the two objects define, to get the first coordinate
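The two steps above can be sketched using only a distance function, which is what lets FastMap work when no coordinates are known. The projection uses the cosine law; the pivot heuristic shown (jump to the farthest object, then once more) is a common way to find a far-apart pair cheaply and is an assumption here, as are the function names.

```python
import math

def choose_pivots(objects, dist):
    """Heuristic for a far-apart pair: start anywhere, take the farthest
    object, then take the object farthest from that one."""
    a = objects[0]
    b = max(objects, key=lambda o: dist(a, o))
    a = max(objects, key=lambda o: dist(b, o))
    return a, b

def first_coordinate(obj, a, b, dist):
    """Projection of obj on the line through pivots a and b (cosine law):
    x = (d(a, obj)^2 + d(a, b)^2 - d(b, obj)^2) / (2 * d(a, b))."""
    dab = dist(a, b)
    return (dist(a, obj) ** 2 + dab ** 2 - dist(b, obj) ** 2) / (2 * dab)

# Usage with plain 2-d points and Euclidean distance (illustrative data).
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
pivot_a, pivot_b = choose_pivots(points, math.dist)
coords = [first_coordinate(p, pivot_a, pivot_b, math.dist) for p in points]
```

A new query object is mapped the same way, with two distance computations against the stored pivots, which is what makes query transformation cheap.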
FastMap - next iteration
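For the next iteration, FastMap as described by Faloutsos and Lin repeats the same projection on the hyperplane perpendicular to the pivot line, using a residual distance obtained from the Pythagorean theorem. A minimal sketch follows; the function names are hypothetical, and the clamp to zero is an added safeguard for distances that are not exactly Euclidean.

```python
import math

def residual_dist(dist, coord):
    """Build the distance used in the next iteration:
    d'(i, j)^2 = d(i, j)^2 - (x_i - x_j)^2,
    where x_i = coord(i) is the coordinate just computed for object i."""
    def d_next(i, j):
        sq = dist(i, j) ** 2 - (coord(i) - coord(j)) ** 2
        return math.sqrt(max(sq, 0.0))  # clamp small negatives to zero
    return d_next
```

Choosing new pivots under `d_next` and projecting again yields the second coordinate; repeating this k times gives the k-dimensional point for each object.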
Results
Documents / cosine similarity -> Euclidean distance (how?)
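One standard answer to the "how?", though not necessarily the one intended on the slide: for vectors scaled to unit length, the squared Euclidean distance equals 2 - 2 * cos, so a cosine similarity converts directly into a Euclidean distance. The sketch assumes documents are nonzero term-weight vectors; the function names are illustrative.

```python
import math

def cosine_similarity(x, y):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))

def normalize(x):
    """Scale a nonzero vector to unit length."""
    norm = math.hypot(*x)
    return [a / norm for a in x]

def euclid_from_cosine(cos_sim):
    """Euclidean distance between the unit-length versions of two vectors:
    ||x - y||^2 = 2 - 2 * cos(x, y)."""
    return math.sqrt(max(0.0, 2.0 - 2.0 * cos_sim))
```

The distance shrinks monotonically as the similarity grows, so nearest neighbors under this distance are the most similar documents under cosine similarity.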