Search Techniques for Multimedia Databases

Similarity-Based Queries

Similarity Computation

Indexing Techniques

Data Clustering

Search Algorithms

Characteristics of Multimedia Queries

• We normally retrieve a few records from a traditional DBMS by specifying exact queries based on the notion of “equality”.

• The queries expected in an image/video DBMS are relatively vague or fuzzy, and are based on the notion of “similarity”.

The indexing structure should be able to satisfy similarity-based queries for a wide range of similarity measures.

Content-Based Retrieval

• It is necessary to extract the features that characterize the image and to index the image on these features.

Examples: shape descriptions, texture properties.

• Typically, there are a few different quantitative measures that describe the various aspects of each feature.

Example: The texture attribute of an image can be modeled as a 3-dimensional vector with measures of directionality, contrast, and coarseness.

Measure of Similarity

A suitable measure of similarity between an image feature vector F and a query vector Q is the weighted metric D:

$$D(F, Q) = (F - Q)^T \, W \, (F - Q)$$

where W is an n × n matrix that can be used to specify suitable weighting measures. When W is diagonal with entries W_1, …, W_n, D is the square of the weighted Euclidean distance:

$$D(F, Q) = W_1 (F_1 - Q_1)^2 + W_2 (F_2 - Q_2)^2 + \dots + W_n (F_n - Q_n)^2$$


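Similarity Based on Euclidean Distance

$$D(F, Q) = (F - Q)^T \, A \, (F - Q)$$

where A is the identity matrix.

Example: F1 = (3, 4, 6), F2 = (2, 4, 7), F3 = (3, 4, 7), and Q = (2, 4, 6).

$$D(F_1, Q) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} I \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^T = 1$$
$$D(F_2, Q) = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} I \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T = 1$$
$$D(F_3, Q) = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix} I \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}^T = 2$$

• D(F1, Q) = D(F2, Q): F1 and F2 are equally similar to Q.
• D(F1, Q) < D(F3, Q): F3 is less similar to Q.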

Similarity Based on Euclidean Distance (cont.)

[Figure: points F1, F2, F3 and query point Q plotted on axes Feature 1 and Feature 2]

• Points that lie at the same distance from the query point are considered equally similar, e.g., F1 and F2.
• Features 1 and 2 are treated equally.

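Similarity Based on Weighted Euclidean Distance

$$D(F, Q) = (F - Q)^T \, W \, (F - Q)$$

where W is a diagonal matrix.

Example: F1 = (4, 5, 7), F2 = (3, 5, 8), Q = (3, 5, 7), and W = diag(1, 1, 2).

$$D(F_1, Q) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^T = 1$$
$$D(F_2, Q) = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T = 2$$

D(F1, Q) < D(F2, Q), so F1 is more similar to Q: dissimilarity in the 3rd dimension is emphasized.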
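The two worked examples above can be checked with a short NumPy sketch of the quadratic-form distance D(F, Q) = (F − Q)^T W (F − Q). The function name `quadratic_distance` is our own; the vectors and weight matrices are the ones used in the identity-matrix and weighted examples.

```python
import numpy as np

def quadratic_distance(f, q, w):
    """D(F, Q) = (F - Q)^T W (F - Q) for feature vector f, query q, and n x n matrix w."""
    diff = np.asarray(f, dtype=float) - np.asarray(q, dtype=float)
    return float(diff @ np.asarray(w, dtype=float) @ diff)

# Identity matrix A: the plain squared Euclidean distance.
A = np.eye(3)
Q = (2, 4, 6)
d_identity = [quadratic_distance(f, Q, A) for f in [(3, 4, 6), (2, 4, 7), (3, 4, 7)]]

# Diagonal W = diag(1, 1, 2): differences in the 3rd dimension count twice.
W = np.diag([1, 1, 2])
Qw = (3, 5, 7)
d_weighted = [quadratic_distance(f, Qw, W) for f in [(4, 5, 7), (3, 5, 8)]]

print(d_identity)  # [1.0, 1.0, 2.0]
print(d_weighted)  # [1.0, 2.0]
```

Because W is an arbitrary matrix here, the same function also covers non-diagonal weightings, which the slide's general definition allows.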

How to determine the weights?

The variance of the individual feature measures can be used to determine their weights:

$$A = \begin{bmatrix} 1/\sigma_1^2 & 0 & 0 \\ 0 & 1/\sigma_2^2 & 0 \\ 0 & 0 & 1/\sigma_3^2 \end{bmatrix}$$

where σi² is the statistical variance of the i-th feature measure.

Rationale:

• Variance characterizes the dispersion among the measures.

• Use a larger weight for a feature with a smaller variance.
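A minimal sketch of inverse-variance weighting, assuming the feature vectors of the collection are stored as rows of a NumPy array; the function name and the sample data are hypothetical. It uses the population variance (`np.var` with its default `ddof=0`) and does not guard against a feature whose variance is zero.

```python
import numpy as np

def inverse_variance_weights(features):
    """Diagonal weight matrix diag(1/sigma_i^2) from an (objects x features) array."""
    variances = np.var(np.asarray(features, dtype=float), axis=0)
    return np.diag(1.0 / variances)

# Hypothetical feature vectors: feature 0 varies a lot, feature 1 very little.
features = np.array([[10.0, 1.0], [30.0, 1.2], [50.0, 0.8], [70.0, 1.0]])
W = inverse_variance_weights(features)
print(np.diag(W))  # about [0.002, 50.0]: the widely spread feature gets the small weight
```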

Effect of Weights

[Figure: a cluster of similar objects (e.g., cars) plotted by Color and Shape, with query Q and points F1 and F2]

• The feature with the larger variance gets the smaller weight.

• F1 and F2 are at the same weighted distance from Q: the points at equal distance form an ellipsoid, not a circle.

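Distance in Euclidean Space

• 1-norm distance (Manhattan distance): $d_1(x, y) = \sum_{i=1}^{n} |x_i - y_i|$

• 2-norm distance (Euclidean distance): $d_2(x, y) = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}$

• p-norm distance (Minkowski distance): $d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$

• Infinity-norm distance (Chebyshev distance): $d_\infty(x, y) = \max_{i} |x_i - y_i|$, the maximum distance between any component of the two vectors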
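A plain-Python sketch of the four distances (the function names are our own). The Manhattan and Euclidean distances are the p = 1 and p = 2 cases of the Minkowski distance, and as p grows the Minkowski distance approaches the Chebyshev distance.

```python
def minkowski(x, y, p):
    """p-norm (Minkowski) distance; p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    """Infinity-norm distance: the largest difference in any single component."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1), minkowski(x, y, 2), chebyshev(x, y))  # 7.0 5.0 4
print(minkowski(x, y, 50))  # close to 4, the Chebyshev distance
```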

Common Properties of a Distance

• Distances, such as the Euclidean distance, have some well-known properties:

1. Positive definiteness: d(p, q) ≥ 0 for all p and q, and d(p, q) = 0 only if p = q.

2. Symmetry: d(p, q) = d(q, p) for all p and q.

3. Triangle inequality: d(p, r) ≤ d(p, q) + d(q, r) for all points p, q, and r.

where d(p, q) is the distance (dissimilarity) between points (data objects) p and q.

• A distance that satisfies these properties is a metric.

Query Types

• k-Nearest-Neighbor (k-NN) queries: The user specifies the number of close matches to the given query point.

Example: “Retrieve the 10 images most similar to this sample.”

• Range queries: An interval is given for each dimension of the feature space, and all the images that fall inside this hypercube are retrieved.

[Figure: range queries of radius r around a query point Q (a vague query when r is large) and a 4-nearest-neighbor query around Q]
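Both query types can be answered by a linear scan, the baseline that multidimensional indexes try to beat. A minimal sketch: the image names and 2-D feature vectors are hypothetical, k-NN uses the Euclidean distance (`math.dist`, Python 3.8+), and the range query checks one interval per dimension as described above.

```python
import heapq
import math

def knn_query(data, q, k):
    """The k objects closest to q (Euclidean distance), found by a linear scan."""
    return heapq.nsmallest(k, data, key=lambda item: math.dist(item[1], q))

def range_query(data, lows, highs):
    """Objects whose every coordinate lies in [lows[i], highs[i]] (a hypercube)."""
    return [item for item in data
            if all(lo <= v <= hi for v, lo, hi in zip(item[1], lows, highs))]

images = [("img1", (0.1, 0.2)), ("img2", (0.4, 0.4)), ("img3", (0.5, 0.6)), ("img4", (0.9, 0.9))]
print([name for name, _ in knn_query(images, (0.42, 0.45), 2)])          # ['img2', 'img3']
print([name for name, _ in range_query(images, (0.0, 0.0), (0.5, 0.5))])  # ['img1', 'img2']
```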

Multiattribute and Spatial Indexing

Spatial Databases: Queries involve regions that are represented as multidimensional objects.

Example: A rectangle in a 2-dimensional space is described by four values: two corner points, (X1, Y1) and (X2, Y2), with two values each (a 4-D vector).

Access methods that index on multidimensional keys yield better performance for spatial queries.

Multiattribute and Spatial Indexing of Multimedia Objects

Multimedia Databases: Multimedia objects typically have several attributes that characterize them.

Example: Attributes of an image include coarseness, shape, color, etc.

Multimedia databases are also good candidates for multikey search structures.

[Figure: images as points in a feature space with axes Average color, Shape, and Coarseness]

Indexing Multimedia Objects

• Can't we index multiple features using a B+-tree?

– A B+-tree defines a linear order (e.g., according to X).

– Similar objects (e.g., O1 and O2) can be far apart in the indexing order.

• Why multidimensional indexing?

– A multidimensional index defines a “spatial order”.

– Conceptually similar objects are spatially near each other in the indexing order (e.g., O1 and O2).

[Figure: objects O1 to O5 plotted on axes Feature X and Feature Y; in B+-tree order, O5 is closer to O1 than O2 is]

Some Multidimensional Search Structures

• k-d Tree

• Multidimensional Trie

• Grid File

• R Tree

• Point-Quad Tree

• D-Tree

k-d Tree

• Each node consists of a “record” and two pointers. The pointers are either null or point to another node.

• Nodes have levels, and each level of the tree discriminates on one attribute.

• The partitioning of the space alternates among the attributes of the n-dimensional search space.

Example: 2-D tree

Input sequence: A = (65, 50), B = (60, 70), C = (70, 60), D = (75, 25), E = (50, 90), F = (90, 65), G = (10, 30), H = (80, 85), I = (95, 75)

Discriminator by level: X, Y, X, Y

• Level 0 (X): A(65, 50)
• Level 1 (Y): B(60, 70) and C(70, 60) are the left and right children of A
• Level 2 (X): G(10, 30) and E(50, 90) are the left and right children of B; D(75, 25) and F(90, 65) are the left and right children of C
• Level 3 (Y): H(80, 85) and I(95, 75) are the left and right children of F

Insertion order can affect performance.

k-d Tree: Search Algorithm

• Notations (for a node L with children M = Low(L) and N = High(L)):

Ki(L): the i-th attribute value of node L
Disc(L): the discriminator at L's level
Low(L): the left child of L
High(L): the right child of L

• Algorithm: Search for P(K1, ..., Kn)

Q := Root;   /* Q is used to navigate the tree */
While NOT DONE do the following:
  if Q is null then P is not in the tree, and we are DONE
  else if Ki(P) = Ki(Q) for i = 1, ..., n   /* agree in each dimension */
    then we have located the node, and we are DONE
  otherwise, let A = Disc(Q):
    if KA(P) < KA(Q) then Q := Low(Q)
    else Q := High(Q)

• Performance: O(log N) on average, where N is the number of records; a badly unbalanced tree can need up to O(N).
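A minimal Python sketch of the k-d tree: insertion cycles the discriminator through the attributes by level, and the exact-match search follows the loop above, sending keys smaller than the discriminating value to Low and all others to High (sending ties to High is an assumption). Inserting the example's input sequence in order reproduces the 2-D tree of the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    key: Tuple[float, ...]
    low: Optional["Node"] = None   # Low(L): left child
    high: Optional["Node"] = None  # High(L): right child

def insert(root, key, depth=0):
    """Insert key; the discriminator at each level cycles through the attributes."""
    if root is None:
        return Node(key)
    a = depth % len(key)  # Disc at this level
    if key[a] < root.key[a]:
        root.low = insert(root.low, key, depth + 1)
    else:
        root.high = insert(root.high, key, depth + 1)
    return root

def search(root, key):
    """Exact-match search; returns the matching node or None."""
    q, depth = root, 0
    while q is not None:
        if q.key == key:  # agree in each dimension
            return q
        a = depth % len(key)
        q = q.low if key[a] < q.key[a] else q.high
        depth += 1
    return None

points = {"A": (65, 50), "B": (60, 70), "C": (70, 60), "D": (75, 25), "E": (50, 90),
          "F": (90, 65), "G": (10, 30), "H": (80, 85), "I": (95, 75)}
root = None
for p in points.values():
    root = insert(root, p)
print(search(root, (95, 75)) is not None, search(root, (55, 75)))  # True None
```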

Multidimensional Trie

Multidimensional tries, or k-d tries, are similar to k-d trees except that they divide the embedding space: each split evenly divides a region.

Example: Construction of a 2-D trie (the embedding space spans 0 to 100 in X and Y)

• Insert A(65, 50): the space is split at X = 50; A lies in X > 50.
• Insert B(60, 70): B also falls in X > 50, so that region is split at Y = 50; A goes to Y ≤ 50 and B to Y > 50.
• Insert C(70, 60): C falls in B's region (X > 50, Y > 50). Splitting at X = 75 and then at Y = 75 leaves B and C together; the next split, at X = 62.5, separates B (X ≤ 62.5) from C (X > 62.5).
• Insert D(75, 25): D falls in A's region (X > 50, Y ≤ 50). Splitting at X = 75 leaves A and D together (both X ≤ 75); the next split, at Y = 25, separates D (Y ≤ 25) from A (Y > 25).

[Figure: the resulting partitioning of the space, with A, B, C, and D plotted on X and Y axes]
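A minimal sketch of the 2-D trie, assuming each region holds at most one point, splits alternate X, Y, X, … by depth, and a coordinate equal to the split value goes to the lower half. Unlike the slide, which already shows the split at X = 50 when only A is present, this sketch splits a region only when a second point enters it; after all four insertions the partitions match the example. `paths` is a helper of our own that lists the split conditions leading to each point.

```python
class Leaf:
    def __init__(self, point):
        self.point = point

class Split:
    def __init__(self, axis, mid):
        self.axis, self.mid = axis, mid
        self.low = None   # coordinate <= mid
        self.high = None  # coordinate > mid

def insert(node, point, lo, hi, depth=0):
    """Insert point into the subtree covering the box [lo, hi]; return the new subtree.
    Assumes distinct points: two equal points would be split forever."""
    if node is None:
        return Leaf(point)
    if isinstance(node, Leaf):
        # The region already holds a point: halve it on this level's axis, reinsert both.
        axis = depth % len(point)
        split = Split(axis, (lo[axis] + hi[axis]) / 2)
        split = insert(split, node.point, lo, hi, depth)
        return insert(split, point, lo, hi, depth)
    a, m = node.axis, node.mid
    if point[a] <= m:
        sub_hi = list(hi)
        sub_hi[a] = m
        node.low = insert(node.low, point, lo, sub_hi, depth + 1)
    else:
        sub_lo = list(lo)
        sub_lo[a] = m
        node.high = insert(node.high, point, sub_lo, hi, depth + 1)
    return node

def paths(node, conds=()):
    """Yield (point, split conditions on the way to it) for every stored point."""
    if node is None:
        return
    if isinstance(node, Leaf):
        yield node.point, conds
        return
    axis = "XY"[node.axis]
    yield from paths(node.low, conds + (f"{axis}<={node.mid:g}",))
    yield from paths(node.high, conds + (f"{axis}>{node.mid:g}",))

root = None
for p in [(65, 50), (60, 70), (70, 60), (75, 25)]:  # A, B, C, D
    root = insert(root, p, (0, 0), (100, 100))
for point, conds in paths(root):
    print(point, " and ".join(conds))
```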

Disadvantages of k-d Trie

• The maximum level of decomposition depends on the minimum separation between two points.

A solution: split a region only if it contains more than p points (i.e., use buckets).

• Not a balanced tree → unpredictable performance

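Grid File

Split strategy: The partitioning is done with only one hyperplane, but the split extends to all regions in the splitting direction.

[Figure: a grid directory over a space from 0 to 100 on each axis, with linear scales dividing each axis at 25, 50, and 75 into intervals 1 to 4; each directory entry points to a data bucket (e.g., data bucket H, data bucket K)]

Grid directory (data bucket of each cell):
A B C D
D E F G
H I J K
L L M K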

Grid File - Potential Issues

• The directory can be quite sparse.

• Many adjacent directory entries may point to the same data block.

• For partial-match and range queries, many directory entries, but only a few data blocks, may have to be scanned (i.e., sequential search might be faster).

Grid directory (data bucket of each cell; bucket K serves the whole rightmost column):
A B C K
D E F K
H I J K
L L M K
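A grid-file lookup needs only the linear scales and the directory: each scale maps a coordinate to an interval index, and the directory maps the resulting cell to a bucket. A minimal sketch using the directory listed above; which printed row corresponds to which Y stripe is not stated, so we assume the first row is the top stripe (Y from 75 to 100). Note that the 16 cells share only 12 buckets.

```python
from bisect import bisect_right

X_SCALE = [0, 25, 50, 75, 100]  # linear scale for X
Y_SCALE = [0, 25, 50, 75, 100]  # linear scale for Y
# Grid directory as listed above; assumption: the first row is the top stripe (Y 75-100).
DIRECTORY = [list("ABCK"), list("DEFK"), list("HIJK"), list("LLMK")]

def interval(scale, v):
    """Index of the scale interval containing v (the upper bound belongs to the last one)."""
    return min(max(bisect_right(scale, v) - 1, 0), len(scale) - 2)

def bucket_for(x, y):
    """Bucket holding key (x, y): two scale lookups and one directory read."""
    col = interval(X_SCALE, x)
    row = len(DIRECTORY) - 1 - interval(Y_SCALE, y)  # flip so row 0 is the top stripe
    return DIRECTORY[row][col]

print(bucket_for(90, 10), bucket_for(10, 90), bucket_for(30, 10))  # K A L
```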

Point-Quad Tree

• Each node of a k-dimensional quad tree partitions the object space into 2^k quadrants (four quadrants in two dimensions).

• The partitioning is performed along all search dimensions and is data dependent, like the k-d tree.

Example: A(50, 50) is the root; B(75, 80), D(35, 85), and E(25, 25) are its NE, NW, and SW children; C(90, 65) is the SE child of B. The tree is not balanced.

To search for P(55, 75):

• Since XA < XP and YA < YP → go to NE (i.e., B).

• Since XB > XP and YB > YP → go to SW, which in this case is null, so P is not in the tree.

[Figure: the partitioning of the space by A to E, and the corresponding quad tree]
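A minimal point-quad-tree sketch for two dimensions. It assumes the insertion order A, B, C, D, E (the example does not state one) and sends points lying on a dividing line to the north/east side; the class and function names are our own. The search for P(55, 75) follows the same two steps as the example.

```python
class QuadNode:
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
        self.children = {}  # quadrant label ("NE", "NW", "SE", "SW") -> QuadNode

def quadrant(node, x, y):
    """Quadrant of (x, y) relative to node; points on a dividing line go north/east."""
    return ("N" if y >= node.y else "S") + ("E" if x >= node.x else "W")

def insert(root, name, x, y):
    if root is None:
        return QuadNode(name, x, y)
    node = root
    while True:
        q = quadrant(node, x, y)
        if q not in node.children:
            node.children[q] = QuadNode(name, x, y)
            return root
        node = node.children[q]

def search(root, x, y):
    node = root
    while node is not None:
        if (node.x, node.y) == (x, y):
            return node
        node = node.children.get(quadrant(node, x, y))
    return None

root = None
for name, x, y in [("A", 50, 50), ("B", 75, 80), ("C", 90, 65), ("D", 35, 85), ("E", 25, 25)]:
    root = insert(root, name, x, y)
print(search(root, 55, 75))       # None: P(55, 75) is not stored
print(search(root, 90, 65).name)  # C
```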

R-tree (Region Tree)

• The R-tree is a generalization of the B-tree to multidimensional data.

• The nodes correspond to disk pages.

• All leaf nodes appear at the same level.

• Root and intermediate nodes contain [r, <page pointer>] pairs, where r is the smallest rectangle that encloses all rectangles in the child node.

• Leaf nodes contain pointers to the actual objects, i.e., [r, <RID>] pairs.

• A rectangle may be spatially contained in several nodes (e.g., J), yet it can be associated with only one node.

[Figure: a three-level R-tree (root, level 2, level 3) with covering rectangles A, B, and C at the root and data rectangles D through L below them, drawn both as nested rectangles and as a tree; overlapping covering rectangles may incur redundant search]

R-tree: Insertion

• A new object is added to the appropriate leaf node.

• If insertion causes the leaf node to overflow, the node must be split, and the records distributed between the two leaf nodes. A good split aims at:

– Minimizing the total area of the covering rectangles (compact clusters)

– Minimizing the area common to the covering rectangles (to minimize redundant search)

• Splits are propagated up the tree (similar to the B-tree).

[Figure: an R-tree with covering rectangles A, B, C and data rectangles D through L]
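To make the two split criteria concrete, here is a brute-force sketch that tries every two-way split of an overflowing node and ranks the splits first by the total area of the two covering rectangles and then by their overlap area. It is exponential in the node size and only illustrative; practical R-tree implementations use heuristics instead (for example, Guttman's quadratic or linear split). `min_fill`, a minimum number of entries per node, is a hypothetical parameter.

```python
from itertools import combinations

def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def exhaustive_split(rects, min_fill=1):
    """Try every two-way split; rank by total covering area, then by overlap area."""
    best = None
    n = len(rects)
    for size in range(min_fill, n - min_fill + 1):
        for chosen in combinations(range(n), size):
            g1 = [rects[i] for i in chosen]
            g2 = [rects[i] for i in range(n) if i not in chosen]
            m1, m2 = mbr(g1), mbr(g2)
            cost = (area(m1) + area(m2), overlap(m1, m2))
            if best is None or cost < best[0]:
                best = (cost, g1, g2)
    return best[1], best[2]

overflowing = [(0, 0, 1, 1), (1, 1, 2, 2), (10, 10, 11, 11), (11, 11, 12, 12)]
g1, g2 = exhaustive_split(overflowing)
print(g1, g2)  # the two compact pairs end up in different nodes
```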

R-tree: Delete

If a deletion causes a node to underflow, its remaining entries are reinserted (instead of the node being merged with an adjacent node, as in a B-tree).

Reason: There is no concept of adjacency in an R-tree.

D-tree: Domain Decomposition

If the number of objects inside a domain exceeds a certain threshold, the domain is split into two subdomains.

[Figure: an original domain containing objects A to G split into a 1st and a 2nd subdomain, once by a horizontal split line and once by a vertical split line; the split is made along the longest dimension, and an object crossing the split line is a border object]

D-tree: Split Examples

• Initial tree: a single entry for the original domain D, pointing to an empty data node (null).
• After 3 insertions: still the single domain D.
• After the 1st split: D is divided into subdomains D1 and D2.
• After the 2nd split: D1 is divided into D11 and D12; the node now holds D11, D12, and D2.
• After the 3rd split: D12 is divided into D121 and D122; the node now holds D11, D121, D122, and D2.
• After the 4th split: D2 is divided into D21 and D22, and the tree grows a level: the root holds D1 and D2, whose child nodes hold D11, D121, D122 and D21, D22, respectively.

[Figure: each stage drawn both in the embedding space (original domain and subdomains) and as a D-tree of internal nodes (domain nodes) and external nodes (data nodes); D22.P denotes the pointer stored with entry D22]

D-tree: Search Algorithm

Search(D_tree_root, search_object)
  Current_node = D_tree_root
  For each entry in Current_node, say (D, P), do:
    if D contains search_object:
      if Current_node is an external node:
        retrieve the objects through D.P and compare them with search_object
      if Current_node is an internal node:
        call Search(D.P, search_object)   /* recursive call */

D-tree: Range Query

A range query can be represented as a hypercube embedded in the search space.

Search strategy:

• Use the D-tree to retrieve all subdomains that overlap with the query cube.

• For each such subdomain that is not fully contained in the query cube, discard the objects falling outside the query cube.

[Figure: a range query overlapping several subdomains; an object inside an overlapping subdomain but outside the query is discarded]

D-tree: Range Query

Search(D_tree_root, search_cube)
  Current_node = D_tree_root
  For each entry in Current_node, say (D, P), if D overlaps with search_cube, do:
  – If Current_node is an external node, retrieve the objects in D.P that fall within the overlap region.
  – If Current_node is an internal node, call Search(D.P, search_cube).
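A minimal sketch of the two D-tree algorithms above under simplifying assumptions: objects are 2-D points (so there are no border objects), domains are half-open boxes `(xmin, ymin, xmax, ymax)` that partition their parent, and the query cube is closed. The small tree, with internal entries for D1 and D2 and four data subdomains, is hypothetical.

```python
from dataclasses import dataclass

def contains(domain, p):
    """Half-open domain [xmin, xmax) x [ymin, ymax)."""
    return domain[0] <= p[0] < domain[2] and domain[1] <= p[1] < domain[3]

def overlaps(domain, cube):
    """Does a half-open domain intersect a closed query cube?"""
    return (domain[0] <= cube[2] and cube[0] < domain[2]
            and domain[1] <= cube[3] and cube[1] < domain[3])

def in_cube(p, cube):
    return cube[0] <= p[0] <= cube[2] and cube[1] <= p[1] <= cube[3]

@dataclass
class Node:
    external: bool
    entries: list  # (domain D, P): P is a child Node (internal) or a list of objects (external)

def search(node, obj):
    for domain, ptr in node.entries:
        if contains(domain, obj):
            if node.external:
                return obj in ptr      # retrieve the objects through D.P and compare
            return search(ptr, obj)    # unique path: one matching entry per level
    return False

def range_query(node, cube):
    result = []
    for domain, ptr in node.entries:
        if overlaps(domain, cube):
            if node.external:
                result += [p for p in ptr if in_cube(p, cube)]  # discard objects outside
            else:
                result += range_query(ptr, cube)
    return result

d1 = Node(True, [((0, 0, 50, 50), [(10, 10), (30, 40)]), ((0, 50, 50, 100), [(20, 70)])])
d2 = Node(True, [((50, 0, 100, 50), [(60, 20)]), ((50, 50, 100, 100), [(80, 90), (55, 55)])])
root = Node(False, [((0, 0, 50, 100), d1), ((50, 0, 100, 100), d2)])
print(search(root, (20, 70)), range_query(root, (0, 0, 40, 60)))
```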

D-tree: Desirable Properties

• The D-tree is a balanced tree.

• The search path for an object is unique → no redundant search.

• More splits occur in denser regions of the search space → no unnecessary splits; objects are evenly distributed among data nodes.

• Similar objects are physically clustered in the same or neighboring data nodes.

• Good performance is ensured regardless of the insertion order of the data.

Curse of Dimensionality

As the number of dimensions (D) increases, the probability of finding data in the sphere inscribed in the data cube (Vol-S/Vol-C) decreases rapidly (faster than exponentially):

→ Most data are in the “corners” of the cube.

→ The more dimensions we have, the more similar all objects appear (i.e., the data become nearly equidistant).

D    Vol-S/Vol-C
1    100%
2    78.5%
3    52.4%
4    30.8%
5    16.4%
6    8.1%

Corners are very dense in high-dimensional spaces.
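The table can be reproduced from the volume of a d-dimensional ball of radius 1/2 (the sphere inscribed in the unit cube), $V_d = \pi^{d/2} (1/2)^d / \Gamma(d/2 + 1)$. A short sketch using only the standard library:

```python
import math

def ball_to_cube_ratio(d):
    """Volume of the ball of radius 1/2 inscribed in the unit hypercube (volume 1)."""
    return math.pi ** (d / 2) * 0.5 ** d / math.gamma(d / 2 + 1)

for d in range(1, 7):
    print(d, f"{100 * ball_to_cube_ratio(d):.1f}%")
```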

Effect of High Dimensionality

• Figure (a): As dimensionality increases, a substantially larger search radius is required to retrieve the same percentage of the data (note: 0.02% means retrieving 3 out of 16,000 images).

• Figure (b): In a high-dimensional space, the approximating hypercube query returns substantially more candidate data items as the search radius increases.

• Figure (c): Beyond a certain dimensionality, the retrieval time increases exponentially with the number of dimensions (for a given selectivity).

[Figure: a sphere query and its approximating hypercube query in a unit data space; a larger approximating query is needed, its corners hold mostly irrelevant data, and retrieval takes long]

Effect on k-NN Queries

• Processing a k-NN query:

– Use an approximating hypercube query to find kc candidate neighbors.

– Examine the candidates to determine the k nearest neighbors.

• Effect of high dimensionality:

– In a high-dimensional space, kc >> k.

– When kc is a very large percentage of the database, no index structure is helpful.

[Figure: an approximating query around a query point; the red points are its nearest neighbors]

Sequential Scan is Better

– In a high-dimensional space, tree-based indexing structures examine a large fraction of the leaf nodes.

– Instead of visiting so many tree nodes, it is better to
  • scan the whole data set, and
  • avoid performing seeks altogether.

Vector Approximation (VA) File

• How can we speed up a linear scan?

• A solution: use approximation.

– Divide the data space into cells and allocate a bit-string to each cell.

– Vectors inside a cell are approximated by the cell.

– The VA-file is an array of these geometric approximations.

– For search:
  • the VA-file is scanned to select candidate vectors (i.e., relevant “cells”);
  • the candidates are then verified by visiting the vector file (i.e., the original vectors).

VA-File Example

[Figure: the data space divided into 4 × 4 cells, with intervals labeled 00, 01, 10, 11 along each axis, containing objects O1 to O4]

Vector file (vector data):
O1 (0.1, 0.9)
O2 (0.7, 0.7)
O3 (0.3, 0.4)
O4 (0.8, 0.2)

VA-file (approximation):
O1 00 11
O2 10 10
O3 01 01
O4 11 00
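A minimal sketch of the VA-file filter-and-refine idea, assuming 2 bits per dimension and coordinates in [0, 1); with these assumptions it reproduces the approximations in the example. For a range query, the scan computes a lower bound on the distance from the query to each cell and verifies only the surviving candidates against the exact vectors. The function names and the query are hypothetical.

```python
import math

BITS = 2            # bits per dimension
CELLS = 2 ** BITS   # 4 equal-width intervals of [0, 1) per dimension

def approximate(v):
    """Bit-string of the cell containing vector v (coordinates in [0, 1))."""
    return "".join(format(min(int(x * CELLS), CELLS - 1), f"0{BITS}b") for x in v)

def lower_bound(q, code):
    """Smallest possible Euclidean distance from q to any point of the cell `code`."""
    total = 0.0
    for d, x in enumerate(q):
        cell = int(code[d * BITS:(d + 1) * BITS], 2)
        lo, hi = cell / CELLS, (cell + 1) / CELLS
        gap = lo - x if x < lo else (x - hi if x > hi else 0.0)
        total += gap * gap
    return math.sqrt(total)

def range_search(q, r, va_file, vectors):
    """Filter with the VA-file, then verify the candidates against the exact vectors."""
    candidates = [i for i, code in enumerate(va_file) if lower_bound(q, code) <= r]
    return [i for i in candidates if math.dist(q, vectors[i]) <= r], candidates

vectors = [(0.1, 0.9), (0.7, 0.7), (0.3, 0.4), (0.8, 0.2)]  # O1..O4
va_file = [approximate(v) for v in vectors]
print(va_file)                                          # ['0011', '1010', '0101', '1100']
print(range_search((0.75, 0.75), 0.1, va_file, vectors))  # ([1], [1]): only O2 survives
```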

Principal Component Analysis

• Principal Component Analysis (PCA)

– The goal is to find a projection that captures the largest amount of variation in the data.

– It transforms a number of possibly correlated variables (e.g., X1 and X2) into a smaller number of uncorrelated variables called principal components.

• This technique can be used to reduce the number of dimensions in content-based image retrieval (CBIR).

[Figure: data plotted on axes X1 and X2, with the principal direction e]

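Review: Variance

• Mean of n data items: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

• Standard deviation, a measure of how spread out the data are: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

• Variance, the square of the standard deviation (it avoids computing the square root): $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$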

Covariance

• Variance operates on only one dimension.

• Covariance is a similar measure of how much two dimensions vary from their means with respect to each other:

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

Covariance Matrix

[Figure: an 8 × 8 covariance matrix for dimensions X1 to X8; the entry in row X3 and column X4 is Cov(X3, X4), and each diagonal entry is a variance, e.g., Cov(X7, X7) = σ7²]

Review: Transformation Matrix

[Figure: a transformation matrix multiplied by a vector (a point in the multidimensional space) yields another vector, transformed from the original position]

Review: Eigenvector (2)

• In general, a matrix acts on a vector by changing both its magnitude and its direction.

• A matrix may act on certain vectors by changing only their magnitude and leaving their direction unchanged (or possibly reversing it).

• These vectors are the eigenvectors of the matrix.

[Figure: a vector that is not an eigenvector changes direction; an eigenvector is mapped to a multiple of itself, i.e., its direction remains the same]

A Transformation Example

• This transformation matrix changes neither the direction nor the magnitude of vectors along the central vertical axis (e.g., the red vector).

• Therefore, every vector pointing along the central vertical axis is an eigenvector of this transformation matrix (with eigenvalue 1).

[Figure: an image before and after applying the transformation]

Eigenvector: Properties

• Eigenvectors can only be found for square matrices.

• Not every square matrix has (real) eigenvectors.

• An n × n matrix has at most n linearly independent eigenvectors; a symmetric n × n matrix, such as a covariance matrix, always has n of them.

• If we scale an eigenvector by some amount before the multiplication, we still get the same multiple of it as a result.

• For a symmetric matrix, the unit eigenvectors (i.e., those of length 1) can be chosen to be mutually perpendicular.

Eigenvalue

• The factor by which an eigenvector is scaled when multiplied by the square matrix does not depend on how the eigenvector itself was scaled beforehand.

• No matter what multiple of the eigenvector we take before the multiplication, we always get 4 times the scaled vector as the result.

• “4” is the eigenvalue associated with this eigenvector.

[Figure: scaling the eigenvector and then multiplying by the matrix yields 4 times the scaled vector]
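The matrix behind the “4” is not given in the text, so this NumPy sketch uses a hypothetical 2 × 2 matrix with eigenvector (3, 2) and eigenvalue 4 to illustrate two properties: multiplying any multiple of the eigenvector gives 4 times that multiple, and `np.linalg.eig` returns unit-length eigenvectors (as columns).

```python
import numpy as np

A = np.array([[2.0, 3.0], [2.0, 1.0]])  # hypothetical example matrix
v = np.array([3.0, 2.0])                # an eigenvector of A

print(A @ v, 4 * v)              # equal: A v = 4 v
print(A @ (2 * v), 4 * (2 * v))  # scaling v first: still 4 times the scaled vector

values, vectors = np.linalg.eig(A)
print(values)                            # eigenvalues 4 and -1 (order not guaranteed)
print(np.linalg.norm(vectors, axis=0))   # each eigenvector column has length 1
```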

Principal Components Analysis

1. Prepare the matrix Data to hold the original data set, with one data item (a feature vector) in each column and each row holding a separate dimension (1st dimension, 2nd dimension, …, nth dimension).

2. Compute the mean for each dimension of the data set (i.e., average each row in Data).

3. Compute the matrix DataAdjust by subtracting from each entry in Data the mean of the corresponding dimension.
   • Note: This produces a data set whose mean in each dimension is zero.

4. Compute the covariance matrix for the dimensions.

5. Calculate the unit eigenvectors and eigenvalues of the covariance matrix.
   • Note: Most math packages give unit eigenvectors.

6. Sort the eigenvectors in descending order according to their eigenvalues.
   • The eigenvector with the largest eigenvalue is the principal component.
   • The sorting order gives the components in order of significance (more significant first).

PCA & Dimension Reduction

[Figure: data with the first principal component (more important) and the second principal component (less important) drawn as orthogonal axes]

• The eigenvectors corresponding to the principal components are orthogonal.

• We can map data from the original space into the new space defined by these orthogonal vectors.

• We can reduce the number of dimensions by dropping some of the less important components.

PCA: Deriving the New Data Set

1. Choose the components (eigenvectors) we want to keep and form a transformation matrix F, with one eigenvector in each row.

2. Compute the new data set by applying the transformation F to the adjusted data matrix (coordinate transformation):

$$\text{FinalData} = F \times \text{DataAdjust}$$

• We have projected the data from the original coordinate system onto a lower-dimensional space defined by the chosen eigenvectors.

• The relative spatial distances among the original data items are mostly preserved in the lower-dimensional space (exactly preserved if no component is dropped).
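A compact NumPy sketch of the six steps plus the projection, assuming the data items are the columns of the input array as in step 1. `np.linalg.eigh` is used because a covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the reversal. The function name and the sample data are our own.

```python
import numpy as np

def pca(data, k):
    """Steps 1-6 plus the projection. `data` holds one data item per column and
    one dimension per row. Returns (FinalData, F, means)."""
    data = np.asarray(data, dtype=float)
    means = data.mean(axis=1, keepdims=True)  # step 2: mean of each dimension (row)
    adjust = data - means                     # step 3: DataAdjust, zero mean per row
    cov = np.cov(adjust)                      # step 4: rows are treated as variables
    values, vectors = np.linalg.eigh(cov)     # step 5: unit eigenvectors (as columns)
    order = np.argsort(values)[::-1]          # step 6: descending eigenvalue
    F = vectors[:, order[:k]].T               # keep k components, one eigenvector per row
    return F @ adjust, F, means               # FinalData = F x DataAdjust

data = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                 [1.1, 1.9, 3.0, 4.1, 4.9]])  # hypothetical 2-D data, 5 items
final, F, means = pca(data, 1)
print(F)      # roughly +/-(0.71, 0.70): the direction of greatest variation
print(final)  # each item reduced to a single coordinate
```

Keeping all components (k equal to the number of dimensions) only rotates the data, so pairwise distances are unchanged; dropping components is what introduces the small loss mentioned above.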

PCA - Summary

• Determine the eigenvectors of the covariance matrix.

• These eigenvectors define the new space with lower dimensionality.

Data Clustering

• Supervised Classification
• Semi-supervised Classification
• Unsupervised kMeans Clustering
• Semi-supervised kMeans Clustering
• Constrained kMeans Clustering

Supervised Classification

[Figure sequence: training data (labeled data); the result of supervised learning, a dividing line between the classes; an item to be classified; the item classified based on the dividing line]

• Support Vector Machine (SVM)

• Artificial Neural Networks (ANN)

Support Vector Machine (1)

• The dashed lines mark the distance between the dividing line and the closest vectors (points) to the line.

• The vectors that constrain the width of the margin are the support vectors.

• SVM analysis finds the dividing line that maximizes the margin.

[Figure: a dividing line with a large margin vs. one with a small margin; the support vectors lie on the dashed lines]

Support Vector Machine (2)

• Points on a 2-dimensional plane can be separated by a 1-dimensional hyperplane (i.e., a line).

• Points in a d-dimensional space can be separated by a (d−1)-dimensional hyperplane (if they are linearly separable).

Support Vector Machine (3)

• Problem: What if the points are separated by a nonlinear boundary?

• Solution: Use a kernel function to map the data into a higher-dimensional space where a hyperplane can be used to do the separation.
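The idea can be illustrated with an explicit feature map, our own illustrative choice rather than anything from the source: lifting (x, y) to (x, y, x² + y²) turns a circular class boundary in 2-D into the plane z = 1 in 3-D. An SVM with a kernel function achieves the same effect without computing the mapped vectors explicitly (the kernel trick); this sketch only shows why the higher-dimensional space helps.

```python
def lift(point):
    """Explicit feature map (x, y) -> (x, y, x^2 + y^2)."""
    x, y = point
    return (x, y, x * x + y * y)

def classify(point):
    """A linear decision in the lifted space: the hyperplane z = 1."""
    return "inside" if lift(point)[2] < 1 else "outside"

# Hypothetical data: one class inside the unit circle, the other outside it.
inner = [(0.2, 0.1), (-0.3, 0.4), (0.0, -0.5), (0.5, 0.5)]
outer = [(1.5, 0.0), (-1.2, 1.1), (0.3, -1.8), (-1.0, -1.0)]
print([classify(p) for p in inner + outer])
```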

Semi-supervised Learning

Labeling a lot of data can be expensive.

• Solution: semi-supervised learning
– Make use of unlabeled data in conjunction with a small amount of labeled data.

• Examples
– Semi-supervised EM [Ghahramani, NIPS94], [Nigam, ML00]
– Transductive SVM [Joachims, ICML99]
– Co-training [Blum, COLT98]

Co-training (1)

• Many problems have two complementary views that can be used to label data.

• Example: faculty home pages
1. A link with the text “my advisor” pointing to a page is a good indicator that it is a faculty home page.
2. “I am teaching” appearing in a page is a good indicator that it is a faculty home page.

Co-training (2)

Features can be split into two sets, i.e., x = (x1, x2).
L: set of labeled examples
U: set of unlabeled examples

Repeat k times:
  Use L to train a classifier c1 that considers only x1
  Use L to train a classifier c2 that considers only x2
  Apply c1 to label p positive and n negative examples from U
  Apply c2 to label p positive and n negative examples from U
  Move these self-labeled examples from U to L
  /* c1 adds labeled examples to L that c2 will be able to use for learning, and vice versa */

Assumption: The class of each instance can be accurately predicted from each of the two feature subsets alone.
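A toy sketch of the co-training loop, assuming each view is a single number and using a nearest-centroid classifier per view (our own choice; any classifier that can rank its confidence would do). Both classifiers are trained on L before either labels anything, as in the loop above. The data and the values of k, p, and n are hypothetical.

```python
def train(labeled, view):
    """Nearest-centroid classifier on one view: (mean of positives, mean of negatives)."""
    pos = [x[view] for x, y in labeled if y == 1]
    neg = [x[view] for x, y in labeled if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

def score(model, x, view):
    """Positive means closer to the positive centroid; the magnitude acts as confidence."""
    mean_pos, mean_neg = model
    return abs(x[view] - mean_neg) - abs(x[view] - mean_pos)

def co_train(labeled, unlabeled, k, p, n):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(k):
        c1, c2 = train(labeled, 0), train(labeled, 1)  # c1 sees only x1, c2 only x2
        newly = {}
        for model, view in ((c1, 0), (c2, 1)):
            pool = sorted((x for x in unlabeled if x not in newly),
                          key=lambda x: score(model, x, view))
            for x in pool[:n]:                        # n most confident negatives
                newly[x] = 0
            for x in pool[max(n, len(pool) - p):]:    # p most confident positives
                newly[x] = 1
        unlabeled = [x for x in unlabeled if x not in newly]
        labeled += list(newly.items())                # move self-labeled examples to L
    return labeled, unlabeled

L = [((9, 9), 1), ((1, 1), 0)]
U = [(8, 10), (10, 8), (9.5, 8.5), (0, 2), (2, 0), (1.5, 0.5)]
final_L, remaining = co_train(L, U, k=3, p=1, n=1)
print(final_L)
```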

Unsupervised Clustering: kMeans

[Figure sequence: randomly initialize k means; assign points to clusters; re-estimate means; re-assign points to clusters; re-estimate means; assign points to clusters; no changes → converge; these are the clusters]

Unsupervised Clustering: kMeans

• Initialize k cluster centers.

• Repeat until convergence:

– Assign each point to the cluster with the closest center.

– For each cluster, re-estimate the center as the mean of the points in that cluster.

Property: kMeans locally minimizes the sum of squared distances between the data points and their corresponding cluster centers → more compact clusters.

Choose well-separated initial centers.
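A plain-Python sketch of the kMeans loop (Lloyd's algorithm) with caller-supplied initial centers, so the result is deterministic. An empty cluster keeps its previous center, which is one of several possible conventions; the data are hypothetical.

```python
import math

def kmeans(points, centers, max_iter=100):
    """kMeans from the given initial centers; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the closest center.
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # no changes -> converged
            break
        labels = new_labels
        # Re-estimate each center as the mean of its points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, [(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```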

Semi-supervised Clustering

Idea: Use a small amount of labeled data to guide (bias) the clustering of unlabeled data.

Example: Use the labeled data to initialize the clusters in the kMeans algorithm.

[Figure: a few labeled data points among unlabeled data; the labels indicate that there are three clusters]

Constrained kMeans Clustering

• Must-link constraints specify that two points have to be in the same cluster.

• Cannot-link constraints specify that two points must not be placed in the same cluster.

Constrained kMeans Clustering [Wagstaff, ICML01]

D: data set
Con=: set of must-link constraints
Con≠: set of cannot-link constraints

1. Let C1 … Ck be the initial cluster centers.

2. For each point di in D, assign it to the closest cluster Cj such that VIOLATION(di, Cj, Con=, Con≠) is false. If no such cluster exists, fail (return {}).

3. For each cluster Ci, update its center by averaging all of the points di that have been assigned to it.

4. Repeat (2) and (3) until convergence.

5. Return {C1 … Ck}.

VIOLATION(data point d, cluster C, must-link constraints Con=, cannot-link constraints Con≠)
/* Can we assign d to C without violating any constraint? */

1. For each (d, d=) ∈ Con=: if d= is already assigned to some cluster C′ ≠ C, return true.

2. For each (d, d≠) ∈ Con≠: if d≠ is in C, return true.

3. Otherwise, return false.
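A minimal sketch of the constrained kMeans steps and the VIOLATION test, with constraints given as pairs of point indices. As in step 2, points are assigned one at a time in data-set order, so the outcome can depend on that order; the function returns None where the algorithm says to fail. The names and data are hypothetical.

```python
import math

def violates(i, j, labels, must_link, cannot_link):
    """VIOLATION: would putting point i into cluster j break a constraint, given the
    points already assigned in this pass (labels[x] is None if x is unassigned)?"""
    for a, b in must_link:
        other = b if a == i else (a if b == i else None)
        if other is not None and labels[other] is not None and labels[other] != j:
            return True
    for a, b in cannot_link:
        other = b if a == i else (a if b == i else None)
        if other is not None and labels[other] == j:
            return True
    return False

def cop_kmeans(points, centers, must_link=(), cannot_link=(), max_iter=100):
    centers = [tuple(c) for c in centers]
    previous = None
    for _ in range(max_iter):
        labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from closest to farthest; take the first legal one.
            order = sorted(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            choice = next((j for j in order
                           if not violates(i, j, labels, must_link, cannot_link)), None)
            if choice is None:
                return None  # fail: no legal cluster for this point
            labels[i] = choice
        if labels == previous:
            break  # converged
        previous = labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

points = [(0, 0), (1, 0), (9, 0), (10, 0)]
init = [(0, 0), (10, 0)]
print(cop_kmeans(points, init)[0])                       # [0, 0, 1, 1]
print(cop_kmeans(points, init, cannot_link=[(0, 1)])[0])  # [0, 1, 1, 1]
print(cop_kmeans(points, init, must_link=[(1, 2)])[0])    # [0, 0, 0, 1]
```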

Content-Based Image Indexing

• Keyword Approach

– Problem: There is no commonly agreed-upon vocabulary for describing image properties.

• Computer Vision Techniques

– Problem: General image understanding and object recognition are beyond the capability of current computer vision technology.

• Image Analysis Techniques

– It is relatively easy to capture primitive image properties such as
  • prominent regions,
  • their colors and shapes,
  • and related layout and location information within images.

– These features can be used to index image data.

– Challenge: the semantic gap!

[Figure: an original image, its segmented version, and its contour]

Features Acquisition: Image Segmentation

• Group adjacent pixels with similar color properties into one region, and

• segment the pixels with distinct color properties into different regions.

Image Indexing by contents

By applying image segmentation techniques, a set of regions is detected along with their locations, sizes, colors, and shapes.

These features can be used to index image data.

Color

• We can divide the color space into a small number of zones, each of which is clearly distinct from the others to the human eye.

• Each zone is assigned a sequence number beginning from zero.

Note: Human eyes are not very sensitive to small color differences. In fact, users have only a vague idea of the colors they want to specify.

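Shape

The shape feature can be measured by two properties: circularity and major axis orientation.

– Circularity:

$$\text{circularity} = \frac{4\pi \cdot \text{area}}{\text{perimeter}^2}$$

The more circular the shape, the closer its circularity is to one. For example:

$$\text{circle of radius } r: \quad \frac{4\pi \cdot \pi r^2}{(2\pi r)^2} = 1$$
$$\text{square of side } a: \quad \frac{4\pi a^2}{(4a)^2} = \frac{\pi}{4} \approx 0.785$$
$$a \times 2a \text{ rectangle}: \quad \frac{4\pi \cdot 2a^2}{(6a)^2} = \frac{2\pi}{9} \approx 0.698$$

– Major axis orientation: 0° ≤ orientation ≤ 360°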
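A short sketch that checks the circularity values above, computing the area with the shoelace formula and approximating the circle by a regular 360-gon (both our own choices, for illustration only).

```python
import math

def polygon_area_perimeter(vertices):
    """Shoelace area and perimeter of a simple polygon given as ordered (x, y) vertices."""
    n = len(vertices)
    area2 = perimeter = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2, perimeter

def circularity(area, perimeter):
    return 4 * math.pi * area / perimeter ** 2

shapes = {
    "square": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "rectangle": [(0, 0), (2, 0), (2, 1), (0, 1)],
    "circle (360-gon)": [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
                         for k in range(360)],
}
results = {name: circularity(*polygon_area_perimeter(v)) for name, v in shapes.items()}
for name, value in results.items():
    print(name, round(value, 3))
```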

Location

• The image is divided into sub-areas.

• Each sub-area is labeled with a number.

• A region's location is represented by the ID of the sub-area that contains the region's center of gravity.

Note: When a user queries the database by visual contents, approximate feature values are used. It is meaningless to use absolute feature values as indices.

Example (3 × 3 sub-areas):
0 1 2
3 4 5
6 7 8

• The location of region A is 4.
• The location of region B is 1.

Size

• The size range is divided into groups.

• A region's size is represented by the corresponding group number.

Example (Asub is the area of one sub-area, i.e., one cell of the 3 × 3 grid):

Assigned size    Actual size S
1                ¼ Asub < S ≤ ½ Asub
2                ½ Asub < S ≤ Asub
3                Asub < S ≤ 2 Asub
4                2 Asub < S ≤ 3 Asub

Note: Only regions larger than one-fourth of a sub-area are registered.
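A minimal sketch of the location and size features for a 3 × 3 grid. It assumes pixel coordinates with y measured downward from the top-left corner (so sub-area 0 is top-left) and returns None for regions that are too small to register or larger than the last row of the size table; the function names are hypothetical.

```python
def location_id(cx, cy, width, height, grid=3):
    """ID of the sub-area containing the center of gravity (cx, cy); IDs run
    row by row from the top-left (0) to the bottom-right (grid * grid - 1)."""
    col = min(int(grid * cx / width), grid - 1)
    row = min(int(grid * cy / height), grid - 1)
    return row * grid + col

# Upper bounds, as multiples of Asub, for size groups 1-4 from the table.
SIZE_GROUPS = ((1, 0.5), (2, 1.0), (3, 2.0), (4, 3.0))

def size_group(region_area, sub_area):
    """Group number of a region, or None if it is not registered (<= 1/4 Asub)
    or exceeds the last row of the table."""
    ratio = region_area / sub_area
    if ratio <= 0.25:
        return None
    for group, upper in SIZE_GROUPS:
        if ratio <= upper:
            return group
    return None

# A 300 x 300 image: each sub-area is 100 x 100, so Asub = 10000.
print(location_id(150, 150, 300, 300), location_id(150, 50, 300, 300))  # 4 1
print(size_group(3000, 10000), size_group(15000, 10000))                # 1 3
```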

Texture Areas

Texture areas and images with dominant high-frequency components are beyond the capacity of image segmentation techniques.

Matching on the distribution of colors (i.e., color histograms) is a simple yet effective means for these areas.

Strategy: Divide an image into sub-areas and create a histogram for each sub-area.

Why partition? The partitioning of the image captures locality information: we don't want to match an image with a red balloon at the top against an image with a red car at the bottom.

Histograms

• Gray-Level Histogram: a plot of the number of pixels that take each discrete value of the quantized image intensity.

• Color Histogram: holds information on the color distribution; it is a plot of the statistics of the R, G, B components in the 3-D color space.

[Figure: a gray-level histogram of an image, plotting pixel count against intensity (white, gray, black); and a color histogram of a color image, with R and B each quantized into 3 levels and G into 2 levels, giving 3 × 3 × 2 = 18 bins]
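A minimal sketch of the 18-bin color histogram, assuming 8-bit R, G, B values and uniform quantization into 3, 2, and 3 levels. `representative_bins` returns the most populated non-empty bins, in the spirit of the representative bins used for indexing; the names and pixel values are our own.

```python
LEVELS = (3, 2, 3)  # quantization levels for R, G, B -> 3 x 2 x 3 = 18 bins

def color_bin(r, g, b):
    """Bin index of an 8-bit RGB pixel under uniform quantization."""
    lr, lg, lb = (v * n // 256 for v, n in zip((r, g, b), LEVELS))
    return (lr * LEVELS[1] + lg) * LEVELS[2] + lb

def color_histogram(pixels):
    hist = [0] * (LEVELS[0] * LEVELS[1] * LEVELS[2])
    for r, g, b in pixels:
        hist[color_bin(r, g, b)] += 1
    return hist

def representative_bins(hist, m=20):
    """Indices of the (at most m) most populated non-empty bins."""
    ranked = sorted(range(len(hist)), key=lambda i: hist[i], reverse=True)
    return [i for i in ranked[:m] if hist[i] > 0]

pixels = [(0, 0, 0), (255, 255, 255), (255, 255, 255), (200, 10, 10)]
hist = color_histogram(pixels)
print(len(hist), representative_bins(hist))  # 18 [17, 0, 12]
```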

Histograms (cont.)

We can use the largest, say 20, bins as the representative bins of the histogram. These 20 bins form a chain in the 3-D color space.

[Figure: representative bins such as (0,1,1), (2,3,0), (6,2,0), and (8,2,6) connected as a chain in the R, G, B color space]

Most histogram bins are sparsely populated, with only a small number of bins capturing the majority of pixel counts.

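Histograms (cont.)

If we can represent such chains using a single number, then we can index the color images using a B+-tree.

– Connecting order: Representative bins are sorted in ascending order by their distance from the origin of the color space.

– Weighted Perimeter: $WP = \sum_{i} C_i \, d_{i,i+1}$

– Weighted Angle: $WA = \sum_{i} C_i \, a_i$

– Format of the index key: WP (10 bits) followed by WA (10 bits)

where C_i is the weight of the i-th representative bin, d_{i,i+1} is the distance between consecutive bins i and i+1 in the chain, and a_i is the angle of the chain at bin i.

[Figure: a chain through the bins (0,1,1), (2,3,0), (3,2,3), (6,2,0), and (8,2,6) near the origin O of the R, G, B color space, with distances d_{i,i+1} and angles a_i marked]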

Video Content Extraction

• Other forms of information extraction can be employed:
  • Closed-captioned text
  • Speech recognition
  • Descriptive information from the screenplay
  • Key frames that characterize a shot

• This content information can be associated with the video story units.

Story Units

Shot: Frames recorded in one camera operation form a shot.

Scene: One or several related shots are combined in a scene.

Sequence: A series of related scenes forms a sequence.

Video: A video is composed of different story units, such as shots, scenes, and sequences, arranged according to some logical structure (defined by the screenplay).

These concepts can be used to organize video data.

Video Modeling Approaches

• Physical Feature based Modeling

• Semantic Content based Modeling

Physical Feature based Modeling

• A video is represented and indexed based on audio-visual features.

– Features can be extracted automatically.

• Queries are formulated in terms of color, texture, audio, or motion information.

– Very limited in expressing queries close to human thinking

– It would be difficult to ask for a video clip showing the sinking of the ship in the movie “Titanic” using only a color description.

Semantic Content based Modeling

• Video semantics are captured and organized to support video retrieval.

– Difficult to automate

– Partially relies on manual annotation

• Capable of supporting natural-language-like queries.

Semantic-Level Models

• Segmentation-based Models

• Stratification-based Models

• Temporal Coherent Model

Segmentation-based Modeling

• A video stream is segmented into temporally continuous segments.

• Each segment is associated with a description, which could be natural text, keywords, or other kinds of annotation.

• Disadvantages:
– Lack of flexibility
– Limited in representing semantics

[Figure: a timeline with boundaries at 0, 10, 30, 35, 70, and 90, divided into segments described as “Teacher enters class”, “A student raises hand”, “Break”, “Principal interrupts class”, and “Empty class”]

Stratification-based

• We partition the contextual information into single events.

• Each event is associated with a video segment called a stratum.

• Strata can overlap or encompass each other.

[Figure: Events 1 to 4 drawn as possibly overlapping strata on a timeline marked 0, 5, 15, 20, 30, 35, 60, 70, 85, 90]

Temporal Coherent

• Each event is associated with the set of video segments where it happens.

• More flexible in structuring video semantics.

[Figure: Events 1 to 4, each mapped to one or more segments along the time axis]

Advantage: Allows easy retrieval by keyword, e.g., using an inverted index.

Stratum

The concept of stratification can be used to assign descriptions to video footage.

- Each stratum refers to a sequence of video frames.

- The strata may overlap or totally encompass each other.

[Figure: strata over a sequence of video frames along the time axis, labeled “Car wreck rescue mission”, “Medics”, “Victim”, “Pulled free”, “In stretcher”, “In ambulance”, “Siren”, and “Ambulance”]

Inverted Index

The terms are indexed by a B+-tree, and each term points to the list of documents that contain it:

ANSI: D1, D2
C: D3, D4, D5
DB2: D1, D6, D21
GUI: D2, D8, D11
SYBASE: D2, D6, D17
MULTIMEDIA: D5, D15
ORACLE: D3, D11, D19
RELATIONAL: D2, D3, D11, D19
SQL: D2, D11, D20
JAVA: D3, D20

Each document entry contains:

• the frequency of the term in the document,

• the locations of the term in the document, and

• other information pertaining to the relationship between the term and the document.
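Here is a minimal sketch of such an index, assuming simple lowercase word tokenization. The hypothetical `Posting` record holds the frequency and locations listed above. A Python `dict` stands in for the B+-tree: it gives exact-term lookup but not the ordered range scans a B+-tree supports. The three document texts are hypothetical but agree with the table, so the AND query returns D2 and D11, the intersection of the RELATIONAL and SQL rows.

```python
import re
from collections import defaultdict
from typing import NamedTuple

class Posting(NamedTuple):
    """One document entry in a term's list."""
    doc_id: str
    frequency: int        # number of occurrences of the term in the document
    positions: list[int]  # word offsets where the term occurs

def build_index(docs: dict[str, str]) -> dict[str, list[Posting]]:
    offsets = defaultdict(lambda: defaultdict(list))  # term -> doc_id -> [word offsets]
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            offsets[word][doc_id].append(pos)
    return {
        term: [Posting(doc, len(pos_list), pos_list) for doc, pos_list in sorted(per_doc.items())]
        for term, per_doc in offsets.items()
    }

def search_all(index: dict[str, list[Posting]], *terms: str) -> set[str]:
    """Boolean AND: documents that contain every query term."""
    result = None
    for term in terms:
        docs = {p.doc_id for p in index.get(term.lower(), [])}
        result = docs if result is None else result & docs
    return result or set()

# Hypothetical document texts, consistent with the table above.
docs = {
    "D2": "SQL for relational GUI tools",
    "D3": "Java and Oracle relational systems",
    "D11": "Oracle SQL relational GUI",
}
index = build_index(docs)
print(sorted(search_all(index, "relational", "sql")))  # ['D11', 'D2']
```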


Video Algebra

Goal: To provide a high-level abstraction that

– models complex information associated with digital video data, and

– supports content-based access

Strategy:

– The algebraic video data model consists of hierarchical compositions of video expressions with high-level semantic descriptions

– The video expressions are constructed using video algebra operations


Presentation

The fundamental entity is a presentation.

• A presentation is described by a video expression.

• A video expression describes a multi-window, spatial, temporal, and content combination of video segments.



• An algebraic video node provides a means of abstraction by which video expressions can be named, stored, and manipulated as units.

Figure: an algebraic video node whose video expression is composed of nested video expressions, the innermost of which are built from raw video.

– Compound video expression: constructed from simpler expressions using video algebra operations.

– Primitive video expression: creates a single-window presentation from a raw video segment.


Video Algebra Operations

The video algebra operations fall into four categories:

1. Creation: defines the construction of video expressions from raw video.

2. Description: associates content attributes with a video expression.

3. Composition: defines temporal relationships between component video expressions.

4. Output: defines spatial layout and audio output for component video expressions.


Descriptions

• description E1 content: specifies that E1 is described by content.

– A content is a Boolean combination of attributes, each consisting of a field name and a value.

– some field names have predefined semantics (e.g., title), while other fields are user-definable.

– values can assume a variety of types, including strings and video node names.

– field names or values do not have to be unique within a description.

• hide-content E1: defines a presentation that hides the content of E1 (i.e., E1 does not contain any description).

– This operation provides a method for creating abstraction barriers for content-based access.

Example: title = “CNN Headline News”


Composition

The composition operations can be combined to produce complex scheduling definitions and constraints.

C1 = create Cnn.HeadlineNews.rv 10 30
C2 = create Cnn.HeadlineNews.rv 20 40
C3 = create Cnn.HeadlineNews.rv 32 65
D1 = (description C1 “Anchor speaking”)
D2 = (description C2 “Professor Smith”)
D3 = (description C3 “Economic reform”)

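Figure: segments C1, C2, and C3 of the raw video, described as “Anchor speaking”, “Professor Smith”, and “Economic reform”. Each create statement creates a video presentation from a raw video segment.

(D1 ∪ D2 ∪ D3)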

D3 follows D2, which follows D1, and common footage is not repeated. (This creates a non-redundant video stream from three overlapping segments.)
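This sketch assumes all three segments come from the same raw video and that the numbers in the create statements are start and end times. Under those assumptions, the union amounts to playing each expression in order while skipping any footage already played. The helper names are hypothetical, and this is one plausible reading of the operator rather than the system's implementation.

```python
def subtract(segment, played):
    """Return the parts of segment (start, end) not covered by any played interval."""
    pieces = [segment]
    for p_start, p_end in played:
        remaining = []
        for s, e in pieces:
            if s < min(e, p_start):
                remaining.append((s, min(e, p_start)))  # part before the played interval
            if max(s, p_end) < e:
                remaining.append((max(s, p_end), e))    # part after the played interval
        pieces = remaining
    return pieces

def union(*segments):
    """E1 ∪ E2 ∪ ...: play each segment in order, skipping footage already played."""
    played = []
    for segment in segments:
        played.extend(subtract(segment, played))
    return played

# C1, C2, C3 from the create statements (assumed start/end times in one raw video).
C1, C2, C3 = (10, 30), (20, 40), (32, 65)
print(union(C1, C2, C3))  # [(10, 30), (30, 40), (40, 65)]
```

The result is one pass over 10 to 65 with no repeated footage. Concatenating the three segments instead would replay 20 to 30 and 32 to 40.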



Composition Operators (1)

• E1 ∘ E2: defines the presentation where E2 follows E1.

• E1 ∪ E2: defines the presentation where E2 follows E1 and common footage is not repeated.

• E1 ∩ E2: defines the presentation where only the common footage of E1 and E2 is played.

• E1 - E2: defines the presentation where only the footage of E1 that is not in E2 is played.

• E1 || E2: E1 and E2 are played concurrently and terminate simultaneously.

• (test) ? E1:E2:...:En: Ei is played if test evaluates to i.

• loop E1 time: defines a repetition of E1 for a duration of time.
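This sketch makes one simplifying assumption: each expression is an ordered list of (start, end) intervals taken from one raw video. Under that assumption, concatenation, intersection, difference, and the conditional become simple list operations. Parallel composition and loop are not modeled, and all function names are hypothetical.

```python
# Hypothetical representation: an expression is an ordered list of
# (start, end) intervals from one raw video.

def concat(e1, e2):
    """E1 ∘ E2: E2 follows E1; footage common to both is played twice."""
    return e1 + e2

def intersect(e1, e2):
    """E1 ∩ E2: only footage common to E1 and E2, in E1's order."""
    out = []
    for s1, t1 in e1:
        for s2, t2 in e2:
            start, end = max(s1, s2), min(t1, t2)
            if start < end:
                out.append((start, end))
    return out

def difference(e1, e2):
    """E1 - E2: footage of E1 that is not in E2."""
    out = []
    for interval in e1:
        pieces = [interval]
        for s2, t2 in e2:
            remaining = []
            for s, t in pieces:
                if s < min(t, s2):
                    remaining.append((s, min(t, s2)))  # part before the removed interval
                if max(s, t2) < t:
                    remaining.append((max(s, t2), t))  # part after the removed interval
            pieces = remaining
        out.extend(pieces)
    return out

def choose(test_value, *exprs):
    """(test) ? E1:E2:...:En plays Ei when test evaluates to i (1-based)."""
    return exprs[test_value - 1]

def duration(e):
    return sum(end - start for start, end in e)

C1, C2 = [(10, 30)], [(20, 40)]
print(duration(concat(C1, C2)))  # 40
print(intersect(C1, C2))         # [(20, 30)]
print(difference(C1, C2))        # [(10, 20)]
```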


Composition Operators (2)

• stretch E1 factor: sets the duration of the presentation equal to factor times the duration of E1 by changing the playback speed of the video segment.

• limit E1 time: sets the duration of the presentation equal to the minimum of time and the duration of E1, but the playback speed is not changed.
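The timing effect of stretch and limit can be sketched with a hypothetical `Playback` record holding a duration and a playback speed. Stretching scales the duration by factor and the speed by 1/factor, so the same footage is shown. Limiting caps the duration and leaves the speed alone, so footage beyond the limit is cut.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Playback:
    """Hypothetical timing summary of a presentation."""
    duration: float  # presentation time, e.g. in seconds
    speed: float     # 1.0 = normal playback rate

def stretch(e: Playback, factor: float) -> Playback:
    """Duration becomes factor * duration; speed is divided by factor, so no footage is lost."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return Playback(e.duration * factor, e.speed / factor)

def limit(e: Playback, time: float) -> Playback:
    """Duration becomes min(time, duration); speed is unchanged, so extra footage is cut."""
    return Playback(min(time, e.duration), e.speed)

clip = Playback(duration=20.0, speed=1.0)
print(stretch(clip, 2))   # Playback(duration=40.0, speed=0.5)
print(limit(clip, 15.0))  # Playback(duration=15.0, speed=1.0)
```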


Composition Operators (3)

• transition E1 E2 type time: defines a transition effect of the given type between E1 and E2; time defines the duration of the transition effect.

The transition type is one of a set of transition effects, such as dissolve, fade, and wipe.

• contains E1 query: defines the presentation that contains the component expressions of E1 that match query (similar to the FROM clause in SQL).

A query is a Boolean combination of attributes:

Example: text: smith and text: question
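This is a minimal sketch of Boolean query evaluation for contains. It assumes a description is a list of (field, value) pairs, where field names may repeat. It also assumes that an attribute such as text: smith matches when some value of that field contains that word, case-insensitively. The query classes, the `contains` helper, and the component descriptions are hypothetical; the document does not specify the matching rule.

```python
from dataclasses import dataclass

# Hypothetical query syntax tree for Boolean combinations of attributes.
@dataclass(frozen=True)
class Attr:
    field: str
    word: str

@dataclass(frozen=True)
class And:
    parts: tuple

@dataclass(frozen=True)
class Or:
    parts: tuple

@dataclass(frozen=True)
class Not:
    part: object

def matches(description, query) -> bool:
    """description: list of (field, value) pairs; field names may repeat."""
    if isinstance(query, Attr):
        # Assumed rule: some value of the field contains the word (case-insensitive).
        return any(field == query.field and query.word.lower() in value.lower().split()
                   for field, value in description)
    if isinstance(query, And):
        return all(matches(description, q) for q in query.parts)
    if isinstance(query, Or):
        return any(matches(description, q) for q in query.parts)
    if isinstance(query, Not):
        return not matches(description, query.part)
    raise TypeError(f"unknown query node: {query!r}")

def contains(components, query):
    """Names of the components whose descriptions match query."""
    return [name for name, description in components if matches(description, query)]

components = [
    ("D1", [("text", "Anchor speaking")]),
    ("D2", [("text", "Professor Smith"), ("text", "question from audience")]),
    ("D3", [("text", "Economic reform")]),
]
query = And((Attr("text", "smith"), Attr("text", "question")))
print(contains(components, query))  # ['D2']
```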


Output Characteristics

Video expressions include output characteristics that specify the screen layout and audio output for playing back child streams.


– window E1 (X1 , Y1 ) - (X2 , Y2 ) priority

specifies that E1 will be displayed with priority in the window defined by the top-left corner (X1 , Y1) and the bottom-right corner (X2 , Y2) such that Xi in [0, 1] and Yi in [0, 1].

Window priorities are used to resolve overlap conflicts of screen display.

– audio E1 channel force priority

specifies that the audio of E1 will be output to channel with priority; if force is true, then the audio operation overrides any channel specifications of the component video expressions.


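Output Characteristics Example

C1 = create MagicvsBulls.rv 30:0 50:0
P1 = window C1 (0, 0) - (0.5, 0.5) 10
P2 = window C1 (0, 0.5) - (0.5, 1) 20
P3 = window C1 (0.5, 0.5) - (1, 1) 30
P4 = window C1 (0.5, 0) - (1, 0.5) 40
P5 = (P1 || P2 || P4)
P6 = (P1 || P2 || P3 || P4)
(P5 || (window (P5 || (window P6 (0.5, 0.5) - (1, 1) 60)) (0.5, 0.5) - (1, 1) 50))

In each window expression, the first coordinate pair is the top-left corner, the second is the bottom-right corner, and the last number is the priority; a larger number means a higher priority.

Figure: the resulting presentation. P5 shows P1 (top-left), P2 (bottom-left), and P4 (top-right); P6 additionally shows P3 (bottom-right); the final expression places the nested presentation in the bottom-right quadrant of the screen.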

Since expressions can be nested, the spatial layout of any particular video expression is defined relative to the parent rectangle.
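Because layouts are relative to the parent rectangle, a window's screen position comes from composing the rectangles along its nesting path. This is a minimal sketch with hypothetical helper names. Applied to the example above, P3 sits inside P6, which sits inside the priority-60 window, which in turn sits inside the priority-50 window. That path places P3 in the square from (0.875, 0.875) to (1, 1).

```python
Rect = tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left and bottom-right corners

def place(child: Rect, parent: Rect) -> Rect:
    """Map a rectangle given relative to parent (coordinates in [0, 1]) into parent's coordinates."""
    cx1, cy1, cx2, cy2 = child
    px1, py1, px2, py2 = parent
    width, height = px2 - px1, py2 - py1
    return (px1 + cx1 * width, py1 + cy1 * height, px1 + cx2 * width, py1 + cy2 * height)

def absolute(path: list[Rect]) -> Rect:
    """Compose window rectangles from the outermost to the innermost."""
    rect: Rect = (0.0, 0.0, 1.0, 1.0)  # the whole screen
    for window in path:
        rect = place(window, rect)
    return rect

bottom_right = (0.5, 0.5, 1.0, 1.0)
# Priority-50 window, then the priority-60 window holding P6, then P3 inside P6.
print(absolute([bottom_right, bottom_right, bottom_right]))  # (0.875, 0.875, 1.0, 1.0)
```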


Scope of a video node description

The scope of a given algebraic video node description is the subgraph that originates from the node.

– The components of a video expression inherit descriptions by context.

– i.e., content attributes associated with a parent video node are also associated with all of its descendant nodes.
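Description inheritance can be sketched as a walk up the parent links that collects the attributes of the node and all of its ancestors. Node names and attributes are hypothetical. The walk tolerates nodes with several parents, since get-parent can return more than one node.

```python
def inherited_description(node, own, parents):
    """Attributes of node plus those of all its ancestors (nodes may have several parents)."""
    result, stack, seen = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result |= own.get(current, set())
        stack.extend(parents.get(current, ()))
    return result

# Hypothetical nodes: a newscast containing a segment that contains a clip.
own = {
    "newscast": {("title", "CNN Headline News")},
    "segment": {("text", "Smith on economic reform")},
    "clip": {("text", "Question from audience")},
}
parents = {"segment": ["newscast"], "clip": ["segment"]}
print(sorted(inherited_description("clip", own, parents)))
```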


Content-Based Access

Search query: searches a collection of video nodes for video expressions that match query. Example: search text: smith AND text: question

Strategy: Matching a query to the attributes of an expression must take into account all of the attributes of that expression including the attributes of its encompassing expressions.

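Figure: a hierarchy of algebraic video nodes built over raw video, with descriptions such as “Anchor Smith”, “Smith on economic reform”, “Question from audience”, and “Question”. The node marked as the result of the query is returned; a descendant node that also satisfies the query is not returned, because it is a descendant of a node already in the result set.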
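The search strategy can be sketched as a top-down traversal that accumulates inherited words. On each path it stops at the first matching node, so matching descendants are not returned. This sketch assumes the nodes form a tree and reduces the query to a set of words that must all appear (an AND query). Node names and descriptions are hypothetical.

```python
def search(roots, children, words, required):
    """Return the topmost nodes whose own plus inherited words include every required word."""
    results = []

    def visit(node, inherited):
        available = inherited | words.get(node, set())
        if required <= available:
            results.append(node)  # descendants inherit these words, so they are skipped
            return
        for child in children.get(node, ()):
            visit(child, available)

    for root in roots:
        visit(root, set())
    return results

# Hypothetical tree loosely following the figure.
words = {
    "anchor_intro": {"anchor", "smith"},
    "smith_segment": {"smith", "economic", "reform"},
    "audience_question": {"question", "audience"},
    "follow_up": {"question"},
}
children = {
    "newscast": ["anchor_intro", "smith_segment"],
    "smith_segment": ["audience_question"],
    "audience_question": ["follow_up"],
}
print(search(["newscast"], children, words, {"smith", "question"}))  # ['audience_question']
```

Here follow_up also satisfies the query through inheritance, but it is never visited because its ancestor is already in the result set.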


Browsing and Navigation

• Playback video-expression: plays the video expression, enabling the user to view the presentation defined by the expression.

• Display video-expression: displays the video expression, allowing the user to inspect it.

• Get-parent video-expression: returns the set of nodes that directly point to video-expression.

• Get-children video-expression: returns the set of nodes that video-expression directly points at.
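The two navigation operations need links in both directions. This minimal sketch uses a hypothetical `VideoNodeGraph` class that stores each edge twice. That way get-parent is a direct lookup rather than a scan over all nodes.

```python
from collections import defaultdict

class VideoNodeGraph:
    """Hypothetical store of algebraic video nodes and their 'points to' links."""

    def __init__(self):
        self._children = defaultdict(set)
        self._parents = defaultdict(set)

    def link(self, parent, child):
        """Record that parent directly points to child."""
        self._children[parent].add(child)
        self._parents[child].add(parent)

    def get_children(self, node):
        """Nodes that node directly points at."""
        return set(self._children.get(node, ()))

    def get_parent(self, node):
        """Nodes that directly point to node."""
        return set(self._parents.get(node, ()))

graph = VideoNodeGraph()
graph.link("newscast", "D1")
graph.link("newscast", "D2")
graph.link("highlights", "D2")  # one expression referenced by two nodes
print(sorted(graph.get_parent("D2")))  # ['highlights', 'newscast']
```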


Algebraic Video System Prototype

The implementation is built on top of three existing subsystems:

– The VuSystem is used for managing raw video data and for its support of Tcl (Tool command language) programming. It provides an environment for recording, processing, and playing video.

– The Semantic File System is used as a storage subsystem with content-based access to data for indexing and retrieving files that represent algebraic video nodes.

– The Web server provides a graphical interface to the system that includes facilities for querying, navigating, video editing and composing, and invoking the video player.
