Multimedia Retrieval Architecture
Electrical Communication Engineering, Indian Institute of Science, Bangalore – 560012, India
Query processing
Query processing requirements of a multimedia presentation unit
Heterogeneous presentation units (image, video, audio, etc.) may be combined to form a single presentation unit.
a. These units may have different attributes and access methods.
b. Temporal relationships exist between presentation components.
To support the retrieval and reuse of presentations and presentation components, a multimedia retrieval system must support the following types of queries:
Query Processing
Types of queries
Attribute based queries
Queries over the attributes associated with the multimedia units, including text and numerical attributes, which may represent features extracted from the units.
– Retrieval by identifier (index).
– Retrieval by conditional statements.
Content based queries
Queries over color composition and other image or media characteristics.
E.g., a query can select all images that contain round shapes, using pattern recognition operations.
Temporal queries
Queries over the temporal relations among the media units within a presentation.
Retrieve pictures on the basis of the occurrence or non-occurrence of a certain entity.
E.g., to look at the president shaking hands with the PM, the query selects the matching stored video clips.
XQuery
XQuery is the language for querying XML data.
XQuery is to XML what SQL is to databases.
XQuery is built on XPath expressions.
XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.).
XML document
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
How to Select Nodes From "books.xml"?
Functions
XQuery uses functions to extract data from XML documents.
The doc() function is used to open the "books.xml" file:
doc("books.xml")
XQuery uses path expressions to navigate through elements in an XML document.
The following path expression is used to select all the title elements in the "books.xml" file:
doc("books.xml")/bookstore/book/title
(/bookstore selects the bookstore element, /book selects all the book elements under the bookstore element, and /title selects all the title elements under each book element)
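The XQuery above will extract the following (the XML listing shown earlier is only an excerpt of "books.xml", so the result includes titles of books not shown there):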
<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
XQuery uses predicates to limit the extracted data from XML documents.
The following predicate is used to select all the book elements under the bookstore element that have a price element with a value that is less than 30:
doc("books.xml")/bookstore/book[price<30]
The XQuery above will extract the following:
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
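For readers without an XQuery engine at hand, the path expression and the predicate above can be approximated in Python. This is only a sketch under stated assumptions: it uses the standard-library xml.etree.ElementTree module (which supports a limited XPath subset, not XQuery), the XML text is an inline copy of the two-book excerpt shown earlier, and the function names titles_cheaper_than and all_titles are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inline copy of the two-book excerpt shown earlier (not the full books.xml).
BOOKS_XML = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""


def all_titles(xml_text):
    """Rough equivalent of /bookstore/book/title."""
    root = ET.fromstring(xml_text)  # root is the <bookstore> element
    return [title.text for title in root.findall("book/title")]


def titles_cheaper_than(xml_text, limit):
    """Rough equivalent of /bookstore/book[price<limit], returning titles only."""
    root = ET.fromstring(xml_text)
    result = []
    for book in root.findall("book"):
        # ElementTree has no numeric predicates, so the comparison is done in Python.
        if float(book.findtext("price")) < limit:
            result.append(book.findtext("title"))
    return result


print(all_titles(BOOKS_XML))
print(titles_cheaper_than(BOOKS_XML, 30))
```

The numeric comparison is done in Python because ElementTree's XPath subset does not evaluate predicates such as price<30; a full XQuery processor would evaluate the predicate directly.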
XML based Multimedia Retrieval
1. Recognition of characters to locate names.
2. Identification of the type of line that represents the state boundaries and the symbol that represents cities.
3. Definition of a region in which the search for the state boundary should be performed; this requires knowledge of at least one point inside the state, which is obtained by locating the state name.
4. Search for a closed contour formed by the type of line that represents the state boundaries; if no closed contour is found, the system should define a larger search area, which is done by returning to step 3.
5. Search for the symbols that represent cities near the city name and within the closed contour.
6. Search for symbols representing objects (such as buildings, parks, lakes, etc.) and identify the objects based on cont
XML Based Retrieval
It then translates the innermost query into the following operations:
Call Window: This routine defines the spatial working area where the search for the tower symbol is performed. The nearest symbol to the transformed coordinates is identified as the symbol of IISc (tower in this case).
Call Connected Component: This routine identifies potential symbols within the windowed area of the map.
Call Symbol: This routine examines the potential symbols and recognizes all the symbols contained within the window.
Call Short/Far: This routine identifies the city symbol nearest to the approximate coordinates as the symbol of the IISc tower.
Image Queries
Images are required for:
illustration of text articles,
conveying information or emotions difficult to describe in words,
display of detailed data (such as radiology images) for analysis,
formal recording of design data (such as architectural plans) for later use, and so on.
Image Queries
Types of attributes:
the presence of a particular combination of color, texture or shape features (e.g., green stars);
the presence or arrangement of specific types of object (e.g., chairs around a table);
the depiction of a particular type of event (e.g., a football match);
the presence of named individuals, locations, or events (e.g., the PM greeting a crowd);
subjective emotions one might associate with the image (e.g., happiness).
Video Queries
Even short videos are made up of a number of distinct scenes, each of which can be further broken down into individual shots depicting a single view, conversation or action.
Prepare a storyboard of annotated still images (often known as key frames) representing each scene.
Prepare a series of short video clips, each capturing the essential details of a single sequence – video skimming.
1-dimensional objects:
Text and speech objects.
Reason: text and audio are accessed in a contiguous manner.
2-dimensional objects:
E.g., image objects. Access to image data can be done with reference to the spatial locations of objects.
E.g., a query can search for an object that is to the right of or below a specified object.
3-dimensional objects:
E.g., video objects, which have both spatial and temporal characteristics.
Access to video can be done by describing the temporal as well as the spatial content.
E.g., a query can ask for a movie to be shown from 10 minutes after its start.
4-dimensional objects:
3-D plus the time dimension.
E.g., 3D heart-beat visualization: a 3D heart image expanding and contracting over time.
Spatial Models
Represent the way media objects are presented, by specifying the layout of windows on a monitor.
Temporal Models
Describe the time and duration of presentation of each media object as well as its temporal relationships to other media objects. The temporal requirements of objects need to be specified and stored along with the database.
Queries at three levels of increasing complexity
Level 1 comprises retrieval by primitive features such as color, texture, shape or the spatial location of image elements,
e.g., find images containing yellow stars. Retrieval at this level uses features that are both objective and directly derivable from the image itself.
Level 2 comprises retrieval by derived features, involving some degree of logical inference about the identity of the objects in the image:
retrieval of objects of a given type; retrieval of individual objects or persons, e.g., find pictures of a double-decker bus. Queries at this level require reference to some outside store of knowledge.
Level 3 comprises retrieval by abstract attributes, involving a significant amount of high-level reasoning about the meaning and purpose of the objects or scenes depicted:
retrieval of named events or types of activity; retrieval of pictures with emotional or religious significance (e.g., pictures of Rajasthani folk, or pictures depicting suffering). This level requires complex reasoning and subjective judgement.
Queries for Video and Images Retrieval
Subimage query:
Given a query image, find the parent image in the database with which it matches, either as a whole or as a part.
Give location information showing where in the parent the query is positioned.
The images may be of very high resolution, and the query and target may be at different resolutions.
(k, u, t) query: given a query image that contains k labeled objects and u unlabeled objects, and a tolerance t, retrieve all images that contain a subimage which matches the query within tolerance t.
[Figure: a query image and its result, the best matching image with the sub-image identified; the query and target images also differ in resolution.]
Generic search algorithm: a generic algorithm to search for a solution path in a graph.
R-tree search: R-trees are tree data structures used as spatial access methods for indexing multidimensional information such as geographical coordinates, rectangles and polygons. Issue one or more range queries on the (k, 1) R-tree to obtain a list of promising images (image identifiers).
Clean-up:
For each of the images obtained above, retrieve its corresponding attributed relational graph (ARG) from the graph file, and compute the actual distance between this ARG and the ARG of the original (k, u, t) query. If the distance is less than the threshold t, the image is included in the response set.
Attributed relational graphs
• Image content is represented by an ARG holding the features of objects and the relationships between objects.
• E.g., "find all X-rays that are similar to Smith's X-ray".
In an ARG, objects are represented by graph nodes and relationships between objects are represented by arcs between such nodes.
Single Region Based Image Query
A single region image query consists of computing individual queries on the color set and on the region location. For region-location queries on the spatial properties of individual regions, indexes of region centroids or of minimum bounding rectangles are used.
The spatial distance between regions is given by the Euclidean distance
dq,t = sqrt((xq - xt)^2 + (yq - yt)^2)
where (xq, yq) and (xt, yt) are the coordinates of the two points.
Single Region Based Image Query
Bounded query location
The user has flexibility in designating, for each region in the query, the spatial bounds within which a target region is assigned a spatial distance of zero; a target region that falls outside these bounds is assigned the Euclidean distance of the centroids.
Region absolute location
Fixed query location: the distance is the Euclidean distance of the centroids.
Bounded query location: if the target region falls within the designated area, dq,t = 0;
otherwise, the distance is the Euclidean distance of the centroids.
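The two location rules above can be sketched in a few lines of Python. This is an illustration rather than the original system's code: the function names, the representation of a centroid as an (x, y) tuple, and the choice of a rectangle (x_min, y_min, x_max, y_max) for the designated area are all assumptions.

```python
import math


def centroid_distance(q, t):
    """Fixed query location: Euclidean distance between centroids q and t, each (x, y)."""
    return math.hypot(q[0] - t[0], q[1] - t[1])


def bounded_location_distance(q, t, bounds):
    """Bounded query location.

    bounds is (x_min, y_min, x_max, y_max); a rectangular designated area
    is an assumption made for this sketch.
    """
    x_min, y_min, x_max, y_max = bounds
    inside = x_min <= t[0] <= x_max and y_min <= t[1] <= y_max
    # Zero distance inside the designated area, Euclidean distance otherwise.
    return 0.0 if inside else centroid_distance(q, t)


print(centroid_distance((0, 0), (3, 4)))
print(bounded_location_distance((0, 0), (3, 4), (2, 2, 5, 5)))
```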
Index Structure
Centroid location spatial access
Spatial quad-tree
Centroids of image regions are indexed using a spatial quad-tree on their x and y values. The quad-tree provides quick access to 2D data points.
Rectangle (MBR, minimum bounding rectangle) location spatial access
R-trees
Single Region Based Image Query
Area
- The absolute distance between the areas of two regions
Spatial Extent
- Measures the distance between the widths and the heights of the MBRs
- Is much simpler than shape information
Single Region Based Image Query
Rectangle location spatial access: R-trees
The MBR (minimum bounding rectangle) is the smallest vertically aligned rectangle that completely encloses the region.
Size
Another important perceptual dimension of the regions is their size in terms of area and spatial extent.
Area
The distance in area between two regions is given by the absolute difference of their areas.
Spatial Extent
The distance in spatial extent between two regions is computed from the widths (w) and heights (h) of their MBRs.
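The two size distances can be sketched as follows. The area distance follows the text directly. The form of the spatial extent distance is not stated in the text, so this sketch assumes a Euclidean distance over the width and height differences, by analogy with the centroid distance; the function names are illustrative only.

```python
import math


def area_distance(area_q, area_t):
    """Absolute difference between the areas of the query and target regions."""
    return abs(area_q - area_t)


def extent_distance(w_q, h_q, w_t, h_t):
    """Distance between MBR widths and heights.

    ASSUMPTION: Euclidean distance over (w, h); the text does not give the form.
    """
    return math.hypot(w_q - w_t, h_q - h_t)


print(area_distance(120, 100))
print(extent_distance(10, 8, 7, 4))
```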
Single Region Query Strategy
Integrating these approaches, the single region query strategy consists of a weighted sum of the color set, location, area and spatial extent distances, which gives the single region query distance.
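A minimal sketch of the weighted sum, assuming the four component distances have already been computed. The function name, the argument order and the default weights of 1.0 are assumptions for illustration; the text does not give weight values.

```python
def single_region_distance(d_color, d_location, d_area, d_extent,
                           weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the color set, location, area and spatial extent distances.

    The default weights are placeholders, not values taken from the source.
    """
    w_color, w_location, w_area, w_extent = weights
    return (w_color * d_color + w_location * d_location
            + w_area * d_area + w_extent * d_extent)


# Example: emphasize color, ignore location and area, down-weight extent.
print(single_region_distance(1.0, 2.0, 3.0, 4.0, weights=(0.5, 0.0, 0.0, 0.25)))
```

In practice the component distances would need to be normalized to comparable ranges before weighting, otherwise the area term (measured in pixels) can dominate.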
Multiple Regions Query
• The overall image query strategy consists of joining the queries on the individual regions in the query image.
Multiple Regions Query Strategy – Absolute Locations
For each region in the query positioned by absolute location, the query strategy outlined for the single region query is carried out, without computing the final minimization.
The resulting lists are intersected, and the best image match is the one that minimizes the combined region query distance.
[Figure: example query for the image having three regions that best matches, with the matches found.]
Shape based Query Processing
Shape Index
For each color region the shape index may be computed as follows:
1. Compute the major and minor axes of each color region.
2. Rotate the shape region to align the major axis with the X-axis to achieve rotation normalization, and scale it such that the major axis has a standard fixed length (say 96 pixels).
3. Place a grid of fixed size (96x96 pixels) over the normalized color region and obtain the binary sequence by assigning 1's and 0's accordingly.
4. Using the binary sequence, compute the row and column total vectors. These, along with the eccentricity, form the shape index for the region.
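The last step above can be sketched in Python. This covers only the row and column totals: it assumes the region has already been rotated, scaled and rasterized into a binary grid (a list of rows of 0/1 values), and that the eccentricity has been computed elsewhere. The function name and the small 4x4 grid (instead of 96x96) are illustrative.

```python
def shape_index(grid, eccentricity):
    """Build a shape index from a normalized binary grid.

    grid: list of equal-length rows containing 0/1 values (1 is assumed
          to mean the cell is covered by the region).
    Returns (row_totals, column_totals, eccentricity).
    """
    row_totals = [sum(row) for row in grid]
    column_totals = [sum(column) for column in zip(*grid)]
    return row_totals, column_totals, eccentricity


# Toy 4x4 grid standing in for the 96x96 grid described in the text.
toy_grid = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
print(shape_index(toy_grid, 1.0))
```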
Shape based Query Processing
Query Process
The query image is processed to obtain a list of matching images based only on color features.
For each color region in the query image, the shape representation of the region is evaluated.
The shape index of the regions in the query image is compared to those in the list of images retrieved on color.
Only regions whose eccentricity matches within a threshold (t) are compared for shape similarity.
The matching images are ordered by the sum of the differences in the row and column vectors between the query and the matching image.
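The filtering and ordering steps above might look like the following sketch. It assumes one region per image, a shape index stored as a (row_totals, column_totals, eccentricity) tuple, and absolute differences for comparing the vectors; these representations and the function names are assumptions, not details given in the text.

```python
def shape_difference(index_a, index_b):
    """Sum of absolute differences of the row and column total vectors."""
    rows_a, cols_a, _ = index_a
    rows_b, cols_b, _ = index_b
    row_diff = sum(abs(a - b) for a, b in zip(rows_a, rows_b))
    col_diff = sum(abs(a - b) for a, b in zip(cols_a, cols_b))
    return row_diff + col_diff


def rank_by_shape(query_index, candidates, t):
    """Order color-matched candidates by shape difference.

    candidates: dict mapping image id -> shape index tuple.
    Only candidates whose eccentricity is within t of the query's are compared.
    """
    scored = []
    for image_id, index in candidates.items():
        if abs(index[2] - query_index[2]) <= t:
            scored.append((shape_difference(query_index, index), image_id))
    return [image_id for _, image_id in sorted(scored)]


query = ([2, 4, 4, 2], [2, 4, 4, 2], 1.0)
color_matches = {
    "a": ([2, 4, 4, 2], [2, 4, 4, 2], 1.1),   # identical vectors
    "b": ([1, 4, 4, 2], [2, 4, 4, 3], 1.05),  # differs in two cells
    "c": ([2, 4, 4, 2], [2, 4, 4, 2], 2.0),   # eccentricity too far off
}
print(rank_by_shape(query, color_matches, t=0.2))
```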
Queries for multimedia objects
Query Model
A query model for searching multimedia objects in a database or a file needs to satisfy the following requirements:
Recognize that a match between the value of an attribute of a multimedia object and a given constant may not be exact, i.e., the model must account for the grade of match.
Allow users to specify thresholds on the grade of match of the acceptable objects.
Enable users to ask for only a few top-matching objects.
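The three requirements above can be illustrated with a short Python sketch. The grade function used here (closeness of a numeric attribute to a constant, mapped into (0, 1]) and the names top_matches and closeness are assumptions chosen for the example; a real system would use a media-specific grading function.

```python
import heapq


def top_matches(objects, grade, threshold=0.0, k=None):
    """Return (grade, object) pairs, best first.

    grade:     function mapping an object to a grade of match.
    threshold: minimum acceptable grade of match.
    k:         if given, return only the k top-matching objects.
    """
    accepted = [(grade(obj), obj) for obj in objects]
    accepted = [pair for pair in accepted if pair[0] >= threshold]
    if k is None:
        return sorted(accepted, key=lambda pair: pair[0], reverse=True)
    return heapq.nlargest(k, accepted, key=lambda pair: pair[0])


def closeness(target):
    """Hypothetical grade: 1.0 for an exact match, falling towards 0 with distance."""
    return lambda obj: 1.0 / (1.0 + abs(obj[1] - target))


images = [("img1", 10), ("img2", 12), ("img3", 30)]
print(top_matches(images, closeness(10), threshold=0.1))
print(top_matches(images, closeness(10), threshold=0.1, k=1))
```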
Queries for multimedia documents
Four main phases of query processing:
During the preprocessing phase, parsing and catalog access are performed, and the query is modified in light of the type hierarchy.
The multicluster query resolution phase determines the set of document clusters that must be accessed. Document distribution over the various clusters is transparent to the applications, so to evaluate a query it is necessary to determine which clusters contain documents that can potentially satisfy the query.
Once the set of clusters involved in the query is determined, the single-cluster query optimization phase is performed and a query processing strategy is defined for each cluster.
The query execution phase applies the strategies defined in the previous phase.
Queries for multimedia documents
Predicates in a query are divided into four classes:
Structure predicates. These predicates are evaluated by accessing the system catalogs.
Index predicates. These predicates are evaluated by using the indexes.
Text predicates. These predicates are evaluated by means of signature scanning.
Residual predicates. These are predicates on components for which there are no access structures and so can only be evaluated by accessing the documents. This is the case for data attributes with no indexes. In addition, predicates defined on spring nodes belong to this class.
Queries for multimedia documents
Index query. A query issued against the index segments by using the access paths provided by the index handler.
Text query. A query issued against the signature segments by using the access paths provided by the signature handler.
Document query. A query issued against the bulk storage segments by using the access paths provided by the bulk storage handler.
Query Preprocessing Phase
Parsing. The query is parsed by a conventional parser.
Catalog Access. After parsing of the query, the definitions of the conceptual types appearing in the query are retrieved from the system catalogs.
Component Checking. If the query contains a type-clause, then the conceptual components present in the query are verified as belonging to the specified types.
Shape based multimedia retrieval
Registration. Given two 3D models, align them optimally; compute the geometric similarity between them;
Retrieval. Given a database of 3D models and a geometric query, find the models that best match the query;
Recognition. Given a database of 3D models and a query model, either find the query model in the database or determine it is not there;
Verification. Given a 3D model and a specification, determine whether they match to within some tolerance;
Clustering. Given a database of 3D models, automatically partition them into a set of classes;
Feature detection. Given a 3D model, find geometric features of interest on its surface;
Classification. Given a set of model class specifications and a query model, determine the class to which the query model belongs;
Segmentation. Partition a given 3D model into its salient parts;
Semantic labeling. Infer semantic meaning regarding the purpose and function of a given 3D model;
Synthesis. Automatically synthesize new examples typical of a given model class specification;
Indexing and retrieval
Used for pdf files
Indexing
Each video sample is processed by the text recognition software. For each frame, the recognized characters are stored after deletion of all text lines with fewer than 3 characters.
Retrieval
Video sequences are retrieved by specifying a search string. Two search modes are supported: exact substring matching and approximate substring matching.
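The two search modes can be sketched as below. The text does not say how approximate matching is implemented; this sketch assumes an edit-distance criterion (at most max_errors insertions, deletions or substitutions) and uses the standard dynamic-programming scheme for approximate substring search. The function names and the sample recognized text are made up.

```python
def exact_match(text, pattern):
    """Exact substring matching."""
    return pattern in text


def approx_match(text, pattern, max_errors):
    """True if some substring of text is within max_errors edits of pattern.

    ASSUMPTION: 'approximate' means bounded edit distance.
    column[i] holds the smallest edit distance between pattern[:i] and any
    substring of the text ending at the current position.
    """
    m = len(pattern)
    column = list(range(m + 1))  # before reading any text
    if column[m] <= max_errors:
        return True
    for ch in text:
        new = [0] * (m + 1)  # new[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(column[i - 1] + cost,  # match or substitution
                         column[i] + 1,         # extra character in text
                         new[i - 1] + 1)        # missing pattern character
        column = new
        if column[m] <= max_errors:
            return True
    return False


recognized = "BREAKlNG NEWS TODAY"  # OCR confused 'I' with 'l'
print(exact_match(recognized, "BREAKING"))
print(approx_match(recognized, "BREAKING", max_errors=1))
```

Approximate matching is useful here because text recognition output typically contains character confusions, so an exact search for the intended word can miss frames that a one-error search would find.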
Shape based multimedia retrieval
FIBSSR (Feature Index-based Similar Shape Retrieval)
A general and flexible shape similarity-based approach that enables retrieval of both rigid and articulated shapes.
Spatial Access based Retrieval Methods
Space-filling curves: assume a finite precision in the representation of each coordinate, say K bits. The address space is a square (the image), represented as a 2^K x 2^K array of 1 x 1 squares (pixels).
R-Trees Z-ordering & R-trees and variants
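As an illustration of Z-ordering, the z-value of a pixel can be obtained by interleaving the bits of its K-bit coordinates. The sketch below assumes the convention that, within each bit pair, the x bit is the more significant one; the opposite convention is equally common. The function name is illustrative.

```python
def z_value(x, y, k):
    """Interleave the k-bit coordinates x and y into a single z-value.

    ASSUMPTION: within each bit pair, the x bit is the more significant one.
    """
    z = 0
    for bit in range(k):
        z |= ((x >> bit) & 1) << (2 * bit + 1)  # x bit goes to the odd position
        z |= ((y >> bit) & 1) << (2 * bit)      # y bit goes to the even position
    return z


# 4 x 4 address space (K = 2): print the z-value of every pixel, row by row.
for y in range(4):
    print([z_value(x, y, 2) for x in range(4)])
```

Sorting pixels by z-value maps the 2D address space onto a one-dimensional order that tends to keep nearby pixels close together, which is what allows a conventional one-dimensional index to be used for spatial access.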
Content based retrieval methods
Retrieving stored images from a collection by comparing features automatically extracted from the images themselves, such as measures of color, texture or shape.
Color retrieval
Each image added to the collection is analyzed to compute a color histogram, which shows the proportion of pixels of each color within the image.
Texture retrieval
Similarity is computed by comparing values of what are known as second-order statistics calculated from the query and stored images.
Shape retrieval
A number of features characteristic of object shape are computed for every object identified within each stored image.
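The color histogram described under color retrieval can be sketched as follows. Pixels are represented here by color names for readability; a real system would first quantize RGB values into a fixed set of bins. The comparison by sum of absolute differences is an assumption, since the text does not name a histogram distance, and the function names are illustrative.

```python
from collections import Counter


def color_histogram(pixels):
    """Proportion of pixels of each color within an image."""
    counts = Counter(pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}


def histogram_distance(h1, h2):
    """ASSUMPTION: sum of absolute differences between the two histograms."""
    colors = set(h1) | set(h2)
    return sum(abs(h1.get(c, 0.0) - h2.get(c, 0.0)) for c in colors)


stored = color_histogram(["red", "red", "blue", "green"])
query = color_histogram(["red", "red", "red", "blue"])
print(stored)
print(histogram_distance(stored, query))
```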
Retrieval using indexing
Objects are represented as collections of features.
Similarity depends on context and frame of reference.
Features are characterized by multiple multimodal feature measures.
Challenges in Indexing
The index must be created using all features of an object class.
Nodes in the index tree must be consistent with respect to the context and frame of reference.
Multiple multimodal feature measures should be fused properly to generate the index tree so that a valid categorization is possible.
Similarity based retrieval
Uses similarity measures.
When presented with a sample facial image, similarity retrieval occurs in the same way as pattern classification using a decision tree.
Retrieval follows the tree down to the leaf nodes. At each level, similarity measures determine the decision.
Using distance as the similarity measure, the index tree selects the jth node in the next level if d(x, tj) = min_i d(x, ti), where x is the sample image and tj is the template of the jth node.
At the leaf node level, all leaf nodes similar to the sample image will be selected.
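The descent through the index tree can be sketched as below. This is a simplified illustration: templates are single numbers rather than feature vectors, the distance is the absolute difference, and only the single nearest leaf is returned. The Node class, its field names and the toy tree are assumptions made for the example.

```python
class Node:
    """Index tree node with a template; leaves also carry image identifiers."""

    def __init__(self, template, children=None, images=None):
        self.template = template
        self.children = children or []
        self.images = images or []


def retrieve(root, x, distance):
    """Follow the tree down, choosing at each level the child whose template
    minimizes distance(x, template), and return the images at the leaf."""
    node = root
    while node.children:
        node = min(node.children, key=lambda child: distance(x, child.template))
    return node.images


# Toy tree with scalar templates standing in for image feature vectors.
tree = Node(None, children=[
    Node(0.0, children=[Node(-1.0, images=["a1"]), Node(1.0, images=["a2"])]),
    Node(10.0, children=[Node(9.0, images=["b1"]), Node(12.0, images=["b2"])]),
])
print(retrieve(tree, 11.2, lambda x, t: abs(x - t)))
```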