
  • 8/8/2019 CBIR06

    1/25

World Wide Web: Internet and Web Information Systems, 6, 131–155, 2003

© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Relevance Feedback and Learning in Content-Based Image Search

    HONGJIANG ZHANG, ZHENG CHEN, MINGJING LI and ZHONG SU [email protected]

    Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, China

    Abstract

    A major bottleneck in content-based image retrieval (CBIR) systems or search engines is the large gap between

    low-level image features used to index images and high-level semantic contents of images. One solution to this

bottleneck is to apply relevance feedback to refine the query or similarity measures in the image search process.

    In this paper, we first address the key issues involved in relevance feedback of CBIR systems and present a

brief overview of a set of commonly used relevance feedback algorithms. We then present a framework of relevance feedback and semantic learning in CBIR, into which almost all of the previously proposed methods fall. In this framework, low-level features and keyword annotations are integrated in the image retrieval and feedback processes to improve retrieval performance. We have also extended the framework to a content-based

    web image search engine in which hosting web pages are used to collect relevant annotations for images and

users' feedback logs are used to refine annotations. A prototype system has been developed to evaluate the proposed schemes, and our experimental results indicate that our approach outperforms traditional CBIR systems and

    relevance feedback approaches.

    Keywords: image retrieval, relevance feedback, machine learning, web mining

    1. Introduction

    The popularity of digital images is rapidly increasing due to improving digital imaging

    technologies and convenient availability facilitated by the Internet. However, how to find

    user-intended images from the Internet is still non-trivial. The main reason is that web

images are usually not well annotated with semantic descriptors. The development history of image retrieval systems features two stages. The first stage is keyword-based image

    retrieval, which is summarized by Chang et al. [2]. Since manual image annotation is

    a tedious process, it is practically impossible to annotate all the images on the Internet.

    Furthermore, due to the multiplicity of contents in a single image and the subjectivity of

human perception, it is also difficult for different users to annotate the same image in exactly the same way. These difficulties have limited the applications of keyword-based image retrieval technology. Having been actively researched in the last decade

    [6,30], content-based image retrieval (CBIR) attempts to automate the process of indexing

or annotating images in image databases. CBIR approaches work with descriptions based

    on inherent properties of images, such as color, texture and shape. However, despite all

This paper is based on the invited keynote that the first author gave at VDB2002, Brisbane, Australia, May 2002.


    132 ZHANG ET AL.

the research efforts, the retrieval accuracy of today's CBIR algorithms is still very limited.

    In addition to many other difficulties, the bottleneck is the gap between low-level image

features and semantic image contents. This problem stems from the fact that visual similarity measures, such as color histograms, do not necessarily match the semantics of images as perceived subjectively by humans. Also, each type of visual feature tends to capture only one aspect of an image's properties, and it is usually hard for a user to specify clearly how different

    aspects are combined to form an optimal query. To make the problem even worse, people

often have different semantic interpretations of the same image. Even the same person may have different perceptions of the same image at different times. To address this

    bottleneck, interactive relevance feedback techniques have been proposed. The key idea

is to incorporate human perception subjectivity into the retrieval process and provide users with opportunities to evaluate retrieval results, automatically refining queries on the basis of those evaluations. In the last few years, this research topic has become a focus of the CBIR research community.

Relevance feedback, originally developed for textual document retrieval [16], is a supervised active learning technique used to improve the effectiveness of information systems.

    The main idea is to use positive and negative examples from the user to improve system

    performance. For a given query, the system first retrieves a list of ranked images according

to a predefined similarity metric. Then, the user marks the retrieved images as relevant (positive examples) to the query or not (negative examples). The system then refines the query based on the feedback, retrieves a new list of images, and presents it to the user. Hence,

    the key issue in relevance feedback is how to incorporate positive and negative examples

    to refine the query and/or to adjust the similarity measure.

In this paper, we present a content-based image retrieval framework that integrates low-level and semantic-based image similarities and supports automated annotation through learning from relevance feedback, together with the extension of the framework to a web image

search engine. Instead of giving a detailed description of the novel component algorithms, we focus our description on the key ideas in the framework. Details of the algorithms and the

framework implementation can be found in references [4,9,12,23,24]. Also, since we want the paper to serve as a reference on the current state of the art of CBIR relevance feedback research, a comprehensive survey of relevance feedback algorithms, in terms of their natures and limitations, is presented in this paper.

There are many issues in relevance feedback approaches to CBIR, such as learning schemes, feature selection, index structure, and scalability. Instead of giving an exhaustive survey of each published relevance feedback algorithm for CBIR in terms of its advantages and limitations, we focus our discussion on the view that relevance feedback in CBIR is a small-sample machine learning problem, and describe in detail the learning and searching natures of each algorithm. This is presented

    in Section 2.

In Section 3, we present the integrated relevance feedback framework for CBIR. In this framework, while the user is interacting with the system by providing feedback in a query session, a progressive learning process is activated to propagate the

keyword annotations from the labeled images to unlabeled images as the system refines the retrieval. The knowledge learned in the relevance feedback sessions is accumulated


    RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 133

in a semantic network. In addition, a cross-modality query expansion scheme is implemented to improve retrieval performance significantly, whether a query is initiated with a keyword or with an example image.

    The proposed framework has been further extended to a web image search system, as

    presented in Section 4. In this extension, we combine visual features and text descriptors

initially extracted from the web pages where the images reside, such as image URLs, filenames, page titles, ALT text, hyperlinks, and surrounding text. These visual and textual features build the document space model of the images. However, the initial text descriptors are in general less accurate than manually annotated text, and there is often a mismatch between the page author's expression and the user's understanding and expectation of the annotation.

To overcome these problems, we apply the proposed relevance feedback framework. Furthermore, data mining technology is also applied to the users' feedback logs to improve image retrieval performance in two aspects. Firstly, the original document space model

    built from the images and the text content of the web pages can be analyzed to detect and

remove clutter and irrelevant text information. Secondly, the user space model, which is the

    keyword vectors used by the users to represent images in the database, can be constructed

from the users' relevance feedback log data. The user space model is then combined with the document space model to eliminate the mismatch between the page author's expression and the user's understanding and expectation.

    2. Relevance feedback algorithms

    In this section, we review a set of relevance feedback approaches used in CBIR. The review

is focused on the learning and searching natures of each relevance feedback algorithm, as we consider relevance feedback in CBIR to be a machine learning problem. We begin the

    discussion by first providing an overview of classical relevance feedback approaches in

    CBIR.

    2.1. Classical algorithms

The early relevance feedback schemes for CBIR were mainly adopted from those developed for classical textual document retrieval. These schemes can be classified into two approaches: query point movement (query refinement) and re-weighting (similarity measure refinement) [1]. Both were developed based on the vector space model, the most popular model used in information retrieval [20].

    The query point movement method essentially tries to improve the estimate of the ideal

query point by moving it towards positive example points and away from negative example points in the query space. There are various ways to update the query. The most frequently used technique to iteratively improve this estimate is Rocchio's formula, given below for

sets of relevant documents D_R and non-relevant documents D_N given by the user [16]:

Q' = \alpha Q + \beta \frac{1}{N_R} \sum_{i \in D_R} D_i - \gamma \frac{1}{N_N} \sum_{i \in D_N} D_i,   (1)


where \alpha, \beta, and \gamma are suitable constants, and N_R and N_N are the numbers of documents in D_R and D_N, respectively. This technique is also referred to as learning the query vector. It was implemented in the MARS system [18] by replacing the document vectors with visual feature vectors. Experiments show that retrieval performance can be improved considerably by using such relevance feedback approaches.
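Rocchio's update of equation (1) can be sketched in a few lines; a minimal illustration with plain feature vectors, where the values chosen for the constants \alpha, \beta, and \gamma are illustrative (the formula does not fix them):

```python
# Sketch of Rocchio's query-update formula (Eq. 1), applied to visual
# feature vectors as in MARS. Vectors are plain Python lists; alpha,
# beta, gamma are illustrative constants, not values from the paper.

def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward the mean of the relevant examples and away
    from the mean of the non-relevant examples."""
    dim = len(query)

    def mean(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]

    mr, mn = mean(relevant), mean(non_relevant)
    return [alpha * query[j] + beta * mr[j] - gamma * mn[j] for j in range(dim)]

# Example: two relevant and one non-relevant feedback vector.
q = [1.0, 0.0]
new_q = rocchio_update(q, relevant=[[2.0, 2.0], [4.0, 0.0]], non_relevant=[[0.0, 4.0]])
# → [3.25, -0.25]: the query has moved toward the relevant cluster.
```

Each feedback round feeds the returned vector back in as the next query, which is exactly the iterative improvement the text describes.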

The basic idea behind the re-weighting method is to enhance the importance of the dimensions of a feature that help in retrieving the relevant images and to reduce the importance of those dimensions that hinder this process. This is achieved by updating the weights of the feature vectors in the distance metric. Consider a weighted metric defined as

D = \sum_{j \in [N]} \omega_j \left| X^{(1)}_j - X^{(2)}_j \right|.   (2)

When an image in the query result is labeled as a positive example, the feature components that contribute more similarity to the match are considered more important, while the components that contribute less are considered less important. Therefore, the weight for a feature component, \omega_i, is updated in the following way:

\omega_i = \omega_i (1 + \bar{\delta} - \delta_i), \quad \delta = \left| f(Q) - f(A^+_j) \right|,   (3)

where \bar{\delta} is the mean of \delta. On the other hand, if an image is labeled as a negative example, the feature components that contribute more to the match should be depressed. That is, the weight is updated as:

\omega_i = \omega_i (1 - \bar{\delta} + \delta_i).   (4)

This technique is also referred to as learning the metric. A similar approach was proposed by Huang et al. [7]. The MARS system implemented a slight refinement of the re-weighting method, called the standard deviation method [18].
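The re-weighting rules of equations (2)–(4) can be sketched as follows; a minimal illustration that interprets \delta_i as the per-component mismatch |q_i - e_i| between the query and a feedback example (an assumption, since the transform f is not detailed here):

```python
# Sketch of the re-weighting method (Eqs. 2-4). delta_i is taken as the
# per-component mismatch between query and feedback example; this is one
# interpretation of the transform f in Eq. (3), not the paper's exact form.

def weighted_distance(x1, x2, weights):
    """Weighted L1 metric of Eq. (2)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x1, x2))

def update_weights(weights, query, example, positive=True):
    """Raise weights of components that matched well (small delta) for a
    positive example (Eq. 3); lower them for a negative one (Eq. 4)."""
    deltas = [abs(q - e) for q, e in zip(query, example)]
    mean_delta = sum(deltas) / len(deltas)
    if positive:
        return [w * (1 + mean_delta - d) for w, d in zip(weights, deltas)]
    return [w * (1 - mean_delta + d) for w, d in zip(weights, deltas)]

q = [0.5, 0.5]
pos = [0.5, 0.9]   # matches the query well on the first component only
w = update_weights([1.0, 1.0], q, pos, positive=True)
# The first component's weight rises above the second's.
```

The updated weights then plug straight back into `weighted_distance` for the next retrieval round.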

    Instead of updating the individual components of a distance metric, we can also begin

    with a set of predefined distance metrics and use relevance feedback to automatically select

the best one during the retrieval process. For instance, in the ImageRover system [21], an appropriate Lp Minkowski distance metric is automatically selected to minimize the mean distance between the relevant images specified by the user.

    Another relevance feedback approach, proposed by Minka and Picard, is to update the

    query space by selecting feature models. It is assumed that each feature model has its

own strength in representing a certain aspect of image content, and thus the best way to achieve effective content-based retrieval is to utilize a society of models. This approach uses a learning scheme to dynamically determine which feature model or combination of models

    is best for subsequent retrieval.

    Recently, more computationally robust methods that perform global feature optimization

    have been proposed. The MindReader retrieval system designed by Ishikawa et al. [8]

formulates parameter estimation as a minimization problem. Unlike traditional retrieval systems, whose distance functions can be represented by ellipses aligned with the coordinate axes, the MindReader system proposed a distance function that is not necessarily


aligned with the coordinate axes. Therefore, it allows for correlations between attributes in

    addition to different weights on each component.

A further improvement over the MindReader approach is given in [17]. In this approach, optimal query estimation and weighting functions are derived in a unified framework. Based on the minimization of the total distance of the positive examples from the revised query, the weighted average and a whitening transform in the feature space were found to

be the optimal solutions. In more detail, assume that a query vector component q_i corresponds to the ith feature, that an N-element vector r = [r_1, ..., r_N] represents the degree of relevance of each of the N input training samples, and that there is a set of N training vectors x_{ni} for each feature i. It is derived that the ideal query vector q_i for feature i is the weighted average of the training samples for feature i, given by

q_i^T = \frac{r^T X_i}{\sum_{n=1}^{N} r_n},   (5)

where X_i is the N \times K_i training sample matrix for feature i, obtained by stacking the N training vectors x_{ni} into a matrix. It is interesting to note that the original query vector q_i does not appear in (5). This shows that the ideal query vector with respect to the feedbacks is not influenced by the initial query.

    The optimal weight matrix Wi is given by

W_i = \det(C_i)^{1/K_i} C_i^{-1},   (6)

    where Ci is the weighted covariance matrix of Xi . That is,

C_{i,rs} = \frac{\sum_{n=1}^{N} r_n (x_{nir} - q_{ir})(x_{nis} - q_{is})}{\sum_{n=1}^{N} r_n}, \quad r, s = 1, \ldots, K_i.   (7)

We can see from the above equations that the critical inputs to the system are the training vectors x_{ni} and the relevance vector r. In this algorithm, the user initially needs to input these data into the system. Another issue with this algorithm is that negative examples are not utilized in updating the query and the similarity measure.
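Equation (5) is just a relevance-weighted average of the feedback samples, which can be sketched directly; a minimal illustration with toy vectors:

```python
# Sketch of the optimal query estimate of Eq. (5): the ideal query for
# feature i is the relevance-weighted average of the training vectors,
# independent of the initial query.

def optimal_query(training_vectors, relevance):
    """training_vectors: N feedback sample vectors (rows of X_i);
    relevance: the N degrees of relevance r_n.
    Returns q_i = (r^T X_i) / sum(r_n)."""
    total = sum(relevance)
    dim = len(training_vectors[0])
    return [
        sum(r * x[j] for r, x in zip(relevance, training_vectors)) / total
        for j in range(dim)
    ]

# Two feedback samples, the first judged twice as relevant as the second.
q = optimal_query([[1.0, 0.0], [4.0, 3.0]], relevance=[2.0, 1.0])
# → [2.0, 1.0]
```

Note that, as the text observes, the initial query never enters the computation; only the feedback samples and their relevance degrees do.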

    2.2. Relevance feedback as a learning process

Relevance feedback can be considered a learning problem: a user provides feedback examples from the retrieval results of a query, and the system learns from such examples to refine the retrieval results. The original query-movement method represented by Rocchio's formula and the re-weighting method [16] are both simple learning methods. According

to Mitchell's [15] definition, machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. In this view, any task that can be improved with respect to a certain performance measure based on some experience can be considered a machine-learning task. In CBIR, relevance feedback is

    a task to improve the retrieval performance and the experience here is feedback examples

    provided by the users. Hence, classical machine-learning methods, such as decision tree


    learning [13], artificial neural networks [10], Bayesian learning [5,27], and kernel based

    learning [26] can be and have been applied to relevance feedbacks in CBIR. However, as

    users are usually reluctant to provide a large number of feedback examples, the number of

training samples is very small, typically fewer than ten in each round of a feedback session. In contrast, feature dimensions in CBIR systems are usually high. Hence, the crucial issue in performing relevance feedback in CBIR systems is how to learn from small training samples in a very high-dimensional feature space. This fact makes many learning methods, such as decision tree learning and artificial neural networks, unsuitable for CBIR.

The key issues in addressing relevance feedback in CBIR as a small-sample learning problem include: how to learn quickly from small sets of feedback samples to improve retrieval accuracy effectively; how to accumulate the knowledge learned from feedback; and how to integrate low-level visual and high-level semantic features in queries. However, most

of the published work has focused on the first issue. Compared with other learning methods, Bayesian learning shows advantages in addressing the first issue, and almost all aspects of Bayesian learning have been explored in the search for effective learning algorithms.

    Vasconcelos and Lippman [27] treated feature distribution as a Gaussian mixture and

    used Bayesian inference for learning during feedback iterations in a query session. Richer

    information captured by the mixture model also makes image regional matching possible.

The potential problems of their method are computing efficiency and a complex data model that requires too many parameters to be estimated from very limited samples.

To speed up the learning process so that the retrieval result converges faster to the user's satisfaction, active learning methods have been used to actively select samples in order to achieve the maximal information gain, or the minimum entropy/uncertainty in decision-making. The approach proposed in [5] used Monte Carlo sampling to search for the set of samples that minimizes the expected number of future iterations. In estimating

    the expected number of future iterations, entropy is used as an estimate of the number of

    future iterations under the ambiguity specified by the current probability distribution of the

target image over all test images. Tong and Chang [26] proposed an SVM active learning

algorithm to select the samples that maximally reduce the size of the version space in which

    the class boundary lies. Without knowing a priori the class of a candidate, the best strategy

    is to halve the search space each time. They attempted to justify that selecting the points

    near the SVM boundary can approximately achieve this goal, and it is more efficient than

    other more sophisticated schemes, which require exhaustive trials on all the test items.

Therefore, in their work, the points near the SVM boundary are used to approximate the most-informative points, and the most-positive images are chosen as the ones farthest from the boundary on the positive side of the feature space.

Some researchers consider the relevance feedback process in CBIR as a pattern recognition

or classification problem. Under such a view, the positive and negative examples provided by the user can be treated as training examples with which a classifier is trained. Such a classifier can then separate the whole data set into relevant and irrelevant groups. It seems that many existing pattern recognition tools could be adopted for this task, and many kinds of classifiers have been experimented with, such as linear classifiers [29], nearest-neighbor classifiers [28], Bayesian classifiers [24], support vector machines (SVM) [26], and so on. In


this category, the most popular algorithm is represented by [26], where an SVM classifier is trained to divide the positive and negative examples. The SVM classifier then classifies all images in the database into two groups: those relevant and those irrelevant to a given query. However, in most cases of CBIR, there is no predefined class structure. From an application point of view, such classification-based methods may improve retrieval performance in some constrained contexts, but they will be limited when applied to general-purpose image databases.
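The classification view above can be sketched with a toy linear classifier; a perceptron stands in here for the SVM of [26] (an assumption made for brevity; both learn a linear decision boundary), and the one-dimensional "brightness" feature is invented purely for illustration:

```python
# Sketch of the classification view of relevance feedback: the user's
# positive/negative examples train a classifier, which then splits the
# database into relevant and irrelevant groups. A perceptron is used as
# a stand-in for the SVM; the toy feature values are illustrative.

def train_perceptron(samples, labels, epochs=50):
    """labels are +1 (relevant) / -1 (irrelevant)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Update on every sample that is misclassified (or on the margin).
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def classify(w, b, x):
    """+1 = relevant group, -1 = irrelevant group."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Feedback: bright images marked relevant, dark ones irrelevant.
w, b = train_perceptron([[0.9], [0.8], [0.1], [0.2]], [1, 1, -1, -1])
```

After training, `classify` can be run over every image in the database, which is exactly the two-group split described above; the lack of a predefined class structure is what limits this in general-purpose databases.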

    2.3. Feature versus semantics in relevance feedback

All the approaches described above perform relevance feedback at the low-level feature vector level, basically replacing keywords with features when adopting the vector space model developed for document retrieval. While these approaches do improve the performance of CBIR, there are severe limitations. The inherent problem is that the low-level

    features are often not as powerful in representing complete semantic content of images as

keywords in representing text documents. Furthermore, users often pay more attention to the semantic content (or a certain object/region) of an image than to the background and other details; as a result, the feedback images may be similar only partially in semantic content but may vary largely in low-level features. Hence, using low-level features alone may not be effective in representing users' feedback and in describing their intentions.

    In addition, there are typically two different modes of user interactions involved in image

retrieval systems. In one case, the user types in a list of keywords representing the semantic contents of the desired images. In the other case, the user provides a set of example images as the input and the retrieval system retrieves other similar images. In most

    image retrieval systems, these two modes of interaction are mutually exclusive. However,

combining these two approaches and allowing them to benefit from each other yields a great deal of advantage in terms of both retrieval accuracy and ease of use of the system.

    There have been efforts on incorporating semantics in relevance feedback for image

    retrieval. The framework proposed in [11] (to be discussed later in more detail in this

section) attempted to embed semantic information into a low-level feature-based image retrieval process using a correlation matrix. The FourEyes system by Minka and Picard [14] and the PicHunter system by Cox et al. [5] made use of hidden annotation through a learning process. However, they excluded the possibility of benefiting from good annotations, which may lead to very slow convergence.

    In terms of feature selection, unlike most CBIR systems that use image features such

    as color histogram or moments, texture, shape, and structure features, Tieu and Viola [25]

    used a boosting technique to learn a classification function in a feature space of more than

    45,000 features. The features were demonstrated to be sparse with high kurtosis, and were

argued to be expressive for high-level semantic concepts. Weak two-class classifiers were formulated based on a Gaussian assumption for both the positive and negative (randomly

    chosen) examples along each feature component, independently. The strong classifier is

    then a weighted sum of the weak classifiers as in AdaBoost.


    The framework to be discussed in Section 3 integrates both semantics and low-level

features into the relevance feedback process in a new way. Only when semantic information is unavailable does the method reduce to one of the previously described low-level feedback approaches, as a special case.

    2.4. Relevance feedback with memory

A disadvantage of classic relevance feedback, as well as of many of the learning-based approaches discussed above, is that the knowledge captured in the relevance feedback process in one query session or one learning step is not memorized to continuously improve retrieval accuracy. That is, even with the same query, a user will have to go through the same, often tedious, feedback process to obtain the same result, despite the fact that the user has given the same query and feedback before. Strictly speaking, there is no

    learning or only limited learning in such systems as there is no knowledge accumulation

across different query sessions. To overcome these limitations, another school of thought is to use learning approaches to memorize users' subjectivity in the relevance feedback process. The challenge in this approach is how to memorize the knowledge learned and how to handle the inconsistency of content subjectivity across different users and/or across different

    query sessions of the same user.

The approach proposed in [11] was the first attempt to explicitly memorize learned semantic information to improve CBIR performance. The basic idea of this approach is to accumulate the semantic relevance between image clusters learned from users' feedback in a correlation network. In other words, a correlation network is used as the memory. Figure 1 illustrates the correlation network. Mathematically, the correlation network is represented by a correlation matrix, M, defined as below:

M = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ w_{N1} & w_{N2} & \cdots & w_{NN} \end{pmatrix},   (8)

where the weight or coefficient w_{ij} represents the semantic correlation between the images in clusters i and j.

The system works as follows. First, all images in a database are clustered into N clusters based on visual feature similarity using, for instance, the k-means algorithm. Obviously, the images in each cluster are initially similar only in terms of the selected visual features, as in a typical CBIR system. Also, initially, all correlation coefficients between any two clusters are set to zero, meaning that only images within the same cluster are correlated and images across clusters are uncorrelated. That is, the initial matrix is the identity,

M_0 = I_{N \times N}.   (9)

    Then, for a given query, the initial retrieval is based on visual features. Assume that after

    a given iteration, n + m images are displayed, and n images are marked relevant and


    Figure 1. Correlation network to memorize semantic correlations between image groups.

m irrelevant. The relevant as well as irrelevant images may or may not come from different clusters. This approach memorizes such feedback by updating the correlation matrix as below:

M_t = M_{t-1} + \sum_{i=1}^{n} F(q) F(p_i)^T - \sum_{i=1}^{m} F(q) F(n_i)^T,   (10)

where q is the feature vector of the query, p_i and n_i are the feature vectors of the positive and negative feedback samples, and F(x) is a transform function used to determine the update magnitude based on the feedback samples. In this way, the correlation between the cluster in which the query originally falls and those in which the positive samples fall is increased, progressively embedding information on the semantic correlations between images. This correlation is then used in subsequent retrievals, in which not only the visual features but also the semantic correlations are used in determining the similarity of an image to the query.

    Experiments have shown that such a progressive learning approach effectively utilizes the

    knowledge learnt from previous queries to reduce the number of iterations to achieve high

    retrieval accuracy [11].
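The update of equation (10) can be sketched as follows; a minimal illustration that takes F to map an image to the one-hot indicator of its cluster (so each feedback adjusts a single matrix entry), with an illustrative step size and a symmetrized update, both of which are assumptions rather than the paper's exact choices:

```python
# Sketch of the correlation-network update of Eq. (10). Clusters are
# indexed 0..N-1. F is taken as the one-hot cluster indicator, so each
# outer product F(q)F(p)^T touches one entry; the step size and the
# symmetric update are illustrative assumptions.

def identity_matrix(n):
    """Eq. (9): the initial correlation matrix M0 = I."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def update_correlation(M, query_cluster, positive_clusters, negative_clusters, step=0.1):
    """Strengthen correlation between the query's cluster and the clusters
    of positive samples; weaken it for the clusters of negative samples."""
    for c in positive_clusters:
        M[query_cluster][c] += step
        M[c][query_cluster] += step   # keep M symmetric (an assumption)
    for c in negative_clusters:
        M[query_cluster][c] -= step
        M[c][query_cluster] -= step
    return M

M = identity_matrix(3)
M = update_correlation(M, query_cluster=0, positive_clusters=[1], negative_clusters=[2])
# M[0][1] has grown, M[0][2] has shrunk; the diagonal is untouched.
```

Repeated over many sessions, the off-diagonal entries accumulate exactly the cross-cluster semantic correlations the text describes.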


Also, if there are two distinct groups in one initial cluster that are semantically dissimilar, meaning that they are negative examples to each other, a splitting is performed to split the initial cluster into two clusters. On the other hand, when feedback indicates that two clusters are close in feature space and have a high correlation between them according to M, the two initial clusters can be merged into one. That is, the correlation network dynamically updates its structure, in addition to updating the correlation matrix, as it learns from user feedback.

    2.5. Log mining in relevance feedback

More recently, researchers have become aware of the fact that the Web is a rich resource of image data and that some of the images' semantics are usually available in the same web documents. Shen et al. [22] exploit this reality and use natural language processing techniques to obtain semantic features from the web text to characterize the web images. Hence, they are able to find relevant images on the web using text-based queries. In our work on a web image search

engine, we also use web pages as potential sources of semantics. There are two main differences between the two systems. The first difference is in the natural language processing approach to obtaining semantic features. They use a so-called weighted chain-net, which

    is actually a lexical chain, to represent the document space model for images, while our

    document space model of all media objects is simply a vector space model, which is an

effective approach and has been widely used in traditional information retrieval. Other natural language processing methods, such as proper noun identification, are also used to extract semantic features. The other difference is that our system exploits relevance feedback and data mining on the users' feedback logs to update the document space model. As a result, our approach outperforms traditional CBIR systems and relevance feedback approaches.
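The vector space document model mentioned above can be sketched as follows; a minimal illustration in which term weights are raw counts and the terms gathered from the page are hypothetical (the actual system's weighting of fields such as ALT text or page titles is not specified here):

```python
# Sketch of a vector space document model for web images: each image is
# represented by a term-weight vector built from text on its hosting page,
# and text queries are matched by cosine similarity. Raw term counts are
# used as weights for simplicity; the example terms are invented.

import math

def make_vector(terms):
    """Build a sparse term-count vector from a list of terms."""
    vec = {}
    for t in terms:
        vec[t] = vec.get(t, 0.0) + 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Terms gathered from the filename, ALT text and surrounding text of one image.
doc = make_vector(["tiger", "zoo", "tiger", "photo"])
query = make_vector(["tiger"])
score = cosine(query, doc)   # high for pages that mention the query term often
```

Relevance feedback and log mining would then adjust the entries of `doc` over time, which is the "updating the document space model" step described above.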

    3. An integrated relevance feedback framework

As discussed in Section 2, an effective relevance feedback system should learn effectively from small sets of feedback samples, accumulate the knowledge learned, and integrate low-level visual and high-level semantic features in queries and feedback in order to achieve high retrieval accuracy.

    In addition, there typically are two different modes of user interactions involved in image

    retrieval systems. In one case, the user types in a list of keywords representing the semantic

contents of the desired images. In the other case, the user provides a set of example images as the input and the retrieval system will try to retrieve other similar images. In most image retrieval systems, these two modes of interaction are mutually exclusive. We argue that combining these two approaches and allowing them to benefit from each other yields a great

    deal of advantage in terms of both retrieval accuracy and ease of use of the system.

To address all of the above-mentioned issues, a CBIR framework with integrated relevance feedback and query expansion was proposed [9,12,23,24]. Figure 2 illustrates the proposed CBIR framework. It consists of a semantic network which links images to semantic annotations in a database, a similarity measure that integrates both semantic features and


    Figure 2. The proposed framework of integrated relevance feedback and query expansion.

    Figure 3. Semantic network.

image features, and a machine learning algorithm to iteratively update the semantic network and to improve the system's performance over time. The system supports both query by keyword and query by image example through the semantic network and low-level feature indexing. More importantly, the learning process propagates the keyword annotations from the labeled images to unlabeled ones during the feedback. In this way, more and more images are implicitly labeled with keywords by the semantic propagation process. This annotation propagation process also helps the system accumulate learned knowledge to improve the performance of future retrieval requests.

    3.1. Semantic network

The semantic network is a two-layered structure. The top layer is represented by a set of keywords having links to the images in the database. It can be considered an extension of the initial information embedding idea in the system shown in Figure 1. The degree of relevance of a keyword to the associated image's semantic content is represented as the weight on the link, as shown pictorially in Figure 3. This layer is what we need in


keyword relevance feedback and will be updated during the semantic propagation. The bottom layer is a keyword thesaurus that constructs the connections between different keywords.

The initial weights can be obtained by manual labeling. In our web image search engine, they are initially extracted from the following sources on the web page that contains the image, according to some empirical rules.

1. Image filename and URL. We assume that web page authors/editors usually assign meaningful filenames to images in a web page. Some heuristic rules are used to extract the keywords from the filenames. First, the filename is segmented into meaningful keywords based on a pre-defined dictionary. For example, the filename redflower.jpg includes two semantic words: "red" and "flower". Then, the clutter characters in filenames, such as digits, hyphens, filename extensions, etc., are discarded. We also extract semantic keywords from the URL of the image file. The URL usually represents the hierarchy information of an image on the web page. For instance, "animal" and "bird" are useful information in the URL http://www.ditto.com/images/animals/anim_birds.jpg. We apply a similar technique to the filename segmentation to segment the URL into meaningful pieces.

2. ALT (alternate) text. The ALT text in a web page is displayed in place of the associated image in a text-based browser. Hence, it usually represents the semantics of the image concisely and is a very relevant feature for representing the semantic meaning of the image.

3. Surrounding text. In web pages, images are used to enhance the content that the editors want to present. Hence, some text in the surrounding areas is semantically relevant to the content of the image. However, it is difficult to judge which area among the four possible areas (above, below, left, right) is the most relevant to the image. Therefore, in our prototype, all four areas are chosen as the sources of the text features for the image. This feature will be refined by log mining on the users' relevance feedback logs, as discussed in Section 4.

4. Page title. The page title is a good candidate text feature for images in a web page.
5. Other information. Image hyperlinks, anchor text, etc., are also candidates for text features of the images.

The initial value of the weight wij associated with each keyword of an image is calculated by the TF*IDF method [19]. That is, a feature vector is used to represent all the keywords of an image, and the vector is defined as

    Dih = TFi · IDFi = ( ti1 log(N/n1), ..., tij log(N/nj), ..., tim log(N/nm) ),   (11)

where Dih is the feature vector, with each component value corresponding to the initial weight assigned to the association of a keyword with image i. tij stands for the frequency of keyword j appearing in the text description of image i, nj is the number of images that are characterized by keyword j, and N is the total number of images. Of course, if no keyword information is available for an image, the corresponding feature vector is set to null.
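The TF*IDF weighting of Eq. (11) can be sketched directly. The dictionary-based representation and argument names are our own illustrative choices.

```python
import math

def tfidf_vector(term_freqs, doc_freqs, num_images):
    """Initial keyword weights for one image, following Eq. (11):
    component j is tf_ij * log(N / n_j).

    term_freqs: {keyword: frequency in this image's text description}
    doc_freqs:  {keyword: number of images characterized by the keyword}
    num_images: total number of images N
    """
    return {
        kw: tf * math.log(num_images / doc_freqs[kw])
        for kw, tf in term_freqs.items()
    }
```

A keyword that is frequent for this image but rare across the collection (e.g. "flower") receives a larger initial weight than a common one.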


With the semantic network, semantic-based relevance feedback can be performed relatively easily compared to its low-level feature counterpart. This is done by updating the weights wij associated with each link shown in Figure 3. The weight updating process is described below.

1. A user submits a query and the system retrieves similar images using cross-modality query expansion, to be explained in the next subsection.
2. The system collects the positive and negative feedback examples corresponding to the query.
3. For each keyword in the input query, check whether it is already in the keyword database. If not, add it into the database without creating any links.
4. For each positive example, check whether any query keyword is not linked to it. If so, create a link with an initial weight from each missing keyword to this image. For all other query keywords that are already linked to this image, increase the weight by a predefined value or using the method defined by (10) and (11).
5. Similarly, for each negative example, check whether any query keyword is linked to it. If so, decrease its weight, but not below zero.

Through this updating process, the keywords that represent the actual semantic content of each image will receive larger weights. Also, it can easily be seen that as more queries are input into the system, the system is able to expand its vocabulary. Furthermore, a semantic propagation method is used to propagate keywords to unlabeled images during users' feedback iterations, as described later in this section.
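Steps 3-5 above can be sketched as follows. The link representation, the initial weight, and the fixed step size are simplifying assumptions for illustration; the paper's Eqs. (10)-(11) give the more principled update.

```python
# Assumed constants for illustration only.
INITIAL_WEIGHT, STEP = 1.0, 0.2

def update_semantic_network(links, vocabulary, query_keywords,
                            positives, negatives):
    """links: {(keyword, image_id): weight}; vocabulary: set of keywords.
    Mutates both structures per the feedback-update steps and returns links."""
    vocabulary.update(query_keywords)          # step 3: expand the vocabulary
    for img in positives:                      # step 4: reinforce/create links
        for kw in query_keywords:
            if (kw, img) not in links:
                links[(kw, img)] = INITIAL_WEIGHT
            else:
                links[(kw, img)] += STEP
    for img in negatives:                      # step 5: weaken existing links
        for kw in query_keywords:
            if (kw, img) in links:
                links[(kw, img)] = max(0.0, links[(kw, img)] - STEP)
    return links
```

Note that negative examples never create new links: a keyword's weight is only decreased where a link already exists, and is clipped at zero.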

    3.2. Integrated and cross modality query and retrieval

The proposed framework has an integrated relevance feedback scheme in which both low-level feature based and high-level semantic feedbacks are performed. We define a unified metric function G to measure the relevance between a query Q and any image j within an image database in terms of both semantic and low-level feature content, where Q includes the original query and the user's feedback information:

    G(j, Q) = α · simk(j, Qk) + (1 − α) · simf(j, Qf),   (12)

where α ∈ [0, 1] is the weight of the semantic relevance in the overall similarity measure, which can be specified by users. The larger α is, the more important a role semantic relevance plays in the overall similarity measurement. simk(j, Qk) and simf(j, Qf) are the semantic similarity and the low-level feature similarity between image j and the revised query Q, respectively.

The revised query Q consists of two parts: the feature-based one Qf and the semantic (keyword)-based one Qk. Qf is defined by (3)-(5) based on the feature vectors of feedback images. With the semantic network, simk(j, Qk) can be directly computed with the updated weights.
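Ranking by the unified measure of Eq. (12) can be sketched as follows. A sketch only: the dictionary-of-similarities interface and the default α are our assumptions, and both similarity maps are taken to be pre-normalized to [0, 1].

```python
def rank_images(semantic_sims, feature_sims, alpha=0.5):
    """Rank image ids by G(j,Q) = alpha*sim_k(j,Q_k) + (1-alpha)*sim_f(j,Q_f),
    Eq. (12). semantic_sims / feature_sims: {image_id: similarity in [0, 1]};
    an image missing from one map contributes 0 for that modality."""
    g = {
        j: alpha * semantic_sims.get(j, 0.0)
           + (1.0 - alpha) * feature_sims.get(j, 0.0)
        for j in set(semantic_sims) | set(feature_sims)
    }
    return sorted(g, key=g.get, reverse=True)
```

With alpha = 1.0 the ranking is purely semantic; with alpha = 0.0 it is purely feature-based, matching the role of α described above.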

    To further improve the retrieval performance of the proposed framework, a cross-

    modality query expansion method is supported. That is, once a query is submitted in


    3.3. Probabilistic keyword propagation scheme

    As illustrated in Figure 3, the more images are annotated (correctly), the better the system

    retrieval performance will be. However, the reality is human labeling of images is tedious

    and expensive, hence not a feasible solution, which was what motivated CBIR research

    fifteen years ago. To address this issue, a probabilistic progressive keyword propagation

    scheme is proposed in our framework to automatically annotate images in the databases in

    the relevance feedback process utilizing based a small percentage of annotated images.

We assume that initially only a few images in a database have been manually labeled with keywords and that retrieval is performed mainly based on low-level features. As stated before, the initial keyword annotations can be obtained through the crawler when the images come from the Web, or provided by human labeling. While the user is interacting with the system by providing feedbacks in a query session, a progressive learning process is activated to propagate the keyword annotations from the labeled images to unlabeled images, so that more and more images are implicitly labeled with keywords. In this way, the semantic network is updated, in which the keywords with a majority of user consensus emerge as the dominant representation of the semantic content of their associated images. As more queries are input into the system, the system is able to expand its vocabulary. Also, through the propagation process, the keywords that represent the actual semantic content of each image will receive larger weights.

There are two major issues in keyword propagation: which images and which keyword(s) should be propagated during a query session. To answer the first question, a probability model based on Bayesian learning is proposed. We assume that (1) all positive examples in one retrieval session belong to the same semantic class with common semantic object(s) or meaning(s); and (2) the features from the same semantic class follow a Gaussian or mixture-of-Gaussians distribution. Therefore, all positive examples in a query session are used to calculate and update the parameters of the corresponding semantic Gaussian class. Then, the probability of each image in the database belonging to this semantic class is calculated. The common keywords in the positive examples are propagated to the images with a very high probability of belonging to this class.

As we can see, the propagation framework uses the same procedure as the feedback algorithm on low-level features [23]. The only difference is that for low-level feature feedback, the calculated probability is used for ranking an image in the retrieval candidate list, while here it is used to determine whether an image should be in the propagation candidate list. The propagation candidate set S is obtained as follows:

    S = {c1, ..., ck}, where p(cj) > δ,   (14)

where p(cj) is the probability that image j in the database belongs to the semantic class, and δ is a constant threshold that can be estimated by a training process. The weight associated with the propagated keyword i and the image j is wij = p(cj). A more complex distribution model, for example a mixture of Gaussians, may be used in this propagation framework. However, because the user's feedback examples in practice are often very few, a complex model will lead to much larger parameter estimation errors, as there are more parameters to be estimated.
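The candidate selection of Eq. (14) can be sketched as below. This is a simplified reading, not the paper's exact estimator: we fit a diagonal Gaussian to the positive examples and score class membership by the Gaussian density normalized into (0, 1] by its value at the class centre, then keep images whose score exceeds the threshold δ.

```python
import math

def fit_diag_gaussian(examples):
    """Estimate a diagonal Gaussian from positive-example feature vectors."""
    d, n = len(examples[0]), len(examples)
    mean = [sum(x[k] for x in examples) / n for k in range(d)]
    var = [max(sum((x[k] - mean[k]) ** 2 for x in examples) / n, 1e-6)
           for k in range(d)]  # floor the variance to avoid degeneracy
    return mean, var

def log_density(x, mean, var):
    """Log of the diagonal-Gaussian density at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def propagation_candidates(database, positives, threshold):
    """Return {image_id: score} for images whose normalized class-membership
    score exceeds the threshold, i.e. the set S of Eq. (14)."""
    mean, var = fit_diag_gaussian(positives)
    peak = log_density(mean, mean, var)  # density at the class centre
    candidates = {}
    for img_id, x in database.items():
        score = math.exp(log_density(x, mean, var) - peak)  # in (0, 1]
        if score > threshold:
            candidates[img_id] = score
    return candidates
```

The returned score doubles as the propagated weight wij = p(cj) for the common keywords of the positive examples.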


Also, to determine which keyword(s) should be propagated when an image is associated with multiple keywords, there are two approaches: using the relevance factor defined by (13), or using a region-based approach [9]. In the former approach, the relevance factor rij can be directly used to modify the weight of the propagated keyword. Obviously, the lower the relevance of a keyword to an image is, the less weight is assigned to the keyword in the propagation, and vice versa. When the region-based approach is used, the unlabeled images to be propagated are first segmented into regions. By analyzing the feature distribution of the segmented regions, a probabilistic association between each segmented region and the annotated keywords is set up for labeled images by the region-based relevance feedback approach. Then, each keyword of a labeled image is assigned to one or several regions of the image with certain probabilities. The details of the region-based feedback framework are given in [9].

3.4. Experimental results

The image set used in evaluating the proposed framework described in this section is the Corel Image Gallery of 10,000 images, manually labeled into 79 semantic categories. 200 randomly selected images compose the test query set. Whether a retrieved image is correct or incorrect is judged according to the ground truth. Three types of color features and three types of texture features are used in our system. The feedback process runs as follows. Given a query from the test set, a different test image of the same category as the query is used in each round of feedback iteration as the positive example for updating the Gaussian parameters and revising the query. To incorporate negative feedback, the first two irrelevant images are assigned as negative examples. The accuracy is defined as

    Accuracy = (relevant images retrieved in top N returns) / N.   (15)
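The accuracy measure of Eq. (15) is simply the precision of the top-N returns against the category ground truth:

```python
def accuracy(retrieved_top_n, relevant_set):
    """Eq. (15): fraction of the top-N returned images that are
    relevant according to the category ground truth."""
    n = len(retrieved_top_n)
    hits = sum(1 for img in retrieved_top_n if img in relevant_set)
    return hits / n
```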

Several experiments have been performed. First, three feature-based feedback algorithms are compared: the Bayesian feedback scheme by Su et al. [23,24], the scheme of [27], and the scheme of [17] as defined by (5)-(7). This comparison is done in the same feature space. Figure 4 shows that the accuracy of the Bayesian feedback scheme (referred to as "our feedback approach") becomes higher than that of the other two methods after two feedback iterations. This demonstrates that the incorporated Bayesian estimation with the Gaussian parameter-updating scheme is able to improve retrieval effectively.

To demonstrate the performance of the semantic propagation, the following experiment was designed. The 200 images in the query set were annotated with their category names, so only one keyword is associated with each query image, and the other images in the database have no keyword annotations. During the test, each query image was used twice. The retrieval performance is shown in Figure 5, comparing feedback with and without propagation. It is seen that with propagation, the retrieval accuracy is much higher than without it. This is because, when the system has propagation ability, later queries can utilize the accumulated knowledge from previous feedback iterations. In other words, the system has learning ability and becomes smarter with more user interactions.


    Figure 4. Retrieval accuracy for top 100 results in original feature space.

Figure 5. Retrieval accuracy for top 100 results: feedback without propagation vs. feedback with the propagation scheme.

    4. Incorporating log mining in web image search engine

The architecture of our proposed web image search engine is shown in Figure 6. In addition to all the components of a CBIR system, the web search engine contains an image crawler and three other modules, namely the log miner, the model updater, and the query updater [3,4]. The data organization of the system mainly consists of four parts: the image database, which also contains the metadata of images (i.e., low-level and high-level features); the users' relevance feedback log database; the document space model; and the user space model.

    A typical scenario of the system is as follows. The off-line crawler is first employed at

    regular intervals (e.g., once every day at non-peak network traffic hours) to collect potential

    web pages containing images and store them into a local database. The feature extractor is

    then applied to these pages to extract both the low-level visual features and the high-level


    Figure 6. Architecture of the proposed web image search engine.

semantic features for the images appearing in these pages. In our system, the crawler and the feature extractor actually work simultaneously. An image indexer is applied to the images and their features to build the document space model, which is the representation of the images in the database using their features. Once the document space model is available, the matcher compares the user's query with the document space model of images to yield the image retrieval results. Since many irrelevant images may be returned by the retrieval system, a user feedback interface is also provided for users to specify whether a returned image is relevant or not to the user's intent. The image retrieval system can utilize user feedbacks to gain an understanding of the relevancy of certain images and update the query or adjust the matcher to return more accurate retrieval results. The users' feedback log data are also stored in the user log database in the system, from which the log miner


can find and build the user space model through log analysis. The user space model is then combined with the document space model to update the document space model, eliminating the mismatch between the page author's expression and the user's understanding and expectation, which can further improve the retrieval accuracy.

    4.1. Document modeling of images

The document space model in the image search engine combines the low-level visual features and the high-level semantic features to index the images on the web. The detailed process is described as follows.

To collect images on the web, a crawler (or a spider, a program that can automatically analyze web pages and download the related pages hyperlinked from the analyzed web pages) is used to collect images from many web sites. First, we re-arrange the semantic network shown in Figure 3 into a concept hierarchy of image categories, such as animals, architecture, arts, etc. Then, we select some representative sites to be collected for each concept category, for instance, http://www.nba.com for sports, http://www.cnn.com for news, http://www.disney.com for entertainment, etc. For each site candidate, the crawler collects the images and saves them to a local web page database. We then use a simple classifier to classify the images into meaningful and junk (e.g., banners, backgrounds, buttons, icons, etc.) categories based on information such as color histograms, image sizes, image file types, etc.

For each image collected, the initial keywords are assigned in the way described in Section 3.1. In addition, the low-level features of each image are calculated. The keywords and low-level features of all collected images form the document space.

In the image search process, the overall similarity is simply the linear combination of the visual and the textual similarities, as defined in (12). Setting the same default weight α = 0.5 in (12) for all queries to balance the importance of low-level features and high-level features is not ideal; however, it is a very efficient way to build up the baseline configuration of our image retrieval system. The weight α is automatically adjusted to a suitable value by the system through the user's feedback on the relevancy of certain returned images. Moreover, after we collect enough log information of user feedback, data mining technology (which will be presented in the next section) can be applied to find out the importance of low-level features and high-level features for different concepts/categories. For example, we find that for the concept "Clinton", the high-level features are more important than the low-level features, while for the concept "sunshine", the low-level features are more useful than the high-level features.

    4.2. Log mining and feedback

In order to reduce the ambiguity in the text descriptors extracted from web pages and the low-level image features, and to improve the search performance, we have proposed a user space model to supplement the original document space model. This is achieved by applying a user log analysis process. The user space model is also a vector space model. The difference between the user space model and the document space model is that vectors in the user space model are constructed from the information mined from the user feedback log data, not from the original information extracted from the web pages. When a user submits a query, our system returns to the user some images found based on the original document space model. The user can then use the feedback interface to tell the system whether each returned image is relevant or irrelevant to the query, based on his/her subjective judgment. Of course, most users do not have the patience and time to mark all relevant and irrelevant images in the returned image collection. However, this is not a very serious problem because even a small set of feedback images can provide very useful information.

After we get some users' feedback log data, the user space model can be built from the user log. Let Q be the set of all queries issued so far, and let Tj (j = 1, ..., NT) be the individual words that appear in Q. (Note that a single query may contain multiple words.) For a query in Q, Iri denotes one of the relevant images and Iii one of the irrelevant images specified by the user and stored in the user log.

From the user log, we can easily calculate the probabilities listed below:

    P(Iri) = Nri / NQ,   (16)

where Nri is the number of queries for which image Iri has been retrieved and marked as relevant, and NQ is the total number of queries;

    P(Iri | Tj) = Nri(Tj) / NQ(Tj),   (17)

where Nri(Tj) is the number of queries containing word Tj for which image Iri has been retrieved and marked as relevant, and NQ(Tj) is the number of queries that contain Tj; and

    P(Tj) = NQ(Tj) / NQ.   (18)

Based on Bayes' theorem, we have

    P(Tj | Iri) = P(Iri | Tj) P(Tj) / P(Iri).   (19)

In addition, for irrelevant images in the user log, we have

    P(Iii | Tj) = Nii(Tj) / NQ(Tj),   (20)

where Nii(Tj) is the number of queries containing word Tj for which image Iii has been retrieved and marked as irrelevant.

For a given image I, the values P(Tj | I) (j = 1, ..., NT) calculated using (19) form a vector for I. We call this vector the user space model of image I, in contrast to the document space model of image I, which is built from the related features extracted from the web pages.
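Building the user space model from the log, per Eqs. (16)-(19), can be sketched as follows. The log record layout (a list of (query_words, relevant_images) pairs) is an assumption made for illustration.

```python
from collections import defaultdict

def user_space_model(log):
    """Build the user-space vectors P(T_j | I) of Eq. (19) from a feedback
    log given as a list of (query_words, relevant_images) pairs."""
    nq = len(log)                        # N_Q, Eq. (16)/(18) denominator
    nq_t = defaultdict(int)              # N_Q(T_j): queries containing T_j
    nr_i = defaultdict(int)              # N_r_i: queries marking I relevant
    nr_it = defaultdict(int)             # N_r_i(T_j): both at once
    for words, relevant in log:
        for t in set(words):
            nq_t[t] += 1
        for img in relevant:
            nr_i[img] += 1
            for t in set(words):
                nr_it[(img, t)] += 1
    model = defaultdict(dict)
    for (img, t), n in nr_it.items():
        p_i_given_t = n / nq_t[t]                 # Eq. (17)
        p_t = nq_t[t] / nq                        # Eq. (18)
        p_i = nr_i[img] / nq                      # Eq. (16)
        model[img][t] = p_i_given_t * p_t / p_i   # Eq. (19), Bayes' theorem
    return model
```

Note that the Bayes combination algebraically reduces to Nri(Tj)/Nri: the fraction of an image's "relevant" marks that came from queries containing word Tj.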


If we have a large collection of user log data, it is reasonable to say that the information in the user space model is more accurate than the information in the original document space model. However, as we have previously stated, few users like to tag all relevant and irrelevant images in the retrieval result. Hence, the user feedback log is usually insufficient, which makes the user space model not as comprehensive as the original document space model. Therefore, we cannot replace the document space model with the user space model completely. Instead, we integrate the user space model into the original document space model to improve the accuracy of the final document space model.

For each image I, let vector U be its feature in the user space model and vector D its textual feature in the document space model. We simply use a linear combination to integrate these two vectors. We use Dnew to denote the updated document space model, which is calculated as

    Dnew = β·U + (1 − β)·D,   (21)

where β is used to adjust the weight between the user space model and the document space model. Actually, β is the confidence in the vector U of the user space model. In our approach, if the vector in the user space model is accurate and comprehensive enough, we can assign β a value very close to 1.0; if it is not, the value of β should be relatively small. The number of times that an image has been marked in user feedback can be used to determine the value of β for this image. Obviously, if an image is marked in user feedback more times than another image, its feedback information should be more accurate and comprehensive. The confidence in its vector U in the user space model should thus be higher, and we can assign it a larger β.

Since irrelevant images are also recorded in the user feedback log, we can utilize this information as well. For each irrelevant image Iii, we use P(Iii | Tj) as the confidence that Iii is irrelevant to word Tj, and these values form a vector I. We denote by Dfinal the text feature vector of the image in the final document space model and calculate it, similarly to the TF*IDF method, using

    Dfinal = Dnew · (1 − I),   (22)

where the product is taken component-wise.
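The two-stage update of Eqs. (21)-(22) can be sketched in a few lines. The component-wise product in (22) is our reading of the original; vectors are plain lists of equal length, and β is the per-image confidence in the user space model.

```python
def update_document_vector(d, u, irr, beta):
    """Update a textual document-space vector per Eqs. (21)-(22):
    D_new = beta*U + (1-beta)*D, then D_final = D_new * (1 - I)
    component-wise, where irr holds the irrelevance confidences
    P(I_i | T_j) per word."""
    d_new = [beta * uj + (1.0 - beta) * dj for uj, dj in zip(u, d)]
    return [dn * (1.0 - ij) for dn, ij in zip(d_new, irr)]
```

A word with high irrelevance confidence is thus suppressed in the final vector even if either source model weighted it strongly.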

    4.3. Experiments

Based on the proposed architecture, a demo image search engine called iFind has been developed at Microsoft Research Asia. Its graphical interface is shown in Figure 7. The search options that iFind supports include:

• Keyword-based search. One can type one or more keywords, such as "girl", into the textbox and start the retrieval. Some images are then displayed over several pages in the browse mode.


    Figure 7. iFind user interface.

• Query by example. If the "Similar" hyperlink under an image is selected, the system retrieves images that are semantically/visually similar to the example image.

• Relevance feedback. The system improves the retrieval performance after the user provides some positive and/or negative examples. One can expect much better results after several iterations of feedback.

• Log mining. The retrieval performance of the system is greatly improved after the off-line log mining process. Each user benefits from other users' usage.

To illustrate the improvement brought by log mining in image search, we show here some evaluation results based on three system configurations: (1) the baseline system, which provides only query and retrieval; (2) the feedback system, which provides user feedback as well as the baseline functionality; and (3) the full configuration, including user log mining.

In our experiments, we selected more than 2000 representative image websites. The intelligent crawler was used to collect the images from these hyperlinks. All related semantic features, including image filenames, ALT texts, surrounding texts, and page titles, as well as the low-level visual features, were extracted using the feature extractor at the same time. The images are stored in the database and indexed with their textual and visual features. In total, we collected more than 30,000 images from these websites. It is difficult for us to calculate the recall of the system because it is a tedious job to browse the entire image database and specify the ground truth manually. Therefore, we chose only 17 queries


Figure 8. The average precision-recall curve of the system's retrieval performance for all queries.

to demonstrate the performance of the system. The recall is roughly estimated after scanning the top 1000 images returned for each query. The selected queries are: Clinton, Jordan, car, flower, tree, cat, submarine, mars, spring, galaxy, movie star, potato, ship, space, tomb raider, female, and mountain. Figure 8 shows the average precision-recall curve for all queries.

Although the feedback from a single user is limited in our experiments, multiple users' feedbacks are accumulated and stored in the user log. The user space model is constructed from the user log and used to improve the document space model and, in turn, the retrieval performance. The performance of the system with log mining applied is represented by the dash-dotted line in Figure 8. As we can see from the figure, log mining improves the precision not only when the recall is low, but also when the recall is high. In other words, the overall performance of the system is improved by log mining.

    5. Conclusions

In this paper, we have discussed in detail relevance feedback technologies in content-based image retrieval systems. The key issues and representative algorithms of relevance feedback in CBIR are reviewed. We have presented a framework of integrated relevance feedback and semantic learning in content-based retrieval. Our method utilizes both the semantic and the low-level feature properties of every feedback image to refine the retrieval, while at the same time learning semantic annotations for each image. While the user is interacting with the system by providing feedbacks in a query session, a progressive learning process is activated to propagate the keyword annotations from the labeled images to unlabeled images, so that more and more images are implicitly labeled with keywords at certain probabilities. This semantic propagation process improves the performance of future retrieval, whether querying by image example or by keywords. Furthermore, we extended


the framework to a web image search engine by incorporating user log mining to refine search accuracy. This new framework makes the image retrieval system superior to both classical CBIR and text-based systems.

Publisher's note

This article is based on the original conference paper published by Kluwer Academic Publishers in Visual and Multimedia Information Management, edited by Xiaofang Zhou and Pearl Pu. ISBN: 1-4020-7060-8. © 2002 by the International Federation for Information Processing.

    References

[1] C. Buckley and G. Salton, Optimization of relevance feedback weights, in Proceedings of SIGIR'95, 1995.

    [2] S. K. Chang, C. W. Yan, D. C. Dimitroff, and T. Arndt, An intelligent image database system, IEEE

    Transactions on Software Engineering 14(5), 1988.

    [3] Z. Chen, W. Liu, C. Hu, M. Li, and H. J. Zhang, iFind: A web image search engine, in Proceedings of

    SIGIR2001, 2001.

    [4] Z. Chen, W. Liu, F. Zhang, M. Li, and H. J. Zhang, Web mining for web image retrieval, Journal of the

    American Society for Information Science and Technology 52(10), August 2001, 831839.

    [5] I. J. Cox, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, The Bayesian image retrieval system,

    PicHunter: Theory, implementation, and psychophysical experiments, IEEE Transactions on Image

    Processing, Special Issue on Digital Libraries, 2000.

    [6] M. Flickner, H. Sawhney, W. Niblack et al., Query by image and video content: The QBIC system, IEEE

    Computer Magazine 28, 1995, 2332.[7] J. Huang, S. R. Kumar, and M. Metra, Combining supervised learning with color correlograms for content-

    based image retrieval, in Proceedings of ACM Multimedia95, November 1997, pp. 325334.

    [8] Y. Ishikawa, R. Subramanya, and C. Faloutsos, Mindreader: Query databases through multiple examples,

    in Proceedings of the 24th VLDB Conference, New York, 1998.

    [9] F. Jing, M. Li, H. J. Zhang, and B. Zhang, An effective region-based image retrieval framework, in

    Proceedings of ACM Multimedia 2002, Juan-les-Pins, France, December 16, 2002.

    [10] J. Laaksonen, M. Koskela, and E. Oja, PicSOM: Self-organizing maps for content-based image retrieval,

    in Proceedings of International Joint Conference on NN, July 1999.

    [11] C. Lee, W. Y. Ma, and H. J. Zhang, Information embedding based on users relevance feedback for image

    retrieval, in Proceedings of SPIE International Conference on Multimedia Storage and Archiving Sys-

    tems IV, Boston, 1922 September 1999.

    [12] Y. Lu et al., A unified framework for semantics and feature based relevance feedback in image retrieval

    systems, in Proceedings of ACM MM2000, 2000.

    [13] S. D. MacArthur, C. E. Brodley, and C.-R. Shyu, Relevance feedback decision trees in content-based image

    retrieval, in IEEE Workshop on Content-Based Access of Image and Video Libraries, 2000, pp. 6872.[14] T. Minka and R. Picard, Interactive learning using a Society of Models, Pattern Recognition 30(4), 1997.

    [15] T. Mitchell, Machine Learning, McGraw-Hill, 1997.

    [16] J. J. Rocchio Jr., Relevance feedback in information retrieval, in The SMART Retrieval System: Experi-

    ments in Automatic Document Processing, ed. G. Salton, Prentice-Hall, 1971, pp. 313323.

    [17] Y. Rui and T. S. Huang, A novel relevance feedback technique in image retrieval, in Proceedings of 7th

    ACM Conference on Multimedia, 1999.

    [18] Y. Rui, T. S. Huang, and S. Mehrotra, Content-based image retrieval with relevance feedback in MARS,

    in Proceedings of IEEE International Conference on Image Processing, 1997.

  • 8/8/2019 CBIR06

    25/25

    RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 155

    [19] G. Salton, Automatic Text Processing, Addison-Wesley, Reading, MA, 1989.

    [20] G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.

    [21] S. Sclaroff, L. Taycher, and M. L. Cascia, ImageRover: a content-based image browser for the World Wide

    Web, Technical Report 97-005, Boston University CS Dept., 1997.

    [22] H. T. Shen, B. C. Ooi, and K. L. Tan, Giving meanings to WWW images, in Proceedings of ACM

    MM2000, 2000, pp. 3948.

    [23] Z. Su, S. Li, and H. J. Zhang, Extraction of feature subspaces for content-based retrieval using relevance

    feedback, in ACM Multimedia 2001, Ottawa, Canada, 2001.

    [24] Z. Su, H. J. Zhang, and S. Ma, Relevant feedback using a Bayesian classifier in content-based image

    retrieval, in SPIE Electronic Imaging 2001, San Jose, CA, January 2001.

    [25] K. Tieu and P. Viola, Boosting image retrieval, in IEEE Conference on Computer Vision and Pattern

    Recognition, 2000.

    [26] S. Tong and E. Chang, Support vector machine active leaning for image retrieval, in ACM Multimedia

    2001, Ottawa, Canada, 2001.

    [27] N. Vasconcelos and A. Lippman, Learning from user feedback in image retrieval systems, in NIPS99,

    Denver, CO, 1999.

    [28] P. Wu and B. S. Manjunath, Adaptive nearest neighbour search for relevance feedback in large image

    database, in ACM Multimedia Conference, Ottawa, Canada, 2001.

    [29] Y. Wu, Q. Tian, and T. S. Huang, Discriminant EM algorithm with application to image retrieval, in IEEE

    CVPR, South Carolina, 2000.

    [30] H. J. Zhang and D. Zhong, A scheme for visual feature based image indexing, in Proceedings of

    IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases III, 1995, pp. 3646.