CBIR06
Transcript of CBIR06
-
8/8/2019 CBIR06
1/25
World Wide Web: Internet and Web Information Systems, 6, 131–155, 2003
© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.
Relevance Feedback and Learning in Content-Based
Image Search
HONGJIANG ZHANG, ZHENG CHEN, MINGJING LI and ZHONG SU [email protected]
Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, China
Abstract
A major bottleneck in content-based image retrieval (CBIR) systems or search engines is the large gap between
low-level image features used to index images and high-level semantic contents of images. One solution to this
bottleneck is to apply relevance feedback to refine the query or similarity measures in image search process.
In this paper, we first address the key issues involved in relevance feedback for CBIR systems and present a
brief overview of a set of commonly used relevance feedback algorithms. We then present a framework of
relevance feedback and semantic learning in CBIR into which almost all of the previously proposed methods fall.
In this framework, low-level features and keyword annotations are integrated in the image retrieval and
feedback processes to improve the retrieval performance. We have also extended the framework to a content-based
web image search engine in which hosting web pages are used to collect relevant annotations for images and
users' feedback logs are used to refine annotations. A prototype system has been developed to evaluate our proposed
schemes, and our experimental results indicate that our approach outperforms traditional CBIR systems and
relevance feedback approaches.
Keywords: image retrieval, relevance feedback, machine learning, web mining
1. Introduction
The popularity of digital images is rapidly increasing due to improving digital imaging
technologies and convenient availability facilitated by the Internet. However, how to find
user-intended images from the Internet is still non-trivial. The main reason is that web
images are usually not well annotated with semantic descriptors. The development history
of image retrieval systems features two stages. The first stage is keyword-based image
retrieval, which is summarized by Chang et al. [2]. Since manual image annotation is
a tedious process, it is practically impossible to annotate all the images on the Internet.
Furthermore, due to the multiplicity of contents in a single image and the subjectivity of
human perception, it is also difficult for different users to make exactly the same annotations to the same image. These difficulties have limited the applications of the keyword-
based image retrieval technology. Actively researched in the last decade
[6,30], content-based image retrieval (CBIR) attempts to automate the process of indexing
or annotating images in image databases. CBIR approaches work with descriptions based
on inherent properties of images, such as color, texture and shape. However, despite all
This paper is based on the invited keynote that the first author gave at VDB2002, Brisbane, Australia, May 2002.
the research efforts, the retrieval accuracy of today's CBIR algorithms is still very limited.
In addition to many other difficulties, the bottleneck is the gap between low-level image
features and semantic image contents. This problem stems from the fact that visual similarity measures, such as color histograms, do not necessarily match the semantics
of images as perceived by humans. Also, each type of visual feature tends to capture only one
aspect of an image's properties, and it is usually hard for a user to specify clearly how different
aspects are combined to form an optimal query. To make the problem even worse, people
often have different semantic interpretations of the same image. Even the same person
may have different perception about the same image at different times. To address this
bottleneck, interactive relevance feedback techniques have been proposed. The key idea
is to incorporate human perception subjectivity into the retrieval process and
provide users with opportunities to evaluate retrieval results, automatically refining queries on
the basis of those evaluations. In the last few years, this research topic has become a
focus of the CBIR research community.
Relevance feedback, originally developed for textual document retrieval [16], is a super-
vised active learning technique used to improve the effectiveness of information systems.
The main idea is to use positive and negative examples from the user to improve system
performance. For a given query, the system first retrieves a list of ranked images according
to a predefined similarity metrics. Then, the user marks the retrieved images as relevant
(positive examples) to the query or not (negative examples). The system will refine the
query based on the feedback and retrieves a new list of images and presents to user. Hence,
the key issue in relevance feedback is how to incorporate positive and negative examples
to refine the query and/or to adjust the similarity measure.
In this paper, we present a content-based image retrieval framework that integrates low-
level and semantic-based image similarities and supports automated annotation through learning from relevance feedback, as well as the extension of the framework in a web image
search engine. Instead of giving a detailed description of the novel component algorithms, we focus
our description on the key ideas in the framework. Details of the algorithms and the
framework implementation can be found in the references [4,9,12,23,24]. Also, as we want
the paper to serve as a reference on the current state of the art of CBIR relevance feedback research, a comprehensive survey is presented in this paper on relevance feedback
algorithms in terms of their natures and limitations.
There are many issues in relevance feedback approaches for CBIR, such as learning
schemes, feature selection, index structure and scalability. Instead of giving an exhaustive survey of each published relevance feedback algorithm for CBIR in terms of its
advantages and limitations, we focus our discussions with the consideration that relevance
feedback in CBIR is a small sample machine learning problem, and we extend our description in detail with respect to the learning and searching nature of each algorithm. This is presented
in Section 2.
In Section 3, we present the integrated relevance feedback framework
for CBIR. In this framework, while the user is interacting with the system by providing
feedback in a query session, a progressive learning process is activated to propagate the
keyword annotations from the labeled images to unlabeled images as the system refines
the retrieval. The knowledge learned in the relevance feedback sessions is accumulated
in a semantic network. In addition, a cross-modality query expansion scheme is implemented to improve the retrieval performance significantly, whether a query is initiated with
a keyword or with an example image.
The proposed framework has been further extended to a web image search system, as
presented in Section 4. In this extension, we combine visual features and text descriptors
initially extracted from web pages where images exist, such as image URLs, filenames,
page titles, ALT text, hyperlinks, and surrounding text. These visual and textual features
build the document space model of images. However, the initial text descriptors are in
general less accurate than manually annotated text, and there is often a mismatch between the
page author's expression and the user's understanding and expectation of the annotation.
To overcome these problems, we apply the proposed relevance feedback framework. Furthermore, data mining technology is also applied to the users' feedback logs to improve
the image retrieval performance in two respects. First, the original document space model
built from the images and the text content of the web pages can be analyzed to detect and
remove clutter and irrelevant text information. Second, the user space model, which consists of the
keyword vectors used by the users to represent images in the database, can be constructed
from the users' relevance feedback log data. The user space model is then combined
with the document space model to eliminate the mismatch between the page author's expression and the user's understanding and expectation.
2. Relevance feedback algorithms
In this section, we review a set of relevance feedback approaches used in CBIR. The review
is focused on the learning and searching nature of each relevance feedback algorithm, as we
consider that relevance feedback in CBIR is a machine learning problem. We begin the
discussion by first providing an overview of classical relevance feedback approaches in
CBIR.
2.1. Classical algorithms
The early relevance feedback schemes for CBIR were mainly adopted from those developed
for classical textual document retrieval. These approaches can be classified into two categories: query point movement (query refinement) and reweighting (similarity measure
refinement) [1]. Both of them were developed based on the vector space model, the most
popular model used in information retrieval [20].
The query point movement method essentially tries to improve the estimate of the ideal
query point by moving it towards positive example points and away from bad example
points in the query space. There are various ways to update the query. The frequently used
technique to iteratively improve this estimation is the Rocchios formula given below for
sets of relevant documents DR and non-relevant documents DN given by the user [16]:
Q' = \alpha Q + \beta \frac{1}{N_R} \sum_{i \in D_R} D_i - \gamma \frac{1}{N_N} \sum_{i \in D_N} D_i,  (1)
where \alpha, \beta, and \gamma are suitable constants; N_R and N_N are the numbers of documents in
D_R and D_N, respectively. This technique is also referred to as learning the query vector. It was
implemented in the MARS system [18] by replacing the document vector with visual feature vectors. Experiments show that retrieval performance can be improved considerably
by using such relevance feedback approaches.
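As a concrete illustration, Rocchio's update in equation (1) can be sketched as follows; the function name and the particular constants \alpha = 1, \beta = 0.75, \gamma = 0.25 are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def rocchio_update(q, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """One Rocchio iteration (equation (1)): move the query vector toward
    the mean of the positive examples and away from the mean of the
    negative examples."""
    q_new = alpha * np.asarray(q, dtype=float)
    if len(relevant) > 0:
        q_new = q_new + beta * np.mean(relevant, axis=0)
    if len(non_relevant) > 0:
        q_new = q_new - gamma * np.mean(non_relevant, axis=0)
    return q_new
```

In MARS, the document vectors D_i are simply replaced by visual feature vectors, so the same update applies unchanged to image features.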
The basic idea behind the re-weighting method is to enhance the importance of the di-
mensions of a feature that help in retrieving the relevant images and reduce the importance
of those dimensions that hinder this process. This is achieved by updating the weights of
feature vectors in the distance metric. Consider a weighted metric defined as

D = \sum_{j \in [N]} \omega_j \left| X_j^{(1)} - X_j^{(2)} \right|.  (2)
When an image in the query result is labeled as a positive example, the feature components
that contribute more to the match are considered more important, while the components that contribute less are considered less important. Therefore, the weight for
a feature component, \omega_i, is updated in the following way:

\omega_i = \omega_i (1 + \bar{\delta} - \delta_i), \qquad \delta_i = \left| f_i(Q) - f_i(A_j^+) \right|,  (3)

where \bar{\delta} is the mean of \delta_i over all components. On the other hand, if an image is labeled as a negative example, the feature components that contribute more to the match should be
depressed. That is, the weight is updated as:

\omega_i = \omega_i (1 - \bar{\delta} + \delta_i).  (4)

This technique is also referred to as learning the metric. Such an approach was
proposed by Huang et al. [7]. The MARS system implemented a slight refinement to the
re-weighting method called the standard deviation method [18].
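The re-weighting rules (3)-(4) can be sketched as below; treating f as the raw feature vector and renormalizing the weights after each update are our own simplifying assumptions.

```python
import numpy as np

def reweight(weights, query_feat, example_feat, positive=True):
    """Update the per-component weights of the weighted metric (2) from one
    labeled example, following equations (3)-(4): for a positive example,
    components on which the query and the example agree closely gain weight;
    for a negative example, the update is reversed."""
    delta = np.abs(np.asarray(query_feat) - np.asarray(example_feat))
    delta_bar = delta.mean()                 # mean mismatch over components
    if positive:
        w = weights * (1.0 + delta_bar - delta)
    else:
        w = weights * (1.0 - delta_bar + delta)
    return w / w.sum()                       # renormalize (our assumption)
```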
Instead of updating the individual components of a distance metric, we can also begin
with a set of predefined distance metrics and use relevance feedback to automatically select
the best one in the retrieval process. For instance, in the ImageRover system [21], the appropriate
Lp Minkowski distance metric is automatically selected to minimize the mean distance
between the relevant images specified by the user.
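A minimal sketch of this selection strategy, assuming a small candidate set of exponents (the set itself is illustrative):

```python
import numpy as np

def select_minkowski_p(query, relevant_images, candidates=(1, 2, 3, np.inf)):
    """Pick the Minkowski exponent p whose L_p distance minimizes the mean
    distance between the query and the user-marked relevant images."""
    def lp(x, y, p):
        d = np.abs(np.asarray(x) - np.asarray(y))
        return d.max() if np.isinf(p) else (d ** p).sum() ** (1.0 / p)
    mean_dist = {p: np.mean([lp(query, r, p) for r in relevant_images])
                 for p in candidates}
    return min(mean_dist, key=mean_dist.get)
```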
Another relevance feedback approach, proposed by Minka and Picard, is to update the
query space by selecting feature models. It is assumed that each feature model has its
own strength in representing a certain aspect of image content, and thus, the best way to achieve
effective content-based retrieval is to utilize a society of models. This approach uses a learning scheme to dynamically determine which feature model or combination of models
is best for subsequent retrieval.
Recently, more computationally robust methods that perform global feature optimization
have been proposed. The MindReader retrieval system designed by Ishikawa et al. [8]
formulates a minimization problem on the parameter estimating process. Unlike traditional
retrieval systems, whose distance functions can be represented by ellipses aligned with the
coordinate axes, the MindReader system proposed a distance function that is not necessarily
aligned with the coordinate axis. Therefore, it allows for correlations between attributes in
addition to different weights on each component.
A further improvement over the MindReader approach is given in [17]. In this ap-
proach, optimal query estimation and weighting functions are derived in a unified frame-
work. Based on the minimization of total distances of positive examples from the revised
query, the weighted average and a whitening transform in the feature space were found to
be the optimal solutions. In more detail, assume that a query vector component q_i corresponds to the ith feature, that an N-element vector r = [r_1, \ldots, r_N] represents the degree of relevance of each of the N input training samples, and that there is a set of N training vectors x_{ni} for each feature i. It is derived that the ideal query vector q_i for feature i is the
weighted average of the training samples for feature i, given by

q_i^T = \frac{r^T X_i}{\sum_{n=1}^{N} r_n},  (5)

where X_i is the N \times K_i training sample matrix for feature i, obtained by stacking the N training vectors x_{ni} into a matrix. It is interesting to note that the original query vector q_i does not appear in (5). This shows that the ideal query vector with respect to the feedbacks
is not influenced by the initial query.
The optimal weight matrix W_i is given by

W_i = \det(C_i)^{1/K_i} \, C_i^{-1},  (6)

where C_i is the weighted covariance matrix of X_i. That is,

(C_i)_{jk} = \frac{\sum_{n=1}^{N} r_n (x_{nij} - q_{ij})(x_{nik} - q_{ik})}{\sum_{n=1}^{N} r_n}, \qquad j, k = 1, \ldots, K_i.  (7)
We can see from the above equations that the critical inputs to the system are the training
vectors x_{ni} and the relevance vector r; initially, the user needs to input
these data to the system. Another issue with this algorithm is that negative examples are
not utilized in updating the query and the similarity measure.
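Equations (5)-(7) can be sketched together as follows; the small regularization term added to C_i is our own safeguard for the small-sample case, where the covariance would otherwise be singular.

```python
import numpy as np

def optimal_query_and_weights(X, r, eps=1e-6):
    """Optimal query (5) and weight matrix (6)-(7) for one feature:
    X is the N x K_i training-sample matrix, r the N relevance scores."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    q = r @ X / r.sum()                        # relevance-weighted average, eq. (5)
    D = X - q                                  # deviations from the new query
    C = (D.T * r) @ D / r.sum()                # weighted covariance, eq. (7)
    C = C + eps * np.eye(C.shape[0])           # regularize (our assumption)
    K = C.shape[0]
    W = np.linalg.det(C) ** (1.0 / K) * np.linalg.inv(C)   # eq. (6)
    return q, W
```

Note that the initial query never enters the computation, matching the observation after equation (5).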
2.2. Relevance feedback as a learning process
Relevance feedback can be considered as a learning problem: a user provides feedback
examples from the retrieval results of a query, and the system learns from such examples to
refine the retrieval results. The original query-movement method, represented by Rocchio's formula, and the reweighting method [16] are both simple learning methods. According
to Mitchell's [15] definition, machine learning is concerned with the question of how to
construct computer programs that automatically improve with experience. In this view, any
task that can be improved with respect to a certain performance measure based on some
experience can be considered a machine-learning task. In CBIR, relevance feedback is
a task to improve the retrieval performance, and the experience here is the feedback examples
provided by the users. Hence, classical machine-learning methods, such as decision tree
learning [13], artificial neural networks [10], Bayesian learning [5,27], and kernel based
learning [26] can be and have been applied to relevance feedback in CBIR. However, as
users are usually reluctant to provide a large number of feedback examples, the number of
training samples is very small, typically fewer than ten in each round of a feedback session. In
contrast, the feature dimensionality in CBIR systems is usually high. Hence, the crucial issue in performing relevance feedback in CBIR systems is how to learn from small training
samples in a very high-dimensional feature space. This fact makes many learning methods,
such as decision tree learning and artificial neural networks, unsuitable for CBIR.
The key issues in addressing relevance feedback in CBIR as a small sample learning
problem include: how to learn quickly from small sets of feedback samples to improve retrieval accuracy effectively; how to accumulate the knowledge learned from feedback; and
how to integrate low-level visual and high-level semantic features in the query. However, most
of the published work has focused on the first issue. Compared with other learning methods, Bayesian learning shows advantages in addressing the first issue,
and almost all aspects of Bayesian learning have been explored in the search for effective
learning algorithms.
Vasconcelos and Lippman [27] treated feature distribution as a Gaussian mixture and
used Bayesian inference for learning during feedback iterations in a query session. Richer
information captured by the mixture model also makes image regional matching possible.
The potential problems of their method are computational efficiency and a complex data model
in which too many parameters need to be estimated from very limited samples.
To speed up the learning process so that the retrieval result converges faster to the user's
satisfaction, active learning methods have been used to actively select samples in order to
achieve the maximal information gain, or the minimal entropy/uncertainty, in decision-
making. The approach proposed in [5] used Monte Carlo sampling in search of the set of samples that will minimize the expected number of future iterations. In estimating
the expected number of future iterations, entropy is used as an estimate of the number of
future iterations under the ambiguity specified by the current probability distribution of the
target image over all test images. Tong and Chang [26] proposed a SVM active learning
algorithm to select the sample to maximally reduce the size of the vector space in which
the class boundary lies. Without knowing a priori the class of a candidate, the best strategy
is to halve the search space each time. They attempted to justify that selecting the points
near the SVM boundary can approximately achieve this goal, and it is more efficient than
other more sophisticated schemes, which require exhaustive trials on all the test items.
Therefore, in their work, the points near the SVM boundary are used to approximate the
most-informative points, and the most-positive images are chosen as the ones farthest from
the boundary on the positive side in the feature space.
Some researchers consider the relevance feedback process in CBIR as a pattern recognition
or classification problem. Under such a consideration, the positive and negative examples provided by the user can be treated as training examples, and a classifier can be trained.
Such a classifier can then separate the whole data set into relevant and irrelevant groups. It seems
that many existing pattern recognition tools could be adopted for this task, and many kinds
of classifiers have been experimented with, such as the linear classifier [29], nearest-neighbor classifier [28], Bayesian classifier [24], support vector machines (SVM) [26], and so on. In
this category, the most popular algorithm is represented by [26], where an SVM classifier is
trained to separate the positive and negative examples. Such an SVM classifier then classifies all images in the database into two groups, relevant and irrelevant, with respect to a given query.
However, in most cases of CBIR, there is no predefined class structure. From an application
point of view, such classification-based methods may improve the retrieval performance in
some constrained contexts, but they will be limited when applied to general-purpose image
databases.
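A sketch of one SVM-based feedback round combining the two ideas above (boundary points as the most informative, farthest positive points as the most relevant); the kernel choice and helper names are illustrative, not the exact setup of [26].

```python
import numpy as np
from sklearn.svm import SVC

def svm_feedback_round(features, labeled_idx, labels, n_show=2):
    """Fit an SVM on the labeled examples, then return (a) the unlabeled
    images nearest the boundary, to ask the user about next, and (b) those
    farthest on the positive side, as the current best matches."""
    clf = SVC(kernel="rbf", gamma="scale").fit(features[labeled_idx], labels)
    unlabeled = np.setdiff1d(np.arange(len(features)), labeled_idx)
    margin = clf.decision_function(features[unlabeled])
    to_query = unlabeled[np.argsort(np.abs(margin))[:n_show]]  # near boundary
    most_pos = unlabeled[np.argsort(-margin)[:n_show]]         # most relevant
    return to_query, most_pos
```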
2.3. Feature versus semantics in relevance feedback
All the approaches described above perform relevance feedback at the low-level feature vector level, basically replacing keywords with features when adopting the vector space
model developed for document retrieval. While these approaches do improve the performance of CBIR, there are severe limitations. The inherent problem is that low-level
features are often not as powerful in representing the complete semantic content of images as
keywords are in representing text documents. Furthermore, users often pay more attention to
the semantic content (or a certain object/region) of an image than to the background and
other parts; the feedback images may be similar only partially in semantic content, but may vary
largely in low-level features. Hence, using low-level features alone may not be effective in
representing users' feedback and in describing their intentions.
In addition, there are typically two different modes of user interactions involved in image
retrieval systems. In one case, the user types in a list of keywords representing the semantic
contents of the desired images. In the other case, the user provides a set of example images as the input, and the retrieval system retrieves other similar images. In most
image retrieval systems, these two modes of interaction are mutually exclusive. However,
combining these two approaches and allowing them to benefit from each other will yield a
great deal of advantage in terms of both retrieval accuracy and ease of use of the system.
There have been efforts on incorporating semantics in relevance feedback for image
retrieval. The framework proposed in [11] (to be discussed later in more detail in this
section) attempted to embed semantic information into a low-level feature based image re-
trieval process using a correlation matrix. The FourEye system by Minka and Picard [14]
and the PicHunter system by Cox et al. [5], made use of hidden annotation through learn-
ing process. However, they excluded the possibility of benefiting from good annotations,
which may lead to a very slow convergence.
In terms of feature selection, unlike most CBIR systems that use image features such
as color histogram or moments, texture, shape, and structure features, Tieu and Viola [25]
used a boosting technique to learn a classification function in a feature space of more than
45,000 features. The features were demonstrated to be sparse with high kurtosis, and were
argued to be expressive for high-level semantic concepts. Weak 2-class classifiers were
formulated based on Gaussian assumption for both the positive and negative (randomly
chosen) examples along each feature component, independently. The strong classifier is
then a weighted sum of the weak classifiers as in AdaBoost.
The framework to be discussed in Section 3 integrates both semantics and low-level
features into the relevance feedback process in a new way. Only when semantic information is not available does the method reduce to one of the previously described low-level
feedback approaches, as a special case.
2.4. Relevance feedback with memory
A disadvantage of the classic relevance feedback approaches, as well as many learning-based
approaches discussed above, is that the knowledge captured in the relevance feedback
process in one query session or one learning step is not memorized to continuously improve the retrieval accuracy. That is, even with the same query, a user will have to go
through the same, often tedious, feedback process to obtain the same result, despite the
fact that the user has given the same query and feedback before. Strictly speaking, there is no
learning, or only limited learning, in such systems, as there is no knowledge accumulation
across different query sessions. To overcome these limitations, another school of thought is to
use learning approaches to memorize users' subjectivity in the relevance feedback process.
The challenge in this approach is how to memorize the knowledge learned and how to handle
the inconsistency of content subjectivity across different users and/or across different
query sessions of the same user.
The approach proposed in [11] was the first attempt to explicitly memorize learned semantic information to improve CBIR performance. The basic idea of this approach is to
accumulate the semantic relevance between image clusters, learnt from users' feedback, in a
correlation network. In other words, a correlation network is used as the memory. Figure 1
illustrates the correlation network. Mathematically, the correlation network is represented by a correlation matrix, M, defined as below:

M = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ w_{N1} & w_{N2} & \cdots & w_{NN} \end{pmatrix},  (8)
where the weight or coefficient, wij , represents the semantic correlation between images
in cluster i and j .
The system works as follows. First, all images in the database are clustered into N clusters
based on visual feature similarity using, for instance, the k-means algorithm. Obviously, the
images in each cluster are initially only similar in terms of the selected visual features, as
in a typical CBIR system. Also, initially, all correlation coefficients between any two
clusters are set to zero, meaning that only images within the same cluster are correlated and
images across clusters are uncorrelated. That is, the initial matrix is the identity matrix,

M_0 = I_{N \times N}.  (9)
Then, for a given query, the initial retrieval is based on visual features. Assume that after
a given iteration, n + m images are displayed, and n images are marked relevant and
m as irrelevant. The relevant as well as irrelevant images may or may not be from different
clusters.

Figure 1. Correlation network to memorize semantic correlations between image groups.

This approach memorizes such feedbacks by updating the correlation matrix as
below:
M_t = M_{t-1} + \sum_{i=1}^{n} F(q) F(p_i)^T - \sum_{i=1}^{m} F(q) F(n_i)^T,  (10)
where q is the feature vector of the query, p_i and n_i are the feature vectors of the positive and
negative feedback samples, and F(x) is a transform function used to determine the update
magnitude based on the feedback samples. In this way, the correlations between the cluster where the query originally falls and those where the positive samples fall are increased,
progressively embedding the information on semantic correlations between images. This
correlation is then used in subsequent retrievals, in which not only the visual features but
also the semantic correlations are used in determining the similarity of an image to the query.
Experiments have shown that such a progressive learning approach effectively utilizes the
knowledge learnt from previous queries to reduce the number of iterations needed to achieve high
retrieval accuracy [11].
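The update rule (10) can be sketched as follows, reducing F(·) to a one-hot cluster-indicator vector and using fixed step sizes; both choices are illustrative assumptions, not the paper's exact transform.

```python
import numpy as np

def update_correlation(M, q_cluster, pos_clusters, neg_clusters,
                       alpha=0.1, beta=0.05):
    """One feedback round of equation (10): strengthen the link between the
    query's cluster and each positive image's cluster, and weaken it toward
    each negative image's cluster."""
    N = M.shape[0]
    def indicator(c):                 # F(.) reduced to a one-hot vector
        v = np.zeros(N)
        v[c] = 1.0
        return v
    Fq = indicator(q_cluster)
    for c in pos_clusters:
        M = M + alpha * np.outer(Fq, indicator(c))
    for c in neg_clusters:
        M = M - beta * np.outer(Fq, indicator(c))
    return M
```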
Also, if there are two distinct groups in one initial cluster that are semantically dissimilar,
meaning that they are negative examples to each other, a splitting is performed to split the
initial cluster into two clusters. On the other hand, based on feedbacks, when two clusters
are close in feature space and have a high correlation between them according to M, the
two initial clusters can be merged into one. That is, the correlation network dynamically
updates its structure, in addition to updating the correlation matrix, as it learns from user
feedback.
2.5. Log mining in relevance feedback
More recently, researchers have become aware of the fact that the Web is a rich resource of image data and
that some of the images' semantics is usually available in the same web documents. Shen et al. [22] exploit this fact and use natural language processing techniques to obtain semantic
features from the web text to characterize web images. Hence, they are able to find
relevant images from the web using text-based queries. In our work on a web image search
engine, we also use web pages as potential sources of semantics. There are two
main differences between the two systems. The first difference is in the natural language processing
approach to obtaining semantic features. They use a so-called weighted chain-net, which
is actually a lexical chain, to represent the document space model for images, while our
document space model of all media objects is simply a vector space model, which is an
effective approach that has been widely used in traditional information retrieval. Other
natural language processing methods, such as proper noun identification, are also used to
extract semantic features. The second difference is that our system exploits relevance feedback
and data mining on the users' feedback logs to update the document space model. As a result, our approach outperforms traditional CBIR systems and relevance feedback approaches.
3. An integrated relevance feedback framework
As discussed in Section 2, an effective relevance feedback system should provide effective
solutions for learning from small sets of feedback samples, accumulating learned
knowledge, and integrating low-level visual and high-level semantic features in queries and
feedback to achieve high retrieval accuracy.
In addition, there typically are two different modes of user interactions involved in image
retrieval systems. In one case, the user types in a list of keywords representing the semantic
contents of the desired images. In the other case, the user provides a set of examples images
as the input and the retrieval system will try to retrieve other similar images. In most imageretrieval systems, these two modes of interaction are mutually exclusive. We argue that
combining these two approaches and allow them to benefit from each other yields a great
deal of advantage in terms of both retrieval accuracy and ease of use of the system.
To address all of the above-mentioned issues, a CBIR framework with integrated relevance
feedback and query expansion was proposed [9,12,23,24]. Figure 2 illustrates the proposed
CBIR framework. It consists of a semantic network that links images to semantic annotations in a database, a similarity measure that integrates both semantic features and
Figure 2. The proposed framework of integrated relevance feedback and query expansion.
Figure 3. Semantic network.
image features, and a machine learning algorithm to iteratively update the semantic network and to improve the system's performance over time. The system supports both query
by keyword and query by image example through the semantic network and low-level feature indexing. More importantly, the learning process propagates keyword annotations
from the labeled images to unlabeled ones during feedback. In this way, more and more
images are implicitly labeled with keywords by the semantic propagation process. This annotation propagation process also helps the system accumulate learned knowledge to
improve the performance of future retrieval requests.
3.1. Semantic network
The semantic network is a two-layered structure. The top layer is represented by a set of
keywords having links to the images in the database. It can be considered an extension
of the initial information embedding idea in the system shown in Figure 1. The degree
of relevance of the keywords to the associated image's semantic content is represented as
the weight on each link, as shown pictorially in Figure 3. This layer is what we need for
keyword relevance feedback and will be updated during semantic propagation. The bottom
layer is a keyword thesaurus that constructs the connections between different keywords.
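A minimal sketch of this two-layer structure; the class and method names, and the additive reinforcement rule, are our own illustrative assumptions rather than the paper's implementation.

```python
from collections import defaultdict

class SemanticNetwork:
    """Two-layer semantic network: weighted keyword -> image links on top,
    a keyword thesaurus connecting related keywords at the bottom."""
    def __init__(self):
        self.links = defaultdict(dict)       # keyword -> {image_id: weight}
        self.thesaurus = defaultdict(set)    # keyword -> related keywords

    def annotate(self, keyword, image_id, weight):
        self.links[keyword][image_id] = weight

    def reinforce(self, keyword, image_id, delta=0.1):
        """Strengthen a link after positive feedback (illustrative rule)."""
        self.links[keyword][image_id] = (
            self.links[keyword].get(image_id, 0.0) + delta)

    def images_for(self, keyword, expand=True):
        """Rank images linked to a keyword, optionally expanding the query
        through the thesaurus layer."""
        terms = {keyword} | (self.thesaurus[keyword] if expand else set())
        scores = defaultdict(float)
        for t in terms:
            for img, w in self.links[t].items():
                scores[img] += w
        return sorted(scores, key=scores.get, reverse=True)
```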
The initial weights can be obtained by manual labeling. In our web image search engine,
they are initially extracted from the following sources on the web page that contains the
image, according to some empirical rules.
1. Image filename and URL. We assume that web page authors/editors usually assign
meaningful filenames to images in a web page. Some heuristic rules are used to extract
the keywords from the filenames. First, the filename is segmented into meaningful key-
words based on a pre-defined dictionary. For example, the filename redflower.jpg includes
two semantic words: "red" and "flower." Then, the clutter characters in filenames, such
as digits, hyphens, and filename extensions, are discarded. We also extract semantic
keywords from the URL of the image file. The URL usually represents the hierarchy
information of an image on the web page. For instance, "animal" and "bird" are
useful information in the URL http://www.ditto.com/images/animals/
anim_birds.jpg. We apply the same segmentation technique to split the URL into
meaningful pieces.
2. ALT (alternate) text. The ALT text in a web page is displayed in place of the associated
image in a text-based browser. It usually describes the semantics of the image concisely
and is therefore a very relevant feature for representing the semantic meaning of the
image.
3. Surrounding text. In web pages, images are used to enhance the content that the editors
want to present. Hence, some texts in the surrounding areas are semantically relevant
to the content of the image. However, it is difficult to judge which of the four possible
areas (above, below, left, right) is the most relevant to the image. Therefore, in our
prototype, all four areas are chosen as sources of text features for the image. This feature
will be refined by mining the users' relevance feedback logs as discussed in Section 4.
4. Page title. The page title is a good candidate text feature for the images in a web page.
5. Other information. Image hyperlinks, anchor text, etc., are also candidate text features
of the images.
The initial value of the weight w_ij associated with each keyword of an image is calculated
by the TF*IDF method [19]. That is, a feature vector is used to represent all keywords
of an image, and the vector is defined as

D_i = TF_i · IDF_i = ( t_i1 log(N/n_1), ..., t_ij log(N/n_j), ..., t_im log(N/n_m) ),   (11)

where D_i is the feature vector, with each component value corresponding to the initial
weight assigned to the association of a keyword with image i; t_ij stands for the frequency
of keyword j appearing in the text description of image i; n_j is the number of images
that are characterized by keyword j; and N is the total number of images. If there is no
keyword information for an image, the corresponding feature vector is set to null.
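Equation (11) is a standard TF*IDF weighting, which can be sketched as follows (the dictionary-based sparse representation is an illustrative choice):

```python
# Sketch of Equation (11): initial weight w_ij = t_ij * log(N / n_j).
import math

def initial_weights(tf, df, n_images):
    """tf: keyword -> frequency in the image's text description;
    df: keyword -> number of images characterized by that keyword;
    n_images: total number of images N."""
    if not tf:
        return None  # no keyword information: null vector
    return {kw: t * math.log(n_images / df[kw]) for kw, t in tf.items()}
```

For instance, a keyword appearing twice in an image's description and in 10 of 1000 images receives weight 2·log(100).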
With the semantic network, semantic-based relevance feedback can be performed rela-
tively easily compared to its low-level feature counterpart. This is performed by updating
the weights wij associated with each link shown in Figure 3. The weight updating process
is described below.
1. A user submits a query, and the system retrieves similar images using cross-modality
query expansion, explained in the next subsection.
2. The system collects the positive and negative feedback examples corresponding to the
query.
3. For each keyword in the input query, check whether it is in the keyword database. If
not, add it to the database without creating any links.
4. For each positive example, check whether any query keyword is not linked to it. If so,
create a link with an initial weight from each missing keyword to this image. For all
other query keywords already linked to this image, increase the weight by a predefined
value or using the method defined by (10) and (11).
5. Similarly, for each negative example, check whether any query keyword is linked to it.
If so, decrease its weight, but not below zero.
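The five-step update above can be sketched as follows; the increment DELTA and the initial weight are illustrative constants, since the paper defers the exact update rule to (10) and (11):

```python
# Sketch of the semantic-network weight update from one round of feedback.
# links maps (keyword, image) -> weight; DELTA and INIT_W are assumptions.
DELTA, INIT_W = 0.1, 1.0

def update_network(links, query_keywords, positives, negatives):
    for kw in query_keywords:
        for img in positives:                     # step 4
            if (kw, img) not in links:
                links[(kw, img)] = INIT_W         # create a missing link
            else:
                links[(kw, img)] += DELTA         # reinforce an existing link
        for img in negatives:                     # step 5
            if (kw, img) in links:
                links[(kw, img)] = max(0.0, links[(kw, img)] - DELTA)
    return links
```

Repeated positive feedback thus monotonically strengthens a keyword-image link, while negative feedback weakens it down to zero.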
Through this updating process, the keywords that represent the actual semantic content
of each image will receive a larger weight. Also, it can be easily seen that as more queries
are input into the system, the system is able to expand its vocabulary. Furthermore,
a semantic propagation method is used to propagate keywords to unlabeled images during
users' feedback iterations, as described later in this section.
3.2. Integrated and cross modality query and retrieval
The proposed framework has an integrated relevance feedback scheme in which both low-
level feature based and high-level semantic feedbacks are performed. We define a unified
metric function G to measure the relevance between a query and any image j within an
image database in terms of both semantic and low-level feature content, where the revised
query Q' includes the original query and the user's feedback information:

G(j, Q') = α · sim_k(j, Q'_k) + (1 − α) · sim_f(j, Q'_f),   (12)

where α ∈ [0, 1] is the weight of the semantic relevance in the overall similarity measure,
which can be specified by users. The larger α is, the more important a role semantic
relevance plays in the overall similarity measurement. sim_k(j, Q'_k) and sim_f(j, Q'_f) are
the semantic similarity and the low-level feature similarity between image j and the revised
query Q', respectively.
The revised query Q' consists of two parts: the feature-based part Q'_f and the semantic
(keyword)-based part Q'_k. Q'_f is defined by (3)-(5) based on the feature vectors of the
feedback images. With the semantic network, sim_k(j, Q'_k) can be directly computed with
the updated weights.
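A minimal sketch of ranking by the unified metric G of Equation (12), assuming the two per-image similarities have already been computed:

```python
# Sketch of Equation (12): G(j, Q') = alpha*sim_k + (1 - alpha)*sim_f.
def rank_images(images, sim_k, sim_f, alpha=0.5):
    """sim_k / sim_f map an image id to its precomputed semantic /
    low-level feature similarity to the revised query."""
    g = {j: alpha * sim_k[j] + (1 - alpha) * sim_f[j] for j in images}
    return sorted(images, key=lambda j: -g[j])  # best match first
```

Setting alpha to 1.0 recovers a purely keyword-based ranking; 0.0 recovers a purely feature-based one.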
To further improve the retrieval performance of the proposed framework, a cross-
modality query expansion method is supported. That is, once a query is submitted in
3.3. Probabilistic keyword propagation scheme
As illustrated in Figure 3, the more images are annotated (correctly), the better the system's
retrieval performance will be. However, the reality is that human labeling of images is tedious
and expensive, and hence not a feasible solution, which is what motivated CBIR research
fifteen years ago. To address this issue, a probabilistic progressive keyword propagation
scheme is proposed in our framework to automatically annotate images in the database
during the relevance feedback process, based on a small percentage of annotated images.
We assume that initially only a few images in a database have been manually labeled with
keywords and that retrieval is performed mainly based on low-level features. As stated
before, the initial keyword annotations can come from the Web through the crawler when
the images are crawled from web pages, or be labeled by humans. While the user is interacting with the system by
providing feedbacks in a query session, a progressive learning process is activated to prop-
agate the keyword annotation from the labeled images to un-labeled images so that more
and more images are implicitly labeled by keywords. In this way, the semantic network is
updated in which the keywords with a majority of user consensus will emerge as the dom-
inant representation of the semantic content of their associated images. As more queries
are input into the system, the system is able to expand its vocabulary. Also, through
the propagation process, the keywords that represent the actual semantic content of each
image will receive larger weights.
There are two major issues in keyword propagation: which images and which key-
word(s) should be propagated during a query session. To answer the first question, a
probability model, based on Bayesian learning, is proposed. We assume that (1) all pos-
itive examples in one retrieval session belong to the same semantic class, with common
semantic object(s) or meaning(s); and (2) the features of the same semantic class follow
a Gaussian or mixture-of-Gaussians distribution. Therefore, all positive examples in a
query session are used to calculate and update the parameters of the corresponding seman-
tic Gaussian class. Then, the probability of each image in the database belonging to this
semantic class is calculated. The common keywords of the positive examples are propagated
to the images that belong to this class with very high probability.
As we can see, the propagation framework uses the same procedure as the feedback algo-
rithm on low-level features [23]. The only difference is that for low-level feature feedback,
the calculated probability is used to rank an image in the retrieval candidate list,
while here it is used to determine whether an image should be in the propagation candidate list.
The propagation candidate set S is obtained as follows:
S = {c_1, ..., c_k},  where p(c_j) > δ,   (14)

where p(c_j) is the probability that image j in the database belongs to this semantic
class, and δ is a constant threshold that can be estimated by a training process. The
weight associated with a propagated keyword i and image j is w_ij = p(c_j). A more
complex distribution model, for example a mixture of Gaussians, could be used in this
propagation framework. However, because the user's feedback examples in practice are
often very few, complex models lead to larger parameter estimation errors, as there are
more parameters to estimate.
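Under the single-Gaussian assumption stated above, selecting the propagation candidate set S of Equation (14) might look like the following sketch (the diagonal covariance and the threshold value are simplifying assumptions):

```python
# Sketch of Equation (14): fit a Gaussian to the positive examples' features,
# then keep every database image whose class probability exceeds a threshold.
import math

def gaussian_params(positives):
    """Per-dimension mean and variance of the positive examples' features."""
    n, d = len(positives), len(positives[0])
    mean = [sum(x[k] for x in positives) / n for k in range(d)]
    var = [max(1e-6, sum((x[k] - mean[k]) ** 2 for x in positives) / n)
           for k in range(d)]                  # floor avoids zero variance
    return mean, var

def class_probability(x, mean, var):
    """Diagonal-Gaussian density of x, used as p(c_j) up to normalization."""
    p = 1.0
    for k in range(len(x)):
        p *= (math.exp(-(x[k] - mean[k]) ** 2 / (2 * var[k]))
              / math.sqrt(2 * math.pi * var[k]))
    return p

def propagation_set(database, positives, threshold):
    """S = {j : p(c_j) > threshold}, cf. Equation (14)."""
    mean, var = gaussian_params(positives)
    return {j for j, x in database.items()
            if class_probability(x, mean, var) > threshold}
```

Images close to the positive examples in feature space receive the common keywords; distant images are excluded.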
Also, to determine which keyword(s) should be propagated when an image is associated
with multiple keywords, there are two approaches: using the relevance factor defined
by (13), or using a region-based approach [9]. In the former approach, the relevance
factor r_ij can be directly used to modify the weight of the propagated keyword. Obviously,
the lower the relevance of a keyword to an image, the smaller the weight assigned to that
keyword in the propagation, and vice versa. When the region-based approach is used,
unlabeled images to be propagated are first segmented into regions. By analyzing the
feature distribution of the segmented regions, a probabilistic association between each
segmented region and the annotated keywords is set up for labeled images by the
region-based relevance feedback approach. Then, each keyword of a labeled image is
assigned to one or several regions of the image with certain probabilities. The details of
the region-based feedback framework are given in [9].
3.4. Experiment results
The image set used in evaluating the proposed framework described in this section is the
Corel Image Gallery of 10,000 images, manually labeled into 79 semantic categories.
200 randomly selected images compose the test query set. Whether a retrieved image is
correct or incorrect is judged according to this ground truth. Three types of color features
and three types of texture features are used in our system. The feedback process runs as
follows. Given a query from the test set, a different test image of the same category as the
query is used in each round of feedback iteration as the positive example for updating the
Gaussian parameters and revising the query. To incorporate negative feedback, the first two
irrelevant images are assigned as negative examples. The accuracy is defined as

Accuracy = (relevant images retrieved in top N returns) / N.   (15)
Several experiments have been performed as follows. First, three feature-based feedback
algorithms are compared: the Bayesian feedback scheme by Su et al. [23,24], the scheme
of [27], and the scheme of [17] as defined by (5)-(7). This comparison is done in the same
feature space. Figure 4 shows that the accuracy of the Bayesian feedback scheme (referred
to as "our feedback approach") becomes higher than that of the other two methods after
two feedback iterations. This demonstrates that the incorporated Bayesian estimation with
the Gaussian parameter-updating scheme is able to improve retrieval effectively.
To demonstrate the performance of the semantic propagation, the following experiment
was designed. The 200 images in the query set were annotated with their category names,
so only one keyword is associated with each query image, and the other images in the
database have no keyword annotations. During the test, each query image was used twice.
The retrieval performance is shown in Figure 5, compared with that without propagation.
It can be seen that with propagation, the retrieval accuracy is much higher than the original
accuracy without it. This is because, when the system has propagation ability, later queries
can utilize the knowledge accumulated from previous feedback iterations. In other words,
the system has learning ability and becomes smarter with more user interactions.
Figure 4. Retrieval accuracy for top 100 results in original feature space.
Figure 5. Retrieval accuracy for top 100 results: feedback without propagation vs. feedback with the propagation
scheme.
4. Incorporating log mining in web image search engine
The architecture of our proposed web image search engine is shown in Figure 6. In addition
to all components in a CBIR system, the web search engine contains an image crawler and
three other modules, namely, the log miner, the model updater, and the query updater [3,4].
The data organization of the system mainly consists of four parts: the image database, which
also contains metadata of images (i.e., low-level and high-level features); the users' relevance
feedback log database; the document space model; and the user space model.
A typical scenario of the system is as follows. The off-line crawler is first employed at
regular intervals (e.g., once every day at non-peak network traffic hours) to collect potential
web pages containing images and store them into a local database. The feature extractor is
then applied to these pages to extract both the low-level visual features and the high-level
Figure 6. Architecture of the proposed web image search engine.
semantic features for the images appearing in these pages. In our system, the crawler and
the feature extractor actually work simultaneously. An image indexer is applied to the
images and their features to build the document space model, which is the representation
of the images in the database using their features. Once the document space model is
available, the matcher compares the user's query with the document space model of images to yield
the image retrieval results. Since many irrelevant images may be returned by the retrieval
system, the user feedback interface is also provided for users to specify whether a returned
image is relevant or not to the user's intent. The image retrieval system can utilize user
feedback to gain an understanding of the relevancy of certain images and update the
query or adjust the matcher to return more accurate retrieval results. The users' feedback
log data are also stored in the user log database in the system, from which the log miner
can find and build the user space model through log analysis. The user space model is
then combined with the document space model to update the latter, eliminating the
mismatch between the page author's expression and the user's understanding and
expectation, and further improving the retrieval accuracy.
4.1. Document modeling of images
The document space model in the image search engine combines the low-level visual fea-
tures and high-level semantic features to index the images on the web. The detailed process
is described as follows.
To collect images on the web, a crawler (or a spider, a program that can automatically
analyze web pages and download the related pages hyper-linked from the analyzed web
pages) is used to collect images from many web sites. First, we re-arrange the semantic
network shown in Figure 3 into a concept hierarchy of image categories, such as animals,
architecture, arts, etc. Then, we select some representative sites to be collected for each
concept category, for instance, http://www.nba.com for sports, http://www.
cnn.com for news, http://www.disney.com for entertainment, etc. For each
candidate site, the crawler collects the images and saves them to a local web page
database. We then use a simple classifier to classify the images into meaningful and junk
(e.g., banners, backgrounds, buttons, icons, etc.) categories based on information such as
color histograms, image sizes, image file types, etc.
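A rule-based junk filter of this kind might be sketched as follows; the specific thresholds and rules are assumptions in the spirit of the description, not the system's actual classifier:

```python
# Illustrative "meaningful vs. junk" filter using image size, file type,
# and a coarse color statistic; all thresholds are hypothetical.
def is_junk(width, height, file_type, n_distinct_colors):
    if file_type == "gif" and n_distinct_colors < 8:
        return True                    # flat-color buttons and backgrounds
    if width < 32 or height < 32:
        return True                    # icons
    aspect = width / height
    if aspect > 5 or aspect < 0.2:
        return True                    # banner-shaped images
    return False
```

Such a filter cheaply discards decorative page elements before the more expensive feature extraction runs.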
For each image collected, the initial keywords are assigned in the way as described in
Section 3.1. In addition, the low-level features of each image are calculated. The keywords
and low-level features of all collected images form the document space.
In the image search process, the overall similarity is simply the linear combination of the
visual and textual similarities, as defined in (12). Setting the same default weight α = 0.5
in (12) for every image to balance the importance of low-level and high-level features may
not be ideal, but it is a very efficient way to build the baseline configuration of our image
retrieval system. The weight α is automatically adjusted to a suitable value by the system
through the user's feedback on the relevancy of certain returned images. Moreover, after
we collect enough user feedback log information, data mining technology (which will be
presented in the next section) can be applied to find out the importance of low-level and
high-level features for different concepts/categories. For example, we find that for the
concept "Clinton," the high-level features are more important than the low-level features,
while for the concept "sunshine," the low-level features are more useful than the high-level
features.
4.2. Log mining and feedback
In order to reduce the ambiguity in the text descriptors extracted from web pages and
the low-level image features, and to improve the search performance, we have proposed
a user space model to supplement the original document space model. This is achieved
by applying a user log analysis process. The user space model is also a vector space
model. The difference between the user space model and the document space model is that
vectors in the user space model are constructed from the information mined from the user
feedback log data, not from the original information extracted from the web pages. When
a user submits a query, our system will return to the user some images found based on the
original document space model. The user can then use the feedback interface to tell
the system whether the returned images are relevant or irrelevant to the query based on
his/her subjective judgment. Of course, most users do not have the patience or time to
mark all relevant and irrelevant images in the returned image collection. However, this is
not a very serious problem because even a small set of feedback images can provide very
useful information.
After we obtain some user feedback log data, the user space model can be built from the
user log. Let Q be the set of all queries issued so far, and let T_j (j = 1, ..., N_T) be the
set of all individual words that appear in Q. (Note that a single query may contain multiple
words.) For a query in Q, I_ri is one of the relevant images and I_ii is one of the irrelevant
images specified by the user and stored in the user log.
From the user log, we can easily calculate the probabilities listed below:

P(I_ri) = N_ri / N_Q,   (16)

where N_ri is the number of queries for which image I_ri has been retrieved and marked as
relevant, and N_Q is the total number of queries.

P(I_ri | T_j) = N_ri(T_j) / N_Q(T_j),   (17)

where N_ri(T_j) is the number of queries containing word T_j for which image I_ri has been
retrieved and marked as relevant, and N_Q(T_j) is the number of queries that contain T_j.

P(T_j) = N_Q(T_j) / N_Q.   (18)

Based on Bayes' theorem, we have

P(T_j | I_ri) = P(I_ri | T_j) P(T_j) / P(I_ri).   (19)

In addition, for irrelevant images in the user log, we have

P(I_ii | T_j) = N_ii(T_j) / N_Q(T_j),   (20)

where N_ii(T_j) is the number of queries containing word T_j for which image I_ii has been
retrieved and marked as irrelevant.
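Equations (16)-(19) can be computed directly from a feedback log; the log-record layout below is an assumed representation:

```python
# Sketch of Equations (16)-(19): the user space vector of one image.
# Each log record is (query_words, relevant_images, irrelevant_images).
def user_space_vector(log, image):
    n_q = len(log)
    words = {w for q, _, _ in log for w in q}
    n_ri = sum(1 for _, rel, _ in log if image in rel)
    p_i = n_ri / n_q                               # Eq. (16)
    vec = {}
    for t in words:
        n_q_t = sum(1 for q, _, _ in log if t in q)
        n_ri_t = sum(1 for q, rel, _ in log if t in q and image in rel)
        p_t = n_q_t / n_q                          # Eq. (18)
        p_i_given_t = n_ri_t / n_q_t               # Eq. (17)
        vec[t] = p_i_given_t * p_t / p_i if p_i > 0 else 0.0  # Eq. (19)
    return vec
```

A word that always co-occurs with relevance marks for the image gets a high P(T_j | I); a word never associated with it gets zero.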
For a given image I, P (Tj |I) (j = 1, . . . , N T) calculated using (19) also form a vectorfor I. We call this vector the user space model of image I, compared to the document
space model of image I, which is built from the related features extracted from the web
pages.
If we have a large collection of user log data, it is reasonable to say that the information
in the user space model is more accurate than the information in the original document
space model. However, as we have previously stated, few users like to tag all relevant and
irrelevant images in the retrieval result. Hence, the user feedback log is usually insufficient,
which causes the user space model to be less comprehensive than the original document
space model. Therefore, we cannot completely replace the document space model with the
user space model. We choose instead to integrate the user space model into the original
document space model to improve the accuracy of the final document space model.
For each image I, let vector U be its feature in the user space model and vector D be the
textual feature in the document space model. We simply use a linear combination to
integrate these two vectors. We use D_new to denote the updated document space model,
which is calculated as

D_new = β · U + (1 − β) · D,   (21)

where β is used to adjust the weight between the user space model and the document space
model. In effect, β is the confidence in the vector U of the user space model. In our
approach, if the vector in the user space model is accurate and comprehensive enough, we
can assign β a value very close to 1.0. If the vector in the user space model is not accurate
or comprehensive enough, the value of β should be relatively small. The number of times
an image has been marked in user feedback can be used to determine the value of β for
that image. Obviously, if an image is marked in user feedback more times than another
image, its feedback information should be more accurate and comprehensive, the
confidence in its vector U should thus be higher, and we can assign it a bigger β than the
other image.
Since irrelevant images are also recorded in the user feedback log, we can utilize this
information as well. For each irrelevant image I_ii, we use P(I_ii | T_j) as the confidence
that I_ii is irrelevant to query word T_j, and these values form a vector I. We denote by
D_final the text feature vector of the image in the final document space model and calculate
it using (22), similar to the TF*IDF method:

D_final = D_new · (1 − I),   (22)

where the product is taken component-wise.
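Equations (21) and (22) can be sketched per keyword as follows, assuming sparse dictionary vectors and a component-wise product in (22):

```python
# Sketch of Equations (21)-(22): fold the user space vector U and the
# irrelevance vector into the document space vector D.
def update_document_model(D, U, irrelevance, beta):
    """beta is the confidence in U, higher for images with more feedback."""
    keys = set(D) | set(U)
    d_new = {k: beta * U.get(k, 0.0) + (1 - beta) * D.get(k, 0.0)
             for k in keys}                                          # Eq. (21)
    return {k: d_new[k] * (1.0 - irrelevance.get(k, 0.0))
            for k in keys}                                           # Eq. (22)
```

Keywords that users repeatedly mark as irrelevant for an image are thereby suppressed in its final text vector.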
4.3. Experiments
Based on the proposed architecture, a demo image search engine, called iFind, has been
developed at Microsoft Research Asia. The graphical interface is shown in Figure 7.
The search options that iFind supports include:
Keyword-based search. One can type one or more keywords, such as "girl," into the
textbox and start the retrieval. One will then see some images displayed over several pages
in the browse mode.
Figure 7. iFind user interface.
Query by example. If the "Similar" hyperlink under an image is selected, the system will
retrieve images that are semantically/visually similar to the example image.
Relevance feedback. The system will improve the retrieval performance after the user
provides some positive and/or negative examples. One can expect a much better result
after several iterations of feedback.
Log mining. The retrieval performance of the system is greatly improved after the off-line
log mining process. Each user benefits from other users' usage.
To illustrate improvement brought by log mining in image search, we show here some
evaluation results based on three system configurations: (1) the baseline system, which
provides only query and retrieval; (2) the feedback system, which provides user feedback
as well as the baseline functionality; and (3) the full configuration, including user log
mining.
In our experiments, we have selected more than 2000 representative image websites. The
intelligent crawler is used to collect the images from these sites. All related semantic
features, including image filenames, ALT texts, surrounding texts, and page titles, as well
as the low-level visual features, are extracted using the feature extractor at the same time.
The images are stored in the database and indexed with their textual and visual features. In
total, we have collected more than 30,000 images from these websites. It is difficult for us
to calculate the recall of the system because it is a tedious job to browse the entire image
database and specify the ground truth manually. Therefore, we only choose 17 queries
Figure 8. The average precision-recall curve of the system's retrieval performance for all queries.
to demonstrate the performance of the system. Furthermore, recall is roughly estimated
by scanning the top 1000 images returned for each query. The selected queries are:
Clinton, Jordan, car, flower, tree, cat, submarine, mars, spring, galaxy, movie star, potato,
ship, space, tomb raider, female, and mountain. Figure 8 shows the average
precision-recall for all queries.
Although the feedback from a single user is limited in our experiments, multiple users'
feedback is accumulated and stored in the user log. The user space model is constructed
from the user log and used to improve the document space model and thus the retrieval
performance. The system performance with log mining applied is represented by the
dash-dotted line in Figure 8 for the corresponding cases. As we can see from these
figures, the log mining not only improves the precision when the recall is low, but can also
improve the precision when the recall is high. In other words, the overall performance of
the system is improved after log mining.
5. Conclusions
In this paper, we have discussed in detail relevance feedback technologies in content-based
image retrieval systems. The key issues and representative algorithms in relevance feed-
back in CBIR are reviewed. We have presented a framework of integrated relevance feed-
back and semantic learning in content-based retrieval. Our method utilizes both the
semantic and low-level feature properties of every feedback image to refine the retrieval
while, at the same time, learning semantic annotations for each image. While the user is
interacting with the system by providing feedback in a query session, a progressive
learning process is activated to propagate the keyword annotations from the labeled
images to unlabeled images, so that more and more images are implicitly labeled by
keywords with certain probabilities. This propagation process improves the retrieval
performance of future queries, whether by image example or by keyword. Furthermore, we extended
the framework to a web image search engine by incorporating user log mining to refine
search accuracy. This new framework makes the image retrieval system superior to both
classical CBIR and text-based systems.
Publisher's note
This article is based on the original conference paper published by Kluwer Academic Pub-
lishers in Visual and Multimedia Information Management, edited by Xiaofang Zhou and
Pearl Pu. ISBN: 1-4020-7060-8. © 2002 by the International Federation for Information
Processing.
References
[1] C. Buckley and G. Salton, Optimization of relevance feedback weights, in Proceedings of SIGIR95,
1995.
[2] S. K. Chang, C. W. Yan, D. C. Dimitroff, and T. Arndt, An intelligent image database system, IEEE
Transactions on Software Engineering 14(5), 1988.
[3] Z. Chen, W. Liu, C. Hu, M. Li, and H. J. Zhang, iFind: A web image search engine, in Proceedings of
SIGIR2001, 2001.
[4] Z. Chen, W. Liu, F. Zhang, M. Li, and H. J. Zhang, Web mining for web image retrieval, Journal of the
American Society for Information Science and Technology 52(10), August 2001, 831–839.
[5] I. J. Cox, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, The Bayesian image retrieval system,
PicHunter: Theory, implementation, and psychophysical experiments, IEEE Transactions on Image
Processing, Special Issue on Digital Libraries, 2000.
[6] M. Flickner, H. Sawhney, W. Niblack et al., Query by image and video content: The QBIC system, IEEE
Computer Magazine 28, 1995, 23–32.
[7] J. Huang, S. R. Kumar, and M. Mitra, Combining supervised learning with color correlograms for content-
based image retrieval, in Proceedings of ACM Multimedia'97, November 1997, pp. 325–334.
[8] Y. Ishikawa, R. Subramanya, and C. Faloutsos, Mindreader: Query databases through multiple examples,
in Proceedings of the 24th VLDB Conference, New York, 1998.
[9] F. Jing, M. Li, H. J. Zhang, and B. Zhang, An effective region-based image retrieval framework, in
Proceedings of ACM Multimedia 2002, Juan-les-Pins, France, December 16, 2002.
[10] J. Laaksonen, M. Koskela, and E. Oja, PicSOM: Self-organizing maps for content-based image retrieval,
in Proceedings of International Joint Conference on NN, July 1999.
[11] C. Lee, W. Y. Ma, and H. J. Zhang, Information embedding based on users relevance feedback for image
retrieval, in Proceedings of SPIE International Conference on Multimedia Storage and Archiving Sys-
tems IV, Boston, 19–22 September 1999.
[12] Y. Lu et al., A unified framework for semantics and feature based relevance feedback in image retrieval
systems, in Proceedings of ACM MM2000, 2000.
[13] S. D. MacArthur, C. E. Brodley, and C.-R. Shyu, Relevance feedback decision trees in content-based image
retrieval, in IEEE Workshop on Content-Based Access of Image and Video Libraries, 2000, pp. 68–72.
[14] T. Minka and R. Picard, Interactive learning using a Society of Models, Pattern Recognition 30(4), 1997.
[15] T. Mitchell, Machine Learning, McGraw-Hill, 1997.
[16] J. J. Rocchio Jr., Relevance feedback in information retrieval, in The SMART Retrieval System: Experi-
ments in Automatic Document Processing, ed. G. Salton, Prentice-Hall, 1971, pp. 313–323.
[17] Y. Rui and T. S. Huang, A novel relevance feedback technique in image retrieval, in Proceedings of 7th
ACM Conference on Multimedia, 1999.
[18] Y. Rui, T. S. Huang, and S. Mehrotra, Content-based image retrieval with relevance feedback in MARS,
in Proceedings of IEEE International Conference on Image Processing, 1997.
[19] G. Salton, Automatic Text Processing, Addison-Wesley, Reading, MA, 1989.
[20] G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
[21] S. Sclaroff, L. Taycher, and M. L. Cascia, ImageRover: a content-based image browser for the World Wide
Web, Technical Report 97-005, Boston University CS Dept., 1997.
[22] H. T. Shen, B. C. Ooi, and K. L. Tan, Giving meanings to WWW images, in Proceedings of ACM
MM2000, 2000, pp. 39–48.
[23] Z. Su, S. Li, and H. J. Zhang, Extraction of feature subspaces for content-based retrieval using relevance
feedback, in ACM Multimedia 2001, Ottawa, Canada, 2001.
[24] Z. Su, H. J. Zhang, and S. Ma, Relevant feedback using a Bayesian classifier in content-based image
retrieval, in SPIE Electronic Imaging 2001, San Jose, CA, January 2001.
[25] K. Tieu and P. Viola, Boosting image retrieval, in IEEE Conference on Computer Vision and Pattern
Recognition, 2000.
[26] S. Tong and E. Chang, Support vector machine active learning for image retrieval, in ACM Multimedia
2001, Ottawa, Canada, 2001.
[27] N. Vasconcelos and A. Lippman, Learning from user feedback in image retrieval systems, in NIPS99,
Denver, CO, 1999.
[28] P. Wu and B. S. Manjunath, Adaptive nearest neighbour search for relevance feedback in large image
database, in ACM Multimedia Conference, Ottawa, Canada, 2001.
[29] Y. Wu, Q. Tian, and T. S. Huang, Discriminant EM algorithm with application to image retrieval, in IEEE
CVPR, South Carolina, 2000.
[30] H. J. Zhang and D. Zhong, A scheme for visual feature based image indexing, in Proceedings of
IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases III, 1995, pp. 36–46.