
  • 8/8/2019 CBIR06

    1/25

World Wide Web: Internet and Web Information Systems, 6, 131–155, 2003

© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Relevance Feedback and Learning in Content-Based Image Search

    HONGJIANG ZHANG, ZHENG CHEN, MINGJING LI and ZHONG SU [email protected]

    Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, China

    Abstract

    A major bottleneck in content-based image retrieval (CBIR) systems or search engines is the large gap between

    low-level image features used to index images and high-level semantic contents of images. One solution to this

bottleneck is to apply relevance feedback to refine the query or similarity measures in the image search process.

    In this paper, we first address the key issues involved in relevance feedback of CBIR systems and present a

brief overview of a set of commonly used relevance feedback algorithms. We then present a framework of relevance feedback and semantic learning in CBIR, into which almost all of the previously proposed methods fall. In this framework, low-level features and keyword annotations are integrated in the image retrieval and feedback processes to improve retrieval performance. We have also extended the framework to a content-based

    web image search engine in which hosting web pages are used to collect relevant annotations for images and

users' feedback logs are used to refine annotations. A prototype system has been developed to evaluate the proposed schemes, and our experimental results indicate that our approach outperforms traditional CBIR systems and

    relevance feedback approaches.

    Keywords: image retrieval, relevance feedback, machine learning, web mining

    1. Introduction

    The popularity of digital images is rapidly increasing due to improving digital imaging

    technologies and convenient availability facilitated by the Internet. However, how to find

    user-intended images from the Internet is still non-trivial. The main reason is that web

images are usually not well annotated with semantic descriptors. The development history of image retrieval systems features two stages. The first stage is keyword-based image

    retrieval, which is summarized by Chang et al. [2]. Since manual image annotation is

    a tedious process, it is practically impossible to annotate all the images on the Internet.

    Furthermore, due to the multiplicity of contents in a single image and the subjectivity of

human perception, it is also difficult for different users to annotate the same image in exactly the same way. These difficulties have limited the applications of keyword-based image retrieval technology. Having been actively researched in the last decade

    [6,30], content-based image retrieval (CBIR) attempts to automate the process of indexing

or annotating images in image databases. CBIR approaches work with descriptions based

    on inherent properties of images, such as color, texture and shape. However, despite all

This paper is based on the invited keynote that the first author gave at VDB2002, Brisbane, Australia, May 2002.


    132 ZHANG ET AL.

the research efforts, the retrieval accuracy of today's CBIR algorithms is still very limited.

    In addition to many other difficulties, the bottleneck is the gap between low-level image

features and semantic image contents. This problem stems from the fact that visual similarity measures, such as color histograms, do not necessarily match the semantics of images as perceived subjectively by humans. Also, each type of visual feature tends to capture only one aspect of an image's properties, and it is usually hard for a user to specify clearly how different

    aspects are combined to form an optimal query. To make the problem even worse, people

often have different semantic interpretations of the same image. Even the same person may have different perceptions of the same image at different times. To address this

    bottleneck, interactive relevance feedback techniques have been proposed. The key idea

is to incorporate human perception subjectivity into the retrieval process and provide users with opportunities to evaluate retrieval results, automatically refining queries on the basis of those evaluations. In the last few years, this research topic has become a focus of the CBIR research community.

Relevance feedback, originally developed for textual document retrieval [16], is a supervised active learning technique used to improve the effectiveness of information systems.

    The main idea is to use positive and negative examples from the user to improve system

    performance. For a given query, the system first retrieves a list of ranked images according

to a predefined similarity metric. Then, the user marks the retrieved images as relevant (positive examples) to the query or not (negative examples). The system then refines the query based on the feedback, retrieves a new list of images, and presents it to the user. Hence,

    the key issue in relevance feedback is how to incorporate positive and negative examples

    to refine the query and/or to adjust the similarity measure.

In this paper, we present a content-based image retrieval framework that integrates low-level and semantic-based image similarities and supports automated annotation through learning from relevance feedback, together with the extension of the framework to a web image

search engine. Instead of giving a detailed description of the novel component algorithms, we focus our description on the key ideas in the framework. Details of the algorithms and the

framework implementation can be found in references [4,9,12,23,24]. Also, since we want the paper to serve as a reference on the current state of the art of CBIR relevance feedback research, a comprehensive survey of relevance feedback algorithms, in terms of their natures and limitations, is presented in this paper.

There are many issues in relevance feedback approaches to CBIR, such as learning schemes, feature selection, index structure, and scalability. Instead of giving an exhaustive survey of each published relevance feedback algorithm for CBIR in terms of its advantages and limitations, we focus our discussion on the view that relevance feedback in CBIR is a small-sample machine learning problem, and describe in detail the learning and searching natures of each algorithm. This is presented

    in Section 2.

In Section 3, we present the integrated relevance feedback framework for CBIR. In this framework, while the user is interacting with the system by providing feedback in a query session, a progressive learning process is activated to propagate the

keyword annotations from the labeled images to unlabeled images as the system refines the retrieval. The knowledge learned in the relevance feedback sessions is accumulated


    RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 133

in a semantic network. In addition, a cross-modality query expansion scheme is implemented to improve retrieval performance significantly, whether a query is initiated with a keyword or with an example image.

    The proposed framework has been further extended to a web image search system, as

    presented in Section 4. In this extension, we combine visual features and text descriptors

initially extracted from the web pages where the images reside, such as image URLs, filenames, page titles, ALT text, hyperlinks, and surrounding text. These visual and textual features build the document space model of the images. However, the initial text descriptors are in general less accurate than manually annotated text, and there is often a mismatch between the page author's expression and the user's understanding and expectation of the annotation.

To overcome these problems, we apply the proposed relevance feedback framework. Furthermore, data mining technology is also applied to the users' feedback logs to improve image retrieval performance in two aspects. Firstly, the original document space model

    built from the images and the text content of the web pages can be analyzed to detect and

remove clutter and irrelevant text information. Secondly, the user space model, which is the

    keyword vectors used by the users to represent images in the database, can be constructed

from the users' relevance feedback log data. The user space model is then combined with the document space model to eliminate the mismatch between the page author's expression and the user's understanding and expectation.

    2. Relevance feedback algorithms

    In this section, we review a set of relevance feedback approaches used in CBIR. The review

is focused on the learning and searching natures of each relevance feedback algorithm, as we consider relevance feedback in CBIR to be a machine learning problem. We begin the

    discussion by first providing an overview of classical relevance feedback approaches in

    CBIR.

    2.1. Classical algorithms

The early relevance feedback schemes for CBIR were mainly adopted from those developed for classical textual document retrieval. These schemes can be classified into two approaches: query point movement (query refinement) and re-weighting (similarity measure refinement) [1]. Both were developed based on the vector space model, the most popular model used in information retrieval [20].

    The query point movement method essentially tries to improve the estimate of the ideal

query point by moving it towards positive example points and away from negative example points in the query space. There are various ways to update the query. The most frequently used technique to iteratively improve this estimate is Rocchio's formula, given below for

sets of relevant documents D_R and non-relevant documents D_N given by the user [16]:

Q' = \alpha Q + \beta \frac{1}{N_R} \sum_{i \in D_R} D_i - \gamma \frac{1}{N_N} \sum_{i \in D_N} D_i,   (1)


where \alpha, \beta, and \gamma are suitable constants, and N_R and N_N are the numbers of documents in D_R and D_N, respectively. This technique is also referred to as learning the query vector. It was implemented in the MARS system [18] by replacing the document vectors with visual feature vectors. Experiments show that retrieval performance can be improved considerably by using such relevance feedback approaches.
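Rocchio's update of equation (1) can be sketched in a few lines; a minimal illustration with plain feature vectors, where the values chosen for the constants \alpha, \beta, and \gamma are illustrative (the formula does not fix them):

```python
# Sketch of Rocchio's query-update formula (Eq. 1), applied to visual
# feature vectors as in MARS. Vectors are plain Python lists; alpha,
# beta, gamma are illustrative constants, not values from the paper.

def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query toward the mean of the relevant examples and away
    from the mean of the non-relevant examples."""
    dim = len(query)

    def mean(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]

    mr, mn = mean(relevant), mean(non_relevant)
    return [alpha * query[j] + beta * mr[j] - gamma * mn[j] for j in range(dim)]

# Example: two relevant and one non-relevant feedback vector.
q = [1.0, 0.0]
new_q = rocchio_update(q, relevant=[[2.0, 2.0], [4.0, 0.0]], non_relevant=[[0.0, 4.0]])
# → [3.25, -0.25]: the query has moved toward the relevant cluster.
```

Each feedback round feeds the returned vector back in as the next query, which is exactly the iterative improvement the text describes.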

The basic idea behind the re-weighting method is to enhance the importance of the dimensions of a feature that help in retrieving the relevant images and to reduce the importance of those dimensions that hinder this process. This is achieved by updating the weights of the feature vectors in the distance metric. Consider a weighted metric defined as

D = \sum_{j \in [N]} \omega_j \left| X^{(1)}_j - X^{(2)}_j \right|.   (2)

When an image in the query result is labeled as a positive example, the feature components that contribute more similarity to the match are considered more important, while the components that contribute less are considered less important. Therefore, the weight for a feature component, \omega_i, is updated in the following way:

\omega_i = \omega_i (1 + \bar{\delta} - \delta_i), \quad \delta = \left| f(Q) - f(A^+_j) \right|,   (3)

where \bar{\delta} is the mean of \delta. On the other hand, if an image is labeled as a negative example, the feature components that contribute more to the match should be depressed. That is, the weight is updated as:

\omega_i = \omega_i (1 - \bar{\delta} + \delta_i).   (4)

This technique is also referred to as learning the metric. A similar approach was proposed by Huang et al. [7]. The MARS system implemented a slight refinement of the re-weighting method, called the standard deviation method [18].
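The re-weighting rules of equations (2)–(4) can be sketched as follows; a minimal illustration that interprets \delta_i as the per-component mismatch |q_i - e_i| between the query and a feedback example (an assumption, since the transform f is not detailed here):

```python
# Sketch of the re-weighting method (Eqs. 2-4). delta_i is taken as the
# per-component mismatch between query and feedback example; this is one
# interpretation of the transform f in Eq. (3), not the paper's exact form.

def weighted_distance(x1, x2, weights):
    """Weighted L1 metric of Eq. (2)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x1, x2))

def update_weights(weights, query, example, positive=True):
    """Raise weights of components that matched well (small delta) for a
    positive example (Eq. 3); lower them for a negative one (Eq. 4)."""
    deltas = [abs(q - e) for q, e in zip(query, example)]
    mean_delta = sum(deltas) / len(deltas)
    if positive:
        return [w * (1 + mean_delta - d) for w, d in zip(weights, deltas)]
    return [w * (1 - mean_delta + d) for w, d in zip(weights, deltas)]

q = [0.5, 0.5]
pos = [0.5, 0.9]   # matches the query well on the first component only
w = update_weights([1.0, 1.0], q, pos, positive=True)
# The first component's weight rises above the second's.
```

The updated weights then plug straight back into `weighted_distance` for the next retrieval round.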

    Instead of updating the individual components of a distance metric, we can also begin

    with a set of predefined distance metrics and use relevance feedback to automatically select

the best one during the retrieval process. For instance, in the ImageRover system [21], an appropriate Lp Minkowski distance metric is automatically selected to minimize the mean distance between the relevant images specified by the user.

    Another relevance feedback approach, proposed by Minka and Picard, is to update the

    query space by selecting feature models. It is assumed that each feature model has its

own strength in representing a certain aspect of image content, and thus the best way to achieve effective content-based retrieval is to utilize a society of models. This approach uses a learning scheme to dynamically determine which feature model or combination of models

    is best for subsequent retrieval.

    Recently, more computationally robust methods that perform global feature optimization

    have been proposed. The MindReader retrieval system designed by Ishikawa et al. [8]

formulates parameter estimation as a minimization problem. Unlike traditional retrieval systems, whose distance functions can be represented by ellipses aligned with the coordinate axes, the MindReader system proposed a distance function that is not necessarily


aligned with the coordinate axes. Therefore, it allows for correlations between attributes in

    addition to different weights on each component.

A further improvement over the MindReader approach is given in [17]. In this approach, optimal query estimation and weighting functions are derived in a unified framework. Based on the minimization of the total distance of the positive examples from the revised query, the weighted average and a whitening transform in the feature space were found to

be the optimal solutions. In more detail, assume that a query vector component q_i corresponds to the ith feature, that an N-element vector r = [r_1, ..., r_N] represents the degree of relevance of each of the N input training samples, and that there is a set of N training vectors x_{ni} for each feature i. It is derived that the ideal query vector q_i for feature i is the weighted average of the training samples for feature i, given by

q_i^T = \frac{r^T X_i}{\sum_{n=1}^{N} r_n},   (5)

where X_i is the N \times K_i training sample matrix for feature i, obtained by stacking the N training vectors x_{ni} into a matrix. It is interesting to note that the original query vector q_i does not appear in (5). This shows that the ideal query vector with respect to the feedbacks is not influenced by the initial query.

    The optimal weight matrix Wi is given by

W_i = \det(C_i)^{1/K_i} C_i^{-1},   (6)

    where Ci is the weighted covariance matrix of Xi . That is,

C_{i,rs} = \frac{\sum_{n=1}^{N} r_n (x_{nir} - q_{ir})(x_{nis} - q_{is})}{\sum_{n=1}^{N} r_n}, \quad r, s = 1, \ldots, K_i.   (7)

We can see from the above equations that the critical inputs to the system are the training vectors x_{ni} and the relevance vector r. In this algorithm, the user initially needs to input these data into the system. Another issue with this algorithm is that negative examples are not utilized in updating the query and the similarity measure.
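Equation (5) is just a relevance-weighted average of the feedback samples, which can be sketched directly; a minimal illustration with toy vectors:

```python
# Sketch of the optimal query estimate of Eq. (5): the ideal query for
# feature i is the relevance-weighted average of the training vectors,
# independent of the initial query.

def optimal_query(training_vectors, relevance):
    """training_vectors: N feedback sample vectors (rows of X_i);
    relevance: the N degrees of relevance r_n.
    Returns q_i = (r^T X_i) / sum(r_n)."""
    total = sum(relevance)
    dim = len(training_vectors[0])
    return [
        sum(r * x[j] for r, x in zip(relevance, training_vectors)) / total
        for j in range(dim)
    ]

# Two feedback samples, the first judged twice as relevant as the second.
q = optimal_query([[1.0, 0.0], [4.0, 3.0]], relevance=[2.0, 1.0])
# → [2.0, 1.0]
```

Note that, as the text observes, the initial query never enters the computation; only the feedback samples and their relevance degrees do.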

    2.2. Relevance feedback as a learning process

Relevance feedback can be considered a learning problem: a user provides feedback examples from the retrieval results of a query, and the system learns from such examples to refine the retrieval results. The original query-movement method represented by Rocchio's formula and the re-weighting method [16] are both simple learning methods. According

to Mitchell's [15] definition, machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. In this view, any task that can be improved with respect to a certain performance measure based on some experience can be considered a machine-learning task. In CBIR, relevance feedback is

    a task to improve the retrieval performance and the experience here is feedback examples

    provided by the users. Hence, classical machine-learning methods, such as decision tree


    learning [13], artificial neural networks [10], Bayesian learning [5,27], and kernel based

    learning [26] can be and have been applied to relevance feedbacks in CBIR. However, as

    users are usually reluctant to provide a large number of feedback examples, the number of

training samples is very small, typically fewer than ten in each round of a feedback session. In contrast, feature dimensions in CBIR systems are usually high. Hence, the crucial issue in performing relevance feedback in CBIR systems is how to learn from small training samples in a very high-dimensional feature space. This fact makes many learning methods, such as decision tree learning and artificial neural networks, unsuitable for CBIR.

The key issues in addressing relevance feedback in CBIR as a small-sample learning problem include: how to learn quickly from small sets of feedback samples to improve retrieval accuracy effectively; how to accumulate the knowledge learned from feedback; and how to integrate low-level visual and high-level semantic features in queries. However, most

of the published work has focused on the first issue. Compared with other learning methods, Bayesian learning shows advantages in addressing the first issue, and almost all aspects of Bayesian learning have been explored in the search for effective learning algorithms.

    Vasconcelos and Lippman [27] treated feature distribution as a Gaussian mixture and

    used Bayesian inference for learning during feedback iterations in a query session. Richer

    information captured by the mixture model also makes image regional matching possible.

The potential problems of their method are computing efficiency and a complex data model that requires too many parameters to be estimated from very limited samples.

To speed up the learning process so that the retrieval result converges faster to the user's satisfaction, active learning methods have been used to actively select samples in order to achieve the maximal information gain, or the minimum entropy/uncertainty in decision-making. The approach proposed in [5] used Monte Carlo sampling to search for the set of samples that minimizes the expected number of future iterations. In estimating

    the expected number of future iterations, entropy is used as an estimate of the number of

    future iterations under the ambiguity specified by the current probability distribution of the

target image over all test images. Tong and Chang [26] proposed an SVM active learning

algorithm to select the samples that maximally reduce the size of the version space in which

    the class boundary lies. Without knowing a priori the class of a candidate, the best strategy

    is to halve the search space each time. They attempted to justify that selecting the points

    near the SVM boundary can approximately achieve this goal, and it is more efficient than

    other more sophisticated schemes, which require exhaustive trials on all the test items.

Therefore, in their work, the points near the SVM boundary are used to approximate the most-informative points, and the most-positive images are chosen as the ones farthest from the boundary on the positive side of the feature space.

Some researchers consider the relevance feedback process in CBIR as a pattern recognition

or classification problem. Under such a view, the positive and negative examples provided by the user can be treated as training examples with which a classifier is trained. Such a classifier can then separate the whole data set into relevant and irrelevant groups. It seems that many existing pattern recognition tools could be adopted for this task, and many kinds of classifiers have been experimented with, such as linear classifiers [29], nearest-neighbor classifiers [28], Bayesian classifiers [24], support vector machines (SVM) [26], and so on. In


this category, the most popular algorithm is represented by [26], where an SVM classifier is trained to divide the positive and negative examples. The SVM classifier then classifies all images in the database into two groups: those relevant and those irrelevant to a given query. However, in most cases of CBIR, there is no predefined class structure. From an application point of view, such classification-based methods may improve retrieval performance in some constrained contexts, but they will be limited when applied to general-purpose image databases.
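The classification view above can be sketched with a toy linear classifier; a perceptron stands in here for the SVM of [26] (an assumption made for brevity; both learn a linear decision boundary), and the one-dimensional "brightness" feature is invented purely for illustration:

```python
# Sketch of the classification view of relevance feedback: the user's
# positive/negative examples train a classifier, which then splits the
# database into relevant and irrelevant groups. A perceptron is used as
# a stand-in for the SVM; the toy feature values are illustrative.

def train_perceptron(samples, labels, epochs=50):
    """labels are +1 (relevant) / -1 (irrelevant)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Update on every sample that is misclassified (or on the margin).
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def classify(w, b, x):
    """+1 = relevant group, -1 = irrelevant group."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Feedback: bright images marked relevant, dark ones irrelevant.
w, b = train_perceptron([[0.9], [0.8], [0.1], [0.2]], [1, 1, -1, -1])
```

After training, `classify` can be run over every image in the database, which is exactly the two-group split described above; the lack of a predefined class structure is what limits this in general-purpose databases.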

    2.3. Feature versus semantics in relevance feedback

All the approaches described above perform relevance feedback at the low-level feature vector level, basically replacing keywords with features when adopting the vector space model developed for document retrieval. While these approaches do improve the performance of CBIR, there are severe limitations. The inherent problem is that the low-level

    features are often not as powerful in representing complete semantic content of images as

keywords in representing text documents. Furthermore, users often pay more attention to the semantic content (or a certain object/region) of an image than to the background and other details; as a result, the feedback images may be similar only partially in semantic content but may vary largely in low-level features. Hence, using low-level features alone may not be effective in representing users' feedback and in describing their intentions.

    In addition, there are typically two different modes of user interactions involved in image

retrieval systems. In one case, the user types in a list of keywords representing the semantic contents of the desired images. In the other case, the user provides a set of example images as the input and the retrieval system retrieves other similar images. In most

    image retrieval systems, these two modes of interaction are mutually exclusive. However,

combining these two approaches and allowing them to benefit from each other yields a great deal of advantage in terms of both retrieval accuracy and ease of use of the system.

    There have been efforts on incorporating semantics in relevance feedback for image

    retrieval. The framework proposed in [11] (to be discussed later in more detail in this

section) attempted to embed semantic information into a low-level feature-based image retrieval process using a correlation matrix. The FourEyes system by Minka and Picard [14] and the PicHunter system by Cox et al. [5] made use of hidden annotation through a learning process. However, they excluded the possibility of benefiting from good annotations, which may lead to very slow convergence.

    In terms of feature selection, unlike most CBIR systems that use image features such

    as color histogram or moments, texture, shape, and structure features, Tieu and Viola [25]

    used a boosting technique to learn a classification function in a feature space of more than

    45,000 features. The features were demonstrated to be sparse with high kurtosis, and were

argued to be expressive for high-level semantic concepts. Weak two-class classifiers were formulated based on a Gaussian assumption for both the positive and negative (randomly

    chosen) examples along each feature component, independently. The strong classifier is

    then a weighted sum of the weak classifiers as in AdaBoost.


    The framework to be discussed in Section 3 integrates both semantics and low-level

features into the relevance feedback process in a new way. Only when semantic information is unavailable does the method reduce to one of the previously described low-level feedback approaches, as a special case.

    2.4. Relevance feedback with memory

A disadvantage of classic relevance feedback, as well as of many of the learning-based approaches discussed above, is that the knowledge captured in the relevance feedback process in one query session or one learning step is not memorized to continuously improve retrieval accuracy. That is, even with the same query, a user will have to go through the same, often tedious, feedback process to obtain the same result, despite the fact that the user has given the same query and feedback before. Strictly speaking, there is no

    learning or only limited learning in such systems as there is no knowledge accumulation

across different query sessions. To overcome these limitations, another school of thought is to use learning approaches to memorize users' subjectivity in the relevance feedback process. The challenge in this approach is how to memorize the knowledge learned and how to handle the inconsistency of content subjectivity across different users and/or across different

    query sessions of the same user.

The approach proposed in [11] was the first attempt to explicitly memorize learned semantic information to improve CBIR performance. The basic idea of this approach is to accumulate the semantic relevance between image clusters learned from users' feedback in a correlation network. In other words, a correlation network is used as the memory. Figure 1 illustrates the correlation network. Mathematically, the correlation network is represented by a correlation matrix, M, defined as below:

M = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ w_{N1} & w_{N2} & \cdots & w_{NN} \end{pmatrix},   (8)

where the weight or coefficient w_{ij} represents the semantic correlation between the images in clusters i and j.

The system works as follows. First, all images in a database are clustered into N clusters based on visual feature similarity using, for instance, the k-means algorithm. Obviously, the images in each cluster are initially similar only in terms of the selected visual features, as in a typical CBIR system. Also, initially, all correlation coefficients between any two clusters are set to zero, meaning that only images within the same cluster are correlated and images across clusters are uncorrelated. That is, the initial matrix is the identity,

M_0 = I_{N \times N}.   (9)

    Then, for a given query, the initial retrieval is based on visual features. Assume that after

    a given iteration, n + m images are displayed, and n images are marked relevant and


    Figure 1. Correlation network to memorize semantic correlations between image groups.

m irrelevant. The relevant as well as irrelevant images may or may not come from different clusters. This approach memorizes such feedback by updating the correlation matrix as below:

M_t = M_{t-1} + \sum_{i=1}^{n} F(q) F(p_i)^T - \sum_{i=1}^{m} F(q) F(n_i)^T,   (10)

where q is the feature vector of the query, p_i and n_i are the feature vectors of the positive and negative feedback samples, and F(x) is a transform function used to determine the update magnitude based on the feedback samples. In this way, the correlation between the cluster in which the query originally falls and those in which the positive samples fall is increased, progressively embedding information on the semantic correlations between images. This correlation is then used in subsequent retrievals, in which not only the visual features but also the semantic correlations are used in determining the similarity of an image to the query.

    Experiments have shown that such a progressive learning approach effectively utilizes the

    knowledge learnt from previous queries to reduce the number of iterations to achieve high

    retrieval accuracy [11].
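The update of equation (10) can be sketched as follows; a minimal illustration that takes F to map an image to the one-hot indicator of its cluster (so each feedback adjusts a single matrix entry), with an illustrative step size and a symmetrized update, both of which are assumptions rather than the paper's exact choices:

```python
# Sketch of the correlation-network update of Eq. (10). Clusters are
# indexed 0..N-1. F is taken as the one-hot cluster indicator, so each
# outer product F(q)F(p)^T touches one entry; the step size and the
# symmetric update are illustrative assumptions.

def identity_matrix(n):
    """Eq. (9): the initial correlation matrix M0 = I."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def update_correlation(M, query_cluster, positive_clusters, negative_clusters, step=0.1):
    """Strengthen correlation between the query's cluster and the clusters
    of positive samples; weaken it for the clusters of negative samples."""
    for c in positive_clusters:
        M[query_cluster][c] += step
        M[c][query_cluster] += step   # keep M symmetric (an assumption)
    for c in negative_clusters:
        M[query_cluster][c] -= step
        M[c][query_cluster] -= step
    return M

M = identity_matrix(3)
M = update_correlation(M, query_cluster=0, positive_clusters=[1], negative_clusters=[2])
# M[0][1] has grown, M[0][2] has shrunk; the diagonal is untouched.
```

Repeated over many sessions, the off-diagonal entries accumulate exactly the cross-cluster semantic correlations the text describes.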


Also, if there are two distinct groups in one initial cluster that are semantically dissimilar, meaning that they are negative examples to each other, a splitting is performed to split the initial cluster into two clusters. On the other hand, when feedback indicates that two clusters are close in feature space and have a high correlation between them according to M, the two initial clusters can be merged into one. That is, the correlation network dynamically updates its structure, in addition to updating the correlation matrix, as it learns from user feedback.

    2.5. Log mining in relevance feedback

More recently, researchers have become aware of the fact that the Web is a rich resource of image data and that some of the images' semantics are usually available in the same web documents. Shen et al. [22] exploit this reality and use natural language processing techniques to obtain semantic features from the web text to characterize the web images. Hence, they are able to find relevant images on the web using text-based queries. In our work on a web image search

engine, we also use web pages as potential sources of semantics. There are two main differences between the two systems. The first difference is in the natural language processing approach to obtaining semantic features. They use a so-called weighted chain-net, which

    is actually a lexical chain, to represent the document space model for images, while our

    document space model of all media objects is simply a vector space model, which is an

effective approach and has been widely used in traditional information retrieval. Other natural language processing methods, such as proper noun identification, are also used to extract semantic features. The other difference is that our system exploits relevance feedback and data mining on the users' feedback logs to update the document space model. As a result, our approach outperforms traditional CBIR systems and relevance feedback approaches.
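The vector space document model mentioned above can be sketched as follows; a minimal illustration in which term weights are raw counts and the terms gathered from the page are hypothetical (the actual system's weighting of fields such as ALT text or page titles is not specified here):

```python
# Sketch of a vector space document model for web images: each image is
# represented by a term-weight vector built from text on its hosting page,
# and text queries are matched by cosine similarity. Raw term counts are
# used as weights for simplicity; the example terms are invented.

import math

def make_vector(terms):
    """Build a sparse term-count vector from a list of terms."""
    vec = {}
    for t in terms:
        vec[t] = vec.get(t, 0.0) + 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Terms gathered from the filename, ALT text and surrounding text of one image.
doc = make_vector(["tiger", "zoo", "tiger", "photo"])
query = make_vector(["tiger"])
score = cosine(query, doc)   # high for pages that mention the query term often
```

Relevance feedback and log mining would then adjust the entries of `doc` over time, which is the "updating the document space model" step described above.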

    3. An integrated relevance feedback framework

As discussed in Section 2, an effective relevance feedback system should learn effectively from small sets of feedback samples, accumulate the knowledge learned, and integrate low-level visual and high-level semantic features in queries and feedback in order to achieve high retrieval accuracy.

    In addition, there typically are two different modes of user interactions involved in image

    retrieval systems. In one case, the user types in a list of keywords representing the semantic

contents of the desired images. In the other case, the user provides a set of example images as the input and the retrieval system will try to retrieve other similar images. In most image retrieval systems, these two modes of interaction are mutually exclusive. We argue that combining these two approaches and allowing them to benefit from each other yields a great

    deal of advantage in terms of both retrieval accuracy and ease of use of the system.

To address all of the above-mentioned issues, a CBIR framework with integrated relevance feedback and query expansion was proposed [9,12,23,24]. Figure 2 illustrates the proposed CBIR framework. It consists of a semantic network which links images to semantic annotations in a database, a similarity measure that integrates both semantic features and


    Figure 2. The proposed framework of integrated relevance feedback and query expansion.

    Figure 3. Semantic network.

image features, and a machine learning algorithm to iteratively update the semantic network and to improve the system's performance over time. The system supports both query by keyword and query by image example through the semantic network and low-level feature indexing. More importantly, the learning process propagates the keyword annotations from the labeled images to unlabeled ones during the feedback. In this way, more and more images are implicitly labeled with keywords by the semantic propagation process. This annotation propagation process also helps the system accumulate learned knowledge to improve the performance of future retrieval requests.

    3.1. Semantic network

The semantic network is a two-layered structure. The top layer is represented by a set of keywords having links to the images in the database. It can be considered an extension of the initial information embedding idea in the system shown in Figure 1. The degree of relevance of a keyword to the associated image's semantic content is represented as the weight on the link, as shown pictorially in Figure 3. This layer is what we need in


keyword relevance feedback and will be updated during the semantic propagation. The bottom layer is a keyword thesaurus that constructs the connections between different keywords.

The initial weights can be obtained by manual labeling. In our web image search engine, they are initially extracted from the following sources on the web page that contains the image, according to some empirical rules.

1. Image filename and URL. We assume that web page authors/editors usually assign meaningful filenames to images in a web page. Some heuristic rules are used to extract the keywords from the filenames. First, the filename is segmented into meaningful keywords based on a pre-defined dictionary. For example, the filename redflower.jpg includes two semantic words: "red" and "flower". Then, the clutter characters in filenames, such as digits, hyphens, filename extensions, etc., are discarded. We also extract semantic keywords from the URL of the image file. The URL usually represents the hierarchy information of an image on the web page. For instance, "animal" and "bird" are useful information in the URL http://www.ditto.com/images/animals/anim_birds.jpg. We apply a similar technique to the filename segmentation to segment the URL into meaningful pieces.

2. ALT (alternate) text. The ALT text in a web page is displayed in place of the associated image in a text-based browser. Hence, it usually represents the semantics of the image concisely and is a very relevant feature for representing the semantic meaning of the image.

3. Surrounding text. In web pages, images are used to enhance the content that the editors want to present. Hence, some text in the surrounding areas is semantically relevant to the content of the image. However, it is difficult to judge which area among the four possible areas (above, below, left, right) is the most relevant to the image. Therefore, in our prototype, all four areas are chosen as the sources of the text features for the image. This feature will be refined by log mining on the users' relevance feedback logs, as discussed in Section 4.

4. Page title. The page title is a good candidate text feature for images in a web page.
5. Other information. Image hyperlinks, anchor text, etc., are also candidates for text features of the images.

The initial value of the weight wij associated with each keyword of an image is calculated by the TF*IDF method [19]. That is, a feature vector is used to represent all the keywords of an image, and the vector is defined as

    Dih = TFi · IDFi = ( ti1 log(N/n1), ..., tij log(N/nj), ..., tim log(N/nm) ),   (11)

where Dih is the feature vector, with each component value corresponding to the initial weight assigned to the association of a keyword with image i. tij stands for the frequency of keyword j appearing in the text description of image i, nj is the number of images that are characterized by keyword j, and N is the total number of images. Of course, if no keyword information is available for an image, the corresponding feature vector is set to null.
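The TF*IDF weighting of Eq. (11) can be sketched directly. The dictionary-based representation and argument names are our own illustrative choices.

```python
import math

def tfidf_vector(term_freqs, doc_freqs, num_images):
    """Initial keyword weights for one image, following Eq. (11):
    component j is tf_ij * log(N / n_j).

    term_freqs: {keyword: frequency in this image's text description}
    doc_freqs:  {keyword: number of images characterized by the keyword}
    num_images: total number of images N
    """
    return {
        kw: tf * math.log(num_images / doc_freqs[kw])
        for kw, tf in term_freqs.items()
    }
```

A keyword that is frequent for this image but rare across the collection (e.g. "flower") receives a larger initial weight than a common one.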


With the semantic network, semantic-based relevance feedback can be performed relatively easily compared to its low-level feature counterpart. This is done by updating the weights wij associated with each link shown in Figure 3. The weight updating process is described below.

1. A user submits a query and the system retrieves similar images using cross-modality query expansion, to be explained in the next subsection.
2. The system collects the positive and negative feedback examples corresponding to the query.
3. For each keyword in the input query, check whether it is already in the keyword database. If not, add it into the database without creating any links.
4. For each positive example, check whether any query keyword is not linked to it. If so, create a link with an initial weight from each missing keyword to this image. For all other query keywords that are already linked to this image, increase the weight by a predefined value or using the method defined by (10) and (11).
5. Similarly, for each negative example, check whether any query keyword is linked to it. If so, decrease its weight, but not below zero.

Through this updating process, the keywords that represent the actual semantic content of each image will receive larger weights. Also, it can easily be seen that as more queries are input into the system, the system is able to expand its vocabulary. Furthermore, a semantic propagation method is used to propagate keywords to unlabeled images during users' feedback iterations, as described later in this section.
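Steps 3-5 above can be sketched as follows. The link representation, the initial weight, and the fixed step size are simplifying assumptions for illustration; the paper's Eqs. (10)-(11) give the more principled update.

```python
# Assumed constants for illustration only.
INITIAL_WEIGHT, STEP = 1.0, 0.2

def update_semantic_network(links, vocabulary, query_keywords,
                            positives, negatives):
    """links: {(keyword, image_id): weight}; vocabulary: set of keywords.
    Mutates both structures per the feedback-update steps and returns links."""
    vocabulary.update(query_keywords)          # step 3: expand the vocabulary
    for img in positives:                      # step 4: reinforce/create links
        for kw in query_keywords:
            if (kw, img) not in links:
                links[(kw, img)] = INITIAL_WEIGHT
            else:
                links[(kw, img)] += STEP
    for img in negatives:                      # step 5: weaken existing links
        for kw in query_keywords:
            if (kw, img) in links:
                links[(kw, img)] = max(0.0, links[(kw, img)] - STEP)
    return links
```

Note that negative examples never create new links: a keyword's weight is only decreased where a link already exists, and is clipped at zero.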

    3.2. Integrated and cross modality query and retrieval

The proposed framework has an integrated relevance feedback scheme in which both low-level feature based and high-level semantic feedbacks are performed. We define a unified metric function G to measure the relevance between a query Q and any image j within an image database in terms of both semantic and low-level feature content, where Q includes the original query and the user's feedback information:

    G(j, Q) = α · simk(j, Qk) + (1 − α) · simf(j, Qf),   (12)

where α ∈ [0, 1] is the weight of the semantic relevance in the overall similarity measure, which can be specified by users. The larger α is, the more important a role semantic relevance plays in the overall similarity measurement. simk(j, Qk) and simf(j, Qf) are the semantic similarity and the low-level feature similarity between image j and the revised query Q, respectively.

The revised query Q consists of two parts: the feature-based one Qf and the semantic (keyword)-based one Qk. Qf is defined by (3)-(5) based on the feature vectors of feedback images. With the semantic network, simk(j, Qk) can be directly computed with the updated weights.
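Ranking by the unified measure of Eq. (12) can be sketched as follows. A sketch only: the dictionary-of-similarities interface and the default α are our assumptions, and both similarity maps are taken to be pre-normalized to [0, 1].

```python
def rank_images(semantic_sims, feature_sims, alpha=0.5):
    """Rank image ids by G(j,Q) = alpha*sim_k(j,Q_k) + (1-alpha)*sim_f(j,Q_f),
    Eq. (12). semantic_sims / feature_sims: {image_id: similarity in [0, 1]};
    an image missing from one map contributes 0 for that modality."""
    g = {
        j: alpha * semantic_sims.get(j, 0.0)
           + (1.0 - alpha) * feature_sims.get(j, 0.0)
        for j in set(semantic_sims) | set(feature_sims)
    }
    return sorted(g, key=g.get, reverse=True)
```

With alpha = 1.0 the ranking is purely semantic; with alpha = 0.0 it is purely feature-based, matching the role of α described above.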

    To further improve the retrieval performance of the proposed framework, a cross-

    modality query expansion method is supported. That is, once a query is submitted in


    3.3. Probabilistic keyword propagation scheme

    As illustrated in Figure 3, the more images are annotated (correctly), the better the system

    retrieval performance will be. However, the reality is human labeling of images is tedious

    and expensive, hence not a feasible solution, which was what motivated CBIR research

    fifteen years ago. To address this issue, a probabilistic progressive keyword propagation

    scheme is proposed in our framework to automatically annotate images in the databases in

    the relevance feedback process utilizing based a small percentage of annotated images.

We assume that initially only a few images in a database have been manually labeled with keywords and that retrieval is performed mainly based on low-level features. As stated before, the initial keyword annotations can be obtained through the crawler when the images come from the Web, or provided by human labeling. While the user is interacting with the system by providing feedbacks in a query session, a progressive learning process is activated to propagate the keyword annotations from the labeled images to unlabeled images, so that more and more images are implicitly labeled with keywords. In this way, the semantic network is updated, in which the keywords with a majority of user consensus emerge as the dominant representation of the semantic content of their associated images. As more queries are input into the system, the system is able to expand its vocabulary. Also, through the propagation process, the keywords that represent the actual semantic content of each image will receive larger weights.

There are two major issues in keyword propagation: which images and which keyword(s) should be propagated during a query session. To answer the first question, a probability model based on Bayesian learning is proposed. We assume that (1) all positive examples in one retrieval session belong to the same semantic class with common semantic object(s) or meaning(s); and (2) the features from the same semantic class follow a Gaussian or mixture-of-Gaussians distribution. Therefore, all positive examples in a query session are used to calculate and update the parameters of the corresponding semantic Gaussian class. Then, the probability of each image in the database belonging to this semantic class is calculated. The common keywords in the positive examples are propagated to the images with a very high probability of belonging to this class.

As we can see, the propagation framework uses the same procedure as the feedback algorithm on low-level features [23]. The only difference is that for low-level feature feedback, the calculated probability is used for ranking an image in the retrieval candidate list, while here it is used to determine whether an image should be in the propagation candidate list. The propagation candidate set S is obtained as follows:

    S = {c1, ..., ck}, where p(cj) > δ,   (14)

where p(cj) is the probability that image j in the database belongs to the semantic class, and δ is a constant threshold that can be estimated by a training process. The weight associated with the propagated keyword i and the image j is wij = p(cj). A more complex distribution model, for example a mixture of Gaussians, may be used in this propagation framework. However, because the user's feedback examples in practice are often very few, a complex model will lead to much larger parameter estimation errors, as there are more parameters to be estimated.
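The candidate selection of Eq. (14) can be sketched as below. This is a simplified reading, not the paper's exact estimator: we fit a diagonal Gaussian to the positive examples and score class membership by the Gaussian density normalized into (0, 1] by its value at the class centre, then keep images whose score exceeds the threshold δ.

```python
import math

def fit_diag_gaussian(examples):
    """Estimate a diagonal Gaussian from positive-example feature vectors."""
    d, n = len(examples[0]), len(examples)
    mean = [sum(x[k] for x in examples) / n for k in range(d)]
    var = [max(sum((x[k] - mean[k]) ** 2 for x in examples) / n, 1e-6)
           for k in range(d)]  # floor the variance to avoid degeneracy
    return mean, var

def log_density(x, mean, var):
    """Log of the diagonal-Gaussian density at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def propagation_candidates(database, positives, threshold):
    """Return {image_id: score} for images whose normalized class-membership
    score exceeds the threshold, i.e. the set S of Eq. (14)."""
    mean, var = fit_diag_gaussian(positives)
    peak = log_density(mean, mean, var)  # density at the class centre
    candidates = {}
    for img_id, x in database.items():
        score = math.exp(log_density(x, mean, var) - peak)  # in (0, 1]
        if score > threshold:
            candidates[img_id] = score
    return candidates
```

The returned score doubles as the propagated weight wij = p(cj) for the common keywords of the positive examples.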


Also, to determine which keyword(s) should be propagated when an image is associated with multiple keywords, there are two approaches: using the relevance factor defined by (13), or using a region-based approach [9]. In the former approach, the relevance factor rij can be directly used to modify the weight of the propagated keyword. Obviously, the lower the relevance of a keyword to an image is, the less weight is assigned to the keyword in the propagation, and vice versa. When the region-based approach is used, the unlabeled images to be propagated are first segmented into regions. By analyzing the feature distribution of the segmented regions, a probabilistic association between each segmented region and the annotated keywords is set up for labeled images by the region-based relevance feedback approach. Then, each keyword of a labeled image is assigned to one or several regions of the image with certain probabilities. The details of the region-based feedback framework are given in [9].

3.4. Experimental results

The image set used in evaluating the proposed framework described in this section is the Corel Image Gallery of 10,000 images, manually labeled into 79 semantic categories. 200 randomly selected images compose the test query set. Whether a retrieved image is correct or incorrect is judged according to the ground truth. Three types of color features and three types of texture features are used in our system. The feedback process runs as follows. Given a query from the test set, a different test image of the same category as the query is used in each round of feedback iteration as the positive example for updating the Gaussian parameters and revising the query. To incorporate negative feedback, the first two irrelevant images are assigned as negative examples. The accuracy is defined as

    Accuracy = (relevant images retrieved in top N returns) / N.   (15)
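The accuracy measure of Eq. (15) is simply the precision of the top-N returns against the category ground truth:

```python
def accuracy(retrieved_top_n, relevant_set):
    """Eq. (15): fraction of the top-N returned images that are
    relevant according to the category ground truth."""
    n = len(retrieved_top_n)
    hits = sum(1 for img in retrieved_top_n if img in relevant_set)
    return hits / n
```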

Several experiments have been performed. First, three feature-based feedback algorithms are compared: the Bayesian feedback scheme by Su et al. [23,24], the scheme of [27], and the scheme of [17] as defined by (5)-(7). This comparison is done in the same feature space. Figure 4 shows that the accuracy of the Bayesian feedback scheme (referred to as "our feedback approach") becomes higher than that of the other two methods after two feedback iterations. This demonstrates that the incorporated Bayesian estimation with the Gaussian parameter-updating scheme is able to improve retrieval effectively.

To demonstrate the performance of the semantic propagation, the following experiment was designed. The 200 images in the query set were annotated with their category names, so only one keyword is associated with each query image, and the other images in the database have no keyword annotations. During the test, each query image was used twice. The retrieval performance is shown in Figure 5, comparing feedback with and without propagation. It is seen that with propagation, the retrieval accuracy is much higher than without it. This is because, when the system has propagation ability, later queries can utilize the accumulated knowledge from previous feedback iterations. In other words, the system has learning ability and becomes smarter with more user interactions.


    Figure 4. Retrieval accuracy for top 100 results in original feature space.

Figure 5. Retrieval accuracy for top 100 results: feedback without propagation vs. feedback with the propagation scheme.

    4. Incorporating log mining in web image search engine

The architecture of our proposed web image search engine is shown in Figure 6. In addition to all the components of a CBIR system, the web search engine contains an image crawler and three other modules, namely the log miner, the model updater, and the query updater [3,4]. The data organization of the system mainly consists of four parts: the image database, which also contains the metadata of images (i.e., low-level and high-level features); the users' relevance feedback log database; the document space model; and the user space model.

    A typical scenario of the system is as follows. The off-line crawler is first employed at

    regular intervals (e.g., once every day at non-peak network traffic hours) to collect potential

    web pages containing images and store them into a local database. The feature extractor is

    then applied to these pages to extract both the low-level visual features and the high-level


    Figure 6. Architecture of the proposed web image search engine.

semantic features for the images appearing in these pages. In our system, the crawler and the feature extractor actually work simultaneously. An image indexer is applied to the images and their features to build the document space model, which is the representation of the images in the database using their features. Once the document space model is available, the matcher compares the user's query with the document space model of images to yield the image retrieval results. Since many irrelevant images may be returned by the retrieval system, a user feedback interface is also provided for users to specify whether a returned image is relevant or not to the user's intent. The image retrieval system can utilize user feedbacks to gain an understanding of the relevancy of certain images and update the query or adjust the matcher to return more accurate retrieval results. The users' feedback log data are also stored in the user log database in the system, from which the log miner


can find and build the user space model through log analysis. The user space model is then combined with the document space model to update the document space model, eliminating the mismatch between the page author's expression and the user's understanding and expectation, which can further improve the retrieval accuracy.

    4.1. Document modeling of images

The document space model in the image search engine combines the low-level visual features and the high-level semantic features to index the images on the web. The detailed process is described as follows.

To collect images on the web, a crawler (or a spider, a program that can automatically analyze web pages and download the related pages hyperlinked from the analyzed web pages) is used to collect images from many web sites. First, we re-arrange the semantic network shown in Figure 3 into a concept hierarchy of image categories, such as animals, architecture, arts, etc. Then, we select some representative sites to be collected for each concept category, for instance, http://www.nba.com for sports, http://www.cnn.com for news, http://www.disney.com for entertainment, etc. For each site candidate, the crawler collects the images and saves them to a local web page database. We then use a simple classifier to classify the images into meaningful and junk (e.g., banners, backgrounds, buttons, icons, etc.) categories based on information such as color histograms, image sizes, image file types, etc.

For each image collected, the initial keywords are assigned in the way described in Section 3.1. In addition, the low-level features of each image are calculated. The keywords and low-level features of all collected images form the document space.

In the image search process, the overall similarity is simply the linear combination of the visual and the textual similarities, as defined in (12). Setting the same default weight α = 0.5 in (12) for all queries to balance the importance of low-level features and high-level features is not ideal; however, it is a very efficient way to build up the baseline configuration of our image retrieval system. The weight α is automatically adjusted to a suitable value by the system through the user's feedback on the relevancy of certain returned images. Moreover, after we collect enough log information of user feedback, data mining technology (which will be presented in the next section) can be applied to find out the importance of low-level features and high-level features for different concepts/categories. For example, we find that for the concept "Clinton", the high-level features are more important than the low-level features, while for the concept "sunshine", the low-level features are more useful than the high-level features.

    4.2. Log mining and feedback

In order to reduce the ambiguity in the text descriptors extracted from web pages and the low-level image features, and to improve the search performance, we have proposed a user space model to supplement the original document space model. This is achieved by applying a user log analysis process. The user space model is also a vector space model. The difference between the user space model and the document space model is that vectors in the user space model are constructed from the information mined from the user feedback log data, not from the original information extracted from the web pages. When a user submits a query, our system returns to the user some images found based on the original document space model. The user can then use the feedback interface to tell the system whether each returned image is relevant or irrelevant to the query, based on his/her subjective judgment. Of course, most users do not have the patience and time to mark all relevant and irrelevant images in the returned image collection. However, this is not a very serious problem because even a small set of feedback images can provide very useful information.

After we get some users' feedback log data, the user space model can be built from the user log. Let Q be the set of all queries issued so far, and let Tj (j = 1, ..., NT) be the individual words that appear in Q. (Note that a single query may contain multiple words.) For a query in Q, Iri denotes one of the relevant images and Iii one of the irrelevant images specified by the user and stored in the user log.

From the user log, we can easily calculate the probabilities listed below:

    P(Iri) = Nri / NQ,   (16)

where Nri is the number of queries for which image Iri has been retrieved and marked as relevant, and NQ is the total number of queries;

    P(Iri | Tj) = Nri(Tj) / NQ(Tj),   (17)

where Nri(Tj) is the number of queries containing word Tj for which image Iri has been retrieved and marked as relevant, and NQ(Tj) is the number of queries that contain Tj; and

    P(Tj) = NQ(Tj) / NQ.   (18)

Based on Bayes' theorem, we have

    P(Tj | Iri) = P(Iri | Tj) P(Tj) / P(Iri).   (19)

In addition, for irrelevant images in the user log, we have

    P(Iii | Tj) = Nii(Tj) / NQ(Tj),   (20)

where Nii(Tj) is the number of queries containing word Tj for which image Iii has been retrieved and marked as irrelevant.

For a given image I, the values P(Tj | I) (j = 1, ..., NT) calculated using (19) form a vector for I. We call this vector the user space model of image I, in contrast to the document space model of image I, which is built from the related features extracted from the web pages.
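Building the user space model from the log, per Eqs. (16)-(19), can be sketched as follows. The log record layout (a list of (query_words, relevant_images) pairs) is an assumption made for illustration.

```python
from collections import defaultdict

def user_space_model(log):
    """Build the user-space vectors P(T_j | I) of Eq. (19) from a feedback
    log given as a list of (query_words, relevant_images) pairs."""
    nq = len(log)                        # N_Q, Eq. (16)/(18) denominator
    nq_t = defaultdict(int)              # N_Q(T_j): queries containing T_j
    nr_i = defaultdict(int)              # N_r_i: queries marking I relevant
    nr_it = defaultdict(int)             # N_r_i(T_j): both at once
    for words, relevant in log:
        for t in set(words):
            nq_t[t] += 1
        for img in relevant:
            nr_i[img] += 1
            for t in set(words):
                nr_it[(img, t)] += 1
    model = defaultdict(dict)
    for (img, t), n in nr_it.items():
        p_i_given_t = n / nq_t[t]                 # Eq. (17)
        p_t = nq_t[t] / nq                        # Eq. (18)
        p_i = nr_i[img] / nq                      # Eq. (16)
        model[img][t] = p_i_given_t * p_t / p_i   # Eq. (19), Bayes' theorem
    return model
```

Note that the Bayes combination algebraically reduces to Nri(Tj)/Nri: the fraction of an image's "relevant" marks that came from queries containing word Tj.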


If we have a large collection of user log data, it is reasonable to say that the information in the user space model is more accurate than the information in the original document space model. However, as we have previously stated, few users like to tag all relevant and irrelevant images in the retrieval result. Hence, the user feedback log is usually insufficient, which makes the user space model not as comprehensive as the original document space model. Therefore, we cannot replace the document space model with the user space model completely. Instead, we integrate the user space model into the original document space model to improve the accuracy of the final document space model.

For each image I, let vector U be its feature in the user space model and vector D its textual feature in the document space model. We simply use a linear combination to integrate these two vectors. We use Dnew to denote the updated document space model, which is calculated as

    Dnew = β·U + (1 − β)·D,   (21)

where β is used to adjust the weight between the user space model and the document space model. Actually, β is the confidence in the vector U of the user space model. In our approach, if the vector in the user space model is accurate and comprehensive enough, we can assign β a value very close to 1.0; if it is not, the value of β should be relatively small. The number of times that an image has been marked in user feedback can be used to determine the value of β for this image. Obviously, if an image is marked in user feedback more times than another image, its feedback information should be more accurate and comprehensive. The confidence in its vector U in the user space model should thus be higher, and we can assign it a larger β.

Since irrelevant images are also recorded in the user feedback log, we can utilize this information as well. For each irrelevant image Iii, we use P(Iii | Tj) as the confidence that Iii is irrelevant to word Tj, and these values form a vector I. We denote by Dfinal the text feature vector of the image in the final document space model and calculate it, similarly to the TF*IDF method, using

    Dfinal = Dnew · (1 − I),   (22)

where the product is taken component-wise.
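The two-stage update of Eqs. (21)-(22) can be sketched in a few lines. The component-wise product in (22) is our reading of the original; vectors are plain lists of equal length, and β is the per-image confidence in the user space model.

```python
def update_document_vector(d, u, irr, beta):
    """Update a textual document-space vector per Eqs. (21)-(22):
    D_new = beta*U + (1-beta)*D, then D_final = D_new * (1 - I)
    component-wise, where irr holds the irrelevance confidences
    P(I_i | T_j) per word."""
    d_new = [beta * uj + (1.0 - beta) * dj for uj, dj in zip(u, d)]
    return [dn * (1.0 - ij) for dn, ij in zip(d_new, irr)]
```

A word with high irrelevance confidence is thus suppressed in the final vector even if either source model weighted it strongly.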

    4.3. Experiments

Based on the proposed architecture, a demo image search engine called iFind has been developed at Microsoft Research Asia. Its graphical interface is shown in Figure 7. The search options that iFind supports include:

• Keyword-based search. One can type one or more keywords, such as "girl", into the textbox and start the retrieval. Some images are then displayed over several pages in the browse mode.


    Figure 7. iFind user interface.

• Query by example. If the "Similar" hyperlink under an image is selected, the system retrieves images that are semantically/visually similar to the example image.

• Relevance feedback. The system improves the retrieval performance after the user provides some positive and/or negative examples. One can expect much better results after several iterations of feedback.

• Log mining. The retrieval performance of the system is greatly improved after the off-line log mining process. Each user benefits from other users' usage.

To illustrate the improvement brought by log mining in image search, we show here some evaluation results based on three system configurations: (1) the baseline system, which provides only query and retrieval; (2) the feedback system, which provides user feedback as well as the baseline functionality; and (3) the full configuration, including user log mining.

In our experiments, we selected more than 2000 representative image websites. The intelligent crawler was used to collect the images from these hyperlinks. All related semantic features, including image filenames, ALT texts, surrounding texts, and page titles, as well as the low-level visual features, were extracted using the feature extractor at the same time. The images are stored in the database and indexed with their textual and visual features. In total, we collected more than 30,000 images from these websites. It is difficult for us to calculate the recall of the system because it is a tedious job to browse the entire image database and specify the ground truth manually. Therefore, we chose only 17 queries


Figure 8. The average precision-recall curve of the system's retrieval performance for all queries.

to demonstrate the performance of the system. The recall is roughly estimated after scanning the top 1000 images returned for each query. The selected queries are: Clinton, Jordan, car, flower, tree, cat, submarine, mars, spring, galaxy, movie star, potato, ship, space, tomb raider, female, and mountain. Figure 8 shows the average precision-recall curve for all queries.

Although the feedback from a single user is limited in our experiments, multiple users' feedbacks are accumulated and stored in the user log. The user space model is constructed from the user log and used to improve the document space model and, in turn, the retrieval performance. The performance of the system with log mining applied is represented by the dash-dotted line in Figure 8. As we can see from the figure, log mining improves the precision not only when the recall is low, but also when the recall is high. In other words, the overall performance of the system is improved by log mining.

    5. Conclusions

In this paper, we have discussed in detail relevance feedback technologies in content-based image retrieval systems. The key issues and representative algorithms of relevance feedback in CBIR are reviewed. We have presented a framework of integrated relevance feedback and semantic learning in content-based retrieval. Our method utilizes both the semantic and the low-level feature properties of every feedback image to refine the retrieval, while at the same time learning semantic annotations for each image. While the user is interacting with the system by providing feedbacks in a query session, a progressive learning process is activated to propagate the keyword annotations from the labeled images to unlabeled images, so that more and more images are implicitly labeled with keywords at certain probabilities. This semantic propagation process improves the performance of future retrieval, whether querying by image example or by keywords. Furthermore, we extended


the framework to a web image search engine by incorporating user log mining to refine search accuracy. This new framework makes the image retrieval system superior to both classical CBIR and text-based systems.

Publisher's note

This article is based on the original conference paper published by Kluwer Academic Publishers in Visual and Multimedia Information Management, edited by Xiaofang Zhou and Pearl Pu. ISBN: 1-4020-7060-8. © 2002 by the International Federation for Information Processing.

    References

[1] C. Buckley and G. Salton, Optimization of relevance feedback weights, in Proceedings of SIGIR'95, 1995.

    [2] S. K. Chang, C. W. Yan, D. C. Dimitroff, and T. Arndt, An intelligent image database system, IEEE

    Transactions on Software Engineering 14(5), 1988.

    [3] Z. Chen, W. Liu, C. Hu, M. Li, and H. J. Zhang, iFind: A web image search engine, in Proceedings of

    SIGIR2001, 2001.

    [4] Z. Chen, W. Liu, F. Zhang, M. Li, and H. J. Zhang, Web mining for web image retrieval, Journal of the

    American Society for Information Science and Technology 52(10), August 2001, 831839.

    [5] I. J. Cox, T. P. Minka, T. V. Papathomas, and P. N. Yianilos, The Bayesian image retrieval system,

    PicHunter: Theory, implementation, and psychophysical experiments, IEEE Transactions on Image

    Processing, Special Issue on Digital Libraries, 2000.

    [6] M. Flickner, H. Sawhney, W. Niblack et al., Query by image and video content: The QBIC system, IEEE

    Computer Magazine 28, 1995, 2332.[7] J. Huang, S. R. Kumar, and M. Metra, Combining supervised learning with color correlograms for content-

    based image retrieval, in Proceedings of ACM Multimedia95, November 1997, pp. 325334.

    [8] Y. Ishikawa, R. Subramanya, and C. Faloutsos, Mindreader: Query databases through multiple examples,

    in Proceedings of the 24th VLDB Conference, New York, 1998.

    [9] F. Jing, M. Li, H. J. Zhang, and B. Zhang, An effective region-based image retrieval framework, in

    Proceedings of ACM Multimedia 2002, Juan-les-Pins, France, December 16, 2002.

    [10] J. Laaksonen, M. Koskela, and E. Oja, PicSOM: Self-organizing maps for content-based image retrieval,

    in Proceedings of International Joint Conference on NN, July 1999.

    [11] C. Lee, W. Y. Ma, and H. J. Zhang, Information embedding based on users relevance feedback for image

    retrieval, in Proceedings of SPIE International Conference on Multimedia Storage and Archiving Sys-

    tems IV, Boston, 1922 September 1999.

    [12] Y. Lu et al., A unified framework for semantics and feature based relevance feedback in image retrieval

    systems, in Proceedings of ACM MM2000, 2000.

    [13] S. D. MacArthur, C. E. Brodley, and C.-R. Shyu, Relevance feedback decision trees in content-based image

    retrieval, in IEEE Workshop on Content-Based Access of Image and Video Libraries, 2000, pp. 6872.[14] T. Minka and R. Picard, Interactive learning using a Society of Models, Pattern Recognition 30(4), 1997.

    [15] T. Mitchell, Machine Learning, McGraw-Hill, 1997.

    [16] J. J. Rocchio Jr., Relevance feedback in information retrieval, in The SMART Retrieval System: Experi-

    ments in Automatic Document Processing, ed. G. Salton, Prentice-Hall, 1971, pp. 313323.

    [17] Y. Rui and T. S. Huang, A novel relevance feedback technique in image retrieval, in Proceedings of 7th

    ACM Conference on Multimedia, 1999.

    [18] Y. Rui, T. S. Huang, and S. Mehrotra, Content-based image retrieval with relevance feedback in MARS,

    in Proceedings of IEEE International Conference on Image Processing, 1997.

  • 8/8/2019 CBIR06

    25/25

    RELEVANCE FEEDBACK AND LEARNING IN CONTENT-BASED IMAGE SEARCH 155

    [19] G. Salton, Automatic Text Processing, Addison-Wesley, Reading, MA, 1989.

    [20] G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.

    [21] S. Sclaroff, L. Taycher, and M. L. Cascia, ImageRover: a content-based image browser for the World Wide

    Web, Technical Report 97-005, Boston University CS Dept., 1997.

    [22] H. T. Shen, B. C. Ooi, and K. L. Tan, Giving meanings to WWW images, in Proceedings of ACM

    MM2000, 2000, pp. 3948.

    [23] Z. Su, S. Li, and H. J. Zhang, Extraction of feature subspaces for content-based retrieval using relevance

    feedback, in ACM Multimedia 2001, Ottawa, Canada, 2001.

    [24] Z. Su, H. J. Zhang, and S. Ma, Relevant feedback using a Bayesian classifier in content-based image

    retrieval, in SPIE Electronic Imaging 2001, San Jose, CA, January 2001.

    [25] K. Tieu and P. Viola, Boosting image retrieval, in IEEE Conference on Computer Vision and Pattern

    Recognition, 2000.

    [26] S. Tong and E. Chang, Support vector machine active leaning for image retrieval, in ACM Multimedia

    2001, Ottawa, Canada, 2001.

    [27] N. Vasconcelos and A. Lippman, Learning from user feedback in image retrieval systems, in NIPS99,

    Denver, CO, 1999.

    [28] P. Wu and B. S. Manjunath, Adaptive nearest neighbour search for relevance feedback in large image

    database, in ACM Multimedia Conference, Ottawa, Canada, 2001.

    [29] Y. Wu, Q. Tian, and T. S. Huang, Discriminant EM algorithm with application to image retrieval, in IEEE

    CVPR, South Carolina, 2000.

    [30] H. J. Zhang and D. Zhong, A scheme for visual feature based image indexing, in Proceedings of

    IS&T/SPIE Conference on Storage and Retrieval for Image and Video Databases III, 1995, pp. 3646.