Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos
![Page 1: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/1.jpg)
Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos
Yiming Liu, Dong Xu, Ivor W. Tsang, Jiebo Luo
Nanyang Technological University & Kodak Research Lab
![Page 2: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/2.jpg)
Motivation

• Digital cameras and mobile phone cameras are spreading rapidly:
  – More and more personal photos;
  – Retrieving images from enormous collections of personal photos becomes an important topic.
How to retrieve?
![Page 3: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/3.jpg)
Previous Work
• Content-Based Image Retrieval (CBIR)
  – Users provide images as queries to retrieve personal photos.
• The paramount challenge -- the semantic gap:
  – The gap between the low-level visual features and the high-level semantic concepts.
[Diagram: a query image with a high-level concept is reduced to a low-level feature vector, which is compared against the feature vectors in the DB to produce the result; the semantic gap lies between the two levels.]
![Page 4: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/4.jpg)
A More Natural Way For Consumer Applications
• Let the user retrieve the desired personal photos using textual queries.
• Image annotation is used to classify images w.r.t. high-level semantic concepts.
  – Semantic concepts are analogous to the textual terms describing document contents.
  – It serves as an intermediate stage for textual query based image retrieval.
[Diagram: the query "Sunset" is compared with each database photo's annotation result (high-level concepts); the photos are then ranked to produce the result.]
![Page 5: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/5.jpg)
Our Goal

• Web images are accompanied by tags, categories and titles.
[Diagram: web images paired with contextual information such as "building", "people, family", "people, wedding", and "sunset", shown alongside consumer photos.]
• Leverage information from web images to retrieve consumer photos in a personal photo collection.
  – No intermediate image annotation process.
• A real-time textual query based consumer photo retrieval system without any intermediate annotation stage.
![Page 6: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/6.jpg)
• When the user provides a textual query,
[Diagram: Textual Query → Automatic Web Image Retrieval over a large collection of web images (with descriptive words), aided by WordNet → relevant/irrelevant images → Classifier → Consumer Photo Retrieval over raw consumer photos → top-ranked consumer photos, refined through Relevance Feedback.]
• It is used to find relevant and irrelevant images in the web image collection.
• Then, a classifier is trained on these web images.
• Consumer photos are then ranked by the classifier's decision values.
• The user can also give relevance feedback to refine the retrieval results.
System Framework
![Page 7: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/7.jpg)
[Diagram: the query "boat" is looked up in semantic word trees based on WordNet (boat → ark, barge → dredger, houseboat, …) and in an inverted file, yielding relevant and irrelevant web images.]
• For the user's textual query, first look it up in the semantic word trees.
• Web images whose descriptions contain the query word are treated as "relevant web images".
• Web images whose descriptions contain neither the query word nor its two-level descendants are treated as "irrelevant web images".
Automatic Web Image Retrieval
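The relevant/irrelevant split above can be sketched as follows. The toy `TREE` dictionary and the `descendants` and `split_images` helpers are illustrative names, not from the paper; the actual system builds its word trees from WordNet and uses an inverted file for lookup.

```python
# Toy semantic word tree; a real system derives this from WordNet.
TREE = {"boat": ["ark", "barge"], "barge": ["dredger", "houseboat"]}

def descendants(word, depth=2):
    """Collect all descendants of `word` down to `depth` levels."""
    out, frontier = set(), [word]
    for _ in range(depth):
        nxt = []
        for w in frontier:
            for child in TREE.get(w, []):
                out.add(child)
                nxt.append(child)
        frontier = nxt
    return out

def split_images(query, image_words):
    """image_words: one set of descriptive words per web image.
    Relevant = contains the query word; irrelevant = contains neither
    the query word nor any of its two-level descendants."""
    banned = {query} | descendants(query)
    relevant = [i for i, ws in enumerate(image_words) if query in ws]
    irrelevant = [i for i, ws in enumerate(image_words) if not (ws & banned)]
    return relevant, irrelevant
```

For example, for the query "boat", an image tagged only "ark" is excluded from both sets: it is not relevant (no exact match) but too close to count as irrelevant.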
![Page 8: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/8.jpg)
Decision Stump Ensemble
• Train a decision stump on each dimension.
• Combine them, weighted by their training error rates.
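A minimal sketch of such an ensemble follows. The `(1 - 2·err)` accuracy-based weight is an assumption, since the slide only says the stumps are combined using their training error rates.

```python
import numpy as np

def train_stump(x, y):
    """Best threshold/polarity decision stump on one feature dimension.
    x: (n,) feature values; y: (n,) labels in {-1, +1}."""
    xs = np.sort(x)
    thresholds = np.concatenate(([xs[0] - 1], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1]))
    best = (np.inf, 0.0, 1)  # (training error, threshold, polarity)
    for t in thresholds:
        for pol in (1, -1):
            pred = pol * np.where(x > t, 1, -1)
            err = np.mean(pred != y)
            if err < best[0]:
                best = (err, t, pol)
    return best

def stump_ensemble(X, y):
    """Train one stump per dimension; combine with an accuracy-based
    weight (1 - 2*err), an assumed weighting scheme."""
    stumps = [train_stump(X[:, d], y) for d in range(X.shape[1])]
    def decision(Xq):
        score = np.zeros(len(Xq))
        for d, (err, t, pol) in enumerate(stumps):
            score += (1 - 2 * err) * pol * np.where(Xq[:, d] > t, 1, -1)
        return score
    return decision
```

Each stump is trained independently per dimension, which is what makes the ensemble cheap to train and trivially parallel.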
![Page 9: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/9.jpg)
Why Decision Stump Ensemble?
• Main reason: low time cost
  – Our goal: a (quasi) real-time retrieval system.
  – As base classifiers, SVMs are much slower;
  – For combination, boosting is also much slower.
• The advantages of the decision stump ensemble:
  – Low training cost;
  – Low testing cost;
  – Very easy to parallelize.
![Page 10: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/10.jpg)
Asymmetric Bagging
• Imbalance: count(irrelevant) >> count(relevant)
  – Side effects, e.g. overfitting.
• Solution: asymmetric bagging
  – Repeat 100 times, each time with a different randomly sampled set of irrelevant web images.
[Diagram: the relevant images are combined with 100 different random samples of irrelevant images to form 100 training sets.]
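The bagging step can be sketched as follows; `asymmetric_bags` is a hypothetical helper name, and the choice of sampling each bag to match the relevant-set size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def asymmetric_bags(relevant, irrelevant, n_bags=100):
    """Build n_bags balanced training sets: every bag keeps ALL
    relevant images and pairs them with a fresh random subset of
    irrelevant images of the same size, countering the imbalance."""
    bags = []
    for _ in range(n_bags):
        idx = rng.choice(len(irrelevant), size=len(relevant), replace=False)
        X = np.vstack([relevant, irrelevant[idx]])
        y = np.concatenate([np.ones(len(relevant)), -np.ones(len(relevant))])
        bags.append((X, y))
    return bags
```

One stump ensemble would then be trained per bag and their outputs averaged.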
![Page 11: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/11.jpg)
Relevance Feedback
• The user labels nl relevant or irrelevant consumer photos.
  – Use this information to further refine the retrieval results.
• Challenge 1: usually nl is small.
• Challenge 2: cross-domain learning
  – The source classifier is trained on the web image domain.
  – The user labels some personal photos.
![Page 12: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/12.jpg)
Method 1: Cross-Domain Combination of Classifiers
• Re-train classifiers with data from both domains?
  – Neither effective nor efficient.
• A simple but effective method:
  – Train an SVM on the consumer photo domain with the user-labeled photos;
  – Convert the responses of the source classifier and the SVM classifier to probabilities, and add them up;
  – Rank consumer photos by this summed value.
• Referred to as DS_S+SVM_T.
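A sketch of this fusion step; the logistic squashing below is a stand-in for whatever probability calibration (e.g. Platt scaling) the actual system uses, and `fused_rank` is a hypothetical name.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def fused_rank(source_dec, svm_dec):
    """Map each classifier's decision values into [0, 1] with a
    logistic function (assumed calibration), sum the two
    probabilities, and rank photos by the sum, best first."""
    score = sigmoid(source_dec) + sigmoid(svm_dec)
    return np.argsort(-score)
```

The sum gives both the web-trained source classifier and the target-domain SVM an equal vote.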
![Page 13: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/13.jpg)
Method 2: Cross-Domain Regularized Regression (CDRR)
• Construct a linear regression function fT(x):
  – For labeled photos: fT(xi) ≈ yi;
  – For unlabeled photos: fT(xi) ≈ fs(xi), where fs is the source classifier.
![Page 14: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/14.jpg)
• Design a target linear classifier fT(x) = wTx.
  – For user-labeled images x1, …, xl: fT(x) should match the user's label y(x);
  – For the other images: fT(x) should match fs(x);
  – A regularizer controls the complexity of the target classifier fT(x).
• This problem can be solved with a least squares solver.
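Putting the three requirements together, the CDRR objective presumably has the form below; the trade-off weights $\lambda_1, \lambda_2$ and the $1/l$, $1/u$ normalizations are my assumption, since the slides state only the three terms.

```latex
\min_{w}\;
\frac{1}{l}\sum_{i=1}^{l}\bigl(w^{\top}x_i - y_i\bigr)^{2}
\;+\;\lambda_1\,\frac{1}{u}\sum_{j=1}^{u}\bigl(w^{\top}x_j - f^{s}(x_j)\bigr)^{2}
\;+\;\lambda_2\,\lVert w\rVert^{2}
```

Being quadratic in $w$, this has the closed-form least squares solution

```latex
w \;=\;
\Bigl(\tfrac{1}{l}X_l^{\top}X_l + \tfrac{\lambda_1}{u}X_u^{\top}X_u + \lambda_2 I\Bigr)^{-1}
\Bigl(\tfrac{1}{l}X_l^{\top}y + \tfrac{\lambda_1}{u}X_u^{\top}f^{s}\Bigr),
```

where $X_l$, $X_u$ stack the labeled and unlabeled photos' features and $f^{s}$ stacks the source classifier's outputs on the unlabeled photos.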
![Page 15: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/15.jpg)
Hybrid Method

• A combination of the two methods.
• For labeled consumer photos:
  – Measure the average distance davg to their 30 nearest unlabeled neighbors in feature space;
  – If davg < ε: use DS_S+SVM_T;
  – Otherwise: use CDRR.
• Reason:
  – Consumer photos that are visually similar to the user-labeled images should be influenced more by those labeled images.
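The selection rule can be sketched as below; `hybrid_choice` is a hypothetical helper name, while the neighbor count (30) and the threshold ε come from the slide.

```python
import numpy as np

def hybrid_choice(labeled, unlabeled, eps, k=30):
    """Average distance from each labeled photo to its k nearest
    unlabeled neighbors in feature space. A small distance means the
    labels are locally informative -> use DS_S+SVM_T; otherwise fall
    back to CDRR."""
    # pairwise distances: (n_labeled, n_unlabeled)
    d = np.linalg.norm(labeled[:, None, :] - unlabeled[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, :k]   # k smallest distances per row
    d_avg = knn.mean()
    return "DS_S+SVM_T" if d_avg < eps else "CDRR"
```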
![Page 16: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/16.jpg)
Experimental Results
![Page 17: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/17.jpg)
Dataset and Experimental Setup
• Web Image Database:
  – 1.3 million photos from photoSIG.
  – Relatively professional photos.
• Text descriptions for web images:
  – Titles, portfolios, and categories accompanying the web images;
  – Remove the common high-frequency words;
  – Remove the rarely-used words;
  – Finally, 21377 words in our vocabulary.
![Page 18: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/18.jpg)
Dataset and Experimental Setup
• Testing Dataset #1: Kodak dataset
  – Collected by Eastman Kodak Company:
    • From about 100 real users.
    • Over a period of one year.
  – 1358 images:
    • The first keyframe from each video.
  – 21 concepts:
    • We merge “group_of_two” and “group_of_three_or_more” into one concept.
![Page 19: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/19.jpg)
Dataset and Experimental Setup
• Testing Dataset #2: Corel dataset
  – 4999 images:
    • 192x128 or 128x192.
  – 43 concepts:
    • We remove all concepts with fewer than 100 images.
![Page 20: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/20.jpg)
Visual Features
• Grid-based color moments (225D)
  – Three moments of three color channels from each block of a 5x5 grid.
• Edge direction histogram (73D)
  – 72 edge direction bins plus one non-edge bin.
• Wavelet texture (128D)
• Concatenate all three kinds of features:
  – Normalize each dimension to avg = 0, stddev = 1;
  – Use the first 103 principal components.
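As a sanity check on the 225D figure (5×5 blocks × 3 channels × 3 moments), here is a sketch of grid-based color moment extraction; using the signed cube root of the third central moment for the skewness moment is an assumption, as the slide does not specify the exact definition.

```python
import numpy as np

def grid_color_moments(img, grid=5):
    """img: HxWx3 array. For each block of a grid x grid partition
    and each color channel, compute mean, standard deviation, and a
    skewness moment -> grid*grid*3*3 = 225 dimensions for grid=5."""
    H, W, _ = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = img[i * H // grid:(i + 1) * H // grid,
                        j * W // grid:(j + 1) * W // grid]
            for c in range(3):
                ch = block[:, :, c].astype(float)
                mu = ch.mean()
                sigma = ch.std()
                skew = np.cbrt(((ch - mu) ** 3).mean())  # signed cube root
                feats.extend([mu, sigma, skew])
    return np.array(feats)
```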
![Page 21: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/21.jpg)
Retrieval without Relevance Feedback
• For all concepts:– Average number of relevant images: 3703.5.
![Page 22: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/22.jpg)
Retrieval without Relevance Feedback
• kNN: rank consumer photos by their average distance to the 300 nearest neighbors among the relevant web images.
• DS_S: decision stump ensemble.
![Page 23: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/23.jpg)
Retrieval without Relevance Feedback
• Time cost:
  – We use OpenMP to parallelize our method;
  – With 8 threads, both methods reach interactive speed;
  – But kNN is expected to be much slower on large-scale datasets.
![Page 24: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/24.jpg)
Retrieval with Relevance Feedback
• In each round, the user labels at most 1 positive and 1 negative image among the top 40;
• Methods for comparison:
  – kNN_RF: add the user-labeled photos to the relevant image set, and re-apply kNN;
  – SVM_T: train an SVM on the user-labeled images in the target domain;
  – A-SVM: Adaptive SVM;
  – MR: Manifold Ranking based relevance feedback.
![Page 25: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/25.jpg)
Retrieval with Relevance Feedback
• Setting of y(x) for CDRR:
  – Positive: +1.0;
  – Negative: -0.1.
• Reason:
  – The top-ranked negative images are not extremely negative;
  – Positive feedback says “what it is”; negative feedback says “what it is not”.
![Page 26: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/26.jpg)
Retrieval with Relevance Feedback
• On Corel dataset:
![Page 27: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/27.jpg)
Retrieval with Relevance Feedback
• On Kodak dataset:
![Page 28: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/28.jpg)
Retrieval with Relevance Feedback
• Time cost:
  – All methods except A-SVM achieve real-time speed.
![Page 29: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/29.jpg)
System Demonstration
![Page 30: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/30.jpg)
Query: Sunset
![Page 31: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/31.jpg)
Query: Plane
![Page 32: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/32.jpg)
The User Provides Relevance Feedback …
![Page 33: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/33.jpg)
After 2 positive and 2 negative feedback labels…
![Page 34: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/34.jpg)
Summary
• Our goal: (quasi) real-time textual query based consumer photo retrieval.
• Our method:
  – Use web images and their surrounding text descriptions as an auxiliary database;
  – Asymmetric bagging with decision stumps;
  – Several simple but effective cross-domain learning methods to support relevance feedback.
![Page 35: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/35.jpg)
Future Work
• How to efficiently use more powerful source classifiers?
• How to further improve the speed:
  – Keep the training time within 1 second;
  – Control the testing time when the consumer photo set is very large.
![Page 36: Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos](https://reader036.fdocuments.us/reader036/viewer/2022081603/568151bc550346895dbfed0e/html5/thumbnails/36.jpg)
Thank you!
• Any questions?