Distance Metric Learning from Uncertain Side Information with Application to Automated Photo Tagging
Lei Wu *†, Steven C.H. Hoi*, Rong Jin#, Jianke Zhu‡, Nenghai Yu†
*Nanyang Technological University, †University of Sci. & Tech. of China, #Michigan State University, ‡ETH Zurich
BACKGROUND
Annotation/tagging is essential to making images accessible to Web users
Billions of images on the Web lack proper annotation/tags
Automatic image annotation has been actively studied in the multimedia community
BACKGROUND (cont’)
Social media data on social websites carry rich tagging information provided by Web users
Can we resolve the challenge of auto-photo annotation by leveraging the emerging huge amount of rich social media data?
BACKGROUND (cont’) Annotation by Search from Social Images
Sun, Bird, Sky, Blue, …
Bird, Fly, White, Cloud, …
Sun, Cloud, Hawk, Fly, …
Hawk, Bird, Sky, Eagle, …
MOTIVATION Annotation by Search
Find similar images in the social image DB
Annotate the image with the high-frequency tags
Research Challenges: visual feature representation; tag data mining; scalable search & indexing; distance/similarity measure
Distance Metric Learning
MOTIVATION (cont’) Related Work of Automated Photo Tagging
Built a large collection of web images with ground truth labels for helping object recognition research (Russell et al. 2008)
A fast search-based approach for image annotation by some efficient hashing technique (Wang et al. 2006)
Utilized visual and text modalities simultaneously in clustering images (Rege et al. 2008)
Efficient image search and scene matching techniques for exploring a large-scale web image repository. (Torralba et al. 2008)
Learning-based method for improving the efficiency of manual image annotation (Yan et al. 2008)
These approaches adopt Hamming or Euclidean distance
MOTIVATION (cont’) Distance Measure
Hamming distance, Euclidean distance, Mahalanobis distance
Distance Metric Learning: learning to optimize the metric M
Side Information (a.k.a. “Pairwise Constraints”)
Similar pairs S(x1, x2) : x1 and x2 belong to the same category
Dissimilar pairs D(x1, x2): x1 and x2 belong to different categories
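The role of the metric M can be illustrated with a minimal sketch. It assumes the standard Mahalanobis form d_M(x1, x2) = sqrt((x1 - x2)^T M (x1 - x2)) with M positive semi-definite; the function name and the toy vectors are illustrative and do not come from the slides.

```python
import numpy as np

def mahalanobis_distance(x1, x2, M):
    """Return sqrt((x1 - x2)^T M (x1 - x2)) for a positive semi-definite M."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.sqrt(diff @ M @ diff))

# Toy feature vectors (hypothetical values, for illustration only).
x1 = np.array([1.0, 2.0])
x2 = np.array([4.0, 6.0])

# With M = I the Mahalanobis distance is the Euclidean distance.
d_euclid = mahalanobis_distance(x1, x2, np.eye(2))

# A diagonal M re-weights features: here the second feature counts 4x as much.
d_weighted = mahalanobis_distance(x1, x2, np.diag([1.0, 4.0]))

print(d_euclid, d_weighted)
```

Distance metric learning chooses M so that similar pairs end up close and dissimilar pairs end up far apart under this distance.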
MOTIVATION (cont’) Related Work on Distance Metric Learning
Probabilistic Global Distance Metric Learning (PGDM) (Xing et al. 2002)
Neighbourhood Components Analysis (NCA) (Goldberger et al. 2005)
Relevance Component Analysis (RCA) (Bar-Hillel et al. 2005)
Discriminative Component Analysis (DCA) (Hoi et al. 2006)
Large Margin Nearest Neighbor (LMNN) (Weinberger et al. 2006)
Regularized Distance Metric Learning (RDML) (Si et al. 2006)
Information-Theoretic Metric Learning (ITML) (Davis et al. 2007)
Clean side information is given explicitly
MOTIVATION (cont’)
Annotation by Search from Social Media
NO explicit pairwise side information is available
But rich information is available with social images
Ideas of our research
To discover implicit pairwise relationships between social images via a probabilistic approach
To learn effective distance metrics from uncertain side information that is discovered implicitly from social images
METRIC LEARNING FRAMEWORK FOR AUTOMATED PHOTO TAGGING
Overview of Our Approaches
Discovery of probabilistic side information: a graphical model approach
Learning distance metrics from probabilistic side information: a probabilistic RCA method
Automated photo tagging by applying the optimized metric in visual similarity search
METRIC LEARNING FRAMEWORK FOR AUTOMATED PHOTO TAGGING
Latent Chunklet Estimation for Probabilistic Side Info.
Problem Formulation
Latent chunklets, i.e., the hidden topics
Assumption: both visual images and associated textual metadata are generated from the hidden topics
Calculation: multi-modal hidden topic analysis
Graphical Model For Latent Chunklet Estimation
[Figure: graphical model in which a hidden topic generates both the text modality and the visual modality]
Graphical Model For Latent Chunklet Estimation (cont’)
Generation Process
Inference
Probabilistic Side Info., as Prior Prob. Matrix
Probabilistic Distance Metric Learning
Problem Definition and Notations Probabilistic Side Info.:
Centers/Means for the Latent Chunklets
Membership Probability
Given the estimation of latent chunklets P0, how to formulate the DML problem to find the optimal metric M?
Propose an extension of RCA with Prob. Side Info.
Probabilistic Relevance Component Analysis (pRCA)
The objective function of pRCA:
Corollary 1. When the means of chunklets μ and the matrix of probability assignments P are fixed (assuming hard assignments of 0 and 1), the Probabilistic Relevance Component Analysis (pRCA) formulation reduces to regular RCA learning.
Minimize: sum of squared distances of examples from their chunklets’ centers
Regularization: prevents the trivial solution
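The exact pRCA objective is not recoverable from this text, so the following is only a sketch of the metric step under stated assumptions: the within-chunklet scatter is weighted by the membership probabilities, and the metric is the inverse of that scatter, as in regular RCA. The function name, the small ridge term `reg`, and the toy data are hypothetical. With hard 0/1 assignments the weighted scatter equals the regular RCA scatter, which is the situation Corollary 1 describes.

```python
import numpy as np

def weighted_rca_metric(X, P, mu, reg=1e-6):
    """Return M = C^{-1}, where C is the membership-weighted within-chunklet scatter.

    X:  (n, d) examples
    P:  (n, K) membership probabilities of examples to chunklets
    mu: (K, d) chunklet centers
    reg is a small ridge term (an assumption) that keeps C invertible.
    """
    n, d = X.shape
    C = np.zeros((d, d))
    for k in range(mu.shape[0]):
        diff = X - mu[k]                       # (n, d) offsets from center k
        C += (P[:, k, None] * diff).T @ diff   # sum_i p_ik * diff_i diff_i^T
    C = C / n + reg * np.eye(d)
    return np.linalg.inv(C)

# Toy data with two chunklets and hard assignments (hypothetical values).
X = np.array([[0.0, 0.0], [2.0, 0.2], [10.0, 5.0], [12.0, 5.4]])
labels = np.array([0, 0, 1, 1])
P = np.eye(2)[labels]                          # hard 0/1 membership matrix
mu = np.array([X[labels == k].mean(axis=0) for k in range(2)])

M = weighted_rca_metric(X, P, mu)
print(M)
```

Directions with large within-chunklet variance receive small weight in M, so variation that does not separate chunklets contributes less to the distance.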
Probabilistic Relevance Component Analysis (pRCA)
Iterative algorithm Fixing P and μ to optimize M:
Fixing M and μ to optimize P:
Fixing P and M to find μ:
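The three alternating steps can be sketched as a loop. The update rules below are assumptions, because the slide equations did not survive extraction: M is taken as the inverse of the membership-weighted scatter, P as the prior P0 re-weighted by exp(-0.5 * squared distance) and renormalized, and μ as the membership-weighted mean. The slides list a learning rate γ, which suggests the authors' updates are gradient-based; the closed-form steps here are stand-ins, and `prca_sketch` is a hypothetical name.

```python
import numpy as np

def prca_sketch(X, P0, n_iter=10, reg=1e-6):
    """Alternate over M, P, and mu; an illustrative stand-in, not the authors' algorithm."""
    n, d = X.shape
    P = P0.copy()
    mu = (P.T @ X) / P.sum(axis=0)[:, None]            # initial weighted means
    for _ in range(n_iter):
        diff = X[:, None, :] - mu[None, :, :]          # (n, K, d) offsets
        # Fix P and mu, update M: inverse of the weighted scatter (assumed rule).
        C = np.einsum("nk,nkd,nke->de", P, diff, diff) / n + reg * np.eye(d)
        M = np.linalg.inv(C)
        # Fix M and mu, update P: prior times exp(-0.5 * squared distance) (assumed rule).
        sq = np.einsum("nkd,de,nke->nk", diff, M, diff)
        logits = np.log(P0 + 1e-12) - 0.5 * sq
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Fix P and M, update mu: membership-weighted means.
        mu = (P.T @ X) / (P.sum(axis=0)[:, None] + 1e-12)
    return M, mu, P

# Synthetic two-cluster data with a noisy prior (hypothetical values).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(5.0, 1.0, (20, 3))])
P0 = np.empty((40, 2))
P0[:20] = [0.7, 0.3]
P0[20:] = [0.3, 0.7]

M, mu, P = prca_sketch(X, P0)
print(P.round(2)[:3])
```

Each row of P stays a probability distribution over chunklets, and M stays symmetric positive definite because of the ridge term.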
Probabilistic Relevance Component Analysis (pRCA)
pRCA Algorithm
Automated Photo Tagging
Steps of Auto Photo Tagging via Search
Query image
Distance/Similarity Measure
To retrieve a set of visually similar social photos:
Set of k-Nearest Neighbor Images
Set of images with distance less than some threshold
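Both retrieval options can be sketched with a brute-force scan. This assumes a Mahalanobis-style squared distance under a learned matrix M; the function names and toy database are hypothetical, and a real system over roughly 187,000 photos would need an index rather than a full scan.

```python
import numpy as np

def squared_distances(query, database, M):
    """Squared distance (x - q)^T M (x - q) from the query to every database row."""
    diff = database - query
    return np.einsum("nd,de,ne->n", diff, M, diff)

def k_nearest(query, database, M, k):
    """Indices of the k database rows closest to the query under M."""
    d2 = squared_distances(query, database, M)
    return np.argsort(d2, kind="stable")[:k]

def within_threshold(query, database, M, threshold):
    """Indices of database rows whose distance to the query is below threshold."""
    d2 = squared_distances(query, database, M)
    return np.flatnonzero(d2 < threshold ** 2)

# Toy database of 2-D features (hypothetical values).
database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
query = np.array([0.0, 0.0])

nn_euclid = k_nearest(query, database, np.eye(2), k=2)
nn_learned = k_nearest(query, database, np.diag([16.0, 1.0]), k=2)
close = within_threshold(query, database, np.eye(2), threshold=3.5)
print(nn_euclid, nn_learned, close)
```

In this toy case the neighbor set changes when M changes, which is why the choice of metric matters for search-based tagging.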
Automated Photo Tagging (cont’)
Annotating the query photo with the relevant tags associated with the set of similar images
A tag is preferred if it has a higher frequency among the set of similar social images
A tag is preferred if its associated social images are visually more similar to the query photo
Our tagging approach combines:
Frequency of tag w among the retrieved social images
Average distance from the query photo to the tag’s associated social images
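The two preferences above can be combined in many ways, and the scoring formula itself is not recoverable from this text. The sketch below assumes one simple choice, frequency divided by average distance, purely for illustration; `rank_tags`, `eps`, and the toy neighbors are hypothetical.

```python
from collections import defaultdict

def rank_tags(neighbors, top_t=10, eps=1e-9):
    """Rank tags of retrieved photos.

    neighbors: list of (distance_to_query, tags) for the retrieved social photos.
    Score (an assumed form): tag frequency / average distance of its photos.
    """
    freq = defaultdict(int)
    dist_sum = defaultdict(float)
    for distance, tags in neighbors:
        for tag in set(tags):                  # count each tag once per photo
            freq[tag] += 1
            dist_sum[tag] += distance
    scores = {t: freq[t] / (dist_sum[t] / freq[t] + eps) for t in freq}
    # Higher score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_t]

# Three retrieved photos with distances to the query (hypothetical values).
neighbors = [
    (0.2, ["sunset", "sky", "sea"]),
    (0.4, ["sunset", "beach"]),
    (0.9, ["sky", "nikon"]),
]
top_tags = rank_tags(neighbors, top_t=3)
print(top_tags)
```

Under this assumed score, a tag seen once on a very close photo can outrank a tag seen twice on more distant photos.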
EXPERIMENTS Experimental Testbed
205,442 photos in total from Flickr
Distance Metric Learning: 16,588 photos + tags
Knowledge Database: 186,854 photos + tags
Query Images: 2,000 random photos
Compared Schemes:
Relevance Component Analysis (RCA)
Discriminative Component Analysis (DCA)
Information-Theoretic Metric Learning (ITML)
Large Margin Nearest Neighbor (LMNN)
Neighbourhood Components Analysis (NCA)
Regularized Distance Metric Learning (RDML)
EXPERIMENTS (cont’) Settings:
500 latent chunklets
1,000 visual words
10,000 tags
Learning rate γ=0.5
Top k nearest photos, k=30
Top t relevant tags for annotation, t=1,…,10
Average Precision: the number of nearest neighbors k was fixed to 30 for all compared methods
Average Recall: the number of nearest neighbors k was fixed to 30 for all compared methods
Precision-Recall Curves: the number of nearest neighbors k was fixed to 30 for all compared methods
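The slides do not spell out how precision and recall are computed, so the sketch below assumes the usual definitions over the top t recommended tags, with each query photo's own user-provided tags as ground truth. Function names and sample data are hypothetical.

```python
def precision_recall_at_t(predicted, ground_truth, t):
    """Precision and recall of the top t predicted tags against a ground-truth set."""
    top = predicted[:t]
    hits = sum(1 for tag in top if tag in ground_truth)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

def average_over_queries(results, t):
    """Average precision and recall at t over (predicted, ground_truth) pairs."""
    pairs = [precision_recall_at_t(p, g, t) for p, g in results]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n

# Two query photos with ranked predictions and ground-truth tags (hypothetical).
results = [
    (["sunset", "sky", "nikon"], {"sunset", "sea", "sky", "beach"}),
    (["tiger", "zoo", "cat"], {"tiger", "zoo"}),
]
avg_p, avg_r = average_over_queries(results, t=2)
print(avg_p, avg_r)
```

Sweeping t from 1 to 10 and plotting the averaged pairs would produce a precision-recall curve of the kind the slides report.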
Empirical Observations
DML techniques are beneficial and critical to retrieval-based photo tagging tasks
In general, the pRCA algorithm considerably outperformed the other approaches in most cases
In some cases, some DML methods did not perform well and were even worse than the Euclidean method
Noisy (uncertain) side information issue
Robustness is important to DML
Evaluation of Varied k and t
Examine the annotation performance of pRCA by varying the value of k from 10 to 50
Empirical Observations
The number of nearest neighbors k can influence the annotation performance
In our case, when k equals 30, the resulting performance is generally better than with other values
If k is too large, many noisy tags may be included, as there may not be many relevant images in the database
If k is too small, some relevant tags may not appear, which again may degrade the performance
Time Cost for Metric Learning
To evaluate the time efficiency of the proposed DML algorithm on the same dataset
Findings
The most efficient method is the regular RCA approach
The most time-consuming one is NCA
pRCA is quite competitive: it is slower than RCA, DCA, and RDML, but considerably faster than ITML, LMNN, and NCA
Some Good Examples
Query Photo: Top Recommended Tags
autumn, fall, forest, trees, nature, tree, wood, germany, path, creative
sunset, clouds, sky, sea, beach, abigfave, sun, water, landscape, ocean
tiger, zoo, specanimal, impressedbeauty, abigfave, nature, animal, cat, animals, aplusphoto
garden, flowers, yellow, nature, hdr, nikon, spring, festival, impressed beauty
Some Poor Examples
Query Photo: Top Recommended Tags
macro, nikon, bokeh, nature, flower, canon, storm, eos, plane, flickrsbest
nikon, street, water, sport, blue, bike, lebanon, kids, eric mckenna, krissy mckenna
winter, photography, art, beach, usa, fashion, portrait, travel, party, snow
park, river, travel, trees, lake, hiking, winter, green, vacation, water
CONCLUSIONS Contributions:
Study DML from uncertain side information that exploits probabilistic side information
Propose a two-step probabilistic distance metric learning (PDML) framework
Present an effective probabilistic RCA (pRCA) algorithm
Apply the algorithm to the auto photo annotation by search task
Encouraging results showed that our technique is effective and promising
Future Work
To improve visual feature representation, especially for annotating objects
To expand the scale of the database
To improve large-scale search & indexing
To filter spam and irrelevant tags
To adopt users’ feedback to improve automated tagging performance on APT
Q&A More information is available:
Http://www.cais.ntu.edu.sg/~chhoi/APT/
Online demo of Auto Photo Tagging (APT) is available: Http://msm.cais.ntu.edu.sg/APT/
Contact:
WU Lei [email protected]
Steven CH Hoi [email protected]
School of Computer Engineering, Nanyang Technological University, Singapore 639798
Email: [email protected] Tel: (+65) 6513-8040 Fax: (+65) 6792-6559
Http://www.ntu.edu.sg/home/chhoi/
GRAPHICAL MODEL FOR LATENT CHUNKLET ESTIMATION
Inference Joint probability on documents and topics
Conditional probability on tags, visual words and topics
Gibbs sampling estimation
Appendix