VisualRank: Applying PageRank to Large-Scale Image Search
Yushi Jing, Member, IEEE
Shumeet Baluja, Member, IEEE
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, NOVEMBER 2008
[24] Y. Jing, S. Baluja, and H. Rowley, “Canonical Image Selection from the Web,” Proc. Sixth Int’l Conf. Image and Video Retrieval, pp. 280-287, 2007.
Outline
• Introduction
• Similarity graph [24]
• PageRank & VisualRank
• Hashing
• Experiments
• Conclusion
Search results for “d80” and “coca cola” from a traditional search engine
Introduction
• Visual theme, e.g., the “coca cola” logo
• CBIR: content-based image retrieval
– Pure
– Composite
• “Visual filter” via probabilistic graphical models (PGMs) [7]
• Compare:
– Object category learner
– Image search engine
[7] R. Fergus, P. Perona, and A. Zisserman, “A Visual Category Filter for Google Images,” Proc. Eighth European Conf. Computer Vision, pp. 242-256, 2004.
Introduction
• Combine [24]
– Pairwise visual similarity among images
– Nonvisual signals
• VisualRank
– Based on PageRank
– Handles a large number of queries and images
• Goal
– More accurate search ranking
Introduction
Feature generation
• Local descriptors
– SIFT; see the comparison of local descriptors in [29]
[29] K. Mikolajczyk and C. Schmid, “A Performance Evaluation of Local Descriptors,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, Oct. 2005.
Similarity graph
• Pairwise visual similarity between images
Similarity graph
• Top 1,000 results for “Mona-lisa”
PageRank
• Concept
– Vote
– Eigenvector centrality
[Figure: link graph with pages A, B, C, and D]
PR(A) = PR(B) + PR(C) + PR(D)
PageRank
[Figure: link graph with pages A, B, C, and D]
PageRank
• Random walk
• q = 0.15
PageRank
• Markov matrix
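The Markov-matrix view of PageRank can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes q is the probability of a random jump to any page, and the four-page link graph is hypothetical.

```python
def markov_matrix(links, pages):
    """Column-stochastic matrix: M[i][j] is the probability of moving from page j to page i."""
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    M = [[0.0] * n for _ in range(n)]
    for src, targets in links.items():
        for dst in targets:
            # Each page splits its vote evenly among its outgoing links.
            M[index[dst]][index[src]] = 1.0 / len(targets)
    return M


def pagerank(M, q=0.15, iterations=100):
    """Power iteration; q is assumed to be the random-jump probability."""
    n = len(M)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [q / n + (1 - q) * sum(M[i][j] * rank[j] for j in range(n))
                for i in range(n)]
    return rank


# Hypothetical link graph: every page has at least one outgoing link.
pages = ["A", "B", "C", "D"]
links = {"A": ["B"], "B": ["A"], "C": ["A"], "D": ["A", "B"]}
M = markov_matrix(links, pages)
ranks = pagerank(M)
```

In this toy graph, page A collects the most votes and so ends up with the highest rank.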
VisualRank
• Damping factor: usually d > 0.8
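A minimal sketch of how VisualRank scores could be computed. It assumes the damped update VR = d·S*·VR + (1 − d)·p, where S* is the column-normalized similarity matrix and p is uniform. The 4×4 similarity matrix and the choice d = 0.85 are hypothetical.

```python
def column_normalize(S):
    """Scale each column of S to sum to 1 (all-zero columns are left as zeros)."""
    n = len(S)
    col_sums = [sum(S[i][j] for i in range(n)) for j in range(n)]
    return [[S[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def visualrank(S, d=0.85, iterations=100):
    """Iterate VR = d * S_star * VR + (1 - d) * p with a uniform p."""
    n = len(S)
    S_star = column_normalize(S)
    p = [1.0 / n] * n
    vr = p[:]
    for _ in range(iterations):
        vr = [d * sum(S_star[i][j] * vr[j] for j in range(n)) + (1 - d) * p[i]
              for i in range(n)]
    return vr


# Hypothetical symmetric similarities: image 3 is only weakly connected.
S = [
    [0.0, 0.8, 0.6, 0.1],
    [0.8, 0.0, 0.7, 0.0],
    [0.6, 0.7, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
scores = visualrank(S)
```

The weakly connected image receives the lowest score, while the mutually similar images reinforce one another.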
Link spam
• Well-connected image vs. VisualRank, e.g., “Nemo”
Matching
• Precluster
– “Paris”, “Eiffel Tower”, and “Arc de Triomphe”
• Take the top N results and compute VisualRank
• Hashing
– Locality Sensitive Hashing (LSH)
– Feature descriptor as the key
Locality Sensitive Hashing (LSH)
• An approximate k-NN technique
• Hash function:
– a is a d-dimensional random vector
– b is a real number from a range
– W defines the quantization of the features
– V is the original feature vector
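The hash function itself did not survive on this slide. The sketch below assumes the common projection-based LSH form h(V) = floor((a·V + b) / W), with a drawn from a Gaussian and b drawn uniformly from [0, W]. Both sampling choices and the helper name `make_hash` are assumptions for illustration.

```python
import math
import random


def make_hash(dim, W, rng):
    """Return one LSH function h(V) = floor((a . V + b) / W)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # assumed Gaussian entries
    b = rng.uniform(0.0, W)                        # assumed uniform offset in [0, W]

    def h(V):
        projection = sum(a_i * v_i for a_i, v_i in zip(a, V))
        return math.floor((projection + b) / W)

    return h


rng = random.Random(0)
h = make_hash(dim=128, W=100.0, rng=rng)
v = [rng.uniform(0.0, 255.0) for _ in range(128)]  # stand-in for a 128-D descriptor
key = h(v)
```

A larger W gives coarser buckets, so more descriptors share a key.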
Flow(1/3)
1. Resize images to 500 × 500 pixels; 1,000 web images yield 300,000 to 700,000 feature vectors
2. Build L hash tables H = {H1, H2, …, HL}, each with K hash functions (L = 40, W = 100, K = 3)
Flow(2/3)
3. Matched descriptors
– Share the same key in more than C = 3 hash tables
4. Hough Transform
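Steps 2 and 3 can be sketched as follows. This is a simplified illustration rather than the authors' code. Each table's key is assumed to be the tuple of its K hash values, the hash form floor((a·V + b) / W) is an assumption, and the 8-D descriptors are made up.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def make_table_key(dim, K, W, rng):
    """One hash table's key function: a tuple of K LSH values."""
    funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, W))
             for _ in range(K)]

    def key(V):
        return tuple(
            math.floor((sum(a_i * v_i for a_i, v_i in zip(a, V)) + b) / W)
            for a, b in funcs
        )

    return key


def matched_pairs(descriptors, L=40, K=3, W=100.0, C=3, seed=0):
    """Pairs of descriptor indices that share a key in more than C of the L tables."""
    rng = random.Random(seed)
    dim = len(descriptors[0])
    collisions = defaultdict(int)
    for _ in range(L):
        key = make_table_key(dim, K, W, rng)
        buckets = defaultdict(list)
        for idx, V in enumerate(descriptors):
            buckets[key(V)].append(idx)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                collisions[(i, j)] += 1
    return {pair for pair, count in collisions.items() if count > C}


# Made-up descriptors: 0 and 1 are identical, 2 is very far away.
base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
descriptors = [base, list(base), [x + 1e5 for x in base]]
pairs = matched_pairs(descriptors)
```

Identical descriptors collide in every table, whereas distant ones almost never share a key in more than C tables.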
Flow(3/3)
5. Similarity
– Matched images: more than 3 matching features
– Similarity: number of matches divided by the average number of local features in the two images
6. Given the similarity matrix S, compute VisualRank
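Step 5 can be sketched as a small function. This is an illustration under stated assumptions: "more than 3 features" is read as a strict threshold, and the feature counts and match counts are invented.

```python
def similarity(matches, n_features_u, n_features_v, min_matches=3):
    """Matches divided by the average feature count; 0 unless matches > min_matches."""
    if matches <= min_matches:
        return 0.0
    return matches / ((n_features_u + n_features_v) / 2.0)


# Invented data: local feature counts per image and match counts per image pair.
n_features = [400, 600, 500]
match_counts = {(0, 1): 50, (0, 2): 3, (1, 2): 11}

# Symmetric similarity matrix S, ready to be passed to a ranking step.
S = [[0.0] * 3 for _ in range(3)]
for (u, v), m in match_counts.items():
    S[u][v] = S[v][u] = similarity(m, n_features[u], n_features[v])
```

Dividing by the average feature count keeps images with many local features from dominating the graph on raw match counts alone.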
Experiments
• The 2,000 most popular product queries on Google, e.g., “ipod”, “Xbox”
• The top 1,000 Google search results for each query, collected in July 2007
• Filter
– Drop queries where fewer than 5% of the images have at least one connection
– 1,000 queries remain
Experiment 1
• Evaluate
– “Irrelevancy” of our ranking
• Mix the top 10 VisualRank results with the top 10 Google results, remove duplicates, and ask “which are least relevant?”
• 150 evaluators, 50 randomly chosen queries
Experiment 2
• VisualRank_bias
• p^T = [v_j]^T = [1/m, …, 1/m, 0, …, 0]
• HeuristicRank – a pure CBIR system
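A minimal sketch of the biased variant. It assumes the images are ordered by the default search ranking, so the first m entries of p are the top m results, and that the update is VR = d·S*·VR + (1 − d)·p. The column-stochastic matrix `S_star` is hypothetical.

```python
def bias_vector(n, m):
    """p = [1/m, ..., 1/m, 0, ..., 0]: mass only on the first m of n images."""
    return [1.0 / m] * m + [0.0] * (n - m)


def damped_rank(S_star, p, d=0.85, iterations=100):
    """Iterate VR = d * S_star * VR + (1 - d) * p for a given damping vector p."""
    n = len(p)
    vr = [1.0 / n] * n
    for _ in range(iterations):
        vr = [d * sum(S_star[i][j] * vr[j] for j in range(n)) + (1 - d) * p[i]
              for i in range(n)]
    return vr


# Hypothetical column-stochastic similarity matrix for 4 images.
S_star = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.5, 0.0],
]
p_bias = bias_vector(n=4, m=2)
biased = damped_rank(S_star, p_bias)
uniform = damped_rank(S_star, [0.25] * 4)
```

With a uniform p, the symmetric toy graph gives every image the same score. The biased p shifts score toward the images that the default ranking already placed first.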
Experiment 3
• Collected click counts from Google for the top 40 images of each query
• Compare
– Total clicks on the top 20 images under VisualRank
– Total clicks on the top 20 images under the default ranking
• VisualRank's top 20 images received 17.5% more clicks than those of the default Google ranking
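The click comparison can be sketched as below. The image names, click counts, and rankings are invented for illustration, and k = 2 stands in for the top 20 to keep the example small.

```python
def top_k_clicks(ranking, clicks, k=20):
    """Total clicks received by the first k images of a ranking."""
    return sum(clicks.get(image, 0) for image in ranking[:k])


def relative_gain(new_ranking, default_ranking, clicks, k=20):
    """Fractional increase in top-k clicks of new_ranking over default_ranking."""
    base = top_k_clicks(default_ranking, clicks, k)
    return (top_k_clicks(new_ranking, clicks, k) - base) / base


# Invented click counts and rankings for one query.
clicks = {"img_a": 120, "img_b": 80, "img_c": 40, "img_d": 10}
default_ranking = ["img_d", "img_a", "img_c", "img_b"]
visual_ranking = ["img_a", "img_b", "img_c", "img_d"]
gain = relative_gain(visual_ranking, default_ranking, clicks, k=2)
```

A positive gain means the images promoted into the top k by the new ranking are clicked more often than the ones they displaced.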
Landmarks
• 80 common landmarks, e.g., “Eiffel Tower,” “Big Ben,” “Coliseum,” and “Lincoln Memorial”
Conclusion
• VisualRank applies the PageRank concept and combines
– The default Google ranking
– The similarity graph between images
• VisualRank can outperform the default Google ranking on the vast majority of queries
• Efficiently reduces the number of irrelevant images