Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search. Yue Gao, Meng Wang, Zheng-Jun Zha, Jialie Shen, Xuelong Li, Xindong Wu. IEEE Transactions on Image Processing, 2013.


Page 1: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Yue Gao, Meng Wang, Zheng-Jun Zha, Jialie Shen, Xuelong Li, Xindong Wu

IEEE Transactions on Image Processing, 2013

Page 2: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Introduction

Page 3: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Tag-based social image search
• Social media data (Flickr, YouTube, etc.)
  • Associated with user-generated tags and meta information (date, location, etc.)
• Conventional tag-based social image search
  • Too much noise in tags
  • Lacks an optimal ranking strategy (e.g., Flickr offers time-based ranking and interestingness-based ranking)
• Existing relevance-based ranking methods
  • Explore visual content and tags separately or sequentially

Page 4: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Proposed scheme
• A hypergraph-based approach that simultaneously utilizes visual information and tags
  • Vertex: social image
  • Hyperedge: visual word / tag
• Learn the hyperedge weights (the importance of different visual words and tags)
• Obtain the relevance scores of images

Page 5: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Related works

Page 6: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Social image search
• Separated methods
  • Only the textual content or only the visual content is employed for tag analysis
  • Useful information from the other modality is missing

Page 7: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Social image search
• Sequential methods
  • The visual content and the tags are employed sequentially for image search
  • The correlation between visual content and tags is broken, because the two are handled in separate steps

Page 8: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Social image search
• Joint method

Page 9: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Hypergraph learning
• A hypergraph is a generalization of a graph in which an edge can connect multiple vertices
• Used for data mining and information retrieval tasks
• Effective in capturing higher-order relationships

Page 10: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Hypergraph analysis

Page 11: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Definition
• Hypergraph G = (V, E, w)
  • Vertex set V
  • Hyperedge set E
  • Edge weight set w
• A hyperedge is able to link more than two vertices.

(Figure: example hypergraph; image from Wikipedia)
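The definition above can be made concrete with an incidence matrix. The following is a minimal sketch, assuming NumPy and a toy hypergraph that does not come from the slides; the names `H`, `w`, `vertex_degrees`, and `edge_degrees` are illustrative only.

```python
import numpy as np

# Incidence matrix H: rows are vertices, columns are hyperedges.
# H[v, e] = 1 when vertex v belongs to hyperedge e (toy example, 4 vertices, 3 hyperedges).
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
], dtype=float)

# One non-negative weight per hyperedge (illustrative values).
w = np.array([0.5, 0.3, 0.2])


def vertex_degrees(H, w):
    """Degree of a vertex: sum of the weights of the hyperedges that contain it."""
    return H @ w


def edge_degrees(H):
    """Degree of a hyperedge: number of vertices it connects."""
    return H.sum(axis=0)


d_v = vertex_degrees(H, w)
d_e = edge_degrees(H)
```

Note that the first hyperedge links three vertices at once, which an ordinary graph edge cannot do.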

Page 12: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Hypergraph analysis
• Learning with hypergraphs
  • Binary classification with a hypergraph
  • The normalized Laplacian method is formulated as a regularization framework

$$ \arg\min_{f} \{ \lambda R_{emp}(f) + \Omega(f) \} $$

• $\Omega(f)$: regularizer
• $R_{emp}(f)$: empirical loss
• $\lambda$: weighting parameter
• $f$: to-be-learned classification function
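For orientation, the regularizer in the normalized Laplacian method is usually written in the pairwise form below. The slide does not spell it out, so this is the standard formulation from the hypergraph learning literature, stated here as an assumption; $h(v,e)$ is the incidence indicator, $d(v)$ the vertex degree, $\delta(e)$ the hyperedge degree, and $D_v$, $D_e$, $W$ the corresponding diagonal matrices.

```latex
\Omega(f)
  = \frac{1}{2} \sum_{e \in E} \sum_{u, v \in V}
    \frac{w(e)\, h(u,e)\, h(v,e)}{\delta(e)}
    \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^{2}
  = f^{T} \Delta f,
\qquad
\Delta = I - D_v^{-1/2} H W D_e^{-1} H^{T} D_v^{-1/2}.
```

In this form the regularizer is small when vertices that share heavily weighted hyperedges receive similar scores.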

Page 13: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Visual-textual joint relevance learning

Page 14: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Hypergraph construction
• Vertex construction
  • Vertices: the social image set
  • The number of vertices in the hypergraph is equal to the number of images in the image dataset.

Page 15: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Hypergraph construction
• Hyperedge construction
  • Feature 1: visual content
    • Bag of visual words
    • Extract local SIFT descriptors for each image
    • Train a visual vocabulary with the descriptors

$$ f_i^{bow}(k,1) = \begin{cases} 1 & \text{if the } i\text{-th image has the } k\text{-th visual word} \\ 0 & \text{otherwise} \end{cases} $$
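The binary bag-of-visual-words feature above can be sketched as follows. This assumes the SIFT descriptors of each image have already been quantized into visual-word indices against a trained vocabulary; that step is not shown, and the function name `binary_bow` and its inputs are hypothetical.

```python
import numpy as np


def binary_bow(word_ids_per_image, vocab_size):
    """Build the binary indicator matrix F, where F[i, k] = 1 if image i
    contains visual word k at least once, and 0 otherwise."""
    F = np.zeros((len(word_ids_per_image), vocab_size), dtype=np.uint8)
    for i, ids in enumerate(word_ids_per_image):
        # Repeated occurrences of a word collapse to a single 1.
        idx = np.array(sorted(set(ids)), dtype=int)
        F[i, idx] = 1
    return F


# Toy input: three images, a vocabulary of four visual words.
F_demo = binary_bow([[0, 2, 2], [1], []], vocab_size=4)
```

Each column of the resulting matrix lists the images that contain one visual word, which is the information a visual-word hyperedge needs.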

Page 16: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Hypergraph construction
• Hyperedge construction
  • Feature 2: textual information
    • Bag of textual words
    • Tags in each image are ranked by TagRanking
    • For further processing, only the top-ranked tags of each image are kept
    • For hyperedge construction, only the tags with the highest TF-IDF values are kept in the database

$$ f_i^{tag}(k,1) = \begin{cases} 1 & \text{if the } i\text{-th image has the } k\text{-th tag} \\ 0 & \text{otherwise} \end{cases} $$

Page 17: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Hypergraph construction
• Hyperedge construction
  • If two images contain the same visual word, they are connected by a hyperedge.
    • If $f_i^{bow}(k,1) = 1$ and $f_j^{bow}(k,1) = 1$, images $i$ and $j$ are connected.
  • If two images contain the same tag, they are connected by a hyperedge.
    • If $f_i^{tag}(k,1) = 1$ and $f_j^{tag}(k,1) = 1$, images $i$ and $j$ are connected.
  • The hypergraph contains visual-content-based hyperedges and tag-based hyperedges; the total number of hyperedges is the sum of the two counts.
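As a rough illustration of the construction above, the visual-word and tag indicator matrices can be placed side by side to form one incidence matrix. This is a sketch under assumptions: the function names are hypothetical, and dropping hyperedges that contain no image is a convenience added here, not something stated on the slide.

```python
import numpy as np


def build_incidence(F_bow, F_tag):
    """Stack visual-word hyperedges and tag hyperedges into one incidence matrix.
    Rows are images (vertices); columns are hyperedges."""
    H = np.hstack([F_bow, F_tag]).astype(float)
    # Assumption: hyperedges that contain no image carry no information, so drop them.
    keep = H.sum(axis=0) > 0
    return H[:, keep]


def shared_hyperedges(H, i, j):
    """Number of hyperedges (visual words or tags) that contain both image i and image j."""
    return int(H[i] @ H[j])


# Toy data: 3 images, 2 visual words, 2 tags.
F_bow_demo = np.array([[1, 0], [1, 0], [0, 0]])
F_tag_demo = np.array([[0, 1], [0, 0], [0, 1]])
H_demo = build_incidence(F_bow_demo, F_tag_demo)
```

In the toy data, images 0 and 1 are linked through a visual word, images 0 and 2 through a tag, and images 1 and 2 share nothing.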

Page 18: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Hypergraph construction

(Figure: example of textual hyperedge construction)

(Figure: example of visual hyperedge construction)

Page 19: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Example of the connection between two images

Page 20: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Social image relevance learning
• Social image search task
  • Treated as a binary classification problem
  • Measures the relevance scores of all vertices in the hypergraph
  • Transductive inference is also formulated as a regularization framework
• Objective function, composed of:
  • To-be-learned relevance score vector f
  • Weight vector w
  • Regularizer term
  • Empirical loss term
  • Weight regularizer term
• The regularizer term indicates that highly related vertices should have close label results

Page 21: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Social image relevance learning
• Objective function
  • The empirical loss term guarantees that the newly generated labeling results are not far away from the initial label information
  • [Equations not preserved: two objective-function formulas, each stated with an s.t. constraint]
  • Δ: the normalized hypergraph Laplacian
  • y: n × 1 initial label vector
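Because the slide equations did not survive in this transcript, the following is only a plausible reconstruction of the kind of objective described, not a quotation. The squared-error empirical loss, the squared-norm weight regularizer, the sum-to-one constraint, and the symbol $n_e$ (number of hyperedges) are assumptions; $\lambda$ and $\mu$ are the two weighting parameters varied in the experiments.

```latex
\arg\min_{f,\, w} \Big\{ \Omega(f) + \lambda R_{emp}(f) + \mu \sum_{i=1}^{n_e} w_i^{2} \Big\}
\quad \text{s.t.} \quad \sum_{i=1}^{n_e} w_i = 1,
\qquad
\Omega(f) = f^{T} \Delta f, \quad R_{emp}(f) = \| f - y \|^{2}.

% With w fixed, setting the gradient with respect to f to zero gives
\Delta f + \lambda (f - y) = 0
\;\;\Longrightarrow\;\;
f = \Big( I + \tfrac{1}{\lambda} \Delta \Big)^{-1} y .
```

Under these assumptions, a larger $\lambda$ keeps $f$ closer to the initial labels $y$, while a larger $\mu$ pushes the hyperedge weights toward a uniform distribution.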

Page 22: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Optimization
• Alternating optimization strategy
  • There are two to-be-learned variables, w and f
  • One is fixed while the other is optimized, and the roles alternate at each step
• Iterating this procedure yields both w and f.
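The alternating strategy can be sketched in NumPy as below. This is not the authors' exact update rule: it assumes the objective f^T Δ f + λ‖f − y‖² + μ‖w‖² with weights constrained to be non-negative and sum to one, and it holds the vertex degrees fixed during the w-step as a simplification. All function names are hypothetical.

```python
import numpy as np


def laplacian(H, w):
    """Normalized hypergraph Laplacian I - Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    dv = np.maximum(H @ w, 1e-12)      # vertex degrees (guarded against zero)
    de = H.sum(axis=0)                 # hyperedge degrees
    S = H / np.sqrt(dv)[:, None]       # Dv^-1/2 H
    return np.eye(H.shape[0]) - (S * (w / de)) @ S.T


def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)


def update_f(H, w, y, lam):
    """Fix w, solve for f in closed form: f = (I + Laplacian / lam)^-1 y."""
    return np.linalg.solve(np.eye(len(y)) + laplacian(H, w) / lam, y)


def update_w(H, w, f, mu):
    """Fix f (and, as a simplification, the vertex degrees), then minimize
    -sum_e w_e g_e + mu * ||w||^2 over the simplex via projection of g / (2 mu)."""
    dv = np.maximum(H @ w, 1e-12)
    de = H.sum(axis=0)
    g = (H.T @ (f / np.sqrt(dv))) ** 2 / de
    return project_simplex(g / (2.0 * mu))


def alternating_optimization(H, y, lam=1.0, mu=1.0, n_iter=5):
    w = np.full(H.shape[1], 1.0 / H.shape[1])   # start from uniform weights
    for _ in range(n_iter):
        f = update_f(H, w, y, lam)
        w = update_w(H, w, f, mu)
    return f, w


# Toy run: 4 images, 3 hyperedges, image 0 is the only pseudo-relevant sample.
H_toy = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
y_toy = np.array([1.0, 0.0, 0.0, 0.0])
f_toy, w_toy = alternating_optimization(H_toy, y_toy)
```

The simplex projection is used here because, with the degrees held fixed, the w-step is a quadratic problem whose constrained minimizer is exactly that projection.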

Page 23: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Probabilistic explanation
• Probabilistic perspective
  • Derive the optimal f and w as those with the maximum posterior probability given the samples X and the label vector y
  • This is equivalent to minimizing the objective function subject to its constraint (s.t.)

Page 24: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Pseudo-relevant sample selection
• Pseudo-relevant samples
  • Associated with the query tag
  • Have high relevance probabilities
  • Are expected not to be far from the final results
  • Used for noise reduction

Page 25: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Pseudo-relevant sample selection
• Semantic relevance measuring
  • All the social images that are associated with the query tag are ranked in descending order of semantic relevance
  • The top K results are selected as the pseudo-relevant images

$$ s(x_i, t_q) = \frac{1}{n_i} \sum_{t \in T_i} s_{tag}(t_q, t) $$

where $T_i$ is the tag set of image $x_i$ and $n_i$ is the number of tags in it.

• Semantic similarity
  • Uses the Flickr Distance (FD) between two tags
  • Based on a latent-topic-based visual language model

$$ s_{tag}(t_1, t_2) = \exp(-FD(t_1, t_2)) $$
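The selection procedure above can be sketched in a few lines. Flickr Distance itself needs a trained visual language model, so this sketch substitutes a toy lookup table; `toy_flickr_distance`, the table values, and the image data are all invented for illustration.

```python
import math

# Hypothetical stand-in for Flickr Distance: a small symmetric lookup table.
TOY_FD = {
    frozenset(("apple", "fruit")): 0.2,
    frozenset(("apple", "laptop")): 1.5,
    frozenset(("apple", "office")): 2.0,
}


def toy_flickr_distance(t1, t2):
    if t1 == t2:
        return 0.0
    return TOY_FD.get(frozenset((t1, t2)), 5.0)  # unknown pairs are treated as far apart


def image_relevance(tags, query, distance):
    """s(x_i, t_q): mean of exp(-FD(t_q, t)) over the tags t of the image."""
    if not tags:
        return 0.0
    return sum(math.exp(-distance(query, t)) for t in tags) / len(tags)


def select_pseudo_relevant(images, query, distance, k):
    """Rank the images tagged with the query by semantic relevance; keep the top k."""
    candidates = [(img, tags) for img, tags in images.items() if query in tags]
    ranked = sorted(candidates,
                    key=lambda item: image_relevance(item[1], query, distance),
                    reverse=True)
    return [img for img, _ in ranked[:k]]


images_demo = {
    "a": ["apple", "fruit"],
    "b": ["apple", "laptop", "office"],
    "c": ["dog"],
}
top_demo = select_pseudo_relevant(images_demo, "apple", toy_flickr_distance, k=2)
```

In the toy data, image "a" ranks above "b" because its other tag is semantically closer to the query, and "c" is never considered because it lacks the query tag.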

Page 26: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Experiments

Page 27: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Experimental settings
• Dataset: Flickr dataset (104,000 images, 83,999 tags) + NUS-WIDE (370K+ images)
• Labeling: three relevance levels: very relevant (2), relevant (1), and irrelevant (0)
• Compared algorithms
  • Graph-based semi-supervised learning (Graph)
  • Sequential social image relevance learning (Sequential)
  • Tag ranking (TagRanking)
  • Tag relevance combination (Uniform Tagger)
  • Hypergraph-based relevance learning (HG)
  • HG + hyperedge weight estimation (HG+WE)
  • HG + WE (visual contents only)
  • HG + WE (textual contents only)
• Performance evaluation metric
  • Normalized Discounted Cumulative Gain (NDCG)
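For readers unfamiliar with the metric, NDCG@k over the three relevance levels can be computed as in the sketch below. The slides do not give the exact formula, so the gain 2^rel − 1 and the log2 position discount used here are the common textbook choices and should be read as assumptions; the ideal ordering is taken from the labels of the ranked list itself.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (position 1 has discount 1)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of a ranked list: 2 = very relevant, 1 = relevant, 0 = irrelevant.
perfect = ndcg_at_k([2, 1, 0], k=3)
reversed_order = ndcg_at_k([0, 1, 2], k=3)
```

A perfectly ordered list scores 1, and any list that places less relevant images ahead of more relevant ones scores strictly less.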

Page 28: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

The NDCG@20 results of different methods

(Bar chart: average NDCG@20 per method)

Method            Average NDCG@20
HL+WE             0.8814
HL+WE (tag)       0.8578
HL+WE (visual)    0.8463
HL                0.7418
TagRanking        0.6281
Uniform Tagger    0.5994
Seq               0.5778
Graph             0.5727

Page 29: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Average NDCG@k comparison
• The proposed approach consistently outperforms the other methods

(Figure: average NDCG@k plotted against the NDCG depth k)

Page 30: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

• Top results obtained by different methods for the query "weapon"
• The final ranking list can preserve images from all the different meanings of the query

Page 31: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

• Top results obtained by different methods for the query "apple"
• The proposed method can return relevant results with different meanings

Page 32: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

The effects of hyperedge weight learning

Top 100 visual words with the highest weights after the hypergraph learning process

Page 33: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

The effects of hyperedge weight learning

Ten tags with the highest weights after the hypergraph learning process for the queries (a) car and (b) weapon.

Page 34: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Variation of weighting parameters

Average NDCG@20 performance curves with respect to the variation of λ and μ.

Page 35: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Variation of dictionary size

NDCG@20 comparison of the proposed method with different sizes of the tag and visual word dictionaries.

Page 36: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Variation of the maximum number of tags

NDCG@20 comparison of the proposed method with different settings of the maximum number of tags kept per image.

This parameter is employed to filter out noisy tags.

Page 37: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Computational cost comparison

Page 38: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Conclusion

Page 39: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Conclusion
• Proposal: joint utilization of both visual contents and tags through a hypergraph and a relevance learning procedure for social image search
• Consideration of the weights of hyperedges
  • Differs from previous hypergraph learning algorithms
  • Minimizes the effects of uninformative features
• Future work
  • Next issue: diversity of search results

Page 40: Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search

Thank you!

Q&A