Session: Object Recognition II: using Language, Tuesday 15, June 2010

35
Towards Semantic Embedding in Visual Vocabulary The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition ession: Object Recognition II: using Language, Tuesday 15, June 201 Rongrong Ji 1 , Hongxun Yao 1 , Xiaoshuai Sun 1 , Bineng Zhong 1 , and Wen Gao 1,2 1 Visual Intelligence Laboratory, Harbin Institute of Technology 2 School of Electronic Engineering and Computer Science, Peking University

description

Session: Object Recognition II: using Language, Tuesday 15, June 2010. Rongrong Ji 1 , Hongxun Yao 1 , Xiaoshuai Sun 1 , Bineng Zhong 1 , and Wen Gao 1,2 1 Visual Intelligence Laboratory, Harbin Institute of Technology 2 School of Electronic Engineering and Computer Science, Peking University. - PowerPoint PPT Presentation

Transcript of Session: Object Recognition II: using Language, Tuesday 15, June 2010

Page 1: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Towards Semantic Embedding in Visual Vocabulary

Towards Semantic Embedding in Visual Vocabulary

The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition

Session: Object Recognition II: using Language, Tuesday 15, June 2010

Rongrong Ji1, Hongxun Yao1, Xiaoshuai Sun1, Bineng Zhong1, and Wen Gao1,2

1Visual Intelligence Laboratory, Harbin Institute of Technology2School of Electronic Engineering and Computer Science, Peking University

Page 2: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Overview

✴ Problem Statement

✴ Building patch-labeling correspondence

✴ Generative semantic embedding

✴ Experimental comparisons

✴ Conclusions

Page 3: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Overview

✴ Problem Statement

✴ Building patch-labeling correspondence

✴ Generative semantic embedding

✴ Experimental comparisons

✴ Conclusions

Page 4: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Introduction

• Building visual codebooks– Quantization-based approaches

• K-Means, Vocabulary Tree, Approximate K-Means et al.

– Feature-indexing-based approaches• K-D Tree, R Tree, Ball Tree et al.

• Refining visual codebooks– Topic model decompositions

• pLSA, LDA, GMM et al.

– Spatial refinement• Visual pattern mining, Discriminative visual phrase et al.

Page 5: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Introduction

• With the prosperity of Web community

User Tags

Our Contribution

Page 6: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Introduction

• Problems to achieve this goal

– Supervision @ image level

– Correlative semantic labels

– Model generality

A traditional San Francisco street A traditional San Francisco street view with car and buildingsview with car and buildings

??Flower

Rose

Page 7: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Introduction

• Our contribution– Publish a ground truth path-labeling set

http://vilab.hit.edu.cn/~rrji/index_files/SemanticEmbedding.htm

– A generalized semantic embedding framework• Easy to deploy into different codebook models

– Modeling label correlations• Model correlative tagging in semantic embedding

Page 8: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Introduction

• The proposed framework

Build patch-level correspondenceSupervised visual codebook construction

Page 9: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Overview

✴ Problem Statement

✴ Building patch-labeling correspondence

✴ Generative semantic embedding

✴ Experimental comparisons

✴ Conclusions

Page 10: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Building Patch-Labeling Correspondence

• Collecting over 60,000 Flickr photos– For each photo

• DoG detection + SIFT

• “Face” labels (for instance)

Page 11: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Building Patch-Labeling Correspondence

• Purify the path-labeling correspondences– Density-Diversity Estimation (DDE)– Formulation

• For a semantic label , its initial correspondence set is

– Patches extracted from images with label

– A correspondence from label to local patch

1 2

1 2, { , ,..., },

{ , ,..., }i

i i n ii

i i n i

d s d s d s

D s d d d s

le le le

1 2{ , ,..., }i nD d d d

ijleis

is jd

Purify le links in <Di, si>

Page 12: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Building Patch-Labeling Correspondence

• Density– For a given , Density reveals its

representability for :

• Diversity– For a given , Diversity reveals it

unique score for :

ld ldDen

is

21

1exp( || || )

l

m

d l j Lj

Den d dm

ldDivld

is

ln( )l

m md

i i

n nDiv

n n

Average neighborhood distance in m

neighborsnm: number of images in

neighborhood ni: number of total images with

label si

Page 13: Session: Object Recognition II: using Language, Tuesday 15, June 2010

{ | }

. . j

j j j

Purifyi j d

d d d

D d DDE T

s t DDE Den Div

High T: Concerns more on Precision,

not Recall

Building Patch-Labeling Correspondence

Page 14: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Building Patch-Labeling Correspondence

• Case study: “Face” label (before DDE)

Page 15: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Building Patch-Labeling Correspondence

• Case study: “Face” label (after DDE)

Page 16: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Overview

✴ Problem Statement

✴ Building patch-labeling correspondence

✴ Generative semantic embedding

✴ Experimental comparisons

✴ Conclusions

Page 17: Session: Object Recognition II: using Language, Tuesday 15, June 2010

✴Generative Hidden Markov Random Field

✴ A Hidden Field for semantic modeling

✴ An Observed Field for local patch quantization

Generative Semantic Embedding

Page 18: Session: Object Recognition II: using Language, Tuesday 15, June 2010

• Hidden Field– Each produces correspondence links (le) to

a subset of patch in the Observed Field– Links denotes semantic correlations

between and

1{ }mi iS s

is

ijl

is js

lij

Generative Semantic Embedding

WordNet::SimilarityPedersen et al. AAAI 04

Page 19: Session: Object Recognition II: using Language, Tuesday 15, June 2010

• Observed Field– Any two nodes follow visual metric (e.g. L2)– Once there is a link between and , we

constrain by from the hidden field

1{ }ni iD d

ijle

jsjsid

id

sj

di

leij

Generative Semantic Embedding

Page 20: Session: Object Recognition II: using Language, Tuesday 15, June 2010

• In the ideal case

– Each is conditionally independent given :

Neighbors in the Hidden Field

id

( | ) { ( | ) | ( | ) 0}i j i ji m

P D S P d s P d s

Feature set D is regarded as (partial) generative from the Hidden Field

Generative Semantic Embedding

Page 21: Session: Object Recognition II: using Language, Tuesday 15, June 2010

– Formulize Clustering procedure

• Assign a unique cluster label to each

• Hence, D is quantized into a codebook with corresponding features

• Cost for codebook candidate C

P(C|D)=P(C)P(D|C)/P(D)

ic id

1{ }Kk kW w 1{ }Kk kV v

Generative Semantic Embedding

P(D) is a constraintSemantic Constraint

Visual Distortion

Page 22: Session: Object Recognition II: using Language, Tuesday 15, June 2010

• Semantic Constraint P(C)– Define a MRF on the Observed Field

– For a quantizer assignments C, its probability can be expressed as a Gibbs distribution from the Hidden Field as

Generative Semantic Embedding

Page 23: Session: Object Recognition II: using Language, Tuesday 15, June 2010

• That is– Two data points and in contribute to

if and only ifjdid kw

Observed FieldObserved Field

Hidden FieldHidden Field

ccii=c=cii

ssxx ssyy

ddjjddii

llxyxy

Generative Semantic Embedding

Page 24: Session: Object Recognition II: using Language, Tuesday 15, June 2010

• Visual Constraint P(D|C)– Whether the codebook C is visually consistent

with current data D• Visual distortion in quantization

Generative Semantic Embedding

Page 25: Session: Object Recognition II: using Language, Tuesday 15, June 2010

• Overall Cost• Finding MAP of P(C|D) can be converted into

maximizing its posterior probability

The visual constraint P(D|C)

The semantic constraints P(C)

Generative Semantic Embedding

Page 26: Session: Object Recognition II: using Language, Tuesday 15, June 2010

• EM solution

• E step

• M step

Assign local patches to the closest clusters

Update the visual word center

Generative Semantic Embedding

Page 27: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Generative Semantic Embedding

Page 28: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Generative Semantic Embedding

• Model generation– To supervised codebook with label

independence assumption

– To unsupervised codebooks• Making all l=0

Page 29: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Overview

✴ Problem Statement

✴ Building patch-labeling correspondence

✴ Generative semantic embedding

✴ Experimental comparisons

✴ Conclusions

Page 30: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Experimental Comparisons

Case study of ratios between inter-class distance and intra-class distance with and without semantic embedding in the Flickr dataset.

Page 31: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Experimental Comparisons

MAP@1 comparisons between our GSE model to Vocabulary Tree [1] and GNP [34] in Flickr 60,000 database.

Page 32: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Experimental Comparisons

Confusion table comparisons on PASCAL VOC dataset with method in [24].

Page 33: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Overview

✴ Problem Statement

✴ Building patch-labeling correspondence

✴ Generative semantic embedding

✴ Experimental comparisons

✴ Conclusions

Page 34: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Conclusion

• Our contribution• Propagating Web Labeling from images to local

patches to supervise codebook quantization• Generalized semantic embedding framework for

supervised codebook building• Model correlative semantic labels in supervision

• Future works• Adapt one supervised codebook for different tasks• Move forward supervision into local feature

detection and extraction

Page 35: Session: Object Recognition II: using Language, Tuesday 15, June 2010

Thank you!Thank you!