Towards Semantic Embedding in Visual Vocabulary
The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition
Session: Object Recognition II: Using Language, Tuesday, June 15, 2010
Rongrong Ji¹, Hongxun Yao¹, Xiaoshuai Sun¹, Bineng Zhong¹, and Wen Gao¹,²
¹ Visual Intelligence Laboratory, Harbin Institute of Technology
² School of Electronic Engineering and Computer Science, Peking University
Overview
✴ Problem Statement
✴ Building patch-labeling correspondence
✴ Generative semantic embedding
✴ Experimental comparisons
✴ Conclusions
Introduction
• Building visual codebooks
  – Quantization-based approaches
    • K-Means, Vocabulary Tree, Approximate K-Means, etc.
  – Feature-indexing-based approaches
    • K-D Tree, R Tree, Ball Tree, etc.
• Refining visual codebooks
  – Topic model decompositions
    • pLSA, LDA, GMM, etc.
  – Spatial refinement
    • Visual pattern mining, discriminative visual phrases, etc.
Introduction
• With the prosperity of Web communities
[Figure labels: User Tags, Our Contribution]
Introduction
• Challenges in achieving this goal
  – Supervision only at the image level
  – Correlative semantic labels
  – Model generality
[Figure: example caption "A traditional San Francisco street view with car and buildings"; example labels "Flower" and "Rose"]
Introduction
• Our contribution
  – Publish a ground-truth patch-labeling set
    http://vilab.hit.edu.cn/~rrji/index_files/SemanticEmbedding.htm
  – A generalized semantic embedding framework
    • Easy to deploy into different codebook models
  – Modeling label correlations
    • Model correlative tagging in semantic embedding
Introduction
• The proposed framework
  – Build patch-level correspondence
  – Supervised visual codebook construction
Building Patch-Labeling Correspondence
• Collecting over 60,000 Flickr photos
  – For each photo
    • DoG detection + SIFT
    • "Face" labels (for instance)
Building Patch-Labeling Correspondence
• Purify the patch-labeling correspondences
  – Density-Diversity Estimation (DDE)
  – Formulation
    • For a semantic label $s_i$, its initial correspondence set is
      $$\langle D_i, s_i \rangle = \{\langle d_1, d_2, \dots, d_{n_i} \rangle, s_i\} = \{le_{d_1, s_i}, le_{d_2, s_i}, \dots, le_{d_{n_i}, s_i}\}$$
      – $D_i = \{d_1, d_2, \dots, d_{n_i}\}$: patches extracted from images with label $s_i$
      – $le_{ij}$: a correspondence from label $s_i$ to local patch $d_j$
• Purify the $le$ links in $\langle D_i, s_i \rangle$
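The initial correspondence set described on this slide can be sketched in a few lines: every patch simply inherits all labels of the image it came from, before any purification. The `images` structure and the function name are hypothetical and chosen only for illustration; patch descriptors (e.g. SIFT) are stood in for by integer ids.

```python
from collections import defaultdict

def build_initial_correspondence(images):
    """Map each label s_i to the patch ids D_i taken from images tagged with s_i.

    `images` is a hypothetical list of (labels, patch_ids) pairs, one per photo.
    """
    correspondence = defaultdict(list)
    for labels, patch_ids in images:
        for label in labels:
            # Image-level supervision: every patch inherits every image label.
            correspondence[label].extend(patch_ids)
    return dict(correspondence)

images = [
    ({"face", "street"}, [0, 1, 2]),
    ({"face"}, [3, 4]),
]
initial = build_initial_correspondence(images)
```

Many of these inherited links are noisy, since a patch from the background still receives the "face" label; that is what the purification step is meant to address.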
Building Patch-Labeling Correspondence
• Density
  – For a given $d_l$, the density $Den_{d_l}$ reveals its representability for $s_i$:
    $$Den_{d_l} = \frac{1}{m} \sum_{j=1}^{m} \exp\left(-\lVert d_l - d_j \rVert_{L2}\right)$$
  – Average neighborhood distance over the $m$ neighbors
• Diversity
  – For a given $d_l$, the diversity $Div_{d_l}$ reveals its uniqueness score for $s_i$:
    $$Div_{d_l} = -\frac{n_m}{n_i} \ln\left(\frac{n_m}{n_i}\right)$$
  – $n_m$: number of images in the neighborhood
  – $n_i$: total number of images with label $s_i$
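A minimal sketch of the two scores, assuming the forms $Den_{d_l} = \frac{1}{m}\sum_j \exp(-\lVert d_l - d_j\rVert)$ over the $m$ nearest patches and $Div_{d_l} = -\frac{n_m}{n_i}\ln\frac{n_m}{n_i}$. The function names, the exclusion of the patch itself from its neighborhood, and the use of a brute-force nearest-neighbor search are assumptions made for illustration.

```python
import numpy as np

def density(patches, l, m):
    """Den for patch l: mean of exp(-L2 distance) over its m nearest neighbours."""
    dists = np.linalg.norm(patches - patches[l], axis=1)
    dists = np.delete(dists, l)        # assumption: a patch is not its own neighbour
    nearest = np.sort(dists)[:m]
    return float(np.mean(np.exp(-nearest)))

def diversity(n_m, n_i):
    """Div = -(n_m / n_i) * ln(n_m / n_i), with n_m images in the neighbourhood."""
    ratio = n_m / n_i
    return float(-ratio * np.log(ratio))

patches = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
den_example = density(patches, 0, 1)
div_example = diversity(5, 10)
```

Under these assumptions a patch with many close neighbors drawn from many different images scores well on both terms, while a patch repeated within a single image scores low on diversity.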
Building Patch-Labeling Correspondence
• Purified correspondence set
  $$D_i^{Purify} = \{ d_j \mid DDE_{d_j} > T \} \quad \text{s.t.} \quad DDE_{d_j} = Den_{d_j} \times Div_{d_j}$$
• A high $T$ favors precision over recall
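The thresholding step can be sketched as below, assuming the DDE score is the product of density and diversity and that patches scoring above $T$ are kept. The function name and the toy scores are hypothetical.

```python
import numpy as np

def purify(den, div, T):
    """Return indices j whose DDE_j = Den_j * Div_j exceeds the threshold T."""
    dde = np.asarray(den) * np.asarray(div)
    return np.flatnonzero(dde > T)

den = [0.9, 0.2, 0.7]
div = [0.3, 0.3, 0.1]
kept_low_T = purify(den, div, T=0.05)   # permissive threshold
kept_high_T = purify(den, div, T=0.2)   # strict threshold
```

Raising `T` shrinks the retained set, which matches the slide's remark that a high threshold trades recall for precision.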
Building Patch-Labeling Correspondence
• Case study: “Face” label (before DDE)
Building Patch-Labeling Correspondence
• Case study: “Face” label (after DDE)
Generative Semantic Embedding
✴ Generative Hidden Markov Random Field
  ✴ A Hidden Field for semantic modeling
  ✴ An Observed Field for local patch quantization
Generative Semantic Embedding
• Hidden Field $S = \{s_i\}_{i=1}^{m}$
  – Each $s_i$ produces correspondence links ($le$) to a subset of patches in the Observed Field
  – Links $l_{ij}$ denote semantic correlations between $s_i$ and $s_j$
    • WordNet::Similarity (Pedersen et al., AAAI 04)
Generative Semantic Embedding
• Observed Field $D = \{d_i\}_{i=1}^{n}$
  – Any two nodes follow a visual metric (e.g. L2)
  – Once there is a link $le_{ij}$ between $d_i$ and $s_j$, we constrain $d_i$ by $s_j$ from the Hidden Field
Generative Semantic Embedding
• In the ideal case
  – Each $d_i$ is conditionally independent given $S$:
    $$P(D|S) = \prod_{i \le m} \{ P(d_i|s_j) \mid P(d_i|s_j) \neq 0 \}$$
  – The $s_j$ with nonzero $P(d_i|s_j)$ are the neighbors of $d_i$ in the Hidden Field
  – The feature set $D$ is regarded as (partially) generated from the Hidden Field
Generative Semantic Embedding
– Formalize the clustering procedure
  • Assign a unique cluster label $c_i$ to each $d_i$
  • Hence, $D$ is quantized into a codebook $W = \{w_k\}_{k=1}^{K}$ with corresponding features $V = \{v_k\}_{k=1}^{K}$
  • Cost for codebook candidate $C$:
    $$P(C|D) = \frac{P(C)\,P(D|C)}{P(D)}$$
    – $P(C)$: semantic constraint
    – $P(D|C)$: visual distortion
    – $P(D)$ is a constant
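Because $P(D)$ does not depend on the assignment $C$, the cost above can be restated in log form. The short derivation below uses only the symbols on this slide; $C^{*}$ is a name introduced here for the best assignment and does not appear in the original.

```latex
\begin{aligned}
C^{*} &= \arg\max_{C} P(C \mid D)
       = \arg\max_{C} \frac{P(C)\,P(D \mid C)}{P(D)} \\
      &= \arg\max_{C} \bigl[\, \log P(C) + \log P(D \mid C) \,\bigr]
\end{aligned}
```

The two remaining terms are the semantic constraint and the visual distortion, so the codebook search balances one against the other.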
Generative Semantic Embedding
• Semantic Constraint $P(C)$
  – Define an MRF on the Observed Field
  – For a quantizer assignment $C$, its probability can be expressed as a Gibbs distribution defined from the Hidden Field
Generative Semantic Embedding
• That is
  – Two data points $d_i$ and $d_j$ in $w_k$ contribute to $P(C)$ if and only if
    • Observed Field: $c_i = c_j$
    • Hidden Field: their labels $s_x$ and $s_y$ are connected by a link $l_{xy}$
Generative Semantic Embedding
• Visual Constraint $P(D|C)$
  – Whether the codebook $C$ is visually consistent with the current data $D$
    • Visual distortion in quantization
Generative Semantic Embedding
• Overall cost
  – Finding the MAP estimate of $P(C|D)$ amounts to maximizing the posterior probability, which combines
    • the visual constraint $P(D|C)$
    • the semantic constraint $P(C)$
Generative Semantic Embedding
• EM solution
  – E step: assign local patches to the closest clusters
  – M step: update the visual word centers
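The alternation above can be sketched as a semantically penalized k-means. This is an illustrative stand-in, not the slides' exact energy: the cost of putting patch $i$ in cluster $k$ is assumed to be its squared L2 distance to the center minus `lam` times the summed label correlation with the patches already in that cluster. The function name, `lam`, the one-label-per-patch encoding, and the sequential (ICM-style) E step are all assumptions.

```python
import numpy as np

def semantic_kmeans(X, labels, L, K, lam=1.0, iters=10, seed=0):
    """Hard-EM sketch: visual distortion minus lam * semantic reward.

    X: (n, d) patch features; labels: (n,) label index linked to each patch;
    L: (m, m) label-correlation matrix (hypothetical stand-in for the l links).
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    for _ in range(iters):
        # E step: reassign each patch in turn, given the other assignments.
        for i in range(len(X)):
            visual = ((X[i] - centers) ** 2).sum(axis=1)
            others = np.arange(len(X)) != i
            affinity = L[labels[i], labels[others]]
            reward = np.bincount(assign[others], weights=affinity, minlength=K)
            assign[i] = np.argmin(visual - lam * reward)
        # M step: move each visual word to the mean of its members.
        for k in range(K):
            members = X[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return assign, centers

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = np.array([0, 0, 1, 1])
L = np.zeros((2, 2))                  # all correlations zero: plain k-means
assign, centers = semantic_kmeans(X, labels, L, K=2)
```

With `L` set to all zeros the semantic reward vanishes and the loop reduces to ordinary k-means, which is one way to read the unsupervised special case of the model.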
Generative Semantic Embedding
• Model generalization
  – To supervised codebooks with a label-independence assumption
  – To unsupervised codebooks
    • Setting all $l = 0$
Experimental Comparisons
Case study of ratios between inter-class distance and intra-class distance with and without semantic embedding in the Flickr dataset.
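One way to compute such a ratio is sketched below, assuming it is the mean pairwise L2 distance between samples of different classes divided by the mean pairwise distance within a class. The slides do not spell out the exact definition, so the function name and this formulation are assumptions.

```python
import numpy as np

def inter_intra_ratio(features, classes):
    """Mean inter-class pairwise L2 distance divided by mean intra-class distance."""
    features = np.asarray(features, dtype=float)
    classes = np.asarray(classes)
    dists = np.linalg.norm(features[:, None] - features[None], axis=-1)
    same = classes[:, None] == classes[None]
    off_diag = ~np.eye(len(features), dtype=bool)   # ignore self-distances
    intra = dists[same & off_diag].mean()
    inter = dists[~same].mean()
    return float(inter / intra)

ratio = inter_intra_ratio([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
```

A larger ratio means the classes are better separated relative to their internal spread.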
Experimental Comparisons
MAP@1 comparison of our GSE model with Vocabulary Tree [1] and GNP [34] on the Flickr 60,000 database.
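For reference, a rank-1 score of this kind can be computed as below, assuming MAP@1 means the fraction of queries whose top-ranked result is relevant. The data layout and function name are hypothetical; the slides do not describe the evaluation protocol.

```python
def map_at_1(ranked_results, relevant):
    """Fraction of queries whose first returned item is in the relevant set."""
    hits = [
        1.0 if ranking and ranking[0] in relevant[query] else 0.0
        for query, ranking in ranked_results.items()
    ]
    return sum(hits) / len(hits)

ranked_results = {"q1": ["a", "b"], "q2": ["c", "d"]}
relevant = {"q1": {"a"}, "q2": {"d"}}
score = map_at_1(ranked_results, relevant)
```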
Experimental Comparisons
Confusion table comparison with the method in [24] on the PASCAL VOC dataset.
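A confusion table of this kind counts, for each true class, how often each class was predicted. The sketch below assumes integer class indices; the function name and toy data are illustrative only.

```python
import numpy as np

def confusion_table(true_classes, predicted_classes, num_classes):
    """table[t, p] counts samples of true class t that were predicted as class p."""
    table = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_classes, predicted_classes):
        table[t, p] += 1
    return table

table = confusion_table([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Off-diagonal mass shows which classes a codebook tends to confuse, so a more diagonal table indicates better discrimination.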
Conclusion
• Our contribution
  – Propagating Web labeling from images to local patches to supervise codebook quantization
  – A generalized semantic embedding framework for supervised codebook building
  – Modeling correlative semantic labels in supervision
• Future work
  – Adapt one supervised codebook to different tasks
  – Move supervision forward into local feature detection and extraction
Thank you!