Class 6: Attributes and Semantic Features

Rogerio Feris, March 6, 2014

EECS 6890 – Topics in Information Processing, Spring 2014, Columbia University

http://rogerioferis.com/VisualRecognitionAndSearch2014


Visual Recognition and Search, Columbia University, Spring 2014

Paper Review Reminder

Paper review due March 11 (solo, no groups):

Perronnin et al, Improving the Fisher Kernel for Large-Scale Image Classification, ECCV 2010

You can use up to 3 late days over the course of the semester

Required content (1-2 pages):

Summary

Strengths and Weaknesses

Experimental Analysis

Proposed Extensions

Check more details at:

http://rogerioferis.com/VisualRecognitionAndSearch2014/PaperReviews.html


Project Update Reminder

Project Update Presentation: March 25/27

Milestones, preliminary results.

More information about the project update requirements coming soon.


What we have seen so far

Low-Level Features

SIFT, SURF, HOG, BRISK, etc.

Feature Coding and Pooling

Bag-of-words, Sparse coding, Fisher vector coding, etc.

Encoding Structure: Part-based Models

Deformable Part-based Models, Poselets, etc.

Attributes And Semantic Features [Today]

Part I: From Low-level to Semantic Visual Representations


Introduction to Semantic Features

Use the scores of semantic classifiers as high-level features

…

Semantic Features

Off-the-shelf Classifiers

Compact / powerful descriptor with semantic meaning (allows “explaining” the decision)

[Figure: an input image is scored by water, sand, and sky classifiers; the three scores are the input to a beach classifier]
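The pipeline above can be sketched in a few lines. This is a minimal illustration, not the system from the slides: the linear concept classifiers, their weights, the 3-dimensional low-level descriptor, and the second-stage beach classifier are all hypothetical values chosen for the example.

```python
import math

# Hypothetical linear "off-the-shelf" concept classifiers: (weights, bias)
# over a toy 3-dimensional low-level descriptor. Values are illustrative only.
CONCEPT_CLASSIFIERS = {
    "water": ([1.5, -0.5, 0.0], -0.2),
    "sand": ([-0.3, 1.2, 0.1], -0.1),
    "sky": ([0.2, 0.0, 1.4], -0.3),
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def semantic_features(low_level):
    """Return one score in (0, 1) per concept, in sorted concept order."""
    scores = []
    for name in sorted(CONCEPT_CLASSIFIERS):
        weights, bias = CONCEPT_CLASSIFIERS[name]
        z = sum(w * x for w, x in zip(weights, low_level)) + bias
        scores.append(sigmoid(z))
    return scores


def beach_score(low_level, weights=(1.0, 1.0, 1.0), bias=-1.5):
    """Second-stage classifier that only sees the semantic scores."""
    feats = semantic_features(low_level)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)


image = [0.9, 0.8, 0.7]  # toy low-level descriptor
print(dict(zip(sorted(CONCEPT_CLASSIFIERS), semantic_features(image))))
print("beach score:", beach_score(image))
```

Because the second stage only reads named concept scores, its decision can be explained in terms of those concepts (for example, high water and sand scores).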


Semantic Features (Frame-level)

Illustration of early IBM work (multimedia community) describing this concept

[John Smith et al, Multimedia Semantic Indexing Using Model Vectors, ICME 2003]

Concatenation / Dimensionality Reduction


Semantic Features (Frame-level)

System evolved to the IBM Multimedia Analysis and Retrieval System (IMARS)

Ensemble Learning

Rapid event modeling, e.g., “accident with high-speed skidding”

Discriminative semantic basis [Rong Yan et al, Model-Shared Subspace Boosting for Multi-label Classification, KDD 2007]


Classemes (Frame-level)

[L. Torresani et al, Efficient Object Category Recognition Using Classemes, ECCV 2010]

Noisy Labels

Images used to train the “table” classeme (from Google image search)

Descriptor is formed by concatenating the outputs of weakly trained classifiers called classemes (trained with noisy labels)


Classemes (Frame-level)

Compact and efficient descriptor, useful for large-scale classification

Features are not really semantic!


Semantic Features (Object Level)

Object Bank

http://vision.stanford.edu/projects/objectbank/

[Li-Jia Li et al, Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification]

Source code available (~7 seconds per image)


Shifting from Naming to Describing:

Representations based on Semantic Attributes


Semantic Attributes

Modifiers rather than (or in addition to) nouns

Semantic properties that are shared among objects

Attributes are category-independent and transferable

[Figure: naming (“?”) versus describing (bald, beard, red shirt)]


Examples of Semantic Attributes

http://whatbird.com



[Lampert et al, Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, CVPR 2009]



[Farhadi et al, Describing Objects by their Attributes, CVPR 2009]



[Berg et al, Automatic Attribute Discovery and Characterization, ECCV 2010]



[Chen et al, Describing Clothing by Semantic Attributes, ECCV 2012]



http://www.galaxyzoo.org/


Attribute Models

Slide credit: Devi Parikh

[Kumar et al., Describable Visual Attributes for Face Verification and Image Search, PAMI 2011]

(Or confidence)

Binary Attributes


Attribute Models

Slide credit: Devi Parikh

> natural

< smiling

[Parikh and Grauman, Relative Attributes, ICCV 2011]

Max-margin learning-to-rank formulation of Joachims (2002)

Relative Attributes
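A relative attribute can be viewed as a linear ranking function learned from ordered image pairs. The sketch below is an assumption-laden stand-in for a proper RankSVM solver: it minimizes a regularized pairwise hinge loss by plain subgradient descent, and the feature vectors, pair labels, and hyperparameters are made up for illustration.

```python
def train_ranker(ordered_pairs, dim, epochs=200, lr=0.1, reg=0.01):
    """Learn w so that w.x_hi exceeds w.x_lo by a margin for each pair.

    Minimizes reg/2 * ||w||^2 + sum of max(0, 1 - w.(x_hi - x_lo))
    with full-batch subgradient descent (not a real RankSVM solver).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [reg * wi for wi in w]
        for x_hi, x_lo in ordered_pairs:
            diff = [a - b for a, b in zip(x_hi, x_lo)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin < 1.0:  # this ordered pair violates the margin
                grad = [g - d for g, d in zip(grad, diff)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def attribute_strength(w, x):
    """Real-valued strength of the attribute for feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy data: the first feature dimension drives the "smiling" attribute.
faces = {"a": [0.1, 0.5], "b": [0.4, 0.4], "c": [0.9, 0.6]}
# Supervision: b smiles more than a, and c smiles more than b.
pairs = [(faces["b"], faces["a"]), (faces["c"], faces["b"])]

w = train_ranker(pairs, dim=2)
ranking = sorted(faces, key=lambda k: attribute_strength(w, faces[k]))
print(ranking)  # least smiling first
```

Unlike a binary attribute classifier, the output is a score used only for comparison between images, which is what supports statements such as "smiles more than".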


Attribute-Based Classification

Scalable Learning


Attribute-based Classification

[Lampert et al, Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, CVPR 2009]

Recognition of Unseen Classes (Zero-Shot Learning)

1) Train semantic attribute classifiers

2) Obtain a classifier for an unseen object (no training samples) by just specifying which attributes it has
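The two steps can be sketched as follows. This is a simplified variant under stated assumptions: attributes are treated as independent, class priors are uniform, and the class-attribute table and probabilities are hypothetical. The formulation in the cited paper may differ, for example in how attribute priors are handled.

```python
import math

# Hypothetical class-attribute table for unseen classes: 1 = has the attribute.
ATTRIBUTES = ["striped", "four_legged", "aquatic"]
UNSEEN_CLASSES = {
    "zebra": [1, 1, 0],
    "dolphin": [0, 0, 1],
    "horse": [0, 1, 0],
}


def zero_shot_classify(attr_probs, class_table=UNSEEN_CLASSES):
    """Pick the class whose attribute signature best explains the predictions.

    attr_probs[m] is the m-th attribute classifier's estimate of
    p(attribute m is present | image), trained on other (seen) classes.
    """
    best_class, best_loglik = None, -math.inf
    for name, signature in class_table.items():
        loglik = 0.0
        for p, has_attr in zip(attr_probs, signature):
            p = min(max(p, 1e-6), 1.0 - 1e-6)  # avoid log(0)
            loglik += math.log(p if has_attr else 1.0 - p)
        if loglik > best_loglik:
            best_class, best_loglik = name, loglik
    return best_class


# Attribute classifier outputs for one test image (made-up numbers).
print(zero_shot_classify([0.9, 0.8, 0.1]))
```

No image of a zebra, dolphin, or horse is needed at training time; adding a new class only requires a new row in the table.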


Zero-Shot Learning

Unseen categories


Semantic Attribute Classifiers

Flat multi-class classification

Attribute-based classification


Class-Attribute Associations

Manual Specification of Class-Attribute Associations


Class-Attribute Associations

Associations may be extracted automatically from other sources

[Rohrbach et al, What Helps Where – And Why? Semantic Relatedness for Knowledge Transfer, CVPR 2010]


Label Embedding

Label Embedding Framework

Manual Specification of Attributes

[Akata et al, Label Embedding for Attribute-based Classification, CVPR 2013]
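One common way to formalize label embedding is a bilinear compatibility score between an image feature vector and a class embedding (here, the class attribute vector), with prediction by argmax over classes. The sketch below assumes that form; the matrix W is hand-set rather than learned, and all names and values are hypothetical.

```python
def compatibility(theta_x, W, phi_y):
    """Bilinear score theta(x)^T W phi(y)."""
    projected = [
        sum(theta_x[i] * W[i][j] for i in range(len(theta_x)))
        for j in range(len(phi_y))
    ]
    return sum(p * a for p, a in zip(projected, phi_y))


def predict(theta_x, W, class_embeddings):
    """Return the class whose embedding is most compatible with the image."""
    return max(
        class_embeddings,
        key=lambda c: compatibility(theta_x, W, class_embeddings[c]),
    )


# Class embeddings: manually specified attribute vectors (illustrative).
CLASS_EMBEDDINGS = {"zebra": [1, 1, 0], "dolphin": [0, 0, 1]}

# W maps a 2-d image feature into the 3-d attribute space.
# In practice W would be learned from labeled images; here it is fixed by hand.
W = [
    [1.0, 0.5, -1.0],
    [-1.0, 0.0, 1.0],
]

print(predict([1.0, 0.0], W, CLASS_EMBEDDINGS))
print(predict([0.0, 1.0], W, CLASS_EMBEDDINGS))
```

Since classes enter only through their embeddings, a class with no training images can still be scored as long as its attribute vector is available.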


Label Embedding

[Frome et al, DeViSE: A Deep Visual-Semantic Embedding Model, NIPS 2013]


Automatic Discovery of word associations


Label Embedding

Language Model Source Code: https://code.google.com/p/word2vec/

Zero-Shot Learning / Semantically close mistakes
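When labels are embedded with a language model, prediction can be sketched as a nearest-neighbor search in word-vector space. The example below is a toy illustration only: the 3-dimensional word vectors and the projected image vector are invented, whereas real word2vec vectors have hundreds of dimensions and the image projection would come from a trained visual model.

```python
import math

# Hypothetical word vectors (made-up values, not real word2vec output).
WORD_VECTORS = {
    "tiger": [0.9, 0.1, 0.0],
    "leopard": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_labels(projected_image, word_vectors=WORD_VECTORS):
    """Rank candidate labels by cosine similarity to the projected image."""
    return sorted(
        word_vectors,
        key=lambda w: cosine(projected_image, word_vectors[w]),
        reverse=True,
    )


# Image feature after projection into the word-vector space (assumed given).
print(rank_labels([0.85, 0.18, 0.05]))
```

In this toy space the two big cats rank far above "truck", so even a wrong top label tends to be semantically close to the right one. A label never seen with images can also be predicted, because only its word vector is needed.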


Automatic Discovery of word associations


Attributes as mid-level features

Face verification [Kumar et al, ICCV 2009]

Action recognition [Liu et al, CVPR 2011]

Semantic attributes + discriminative (non-semantic) features



Person Re-identification [Layne et al, BMVC 2012]

Bird Categorization [Farrell et al, ICCV 2011]


Attributes as mid-level features

[Dhar et al, High Level Describable Attributes for Predicting Aesthetics and Interestingness, CVPR 2011]



Slide credit: Tamara Berg

…

Detecting Interesting Insects



Slide credit: Tamara Berg

…

Detecting Interesting Beaches



Note: Several recent methods use the term “attributes” to refer to non-semantic model outputs. In this case, attributes are just mid-level features, like PCA, hidden layers in neural nets, … (non-interpretable splits)


Attributes for Fine-Grained Categorization


Fine-Grained Categorization


Fine-Grained Categorization

Visipedia (http://visipedia.org/)

Machines collaborating with humans to organize visual knowledge, connecting text to images, images to text, and images to images

Easy annotation interface for experts (powered by computer vision)

Picture credit: Serge Belongie

Visual Query: Fine-grained Bird Categorization

http://visipedia.org/



Slide Credit: Christoph Lampert

Is it an African or Indian Elephant?

[Figure labels: African, Indian]

Example-based Fine-Grained Categorization is Hard!!



Is it an African or Indian Elephant?

Visual distinction of subordinate categories may be quite subtle, usually based on parts and attributes

[Figure labels: African (larger ears), Indian (smaller ears)]



Codebook

Standard classification methods may not be suitable because the variation between classes is small and intra-class variation is still high.

[B. Yao, CVPR 2012]



Humans rely on field guides!

Field guides usually refer to parts and attributes of the object

Slide Credit: Pietro Perona


Fine-Grained Categorization

[Branson et al, Visual Recognition with Humans in the Loop, ECCV 2010]



[Branson et al, Visual Recognition with Humans in the Loop, ECCV 2010]

Computer vision reduces the amount of human interaction (minimizes the number of questions)
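One way to picture the interaction is as a Bayesian update: the vision system supplies a prior over classes, each user answer reweights it, and the next question is the one expected to reduce uncertainty the most. The sketch below assumes that framing; the classes, the answer probabilities, and the vision prior are hypothetical numbers, not values from the paper.

```python
import math

# Hypothetical p(answer = "yes" | class) for each attribute question.
P_YES = {
    "is_red": {"cardinal": 0.95, "blue_jay": 0.05, "goldfinch": 0.05},
    "has_crest": {"cardinal": 0.90, "blue_jay": 0.90, "goldfinch": 0.05},
}


def normalize(dist):
    total = sum(dist.values())
    return {c: p / total for c, p in dist.items()}


def update(posterior, question, answer_yes):
    """Bayes update of the class posterior after one user answer."""
    return normalize({
        c: p * (P_YES[question][c] if answer_yes else 1.0 - P_YES[question][c])
        for c, p in posterior.items()
    })


def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def best_question(posterior, questions):
    """Pick the question with the lowest expected posterior entropy."""
    def expected_entropy(q):
        p_yes = sum(posterior[c] * P_YES[q][c] for c in posterior)
        return (p_yes * entropy(update(posterior, q, True))
                + (1.0 - p_yes) * entropy(update(posterior, q, False)))
    return min(questions, key=expected_entropy)


# Class probabilities from the vision system alone (made-up numbers).
vision_prior = normalize({"cardinal": 0.45, "blue_jay": 0.45, "goldfinch": 0.10})

question = best_question(vision_prior, list(P_YES))
posterior = update(vision_prior, question, answer_yes=True)
print(question, max(posterior, key=posterior.get))
```

Here the vision prior already rules out most of the goldfinch mass, so asking about the crest would be nearly wasted; the color question separates the two remaining candidates in one step.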



[Wah et al, Multiclass Recognition and Part Localization with Humans in the Loop, ICCV 2011]

Localized part and attribute detectors.

Questions include asking the user to localize parts.



Video Demo



http://www.vision.caltech.edu/visipedia/CUB-200-2011.html



Check the fine-grained visual categorization workshop: http://www.fgvc.org/



Is fine-grained recognition different? Check https://sites.google.com/site/fgcomp2013/


Attribute-Based Search


People Search in Surveillance Videos

Traditional Approaches: Face Recognition (“Naming”)

Face recognition is very challenging under lighting changes, pose variation, and low-resolution imagery (typical conditions in surveillance scenarios)

Attribute-based People Search (“Describing”)

[Vaquero et al, Attribute-based People Search in Surveillance Environments, WACV 2009]

Rather than relying on face recognition only, a complementary people search framework based on semantic attributes is provided

Query Example:

“Show me all bald people at the 42nd street station last month with dark skin, wearing sunglasses, wearing a red jacket”
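A query like this can be sketched as a filter on metadata (location, time) followed by ranking on attribute detector scores. The code below is only an illustration under assumptions: the `Detection` record, the attribute names, the scores, and the choice to rank by the weakest required attribute are all hypothetical, not the method of the cited system.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    track_id: str
    location: str
    attribute_scores: dict  # attribute name -> detector confidence in [0, 1]


def search(detections, required_attributes, location=None, top_k=10):
    """Rank detections by their weakest required attribute score."""
    candidates = [
        d for d in detections if location is None or d.location == location
    ]

    def score(d):
        # A person must match every attribute, so the minimum is the bottleneck.
        return min(d.attribute_scores.get(a, 0.0) for a in required_attributes)

    return sorted(candidates, key=score, reverse=True)[:top_k]


# Made-up detections with made-up attribute confidences.
DETECTIONS = [
    Detection("p1", "42nd_street",
              {"bald": 0.90, "sunglasses": 0.80, "red_jacket": 0.70}),
    Detection("p2", "42nd_street",
              {"bald": 0.95, "sunglasses": 0.10, "red_jacket": 0.90}),
    Detection("p3", "14th_street",
              {"bald": 0.99, "sunglasses": 0.99, "red_jacket": 0.99}),
]

results = search(DETECTIONS, ["bald", "sunglasses", "red_jacket"],
                 location="42nd_street")
print([d.track_id for d in results])
```

No image of the target person is involved at any point: the query is built entirely from attribute names, which is the "describing" alternative to face recognition.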



Feris et al, ICMR 2014



Boston Bombing Event

“Show me all images of people matching the suspect description from time X to time Y from all cameras in area Z.”

Ability to spot a person with e.g., a white hat in a crowded scene

Suspect #1 found in 4 images in top 8 results

Suspect #2 found in 3 images in top page

1071 detected faces from 50 high-res Boston images (all from Flickr)



People search based on textual descriptions: it does not require training images for the target suspect.

Robustness: attribute detectors are trained using lots of training images covering different lighting conditions, pose variation, etc.

Works well in low-resolution imagery (typical in video surveillance scenarios)



[Siddiquie, Feris and Davis, Image Ranking and Retrieval Based on Multi-Attribute Queries, CVPR 2011]

Modeling attribute correlations


MugHunt Demo

http://mughunt.securics.com/


Whittle Search

Slide credit: Kristen Grauman


Whittle Search

Check the Whittle Search demo at: http://godel.ece.vt.edu/whittle/


Resources

http://rogerioferis.com/VisualRecognitionAndSearch2014/Resources.html


http://rogerioferis.com/PartsAndAttributes/

http://pub.ist.ac.at/~chl/PnA2012/
