Lecture 11 Visual recognition - cvgl.stanford.edu · Silvio Savarese Lecture 11 - 12-Feb-14...

Page 1

Lecture 11
Visual recognition

• Descriptors (wrapping up)
• An introduction to recognition
• Image classification

Lecture 11 - Silvio Savarese, 12-Feb-14

Page 2

The big picture…

Feature Detection (e.g. DoG) → Feature Description (e.g. SIFT) →
• Estimation
• Matching
• Indexing
• Detection

Page 3

Properties

Depending on the application, a descriptor must incorporate information that is:

• Invariant w.r.t.:
  • Illumination
  • Pose
  • Scale
  • Intraclass variability
• Highly distinctive (allows a single feature to find its correct match with good probability in a large database of features)

Page 4

Descriptor   Illumination   Pose     Intra-class variab.
PATCH        Good           Poor     Poor
FILTERS      Good           Medium   Medium
SIFT         Good           Good     Medium

Page 5

Rotational invariance

• Find the dominant orientation by building an orientation histogram (over the range 0 to 2π)
• Rotate all orientations by the dominant orientation

This makes the SIFT descriptor rotationally invariant.
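The two steps above can be sketched in a few lines. This is a simplified illustration, not Lowe's implementation: the 36-bin histogram, the magnitude-weighted votes, and the function names `dominant_orientation` and `normalize_orientations` are assumptions made here, and refinements such as Gaussian weighting or peak interpolation are left out.

```python
import math

def dominant_orientation(gradients, num_bins=36):
    """Return the centre angle (radians) of the fullest orientation bin.

    gradients: iterable of (gx, gy) pairs sampled around a keypoint.
    Each sample votes with its gradient magnitude (an assumed weighting).
    """
    hist = [0.0] * num_bins
    bin_width = 2 * math.pi / num_bins
    for gx, gy in gradients:
        angle = math.atan2(gy, gx) % (2 * math.pi)   # map to [0, 2*pi)
        idx = int(angle / bin_width) % num_bins
        hist[idx] += math.hypot(gx, gy)
    best = max(range(num_bins), key=lambda i: hist[i])
    return (best + 0.5) * bin_width

def normalize_orientations(angles, dominant):
    """Express each angle relative to the dominant orientation."""
    return [(a - dominant) % (2 * math.pi) for a in angles]
```

Because every orientation is measured relative to the dominant one, rotating the image patch shifts both by the same amount and the normalized values stay the same.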

Page 6

Pose normalization

• Keypoints are transformed in order to be invariant to translation, rotation, scale, and other geometrical parameters [Lowe 2000]

Change of scale, pose, illumination…

Courtesy of D. Lowe

Page 7

Shape context descriptor
Belongie et al. 2002

[Figure: histogram (occurrences within each bin) plotted against bin # (1, 2, 3, 4, 5, …, 12, 13, 14, 15, 16, …); the 13th bin is marked]

Page 8

Shape context descriptor

Belongie et al. 02
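A rough sketch of the "occurrences within each bin" idea: for one reference point, count how many of the other contour points fall in each log-polar bin around it. The bin counts (5 radial × 12 angular), the radius limits, and the normalization by the mean distance from the reference point are assumptions chosen for illustration; the original paper should be consulted for the exact settings.

```python
import math

def shape_context(points, index, num_r=5, num_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of the other points relative to points[index].

    Returns a flat list of num_r * num_theta counts (ring by ring).
    """
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    hist = [[0] * num_theta for _ in range(num_r)]
    mean_dist = sum(dists) / len(dists) if dists else 0.0
    if mean_dist == 0.0:
        return [c for ring in hist for c in ring]
    log_span = math.log(r_max / r_min)
    for (dx, dy), d in zip(offsets, dists):
        r = d / mean_dist                      # scale-normalized radius
        if r < r_min or r >= r_max:
            continue                           # outside the histogram support
        r_bin = min(int(num_r * math.log(r / r_min) / log_span), num_r - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        t_bin = int(theta / (2 * math.pi) * num_theta) % num_theta
        hist[r_bin][t_bin] += 1
    return [c for ring in hist for c in ring]
```

Computing this histogram for every sampled contour point gives one descriptor per point, which can then be compared between two shapes.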

Page 9

Other detectors/descriptors

• HOG: Histogram of oriented gradients
  Dalal & Triggs, 2005
• SURF: Speeded Up Robust Features
  Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, "SURF: Speeded Up Robust Features", Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346-359, 2008
• FAST (corner detector)
  Rosten. Machine Learning for High-speed Corner Detection, 2006.
• ORB: an efficient alternative to SIFT or SURF
  Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski: ORB: An efficient alternative to SIFT or SURF. ICCV 2011
• Fast Retina Keypoint (FREAK)
  A. Alahi, R. Ortiz, and P. Vandergheynst. FREAK: Fast Retina Keypoint. In IEEE Conference on Computer Vision and Pattern Recognition, 2012. CVPR 2012 Open Source Award Winner.

Page 10

Lecture 12
Visual recognition

• Descriptors (wrapping up)
• An introduction to recognition
• Image classification

Lecture 11 - Silvio Savarese, 12-Feb-14

Page 12

Classification: Does this image contain a building? [yes/no]

Yes!

Page 13

Classification: Is this a beach?

Page 15

Detection: Does this image contain a car? [where?]

car

Page 16

Detection: Which object does this image contain? [where?]

[Image labels: building, clock, person, car]

Page 17

Detection: Accurate localization (segmentation)

[Image label: clock]

Page 18

Object detection is useful…

• Surveillance
• Assistive technologies
• Security
• Assistive driving
• Computational photography

Page 19

Categorization vs single instance recognition

Which building is this? Marshall Field building in Chicago

Page 20

Categorization vs single instance recognition

Where is the crunchy nut?

Page 21

Applications of computer vision

• Recognizing landmarks in mobile platforms (+ GPS)

Page 22

Detection: Estimating object semantic & geometric attributes

Object: Person, back; 1-2 meters away
Object: Police car, side view, 4-5 m away
Object: Building, 45º pose, 8-10 meters away. It has bricks

Page 23

Activity or event recognition

What are these people doing?

Page 24

Visual Recognition

• Design algorithms that are able to
  – Classify images or videos
  – Detect and localize objects
  – Estimate semantic and geometrical attributes
  – Classify human activities and events

Why is this challenging?

Page 25

How many object categories are there?

Page 26

Challenges: viewpoint variation

Michelangelo, 1475-1564

Slide credit: Fei-Fei, Fergus & Torralba

Page 27

Challenges: illumination

image credit: J. Koenderink

Page 28

Challenges: scale

slide credit: Fei-Fei, Fergus & Torralba

Page 29

Challenges: deformation

Page 30

Challenges: occlusion

Magritte, 1957

Slide credit: Fei-Fei, Fergus & Torralba

Page 31

Challenges: background clutter

Kilmeny Niland. 1995

Page 32

Challenges: intra-class variation

Page 33

Some early works on object categorization

• Turk and Pentland, 1991
• Belhumeur, Hespanha, & Kriegman, 1997
• Schneiderman & Kanade, 2004
• Viola and Jones, 2000
• Amit and Geman, 1999
• LeCun et al. 1998
• Belongie and Malik, 2002
• Agarwal and Roth, 2002
• Poggio et al. 1993

Page 34

Basic properties

• Representation

– How to represent an object category; which classification scheme?

• Learning

– How to learn the classifier, given training data

• Recognition

– How the classifier is to be used on novel data

Page 35

Representation

- Building blocks: Sampling strategies
  • Interest operators
  • Multiple interest operators
  • Dense, uniformly
  • Randomly

Image credits: F-F. Li, E. Nowak, J. Sivic
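As a small illustration of two of these sampling strategies, the sketch below generates patch-centre locations on a dense uniform grid and at random. The function names and the `step`, `count`, and `seed` parameters are hypothetical; interest operators would instead place samples where a detector fires.

```python
import random

def dense_grid(width, height, step):
    """Patch centres on a regular grid with spacing `step` (dense, uniform sampling)."""
    return [(x, y)
            for y in range(step // 2, height, step)
            for x in range(step // 2, width, step)]

def random_locations(width, height, count, seed=0):
    """`count` patch centres drawn uniformly at random inside the image."""
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(count)]
```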

Page 36

Representation

- Building blocks: Choice of descriptors

[SIFT, HOG, codewords….]

Page 37

Representation

– Appearance only or location and appearance

Page 38

Representation

–Invariances

• View point

• Illumination

• Occlusion

• Scale

• Deformation

• Clutter

• etc.

Page 39

Representation

– To handle intra-class variability, it is convenient to describe object categories using probabilistic models
– Object models: Generative vs Discriminative vs hybrid

Page 41

Object categorization: the statistical viewpoint

$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$

• Bayes rule:

$$
\underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}}
= \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}}
\cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}
$$

Page 42

Object categorization: the statistical viewpoint

• Bayes rule:

$$
\underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}}
= \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}}
\cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}
$$

• Discriminative methods model the posterior
• Generative methods model the likelihood and the prior
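A tiny numeric sketch of the ratio form of Bayes rule may help. The likelihood and prior values in the usage note are made up for illustration, and the function names are hypothetical; the decision rule (choose "zebra" when the posterior ratio exceeds 1) is the usual convention rather than something stated on the slide.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """Posterior ratio = likelihood ratio * prior ratio."""
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio

def decide(lik_zebra, lik_no_zebra, prior_zebra):
    """Pick the class with the larger posterior."""
    ratio = posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra)
    return "zebra" if ratio > 1.0 else "no zebra"
```

For example, with p(image | zebra) = 0.03, p(image | no zebra) = 0.01 and p(zebra) = 0.1, the likelihood ratio of 3 is outweighed by the prior ratio of 1/9, so the posterior ratio is 1/3 and the decision is "no zebra".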

Page 43

Discriminative models

• Nearest neighbor (10^6 examples)
  Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005...
• Neural networks
  LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
• Support Vector Machines
  Guyon, Vapnik, Heisele, Serre, Poggio…
• Latent SVM, Structural SVM
  Felzenszwalb 00; Ramanan 03…
• Boosting
  Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006,…

Slide adapted from Antonio Torralba. Courtesy of Vittorio Ferrari.
Slide credit: Kristen Grauman

Page 44

Generative models

• Naïve Bayes classifier
  – Csurka, Bray, Dance & Fan, 2004
• Hierarchical Bayesian topic models (e.g. pLSA and LDA)
  – Object categorization: Sivic et al. 2005, Sudderth et al. 2005
  – Natural scene categorization: Fei-Fei et al. 2005
• 2D part based models
  – Constellation models: Weber et al 2000; Fergus et al 200
  – Star models: ISM (Leibe et al 05)
• 3D part based models
  – Multi-aspects: Sun, et al, 2009
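To make the first item concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier over codeword-count histograms. It illustrates the general technique only and is not taken from the cited paper: the Laplace smoothing constant `alpha`, the dictionary layout of the model, and the function names are assumptions.

```python
import math

def train_naive_bayes(histograms, labels, alpha=1.0):
    """Estimate p(class) and p(codeword | class) from per-image codeword counts."""
    model = {}
    for c in sorted(set(labels)):
        rows = [h for h, y in zip(histograms, labels) if y == c]
        counts = [sum(col) + alpha for col in zip(*rows)]   # smoothed counts
        total = sum(counts)
        model[c] = {
            "log_prior": math.log(len(rows) / len(labels)),
            "log_word": [math.log(n / total) for n in counts],
        }
    return model

def classify(model, histogram):
    """Return the class maximizing log prior + sum of count * log p(word | class)."""
    def score(c):
        params = model[c]
        return params["log_prior"] + sum(
            n * lw for n, lw in zip(histogram, params["log_word"]))
    return max(model, key=score)
```

This is generative in the sense used above: it models the likelihood p(image | class) and the prior p(class), and combines them at test time.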

Page 48

Learning

• Learning parameters: What are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)
• Level of supervision
  • Manual segmentation; bounding box; image labels; noisy labels
• Batch/incremental
• Training images:
  • Issue of overfitting
  • Negative images for discriminative methods
• Priors

Page 50

Recognition

– Recognition task: classification, detection, etc.

Page 53

Recognition

– Recognition task
– Search strategy: Sliding Windows (Viola, Jones 2001)
  • Simple
  • Computational complexity (x, y, S, , N of classes)
    - BSW by Lampert et al 08
    - Also, Alexe, et al 10
  • Localization
    • Objects are not boxes
  • Prone to false positive
    - Non max suppression: Canny ’86, …, Desai et al, 2009
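The search strategy and the false-positive clean-up can be sketched as follows. This is an illustration under assumptions, not the method of any cited paper: windows are enumerated at a single scale, and the suppression step is the greedy, score-ordered variant commonly used for detection, with a hypothetical IoU threshold of 0.5.

```python
def sliding_windows(width, height, win_w, win_h, stride):
    """Yield (x1, y1, x2, y2) boxes covering the image at one scale."""
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)

def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the best-scoring box, drop boxes overlapping it, and repeat.

    detections: list of (score, box) pairs.
    """
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept
```

The cost noted on the slide shows up directly in `sliding_windows`: the number of windows grows with every position, and would grow further with each extra scale and each object class that has to be scored.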

Page 54

Successful methods using sliding windows

[Dalal & Triggs, CVPR 2005]
• Subdivide scanning window
• In each cell compute histogram of gradient orientations
Code available: http://pascal.inrialpes.fr/soft/olt/

[Ferrari & al, PAMI 2008]
- Subdivide scanning window
- In each cell compute histogram of codewords of adjacent segments
Code available: http://www.vision.ee.ethz.ch/~calvin
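The first recipe (subdivide the window, then histogram gradient orientations per cell) can be sketched as below. It is a simplified illustration rather than the released code: the 4-pixel cells, 9 unsigned orientation bins, central-difference gradients, and magnitude-weighted votes are assumptions, and block normalization is omitted.

```python
import math

def cell_orientation_histograms(image, cell_size=4, num_bins=9):
    """Per-cell histograms of unsigned gradient orientation.

    image: 2D list of grayscale values. Returns hists[cell_row][cell_col][bin].
    """
    height, width = len(image), len(image[0])
    rows, cols = height // cell_size, width // cell_size
    hists = [[[0.0] * num_bins for _ in range(cols)] for _ in range(rows)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]   # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            cy, cx = y // cell_size, x // cell_size
            if mag == 0 or cy >= rows or cx >= cols:
                continue
            angle = math.atan2(gy, gx) % math.pi     # unsigned: [0, pi)
            b = min(int(angle / math.pi * num_bins), num_bins - 1)
            hists[cy][cx][b] += mag
    return hists
```

Concatenating the cell histograms of a scanning window gives the feature vector that a window classifier would score.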

Page 55

Recognition

– Recognition task
– Search strategy: Probabilistic “heat maps”
  • Fergus et al 03
  • Leibe et al 04

[Figure: original image]

Page 56

Recognition

– Recognition task
– Search strategy:
  • Hypothesis generation + verification

Page 57

Recognition

– Recognition task
– Search strategy
– Attributes

Category: car
Azimuth = 225º
Zenith = 30º

• Savarese, 2007
• Sun et al 2009
• Liebelt et al., ’08, 10
• Farhadi et al 09

- It has metal
- It is glossy
- Has wheels

• Farhadi et al 09
• Lampert et al 09
• Wang & Forsyth 09

Page 58

Recognition

– Recognition task
– Search strategy
– Attributes
– Context

Semantic:
• Torralba et al 03
• Rabinovich et al 07
• Gupta & Davis 08
• Heitz & Koller 08
• L-J Li et al 08
• Bang & Fei-Fei 10

Geometric:
• Hoiem, et al 06
• Gould et al 09
• Bao, Sun, Savarese 10

Page 59

Segmentation

• Bottom up segmentation

• Semantic segmentation

Felzenszwalb and Huttenlocher, 2004

Malik et al. 01

Maire et al. 08

Duygulu et al. 02

Page 60

Agenda on recognition

• Image classification
  • Bag of words representations
• Object detection
  • 2D object detection
  • 3D object detection
• Scene understanding
• Activity understanding

Page 61

Lecture 12
Visual recognition

• Descriptors (wrapping up)
• An introduction to recognition
• Image classification
  • Bag of words models

Lecture 11 - Silvio Savarese, 12-Feb-14

Page 62

Challenges:

Variability due to:

• View point

• Illumination

• Occlusions

• Etc..

Page 63

Challenges: intra-class variation

Page 65

Part 1: Bag-of-words models

This segment is based on the tutorial “Recognizing and Learning Object Categories: Year 2007”, by Prof A. Torralba, R. Fergus and F. Li

Page 66

Related works

• Early “bag of words” models: mostly texture recognition
  – Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
• Hierarchical Bayesian models for documents (pLSA, LDA, etc.)
  – Hofmann 1999; Blei, Ng & Jordan, 2004; Teh, Jordan, Beal & Blei, 2004
• Object categorization
  – Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005
• Natural scene categorization
  – Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz, 2006

Page 67

Object Bag of ‘words’

Page 68

Analogy to documents

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

Words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

Words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Page 69

definition of “BoW”

– Independent features

[Image labels: face, bike, violin]

Page 70

definition of “BoW”

– Independent features

– histogram representation

codewords dictionary

Page 71

[Diagram: bag-of-words pipeline]

Representation: feature detection & representation; codewords dictionary; image representation
Learning: category models (and/or) classifiers
Recognition: category decision

Page 75

1. Feature detection and description

• Regular grid
  – Vogel & Schiele, 2003
  – Fei-Fei & Perona, 2005
• Interest point detector
  – Csurka, Bray, Dance & Fan, 2004
  – Fei-Fei & Perona, 2005
  – Sivic, Russell, Efros, Freeman & Zisserman, 2005
• Other methods
  – Random sampling (Vidal-Naquet & Ullman, 2002)
  – Segmentation based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei, Jordan, 2003)

Page 76

1. Feature detection and description

Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla, ’02] [Sivic & Zisserman, ’03]
→ Normalize patch
→ Compute SIFT descriptor [Lowe ’99]

Slide credit: Josef Sivic

Page 77

2. Codewords dictionary formation

Page 79

Example: color feature

Page 80

Example: color feature

[Figure: color feature space with axes r, g, b]

Page 81

2. Codewords dictionary formation

• Clustering / vector quantization
• Cluster center = code word
• E.g., K-means, see CS131A
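A minimal K-means sketch shows how cluster centers become codewords. It assumes descriptors are plain tuples of numbers and uses a fixed iteration count with random initialization from the data; the function names, `iterations`, and `seed` are illustrative choices, not part of the lecture.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(descriptor, centers):
    """Index of the closest center."""
    return min(range(len(centers)), key=lambda j: sq_dist(descriptor, centers[j]))

def kmeans(descriptors, k, iterations=20, seed=0):
    """Return k cluster centers (the codewords) for a list of descriptors."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:                      # assignment step
            clusters[nearest(d, centers)].append(d)
        for j, members in enumerate(clusters):     # update step
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers
```

In practice the descriptors would be, for example, 128-dimensional SIFT vectors pooled from the training images, and `k` would be the vocabulary size.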

Page 86

2. Codewords dictionary formation

• Image patch examples of codewords

Sivic et al. 2005

Page 87

2. Codewords dictionary formation

Fei-Fei et al. 2005

Page 88

2. Codewords dictionary formation

• Typically a codeword dictionary is obtained from a training set comprising all the object classes of interest

Page 89

Visual vocabularies: Issues

• How to choose vocabulary size?

– Too small: visual words not representative of all patches

– Too large: quantization artifacts, overfitting

• Computational efficiency

– Vocabulary trees

(Nister & Stewenius, 2006)

Page 90

3. Bag of words representation

Codewords dictionary
• Nearest neighbors assignment
• K-D tree search strategy

Page 91

3. Bag of words representation

Codewords dictionary

[Figure: histogram of codeword frequency; horizontal axis: codewords, vertical axis: frequency]
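The representation step can be sketched as: assign each descriptor to its nearest codeword, then count. The brute-force search below is an assumption made for brevity; a K-D tree would replace the linear scan for large dictionaries. The function names and the `normalize` flag are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_codeword(descriptor, codebook):
    """Index of the closest codeword (linear scan over the dictionary)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(descriptor, codebook[j]))

def bag_of_words(descriptors, codebook, normalize=True):
    """Histogram of codeword occurrences for one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1
    if normalize and descriptors:
        total = len(descriptors)
        hist = [count / total for count in hist]   # frequencies sum to 1
    return hist
```

The resulting histogram discards where each feature occurred in the image, which is exactly the "independent features" assumption in the definition of BoW.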

Page 92

Representing textures

• Texture is characterized by the repetition of basic elements or textons
• For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Credit slide: S. Lazebnik

Page 93

Representing textures

[Figure: universal texton dictionary; histogram]

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Credit slide: S. Lazebnik

Page 94

Invariance issues

• Scale? Rotation? View point? Occlusions?

– Implicit;

– depends on detectors and descriptors

Kadir and Brady. 2003

Page 95

Representation

1. feature detection & representation
2. codewords dictionary
3. image representation

category models

Page 96

Class 1 Class N

… …

Category models

Page 97

Recognition

• codewords dictionary
• category models (and/or) classifiers
• category decision

Page 98

Next Lecture

• Bag of words models – part 2