CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector •...

68
Lecture 16: Bag-of-words models CS6670: Computer Vision Noah Snavely Object Bag of ‘words’

Transcript of CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector •...

Page 1: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Lecture 16: Bag-of-words models

CS6670: Computer VisionNoah Snavely

Object Bag of ‘words’

Page 2: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Announcements

• Project 3: Eigenfaces

– due Wednesday, November 11 at 11:59pm

– solo project

• Final project presentations:

– During the final exams period

Page 3: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Skin detection results

Page 4: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Eigenfaces

PCA extracts the eigenvectors of A

• Gives a set of vectors v1, v2, v3, ...

• Each one of these vectors is a direction in face space

– what do these look like?

Page 5: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Perc

eptu

al

and S

enso

ry A

ugm

ente

d C

om

puti

ng

Vis

ual O

bje

ct

Reco

gn

itio

n T

uto

rial

Vis

ual O

bje

ct

Reco

gn

itio

n T

uto

rial

K. Grauman, B. Leibe

Viola-Jones Face Detector: Results

5K. Grauman, B. Leibe

Page 6: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Moving forward

• Faces are pretty well-behaved

– Mostly the same basic shape

– Lie close to a low-dimensional subspace

• Not all objects are as nice

Page 7: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Different appearance, similar parts

Page 8: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Bag of WordsModels

Adapted from slides by Rob Fergus

and Svetlana Lazebnik

Page 9: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Object Bag of ‘words’

Page 10: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Origin 1: Texture Recognition

Example textures (from Wikipedia)

Page 11: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Origin 1: Texture recognition

• Texture is characterized by the repetition of basic elements

or textons

• For stochastic textures, it is the identity of the textons, not

their spatial arrangement, that matters

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001;

Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Page 12: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Origin 1: Texture recognition

Universal texton dictionary

histogram

Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001;

Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Universal texton dictionary

Page 13: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Origin 2: Bag-of-words models

• Orderless document representation: frequencies of words

from a dictionary Salton & McGill (1983)

Page 14: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Origin 2: Bag-of-words models

US Presidential Speeches Tag Cloudhttp://chir.ag/phernalia/preztags/

• Orderless document representation: frequencies of words

from a dictionary Salton & McGill (1983)

Page 15: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Origin 2: Bag-of-words models

US Presidential Speeches Tag Cloudhttp://chir.ag/phernalia/preztags/

• Orderless document representation: frequencies of words

from a dictionary Salton & McGill (1983)

Page 16: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Origin 2: Bag-of-words models

US Presidential Speeches Tag Cloudhttp://chir.ag/phernalia/preztags/

• Orderless document representation: frequencies of words

from a dictionary Salton & McGill (1983)

Page 17: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Bags of features for object recognition

Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

face, flowers, building

• Works pretty well for image-level classification and for

recognizing object instances

Page 18: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Bags of features for object recognition

Caltech6 dataset

bag of features bag of features Parts-and-shape model

Page 19: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Bag of features

• First, take a bunch of images, extract

features, and build up a “dictionary” or “visual

vocabulary” – a list of common features

• Given a new image, extract features and

build a histogram – for each feature, find the

closest visual word in the dictionary

Page 20: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Bag of features: outline

1. Extract features

Page 21: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Bag of features: outline

1. Extract features

2. Learn “visual vocabulary”

Page 22: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Bag of features: outline

1. Extract features

2. Learn “visual vocabulary”

3. Quantize features using visual vocabulary

Page 23: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Bag of features: outline

1. Extract features

2. Learn “visual vocabulary”

3. Quantize features using visual vocabulary

4. Represent images by frequencies of

“visual words”

Page 24: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Regular grid• Vogel & Schiele, 2003

• Fei-Fei & Perona, 2005

1. Feature extraction

Page 25: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Regular grid• Vogel & Schiele, 2003

• Fei-Fei & Perona, 2005

Interest point detector• Csurka et al. 2004

• Fei-Fei & Perona, 2005

• Sivic et al. 2005

1. Feature extraction

Page 26: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Regular grid• Vogel & Schiele, 2003

• Fei-Fei & Perona, 2005

Interest point detector• Csurka et al. 2004

• Fei-Fei & Perona, 2005

• Sivic et al. 2005

Other methods• Random sampling (Vidal-Naquet & Ullman, 2002)

• Segmentation-based patches (Barnard et al. 2003)

1. Feature extraction

Page 27: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Normalize

patch

Detect patches

[Mikojaczyk and Schmid ’02]

[Mata, Chum, Urban & Pajdla, ’02]

[Sivic & Zisserman, ’03]

Compute

SIFT

descriptor

[Lowe’99]

Slide credit: Josef Sivic

1. Feature extraction

Page 28: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

1. Feature extraction

Page 29: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

2. Learning the visual vocabulary

Page 30: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Page 31: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

2. Learning the visual vocabulary

Clustering

Slide credit: Josef Sivic

Visual vocabulary

Page 32: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

K-means clustering

• Want to minimize sum of squared Euclidean

distances between points xi and their

nearest cluster centers mk

Algorithm:

• Randomly initialize K cluster centers

• Iterate until convergence:• Assign each data point to the nearest center

• Recompute each cluster center as the mean of all points

assigned to it

k

ki

ki mxMXDcluster

clusterinpoint

2)(),(

Page 33: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

From clustering to vector quantization

• Clustering is a common method for learning a

visual vocabulary or codebook• Unsupervised learning process

• Each cluster center produced by k-means becomes a

codevector

• Codebook can be learned on separate training set

• Provided the training set is sufficiently representative, the

codebook will be “universal”

• The codebook is used for quantizing features• A vector quantizer takes a feature vector and maps it to the

index of the nearest codevector in a codebook

• Codebook = visual vocabulary

• Codevector = visual word

Page 34: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Example visual vocabulary

Fei-Fei et al. 2005

Page 35: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Image patch examples of visual words

Sivic et al. 2005

Page 36: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Visual vocabularies: Issues

• How to choose vocabulary size?• Too small: visual words not representative of all patches

• Too large: quantization artifacts, overfitting

• Generative or discriminative learning?

• Computational efficiency• Vocabulary trees

(Nister & Stewenius, 2006)

Page 37: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

3. Image representation

…..

fre

qu

en

cy

codewords

Page 38: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Image classification

• Given the bag-of-features representations of

images from different classes, how do we

learn a model for distinguishing them?

Page 39: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Uses of BoW representation

• Treat as feature vector for standard classifier

– e.g k-nearest neighbors, support vector machine

• Cluster BoW vectors over image collection

– Discover visual themes

Page 40: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

K nearest neighbors

• For a new point, find the k closest points from training data

• Labels of the k points “vote” to classify

• Works well provided there is lots of data and the distance function is good

k = 5

Source: D. Lowe

Page 41: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Linear classifiers• Find linear function (hyperplane) to separate

positive and negative examples

0:negative

0:positive

b

b

ii

ii

wxx

wxx

Which hyperplaneis best?

Page 42: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Support vector machines

• Find hyperplane that maximizes the margin between the positive and negative examples

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 43: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Support vector machines

• Find hyperplane that maximizes the margin between the positive and negative examples

1:1)(negative

1:1)( positive

by

by

iii

iii

wxx

wxx

MarginSupport vectors

For support, vectors, 1 bi wx

Page 44: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Large-scale image matching

• Bag-of-words models have been useful in matching an image to a large database of object instances

11,400 images of game covers(Caltech games dataset)

how do I find this image in the database?

Page 45: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Large-scale image search

• Build the database:

– Extract features from the database images

– Learn a vocabulary using k-means (typical k: 100,000)

– Compute weights for each word

– Create an inverted file mapping words images

Page 46: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Weighting the words

• Just as with text, some visual words are more discriminative than others

• the bigger fraction of the documents a word appears in, the less useful it is for matching

– e.g., a word that appears in all documents is not helping us

the, and, or vs. cow, AT&T, Cher

Page 47: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

TF-IDF weighting

• Instead of computing a regular histogram distance, we’ll weight each word by it’s inverse document frequency

• inverse document frequency (IDF) of word j =

log number of documents

number of documents in which j appears

Page 48: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

TF-IDF weighting

• To compute the value of bin j in image I:

term frequency of j in I x inverse document frequency of j

Page 49: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Inverted file

• Each image has ~1,000 features

• We have ~100,000 visual words

each histogram is extremely sparse (mostly zeros)

• Inverted file

– mapping from words to documents

Page 50: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Inverted file

• Can quickly use the inverted file to compute similarity between a new image and all the images in the database

– Only consider database images whose bins overlap the query image

Page 51: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Large-scale image search

• Cons:

– not as accurate as per-image-pair feature matching

– performance degrades as the database grows

query image top 6 results

Page 52: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Large-scale image search

• Pros:

– Works well for CD covers, movie posters

– Real-time performance possible

real-time retrieval from a database of 40,000 CD covers

Nister & Stewenius, Scalable Recognition with a Vocabulary Tree

Page 53: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Large-scale image matching

Turn 1,000,000 images of Rome…

Page 54: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

…into 3D models

Colosseum

St. Peter’s Basilica

Trevi Fountain

Page 55: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

Large-scale image matching

• How can we match 1,000,000 images to each other?

• Brute force approach: 500,000,000,000 pairs– won’t scale

• Better approach: use bag-of-words technique to find likely matches

• For each image, find the top M scoring other images, do detailed SIFT matching with those

Page 56: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 57: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 58: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 59: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 60: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 61: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 62: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 63: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 64: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 65: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 66: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 67: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods
Page 68: CS6670: Computer Vision - Cornell University• Fei-Fei & Perona, 2005 Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 Other methods

What about spatial info?