Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT)...

54
Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)
  • date post

    20-Jan-2016
  • Category

    Documents

  • view

    217
  • download

    0

Transcript of Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT)...

Page 1: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Large Image Databases and Small Codes

for Object Recognition

Rob Fergus (NYU)Antonio Torralba (MIT)Yair Weiss (Hebrew U.)

William T. Freeman (MIT)

Page 2: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Banks

y,

2006

Object Recognition

Pixels Description of scene contents

Page 3: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Amazing resource Maybe we can use it for recognition?

Large image datasets

Internet contains billions of images

Page 4: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Web image dataset• 79.3 million images• Collected using image

search engines• List of nouns taken from

Wordnet

• 32x32resolution

a-bomba-horizona._conan_doylea._e._burnsidea._e._housmana._e._kennellya.e.a_batterya_cappella_singinga_horizon

• Example first 10:

See “80 Million Tiny images” TR

Page 5: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Noisy Output from Image Search Engines

Page 6: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Nearest-Neighbor methods for recognition

# images

105

106

108

Page 7: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Issues

• Need lots of data• How to find neighbors quickly?• What distance metric to use?

• How to transfer labels from neighbors to query image?

• What happens if labels are unreliable?

Page 8: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Overview

1.Fast retrieval using compact codes

2.Recognition using neighbors with unreliable labels

Page 9: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Overview

1.Fast retrieval using compact codes

2.Recognition using neighbors with unreliable labels

Page 10: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Binary codes for images

• Want images with similar contentto have similar binary codes

• Use Hamming distance between codes– Number of bit flips– E.g.:

• Semantic Hashing [Salakhutdinov & Hinton, 2007]– Text documents

Ham_Dist(10001010,10001110)=1

Ham_Dist(10001010,11101110)=3

Page 11: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Binary codes for images

• Permits fast lookup via hashing– Each code is a memory address– Find neighbors by exploring Hamming

ball around query address– Lookup time depends on

radius of ball, NOT on # data points

Address Space

Query ImageSemantic HashFunction

Semantically similar images

Query address

Figu

re a

dapt

ed fr

om S

alak

hutd

inov

& H

into

n ‘0

7

Page 12: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Compact Binary Codes

• Google has few billion images (109)• Big PC has ~10 Gbytes (1011 bits)• Codes must fit in memory (disk too slow) Budget of 102 bits/image

• 1 Megapixel image is 107 bits• 32x32 color image is 104 bits Semantic hash function must also reduce

dimensionality

Page 13: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Code requirements

• Preserves neighborhood structureof input space

• Very compact (<102 bits/image)• Fast to compute

Three approaches:1.Locality Sensitive Hashing (LSH)2.Boosting3.Restricted Boltzmann Machines (RBM’s)

Page 14: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Image representation: Gist vectors• Pixels not a convenient representation• Use GIST descriptor instead• 512 dimensions/image• L2 distance btw. Gist vectors not bad

substitute for human perceptual distance

Oliva & Torralba, IJCV 2001

Page 15: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

1. Locality Sensitive Hashing• Gionis, A. & Indyk, P. & Motwani, R. (1999)• Take random projections of data• Quantize each projection with few bits

•For our N bit code:– Compute first N PCA components of data– Each random projection must be linear combination of the N PCA components

Page 16: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

2. Boosting

• Modified form of BoostSSC [Shaknarovich, Viola & Darrell, 2003]

• GentleBoost with exponential loss• Positive examples are pairs of neighbors• Negative examples are pairs of unrelated images• Each binary regression stump examines a single

coordinate in input pair, comparing to some learnt threshold to see that they agree

Page 17: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

3. Restricted Boltzmann Machine (RBM)

Hidden units: h

Visible units: v

Symmetric weights w

Parameters: Weights w Biases b

• Network of binary stochastic units• Hinton & Salakhutdinov, Science 2006

Page 18: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

RBM architecture

Hidden units: h

Visible units: v

Symmetric weights w

Learn weights and biases using Contrastive Divergence

Parameters: Weights w Biases b

• Network of binary stochastic units• Hinton & Salakhutdinov, Science 2006

Convenient conditional distributions:

Page 19: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Multi-Layer RBM: non-linear dimensionality reduction

512

512w1

Input Gist vector (512 dimensions)

Layer 1

512

256w2

Layer 2

256

Nw3

Layer 3

Output binary code (N dimensions)

Page 20: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Training RBM models• Two phases

1. Pre-training – Unsupervised– Use Contrastive Divergence to learn weights and biases– Gets parameters to right ballpark

2. Fine-tuning– Can make use of label information– No longer stochastic– Back-propagate error to update parameters– Moves parameters to local minimum

Page 21: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Greedy pre-training (Unsupervised)

512

512w1

Input Gist vector (512 dimensions)

Layer 1

Page 22: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Greedy pre-training (Unsupervised)

Activations of hidden units from layer 1 (512 dimensions)

512

256w2

Layer 2

Page 23: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Greedy pre-training (Unsupervised)

Activations of hidden units from layer 2 (256 dimensions)

256

Nw3

Layer 3

Page 24: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Backpropagation using Neighborhood Components Analysis objective

512

512w1

Input Gist vector (512 dimensions)

Layer 1

512

256w2

Layer 2

256

Nw3

Layer 3

Output binary code (N dimensions)

Page 25: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Neighborhood Components Analysis• Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004

Labels defined ininput space (high dim.)

Points in output space (low dimensional)

Page 26: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Neighborhood Components Analysis• Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004

Output of RBMW are RBM weights

Page 27: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Neighborhood Components Analysis• Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004

Pulls nearby pointsOF SAME CLASScloser

Push nearby pointsOF DIFFERENT CLASSaway

Page 28: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Neighborhood Components Analysis• Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004

Pulls nearby pointsOF SAME CLASScloser

Preserves neighborhood structure of original, high dimensional, space

Push nearby pointsOF DIFFERENT CLASSaway

Page 29: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Two test datasets1. LabelMe– 22,000 images (20,000 train | 2,000 test)– Ground truth segmentations for all– Can define ground truth distance btw. images using these

segmentations – Per pixel labels [SUPERVISED]

2. Web data– 79.3 million images– Collected from Internet– No labels, so use L2 distance btw. GIST vectors as

ground truth distance [UNSUPERVISED]– Noisy image labels only

Page 30: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Retrieval Experiments

Page 31: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Examples of LabelMe retrieval• 12 closest neighbors under different distance metrics

Page 32: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

LabelMe Retrieval

Size of retrieval set % o

f 50

true

nei

ghbo

rs in

retr

ieva

l set

0 2,000 10,000 20,0000

Page 33: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

LabelMe Retrieval

Size of retrieval set % o

f 50

true

nei

ghbo

rs in

retr

ieva

l set

0 2,000 10,000 20,0000

Number of bits% o

f 50

true

nei

ghbo

rs in

firs

t 500

retr

ieve

d

Page 34: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Web images retrieval%

of 5

0 tr

ue n

eigh

bors

in re

trie

val s

et

Size of retrieval set

Page 35: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Web images retrieval

Size of retrieval set

% o

f 50

true

nei

ghbo

rs in

retr

ieva

l set

% o

f 50

true

nei

ghbo

rs in

retr

ieva

l set

Size of retrieval set

Page 36: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Examples of Web retrieval

• 12 neighbors using different distance metrics

Page 37: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Retrieval Timings

Page 38: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

LabelMe Recognition examples

Page 39: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

LabelMe Recognition

Page 40: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Overview

1.Fast retrieval using compact codes

2.Recognition using neighbors with unreliable labels

Page 41: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Label Assignment

• Retrieval gives set of nearby images• How to compute label?

• Issues:– Labeling noise– Keywords can be very specific

• e.g. yellowfin tuna

Query Grover Cleveland Linnet Birdcage Chiefs Casing

Neighbors

Page 42: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Wordnet – a Lexical Dictionary

Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun aardvark

Sense 1aardvark, ant bear, anteater, Orycteropus afer => placental, placental mammal, eutherian, eutherian mammal => mammal => vertebrate, craniate => chordate => animal, animate being, beast, brute, creature

=> organism, being => living thing, animate thing => object, physical object => entity

http://wordnet.princeton.edu/

Page 43: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Wordnet Hierarchy

Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun aardvark

Sense 1aardvark, ant bear, anteater, Orycteropus afer => placental, placental mammal, eutherian, eutherian mammal => mammal => vertebrate, craniate => chordate => animal, animate being, beast, brute, creature

=> organism, being => living thing, animate thing => object, physical object => entity

• Convert graph structure into tree by taking most common meaning

Page 44: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)
Page 45: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Wordnet Voting Scheme

Ground truth

One image – one vote

Page 46: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Classification at Multiple Semantic Levels

Votes:

Animal 6Person 33Plant 5Device 3Administrative4Others 22

Votes:

Living 44Artifact 9Land 3Region 7Others 10

Page 47: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Wordnet Voting Scheme

Page 48: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Person Recognition

• 23% of all imagesin dataset containpeople

• Wide range ofposes: not justfrontal faces

Page 49: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Person Recognition – Test Set

• 1016 images fromAltavista using“person” query

• High res and 32x32available

• Disjoint from 79million tiny images

Page 50: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Person Recognition• Task: person in image or not? • 64-bit RBM code trained on web data (Weak image

labels only)

Viola-JonesViola-Jones

Page 51: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Object Classification• Hand-designed metric

Page 52: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Distribution of objects in the world

1

10

1,000

100

10,000

10 100 1,000

L a b e lM e s t a t i s t i c s

object rank

10% of the classes account for 93% of the labels!

1500 classes with lessthan 100 samples

Number oflabeledsamples

LabelMe dataset

Page 53: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Conclusions

• Possible to build compact codes for retrieval– # bits seems to depend on dataset– Much room for improvement– Use JPEG coefficients as input to RBM

• Can do interesting things with lots of data– What would happen with Google’s ~ 2 billion

images?– Video data

Page 54: Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)