Computational Vision: Object Recognition
Jeremy Wyatt
Object Classes
Individual Recognition
Is this a dog?
Variability of Airplanes Detected (class vs. non-class examples)
Variability of Horses Detected (class vs. non-class examples)
Recognition with 3-D primitives
Geons
Visual Class: Common Building Blocks
Optimal Class Components?
• Large features are too rare
• Small features are found everywhere
Find the features that carry the highest amount of information.
Entropy
Entropy: H = −Σi p(xi) log2 p(xi)

p = (0.5, 0.5) → H = 1.00
p = (0.1, 0.9) → H = 0.47
p = (0.01, 0.99) → H = 0.08
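The entropy values in the table can be checked with a short function (a minimal sketch using only the standard library):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum_i p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Values from the table above
entropy([0.5, 0.5])    # 1.00
entropy([0.1, 0.9])    # 0.47
entropy([0.01, 0.99])  # 0.08
```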
Mutual Information I(X,Y)
X alone: p(x) = (0.5, 0.5), H(X) = 1.0
X given Y:
  Y = 0: p(x) = (0.8, 0.2), H = 0.72
  Y = 1: p(x) = (0.1, 0.9), H = 0.47
H(X|Y) = 0.5·0.72 + 0.5·0.47 = 0.595
I(X,Y) = H(X) − H(X|Y) = 1 − 0.595 = 0.405
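This worked example can be verified numerically (a sketch; the distributions are the ones on the slide):

```python
import math

def entropy(probs):
    """H = -sum p * log2(p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_y = [0.5, 0.5]                        # p(Y=0), p(Y=1)
p_x_given_y = [[0.8, 0.2], [0.1, 0.9]]  # p(x | Y=0), p(x | Y=1)

h_x = entropy([0.5, 0.5])                                         # H(X) = 1.0
h_x_given_y = sum(py * entropy(px) for py, px in zip(p_y, p_x_given_y))
mi = h_x - h_x_given_y                  # I(X,Y) = H(X) - H(X|Y) ≈ 0.405
```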
Mutual information
I(C;F) = H(C) − H(C|F)
H(C) = −Σc p(c) log p(c)
[Figure: H(C) compared with the conditional entropies of C when F=1 and when F=0]
Mutual Information II
I(X,Y) = Σx,y p(x,y) log [ p(x,y) / (p(x) p(y)) ]
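A direct implementation of this definition (a sketch; the joint distribution below is a made-up example in which X and Y agree 80% of the time):

```python
import math

def mutual_information(joint):
    """I(X,Y) = sum_{x,y} p(x,y) * log2[ p(x,y) / (p(x) p(y)) ]."""
    px = [sum(row) for row in joint]            # marginal p(x)
    py = [sum(col) for col in zip(*joint)]      # marginal p(y)
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

# Hypothetical joint p(x,y); rows index x, columns index y
joint = [[0.4, 0.1],
         [0.1, 0.4]]
mutual_information(joint)  # about 0.278 bits
```

Independent variables give I = 0, e.g. `mutual_information([[0.25, 0.25], [0.25, 0.25]])`.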
Computing MI from Examples
• Mutual information can be measured from examples:
100 faces, 100 non-faces
Feature detected: 44 times in faces, 6 times in non-faces
H(C) = 1, H(C|F) = 0.8475
Mutual information: 0.1525
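These numbers can be reproduced from the detection counts (a sketch using only the standard library):

```python
import math

def entropy(probs):
    """H = -sum p * log2(p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Counts from the slide: the fragment fires in 44/100 faces, 6/100 non-faces
n_face = n_nonface = 100
det_face, det_nonface = 44, 6
n = n_face + n_nonface

p_f1 = (det_face + det_nonface) / n                  # p(F=1)
h_c = entropy([n_face / n, n_nonface / n])           # H(C) = 1 (balanced classes)
h_c_f1 = entropy([det_face / (det_face + det_nonface),
                  det_nonface / (det_face + det_nonface)])
miss_face, miss_nonface = n_face - det_face, n_nonface - det_nonface
h_c_f0 = entropy([miss_face / (miss_face + miss_nonface),
                  miss_nonface / (miss_face + miss_nonface)])
h_c_given_f = p_f1 * h_c_f1 + (1 - p_f1) * h_c_f0    # ≈ 0.8475
mi = h_c - h_c_given_f                               # ≈ 0.1525
```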
Full KL Classification Error
[Diagram: class C generates features F via p(F|C), with class prior p(C); the classifier inverts this with q(C|F)]
Optimal classification features
• Theoretically: maximizing delivered information minimizes classification error
• In practice: informative object components can be identified in training images
Mutual Info vs. Threshold
[Plot: mutual information as a function of the detection threshold (0 to 40) for face fragments: forehead, hairline, mouth, eye, nose, nose bridge, long hairline, chin, two eyes]
Selecting Fragments
Adding a New Fragment (max-min selection)
ΔMI(Fi, Fj) = MI(C; Fi, Fj) − MI(C; Fj)
Select the Fi that maximizes minj ΔMI(Fi, Fj)
(min over the existing fragments Fj, max over the entire pool of candidates Fi)
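A sketch of this greedy max-min selection on binary fragment detections (all data and names below are made up; a fragment pair is encoded as one 4-valued feature so that pairwise MI can be estimated):

```python
import math
import numpy as np

def mi(c, f):
    """Empirical mutual information (bits) between discrete arrays c and f."""
    total = 0.0
    for cv in np.unique(c):
        for fv in np.unique(f):
            p_cf = np.mean((c == cv) & (f == fv))
            if p_cf > 0:
                total += p_cf * math.log2(p_cf / (np.mean(c == cv) * np.mean(f == fv)))
    return total

def select_fragments(c, pool, k):
    """Seed with the single most informative fragment, then repeatedly add
    the candidate F_i maximizing min_j [MI(C; F_i, F_j) - MI(C; F_j)]."""
    chosen = [max(range(len(pool)), key=lambda i: mi(c, pool[i]))]
    while len(chosen) < k:
        def gain(i):
            # encode the pair (F_i, F_j) as the 4-valued feature 2*F_i + F_j
            return min(mi(c, 2 * pool[i] + pool[j]) - mi(c, pool[j])
                       for j in chosen)
        rest = [i for i in range(len(pool)) if i not in chosen]
        chosen.append(max(rest, key=gain))
    return chosen

# Toy pool: a strong fragment, an exact duplicate, and a weaker but new one
rng = np.random.default_rng(0)
c = rng.integers(0, 2, 500)
f_good = c ^ (rng.random(500) < 0.1)    # agrees with the class 90% of the time
f_dup = f_good.copy()                   # redundant: adds nothing given f_good
f_weak = c ^ (rng.random(500) < 0.35)   # weaker, but independent noise
sel = select_fragments(c, [f_good, f_dup, f_weak], 2)
# max-min selection skips the duplicate and picks the weak-but-new fragment
```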
Highly Informative Face Fragments
Intermediate Complexity
[Plots a, b: 100 × merit and weight vs. relative object size; relative mutual information vs. relative resolution. Informativeness peaks at intermediate fragment size and resolution.]
Decision
Combine all detected fragments Fk:
∑wk Fk > θ
Optimal Separation
SVM, Perceptron
∑wk Fk = θ is a hyperplane
Combining fragments linearly
Conditional independence:
P(F1, F2 | C) = p(F1|C) p(F2|C)

Likelihood-ratio test:
p(F|C) / p(F|NC) > θ
∏i p(Fi|C) / p(Fi|NC) > θ
Taking logs: Σ w(Fi) > θ, with
w(Fi) = log [ p(Fi|C) / p(Fi|NC) ]

• If Fi = 1, take log [ p(Fi=1|C) / p(Fi=1|NC) ]
• If Fi = 0, take log [ p(Fi=0|C) / p(Fi=0|NC) ]

Instead: Σ wi > θ on the detected fragments only,
with wi = w(Fi=1) − w(Fi=0)
(the constant w(Fi=0) terms are absorbed into the threshold θ)
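A minimal sketch of this scoring scheme (the per-fragment probabilities below are made up; it checks that the detected-only score differs from the full log-likelihood ratio by a constant):

```python
import math

def w(p_c, p_nc):
    """Log-likelihood-ratio weight log2[ p(F=v|C) / p(F=v|NC) ]."""
    return math.log2(p_c / p_nc)

def full_score(detections, params):
    """Sum of log2 p(Fi|C)/p(Fi|NC) over ALL fragments, detected or not."""
    return sum(w(pc, pn) if f else w(1 - pc, 1 - pn)
               for f, (pc, pn) in zip(detections, params))

def detected_only_score(detections, params):
    """Sum of wi = w(Fi=1) - w(Fi=0) over the detected fragments only."""
    return sum(w(pc, pn) - w(1 - pc, 1 - pn)
               for f, (pc, pn) in zip(detections, params) if f)

# Hypothetical per-fragment statistics: (p(F=1|C), p(F=1|NC))
params = [(0.8, 0.1), (0.6, 0.2), (0.7, 0.3)]
# The constant absorbed into the threshold: sum of the w(Fi=0) terms
const = sum(w(1 - pc, 1 - pn) for pc, pn in params)
det = [1, 0, 1]
# full_score(det, params) == detected_only_score(det, params) + const
```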
Class II
Class Non-class
Fragments with positions
∑wk Fk > θ
On all detected fragments within their regions
Horse-class features
Examples of Horses Detected
Interest Points (Harris) and SIFT Descriptors

Harris Corner Operator
H = ∑ [ Ix²   IxIy
        IxIy  Iy² ]
  = [ ⟨Ix²⟩   ⟨IxIy⟩
      ⟨IxIy⟩  ⟨Iy²⟩ ]
Averages ⟨·⟩ are taken within a neighborhood.
Corner: the two eigenvalues λ1, λ2 are large.
Indirectly:
‘Corner’ = det(H) − k·trace²(H)
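A sketch of the operator on a synthetic image (assumptions: `np.gradient` stands in for the derivative filters and a 3×3 box filter stands in for the neighborhood average; real implementations typically use Sobel derivatives and Gaussian weighting):

```python
import numpy as np

def harris_response(img, k=0.04):
    """R = det(H) - k * trace(H)^2, with H built from the averaged
    gradient products <Ix^2>, <IxIy>, <Iy^2>."""
    iy, ix = np.gradient(img.astype(float))
    def box3(a):
        # 3x3 box filter (edge-replicated) as the neighborhood average
        p = np.pad(a, 1, mode="edge")
        h, w = a.shape
        return sum(p[r:r + h, c:c + w] for r in range(3) for c in range(3)) / 9.0
    sxx, syy, sxy = box3(ix * ix), box3(iy * iy), box3(ix * iy)
    return (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2

img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0          # bright square: corners, edges, flat regions
r = harris_response(img)
# corner response is positive, edge response negative, flat response zero
```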
Harris Corner Examples
SIFT descriptor
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
Example:
4×4 sub-regions
Histogram of 8 orientations in each
V = 4×4×8 = 128 values:
g1,1, …, g1,8, … , g16,1, …, g16,8
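A much-simplified sketch of this 4×4×8 layout (assumptions: no Gaussian weighting, no trilinear interpolation, no keypoint-orientation normalization, and no clipping, all of which Lowe's full algorithm includes):

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy SIFT-style descriptor for a 16x16 patch: 4x4 sub-regions, each
    with an 8-bin orientation histogram weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx)                             # -pi .. pi
    bins = ((ori + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = []
    for i in range(4):
        for j in range(4):
            sub_b = bins[4 * i:4 * i + 4, 4 * j:4 * j + 4]
            sub_m = mag[4 * i:4 * i + 4, 4 * j:4 * j + 4]
            desc.extend(np.bincount(sub_b.ravel(), weights=sub_m.ravel(),
                                    minlength=8))
    v = np.array(desc)                                   # 4*4*8 = 128 values
    n = np.linalg.norm(v)
    return v / n if n > 0 else v                         # illumination invariance

patch = np.random.default_rng(1).random((16, 16))
d = sift_like_descriptor(patch)                          # shape (128,)
```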
SIFT
Constellation of Patches, using interest points
Fergus, Perona, Zisserman 2003
Six-part motorcycle model, joint Gaussian shape model
Bag of Words and Unsupervised Classification
Object → Bag of ‘words’
Bag of visual words: a large collection of image patches
1. Feature detection and representation
• Regular grid: Vogel & Schiele, 2003; Fei-Fei & Perona, 2005
Each class has its own word histogram
pLSA
Classify documents automatically, find related documents, etc., based on word frequency.
Documents contain different ‘topics’ such as Economics, Sports, Politics, France… Each topic has its typical word frequency: Economics will have a high occurrence of ‘interest’, ‘bonds’, ‘inflation’, etc.
We observe the probabilities p(wi | dn) of words and documents
Each document contains several topics, zk
A word has different probabilities in each topic, p(wi | zk). A given document has a mixture of topics, p(zk | dn). The word-frequency model is:
p(wi | dn) = Σkp(wi|zk) p(zk | dn)
pLSA was used to discover topics, and arrange documents according to their topics.
pLSA
We observe p(wi | dn) and find the best p(wi | zk) and p(zk | dn) to explain the data.
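A minimal EM implementation of this model (a sketch; the toy corpus and all variable names are made up):

```python
import numpy as np

def plsa(counts, n_topics, iters=100, seed=0):
    """Minimal pLSA via EM. counts[i, n] = count of word i in document n.
    Learns p(w|z) and p(z|d) so that p(w|d) ≈ sum_z p(w|z) p(z|d)."""
    rng = np.random.default_rng(seed)
    n_words, n_docs = counts.shape
    p_w_z = rng.random((n_words, n_topics)); p_w_z /= p_w_z.sum(0)
    p_z_d = rng.random((n_topics, n_docs)); p_z_d /= p_z_d.sum(0)
    for _ in range(iters):
        # E-step: posterior p(z | w, d) ∝ p(w|z) p(z|d)
        post = p_w_z[:, :, None] * p_z_d[None, :, :]        # (word, topic, doc)
        post /= post.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate both tables from expected counts
        nz = counts[:, None, :] * post
        p_w_z = nz.sum(2) / (nz.sum((0, 2)) + 1e-12)
        p_z_d = nz.sum(0) / (nz.sum(0).sum(0) + 1e-12)
    return p_w_z, p_z_d

# Toy corpus with two obvious topics: docs 0-1 use words 0-2, docs 2-3 use 3-5
counts = np.array([[5., 4., 0., 0.], [4., 5., 0., 0.], [5., 5., 0., 0.],
                   [0., 0., 5., 4.], [0., 0., 4., 5.], [0., 0., 5., 5.]])
p_w_z, p_z_d = plsa(counts, n_topics=2)
topic = p_z_d.argmax(0)    # dominant topic per document
```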
Discovering objects and their location in images
Sivic, Russell, Efros, Freeman & Zisserman, ICCV 2005
Uses simple ‘visual words’ for classification
Not the best classifier, but obtains unsupervised classification, using pLSA
Visual words – unsupervised classification
• Four classes: faces, cars, airplanes, motorbikes, and non-class. Training images are mixed.
• Allowed 7 topics, one per class, the background includes 3 topics.
• Visual words: local patches using SIFT descriptors. – (say local 10*10 patches)
codewords dictionary
Learning
• Data: the matrix Dij = p(wi | Ij)
• During learning, discover ‘topics’ (classes + background)
• p(wi | Ij) = Σk p(wi | Tk) p(Tk | Ij)
• Optimize over p(wi | Tk) and p(Tk | Ij)
• The topics are expected to discover the classes
• Mainly one topic per class image was obtained.
Results of learning
Classifying a new image
• New image I:
• Measure p(wi | I)
• Find topics for the new image:
• p(wi | I) = Σ p(wi | Tk) p(Tk | I)
• Optimize over the topics Tk
• Find the largest (non-background) topic
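A sketch of this "folding-in" step, with a made-up learned word-topic table: the learned p(w | Tk) stays fixed and only the new image's topic mixture p(Tk | I) is re-estimated by EM.

```python
import numpy as np

def fold_in(word_counts, p_w_z, iters=50):
    """Infer the topic mixture p(z|I) of a new image, keeping the learned
    p(w|z) fixed. word_counts[i] = count of visual word i in the image."""
    n_topics = p_w_z.shape[1]
    p_z = np.full(n_topics, 1.0 / n_topics)
    for _ in range(iters):
        # E-step: p(z | w, I) ∝ p(w|z) p(z|I)
        post = p_w_z * p_z                       # (word, topic)
        post /= post.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate the mixture from expected counts
        p_z = (word_counts[:, None] * post).sum(0)
        p_z /= p_z.sum() + 1e-12
    return p_z

# Hypothetical learned table: topic 0 favors words 0-1, topic 1 favors 2-3
p_w_z = np.array([[0.45, 0.05], [0.45, 0.05], [0.05, 0.45], [0.05, 0.45]])
p_z = fold_in(np.array([9.0, 8.0, 1.0, 0.0]), p_w_z)
# the largest (non-background) topic gives the class label: here, topic 0
```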
Classifying a new image
On general model learning
• The goal is to classify C using a set of features F.
• F have already been selected (they must have high MI(C;F)).
• The next goal is to use F to decide on the class C.
• Probabilistic approach:
• Use observations to learn the joint distribution p(C,F)
• In a new image, F is observed; find the most likely C:
• maxC p(C,F)
General model learning
• To learn the joint distribution p(C,F):
• The model is of the form pθ(C,F)
– Or pθ(C,X,F), with hidden variables X
• For example, for words in documents:
– p(w1,…,wn | D) = Π p(wi | D)
– p(wi | D) = Σk p(wi | Tk) p(Tk | D)
• Training examples are used to determine the optimal θ by maximizing pθ(data):
– maxC,X,θ pθ(C,X,F)
• When θ is known, classify a new example:
– maxC,X pθ(C,X,F)