Lecture 8 Statistical Learning & Part Models


Transcript of Lecture 8 Statistical Learning & Part Models

Page 1: Lecture 8 Statistical Learning & Part Models

COS 429: Computer Vision

Lecture 8: Statistical Learning & Part Models

COS429 : 12.10.17 : Andras Ferencz

Page 2: Lecture 8 Statistical Learning & Part Models

2 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Review: Object Detection Typical Components
● Hypothesis generation
  ● Sliding window, Segmentation, feature point detection, random, search
● Encoding of (local) image data
  ● Colors, Edges, Corners, Histogram of Oriented Gradients, Wavelets, Convolution Filters
● Relationship of different parts to each other
  ● Blur or histogram, Tree/Star, Pairwise/Covariance
● Learning from (labeled) examples
  ● Selecting representative examples (templates), Clustering, Building a cascade
  ● Classifiers: Bayes, Logistic regression, SVM, AdaBoost, Neural Network...
  ● Generative vs. Discriminative
● Verification - removing redundant, overlapping, incompatible examples
  ● Non-Max Suppression, context priors, geometry (a minimal NMS sketch follows this summary)

Exemplar Summary
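Since non-max suppression recurs throughout the detection pipelines in this lecture, here is a minimal sketch of the greedy box-overlap version; the box format [x1, y1, x2, y2], the scores array, and the IoU threshold of 0.5 are illustrative assumptions, not the specific variant used by any of the cited detectors.

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one box against an array of boxes ([x1, y1, x2, y2])."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Greedily keep the highest-scoring detection, drop overlapping ones."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep
```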

Page 3: Lecture 8 Statistical Learning & Part Models

3 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Using least squares for classification

If the right answer is 1 and the model says 1.5, squared loss still penalizes it, so the boundary is shifted to avoid being “too correct”

least squares regression

G. Hinton

Page 4: Lecture 8 Statistical Learning & Part Models

4 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

The Problem: Loss Function
Recall: Regression Loss Functions

Some Classification Loss Functions

Squared Loss, Hinge Loss, Log Loss, 0/1 Loss

Page 5: Lecture 8 Statistical Learning & Part Models

5 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Sigmoid: σ(z) = 1 / (1 + e^(−z))

Ziv Bar-Joseph

Page 6: Lecture 8 Statistical Learning & Part Models

6 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Log Loss

Ziv Bar-Joseph

The logistic loss: log(1 + exp(−z)), where z = y·(wᵀx + b)

Page 7: Lecture 8 Statistical Learning & Part Models

7 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Logistic Result

least squares regression

Logistic Regression

Page 8: Lecture 8 Statistical Learning & Part Models

8 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Using Logistic Regression
● Quick, simple classifier (try it first)
● Outputs a probabilistic label confidence
● Use L2 or L1 regularization
  – L1 does feature selection and is robust to irrelevant features but slower to train
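As a concrete illustration of the bullets above, here is a minimal gradient-descent sketch of L2-regularized logistic regression on the log loss. It assumes labels y ∈ {−1, +1}; the learning rate, regularization strength, and iteration count are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, reg=1e-3, iters=1000):
    """Minimize mean log(1 + exp(-y*(w.x + b))) + (reg/2)*||w||^2 by gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        z = y * (X @ w + b)
        g = -sigmoid(-z) * y          # d/d(w.x+b) of log(1 + exp(-z)), per example
        w -= lr * (X.T @ g / n + reg * w)
        b -= lr * g.mean()
    return w, b

def predict_proba(X, w, b):
    """Probabilistic label confidence P(y = +1 | x)."""
    return sigmoid(X @ w + b)
```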

Page 9: Lecture 8 Statistical Learning & Part Models

9 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Classifiers: Linear SVM

[Figure: two classes of points (x's and o's) in the (x1, x2) plane]

• Find a linear function to separate the classes:

f(x) = sgn(w · x + b)

Page 10: Lecture 8 Statistical Learning & Part Models

10 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Classifiers: Linear SVM

[Figure: two classes of points (x's and o's) in the (x1, x2) plane, now with a separating line drawn]

• Find a linear function to separate the classes:

f(x) = sgn(w · x + b)

Page 11: Lecture 8 Statistical Learning & Part Models

11 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Support vector machines: Margin
• Want the line that maximizes the margin.

x_i positive (y_i = 1):  x_i · w + b ≥ 1
x_i negative (y_i = −1): x_i · w + b ≤ −1

Margin
Support vectors

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

For support vectors, x_i · w + b = ±1
(lines shown: w·x + b = −1, w·x + b = 0, w·x + b = +1)

Kristin Grauman

Hinge Loss

L(y, f(x)) = max(0, 1 − y · f(x))
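To connect the margin picture to the hinge loss, here is a minimal sub-gradient descent sketch of the primal soft-margin objective ½‖w‖² + C·mean(max(0, 1 − y(w·x + b))). Labels are assumed in {−1, +1}; C, the step size, and the iteration count are illustrative, not tuned values.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, iters=1000):
    """Minimize 0.5*||w||^2 + C * mean(hinge loss) by sub-gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        active = margins < 1                  # examples violating the margin
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -C * y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)                 # f(x) = sgn(w.x + b)
```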

Page 12: Lecture 8 Statistical Learning & Part Models

12 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

• Datasets that are linearly separable work out great:

• But what if the dataset is just too hard?

• We can map it to a higher-dimensional space:

[Figure: 1-D points on the x axis that are not linearly separable become separable after mapping x → (x, x²)]

Nonlinear SVMs

Slide credit: Andrew Moore

Page 13: Lecture 8 Statistical Learning & Part Models

13 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Φ: x → φ(x)

Nonlinear SVMs

• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

Slide credit: Andrew Moore

Page 14: Lecture 8 Statistical Learning & Part Models

14 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Nonlinear SVMs
• The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that

K(xi, xj) = φ(xi) · φ(xj)

(to be valid, the kernel function must satisfy Mercer’s condition)

• This gives a nonlinear decision boundary in the original feature space:

Σ_i α_i y_i φ(x_i) · φ(x) + b = Σ_i α_i y_i K(x_i, x) + b

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 15: Lecture 8 Statistical Learning & Part Models

15 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Nonlinear kernel: Example

• Consider the mapping φ(x) = (x, x²)

φ(x) · φ(y) = (x, x²) · (y, y²) = xy + x²y²
K(x, y) = xy + x²y²
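A quick numeric check of the example above: with φ(x) = (x, x²), the kernel K(x, y) = xy + x²y², computed directly in the original 1-D space, equals the dot product in the lifted space. The particular values of x and y below are arbitrary.

```python
import numpy as np

def phi(x):
    """Lifting map from the slide: scalar x -> (x, x^2)."""
    return np.array([x, x ** 2])

def K(x, y):
    """Kernel evaluated in the original space: K(x, y) = x*y + x^2*y^2."""
    return x * y + x ** 2 * y ** 2

x, y = 1.5, -2.0
assert np.isclose(phi(x) @ phi(y), K(x, y))   # phi(x).phi(y) == K(x, y)
```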

Page 16: Lecture 8 Statistical Learning & Part Models

16 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Classifiers: Decision Trees

[Figure: two classes of points (x's and o's) in the (x1, x2) plane, separated by axis-aligned splits]

Given (weighted) labeled examples:
● Select best single feature & threshold that separates classes
● For each branch, recurse
● Stop when
  ● some depth is reached
  ● the branch is (close to) single-class
  ● too few examples are left in the branch
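A minimal sketch of this training procedure for two classes, assuming a weighted-misclassification split score; the stopping thresholds (maximum depth 3, fewer than 5 examples) are illustrative choices.

```python
import numpy as np

def best_split(X, y, w):
    """Best single feature & threshold by weighted misclassification error."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # exclude the max so both branches are non-empty
            left = X[:, j] <= t
            err = sum(min(w[side][y[side] != c].sum() for c in (0, 1))
                      for side in (left, ~left))
            if err < best[2]:
                best = (j, t, err)
    return best

def build_tree(X, y, w, depth=0, max_depth=3):
    """Recurse on each branch; stop at max depth, a (near-)pure node, or too few examples."""
    if depth == max_depth or len(np.unique(y)) == 1 or len(y) < 5:
        return ('leaf', int(np.round(np.average(y, weights=w))))   # weighted majority class
    j, t, _ = best_split(X, y, w)
    if j is None:                              # no usable split (all features constant)
        return ('leaf', int(np.round(np.average(y, weights=w))))
    left = X[:, j] <= t
    return ('node', j, t,
            build_tree(X[left], y[left], w[left], depth + 1, max_depth),
            build_tree(X[~left], y[~left], w[~left], depth + 1, max_depth))
```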

Page 17: Lecture 8 Statistical Learning & Part Models

17 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Ensemble Methods: Boosting

figure from Friedman et al. 2000

Page 18: Lecture 8 Statistical Learning & Part Models

18 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Boosted Decision Trees

[Figure: two example boosted decision trees for labeling segments as Ground, Vertical, or Sky, branching on features such as Gray?, High in Image?, Many Long Lines?, Very High Vanishing Point?, Smooth?, Green?, Blue?]

[Collins et al. 2002]

P(label | good segment, data)

Page 19: Lecture 8 Statistical Learning & Part Models

19 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Using Boosted Decision Trees
● Flexible: can deal with both continuous and categorical variables
● How to control the bias/variance trade-off:
  – Size of trees
  – Number of trees
● Boosting trees often works best with a small number of well-designed features
● Boosting “stumps” (depth-1 trees) can give a fast classifier
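For instance, a boosted-stumps classifier can be assembled in a few lines with scikit-learn; the toy data below is made up, and the tree depth and number of estimators are exactly the knobs that control the bias/variance trade-off mentioned above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical toy data: X is an (n, d) feature matrix, y contains binary class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0).astype(int)

# Depth-1 trees ("stumps") give a fast classifier; deeper/more trees lower bias but raise variance.
stump = DecisionTreeClassifier(max_depth=1)
clf = AdaBoostClassifier(stump, n_estimators=100).fit(X, y)
print(clf.score(X, y))   # training accuracy of the boosted ensemble
```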

Page 20: Lecture 8 Statistical Learning & Part Models

20 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Review: Typical Components
● Hypothesis generation
  ● Sliding window, Segmentation, feature point detection, random, search
● Encoding of (local) image data
  ● Colors, Edges, Corners, Histogram of Oriented Gradients, Wavelets, Convolution Filters
● Relationship of different parts to each other
  ● Blur or histogram, Tree/Star, Pairwise/Covariance
● Learning from labeled examples
  ● Selecting representative examples (templates), Clustering, Building a cascade
  ● Classifiers: Bayes, Logistic regression, SVM, Decision Trees, AdaBoost, ...
  ● Generative vs. Discriminative
● Verification - removing redundant, overlapping, incompatible examples
  ● Non-Max Suppression, context priors, geometry

Exemplar Summary

Page 21: Lecture 8 Statistical Learning & Part Models

21 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Discriminative vs. Generative Classifiers

Discriminative Classification:Find the boundary between classes

1. Assume some functional form for P(Y|X)

2. Estimate parameters of P(Y|X) directly from training data

Training classifiers involves estimating f: X → Y, or P(Y|X) “Y given X”

Generative Classification: Model each class & see which fits better

1.Assume some functional form for P(X|Y), P(X)

2.Estimate parameters of P(X|Y), P(X) directly from training data

3.Use Bayes rule to calculate P(Y|X= xi)


Page 22: Lecture 8 Statistical Learning & Part Models

22 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Bayesian Classification (Generative Model)

Page 23: Lecture 8 Statistical Learning & Part Models

23 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Bayesian Classification

Page 24: Lecture 8 Statistical Learning & Part Models

24 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Bayesian Classification

Page 25: Lecture 8 Statistical Learning & Part Models

25 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Bayesian Classification

Page 26: Lecture 8 Statistical Learning & Part Models

26 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Bayesian Classification

Page 27: Lecture 8 Statistical Learning & Part Models

27 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Bayesian Classification

Page 28: Lecture 8 Statistical Learning & Part Models

28 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Naïve Bayes Classification

Page 29: Lecture 8 Statistical Learning & Part Models

29 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Naïve Bayes, Odds ratio, and Logit

Assuming independence of features d1 and d2, we can classify between 2 classes {C1, C2} by computing the ratio:

This is called the Odds ratio.

It is often easier to take the log of this, called the Log Odds or Logit:

P(C1 | d1, d2) / P(C2 | d1, d2) = [P(d1 | C1) P(d2 | C1) P(C1)] / [P(d1 | C2) P(d2 | C2) P(C2)]   (choose C2 if this ratio is < 1)

log [P(C1 | d1, d2) / P(C2 | d1, d2)] = log [P(d1 | C1) / P(d1 | C2)] + log [P(d2 | C1) / P(d2 | C2)] + log [P(C1) / P(C2)]   (choose C2 if this is < 0)
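A small numeric sketch of this decision rule, with made-up (hypothetical) class-conditional probabilities and priors:

```python
import numpy as np

# Hypothetical estimates of the class-conditional probabilities and priors
p_d1 = {'C1': 0.7, 'C2': 0.2}     # P(d1 | class)
p_d2 = {'C1': 0.4, 'C2': 0.5}     # P(d2 | class)
prior = {'C1': 0.3, 'C2': 0.7}    # P(class)

# Logit = sum of per-feature log likelihood ratios plus the log prior ratio
logit = (np.log(p_d1['C1'] / p_d1['C2'])
         + np.log(p_d2['C1'] / p_d2['C2'])
         + np.log(prior['C1'] / prior['C2']))

label = 'C1' if logit > 0 else 'C2'   # decide C1 when the log odds exceed 0
print(round(logit, 3), label)
```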

Page 30: Lecture 8 Statistical Learning & Part Models

30 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Naïve Bayes

Page 31: Lecture 8 Statistical Learning & Part Models

31 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Naive Bayes with Gaussian densities has a piecewise quadratic decision boundary.

Naïve Bayes

Page 32: Lecture 8 Statistical Learning & Part Models

32 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Eamonn Keogh

Graphical Models

Naive Bayes assumes independence of d1, d2, ... conditioned on the class c.

The independence assumption of Naive Bayes is the simplest one; often much more complicated relationships need to be represented.

A good way to do this is a Graphical Model (a.k.a. Bayesian Network).

Page 33: Lecture 8 Statistical Learning & Part Models

33 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit: Ioannis Tsamardinos


Graphical Models

More complicated models consider the dependence between different variables.

Page 34: Lecture 8 Statistical Learning & Part Models

34 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Discriminative vs. Generative Classifiers

Discriminative Classification:Find the boundary between classes

1. Assume some functional form for P(Y|X)

2. Estimate parameters of P(Y|X) directly from training data

Training classifiers involves estimating f: X → Y, or P(Y|X) “Y given X”

Generative Classification: Model each class & see which fits better

1.Assume some functional form for P(X|Y), P(X)

2.Estimate parameters of P(X|Y), P(X) directly from training data

3.Use Bayes rule to calculate P(Y|X= xi)

Page 35: Lecture 8 Statistical Learning & Part Models

35 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Discriminative vs. Generative Classifiers

Discriminative:
+ Model directly what you care about
+ With many examples usually more accurate
+ Often faster to evaluate, can scale well to many examples & classes

Generative:
+ Allows more flexibility to model relationships between variables
+ Can handle compositionality, missing & occluded parts (more “object oriented”)
+ Often needs fewer labeled examples

“On Discriminative vs. Generative classifiers: A comparison of logistic regression and naïve Bayes,” A. Ng and M. Jordan, NIPS 2002.

Page 36: Lecture 8 Statistical Learning & Part Models

36 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Comparison

Naïve Bayes
  Learning objective: maximize Σ_i [ Σ_j log P(x_ij | y_i ; θ_j) + log P(y_i ; θ_0) ]
  Training: closed form, by counting feature/label co-occurrences (with smoothing)
  Inference: evaluate the log-odds, which is linear in x for binary features (assuming x in {0, 1})

Logistic Regression
  Learning objective: maximize Σ_i log P(y_i | x_i, θ), where P(y_i | x_i, θ) = 1 / (1 + exp(−y_i θᵀx_i))
  Training: gradient ascent
  Inference: θᵀx > 0

Linear SVM
  Learning objective: minimize ½ ||θ||² such that y_i θᵀx_i ≥ 1
  Training: linear programming (or quadratic programming)
  Inference: θᵀx > 0

Kernelized SVM
  Learning objective: complicated to write
  Training: quadratic programming
  Inference: Σ_i α_i y_i K(x̂_i, x) > 0

Nearest Neighbor
  Learning objective: most similar features → same label
  Training: record the training data
  Inference: y_î, where î = argmin_i K(x̂_i, x)

Slide credit: D. Hoiem

Page 37: Lecture 8 Statistical Learning & Part Models

37 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Train vs. Test Accuracy

Page 38: Lecture 8 Statistical Learning & Part Models

38 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Generalization
● Components of generalization error
  – Bias: how much does the average model over all training sets differ from the true model?
    ● Error due to inaccurate assumptions/simplifications made by the model
  – Variance: how much do models estimated from different training sets differ from each other?
● Underfitting: model is too “simple” to represent all the relevant class characteristics
  – High bias and low variance
  – High training error and high test error
● Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
  – Low bias and high variance
  – Low training error and high test error

Slide credit: L. Lazebnik

Page 39: Lecture 8 Statistical Learning & Part Models

39 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Bias-Variance Trade-off

● Models with too few parameters are inaccurate because of a large bias (not enough flexibility).

● Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).

Slide credit: D. Hoiem

Page 40: Lecture 8 Statistical Learning & Part Models

40 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Bias-Variance Trade-off

E(MSE) = noise² + bias² + variance

See the following for explanations of bias-variance (also Bishop’s “Neural Networks” book): •http://www.inf.ed.ac.uk/teaching/courses/mlsc/Notes/Lecture4/BiasVariance.pdf

(noise² = unavoidable error; bias² = error due to incorrect assumptions; variance = error due to variance of the training samples)

Slide credit: D. Hoiem
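A small simulation sketch of this decomposition at a single test point: fit polynomials of two different degrees to many noisy training sets drawn from an assumed “true” function, then estimate bias² and variance of the prediction. The function, noise level, test point, and degrees below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)          # assumed "true" function
noise_sd, x0 = 0.3, 0.25                     # noise level and the test point

def experiment(degree, n_train=30, n_trials=500):
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, 1, n_train)
        y = f(x) + rng.normal(0, noise_sd, n_train)
        coefs = np.polyfit(x, y, degree)      # least-squares polynomial fit
        preds.append(np.polyval(coefs, x0))
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x0)) ** 2       # (average model - true model)^2
    var = preds.var()                         # spread of models across training sets
    return bias2, var

for deg in (1, 9):                            # too simple vs. too flexible
    b2, v = experiment(deg)
    print(f"degree {deg}: bias^2={b2:.4f}  variance={v:.4f}  noise^2={noise_sd**2:.4f}")
```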

Page 41: Lecture 8 Statistical Learning & Part Models

41 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Try out which hyperparameters work best on the test set?

Cross Validation

Page 42: Lecture 8 Statistical Learning & Part Models

42 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:


Trying out which hyperparameters work best on the test set: Very bad idea. The test set is a proxy for generalization performance! Use it only VERY SPARINGLY, at the end.

Cross Validation

Page 43: Lecture 8 Statistical Learning & Part Models

43 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Validation data: use to tune hyperparameters

Validation

Page 44: Lecture 8 Statistical Learning & Part Models

44 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Cross-validation: cycle through the choice of which fold is the validation fold; average the results.

Cross Validation
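A minimal sketch of k-fold cross-validation for tuning a hyperparameter. Here `make_clf` is a hypothetical factory returning any classifier with the usual fit/score interface, and k = 5 is an arbitrary choice; the test set stays untouched until the very end.

```python
import numpy as np

def cross_val_score(make_clf, X, y, k=5):
    """Cycle through which fold is the validation fold; average validation accuracy."""
    folds = np.array_split(np.random.permutation(len(y)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        clf = make_clf().fit(X[train], y[train])
        scores.append(clf.score(X[val], y[val]))
    return np.mean(scores)

# Hypothetical usage: pick the regularization strength with the best average validation score.
# best_C = max(candidate_Cs, key=lambda C: cross_val_score(lambda: LogisticRegression(C=C), X, y))
```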

Page 45: Lecture 8 Statistical Learning & Part Models

45 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

So…
● No classifier is inherently better than any other: you need to make assumptions to generalize
● Three kinds of error
  – Inherent: unavoidable
  – Bias: due to over-simplifications
  – Variance: due to inability to perfectly estimate parameters from limited data

Slide credit: D. Hoiem

Page 46: Lecture 8 Statistical Learning & Part Models

46 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

What to remember about classifiers

● Machine learning algorithms are tools, not dogmas

● Try simple classifiers first

● Better to have smart features and simple classifiers than simple features and smart classifiers

● Use increasingly powerful classifiers with more training data (bias-variance trade-off)

Slide credit: D. Hoiem

Page 47: Lecture 8 Statistical Learning & Part Models

47 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

How to reduce variance?

● Choose a simpler classifier

● Regularize the parameters

● Get more training data

Slide credit: D. Hoiem

Page 48: Lecture 8 Statistical Learning & Part Models

48 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Review: Typical Components
● Hypothesis generation
  ● Sliding window, Segmentation, feature point detection, random, search
● Encoding of (local) image data
  ● Colors, Edges, Corners, Histogram of Oriented Gradients, Wavelets, Convolution Filters
● Relationships of different parts to each other
  ● Blur or histogram, Tree/Star, Pairwise/Covariance
● Learning from labeled examples
  ● Selecting representative examples (templates), Clustering, Building a cascade
  ● Classifiers: Bayes, Logistic regression, SVM, Decision Trees, AdaBoost, ...
  ● Generative vs. Discriminative
● Verification - removing redundant, overlapping, incompatible examples
  ● Non-Max Suppression, context priors, geometry

Exemplar Summary

Page 49: Lecture 8 Statistical Learning & Part Models

49 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Implicit shape models

• Visual codebook is used to index votes for object position

B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

training image annotated with object localization info

visual codeword with displacement vectors

Slides by Lana Lazebnik, some adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

Page 50: Lecture 8 Statistical Learning & Part Models

50 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Implicit shape models

• Visual codebook is used to index votes for object position

B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

training image annotated with object localization info

visual codeword with displacement vectors

Page 51: Lecture 8 Statistical Learning & Part Models

51 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Implicit shape models: Training

1. Build codebook of patches around extracted interest points using clustering

Page 52: Lecture 8 Statistical Learning & Part Models

52 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Implicit shape models: Training

1. Build codebook of patches around extracted interest points using clustering

Page 53: Lecture 8 Statistical Learning & Part Models

53 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Implicit shape models: Training

1. Build codebook of patches around extracted interest points using clustering

Page 54: Lecture 8 Statistical Learning & Part Models

54 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Implicit shape models: Testing
1. Given test image, extract patches, match to codebook entry
2. Cast votes for possible positions of object center
3. Search for maxima in voting space
4. Extract weighted segmentation mask based on stored masks for the codebook occurrences
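A minimal sketch of steps 2–3 (casting votes and finding maxima in the voting space). It assumes a hypothetical codebook that stores, for each codeword, the displacement vectors from patch to object center seen during training; the accumulator bin size is illustrative, and this is only a coarse stand-in for the continuous voting used in the paper.

```python
import numpy as np

def vote_for_center(matches, codebook_displacements, image_shape, bin_size=8):
    """matches: list of ((px, py), codeword_id) for patches matched to codebook entries.
    codebook_displacements[cw]: list of (dx, dy) object-center offsets stored for codeword cw."""
    h, w = image_shape
    acc = np.zeros((h // bin_size + 1, w // bin_size + 1))   # voting space over object centers
    for (px, py), cw in matches:
        votes = codebook_displacements[cw]
        for dx, dy in votes:
            cx, cy = px + dx, py + dy                        # voted object-center position
            if 0 <= cx < w and 0 <= cy < h:
                acc[int(cy) // bin_size, int(cx) // bin_size] += 1.0 / len(votes)
    peak = np.unravel_index(acc.argmax(), acc.shape)         # maximum in the voting space
    return (peak[1] * bin_size, peak[0] * bin_size), acc
```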

Page 55: Lecture 8 Statistical Learning & Part Models

55 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:Source: B. Leibe

Original image

Example: Results on Cows

Page 56: Lecture 8 Statistical Learning & Part Models

56 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Interest points

Example: Results on Cows

Source: B. Leibe

Page 57: Lecture 8 Statistical Learning & Part Models

57 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Example: Results on Cows

Matched patches. Source: B. Leibe

Page 58: Lecture 8 Statistical Learning & Part Models

58 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Example: Results on Cows

Probabilistic votes. Source: B. Leibe

Page 59: Lecture 8 Statistical Learning & Part Models

59 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Example: Results on Cows

Hypothesis 1. Source: B. Leibe

Page 60: Lecture 8 Statistical Learning & Part Models

60 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Example: Results on Cows

Hypothesis 1. Source: B. Leibe

Page 61: Lecture 8 Statistical Learning & Part Models

61 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Example: Results on Cows

Hypothesis 3. Source: B. Leibe

Page 62: Lecture 8 Statistical Learning & Part Models

62 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Additional examples

B. Leibe, A. Leonardis, and B. Schiele, Robust Object Detection with Interleaved Categorization and Segmentation, IJCV 77 (1-3), pp. 259-289, 2008.

Page 63: Lecture 8 Statistical Learning & Part Models

63 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Generative part-based models

R. Fergus, P. Perona and A. Zisserman, Object Class Recognition by Unsupervised Scale-Invariant Learning, CVPR 2003

R. Fergus

Page 64: Lecture 8 Statistical Learning & Part Models

64 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Generative probabilistic model
[Figure: Foreground model — Gaussian shape pdf, Gaussian part appearance pdf, Gaussian relative scale pdf over log(scale), and a probability of detection per part (e.g. 0.8, 0.75, 0.9). Clutter model — uniform shape pdf, Gaussian appearance pdf, uniform relative scale pdf over log(scale), and a Poisson pdf on the number of detections.]

R. Fergus

Page 65: Lecture 8 Statistical Learning & Part Models

65 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Probabilistic model

P(image | object) = P(appearance, shape | object)
                  = max_h P(appearance | h, object) p(shape | h, object) p(h | object)

h: assignment of features to parts
Part descriptors, Part locations, Candidate parts

R. Fergus

Page 66: Lecture 8 Statistical Learning & Part Models

66 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Probabilistic model

High-dimensional appearance space

Distribution over patch descriptors

P(image | object) = P(appearance, shape | object)
                  = max_h P(appearance | h, object) p(shape | h, object) p(h | object)

R. Fergus

Page 67: Lecture 8 Statistical Learning & Part Models

67 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Probabilistic model

h: assignment of features to parts

Part 2

Part 3

Part 1

P(image | object) = P(appearance, shape | object)
                  = max_h P(appearance | h, object) p(shape | h, object) p(h | object)

R. Fergus

Page 68: Lecture 8 Statistical Learning & Part Models

68 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Probabilistic model

2D image space

Distribution over joint part positions

P(image | object) = P(appearance, shape | object)
                  = max_h P(appearance | h, object) p(shape | h, object) p(h | object)

R. Fergus

Page 69: Lecture 8 Statistical Learning & Part Models

69 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Results: Faces

Face shape model

Patch appearance model

Recognition results

R. Fergus

Page 70: Lecture 8 Statistical Learning & Part Models

70 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Results: Motorbikes

Page 71: Lecture 8 Statistical Learning & Part Models

71 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Motorbikes

Page 72: Lecture 8 Statistical Learning & Part Models

72 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Deformable Parts Model

Detections

Template Visualization

Felzenszwalb

P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, “Object Detection with Discriminatively Trained Part Based Models”

Page 73: Lecture 8 Statistical Learning & Part Models

73 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

Page 74: Lecture 8 Statistical Learning & Part Models

74 : COS429 : L9 : 12.10.17 : Andras Ferencz Slide Credit:

So many options... How to choose?
● Hypothesis generation
  ● Sliding window, Segmentation, feature point detection, random, search
● Encoding of (local) image data
  ● Colors, Edges, Corners, Histogram of Oriented Gradients, Wavelets, Convolution Filters
● Relationship of different parts to each other
  ● Blur or histogram, Tree/Star, Pairwise/Covariance
● Learning from labeled examples
  ● Selecting representative examples (templates), Clustering, Building a cascade
  ● Classifiers: Bayes, Logistic regression, SVM, Decision Trees, AdaBoost, ...
  ● Generative vs. Discriminative
● Verification - removing redundant, overlapping, incompatible examples
  ● Non-Max Suppression, context priors, geometry

Exemplar Summary