ICIAP 2007 - Tutorial Advances of statistical learning and ...
ICIAP 2007 - Tutorial
Advances of statistical learning and
Applications to Computer Vision
Ernesto De Vito and Francesca Odone
- PART 2 -
http://slipguru.disi.unige.it
Plan of the second part
- Brief intro through a set of applications
- One problem in detail (face detection): choosing the representation, feature selection, classification
- On the choice of the classifier (filter methods)
- Spotlights on other interesting issues: image annotation, kernel engineering, global vs local
Learning in everyday life
- Security and video-surveillance
- OCR systems
- Robot control
- Biometrics
- Speech recognition
- Early diagnosis from medical data
- Knowledge discovery in big datasets of heterogeneous data (including the Internet)
- Microarray analysis and classification
- Stock market prediction
- Regression applications in computer graphics
Statistical Learning in Computer Vision
Statistical Learning in Computer Vision
Detection problems
Statistical Learning in Computer Vision
More in general: Image annotation
car, tree, building, sky, pavement, pedestrian, ...
How difficult is image understanding?
Plan of the second part
- Brief intro through a set of applications
- One problem in detail (face detection): choosing the representation, feature selection, classification
- On the choice of the classifier (filter methods)
- Spotlights on other interesting issues: image annotation, kernel engineering, global vs local
Regularized face detection
Main steps towards a complete classifier:
1. Choosing the representation
2. Feature selection
3. Classification
Joint work with A. Destrero – C. De Mol – A. Verri
Problem setting: find one or more occurrences of a (roughly frontal) human face, possibly at different resolutions, in a digital image.
Application scenario (the data)
- 2000 + 2000 training images
- 1000 + 1000 validation images
- 3400 test images
- each image is 19 × 19 pixels
Initial representation (the dictionary)
Overcomplete, general-purpose sets of features are effective for modeling visual information. Many object classes have a peculiar intrinsic structure that can be better appreciated if one looks for symmetries or local geometry.

Examples of features: wavelets, curvelets, ranklets, chirplets, rectangle features, ...
Examples of problems: face detection (Heisele et al., Viola & Jones, ...), pedestrian detection (Oren et al., ...), car detection (Papageorgiou & Poggio)
Initial representation (the dictionary)
The approach is inspired by biological systems. See, for instance, B. A. Olshausen and D. J. Field, “Sparse coding with an overcomplete basis set: a strategy employed by V1?”, 1997.
Usually this approach is coupled with learning from examples
The prior knowledge is embedded in the choice of an appropriate training set
Problem: usually these sets are very big
Initial representation (the dictionary)
Rectangle features (Viola & Jones)
... About 64000 features per image patch!
Most of them are correlated:
- short range correlation of natural images
- long range correlation relative to the object of interest
What’s wrong with this?
- Measurements are noisy
- Features are correlated
- The number of features is higher than the number of examples

=> Ill-conditioned problem
Feature selection
Extracting features relevant for a given problem
What is relevant?
Often related to dimensionality reduction, but the two problems are different.
A possible way to address the problem is to resort to regularization methods
Elastic net penalty (PART 1)
Let us revise the basic algorithm
We assume a linear dependence between input and output:

Φβ = f

- Φ = {φ_ij} is the measurement matrix, with i = 1, ..., n examples/data and j = 1, ..., p dictionary elements
- β = (β_1, ..., β_p)^T is the vector of unknown weights to be estimated
- f = (f_1, ..., f_n)^T contains the output values ({−1, 1} labels in binary classification problems)
Choosing the appropriate algorithm
What sort of penalty suits our problem best? In other words: how do we choose ε in the elastic net penalty

β* = argmin_{β ∈ R^p} { ‖f − Φβ‖² + λ( ‖β‖₁ + ε‖β‖₂² ) }

The choice is driven by the application domain:
- What can we say about image correlation?
- Is there any reason to prefer feature A to feature B?
- Do we want them both?
Peculiarity of images
Given a group of short range correlated features, each element is a good representative of the group.

As for long range correlated features, it would be interesting to keep them all, but it is difficult to distinguish them at this stage.

Notice that in other applications (e.g., microarray analysis) each feature is important per se.
L1 penalty
A purely L1 penalty automatically enforces the presence of many zeros in β. The L1 norm is convex, therefore providing feasible algorithms.

(PROB L1)    β* = argmin_{β ∈ R^p} { ‖f − Φβ‖₂² + λ‖β‖₁ }

(PROB L1) is the Lagrangian formulation of the so-called LASSO problem.
L1 penalty
The regularization parameter λ regulates the balance between the misfit of the data and the penalty. It also allows us to vary the degree of sparsity.
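The effect of λ on sparsity can be seen directly in the soft-thresholding operation used to solve the problem: components smaller than λ/2 in magnitude are set exactly to zero. A minimal Python sketch on a toy vector (illustrative only, not the authors' code):

```python
import numpy as np

def soft_threshold(h, lam):
    """Component-wise soft-thresholder S_lambda: zero out |h_j| < lam/2,
    shrink the rest toward zero by lam/2."""
    return np.sign(h) * np.maximum(np.abs(h) - lam / 2, 0.0)

h = np.array([0.05, -0.2, 0.8, -1.5, 0.01, 0.4])
# Count zero coefficients for increasing lambda: sparsity grows with lambda.
sparsity = [int(np.sum(soft_threshold(h, lam) == 0)) for lam in (0.0, 0.5, 1.0, 2.0)]
```

As λ grows, more components of the toy vector fall below the λ/2 threshold and are zeroed.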
How do we solve it?
The solution is not unique. A number of numerical strategies have been proposed.
We adopt the iterated soft-threshold Landweber
(ALG L)    β^(t+1) = S_λ[ β^(t) + Φ^T( f − Φβ^(t) ) ]

where the soft-thresholder S_λ is defined component-wise as

S_λ(h)_j = sign(h_j) ( |h_j| − λ/2 )  if |h_j| ≥ λ/2,  and 0 otherwise.

This algorithm converges to a minimizer of (PROB L1) if ‖Φ‖ < 1.
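The iteration above can be sketched in a few lines of Python on a synthetic noiseless problem (illustrative data, not the tutorial's experiments; Φ is rescaled so that ‖Φ‖ < 1, as the convergence condition requires):

```python
import numpy as np

def ista(Phi, f, lam, n_iter=500):
    """Iterated soft-thresholded Landweber (ALG L) for
    min ||f - Phi beta||^2 + lam * ||beta||_1.
    Assumes ||Phi|| < 1; rescale Phi and f beforehand if needed."""
    beta = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        h = beta + Phi.T @ (f - Phi @ beta)           # Landweber step
        beta = np.sign(h) * np.maximum(np.abs(h) - lam / 2, 0.0)  # S_lambda
    return beta

rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 20))
Phi /= np.linalg.norm(Phi, 2) * 1.1                   # enforce spectral norm < 1
beta_true = np.zeros(20)
beta_true[[2, 7]] = [1.0, -1.0]                       # a 2-sparse ground truth
f = Phi @ beta_true
beta_hat = ista(Phi, f, lam=1e-3)
```

With a small λ and noiseless data, the iteration recovers the sparse weight vector up to a small shrinkage bias.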
Thresholded Landweber and our problem
Φβ = f

- Φ is the measurement matrix: one row per image, one column per feature
- f is the vector of labels: +1 for faces, −1 for negative examples

In our experiments Φ has size 4000 × 64000 (about 1 GB!)

β^(t+1) = S_λ[ β^(t) + Φ^T( f − Φβ^(t) ) ]
A sampled version of Thresholded Landweber
We build S feature subsets, each time extracting with replacement m features, m ≪ p, and compute the S sub-problems

f = Φ_s β_s,   s = 1, ..., S

Then we keep the features that were selected every time they appeared in a subset.
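The subsampling scheme can be sketched as follows. This is a Python illustration with a toy solver standing in for the L1 step; `lasso_solver`, `frac`, and the demo data are assumptions, not the authors' code:

```python
import numpy as np

def sampled_selection(Phi, f, lasso_solver, S=200, frac=0.1, seed=0):
    """Run the L1 solver on S random feature subsets (sampled with
    replacement) and keep the features selected every time they are drawn.
    `lasso_solver(Phi_sub, f)` is assumed to return a (sparse) weight vector."""
    rng = np.random.default_rng(seed)
    p = Phi.shape[1]
    drawn = np.zeros(p, dtype=int)   # how often each feature was drawn
    kept = np.zeros(p, dtype=int)    # how often it got a nonzero weight
    m = int(frac * p)
    for _ in range(S):
        idx = rng.choice(p, size=m, replace=True)
        beta = lasso_solver(Phi[:, idx], f)
        for j, b in zip(idx, beta):
            drawn[j] += 1
            if b != 0:
                kept[j] += 1
    return np.flatnonzero((drawn > 0) & (kept == drawn))

# Toy demo: feature 0 is the all-ones column; the toy "solver" assigns a
# nonzero weight exactly to columns with a nonzero sum.
Phi = np.zeros((10, 5))
Phi[:, 0] = 1.0
f = np.ones(10)
toy_solver = lambda Phi_sub, f: Phi_sub.sum(axis=0)
selected = sampled_selection(Phi, f, toy_solver, S=50, frac=0.4)
```

Only features selected on every subset they appear in survive; in the toy demo that is feature 0 alone.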
A sampled version of Thresholded Landweber
In our experiments:
- each subset is 10% of the original size
- S = 200 (the probability of extracting each feature at least 10 times is high)

[Figure: histogram of how many times each feature is extracted across the S runs]
Structure of the method (I)
S0 → sub1, sub2, ..., subS → Alg L on each subset → combine (+) → S1
Choosing λ
A few words on parameter tuning
A classical choice is cross validation, but in this case it is too heavy (because of the number of sub-problems). Thus, at this stage, we fix the number of zeros to be reached in a given number of iterations.
Cross validation
A standard technique for parameter estimation
Try different parameters and choose the one that performs (generalizes) best.
K-fold cross validation:
1. Divide the training set in K chunks
2. Keep K−1 chunks for training and 1 for validating
3. Repeat for the K different validation sets
4. Compute an average classification rate
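The four steps above can be sketched in Python; `train_and_score` is a hypothetical callback that trains on one split and returns a validation score:

```python
import numpy as np

def k_fold_score(X, y, train_and_score, K=5):
    """Plain K-fold cross validation: split the data into K chunks, train on
    K-1 chunks, validate on the held-out one, and average the K scores."""
    n = len(y)
    folds = np.array_split(np.arange(n), K)
    scores = []
    for k in range(K):
        val = folds[k]
        train = np.hstack([folds[j] for j in range(K) if j != k])
        scores.append(train_and_score(X[train], y[train], X[val], y[val]))
    return float(np.mean(scores))

# Toy demo: a "classifier" that always predicts +1, scored by accuracy.
X = np.zeros((20, 2))
y = np.ones(20)
always_plus = lambda Xtr, ytr, Xva, yva: float(np.mean(yva == 1))
score = k_fold_score(X, y, always_plus, K=5)
```

In practice one repeats this for each candidate parameter value and keeps the one with the best average score.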
Classification
Two reasons:
- Obtain an effective face detector
- Speculate on the quality of the selected features
Face detection is a fairly standard binary classification problem
Regularized Least Squares or Support Vector Machines (Vapnik, 1995) ... with some nice kernel.

In the following experiments we start with linear SVMs.
Setting 90% of zeros
We get 4636 features ... too many!

What about increasing the number of zeros in the solution?

[ROC curves: one-stage feature selection; one-stage feature selection (cross validation); one-stage feature selection (on the entire set of features)]
A refinement of the solution
Setting 99% of zeros: 345 features (good), but generalization performance drops by about 3% (bad).

IDEA: we apply the thresholded Landweber once again, on S1 (4636 features). This time we tune λ with cross validation and obtain 247 features.
Structure of the method (II)
S0 → sub1, sub2, ..., subS → Alg L on each subset → combine (+) → S1 → Alg L → S2
Comparative analysis
[ROC curves, comparison with Adaboost feature selection (Viola & Jones): 2-stage feature selection; 2-stage feature selection + correlation; Viola–Jones feature selection on our same data; Viola–Jones cascade performance]

[ROC curves, comparison with PCA: 2-stage feature selection; PCA]
How compact is the solution?
The 247 features are still redundant. For real-time processing we may want to try and reduce the set further.
[ROC curves, linear vs polynomial kernel: 2-stage feature selection; 2-stage feature selection (polynomial kernel)]
A third optimization stage
Starting from S2, we choose one delegate for each group of short range correlated features. Our correlation analysis discards features that are:
- of the same type
- correlated according to Spearman's test
- spatially close
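The rank-correlation part of this pruning can be sketched in Python (a greedy illustration covering only the Spearman criterion, not the type or spatial-proximity checks; the threshold and demo data are assumptions):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks
    (no tie correction, for illustration only)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

def prune_correlated(F, threshold=0.9):
    """Greedy pruning: keep one delegate per group of highly
    rank-correlated columns of the feature matrix F."""
    kept = []
    for j in range(F.shape[1]):
        if all(abs(spearman(F[:, j], F[:, k])) < threshold for k in kept):
            kept.append(j)
    return kept

# Toy demo: column 1 is a monotone transform of column 0 (rank corr. 1),
# column 2 is an unrelated permutation.
rng = np.random.default_rng(0)
n = 30
F = np.stack([np.arange(n, dtype=float),
              np.arange(n, dtype=float) ** 3,
              rng.permutation(n).astype(float)], axis=1)
kept = prune_correlated(F)
```

The monotone copy is discarded in favor of its delegate, while the unrelated column survives.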
Structure of the method (III)
S0 → sub1, sub2, ..., subS → Alg L on each subset → combine (+) → S1 → Alg L → S2 → Corr → S3
What do we get?
[ROC curves, linear vs polynomial kernel: 2-stage feature selection + correlation; 2-stage feature selection + correlation (polynomial kernel)]

[ROC curves, with and without the 3rd stage: two-stage feature selection; two-stage + correlation analysis]
A fully trainable system for detecting faces
Peculiarity of object detectors:
- for each image, many tests
- very few positive examples
- very many negative examples
A fully trainable system for detecting faces
Coarse-to-fine methods deal with this by devising multiple classifiers of increasing difficulty. Many approaches exist (focus-of-attention, cascades, ...).
Our cascade of classifiers
Starting from a set of features, say S3, we build many small linear SVM classifiers, each based on at least 3 distant features, able to reach a fixed target performance on a validation set. The target performance is chosen so that each classifier is unlikely to miss faces:
- minimum hit rate 99.5%
- maximum false positive rate 50%

The overall rates multiply across layers: F = ∏_i f_i and H = ∏_i h_i. For 10 layers, H = 0.995^10 ≈ 95% and F = 0.5^10 ≈ 10^-3.
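The arithmetic behind the cascade rates is just a product over layers, e.g. in Python:

```python
# Overall cascade rates: the per-layer hit rate h and false positive rate f
# multiply across the layers of the cascade.
h, f, layers = 0.995, 0.5, 10
H = h ** layers   # overall hit rate: ~95% with h = 0.995
F = f ** layers   # overall false positive rate: 0.5^10 ~ 1e-3
```

This is why very mild per-layer targets (50% false positives!) still yield a very selective overall detector.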
Finding faces in images
Finding faces in video frames
Finding eye regions...
The beauty of data driven approaches:
- same approach
- different dataset: we extracted eye regions from a subset of the FERET dataset
A few results (faces and eyes)
Online examples
video
A few words on the choice of the classifier
SVMs are very popular for their effectiveness and their generalization ability, but other algorithms can perform comparably and offer other attractive properties.

Filter methods are very simple to implement and achieve very interesting performance. In particular, iterative methods are very useful when parameter tuning is needed.
Joint work with L. Lo Gerfo, L. Rosasco, E. De Vito, A. Verri
Experiments on face detection
Experiments carried out on a portion of the previously mentioned faces dataset, for different sizes of the training set:

| Method | 800 | 700 | 600 |
|---|---|---|---|
| ν-method | 1.48 ± 0.34 (σ=300, t=59) | 1.53 ± 0.33 (σ=341, t=89) | 1.63 ± 0.32 (σ=341, t=95) |
| RBF-SVM | 1.60 ± 0.71 (σ=1000, C=0.9) | 1.99 ± 0.82 (σ=1000, C=1) | 2.41 ± 1.39 (σ=800, C=1) |
Plan of the second part
- Brief intro through a set of applications
- One problem in detail (face detection): choosing the representation, feature selection, classification
- On the choice of the classifier (filter methods)
- Spotlights on other interesting issues: image annotation, kernel engineering, global vs local
On the classifier choice: filter methods
Starting from RLS, we have seen (PART 1) how a large class of methods known as spectral regularization gives rise to regularized learning algorithms. These methods were originally proposed to solve inverse problems. The crucial intuition is that the same principle that allows us to numerically stabilize a matrix inversion is crucial to avoid overfitting.
They are worth investigating for their simplicity and effectiveness
Filter methods
All these algorithms are consistent and can be easily implemented.
They have a common derivation (and similar implementations) but:
- different theoretical properties (PART 1)
- different computational burden
Filter methods: computational issues
Non iterative:
- Tikhonov (RLS)
- Truncated SVD

Iterative:
- Landweber
- ν-method
- Iterated Tikhonov
Filter methods: computational issues
RLS training (for a fixed lambda) and test, in MATLAB:

```matlab
% Training: solve (K + n*lambda*I) c = y
function [c] = rls(K, lambda, y)
  n = length(K);
  c = (K + n*lambda*eye(n)) \ y;

% Test: evaluate on new points
function [y_new] = rls_test(x, x_new, c)
  K_test = kernel(x, x_new);   % kernel matrix between training and new points
  y_new = K_test * c;
  y_new = sign(y_new);         % for classification
```

Be careful to choose an appropriate matrix inversion routine.
Filter methods: computational issues
RLS

The computational cost of RLS is the cost of inverting the matrix K: O(n³)

c(λ) = (K + nλI)^{-1} y

In case parameter tuning is needed, resorting to an eigendecomposition of the matrix K saves time:

K = QΛQ^T,   c(λ) = Q (Λ + nλI)^{-1} Q^T y
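The time saving can be sketched in Python: one `eigh` factorization is reused for every λ, and each extra λ then costs only O(n²) (illustrative code, hypothetical data):

```python
import numpy as np

def rls_path(K, y, lambdas):
    """Solve c(lambda) = (K + n*lambda*I)^{-1} y for many lambdas using a
    single eigendecomposition K = Q Lambda Q^T of the (symmetric) kernel
    matrix: c(lambda) = Q (Lambda + n*lambda*I)^{-1} Q^T y."""
    n = len(y)
    evals, Q = np.linalg.eigh(K)
    Qty = Q.T @ y
    return [Q @ (Qty / (evals + n * lam)) for lam in lambdas]

# Toy demo with a linear kernel matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
K = X @ X.T + 1e-6 * np.eye(8)
y = rng.standard_normal(8)
cs = rls_path(K, y, [0.1, 1.0])
direct = np.linalg.solve(K + 8 * 0.1 * np.eye(8), y)   # one direct solve, for checking
```

The eigendecomposition route agrees with the direct solve while amortizing the O(n³) factorization over the whole λ grid.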
Filter methods: computational issues
ν-method

The number of iterations t plays the role of the regularization parameter (roughly, λ ~ 1/t). Computational cost: O(tn²).

The iterative procedure allows us to compute all solutions from 0 to t (the regularization path). This is convenient if parameter tuning is needed: with an appropriate choice of the maximum number of iterations, the computational cost does not change.
Plan of the second part
- Brief intro through a set of applications
- One problem in detail (face detection): choosing the representation, feature selection, classification
- On the choice of the classifier (filter methods)
- Spotlights on other interesting issues: image annotation, kernel engineering, global vs local
How difficult is image understanding?
Problem setting (general): assign one or many labels (from a finite but possibly big set of known classes) to a digital image according to its content.

This general problem is very complex, so many better defined domains have been studied:
- image categorization
- object detection
- object recognition

Usually the trick is in defining the boundaries of the problem of interest.
Joint work(s) with A. Barla – E. Delponte – A. Verri
Object identification/recognition
Nevertheless, the problem is not that simple.
Image annotation
Problem setting: assign one or more labels (from a finite set of known classes) to a digital image according to its content.

Assumption: we look for global descriptions, e.g.
- indoor/outdoor
- drawing/picture
- day/night
- cityscape/not

It usually leads to supervised problems (binary classifiers); low level descriptions are often applied.
Image annotation from low-level global descriptions

The problem: capture a global description of the image using simple features.

The procedure:
1. Build a suitable training set of data
2. Find an appropriate representation
3. Choose a classification algorithm and a kernel
4. Tune the parameters
Computer vision ingredients
We represent whole images with low level descriptions of color, shape or texture:
- Color: color histograms
- Shape: orientation and strength edge histograms; histograms of the lengths of edge chains
- Texture: wavelets, co-occurrence matrices
A few comments
Histograms appear quite often, so they make a simple example for discussing kernel engineering: designing ad hoc kernels for the problem/data at hand, with the right properties:
- symmetry
- positive definiteness

=> Let us go through the histogram intersection example
Histogram Intersection (HI)
Since (Swain and Ballard, 1991) it is known that histogram intersection is a powerful similarity measure for color indexing.

Given two images, A and B, of N pixels, if we represent them as histograms with M bins A_i and B_i, histogram intersection is defined as

K(A, B) = Σ_{i=1}^{M} min{ A_i, B_i }
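The definition is one line of Python (toy histograms for illustration):

```python
import numpy as np

def histogram_intersection(A, B):
    """K(A, B) = sum_i min(A_i, B_i) for two M-bin histograms."""
    return float(np.minimum(A, B).sum())

# Toy 4-bin histograms: bin-wise minima are 1, 0, 2, 0.
A = np.array([3, 0, 2, 5])
B = np.array([1, 4, 2, 0])
k = histogram_intersection(A, B)
```

Identical histograms score the total mass N; disjoint ones score 0.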
[Figure: two 8-bin histograms and their bin-wise intersection; the HI score is the sum Σ_i Bin_i of the intersection bins]
HI is a Kernel
If we build the M·N-dimensional binary vector

Ā = ( 1, ..., 1, 0, ..., 0,  1, ..., 1, 0, ..., 0,  ...,  1, ..., 1, 0, ..., 0 )

with, for each bin i, A_i ones followed by N − A_i zeros, it can immediately be seen that

K(A, B) = ⟨ Ā, B̄ ⟩

i.e., a dot product (linear kernel).

NOTICE: the proof is based on finding an explicit mapping.
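The explicit mapping can be checked numerically: encoding each bin as a run of ones followed by zeros makes the per-bin dot product equal the bin-wise minimum (a Python sketch with toy histograms):

```python
import numpy as np

def unary_encode(A, N):
    """Map an integer histogram A (M bins, counts <= N) to the M*N binary
    vector of the explicit-mapping proof: per bin, A_i ones then N - A_i zeros."""
    v = np.zeros(len(A) * N)
    for i, a in enumerate(A):
        v[i * N : i * N + a] = 1.0
    return v

A = np.array([3, 0, 2])
B = np.array([1, 2, 2])
N = 5  # total mass bound per bin (N pixels in the slide's setting)
hi = float(np.minimum(A, B).sum())                      # histogram intersection
dot = float(unary_encode(A, N) @ unary_encode(B, N))    # linear kernel in feature space
```

Since the ones in each bin's segment are aligned at the start, their overlap is exactly min(A_i, B_i), so the two quantities coincide.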
Histogram intersection: applications
HI has been applied with success to a variety of classification problems, both global and local:
- indoor/outdoor, day/night, cityscape/landscape classification
- object detection from local features (SIFT)

In all these cases it outperformed RBF classifiers. Also, HI does not depend on any parameter.
Local approaches
Global approaches have limits:
- often the objects of interest occupy only a (small) portion of the image
- in a simplified setting, all the rest of the image can be defined as background (or context)
- depending on the application domain, context can help recognition or make it more difficult
Local approaches
We may represent the image content as a set of local features (f_1, ..., f_n) --- corners, DoG features, ... We immediately see that this is a variable length description.

How to deal with variable length:
- vocabulary approach
- local kernels (or kernels on sets)

[Figure: local features in scale-space]
Local approaches: features vocabulary
This is reminiscent of text categorization: we define a vocabulary of local features and represent our images based on how often a given feature appears in the image. One implementation of this paradigm is the bag of keypoints approach.
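The vocabulary idea can be sketched in Python: each local feature votes for its nearest vocabulary word, and the image is represented by the resulting word histogram (illustrative code with a hypothetical 2-word vocabulary; real systems build the vocabulary by clustering, e.g. k-means):

```python
import numpy as np

def bag_of_keypoints(features, vocabulary):
    """Represent an image's set of local feature vectors as a fixed-length
    histogram of nearest-vocabulary-word counts."""
    hist = np.zeros(len(vocabulary))
    for f in features:
        word = np.argmin([np.linalg.norm(f - v) for v in vocabulary])
        hist[word] += 1
    return hist

# Toy demo: two vocabulary "words" and three local features.
vocab = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
feats = [np.array([0.1, 0.0]), np.array([0.9, 1.1]), np.array([1.0, 0.9])]
h = bag_of_keypoints(feats, vocab)
```

Whatever the number of detected features, every image maps to a histogram of fixed length, so standard kernels apply.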
[Csurka et al., 2004]
Local approaches: kernels on sets
Image descriptions based on local features can be seen as sets

X = { x_1, ..., x_n },   Y = { y_1, ..., y_m }

with variable length and no internal ordering.

A common approach to define a global similarity between feature sets is to combine the local similarities between (possibly all) pairs of elements:

K(X, Y) = ℑ( K_L(x_i, y_j) ),   ∀ i = 1, ..., n,  j = 1, ..., m
Summation kernel [Haussler,1999]
The simplest kernel for sets is the summation kernel

K_S(X, Y) = Σ_{i=1}^{n} Σ_{j=1}^{m} K_L(x_i, y_j)

K_S is a kernel if K_L is a kernel, but it is not so useful in practice:
- computationally heavy
- it mixes good and bad correspondences
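The definition is a plain double sum, e.g. in Python (toy feature sets and a linear local kernel, for illustration):

```python
import numpy as np

def summation_kernel(X, Y, k_local):
    """K_S(X, Y) = sum_i sum_j k_local(x_i, y_j) over two feature sets."""
    return float(sum(k_local(x, y) for x in X for y in Y))

k_lin = lambda x, y: float(np.dot(x, y))
X = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Y = [np.array([1.0, 1.0])]
ks = summation_kernel(X, Y, k_lin)
```

The n·m local evaluations make the cost quadratic in the set sizes, and every pair contributes, good match or not; this is exactly the weakness listed above.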
Matching kernel [Wallraven et al, 2003]
Among the many kernels for sets that have been proposed, the matching kernel received a lot of attention for image data:

K_M(X, Y) = ½ ( K̂(X, Y) + K̂(Y, X) )

where

K̂(X, Y) = (1/n) Σ_{i=1}^{n} max_{j=1,...,m} K_L(x_i, y_j)
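A Python sketch of the two formulas above (toy data and a linear local kernel; the normalization by the set size follows the mean form of K̂ given here):

```python
import numpy as np

def matching_kernel(X, Y, k_local):
    """Symmetrised matching kernel: K_M = (K_hat(X,Y) + K_hat(Y,X)) / 2,
    with K_hat(X,Y) = mean over x_i of max_j k_local(x_i, y_j).
    Note: the max makes this NOT a Mercer kernel."""
    def k_hat(U, V):
        return np.mean([max(k_local(u, v) for v in V) for u in U])
    return float((k_hat(X, Y) + k_hat(Y, X)) / 2)

k_lin = lambda x, y: float(np.dot(x, y))
X = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Y = [np.array([1.0, 0.0])]
km = matching_kernel(X, Y, k_lin)
```

Unlike the summation kernel, each feature contributes only its single best correspondence, which is what made this similarity attractive for object recognition.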
Matching kernel [Wallraven et al, 2003]
The matching kernel led to promising results on object recognition problems. Nevertheless, it has been shown not to be a Mercer kernel (because of the max operation).
Intermediate matching kernel [Boughorbel et al., 2004]
Let us consider two feature sets

X = { x_1, ..., x_n },   Y = { y_1, ..., y_m }

compared through an auxiliary set of virtual features

V = { v_1, ..., v_p }

The intermediate matching kernel is defined as

K(X, Y) = Σ_{v_i ∈ V} K_{v_i}(X, Y),   where   K_{v_i}(X, Y) = K_L(x*, y*)

and x* and y* are the elements of X and Y closest to v_i.
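The definition translates directly to Python (illustrative code; Euclidean distance for "closest" and the toy data are assumptions):

```python
import numpy as np

def intermediate_matching_kernel(X, Y, V, k_local):
    """K(X, Y) = sum over virtual features v of k_local(x*, y*), where
    x* and y* are the elements of X and Y closest to v (Euclidean here)."""
    total = 0.0
    for v in V:
        x_star = min(X, key=lambda x: np.linalg.norm(x - v))
        y_star = min(Y, key=lambda y: np.linalg.norm(y - v))
        total += k_local(x_star, y_star)
    return total

k_lin = lambda x, y: float(np.dot(x, y))
V = [np.array([1.0, 0.0])]                       # one virtual feature
X = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
Y = [np.array([0.9, 0.1])]
kv = intermediate_matching_kernel(X, Y, V, k_lin)
```

Because the argmax over correspondences is replaced by a fixed comparison through V, the max operation that broke the Mercer property disappears.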
Intermediate matching kernel:how to choose the virtual features
The intuition behind the virtual features is to find representatives of the feature points extracted in the training setSimply the training set features are grouped in N clusters
The authors show that the choice of N is not crucial (the bigger the better, but careful to computational complexity)It is better to cluster features within each class
Conclusions
Understanding the image content is difficult
Statistical learning can help a lot
Don’t forget computer vision! Appropriate descriptions and similarity measures allow us to achieve good results and to obtain effective solutions.
That’s all!
How to contact us:
- Ernesto: [email protected]
- Francesca: [email protected]

http://slipguru.disi.unige.it — where you will find updated versions of the slides
Selected (and very incomplete) biblio

- A. Destrero, C. De Mol, F. Odone, A. Verri. A regularized approach to feature selection for face detection. DISI-TR-2007-01.
- A. Mohan, C. Papageorgiou, T. Poggio. Example-based object detection in images by components. PAMI, 23(4), 2001.
- F. Odone, A. Barla, A. Verri. Building kernels from binary strings for image matching. IEEE Transactions on Image Processing, 14(2):169-180, 2005.
- P. Viola, M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2), 2004.
- C. Wallraven, B. Caputo, A. Graf. Recognition with local features: the kernel recipe. ICCV 2003.