Lec12 review-part-i
Image Analysis & Retrieval
CS/EE 5590 Special Topics (Class Ids: 44873, 44874)
Fall 2016, M/W 4-5:15pm@Bloch 0012
Lec 12
Review of Part I:
Features and Classifiers in Image Classification
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346.
http://l.web.umkc.edu/lizhu
Z. Li, Image Analysis & Retrv. 2016 p.1
Outline
HW-2
Quiz-1
Review part I: Image Features & Classifiers
Image Formation: Homography
Color Space & Color Features
Filtering and Edge Features
Harris Detector
SIFT
Aggregation: BoW, VLAD, Fisher, and Supervector
Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP
Classification: kNN, GMM Bayesian, SVM
Homework-2: Aggregation
Data Set:
6 classes, 6 x 20 = 120 images
Global Codebook for VLAD, FisherVector
Compute SIFT and DenseSIFT from each image, and store them in two arrays: hw2_sift and hw2_dsift
Randomly sample a subset of them: use randperm()
Compute PCA:
o indx = randperm(n_sift); indx = indx(1:20*2^10); % sample 20k
o [A0, s, eigv]=princomp(hw2_sift(indx,: ));
Project SIFT
o Dimension reduction: x = sift*A0(:,1:kd)
Compute Kmeans and GMM models for VLAD and FV aggregation:
o vl_feat: vl_kmeans(), vl_gmm();
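The sampling, PCA, and projection steps above can be sketched in base MATLAB as follows. This is a minimal sketch, not the course solution: the random matrix stands in for hw2_sift, the sample size is capped at the number of rows, and the principal directions come from an SVD of the centered sample because princomp() is no longer available in recent MATLAB releases. Centering before projection is an assumption added here.

```matlab
% Synthetic stand-in for hw2_sift (assumption: one 128-d descriptor per row)
rng(0);
hw2_sift = rand(5000, 128);
n_sift = size(hw2_sift, 1);

% Random subset, as in the slide (capped at the number of descriptors)
n_sample = min(n_sift, 20*2^10);
indx = randperm(n_sift);
indx = indx(1:n_sample);

% PCA via SVD of the centered sample; columns of A0 are principal directions
X  = hw2_sift(indx, :);
mu = mean(X, 1);
[~, ~, A0] = svd(bsxfun(@minus, X, mu), 'econ');

% Dimension reduction to kd dimensions
kd = 24;
x = bsxfun(@minus, hw2_sift, mu) * A0(:, 1:kd);
```

The reduced descriptors x would then be the input to vl_kmeans() and vl_gmm().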
VLAD - Vector of Locally Aggregated Descriptors
Aggregate feature difference from the codebook
Hard assignment by finding the NN of each feature x_j in the codebook {μ_k}
Compute aggregated differences:
$$v_k = \sum_{j:\ NN(x_j) = \mu_k} (x_j - \mu_k)$$
L2 normalize:
$$v_k = v_k / \|v_k\|_2$$
Final feature: k x d
[Figure: ① assign descriptors to cells; ② compute x − μ_i; ③ v_i = sum of x − μ_i over cell i]
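A minimal sketch of the three VLAD steps in base MATLAB, assuming random descriptors x and a random codebook mu (both hypothetical stand-ins for PCA-reduced SIFT and k-means centroids). Each v_k is L2-normalized separately, as written on the slide.

```matlab
rng(1);
d = 4; k = 3; n = 50;
x  = randn(n, d);     % hypothetical local descriptors, one per row
mu = randn(k, d);     % hypothetical codebook, one centroid per row

% Step 1: hard assignment to the nearest centroid (squared Euclidean distance)
d2 = bsxfun(@plus, sum(x.^2, 2), sum(mu.^2, 2)') - 2 * x * mu';
[~, nn] = min(d2, [], 2);

% Steps 2-3: sum the residuals x_j - mu_k per cell, then L2-normalize each v_k
v = zeros(k, d);
for c = 1:k
    members = x(nn == c, :);
    if ~isempty(members)
        v(c, :) = sum(bsxfun(@minus, members, mu(c, :)), 1);
        nrm = norm(v(c, :));
        if nrm > 0
            v(c, :) = v(c, :) / nrm;
        end
    end
end

vlad = reshape(v', 1, []);   % final feature with k*d entries
```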
Fisher Vector Aggregation
FV encoding
Richer characterization of the codebook with a GMM: {μ_k, Σ_k, π_k}
Gradient on the mean, for GMM component k, j=1..D
In the end, we have a 2K x D aggregation of the derivatives w.r.t. the means and variances:
$$FV = [u_1, u_2, \ldots, u_K, v_1, v_2, \ldots, v_K]^T$$
Normalize by the Fisher kernel
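For reference, one common form of the two gradient blocks is given here, following the formulation documented for VLFeat's vl_fisher(). It assumes diagonal covariances with standard deviations σ_jk and soft assignments q_ik (the posterior of component k given descriptor x_i); N, q_ik and σ_jk are symbols introduced here, not taken from the slide.

```latex
u_{jk} = \frac{1}{N\sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik}\,\frac{x_{ji}-\mu_{jk}}{\sigma_{jk}},
\qquad
v_{jk} = \frac{1}{N\sqrt{2\pi_k}} \sum_{i=1}^{N} q_{ik}\left[\left(\frac{x_{ji}-\mu_{jk}}{\sigma_{jk}}\right)^{2} - 1\right]
```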
HW-2 Deliverables
Functions:
[gmm]=getGMM(sift, A, kd, n); %gmm.m, gmm.v, gmm.p
[km]=getKMeans(sift, A, kd, n); % km: n x kd
[vlad]=getVladSift(im, km, opt); % opt = ‘sift’, or ‘dsft’
[fv]=getFVSift(im, gmm, opt); % can use vl_fisher()
Plots:
For each image: {sift, dsift} x {vlad, fv} x {kd=[24,32]} x{n=[64, 128]}, total 16 features per image, plot the 120x120 distances for each feature. [hint: use subplot(2,2,k), 4 plots per fig]
TPR-FPR: for each class, plot one figure with 16 plots,
mAP: one plot/table
Quiz-1
Covers what we reviewed today
Format:
True or False
Multiple Choices
Problem Solving
Extra Credit (25%)
Closed book; a 1-page A4 cheat sheet is allowed.
The cheat sheet needs to be turned in with name and ID number
The cheat sheet will also be graded, for up to 10 pts
Relax
Quiz is actually on me.
Classification in Image Retrieval
A typical image retrieval pipeline:
Image Formation (Homography, Color space)
-> Feature Computing (Color histogram, Filtering, Edge Detection, HoG, Harris Detector, SIFT)
-> Feature Aggregation (BoW, VLAD, Fisher Vector, Supervector)
-> Classification
Knowledge/Data Base
Image Formation - Geometry
$$x = K\,[I \;\; 0]\,X$$
$$w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
Intrinsic Assumptions
• Unit aspect ratio
• Optical center at (0,0)
• No skew of pixels
Extrinsic Assumptions
• No rotation
• Camera at (0,0,0)
Slide Credit: GaTech Computer Vision class, 2015.
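A small numeric sketch of this projection under the stated assumptions (unit aspect ratio, optical center at the origin, no skew, no rotation, camera at the origin). The focal length and the 3D point are made-up values.

```matlab
f = 2;                               % hypothetical focal length
K = [f 0 0; 0 f 0; 0 0 1];           % intrinsics under the slide's assumptions
P = K * [eye(3) zeros(3, 1)];        % 3x4 projection matrix K [I 0]

X  = [1; 2; 4; 1];                   % hypothetical 3D point, homogeneous
xh = P * X;                          % homogeneous image point w*[u; v; 1]
uv = xh(1:2) / xh(3);                % divide by w to get pixel coordinates
```

For this point, u = f*x/z = 0.5 and v = f*y/z = 1.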
Image Formation: Homography
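In general, a homography H maps 2D points in homogeneous coordinates according to x' = Hx:
$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$$
H is defined only up to scale, since [x, y, w] and [sx, sy, sw] are the same point, so H has 8 DoF
Affine Transform, 6 DoF: contains a translation [t1, t2] and an invertible affine matrix A[2x2]:
$$H = \begin{bmatrix} a_1 & a_2 & t_1 \\ a_3 & a_4 & t_2 \\ 0 & 0 & 1 \end{bmatrix}$$
Similarity Transform, 4 DoF: a rigid transform that preserves distance if s=1:
$$H = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_1 \\ s\sin\theta & s\cos\theta & t_2 \\ 0 & 0 & 1 \end{bmatrix}$$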
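A minimal sketch of applying a similarity transform to 2D points and checking that distances are preserved when s = 1. The angle, translation, and points are arbitrary illustration values.

```matlab
theta = pi/6; s = 1; t = [3; -2];            % hypothetical parameters
H = [s*cos(theta) -s*sin(theta) t(1);
     s*sin(theta)  s*cos(theta) t(2);
     0             0            1];

pts = [0 1 1; 0 0 2];                        % 2 x n points
ph  = H * [pts; ones(1, size(pts, 2))];      % lift to homogeneous coordinates and map
mapped = bsxfun(@rdivide, ph(1:2, :), ph(3, :));   % back to inhomogeneous coordinates

d_before = norm(pts(:, 1) - pts(:, 2));
d_after  = norm(mapped(:, 1) - mapped(:, 2));      % equal to d_before up to rounding
```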
VL_FEAT Implementation
VL_FEAT has an implementation:
Image Formation – Color Space
RGB Color Space:
8 bit Red, Green, Blue Channels
Mixed together
YUV/YCbCr Color Space
HSV Color Space
Hue: color in polar coord
Saturation
Value
Color Histogram – Fixed Codebook
Color Histogram:
Fixed code book to generate histogram
Example: MPEG 7 Scalable Color Descriptor
HSV Color Space, 16x4x4 partition for bins
Scalability thru quantization and Haar Transform
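A fixed-codebook histogram with the 16x4x4 HSV partition can be sketched as below. This covers only the binning step; the quantization and Haar transform stages of the MPEG-7 descriptor are not included, and the random image is a placeholder.

```matlab
rng(2);
img = rand(32, 32, 3);                 % hypothetical RGB image with values in [0,1]
hsv = rgb2hsv(img);

% Fixed codebook: 16 hue x 4 saturation x 4 value bins = 256 bins
h = min(floor(hsv(:, :, 1) * 16), 15);
s = min(floor(hsv(:, :, 2) * 4), 3);
v = min(floor(hsv(:, :, 3) * 4), 3);
bin = h * 16 + s * 4 + v + 1;          % bin index in 1..256

hist256 = accumarray(bin(:), 1, [256 1]);
hist256 = hist256 / sum(hist256);      % normalize to a distribution
```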
Color Histogram – Adaptive Codebook
Adaptive Color Histogram
No fixed code book
Example: MPEG 7 Dominant Color Descriptor
Distance Metric
Not index-able
Image Filters
Basic Filter Operations
Region of Support
Example: 3x3 box (averaging) filter, h = (1/9) * [1 1 1; 1 1 1; 1 1 1]
Filter Properties
Linear Filters are:
Shift-Invariant:
f(m-k, n-j)*h = g(m-k, n-j), if f*h=g
Associative:
f*h1*h2 = f*(h1*h2)
this can save a lot of complexity
Distributive:
f*h1 + f*h2 = f*(h1+h2)
useful in SIFT’s DoG filtering.
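The associative and distributive properties can be checked numerically with conv2(). The image and kernels below are arbitrary examples; both error values should be at rounding level.

```matlab
f  = magic(6);                        % arbitrary test image
h1 = ones(3) / 9;                     % 3x3 box filter
h2 = [1 0 -1];                        % horizontal difference filter
h3 = [0 1 0; 1 -4 1; 0 1 0];          % discrete Laplacian, same size as h1

% Associative: (f*h1)*h2 equals f*(h1*h2), so the two kernels can be merged once
lhs = conv2(conv2(f, h1), h2);
rhs = conv2(f, conv2(h1, h2));
assoc_err = max(abs(lhs(:) - rhs(:)));

% Distributive: f*h1 + f*h3 equals f*(h1+h3)
diff_map = conv2(f, h1) + conv2(f, h3) - conv2(f, h1 + h3);
dist_err = max(abs(diff_map(:)));
```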
Edge Detection
The need of smoothing
Gaussian smoothing
Combine differentiation and Gaussian: d/dx (f * g) = f * (d/dx g), so f is filtered once with the derivative of Gaussian (x)
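A 1D sketch of edge detection with a derivative-of-Gaussian filter. The signal is an ideal step, and sigma and the kernel support are illustrative choices.

```matlab
sigma = 2;
t  = -3*sigma:3*sigma;
g  = exp(-t.^2 / (2*sigma^2));
g  = g / sum(g);                      % normalized Gaussian
dg = -t / sigma^2 .* g;               % derivative of Gaussian

f = [zeros(1, 30) ones(1, 30)];       % step edge between samples 30 and 31
r = conv(f, dg, 'same');              % smoothing and differentiation in one pass
[~, loc] = max(abs(r));               % strongest response sits at the edge
```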
Image Gradient from DoG Filtering
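Image Gradient (at a scale): $\nabla f = [\partial f/\partial x,\ \partial f/\partial y]$
The gradient points in the direction of most rapid increase in intensity
The edge strength is given by the gradient magnitude: $\|\nabla f\| = \sqrt{(\partial f/\partial x)^2 + (\partial f/\partial y)^2}$
The gradient direction is given by: $\theta = \tan^{-1}\left(\frac{\partial f/\partial y}{\partial f/\partial x}\right)$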
HOG – Histogram of Oriented Gradients
Generate local histogram per block
Size of HoG: n*m*4*9=36nm, for n x m cells.
[Figure: HOG cell example with per-pixel angle and magnitude tables, and the resulting 9-bin orientation histograms (0-19, 20-39, ..., 160-179 degrees) under binary voting and under magnitude voting; HOG cell size]
Block feature from four cell histograms:
$$f = [h_1^1, \ldots, h_9^1,\ h_1^2, \ldots, h_9^2,\ h_1^3, \ldots, h_9^3,\ h_1^4, \ldots, h_9^4]$$
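The two voting schemes for one cell can be sketched as follows. The angle and magnitude tables are made-up values, not the ones from the slide figure.

```matlab
% Hypothetical per-pixel gradient angles (degrees in [0,180)) and magnitudes for one cell
ang = [ 10 25 170;  95 15 40; 100 30 5];
mag = [  2  1   4;   3  5  1;   2  2 1];

bin = floor(ang / 20) + 1;                          % 9 bins of 20 degrees each

h_binary = accumarray(bin(:), 1,      [9 1])';      % each pixel casts one vote
h_mag    = accumarray(bin(:), mag(:), [9 1])';      % each pixel votes with its magnitude
```

With these values, h_binary is [3 2 1 0 1 1 0 0 1] and h_mag is [8 3 1 0 3 2 0 0 4].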
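Harris Detector
Smoothed 2nd moment:
$$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$$
1. Image derivatives (optionally blur first): Ix, Iy
2. Square of derivatives: Ix2, Iy2, IxIy
3. Gaussian filter g(σ_I): g(Ix2), g(Iy2), g(IxIy)
4. Harris Function: detect corners, no need to compute the eigenvalues
$$har = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,[\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2 = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$$
$$\det M = \lambda_1 \lambda_2, \qquad \mathrm{trace}\, M = \lambda_1 + \lambda_2$$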
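The four steps can be sketched on a synthetic image as below. The image (a bright square), the integration scale sigma_i, and alpha = 0.05 are illustrative assumptions; central differences from gradient() stand in for the derivative filter.

```matlab
I = zeros(40);
I(11:30, 11:30) = 1;                          % bright square: corners at rows/cols 11 and 30

% 1. Image derivatives
[Ix, Iy] = gradient(I);

% 2-3. Products of derivatives, smoothed by a separable Gaussian
t = -4:4; sigma_i = 1.5;
g1 = exp(-t.^2 / (2*sigma_i^2));
g1 = g1 / sum(g1);
gI = @(A) conv2(g1, g1, A, 'same');
Sxx = gI(Ix.^2); Syy = gI(Iy.^2); Sxy = gI(Ix .* Iy);

% 4. Harris response from det and trace, no eigen decomposition
alpha = 0.05;
R = Sxx .* Syy - Sxy.^2 - alpha * (Sxx + Syy).^2;

[~, idx] = max(R(:));
[r, c] = ind2sub(size(R), idx);               % strongest response lies near a corner
```

R is positive near the four corners and negative along the straight edges, where only one of the two smoothed squared derivatives is large.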
Harris Detector Steps
Response function R(x,y) -> Threshold -> Local Maxima
Harris Detector is NOT scale invariant
Scale Space Theory - Lindeberg
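Scale Space Response via Laplacian of Gaussian:
$$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}, \qquad g = e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
The scale is controlled by σ
Characteristic Scale: the σ at which the LoG response to the image structure peaks
[Figure: LoG responses to a blob of radius r at σ = 0.8r, σ = 1.2r, σ = 2r, …; the peak marks the characteristic scale]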
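A sketch of finding the characteristic scale of a disk-shaped blob with a scale-normalized LoG (the response is multiplied by σ², an assumption added here so responses are comparable across scales). For an ideal disk of radius r the peak is expected close to σ = r/√2.

```matlab
r = 8;
[X, Y] = meshgrid(-40:40);
blob = double(X.^2 + Y.^2 <= r^2);            % binary disk of radius r

sigmas = 2:0.5:12;
resp = zeros(size(sigmas));
for i = 1:numel(sigmas)
    s = sigmas(i);
    G   = exp(-(X.^2 + Y.^2) / (2*s^2)) / (2*pi*s^2);
    LoG = ((X.^2 + Y.^2 - 2*s^2) / s^4) .* G;           % Laplacian of Gaussian
    resp(i) = abs(s^2 * sum(sum(LoG .* blob)));         % normalized response at the blob center
end

[~, i_max] = max(resp);
sigma_star = sigmas(i_max);                   % characteristic scale on this sigma grid
```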
SIFT Detector via Difference of Gaussian
However, computing LoG is expensive: it is a non-separable filter
SIFT: uses Difference of Gaussian filters to approximate LoG
Separable, much faster
Box Filter Approximation: you will see later.
[Figure: LoG; scale space construction by Gaussian filtering and image difference]
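The approximation can be checked numerically: a difference of two Gaussians at scales σ and kσ is close in shape to (k−1)σ² times the LoG at σ. The values of σ and k below are illustrative.

```matlab
sigma = 2; k = 1.2;
[X, Y] = meshgrid(-12:12);
G = @(s) exp(-(X.^2 + Y.^2) / (2*s^2)) / (2*pi*s^2);

DoG = G(k*sigma) - G(sigma);                                 % difference of Gaussians
LoG = ((X.^2 + Y.^2 - 2*sigma^2) / sigma^4) .* G(sigma);     % Laplacian of Gaussian
approx = (k - 1) * sigma^2 * LoG;

c = corrcoef(DoG(:), approx(:));
similarity = c(1, 2);                                        % close to 1 when the shapes agree
```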
Peak Strength & Edge Removal
Peak Strength:
Interpolate true DoG response and pixel location by Taylor expansion
Edge Removal:
Re-do Harris type detection to remove edge on much reduced pixel set
Rotation Invariance thru Dominant Orientation Coding
Voting for the dominant orientation
Weighted by a Gaussian window to give more emphasis to the gradients closer to the center
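A sketch of the orientation vote: each pixel in the patch votes for its gradient angle, weighted by magnitude and by a Gaussian window centered on the keypoint. The 36-bin histogram, the window width, and the synthetic angles clustered near 50 degrees are assumptions for illustration.

```matlab
rng(3);
n = 16;
[X, Y] = meshgrid((1:n) - (n + 1)/2);         % pixel offsets from the patch center
ang = mod(50 + 10 * randn(n), 360);           % hypothetical gradient angles in degrees
mag = ones(n);                                % hypothetical gradient magnitudes

w   = exp(-(X.^2 + Y.^2) / (2 * (n/2)^2));    % Gaussian window: center pixels count more
bin = floor(ang / 10) + 1;                    % 36 bins of 10 degrees
h   = accumarray(bin(:), mag(:) .* w(:), [36 1]);

[~, b] = max(h);
dominant = (b - 0.5) * 10;                    % center of the winning bin, in degrees
```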
SIFT Description
Rotate to the dominant direction, divide the image patch (16x16 pixels) around the SIFT point into 4x4 cells
For each cell, compute an 8-bin edge histogram
Overall, we have a 4x4x8=128 dimensional descriptor
[Figure: angle histogram over 0 to 2π; spatial-scale space extrema]
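A simplified sketch of the 4x4x8 layout on a random 16x16 patch. It omits the rotation to the dominant direction, the Gaussian weighting, and the interpolation between bins that a full SIFT implementation uses; it only shows how the 128 dimensions arise.

```matlab
rng(4);
patch = rand(16);                                   % hypothetical 16x16 patch
[Ix, Iy] = gradient(patch);
mag = sqrt(Ix.^2 + Iy.^2);
ang = mod(atan2(Iy, Ix), 2*pi);

obin = min(floor(ang / (2*pi/8)), 7) + 1;           % 8 orientation bins, 1..8
[C, R] = meshgrid(1:16);
cell_id = floor((R - 1)/4) * 4 + floor((C - 1)/4) + 1;   % 4x4 spatial cells, 1..16

idx  = (cell_id - 1) * 8 + obin;                    % joint (cell, orientation) index
desc = accumarray(idx(:), mag(:), [128 1])';
desc = desc / norm(desc);                           % unit-length 128-d descriptor
```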
SIFT Matching and Repeatability Prediction
SIFT Distance: True Love Test
$$\frac{d(s_1^1, s_{k^*}^2)}{d(s_1^1, s_k^2)} \le \theta$$
(s_{k*}^2 is the nearest neighbor of s_1^1 in image 2, and s_k^2 the next nearest)
Not all SIFT are created equal…
Peak strength (DoG response at interpolated position)
[Figure: combined scale/peak strength pmf]
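The ratio test can be sketched with toy 2D descriptors. The descriptors and the threshold theta = 0.8 are illustrative assumptions.

```matlab
s1 = [0 0];                          % hypothetical descriptor from image 1
S2 = [0.1 0; 3 4; 5 5];              % hypothetical descriptors from image 2, one per row

d  = sqrt(sum(bsxfun(@minus, S2, s1).^2, 2));   % distances to all candidates
ds = sort(d);
ratio = ds(1) / ds(2);               % nearest over second-nearest distance

theta = 0.8;
is_match = ratio <= theta;           % accept only clearly unambiguous matches
```

Here the nearest candidate is at distance 0.1 and the next at 5, so the ratio is 0.02 and the match is accepted.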
Why Aggregation ?
Curse of Dimensionality
Decision Boundary / Indexing
Histogram Aggregation: Bag-of-Words
Codebook:
Feature space: R^d, k-means to get k centroids, {μ_1, μ_2, …, μ_k}
BoW Hard Encoding:
For n feature points {x_1, x_2, …, x_n}, the assignment matrix is k x n, with exactly one nonzero entry per column
Aggregated dimension: k
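A sketch of hard BoW encoding with a toy codebook. The centroids and feature points are made-up 2D values, and the final histogram is normalized by n.

```matlab
mu = [0 0; 10 10; 0 10];                        % hypothetical codebook, k = 3
x  = [1 1; 9 9; 0 9; -1 0; 10 11];              % hypothetical features, n = 5
k = size(mu, 1); n = size(x, 1);

% Nearest centroid for each feature
d2 = bsxfun(@plus, sum(x.^2, 2), sum(mu.^2, 2)') - 2 * x * mu';
[~, nn] = min(d2, [], 2);

% k x n assignment matrix with a single 1 per column
A = full(sparse(nn', 1:n, 1, k, n));

bow = sum(A, 2)' / n;                           % k-dimensional normalized histogram
```

For these points the histogram is [0.4 0.4 0.2].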
Histogram Aggregation: Kernel Code Book Soft Encoding
Kernel Code Book Soft Encoding
Kernel Affinity, with kernel scale γ: $K(x_j, \mu_k) = e^{-\gamma \|x_j - \mu_k\|^2}$
Assignment Matrix: $A_{j,k} = K(x_j, \mu_k) / \sum_{k'} K(x_j, \mu_{k'})$
Encoding, k-dimensional: $X(k) = \frac{1}{n} \sum_j A_{j,k}$
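A sketch of the soft encoding with a two-word codebook. The centroids, points, and gamma are illustrative values; the point halfway between the two centroids splits its vote evenly.

```matlab
mu = [0 0; 4 0];                                % hypothetical codebook
x  = [0 0; 2 0; 4 0];                           % hypothetical feature points
gamma = 0.5;                                    % hypothetical kernel scale

d2 = bsxfun(@plus, sum(x.^2, 2), sum(mu.^2, 2)') - 2 * x * mu';
K  = exp(-gamma * d2);                          % kernel affinity, n x k
A  = bsxfun(@rdivide, K, sum(K, 2));            % each row sums to 1
enc = mean(A, 1);                               % k-dimensional soft histogram
```

By symmetry the encoding here is [0.5 0.5].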
Supervector Aggregation
Assuming we have a UBM GMM model
$\lambda_{UBM} = \{P_k, \mu_k, \Sigma_k\}$,
with identical prior and covariance
Then for two utterance samples a and b, with GMM models $\lambda_a = \{P_k, \mu_k^a, \Sigma_k\}$ and $\lambda_b = \{P_k, \mu_k^b, \Sigma_k\}$,
the SV kernel is
$$K(\lambda_a, \lambda_b) = \sum_k \left(\sqrt{P_k}\, \Sigma_k^{-1/2} \mu_k^a\right)^T \left(\sqrt{P_k}\, \Sigma_k^{-1/2} \mu_k^b\right)$$
It means the means of the two models need to be normalized by the Mahalanobis distance metric induced by the UBM covariance
This is also a linear kernel function scaled by the UBM covariances
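A sketch of the supervector kernel for diagonal covariances (an assumption made here to keep the example short). The priors, variances, and adapted means are made-up values for K = 2 components in D = 2 dimensions.

```matlab
P    = [0.5 0.5];                    % shared priors P_k
S    = [1 4; 4 1];                   % shared diagonal variances, one row per component
mu_a = [1 2; 0 1];                   % adapted means of model a
mu_b = [2 2; 1 -1];                  % adapted means of model b

% Scale each mean by sqrt(P_k) * Sigma_k^(-1/2), then take a plain inner product
wa = bsxfun(@times, sqrt(P)', mu_a ./ sqrt(S));
wb = bsxfun(@times, sqrt(P)', mu_b ./ sqrt(S));
Kab = sum(sum(wa .* wb));
```

For these values the kernel evaluates to 1 (component 1 contributes 1.5, component 2 contributes -0.5).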
AKULA – Adaptive KLUster Aggregation
AKULA Descriptor:
No fixed codebook !
Outer loop: subspace optimization
Inner loop: cluster centroids + SIFT count
$A_1 = \{yc_1^1, yc_1^2, \ldots, yc_1^k;\ pc_1^1, pc_1^2, \ldots, pc_1^k\}$
$A_2 = \{yc_2^1, yc_2^2, \ldots, yc_2^k;\ pc_2^1, pc_2^2, \ldots, pc_2^k\}$
Distance metric:
Min centroids distance, weighted by SIFT count (kind of manifolds tangent distance)
$$d(A_1, A_2) = \frac{1}{k} \sum_{j=1}^{k} d_{min}^1(j)\, w_{min}^1(j) + \frac{1}{k} \sum_{i=1}^{k} d_{min}^2(i)\, w_{min}^2(i)$$
$$d_{min}^1(j) = \min_i d_{j,i}, \qquad d_{min}^2(i) = \min_j d_{j,i}$$
$$w_{min}^1(j) = w_{j,i^*},\ i^* = \arg\min_i d_{j,i}, \qquad w_{min}^2(i) = w_{j^*,i},\ j^* = \arg\min_j d_{j,i}$$
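The distance formula can be sketched as below, given a k x k matrix d of centroid-to-centroid distances (rows index clusters of A1, columns clusters of A2) and a matching weight matrix w. Both matrices are hypothetical inputs; how w is derived from the SIFT counts is not specified on the slide.

```matlab
d = [1 4; 3 2];                      % hypothetical centroid distances d(j,i)
w = [0.5 0.2; 0.1 0.9];              % hypothetical weights w(j,i)
k = size(d, 1);

[d1, i_star] = min(d, [], 2);        % for each cluster j of A1: nearest cluster of A2
[d2, j_star] = min(d, [], 1);        % for each cluster i of A2: nearest cluster of A1

w1 = w(sub2ind(size(w), (1:k)', i_star));    % w(j, i*)
w2 = w(sub2ind(size(w), j_star, 1:k));       % w(j*, i)

dist = sum(d1 .* w1) / k + sum(d2 .* w2) / k;
```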
Image Retrieval Performance Metrics
True Positive Rate: TPR = TP/(TP+FN)
False Positive Rate: FPR = FP/(FP+TN)
Precision: Precision = TP/(TP+FP)
Recall: Recall = TP/(TP+FN)
F-Measure: F-measure = 2*(precision*recall)/(precision + recall)
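A worked example of these metrics from a made-up confusion count (100 retrieved/non-retrieved items in total).

```matlab
TP = 8; FP = 2; FN = 4; TN = 86;     % hypothetical counts

tpr       = TP / (TP + FN);          % 8/12
fpr       = FP / (FP + TN);          % 2/88
precision = TP / (TP + FP);          % 8/10
recall    = TP / (TP + FN);          % identical to TPR
f_measure = 2 * precision * recall / (precision + recall);   % 16/22
```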
Retrieval Performance: Mean Average Precision
mAP measures the retrieval performance across all queries
mAP example
mAP is computed across all query results as the mean, over queries, of the average precision (precision averaged over the recall levels)
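A sketch of the mAP computation for two queries. Each vector marks which ranked results are relevant (1) or not (0); the rankings are made up, and the sketch assumes every relevant item appears somewhere in the ranked list.

```matlab
rel = {[1 0 1 0 0], [0 1 1 0 1]};    % hypothetical relevance of ranked results per query

ap = zeros(1, numel(rel));
for q = 1:numel(rel)
    r = rel{q};
    prec = cumsum(r) ./ (1:numel(r));        % precision at each rank
    ap(q) = sum(prec .* r) / sum(r);         % average over the ranks of relevant items
end
mAP = mean(ap);
```

Here the two average precisions are 5/6 and 53/90, giving mAP = 64/90, about 0.711.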
Compute TPR-FPR
ROC Curve (Receiver Operating Characteristics)
Confusion/distance matrix: d(j,k)
Ground Truth: m(j,k)
$$m(j,k) = \begin{cases} 1, & \text{doc } j \text{ and } k \text{ are a match} \\ 0, & \text{not a match} \end{cases}$$
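A sketch of computing TPR-FPR points by sweeping a threshold over the distance matrix. The 4x4 distance matrix and ground truth are toy values (two classes of two images each), and self-comparisons on the diagonal are excluded.

```matlab
d = [0 1 5 6; 1 0 6 5; 5 6 0 2; 6 5 2 0];    % hypothetical pairwise distances d(j,k)
m = [1 1 0 0; 1 1 0 0; 0 0 1 1; 0 0 1 1];    % ground truth m(j,k)

off = ~eye(4);                               % ignore self-matches
dv = d(off);
mv = m(off) > 0;

th  = unique(dv)';                           % candidate thresholds
tpr = zeros(size(th));
fpr = zeros(size(th));
for i = 1:numel(th)
    pred = dv <= th(i);                      % declare a match when the distance is small
    tpr(i) = sum(pred & mv)  / sum(mv);
    fpr(i) = sum(pred & ~mv) / sum(~mv);
end
```

plot(fpr, tpr) then draws the ROC curve.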
kNN Classifier
kNN is a nonlinear classifier
Operations: majority vote
Performance bounds on k:
Asymptotically, the NN error rate is no worse than 2 times the Bayes error rate
As k increases, it improves, but not much
Accelerate NN retrieval operation
Kd-tree:
o a space partition scheme that limits the number of data point comparisons
Kd-tree supported kNN search:
o Approx kNN search
o Kd-Forest supported Approx Search
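A brute-force kNN majority vote can be sketched as follows; the training points, labels, and queries are toy values. A kd-tree would replace the exhaustive distance computation in the loop.

```matlab
x = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];          % hypothetical training features
y = [1; 1; 1; 2; 2; 2];                      % class labels
q = [0.5 0.5; 5.5 5.5];                      % query points
k = 3;

rec = zeros(size(q, 1), 1);
for i = 1:size(q, 1)
    d2 = sum(bsxfun(@minus, x, q(i, :)).^2, 2);   % squared distances to all training points
    [~, order] = sort(d2);
    rec(i) = mode(y(order(1:k)));                 % majority vote among the k nearest
end
```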
Gaussian Generative Model Classifier
The decision boundary is shaped by the relative shape of the class covariance matrices
Matlab:
[rec_label, err] = classify(q, x, y, method);
method = 'quadratic';
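What the 'quadratic' option computes can be sketched without the toolbox: fit one Gaussian per class and pick the class with the largest log posterior. The two synthetic classes and the query points are assumptions for illustration; this is not the internals of classify().

```matlab
rng(5);
x = [randn(100, 2);
     bsxfun(@plus, randn(100, 2) * [2 0; 0 0.5], [4 4])];   % two synthetic Gaussian classes
y = [ones(100, 1); 2 * ones(100, 1)];
q = [0 0; 4 4];                                             % query points

score = zeros(size(q, 1), 2);
for c = 1:2
    xc = x(y == c, :);
    m = mean(xc, 1);
    S = cov(xc);
    prior = size(xc, 1) / size(x, 1);
    dq = bsxfun(@minus, q, m);
    % Gaussian log likelihood (up to a constant) plus log prior
    score(:, c) = -0.5 * sum((dq / S) .* dq, 2) - 0.5 * log(det(S)) + log(prior);
end
[~, rec_label] = max(score, [], 2);
```

Because each class keeps its own covariance S, the boundary between the classes is quadratic.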
The Support Vector Solution
Why called SVM? The decision function f(x) = sign(Σ_i α_i y_i K(xi, x) + b) depends only on the training points with α_i > 0, the support vectors.
Support Vector Machine
Matlab:
% train svm
% (svmtrain/svmclassify are legacy functions; recent MATLAB releases use fitcsvm/predict)
svm = svmtrain(x, y, 'showplot', true);
% svm classification: click 5 query points and classify each one
for k=1:5
    figure(41); hold on; grid on; axis([0 1 0 1]);
    [u,v]=ginput(1);          % read one mouse click
    q=[u,v];
    plot(u, v, '*b');
    rec = svmclassify(svm, q);
    fprintf('\n recognized as %c', rec);
end
Summary
Image Analysis & Retrieval Part I
Image Formation
Color and Texture Features
Interest Points Detection – Harris Detector
Invariant Interest Point Detection – SIFT
Aggregation
Classification tools: knn, gmm, svm
Performance metrics:
o tp, fp, tn, fn, tpr/precision, fpr, recall, mAP
Part II: Holistic approach
Just throw the pixels at the algorithm and let it figure out which features are good.