Spring 2019: Venu: Haag 315, Time: M/W 4-5:15pm
ECE 5582 Computer VisionLec 12: Part I Review
Zhu LiDept of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346.http://l.web.umkc.edu/lizhu
Z. Li: ECE 5582 Computer Vision, 2019. p.1
slides created with WPS Office Linux and EqualX LaTex equation editor
Quiz-1
Covers what we reviewed today Format: True or False Multiple Choices Problem Solving Extra Credit (25%)
Close Book, 1-page A4 cheating sheet allowed. cheating sheet need to be turned in with name and id number cheating sheet will also be graded for up to 10 pts
Relax Quiz is actually on me.
Z. Li: ECE 5582 Computer Vision, 2019. p.2
Best Cheating Sheet Award from 2017
Cheating sheet is also graded. Up to 10 pts extra A nice way to summarize
and organize/indexing your knowledge base. Cheating sheet + HW =
what you really get from the class.
Z. Li: ECE 5582 Computer Vision, 2019. p.3
Outline
Quiz-1 Review part I: Image Features & Classifiers Image Formation: Homography Color Space & Color Features Filtering and Edge Features Harris Detector SIFT Aggregation: BoW, VLAD, Fisher, and Supervector Image Retrieval System Metrics: TPR/Precision, FPR, Recall,
mAP Classification: kNN, GMM Bayesian, SVM
Z. Li: ECE 5582 Computer Vision, 2019. p.4
Classification in Image Retrieval
An image retrieval pipeline (hand crafted features)
p.5
Image Formation
Feature Computing
Feature Aggregation
Classification
Knowledge/Data Base
Z. Li: ECE 5582 Computer Vision, 2019.
Homography,Color space
Color histogramFiltering, Edge DetectionHoG, Harris Detector, SIFT
BowVLADFisher VectorSupervector
Image Formation - Geometry
Z. Li: ECE 5582 Computer Vision, 2019. p.6
[ ]X0IKx =úúúú
û
ù
êêêê
ë
é
úúú
û
ù
êêê
ë
é=
úúú
û
ù
êêê
ë
é
10100000000
1zyx
ff
vu
w
K
Slide Credit: GaTech Computer Vision class, 2015.
Intrinsic Assumptions• Unit aspect ratio• Optical center at (0,0)• No skew of pixels
Extrinsic Assumptions• No rotation• Camera at (0,0,0)
X
x
Image Formation: Homography In general, homography H maps 2d points
according to, x’=Hx
Up to a scale, as [x, y, w]=[sx, sy, sw], so H has 8 DoF
Affine Transform: 6 DoF: Contains a translation [t1, t2], and invertable
affine matrix A[2x2]
Similarity Transform, 4 DoF: a rigid transform that preserves distance if
s=1:
Z. Li: ECE 5582 Computer Vision, 2019. p.7
��′�′�′�=�
ℎ� ℎ� ℎ�ℎ� ℎ� ℎ�ℎ� ℎ� ℎ�
������
H =��1 �� ���� �4 ��0 0 1
�
H =�� �X���� −� sin���� ��� �X ��� � cos���� ��
0 0 1�
VL_FEAT Implementation
VL_FEAT has an implementation:
p.8Z. Li: ECE 5582 Computer Vision, 2019.
Image Formation – Color Space
RGB Color Space: 8 bit Red, Green, Blue Channels Mixed together
YUV/YCbCr Color Space
HSV Color Space Hue: color in polar coord Saturation Value
p.9Z. Li: ECE 5582 Computer Vision, 2019.
Color Histogram – Fixed Codebook
Color Histogram: Fixed code book to generate histogram Example: MPEG 7 Scalable Color Descriptor HSV Color Space, 16x4x4 partition for bins Scalability thru quantization and Haar Transform
p.10Z. Li: ECE 5582 Computer Vision, 2019.
Color Histogram – Adaptive Codebook
Adaptive Color Histogram No fixed code book Example: MPEG 7 Dominant Color Descriptor Distance Metric Not index-able
p.11Z. Li: ECE 5582 Computer Vision, 2019.
Image Filters
Basic Filter Operations
Region of Support
p.12
1 11 1 11 1 1 *
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Z. Li: ECE 5582 Computer Vision, 2019.
Filter Properties
Linear Filters are: Shift-Invariant:
f(m-k, n-j)*h = g(m-k, n-j), if f*h=g
Associative: f*h1*h2 = f*(h1*h2)
this can save a lot of complexity
Distributive:f*h1 + f*h2 = f*(h1+h2)useful in SIFT’s DoG filtering.
p.13Z. Li: ECE 5582 Computer Vision, 2019.
Edge Detection
The need of smoothing Gaussian smoothing Combine differentiation and Gaussian
p.14
fderivative of Gaussian (x)
Z. Li: ECE 5582 Computer Vision, 2019.
Image Gradient from DoG Filtering
Image Gradient (at a scale):
p.15
The gradient points in the direction of most rapid increase in intensity
The edge strength is given by the gradient magnitude:
The gradient direction is given by:
Z. Li: ECE 5582 Computer Vision, 2019.
HOG – Histogram of Oriented Gradients
Generate local histogram per block Size of HoG: n*m*4*9=36nm, for n x m cells.
p.16
120
1019747
110
1019545
30251510
2525150
80707050
40303020
510105
1020205
Angle Magnitude
020406080100120140160
0-19 20-39 40-59 60-79 80-99 100-119 120-139 140-159 160-1790
1
2
3
4
5
0-19 20-39 40-59 60-79 80-99 100-119 120-139 140-159 160-179
Binary voting Magnitude voting
( )49
41
39
31
29
21
19
11 ,...,,,...,,,...,,,..., hhhhhhhhf =
HOG Cell size
Z. Li: ECE 5582 Computer Vision, 2019.
Smoothed 2nd moment:
Harris Detector
p.17
úúû
ù
êêë
é*=
)()()()(
)(),( 2
2
DyDyx
DyxDxIDI III
IIIg
ssss
sssm
1. Image derivatives (optionally blur first)
2. Square of derivatives
3. Gaussian filter g(sI)
Ix Iy
Ix2 Iy2 IxIy
g(Ix2) g(Iy2) g(IxIy)
222222 )]()([)]([)()( yxyxyx IgIgIIgIgIg +-- a
=-= ])),([trace()],(det[ 2DIDIharrisR ssmassm
4. Harris Function – detect corners, no need to compute the eigen values
har
1 2
1 2
dettrace
MM
l ll l
== +
Z. Li: ECE 5582 Computer Vision, 2019.
Harris Detector Steps
Response function -> Threshold -> Local Maxima Harris Detector NOT scale invariant
p.18
R(x,y)
Z. Li: ECE 5582 Computer Vision, 2019.
Scale Space Theory - Lindeberg
Scale Space Response via Laplacian of Gaussian The scale is controlled by �
Characteristic Scale:
p.19
2
2
2
22
yg
xgg
¶¶
+¶¶
=Ñ
� = �− ��+���
2�
r
image� = 0.8� � = 1.2� � = 2�
…
characteristic scale
Z. Li: ECE 5582 Computer Vision, 2019.
ALP Scale Space Extrema Detection
ALP filtering scale space response as polynomials
– Scale Space Extrema: �� ∗ ����X ,�XX,�� ≈ X.�X �� −XX.X� �� + �XX.�X � −�XX.��
– No Extrema (saddle point): �� ∗ ����XX,XX�,�� ≈ .�X �� − �. � �� +X.�X � − X.X
+
+
Z. Li: ECE 5582 Computer Vision, 2019. p.20
SIFT Detector via Difference of Gaussian
However, LoG computing is expensive – non- seperatable filter SIFT: uses Difference of Gaussian filters to
approximate LoG Separate-able, much faster Box Filter Approximation – you will see later.
p.21
LoG
Scale space construction By Gaussian Filtering, and Image Difference
Z. Li: ECE 5582 Computer Vision, 2019.
Peak Strength & Edge Removal
Peak Strength: Interpolate true DoG response and pixel location by Taylor
expansion
Edge Removal: Re-do Harris type detection to remove edge on much reduced
pixel set
p.22Z. Li: ECE 5582 Computer Vision, 2019.
Scale Invariance thru Dominant Orientation Coding
Voting for the dominant orientation Weighted by a Gaussian window to give more emphasis to the
gradients closer to the center
p.23Z. Li: ECE 5582 Computer Vision, 2019.
SIFT Description Rotate to dominant direction, divide
Image Patch (16x16 pixels) around SIFT point into 4x4 Cells For each cell, compute an 8-bin
edge histogram
Overall, we have 4x4x8=128
dimension
p.24
0 2angle histogram
Spatial-scale space extrema
Z. Li: ECE 5582 Computer Vision, 2019.
SIFT Matching and Repeatability Prediction
SIFT Distance – True Love Test
Not all SIFT are created equal… Peak strength (DoG response at interpolated position)
p.25
Combined scale/peak strength pmf
�����, ��∗
� �����
�, ����
≤ �
Z. Li: ECE 5582 Computer Vision, 2019.
Outline
HW-2 Quiz-1 Review part I: Image Features & Classifiers Image Formation: Homography Color Space & Color Features Filtering and Edge Features Harris Detector SIFT Aggregation: BoW, VLAD, Fisher, and Supervector Image Retrieval System Metrics: TPR/Precision, FPR, Recall,
mAP Classification: kNN, GMM Bayesian, SVM
Z. Li: ECE 5582 Computer Vision, 2019. p.26
Why Aggregation ?
Curse of Dimensionality
Decision Boundary / Indexing
p.27
+
…..
+
+
+
Z. Li: ECE 5582 Computer Vision, 2019.
Histogram Aggregation: Bag-of-Words
Codebook: Feature space: Rd, k-means to get k centroids, {��, ��,…,��}
BoW Hard Encoding: For n feature points,{x1, x2, …,xn} assignment matrix: kxn,
with column only 1-non zero entry Aggregated dimension: k
p.28
k
n
Z. Li: ECE 5582 Computer Vision, 2019.
Histogram Aggregation: Kernel Code Book Soft Encoding
Kernel Code Book Soft Encoding Kernel Affinity: ����, ��� = �−�|�� −��|
�
Assignment Matrix: ���� = ����, ���/������, ��� Encoding: k-dimensional: X(k)= �1 �������
p.29Z. Li: ECE 5582 Computer Vision, 2019.
VLAD- Vector of Locally Aggregated Descriptors
Aggregate feature difference from the codebook Hard assignment by finding the
NN of feature {xk} to {��} Compute aggregated
differences
L2 normalize
Final feature: k x d
p.30
m 3
x
v1 v2 v3 v4
v5
m 1
m 4
m 2
m 5
① assign descriptors
② compute x- m i
③ vi=sum x- m i for cell i
�� = �∀�,�.�.������=��
�� −��
�� = ��/||��||�
Z. Li: ECE 5582 Computer Vision, 2019.
Fisher Vector Aggregation
FV encoding Gradient on the mean, for GMM
component k, j=1..D
In the end, we have 2K x D aggregation on the derivation w.r.t. the means and variances
p.31
�X= ���, ��, …, ��, ��, ��, …, ����
Richer characterization of the code book with GMM: {��, �,��}
Normalize by Fisher Kernel
Z. Li: ECE 5582 Computer Vision, 2019.
Supervector Aggregation Assuming we have UBM GMM model
���� = {��, ��, �}, with identical prior and covariance Then for two utterance samples a and b, with
GMM models �� = {��, ��
�, �}, �� = {��, ��
�, Σ�},
The SV distance is,
It means the means of two models need to be normalized by the UBM covariance induced Mahanolibis distance metricThis is also a linear kernel function scaled by the UBM covariances
Z. Li: ECE 5582 Computer Vision, 2019. p.32
����, ��� = ��� ���
−�12�����
�
� ��Σ�−�12���
��
AKULA – Adaptive KLUster AggregationAKULA Descriptor:
No fixed codebook ! Outer loop: subspace optimization
Inner loop: cluster centroids + SIFT count
A1={yc11, yc1
2, …, yc1k ; pc1
1, pc12, …, pc1
k } A2={yc2
1, yc22, …, yc2
k ; pc21, pc2
2, …, pc2k }
Distance metric: Min centroids distance, weighted by SIFT
count (kind of manifolds tangent distance)
p.33Z. Li: ECE 5582 Computer Vision, 2019.
Image Retrieval Performance Metrics
True Positive Rate
False Positive Rate
Precision
Recall
F-Measure
p.34
TPR = TP/(TP+FP)
FPR = FP/(TP+FP)
Precision =TP/(TP + FP),
Recall = TP/(TP + FN)
F-measure = 2*(precision*recall)/(precision + recall)
Z. Li: ECE 5582 Computer Vision, 2019.
Retrieval Performance: Mean Average Precision
mAP measures the retrieval performance across all queries
Z. Li: ECE 5582 Computer Vision, 2019. p.35
mAP example
MAP is computed across all query results as the average precision over the recall
Z. Li: ECE 5582 Computer Vision, 2019. p.36
Compute TPR-FPR
ROC Curve (Receiver Operating Characteristics)
p.37
Confusion/distance matrix: d(j,k)
Ground Truth: m(j,k)
���,�� = �1, �X� � � � � ��� ����ℎ�0, X� ����ℎ
Z. Li: ECE 5582 Computer Vision, 2019.
Classification in Image Retrieval
A typical image retrieval pipeline
p.38
Image Formatio
n
Feature Computing
Feature Aggregation
Classification
Knowledge/Data Base
Z. Li: ECE 5582 Computer Vision, 2019.
kNN Classifier
kNN is Nonlinear Classifier Operations: majority vote
Performance bounds on k: not worse off by 2 times the error
rate of Bayesian error rate As k increases, it improves, but not
much
Accelerate NN retrieval operation Kd-tree:
o a space partition scheme, to limit the number of data points comparison operation
Kd-tree supported kNN search:o Approx KNN searcho Kd-Forest supported Approx Search
p.39Z. Li: ECE 5582 Computer Vision, 2019.
Gaussian Generative Model Classifier
Shaped by the relative shape of Covariance Matrix
Matlab: [rec_label, err]=classify(q, x, y, method); Method = ‘quadratic’
Z. Li: ECE 5582 Computer Vision, 2019. p.40
Logistic Regression
Logistic Function: Mapping linear function to [0 1]. Give a prob of observed X is has label 0 or 1 via logistic
mapping
Z. Li: ECE 5582 Computer Vision, 2019. p.41
z
1w
iw
Iw
…
1x
ix
Ix
+
b
( )zs
…
……
Gradient Search Solution With Log loss function the problem is convex, and a gradient search solution can
reach optimal. for each weights wj, we have,
For a batch of m oberved {x(i), y(i)}, we can do batch descent, or stochastic gradient descent (SGD)
matlab: mnrfit()
Z. Li: ECE 5582 Computer Vision, 2019. p.42
The Support Vector Solution Why called SVM ?
p.43Z. Li: ECE 5582 Computer Vision, 2019.
K(xi, x)
Support Vector Machine
Matlab:
p.44
% train svmsvm = svmtrain(x, y, 'showplot', true);
% svm classificationfor k=1:5 figure(41); hold on; grid on; axis([0 1 0 1]); [u,v]=ginput(1); q=[u,v]; plot(u, v, '*b'); rec = svmclassify(svm, q); fprintf('\n recognized as %c', rec); end
Z. Li: ECE 5582 Computer Vision, 2019.
Summary Image Analysis & Retrieval Part I Image Formation Color and Texture Features Interest Points Detection – Harris Detector Invariant Interest Point Detection – SIFT Aggregation Classification tools: knn, gmm, svm Performance metrics:
o tp, fp, tn, fn, tpr/precision, fpr, recall, mAP Part II: Holistic approach Just throw the pixels to the algorithm to figure out what are
the good features. Subspace approach: Eigenface, Fisherface, Laplacian
Embedding Deep Learning: Classification, Identification, and
Segmentation Hashing
Z. Li: ECE 5582 Computer Vision, 2019. p.45
Top Related