Download - Lec 12: Part I ReviewReview part I: Image Features & Classifiers Image Formation: Homography Color Space & Color Features Filtering and Edge Features Harris Detector SIFT Aggregation:

Spring 2019: Venu: Haag 315, Time: M/W 4-5:15pm

ECE 5582 Computer VisionLec 12: Part I Review

Zhu LiDept of CSEE, UMKC

Office: FH560E, Email: [email protected], Ph: x 2346.http://l.web.umkc.edu/lizhu

Z. Li: ECE 5582 Computer Vision, 2019. p.1

slides created with WPS Office Linux and EqualX LaTex equation editor

Quiz-1

Covers what we reviewed today Format: True or False Multiple Choices Problem Solving Extra Credit (25%)

Close Book, 1-page A4 cheating sheet allowed. cheating sheet need to be turned in with name and id number cheating sheet will also be graded for up to 10 pts

Relax Quiz is actually on me.


Best Cheating Sheet Award from 2017

Cheating sheet is also graded. Up to 10 pts extra A nice way to summarize

and organize/indexing your knowledge base. Cheating sheet + HW =

what you really get from the class.


Outline

Quiz-1 Review part I: Image Features & Classifiers Image Formation: Homography Color Space & Color Features Filtering and Edge Features Harris Detector SIFT Aggregation: BoW, VLAD, Fisher, and Supervector Image Retrieval System Metrics: TPR/Precision, FPR, Recall,

mAP Classification: kNN, GMM Bayesian, SVM


Classification in Image Retrieval

An image retrieval pipeline (hand crafted features)

p.5

Image Formation

Feature Computing

Feature Aggregation

Classification

Knowledge/Data Base

Z. Li: ECE 5582 Computer Vision, 2019.

Homography,Color space

Color histogramFiltering, Edge DetectionHoG, Harris Detector, SIFT

BowVLADFisher VectorSupervector

Image Formation - Geometry


[ ]X0IKx =úúúú

û

ù

êêêê

ë

é

úúú

û

ù

êêê

ë

é=

úúú

û

ù

êêê

ë

é

10100000000

1zyx

ff

vu

w

K

Slide Credit: GaTech Computer Vision class, 2015.

Intrinsic Assumptions• Unit aspect ratio• Optical center at (0,0)• No skew of pixels

Extrinsic Assumptions• No rotation• Camera at (0,0,0)

X

x

Image Formation: Homography In general, homography H maps 2d points

according to, x’=Hx

Up to a scale, as [x, y, w]=[sx, sy, sw], so H has 8 DoF

Affine Transform: 6 DoF: Contains a translation [t1, t2], and invertable

affine matrix A[2x2]

Similarity Transform, 4 DoF: a rigid transform that preserves distance if

s=1:


��′�′�′�=�

ℎ� ℎ� ℎ�ℎ� ℎ� ℎ�ℎ� ℎ� ℎ�

��

H =��1 �� 4 ��0 0 1

�

H =�� X�� −� sin�� X �� cos��

0 0 1�

VL_FEAT Implementation

VL_FEAT has an implementation:

p.8Z. Li: ECE 5582 Computer Vision, 2019.

Image Formation – Color Space

RGB Color Space: 8 bit Red, Green, Blue Channels Mixed together

YUV/YCbCr Color Space

HSV Color Space Hue: color in polar coord Saturation Value


Color Histogram – Fixed Codebook

Color Histogram: Fixed code book to generate histogram Example: MPEG 7 Scalable Color Descriptor HSV Color Space, 16x4x4 partition for bins Scalability thru quantization and Haar Transform


Color Histogram – Adaptive Codebook

Adaptive Color Histogram No fixed code book Example: MPEG 7 Dominant Color Descriptor Distance Metric Not index-able


Image Filters

Basic Filter Operations

Region of Support

p.12

1 11 1 11 1 1 *

1/9 1/9 1/9

1/9 1/9 1/9

1/9 1/9 1/9


Filter Properties

Linear Filters are: Shift-Invariant:

f(m-k, n-j)*h = g(m-k, n-j), if f*h=g

Associative: f*h1*h2 = f*(h1*h2)

this can save a lot of complexity

Distributive:f*h1 + f*h2 = f*(h1+h2)useful in SIFT’s DoG filtering.


Edge Detection

The need of smoothing Gaussian smoothing Combine differentiation and Gaussian

p.14

fderivative of Gaussian (x)


Image Gradient from DoG Filtering

Image Gradient (at a scale):

p.15

The gradient points in the direction of most rapid increase in intensity

The edge strength is given by the gradient magnitude:

The gradient direction is given by:


HOG – Histogram of Oriented Gradients

Generate local histogram per block Size of HoG: n*m*4*9=36nm, for n x m cells.

p.16

120

1019747

110

1019545

30251510

2525150

80707050

40303020

510105

1020205

Angle Magnitude

020406080100120140160

0-19 20-39 40-59 60-79 80-99 100-119 120-139 140-159 160-1790

1

2

3

4

5

0-19 20-39 40-59 60-79 80-99 100-119 120-139 140-159 160-179

Binary voting Magnitude voting

( )49

41

39

31

29

21

19

11 ,...,,,...,,,...,,,..., hhhhhhhhf =

HOG Cell size


Smoothed 2nd moment:

Harris Detector

p.17

úúû

ù

êêë

é*=

)()()()(

)(),( 2

2

DyDyx

DyxDxIDI III

IIIg

ssss

sssm

1. Image derivatives (optionally blur first)

2. Square of derivatives

3. Gaussian filter g(sI)

Ix Iy

Ix2 Iy2 IxIy

g(Ix2) g(Iy2) g(IxIy)

222222 )]()([)]([)()( yxyxyx IgIgIIgIgIg +-- a

=-= ])),([trace()],(det[ 2DIDIharrisR ssmassm

4. Harris Function – detect corners, no need to compute the eigen values

har

1 2

1 2

dettrace

MM

l ll l

== +


Harris Detector Steps

Response function -> Threshold -> Local Maxima Harris Detector NOT scale invariant

p.18

R(x,y)


Scale Space Theory - Lindeberg

Scale Space Response via Laplacian of Gaussian The scale is controlled by �

Characteristic Scale:

p.19

2

2

2

22

yg

xgg

¶¶

+¶¶

=Ñ

� = �− ��+��

2�

r

image� = 0.8� � = 1.2� � = 2�

…

characteristic scale


ALP Scale Space Extrema Detection

ALP filtering scale space response as polynomials

– Scale Space Extrema: �� ∗ ��X ,�XX,�� ≈ X.�X �� −XX.X� �� + �XX.�X � −�XX.��

– No Extrema (saddle point): �� ∗ ��XX,XX�,�� ≈ .�X �� − �. � �� +X.�X � − X.X

+

+


SIFT Detector via Difference of Gaussian

However, LoG computing is expensive – non- seperatable filter SIFT: uses Difference of Gaussian filters to

approximate LoG Separate-able, much faster Box Filter Approximation – you will see later.

p.21

LoG

Scale space construction By Gaussian Filtering, and Image Difference


Peak Strength & Edge Removal

Peak Strength: Interpolate true DoG response and pixel location by Taylor

expansion

Edge Removal: Re-do Harris type detection to remove edge on much reduced

pixel set


Scale Invariance thru Dominant Orientation Coding

Voting for the dominant orientation Weighted by a Gaussian window to give more emphasis to the

gradients closer to the center


SIFT Description Rotate to dominant direction, divide

Image Patch (16x16 pixels) around SIFT point into 4x4 Cells For each cell, compute an 8-bin

edge histogram

Overall, we have 4x4x8=128

dimension

p.24

0 2angle histogram

Spatial-scale space extrema


SIFT Matching and Repeatability Prediction

SIFT Distance – True Love Test

Not all SIFT are created equal… Peak strength (DoG response at interpolated position)

p.25

Combined scale/peak strength pmf

��, ��∗

� ��

�, ��

≤ �


Outline

HW-2 Quiz-1 Review part I: Image Features & Classifiers Image Formation: Homography Color Space & Color Features Filtering and Edge Features Harris Detector SIFT Aggregation: BoW, VLAD, Fisher, and Supervector Image Retrieval System Metrics: TPR/Precision, FPR, Recall,

mAP Classification: kNN, GMM Bayesian, SVM


Why Aggregation ?

Curse of Dimensionality

Decision Boundary / Indexing

p.27

+

…..

+

+

+


Histogram Aggregation: Bag-of-Words

Codebook: Feature space: Rd, k-means to get k centroids, {��, ��,…,��}

BoW Hard Encoding: For n feature points,{x1, x2, …,xn} assignment matrix: kxn,

with column only 1-non zero entry Aggregated dimension: k

p.28

k

n


Histogram Aggregation: Kernel Code Book Soft Encoding

Kernel Code Book Soft Encoding Kernel Affinity: ��, �� = �−�|�� −��|

�

Assignment Matrix: �� = ��, ��/��, �� Encoding: k-dimensional: X(k)= �1 ��


VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook Hard assignment by finding the

NN of feature {xk} to {��} Compute aggregated

differences

L2 normalize

Final feature: k x d

p.30

m 3

x

v1 v2 v3 v4

v5

m 1

m 4

m 2

m 5

① assign descriptors

② compute x- m i

③ vi=sum x- m i for cell i

�� = �∀�,�.�.��=��

�� −��

�� = ��/||��||�


Fisher Vector Aggregation

FV encoding Gradient on the mean, for GMM

component k, j=1..D

In the end, we have 2K x D aggregation on the derivation w.r.t. the means and variances

p.31

�X= ��, ��, …, ��, ��, ��, …, ��

Richer characterization of the code book with GMM: {��, Σ�,��}

Normalize by Fisher Kernel


Supervector Aggregation Assuming we have UBM GMM model

�� = {��, ��, Σ�}, with identical prior and covariance Then for two utterance samples a and b, with

GMM models �� = {��, ��

�, Σ�}, �� = {��, ��

�, Σ�},

The SV distance is,

It means the means of two models need to be normalized by the UBM covariance induced Mahanolibis distance metricThis is also a linear kernel function scaled by the UBM covariances


��, �� = �� Σ�

−�12��

�

� ��Σ�−�12��

��

AKULA – Adaptive KLUster AggregationAKULA Descriptor:

No fixed codebook ! Outer loop: subspace optimization

Inner loop: cluster centroids + SIFT count

A1={yc11, yc1

2, …, yc1k ; pc1

1, pc12, …, pc1

k } A2={yc2

1, yc22, …, yc2

k ; pc21, pc2

2, …, pc2k }

Distance metric: Min centroids distance, weighted by SIFT

count (kind of manifolds tangent distance)


Image Retrieval Performance Metrics

True Positive Rate

False Positive Rate

Precision

Recall

F-Measure

p.34

TPR = TP/(TP+FP)

FPR = FP/(TP+FP)

Precision =TP/(TP + FP),

Recall = TP/(TP + FN)

F-measure = 2*(precision*recall)/(precision + recall)


Retrieval Performance: Mean Average Precision

mAP measures the retrieval performance across all queries


mAP example

MAP is computed across all query results as the average precision over the recall


Compute TPR-FPR

ROC Curve (Receiver Operating Characteristics)

p.37

Confusion/distance matrix: d(j,k)

Ground Truth: m(j,k)

��,�� = �1, �X� � � � � �� ℎ�0, X� ��ℎ


Classification in Image Retrieval

A typical image retrieval pipeline

p.38

Image Formatio

n

Feature Computing

Feature Aggregation

Classification

Knowledge/Data Base


kNN Classifier

kNN is Nonlinear Classifier Operations: majority vote

Performance bounds on k: not worse off by 2 times the error

rate of Bayesian error rate As k increases, it improves, but not

much

Accelerate NN retrieval operation Kd-tree:

o a space partition scheme, to limit the number of data points comparison operation

Kd-tree supported kNN search:o Approx KNN searcho Kd-Forest supported Approx Search


Gaussian Generative Model Classifier

Shaped by the relative shape of Covariance Matrix

Matlab: [rec_label, err]=classify(q, x, y, method); Method = ‘quadratic’


Logistic Regression

Logistic Function: Mapping linear function to [0 1]. Give a prob of observed X is has label 0 or 1 via logistic

mapping


z

1w

iw

Iw

…

1x

ix

Ix

+

b

( )zs

…

……

Gradient Search Solution With Log loss function the problem is convex, and a gradient search solution can

reach optimal. for each weights wj, we have,

For a batch of m oberved {x(i), y(i)}, we can do batch descent, or stochastic gradient descent (SGD)

matlab: mnrfit()


The Support Vector Solution Why called SVM ?


K(xi, x)

Support Vector Machine

Matlab:

p.44

% train svmsvm = svmtrain(x, y, 'showplot', true);

% svm classificationfor k=1:5 figure(41); hold on; grid on; axis([0 1 0 1]); [u,v]=ginput(1); q=[u,v]; plot(u, v, '*b'); rec = svmclassify(svm, q); fprintf('\n recognized as %c', rec); end


Summary Image Analysis & Retrieval Part I Image Formation Color and Texture Features Interest Points Detection – Harris Detector Invariant Interest Point Detection – SIFT Aggregation Classification tools: knn, gmm, svm Performance metrics:

o tp, fp, tn, fn, tpr/precision, fpr, recall, mAP Part II: Holistic approach Just throw the pixels to the algorithm to figure out what are

the good features. Subspace approach: Eigenface, Fisherface, Laplacian

Embedding Deep Learning: Classification, Identification, and

Segmentation Hashing