Lec12 review-part-i

Image Analysis & Retrieval

CS/EE 5590 Special Topics (Class Ids: 44873, 44874)

Fall 2016, M/W 4-5:15pm@Bloch 0012

Lec 12

Review of Part I:

Features and Classifiers in Image Classification

Zhu Li

Dept of CSEE, UMKC

Office: FH560E, Email: [email protected], Ph: x 2346.

http://l.web.umkc.edu/lizhu

Z. Li, Image Analysis & Retrv. 2016 p.1

mailto:[email protected]

http://l.web.umkc.edu/lizhu

Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM


Homework-2: Aggregation

Data Set:

6 class, 6x20=120 images

Global Codebook for VLAD, FisherVector

Compute SIFT and DenseSIFT from each image, sort them into two arrays: hw2_sift: hw2_dsift

Random sample a subset of them: use randperm()

Compute PCA:

o indx = randperm(n_sift); indx = indx(1:20*2^10); % sample 20k

o [A0, s, eigv]=princomp(hw2_sift(indx,: ));

Project SIFT

o Dimension reduction: x = sift*A0(:,1:kd)

Compute Kmeans and GMM models for VLAD and FV aggregation:

o vl_feat: vl_kmeans(), vl_gmm();


VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook

Hard assignment by finding the NN of feature {xk} to {𝜇𝑘}

Compute aggregated differences

L2 normalize

Final feature: k x d

Image Analysis & Understanding, 2015 p.4

3

x

v1 v2v3 v4

v5

1

4

2

5

① assign descriptors

② compute x- i

③ vi=sum x- i for cell i

𝑣𝑘 =

∀𝑗,𝑠.𝑡.𝑁𝑁 𝑥𝑗 =𝜇𝑘

𝑥𝑗 − 𝜇𝑘

𝑣𝑘 = 𝑣𝑘/ 𝑣𝑘 2

Fisher Vector Aggregation

FV encoding

Gradient on the mean, for GMM component k, j=1..D

In the end, we have 2K x D aggregation on the derivation w.r.t. the means and variances


𝐹𝑉 = 𝑢1, 𝑢2, … , 𝑢𝐾 , 𝑣1, 𝑣2, … , 𝑣𝐾𝑇

Richer characterization of the code book with GMM: {𝜇𝑘 , Σ𝑘 , 𝜋𝑘}

Normalize by Fisher Kernel

HW-2 Deliverables

Functions:

[gmm]=getGMM(sift, A, kd, n); %gmm.m, gmm.v, gmm.p

[km]=getKMeans(sift, A, kd, n); % km: n x kd

[vlad]=getVladSift(im, km, opt); % opt = ‘sift’, or ‘dsft’

[fv]=getFVSift(im, gmm, opt); % can use vl_fisher()

Plots:

For each image: {sift, dsift} x {vlad, fv} x {kd=[24,32]} x{n=[64, 128]}, total 16 features per image, plot the 120x120 distances for each feature. [hint: use subplot(2,2,k), 4 plots per fig]

TPR-FPR: for each class, plot one figure with 16 plots,

mAP: one plot/table


Quiz-1

Covers what we reviewed today

Format:

True or False

Multiple Choices

Problem Solving

Extra Credit (25%)

Close Book, 1-page A4 cheating sheet allowed.

cheating sheet need to be turned in with name and id number

cheating sheet will also be graded for up to 10 pts

Relax

Quiz is actually on me.


Outline

HW-2

Quiz-1





Harris Detector

SIFT





Classification in Image Retrieval

A typical image retrieval pipeline

p.9

Image Formation

Feature Computing

Feature Aggregation

Classification

Knowledge/Data Base

Z. Li, Image Analysis & Retrv. 2016

Homography,Color space

Color histogramFiltering, Edge DetectionHoG, Harris Detector, SIFT

BowVLADFisher VectorSupervector

Image Formation - Geometry

Image Analysis & Retrieval, 2016 p.10

X0IKx

10100

000

000

1z

y

x

f

f

v

u

w

K

Slide Credit: GaTech Computer Vision class, 2015.

Intrinsic Assumptions• Unit aspect ratio

• Optical center at (0,0)• No skew of pixels

Extrinsic Assumptions• No rotation• Camera at (0,0,0)

X

x


In general, homography H maps 2d points according to, x’=Hx

Up to a scale, as [x, y, w]=[sx, sy, sw], so H has 8 DoF

Affine Transform: 6 DoF: Contains a translation [t1, t2], and invertable

affine matrix A[2x2]

Similarity Transform, 4 DoF: a rigid transform that preserves distance if

s=1:

Image Analysis & Retrieval, 2016 p.11

𝑥′𝑦′

𝑤′

=

ℎ1 ℎ2 ℎ3ℎ4 ℎ5 ℎ6ℎ7 ℎ8 ℎ9

𝑥𝑦𝑤

H =𝑎1 𝑎2 𝑡1𝑎3 𝑎4 𝑡20 0 1

H =𝑠 𝑐𝑜𝑠 𝜃 −𝑠 sin(𝜃) 𝑡1𝑠 𝑠𝑖𝑛 𝜃 𝑠 cos(𝜃) 𝑡2

0 0 1

VL_FEAT Implementation

VL_FEAT has an implementation:

p.12Z. Li, Image Analysis & Retrv. 2016

Image Formation – Color Space

RGB Color Space:

8 bit Red, Green, Blue Channels

Mixed together

YUV/YCbCr Color Space

HSV Color Space

Hue: color in polar coord

Saturation

Value


Color Histogram – Fixed Codebook

Color Histogram:

Fixed code book to generate histogram

Example: MPEG 7 Scalable Color Descriptor

HSV Color Space, 16x4x4 partition for bins

Scalability thru quantization and Haar Transform


Color Histogram – Adaptive Codebook

Adaptive Color Histogram

No fixed code book

Example: MPEG 7 Dominant Color Descriptor

Distance Metric

Not index-able


Image Filters

Basic Filter Operations

Region of Support

p.16

1 1

1 1 1

1 1 1*

1/9 1/9 1/9

1/9 1/9 1/9

1/9 1/9 1/9


Filter Properties

Linear Filters are:

Shift-Invariant:

f(m-k, n-j)*h = g(m-k, n-j), if f*h=g

Associative:

f*h1*h2 = f*(h1*h2)

this can save a lot of complexity

Distributive:

f*h1 + f*h2 = f*(h1+h2)

useful in SIFT’s DoG filtering.


Edge Detection

The need of smoothing

Gaussian smoothing

Combine differentiation and Gaussian

p.18

f

derivative of Gaussian (x)


Image Gradient from DoG Filtering

Image Gradient (at a scale):

p.19

The gradient points in the direction of most rapid increase in intensity

The edge strength is given by the gradient magnitude:

The gradient direction is given by:


HOG – Histogram of Oriented Gradients

Generate local histogram per block

Size of HoG: n*m*4*9=36nm, for n x m cells.

p.20

12

0

10

19747

11

0

10

19545

30251510

2525150

80707050

40303020

510105

1020205

Angle Magnitude

0

20

40

60

80

100

120

140

160

0-19 20-39 40-59 60-79 80-99 100-119 120-139 140-159 160-179

0

1

2

3

4

5

0-19 20-39 40-59 60-79 80-99 100-119 120-139 140-159 160-179

Binary voting Magnitude voting

4

9

4

1

3

9

3

1

2

9

2

1

1

9

1

1 ,...,,,...,,,...,,,..., hhhhhhhhf

HOG Cell size


Smoothed 2nd moment:

Harris Detector

p.21

)()(

)()()(),(

2

2

DyDyx

DyxDx

IDIIII

IIIg

1. Image derivatives (optionally blur first)

2. Square of derivatives

3. Gaussian filter g(I)

Ix Iy

Ix2

Iy2

IxIy

g(Ix2) g(Iy

2) g(IxIy)

222222 )]()([)]([)()( yxyxyx IgIgIIgIgIg

])),([trace()],(det[ 2

DIDIharrisR

4. Harris Function – detect corners, no need to compute the

eigen values

har

1 2

1 2

det

trace

M

M


Harris Detector Steps

Response function -> Threshold -> Local Maxima

Harris Detector NOT scale invariant

p.22

R(x,y)


Scale Space Theory - Lindeberg

Scale Space Response via Laplacian of Gaussian

The scale is controlled by 𝜎

Characteristic Scale:

p.23

2

2

2

22

y

g

x

gg

𝑔 = 𝑒− 𝑥+𝑦 2

2𝜎

r

image𝜎 = 0.8𝑟 𝜎 = 1.2𝑟 𝜎 = 2𝑟

…

characteristic scale


SIFT Detector via Difference of Gaussian

However, LoG computing is expensive – non-seperatable filter

SIFT: uses Difference of Gaussian filters to approximate LoG

Separate-able, much faster

Box Filter Approximation – you will see later.

p.24

LoG

Scale space construction By Gaussian Filtering, and Image Difference


Peak Strength & Edge Removal

Peak Strength:

Interpolate true DoG response and pixel location by Taylor expansion

Edge Removal:

Re-do Harris type detection to remove edge on much reduced pixel set


Scale Invariance thru Dominant Orientation Coding

Voting for the dominant orientation

Weighted by a Gaussian window to give more emphasis to the gradients closer to the center


SIFT Description

Rotate to dominant direction, divide Image Patch (16x16 pixels) around SIFT point into 4x4 Cells

For each cell, compute an 8-bin edge histogram

Overall, we have 4x4x8=128 dimension

p.27

0 2

angle histogram

Spatial-scale space extrema


SIFT Matching and Repeatability Prediction

SIFT Distance – True Love Test

Not all SIFT are created equal…

Peak strength (DoG response at interpolated position)

p.28

Combined scale/peak strength pmf

𝑑(𝑠11, 𝑠𝑘∗

2 )

𝑑(𝑠11, 𝑠𝑘

2)≤ 𝜃


Outline

HW-2

Quiz-1





Harris Detector

SIFT





Why Aggregation ?

Curse of Dimensionality

Decision Boundary / Indexing

p.30

+

…..

+

+

+


Histogram Aggregation: Bag-of-Words

Codebook:

Feature space: Rd, k-means to get k centroids, {𝜇1, 𝜇2, … , 𝜇𝑘}

BoW Hard Encoding:

For n feature points,{x1, x2, …,xn} assignment matrix: kxn, with column only 1-non zero entry

Aggregated dimension: k

p.31

k

n


Histogram Aggregation: Kernel Code Book Soft Encoding

Kernel Code Book Soft Encoding

Kernel Affinity: 𝐾 𝑥𝑗 , 𝜇𝑘 = 𝑒−𝑘|𝑥𝑗−𝜇𝑘|2

Assignment Matrix: 𝐴𝑗,𝑘 = 𝐾(𝑥𝑗 , 𝜇𝑘)/ 𝑘𝐾(𝑥𝑗 , 𝜇𝑘)

Encoding: k-dimensional: X(k)= 1

𝑛 𝑗𝐴𝑗,𝑘


VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook

Hard assignment by finding the NN of feature {xk} to {𝜇𝑘}

Compute aggregated differences

L2 normalize

Final feature: k x d

p.33

3

x

v1 v2v3 v4

v5

1

4

2

5

① assign descriptors

② compute x- i

③ vi=sum x- i for cell i

𝑣𝑘 =

∀𝑗,𝑠.𝑡.𝑁𝑁 𝑥𝑗 =𝜇𝑘

𝑥𝑗 − 𝜇𝑘

𝑣𝑘 = 𝑣𝑘/ 𝑣𝑘 2


Fisher Vector Aggregation

FV encoding

Gradient on the mean, for GMM component k, j=1..D

In the end, we have 2K x D aggregation on the derivation w.r.t. the means and variances

p.34

𝐹𝑉 = 𝑢1, 𝑢2, … , 𝑢𝐾 , 𝑣1, 𝑣2, … , 𝑣𝐾𝑇

Richer characterization of the code book with GMM: {𝜇𝑘 , Σ𝑘 , 𝜋𝑘}

Normalize by Fisher Kernel


Supervector Aggregation

Assuming we have UBM GMM model

𝜆𝑈𝐵𝑀 = {𝑃𝑘 , 𝜇𝑘 , Σ𝑘},

with identical prior and covariance

Then for two utterance samples a and b, with GMM models 𝜆𝑎 = {𝑃𝑘 , 𝜇𝑘

𝑎 , Σ𝑘},

𝜆𝑏 = {𝑃𝑘 , 𝜇𝑘𝑏 , Σ𝑘},

The SV distance is,

It means the means of two models need to be normalized by the UBM covariance induced Mahanolibis distance metric

This is also a linear kernel function scaled by the UBM covariances


𝐾 𝜆𝑎 , 𝜆𝑏 =

𝑘

𝑃𝑘Σ𝑘−(

12)𝜇𝑘

𝑎

𝑇

( 𝑃𝑘Σ𝑘−(

12)𝜇𝑘

𝑏)

AKULA – Adaptive KLUster Aggregation

AKULA Descriptor:

No fixed codebook !

Outer loop: subspace optimization

Inner loop: cluster centroids + SIFT count

A1={yc11, yc1

2, …, yc1k ; pc1

1, pc12, …, pc1

k }

A2={yc21, yc2

2, …, yc2k ; pc2

1, pc22, …, pc2

k }

Distance metric:

Min centroids distance, weighted by SIFT count (kind of manifolds tangent distance)

d A1, A2 =1

𝑘

𝑗=0

𝑘d𝑚𝑖𝑛1 𝑗 𝑤𝑚𝑖𝑛

1 (𝑗) +1

𝑘

𝑖=0

𝑘d𝑚𝑖𝑛2 𝑖 𝑤𝑚𝑖𝑛

2 (𝑖)

d𝑚𝑖𝑛1 𝑗 = min

𝑖𝑑𝑗,𝑖

d𝑚𝑖𝑛2 𝑖 = min

𝑗𝑑𝑗,𝑖

w𝑚𝑖𝑛1 𝑗 = 𝑤𝑗,𝑖∗ , 𝑖∗ = 𝑎𝑟𝑔min

𝑖𝑑𝑗,𝑖

w𝑚𝑖𝑛2 𝑖 = 𝑤𝑗∗,𝑖 , 𝑗∗ = 𝑎𝑟𝑔min

𝑗𝑑𝑗,𝑖


Image Retrieval Performance Metrics

True Positive Rate

False Positive Rate

Precision

Recall

F-Measure

p.37

TPR = TP/(TP+FN)

FPR = FP/(FP+TN)

Precision =TP/(TP + FP),

Recall = TP/(TP + FN)

F-measure = 2*(precision*recall)/(precision + recall)


Retrieval Performance: Mean Average Precision

mAP measures the retrieval performance across all queries


mAP example

MAP is computed across all query results as the average precision over the recall


Compute TPR-FPR

ROC Curve (Receiver Operating Characteristics)

p.40

Confusion/distance matrix: d(j,k)

Ground Truth: m(j,k)

𝑚 𝑗, 𝑘 = 1, 𝑑𝑜𝑐 𝑗 𝑎𝑛𝑑 𝑘 𝑎𝑟𝑒 𝑚𝑎𝑡𝑐ℎ0, 𝑛𝑜𝑡 𝑚𝑎𝑡𝑐ℎ


Classification in Image Retrieval

A typical image retrieval pipeline

p.41

Image Formation

Feature Computing

Feature Aggregation

Classification

Knowledge/Data Base


kNN Classifier

kNN is Nonlinear Classifier

Operations: majority vote

Performance bounds on k:

not worse off by 2 times the error rate of Bayesian error rate

As k increases, it improves, but not much

Accelerate NN retrieval operation

Kd-tree:

o a space partition scheme, to limit the number of data points comparison operation

Kd-tree supported kNN search:

o Approx KNN search

o Kd-Forest supported Approx Search


Gaussian Generative Model Classifier

Shaped by the relative shape of Covariance Matrix

Matlab:

[rec_label, err]=classify(q, x, y, method);

Method = ‘quadratic’


The Support Vector Solution

Why called SVM ?


K(xi, x)

Support Vector Machine

Matlab:

p.45

% train svm

svm = svmtrain(x, y, 'showplot',

true);

% svm classification

for k=1:5

figure(41); hold on; grid

on; axis([0 1 0 1]);

[u,v]=ginput(1);

q=[u,v];

plot(u, v, '*b');

rec = svmclassify(svm, q);

fprintf('\n recognized as

%c', rec);

end


Summary

Image Analysis & Retrieval Part I

Image Formation

Color and Texture Features

Interest Points Detection – Harris Detector

Invariant Interest Point Detection – SIFT

Aggregation

Classification tools: knn, gmm, svm

Performance metrics:

o tp, fp, tn, fn, tpr/precision, fpr, recall, mAP

Part II: Holistic approach

Just throw the pixels to the algorithm to figure out what are the good features.


Lec12 review-part-i

Education

Transcript of Lec12 review-part-i