Lec12 review-part-i

46
Image Analysis & Retrieval CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W 4-5:15pm@Bloch 0012 Lec 12 Review of Part I: Features and Classifiers in Image Classification Zhu Li Dept of CSEE, UMKC Office: FH560E, Email: [email protected] , Ph: x 2346. http://l.web.umkc.edu/lizhu Z. Li, Image Analysis & Retrv. 2016 p.1

Transcript of Lec12 review-part-i

Page 1: Lec12 review-part-i

Image Analysis & Retrieval

CS/EE 5590 Special Topics (Class Ids: 44873, 44874)

Fall 2016, M/W 4-5:15pm@Bloch 0012

Lec 12

Review of Part I:

Features and Classifiers in Image Classification

Zhu Li

Dept of CSEE, UMKC

Office: FH560E, Email: [email protected], Ph: x 2346.

http://l.web.umkc.edu/lizhu

Z. Li, Image Analysis & Retrv. 2016 p.1

Page 2: Lec12 review-part-i

Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM

Z. Li, Image Analysis & Retrv. 2016 p.2

Page 3: Lec12 review-part-i

Homework-2: Aggregation

Data Set:

6 class, 6x20=120 images

Global Codebook for VLAD, FisherVector

Compute SIFT and DenseSIFT from each image, sort them into two arrays: hw2_sift: hw2_dsift

Random sample a subset of them: use randperm()

Compute PCA:

o indx = randperm(n_sift); indx = indx(1:20*2^10); % sample 20k

o [A0, s, eigv]=princomp(hw2_sift(indx,: ));

Project SIFT

o Dimension reduction: x = sift*A0(:,1:kd)

Compute Kmeans and GMM models for VLAD and FV aggregation:

o vl_feat: vl_kmeans(), vl_gmm();

Z. Li, Image Analysis & Retrv. 2016 p.3

Page 4: Lec12 review-part-i

VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook

Hard assignment by finding the NN of feature {xk} to {𝜇𝑘}

Compute aggregated differences

L2 normalize

Final feature: k x d

Image Analysis & Understanding, 2015 p.4

3

x

v1 v2v3 v4

v5

1

4

2

5

① assign descriptors

② compute x- i

③ vi=sum x- i for cell i

𝑣𝑘 =

∀𝑗,𝑠.𝑡.𝑁𝑁 𝑥𝑗 =𝜇𝑘

𝑥𝑗 − 𝜇𝑘

𝑣𝑘 = 𝑣𝑘/ 𝑣𝑘 2

Page 5: Lec12 review-part-i

Fisher Vector Aggregation

FV encoding

Gradient on the mean, for GMM component k, j=1..D

In the end, we have 2K x D aggregation on the derivation w.r.t. the means and variances

Image Analysis & Understanding, 2015 p.5

𝐹𝑉 = 𝑢1, 𝑢2, … , 𝑢𝐾 , 𝑣1, 𝑣2, … , 𝑣𝐾𝑇

Richer characterization of the code book with GMM: {𝜇𝑘 , Σ𝑘 , 𝜋𝑘}

Normalize by Fisher Kernel

Page 6: Lec12 review-part-i

HW-2 Deliverables

Functions:

[gmm]=getGMM(sift, A, kd, n); %gmm.m, gmm.v, gmm.p

[km]=getKMeans(sift, A, kd, n); % km: n x kd

[vlad]=getVladSift(im, km, opt); % opt = ‘sift’, or ‘dsft’

[fv]=getFVSift(im, gmm, opt); % can use vl_fisher()

Plots:

For each image: {sift, dsift} x {vlad, fv} x {kd=[24,32]} x{n=[64, 128]}, total 16 features per image, plot the 120x120 distances for each feature. [hint: use subplot(2,2,k), 4 plots per fig]

TPR-FPR: for each class, plot one figure with 16 plots,

mAP: one plot/table

Z. Li, Image Analysis & Retrv. 2016 p.6

Page 7: Lec12 review-part-i

Quiz-1

Covers what we reviewed today

Format:

True or False

Multiple Choices

Problem Solving

Extra Credit (25%)

Close Book, 1-page A4 cheating sheet allowed.

cheating sheet need to be turned in with name and id number

cheating sheet will also be graded for up to 10 pts

Relax

Quiz is actually on me.

Z. Li, Image Analysis & Retrv. 2016 p.7

Page 8: Lec12 review-part-i

Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM

Z. Li, Image Analysis & Retrv. 2016 p.8

Page 9: Lec12 review-part-i

Classification in Image Retrieval

A typical image retrieval pipeline

p.9

Image Formation

Feature Computing

Feature Aggregation

Classification

Knowledge/Data Base

Z. Li, Image Analysis & Retrv. 2016

Homography,Color space

Color histogramFiltering, Edge DetectionHoG, Harris Detector, SIFT

BowVLADFisher VectorSupervector

Page 10: Lec12 review-part-i

Image Formation - Geometry

Image Analysis & Retrieval, 2016 p.10

X0IKx

10100

000

000

1z

y

x

f

f

v

u

w

K

Slide Credit: GaTech Computer Vision class, 2015.

Intrinsic Assumptions• Unit aspect ratio

• Optical center at (0,0)• No skew of pixels

Extrinsic Assumptions• No rotation• Camera at (0,0,0)

X

x

Page 11: Lec12 review-part-i

Image Formation: Homography

In general, homography H maps 2d points according to, x’=Hx

Up to a scale, as [x, y, w]=[sx, sy, sw], so H has 8 DoF

Affine Transform: 6 DoF: Contains a translation [t1, t2], and invertable

affine matrix A[2x2]

Similarity Transform, 4 DoF: a rigid transform that preserves distance if

s=1:

Image Analysis & Retrieval, 2016 p.11

𝑥′𝑦′

𝑤′

=

ℎ1 ℎ2 ℎ3ℎ4 ℎ5 ℎ6ℎ7 ℎ8 ℎ9

𝑥𝑦𝑤

H =𝑎1 𝑎2 𝑡1𝑎3 𝑎4 𝑡20 0 1

H =𝑠 𝑐𝑜𝑠 𝜃 −𝑠 sin(𝜃) 𝑡1𝑠 𝑠𝑖𝑛 𝜃 𝑠 cos(𝜃) 𝑡2

0 0 1

Page 12: Lec12 review-part-i

VL_FEAT Implementation

VL_FEAT has an implementation:

p.12Z. Li, Image Analysis & Retrv. 2016

Page 13: Lec12 review-part-i

Image Formation – Color Space

RGB Color Space:

8 bit Red, Green, Blue Channels

Mixed together

YUV/YCbCr Color Space

HSV Color Space

Hue: color in polar coord

Saturation

Value

p.13Z. Li, Image Analysis & Retrv. 2016

Page 14: Lec12 review-part-i

Color Histogram – Fixed Codebook

Color Histogram:

Fixed code book to generate histogram

Example: MPEG 7 Scalable Color Descriptor

HSV Color Space, 16x4x4 partition for bins

Scalability thru quantization and Haar Transform

p.14Z. Li, Image Analysis & Retrv. 2016

Page 15: Lec12 review-part-i

Color Histogram – Adaptive Codebook

Adaptive Color Histogram

No fixed code book

Example: MPEG 7 Dominant Color Descriptor

Distance Metric

Not index-able

p.15Z. Li, Image Analysis & Retrv. 2016

Page 16: Lec12 review-part-i

Image Filters

Basic Filter Operations

Region of Support

p.16

1 1

1 1 1

1 1 1*

1/9 1/9 1/9

1/9 1/9 1/9

1/9 1/9 1/9

Z. Li, Image Analysis & Retrv. 2016

Page 17: Lec12 review-part-i

Filter Properties

Linear Filters are:

Shift-Invariant:

f(m-k, n-j)*h = g(m-k, n-j), if f*h=g

Associative:

f*h1*h2 = f*(h1*h2)

this can save a lot of complexity

Distributive:

f*h1 + f*h2 = f*(h1+h2)

useful in SIFT’s DoG filtering.

p.17Z. Li, Image Analysis & Retrv. 2016

Page 18: Lec12 review-part-i

Edge Detection

The need of smoothing

Gaussian smoothing

Combine differentiation and Gaussian

p.18

f

derivative of Gaussian (x)

Z. Li, Image Analysis & Retrv. 2016

Page 19: Lec12 review-part-i

Image Gradient from DoG Filtering

Image Gradient (at a scale):

p.19

The gradient points in the direction of most rapid increase in intensity

The edge strength is given by the gradient magnitude:

The gradient direction is given by:

Z. Li, Image Analysis & Retrv. 2016

Page 20: Lec12 review-part-i

HOG – Histogram of Oriented Gradients

Generate local histogram per block

Size of HoG: n*m*4*9=36nm, for n x m cells.

p.20

12

0

10

19747

11

0

10

19545

30251510

2525150

80707050

40303020

510105

1020205

Angle Magnitude

0

20

40

60

80

100

120

140

160

0-19 20-39 40-59 60-79 80-99 100-119 120-139 140-159 160-179

0

1

2

3

4

5

0-19 20-39 40-59 60-79 80-99 100-119 120-139 140-159 160-179

Binary voting Magnitude voting

4

9

4

1

3

9

3

1

2

9

2

1

1

9

1

1 ,...,,,...,,,...,,,..., hhhhhhhhf

HOG Cell size

Z. Li, Image Analysis & Retrv. 2016

Page 21: Lec12 review-part-i

Smoothed 2nd moment:

Harris Detector

p.21

)()(

)()()(),(

2

2

DyDyx

DyxDx

IDIIII

IIIg

1. Image derivatives (optionally blur first)

2. Square of derivatives

3. Gaussian filter g(I)

Ix Iy

Ix2

Iy2

IxIy

g(Ix2) g(Iy

2) g(IxIy)

222222 )]()([)]([)()( yxyxyx IgIgIIgIgIg

])),([trace()],(det[ 2

DIDIharrisR

4. Harris Function – detect corners, no need to compute the

eigen values

har

1 2

1 2

det

trace

M

M

Z. Li, Image Analysis & Retrv. 2016

Page 22: Lec12 review-part-i

Harris Detector Steps

Response function -> Threshold -> Local Maxima

Harris Detector NOT scale invariant

p.22

R(x,y)

Z. Li, Image Analysis & Retrv. 2016

Page 23: Lec12 review-part-i

Scale Space Theory - Lindeberg

Scale Space Response via Laplacian of Gaussian

The scale is controlled by 𝜎

Characteristic Scale:

p.23

2

2

2

22

y

g

x

gg

𝑔 = 𝑒− 𝑥+𝑦 2

2𝜎

r

image𝜎 = 0.8𝑟 𝜎 = 1.2𝑟 𝜎 = 2𝑟

characteristic scale

Z. Li, Image Analysis & Retrv. 2016

Page 24: Lec12 review-part-i

SIFT Detector via Difference of Gaussian

However, LoG computing is expensive – non-seperatable filter

SIFT: uses Difference of Gaussian filters to approximate LoG

Separate-able, much faster

Box Filter Approximation – you will see later.

p.24

LoG

Scale space construction By Gaussian Filtering, and Image Difference

Z. Li, Image Analysis & Retrv. 2016

Page 25: Lec12 review-part-i

Peak Strength & Edge Removal

Peak Strength:

Interpolate true DoG response and pixel location by Taylor expansion

Edge Removal:

Re-do Harris type detection to remove edge on much reduced pixel set

p.25Z. Li, Image Analysis & Retrv. 2016

Page 26: Lec12 review-part-i

Scale Invariance thru Dominant Orientation Coding

Voting for the dominant orientation

Weighted by a Gaussian window to give more emphasis to the gradients closer to the center

p.26Z. Li, Image Analysis & Retrv. 2016

Page 27: Lec12 review-part-i

SIFT Description

Rotate to dominant direction, divide Image Patch (16x16 pixels) around SIFT point into 4x4 Cells

For each cell, compute an 8-bin edge histogram

Overall, we have 4x4x8=128 dimension

p.27

0 2

angle histogram

Spatial-scale space extrema

Z. Li, Image Analysis & Retrv. 2016

Page 28: Lec12 review-part-i

SIFT Matching and Repeatability Prediction

SIFT Distance – True Love Test

Not all SIFT are created equal…

Peak strength (DoG response at interpolated position)

p.28

Combined scale/peak strength pmf

𝑑(𝑠11, 𝑠𝑘∗

2 )

𝑑(𝑠11, 𝑠𝑘

2)≤ 𝜃

Z. Li, Image Analysis & Retrv. 2016

Page 29: Lec12 review-part-i

Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM

Z. Li, Image Analysis & Retrv. 2016 p.29

Page 30: Lec12 review-part-i

Why Aggregation ?

Curse of Dimensionality

Decision Boundary / Indexing

p.30

+

…..

+

+

+

Z. Li, Image Analysis & Retrv. 2016

Page 31: Lec12 review-part-i

Histogram Aggregation: Bag-of-Words

Codebook:

Feature space: Rd, k-means to get k centroids, {𝜇1, 𝜇2, … , 𝜇𝑘}

BoW Hard Encoding:

For n feature points,{x1, x2, …,xn} assignment matrix: kxn, with column only 1-non zero entry

Aggregated dimension: k

p.31

k

n

Z. Li, Image Analysis & Retrv. 2016

Page 32: Lec12 review-part-i

Histogram Aggregation: Kernel Code Book Soft Encoding

Kernel Code Book Soft Encoding

Kernel Affinity: 𝐾 𝑥𝑗 , 𝜇𝑘 = 𝑒−𝑘|𝑥𝑗−𝜇𝑘|2

Assignment Matrix: 𝐴𝑗,𝑘 = 𝐾(𝑥𝑗 , 𝜇𝑘)/ 𝑘𝐾(𝑥𝑗 , 𝜇𝑘)

Encoding: k-dimensional: X(k)= 1

𝑛 𝑗𝐴𝑗,𝑘

p.32Z. Li, Image Analysis & Retrv. 2016

Page 33: Lec12 review-part-i

VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook

Hard assignment by finding the NN of feature {xk} to {𝜇𝑘}

Compute aggregated differences

L2 normalize

Final feature: k x d

p.33

3

x

v1 v2v3 v4

v5

1

4

2

5

① assign descriptors

② compute x- i

③ vi=sum x- i for cell i

𝑣𝑘 =

∀𝑗,𝑠.𝑡.𝑁𝑁 𝑥𝑗 =𝜇𝑘

𝑥𝑗 − 𝜇𝑘

𝑣𝑘 = 𝑣𝑘/ 𝑣𝑘 2

Z. Li, Image Analysis & Retrv. 2016

Page 34: Lec12 review-part-i

Fisher Vector Aggregation

FV encoding

Gradient on the mean, for GMM component k, j=1..D

In the end, we have 2K x D aggregation on the derivation w.r.t. the means and variances

p.34

𝐹𝑉 = 𝑢1, 𝑢2, … , 𝑢𝐾 , 𝑣1, 𝑣2, … , 𝑣𝐾𝑇

Richer characterization of the code book with GMM: {𝜇𝑘 , Σ𝑘 , 𝜋𝑘}

Normalize by Fisher Kernel

Z. Li, Image Analysis & Retrv. 2016

Page 35: Lec12 review-part-i

Supervector Aggregation

Assuming we have UBM GMM model

𝜆𝑈𝐵𝑀 = {𝑃𝑘 , 𝜇𝑘 , Σ𝑘},

with identical prior and covariance

Then for two utterance samples a and b, with GMM models 𝜆𝑎 = {𝑃𝑘 , 𝜇𝑘

𝑎 , Σ𝑘},

𝜆𝑏 = {𝑃𝑘 , 𝜇𝑘𝑏 , Σ𝑘},

The SV distance is,

It means the means of two models need to be normalized by the UBM covariance induced Mahanolibis distance metric

This is also a linear kernel function scaled by the UBM covariances

Z. Li, Image Analysis & Retrv. 2016 p.35

𝐾 𝜆𝑎 , 𝜆𝑏 =

𝑘

𝑃𝑘Σ𝑘−(

12)𝜇𝑘

𝑎

𝑇

( 𝑃𝑘Σ𝑘−(

12)𝜇𝑘

𝑏)

Page 36: Lec12 review-part-i

AKULA – Adaptive KLUster Aggregation

AKULA Descriptor:

No fixed codebook !

Outer loop: subspace optimization

Inner loop: cluster centroids + SIFT count

A1={yc11, yc1

2, …, yc1k ; pc1

1, pc12, …, pc1

k }

A2={yc21, yc2

2, …, yc2k ; pc2

1, pc22, …, pc2

k }

Distance metric:

Min centroids distance, weighted by SIFT count (kind of manifolds tangent distance)

d A1, A2 =1

𝑘

𝑗=0

𝑘d𝑚𝑖𝑛1 𝑗 𝑤𝑚𝑖𝑛

1 (𝑗) +1

𝑘

𝑖=0

𝑘d𝑚𝑖𝑛2 𝑖 𝑤𝑚𝑖𝑛

2 (𝑖)

d𝑚𝑖𝑛1 𝑗 = min

𝑖𝑑𝑗,𝑖

d𝑚𝑖𝑛2 𝑖 = min

𝑗𝑑𝑗,𝑖

w𝑚𝑖𝑛1 𝑗 = 𝑤𝑗,𝑖∗ , 𝑖∗ = 𝑎𝑟𝑔min

𝑖𝑑𝑗,𝑖

w𝑚𝑖𝑛2 𝑖 = 𝑤𝑗∗,𝑖 , 𝑗∗ = 𝑎𝑟𝑔min

𝑗𝑑𝑗,𝑖

p.36Z. Li, Image Analysis & Retrv. 2016

Page 37: Lec12 review-part-i

Image Retrieval Performance Metrics

True Positive Rate

False Positive Rate

Precision

Recall

F-Measure

p.37

TPR = TP/(TP+FN)

FPR = FP/(FP+TN)

Precision =TP/(TP + FP),

Recall = TP/(TP + FN)

F-measure = 2*(precision*recall)/(precision + recall)

Z. Li, Image Analysis & Retrv. 2016

Page 38: Lec12 review-part-i

Retrieval Performance: Mean Average Precision

mAP measures the retrieval performance across all queries

Image Analysis & Understanding, 2015 p.38

Page 39: Lec12 review-part-i

mAP example

MAP is computed across all query results as the average precision over the recall

Image Analysis & Understanding, 2015 p.39

Page 40: Lec12 review-part-i

Compute TPR-FPR

ROC Curve (Receiver Operating Characteristics)

p.40

Confusion/distance matrix: d(j,k)

Ground Truth: m(j,k)

𝑚 𝑗, 𝑘 = 1, 𝑑𝑜𝑐 𝑗 𝑎𝑛𝑑 𝑘 𝑎𝑟𝑒 𝑚𝑎𝑡𝑐ℎ0, 𝑛𝑜𝑡 𝑚𝑎𝑡𝑐ℎ

Z. Li, Image Analysis & Retrv. 2016

Page 41: Lec12 review-part-i

Classification in Image Retrieval

A typical image retrieval pipeline

p.41

Image Formation

Feature Computing

Feature Aggregation

Classification

Knowledge/Data Base

Z. Li, Image Analysis & Retrv. 2016

Page 42: Lec12 review-part-i

kNN Classifier

kNN is Nonlinear Classifier

Operations: majority vote

Performance bounds on k:

not worse off by 2 times the error rate of Bayesian error rate

As k increases, it improves, but not much

Accelerate NN retrieval operation

Kd-tree:

o a space partition scheme, to limit the number of data points comparison operation

Kd-tree supported kNN search:

o Approx KNN search

o Kd-Forest supported Approx Search

p.42Z. Li, Image Analysis & Retrv. 2016

Page 43: Lec12 review-part-i

Gaussian Generative Model Classifier

Shaped by the relative shape of Covariance Matrix

Matlab:

[rec_label, err]=classify(q, x, y, method);

Method = ‘quadratic’

Z. Li, Image Analysis & Retrv. 2016 p.43

Page 44: Lec12 review-part-i

The Support Vector Solution

Why called SVM ?

p.44Z. Li, Image Analysis & Retrv. 2016

K(xi, x)

Page 45: Lec12 review-part-i

Support Vector Machine

Matlab:

p.45

% train svm

svm = svmtrain(x, y, 'showplot',

true);

% svm classification

for k=1:5

figure(41); hold on; grid

on; axis([0 1 0 1]);

[u,v]=ginput(1);

q=[u,v];

plot(u, v, '*b');

rec = svmclassify(svm, q);

fprintf('\n recognized as

%c', rec);

end

Z. Li, Image Analysis & Retrv. 2016

Page 46: Lec12 review-part-i

Summary

Image Analysis & Retrieval Part I

Image Formation

Color and Texture Features

Interest Points Detection – Harris Detector

Invariant Interest Point Detection – SIFT

Aggregation

Classification tools: knn, gmm, svm

Performance metrics:

o tp, fp, tn, fn, tpr/precision, fpr, recall, mAP

Part II: Holistic approach

Just throw the pixels to the algorithm to figure out what are the good features.

Z. Li, Image Analysis & Retrv. 2016 p.46