
Does one size really fit all? Evaluating classifiers in Bag-of-Visual-Words classification

Christian Hentschel, Harald Sack

Hasso Plattner Institute

Agenda

1. Content-based Image Classification – Motivation

2. Bag-of-Visual-Words

3. Bag-of-Visual-Words Classification

■ Classifier Evaluation

■ Model Visualization

4. Conclusion

Content-based Image Classification

Christian Hentschel, 09-18-2014

Training:

■ Positive images (that depict a concept)

■ Negative images (that don’t)

Classification:

■ Test whether an image depicts the concept (or not)

Content-based Image Classification (2)

■ Origin: text classification

□ Example task: classify forum posts as “insult” (positive) or “not insult” (negative)

Bag-of-Visual-Words

D1: "haha...at least get your insults straight you idiot!!...."

D2: "You're one of my favorite commenters."

Vocabulary: { “idiot”: 1, “favorite”: 2, “to”: 3, “you”: 4, “at”: 5, “least”: 6, “commenter”: 7, …}

Vectors: [1, 2, 1, 1, 2, 0, 0,…] and [1, 1, 1, 1, 0, 1, 1,…]
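The bag-of-words step above can be sketched in Python as a word count over a fixed vocabulary. This is a minimal sketch under assumptions not stated on the slide: tokens are lowercase alphabetic runs, there is no stemming, and the vocabulary uses 0-based column indices. It is not meant to reproduce the exact vectors shown in the original figure.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (assumed tokenizer, no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text, vocabulary):
    """Map a document to a count vector; vocabulary maps word -> 0-based column index."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, index in vocabulary.items():
        vector[index] = counts[word]  # Counter returns 0 for unseen words
    return vector


vocabulary = {"idiot": 0, "favorite": 1, "to": 2, "you": 3,
              "at": 4, "least": 5, "commenter": 6}
d1 = "haha...at least get your insults straight you idiot!!...."
d2 = "You're one of my favorite commenters."
print(bag_of_words(d1, vocabulary))  # [1, 0, 0, 1, 1, 1, 0]
print(bag_of_words(d2, vocabulary))  # [0, 1, 0, 1, 0, 0, 0]
```

Without stemming, “commenters” in the second post does not match the vocabulary entry “commenter”, so that column stays 0; this is one reason text pipelines often normalize word forms before counting.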

■ Learn a decision rule (e.g. linear SVM)

□ i.e., learn the feature weights

Bag-of-Visual-Words (2)

[Adopted from A. Mueller, https://github.com/amueller/ml-berlin-tutorial]

[Figure: feature weights]
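Learning the feature weights of a linear SVM can be illustrated with a Pegasos-style stochastic subgradient solver for the L2-regularized hinge loss. This is a minimal pure-Python sketch, not the solver used in the talk; the function names, the toy two-word data, and folding the bias into the weight vector (so that it is regularized too) are assumptions made for brevity.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos-style SGD on lam/2 * ||w||^2 + mean hinge loss.

    X: list of feature vectors (e.g. bag-of-words counts), y: labels in {-1, +1}.
    Assumption: a constant 1.0 is appended to each vector, so the last weight
    acts as a bias that is regularized like the other weights.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            hinge_active = y[i] * dot(w, data[i]) < 1  # margin violated?
            # Gradient step on the regularizer shrinks all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if hinge_active:
                # Subgradient of the hinge loss moves w toward y_i * x_i.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Linear decision rule: sign(w . [x, 1])."""
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


# Toy data: column 0 counts an "insult" word, column 1 a "friendly" word.
X = [[2, 0], [1, 0], [0, 2], [0, 1]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print(w)  # learned feature weights (last entry: bias)
print([predict(w, x) for x in X])
```

The sign and magnitude of each learned weight indicate how strongly a word pushes a document toward the positive or the negative class, which is the kind of information a feature-weight plot conveys.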

■ Examples of visual words

Bag-of-Visual-Words (3)

[Schmid, 2013]

Bag-of-Visual-Words (4)
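A typical Bag-of-Visual-Words pipeline clusters local descriptors into a visual vocabulary and then represents each image as a histogram of visual-word occurrences. The sketch below assumes plain k-means (Lloyd's algorithm) for the vocabulary and hard nearest-centroid assignment; real systems use high-dimensional descriptors such as SIFT, so the toy 2-D points here are purely illustrative and all function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, d):
    """Index of the visual word (centroid) closest to descriptor d."""
    return min(range(len(centroids)), key=lambda k: squared_distance(centroids[k], d))


def build_codebook(descriptors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means: the k centroids form the visual vocabulary."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centroids, d)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster runs empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bovw_histogram(image_descriptors, centroids):
    """Quantize each local descriptor to its nearest visual word and count occurrences."""
    hist = [0] * len(centroids)
    for d in image_descriptors:
        hist[nearest(centroids, d)] += 1
    return hist


# Toy descriptors pooled from training images (hypothetical 2-D stand-ins).
descriptors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
vocabulary = build_codebook(descriptors, k=2)
image = [[0, 0.5], [10, 10.5], [10.5, 10]]
print(bovw_histogram(image, vocabulary))  # one word counted once, the other twice
```

The resulting histogram plays the same role for an image as the word-count vector does for a text document, so the same classifiers can be trained on it.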

■ De-facto standard: kernel-based Support Vector Machines

□ Decision rule:

□ Kernel function:

□ Distance metric:
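The formulas for this slide were lost in extraction. For reference, a standard formulation of a kernel SVM with a distance-based (generalized Gaussian) kernel is given below; it is an assumed reconstruction, and the original slide may use different notation or normalization (for example, some definitions of the χ² distance include a factor of 1/2).

```latex
% Decision rule: support vectors x_i, labels y_i \in \{-1,+1\}, multipliers \alpha_i, bias b
f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i=1}^{N_{SV}} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big)

% Kernel function built on a distance metric D, with width parameter \gamma > 0
K(\mathbf{x}, \mathbf{z}) = \exp\big( -\gamma \, D(\mathbf{x}, \mathbf{z}) \big)

% Distance metric: squared Euclidean (Gaussian RBF) or chi-square (Chi2 kernel);
% chi-square terms with x_k + z_k = 0 are taken as 0
D_{\mathrm{RBF}}(\mathbf{x}, \mathbf{z}) = \sum_{k} (x_k - z_k)^2
D_{\chi^2}(\mathbf{x}, \mathbf{z}) = \sum_{k} \frac{(x_k - z_k)^2}{x_k + z_k}
```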

Bag-of-Visual-Words Classification

■ Testing different classification models

□ Metric: Average Precision (AP, area under the precision-recall curve)

■ Test dataset

□ Caltech-101

– 100 + 1 object classes

– 31 to 800 images per class

■ Tested classifiers:

□ Naïve Bayes, k-NN, Logistic Regression

□ SVM: linear SVM, RBF kernel SVM, Chi2-kernel SVM

□ Ensemble methods: Random Forest, AdaBoost

□ Hyperparameters optimized by grid search with cross-validation (CV)
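Average precision for one class can be computed from a ranked list of classifier scores. The sketch below uses the common non-interpolated definition (the mean of the precision values at the rank of each positive image); the slide does not say whether this or an interpolated variant was used, so treat the choice as an assumption. mAP is then the mean of the per-class AP values.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of the precision values at the rank of each positive.

    scores: classifier confidences, labels: 1 for positive, 0 for negative.
    Ties in the scores are broken by input order (a simplification).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_pos if n_pos else 0.0


def mean_average_precision(per_class):
    """mAP: average of per-class AP values; per_class is a list of (scores, labels)."""
    aps = [average_precision(s, l) for s, l in per_class]
    return sum(aps) / len(aps)


scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2 ≈ 0.833
```

Because AP only depends on the ranking of the test images, it rewards classifiers that place positives ahead of negatives regardless of the absolute score scale.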

Bag-of-Visual-Words Classification (2)

■ Mean AP scores over all classes:

□ Naïve Bayes: 0.48

□ k-NN: 0.52

□ Logistic Regression: 0.55

□ Linear SVM: 0.55

□ RBF kernel SVM: 0.59

□ Random Forest: 0.61

□ AdaBoost: 0.63

□ Chi2-kernel SVM: 0.67

Bag-of-Visual-Words Classification – Results

■ mAP difference between the best (Chi2-SVM) and the worst (Naïve Bayes) classifier: 0.19

□ Poor performance of Naïve Bayes and k-NN, but fast training

■ Superior performance of kernel-based SVMs, but:

□ The kernel function (Chi2 vs. Gaussian RBF) is crucial:

– Ensemble methods outperform the Gaussian RBF SVM

– The Gaussian RBF SVM is only slightly better than the linear SVM

□ Increased evaluation time:

– a complex kernel function must be evaluated between each support vector (SV) and every test example

– ensemble methods reduce classification time
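The evaluation cost follows from the kernel SVM decision function: every test example has to be compared with every support vector. Below is a minimal sketch with an exponential χ² kernel, assuming the support vectors, the coefficients (α_i · y_i) and the bias come from an already trained model; the values in the usage lines are made up for illustration and do not come from a trained classifier.

```python
import math


def chi2_distance(x, z):
    """Chi-square distance between two histograms (terms with x_k + z_k = 0 are skipped)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, z) if a + b > 0)


def chi2_kernel(x, z, gamma=1.0):
    """Exponential chi-square kernel K(x, z) = exp(-gamma * D(x, z))."""
    return math.exp(-gamma * chi2_distance(x, z))


def kernel_svm_decision(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b.

    One kernel evaluation per support vector, each costing O(vocabulary size),
    so classifying one image costs O(#SV * vocabulary size).
    """
    return sum(c * chi2_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical model: two support vectors with coefficients +1 and -1.
svs = [[2, 0], [0, 2]]
coefs = [1.0, -1.0]
print(kernel_svm_decision([2, 0], svs, coefs, 0.0))  # positive: closer to the first SV
print(kernel_svm_decision([0, 2], svs, coefs, 0.0))  # negative: closer to the second SV
```

By contrast, a tree ensemble classifies an example with one root-to-leaf traversal per tree, independent of the number of training examples, which is consistent with the faster evaluation of the ensemble methods.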

Bag-of-Visual-Words Classification – Results (2)

■ Correlation between training-set size and average precision
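One way to quantify the relation between per-class training-set size and AP is Pearson's correlation coefficient. The slide does not state which correlation measure was used, so the sketch assumes Pearson's r; the inputs would be the number of training images and the AP of each class, one entry per class.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy check (not data from the talk): a perfectly linear relation gives r close to 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A value of r near +1 would mean that classes with more training images tend to reach higher AP; classes far from the overall trend are natural candidates for closer inspection.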

Bag-of-Visual-Words Classification – Results (3)

■ Outliers:

□ “minaret”

□ “leopards”

Bag-of-Visual-Words Classification – Results (4)

■ Visualize the impact of individual image regions on the classification result

□ Use ensemble methods

– No kernel function

– AdaBoost: direct indicator of feature importance (mean decrease in impurity)
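Mean decrease in impurity (MDI) sums, for every feature, the weighted impurity reduction of all tree splits that use it, normalizes per tree, and averages over the ensemble. Below is a minimal sketch using Gini impurity, assuming a hypothetical flat representation of each tree as a list of splits with their class counts; the optional per-tree weights stand in for boosting weights and are an assumption, not a detail given on the slide. In BoVW each feature is a visual word, so the result is one importance value per visual word.

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def impurity_decrease(parent, left, right, n_total):
    """Weighted impurity decrease of one split; parent/left/right are class-count lists."""
    n_p, n_l, n_r = sum(parent), sum(left), sum(right)
    return (n_p / n_total) * (gini(parent)
                              - (n_l / n_p) * gini(left)
                              - (n_r / n_p) * gini(right))


def mean_decrease_in_impurity(trees, n_features, tree_weights=None):
    """MDI feature importance for an ensemble of trees.

    Each tree is a list of splits (feature_index, parent_counts, left_counts, right_counts);
    the first split is assumed to be the root, so its sample count is the tree's total.
    Per-tree importances are normalized, then averaged with the (optional) tree weights.
    """
    tree_weights = tree_weights or [1.0] * len(trees)
    total = [0.0] * n_features
    for splits, weight in zip(trees, tree_weights):
        n_total = sum(splits[0][1])
        imp = [0.0] * n_features
        for feature, parent, left, right in splits:
            imp[feature] += impurity_decrease(parent, left, right, n_total)
        s = sum(imp)
        for f in range(n_features):
            total[f] += weight * (imp[f] / s if s else 0.0)
    weight_sum = sum(tree_weights)
    return [v / weight_sum for v in total]


# Two hypothetical decision stumps over a 3-word vocabulary.
stumps = [[(0, [5, 5], [5, 0], [0, 5])], [(1, [4, 4], [3, 1], [1, 3])]]
print(mean_decrease_in_impurity(stumps, 3, tree_weights=[3.0, 1.0]))  # [0.75, 0.25, 0.0]
```

Unlike the coefficients of a kernel SVM, these importances live directly in the visual-word space, which is what makes them usable for visualization without a kernel function in between.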

Bag-of-Visual-Words Classification – Model Visualization

[Figure: Local Region, Descriptor, BoVW Vector, Feature Weights]
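To turn per-visual-word importances into a region-level visualization, each local region can inherit the importance of the visual word its descriptor was assigned to, accumulated into a heat map over the image. This is a sketch of the general idea under assumptions (axis-aligned rectangular regions, additive accumulation, a hypothetical function name), not necessarily how the figures in the talk were produced.

```python
def region_importance_map(regions, word_ids, importances, width, height):
    """Accumulate visual-word importances into a per-pixel heat map.

    regions: list of (x, y, w, h) boxes of local descriptors (hypothetical format);
    word_ids: index of the visual word each region's descriptor was quantized to;
    importances: one importance value per visual word (e.g. MDI values).
    """
    heat = [[0.0] * width for _ in range(height)]
    for (x, y, w, h), word in zip(regions, word_ids):
        # Clip the box to the image and add the word's importance to every pixel in it.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                heat[row][col] += importances[word]
    return heat


heat = region_importance_map(
    regions=[(0, 0, 2, 2), (1, 1, 2, 2)],
    word_ids=[0, 1],
    importances=[0.5, 0.25],
    width=4,
    height=3,
)
for row in heat:
    print(row)
```

Overlaying such a map on the image highlights which regions drive the decision, for example whether the model relies on the object itself or on background context.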

“minaret”

■ “leopards”

■ “minaret”

■ “car_side”

■ “watch”

■ Kernel-based SVMs are the best choice when aiming for accuracy

□ The kernel function is crucial

□ Evaluation time is high

■ Ensemble methods are the runner-up

□ Fast evaluation

□ Offer intuitive visualization of model parameters

■ Visual analytics reveal deficiencies in datasets

□ Improperly chosen training data affects classification results

Conclusion

Thank you for your attention!

Christian Hentschel, Harald Sack

Hasso Plattner Institute