Scene Classification
Thomas Atta-Fosu, Daniel Hafley,
December 15, 2014
Thomas Atta-Fosu, Daniel Hafley, Multilabel Classification Problem December 15, 2014 1 / 17
Motivation
Given a collection of images from different scenes (more than two), we wish to classify each image into the right category.
Figure: A collection of images to sort into different scenes
We consider a subset of 8 scenes from the SUN Dataset: Coast, Forest, Highway, Inside City, Mountain, Open Country/Countryside, Street, Tall Building.
(a) Coast (b) Forest (c) Highway (d) Inside City
(a) Mountain (b) Open Country (c) Street (d) Tall Building
Feature Vectors
Most state-of-the-art techniques use a 'bag of features' obtained from samples/statistics of filter responses of the image as a feature vector (gist, SIFT descriptors). We will not discuss the SIFT technique in this talk (see [David G. Lowe, 2004]); we discuss the gist scheme as outlined in [Oliva & Torralba, 2001].
36 Gabor filters
Example
(a) Image (b) One filter response (c) Patches
For each filter there are 16 mean values, one for each patch. Hence there are 576 (= 36 × 16) features for each image.
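The 576-dimensional gist-style descriptor (36 Gabor responses, each averaged over a 4×4 patch grid) can be sketched with numpy/scipy. The 6-scale × 6-orientation split, kernel size, and frequency schedule below are illustrative assumptions consistent with the filter count on the slide, not the exact bank used in the talk:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(freq, theta, sigma=4.0, size=15):
    """Real part of a Gabor filter at the given frequency and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinates
    envelope = np.exp(-(xr**2 + (-x * np.sin(theta) + y * np.cos(theta))**2)
                      / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr)

def gist_features(image, n_scales=6, n_orients=6, grid=4):
    """36 Gabor responses x 16 patch means = 576 features per image."""
    feats = []
    for s in range(n_scales):
        freq = 0.25 / (2 ** s)                   # one frequency per scale (assumed)
        for o in range(n_orients):
            theta = o * np.pi / n_orients
            resp = np.abs(fftconvolve(image, gabor_kernel(freq, theta), mode='same'))
            h, w = resp.shape
            # mean response magnitude in each cell of a grid x grid partition
            for i in range(grid):
                for j in range(grid):
                    patch = resp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
                    feats.append(patch.mean())
    return np.asarray(feats)

x = gist_features(np.random.rand(64, 64))
print(x.shape)   # (576,)
```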
Train and Test Set
A total of 2688 images from all 8 categories were downloaded from theSUN Dataset website.
800 training images: 100 images from each category (scene).
The remaining 1888 Images were used as test set.
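Assuming the scene labels are stored in a numpy array, the 100-per-category split above can be sketched as:

```python
import numpy as np

def split_by_category(labels, n_train=100, seed=0):
    """Take n_train random indices per category for training; the rest are test."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(train), np.array(test)

labels = np.repeat(np.arange(8), 336)   # 2688 images over 8 scenes (336 each)
tr, te = split_by_category(labels)
print(len(tr), len(te))                 # 800 1888
```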
What we tried
Shearlet coefficients, motivated by the work in [Torralba et al., 2003] (in progress)
Gaussian filters: did not perform so well (accuracy ≈ 40%)
Sampling specific pixel values in each patch
Gabor filters
Choosing a learner
K-Nearest Neighbor: did relatively well (accuracy ≈ 60%); generating the confusion matrix is very easy when KNN is used.
One-vs-All: we learn a classifier for each scene category, obtaining weights Wj for scene category j.
Logistic Regression: could not learn very well on the training sample.
SVM: performed very well, but the recall rate was poor (due to imbalance in the training set).
Modified SVM with 2 penalty terms: precision barely hurt; recall rate improved.
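One common way to realize the "two penalty terms" idea is to give the rare positive class its own misclassification cost; scikit-learn's class_weight scales C per class. A toy sketch, with made-up data and weights, mirroring the 100-vs-700 imbalance of a one-vs-all split of the 800-image training set:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical one-vs-all setup: 100 positives (one scene) vs 700 negatives.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 2.0, (100, 5)), rng.normal(-1.0, 2.0, (700, 5))])
y = np.array([1] * 100 + [0] * 700)

plain = LinearSVC(C=1.0).fit(X, y)                        # single penalty C
# Two penalty terms: class_weight scales C per class, so errors on the
# rare positive class cost 7x more than errors on the negatives.
weighted = LinearSVC(C=1.0, class_weight={1: 7.0, 0: 1.0}).fit(X, y)

def pos_recall(clf):
    """Recall on the positive (rare) class."""
    pred = clf.predict(X)
    return (pred[y == 1] == 1).mean()

print("plain recall:    %.2f" % pos_recall(plain))
print("weighted recall: %.2f" % pos_recall(weighted))
```

Upweighting the positive class shifts the decision boundary toward the majority class, trading a little precision for better recall, which matches the behavior reported on the slide.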
Metrics
Due to imbalance in the test set, we used precision and recall as our metrics. After learning in the one-vs-all scheme, precision and recall on each scene were computed.
Figure: Precision by scene (coast, forest, highway, insidecity, mountain, opencountry, street, tallbuilding), comparing the modified SVM against the normal SVM.
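The per-scene precision and recall plotted here come straight from a confusion matrix; a minimal numpy sketch, using a made-up 2×2 matrix for illustration:

```python
import numpy as np

def precision_recall(conf):
    """Per-class precision and recall from a confusion matrix
    (rows = true class, columns = predicted class)."""
    tp = np.diag(conf).astype(float)
    precision = tp / conf.sum(axis=0)   # TP / all predicted as that class
    recall = tp / conf.sum(axis=1)      # TP / all truly in that class
    return precision, recall

conf = np.array([[8, 2],
                 [4, 6]])
p, r = precision_recall(conf)
print(p)   # [0.6667, 0.75]
print(r)   # [0.8,    0.6 ]
```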
Metrics
Figure: Recall by scene (same eight scenes), comparing the modified SVM against the normal SVM.
Moving on: From One-vs-all to an All-inclusive scheme
The goal is to predict the type of scene to which an image belongs.
Use a multiclass SVM
Use a Hierarchical scheme
Use a Voting Scheme (Motivation: Next slide)
Classification Issues
In the one-vs-all scheme, an image could be labeled into two or more categories, i.e. for some test image i, Wj · Xi + bj > 0 for at least two values of j (scenes). Alternatively, an image may not be classified into any of the scenes at all.
In such cases, we propose the following voting rule:
Choose scene j such that j = argmax_j (Wj · X + bj)
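The voting rule takes only a few lines, with W stacking the one-vs-all weight vectors row-wise. The 3-scene numbers below are hypothetical; note that two discriminants are positive, yet a single scene is still chosen:

```python
import numpy as np

def predict_scene(W, b, x):
    """Resolve one-vs-all conflicts by picking the scene with the largest
    discriminant value Wj . x + bj, even if several (or none) are positive."""
    scores = W @ x + b
    return int(np.argmax(scores))

# Hypothetical 3-scene example on 2-D features: scenes 0 and 1 both fire.
W = np.array([[ 1.0, 0.0],
              [ 0.5, 0.5],
              [-1.0, 0.2]])
b = np.array([-0.2, 0.1, 0.0])
x = np.array([0.6, 0.8])
print(predict_scene(W, b, x))   # 1  (scores are [0.4, 0.8, -0.44])
```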
An Instance
Tall Building
Figure: Classified into 2 scenes
Table: Discriminant values for all 8 scenes

Scene          W · X + b
Coast           -41.2944
Forest          -15.6873
Highway         -17.8875
InsideCity        0.1614
Mountain        -16.4401
OpenCountry     -46.2454
Street           -3.1908
TallBuilding     23.0118
Results: Confusion Matrix
               c    f    h   ic    m   oc    s   tb
Coast        192    3   14    0    9   41    1    0
Forest         1  201    0    0   15    7    2    2
Highway       13    0  117    6    7   16    1    0
InsideCity     3    3    3  154    2    2   18   23
Mountain       5   17    8    6  188   44    0    6
OpenCountry   55   10   13    7   22  200    2    1
Street         0    3    5   14   10    3  151    6
TallBuilding   3    0    4   31   11    9    8  190
(rows: true scene; columns: predicted scene, abbreviated in the same order)
Accuracy ≈ 74%
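The ≈ 74% figure is the trace of the confusion matrix divided by its total; checking it directly from the table above:

```python
import numpy as np

# Confusion matrix from the slide (rows = true scene, cols = predicted scene)
conf = np.array([
    [192,   3,  14,   0,   9,  41,   1,   0],  # Coast
    [  1, 201,   0,   0,  15,   7,   2,   2],  # Forest
    [ 13,   0, 117,   6,   7,  16,   1,   0],  # Highway
    [  3,   3,   3, 154,   2,   2,  18,  23],  # InsideCity
    [  5,  17,   8,   6, 188,  44,   0,   6],  # Mountain
    [ 55,  10,  13,   7,  22, 200,   2,   1],  # OpenCountry
    [  0,   3,   5,  14,  10,   3, 151,   6],  # Street
    [  3,   0,   4,  31,  11,   9,   8, 190],  # TallBuilding
])
accuracy = np.trace(conf) / conf.sum()
print("%d test images, accuracy = %.1f%%" % (conf.sum(), 100 * accuracy))
# 1888 test images, accuracy = 73.8%
```

The matrix total matches the 1888-image test set, confirming the rounding on the slide.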
Discussions & Conclusions
The choice and number of features for images affect the performance of a classifier.
Practical considerations have to be made when choosing the learning algorithm for image classification.
In a one-vs-all learning scheme, new decision rules may have to be devised (this extends to other learning problems).
Thank you
Questions?
References
A. Torralba, K. P. Murphy, W. T. Freeman, M. A. Rubin (2003). Context-based Vision System for Place and Object Recognition. MIT AI Memo 2003-005.

A. Bosch, A. Zisserman, X. Munoz (2008). Scene Classification Using a Hybrid Generative/Discriminative Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, April 2008.

A. Oliva, A. Torralba (2001). Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision 42(3), 147-175, 2001.

D. G. Lowe (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60(2), 91-110, 2004.