Programme 2pm Introduction –Andrew Zisserman, Chris Williams 2.10pm Overview of the challenge and...

Programme
• 2pm Introduction
– Andrew Zisserman, Chris Williams
• 2.10pm Overview of the challenge and results
– Mark Everingham (Oxford)
• 2.40pm Session 1: The Classification Task
– Frederic Jurie presenting work by
    • Jianguo Zhang (INRIA) 20 mins
    • Frederic Jurie (INRIA) 20 mins
– Thomas Deselaers (Aachen) 20 mins
– Jason Farquhar (Southampton) 20 mins
• 4-4.30pm Coffee break
• 4.30pm Session 2: The Detection Task
– Stefan Duffner/Christophe Garcia (France Telecom) 30 mins
– Mario Fritz (Darmstadt) 30 mins
• 5.30pm Discussion
– Lessons learnt, and future challenges

The PASCAL Visual Object Classes Challenge

Mark Everingham
Luc Van Gool
Chris Williams
Andrew Zisserman

Challenge

• Four object classes
– Motorbikes
– Bicycles
– People
– Cars
• Classification
– Predict object present/absent
• Detection
– Predict bounding boxes of objects

Competitions

• Train on any (non-test) data
– How well do state-of-the-art methods perform on these problems?
– Which methods perform best?
• Train on supplied data
– Which methods perform best given specified training data?

Data sets

• train, val, test1
– Sampled from the same distribution of images
– Images taken from PASCAL image databases
– “Easier” challenge
• test2
– Freshly collected for the challenge (mostly Google Images)
– “Harder” challenge

Training and first test set

train+val

Class        Images   Objects
Motorbikes   214      217
Bicycles     114      123
People       84       152
Cars         272      320
Total        684

test1

Class        Images   Objects
Motorbikes   216      220
Bicycles     114      123
People       84       149
Cars         275      341
Total        689

Example images

Second test set


test2

Class        Images   Objects
Motorbikes   202      227
Bicycles     279      399
People       526      1038
Cars         275      381
Total        1282

Example images

Annotation for training

• Object class present/absent

• Sub-class labels (partial)
– Car side, Car rear, etc.

• Bounding boxes

• Segmentation masks (partial)

Issues in ground truth

• What objects should be considered detectable?
– Subjective judgement by size in image, level of occlusion, detection without ‘inference’
    • Disagreements will cause noise in evaluation, i.e. incorrectly-judged false positives
• “Errors” in training data
– Un-annotated objects
    • Requires machine learning algorithms robust to noise on class labels
– Inaccurate bounding boxes
    • Hard to specify for some instances, e.g. bicycles
    • Detection threshold was set “liberally”

Results: Classification

Participants (entries by class for test1 and test2)

Participant   test1: Motorbikes Bicycles People Cars   test2: Motorbikes Bicycles People Cars

Aachen

Darmstadt

Edinburgh

FranceTelecom

HUT

INRIA: dalal

INRIA: dorko

INRIA: jurie

INRIA: zhang

METU

MPITuebingen

Southampton

Methods

• Interest points (LoG/Harris) + patches/SIFT
– Histogram of clustered descriptors
    • SVM: INRIA: Dalal, INRIA: Zhang
    • Log-linear model: Aachen
    • Logistic regression: Edinburgh
    • Other: METU
– No clustering step
    • SVM with other kernels: MPITuebingen, Southampton
– Additional features
    • Color: METU, moments: Southampton
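The "histogram of clustered descriptors" representation can be illustrated with a minimal sketch. It assumes the cluster centres have already been learned (for example by k-means) and that each local descriptor is a plain tuple of numbers; the function names and toy data are hypothetical and do not come from any participant's code.

```python
def nearest_centre(descriptor, centres):
    """Index of the cluster centre closest to the descriptor (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, centre in enumerate(centres):
        dist = sum((x - y) ** 2 for x, y in zip(descriptor, centre))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bag_of_features(descriptors, centres):
    """Normalised histogram of cluster assignments for one image."""
    counts = [0] * len(centres)
    for descriptor in descriptors:
        counts[nearest_centre(descriptor, centres)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(centres)  # image with no interest points
    return [c / total for c in counts]


# Toy example: two 2-D "visual words" and five descriptors from one image.
centres = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0), (9.0, 9.0), (11.0, 10.0)]
histogram = bag_of_features(descriptors, centres)
print(histogram)
```

The resulting fixed-length vector is what a classifier such as an SVM or logistic regression would take as input, whatever the number of interest points found in the image.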

Methods

• Image segmentation and region features: HUT
– MPEG-7 color, shape, etc.
– Self organizing map
• Classification by detection: Darmstadt
– Generalized Hough transform/SVM verification

Evaluation

• Receiver Operating Characteristic (ROC)
– Equal Error Rate (EER)
– Area Under Curve (AUC)

[Figure: example ROC curve, True Positives vs. False Positives, with the EER point and AUC marked]
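A minimal sketch of how these ROC summaries could be computed from per-image classifier confidences. The function names and toy data are illustrative and are not the challenge software. It assumes the EER is quoted as the true positive rate at the point where the false positive rate equals 1 minus the true positive rate; that reading would be consistent with the high "Max EER" values reported in these slides, but the slides do not state it.

```python
def roc_curve(scores, labels):
    """ROC points (fpr, tpr) from sweeping a threshold from high to low.

    labels are truthy for images containing the class.
    Assumes at least one positive and one negative image.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Images with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


def equal_error_rate(fpr, tpr):
    """True positive rate where the curve crosses the line tpr = 1 - fpr."""
    for k in range(1, len(fpr)):
        d0 = tpr[k - 1] + fpr[k - 1] - 1
        d1 = tpr[k] + fpr[k] - 1
        if d0 <= 0 <= d1:
            if d1 == d0:
                return tpr[k]
            t = -d0 / (d1 - d0)  # linear interpolation along the segment
            return tpr[k - 1] + t * (tpr[k] - tpr[k - 1])
    return None


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
fpr, tpr = roc_curve(scores, labels)
print(area_under_curve(fpr, tpr), equal_error_rate(fpr, tpr))
```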

[Figure: ROC curves (true positives vs. false positives), 1.1: classification: test1: motorbikes. Legend: INRIA: jurie: dcb_p2; Southampton: pascal_develtest; INRIA: jurie: dcb_p1; INRIA: zhang: prediction; Southampton: UoS_LoG.SIFT.PLS20ppker; Aachen: motorbikes-test1-n1st-1024; Southampton: UoS_mhar.aff.SIFT.PLS20ppker; Aachen: motorbikes-test1-ms-2048-histo; HUT: hut_final1; HUT: hut_final2; HUT: hut_final3; METU: ms_metu; HUT: hut_final4; MPITuebingen: Pascal_FINAL_test1; Darmstadt: ISMSVMbig3; Darmstadt: ISMbig3; Edinburgh: Edinburgh_C_bagoffeatures_train]

Competition 1: train+val/test1

• 1.1: Motorbikes

• Max EER: 0.977 (INRIA: Jurie)

[Figure: ROC curves (true positives vs. false positives), 1.2: classification: test1: bicycles. Legend: INRIA: jurie: dcb_p2; INRIA: zhang: prediction; INRIA: jurie: dcb_p1; Southampton: pascal_develtest; Aachen: bicycles-test1-n1st-1024; Southampton: UoS_LoG.SIFT.PLS20ppker; Southampton: UoS_mhar.aff.SIFT.PLS20ppker; Aachen: bicycles-test1-ms-2048-histo; HUT: hut_final2; HUT: hut_final1; HUT: hut_final3; METU: ms_metu; HUT: hut_final4; MPITuebingen: Pascal_FINAL_test1; Edinburgh: Edinburgh_C_bagoffeatures_train]


• 1.2: Bicycles

• Max EER: 0.930 (INRIA: Jurie, INRIA: Zhang)

[Figure: ROC curves (true positives vs. false positives), 1.3: classification: test1: people. Legend: INRIA: jurie: dcb_p1; INRIA: zhang: prediction; INRIA: jurie: dcb_p2; Southampton: pascal_develtest; Aachen: people-test1-ms-2048-histo; Aachen: people-test1-n1st-1024; HUT: hut_final4; HUT: hut_final1; HUT: hut_final3; Southampton: UoS_mhar.aff.SIFT.PLS20ppker; HUT: hut_final2; Southampton: UoS_LoG.SIFT.PLS20ppker; METU: ms_metu; MPITuebingen: Pascal_FINAL_test1; Edinburgh: Edinburgh_C_bagoffeatures_train]


• 1.3: People

• Max EER: 0.917 (INRIA: Jurie, INRIA: Zhang)

[Figure: ROC curves (true positives vs. false positives), 1.4: classification: test1: cars. Legend: INRIA: jurie: dcb_p1; INRIA: jurie: dcb_p2; INRIA: zhang: prediction; Aachen: cars-test1-ms-2048-histo; Aachen: cars-test1-n1st-1024; Southampton: pascal_develtest; HUT: hut_final4; HUT: hut_final2; Southampton: UoS_mhar.aff.SIFT.PLS20ppker; Southampton: UoS_LoG.SIFT.PLS20ppker; HUT: hut_final1; HUT: hut_final3; METU: ms_metu; MPITuebingen: Pascal_FINAL_test1; Edinburgh: Edinburgh_C_bagoffeatures_train; Darmstadt: ISMSVMbig4; Darmstadt: ISMbig4]


• 1.4: Cars

• Max EER: 0.961 (INRIA: Jurie)

[Figure: ROC curves (true positives vs. false positives), 2.1: classification: test2: motorbikes. Legend: INRIA: zhang: prediction; Aachen: motorbikes-test2-n1st-1024; Aachen: motorbikes-test2-ms-2048-histo; Edinburgh: Edinburgh_C_bagoffeatures_train; MPITuebingen: Pascal_FINAL_test2; Darmstadt: ISMSVMbig3; Darmstadt: ISMbig3; HUT: hut_final4; HUT: hut_final2; HUT: hut_final1; HUT: hut_final3]


• 2.1: Motorbikes

• Max EER: 0.798 (INRIA: Zhang)

[Figure: ROC curves (true positives vs. false positives), 2.2: classification: test2: bicycles. Legend: INRIA: zhang: prediction; Aachen: bicycles-test2-ms-2048-histo; Aachen: bicycles-test2-n1st-1024; MPITuebingen: Pascal_FINAL_test2; HUT: hut_final4; HUT: hut_final2; Edinburgh: Edinburgh_C_bagoffeatures_train; HUT: hut_final1; HUT: hut_final3]


• 2.2: Bicycles


[Figure: ROC curves (true positives vs. false positives), 2.3: classification: test2: people. Legend: INRIA: zhang: prediction; Aachen: people-test2-n1st-1024; Aachen: people-test2-ms-2048-histo; HUT: hut_final2; HUT: hut_final1; MPITuebingen: Pascal_FINAL_test2; HUT: hut_final4; HUT: hut_final3; Edinburgh: Edinburgh_C_bagoffeatures_train]


• 2.3: People


[Figure: ROC curves (true positives vs. false positives), 2.4: classification: test2: cars. Legend: INRIA: zhang: prediction; Aachen: cars-test2-n1st-1024; Aachen: cars-test2-ms-2048-histo; HUT: hut_final4; MPITuebingen: Pascal_FINAL_test2; HUT: hut_final2; Darmstadt: ISMSVMbig4; HUT: hut_final1; HUT: hut_final3; Edinburgh: Edinburgh_C_bagoffeatures_train; Darmstadt: ISMbig4]


• 2.4: Cars


Classes and test1 vs. test2

• Mean EER of ‘best’ results across classes
– test1: 0.946, test2: 0.741

[Figure: bar chart comparing test1 and test2 by class (Motorbikes, Bicycles, People, Cars); vertical axis from 0.6 to 1]
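As a quick check of the test1 figure, the mean can be recomputed from the four "Max EER" values quoted on the Competition 1 slides. The dictionary below only restates those values; the test2 mean cannot be checked the same way because only the motorbikes value is given here.

```python
# Best EER per class on test1, as quoted on the Competition 1 slides.
best_eer_test1 = {
    "Motorbikes": 0.977,
    "Bicycles": 0.930,
    "People": 0.917,
    "Cars": 0.961,
}

mean_eer_test1 = sum(best_eer_test1.values()) / len(best_eer_test1)
print(round(mean_eer_test1, 3))  # 0.946
```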

Conclusions?

• Interest points + SIFT + clustering (histogram) + SVM did ‘best’
– Log-linear model (Aachen) a close second
– Results with SVM (INRIA) significantly better than with logistic regression (Edinburgh)
• Method using detection (Darmstadt) did not do so well
– Cannot exploit context (= unintended bias?) of image
– Used subset of training data and is able to localize

Competitions 3 & 4

• Classification

• Any (non-test) training data to be used

• No entries submitted

Results: Detection

Participants (entries by class for test1 and test2)

Participant   test1: Motorbikes Bicycles People Cars   test2: Motorbikes Bicycles People Cars

Aachen

Darmstadt

Edinburgh

FranceTelecom

HUT

INRIA: dalal

INRIA: dorko

INRIA: jurie

INRIA: zhang

METU

MPITuebingen

Southampton

Methods

• Generalized Hough Transform
– Interest points, clustered patches/descriptors, GHT
    • Darmstadt: (SVM verification stage), side views with segmentation mask used for training
    • INRIA: Dorko: SIFT features, semi-supervised clustering, single detection per image
• “Sliding window” classifiers
– Exhaustive search over translation and scale
    • FranceTelecom: Convolutional neural network
    • INRIA: Dalal: SVM with SIFT-based input representation
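The "exhaustive search over translation and scale" can be sketched as a generator of candidate windows, each of which would then be scored by the classifier. This is only an illustration: the base window size, the scale set, and the stride are hypothetical parameters, and the participants' actual search strategies may differ (for example by rescaling the image rather than the window).

```python
def sliding_windows(image_w, image_h, base_w, base_h,
                    scales=(1.0, 1.5, 2.0), stride_frac=0.25):
    """Yield candidate boxes (xmin, ymin, xmax, ymax) over position and scale.

    The stride is a fraction of the window size, so larger windows move
    in larger steps. Windows that do not fit inside the image are skipped.
    """
    for scale in scales:
        w = int(round(base_w * scale))
        h = int(round(base_h * scale))
        if w > image_w or h > image_h:
            continue
        step_x = max(1, int(w * stride_frac))
        step_y = max(1, int(h * stride_frac))
        for y in range(0, image_h - h + 1, step_y):
            for x in range(0, image_w - w + 1, step_x):
                yield (x, y, x + w, y + h)


# Toy example: 100x100 image, 50x50 base window, two scales, half-window stride.
windows = list(sliding_windows(100, 100, 50, 50, scales=(1.0, 2.0), stride_frac=0.5))
print(len(windows))
```

The number of windows grows quickly with finer strides and more scales, which is why the per-window classifier needs to be cheap to evaluate.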

Methods

• Baselines: Edinburgh
– Detection confidence
    • Class prior probability
    • Whole-image classifier (SIFT + logistic regression)
– Bounding box
    • Entire image
    • Scale-normalized mean bounding box from training data
    • Bounding box of all interest points
    • Bounding box of interest points weighted by ‘class purity’
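Two of the simpler bounding-box baselines can be sketched in a few lines. This is a hypothetical illustration, not the Edinburgh code: interest points are assumed to be (x, y) pixel coordinates, and the mean-box and purity-weighted variants are left out because the slides do not give enough detail to reproduce them.

```python
def whole_image_box(image_w, image_h):
    """'Entire image' baseline: predict the full image as the bounding box."""
    return (0, 0, image_w, image_h)


def interest_point_box(points):
    """'Bounding box of all interest points' baseline.

    points is a list of (x, y) locations; returns (xmin, ymin, xmax, ymax),
    or None when no interest points were found.
    """
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


points = [(10, 20), (40, 5), (25, 30)]
print(whole_image_box(64, 48), interest_point_box(points))
```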

Evaluation
• Correct detection: 50% overlap in bounding boxes
– Multiple detections considered as (one true +) false positives
• Precision/Recall
– Average Precision (AP) as defined by TREC
    • Mean precision interpolated at recall = 0, 0.1, …, 0.9, 1
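A minimal sketch of the overlap test and the treatment of multiple detections. It assumes that "overlap" means area of intersection divided by area of union and that the threshold is inclusive; the slide says only "50% overlap", so both are assumptions, and the function names are illustrative.

```python
def overlap(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def label_detections(detections, ground_truth, min_overlap=0.5):
    """Mark each detection as true or false positive for one image.

    detections is a list of (confidence, box). Detections are visited in
    order of decreasing confidence; once a ground-truth box has been
    matched, further detections of it count as false positives.
    """
    matched = [False] * len(ground_truth)
    result = []
    for confidence, box in sorted(detections, key=lambda d: -d[0]):
        best, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth):
            o = overlap(box, gt_box)
            if o > best:
                best, best_j = o, j
        is_tp = best >= min_overlap and not matched[best_j]
        if is_tp:
            matched[best_j] = True
        result.append((confidence, is_tp))
    return result


ground_truth = [(0, 0, 10, 10)]
detections = [(0.8, (1, 1, 10, 10)), (0.9, (0, 0, 10, 10)), (0.3, (20, 20, 30, 30))]
print(label_detections(detections, ground_truth))
```

In the toy example the second-ranked box overlaps the object well but is a duplicate, so it is counted as a false positive.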

[Figure: example precision/recall curve, Precision vs. Recall, showing the measured curve and the interpolated curve]
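The interpolated average precision can be sketched as follows, assuming the usual 11-point definition in which the interpolated precision at a recall level is the maximum precision measured at any recall at or above that level. The function name and input format are illustrative; each detection is given as (confidence, is_true_positive) after matching against the ground truth.

```python
def average_precision_11pt(labelled, n_positives):
    """Mean interpolated precision at recall = 0, 0.1, ..., 1.

    labelled is a list of (confidence, is_true_positive) over the whole
    test set; n_positives is the number of ground-truth objects.
    """
    ranked = sorted(labelled, key=lambda d: -d[0])
    recalls, precisions = [], []
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
        recalls.append(tp / n_positives)
        precisions.append(tp / rank)

    total = 0.0
    for k in range(11):
        level = k / 10
        # Interpolated precision: best precision at this recall or higher;
        # zero if the method never reaches this recall.
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 11


# Toy example: two objects, three detections ranked by confidence.
labelled = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision_11pt(labelled, n_positives=2)
print(round(ap, 3))
```

A method that never reaches high recall, for example one trained on side views only, is penalised at the upper recall levels, where its interpolated precision is zero.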

[Figure: precision/recall curves, 5.1: detection: test1: motorbikes. Legend: Darmstadt: ISMbig3; Darmstadt: ISMSVMbig3; Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; FranceTelecom: pascal_develtest; INRIA: dalal: ndalal_competition_number_5; INRIA: dorko: gydorko]


• 5.1: Motorbikes

• Max AP: 0.886 (Darmstadt)

[Figure: precision/recall curves, 5.2: detection: test1: bicycles. Legend: Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train]


• 5.2: Bicycles

• Max AP: 0.119 (Edinburgh)

[Figure: precision/recall curves, 5.3: detection: test1: people. Legend: Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; INRIA: dalal: ndalal_competition_number_5; INRIA: dorko: gydorko]


• 5.3: People

• Max AP: 0.013 (INRIA: Dalal)

[Figure: precision/recall curves, 5.4: detection: test1: cars. Legend: Darmstadt: ISMbig4; Darmstadt: ISMSVMbig4_2; Darmstadt: ISMSVMbig4; Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; FranceTelecom: pascal_develtest; INRIA: dalal: ndalal_competition_number_5]


• 5.4: Cars


[Figure: precision/recall curves, 6.1: detection: test2: motorbikes. Legend: Darmstadt: ISMbig3; Darmstadt: ISMSVMbig3_2; Darmstadt: ISMSVMbig3; Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; FranceTelecom: pascal_develtest; INRIA: dalal: ndalal_competition_number_6]


• 6.1: Motorbikes

• Max AP: 0.341 (Darmstadt)

[Figure: precision/recall curves, 6.2: detection: test2: bicycles. Legend: Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train]


• 6.2: Bicycles

• Max AP: 0.113 (Edinburgh)

[Figure: precision/recall curves, 6.3: detection: test2: people. Legend: Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; INRIA: dalal: ndalal_competition_number_6]


• 6.3: People


[Figure: precision/recall curves, 6.4: detection: test2: cars. Legend: Darmstadt: ISMbig4; Darmstadt: ISMSVMbig4; Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; FranceTelecom: pascal_develtest; INRIA: dalal: ndalal_competition_number_6]


• 6.4: Cars


Classes and test1 vs. test2

• Mean AP of ‘best’ results across classes
– test1: 0.408, test2: 0.195

[Figure: bar chart comparing test1 and test2 by class (Motorbikes, Bicycles, People, Cars); vertical axis from 0 to 1]

Conclusions?

• GHT (Darmstadt) method did ‘best’ on classes entered
– SVM verification stage effective
– Limited to lower recall (by use of only side views)
• SVM (INRIA: Dalal) comparable for cars, better on test2
– Smaller objects?, higher recall
• Performance on bicycles, people was ‘poor’
– “Non-solid” objects, articulation?

Competition 7: any train/test1

• One entry: 7.3: people (INRIA: Dalal)

[Figure: precision/recall curves, 7.3: detection: test1: people. Legend: INRIA: dalal: ndalal_competition_number_5; INRIA: dalal: ndalal_competition_number_7]

• AP: 0.416

• Use of own training data improved results dramatically (AP with supplied training data: 0.013)

[Figure: precision/recall curves, 8.3: detection: test2: people. Legend: INRIA: dalal: ndalal_competition_number_6; INRIA: dalal: ndalal_competition_number_8]

Competition 8: any train/test2

• One entry: 8.3: people (INRIA: Dalal)

• AP: 0.438

• Use of own training data improved results dramatically (AP with supplied training data: 0.021)

Conclusions

• Classification
– Variety of methods and variations on SIFT+SVM
– Encouraging performance on all object classes
• Detection
– Variety of methods and variations on GHT
– Encouraging performance on cars, motorbikes
    • People and bicycles more challenging
• Use of own training data
– Only one entry (people detection), much better results than using provided training data
– State-of-the-art performance for pre-built classification/detection remains to be assessed
