Programme
• 2pm Introduction
  – Andrew Zisserman, Chris Williams
• 2.10pm Overview of the challenge and results
  – Mark Everingham (Oxford)
• 2.40pm Session 1: The Classification Task
  – Frederic Jurie presenting work by
    • Jianguo Zhang (INRIA) 20 mins
    • Frederic Jurie (INRIA) 20 mins
  – Thomas Deselaers (Aachen) 20 mins
  – Jason Farquhar (Southampton) 20 mins
• 4-4.30pm Coffee break
• 4.30pm Session 2: The Detection Task
  – Stefan Duffner/Christophe Garcia (France Telecom) 30 mins
  – Mario Fritz (Darmstadt) 30 mins
• 5.30pm Discussion
  – Lessons learnt, and future challenges
The PASCAL Visual Object Classes Challenge
Mark Everingham
Luc Van Gool
Chris Williams
Andrew Zisserman
Challenge
• Four object classes
  – Motorbikes
  – Bicycles
  – People
  – Cars
• Classification
  – Predict object present/absent
• Detection
  – Predict bounding boxes of objects
Competitions
• Train on any (non-test) data
  – How well do state-of-the-art methods perform on these problems?
  – Which methods perform best?
• Train on supplied data
  – Which methods perform best given specified training data?
Data sets
• train, val, test1
  – Sampled from the same distribution of images
  – Images taken from PASCAL image databases
  – “Easier” challenge
• test2
  – Freshly collected for the challenge (mostly Google Images)
  – “Harder” challenge
Training and first test set
train+val
Class        Images  Objects
Motorbikes   214     217
Bicycles     114     123
People       84      152
Cars         272     320
Total        684

test1
Class        Images  Objects
Motorbikes   216     220
Bicycles     114     123
People       84      149
Cars         275     341
Total        689
Example images
Second test set
test2
Class        Images  Objects
Motorbikes   202     227
Bicycles     279     399
People       526     1038
Cars         275     381
Total        1282
Example images
Annotation for training
• Object class present/absent
• Sub-class labels (partial)
  – Car side, Car rear, etc.
• Bounding boxes
• Segmentation masks (partial)
Issues in ground truth
• What objects should be considered detectable?
  – Subjective judgement by size in image, level of occlusion, detection without ‘inference’
    • Disagreements will cause noise in evaluation, i.e. incorrectly-judged false positives
• “Errors” in training data
  – Un-annotated objects
    • Requires machine learning algorithms robust to noise on class labels
  – Inaccurate bounding boxes
    • Hard to specify for some instances, e.g. bicycles
• Detection threshold was set “liberally”
Results: Classification
Participants
Participant    test1: Motorbikes, Bicycles, People, Cars    test2: Motorbikes, Bicycles, People, Cars
Aachen
Darmstadt
Edinburgh
FranceTelecom
HUT
INRIA: dalal
INRIA: dorko
INRIA: jurie
INRIA: zhang
METU
MPITuebingen
Southampton
Methods
• Interest points (LoG/Harris) + patches/SIFT
  – Histogram of clustered descriptors
    • SVM: INRIA: Dalal, INRIA: Zhang
    • Log-linear model: Aachen
    • Logistic regression: Edinburgh
    • Other: METU
  – No clustering step
    • SVM with other kernels: MPITuebingen, Southampton
  – Additional features
    • Color: METU, moments: Southampton
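The "histogram of clustered descriptors" representation listed above can be sketched as follows. This is a minimal illustration and not any participant's implementation: the codebook is assumed to exist already (for example from k-means on training descriptors), the 2-D descriptors are toy values standing in for SIFT vectors, and the names `nearest_center` and `bag_of_features` are hypothetical. The resulting histogram is the kind of fixed-length vector that would then be passed to a classifier such as an SVM.

```python
import math

def nearest_center(descriptor, codebook):
    """Return the index of the codebook center closest to the descriptor."""
    distances = [math.dist(descriptor, center) for center in codebook]
    return distances.index(min(distances))

def bag_of_features(descriptors, codebook):
    """Build an L1-normalized histogram of codebook assignments."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_center(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]

# Toy data (assumed for illustration): three cluster centers, four descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
histogram = bag_of_features(descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

The histogram length equals the codebook size regardless of how many interest points an image yields, which is what makes images of different sizes comparable.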
Methods
• Image segmentation and region features: HUT
  – MPEG-7 color, shape, etc.
  – Self organizing map
• Classification by detection: Darmstadt
  – Generalized Hough transform/SVM verification
Evaluation
• Receiver Operating Characteristic (ROC)
  – Equal Error Rate (EER)
  – Area Under Curve (AUC)
[Figure: example ROC curve, true positives against false positives, with the EER point and the AUC marked.]
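The ROC-based measures named on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions, not the challenge's evaluation code: the function names and the toy scores are hypothetical, ties in score are not handled specially, and the AUC uses a simple trapezoidal rule. The "Max EER" values close to 1 on the result slides suggest that the reported figure is the true positive rate at the equal-error point, though the slides do not say so explicitly.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) from sweeping a threshold over the scores.

    labels are 1 for images containing the class and 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def equal_error_point(points):
    """Return the ROC point closest to the line tpr = 1 - fpr."""
    return min(points, key=lambda p: abs(p[1] - (1.0 - p[0])))

# Toy classifier outputs (assumed for illustration).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
eer_fpr, eer_tpr = equal_error_point(points)
print(f"AUC = {auc:.3f}, equal-error point: fpr = {eer_fpr:.3f}, tpr = {eer_tpr:.3f}")
```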
[Figure: 1.1: classification: test1: motorbikes. ROC curves, true positives against false positives.]
Legend: INRIA: jurie: dcb_p2; Southampton: pascal_develtest; INRIA: jurie: dcb_p1; INRIA: zhang: prediction; Southampton: UoS_LoG.SIFT.PLS20ppker; Aachen: motorbikes-test1-n1st-1024; Southampton: UoS_mhar.aff.SIFT.PLS20ppker; Aachen: motorbikes-test1-ms-2048-histo; HUT: hut_final1; HUT: hut_final2; HUT: hut_final3; METU: ms_metu; HUT: hut_final4; MPITuebingen: Pascal_FINAL_test1; Darmstadt: ISMSVMbig3; Darmstadt: ISMbig3; Edinburgh: Edinburgh_C_bagoffeatures_train
Competition 1: train+val/test1
• 1.1: Motorbikes
• Max EER: 0.977 (INRIA: Jurie)
[Figure: 1.2: classification: test1: bicycles. ROC curves, true positives against false positives.]
Legend: INRIA: jurie: dcb_p2; INRIA: zhang: prediction; INRIA: jurie: dcb_p1; Southampton: pascal_develtest; Aachen: bicycles-test1-n1st-1024; Southampton: UoS_LoG.SIFT.PLS20ppker; Southampton: UoS_mhar.aff.SIFT.PLS20ppker; Aachen: bicycles-test1-ms-2048-histo; HUT: hut_final2; HUT: hut_final1; HUT: hut_final3; METU: ms_metu; HUT: hut_final4; MPITuebingen: Pascal_FINAL_test1; Edinburgh: Edinburgh_C_bagoffeatures_train
Competition 1: train+val/test1
• 1.2: Bicycles
• Max EER: 0.930 (INRIA: Jurie, INRIA: Zhang)
[Figure: 1.3: classification: test1: people. ROC curves, true positives against false positives.]
Legend: INRIA: jurie: dcb_p1; INRIA: zhang: prediction; INRIA: jurie: dcb_p2; Southampton: pascal_develtest; Aachen: people-test1-ms-2048-histo; Aachen: people-test1-n1st-1024; HUT: hut_final4; HUT: hut_final1; HUT: hut_final3; Southampton: UoS_mhar.aff.SIFT.PLS20ppker; HUT: hut_final2; Southampton: UoS_LoG.SIFT.PLS20ppker; METU: ms_metu; MPITuebingen: Pascal_FINAL_test1; Edinburgh: Edinburgh_C_bagoffeatures_train
Competition 1: train+val/test1
• 1.3: People
• Max EER: 0.917 (INRIA: Jurie, INRIA: Zhang)
[Figure: 1.4: classification: test1: cars. ROC curves, true positives against false positives.]
Legend: INRIA: jurie: dcb_p1; INRIA: jurie: dcb_p2; INRIA: zhang: prediction; Aachen: cars-test1-ms-2048-histo; Aachen: cars-test1-n1st-1024; Southampton: pascal_develtest; HUT: hut_final4; HUT: hut_final2; Southampton: UoS_mhar.aff.SIFT.PLS20ppker; Southampton: UoS_LoG.SIFT.PLS20ppker; HUT: hut_final1; HUT: hut_final3; METU: ms_metu; MPITuebingen: Pascal_FINAL_test1; Edinburgh: Edinburgh_C_bagoffeatures_train; Darmstadt: ISMSVMbig4; Darmstadt: ISMbig4
Competition 1: train+val/test1
• 1.4: Cars
• Max EER: 0.961 (INRIA: Jurie)
[Figure: 2.1: classification: test2: motorbikes. ROC curves, true positives against false positives.]
Legend: INRIA: zhang: prediction; Aachen: motorbikes-test2-n1st-1024; Aachen: motorbikes-test2-ms-2048-histo; Edinburgh: Edinburgh_C_bagoffeatures_train; MPITuebingen: Pascal_FINAL_test2; Darmstadt: ISMSVMbig3; Darmstadt: ISMbig3; HUT: hut_final4; HUT: hut_final2; HUT: hut_final1; HUT: hut_final3
Competition 2: train+val/test2
• 2.1: Motorbikes
• Max EER: 0.798 (INRIA: Zhang)
[Figure: 2.2: classification: test2: bicycles. ROC curves, true positives against false positives.]
Legend: INRIA: zhang: prediction; Aachen: bicycles-test2-ms-2048-histo; Aachen: bicycles-test2-n1st-1024; MPITuebingen: Pascal_FINAL_test2; HUT: hut_final4; HUT: hut_final2; Edinburgh: Edinburgh_C_bagoffeatures_train; HUT: hut_final1; HUT: hut_final3
Competition 2: train+val/test2
• 2.2: Bicycles
• Max EER: 0.728 (INRIA: Zhang)
[Figure: 2.3: classification: test2: people. ROC curves, true positives against false positives.]
Legend: INRIA: zhang: prediction; Aachen: people-test2-n1st-1024; Aachen: people-test2-ms-2048-histo; HUT: hut_final2; HUT: hut_final1; MPITuebingen: Pascal_FINAL_test2; HUT: hut_final4; HUT: hut_final3; Edinburgh: Edinburgh_C_bagoffeatures_train
Competition 2: train+val/test2
• 2.3: People
• Max EER: 0.719 (INRIA: Zhang)
[Figure: 2.4: classification: test2: cars. ROC curves, true positives against false positives.]
Legend: INRIA: zhang: prediction; Aachen: cars-test2-n1st-1024; Aachen: cars-test2-ms-2048-histo; HUT: hut_final4; MPITuebingen: Pascal_FINAL_test2; HUT: hut_final2; Darmstadt: ISMSVMbig4; HUT: hut_final1; HUT: hut_final3; Edinburgh: Edinburgh_C_bagoffeatures_train; Darmstadt: ISMbig4
Competition 2: train+val/test2
• 2.4: Cars
• Max EER: 0.720 (INRIA: Zhang)
Classes and test1 vs. test2
• Mean EER of ‘best’ results across classes
  – test1: 0.946, test2: 0.741
[Figure: bar chart of the best EER per class (Motorbikes, Bicycles, People, Cars) for test1 and test2; vertical axis from 0.6 to 1.]
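As a check on the arithmetic, the two means quoted on this slide can be reproduced from the per-class "Max EER" values given on the Competition 1 and Competition 2 slides. The dictionary layout below is only an illustrative choice.

```python
# Best ("Max EER") result per class, copied from the competition slides.
best_eer = {
    "test1": {"Motorbikes": 0.977, "Bicycles": 0.930, "People": 0.917, "Cars": 0.961},
    "test2": {"Motorbikes": 0.798, "Bicycles": 0.728, "People": 0.719, "Cars": 0.720},
}

# Unweighted mean over the four classes for each test set.
mean_eer = {
    test_set: sum(per_class.values()) / len(per_class)
    for test_set, per_class in best_eer.items()
}

for test_set, value in mean_eer.items():
    print(f"{test_set}: {value:.3f}")
```

Both means agree with the slide to three decimals (0.946 for test1 and 0.741 for test2).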
Conclusions?
• Interest points + SIFT + clustering (histogram) + SVM did ‘best’
  – Log-linear model (Aachen) a close second
  – Results with SVM (INRIA) significantly better than with logistic regression (Edinburgh)
• Method using detection (Darmstadt) did not do so well
  – Cannot exploit context (= unintended bias?) of image
  – Used subset of training data and is able to localize
Competitions 3 & 4
• Classification
• Any (non-test) training data to be used
• No entries submitted
Results: Detection
Participants
Participant    test1: Motorbikes, Bicycles, People, Cars    test2: Motorbikes, Bicycles, People, Cars
Aachen
Darmstadt
Edinburgh
FranceTelecom
HUT
INRIA: dalal
INRIA: dorko
INRIA: jurie
INRIA: zhang
METU
MPITuebingen
Southampton
Methods
• Generalized Hough Transform
  – Interest points, clustered patches/descriptors, GHT
    • Darmstadt: (SVM verification stage), side views with segmentation mask used for training
    • INRIA: Dorko: SIFT features, semi-supervised clustering, single detection per image
• “Sliding window” classifiers
  – Exhaustive search over translation and scale
    • FranceTelecom: Convolutional neural network
    • INRIA: Dalal: SVM with SIFT-based input representation
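The "exhaustive search over translation and scale" behind a sliding-window detector can be sketched as a window generator. This is an illustrative sketch only: the base window size, scale step, and stride are assumed values, not those used by FranceTelecom or INRIA, and `sliding_windows` is a hypothetical name. In a real detector each window would be passed to the classifier (a convolutional network or SVM in the entries above) and kept if its score exceeds a threshold.

```python
def sliding_windows(image_width, image_height, base_size=(64, 64),
                    scale_step=1.5, stride_fraction=0.5):
    """Yield (x_min, y_min, x_max, y_max) windows over translation and scale.

    The window grows by scale_step until it no longer fits in the image;
    at each scale it is shifted by stride_fraction of its own size.
    """
    scale = 1.0
    while True:
        win_w = int(round(base_size[0] * scale))
        win_h = int(round(base_size[1] * scale))
        if win_w > image_width or win_h > image_height:
            break
        step_x = max(1, int(win_w * stride_fraction))
        step_y = max(1, int(win_h * stride_fraction))
        for y in range(0, image_height - win_h + 1, step_y):
            for x in range(0, image_width - win_w + 1, step_x):
                yield (x, y, x + win_w, y + win_h)
        scale *= scale_step

# A 128x128 image gives nine 64x64 windows and one 96x96 window.
windows = list(sliding_windows(128, 128))
print(len(windows))  # 10
```

The number of windows grows quickly with image size and finer strides, which is why the per-window classifier has to be cheap to evaluate.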
Methods
• Baselines: Edinburgh
  – Detection confidence
    • Class prior probability
    • Whole-image classifier (SIFT + logistic regression)
  – Bounding box
    • Entire image
    • Scale-normalized mean bounding box from training data
    • Bounding box of all interest points
    • Bounding box of interest points weighted by ‘class purity’
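Two of the baseline bounding-box predictions listed above are simple enough to sketch directly. This is a hypothetical illustration, not Edinburgh's code: the function names, the (x_min, y_min, x_max, y_max) box convention, and the toy interest points are all assumptions.

```python
def entire_image_box(width, height):
    """Baseline: predict a box covering the whole image."""
    return (0, 0, width, height)

def interest_point_box(points):
    """Baseline: predict the tightest box around all interest points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Toy interest point locations (assumed for illustration).
points = [(10, 40), (25, 12), (60, 33)]
print(entire_image_box(100, 80))   # (0, 0, 100, 80)
print(interest_point_box(points))  # (10, 12, 60, 40)
```

Baselines like these give a floor against which the detectors' precision/recall curves can be judged.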
Evaluation
• Correct detection: 50% overlap in bounding boxes
  – Multiple detections considered as (one true + ) false positives
• Precision/Recall
  – Average Precision (AP) as defined by TREC
    • Mean precision interpolated at recall = 0,0.1,…,0.9,1
[Figure: example precision/recall curve showing the measured and interpolated curves.]
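The detection criterion and the interpolated average precision described on this slide can be sketched as follows. Several points are assumptions rather than statements from the slides: overlap is taken to be intersection area divided by union area, a detection must exceed (not merely reach) 50% overlap, and all function names and toy boxes are hypothetical. Duplicate detections of an already matched object are counted as false positives, as the slide indicates.

```python
def overlap(box_a, box_b):
    """Intersection over union for (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_detections(detections, ground_truth, min_overlap=0.5):
    """Flag each (confidence, box) detection as true or false positive.

    Detections are visited by decreasing confidence; each ground-truth box
    can be matched only once, so duplicates become false positives.
    """
    matched = set()
    flags = []
    for _, box in sorted(detections, key=lambda d: -d[0]):
        best_index, best_overlap = None, min_overlap
        for i, gt_box in enumerate(ground_truth):
            o = overlap(box, gt_box)
            if i not in matched and o > best_overlap:
                best_index, best_overlap = i, o
        if best_index is not None:
            matched.add(best_index)
        flags.append(best_index is not None)
    return flags

def average_precision(flags, num_objects):
    """Mean of interpolated precision at recall = 0, 0.1, ..., 1."""
    precisions, recalls = [], []
    tp = 0
    for rank, is_tp in enumerate(flags, start=1):
        tp += 1 if is_tp else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_objects)
    total = 0.0
    for k in range(11):
        level = k / 10
        reachable = [p for p, r in zip(precisions, recalls) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 11

# Toy example (assumed): two objects, four detections including a duplicate.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (1, 1, 10, 10)), (0.8, (0, 0, 9, 9)),
              (0.7, (50, 50, 60, 60)), (0.6, (21, 21, 31, 31))]
flags = label_detections(detections, ground_truth)
ap = average_precision(flags, len(ground_truth))
print(flags, round(ap, 3))  # [True, False, False, True] 0.773
```

Interpolation takes, at each recall level, the maximum precision achieved at that recall or higher, which is why the interpolated curve in the figure never rises as recall increases.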
[Figure: 5.1: detection: test1: motorbikes. Precision against recall.]
Legend: Darmstadt: ISMbig3; Darmstadt: ISMSVMbig3; Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; FranceTelecom: pascal_develtest; INRIA: dalal: ndalal_competition_number_5; INRIA: dorko: gydorko
Competition 5: train+val/test1
• 5.1: Motorbikes
• Max AP: 0.886 (Darmstadt)
[Figure: 5.2: detection: test1: bicycles. Precision against recall.]
Legend: Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train
Competition 5: train+val/test1
• 5.2: Bicycles
• Max AP: 0.119 (Edinburgh)
[Figure: 5.3: detection: test1: people. Precision against recall.]
Legend: Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; INRIA: dalal: ndalal_competition_number_5; INRIA: dorko: gydorko
Competition 5: train+val/test1
• 5.3: People
• Max AP: 0.013 (INRIA: Dalal)
[Figure: 5.4: detection: test1: cars. Precision against recall.]
Legend: Darmstadt: ISMbig4; Darmstadt: ISMSVMbig4_2; Darmstadt: ISMSVMbig4; Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; FranceTelecom: pascal_develtest; INRIA: dalal: ndalal_competition_number_5
Competition 5: train+val/test1
• 5.4: Cars
• Max AP: 0.613 (INRIA: Dalal)
[Figure: 6.1: detection: test2: motorbikes. Precision against recall.]
Legend: Darmstadt: ISMbig3; Darmstadt: ISMSVMbig3_2; Darmstadt: ISMSVMbig3; Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; FranceTelecom: pascal_develtest; INRIA: dalal: ndalal_competition_number_6
Competition 6: train+val/test2
• 6.1: Motorbikes
• Max AP: 0.341 (Darmstadt)
[Figure: 6.2: detection: test2: bicycles. Precision against recall.]
Legend: Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train
Competition 6: train+val/test2
• 6.2: Bicycles
• Max AP: 0.113 (Edinburgh)
[Figure: 6.3: detection: test2: people. Precision against recall.]
Legend: Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; INRIA: dalal: ndalal_competition_number_6
Competition 6: train+val/test2
• 6.3: People
• Max AP: 0.021 (INRIA: Dalal)
[Figure: 6.4: detection: test2: cars. Precision against recall.]
Legend: Darmstadt: ISMbig4; Darmstadt: ISMSVMbig4; Edinburgh: Edinburgh_D_meanbbox_train; Edinburgh: Edinburgh_D_purityweightedmeanbbox_train; Edinburgh: Edinburgh_D_siftbbox_train; Edinburgh: Edinburgh_D_wholeimage_train; FranceTelecom: pascal_develtest; INRIA: dalal: ndalal_competition_number_6
Competition 6: train+val/test2
• 6.4: Cars
• Max AP: 0.304 (INRIA: Dalal)
Classes and test1 vs. test2
• Mean AP of ‘best’ results across classes
  – test1: 0.408, test2: 0.195
[Figure: bar chart of the best AP per class (Motorbikes, Bicycles, People, Cars) for test1 and test2; vertical axis from 0 to 1.]
Conclusions?
• GHT (Darmstadt) method did ‘best’ on classes entered
  – SVM verification stage effective
  – Limited to lower recall (by use of only side views)
• SVM (INRIA: Dalal) comparable for cars, better on test2
  – Smaller objects?, higher recall
• Performance on bicycles, people was ‘poor’
  – “Non-solid” objects, articulation?
Competition 7: any train/test1
• One entry: 7.3: people (INRIA: Dalal)
[Figure: 7.3: detection: test1: people. Precision against recall.]
Legend: INRIA: dalal: ndalal_competition_number_5; INRIA: dalal: ndalal_competition_number_7
• AP: 0.416
• Use of own training data improved results dramatically (AP with the supplied training data: 0.013)
[Figure: 8.3: detection: test2: people. Precision against recall.]
Legend: INRIA: dalal: ndalal_competition_number_6; INRIA: dalal: ndalal_competition_number_8
Competition 8: any train/test2
• One entry: 8.3: people (INRIA: Dalal)
• AP: 0.438
• Use of own training data improved results dramatically (AP with the supplied training data: 0.021)
Conclusions
• Classification
  – Variety of methods and variations on SIFT+SVM
  – Encouraging performance on all object classes
• Detection
  – Variety of methods and variations on GHT
  – Encouraging performance on cars, motorbikes
    • People and bicycles more challenging
• Use of own training data
  – Only one entry (people detection), much better results than using provided training data
  – State-of-the-art performance for pre-built classification/detection remains to be assessed