Visual Object Analysis using Regions and Local Features

Carles Ventura Royo

Co-advisors: Xavier Giró i Nieto, Verónica Vilaplana Besler

Tutor: Ferran Marqués Acosta

Outline
• Introduction
• Part I: Context Analysis in semantic segmentation
  • Introduction
  • Related Work
  • Contributions
  • Experiments
  • Conclusions
• Part II: Multiresolution co-clustering for uncalibrated multiview segmentation
  • Introduction
  • Related Work
  • Contributions
  • Experiments
  • Conclusions
• Conclusions

Introduction: Semantic segmentation

Instance segmentation

Class segmentation

Introduction: Semantic segmentation

Part I: Single view Part II: Multiview

STATE OF THE ART

OUR RESULTS

Introduction: Visual Object Analysis

Objects Scene

Introduction: Regions

BINARY PARTITION TREE

Introduction: Regions

REGION ADJACENCY GRAPH
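A region adjacency graph can be sketched in a few lines. This is a minimal illustration, assuming the partition is given as a 2-D label map and that two regions are adjacent when they share a 4-connected pixel border; the function name `region_adjacency_graph` and the toy label map are hypothetical and not taken from the thesis.

```python
def region_adjacency_graph(label_map):
    """Return the set of undirected edges (a, b), a < b, between touching regions."""
    rows, cols = len(label_map), len(label_map[0])
    edges = set()
    for r in range(rows):
        for c in range(cols):
            a = label_map[r][c]
            # Looking only right and down visits every 4-connected pixel pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = label_map[rr][cc]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges


# Toy partition (assumption): three vertical stripes, so regions 1 and 3 never touch.
toy_partition = [
    [1, 2, 3],
    [1, 2, 3],
]
rag_edges = region_adjacency_graph(toy_partition)
print(sorted(rag_edges))
```

Storing each edge with the smaller label first keeps the graph undirected without a separate data structure.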

Introduction: Local Features

Local Features Global Features

Introduction: Local Features Aggregation
• Bag of Features (BoF) [1]: local features → vector quantization against a codebook → Bag of Features

[1] G Csurka et al, Visual Categorization with Bags of Keypoints. ECCV’04
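The Bag of Features aggregation can be sketched as follows. This is a minimal illustration, assuming a codebook that has already been learned (for instance by clustering) and nearest-codeword assignment with Euclidean distance; the names `bag_of_features` and `squared_distance` and the toy data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_features(features, codebook):
    """Quantize each local feature to its nearest codeword and return the
    normalized histogram of codeword counts."""
    histogram = [0] * len(codebook)
    for feature in features:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(feature, codebook[k]))
        histogram[nearest] += 1
    total = sum(histogram)
    return [count / total for count in histogram] if total else histogram


# Toy data (assumption): 2-D features and a two-word codebook.
codebook = [(0.0, 0.0), (10.0, 10.0)]
features = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0), (9.0, 11.0)]
print(bag_of_features(features, codebook))
```

The descriptor length equals the codebook size, independent of how many local features the region contains.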

Introduction: Local Features Aggregation
• Pooling

First Order Average Pooling (O1P) [1]:

$$\frac{1}{N}\sum_{i=1}^{N} x_i$$

Second Order Average Pooling (O2P) [2]:

$$\frac{1}{N}\sum_{i=1}^{N} x_i x_i^T$$

$x_i$: local features

• No need of codebook
• High dimensionality

[1] Y Boureau et al, A Theoretical Analysis of Feature Pooling in Visual Recognition. ICML’10
[2] J Carreira et al, Semantic segmentation with second-order pooling. ECCV’12
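The two pooling operators above can be written directly from their definitions. This is a minimal sketch in plain Python; the function names and the toy features are illustrative, and any further normalization used in [2] is deliberately left out.

```python
def first_order_pooling(features):
    """O1P: mean of the local features, a vector of length d."""
    n, d = len(features), len(features[0])
    return [sum(x[k] for x in features) / n for k in range(d)]


def second_order_pooling(features):
    """O2P: mean of the outer products x_i x_i^T, a d x d matrix."""
    n, d = len(features), len(features[0])
    return [[sum(x[a] * x[b] for x in features) / n for b in range(d)]
            for a in range(d)]


# Toy local features (assumption): N = 2 features of dimension d = 2.
local_features = [[1.0, 2.0], [3.0, 4.0]]
print(first_order_pooling(local_features))   # length-d vector
print(second_order_pooling(local_features))  # d x d symmetric matrix
```

The output grows from d values for O1P to d × d for O2P, which is the high dimensionality noted on the slide; no codebook is involved in either case.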

Part I: Context analysis in semantic segmentation

Introduction: Context

Semantic context [1,2] Spatial context

[1] M Bar, Visual Objects in Context. Nature Reviews Neuroscience 2004
[2] A Rabinovich et al, Objects in Context. ICCV’07

GOAL: Analyze the influence of the spatial context in object recognition

Related Work: Ideal scenario
• Ground-truth object location

[1] J.R.R. Uijlings et al., The Visual Extent of an Object. IJCV’12

Conclusion: Aggregating the local features over three region pools (interior, border and surround) increases the performance [1]

Related Work: Realistic scenario
• Pipeline [1]: Input image → Generate object candidates → Rank object candidates → Predict class scores → Aggregate high-rank candidates → Semantic partition

[1] J Carreira et al, Object Recognition as Ranking Holistic Figure-Ground Hypotheses. CVPR’10

Related Work: Realistic scenario
• How is each class predictor trained? [1]
• Each object candidate has an overlap score (os); example overlap scores from the figure: 0.8179, 0.6861, 0.9013, 0.7381, 0.7105, 0.6462
• An SVR is used to learn the function that predicts the overlap for each class: os = f([O2PF O2PG])
• Training samples: descriptors [O2PF_1 O2PG_1], [O2PF_2 O2PG_2], ... with overlap scores os_1, os_2, ...
• GOAL: CHANGE SPATIAL CODIFICATION

[1] J Carreira et al, Semantic segmentation with second-order pooling. ECCV’12
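The regression target of each class predictor is an overlap score. The sketch below assumes that overlap means intersection over union between a candidate mask and a ground-truth object mask, both given as binary 2-D lists; the function name `overlap_score` and the toy masks are hypothetical.

```python
def overlap_score(candidate, ground_truth):
    """Intersection over union of two binary masks of the same size."""
    intersection = 0
    union = 0
    for cand_row, gt_row in zip(candidate, ground_truth):
        for cand_px, gt_px in zip(cand_row, gt_row):
            if cand_px and gt_px:
                intersection += 1
            if cand_px or gt_px:
                union += 1
    return intersection / union if union else 0.0


# Toy masks (assumption): the candidate covers half of the union of both masks.
candidate_mask = [[1, 1, 1, 0]]
ground_truth_mask = [[0, 1, 1, 1]]
print(overlap_score(candidate_mask, ground_truth_mask))
```

Each training sample for the SVR would then pair a candidate's pooled descriptor with the score computed this way.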

Contributions
• Figure-Border-Ground spatial pooling in the realistic scenario
• SVR: os = f([O2PF O2PB O2PG])
• Training samples: descriptors [O2PF_1 O2PB_1 O2PG_1], [O2PF_2 O2PB_2 O2PG_2], ..., [O2PF_N O2PB_N O2PG_N] with overlap scores os_1, os_2, ...
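One way to obtain three region pools from a candidate mask is sketched below. It rests on a loud assumption: the Border pool is taken to be the band of non-figure pixels within `width` pixels (4-connected) of the figure, and the Ground pool is everything else. The slides do not specify how the border is delimited, so `figure_border_ground` and its `width` parameter are hypothetical.

```python
def figure_border_ground(mask, width=1):
    """Label every pixel 'F' (figure), 'B' (border) or 'G' (ground).

    Assumption: the border is grown outwards from the figure, one
    4-connected pixel layer per iteration, `width` times.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [["F" if mask[r][c] else "G" for c in range(cols)]
              for r in range(rows)]
    frontier = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    for _ in range(width):
        grown = []
        for r, c in frontier:
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] == "G":
                    labels[rr][cc] = "B"
                    grown.append((rr, cc))
        frontier = grown
    return labels


# Toy candidate (assumption): a single figure pixel in the middle of a 5x5 image.
toy_mask = [[0] * 5 for _ in range(5)]
toy_mask[2][2] = 1
for row in figure_border_ground(toy_mask, width=1):
    print("".join(row))
```

Local features would then be pooled separately over the pixels labelled F, B and G, and the three descriptors concatenated as [O2PF O2PB O2PG].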

Contributions
• Contour-based spatial pyramid [1]: crown-based
• SVR: os = f([O2PF O2PSR1 O2PSR2 O2PSR3 O2PSR4])
• Training samples: descriptors [O2PF_1 O2PSR1_1 O2PSR2_1 O2PSR3_1 O2PSR4_1], [O2PF_2 O2PSR1_2 O2PSR2_2 O2PSR3_2 O2PSR4_2], ..., [O2PF_N O2PSR1_N O2PSR2_N O2PSR3_N O2PSR4_N] with overlap scores os_1, os_2, ...

[1] S Lazebnik et al, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR’06

Contributions
• Contour-based spatial pyramid [1]: Cartesian-based
• SVR: os = f([O2PF O2PSR1 O2PSR2 O2PSR3 O2PSR4])
• Training samples: descriptors [O2PF_1 O2PSR1_1 O2PSR2_1 O2PSR3_1 O2PSR4_1], [O2PF_2 O2PSR1_2 O2PSR2_2 O2PSR3_2 O2PSR4_2], ..., [O2PF_N O2PSR1_N O2PSR2_N O2PSR3_N O2PSR4_N] with overlap scores os_1, os_2, ...

[1] S Lazebnik et al, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR’06

Experiments
• Pascal VOC segmentation challenge 2011 & 2012 [1]
• Train, validation and test subsets
  • Train: 1,112 (2011) / 1,464 (2012)
  • Validation: 1,111 (2011) / 1,449 (2012)
  • Test: 1,111 (2011) / 1,456 (2012)
• 20 semantic classes
  • aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor
• Evaluation measure: Average Accuracy Classification (AAC)

[1] M Everingham et al, The PASCAL Visual Object Classes (VOC) Challenge. IJCV’10
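The evaluation measure can be illustrated with a small sketch. It assumes that the reported accuracy follows the usual VOC segmentation convention, where each class scores TP / (TP + FP + FN) over pixels and the per-class scores are averaged; whether the slides use exactly this definition is an assumption, and the function name and toy labels are hypothetical.

```python
def average_class_accuracy(predicted, truth, num_classes):
    """Mean over classes of TP / (TP + FP + FN), computed on flat pixel label lists.
    Classes that appear in neither list are skipped."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(predicted, truth):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # pixel wrongly assigned to class p
            fn[t] += 1  # pixel of class t that was missed
    scores = []
    for k in range(num_classes):
        denominator = tp[k] + fp[k] + fn[k]
        if denominator:
            scores.append(tp[k] / denominator)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example (assumption): four pixels, two classes, one mislabelled pixel.
truth_labels = [0, 0, 1, 1]
predicted_labels = [0, 1, 1, 1]
print(average_class_accuracy(predicted_labels, truth_labels, num_classes=2))
```

In the toy example class 0 scores 1/2 and class 1 scores 2/3, so a single wrong pixel penalizes both the class it was taken from and the class it was assigned to.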

Experiments: Local Features Aggregation
• Pooling

First Order Average Pooling (O1P) [1]:

$$\frac{1}{N}\sum_{i=1}^{N} x_i$$

Second Order Average Pooling (O2P) [2]:

$$\frac{1}{N}\sum_{i=1}^{N} x_i x_i^T$$

$x_i$: local features

• No need of codebook
• High dimensionality

[1] Y Boureau et al, A Theoretical Analysis of Feature Pooling in Visual Recognition. ICML’10
[2] J Carreira et al, Semantic segmentation with second-order pooling. ECCV’12

Experiments
• Ideal scenario
• Train set: train11
• Test set: val11

             F [1]   F-B    F-G [1]   F-B-G
eSIFT [1]    63.9    66.2   66.4      68.6
eMSIFT [1]   64.8    68.9   67.7      70.8

                     F [1]   F-B    F-B-G
Non SP               64.8    68.9   70.8
Crown-based SP       68.7    71.1   71.7
Cartesian-based SP   67.7    71.6   72.7

Figure              SP (Figure)   Border         Ground         AAC
eSIFT+eMSIFT+eLBP   -             -              eSIFT          72.98 [1]
eSIFT+eMSIFT        -             eSIFT+eMSIFT   eSIFT+eMSIFT   73.84
eSIFT+eMSIFT+eLBP   eMSIFT        eSIFT+eMSIFT   eSIFT+eMSIFT   75.86

Experiments
• Realistic scenario (CPMC [1])
• Train set: train11
• Test set: val11

Figure              SP (Figure)   Border   Ground   AAC
eSIFT               -             -        eSIFT    28.6 [2]
eSIFT               -             eSIFT    eSIFT    34.8
eSIFT+eMSIFT+eLBP   -             -        eSIFT    37.2 [2]
eSIFT               eSIFT         eSIFT    eSIFT    37.4
eSIFT+eMSIFT+eLBP   eSIFT         eSIFT    eSIFT    39.6

[1] J Carreira et al, Constrained parametric min-cuts for automatic object segmentation. CVPR’10

Experiments
• Realistic scenario (CPMC [1])
• Train set: trainval11/12
• Test set: test11/12

        F-G [2]   F-B-G   SP(F)-B-G
VOC11   38.8      43.8    40.3
VOC12   39.9      42.2    40.8

[1] J Carreira et al, Constrained parametric min-cuts for automatic object segmentation. CVPR’10

Experiments
• Realistic scenario (MCG [1])
• Train set: train11
• Test set: val11

       F-G [2]   F-B-G   SP(F)-B-G
CPMC   37.2      38.9    39.6
MCG    30.9      34.1    36.1

[1] P Arbeláez et al, Multiscale combinatorial grouping. CVPR’14

Experiments: Qualitative evaluation
[Figure: F-G and F-B-G results shown side by side; class labels in the figure include aeroplane, bicycle, cat, bird, motorbike, boat, bottle, bus, car, chair and horse]

Experiments: Qualitative evaluation
[Figure: F-G and F-B-G results shown side by side; class labels in the figure include chair, diningtable, cow, dog, person, horse, motorbike, pottedplant, bottle, sofa, cat, train and tvmonitor]

Conclusions
• Figure-Border-Ground spatial pooling improves the original Figure-Ground pooling in both ideal and realistic scenarios
• The Border region pool carries the richest contextual information
• The Cartesian-based spatial pyramid outperforms the crown-based spatial pyramid, but both of them may result in overfitting
• Both Figure-Border-Ground pooling and Cartesian-based spatial pyramid have been validated with MCG object candidates
• Published in ICIP’15

Part II: Multiresolution co-clustering for uncalibrated multiview segmentation

Introduction

Introduction
• First goal: improving generic segmentation
  • Motion-based region adjacency graph
  • New resolution parameterization
  • Relaxing hierarchical constraints with a two-step architecture
  • Practical framework for a global optimization
• Second goal: improving semantic segmentation
  • Semantic-based generic segmentation
  • Automatic resolution selection technique
  • Generic segmentation based semantic segmentation

Introduction
• Co-segmentation
• Video segmentation
• Co-clustering

Related Work: Co-clustering framework [1,2]
• Objective: Find the clusters that define the coherent regions across the different views at multiple resolutions

CO-CLUSTERED PARTITIONS

[1] D Glasner et al, Contour-based joint clustering of multiple segmentations. CVPR’11
[2] D Varas et al, Multiresolution hierarchy co-clustering for semantic segmentation in sequences with small variations. ICCV’15

Related Work: Co-clustering framework [1,2]
• Objective: Find the clusters that define the coherent regions across the different views

view 1 view 2 view 1 view 2

LEAVES PARTITIONS CO-CLUSTERED PARTITIONS

[1] D Glasner et al, Contour-based joint clustering of multiple segmentations. CVPR’11
[2] D Varas et al, Multiresolution hierarchy co-clustering for semantic segmentation in sequences with small variations. ICCV’15

Related Work: Co-clustering framework
• Representation with boundary variables
• Intra-image boundary variables: D1,2, D1,3, D2,3, D4,5, D5,6
• Inter-image boundary variables: D1,4, D1,5, D2,4, D2,5, D3,6

view 1 view 2 view 1 view 2

Intra-image: D1,2 = 0, D1,3 = 1, D2,3 = 1, D4,5 = 0, D5,6 = 1
Inter-image: D1,4 = 0, D1,5 = 0, D2,4 = 0, D2,5 = 0, D3,6 = 0
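The boundary-variable representation can be checked with a short sketch. It assumes that a boundary variable D_{i,j} is 0 when regions i and j fall in the same co-cluster and 1 otherwise; the cluster assignment below is hypothetical, chosen only because it reproduces the values listed on the slide.

```python
def boundary_variables(cluster_of, pairs):
    """D[i, j] = 0 if regions i and j share a cluster, 1 if a boundary separates them."""
    return {(i, j): int(cluster_of[i] != cluster_of[j]) for i, j in pairs}


# Hypothetical co-clustering: regions 1, 2 (view 1) and 4, 5 (view 2) form one
# cluster; region 3 (view 1) and region 6 (view 2) form the other.
cluster_of = {1: "A", 2: "A", 4: "A", 5: "A", 3: "B", 6: "B"}

intra_pairs = [(1, 2), (1, 3), (2, 3), (4, 5), (5, 6)]
inter_pairs = [(1, 4), (1, 5), (2, 4), (2, 5), (3, 6)]

intra = boundary_variables(cluster_of, intra_pairs)
inter = boundary_variables(cluster_of, inter_pairs)
print(intra)
print(inter)
```

Reading the variables this way makes clear that a valid assignment of all D values is equivalent to a partition of the regions across both views.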

Related Work: Co-clustering framework
• How are the values of the boundary variables chosen?

view 1 view 2

LEAVES PARTITIONS

INTRA INTERACTIONS: Q1,2, Q1,3, Q2,3, Q4,5, Q5,6
INTER INTERACTIONS: Q1,4, Q1,5, Q2,4, Q2,5, Q3,6

Related Work: Co-clustering framework
• Hierarchical constraint

view 1 view 2

Co-clustered partitions cannot violate the hierarchical structures

Related Work: Co-clustering framework
• Multiresolution parameterization

view 1 view 2

LEAVES PARTITIONS

Related Work: Co-clustering framework
• Iterative approach

Contribution I: Motion-based adjacency

View #i View #i-1

Contribution I: Motion-based adjacency
• Similarity computation
• RAG definition

View #i View #i-1

Contribution II: Resolution parameterization

view 1 view 2

LEAVES PARTITIONS…

Original parameterization

Proposed parameterization

Contribution III: Two-step iterative architecture
• Hierarchical constraints are not imposed in a second step

Contribution III: Two-step iterative architecture

First step Second step

Contribution III: Two-step iterative architecture

Contribution IV: Generic global co-clustering

• All co-clustered partitions resulting from the iterative architecture are fed into a global optimization

• The reduction on the number of regions makes the global optimization feasible

Contribution V: Semantic global co-clustering

• Semantic information is introduced in the global optimization

Contribution V: Semantic global co-clustering

GENERIC CO-CLUSTERING

SEMANTIC SEGMENTATIONS

SEMANTIC CO-CLUSTERING

Contribution VI: Automatic resolution selection

view 1 view 2

LEAVES PARTITIONS…

MULTIRESOLUTION CO-CLUSTERING

• We propose a method that automatically selects the resolution that best fits the semantic information

SEMANTIC PARTITIONS

SINGLE RESOLUTION CO-CLUSTERING

Contribution VII: Coherent semantic partitions

view 1 view 2

LEAVES PARTITIONS

SEMANTIC PARTITIONS

SINGLE RESOLUTION CO-CLUSTERING

COHERENT SEMANTIC PARTITIONS

Contribution VII: Coherent semantic partitions

STATE OF THE ART [1]

OUR RESULTS

[1] S Zheng et al, Conditional Random Fields as Recurrent Neural Networks. ICCV’15

Experiments: Dataset
• Multiview dataset [1]

[1] A Kowdle et al, Multiple view object cosegmentation using appearance and stereo cues. ECCV’12

Experiments: Generic co-clustering

Co-segmentation techniques

Video segmentation techniques

Co-clustering techniques
• I-1S: Motion-compensated one-step iterative (baseline)
• I-2S: Two-step iterative
• UCM+I-1S: First step is replaced by a cut from a hierarchical segmentation algorithm
• I-2S+GG: Two-step iterative followed by generic global optimization

Experiments: Generic co-clustering

              CO-CLUSTERING                CO-SEGMENTATION     VIDEO SEGMENTATION              BASELINES
              I-2S   UCM+I-1S   I-2S+GG    [KX12]   [JBP12]    [XXC12]   [GKHE10]   [GCS13]    UCM+Pr   I-1S
BMW           0.72   0.68       0.70       0.42     0.56       0.70      0.65       0.63       0.62     0.67
Chair         0.79   0.77       0.76       0.53     0.78       0.80      0.76       0.47       0.59     0.78
Couch         0.93   0.95       0.94       0.78     0.90       0.85      0.88       0.73       0.89     0.90
GardenChair   0.84   0.63       0.87       0.31     0.52       0.70      0.68       0.63       0.84     0.80
Motorbike     0.76   0.77       0.77       0.39     0.39       0.71      0.73       0.46       0.54     0.70
Teddy         0.92   0.92       0.92       0.69     0.87       0.88      0.84       0.85       0.82     0.90
Average       0.83   0.79       0.83       0.52     0.67       0.77      0.76       0.63       0.72     0.79

• Two-step iterative co-clustering techniques (I-2S and I-2S+GG) outperform other state-of-the-art techniques

Experiments: Semantic co-clustering

Co-clustering techniques
• I-2S+GG(MR): Multiresolution global generic co-clustering
• I-2S+SG(MR): Multiresolution global semantic co-clustering
• I-2S+GG(SR): Single resolution global generic co-clustering
• I-2S+SG(SR): Single resolution global semantic co-clustering

Semantic segmentation techniques
• SCSS: Semantic co-clustering based semantic segmentation
• GCSS: Generic co-clustering based semantic segmentation
• [ZJRP+15]: state-of-the-art

[ZJRP+15] S Zheng et al, Conditional Random Fields as Recurrent Neural Networks. ICCV’15

Experiments: Qualitative assessment
[Figure: leaves partition, I-2S, I-2S+GG, I-2S+SG, SCSS and [ZJRP+15] results on the Occlusion/Object Boundary Detection Dataset [GVB11] and the Ballet and Breakdancers datasets [ZKU+04]]

Conclusions
• The use of motion cues significantly improved the performance
• The new resolution parameterization allowed us to have a more uniform distribution of resolutions
• The two-step architecture improved the performance of the original one-step architecture
• Although global optimization is now feasible, there is no clear gain for generic co-clustering. However, it is useful for semantic co-clustering.
• Applying the resolution selection technique causes only a small decrease in performance
• Submitted to ECCV’16 (awaiting decision)

Future Work
• Extending experiments to video datasets
  • VSB100 (Video Segmentation Benchmark) [1]
  • Cityscapes [2]
• Extending experiments to calibrated scenarios
• Training end-to-end CNNs for multiview semantic segmentation

[1] F Galasso et al, A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis. ICCV’13
[2] M Cordts et al, The cityscapes dataset for semantic urban scene understanding. CVPR’16

Conclusions
• Results achieved in the first part by considering new spatial configurations are now obsolete after the outstanding results achieved by deep learning techniques.
• Results from deep learning techniques were used in the second part.
• The proposed multiresolution co-clustering has improved state-of-the-art results, but we should consider an end-to-end deep learning approach to achieve a more significant improvement.
• Semantic segmentation techniques evolve really fast, making this field very competitive and challenging.

Publications
• Related to the thesis

• C. Ventura, D. Varas, X. Giro-i-Nieto, V. Vilaplana, F. Marques. Semantically driven multiresolution co-clustering for uncalibrated multiview segmentation. Submitted to the European Conference on Computer Vision (ECCV) 2016. Under review.

• C. Ventura, X. Giro-i-Nieto, V. Vilaplana, K. McGuinness, F. Marques, N. E. O'Connor. Improving spatial codification in semantic segmentation. International Conference on Image Processing (ICIP) 2015.

• C. Ventura. Visual object analysis using regions and interest points. ACM International Conference on Multimedia 2013.

Publications
• Other publications:

• K. McGuinness, E. Mohedano, Z. Zhang, F. Hu, R. Albatal, Cathal Gurrin, N.E O'Connor, A. F. Smeaton, A. Salvador, X. Giro-i-Nieto, C. Ventura. Insight Centre for Data Analytics (DCU) at TRECVid 2014: instance search and semantic indexing tasks. TRECVID Workshop 2014.

• C. Ventura, V. Vilaplana, X. Giro-i-Nieto, F. Marques. Improving retrieval accuracy of Hierarchical Cellular Trees for generic metric spaces. Multimedia Tools and Applications, 2014.

• C. Ventura, X. Giro-i-Nieto, V. Vilaplana, D. Giribet, E. Carasusan. Automatic keyframe selection based on mutual reinforcement algorithm. International Workshop on Content-Based Multimedia Indexing (CBMI) 2013.

• C. Ventura, M. Tella-Amo, X. Giro-i-Nieto. UPC at MediaEval 2013 Hyperlinking Task. MediaEval 2013.

• C. Ventura, M. Martos, X. Giro-i-Nieto, V. Vilaplana, F. Marques. Hierarchical navigation and visual search for video keyframe retrieval. International Conference on Multimedia Modeling 2012.

Source: A. Oliva and A. Torralba, The role of context in object recognition

Source: T. Malisiewicz and A. A. Efros, Improving spatial support for objects via multiple segmentations.

Related Work: Realistic scenario

Source: J. Carreira et al., Semantic segmentation with second-order pooling

Input image

Object segment hypotheses

Ranked object segment hypotheses (class independent)

object plausibility

Related Work: Realistic scenario

Source: J. Carreira et al., Semantic segmentation with second-order pooling

Predict overlap estimate of each segment to each object class and sort segments by maximal score

Aggregate high-rank segments

Related Work: Realistic scenario
[Figure: overlap scores 0.8179, 0.6861, 0.9013, 0.7381, 0.7105, 0.6462; TA ? 0.4905]

Related Work: Co-clustering framework
• What are the contour elements?

view 1 view 2

LEAVES PARTITIONS

Which contour elements are considered to compute Q1,4?
• Contour elements of R1
• Contour elements of R4

Related Work: Co-clustering framework

INTRA INTERACTIONS INTER INTERACTIONS

LINEAR PROGRAMMING RELAXATION

Intra: Q1,2 = -0.81, Q3,4 = -0.81, Q3,5 = -0.81, Q4,5 = -0.49
Inter: Q1,3 = 2.81e+03, Q1,4 = -1.36e+03, Q1,5 = -1.45e+03, Q2,3 = -2.81e+03, Q2,4 = 1.36e+03, Q2,5 = 1.45e+03

Q4,5 = -0.49 → D4,5 = 1 ??

$D_{4,5} \le D_{4,2} + D_{2,5}$

D4,2 = 0, D2,5 = 0 → D4,5 = 0
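The role of the triangle inequality in this example can be checked mechanically. The sketch below only tests whether a given 0/1 assignment of boundary variables satisfies D_{i,j} ≤ D_{i,k} + D_{k,j} for every triple whose three variables are present; it does not solve the linear program, and the helper names `pair` and `violated_triangles` are hypothetical.

```python
from itertools import permutations


def pair(i, j):
    """Canonical key for the undirected boundary variable between regions i and j."""
    return (min(i, j), max(i, j))


def violated_triangles(D):
    """Return the triples (i, j, k) for which D[i,j] > D[i,k] + D[k,j]."""
    regions = sorted({r for key in D for r in key})
    violations = []
    for i, j, k in permutations(regions, 3):
        if i < j:  # each unordered pair (i, j) is tested once per intermediate k
            keys = (pair(i, j), pair(i, k), pair(k, j))
            if all(key in D for key in keys):
                if D[keys[0]] > D[keys[1]] + D[keys[2]]:
                    violations.append((i, j, k))
    return violations


# Values from the slide: regions 2-4 and 2-5 are merged, yet Q4,5 < 0 alone
# would suggest a boundary between regions 4 and 5.
tentative = {pair(2, 4): 0, pair(2, 5): 0, pair(4, 5): 1}
print(violated_triangles(tentative))   # the triple through region 2 is violated

consistent = dict(tentative)
consistent[pair(4, 5)] = 0
print(violated_triangles(consistent))  # no violations remain
```

The check shows why D4,5 must be 0 here: once region 2 is merged with both 4 and 5, a boundary between 4 and 5 would not correspond to any valid partition.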

PARENT NODE 11

Inter-sibling boundaries:

Intra-sibling boundaries:

Related Work: Co-clustering framework
• Multiresolution parameterization

: Number of active contours to encode leaf contours

: Maximum fraction to describe the r-th coarse level

: Maximum difference between consecutive levels

= 9 = 0.5 = 0.1

4.5 3.6

Related Work: Co-clustering framework
• Iterative approach

Contribution II: Resolution parameterization

Selected inter-sibling boundaries:

Contributions
• Semantic global co-clustering

1. Class assignment to regions
2. Similarity penalizations
  • Regions from same partition with different classes
3. Optimization constraints
  • Regions from same partition with same class
  • Regions from different partitions with different class

Contribution VI: Automatic resolution selection
• Some applications require a single resolution

l1 C1 C2U

C2 l1 or l2 ? l1

Experiments: Semantic co-clustering

Conclusions
• Multiresolution co-clustering framework for uncalibrated multiview sequences
  • Two-step architecture
  • Global optimization
  • Semantic-based co-clustering with resolution selection
• Submitted to ECCV’16 (awaiting decision)

Conclusions
• Part I: Improving spatial codification in semantic segmentation
  • Figure-Border-Ground in realistic scenario
  • Contour-based spatial pyramid
• Part II: Multiresolution co-clustering for uncalibrated multiview segmentation
  • Results from Part I are replaced by SoA deep learning techniques
  • Generic co-clustering for multiview sequences
  • Semantic co-clustering for multiview sequences
