Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH...

Feedforward semantic segmentation with zoom-out featuresMOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH

TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO

2Main Ideas

Casting semantic segmentation as classifying a set of superpixels.

Extracting CNN features from different levels of spatial context around the superpixel at hand.

Using MLP as the classifier

Photo credit: Mostajabi et al.

3Zoom-out feature extraction


4Zoom-out feature extraction

Subscene Level Features Bounding box of superpixels within radius three from the superpixel

at hand

Warp bounding box to 256 x 256 pixels

Activations of the last fully connected layer

Scene Level Features Warp image to 256 x 256 pixels

Activations of the last fully connected layer

5Training

Extracting the features from the mirror images and take element-wise max over the resulting two features vectors.

12416-dimensional representation for each superpixel.

Training 2 classifiers Linear classifier (Softmax)

MLP: Hidden layer (1024 neurons) + ReLU + Hidden layer (1024 neurons) with dropout

6Loss Function

Imbalanced dataset Wheighted loss function

Loss function: Let be frequency of class c in the training data and

7Effect of Zoom-out Levels

Photo and Table credit: Mostajabi et al.

Image Ground Truth

G1:3 G1:5 G1:5+S1 G1:5+S1+S2

8Quantitative Results

Softmax Results on VOC 2012

Table credit: Mostajabi et al.

9Quantitative Results MLP Results

Table credit: Mostajabi et al.

10Qualitative Results


11

Learning Deconvolution Network for Semantic SegmentationNOH, HONG AND HAN

POSTECH, KOREA

12Motivations

Photo credit: Noh et al.

Image Ground Truth FCN Prediction

13Motivations


14Deconvolution Network Architecture


15Unpooling


16Deconvolution


17Unpooling and Deconvolution Effects


18Pipeline

Generating 2K object proposals using Edge-Box and selecting top 50 based on their objectness scores.

Aggregating the segmentation maps which are generated for each proposals using pixel-wise maximum or average.

Constructing the class conditional probability map using Softmax

Apply fully-conncected CRF to the probability map.

Ensemble with FCN Computing mean of probability map generated with DeconvNet and

FCN

applying CRF.


19Training Deep Network

Adding a batch normalization layer to the output of every convolutional and deconvolutional layer.

Two-stage Training Train on easy examples first and then fine-tune with more

challenging ones.

Constructing easy examples: Crop object instances using ground-truth annotations

Limiting the variations in object location and size reduces the search space for semantic segmentation substantially

20Effect of Number of Proposals


21Quantitative Results

Table credit: Noh et al.


Examples that FCN produces better results than DeconvNet.



Examples that inaccurate predictions from our method and FCN are improved by ensemble.


Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH...

Documents

Transcript of Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH...