Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH...

24
Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO

Transcript of Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH...

Page 1: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

Feedforward semantic segmentation with zoom-out featuresMOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH

TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO

Page 2: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

2Main Ideas

Casting semantic segmentation as classifying a set of superpixels.

Extracting CNN features from different levels of spatial context around the superpixel at hand.

Using MLP as the classifier

Photo credit: Mostajabi et al.

Page 3: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

3Zoom-out feature extraction

Photo credit: Mostajabi et al.

Page 4: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

4Zoom-out feature extraction

Subscene Level Features Bounding box of superpixels within radius three from the superpixel

at hand

Warp bounding box to 256 x 256 pixels

Activations of the last fully connected layer

Scene Level Features Warp image to 256 x 256 pixels

Activations of the last fully connected layer

Page 5: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

5Training

Extracting the features from the mirror images and take element-wise max over the resulting two features vectors.

12416-dimensional representation for each superpixel.

Training 2 classifiers Linear classifier (Softmax)

MLP: Hidden layer (1024 neurons) + ReLU + Hidden layer (1024 neurons) with dropout

Page 6: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

6Loss Function

Imbalanced dataset Wheighted loss function

Loss function: Let be frequency of class c in the training data and

Page 7: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

7Effect of Zoom-out Levels

Photo and Table credit: Mostajabi et al.

Image Ground Truth

G1:3 G1:5 G1:5+S1 G1:5+S1+S2

Page 8: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

8Quantitative Results

Softmax Results on VOC 2012

Table credit: Mostajabi et al.

Page 9: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

9Quantitative Results MLP Results

Table credit: Mostajabi et al.

Page 10: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

10Qualitative Results

Photo credit: Mostajabi et al.

Page 11: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

11

Learning Deconvolution Network for Semantic SegmentationNOH, HONG AND HAN

POSTECH, KOREA

Page 12: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

12Motivations

Photo credit: Noh et al.

Image Ground Truth FCN Prediction

Page 13: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

13Motivations

Photo credit: Noh et al.

Page 14: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

14Deconvolution Network Architecture

Photo credit: Noh et al.

Page 15: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

15Unpooling

Photo credit: Noh et al.

Page 16: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

16Deconvolution

Photo credit: Noh et al.

Page 17: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

17Unpooling and Deconvolution Effects

Photo credit: Noh et al.

Page 18: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

18Pipeline

Generating 2K object proposals using Edge-Box and selecting top 50 based on their objectness scores.

Aggregating the segmentation maps which are generated for each proposals using pixel-wise maximum or average.

Constructing the class conditional probability map using Softmax

Apply fully-conncected CRF to the probability map.

Ensemble with FCN Computing mean of probability map generated with DeconvNet and

FCN

applying CRF.

Photo credit: Noh et al.

Page 19: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

19Training Deep Network

Adding a batch normalization layer to the output of every convolutional and deconvolutional layer.

Two-stage Training Train on easy examples first and then fine-tune with more

challenging ones.

Constructing easy examples: Crop object instances using ground-truth annotations

Limiting the variations in object location and size reduces the search space for semantic segmentation substantially

Page 20: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

20Effect of Number of Proposals

Photo credit: Noh et al.

Page 21: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

21Quantitative Results

Table credit: Noh et al.

Page 22: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

22Qualitative Results

Photo credit: Noh et al.

Page 23: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

23Qualitative Results

Examples that FCN produces better results than DeconvNet.

Photo credit: Noh et al.

Page 24: Feedforward semantic segmentation with zoom-out features MOSTAJABI, YADOLLAHPOUR AND SHAKHNAROVICH TOYOTA TECHNOLOGICAL INSTITUTE AT CHICAGO.

24Qualitative Results

Examples that inaccurate predictions from our method and FCN are improved by ensemble.

Photo credit: Noh et al.