14/02/2019 Image Segmentation [Arthur Ouaknine]
Review of Deep Learning Algorithms for Image Semantic Segmentation
Deep Learning Working Group
Arthur Ouaknine
PhD Student
14/02/2019
valeo.ai
Summary
Datasets and metrics
Review of Architectures
Comparison
Datasets and metrics
PASCAL Visual Object Classes (PASCAL VOC)
Published: 2012
Number of classes: 20 (plus background)
Training and validation datasets: 11k images (2,913 of them with segmentation masks)
Test dataset: 10k images
Evaluation: mean Intersection over Union (mIoU)
―――http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
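The mIoU metric computes, for each class, IoU = TP / (TP + FP + FN) over all evaluated pixels and averages it over classes. A minimal pure-Python sketch, assuming predictions and ground truth are flat lists of integer class ids accumulated over the whole dataset and that the label 255 marks ignored (void) pixels, as in the VOC annotations; `mean_iou` is a hypothetical helper, not the official devkit code.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean Intersection over Union over flat lists of pixel labels.

    Pixels whose ground truth equals ignore_index are skipped.
    Classes absent from both prediction and ground truth are left out of the mean.
    """
    intersection = [0] * num_classes  # TP per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2)` gives (1/2 + 2/3) / 2 = 7/12, about 0.583.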
PASCAL-Context
Published: 2014 (extension of PASCAL VOC 2010)
Number of classes: 400+ (a 59-class subset is commonly used)
Training and validation datasets: 10k images (commonly split into 4,998 training / 5,105 validation images)
Test dataset: 10k images
Evaluation: mIoU (and others)
―――https://cs.stanford.edu/~roozbeh/pascal-context/
Common Objects in COntext (COCO)
Published: 2014; challenges held in 2016/2017/2018 (two challenges: object detection and stuff detection)
Number of classes: 80
Training and validation dataset: 118K/5K images
Test datasets: 41k (dev + challenge)
Evaluation: Average Precision (AP) and Average Recall (AR)
―――http://cocodataset.org/
Cityscapes
Published: 2016
Number of classes: 30 (19 used for evaluation)
Training and validation datasets: 23.5k images (3.5k finely annotated, 20k coarsely annotated)
Testing dataset: 1.5k images
Evaluation: mIoU
―――https://www.cityscapes-dataset.com/
Review of Architectures
Fully Convolutional Network (FCN)
Motivation: end-to-end, fully convolutional network for pixel-wise prediction
Architecture:
Input: arbitrary size (no fixed input resolution)
Layers: convolutions only, with skip connections; deconvolution (transposed convolution) for upsampling; 1x1 convolution for the class scores
Performance:
PASCAL VOC 2012: 62.2% mIoU
―――J. Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
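In FCN, a 1x1 convolution acts as a linear classifier applied independently at every pixel, and the coarse score map is upsampled and summed with scores computed from a finer layer (the skip connection). A minimal pure-Python sketch on C x H x W nested lists; all helper names are hypothetical, and nearest-neighbour upsampling stands in for FCN's learned deconvolution (which the paper initializes to bilinear interpolation).

```python
def conv1x1(feat, weight, bias):
    """1x1 convolution: a linear classifier applied independently at every pixel.

    feat: C_in x H x W nested lists; weight: K x C_in; bias: K values -> K x H x W scores.
    """
    h, w = len(feat[0]), len(feat[0][0])
    return [[[b + sum(wk[c] * feat[c][y][x] for c in range(len(feat))) for x in range(w)]
             for y in range(h)] for wk, b in zip(weight, bias)]


def upsample2x(scores):
    """Nearest-neighbour 2x upsampling (stand-in for the learned deconvolution)."""
    return [[[v for v in row for _ in range(2)] for row in ch for _ in range(2)]
            for ch in scores]


def fuse_skip(coarse_scores, fine_scores):
    """Skip connection: upsample the coarse score map and add the finer one element-wise."""
    up = upsample2x(coarse_scores)
    return [[[u + f for u, f in zip(ru, rf)] for ru, rf in zip(cu, cf)]
            for cu, cf in zip(up, fine_scores)]


def predict(scores):
    """Per-pixel argmax over the K class score maps."""
    k, h, w = len(scores), len(scores[0]), len(scores[0][0])
    return [[max(range(k), key=lambda c: scores[c][y][x]) for x in range(w)]
            for y in range(h)]
```

The same fusion is applied twice in FCN-8s, with each fused map upsampled again before the next skip.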
ParseNet
Motivation: Take into account the global context of the image
Architecture:
Backbone model: depends on the benchmark (FCN or DeepLab), extended with the new module
ParseNet context module: global pooling layer, L2 normalization layer and unpooling layer
Performance:
PASCAL-Context: 40.4% mIoU
PASCAL VOC 2012: 69.8% mIoU
―――W. Liu et al., ParseNet: Looking Wider to See Better, arXiv 2015
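ParseNet's context module average-pools each channel over the whole image, L2-normalizes the resulting vector, "unpools" it by copying it to every spatial position, and concatenates it with the local features, which are also L2-normalized. A minimal sketch with hypothetical helper names; ParseNet learns a per-channel scale after normalization, replaced here by a single constant `scale` (an assumption for brevity).

```python
import math


def l2_normalize(vec, scale=1.0, eps=1e-12):
    """Rescale a channel vector to unit L2 norm, then multiply by `scale`."""
    norm = math.sqrt(sum(v * v for v in vec)) + eps
    return [scale * v / norm for v in vec]


def global_avg_pool(feat):
    """One value per channel: the mean over all H x W positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def normalize_per_pixel(feat, scale=1.0):
    """L2-normalize the channel vector at every spatial position."""
    h, w = len(feat[0]), len(feat[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in feat]
    for y in range(h):
        for x in range(w):
            for c, v in enumerate(l2_normalize([ch[y][x] for ch in feat], scale)):
                out[c][y][x] = v
    return out


def parsenet_context(feat, scale=1.0):
    """Concatenate normalized local features with the normalized, unpooled global context."""
    h, w = len(feat[0]), len(feat[0][0])
    context = l2_normalize(global_avg_pool(feat), scale)
    unpooled = [[[v] * w for _ in range(h)] for v in context]  # copy to every position
    return normalize_per_pixel(feat, scale) + unpooled  # C channels -> 2C channels
```

Normalizing both parts before concatenation keeps the global vector, whose values can have a very different magnitude, from being dominated by (or dominating) the local features.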
Convolutional and Deconvolutional Networks
Motivation: Improve upsampling with a fully deconvolutional network
Architecture:
Input: instance proposal
Backbone model: VGG16
Deconvolution network: VGG16-like, with deconvolution and unpooling layers
Performance:
PASCAL VOC 2012: 72.5% mIoU
―――H. Noh et al., Learning Deconvolution Network for Semantic Segmentation, ICCV 2015
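The unpooling layers of the deconvolution network reverse max pooling by remembering, for each pooling window, where the maximum came from (the "switch"), then writing the value back at that location with zeros elsewhere. A minimal single-channel sketch with 2x2 windows and stride 2; the function names are hypothetical, and the deconvolution layers that densify the sparse unpooled map are not shown.

```python
def max_pool_with_switches(x):
    """2x2 max pooling (stride 2) on an H x W map, recording where each max came from."""
    h, w = len(x), len(x[0])
    pooled, switches = [], []
    for y in range(0, h - 1, 2):
        prow, srow = [], []
        for xx in range(0, w - 1, 2):
            window = [(x[y + dy][xx + dx], (y + dy, xx + dx))
                      for dy in (0, 1) for dx in (0, 1)]
            value, loc = max(window, key=lambda t: t[0])
            prow.append(value)
            srow.append(loc)
        pooled.append(prow)
        switches.append(srow)
    return pooled, switches


def max_unpool(pooled, switches, h, w):
    """Put each pooled value back at its recorded location; every other position is 0."""
    out = [[0.0] * w for _ in range(h)]
    for prow, srow in zip(pooled, switches):
        for value, (y, x) in zip(prow, srow):
            out[y][x] = value
    return out
```

Unlike plain upsampling, the switches restore each activation at its original position, which is what helps the decoder recover object shapes.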
U-Net
Motivation: Improve pattern localisation while training from very few annotated images
Architecture:
Network with a downsampling (contracting) part and an upsampling (expanding) part
"Copy and crop" pathways to keep pattern information
1x1 convolution generates the segmentation map
Performance: none reported on the datasets above (evaluated on biomedical images)
―――O. Ronneberger et al., U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI 2015
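Because U-Net uses unpadded convolutions, each encoder feature map is larger than the decoder map it is joined with, hence the "copy and crop" step: the encoder map is center-cropped to the decoder's spatial size and concatenated with it along the channel axis. A minimal sketch on C x H x W nested lists; the function names are hypothetical.

```python
def center_crop(feat, out_h, out_w):
    """Crop the central out_h x out_w region of every channel of a C x H x W map."""
    h, w = len(feat[0]), len(feat[0][0])
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return [[row[left:left + out_w] for row in ch[top:top + out_h]] for ch in feat]


def copy_and_crop(encoder_feat, decoder_feat):
    """Skip connection: crop the encoder map to the decoder's size, then
    concatenate the two along the channel axis (encoder channels first)."""
    out_h, out_w = len(decoder_feat[0]), len(decoder_feat[0][0])
    return center_crop(encoder_feat, out_h, out_w) + decoder_feat
```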
Feature Pyramid Network (FPN)
Motivation: Join low-resolution and high-resolution features at different scales
Architecture:
Input: instance proposals from Faster R-CNN
Bottom-up and top-down pathways (scaling factor 2)
Lateral connections (1x1 convolution)
MLP added for segmentation
Performance:
COCO 2016: 48.1% AR
―――T.-Y. Lin et al., Feature Pyramid Networks for Object Detection, CVPR 2017
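The top-down pathway starts from the coarsest bottom-up map and, at each finer level, adds a 2x nearest-neighbour upsampling of the current map to a 1x1-convolved (lateral) copy of the bottom-up map at that level. A minimal sketch on C x H x W nested lists with hypothetical names; FPN also applies a 3x3 convolution to every merged map and uses 256 output channels, both left out here.

```python
def conv1x1(feat, weight):
    """Lateral connection: 1x1 convolution from C_in channels to d channels (no bias)."""
    h, w = len(feat[0]), len(feat[0][0])
    return [[[sum(wk[c] * feat[c][y][x] for c in range(len(feat))) for x in range(w)]
             for y in range(h)] for wk in weight]


def upsample2x(feat):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [[[v for v in row for _ in range(2)] for row in ch for _ in range(2)]
            for ch in feat]


def add(a, b):
    """Element-wise sum of two C x H x W maps."""
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def top_down(bottom_up, lateral_weights):
    """bottom_up: maps C2, C3, ... from fine to coarse, each half the size of the previous.
    Returns the merged maps P2, P3, ... in the same order."""
    p = conv1x1(bottom_up[-1], lateral_weights[-1])
    pyramid = [p]
    for feat, weight in zip(reversed(bottom_up[:-1]), reversed(lateral_weights[:-1])):
        p = add(upsample2x(p), conv1x1(feat, weight))
        pyramid.append(p)
    return pyramid[::-1]
```

Each finer level thus receives the semantics of the coarser levels while keeping its own spatial resolution.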
Pyramid Scene Parsing Network (PSPNet)
Motivation: Learn global patterns using region-based context aggregation
Architecture:
Backbone network: ResNet with dilated network strategy
Pyramid Pooling Module: pooling, 1x1 convolution, upsampling and concatenation
Convolution layer to generate pixel-wise predictions
Performance:
PASCAL VOC 2012: 85.4% mIoU
Cityscapes: 80.2% mIoU
―――H. Zhao et al., Pyramid Scene Parsing Network, CVPR 2017
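The pyramid pooling module average-pools the feature map into several grids (1x1, 2x2, 3x3 and 6x6 bins in the paper), upsamples each pooled grid back to the input size and concatenates everything with the original map. A minimal single-channel sketch with hypothetical names; the bin edges follow PyTorch's adaptive pooling convention (an assumption), nearest-neighbour resizing stands in for PSPNet's bilinear upsampling, and the 1x1 convolution applied to each pooled map is omitted.

```python
import math


def adaptive_avg_pool(x, bins):
    """Average an H x W map into a bins x bins grid (bins may overlap when H % bins != 0)."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(bins):
        y0, y1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            x0, x1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [x[y][xx] for y in range(y0, y1) for xx in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def upsample_nearest(x, h, w):
    """Resize a bins x bins grid back to H x W by nearest neighbour."""
    bh, bw = len(x), len(x[0])
    return [[x[y * bh // h][xx * bw // w] for xx in range(w)] for y in range(h)]


def pyramid_pooling(x, bin_sizes=(1, 2, 3, 6)):
    """The input map followed by one pooled-and-upsampled map per pyramid level,
    i.e. the channel-wise concatenation fed to the final convolution."""
    h, w = len(x), len(x[0])
    return [x] + [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]
```

The 1x1 bin carries image-level context, while the finer grids keep coarse information about where that context comes from.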
Mask R-CNN
Motivation: Multi-task network (box regression, classification and mask prediction) trained jointly so that the tasks benefit each other
Architecture:
Base architecture: Faster R-CNN (with a ResNet/ResNeXt + FPN backbone)
RoIAlign layer: avoids quantizing the box coordinates and uses bilinear interpolation instead
3 output branches: bounding box coordinates, classification, binary mask
Performance:
COCO 2016: 37.1% AP
COCO 2017: 41.8% AP
―――K. He et al., Mask R-CNN, ICCV 2017
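RoIAlign keeps the box coordinates as real numbers: each output bin is sampled at a few regularly spaced points whose values are bilinearly interpolated from the four neighbouring feature cells, then averaged. A minimal single-channel sketch with hypothetical names; feature values are assumed to sit at integer coordinates, and 2 x 2 samples per bin with averaging is one configuration described in the paper, not the only one.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate an H x W map at a real-valued location (y, x) inside it."""
    h, w = len(feat), len(feat[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, box, out_size=2, samples=2):
    """Split box = (y_min, x_min, y_max, x_max), in feature-map coordinates and never
    rounded, into out_size x out_size bins; average samples x samples points per bin."""
    y_min, x_min, y_max, x_max = box
    bin_h, bin_w = (y_max - y_min) / out_size, (x_max - x_min) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            vals = [bilinear(feat,
                             y_min + (i + (si + 0.5) / samples) * bin_h,
                             x_min + (j + (sj + 0.5) / samples) * bin_w)
                    for si in range(samples) for sj in range(samples)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out
```

On a map whose values vary linearly, such as f(y, x) = 10y + x, each output bin equals the value at the bin centre, which makes the absence of rounding easy to check.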
DeepLab Family: DeepLab (v2)
Motivation: Handle objects at multiple scales and keep a higher resolution in the intermediate representations
Architecture:
Backbone network: ResNet-101 with "atrous" convolution (= dilated convolution) and without FC layers
Atrous Spatial Pyramid Pooling (ASPP): several parallel atrous convolutions with different rates
Bilinear interpolation and fully connected Conditional Random Field (CRF)
Performance:
PASCAL VOC 2012: 79.7% mIoU
PASCAL-Context: 45.7% mIoU
Cityscapes: 70.4% mIoU
―――L.-C. Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, TPAMI 2017
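An atrous convolution with rate r spaces the kernel taps r positions apart, so a 3-tap kernel covers 2r + 1 inputs with no extra parameters; DeepLab v2's ASPP runs several of them in parallel with different rates and fuses the branch outputs by summation. A minimal 1D sketch (cross-correlation, as in deep learning frameworks, with zero padding that keeps the length; odd kernel sizes assumed); the function names are hypothetical.

```python
def atrous_conv1d(x, weights, rate):
    """1D atrous (dilated) convolution with 'same' zero padding."""
    half = (len(weights) - 1) // 2 * rate
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t, w in enumerate(weights):
            j = i - half + t * rate  # taps are `rate` samples apart
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def aspp_v2(x, branch_weights, rates):
    """ASPP as in DeepLab v2: parallel atrous branches, outputs summed position-wise."""
    outs = [atrous_conv1d(x, w, r) for w, r in zip(branch_weights, rates)]
    return [sum(vals) for vals in zip(*outs)]
```

Feeding an impulse shows the dilation: `atrous_conv1d([0, 0, 0, 1, 0, 0, 0], [1, 2, 3], 2)` returns the (reversed) kernel spread two positions apart, `[0, 3, 0, 2, 0, 1, 0]`.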
DeepLab Family: DeepLabv3
Motivation: Improve multi-scale context
Architecture:
Backbone network: modified ResNet-101
ASPP: adds a 1x1 convolution branch and batch normalization
Final 1x1 convolution for pixel-wise prediction
Performance (pretraining: ImageNet + JFT-300M):
PASCAL VOC 2012: 86.9% mIoU
Cityscapes: 81.3% mIoU
―――L.-C. Chen et al., Rethinking Atrous Convolution for Semantic Image Segmentation, arXiv 2017
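In DeepLabv3, each ASPP branch (a 1x1 convolution or a 3x3 atrous convolution) is followed by batch normalization, and the branch outputs are concatenated along the channel axis before a final 1x1 convolution. A minimal 1D, single-input-channel sketch with batch normalization in inference mode (fixed running mean and variance); the names, the `(weights, rate, mean, var)` branch format and the ReLU after each branch are assumptions for illustration, and the image-level pooling branch of the paper is omitted.

```python
import math


def atrous_conv1d(x, weights, rate):
    """Dilated 1D convolution with zero 'same' padding (odd kernel size assumed)."""
    half = (len(weights) - 1) // 2 * rate
    return [sum(w * x[i - half + t * rate]
                for t, w in enumerate(weights) if 0 <= i - half + t * rate < len(x))
            for i in range(len(x))]


def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Inference-mode batch normalization with fixed running statistics."""
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def aspp_v3(x, branches, fuse_weights):
    """branches: list of (weights, rate, mean, var); a 1-tap kernel is the 1x1 branch.
    Each branch is conv -> BN -> ReLU; the outputs are stacked as channels and mixed
    at every position by a final 1x1 convolution (one fuse weight per branch)."""
    channels = [relu(batch_norm(atrous_conv1d(x, w, r), m, v)) for w, r, m, v in branches]
    return [sum(fw * ch[i] for fw, ch in zip(fuse_weights, channels)) for i in range(len(x))]
```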
DeepLab Family: DeepLabv3+
Motivation: Refine the segmentation around object boundaries
Architecture:
Backbone network: modified Xception
Atrous separable convolution (in ASPP and decoder)
Encoder-decoder structure to recover the boundaries
Performance:
PASCAL VOC 2012: 89.0% mIoU
Cityscapes: 82.1% mIoU
―――L.-C. Chen et al., Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, ECCV 2018
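An atrous separable convolution factorizes a standard convolution into a depthwise atrous convolution (one kernel per input channel) followed by a pointwise 1x1 convolution that mixes channels, which sharply cuts the number of weights. A minimal sketch that counts weights (biases ignored) and applies the factorized operation to 1D multi-channel signals; all names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution; the atrous rate does not change this count."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) + pointwise 1x1 convolution."""
    return k * k * c_in + c_in * c_out


def atrous_separable_conv1d(x, depthwise, pointwise, rate):
    """x: C_in signals of equal length; depthwise: one odd-length kernel per input
    channel; pointwise: C_out x C_in mixing weights applied at every position."""
    n = len(x[0])
    dw = []
    for signal, kernel in zip(x, depthwise):
        half = (len(kernel) - 1) // 2 * rate
        dw.append([sum(w * signal[i - half + t * rate] for t, w in enumerate(kernel)
                       if 0 <= i - half + t * rate < n) for i in range(n)])
    return [[sum(row[c] * dw[c][i] for c in range(len(dw))) for i in range(n)]
            for row in pointwise]
```

With k = 3 and 256 input and output channels, the standard layer has 589,824 weights against 67,840 for the separable one, roughly 8.7 times fewer.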
Path Aggregation Network (PANet)
Motivation: Enhance information propagation
Architecture:
New bottom-up pathway (propagation of low-level features)
Adaptive feature pooling: RoIAlign on every pyramid level, then fusion of the pooled features of each proposal
Mask branch: FCN with 4 convolutions and 1 deconvolution, plus a short path with a fully connected layer
Performance:
COCO 2016: 42.0% AP
COCO 2017: 46.7% AP
―――S. Liu et al., Path Aggregation Network for Instance Segmentation, CVPR 2018
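PANet's extra bottom-up pathway starts from N2 = P2 and builds each coarser level by downsampling the previous N map and adding the FPN map of the same size; adaptive feature pooling then fuses, for each proposal, the RoIAlign outputs taken from all levels element-wise. A minimal single-channel sketch with hypothetical names: stride-2 subsampling stands in for PANet's learned 3x3 stride-2 convolution, the 3x3 convolution after each addition is omitted, and element-wise max is used as the fusion operation.

```python
def downsample2x(feat):
    """Keep every other row and column (stand-in for a 3x3 stride-2 convolution)."""
    return [row[::2] for row in feat[::2]]


def add(a, b):
    """Element-wise sum of two H x W maps."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def bottom_up_path(p_levels):
    """p_levels: FPN maps P2, P3, ... (fine to coarse, each half the size of the previous).
    Returns N2 = P2 and N_{i+1} = down(N_i) + P_{i+1}."""
    n_levels = [p_levels[0]]
    for p in p_levels[1:]:
        n_levels.append(add(downsample2x(n_levels[-1]), p))
    return n_levels


def fuse_levels(pooled_per_level):
    """Adaptive feature pooling: element-wise max over the RoIAlign outputs
    (all of the same size) extracted for one proposal from every level."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*pooled_per_level)]
```

The shortcut matters because low-level information reaches the top level through a few layers instead of the whole backbone.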
Comparison
Results
Resources
Thanks for your attention :)