![Page 1: Pixel-Level Image Understanding with Semantic Segmentation ...valser.org/webinar/slide/slides/20190529/2019.5.29 赵恒爽.pdfMay 29, 2019 · Pixel-Level Image Understanding with](https://reader036.fdocuments.us/reader036/viewer/2022071218/60515cc725ec06681a520ac5/html5/thumbnails/1.jpg)
Pixel-Level Image Understanding with Semantic Segmentation and Panoptic Segmentation
Hengshuang Zhao
The Chinese University of Hong Kong
May 29, 2019
Part I: Semantic Segmentation
Semantic Segmentation
Figure: original images and their per-pixel annotations (classes shown: person, horse, car, background).
Images adapted from PASCAL VOC 2012 and ADE20K.
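To make "per-pixel annotation" concrete: a segmentation model outputs one score per class at every pixel, and the predicted label map keeps the highest-scoring class at each location. A minimal sketch, assuming a tiny hand-written score tensor and a hypothetical `label_map` helper; in practice the scores come from a CNN.

```python
# Toy example: the class names and score values are hypothetical.
CLASSES = ["background", "person", "horse", "car"]


def label_map(scores):
    """scores[c][y][x] is the score of class c at pixel (y, x).
    Returns labels[y][x] = index of the highest-scoring class at that pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


scores = [
    [[0.90, 0.10], [0.20, 0.10]],  # background
    [[0.05, 0.70], [0.10, 0.10]],  # person
    [[0.03, 0.10], [0.60, 0.20]],  # horse
    [[0.02, 0.10], [0.10, 0.60]],  # car
]
labels = label_map(scores)
print([[CLASSES[c] for c in row] for row in labels])
# → [['background', 'person'], ['horse', 'car']]
```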
Fully Convolutional Network
FCN [Long et al. 2015]
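The key FCN idea, as it is usually described, is that a classifier written as a 1x1 convolution applies the same weights at every location, so the network accepts inputs of any size and returns a coarse score map that is then upsampled to the input resolution. A minimal sketch with hypothetical weights; nearest-neighbour upsampling is a simplification standing in for FCN's learned (bilinear-initialised) deconvolution.

```python
def conv1x1(features, weights, bias):
    """features[c][y][x], weights[k][c], bias[k] -> scores[k][y][x].
    The same linear classifier runs at every (y, x), so any H x W works."""
    in_ch = len(features)
    h, w = len(features[0]), len(features[0][0])
    return [
        [
            [bias[k] + sum(weights[k][c] * features[c][y][x] for c in range(in_ch))
             for x in range(w)]
            for y in range(h)
        ]
        for k in range(len(weights))
    ]


def upsample_nearest(score_map, factor):
    """Nearest-neighbour upsampling of one H x W map by an integer factor
    (simplified stand-in for a learned deconvolution)."""
    return [
        [score_map[y // factor][x // factor] for x in range(len(score_map[0]) * factor)]
        for y in range(len(score_map) * factor)
    ]


# Two input channels on a 1 x 2 grid, two output classes (hypothetical weights).
features = [[[1.0, 0.0]], [[0.0, 1.0]]]
scores = conv1x1(features, weights=[[1.0, -1.0], [-1.0, 1.0]], bias=[0.0, 0.0])
print(scores)                           # → [[[1.0, -1.0]], [[-1.0, 1.0]]]
print(upsample_nearest(scores[0], 2))   # → [[1.0, 1.0, -1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
```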
Conditional Random Field
DeepLabV1 [Chen et al. 2015], DPN [Liu et al. 2015], CRF-RNN [Zheng et al. 2015]
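A CRF refines the CNN's per-pixel scores by encouraging neighbouring pixels to agree, and inference is typically done with mean-field updates. The sketch below is deliberately simplified: it uses a 4-connected grid with a Potts pairwise term and synchronous updates, not the fully connected CRF with Gaussian kernels used by DeepLabV1, nor the CRF-RNN formulation. The function name and `pairwise_weight` are hypothetical.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def mean_field_crf(logits, pairwise_weight=1.0, iterations=5):
    """logits[y][x][l]: per-pixel class scores from the CNN (unary term).
    Returns q[y][x][l], approximate marginals after mean-field updates on a
    4-connected grid with a Potts compatibility (simplified, not a dense CRF)."""
    h, w, n = len(logits), len(logits[0]), len(logits[0][0])
    q = [[softmax(logits[y][x]) for x in range(w)] for y in range(h)]
    for _ in range(iterations):
        new_q = []
        for y in range(h):
            row = []
            for x in range(w):
                # Potts model: up to a per-pixel constant, the pairwise message
                # for label l is pairwise_weight * sum of neighbours' Q(l).
                msg = [0.0] * n
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        for l in range(n):
                            msg[l] += q[ny][nx][l]
                row.append(softmax([logits[y][x][l] + pairwise_weight * msg[l]
                                    for l in range(n)]))
            new_q.append(row)
        q = new_q  # synchronous update: all pixels use the previous Q
    return q


# 1 x 3 image, 2 labels: the middle pixel weakly prefers label 1,
# but both neighbours strongly prefer label 0.
logits = [[[2.0, 0.0], [0.0, 0.5], [2.0, 0.0]]]
before = mean_field_crf(logits, iterations=0)
after = mean_field_crf(logits, iterations=5)
print(before[0][1][1] > before[0][1][0])  # → True (label 1 before smoothing)
print(after[0][1][0] > after[0][1][1])    # → True (label 0 after smoothing)
```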
Encoder-Decoder
UNet [Ronneberger et al. 2015], DeconvNet [Noh et al. 2015], SegNet [Badrinarayanan et al. 2015], LRR [Ghiasi et al. 2016],
RefineNet [Lin et al. 2017], FRRN [Pohlen et al. 2017]
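Encoder-decoder models downsample to gather context and then recover resolution in the decoder. One concrete decoder trick, used by SegNet, is to remember where each max-pooling maximum came from and place values back at those positions when upsampling (UNet instead concatenates encoder features into the decoder). A minimal sketch on a single 2-D map with 2x2, stride-2 pooling; the function names are hypothetical.

```python
def max_pool_with_indices(x):
    """2x2, stride-2 max pooling on an H x W map (H and W even).
    Returns pooled values and the flat index (y * W + x) of each maximum."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for y in range(0, h, 2):
        prow, irow = [], []
        for xx in range(0, w, 2):
            window = [(x[y + dy][xx + dx], (y + dy) * w + (xx + dx))
                      for dy in (0, 1) for dx in (0, 1)]
            value, index = max(window, key=lambda t: t[0])  # first max on ties
            prow.append(value)
            irow.append(index)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool(pooled, indices, h, w):
    """Place each pooled value back at its remembered position; zeros elsewhere."""
    out = [[0] * w for _ in range(h)]
    for prow, irow in zip(pooled, indices):
        for value, index in zip(prow, irow):
            out[index // w][index % w] = value
    return out


x = [[1, 5, 2, 0],
     [3, 4, 8, 1],
     [0, 0, 1, 1],
     [7, 2, 3, 9]]
pooled, indices = max_pool_with_indices(x)
print(pooled, indices)                 # → [[5, 8], [7, 9]] [[1, 6], [12, 15]]
print(max_unpool(pooled, indices, 4, 4))
# → [[0, 5, 0, 0], [0, 0, 8, 0], [0, 0, 0, 0], [7, 0, 0, 9]]
```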
Atrous Convolution / Dilated Convolution
DeepLabV1 [Chen et al. 2015], Dilation [Yu et al. 2016]
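Atrous (dilated) convolution spaces the kernel taps `dilation` samples apart, so a k-tap kernel covers (k - 1) * dilation + 1 inputs without extra parameters or downsampling. A minimal 1-D sketch (CNN-style cross-correlation, "valid" padding) plus a receptive-field helper for stacked stride-1 layers; the function names are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D cross-correlation with taps spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation + 1  # effective kernel size
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


print(dilated_conv1d(list(range(10)), [1, 1, 1], dilation=2))  # → [6, 9, 12, 15, 18, 21]
print(receptive_field(3, [1, 1, 1]))  # → 7
print(receptive_field(3, [1, 2, 4]))  # → 15
```

With the same number of 3-tap layers and parameters, doubling the dilation per layer grows the receptive field much faster than plain convolution.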
Context Aggregation
Pooling: ParseNet [Liu et al. 2015], PSPNet [Zhao et al. 2017], DeepLabV2 [Chen et al. 2016]
Large Kernel: GCN [Peng et al. 2017]
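Pooling-based context aggregation summarises the feature map at several grid sizes. PSPNet's pyramid pooling uses bins of 1x1, 2x2, 3x3 and 6x6; each pooled map is then reduced with a 1x1 conv, upsampled, and concatenated with the input features (those steps are omitted here). A minimal sketch of the pooling step on one channel, assuming PyTorch-style adaptive average pooling bin boundaries (start = floor(i*H/n), end = ceil((i+1)*H/n)); the function names are hypothetical.

```python
import math


def adaptive_avg_pool(x, out_size):
    """Average-pool an H x W map into out_size x out_size bins."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(out_size):
        y0, y1 = (i * h) // out_size, math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            x0, x1 = (j * w) // out_size, math.ceil((j + 1) * w / out_size)
            cells = [x[y][xx] for y in range(y0, y1) for xx in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def pyramid_pool(x, bins=(1, 2, 3, 6)):
    """One pooled map per pyramid level (global, then increasingly local)."""
    return {b: adaptive_avg_pool(x, b) for b in bins}


x = [[y * 6 + c for c in range(6)] for y in range(6)]  # 6 x 6 toy feature map
pyramid = pyramid_pool(x)
print(pyramid[1])        # → [[17.5]]  (global average)
print(pyramid[2][0][0])  # → 7.0       (mean of the top-left 3 x 3 block)
```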
Neural Architecture Search
Search for backbone: Auto-DeepLab [Liu et al. 2019]
Search for head: DPC [Chen et al. 2018]
Attention Mechanism
Spatial attention (dot product): Transformer [Vaswani et al. 2017], Non-Local-Net [Wang et al. 2018],
OCNet [Yuan et al. 2018], DANet [Fu et al. 2018], CCNet [Huang et al. 2018]
Channel reweighting: SENet [Hu et al. 2018], EncNet [Zhang et al. 2018], DFN [Yu et al. 2018]
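The two attention families above can be sketched side by side. Dot-product spatial attention lets every position aggregate features from all positions, weighted by softmax-normalised similarity; channel reweighting (squeeze-and-excitation style) rescales each channel by a gate computed from global statistics. Both are simplified sketches: the learned query/key/value embeddings and output projection of non-local blocks are replaced by the identity, and the SE weights below are hypothetical.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def spatial_attention(feats):
    """feats: N position vectors (C values each). Each output position adds a
    softmax(dot-product)-weighted sum over all positions (residual form)."""
    out = []
    for q in feats:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in feats])
        agg = [sum(wt * k[c] for wt, k in zip(weights, feats)) for c in range(len(q))]
        out.append([qi + ai for qi, ai in zip(q, agg)])
    return out


def squeeze_excite(channels, w1, w2):
    """channels[c]: flat list of channel c's spatial values.
    w1: (C/r) x C and w2: C x (C/r) weight matrices (hypothetical values)."""
    s = [sum(ch) / len(ch) for ch in channels]                              # squeeze
    z = [max(0.0, sum(wi * si for wi, si in zip(row, s))) for row in w1]    # FC + ReLU
    g = [1 / (1 + math.exp(-sum(wi * zi for wi, zi in zip(row, z)))) for row in w2]  # FC + sigmoid
    return [[gc * v for v in ch] for gc, ch in zip(g, channels)]           # reweight


print(spatial_attention([[1.0, 0.0], [1.0, 0.0]]))  # → [[2.0, 0.0], [2.0, 0.0]]
# Zero weights give a gate of sigmoid(0) = 0.5 for every channel.
print(squeeze_excite([[2.0, 4.0], [1.0, 1.0]], [[0.0, 0.0]], [[0.0], [0.0]]))
# → [[1.0, 2.0], [0.5, 0.5]]
```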
Point-wise Spatial Attention Network (PSANet)
• Conv & dilated conv: fixed grid; information flow is restricted to local regions
• Pooling operation: fixed weights at each position; not adaptive
• Feature correlation: relative position information is ignored
• Point-wise spatial attention:
  • Long-range context aggregation for dense prediction
  • Bi-directional information propagation
  • Self-adaptively learned, location-sensitive masks
Point-wise Spatial Attention Network
Point-wise Spatial Attention Network
Information collection branch
Information distribution branch
Masks: over-completed → compact
Feature fusion: local & global
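Our reading of the "over-completed → compact" step, based on the PSANet description: at each position (k, l) the network predicts a (2H-1) x (2W-1) mask whose centre corresponds to (k, l) itself, and the H x W window that overlaps the actual feature map is cropped out, so entry (y, x) holds the weight for relative offset (y - k, x - l). The collection branch then averages features with the mask predicted at the target position; the distribution branch uses the mask predicted at each source position. A simplified sketch under those assumptions: one feature channel, masks given directly instead of predicted by conv layers, hypothetical function names.

```python
def compact_mask(over, k, l, h, w):
    """Crop the h x w part of a (2h-1) x (2w-1) over-completed mask centred
    on (k, l); entry (y, x) is the weight for offset (y - k, x - l)."""
    return [[over[h - 1 - k + y][w - 1 - l + x] for x in range(w)] for y in range(h)]


def collect(features, over_masks):
    """z[k][l] = (1/N) * sum_j a_{(k,l), j} * x_j, using the mask predicted at (k, l)."""
    h, w = len(features), len(features[0])
    n = h * w
    out = []
    for k in range(h):
        row = []
        for l in range(w):
            a = compact_mask(over_masks[k][l], k, l, h, w)
            row.append(sum(a[y][x] * features[y][x] for y in range(h) for x in range(w)) / n)
        out.append(row)
    return out


def distribute(features, over_masks):
    """z[k][l] = (1/N) * sum_j a_{j, (k,l)} * x_j: the mask predicted at source j
    decides how much of x_j is sent to each target (k, l)."""
    h, w = len(features), len(features[0])
    n = h * w
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a = compact_mask(over_masks[y][x], y, x, h, w)
            for k in range(h):
                for l in range(w):
                    out[k][l] += a[k][l] * features[y][x] / n
    return out


# 1 x 3 map; every position uses a mask that puts weight 1 on offset (0, +1).
features = [[3.0, 6.0, 9.0]]
shift_right = [[0, 0, 0, 1, 0]]  # 1 x 5 over-completed mask, centre at index 2
masks = [[shift_right] * 3]
print(collect(features, masks))     # → [[2.0, 3.0, 0.0]]  (each pulls from its right)
print(distribute(features, masks))  # → [[0.0, 1.0, 2.0]]  (each pushes to its right)
```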
Attention Mask Generation
Incorporation with FCN
Results on ADE20K and VOC 2012
ADE20K: comparison of information aggregation approaches; ADE20K: results on val set
PASCAL VOC 2012: results on val set
Results on Cityscapes
Results on val set
Results on test set (trained on the fine set)
Results on test set (trained on the fine + coarse sets)
Visual Prediction on ADE20K
Visual Prediction on VOC 2012
Visual Prediction on Cityscapes
Mask Visualization
Part II: Panoptic Segmentation
Semantic Segmentation
semantic segmentation: instances are indistinguishable
Instance Segmentation
instance segmentation: stuff is left unsolved
Panoptic Segmentation
panoptic segmentation: both stuff and things are solved, and instances are distinguishable
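One way to see how the three tasks relate: if every pixel carries a (class, instance id) pair, with stuff classes using instance id 0 because stuff regions are not split into instances, then semantic segmentation is the panoptic map with instance ids dropped, and instance segmentation is the panoptic map with stuff dropped. A toy sketch; the pair layout, class names and helper names are hypothetical, not a dataset's official format.

```python
# Hypothetical layout: panoptic[y][x] = (class_name, instance_id); stuff uses id 0.
STUFF = {"sky", "road"}


def to_semantic(panoptic):
    """Drop instance ids: the two cars below become indistinguishable."""
    return [[cls for cls, _ in row] for row in panoptic]


def to_instances(panoptic):
    """Keep only things: one pixel set per (class, instance_id); stuff is dropped."""
    masks = {}
    for y, row in enumerate(panoptic):
        for x, (cls, inst) in enumerate(row):
            if cls not in STUFF:
                masks.setdefault((cls, inst), set()).add((y, x))
    return masks


panoptic = [
    [("sky", 0), ("sky", 0), ("sky", 0)],
    [("car", 1), ("road", 0), ("car", 2)],
]
print(to_semantic(panoptic))   # → [['sky', 'sky', 'sky'], ['car', 'road', 'car']]
print(to_instances(panoptic))  # → {('car', 1): {(1, 0)}, ('car', 2): {(1, 2)}}
```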
Heuristic Combination
Instance: Mask R-CNN [He et al. 2017]
Semantic: PSPNet [Zhao et al. 2017]
redundant computation from two independent models
Heuristic Combination
Instance: Mask R-CNN [He et al. 2017]
Semantic: PSPNet [Zhao et al. 2017]
Heuristic merge
the heuristic merge logic is not end-to-end trainable
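A heuristic merge of this kind typically pastes instance masks in order of confidence, drops masks that are mostly hidden by earlier ones, and fills the remaining pixels from the semantic map. The sketch below is a generic example of such rules, not the exact logic behind the slide; the overlap threshold and the `"void"` label are hypothetical. The sorting, thresholding and set operations are hard decisions with no gradients, which is why this step cannot be trained end to end.

```python
def heuristic_merge(semantic, instances, stuff_classes, overlap_thresh=0.5):
    """semantic[y][x]: class from the semantic model (e.g. PSPNet).
    instances: list of (score, class, set_of_(y, x)_pixels) from the instance
    model (e.g. Mask R-CNN). Returns panoptic[y][x] = (class, instance_id)."""
    h, w = len(semantic), len(semantic[0])
    panoptic = [[None] * w for _ in range(h)]
    next_id = 1
    # 1) Paste instances from most to least confident.
    for score, cls, pixels in sorted(instances, key=lambda t: t[0], reverse=True):
        free = {p for p in pixels if panoptic[p[0]][p[1]] is None}
        if not pixels or len(free) / len(pixels) < 1 - overlap_thresh:
            continue  # mostly covered by higher-scoring instances: drop it
        for y, x in free:
            panoptic[y][x] = (cls, next_id)
        next_id += 1
    # 2) Fill the rest from the semantic map; uncovered thing pixels become void.
    for y in range(h):
        for x in range(w):
            if panoptic[y][x] is None:
                cls = semantic[y][x]
                panoptic[y][x] = (cls, 0) if cls in stuff_classes else ("void", 0)
    return panoptic


semantic = [["road", "car", "car", "car"]]
instances = [(0.9, "car", {(0, 1), (0, 2)}), (0.3, "car", {(0, 2)})]
print(heuristic_merge(semantic, instances, stuff_classes={"road"}))
# → [[('road', 0), ('car', 1), ('car', 1), ('void', 0)]]
```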
heuristic combination
our end-to-end output
Unified Panoptic Segmentation Network (UPSNet)
Unified Backbone Network: Save Computation!
Pixel-wise Classification: Consistent Estimation!
Semantic & Instance Head
Semantic Head: FPN with Deformable Conv
Instance Head: Same as Mask R-CNN
Panoptic Head
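• Mask logits from the instance head: Y_i, resized/padded to H x W
• Thing & stuff logits from the semantic head: X_thing, and X_stuff (N_stuff x H x W)
• Per-instance logits: X_mask_i (N_inst x H x W)
• Panoptic logits: stuff channels, instance channels, and 1 channel of logits for the unknown class, computed with two max operations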
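A sketch of the panoptic head under our reading of UPSNet, with these assumptions stated explicitly: X_mask_i is the semantic head's thing logit map for instance i's class, kept inside the instance's box and zero outside; the instance channel is X_mask_i + Y_i; the unknown channel is max over X_thing minus max over X_mask (the two "max" operations on the slide); and inference takes a per-pixel argmax over all channels. Lists of lists stand in for tensors, and the function names are hypothetical.

```python
def panoptic_logits(x_stuff, x_thing, instances):
    """x_stuff: N_stuff maps, x_thing: N_thing maps (each H x W, semantic head).
    instances: list of (thing_class, (y0, x0, y1, x1), y_i), where y_i is the
    instance head's mask logit map already resized/padded to H x W.
    Returns N_stuff + N_inst + 1 maps: stuff, instances, unknown."""
    h, w = len(x_thing[0]), len(x_thing[0][0])
    x_mask, inst_logits = [], []
    for cls, (y0, x0, y1, x1), y_i in instances:
        # X_mask_i: thing logits of this class inside the box, zero outside (assumption).
        m = [[x_thing[cls][y][x] if y0 <= y < y1 and x0 <= x < x1 else 0.0
              for x in range(w)] for y in range(h)]
        x_mask.append(m)
        inst_logits.append([[m[y][x] + y_i[y][x] for x in range(w)] for y in range(h)])
    # Unknown: high where some thing class fires but no instance explains it (assumption).
    unknown = [[max(t[y][x] for t in x_thing) - max((m[y][x] for m in x_mask), default=0.0)
                for x in range(w)] for y in range(h)]
    return x_stuff + inst_logits + [unknown]


def decode(logits):
    """Per-pixel argmax over channels: 0..N_stuff-1 stuff, then instances, last unknown."""
    h, w = len(logits[0]), len(logits[0][0])
    return [[max(range(len(logits)), key=lambda c: logits[c][y][x]) for x in range(w)]
            for y in range(h)]


x_stuff = [[[2.0, -1.0]]]  # one stuff class on a 1 x 2 image
x_thing = [[[0.0, 1.5]]]   # one thing class
# Instance box covers pixel (0, 1): channels are [stuff, instance 0, unknown].
print(decode(panoptic_logits(x_stuff, x_thing, [(0, (0, 1, 1, 2), [[0.0, 1.0]])])))  # → [[0, 1]]
# Box covers only pixel (0, 0): the thing evidence at (0, 1) becomes "unknown".
print(decode(panoptic_logits(x_stuff, x_thing, [(0, (0, 0, 1, 1), [[0.0, 0.0]])])))  # → [[0, 2]]
```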
Performance Comparison
Charts comparing UPSNet with MR-CNN-PSP: results on COCO (800 x 1300) and results on Cityscapes (1024 x 2048).
Detailed Result
result on COCO; result on Cityscapes
result on internal data; run time comparison
Visual Prediction
result on COCO
result on Cityscapes
Code Resource
I. Semantic Segmentation:
• Caffe:
  • https://github.com/hszhao/PSPNet
  • https://github.com/hszhao/PSANet
  • https://github.com/hszhao/ICNet
• PyTorch:
  • https://github.com/hszhao/semseg (new): highly optimized codebase with better reimplementation results
II. Panoptic Segmentation:
• PyTorch:
  • https://github.com/uber-research/UPSNet: the first open-source codebase for unified end-to-end panoptic segmentation
Remaining Problems
I. Semantic Segmentation:
• Class imbalance: long-tail distribution
• Confusable classes: use a human confusion matrix (e.g., on ADE20K) as a prior
• Data augmentation: adaptive augmentation or auto augmentation
• Hard mining: effective but not elegant
• Robustness and generalization: one model for different datasets
• Accuracy and efficiency: can both be achieved?
II. Panoptic Segmentation:
• Introduce parameters into the panoptic head (e.g., 3D conv)
• New frameworks with a single panoptic head
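To illustrate the "hard mining" item: one common form in segmentation computes the per-pixel cross-entropy and averages only the hardest fraction of pixels, so easy, well-classified pixels stop dominating the loss. A minimal sketch; `keep_ratio` and the function names are hypothetical, and practical variants often use a probability threshold instead of a fixed fraction.

```python
import math


def pixel_ce(probs, label):
    """Cross-entropy of one pixel given its class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def hard_mined_loss(prob_maps, labels, keep_ratio=0.25):
    """prob_maps: per-pixel probability vectors (flattened image); labels: ground
    truth per pixel. Averages the loss over the hardest keep_ratio of pixels."""
    losses = sorted((pixel_ce(p, t) for p, t in zip(prob_maps, labels)), reverse=True)
    k = max(1, int(len(losses) * keep_ratio))
    return sum(losses[:k]) / k


probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
labels = [0, 0, 0, 0]
print(hard_mined_loss(probs, labels, keep_ratio=0.25))  # only the misclassified pixel counts
print(hard_mined_loss(probs, labels, keep_ratio=1.0))   # plain mean over all pixels
```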
Thanks!