Joint Object Detection and Viewpoint Estimation using
CNN features
VII LSI PhD Workshop
Carlos Guindel
Intelligent Systems Laboratory · Universidad Carlos III de Madrid
Leganés · 19 June 2017
C. Guindel, D. Martín, and J. M. Armingol, “Joint Object Detection and Viewpoint Estimation using CNN features,” in IEEE International Conference on Vehicular Electronics and Safety (ICVES), 2017.
Outline
• Introduction
• Object detection
• Viewpoint estimation
• Results
• Conclusion
Joint Object Detection and Viewpoint Estimation using CNN features · Carlos Guindel · ICVES 2017
Situational awareness for vehicles
• Advanced Driver Assistance Systems (ADAS) and autonomous vehicles depend on a reliable on-board obstacle detection module.
• Precise classification of obstacles enables accurate predictions of future traffic situations, including those involving vulnerable road users (VRUs).
• Another significant cue that can be used to anticipate future events is the orientation of the objects moving on the ground plane.
On-board object detection
[Diagram labels: laser rangefinder, computer vision, hand-crafted features, convolutional neural networks, recognition, detection]
Proposal overview
[Diagram, built on Faster R-CNN¹: RGB image → convolutional layers → features → RPN → regions → Fast R-CNN → refined bounding box, classification, viewpoint]
¹ S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, 2016.
Object detection
• Traffic environments
  • Diversity of agents
  • Unstructured environment
• Faster R-CNN
  • End-to-end feature learning
  • Highly efficient
  • No prior constraints about the location of objects in the image
  • Meant for more than 21 classes
Faster R-CNN framework
• Convolutional features are computed only once per image.
• An RPN generates proposals with respect to a fixed set of anchors.
• Convolutional features in these regions are pooled for classification.
• Parameters are learned through a multi-task loss.
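To make the "fixed set of anchors" concrete, here is a minimal sketch of how such a bank of reference boxes could be generated. The function name `make_anchors` and the default base size, aspect ratios and scales are illustrative assumptions, not the optimized anchors used in this work.

```python
import numpy as np

def make_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return (len(ratios) * len(scales), 4) boxes as (x1, y1, x2, y2), centred on the origin.

    All default values are illustrative assumptions.
    """
    anchors = []
    for ratio in ratios:              # ratio is defined here as height / width
        for scale in scales:
            area = (base_size * scale) ** 2
            w = np.sqrt(area / ratio)
            h = w * ratio             # keeps w * h equal to the target area
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

anchors = make_anchors()
print(anchors.shape)  # one row per (ratio, scale) pair
```

In a detector, the same bank would be replicated at every position of the feature map, and the RPN would predict an objectness score and a box offset relative to each replica.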
Fine-tuning for traffic environments
• Optimized anchors
• Management of class imbalance
• Information gain multinomial logistic loss for the class inference
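$$L = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} H_{l_n,k} \log(\hat{p}_{n,k})$$
Infogain matrix:
$$H = \begin{pmatrix} H_{0,0} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & H_{K,K} \end{pmatrix}$$
With a diagonal infogain matrix, the loss reduces to:
$$L = -\frac{1}{N} \sum_{n=1}^{N} H_{l_n,l_n} \log(\hat{p}_{n,l_n})$$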
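The reduction from the double sum to the single sum can be checked numerically. The sketch below implements both forms with NumPy; the function names, the example probabilities and the values on the diagonal of `H` are made up for illustration.

```python
import numpy as np

def infogain_loss(probs, labels, H):
    """General form: L = -(1/N) * sum_n sum_k H[l_n, k] * log(p[n, k])."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    rows = H[labels]                      # (N, K): the row of H selected by each label
    return -np.mean(np.sum(rows * np.log(probs), axis=1))

def infogain_loss_diagonal(probs, labels, H):
    """Diagonal form: L = -(1/N) * sum_n H[l_n, l_n] * log(p[n, l_n])."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n = np.arange(len(labels))
    return -np.mean(H[labels, labels] * np.log(probs[n, labels]))

# Hypothetical example: three classes, two samples, arbitrary class weights.
H = np.diag([1.0, 2.0, 0.5])
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
full = infogain_loss(probs, labels, H)
diag = infogain_loss_diagonal(probs, labels, H)
```

With an identity matrix in place of `H`, both functions fall back to the usual multinomial logistic (cross-entropy) loss.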
Viewpoint estimation
Features can be learned for multiple tasks.
[Diagram: RGB image → convolutional layers → features → RPN → regions → Fast R-CNN → refined bounding box, classification, viewpoint]
Discrete viewpoint inference
• $N_b$ angle bins $\Theta_i, \dots, \Theta_{N_b}$ (in the figure, $N_b = 8$)
• Training: each ground-truth angle is mapped to its bin, $\theta_i^0 \to \Theta_i$
• Inference output: $r \in \Delta^{N_b - 1}$
• Final estimation: $\Theta_{i^*} \to \hat{\theta}$
[Figure: bar chart of the elements of $r$]
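The two mappings (angle to bin during training, bin to angle at inference) could look like the sketch below. Several details are assumptions made for illustration only: bin 0 is taken to start at angle 0, the winning bin is chosen with an argmax over $r$, and the centre of that bin is returned as the final estimate.

```python
import math

N_B = 8                          # number of angle bins
BIN_WIDTH = 2 * math.pi / N_B    # pi / 4 rad per bin

def angle_to_bin(theta):
    """Training side: map a continuous angle (rad) to a bin index in [0, N_B)."""
    wrapped = theta % (2 * math.pi)
    # The final modulo guards against float rounding at the 2*pi boundary.
    return int(wrapped // BIN_WIDTH) % N_B

def bin_to_angle(r):
    """Inference side: pick the most likely bin and return its centre (assumed decoding)."""
    i_star = max(range(len(r)), key=lambda i: r[i])
    return (i_star + 0.5) * BIN_WIDTH

r = [0.05, 0.05, 0.60, 0.10, 0.05, 0.05, 0.05, 0.05]   # hypothetical softmax output
theta_hat = bin_to_angle(r)
```

With eight bins, the estimate under these assumptions can be off by at most half a bin width, that is $\pi/8$ rad.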
Joint detection and viewpoint estimation
CNN outputs
RPN, for each anchor:
• Objectness
• Predicted bounding box
Fast R-CNN, for each proposal:
• Class
• Bounding box refinement (per class)
• Viewpoint (per class): $N_b \cdot K$ elements, where $N_b$ is the number of angle bins and $K$ is the number of classes
Joint detection and viewpoint estimation
[Diagram: feature map → proposal → fixed-size feature vector → fully connected (FC) layers → sibling FC layers for class (softmax), bounding box regression, and viewpoint (softmax); the class and bounding box branches come from Fast R-CNN]
Only $N_b \cdot K \cdot 4096$ new weights
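As a rough illustration of where the $N_b \cdot K \cdot 4096$ figure comes from, the sketch below builds a single fully connected viewpoint layer on top of a 4096-dimensional feature vector. The value of `K`, the random initialization, the omission of the bias and the class-major layout of the output are all assumptions for the example.

```python
import numpy as np

N_B = 8          # angle bins, as in the slides
K = 4            # hypothetical number of classes
FC_DIM = 4096    # length of the shared feature vector

rng = np.random.default_rng(0)
W_view = rng.normal(scale=0.01, size=(N_B * K, FC_DIM))   # the new weights (bias omitted)

def viewpoint_scores(features):
    """Map one feature vector to K softmax distributions over the N_B bins."""
    logits = (W_view @ features).reshape(K, N_B)     # assumed layout: one row per class
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

r = viewpoint_scores(rng.normal(size=FC_DIM))
print(W_view.size)   # N_B * K * FC_DIM
```

Because the convolutional and earlier FC layers are shared with detection, this layer is the only part that adds parameters for the viewpoint task.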
Loss function and training
• Approximate joint training strategy
• Unweighted multi-task loss with five components:
  • Logistic loss for RPN objectness
  • Smooth-L1 loss for RPN bounding box regression
  • Infogain loss for class (the less frequent the class, the higher $H_{v_i,k}$):
$$L_{inf}(p_i, v_i) = -\sum_{k=1}^{K} H_{v_i,k} \log(p_{i,k})$$
  • Smooth-L1 loss for bounding box regression
  • Logistic loss for viewpoint estimation (computed over only the $N_b$ elements of the ground-truth class)
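A minimal sketch of how the pieces could fit together is given below: a smooth-L1 term, a viewpoint loss restricted to the $N_b$ outputs of the ground-truth class, and an unweighted sum of the five components. Function names, the class-major layout of the viewpoint vector and the softmax form of the "logistic loss" are assumptions for illustration.

```python
import numpy as np

def smooth_l1(x):
    """Smooth-L1: 0.5 * x^2 where |x| < 1, |x| - 0.5 elsewhere, summed over elements."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5).sum()

def viewpoint_loss(view_logits, gt_class, gt_bin, n_bins=8):
    """Softmax log-loss over only the n_bins logits that belong to the ground-truth class."""
    logits = np.asarray(view_logits, dtype=float).reshape(-1, n_bins)[gt_class]
    logits = logits - logits.max()                       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[gt_bin]

def total_loss(rpn_objectness, rpn_box, cls, box, viewpoint):
    """Unweighted sum of the five components."""
    return rpn_objectness + rpn_box + cls + box + viewpoint

# Hypothetical viewpoint output for 3 classes x 8 bins; class 0 scores are ignored
# because the ground-truth class is 1.
view_logits = np.zeros(24)
view_logits[:8] = 100.0
l_view = viewpoint_loss(view_logits, gt_class=1, gt_bin=3)
```

Restricting the viewpoint term to the ground-truth class means that the bins of the other classes receive no gradient from that sample.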
Experiments
• On the KITTI Vision Benchmark Suite – Object detection set
• Training parameters:
  • Scale: 500 px in height
  • Learning rate schedule: 50k iterations at $lr = 0.001$, then 50k at $lr = 0.0001$, then 50k at $lr = 0.00001$
• VGG16 architecture, initialized with ImageNet weights.
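The three-stage learning rate schedule amounts to dividing the rate by ten every 50k iterations. A small sketch, with a hypothetical helper name and parameter names chosen for the example:

```python
def learning_rate(iteration, base_lr=0.001, gamma=0.1, step=50_000):
    """Step schedule: multiply the base rate by gamma once every `step` iterations."""
    return base_lr * gamma ** (iteration // step)

TOTAL_ITERATIONS = 150_000   # 3 stages of 50k iterations each
for it in (0, 50_000, 100_000):
    print(it, learning_rate(it))
```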
KITTI: A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 3354–3361.
VGG16 Image: https://blog.heuritech.com/2016/02/29/a-brief-report-of-the-heuritech-deep-learning-meetup-5/
Experiments
• 𝑁𝑏 = 8 (resolution: 𝜋/4 rad)
• Infogain matrix values:
$$H_{k,k} = 2 \cdot \left( \frac{f_{min}}{f_k} \right)^{1/8}$$
where $f_{min}$ is the number of occurrences of the least frequent class and $f_k$ is the number of instances of class $k$.
Evaluation criteria
• Average precision
• Average orientation similarity (performance of detection + orientation)
• Minimum overlaps established by the KITTI benchmark
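For reference, the orientation term behind the average orientation similarity in the KITTI benchmark scores each matched detection with $(1 + \cos \Delta\theta) / 2$, where $\Delta\theta$ is the angle error, and averages over all detections at a given recall level. The sketch below covers only that per-recall term, not the full averaging over recall levels; the function name and inputs are hypothetical.

```python
import math

def orientation_similarity(delta_thetas, matched):
    """Mean of (1 + cos(delta)) / 2 over all detections; unmatched detections count as 0."""
    total = sum((1 + math.cos(d)) / 2 for d, m in zip(delta_thetas, matched) if m)
    return total / len(delta_thetas)

# Two detections: one with a perfect angle, one pointing the opposite way.
s = orientation_similarity([0.0, math.pi], [True, True])
```

Since unmatched detections contribute zero, this score can never exceed the detection precision at the same recall, which is why AOS is bounded above by AP.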
KITTI submission
Results (KITTI submission: FRCNN+Or)
• Results are slightly different from (and generally better than) those in the paper, because the submission:
  • Used the whole KITTI training set
  • Was trained only with Car, Pedestrian and Cyclist
  • Did not fix the weights (and biases) of the first convolutional layers
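Detection (AP, %)

| Class | Easy | Moderate | Hard |
|---|---|---|---|
| Car | 89.60 | 78.59 | 68.69 |
| Pedestrian | 72.21 | 56.99 | 53.72 |
| Cyclist | 68.81 | 55.80 | 50.52 |
| mAP | 76.87 | 63.79 | 57.64 |
| SubCNN | 84.52 | 77.14 | 64.44 |

Detection + orientation (AOS, %)

| Class | Easy | Moderate | Hard |
|---|---|---|---|
| Car | 88.93 | 77.80 | 67.87 |
| Pedestrian | 67.92 | 52.96 | 49.61 |
| Cyclist | 64.90 | 51.47 | 46.48 |
| mAOS | 73.92 | 60.74 | 54.65 |
| SubCNN | 80.37 | 72.85 | 65.45 |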
SubCNN: Y. Xiang, W. Choi, Y. Lin and S. Savarese, “Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection,” in IEEE Winter Conference on Applications of Computer Vision (WACV), 2017, pp. 924-933.
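The mAP and mAOS rows are plain means over the three classes, which is easy to verify from the per-class figures. A short check, using a hypothetical helper name:

```python
ap = {
    "Car": (89.60, 78.59, 68.69),
    "Pedestrian": (72.21, 56.99, 53.72),
    "Cyclist": (68.81, 55.80, 50.52),
}
aos = {
    "Car": (88.93, 77.80, 67.87),
    "Pedestrian": (67.92, 52.96, 49.61),
    "Cyclist": (64.90, 51.47, 46.48),
}

def mean_over_classes(table):
    """Average each difficulty column (Easy, Moderate, Hard) across classes."""
    return tuple(round(sum(col) / len(col), 2) for col in zip(*table.values()))

m_ap = mean_over_classes(ap)
m_aos = mean_over_classes(aos)
print(m_ap, m_aos)
```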
Comparison with published KITTI submissions
[Bar chart: AOS (%) of the top 10 methods in the AOS ranking, moderate difficulty, published methods only; series: Car, Pedestrians, Cyclists]
Global rankings (including unpublished methods): Car: 11th · Pedestrian: 9th · Cyclist: 10th
Comparison with published KITTI submissions
[Bar chart: reported runtime per frame (s) of the top 10 methods in the AOS ranking, published methods only]
110 ms*
*Average running time using an implementation based on py-faster-rcnn (Python & Caffe) and an NVIDIA Titan Xp donated by NVIDIA Corporation
Conclusion
• Monocular approach for object detection focused on traffic environments.
• Based on a state-of-the-art CNN and adding viewpoint inference.
• Results comparable to those of more sophisticated, non-real-time approaches.
• Orientation is a step towards complete scene understanding.
Future work
• Fine-grained orientation inference using the cross-entropy logistic loss
$$L = -\frac{1}{N} \sum_{n=1}^{N} \left[ p_n \log \hat{p}_n + (1 - p_n) \log(1 - \hat{p}_n) \right]$$
• Improvements:
• Network architecture
• Methods to overcome the fixed-size receptive field
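The cross-entropy logistic loss mentioned for fine-grained orientation inference can be written in a few lines. This is a generic sketch of the formula, not the planned implementation; the function name and the clipping constant `eps` are assumptions added to keep the logarithms finite.

```python
import numpy as np

def cross_entropy_logistic_loss(p, p_hat, eps=1e-12):
    """L = -(1/N) * sum_n [p_n * log(p_hat_n) + (1 - p_n) * log(1 - p_hat_n)]."""
    p = np.asarray(p, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), eps, 1 - eps)   # avoid log(0)
    return -np.mean(p * np.log(p_hat) + (1 - p) * np.log(1 - p_hat))

loss = cross_entropy_logistic_loss([1.0, 0.0], [0.5, 0.5])
```

Unlike the softmax loss over bins, the targets $p_n$ here may take any value in $[0, 1]$, which is what makes this form a candidate for finer-grained targets.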
Other Developments
Additional data for the CNN
• Stereo. Using the disparity map as a fourth channel
C. Guindel, D. Martín, and J. M. Armingol, “Stereo Vision-Based Convolutional Networks for Object Detection in Driving Environments,” in EUROCAST 2017 - Extended Abstracts, 2017, pp. 288–289.
• Laser. Classifying object proposals coming from the Velodyne
Work in progress!
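Feeding the disparity map as a fourth channel means, at the input level, stacking it onto the RGB image. A minimal sketch with NumPy, in which the function name and array shapes are assumptions; any normalization of the disparity values is left out:

```python
import numpy as np

def stack_rgb_disparity(rgb, disparity):
    """Combine an (H, W, 3) image and an (H, W) disparity map into an (H, W, 4) array."""
    if rgb.shape[:2] != disparity.shape:
        raise ValueError("image and disparity map must have the same height and width")
    return np.dstack([rgb.astype(np.float32), disparity.astype(np.float32)])

rgb = np.zeros((4, 6, 3), dtype=np.uint8)        # dummy image
disparity = np.ones((4, 6), dtype=np.float32)    # dummy disparity map
rgbd = stack_rgb_disparity(rgb, disparity)
```

A network initialized from RGB-only weights would also need its first convolutional layer extended to accept the extra channel.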
VII LSI PhD Workshop · Carlos Guindel · ICVES 2017
Teamwork
AKA PILI, MILI & THE COAUTHORS
• Lidar-camera calibration. Solving a recurrent problem.
C. Guindel, J. Beltrán, D. Martín and F. García. “Automatic Extrinsic Calibration for Lidar-Stereo Vehicle Sensor Setups.” arXiv:1705.04085 [cs.CV], 2017.
• Didi Challenge. Faster R-CNN applied over image-like inputs (bird-view).
Proudly brought to you by J. Beltrán, D. Cruzado, F. Moreno and me.
[Diagram: conv layers → RPN → Fast R-CNN → bounding box, class, viewpoint (yaw, discrete, 8 bins)]
Thank you for your attention!
Questions?
Precision change when introducing viewpoint
Average precision (%), moderate difficulty level

| Configuration | Car | Pedestrian | Cyclist |
|---|---|---|---|
| Detection | 77.25 | 69.6 | 53.08 |
| Detection + Viewpoint | 77.72 | 67.12 | 44.82 |

Detection (ΔAP, %)

| Class | Easy | Moderate | Hard |
|---|---|---|---|
| Car | +0.72 | +0.47 | +0.26 |
| Pedestrian | -1.85 | -2.47 | -3.00 |
| Cyclist | -12.16 | -8.25 | -8.11 |