Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

32
Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation Carlos Guindel, David Martín and José María Armingol Intelligent Systems Laboratory (LSI) · Universidad Carlos III de Madrid Sevilla · 23 November 2017

Transcript of Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Page 1: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Modeling Traffic Scenes for Intelligent Vehicles using

CNN-based Detection and Orientation Estimation

Carlos Guindel, David Martín and José María Armingol

Intelligent Systems Laboratory (LSI) · Universidad Carlos III de Madrid

Sevilla · 23 November 2017

Page 2: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Agenda

Introduction

Obstacle detection

Scene modeling

Results

Conclusion

2

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 3: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Agenda

Introduction

Obstacle detection

Scene modeling

Results

Conclusion

3

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 4: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Introduction4

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

• A basic requirement for driving tasks

Obstacle detection

• An accurate estimation of the class is essential

Classification• Close-to-market

assemblies

• Rich data source

Vision-based approaches

• Feature learning

• The new paradigm in computer vision

Convolutional Neural Networks

Automated vehicles

• Highly dynamic, semi-structured environments

• They have to handle complex situations

Page 5: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

IVVI 2.0 project5

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

INTELLIGENT VEHICLE BASED ON VISUAL INFORMATION 2.0

Trinocular stereo cam.

Multi-layer lidar scanner

Side-looking cameras

Computer with GPU

+info:uc3m.es/islab

Page 6: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

System overview6

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

• Two main branches intended to run in parallel

• Obstacle detection

• Features are extracted exclusively from the left stereo image

• Scene modeling

• Stereo-based 3D reconstruction & flat-ground assumption

Page 7: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Agenda

Introduction

Obstacle detection

Scene modeling

Results

Conclusion

7

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 8: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Faster R-CNN framework

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, 2016.

Convolutional features computed only once per image

A RPN generates proposals wrt. a fixed set of anchors

Conv. features in these regionsare pooled for classification

Parameters are learned through a multi-task loss

8

Page 9: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Viewpoint estimation

• Faster R-CNN framework was modified to introduce viewpoint inference

9

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

C. Guindel, D. Martin, and J. M. Armingol, “Joint object detection and viewpoint estimation using CNN features,” in Proc. of the IEEE International Conference on Vehicular Electronics and Safety (ICVES), 2017, pp. 145–150.

Page 10: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Final estimation: Θ𝑖∗ → መ𝜃

Discrete viewpoint inference

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

𝑁𝑏 angle bins Θ𝑖 …Θ𝑁𝑏

𝑁𝑏 = 8Training: 𝜃𝑖0→ Θ𝑖

Inference output: r ∈ Δ𝑁𝑏−1

𝑟

Elements of 𝑟

10

• Every object is assigned a bin

• Inference gives a categorial distribution

Page 11: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Joint detection and viewpoint estimation

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

11

Feature map

Proposal

Fixed sizefeat. vector

Fully connected (FC) layers

FC layer FC layer FC layer

B. Box regression

Softmax Softmax Softmax

Class Viewpoint

Only 𝑁𝑏 · 𝐾 · 4096new weights

𝑁𝑏

· 𝐾

𝑁𝑏 · 𝐾

elementsangle bins

classes

Page 12: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Fast

er R

-CN

N lo

ss

Loss function and training

• Unweighted muli-task loss with five components

Joint Object Detection and Viewpoint Estimation using CNN featuresCarlos Guindel · ICVES 2017

Logistic lossfor RPN objectness

Smooth-L1 lossfor RPN b.box regression

Logistic lossfor class

Smooth-L1 lossfor b.box regression

Logistic lossfor viewpoint

estimation

12

𝑁𝑏 angle bins of the ground truth class

Ground-truth angle bin

Normalized on the batch

Page 13: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Agenda

Introduction

Obstacle detection

Scene modeling

Results

Conclusion

13

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 14: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Scene modeling14

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Left image

Right image

DisparityP.C.

generatorDisparity

map

3D point cloud

3D RECONSTRUCTION

Page 15: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Scene modeling15

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Left image

Right image

DisparityP.C.

generatorDisparity

map

3D point cloud

3D RECONSTRUCTION

Page 16: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Scene modeling16

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Left image

Right image

DisparityP.C.

generatorDisparity

map

3D point cloud

3D RECONSTRUCTION

SGM stereo matchingSuitable for environments with lack of texture, illumination changes, etc.

Page 17: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Scene modeling17

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Left image

Right image

DisparityP.C.

generatorDisparity

map

3D point cloud

3D RECONSTRUCTION

Pin-hole + disparityWe build a XYZRGB cloud

from the left image and the disparity map

Page 18: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Scene modeling18

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Left image

Right image

DisparityP.C.

generatorDisparity

map

3D point cloud

3D point cloud

Plane

segmentation

Plane model

Calibration from plane

Camera-to-world

calibration

3D RECONSTRUCTION

EXTRINSIC PARAMETERS AUTO-CALIBRATION

Page 19: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Scene modeling19

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

3D point cloud

Plane

segmentation

Plane model

Calibration from plane

Camera-to-world

calibration

EXTRINSIC PARAMETERS AUTO-CALIBRATION

Voxel grid dowsamplingThe cloud from the 3D reconstruction pipeline is downsampled (grid size: 20 cm)

Page 20: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Scene Modeling20

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

3D point cloud

Plane

segmentation

Plane model

Calibration from plane

Camera-to-world

calibration

EXTRINSIC PARAMETERS AUTO-CALIBRATION

Planar segmentationUsing RANSAC with a 10 cm threshold, and a small angular tolerance.

…Pass through filtersVertical axis: 0-2 mDepth axis: 0-20 m

Page 21: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Scene modeling21

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

3D point cloud

Plane

segmentation

Plane model

Calibration from plane

Camera-to-world

calibration

EXTRINSIC PARAMETERS AUTO-CALIBRATION

roll

pitchPlane:

Page 22: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Object localization22

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 23: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Object localization23

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Object ROI for localization

11 central rows of the object’s bounding box

Filtered cloudWithout points belonging to the ground or too close to the camera

Organized pointcloud

Coordinates on the world’sXY plane for every point

𝒙 = (𝑥, 𝑦, 𝑧)

𝒙𝑜𝑏𝑗 = 𝑚𝑒𝑑𝑖𝑎𝑛 𝒙

𝜃𝑜𝑏𝑗 = 𝛼 − atan2(𝑦𝑜𝑏𝑗 − 𝑥𝑜𝑏𝑗)

x

𝑦𝑜𝑏𝑗

world

y

𝑥𝑜𝑏𝑗

𝜃

Top-down view

yaw

Page 24: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Agenda

Introduction

Obstacle detection

Scene modeling

Results

Conclusion

24

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 25: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Results: Detection and viewpoint estimation

• KITTI Object Detection Benchmark

• 5,576 images for training and 2,065 for validation

• Labels for class and orientation available

• Evaluation metric

• Average Orientation Similarity (AOS)

25

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 26: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Results: Detection and viewpoint estimation

• Two different architectures:

• ZF (lightweight) and VGG 16-layer (more complex)

• Three different scales (height in pixels):

• 375, 500, 625

26

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

88,43 66,28 63,41 N.A. N.A. N.A. 2 sec.Top-performing

comparable method in the KITTI ranking

(ms)

Page 27: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Results: Scene modeling27

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 28: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Results: Scene modeling28

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 29: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Results: Scene modeling29

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 30: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Agenda

Introduction

Obstacle detection

Scene modeling

Results

Conclusion

30

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Page 31: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

Conclusion

• Towards a full object-based scene understanding

• CNN-based detection and viewpoint inference

• Efficient approach: the same set of features is used for all tasks

• Stereo-vision 3D information is included for situation assessment

• Results validate our approach

31

Modeling Traffic Scenes for Intelligent Vehicles using CNN-based Detection and Orientation Estimation

C. Guindel et al. · ROBOT 2017

Code for CNN detection & viewpoints available athttps://github.com/cguindel/lsi-faster-rcnn

Future work

• New categories of traffic elements

• Extension to the time domain

• Tracking, filtering, etc.

• Including information from other perception modules

• E.g., semantic segmentation

Page 32: Modeling Traffic Scenes for Intelligent Vehicles using CNN ...

THANKS FOR

YOUR ATTENTION

23 November 2017

ROBOT'2017 - Third Iberian Robotics Conference

Carlos Guindel · cguindel.github.io · [email protected]