Post on 12-Jan-2022
XILINX TECHNOLOGY DAY
原钢 (Yuan Gang), AI Solutions Marketing Specialist, Xilinx
March 19, 2009
Xilinx Machine Learning-Based Solutions for Automated Driving and ADAS
© Copyright 2019 Xilinx
Xilinx ADAS Market Leadership
Over 12 years of heritage as a semiconductor supplier
CY13 - CY17: > 60% CAGR
40M+ cumulative units shipped
CY | Makes | Models
2013 | 9 | 13
2014 | 14 | 29
2015 | 19 | 64
2016 | 23 | 85
2017 | 26 | 96

Applications Already Covered by Xilinx Solutions
Auto Trailer Hitch
Full Display Mirror
Surround View
Front Camera - Mono
Front Camera - Stereo
EV Car Charger System
Heads Up Display
Driver Monitoring System
LiDAR

Xilinx Expands Its Automotive-Grade (XA) Product Portfolio

Hardware Programmability Enables Higher-Performance Architectures
for (i = 0; i < num; ++i) {
    classification_process();
    hashing_process();
    encryption_process();
}
GPU implementation (parallelizing):
– The same kernel runs many times across multiple small cores
– Running a different application requires unloading the current kernel and loading a different one
FPGA implementation (pipelining and parallelizing):
– Functions A, B and C run as pipeline stages, replicated in parallel
– Thanks to pipelining, no kernel loading/unloading is required to run different applications
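The contrast between reloading a kernel for every function and streaming data through dedicated pipeline stages can be illustrated with a toy timing model. This is only a sketch under stated assumptions: every stage takes one time step, `swap_cost` is a hypothetical load/unload penalty, and neither function measures a real GPU or FPGA.

```python
def sequential_steps(num_items: int, num_stages: int, swap_cost: int = 0) -> int:
    """Time steps when one engine runs every stage of every item in turn.

    swap_cost is a hypothetical kernel load/unload penalty paid before
    each stage, as in the loop that calls three different functions.
    """
    return num_items * num_stages * (1 + swap_cost)


def pipelined_steps(num_items: int, num_stages: int) -> int:
    """Time steps when each stage has its own hardware and items stream through."""
    if num_items == 0:
        return 0
    # The first item needs num_stages steps to fill the pipeline;
    # afterwards one item completes per step.
    return num_stages + (num_items - 1)


if __name__ == "__main__":
    # Three stages: classification, hashing, encryption.
    for n in (1, 10, 1000):
        print(n, sequential_steps(n, 3, swap_cost=1), pipelined_steps(n, 3))
```

In this model the pipelined count approaches one item per step as the number of items grows, while the sequential count stays proportional to items times stages.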

OTA Silicon and Dynamic Functions
˃ Dynamic Function eXchange (DFX)
– Uses the same FPGA for mutually exclusive functions
– E.g., driver monitoring and valet parking
– Time-multiplexing the hardware allows a smaller FPGA
– Reduces system cost and size by using fewer silicon chips
˃ OTA Silicon
– OEMs require OTA updates so that vehicles can be upgraded with new innovations in emerging applications such as automated driving
– Like other SoC vendors we provide software updates, but we can go further by also updating the hardware: OTA Silicon

Automotive Modules
˃ 2D Object Detection
– Vehicle: Car, SUV, Bus…
– Pedestrian, Cyclist, Rider
– Traffic sign, Traffic light
˃ 3D Object Detection
˃ Pose Estimation
˃ Lane Detection
˃ Drivable Space Detection
˃ Semantic Segmentation

2D Object Detection
˃ 2D Object Detection
– Detection algorithms: SSD, Tiny YOLOv2, YOLOv2, Tiny YOLOv3, YOLOv3, Light-head RCNN, etc.
– Datasets: KITTI, Cityscapes, BDD100K, private data, etc.

2D Object Detection
˃ SSD
– Dataset: BDD100K and private data
– Categories: Pedestrian, Car, Cyclist
GOPs (480*360) | Compression ratio | mAP (GPU) | FPS (DPU, dual core, ZCU102)
117 | - | 46.8 | -
93.5 | 20% | 46.7 | -
69.7 | 40% | 46.3 | -
61 | 50% | 48.6 | -
46.9 | 60% | 48.1 | -
35.2 | 70% | 49.4 | -
28.9 | 75% | 48.6 | -
17.8 | 85% | 47.5 | -
12.1 | 90% | 46.2 | -
8.7 | 93% | 44.3 | -
6.3 | 95% | 42.7 | ~110
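The compression ratios in the SSD table appear to be the share of operations removed relative to the 117 GOPs baseline. The sketch below checks that reading against the table values; the formula is an assumption inferred from the numbers, not a definition given on the slide.

```python
BASELINE_GOPS = 117.0

# (GOPs after pruning, compression ratio stated on the slide in percent)
SSD_ROWS = [
    (93.5, 20), (69.7, 40), (61.0, 50), (46.9, 60), (35.2, 70),
    (28.9, 75), (17.8, 85), (12.1, 90), (8.7, 93), (6.3, 95),
]


def compression_ratio(ops: float, baseline: float) -> float:
    """Percentage of baseline operations removed (assumed definition)."""
    return 100.0 * (1.0 - ops / baseline)


for ops, stated in SSD_ROWS:
    computed = compression_ratio(ops, BASELINE_GOPS)
    print(f"{ops:6.1f} GOPs -> {computed:5.1f}% removed (slide: {stated}%)")
```

Every row agrees with the stated ratio to within about two percentage points, which suggests the slide values are rounded.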

2D Object Detection: Small Object Detection
˃ RefineDet: small pedestrian detection
– The original SSD model achieves 24% mAP on the small pedestrian dataset
– The RefineDet model now achieves 31.8% mAP
˃ RefineDet pruning
˃ FPS: baseline 210G at 25 fps; pruned 9.4G at 101 fps (ZCU102, triple B4096 @ 330 MHz)
RefineDet compression
Pruning step | Operations (G) | mAP (%)
baseline | 210 | 31.8
1 | 103.6 | 31.78
2 | 51.9 | 31.39
3 | 17.7 | 30.27
4 | 9.4 | 27.79

2D Object Detection
˃ YOLOv2 performance after compression on customer data
Pruning speedup on hardware (2x DPU @ ZU9); YOLOv2 single-class detection on customer data
Pruning loop | Operations (G) | mAP (%) | FPS
Baseline | 173 | 56.7 | 11.6
1 | 122 | 57.9 | 14.4
2 | 86 | 58.3 | 18.4
3 | 69 | 58.7 | 21.2
4 | 52 | 58.1 | 26.8
5 | 45 | 57.9 | 28.4
6 | 34 | 57.8 | 32.4
7 | 26.4 | 56.9 | 37.2
8 | 21.2 | 56.6 | 41.6
9 | 16.8 | 55.4 | 43.6
10 | 12.8 | 54.4 | 46.4
Chart callouts: 2.8x and 4x (consistent with the FPS gain over baseline at pruning loops 6 and 10)
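The 2.8x and 4x callouts can be reproduced from the YOLOv2 chart data. The sketch below uses the operations, mAP and fps series read from the chart (baseline followed by ten pruning loops) and computes each loop's gain relative to baseline; the helper name is illustrative only.

```python
# Series read from the chart: baseline followed by pruning loops 1..10.
OPS_G = [173, 122, 86, 69, 52, 45, 34, 26.4, 21.2, 16.8, 12.8]
MAP = [56.7, 57.9, 58.3, 58.7, 58.1, 57.9, 57.8, 56.9, 56.6, 55.4, 54.4]
FPS = [11.6, 14.4, 18.4, 21.2, 26.8, 28.4, 32.4, 37.2, 41.6, 43.6, 46.4]


def relative_to_baseline(values, invert=False):
    """Ratio of each value to the first one (or the inverse ratio)."""
    base = values[0]
    return [base / v if invert else v / base for v in values]


ops_reduction = relative_to_baseline(OPS_G, invert=True)
fps_speedup = relative_to_baseline(FPS)

for loop, (r, s, m) in enumerate(zip(ops_reduction, fps_speedup, MAP)):
    label = "baseline" if loop == 0 else f"loop {loop}"
    print(f"{label:>8}: {r:5.1f}x fewer ops, {s:4.1f}x fps, mAP {m - MAP[0]:+.1f}")
```

With these numbers, loop 6 gives about 2.8x the baseline frame rate and loop 10 gives 4x, while operations fall by roughly 13.5x at loop 10, so throughput does not scale linearly with operation count.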

2D Object Detection
˃ YOLOv3 performance after compression
– Dataset: Cityscapes
– Categories: Pedestrian, Car, Cyclist
– Platform: ZCU102, triple B4096 @ 330 MHz
GOPs (512*256) | Compression ratio | mAP (DarkNet) | mAP (DPU) | FPS (DPU)
53.7 | - | 53.7 | 53.1 | 43
24.5 | 54% | 53.7 | 53.7 | 61
17.0 | 68% | 54.0 | 53.4 | 74
13.7 | 75% | 56.1 | 55.4 | 82
10.7 | 80% | 55.4 | 52.9 | 86
7.5 | 86% | 57.0 | 55.3 | 93
5.7 | 89% | 55.2 | 53.0 | 97
4.0 | 93% | 51.2 | 49.3 | 100
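One way to read the YOLOv3 table is as a trade-off curve: pick the fastest pruned model whose DPU accuracy stays within a chosen budget of the baseline. The sketch below does that with the table values; the selection rule and the `max_map_drop` parameter are illustrative assumptions, not a procedure described on the slide.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    gops: float
    map_dpu: float
    fps: int


# (GOPs, mAP on DPU, FPS on DPU) from the table; the first row is the baseline.
ROWS = [
    Row(53.7, 53.1, 43), Row(24.5, 53.7, 61), Row(17.0, 53.4, 74),
    Row(13.7, 55.4, 82), Row(10.7, 52.9, 86), Row(7.5, 55.3, 93),
    Row(5.7, 53.0, 97), Row(4.0, 49.3, 100),
]


def fastest_within_budget(rows: List[Row], max_map_drop: float) -> Row:
    """Fastest row whose mAP is at most max_map_drop below the baseline row."""
    baseline = rows[0].map_dpu
    candidates = [r for r in rows if baseline - r.map_dpu <= max_map_drop]
    return max(candidates, key=lambda r: r.fps)


print(fastest_within_budget(ROWS, max_map_drop=0.5))
print(fastest_within_budget(ROWS, max_map_drop=0.0))
```

Allowing a 0.5 point drop selects the 5.7 GOPs model at 97 fps; allowing no drop selects the 7.5 GOPs model at 93 fps.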

2D Object Detection
˃ SSD Lite
– Backbone: Mobilenet_v2 (ReLU version)
– Dataset: BDD100K
– Input size: 480*360
– Operations: 6.57G
– mAP: 32.9
– DPU (one core) FPS: 36 (ZU9), 21 (ZU2)
˃ Tiny YOLOv3
– Datasets: KITTI, Cityscapes, BDD100K, private data, etc.
– Input size: 416*416
– Operations: 5.9G
– DPU FPS: 170 (ZU9, dual core)

3D Object Detection
˃ 3D Object Detection
– Reproduce the latest advanced 3D detection methods (F-PointNet and AVOD), which combine LiDAR point cloud and RGB image information
– Optimize post-processing

Pose Estimation
˃ Driver monitoring, gesture recognition
˃ Single-person pose estimation (after person detection)
– Keypoints: head, neck, shoulder, elbow, wrist, hip, knee, ankle
– Model: CNN with coordinate regression
– 300k training images, 70k test images; PCKh@0.5: 90.25%
˃ Multi-person pose estimation
– The model uses heatmaps to regress the joint locations and the lines between related joints
– OKS on the AI Challenger dataset: 0.32609
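The single-person result is reported as PCKh at a 0.5 threshold. As a reminder of what that metric counts, here is a minimal sketch for one person: a keypoint is correct when its distance to the ground truth is at most 0.5 times the head size. The coordinates and head size below are made-up illustration values, and a real evaluation aggregates over the whole test set.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def pckh(pred: List[Point], gt: List[Point], head_size: float, alpha: float = 0.5) -> float:
    """Fraction of keypoints whose error is at most alpha * head_size."""
    hits = 0
    for (px, py), (gx, gy) in zip(pred, gt):
        if math.hypot(px - gx, py - gy) <= alpha * head_size:
            hits += 1
    return hits / len(gt)


# Hypothetical keypoints for one person, head size of 10 pixels.
gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
pred = [(11.0, 10.0), (20.0, 26.0), (30.0, 31.0), (49.0, 40.0)]
print(pckh(pred, gt, head_size=10.0))
```

Here the threshold is 5 pixels, the errors are 1, 6, 1 and 9 pixels, and two of the four keypoints count as correct.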

Lane Detection
˃ Motivation: detect lanes even when they are occluded by vehicles
˃ Algorithms: SCNN (left) and VPGNet (right)
˃ Dataset
– SCNN: 9,600 training and 1,300 test images taken from the SCNN dataset
– VPGNet: 1,000 training and 200 test images from the Caltech-lane dataset
– Input size: SCNN (800x288), VPGNet (640x480)

Lane Detection: Pruning
˃ VPGNet compression
– Dataset: 960 training and 240 test images captured from different scenes
– Evaluation metric: F1 score
– Compressed to 10% of the original operations with a 2% drop in performance
Pruning step | Operation (G) | F1 score (%)
baseline | 100 | 90
1 | 40 | 88.9
2 | 30 | 88.8
3 | 20 | 88.5
4 | 10 | 88
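The lane detection results are scored with F1, the harmonic mean of precision and recall. The sketch below shows the calculation from true positive, false positive and false negative counts; the counts are hypothetical, and how a predicted lane is matched to a ground-truth lane is left out because the slide does not specify it.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing is matched."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 matched lanes, 10 spurious, 10 missed.
print(f1_score(tp=90, fp=10, fn=10))
```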

Semantic Segmentation
˃ Semantic Segmentation
– Use state-of-the-art algorithms for high performance
– Compress large models and try lightweight models to ensure both efficiency and performance
Algorithm | Input size | Model backbone | Operations | IOU (%) | FPS @ input size (ZCU9)
WiderRes38 | 1024*2048 | wider-Resnet-38 | 10T | 77.68 | -
SegNet | 1024*2048 | VGG 16 | 2.4T | 56 | -
FPN-Deephi | 1024*2048 | Google_v1 | 136G | 71.25 | -
Deeplabv3+ | 1024*2048 | Mobilenet_v2 | 49G | 70.88 | -
ESPNet | 512*1024 | - | 9.4G | 63.64 | 21.48 @ 256*512
ENet | 512*1024 | - | 9.36G | 57.9 | 54.86 @ 256*512
FPN-Deephi (light weight) | 256*512 | Google_v1 | 9G | 56.45 | 119 @ 256*512
Tiny-FPN | 512*512 | - | 1.8G | 60.2 | 117 @ 256*512
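The segmentation table reports accuracy as IOU. Assuming this means intersection over union averaged over classes (the slide does not spell out the definition), a minimal sketch on flattened label maps looks like this; the label lists are made-up illustration data.

```python
from typing import List


def mean_iou(pred: List[int], gt: List[int], num_classes: int) -> float:
    """Mean of per-class intersection over union on flattened label maps."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical six-pixel image with three classes.
gt_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred_labels, gt_labels, num_classes=3))
```

The per-class values here are 1/3, 2/3 and 1/2, so the mean is 0.5.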

Semantic Segmentation
˃ Semantic Segmentation
(a) Result of WiderRes38
(b) Result of FPN-Deephi (light weight)

Multi-Task Learning
˃ Multi-task learning
– Shared feature-extraction backbone
– Accuracy improved through model architecture optimization
– Left: multi-task model covering 2D box detection, orientation and semantic segmentation
– Right: multi-task model covering object detection, lane detection and drivable space detection

Multi-Task Learning: Pruning
˃ Multi-task: 2D box detection, orientation and semantic segmentation
– Dataset: BDD100K (train: 6967, test: 988)
Network | Input size | Compression ratio | Detection mAP (IOU>0.5) | Segmentation mIOU | Ops
VGG | 288x512 | Rate: 0 | 29% | 46.9% | 106.5G
Resnet50 | 480x640 | Rate: 0 | 42.4% | 48.3% | 72.7G
Resnet50 | 480x640 | Rate: 0.5 | 42.1% | 47.1% | 34.2G
Resnet50 | 480x640 | Rate: 0.6 | 40.5% | 45.8% | 27.5G
Resnet50 | 480x640 | Rate: 0.8 | 32.0% | 39.6% | 22.3G
Resnet18 | 480x640 | Rate: 0 | 26.5% | 39.9% | 27.7G
Resnet18 | 480x640 | Rate: 0.5 | 24.2% | 37.0% | 14.0G

Multi-Task Learning: Pruning
˃ Multi-task: object detection, orientation, lane detection and drivable space detection
– Dataset: BDD100K (train: 6967, test: 988)
Network | Input size | Compression ratio | Detection mAP (IOU>0.5) | Segmentation mIOU | Ops
VGG | 288x512 | Rate: 0 | 34.51% | 57.43% | 103.5G
VGG | 288x512 | Rate: 0.4 | 31.62% | 56.35% | 60.9G
VGG | 288x512 | Rate: 0.6 | 31.51% | 55.42% | 40.8G
Resnet18 | 288x512 | Rate: 0 | 24.80% | 56.26% | 13.8G
Resnet18 | 288x512 | Rate: 0.4 | 24.00% | 54.83% | 7.6G
Resnet18 | 288x512 | Rate: 0.5 | 23.27% | 54.30% | 6.3G
Resnet18 | 288x512 | Rate: 0.6 | 23.42% | 53.58% | 5.1G
Resnet50 | 288x512 | Rate: 0 | 35.55% | 58.81% | 34.1G
Resnet50 | 288x512 | Rate: 0.4 | 35.55% | 58.09% | 18.9G
Resnet50 | 288x512 | Rate: 0.5 | 35.29% | 57.61% | 15.7G
Resnet50 | 288x512 | Rate: 0.6 | 33.41% | 56.80% | 12.5G

Multi-Task Learning: Deployment on ZCU102
˃ 1CH multi-task model
– Platform: ZU9
– Network: ResNet18 + 2D box detection, orientation and semantic segmentation
– Input size: detection 480 * 360
– Operations: detection 27.7G
– FPS: ~29 fps

Existing Customer Cases
Major application | Functions | Device | CNN demands | Target perf.
Front camera | 2D object detection & classification | Zynq7020, ZU2/3/4/5 | Yolo and Tiny Yolo, SSD, ResNet, Mobilenet v2 | 5FPS ~ 15FPS
Front camera | Semantic segmentation | Zynq7020, ZU3/4 | SegNet, FPN, ENet, ESPNet | 5FPS ~ 15FPS
SurroundView & Parking | Multi-channel object detection | ZU5/ZU9 | Yolo, SSD, Light-head RCNN | 10FPS/CH ~ 30FPS/CH
LiDAR | Object detection | ZU3 | SegNet, AVOD, F-PointNet | 15FPS ~ 25FPS
L2-L4 ECU | 2D and 3D object detection | ZU9/ZU11 | Yolo and Tiny Yolo, SSD; Complex Yolo | 10FPS/CH ~ 30FPS/CH
L2-L4 ECU | Semantic segmentation | ZU9/ZU11 | SegNet, FPN, ENet, ESPNet | 10FPS/CH
Driver Monitoring | Pose estimation | ZU9 | OpenPose | 20FPS

ADAS Domain Controller: Cameras
Cameras: rear camera x1, surround-view fisheye camera x4, front camera x2
Modes: D mode, R mode
Front camera (near): detection and segmentation
Fisheye camera (1CH) when turning: segmentation
Fisheye camera (4CH): segmentation

XILINX TECHNOLOGY DAY
Machine Learning-Based Solutions for Consumer Electronics

Machine Learning in Consumer Electronics
Drones: obstacle recognition
Smart appliance: intelligent control
Set-top box: content recognition
Multi-function printer: quality enhancement
Projector: quality enhancement, SR
Camcorder: scenario recognition

Why Xilinx?
Software programmability of an ARM®-based processor combined with the hardware programmability of an FPGA
Easy-to-design single-chip solution
Programmable hardware for diverse interfaces
Fusion of multiple functions

Conceptual Zynq-Based System Architecture for Consumer Electronics
[Block diagram; labels recovered from the figure]
On-chip blocks: PS (dual-core A9), PS (motor control), PL, peripherals (USB2.0 x2, GPIOs, SPIs/I2Cs), DDRC, DMA, DVP interface (x2), display controller, MIPI DSI, DPU, AXI bus fabric
Off-chip components: QSPI flash, DDR3 (x2), UI LCD, LEDs, POD camera, face camera, POD motors and others

Adaptable. Intelligent.
XILINX TECHNOLOGY DAY