Sebastien Domine, May 2017

S7519: DEVELOPER TOOLS FOR AUTOMOTIVE, DRONES AND INTELLIGENT CAMERA APPLICATIONS

2

AGENDA

Some Context

Development Flows and Challenges

Hardware and Software Topologies

Core Use Cases

Developer Tools Support

Conclusion and Q&A

3

INTELLIGENT SYSTEMS

AI at the Edge

Industrial Inspection

Search and Rescue

Package Delivery

Factory Automation

Enterprise Collaboration

Public Safety

Personal Assist

Service Robotics

Portable Medical

Self Driving Car

Driver Assistance

4

CHARACTERISTICS

What is common to Automotive, Drones and IVA (Intelligent Video Analytics) solutions

Smart computer with machine learning capabilities (training and/or inference)

Real-time constraints

Multiple sensors

Networked

Power limits

5

TYPICAL TASKS

Object Detection

Feature Detection

Localization

Path Planning

Real Time

6

EMBEDDED SOFTWARE DEVELOPMENT WORKFLOW

[Diagram: workflow stages and the tools that support them]

Software Development: Toolchain Setup, Cross-compilation, Porting

Debugging: CPU/GPU, Remote Debugging

Profiling: System/CPU/GPU/IO/…, Remote Profiling

Running: Ship it!

Tools shown: DriveInstall, JetPack, Nsight EE (Eclipse), Tegra/Linux Graphics Debugger, CUDA Visual Profiler, Tegra System Profiler, cuda-gdb, PerfWorks, nvprof, CUPTI, cuda-memcheck, Desktop Tools

7

GETTING STARTED…

JetPack Installer for Jetson and DriveInstall for DRIVE

Jump-starts development for embedded platforms

Installs the Linux ARM cross-compilation toolchain

Installs developer tools, CUDA, libraries,…

Flashes DRIVE PX and Jetson OS images

Reference documentation and samples

Compiles code samples, pushes them to the devkit

And runs one sample…

8

NVIDIA® NSIGHT™

Homogeneous application development for CPU+GPU compute platforms

CUDA-Aware Editor

CPU+GPU CUDA Debugger

CUDA Profiler

9

NSIGHT ECLIPSE EDITION NEXT-GEN

Shipping with CUDA 9.0

• True plug-in to Eclipse

• CUDA-GDB upgraded to the GDB 7.12 source base

• Edit, build, debug and profile CUDA-C applications

• CUDA-aware source code editor: syntax highlighting, code completion and inline help

• Debugger: seamless and simultaneous debugging of both CPU and GPU code

• Profiler integration: launch the Visual Profiler as an external application with the CUDA application built in this IDE to easily identify performance bottlenecks

10

ECLIPSE INTEGRATION

Plugins can be installed on any standard Eclipse

• Requires Eclipse version 4.4 or 4.5

• Built on the Eclipse CDT/DSF framework

• Uses the Eclipse Remote System Explorer (RSE) plugins to connect to remote devices

• Nsight EE plugins are bundled as a zip archive and can be installed using the standard Eclipse plugin install dialog

• The dependent plugins (CDT/RSE) are installed automatically

• It can coexist with other Eclipse plugins in the user environment

11

CUDA STANDALONE TOOLS

Visual Profiler

• Trace CUDA activities

• Profile CUDA kernels

• Correlate performance instrumentation with source code

• Expert-guided performance analysis

NVPROF

• Collect performance events and metrics

GPU Library Advisor

• Detect CUDA library optimization opportunities

NVDISASM, CUOBJDUMP

CUDA-MEMCHECK

• Detect out-of-bounds memory accesses

• Detect race conditions in memory accesses

• Detect uninitialized variable accesses

• Detect incorrect GPU thread synchronization

CUDA-GDB

• Debug CUDA kernels with CLI

• Debug CPU and GPU code

• CPU and GPU core dump support

12

NVIDIA JETSON TX2

Jetson TX2 Developer Kit

[Block diagram]

TX2 SoC: 4× A57 cores, 2× Denver cores, Pascal iGPU, Video Dec/Enc

Other blocks shown: Memory, Storage, WiFi, USB, HDMI, GbE, 2× CSI

13

JETSON SOFTWARE

Deep Learning: TensorRT, cuDNN

Computer Vision: VisionWorks, OpenCV

Graphics: Vulkan, OpenGL

Media: Multimedia API (libargus, Video API)

CUDA, CUDA Accelerated libraries

Linux4Tegra (Ubuntu 16.04), ROS Support

14

IVA APPLICATION

Sample of Complex Application

[Pipeline diagram]

Video inputs: raw vid, followed by Vid conv stages (4 shown)

TensorRT stages: Classifier, Attribute Detector (3 shown)

Other blocks: Tracking, bbox Analytics, OSD, Display

15

ENHANCED TOOLS EXPERIENCE

NVIDIA Tools eXtension (NVTX)

Application source code decoration and instrumentation

• Highlight execution phases, mark resource utilization

• Visualize in all NVIDIA Developer Tools

Features:

• Markers, nested ranges, and resource naming

• Color, payload, and text

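// Nested ranges with the NVTX C API (declared in nvToolsExt.h)
nvtxRangePushA("Compute Work");
    nvtxRangePushA("Sobel");
    nvtxRangePop();                  // closes "Sobel"
    nvtxRangePushA("CubeGen");
    nvtxRangePop();                  // closes "CubeGen"
nvtxRangePop();                      // closes "Compute Work"

// Start/end range with attributes. The nvtx:: names are the C++-style
// wrapper shown on the slide, not part of the plain C API.
nvtxRangeId_t rid_A = nvtx::RangeStart(nvtx::Attributes()
    .category(CATEGORY_CUDA_MEMORY)
    .color(COLOR_RED).message("A"));
cuMemAlloc(&d_A, mem_size_A);
cuMemFree(d_A);
nvtx::RangeEnd(rid_A);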
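
One way to keep push/pop calls balanced, as the nested ranges above require, is a small RAII guard. The sketch below is an illustration and does not come from the talk: `ScopedRange` and `run_compute_work` are hypothetical names, and the stub versions of `nvtxRangePushA` / `nvtxRangePop` exist only so the example builds without the NVTX headers. Defining `USE_REAL_NVTX` is assumed to switch to the real library.

```cpp
#include <cassert>
#include <string>
#include <vector>

#ifdef USE_REAL_NVTX
#include <nvToolsExt.h>  // real NVTX C API; link with -lnvToolsExt
#else
// Stand-in stubs (NOT part of NVTX). They mimic the documented return
// value: the 0-based nesting level of the range being started or ended.
static std::vector<std::string> g_open_ranges;
static std::vector<std::string> g_closed_ranges;

static int nvtxRangePushA(const char* message) {
    g_open_ranges.push_back(message);
    return static_cast<int>(g_open_ranges.size()) - 1;
}

static int nvtxRangePop() {
    if (g_open_ranges.empty()) return -1;  // unbalanced pop
    g_closed_ranges.push_back(g_open_ranges.back());
    g_open_ranges.pop_back();
    return static_cast<int>(g_open_ranges.size());
}
#endif

// Hypothetical helper: opens a range on construction and closes it when
// the enclosing scope exits, including on early return or exception.
class ScopedRange {
public:
    explicit ScopedRange(const char* name) : level_(nvtxRangePushA(name)) {}
    ~ScopedRange() { nvtxRangePop(); }
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
    int level() const { return level_; }

private:
    int level_;
};

// Mirrors the nesting on the slide; returns the deepest level reached.
int run_compute_work() {
    ScopedRange compute("Compute Work");
    int deepest = compute.level();
    {
        ScopedRange sobel("Sobel");
        deepest = sobel.level();
        // Sobel filter work would run here.
    }
    {
        ScopedRange cube("CubeGen");
        // Cube generation work would run here.
    }
    return deepest;
}
```

With the stubs, calling `run_compute_work()` closes the ranges in the order Sobel, CubeGen, Compute Work, which is the same nesting the slide builds by hand.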

16

TEGRA SYSTEM PROFILER

Multi-core CPU profiler and System Trace

Visualize multi-core CPU and GPU activities w/ timeline view

Visualize thread state

Thread core migration

Time range filtering

Trace CUDA & OpenGL/ES API calls

Trace GPU compute & graphics workloads

NVIDIA Tools eXtension (NVTX) support
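
When the timeline shows a thread migrating between cores more than expected, one possible response is to pin that thread to a core. The sketch below is an assumption about how an application might do this on Linux with glibc and is not taken from the talk. The helper names `first_allowed_cpu` and `pin_current_thread` are hypothetical; `sched_getaffinity`, `sched_setaffinity` and the `CPU_*` macros are the standard Linux interfaces.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, sched_getcpu (Linux/glibc)
#include <cassert>

// Returns the lowest-numbered CPU the calling thread may run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) return cpu;
    }
    return -1;
}

// Restricts the calling thread to a single CPU. Returns true on success.
// A pid of 0 means "the calling thread".
bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

After a successful `pin_current_thread(cpu)`, `sched_getcpu()` reports that CPU, and a later capture should show the thread staying on one core. Pinning trades scheduler flexibility for predictability, so it is worth confirming the effect in the profiler.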

17

DEMO IVA APP

- CPU utilization

- Thread/core affinity migration

- NVTX

- CUDA API and workload trace w/ correlation

- OpenGL API and workload trace w/ correlation

- GPU process trace

18

CPU UTILIZATION

CPU Core Utilization

Thread Utilization

Core Occupancy

Thread State

19

BLOCKED STATE BACKTRACE

Diagnose issues with blocking calls, sched_yield, sleep, etc.

Including poor GPU API usage!!!

20

NVTX

21

SYSTEM TRACE

CUDA & OpenGL & NVTX trace timeline

Graphics API calls

CPU CUDA API invocations

GPU CUDA events

NVTX API

22

CALL-STACK SAMPLING

Hot functions filtered by timeline range
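
To make the idea of filtering hot functions by a timeline range concrete, here is a minimal sketch of the computation. It does not describe the profiler's actual implementation: the `Sample` struct and the `hot_functions` function are hypothetical, and a real sampler records full call stacks rather than a single function name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sample record: when the sample was taken and which
// function was at the top of the call stack.
struct Sample {
    std::uint64_t timestamp_ns;
    std::string function;
};

typedef std::pair<std::string, std::size_t> FunctionCount;

// Counts samples whose timestamp falls in [begin_ns, end_ns) and returns
// the functions ordered from most to least sampled.
std::vector<FunctionCount> hot_functions(const std::vector<Sample>& samples,
                                         std::uint64_t begin_ns,
                                         std::uint64_t end_ns) {
    std::map<std::string, std::size_t> counts;
    for (const Sample& s : samples) {
        if (s.timestamp_ns >= begin_ns && s.timestamp_ns < end_ns) {
            ++counts[s.function];
        }
    }
    std::vector<FunctionCount> ranked(counts.begin(), counts.end());
    // stable_sort keeps the map's alphabetical order for equal counts.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const FunctionCount& a, const FunctionCount& b) {
                         return a.second > b.second;
                     });
    return ranked;
}
```

Narrowing the range changes the ranking: a function that dominates the whole capture may be absent from the window of interest, which is why range filtering matters when diagnosing a single slow frame or phase.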

23

GPU CONTEXT SWITCH

24

DRIVE AUTOCRUISE

[Block diagram]

TX2 SoC: 4× A57 cores, 2× Denver cores, Pascal iGPU, Video Dec/Enc

Other blocks shown: Memory, Storage, WiFi, USB, GbE, 2× CSI, CAN

25

NVIDIA AUTOCHAUFFEUR

[Block diagram]

2× TX2 SoC, each with: 4× A57 cores, 2× Denver cores, Pascal iGPU, Video Dec/Enc

2× Pascal dGPU

Other blocks shown: USB, GbE, 2× CSI, CAN, Aurix

26

DRIVE SOFTWARE

DRIVE Hypervisor

DRIVE Linux (Ubuntu 16.04), Guest OSes

Deep Learning: TensorRT, cuDNN

Computer Vision: VisionWorks

Graphics, Media and Sensor: DriveWorks, OpenGL-ES, OpenGL / Vulkan, NVMEDIA

CUDA, CUDA Accelerated libraries

27

NVIDIA HYPERVISOR ARCHITECTURE

Tegra™ Hardware (ARM, GPU & SoC Peripherals)

DRIVE Hypervisor

Hypervisor

Resource Manager Server

I/O Server

Partition Monitor

Guest OS 0, Guest OS 1, Guest OS 2

Early Boot Partition

28

MULTI-OS SYSTEM ARCHITECTURE

QNX RTOS for cluster & HUD

Linux with Genivi for IVI

Linux with Co-pilot

Android for application sandboxing

Foundation type-1 hypervisor & services

[Diagram labels: Cluster, Co-Pilot, Sandbox, Foundation – DRIVE Hypervisor]

Foundation

- Secure boot loader

- Trusted Execution Environment

- Secure partition loader

- Monitor partition

29

DRIVER ASSISTANCE

"Co-Pilot / KITT"

[Pipeline diagram]

Video inputs: 2× raw vid, each followed by Vid conv

TensorRT networks: Head pose, FaceID, Gaze, Eye Openness, Lip reading

Other blocks: GRID Objects, CAN, GPS, Nav, Speech Engine, Risk Assessment Module, UI

30

AUTONOMOUS DRIVING

"RoadRunner"

[Pipeline diagram]

Video inputs: 2× raw vid, each followed by Vid conv

TensorRT networks: Lane Detection, Object Detection

Other blocks: Sensor Fusion, Sensor Data Filtering, point cloud, Tracking, Localization, HD Mapping, Path Planning, Prediction Engine, Vehicle State I/O, Driver Assistance, Car Control System, Actuators

31

DEMO

- Multi-process

- Multi-OS

- Multi-node

- Discrete Pascal GPU and integrated Pascal GPU

- Hypervisor event trace

DrivePX2 - RoadRunner and Co-Pilot / IVI

32

MULTI-PROCESS TRACE

All processes running during the capture

Low-impact threads and processes are filtered out

Kernel=Red

Requires Root

33

MULTI-OS

Linux + Linux on 1 Tegra SoC

1 OS with 4 CPU Cores

1 OS with 2 CPU Cores

34

MULTI-NODE

1 OS per SoC

35

MULTI-GPU

iGPU and dGPU

1 Process using 2 GPUs

GPU Process Trace

dGPU and iGPU

36

TEGRA SYSTEM PROFILER NEXT

Hypervisor Event Trace

37

TEGRA GRAPHICS DEBUGGER

Next-gen graphics development tools

Supports OpenGL ES 2.0/3.0/3.1/3.2 + Android Extension Pack, OpenGL 4.x

Monitor key software and hardware performance metrics

Debug draw calls, related states and resources

Live capture of a single rendering frame

Automatic GPU bottleneck analysis

38

TEGRA GRAPHICS DEBUGGER

Advanced & Targeted GPU Profiling

Performance Monitor

Range Profiler

Automated bottleneck analysis

Shader performance analysis

Offline perf simulation

Dynamic Shader editor

Select section of interest based on scene ranges, render targets used, etc.

Overview of selected range, including time spent, call count, etc.

Show efficiency of pipeline units’ usage

Break down memory subsystem utilization

39

FUTURE DEVELOPER TOOLS

Additional improvement and unification for the out-of-box experience

Ubuntu 16.04 as a host OS

Better cross-compilation support

Hypervisor developer tools support

More HW units to be traced

More consistency of the developer tools offering across Desktop and Tegra

Vulkan support

40

CONCLUSION

Complete Developer Tools offering for Application Development on Heterogeneous Platforms

Extensive coverage for devkit topologies, HW units and SW stack in system trace

Developer Tools support for the core use cases of each platform

41

Q&A