Evolution of the Rugged GPGPU Computer

Embedded Computing without Compromise
GTC Israel 2017
Session: SIL7127
Dan Mor, PLM, Aitech Systems


Agenda

• Current Aitech GPGPU systems

• NVIDIA Jetson TX1 and TX2 evaluation

• Conclusions

• New Aitech Products

GPGPU Product Line

Current Aitech GPGPU Products

A191 Block Diagram

[Figure: A191 block diagram. Blocks shown: Power Supply (18–36 V input power), C873 4th Gen. Core i7 SBC, C530 GPGPU Board, Frame Grabber Mezzanine, On-Board SSD, 2.5" SSD (optional). Internal links: PCIe x8, SATA. Connectors: J1, J2, J3. External interfaces: RGBHV, DVI/HDMI, SD-SDI, Composite Video, USB, Gigabit Ethernet, Serial.]

We need a SWaP system…

Jetson TX1

• Small form factor (SFF): 50 x 87 mm
• SoM with Linux support
• Good for SWaP systems
• Supercomputing performance
• Quad-core ARM® Cortex®-A57 CPUs
• GPU: NVIDIA Maxwell™, 1 TFLOP/s with 256 CUDA® cores

• 400-pin board-to-board connector
• Future module versions will keep a compatible pin-out
• Draws 1 W or less while idle
• 8-10 W under typical CUDA load
• Up to 15 W TDP when the module is fully utilized
• Automatic scaling of CPU, GPU and memory
• 1 TFLOPS (for comparison, the GTX 770M delivers 1.36 TFLOPS)
• HW encoder (H.264/H.265) and decoder
• 4K video processing
• MIPI CSI x4 cameras or six CSI x2 cameras

Jetson TX1 Evaluation - Non-Graphical Benchmark

The smaller the number, the faster the CUDA calculation on the GPU. "TX1 – Max" is the Jetson TX1 running at maximum GPU frequency.

The C873 & C530 pair, which draws about 120 W, is only 1.8x faster than the Jetson TX1, which draws only 15 W.
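The comparison above can be restated as performance per watt. The following is a minimal sketch that uses only the two figures quoted on the slide (1.8x speed at about 120 W versus 15 W); it assumes, for illustration, that the quoted wattages are the power actually drawn during the benchmark. The function and variable names are hypothetical.

```python
def perf_per_watt(relative_speed: float, watts: float) -> float:
    """Relative benchmark speed divided by power draw (arbitrary units per watt)."""
    return relative_speed / watts


# Jetson TX1 is the baseline (speed 1.0) at 15 W.
tx1 = perf_per_watt(1.0, 15.0)
# C873 & C530 (GTX 770M) is quoted as 1.8x faster at about 120 W.
c873_c530 = perf_per_watt(1.8, 120.0)

# How many times more work per watt the TX1 delivers under these assumptions.
advantage = tx1 / c873_c530
print(f"TX1 performance-per-watt advantage: {advantage:.2f}x")
```

Under these assumptions the TX1 delivers roughly four to five times more benchmark throughput per watt, which is the SWaP argument the slide is making.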

Jetson TX1 Evaluation - Conclusions

Jetson TX1 gets a real boost in rendering and CUDA calculation power

CUDA calculation performance:

• TX1 vs TK1: TX1 is 2x to 4x faster
• TX1 vs C873 & C530 (770M): C873 & C530 (770M) is only 1.8x faster

If Linux is not an obstacle for our customers, a Jetson TX1 based product will be a success

Comparison table: TX2 vs TX1

Feature | Jetson TX2 | Jetson TX1
GPU | NVIDIA Pascal™, 256 CUDA cores | NVIDIA Maxwell™, 256 CUDA cores
CPU | HMP Dual Denver 2/2 MB L2 + Quad ARM® A57/2 MB L2 | Quad ARM® A57/2 MB L2
Memory | 8 GB 128 bit LPDDR4, 58.3 GB/s | 4 GB 64 bit LPDDR4, 25.6 GB/s
Display | 2x DSI, 2x DP 1.2 / HDMI 2.0 / eDP 1.4 | 2x DSI, 1x eDP 1.4 / DP 1.2 / HDMI
PCIe | Gen 2, 1x4 + 1x1 OR 2x1 + 1x2 | Gen 2, 1x4 + 1x1
Data Storage | 32 GB eMMC, SDIO, SATA | 16 GB eMMC, SDIO, SATA
Other | CAN, UART, SPI, I2C, I2S, GPIOs | UART, SPI, I2C, I2S, GPIOs

Common to both modules:

USB | USB 3.0 + USB 2.0
Connectivity | 1 Gigabit Ethernet, 802.11ac WLAN, Bluetooth
Mechanical | 50 mm x 87 mm (400-Pin Compatible Board-to-Board Connector)

Dual Operating Modes

Non-graphical benchmark (CUDA algorithms): time for 10 iterations [ms], lower is better

n-body number | TX1 | TX2 MAXQ | TX2 MAXN | TX2 MAXQ vs TX1 | TX2 MAXN vs TX1
4096 | 22.533 | 68.4 | 16.421 | -67% | 27%
8192 | 81.491 | 272.97 | 65.24 | -70% | 20%
16384 | 206.799 | 527.47 | 154 | -61% | 25.5%

TX2 has better performance when using MAXN power mode
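The slide does not say how the "vs TX1" percentages were computed. As a hedged reconstruction, dividing the time difference by the larger (slower) of the two times reproduces the quoted figures to within rounding. The helper name `relative_change` is hypothetical and the formula is an inference from the numbers, not something stated in the presentation.

```python
def relative_change(t_tx1: float, t_tx2: float) -> float:
    """Percent change of TX2 relative to TX1; positive means TX2 is faster.

    Assumption: the difference is normalised by the slower (larger) time,
    which is what matches the percentages printed on the slide.
    """
    return (t_tx1 - t_tx2) / max(t_tx1, t_tx2) * 100.0


# Time for 10 iterations [ms], copied from the GPU benchmark table:
# n-body number -> (TX1, TX2 MAXQ, TX2 MAXN)
gpu_times_ms = {
    4096: (22.533, 68.4, 16.421),
    8192: (81.491, 272.97, 65.24),
    16384: (206.799, 527.47, 154.0),
}

for n_bodies, (tx1, maxq, maxn) in gpu_times_ms.items():
    print(
        f"{n_bodies:>6}: MAXQ vs TX1 {relative_change(tx1, maxq):6.1f}%   "
        f"MAXN vs TX1 {relative_change(tx1, maxn):6.1f}%"
    )
```

Because the denominator changes with the sign of the result, a gain and a loss of the same percentage are not symmetric here; a fixed TX1 baseline would give larger negative numbers for MAXQ.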

CPU benchmark (n-body algorithm running on the CPU): time for 10 iterations [ms], lower is better

n-body number | TX1 | TX2 MAXQ | TX2 MAXN | TX2 MAXQ vs TX1 | TX2 MAXN vs TX1
4096 | 30492.172 | 57837.430 | 7169.735 | -47% | 76.5%
8192 | 121315.578 | 232723.719 | 11340.421 | -48% | 90%

TX2 has better CPU performance when using MAXN power mode
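The benchmark source code is not part of the slides. For orientation only, here is a minimal sketch of a direct-sum (all-pairs) n-body step in plain Python, assuming unit gravitational constant and a small softening term; all names are hypothetical and this is not the code that produced the timings. It illustrates why the work grows with the square of the body count, which is consistent with the TX1 CPU time rising about 4x between 4096 and 8192 bodies.

```python
import math


def accelerations(pos, mass, softening=1e-3):
    """All-pairs gravitational accelerations (G = 1); O(n^2) in the body count."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j.
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            dist2 = sum(c * c for c in d) + softening ** 2
            inv = mass[j] / (dist2 * math.sqrt(dist2))
            for k in range(3):
                acc[i][k] += d[k] * inv
    return acc


def step(pos, vel, mass, dt):
    """Advance positions and velocities in place by one time step."""
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        for k in range(3):
            vel[i][k] += acc[i][k] * dt
            pos[i][k] += vel[i][k] * dt


# Two equal masses on the x axis pull toward each other.
positions = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
velocities = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
masses = [1.0, 1.0]
for _ in range(10):
    step(positions, velocities, masses, dt=0.01)
print(positions)
```

A CUDA version of the same algorithm typically assigns one thread per body for the outer loop, which is why the GPU timings are orders of magnitude lower than the CPU ones.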

Conclusions

• TX2 gets a boost in GPU CUDA calculation power when using MAXN power mode
  MAXN power mode: about 24% increase in performance (max power consumption 15 W)
  MAXQ power mode: about 66% decrease in performance (max power consumption 7.5 W)

• TX2 gets a boost in CPU calculation power when using MAXN power mode
  MAXN power mode: about 83% increase in performance (max power consumption 15 W)
  MAXQ power mode: about 47% decrease in performance (max power consumption 7.5 W)

• The SW release is a "Developer Preview Release", so I expect many improvements and optimizations in the near future.

As seen above, half the power comes with half the performance.

Full power comes with a boost for the GPU (CUDA, 24%) and the CPU (83%).
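The summary percentages appear to be simple averages of the per-size percentages in the two benchmark tables. That is an assumption, but the arithmetic below lands on the quoted values of about 24%, 66%, 83% and 47%. The list names are hypothetical.

```python
from statistics import mean

# Per-size percentages copied from the benchmark tables (magnitudes only).
gpu_maxn_gain = [27, 20, 25.5]   # TX2 MAXN vs TX1, GPU
gpu_maxq_loss = [67, 70, 61]     # TX2 MAXQ vs TX1, GPU
cpu_maxn_gain = [76.5, 90]       # TX2 MAXN vs TX1, CPU
cpu_maxq_loss = [47, 48]         # TX2 MAXQ vs TX1, CPU

for label, values in [
    ("GPU MAXN gain", gpu_maxn_gain),
    ("GPU MAXQ loss", gpu_maxq_loss),
    ("CPU MAXN gain", cpu_maxn_gain),
    ("CPU MAXQ loss", cpu_maxq_loss),
]:
    print(f"{label}: about {mean(values):.1f}%")
```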

Special Features

Technical Features

A176 – Cyclone GPGPU Fanless Small FF RediBuilt™ Supercomputer

A176 Cyclone Based on NVIDIA Jetson TX1/TX2

• Future module versions will keep a compatible pin-out
• Draws 1 W or less while idle
• 8-10 W under typical CUDA load
• Up to 17 W when the CPU and GPU are fully utilized
• Automatic scaling of CPU, GPU and memory
• 1 TFLOPS
• Hardware encoder (H.264/H.265) and decoder
• Ultra Small Form Factor – 129 mm [5.1"] square, 840 g [1.85 lbs.]

A176 Block Diagram

[Figure: A176 block diagram. Core: NVIDIA Jetson TX1 System on Module (Quad-Core ARM CPU, NVIDIA GPU, 4GB RAM LPDDR4, 16GB Flash eMMC 5.1). Other blocks: Mini SATA SSD, Isolated Power Supply, Line Filter, ETR, Optional Expansion Modules attached over PCIe. Front Panel Connectors: Discrete I/O, USB 2.0, UART, I2C, Gigabit Ethernet, DVI/HDMI Output. Optional I/O: 8 x Composite Inputs, 1 x SDI Input.]

A176 Highlights

SWaP Optimized Rugged HPEC

• Ultra Small Form Factor – 129 mm [5.1"] square, < 1 kg [2.2 lbs.]
• NVIDIA® Jetson™ TX1 System on Module
• NVIDIA Maxwell™ Architecture GPU, with 256 CUDA cores
• ARM® Cortex® A57 Quad-Core CPU
• 1 TFLOPS
• H.264/H.265 HW Encoder
• Best Available Performance per Watt – 60 GFLOPS/W
• SATA SSD with Quick Erase & Secure Erase
• 4 GB LPDDR4
• Video Capture:
  - SDI (SD/HD) w/dedicated H.264 encoder
  - Composite (RS-170A [NTSC]/PAL), 8 channels available simultaneously
• I/O:
  - Gigabit Ethernet
  - DVI/HDMI Output
  - UART Serial
  - Composite Input
  - USB 2.0
  - SDI Input
  - Discretes
• CUDA, OpenGL, OpenGL ES, EGL
• Low Power Consumption
• Development Platforms Available

Additional expansions:

1. Dual Channel 1553
2. ARINC 429
3. Camera Link Frame Grabber

Technical Features

C535 – Typhoon GPGPU 3U VPX Supercomputer Board

C535 Typhoon Highlights

Rugged 3U VPX HPEC Board – SBC with on-board GPGPU

• NVIDIA® Jetson™ TX1 System on Module
• NVIDIA Maxwell™ Architecture GPU, with 256 CUDA cores
• ARM® Cortex® A57 Quad-Core CPU
• 1 TFLOPS
• H.264/H.265 HW Encoder
• Best Available Performance per Watt – 60 GFLOPS/W
• SATA SSD with Quick Erase & Secure Erase
• 4 GB LPDDR4
• Video Capture:
  - SDI (SD/HD) w/dedicated H.264 encoder
  - Composite (RS-170A [NTSC]/PAL), 8 channels available simultaneously
• I/O:
  - Gigabit Ethernet
  - DVI/HDMI Output
  - UART Serial
  - Composite Input
  - USB 2.0
  - SDI Input
  - Discretes
• CUDA, OpenGL, OpenGL ES, EGL
• Low Power Consumption
• Development Platforms Available

C535 Block Diagram

[Figure: C535 block diagram. Core: NVIDIA Jetson TX1 System on Module (Quad-Core ARM CPU, NVIDIA GPU, 4GB RAM LPDDR4, 16GB Flash eMMC 5.1). Other blocks: Mini SATA SSD, SD, PSU, ETR, PCIe Switch with PCIe x4 links, Optional Expansion Modules attached over PCIe. Front Panel Connectors: Discrete I/O, USB 2.0, UART, I2C, Gigabit Ethernet, DVI/HDMI Output. Optional I/O: 8 x Composite Inputs, 1 x SDI Input.]

A176/C535 – Interface Expansions

Currently available:

• FG – Simultaneously captures 8 composite PAL/NTSC inputs

• FG – HD/SD-SDI – H264 dedicated encoder (streaming)

Available upon request:

• FG – CameraLink input

• ARINC-429 – 6 channels

• 1553 – 2 channels

Special Features

Technical Features

EV176 Development System for A176/C535

Start SW development right now!

EV176 Development System for A176 Cyclone

Applications

• GPU rendering (navigation, maps, etc.)
• CUDA based (algorithms)
• Image Processing (CUDA accelerated)
• Radars
• Flight Simulators
• Video recorders/streaming
• Surveillance
• Autonomous Vehicles/Drones
• Smart Cities
• GPGPU extensions to existing systems

Thank you!