End to End Deep Learning Solution on Arm Architecture

End to End Deep Learning Solution on Arm Architecture Jan. 14 2019, Jammy Zhou

Page 1

End to End Deep Learning Solution on Arm Architecture

Jan. 14 2019, Jammy Zhou

Page 2

HPC and AI convergence

TOP500 Trend

More than 50 percent of the additional flops in the latest TOP500 ranking came from Nvidia Tesla GPUs, according to the TOP500 report

Half of the TOP10 systems use Nvidia GPUs, and 122 of the TOP500 systems use Nvidia GPUs (64 systems use P100 GPUs, 46 systems use V100 GPUs, 12 systems use Kepler GPUs)

More AI/ML/DL workloads are being added to HPC applications with the wide adoption of Nvidia GPUs
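As a quick sanity check on the figures quoted above, the per-generation counts can be summed and expressed as a share of the list. This is only arithmetic on the numbers from the slide; the variable names are illustrative.

```python
# GPU-accelerated system counts quoted on the slide, keyed by GPU generation.
nvidia_systems = {"P100": 64, "V100": 46, "Kepler": 12}
TOP500_SIZE = 500

# The per-generation counts should add up to the quoted total of 122.
total = sum(nvidia_systems.values())
share = total / TOP500_SIZE

print(f"Nvidia GPU systems: {total}")
print(f"Share of TOP500: {share:.1%}")
```

The counts are consistent with the quoted total, which corresponds to roughly a quarter of the list.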

Arm on the road

Astra at Sandia National Lab in the US is the first Arm-based supercomputer to enter the TOP500 list, ranked 203rd in the latest ranking.

Arm-based supercomputers have good momentum around the world: Post-K from Japan, Tianhe-3 from China, and Catalyst UK, GW4 Isambard and the CEA system from Europe

Arm SVE is enabled by Post-K together with the Tofu D interconnect and HBM2 memory, and will be used for some AI workloads

Besides Nvidia GPUs, there are other accelerator options on the market, for example MI60/MI50 Radeon Instinct GPUs from AMD, Xilinx and Intel FPGAs, customized ASIC products, etc.

Page 3

HPC and AI in the Cloud

CPU, Accelerator, Network, Storage

AI & ML Services, HPC Services

100 Gbps Ethernet, InfiniBand, Omni-Path, RDMA and RoCE

Fast and scalable storage, such as NVMe based local SSD

Arm on the road

Science Cloud with Arm based HPC from HPC Systems (supporting HiSilicon Hi1616 and Marvell ThunderX2)

Amazon EC2 A1 instances based on AWS Graviton Arm 64-bit processor for scale-out and Arm based workloads

Arm Neoverse continuous improvement

Accelerators (GPUs, FPGAs, ASICs)

HPC & AI software stack (languages, frameworks, libraries, drivers, compilers, etc), multi-node distributed support and MPI

Page 4

HW Diversity & SW Fragmentation

Big Data Analytics: TensorFlowOnSpark, CaffeOnSpark, SparkFlow, ...

DL Frameworks: TensorFlow, Caffe, MXNet, Theano, Caffe2, CNTK, PaddlePaddle, PyTorch, Keras, Chainer, ...

Framework support for multiple accelerators

Model Formats (framework specific, ONNX, NNEF)

Deep Learning Compilers (TVM, Glow, XLA, ONNC, etc.)

Libraries: BLAS, FFT, RNG, SPARSE, Eigen, CMSIS-NN, ACL, cuDNN, MIOpen, ...

HAL and Drivers

Hardware (CPU, GPU, FPGA, ASIC, DSP)

1. Application and algorithm developers find it difficult to switch between frameworks

2. Framework developers have to maintain different backends for various accelerators

3. Chip and IP vendors have to support multiple frameworks with duplicated effort, often out of tree by forking the upstream

4. OEMs/ODMs and cloud vendors have to support multiple configurations

Page 5

Open Neural Network eXchange Ecosystem

Framework interoperability & Hardware optimizations

ONNX Format

ONNX Models

ONNXIFI ONNX Runtime

ONNX Tools

Create, Convert, Deploy, Optimize

Page 6

ONNX Specifications

Neural-network-only ONNX

Defines an extensible computation graph model, built-in operators and standard data types

Supports only tensors as input/output data types

ONNX-ML Extension

Classical machine learning extension

Also supports sequence and map data types, and extends the ONNX operator set with ML algorithms that are not based on neural networks
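To make the idea of an extensible computation graph with built-in operators more concrete, here is a minimal toy sketch in plain Python. It is not the ONNX API or file format: the `Node` class, the `OPERATORS` registry and `run_graph` are hypothetical names invented for illustration, and tensors are stood in for by flat lists of floats. Only the operator names (`Add`, `Mul`, `Relu`) are borrowed from the ONNX operator set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Tensor = List[float]  # stand-in for a real tensor type (assumption)

# Hypothetical operator registry: operator name -> function over tensors.
# Adding an entry here is how this toy model is "extended".
OPERATORS: Dict[str, Callable[..., Tensor]] = {
    "Add": lambda a, b: [x + y for x, y in zip(a, b)],
    "Mul": lambda a, b: [x * y for x, y in zip(a, b)],
    "Relu": lambda a: [max(0.0, x) for x in a],
}


@dataclass
class Node:
    """One graph node: an operator applied to named inputs, producing a named output."""
    op_type: str
    inputs: List[str]
    output: str


def run_graph(nodes: List[Node], feeds: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Evaluate nodes in list order; the list is assumed to be topologically sorted."""
    values = dict(feeds)
    for node in nodes:
        args = [values[name] for name in node.inputs]
        values[node.output] = OPERATORS[node.op_type](*args)
    return values


# Graph for y = Relu(x * w + b), with element-wise operators.
graph = [
    Node("Mul", ["x", "w"], "xw"),
    Node("Add", ["xw", "b"], "z"),
    Node("Relu", ["z"], "y"),
]
result = run_graph(graph, {"x": [1.0, -2.0], "w": [3.0, 3.0], "b": [0.5, 0.5]})
print(result["y"])  # [3.5, 0.0]
```

The point of the sketch is the separation of concerns: the graph only names operators and data edges, so any runtime that knows the same operator set can execute it.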

ONNX v1.3 Released on Sep. 1st 2018

Control Flow support

Functions (composable operators, experimental)

Enhanced shape inference

Additional optimization passes

ONNXIFI 1.0 (C-backend for accelerators)

More to come...

Quantization

Test/Compliance

Data pipelines

Edge/Mobile/IoT

Page 7

ONNX Interface for Framework Integration

ONNXIFI

Standardized interface for NN inference on different accelerators

Runtime discovery and selection of execution backends, as well as ONNX operators supported on each backend

Support ONNX format & online model conversion

ONNXIFI Backend

A combination of software layer and hardware device used to run an ONNX graph

The same software layer can expose multiple backends

Heterogeneous type of backend can distribute work across multiple device types internally
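The discovery-and-selection idea described above can be sketched roughly as follows. ONNXIFI itself is a C interface; the Python below does not reproduce it. `Backend`, `discover_backends` and `select_backend` are hypothetical names, the backend list is made up, and the selection policy (first backend that supports every operator in the graph) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Backend:
    """A software layer plus device that can run a graph (illustrative only)."""
    name: str
    device: str
    supported_ops: FrozenSet[str]


def discover_backends() -> List[Backend]:
    """Hypothetical discovery step; a real system would query the loaded libraries."""
    return [
        Backend("backend-a", "DSP", frozenset({"Conv", "Relu"})),
        Backend("backend-b", "GPU", frozenset({"Conv", "Relu", "MaxPool", "Gemm"})),
        Backend("backend-c", "CPU",
                frozenset({"Conv", "Relu", "MaxPool", "Gemm", "Softmax"})),
    ]


def select_backend(graph_ops: List[str], backends: List[Backend]) -> Optional[Backend]:
    """Return the first backend that supports every operator used by the graph."""
    needed = set(graph_ops)
    for backend in backends:
        if needed <= backend.supported_ops:
            return backend
    return None


model_ops = ["Conv", "Relu", "MaxPool", "Gemm"]
chosen = select_backend(model_ops, discover_backends())
print(chosen.name if chosen else "no compatible backend")  # backend-b
```

A heterogeneous backend, in the sense used on the slide, would sit behind a single entry in such a list and split the work across devices internally.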

Applications, ONNX Models, Frameworks

ONNXIFI: libonnxifi.so

Glow: libonnxifi-glow.so

Library A: libonnxifi-a.so

Library B: libonnxifi-b.so

Library C: libonnxifi-c.dll

Library D: libonnxifi-d.dylib

...

Page 8

ONNX Runtime

High-performance and cross-platform inference engine for ONNX models

Fully implements the ONNX specification including the ONNX-ML extension

Arm platforms are supported on both Linux (experimental) and Windows

Diagram from https://github.com/Microsoft/onnxruntime/blob/master/docs/HighLevelDesign.md

TensorRT and nGraph support are work in progress

Page 9

Machine Intelligence

A Linaro Strategic Initiative

Provide the best-in-class Deep Learning performance by leveraging Neural Network acceleration in IP and SoCs from the Arm ecosystem, through collaborative seamless integration with the ecosystem of AI/ML software frameworks and libraries

Page 10

Scope from HPC to microcontroller

HPC, Data Center & Cloud *

SVE based optimization for DL frameworks & libraries

PCIe/CCIX based heterogeneous accelerator support on Arm servers (drivers, compilers and framework integration, etc)

Scale out support for distributed training

Edge node & device

Initial focus on inference support for Cortex-A SoCs

Common model description format and APIs to the runtime

Common optimized runtime inference engine for Arm-based SoC

Plug-in framework to support multiple 3rd party IPs (NPU, GPU, DSP, FPGA)

Continuous integration testing and benchmarking

Microcontroller *

CMSIS-NN optimized frameworks/libraries on RTOS

Frameworks like uTensor and TensorFlow Lite (quantization, footprint reduction, etc.)

IP based accelerator support & optimization

* under discussion
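The quantization and footprint reduction mentioned for microcontroller frameworks can be illustrated with a minimal sketch of 8-bit affine quantization. This is a generic textbook scheme written for illustration, not the implementation used by TensorFlow Lite, uTensor or CMSIS-NN; the function names and the sample weights are assumptions.

```python
from typing import List, Tuple


def quantize(values: List[float]) -> Tuple[List[int], float, int]:
    """Affine-quantize floats to 8-bit codes; returns (codes, scale, zero_point)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-lo / scale)
    codes = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map 8-bit codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zp = quantize(weights)
restored = dequantize(codes, scale, zp)

# Worst-case reconstruction error and storage cost before and after.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
float_bytes = 4 * len(weights)  # 32-bit floats
int8_bytes = 1 * len(weights)   # 8-bit codes

print(f"max error: {max_error:.5f} (scale {scale:.5f})")
print(f"storage: {float_bytes} bytes -> {int8_bytes} bytes")
```

Storing one byte per weight instead of four is where the footprint reduction comes from; the cost is a reconstruction error bounded by roughly one quantization step.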

training, inference

Page 11

ArmNN based collaborations - ongoing

https://developer.arm.com/products/processors/machine-learning/arm-nn

https://community.arm.com/tools/b/blog/posts/arm-nn-the-easy-way-to-deploy-edge-ml

A good base for future collaborations:

100 man-years of effort, 340,000 lines of code

Shipping in an estimated 200 million or more Android devices

Impressive performance uplift by software-only improvements over a period of 6 months

Page 12

Thanks!