The Convergence of Big Data and AI
Jammy Zhou - Linaro
Big Data and AI
[Figure: diagram relating Data Science, Data Analytics, Artificial Intelligence, Machine Learning and Deep Learning, with Big Data, Datasets and Algorithms as further labels]
A unified cluster for both!
ML/DL Integration with Big Data
[Figure: overview of integration solutions under the headings Apache Initiatives, 3rd Party Solutions, From Industry Vendors and From Chip Vendors; projects named include TensorFlowOnSpark, CaffeOnSpark and Deep Learning Pipelines]
Apache Spark Ecosystem
A unified analytics engine for large scale data processing
● Resource Manager: Standalone, Hadoop YARN, Mesos, Kubernetes
● Computing Engine: Spark Core
● Service Modules: Spark SQL, Spark Streaming, MLlib, GraphX
● Languages: Java, Scala, Python, R
● Storage & Data Sources: HDFS, HBase, Cassandra, Hive, …
● Other labels in the diagram: AWS EC2, SQL, Alluxio
ML Pipelines in MLlib
● ML Pipelines provide a set of APIs built on top of DataFrames from Spark SQL
● Transformer is an algorithm to transform one DataFrame into another
● Estimator is an algorithm to be fit on a DataFrame to produce a Transformer
● Pipeline chains multiple Transformers and Estimators to specify a workflow
[Figure: Pipeline: DataFrame → Transformer → Transformer → Estimator, which yields a PipelineModel]
[Figure: PipelineModel: DataFrame → Transformer → Transformer → Model → Result]
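The Transformer / Estimator / Pipeline contract described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Spark's API: a list of dicts stands in for a DataFrame, and the stage classes (`AddSquare`, `MeanCenter`, `MeanCenterModel`) are hypothetical names invented for the example.

```python
from typing import Dict, List

Frame = List[Dict[str, float]]  # stand-in for a Spark DataFrame (assumption)


class AddSquare:
    """Transformer: adds a column x_sq = x * x."""

    def transform(self, frame: Frame) -> Frame:
        return [{**row, "x_sq": row["x"] ** 2} for row in frame]


class MeanCenterModel:
    """Transformer produced by fitting MeanCenter."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, frame: Frame) -> Frame:
        return [{**row, "centered": row["x_sq"] - self.mean} for row in frame]


class MeanCenter:
    """Estimator: learns the mean of x_sq and returns a Transformer."""

    def fit(self, frame: Frame) -> MeanCenterModel:
        mean = sum(row["x_sq"] for row in frame) / len(frame)
        return MeanCenterModel(mean)


class PipelineModel:
    """Fitted pipeline: a chain of Transformers only."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, frame: Frame) -> Frame:
        for stage in self.stages:
            frame = stage.transform(frame)
        return frame


class Pipeline:
    """Estimator that chains Transformers and Estimators."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, frame: Frame) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(frame)  # Estimator becomes a Transformer
            frame = stage.transform(frame)  # feed the output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


train = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = Pipeline([AddSquare(), MeanCenter()]).fit(train)
result = model.transform([{"x": 2.0}])
```

Fitting replaces every Estimator with the Transformer it produces, so the resulting `PipelineModel` contains only Transformers and can be applied to new data, matching the two diagrams on the slide.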
ML Algorithms in MLlib
● Classification & Regression
  ○ Logistic Regression
  ○ Decision Tree
  ○ Random Forest
  ○ Gradient-Boosted Tree
  ○ Linear Regression
  ○ Multilayer Perceptron
  ○ Linear Support Vector Machine
  ○ Naive Bayes
● Clustering
  ○ K-means
  ○ Latent Dirichlet Allocation
  ○ Bisecting k-means
  ○ Gaussian Mixture Model
● Collaborative Filtering
● Frequent Pattern Mining
  ○ FP-Growth
  ○ PrefixSpan
How about Deep Learning?
Deep Learning Pipelines
● Deep Learning Pipelines is built on Spark ML Pipelines by Databricks
● Images are loaded into a DataFrame and decoded automatically
● Enable fast transfer learning with Featurizer to reuse pre-trained models
● Apply pre-trained deep learning models as Transformers
  ○ TF-backed Keras models and TF Graphs are supported
● Deploy models with Spark DataFrames and SQL UDFs
● Distributed hyperparameter tuning with Estimator and MLlib built-in tools like CrossValidator and TrainValidationSplit
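To make the hyperparameter tuning bullet concrete, here is a minimal sketch of what a k-fold cross-validated grid search does. It is plain Python, not the MLlib `CrossValidator` API: the estimator is a toy one-dimensional ridge regression, the round-robin fold split is chosen for determinism, and all function names are hypothetical. In Spark each fit would run as a distributed job, whereas this sketch runs serially.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (feature x, label y)


def fit_ridge(train: Sequence[Point], lam: float) -> float:
    """Fit y ~ w * x with an L2 penalty lam and return the weight w."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def mse(w: float, data: Sequence[Point]) -> float:
    """Mean squared error of the model y = w * x on data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def cross_validate(
    data: List[Point], grid: List[float], k: int
) -> Tuple[float, Dict[float, float]]:
    """Return the best lam and the average held-out MSE for each lam."""
    folds = [data[i::k] for i in range(k)]  # deterministic round-robin split
    scores: Dict[float, float] = {}
    for lam in grid:
        total = 0.0
        for i, held_out in enumerate(folds):
            # Train on every fold except the held-out one.
            train = [p for j, fold in enumerate(folds) if j != i for p in fold]
            total += mse(fit_ridge(train, lam), held_out)
        scores[lam] = total / k
    best = min(scores, key=scores.get)  # lower error is better
    return best, scores


data = [(float(x), 2.0 * x) for x in range(1, 10)]  # noiseless y = 2x
best_lam, scores = cross_validate(data, grid=[0.0, 1.0, 100.0], k=3)
```

Because the toy data is noiseless, the unpenalized model wins. TrainValidationSplit follows the same pattern with a single train/validation split instead of k folds, which is cheaper but gives a noisier estimate.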
Project Hydrogen
● A Spark initiative to unify Big Data and AI workloads
● Barrier execution mode was introduced in Spark to run a distributed DL job as a Spark job with gang scheduling
  ○ Horovod integration via HorovodRunner (by Databricks Runtime ML) or horovod.spark (by Horovod) to run Horovod as a Spark job
● Optimized data exchange between Spark and DL frameworks
  ○ Pandas UDF implementation via Apache Arrow
● Accelerator-aware scheduling
  ○ Heterogeneous accelerator support by resource managers like YARN, Mesos & Kubernetes
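The synchronization idea behind barrier execution can be illustrated with Python's standard `threading.Barrier`. This is an analogy only, not Spark code: threads stand in for the tasks of one stage, and the barrier plays the role of the point where every task must have started before any of them proceeds to the collective training step. The task count and log format are arbitrary choices for the example.

```python
import threading
from typing import List

NUM_TASKS = 4  # arbitrary number of workers for the sketch
barrier = threading.Barrier(NUM_TASKS)
lock = threading.Lock()
log: List[str] = []


def task(rank: int) -> None:
    # Per-task setup, e.g. loading a data partition.
    with lock:
        log.append(f"setup {rank}")
    # No task continues until all NUM_TASKS tasks have reached this point.
    barrier.wait()
    # From here on, every peer is known to be up, so collective
    # communication (such as an allreduce) can safely begin.
    with lock:
        log.append(f"train {rank}")


threads = [threading.Thread(target=task, args=(r,)) for r in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whatever order the threads are scheduled in, every "setup" entry is logged before the first "train" entry. That all-or-nothing start is what a distributed DL job needs, and it differs from ordinary Spark stages, where tasks run independently as slots become free.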
Distributed Deep Learning
● Distributed support is critical to integrate DL frameworks with Spark
● Parallelism for Deep Learning
  ○ Data parallelism (a.k.a. between-graph replication)
    ■ Synchronous vs. asynchronous
    ■ Centralized vs. decentralized for synchronous training
      ● Parameter server for centralized mode
      ● Ring-allreduce for decentralized mode
    ■ Parameter server can also be used for asynchronous training
  ○ Model parallelism (a.k.a. in-graph replication)
● Multi-device & multi-node communication
  ○ Interconnect: PCIe, NVLink, xGMI, InfiniBand, Omni-Path, High-Speed Ethernet, RoCE
  ○ Libraries: OpenMPI, NCCL (Nvidia), RCCL (AMD), libfabric (OpenFabrics), UCX
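Ring-allreduce, the decentralized mode mentioned above, can be simulated in a single process to show how the gradients end up summed on every worker without a central parameter server. The sketch below is a teaching aid rather than how NCCL or Horovod implement it: workers are list indices, "sending" is a list copy, and the vector length is assumed to divide evenly by the number of workers.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Sum equal-length vectors across workers using two ring phases."""
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "sketch assumes the vector splits into n equal chunks"
    step = size // n
    # bufs[w][c] is worker w's local copy of chunk c.
    bufs = [[list(g[c * step:(c + 1) * step]) for c in range(n)] for g in grads]

    # Phase 1, scatter-reduce: in step s, worker w sends chunk (w - s) mod n
    # to its right neighbour, which adds it to its own copy. After n - 1
    # steps, worker w holds the complete sum of chunk (w + 1) mod n.
    for s in range(n - 1):
        msgs = [(w, (w - s) % n, bufs[w][(w - s) % n][:]) for w in range(n)]
        for w, c, payload in msgs:  # all sends of a step happen "at once"
            dst = bufs[(w + 1) % n][c]
            for i, value in enumerate(payload):
                dst[i] += value

    # Phase 2, allgather: the completed chunks travel around the ring and
    # overwrite the partial sums held by the other workers.
    for s in range(n - 1):
        msgs = [(w, (w + 1 - s) % n, bufs[w][(w + 1 - s) % n][:]) for w in range(n)]
        for w, c, payload in msgs:
            bufs[(w + 1) % n][c] = payload

    return [[value for chunk in b for value in chunk] for b in bufs]


grads = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
]
reduced = ring_allreduce(grads)
```

Each worker sends 2(n - 1) chunks of size 1/n of the vector, so the traffic per worker stays close to twice the gradient size regardless of how many workers join the ring. For synchronous data-parallel training the summed gradient would then be divided by n to obtain the average.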
Distributed Framework Support
[Table: distributed training support by approach (Parameter server, Ring-allreduce) and by framework or backend (TensorFlow, Keras, PyTorch, MXNet) across TensorFlowOnSpark, Horovod, HorovodEstimator, HopsML and Angel ML, annotated with notes [1], [2] and [3]]
[1] TensorFlow has MPI collectives for Baidu allreduce; Horovod replaces Baidu allreduce with NCCL
[2] CollectiveAllReduceStrategy is used by HopsML
[3] HorovodEstimator is the Horovod integration with Spark MLlib for distributed training
The Arm Story
● Linaro Data Center and Cloud Group
  ○ Big Data Lead Project
  ○ HPC SIG (SVE, MPI, math libraries, etc.)
● Linaro Machine Intelligence Initiative
  ○ Initial focus on inference support with Cortex-A SoCs
  ○ ArmNN, TVM, etc.
● Nvidia AI and HPC stack for Arm (planned for end of 2019)
  ○ Announced at ISC 19 in Frankfurt on June 17th
  ○ Lifts the major barrier to integrating AI solutions with Big Data on Arm platforms
● What’s next?
Thank you
Join Linaro to accelerate deployment of your Arm-based solutions through collaboration