SRA-SV | Cloud Research Lab Slide 1
Building a Distributed Deep Learning Engine
Guangdeng Liao, Zhan Zhang and Murtaza Zafer
What is Deep Learning?
Deep learning is a set of algorithms that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.
[Figures: learning hidden features; learning state emission probabilities; learning word vectors]
Usage Scenarios: Speech Recognition, Image Processing and NLP
Thanks to Big Data, Deep Learning is no longer only research
Why does Samsung need Deep Learning?
To make our devices smarter and more intelligent by recognizing voice, images and even language
What does Deep Learning look like?
Many more examples (millions to billions of parameters) in Speech Recognition, Image Processing and NLP
Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks
Deep Learning is challenging:
- BIG DATA + BIG MODEL
- Quite new, no mature platform yet
- Hard to design and develop DL algorithms
Building a distributed deep learning platform for Samsung R&D
Distributed Deep Learning Platform we are building
- App: object recognition, speech recognition, …
- Algorithms: RBM, FF, DA, CNN, …
- Infrastructure (our focus): model-parallel engine, parameter server, execution engine, math, I/O, …
Now, let’s dive deeper and get more technical…
Model-Parallel Engine (MPE)
Parallelizes a big ML model over a Hadoop YARN cluster:
- User-defined model: define nodes, define groups, define connections
- Auto-generation of model topology
- Auto-partition of topology over the cluster
- Auto-deployment of topology (in-memory)
- Neuron-like programming
- Message-based communication
- Message-driven computation
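As a rough illustration of the "define nodes, groups, connections" idea above, a user-defined model could be declared as groups of neuron-like nodes wired together into a topology that the engine would then partition and deploy. All names here (`Model`, `add_group`, `connect`) are hypothetical sketches, not the actual MPE API:

```python
class Model:
    """Toy model-definition sketch: groups of nodes plus connections."""

    def __init__(self):
        self.groups = {}          # group name -> number of nodes
        self.connections = []     # (source group, target group) pairs

    def add_group(self, name, num_nodes):
        self.groups[name] = num_nodes

    def connect(self, src, dst):
        # record a full connection between every node in src and dst
        self.connections.append((src, dst))

    def edge_count(self):
        # total node-to-node edges, assuming full connectivity per pair
        return sum(self.groups[s] * self.groups[d]
                   for s, d in self.connections)

# Define a small feed-forward topology: input -> hidden -> output
m = Model()
m.add_group("input", 784)
m.add_group("hidden", 256)
m.add_group("output", 10)
m.connect("input", "hidden")
m.connect("hidden", "output")
print(m.edge_count())  # 784*256 + 256*10 = 203264
```

With the topology expressed as data like this, the engine is free to cut it into per-machine partitions and route messages along the recorded connections.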
MPE’s Architecture
- A Controller partitions and deploys the topology into containers running under YARN Node Managers and an Application Master
- Data communication: node-level and group-level
- Control communication based on Thrift
- Data communication based on Netty
How to partition big models
Vertical Partition Horizontal Partition
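As a sketch of the two schemes (one common interpretation, an assumption rather than necessarily how MPE defines them): a vertical partition cuts across layers, so each worker holds a slice of every layer and workers exchange activations within a layer; a horizontal partition cuts between layers, so each worker holds whole layers and forwards activations pipeline-style. In numpy terms, for a two-layer network:

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.standard_normal((784, 256)),   # input -> hidden weights
          rng.standard_normal((256, 10))]    # hidden -> output weights

# Vertical: split each layer's output columns across 2 workers,
# so every worker owns a slice of every layer.
vertical = [np.array_split(w, 2, axis=1) for w in layers]

# Horizontal: assign whole layers to workers,
# so worker 0 owns layer 0 and worker 1 owns layer 1.
horizontal = [[layers[0]], [layers[1]]]

# Each vertical split halves the columns of the corresponding layer.
print(vertical[0][0].shape, vertical[1][0].shape)   # (784, 128) (256, 5)
# Each horizontal worker holds one complete layer.
print(horizontal[0][0].shape)                       # (784, 256)
```

The trade-off is communication pattern: vertical partitions exchange partial activations every layer, while horizontal partitions only pass activations at layer boundaries.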
Execution Engine (Layer-by-Layer Training)
Data flows in and out via HDFS/LFS
Can stack different layers and training algorithms
Model-parallelism by itself is not scalable enough
Deep Learning Infra.: Hybrid of Data-Parallelism and Model-Parallelism
- Data-parallelism: the training data is split into chunks, each fed to one of many model instances; each instance is itself model-parallel
- Parameter servers (1..n) coordinate the parameters so the model instances learn from each other
Distributed Parameter Servers
- Clients pull/push parameters from/to the servers
- Each server holds an in-memory cache/storage, backed by HBase/HDFS
- Asynchronous communication
- Currently we support asynchronous stochastic gradient descent with AdaGrad
Deep Learning Algorithms
- Feed-forward Neural Network
- Restricted Boltzmann Machine
- Denoising Auto-encoder
- Deep Belief Network
More importantly, we can stack them layer by layer
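A minimal sketch of the layer-by-layer (greedy layer-wise) idea, using a denoising autoencoder as the per-layer trainer (a simplified stand-in with tied weights, not the platform's execution engine): each layer learns to reconstruct its corrupted input, then its hidden codes become the next layer's input.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_dae_layer(x, hidden, steps=200, lr=0.01, noise=0.3):
    """Train one denoising-autoencoder layer; return (weights, codes)."""
    n, d = x.shape
    w = rng.standard_normal((d, hidden)) * 0.1
    for _ in range(steps):
        corrupted = x * (rng.random(x.shape) > noise)  # masking noise
        h = np.tanh(corrupted @ w)                     # encode
        recon = h @ w.T                                # tied-weight decode
        err = recon - x
        # gradient through decoder (err.T @ h) and encoder paths
        grad_w = (corrupted.T @ ((err @ w) * (1 - h**2)) + err.T @ h) / n
        w -= lr * grad_w
    return w, np.tanh(x @ w)                           # clean-input codes

# Greedy layer-wise stacking: each layer's codes feed the next layer.
x = rng.random((64, 20))
stack, inp = [], x
for hidden in [16, 8, 4]:
    w, inp = train_dae_layer(inp, hidden)
    stack.append(w)

print([w.shape for w in stack])  # [(20, 16), (16, 8), (8, 4)]
```

The same stacking loop works with any per-layer trainer (RBM via contrastive divergence, supervised FF layers), which is what makes the layer-by-layer execution engine composable.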
More Challenging Algorithm: Convolutional Neural Network
- Input: e.g. an image, or the spectral map of voice data
- Layers: multi-dimensional feature maps of neurons
- Output: dense feed-forward layer
- Different convolutional, normalization and pooling layers
- Weight-shared and non-shared feature maps
- The feature map is the minimum partition unit
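A minimal sketch of what one convolutional layer plus pooling computes on a feature map (plain numpy, not the platform's CNN code): a single shared-weight kernel slides over the input to produce one feature map, and max-pooling then downsamples it.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation) with one
    shared-weight kernel, producing one output feature map."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling that downsamples the feature map."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge = np.array([[1.0, -1.0]])                  # horizontal difference kernel
fmap = conv2d(img, edge)
pooled = max_pool(fmap)
print(fmap.shape, pooled.shape)  # (6, 5) (3, 2)
```

Because every position of the feature map shares the same kernel, the feature map is a natural unit to keep together when partitioning the model, which matches the "feature map is the minimum partition unit" choice above.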
Sharing some early experiences/lessons
Infrastructure:
- The computation abstraction might be too low-level (a lot of pros and cons)
- A generic deep learning platform is very challenging (e.g. supporting recurrent NNs)
- Communication is important
- Methods of partitioning models are important
- A high-performance mathematical library is useful
Sharing some early experiences
Algorithms/Models:
- Models for ASR are relatively small
- Models for images are much larger
- Models for NLP are typically small
- DA seems more efficient than RBM for images
- Accelerated SGD or Hessian-free optimization needs to be explored
Use cases of Deep Learning
Image Recognition
- Traditional pipeline: image pixels → hand-designed feature extraction (SIFT, HOG, etc.) → trainable classifier → object category
- Feature learner (Convolutional NN is popular): learned high-level features form a hierarchy of pixels → edges → object parts → object models
- Data augmentation: central and corner crops of the original image
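A small sketch of the crop-based augmentation mentioned above (plain numpy; the crop size is an assumption): taking the central crop plus the four corner crops of each image multiplies the training set by five.

```python
import numpy as np

def five_crops(img, size):
    """Return the four corner crops and the central crop of an image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return [
        img[:size, :size],                  # top-left corner
        img[:size, w-size:],                # top-right corner
        img[h-size:, :size],                # bottom-left corner
        img[h-size:, w-size:],              # bottom-right corner
        img[top:top+size, left:left+size],  # center
    ]

img = np.arange(64).reshape(8, 8)  # toy 8x8 "image"
crops = five_crops(img, size=6)
print(len(crops), crops[0].shape)  # 5 (6, 6)
```

Mirroring each crop horizontally (as in the AlexNet recipe) would double this again to ten views per image.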
Speech Recognition
• DNNs are used to replace GMMs to learn the state output probabilities in HMMs
• FF networks and DBNs have been used for ASR
• CNNs are starting to be used to further improve WER
• Rectified Linear activation seems better than Sigmoid
• Models are relatively small (e.g. 5 layers, 2560 neurons per hidden layer)
Li Deng, A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning
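A toy sketch of the hybrid setup described above (all sizes and weights are made up for illustration): a small feed-forward net with ReLU hidden units maps one acoustic feature frame to a softmax over HMM states, standing in for the GMM likelihoods.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())   # shift for numerical stability
    return e / e.sum()

# Toy acoustic model: 40-dim feature frame -> posteriors over 3 HMM states.
w1 = rng.standard_normal((40, 64)) * 0.1   # input -> hidden (ReLU)
w2 = rng.standard_normal((64, 3)) * 0.1    # hidden -> states (softmax)

frame = rng.standard_normal(40)            # one acoustic feature vector
posteriors = softmax(relu(frame @ w1) @ w2)
print(round(posteriors.sum(), 6))          # 1.0: a valid distribution
```

In a real hybrid system these per-frame posteriors are converted to scaled likelihoods and fed into the HMM decoder in place of the GMM scores.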
NLP
Learning word vectors
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, Efficient Estimation of Word Representations in Vector Space
Deep Learning in NLP is quite new
NLP
Sentiment Analysis
Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng and Christopher D. Manning, Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
Building on word vectors, sentences can now be mapped into the vector space as well
Q&A