SRA-SV | Cloud Research Lab Slide 1
Building a Distributed Deep Learning Engine
Guangdeng Liao, Zhan Zhang and Murtaza Zafer
What is Deep Learning?
Deep learning is a set of algorithms that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.
[Figures: learning hidden features; learning state emission probabilities; learning word vectors]
Usage Scenarios: Speech Recognition, Image Processing and NLP
Thanks to Big Data, Deep Learning is no longer only research
Why does Samsung need Deep Learning?
To make our devices smarter and more intelligent by recognizing voice, images and even language
What does Deep Learning look like?
Many more examples (millions to billions of parameters) in Speech Recognition, Image Processing and NLP
Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks
Deep Learning is challenging:
- BIG DATA + BIG MODEL
- Quite new, no mature platform yet
- Hard to design and develop DL algorithms
Building a distributed deep learning platform for Samsung R&D
Distributed Deep Learning Platform we are building
- App: object recognition, speech recognition, …
- Algorithms: RBM, FF, DA, CNN, …
- Infrastructure (our focus): model-parallel engine, parameter server, execution engine, math, I/O, …
Now, let’s dive deeper and get more technical…
Model-Parallel Engine (MPE)
Parallelizes a big ML model over a Hadoop YARN cluster:
- User-defined model: define nodes, define groups, define connections
- Auto-generation of model topology
- Auto-partition of topology over the cluster
- Auto-deployment of topology (in-memory)
- Neuron-like programming
- Message-based communication
- Message-driven computation
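As a rough illustration of the "define nodes, groups, connections" idea above, a user-defined model could be declared as groups of neuron-like nodes wired together into a topology that the engine would then partition and deploy. All names here (`Model`, `add_group`, `connect`) are hypothetical sketches, not the actual MPE API:

```python
class Model:
    """Toy model-definition sketch: groups of nodes plus connections."""

    def __init__(self):
        self.groups = {}          # group name -> number of nodes
        self.connections = []     # (source group, target group) pairs

    def add_group(self, name, num_nodes):
        self.groups[name] = num_nodes

    def connect(self, src, dst):
        # record a full connection between every node in src and dst
        self.connections.append((src, dst))

    def edge_count(self):
        # total node-to-node edges, assuming full connectivity per pair
        return sum(self.groups[s] * self.groups[d]
                   for s, d in self.connections)

# Define a small feed-forward topology: input -> hidden -> output
m = Model()
m.add_group("input", 784)
m.add_group("hidden", 256)
m.add_group("output", 10)
m.connect("input", "hidden")
m.connect("hidden", "output")
print(m.edge_count())  # 784*256 + 256*10 = 203264
```

With the topology expressed as data like this, the engine is free to cut it into per-machine partitions and route messages along the recorded connections.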
MPE’s Architecture
- A Controller partitions and deploys the topology into containers running under YARN Node Managers and an Application Master
- Data communication: node-level and group-level
- Control communication based on Thrift
- Data communication based on Netty
How to partition big models
Vertical Partition Horizontal Partition
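As a sketch of the two schemes (one common interpretation, an assumption rather than necessarily how MPE defines them): a vertical partition cuts across layers, so each worker holds a slice of every layer and workers exchange activations within a layer; a horizontal partition cuts between layers, so each worker holds whole layers and forwards activations pipeline-style. In numpy terms, for a two-layer network:

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.standard_normal((784, 256)),   # input -> hidden weights
          rng.standard_normal((256, 10))]    # hidden -> output weights

# Vertical: split each layer's output columns across 2 workers,
# so every worker owns a slice of every layer.
vertical = [np.array_split(w, 2, axis=1) for w in layers]

# Horizontal: assign whole layers to workers,
# so worker 0 owns layer 0 and worker 1 owns layer 1.
horizontal = [[layers[0]], [layers[1]]]

# Each vertical split halves the columns of the corresponding layer.
print(vertical[0][0].shape, vertical[1][0].shape)   # (784, 128) (256, 5)
# Each horizontal worker holds one complete layer.
print(horizontal[0][0].shape)                       # (784, 256)
```

The trade-off is communication pattern: vertical partitions exchange partial activations every layer, while horizontal partitions only pass activations at layer boundaries.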
Execution Engine (Layer-by-Layer Training)
Data flows in and out via HDFS/LFS
Can stack different layers and training algorithms
Model-parallelism by itself is not scalable enough
Deep Learning Infra.: Hybrid of Data-Parallelism and Model-Parallelism
- Data-parallelism: the training data is split into chunks, each fed to one of many model instances; each instance is itself model-parallel
- Parameter servers (1..n) coordinate the parameters so the model instances learn from each other
Distributed Parameter Servers
- Clients pull/push parameters from/to the servers
- Each server holds an in-memory cache/storage, backed by HBase/HDFS
- Asynchronous communication
- Currently we support asynchronous stochastic gradient descent with AdaGrad
Deep Learning Algorithms
- Feed-forward Neural Network
- Restricted Boltzmann Machine
- Denoising Auto-encoder
- Deep Belief Network
More importantly, we can stack them layer by layer
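A minimal sketch of the layer-by-layer (greedy layer-wise) idea, using a denoising autoencoder as the per-layer trainer (a simplified stand-in with tied weights, not the platform's execution engine): each layer learns to reconstruct its corrupted input, then its hidden codes become the next layer's input.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_dae_layer(x, hidden, steps=200, lr=0.01, noise=0.3):
    """Train one denoising-autoencoder layer; return (weights, codes)."""
    n, d = x.shape
    w = rng.standard_normal((d, hidden)) * 0.1
    for _ in range(steps):
        corrupted = x * (rng.random(x.shape) > noise)  # masking noise
        h = np.tanh(corrupted @ w)                     # encode
        recon = h @ w.T                                # tied-weight decode
        err = recon - x
        # gradient through decoder (err.T @ h) and encoder paths
        grad_w = (corrupted.T @ ((err @ w) * (1 - h**2)) + err.T @ h) / n
        w -= lr * grad_w
    return w, np.tanh(x @ w)                           # clean-input codes

# Greedy layer-wise stacking: each layer's codes feed the next layer.
x = rng.random((64, 20))
stack, inp = [], x
for hidden in [16, 8, 4]:
    w, inp = train_dae_layer(inp, hidden)
    stack.append(w)

print([w.shape for w in stack])  # [(20, 16), (16, 8), (8, 4)]
```

The same stacking loop works with any per-layer trainer (RBM via contrastive divergence, supervised FF layers), which is what makes the layer-by-layer execution engine composable.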
More Challenging Algorithm: Convolutional Neural Network
- Input: e.g. an image, or the spectral map of voice data
- Layers: multi-dimensional feature maps of neurons
- Output: dense feed-forward layer
- Different convolutional, normalization and pooling layers
- Weight-shared and non-shared feature maps
- The feature map is the minimum partition unit
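A minimal sketch of what one convolutional layer plus pooling computes on a feature map (plain numpy, not the platform's CNN code): a single shared-weight kernel slides over the input to produce one feature map, and max-pooling then downsamples it.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (cross-correlation) with one
    shared-weight kernel, producing one output feature map."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling that downsamples the feature map."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge = np.array([[1.0, -1.0]])                  # horizontal difference kernel
fmap = conv2d(img, edge)
pooled = max_pool(fmap)
print(fmap.shape, pooled.shape)  # (6, 5) (3, 2)
```

Because every position of the feature map shares the same kernel, the feature map is a natural unit to keep together when partitioning the model, which matches the "feature map is the minimum partition unit" choice above.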
Sharing some early experiences/lessons
Infrastructure:
- The computation abstraction might be too low-level (a lot of pros and cons)
- A generic deep learning platform is very challenging (e.g. supporting recurrent NNs)
- Communication is important
- Methods of partitioning models are important
- A high-performance mathematical library is useful
Sharing some early experiences
Algorithms/Models:
- Models for ASR are relatively small
- Models for images are much larger
- Models for NLP are typically small
- DA seems more efficient than RBM for images
- Accelerated SGD or Hessian-free optimization needs to be explored
Use cases of Deep Learning
Image Recognition
- Traditional pipeline: image pixels → hand-designed feature extraction (SIFT, HOG, etc.) → trainable classifier → object category
- Feature learner (Convolutional NN is popular): learned high-level features form a hierarchy of pixels → edges → object parts → object models
- Data augmentation: central and corner crops of the original image
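A small sketch of the crop-based augmentation mentioned above (plain numpy; the crop size is an assumption): taking the central crop plus the four corner crops of each image multiplies the training set by five.

```python
import numpy as np

def five_crops(img, size):
    """Return the four corner crops and the central crop of an image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return [
        img[:size, :size],                  # top-left corner
        img[:size, w-size:],                # top-right corner
        img[h-size:, :size],                # bottom-left corner
        img[h-size:, w-size:],              # bottom-right corner
        img[top:top+size, left:left+size],  # center
    ]

img = np.arange(64).reshape(8, 8)  # toy 8x8 "image"
crops = five_crops(img, size=6)
print(len(crops), crops[0].shape)  # 5 (6, 6)
```

Mirroring each crop horizontally (as in the AlexNet recipe) would double this again to ten views per image.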
Speech Recognition
• DNNs are used to replace GMMs to learn the state output probabilities in HMMs
• FF networks and DBNs have been used for ASR
• CNNs are starting to be used to further improve WER
• Rectified Linear activation seems better than Sigmoid
• Models are relatively small (e.g. 5 layers, 2560 neurons per hidden layer)
Li Deng, A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning
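A toy sketch of the hybrid setup described above (all sizes and weights are made up for illustration): a small feed-forward net with ReLU hidden units maps one acoustic feature frame to a softmax over HMM states, standing in for the GMM likelihoods.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())   # shift for numerical stability
    return e / e.sum()

# Toy acoustic model: 40-dim feature frame -> posteriors over 3 HMM states.
w1 = rng.standard_normal((40, 64)) * 0.1   # input -> hidden (ReLU)
w2 = rng.standard_normal((64, 3)) * 0.1    # hidden -> states (softmax)

frame = rng.standard_normal(40)            # one acoustic feature vector
posteriors = softmax(relu(frame @ w1) @ w2)
print(round(posteriors.sum(), 6))          # 1.0: a valid distribution
```

In a real hybrid system these per-frame posteriors are converted to scaled likelihoods and fed into the HMM decoder in place of the GMM scores.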
NLP
Learning word vectors
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, Efficient Estimation of Word Representations in Vector Space
Deep Learning in NLP is quite new
NLP
Sentiment Analysis
Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng and Christopher D. Manning, Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
Building on word vectors, sentences can now be mapped into the vector space as well
Q&A