Building Distributed Deep Learning Engine

Guangdeng Liao, Zhan Zhang and Murtaza Zafer
SRA-SV | Cloud Research Lab

Transcript of "Building Distributed Deep Learning Engine"

Page 2

What is Deep Learning

Deep learning is a set of algorithms that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.

Examples: learning hidden features, learning state emission probabilities, learning word vectors.
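As a toy illustration of this definition (not code from the platform), two stacked tanh layers form a composition of non-linear transformations:

```python
import math

# Each "layer" is an affine map followed by a non-linearity (tanh here).
def layer(weights, biases):
    def f(xs):
        return [math.tanh(sum(w * x for w, x in zip(ws, xs)) + b)
                for ws, b in zip(weights, biases)]
    return f

l1 = layer([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1])  # 2 inputs -> 2 hidden
l2 = layer([[1.0, 1.0]], [0.0])                    # 2 hidden -> 1 output
deep = lambda xs: l2(l1(xs))  # the "multiple non-linear transformations"
y = deep([0.3, 0.2])
```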

Page 3

Usage Scenarios: Speech Recognition, Image Processing and NLP

Thanks to Big Data, Deep Learning is no longer research only.

Page 4

Why does Samsung need Deep Learning?

To make our devices smarter and more intelligent by recognizing voice, images and even language.

Page 5

What does Deep Learning look like?

Many more models (millions to billions of parameters) in Speech Recognition, Image Processing and NLP.

Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks

Page 6

Deep Learning is challenging…

• BIG DATA + BIG MODEL
• Quite new: no mature platform yet
• Hard to design and develop DL algorithms

Our goal: building a distributed deep learning platform for Samsung R&D.

Page 7

Distributed Deep Learning Platform we are building

The platform stack, from bottom to top:

• Infrastructure: I/O, …
• Core engine (our focus): model-parallel engine, parameter server, execution engine, math library
• Algorithms: RBM, FF, DA, CNN, …
• Applications: object recognition, speech recognition, …

Page 8

Now, let’s dive deeper and get more technical…

Page 9

Model-Parallel Engine (MPE)

Parallelizes a big ML model over a Hadoop YARN cluster:

• User-defined model: define nodes, define groups, define connections
• Auto-generation of the model topology
• Auto-partition of the topology over the cluster
• Auto-deployment of the topology (in-memory)
• Neuron-like programming
• Message-based communication
• Message-driven computation
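A minimal sketch of what this neuron-like, message-driven style could look like (all names here are illustrative, not the actual MPE API):

```python
# Nodes declare their expected inputs; a node computes only once messages
# from all of its inputs have arrived, then forwards its result downstream.
class Node:
    def __init__(self, name, inputs):
        self.name = name
        self.inputs = list(inputs)  # names of expected upstream senders
        self.outputs = []           # downstream Node objects
        self.inbox = {}             # sender -> received value
        self.last_output = None

    def connect(self, other):
        # "Define connections": messages from self flow to other.
        self.outputs.append(other)

    def receive(self, sender, value):
        # Message-based communication + message-driven computation.
        self.inbox[sender] = value
        if set(self.inbox) == set(self.inputs):
            self.last_output = sum(self.inbox.values())  # toy activation
            self.inbox.clear()
            for nxt in self.outputs:
                nxt.receive(self.name, self.last_output)

# Define nodes and connections (a tiny two-into-one topology).
a = Node("a", inputs=["data"])
b = Node("b", inputs=["data"])
c = Node("c", inputs=["a", "b"])
a.connect(c)
b.connect(c)

a.receive("data", 1.0)  # c still waits: only one input has arrived
b.receive("data", 2.0)  # now c fires: 1.0 + 2.0
```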

Page 10

MPE’s Architecture

• The YARN Application Master contains a Controller that partitions and deploys the topology.
• Node Managers launch Containers that host the model partitions.
• Data communication: node-level and group-level.
• Control communication is based on Thrift; data communication is based on Netty.

Page 11

How to partition big models

Two methods: vertical partition and horizontal partition.
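A toy sketch of the two schemes on a layer's weight matrix (illustrative code, not the platform's implementation; rows are output neurons, columns are inputs):

```python
def horizontal_partition(W, parts):
    """Split by rows: each worker owns a contiguous block of output neurons."""
    n = len(W)
    step = (n + parts - 1) // parts
    return [W[i:i + step] for i in range(0, n, step)]

def vertical_partition(W, parts):
    """Split by columns: each worker owns a slice of every neuron's inputs."""
    m = len(W[0])
    step = (m + parts - 1) // parts
    return [[row[j:j + step] for row in W] for j in range(0, m, step)]

W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
# horizontal_partition(W, 2) -> [[[1, 2, 3, 4]], [[5, 6, 7, 8]]]
# vertical_partition(W, 2)   -> [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]
```

Horizontal partitioning keeps each output neuron whole on one machine; vertical partitioning requires a sum across machines to finish each neuron's activation.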

Page 12

Execution Engine (Layer-by-Layer Training)

Data is read from and written to HDFS/LFS.

Can stack different layers and training algorithms.
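The layer-by-layer idea can be sketched as greedy layer-wise training, where each layer trains on the previous layer's output (toy code with a stand-in layer; the real engine would plug in RBM, DA, etc.):

```python
class Layer:
    def train(self, data):      # fit this layer on its input data
        raise NotImplementedError
    def transform(self, data):  # produce features for the next layer
        raise NotImplementedError

class ScaleLayer(Layer):
    """Toy stand-in for a real layer (e.g. an RBM or denoising auto-encoder)."""
    def train(self, data):
        self.scale = 1.0 / max(max(abs(x) for x in row) for row in data)
    def transform(self, data):
        return [[x * self.scale for x in row] for row in data]

def train_stack(layers, data):
    """Greedy layer-wise training: train a layer, transform, pass it on."""
    for layer in layers:
        layer.train(data)
        data = layer.transform(data)
    return data

features = train_stack([ScaleLayer(), ScaleLayer()], [[2.0, -4.0], [1.0, 0.5]])
```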

Page 13

Model-parallelism by itself is not scalable enough.

Page 14

Deep Learning Infra.: Hybrid of Data-Parallelism and Model-Parallelism

• Data-parallelism: the training data is split into chunks, and each chunk is processed by its own model-parallel model instance.
• Lots of model instances run in parallel.
• Parameter Servers 1…n coordinate the parameters, helping the model instances learn from each other.

Page 15

Distributed Parameter Servers

• Clients pull parameters from and push updates to the servers.
• Each server keeps an in-memory cache/storage, backed by HBase/HDFS.
• Communication is asynchronous.
• Currently we support asynchronous stochastic gradient descent with AdaGrad.
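A minimal single-process sketch of the update rule named here, asynchronous SGD with AdaGrad (illustrative, not Samsung's implementation; in the real system the `push` calls would arrive asynchronously from many clients):

```python
import math

class ParameterServer:
    def __init__(self, dim, lr=0.1, eps=1e-8):
        self.w = [0.0] * dim   # current parameters
        self.g2 = [0.0] * dim  # running sum of squared gradients (AdaGrad)
        self.lr, self.eps = lr, eps

    def push(self, grad):
        # AdaGrad: each coordinate's step shrinks with its gradient history.
        for i, g in enumerate(grad):
            self.g2[i] += g * g
            self.w[i] -= self.lr * g / (math.sqrt(self.g2[i]) + self.eps)

    def pull(self):
        return list(self.w)  # clients fetch the latest parameters

ps = ParameterServer(dim=2)
ps.push([1.0, 0.0])  # gradient from one client
ps.push([1.0, 2.0])  # gradient from another client, applied without waiting
```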

Page 16

Deep Learning Algorithms

• Feed-forward Neural Network
• Restricted Boltzmann Machine
• Denoising Auto-encoder
• Deep Belief Network

More importantly, we can stack them layer by layer.

Page 17

More Challenging Algorithm: Convolutional Neural Network

• Input: e.g. an image, or a spectral map of voice data
• Layers: multi-dimensional feature-map neurons
• Output: dense feed-forward layer neurons

• Different convolutional, normalization and pooling layers
• Weight-shared and non-shared feature maps
• A feature map is the minimum partition unit
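A minimal sketch of how one feature map is produced, with one kernel's weights shared across the whole input (toy code, not the platform's CNN):

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution producing a single feature map (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # The same kernel weights are reused at every position: weight sharing.
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
edge = [[1, -1]]  # toy horizontal-difference kernel
fmap = conv2d_valid(image, edge)
# fmap == [[-1, -1], [-1, -1], [-1, -1]]
```

Because every entry of `fmap` depends on the same shared kernel, the whole map is a natural minimum unit to keep on one machine when partitioning.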

Page 18

Sharing some early experiences/lessons (infrastructure):

• The computation abstraction might be too low-level (a lot of pros and cons)
• A generic deep learning platform is very challenging (e.g. recurrent NNs)
• Communication is important
• Methods of partitioning models are important
• A high-performance mathematical library is useful

Page 19

Sharing some early experiences (algorithms/models):

• Models for ASR are relatively small
• Models for images are much larger
• Models for NLP are typically small
• DA seems more efficient than RBM for images
• Accelerated SGD or Hessian-free optimization needs to be explored

Page 20

Use cases of Deep Learning

Page 21

Image Recognition

Learned feature hierarchy: pixels → edges → object parts → object models.

Traditional pipeline: image pixels → hand-designed feature extraction (SIFT, HOG, etc.) → trainable classifier → object category.

Deep learning pipeline: image pixels → feature learner (Convolutional NN is popular) → learned high-level features → trainable classifier → object category.

Data augmentation: central and corner crops of the original image.
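The crop-based augmentation can be sketched as follows (toy code; `crops` is an illustrative helper, not from the platform):

```python
def crops(image, ch, cw):
    """Return the four corner crops and the central crop of a 2-D image."""
    h, w = len(image), len(image[0])
    def crop(top, left):
        return [row[left:left + cw] for row in image[top:top + ch]]
    return [
        crop(0, 0),                          # top-left corner
        crop(0, w - cw),                     # top-right corner
        crop(h - ch, 0),                     # bottom-left corner
        crop(h - ch, w - cw),                # bottom-right corner
        crop((h - ch) // 2, (w - cw) // 2),  # center
    ]

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
five = crops(img, 2, 2)
```

Each training image thus yields several shifted views, multiplying the effective training set at negligible cost.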

Page 22

Speech Recognition

• A DNN is used to replace the GMM for learning the state output probabilities in an HMM.

• FF and DBN have been used for ASR.

• CNNs are starting to be used to further improve WER.

• Rectified linear activations seem better than sigmoid.

• Models are relatively small (e.g. 5 layers, 2560 neurons/hidden layer).

Li Deng, A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning

Page 23

NLP

Learning word vectors

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, Efficient Estimation of Word Representations in Vector Space

Deep Learning in NLP is quite new
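As a toy illustration of the idea, though not the skip-gram method of the cited paper, words can be given vectors built from their co-occurrence contexts, so that words used in similar contexts end up close in vector space:

```python
import math

def word_vectors(sentences, window=1):
    """Represent each word by its counts of co-occurring context words."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    vecs[w][index[s[j]]] += 1.0
    return vecs

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

sents = [["the", "cat", "sat"], ["the", "dog", "sat"]]
v = word_vectors(sents)
# "cat" and "dog" share contexts ("the", "sat"), so their vectors are close.
```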

Page 24

NLP

Sentiment Analysis

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng and Christopher D. Manning, Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions

Building on word vectors, sentences can now be mapped into the vector space as well.

Page 25

Q&A