Georgia Tech cse6242 - Intro to Deep Learning and DL4J


Description

Introduction to deep learning and DL4J - http://deeplearning4j.org/ - a guest lecture by Josh Patterson at Georgia Tech for the cse6242 graduate class.

Transcript of Georgia Tech cse6242 - Intro to Deep Learning and DL4J

Page 1: Georgia Tech cse6242 - Intro to Deep Learning and DL4J

Deep Learning with DL4J

Scaleout Deep Learning

Page 2

Josh Patterson

Email: [email protected]

Twitter: @jpatanooga

Github: https://github.com/jpatanooga

Past

Published in IAAI-09:

“TinyTermite: A Secure Routing Algorithm”

Grad work in Meta-heuristics, Ant-algorithms

Tennessee Valley Authority (TVA)

Hadoop and the Smartgrid

Cloudera

Principal Solution Architect

Today: Patterson Consulting

Page 3

Overview

• What is Deep Learning?

• Deep Belief Networks

• DL4J

Page 4

What is Deep Learning?

Page 5

What is Deep Learning?

Algorithm that tries to learn simple features in lower layers

And more complex features in higher layers

Page 6

Interesting Properties of Deep Learning

Reduces the problem of overfitting in neural networks.

Introduces new techniques for "unsupervised feature learning"

Introduces new, more automatic ways to figure out which parts of your data you should feed into your learning algorithm.

Page 7

Chasing Nature

Learning sparse representations of auditory signals

leads to filters that closely correspond to neurons in early audio processing in mammals

When applied to speech

Learned representations showed a striking resemblance to the cochlear filters in the auditory cortex

Page 8

Yann LeCun on Deep Learning

Has become the dominant method for acoustic modeling in speech recognition

Quickly becoming the dominant method for several vision tasks such as

object recognition

object detection

semantic segmentation.

Page 9

Deep Belief Networks

Page 10

What is a Deep Belief Network?

Generative probabilistic model

Composed of one visible layer

Many hidden layers

Restricted Boltzmann Machines

Each hidden layer learns relationship between units in lower layer

Higher layer representations tend to become more complex

Page 11

Restricted Boltzmann Machines

• Unsupervised model

• Does feature learning by repeated sampling of the input data.

• Learns how to reconstruct data for good feature detection.
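The "repeated sampling and reconstruction" idea above is usually trained with contrastive divergence. The following is a minimal plain-Java sketch of one CD-1 update for a single RBM; all names and sizes here are illustrative, and this is not DL4J's API:

```java
import java.util.Random;

/** Illustrative sketch of one RBM contrastive-divergence (CD-1) update. */
public class RbmSketch {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Hidden probabilities given a visible vector: h_j = sigmoid(sum_i v_i * W[i][j] + hBias_j)
    static double[] propUp(double[] v, double[][] W, double[] hBias) {
        double[] h = new double[hBias.length];
        for (int j = 0; j < h.length; j++) {
            double a = hBias[j];
            for (int i = 0; i < v.length; i++) a += v[i] * W[i][j];
            h[j] = sigmoid(a);
        }
        return h;
    }

    // Reconstruct the visible layer from hidden probabilities
    static double[] propDown(double[] h, double[][] W, double[] vBias) {
        double[] v = new double[vBias.length];
        for (int i = 0; i < v.length; i++) {
            double a = vBias[i];
            for (int j = 0; j < h.length; j++) a += h[j] * W[i][j];
            v[i] = sigmoid(a);
        }
        return v;
    }

    // One CD-1 step: positive phase on the data, negative phase on the reconstruction
    static void cd1(double[] v0, double[][] W, double[] vBias, double[] hBias, double lr) {
        double[] h0 = propUp(v0, W, hBias);    // positive hidden probabilities
        double[] v1 = propDown(h0, W, vBias);  // reconstruction of the input
        double[] h1 = propUp(v1, W, hBias);    // negative hidden probabilities
        for (int i = 0; i < v0.length; i++)
            for (int j = 0; j < h0.length; j++)
                W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j]);
        for (int i = 0; i < v0.length; i++) vBias[i] += lr * (v0[i] - v1[i]);
        for (int j = 0; j < h0.length; j++) hBias[j] += lr * (h0[j] - h1[j]);
    }
}
```

As the update repeats over many inputs, the weights move toward values whose reconstructions match the data, which is what makes the hidden units useful feature detectors.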

Page 12

Deep Belief Network Training

Pre-Train

We show each RBM layer unlabeled vectors

“unsupervised learning”

For each layer we want to minimize the Cross Entropy
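The cross entropy being minimized here is the reconstruction error between an input vector and the RBM's reconstruction of it. A small sketch of that quantity (illustrative, not DL4J code):

```java
/** Sketch of the reconstruction cross-entropy tracked during pre-training.
 *  For a binary visible vector v and its reconstruction r (probabilities in (0,1)):
 *  H = -sum_i [ v_i * ln(r_i) + (1 - v_i) * ln(1 - r_i) ] */
public class CrossEntropy {
    static double reconstructionError(double[] v, double[] r) {
        double h = 0.0;
        for (int i = 0; i < v.length; i++) {
            // clamp reconstruction probabilities for numerical stability
            double ri = Math.min(Math.max(r[i], 1e-10), 1 - 1e-10);
            h -= v[i] * Math.log(ri) + (1 - v[i]) * Math.log(1 - ri);
        }
        return h;
    }
}
```

A reconstruction close to the input gives a low value; a reconstruction near chance gives a high one, which is the contrast shown on the next slide.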

Fine-Tune

We move the learned weights (hidden bias units) from the RBMs to a traditional feed-forward neural network

We run gentle back-propagation with some labeled data

Page 13

Pre-Train Reconstructions

High Cross Entropy

Low Cross Entropy

Page 14

Deep Belief Network Diagram

• DBNs are classifiers

• Layers of RBMs

• Capped with a logistic layer

• RBMs are feature extractors

• RBMs learn features via sampling

• Creates a "simpler problem" for later layers in the stack

Page 15

Rendering RBM Hidden Neuron Filters

Page 16

DeepLearning4J

Implementation in Java

Self-contained & built on Akka, Hazelcast, Jblas

Runs on desktop

Runs on Hadoop via YARN natively to scale out

Distributed to run faster and with more features than current Theano-based implementations

Page 17

Vectorized Implementation

Handles lots of data concurrently.

Handles any number of examples at once without changing the code.

Faster: Allows for native/GPU execution.

One format: Everything is a matrix.
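The "everything is a matrix" point can be made concrete: a batch of examples becomes a matrix X (one row per example), and a single multiply X · W activates a whole layer for every example at once. A plain-array sketch (DL4J itself delegates this to Jblas; names here are illustrative):

```java
/** Sketch of vectorized layer activation: one matrix multiply covers the whole batch. */
public class Vectorized {
    // out = X * W, where X is (examples x inputs) and W is (inputs x units)
    static double[][] multiply(double[][] X, double[][] W) {
        int n = X.length, d = W[0].length, k = W.length;
        double[][] out = new double[n][d];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < d; c++)
                for (int i = 0; i < k; i++)
                    out[r][c] += X[r][i] * W[i][c];
        return out;
    }

    // element-wise sigmoid over the whole activation matrix
    static double[][] sigmoid(double[][] A) {
        double[][] out = new double[A.length][A[0].length];
        for (int r = 0; r < A.length; r++)
            for (int c = 0; c < A[0].length; c++)
                out[r][c] = 1.0 / (1.0 + Math.exp(-A[r][c]));
        return out;
    }
}
```

Because the batch size only changes the row count of X, the same code serves one example or a million, and the inner multiply is exactly the operation a BLAS library or GPU accelerates.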

Page 18

What are Good Applications for Deep Learning?

Image Processing

High MNIST Scores

Audio Processing

Current Champ on TIMIT dataset

Text / NLP Processing

Word2vec, etc

Page 19


Parameter Averaging

McDonald, 2010: "Distributed Training Strategies for the Structured Perceptron"

Langford, 2007: Vowpal Wabbit

Jeff Dean's work on parallel SGD: DownPour SGD

Page 20

Parallelizing Deep Belief Networks

Two phase training

Pre Train

Fine tune

Each phase can do multiple passes over dataset

Entire network is averaged at master
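The averaging step at the master is simple: each worker trains a replica of the network on its shard of the data, and the master replaces the global parameters with the element-wise mean of the workers' weights. A minimal sketch (illustrative names, not DL4J's API):

```java
/** Sketch of parameter averaging: the master averages worker weight matrices. */
public class ParamAveraging {
    // workerWeights[k] is worker k's copy of one weight matrix, all the same shape
    static double[][] average(double[][][] workerWeights) {
        int rows = workerWeights[0].length, cols = workerWeights[0][0].length;
        double[][] avg = new double[rows][cols];
        for (double[][] w : workerWeights)
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    avg[r][c] += w[r][c] / workerWeights.length;
        return avg;
    }
}
```

The averaged parameters are then redistributed to the workers for the next pass, which is the loop that lets both the pre-train and fine-tune phases scale out across a cluster.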

Page 21

PreTrain and Lots of Data

We’re exploring how to better leverage the unsupervised aspects of the PreTrain phase of Deep Belief Networks

Allows for the use of far less labeled data

Allows us to more easily model the massive amounts of structured data in HDFS