Deep Learning with the Neon Framework by Nervana Systems

Paramita Mirza 15.10.2015

http://neon.nervanasys.com/

GPU vs CPU

• Multilayer perceptron, 1 hidden layer (100 nodes)

• MNIST dataset (handwritten digits), 60,000 training instances, 10,000 test instances

Network Layers:

Linear Layer 'LinearLayer': 784 inputs, 100 outputs

Activation Layer 'ActivationLayer': Rectlin

Linear Layer 'LinearLayer': 100 inputs, 10 outputs

Activation Layer 'ActivationLayer': Logistic

Epoch 0 [Train |████████████████████| 469/469 batches, 0.70 cost, 18.90s]










Misclassification error = 2.6%

2015-10-14 18:01:37,804 INFO:__init__ - Cudanet backend, RNG seed: None, numerr: None

2015-10-14 18:01:37,804 INFO:mlp - Layers:

DataLayer d0: 784 nodes

FCLayer h0: 784 inputs, 100 nodes, RectLin act_fn

FCLayer output: 100 inputs, 10 nodes, Logistic act_fn

CostLayer cost: 10 nodes, CrossEntropy cost_fn

2015-10-14 18:01:56,206 INFO:mlp - commencing model fitting

2015-10-14 18:02:01,799 INFO:mlp - epoch: 0, training error: 0.70664










2015-10-14 18:02:55,626 INFO:fit_predict_err - test set

MisclassPercentage_TOP_1 2.57412

2015-10-14 18:02:58,733 INFO:fit_predict_err - train set

MisclassPercentage_TOP_1 1.11846

58.88 sec vs 185.11 sec (68.2% faster)
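The percentage can be checked from the two timings. This is a quick sketch, assuming "68.2% faster" means the relative reduction in wall-clock time; the variable names are illustrative only.

```python
fast_sec = 58.88    # shorter of the two reported run times
slow_sec = 185.11   # longer of the two reported run times

# Fraction of the longer run time that is saved
time_saved = (slow_sec - fast_sec) / slow_sec

# The same comparison expressed as a speed-up factor
speedup = slow_sec / fast_sec

print("time saved: %.1f%%" % (time_saved * 100))
print("speed-up:   %.2fx" % speedup)
```

Expressed as a factor, the shorter run is a little over three times as fast.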

Getting Started

GPU support (Maxwell-based architecture) requires installing the CUDA SDK and drivers

https://developer.nvidia.com/cuda-downloads

…or with Docker

docker pull kaixhin/neon (CPU)

docker pull kaixhin/cuda-neon (GPU)

Overview

The model can be specified in a YAML file or a Python file

https://github.com/NervanaSystems/neon/blob/master/examples/mnist_mlp.yaml

https://github.com/NervanaSystems/neon/blob/master/examples/mnist_mlp.py

Datasets

• MNIST, a dataset of handwritten digits (28x28 grayscale), 60,000 training samples, 10,000 test samples

• CIFAR10, an image dataset (32x32 color), 50,000 training samples, 10,000 test samples, 10 categories

• ImageCaption, an image and caption dataset (flickr8k, flickr30k, and COCO), 5 reference sentences per image

http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html

http://shannon.cs.illinois.edu/DenotationGraph/

http://mscoco.org/

Datasets (2)

• Text: Penn Treebank, Hutter Prize, and Shakespeare

• Speech? None, and a handler is not yet implemented either

• Adding a new dataset?

…or modify NEON_HOME/neon/data/loader.py and NEON_HOME/neon/data/__init__.py (continued…)

https://www.cis.upenn.edu/~treebank/

http://mattmahoney.net/dc/textdata



http://cs.stanford.edu/people/karpathy/char-rnn

NEON_HOME/neon/data/loader.py

• Update dataset_meta = { … … ,
    'tempeval3': {
        'size': 0,
        'file': '',
        'url': '',
        'func': load_tempeval3
    }
}

• Add the load_tempeval3() function – e.g. loading CSV files into NumPy arrays

# uses np (NumPy) and _valid_path_append from loader.py
def load_tempeval3(path):
    tempeval3_meta = dataset_meta['tempeval3']

    # resolve the data files relative to the dataset directory
    train_path = _valid_path_append(path, "te3-ee-token-embedding-no-label.csv")
    train_label_path = _valid_path_append(path, "te3-ee-train-label.csv")
    test_path = _valid_path_append(path, "te3-ee-token-embedding-no-label.csv")
    test_label_path = _valid_path_append(path, "te3-ee-eval-label.csv")

    # read the comma-separated files into NumPy arrays
    X_train = np.loadtxt(train_path, delimiter=",")
    y_train = np.loadtxt(train_label_path, delimiter=",")
    X_test = np.loadtxt(test_path, delimiter=",")
    y_test = np.loadtxt(test_label_path, delimiter=",")

    nclass = 14  # number of target classes
    return (X_train, y_train), (X_test, y_test), nclass
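The CSV-to-NumPy step can be tried outside Neon. This is a minimal sketch: the helper load_csv_pair and the file names features.csv and labels.csv are made up for illustration and are not part of Neon or of the TempEval-3 data.

```python
import os
import tempfile

import numpy as np


def load_csv_pair(feature_path, label_path):
    """Load a features CSV and a labels CSV into NumPy arrays (hypothetical helper)."""
    # ndmin keeps the array rank stable even if a file holds a single row
    X = np.loadtxt(feature_path, delimiter=",", ndmin=2)
    y = np.loadtxt(label_path, delimiter=",", ndmin=1)
    return X, y


# Write two tiny illustrative files: 2 instances with 3 features each
tmp_dir = tempfile.mkdtemp()
feature_path = os.path.join(tmp_dir, "features.csv")
label_path = os.path.join(tmp_dir, "labels.csv")
with open(feature_path, "w") as f:
    f.write("0.1,0.2,0.3\n0.4,0.5,0.6\n")
with open(label_path, "w") as f:
    f.write("3\n7\n")

X, y = load_csv_pair(feature_path, label_path)
print(X.shape, y.shape)
```

Note that np.loadtxt returns floating-point arrays by default, so class labels come back as floats unless a dtype is given.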

Datasets (3)

The name of the dataset to call in the YAML file

NEON_HOME/neon/data/__init__.py

• Update: from neon.data.loader import ( … , load_tempeval3)

Datasets (4)

[Figure: diagram with the labels Input, Output, Calculate error, Learning]

Network Functions (in Neon)

• Initializers: Constant, Uniform, Gaussian, GlorotUniform

• Activations: Identity, RectifiedLinear, Softmax, Tanh, Logistic

• Optimizers: GradientDescentMomentum, RMSProp, Adadelta, Adam

• Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum Squared Error
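To make some of these functions concrete, here is a plain NumPy sketch of three activations and the three costs. It follows the textbook definitions and is not Neon's implementation; the function names, the eps guard and the 0.5 factor in the squared error are choices made for this sketch.

```python
import numpy as np


def rectlin(x):
    # Rectified linear: negative inputs become 0
    return np.maximum(0.0, x)


def logistic(x):
    # Logistic sigmoid: squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    # Subtract the maximum first for numerical stability
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)


def cross_entropy_binary(outputs, targets, eps=1e-12):
    return -np.sum(targets * np.log(outputs + eps)
                   + (1 - targets) * np.log(1 - outputs + eps), axis=-1)


def cross_entropy_multi(probs, targets, eps=1e-12):
    return -np.sum(targets * np.log(probs + eps), axis=-1)


def sum_squared_error(outputs, targets):
    return 0.5 * np.sum((outputs - targets) ** 2, axis=-1)


scores = np.array([2.0, 1.0, 0.1])    # raw outputs for 3 classes
target = np.array([1.0, 0.0, 0.0])    # one-hot label: class 0
probs = softmax(scores)
print(probs, cross_entropy_multi(probs, target))
print(sum_squared_error(logistic(scores), target))
```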

Convolutional Neural Network

• The best-known network type for image recognition – e.g. GoogLeNet (a 22-layer deep network), the winner of the ImageNet Large Scale Visual Recognition Challenge 2014

• Differs from a network of fully connected layers: it reduces the number of parameters to be learned, while retaining high expressiveness, through:

– Local connectivity

– Weight sharing

– Pooling
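The three ideas can be illustrated with a naive NumPy sketch. This is not how Neon implements its layers; the 28x28 input, the 5x5 filter and the 2x2 pooling window are assumptions chosen for the example.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one kernel over the image (no padding, stride 1, no kernel flip)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Local connectivity: each output sees only a kh x kw patch.
            # Weight sharing: the same kernel is reused at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size blocks."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


rng = np.random.RandomState(0)
image = rng.rand(28, 28)                    # one MNIST-sized grayscale image
kernel = rng.rand(5, 5)                     # one shared filter

feature_map = conv2d_valid(image, kernel)   # 24 x 24 outputs
pooled = max_pool(feature_map)              # 12 x 12 after pooling

# Weights needed to produce the same 24 x 24 outputs
dense_weights = image.size * feature_map.size   # fully connected
conv_weights = kernel.size                      # one shared 5 x 5 filter
print(feature_map.shape, pooled.shape)
print(dense_weights, conv_weights)
```

For this setup a fully connected mapping would need 451,584 weights, against 25 for the shared filter (biases not counted).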

Convolutional Neural Network (2)

More info:

• http://ufldl.stanford.edu/tutorial/

• http://deeplearning.net/tutorial/lenet.html

Convolutional Neural Network (3)

• Related Neon layers:

– Pooling

– Convolutional and Deconv (composite deconvolution layer)

– Conv (convolutional layer with a learned bias and activation, implemented as a list composing separate Convolution, Bias and Activation layers)

Recurrent Neural Network

• What makes recurrent networks so special? Sequences!

1. Image classification

2. Image captioning

3. Sentiment analysis

4. Machine translation

5. Video classification

[Figure: panels numbered 1 to 5]

Recurrent Neural Network (2)

• Related Neon layers:

– Recurrent

– GRU (Gated Recurrent Unit) and LSTM (Long Short-Term Memory)

• More info:

– http://karpathy.github.io/2015/05/21/rnn-effectiveness/

– About LSTM: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

– GRU vs LSTM: http://cs224d.stanford.edu/lecture_notes/LectureNotes4.pdf
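How a recurrent layer handles a sequence can be sketched in NumPy. This is the plain (vanilla) recurrent step only, not a GRU or LSTM and not Neon's implementation; the sizes and weight names are assumptions made for the example.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla recurrent layer over a sequence, one time step at a time."""
    h = np.zeros(W_hh.shape[0])      # initial hidden state
    states = []
    for x_t in inputs:
        # The new state depends on the current input and the previous state
        h = np.tanh(W_xh.dot(x_t) + W_hh.dot(h) + b_h)
        states.append(h)
    return np.array(states)


rng = np.random.RandomState(0)
n_in, n_hidden, n_steps = 3, 4, 5
W_xh = rng.randn(n_hidden, n_in) * 0.1       # input-to-hidden weights
W_hh = rng.randn(n_hidden, n_hidden) * 0.1   # hidden-to-hidden weights
b_h = np.zeros(n_hidden)

sequence = rng.randn(n_steps, n_in)          # 5 time steps, 3 features each
states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(states.shape)   # (5, 4): one hidden state per time step
```

The same weights are applied at every time step, which is why sequences of any length can be processed.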

Dropout Training

• A simple way to prevent neural networks from overfitting

• Random “dropout” gives big improvements on many benchmark tasks and sets new records for object recognition and molecular activity prediction


• Related Neon layers:

– Dropout

More info:

• http://videolectures.net/nips2012_hinton_networks/

• https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
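The mechanism fits in a few lines of NumPy. This sketch uses the "inverted" variant, which rescales the kept units during training; the paper linked above instead scales the weights at test time. The function name and the keep_prob parameter are assumptions, and this is not Neon's Dropout layer.

```python
import numpy as np


def dropout(activations, keep_prob, rng, training=True):
    """Randomly zero units during training; pass them through unchanged otherwise."""
    if not training:
        return activations
    # Each unit is kept independently with probability keep_prob
    mask = rng.binomial(1, keep_prob, size=activations.shape)
    # Dividing by keep_prob keeps the expected activation unchanged
    return activations * mask / keep_prob


rng = np.random.RandomState(0)
hidden = np.ones((1000, 100))                       # dummy hidden activations
dropped = dropout(hidden, keep_prob=0.5, rng=rng)

print(np.unique(dropped))   # surviving units are scaled up, the rest are 0
print(dropped.mean())       # close to the original mean of 1.0
```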



Unsupervised Learning

• Autoencoder

In Neon:

# Load dataset
(X_train, y_train), (X_test, y_test), nclass = load_mnist(path=args.data_dir)

# Set input and target to X_train (with no labels passed, the inputs serve as targets)
train = DataIterator(X_train, lshape=(1, 28, 28))
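The idea of using the input as its own target can be shown with a toy linear autoencoder in NumPy. This is a sketch under assumed sizes (8 features, 2 hidden units) with a hand-written gradient step; it is not Neon code.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features, n_hidden = 200, 8, 2

# Toy data lying in a 2-dimensional subspace of an 8-dimensional space
X = rng.randn(n_samples, n_hidden).dot(rng.randn(n_hidden, n_features))

W_enc = rng.randn(n_features, n_hidden) * 0.1   # encoder weights
W_dec = rng.randn(n_hidden, n_features) * 0.1   # decoder weights


def reconstruction_loss(X, W_enc, W_dec):
    residual = X.dot(W_enc).dot(W_dec) - X
    return 0.5 * np.mean(np.sum(residual ** 2, axis=1))


initial_loss = reconstruction_loss(X, W_enc, W_dec)
learning_rate = 0.01
for _ in range(500):
    hidden = X.dot(W_enc)                       # encode
    residual = hidden.dot(W_dec) - X            # decode; the target is X itself
    grad_dec = hidden.T.dot(residual) / n_samples
    grad_enc = X.T.dot(residual.dot(W_dec.T)) / n_samples
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec
final_loss = reconstruction_loss(X, W_enc, W_dec)

print(initial_loss, final_loss)
```

No labels appear anywhere in the loop, which is what makes the training unsupervised.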

Restricted Boltzmann Machine

• A stochastic neural network; stochastic meaning the activations have a probabilistic element

• In Neon?

• More info: http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/

[Figure: RBM example with units labelled SciFi/Fantasy and Oscar winner; source: http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/]
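The "probabilistic element" can be sketched in NumPy: a hidden unit does not output its logistic activation directly, it switches on with that probability. The sizes, weights and function name here are assumptions for illustration, and only the hidden-unit sampling step is shown, not RBM training.

```python
import numpy as np


def sample_hidden(visible, W, b_hidden, rng):
    """Sample binary hidden units given a visible vector."""
    # Probability that each hidden unit turns on
    p_on = 1.0 / (1.0 + np.exp(-(visible.dot(W) + b_hidden)))
    # Stochastic activation: unit j is 1 with probability p_on[j]
    samples = (rng.uniform(size=p_on.shape) < p_on).astype(float)
    return p_on, samples


rng = np.random.RandomState(0)
visible = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])   # 6 binary visible units
W = rng.randn(6, 2)                                   # 6 visible x 2 hidden
b_hidden = np.zeros(2)

p_on, first = sample_hidden(visible, W, b_hidden, rng)
_, second = sample_hidden(visible, W, b_hidden, rng)
print(p_on, first, second)   # same probabilities, samples may differ per call
```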










Back to model.py

• Model

# setup model layers
layers = []
layers.append( … )
layers.append( … )

# initialize model object
mlp = Model(layers=layers)

• Train, Evaluate, Output

# train
mlp.fit(train_set, optimizer=optimizer, num_epochs=num_epochs, cost=cost, callbacks=callbacks)

# evaluate
print('Misclassification error = %.1f%%' % (mlp.eval(valid_set, metric=Misclassification())*100))

# output
output = mlp.get_outputs(valid_set)
np.savetxt("output.csv", output, delimiter=",")

• Run ./examples/mnist_mlp.py or neon examples/mnist_mlp.yaml

Hyperparameter optimization

• Finding good hyperparameters for deep networks is quite tedious to do manually

• Spearmint has been forked and slightly extended to work with Neon *)

*) This seems to hold only for Neon versions < 1.0, and only on CPU

• Run: hyperopt init -y examples/mnist_mlp.yaml

https://github.com/JasperSnoek/spearmint
