Deep Learning with the Neon Framework by Nervana Systems
Paramita Mirza 15.10.2015
http://neon.nervanasys.com/
GPU vs CPU
• Multilayer perceptron, 1 hidden layer (100 nodes)
• MNIST dataset (handwritten digits), 60,000 training instances, 10,000 test instances
Network Layers:
Linear Layer 'LinearLayer': 784 inputs, 100 outputs
Activation Layer 'ActivationLayer': Rectlin
Linear Layer 'LinearLayer': 100 inputs, 10 outputs
Activation Layer 'ActivationLayer': Logistic
Epoch 0 [Train |████████████████████| 469/469 batches, 0.70 cost, 18.90s]
Epoch 1 [Train |████████████████████| 469/469 batches, 0.27 cost, 18.57s]
Epoch 2 [Train |████████████████████| 469/469 batches, 0.21 cost, 18.08s]
Epoch 3 [Train |████████████████████| 468/468 batches, 0.17 cost, 18.75s]
Epoch 4 [Train |████████████████████| 469/469 batches, 0.15 cost, 17.78s]
Epoch 5 [Train |████████████████████| 469/469 batches, 0.13 cost, 19.43s]
Epoch 6 [Train |████████████████████| 469/469 batches, 0.11 cost, 19.08s]
Epoch 7 [Train |████████████████████| 468/468 batches, 0.10 cost, 18.05s]
Epoch 8 [Train |████████████████████| 469/469 batches, 0.09 cost, 18.12s]
Epoch 9 [Train |████████████████████| 469/469 batches, 0.08 cost, 18.35s]
Misclassification error = 2.6%
2015-10-14 18:01:37,804 INFO:__init__ - Cudanet backend, RNG seed: None, numerr: None
2015-10-14 18:01:37,804 INFO:mlp - Layers:
DataLayer d0: 784 nodes
FCLayer h0: 784 inputs, 100 nodes, RectLin act_fn
FCLayer output: 100 inputs, 10 nodes, Logistic act_fn
CostLayer cost: 10 nodes, CrossEntropy cost_fn
2015-10-14 18:01:56,206 INFO:mlp - commencing model fitting
2015-10-14 18:02:01,799 INFO:mlp - epoch: 0, training error: 0.70664
2015-10-14 18:02:07,829 INFO:mlp - epoch: 1, training error: 0.27303
2015-10-14 18:02:13,771 INFO:mlp - epoch: 2, training error: 0.21024
2015-10-14 18:02:19,800 INFO:mlp - epoch: 3, training error: 0.17499
2015-10-14 18:02:25,457 INFO:mlp - epoch: 4, training error: 0.15191
2015-10-14 18:02:31,440 INFO:mlp - epoch: 5, training error: 0.13190
2015-10-14 18:02:37,046 INFO:mlp - epoch: 6, training error: 0.11669
2015-10-14 18:02:43,039 INFO:mlp - epoch: 7, training error: 0.10395
2015-10-14 18:02:49,045 INFO:mlp - epoch: 8, training error: 0.09465
2015-10-14 18:02:55,084 INFO:mlp - epoch: 9, training error: 0.08586
2015-10-14 18:02:55,626 INFO:fit_predict_err - test set MisclassPercentage_TOP_1 2.57412
2015-10-14 18:02:58,733 INFO:fit_predict_err - train set MisclassPercentage_TOP_1 1.11846
GPU: 58.88 sec vs. CPU: 185.11 sec (68.2% less training time, about 3.1x faster)
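The two totals can be recomputed from the logs. A small sketch, assuming the first log (with per-epoch times) is the CPU run and the Cudanet log is the GPU run; the variable names are illustrative only:

```python
from datetime import datetime

# Per-epoch training times (seconds) copied from the first log
cpu_epoch_times = [18.90, 18.57, 18.08, 18.75, 17.78,
                   19.43, 19.08, 18.05, 18.12, 18.35]
cpu_total = sum(cpu_epoch_times)

# "commencing model fitting" and "epoch: 9" timestamps from the Cudanet log
fmt = "%Y-%m-%d %H:%M:%S,%f"
gpu_start = datetime.strptime("2015-10-14 18:01:56,206", fmt)
gpu_end = datetime.strptime("2015-10-14 18:02:55,084", fmt)
gpu_total = (gpu_end - gpu_start).total_seconds()

time_saved = 1.0 - gpu_total / cpu_total  # fraction of training time saved
speedup = cpu_total / gpu_total           # how many times faster the GPU run is

print("CPU: %.2f s, GPU: %.2f s" % (cpu_total, gpu_total))
print("Time saved: %.1f%%, speedup: %.2fx" % (100 * time_saved, speedup))
```

Note that "68.2% faster" here means 68.2% less wall-clock time, which corresponds to a speedup factor of a little over 3.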
Getting Started
GPU (Maxwell-based architecture) requires installation of the CUDA SDK and drivers
…or with Docker
docker pull kaixhin/neon (CPU)
docker pull kaixhin/cuda-neon (GPU)
Overview
The model can be specified in a YAML or a Python file
Datasets
• MNIST, a dataset of handwritten digits (28x28 grayscale), 60,000 training
samples, 10,000 test samples
• CIFAR10, an image dataset (32x32 color), 50,000 training samples,
10,000 test samples, 10 categories
• ImageCaption, an image and caption dataset (flickr8k, flickr30k,
and COCO), 5 reference sentences per image
Datasets (2)
• Text: Penn Treebank, Hutter Prize, and Shakespeare
• Speech? None, and a handler is not yet implemented either
• Adding a new dataset?
…or modifying NEON_HOME/neon/data/loader.py and NEON_HOME/neon/data/__init__.py (continued…)
NEON_HOME/neon/data/loader.py
• Update dataset_meta = { … … ,
    'tempeval3': {
        'size': 0,
        'file': '',
        'url': '',
        'func': load_tempeval3
    }
}
• Define the load_tempeval3() function, e.g. loading CSV files into NumPy arrays
def load_tempeval3(path):
    # Assumes numpy is imported as np at the top of loader.py
    # Feature and label files for the training set
    train_path = _valid_path_append(path, "te3-ee-token-embedding-no-label.csv")
    train_label_path = _valid_path_append(path, "te3-ee-train-label.csv")
    # NOTE: the test features use the same file name as the training features
    test_path = _valid_path_append(path, "te3-ee-token-embedding-no-label.csv")
    test_label_path = _valid_path_append(path, "te3-ee-eval-label.csv")
    # np.loadtxt accepts a file path directly and closes the file itself
    X_train = np.loadtxt(train_path, delimiter=",")
    y_train = np.loadtxt(train_label_path, delimiter=",")
    X_test = np.loadtxt(test_path, delimiter=",")
    y_test = np.loadtxt(test_label_path, delimiter=",")
    nclass = 14  # number of target classes
    return (X_train, y_train), (X_test, y_test), nclass
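To see what a loader of this kind hands back, here is a minimal, self-contained sketch of np.loadtxt on comma-separated data. The in-memory "files" and their contents are hypothetical stand-ins, not the TempEval-3 data:

```python
import io
import numpy as np

# Hypothetical stand-ins for a feature CSV and a label CSV
features_csv = io.StringIO("0.1,0.2,0.3\n0.4,0.5,0.6\n")
labels_csv = io.StringIO("3\n7\n")

X = np.loadtxt(features_csv, delimiter=",")  # one row per sample
y = np.loadtxt(labels_csv, delimiter=",")    # single column becomes a 1-D array

print(X.shape, y.shape)
```

Each row of the feature file becomes one sample, so the feature and label files must have the same number of rows.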
Datasets (3)
The dataset_meta key ('tempeval3') is the name of the dataset to call in the YAML file
NEON_HOME/neon/data/__init__.py
• Update: from neon.data.loader import ( … , load_tempeval3)
Datasets (4)
[Figure: diagram labeled Input, Output, Calculate error, Learning]
Network Functions (in Neon)
• Initializers: Constant, Uniform, Gaussian, GlorotUniform
• Activations: Identity, RectifiedLinear, Softmax, Tanh, Logistic
• Optimizers: GradientDescentMomentum, RMSProp, Adadelta, Adam
• Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum Squared Error
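For orientation, several of the functions named above can be written in a few lines of plain NumPy. This is a sketch of the textbook definitions, not Neon's implementation; the function names and default values are illustrative, and Neon's own scaling conventions may differ:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Initializer: uniform in [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def rectlin(x):
    # Activation: rectified linear, max(x, 0)
    return np.maximum(x, 0.0)

def logistic(x):
    # Activation: logistic sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Activation: softmax over the last axis (row max subtracted for stability)
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_binary(y, t, eps=1e-12):
    # Cost: binary cross entropy between outputs y and targets t
    y = np.clip(y, eps, 1.0 - eps)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))

def cross_entropy_multi(y, t, eps=1e-12):
    # Cost: multiclass cross entropy with one-hot targets t
    return -np.sum(t * np.log(np.clip(y, eps, 1.0)))

def sum_squared(y, t):
    # Cost: half the sum of squared errors
    return 0.5 * np.sum((y - t) ** 2)

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    # Optimizer: one gradient descent step with momentum
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

probs = softmax(np.array([[1.0, 2.0, 3.0]]))
print(probs, cross_entropy_multi(probs, np.array([[0.0, 0.0, 1.0]])))
```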
Convolutional Neural Network
• The well-known network architecture for image recognition, e.g. GoogLeNet (a 22-layer deep network), the winner of the ImageNet Large Scale Visual Recognition Challenge 2014
• Differs from a network of fully connected layers: it reduces the number of parameters to be learned, while retaining high expressiveness, with:
– Local connectivity
– Weight sharing
– Pooling
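The parameter savings from local connectivity and weight sharing can be made concrete with a quick count. The sketch below compares a fully connected layer on a 28x28 input with a convolutional layer; the choice of 16 filters of size 5x5 is a hypothetical example, and the 2x2 max pooling helper is a plain NumPy illustration, not Neon's Pooling layer:

```python
import numpy as np

def fc_params(n_in, n_out):
    # Fully connected: one weight per input-output pair, plus one bias per output
    return n_in * n_out + n_out

def conv_params(kernel, channels_in, n_filters):
    # Convolution: each filter is shared across all image positions
    return kernel * kernel * channels_in * n_filters + n_filters

def max_pool_2x2(img):
    # Non-overlapping 2x2 max pooling on a 2-D array with even height and width
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

n_fc = fc_params(28 * 28, 100)  # 784 inputs, 100 hidden nodes
n_conv = conv_params(5, 1, 16)  # 16 filters of 5x5 on a grayscale image
print("fully connected:", n_fc, "convolutional:", n_conv)

pooled = max_pool_2x2(np.arange(16.0).reshape(4, 4))
print(pooled)
```

Pooling adds no parameters at all; it only shrinks the feature map that later layers have to process.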
Convolutional Neural Network (2)
More info:
• http://ufldl.stanford.edu/tutorial/
• http://deeplearning.net/tutorial/lenet.html
• Related Neon layers:
– Pooling
– Convolutional and Deconv (composite deconvolution layer)
– Conv (convolutional layer with a learned bias and activation, implemented as a list composing separate Convolution, Bias and Activation layers)
Recurrent Neural Network
• What makes recurrent networks so special? Sequences!
1. Image classification
2. Image captioning
3. Sentiment analysis
4. Machine translation
5. Video classification
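What lets a recurrent network handle sequences is a hidden state that is carried from one time step to the next. A minimal NumPy sketch of a vanilla recurrent layer (not Neon's Recurrent layer; the sizes and weight names are made up for illustration):

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    # Run a vanilla RNN over a sequence; returns the hidden state at every step
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        # The new state depends on the current input and the previous state
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 3, 4, 5
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
xs = rng.normal(size=(n_steps, n_in))  # a sequence of 5 input vectors

hs = rnn_forward(xs, W_xh, W_hh, b_h)
last_state = hs[-1]  # a many-to-one task (e.g. sentiment analysis) reads only this
print(hs.shape, last_state.shape)
```

A many-to-many task such as machine translation would instead read an output at several (or all) time steps.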
Recurrent Neural Network (2)
• Related Neon layers:
– Recurrent
– GRU (Gated Recurrent Unit) and LSTM (Long Short-Term Memory)
• More info:
– http://karpathy.github.io/2015/05/21/rnn-effectiveness/
– About LSTM: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
– GRU vs LSTM: http://cs224d.stanford.edu/lecture_notes/LectureNotes4.pdf
Dropout Training
• A simple way to prevent neural networks from overfitting
• Random “dropout” gives big improvements on many benchmark tasks and sets new records for object recognition and molecular activity prediction
• Related Neon layers:
– Dropout
More info:
• http://videolectures.net/nips2012_hinton_networks/
• https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
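The mechanism itself is short enough to sketch in NumPy. The version below is the common "inverted dropout" variant, which rescales the kept units during training so that nothing needs to change at test time; this is an assumption for illustration and may differ from how Neon's Dropout layer scales its outputs:

```python
import numpy as np

def dropout(x, keep=0.5, train=True, rng=None):
    # Randomly zero units during training; pass activations through at test time
    if not train:
        return x
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) < keep  # each unit is kept with probability `keep`
    return x * mask / keep             # rescale so the expected activation is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))
dropped = dropout(activations, keep=0.5, train=True, rng=rng)
print("fraction of units zeroed:", np.mean(dropped == 0.0))
print("mean activation after dropout:", dropped.mean())
```

Because a different random subset of units is removed for every batch, no single unit can rely on specific other units being present, which is what discourages overfitting.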
Unsupervised Learning
• Autoencoder
In Neon:
# Load dataset
(X_train, y_train), (X_test, y_test), nclass = load_mnist(path=args.data_dir)
# Set input and target to X_train (no labels are passed, so the inputs serve as targets)
train = DataIterator(X_train, lshape=(1, 28, 28))
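The defining feature of an autoencoder is that the training target is the input itself. A minimal NumPy sketch of one forward pass and its reconstruction error; the random data, the 32-unit code size, and the weight names are hypothetical and stand in for a real model:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((8, 784))  # stand-in for 8 flattened 28x28 images
n_hidden = 32             # size of the compressed code (hypothetical)
W_enc = rng.normal(scale=0.01, size=(784, n_hidden))
W_dec = rng.normal(scale=0.01, size=(n_hidden, 784))

code = logistic(X @ W_enc)       # encoder: compress each image to 32 values
X_hat = logistic(code @ W_dec)   # decoder: reconstruct the 784 pixels

target = X                       # unsupervised: the target is the input itself
loss = 0.5 * np.sum((X_hat - target) ** 2) / len(X)
print(code.shape, X_hat.shape, loss)
```

Training would adjust W_enc and W_dec to reduce this reconstruction loss; no labels are involved at any point.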
Restricted Boltzmann Machine
• A stochastic neural network; stochastic meaning the activations have a probabilistic element
• In Neon?
• More info: http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/
[Figure: example RBM with hidden units labeled "SciFi/Fantasy" and "Oscar winner"]
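The "probabilistic element" can be illustrated directly: a hidden unit switches on with a probability given by the logistic function of its input, rather than deterministically. A NumPy sketch with hypothetical sizes and weights (six visible units, two hidden units), not a Neon API:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(v, W, b_h, rng):
    # Return activation probabilities and one binary sample of the hidden units
    p = logistic(v @ W + b_h)                    # probability that each unit turns on
    h = (rng.random(p.shape) < p).astype(float)  # stochastic 0/1 activation
    return p, h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 2))        # hypothetical weights: 6 visible, 2 hidden
b_h = np.zeros(2)
v = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # one binary visible vector

p, h = sample_hidden(v, W, b_h, rng)
print("probabilities:", p, "sample:", h)
```

Calling sample_hidden repeatedly on the same visible vector can yield different hidden states, which is the stochastic behavior the slide refers to.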
Back to model.py
• Model
    # set up model layers
    layers = []
    layers.append( … )
    layers.append( … )
    # initialize model object
    mlp = Model(layers=layers)
• Train, Evaluate, Output
    # train
    mlp.fit(train_set, optimizer=optimizer, num_epochs=num_epochs, cost=cost, callbacks=callbacks)
    # evaluate
    print('Misclassification error = %.1f%%' % (mlp.eval(valid_set, metric=Misclassification())*100))
    # output
    output = mlp.get_outputs(valid_set)
    np.savetxt("output.csv", output, delimiter=",")
• Run ./examples/mnist_mlp.py or neon examples/mnist_mlp.yaml
Hyperparameter optimization
• Finding good hyperparameters for deep networks is quite tedious to do manually
• Spearmint has been forked and slightly extended to work with Neon*)
*) This seems to work only for Neon versions < 1.0, and only with the CPU backend
• Run: hyperopt init -y examples/mnist_mlp.yaml