CUDA & CAFFE

CUDA & CAFFE: Using CUDA and Caffe to build deep neural networks. Babii A.S. - [email protected]

Page 1: CUDA & CAFFE

CUDA & CAFFE

Using CUDA and Caffe to build deep neural networks

Babii A.S. - [email protected]

Page 2: CUDA & CAFFE

Why we need to learn deep learning methods

Page 3: CUDA & CAFFE

Deep learning for image recognition tasks

Image classification

Object detection and localization

Object class segmentation

Page 4: CUDA & CAFFE

Problems related to dataset size

What if we have a large dataset?

Page 5: CUDA & CAFFE

What about types of parallel computing?

GPU-specific

CPU-specific

Page 6: CUDA & CAFFE

1. Saman Amarasinghe, Matrix Multiply, a case study – 2008.

Optimization table for matrix multiplication[1]

Page 7: CUDA & CAFFE

Without parallelization, how can we still make it faster?

1. Use a profiler (gprof, valgrind, …).

2. Does the application use BLAS?

3. Represent the data as vectors or matrices and call BLAS.

4. SIMD: if there is no other way, use it for maximum performance on one core.
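
Step 1 above can be sketched in a few lines. The slide names gprof and valgrind, which target native code; the sketch below swaps in Python's standard cProfile module purely for illustration. The functions matmul_naive and profile_call are hypothetical helpers, not part of any library mentioned in the talk.

```python
import cProfile
import io
import pstats


def matmul_naive(a, b):
    """Triple-loop matrix multiply on lists of lists (the kind of hotspot a profiler reveals)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


if __name__ == "__main__":
    size = 60
    a = [[float(i + j) for j in range(size)] for i in range(size)]
    b = [[float(i - j) for j in range(size)] for i in range(size)]
    _, report = profile_call(matmul_naive, a, b)
    print(report)
```

If the report shows the time concentrated in a routine like matmul_naive, that routine is the candidate for replacement by a BLAS call (steps 2 and 3).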

Page 8: CUDA & CAFFE

How to make it parallel?

1. MKL, PBLAS, ATLAS

2. When is a multicore CPU more efficient than a GPU?

3. NVIDIA CUDA

4. OpenCL

Page 9: CUDA & CAFFE

CUDA

Page 10: CUDA & CAFFE

Deep convolutional neural networks, CAFFE implementation

ConvNet configuration by Krizhevsky [2]

Page 11: CUDA & CAFFE

Deep convolutional network example

Convolutional Neural Network Architecture Model [3]

Feature maps

Page 12: CUDA & CAFFE

http://www.songho.ca/dsp/convolution/convolution.html

Convolution & pooling
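
The two operations named on this slide can be illustrated with a minimal pure-Python sketch. It assumes stride 1 and no padding for the convolution, and non-overlapping windows for the pooling; conv2d_valid and max_pool are hypothetical helper names. Note that, like most CNN frameworks, the sketch does not flip the kernel (strictly speaking it computes cross-correlation), whereas the linked page describes convolution with a flipped kernel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding, kernel not flipped)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[y + dy][x + dx] for dy in range(size) for dx in range(size))
            for x in range(0, w - size + 1, size)
        ]
        for y in range(0, h - size + 1, size)
    ]


if __name__ == "__main__":
    image = [
        [1, 2, 3, 0],
        [4, 5, 6, 1],
        [7, 8, 9, 2],
        [0, 1, 2, 3],
    ]
    diagonal_difference = [[1, 0], [0, -1]]
    print(conv2d_valid(image, diagonal_difference))
    print(max_pool(image))
```

A 4x4 input convolved with a 2x2 kernel yields a 3x3 feature map; 2x2 pooling of the 4x4 input yields a 2x2 map.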

Page 13: CUDA & CAFFE

A set of primitives for deep learning networks

1. Convolutional layer

2. Filtering layer

3. Pooling layer

Integration with Caffe

24-core Intel E5-2679v2 CPU @ 2.4 GHz vs. NVIDIA K40

Page 14: CUDA & CAFFE

Feature maps

Feature map [4]

We overlay them on top of one another, but with a "transparency coefficient"

Page 15: CUDA & CAFFE

Libraries for deep learning

- Caffe - deep convolutional neural network framework, http://caffe.berkeleyvision.org

- ConvNetJS - JS based deep learning framework, http://cs.stanford.edu/people/karpathy/convnetjs/

- DL4J - Java based deep learning framework, http://deeplearning4j.org/

- Theano - CPU/GPU symbolic expression compiler in Python, http://deeplearning.net/software/theano

- Cuda-Convnet - a fast C++/CUDA implementation of convolutional (or more generally, feed-forward) neural networks, http://code.google.com/p/cuda-convnet/

- Torch - provides a Matlab-like environment for state-of-the-art machine learning algorithms in Lua, http://www.torch.ch/

- Accord.NET - C# deep learning, http://accord-framework.net/, tutorial: http://whoopsidaisies.hatenablog.com/entry/2014/08/19/015420

http://deeplearning.net/software_links/

Page 16: CUDA & CAFFE

Working with Caffe

It is best to start with the command-line tools:

build/tools

The most accessible example is based on MNIST, handwritten digit recognition

http://caffe.berkeleyvision.org/gathered/examples/mnist.html

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh

cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh

Page 17: CUDA & CAFFE

In what form are input and output data supplied?

Input data

- databases (LevelDB or LMDB)

- directly from memory

- from files on disk in HDF5

- common image formats

http://symas.com/mdb/ http://leveldb.org/

Output data

- snapshot file with the model

- snapshot file with the solver state

Solver? Yes: training that was interrupted can be resumed from a snapshot
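
Why two snapshot files are useful can be shown with a toy sketch. It is not Caffe's snapshot format: the JSON strings, the one-parameter loss, and all function names are assumptions made for illustration. The "model" holds the learned weight, while the "solver state" holds the momentum term and the iteration counter that the optimizer needs to continue exactly where it stopped.

```python
import json


def sgd_momentum_step(state, lr=0.1, momentum=0.9):
    """One SGD-with-momentum step on the toy loss f(w) = (w - 3)^2."""
    grad = 2.0 * (state["w"] - 3.0)
    velocity = momentum * state["velocity"] - lr * grad
    return {"w": state["w"] + velocity, "velocity": velocity, "iter": state["iter"] + 1}


def train(state, steps):
    """Run the given number of optimizer steps and return the new state."""
    for _ in range(steps):
        state = sgd_momentum_step(state)
    return state


def save_snapshot(state):
    """Split the training state into a model snapshot and a solver-state snapshot."""
    model = json.dumps({"w": state["w"]})
    solver_state = json.dumps({"velocity": state["velocity"], "iter": state["iter"]})
    return model, solver_state


def restore(model, solver_state):
    """Rebuild the full training state from both snapshots."""
    state = json.loads(model)
    state.update(json.loads(solver_state))
    return state


if __name__ == "__main__":
    start = {"w": 0.0, "velocity": 0.0, "iter": 0}
    uninterrupted = train(start, 10)
    model, solver_state = save_snapshot(train(start, 5))  # training stops here
    resumed = train(restore(model, solver_state), 5)      # and continues later
    print(resumed == uninterrupted)
```

Restoring only the model and resetting the momentum would still train, but it would follow a different trajectory from the uninterrupted run.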

Page 18: CUDA & CAFFE

Caffe layer types

Caffe stores and communicates data in 4-dimensional arrays called blobs.

    name: "LogReg"
    layers {
      name: "mnist"
      type: DATA
      top: "data"
      top: "label"
      data_param {
        source: "input_leveldb"
        batch_size: 64
      }
    }
    layers {
      name: "ip"
      type: INNER_PRODUCT
      bottom: "data"
      top: "ip"
      inner_product_param {
        num_output: 2
      }
    }
    layers {
      name: "loss"
      type: SOFTMAX_LOSS
      bottom: "ip"
      bottom: "label"
      top: "loss"
    }
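
What the "ip" and "loss" layers of the LogReg network compute can be sketched in plain Python. This is an illustration of the math only, not Caffe code: the function names are hypothetical, the inputs are flattened to vectors, and the loss is assumed to be averaged over the batch.

```python
import math


def inner_product(x, weights, bias):
    """Fully connected layer: one dot product plus bias per output (num_output rows)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax over one vector of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(batch_scores, labels):
    """Mean negative log-probability of the correct class over the batch."""
    losses = [-math.log(softmax(s)[y]) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two outputs, as in num_output: 2; the weights and inputs are made-up values.
    weights = [[0.5, -0.25], [-0.5, 0.25]]
    bias = [0.0, 0.0]
    batch = [[1.0, 2.0], [3.0, 1.0]]
    labels = [1, 0]
    scores = [inner_product(x, weights, bias) for x in batch]
    print(softmax_loss(scores, labels))
```

With all weights at zero, both classes get probability 0.5, so the loss starts at ln 2 and should fall as training proceeds.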

Page 19: CUDA & CAFFE

Layer types

Convolutional layer

Required fields:

- num_output (c_o): the number of filters

- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter

Pooling layer

Required field:

- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
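
The effect of kernel_size on the spatial size of a layer's output can be estimated with the usual formula. The sketch below assumes optional stride and pad values, which the slide does not list, and uses floor rounding; the function name is hypothetical, and the exact rounding in a given framework version (particularly for pooling) should be checked against its documentation.

```python
def conv_output_size(input_size, kernel_size, stride=1, pad=0):
    """Spatial output size along one dimension: floor((in + 2*pad - kernel) / stride) + 1."""
    return (input_size + 2 * pad - kernel_size) // stride + 1


if __name__ == "__main__":
    # A LeNet-like chain on a 28x28 input (layer sizes chosen for illustration).
    size = 28
    size = conv_output_size(size, kernel_size=5)            # 5x5 convolution
    size = conv_output_size(size, kernel_size=2, stride=2)  # 2x2 pooling, stride 2
    size = conv_output_size(size, kernel_size=5)            # 5x5 convolution
    size = conv_output_size(size, kernel_size=2, stride=2)  # 2x2 pooling, stride 2
    print(size)
```

Tracing the sizes this way before training helps catch configurations in which a feature map shrinks below the kernel size.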

Loss Layers, Activation / Neuron Layers, Data Layers, Common Layers

How to configure?

Ready-to-use models are in the examples folder

Page 20: CUDA & CAFFE

Solving your own task

1. Take care of the correctness, size, and coverage of the datasets.

2. Compile Caffe with GPU support.

3. Configure the network, starting from the examples.

4. Train, then check the result on the test set.

5. If the result is unsatisfactory, tune and retrain until the result is good enough.

6. To use the trained network on single images, write a config and use C++, Python, or MATLAB.
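
Step 1 of the list above (dataset correctness, size, and coverage) can be partly automated. The following is a minimal sketch under stated assumptions: samples are (item, label) pairs, the split is a simple seeded shuffle, and split_dataset and missing_classes are hypothetical helper names rather than Caffe tools.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and split into (train, test) lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def missing_classes(train, test):
    """Labels that appear in only one of the two subsets (a coverage problem)."""
    train_labels = {label for _, label in train}
    test_labels = {label for _, label in test}
    return sorted(train_labels ^ test_labels)


if __name__ == "__main__":
    samples = [("img_%03d.png" % i, i % 3) for i in range(30)]
    train, test = split_dataset(samples)
    print(len(train), len(test), missing_classes(train, test))
```

A non-empty result from missing_classes means some class is never trained on or never tested, so the test-set result in step 4 would be misleading.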

Page 21: CUDA & CAFFE

References

1. L. Deng and D. Yu, "Deep Learning: Methods and Applications", http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf

2. ConvNet configuration by Krizhevsky et al., http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf

3. Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster, http://parse.ele.tue.nl/education/cluster2

4. http://www.cs.toronto.edu/~ranzato/research/projects.html

5. http://www.amolgmahurkar.com/classifySTLusingCNN.html

Thank you for your attention!