Productionizing Deep Learning From the Ground Up
Transcript of Productionizing Deep Learning From the Ground Up
Productionizing Deep Learning From the Ground Up
Adam Gibson
Open Data Science Conference
Boston 2015 | @opendatasci
Open DataSciCon, May 2015
Productionizing Deep Learning From the Ground Up
Overview
● What is Deep Learning?
● Why is it hard?
● Problems to think about
● Conclusions
What is Deep Learning?
Pattern recognition on unlabeled & unstructured data.
What is Deep Learning?
● Deep Neural Networks >= 3 Layers
● For media/unstructured data
● Automatic Feature Engineering
● Benefits From Complex Architectures
● Computationally Intensive
● Accelerates With Special Hardware
Get why it’s hard yet?
Deep Networks >= 3 Layers
● Backpropagation and old-school ANNs = exactly 3 layers (input, hidden, output); "deep" means going beyond that
Deep Networks
● Neural networks themselves as hidden layers
● Different types of layers can be interchanged/stacked
● Multiple layer types, each with its own hyperparameters and loss functions
What Are Common Layer Types?
Feedforward
1. MLPs
2. Autoencoders
3. RBMs
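For concreteness, a minimal numpy sketch of the feedforward idea these layer types share: inputs pushed through stacked weight matrices and nonlinearities. All sizes here are toy values invented for illustration, not from the talk; autoencoders and RBMs use the same machinery but are trained to reconstruct their input rather than predict a label.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))            # one input example (4 features, hypothetical)
W1 = rng.normal(size=(8, 4)) * 0.1   # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(size=(3, 8)) * 0.1   # hidden -> output weights
b2 = np.zeros(3)

h = sigmoid(W1 @ x + b1)             # hidden activations: the learned "features"
y = sigmoid(W2 @ h + b2)             # output layer
print(y)
```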
Recurrent
1. Multimodal
2. LSTMs
3. Stateful
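A hedged sketch of the "stateful" part: a vanilla recurrent step that mixes the current input with the previous hidden state. LSTMs replace this single tanh update with gated updates so state survives longer sequences; the sizes below are arbitrary toy values.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One vanilla recurrent step: new state mixes input and previous state."""
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

rng = np.random.default_rng(0)
Wx = rng.normal(size=(5, 3)) * 0.1   # input -> state
Wh = rng.normal(size=(5, 5)) * 0.1   # state -> state (the "stateful" part)
b = np.zeros(5)

h = np.zeros(5)                      # initial state
for t in range(4):                   # unroll over a 4-step sequence
    x_t = rng.normal(size=(3,))      # stand-in for the input at time t
    h = rnn_step(x_t, h, Wx, Wh, b)
print(h)
```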
Convolutional
LeNet: mixes convolutional & subsampling layers
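A rough illustration of those two ingredients, with a naive convolution and 2x2 average subsampling written out in plain numpy. Real implementations use optimized kernels; the 8x8 "image" and 3x3 filter here are toy values.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' 2D convolution (really cross-correlation, as in most DL code)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def subsample2x2(fmap):
    """2x2 average pooling, the classic LeNet-style subsampling layer."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    f = fmap[:h, :w]
    return (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2]) / 4.0

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))        # toy grayscale "image"
kernel = rng.normal(size=(3, 3))     # one filter (random here; learned in practice)
fmap = conv2d_valid(img, kernel)     # 6x6 feature map
pooled = subsample2x2(fmap)          # 3x3 after subsampling
print(pooled.shape)
```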
Recursive/Tree
Uses a parser to form a tree structure
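A minimal sketch of the recursive idea, assuming a hypothetical, already-parsed binary tree: one shared weight matrix composes two child vectors into a parent vector, applied bottom-up until the whole tree is one vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, 2 * d)) * 0.1   # composes two child vectors into one parent

def compose(node):
    """Recursively fold a parse tree into a single vector, bottom-up."""
    if isinstance(node, np.ndarray):     # leaf: a word vector
        return node
    left, right = node
    children = np.concatenate([compose(left), compose(right)])
    return np.tanh(W @ children)

# Hypothetical parse of "very good movie": ((very good) movie)
very, good, movie = (rng.normal(size=(d,)) for _ in range(3))
tree = ((very, good), movie)
print(compose(tree))                     # one vector for the whole phrase
```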
Other kinds
● Memory Networks
● Deep Reinforcement Learning
● Adversarial Architectures
● New recursive ConvNet variant to come in 2016?
● Over 9,000 layers? (22 is already pretty common)
Automatic Feature Engineering
Automatic Feature Engineering (t-SNE)
Visualizations are crucial. Use t-SNE to render different kinds of data:
http://lvdmaaten.github.io/tsne/
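The linked page is van der Maaten's reference implementation; as one hedged alternative, scikit-learn ships an equivalent that projects learned features down to 2-D for plotting. Random data stands in for real network activations here.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for learned features, e.g. activations of the last hidden layer
features = rng.normal(size=(100, 50))

# Project to 2-D for plotting; nearby points had similar learned features
coords = TSNE(n_components=2, random_state=0).fit_transform(features)
print(coords.shape)   # (100, 2) -> scatter-plot these to eyeball the clusters
```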
deeplearning4j.org presentation @ Google, Nov. 17, 2014
"Two pizzas sitting on a stovetop" (a machine-generated image caption)
Benefits from Complex Architectures
Google's result combined:
● LSTMs (learning captions)
● Word embeddings
● Convolutional features from images (aligned to be the same size as the embeddings)
Computationally Intensive
● One iteration over ImageNet (a 1,000-label dataset with over 1MM examples) takes 7 hours on GPUs
● Project Adam
● Google Brain
Special Hardware Required
Unlike most production software, deep learning uses multiple GPUs today (not common in Java-based stacks!)
Software Engineering Concerns
● Pipelines to deal with messy data, not canned problems... (Real life is not Kaggle, people.)
● Scale/maintenance (clusters of GPUs aren't well supported today)
● Different kinds of parallelism (model and data)
Model vs. Data Parallelism
● Model parallelism shards the model across servers (HPC style)
● Data parallelism trains replicas on mini-batches
Vectorizing Unstructured Data
● Data is stored in different databases
● Different kinds of raw files
● Deep learning works well on mixed signals
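A hypothetical sketch of what vectorization means in practice: one record with numeric, categorical, and text fields flattened into a single numeric row. The field names, scaling constants, and toy vocabulary are all invented for illustration.

```python
import numpy as np

# Hypothetical record assembled from several sources/files
record = {"age": 42, "income": 58000.0, "country": "JP", "bio": "likes deep nets"}

countries = ["US", "JP", "DE"]                # known category values
vocab = {"deep": 0, "nets": 1, "likes": 2}    # toy text vocabulary

def vectorize(rec):
    """Flatten one mixed-type record into a single numeric vector."""
    numeric = np.array([rec["age"] / 100.0, rec["income"] / 1e5])  # crude scaling
    one_hot = np.array([1.0 if rec["country"] == c else 0.0 for c in countries])
    bag = np.zeros(len(vocab))                # bag-of-words for the text field
    for word in rec["bio"].split():
        if word in vocab:
            bag[vocab[word]] = 1.0
    return np.concatenate([numeric, one_hot, bag])

print(vectorize(record))   # one row of the training matrix
```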
Parallelism
● Model (HPC)
● Data (mini-batch parameter averaging)
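A toy sketch of mini-batch parameter averaging, using a linear model in plain numpy so it stays short. Systems in this space (e.g. DL4J on Spark) do this with neural nets and real cluster transport; the four "workers" here are just array shards.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 10))
true_w = rng.normal(size=(10,))
y = X @ true_w

def sgd_on_shard(w, Xs, ys, lr=0.01, steps=10):
    """Plain gradient descent on one worker's shard (linear model for brevity)."""
    for _ in range(steps):
        grad = 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)
        w = w - lr * grad
    return w

w = np.zeros(10)
shards = np.array_split(np.arange(1024), 4)   # 4 "workers"
for epoch in range(5):
    # Each worker trains a copy of the parameters on its own shard...
    replicas = [sgd_on_shard(w, X[idx], y[idx]) for idx in shards]
    # ...then the master averages the replicas back into one model.
    w = np.mean(replicas, axis=0)
print(np.linalg.norm(w - true_w))             # shrinks as epochs accumulate
```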
Production Stacks Today
● Hadoop/Spark is not enough
● GPUs are not friendly to the average programmer
● Cluster management of GPUs as a resource is not typically done
● Many frameworks don't work well in a distributed environment (getting better, though)
Problems With Neural Nets
● Loss functions
● Scaling data
● Mixing different neural nets
● Hyperparameter tuning
Loss Functions
● Classification
● Regression
● Reconstruction
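One minimal example of each loss family, in plain numpy. The specific formulas (log loss, squared error) are common choices rather than anything prescribed by the talk.

```python
import numpy as np

def cross_entropy(p, y):
    """Classification: penalize low predicted probability on the true class."""
    return -np.log(p[y] + 1e-12)

def mse(pred, target):
    """Regression: squared distance to a real-valued target."""
    return np.mean((pred - target) ** 2)

def reconstruction_error(x, x_hat):
    """Reconstruction (autoencoders/RBMs): how well the input is rebuilt."""
    return np.mean((x - x_hat) ** 2)

probs = np.array([0.1, 0.7, 0.2])            # softmax output over 3 classes
print(cross_entropy(probs, y=1))             # true class = 1
print(mse(np.array([2.5]), np.array([3.0])))
x = np.array([0.0, 1.0, 1.0])
print(reconstruction_error(x, np.array([0.1, 0.9, 0.8])))
```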
Scaling Data
● Zero mean and unit variance
● Zero to 1
● Other forms of preprocessing relative to the distribution of the data
● Processing can also be columnwise (categorical?)
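Both scalings from the slide, applied columnwise in numpy. Categorical columns would instead be one-hot encoded rather than scaled; the data here is random.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))   # raw columns, arbitrary scales

# Zero mean, unit variance (columnwise)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Zero to 1 (columnwise min-max)
X_01 = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
print(X_01.min(axis=0), X_01.max(axis=0))
```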
Mixing and Matching Neural Networks
● Video: ConvNet + Recurrent
● Convolutional RBMs?
● Convolutional -> Subsampling -> Fully Connected
● DBNs: different hidden and visible units for each layer
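As a hedged sketch of the first bullet (video = ConvNet + Recurrent): per-frame convolutional features feed a recurrent state that carries context across frames. The filters, sizes, and random "frames" are toy stand-ins, not an implementation from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_features(frame, kernels):
    """Crude per-frame convolutional features: global-average filter responses."""
    feats = []
    for k in kernels:
        resp = np.array([[np.sum(frame[i:i+3, j:j+3] * k)
                          for j in range(frame.shape[1] - 2)]
                         for i in range(frame.shape[0] - 2)])
        feats.append(resp.mean())            # pool each response map to a scalar
    return np.array(feats)

kernels = [rng.normal(size=(3, 3)) for _ in range(4)]
Wx = rng.normal(size=(6, 4)) * 0.1           # conv features -> recurrent state
Wh = rng.normal(size=(6, 6)) * 0.1           # state -> state

h = np.zeros(6)
for t in range(5):                           # 5 toy video frames
    frame = rng.normal(size=(8, 8))
    h = np.tanh(Wx @ conv_features(frame, kernels) + Wh @ h)
print(h)                                     # clip summary, feed to a classifier
```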
Hyperparameter Tuning
● Underfit
● Overfit
● Overdescribe (your hidden layers)
● Layerwise interactions
● What activation function? (Competing? ReLU? Good ol' sigmoid?)
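A small numeric illustration of why the activation choice matters: the sigmoid's gradient vanishes at the tails, while ReLU's stays at 1 for positive inputs (but is dead below zero). The test points are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])

# Sigmoid gradient s*(1-s) shrinks toward zero at both tails
s = sigmoid(x)
print(s * (1 - s))            # tiny at |x|=6 -> slow learning in deep stacks

# ReLU gradient is 1 wherever the unit is active
print((x > 0).astype(float))
```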
Hyperparameter Tuning (2)
● Grid search for neural nets (don't do it!)
● Bayesian (getting better; there are at least priors here)
● Gradient-based approaches (your hyperparameters are a neural net, so there are neural nets optimizing your neural nets...)
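As a hedged, cheap alternative to the grid search the slide warns against, here is a sketch of random search, which samples the space instead of enumerating it. The validation_error function is a hypothetical stand-in for actually training a network and measuring held-out error.

```python
import numpy as np

rng = np.random.default_rng(0)

def validation_error(lr, n_hidden):
    """Hypothetical stand-in for train-then-evaluate on a validation set."""
    return (np.log10(lr) + 2.5) ** 2 + (n_hidden - 96) ** 2 / 5000.0

best = (np.inf, None)
for _ in range(30):
    # Sample the learning rate log-uniformly and the layer width uniformly
    lr = 10 ** rng.uniform(-5, -1)
    n_hidden = int(rng.integers(16, 256))
    err = validation_error(lr, n_hidden)
    if err < best[0]:
        best = (err, {"lr": lr, "n_hidden": n_hidden})
print(best)
```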
Questions?
Twitter: @agibsonccc
GitHub: agibsonccc
LinkedIn: /in/agibsonccc
Email: [email protected] (combo breaker!)
Web: deeplearning4j.org