Machine Learning and Deep Learning with R

Machine Learning and Deep Learning with R

Poo Kuan Hoong

February 15, 2017

Deep Learning so far. . .

Introduction

I In the past 10 years, machine learning and Artificial Intelligence(AI) have shown tremendous progress

I The recent success can be attributed to:I Explosion of dataI Cheap computing cost - CPUs and GPUsI Improvement of machine learning models

I Much of the current excitement concerns a subfield of it called“deep learning”.

Human Brain

Neural Networks

I Deep Learning is primarily about neural networks, where anetwork is an interconnected web of nodes and edges.

I Neural nets were designed to perform complex tasks, such asthe task of placing objects into categories based on a fewattributes.

I Neural nets are highly structured networks, and have threekinds of layers - an input, an output, and so called hiddenlayers, which refer to any layers between the input and theoutput layers.

I Each node (also called a neuron) in the hidden and outputlayers has a classifier.

Neural Network Layers

Deep Learning

I Deep learning refers to artificial neural networks that arecomposed of many layers.

I It’s a growing trend in Machine Learning due to some favorableresults in applications where the target function is very complexand the datasets are large.

Deep Learning: Benefits

I RobustI No need to design the features ahead of time - features are

automatically learned to be optimal for the task at handI Robustness to natural variations in the data is automatically

learned

I GeneralizableI The same neural net approach can be used for many different

applications and data typesI Scalable

I Performance improves with more data, method is massivelyparallelizable

Deep Learning: Weaknesses

I Deep Learning requires a large dataset, hence long trainingperiod.

I In term of cost, Machine Learning methods like SVMs andother tree ensembles are very easily deployed even by relativemachine learning novices and can usually get you reasonablygood results.

I Deep learning methods tend to learn everything. It’s betterto encode prior knowledge about structure of images (or audioor text).

I The learned features are often difficult to understand. Manyvision features are also not really human-understandable (e.g,concatenations/combinations of different features).

I Requires a good understanding of how to model multiplemodalities with traditional tools.

Deep Learning: Applications

Deep Learning: R Libraries

I MXNet: The R interface to the MXNet deep learning library.I darch: An R package for deep architectures and restricted

Boltzmann machines.I deepnet: An R package implementing feed-forward neural

networks, restricted Boltzmann machines, deep belief networks,and stacked autoencoders.

I h2o: The R interface to the H2O deep-learning framework.

http://mxnet.io/api/r/index.html

https://github.com/maddin79/darch

https://mran.microsoft.com/package/deepnet/

https://github.com/h2oai/h2o-3

MXNet

I Founded by: Uni. Washington & Carnegie Mellon Uni (~1.5years old)

I Supports most OS: Runs on Amazon Linux, Ubuntu/Debian,OS X, and Windows OS

I State of the art model support: Flexible and efficient GPUcomputing and state-of-art deep learning i.e. CNN, LSTM to R.

I Ultra Scalable: Seamless tensor/matrix computation withmultiple GPUs in R.

I Ease of USe: Construct and customize the state-of-art deeplearning models in R, and apply them to tasks, such as imageclassification and data science challenges

I Multi-language: Supports the Python, R, Julia and Scalalanguages

I Ecosystem: Vibrant community from Academia and Industry

MXNet Architecture

I You can specify the Context of the function to be executedwithin. This usually includes whether the function should berun on a CPU or a GPU, and if you specify a GPU, which GPUto use.

MXNet R Package

I The R Package can be downloaded using the followingcommands:

install.packages("drat", repos="https://cran.rstudio.com")drat:::addRepo("dmlc")install.packages("mxnet")

Amazon & MXNet for Deep Learning

MNIST Handwritten DatasetI The MNIST database consists of handwritten digits.I The training set has 60,000 examples, and the test set has

10,000 examples.I The MNIST database is a subset of a larger set available from

NIST. The digits have been size-normalized and centered in afixed-size image

I For this demo, the Kaggle pre-processed training and testingdataset were used. The training dataset, (train.csv), has 42000rows and 785 columns.

Demo

I Sourcecode available here https://github.com/kuanhoong/deeplearning-malaysia

https://github.com/kuanhoong/deeplearning-malaysia

https://github.com/kuanhoong/deeplearning-malaysia

Result

Lastly. . .

Machine Learning and Deep Learning with R

Data & Analytics

Transcript of Machine Learning and Deep Learning with R