Deep learning Tutorial - Part 2


Transcript of Deep learning Tutorial - Part 2

Page 1: Deep learning Tutorial - Part 2

Location: QuantUniversity Meetup, January 19th 2017, Boston MA

Deep Learning: An Introduction - Part II

2016 Copyright QuantUniversity LLC.

Presented By: Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
[email protected]

Page 2: Deep learning Tutorial - Part 2


Introduction

Slides and Code will be available at: http://www.analyticscertificate.com/DeepLearning

Page 3: Deep learning Tutorial - Part 2

- Analytics Advisory services
- Custom training programs
- Architecture assessments, advice and audits

Page 4: Deep learning Tutorial - Part 2


• Founder of QuantUniversity LLC. and www.analyticscertificate.com

• Advisory and Consultancy for Financial Analytics

• Prior experience at MathWorks, Citigroup and Endeca, and 25+ financial services and energy customers

• Regular Columnist for the Wilmott Magazine

• Author of forthcoming book "Financial Modeling: A case study approach", published by Wiley

• Chartered Financial Analyst and Certified Analytics Professional

• Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston

Sri Krishnamurthy
Founder and CEO

Page 5: Deep learning Tutorial - Part 2


Quantitative Analytics and Big Data Analytics Onboarding

•Trained more than 500 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R

• Launched the Analytics Certificate Program in September
▫ New cohort in March 2017

•Coming soon: Deep Learning and Cognitive computing Certificate!

Page 6: Deep learning Tutorial - Part 2


• February 2017
▫ Apache Spark Lecture – Feb 3rd
▫ Deep Learning Workshop – Boston – March 27-28
▫ Anomaly Detection Workshop – Boston – April 24-25

• March 2017
▫ Deep Learning Workshop – New York (Date TBD)

Events of Interest

Page 7: Deep learning Tutorial - Part 2


•Neural Networks 101

•Multi-Layer Perceptron

•Convolutional Neural Networks

Recap

Page 8: Deep learning Tutorial - Part 2


• AutoEncoders
• Recurrent Neural Networks
▫ LSTM

Agenda for today

Page 9: Deep learning Tutorial - Part 2


• Unsupervised Algorithms
▫ Given a dataset with variables, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering, etc.
▫ Create a transformed representation of the original data => PCA

Machine Learning

[Diagram: Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1]

Page 10: Deep learning Tutorial - Part 2


• Supervised Algorithms
▫ Given a set of variables x1, x2, x3, ..., predict the value of another variable y in a given data set such that y = F(x)
▫ If y is numeric => Prediction
▫ If y is categorical => Classification

Machine Learning

[Diagram: x1, x2, x3, ... → Model F(X) → y]

Page 11: Deep learning Tutorial - Part 2


•Motivation1:

Autoencoders

1. http://ai.stanford.edu/~quocle/tutorial2.pdf

Page 13: Deep learning Tutorial - Part 2


• Goal is to have the output x̂ approximate the input x
• Interesting applications such as:
▫ Data compression
▫ Visualization
▫ Pre-training neural networks

Autoencoder

Page 14: Deep learning Tutorial - Part 2


Demo in Keras1

1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
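
For reference, a minimal dense autoencoder in the spirit of the Keras blog demo cited above; the 784-dimensional input (a flattened 28x28 MNIST digit) and the 32-dimensional bottleneck are the blog's illustrative choices, not requirements:

```python
# Minimal fully connected autoencoder (sketch, following the cited Keras blog demo).
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))                       # flattened 28x28 image
encoded = Dense(32, activation='relu')(input_img)     # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)   # reconstruction of the input

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# The model is trained to reproduce its own input:
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
#                 shuffle=True, validation_data=(x_test, x_test))
```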

Page 15: Deep learning Tutorial - Part 2


• Pretraining step: Train a sequence of shallow autoencoders, greedily one layer at a time, using unsupervised data.

• Fine-tuning step 1: Train the last layer using supervised data.

• Fine-tuning step 2: Use backpropagation to fine-tune the entire network using supervised data. (A Keras sketch of this recipe follows below.)

Autoencoders1

1. http://ai.stanford.edu/~quocle/tutorial2.pdf
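
A sketch of this three-step recipe in Keras. The input width (784), the layer sizes (256, 64), the 10-class softmax head, and the array names x_unlabeled / x_labeled / y_labeled are illustrative assumptions, not values from the slides:

```python
# Greedy layer-wise pretraining with shallow autoencoders (sketch).
from keras.layers import Input, Dense
from keras.models import Model, Sequential

def train_shallow_autoencoder(data, code_dim):
    """Fit one shallow autoencoder on `data` and return its encoder half."""
    inp = Input(shape=(data.shape[1],))
    code = Dense(code_dim, activation='relu')(inp)
    recon = Dense(data.shape[1], activation='sigmoid')(code)
    ae = Model(inp, recon)
    ae.compile(optimizer='adam', loss='mse')
    ae.fit(data, data, epochs=10, batch_size=256, verbose=0)
    return Model(inp, code)           # encoder shares the trained weights

# Pretraining: one shallow autoencoder per layer, on unlabeled data.
features = x_unlabeled                # assumed array of shape (n, 784)
encoders = []
for dim in (256, 64):
    enc = train_shallow_autoencoder(features, dim)
    encoders.append(enc)
    features = enc.predict(features)  # codes become the next layer's input

# Stack the encoders and add a supervised output layer.
model = Sequential(encoders + [Dense(10, activation='softmax')])

# Fine-tuning step 1: freeze the encoders, train only the last layer.
for enc in encoders:
    enc.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(x_labeled, y_labeled, epochs=10, batch_size=256)

# Fine-tuning step 2: unfreeze and backpropagate through the whole network.
for enc in encoders:
    enc.trainable = True
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(x_labeled, y_labeled, epochs=10, batch_size=256)
```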

Page 16: Deep learning Tutorial - Part 2

Supervised learning

Cross-sectional
▫ Observations are independent
▫ Given X1, ..., Xi, predict Y
▫ CNNs

Page 17: Deep learning Tutorial - Part 2

Supervised learning

Sequential
▫ Sequentially ordered
▫ Given O1, ..., OT, predict OT+1

Obs  Label
1    Normal
2    Normal
3    Abnormal
4    Normal
5    Abnormal

Page 18: Deep learning Tutorial - Part 2


• Given: X1, X2, X3, ..., XN

• Convert the univariate time series dataset to a cross-sectional dataset, as in the table and the helper sketched below

Time series modeling in Keras using MLPs

Series: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15

X     Y
X1    X2
X2    X3
X3    X4
X4    X5
X5    X6
X6    X7
X7    X8
X8    X9
X9    X10
X10   X11
X11   X12
X12   X13
X13   X14
X14   X15
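
A small helper that performs this conversion; the function and argument names are our own, in line with common Keras time-series examples:

```python
import numpy as np

def create_dataset(series, lookback=1):
    """Turn a univariate series X1..XN into (X, y) pairs for supervised learning."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])  # the previous `lookback` values
        y.append(series[i + lookback])    # the next value, to be predicted
    return np.array(X), np.array(y)

# With lookback=1 this reproduces the (X1 -> X2), (X2 -> X3), ... pairs above.
# X, y = create_dataset(data, lookback=1)
```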

Page 19: Deep learning Tutorial - Part 2


• Monthly data
• Computational Intelligence in Forecasting (CIF) competition data
• Source: http://irafm.osu.cz/cif/main.php?c=Static&page=download

Sample data

[Line chart of the monthly series: x-axis months 1 to 105, y-axis values 0 to 1800]

Page 20: Deep learning Tutorial - Part 2


•Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.

•Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).

•Supports both convolutional networks and recurrent networks, as well as combinations of the two.

•Supports arbitrary connectivity schemes (including multi-input and multi-output training).

•Runs seamlessly on CPU and GPU.

Keras

Page 21: Deep learning Tutorial - Part 2


• Use 72 observations for training and 36 for testing
• Lookback: 1, 10
• The longer the lookback, the larger the network

Multi-Layer Perceptron

[Network diagram: hidden layer of size 8, output layer of size 1; see the sketch below]
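
A sketch of this MLP in Keras, matching the size-8 hidden layer and size-1 output in the diagram; the activation, optimizer, and training settings are assumptions in the style of common Keras regression examples:

```python
from keras.models import Sequential
from keras.layers import Dense

lookback = 1  # 1 or 10 in the demo
model = Sequential()
model.add(Dense(8, input_dim=lookback, activation='relu'))  # hidden layer, size 8
model.add(Dense(1))                                         # linear output, size 1
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=0)
```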

Page 22: Deep learning Tutorial - Part 2


Demo

             Lookback = 1               Lookback = 10
Train Score  1972.20 MSE (44.41 RMSE)   2631.49 MSE (51.30 RMSE)
Test Score   3001.77 MSE (54.79 RMSE)   4166.64 MSE (64.55 RMSE)

Page 23: Deep learning Tutorial - Part 2


• Has 3 types of parameters:
▫ W – Input to Hidden weights
▫ U – Hidden to Hidden weights
▫ V – Hidden to Label weights

• All W, U, V are shared across time steps

Recurrent Neural Networks1

1. http://ai.stanford.edu/~quocle/tutorial2.pdf

Page 24: Deep learning Tutorial - Part 2


Where can Recurrent Neural Networks be used?1

1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/

1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification).
2. Sequence output (e.g. image captioning: takes an image and outputs a sentence of words).
3. Sequence input (e.g. sentiment analysis: a given sentence is classified as expressing positive or negative sentiment).
4. Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
5. Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video).

Page 25: Deep learning Tutorial - Part 2


• Andrej Karpathy's article
▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/

• Handwriting generation demo
▫ http://www.cs.toronto.edu/~graves/handwriting.html

Sample applications

Page 26: Deep learning Tutorial - Part 2


Recurrent Neural Networks

• A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.1

• Backpropagation (computing the gradient with respect to all parameters of the network), the process used to propagate errors and update weights, needs to be modified for RNNs due to the existence of loops.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 27: Deep learning Tutorial - Part 2


•BPTT begins by unfolding a recurrent neural network through time as shown in the figure.

•Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that the training patterns are visited in sequential order.

Back Propagation through time (BPTT)1

1. https://en.wikipedia.org/wiki/Backpropagation_through_time

Page 28: Deep learning Tutorial - Part 2


• Backpropagation through time (BPTT) for RNNs is difficult due to a problem known as the vanishing/exploding gradient, i.e., the gradient becomes extremely small or extremely large toward the beginning and end of the network.

• This is addressed by LSTM RNNs: instead of neurons, LSTMs use memory cells.1

Addressing the problem of Vanishing/Exploding gradient

http://deeplearning.net/tutorial/lstm.html

Page 29: Deep learning Tutorial - Part 2


• Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative).
• Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers).
• For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data.
• The 2011 paper (see below) reported approximately 88% accuracy.
• See:
▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf

Demo – IMDB Dataset

Page 30: Deep learning Tutorial - Part 2


Network

• The most frequent 5,000 words are chosen and mapped to 32-length vectors
• Sequences are restricted to 500 words: longer reviews are truncated, shorter ones are padded
• LSTM layer with 100 output dimensions
• Accuracy: 84.08%
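
A sketch of this network, following the Keras imdb_lstm example cited on the previous slide (exact training settings are assumptions):

```python
# IMDB sentiment classification with an LSTM (sketch).
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

top_words, max_len = 5000, 500
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
X_train = sequence.pad_sequences(X_train, maxlen=max_len)  # pad/truncate to 500 words
X_test = sequence.pad_sequences(X_test, maxlen=max_len)

model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_len))  # 32-length word vectors
model.add(LSTM(100))                                       # 100 output dimensions
model.add(Dense(1, activation='sigmoid'))                  # positive/negative
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X_train, y_train, validation_data=(X_test, y_test),
#           epochs=3, batch_size=64)
```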

Page 31: Deep learning Tutorial - Part 2


• Use 72 observations for training and 36 for testing
• Lookback: 1

Using RNNs for the CIF forecasting problem

[Line chart of the monthly CIF series: x-axis months 1 to 106, y-axis values 0 to 1800]
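
A sketch of an LSTM regressor for this setup, in the style of the earlier MLP demo; the recurrent layer size and training settings are assumptions, as the slides do not specify them:

```python
# One-step-ahead forecasting of the CIF series with an LSTM (sketch).
from keras.models import Sequential
from keras.layers import LSTM, Dense

lookback = 1
# LSTM layers expect 3-D input of shape (samples, timesteps, features), so the
# cross-sectional X from create_dataset must be reshaped first, e.g.:
# trainX = trainX.reshape((trainX.shape[0], lookback, 1))

model = Sequential()
model.add(LSTM(4, input_shape=(lookback, 1)))  # small recurrent layer (assumed size)
model.add(Dense(1))                            # one-step-ahead forecast
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=0)
```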

Page 32: Deep learning Tutorial - Part 2


Result

             Lookback = 1   Lookback = 10
Train Score  50.54 RMSE     41.65 RMSE
Test Score   65.34 RMSE     90.68 RMSE

Page 33: Deep learning Tutorial - Part 2


• Approach using Microsoft's Cognitive Toolkit
▫ https://gallery.cortanaintelligence.com/Tutorial/Forecasting-Short-Time-Series-with-LSTM-Neural-Networks-2
▫ https://www.microsoft.com/en-us/research/product/cognitive-toolkit/model-gallery/

Page 34: Deep learning Tutorial - Part 2


Q&A

Page 35: Deep learning Tutorial - Part 2


Thank you!
Members & Sponsors!

Sri Krishnamurthy, CFA, CAP
Founder and CEO

QuantUniversity LLC.

srikrishnamurthy

www.QuantUniversity.com

Contact

Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.