Deep learning Tutorial - Part 2


Transcript of Deep learning Tutorial - Part 2

Page 1: Deep learning Tutorial - Part 2

Location: QuantUniversity Meetup, January 19th 2017, Boston MA

Deep Learning: An Introduction - Part II

2016 Copyright QuantUniversity LLC.

Presented By: Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
[email protected]

Page 2: Deep learning Tutorial - Part 2


Introduction

Slides and Code will be available at: http://www.analyticscertificate.com/DeepLearning

Page 3: Deep learning Tutorial - Part 2

- Analytics Advisory services
- Custom training programs
- Architecture assessments, advice and audits

Page 4: Deep learning Tutorial - Part 2


• Founder of QuantUniversity LLC. and www.analyticscertificate.com

• Advisory and Consultancy for Financial Analytics

• Prior experience at MathWorks, Citigroup and Endeca, and 25+ financial services and energy customers

• Regular Columnist for the Wilmott Magazine

• Author of forthcoming book "Financial Modeling: A case study approach", published by Wiley

• Chartered Financial Analyst and Certified Analytics Professional

• Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston

Sri Krishnamurthy
Founder and CEO

Page 5: Deep learning Tutorial - Part 2


Quantitative Analytics and Big Data Analytics Onboarding

•Trained more than 500 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R

• Launched the Analytics Certificate Program in September
▫ New cohort in March 2017

•Coming soon: Deep Learning and Cognitive computing Certificate!

Page 6: Deep learning Tutorial - Part 2


• February 2017
▫ Apache Spark Lecture – Feb 3rd
▫ Deep Learning Workshop – Boston – March 27-28
▫ Anomaly Detection Workshop – Boston – April 24-25

• March 2017
▫ Deep Learning Workshop – New York (Date TBD)

Events of Interest

Page 7: Deep learning Tutorial - Part 2


•Neural Networks 101

•Multi-Layer Perceptron

•Convolutional Neural Networks

Recap

Page 8: Deep learning Tutorial - Part 2


• AutoEncoders
• Recurrent Neural Networks
▫ LSTM

Agenda for today

Page 9: Deep learning Tutorial - Part 2


• Unsupervised Algorithms
▫ Given a dataset with variables, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering, etc.
▫ Create a transformed representation of the original data => PCA

Machine Learning

[Diagram: Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1]

Page 10: Deep learning Tutorial - Part 2


• Supervised Algorithms
▫ Given a set of variables x1, x2, x3, ..., predict the value of another variable y in a given data set such that y = F(x)
▫ If y is numeric => Prediction
▫ If y is categorical => Classification

Machine Learning

[Diagram: x1, x2, x3, ... → Model F(X) → y]

Page 11: Deep learning Tutorial - Part 2


•Motivation1:

Autoencoders

1. http://ai.stanford.edu/~quocle/tutorial2.pdf

Page 13: Deep learning Tutorial - Part 2


• Goal is to have the output x̂ approximate the input x
• Interesting applications such as:
▫ Data compression
▫ Visualization
▫ Pre-training neural networks

Autoencoder

Page 14: Deep learning Tutorial - Part 2


Demo in Keras1

1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
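
For reference, a minimal dense autoencoder in the spirit of the Keras blog demo cited above; the 784-dimensional input (a flattened 28x28 MNIST digit) and the 32-dimensional bottleneck are the blog's illustrative choices, not requirements:

```python
# Minimal fully connected autoencoder (sketch, following the cited Keras blog demo).
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))                       # flattened 28x28 image
encoded = Dense(32, activation='relu')(input_img)     # compressed representation
decoded = Dense(784, activation='sigmoid')(encoded)   # reconstruction of the input

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# The model is trained to reproduce its own input:
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
#                 shuffle=True, validation_data=(x_test, x_test))
```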

Page 15: Deep learning Tutorial - Part 2


• Pretraining step: Train a sequence of shallow autoencoders, greedily one layer at a time, using unsupervised data.

• Fine-tuning step 1: Train the last layer using supervised data.

• Fine-tuning step 2: Use backpropagation to fine-tune the entire network using supervised data. (A Keras sketch of this recipe follows below.)

Autoencoders1

1. http://ai.stanford.edu/~quocle/tutorial2.pdf
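
A sketch of this three-step recipe in Keras. The input width (784), the layer sizes (256, 64), the 10-class softmax head, and the array names x_unlabeled / x_labeled / y_labeled are illustrative assumptions, not values from the slides:

```python
# Greedy layer-wise pretraining with shallow autoencoders (sketch).
from keras.layers import Input, Dense
from keras.models import Model, Sequential

def train_shallow_autoencoder(data, code_dim):
    """Fit one shallow autoencoder on `data` and return its encoder half."""
    inp = Input(shape=(data.shape[1],))
    code = Dense(code_dim, activation='relu')(inp)
    recon = Dense(data.shape[1], activation='sigmoid')(code)
    ae = Model(inp, recon)
    ae.compile(optimizer='adam', loss='mse')
    ae.fit(data, data, epochs=10, batch_size=256, verbose=0)
    return Model(inp, code)           # encoder shares the trained weights

# Pretraining: one shallow autoencoder per layer, on unlabeled data.
features = x_unlabeled                # assumed array of shape (n, 784)
encoders = []
for dim in (256, 64):
    enc = train_shallow_autoencoder(features, dim)
    encoders.append(enc)
    features = enc.predict(features)  # codes become the next layer's input

# Stack the encoders and add a supervised output layer.
model = Sequential(encoders + [Dense(10, activation='softmax')])

# Fine-tuning step 1: freeze the encoders, train only the last layer.
for enc in encoders:
    enc.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(x_labeled, y_labeled, epochs=10, batch_size=256)

# Fine-tuning step 2: unfreeze and backpropagate through the whole network.
for enc in encoders:
    enc.trainable = True
model.compile(optimizer='adam', loss='categorical_crossentropy')
# model.fit(x_labeled, y_labeled, epochs=10, batch_size=256)
```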

Page 16: Deep learning Tutorial - Part 2

Supervised learning

Cross-sectional
▫ Observations are independent
▫ Given X1, ..., Xi, predict Y
▫ CNNs

Page 17: Deep learning Tutorial - Part 2

Supervised learning

Sequential
▫ Sequentially ordered
▫ Given O1, ..., OT, predict OT+1

Obs  Label
1    Normal
2    Normal
3    Abnormal
4    Normal
5    Abnormal

Page 18: Deep learning Tutorial - Part 2


• Given: X1, X2, X3, ..., XN

• Convert the univariate time series dataset to a cross-sectional dataset, as in the table and the helper sketched below

Time series modeling in Keras using MLPs

Series: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15

X     Y
X1    X2
X2    X3
X3    X4
X4    X5
X5    X6
X6    X7
X7    X8
X8    X9
X9    X10
X10   X11
X11   X12
X12   X13
X13   X14
X14   X15
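
A small helper that performs this conversion; the function and argument names are our own, in line with common Keras time-series examples:

```python
import numpy as np

def create_dataset(series, lookback=1):
    """Turn a univariate series X1..XN into (X, y) pairs for supervised learning."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])  # the previous `lookback` values
        y.append(series[i + lookback])    # the next value, to be predicted
    return np.array(X), np.array(y)

# With lookback=1 this reproduces the (X1 -> X2), (X2 -> X3), ... pairs above.
# X, y = create_dataset(data, lookback=1)
```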

Page 19: Deep learning Tutorial - Part 2


• Monthly data
• Computational Intelligence in Forecasting (CIF) competition data
• Source: http://irafm.osu.cz/cif/main.php?c=Static&page=download

Sample data

[Line chart of the monthly series: x-axis months 1 to 105, y-axis values 0 to 1800]

Page 20: Deep learning Tutorial - Part 2


•Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.

•Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).

•Supports both convolutional networks and recurrent networks, as well as combinations of the two.

•Supports arbitrary connectivity schemes (including multi-input and multi-output training).

•Runs seamlessly on CPU and GPU.

Keras

Page 21: Deep learning Tutorial - Part 2


• Use 72 observations for training and 36 for testing
• Lookback: 1, 10
• The longer the lookback, the larger the network

Multi-Layer Perceptron

[Network diagram: hidden layer of size 8, output layer of size 1; see the sketch below]
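
A sketch of this MLP in Keras, matching the size-8 hidden layer and size-1 output in the diagram; the activation, optimizer, and training settings are assumptions in the style of common Keras regression examples:

```python
from keras.models import Sequential
from keras.layers import Dense

lookback = 1  # 1 or 10 in the demo
model = Sequential()
model.add(Dense(8, input_dim=lookback, activation='relu'))  # hidden layer, size 8
model.add(Dense(1))                                         # linear output, size 1
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=0)
```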

Page 22: Deep learning Tutorial - Part 2


Demo

             Lookback = 1               Lookback = 10
Train Score  1972.20 MSE (44.41 RMSE)   2631.49 MSE (51.30 RMSE)
Test Score   3001.77 MSE (54.79 RMSE)   4166.64 MSE (64.55 RMSE)

Page 23: Deep learning Tutorial - Part 2


• Has 3 types of parameters:
▫ W – Input to Hidden weights
▫ U – Hidden to Hidden weights
▫ V – Hidden to Label weights

• All W, U, V are shared across time steps

Recurrent Neural Networks1

1. http://ai.stanford.edu/~quocle/tutorial2.pdf

Page 24: Deep learning Tutorial - Part 2


Where can Recurrent Neural Networks be used?1

1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/

1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification).
2. Sequence output (e.g. image captioning: takes an image and outputs a sentence of words).
3. Sequence input (e.g. sentiment analysis: a given sentence is classified as expressing positive or negative sentiment).
4. Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
5. Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video).

Page 25: Deep learning Tutorial - Part 2


• Andrej Karpathy's article
▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/

• Handwriting generation demo
▫ http://www.cs.toronto.edu/~graves/handwriting.html

Sample applications

Page 26: Deep learning Tutorial - Part 2


Recurrent Neural Networks

• A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.1

• Backpropagation (computing the gradient with respect to all parameters of the network), the process used to propagate errors and update weights, needs to be modified for RNNs due to the existence of loops.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 27: Deep learning Tutorial - Part 2


•BPTT begins by unfolding a recurrent neural network through time as shown in the figure.

•Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that the training patterns are visited in sequential order.

Back Propagation through time (BPTT)1

1. https://en.wikipedia.org/wiki/Backpropagation_through_time

Page 28: Deep learning Tutorial - Part 2


• Backpropagation through time (BPTT) for RNNs is difficult due to a problem known as the vanishing/exploding gradient, i.e., the gradient becomes extremely small or extremely large toward the beginning and end of the network.

• This is addressed by LSTM RNNs: instead of neurons, LSTMs use memory cells.1

Addressing the problem of Vanishing/Exploding gradient

http://deeplearning.net/tutorial/lstm.html

Page 29: Deep learning Tutorial - Part 2


• Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative).
• Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers).
• For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data.
• The 2011 paper (see below) reported approximately 88% accuracy.
• See:
▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf

Demo – IMDB Dataset

Page 30: Deep learning Tutorial - Part 2


Network

• The most frequent 5,000 words are chosen and mapped to 32-length vectors
• Sequences are restricted to 500 words: longer reviews are truncated, shorter ones are padded
• LSTM layer with 100 output dimensions
• Accuracy: 84.08%
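
A sketch of this network, following the Keras imdb_lstm example cited on the previous slide (exact training settings are assumptions):

```python
# IMDB sentiment classification with an LSTM (sketch).
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

top_words, max_len = 5000, 500
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
X_train = sequence.pad_sequences(X_train, maxlen=max_len)  # pad/truncate to 500 words
X_test = sequence.pad_sequences(X_test, maxlen=max_len)

model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_len))  # 32-length word vectors
model.add(LSTM(100))                                       # 100 output dimensions
model.add(Dense(1, activation='sigmoid'))                  # positive/negative
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X_train, y_train, validation_data=(X_test, y_test),
#           epochs=3, batch_size=64)
```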

Page 31: Deep learning Tutorial - Part 2


• Use 72 observations for training and 36 for testing
• Lookback: 1

Using RNNs for the CIF forecasting problem

[Line chart of the monthly CIF series: x-axis months 1 to 106, y-axis values 0 to 1800]
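
A sketch of an LSTM regressor for this setup, in the style of the earlier MLP demo; the recurrent layer size and training settings are assumptions, as the slides do not specify them:

```python
# One-step-ahead forecasting of the CIF series with an LSTM (sketch).
from keras.models import Sequential
from keras.layers import LSTM, Dense

lookback = 1
# LSTM layers expect 3-D input of shape (samples, timesteps, features), so the
# cross-sectional X from create_dataset must be reshaped first, e.g.:
# trainX = trainX.reshape((trainX.shape[0], lookback, 1))

model = Sequential()
model.add(LSTM(4, input_shape=(lookback, 1)))  # small recurrent layer (assumed size)
model.add(Dense(1))                            # one-step-ahead forecast
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=0)
```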

Page 32: Deep learning Tutorial - Part 2


Result

             Lookback = 1   Lookback = 10
Train Score  50.54 RMSE     41.65 RMSE
Test Score   65.34 RMSE     90.68 RMSE

Page 33: Deep learning Tutorial - Part 2


• Approach using Microsoft's Cognitive Toolkit
▫ https://gallery.cortanaintelligence.com/Tutorial/Forecasting-Short-Time-Series-with-LSTM-Neural-Networks-2
▫ https://www.microsoft.com/en-us/research/product/cognitive-toolkit/model-gallery/

Page 34: Deep learning Tutorial - Part 2


Q&A

Page 35: Deep learning Tutorial - Part 2


Thank you!
Members & Sponsors!

Sri Krishnamurthy, CFA, CAP
Founder and CEO

QuantUniversity LLC.

srikrishnamurthy

www.QuantUniversity.com

Contact

Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.