Python Machine Learning Step-by-Step: Modeling Financial Time Series Data · 2017-10-08 · Python...

Python Machine Learning Step-by-Step: Modeling Financial Time Series Data

Reece Heineke
Director of Big Data, Credibly
February 27, 2017

What is Machine Learning?

Data Preparation
- Overview
- Python Toolbox
- Trade Ideas to Data
- Conclusion

Exploratory Data Analysis
- Overview
- Scatter Plot
- Principal Component Analysis (PCA)
- Conclusion

Fitting Models
- Overview
- Models and Pipelines
- Learning Curves
- Interpretability
- Conclusion

A Fitted Model

What is Machine Learning?

1. Machine learning is a subfield of computer science that provides computers with the ability to learn without being explicitly programmed.

2. There are two sides to every machine learning problem:

2.1 The learning

2.2 The model produced by the learning

Data Preparation: Overview

- Review the Python software stack

- Motivate the problem

- Discuss some issues specific to time series modeling
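
The transcript does not list the specific time series issues, so as an illustration we pick two common ones: look-ahead bias and ordering. A minimal sketch on synthetic prices (not the talk's data): lagged features are built with `shift`, the target is the next day's return, and the train/test split is chronological rather than shuffled. The 70/30 split ratio is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic business-day closing prices; a stand-in for real market data.
dates = pd.date_range("2016-01-04", periods=10, freq="B")
frame = pd.DataFrame({"close": np.linspace(100.0, 109.0, 10)}, index=dates)

# Simple daily return relative to the previous close.
frame["ret"] = frame["close"] / frame["close"].shift(1) - 1
# Target: the NEXT day's return, which is unknown at today's close.
frame["target"] = frame["ret"].shift(-1)
# Feature: yesterday's return, shifted so row t only sees information up to t-1.
frame["ret_lag1"] = frame["ret"].shift(1)
# Early rows lack lagged history and the last row lacks a target.
frame = frame.dropna()

# Chronological split: train on the past, test on the future, never shuffle.
split = int(len(frame) * 0.7)
train, test = frame.iloc[:split], frame.iloc[split:]
```

Shuffling before splitting would let the model train on days that come after the days it is tested on, which inflates scores in a way that cannot be reproduced in live trading.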

Python Toolbox

Source: Scientific Python by Eueung Mulyana

Trump2Cash

Source: Trump2Cash GitHub project

Input: Trump criticizes Toyota on Twitter

Output: Toyota stock opens lower

Source: Toyota stock on Yahoo Finance’s interactive chart
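
Turning this trade idea into data means measuring how a stock opens after an event. A minimal sketch, assuming daily bars with `open` and `close` columns: the opening gap is the open divided by the previous close, minus one. The prices, dates, and the helper `gap_after` are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Hypothetical daily bars indexed by session date (illustrative numbers, not real quotes).
bars = pd.DataFrame(
    {"open": [100.0, 101.0, 99.0, 98.5], "close": [101.0, 100.0, 99.5, 99.0]},
    index=pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"]),
)

# Opening gap: today's open relative to the previous session's close.
bars["gap"] = bars["open"] / bars["close"].shift(1) - 1


def gap_after(event_time: pd.Timestamp, bars: pd.DataFrame) -> float:
    """Opening gap of the first session dated after event_time (hypothetical helper).

    Assumes the event happens after that day's open; a pre-market event
    would need intraday timestamps to be attributed to the right session.
    """
    later = bars[bars.index > event_time]
    return float(later["gap"].iloc[0])


print(gap_after(pd.Timestamp("2017-01-03 15:00"), bars))
```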

WSJ Analysis of Trump Tweets

Source: WSJ analysis by Akane Otani and Shane Shifflett

IPython: A Data Scientist’s Best Friend

Jupyter Notebook

Data Preparation: Conclusion

We now have an illustrative data set to work with:

- The data set has 10 numeric dimensions: 9 inputs and 1 output

- The data set is large (~400 MB compressed)
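
Once loaded, a data set like this is split into an input matrix and an output vector. The sketch below uses a synthetic 10-column frame in place of the real file, assumes columns 0-8 are the inputs and column 9 is the output, and casts to `float32` to halve memory use.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the talk's data set: 10 numeric columns named "0".."9".
data = pd.DataFrame(rng.normal(size=(1000, 10)), columns=[str(i) for i in range(10)])

# float32 halves the footprint of float64 columns, which matters at hundreds of MB.
data32 = data.astype(np.float32)

# Assumed layout: columns 0-8 are the inputs, column 9 is the output.
X = data32.iloc[:, :9].to_numpy()
y = data32.iloc[:, 9].to_numpy()
```

For a real file, `pd.read_csv(path, dtype=np.float32)` applies the cast while reading, and pandas infers gzip compression from a `.gz` extension. Whether single precision is accurate enough depends on the data.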

Exploratory Data Analysis: Overview

- Covariance and correlation matrices

- Scatter plots

- Principal Component Analysis (PCA)

- Kernel PCA
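
Covariance and correlation matrices can be computed directly with NumPy. In this sketch on synthetic data, `np.cov` and `np.corrcoef` treat columns as variables (`rowvar=False`), and the last lines check that correlation is covariance divided by the product of the standard deviations.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=500)
# Synthetic columns: x, a noisy linear function of x, and independent noise.
data = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=500), rng.normal(size=500)])

cov = np.cov(data, rowvar=False)        # 3x3 covariance matrix; columns are variables
corr = np.corrcoef(data, rowvar=False)  # 3x3 correlation matrix; entries in [-1, 1]

# Correlation is covariance rescaled by each variable's standard deviation.
std = np.sqrt(np.diag(cov))
rescaled = cov / np.outer(std, std)
```

Both matrices measure only linear association, so a near-zero correlation does not rule out a nonlinear relationship.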

Scatter Plot: What can we say about the data?

scikit-learn Algorithm Cheat-Sheet: Just looking

Source: scikit-learn Cheat-Sheet

Principal Component Analysis (PCA)
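
A minimal PCA sketch with scikit-learn on synthetic data: five observed columns are generated from two latent factors, standardized (PCA is sensitive to scale), and projected. The explained variance ratios show how much variance each component captures; the data and the choice to keep two components are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: 5 observed columns driven by 2 latent factors plus small noise.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 5))

# Standardize first: PCA directions depend on each column's scale.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=5).fit(X_std)
ratios = pca.explained_variance_ratio_  # share of variance per component, descending
Z = pca.transform(X_std)[:, :2]         # keep the first two components
```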

Kernel PCA with Radial Basis Function (RBF)
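
Kernel PCA performs PCA in the feature space induced by a kernel; with the RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2) it can unfold structure that linear PCA can only rotate. The sketch uses scikit-learn's `make_circles` toy data rather than the financial data, and `gamma=10` is an illustrative value, not one taken from the talk.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Toy data: two concentric rings; `ring` labels which ring each point lies on.
X, ring = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA can only rotate the plane, so the rings stay nested.
linear = PCA(n_components=2).fit_transform(X)

# RBF kernel PCA; gamma=10 is an illustrative width, not a tuned value.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
Z = kpca.fit_transform(X)
```

Plotting `linear` and `Z` side by side, colored by `ring`, is the usual way to see the difference.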

Exploratory Data Analysis: Conclusion

- Nonlinear relationships in the dimension pairs (0, 9), (2, 9) and (6, 9)

- All other dimensions appear essentially random
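
One way to put numbers on such nonlinear relationships is to compare Pearson correlation with mutual information; this is a technique we substitute here, since the slides rely on plots. In the synthetic sketch below the output is a parabola in column 0, so its correlation with column 0 is near zero while `mutual_info_regression` still flags the dependence.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 3))
# Synthetic output: depends on column 0 nonlinearly and not at all on columns 1-2.
y = X[:, 0] ** 2 + 0.05 * rng.normal(size=n)

# Linear association per column: close to zero for all three.
pearson = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(3)])
# k-nearest-neighbor mutual information estimate (in nats) per column.
mi = mutual_info_regression(X, y, random_state=0)
```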

Fitting Models: Overview

- scikit-learn models and pipelines

- Illustrative learning curves

scikit-learn Revisited

Source: scikit-learn Cheat-Sheet

scikit-learn Pipeline

Source: Python Machine Learning by Sebastian Raschka
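
A scikit-learn `Pipeline` chains transformers and a final estimator so that `fit` and `predict` run every step in order. The sketch below (synthetic data; five components is an arbitrary choice) mirrors the scaling, PCA and linear-regression combination that the later slides evaluate.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))  # 9 synthetic inputs
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=200)

pipe = Pipeline([
    ("scale", StandardScaler()),      # zero mean, unit variance per column
    ("pca", PCA(n_components=5)),     # project onto 5 principal components
    ("reg", LinearRegression()),      # final estimator
])
pipe.fit(X, y)        # each step is fit on the output of the previous one
pred = pipe.predict(X)
```

Steps can be replaced by name, e.g. `pipe.set_params(reg=SVR())`, which keeps the preprocessing identical when comparing models.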

Holdout Method

Source: Python Machine Learning by Sebastian Raschka
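
For time series, the holdout set should be the most recent observations. A minimal sketch with `train_test_split`: `shuffle=False` keeps rows in order, and an integer `test_size` holds out exactly that many final rows. The toy arrays are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # placeholder features; rows are in time order
y = np.arange(10)                 # placeholder targets

# shuffle=False keeps the held-out rows at the end, so the test set is strictly later.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=3, shuffle=False)
```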

Cross-Validation

Source: Python Machine Learning by Sebastian Raschka
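
Cross-validation can be run with `cross_val_score`. The sketch compares plain `KFold` with `TimeSeriesSplit`, a splitter we add here because it never validates on data older than its training window; the talk does not say which splitter it used. The data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=120)

# k-fold CV: each fold is held out once; regressors score with R^2 by default.
kfold_scores = cross_val_score(LinearRegression(), X, y, cv=KFold(n_splits=5))

# TimeSeriesSplit: each validation fold comes strictly after its training window.
tscv = TimeSeriesSplit(n_splits=5)
ts_scores = cross_val_score(LinearRegression(), X, y, cv=tscv)
folds = [(train.max(), test.min()) for train, test in tscv.split(X)]
```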

Learning Curves: What do they tell us?

Source: Python Machine Learning by Sebastian Raschka
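
scikit-learn's `learning_curve` refits an estimator on growing subsets of the training data and reports training and cross-validated scores for each size. A large, persistent gap between the two curves suggests overfitting (high variance); two low curves that converge suggest underfitting (high bias). The tree depth and the sine-shaped synthetic data below are assumptions.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)  # synthetic nonlinear target

sizes, train_scores, val_scores = learning_curve(
    DecisionTreeRegressor(max_depth=4, random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% to 100% of each training fold
    cv=5,
)
# Average over the CV folds: one train/validation R^2 per training-set size.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
```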

Poor fit: Linear Regression even with (K)PCA

Good fits: SVR (RBF) and Decision Tree Learning Curves
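
The contrast between a linear model and nonlinear ones can be reproduced on synthetic data with a parabolic output. In this sketch `SVR(kernel="rbf")` and a depth-limited `DecisionTreeRegressor` fit the curve while `LinearRegression` does not; `C=10.0` and `max_depth=5` are illustrative, untuned choices, and a single holdout R² stands in for the full learning curves on the slides.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
# Synthetic nonlinear target: a parabola in column 0 plus noise.
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "svr_rbf": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
}
# Holdout R^2 for each model fit on the same training rows.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
```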

Classic Overfitting: Random Forest Regressor
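
Overfitting shows up as a training score far above the held-out score. A sketch on synthetic, mostly-noise data: a `RandomForestRegressor` with fully grown trees nearly memorizes the training set but scores close to zero on held-out rows. The data-generating process is an assumption chosen to make the gap obvious.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 9))
# Mostly-noise synthetic target: only column 0 carries a weak signal.
y = 0.3 * X[:, 0] + rng.normal(size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
train_r2 = forest.score(X_train, y_train)  # inflated: trees have seen these rows
test_r2 = forest.score(X_test, y_test)     # honest estimate on unseen rows
```

Constraining the trees, for example with `max_depth` or `min_samples_leaf`, typically narrows the gap.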

Decision Trees: Easy to understand
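
A fitted decision tree can be printed as nested if/else rules with `sklearn.tree.export_text`. In this synthetic sketch the output flips sign at `x0 = 0`, so the root split should land near that threshold; the feature names `x0` and `x1` are hypothetical labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
# Synthetic step-shaped target that depends only on the first column.
y = np.where(X[:, 0] > 0, 1.0, -1.0) + 0.05 * rng.normal(size=200)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
# Human-readable rules, one indented line per split or leaf.
rules = export_text(tree, feature_names=["x0", "x1"])
print(rules)
```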

Fitting Models: Conclusion

- Support Vector Regression (SVR) with a Radial Basis Function (RBF) kernel achieves higher accuracy than the decision tree

- A decision tree is easier to understand

- The choice depends on our own priors about the underlying structure

Second Half of Machine Learning: A Persistent Model

Jupyter Notebook
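
The second half, a persistent model, can be sketched by fitting once and saving the fitted estimator with `joblib`, which is installed as a scikit-learn dependency. The file name and synthetic data are hypothetical; the talk's notebook may persist its model differently.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 1))
y = X[:, 0] ** 2  # synthetic target

# First half: the learning, i.e. fitting the estimator.
model = SVR(kernel="rbf").fit(X, y)

# Second half: a persistent model that another process can load and reuse.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svr_model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

print(restored.predict(X[:3]))
```

Only load model files from trusted sources, since unpickling can execute arbitrary code, and reload with the same scikit-learn version that saved the model.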

Thanks for listening: Q&A

https://github.com/rheineke/time series modeling