Reproducible Machine Learning in Climate Science · 2019-10-17 · Open and Reproducible Science T...
Reproducible Machine Learning in Climate Science
ESoWC 2019
We are a PhD Student and a Machine Learning
Engineer
Our aim was to build a Python toolbox for working
with ML using weather and climate data
ML is hard for me and you,
Weather is tough for others too.
Interpretability makes us all fearful,
So we wrote some code to make it cheerful!
We developed a modular and extensible pipeline to
apply machine learning to climate science
Data extensibility
Experimental extensibility
First results - using machine learning to predict
vegetation health
Our initial experiments showed some skill!
[Figure: panels labelled Persistence and EALSTM]
We were strongly influenced by this talk
Reproducibility @ ICLR 2019
Reproducibility is central to the ideals of scientific
research
Open and Reproducible Science
"you can't stand on the shoulders of giants if they keep their shoulders
private"
Grus 2019
We wanted to utilise the tools from Software
Engineering
We wanted to allow other people (and future us) to
train, use and reproduce our models
Unit testing allows us to be confident our pipeline
does what we expect
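As a minimal sketch of what such a test can look like: the `normalise` helper and its test below are hypothetical and not taken from the project's code. A runner such as pytest collects functions whose names start with `test_` and reports any failed assertion.

```python
import numpy as np


def normalise(values: np.ndarray) -> np.ndarray:
    """Hypothetical preprocessing step: scale to zero mean, unit variance."""
    std = values.std()
    if std == 0:
        raise ValueError("cannot normalise a constant array")
    return (values - values.mean()) / std


def test_normalise_has_zero_mean_and_unit_std() -> None:
    # Check properties we expect of the output rather than exact values.
    out = normalise(np.array([1.0, 2.0, 3.0, 4.0]))
    assert np.isclose(out.mean(), 0.0)
    assert np.isclose(out.std(), 1.0)
```

Testing properties (mean, spread, shape) instead of hard-coded numbers tends to keep tests stable when the data changes.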
We used type hints to better communicate what
functions did, and to leverage type checkers
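A small illustration of the idea, using a hypothetical helper (`split_by_year` is an assumed name, not a function from the project): the signature alone tells the reader what goes in and what comes out.

```python
from typing import List, Sequence, Tuple


def split_by_year(
    years: Sequence[int], test_year: int
) -> Tuple[List[int], List[int]]:
    """Return (train_years, test_years); training uses years before test_year."""
    train = [y for y in years if y < test_year]
    test = [y for y in years if y == test_year]
    return train, test
```

A static type checker such as mypy would reject a call like `split_by_year([2016, 2017], "2018")` before the code is ever run, because a string is passed where an `int` is declared.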
We use JSON configurations to keep track of what
experiments are run (WIP)
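One possible shape for such a configuration, sketched with the standard library only. The `ExperimentConfig` class and every field name in it are assumptions made for illustration; the slides do not show the actual schema.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class ExperimentConfig:
    # All field names are illustrative assumptions.
    model: str
    target: str
    input_variables: List[str]
    hidden_size: int
    seed: int


def save_config(config: ExperimentConfig, path: Path) -> None:
    """Write the configuration next to the experiment outputs."""
    path.write_text(json.dumps(asdict(config), indent=2, sort_keys=True))


def load_config(path: Path) -> ExperimentConfig:
    """Rebuild the configuration so the run can be repeated."""
    return ExperimentConfig(**json.loads(path.read_text()))
```

Saving the file alongside the model outputs means each result can be traced back to the settings that produced it.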
Making scientific workflows fully reproducible is
hard...
● Documentation
● Instructions
● Unit Testing
● Experimental vs. Library code
● Source Control
● Parameters as Arguments
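To make the last point concrete, here is a hedged sketch of exposing experiment parameters as command-line arguments with `argparse`. The flag names and defaults are hypothetical and do not come from the project.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse experiment settings; passing argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="Train a model (illustrative).")
    # Flag names and defaults below are hypothetical.
    parser.add_argument("--model", default="persistence")
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)
```

Because nothing is hard-coded inside the training script, the exact command that produced a result can be recorded and rerun.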
Lessons learned:
● Infrastructure communicates what code does.
● The initial time investment is worth it.
● There is a tension between experimenting quickly, and maintaining
well-tested robust code.
Our Summer in Numbers
● 131 commits to master branch
● 37 GitHub issues
● 87 Pull Requests
● 15,600+ Slack messages
● 18,544 lines of Python code
● 188 tests written
To the future ...
Usability · Performance · Analysis
● Work with forecast data
● Test for different problems
● ml_climate.readthedocs.io documentation!
● Improve our VCI predictions
Let’s innovate together.
Appendix
We focused on predicting an agricultural drought
index in Kenya
Lessons learned:
● The biggest benefit of all this infrastructure is easier communication of
what code is supposed to do
● A little overhead at the beginning of the project (setting up CI) reaps big
rewards later
● It is challenging to manage the tension between experimenting quickly
and keeping well-tested, robust code
Our data sources include satellite data and model
outputs
● Persistence (previous month)
● Linear Regression
● Linear (classical) Neural Network
● Recurrent Neural Network - LSTM
● Entity Aware LSTM (Hydrology Specific Architecture!)
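The simplest entry in this list, persistence, predicts each month as the previous month's observed value. A minimal NumPy sketch follows; the function names and the toy numbers are made up for illustration and are not results from the project.

```python
import numpy as np


def persistence_forecast(series: np.ndarray) -> np.ndarray:
    """Predict each month as the previous month's observed value.

    `series` has time as its first axis; the first timestep has no
    prediction, so the output has one fewer timestep than the input.
    """
    return series[:-1]


def rmse(predicted: np.ndarray, observed: np.ndarray) -> float:
    """Root mean squared error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


# Toy monthly index values for a single pixel (made-up numbers).
vci = np.array([50.0, 40.0, 30.0, 35.0])
baseline_error = rmse(persistence_forecast(vci), vci[1:])
```

Any learned model has to beat `baseline_error` on the same months before it can be said to show skill.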
Incorporate static variables and dynamic variables
Classic LSTM
Entity Aware LSTM
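The difference between the two cells can be sketched in a few lines. As the EA-LSTM is usually described, the input gate is computed once from the static features, while the dynamic inputs drive the forget gate, output gate and cell update at every timestep. The NumPy code below is a minimal illustration under that description; parameter names, shapes and the random initialisation are assumptions, and it is not the project's implementation.

```python
from typing import Dict

import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def init_params(
    n_dynamic: int, n_static: int, hidden: int, seed: int = 0
) -> Dict[str, np.ndarray]:
    """Small random weights; key names are illustrative."""
    rng = np.random.default_rng(seed)
    p = {
        "W_i": rng.normal(size=(hidden, n_static)) * 0.1,
        "b_i": np.zeros(hidden),
    }
    for gate in "fgo":
        p[f"W_{gate}"] = rng.normal(size=(hidden, n_dynamic)) * 0.1
        p[f"U_{gate}"] = rng.normal(size=(hidden, hidden)) * 0.1
        p[f"b_{gate}"] = np.zeros(hidden)
    return p


def ealstm_forward(
    x_dynamic: np.ndarray, x_static: np.ndarray, p: Dict[str, np.ndarray]
) -> np.ndarray:
    """Run one sequence; x_dynamic is (timesteps, n_dynamic), x_static is (n_static,)."""
    hidden = p["b_f"].shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    # Input gate uses static features only, so it is constant over time.
    i = sigmoid(p["W_i"] @ x_static + p["b_i"])
    for x_t in x_dynamic:
        f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h + p["b_f"])
        g = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h + p["b_g"])
        o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h + p["b_o"])
        c = f * c + i * g  # static gate controls how much new input is stored
        h = o * np.tanh(c)
    return h
```

In a classic LSTM the input gate would be recomputed inside the loop from `x_t` and `h`, like the other gates.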
Initial Experiments
Initial Results - EALSTM
● We fit a number of different machine learning models.
● EALSTM seems to have performed significantly better than the other models.
Initial Results
Preliminary Results in April
Preliminary Results in May
[Figure: panels labelled EALSTM and Persistence]
Initial Results
https://github.com/esowc/ml_drought/blob/master/notebooks/draft/15_gt_ealstm.ipynb
Results - We do best in Cropland areas
Appendix - Administrative Level performance
Model interpretability was a key component of our
pipeline
● We were particularly interested in understanding the patterns being learnt by the model.
● We used the DeepSHAP implementation of DeepLIFT to interpret how each input data point affected a model’s final prediction.
In addition to the pixel-wise climate values, we fed
the model a few additional inputs
● Model extras
○ Surrounding Pixels
○ One hot encoded month
○ Spatial Climatology (for each month)
○ Spatial Mean of input timesteps
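Two of these extra inputs are easy to sketch. The helpers below are hypothetical illustrations rather than the project's code; in particular, filling off-grid neighbours with NaN is an assumption about how edges might be handled.

```python
import numpy as np


def one_hot_month(month: int) -> np.ndarray:
    """Encode a calendar month (1-12) as a length-12 indicator vector."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    encoded = np.zeros(12)
    encoded[month - 1] = 1.0
    return encoded


def surrounding_pixels(grid: np.ndarray, row: int, col: int) -> np.ndarray:
    """Return the 8 neighbours of (row, col); off-grid neighbours are NaN."""
    padded = np.pad(grid.astype(float), 1, constant_values=np.nan)
    # After padding by one cell, the 3x3 window around (row, col)
    # starts at the same indices in the padded array.
    window = padded[row : row + 3, col : col + 3].ravel()
    return np.delete(window, 4)  # drop the centre pixel itself
```

One-hot encoding avoids implying that December (12) is numerically far from January (1), which a single integer month feature would do.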
● Climate Indices.
● Identify ‘runs’ of drought events.
● Aggregate results by region or landcover.
● Diagnose feature contributions (SHAP).
● Calculate performance metrics (RMSE, R^2).
● Plotting functions.
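Two of the analysis steps listed above, performance metrics and drought runs, can be sketched as follows. Function names are hypothetical, and the idea of defining a drought run as consecutive timesteps below a threshold is an assumption made for illustration.

```python
from typing import List, Tuple

import numpy as np


def rmse(observed: np.ndarray, predicted: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def r2_score(observed: np.ndarray, predicted: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def drought_runs(index: np.ndarray, threshold: float) -> List[Tuple[int, int]]:
    """Return (start, length) for each consecutive run with index < threshold."""
    runs: List[Tuple[int, int]] = []
    start = None
    for t, below in enumerate(index < threshold):
        if below and start is None:
            start = t  # a new run begins
        elif not below and start is not None:
            runs.append((start, t - start))
            start = None
    if start is not None:  # run continues to the end of the series
        runs.append((start, len(index) - start))
    return runs
```

For example, `drought_runs(np.array([50, 30, 20, 60, 10]), threshold=35)` yields one run of two timesteps starting at index 1 and one run of a single timestep at index 4.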