Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData...
Transcript of Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData...
![Page 1: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/1.jpg)
Introduction to the Data Challenge
Filip MorawskiNCAC PAS
Data Science School, Braga 25-27.03.2019
![Page 2: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/2.jpg)
Link to notebooks
https://cernbox.cern.ch/index.php/s/VSDpUpsavpmZR4A
![Page 3: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/3.jpg)
Link to the data
https://owncloud.ego-gw.it/index.php/s/nHXFIJrCvAoDWob
![Page 4: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/4.jpg)
Virtual Environmentcdmkdir -p .pythons/3.5cd .pythonswget https://www.python.org/ftp/python/3.5.3/Python-3.5.3.tgztar zxvf Python-3.5.3.tgzcd Python-3.5.3make clean./configure --prefix=$HOME/.pythons/3.5make -j4make installcd ..rm -rf Python-3.5.3*
![Page 5: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/5.jpg)
Virtual Environment
cd $HOMEmkdir python-virtualenvscd python-virtualenvs~/.pythons/3.5/bin/pyvenv myenv3.5
source ~/python-virtualenvs/myenv3.5/bin/activate
![Page 6: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/6.jpg)
Virtual Environment
pip install numpypip install scipypip install matplotlibpip install pandaspip install sklearnpip install h5pypip install tensorflowpip install keraspip install astropypip install gwpy
![Page 7: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/7.jpg)
LIGO/Virgo data
Main information in the LIGO/Virgo data is “strain”
Strain – relative change in distance
h=δ LL
![Page 8: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/8.jpg)
LIGO/Virgo dataData is stored in various formats but we focus only on the hdf5.
What we can find inside hdf5 tree?
1. meta – meta-data of the file like GPS times covered, which instrument etc…
2. quality – quality of each second of data (not important for you)
3. strain – main data collected by the interferometer
![Page 9: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/9.jpg)
Available data sets
You can find data at https://www.gw-openscience.orgFor the sake of the challenge we already downloaded and prepared the data for you. Stay tuned… :)
Two different types of data release:
1. Gravitational wave data surrounding discoveries
2. Data taken during a whole observation run
![Page 10: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/10.jpg)
Available data sets
![Page 11: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/11.jpg)
Data from scientific runs
In our challenge we are going to focus on the data from O1 – one of the two newest scientific runs.
What we can find inside beside astrophysical signal? Glitches!
![Page 12: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/12.jpg)
Glitches
Glitch is a transient noise event that might mimic real astrophysical signal and/or influence the quality of data.
Whistle Blip Gravitational Wave
![Page 13: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/13.jpg)
Your challenge is to classify variousglitches buried in the detector noise
![Page 14: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/14.jpg)
Data content
● 6667 labeled glitches from O1
● 22 classes
● unbalanced dataset (number of instances for each glitch varies)
● metadata + time-series
![Page 15: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/15.jpg)
Let’s have a look into the data
→ IntroductionData notebook
![Page 16: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/16.jpg)
The Challenge(s)
We offer two challenges:
1. Classification of glitches using metadata
2. Classification of glitches using strain in the form of time-series or images/spectrograms
![Page 17: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/17.jpg)
Classification of glitches using metadata
Create a classifier using only metadata:
● Input: 5 metaparameters (or more - feature engineering)
● Output: 22 classes
● Algorithms: neural networks, random trees, xgboost
![Page 18: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/18.jpg)
Classification of glitches using time-series/images
Create a classifier using time-series/images:
● Input: 1D time series or spectrograms (plus metadata ?)
● Output: 22 classes
● Algorithms: convolutional neural networks (1D or 2D), xgboost
![Page 19: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/19.jpg)
What is GWpy?
A python package for gravitational-waves analysis:http://gwpy.github.io
Depends on numpy, scipy, astropy and matplotlib.
Provides methods to access the data, process and visualize them.
![Page 20: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/20.jpg)
Signal processing with GWpy
The data you get at the beginning is just the raw signal. It might not be the best representation depending on the analysis you want to follow. You might consider processing it into more suitable form.
GWpy provides few signal processing methods like:
● bandpass, lowpass and highpass filtering,
● whitening.
![Page 21: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/21.jpg)
Data preprocessing
How can you prepare the data for the second challenge?
→DataPreprocessing Notebook
![Page 22: Introduction to the Data Challenge - LIP Indico (Indico) · 2019-03-25 · → IntroductionData notebook. The Challenge(s) We offer two challenges: 1. Classification of glitches using](https://reader033.fdocuments.us/reader033/viewer/2022050204/5f5766656769ef4b0e704539/html5/thumbnails/22.jpg)
Good luck!