Machine Learning Based Analysis of XENON1T Data › reu › 2018 › Talk_Mark_Almanza.pdfMark...


Machine Learning Based Analysis of XENON1T Data

Mark Almanza


Dark Matter and XENON


Dark Matter


• Currently estimated to make up ∼80% of all matter in the universe

• Natural candidate in the search for physics beyond the Standard Model


Evidence of Dark Matter

• The Bullet Cluster

• Collision of two galaxy clusters

• Baryonic and dark matter have been separated out


WIMPs

• Stands for Weakly Interacting Massive Particles

• Mass between ∼100 GeV/c² and 1 TeV/c²

• As a dark matter candidate, only interacts through gravity and the weak force

• Predicted by R-parity conserving supersymmetry and other theories meant to extend the Standard Model


The XENON Experiment

• Search for WIMPs using a two-phase Time Projection Chamber (TPC)

• Energetic collisions of external particles with liquid xenon (LXe) produce prompt scintillation light (S1) and ionization electrons

• Internal electric field drifts the electrons to the surface of the LXe and extracts them from the liquid into the gas

• These electrons collide with gaseous xenon (GXe) atoms and produce a second scintillation signal (S2)

• Photomultiplier Tubes (PMTs) detect photons from S1 and S2


Type: s1

• Area fraction top: 0.13757609

• Height: 11.386597 (pe)

• Rise time: 51.38015734927384 (ns)

• 50% width: 72.8162409217615 (ns)

• 90% width: 160.66486101173257 (ns)

Type: s2

• Area fraction top: 0.6607365

• Height: 4.8036823 (pe)

• Rise time: 199.00130751597845 (ns)

• 50% width: 180.18479896403795 (ns)

• 90% width: 521.9549598591848 (ns)


My Work

• Use machine learning to improve analysis of XENON data

• Two projects

• S1-small S2 classification

• S2 position classification


S1-Small S2 Classification


Peak Classification

• PAX (Processor for Analyzing XENON) currently uses simple cuts on

• rise time

• width

• area

• area fraction top

• S1s and small S2s are not linearly separable in these features, leading to misclassifications


Machine Learning

• Currently a booming field with advances in computational power and theory

• Applies statistical methods so that computers can learn patterns from data

• Work here done using Scikit-Learn, a common Python library


Decision Trees

• Simple machine learning algorithm

• Iteratively splits data to maximize information gain at each node

• Selects for features with high importance

• Not a strong learner
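The splitting criterion above can be sketched in a few lines. This is a minimal illustration, assuming Shannon entropy as the impurity measure and a single numeric feature; the toy rise times and labels are invented for the example and are not XENON data.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels, threshold):
    """Entropy reduction from splitting on `feature <= threshold`."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children

def best_threshold(feature, labels):
    """Scan candidate thresholds and return (threshold, gain) of the best split."""
    # The largest value is skipped: splitting there would leave the right side empty.
    candidates = np.unique(feature)[:-1]
    gains = [information_gain(feature, labels, t) for t in candidates]
    i = int(np.argmax(gains))
    return float(candidates[i]), float(gains[i])

# Hypothetical toy data: rise times (ns) with labels 0 = S1-like, 1 = S2-like.
rise_time = np.array([40.0, 55.0, 60.0, 150.0, 190.0, 210.0])
label = np.array([0, 0, 0, 1, 1, 1])
threshold, gain = best_threshold(rise_time, label)  # threshold 60.0, gain 1.0 bit
```

A full tree repeats this search over every feature at each node and recurses into the two resulting subsets.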


Random Forests

• Extension of the decision tree

• Creates n decision trees with random subsets of training data

• Chooses random subsets of features for each tree to use

• Takes the majority vote of the trees as the prediction
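A minimal sketch of these steps with Scikit-Learn's `RandomForestClassifier`, assuming synthetic stand-in data from `make_classification` rather than detector data. Two details differ from the description above and are worth flagging: Scikit-Learn draws the random feature subset at each split rather than once per tree, and it averages the trees' class probabilities instead of counting hard votes. The explicit vote at the end is there only to compare the two.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 5 features, 2 classes.
X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

forest = RandomForestClassifier(
    n_estimators=50,      # n trees
    bootstrap=True,       # each tree is trained on a random resample of the data
    max_features="sqrt",  # random subset of features considered at each split
    random_state=0,
)
forest.fit(X_train, y_train)
test_accuracy = forest.score(X_test, y_test)

# Explicit majority vote over the individual trees, for comparison.
votes = np.stack([tree.predict(X_test) for tree in forest.estimators_])
majority = (votes.mean(axis=0) > 0.5).astype(int)
agreement = float((majority == forest.predict(X_test)).mean())
```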


Adaptive Boosting (AdaBoost)

• Creates n copies of a base classifier

• Trains each classifier one at a time, giving higher weight to misclassified data at each step

• Final classifier is a weighted sum of all the base classifiers
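The reweighting loop can be sketched from scratch. This is an illustrative version of discrete AdaBoost, assuming labels in {-1, +1} and one-feature threshold rules ("stumps") as the base classifier; the toy data and all function names are hypothetical, and the talk itself used Scikit-Learn rather than code like this.

```python
import numpy as np

def stump_predict(X, j, thr, pol):
    """Threshold rule on feature j; `pol` flips which side is labeled +1."""
    return np.where(pol * (X[:, j] - thr) > 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                err = w[stump_predict(X, j, thr, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost_fit(X, y, n_rounds=40):
    w = np.full(len(y), 1.0 / len(y))  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this classifier
        pred = stump_predict(X, j, thr, pol)
        w = w * np.exp(-alpha * y * pred)      # boost misclassified samples
        w = w / w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def adaboost_predict(X, ensemble):
    """Sign of the weighted sum of all base classifiers."""
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy problem with a diagonal boundary that no single stump can follow.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

ensemble = adaboost_fit(X, y)
acc_single = float((adaboost_predict(X, ensemble[:1]) == y).mean())
acc_boosted = float((adaboost_predict(X, ensemble) == y).mean())
```

Comparing `acc_single` with `acc_boosted` shows how the weighted sum of weak rules fits a boundary that one rule cannot.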


Data Simulation

• Supervised learning requires labeled data for training

• Small S2s, from 1 to 5 electrons, were simulated

• S1s were then simulated so that their areas overlap those of the small S2s, making the two classes easy to confuse by area


Feature Selection

• Due to the curse of dimensionality, it is prudent to select a smaller number of features to train a model

• With that in mind, five features from the PAX data were selected


Training a Classifier

• Used an AdaBoost classifier with random forests as base classifiers

• 9-fold cross validation was used to estimate generalization performance
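A sketch of how such a setup might look in Scikit-Learn. The data here is a synthetic stand-in with five features, and the forest size, tree depth, and number of boosting rounds are illustrative guesses, not the values used in the talk.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the simulated peaks (an assumption): 5 features, 2 classes.
X, y = make_classification(n_samples=450, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# Small random forests as the base learner; sizes are illustrative guesses.
base = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)

# The base learner is passed positionally because its keyword name
# differs between Scikit-Learn versions.
model = AdaBoostClassifier(base, n_estimators=5, random_state=0)

# 9-fold cross validation: one held-out accuracy value per fold.
scores = cross_val_score(model, X, y, cv=9)
mean_accuracy = scores.mean()
```

The mean of the nine fold scores estimates performance on unseen data, and the spread across folds gives a rough sense of its uncertainty.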


Width/Rise time space of pure PAX data


Results

• Model achieved a test accuracy of 99.90%

• Great improvement over PAX classification in recognizing S1s


Conclusion

• Strong learners can outperform human-made cuts, even using similar features

• PAX peak classification could be improved by using these techniques


S2 Position Classification


• S2 depth is currently determined only by pairing with an S1, using the electron drift time between the two signals

• This puts a lower limit on the event energies observable in the detector, as not all small S1s are detectable
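As a worked illustration of why the S1 is needed: in a TPC the depth follows from the time the electrons take to drift from the interaction point to the liquid surface, which is the delay between S1 and S2. The function below is a hypothetical helper, and the drift velocity in the usage line is a placeholder for illustration, not a XENON1T calibration value.

```python
def depth_from_drift_time(t_s1_us, t_s2_us, drift_velocity_mm_per_us):
    """Interaction depth in mm relative to the liquid surface (negative = below)."""
    drift_time = t_s2_us - t_s1_us
    if drift_time < 0:
        raise ValueError("the S2 must arrive after its paired S1")
    return -drift_velocity_mm_per_us * drift_time

# Placeholder numbers: S1 at 0 us, S2 at 300 us, assumed drift velocity 1.4 mm/us.
z_mm = depth_from_drift_time(0.0, 300.0, drift_velocity_mm_per_us=1.4)
```

Without a detected S1 there is no start time, so this calculation cannot be done, which is what motivates estimating depth from the S2 alone.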


Convolutional Neural Networks

• CNNs use small filters to extract patterns from data in a position-independent way

• Let a neural net learn increasingly abstract features from data
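The position independence can be seen with a single hand-written filter. This is a toy sketch: the triangular pulse and the filter weights are invented for illustration, whereas a real CNN learns its filter weights during training.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide `kernel` along `signal` (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    n = len(signal) - k + 1
    return np.array([np.dot(signal[i:i + k], kernel) for i in range(n)])

def pulse(length, start):
    """Toy waveform: a small triangular pulse beginning at sample `start`."""
    w = np.zeros(length)
    w[start:start + 3] = [1.0, 2.0, 1.0]
    return w

kernel = np.array([1.0, 2.0, 1.0])  # hand-picked filter matching the pulse shape

early = conv1d_valid(pulse(32, start=5), kernel)
late = conv1d_valid(pulse(32, start=20), kernel)

# The same filter responds equally strongly to both pulses;
# only the location of the response moves with the pulse.
shift = int(np.argmax(late) - np.argmax(early))  # 15 samples
```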


Recurrent Neural Networks and LSTMs

• RNNs can learn temporal patterns in sequences

• LSTMs are a class of RNN designed to mitigate the vanishing gradient problem

• They have a persistent internal state that can retain information across many time steps
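The persistent state can be illustrated with one LSTM step written in NumPy. This is a sketch of the standard LSTM equations with random small weights. The gate biases are set by hand here (an assumption made purely for demonstration) so that the cell keeps its stored value; in a trained network these are learned.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    stacked in the order input gate, forget gate, candidate, output gate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old state to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate values
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # persistent cell state
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c

D, H = 1, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
b[0:H] = -10.0       # input gate almost closed
b[H:2 * H] = 10.0    # forget gate almost fully open

h = np.zeros(H)
c = np.ones(H)       # information already stored in the cell state
for x_t in rng.normal(size=(50, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
# After 50 noisy inputs the cell state is still close to its initial value.
```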


Constructing a Neural Network

• Used Keras, a Python API for building neural networks, running on top of TensorFlow

• Layers other than the convolutional and LSTM layers were also used to improve the model's generalization


Training the Neural Net

• Used waveform data from simulated S2s with tagged z positions

• Normalized waveforms to vary between 0 and 1

• Converted z position into two classes: inside the active region and outside it
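The two preprocessing steps might look like the sketch below. It assumes per-waveform min-max scaling, which is one reading of "vary between 0 and 1", and the active-region boundaries `z_min` and `z_max` are hypothetical placeholders rather than detector values.

```python
import numpy as np

def normalize_waveform(w):
    """Min-max scale one waveform to the range [0, 1]."""
    w = np.asarray(w, dtype=float)
    span = w.max() - w.min()
    if span == 0:
        return np.zeros_like(w)  # flat waveform: nothing to rescale
    return (w - w.min()) / span

def z_to_class(z, z_min, z_max):
    """1 if z lies inside [z_min, z_max] (the assumed active region), else 0."""
    z = np.asarray(z, dtype=float)
    return ((z >= z_min) & (z <= z_max)).astype(int)

waveform = np.array([0.0, 2.0, 10.0, 4.0, 1.0])
scaled = normalize_waveform(waveform)  # [0.0, 0.2, 1.0, 0.4, 0.1]

# Placeholder boundaries, chosen only for the example.
labels = z_to_class([-5.0, -50.0, -120.0], z_min=-100.0, z_max=0.0)  # [1, 1, 0]
```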


Results

• Performance of the network could use improvement

• Nonetheless rejects far more of the outside S2s


Conclusion

• Method shows promise

• Further work could improve classification or produce a regression model to obtain z values precisely


Acknowledgements

Thank you to the NSF and the Nevis REU program, particularly Dr. Georgia Karagiorgi, Dr. John Parsons, and Amy Garwood.

Thank you to Dr. Elena Aprile for the opportunity to work with the XENON collaboration, and thank you to Fei Gao, Joey Howlett, and Tianyu Zhu for sharing their expertise.


References

• [1] Jelle Aalbers and Christopher Tunnell. Processor for Analyzing XENON (PAX). https://github.com/XENON1T/pax. 2018.

• [2] François Chollet. Deep Learning with Python. Shelter Island, NY: Manning Publications Co, 2018. isbn: 978-1-61729-443-3.

• [3] François Chollet et al. Keras. https://keras.io. 2015.

• [4] XENON Collaboration et al. “The XENON1T Dark Matter Experiment”. In: The European Physical Journal C 77.12 (Dec. 2017). issn: 1434-6044, 1434-6052. doi: 10.1140/epjc/s10052-017-5326-3. arXiv: 1708.07051. url: http://arxiv.org/abs/1708.07051 (visited on 07/27/2018).


• [5] Guy Leshem. “Improvement of adaboost algorithm by using random forests as weak learner and using this algorithm as statistics machine learning for traffic flow prediction”. Research proposal for a Ph.D. thesis, The Hebrew University of Jerusalem (2005).

• [6] Fabian Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12 (Oct. 2011), pp. 2825–2830. issn: 1533-7928. url: http://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html (visited on 07/30/2018).


• [7] NASA/CXC/M. Weiss. Composite image showing the galaxy cluster 1E 0657-56, better known as the Bullet Cluster. The background image showing the visible spectrum of light stems from Magellan and Hubble Space Telescope images. The pink overlay shows the X-ray emission (recorded by the Chandra telescope) of the colliding clusters; the blue one represents the mass distribution of the clusters calculated from gravitational lensing effects. Aug. 21, 2006. url: https://commons.wikimedia.org/wiki/File:1e0657_scale.jpg#metadata (visited on 07/27/2018).

• [8] “Fig. 1. Predictive Models: Decision Stream vs. Decision Tree.” ResearchGate. Accessed August 1, 2018. https://www.researchgate.net/figure/Predictive-models-Decision-Stream-vs-decision-tree_fig1_316471270.


Backup Slides


XENON1T

• Cryogenic system holds 3.2 t of LXe at -96°C

• TPC holds 2 t with a 1 t active region

• Top and bottom arrays hold a total of 248 PMTs

• Copper electrodes keep drift field constant

• PTFE side panels treated for reflectivity of 178 nm scintillation light


Decision Trees

• Simple machine learning algorithm

• Iteratively splits data to maximize information gain at each node

• Selects for features with high importance

• Prone to overfitting