Machine learning in ALICE
Activities in ALICE and heavy-ion physics
Rüdiger Haake (CERN), 05.04.2018
Outline
This lecture consists of two parts:
1) Overview of machine learning activities in ALICE
● Jets
● Particle identification
● Charmed baryons
● DQM
● Fast simulations
● ...
2) Hands-on session: classification and regression example on SWAN
IML workshop
Don’t miss the 2nd IML Machine Learning Workshop, 9-12 April 2018, CERN (Vidyo available)
● Tutorials on methods & concepts
● Industry talks by Google, IBM, Yandex, ...
● Modern ML applications
● ...
https://indico.cern.ch/event/668017/
Motivation
See SpotMini opening a door: https://www.youtube.com/watch?v=aFuA50H9uek
Motivation
Amazing progress in the last years! Partly reached ‘superhuman’ abilities
Interest rising in physics (and industry)
Motivation
Important: machine learning is not the solution to all our problems
● Usually, one cannot just throw in data and expect the algorithm to perform better than a human, even in a well-defined task
● ML is no replacement for domain knowledge
● “Garbage in → garbage out”: algorithms, classifiers, and training data still need to be selected carefully
● Also interesting: deep learning is not necessarily better than classic ML methods
(Cartoon: xkcd.com)
Heavy-ion collisions on one slide
● ALICE is the dedicated heavy-ion experiment at the LHC
● Main objective in heavy-ion physics: the Quark-Gluon Plasma (QGP)
● hot & dense medium of deconfined quarks & gluons
● strongly interacting
● collective expansion
● created in high-energy heavy-ion collisions
With it, we hope to better understand the strong interaction & cosmological questions
Particle jets
Jets: Physics
● Conceptually, a jet is the final state of collimated hadrons fragmenting from a hard-scattered parton
● Jets can be used to shed light on the very early stage of a hadron collision
● There is no unambiguous jet definition: the reconstructed jet observable is defined by the jet-finding algorithm used to clusterize tracks and calorimeter clusters into jets
Jets: Physics
● pp collisions: test pQCD, hadronization models, etc.; use as reference for heavy-ion collisions
● p-Pb & Pb-Pb collisions: use jets as calibrated probes for modification in the medium; vacuum behavior known from pp, pQCD
● In ALICE, we reconstruct
● charged jets (charged tracks)
● full jets (charged tracks + calorimeter clusters)
● Jets are embedded in background from the heavy-ion collision
● mean background corrected for event-by-event
● in-event fluctuations treated statistically
Jets: Reconstruction
● In ALICE, usually FastJet is used for jet finding
● The jet reconstruction algorithm clusters tracks / calorimeter hits into jets
● The anti-kT algorithm yields rather conical jets at higher transverse momenta
● Further cuts (area, pT) applied
● The jet finder is itself an example of machine learning
● This is unsupervised learning: particles are classified as belonging to certain classes (i.e. jets), though no real “learning” happens here
Jets: Machine learning applications
● Depending on the problem, jets can serve as input to
● shallow learning algorithms (e.g. BDTs)
● deep learning (neural networks)
High-level parameters:
● Global event properties: multiplicity, mean background, centrality, vertex z, ...
● Per-jet properties: jet mass, N-subjettiness, radial moment, other shapes, ...
Low-level parameters:
● Low-level per-jet properties: constituent momenta, η, φ, ...
● Other low-level properties: reconstructed secondary vertices, ...
● Features need to have discrimination power for the problem
● In our case: need a good MC description of the features
Jets: Machine learning applications
Typical applications for jets
Jet tagging/classification
● q/g-jet tagging
● b/c-jet tagging
● W-jets vs. QCD jets
● Multiclass jet classification
Regression of jet parameters
● Background in heavy-ion jets
Jets: Jet images
● Motivation: huge progress with convolutional neural networks in image recognition/classification
● Classify jets according to the pattern they leave in the detector
... in calorimeter cells
... as charged-particle tracks
● In arXiv:1407.5675, jet images are used for W-jet tagging
(Figures: average jet image before and after preprocessing; discrimination between the two populations)
● Several approaches on jet images: CNNs, locally-connected networks, ...
● Works, but keep in mind: “Jets are no cats”
● Might also work for QCD jet classification
Jets: Recurrent/recursive approaches
● Jet images exploit the analogy to image classification
● An analogy to speech recognition is also possible; exploit:
sentence = sequence of words, jet = sequence of constituents
● Good analogies are useful: we can build on progress in computer science
● There is a lot of research on text classification/understanding
● As for text classification, recurrent networks are promising
● Interesting in this context: recursive networks whose topology changes event-by-event depending on the jet finder's combination history (arXiv:1702.00748)
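The “jet = sequence of constituents” analogy needs fixed-shape inputs for most recurrent-network frameworks. A minimal preprocessing sketch in NumPy (not ALICE analysis code; the (pT, η, φ) layout and `max_len` are illustrative assumptions):

```python
import numpy as np

def jet_to_sequence(constituents, max_len=30):
    """Turn a jet's constituents into a fixed-length sequence for a
    recurrent network: sort by pT (hardest first), truncate, zero-pad.
    `constituents` is an (N, 3) array of (pT, eta, phi) per particle."""
    constituents = np.asarray(constituents, dtype=float)
    order = np.argsort(-constituents[:, 0])   # descending in pT
    seq = constituents[order][:max_len]
    pad = np.zeros((max_len - len(seq), 3))   # zero-padding at the end
    return np.vstack([seq, pad])

# Toy jet with 4 constituents, each (pT, eta, phi)
jet = [(1.2, 0.1, 2.0), (5.0, 0.0, 2.1), (0.4, -0.2, 1.9), (2.2, 0.05, 2.05)]
seq = jet_to_sequence(jet, max_len=6)   # shape (6, 3), hardest row first
```

Padded sequences like this can then be fed to an LSTM, optionally with masking so the network ignores the zero rows.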
Jets: Heavy-flavor jet tagging
● Jets from b-/c-quarks are interesting probes in heavy-ion collisions
● In-medium modification of b-jets differs from that of udsg-jets
● larger energy loss for gluons than quarks (color charge)
● “dead cone effect”: for massive quarks, gluon bremsstrahlung is suppressed at smaller angles w.r.t. the parton direction
● Approach in ALICE: a deep-learning tagger
● Exploit that B-hadrons decay in the (sub-)millimeter range → displaced from the primary vertex → reconstruct secondary vertices
● “Conventional” approach: rectangular cuts on the properties of the most displaced vertices
● Ansatz here: apply a neural network to several low-level input parameters
(Illustration: http://bartosik.pp.ua/hep_sketches/btagging)
Jets: Heavy-flavor jet tagging
● The tagger uses two subnets which are merged
● secondary vertices
● track impact parameters
● 1D convolutional networks exploiting vertex or constituent relations
● Each subnet optimized via grid search, separately: “clever brute force”
● Powerful concept:
● find suitable designs for the available discriminators
● optimize separately
● merge with a neural net
Jets: Background approximation
● Example of regression for jets: background approximation
● In heavy-ion collisions, the background strongly affects the measurement
● General ansatz: subtract the mean background per event, correct for fluctuations statistically
● In-event fluctuations are probed by random cones
● The distribution of fluctuations δpT is the background-subtracted transverse momentum in these cones
● For jet spectra measurements, the fluctuations can be unfolded
Idea: use a neural network to approximate the background under each jet
Jets: Background approximation
● Use a neural network to approximate the background under each jet
● In contrast to the jet constituents, the background is uncorrelated with the jet → a neural network might be able to estimate the background
● Possible input parameters: jet constituent η, φ, pT
● Again: the major concern is the need for good training data; here, this means jets & background must be realistic
● Training data: toy model
● Monte Carlo jets from PYTHIA (real physics)
● random background including flow (toy), parameters taken from real data
● First results promising with simple neural networks
● Might eventually allow measurement of jets at lower transverse momentum
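As a toy illustration of such a regression, a small network can learn a background-like quantity from per-jet summary features. Everything here is invented for the sketch (three made-up features, a linear truth model); it is not the ALICE network or training setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)

# Toy setup: per "jet", three features (e.g. summed constituent pT in
# three annuli); the true background is a noisy linear combination.
n = 2000
X = rng.uniform(0, 10, size=(n, 3))
y = 1.5 * X[:, 0] + 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.1, n)

# A simple feed-forward regression network
net = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                   solver="adam", max_iter=500, random_state=0)
net.fit(X[:1500], y[:1500])
score = net.score(X[1500:], y[1500:])   # R^2 on held-out "jets"
```

For the real problem, the inputs would be constituent-level (η, φ, pT) and the target the true background under the jet, known from the toy-model generation.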
Particle identification
PID: Motivation
● In tracking, usually the transverse momenta are measured; particle identification needs further measurements
● Information on the particle species is often important for the physics analysis:
● production of pions, kaons, protons and their modification in heavy-ion collisions
● heavy-flavor physics (D-, B-mesons, b-jets)
● neutral pion production
● photon production
● particle composition in jets
● ...
ALICE uses a variety of different detectors to gain complementary information on particles
PID: Subdetectors in ALICE
● ITS: Inner Tracking System (silicon detectors) → energy loss dE/dx
● TPC: Time Projection Chamber (gas detector) → energy loss dE/dx
● TOF: Time-of-Flight detector → particle velocity
● TRD: Transition Radiation Detector → transition radiation (electron ID)
● HMPID: High-momentum particle identification → Cherenkov radiation
● Calorimeters: EMCal and PHOS (Pb-scintillator and PbWO4 calorimeters) → total energy
PID: Electron identification
● Electron identification using several subdetectors in ALICE; measurement of nσ values: how many standard deviations away from the mean expected value
● MVA approach: use a Boosted Decision Tree (BDT) on nσ values and track properties
● Performance evaluated in pp; soon: Pb-Pb
(Figure: TPC nσ distribution for electrons with no cut, PID cut, and MVA cut)
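A BDT on nσ-like inputs can be sketched with scikit-learn. The numbers below are a toy: two made-up detector nσ features with shifted means for electrons vs. hadrons, not real ALICE calibrations.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)

# Toy nσ values per track: electrons centered at 0 in both detectors,
# hadrons shifted. The (nσ_TPC, nσ_TOF) layout is illustrative only.
n = 2000
electrons = rng.normal([0.0, 0.0], [1.0, 1.0], size=(n, 2))
hadrons = rng.normal([-3.0, 2.0], [1.2, 1.2], size=(n, 2))
X = np.vstack([electrons, hadrons])
y = np.r_[np.ones(n), np.zeros(n)]   # 1 = electron, 0 = hadron

# Shuffle, train on 3/4, evaluate on the held-out quarter
idx = rng.permutation(len(y))
X, y = X[idx], y[idx]
bdt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                 random_state=0)
bdt.fit(X[:3000], y[:3000])
acc = bdt.score(X[3000:], y[3000:])
```

In practice, track properties would be added as further input columns, and the working point would be chosen from the score distribution rather than from plain accuracy.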
PID: General identification task
● Ultimate goal: general particle identification which exploits all available information
● Stage 1: classifier works on cleaned, calibrated distributions, e.g. on nσ values
● Stage 2: classifier works on raw PID detector distributions
● Both cases would be very helpful to raise efficiency and purity
● Crucial point: Monte Carlo productions
● training data needs to be as precise as possible
● Monte Carlos often show poor PID agreement, at least for some particle species
Charmed baryons
Charmed baryons
● Like b- or c-jets, charmed baryons are interesting probes of the QGP in heavy-ion collisions
● Λc baryons: reconstruction is challenging
● short lifetime → decay only slightly displaced from the collision vertex
● Reconstruction in ALICE possible via hadronic decays: Λc+ → p K− π+ (other decay modes are measured as well)
Charmed baryons
General approach:
1) Identify decay products with ALICE's PID capabilities
2) Exploit topological constraints (displaced production vertex)
3) Select candidates and extract the signal via an invariant mass fit
Charmed baryons
● Instead of rectangular cuts on PID, topological cuts, etc., a BDT is trained to select candidates
● Input variables include the kinematics of the decay products, topological properties, and PID (all verified to be well described in MC)
(Figure: invariant mass distributions for default cuts vs. MVA selection)
● The BDT response shows a clear separation of signal & background
● The invariant mass distribution shows reduced background and enhanced signal!
● Systematic uncertainties: the result is shown to be robust for several BDT variations
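The final step, the invariant mass fit, can be sketched on toy data: a Gaussian signal peak on a smooth background. All numbers (peak at 2.29 GeV/c², widths, yields, fit range) are illustrative, not taken from the analysis.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Toy invariant-mass spectrum: Gaussian signal + flat-ish background.
# 2.29 GeV/c^2 is near the Lambda_c mass; values are illustrative.
signal = rng.normal(2.29, 0.01, size=500)
background = rng.uniform(2.1, 2.5, size=5000)
counts, edges = np.histogram(np.r_[signal, background],
                             bins=80, range=(2.1, 2.5))
centers = 0.5 * (edges[:-1] + edges[1:])

def model(m, N, mu, sigma, a, b):
    """Gaussian signal on a linear background."""
    return N * np.exp(-0.5 * ((m - mu) / sigma) ** 2) + a + b * m

p0 = [100, 2.29, 0.01, 60, 0]   # rough starting values for the fit
popt, _ = curve_fit(model, centers, counts, p0=p0)
fitted_mass = popt[1]            # fitted peak position
```

The signal yield would then be taken from the fitted Gaussian (or from bin counting after background subtraction), separately in each pT interval.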
Fast simulation
Fast simulation: GANs
● Simulation of the expected physics in the detector is crucial for the interpretation
● Usually: Monte Carlo simulation
1) event generation on particle level (e.g. PYTHIA)
2) reconstruction on detector level (e.g. GEANT)
● Both steps can be computationally expensive, especially for heavy-ion collisions
● Currently, a huge share of computing resources (~50%) is used for simulations
● Computational costs for LHC Run 3 and the HL-LHC will be much worse: higher statistics in data → need higher MC statistics!
Fast simulation: GANs
● General approach: use fast simulation; example: mix fully-reconstructed with only simply-reconstructed signals
● But to prepare for the HL-LHC (~2023), we need to save more (×100)
● Promising ansatz: generative models, realized as DNNs
● Variational Autoencoders (VAEs)
● Generative Adversarial Networks (GANs)
● ...
● Generation of realistic samples according to the training samples
● A lot of research going on here; we benefit from industry progress
● Advantages over classic MC:
● neural network inference is much faster than reconstruction (×10^5)
● parallel computing (GPU), not so much CPU-bound
● can use commercial infrastructure: GPU clusters, cloud computing
Fast simulation: GANs
Generative Adversarial Network: two networks are trained simultaneously, a generator and a discriminator
● In competition & cooperation, the generator learns to create more and more realistic samples
● Several studies show that deep GANs are able to reproduce a very large feature space
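The alternating generator/discriminator updates can be illustrated with the smallest possible GAN, here in plain NumPy on a 1D toy distribution. This is nothing like a detector-scale model; all parameters and learning rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# "Real" samples: a 1D Gaussian the generator should learn to imitate
real = rng.normal(3.0, 1.0, size=1000)

# Generator x = w*z + b, discriminator D(x) = sigmoid(a*x + c):
# a minimal GAN, just to show the two alternating update steps.
w, b = 1.0, 0.0      # generator parameters
a, c = 0.1, 0.0      # discriminator parameters
lr = 0.05

for step in range(3000):
    z = rng.normal(size=64)
    fake = w * z + b
    xr = rng.choice(real, size=64)
    # Discriminator step: ascend log D(real) + log(1 - D(fake))
    dr, df = sigmoid(a * xr + c), sigmoid(a * fake + c)
    a += lr * (np.mean((1 - dr) * xr) - np.mean(df * fake))
    c += lr * (np.mean(1 - dr) - np.mean(df))
    # Generator step: ascend log D(fake), i.e. fool the discriminator
    df = sigmoid(a * (w * z + b) + c)
    w += lr * np.mean((1 - df) * a * z)
    b += lr * np.mean((1 - df) * a)

generated = w * rng.normal(size=1000) + b   # samples from the generator
```

Real GANs replace the two one-parameter functions with deep networks and use stochastic-gradient optimizers, but the training loop has exactly this structure.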
Fast simulation: CaloGAN
In ALICE, proof-of-concept work is ongoing to use GANs
● to simulate fully-reconstructed tracks with the TPC
● to perform detector reconstruction of particle-level data
Outlook of what is possible: CaloGAN (a GAN for the ATLAS LAr calorimeter)
(Figures: deposited energy; energy fraction in the i-th layer)
● Works for a complex, segmented calorimeter
● Up to five orders of magnitude faster
● More work to be done until it is ready for production
DQM/QA
DQM/QA
● Data Quality Management (DQM) and Quality Assurance (QA) still require a lot of work from experts
● DQM takes place during data taking, QA shortly after
● Needs a lot of resources; even worse for LHC Run 3, when much more data will be recorded
● Usual approach: experts assign flags to recorded runs using DQM/QA histograms
● Machine learning can help with several aspects, e.g. anomaly detection & automatic data classification
DQM/QA
● A current research approach uses more than 200 physics parameters from the available QA parameters
● First tests show that automatic/assisted classification is possible
● Other approaches are tested as well: GANs for anomaly detection, LSTM autoencoders for time-series prediction (inspired by ECG anomaly detection)
(Figure: example runs classified as good vs. bad)
Some pragmatic hints & hands-on
Neural networks
Neural networks can be configured very differently
● Some settings can be chosen according to experience, e.g.:
● Adam optimizer for deep networks
● ReLU activation function
● There is a lot of knowledge available online for problems similar to those we face in physics
● Good settings are adapted to the problem
Settings to choose: architecture (CNNs, LSTMs, dense layers), activation function, loss function, optimizer, learning rate, regularization
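In Keras, these choices appear explicitly when the model is defined and compiled. A minimal sketch with the experience-based defaults mentioned above (ReLU, Adam); layer sizes, dropout rate, and learning rate are illustrative choices, not recommendations.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Architecture: a small dense network for binary classification
model = keras.Sequential([
    keras.Input(shape=(10,)),               # 10 input features
    layers.Dense(64, activation="relu"),    # activation function
    layers.Dropout(0.2),                    # regularization
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # sigmoid for binary output
])

# Optimizer, learning rate, and loss function are set at compile time
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy",   # matches the sigmoid output
              metrics=["accuracy"])
```

For regression one would instead end in a linear output neuron with a mean-squared-error loss; for multi-class classification, in a softmax layer with categorical cross-entropy.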
How to design a good model
Define your problem
● Is it a regression or classification task?
The optimizer, loss function, activation function, etc. depend on this choice
● In a classification task, do you need multiple classes or does binary classification suffice?
Binary classification might be easier for the network to learn than multi-class classification
● Will the problem rely only on high-level parameters?
If yes, a different technique (like BDTs) can also be considered
High-level parameters are e.g. jet mass, jet shapes; low-level parameters e.g. constituents
Define your dataset
● In case of classification, clearly define signal and background
● In case of regression, be sure your regression parameter is well defined and represents what you want
This is a crucial step; better to put more effort here than less
● Which input features could potentially have discrimination power for your problem? Implement them
How to design a good model
Define your model(s)
● Get inspired by similar problems, experiment with different designs
● Once you have found a suitable design → perform a grid search
Clever “brute force” trial of possible hyperparameters:
● number of layers, neurons per layer, ...
● activation function, loss function, ...
If suitable, combine several models on features
● Useful if the models work on distinct input features
● Example: a combination of PID classifier models on TPC and TOF might be useful
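The grid search itself is just a loop over all hyperparameter combinations, keeping the best validation score. A toy sketch (the grid values and the two-Gaussian dataset are invented for illustration):

```python
import itertools
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)

# Toy dataset: two Gaussian classes in 4 features
X = np.vstack([rng.normal(0, 1, (400, 4)), rng.normal(1.5, 1, (400, 4))])
y = np.r_[np.zeros(400), np.ones(400)]
idx = rng.permutation(800)
X, y = X[idx], y[idx]
X_tr, y_tr, X_val, y_val = X[:600], y[:600], X[600:], y[600:]

# "Clever brute force": try every combination of a small grid and
# keep the best validation score. Grid values are illustrative.
grid = {"layers": [1, 2], "neurons": [8, 32], "activation": ["relu", "tanh"]}
best = (None, -1.0)
for n_layers, n_neurons, act in itertools.product(
        grid["layers"], grid["neurons"], grid["activation"]):
    net = MLPClassifier(hidden_layer_sizes=(n_neurons,) * n_layers,
                        activation=act, solver="adam",
                        max_iter=300, random_state=0)
    net.fit(X_tr, y_tr)
    score = net.score(X_val, y_val)
    if score > best[1]:
        best = ((n_layers, n_neurons, act), score)
```

For larger grids, cross-validation (e.g. scikit-learn's `GridSearchCV`) gives a less noisy comparison than a single validation split.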
Control parameters
● To monitor the training progress, several control parameters exist
● Loss: directly minimized in the training
● Only for classification:
● scores: the score distribution hints at the model performance
● ROC curve: plots the true positive rate vs. the false positive rate
● AUC (Area Under the ROC Curve): a single performance number
Keras output during training looks like this:
Train on 38000 samples, validate on 38000 samples
Epoch 1/1
38000/38000 [==============================] - 107s 3ms/step - loss: 1.0340 - acc: 0.6808 - val_loss: 0.7285 - val_acc: 0.7342
● Loss on training data: a good check that something is learned; should get lower
● Loss on validation data: a check that the learning generalizes to unseen data
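The AUC has a handy interpretation: it is the probability that a randomly chosen signal candidate scores higher than a randomly chosen background candidate. A small sketch computing it from scores via ranks (the Mann-Whitney statistic); the score values below are toy numbers:

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic:
    the probability that a random positive scores above a random
    negative. Tied scores get averaged ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):          # average ranks for ties
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# A perfectly separating classifier reaches AUC = 1
auc = roc_auc([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])   # = 1.0
```

AUC = 0.5 corresponds to random guessing (the diagonal of the ROC curve); in practice one would use a library routine such as scikit-learn's `roc_auc_score`.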
Control parameters
Example: b-jet tagging
Typical control plot: loss vs. training epoch
● Gets lower during the whole training
● Seems to reach a plateau
● This means: more training does not help anymore
Control parameters
Example: b-jet tagging
Typical control plot: ROC curve
● Optimally, the curve would always be at 1 → full efficiency with the lowest misclassification rates
● The orange line indicates how random guessing would perform
Control parameters
Example: b-jet tagging
Typical control plot: AUC = integrated area under the ROC curve
● The model shows good behavior → reaches a plateau
Control parameters
Example: b-jet tagging
Typical control plot: score distribution
● A good indicator of how clearly the two classes can be distinguished
Hands-on sessions
● The tutorial will be done on SWAN in your browser
● Prerequisites:
● a web browser
● a CERN account
● a CERNBox space (if never used before, log in once at https://cernbox.cern.ch/)
● When you click on the link below, the notebooks are automatically copied to your CERNBox
https://swan004.cern.ch/hub/spawn?projurl=https://gitlab.cern.ch/rhaake/MLTutorial.git