Multivariate Methods in HEP
Pushpa Bhat, Fermilab ACAT 2007 Apr 23-27, Amsterdam 1
Pushpa Bhat Fermilab
Outline
• Introduction/History
• Physics Analysis Examples
• Popular Methods
  • Likelihood Discriminants
  • Neural Networks
  • Bayesian Learning
  • Decision Trees
• Future
• Issues and Concerns
• Summary
Some History
• In 1990, most of the HEP community was skeptical of the use of multivariate methods, particularly neural networks (NN).
  • NN as a black box:
    • Can't understand the weights
    • Nonlinear mapping; higher-order correlations
    • Though it is a mathematical function, it can't be explained in terms of physics
    • Can't calculate systematic errors reliably
• Univariate or "cut-based" analysis was the norm.
• Some were pursuing applications of neural network methods to HEP around 1990:
  • Peterson, Lönnblad, Denby, Becks, Seixas, Lindsey, etc.
• The first AIHENP (Artificial Intelligence in High Energy & Nuclear Physics) workshop was held in 1990.
  • Organizers included D. Perret-Gallix, K.H. Becks, R. Brun, J. Vermaseren. AIHENP metamorphosed into ACAT ten years later, in 2000.
• Multivariate methods such as Fisher discriminants were in limited use.
• In 1990, I began to pursue the use of multivariate methods, especially NN, in top quark searches at DØ.
Mid-1990’s
• LEP experiments had been using NN and likelihood discriminants for particle-ID applications and eventually for signal searches (Steinberger; tau-ID).
• H1 at HERA successfully implemented and used NN for triggering (Kiesling).
• A hardware NN was attempted at Fermilab at CDF.
• The Fermilab Advanced Analysis Methods Group brought CDF and DØ together to discuss these methods and their applications in physics analyses.
The Top Quark: Post-Evidence, Pre-Discovery!
Fisher analysis of the tt̄ → eμ channel
• One candidate event: (S/B) (mt = 180 GeV) = 18 w.r.t. Z, = 10 w.r.t. WW
NN analysis of the tt̄ → e+jets channel
[Figure: NN output distributions for tt̄ and W+jets, and a comparison of W+jets, tt̄ (160 GeV) and data]
P. Bhat, DPF94
Cut Optimization for Top Discovery, Feb. ’95
[Figure: signal efficiency vs. S/B. Contours show possible NN cuts (Feb. ’95), compared with the conventional cut shown at the Aspen meeting (Jan. ’95) and the discovery cut (Feb.-Mar. ’95), together with the S/B reach of the 2-variable NN analysis at similar efficiency.]
Neural network equi-probability contour cuts from a 2-variable analysis, compared with the conventional cuts used in Jan. ’95 and in the observation paper
P. Bhat, H. Prosper, E. Amidi, DØ Top Marathon, Feb. ’95
Measurement of the Top Quark Mass
DØ Lepton+jets
[Figure: the discriminants and the discriminant variables]
Fit performed in 2-D: (DLB/NN, mfit)
Run I (1996) result with NN and likelihood:
mt = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c²
Recent (CDF+DØ) mt measurement:
mt = 171.4 ± 2.1 GeV/c²
First significant physics result using multivariate methods
Higgs, the Holy Grail of HEP: Discovery Reach at the Tevatron
• The challenges are daunting! But using NN provides the same reach with a factor of 2 less luminosity w.r.t. a conventional analysis.
• Improved bb̄ mass resolution and b-tag efficiency are crucial.
Run II Higgs study, hep-ph/0010338 (Oct. 2000); P.C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022
Then, it got easier
• One of the important steps in getting the NN accepted at the Tevatron experiments was to make the Bayesian connection.
• Another important message to drive home was "the maximal use of information in the event" for the job at hand.
• Developed a random grid search technique that can be used as a baseline for comparison.
• Neural network methods have now become popular because of their ease of use, their power and their many successful applications.
Maybe too easy??
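The slide names the random grid search (RGS) baseline but does not describe it. As we understand it, and this is an assumption to check against the original RGS write-up, the idea is to take candidate cut points from the signal sample itself rather than from a regular grid. Each point defines a set of cuts (x > cx, y > cy). You evaluate the signal and background efficiencies for every point and keep the best one by some figure of merit. In the minimal Python sketch below, the Gaussian toy samples, the expected yields, and the figure of merit s/√(s+b) are all hypothetical choices.

```python
import math
import random

def random_grid_search(signal, background, n_points, n_s_expected, n_b_expected, seed=0):
    """Scan cuts (x > cx, y > cy) whose cut points are drawn from the signal sample."""
    rng = random.Random(seed)
    results = []
    for cx, cy in rng.sample(signal, n_points):
        eff_s = sum(1 for x, y in signal if x > cx and y > cy) / len(signal)
        eff_b = sum(1 for x, y in background if x > cx and y > cy) / len(background)
        s, b = eff_s * n_s_expected, eff_b * n_b_expected
        fom = s / math.sqrt(s + b) if s + b > 0 else 0.0  # hypothetical figure of merit
        results.append((fom, cx, cy, eff_s, eff_b))
    return max(results)  # tuple with the largest figure of merit

# Hypothetical toy samples: signal around (1, 1), background around (-1, -1)
rng = random.Random(2)
signal = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(2000)]
background = [(rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)) for _ in range(2000)]

fom, cx, cy, eff_s, eff_b = random_grid_search(signal, background, 200, 50.0, 1000.0)
baseline = 50.0 / math.sqrt(50.0 + 1000.0)  # figure of merit with no cuts at all
print(f"best cuts x > {cx:.2f}, y > {cy:.2f}: eff_s = {eff_s:.3f}, eff_b = {eff_b:.4f}, FOM = {fom:.2f}")
```

Because the cut points follow the signal density, the scan concentrates where signal actually lives. That is what makes RGS a cheap reference point for judging how much a multivariate method really gains.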
Optimal Event Selection
$$ r(x,y) = \frac{p(x,y|s)\,p(s)}{p(x,y|b)\,p(b)} = \frac{p(s|x,y)}{p(b|x,y)} $$
r(x,y) = constant defines an optimal decision boundary in the feature space (x, y).
[Figure: signal (S) and background (B) populations in the (x, y) feature space; conventional cuts at x0 and y0 compared with the optimal boundary r(x,y) = constant]
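A contour of constant r(x,y) should beat conventional rectangular cuts, and the minimal Python sketch below illustrates why. It rests on a toy assumption that does not come from the talk: signal and background are unit-width 2-D Gaussians centred at (1, 1) and (-1, -1). The sketch applies the cuts x > 0, y > 0 and records their background efficiency. It then chooses the threshold on r that gives the same background efficiency and compares the two signal efficiencies.

```python
import math
import random

def gauss2(x, y, mx, my, sigma=1.0):
    """Density of an uncorrelated 2-D Gaussian with common width sigma."""
    return math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def r_ratio(x, y, p_s=0.5):
    """Bayes discriminant r = p(x,y|s) p(s) / (p(x,y|b) p(b)) for the toy model."""
    return gauss2(x, y, 1.0, 1.0) * p_s / (gauss2(x, y, -1.0, -1.0) * (1.0 - p_s))

def efficiency(events, passes):
    return sum(1 for x, y in events if passes(x, y)) / len(events)

def rectangular_cut(x, y):
    return x > 0.0 and y > 0.0  # conventional cuts at x0 = y0 = 0

rng = random.Random(1)
n = 20000
sig = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(n)]
bkg = [(rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)) for _ in range(n)]

eff_b_cut = efficiency(bkg, rectangular_cut)
eff_s_cut = efficiency(sig, rectangular_cut)

# Threshold r0 chosen so that the same fraction of background passes r > r0
r_bkg = sorted((r_ratio(x, y) for x, y in bkg), reverse=True)
r0 = r_bkg[int(eff_b_cut * n)]
eff_s_r = efficiency(sig, lambda x, y: r_ratio(x, y) > r0)
print(f"eff_b = {eff_b_cut:.3f}: cuts give eff_s = {eff_s_cut:.3f}, r > r0 gives eff_s = {eff_s_r:.3f}")
```

In this toy model r = exp(2x + 2y), so the optimal boundary is the straight line x + y = constant. The rectangular cuts follow the x and y axes instead, and they lose signal in the corners of the distribution.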
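The NN-Bayesian Connection
The output y(x1, x2, ŵ) of a feed-forward neural network with inputs x1, x2 and trained weights ŵ can approximate the posterior probability P(s|x1, x2):
$$ y(x, \hat{w}) \approx p(s|x) = \frac{r}{1+r}, \qquad r = \frac{p(x|s)\,p(s)}{p(x|b)\,p(b)} $$
This follows from Bayes' theorem for classes C_k,
$$ P(C_k|x) = \frac{P(x|C_k)\,P(C_k)}{\sum_i P(x|C_i)\,P(C_i)} $$
The approximation holds when the network is trained with target 1 for signal and 0 for background by minimizing the mean squared error (or the cross-entropy). It requires a large training sample and a sufficiently flexible network. The priors p(s) and p(b) are then set by the class proportions in the training sample.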
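The minimal Python sketch below checks the NN-Bayesian connection numerically. It makes two assumptions that do not come from the talk. First, the signal follows N(+1, 1) and the background N(-1, 1) with equal numbers of events, so the exact posterior is 1/(1 + e^(-2x)). Second, the "network" is a single sigmoid node (equivalent to logistic regression), which is enough here because the exact posterior is itself a sigmoid in x. The node is trained with cross-entropy rather than squared error; both losses lead to the same limit.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bayes_posterior(x, p_s=0.5):
    """p(s|x) = r / (1 + r) for signal N(+1, 1) and background N(-1, 1)."""
    r = math.exp(2.0 * x) * p_s / (1.0 - p_s)  # p(x|s)/p(x|b) = exp(2x)
    return r / (1.0 + r)

rng = random.Random(7)
# Targets: 1 for signal, 0 for background (equal sample sizes, so p(s) = 0.5)
data = [(rng.gauss(1.0, 1.0), 1.0) for _ in range(1000)] + \
       [(rng.gauss(-1.0, 1.0), 0.0) for _ in range(1000)]

w, c, lr = 0.0, 0.0, 1.0
for epoch in range(300):  # full-batch gradient descent on the mean cross-entropy
    grad_w = grad_c = 0.0
    for x, t in data:
        y = sigmoid(w * x + c)
        grad_w += (y - t) * x
        grad_c += (y - t)
    w -= lr * grad_w / len(data)
    c -= lr * grad_c / len(data)

for x in (-1.0, 0.0, 0.5, 1.5):
    print(f"x = {x:+.1f}: network {sigmoid(w * x + c):.3f}, Bayes r/(1+r) {bayes_posterior(x):.3f}")
```

After training, the fitted slope w should come out close to the value 2 implied by r = exp(2x), and the network output should track r/(1+r) across x.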
Limitations of “Conventional NN”
• The training yields one set of weights (network parameters).
• Need to look for the “best” network, but avoid overfitting.
• Heuristic decisions on network architecture: inputs, number of hidden nodes, etc.
• No direct way to compute uncertainties.
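Ensembles of Networks
[Diagram: the input X is fed to networks NN1, NN2, NN3, …, NNM, whose outputs y1, y2, y3, …, yM are combined into]
$$ y(x) = \sum_{i=1}^{M} a_i\, y_i(x) $$
A decision made by averaging over many networks (a committee of networks) generally has lower error than that of an individual network. For the squared error, with non-negative weights a_i summing to one, the committee's error never exceeds the average error of its members.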
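For equal weights a_i = 1/M, the committee's squared error equals the average member error minus the spread of the members around the committee output (the "ambiguity"). This is why averaging never hurts and usually helps. The minimal Python sketch below uses made-up outputs of three hypothetical networks on six events to check that decomposition.

```python
def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

targets = [1, 0, 1, 1, 0, 0]          # 1 = signal, 0 = background
members = [                           # hypothetical outputs y_i(x) of M = 3 networks
    [0.9, 0.3, 0.6, 0.7, 0.2, 0.4],
    [0.7, 0.1, 0.8, 0.5, 0.4, 0.1],
    [0.8, 0.4, 0.4, 0.9, 0.1, 0.3],
]
M = len(members)
weights = [1.0 / M] * M               # a_i: equal weights summing to one

committee = [sum(a * y[j] for a, y in zip(weights, members)) for j in range(len(targets))]
member_errors = [mse(y, targets) for y in members]
average_member_error = sum(member_errors) / M
ambiguity = sum(mse(y, committee) for y in members) / M   # spread of members around y(x)
committee_error = mse(committee, targets)

print(f"members: {[round(e, 4) for e in member_errors]}, committee: {committee_error:.4f}")
```

With these numbers the committee also beats the best single member. That outcome is typical but not guaranteed; only the bound against the average member error always holds.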
Bayesian Learning
• The result of Bayesian training is a posterior density of the network weights, P(w|training data).
• Generate a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last K points:
$$ y(x_{\text{new}}) \approx \frac{1}{K} \sum_{k=1}^{K} y(x_{\text{new}}, w_k) $$
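The averaging formula above can be illustrated with a deliberately small model, shown in the minimal Python sketch below. Several choices are assumptions rather than the setup used in the talk: a one-node "network" y = sigmoid(w0·x + w1), a Gaussian prior of width 10 on both weights, hypothetical Gaussian training data, and a plain random-walk Metropolis sampler. The chain produces a sequence of weight vectors, and the prediction for a new event is the average over the last K of them.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log1p_exp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

rng = random.Random(3)
# Hypothetical training data: signal N(+1, 1) with t = 1, background N(-1, 1) with t = 0
data = [(rng.gauss(1.0, 1.0), 1) for _ in range(100)] + \
       [(rng.gauss(-1.0, 1.0), 0) for _ in range(100)]

def log_posterior(w):
    """log P(w | data) up to a constant: Gaussian prior plus Bernoulli log-likelihood."""
    log_prior = -(w[0] ** 2 + w[1] ** 2) / (2.0 * 10.0 ** 2)   # assumed prior width 10
    log_like = sum(t * (w[0] * x + w[1]) - log1p_exp(w[0] * x + w[1]) for x, t in data)
    return log_prior + log_like

# Random-walk Metropolis: a sequence of weight vectors, i.e. a sequence of networks
w = [0.0, 0.0]
lp = log_posterior(w)
chain = []
for step in range(3000):
    proposal = [wi + rng.gauss(0.0, 0.2) for wi in w]
    lp_new = log_posterior(proposal)
    if rng.random() < math.exp(min(0.0, lp_new - lp)):   # Metropolis acceptance
        w, lp = proposal, lp_new
    chain.append(list(w))

K = 1000
last_k = chain[-K:]   # discard the early (burn-in) part of the chain

def y_new(x):
    """Posterior-averaged network output (1/K) * sum_k y(x, w_k)."""
    return sum(sigmoid(wk[0] * x + wk[1]) for wk in last_k) / K

print([round(y_new(x), 3) for x in (-1.0, 0.0, 1.0)])
```

A real Bayesian neural network has hundreds of weights and typically needs a more efficient sampler than this random walk. The averaging step itself is exactly the formula above.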
Bayesian Learning – 2
• Advantages
  • Less prone to over-fitting
  • Less need to optimize the size of the network. Can use a large network! Indeed, the number of weights can be greater than the number of training events!
  • In principle, provides the best estimate of p(t|x)
• Disadvantages
  • Computationally demanding!
    • The dimensionality of the parameter space is, typically, large.
    • There could be multiple maxima in the likelihood function p(t|x,w), or, equivalently, multiple minima in the error function E(x,w).
Example: Single Top Search
• Training data
  • 2000 events (1000 tqb̄ + 1000 Wbb̄)
  • Standard set of 11 variables
• Network
  • (11, 30, 1) network (391 parameters!)
• Markov Chain Monte Carlo (MCMC)
  • 500 iterations, but use the last 100 iterations
  • 20 MCMC steps per iteration
  • NN parameters stored after each iteration
  • 10,000 steps
  • ~1000 steps/hour (on a 1 GHz Pentium III laptop)
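The numbers on this slide can be cross-checked with a few lines of Python, as in the minimal sketch below. It assumes the (11, 30, 1) network is fully connected with one bias per hidden and output node, an assumption that reproduces the quoted 391 parameters exactly.

```python
def n_params(layers):
    """Number of weights plus biases in a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

parameters = n_params((11, 30, 1))                 # 11*30 + 30 + 30*1 + 1 = 391

iterations, steps_per_iteration = 500, 20
total_steps = iterations * steps_per_iteration     # 10,000 MCMC steps
hours = total_steps / 1000                         # at ~1000 steps per hour
networks_averaged = 100                            # one stored network per iteration, last 100 kept

print(parameters, total_steps, hours, networks_averaged)
```

So the full chain takes roughly ten hours on the quoted laptop. The final discriminant is an average over 100 networks, each with 391 parameters, trained on only 2000 events.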
Example: Single Top Search
[Figure: distributions for signal (tqb) and background (Wbb)]
Decision Trees
• Recover events that fail criteria in cut-based analyses.
• Start at the first “node” with a fraction of the “training sample”.
• Select the best variable and cut, with the best separation, to produce two “branches” of events: (F)ailed and (P)assed cut.
• Repeat recursively on successive nodes.
• Stop when the improvement stops or when too few events are left.
• A terminal node is called a “leaf”, with purity = Ns/(Ns+Nb).
• Run the remaining events and the data through the tree to derive results.
• Boosting DT:
  • Boosting is a recently developed technique that improves any weak classifier (decision tree, neural network, etc.).
  • Boosting averages the results of many trees, dilutes the discrete nature of the output, and improves the performance.
[Figure: DØ single top analysis]
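The node-splitting step above can be sketched in Python as shown below. The slide only says "best separation", so using the two-class Gini impurity 2p(1-p) as the separation criterion is an assumption, though it is a common choice. The six-event sample is hypothetical. For each variable the events are sorted, and every midpoint between neighbouring values is tried as a cut. The cut that minimizes the event-weighted impurity of the (F)ailed and (P)assed branches wins, and leaf purity is Ns/(Ns+Nb) as on the slide.

```python
def gini(ns, nb):
    """Two-class Gini impurity 2p(1-p), with p = ns/(ns+nb)."""
    n = ns + nb
    if n == 0:
        return 0.0
    p = ns / n
    return 2.0 * p * (1.0 - p)

def purity(ns, nb):
    return ns / (ns + nb)

def best_cut(events):
    """events: list of (features, label) with label 1 = signal, 0 = background.
    Returns (variable index, cut value, weighted impurity of the two branches)."""
    n_tot = len(events)
    ns_tot = sum(label for _, label in events)
    nb_tot = n_tot - ns_tot
    best = None
    for v in range(len(events[0][0])):
        ordered = sorted(events, key=lambda e: e[0][v])
        ns_f = nb_f = 0                          # events failing the cut (x_v < cut)
        for i in range(n_tot - 1):
            ns_f += ordered[i][1]
            nb_f += 1 - ordered[i][1]
            lo, hi = ordered[i][0][v], ordered[i + 1][0][v]
            if lo == hi:
                continue                         # no cut fits between equal values
            n_f = i + 1
            impurity = (n_f * gini(ns_f, nb_f)
                        + (n_tot - n_f) * gini(ns_tot - ns_f, nb_tot - nb_f)) / n_tot
            if best is None or impurity < best[2]:
                best = (v, 0.5 * (lo + hi), impurity)
    return best

events = [((0.1, 5.0), 0), ((0.2, 1.0), 0), ((0.25, 3.0), 0),
          ((0.75, 2.0), 1), ((0.8, 4.0), 1), ((0.9, 0.5), 1)]
var, cut, impurity = best_cut(events)
failed = [lab for x, lab in events if x[var] < cut]
passed = [lab for x, lab in events if x[var] >= cut]
print(var, cut, purity(sum(failed), len(failed) - sum(failed)), purity(sum(passed), len(passed) - sum(passed)))
```

Growing the tree means calling `best_cut` again on each branch until one of the stopping rules above applies. Boosting then reweights the misclassified events and grows further trees on the reweighted sample.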
Matrix Element Method
Example: Top mass measurement
• Maximal use of the information in each event, by calculating event-by-event signal and background probabilities based on the respective matrix elements
  • x: reconstructed kinematic variables of the final-state objects
  • JES: jet energy scale, from the MW constraint
• Signal and background probabilities from differential cross sections
• Write the combined likelihood for all events
• Maximize the likelihood w.r.t. mtop and JES
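The combined likelihood for N events has the form L(mtop, JES) = ∏_i [ f_s P_sig(x_i; mtop, JES) + (1 - f_s) P_bkg(x_i; JES) ]. The minimal Python sketch below maximizes it in a toy model. It performs no matrix-element integration at all. Instead, Gaussians in a dijet mass (which is pulled toward JES·MW, the W-mass constraint) and in a three-jet mass (pulled toward JES·mtop) stand in for the signal probability. The background probability is flat, independent of JES, with a known signal fraction. All resolutions, ranges and true values are hypothetical.

```python
import math
import random

M_W = 80.4                                   # GeV; the W-mass constraint fixes the jet energy scale
SIG_W, SIG_T = 10.0, 20.0                    # hypothetical dijet and three-jet mass resolutions (GeV)
MJJ_RANGE, MT_RANGE = (40.0, 120.0), (100.0, 250.0)
F_SIG = 0.8                                  # signal fraction, assumed known
MT_TRUE, JES_TRUE = 173.0, 1.02              # hypothetical true values used to generate the toy

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def p_sig(m_jj, m_jjb, mtop, jes):
    """Toy stand-in for the signal matrix-element probability P_sig(x; mtop, JES)."""
    return gauss(m_jj, jes * M_W, SIG_W) * gauss(m_jjb, jes * mtop, SIG_T)

def p_bkg(m_jj, m_jjb):
    """Toy stand-in for the background probability: flat over the mass window."""
    return 1.0 / ((MJJ_RANGE[1] - MJJ_RANGE[0]) * (MT_RANGE[1] - MT_RANGE[0]))

rng = random.Random(42)
events = []
for _ in range(500):
    if rng.random() < F_SIG:
        events.append((rng.gauss(JES_TRUE * M_W, SIG_W), rng.gauss(JES_TRUE * MT_TRUE, SIG_T)))
    else:
        events.append((rng.uniform(*MJJ_RANGE), rng.uniform(*MT_RANGE)))

def log_likelihood(mtop, jes):
    """Combined log-likelihood of all events."""
    return sum(math.log(F_SIG * p_sig(mjj, mjjb, mtop, jes) + (1.0 - F_SIG) * p_bkg(mjj, mjjb))
               for mjj, mjjb in events)

# Maximize by a simple grid scan over (mtop, JES)
grid = [(160.0 + 0.5 * i, 0.95 + 0.005 * j) for i in range(61) for j in range(21)]
mtop_fit, jes_fit = max(grid, key=lambda p: log_likelihood(*p))
print(f"mtop = {mtop_fit:.1f} GeV, JES = {jes_fit:.3f}")
```

The dijet mass pins down JES, and JES in turn calibrates the three-jet mass. Fitting both parameters together is what lets the in-situ W-mass constraint reduce the jet-energy-scale systematic.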
Summary
• Multivariate methods are now used extensively in HEP data analysis
• Neural networks, because of their ease of use and power, are favorites for particle-ID and signal/background discrimination
• Bayesian neural networks take us one step closer to optimization
• Likelihood discriminants and decision trees are becoming popular because they are easier to “defend” (no “black-box” stigma)
• Many issues remain to be addressed as we get ready to deploy the multivariate methods for discoveries in HEP
Nothing tends so much to the advancement of knowledge as the application of a new instrument. - Humphry Davy
No amount of experimentation can ever prove me right; a single experiment can prove me wrong. - Albert Einstein
World’s Highest Energy Laboratory (for now)
[Figure: aerial view of Fermilab showing the CDF and DØ detectors and the Booster]
Our Fancy New Toys
The Large Hadron Collider
• Circumference = 27 km
• Beam energy = 7 TeV
• Luminosity = 10³⁴ cm⁻² s⁻¹
• Startup date: 2007
[Figure: the LHC ring (pp collisions) with the SPS ring, the PS, the TI 2 and TI 8 transfer lines and CMS; photos of an LHC magnet and the LHC tunnel]
LHC Environment
14 TeV proton-proton colliding beams

| LHC parameter | Value |
|---|---|
| Bunch-crossing frequency | 40 MHz |
| Average # of collisions / crossing | 20 |
| “Interaction rate” | ~10⁹ /s |
| Average # of charged tracks | 1000 |
| Radiation field | Severe |

| CMS parameter | Value |
|---|---|
| Level-1 trigger rate | 100 kHz |
| Mean time between triggers | 10 μs |
| Trigger latency | 3.2 μs |
| Solenoid field | 4 T |
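The entries in these tables are tied together by simple arithmetic, checked in the Python sketch below. The only numbers used are the table values themselves. The "pipeline depth" is our own derived quantity: the number of bunch crossings that must be buffered while the Level-1 decision is made.

```python
bx_rate_hz = 40e6            # bunch-crossing frequency
collisions_per_bx = 20       # average collisions per crossing
l1_rate_hz = 100e3           # Level-1 trigger rate
latency_s = 3.2e-6           # Level-1 trigger latency

interaction_rate = bx_rate_hz * collisions_per_bx      # 8e8 per second, i.e. ~1e9
bx_spacing_ns = 1e9 / bx_rate_hz                       # 25 ns between crossings
time_between_triggers_us = 1e6 / l1_rate_hz            # 10 us mean time between triggers
l1_rejection = bx_rate_hz / l1_rate_hz                 # Level-1 keeps 1 crossing in 400
pipeline_depth = round(latency_s * bx_rate_hz)         # crossings buffered during the L1 decision

print(interaction_rate, bx_spacing_ns, time_between_triggers_us, l1_rejection, pipeline_depth)
```

In other words, the front-end electronics must hold about 128 crossings' worth of data while Level-1 decides. Only one crossing in 400 survives to the next trigger level.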
CMS Silicon Tracker
Challenges
CMS Si Tracker
[Figure: layout of the CMS silicon tracker (dimensions 5.4 m × 2.4 m), showing the Pixels, the Inner Barrel & Disks (TIB & TID) and the Outer Barrel (TOB)]
Lots of Silicon
• 214 m² of silicon sensors
• 11.4 million silicon strips
• 66 million pixels!
Si Tracker Challenges
• Large and complex system
  • 77.4 million total channels (out of a total of 78.2 M for the experiment)
  • Detector monitoring, data organization, data quality monitoring, analysis, visualization and interpretation are all daunting!
• Need to monitor every channel and make sure most of the detector is working at all times (the live fraction of the detector and its efficiencies are bound to decrease with time)
• Need to verify data integrity and data quality for physics
• Diagnose and fix problems ASAP
• Keep calibration and alignment parameters current
Detector/Data Monitoring
• Monitor
  • Environmental variables: temperatures, coolant flow rates, interlocks, radiation doses
  • Hardware status: voltages, currents
  • Channel data: readout states, errors, missing data/channels, bad IDs for channel/module
    • Many kinds, to be categorized, tracked and displayed
    • Should be able to find rare problems/errors (with a low occurrence rate) that may corrupt data. Rare problems may indicate a developing failure mode or hidden bad behavior.
• Correlate problem/noisy channels with history, temperature, currents, etc.
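One simple way to make "find rare problems/errors" operational is shown in the minimal Python sketch below. Error counts per channel are treated as Poisson-distributed around the mean count over all channels. A channel is flagged when its count would be too improbable under that hypothesis. The Poisson model, the 10⁻⁶ threshold and the channel names are assumptions for illustration. Note also that the mean is a reasonable reference only while bad channels are a small minority.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i            # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def flag_rare_errors(error_counts, p_threshold=1e-6):
    """Flag channels whose error count is improbable given the mean count over all channels."""
    lam = sum(error_counts.values()) / len(error_counts)
    return sorted(ch for ch, k in error_counts.items() if poisson_sf(k, lam) < p_threshold)

# Hypothetical run: 1000 healthy channels with 0-4 errors each, plus one misbehaving channel
counts = {f"ch{i:04d}": i % 5 for i in range(1000)}
counts["ch_bad"] = 20
print(flag_rare_errors(counts))
```

The same test can be repeated per error type and per time window. That fits the slide's point that rare errors may be the first sign of a developing failure.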
Data Quality Monitoring
• Monitor
  • Raw data: pedestals, noise, ADC counts, occupancies, efficiencies
  • Processed high-level objects: clusters, tracks, etc.
• Evaluate thousands of histograms
  • Can't visually examine them all
  • Automatically evaluate histograms by comparing them to reference histograms
  • Adaptive, efficient; find evolving patterns over time
  • Quantiles? q-q plots/comparison instead of the KS test?
• A variety of 2D “heat” maps
  • Occupancies, # of bad channels/module, # of errors/module, etc.
• Typical occupancy ~2% in the strip tracker
  • 200,000 channels written out 100 times/sec
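The slide asks whether quantile (q-q) comparisons might serve better than the KS test for automatic histogram checks. The minimal Python sketch below computes both on binned histograms. The KS distance is the largest gap between the two cumulative distributions. The quantiles are interpolated linearly within the bin where the cumulative distribution crosses q, which assumes entries are spread uniformly inside each bin. The reference and "shifted" histograms are hypothetical.

```python
def cdf(hist):
    """Cumulative fraction of entries after each bin."""
    total = float(sum(hist))
    out, running = [], 0.0
    for h in hist:
        running += h
        out.append(running / total)
    return out

def ks_distance(hist, ref):
    """Largest difference between the cumulative distributions of two histograms."""
    return max(abs(a - b) for a, b in zip(cdf(hist), cdf(ref)))

def quantile(hist, edges, q):
    """q-quantile, interpolating linearly inside the bin where the CDF crosses q."""
    prev = 0.0
    for i, ci in enumerate(cdf(hist)):
        if ci >= q:
            frac = (q - prev) / (ci - prev) if ci > prev else 0.0
            return edges[i] + frac * (edges[i + 1] - edges[i])
        prev = ci
    return edges[-1]

edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical binning, e.g. ADC counts
ref = [5, 20, 50, 20, 5]                   # reference histogram
shifted = [2, 10, 40, 35, 13]              # monitored histogram, drifting upward

qq = [(quantile(ref, edges, q), quantile(shifted, edges, q)) for q in (0.1, 0.25, 0.5, 0.75, 0.9)]
print(f"KS distance = {ks_distance(shifted, ref):.2f}")
print("q-q pairs:", [(round(a, 2), round(b, 2)) for a, b in qq])
```

The KS distance returns a single number. The q-q pairs also show where the distributions disagree (here the upper quantiles move more than the lower ones), which is often more useful for diagnosing a drifting pedestal or noise level.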
Module Assembly Precision
[Figure: example of a “heat” map]
Need smart approaches
• What are the best techniques for data mining?
  • To organize data for analysis and data visualization
    • Complex geometry/addressing makes visualization difficult
  • For finding problematic channels quickly and efficiently: clustering, exploratory data mining
  • For finding anomalies, corrupt data and patterns of behavior: feature-finding algorithms, superposing many events, time evolution, spatial and temporal correlations
• Noise correlations
  • Via correlation coefficients of defined groups
  • Correlate to history (time variations) and environmental variables
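Noise correlations via correlation coefficients within a defined group of channels can be sketched in Python as below. The group, the 500-sample noise series, the shared common-mode component and the |ρ| > 0.5 threshold are all hypothetical. The sketch computes the Pearson coefficient for every channel pair in the group and reports the pairs above threshold.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlated_pairs(group, threshold=0.5):
    """Return (channel_a, channel_b, rho) for every pair in the group with |rho| > threshold."""
    names = sorted(group)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = pearson(group[a], group[b])
            if abs(rho) > threshold:
                pairs.append((a, b, rho))
    return pairs

rng = random.Random(5)
common = [rng.gauss(0.0, 1.0) for _ in range(500)]     # shared (common-mode) noise component
group = {
    "ch0": [c + 0.3 * rng.gauss(0.0, 1.0) for c in common],
    "ch1": [c + 0.3 * rng.gauss(0.0, 1.0) for c in common],
    "ch2": [rng.gauss(0.0, 1.0) for _ in range(500)],  # independent noise
}
for a, b, rho in correlated_pairs(group):
    print(f"{a}-{b}: rho = {rho:.2f}")
```

The same function can correlate a channel's noise with a time series of temperatures or currents by placing that series in the group. This covers the slide's "correlate to history and environmental variables".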
Data Visualization
• Based on the hierarchical/geometrical structure of the tracker
• Display every channel; attach objects/info to each
  Sub-structures → Layers/rings → Modules → Readout chips
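The hierarchy above maps naturally onto a nested mapping, as in the minimal Python sketch below. Per-chip information is attached at the leaves and summed on the way up, so a display can start at the sub-structure level and drill down to the offending readout chip. The layer, module and chip names and the bad-channel counts are hypothetical. Only TIB and TOB are real sub-structure names from the tracker layout.

```python
# Hypothetical hierarchy: sub-structure -> layer/ring -> module -> readout chip.
# Leaf values: number of bad channels found on each readout chip.
tracker = {
    "TIB": {
        "layer1": {"module_001": {"chip0": 0, "chip1": 3}, "module_002": {"chip0": 0, "chip1": 0}},
        "layer2": {"module_101": {"chip0": 1, "chip1": 0}},
    },
    "TOB": {
        "layer1": {"module_501": {"chip0": 0, "chip1": 0}},
    },
}

def rollup(node):
    """Sum leaf values up the hierarchy (e.g. bad channels per module, layer, sub-structure)."""
    if isinstance(node, dict):
        return sum(rollup(child) for child in node.values())
    return node

def hot_spots(node, path=()):
    """Yield (path, value) for every readout chip with a nonzero value."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from hot_spots(child, path + (name,))
    elif node > 0:
        yield "/".join(path), node

for name, sub in tracker.items():
    print(name, rollup(sub))
print(list(hot_spots(tracker)))
```

Keeping the data in the detector's own hierarchy means the same tree can drive both the summary views (heat maps per layer) and the channel-level drill-down.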
Multivariate Analysis Issues
• Dimensionality reduction
  • Choosing variables optimally without losing information
• Choosing the right method for the problem
• Controlling model complexity
• Testing convergence
• Validation
  • Given a limited sample, what is the best way?
• Computational efficiency
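One standard answer to "given a limited sample, what is the best way?" is k-fold cross-validation. It is a generic technique, not one the talk prescribes. Every event is used for testing exactly once and for training k-1 times. The minimal Python sketch below applies it to a hypothetical one-variable sample and a trivial classifier: a cut halfway between the class means of the training fold.

```python
import random

def k_fold(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

# Hypothetical labelled sample: signal N(+1, 1), background N(-1, 1)
rng = random.Random(1)
xs = [rng.gauss(1.0, 1.0) for _ in range(100)] + [rng.gauss(-1.0, 1.0) for _ in range(100)]
labels = [1] * 100 + [0] * 100

errors = []
for train, test in k_fold(len(xs), k=5):
    # "Training": place the cut halfway between the class means of the training fold
    sig = [xs[j] for j in train if labels[j] == 1]
    bkg = [xs[j] for j in train if labels[j] == 0]
    cut = 0.5 * (sum(sig) / len(sig) + sum(bkg) / len(bkg))
    # Testing on the held-out fold only
    errors.append(sum((xs[j] > cut) != bool(labels[j]) for j in test) / len(test))

cv_error = sum(errors) / len(errors)
print(f"per-fold error rates: {[round(e, 3) for e in errors]}, mean: {cv_error:.3f}")
```

The spread of the per-fold errors gives a rough measure of how much the performance estimate itself depends on which events happened to land in the test sample.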
Multivariate Analysis Issues
• Correctness of modeling
  • How do we make sure the multivariate modeling is correct, i.e., that the data used for training or for building PDEs (probability density estimators) represent reality?
  • Is it sufficient to check the modeling in the mapped variable? Pair-wise correlations? Higher-order correlations?
  • How do we show that the background is modeled well? How do we quantify the correctness of modeling?
    • In conventional analysis, we normally look for variables that are well modeled in order to apply cuts.
  • How well is the background modeled in the signal region?
• Worries about hidden bias
• Worries about underestimating errors
Sociological Issues
• We have been conservative in the use of MV methods for discovery.
• We have been more aggressive in the use of MV methods for setting limits.
• But discovery is more important and needs all the power you can muster!
• This is expected to change at LHC.
Summary
• The next generation of experiments will need to adopt advanced data mining and data analysis techniques
• Conventional/routine tasks such as alignment, detector performance and data quality monitoring and data visualization will be challenging and require new approaches
• Many issues regarding use of multivariate methods of data analysis for discoveries and measurements need to be addressed to make optimal use of data
MV: Where can we use them?
• Almost everywhere, since HEP events are multivariate
• Improve several aspects of analysis:
  • Event selection: triggering, real-time filters, data streaming
  • Event reconstruction: tracking/vertexing, particle ID
  • Signal/background discrimination: Higgs discovery, SUSY discovery, single top, …
  • Functional approximation: jet energy corrections, tag rates, fake rates
  • Parameter estimation: top quark mass, Higgs mass, SUSY model parameters
  • Data exploration: knowledge discovery via data mining; data-driven extraction of information, latent structure analysis