Bayesian Neural Networks Pushpa Bhat Fermilab Harrison Prosper Florida State University.
Bayesian Neural Networks
Pushpa Bhat, Fermilab
Harrison Prosper, Florida State University
9/14/05 PHYSTAT05 Oxford Bayesian Neural Networks Bhat/Prosper
Outline: Introduction, Bayesian Learning, Simple Examples, Summary
Multivariate Methods
Since the early 1990s, we have used multivariate methods extensively in particle physics. Some examples:
Particle ID and signal/background discrimination; optimization of cuts for the top quark discovery at DØ; precision measurement of the top quark mass; searches for leptoquarks, technicolor, ...
Neural network methods have become popular because of their ease of use, power, and successful applications.
Why Multivariate Methods?
They improve several aspects of the analysis:
Event selection: triggering, real-time filters, data streaming
Event reconstruction: tracking/vertexing, particle ID
Signal/background discrimination: Higgs discovery, SUSY discovery, single top, ...
Functional approximation: jet energy corrections, tag rates, fake rates
Parameter estimation: top quark mass, Higgs mass, SUSY model parameters
Data exploration: knowledge discovery via data mining; data-driven extraction of information, latent structure analysis
Multi-Layer Perceptron
A popular and powerful neural network model. For inputs $x_1, x_2$, a network with one hidden layer computes
$$ y(x_1, x_2; \hat{w}) \;=\; F\Big(\theta + \sum_j w_j\, f\big(\theta_j + \sum_i w_{ji}\, x_i\big)\Big), \qquad f(a) \;=\; \frac{1}{1+e^{-a}}, $$
with $F$ the output-node activation. We need to find the $w$'s and $\theta$'s, the free parameters of the model.
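As a concrete sketch (assumed NumPy code, not the authors' implementation), the forward pass of such a multi-layer perceptron can be written directly; the (1, 15, 1) shape mirrors the simple example later in the talk:

```python
import numpy as np

def sigmoid(a):
    """Logistic activation f(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W1, b1, w2, b2):
    """Forward pass of a one-hidden-layer perceptron.

    x  : input vector, shape (n_in,)
    W1 : hidden-layer weights w_ji, shape (n_hidden, n_in)
    b1 : hidden-layer thresholds theta_j, shape (n_hidden,)
    w2 : output weights w_j, shape (n_hidden,)
    b2 : output threshold theta (scalar)
    Returns y(x) in (0, 1).
    """
    h = sigmoid(W1 @ x + b1)        # hidden-node activations
    return sigmoid(w2 @ h + b2)     # sigmoid output, usable as a probability

# Hypothetical (1, 15, 1) network with random (untrained) parameters
rng = np.random.default_rng(0)
W1 = rng.normal(size=(15, 1)); b1 = rng.normal(size=15)
w2 = rng.normal(size=15); b2 = 0.0
y = mlp_forward(np.array([0.3]), W1, b1, w2, b2)
```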
The Bayesian Connection
The output of a feed-forward neural network can approximate the posterior probability $P(s|x_1, x_2)$:
$$ y(x, \hat{w}) \;\approx\; P(s|x) \;=\; \frac{p(x|s)\,P(s)}{p(x|s)\,P(s) + p(x|b)\,P(b)} \;=\; \frac{1}{1+r}, \qquad r \;=\; \frac{p(x|b)\,p(b)}{p(x|s)\,p(s)}. $$
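To illustrate (with made-up Gaussian class densities, not the talk's data), the target P(s|x) that a well-trained network approximates can be computed directly from Bayes' theorem:

```python
import math

def posterior_signal(x, p_s=0.5, p_b=0.5):
    """P(s|x) for two illustrative unit-width Gaussian class densities.

    The densities below are stand-ins for demonstration, not the
    actual signal/background distributions from the talk.
    """
    def gauss(x, mu):                 # unit-width Gaussian density
        return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)
    like_s = gauss(x, 1.0) * p_s      # p(x|s) P(s)
    like_b = gauss(x, -1.0) * p_b     # p(x|b) P(b)
    r = like_b / like_s               # the ratio r from the slide
    return 1.0 / (1.0 + r)            # P(s|x) = 1 / (1 + r)
```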
The Top Quark: Post-Evidence, Pre-Discovery!
Fisher analysis of the tt̄ → eμ channel: one candidate event, with (S/B)(mt = 180 GeV) = 18 w.r.t. Z and 10 w.r.t. WW.
NN analysis of the tt̄ → e+jets channel: distributions for tt̄ (mt = 160 GeV), W+jets, and data.
P. Bhat, DPF94
Measuring the Top Quark Mass
Discriminant variables and the discriminants: DØ lepton+jets channel.
Fit performed in 2-D: (D_LB/NN, m_fit)
mt = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c²
Higgs Discovery Reach
The challenges are daunting! But using NNs provides the same reach with a factor of 2 less luminosity relative to conventional analyses. Improved bb̄ mass resolution and b-tagging efficiency are crucial.
Run II Higgs study, hep-ph/0010338 (Oct 2000); P. C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022.
Limitations of “Conventional NNs”
The training yields one set of weights, or network parameters; one must search for the “best” network while avoiding overfitting.
Heuristic decisions on network architecture: inputs, number of hidden nodes, etc.
No direct way to compute uncertainties.
Ensembles of Networks
A committee of networks NN₁, NN₂, ..., NN_M maps the same input x to outputs y₁, y₂, ..., y_M, combined as a weighted average:
$$ \bar{y} \;=\; \sum_i a_i\, y_i(x). $$
A decision made by averaging over many networks (a committee of networks) has a lower error than that of any individual network.
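The committee claim can be checked numerically. Assuming equal weights $a_i = 1/M$ (an illustrative choice), the squared error of the average is never larger than the members' mean squared error:

```python
import numpy as np

rng = np.random.default_rng(1)
target = 0.7                                 # hypothetical true value of p(s|x)

# Simulate M committee members whose outputs scatter around the target
outputs = target + rng.normal(scale=0.1, size=1000)

committee = outputs.mean()                   # ȳ = Σ a_i y_i with a_i = 1/M
err_committee = (committee - target) ** 2    # squared error of the committee
err_individual = np.mean((outputs - target) ** 2)  # members' mean squared error
```

By the bias-variance decomposition, err_individual = err_committee + variance of the member outputs, so the committee can only do better.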
Bayesian Learning
The result of Bayesian training is a posterior density over the network weights, p(w|training data). Generate a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last K points:
$$ y_{\mathrm{new}} \;=\; \frac{1}{K} \sum_{k=1}^{K} y(x, w_k). $$
Bayesian Learning – 2
Advantages:
Less prone to over-fitting, because of Bayesian averaging.
Less need to optimize the size of the network; one can use a large network! Indeed, the number of weights can be greater than the number of training events.
In principle, provides the best estimate of p(t|x).
Disadvantages:
Computationally demanding!
Bayesian Learning – 3
Computationally demanding because:
The dimensionality of the parameter space is typically large.
There can be multiple maxima in the likelihood function p(t|x,w) or, equivalently, multiple minima in the error function E(x,w).
Bayesian Neural Networks – 1
Basic idea: compute the posterior density of the network weights,
$$ p(w|t,x) \;=\; \frac{p(t|x,w)\, p(w,x)}{p(t|x)}, $$
where $p(t|x,w)$ is the likelihood and $p(w,x)$ the prior. Then estimate $p(t|x_{\mathrm{new}})$ by averaging over NNs:
$$ y(x_{\mathrm{new}}) \;=\; \int y(x_{\mathrm{new}}, w)\, p(w|t,x)\, dw. $$
Bayesian Neural Networks – 2
Likelihood:
$$ p(t|x,w) \;=\; \prod_{i=1}^{N} y_i^{t_i}\, (1-y_i)^{1-t_i}, $$
where $t_i = 0$ or $1$ for background/signal and $y_i = y(x_i, w)$.
Prior:
$$ p(w) \;=\; \mathrm{Gaussian}(w, \sigma^2)\; \mathrm{Gamma}(\sigma^2, a, b), $$
a Gaussian on the weights whose variance $\sigma^2$ is itself drawn from a Gamma density with parameters $a, b$.
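A sketch of the corresponding unnormalized log-posterior (assumed code; a Gaussian prior with fixed variance is used here, and the Gamma hyperprior is omitted for brevity):

```python
import numpy as np

def log_likelihood(y, t):
    """log p(t|x,w) = sum_i [ t_i log y_i + (1 - t_i) log(1 - y_i) ]."""
    y = np.clip(y, 1e-12, 1 - 1e-12)       # guard against log(0)
    return np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def log_prior(w, sigma2=1.0):
    """Gaussian prior on the weights with fixed variance sigma2.

    (The Gamma hyperprior on sigma2 from the slide is omitted here.)
    """
    return -0.5 * np.sum(w ** 2) / sigma2 - 0.5 * len(w) * np.log(2 * np.pi * sigma2)

def log_posterior(w, y, t):
    """Unnormalized log p(w|t,x) = log likelihood + log prior."""
    return log_likelihood(y, t) + log_prior(w)
```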
Bayesian Neural Networks – 3
Computational method: generate a Markov chain of N points {w} from the posterior density p(w|t,x) and average over the last K:
$$ y_{\mathrm{new}} \;=\; \frac{1}{K} \sum_{k=1}^{K} y(x, w_k). $$
Markov Chain Monte Carlo software by Radford Neal: http://www.cs.toronto.edu/~radford/fbm.software.html
Bayesian Neural Networks – 4
Treat sampling of the posterior density as a problem in Hamiltonian dynamics, in which the phase space (p,q) is explored using Markov techniques:
$$ \Pr(p,q) \;=\; \exp[-H(p,q)] \;=\; p(w|t,x)\, \exp\Big(-\tfrac{1}{2}\sum_i p_i^2\Big), $$
with the network weights playing the role of the position coordinates q.
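A minimal Hamiltonian Monte Carlo sketch on a toy one-dimensional "posterior" (a stand-in standard normal, not Neal's fbm implementation):

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, eps=0.1, n_leapfrog=20, rng=None):
    """One HMC step with H(p,q) = -log_prob(q) + p^2/2."""
    if rng is None:
        rng = np.random.default_rng()
    p = rng.normal(size=q.shape)                  # fresh Gaussian momentum
    q_new, p_new = q.copy(), p.copy()

    # Leapfrog integration of Hamilton's equations
    p_new += 0.5 * eps * grad_log_prob(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += eps * p_new
        p_new += eps * grad_log_prob(q_new)
    q_new += eps * p_new
    p_new += 0.5 * eps * grad_log_prob(q_new)

    # Metropolis accept/reject on the total energy H(p,q)
    h_old = -log_prob(q) + 0.5 * np.sum(p ** 2)
    h_new = -log_prob(q_new) + 0.5 * np.sum(p_new ** 2)
    return q_new if rng.random() < np.exp(h_old - h_new) else q

# Toy stand-in for p(w|t,x): a standard normal over a single weight
log_prob = lambda q: -0.5 * np.sum(q ** 2)
grad = lambda q: -q

rng = np.random.default_rng(2)
q = np.zeros(1)
chain = []
for _ in range(2000):
    q = hmc_step(q, log_prob, grad, rng=rng)
    chain.append(q[0])
```

Averaging network outputs over the tail of such a chain implements the 1/K sum from the previous slide.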
A Simple Example
Signal: pp̄ → tqb (… channel)
Background: pp̄ → Wbb̄
NN model: (1, 15, 1)
MCMC: 5000 tqb + Wbb̄ events; use the last 20 networks in a MC chain of 500.
Input variable: HT_AllJets_MinusBestJets (scaled), shown for Wbb̄ and tqb.
A Simple Example: Estimate of Prob(s|HT)
Blue dots: p(s|HT) = H_tqb / (H_tqb + H_Wbb)
Curves: individual NNs, y(HT, w_n)
Black curve: ⟨y(HT, w)⟩
Example: Single Top Search
Training data: 2000 events (1000 tqb + 1000 Wbb̄), with the standard set of 11 variables.
Network: (11, 30, 1), i.e., 391 parameters!
Markov Chain Monte Carlo (MCMC): 500 iterations, but use the last 100; 20 MCMC steps per iteration; NN parameters stored after each iteration; 10,000 steps in all, at ~1000 steps/hour (on a 1 GHz Pentium III laptop).
Signal/Bkgd. Distributions
Weighting with NN Output
The number of data events at x is
$$ d(x) \;=\; n_s\, s(x) + n_b\, b(x), $$
where s(x) and b(x) are the signal and background densities. With the network output
$$ y(x) \;=\; \frac{s(x)}{s(x) + b(x)}, $$
one can create weighted histograms of any variable z(x):
$$ f(z) \;=\; \int d(x)\, y(x)\, \delta\big(z - z(x)\big)\, dx \;\approx\; n_s\, s(z), $$
that is, weighting each event by y(x) projects out (approximately) the signal distribution.
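A toy illustration of the weighting idea (with Gaussian stand-ins for the signal and background densities, not the talk's data): the event weights should sum to roughly the number of signal events.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: a mixture of Gaussian signal and background events
n_s, n_b = 300, 700
x_sig = rng.normal(1.0, 1.0, n_s)        # drawn from s(x)
x_bkg = rng.normal(-1.0, 1.0, n_b)       # drawn from b(x)
data = np.concatenate([x_sig, x_bkg])    # d(x) = n_s s(x) + n_b b(x)

def y(x):
    """Signal posterior for this mixture, used in place of the trained NN."""
    s = np.exp(-0.5 * (x - 1.0) ** 2)
    b = np.exp(-0.5 * (x + 1.0) ** 2)
    return n_s * s / (n_s * s + n_b * b)

# Weighted histogram of the data: each event enters with weight y(x)
edges = np.linspace(-5.0, 5.0, 21)
f_z, _ = np.histogram(data, bins=edges, weights=y(data))
```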
Weighted Distributions
Magenta: weighting signal only. Blue: weighting signal and background. Black: unweighted signal distribution.
Summary
Bayesian learning of neural networks takes us another step closer to realizing optimal results in classification (or density estimation) problems. It allows a fully probabilistic approach with proper treatment of uncertainties.
We have started to explore Bayesian neural networks and the initial results are promising, though computationally challenging.