Defeating the Black Box – Neural Networks in HEP Data Analysis
Transcript of Defeating the Black Box – Neural Networks in HEP Data Analysis
![Page 1: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/1.jpg)
Defeating the Black Box – Neural Networks in HEP Data Analysis
Jan Therhaag (University of Bonn)
TMVA Workshop @ CERN, January 21st, 2011
TMVA on the web: http://tmva.sourceforge.net/
![Page 2: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/2.jpg)
Top Workshop, LPSC, Oct 18–20, 2007, A. Hoecker: Multivariate Analysis with TMVA / CERN, Jan 21st, 2011, J. Therhaag – Neural Networks in HEP Data Analysis
![Page 3: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/3.jpg)
• The single neuron as a classifier
• Network training and regularization
• Advanced topics: The Bayesian approach
![Page 4: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/4.jpg)
The Problem …

[Figure: two overlapping classes of data points in the (x1, x2) plane]
![Page 5: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/5.jpg)
The single neuron as a classifier
![Page 6: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/6.jpg)
A simple approach:

• Code the classes as a binary variable $y \in \{0, 1\}$ (here: blue = 0, orange = 1)
• Perform a linear fit $y = \sum_{i=1}^{N} w_i x_i + w_0$ to this discrete function
• Define the decision boundary by $\{x : y = w^T x = 0.5\}$

[Figure: the fitted boundary in the (x1, x2) plane separates the regions y < 0.5 and y > 0.5]
```cpp
// TMVA code: create Factory
TMVA::Factory *factory =
   new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');
// book linear discriminant classifier (LD)
factory->BookMethod(TMVA::Types::kLD, "LD");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();
```
![Page 7: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/7.jpg)
Now consider the sigmoid transformation: $y \mapsto \sigma(y) \equiv \frac{1}{1+\exp(-y)}$

• $\sigma(y)$ has values in [0, 1] and can be interpreted as the probability p(orange | x) (then obviously p(blue | x) = 1 − p(orange | x) = $\sigma(-y)$)

[Figure: the sigmoid curves $\sigma(y)$ and $\sigma(-y)$]
![Page 8: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/8.jpg)
• $\sigma(y)$ is called the activity of the neuron, while $y$ is called the activation

We have just invented the neuron!

[Figure: a single neuron; the inputs $1, x_1, \dots, x_N$ are weighted by $w_0, w_1, \dots, w_N$ and summed to the activation $y = w^T x = \sum_{i=0}^{N} w_i x_i$, which is passed through the sigmoid to give the activity $\sigma(y) = \frac{1}{1+\exp(-w^T x)}$]
![Page 9: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/9.jpg)
The idea of neuron training – searching the weight space
![Page 10: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/10.jpg)
• The training proceeds via minimization of the error function
• The neuron learns via gradient descent*
• Examples may be learned one-by-one (online learning) or all at once (batch learning)
• Overtraining may occur!
*more sophisticated techniques may be used
$$E(w) = -\sum_n \left[ t^{(n)} \ln p(C_1 \mid x^{(n)}) + (1 - t^{(n)}) \ln p(C_2 \mid x^{(n)}) \right]$$

$$\frac{\partial E}{\partial w_j} = -\sum_n \left( t^{(n)} - p^{(n)} \right) x_j^{(n)}$$
![Page 11: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/11.jpg)
Network training and regularization
![Page 12: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/12.jpg)
• The class of networks used for regression and classification tasks is called feedforward networks
• Neurons are organized in layers
• The output of a neuron in one layer becomes the input for the neurons in the next layer
$$y_k(x; w) = \sigma\!\left( \sum_{j=0}^{M} w^{(2)}_{kj} \underbrace{\sigma\!\left( \sum_{i=0}^{N} w^{(1)}_{ji} x_i \right)}_{z_j} \right)$$
```cpp
// TMVA code: create Factory
TMVA::Factory *factory =
   new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');
// book Multi Layer Perceptron (MLP) network and define network architecture
factory->BookMethod(TMVA::Types::kMLP, "MLP", "NeuronType=sigmoid:HiddenLayers=N+5,N");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();
```
![Page 13: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/13.jpg)
• Feedforward networks are universal approximators
• Any continuous function can be approximated with arbitrary precision
• The complexity of the output function is determined by the number of hidden units and the characteristic magnitude of the weights
$$y_k(x; w) = \sum_{j=0}^{M} w^{(2)}_{kj} \underbrace{\sigma\!\left( \sum_{i=0}^{N} w^{(1)}_{ji} x_i \right)}_{z_j}$$

[Figure: hidden unit activities $z_1, z_2, z_3$ and the network output $y$ compared to the training data]
![Page 14: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/14.jpg)
14 14 Top Workshop, LPSC, Oct 18–20, 2007 A. Hoecker: Multivariate Analysis with TMVA 14 CERN, Jan 21st, 2011 J. Therhaag – Neural Networks in HEP Data Analysis
From neuron training to network training - backpropagation

• In order to find the optimal set of weights w, we have to calculate the derivatives $\frac{\partial E(w)}{\partial w_{ij}}$
• Recall the single neuron: $\frac{\partial E(w)}{\partial w_k} = (y_k - t_k)\, x_k$
• It turns out that $\frac{\partial E(w)}{\partial w_{ij}} = \delta_j z_i$, with $\delta_k = y_k - t_k$ for output neurons and $\delta_j \propto \sum_k w_{kj} \delta_k$ else
While input information is always propagated forward, errors are propagated backwards!
![Page 15: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/15.jpg)
Some issues in network training

• The error function has several minima; the result of the minimization typically depends on the starting values of the weights
• The scaling of the inputs has an effect on the final solution
• Overtraining – bad generalization and overconfident predictions

[Figure: NN with 10 hidden units]
```cpp
// TMVA code: create Factory
TMVA::Factory *factory =
   new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');
// book Multi Layer Perceptron (MLP) network with normalized input distributions
factory->BookMethod(TMVA::Types::kMLP, "MLP", "RandomSeed=1:VarTransform=N");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();
```
![Page 16: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/16.jpg)
Regularization and early stopping

• Early stopping: stop the training before the minimum of E(w) is reached
– a validation data set is needed
– convergence is monitored in TMVA
• Weight decay: penalize large weights explicitly
$$\tilde{E}(w) = E(w) + \lambda\, w^T w$$

[Figure: NN with 10 hidden units and λ = 0.02]
```cpp
// TMVA code: create Factory
TMVA::Factory *factory =
   new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");
factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');
// book Multi Layer Perceptron (MLP) network with regularization
factory->BookMethod(TMVA::Types::kMLP, "MLP", "NCycles=500:UseRegulator");
factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();
```
![Page 17: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/17.jpg)
Network complexity vs. regularization

• Unless prohibited by computing power, a large number of hidden units H is to be preferred
– no ad hoc limitation of the model
• In the limit of H → ∞, network complexity is entirely determined by the typical size of the weights

[Figure: network output for different weight magnitudes]
![Page 18: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/18.jpg)
Advanced Topics

Network learning as inference and Bayesian neural networks
![Page 19: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/19.jpg)
Network training as inference

• Reminder: Given the network output $y(x; w) = p(t = 1 \mid w, x)$, the error function is just minus the log likelihood of the training data D:
$$P(D \mid w) = \exp(-E(w)) \qquad \text{(likelihood)}$$
• Similarly, we can interpret the weight decay term as a log probability distribution for w:
$$P(w \mid \lambda) = \frac{1}{Z_W(\lambda)} \exp(-\lambda\, w^T w) \qquad \text{(prior)}$$
• Obviously, there is a close connection between the regularized error function and the inference for the network parameters:
$$P(w \mid D, \lambda) = \frac{P(D \mid w)\, P(w \mid \lambda)}{\int P(D \mid w)\, P(w \mid \lambda)\, dw} = \frac{1}{Z_{\tilde{E}}} \exp(-\tilde{E}(w))$$
(the denominator is the normalization)
![Page 20: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/20.jpg)
![Page 21: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/21.jpg)
Predictions and confidence

• Minimizing the error corresponds to finding the most probable value $w_{MP}$, which is used to make predictions $P(t^{(N+1)} \mid x^{(N+1)}, w_{MP})$
• Problem: Predictions for points in regions less populated by the training data may be too confident

Can we do better?
![Page 22: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/22.jpg)
Using the posterior to make predictions

• Instead of using $w_{MP}$, we can also exploit the full information in the posterior $P(w \mid D, \lambda)$:
$$P(t^{(N+1)} \mid x^{(N+1)}, D, \lambda) = \int P(t^{(N+1)} \mid x^{(N+1)}, w, \lambda)\, P(w \mid D, \lambda)\, dw$$
![Page 23: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/23.jpg)
Using the posterior to make predictions

• Instead of using $w_{MP}$, we can also exploit the full information in the posterior $P(w \mid D, \lambda)$:
$$P(t^{(N+1)} \mid x^{(N+1)}, D, \lambda) = \int P(t^{(N+1)} \mid x^{(N+1)}, w, \lambda)\, P(w \mid D, \lambda)\, dw$$

See Jiahang's talk this afternoon for details of the Bayesian approach to NN in the TMVA framework!
![Page 24: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/24.jpg)
A full Bayesian treatment

• In a full Bayesian framework, the hyperparameter(s) λ are estimated from the data by maximizing the evidence
$$P(D \mid \lambda) = \int P(D \mid w)\, P(w \mid \lambda)\, dw$$
– no test data set is needed
– the neural network tunes itself
– the relevance of input variables can be tested (automatic relevance determination, ARD)
• Simultaneous optimization of parameters and hyperparameters is technically challenging
– TMVA uses a clever approximation

[Figure: evidence as a function of model complexity]
![Page 25: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/25.jpg)
Summary (1)

* A neuron can be understood as an extension of a linear classifier
* A neural net consists of layers of neurons, input information always propagates forward, errors propagate backwards
* Feedforward networks are universal approximators
* The model complexity is governed by the typical weight size, which can be controlled by weight decay or early stopping
* In the Bayesian framework, error minimization corresponds to inference and regularization corresponds to the choice of a prior for the parameters
* The Bayesian approach makes use of the full posterior and gives better predictive power
* The amount of regularization can be learned from the data by maximizing the evidence
![Page 26: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/26.jpg)
Summary (2)

Current features of the TMVA MLP:
* Support for regression, binary and multiclass classification (new in 4.1.0 !)
* Efficient optional preprocessing (Gaussianization, normalization) of the input distributions
* Optional regularization to prevent overtraining
+ efficient approximation of the posterior distribution of the network weights
+ self-adapting regulator
+ error estimation
Future development in TMVA:
* Automatic relevance determination for input variables
* Extended automatic model (network architecture) comparison
Thank you!
![Page 27: Defeating the Black Box – Neural Networks in HEP Data Analysis](https://reader036.fdocuments.us/reader036/viewer/2022081422/568160ae550346895dcfcdc4/html5/thumbnails/27.jpg)
References
Figures taken from:
David MacKay: "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003
Christopher Bishop: "Pattern Recognition and Machine Learning", Springer, 2006
Hastie, Tibshirani, Friedman: "The Elements of Statistical Learning", 2nd Ed., Springer, 2009
These books are also recommended for further reading on neural networks