Defeating the Black Box – Neural Networks in HEP Data Analysis


Page 1: Defeating the Black Box – Neural Networks in HEP Data Analysis

Defeating the Black Box – Neural Networks in HEP Data Analysis

Jan Therhaag (University of Bonn)
TMVA Workshop @ CERN, January 21st, 2011

TMVA on the web: http://tmva.sourceforge.net/

Page 2: Defeating the Black Box – Neural Networks in HEP Data Analysis

Page 3: Defeating the Black Box – Neural Networks in HEP Data Analysis

•The single neuron as a classifier

•Network training and regularization

•Advanced topics: The Bayesian approach

Page 4: Defeating the Black Box – Neural Networks in HEP Data Analysis

The Problem …

(figure: data in the x1–x2 plane)

Page 5: Defeating the Black Box – Neural Networks in HEP Data Analysis

The single neuron as a classifier

Page 6: Defeating the Black Box – Neural Networks in HEP Data Analysis

A simple approach:

• Code the classes as a binary variable y \in \{0, 1\} (here: blue = 0, orange = 1)

• Perform a linear fit y = \sum_{i=1}^{N} w_i x_i + w_0 = w^T x to this discrete function

• Define the decision boundary by \{\, x : y = w^T x = 0.5 \,\}

(figure: the fitted decision boundary in the x1–x2 plane, separating the regions y < 0.5 and y > 0.5)

//######################################################################################
// TMVA code
//######################################################################################

// create Factory
TMVA::Factory *factory =
   new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");

factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');

// book linear discriminant classifier (LD)
factory->BookMethod(TMVA::Types::kLD, "LD");

factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();

Page 7: Defeating the Black Box – Neural Networks in HEP Data Analysis

Now consider the sigmoid transformation: y \mapsto \sigma(y) \equiv \frac{1}{1 + \exp(-y)}

• \sigma(y) has values in [0,1] and can be interpreted as the probability p(orange | x) (then obviously p(blue | x) = 1 - p(orange | x) = \sigma(-y))

(figure: the sigmoid \sigma(y) as a function of y)
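To make the complementarity concrete, a minimal C++ sketch (the sample values of y are illustrative, not from the slides) showing that \sigma(y) and \sigma(-y) always sum to one:

#include <cmath>
#include <cstdio>

double sigmoid(double y) { return 1.0 / (1.0 + std::exp(-y)); }

int main() {
   // sigma(y) + sigma(-y) = 1: the two class probabilities are complementary
   for (double y : {-2.0, 0.0, 2.0})
      std::printf("y = %+.1f  p(orange|x) = %.3f  p(blue|x) = %.3f\n",
                  y, sigmoid(y), sigmoid(-y));
   return 0;
}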

Page 8: Defeating the Black Box – Neural Networks in HEP Data Analysis

• \sigma(y) is called the activity of the neuron, while y = w^T x is called the activation

We have just invented the neuron!

(figure: the neuron as a diagram – the inputs 1, x_1, …, x_N are weighted by w_0, w_1, …, w_N and summed to the activation y = \sum_{i=0}^{N} w_i x_i = w^T x, which is passed through the sigmoid)

\sigma(y) \equiv \frac{1}{1 + \exp(-w^T x)}

Page 9: Defeating the Black Box – Neural Networks in HEP Data Analysis

The idea of neuron training – searching the weight space

Page 10: Defeating the Black Box – Neural Networks in HEP Data Analysis

• The training proceeds via minimization of the error function

• The neuron learns via gradient descent*

• Examples may be learned one-by-one (online learning) or all at once (batch learning)

• Overtraining may occur!

*more sophisticated techniques may be used

E(w) = -\sum_n \left[ t^{(n)} \ln p(C_1 | x^{(n)}) + (1 - t^{(n)}) \ln p(C_2 | x^{(n)}) \right]

\frac{\partial E}{\partial w_j} = -\sum_n \left( t^{(n)} - p^{(n)} \right) x_j^{(n)}
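As a minimal sketch of one such online gradient-descent step (assuming a single sigmoid neuron, the bias convention x[0] = 1, and an illustrative learning rate eta):

#include <cmath>
#include <vector>

// One online gradient-descent step for a single sigmoid neuron.
// x and w have the same length, with x[0] = 1 as bias input;
// t is the binary target (0 or 1), eta is the learning rate.
void neuronUpdate(std::vector<double>& w, const std::vector<double>& x,
                  double t, double eta)
{
   double y = 0.0;
   for (std::size_t i = 0; i < w.size(); ++i) y += w[i] * x[i]; // activation w^T x
   double p = 1.0 / (1.0 + std::exp(-y));                       // activity sigma(y)
   // dE/dw_j = -(t - p) x_j, so descending the gradient adds eta (t - p) x_j
   for (std::size_t i = 0; i < w.size(); ++i) w[i] += eta * (t - p) * x[i];
}

Batch learning would accumulate the gradient over all examples before updating; this sketch updates after each example (online learning).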

Page 11: Defeating the Black Box – Neural Networks in HEP Data Analysis

Network training and regularization

Page 12: Defeating the Black Box – Neural Networks in HEP Data Analysis

• Networks used for regression and classification tasks are called feedforward networks

• Neurons are organized in layers

• The output of a neuron in one layer becomes the input for the neurons in the next layer

y_k(x; w) = \sigma\!\left( \sum_{j=0}^{M} w_{kj}^{(2)} \, \underbrace{\sigma\!\left( \sum_{i=0}^{N} w_{ji}^{(1)} x_i \right)}_{z_j} \right)

//######################################################################################
// TMVA code
//######################################################################################

// create Factory
TMVA::Factory *factory =
   new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");

factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');

// book Multi-Layer Perceptron (MLP) network and define the network architecture
factory->BookMethod(TMVA::Types::kMLP, "MLP", "NeuronType=sigmoid:HiddenLayers=N+5,N");

factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();

Page 13: Defeating the Black Box – Neural Networks in HEP Data Analysis

• Feedforward networks are universal approximators

• Any continuous function can be approximated with arbitrary precision

• The complexity of the output function is determined by the number of hidden units and the characteristic magnitude of the weights

y_k(x; w) = \sum_{j=0}^{M} w_{kj}^{(2)} \, \underbrace{\sigma\!\left( \sum_{i=0}^{N} w_{ji}^{(1)} x_i \right)}_{z_j}

(figure: hidden-unit activations z_1, z_2, z_3 and the resulting output y fitted to the training data)

Page 14: Defeating the Black Box – Neural Networks in HEP Data Analysis

From neuron training to network training – backpropagation

• In order to find the optimal set of weights w, we have to calculate the derivatives \frac{\partial E(w)}{\partial w_{ij}}

• Recall the single neuron: \frac{\partial E(w)}{\partial w_k} = (y_k - t_k)\, x_k

• It turns out that: \frac{\partial E(w)}{\partial w_{ij}} = \delta_j z_i, with \delta_k = y_k - t_k for output neurons and \delta_j \propto \sum_k w_{kj} \delta_k else

While input information is always propagated forward, errors are propagated backwards!
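A minimal sketch of one backpropagation step for a network with one hidden layer and a single sigmoid output, assuming the cross-entropy error from above; the names W1, w2 and eta are illustrative, not TMVA API:

#include <cmath>
#include <vector>

double sigmoid(double a) { return 1.0 / (1.0 + std::exp(-a)); }

// One backpropagation step for a single training example (x, t):
// forward pass through one hidden layer, then errors propagated backwards.
// W1[j][i] are input-to-hidden weights, w2[j] hidden-to-output weights,
// x[0] = 1 serves as the bias input, eta is the learning rate.
void backpropStep(std::vector<std::vector<double> >& W1,
                  std::vector<double>& w2,
                  const std::vector<double>& x,
                  double t, double eta)
{
   const std::size_t M = w2.size();
   std::vector<double> z(M);
   double a = 0.0;
   for (std::size_t j = 0; j < M; ++j) {
      double aj = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i) aj += W1[j][i] * x[i];
      z[j] = sigmoid(aj);                    // hidden activity z_j
      a += w2[j] * z[j];                     // output activation
   }
   double y = sigmoid(a);                    // network output

   double deltaOut = y - t;                  // delta for the output neuron
   for (std::size_t j = 0; j < M; ++j) {
      // hidden delta: sigma'(a_j) * w2_j * deltaOut, with sigma' = z (1 - z)
      double deltaJ = z[j] * (1.0 - z[j]) * w2[j] * deltaOut;
      w2[j] -= eta * deltaOut * z[j];        // dE/dw2_j  = deltaOut * z_j
      for (std::size_t i = 0; i < x.size(); ++i)
         W1[j][i] -= eta * deltaJ * x[i];    // dE/dW1_ji = delta_j * x_i
   }
}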

Page 15: Defeating the Black Box – Neural Networks in HEP Data Analysis

Some issues in network training

• The error function has several minima; the result of the minimization typically depends on the starting values of the weights

• The scaling of the inputs has an effect on the final solution

• Overtraining – bad generalization and overconfident predictions

(figure: NN with 10 hidden units)

//######################################################################################
// TMVA code
//######################################################################################

// create Factory
TMVA::Factory *factory =
   new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");

factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');

// book Multi-Layer Perceptron (MLP) network with normalized input distributions
factory->BookMethod(TMVA::Types::kMLP, "MLP", "RandomSeed=1:VarTransform=N");

factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();

Page 16: Defeating the Black Box – Neural Networks in HEP Data Analysis

Regularization and early stopping

• Early stopping: Stop the training before the minimum of E(w) is reached
  – a validation data set is needed
  – convergence is monitored in TMVA

• Weight decay: Penalize large weights explicitly

\tilde{E}(w) = E(w) + \lambda \, w^T w

(figure: NN with 10 hidden units and λ = 0.02)
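In a plain gradient-descent update, the weight decay term contributes 2λw_i to each gradient component; a minimal sketch, assuming an externally computed gradient of the unregularized error (illustrative names, not the TMVA implementation):

#include <vector>

// Gradient-descent update with weight decay: the penalty lambda w^T w adds
// 2 lambda w_i to each gradient component and shrinks the weights towards zero.
// grad[i] holds dE/dw_i of the unregularized error E(w).
void decayUpdate(std::vector<double>& w, const std::vector<double>& grad,
                 double eta, double lambda)
{
   for (std::size_t i = 0; i < w.size(); ++i)
      w[i] -= eta * (grad[i] + 2.0 * lambda * w[i]);
}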

//######################################################################################
// TMVA code
//######################################################################################

// create Factory
TMVA::Factory *factory =
   new TMVA::Factory("TMVAClassification", outputfile, "AnalysisType=Classification");

factory->AddVariable("x1", 'F');
factory->AddVariable("x2", 'F');

// book Multi-Layer Perceptron (MLP) network with regularization
factory->BookMethod(TMVA::Types::kMLP, "MLP", "NCycles=500:UseRegulator");

factory->TrainAllMethods();
factory->TestAllMethods();
factory->EvaluateAllMethods();

Page 17: Defeating the Black Box – Neural Networks in HEP Data Analysis

Network complexity vs. regularization

• Unless prohibited by computing power, a large number of hidden units H is to be preferred
  – no ad hoc limitation of the model

• In the limit H \to \infty, network complexity is entirely determined by the typical size of the weights

(figure: network output)

Page 18: Defeating the Black Box – Neural Networks in HEP Data Analysis

Advanced Topics

Network learning as inference and Bayesian neural networks

Page 19: Defeating the Black Box – Neural Networks in HEP Data Analysis

Network training as inference

• Reminder: Given the network output y(x; w) = p(t = 1 | w, x), the error function is just minus the log likelihood of the training data D:

P(D | w) = \exp(-E(w))

• Similarly, we can interpret the weight decay term as a log probability distribution for w:

P(w | \lambda) = \frac{1}{Z_W(\lambda)} \exp(-\lambda \, w^T w)

• Obviously, there is a close connection between the regularized error function and the inference for the network parameters:

P(w | D, \lambda) = \frac{P(D | w) \, P(w | \lambda)}{\int P(D | w) \, P(w | \lambda) \, dw} = \frac{1}{Z_{\tilde{E}}} \exp(-\tilde{E}(w))

(likelihood times prior, divided by the normalization)

Page 20: Defeating the Black Box – Neural Networks in HEP Data Analysis

Page 21: Defeating the Black Box – Neural Networks in HEP Data Analysis

Predictions and confidence

• Minimizing the error corresponds to finding the most probable value w_{MP}, which is used to make predictions P(t^{(N+1)} | x^{(N+1)}, w_{MP})

• Problem: Predictions for points in regions less populated by the training data may be too confident

Can we do better?

Page 22: Defeating the Black Box – Neural Networks in HEP Data Analysis

Using the posterior to make predictions

• Instead of using w_{MP}, we can also exploit the full information in the posterior P(w | D, \lambda):

P(t^{(N+1)} | x^{(N+1)}, D, \lambda) = \int P(t^{(N+1)} | x^{(N+1)}, w, \lambda) \, P(w | D, \lambda) \, dw
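This integral is typically intractable; a minimal Monte Carlo sketch, assuming weight samples from the posterior and a forward-pass function netOutput are supplied (this is not the TMVA implementation):

#include <functional>
#include <vector>

// Monte Carlo estimate of the predictive distribution: average the network
// output P(t = 1 | x, w) over weight vectors sampled from P(w | D, lambda).
// Assumes posteriorSamples is non-empty.
double posteriorPredictive(
   const std::vector<std::vector<double> >& posteriorSamples,
   const std::vector<double>& x,
   const std::function<double(const std::vector<double>&,
                              const std::vector<double>&)>& netOutput)
{
   double p = 0.0;
   for (const auto& w : posteriorSamples)
      p += netOutput(w, x);               // one sampled network's prediction
   return p / posteriorSamples.size();
}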

Page 23: Defeating the Black Box – Neural Networks in HEP Data Analysis

Using the posterior to make predictions (continued)

See Jiahang's talk this afternoon for details of the Bayesian approach to NN in the TMVA framework!

Page 24: Defeating the Black Box – Neural Networks in HEP Data Analysis

A full Bayesian treatment

• In a full Bayesian framework, the hyperparameter(s) λ are estimated from the data by maximizing the evidence

P(D | \lambda) = \int P(D | w) \, P(w | \lambda) \, dw

  – no test data set is needed
  – the neural network tunes itself
  – the relevance of input variables can be tested (automatic relevance determination, ARD)

• Simultaneous optimization of parameters and hyperparameters is technically challenging
  – TMVA uses a clever approximation

(figure: evidence vs. model complexity)

Page 25: Defeating the Black Box – Neural Networks in HEP Data Analysis

Summary (1)

* A neuron can be understood as an extension of a linear classifier

* A neural net consists of layers of neurons; input information always propagates forward, while errors propagate backwards

* Feedforward networks are universal approximators

* The model complexity is governed by the typical weight size, which can be controlled by weight decay or early stopping

* In the Bayesian framework, error minimization corresponds to inference and regularization corresponds to the choice of a prior for the parameters

* The Bayesian approach makes use of the full posterior and gives better predictive power

* The amount of regularization can be learned from the data by maximizing the evidence

Page 26: Defeating the Black Box – Neural Networks in HEP Data Analysis

Summary (2)

Current features of the TMVA MLP:

* Support for regression, binary and multiclass classification (new in 4.1.0!)

* Efficient optional preprocessing (Gaussianization, normalization) of the input distributions

* Optional regularization to prevent overtraining
  + efficient approximation of the posterior distribution of the network weights
  + self-adapting regulator
  + error estimation

Future development in TMVA:

* Automatic relevance determination for input variables

* Extended automatic model (network architecture) comparison

Thank you!

Page 27: Defeating the Black Box – Neural Networks in HEP Data Analysis

References

Figures taken from:

David MacKay, "Information Theory, Inference and Learning Algorithms", Cambridge University Press, 2003

Christopher Bishop, "Pattern Recognition and Machine Learning", Springer, 2006

Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", 2nd ed., Springer, 2009

These books are also recommended for further reading on neural networks.