
Top Thinkshop-2 Nov. 10-12, 2000 Pushpa Bhat


Advanced Analysis Algorithms for Top Analysis

Pushpa Bhat

Fermilab

Top Thinkshop 2, Fermilab, IL, November 2000

A reasonable man adapts himself to the world. An unreasonable man persists to adapt the world to himself. So, all progress depends on the unreasonable one.

- Bernard Shaw


What do we gain?

b-tag efficiency in Run I: DØ ~20%, CDF ~53%.

But DØ was able to measure the top quark mass with a precision approaching that of CDF by using multivariate techniques to separate signal and background while minimizing the correlation of the selection with the top quark mass.


Optimal Analysis Methods

The new generation of experiments will be far more demanding than the previous one in data handling at all stages.

The time-honored procedure of choosing and applying cuts on one event variable at a time is rarely optimal!

Since the measurements are multivariate, the optimal methods of analysis are necessarily multivariate:

Discriminant Analysis: partition the multidimensional variable space and identify the boundaries between classes of objects.

Cluster Analysis: assign objects to groups based on similarity.

Regression Analysis: functional approximation/fitting.


Data Analysis Tasks

Particle Identification: e-ID, μ-ID, b-ID, q/g

Signal/Background Event Classification: signals of new physics are rare and small (finding a “jewel” in a haystack)

Parameter Estimation: for example, t mass, H mass, track parameters

Function Approximation: correction functions, tag rates, fake rates

Data Exploration: data-driven extraction of information, latent structure analysis


Why Multivariate Methods?

Because they are optimal!

[Figure: two panels showing signal and background in the (x1, x2) plane]

D(x1, x2) = 2.014 x1 + 1.592 x2
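The linear discriminant D(x1, x2) above comes from DØ's own inputs, and its coefficients are not reproduced here. As a minimal sketch of why one linear combination can beat cuts on each variable separately, the code below assumes a hypothetical toy model: two Gaussian classes sharing one covariance matrix, for which Fisher's weights w = Σ⁻¹(μs − μb) give the best linear separation. The function names `fisher_weights` and `separation` are illustrative, not from the talk.

```python
import numpy as np

def fisher_weights(mu_s, mu_b, cov):
    """Fisher linear discriminant weights w = cov^-1 (mu_s - mu_b).

    For two Gaussian classes that share the covariance matrix `cov`, cutting on
    D(x) = w . x is equivalent to cutting on the Bayes discriminant r(x).
    """
    return np.linalg.solve(cov, np.asarray(mu_s, float) - np.asarray(mu_b, float))

def separation(w, mu_s, mu_b, cov):
    """Distance between signal and background means of D = w . x, in units of its width."""
    w = np.asarray(w, dtype=float)
    delta = w @ (np.asarray(mu_s, float) - np.asarray(mu_b, float))
    return delta / np.sqrt(w @ cov @ w)

# Hypothetical toy model: x1 and x2 strongly correlated in both classes,
# with the class means displaced across the correlation axis.
mu_s = np.array([1.0, 0.0])
mu_b = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

w = fisher_weights(mu_s, mu_b, cov)
print(f"D(x1, x2) = {w[0]:.3f} x1 {w[1]:+.3f} x2")        # D(x1, x2) = 5.000 x1 -5.000 x2
print("x1 alone:", separation([1.0, 0.0], mu_s, mu_b, cov))
print("x2 alone:", abs(separation([0.0, 1.0], mu_s, mu_b, cov)))
print("D       :", separation(w, mu_s, mu_b, cov))         # sqrt(10), about 3.162
```

In this toy each variable alone separates the classes by one standard deviation, while D separates them by about 3.2, because the combination exploits the correlation that one-variable-at-a-time cuts ignore.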


Optimal Event Selection

$$ r(x) = \frac{p(x|s)\,p(s)}{p(x|b)\,p(b)} $$

defines decision boundaries that minimize the probability of misclassification.

So, the problem mathematically reduces to that of calculating r(x), the Bayes discriminant function, or the probability densities.

Posterior probability:

$$ p(s|x) = \frac{p(x|s)\,p(s)}{p(x|s)\,p(s) + p(x|b)\,p(b)} = \frac{r}{1+r} $$
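A minimal sketch of the two formulas above, assuming hypothetical one-dimensional Gaussian class densities (unit width, signal mean +1, background mean −1). For this choice r(x) = e^{2x} (p(s)/p(b)), so p(s|x) can be checked against a closed form. Assigning an event to signal when r(x) > 1 is the same as requiring p(s|x) > 1/2.

```python
import math

def gauss(x, mu, sigma):
    """One-dimensional Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_r(x, p_s=0.5, mu_s=1.0, mu_b=-1.0, sigma=1.0):
    """Bayes discriminant r(x) = p(x|s) p(s) / (p(x|b) p(b)) for hypothetical Gaussian densities."""
    p_b = 1.0 - p_s
    return (gauss(x, mu_s, sigma) * p_s) / (gauss(x, mu_b, sigma) * p_b)

def posterior_signal(x, **params):
    """Posterior probability p(s|x) = r / (1 + r)."""
    r = bayes_r(x, **params)
    return r / (1.0 + r)

for x in (-2.0, 0.0, 0.5, 2.0):
    label = "signal" if bayes_r(x) > 1.0 else "background"
    print(f"x={x:+.1f}  r={bayes_r(x):.3f}  p(s|x)={posterior_signal(x):.3f}  -> {label}")
```

Changing `p_s` shifts the boundary: with a rare signal (small p(s)) a larger likelihood ratio is needed before the posterior favors signal.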


Probability Density Estimators

Histogramming:

The basic problem of non-parametric density estimation is very simple! Histogram the data in M bins in each of the d feature variables, which requires M^d bins. This is the curse of dimensionality: in high dimensions, we would either require a huge number of data points or most of the bins would be empty, leading to an estimated density of zero.

But the variables are generally correlated and hence tend to be restricted to a sub-space: the intrinsic dimensionality of the data is usually lower than d.
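To make the M^d growth concrete, here is a minimal sketch that drops a fixed number of points into a histogram with M = 10 bins per axis and counts how many of the M^d bins receive any entries. The uniform toy distribution and the name `occupied_fraction` are assumptions for illustration; correlated physics variables would fill even fewer bins.

```python
import numpy as np

def occupied_fraction(n_points, m_bins, d, seed=0):
    """Fraction of the M**d histogram bins that contain at least one of n_points
    drawn uniformly in the unit hypercube (a hypothetical toy distribution)."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_points, d))
    idx = np.minimum((x * m_bins).astype(np.int64), m_bins - 1)   # bin index along each axis
    cell = np.ravel_multi_index(tuple(idx.T), (m_bins,) * d)      # one integer label per d-dim bin
    return np.unique(cell).size / m_bins ** d

for d in (1, 2, 4, 6):
    frac = occupied_fraction(10_000, 10, d)
    print(f"d={d}: {10 ** d:>9} bins, occupied fraction = {frac:.4f}")
```

With 10,000 points every bin is filled for d = 1 or 2, roughly 63% for d = 4, and at most 1% for d = 6, where empty bins would give the estimated density of zero described above.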


Kernel-Based Methods

Akin to histogramming, but adopts importance sampling.

Place in d-dimensional space a hypercube of side h centered on each data point x_n:

$$ \tilde{p}(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^d}\, H\!\left(\frac{x - x_n}{h}\right) $$

where N = number of data points, H(u) = 1 if x lies inside the hypercube centered on x_n and 0 otherwise, and h = smoothing parameter.

The estimate will have discontinuities. These can be smoothed out by using different forms for the kernel function H(u). A common choice is a multivariate Gaussian kernel:

$$ \tilde{p}(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{(2\pi h^2)^{d/2}} \exp\!\left(-\frac{\lVert x - x_n \rVert^2}{2h^2}\right) $$
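Both kernel estimators above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not code from the talk: data are passed as an array of shape (N, d) (a 1-d array is treated as d = 1), and the helper names `parzen_hypercube` and `parzen_gauss` are hypothetical.

```python
import numpy as np

def _as_2d(data):
    data = np.asarray(data, dtype=float)
    return data[:, None] if data.ndim == 1 else data   # shape (N, d)

def parzen_hypercube(x, data, h):
    """p~(x) = (1/N) sum_n h^-d H((x - x_n)/h), with H(u) = 1 if every |u_i| <= 1/2, else 0."""
    data = _as_2d(data)
    n, d = data.shape
    u = (np.asarray(x, dtype=float) - data) / h
    inside = np.all(np.abs(u) <= 0.5, axis=1)
    return inside.sum() / (n * h ** d)

def parzen_gauss(x, data, h):
    """p~(x) = (1/N) sum_n (2 pi h^2)^(-d/2) exp(-|x - x_n|^2 / (2 h^2))."""
    data = _as_2d(data)
    n, d = data.shape
    sq_dist = np.sum((np.asarray(x, dtype=float) - data) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * h ** 2)).sum() / (n * (2.0 * np.pi * h ** 2) ** (d / 2))

rng = np.random.default_rng(1)
sample = rng.normal(0.0, 1.0, 20_000)          # hypothetical 1-d data set
for h in (0.05, 0.2, 1.0):
    print(h, parzen_hypercube(0.0, sample, h), parzen_gauss(0.0, sample, h))
# the true density at 0 is 1/sqrt(2 pi), about 0.399
```

The smoothing parameter h trades noise for bias: small h gives a jumpy estimate, while large h (here h = 1) visibly flattens the peak, since the Gaussian estimate then converges to a density of width sqrt(1 + h²).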


K Nearest-Neighbor Method

Place a hypersphere centered at the point x and allow its radius to grow until the sphere, of volume V, contains K data points. Then the density at x is

$$ p(x) = \frac{K}{NV} $$

where N = number of data points.

If our data set contains N_k points in class C_k and N points in total, then

$$ p(x|C_k) = \frac{K_k}{N_k V} $$

where K_k = number of points in volume V for class C_k. With the class priors p(C_k) = N_k / N, Bayes' theorem gives

$$ p(C_k|x) = \frac{p(x|C_k)\,p(C_k)}{p(x)} = \frac{K_k}{K} $$
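A minimal NumPy sketch of the K nearest-neighbor formulas, assuming Euclidean distance and data of shape (N, d). The function names are hypothetical. Note that if x is itself one of the data points, it is counted among its own neighbors.

```python
import math
import numpy as np

def _as_2d(data):
    data = np.asarray(data, dtype=float)
    return data[:, None] if data.ndim == 1 else data   # shape (N, d)

def _distances(x, data):
    return np.sqrt(np.sum((np.asarray(x, dtype=float) - data) ** 2, axis=1))

def knn_density(x, data, k):
    """p(x) = K / (N V), V = volume of the smallest hypersphere around x holding K points."""
    data = _as_2d(data)
    n, d = data.shape
    radius = np.sort(_distances(x, data))[k - 1]
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)   # volume of the unit d-ball
    return k / (n * unit_ball * radius ** d)

def knn_posterior(x, data, labels, k):
    """p(C_k|x) = K_k / K: class fractions among the K nearest neighbors."""
    data = _as_2d(data)
    labels = np.asarray(labels)
    nearest = np.argsort(_distances(x, data))[:k]
    return np.bincount(labels[nearest], minlength=labels.max() + 1) / k

points = [0.0, 1.0, 2.0, 10.0, 11.0]
classes = [0, 0, 1, 1, 1]                     # hypothetical class labels C_0, C_1
print(knn_density(0.5, points, 3))
print(knn_posterior(0.5, points, classes, 3))  # two of the three neighbors are in C_0
```

The posterior needs only the neighbor labels, not the volume V, which is why the kNN classifier is simpler than the kNN density estimate it is derived from.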


Discriminant Approximation with Neural Networks

The output of a feed-forward neural network can approximate the Bayesian posterior probability p(s|x,y) directly, without estimating the class-conditional probabilities:

$$ n(x, y, \omega) \approx p(s|x,y) = \frac{r}{1+r} $$

where ω denotes the network parameters.

[Figure: network with inputs x and y and output n(x, y, ω)]


Calculating the Discriminant

Consider the sum

$$ E(\omega) = \sum_i \left[ n(x_i, y_i, \omega) - d_i \right]^2 $$

where d_i = 1 for signal, d_i = 0 for background, and ω = vector of parameters.

Then

$$ \frac{dE}{d\omega} = 0 \;\Longrightarrow\; n(x, y, \omega) \approx p(s|x,y) = \frac{r}{1+r} $$

in the limit of large data samples, provided that the function n(x, y, ω) is flexible enough.
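The sketch below checks this numerically under simplifying assumptions: one input x instead of (x, y), and the simplest feed-forward network, a single sigmoid unit n(x) = 1/(1 + e^{-(wx+b)}) with no hidden layer. The toy classes are unit-width Gaussians at +1 (signal) and −1 (background) with equal sample sizes, so the exact posterior is p(s|x) = 1/(1 + e^{-2x}), which the model can represent. It minimizes the mean rather than the sum of squared errors, which only rescales E(ω), by plain full-batch gradient descent. The training routine is illustrative, not the one used in the DØ analyses.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_single_unit(x, d, lr=2.0, n_iter=5000):
    """Minimise E(w, b) = mean[(n(x_i) - d_i)^2] for n(x) = sigmoid(w x + b)
    by full-batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        n = sigmoid(w * x + b)
        g = 2.0 * (n - d) * n * (1.0 - n)   # dE/da for each event, where a = w x + b
        w -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    return w, b

rng = np.random.default_rng(42)
n_each = 5000
x = np.concatenate([rng.normal(1.0, 1.0, n_each), rng.normal(-1.0, 1.0, n_each)])
d = np.concatenate([np.ones(n_each), np.zeros(n_each)])   # targets: 1 = signal, 0 = background

w, b = train_single_unit(x, d)
grid = np.linspace(-2.0, 2.0, 9)
print("network n(x):", np.round(sigmoid(w * grid + b), 3))
print("exact p(s|x):", np.round(sigmoid(2.0 * grid), 3))
```

The fitted weights land close to w = 2, b = 0, so the trained output tracks the exact posterior even though the code never estimates p(x|s) or p(x|b).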


Neural Networks

An NN estimates a mapping function without requiring a mathematical description of how the output formally depends on the input.

The “hidden” transformation functions, g, adapt themselves to the data as part of the training process. The number of such functions needs to grow only as the complexity of the problem grows.

[Figure: feed-forward network with inputs x1, x2, x3, x4 and output D_NN]

$$ D_{NN}(x; \omega) = g\Big( \sum_j \omega_{jk}\, g\big( \textstyle\sum_i \omega_{ij} x_i + \theta_j \big) + \theta_k \Big), \qquad g(a) = \frac{1}{1 + e^{-a}} $$
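The network formula above can be written as a forward pass in a few lines. This is a minimal sketch: the weights are random and untrained, and the layer size (5 hidden nodes) and the argument names `w_ih`, `theta_h`, `w_ho`, `theta_o` (standing for ω_ij, θ_j, ω_jk, θ_k) are illustrative assumptions.

```python
import numpy as np

def g(a):
    """Logistic transfer function g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def d_nn(x, w_ih, theta_h, w_ho, theta_o):
    """D_NN = g( sum_j w_ho[j] * g( sum_i w_ih[i, j] * x[i] + theta_h[j] ) + theta_o ).

    x may be one event of shape (n_inputs,) or a batch of shape (n_events, n_inputs).
    """
    hidden = g(np.asarray(x, dtype=float) @ w_ih + theta_h)   # outputs of the hidden nodes
    return g(hidden @ w_ho + theta_o)

# Hypothetical network: 4 inputs (x1..x4), 5 hidden nodes, random untrained weights.
rng = np.random.default_rng(0)
w_ih = rng.normal(size=(4, 5))
theta_h = rng.normal(size=5)
w_ho = rng.normal(size=5)
theta_o = 0.0

events = rng.normal(size=(3, 4))
print(d_nn(events, w_ih, theta_h, w_ho, theta_o))   # three values between 0 and 1
```

Training would adjust these arrays to minimize the squared-error sum E(ω). Because the output node also uses g, D_NN always lies between 0 and 1, as a posterior probability must.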


Why are NN models powerful?

Neural networks are universal approximators

With a sufficiently large NN, you can approximate a function to arbitrary accuracy

Convergence of approximation is rapid

High dimensionality is not a curse any more!

Model complexity can be controlled by regularization

Extrapolate gracefully


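They also need to have optimal flexibility/complexity.

[Figure: Mth-order polynomial fits to data sampled from h(x) = 0.5 + 0.4 sin(2πx) for M = 1, M = 3 and M = 10, shown with decision boundaries in the (x1, x2) plane labeled Simple, Flexible and Highly flexible]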
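A minimal sketch of the polynomial-fit comparison, assuming 12 equally spaced points sampled from h(x) = 0.5 + 0.4 sin(2πx) with Gaussian noise of width 0.05 (the sample size and noise level are assumptions, not values from the figure). It uses NumPy's `Polynomial.fit` for the least-squares fits and reports the error on the fitted points and the error against h(x) itself.

```python
import numpy as np
from numpy.polynomial import Polynomial

def h(x):
    """Underlying function used in the polynomial-fit figure."""
    return 0.5 + 0.4 * np.sin(2.0 * np.pi * x)

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 12)
y_train = h(x_train) + rng.normal(0.0, 0.05, x_train.size)   # assumed noise level
x_test = np.linspace(0.0, 1.0, 200)

rms = {}
for m in (1, 3, 10):
    p = Polynomial.fit(x_train, y_train, deg=m)                # least-squares fit of order M
    train_rms = np.sqrt(np.mean((p(x_train) - y_train) ** 2))  # error on the fitted points
    true_rms = np.sqrt(np.mean((p(x_test) - h(x_test)) ** 2))  # error against h(x) itself
    rms[m] = (train_rms, true_rms)
    print(f"M={m:2d}  train RMS={train_rms:.4f}  RMS vs h(x)={true_rms:.4f}")
```

The training error can only fall as M grows, since each model contains the simpler ones. The error against h(x) is a better guide to flexibility: M = 1 is far too simple, while M = 10 tends to start following the noise rather than h(x).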


The Golden Rule

Keep it simple: as simple as possible, but not any simpler.

- Einstein


Measuring the Top Quark Mass

The Discriminants

[Figure: DØ discriminant variables; shaded = top]


Measuring the Top Quark Mass

DØ Lepton+jets

[Figure: panels for the background-rich and signal-rich samples]

mt = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c²


Strategy for Discovering the Higgs Boson at the Tevatron

P.C. Bhat, R. Gilmartin, H. Prosper, PRD 62 (2000), hep-ph/0001152


WH Results from NN Analysis, MH = 100 GeV/c²

[Figure: NN output for WH vs. Wbb]


WH (110 GeV/c²) NN Distributions


Results, Standard vs. NN

A good chance of discovery up to MH = 130 GeV/c² with 20-30 fb⁻¹


Improving the Higgs Mass Resolution

[Figure: three panels comparing mass resolutions: 13.8% vs. 12.2%, 13.1% vs. 11.3%, 13% vs. 11%]

Use mjj and HT (= ΣET of the jets) to train NNs to predict the Higgs boson mass


Newer Approaches: Ensembles of Networks

Committees of Networks: performance can be better than that of the best single network

Stacks of Networks: control both bias and variance

Mixture of Experts: decompose complex problems
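A minimal sketch of a committee: several models are trained on bootstrap resamples of the same data and their outputs are averaged. To keep it short, small polynomial fits stand in for the networks, and the target function, sample size and noise level are all hypothetical. For squared error the committee can never do worse than the average member, because (mean f − y)² ≤ mean (f − y)² at every point. Whether it also beats the best single member depends on the data, which is why the slide says "can be".

```python
import numpy as np
from numpy.polynomial import Polynomial

def h(x):
    return 0.5 + 0.4 * np.sin(2.0 * np.pi * x)

rng = np.random.default_rng(7)
x = rng.random(40)
y = h(x) + rng.normal(0.0, 0.1, x.size)        # hypothetical noisy training sample
x_test = np.linspace(0.05, 0.95, 200)

# Train each committee member on a bootstrap resample of the training data.
members = []
for _ in range(10):
    idx = rng.integers(0, x.size, x.size)
    members.append(Polynomial.fit(x[idx], y[idx], deg=6))

preds = np.array([p(x_test) for p in members])              # shape (10, 200)
member_mse = np.mean((preds - h(x_test)) ** 2, axis=1)      # one error per member
committee_mse = np.mean((preds.mean(axis=0) - h(x_test)) ** 2)

print("best single member :", member_mse.min())
print("average member     :", member_mse.mean())
print("committee (average):", committee_mse)
```

Averaging mainly removes the member-to-member scatter (variance); it does not fix a bias shared by all members, which is what stacking and mixtures of experts address.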


Bayesian Reasoning

The Bayesian approach provides a well-founded mathematical procedure for making straightforward and meaningful model comparisons. It also allows all uncertainties to be treated in a consistent manner.

Examples of useful applications:

Fitting binned data to multi-source models, PLB 407 (1997) 73

Extraction of the solar neutrino survival probability, PRL 81 (1998) 5056

Mathematically linked to adaptive algorithms such as Neural Networks (NN)

Hybrid methods involving NN for probability density estimation and Bayesian treatment can be very powerful
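As an illustration of "fitting binned data to multi-source models", here is a minimal Bayesian sketch for two sources: a binned Poisson likelihood with expected counts μ_i = N (f S_i + (1 − f) B_i) and a flat prior on the signal fraction f, evaluated on a grid. The templates S and B, the pseudo-data, and the simplification of fixing the total N to the observed count are all assumptions; this is not the method of the cited PLB paper.

```python
import numpy as np

def log_likelihood(f, counts, sig_shape, bkg_shape, n_total):
    """Poisson log-likelihood (terms independent of f dropped) for the two-source model
    mu_i = n_total * (f * sig_shape_i + (1 - f) * bkg_shape_i)."""
    mu = n_total * (f * sig_shape + (1.0 - f) * bkg_shape)
    return np.sum(counts * np.log(mu) - mu)

def posterior_fraction(counts, sig_shape, bkg_shape, f_grid):
    """Posterior p(f | counts) on a grid, assuming a flat prior on the signal fraction f."""
    n_total = counts.sum()
    logl = np.array([log_likelihood(f, counts, sig_shape, bkg_shape, n_total) for f in f_grid])
    post = np.exp(logl - logl.max())          # subtract the maximum to avoid underflow
    return post / post.sum()                  # normalised probabilities on the grid

# Hypothetical templates (each sums to 1) and Poisson pseudo-data with true f = 0.3.
sig_shape = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
bkg_shape = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
f_grid = np.linspace(0.0, 1.0, 101)

rng = np.random.default_rng(3)
counts = rng.poisson(500 * (0.3 * sig_shape + 0.7 * bkg_shape))
post = posterior_fraction(counts, sig_shape, bkg_shape, f_grid)
mean = np.sum(f_grid * post)
std = np.sqrt(np.sum((f_grid - mean) ** 2 * post))
print(f"signal fraction = {mean:.3f} +/- {std:.3f}")
```

The result is a full posterior for f rather than a point estimate, so other uncertainties (for example, on the template shapes) could in principle be folded in by integrating over them in the same framework.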


Summary

Multivariate methods have already had an impact on discoveries and precision measurements, and will be the methods of choice in future analyses.

We have only scratched the surface in our use of advanced analysis algorithms.

Hybrid methods combining “intelligent” algorithms and probabilistic approach will be the wave of the future!