
Slide 1

Statistics for Experimental HEP

Kajari Mazumdar, TIFR, Mumbai


Slide 2

Why do we do experiments?

To study some phenomenon X, which could be:

• Whether a particular species of particle exists or not.

• If it does exist, what are its properties: mass, lifetime, charge, magnetic moment, ...?

• If it has a finite lifetime, which particles does it decay to? What are the branching fractions? Do the distributions of variables/parameters (energies, directions of daughter particles) agree with theoretical predictions?

• To probe which processes can occur, and with what probability (in a given experimental situation, depending on the initial particles and the collision energy, of course).


Slide 3

Template of an experiment

• Arrange for instances of X.
• Record events that might be X.
• Reconstruct the measurable quantities of the visible particles.
• Select events that could be X by applying cuts.
• Histogram distributions of interesting variables and compare with the theoretical prediction(s); there may be several in the market.

Confrontation of theory with experiment. Ask:
• Is there any evidence for X, or is the null hypothesis refuted?
• Given X, what are the parameters involved in the model?
• Are the results from the experiment compatible with the predictions of X?


Slide 4

Data analysis in particle physics

Observe events of a certain type, which come with various uncertainties, quantified in terms of probability:
• the theory is not deterministic,
• measurements have various random errors,
• other limitations: cost, time, ...

• Measure characteristics of each event (particle momenta, number of muons, energy of jets, ...).
• Theories (e.g. the SM) predict distributions of these properties up to free parameters, e.g. α, G_F, M_Z, α_s, m_H, ...
• Some tasks of data analysis:

Estimate (measure) the parameters;

Quantify the uncertainty of the parameter estimates;

Test the extent to which the predictions of a theory are in agreement with the data (→ presence of New Physics?)


Slide 5

Everything is a counting experiment in HEP, within certain approximations.

• To measure a branching ratio or a cross-section, we count the number of events produced/observed (accounting for the inefficiency of observation).

• To measure the mass of a particle, we use a histogram, where the entry in each bin is a random (Poisson) process.

• This unpredictability is inherent in nature: driven by quantum mechanics, everything becomes probabilistic.

When we want to infer something about the probabilistic processes that produced the data, we need Statistics.
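A quick numerical illustration of the counting point: in toy experiments where the total number of recorded events is itself Poisson distributed, the content of each histogram bin is Poisson too, so its mean and variance agree. A minimal numpy sketch (all settings arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Toy experiments: the total number of recorded events is itself
# Poisson (mean 1000), as in a real counting experiment.
bins = np.linspace(-3.0, 3.0, 31)
counts = np.array([np.histogram(rng.normal(size=rng.poisson(1000)),
                                bins=bins)[0]
                   for _ in range(5000)])

# For a Poisson-distributed bin content, mean and variance agree.
i = 15
print("bin %d: mean = %.2f, variance = %.2f"
      % (i, counts[:, i].mean(), counts[:, i].var()))
```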


Slide 6

What happens if there’s nothing?

• In the final analysis we may make approximations, take a pragmatic approach, or do things according to convention. We need a good understanding of the foundational aspects of statistics.

• Even if your analysis finds no events, this is still useful information about the way the universe is built.

• Want to say more than: “We looked for X, we didn’t see it.”

• Need statistics – which can’t prove anything.

• “We show that X probably has a mass greater than … OR a coupling smaller than …”


Slide 7

Statistical tools mostly used in particle physics

1. Monte Carlo simulation.
2. Likelihood methods to estimate parameters.
3. Fitting of data.
4. Goodness of fit.
5. Toy Monte Carlo: to achieve a given level of confidence, given data (Neyman construction).


Slide 8

Monte Carlo simulation

Theoretically, the distributions, perhaps with a few unknown parameters, are beautiful and simple (this is not only a dogma, but a reality):
• angular distributions may be flat, or a function of a few trigonometric variables, like sin θ, cos θ;
• masses may be described by a Cauchy (Breit-Wigner) function;
• time distributions may be exponential, or exponential modulated by a sinusoidal oscillation (neutrino oscillation).

• But these are modified by various complicated effects: higher-order perturbative corrections may be one of the theoretical reasons, plus the detector effects (reconstruction and measurement processes), ...

• A Monte Carlo simulation starts with a simple distribution and puts it through repeated randomization, to take into account the various unavoidable complications and finally produce histograms!

This is both computer and man-power intensive.
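A minimal sketch of this pipeline: start from a simple "theory" pdf, here a Breit-Wigner (Cauchy) mass peak, apply one layer of "complication" (a Gaussian detector smearing), and histogram the result. The mass, width and resolution below are made-up, roughly Z-like numbers, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# "Theory": a Breit-Wigner (Cauchy) mass peak; made-up mass and width.
m0, gamma = 91.2, 2.5                      # GeV
n_events = 100_000
m_true = m0 + 0.5 * gamma * rng.standard_cauchy(n_events)

# One "complication": Gaussian detector smearing (assumed resolution).
sigma_det = 1.0                            # GeV
m_reco = m_true + rng.normal(0.0, sigma_det, n_events)

# The final product is a histogram, as the slide says.
hist, edges = np.histogram(m_reco, bins=100, range=(80.0, 100.0))
print(hist[45:55])                         # counts around the peak
```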



Slide 9

Concept of Probability

Mathematical definition: P(A) is a number obeying the Kolmogorov axioms:

1. P(A) ≥ 0
2. Σ_i P(A_i) = 1 over all possible disjoint outcomes A_i
3. P(A1 ∪ A2) = P(A1) + P(A2) if A1 ∩ A2 = ∅

Problem with the mathematical definition: no information is conveyed by P(A)!


Probability of a discrete variable x: P(x) = lim N(x)/N as N → ∞; e.g. coins, dice, cards, ...
For continuous x, the probability density function (pdf): P(x to x+dx) = P(x) dx; e.g. the parton (quark, gluon) density functions of the proton.

In particle physics, A1, A2: outcomes of a repeatable experiment, say, decays.

In the frequentist interpretation, P(Higgs boson exists) = either 0 OR 1, but we don’t know which one is correct!

A random variable is a numerical characteristic assigned to an element of the sample space; can be discrete or continuous.

All As are considered equally likely.P(A) depends on A and the ensemble.


Slide 10

Treatment of probability in a subjective way.

• In particle physics the frequency interpretation is often the most useful,
• but subjective probability can provide a more natural treatment of non-repeatable phenomena: e.g. systematic uncertainties, the probability that the Higgs boson exists, ...

P(A) is degree of belief that A is true.

Conditional probability: probability of A, given B = P(A|B) = P(A ∩ B)/P(B),

and similarly, probability of B given A = P(B|A) = P(A ∩ B)/P(A).

But P(A ∩ B) = P(B|A) P(A) = P(A|B) P(B), so P(A|B) = P(B|A) P(A)/P(B) (Bayes' theorem).

If A, B are independent: P(A ∩ B) = P(A) P(B), so P(A|B) = P(A).

Bayesian interpretation

In Bayesian probability, assume in advance a probability that the Higgs boson exists, and then interpret the data, taking into account all the possibilities which could produce such data.


Slide 11

Frequentist Use of Bayes' Theorem
Example: particle identification. Particle types: e, μ, π, K, p. Detector signals: DCH, RICH, TOF, TRD (different subdetectors through which the particles pass, leaving similar signatures).
The probability that a signal in the DCH is due to an e is determined by the probability of an e to leave a detectable signal in the DCH, the probability of an electron to be produced in the reaction, and also the total probability of the DCH to register signals due to the different particles:

P(DCH) = P(DCH|e) P(e) + P(DCH|μ) P(μ) + P(DCH|π) P(π) + ...

P′(e) = P(e|DCH) = P(DCH|e) P(e) / P(DCH)
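A numeric sketch of this Bayes inversion; the priors P(i) and the response probabilities P(DCH|i) below are invented for illustration:

```python
# Hypothetical priors P(i) and DCH responses P(DCH | i) per species;
# all numbers invented for illustration.
priors = {"e": 0.05, "mu": 0.10, "pi": 0.70, "K": 0.10, "p": 0.05}
p_dch  = {"e": 0.90, "mu": 0.30, "pi": 0.25, "K": 0.20, "p": 0.15}

# Law of total probability: P(DCH) = sum_i P(DCH|i) P(i)
p_total = sum(p_dch[i] * priors[i] for i in priors)

# Bayes' theorem: P(e | DCH) = P(DCH | e) P(e) / P(DCH)
p_e_given_dch = p_dch["e"] * priors["e"] / p_total
print("P(e | DCH signal) = %.3f" % p_e_given_dch)
```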


Slide 12

Continuous variable

Cumulative distribution: probability to have an outcome less than or equal to x:

F(x) = ∫_{−∞}^{x} f(x′) dx′

Probability to find x within [x, x+dx]: f(x) dx.
E.g. the parton density function: f(x) is NOT a probability!

x must be found somewhere: ∫ f(x) dx = 1 over the full range of x.


Slide 13

Expectation values
Consider a continuous r.v. x with pdf f(x).

Define the expectation (mean) value as E[x] = ∫ x f(x) dx.

Notation (often): E[x] = μ; ~ "centre of gravity" of the pdf.

For a function y(x) with pdf g(y): E[y] = ∫ y g(y) dy = ∫ y(x) f(x) dx (equivalent).

Variance: V[x] = E[(x − μ)²] = E[x²] − μ².

Standard deviation: σ = √V[x]; ~ width of the pdf, same units as x.

Define the covariance cov[x, y] as cov[x, y] = E[(x − μ_x)(y − μ_y)] = E[xy] − μ_x μ_y.

Correlation coeff.: ρ = cov[x, y]/(σ_x σ_y).

For x, y independent: E[xy] = E[x] E[y], so cov[x, y] = 0.

The reverse is not true!


Slide 14

Correlation


Slide 15

Statistics

• Population: includes all objects of interest; usually large. Parameters (mean μ, standard deviation σ, etc.) are associated with a population.

• Sample: only a portion of the population; convenient, but comes at a cost. A statistic is associated with a sample, providing a characteristic or measure obtained from the sample. We compute statistics to estimate parameters.

Variables can be discrete, like the number of events, or continuous, like the mass of the Higgs boson.

• Mean: sum of all values / number of values.
• Median: mid-point of the data after being ranked in ascending order; there are as many numbers above the median as below.
• Mode: the most frequent number in the distribution.


Slide 16

Basic Data description

• Weighted mean: x̄ = Σ_i (x_i/σ_i²) / Σ_i (1/σ_i²), with variance 1/Σ_i (1/σ_i²);
e.g. the measurement of tracks using multiple hits.

• Sample variance (an unbiased estimator of the population variance):

s² = [1/(N−1)] Σ_i (x_i − x̄)²

The factor N−1 takes care of the fact that the sample mean is determined from the same set of observations x_i.
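Both formulas in a small numpy sketch (inverse-variance weights, as suggested by the multiple-hits example; the numbers are made up):

```python
import numpy as np

# Made-up measurements with individual uncertainties.
x = np.array([10.1, 9.8, 10.4, 10.0])
sigma = np.array([0.2, 0.3, 0.5, 0.2])

# Weighted mean with inverse-variance weights w_i = 1/sigma_i^2,
# and its uncertainty 1/sqrt(sum of weights).
w = 1.0 / sigma**2
mean_w = np.sum(w * x) / np.sum(w)
err_w = np.sqrt(1.0 / np.sum(w))
print("weighted mean = %.3f +/- %.3f" % (mean_w, err_w))

# Unbiased sample variance: divide by N-1 (ddof=1), not N.
print("sample variance = %.4f" % np.var(x, ddof=1))
```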


Slide 17

Distribution/pdf Example use in HEP

Binomial Branching ratio

Multinomial Histogram with fixed N

Poisson Number of events found

Uniform Monte Carlo method

Exponential Decay time

Gaussian Measurement error

Chi-square Goodness-of-fit

Cauchy Mass of resonance

Landau Ionization energy loss


Slide 18

The Binomial

Random process with exactly 2 possible outcomes, order not important (a Bernoulli process). Individual success probability p, total n trials and r successes:

P(r; n, p) = [n! / (r! (n−r)!)] p^r (1−p)^(n−r)

r = 0, 1, 2, ..., n; q = 1 − p; 0 ≤ p ≤ 1. Mean = np, variance = np(1−p).

E.g.: 1. Efficiency/acceptance calculations. 2. Observe n W decays, out of which r are of a given type; p = the branching ratio.

Multinomial: distribution with m possible outcomes. For the ith outcome, p_i = success probability (all other outcomes count as failure); n_i is the random variable here, binomially distributed.

E.g. in a histogram with m bins and N total entries, the content of each bin is such a binomial random variable.

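For instance, an efficiency measured as r successes in n trials is binomial, and the binomial variance np(1−p) translates into the familiar error √(p(1−p)/n) on the estimate; a minimal sketch (made-up counts):

```python
import math

# Made-up counts: r events pass a selection out of n trials.
n, r = 2000, 1540
p_hat = r / n                                  # efficiency estimate

# Binomial variance np(1-p) gives an error on p_hat of sqrt(p(1-p)/n).
err = math.sqrt(p_hat * (1.0 - p_hat) / n)
print("efficiency = %.3f +/- %.3f" % (p_hat, err))
```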


Slide 19

Poisson
‘Events in a continuum’, e.g. the number of events in data. Mean rate ν in a time (or space) interval. Probability of finding exactly n events in a given interval:

P(n; ν) = ν^n e^(−ν) / n!

n = 0, 1, 2, ...; mean = ν, variance = ν.

E.g.: 1. cosmic muons reaching the lab; 2. Geiger counter clicks; 3. the number of scattering events n with cross-section σ for a given luminosity ∫L dt, with ν = σ ∫L dt.

Exponential

f(x; ξ) = (1/ξ) e^(−x/ξ)

x continuous; mean = ξ, variance = ξ².

E.g. the proper decay time t of an unstable particle, with lifetime τ: f(t; τ) = (1/τ) e^(−t/τ).


Slide 20

Two Poissons

2 Poisson sources, with means ν1 and ν2. Combine the samples, e.g.:
• leptonic and hadronic decays of the W;
• forward and backward muon pairs;
• tracks that trigger and tracks that don’t; ...

What you get is a convolution: P(r) = Σ_{r′} P(r′; ν1) P(r − r′; ν2).

It turns out this is also a Poisson, with mean ν1 + ν2!

This avoids a lot of worry!


Signal and background: each is an independent Poisson variable, so the total number of observed events is also Poisson distributed!

In an actual experiment: total number of observed events = expected signal + estimated background.
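The additivity is easy to verify by simulation; a sketch with arbitrary means:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
nu1, nu2 = 3.0, 5.0                      # arbitrary Poisson means

# Add two independent Poisson samples event count by event count.
total = rng.poisson(nu1, 100_000) + rng.poisson(nu2, 100_000)

# A Poisson with mean nu1 + nu2 = 8 has variance 8 as well.
print("mean = %.3f, variance = %.3f" % (total.mean(), total.var()))
```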


Slide 21

Gaussian
f = probability density for a continuous r.v. x, with mean = μ, variance = σ²:

f(x; μ, σ) = 1/(σ√(2π)) exp(−(x − μ)²/(2σ²));  −∞ < x < ∞; −∞ < μ < ∞; σ > 0

• Max. height = 1/(σ√(2π)).
• The height is reduced by a factor of √e (to ~61% of the maximum) at x = μ ± σ, the half width at half maximum being 1.177σ.

Special case, the standard/normalised Gaussian: if y is Gaussian with mean μ and variance σ², then x = (y − μ)/σ follows the standard Gaussian (mean 0, variance 1).

• Probability of x to be within μ ± 0.6745σ: 50%.

68.27% within ±1σ; 95.45% within ±2σ; 99.73% within ±3σ.
90% within ±1.645σ; 95% within ±1.960σ; 99% within ±2.576σ; 99.9% within ±3.290σ.

These numbers apply to Gaussians and only Gaussians!
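These coverages follow from the Gaussian CDF and can be reproduced with the error function alone; a small sketch:

```python
import math

def coverage(k):
    """Probability for a Gaussian variable to lie within +/- k sigma."""
    return math.erf(k / math.sqrt(2.0))

for k in (0.6745, 1.0, 1.645, 1.960, 2.0, 2.576, 3.0, 3.290):
    print("within +/- %.4f sigma: %.4f" % (k, coverage(k)))
```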


Slide 22

Central Limit Theorem: Why is the Gaussian Normal?

If a continuous random variable x is distributed according to any pdf with finite mean and variance, the sample mean of n observations of x will have a pdf which approaches a Gaussian for large n.

• If x_i is a set of N independent variables with mean μ and variance σ², then y = Σ x_i / N tends, for large N, to a Gaussian with mean μ and variance σ²/N.

Connections among the Gaussian, Binomial and Poisson distributions:

• Binomial → Poisson: p → 0, N → ∞, with Np = ν fixed.
• Poisson → Gaussian: ν → ∞.
• Binomial → Gaussian: N → ∞.
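A quick numerical illustration of the CLT: sample means of a decidedly non-Gaussian pdf (uniform) acquire the predicted Gaussian mean and variance (a sketch):

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# Uniform pdf on [0,1]: mean 1/2, variance 1/12 (nothing Gaussian here).
N = 50
means = rng.uniform(size=(100_000, N)).mean(axis=1)

# CLT prediction for the sample mean: mean 1/2, variance (1/12)/N.
print("sample: mean = %.4f, var = %.6f" % (means.mean(), means.var()))
print("CLT   : mean = 0.5000, var = %.6f" % (1.0 / 12.0 / N))
```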


Slide 23

For a large variety of situations, if the experiment is repeated many times and the quantity is measured accurately, without any bias, the results are distributed according to a Gaussian.

• Typically we assume that the form of the experimental resolution is Gaussian, which quite often may not be the case! This leads to an artificial enhancement of the significance of observed deviations.

It is also important to estimate the magnitude of the error correctly: if the errors are under-estimated by 50%, a 4σ effect may actually be a 2σ effect.


Slide 24

Multidimensional Gaussian

For a set of n Gaussian random variables, not necessarily independent, the joint pdf is a multivariate Gaussian:

P(x; μ, V) = 1/((2π)^(n/2) |V|^(1/2)) exp(−½ (x − μ)ᵀ V⁻¹ (x − μ))

V is the covariance matrix of the x's: symmetric, n×n, with V_ii = Var(x_i) and V_ij = ⟨(x_i − μ_i)(x_j − μ_j)⟩ = ρ_ij σ_i σ_j.

ρ_ij = correlation coeff. for x_i and x_j = cov(x_i, x_j)/(σ_i σ_j), with ρ_ij² ≤ 1; ρ_ij = 0 for x_i, x_j independent of each other.
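A quick sketch: sample a 2-D correlated Gaussian with numpy and check that the sample covariance reproduces V (all numbers arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=5)

mu = np.array([1.0, -2.0])
rho, sx, sy = 0.8, 1.0, 2.0                  # arbitrary choices
V = np.array([[sx**2,      rho*sx*sy],
              [rho*sx*sy,  sy**2    ]])

x = rng.multivariate_normal(mu, V, size=100_000)

# The sample covariance matrix should reproduce V.
print(np.cov(x, rowvar=False))
```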


Slide 25

With correlations:

P(x, y; μ_x, μ_y, σ_x, σ_y, ρ) = 1/(2π σ_x σ_y √(1−ρ²)) exp{ −1/(2(1−ρ²)) [ (x−μ_x)²/σ_x² − 2ρ(x−μ_x)(y−μ_y)/(σ_x σ_y) + (y−μ_y)²/σ_y² ] }

No correlation (ρ = 0):

P(x, y; μ_x, μ_y, σ_x, σ_y) = 1/(2π σ_x σ_y) exp{ −(x−μ_x)²/(2σ_x²) − (y−μ_y)²/(2σ_y²) }

Each elliptical contour corresponds to a fixed probability.


Slide 26


Slide 27

More on correlation between variables.

The covariance matrix plays a very important role in the propagation of errors when changing variables from x to y (to first order only!). Negative covariance means anti-correlation.
• The semi-axes of the ellipse are given by the square roots of the eigenvalues of the error matrix.

• The ellipse provides the likely range of the x, y values; they lie in a region smaller than the rectangle defined by the maxima of the x′, y′ values.
• For the case of 2 variables, a point X lies outside the 1-s.d. ellipsoid with probability 61%.


Slide 28

Chi-squared

z = χ² = sum of squared discrepancies, scaled by the expected errors:

χ² = Σ_{i=1}^{n} (x_i − μ_i)² / σ_i²

n = 1, 2, ... = number of degrees of freedom; the x_i are independent Gaussians. Used frequently to test goodness-of-fit.

z is a continuous random variable; mean = n, variance = 2n.

The confidence level is obtained by integrating the tail of the f distribution (from χ² up to ∞):

CL(χ²) = ∫_{χ²}^{∞} f(z; n) dz

The cumulative distribution of χ² is useful in judging the consistency of data with a model. Since the mean = n, a reasonable experiment should get χ² ≈ n.

Thus the reduced χ² (= χ²/n) is a useful measure!
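The tail integral CL(χ²) is a standard function; with SciPy (assuming it is available) it is a single call:

```python
from scipy.stats import chi2

chisq, ndf = 25.0, 20      # made-up fit outcome

# CL = integral of f(z; n) from chi^2 to infinity (survival function).
print("CL = %.3f" % chi2.sf(chisq, ndf))
print("reduced chi2 = %.2f" % (chisq / ndf))
```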


Slide 29


Slide 30


Slide 31

About Estimation

Probability calculus: Theory → Data. Given these distribution parameters, what can we say about the data?

Statistical inference: Data → Theory. Given this data, what can we say about the properties, parameters or correctness of the distribution functions?

Having estimated a parameter of the theory, we need to provide the error in the estimation as well.


Slide 32

What is an estimator?

An estimator is a procedure giving a value for a parameter or property of the distribution as a function of the actual data values {x_i}, e.g.:

μ̂{x} = (1/N) Σ_i x_i (the sample mean)

V̂{x} = (1/N) Σ_i (x_i − μ̂)² (the sample variance)

μ̂{x} = (x_max + x_min)/2 (the midrange)

A perfect estimator is consistent, unbiased and efficient.Often we have to deal with less than perfect estimator!

Consistent: lim_{N→∞} â = a.

Efficient: the variance V(â) = ⟨(â − ⟨â⟩)²⟩ attains the Minimum Variance Bound:

V(â) ≥ 1 / ⟨(d ln L / da)²⟩

Unbiased: ⟨â⟩ = ∫ â(x1, x2, ..., xN) P(x1; a) P(x2; a) ... P(xN; a) dx1 ... dxN = a.
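A toy comparison of two estimators of a Gaussian mean, the sample mean and the midrange (x_max + x_min)/2, over many pseudo-experiments: both come out unbiased here, but the sample mean has the smaller variance, i.e. it is more efficient (a sketch with arbitrary numbers):

```python
import numpy as np

rng = np.random.default_rng(seed=6)

# Many pseudo-experiments of N = 10 Gaussian measurements, true mean 0.
data = rng.normal(0.0, 1.0, size=(20_000, 10))

mean_est = data.mean(axis=1)                       # sample mean
midrange = 0.5 * (data.max(axis=1) + data.min(axis=1))

print("sample mean: bias = %+.4f, var = %.4f"
      % (mean_est.mean(), mean_est.var()))
print("midrange   : bias = %+.4f, var = %.4f"
      % (midrange.mean(), midrange.var()))
```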


Slide 33

The Likelihood FunctionSet of data {x1, x2, x3, …xN}:

• Each x may be multidimensional

• The probability depends on some parameter a; a may be multidimensional!

Total probability (density) The Likelihood

P(x1;a) P(x2;a) P(x3;a) …P(xN;a)=L(x1, x2, x3, …xN ;a)

Given data {x1, x2, x3, …xN}, estimate a by maximising L.

(Sketch: ln L as a function of a, with the maximum at the estimate â.)

In practice we usually maximise ln L, as it is easier to calculate and handle: just add up the ln P(x_i).

ML has lots of nice properties (eg., it is consistent and efficient for large N).


Slide 34

ML does not give goodness of fit!
• ML will not complain if your assumed P(x;a) is rubbish.
• The value of L at the maximum tells you nothing.
• Normalisation of L is important.
• Quote only the upper limit from the analysis.

E.g. fitting P(x) = a1 x + a0 will give a1 = 0; a constant P gives L = a0^N, just like you get from fitting a constant.



Slide 35

Lifetime distribution

pdf p(t;λ) = λ e -λt

So L(λ;t) = λ e –λt (single observed t)

Here both t and λ are continuous

The pdf maximises at t = 0,

while L maximises at λ = 1/t.

The functional forms of P(t) and L(λ) are different!

(Plots: P(t) vs t for fixed λ; L(λ) vs λ for fixed t.)
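For this exponential case, generalising to N observed decay times, the ML estimate has a closed form: maximising Σ_i ln(λ e^(−λ t_i)) gives λ̂ = 1/⟨t⟩. A minimal numerical check on toy data:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

tau_true = 2.0                          # true lifetime, arbitrary units
t = rng.exponential(tau_true, 5000)     # observed decay times

def neg_log_l(lam):
    # -ln L for the pdf p(t; lambda) = lambda * exp(-lambda * t)
    return -np.sum(np.log(lam) - lam * t)

# Closed-form ML solution: lambda_hat = 1/<t>, i.e. tau_hat = <t>.
lam_hat = 1.0 / t.mean()
print("tau_hat = %.3f (true %.1f)" % (1.0 / lam_hat, tau_true))

# Cross-check: -ln L is minimal there on a simple grid scan.
grid = np.linspace(0.3, 0.7, 401)
best = grid[np.argmin([neg_log_l(g) for g in grid])]
print("scan minimum at lambda = %.4f" % best)
```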


Slide 36

Least Squares
• Measurements of y_i at various x_i, with errors σ_i, and a prediction f(x; a).
• Probability: P(y_i) ∝ exp(−(y_i − f(x_i; a))²/(2σ_i²))
• ln L = −½ Σ_i (y_i − f(x_i; a))²/σ_i² + const
• To maximise ln L, minimise

χ² = Σ_i (y_i − f(x_i; a))²/σ_i²

So ML ‘proves’ Least Squares.

• Should get χ² ≈ 1 per data point:
N_degrees of freedom = N_data points − N_parameters.
This provides a ‘goodness of agreement’ figure which allows for credibility.
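A sketch of such a χ² (weighted least squares) fit of a straight line, done with the normal equations in numpy; the data points are invented:

```python
import numpy as np

# Invented data: y measured at several x, with errors sigma.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma = np.full_like(y, 0.3)

# Model f(x; a) = a0 + a1*x: minimising chi^2 is a weighted
# linear least-squares problem.
A = np.vstack([np.ones_like(x), x]).T / sigma[:, None]
b = y / sigma
a, *_ = np.linalg.lstsq(A, b, rcond=None)

chi2 = np.sum(((y - (a[0] + a[1] * x)) / sigma) ** 2)
ndf = len(x) - 2                       # 5 points - 2 parameters
print("a0 = %.3f, a1 = %.3f, chi2/ndf = %.2f" % (a[0], a[1], chi2 / ndf))
```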


Slide 37

Chi Squared Results

A large χ² comes from:
1. Bad measurements
2. Bad theory
3. Underestimated errors
4. Bad luck

A small χ² comes from:
1. Overestimated errors
2. Good luck


Extended Maximum Likelihood: allow the normalisation of P(x;a) to float; this predicts the numbers of events as well as their distributions. L must be modified by a term for the total number of events; the extra term stops the normalisation shooting up to infinity.


Slide 38

Variance of estimator

• One way to do this is to simulate the entire experiment many times with a Monte Carlo program (using the ML estimate as the MC input).

• Log-likelihood method: expand ln L(θ) about its maximum θ̂:

ln L(θ) = ln L(θ̂) + [∂ln L/∂θ]_{θ=θ̂} (θ − θ̂) + ½ [∂²ln L/∂θ²]_{θ=θ̂} (θ − θ̂)² + ...

The 2nd term is zero at the maximum.

To a good approximation, ln L(θ) ≈ ln L(θ̂) − (θ − θ̂)²/(2σ̂²).

Since ln L(θ̂ ± σ̂) ≈ ln L(θ̂) − 1/2: basically, change θ until ln L decreases by 1/2.

For the least squares estimator: χ²(θ̂ ± σ̂) = χ²_min + 1.
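A sketch of this Δln L = 1/2 rule for the exponential-lifetime example above, finding the interval edges by a simple grid scan over toy data:

```python
import numpy as np

rng = np.random.default_rng(seed=8)
t = rng.exponential(2.0, 1000)          # toy decay times, tau_true = 2

def log_l(tau):
    # ln L for p(t; tau) = (1/tau) * exp(-t/tau)
    return np.sum(-np.log(tau) - t / tau)

tau_hat = t.mean()                       # ML estimate for this pdf
taus = np.linspace(0.8 * tau_hat, 1.25 * tau_hat, 2001)
lnl = np.array([log_l(tau) for tau in taus])

# Keep the region where ln L is within 1/2 of its maximum.
inside = taus[lnl >= log_l(tau_hat) - 0.5]
print("tau = %.3f  -%.3f / +%.3f"
      % (tau_hat, tau_hat - inside.min(), inside.max() - tau_hat))
```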


Slide 39

Hypothesis Testing
• Consider a set of measurements x = (x1, ..., xn) pertaining to a particular subset of events.

• x_i may refer to the number of muons in the event, the transverse energy of the leading jet, the missing transverse energy, and so on.

• f(x) refers to the n-dim. joint pdf, which depends on the type of event actually produced.

For each reaction we consider, we will have a hypothesis for the pdf of x, e.g. f(x|H0), f(x|H1), and so on, where the H_i refer to the different possibilities. Say H0 corresponds to the Higgs boson and H1, H2, ... to backgrounds.

Now, each event is a point in x-space, so we put a set of criteria/cuts on a test statistic t(x), and work out the pdfs of t such that the sample space is divided into 2 regions, where we either accept or reject H0.


Slide 40

Level of Significance and Efficiency

Probability to reject H0 if it is true: the significance level α. This is an error of the 1st kind.

Probability to accept H0 when H1 is true: β; the power of the test = 1 − β. Accepting H0 when H1 is true is an error of the 2nd kind.

Probability to accept a background event: the background efficiency.

Probability to accept a signal event: the signal efficiency.

The purity of the selected sample depends on the prior probabilities as well as on the efficiencies.
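A toy illustration of these definitions with a one-dimensional test statistic and a single cut; the signal and background distributions and the cut value below are all invented:

```python
import numpy as np

rng = np.random.default_rng(seed=9)

# Toy test statistic t: H0 = signal, H1 = background (invented shapes).
t_sig = rng.normal(2.0, 1.0, 100_000)
t_bkg = rng.normal(0.0, 1.0, 100_000)

t_cut = 1.0                                  # accept H0 if t > t_cut

alpha = np.mean(t_sig <= t_cut)              # error of the 1st kind
beta = np.mean(t_bkg > t_cut)                # error of the 2nd kind

print("signal efficiency     = %.3f" % (1.0 - alpha))
print("background efficiency = %.3f" % beta)
print("significance level alpha = %.3f, power = %.3f" % (alpha, 1.0 - beta))
```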