Statistics for HEP Roger Barlow Manchester University


Slide 1

Statistics for HEP

Roger Barlow

Manchester University

Lecture 2: Distributions


Slide 2

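The Binomial

n trials, r successes

Individual success probability p

$P(r; n, p) = \frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r}$

Notation: $1-p \equiv q$

Mean

$\mu = \langle r \rangle = \sum_r r\,P(r) = np$

Variance

$V = \langle (r-\mu)^2 \rangle = \langle r^2 \rangle - \langle r \rangle^2 = np(1-p)$

[Figure: binomial distribution P(r) against r, for r = 0 to 5]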

Met with in Efficiency/Acceptance calculations


Slide 3

Binomial Examples

[Figures: binomial distributions P(r) against r, for n=10 with p=0.2, p=0.5, p=0.8, and for p=0.1 with n=5, n=20, n=50]


Slide 4

Poisson

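‘Events in a continuum’, e.g. Geiger Counter clicks

Mean rate $\lambda$ in time interval

Gives number of events in data

$P(r; \lambda) = \frac{e^{-\lambda}\,\lambda^r}{r!}$

Mean

$\mu = \langle r \rangle = \sum_r r\,P(r) = \lambda$

Variance

$V = \langle (r-\mu)^2 \rangle = \langle r^2 \rangle - \langle r \rangle^2 = \lambda$

[Figure: Poisson distribution P(r) against r for $\lambda$ = 2.5]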


Slide 5

Poisson Examples

[Figures: Poisson distributions P(r) against r for $\lambda$ = 0.5, 1.0, 2.0, 5.0, 10 and 25]


Slide 6

Binomial and Poisson

From an exam paper

A student is standing by the road, hoping to hitch a lift. Cars pass according to a Poisson distribution with a mean frequency of 1 per minute. The probability of an individual car giving a lift is 1%. Calculate the probability that the student is still waiting for a lift

(a) After 60 cars have passed

(b) After 1 hour

(a) $0.99^{60} = 0.5472$

(b) $e^{-0.6} = 0.5488$


Slide 7

Poisson as approximate binomial

[Figure: P(r) against r, comparing Poisson with mean 2; Binomial with p=0.1, 20 tries; Binomial with p=0.01, 200 tries]

Use: MC simulation (Binomial) of Real data (Poisson)


Slide 8

Two Poissons

2 Poisson sources, means $\lambda_1$ and $\lambda_2$

Combine samples, e.g.

• leptonic and hadronic decays of W

• Forward and backward muon pairs

• Tracks that trigger and tracks that don’t

What you get is a Convolution

$P(r) = \sum_{r'=0}^{r} P(r'; \lambda_1)\, P(r-r'; \lambda_2)$

Turns out this is also a Poisson with mean $\lambda_1 + \lambda_2$

Avoids lots of worry
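
A short worked derivation of that result, writing the two means as $\lambda_1$ and $\lambda_2$ (the symbol choice is an assumption here) and using the binomial theorem in the last step:

```latex
\sum_{r'=0}^{r} \frac{e^{-\lambda_1}\lambda_1^{r'}}{r'!}\,
                \frac{e^{-\lambda_2}\lambda_2^{\,r-r'}}{(r-r')!}
 = \frac{e^{-(\lambda_1+\lambda_2)}}{r!}
   \sum_{r'=0}^{r} \binom{r}{r'} \lambda_1^{r'} \lambda_2^{\,r-r'}
 = \frac{e^{-(\lambda_1+\lambda_2)}\,(\lambda_1+\lambda_2)^{r}}{r!}
```

The right-hand side is the Poisson formula with the single mean $\lambda_1+\lambda_2$.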


Slide 9

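The Gaussian

Probability Density

$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$

Mean

$\langle x \rangle = \int x\,P(x)\,dx = \mu$

Variance

$V = \langle (x-\mu)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2 = \sigma^2$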


Slide 10

Different Gaussians

There’s only one!

Normalisation (if required)

Location change $\mu$

Width scaling factor $\sigma$

Falls to 1/e of peak at $x = \mu \pm \sqrt{2}\,\sigma$


Slide 11

Probability Contents

68.27% within $1\sigma$

95.45% within $2\sigma$

99.73% within $3\sigma$

90% within $1.645\sigma$

95% within $1.960\sigma$

99% within $2.576\sigma$

99.9% within $3.290\sigma$

These numbers apply to Gaussians and only Gaussians

Other distributions have equivalent values which you could use if you wanted


Slide 12

Central Limit Theorem

Or: why is the Gaussian Normal?

If a variable x is produced by the convolution of variables $x_1, x_2, \ldots, x_N$

I) $\langle x \rangle = \mu_1 + \mu_2 + \ldots + \mu_N$

II) $V(x) = V_1 + V_2 + \ldots + V_N$

III) P(x) becomes Gaussian for large N

There were hints in the Binomial and Poisson examples


Slide 13

CLT demonstration

Convolute Uniform distribution with itself

[Figures: histograms of the uniform distribution convoluted with itself successively]
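
The demonstration can be sketched numerically, assuming a discrete uniform distribution over a fixed number of bins. The code below convolutes it with itself and returns the mean and variance of the result, so that the additivity of means and variances can be checked directly. All function names are illustrative.

```c
#include <stdlib.h>

/* Discrete convolution: out[i+j] += a[i]*b[j]; out needs na+nb-1 bins. */
void convolve(const double *a, int na, const double *b, int nb, double *out)
{
    for (int k = 0; k < na + nb - 1; k++) out[k] = 0.0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            out[i + j] += a[i] * b[j];
}

/* Uniform distribution over nbins bins, convoluted with itself until
   `times` copies have been combined. Returns a malloc'ed array holding
   times*(nbins-1)+1 bins (caller frees), or NULL on allocation failure. */
double *self_convolve_uniform(int nbins, int times, int *nout)
{
    int size = times * (nbins - 1) + 1;
    double *u = malloc(nbins * sizeof *u);
    double *cur = malloc(size * sizeof *cur);
    double *tmp = malloc(size * sizeof *tmp);
    if (!u || !cur || !tmp) { free(u); free(cur); free(tmp); return NULL; }
    for (int i = 0; i < nbins; i++) u[i] = cur[i] = 1.0 / nbins;
    int n = nbins;
    for (int t = 1; t < times; t++) {
        convolve(cur, n, u, nbins, tmp);
        n += nbins - 1;
        for (int i = 0; i < n; i++) cur[i] = tmp[i];
    }
    free(u); free(tmp);
    *nout = n;
    return cur;
}

/* Mean bin index of the convoluted distribution */
double self_convolved_mean(int nbins, int times)
{
    int n; double m = 0.0;
    double *p = self_convolve_uniform(nbins, times, &n);
    if (!p) return 0.0;
    for (int i = 0; i < n; i++) m += i * p[i];
    free(p);
    return m;
}

/* Variance of the bin index of the convoluted distribution */
double self_convolved_variance(int nbins, int times)
{
    int n; double m = 0.0, m2 = 0.0;
    double *p = self_convolve_uniform(nbins, times, &n);
    if (!p) return 0.0;
    for (int i = 0; i < n; i++) { m += i * p[i]; m2 += (double)i * i * p[i]; }
    free(p);
    return m2 - m * m;
}
```

For 10 bins, a single uniform has mean 4.5 and variance 8.25. Combining three copies gives mean 13.5 and variance 24.75, as the addition rules require.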


Slide 14

CLT Proof (I) Characteristic functions

Given P(x) consider

$\langle e^{ikx} \rangle = \int e^{ikx}\,P(x)\,dx = \tilde{P}(k)$

The Characteristic Function

For convolutions, CFs multiply

If $f(x) = g(x) \otimes h(x)$ then $\tilde{f}(k) = \tilde{g}(k)\,\tilde{h}(k)$

Logs of CFs Add


Slide 15

CLT proof (2) Cumulants

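CF is a power series in k

$\langle 1 \rangle + \langle ikx \rangle + \langle (ikx)^2/2! \rangle + \langle (ikx)^3/3! \rangle + \ldots$

$= 1 + ik\langle x \rangle - k^2\langle x^2 \rangle/2! - ik^3\langle x^3 \rangle/3! + \ldots$

Ln CF can then be expanded as a series

$ikK_1 + (ik)^2 K_2/2! + (ik)^3 K_3/3! + \ldots$

$K_r$: the “semi-invariant cumulants of Thiele”

Total power of $x^r$

If $x \to x+a$ then only $K_1 \to K_1 + a$

If $x \to bx$ then each $K_r \to b^r K_r$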
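
As a worked step not spelled out on the slide, expanding the logarithm to second order with $\ln(1+z) = z - z^2/2 + \ldots$ identifies the first two cumulants:

```latex
\ln\left(1 + ik\langle x\rangle - \frac{k^2\langle x^2\rangle}{2!} + \ldots\right)
 = ik\langle x\rangle - \frac{k^2\langle x^2\rangle}{2}
   - \frac{\left(ik\langle x\rangle\right)^2}{2} + O(k^3)
 = ik\langle x\rangle
   + \frac{(ik)^2}{2!}\left(\langle x^2\rangle - \langle x\rangle^2\right) + O(k^3)
```

So $K_1 = \langle x \rangle$ is the mean and $K_2 = \langle x^2 \rangle - \langle x \rangle^2$ is the variance.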


Slide 16

CLT proof (3)

• The FT of a Gaussian is a Gaussian

$e^{-x^2/2\sigma^2} \to e^{-k^2\sigma^2/2}$

Taking logs gives power series up to $k^2$

$K_r = 0$ for $r > 2$ defines a Gaussian

• Self-convolute anything n times: $K_r' = nK_r$

Need to normalise: divide by n

$K_r'' = n^{-r} K_r' = n^{1-r} K_r$

• Higher Cumulants die away faster

If the distributions are not identical but similar the same argument applies


Slide 17

CLT in real life

Examples

• Height

• Simple Measurements

• Student final marks

Counterexamples

• Weight

• Wealth

• Student entry grades


Slide 18

Multidimensional Gaussian

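Uncorrelated:

$P(x, y; \mu_x, \mu_y, \sigma_x, \sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y}\, e^{-(x-\mu_x)^2/2\sigma_x^2}\, e^{-(y-\mu_y)^2/2\sigma_y^2}$

With correlation $\rho$:

$P(x, y; \mu_x, \mu_y, \sigma_x, \sigma_y, \rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho\,(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} \right] \right\}$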


Slide 19

Chi squared

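Sum of squared discrepancies, scaled by expected error

$\chi^2 = \sum_{i=1}^{n} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2$

Integrate all but 1-D of multi-D Gaussian

$P(\chi^2; n) = \frac{2^{-n/2}}{\Gamma(n/2)}\, \chi^{n-2}\, e^{-\chi^2/2}$

Mean n

Variance 2n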

CLT slow to operate


Slide 20

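Generating Distributions

Given int rand() in stdlib.h (log() comes from math.h)

    #include <stdlib.h>
    #include <math.h>

    // uniform in (0,1]: never exactly 0, so log() in life() stays finite
    float Random()
      {return (float)((rand()+1.0)/(RAND_MAX+2.0));}
    // uniform between lo and hi
    float uniform(float lo, float hi)
      {return lo+Random()*(hi-lo);}
    // exponential decay time with mean lifetime tau
    float life(float tau)
      {return -tau*log(Random());}
    // sum of 12 uniforms: mean 6, variance 1
    float ugauss() // really crude. Do not use
      {float x=0; for(int i=0;i<12;i++) x+=Random(); return x-6;}
    // Gaussian with mean mu and standard deviation sigma
    float gauss(float mu, float sigma) {return mu+sigma*ugauss();}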


Slide 21

A better Gaussian Generator

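    // needs math.h (sqrt, sin, cos, M_PI) and, in C, stdbool.h for bool
    float ugauss(){
      static bool igot=false;   // true when a spare value is cached
      static float got;         // the cached second value
      if(igot){igot=false; return got;}
      float phi=uniform(0.0F,2*M_PI);
      // radius of a unit 2-D Gaussian: r*r/2 is exponential with mean 1
      float r=sqrt(2*life(1.0f));
      igot=true;
      got=r*cos(phi);
      return r*sin(phi);}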


Slide 22

More complicated functions

Find P0=max[P(x)]. Overestimate if in doubt

Repeat:

    Repeat:

        Generate random x

        Find P(x)

        Generate random P in range 0-P0

    till P(x)>P

till you have enough data

If necessary make x non-random and compensate
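
The accept/reject loop can be sketched in C as follows. The target shape P(x) = 1 + x² on -1 ≤ x ≤ 1, with P0 = 2, is an assumption chosen purely for illustration, and `target`, `generate` and `mean_square` are hypothetical names.

```c
#include <stdlib.h>

/* Uniform random number in [0,1] */
double Random(void) { return rand() / (double)RAND_MAX; }

/* Example target shape (an assumption for illustration): P(x) = 1 + x*x
   on -1 <= x <= 1, whose maximum is P0 = 2 at the ends of the range. */
double target(double x) { return 1.0 + x * x; }

/* Return one x in [lo,hi] distributed as P(x), given P0 >= max P(x). */
double generate(double (*P)(double), double lo, double hi, double P0)
{
    for (;;) {
        double x = lo + Random() * (hi - lo);   /* generate random x */
        double trial = Random() * P0;            /* random P in range 0-P0 */
        if (P(x) > trial) return x;              /* till P(x) > P */
    }
}

/* Sample mean of x*x over n generated values. For this target the exact
   value is (2/3 + 2/5)/(2 + 2/3) = 0.4, which gives a simple check. */
double mean_square(int n)
{
    double s2 = 0.0;
    for (int i = 0; i < n; i++) {
        double x = generate(target, -1.0, 1.0, 2.0);
        s2 += x * x;
    }
    return s2 / n;
}
```

Overestimating P0 keeps the method correct but lowers the acceptance rate, so more trials are thrown away per generated value.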


Slide 23

Other distributions

Uniform (top hat)

$\sigma = \mathrm{width}/\sqrt{12}$

Breit Wigner (Cauchy)

Has no variance; useful for wide tails

Landau

Has no variance or mean

Not given by $\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\lambda + e^{-\lambda}\right)}$. Use CERNLIB


Slide 24

Functions you need to know

• Gaussian/Chi squared

• Poisson

• Binomial

• Everything else