Introduction to Statistical Methods for Data Analysis


Transcript of Introduction to Statistical Methods for Data Analysis

Page 1: Introduction to Statistical Methods for Data Analysis

Dr Lorenzo Moneta, CERN PH-SFT
CH-1211 Geneva 23
sftweb.cern.ch
root.cern.ch

Data Analysis Tutorial at UERJ 2015

Page 2: Outline

Lorenzo Moneta CERN PH-SFT Data Analysis Tutorial at UERJ 2015: Introduction to Statistics

• Probability definition
• Probability Density Functions
• Some typical distributions
• Bayes Theorem
• Parameter Estimation
• Hypothesis Testing

Page 3: References

• A lot of the material for this introduction to statistical methods is extracted from a course:
  –Statistical Methods for Data Analysis (Luca Lista, INFN Napoli)
  –Material also available in his book
    • Statistical Methods for Data Analysis in Particle Physics (Springer) – http://www.springer.com/us/book/9783319201757
• Another suggested book is
  –Data Analysis in High Energy Physics (Wiley)

Page 4: Definition Of Probability

• Two main different definitions:
  –Frequentist
    • Probability is the ratio of the number of occurrences of an event to the total number of experiments, in the limit of a very large number of repeatable experiments.
    • Can only be applied to a specific class of events (repeatable experiments)
    • Meaningless to state: “probability that the lightest SUSY particle’s mass is less than 1 TeV”
  –Bayesian
    • Probability measures someone’s degree of belief that something is or will be true: would you bet?
      – Probability that Barcelona will win the next Champions League

Page 5: Classical Probability

• Assume all accessible cases are equally probable
• Valid for discrete cases only
  –Problem in continuous cases (definition of the metric)
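
The counting rule behind the classical definition can be sketched in a few lines: the probability of an event is the number of favourable cases divided by the number of equally probable cases. The two-dice example and the helper name `classical_probability` are illustrative assumptions and are not taken from the slides.

```python
from fractions import Fraction
from itertools import product


def classical_probability(cases, is_favourable):
    """Favourable cases / all cases, assuming every case is equally probable."""
    cases = list(cases)
    favourable = sum(1 for case in cases if is_favourable(case))
    return Fraction(favourable, len(cases))


# All 36 equally probable outcomes of rolling two six-sided dice.
two_dice = list(product(range(1, 7), repeat=2))

# Probability that the two dice sum to seven (6 favourable cases out of 36).
p_sum_seven = classical_probability(two_dice, lambda d: d[0] + d[1] == 7)
print(p_sum_seven)
```

Exact fractions are used so that the result is a ratio of counts, as in the definition. The approach does not carry over to continuous variables, where "equally probable cases" depends on the choice of metric.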

Page 6: Binomial Distribution

• Distribution of the number of successes in N trials
  –e.g. tossing a coin or rolling a die N times
• Each trial has a probability p of success
• Average: <n> = Np
• Variance: <n²> - <n>² = Np(1-p)
• Used for efficiency estimates
• In ROOT it is available as

ROOT::Math::binomial_pdf(n,p,N)
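
As a cross-check of the average and variance quoted above, the binomial probabilities can be written out directly. This is a plain-Python sketch and not the ROOT implementation; the function name `binomial_pmf` is an assumption, with its arguments in the same order as `ROOT::Math::binomial_pdf(n,p,N)`. The values N = 20 and p = 0.3 are arbitrary.

```python
from math import comb


def binomial_pmf(n, p, N):
    """Probability of n successes in N trials, each with success probability p."""
    return comb(N, n) * p**n * (1.0 - p) ** (N - n)


N, p = 20, 0.3
pmf = [binomial_pmf(n, p, N) for n in range(N + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(n * w for n, w in enumerate(pmf))       # <n>
variance = sum(n * n * w for n, w in enumerate(pmf)) - mean**2  # <n^2> - <n>^2

print(f"sum = {total:.6f}")
print(f"<n> = {mean:.4f}, Np = {N * p:.4f}")
print(f"variance = {variance:.4f}, Np(1-p) = {N * p * (1 - p):.4f}")
```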

Page 7: Frequentist Probability

• Law of large numbers
• this means also that
• circular definition of probabilities
  –a phenomenon can be proven to be random only if we observe infinite cases
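
The frequentist definition can be illustrated with a small simulation: the observed frequency of an event approaches its probability as the number of trials grows. This is a sketch using Python's standard `random` module; the fair-coin value p = 0.5, the seed and the helper name `observed_frequency` are assumptions made for illustration.

```python
import random


def observed_frequency(p, n_trials, rng):
    """Fraction of successes in n_trials independent trials with probability p."""
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials


rng = random.Random(12345)  # fixed seed so the run is reproducible
p_true = 0.5                # e.g. a fair coin

for n_trials in (10, 1_000, 100_000):
    freq = observed_frequency(p_true, n_trials, rng)
    print(f"N = {n_trials:>6}: frequency = {freq:.4f}, "
          f"|frequency - p| = {abs(freq - p_true):.4f}")
```

Any finite run only shows the frequency getting close to p, which is the practical side of the circularity noted above.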

Page 8: Conditional Probability

• Probability of A, given B: P(A | B)
  –probability that an event known to belong to set B is also a member of set A
  –P(A | B) = P(A ∩ B) / P(B)
  –A is independent of B if the conditional probability of A given B is equal to the probability of A:
    • P(A | B) = P(A)
  –Hence, if A is independent of B
    • P(A ∩ B) = P(A) P(B)
  –If A is independent of B, B is independent of A
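
The definition P(A | B) = P(A ∩ B) / P(B) and the independence condition can be checked on a finite sample space by counting. The two-dice events A, B and C below are hypothetical examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equally probable outcomes of two dice.
outcomes = set(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) for an event given as a subset of the sample space."""
    return Fraction(len(event), len(outcomes))


def conditional(a, b):
    """P(A | B) = P(A intersect B) / P(B)."""
    return prob(a & b) / prob(b)


A = {o for o in outcomes if o[0] == 6}           # first die shows 6
B = {o for o in outcomes if o[1] % 2 == 0}       # second die is even
C = {o for o in outcomes if o[0] + o[1] >= 10}   # sum is at least 10

print(conditional(A, B), prob(A))        # equal: A is independent of B
print(prob(A & B) == prob(A) * prob(B))  # the product rule holds for A and B
print(conditional(A, C), prob(A))        # different: A is not independent of C
```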

Page 9: Probability Density Functions (PDF)

Page 10: Gaussian (Normal) Distribution

• Average = µ
• Variance = σ²
• Widely used because of the central limit theorem

[Figure: Gaussian PDF, PDF(x) versus x, for (µ=0, σ=0.3), (µ=0, σ=1), (µ=0, σ=3) and (µ=-2, σ=1)]

TMath::Gaus(x, μ, σ, true)
ROOT::Math::normal_pdf(x, σ, μ)
TF1 f("f", "gausn", xmin, xmax);
x = gRandom->Gaus(μ, σ);

N.B. “gausn” for a normalised (PDF) Gaussian

Page 11: Central limit theorem

• The sum of n random variables x_i converges to a Gaussian, irrespective of the original distributions of the variables x_i (only some basic regularity conditions must hold)
  –∑ x_i → Gaussian
  –Example: adding n flat distributions

[Figure: distributions of <x> for n = 2 and n = 5 (x is uniform in [0,10]), each with a Gaussian fit]

n = 2: χ² / ndf = 422.9 / 97, Constant = 190.8 ± 2.3, Mean = 4.989 ± 0.022, Sigma = 2.031 ± 0.015
n = 5: χ² / ndf = 87.47 / 83, Constant = 306.4 ± 3.7, Mean = 5.011 ± 0.013, Sigma = 1.293 ± 0.009
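
The flat-distribution example can be reproduced approximately with a short simulation. This is a sketch in plain Python rather than the ROOT macro used for the slides; the sample size and seed are arbitrary assumptions. For x uniform in [0,10] the standard deviation is 10/√12, so the mean of n values should have a standard deviation of (10/√12)/√n, about 2.04 for n = 2 and 1.29 for n = 5.

```python
import random
import statistics
from math import sqrt


def sample_means(n, n_samples, rng, low=0.0, high=10.0):
    """Means of n uniform numbers in [low, high], repeated n_samples times."""
    return [sum(rng.uniform(low, high) for _ in range(n)) / n
            for _ in range(n_samples)]


rng = random.Random(2015)          # fixed seed for reproducibility
sigma_uniform = 10.0 / sqrt(12.0)  # standard deviation of a uniform in [0, 10]

for n in (2, 5):
    means = sample_means(n, 20_000, rng)
    print(f"n = {n}: mean = {statistics.fmean(means):.3f}, "
          f"sigma = {statistics.pstdev(means):.3f}, "
          f"expected sigma = {sigma_uniform / sqrt(n):.3f}")
```

The script compares only the mean and the width. Judging how Gaussian the shape is would need a histogram and a fit, as in the figure.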

Page 12: Uniform (“flat”) distribution

• Standard deviation: σ = (b - a)/√12
• Model for the position of rain drops, the time of cosmic ray passage, etc.
• Basic distribution for pseudo-random number generation

ROOT::Math::uniform_pdf(x, a, b)
x = gRandom->Uniform(a, b);

Page 13: Cumulative Distribution

• Given a PDF f(x), the cumulative distribution is defined as F(x) = ∫_{-∞}^{x} f(x') dx'
• F(x) is uniformly distributed in [0,1]
• Inverting the cumulative distribution one can generate pseudo-random numbers according to any distribution
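
A minimal sketch of generating random numbers by inverting the cumulative distribution, using an exponential PDF as the example because its cumulative can be inverted analytically. The choice of distribution, the parameter name `tau` and the helper name `sample_exponential` are assumptions for illustration.

```python
import random
import statistics
from math import log


def sample_exponential(tau, rng):
    """Draw x from f(x) = exp(-x/tau)/tau by inverting F(x) = 1 - exp(-x/tau)."""
    u = rng.random()            # F(x) is uniformly distributed in [0, 1)
    return -tau * log(1.0 - u)  # x = F^-1(u)


rng = random.Random(42)  # fixed seed for reproducibility
tau = 2.0
xs = [sample_exponential(tau, rng) for _ in range(50_000)]

# The mean of an exponential distribution is tau.
print(f"sample mean = {statistics.fmean(xs):.3f} (expected {tau})")
```

When the cumulative cannot be inverted in closed form, the inversion has to be done numerically or another sampling method is needed.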

Page 14: Example of Cumulative Distributions

[Figure: three panels: normal_pdf versus x; normal_cdf and normal_cdf_c (p versus x); normal_quantile and normal_quantile_c (x versus p)]

• Probability density function
  – ROOT::Math::normal_pdf(x,σ,μ)
• Cumulative distribution and its complement (right tail integral)
  – ROOT::Math::normal_cdf(x,σ,μ)
  – ROOT::Math::normal_cdf_c(x,σ,μ)
• Inverse of the cumulative distributions (quantile functions)
  – ROOT::Math::normal_quantile(p,σ)
  – ROOT::Math::normal_quantile_c(p,σ)
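
The relation between the PDF, the cumulative distribution, its complement and the quantile functions can be tried out without ROOT. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for the ROOT functions listed above. Note that `NormalDist` takes (mu, sigma), whereas the ROOT functions take σ before μ. The evaluation point x = 1 is arbitrary.

```python
from statistics import NormalDist

mu, sigma = 0.0, 1.0
gauss = NormalDist(mu, sigma)

x = 1.0
pdf = gauss.pdf(x)   # probability density at x
cdf = gauss.cdf(x)   # left tail integral, P(X <= x)
cdf_c = 1.0 - cdf    # complement: right tail integral, P(X > x)

# The quantile functions invert the two integrals and give back x.
quantile = gauss.inv_cdf(cdf)            # x such that the left tail equals cdf
quantile_c = gauss.inv_cdf(1.0 - cdf_c)  # x such that the right tail equals cdf_c

print(f"pdf = {pdf:.5f}, cdf = {cdf:.5f}, cdf_c = {cdf_c:.5f}")
print(f"quantile = {quantile:.5f}, quantile_c = {quantile_c:.5f}")
```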

Page 15: Poisson Distribution

• Probability to have n entries in x, a small subset of X (X >> x)
• Limit of the binomial distribution when p = x/X = ν/N << 1
  –P(n | ν, N) for N → ∞ is a Poisson(n | ν)
  –Limit of Poisson for large ν is a Gaussian

ROOT::Math::poisson_pdf(n,ν)
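
The binomial limit can be checked numerically by keeping ν = Np fixed while N grows. This is a plain-Python sketch; the function names and the value ν = 3 are assumptions for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(n, p, N):
    """Probability of n successes in N trials with success probability p."""
    return comb(N, n) * p**n * (1.0 - p) ** (N - n)


def poisson_pmf(n, nu):
    """Poisson probability of observing n entries for an expected value nu."""
    return nu**n * exp(-nu) / factorial(n)


def max_difference(nu, N, n_max=10):
    """Largest |binomial - Poisson| over n = 0..n_max, with p = nu / N."""
    p = nu / N
    return max(abs(binomial_pmf(n, p, N) - poisson_pmf(n, nu))
               for n in range(n_max + 1))


nu = 3.0
for N in (10, 100, 10_000):
    print(f"N = {N:>6}: max |binomial - Poisson| = {max_difference(nu, N):.2e}")
```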

Page 16: Poisson limit for large ν

• Poisson becomes a Gaussian for large ν
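
A rough numerical illustration of this limit compares Poisson(n | ν) with a Gaussian of mean ν and standard deviation √ν, since the Poisson variance equals ν. The comparison range of ±5σ and the normalisation to the Gaussian peak are choices made for this sketch.

```python
from math import exp, lgamma, log, pi, sqrt


def poisson_pmf(n, nu):
    """Poisson probability, computed via logarithms to stay stable for large nu."""
    return exp(n * log(nu) - nu - lgamma(n + 1))


def gaussian_pdf(x, mu, sigma):
    """Normalised Gaussian PDF."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def max_difference(nu):
    """Largest |Poisson - Gaussian| within 5 sigma, relative to the Gaussian peak."""
    sigma = sqrt(nu)
    lo, hi = max(0, int(nu - 5 * sigma)), int(nu + 5 * sigma)
    peak = gaussian_pdf(nu, nu, sigma)
    return max(abs(poisson_pmf(n, nu) - gaussian_pdf(n, nu, sigma))
               for n in range(lo, hi + 1)) / peak


for nu in (2.0, 20.0, 200.0):
    print(f"nu = {nu:>5}: max relative difference = {max_difference(nu):.3f}")
```

The difference shrinks slowly with ν because the Poisson distribution stays asymmetric, with a skewness of 1/√ν.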

Page 17: Crystal Ball Function

• Adds an asymmetric power-law tail to a Gaussian PDF, with proper normalisation and continuity of the PDF and its derivative

ROOT::Math::crystalball_pdf(x, α, n, σ, μ)
TF1 f("f", "crystalballn", xmin, xmax)

Page 18: Landau Distribution

• Models the fluctuations in the energy loss of particles in thin layers

ROOT::Math::landau_pdf(x, s, m)
TF1 f("f", "landaun", xmin, xmax)

Page 19: Bayes Theorem

Page 20: A concrete example

• A person received a diagnosis of a serious illness
• The probability to detect positively an ill person is ~100%
• The probability to give a positive result on a healthy person is 0.2%
• What is the probability that the person is really ill?
• Is 99.8% a reasonable answer?

Page 21: Result using Bayes theorem

• We know:
  –P(+ | ill) ~ 100 % → P(- | ill) << 1
  –P(+ | healthy) = 0.2 % → P(- | healthy) = 99.8 %
• Using Bayes theorem we want to know
  –P(ill | +) = P(+ | ill) P(ill) / P(+) ~ P(ill) / P(+)
• We need to know
  –P(ill) = probability that a random person is ill << 1
  –P(healthy) = 1 - P(ill)
• We have also
  –P(+) = P(+ | ill) P(ill) + P(+ | healthy) P(healthy) ~ P(ill) + P(+ | healthy)
• Result: P(ill | +) ~ P(ill) / (P(ill) + P(+ | healthy))

Page 22: Result from Bayes theorem (2)

• Result:
  • P(ill | +) ~ P(ill) / (P(ill) + P(+ | healthy))
• Using some numbers
  • P(ill) = 0.1 %
  • P(+ | healthy) = 0.2 %
• Then we have:
  • P(ill | +) = 0.1 / (0.1 + 0.2) ≈ 33 %
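
The same numbers can be put through Bayes theorem without the approximations P(+ | ill) ~ 1 and P(healthy) ~ 1. In this sketch P(+ | ill) is set to exactly 1, which is an assumption (the slides only say ~100 %), and the function name `posterior_ill` is hypothetical.

```python
def posterior_ill(p_ill, p_pos_given_ill, p_pos_given_healthy):
    """P(ill | +) from Bayes theorem."""
    p_healthy = 1.0 - p_ill
    # Total probability of a positive result.
    p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * p_healthy
    return p_pos_given_ill * p_ill / p_pos


exact = posterior_ill(p_ill=0.001, p_pos_given_ill=1.0, p_pos_given_healthy=0.002)
approx = 0.001 / (0.001 + 0.002)  # approximate formula from the slide

print(f"exact P(ill | +) = {exact:.4f}, approximation = {approx:.4f}")
```

The two values agree to about three decimal places, so the approximation is adequate here. The result is far from 99.8 % because healthy people vastly outnumber ill ones.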

Page 23: Likelihood Function

• Likelihood function:
  –given some observed events: x1,…, xn
  –the likelihood function is the PDF of the variables x1,…, xn
  –L(x1,…, xn | θ1,…, θm)
• Bayes theorem can be written as
  P(θ1,…, θm | x1,…, xn) = L(x1,…, xn | θ1,…, θm) π(θ1,…, θm) / ∫ L(x1,…, xn | θ'1,…, θ'm) π(θ'1,…, θ'm) dθ'1 … dθ'm
  –posterior = likelihood function × prior probability / normalisation term
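
A small numerical sketch of posterior = likelihood × prior / normalisation for a single parameter. The setup is hypothetical: three independent Poisson counts with a common unknown mean ν, a flat prior, and a simple Riemann sum on a grid for the normalisation integral. None of these choices come from the slides.

```python
from math import exp, factorial


def likelihood(counts, nu):
    """L(n1..nk | nu): product of Poisson probabilities for independent counts."""
    result = 1.0
    for n in counts:
        result *= nu**n * exp(-nu) / factorial(n)
    return result


def posterior_on_grid(counts, prior, grid):
    """Posterior density on an evenly spaced grid: likelihood * prior / normalisation."""
    step = grid[1] - grid[0]
    unnormalised = [likelihood(counts, nu) * prior(nu) for nu in grid]
    norm = sum(unnormalised) * step  # Riemann sum for the normalisation term
    return [u / norm for u in unnormalised]


counts = [3, 5, 4]                          # hypothetical observed counts
grid = [0.01 * i for i in range(1, 2001)]   # nu from 0.01 to 20
posterior = posterior_on_grid(counts, lambda nu: 1.0, grid)  # flat prior

nu_mode = grid[posterior.index(max(posterior))]
print(f"posterior mode = {nu_mode:.2f}, sample mean = {sum(counts) / len(counts):.2f}")
```

With a flat prior the posterior is proportional to the likelihood, so its mode coincides with the value of ν that maximises the likelihood, which here is the sample mean.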

Page 24: Repeated use of Bayes theorem

Page 25: Bayesian Inference

• The posterior summarises all information on the unknown parameters θ given the data
• From the posterior one can estimate best parameter values and probability intervals (credible intervals)
• The result depends on the prior distribution

Page 26: How to compute the Posterior PDF

• Perform analytical integration
  –feasible in very few simple cases
• Use numerical integration
  –may be CPU intensive
  –difficult for large multi-dimensional cases
• Markov Chain Monte Carlo
  • sample parameter space efficiently using a random walk heading to the regions of higher probability
  • Metropolis algorithm to sample according to a PDF f(x)
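
The random-walk idea can be sketched with a one-dimensional Metropolis sampler. This is a minimal illustration and not the RooStats implementation. The Gaussian proposal, the step size, the burn-in length and the test target (an unnormalised Gaussian with mean 1 and sigma 2) are all assumptions.

```python
import random
import statistics
from math import exp


def metropolis(log_f, x0, step, n_steps, rng):
    """Sample from a density proportional to exp(log_f(x)) with a random walk."""
    chain = []
    x, log_fx = x0, log_f(x0)
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)  # symmetric proposal around the current point
        log_f_new = log_f(x_new)
        # Accept with probability min(1, f(x_new) / f(x)); otherwise stay at x.
        if log_f_new >= log_fx or rng.random() < exp(log_f_new - log_fx):
            x, log_fx = x_new, log_f_new
        chain.append(x)
    return chain


def log_target(x):
    """Logarithm of an unnormalised Gaussian with mean 1 and sigma 2."""
    return -0.5 * ((x - 1.0) / 2.0) ** 2


rng = random.Random(7)  # fixed seed for reproducibility
chain = metropolis(log_target, x0=0.0, step=2.0, n_steps=60_000, rng=rng)
samples = chain[10_000:]  # drop the first steps as burn-in

print(f"mean = {statistics.fmean(samples):.2f}, sigma = {statistics.pstdev(samples):.2f}")
```

Only ratios of f enter the acceptance step, so the normalisation term of the posterior never has to be computed. Working with log f avoids numerical underflow when f is very small.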

Page 27: Markov-Chain Monte Carlo

Available in ROOT in the RooStats package

Page 28: Problem with Bayesian approach

• Bayesian probability is subjective
  –depends on prior probabilities or degrees of belief about the unknown parameters
• Problem of how to represent lack of knowledge
  –e.g. a uniform distribution is not invariant under coordinate transformations
    • uniform in log θ is scale-invariant
  –Jeffreys prior: prior invariant under parameter transformation
• A study of the sensitivity of the result to the chosen prior PDF is recommended
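
A simple prior-sensitivity study can be sketched for a binomial probability such as an efficiency, where the posterior has a closed form: a Beta(a, b) prior with k successes in n trials gives a Beta(k + a, n - k + b) posterior. A uniform prior is Beta(1, 1) and the Jeffreys prior for this problem is Beta(1/2, 1/2). The observed counts below are hypothetical.

```python
def posterior_mean(k, n, a, b):
    """Posterior mean of a binomial probability for a Beta(a, b) prior.

    With k successes in n trials the posterior is Beta(k + a, n - k + b).
    """
    return (k + a) / (n + a + b)


priors = {
    "uniform, Beta(1, 1)": (1.0, 1.0),
    "Jeffreys, Beta(1/2, 1/2)": (0.5, 0.5),
}

# Compare a small and a large data sample with the same observed fraction k/n.
for k, n in ((1, 5), (200, 1000)):
    for name, (a, b) in priors.items():
        mean = posterior_mean(k, n, a, b)
        print(f"k/n = {k}/{n}, prior {name}: posterior mean = {mean:.4f}")
```

With little data the choice of prior moves the result visibly, while with a large sample the two priors give nearly the same answer.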

Page 29: Frequentist vs Bayesian Inference

Page 30: Parameter Estimation

• Parameter estimate
• Likelihood function
• Maximum Likelihood method
• Properties of estimators

Page 31: Statistical Inference

Page 32: Parameter estimators

Page 33: Likelihood Function

Page 34: Maximum Likelihood Estimates

Page 35: Gaussian approximation

Page 36: Estimator properties

• Consistency
• Bias
• Efficiency
• Robustness

Page 37: Estimator consistency

Page 38: Bias

Page 39: Efficiency

Page 40: Robustness

Page 41: Parameter uncertainties with ML

Page 42: Error Determination

Page 43: Hypothesis Testing

• Definition of hypothesis testing
• Neyman-Pearson lemma and Likelihood ratio

Page 44: Hypothesis Tests

Page 45: Hypothesis Test

• H0: null hypothesis
  –the hypothesis we want to prove false
  –e.g. the data contains only background (no Higgs signal)
• H1: alternate hypothesis
  –e.g. the data contains signal (Higgs) and background
• α: significance level, the probability to reject H0 if it is true (error of the first kind)
  –α = misidentification probability
• β: probability to accept H0 if H1 is true (error of the second kind)
  –power (probability to reject H0 if H1 is true) = 1 - β
  –β = 1 - selection efficiency
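
The two error probabilities can be made concrete with a toy cut on a single test statistic. The sketch uses the standard convention, with α the probability to reject H0 when H0 is true and β the probability to keep H0 when H1 is true. The Gaussian shapes for the test statistic under H0 and H1, and the cut values, are hypothetical choices.

```python
from statistics import NormalDist

# Hypothetical test statistic t: Gaussian under both hypotheses.
h0 = NormalDist(mu=0.0, sigma=1.0)  # background only
h1 = NormalDist(mu=3.0, sigma=1.0)  # signal + background


def error_rates(t_cut):
    """Reject H0 when t > t_cut; return (alpha, beta, power)."""
    alpha = 1.0 - h0.cdf(t_cut)  # H0 rejected although H0 is true
    beta = h1.cdf(t_cut)         # H0 kept although H1 is true
    return alpha, beta, 1.0 - beta


for t_cut in (1.0, 1.5, 2.0):
    alpha, beta, power = error_rates(t_cut)
    print(f"cut at {t_cut}: alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Moving the cut trades one error for the other: a tighter cut lowers α but raises β, so the power drops.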

Page 46: Example: Cut analysis

Page 47: Likelihood Ratio

Page 48: Neyman-Pearson Lemma

Page 49: Summary

• In the next lectures we will look at how to
  – use multivariate (machine learning) methods to do classification and more
  – estimate the parameter uncertainty (errors) in maximum likelihood fits
  – estimate confidence intervals
  – use hypothesis tests to estimate the discovery significance of new particles
• We will complement this with examples in TMVA, RooFit and RooStats