Practical Statistics for Particle Physicists Lecture 1
Harrison B. Prosper, Florida State University
European School of High-Energy Physics, Parádfürdő, Hungary
5 – 18 June, 2013
Outline
- Lecture 1: Descriptive Statistics; Probability & Likelihood
- Lecture 2: The Frequentist Approach; The Bayesian Approach
- Lecture 3: Analysis Example
Descriptive Statistics
Descriptive Statistics – 1
Definition: A statistic is any function of the data X. Given a sample X = x1, x2, …, xN, it is often of interest to compute statistics such as the sample average

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

and the sample variance

$$S^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

In any analysis, it is good practice to study ensemble averages, denoted by ⟨…⟩, of relevant statistics.
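As a quick illustration (mine, not the lecture's), both statistics can be computed directly with NumPy; note that np.var with ddof=0 matches the 1/N convention used above.

```python
import numpy as np

# Draw a sample X = x1, ..., xN (Gaussian with mean 10, sigma 2,
# chosen arbitrarily for the example).
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=10.0, scale=2.0, size=100)

N = len(x)
xbar = x.sum() / N                   # sample average
S2 = ((x - xbar) ** 2).sum() / N     # sample variance (1/N convention)

# np.mean and np.var(ddof=0) implement exactly these definitions.
assert np.isclose(xbar, np.mean(x))
assert np.isclose(S2, np.var(x, ddof=0))
print(f"sample average  = {xbar:.3f}")
print(f"sample variance = {S2:.3f}")
```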
Descriptive Statistics – 2
Ensemble averages of a statistic x, where μ denotes the true value:

Mean: $\langle x \rangle$
Error: $x - \mu$
Bias: $b = \langle x \rangle - \mu$
Variance: $V = \langle (x - \langle x \rangle)^2 \rangle$
Mean square error: $\text{MSE} = \langle (x - \mu)^2 \rangle$
Descriptive Statistics – 3
The MSE is the most widely used measure of closeness of an ensemble of statistics {x} to the true value μ. It decomposes into a variance term and a bias-squared term:

$$\text{MSE} = V + b^2$$

The root mean square (RMS) is

$$\text{RMS} = \sqrt{\text{MSE}}$$

Exercise 1: Show this.
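A minimal Monte Carlo sketch of this decomposition (my illustration, not the lecture's), using a deliberately biased statistic invented for the example:

```python
import numpy as np

# Monte Carlo check of MSE = V + b^2. The statistic is a shifted sample
# mean, deliberately biased (by +0.3) for the true value mu.
rng = np.random.default_rng(seed=2)
mu, sigma, N, n_ensemble = 5.0, 1.0, 20, 200_000

samples = rng.normal(mu, sigma, size=(n_ensemble, N))
stat = samples.mean(axis=1) + 0.3        # one statistic per ensemble member

mse = np.mean((stat - mu) ** 2)          # MSE = <(x - mu)^2>
b = np.mean(stat) - mu                   # bias b = <x> - mu
V = np.var(stat)                         # V = <(x - <x>)^2>

print(f"MSE     = {mse:.5f}")
print(f"V + b^2 = {V + b**2:.5f}")       # agrees up to floating point
print(f"RMS     = {np.sqrt(mse):.5f}")
```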
Descriptive Statistics – 4
Consider the ensemble average of the sample variance. First expand the sample variance:

$$S^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \frac{2}{N}\sum_{i=1}^{N}x_i\,\bar{x} + \frac{1}{N}\sum_{i=1}^{N}\bar{x}^2$$

$$= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \bar{x}^2 = \overline{x^2} - \bar{x}^2$$
Descriptive Statistics – 5
The ensemble average of the sample variance,

$$\langle S^2 \rangle = \langle \overline{x^2} \rangle - \langle \bar{x}^2 \rangle = \langle x^2 \rangle - \frac{\langle x^2 \rangle}{N} - \left(\frac{N-1}{N}\right)\langle x \rangle^2 = V - \frac{V}{N},$$

has a negative bias of −V / N.

Exercise 2: Show this.
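A numerical check of this bias (a simulation, not the proof the exercise asks for):

```python
import numpy as np

# Estimate <S^2> over many samples of size N and compare with V - V/N.
rng = np.random.default_rng(seed=3)
V, N, n_ensemble = 4.0, 5, 500_000       # true variance V = 4

samples = rng.normal(0.0, np.sqrt(V), size=(n_ensemble, N))
S2 = samples.var(axis=1, ddof=0)         # 1/N sample variance of each sample

print(f"<S^2>   = {S2.mean():.4f}")      # close to 3.2
print(f"V - V/N = {V - V / N:.4f}")      # (N-1)/N * V = 3.2
```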
Descriptive Statistics – Summary

The sample average

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

is an unbiased estimate of the ensemble average, while the sample variance

$$S^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

is a biased estimate of the ensemble variance.
Probability
Probability – 1
Basic Rules
1. P(A) ≥ 0
2. P(A) = 1 if A is true
3. P(A) = 0 if A is false

Sum Rule
4. P(A+B) = P(A) + P(B) if AB is false *

Product Rule
5. P(AB) = P(A|B) P(B) *

* A+B = A or B, AB = A and B, A|B = A given that B is true
Probability – 2

By definition, the conditional probability of A given B is

$$P(A \mid B) = \frac{P(AB)}{P(B)}$$

P(A) is the probability of A without restriction, whereas P(A|B) is the probability of A when we restrict to the conditions under which B is true. Likewise,

$$P(B \mid A) = \frac{P(AB)}{P(A)}$$
Probability – 3

From

$$P(AB) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

we deduce Bayes' theorem:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$
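A small numeric illustration of Bayes' theorem; all probabilities below are invented for the example:

```python
# Bayes' theorem with invented numbers. Take B = "the collision produced
# something interesting" and A = "the event passed a selection".
p_B = 1e-6              # P(B): prior probability (hypothetical)
p_A_given_B = 0.5       # P(A|B): selection efficiency (hypothetical)
p_A_given_notB = 1e-4   # P(A|not B): background pass rate (hypothetical)

# P(A) via the sum and product rules: P(A) = P(A|B)P(B) + P(A|not B)P(not B)
p_A = p_A_given_B * p_B + p_A_given_notB * (1.0 - p_B)

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(f"P(B|A) = {p_B_given_A:.3e}")  # ~5e-3: selection boosts the probability
```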
Probability – 4

A and B are mutually exclusive if

P(AB) = 0

A and B are exhaustive if

P(A) + P(B) = 1

Theorem:

$$P(A + B) = P(A) + P(B) - P(AB)$$

Exercise 3: Prove the theorem.
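A Monte Carlo sanity check of the theorem (not a proof), with two overlapping events defined on a uniform random number:

```python
import numpy as np

# Monte Carlo check of P(A + B) = P(A) + P(B) - P(AB), with two
# overlapping events defined on a uniform random number u.
rng = np.random.default_rng(seed=4)
u = rng.uniform(size=1_000_000)

A = u < 0.5          # A: u in [0, 0.5)
B = u > 0.3          # B: u in (0.3, 1); overlaps A on (0.3, 0.5)

lhs = np.mean(A | B)                            # P(A + B)
rhs = np.mean(A) + np.mean(B) - np.mean(A & B)  # P(A) + P(B) - P(AB)
print(f"P(A+B) = {lhs:.4f}, P(A)+P(B)-P(AB) = {rhs:.4f}")
```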
Probability: Binomial & Poisson Distributions
Binomial & Poisson Distributions – 1

A Bernoulli trial has two outcomes: S = success or F = failure.

Example: Each collision between protons at the LHC is a Bernoulli trial in which something interesting happens (S) or does not (F).
Binomial & Poisson Distributions – 2

Let p be the probability of a success, which is assumed to be the same at each trial. Since S and F are exhaustive, the probability of a failure is 1 − p. For a given order O of n trials, the probability Pr(k, O | n) of exactly k successes and n − k failures is

$$\Pr(k, O \mid n) = p^k (1-p)^{n-k}$$
Binomial & Poisson Distributions – 3

If the order O of successes and failures is irrelevant, we can eliminate the order from the problem by summing over all possible orders:

$$\Pr(k \mid n) = \sum_O \Pr(k, O \mid n) = \sum_O p^k (1-p)^{n-k}$$

Since every order has the same probability and there are $\binom{n}{k}$ of them, this yields the binomial distribution,

$$\text{Binomial}(k, n, p) \equiv \binom{n}{k} p^k (1-p)^{n-k},$$

which is sometimes written as k ~ Binomial(n, p).
Binomial & Poisson Distributions – 4

We can prove that the mean number of successes a is

a = p n

Exercise 4: Prove it.

Suppose that the probability p of a success is very small. Then, in the limit p → 0 and n → ∞ such that a is constant, Binomial(k, n, p) → Poisson(k, a).

Exercise 5: Show that Binomial(k, n, p) → Poisson(k, a).

The Poisson distribution is generally regarded as a good model for a counting experiment.
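The limit in Exercise 5 is easy to see numerically; a sketch using scipy.stats:

```python
import numpy as np
from scipy.stats import binom, poisson

# Compare Binomial(k, n, p) with Poisson(k, a) at fixed a = n*p
# as n -> infinity and p -> 0.
a = 3.0
k = np.arange(10)
for n in (10, 100, 10_000):
    p = a / n
    diff = np.max(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, a)))
    print(f"n = {n:6d}: max |Binomial - Poisson| = {diff:.2e}")
```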
Common Distributions and Densities

$$\begin{aligned}
\text{Uniform}(x, a) &= 1/a \\
\text{Gaussian}(x, \mu, \sigma) &= \exp[-(x-\mu)^2 / (2\sigma^2)] \,/\, (\sigma\sqrt{2\pi}) \\
\text{Chisq}(x, n) &= x^{n/2-1} \exp(-x/2) \,/\, [2^{n/2}\,\Gamma(n/2)] \\
\text{Gamma}(x, a, b) &= x^{b-1} a^b \exp(-ax) \,/\, \Gamma(b) \\
\text{Exp}(x, a) &= a \exp(-ax) \\
\text{Binomial}(k, n, p) &= \binom{n}{k} p^k (1-p)^{n-k} \\
\text{Poisson}(k, a) &= a^k \exp(-a) \,/\, k! \\
\text{Multinomial}(k, n, p) &= \frac{n!}{k_1! \cdots k_K!} \prod_{i=1}^{K} p_i^{k_i}, \quad \sum_{i=1}^{K} p_i = 1, \quad \sum_{i=1}^{K} k_i = n
\end{aligned}$$
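For reference, each entry has a counterpart in scipy.stats; beware that scipy parameterizes by loc/scale, so a rate parameter a above maps to scale = 1/a:

```python
from scipy import stats

# scipy.stats counterparts of the table above.
x, k, n, p, a, b, mu, sigma = 1.2, 3, 10, 0.3, 2.0, 4.0, 0.5, 1.5

print(stats.uniform.pdf(x, loc=0, scale=a))    # Uniform(x, a) = 1/a on [0, a]
print(stats.norm.pdf(x, loc=mu, scale=sigma))  # Gaussian(x, mu, sigma)
print(stats.chi2.pdf(x, df=n))                 # Chisq(x, n)
print(stats.gamma.pdf(x, a=b, scale=1.0 / a))  # Gamma(x, a, b): shape b, rate a
print(stats.expon.pdf(x, scale=1.0 / a))       # Exp(x, a) = a exp(-ax)
print(stats.binom.pmf(k, n, p))                # Binomial(k, n, p)
print(stats.poisson.pmf(k, a))                 # Poisson(k, a)
print(stats.multinomial.pmf([2, 3, 5], n=10, p=[0.2, 0.3, 0.5]))  # Multinomial
```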
Probability – What is it Exactly?
There are at least two interpretations of probability:

1. Degree of belief in, or plausibility of, a proposition.
Example: It will snow in Geneva on Friday.

2. Relative frequency of outcomes in an infinite sequence of identically repeated trials.
Example:
trials: proton-proton collisions at the LHC
outcome: the creation of a Higgs boson
Likelihood
Likelihood – 1

The likelihood function is simply the probability, or probability density function (pdf), evaluated at the observed data.

Example 1: Top quark discovery (D0, 1995)

p(D | d) = Poisson(D | d) — the probability to get a count D
p(17 | d) = Poisson(17 | d) — the likelihood of the observation D = 17

where

$$\text{Poisson}(D \mid d) = \frac{e^{-d}\, d^D}{D!}$$
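A sketch of this likelihood in Python: with D fixed at 17, Poisson(17 | d) becomes a function of the unknown expected count d, maximized at d = 17:

```python
import numpy as np
from scipy.stats import poisson

D = 17  # the observed count, now fixed

# The likelihood is Poisson(17 | d) viewed as a function of the unknown
# expected count d.
d = np.linspace(5.0, 35.0, 301)
L = poisson.pmf(D, d)

print(f"likelihood is maximized near d = {d[np.argmax(L)]:.1f}")  # d = 17
```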
Likelihood – 2

Example 2: Multiple counts D_i with a fixed total count N,

$$p(D \mid p) = \text{Multinomial}(D, N, p), \quad D = D_1, \ldots, D_K, \quad p = p_1, \ldots, p_K, \quad \sum_{i=1}^{K} D_i = N$$

This is an example of a multi-binned likelihood.
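A minimal evaluation of such a multinomial likelihood with scipy.stats (counts and bin probabilities invented for illustration):

```python
import numpy as np
from scipy.stats import multinomial

# Observed bin counts D with fixed total N = sum(D), and candidate bin
# probabilities p (all values invented for illustration).
D = np.array([12, 30, 8])
N = D.sum()
p = np.array([0.25, 0.55, 0.20])

print(f"multinomial likelihood = {multinomial.pmf(D, n=N, p=p):.4e}")
```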
Likelihood – 3

Example 3: Red shift and distance modulus measurements of N = 580 Type Ia supernovae,

$$p(D \mid \Omega_M, \Omega_D, Q) = \prod_{i=1}^{N} \text{Gaussian}(x_i, \mu(z_i, \Omega_M, \Omega_D, Q), \sigma_i), \quad D = \{z_i, x_i \pm \sigma_i\}$$

This is an example of an un-binned likelihood.
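Schematically, such an un-binned likelihood is a product of Gaussian densities, one per supernova. The sketch below uses a hypothetical toy model mu_model in place of the real cosmological distance-modulus prediction mu(z, Omega_M, Omega_D, Q):

```python
import numpy as np
from scipy.stats import norm

# Un-binned likelihood: one Gaussian density per measured supernova.
# mu_model is a hypothetical toy stand-in for the real distance-modulus
# prediction; it is NOT the actual cosmological model.
def mu_model(z, theta):
    return theta[0] + theta[1] * np.log10(z)

def log_likelihood(theta, z, x, sigma):
    return np.sum(norm.logpdf(x, loc=mu_model(z, theta), scale=sigma))

# toy data (invented): redshifts z and distance moduli x with errors sigma
z = np.array([0.02, 0.10, 0.50, 1.00])
x = np.array([34.5, 38.2, 42.1, 44.0])
sigma = np.array([0.20, 0.15, 0.25, 0.30])
print(f"log L = {log_likelihood([43.0, 5.0], z, x, sigma):.2f}")
```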
Likelihood – 4

Example 4: Higgs to γγ. The discovery of the neutral Higgs boson in the di-photon final state made use of an un-binned likelihood,

$$p(x \mid s, m, w, b) = \exp[-(s + b)] \prod_{i=1}^{N} \left[ s\, f_s(x_i \mid m, w) + b\, f_b(x_i) \right]$$

where
x = di-photon masses
m = mass of new particle
w = width of resonance
s = expected signal
b = expected background
f_s = signal model
f_b = background model

Exercise 6: Show that a binned multi-Poisson likelihood yields an un-binned likelihood of this form as the bin widths go to zero.
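A sketch of evaluating an extended likelihood of this form, with stand-in Gaussian signal and exponential background shapes (the real analysis uses more elaborate, properly normalized models):

```python
import numpy as np
from scipy.stats import norm, expon

# Extended un-binned likelihood
#   p(x | s, m, w, b) = exp[-(s+b)] * prod_i [ s f_s(x_i|m,w) + b f_b(x_i) ]
# with stand-in shapes: Gaussian signal f_s, exponential background f_b.
# (A real analysis would normalize both pdfs on the fit window.)
def log_likelihood(x, s, m, w, b):
    f_s = norm.pdf(x, loc=m, scale=w)
    f_b = expon.pdf(x - 100.0, scale=50.0)
    return -(s + b) + np.sum(np.log(s * f_s + b * f_b))

# toy di-photon masses in GeV (invented)
x = np.array([118.0, 124.8, 125.3, 126.1, 133.5, 141.2])
print(f"log L = {log_likelihood(x, s=3.0, m=125.0, w=1.5, b=3.0):.2f}")
```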
Likelihood – 5
Given the likelihood function we can answer questions such as:
1. How do I estimate a parameter?
2. How do I quantify its accuracy?
3. How do I test a hypothesis?
4. How do I quantify the significance of a result?

Writing down the likelihood function requires:
1. identifying all that is known, e.g., the observations
2. identifying all that is unknown, e.g., the parameters
3. constructing a probability model for both
Likelihood – 6
Example: Top Quark Discovery (1995), D0 Results

knowns:
D = 17 events
B = 3.8 ± 0.6 background events

unknowns:
b = expected background count
s = expected signal count
d = b + s = expected event count

Note: we are uncertain about the unknowns, so 17 ± 4.1 is a statement about d, not about the observed count 17!
Likelihood – 7
Probability:

$$p(D \mid s, b) = \text{Poisson}(D, s + b)\, \text{Poisson}(Q, bk) = \frac{(s+b)^D e^{-(s+b)}}{D!} \cdot \frac{(bk)^Q e^{-bk}}{\Gamma(Q+1)}$$

Likelihood:

p(17 | s, b)

where B = Q / k and δB = √Q / k, which give

Q = (B / δB)² = (3.8 / 0.6)² = 40.11
k = B / δB² = 3.8 / 0.6² = 10.56
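A sketch of this likelihood in Python. Since Q is not an integer, Q! is implemented via Γ(Q + 1) using scipy.special.gammaln:

```python
import numpy as np
from scipy.special import gammaln

D, B, dB = 17, 3.8, 0.6
Q = (B / dB) ** 2   # = 40.11
k = B / dB ** 2     # = 10.56

def log_p(s, b):
    # log of Poisson(D, s+b) * Poisson(Q, b*k); Gamma(Q+1) replaces Q!
    # because Q is not an integer
    d = s + b
    return (D * np.log(d) - d - gammaln(D + 1)
            + Q * np.log(b * k) - b * k - gammaln(Q + 1))

print(f"Q = {Q:.2f}, k = {k:.2f}")
print(f"log p(17 | s=13.2, b=3.8) = {log_p(13.2, 3.8):.3f}")
```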
Summary
Statistic: A statistic is any function of potential observations.

Probability: Probability is an abstraction that must be interpreted.

Likelihood: The likelihood is the probability (or probability density) of potential observations evaluated at the observed data.
Tutorials
Location: http://www.hep.fsu.edu/~harry/ESHEP13

Download tutorials.tar.gz and unpack it:

tar zxvf tutorials.tar.gz

Needed: a recent version of ROOT linked with RooFit and TMVA.