1
Statistics In HEP 2
Helge Voss, Hadron Collider Physics Summer School, June 8-17, 2011 ― Statistics in HEP
How do we understand/interpret our measurements?
2
Outline
Maximum Likelihood fit
strict frequentist: Neyman confidence intervals
what “bothers” people with them
Feldman/Cousins confidence belts/intervals
Bayesian treatment of ‘unphysical’ results
How the LEP-Higgs limit was derived
what about systematic uncertainties? → Profile Likelihood
3
Parameter Estimation
want to measure/estimate some parameter θ, e.g. mass, polarisation, etc.
observe: n independent events with observables x1, …, xn
“hypothesis”, i.e. PDF f(x; θ): the distribution of x for given θ, e.g. a differential cross section
independent events: P(x1, …, xn; θ) = ∏i f(xi; θ)
for the fixed observations, regard P(x1, …, xn; θ) as a function of θ (i.e. the Likelihood L(θ)); for θ close to the true value, L(θ) will be large
(figure: Glen Cowan, Statistical Data Analysis)
try to maximise L(θ); typically:
minimize −2 ln L(θ)
→ Maximum Likelihood estimator
4
Maximum Likelihood Estimator
example: PDF(x) = Gauss(x; μ, σ)
estimator for μ from the n data points measured in an experiment
full Likelihood: L(μ) = ∏i Gauss(xi; μ, σ)
typically one plots −2 ln L(μ) — note: it is a function of μ! Its minimum gives the best estimate μ̂.
For Gaussian PDFs: μ̂ = (1/n) ∑i xi, the sample mean
variance on the estimate from the interval where Δ(−2 ln L) = 1: σμ̂ = σ/√n
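The Gaussian-mean example above can be sketched numerically. A minimal toy (not from the slides; all numbers illustrative): generate data, take the ML estimate, and check that −2 ln L rises by exactly 1 at μ̂ ± σ/√n.

```python
# Sketch: maximum-likelihood estimate of a Gaussian mean mu, with the
# uncertainty read off from the Delta(-2 ln L) = 1 interval.
import math
import random

random.seed(42)
sigma, mu_true, n = 1.0, 5.0, 100
data = [random.gauss(mu_true, sigma) for _ in range(n)]

def nll2(mu):
    """-2 ln L(mu) for n independent Gaussian measurements (constants dropped)."""
    return sum((x - mu) ** 2 / sigma ** 2 for x in data)

# Analytic ML estimator for a Gaussian mean: the sample mean.
mu_hat = sum(data) / n

# The interval where -2 ln L rises by 1 above its minimum gives the
# one-standard-deviation error; analytically sigma / sqrt(n).
err = sigma / math.sqrt(n)
assert abs(nll2(mu_hat + err) - nll2(mu_hat) - 1.0) < 1e-9
print(f"mu_hat = {mu_hat:.3f} +/- {err:.3f}")
```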
5
Parameter Estimation
properties of estimators:
biased or unbiased? large or small variance? → distribution of θ̂ over many (hypothetical) measurements
(figure: Glen Cowan — distribution of θ̂ around θtrue)
Small bias and small variance are typically “in conflict”; Maximum Likelihood is in general unbiased only in the large-sample limit
If the Likelihood function is “Gaussian” (often the case for large n: central limit theorem):
get the “error” estimate from Δ(−2 ln L) = 1
if (very) non-Gaussian:
revert typically to (classical) Neyman confidence intervals
6
Classical Confidence Intervals
another way to look at a measurement, rigorously “frequentist”: Neyman’s confidence belt for CL α (e.g. 90%)
(figure from Feldman/Cousins (1998): x_measured on the horizontal axis, hypothetically true μ on the vertical axis)
each hypothetically true μ has a PDF of how the measured values will be distributed
determine the (central) intervals (“acceptance regions”) in these PDFs such that they contain probability α
do this for ALL hypothetically true μ
connect all the “red dots” → confidence belt
measure xobs: the confidence interval [μ1, μ2] is given by the vertical line intersecting the belt
by construction: for each xmeas (distributed according to PDF(x|μtrue)) the confidence interval [μ1, μ2] contains μtrue in α = 90% of cases
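The construction above can be checked with a toy (a sketch, assuming the simplest case x ~ Gauss(μ, 1) with central acceptance regions): invert the belt for an observed x and verify the claimed 90% coverage.

```python
# Toy sketch of the Neyman construction for x ~ Gauss(mu, 1) with
# central 90% acceptance regions; all numbers are illustrative.
import random

random.seed(1)
Z90 = 1.645  # central 90%: each Gaussian tail holds 5%

def acceptance_region(mu_true):
    """Central interval in x containing 90% of P(x | mu_true)."""
    return (mu_true - Z90, mu_true + Z90)

def confidence_interval(x_obs):
    """Invert the belt: all mu whose acceptance region contains x_obs."""
    return (x_obs - Z90, x_obs + Z90)

# Coverage check: by construction the interval contains the true mu
# in ~90% of repeated (hypothetical) measurements.
mu_true, n_toys, covered = 3.0, 100_000, 0
for _ in range(n_toys):
    x = random.gauss(mu_true, 1.0)
    lo, hi = confidence_interval(x)
    covered += lo <= mu_true <= hi
print(f"coverage = {covered / n_toys:.3f}")  # close to 0.90
```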
7
Classical Confidence Intervals
another way to look at a measurement, rigorously “frequentist”: Neyman’s confidence belt for CL α (e.g. 90%)
(figure from Feldman/Cousins (1998): x_measured on the horizontal axis, hypothetically true μ on the vertical axis)
the confidence interval [μ1, μ2] is given by the vertical line intersecting the belt; by construction:
if the true value were μ0, xmeas falls inside μ0’s acceptance region in 90% of cases (that’s how the belt was constructed)
only those xmeas give intervals [μ1, μ2] that intersect the μ0 line
→ 90% of the intervals cover μ0
if P(x|μ) is Gaussian → the central 68% Neyman confidence intervals coincide with the Maximum Likelihood estimate ± its “error” estimate
8
Flip-Flop
Feldman/Cousins(1998)
When to quote a measurement or a limit? Estimate a Gaussian distributed quantity that cannot be < 0 (e.g. a mass); same Neyman confidence belt construction as before:
once for a measurement (two-sided, each tail contains 5%)
once for a limit (one-sided, tail contains 10%)
decide: if x < 0, assume x = 0 (“conservative”)
if you observe x < 3σ, quote an upper limit only
if you observe x > 3σ, quote a measurement
this “flip-flopping” induces undercoverage: the resulting acceptance region contains only 85%!
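The 85% figure can be reproduced with a toy (a sketch, assuming σ = 1 and the 90% CL quoted on the slide): apply the flip-flop policy to many pseudo-measurements and count how often the quoted interval covers the truth.

```python
# Toy illustration of the "flip-flop" undercoverage: quote a 90% upper
# limit when x < 3, a 90% central interval when x >= 3, and clamp
# negative measurements to 0.  For mu_true ~ 2 the quoted intervals
# cover the truth in only ~85% of experiments.
import random

random.seed(7)
Z_ONE_SIDED = 1.282   # 90% one-sided upper limit
Z_CENTRAL = 1.645     # 90% central interval (5% in each tail)

def flip_flop_interval(x):
    x = max(x, 0.0)                        # "if < 0 assume 0"
    if x < 3.0:
        return (0.0, x + Z_ONE_SIDED)      # quote upper limit only
    return (x - Z_CENTRAL, x + Z_CENTRAL)  # quote a measurement

mu_true, n_toys = 2.0, 100_000
covered = sum(
    lo <= mu_true <= hi
    for lo, hi in (flip_flop_interval(random.gauss(mu_true, 1.0))
                   for _ in range(n_toys))
)
print(f"coverage = {covered / n_toys:.3f}")  # ~0.85, not the claimed 0.90
```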
9
Some things people don’t like..
Feldman/Cousins(1998).
same example: estimate Gaussian distributed quantity that cannot be < 0 (e.g. mass)
using the proper confidence belt, assume you measure a value far in the unphysical region:
→ the confidence interval is EMPTY!
Note: that’s OK in the frequentist interpretation: the intervals cover the true value in 90% of (hypothetical) measurements; obviously we were ‘unlucky’ to pick one of the remaining 10%
nonetheless: tempted to “flip-flop”? tsz… tsz… tsz…
10
Feldman Cousins: a Unified Approach
How we determine the “acceptance” region for each hypothetically true μ is up to us, as long as it covers the desired integral of size α (e.g. 90%)
include those xmeas with the largest likelihood ratio first:
R = L(xmeas | μ) / L(xmeas | μbest)
where μbest is either the best estimate given the observation, or the closest ALLOWED value if that estimate is unphysical
fill the acceptance region in decreasing order of R until it contains α = 90%
→ no “empty intervals” anymore!
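The ordering rule above can be sketched for the bounded-Gaussian case (an illustrative grid-based toy, assuming x ~ Gauss(μ, 1) with the physical constraint μ ≥ 0):

```python
# Sketch of the Feldman-Cousins ordering: rank x values by
# R = L(x | mu) / L(x | mu_best), with mu_best = max(x, 0) the closest
# ALLOWED estimate, and accept the highest-R values until 90% is collected.
import math

def gauss_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fc_acceptance(mu, alpha=0.90, lo=-10.0, hi=10.0, dx=0.001):
    xs = [lo + i * dx for i in range(int((hi - lo) / dx))]
    ranked = sorted(
        xs,
        key=lambda x: gauss_pdf(x, mu) / gauss_pdf(x, max(x, 0.0)),
        reverse=True,
    )
    accepted, prob = [], 0.0
    for x in ranked:
        accepted.append(x)
        prob += gauss_pdf(x, mu) * dx
        if prob >= alpha:
            break
    return min(accepted), max(accepted)

# Far from the boundary the acceptance region is ~ the central interval,
# approximately mu +/- 1.64.
lo_, hi_ = fc_acceptance(5.0)
print(f"mu=5: accept x in [{lo_:.2f}, {hi_:.2f}]")
```

Near μ = 0 the same ordering smoothly deforms the interval, which is what removes the empty intervals.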
11
Being Lucky…
give an upper limit on a signal s on top of a known (mean) background b, with n = s + b drawn from a Poisson distribution
Neyman: draw the confidence belt with s on the y-axis (the possible true values of s), for fixed background b
(figures: Glen Cowan, Statistical Data Analysis — confidence belts vs observed n, for fixed background; sorry, the plots don’t match: one is for 90% CL, the other for 95% CL)
2 experiments (E1 and E2), both observing (out of luck) n = 0 events: their 95% limits on s differ, and the experiment with the larger expected background quotes the stronger limit. UNFAIR!?
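The “being lucky” effect is easy to make concrete (a toy calculation, not from the slides): for n_obs = 0, the classical 95% upper limit is the s for which P(n = 0 | s + b) = 0.05.

```python
# Sketch: classical (Neyman, 95% CL) upper limit on a signal s on top of
# a known background b, for n_obs = 0 observed events.  The limit solves
# exp(-(s + b)) = 0.05, i.e. s_up = ln(20) - b.
import math

def classical_upper_limit_n0(b, cl=0.95):
    """95% CL upper limit on s for zero observed events and background b."""
    return max(0.0, -math.log(1.0 - cl) - b)

for b in (0.0, 1.0, 3.0):
    print(f"b = {b}: s_up = {classical_upper_limit_n0(b):.2f}")
# The experiment with the larger background quotes the *smaller*
# (stronger) limit -- the unfairness that motivated Feldman-Cousins.
```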
12
Being Lucky …
Feldman/Cousins confidence belts were motivated by “popular” Bayesian approaches to handling such problems.
Bayesian: rather than constructing confidence belts, turn the Likelihood for n (given s) into a posterior probability for s by adding a prior probability on s
(figure: upper limit on the signal vs background b, one curve per value of nobs)
Feldman/Cousins:
there is still SOME “unfairness”
perfectly “fine” in the frequentist interpretation
should quote “limit + sensitivity”
Feldman/Cousins (1998); Helene (1983).
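The Bayesian alternative referenced above (Helene 1983, flat prior on s) can be sketched for the counting experiment; the bisection solver and the 95% CL choice are illustrative.

```python
# Sketch of the Bayesian (flat-prior) upper limit of Helene (1983):
# 1 - CL = sum_{n <= n_obs} Pois(n | s_up + b) / sum_{n <= n_obs} Pois(n | b),
# solved here by simple bisection on s_up.
import math

def pois_cdf(n_obs, lam):
    return sum(math.exp(-lam) * lam ** n / math.factorial(n)
               for n in range(n_obs + 1))

def helene_upper_limit(n_obs, b, cl=0.95):
    target = 1.0 - cl
    lo, hi = 0.0, 50.0
    for _ in range(100):  # bisection: the ratio falls from 1 toward 0
        mid = 0.5 * (lo + hi)
        if pois_cdf(n_obs, mid + b) / pois_cdf(n_obs, b) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For n_obs = 0 the ratio is exp(-s_up): the limit is ln(20) ~ 3.0
# *independent of b* -- no reward for a lucky background fluctuation.
for b in (0.0, 1.0, 3.0):
    print(f"b = {b}: s_up = {helene_upper_limit(0, b):.2f}")
```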
13
Statistical Tests in Particle Searches
exclusion limits:
upper limit on the cross section (lower limit on the mass scale)
(σ < limit, as otherwise we would have seen it)
need to estimate the probability of a downward fluctuation of s+b
try to “disprove” H0 = s+b
better: find the minimal s for which you can still exclude H0 = s+b at a pre-specified Confidence Level
discoveries:
need to estimate the probability of an upward fluctuation of b
try to disprove H0 = “background only”
(figure: distributions of the possible observation under “b only” and “s+b”; the Nobs required for a 5σ discovery corresponds to a type-1 error probability P = 2.8 × 10⁻⁷)
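The discovery case above can be sketched for a counting experiment (illustrative numbers; the bisection inversion of the Gaussian tail is my own helper, not from the slides):

```python
# Sketch: p-value for an upward fluctuation of background-only (H0) in a
# counting experiment with known mean background b, and its conversion
# to a one-sided Gaussian significance Z.
import math

def p_value(n_obs, b):
    """P(n >= n_obs | b): chance of an upward background fluctuation."""
    return 1.0 - sum(math.exp(-b) * b ** n / math.factorial(n)
                     for n in range(n_obs))

def z_from_p(p):
    """Invert p = 0.5 * erfc(Z / sqrt(2)) by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = p_value(20, 5.0)
Z = z_from_p(p)
print(f"p = {p:.2e}, Z = {Z:.2f} sigma")
# a 5 sigma discovery corresponds to p = 2.8e-7
```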
14
Which Test Statistic to Use?
exclusion limit:
the test statistic does not necessarily have to be simply the counted number of events: remember Neyman-Pearson → likelihood ratio
pre-specify the 95% CL for exclusion, then make your measurement
if the result falls in the “critical (red) region”, you decided beforehand to “reject” H0 = s+b
(i.e. what would have been the chance for THIS particular measurement to still have been “fooled”, i.e. that there actually HAD been a signal)
(figure: distributions of the test statistic t under “b only” and “s+b”, with the observed tail probability indicated)
15
Example: LEP Higgs search
Remember: there were 4 experiments and many different search channels → treat different experiments just like “more channels”
Evaluate how −2 ln Q (Q = likelihood ratio Ls+b / Lb) is distributed for background only and for signal+background (note: this needs to be done for all Higgs masses)
example: mH = 115 GeV/c²
(figure: −2 ln Q distributions; ← more signal-like, more background-like →; shaded areas: CLs+b and 1 − CLb)
Example: LEP SM Higgs Limit
17
Example: LEP Higgs Search
(figure: −2 ln Q distributions; ← more signal-like, more background-like →; shaded areas: CLs+b and 1 − CLb)
In order to “avoid” the possible problem of Being Lucky when setting the limit:
rather than quoting the expected sensitivity in addition, weight your CLs+b by it:
CLs = CLs+b / CLb = P(LLR ≥ LLRobs | H1) / P(LLR ≥ LLRobs | H0)
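The CLs prescription above can be sketched for the simplest case (a toy, assuming a single counting channel, where Q = L(n | s+b)/L(n | b) is monotonic in n, so ordering by −2 ln Q is the same as ordering by n):

```python
# Toy sketch of the LEP CLs prescription for one counting channel.
import math

def pois_cdf(n_obs, lam):
    return sum(math.exp(-lam) * lam ** n / math.factorial(n)
               for n in range(n_obs + 1))

def cls(n_obs, s, b):
    cl_sb = pois_cdf(n_obs, s + b)  # P(outcome as/more background-like | s+b)
    cl_b = pois_cdf(n_obs, b)       # the same probability under background only
    return cl_sb / cl_b

# Exclude s at 95% CL when CLs < 0.05.  With n_obs = 0 and b = 3, plain
# CLs+b would exclude even a tiny s = 0.1; CLs does not over-exclude:
n_obs, b = 0, 3.0
print(f"CLs+b(s=0.1) = {pois_cdf(n_obs, 0.1 + b):.3f}")  # ~0.045 < 0.05
print(f"CLs(s=0.1)   = {cls(n_obs, 0.1, b):.3f}")        # ~0.905
```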
18
Systematic Uncertainties
standard popular way (Cousins/Highland):
integrate over all systematic errors and their probability distributions
(marginalisation of the joint probability density of the measurement parameters and the systematic errors)
→ “hybrid”: frequentist intervals with a Bayesian treatment of the systematics (a probability distribution for the systematic parameter is Bayesian!)
has been shown to have possibly large “undercoverage” for very small p-values / large significances (i.e. it may underestimate the chance of a “false discovery”!)
LEP-Higgs: generated MC with the parameters affected by systematic uncertainty “varied” to get the PDFs
essentially the same as “integrating over” them → needs a probability density for “how these parameters vary”
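The marginalisation above can be sketched with toys (an illustrative hybrid calculation; the Gaussian prior on b, truncated at zero, and all numbers are my assumptions):

```python
# Sketch of the Cousins-Highland "hybrid" treatment: the p-value is
# averaged over the systematic uncertainty on the background, i.e. b is
# smeared with its (Bayesian) probability density in the toy-MC loop.
import math
import random

random.seed(3)

def poisson_tail(n_obs, lam):
    """P(n >= n_obs | lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** n / math.factorial(n)
                     for n in range(n_obs))

def hybrid_p_value(n_obs, b, sigma_b, n_toys=100_000):
    total = 0.0
    for _ in range(n_toys):
        # Gaussian prior on the nuisance, truncated at 0 (a simplification).
        b_toy = max(1e-9, random.gauss(b, sigma_b))
        total += poisson_tail(n_obs, b_toy)
    return total / n_toys

p_fixed = poisson_tail(12, 5.0)
p_hybrid = hybrid_p_value(12, 5.0, sigma_b=1.0)
print(f"fixed b: p = {p_fixed:.4f}, smeared b: p = {p_hybrid:.4f}")
# The systematic uncertainty inflates the p-value, weakening the significance.
```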
19
Systematic Uncertainties
Why don’t we include any systematic uncertainty as a “free parameter” in the fit?
(K. Cranmer, PhyStat 2003)
e.g. measure the background contribution under the signal peak in sidebands:
the measurement + extrapolation into the signal region have an uncertainty
but you can parametrise your expected background such that: if the sideband measurement gives this data, then b = …
Note: no need to specify a prior probability
Build your Likelihood function such that it includes: your parameters of interest, and those describing the influence of the systematic uncertainty → nuisance parameters
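The sideband idea above can be sketched in the simplest “on/off” setup (an illustrative toy, assuming n ~ Pois(s + b) in the signal region, m ~ Pois(τ·b) in the sideband with known ratio τ; all numbers hypothetical):

```python
# Sketch: likelihood with the background b as a nuisance parameter
# constrained by a sideband measurement -- no prior on b is needed,
# it is determined by the fit itself.
import math

def nll2(s, b, n, m, tau):
    """-2 ln L(s, b) for the signal-region + sideband counting experiment."""
    lam_sig, lam_side = s + b, tau * b
    return 2.0 * (lam_sig - n * math.log(lam_sig)
                  + lam_side - m * math.log(lam_side))

def profile_b(s, n, m, tau, b_grid):
    """Minimise -2 ln L over the nuisance parameter b at fixed s."""
    return min(nll2(s, b, n, m, tau) for b in b_grid)

n, m, tau = 15, 20, 2.0  # toy data: 20 sideband events, tau = 2 -> b ~ 10
b_grid = [0.1 + 0.01 * i for i in range(3000)]
best = min(((profile_b(s, n, m, tau, b_grid), s)
            for s in [0.1 * i for i in range(200)]))
print(f"best-fit s ~ {best[1]:.1f}")  # ~ n - m/tau = 5
```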
20
Nuisance Parameters and Profile Likelihood
Build your Likelihood function such that it includes: your parameters of interest, and those describing the influence of the systematic uncertainty → nuisance parameters
“ratio of likelihoods”, why ?
21
Profile Likelihood
Why not simply use L(μ, θ) as the test statistic?
The number of degrees of freedom of the fit would be Nθ + 1 (for μ)
However, we are not interested in the values of θ (they are nuisance!)
Additional degrees of freedom dilute the interesting information on μ
The “profile likelihood” (= ratio of maximum likelihoods) concentrates the information on what we are interested in
It is just as we usually do for chi-squared: Δχ²(μ) = χ²(μ, θ̂′best) − χ²(μbest, θbest)
Nd.o.f. of Δχ²(μ) is 1, and the value of χ²(μbest, θbest) measures the “Goodness-of-fit”
“ratio of likelihoods”, why ?
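In standard notation (not extracted from the slide itself), the ratio of maximum likelihoods described above is the profile likelihood ratio:

```latex
\lambda(\mu) \;=\; \frac{L\bigl(\mu,\hat{\hat{\theta}}(\mu)\bigr)}{L(\hat{\mu},\hat{\theta})},
\qquad -2\ln\lambda(\mu)\;\sim\;\chi^2_{1}\ \text{(large samples, Wilks' theorem)}
```

where θ̂̂(μ) maximises L for the given μ (conditional fit) and (μ̂, θ̂) is the global maximum; −2 ln λ(μ) is exactly the Δχ²(μ) with one degree of freedom used in the chi-squared analogy.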
22
Summary
Maximum Likelihood fit to estimate parameters
what to do if the estimator is non-Gaussian: Neyman confidence intervals
what “bothers” people with them → Feldman/Cousins confidence belts/intervals
unify the “limit” and “measurement” confidence belts
CLs … the HEP limit: a ratio of “p-values” … statisticians don’t like that
new idea: Power Constrained Limits — rather than quoting the “sensitivity” alongside the Neyman confidence interval, decide beforehand that you will “accept” limits only where your experiment has sufficient “power”, i.e. “sensitivity”!
… and a bit about Profile Likelihood and systematic uncertainties.