Jan Conrad NuFACT 06 25 August 2006 1
Some comments on "What (should) sensitivity estimates mean?"
Jan Conrad
Royal Institute of Technology (KTH)
Stockholm
Outline
Definitions of "sensitivity"
Confidence intervals/p-values with systematic uncertainties
- Averaging
- Profiling
An illustration
Remarks on the ensemble
Summary/recommendations

The aim of this talk is to confuse as much as necessary, but as little as possible.
Definition of "sensitivity" - I
1 (well known HEP-statistics expert from Oxford):
Median upper limit obtained from repeated experiments with no signal; in a two-dimensional problem, keep one parameter fixed.
2 (fairly well known HEP-statistics expert from Germany):
Mean result of whatever quantity we want to measure, for example 90% confidence intervals, the mean being taken over identical replicas of the experiment.
3 (less well known HEP-statistics expert from Italy):
Look at that paper I wrote in arXiv:physics. Nobody has used it, but it is the best definition .....
Definition of sensitivity - II
Definition using p-values (hypothesis test):
The experiment is said to be sensitive to a given value of the parameter Θ13 = Θ13^sens at significance level α if the mean p-value obtained given Θ13^sens is smaller than α.
The p-value is (per definition) calculated given the null hypothesis Θ13 = 0:
p = P(T ≥ T_obs | Θ13 = 0)
where the test statistic T could be, for example, a χ², and T_obs is the actually observed value of the test statistic.
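As a minimal numerical sketch of this p-value definition (the counting model and all numbers are my own illustration, not from the talk): take T = n for a counting experiment with a known background b, so that the p-value under Θ13 = 0 is just a Poisson tail probability.

```python
import math

def pvalue_counting(n_obs, b):
    """p-value under H0 (Theta13 = 0) for a counting experiment:
    T = n with n ~ Poisson(b) under H0, so p = P(n >= n_obs | b)."""
    return 1.0 - sum(math.exp(-b) * b**k / math.factorial(k)
                     for k in range(n_obs))

# e.g. 9 events observed on an expected background of 3:
p = pvalue_counting(9, 3.0)   # ~ 0.004
```

For composite test statistics the null distribution generally has to be built by toy MC instead.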
Definition of sensitivity - III (what nuFact people most often use?)
Definition using confidence intervals (CI) 1)
The experiment is said to be sensitive to a given value of the parameter Θ13 = Θ13^sens at significance level α if the mean 2) 1-α CI obtained, given Θ13^sens, does not contain Θ13 = 0.
1) This means using confidence intervals for hypothesis testing. I think I convinced myself that the approaches are equivalent, but .....
2) Some people prefer the median .... (because the median is invariant under parameter transformations)
So what?
Once we have decided on the definition of sensitivity, two problems need to be addressed:
What method should be used to calculate the CI or the p-value?
Since the experiment does not exist, what is the ensemble of experiments we use to calculate the mean (or other quantities)?
P-values and the Neyman Pearson lemma
By the Neyman-Pearson lemma, the uniformly most powerful test statistic is the likelihood ratio:
λ = L(x | H1) / L(x | H0)
To calculate p-values, we need to know the null distribution of T. Therefore it comes in handy that, asymptotically (Wilks),
-2 ln λ ~ χ²
Remember: p = P(T ≥ T_obs | H0).
Example: practical calculation using p-values
Simulate an observation where Θ13 > 0. Fit a model with Θ13 = 0 and a model with Θ13 > 0; then
δχ² = χ²(Θ13 = 0) - χ²(Θ13 free)
is (under certain circumstances) χ²-distributed.
For problems with this approach, see Luc Demortier: "P-Values: What They Are and How to Use Them", draft report presented at the BIRS workshop on statistical inference problems in high energy physics and astronomy, July 15-20, 2006.
Some methods for p-value calculation
Conditioning
Prior-predictive
Posterior-predictive
Plug-In
Likelihood Ratio
Confidence Interval
Generalized frequentist
I will not talk about these any more.
Some methods for confidence interval calculation (the Banff list)
Bayesian
Feldman & Cousins with Bayesian treatment of nuisance parameters
Profile Likelihood (I will talk a little bit about this one)
Modified Likelihood
Feldman & Cousins with Profile Likelihood
Fully frequentist
Empirical Bayes
Properties I: Coverage
A method is said to have coverage (1-α) if, in infinitely many repeated experiments, the resulting CIs include (cover) the true value in a fraction (1-α) of all cases (irrespective of what the true value is).
[Plot: coverage (1-α) vs. true s; a curve above the nominal 0.9 line is over-covering, below it is under-covering.]
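Coverage can be checked numerically along exactly these lines. A sketch with toy numbers of my own choosing: construct the classical 90% CL Poisson upper limit and count how often it covers a fixed true s.

```python
import math, random

def upper_limit(n, alpha=0.10, _cache={}):
    """Classical 90% CL Poisson upper limit: smallest s (on a coarse grid)
    with P(n' <= n | s) <= alpha."""
    if n not in _cache:
        s = 0.0
        while sum(math.exp(-s) * s**k / math.factorial(k)
                  for k in range(n + 1)) > alpha:
            s += 0.01
        _cache[n] = s
    return _cache[n]

def poisson_draw(mu, rng):
    # Knuth's Poisson sampler, fine for small mu
    L, k, p = math.exp(-mu), 0, 1.0
    while p > L:
        p *= rng.random()
        k += 1
    return k - 1

def coverage(s_true, n_toys=20000):
    """Fraction of repeated experiments whose limit covers s_true."""
    rng = random.Random(1)
    covered = sum(upper_limit(poisson_draw(s_true, rng)) >= s_true
                  for _ in range(n_toys))
    return covered / n_toys

cov = coverage(3.0)
```

With s_true = 3 the empirical coverage comes out near 0.95 rather than 0.90: because of the discreteness of Poisson data, the classical limit over-covers.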
Properties II: Type I, type II error and power
Type I error: reject H0, though it is true.
Prob(Type I error) = α (corresponds to coverage for hypothesis tests)
Type II error: accept H0, though it is false.
Power β = 1 - Prob(Type II error): given H1, what is the probability that we will reject H0 at a given significance α?
Nuisance parameters
Nuisance parameters are parameters which enter the data model but which are not of prime interest. Example, background: n ~ Poisson(s + b), where the background expectation b is the nuisance parameter.
You don't want to give CIs (or p-values) that depend on the nuisance parameters, so you need a way to get rid of them.
How to treat nuisance parameters?
There is a wealth of approaches to dealing with nuisance parameters. Two are particularly common:
Averaging (Bayesian)
No time to discuss this, see: J.C. et al., Phys. Rev. D67:012002, 2003; J.C. & F. Tegenfeldt, Proceedings PhyStat 05, physics/0511055; F. Tegenfeldt & J.C., Nucl. Instr. Meth. A539:407-413, 2005
Profiling
Example which I will present here: Profile Likelihood/MINUIT (which is similar to what many of you have been doing)
Profile Likelihood Intervals
The profile likelihood ratio, for measurements n (meas.) and b (meas.):
λ(s) = L(s, b̂(s)) / L(ŝ, b̂)
where b̂(s) is the MLE of b given s, and (ŝ, b̂) is the MLE of s and b given the observations.
To extract limits: the lower and upper limits are the values of s where -2 ln λ(s) has risen by 2.706 (90% CL) from its minimum.
From MINUIT manual
See F. James, MINUIT Reference Manual, CERN Library Long Write-up D506, p.5:
"The MINOS error for a given parameter is defined as the change in the value of the parameter that causes F' to increase by the amount UP, where F' is the minimum w.r.t. all other free parameters."
[Plot: profile likelihood curve; the confidence interval is read off where Δχ² = 2.71 (90%) or Δχ² = 1.07 (70%).]
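A stdlib-only sketch of this MINOS-style interval extraction, for the simple model n ~ Poisson(s + b) with a Gaussian-constrained background (the model and the numbers are illustrative assumptions; crude grid scans stand in for MINUIT's minimiser):

```python
import math

def nll2(s, b, n, b_meas, sigma_b):
    """-2 ln L for n ~ Poisson(s + b), b_meas ~ Gauss(b, sigma_b),
    with constant terms dropped."""
    mu = s + b
    return 2.0 * (mu - n * math.log(mu)) + ((b - b_meas) / sigma_b) ** 2

def profile(s, n, b_meas, sigma_b):
    # crude stand-in for MINUIT: minimise over b on a grid (b in [0.05, 20])
    return min(nll2(s, 0.05 * i, n, b_meas, sigma_b) for i in range(1, 401))

def minos_interval(n, b_meas, sigma_b, up=2.706):
    """Values of s where the profiled -2 ln L rises by UP = 2.706 (90% CL)."""
    scan = [(0.05 * i, profile(0.05 * i, n, b_meas, sigma_b))
            for i in range(601)]                      # s in [0, 30]
    fmin = min(f for _, f in scan)
    inside = [s for s, f in scan if f - fmin <= up]
    return min(inside), max(inside)

lo, hi = minos_interval(n=10, b_meas=3.0, sigma_b=1.0)
```

Here the curve is minimal near ŝ = n - b_meas = 7, and the 90% interval is the region where the profiled -2 ln L stays within UP of that minimum.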
Coverage of profile likelihood
[Plot: MC coverage (1-α) vs. true s for the profile likelihood (Rolke et al. vs. MINUIT); background: Poisson (unc. ~20%-40%), efficiency: binomial (unc. ~12%).]
W. Rolke, A. Lopez, J.C., Nucl. Instr. Meth. A551 (2005) 493-503
Confidence Intervals for new particle searches at LHC?
Basic idea: calculate a 5σ confidence interval and claim discovery if s = 0 is not included.
Straw-man model: count the events observed in a signal region and in a background (sideband) region of size τ (K. S. Cranmer, Proceedings PhyStat 2005).
[Plot: results as a function of the events observed in the signal region and in the background region, for the Profile and Bayesian methods.]
- Bayesian under-covers badly (one has to add 16 events to get the correct significance)
- Profile is the only method considered here which gives coverage (exc. the full construction)
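For this straw-man model the profile likelihood significance can be written in closed form; a sketch (the function name and example counts are mine), equivalent to the Li & Ma (1983) formula:

```python
import math

def z_profile(n_on, n_off, tau):
    """Significance (in sigma) of an excess in the on/off counting model:
    n_on ~ Poisson(s + b), n_off ~ Poisson(tau * b), with tau the size of
    the background region relative to the signal region. Profile likelihood
    ratio for s = 0; equivalent to Li & Ma (1983), eq. 17."""
    tot = n_on + n_off
    q = 2.0 * (n_on * math.log(n_on * (1.0 + tau) / tot)
               + n_off * math.log(n_off * (1.0 + tau) / (tau * tot)))
    # only meaningful for an excess, n_on > tot / (1 + tau)
    return math.sqrt(max(q, 0.0))

z = z_profile(150, 100, 1.0)   # ~ 3.2 sigma
```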
The profile likelihood and the χ2
The most common method in neutrino physics seems to be minimizing a χ2
Assume a Gaussian likelihood function:
L(θ) ∝ exp(-χ²(θ)/2)
Omitting terms not dependent on the parameters:
-2 ln L(θ) = χ²(θ)
A χ² fit is therefore equivalent to the profile likelihood if you minimize w.r.t. the nuisance parameters: exact for strictly Gaussian processes, asymptotic otherwise.
A simple example calculation.
Model generating the data:
n ~ Poisson(s + b), b_meas ~ Gauss(b, σb)
This means: in each experiment you measure n and b_meas, given s and b. σb is assumed to be known.
In what follows I use the χ² to calculate a p-value (not a confidence interval).
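Generating a pseudo-experiment from this model is straightforward (a sketch; the Gaussian form for b_meas is my reading of "σb is assumed to be known", and the numbers are toy choices):

```python
import math, random

def generate_experiment(s, b, sigma_b, rng):
    """One pseudo-experiment: n ~ Poisson(s + b), b_meas ~ Gauss(b, sigma_b)."""
    mu = s + b
    L, k, p = math.exp(-mu), 0, 1.0   # Knuth's Poisson sampler (small mu)
    while p > L:
        p *= rng.random()
        k += 1
    return k - 1, rng.gauss(b, sigma_b)

rng = random.Random(3)
toys = [generate_experiment(5.0, 3.0, 1.0, rng) for _ in range(5000)]
mean_n = sum(n for n, _ in toys) / len(toys)     # ~ s + b = 8
mean_b = sum(bm for _, bm in toys) / len(toys)   # ~ b = 3
```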
Two approaches using χ2
Adding the uncertainty in quadrature (...seems to be quite common...), schematically:
χ²(s) = (n - s - b_meas)² / (σn² + σb²)
Allowing for a nuisance parameter (the background normalisation) and minimizing with respect to it, schematically:
χ²(s) = min over b of [ (n - s - b)²/σn² + (b - b_meas)²/σb² ]
Similar to what is used in, for example: Burguet-Castell et al., Nucl. Phys. B725:306-326, 2005 (beta-beams at the CERN SPS)
Coverage (type I error)
Nominal χ²: what you assume is the correct null distribution.
Ignore/Profile/Quad. add etc.: the "real" null distributions of what you call a χ².
Empirical: the "true" χ² distribution ..... to the extent you trust ROOT .....
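The "real" null distribution is exactly what a toy MC gives you. A sketch for a simple counting version of the example (toy numbers mine): generate H0 (s = 0) pseudo-experiments and check how often a quadrature-style χ² exceeds 2.706, the nominal 90% point of χ² with 1 dof; if the nominal distribution were correct, this fraction would be 0.10.

```python
import math, random

def toy_tail_fraction(b, sigma_b, cut=2.706, n_toys=50000):
    """Fraction of H0 (s = 0) toys with chi2 > cut, where
    chi2 = (n - b_meas)^2 / (b + sigma_b^2)   (quadrature-style, at s = 0)."""
    rng = random.Random(42)
    exceed = 0
    for _ in range(n_toys):
        L, k, p = math.exp(-b), 0, 1.0   # n ~ Poisson(b)
        while p > L:
            p *= rng.random()
            k += 1
        n = k - 1
        b_meas = rng.gauss(b, sigma_b)   # b_meas ~ Gauss(b, sigma_b)
        if (n - b_meas) ** 2 / (b + sigma_b ** 2) > cut:
            exceed += 1
    return exceed / n_toys

frac = toy_tail_fraction(b=1.0, sigma_b=0.5)
```

With these toy numbers the fraction comes out near 0.08, i.e. the nominal χ² is slightly conservative here; with other models it can just as well err in the other direction.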
Power and sensitivity?
In most cases I saw, an average result is presented. This tells you very little about the probability that a given signal will yield a significant observation (the power).
Shot at "What should sensitivity mean?":
An experiment is sensitive to a finite value Θ of a parameter if the probability of obtaining an observation n which rejects Θ = 0 with at least significance α is at least β.
What is the ensemble .....
..... of repeated experiments which
I should use to calculate the "mean" (or the probability β) in the sensitivity calculation?
I should use to calculate the coverage?
My answer: .... both ensembles should be the same ....
Each pseudo-experiment:
- has fixed true values of the prime parameter and the nuisance parameters,
- yields a prime measurement (e.g. the number of observed events),
- yields one estimate for each nuisance parameter (e.g. the background) 1)
This estimate might come from auxiliary measurements in the same or other detectors, or from theory. In the former case, care has to be taken that the measurement procedure is replicated as in the real experiment.
In the case of theoretical uncertainties, there is no real "measurement process". I would argue that even theoretical uncertainties should be treated as if there were a true value and an estimate, which we pretend is a random variable.
1) Shape and size of the uncertainties known beforehand? Otherwise generalize .....
Update: "What should sensitivity mean?"
An experiment is sensitive to a finite value Θ of a parameter if the probability of obtaining an observation n which rejects Θ = 0 with at least significance α is at least β.
The probability is hereby evaluated using replicas of the experiment with fixed true parameter Θ and fixed nuisance parameters. The random variables in each replica are thus the observation n and the estimates of the nuisance parameters.
The significance of the observation n is hereby evaluated using replicas of the experiment with fixed true parameter Θ = 0 and fixed nuisance parameters (assuming a p-value procedure; otherwise by CI).
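For the simple counting model this definition can be evaluated directly (a sketch with my own toy numbers; for brevity the background b is taken as exactly known, so the only random variable per replica is n - with a nuisance parameter one would also redraw its estimate in each replica, as described above):

```python
import math, random

def power(s, b, alpha=0.05, n_toys=20000):
    """P(reject Theta = 0 at significance alpha | true signal s): the
    probability 'beta' in the sensitivity definition, for the toy counting
    model n ~ Poisson(s + b) with exactly known background b."""
    def pvalue(n_obs):   # exact Poisson p-value under H0: n ~ Poisson(b)
        return 1.0 - sum(math.exp(-b) * b**k / math.factorial(k)
                         for k in range(n_obs))
    rng = random.Random(7)
    reject = 0
    for _ in range(n_toys):
        mu = s + b
        L, k, p = math.exp(-mu), 0, 1.0   # n ~ Poisson(s + b)
        while p > L:
            p *= rng.random()
            k += 1
        if pvalue(k - 1) < alpha:
            reject += 1
    return reject / n_toys

beta = power(s=8.0, b=3.0)   # ~ 0.92: sensitive if we demand beta >= 0.9
```

Note that by construction power(0, b) comes out at most α: the replicas used for the significance (Θ = 0) and for the power (Θ fixed) are the same ensemble.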
Unscientific study of 12 papers dealing with sensitivities to oscillation parameters
0 papers seem to worry about the ensemble w.r.t. which the "mean" is calculated
0 papers check the statistical validity of the χ² used
3 papers treat the systematics and write down explicitly what χ² is used (or give enough information to reconstruct it in principle)
6 papers ignore the systematics or don't say how they are included in the fit
2 of the papers don't say how the significance/CIs are calculated
1 paper doesn't even tell me the significance level
No paper is doing what I would like best; 1/4 of the papers are in my opinion acceptable with some goodwill, 3/4 of the papers I would reject. Binomial errors on these figures are neglected.
How do things look in the nuFact community?
Summary/Recommendations
More is more!
Include systematics in your calculation (or discuss why you neglect them) ... not just neglect them ....
Report under which assumptions the data are generated. Report the test statistic you are using explicitly.
What does "mean" mean?
I did not encounter any discussion of either the power of the sensitivity analyses or of the ensemble of experiments which is used for the "average".
Summary cont'd
And the winner is ....
Most of the papers have been using a χ² fit. If you include nuisance parameters in it and minimize w.r.t. them, this is equivalent to a Profile Likelihood approach for strictly Gaussian processes, and asymptotically equivalent otherwise. This approach seems to provide coverage in many, even unexpected, cases.
Don't think, compute .....
Given the computing power available (and since the stakes are high), I think that for sensitivity studies comparing different experimental configurations there is no reason to stick slavishly to the nominal χ² distribution instead of doing a toy MC to construct the distribution of the test statistic yourself.
The thinking part is to choose the ensemble of experiments to simulate.
And a last one .... for the customer
Best is not necessarily best.
The intuitive (and effective) result of including systematics (or of doing a careful statistical analysis instead of a crude one) is to worsen the calculated sensitivity. If I were to spend XY M$ on an experiment, I would insist on understanding in detail how the sensitivity is calculated.
Otherwise, if anything, I would give the XY M$ to the group with the worse sensitivity but the more adequate calculation.
List of relevant references.
G. Feldman & R. Cousins, Phys. Rev. D57:3873-3889, 1998 - THE method for confidence interval calculation
J.C. et al., Phys. Rev. D67:012002, 2003 - combining FC with a Bayesian treatment of systematics
J.C. & F. Tegenfeldt, Proceedings PhyStat 05, Oxford, 2005, physics/0511055 - combined experiments, power calculations for CIs with Bayesian treatment of systematics
F. Tegenfeldt & J.C., Nucl. Instr. Meth. A539:407-413, 2005 - coverage of confidence intervals
L. Demortier, presented at the BIRS workshop on statistical inference problems in high energy physics and astronomy, July 15-20, 2006 - all you want to know about p-values, but don't dare to ask
W. Rolke, A. Lopez & J.C., Nucl. Instr. Meth. A551 (2005) 493-503 - profile likelihood and its coverage
K. S. Cranmer, Proceedings PhyStat 05 - significance calculation for the LHC
F. James, Computer Phys. Comm. 20 (1980) 29-35 - profile likelihood without calling it that
G. Punzi, Proceedings of PhyStat 2003, SLAC, Stanford (2003) - a definition of sensitivity including power
S. Baker & R. Cousins - likelihood and χ² in fits to histograms
J. Burguet-Castell et al., Nucl. Phys. B725:306-326, 2005 - example of a rather reasonable sensitivity calculation in neutrino physics (random pick; there are certainly others ... maybe even better)
R. Barlow, ..., J.C. et al., "The Banff comparison of methods to calculate Confidence Interval" - systematic comparison of confidence interval methods, to be published beginning of 2007