GravStat 2005 and other Topics in Statistics for Gravitational Waves Detection Lucio Baggio Seminar to ICRR’s GW group

Page 1:

GravStat 2005 and other Topics in Statistics for Gravitational Waves Detection

Lucio Baggio

Seminar to ICRR’s GW group

Page 2:

Topics of this presentation

Highlights from GravStat 2005: photographs, outline, and comments about the workshop at Penn State University

False discovery probability: multiple tests and large surveys change the overall confidence of the first detection

Miscellaneous topics: status of AURIGA; first steps toward the TAMA-AURIGA collaboration

Page 3:

Highlights from GravStat 2005

Page 4:

GravStat workshop

Center for Gravitational Wave Physics
Penn State University, 19-21 May 2005

Program Committee:
Lee Samuel Finn, Michelle Larson (Penn State University)
Peter Shawhan, Albert Lazzarini (California Institute of Technology)
Graham Woan (University of Glasgow)

Registration fee: $95
Accommodation: Days Inn Penn State
Participants: 50

[Schedule chart: 19, 20 and 21 May, hours 8 to 17, with lunch breaks and roundtables on 19 and 21 May]

More discussion than talks
More questions than answers!

http://cgwp.gravity.psu.edu/events/GravStat/

Page 5:

Focus Session: Statistics for Gravitational Wave Data Analysis

Thursday, 19 May 2005
Welcome - Sam Finn
Tom Loredo (Cornell): Statistics for gravitational wave data analysts
Lee Samuel Finn (Penn State): Gravitational wave data and analysis questions for statisticians
Roundtable: Questions posed in the analysis of gravitational wave data
    Giovanni Prodi (Università di Trento and INFN)
    Peter Shawhan (Caltech)
    Massimo Visco (IFSI-INAF)
    Gianluca Guidi (Università di Urbino - INFN Firenze)
    Daisuke Tatsumi (National Astronomical Observatory of Japan)
Joseph Romano (Cardiff): Characterizing Confusion: Inference from stochastic gravitational wave searches
Graham Woan (Glasgow): Statistical approach to the analysis and interpretation of periodic gravitational wave signals

Friday, 20 May 2005
Katherine Rawlins (MIT): Post-hoc vetoes in Frequentist analyses
Tom Loredo (Cornell): Bayesian Adaptive Exploration
Peter Shawhan (Caltech): Setting limits and making discoveries
Tiffany Summerscales (Penn State): Deconvolution and inference using maximum entropy
Banquet Dinner, Nittany Lion Inn. Harry Collins (Cardiff): After dinner speaker

Saturday, 21 May 2005
Giovanni Prodi (University of Trento): Statistical problems associated with the analysis of data from a network of narrowband detectors
Sukanta Bose (Washington State, Pullman): Statistical problems associated with analysis of data from a network of broadband detectors
Roundtable: Working with detectors of different sensitivity

All talks can be downloaded from http://cgwp.gravity.psu.edu/events/GravStat/program.shtml

Page 6:

Tom Loredo (Cornell): Statistics for gravitational wave data analysts

• Bayesian vs Frequentist statistics

• Algorithms may be similar, but the interpretation is different

• Bayesian statistics is both a more general approach (it can talk about the probability of everything) and a narrower one (it can only reason about likelihood, not repeatability, and it needs priors)

• The Frequentist approach stops before decisions can be taken: decision theory then has to be applied (what is the risk associated with a wrong claim?)

• Frequentist: given the model Hi, the probability of measuring the data D can be stated: P(D|Hi). Then, based on a test on the data, the statement “the true hypothesis was Hk” can be either wrong or true, but we know how frequently it would be wrong if we applied this procedure many times

• Bayesian: data D, models {Hi} and other information (I) are arguments of a function P(D,I,Hi) → [0,1] which satisfies the axioms of probability. All expressions P(D), P(D|I,Hi), P(Hi|D,I), P(Hi), etc. are valid. Based on the data, we assess how likely it is that a hypothesis Hi is true, conditioned on the observed data:

• P(Hi|D,I) ∝ P(Hi) P(D|I,Hi)
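The last relation on this slide (posterior proportional to prior times likelihood) can be illustrated with a minimal sketch. The function name `posterior` and all the numbers are hypothetical, chosen only for illustration; they are not taken from Loredo's talk.

```python
def posterior(priors, likelihoods):
    """Normalize prior * likelihood over a discrete set of hypotheses Hi.

    priors:      list of P(Hi)
    likelihoods: list of P(D|I,Hi) for the observed data D
    Returns the list of P(Hi|D,I), which sums to 1.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(unnormalized)  # P(D|I), the normalization constant
    if evidence <= 0.0:
        raise ValueError("the data have zero probability under every hypothesis")
    return [u / evidence for u in unnormalized]


# Hypothetical example: H0 = noise only, H1 = signal present.
priors = [0.99, 0.01]        # assumed prior beliefs
likelihoods = [0.001, 0.5]   # assumed P(D|I,Hi) for the observed data
post = posterior(priors, likelihoods)
print(post)
```

With these assumed numbers the data favor H1 strongly enough to overcome a prior of 1% for the signal hypothesis; a different prior would give a different posterior from the same likelihoods.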

Page 7:

Tom Loredo (Cornell): Statistics for gravitational wave data analysts

• Neyman-Pearson criterion vs Fisher p-level

• Correct hypothesis testing requires first fixing a decision threshold and then checking whether the null hypothesis is accepted or rejected.

• Reporting and investigating the actual value of the p-level seems a better way to use the information in the data, but at the cost of in fact losing any control of the error rate.

• Depending on how many tests we performed, observing p~0 may not be significant

• Frequentist issues: multiple trials change the false alarm probability; Monte Carlo procedures vary the possible outcomes given a model

• Bayesian approach: multiple models tried at once will eventually find a model that fits by chance; Monte Carlo methods vary the model parameters given the observed outcome.

• Frequentist: choose the best procedure based on performance (reduce the risk of statistical errors)

• Bayesian approach: nothing to optimize (use the likelihood function straightaway)

Page 8:

Unfolding results

Finn: We should face the problem of unfolding results, from event rates to source population models, in particular now that we are moving from no evidence of GW toward the first detections

Baggio: But even if you still have only upper limits, you should worry about unfolding, to give physical information to the astrophysics community

Finn: I agree

Shawhan: In principle this task is not difficult, if we stick with one specific model with few parameters (e.g. when the unknown parameter is only the overall rate, not the spatial distribution). The problem starts when we have to put constraints on many parameters at once, and even worse when the model itself is not well defined. With increasing difficulty, we can:

1. Limit the rate of detectable signals

2. Limit the rate of a particular signal or source as a function of amplitude

3. Limit the aggregate rate of a population of sources

Page 9:

RoundTable: Questions posed in the analysis of gravitational wave data

Prodi: When can we safely make a claim of detection? To avoid biases on the claimed confidence, blind analysis should be encouraged, especially when we have more than one detector.

Should we rely only on statistics?

• If we do not believe in statistics, we did not make good use of statistics!

[comment]: If you have priors about the source characteristics, you should add them a priori (e.g. when designing the confidence belt)

Feigelson: I believe that all experiments already have all ancillary channels available, plus a lot of ideas about other things to test. Why not use that information from the beginning? It is called multivariate analysis, using both primary and ancillary channels

Shawhan: Should we really do everything in a statistically rigorous way? Should we put everything in a single likelihood function? This is very tough, and can be avoided by looking at simpler, well-understood test statistics from the data (e.g. reducing to a counting experiment)

Page 10:

RoundTable: Questions posed in the analysis of gravitational wave data

Feigelson: About modeling noise, there are various types of noise: wide band, glitches, harmonics, autoregressive noise (1/f). I think that we are in great trouble if all kinds of noise appear simultaneously in the detectors!

Baggio: We had most of them in the detectors in the IGEC. But as long as they are uncorrelated, this should just imply random noise in the final observable (e.g. coincidence counts), shouldn’t it?

Tatsumi: Blind analysis is ideal for the first detection, but for this purpose we should know and model the detector noise exactly. Care should be taken about systematic errors. Network analysis is a good idea anyway.

Mohanty: There are attempts to simulate and differentiate more than one kind of noise at a time (but without mutual interference)

Guidi: Signal injection is presently [in VIRGO] the way to judge which is the best method [and overcome the problem of estimating noise].

Mohanty: We should try to model the noise, but most of all check for other possible noise models which could jeopardize the statistical significance

Page 11:

RoundTable: Questions posed in the analysis of gravitational wave data

Loredo: 40 years ago there was a similar dichotomy: having all models a priori and using the Neyman-Pearson decision rule, or just looking at the data with more flexibility (Tukey). You should look at the US Army experience in “non-cooperative target identification”.

Finn: There are already a lot of judgments which are flexibly taken into account without being written in a paper (e.g. not taking into account an auxiliary detector, not because of a measurement, but by intuition, guided by our previous experience as scientists)

Finn: From past experience and knowledge, if we really wanted to model the instrument as we would like, we would have to rebuild LIGO from scratch! But as we cannot rebuild LIGO, we have to rely on statistics. Do you believe that, given the models and the best knowledge we have today, we are ready (today) for a claim with no uncertainty?

Page 12:

RoundTable: Questions posed in the analysis of gravitational wave data

Johnson: ESP (extrasensory perception) experiments already showed the problem of purely statistical discoveries.

Collins: But in that case, you can grasp that they were phantoms, because the instruments were evolving in sensitivity. In such a case the evidence should improve, but this happened neither with ESP nor with Weber’s discoveries

Page 13:

RoundTable: Questions posed in the analysis of gravitational wave data

Shawhan: About priors, how should we choose them? Some improper priors (e.g. 1/) do not actually leave any space for information from the data!

Page 14:

RoundTable: Questions posed in the analysis of gravitational wave data

Q: What about using posteriors from previous analyses?

A: Yes, but you still have to decide the initial prior

Q: But iteration over many experiments should make the final result insensitive to the initial prior

A (Loredo): Yes, but only if they lead to informative posteriors!

Shawhan: Our priors also bias our analysis from the beginning, when we try to optimize the data selection (e.g. cuts)

Finn: Right. Another example: if by chance nobody had thought of the possibility of a stochastic gravitational wave background, no specific analysis tool would have been developed.

Page 15:

Katherine Rawlins (MIT): Post-hoc vetoes in Frequentist analyses--or--

The “price of scrutiny”

“Take-away lesson”

● The frequentist formalism only works when you construct the intervals using the correct probability distribution before the measurement.

● If you take additional action based on the value of the measurement, you are changing the probability distribution of the final result, and therefore the correct intervals will move.

● Keyword: stopping rule

[Flowchart: Fix a confidence belt → Set up the analysis → Get the data (N) → Results. Post-hoc loop: Results → Tune the analysis → Get new data (N’) → Results. From this point, the confidence on the results will be biased.]
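The effect of a data-dependent stopping rule can be illustrated with a small noise-only simulation. This is only a sketch under stated assumptions, not the analysis shown in Rawlins' talk: each data set is summarized by a standardized Gaussian statistic, the two data sets have equal size, and the function names, the threshold of 0.05 and the number of trials are all hypothetical.

```python
import math
import random


def p_two_sided(z):
    """Two-sided Gaussian p-value for a standardized statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_alarm_rates(alpha=0.05, trials=20000, seed=1):
    """Compare a fixed analysis with a 'take more data if nothing is seen' one.

    Every trial contains noise only, so each claim is a false alarm.
    Returns (rate of the fixed analysis, rate with the post-hoc second look).
    """
    rng = random.Random(seed)
    fixed = 0      # claims when only the first data set is ever tested
    post_hoc = 0   # claims when a second look is allowed after a null result
    for _ in range(trials):
        z1 = rng.gauss(0.0, 1.0)              # statistic of the first data set
        if p_two_sided(z1) < alpha:
            fixed += 1
            post_hoc += 1
            continue
        z2 = rng.gauss(0.0, 1.0)              # new data, taken because of the null result
        z_combined = (z1 + z2) / math.sqrt(2.0)
        if p_two_sided(z_combined) < alpha:
            post_hoc += 1
    return fixed / trials, post_hoc / trials


fixed_rate, post_hoc_rate = false_alarm_rates()
print(fixed_rate, post_hoc_rate)
```

In this sketch the fixed analysis keeps a false alarm rate close to the nominal value, while the analysis with the second look exceeds it, because the decision to take new data depends on the first result.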

Page 16:

Banquet Dinner Nittany Lion Inn

Harry Collins After dinner speaker

How do you know that men landed on the moon?

There are photographs…

They may be fake

We got signals from mirrors left on the surface!

But it could be an unmanned mission

I do not believe in such a big conspiracy

That is my point!

Page 17:

Other remarks

Mohanty: Isn’t the p-level just exactly what we need to claim a discovery?

Prodi: But at what level should we put our threshold? IGEC already required 95% confidence on the single trial just to have barely less than one accidental claim. We would like to claim 99.9% CL, but can we really do this when the uncertainty on the background estimate is at least 3%?

Finn: When we decide on a “null hypothesis”, we usually choose the “skeptical” one. In LIGO, it is “we did not see anything”. In LISA, as for white dwarfs, it would be “we routinely see white dwarfs”

Johnson: As with ESP, we have to fight to make people believe in our discoveries; we need to go to the public with a very affirmative statement

Loredo: My recipe: first, you should assess claims of detection in the classical frequentist way (the “null not rejected” negative statement). Then, the step toward evidence is accomplished by giving the Bayes factor, and people will make up their own minds adopting their own priors.

Page 18:

Other remarks

Woan: About the synthesis of amplitudes from many detectors:

A = Σn wn An

Be careful about noise uncertainty: if you combine detectors with an enormous difference in sensitivity, a small error in the weights for the less sensitive ones will have terrible consequences for the overall variance.
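Woan's warning can be made concrete with a short numerical sketch. It assumes inverse-variance weights normalized to sum to one, which the slide does not specify, and two hypothetical detectors whose noise levels differ by a factor of 100; all names and numbers are illustrative.

```python
def inverse_variance_weights(sigmas):
    """Weights wn proportional to 1/sigma_n^2, normalized so that sum(wn) = 1."""
    inverse = [1.0 / s ** 2 for s in sigmas]
    total = sum(inverse)
    return [x / total for x in inverse]


def combined_variance(weights, true_sigmas):
    """Variance of A = sum(wn * An) for independent detector noises."""
    return sum(w ** 2 * s ** 2 for w, s in zip(weights, true_sigmas))


true_sigmas = [1.0, 100.0]     # hypothetical: second detector is 100 times noisier
assumed_sigmas = [1.0, 10.0]   # hypothetical: its noise is underestimated tenfold

w_opt = inverse_variance_weights(true_sigmas)
w_bad = inverse_variance_weights(assumed_sigmas)

var_opt = combined_variance(w_opt, true_sigmas)
var_bad = combined_variance(w_bad, true_sigmas)
print(w_opt, var_opt)
print(w_bad, var_bad)
```

In this example the weight of the less sensitive detector moves from about 0.0001 to about 0.01, an absolute change of roughly one percent, yet the variance of the combination nearly doubles.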

Finn: After trigger exchange, which only allows coincidence counts, we should exchange filtered/raw data around each coincidence, in order to extract more information (and possibly make that information available)

Baggio: Exchanging raw/filtered data around apparently well-established coincidences will eventually lead to a lot of scrutiny by other individual scientists trying to falsify those claims. Then, if only the tests that were not passed are published, this leads to a post-hoc (hence statistically biased) reassessment of the detection. Until the effects of these trials on efficiency are measured on time-shifted data, the box should stay closed. Otherwise, post-hoc scrutiny should be limited to extracting the parameters of the source, without further statements on detection confidence.

Page 19:

False discovery rate: setting the probability of false claim of detection

Page 20:

Why FDR?

When should I care about multiple-test procedures?

• All sky surveys: many source directions and polarizations are tried

• Template banks

• Wide-open-eyes searches: many analysis pipelines are tried altogether, with different amplitude thresholds, signal durations, and so on

• Periodic updates of results: every new science run is a chance for a “discovery”. “Maybe the next one is the good one”.

• Many graphical representations or aggregations of the data: “If I change the binning, maybe the signal shows up better…”

Page 21:

Preliminary (1): hypothesis testing

                                      Null retained      Reject null =          Total
                                      (can’t reject)     accept alternative
Null (H0) true: background (noise)    U                  B                      m0
Alternative true: signal              T                  S                      m1
Total                                 m - R              R = S + B              m

B: false discoveries (false positives); Type I error, α = εb
S: detected signals (true positives)
T: inefficiency; Type II error, β = 1 - εs
R: reported signal candidates

Page 22:

Preliminary (2): p-level

Assume you have a model for the noise that affects the measurement x.

You derive a test statistic t(x) from x. F(t) is the distribution of t when x is sampled from noise only (off-signal).

The p-level associated with t(x) is the value of the distribution of t at t(x):

p = F(t) = P(t > t(x))

• Example: χ² test. p is the “one-tail” χ² probability associated with n counts (assuming d degrees of freedom)

• When the noise agrees with the model, the cumulative distribution of p always rises linearly: P(p) = p, i.e. dP/dp = 1

Usually, the alternative hypothesis is not known. However, for our purposes it is sufficient to assume that the signal can be distinguished from the noise, i.e. dP/dp ≠ 1. Typically, the measured values of p are biased toward 0.

[Figure: pdf of the p-level between 0 and 1, flat for the background and peaked toward 0 for the signal]
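The behaviour of the p-level described on this slide can be checked numerically with a minimal sketch. It assumes, purely for illustration, that the test statistic is a unit Gaussian under noise (instead of the χ² example on the slide) and that a hypothetical signal shifts its mean by 2; the function names are not from the talk.

```python
import math
import random


def p_level(t):
    """One-tail p-level P(t' > t) when the noise statistic is a unit Gaussian."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def fraction_below(p_values, threshold):
    """Empirical cumulative distribution P(p <= threshold)."""
    return sum(p <= threshold for p in p_values) / len(p_values)


rng = random.Random(0)
# Noise only: the statistic follows the assumed noise model.
noise_p = [p_level(rng.gauss(0.0, 1.0)) for _ in range(20000)]
# Hypothetical signal: the statistic is shifted toward larger values.
signal_p = [p_level(rng.gauss(2.0, 1.0)) for _ in range(20000)]

for threshold in (0.1, 0.5, 0.9):
    print(threshold, fraction_below(noise_p, threshold), fraction_below(signal_p, threshold))
```

For the noise-only sample the empirical cumulative distribution stays close to the threshold itself (uniform p), while for the shifted sample the p-levels pile up near 0.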

Page 23:

Usual multiple testing procedures

For each hypothesis test, the condition {p < α ⇒ reject null} leads to false positives with probability α.

In case of multiple tests (they need not use the same test statistic, nor test the same null hypothesis), let p = {p1, p2, … pm} be the set of p-levels. m is the trial factor.

We select “discoveries” using a threshold T(p): {pj < T(p) ⇒ reject null}.

• Uncorrected testing: T(p) = α

–The probability that at least one rejection is wrong is

P(B>0) = 1 - (1 - α)^m ~ mα (for mα ≪ 1)

hence false discovery is guaranteed for m large enough

• Fixed total 1st type errors (Bonferroni): T(p) = α/m

–Controls the familywise error rate in the most stringent manner:

P(B>0) ≤ α

–This makes mistakes rare…

–… but in the end efficiency (2nd type errors) becomes negligible!!
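The two thresholds on this slide can be compared with a few lines of arithmetic. The sketch below assumes m independent tests in which every null hypothesis is true; the function names and the example values of α and m are hypothetical.

```python
def familywise_error(threshold, m):
    """P(B > 0): probability of at least one false rejection among m
    independent tests of true nulls, each rejected when p < threshold."""
    return 1.0 - (1.0 - threshold) ** m


def bonferroni_threshold(alpha, m):
    """Per-test threshold T(p) = alpha / m."""
    return alpha / m


alpha = 0.01
for m in (1, 10, 100, 1000):
    uncorrected = familywise_error(alpha, m)
    corrected = familywise_error(bonferroni_threshold(alpha, m), m)
    print(m, uncorrected, corrected)
```

With the uncorrected threshold the probability of at least one false discovery approaches 1 as m grows, while with the Bonferroni threshold it stays below α, at the price of a per-test threshold that shrinks as 1/m.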

Page 24:

Controlling false discovery fraction

We desire to control (= bound) the ratio of false discoveries over the total number of claims: B/R = B/(B+S) ≤ q.

The level T(p) is then chosen accordingly.

Let us take the simple case in which signals are easily separable (e.g. high SNR):

T(p) = B/m0 ≈ B/m

FDR: q = B/R

T(p) = q R/m, i.e. q = m T(p)/R

[Figure: pdf of p, with the signal S concentrated near p = 0 over a flat background of m0 nulls; cumulative counts versus p, showing B, S and R at the threshold T(p)]

Page 25:

Benjamini & Hochberg FDR control procedure

Among the procedures that accomplish this task, one simple recipe was proposed by Benjamini & Hochberg (JRSS-B (1995) 57:289-300)

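• choose your desired FDR q (don’t ask too much!);

• define c(m) = 1 if the p-values are independent or positively correlated; otherwise c(m) = Σj (1/j), summed over j = 1, …, m;

• compute the p-values {p1, p2, … pm} for a set of tests, and sort them in increasing order;

• determine the threshold T(p) = pk by finding the index k such that pj > (q/m) j/c(m) for every j > k, i.e. k is the largest index with pk ≤ (q/m) k/c(m);

• reject H0 for every test with pj ≤ T(p).

[Figure: sorted p-values versus rank (1 to m), compared with a straight line that reaches q/c(m) at rank m; H0 is rejected for the p-values up to the threshold T(p)]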
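The recipe on this slide can be written as a short function. This is a minimal sketch rather than a reference implementation: the function name, the `independent` flag used to select c(m), and the list of p-values are all hypothetical.

```python
def benjamini_hochberg(p_values, q, independent=True):
    """Benjamini & Hochberg step-up procedure.

    p_values:    list of p-values, in any order
    q:           desired false discovery rate
    independent: True  -> c(m) = 1 (independent or positively correlated tests)
                 False -> c(m) = sum over j = 1..m of 1/j
    Returns (threshold, rejected), where rejected lists the indices of the
    tests whose null is rejected; threshold is None if nothing is rejected.
    """
    m = len(p_values)
    if m == 0:
        return None, []
    c_m = 1.0 if independent else sum(1.0 / j for j in range(1, m + 1))
    # Indices of the tests, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # k is the largest rank j such that p_(j) <= (q/m) * j / c(m).
    k = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= (q / m) * rank / c_m:
            k = rank
    if k == 0:
        return None, []
    threshold = p_values[order[k - 1]]
    return threshold, sorted(order[:k])


# Hypothetical p-values from m = 10 tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold, rejected = benjamini_hochberg(p, q=0.05)
print(threshold, rejected)
```

With these assumed p-values and q = 0.05, the procedure rejects the two smallest p-values, whereas a Bonferroni threshold of 0.05/10 = 0.005 would reject only the first.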

Page 26:

Status of AURIGA experiment

Page 27:

New low frequency vibrational damping

New external suspensions: “fast” assembling in April-June

Air springs: effective above 1-2 Hz

Page 28:

Low frequency suspensions installed (May 19th)

Page 29:

Low frequency suspensions installed (May 19th)

suspension activation

Page 30:

Low frequency suspensions installed (May 19th)

before after

Page 31:

Low frequency suspensions installed (May 19th)

21/05/2005

Page 32:

Noise power spectral density

• Theoretical prediction by a three-mode model (spurious modes not included!).

• Most parameters of the model directly measured on the assembled detector or in separate tests (Test Facility).

• Mechanical Q set to: Qbar = 4×10^6, Qtr = 1.5×10^6

(Note: one-sided spectra!)

Page 33:

Collaborations

2004/07/12 AURIGA-LIGO

2005/01/17 IGEC-2

2005/06/07 AURIGA-VIRGO

2005/06/?? AURIGA-TAMA

currently exchanging data

S3 analysis almost finished

Data exchange in September with next VIRGO engineering run

Agreement to be signed at Amaldi6

Attachment in preparation

Page 34:

Possible next upgrades: (1) Increase bias field

Increase bias field: E x 2.5

(DUAL R&D target: factor 10-100)

1) Mechanical unchanged

2) Electrical + BA decreases (as 1/E^(1/2))

Page 35:

Possible next upgrades: (2) decrease temperature

Ultracryogenic operation (like AURIGA first run): T=100 mK

All noise components in AURIGA decrease with T!

1) SQUID noise saturates at 200 mK!

2) Mechanical and electrical noise assumed thermal (scale with T down to 100 mK)