Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts


Page 1: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Ian Jolliffe and David Stephenson

EMS September 2013 1

[Figure: ratio vs. (a+c) for n = 20, 100, 500]

1. Sampling uncertainty and sampling schemes for (2x2) tables
2. Hit rate
3. Extensions – other measures and serial correlation

Page 2: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

EMS September 2013 2

Binary deterministic forecasts

• Such forecasts are fairly common – forecast whether or not an event will occur
• Their format leads to a (2x2) contingency table

Page 3: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

EMS September 2013 3

(2 x 2) table and some verification measures

• a/(a+c): Hit rate (H) = probability of detection
• b/(b+d): False alarm rate (F) = probability of false detection
• H - F: Peirce's (1884) skill score (PSS)
• (a+d)/n: Proportion correct (PC)
• (a+b)/(a+c): Frequency bias
• a/(a+b+c): Critical success index (CSI) = threat score

                     Event observed    Event not observed       Total
Event forecast       a (Hits)          b (False alarms)         a + b
Event not forecast   c (Misses)        d (Correct rejections)   c + d
Total                a + c             b + d                    n

… and many more: 18 in Chapter 3 (by Hogan & Mason) of Jolliffe and Stephenson (2012), Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd edition, Wiley.
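Since these measures recur throughout the talk, here is a minimal Python sketch (not from the original slides; the function name is my own) computing them from the four cell counts:

```python
def verification_measures(a, b, c, d):
    """Verification measures from a (2 x 2) contingency table.

    a: hits, b: false alarms, c: misses, d: correct rejections.
    """
    n = a + b + c + d
    H = a / (a + c)   # hit rate = probability of detection
    F = b / (b + d)   # false alarm rate = probability of false detection
    return {
        "H (hit rate)": H,
        "F (false alarm rate)": F,
        "PSS (Peirce skill score)": H - F,
        "PC (proportion correct)": (a + d) / n,
        "frequency bias": (a + b) / (a + c),
        "CSI (threat score)": a / (a + b + c),
    }

# Example with the Finley tornado data used later in the talk:
print(verification_measures(3, 19, 7, 511))
```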

Page 4: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Uncertainty/inference for verification measures

• Given the value of some verification measure, some idea of its uncertainty is needed to make inferences, e.g. to construct confidence intervals

• The example is a subset of the well-known Finley tornado data for May 1884. The figure resamples from these data.

EMS September 2013 4

                       Tornado observed    Tornado not observed    Total
Tornado forecast       3                   19                      22
Tornado not forecast   7                   511                     518
Total                  10                  530                     540

[Figure: histogram of resampled hit rate (hr); x-axis 0.0–1.0, y-axis frequency 0–1200]
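The slides do not show the resampling code; a plausible reconstruction in Python (multinomial resampling of the whole table, with the seed and resample count chosen arbitrarily) is:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

counts = np.array([3, 19, 7, 511])   # hits, false alarms, misses, correct rejections
n = counts.sum()                     # 540 forecast/observation pairs

hr = []
for _ in range(10_000):
    a, b, c, d = rng.multinomial(n, counts / n)   # resample the table
    if a + c > 0:                                 # hit rate undefined with no events
        hr.append(a / (a + c))

plt.hist(hr, bins=20)
plt.xlabel("hr")
plt.ylabel("Frequency")
plt.title("Histogram of hr")
plt.show()
```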

Page 5: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Sampling schemes

                     Event observed    Event not observed    Total
Event forecast       a                 b                     a + b
Event not forecast   c                 d                     c + d
Total                a + c             b + d                 n

EMS September 2013 5

Could have:
1. a, b, c, d all independent (Poisson)
2. n fixed; a, b, c, d multinomial
3. Row totals fixed or column totals fixed – independent binomials
4. Row totals and column totals fixed – hypergeometric

Which is most plausible?
Does it make much difference?

Page 6: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

[Figure: histograms of hit rate under multinomial sampling (left) and binomial sampling (right)]

Binomial sampling has a+c fixed at 10, so the hit rate is always a multiple of 1/10. Multinomial sampling shows additional variation in hit rates between these tenths.

EMS September 2013 6
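A sketch of how the two panels could be generated for the tornado example (my reconstruction; θH = 0.3 and the base rate 10/540 are the plug-in values from the data):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, theta_H = 10_000, 3 / 10

# Binomial sampling: a+c fixed at 10, so every H is a multiple of 1/10.
H_binom = rng.binomial(10, theta_H, size=n_sims) / 10

# Multinomial sampling: the whole table varies, so a+c varies too.
probs = np.array([3, 19, 7, 511]) / 540
tables = rng.multinomial(540, probs, size=n_sims)
a, c = tables[:, 0], tables[:, 2]
keep = (a + c) > 0                      # discard tables with no observed events
H_multi = a[keep] / (a + c)[keep]

print("var(H) under binomial sampling:   ", H_binom.var())
print("var(H) under multinomial sampling:", H_multi.var())
```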

Page 7: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Sampling schemes

                     Event observed    Event not observed    Total
Event forecast       a                 b                     a + b
Event not forecast   c                 d                     c + d
Total                a + c             b + d                 n

EMS September 2013 7

Could have:
1. a, b, c, d all independent (Poisson)
2. n fixed; a, b, c, d multinomial
3. Row totals fixed or column totals fixed – independent binomials
4. Row totals and column totals fixed – hypergeometric

• The second of these is the most plausible for much climate data
• Hogan & Mason (Chapter 3 of Jolliffe & Stephenson) give (approximate) variances for 16 measures, but they assume column totals fixed.

Page 8: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Sampling schemes

                     Event observed    Event not observed    Total
Event forecast       a                 b                     a + b
Event not forecast   c                 d                     c + d
Total                a + c             b + d                 n

EMS September 2013 8

Could have:
1. a, b, c, d all independent (Poisson)
2. n fixed; a, b, c, d multinomial
3. Row totals fixed or column totals fixed – independent binomials
4. Row totals and column totals fixed – hypergeometric

• The second of these is the most plausible for much climate data – but you may disagree!!
• Hogan & Mason (Chapter 3 of Jolliffe & Stephenson) give (approximate) variances for 16 measures, but they assume column totals fixed.

Page 9: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Variance of hit rate

EMS September 2013 9

• Hit rate or probability of detection is H = a/(a+c)
• Suppose that (a+c) is fixed (binomial sampling) and that θH is the probability that the event has been forecast, given that it occurred
• Then var(H) = θH(1 - θH)/(a+c), which is estimated by ac/(a+c)^3
• The multinomial sampling scheme can be obtained by first sampling (a+c) from a binomial with n trials and probability of success equal to the probability of the event occurring (the base rate); then, given the sampled value of (a+c), sampling from a binomial with (a+c) trials and probability of success θH (see the sketch below)
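For the tornado data this gives a = 3, c = 7, so the estimated var(H) = 3·7/10^3 = 0.021. A minimal sketch of the two-stage construction (simulation settings are my own):

```python
import numpy as np

rng = np.random.default_rng(2)

a, c = 3, 7
print("estimated var(H):", a * c / (a + c) ** 3)   # 0.021

# Two-stage construction of multinomial sampling:
n, s, theta_H = 540, 10 / 540, 0.3
ac = rng.binomial(n, s, size=100_000)              # stage 1: a+c ~ Bin(n, base rate)
ac = ac[ac > 0]                                    # drop samples with no events
H = rng.binomial(ac, theta_H) / ac                 # stage 2: a ~ Bin(a+c, theta_H)
print("simulated var(H), multinomial:", H.var())
```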

Page 10: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Variances of hit rate II

EMS September 2013 10

• It turns out that with multinomial sampling, var(H) = θH(1 - θH)/(a+c) is replaced by var(H) = θH(1 - θH)E[(a+c)^-1], with a slight abuse of notation
• Using a variance expression based on fixed (a+c) ignores the variability in (a+c) that occurs under multinomial sampling
• There is a complication that (a+c) can equal zero, leading to an infinite value of E[(a+c)^-1], but data with (a+c) = 0 can be ignored, as they provide no information on the performance of the forecasts; the sketch below computes the resulting conditional expectation
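Under this convention, E[(a+c)^-1] can be computed exactly by summing over the binomial distribution of (a+c), conditioned on (a+c) > 0. A short sketch (the helper name is my own) that reproduces the n = 100 column of the table on the next slide:

```python
import numpy as np
from scipy.stats import binom

def variance_ratio(n, s):
    """ns * E[1/(a+c) | a+c > 0] for (a+c) ~ Binomial(n, s).

    This is the multinomial-to-binomial variance ratio for the hit rate,
    with the binomial variance evaluated at the expected event count ns.
    """
    k = np.arange(1, n + 1)
    e_inv = np.sum(binom.pmf(k, n, s) / k) / (1 - binom.pmf(0, n, s))
    return n * s * e_inv

for s in (0.02, 0.06, 0.10, 0.20):
    print(f"s = {s:.2f}: ratio = {variance_ratio(100, s):.3f}")
    # 1.153, 1.231, 1.115, 1.044
```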

Page 11: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Multinomial vs. binomial comparison for hit rate

• The table gives, for n = 100, some values of the ratio of multinomial to binomial variances for various values of (a+c)

• The diagram shows this ratio for more values of (a+c) and three values of n

EMS September 2013 11

s       (a+c)   ns·E[(a+c)^-1]
0.02    2       1.153
0.06    6       1.231
0.10    10      1.115
0.20    20      1.044

[Figure: variance ratio vs. (a+c) for n = 20, 100, 500]

Page 12: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Multinomial vs. binomial comparison

EMS September 2013 12

[Figure: variance ratio vs. (a+c) for n = 20, 100, 500]

• Inflation of variance for most values of (a+c)

• Exception for very small values of (a+c) – due to frequently discarded zero values?

• Maximum inflation of around 30% occurs around (a+c) = 4

• Inflation decreases towards 0 as (a+c) increases

• A remarkable similarity of curves for different n

• For the tornado data, the multinomial variance is 12.7% larger than the binomial variance

Page 13: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Extensions

• Only one measure (hit rate) has been examined here
• Exactly the same reasoning can be used for other measures with a similar ratio formula
• Modifications are needed for other measures
• Serial correlation is another complication – the results given assume independence, which is not necessarily true. It can have a bigger effect than the choice of sampling scheme.

EMS September 2013 13

Page 14: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Conclusions

• When reporting the value of a verification measure, it is important to quantify the uncertainty associated with that value
• For the seemingly simple case of data in a (2x2) contingency table this is a surprisingly subtle task, because:
  – Different sampling schemes lead to different variances
  – Serial correlation (or other forms of dependence) also changes variances
• Some fairly general results can be found, but for many measures and situations tailor-made calculations may be needed
• Notwithstanding the difficulties, the calculations should be done

EMS September 2013 14

Page 15: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

EMS September 2013 15

Questions?

Comments?

[email protected]

Page 16: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Other verification measures

• Exactly the same reasoning can be used to obtain multinomial-based variances for measures which are proportions, with the denominator equal to a sum of cell counts and the numerator a sum of a subset of those counts, for example:
  – F = False alarm rate = b/(b+d)
  – J = Threat score = a/(a+b+c)
• The variance comparison table for H can be used (see the sketch below):
  – For F, replacing (a+c) by (b+d)
  – For J, replacing (a+c) by (a+b+c). The comparison here is with an unrealistic sampling scheme, which nonetheless corresponds to a variance estimate given in the literature.
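Concretely, with the variance_ratio helper from the earlier sketch (repeated here so the block is self-contained), the tornado-data ratios are:

```python
import numpy as np
from scipy.stats import binom

def variance_ratio(n, s):
    """ns * E[1/X | X > 0] for X ~ Binomial(n, s), as in the earlier sketch."""
    k = np.arange(1, n + 1)
    return n * s * np.sum(binom.pmf(k, n, s) / k) / (1 - binom.pmf(0, n, s))

n = 540                             # tornado data
print(variance_ratio(n, 10 / n))    # H: denominator (a+c) = 10
print(variance_ratio(n, 530 / n))   # F: denominator (b+d) = 530
print(variance_ratio(n, 29 / n))    # J: denominator (a+b+c) = 3 + 19 + 7 = 29
```

The H and J values correspond to the 12.7% and 3.4% increases quoted on the next slide.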

EMS September 2013 16

Page 17: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Other verification measures II

• For proportion correct, there are exact analytic expressions for variance under both binomial and multinomial sampling, which can be compared

EMS September 2013 17

For the tornado data, the percentage increases in variance for multinomial sampling, compared to the alternative scheme assumed in each case, are 12.7% (H), 3.4% (J) and 17.5% (PC)

Asymptotic expressions are available for some other measures, but different considerations are needed for exact values, possibly including simulation

Page 18: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Serial correlation – another complication

• All that has been said has assumed independence of the n observations being forecast

• This is not necessarily true – there may be serial correlation. Rain today may be more likely if there was rain yesterday than if there was not

• Serial correlation can have a bigger effect on variance than assuming the wrong sampling scheme

EMS September 2013 18

Page 19: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Serial correlation – an example

• Gabriel & Neumann (1962), QJRMS, 88, 90-95, give data on wet/dry days in Tel Aviv for 27 years of daily data, November–April

• There is serial correlation – for example, for November the probability of a wet day following a wet (dry) day is 0.60 (0.13)

• To assess how much such serial correlation affects the variances of verification measures, use Markov chain simulation

EMS September 2013 19

Page 20: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Markov chain simulation

• Wilks (2010), QJRMS, 136, 2109-2118 considers probability forecasts and builds in serial dependence between forecasts directly

• We consider binary deterministic forecasts with dependence built directly into the observations and hence indirectly into the forecasts

• We simulate from a two-state Markov chain for various values of n (sample size), s (base rate) and ρ, the serial correlation
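A sketch of one such simulation (my reconstruction; the transition probabilities are parameterised so that the stationary base rate is s and the lag-1 autocorrelation is ρ). Assuming forecasts depend on the past only through the observations, var(H) = θH(1 - θH)E[(a+c)^-1] still holds, so it suffices to simulate the observation chain and track 1/(a+c):

```python
import numpy as np

rng = np.random.default_rng(3)

def markov_variance_ratio(n, s, rho, n_sims=20_000):
    """ns * E[1/(a+c) | a+c > 0] when observations follow a 2-state Markov chain."""
    p11 = s + rho * (1 - s)      # P(event today | event yesterday)
    p01 = s * (1 - rho)          # P(event today | no event yesterday)
    inv_counts = []
    for _ in range(n_sims):
        obs = np.empty(n, dtype=bool)
        obs[0] = rng.random() < s            # start from the stationary distribution
        for t in range(1, n):
            obs[t] = rng.random() < (p11 if obs[t - 1] else p01)
        count = obs.sum()
        if count > 0:                        # discard series with no observed events
            inv_counts.append(1.0 / count)
    return n * s * np.mean(inv_counts)

print(markov_variance_ratio(100, 0.10, 0.25))   # compare with 1.216 in the next table
print(markov_variance_ratio(100, 0.10, 0.75))   # compare with 1.876
```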

EMS September 2013 20

Page 21: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Serial correlation comparison for hit rate

• The table gives, for n = 100, some values of the ratio of variances with/without serial correlation for various values of (a+c) and ρ

• The diagram shows this ratio against ρ for several combinations of n and (a+c)

EMS September 2013 21

s       (a+c)   ρ       ns·E[(a+c)^-1]
0.10    10      0.25    1.216
0.10    10      0.75    1.876
0.20    20      0.25    1.078
0.20    20      0.75    1.548

[Figure: variance ratio vs. ρ for n = 20, 100, 500 and (a+c) = 10, 20, 100]

Page 22: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Serial correlation – simulation results

• Ratio gets bigger for increasing ρ

• Largest values are bigger than when comparing sampling schemes

• For given n, things get worse as (a+c) decreases

• Things get worse for lower base rate

EMS September 2013 22

[Figure: variance ratio vs. ρ for n = 20, 100, 500 and (a+c) = 10, 20, 100]

Page 23: Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts

Serial correlation – examples

• The Gabriel/Neumann data have large n and moderate s and ρ, so the effect of serial correlation is small
• For example, for November, ρ = 0.47, s = 0.24 and n = 810, leading to only a 1% increase in variance
• For the May tornado data, n is again large (540) but s is much smaller (0.02). We don't know ρ, but if it were 0.5, the variance would be increased by about 30% by serial correlation.
• In reality non-independence is likely to exist in the tornado data, but it will be more complex, with space and time both involved

EMS September 2013 23