Adventures in high quantile estimation

jrothenb – 1 Joerg Rothenbuehler

Extreme Tail's Tales. Adventures in high quantile estimation. Joerg Rothenbuehler. The distribution of the Maximum. Fisher-Tippett Theorem. The Extreme Value Distributions. Generalized Extreme Value Distribution (GEV).

The distribution of the Maximum:

Let $X_1, X_2, \ldots$ be iid random variables with cdf $F(x)$, and let $M_n = \max(X_1, X_2, \ldots, X_n)$. Then

$$P[M_n \le x] = F^n(x).$$

As a consequence, $M_n \to x_F$ a.s., where $x_F = \sup\{x \in \mathbb{R} : F(x) < 1\}$.
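A quick Monte Carlo check of $P[M_n \le x] = F^n(x)$, as a sketch: it assumes iid Uniform(0,1) variables, so $F(x) = x$ on $[0,1]$ and $x_F = 1$. The helper name `empirical_max_cdf`, the seed and the number of trials are illustrative choices, not taken from the slides.

```python
import random

def empirical_max_cdf(n, x, trials=20_000, seed=1):
    """Monte Carlo estimate of P[M_n <= x], M_n = max of n iid Uniform(0,1) draws."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if max(rng.random() for _ in range(n)) <= x)
    return hits / trials

n, x = 10, 0.9
estimate = empirical_max_cdf(n, x)
exact = x ** n  # F^n(x) with F(x) = x on [0, 1]
print(f"simulated P[M_10 <= 0.9] = {estimate:.4f}, F^n(0.9) = {exact:.4f}")
# For fixed x < x_F = 1, F^n(x) -> 0 as n grows, which is why M_n -> x_F.
print(f"F^n(0.9) for n = 1000: {0.9 ** 1000:.2e}")
```

The second line illustrates the almost-sure convergence $M_n \to x_F$: for any fixed $x < x_F$ the probability that the maximum stays below $x$ vanishes.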

Fisher-Tippett Theorem

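Suppose there exist constants $c_n > 0$, $d_n \in \mathbb{R}$ and a non-degenerate df $H$ such that

$$c_n^{-1}(M_n - d_n) \xrightarrow{d} H. \qquad (1)$$

Then $H$ is either Fréchet, Weibull, or Gumbel. These distributions are called Extreme Value Distributions (EVD).

Notation: $F \in \mathrm{MDA}(H)$ (maximum domain of attraction of $H$) if (1) holds.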
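A worked instance of (1), assuming iid standard exponential variables (an example chosen here, not taken from the slides): with $c_n = 1$ and $d_n = \log n$, $P[M_n - \log n \le x] = (1 - e^{-x}/n)^n \to \exp(-e^{-x})$, so the exponential distribution lies in the MDA of the Gumbel distribution. The sketch measures the largest gap to the limit on a grid of $x$ values.

```python
import math

def normalized_max_cdf(n, x):
    """P[(M_n - d_n) / c_n <= x] for n iid Exp(1) variables, with c_n = 1, d_n = log n."""
    t = x + math.log(n)                                   # level on the original scale
    return (1.0 - math.exp(-t)) ** n if t > 0 else 0.0    # F^n(t), F(t) = 1 - e^{-t}

def gumbel_cdf(x):
    return math.exp(-math.exp(-x))                        # Lambda(x)

grid = [i / 10 for i in range(-20, 51)]                   # x from -2.0 to 5.0
gaps = {}
for n in (10, 100, 10_000):
    gaps[n] = max(abs(normalized_max_cdf(n, x) - gumbel_cdf(x)) for x in grid)
    print(f"n = {n:>6}: max |P[M_n - log n <= x] - Lambda(x)| = {gaps[n]:.1e}")
```

The gap shrinks roughly like $1/n$, which is the convergence in distribution that the theorem asserts.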

The Extreme Value Distributions

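Fréchet: $\Phi_\alpha(x) = \exp(-x^{-\alpha})$, $x > 0$, $\alpha > 0$, is the limit of heavy-tailed distributions with tail index $\alpha$: $1 - F(x) \sim c\,x^{-\alpha}$ as $x \to \infty$.

Weibull: $\Psi_\alpha(x) = \exp(-(-x)^{\alpha})$, $x \le 0$, $\alpha > 0$, is the limit of distributions with $x_F < \infty$.

Gumbel: $\Lambda(x) = \exp(-e^{-x})$, $x \in \mathbb{R}$, is the limit of light-tailed distributions with $x_F = \infty$.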

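Generalized Extreme Value Distribution (GEV)

The three EVDs can be represented by a single three-parameter distribution, called the generalized EVD (GEV):

$$H_{\xi,\mu,\sigma}(x) = \begin{cases} \exp\left(-\left(1 + \xi\,\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}\right) & \text{if } \xi \ne 0, \\[2ex] \exp\left(-\exp\left(-\dfrac{x-\mu}{\sigma}\right)\right) & \text{if } \xi = 0, \end{cases}$$

where $1 + \xi\,\dfrac{x-\mu}{\sigma} > 0$.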
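A direct transcription of $H_{\xi,\mu,\sigma}$ into Python, as a sketch: the function name `gev_cdf` and the choice to return 0 below the support (for $\xi > 0$) or 1 above it (for $\xi < 0$) are assumptions made here so the function is defined for every real $x$.

```python
import math

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """GEV cdf H_{xi,mu,sigma}(x) in the parameterization above."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))          # Gumbel case
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0)
        # or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

at_mu = [gev_cdf(0.0, xi) for xi in (-0.5, 0.0, 0.5)]   # H(mu) = exp(-1) for every xi
near_gumbel = gev_cdf(1.0, 1e-8) - gev_cdf(1.0, 0.0)    # a tiny xi reproduces the Gumbel value
print(at_mu, near_gumbel)
```

The second check shows in what sense the $\xi = 0$ branch is the limit of the $\xi \ne 0$ branch.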

The function of the parameters

$\xi \in \mathbb{R}$ is called the shape parameter, $\mu \in \mathbb{R}$ the location parameter, and $\sigma > 0$ the scaling parameter. $\mu$ and $\sigma$ are introduced for the purpose of fitting the GEV to a dataset. The crucial parameter is the shape parameter:

$\xi > 0$: $H$ corresponds to the Fréchet dist. with $\xi = 1/\alpha$
$\xi = 0$: $H$ corresponds to the Gumbel dist.
$\xi < 0$: $H$ corresponds to the Weibull dist. with $\xi = -1/\alpha$
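The correspondences above can be checked numerically. In this sketch the location and scale values are chosen here (not given on the slides) by substituting into $H_{\xi,\mu,\sigma}$: with $\mu = 1$, $\sigma = 1/\alpha$, $\xi = 1/\alpha$ the term $1 + \xi(x-\mu)/\sigma$ becomes $x$ and the GEV is exactly $\Phi_\alpha$; with $\mu = -1$, $\sigma = 1/\alpha$, $\xi = -1/\alpha$ it becomes $-x$ and the GEV is exactly $\Psi_\alpha$.

```python
import math

def gev_cdf(x, xi, mu, sigma):
    """GEV cdf H_{xi,mu,sigma}(x); returns 0 or 1 outside the support."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def frechet_cdf(x, alpha):   # Phi_alpha
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0

def weibull_cdf(x, alpha):   # Psi_alpha, the EVT "Weibull", supported on x <= 0
    return math.exp(-(-x) ** alpha) if x <= 0 else 1.0

alpha = 2.5                                 # arbitrary tail index for the check
grid = [i / 4 for i in range(-20, 21)]      # x from -5 to 5
frechet_err = max(abs(gev_cdf(x, 1 / alpha, 1.0, 1 / alpha) - frechet_cdf(x, alpha)) for x in grid)
weibull_err = max(abs(gev_cdf(x, -1 / alpha, -1.0, 1 / alpha) - weibull_cdf(x, alpha)) for x in grid)
print(frechet_err, weibull_err)
```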

Excesses over high thresholds

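For a high threshold $a$, the excesses over $a$ are approximately GPD distributed:

$$P[X - a \le x \mid X > a] \approx G_{\xi,\beta(a)}(x).$$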

Generalized Pareto Distribution (GPD)

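$$G_{\xi,\beta}(x) = \begin{cases} 1 - \left(1 + \xi\,\dfrac{x}{\beta}\right)^{-1/\xi} & \text{if } \xi \ne 0, \\[2ex] 1 - \exp(-x/\beta) & \text{if } \xi = 0, \end{cases}$$

with parameters $\beta > 0$ and $\xi \in \mathbb{R}$, where $x \ge 0$ if $\xi \ge 0$, and $0 \le x \le -\beta/\xi$ if $\xi < 0$.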

Properties of GPD

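$\xi > 0$: $G$ is heavy tailed with tail index $1/\xi$
$\xi = 0$: $G$ is light tailed
$\xi < 0$: $G$ has finite right endpoint $x_F = -\beta/\xi$

If $X \sim$ GPD with $\xi < 1$ and $\beta$, then for $a < x_F$:

$$e(a) = E[X - a \mid X > a] = \frac{\beta + \xi a}{1 - \xi}, \qquad \beta + \xi a > 0.$$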
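The linear form of $e(a)$ comes from threshold stability: if $X \sim G_{\xi,\beta}$, the excess $X - a$ given $X > a$ is again GPD with the same $\xi$ and scale $\beta + \xi a$, and a GPD with $\xi < 1$ has mean $\text{scale}/(1 - \xi)$. A sketch that checks both steps numerically, assuming $\xi = 0.2$, $\beta = 1$ (the values of the mean excess figure) and a simple midpoint rule for $e(a) = \int_0^\infty P[X - a > x \mid X > a]\,dx$; the helper names are hypothetical.

```python
def gpd_sf(x, xi, beta):
    """Survival function 1 - G_{xi,beta}(x) for x >= 0 (xi != 0 assumed)."""
    t = 1.0 + xi * x / beta
    return t ** (-1.0 / xi) if t > 0 else 0.0

def mean_excess_formula(a, xi, beta):
    """e(a) = (beta + xi * a) / (1 - xi), valid for xi < 1 and beta + xi * a > 0."""
    return (beta + xi * a) / (1.0 - xi)

def mean_excess_numeric(a, xi, beta, upper=1_000.0, steps=100_000):
    """Integral of P[X - a > x | X > a] over [0, upper] by the midpoint rule."""
    h = upper / steps
    total = sum(gpd_sf(a + (k + 0.5) * h, xi, beta) for k in range(steps))
    return h * total / gpd_sf(a, xi, beta)

xi, beta = 0.2, 1.0
rows = []
for a in (0.0, 1.0, 5.0):
    # Threshold stability: the excess X - a given X > a is GPD(xi, beta + xi * a).
    stability_gap = max(
        abs(gpd_sf(a + x, xi, beta) / gpd_sf(a, xi, beta) - gpd_sf(x, xi, beta + xi * a))
        for x in (0.1, 1.0, 10.0)
    )
    rows.append((a, mean_excess_formula(a, xi, beta), mean_excess_numeric(a, xi, beta), stability_gap))
    print("a = %.1f: formula %.4f, numeric %.4f, stability gap %.1e" % rows[-1])
```

Because $e(a)$ is a straight line in $a$ with slope $\xi/(1-\xi)$, a roughly linear empirical mean excess plot is the usual sign that the excesses behave like a GPD.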

The Empirical Mean Excess Function

[Figure: the empirical mean excess function of a GPD with ξ = 0.2, β = 1]
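The empirical mean excess function averages the observed excesses, $e_n(u) = \sum_i (X_i - u)\,1\{X_i > u\} \,/\, \#\{i : X_i > u\}$. A minimal sketch, assuming a simulated GPD sample with $\xi = 0.2$, $\beta = 1$ drawn by inverse-transform sampling (sample size, seed and helper names are illustrative), so the result can be compared with the theoretical line $(\beta + \xi u)/(1 - \xi)$:

```python
import random

def empirical_mean_excess(data, u):
    """e_n(u): average of X_i - u over the observations with X_i > u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observations above the threshold")
    return sum(excesses) / len(excesses)

def sample_gpd(n, xi, beta, seed=0):
    """Inverse-transform sample from G_{xi,beta} (xi != 0)."""
    rng = random.Random(seed)
    return [beta / xi * ((1.0 - rng.random()) ** (-xi) - 1.0) for _ in range(n)]

xi, beta = 0.2, 1.0
data = sample_gpd(50_000, xi, beta)
for u in (0.0, 1.0, 2.0, 4.0):
    print(f"u = {u}: e_n(u) = {empirical_mean_excess(data, u):.3f}, "
          f"GPD line = {(beta + xi * u) / (1 - xi):.3f}")
```

Plotting $e_n(u)$ against $u$ and looking for the region where it becomes roughly linear is how the plot is used to pick a threshold; at high $u$ only a few excesses remain, so the curve gets noisy there.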

Modeling Extreme Events:

• The exceedances of a high threshold occur according to a Poisson process (iid exponentially distributed interarrival times).
• Excesses over a high threshold can be modeled by a GPD.
• An appropriate value of the high threshold can be found by plotting the empirical mean excess function.
• The distribution of the maximum of a Poisson number of iid excesses over a high threshold is a GEV with the same shape parameter as the corresponding GPD.
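The last point can be made concrete. If $N \sim \text{Poisson}(\lambda)$ and the excesses are iid $G_{\xi,\beta}$, then (counting an empty maximum as $\le x$) $P[\max \le x] = \sum_k P[N = k]\,G^k(x) = \exp(-\lambda(1 - G_{\xi,\beta}(x)))$, which equals $H_{\xi,\mu,\sigma}(x)$ with $\sigma = \beta\lambda^{\xi}$ and $\mu = \beta(\lambda^{\xi} - 1)/\xi$. The sketch sums the Poisson series term by term and compares it with that GEV; $\lambda = 5$, $\xi = 0.2$, $\beta = 1$ are arbitrary illustrative values and $\xi \ne 0$ is assumed.

```python
import math

def gpd_cdf(x, xi, beta):
    """G_{xi,beta}(x) for x >= 0, xi != 0, inside the support."""
    return 1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi)

def gev_cdf(x, xi, mu, sigma):
    """H_{xi,mu,sigma}(x) for xi != 0, inside the support."""
    return math.exp(-(1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi))

def poisson_max_cdf(x, lam, xi, beta, kmax=200):
    """P[max of N iid GPD excesses <= x], N ~ Poisson(lam), summed term by term."""
    g = gpd_cdf(x, xi, beta)
    total, term = 0.0, math.exp(-lam)       # term = P(N = k) * g**k, starting at k = 0
    for k in range(kmax):
        total += term
        term *= lam * g / (k + 1)
    return total

lam, xi, beta = 5.0, 0.2, 1.0
mu = beta * (lam ** xi - 1.0) / xi          # GEV location implied by (lam, beta, xi)
sigma = beta * lam ** xi                    # GEV scale; the shape xi is unchanged
err = max(abs(poisson_max_cdf(x, lam, xi, beta) - gev_cdf(x, xi, mu, sigma))
          for x in (0.5, 1.0, 2.0, 5.0, 20.0))
print(f"GEV(xi={xi}, mu={mu:.4f}, sigma={sigma:.4f}); max discrepancy = {err:.2e}")
```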

Extremal Index of a Stationary Time Series

The extremal index $\theta$, $0 < \theta \le 1$, measures the dependence of the data in the tails. $1/\theta$ can be interpreted as the average cluster size in the tails: high values appear in clusters of size $1/\theta$. $\theta = 1$ means there is no clustering in the tails.

If the data do not show strong long-range dependence but have extremal index $\theta$, their maxima have distribution $H^{\theta}(x)$, where $H$ is the GEV of iid data with the same marginal distribution.

GPD analysis may not be appropriate for data with $\theta < 1$.
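One simple way to estimate $\theta$ is the blocks method, sketched below. The slides do not say which estimator was used for the extremal index plots, so this is an illustration only: each block that contains at least one exceedance counts as one cluster, $\hat\theta = (\#\text{blocks with an exceedance}) / (\#\text{exceedances})$, and $1/\hat\theta$ is then the average cluster size. The threshold, block size and toy series are assumptions.

```python
def blocks_extremal_index(data, u, block_size):
    """Blocks estimator of the extremal index theta.

    Each block with at least one exceedance of u is treated as one cluster,
    so theta_hat = (# blocks with an exceedance) / (# exceedances).
    An incomplete final block is ignored.
    """
    n_full = len(data) // block_size
    blocks = [data[i * block_size:(i + 1) * block_size] for i in range(n_full)]
    n_exceed = sum(1 for b in blocks for x in b if x > u)
    if n_exceed == 0:
        raise ValueError("no exceedances of the threshold")
    n_clusters = sum(1 for b in blocks if max(b) > u)
    return n_clusters / n_exceed

# Toy series (not from the slides): extremes arrive in pairs vs. one at a time.
clustered = [0.0] * 100
isolated = [0.0] * 100
for start in (0, 30, 70):
    clustered[start + 4] = clustered[start + 5] = 5.0
    isolated[start + 4] = 5.0

print(blocks_extremal_index(clustered, u=1.0, block_size=10))  # 0.5: clusters of size 2
print(blocks_extremal_index(isolated, u=1.0, block_size=10))   # 1.0: no clustering
```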

The Data: Surveyor Project

• One-way delays of probe packets during one week
• Packets sent according to a Poisson process with a rate of 2/sec
• Each packet is time-stamped to measure its delay
• If the delay exceeds 10 sec, the packet is assumed lost and discarded
• Saturday and Sunday excluded from the analysis
• More details: http://telesto.advanced.org/~kalidindi/papers/INET/inet99.html

Time-Series Plot Colorado-Harvard

Monday 12:00am - Friday 8:00 pm

ACF and Extremal Index Estimation

Empirical Mean Excess Function

Estimation of the Shape Parameter as a Function of the Threshold Used in the GPD Fit

Result of the GPD Fit

Fit of a GPD for Colorado-Harvard

Threshold = 107.774, quantile of threshold = 0.9993536, number of exceedances = 500

Parameter estimates and standard errors:

             xi            beta
Estimate     -0.3319409    86.50868
Std. Error    0.03683419    4.844786

Estimations based on GPD Fit

p         quantile    sfall       empirical quantile
0.99940   114.13890   177.50201   115.638
0.99950   129.06973   188.71184   130.8681
0.99960   146.15561   201.53964   147.7083
0.99970   166.39564   216.73554   165.8802
0.99980   191.83186   235.83264   190.7157
0.99990   228.12013   263.07730   229.9743
0.99995   256.94996   284.72227   252.4705
0.99999   303.07272   319.35051   311.1122
1.00000   368.3887    368.3887    329.237

(sfall: expected shortfall, the expected value of an observation given that it exceeds the quantile)
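The quantile and sfall columns can be reproduced from the printed fit alone with the standard GPD tail estimators $\hat q_p = u + \frac{\beta}{\xi}\left(\left(\frac{1-p}{\zeta}\right)^{-\xi} - 1\right)$ and $\widehat{ES}_p = \frac{\hat q_p}{1-\xi} + \frac{\beta - \xi u}{1-\xi}$, where $\zeta = 1 - 0.9993536$ is the fraction of observations above the threshold $u$. This is a sketch, assuming these are the formulas the fitting software used; under that assumption the values agree with the table to about three decimals.

```python
# Values printed for the Colorado-Harvard GPD fit.
U = 107.774                    # threshold u
ZETA = 1.0 - 0.9993536         # fraction of observations above u (N_u / n)
XI, BETA = -0.3319409, 86.50868

def gpd_tail_quantile(p, u=U, zeta=ZETA, xi=XI, beta=BETA):
    """q_p = u + beta/xi * (((1 - p) / zeta) ** (-xi) - 1), for p >= 1 - zeta.

    p = 1 is allowed only when xi < 0 (finite right endpoint u - beta/xi).
    """
    return u + beta / xi * (((1.0 - p) / zeta) ** (-xi) - 1.0)

def gpd_expected_shortfall(p, u=U, zeta=ZETA, xi=XI, beta=BETA):
    """E[X | X > q_p] = q_p / (1 - xi) + (beta - xi * u) / (1 - xi), for xi < 1."""
    q = gpd_tail_quantile(p, u, zeta, xi, beta)
    return q / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)

for p in (0.9994, 0.9999, 0.99999, 1.0):
    print(f"{p:<8} {gpd_tail_quantile(p):10.4f} {gpd_expected_shortfall(p):10.4f}")
```

Since $\hat\xi < 0$, the quantile at $p = 1$ is the finite right endpoint $u - \beta/\xi \approx 368.39$, and the expected shortfall collapses onto it; this is why both columns read 368.3887 in the last row, while the largest observed delay (329.237) lies below it.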

Quantile estimation as a function of the threshold

[Figure legend: empirical quantile; 99.995% estimate]

Fitting a GEV to blockwise maxima

[Figure: the series divided into Block 1, Block 2, Block 3, Block 4, Block 5]
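Extracting the block maxima is the only preprocessing step the GEV fit needs. A minimal sketch; dropping an incomplete final block is an assumption, since the slides do not say how the end of the series was handled, and the toy series is made up.

```python
def block_maxima(series, block_size):
    """Maxima of consecutive, non-overlapping blocks; an incomplete final block is dropped."""
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]

delays = [3, 9, 4, 1, 7, 2, 8, 8, 0, 5, 6]   # toy data, not from the slides
print(block_maxima(delays, 2))               # [9, 4, 7, 8, 5]; the trailing 6 is dropped
```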

GEV-Fit Results for different Block sizes

Block size = 7200 : 108 Blocks

xi sigma mu

Estimation -0.3375603 59.75591 197.1503

Std. Error 0.0734163 4.69351 6.4305

Block size = 14400 : 54 Blocks

xi sigma mu

Estimation -0.4346847 51.7389 235.2513

Std. Error 0.1256784 6.2025 8.0083
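Both shape estimates are negative, so each fitted GEV is of Weibull type and has a finite upper endpoint $\mu - \sigma/\xi$. The sketch computes it from the printed estimates. It also converts block sizes to hours under the assumption that block sizes are counted in packets: at the 2/sec probe rate, 7200 packets are about one hour and 14400 about two hours, consistent with the 1 h and 2 h block sizes in the High Level Estimation tables.

```python
PACKETS_PER_SECOND = 2.0   # Poisson probe rate from the Surveyor data description

gev_fits = {  # block size in packets -> fitted parameters printed above
    7200: {"xi": -0.3375603, "sigma": 59.75591, "mu": 197.1503},
    14400: {"xi": -0.4346847, "sigma": 51.7389, "mu": 235.2513},
}

endpoints = {}
for block_size, fit in gev_fits.items():
    hours = block_size / PACKETS_PER_SECOND / 3600.0
    # For xi < 0 (Weibull type) the GEV support ends at mu - sigma / xi.
    endpoints[block_size] = fit["mu"] - fit["sigma"] / fit["xi"]
    print(f"block size {block_size} (about {hours:g} h): upper endpoint {endpoints[block_size]:.2f}")
```

For comparison, the GPD fit's quantile at $p = 1$ (368.3887) plays the same role for that model; the three endpoint estimates differ noticeably, which reflects the uncertainty in $\hat\xi$.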

High Level Estimation

Level exceeded during 1 of 50 hours

Block size   Lower      Estimate   Upper
1 h          314.8669   326.7487   355.9581
1.5 h        318.4539   327.3824   357.6406
2 h          315.7877   324.6415   352.1868

Level exceeded during 1 of 100 hours

Block size   Lower      Estimate   Upper
1 h          322.4220   336.7065   371.3975
1.5 h        325.7779   335.4343   371.0107
2 h          324.3893   332.4487   365.1899
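The Estimate column can be reproduced from the printed 1 h and 2 h GEV fits, assuming the level "exceeded during 1 of N hours" is the block-maximum return level $r$ with $H_{\xi,\mu,\sigma}(r) = 1 - 1/k$, where $k = N / (\text{block length in hours})$. Under that reading the sketch below agrees with the table to about three decimals. The 1.5 h fit parameters are not printed, and the Lower/Upper bounds need the fit's likelihood, so neither is reproduced here.

```python
import math

def gev_return_level(k, xi, mu, sigma):
    """Level r with H_{xi,mu,sigma}(r) = 1 - 1/k: exceeded by one block maximum in k on average."""
    y = -math.log(1.0 - 1.0 / k)
    if xi == 0.0:
        return mu - sigma * math.log(y)                 # Gumbel case
    return mu + sigma / xi * (y ** (-xi) - 1.0)

# Block length in hours -> (xi, mu, sigma), from the fits for 7200 and 14400 packets.
fits = {1.0: (-0.3375603, 197.1503, 59.75591), 2.0: (-0.4346847, 235.2513, 51.7389)}

levels = {}
for total_hours in (50, 100):
    for block_hours, (xi, mu, sigma) in fits.items():
        k = total_hours / block_hours      # "1 of 50 hours" = 1 of 25 two-hour blocks
        levels[(total_hours, block_hours)] = gev_return_level(k, xi, mu, sigma)
        print(f"1 of {total_hours} h, {block_hours:g} h blocks: "
              f"{levels[(total_hours, block_hours)]:.4f}")
```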

Does GPD always work? The Army-Lab. – Univ. of Virginia dataset

[Figures: time series plot; ACF plot, lags 1-1000; ACF plot, lags 5-1000; PACF plot, lags 1-1000]

What goes wrong beyond the LRD (long-range dependence):

[Figures: empirical mean excess function; shape parameter]

Non-Stationarity: Harvard to Army-Lab.

[Figure: time series plot, Monday 12 am – Friday 8 pm]

Pick a few hours per day!

[Figures for 11 am – 4 pm, Mon – Fri: mean excess plot; empirical tail distribution; shape parameter estimation]

Single Outlier: Virginia - Harvard

[Figures: empirical tail distribution; ACF, lags 3-1000; estimation of extremal index; Monday 12 am – Friday 8 pm]

The effect of the outlier on GEV

Fit without outlier (block size = 14400, 53 blocks):

             xi            sigma       mu
Estimation   -0.4988539    45.08995    117.2362
Std. Error    0.09091089    5.217626     6.735118

x1 = 280.101

Fit with outlier (block size = 14400, 53 blocks):

             xi            sigma       mu
Estimation   -0.09130441   44.97725    109.3819
Std. Error    0.06274905    4.576706     6.716929

x1 = 1242.969

The effect of the single outlier on GPD:

[Figures: analysis with outlier; analysis without outlier]

Conclusions:

The GPD is a model that can be fitted to the tails of a distribution. The quality of the fit can be checked with various methods. From the model, we can obtain quantile estimates at the edge of or beyond the data range. However, a good fit is often not possible.

The GEV provides a model for the distribution of blockwise maxima. Its use is supported by EVT for stationary time series without strong LRD, whereas the GPD is only supported in the iid case. The quality of the fit can be checked with the same kinds of tools as for the GPD model. Certain problems remain, and reliable quantile estimates are not available.

Acknowledgements:

•Applied Research Group at Telcordia:

–E. van den Berg

–K. Krishnan

–J. Jerkins

–A. Neidhardt

–Y. Chandramouli

•Cornell University:

–Prof. G. Samorodnitsky