Overview of Extreme Value Analysis (EVA)
Brian Reich
North Carolina State University
July 26, 2016
Rossbypalooza, Chicago, IL
Brian Reich Overview of Extreme Value Analysis (EVA) 1 / 24
Importance of extremes in climate research
I From heat waves to hurricanes, often the environmental processes that are the most critical to understand probabilistically are extreme events
I There is a large literature on EVA
I There are some beautiful mathematical results
I The statistical methodology is unique
I There are many open statistical problems
Common objectives in EVA
I Estimate the 1,000 year return level, i.e., the value that is exceeded on average once every 1,000 years
I Identify environmental covariates that drive extremes
I Test the hypothesis that the likelihood of an extreme event is changing over time
I Determine if two locations are asymptotically dependent
I Project the change in the 99.9th percentile in 2050
Unique statistical challenges
I Most of our intuition and methodology are built around the mean and deviation from the mean
I In EVA, concepts like mean and variance are irrelevant because they don’t speak to the tail of the distribution
I Similarly, correlation isn’t the best measure of dependence because it is based on deviation around the means
I We need new ways to describe distributions and dependence between random variables
Isolating the extremes
I The first step in classic EVA is to separate extreme observations from the bulk of the distribution
I For example, in a daily time series of precipitation in FL these correspond to very different weather regimes
I Bulk: Thunderstorms
I Tails: Hurricanes
I Mean regression focuses on thunderstorms and treats hurricanes as outliers
I If you want to estimate the 100-year storm, you should focus only on hurricanes
I Two common ways to isolate the extremes: block maxima and points above a threshold
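The two ways of isolating extremes can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the synthetic daily series, the function names block_maxima and peaks_over_threshold, and the threshold of 50 are assumptions made for the example and are not from the talk.

```python
import random

def block_maxima(values, block_size):
    """Return the maximum of each complete block of `block_size` values."""
    n_blocks = len(values) // block_size
    return [max(values[b * block_size:(b + 1) * block_size])
            for b in range(n_blocks)]

def peaks_over_threshold(values, threshold):
    """Return every value that strictly exceeds `threshold`."""
    return [v for v in values if v > threshold]

# Synthetic "daily" data: 10 years of 365 days (made up for illustration).
rng = random.Random(0)
daily = [rng.expovariate(1 / 10) for _ in range(10 * 365)]

annual_max = block_maxima(daily, 365)          # one value per year
exceedances = peaks_over_threshold(daily, 50)  # every value above 50

print(len(annual_max), len(exceedances))
```

Block maxima always return one value per block, while the number of threshold exceedances depends on the data and on the threshold chosen.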
Block maxima (BM) in Cheeseboro
The block is a year and the block maximum is the annual maximum
[Figure: hourly wind speed (y-axis, 0 to 80) versus year (x-axis, 2000 to 2010), with the annual maxima marked]
Points above a threshold (POT) for Cheeseboro
The threshold is 50 and we analyze the points in red
[Figure: hourly wind speed (y-axis, 0 to 80) versus year (x-axis, 2000 to 2010), with the points above the threshold of 50 shown in red]
Pros/cons of analyzing BM
Pros:
I Can invoke EVA theory and use a simple model
I It removes dependence within block
Cons:
I Excludes some large values (second highest each year)
I Must pick the block size: too big and you lose data; too small and you can’t use EVA theory
Pros/cons of analyzing POT
Pros:
I Can invoke EVA theory and use a simple model
I Retains all large values in the analysis
Cons:
I Must deal with dependence within block
I Setting the threshold is really difficult
BM: normal distribution/central limit theorem
I Let $Y_1, \ldots, Y_n$ be the n independent and identically distributed values in a block (say n = 365 days in a year)
I Under certain conditions, for large n the sample (annual) mean
$\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$
is approximately normally distributed
I Holds with some forms of dependence and nonstationarity
I The underlying data $Y_i$ do not have to be Gaussian
I For example, the mean of 10 uniforms is ≈ normal
I So if you are analyzing data that are constructed as means, then a normal distribution is a good start
I You should still check the assumption of normality
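The "mean of 10 uniforms" example is easy to check by simulation. The Python sketch below uses only the standard library; the seed and the number of replications are arbitrary choices. It compares the simulated means with the normal approximation implied by the Uniform(0, 1) mean of 1/2 and variance of 1/12.

```python
import math
import random
import statistics

rng = random.Random(1)
n = 10          # number of uniforms averaged, as in the bullet above
reps = 20000    # number of simulated means (arbitrary choice)

means = [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]

# Theory: the mean of n Uniform(0, 1) draws has mean 1/2 and
# standard deviation sqrt(1 / (12 n)).
mu = 0.5
sd = math.sqrt(1 / (12 * n))

# Under normality about 95% of the means fall within 1.96 sd of mu.
coverage = sum(abs(m - mu) <= 1.96 * sd for m in means) / reps

print(round(statistics.fmean(means), 3),
      round(statistics.stdev(means), 3),
      round(coverage, 3))
```

A coverage close to 0.95 is what the normal approximation predicts, even though each individual draw is far from Gaussian.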
BM: GEV distribution
I A similar result holds for the block maximum
$\tilde{Y}_n = \max\{Y_1, \ldots, Y_n\}$
I Under certain conditions, for large n $\tilde{Y}_n$ approximately follows the Generalized Extreme Value (GEV) distribution
I Holds with some forms of dependence and nonstationarity
I So if you are analyzing data that are constructed as, say, annual maxima, then GEV is a good start
I You should still of course check the GEV fit
BM: GEV distribution
I The GEV has three parameters:
I Location: µ
I Scale: σ > 0
I Shape: ξ
I The shape defines three special cases:
I Weibull: ξ < 0 and the distribution is bounded above
I Gumbel: ξ = 0 and the distribution is unbounded
I Fréchet: ξ > 0 and the distribution is bounded below
Explore the shape of the distribution: http://teaching.stat.ncsu.edu/shiny/bjreich/GEV/
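For reference, the three cases can be read off the standard form of the GEV distribution function. The symbol $G$ and the positive-part notation $[a]_+$ are introduced here for convenience and do not appear on the slides.

```latex
G(y) =
\begin{cases}
\exp\left\{-\left[1 + \xi\,\dfrac{y-\mu}{\sigma}\right]_{+}^{-1/\xi}\right\}, & \xi \neq 0,\\[8pt]
\exp\left\{-\exp\left(-\dfrac{y-\mu}{\sigma}\right)\right\}, & \xi = 0,
\end{cases}
\qquad [a]_{+} = \max(a, 0).
```

The support ends at $\mu - \sigma/\xi$, which is an upper bound when ξ < 0 and a lower bound when ξ > 0; the ξ = 0 case is the limit of the first line as ξ → 0.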
BM: Fitting in R
I Say $Y_1, \ldots, Y_m \overset{iid}{\sim} \mathrm{GEV}(\mu, \sigma, \xi)$
I Estimates of the three GEV parameters and the standard errors can be obtained with usual MLE
I The fgev function in the R package evd does this
I CLIMDEX example: http://www4.stat.ncsu.edu/~reich/Rossby/CLIMDEX_GEV.html
BM: Model checking
I Just because data are block maxima doesn’t necessarily mean they fit the GEV perfectly
I Why?
I QQ-plots are a good diagnostic
I KS goodness-of-fit tests can be constructed
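Both diagnostics compare the sorted data with the fitted distribution. The Python sketch below (standard library only) computes the points of a QQ-plot and the KS distance for a GEV with given parameters. The function names, the plotting positions (i − 0.5)/m, the annual maxima, and the parameter estimates are all hypothetical choices for illustration.

```python
import math

def gev_cdf(y, mu, sigma, xi):
    """Distribution function of GEV(mu, sigma, xi)."""
    z = (y - mu) / sigma
    if abs(xi) < 1e-12:                 # Gumbel limit
        return math.exp(-math.exp(-z))
    s = 1.0 + xi * z
    if s <= 0.0:                        # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-s ** (-1.0 / xi))

def gev_quantile(p, mu, sigma, xi):
    """Quantile function of GEV(mu, sigma, xi) for 0 < p < 1."""
    w = -math.log(p)
    if abs(xi) < 1e-12:                 # Gumbel limit
        return mu - sigma * math.log(w)
    return mu + sigma * (w ** (-xi) - 1.0) / xi

def qq_pairs(data, mu, sigma, xi):
    """(model quantile, observed value) pairs; a good fit lies near y = x."""
    ys = sorted(data)
    m = len(ys)
    return [(gev_quantile((i + 0.5) / m, mu, sigma, xi), y)
            for i, y in enumerate(ys)]

def ks_statistic(data, mu, sigma, xi):
    """Largest gap between the empirical and fitted distribution functions."""
    ys = sorted(data)
    m = len(ys)
    d = 0.0
    for i, y in enumerate(ys):
        f = gev_cdf(y, mu, sigma, xi)
        d = max(d, (i + 1) / m - f, f - i / m)
    return d

# Hypothetical annual maxima and parameter estimates, for illustration only.
maxima = [41.2, 38.5, 47.9, 36.1, 52.3, 44.0, 39.7, 43.1, 49.6, 37.4]
mu_est, sigma_est, xi_est = 40.0, 5.0, 0.0
print(round(ks_statistic(maxima, mu_est, sigma_est, xi_est), 3))
```

When the parameters are estimated from the same data, the usual KS critical values no longer apply directly, so a simulation-based reference distribution (for example a parametric bootstrap) is one way to calibrate the test.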
BM: Return levels
I Say the data are annual maxima
I The n-year return level is the value exceeded on average once every n years, i.e., with probability 1/n in any given year
I This is the 1 − 1/n quantile of the GEV distribution
I In R, the n-year return level is
RLn = qgev(1 - 1/n, mu.est, sigma.est, xi.est)
I Standard errors account for uncertainty in the GEV parameters, and can be found using the delta method
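The quantile calculation behind the R call can be written out directly. The following Python sketch (standard library only) inverts the GEV distribution function in closed form; the function names and the parameter estimates are hypothetical and chosen only to illustrate the calculation.

```python
import math

def gev_quantile(p, mu, sigma, xi):
    """Quantile function of GEV(mu, sigma, xi) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    w = -math.log(p)
    if abs(xi) < 1e-12:            # Gumbel limit
        return mu - sigma * math.log(w)
    return mu + sigma * (w ** (-xi) - 1.0) / xi

def gev_return_level(n_years, mu, sigma, xi):
    """n-year return level for annual maxima: the 1 - 1/n quantile."""
    return gev_quantile(1.0 - 1.0 / n_years, mu, sigma, xi)

# Hypothetical parameter estimates, for illustration only.
mu_est, sigma_est, xi_est = 30.0, 5.0, 0.1
for n in (10, 100, 1000):
    print(n, round(gev_return_level(n, mu_est, sigma_est, xi_est), 2))
```

Return levels increase with n, and the increase is faster for larger ξ because the upper tail is heavier.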
BM: non-stationarity
I Until now we have assumed the distribution is stationary, i.e., constant over time
I However, the GEV parameters (usually µ, but sometimes σ and ξ) can vary with time
I Linear GEV location: Yt ∼ GEV(β0 + tβ1, σ, ξ)
I Add a covariate: Yt ∼ GEV(β0 + tβ1 + Xtβ2, σ, ξ)
I Linear GEV location and scale: Yt ∼ GEV[β0 + tβ1, exp(α0 + tα1), ξ]
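I The location models can be fit by MLE with fgev in evd (via its nsloc argument); a time-varying scale requires a package that supports it, such as extRemes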
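To make the linear-location model concrete, the Python sketch below writes out the negative log-likelihood that MLE minimizes for Yt ∼ GEV(β0 + tβ1, σ, ξ). It is an illustration under stated assumptions, not the code used inside any R package: the function name, the log-scale parametrization of σ, and the toy data are all made up for the example.

```python
import math

def gev_nll_linear_location(params, t, y):
    """Negative log-likelihood of Y_t ~ GEV(b0 + b1 * t, sigma, xi)."""
    b0, b1, log_sigma, xi = params
    sigma = math.exp(log_sigma)      # working on the log scale keeps sigma > 0
    nll = 0.0
    for ti, yi in zip(t, y):
        z = (yi - (b0 + b1 * ti)) / sigma
        if abs(xi) < 1e-8:           # Gumbel limit
            nll += log_sigma + z + math.exp(-z)
        else:
            s = 1.0 + xi * z
            if s <= 0.0:             # observation outside the support
                return math.inf
            nll += log_sigma + (1.0 + 1.0 / xi) * math.log(s) + s ** (-1.0 / xi)
    return nll

# Toy data with an upward trend (deterministic, for illustration only).
t = list(range(20))
y = [10 + 0.5 * ti + ((7 * ti) % 5 - 2) for ti in t]

with_trend = gev_nll_linear_location((10.0, 0.5, math.log(2.0), 0.0), t, y)
no_trend = gev_nll_linear_location((10.0, 0.0, math.log(2.0), 0.0), t, y)
print(round(with_trend, 2), round(no_trend, 2))
```

Handing a function like this to a general-purpose numerical optimizer gives the MLE; comparing the minimized values with and without β1 is the basis of a likelihood ratio test for a trend.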
POT: GPD distribution
I A POT analysis begins by selecting a threshold T that separates the bulk from the extremes
I The dataset for analysis then becomes only the values that exceed T
I For most distributions, for large enough T the tail of the distribution matches the Generalized Pareto Distribution (GPD)
I Picking the threshold too low leads to bias because the GPD doesn’t fit well
I Picking the threshold too high leads to high variance because the number of observations is small
POT: GPD distribution
I The GPD has three parameters:
I Location/lower bound/threshold: T
I Scale: σ > 0
I Shape: ξ
I The shape defines three special cases:
I ξ < 0 and the distribution is bounded above
I ξ = 0 and the distribution is exponential (unbounded)
I ξ > 0 and the distribution is unbounded
Explore the shape of the distribution: http://teaching.stat.ncsu.edu/shiny/bjreich/GPD/
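For reference, the standard form of the GPD distribution function with threshold T is given below. The positive-part notation $[a]_+ = \max(a, 0)$ is introduced here for convenience and is not from the slides.

```latex
F_{GPD}(y) =
\begin{cases}
1 - \left[1 + \xi\,\dfrac{y-T}{\sigma}\right]_{+}^{-1/\xi}, & \xi \neq 0,\\[8pt]
1 - \exp\left(-\dfrac{y-T}{\sigma}\right), & \xi = 0,
\end{cases}
\qquad y \ge T.
```

When ξ < 0 the distribution has the upper bound $T - \sigma/\xi$; when ξ ≥ 0 it is unbounded above, and ξ = 0 is the exponential distribution shifted to start at T.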
POT: Fitting in R
I Say we set the threshold at T and $Y_1, \ldots, Y_{m_T}$ are the $m_T$ observations above T
I The model is $Y_1, \ldots, Y_{m_T} \overset{iid}{\sim} \mathrm{GPD}(T, \sigma_T, \xi_T)$
I Estimates of the two GPD parameters $\sigma_T$ and $\xi_T$ and the standard errors can be obtained with usual MLE
I The fpot function in the R package evd does this
I CLIMDEX example: http://www4.stat.ncsu.edu/~reich/Rossby/CLIMDEX_GPD.html
POT: Model checking
I The biggest challenge is picking the threshold T
I For the GPD, for any u > T the mean residual life is linear in u:
$E(Y - u \mid Y > u) = \frac{\sigma}{1-\xi} + \frac{\xi}{1-\xi}\,u$
I The mean residual life (MRL) plot plots the sample mean estimate of E(Y − u|Y > u) versus u
I The smallest u above which the MRL plot is linear is a reasonable threshold choice
I In R/evd this is the mrlplot function
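The quantity shown in an MRL plot is simple to compute by hand. The Python sketch below (standard library only) returns the sample mean excess for a grid of candidate thresholds; the function name, the synthetic data, and the grid are assumptions for illustration, and no plotting or confidence bands are included.

```python
import random
import statistics

def mean_residual_life(values, thresholds):
    """Sample estimate of E(Y - u | Y > u) for each u in `thresholds`.

    Returns (u, mean excess, number of exceedances) tuples and skips
    thresholds that no value exceeds.
    """
    out = []
    for u in thresholds:
        excesses = [v - u for v in values if v > u]
        if excesses:
            out.append((u, statistics.fmean(excesses), len(excesses)))
    return out

# Synthetic data: exponential with mean 10, i.e. a GPD with xi = 0 and
# sigma = 10, so the mean excess should be roughly flat at 10 for every u.
rng = random.Random(2)
y = [rng.expovariate(1 / 10) for _ in range(50000)]

for u, mrl, count in mean_residual_life(y, [0, 5, 10, 20, 30]):
    print(u, round(mrl, 2), count)
```

The exceedance count is returned alongside each estimate because the sample mean becomes noisy at high thresholds, which is the variance side of the threshold trade-off.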
POT: Return levels
I Now the data are daily data
I The n-year return level is the value exceeded on average once every n years, i.e., once every 365n days, so it is exceeded on a given day with probability 1/(365n)
I Let $p_T$ be the probability below the threshold
I On a given day the probability of being below u > T is $p_T + (1 - p_T) F_{GPD}(u)$
I Setting this equal to 1 − 1/(365n) and solving for $F_{GPD}(u)$ gives the GPD probability q, so in R the n-year return level is
q = (1 - 1/(365*n) - p.T) / (1 - p.T)
RLn = qgpd(q, thresh, sigma.est, xi.est)
I Standard errors account for uncertainty in the GPD parameters, and can be found using the delta method
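The same calculation can be sketched in Python with the GPD quantile written in closed form. The sketch solves p_T + (1 − p_T) F_GPD(u) = 1 − 1/(365n) for u, following the bullets above. The function names, the obs_per_year argument, and the numerical values are hypothetical and serve only as an illustration.

```python
import math

def gpd_quantile(q, threshold, sigma, xi):
    """Quantile function of GPD(threshold, sigma, xi) for 0 <= q < 1."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    if abs(xi) < 1e-12:                      # exponential limit
        return threshold - sigma * math.log(1.0 - q)
    return threshold + sigma * ((1.0 - q) ** (-xi) - 1.0) / xi

def pot_return_level(n_years, p_below, threshold, sigma, xi, obs_per_year=365):
    """n-year return level from a GPD fitted to exceedances of `threshold`.

    p_below is the probability that an observation is at or below the threshold.
    """
    target = 1.0 - 1.0 / (obs_per_year * n_years)   # daily non-exceedance prob.
    if target <= p_below:
        raise ValueError("return level lies below the threshold")
    q = (target - p_below) / (1.0 - p_below)         # probability on the GPD scale
    return gpd_quantile(q, threshold, sigma, xi)

# Hypothetical values: 95% of days fall below a threshold of 50.
for n in (10, 100, 1000):
    print(n, round(pot_return_level(n, 0.95, 50.0, 8.0, 0.1), 2))
```

The guard on target reflects that the GPD only describes values above the threshold, so very short return periods cannot be computed from it.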
POT: non-stationarity
I As with the GEV, the GPD parameters can vary over space and time following covariates
I Allowing the threshold to vary with covariates is probably a good idea, but really tricky
I Another departure from the simple model assumptions is serial dependence in the daily data
I This can be handled by declustering
I For example, if 5 consecutive days exceed the threshold then only the largest is retained
I Declustering is implemented in fpot
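One simple form of declustering is runs declustering, sketched below in Python with the standard library only. The function name, the run_length parameter, and the toy series are assumptions for illustration; this is not a reproduction of how any R package implements it.

```python
def decluster_runs(values, threshold, run_length=1):
    """Runs declustering: keep only the largest exceedance in each cluster.

    A cluster ends once `run_length` consecutive values fall at or below
    the threshold. Returns (index, value) pairs of the cluster maxima.
    """
    peaks = []
    best = None        # (index, value) of the largest value in the open cluster
    gap = 0            # consecutive non-exceedances since the last exceedance
    for i, v in enumerate(values):
        if v > threshold:
            if best is None or v > best[1]:
                best = (i, v)
            gap = 0
        elif best is not None:
            gap += 1
            if gap >= run_length:
                peaks.append(best)
                best = None
                gap = 0
    if best is not None:   # close a cluster that runs to the end of the series
        peaks.append(best)
    return peaks

# Toy series: five consecutive exceedances of 50, then two isolated ones.
series = [3, 52, 57, 61, 55, 51, 4, 2, 58, 3, 66, 1]
print(decluster_runs(series, threshold=50))
```

With run_length=1 the five consecutive exceedances collapse to the single value 61; a larger run_length merges exceedances separated by short gaps into one cluster.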
Other extensions
I Multivariate extremes
I Time series of extremes and heat waves
I Spatial extremes
I Detection and attribution
I Methods to handle large n
I Many more!
Resources
I Book on applied EVA: Coles (2001)
I Book on theory: de Haan and Ferreira (2006)
I Book on recent methods: Dey and Yan (2016)
I More computing in R: evd; extRemes; SpatialExtremes
I My info: http://www4.stat.ncsu.edu/~reich/