Bayesian Spatial Modeling of Extreme
Precipitation Return Levels
Daniel COOLEY, Douglas NYCHKA, and Philippe NAVEAU (2007, JASA)
Background
• July 28, 1997: a rainstorm in Fort Collins, Colorado
– Killed five people
– Caused $250 million in damage
• 1976 Big Thompson flood near Loveland, Colorado
– Killed 145 people
• 1965 South Platte flood
– $600 million in damages around Denver
Extreme precipitation events
• Understanding their frequency and intensity is important for public safety and long-term planning
• Challenges
– Limited temporal records
– Need to extrapolate the distributions to locations where observations are not available
• Data
– Precipitation amounts at some stations
– Possibly some other covariates
Measure of extreme events
• Return level
– The r-year return level is the quantile that has probability 1/r of being exceeded in a particular year:
P(X > t_r) = 1/r
• Precipitation return levels
– Given in the context of the duration of the precipitation event
– The r-year return level of a d-hour (e.g., 6- or 24-hour) duration interval is reported.
– The standard levels for the NWS’s most recent data products are quite extensive, with duration intervals ranging from 5 minutes to 60 days and return periods of 2–500 years.
• This article focuses on providing return level estimates for daily precipitation (24 hours)
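A short worked example may help with reading the definition P(X > t_r) = 1/r: if years are assumed independent, the chance that the r-year level is exceeded at least once in n years is 1 - (1 - 1/r)^n. The function name and the 30-year horizon below are illustrative choices, not taken from the paper.

```python
def prob_exceeded_within(r_years: float, horizon_years: int) -> float:
    """Chance that the r-year return level is exceeded at least once
    in `horizon_years` years, assuming independent years."""
    if r_years < 1:
        raise ValueError("r_years must be at least 1")
    p_year = 1.0 / r_years  # P(X > t_r) in any single year
    return 1.0 - (1.0 - p_year) ** horizon_years


# Illustrative: the 100-year level over a 30-year planning horizon.
p_30 = prob_exceeded_within(100, 30)
print(round(p_30, 3))
```

For a 100-year level and a 30-year horizon this comes to roughly a one-in-four chance, which is why "100-year event" should not be read as "happens once per century".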
Most recent precipitation atlas for Colorado
• Produced in 1973
• The atlas provides point estimates of 2-, 5-, 10-, 25-, 50-, and 100-year return levels for duration intervals of 6 and 24 hours.
• Shortcoming
– It does not provide uncertainty measures of its point estimates
Extreme value theory (EVT)
• Statistical models for the tail of a probability distribution
• Univariate case: generalized extreme value (GEV) distribution
– Given iid continuous data Z_1, Z_2, ..., Z_n and letting M_n = max(Z_1, Z_2, ..., Z_n), it is known that if the normalized distribution of M_n converges as n → ∞, then it converges to a GEV distribution
Generalized Pareto distribution (GPD)
• Using the maxima only disregards other extreme data that could provide additional information.
• GPD
– Based on the exceedances above a threshold
– Exceedances (the amounts by which observations exceed a threshold u) should approximately follow a GPD as u becomes large and the sample size increases
GPD
• Tail of the distribution
• Scale parameter σ_u > 0
• Shape parameter ξ controls the tail
– ξ < 0: bounded upper tail
– ξ → 0: light (exponential) tail
– ξ > 0: heavy tail
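The formula on this slide did not survive extraction. As a hedged sketch, the standard GPD tail function is P(Z - u > z | Z > u) = (1 + ξ z / σ_u)^(-1/ξ), with the exponential form exp(-z / σ_u) in the limit ξ → 0. The symbols σ_u (scale) and ξ (shape) and the function name are assumptions here, and the parameter values are made up for illustration.

```python
import math


def gpd_survival(z: float, sigma: float, xi: float) -> float:
    """P(Z - u > z | Z > u) for an excess z >= 0 under a GPD."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if z < 0:
        raise ValueError("z must be non-negative")
    if abs(xi) < 1e-12:
        return math.exp(-z / sigma)  # exponential limit as xi -> 0
    base = 1.0 + xi * z / sigma
    if base <= 0.0:
        return 0.0  # beyond the finite upper endpoint (only when xi < 0)
    return base ** (-1.0 / xi)


# Same scale, three shapes: bounded, exponential, and heavy tail.
tail_probs = {xi: gpd_survival(5.0, 1.0, xi) for xi in (-0.2, 0.0, 0.2)}
print(tail_probs)
```

With the scale held fixed, the probability of a large excess grows as ξ increases, which is the sense in which the shape parameter controls the tail.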
More EVT
• Exceedance rate
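The equations on this slide were lost. A sketch of how the exceedance rate enters, using the standard peaks-over-threshold formula: with ζ_u = P(Z > u), GPD scale σ_u and shape ξ, and n_y observations per year, the level exceeded on average once every r years is z_r = u + (σ_u / ξ) [(r n_y ζ_u)^ξ - 1]. The symbols and the function name are assumptions. The 0.55-inch threshold and the April 1 to October 31 season (214 days) come from later slides; the values of σ_u, ξ, and ζ_u below are invented.

```python
import math


def gpd_return_level(r, u, sigma, xi, zeta, n_per_year):
    """Level exceeded on average once every r years under a GPD tail model."""
    m = r * n_per_year * zeta  # expected number of threshold exceedances in r years
    if m <= 1.0:
        raise ValueError("return period too short for this threshold")
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m)  # exponential-tail limit
    return u + sigma / xi * (m ** xi - 1.0)


# Hypothetical parameter values for a single station.
u, sigma, xi, zeta, n_per_year = 0.55, 0.5, 0.1, 0.02, 214
level_25 = gpd_return_level(25, u, sigma, xi, zeta, n_per_year)
print(level_25)
```

Both the GPD parameters and the exceedance rate are needed: changing ζ_u alone shifts the return level even when σ_u and ξ stay fixed.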
Extreme of spatial data
• Weather describes the state of the atmosphere at a given time
– Extreme weather events can be modeled by theory on the dependence of extreme observations
• Climate at a given location is the distribution of weather over a long period of time
– Climatological quantities, such as return levels, and their spatial dependence must be modeled outside of the framework above
– How does the distribution of precipitation vary over space?
Goal
• Let Z(x) denote the total precipitation for a given period of time (e.g., 24 hours) and at location x.
• The goal is to provide inference for the probability P(Z(x) > z + u) for all locations x in a particular domain and for u large
– Given this function, one can compute return levels and other summary measures
– To produce a return level map with measures of uncertainty
Basic idea
• In the GPD model, we add a spatial component by considering all parameters to be functions of a location x in the study area.
• We assume that the values of these parameters result from a latent spatial process that characterizes the extreme precipitation and arises from climatological and orographic effects.
• The dependence of the parameters characterizes the similarity of climate at different locations
A Bayesian study
• A study of 24-hour precipitation extremes for the Front Range region of Colorado
– Estimate potential flooding
– Apr 1 – Oct 31
– 75% of Colorado’s population lives in this area
Study Region
Data
• 56 weather stations
• Daily total precipitation amounts during 1948–2001
– 21 stations have over 50 years of data
– 14 stations have less than 20 years of data
– All stations have some missing values
• Covariates
– Elevation
– Mean precipitation (MSP)
– Remark: covariate information is needed for the entire region in order to interpolate over the study region and produce a precipitation map
Boulder Station
Data Precision
• Boulder Station
– Prior to 1971, precipitation was recorded to the nearest 1/100th of an inch (0.25 mm)
– After 1971, it was recorded to the nearest 1/10th of an inch (2.5 mm)
• All but three stations similarly switched their level of precision around 1970
• Low-precision data are a discretization of the high-precision data
Treatment of discretization
• True value is assumed to be uniformly distributed around the observed value
– What is the effect of such an assumption?
• Adjust the likelihood
– d is the length of the interval
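The adjusted likelihood itself is missing from the slide. One plausible form, assuming the true value is uniform on an interval of length d centered at the recorded value, is to average the GPD density over that interval: [F(z + d/2) - F(z - d/2)] / d, where F is the GPD distribution function of the excess over the threshold. Whether this matches the authors' expression term for term is an assumption; the function names are hypothetical.

```python
import math


def gpd_cdf(z, sigma, xi):
    """GPD distribution function of the excess z over the threshold."""
    if z <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-z / sigma)
    base = 1.0 + xi * z / sigma
    if base <= 0.0:
        return 1.0  # past the upper endpoint when xi < 0
    return 1.0 - base ** (-1.0 / xi)


def interval_likelihood(z_obs, d, sigma, xi):
    """Average of the GPD density over [z_obs - d/2, z_obs + d/2]."""
    lower = z_obs - d / 2.0
    upper = z_obs + d / 2.0
    return (gpd_cdf(upper, sigma, xi) - gpd_cdf(lower, sigma, xi)) / d


# Low-precision record: excess of 1.0 inch known only to the nearest 0.1 inch.
coarse = interval_likelihood(1.0, 0.1, sigma=1.0, xi=0.1)
# High-precision record: the interval shrinks and the value approaches the density.
fine = interval_likelihood(1.0, 1e-4, sigma=1.0, xi=0.1)
print(coarse, fine)
```

As d shrinks, the averaged value converges to the ordinary GPD density, so high- and low-precision records can share one likelihood form.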
How to choose the threshold u?
• Bias-variance trade-off
– If u is large, the distribution of exceedances is close to a GPD (less bias)
– If u is large, less data can be used (more variance)
• Finally, the threshold is taken as 0.55 inches
– A threshold sensitivity analysis of model runs indicates that the shape parameter is more consistently estimated above this threshold
– 7789 exceedances (2% of the original data)
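The variance side of the trade-off can be seen by tabulating how many observations survive each candidate threshold. The sketch below is a simple diagnostic, not the model-based sensitivity analysis the slide describes; it also reports the mean excess, a common exploratory summary. The function name and the toy data are invented.

```python
def threshold_summary(values, thresholds):
    """For each threshold u: (u, count above u, fraction above u, mean excess)."""
    rows = []
    for u in thresholds:
        excesses = [v - u for v in values if v > u]
        n = len(excesses)
        mean_excess = sum(excesses) / n if n else float("nan")
        rows.append((u, n, n / len(values), mean_excess))
    return rows


# Toy daily totals in inches (made up for illustration).
daily_totals = [0.1, 0.3, 0.6, 0.9, 1.5]
summary = threshold_summary(daily_totals, [0.25, 0.55])
for row in summary:
    print(row)
```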
Residual dependence
• Assumption
– The precipitation observations are conditionally independent spatially and temporally given the stations’ parameters
– The spatial dependence is accounted for in the stations’ parameters
• This conditional independence may not be true, though.
Temporal independence
• Temporal dependence
– When dependence is short range and extremes do not occur in clusters, maxima still converge to a GEV in distribution
– If a station had consecutive days that exceeded the threshold, we declustered the data by keeping only the highest measurement
– Declustering actually did not change the results much
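The declustering rule described above (keep only the highest value in each run of consecutive exceedance days) can be sketched as follows. Treating a missing day as the end of a run is an assumption; the slides do not say how gaps were handled. The function name is hypothetical.

```python
def decluster(daily_values, threshold):
    """Keep the maximum of each run of consecutive days above the threshold.

    daily_values is in time order; None marks a missing day and is
    assumed to end the current run.
    """
    kept = []
    run_max = None
    for value in daily_values:
        if value is not None and value > threshold:
            run_max = value if run_max is None else max(run_max, value)
        elif run_max is not None:
            kept.append(run_max)  # the run just ended
            run_max = None
    if run_max is not None:
        kept.append(run_max)  # the series ended during a run
    return kept


peaks = decluster([0.1, 0.6, 0.9, 0.2, 0.7, None, 0.8], 0.55)
print(peaks)
```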
Spatial dependence
• The authors tested for spatial dependence in the annual maximum residuals of the stations
– There was a low level of dependence between stations within 24 km (15 miles) of one another and no detectable dependence beyond this distance.
– There are very few stations within this distance that record data for the same time period
Seasonal effects
• Restricting our analysis to the nonwinter months reduces seasonality
• Inspecting the data from several sites showed no obvious seasonal effect
Model for Threshold Exceedance
• Hierarchical model
– Layer 1: data at each station
– Layer 2: the latent process that drives the climatological extreme precipitation for the region
– Layer 3: the prior distributions of the parameters that control the latent process
Data layer for return level
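• A GPD
• Reparametrization: φ = log σ_u, the log of the GPD scale parameter
• Let Z_k(x_i) be the kth recorded precipitation amount at location x_i
• Density of the excess z = Z_k(x_i) - u, given Z_k(x_i) > u, with shape parameter ξ(x_i):
f(z) = exp(-φ(x_i)) [1 + ξ(x_i) z exp(-φ(x_i))]^(-1/ξ(x_i) - 1)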
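As a sketch of what the data layer contributes to the posterior, the GPD log-likelihood for one station can be written in terms of φ = log(scale) and the shape ξ. The reparametrization is inferred from the later map of the "log-transformed GPD scale parameter"; the function name and data values are hypothetical.

```python
import math


def gpd_loglik(excesses, phi, xi):
    """GPD log-likelihood of threshold excesses with scale exp(phi), shape xi."""
    sigma = math.exp(phi)
    total = 0.0
    for z in excesses:
        if abs(xi) < 1e-12:
            total += -phi - z / sigma  # exponential limit as xi -> 0
        else:
            base = 1.0 + xi * z / sigma
            if base <= 0.0:
                return -math.inf  # excess outside the support
            total += -phi - (1.0 / xi + 1.0) * math.log(base)
    return total


# Made-up excesses (inches above the threshold) at one station.
station_excesses = [0.05, 0.35, 0.95, 0.20]
print(gpd_loglik(station_excesses, phi=math.log(0.5), xi=0.1))
```

Working with φ rather than the scale itself lets the parameter range over the whole real line, which suits a Gaussian process prior.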
Process layer
• A structure that relates the parameters of the data layer to the orography and climatology of the region.
• Spatial (longitude/latitude) space → climate (elevation/MSP) space
– Stations are sparse in the spatial space
– Stations far apart spatially can be close in the climate space
– MSP: mean precipitation
Scale parameter
• Log-transformed scale parameter: a Gaussian process with a mean function and a covariance function
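A Gaussian process is specified by a mean function and a covariance function, neither of which survived on this slide. The sketch below assumes an exponential covariance, β0 exp(-β1 × distance), with distance measured in climate space (elevation, MSP) rather than longitude/latitude. The covariance form, the parameter names beta0 and beta1, the coordinate scaling, and the station values are all assumptions made for illustration.

```python
import math


def exp_covariance(coords, beta0, beta1):
    """Matrix with entries beta0 * exp(-beta1 * Euclidean distance)."""
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.dist(coords[i], coords[j])
            cov[i][j] = beta0 * math.exp(-beta1 * dist)
    return cov


# Hypothetical stations in climate coordinates: (elevation in km, scaled MSP).
climate_coords = [(1.6, 2.8), (1.7, 2.9), (2.9, 4.1)]
cov = exp_covariance(climate_coords, beta0=0.05, beta1=2.0)
for row in cov:
    print(row)
```

Stations with similar elevation and MSP get a high covariance even if they are far apart on the map, which is the point of moving to climate space.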
Shape parameter
• A single value for the entire study region with a Unif(-Inf, Inf) prior
• Two values
– One for the mountain stations
– One for the plain stations
• A Gaussian process with structure similar to the scale parameter
Process layer
Priors of the parameters that control the latent process
• Prior independence
• Regression parameter: noninformative
• Spatial parameter
– Noninformative leads to improper posterior
– Informative priors from MLE
• Shape parameter
Priors
Model for Exceedance Rate
• To know the return level, we need to know both the model parameters and the exceedance rate
• Assume each station’s number of exceedances is binomial with a station-specific probability parameter
• Logit transformation
• Assume the logit-transformed parameter follows a Gaussian process
• Similar prior specification
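A minimal sketch of the exceedance-rate data model: the number of exceedances at a station is binomial in the number of recorded days, with the probability expressed on the logit scale so that it can be given a Gaussian process prior. Function names and the station counts are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def binomial_loglik(n_exceed, n_obs, logit_rate):
    """Log-likelihood of n_exceed exceedances in n_obs recorded days."""
    rate = inv_logit(logit_rate)
    log_choose = (math.lgamma(n_obs + 1) - math.lgamma(n_exceed + 1)
                  - math.lgamma(n_obs - n_exceed + 1))
    return (log_choose + n_exceed * math.log(rate)
            + (n_obs - n_exceed) * math.log1p(-rate))


# Hypothetical station: 150 exceedances in 7,500 recorded days (rate 2%).
n_exceed, n_obs = 150, 7500
at_mle = binomial_loglik(n_exceed, n_obs, logit(n_exceed / n_obs))
elsewhere = binomial_loglik(n_exceed, n_obs, logit(0.03))
print(at_mle, elsewhere)
```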
MCMC
• Metropolis within Gibbs
– Proposal distribution is obtained using a normal approximation or a random walk
– Three parallel chains
– Each chain has 20,000 iterations
– 2,000 burn-in steps
– Test for convergence: Gelman’s convergence statistic (R-hat) < 1.05
• Draws are used to perform spatial interpolation and inference
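The sampler can be illustrated on a deliberately small stand-in problem: a random-walk Metropolis update for a single parameter, run as three chains and checked with the Gelman-Rubin statistic. This is a sketch of the general technique, not the authors' sampler; the target (exponential excesses with a flat prior on the log scale), the synthetic data, the step size, and the shortened chain length are all assumptions.

```python
import math
import random


def log_posterior(phi, excesses):
    """Log posterior of phi = log(scale) for exponential excesses, flat prior."""
    return sum(-phi - z * math.exp(-phi) for z in excesses)


def random_walk_metropolis(log_target, start, n_iter, step, rng):
    chain = []
    current, current_lp = start, log_target(start)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(0)
excesses = [rng.expovariate(2.0) for _ in range(30)]  # synthetic data


def target(phi):
    return log_posterior(phi, excesses)


burn_in = 500
chains = [random_walk_metropolis(target, start, 5000, 0.3, rng)[burn_in:]
          for start in (-2.0, -0.7, 1.0)]  # dispersed starting values
r_hat = gelman_rubin(chains)
posterior_mean = sum(sum(c) for c in chains) / sum(len(c) for c in chains)
print(r_hat, posterior_mean)
```

In a Metropolis-within-Gibbs scheme, an update like this is applied to one parameter block at a time while the others are held at their current values. The retained draws are then what gets carried forward for interpolation and inference.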
Point estimate for log-transformed GPD scale parameter
Point estimate for 25-year return level for daily precipitation
0.025 and 0.975 quantile of the 25-year return level
Sensitivity analysis
• Sensitivity of the inference to the prior of the spatial parameter
• Ran Model 7 with
– Original prior: Unif[6/7, 12]
– Alternative prior: Unif[0.214, 6]
– The posterior of the parameter is sensitive to the prior
– But the product of the spatial parameters is less sensitive, and it is what is important for interpolation
Conclusions
• A Bayesian analysis for spatial extremes
– Model for exceedances
– Model for threshold exceedance rate parameter
• By performing the spatial analysis on locations defined by climatological coordinates, the authors were able to better model regional differences for this geographically diverse study area.
• Produces a map of return levels with features not well shown by the 1973 atlas
– An east–west region of higher return levels north of the Palmer Divide
– A region of lower return levels around Greeley
– Region-wide uncertainty measures