
Bayesian Spatial Modeling of Extreme Precipitation Return Levels

Daniel Cooley, Douglas Nychka, and Philippe Naveau (2007, JASA)

Background

• July 28, 1997: a rainstorm in Fort Collins, Colorado
– killed five people
– caused $250 million in damage

• 1976 Big Thompson flood near Loveland, Colorado
– killed 145 people

• 1965 South Platte flood
– $600 million in damage around Denver

Extreme precipitation events

• Understanding their frequency and intensity is important for public safety and long-term planning

• Challenges
– limited temporal records
– extrapolating the distributions to locations where no observations are available

• Data
– precipitation amounts at a set of stations
– possibly some other covariates

Measure of extreme events

• Return level
– the r-year return level is the quantile that has probability 1/r of being exceeded in a particular year:

$P(X > t_r) = 1/r$, where X is the annual maximum and $t_r$ is the r-year return level

• Precipitation return levels
– are given in the context of the duration of the precipitation event
– the r-year return level of a d-hour (e.g., 6- or 24-hour) duration interval is reported
– the standard levels for the most recent data products of the National Weather Service (NWS) are quite extensive, with duration intervals ranging from 5 minutes to 60 days and return periods from 2 to 500 years

• This article focuses on providing return level estimates for daily precipitation (24 hours)
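The return level definition has a practical reading that is easy to miss. Assuming that annual maxima in different years are independent (an assumption of this sketch), the chance that the r-year return level is exceeded at least once in n years is $1 - (1 - 1/r)^n$. The function name is illustrative only.

```python
def prob_exceeded_within(r: float, n_years: int) -> float:
    """Probability that the r-year return level is exceeded at least once
    in n_years, assuming independent years with P(X > t_r) = 1/r each."""
    if r <= 1:
        raise ValueError("return period r must be greater than 1 year")
    return 1.0 - (1.0 - 1.0 / r) ** n_years


# A "100-year" daily rainfall over a 30-year planning horizon
print(f"{prob_exceeded_within(100, 30):.3f}")  # 0.260
```

So a 100-year event has roughly a one-in-four chance of occurring during a 30-year horizon, which is why return levels matter for long-term planning.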

Most recent precipitation atlas for Colorado

• Produced in 1973

• The atlas provides point estimates of 2-, 5-, 10-, 25-, 50-, and 100-year return levels for duration intervals of 6 and 24 hours.

• Shortcoming
– it does not provide uncertainty measures for its point estimates

Extreme value theory (EVT)

• Statistical models for the tail of a probability distribution

• Univariate case: generalized extreme value (GEV) distribution
– given iid continuous data $Z_1, Z_2, \ldots, Z_n$ and letting $M_n = \max(Z_1, Z_2, \ldots, Z_n)$, it is known that if the normalized distribution of $M_n$ converges as $n \to \infty$, then it converges to a GEV
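For reference, "normalized" means that there exist constants $a_n > 0$ and $b_n$ such that the limit below holds for a nondegenerate G. The limit is the standard three-parameter GEV family with location μ, scale σ > 0, and shape ξ (these symbol names are the usual textbook convention, not taken from the slides):

```latex
P\!\left(\frac{M_n - b_n}{a_n} \le z\right) \;\longrightarrow\; G(z)
  = \exp\!\left\{-\left[1 + \xi\,\frac{z - \mu}{\sigma}\right]_{+}^{-1/\xi}\right\},
  \qquad n \to \infty,
```

where $[t]_+ = \max(t, 0)$, and the case ξ = 0 is read as the limit $G(z) = \exp\{-\exp[-(z - \mu)/\sigma]\}$ (the Gumbel distribution).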

Generalized Pareto distribution (GPD)

• Using the maxima only disregards other extreme data that could provide additional information.

• GPD
– based on the exceedances above a threshold
– exceedances (the amounts by which observations exceed a threshold u) should approximately follow a GPD as u becomes large and the sample size increases

GPD

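• Tail of the distribution: for an excess z > 0,
$P(Z > z + u \mid Z > u) = \left(1 + \xi z / \sigma_u\right)_+^{-1/\xi}$

• Scale parameter $\sigma_u > 0$

• Shape parameter ξ controls the tail
– ξ > 0: heavy tail
– ξ = 0 (limit as ξ → 0): exponential tail, $\exp(-z/\sigma_u)$
– ξ < 0: bounded tail with upper endpoint $u - \sigma_u/\xi$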
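A small sketch of the GPD tail probability, including the ξ = 0 limit and the finite upper endpoint when ξ < 0. The function name and the parameter values in the usage lines are illustrative.

```python
import math


def gpd_survival(z: float, sigma: float, xi: float) -> float:
    """P(Z > u + z | Z > u) for an excess z over the threshold u,
    when the excesses follow a GPD with scale sigma and shape xi."""
    if sigma <= 0:
        raise ValueError("scale sigma must be positive")
    if z <= 0:
        return 1.0
    if abs(xi) < 1e-12:
        # xi -> 0 limit: exponential tail
        return math.exp(-z / sigma)
    t = 1.0 + xi * z / sigma
    if t <= 0:
        # xi < 0 and z is beyond the upper endpoint -sigma / xi
        return 0.0
    return t ** (-1.0 / xi)


# Same scale, excess of three scale units: the tail probability grows with xi
for xi in (0.2, 0.0, -0.2):
    print(xi, round(gpd_survival(3.0, 1.0, xi), 4))
# 0.2 0.0954
# 0.0 0.0498
# -0.2 0.0102
```

The spread between the three values shows why the shape parameter, rather than the scale, dominates estimates of rare return levels.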

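More EVT: exceedance rate

• The exceedance rate $\zeta_u = P(Z > u)$ is the probability that a single observation exceeds u, so that $P(Z > z + u) = \zeta_u \left(1 + \xi z / \sigma_u\right)^{-1/\xi}$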

Extremes of spatial data

• Weather describes the state of the atmosphere at a given time
– extreme weather events can be modeled by the theory of dependence between extreme observations

• Climate at a given location is the distribution of weather over a long period of time
– climatological quantities, such as return levels, and their spatial dependence must be modeled outside the framework above
– How does the distribution of precipitation vary over space?

Goal

• Let Z(x) denote the total precipitation over a given period of time (e.g., 24 hours) at location x.

• The goal is to provide inference for the probability $P(Z(x) > z + u)$ for all locations x in a particular domain and for large u
– given this function, one can compute return levels and other summary measures
– the result is a return level map with a measure of uncertainty
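As a sketch of how return levels follow from the tail probability, assume the standard result $P(Z > z + u) = \zeta_u (1 + \xi z/\sigma)^{-1/\xi}$ with exceedance rate $\zeta_u = P(Z > u)$, and take the r-year level to be the value exceeded on average once every $r\,n_y$ observations, where $n_y$ is the number of observations per year (214 days for an April 1 to October 31 season). Solving for the level gives $u + (\sigma/\xi)[(r\,n_y\,\zeta_u)^{\xi} - 1]$. The station values in the usage line are hypothetical, not estimates from the paper.

```python
import math


def gpd_return_level(r, u, sigma, xi, zeta_u, n_per_year):
    """Level exceeded on average once every r years under a GPD tail model.

    u: threshold; sigma, xi: GPD scale and shape of the excesses;
    zeta_u: probability that a single observation exceeds u;
    n_per_year: number of observations per year.
    """
    m = r * n_per_year * zeta_u  # expected number of exceedances of u in r years
    if m <= 0:
        raise ValueError("r, n_per_year and zeta_u must be positive")
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m)  # xi -> 0 limit
    return u + sigma / xi * (m ** xi - 1.0)


# Hypothetical station: u = 0.55 in, sigma = 0.3 in, xi = 0.1,
# 2% of days exceed u, 214-day season
print(round(gpd_return_level(25, 0.55, 0.3, 0.1, 0.02, 214), 2))  # 2.34
```

Because every input can vary with location x, evaluating this function on each posterior draw of the spatially varying parameters yields both a return level map and its uncertainty.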

Basic idea

• In the GPD model, we add a spatial component by treating all parameters as functions of the location x in the study area.

• We assume that the values of the GPD parameters result from a latent spatial process that characterizes extreme precipitation and arises from climatological and orographic effects.

• The dependence between the parameters at different locations characterizes the similarity of their climates.

A Bayesian study

• A study of 24-hour precipitation extremes for the Front Range region of Colorado
– goal: estimate the potential for flooding
– season: April 1 to October 31
– 75% of Colorado's population lives in this area

Study Region

Data

• 56 weather stations

• Daily total precipitation amounts during 1948–2001
– 21 stations have over 50 years of data
– 14 stations have fewer than 20 years of data
– all stations have some missing values

• Covariates
– elevation
– mean precipitation (MSP)
– remark: covariate information is needed for the entire region in order to interpolate over the study region and produce a precipitation map

Boulder Station

Data Precision

• Boulder station
– prior to 1971, precipitation was recorded to the nearest 1/100 inch (0.25 mm)
– after 1971, it was recorded to the nearest 1/10 inch (2.5 mm)

• All but three stations similarly switched their level of precision around 1970

• The low-precision data are a discretization (rounding) of the high-precision data

Treatment of discretization

• Assume the true value is uniformly distributed around the observed value
– What is the effect of such an assumption?

• Adjust the likelihood accordingly
– d is the length of the rounding interval
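The slide omits the adjusted likelihood itself. A minimal sketch, assuming the adjustment averages the GPD density over a uniform true value on the interval of length d centred on the record: that average is $[F(z + d/2) - F(z - d/2)]/d$, and since 1/d is a constant it can be dropped, leaving the interval probability. This may differ in detail from the paper's exact expression; all function names are illustrative.

```python
import math


def gpd_cdf(z, sigma, xi):
    """P(excess <= z) for a GPD(sigma, xi) excess over the threshold."""
    if z <= 0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-z / sigma)
    t = 1.0 + xi * z / sigma
    return 1.0 if t <= 0 else 1.0 - t ** (-1.0 / xi)


def gpd_logpdf(z, sigma, xi):
    """Log density of a GPD(sigma, xi) excess z."""
    if z < 0:
        return -math.inf
    if abs(xi) < 1e-12:
        return -math.log(sigma) - z / sigma
    t = 1.0 + xi * z / sigma
    if t <= 0:
        return -math.inf
    return -math.log(sigma) - (1.0 / xi + 1.0) * math.log(t)


def excess_loglik(z, sigma, xi, d=None):
    """Log-likelihood contribution of one recorded excess z.

    d is None: high-precision record, use the density.
    d > 0: record rounded to precision d; the true excess is taken to lie
    in [z - d/2, z + d/2] (clipped at the threshold), so use the
    probability of that interval instead of the density.
    """
    if d is None:
        return gpd_logpdf(z, sigma, xi)
    lo, hi = max(z - d / 2.0, 0.0), z + d / 2.0
    p = gpd_cdf(hi, sigma, xi) - gpd_cdf(lo, sigma, xi)
    return math.log(p) if p > 0 else -math.inf


# An excess of 0.5 in recorded to 1/100 in (density) vs. to 1/10 in (interval)
print(excess_loglik(0.5, 0.3, 0.1), excess_loglik(0.5, 0.3, 0.1, d=0.1))
```

Mixing density terms and interval-probability terms in one likelihood is legitimate here because each observation's recording precision is fixed by the data, not by the parameters.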

How to choose the threshold u?

• Bias-variance trade-off
– the larger u is, the closer the distribution of the exceedances is to a GPD (less bias)
– the larger u is, the less data can be used (more variance)

• The threshold was finally set at 0.55 inches
– a threshold sensitivity analysis of model runs indicated that the shape parameter is estimated more consistently above this threshold
– 7789 exceedances (2% of the original data)
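The authors chose u through a sensitivity analysis of the fitted shape parameter. A complementary, widely used diagnostic, swapped in here for illustration rather than taken from the paper, is the empirical mean excess $e(u) = \text{mean}(Z - u \mid Z > u)$, which is roughly linear in u above a threshold where the GPD fits (for ξ < 1). The sketch also reports how many exceedances each candidate threshold leaves, the variance side of the trade-off. The sample is made up.

```python
from statistics import mean


def threshold_summary(data, thresholds):
    """For each candidate u: (u, number of exceedances, fraction, mean excess)."""
    rows = []
    for u in thresholds:
        excesses = [z - u for z in data if z > u]
        mean_excess = mean(excesses) if excesses else float("nan")
        rows.append((u, len(excesses), len(excesses) / len(data), mean_excess))
    return rows


# Made-up daily totals in inches, for illustration only
daily = [0.0, 0.0, 0.12, 0.3, 0.55, 0.61, 0.8, 1.4, 0.05, 2.1]
for u, k, frac, me in threshold_summary(daily, [0.25, 0.55, 1.0]):
    print(f"u={u:.2f}  exceedances={k}  fraction={frac:.2f}  mean excess={me:.3f}")
```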

Residual dependence

• Assumption
– the precipitation observations are conditionally independent, spatially and temporally, given the stations' parameters
– the spatial dependence is accounted for in the stations' parameters

• This conditional independence assumption may not hold, though.

Temporal independence

• Temporal dependence
– when dependence is short range and extremes do not occur in clusters, the maxima still converge in distribution to a GEV
– if a station had consecutive days that exceeded the threshold, we declustered the data by keeping only the highest measurement
– declustering did not change the results much
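The declustering rule described above can be sketched as follows. The input format, a day-ordered list of (day_index, amount) pairs for one station in which missing days are simply absent, is an assumption of this sketch; amounts are in inches.

```python
def decluster_runs(series, u):
    """Keep only the largest value in each run of consecutive days above u.

    series: (day_index, amount) pairs for one station, sorted by day.
    A missing day breaks a run, because the day indices are then not consecutive.
    """
    kept = []
    run_best = None  # largest (day, amount) in the current run
    prev_day = None  # day index of the most recent exceedance
    for day, amount in series:
        if amount <= u:
            continue
        if run_best is not None and day == prev_day + 1:
            if amount > run_best[1]:
                run_best = (day, amount)  # same run, larger value
        else:
            if run_best is not None:
                kept.append(run_best)  # previous run has ended
            run_best = (day, amount)  # start a new run
        prev_day = day
    if run_best is not None:
        kept.append(run_best)
    return kept


daily = [(1, 0.7), (2, 1.2), (3, 0.1), (4, 0.9), (5, 0.6), (8, 0.8)]
print(decluster_runs(daily, 0.55))  # [(2, 1.2), (4, 0.9), (8, 0.8)]
```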

Spatial dependence

• The authors tested for spatial dependence in the annual maximum residuals of the stations
– there was a low level of dependence between stations within 24 km (15 miles) of one another and no detectable dependence beyond this distance
– there are very few stations within this distance that record data for the same time period

Seasonal effects

• Restricting the analysis to the nonwinter months reduces seasonality

• Inspecting the data from several sites showed no obvious seasonal effect

Model for Threshold Exceedance

• Hierarchical model
– Layer 1: data at each station
– Layer 2: the latent process that drives the climatological extreme precipitation for the region
– Layer 3: the prior distributions of the parameters that control the latent process

Data layer for return level

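• A GPD distribution for the threshold exceedances
• Reparametrization: $\phi = \log \sigma$
• Let $Z_k(x_i)$ be the kth recorded precipitation amount at location $x_i$; for $Z_k(x_i) > u$ the density is

$f(z) = \frac{1}{\sigma(x_i)}\left(1 + \xi(x_i)\,\frac{z - u}{\sigma(x_i)}\right)^{-1/\xi(x_i) - 1}$, with $\sigma(x_i) = \exp\{\phi(x_i)\}$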
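A minimal sketch of the data layer for one station, assuming the reparametrization is φ = log σ (consistent with the log-transformed scale map shown later) and that the excesses $Z_k(x_i) - u$ are conditionally independent given the station's parameters, as the residual-dependence slides assume. Working with φ keeps the scale unconstrained, which is convenient both for a Gaussian process prior and for random-walk proposals. Names and data are hypothetical.

```python
import math


def station_loglik(excesses, phi, xi):
    """GPD log-likelihood of one station's threshold excesses Z_k(x_i) - u,
    with the scale written as sigma = exp(phi) so that phi is unconstrained."""
    sigma = math.exp(phi)
    total = 0.0
    for z in excesses:
        if z < 0:
            raise ValueError("excesses must be non-negative")
        if abs(xi) < 1e-12:
            total += -phi - z / sigma  # exponential (xi -> 0) limit
            continue
        t = 1.0 + xi * z / sigma
        if t <= 0:
            return -math.inf  # an excess lies outside the GPD support
        total += -phi - (1.0 / xi + 1.0) * math.log(t)
    return total


# Hypothetical excesses (inches above u) at one station
print(station_loglik([0.1, 0.4, 1.2], phi=math.log(0.3), xi=0.1))
```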

Process layer

• A structure that relates the parameters of the data layer to the orography and climatology of the region

• Spatial (longitude/latitude) space vs. climate (elevation/MSP) space
– stations are sparse in the spatial space
– stations far apart spatially can be close in the climate space
– MSP: mean precipitation

Scale parameter

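• $\phi(x) = \log \sigma(x)$: modeled as a Gaussian process whose mean and covariance are governed by regression and spatial parameters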
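To make the process layer concrete, the sketch below draws φ at a few stations placed in climate space (scaled elevation, scaled MSP). The linear mean in MSP, the exponential covariance $\beta_0 \exp(-\beta_1 \lVert x - x' \rVert)$, and every numeric value are assumptions made for illustration; they are not presented as the paper's fitted model.

```python
import math
import random


def exp_cov(points, sill, decay):
    """Exponential covariance matrix: sill * exp(-decay * ||x - x'||)."""
    return [[sill * math.exp(-decay * math.dist(p, q)) for q in points]
            for p in points]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_gp(points, mean, sill, decay, rng):
    """One draw of the process at the given points: mean + L e, e ~ N(0, I)."""
    L = cholesky(exp_cov(points, sill, decay))
    e = [rng.gauss(0.0, 1.0) for _ in points]
    return [mean[i] + sum(L[i][k] * e[k] for k in range(i + 1))
            for i in range(len(points))]


# Hypothetical stations in (scaled elevation, scaled MSP) coordinates
stations = [(0.0, 0.0), (0.5, 0.2), (2.0, 1.5), (2.1, 1.4)]
alpha0, alpha1 = -1.0, 0.3  # hypothetical regression coefficients on MSP
mean_phi = [alpha0 + alpha1 * msp for _, msp in stations]
draw = sample_gp(stations, mean_phi, sill=0.1, decay=1.0, rng=random.Random(1))
print([round(v, 3) for v in draw])
```

The last two stations are close in climate space, so their draws are strongly correlated, which is how the process layer lets climatically similar stations share information even when they are far apart geographically.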

Shape parameter

• Option 1: a single value for the entire study region with a flat Unif$(-\infty, \infty)$ prior

• Option 2: two values
– one for the mountain stations
– one for the plains stations

• Option 3: a Gaussian process with structure similar to that of the scale parameter


Priors for the process-layer parameters

• Prior independence
• Regression parameters: noninformative
• Spatial parameters
– noninformative priors lead to an improper posterior
– informative priors based on MLEs
• Shape parameter


Model for Exceedance Rate

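• To compute return levels, we need both the GPD parameters and the threshold exceedance rate

• Assume each station's number of exceedances is binomial, with the number of recorded days as the number of trials and exceedance probability $\zeta(x_i)$

• Logit transformation: $\operatorname{logit} \zeta(x) = \log\{\zeta(x) / (1 - \zeta(x))\}$
• Model the logit-transformed parameter as a Gaussian process
• Prior specification similar to that of the GPD parameters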
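Because ζ(x) lies in (0, 1), the logit maps it to the whole real line, where a Gaussian process prior is natural. The sketch below shows the transformation and the binomial likelihood for one station; the function names and station counts are illustrative.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(t):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-t))


def exceedance_loglik(n_exceed, n_obs, eta):
    """Binomial log-likelihood of n_exceed threshold exceedances among
    n_obs recorded days when logit(zeta) = eta."""
    zeta = inv_logit(eta)
    log_choose = (math.lgamma(n_obs + 1) - math.lgamma(n_exceed + 1)
                  - math.lgamma(n_obs - n_exceed + 1))
    return (log_choose + n_exceed * math.log(zeta)
            + (n_obs - n_exceed) * math.log1p(-zeta))


# Hypothetical station: 150 exceedances in 7500 recorded days (2%)
eta_hat = logit(150 / 7500)
print(round(eta_hat, 3))  # -3.892
```

Using the number of recorded days as the number of trials lets stations with many missing values contribute proportionally less information about ζ.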

MCMC

• Metropolis within Gibbs
– proposal distributions obtained from a normal approximation or a random walk
– three parallel chains
– 20,000 iterations per chain
– 2000 burn-in steps
– convergence check: Gelman-Rubin statistic below 1.05

• Draws are used to perform spatial interpolation and inference
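The convergence check above refers to the Gelman-Rubin potential scale reduction factor. A minimal sketch of the basic (non-split) version for one scalar parameter, computed from the post-burn-in draws of parallel chains; values near 1 indicate that the chains agree, and the slides use 1.05 as the cutoff. The function name is illustrative.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for one scalar parameter.

    chains: m >= 2 equal-length lists of post-burn-in draws (length n >= 2).
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two equal-length chains of length >= 2")
    W = mean(variance(c) for c in chains)        # mean within-chain variance
    B = n * variance([mean(c) for c in chains])  # between-chain variance
    if W == 0:
        raise ValueError("chains have zero within-chain variance")
    var_plus = (n - 1) / n * W + B / n           # pooled posterior variance estimate
    return (var_plus / W) ** 0.5
```

In practice it is applied to every monitored parameter, and sampling is considered converged only when all values fall below the cutoff; modern software often reports split-chain or rank-normalized variants of the same idea.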

Point estimate for log-transformed GPD scale parameter

Point estimate for 25-year return level for daily precipitation

0.025 and 0.975 quantiles of the 25-year return level
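One way to turn posterior draws into maps like these is Gaussian-process prediction (simple kriging): for each retained draw of the station-level φ values and covariance parameters, predict φ at an unmonitored location by its conditional mean, compute the return level from the interpolated parameters, and then summarize across draws by a point estimate and the 0.025 and 0.975 quantiles. The sketch assumes an exponential covariance and a known mean at each location; coordinates and values are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def krige(points, values, means, new_point, new_mean, sill, decay):
    """GP conditional mean at new_point given observed values at points,
    using the covariance sill * exp(-decay * distance)."""
    def cov(p, q):
        return sill * math.exp(-decay * math.dist(p, q))

    K = [[cov(p, q) for q in points] for p in points]
    k = [cov(p, new_point) for p in points]
    w = solve(K, k)  # kriging weights K^{-1} k
    return new_mean + sum(wi * (v - mu) for wi, v, mu in zip(w, values, means))


# Hypothetical stations in (scaled elevation, scaled MSP) space, one posterior draw
pts = [(0.0, 0.0), (0.5, 0.2), (2.0, 1.5)]
phi_draw = [-0.9, -1.1, -0.4]
phi_mean = [-1.0, -0.95, -0.55]
print(round(krige(pts, phi_draw, phi_mean, (1.0, 0.6), -0.8, 0.1, 1.0), 3))
```

At a station the prediction reproduces the observed value, and far from all stations it falls back to the regression mean, which is why covariate information is needed over the whole region.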

Sensitivity analysis

• Sensitivity of the inference to the prior of a spatial covariance parameter

• Ran Model 7 with
– the original prior: Unif[6/7, 12]
– an alternative prior: Unif[0.214, 6]
– the posterior of the parameter is sensitive to the prior
– but the product is less sensitive, and it is what matters for interpolation

Conclusions

• A Bayesian analysis of spatial extremes
– model for the threshold exceedances
– model for the threshold exceedance rate parameter

• By performing the spatial analysis on locations defined by climatological coordinates, the authors were able to better model regional differences for this geographically diverse study area.

• Produces a map of return levels with features not well shown by the 1973 atlas
– an east–west region of higher return levels north of the Palmer Divide
– a region of lower return levels around Greeley
– region-wide uncertainty measures