Introduction to ensemble forecasting
Eric J. Kostelich
SCHOOL OF MATHEMATICS AND STATISTICS
MSRI Climate Change Summer School, July 21, 2008
Introduction Data Mathematical Framework LETKF
Co-workers:
Istvan Szunyogh, Brian Hunt, Edward Ott,
Eugenia Kalnay, Jim Yorke
and many others!
Thanks to: Dave Kuhl
Papers, preprints, and codes:
http://www.weatherchaos.umd.edu
http://math.asu.edu/~eric
MSRI Lecture #2 E. Kostelich MATHEMATICS AND STATISTICS 2 / 32
Principal papers
Preprints: www.weatherchaos.umd.edu
Initial papers:
  E. Ott et al., Tellus A 56 (2004), 415–428.
  I. Szunyogh et al., Tellus A 57 (2005), 528–545.
Refined mathematical implementation: B. R. Hunt, E. K., I. Szunyogh, Physica D 230 (2007), 112–126.
Results with real data: I. Szunyogh, E. K., et al., Tellus A 60 (2008), 113–130.
Recap from last time
In a chaotic process, every point is sensitive
Uncertainties in initial conditions grow exponentially (at least for a while)
The weather is chaotic (as far as anyone can tell)
The uncertainty in the global weather vector roughly doubles every 2 days
Forecast horizon: about 2 weeks
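The doubling-time figure can be turned into a quick back-of-the-envelope calculation. The sketch below assumes pure exponential growth with a fixed 2-day doubling time over the whole interval, which the slide itself only claims holds "for a while"; the function name is illustrative.

```python
def error_growth_factor(days, doubling_time_days=2.0):
    """Factor by which an initial uncertainty has grown after `days`,
    assuming pure exponential growth with a fixed doubling time."""
    return 2.0 ** (days / doubling_time_days)


# Growth after one doubling time, after 6 days, and at the 2-week horizon.
for days in (2, 6, 14):
    factor = error_growth_factor(days)
    print(f"after {days:2d} days: initial uncertainty multiplied by {factor:.0f}")
```

Under this assumption the uncertainty grows by a factor of 2^7 = 128 over two weeks, which gives a sense of why the forecast horizon is limited.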
Relevant U. S. organizations
The National Oceanic and Atmospheric Administration (NOAA) is a division of the Department of Commerce
The National Centers for Environmental Prediction (NCEP) is the division of NOAA responsible for developing and maintaining weather forecast models
Spectrum of models: Global Forecast System (GFS), Regional Spectral Model (RSM), etc.
Model data is distributed to local Weather Service offices, which generate public forecast products
Other important modeling efforts
NASA develops and maintains its own forecast models
International agreements to share forecasts and observations (NCEP, UK Met Office, ECMWF, Canada, Japan, Brazil, etc.)
Research community: Weather Research and Forecasting model (WRF)
NOAA and the U. S. Navy develop and maintain ocean models
Private sector efforts: AccuWeather, airlines, etc.
What do we want to predict?
The best long-term forecast is climatology (the mean is the maximum likelihood estimate)
Prior to the mid-1960s, the initial condition was climatology
The U. S. Weather Service defines “normal” as the 1971–2000 average
Example: in Phoenix, Arizona, tomorrow’s weather will be sunny with 96% probability
Exceptional weather often is of greatest interest
What is data assimilation?
The process by which empirical measurements are incorporated into a forecast model to refine an estimate of the initial condition
The distinction between variables and parameters is a matter of definition
Operational weather forecast centers perform data assimilation steps 4 times per day (0Z, 6Z, 12Z, 18Z)
Real-time constraints: NCEP allows 20 minutes
Measures of forecast quality
One objective measure of goodness:
$\langle \text{forecast} - \text{observations} \rangle$
A 72-hour forecast today is as accurate as a 36-hour forecast in 1985
“Holy grail”: 7-day forecasts that are as accurate as 3-day forecasts are now
Many applications besides weather
Controls (e.g., airplane autopilots)
Ocean and climate models (obviously)
Biological models (e.g., Tim Sauer & Steve Schiff)
Parameter estimation
Some fundamental problems
Naive approach: direct insertion
Difficulty: there are usually many more grid points than available measurements
Does not account for errors in the measurement
Does not exploit correlations between nearby grid points
The variables in the model are not necessarily the ones that can be easily measured
Example: Global Forecast System
Principal variables in the GFS:
  natural logarithm of surface pressure
  virtual temperature
  divergence and vorticity of the wind field
Principal measurements:
  barometric pressure
  sensible temperature
  relative humidity
  wind speed and direction
  satellite radiances (complicated!)
Typical 6-hour land surface dataset: 31,310 locations
Typical 6-hour surface marine dataset: 2,642 locations
Typical 6-hour satellite dataset: 53,842 locations
The observation space
For these reasons, data assimilation is done in the observation space
Given a vector of observations $y$, interpolate the model state $x$ to the same locations
The interpolation operator is denoted $H$
The innovation is $y - H(x)$
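As a toy illustration of the observation operator and the innovation, the sketch below uses a hypothetical one-dimensional grid with piecewise-linear interpolation standing in for $H$. Operational operators are far more elaborate (satellite radiances in particular); the function name, grid, state values, and observations here are all made up for illustration.

```python
def interpolate_to_obs(grid_x, state, obs_x):
    """Hypothetical 1-D observation operator H: linearly interpolate the
    model state (given at the sorted locations grid_x) to the points obs_x."""
    values = []
    for xo in obs_x:
        # Find the grid interval [grid_x[i], grid_x[i + 1]] that contains xo.
        for i in range(len(grid_x) - 1):
            if grid_x[i] <= xo <= grid_x[i + 1]:
                w = (xo - grid_x[i]) / (grid_x[i + 1] - grid_x[i])
                values.append((1.0 - w) * state[i] + w * state[i + 1])
                break
        else:
            raise ValueError(f"observation at {xo} lies outside the grid")
    return values


grid_x = [0.0, 1.0, 2.0, 3.0]      # model grid locations
state = [10.0, 12.0, 11.0, 9.0]    # model state x on the grid
obs_x = [0.5, 2.25]                # observation locations (off-grid)
y = [11.4, 10.0]                   # observed values

hx = interpolate_to_obs(grid_x, state, obs_x)          # H(x)
innovation = [yi - hi for yi, hi in zip(y, hx)]        # y - H(x)
print("H(x)       =", hx)
print("innovation =", innovation)
```

The comparison takes place at the two observation locations, not at the four grid points, which is what "observation space" means here.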
Basic idea: Weighted least squares
Observations: $y \in \mathbb{R}^p$, $y = H(x_t) + \varepsilon$
Observation errors: $E(\varepsilon) = 0$, $E(\varepsilon \varepsilon^T) = R$
Model forecast (“background”): $x \in \mathbb{R}^n$, $x_b = x_t + \eta$
Model errors: $E(\eta) = 0$, $E(\eta \eta^T) = P_b$
Goal: minimize the objective function
$$J(x) = [y - H(x)]^T R^{-1} [y - H(x)] + (x - x_b)^T P_b^{-1} (x - x_b)$$
Minimization produces an analysis $x_a$ with associated covariance $P_a$
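The simplest instance of this minimization is the scalar case, where $H$ is the identity and $R$ and $P_b$ are variances. Setting $dJ/dx = 0$ then gives a closed-form analysis. The sketch below is only that special case, with illustrative names and numbers; it is not the algorithm used operationally.

```python
def scalar_analysis(x_b, p_b, y, r):
    """Minimize J(x) = (y - x)^2 / r + (x - x_b)^2 / p_b for scalars
    (H = identity; r and p_b are the observation and background variances)."""
    gain = p_b / (p_b + r)            # weight given to the observation
    x_a = x_b + gain * (y - x_b)      # analysis: background plus weighted innovation
    p_a = (1.0 - gain) * p_b          # analysis variance, equal to 1 / (1/p_b + 1/r)
    return x_a, p_a


def cost(x, x_b, p_b, y, r):
    """The scalar objective function J(x)."""
    return (y - x) ** 2 / r + (x - x_b) ** 2 / p_b


# Illustrative values: an uncertain background and a more accurate observation.
x_a, p_a = scalar_analysis(x_b=20.0, p_b=4.0, y=22.0, r=1.0)
print("analysis:", x_a, "variance:", p_a)
```

The analysis lies between the background and the observation, closer to whichever has the smaller variance, and the analysis variance is smaller than both input variances.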
Simplest assumptions
The observation errors $\varepsilon$ are normally distributed with mean 0 and covariance $R$
Model errors similarly: $N(0, P_b)$
When the underlying model is linear, it can be shown that the minimizer $x_a$ of $J$ is unique, unbiased, and has minimum variance among all linear estimators
Weather models are “linear enough” over 6-hour intervals, but there is no guarantee of optimality
The dimensionality problem
Must evaluate
$$J(x) = [y - H(x)]^T R^{-1} [y - H(x)] + (x - x_b)^T P_b^{-1} (x - x_b)$$
where $y \in \mathbb{R}^p$, $x \in \mathbb{R}^n$
Current NCEP operations: $p \sim 1.75$ million and $n \sim 3$ billion
We need $R^{-1}$ ($p \times p$) and $P_b^{-1}$ ($n \times n$)
The computational complexity problem
Inversion of a $k \times k$ matrix is an $O(k^3)$ algorithm
If a $100 \times 100$ matrix takes ∼1 sec to invert, then a $10^9 \times 10^9$ matrix takes ∼$10^{21}$ sec
$R$ is nearly diagonal if observation errors are mostly uncorrelated
$P_b$ is not diagonal
Computing $P_b(t + \Delta t)$ from $P_a(t)$ requires integration of the tangent linear model
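The cubic-scaling extrapolation can be checked with a few lines of arithmetic. This is a rough sketch that assumes the cost scales exactly as $k^3$ from a measured baseline, with no change in hardware or algorithm; the function name and the 1-second baselines are illustrative. Under that assumption a $100 \times 100$ baseline extrapolates to about $10^{21}$ seconds for a $10^9 \times 10^9$ matrix, and a $1000 \times 1000$ baseline to about $10^{18}$ seconds.

```python
def extrapolated_inversion_time(n, base_n, base_seconds):
    """Time to invert an n x n matrix, assuming the cost scales exactly as
    n**3 from a baseline in which a base_n x base_n matrix takes base_seconds."""
    return base_seconds * (n / base_n) ** 3


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Two hypothetical baselines, each taking 1 second.
for base_n in (100, 1000):
    t = extrapolated_inversion_time(1e9, base_n, 1.0)
    print(f"baseline {base_n} x {base_n}: "
          f"{t:.1e} s = {t / SECONDS_PER_YEAR:.1e} years")
```

Either way the result is many orders of magnitude longer than the age of the universe (roughly $4 \times 10^{17}$ seconds), so direct inversion of the full matrix is out of the question.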
Complexity reduction strategies
Localization: Try to do the minimization over smaller regions of the globe
Estimate and precompute $P_b^{-1}$: Assume that the forecast uncertainty is approximately constant from one day to the next. (Used in all current operational data assimilation systems)
Thin the observations and use only the “most important” ones
Each strategy has drawbacks
Assuming $P_b \approx$ constant ignores the “errors of the day”
  Generally regarded as one of the key impediments to better forecasts
The result of sequential assimilation of observations depends on the order of processing
Must assure continuity at the boundaries of the smaller regions
The Local Ensemble Transform Kalman Filter (LETKF)
Addresses many of these problems
Exploits the “geometry of uncertainty” in chaotic processes to lower the dimension but still account for errors of the day
Assimilates all the data at once
Uses localization and sets of observations that vary slowly in space to help assure continuity
Permits efficient implementation on massively parallel computers
The geometry of forecast uncertainty
The size of a typical high- or low-pressure system is about 1000 km × 1000 km (≈ Texas)
The GFS, when run at medium (T62) resolution, contains about 3000 grid-point variables in Texas-sized regions
Suppose we run $k$ statistically equivalent forecasts
What are the singular values of the resulting $3000 \times k$ forecast matrix $X_F$?
Correlation and dimensionality
Over most Texas-sized regions, one solution looks much like another
The columns of $X_F$ tend to be highly correlated
so the SVD of $X_F$ yields a good rank-$r$ approximation even when $r \ll k$
Key empirical finding
This was a key finding by D. J. Patil et al., PRL 86 (2001), 5878–5881.
GFS at T62 resolution: ∼3000 grid variables over a typical Texas-sized region
A typical ensemble of $100 \le k \le 200$ forecasts generates a $3000 \times k$ forecast matrix $X_F$ whose first $r$ singular vectors, $40 \le r \le 80$, yield an excellent approximation of the forecast uncertainty
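In terms of the singular value decomposition, the low-rank statement can be written out as follows. This is the standard truncated-SVD result (Eckart–Young), not a formula taken from the slides; the symbols $u_i$, $v_i$ (left and right singular vectors) and $X_F^{(r)}$ (the rank-$r$ truncation) are notation introduced here, and writing the forecast matrix as $X_F$ is an assumption about the original typesetting.

```latex
X_F = \sum_{i=1}^{k} s_i \, u_i v_i^{T},
\qquad s_1 \ge s_2 \ge \cdots \ge s_k \ge 0,
\qquad
X_F^{(r)} = \sum_{i=1}^{r} s_i \, u_i v_i^{T},
\qquad
\bigl\| X_F - X_F^{(r)} \bigr\|_F
  = \Bigl( \sum_{i=r+1}^{k} s_i^{2} \Bigr)^{1/2}.
```

The approximation is good exactly when the discarded singular values $s_{r+1}, \ldots, s_k$ are small compared with the leading ones, which is what highly correlated columns produce.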
The ensemble dimension
The ensemble dimension (E-dimension) of an $n \times k$ matrix with singular values $s_1, \ldots, s_k$ is
$$E \equiv \frac{(s_1 + s_2 + \cdots + s_k)^2}{s_1^2 + s_2^2 + \cdots + s_k^2}$$
Measures the eccentricity of the “ellipse” of forecast uncertainty
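The definition is easy to evaluate directly. The sketch below computes the E-dimension from a list of singular values and applies it to the three singular-value pairs quoted in the examples in this lecture; the function name is illustrative. Because the quoted singular values are rounded to two decimals, the computed values agree with the quoted E-dimensions only to within about 0.01.

```python
def e_dimension(singular_values):
    """E-dimension: (sum of singular values)^2 / (sum of squared singular values)."""
    total = sum(singular_values)
    sum_of_squares = sum(s * s for s in singular_values)
    return total * total / sum_of_squares


# Singular-value pairs from the three example slides.
examples = [(3.78, 3.60), (19.24, 4.35), (83.65, 4.33)]
for s in examples:
    print(f"s1 = {s[0]:6.2f}, s2 = {s[1]:5.2f}  ->  E-dimension = {e_dimension(s):.3f}")

# Limiting cases: k equal singular values give E = k; one dominant value gives E near 1.
print(e_dimension([1.0, 1.0, 1.0]))
print(e_dimension([100.0, 0.001]))
```

The limiting cases show why the quantity behaves like a dimension: it equals $k$ when the uncertainty is spread evenly over $k$ directions and approaches 1 as a single direction dominates.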
Example: $s_1 = 3.78$, $s_2 = 3.60$, E-dimension $= 1.99$
[Figure: plot with horizontal axis from −1 to 1 and vertical axis from −1 to 1]
Example: $s_1 = 19.24$, $s_2 = 4.35$, E-dimension $= 1.43$
[Figure: plot with horizontal axis from −5 to 5 and vertical axis from −4 to 4]
Example: $s_1 = 83.65$, $s_2 = 4.33$, E-dimension $= 1.10$
[Figure: plot with horizontal axis from −15 to 15 and vertical axis from −10 to 10]
The key idea behind the LETKF
If the E-dimension is much less than the dimension of the overall space, then the distribution is “flat”
The ensemble forecast uncertainty over a typical synoptic region resembles a “pancake” (at least for short intervals)
Reduce the dimensionality of the problem by changing coordinates to the $r$-dimensional subspace containing most of the forecast uncertainty
The dynamics reduce the uncertainty in the remaining directions
Next lecture
Outline of the Kalman filter
Mathematical details of how we accomplish the dimension reduction
Results with operational models and real observations