INFORMATION THEORY AND ENSEMBLE
DATA ASSIMILATION. PART I: THEORETICAL ASPECTS
Dusanka Zupanski1, Milija Zupanski1, Arthur Y. Hou2, Sara Q. Zhang2,
and Christian D. Kummerow1
1Colorado State University, Fort Collins, Colorado
2NASA Goddard Space Flight Center, Greenbelt, Maryland
Manuscript version 2, to be resubmitted to J. Atmos. Sci.
(2 tables, 5 figures)
Corresponding author address:
Dusanka Zupanski, Cooperative Institute for Research in the Atmosphere/Colorado State
University, Fort Collins, Colorado, 80523-1375; E-mail: Zupanski@cira.colostate.edu
Abstract
A general framework linking information theory and ensemble data assimilation is
proposed. This framework can be used to estimate various information measures (e.g., degrees
of freedom for signal and Shannon entropy reduction) by employing an information matrix defined
in ensemble subspace. Defining the information matrix in ensemble subspace has at least two
major advantages. First, the relatively small dimensions of the information matrix (equal to the
ensemble size) make the calculation of different information measures straightforward and
computationally inexpensive. Second, and equally important, the information matrix employs a
flow-dependent forecast error covariance matrix, defined in terms of time-evolving ensemble
perturbations. The flow-dependent forecast error covariance takes into account the impact of the
model state time evolution on the information measures.
In this two-part study, we employ the Maximum Likelihood Ensemble Filter (MLEF)
data assimilation approach in application to the Goddard Earth Observing System Single Column
Model (GEOS-5 SCM). In Part I we present the theoretical background for the proposed general
framework and focus on the impact of ensemble size and covariance localization, and on the
temporal evolution of the information measures. In Part II we evaluate the impact of different
data assimilation approaches on the information measures by comparing Kalman filter and
three-dimensional variational solutions as two special cases of the MLEF solution.
The results of Part I indicate that it is possible to capture the essential character of the
information measures with a relatively small ensemble size. For example, in applications to
models similar to the GEOS-5 SCM, 10 ensemble members might be sufficient. The results
additionally indicate that covariance localization generally increases the amount of information
and also improves the data assimilation results. The temporal evolution of the information
measures is found to be in agreement with the true model state evolution, thus indicating that the
information measures are meaningful.
1. Introduction
Novel probabilistic approaches to data assimilation and ensemble forecasting are often
referred to as ensemble data assimilation, or Ensemble Kalman Filter (EnKF) methods. These
methods are considered powerful because of their capability to address both data assimilation
and ensemble forecasting within a consistent mathematical approach. Like other advanced data
assimilation methods, such as the Kalman Filter (KF) and variational methods, ensemble data
assimilation techniques provide an "optimal" estimate of the atmospheric state using information
from available observations. The “optimal” estimate of the atmospheric state is commonly
defined either as a minimum variance, or as a maximum likelihood solution (e.g., Lorenc 1986;
Cohn 1997). Most of the ensemble data assimilation methods seek a minimum variance solution
(Evensen 1994; Houtekamer and Mitchell 1998; Lermusiaux and Robinson 1999; Hamill and
Snyder 2000; Keppenne 2000; Mitchell and Houtekamer 2000; Anderson 2001; Bishop et al.
2001; van Leeuwen 2001; Reichle et al. 2002a,b; Whitaker and Hamill 2002; Tippett et al. 2003;
Zhang et al. 2004; Ott et al. 2005; Szunyogh et al. 2005; and Peters et al. 2005). There are also
ensemble data assimilation approaches seeking a maximum likelihood solution (e.g., Zupanski
2005; Zupanski and Zupanski 2006; and Fletcher and Zupanski 2006). As explained in Fletcher
and Zupanski (2006), the differences between the two solutions can become significant in data
assimilation problems with errors described by non-Gaussian Probability Density Functions
(PDFs). That paper also points out that cloud variables and their errors would likely follow
log-normal (i.e., non-Gaussian) PDFs. Also, non-linearity of the forecast models and observation
operators would cause the errors involved in data assimilation to depart from the Gaussian
distribution. Nevertheless, in this study we assume that the errors are Gaussian, and thus assume
that the results of this study will also be applicable to other ensemble-based data assimilation
methods, as long as the Gaussian assumption is valid.
It has been recognized that information theory (e.g., Shannon and Weaver 1949; Rodgers
2000) and predictability are inherently related (e.g., Schneider and Griffies 1999; Kleeman 2002;
Roulston and Smith 2002; DelSole 2004; Abramov et al. 2005). Information theory has also
come to the attention of data assimilation, where it has been used to calculate information
content of various observations (e.g., Wahba 1985; Purser and Huang 1993; Wahba et al. 1995;
Rodgers 2000; Rabier et al. 2002; Fisher 2003; Johnson 2003; Engelen and Stephens 2004;
L’Ecuyer et al. 2006).
Information theory has primarily been examined in application to other data
assimilation methods (e.g., variational, KF), while its application to ensemble data assimilation
has been rather limited so far. Some of the pioneering studies in this area are as follows. Bishop
and Toth (1999) and Wang and Bishop (2003) examined the eigenvalues and eigenvectors of the
Ensemble Transform Kalman Filter (ETKF, Bishop et al. 2001) transformation matrix and
demonstrated that these eigenvalues and eigenvectors define the amount and the direction of the
maximum forecast error reduction due to information from the observations. Patil et al. (2001),
Oczkowski et al. (2005), and Wei et al. (2006) used the eigenvalues of the ETKF transformation
matrix to define measures of information, referred to as “bred dimension”, “effective degrees of
freedom”, and “E dimension”, respectively. These studies have recognized that ensemble-based
methods have a potential to improve measures of information due to flow-dependent forecast
error covariance matrix, especially in applications to adaptive observations. Building upon the
previous studies, we link the ETKF transformation matrix with the so-called information or
observability matrix, defined in ensemble subspace, and demonstrate how this matrix can be
used to define standard measures of information theory, such as Degrees of Freedom (DOF) for
signal and Shannon entropy reduction (e.g., Rodgers 2000). Thus, we propose a general
framework to link together ensemble data assimilation and information theory in a similar
manner as in variational and KF methods. We evaluate this framework within an ensemble-based
data assimilation method, using a single column precipitation model and simulated observations.
In Part I (this paper) we focus on the impact of ensemble size and covariance localization. In Part
II (the following paper) we compare the results of the KF and the 3-dimensional variational (3d-
var) approaches, defined as special applications of the proposed framework.
Part I is organized as follows. In section 2 the general framework is described. The
experimental design is explained in section 3, and experimental results are presented in section 4.
Finally, in section 5, the conclusions are summarized and their relevance for future research is
discussed.
2. General framework
In this study we employ an ensemble data assimilation approach referred to as Maximum
Likelihood Ensemble Filter (MLEF, Zupanski 2005; Zupanski and Zupanski 2006; Zupanski et
al. 2006). Here we briefly describe the MLEF. The MLEF seeks a maximum likelihood state
solution employing an iterative minimization of a cost function. The solution for a state vector x
(also referred to as control variable), of dimension Nstate, is obtained by minimizing a cost
function J defined as
J(x) =
1
2[x ! xb ]
TPf
!1[x ! xb ]+
1
2[ y ! H (x)]
TR
!1[ y ! H (x)] , (1)
where y is an observation vector of dimension equal to the number of observations (Nobs), and H
is a non-linear observation operator. Subscript b denotes a background (i.e., prior) estimate of x,
and superscript T denotes a transpose. The Nobs ×Nobs matrix R is a prescribed observation error
covariance, and it includes instrumental and representativeness errors (e.g., Cohn 1997). The
matrix P_f of dimension Nstate×Nstate is the forecast error covariance. Note that we employ the
common rank-reduced square-root formulation P_f = P_f^{1/2} (P_f^{1/2})^T, where P_f^{1/2} is an
Nstate×Nens square-root matrix (Nens being the ensemble size).
Uncertainties of the optimal estimate of the state x are also calculated by the MLEF. The
uncertainties are defined as square roots of the analysis error covariance (P_a^{1/2}) and the forecast
error covariance (P_f^{1/2}), both defined in ensemble subspace. The square root of the analysis error
covariance is obtained as

P_a^{1/2} = [ p_a^1  p_a^2  ...  p_a^{Nens} ] = P_f^{1/2} (I_ens + C)^{−1/2} ,    (2)

where I_ens is an identity matrix of dimension Nens×Nens, and p_a^i are column vectors representing
analysis perturbations in ensemble subspace. The square root in (2) is calculated via eigenvalue
decomposition of C. It is defined as a symmetric positive semi-definite square root, and therefore
it is unique (e.g., Horn and Johnson 1985, Theorem 7.2.6). This is one of the differences between
the ETKF (e.g., Bishop et al. 2001 and Wang and Bishop 2003) and the MLEF: in the ETKF a
non-symmetric, hence a non-unique square root is chosen [Bishop et al. 2001, Eq. (18b)], while
in the MLEF the unique symmetric square root is used [Zupanski 2005, Eq. (10)]. As a
consequence of using different eigenvectors in (2), the square roots of the analysis error
covariance in the ETKF and the MLEF are different, even for linear observation operators and
identical forecast error covariances. The eigenvalues are, however, identical under these
conditions. Note that, in application to non-linear forecast models (M), the final results of the two
approaches will be different in terms of both the eigenvalues and the eigenvectors, due to
different non-linear updates of P_f^{1/2}. Non-uniqueness of the square root filters, such as ensemble
square root filters, is discussed in Tippett et al. (2003).
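The unique symmetric square root used in (2) can be computed directly from the eigendecomposition of C. The following sketch (a minimal NumPy illustration with hypothetical random matrices and dimensions, not the MLEF code itself) forms (I_ens + C)^{−1/2} and applies it to a square-root forecast error covariance:

```python
import numpy as np

def symmetric_inverse_sqrt(c):
    """Unique symmetric positive-definite square root of (I + C)^{-1},
    computed via eigendecomposition of the symmetric matrix C."""
    vals, vecs = np.linalg.eigh(c)          # C = V diag(vals) V^T
    vals = np.maximum(vals, 0.0)            # guard against round-off negatives
    return vecs @ np.diag(1.0 / np.sqrt(1.0 + vals)) @ vecs.T

# Example: analysis perturbations from forecast perturbations, as in Eq. (2)
rng = np.random.default_rng(0)
z = rng.standard_normal((40, 10))           # hypothetical Z (Nobs x Nens)
c = z.T @ z                                 # information matrix in ensemble subspace
pf_sqrt = rng.standard_normal((80, 10))     # hypothetical Pf^{1/2} (Nstate x Nens)
pa_sqrt = pf_sqrt @ symmetric_inverse_sqrt(c)
```

Because the square root is built from the eigenvectors of C itself, the result is symmetric and positive definite, hence unique, consistent with the discussion above.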
Matrix C of dimension Nens×Nens is defined as

C = Z^T Z ;    z^i = R^{−1/2} H(x + p_f^i) − R^{−1/2} H(x) ,    (3)

where vectors z^i are the columns of the matrix Z of dimension Nobs×Nens. Note that, when
calculating z^i, a nonlinear operator H is applied to the perturbed and unperturbed states x. Vectors
p_f^i are columns of the square root of the background error covariance matrix obtained via
ensemble forecasting employing a non-linear forecast model M:
P_f^{1/2} = [ p_f^1  p_f^2  ...  p_f^{Nens} ] ;    p_f^i = M(x + p_a^i) − M(x) .    (4)
Equations (1)-(3), referred to as analysis equations, are solved iteratively in each data
assimilation cycle, while equation (4), referred to as a forecast equation, is used to propagate in
time the columns of the forecast error covariance matrix P_f^{1/2}.
A measure of information content of observations referred to as DOF for signal is often
used in information theory (e.g., Rodgers 2000). It measures the number of independent pieces
of information in the observations that are above the noise level. In data assimilation
applications, DOF for signal (here denoted as d_s) is commonly defined in terms of the analysis
and forecast error covariances, P_a and P_f (e.g., Wahba 1985; Purser and Huang 1993; Wahba et al.
1995; Rodgers 2000; Rabier et al. 2002; Fisher 2003; Johnson 2003; Engelen and Stephens
2004) as
d_s = tr[ I_state − P_a P_f^{−1} ] ,    (5a)

where tr denotes trace, and I_state is an identity matrix of dimension Nstate×Nstate. Wahba et al.
(1995) define d_s in terms of the so-called influence matrix A as
d_s = tr[ R^{−1/2} H P_a H^T R^{−1/2} ] = tr[ A ] ,    (5b)
which is equivalent to (5a), as pointed out by Fisher (2003).
Employing the definition of P_a in ensemble subspace (2), and using tr[x x^T] = tr[x^T x], we
can write (5b) in ensemble subspace as

d_s = tr[ (I_ens + C)^{−1} (P_f^{1/2})^T H^T (R^{−1/2})^T R^{−1/2} H P_f^{1/2} ] .    (6)
Using the non-linear operator H instead of the linear operator H, we can write (column by column)

R^{−1/2} H P_f^{1/2} ≈ R^{−1/2} H(x + p_f^i) − R^{−1/2} H(x) .    (7)
Finally, combining (3), (6), and (7) we have

d_s = tr[ (I_ens + C)^{−1} Z^T Z ] = tr[ (I_ens + C)^{−1} C ] .    (8)
Definition (8) is essentially the same as Eq. (2.61) of Rodgers (2000). The only difference is that
here the trace is obtained employing the matrix C of dimension Nens×Nens, while in the formulation of
Rodgers (2000) the trace is obtained employing an information matrix of dimension Nstate×Nstate
(the full-rank information matrix). We will refer to the matrix C as the information matrix in ensemble
subspace.
By introducing information matrix C, we have defined a link between information theory
and ensemble data assimilation. Having this link is of special importance for the following
reasons. When calculating information content measures such as ds, a flow-dependent
Pf obtained directly from ensemble data assimilation is used. In addition, eigen-decomposition
of C is easily accomplished due to the relatively small size of this matrix (Nens×Nens). Hence, it is
practical to calculate information content of numerous observations (large Nobs) in applications to
complex models with large state vectors (large Nstate). A possible disadvantage of this approach is
that a small ensemble size might not be sufficient to describe full variability of the forecast error
covariance matrix, which could potentially result in meaningless information measures. One of
the main focuses of this study is the impact of ensemble size on the information measures.
Once the information matrix C is available, various measures of information content can
be calculated. It is especially useful to define these measures in terms of the eigenvalues λ_i^2 of C.
Thus, as in Rodgers (2000), we can define (8) in terms of λ_i^2 and calculate d_s as:

d_s = Σ_i λ_i^2 / (1 + λ_i^2) .    (9)
Similarly, the Shannon information content h, defined as the reduction of entropy due to
added information from the observations (Shannon and Weaver 1949; Rodgers 2000), can be
calculated using the following formula

h = (1/2) Σ_i ln(1 + λ_i^2) .    (10)
As explained in Rodgers (2000, Section 2.4), the values λ_i^2 ≥ 1 correspond to signal and,
conversely, the values λ_i^2 < 1 correspond to noise. Eqs. (3) and (7) indicate that the eigenvalues λ_i^2 depend
on the ratio between the forecast error covariance and the observation error covariance, both
defined at the observation locations. Thus, for forecast errors larger than the observation
errors we have λ_i^2 ≥ 1 (signal), and for forecast errors smaller than the observation errors we
have λ_i^2 < 1 (noise). It is therefore important to properly estimate a flow-dependent
forecast error covariance matrix (e.g., via ensemble-based or KF methods), since it
brings the impact of changing atmospheric conditions to the eigenvalues λ_i^2 and, consequently,
to the information measures.
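Given the eigenvalues λ_i^2 of the information matrix C, Eqs. (9) and (10) reduce to simple sums. A minimal sketch (toy random matrix, not the paper's experiments):

```python
import numpy as np

def dof_signal_and_entropy(c):
    """DOF for signal (Eq. 9) and Shannon entropy reduction (Eq. 10)
    from the eigenvalues lambda_i^2 of the information matrix C."""
    lam2 = np.linalg.eigvalsh(c)
    lam2 = np.maximum(lam2, 0.0)            # C is positive semi-definite
    ds = np.sum(lam2 / (1.0 + lam2))
    h = 0.5 * np.sum(np.log(1.0 + lam2))
    return ds, h

# d_s is bounded by the ensemble size (each term is < 1); h may exceed it
rng = np.random.default_rng(2)
z = rng.standard_normal((40, 10))           # hypothetical Z (Nobs x Nens)
ds, h = dof_signal_and_entropy(z.T @ z)
```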
We have explained how information theory and ensemble data assimilation, in
particular the MLEF approach, can be linked to produce a technique for calculation of various
information content measures, defined in ensemble subspace. An important characteristic of the
MLEF approach is that it can be made identical to KF or variational methods, under special
conditions explained below. This provides an opportunity to directly compare information
measures obtained using different data assimilation approaches.
a. Connection to KF
As shown in Zupanski (2005), a linear version of the MLEF is identical to the KF under
the assumptions of the classical linear KF (e.g., Jazwinski 1970): assuming Gaussian PDFs,
linear models M, and linear observation operators H. Under these assumptions, the solution that
minimizes (1) can be explicitly calculated using the following formula (e.g., Zupanski 2005,
Appendix A, Eq. A7):
x = x_b + α P_f H^T (H P_f H^T + R)^{−1} [y − H(x_b)] .    (11)
The solution (11) is identical to the KF solution, since the minimization step-size α is equal to 1
for quadratic cost functions (Gill et al. 1981). The MLEF solution will remain identical to the KF
solution through all data assimilation cycles, since the linear version of the forecast error
covariance update equation (4) is the same as the KF update equation. Note that the full-rank
MLEF (Nens=Nstate) is identical to the full-rank KF, while the reduced-rank MLEF (Nens<Nstate) is
related to the reduced-rank KF, under the assumptions of the classical linear KF.
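For linear observation operators, Eq. (11) with step size α = 1 is the familiar KF analysis update. A minimal sketch with toy scalar numbers (purely illustrative, not the paper's configuration):

```python
import numpy as np

def kf_analysis(xb, pf, h_mat, r, y):
    """Linear analysis update, Eq. (11) with step size alpha = 1:
    x = xb + Pf H^T (H Pf H^T + R)^{-1} (y - H xb)."""
    gain = pf @ h_mat.T @ np.linalg.inv(h_mat @ pf @ h_mat.T + r)
    return xb + gain @ (y - h_mat @ xb)

# Scalar sanity check: equal forecast and observation error variances
# pull the analysis halfway between background and observation
xb = np.array([0.0])
pf = np.array([[1.0]])
h_mat = np.array([[1.0]])
r = np.array([[1.0]])
y = np.array([2.0])
xa = kf_analysis(xb, pf, h_mat, r, y)       # -> [1.0]
```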
b. Connection to 3d-var
As explained before, the solution obtained by the MLEF is a maximum likelihood one,
and, in general, a non-linear one. These characteristics are shared with variational methods, thus
there is a connection to these methods as well. The full-rank non-linear MLEF solution without
the update of the forecast error covariance [i.e., using a prescribed covariance instead of Eq. (4)]
is identical to a 3d-var solution, since the same cost function (1) is minimized. To obtain
identical results, one can employ the same minimization method with the same Hessian
preconditioning in both the MLEF and the 3d-var (e.g., Zupanski 2005). In practice, however, it
is not always feasible to employ the perfect Hessian preconditioning in variational methods due
to large dimensions of the full-rank covariance matrices.
As explained before, the general framework proposed here should be directly applicable
to KF and 3d-var methods, as long as it remains practical to evaluate full-rank covariance
matrices. In cases when full-rank matrices are too large for practical evaluations, the information
matrix of reduced order, defined in ensemble subspace (Eq. 3), can be used as a more practical
tool to define information measures. Note that calculation of the information measures (9) and
(10) is straightforward within the ETKF and the MLEF ensemble-based approaches, since the
eigenvalues of (3) are explicitly calculated within these algorithms. In other EnKF approaches
the additional calculation of the information matrix C and its eigenvalues would have to be
included.
There are, however, some restrictions to the proposed general framework. For example,
when deriving information measures (e.g., DOF for signal and entropy reduction) we have
assumed, as in Rodgers (2000), that all errors are Gaussian. Therefore, we have implicitly
assumed weak nonlinearity in M and H, even though ensemble-based and variational methods do
not necessarily require this assumption. Consequently, the information measures obtained in
highly non-linear data assimilation problems, and also for variables that are typically non-
Gaussian (e.g., humidity and cloud microphysical variables) could be incorrect, or only
approximately correct. A theoretical framework for information measures employing
non-Gaussian ensembles is proposed in Majda et al. (2002) and Abramov and Majda (2004). They
have employed a different approach, based on the moment constraint optimization, to estimate
so-called “predictive utility”, which is an information measure derived from the Shannon
entropy. The framework proposed here could be further generalized following Majda et al.
(2002) and Abramov and Majda (2004). These generalizations are beyond the scope of this
study, and will be addressed in the future.
An additional potential restriction of the proposed approach is the possibility that the
calculated information measures could be underestimated in experiments with a small
ensemble size when assimilating many observations. To simulate this situation we examine
information measures in the experiments with a relatively small number of ensemble members
(10 ensemble members) and a relatively large number of observations per data assimilation cycle
(40 and 80 observations).
3. Experimental design
a. Forecast model
A single column version of the GEOS-5 Atmospheric General Circulation Model
(AGCM) is used in this study. Previous experience employing column versions of the GEOS-
series within a 1-dimensional variational data assimilation technique indicated that the 1-
dimensional framework could produce useful data assimilation results, especially in applications
to rainfall assimilation (Hou et al. 2000, 2001, 2004).
The GEOS-5 SCM consists of the model physics components of the GEOS-5 AGCM:
moist processes (convection and prognostic large-scale cloud condensation), turbulence,
radiation, land surface, and chemistry. The dynamic advection is driven by prescribed forcing
time series. The column model is capable of updating all the prognostic state variables and
evaluating a suite of additional observable quantities such as precipitation and cloud
properties. The GEOS-5 SCM retains most of the non-linear complexities and interactions
between physical processes of the full AGCM. At the same time, it has the advantage of
reduced dimensions when used in ensemble data assimilation research experiments.
b. Control variable, observations
In the applications of this paper we focus on simulated observations of two
state variables: the temperature (T) and specific humidity (q) vertical profiles. These are also the
control variables for data assimilation. In the experiments presented, 40 model levels are used.
Thus, the dimension of the control vector is 80. The column model only updates temperature and
specific humidity during a data assimilation interval. The remaining state variables, along with the
advection forcing, are prescribed by the Atmospheric Radiation Measurement (ARM) data time
series. The Tropical Western Pacific site (130E, 15N) of the ARM observation program is chosen for
the application discussed in this paper. The assimilation experiments cover the period from 7
May 1998 to 24 May 1998 (17 days).
A data assimilation interval of 6 hours is used in the experiments, and simulated
observations of temperature and specific humidity are assimilated at the end of each data
assimilation interval. Simulated observations are defined using the “true” state, defined by the
GEOS-5 SCM, and by adding a Gaussian white random noise to the “true” state. Thus, the
observation error covariance matrix R is assumed diagonal and constant in time. We use the
same version of the model to perform data assimilation and to create observations, thus we
assume that the model is perfect. In experiments with real observations the perfect model
assumption might not be justified. In order to relax this assumption one can use some of the
recently proposed model error estimation approaches (e.g., Heemink et al. 2001; Mitchell et al.
2002; Reichle et al. 2002a; Zupanski and Zupanski 2006).
The observations are created assuming an instrumental error for T of 0.2 K at all model
levels (R_inst^{1/2} = 0.2 K). The instrumental errors for q vary between R_inst^{1/2} = 6.1×10^{−8} and
R_inst^{1/2} = 7.9×10^{−4}; the errors are defined to decrease from the lowest to the highest model level.
The total observation errors are defined as R^{1/2} = α R_inst^{1/2}, where an empirical parameter α > 1 is
employed to approximately account for representativeness errors. To approximately account for
the reduced variability in the forecast error covariance due to small ensemble size, the parameter
α increases with decreasing ensemble size. The values of the parameter α are tuned to the
ensemble size to approximately satisfy the expected chi-square innovation statistic, calculated for
optimized innovations and normalized by the analysis error (e.g., Dee et al. 1995; Menard et
al. 2000; Zupanski 2005). The instrumental errors and the values of the parameter α used in data
assimilation experiments of this study are listed in Table 1.
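The chi-square innovation diagnostic used here for tuning α can be illustrated in simplified form. The sketch below uses the plain innovation-covariance normalization, d^T (H P_f H^T + R)^{−1} d / Nobs, which has expected value 1 for consistent covariances; the paper's exact formulation follows Dee et al. (1995) and Menard et al. (2000) and may differ in detail, so this is an illustrative assumption, not the paper's code:

```python
import numpy as np

def chi_square(innov, h_mat, pf, r):
    """Innovation-based chi-square statistic: squared innovation normalized
    by its expected covariance S = H Pf H^T + R, divided by Nobs.
    Expected value is 1 when the error covariances are consistent."""
    s = h_mat @ pf @ h_mat.T + r
    return float(innov @ np.linalg.solve(s, innov)) / innov.size

# Consistent toy setup: innovations drawn from N(0, S) should give chi2 near 1
rng = np.random.default_rng(3)
nobs, nstate = 40, 80
h_mat = np.hstack([np.eye(nobs), np.zeros((nobs, nstate - nobs))])
pf = 0.5 * np.eye(nstate)                   # hypothetical forecast error covariance
r = 0.2 * np.eye(nobs)                      # hypothetical observation error covariance
s_sqrt = np.linalg.cholesky(h_mat @ pf @ h_mat.T + r)
chi2_vals = [chi_square(s_sqrt @ rng.standard_normal(nobs), h_mat, pf, r)
             for _ in range(200)]
```

In this spirit, a mean chi-square far from 1 signals an inconsistent forecast or observation error variance, which is what the tuning of α is meant to correct.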
Initial conditions for T and q at the beginning of the first data assimilation cycle are from
ARM observations of T and q at the time (0000 UTC 07 May 1998), interpolated from
observation levels to the model levels. With this configuration the errors in initial conditions are
simulated by the difference between ARM observations and the “true” states defined by the
model simulation (started from 1800 UTC 06 May 1998 and integrated for 6 hours to 0000 UTC
07 May 1998). This resulted in Root Mean Square (RMS) errors of 0.46 K for T_b and
4.8×10^{−4} for q_b in the first data assimilation cycle (recall that subscript b denotes background
values). In all subsequent cycles, 6-h forecast of T and q from the previous cycle is used to
define the background for the current cycle.
c. Ensemble perturbations
Ensemble perturbations p_f^i that are used to define the forecast error covariance P_f^{1/2} are
prescribed in the first data assimilation cycle (cold start); in the subsequent cycles the data
assimilation scheme updates p_f^i by using the analysis perturbations p_a^i and by running
ensembles of forecasts (4). The cold start ensemble perturbations are defined using Gaussian
white noise with a prescribed standard deviation of comparable magnitude to the observation
errors. A compactly supported second-order correlation function of Gaspari and Cohn (1999),
with decorrelation length of 3 vertical layers, is applied to the random perturbations to define a
correlated random noise (e.g., Zupanski et al. 2006). The decorrelation length of 3 vertical layers
was determined empirically, based on overall best data assimilation performance of all
experiments of this two-part study.
d. Minimization
A conjugate gradient minimization algorithm (e.g., Luenberger 1984), with line-search
defined as in Navon et al. (1992), and with Hessian preconditioning (Zupanski 2005) is used in
the experiments of this paper. In all data assimilation experiments, only a single iteration of the
minimization is performed, which is sufficient for linear observation operators (Zupanski 2005).
Note that the non-linearity of the forecast model M, even though it influences the final data
assimilation results, does not influence the minimization results within a filter formulation.
This would be, however, different for a smoother application, since the non-linear model would
influence the minimization results.
e. Covariance localization
Covariance localization is often used in ensemble data assimilation applications to better
constrain the data assimilation problems with either insufficient observations or insufficient
ensemble size (e.g., Houtekamer and Mitchell 1998; Hamill et al. 2001; Whitaker and Hamill
2002). Localization was also found beneficial in full-rank KF applications due to
spurious loss of variance in the discrete KF covariance evolution equation (e.g., Menard et al.
2000). Since covariance localizations are typically achieved by employing arbitrary covariance
functions (e.g., Gaspari and Cohn 1999), it is important to evaluate whether such localizations could
unrealistically change the information measures.
Covariance localization is applied in a set of data assimilation experiments of this paper
to assess its impact on the information content measures. We use a common localization
technique (e.g., Houtekamer and Mitchell 1998; Hamill et al. 2001; Whitaker and Hamill 2002)
based on a Schur (element-wise) product between the covariance matrix and a compactly
supported covariance function. Since the localization increases the number of degrees of
freedom, the Nens leading eigenvalues and eigenvectors of the localized forecast error covariance
are selected after localization. We have employed the compactly supported second-order
correlation function of Gaspari and Cohn (1999), with decorrelation length of 3 vertical layers.
Recall that in data assimilation experiments without localization we also employ the same
correlation function, with the same parameters, but to define correlated random noise in the cold
start, as explained in sub-section 3c. Using the same correlation function ensures maximum
compatibility between different data assimilation experiments.
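The compactly supported correlation function of Gaspari and Cohn (1999) and the Schur-product localization can be sketched as follows. This is a simplified illustration assuming distance is measured in model levels; the mapping of the `length` argument onto the paper's decorrelation length of 3 vertical layers is an assumption of this sketch:

```python
import numpy as np

def gaspari_cohn(r):
    """Compactly supported fifth-order piecewise-rational correlation
    function of Gaspari and Cohn (1999); r = distance / decorrelation length.
    Equals 1 at r = 0 and vanishes for r >= 2."""
    r = np.abs(np.asarray(r, dtype=float))
    f = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r <= 2.0)
    x = r[m1]
    f[m1] = -0.25*x**5 + 0.5*x**4 + 0.625*x**3 - (5.0/3.0)*x**2 + 1.0
    x = r[m2]
    f[m2] = (1.0/12.0)*x**5 - 0.5*x**4 + 0.625*x**3 + (5.0/3.0)*x**2 \
            - 5.0*x + 4.0 - (2.0/3.0)/x
    return f

def localize(pf, length):
    """Schur (element-wise) product of Pf with the localization matrix
    built from inter-level distances."""
    n = pf.shape[0]
    levels = np.arange(n)
    dist = np.abs(levels[:, None] - levels[None, :]) / length
    return pf * gaspari_cohn(dist)

# Example: localize a toy 40x40 forecast error covariance
pf = np.full((40, 40), 0.5)
pf_loc = localize(pf, 3.0)                  # off-diagonal entries taper to zero
```

The Schur product preserves the diagonal (variances) exactly while damping distant correlations, which is the intended effect of the localization described above.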
4. Results
a. Verification summary
We have performed extensive verifications of all data assimilation experiments listed in
Table 1 in terms of analysis and background errors and the chi-square innovation statistic tests
(e.g., Dee et al. 1995; Menard et al. 2000; Zupanski 2005). The verification summary is
given in Table 2. The RMS errors of the analysis and the 6-h forecast (background) are
calculated with respect to the truth as mean values over 70 consecutive data assimilation cycles.
The mean values and the standard deviations of the chi-square statistic are calculated over 70
data assimilation cycles from the chi-square statistic values obtained in the individual data
assimilation cycles. Note that an ergodic hypothesis was made when calculating the mean
chi-square values: the sample mean was replaced by a time mean, calculated over 70 data
assimilation cycles.
The results in Table 2 indicate that the RMS errors increase as the ensemble size
decreases (from 80 to 10), and also as the number of observations decreases (from 80 to 40),
which is the expected performance. The analysis and background errors of all experiments are
smaller than the errors of the experiment without data assimilation (no_obs), thus indicating a
positive impact of data assimilation. The analysis errors of the experiments with 80 observations
are within the estimated total observation errors (note that the total observation errors also
include empirical representativeness errors).
Table 2 also indicates that covariance localization generally reduces analysis and
background errors. Exceptions are some experiments with 20 ensembles (e.g., RMS Ta = 0.63 K
vs. RMS Ta = 0.54 K) and 40 ensembles (e.g., RMS qa = 3.98×10^{−4} vs. RMS qa = 3.74×10^{−4}). It is not
surprising that covariance localization could sometimes have an adverse impact on data
assimilation results due to an arbitrary decorrelation length imposed on the forecast error
covariance (also discussed in Houtekamer and Mitchell 2001; Zhang et al. 2006).
The mean values of the chi-square statistic indicate that the experiments without
localization differ by less than 20% from the expected value of 1, with standard
deviations within 15%-31%. Larger departures from the expected chi-square statistic are
obtained in the experiments with covariance localization: the mean chi-square values differ by
21%-51% from the expected value of 1, with standard deviations in the range of 13%-38%.
Note that the mean chi-square values larger (smaller) than 1 indicate an underestimation
(overestimation) of the forecast error variance, which would result in underestimation
(overestimation) of the information measures. One should, however, expect departures from the
expected chi-square statistic, since the Gaussian assumption is not strictly valid due to non-
linearity of the forecast model. The chi-square values calculated in individual data assimilation
cycles indicated no time increasing or decreasing trends, meaning that all data assimilation
experiments had stable filter performance (figure not included). Based on the stable filter
performance, we will assume that the chi-square values in Table 2 are acceptable, which is
admittedly a subjective assumption.
b. DOF for signal and entropy reduction
1) IMPACT OF ENSEMBLE SIZE
Information measures ds (DOF for signal) and h (entropy reduction), calculated in data
assimilation experiments with 80 observations, are shown as functions of data assimilation
cycles in Figs. 1a and 1b. Comparison of Figs. 1a and 1b indicates that both information
measures have similar variability with time; however, the amplitude of variability of h is larger.
Note that, by definition, ds cannot exceed the ensemble size Nens, since the matrix C has Nens eigenvalues. The entropy reduction h, however, can be greater than Nens. As seen in Figs. 1a and 1b, the experiments with larger ensemble size typically have larger values of the information measures, and vice versa. Assuming that the full-rank experiment produces information measures close to the truth, we notice that the true information content is underestimated in the reduced-rank experiments. An important observation, however, is that all experiments show similar time variability of the information measures (the lines approximately follow each other). Thus, even though small ensemble sizes can result in underestimation of the true information measures, comparing information measures within the same ensemble size can still produce meaningful results. This is an indication that even a small ensemble size (e.g., 10 ensemble members) is sufficient to capture the basic variability of the information measures.
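As an illustration of how inexpensive these measures are in ensemble subspace, both can be computed directly from the eigenvalues of the Nens × Nens information matrix C. The sketch below assumes the definitions ds = Σᵢ λᵢ²/(1 + λᵢ²) and h = (1/2) Σᵢ ln(1 + λᵢ²), with λᵢ² the eigenvalues of C, as defined in Part I.

```python
import numpy as np

def information_measures(C):
    """DOF for signal (ds) and entropy reduction (h) from the
    ensemble-subspace information matrix C (Nens x Nens, symmetric
    positive semi-definite).

    Assumes ds = sum_i lam2_i / (1 + lam2_i) and
            h  = 0.5 * sum_i ln(1 + lam2_i),
    where lam2_i are the eigenvalues of C.
    """
    lam2 = np.linalg.eigvalsh(C)      # Nens eigenvalues of C
    lam2 = np.clip(lam2, 0.0, None)   # guard against tiny negative values
    ds = np.sum(lam2 / (1.0 + lam2))  # each term < 1, so ds <= Nens
    h = 0.5 * np.sum(np.log1p(lam2))  # individual terms are unbounded
    return ds, h
```

Since each term of ds is smaller than 1, ds is bounded by the ensemble size, whereas the logarithmic terms of h are unbounded, consistent with the behavior noted above.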
2) IMPACT OF COVARIANCE LOCALIZATION
In this subsection we evaluate the impact of covariance localization on the information measures, focusing on the experiments with an insufficient number of observations (40 observations) and a relatively small ensemble size (10 ensemble members). In Fig. 2, DOF
for signal, obtained in the experiments with and without localization are plotted as functions of
data assimilation cycles. Both Fig. 2a (with 10 ensemble members) and Fig. 2b (with 80
ensemble members) indicate, in general, an increased amount of information due to localization.
This is not surprising, since covariance localization introduces extra DOF to the data assimilation
system (e.g., Hamill et al. 2001), but the total number of DOF cannot exceed 40, which is the
maximum number of independent pieces of information in the example in Fig. 2a,b. An
important observation is that the localization does not change the essential character of the
information measures (the lines with and without covariance localization are approximately
parallel). It is also important to note that 10 ensemble members capture the essential temporal variability of the information measures obtained in the experiment with 80 ensemble members, which we consider close to the truth. There are some cases, however, with large disagreements between the two experiments (e.g., there is a notable departure between the two lines around cycle 56 in Fig. 2a). In such cases the experiment with localization (ds_10ens_loc) is in better agreement with the "true" information content obtained with 80 ensemble members (Fig. 2b). Note, however, that because verification is performed over a single column, the maxima and minima can easily be shifted by a single point, even under similar experimental conditions.
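For reference, Gaspari and Cohn (1999) construct a family of compactly supported correlation functions; the widely used fifth-order piecewise rational function (their Eq. 4.10) is sketched below as an illustration. This is not necessarily the exact function used in our experiments, which employ a decorrelation length of 3 vertical layers (Table 1).

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Fifth-order piecewise rational compactly supported correlation
    function of Gaspari and Cohn (1999, Eq. 4.10).

    dist : separation distance (e.g., in vertical layers)
    c    : localization half-width; correlations vanish beyond 2c
    """
    r = np.abs(np.asarray(dist, dtype=float)) / c
    f = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri = r[inner]
    f[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                - (5.0 / 3.0) * ri**2 + 1.0)
    ro = r[outer]
    f[outer] = ((1.0 / 12.0) * ro**5 - 0.5 * ro**4 + 0.625 * ro**3
                + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - (2.0 / 3.0) / ro)
    return f

# Localization is then applied as a Schur (element-wise) product with
# the forecast error covariance, e.g. for n vertical levels:
#   levels = np.arange(n)
#   L = gaspari_cohn(levels[:, None] - levels[None, :], c=3.0)
#   Pf_localized = L * Pf
```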
3) TEMPORAL VARIABILITY OF THE INFORMATION MEASURES
As seen in Figs. 1 and 2, the information measures have peculiar time variability. One
can observe that the information measures have a maximum in the first data assimilation cycle,
and there is a steep decrease in the following data assimilation cycles. After the initial period
(lasting up to five data assimilation cycles), the information measures vary with time in a
seemingly random way. There are, however, two pronounced local maxima in the later cycles, located around cycles 40 and 50 (the exact locations of the maxima vary between different
experiments). Interestingly enough, our experience from other data assimilation applications,
using different forecast models and observations, has also indicated that the information
measures have very similar time variability in the initial data assimilation cycles, while in the
later cycles the variability was dependent on a particular application. This is an indication that
the local maximum in the first cycle is likely a consequence of the initially prescribed forecast
error covariance matrix, which is not dependent on the model state evolution. Conversely, the
local maxima or minima in the later cycles are likely influenced by the evolving model state. In
the following text, we examine whether there is a correlation between the information measures in Figs.
1 and 2 and the model state evolution.
True T, true q, observed T, and observed q are shown, respectively, in Figs. 3a, b, c, and d
as functions of data assimilation cycles and model vertical levels. One can observe rapid, front-
like, time-tilted changes in both temperature and humidity around cycles 40 and 50. Comparison
with Figs. 1 and 2 indicates that the two local maxima in the information measures are also
observed around the same data assimilation cycles. One can also observe correlations between
additional smaller local maxima in Figs. 1 and 2 and rapid changes in Fig. 3, though the rapid changes are more pronounced in the humidity field than in the temperature field. It is, therefore,
evident that the time evolution of the information measures in the later cycles is in agreement
with the true model state time evolution. It is not obvious, however, if the maximum in the
information measures in the first data assimilation cycle is correlated with the true model state
evolution. We will examine this issue further in Part II of this study.
One can also observe in Fig. 3 more variability in the observations than in the
corresponding model generated true fields, especially for the specific humidity field (Figs. 3b
and 3d). This is a manifestation of representativeness error, introduced by randomly perturbing
the model state variables when creating simulated observations. Recall that we have
approximately accounted for the impact of the representativeness error through the empirical
parameter α (Table 1).
Another way to look at the information measures is to compare the time evolution of the
information measures with the time evolution of the errors obtained in the experiments with and
without data assimilation. In the example shown in Fig. 4, we can compare the analysis errors of
the best data assimilation experiment (the full-rank experiment with 80 observations) with the
errors of the experiment without assimilation (no_obs). As the figure indicates, the largest errors
in both T and q of the experiment without data assimilation are associated with the abrupt
changes in the true model state around cycles 40 and 50 and also with the local maxima in the
information measures. The largest errors are reduced by the greatest amount in the analysis,
which indicates a highly efficient use of observed information (e.g., Daley 1991; Wang and
Bishop 2003), and also confirms that the information measures are meaningful.
c. Eigenvalue spectrum of the information matrix
In the previous subsections we have examined information measures defined in terms of a single parameter, such as ds or h. Since the full spectrum of the eigenvalues of C is also available, this spectrum can be evaluated as a more detailed information measure than a single parameter. Here, we examine the eigenvalues of the matrix (I_ens + C)^(-1/2). This particular matrix is chosen because it measures the impact of observations on the analysis error reduction [see the definition of P_a^(1/2) in (2)]. In addition, the eigenvalues of (I_ens + C)^(-1/2) lie on the interval [0,1], which is a convenient property when comparing different experiments. Note that eigenvalues equal to 0 indicate maximum possible information, while eigenvalues equal to 1 indicate no information.
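A sketch of this spectrum calculation, assuming as before that λᵢ² denote the eigenvalues of the ensemble-subspace information matrix C, is:

```python
import numpy as np

def analysis_reduction_spectrum(C):
    """Eigenvalues (1 + lam2_i)**(-1/2) of (I_ens + C)**(-1/2),
    where lam2_i are the eigenvalues of the ensemble-subspace
    information matrix C.

    Values near 0 indicate maximum possible information (strong
    analysis error reduction); values equal to 1 indicate no
    information from the observations.
    """
    lam2 = np.clip(np.linalg.eigvalsh(C), 0.0, None)
    # sort ascending so rank 1 is the most informative direction
    return np.sort((1.0 + lam2) ** -0.5)
```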
The eigenvalues (1 + λ_i^2)^(-1/2) of the matrix (I_ens + C)^(-1/2), calculated in the full-rank experiments with 80 observations and with 40 observations, without localization, are plotted in Fig. 5 as functions of the eigenvalue rank. We focus here on comparing the eigenvalue spectra for the data assimilation cycles with similar values of ds, to examine whether the spectrum could potentially offer additional information. Thus, we have selected cycles 3, 4, 12, and 37 as similar
cycles, according to Figs. 1a and 2b. Let us now compare the eigenvalue spectra in Fig. 5a for
cycles 4 and 12. Note that the two cycles have comparable values of ds (e.g., ds =11.77 in cycle 4
and ds =11.58 in cycle 12 for experiment ds_80ens in Fig. 1a). Fig. 5a indicates that, even for the
values of ds that are reasonably close to each other, one can obtain notably different distributions
of the eigenvalues. For example, the eigenvalue spectrum in cycle 4 is flatter over a larger
portion of the spectral domain, compared to the eigenvalue spectrum in cycle 12. On the other
hand, the eigenvalue spectra of cycles 3 and 4 in Fig. 5a are both flat, and thus are more similar, even though there is a larger difference in the values of ds (e.g., ds = 15.28 in cycle 3 and ds = 11.77 in cycle 4). Thus, the similarity between cycles 3 and 4 is not so much due to similar values of ds as it is due to similar eigenvalue spectra. Our experience is that typically the
first five data assimilation cycles have a flatter spectrum than the later cycles. When eigenvalues
are close together the corresponding eigenvectors become linearly dependent (Golub and van
Loan 1989), which is an indication that the ensemble members are not used effectively in the
initial cycles. This is not surprising, since the adjustment of the prescribed forecast error
covariance is taking place during the initial data assimilation cycles.
By comparing Figs. 5a and 5b, we can observe similar spectra for corresponding data
assimilation cycles with the difference that the experiment with 40 observations produces more
eigenvalues equal to 1 (with no information) than the experiment with 80 observations, reflecting
the fact that 40 observations cannot bring more than 40 pieces of independent information. The
upper limit of 40 pieces of information does not imply that the experiment with 40 observations
should necessarily always have a smaller amount of information than the experiment with 80
observations. Note that the information content depends on the ratio between the forecast error
and the observation error covariance, not only on the number of observations. Finally, we can
conclude that evaluations of the eigenvalue spectrum of the information matrix can provide
additional information not present in the parameters ds and h.
6. Conclusions
In Part I of this two-part study, we have proposed a general framework to link together
information theory and ensemble data assimilation. We have evaluated this framework in
application to the GEOS-5 SCM and simulated observations, employing ARM observations as
forcing. In this part of the study, we have focused on the impact of ensemble size, covariance
localization, and on the temporal evolution of the information measures.
Experimental results indicated that, even though larger ensemble size is desirable for
improved data assimilation results, the essential character of the information measures could still
be captured with a relatively small ensemble size (10 ensemble members in our experiments).
This follows from the fact that the information measures have indicated similar trends of increase
or decrease with time in the experiments with different ensemble sizes. The information matrix
in ensemble subspace can therefore be used in cases when the full-rank information matrix is
impractical to evaluate.
Experimental results also indicated that the temporal evolution of the information
measures is in agreement with the true model state evolution, which is an indication that the
flow-dependent forecast error covariance played a proper role in the definition of the flow-
dependent information matrix.
The impact of covariance localization was found beneficial: it generally improved data
assimilation results and also increased the information content of data, without introducing
unrealistic changes to the temporal evolution of the information measures.
The encouraging results of this study indicate that it is indeed advantageous to have a
unified framework involving information theory and ensemble data assimilation. Availability of the eigenvalue spectrum of the information matrix is an additional benefit, since it can provide more detailed information content measures. Further evaluations of the proposed approach, employing complex
atmospheric models and various observations are still needed, and are planned for the near
future.
The proposed framework is applicable to different data assimilation approaches,
including classical KF and 3D-Var approaches. The impact of different data assimilation
approaches on the information measures is examined in Part II of this study.
Acknowledgements
We thank Chris Snyder and two anonymous reviewers for their comments that helped to
significantly clarify and improve this paper. The first author would also like to thank Graeme
Stephens, Christine Johnson, and Stephane Vannitsem for constructive discussions regarding
information content measures. This research was supported by NASA grants: 621-15-45-78,
NAG5-12105, and NNG04GI25G.
References:
Abramov, R., and A. Majda, 2004: Quantifying uncertainty for non-Gaussian ensembles in
complex systems. SIAM J. Sci. Stat. Comp., 26, 411-447.
Abramov, R., A. Majda and R. Kleeman, 2005: Information theory and predictability for low-
frequency variability. J. Atmos. Sci., 62, 65–87.
Anderson, J. L., 2001: An ensemble adjustment filter for data assimilation. Mon. Wea. Rev., 129,
2884–2903.
Bishop, C. H., B. J. Etherton, and S. Majumdar, 2001: Adaptive sampling with the ensemble
transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436.
Bishop, C. H., and Z. Toth, 1999: Ensemble transformation and adaptive observations. J. Atmos.
Sci., 56, 1748–1765.
Cohn, S. E., 1997: An introduction to estimation theory. J. Meteor. Soc. Japan, 75, 257–288.
Daley, R., 1991: Atmospheric Data Analysis. Cambridge University Press, 457 pp.
Dee, D., 1995: On-line estimation of error covariance parameters for atmospheric data
assimilation. Mon. Wea. Rev., 123, 1128–1145.
DelSole, T., 2004: Predictability and information theory. Part I: Measures of predictability. J.
Atmos. Sci., 61, 2425–2440.
Engelen, R. J., and G. L. Stephens, 2004: Information Content of Infrared Satellite Sounding
Measurements with Respect to CO2. J. Appl. Meteor. 43, 373–378.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using
Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99, (C5), 10143-
10162.
Fisher, M., 2003: Estimation of entropy reduction and degrees of freedom for signal for large
variational analysis systems. ECMWF Tech. Memo. No. 397. 18 pp.
Fletcher, S.J., and M. Zupanski, 2006: A data assimilation method for lognormally distributed
observational errors. Q. J. Roy. Meteor. Soc. (in press).
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three
dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.
Gill, P. E., W. Murray, and M. H. Wright, 1981: Practical Optimization. Academic Press, 401
pp.
Golub, G. H., and C. F. van Loan, 1989: Matrix Computations. 2d ed. The Johns Hopkins
University Press, 642 pp.
Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter/3D-variational analysis
scheme. Mon. Wea. Rev., 128, 2905–2919.
Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background
error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–
2790.
Heemink, A. W., M. Verlaan, and A. J. Segers, 2001: Variance reduced ensemble Kalman
filtering. Mon. Wea. Rev., 129, 1718–1728.
Horn, R. A., and C. R. Johnson, 1985: Matrix Analysis. Cambridge University Press, 561 pp.
Hou, A. Y., S. Q. Zhang, A. da Silva and W. Olson, 2000: Improving assimilated global datasets
using TMI rainfall and columnar moisture observations. J. Climate., 13, 4180–4195.
Hou, A. Y., S. Q, Zhang, A. da Silva, W. Olson, C. Kummerow, and J. Simpson, 2001:
Improving global analysis and short-range forecast using rainfall and moisture
observations derived from TRMM and SSM/I passive microwave sensors. Bull. Amer.
Meteor. Soc., 81, 659–679.
Hou, A. Y., S. Q. Zhang, and O. Reale, 2004: Variational continuous assimilation of TMI and
SSM/I rain rates: Impact on GEOS-3 hurricane analyses and forecasts. Mon. Wea. Rev.,
132, 2094–2109.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter
technique. Mon. Wea. Rev., 126, 796–811.
Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for
atmospheric data assimilation. Mon. Wea. Rev., 129, 123–137.
Jazwinski, A. H., 1970: Stochastic Processes and Filtering Theory. Academic Press, 376 pp.
Johnson, C., 2003: Information content of observations in variational data assimilation. Ph.D.
thesis, Department of Meteorology, University of Reading, 218 pp. [Available from
University of Reading, Whiteknights, P.O. Box 220, Reading, RG6 2AX, United
Kingdom.]
Keppenne, C., 2000: Data assimilation into a primitive-equation model with a parallel ensemble
Kalman filter. Mon. Wea. Rev., 128, 1971–1981.
Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. J. Atmos. Sci.,
59, 2057–2072.
Lermusiaux, P. F. J., and A. R. Robinson, 1999: Data assimilation via error subspace statistical
estimation. Part I: Theory and schemes. Mon. Wea. Rev., 127, 1385–1407.
L’Ecuyer, T. S., P. Gabriel, K. Leesman, S. J. Cooper, and G. L. Stephens, 2006: Objective
assessment of the information content of visible and infrared radiance measurements for
cloud microphysical property retrievals over the global oceans. Part I: Liquid clouds. J.
Appl. Meteor. Climat., 45, 20–41.
Lorenc, A. C., 1986: Analysis methods for numerical weather prediction. Quart. J. Roy. Meteor.
Soc., 112, 1177–1194.
Luenberger, D. L., 1984: Linear and Non-linear Programming. 2d ed. Addison-Wesley, 491 pp.
Menard, R., S. E. Cohn, L.-P. Chang, and P. M. Lyster, 2000: Assimilation of stratospheric
chemical tracer observations using a Kalman filter. Part I: Formulation. Mon. Wea. Rev.,
128, 2654–2671.
Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. Mon. Wea.
Rev., 128, 416–433.
Mitchell, H. L., P. L. Houtekamer, and G. Pellerin, 2002: Ensemble size, balance, and model-
error representation in an ensemble Kalman filter. Mon. Wea. Rev., 130, 2791–2808.
Navon, I. M., X. Zou, J. Derber, and J. Sela, 1992: Variational data assimilation with an
adiabatic version of the NMC spectral model. Mon. Wea. Rev., 120, 1433–1446.
Oczkowski, M., I. Szunyogh, and D. J. Patil, 2005: Mechanism for the development of locally
low-dimensional atmospheric dynamics. J. Atmos. Sci., 62, 1135-1156.
Ott, E., Hunt, B. R., Szunyogh, I., Zimin, A. V., Kostelich, E. J., Corazza, M., Kalnay, E.,
Patil, D. J. and Yorke, J. A. 2004: A local ensemble Kalman filter for atmospheric
data assimilation. Tellus, 56A, 273-277.
Patil, D. J., B. R. Hunt, E. Kalnay, J.A. Yorke, and E. Ott, 2001. Local low dimensionality of
atmospheric dynamics. Phys. Rev. Lett., 86, 5878-5881.
Peters, W., J.B. Miller, J. Whitaker, A.S. Denning, A. Hirsch, M.C. Krol, D. Zupanski, L.
Bruhwiler, and P.P. Tans, 2005: An ensemble data assimilation system to estimate
CO2 surface fluxes from atmospheric trace gas observations. J. Geophys. Res. 110,
D24304, doi:10.1029/2005JD006157.
Purser, R.J., and H.-L. Huang, 1993: Estimating effective data density in a satellite retrieval or an
objective analysis. J. Appl. Meteorol., 32, 1092–1107.
Rabier F., N. Fourrie, C. Djalil, and P. Prunet, 2002: Channel selection methods for Infrared
Atmospheric Sounding Interferometer radiances. Quart. J. Roy. Meteor. Soc., 128, 1011–
1027.
Reichle, R. H., D. B. McLaughlin, D. Entekhabi, 2002a: Hydrologic data assimilation with the
ensemble Kalman filter. Mon. Wea. Rev., 130, 103–114.
Reichle, R.H., J.P. Walker, R.D. Koster, and P.R. Houser, 2002b: Extended versus ensemble
Kalman filtering for land data assimilation. J. Hydrometeor., 3, 728–740.
Rodgers, C. D., 2000: Inverse Methods for Atmospheric Sounding: Theory and Practice. World
Scientific, 238 pp.
Roulston, M., and L. Smith, 2002: Evaluating probabilistic forecasts using information theory.
Mon. Wea. Rev., 130, 1653–1660.
Schneider, T., and S. Griffies, 1999: A conceptual framework for predictability studies. J.
Climate., 12, 3133–3155.
Shannon, C. E., and W. Weaver, 1949: The Mathematical Theory of Communication. University
of Illinois Press, 144 pp.
Szunyogh, I., E. J. Kostelich, G. Gyarmati, D. J. Patil, B. R. Hunt, E. Kalnay, E. Ott, and J. A.
Yorke, 2005: Assessing a local ensemble Kalman filter: Perfect model experiments with
the NCEP global model. Tellus, 57A, 528–545.
Tippett, M., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble
square-root filters. Mon. Wea. Rev., 131, 1485–1490.
van Leeuwen, P. J., 2001: An ensemble smoother with error estimates. Mon. Wea. Rev., 129,
709–728.
Wahba, G., 1985: Design criteria and eigensequence plots for satellite-computed tomography. J.
Atmos. Oceanic Technol., 2, 125–132.
Wahba, G., D. R. Johnson, F. Gao, and J. Gong, 1995: Adaptive tuning of numerical weather
prediction models: Randomized GCV in three- and four-dimensional data assimilation.
Mon. Wea. Rev., 123, 3358–3370.
Wang, X., and C. H. Bishop, 2003: A comparison of breeding and ensemble transform Kalman
filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158.
Wei, M., Z. Toth, R.Wobus, Y. Zhu, C.H. Bishop, and X. Wang, 2006: Ensemble Transform
Kalman Filter-based ensemble perturbations in an operational global prediction system at
NCEP, Tellus, 58A, 28-44.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed
observations. Mon. Wea. Rev., 130, 1913–1924.
Zhang, F., Z. Meng, and A. Aksoy, 2006: Tests of an ensemble Kalman filter for mesoscale and
regional-scale data assimilation. Part I: perfect model experiments. Mon. Wea. Rev., 134,
722–736.
Zhang, F., Snyder, C., and Sun J., 2004: Impacts of initial estimate and observation availability
on convective-scale data assimilation with an ensemble Kalman filter. Mon. Wea. Rev.
132, 1238–1253.
Zupanski D. and M. Zupanski, 2006: Model error estimation employing an ensemble data
assimilation approach. Mon. Wea. Rev., 134, 1337-1354.
Zupanski D., Zupanski, M., DeMaria, M., Grasso L., Hou, A.Y., Zhang, S., and Lindsey, D.,
2005: Ensemble data assimilation and information theory. Extended abstracts of the AMS
21st Conference on Weather Analysis and Forecasting and AMS 17th Conference on
Numerical Weather Prediction, 1–5 August 2005, Washington, D.C., 4pp.
Zupanski, M., 2005: Maximum Likelihood Ensemble Filter: Theoretical Aspects. Mon. Wea.
Rev., 133, 1710–1726.
Zupanski, M., S. J. Fletcher, I. M. Navon, B. Uzunoglu, R. P. Heikes, D. A. Randall, T. D.
Ringler, and D. Daescu, 2006: Initiation of ensemble data assimilation. Tellus,
58A, 159-170.
Table Captions List
Table 1. List of data assimilation experiments discussed in this paper. Nobs indicates the number
of observations per data assimilation cycle. The empirical parameter α, varying with ensemble
size, is employed to approximately account for an unknown representativeness error. In the
experiments with suffix “_loc” localization is applied to the forecast error covariance using a
compactly supported second-order correlation function of Gaspari and Cohn (1999) with
decorrelation length of 3 vertical layers. Note that all experiments with localization employ 40
observations, while the experiments without localization employ either 40 or 80 observations.
Experiment denoted no_obs is an experiment without data assimilation.
Table 2. Total RMS errors of the analysis and the background solution, calculated with respect to
the truth over 70 data assimilation cycles, for the experiments listed in Table 1. The RMS
analysis and background errors are shown for temperature (denoted RMS Ta and RMS Tb) and
for specific humidity (denoted RMS qa and RMS qb). The RMS errors are smallest for the
experiment with Nens=80 and Nobs=80, and are largest for the experiment without data
assimilation (no_obs). The smallest RMS errors are highlighted in bold, and the largest RMS
errors are highlighted in bold italic. Also shown are the mean values and standard deviations of
the chi-square statistic, calculated over 70 data assimilation cycles.
Table 1. List of data assimilation experiments discussed in this paper. Nobs indicates the number
of observations per data assimilation cycle. The empirical parameter α, varying with ensemble
size, is employed to approximately account for an unknown representativeness error. In the
experiments with suffix “_loc” localization is applied to the forecast error covariance using a
compactly supported second-order correlation function of Gaspari and Cohn (1999) with
decorrelation length of 3 vertical layers. Note that all experiments with localization employ 40
observations, while the experiments without localization employ either 40 or 80 observations.
Experiment denoted no_obs is an experiment without data assimilation.
Experiment     Nens (T and q   Nobs (T and q   Rinst^(1/2) for T   Rinst^(1/2) for q in kg/kg   Parameter α   Localization
               estimated)      observed)       in degrees K        (Min; Max errors)
10ens_80obs    10              80              0.2                 6.1*10^-8 ; 7.9*10^-4        2.1           NO
20ens_80obs    20              80              0.2                 6.1*10^-8 ; 7.9*10^-4        1.7           NO
40ens_80obs    40              80              0.2                 6.1*10^-8 ; 7.9*10^-4        1.4           NO
80ens_80obs    80              80              0.2                 6.1*10^-8 ; 7.9*10^-4        1.15          NO
10ens_40obs    10              40              0.2                 6.1*10^-8 ; 7.9*10^-4        2.1           NO
20ens_40obs    20              40              0.2                 6.1*10^-8 ; 7.9*10^-4        1.7           NO
40ens_40obs    40              40              0.2                 6.1*10^-8 ; 7.9*10^-4        1.4           NO
80ens_40obs    80              40              0.2                 6.1*10^-8 ; 7.9*10^-4        1.15          NO
10ens_loc      10              40              0.2                 6.1*10^-8 ; 7.9*10^-4        2.1           YES
20ens_loc      20              40              0.2                 6.1*10^-8 ; 7.9*10^-4        1.7           YES
40ens_loc      40              40              0.2                 6.1*10^-8 ; 7.9*10^-4        1.4           YES
80ens_loc      80              40              0.2                 6.1*10^-8 ; 7.9*10^-4        1.15          YES
no_obs         -               0               -                   -                            -             -
Experiment      RMS Ta   RMS Tb   RMS qa       RMS qb       Chi-square   Chi-square
                (K)      (K)      (kg/kg)      (kg/kg)      (mean)       (stddev)
10ens_80_obs    0.45     0.49     3.77*10^-4   3.97*10^-4   1.11         0.27
20ens_80_obs    0.28     0.35     2.65*10^-4   3.08*10^-4   0.95         0.20
40ens_80_obs    0.23     0.32     2.26*10^-4   2.91*10^-4   0.92         0.15
80ens_80_obs    0.21     0.31     2.04*10^-4   2.57*10^-4   1.06         0.20
10ens_40_obs    0.64     0.68     4.93*10^-4   5.08*10^-4   1.16         0.31
20ens_40_obs    0.54     0.57     4.07*10^-4   4.27*10^-4   1.03         0.31
40ens_40_obs    0.51     0.55     3.74*10^-4   4.14*10^-4   0.84         0.22
80ens_40_obs    0.38     0.40     3.38*10^-4   3.42*10^-4   0.81         0.20
10ens_loc       0.57     0.58     4.35*10^-4   4.51*10^-4   1.21         0.34
20ens_loc       0.63     0.58     3.85*10^-4   3.87*10^-4   1.26         0.38
40ens_loc       0.50     0.49     3.98*10^-4   3.97*10^-4   0.78         0.17
80ens_loc       0.29     0.38     2.52*10^-4   3.23*10^-4   0.59         0.13
no_obs          0.82     0.82     6.56*10^-4   6.56*10^-4   -            -
Table 2. Total RMS errors of the analysis and the background solution, calculated with respect to
the truth over 70 data assimilation cycles, for the experiments listed in Table 1. The RMS
analysis and background errors are shown for temperature (denoted RMS Ta and RMS Tb) and
for specific humidity (denoted RMS qa and RMS qb). The RMS errors are smallest for the
experiment with Nens=80 and Nobs=80, and are largest for the experiment without data
assimilation (no_obs). The smallest RMS errors are highlighted in bold, and the largest RMS
errors are highlighted in bold italic. Also shown are the mean values and standard deviations of
the chi-square statistic, calculated over 70 data assimilation cycles.
Figure Captions List
Fig. 1. (a) Degrees of Freedom (DOF) for signal, denoted ds, and (b) entropy reduction, denoted
h, obtained in the experiments with 80 observations (first four experiments in Table 2), shown as
functions of data assimilation cycles. Note that both information measures have similar time
variability; however, the amplitude of variability of h is larger. By definition, ds cannot exceed
ensemble size; conversely, h can be larger than the ensemble size.
Fig. 2. DOF for signal, obtained in the experiments employing 40 observations, with and without
localization, are plotted as functions of data assimilation cycles. The results with 10 ensemble
members are given in (a), and with 80 ensemble members in (b).
Fig. 3. (a) True temperature, (b) true specific humidity, (c) observed temperature, and (d)
observed specific humidity, shown as functions of data assimilation cycles and model vertical
levels. Observations defined at each grid point (80 observations) are used in Fig. 3c,d. Units for
temperature are degrees K, and for specific humidity g/kg. Note rapid time-tilted changes in both
temperature and humidity around cycles 40 and 50.
Fig. 4. Analysis errors, calculated with respect to the truth, are shown as functions of data
assimilation cycles and model vertical levels. The results from the best data assimilation
experiment in Table 2 (80_ens_80_obs) are shown in (a) for temperature (in degrees K) and in
(b) for specific humidity (in g/kg). The corresponding errors of the experiment without data
assimilation (no_obs) are given in (c) and (d). The numbers in the upper right corners are total
RMS errors from Table 2.
Fig. 5. Eigenvalue spectrum of (I_ens + C)^(-1/2), calculated in the experiments with 80 ensemble members, without localization, using in (a) 80 observations and in (b) 40 observations. The eigenvalues are shown as functions of the eigenvalue rank. Eigenvalues equal to 0 indicate maximum information, and eigenvalues equal to 1 indicate no information.
Fig. 1. (a) Degrees of Freedom (DOF) for signal, denoted ds, and (b) entropy reduction, denoted
h, obtained in the experiments with 80 observations (first four experiments in Table 2), shown as
functions of data assimilation cycles. Note that both information measures have similar time
variability; however, the amplitude of variability of h is larger. By definition, ds cannot exceed
ensemble size; conversely, h can be larger than the ensemble size.
Fig. 2. DOF for signal, obtained in the experiments employing 40 observations, with and without
localization, are plotted as functions of data assimilation cycles. The results with 10 ensemble
members are given in (a), and with 80 ensemble members in (b).
Fig. 3. (a) True temperature, (b) true specific humidity, (c) observed temperature, and (d)
observed specific humidity, shown as functions of data assimilation cycles and model vertical
levels. Observations defined at each grid point (80 observations) are used in Fig. 3c,d. Units for
temperature are degrees K, and for specific humidity g/kg. Note rapid time-tilted changes in both
temperature and humidity around cycles 40 and 50.
Fig. 4. Analysis errors, calculated with respect to the truth, are shown as functions of data
assimilation cycles and model vertical levels. The results from the best data assimilation
experiment in Table 2 (80_ens_80_obs) are shown in (a) for temperature (in degrees K) and in
(b) for specific humidity (in g/kg). The corresponding errors of the experiment without data
assimilation (no_obs) are given in (c) and (d). The numbers in the upper right corners are total
RMS errors from Table 2.
Fig. 5. Eigenvalue spectrum of (I_ens + C)^(-1/2), calculated in the experiments with 80 ensemble members, without localization, using in (a) 80 observations and in (b) 40 observations. The eigenvalues are shown as functions of the eigenvalue rank. Eigenvalues equal to 0 indicate maximum information, and eigenvalues equal to 1 indicate no information.