Diagnostics and optimization of analysis by cross-validation · Richard Ménard and Martin...
Transcript of Diagnostics and optimization of analysis by cross-validation · Richard Ménard and Martin...
Diagnostics and optimization of analysis by cross-validation
Richard Ménard and Martin Deshaies-Jacques
Air Quality Research Division – Environment and Climate Change Canada
Environment and Climate Change Canada
Air Quality Research Division
Abstract. We examine how observations can be used to evaluate
an air quality analysis by verifying against passive observations
(i.e. cross-validation) that are not used to create the analysis and
we compare these verifications to those made against the same set
of (active) observations that were used to generate the analysis.
The results show that both active and passive observations can be
used to evaluate of first moment metrics (e.g. bias) but only
passive observations are useful to evaluate second moment metrics
such as variance of observed-minus-analysis and correlation
between observations and analysis. We derive a set of diagnostics
based on passive observation–minus-analysis residuals and we
show that the true analysis error variance can be estimated,
without relying on any statistical optimality assumption. This
diagnostic is used to obtain near optimal analyses that are then
used to evaluate the analysis error using several different methods.
We compare the estimates according to the method of
Hollingsworth Lonnberg, Desroziers, a diagnostic we introduce,
and the perceived analysis error computed from the analysis
scheme, to conclude that as long as the analysis is optimal, all
estimates agrees within a certain error margin. The analysis error
variance at passive observation sites is also obtained.
Cross-validation design
• Divide randomly the observations into 3 equal number sets
• Perform an analysis using 2 sets, and use the third one to validate the
analysis. Do 3 permutations of these sets, so that all observations
sites are used to validate the analysis
analysis with set 2 & 3 analysis with set 1 & 3 analysis with set 1 & 2
Surface aerosols – fine particle (PM2.5)
Modified Normalized Mean Bias
verification against excluded set – passive observations ● mean of the three verification subset ▪ verification against same observations – active observations
Note: with respect to passive observation minimum in variance with respect to active observation variance is monotonically deceasing with
22 / bo
Theory Assuming that observations errors are spatially uncorrelated and
uncorrelated with background errors then the passive observation
errors are uncorrelated with the cross—validation analysis error
Hilbert space representation of random variables
])][])([[(:, yyxxyx EEE
0)(),( TBTO
0)(,)( TBTO c
active obs error uncorrelated with background error
passive obs error uncorrelated with background error
0)(,)( TOTO c passive obs error uncorrelated with active obs error
0)(,)(planeanalysis)(thus ccc TATOTO
Tccccc AOAO AHHR ])()[(E
is true whether the analysis is optimal or not [3]
A is the analysis error covariance, Hc is the operator for the passive obs
Tcc
Tcccc AOAO HAHAHH ˆ}])()[( {min E
searching for the minimum gives the optimal analysis
In the case of optimal analysis There are a number of methods to estimate the analysis error covariance
using active observations
• Hollingsworth-Lönnberg [1]
since ΔOAT is a right triangle
• This study
since ΔBAT is a right triangle
• Desroziers [2]
since ΔOTA is similar to ΔTBA
• Analysis error covariance calculated by the analysis scheme –
perceived analysis error
analysis error variance for PM2.5
using passive observations
• This study
since ΔOcAB is also a right triangle
• General result in cross-validation space [3]
ˆ])ˆ)(ˆ[( THL
TAOAO HAHR E
ˆ])ˆ)(ˆ[( TMD
TBABA HAHHBHE
ˆ])ˆ)(ˆ[( TD
TBAAO HAHE
TTTTTP HBHRHBHHBHHBHHAH
~)
~~(
~~ ˆ 1
ˆ])ˆ()ˆ[( TcMDccc
Tcc BABA HAHBHH E
Tccccc AOAO HAHR ˆ}])()[( {min E
Experiment design
• Hourly analyses for PM2.5 and O3 for a period of 60 days
(June 14 to August 12, 2014)
Step 1 - First guess experiment
• Maximum likelihood estimation of Lc using second-order
autoregressive correlation model and error variances
obtained from local Hollingsworth-Lönnberg fit [4]
• Analysis performed over the same period of 60 days
o Using all observation “active”
o 3 cross-validation analyses using 2 subset out of the 3
with permutations
• Conduct a series of analysis with prescribed uniform error
variances with different ratio but such that
we have an innovation variance
consistency
Step 2 - Optimization
• Obtain the optimal by minimizing
that is minimizing the analysis error variance
• Reevaluate Lc using maximum-likelihood using the new
error variances
Step 3 - Optimal analysis experiment
• Re-conduct an optimization for
• As in Step 1 perform a series of analysis
22)]var[( boBO
22 / bo
22 /ˆbo
])var[( cAO
22 /ˆbo
Results of first guess experiment (iter0) and optimization (iter1)
Input error statistics
Estimation of analysis error variance at the passive observation sites
Estimation of active analysis error variance
Remark: Sensitivity analysis to lack of innovation covariance consistency of the different diagnostics shows
that the MD and D estimates are the least sensitive and the HL is the most sensitive
Experiment cL (km) 2)( BO 22 ˆ/ˆˆbo
2ˆo 2ˆ
b p/2
O3 iter 0 124 101.25 0.22 18.3 83 2.23
O3 iter 1 45 101.25 0.25 20.2 81 1.36
PM2.5 iter 0 196 93.93 0.17 13.6 80.3 2.04
PM2.5 iter 1 86 93.93 0.22 16.9 77 1.25
1
Experiment Active
)ˆ( TMDdiag HAH
Active
)ˆ( TDdiag HAH
Active
)ˆ( THLdiag HAH
Active
)ˆ( TPdiag HAH
O3 iter 0 22.69 9.61 -6.03 5.77
O3 iter 1 13.32 13.68 8.94 11.60
PM2.5 iter 0 17.98 7.71 -3.18 4.37
PM2.5 iter 1 10.68 9.51 7.33 8.21
Experiment Passive
)ˆ( TcMDcdiag HAH
Passive 2])ˆvar[( occAO
O3 iter 0 26.03 32.72
O3 iter 1 28.95 28.75
PM2.5 iter 0 22.65 24.49
PM2.5 iter 1 24.62 21.38
Summary and Conclusions
• We examine how passive observations can be used to evaluate analyses, estimate the analysis error
variance and optimize the analysis
• Assuming that observation errors are horizontally uncorrelated gives the following central diagnostic [3]
which is valid whether or not the analysis is optimal.
• By minimizing the central diagnostic we minimize the analysis error variance thus proving a means to
optimize the analysis
• Optimization of the ratio of observation error variance over background error variance, and re-
evaluating the correlation length give near optimal analyses with chi-square values closer to one
• We have introduce a new diagnostic for analysis error that can be applied in both active and passive
observation space
• We have compared different diagnostic of analysis error variance in active observation space; the
Hollingworth-Lonnberg [1], Desroziers [2], our diagnostic, and the computed analysis error provided
by the analysis scheme and showed that they roughly agrees for near optimal analyses. Strong
disagreement between the estimates is found when the analysis is not optimal.
• We have compared different diagnostic of analysis error variance in passive observation space; our
diagnostic and the central diagnostic and showed strong agreement in the esitmates when the analysis
is optimal
• The method introduced here is general and could be used in other geophysical applications and in
particular in surface analyses
Study has been submitted to Atmosphere Special Issue:Air Quality Forecasting and Monitoring
References 1. Hollingsworth, A. and P. Lönnberg. The verification of objective analyses: Diagnostics of analysis system performance. Meteorol. Atmos. Phys. 1989
2. Desroziers, G., L. Berre, B. Chapnik, and P. Poli. Diagnosis of observation-, background-, and analysis-error statistics in observation space. Q. J. Roy.
Meteorol. Soc. 2005, 131, 3385-3396
3. Marseille, G.-J., J. Barkmeijer, S. de Haan, and W. Verkley. Assessment and tuning of data assimilation systems using passive observations. Q. J. R.
Meteorol. Soc., 2016, 142, 3001-3014, DOI:10.1002/qj.2882.
4. Ménard, R., M. Deshaies-Jacques, and N. Gasset. A comparison of correlation-length estimation methods for the objective analysis of surface pollutants at
Environment and Climate Change Canada. Journal of the Air & Waste Management Association. 2016, 66:9, 874-895
Tccccc AOAO AHHR ])()[(E
ˆ])ˆ)(ˆ[( TMD
TBABA HAHHBHE
ˆ])ˆ()ˆ[( TcMDccc
Tcc BABA HAHBHH E