
A Theoretical Analysis of Model Simplification

John Doherty and Catherine Moore

March, 2017


Contents

1. Introduction
2. Using Models to Make Decisions
   2.1 Risk Analysis
   2.2 Model Predictive Uncertainty
   2.3 Complex Models and Simple Models
3. History-Matching
   3.1 Introduction
   3.2 Linearized Bayes Equation
   3.3 Model Calibration
      3.3.1 Some Concepts
      3.3.2 Regularisation
   3.4 Calibration through Singular Value Decomposition
      3.4.1 Preconditions
      3.4.2 Singular Value Decomposition
      3.4.3 Quantification of Parameter and Predictive Error
      3.4.4 Model-to-Measurement Misfit
4. Accommodating Model Defects
   4.1 Introduction
   4.2 Mathematical Formulation
      4.2.1 The Problem Defining Equation
      4.2.2 Parameter Estimation
      4.2.3 Model-to-Measurement Fit
      4.2.4 Predictive Error
   4.3 A Pictorial Representation
   4.4 Regularisation and Simplification
   4.5 Non-Linear Analysis
5. Repercussions for Construction, Calibration and Deployment of Simple Models
   5.1 Introduction
   5.2 Benefits and Drawbacks of Complex Models
   5.3 Notation
   5.4 Prediction-Specific Modelling
   5.5 Playing to a Model’s Strengths
   5.6 Dispensing with the Need for Model Calibration
   5.7 Tuning the Calibration Dataset to Match the Prediction
   5.8 Model Optimality and Goodness of Fit
   5.9 Detecting Non-Optimality of Simplification
6. Avoiding Underestimation of Predictive Uncertainty when Using Simple Models
   6.1 Introduction
   6.2 Solution Space Dependent Predictions
   6.3 Null Space Dependent Predictions
   6.4 Predictions which are Dependent on Both Spaces
   6.5 Direct Hypothesis-Testing
7. Joint Usage of a Simple and Complex Model
   7.1 Introduction
   7.2 Linear Analysis
   7.3 Predictive Scatterplots
   7.4 Surrogate and Proxy Models
   7.5 Suggested Improvements to Paired Simple/Complex Model Usage
      7.5.1 General
      7.5.2 Option 1: Using the Simple Model for Derivatives Calculation
      7.5.3 Option 2: Modifications to Accommodate Complex Model Numerical Problems
      7.5.4 Option 3: Direct Adjustment of Random Parameter Fields
8. Conclusions
9. References


1. Introduction

The discussion presented herein is focussed on modelling in the context of decision-support. First it briefly examines the role that numerical modelling should play in this process, and then provides some metrics for judging its success or otherwise in supporting it. Having defined these metrics, the discussion then turns to the level of complexity that a model must possess in order to meet them. In doing so, it attempts to provide an intellectual structure through which an appropriate level of model complexity can be selected for use in a particular decision-making context, at the same time as it attempts to define how models can be simplified in a way that does not erode the benefits that modelling can bring to the decision-making process. It also examines whether a modeller actually needs to make a choice between “simple” and “complex”, or whether both types of model can be used in partnership in order to gain access to the benefits of both while avoiding the detriments of either.

To the authors’ knowledge, an analysis of the type presented herein has not previously been undertaken. Because of this, there is a tendency for models that are built to support environmental decision-making to be more complex than they need to be. Rarely is this bias toward complexity the outcome of deliberations that conclude that a high level of complexity is warranted in a particular decision-support context. More often than not, a modeller’s bias towards complexity arises from a fear that his/her work will be criticised by reviewers for omitting the complexity required for his/her model to be considered as a faithful simulator of environmental processes at a particular study site.

In fact, no model is a faithful simulator of environmental processes, regardless of its complexity. All models are gross simplifications of the myriad environmental processes that take place within their domains at every scale. In spite of this, they can still form a solid basis for environmental decision-support. It is argued herein that this support is not an outcome of their ability to simulate what will happen if management of a system changes, for no model can do this. Instead, it is an outcome of the fact that models are unique in their ability to provide receptacles for two types of information, namely that arising from expert knowledge and direct measurements of system properties on the one hand, and that which is resident in the historical behaviour of an environmental system on the other. These receptacles are far from perfect, and may indeed be improved with more faithful reproduction of system dynamics. However, it will be argued herein that while some extra receptacles can be opened with increased model complexity, others are simultaneously closed.

Conceptualization of models as receptacles for information implies that there is an optimal level of complexity that is appropriate to any particular decision-making context. Furthermore, this level is “fluid” in the sense that it may change with advances in the design of simulation and inversion algorithms, and with advances in computing technology. At the same time, it may depend on the size and talent of the human resources on which an institution can draw for model construction and deployment. Hence the decision to adopt a certain level of model complexity must be made anew for each new decision context, and possibly for each decision that pertains to that context. Moreover, it may need to undergo periodic revision as the modelling process progresses.

Arguments for and against complexity should take all necessary factors into account if they are to support decisions on how models can best support the decision-making process. This requires a clear conceptualization of what complexity can provide and what simplicity can provide in any decision-making context. The purpose of the present document is to provide these conceptualizations.



2. Using Models to Make Decisions

2.1 Risk Analysis

In a seminal paper on the role of models in decision-support, Freeze et al (1990) characterize this role as quantifying the level of risk associated with a particular course of management action. They argue that for any of the choices that face a decision-maker, an objective function Φ can be roughly calculated as follows:

Φ = B – C – R (2.1.1)

In this equation B represents the benefits accruing from this particular choice, C represents the costs associated with this choice, while R represents the risk of failure. R can be loosely defined as the probability of failure times the cost of failure. Freeze et al argue that for any particular management option B and C are known. It is the role of modelling to evaluate R as it pertains to different management choices. The option with the highest objective function represents the best course of action.
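As a hedged numerical illustration of equation 2.1.1 (all benefit, cost and failure figures below are invented), the following sketch computes Φ for two hypothetical management options, with R taken as the probability of failure times the cost of failure.

```python
# Illustrative only: hypothetical figures for two management options.
# R is computed as probability-of-failure times cost-of-failure (equation 2.1.1).

options = {
    "option_A": {"benefit": 10.0e6, "cost": 4.0e6, "p_failure": 0.05, "failure_cost": 20.0e6},
    "option_B": {"benefit": 12.0e6, "cost": 5.0e6, "p_failure": 0.20, "failure_cost": 20.0e6},
}

for name, o in options.items():
    risk = o["p_failure"] * o["failure_cost"]   # R
    phi = o["benefit"] - o["cost"] - risk        # Phi = B - C - R
    print(f"{name}: Phi = {phi:,.0f}")

# The option with the highest Phi would be preferred; the modelling task,
# in this framing, is to supply a defensible value of p_failure.
```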

It follows immediately that if modelling is to properly serve the decision-making process, its outputs must be probabilistic. As will be discussed below, the very nature of environmental modelling is such that expectations of anything other than probabilistic outputs are inconsistent with what modelling can provide.

The concepts embodied in equation 2.1.1 can be expressed as follows. Associated with any management course of action is the possibility that an unwanted event, or “bad thing” may happen. In the water management context this “bad thing” may, for example, be the occurrence of unduly high or unduly low groundwater levels at a particular location at a particular time, durations of lower-than-threshold stream flows that exceed those required for maintenance of stream health, or concentrations of contaminants that exceed human or biotic health limits. Any of these may constitute failure of a selected environmental management plan. Tolerance of failure is related to the cost of failure. If the cost is relatively low, then a decision-maker can tolerate a moderate possibility of failure if this reduces implementation costs for a particular management option. On the other hand, if the price of failure is high, the probability of its occurrence must be low for a management option to be acceptable.

Seen in another light, failure of a particular management plan constitutes a type of “null hypothesis”. The purpose of environmental modelling is to test that hypothesis. Ideally, for a particular management option to be viable, the modelling process must demonstrate that the failure hypothesis can be rejected at a level of confidence that is appropriate to the cost associated with such failure.

This concept of the role of modelling in decision-making allows us to define failure of the modelling process as it supports the decision-making process. Suppose that a modeller concludes that the probability of occurrence of a bad thing or unwanted event is low. Suppose, however, that the actual probability of occurrence of that event is greater than that quantified by the modeller. This constitutes a so-called type II statistical error, this being defined as false rejection of a hypothesis. In the present discussion we declare the occurrence of such an error to constitute failure of the modelling process, for that process has provided an overly optimistic view of the outcomes of a particular management option.



It can be argued that the occurrence of a type I statistical error should also be construed as failure of a modelling exercise. A type I error occurs through failure to reject a hypothesis that is, in fact, unlikely. It is argued herein that this does not provide a useful definition of modelling failure. As will become apparent later in this document where model simplification is considered, an inevitable cost of simplification is (or should be) that model-calculated predictive uncertainty margins may be broader than would have been calculated by a more complex model. However, as will also be discussed, excessive run times and numerical instability associated with use of a complex model may preclude quantification of predictive uncertainty at all. The possible occurrence of a type I statistical error is thus a price that may need to be paid for quantification of uncertainty. Of course, if quantified uncertainty bounds are too broad, then the benefits of modelling are lost, for hypotheses of bad things happening can never be rejected. While we do not classify this as failure herein, we do characterize it as “unhelpful”.

2.2 Model Predictive Uncertainty

Conceptually, the uncertainties associated with predictions made by an environmental model can be quantified using Bayes equation. For the sake of brevity we assume that model predictive uncertainty is an outcome of model parametric uncertainty. “Parameters” in this context can include system properties that are represented in the model, as well as system stresses and boundary conditions. In fact they can include any aspect of a model’s construction and deployment of which a modeller is not entirely certain. The analysis could be extended to include conceptual uncertainties that underpin design of a model; however, though important, these are not addressed until later in this document where the concept of model defects is introduced.

Let the vector k denote parameters employed by a model. At this stage we assume that the model is complex enough to represent all aspects of an environmental system that are salient to a prediction. The vector k may therefore possess many elements – far more than can be estimated uniquely. Let P(k) characterize the prior probability distribution of k. As such, P(k) expresses expert knowledge as it pertains to system properties; it also reflects direct measurements of those properties that may have been made at a number of locations throughout the domain of the system of interest.

Let the vector h represent measurements of system state that collectively comprise a “calibration dataset” for a particular model. (The meaning of the term “calibration” will be addressed later in this document.) Prior estimates of k must be “conditioned” by these measurements if model-generated counterparts to these measurements are to replicate them to a level that is commensurate with measurement noise. That is to say, the range of possibilities for k that is expressed by P(k) must be narrowed in order for the model to be capable of reproducing historical measurements of system state (i.e. h) when supplied with historical stresses. The outcome of this conditioning process is the so-called posterior probability distribution of k, denoted herein as P(k|h) (i.e. the probability distribution of k as conditioned by h). The relationship between the prior and posterior parameter probability distributions is expressed by Bayes equation, that is

P(k|h) ∝ P(h|k)P(k) (2.2.1)

In equation 2.2.1, P(h|k) is called the “likelihood function”. It increases with the extent to which model outputs approach measurements of system state. Thus parameter sets which give rise to better reproduction by the model of historical system behaviour are more likely to be representative of reality than those which do not.

Predictions of the future state of a system made by a model are also dependent on the parameter set k that is used to make them. Naturally, predictions grow in probability as the parameters that the model employs to make these predictions themselves grow in probability. Notionally, the posterior probability distribution of a prediction can be constructed by sampling the posterior probability distribution of parameters and making a model run using each such sample. Sampling of the posterior parameter distribution can be effected using methodologies such as Markov chain Monte Carlo. However, while high sampling efficiencies can be achieved using Markov chain Monte Carlo where parameters are few in number, this is not the case where parameters number in the hundreds or thousands; sampling efficiencies then become low. Furthermore, where a model takes a long time to run, the computational burden of sampling P(k|h) becomes impossibly high. Fortunately, however, alternative, albeit approximate, methods for sampling the posterior parameter probability distribution are available for use in highly parameterized contexts where model run times are long. This is further discussed below.
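As a concrete, if deliberately trivial, illustration of this sampling idea (the one-parameter “model”, its prior, and all numbers below are invented for the purpose), a random-walk Metropolis sampler draws from P(k|h), and the prediction is then evaluated for each retained sample to build its posterior probability distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": the observation equals 2*k under calibration conditions;
# the prediction of interest equals 5*k. Hypothetical numbers throughout.
def sim_obs(k):  return 2.0 * k
def sim_pred(k): return 5.0 * k

h_obs, sigma_eps = 3.1, 0.2          # single measurement and its noise
k_prior_mean, k_prior_sd = 1.0, 1.0  # prior P(k)

def log_post(k):
    log_prior = -0.5 * ((k - k_prior_mean) / k_prior_sd) ** 2
    log_like  = -0.5 * ((h_obs - sim_obs(k)) / sigma_eps) ** 2
    return log_prior + log_like

# Random-walk Metropolis sampling of P(k|h)
k, samples = k_prior_mean, []
for _ in range(20000):
    k_new = k + rng.normal(scale=0.2)
    if np.log(rng.uniform()) < log_post(k_new) - log_post(k):
        k = k_new
    samples.append(k)

# Discard burn-in, then run the "model" once per retained sample
preds = sim_pred(np.array(samples[5000:]))
print(f"posterior prediction: mean={preds.mean():.2f}, sd={preds.std():.2f}")
```

For a real model, each evaluation of the likelihood requires a full model run, which is why this strategy becomes impractical when run times are long and parameters are many.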

While the two terms on the right of equation 2.2.1 denote probability distributions, conceptually they can also be viewed as receptacles for information. P(k) expresses information contained in expert knowledge, including the inherently probabilistic nature of such knowledge as it pertains to complex environmental systems. On the other hand, P(h|k) expresses information contained in historical measurements of system state. Where a model employs parameters that are, on the one hand, informed by expert knowledge and, on the other hand, induce pertinent model-calculated quantities to replicate historical observations of system state, it can be considered to be a repository for these two types of information. As will be further discussed below, this does not mean that its parameters are unique; in fact, parameter nonuniqueness is expressed by the fact that the history-matching process yields a probability distribution, denoted in equation 2.2.1 as P(k|h). What it does mean is that, in respecting this posterior parameter probability distribution, the model provides receptacles for the two types of information that are required for specification of the posterior probability distribution of a prediction of management interest.

The concept of a model as providing receptacles for information can be extended to better understand the role of models in the decision-making process. Recall from section 2.1 that a model supports the decision-making process by providing the ability to test whether the hypothesis that a particular bad thing will occur can be rejected (at a certain level of confidence). Rejection of the “bad thing hypothesis” can be made if the occurrence of the bad thing is demonstrably incompatible with expert knowledge of system properties and processes, with the historical behaviour of the system, or with both. It is the task of the model to hold the information through which the management-related hypothesis can be tested, and possibly rejected. Conceptually, the hypothesis can be rejected if it is not possible to find a reasonable (in terms of expert knowledge) set of parameters that allows the model to fit the calibration dataset. Methods are available through which a model can be deployed directly in this hypothesis-testing capacity; see Moore et al (2010), Doherty (2015) and section 6.5 of this document.

The idea that an environmental model is more usefully viewed as a repository for information than as a simulator of environmental processes is eloquently expressed by Kitanidis (2015). This is not to say that a model’s ability to simulate environmental processes is of no use. Rather it expresses the concept that a model’s simulation capabilities should serve its primary role as a repository for decision-critical information rather than the other way round. Hence where a model that is built for use in the decision-making process is reviewed by a modeller’s peers, it is its ability to hold and use information that is salient to available decision options that should be most carefully scrutinized, and not its (imagined) ability to “realistically” replicate environmental processes.



2.3 Complex Models and Simple Models

The above discussion has attempted to define the role that models should play in the decision-making process. At the same time it has presented a metric for failure of a modelling exercise when conducted in support of that process. This provides an intellectual structure for choosing modelling options. The choice of what type of model to build in a particular decision context, and how that model should be deployed to support the decision-making process, should be accompanied by a guarantee that modelling will not fail in attempting to play this role, using the metric for failure provided above.

From a Bayesian perspective, the task of modelling in decision-support can be viewed as quantifying the uncertainties of predictions of management interest, while reducing those uncertainties as much as possible through giving voice to all information that pertains to the system under study. This information comprises expert knowledge (including direct measurements of system properties), as well as the historical behaviour of the system. However, because it is more useful in exploring concepts related to model simplification, the present document adopts a frequentist perspective, in which the role of the modelling process is depicted as testing the hypothesis that an unwanted event will occur following implementation of an environmental management strategy that may precipitate that event. The modelling process then attempts to reject the hypothesis of its occurrence through demonstrable incompatibility of the event with the information for which a model provides receptacles.

Conceptually, the more complex a model is, the more receptacles it can provide for information. Presumably a complex model is endowed with many parameters. It can thus express the heterogeneity of system properties that govern the processes that are operative within that system. Expressions of heterogeneity are expressions of expert knowledge. Furthermore, because a complex model can represent the hydraulic properties of a spatial system at a scale that is commensurate with field measurements of those properties, these measurements can be used to constrain the expression of hydraulic property detail in the model at field measurement locations. However, it is important to realize that while complex models have the ability to represent system detail, they must recognize the probabilistic nature of such detail, and the fact that direct measurements of system properties can condition that detail at only a discrete number of points. In general, the greater the detail that is expressed by a model, the greater is the uncertainty associated with that detail. Stochasticity (normally involving a large number of model runs) thus becomes integral to expressing the expert knowledge of which a complex model is the repository.

Because complex models can be endowed with many parameters, adjustment of these parameters should promote a good fit between model outputs and the measurements of system state which comprise a calibration dataset. In theory, complex models can therefore provide receptacles for the information contained in these measurements. However, the transfer of this information to the model requires that a satisfactory (in terms of measurement noise) level of fit be attained between pertinent model outputs and field measurements. In practice, this requires use of the model in conjunction with software such as PEST (Doherty, 2016) that is capable of implementing highly parameterized inversion. This, in turn, requires that the model be run many times, and that the model’s numerical performance be good. Unfortunately, complex models are often burdened with long run times; moreover, their numerical performance is often questionable. History-matching of a complex model to produce a parameter field of minimum error variance which can be deemed to “calibrate” the model can therefore be an extremely difficult undertaking. Generating many other parameter fields which also satisfy calibration constraints, in an attempt to sample the posterior parameter probability distribution of the model, can be an impossibly difficult undertaking.



From the above considerations it is apparent that it is difficult, if not impossible, for a complex model to live up to its decision-support potential. Even in modelling circumstances where complexity is embraced because of the possibility that it offers to represent detail, rarely is the information content of expert knowledge given proper expression through using the model in a stochastic framework.

Simple models generally employ fewer parameters than complex models. Generally these parameters express abstractions of system properties such as large zones of assumed piecewise constancy in a groundwater model, or lumped storage elements in a land use or surface water model. Expert knowledge can be difficult to apply to such parameters. Hence the ability of a simple model to provide receptacles for expert knowledge is limited.

On the other hand, simple models often run fast and are numerically stable. If they are endowed with enough parameters, it may be possible to provide values for these parameters which support a good fit between pertinent model outputs and historical measurements of system state. Furthermore, achievement of this fit can often be accomplished very quickly using inversion software such as PEST. Simple models therefore constitute good receptacles for information contained in historical measurements of system state. As will be discussed later in this document, for some predictions this is all the information that a model needs to carry. However where a prediction is partly sensitive to combinations of parameters which occupy the calibration null space (see below), the parameters which comprise these combinations must be represented in the model, even if they cannot be uniquely estimated, if the uncertainty of the prediction is to be properly quantified. Furthermore, the simple model’s parameters must be capable of adjustment through a range of values that is compatible with their prior uncertainties as posterior predictive uncertainty is explored. A problem with many simple models is that these parameters may not be represented at all in the model, as they are not required to achieve a good fit with the calibration dataset. Furthermore, even if they are represented, their prior uncertainties may be difficult to establish because of the abstract nature of parameters that the simple model employs.

The above discussion attempts to illuminate some fundamental differences between complex and simple models, as well as the ramifications of these differences for model-based decision-making. However the discussion omits some important nuances which will be covered later in this document. The choice of appropriate complexity will always be site-specific, and indeed prediction-specific. Furthermore “appropriate” must be judged in the context of a given modelling budget. As a practitioner of the “art of modelling”, the modeller must choose a level of complexity that is best tuned to his/her decision context and to his/her modelling budget. In doing so he/she must guarantee that his/her choice is accompanied by model specifications that prevent occurrence of a type II statistical error in which the hypothesis of a bad thing happening is wrongly rejected.

In some decision contexts it may be possible to build a model that is complex enough to express expert knowledge yet simple enough to be employed with inversion software that enables a good fit between model outputs and field data to be obtained. In another context, a modeller may decide to build a simple model; however, the model may be too simple to support a parameterization scheme that promotes a good fit between model outputs and historical measurements of system state. The modeller may then increase the parameterization complexity of the model in order to achieve such a fit. However, as will be shown in following chapters, even if a good fit between model outputs and field measurements can be obtained, the calibration process may induce bias in some predictions. The attainment of a “well calibrated model” may therefore actually compromise the utility of the model in that particular decision context.



It is an inconvenient truth that, for all but the simplest environmental systems, it is difficult with current modelling technology to build a model which simultaneously provides receptacles for the two information types that are expressed by the two terms on the right hand side of Bayes equation. A modeller must therefore ask him/herself which type of information should be better expressed by his/her model, given his/her current decision-making context. Where a prediction is sensitive to parameters (or parameter combinations) whose uncertainties cannot be reduced much through history-matching, it is more important that the model provide receptacles for expert knowledge than for information contained in measurements of system state. In contrast, where a prediction is sensitive to parameters whose uncertainties can be significantly reduced through history-matching, the model must run fast enough, and be numerically stable enough, to be used in concert with inversion software which provides a good fit between model outputs and historical measurements of system state; the uncertainty of that prediction can consequently be reduced.

As will be discussed in following chapters, the choice of modelling approach becomes most difficult where a prediction of management interest is partially sensitive to parameters whose uncertainties can be reduced through history-matching, and partially sensitive to parameters whose uncertainties cannot be thus reduced. Unfortunately many predictions of management interest fall into this category. This is because the modelling imperative often arises from a proposal that management of a system be altered from its historical state. The system will therefore be exposed to stresses to which it has not hitherto been exposed. Historical measurements of system state may not inform all (combinations of) parameters to which model predictions of future system behaviour are sensitive; however they may inform some of these (combinations of) parameters. In these circumstances, model design for decision-support becomes very difficult. These difficulties are compounded by the fact that the requirements of a simple model in this context are more stringent than in contexts where predictions are very similar in nature to the measurements that comprise a calibration dataset. As will be shown, while a simple model may indeed support a sufficient number of parameters for a good fit to be obtained between model outputs and field data, the information which is thereby transferred from these measurements to the model’s parameters may be placed into receptacles that are imperfect and distorted. The repercussions for some predictions may be small; the repercussions for other predictions may be dire.

Theory through which an understanding of model simplification can be gained is presented in the following chapters of this document.



3. History-Matching

3.1 Introduction

This chapter, and the following chapter, introduce theory which, it is hoped, provides insights into both history-matching and model simplification. These are seen to be closely related. The theory is descriptive rather than exact, as it is based on an assumption of model linearity, this implying that the relationship between a model’s outputs and its parameters can be described by a matrix. Most models are, of course, nonlinear. Nevertheless, their behaviour can be considered to be “locally linear” as long as parameters are not varied too much from specified starting values. Of more importance to the present context, however, is the fact that linear analysis supports the use of subspace concepts in examining the roles played by model calibration and model simplification. As will be seen, the light that these concepts shed on calibration and simplification is profound.
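As a brief illustration of what “locally linear” means in practice (the nonlinear function below is invented for the purpose), the matrix relating model outputs to parameters can be obtained by finite-difference perturbation about a set of starting values; PEST obtains its Jacobian matrix in essentially this way.

```python
import numpy as np

# An invented nonlinear "model": two outputs computed from two parameters.
def model(k):
    return np.array([k[0] ** 2 + k[1], np.exp(0.5 * k[1]) * k[0]])

def jacobian(f, k0, dk=1e-6):
    """Finite-difference matrix Z relating changes in outputs to changes in parameters."""
    f0 = f(k0)
    Z = np.zeros((f0.size, k0.size))
    for j in range(k0.size):
        k_pert = k0.copy()
        k_pert[j] += dk
        Z[:, j] = (f(k_pert) - f0) / dk
    return Z

k0 = np.array([1.0, 2.0])          # starting parameter values
Z = jacobian(model, k0)

# Local linearity: for small departures dk, model(k0 + dk) is close to model(k0) + Z dk
dk = np.array([0.01, -0.02])
print(model(k0 + dk), model(k0) + Z @ dk)
```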

The presentation of theory in this and the following chapter is relatively brief. Full details can be found in Doherty (2015). Many of the equations derived below can be evaluated for real-world models using members of the PEST and PyEMU suites; see Doherty (2016) and White et al (2016) for details. See, in particular, members of the PEST PREDUNC* and PREDVAR* suites of programs. PREDVAR1C is of particular importance as it accommodates model defects (see the following chapter), albeit in restricted situations where so-called “defect parameters” can be identified.

As in the previous chapter, we use the vector k to denote parameters used by a model. To begin with, we consider models which are “complex”, in that they employ enough parameters to represent real world heterogeneity, and simulate enough processes for their outputs to be uncompromised by any defects. Let the vector h, once again, denote the calibration dataset. The number of elements of k may greatly exceed that of h; in a numerical groundwater model, for example, different properties can be ascribed to every cell of the model grid or mesh (which is indeed representative of the complexity associated with many groundwater systems). Let the vector ε denote measurement noise associated with the elements of h. Finally, let the matrix Z denote the action of the model under calibration conditions. Then

h = Zk + ε (3.1.1)

The symbol C(k) is used herein to denote the prior covariance matrix of k, this being associated with the prior probability density function P(k) featured in equation 2.2.1. Let C(ε) denote the covariance matrix of measurement noise; this is used in calculation of the likelihood function P(h|k) of equation 2.2.1. We express the interrelationship between the k and ε vectors and their respective covariance matrices using the expressions

k ~ C(k) (3.1.2a)

ε ~ C(ε) (3.1.2b)

In most cases of interest C(ε) is diagonal as the noise associated with any one measurement is considered to be independent of that associated with any other. However this does not have to be the case. Indeed it is not the case if noise is “structural” in origin – a matter which will be discussed later in this document. Let the scalar s denote a prediction of interest made by the model, and let the sensitivities of this prediction to model parameters be encapsulated in the vector y. Then

s = yᵗk (3.1.3)

where the superscript “t” denotes matrix/vector transpose.



Before proceeding, we remind the reader of a matrix relationship used to express propagation of variance. Let x be a random vector with covariance matrix C(x). Let y be calculable from x through a linear relationship involving the matrix A. That is

y = Ax (3.1.4)

It is easily shown (see, for example, Koch, 1999) that the covariance matrix C(y) of y is calculated from that of x through the relationship

C(y) = AC(x)Aᵗ (3.1.5)

Applying this to equation 3.1.3, the prior covariance matrix of s (which is the variance of s, as s is a scalar) becomes

σs² = yᵗC(k)y (3.1.6)

(Recall that variance is the square of standard deviation.) For a real (nonlinear) model, the variance of predictive uncertainty would be calculated by drawing samples from P(k), running the model for each sample, and building an empirical probability density function for s.
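A minimal numerical check of equation 3.1.6 is sketched below; the three-parameter covariance matrix C(k) and sensitivity vector y are invented for illustration, and the Monte Carlo calculation described above is used to confirm the linear result.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical prior covariance C(k) for three parameters, and prediction sensitivities y.
C_k = np.array([[1.0, 0.3, 0.0],
                [0.3, 2.0, 0.1],
                [0.0, 0.1, 0.5]])
y = np.array([0.4, -1.2, 2.0])

# Equation 3.1.6: prior predictive variance for the linear prediction s = y^t k
var_linear = y @ C_k @ y

# Monte Carlo equivalent: sample P(k), evaluate s for each sample, take the variance.
k_samples = rng.multivariate_normal(mean=np.zeros(3), cov=C_k, size=200_000)
var_mc = (k_samples @ y).var()

print(f"y^t C(k) y = {var_linear:.3f}, Monte Carlo estimate = {var_mc:.3f}")
```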

3.2 Linearized Bayes Equation

Equation 3.1.6 depicts the uncertainty of a prediction made by a model which has not been calibrated; this is the so-called “prior uncertainty” of the prediction. As was discussed in the preceding chapter, application of Bayes equation to the history-matching process restricts the range of predictive possibilities to those that are calculated using parameter sets that allow the model to match the calibration dataset h to within a tolerance set by the noise ε that is associated with h. It can be shown (Doherty, 2015) that either of the following, mathematically equivalent, equations can be used to calculate the posterior covariance matrix Cʹ(k) of parameters k used by the model. Cʹ(k) is thus the covariance matrix of these parameters subject to the conditioning effects of history-matching.

Cʹ(k) = C(k) – C(k)Zᵗ[ZC(k)Zᵗ + C(ε)]⁻¹ZC(k) (3.2.1a)

Cʹ(k) = [ZᵗC⁻¹(ε)Z + C⁻¹(k)]⁻¹ (3.2.1b)

The posterior uncertainty variance of a prediction s made by the model, which we denote as σsʹ², is then readily calculated using either of the following equations.

σsʹ² = yᵗC(k)y – yᵗC(k)Zᵗ[ZC(k)Zᵗ + C(ε)]⁻¹ZC(k)y (3.2.2a)

σsʹ² = yᵗ[ZᵗC⁻¹(ε)Z + C⁻¹(k)]⁻¹y (3.2.2b)

Note that all of equations 3.2.1a to 3.2.2b depend on an assumption of model linearity; they also assume that prior parameter probabilities and measurement noise have multi-Gaussian probability distributions. A comparison of equation 3.2.2a with equation 3.1.6 shows that history-matching reduces the range of predictive possibilities, thereby ensuring that σsʹ² is no greater than σs².

Where a model is nonlinear, calculation of posterior parameter and predictive probabilities is far more difficult than implementing equations 3.2.1 and 3.2.2. Methods such as Markov chain Monte Carlo must be used. As has already been stated, these become numerically intractable where model run times are long and/or where parameter numbers are high. Nevertheless, provided that the sensitivities encapsulated in the Z matrix and the y vector are calculable, equations 3.2.1 and 3.2.2 can provide useable approximations to posterior parameter and predictive uncertainty. They can also be used to calculate value-added quantities of modelling interest such as:



- contributions to predictive uncertainty by different parameters or parameter types;
- the worth of various types of existing or posited data in reducing the uncertainties of predictions of management interest.

Calculations such as these and others are made easier by the fact that only sensitivities, and not the actual values of parameters k or measurements h, appear in equations 3.2.1 and 3.2.2. Hence the worth of data in reducing uncertainty can be calculated prior to actual acquisition of that data. This provides a powerful basis for choosing between different data acquisition strategies. See Dausman et al (2010) and Wallis et al (2014) for examples of application of these equations.
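The sketch below evaluates equations of the form 3.2.2a for an invented three-parameter, two-observation problem, and then illustrates a data worth calculation by recomputing posterior predictive variance with one observation omitted; the Schur class of pyEMU implements essentially the same algebra for real PEST-based models.

```python
import numpy as np

# Hypothetical sensitivities and covariances (3 parameters, 2 observations).
Z   = np.array([[1.0, 0.5, 0.0],
                [0.2, 1.5, 0.3]])        # observation sensitivities
y   = np.array([0.4, -1.2, 2.0])         # prediction sensitivities
C_k = np.diag([1.0, 2.0, 0.5])           # prior parameter covariance
C_e = np.diag([0.01, 0.04])              # measurement noise covariance

def posterior_pred_var(Z, C_e):
    """Equation 3.2.2a: posterior (conditioned) predictive uncertainty variance."""
    G = C_k @ Z.T @ np.linalg.inv(Z @ C_k @ Z.T + C_e)
    return y @ C_k @ y - y @ G @ Z @ C_k @ y

prior_var = y @ C_k @ y
post_var  = posterior_pred_var(Z, C_e)

# Data worth: how much does the second observation reduce predictive uncertainty?
post_var_without_obs2 = posterior_pred_var(Z[:1, :], C_e[:1, :1])

print(f"prior {prior_var:.3f}  posterior {post_var:.3f}  "
      f"posterior without obs 2 {post_var_without_obs2:.3f}")
```

Because only sensitivities and covariances appear, the calculation can be repeated for observations that have not yet been collected, which is the basis of the data acquisition comparisons cited above.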

3.3 Model Calibration

3.3.1 Some Concepts

In most contexts of real-world model deployment, history-matching is achieved through model calibration rather than through conditioning using Bayesian methodologies. Model calibration seeks a unique solution to a normally ill-posed inverse problem. Obviously, uniqueness is an artificiality; basic Bayesian considerations expose the fact that parameters and predictions are uncertain both before and after history-matching. However, numerically, solution of an ill-posed inverse problem is easier to undertake than calculation of a posterior probability distribution.

(It is worth noting that the pivotal role played by the “calibrated model” in model-based decision-support, as it is undertaken at the present time, is rooted more in an age-old quest for insights into an unknown, but possibly dark, future than in a theory-based understanding of what history-matching can actually deliver. Theory presented below demonstrates that the commonly-held view that predictions made by a calibrated model cannot be too wrong has no mathematical basis.)

Environmental systems are complex. Hence the model parameter vector k has many elements. Rarely is there a unique solution to the inverse problem of model calibration. Thus a model can be “calibrated” in many ways; that is to say, there are infinitely many parameter sets which allow a model to fit the calibration dataset h. Ideally, however, the parameter set k̄ which is accepted as the “calibrated parameter set” should have a certain property which guarantees its uniqueness and its usefulness. This property is that of minimized error variance. In other words, the parameter set k̄ which is deemed to calibrate a model should not achieve uniqueness through any claim to correctness in the sense that it resembles the real (but unknown) parameter set k. Its claim to uniqueness should be based on the notion that its error in being an estimate of k has been minimized. This error may still be large, for it is set by the information content of the calibration dataset (which may be small). Notionally, minimization of the variance of the parameter error k̄ – k is achieved by trying to estimate a k̄ that lies somewhere near the centre of the posterior probability distribution of k. The potential for error of k̄ is thereby minimized through making this potential symmetrical with respect to k̄. In doing so, both k̄, and predictions s that are made on the basis of k̄, can be considered to be unbiased.

In mathematical parlance, the means through which uniqueness of solution of an ill-posed inverse problem is achieved is termed “regularisation”. Calibration methodologies differ in the way that regularisation is achieved. However, the metric by which one regularisation method can be judged to be superior to another is the guarantee that it provides that the k̄ which it calculates is indeed of minimized error variance. In practice, claims to minimized error variance cannot be verified, especially in complex parameterization contexts where prior parameter probability distributions have no analytical description and must be sampled through geostatistical means. Nevertheless, most methods of numerical regularisation have mathematical foundations which seek to provide a practical guarantee that the parameter set which is calculated through their use is “unbiased enough” for predictions made using that parameter set to be “close enough” to minimum error variance.

It is obvious from the above discussion that a “calibrated model”, on its own, does not constitute a suitable basis for decision-support, for it provides no means to assess the risk of unwanted environmental events, this being central to a model’s role in decision-support; see the discussion in chapter 2 of this document. Calculation of a parameter set k̄ of minimized error variance supports calculation of a prediction s of minimized error variance. This should be considered as the first step in a two-step process in which predictive error variance is quantified as a substitute for posterior predictive uncertainty. Doherty (2015) shows that post-calibration predictive error variance is larger than posterior predictive uncertainty variance. However, it is generally not too much larger and, provided certain conditions are met, is much easier to evaluate.

3.3.2 Regularisation

As was stated in the above section, the means through which a unique solution is attained to a fundamentally non-unique inverse problem is termed “regularisation”. In many instances of model usage, regularisation is implemented manually. A modeller first defines a parsimonious parameter set p by combining or lumping elements of k. In the groundwater modelling context this can be achieved by defining a suite of zones of assumed piecewise constancy that collectively span the model domain. In other modelling contexts, regularisation may be achieved by fixing many of the elements of k so that only a few are exposed to adjustment through the calibration process. Alternatively, groups of parameters may be linked so that the relativity of their values is preserved while grouped values are adjusted. Regardless of how manual regularisation is implemented, the elements of p should be few enough in number to be uniquely estimable. In achieving their estimation the model is deemed to be calibrated.
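Algebraically, this kind of manual regularisation amounts to expressing the many native parameters in terms of a few lumped ones. The sketch below (zone assignments and sensitivities are invented) writes k = Lp, so that the matrix seen by the calibration process becomes ZL, which has far fewer columns than Z.

```python
import numpy as np

# Six cell-by-cell parameters lumped into two zones of assumed piecewise constancy.
# Zone assignments are arbitrary, for illustration only.
zone_of_cell = np.array([0, 0, 0, 1, 1, 1])
L = np.zeros((6, 2))
L[np.arange(6), zone_of_cell] = 1.0      # k = L p maps zone values onto cells

Z = np.random.default_rng(2).normal(size=(4, 6))   # hypothetical observation sensitivities to k
Z_lumped = Z @ L                                    # sensitivities to the two zone parameters p

print(Z_lumped.shape)   # (4, 2): a well-posed problem in p replaces an ill-posed one in k
```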

Despite the ubiquitous use of manual regularisation in model calibration, it should be used with caution. First, it offers no mathematical guarantee that the parameter field achieved through solution of the inverse problem is indeed of minimum error variance. Hence the calibration process may actually engender bias in some model predictions, a matter that will be discussed later in this document. Secondly, to the extent that a model prediction is sensitive to elements of k which cannot be estimated, and are therefore removed from the model in order to achieve parameter uniqueness, the error variance of that prediction will be under-estimated. If this is a decision-critical prediction, a type II statistical error may therefore follow.

Model simplification can also be considered as a form of regularisation – a matter which will be discussed extensively in the next chapter of this document. A simplified model generally employs far fewer parameters than are required to depict the potential for spatial heterogeneity of hydraulic and other properties throughout a study area. The same problems may thus be incurred through calibration and deployment of a simplified model as may be incurred through suboptimal manual regularisation of a complex model parameter field, namely induction of predictive bias and failure to fully quantify predictive uncertainty. However, as will be shown below, this depends on the prediction; some predictions made by a simplified model may be afflicted by these problems, while other predictions made by the same model may be free of them.

Non-manual regularisation is achieved mathematically, and implemented numerically, as part of the calibration process itself. There are two broad approaches to mathematical regularisation. These can be broadly labelled as Tikhonov and subspace methods, singular value decomposition (SVD) being the best-known means to implement the latter approach.



Tikhonov regularisation achieves uniqueness by supplementing the information content of the calibration dataset with expert knowledge as it applies to parameter values and spatial/temporal relationships between parameter values. Expert knowledge is therefore given mathematical voice, and included in the inverse problem solution process, in a way that attempts to guarantee parameter uniqueness at the same time as it attempts to promote numerical stability in solution of the ill-posed inverse problem. Claims to minimum error variance solution of that problem are based on the role awarded to expert knowledge in defining a prior minimum error variance parameter set, and on the use of inversion algorithms that seek a parameter set that departs as little as possible from that prior parameter set in order to support a good fit of pertinent model outputs with the calibration dataset. Thus any unsupported heterogeneity is suppressed. Of course, the real world is heterogeneous. But expressions of heterogeneity in a calibrated parameter field that are not supported by the calibration dataset run the risk of being incorrect. Such a parameter field therefore loses its claim to that of minimized parameter error variance.

To put it another way, Tikhonov regularisation seeks parameter heterogeneity that MUST exist to explain a measurement dataset. Meanwhile, exploration of the heterogeneity that MAY exist, and that is consistent with the measurement dataset, is the task of post-calibration error/uncertainty analysis.
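A minimal sketch of the idea is given below, assuming for simplicity that all observations carry equal weight; Z, h and the prior parameter set k₀ are invented, and the regularisation weight μ controls the trade-off between fitting the data and departing from the prior.

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(5, 12))        # hypothetical sensitivities: 12 parameters, 5 observations
k_true = rng.normal(size=12)
h = Z @ k_true + rng.normal(scale=0.05, size=5)
k0 = np.zeros(12)                   # prior (expert-knowledge) parameter values
mu = 0.1                            # regularisation weight

# Minimise ||h - Zk||^2 + mu * ||k - k0||^2  (normal-equations form)
A = Z.T @ Z + mu * np.eye(12)
k_est = np.linalg.solve(A, Z.T @ h + mu * k0)

print(f"misfit {np.linalg.norm(h - Z @ k_est):.3f}, "
      f"departure from prior {np.linalg.norm(k_est - k0):.3f}")
```

Raising μ pulls the estimated parameters toward the prior (suppressing unsupported heterogeneity) at the cost of a poorer fit; lowering it does the reverse.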

In contrast to Tikhonov regularisation, subspace methods seek parameter uniqueness through parameter simplification. These methods are discussed in detail in the present document as model simplification can be considered to be a form of parameter simplification. Exposition of the theory which underpins use of subspace methods in model calibration therefore provides insights into the positive and negative outcomes of model simplification. It is to exposition of this theory that we now turn.

3.4 Calibration through Singular Value Decomposition

3.4.1 Preconditions

In order to simplify the following discussion, we will assume that the matrix Z which features in equation 3.1.1 has more columns than rows. We thus assume that the parameters comprising the vector k outnumber the observations comprising the vector h. This guarantees that the inverse problem of estimation of k is ill-posed. However, the discussion that follows is just as pertinent to cases where the elements of h outnumber those of k; most such problems are also ill-posed. If an inverse problem is well-posed, the methods outlined below can still be used for its solution. In all cases a solution of minimum error variance is found for the inverse problem – provided certain conditions are met.

To further simplify the mathematical presentation below, the following assumptions are made. Both of these are crucial to achievement of a solution of minimum error variance to the inverse problem posed by equation 3.1.1 where that problem is solved using singular value decomposition.

C(k) = σk²I (3.4.1)

C(ε) = σε²I (3.4.2)

Equation 3.4.1 states that elements of k are statistically independent from an expert knowledge point of view. In practice this assumption is often violated – especially in groundwater model domains where hydraulic properties are expected to show a high degree of spatial correlation, so that C(k) has off-diagonal elements. In principle this problem can be overcome by estimating a set of transformed parameters j, and then back-transforming to k after a minimum error variance solution for j has been obtained. The transformation from k to j must be such as to provide j with the properties defined in equation 3.4.1; such a transformation is known as a Karhunen-Loève transformation. Where the spatial variability of k can be specified through a covariance matrix C(k), this transformation is easily formulated by undertaking singular value decomposition of this matrix; see Doherty (2015) for details.
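A brief sketch of this transformation, using an invented exponentially-correlated C(k), is given below: singular value decomposition of C(k) yields a matrix T for which the transformed parameters j = Tk have an identity covariance matrix, as equation 3.4.1 requires.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical exponentially-correlated prior covariance matrix for 8 parameters on a line.
x = np.arange(8.0)
C_k = np.exp(-np.abs(x[:, None] - x[None, :]) / 3.0)

# SVD (equivalently an eigen-decomposition, since C(k) is symmetric positive definite)
U, s, _ = np.linalg.svd(C_k)
T = np.diag(1.0 / np.sqrt(s)) @ U.T     # j = T k has covariance T C(k) T^t = I

k_samples = rng.multivariate_normal(np.zeros(8), C_k, size=100_000)
j_samples = k_samples @ T.T
print(np.round(np.cov(j_samples.T), 2))  # approximately the identity matrix
```

The back-transformation is simply k = U diag(√s) j, so any solution obtained in terms of j is easily expressed in terms of the native parameters.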

Equation 3.4.2 states that measurement noise is homoscedastic and uncorrelated. In practice, an observation dataset will normally be comprised of different data types, with different levels of noise associated with different types of measurements. Also, even for the same data type, some measurements will be more reliable than others. Fortunately, this situation is easily accommodated by applying weights to different measurements and slightly re-formulating the inverse problem in accordance with this weighting scheme. Ideally, minimum error variance of the estimated parameters k̄ is achieved through use of a weight matrix Q that is designed according to the specification

Q = σr²C⁻¹(ε) (3.4.3)

The inverse problem is then reformulated in terms of Q½h and Q½Z instead of h and Z. However, in order to maintain simplicity of the following equations, (3.4.2) will be assumed.
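The following sketch shows the reformulation for an invented two-observation case: with Q chosen according to equation 3.4.3, premultiplying both h and Z by Q½ yields an equivalent problem in which the transformed noise satisfies equation 3.4.2.

```python
import numpy as np

sigma_r = 1.0                              # reference variance (assumed)
C_e = np.diag([0.01, 0.09])                # noise variances differ between the two measurements
Q = sigma_r**2 * np.linalg.inv(C_e)        # equation 3.4.3
Q_half = np.sqrt(Q)                        # Q is diagonal here, so the matrix square root is elementwise

Z = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.5, 0.3]])            # hypothetical sensitivities
h = np.array([3.0, 1.2])                   # hypothetical measurements

Z_w, h_w = Q_half @ Z, Q_half @ h          # weighted problem: noise on h_w is now homoscedastic
print(Q_half @ C_e @ Q_half)               # equals sigma_r^2 times the identity matrix
```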

3.4.2 Singular Value Decomposition

Singular value decomposition (SVD) can be applied to any matrix. Singular value decomposition of the matrix Z can be formulated as follows.

Z = USVᵗ (3.4.4)

In equation 3.4.4, U is an orthonormal matrix whose columns span the range space of Z. V is an orthonormal matrix whose columns span parameter space. S is a rectangular matrix whose diagonal elements are comprised of singular values. These are either positive or zero; normally they are arranged down the diagonal from highest to lowest. Off-diagonal elements of S are zero.

An orthonormal matrix has columns which are unit vectors which are orthogonal to each other. It has the very useful property that its transpose is its inverse. Hence, for the orthonormal matrix V,

VtV = VVt = I (3.4.5)

Suppose, as will be done shortly, V is partitioned as

V = [V1 V2] (3.4.6)

Then

V1ᵗV1 = I (3.4.7)

However V1V1ᵗ is not equal to I. Rather, it is an orthogonal projection operator onto the subspace spanned by the columns of V1. See texts such as Menke (2012) and Aster et al (2013) for further details.

If partitioning of V according to equation 3.4.6 is performed in such a way that the number of columns of V1 is equal to the number of non-zero singular values in S (see equation 3.4.4), then the columns of V2 span the null space of the matrix Z. The null space of this matrix is comprised of vectors δk for which

Zδk = 0 (3.4.8)

Meanwhile the minimum error variance solution k̄ to the inverse problem defined by equation 3.1.1 can be calculated as

k̄ = V1S1⁻¹U1ᵗh (3.4.9)
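The following sketch (Python with numpy; Z and h are randomly generated stand-ins for a real sensitivity matrix and calibration dataset) carries out the decomposition of equation 3.4.4, partitions V at the non-zero singular values, and evaluates equation 3.4.9; for a linear problem the result coincides with the minimum-norm least-squares solution given by the pseudo-inverse.

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.standard_normal((8, 12))          # more parameters (12) than observations (8)
    h = rng.standard_normal(8)                # illustrative calibration dataset

    # Singular value decomposition of Z (equation 3.4.4).
    U, s, Vt = np.linalg.svd(Z, full_matrices=True)
    V = Vt.T

    # Partition V at the number of non-zero singular values (equation 3.4.6).
    n = np.count_nonzero(s > 1e-10 * s[0])
    U1, S1, V1, V2 = U[:, :n], s[:n], V[:, :n], V[:, n:]

    # Columns of V2 span the null space of Z (equation 3.4.8).
    print(np.allclose(Z @ V2, 0.0))           # True

    # Minimum error variance solution of equation 3.4.9.
    k_bar = V1 @ np.diag(1.0 / S1) @ U1.T @ h
    print(np.allclose(k_bar, np.linalg.pinv(Z) @ h))   # True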

The use of U1 and S1 instead of U and S in equation 3.4.9 follows from the fact that partitioning of V into [V1 V2] implies a complementary partitioning of U into [U1 U2], and of the singular values into S1 and S2.

Equation 3.4.9 states that the solution k̄ to the inverse problem of model calibration lies within the subspace spanned by the columns of V1. This subspace, the orthogonal complement of the null space, is referred to herein as the "solution space" of Z. k̄ is, in fact, the orthogonal projection of the unknown parameter set k onto the solution space of Z. It is the minimum error variance solution to the inverse problem defined by (3.1.1). It gains this status from the fact that k̄ includes no null space components of k. Hence it includes no features that are not supported by the calibration dataset h. By confining itself to the solution space, and admitting no null space components, calculation of k̄ using equation 3.4.9 avoids the risk of "wandering into the null space in the wrong direction". k̄ therefore constitutes the safest, and therefore minimum error variance, parameter vector which allows the model to fit the calibration dataset. See Moore and Doherty (2005) for further details.

The situation can be visualized using figure 3.1. Suppose that parameter space has three dimensions; that is, the number of elements comprising the vector k is 3. Singular value decomposition of Z yields the orthogonal unit vectors v1, v2 and v3 which collectively span parameter space. These do not, in general, point in the same directions as the vectors [1 0 0]ᵗ, [0 1 0]ᵗ and [0 0 1]ᵗ. Hence each of the vectors v1, v2 and v3 is effectively an orthogonal combination of the parameters k1, k2 and k3 comprising the vector k. v1, v2 and v3 define new axes in parameter space.

Suppose that, for this particular example, the dimensionality of the solution space is 2 while that of the null space is 1. The relationship between the calculated k̄ and the unknown k is depicted in the figure.


Figure 3.1 Relationship between estimated parameters k̄ and real world parameters k. From Doherty (2015).

3.4.3 Quantification of Parameter and Predictive Error

In practice, the optimal dimensionality of the solution space is not defined by the number of non-zero singular values in S, for this takes no account of the presence of noise in the calibration dataset. This will now be shown.

Through substitution of equation 3.1.1 into equation 3.4.9 we obtain

k̄ = V1S1⁻¹U1ᵗZk + V1S1⁻¹U1ᵗε (3.4.10)

From (3.4.4) and the orthogonality of V1 and V2 this becomes

k̄ = V1V1ᵗk + V1S1⁻¹U1ᵗε (3.4.11)

Parameter error can therefore be calculated as

k̄ – k = -(I – V1V1ᵗ)k + V1S1⁻¹U1ᵗε (3.4.12)

Through use of the following relationship

V1V1ᵗ + V2V2ᵗ = I (3.4.13)

equation 3.4.12 can be re-written as

k̄ – k = -V2V2ᵗk + V1S1⁻¹U1ᵗε (3.4.14)

Equation 3.4.14 makes it clear that the error in the parameter set k̄ obtained through model calibration has two orthogonal components; see figure 3.2. The first contribution is expressed by the first term on the right of equation 3.4.14. This is the "cost of uniqueness" term, the error accrued through eschewing the null space when seeking a minimum error variance solution to the inverse problem. The second term arises from noise in the calibration dataset. Where an inverse problem is well-posed, this is the only source of error in the calibrated parameter field. Where manual regularisation is employed to formulate a well-posed inverse problem from an ill-posed inverse problem, this is the only "visible" source of parameter error. However, parameter (and predictive) error may be seriously underestimated if the first term is ignored. An advantage of undertaking regularisation numerically instead of manually is that this term is, in theory, calculable, because parameter complexity is retained in solving the inverse problem. Where regularisation is undertaken manually, it is not.


[Figure 3.2 shows the vectors k and k̄ relative to the axes v1, v2 and v3, together with the two error components -V2V2ᵗk and V1S1⁻¹U1ᵗε.]

Figure 3.2. The two contributions to error in the estimated parameter set k̄. (From Doherty, 2015).

Parameter error k̄ – k cannot be calculated (and therefore corrected) as both k and ε are unknown. However the potential for parameter error can be expressed through the covariance matrix of parameter error. If equation 3.1.5 is applied to equation 3.4.14 we obtain

C(k̄ – k) = V2V2ᵗC(k)V2V2ᵗ + V1S1⁻¹U1ᵗC(ε)U1S1⁻¹V1ᵗ (3.4.15)

From (3.4.1) and (3.4.2) this becomes

C(k̄ – k) = σ²k V2V2ᵗ + σ²ε V1S1⁻²V1ᵗ (3.4.16)

Suppose now that we wish to make a prediction s using the calibrated model. The correct value for this prediction is given by equation 3.1.3. On the other hand, the prediction s̄ made by the calibrated model is given by

s̄ = yᵗk̄ (3.4.17)

Predictive error is thus calculated as

s̄ – s = yᵗ(k̄ – k) (3.4.18)

Using equation 3.1.5, predictive error variance can be calculated as

σ²(s̄–s) = yᵗC(k̄ – k)y (3.4.19)

If (3.4.16) is substituted into (3.4.19) we obtain

σ²(s̄–s) = σ²k yᵗV2V2ᵗy + σ²ε yᵗV1S1⁻²V1ᵗy (3.4.20)

Moore and Doherty (2005) discuss equation 3.4.20 in detail. This equation can be used to define optimal partitioning of parameter space into solution and null subspaces. If the error variance of a prediction is plotted against the number of pre-truncation singular values (with “truncation” referring to the location at which partitioning of V into V1 and V2 occurs), a graph such as that shown in figure 3.3 is obtained.
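A sketch of how such a curve can be computed for a linear example follows (Python with numpy; the matrix Z, the prediction sensitivity vector y, and the variances σ²k and σ²ε are all assumed, illustrative values).

    import numpy as np

    rng = np.random.default_rng(1)
    Z = rng.standard_normal((20, 40))         # illustrative sensitivity matrix
    y = rng.standard_normal(40)               # illustrative prediction sensitivities
    sigma2_k, sigma2_e = 1.0, 0.01            # assumed prior parameter and noise variances

    U, s, Vt = np.linalg.svd(Z)
    V = Vt.T

    def predictive_error_variance(n):
        """Evaluate the two terms of equation 3.4.20 for truncation at n singular values."""
        V1, V2, S1 = V[:, :n], V[:, n:], s[:n]
        null_term = sigma2_k * y @ V2 @ V2.T @ y
        soln_term = sigma2_e * y @ V1 @ np.diag(1.0 / S1**2) @ V1.T @ y if n > 0 else 0.0
        return null_term + soln_term

    curve = [predictive_error_variance(n) for n in range(len(s) + 1)]
    print("error variance is minimized at", int(np.argmin(curve)), "singular values")

The minimum of this curve identifies the truncation point that is optimal for this particular prediction.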


[Figure 3.3 plots predictive error variance σ²(s̄–s) against the number of singular values; it shows the null space term, the solution space term and total predictive error variance, together with the benefit of calibration.]

Figure 3.3. Predictive error variance as a function of number of singular values used in the inversion process. (From Doherty, 2015.)

(Note that, in general, goodness of fit achieved with the calibration dataset increases with number of singular values assigned to the solution space. The horizontal axis of the graph shown in figure 3.3 can therefore be labelled “goodness of fit”. With this as the independent variable, it can also be drawn for other forms of regularisation, including manual regularisation.)

The use of zero singular values in figure 3.3 is equivalent to not calibrating the model at all. Predictive error variance under these circumstances is equivalent to the variance of prior predictive uncertainty; see equation 3.1.6. The first term on the right of equation 3.4.20 falls monotonically as the number of pre-truncation singular values increases; this is referred to as the "null space term" in figure 3.3. At the same time, the second term rises monotonically, slowly at first and then very rapidly because of the presence of the S1⁻² component of this term; this is referred to as the "solution space term" in figure 3.3. At some point predictive error variance is minimized. The difference between the error variance at this point and that calculated for truncation at zero singular values records the reduction in predictive error variance from its pre-calibration value accrued by the calibration process. Where a prediction is sensitive to solution space parameter components this reduction, and hence the benefit of history-matching, can be large. In contrast, where a prediction is sensitive to predominantly null-space parameter components, the benefits of calibration may be very small. For all predictions made by the model, the minimum of the predictive error variance curve, no matter how shallow, will occur at about the same number of singular values. Doherty (2015) shows that the minimized predictive error variance will be slightly greater than the predictive uncertainty variance calculated using the linearized form of Bayes equation (see equations 3.2.2).

The above theory illustrates why "over-fitting" should be avoided. Presumably, a fit with the calibration dataset should be sought which is no better than that dictated by the potential for error in the measurements which comprise it. That is, the level of model-to-measurement misfit should respect measurement noise. Equation 3.4.20 shows that the information contained in a calibration dataset becomes more and more contaminated by associated measurement noise as a modeller tries to expand the dimensionality of the solution space. At some point, the increase in the potential for parameter and predictive error incurred by measurement noise as the solution space is expanded becomes greater than the reduction in parameter and predictive error achieved through reducing the dimensionality of the null space.

3.4.4 Model-to-Measurement Misfit

A residual is the difference between a measurement and the corresponding quantity calculated by a model. For a calibrated model, the vector of residuals r is calculated as

r = h – Zk̄ (3.4.21)

Substitution of (3.4.11) into (3.4.21) yields

r = h – ZV1V1ᵗk – ZV1S1⁻¹U1ᵗε (3.4.22)

Following substitution of equation 3.1.1 for h this becomes

r = Zk + ε – ZV1V1ᵗk – ZV1S1⁻¹U1ᵗε (3.4.23)

Subjecting Z to singular value decomposition, and noting that

Z = U1S1V1ᵗ + U2S2V2ᵗ (3.4.24)

equation 3.4.23 becomes

r = U2S2V2ᵗk + (I – U1U1ᵗ)ε (3.4.25)

Through implementation of the relationship

U1U1ᵗ + U2U2ᵗ = I (3.4.26)

we finally obtain

r = U2S2V2ᵗk + U2U2ᵗε (3.4.27)

so that (using equations 3.1.5, 3.4.1 and 3.4.2) the covariance matrix of residuals is calculated as

C(r) = σ²k U2S2²U2ᵗ + σ²ε U2U2ᵗ (3.4.28)

Both terms of equation 3.4.28 fall monotonically as the number of singular values employed for solution of the inverse problem (i.e. the number of pre-truncation singular values) increases. Doherty (2015) shows that the diagonal elements of C(r) should all be less than twice σ²ε if truncation takes place at that point where predictive error variance is minimized. If weights are chosen in accordance with equation 3.4.3, the calibration objective function (sum of weighted squared residuals) should thus be between one and two times the number of observations comprising the calibration dataset.
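The expected value of the objective function can be evaluated directly from equation 3.4.28 for a linear example. In the sketch below (Python with numpy; Z, the variances and the truncation point are all assumed values), the trace of C(r), expressed in multiples of σ²ε, gives the expected objective function when weights of 1/σε are employed (i.e. σ²r = 1 in equation 3.4.3); this can then be compared with the number of observations.

    import numpy as np

    rng = np.random.default_rng(2)
    Z = rng.standard_normal((15, 30))         # illustrative sensitivity matrix
    sigma2_k, sigma2_e = 0.1, 0.01            # assumed prior parameter and noise variances
    n = 10                                    # an assumed truncation point

    U, s, Vt = np.linalg.svd(Z)
    U2, S2 = U[:, n:], s[n:]

    # Covariance matrix of residuals according to equation 3.4.28.
    Cr = sigma2_k * U2 @ np.diag(S2**2) @ U2.T + sigma2_e * U2 @ U2.T

    # Expected objective function (sum of squared weighted residuals) with weights of
    # 1/sigma_e, compared with the number of observations.
    print(np.trace(Cr) / sigma2_e, "expected objective function for", Z.shape[0], "observations")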


4. Accommodating Model Defects

4.1 Introduction

Implied in the theoretical developments of the previous chapter is the notion that a numerical model contains no defects, and that all predictive error arises from the quest for uniqueness and from the presence of noise in the measurement dataset. It is also implied that the potential for predictive error (that is, predictive error variance) can be quantified, and that this provides a conservative estimate of posterior predictive uncertainty. The latter is assumed to be a quantity that depends only on the information content of expert knowledge and of historical measurements of system state. Both of these are contaminated by "noise".

In fact, all models have defects because all models are simplifications of reality. Simpler models have greater defects. Simple models may not provide receptacles for all of the information that is resident in expert knowledge and in historical measurements of system state. At the same time, the receptacles that they do provide for this information may be imperfect, so that when the model is used to make a prediction, that prediction is accompanied by bias. Worse still, the extent to which a prediction may be corrupted through having been calculated using a defective model is inherently unquantifiable using that model.

Notwithstanding the problems that use of a simple model may incur, simple models are attractive for many reasons. They are less expensive to build than complex models. They run much faster, and are generally more numerically stable than complex models. If they are endowed with enough parameters, they can often be calibrated to yield a good fit with a measurement dataset. It is tempting to believe that if this is the case then their defects have been effectively “calibrated out”. As it turns out, this may indeed be the case for some predictions; however if the same simple model is used to make other predictions, the potential for error associated with those predictions may have actually been amplified through the history-matching process. At the same time, because of the parsimonious parameter set that normally accompanies simple model design, the calibration null space may not be well represented in a simple model. The first term of equation 3.4.16 is therefore diminished; predictive error variance is thereby underestimated.

The present chapter expands the theoretical developments of the previous chapter to accommodate model defects. The subspace theme is retained because of the light that this sheds on the simplification process. In fact, singular value decomposition can be considered a simplification process itself, for in separating the null space from the solution space, and restricting solution of the inverse problem to the latter space, it dispenses with parameter combinations that are non-essential to attaining a good fit with the calibration dataset. Furthermore, simplification achieved in this manner is “optimal” in the sense that it incurs no bias in estimated parameters and in predictions required of the model. As previously stated however, elimination of the null space comes at a cost, for it compromises the ability of a simple model to quantify the uncertainties of predictions that have any null space dependency.

4.2 Mathematical Formulation

4.2.1 The Problem Defining Equation

To recognize the presence of defects in a model, equation 3.1.1, which describes the action of a model under calibration conditions, is replaced by the following equation

h = Zk + Zdkd + ε (4.2.1)


Equation 4.2.1 introduces a new set of parameters to the model, these being encapsulated in the kd vector. These can be thought of as "defect parameters". They represent differences between a simple model and the real world. These differences can arise from parameter simplification. However they can also arise from approximations incurred by the nature of the model itself. Thus they can represent the effects of gridding, meshing and other forms of spatial and temporal discretisation, misrepresentation and simplification of boundary conditions, use of lumped water storage elements in place of spatially distributed storages, incorrect and simplified forcing functions, etc. The matrix Zd represents the processes that operate on these defect parameters.

4.2.2 Parameter Estimation

The user of a simplified model is unaware of the Zdkd term of equation 4.2.1. When he/she calibrates the model, he/she assumes that equation 3.1.1 prevails. Calibration of the simple model, achieved through regularised inversion, calculates a calibrated parameter set k̄ using equation 3.4.9. If equation 4.2.1 instead of equation 3.1.1 is then substituted for h in equation 3.4.9 we obtain

k̄ = V1S1⁻¹U1ᵗZk + V1S1⁻¹U1ᵗZdkd + V1S1⁻¹U1ᵗε (4.2.2)

from which parameter error is readily calculated as

k̄ – k = -V2V2ᵗk + V1S1⁻¹U1ᵗZdkd + V1S1⁻¹U1ᵗε (4.2.3)

From the above equation it is apparent that unless all of the columns of U1 are orthogonal to those of Zd, model defects incur calibrated parameter error. This does not mean that these defects are necessarily exposed through the calibration process as irreducible model-to-measurement misfit, though this may be the case for some simple models; see the discussion on model-to-measurement misfit below. In many cases of model simplification however, it may mean that adjustable parameters k adopt roles that compensate for model defect parameters kd to such an extent that these defects are effectively concealed from the calibration process altogether. A good fit is therefore obtained with the calibration dataset. The greater the extent to which they are concealed, the greater is the extent to which adjustable model parameters k thereby “absorb” the misinformation that is encased in model defects. Given that even simple models are generally not so simple that they seriously compromise model-to-measurement misfit (this is normally an important design criterion for a simple model), the potential for parameters employed by a simple model to adopt surrogate roles in compensation for their defects is high.

A modeller may be able to mitigate the adverse effects of parameter surrogacy through appropriate re-formulation of the inverse problem. Suppose that observations used in the calibration process, together with their model generated counterparts, are transformed prior to undertaking this process so that equation 4.2.1 becomes

Th = TZk + TZdkd + Tε (4.2.4)

The transformation matrix T should be chosen to be as orthogonal to Zd as possible. Such a transformation will necessarily require that the number of elements in Th be smaller than the number in h. Thus data which holds information that the simple model is not capable of accommodating is eliminated, or "orthogonalized out" of the calibration process. Doherty and Welter (2010) and White et al (2014) show that such transformation will often involve spatial and temporal differencing. This is not unexpected, as a model is often better at calculating differences than absolutes. Thus the calibration process is formulated in such a way that it "plays to the model's strengths".
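A sketch of one such transformation is given below (Python with numpy; the observation time series and model matrix are illustrative assumptions). A first-difference operator T is applied to the observations and to the corresponding rows of Z, so that absolute values, which a defective model may reproduce poorly, are removed from the calibration dataset.

    import numpy as np

    nobs, npar = 10, 6
    rng = np.random.default_rng(3)
    Z = rng.standard_normal((nobs, npar))      # illustrative simple-model sensitivity matrix
    h = rng.standard_normal(nobs)              # illustrative time series of observations

    # Temporal differencing operator: each transformed "observation" is the difference
    # between two successive measurements, so Th has one fewer element than h.
    T = np.zeros((nobs - 1, nobs))
    for i in range(nobs - 1):
        T[i, i], T[i, i + 1] = -1.0, 1.0

    Th = T @ h                                 # transformed observations of equation 4.2.4
    TZ = T @ Z                                 # transformed sensitivities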

4.2.3 Model-to-Measurement Fit

With model defects taken into account, equation 3.4.21 becomes


r = h – Zk̄ (4.2.5)

With a little matrix manipulation, similar to that undertaken previously, this can be expressed as

r = U2S2V2ᵗk + U2U2ᵗε + U2U2ᵗZdkd (4.2.6)

Equation 4.2.6 shows that model defects have a minimal effect on model-to-measurement misfit if the columns of Zd lie within the range space of U1, and hence are orthogonal to the columns of U2. This means that errors in model-calculated quantities that emerge from its defects can be "absorbed" by adjustable parameters during the process of model calibration. As stated above, parameters adopt surrogate roles in doing so. As will be shown below, this opens the possibility of predictive bias. At the same time, the defects themselves may be invisible to the calibration process as their presence is not expressed as model-to-measurement misfit. Nevertheless, their presence may be expressed by a lack of credibility of values assigned to some parameters through the calibration process. It is important to note, however, that while this manner of detecting parameter surrogacy may provide a safeguard for physically-based models, for which the link between model parameters and real-world hydraulic properties is explicit, it provides less of a safeguard for simple models that employ lumped storage elements and other abstract numerical devices, whose links to directly observable quantities informed by expert knowledge of the real world are less direct.

Suppose that the elements of kd are statistically independent of those of k, and that (for the sake of simplicity) the covariance matrix of defect parameters C(kd) can be expressed as

C(kd) = σ²kd I (4.2.7)

From (4.2.6) and (3.1.5) the covariance matrix of residuals C(r) is then readily expressed as

C(r) = σ²k U2S2²U2ᵗ + σ²ε U2U2ᵗ + σ²kd U2U2ᵗZdZdᵗU2U2ᵗ (4.2.8)

The first term of equation 4.2.8 decreases, ultimately to zero, with increasing singular value truncation point. The second term decreases more slowly, but does not disappear completely, as measurement noise must find expression in model-to-measurement misfit. The behaviour of the third term depends on the nature of the model defects expressed by kd and Zd. If some columns of Zd do not lie within the range space of Z (and hence are non-orthogonal to columns of U2 which correspond to singular values of zero), these defects will find expression in model-to-measurement misfit regardless of values assigned to the parameters k adjusted through the calibration process. In this case the inadequacies of the simplified model will indeed be exposed through calibrating that model. This does not mean, however, that all of its adjustable parameters, and all of its predictions, will be immune from calibration-induced bias, for some other columns of Zd may indeed lie within the range space of Z.

4.2.4 Predictive Error

Suppose that the calibrated, simple model is used to make a prediction. For the linear model which is the subject of the present discussion, a prediction is made using equation 3.4.17. This, of course, is the same equation as that employed when making a prediction with a non-defective model; after all, when the owner of a simple model uses that model to make a prediction, he/she is oblivious to its defects. The real world prediction, however, is calculated using the equation

s = yᵗk + ydᵗkd (4.2.9)

In this equation the vector y encapsulates sensitivities of the prediction to parameters employed by the simplified model. However the vector yd holds sensitivities of the prediction to defect parameters which are an integral, though inaccessible, aspect of the simple model's design. The error in the prediction made by the simple model is therefore

s̄ – s = yᵗ(k̄ – k) – ydᵗkd (4.2.10)

Through substitution of equation 4.2.3 into equation 4.2.10, followed by a small amount of matrix manipulation, this becomes

s̄ – s = -yᵗV2V2ᵗk + yᵗV1S1⁻¹U1ᵗε + (yᵗV1S1⁻¹U1ᵗZd – ydᵗ)kd (4.2.11)

The k and ε vectors that feature in equation (4.2.11) are unknown. However, as was done in the previous chapter, a modeller can be expected to have some knowledge of the level of noise afflicting his/her measurements, and can express this through the measurement noise covariance matrix C(ε). Similarly, the covariance matrix C(k) can be used to express prior parameter uncertainty, this reflecting expert knowledge as it pertains to these parameters. As stated above, however, the link between expert knowledge and parameters employed by a simplified model may be more tenuous than that between expert knowledge and parameters used by more complex, physically-based models.

Stochastic characterization of defect parameters kd is more difficult, as this approaches the realm of “unknown unknowns”. Nevertheless, for the purpose of continuing the mathematical discussion, it is assumed that the “extent of simple model wrongness” can be characterized by a covariance matrix C(kd) that is described by equation 4.2.7. Then, if model defect parameters show no statistical correlation with either the parameters k employed by the simple model, or with measurement noise ε, propagation of variance as expressed by equation 3.1.5 can be applied to equation 4.2.11 to yield the error variance of a prediction made by the simplified model.

σ²(s̄–s) = σ²k yᵗV2V2ᵗy + σ²ε yᵗV1S1⁻²V1ᵗy + σ²kd (yᵗV1S1⁻¹U1ᵗZd – ydᵗ)(yᵗV1S1⁻¹U1ᵗZd – ydᵗ)ᵗ (4.2.12)

The first two terms of equation 4.2.12 are identical to those of equation 3.4.20. These are the terms that a modeller actually employs to calculate predictive error variance using his/her simplified model. Alternatively, but similarly, he/she may use the linearized form of Bayes equation (equations 3.2.2) to calculate the posterior uncertainty of a prediction using sensitivities embodied in the Z matrix. As has already been discussed, both of these equations should yield similar results. Or the modeller may undertake nonlinear predictive uncertainty analysis using, for example, Markov chain Monte Carlo. If the simplified model is not too nonlinear the results will be similar. In using any of these methodologies to characterize model predictive error/uncertainty, model defects are ignored. Hence the real potential for model predictive error is probably miscalculated.

Predictive error variance cannot in fact be calculated unless both Zd and kd are known. This can only be done for synthetic cases designed specifically to explore the effects of model defects on predictive error; see Watson et al (2013), White et al (2014), White et al (2016), and the PREDVAR1C utility of Doherty (2016). If Zd and C(kd) are, in fact, known, the graph depicted in figure 3.3 can be modified to that depicted in figure 4.1. In this figure the black lines are the same as those shown in figure 3.3; they depict the outcomes of predictive error/uncertainty analysis that the owner of the simple model undertakes using his/her model. The red lines, however, are the normally unseen terms that express the effects of model simplification on predictive error variance. The red dashed line depicts the third term of equation 4.2.12, while the red full line depicts total predictive error variance.


[Figure 4.1 plots σ²(s̄–s) against the number of singular values; it shows the null space term, the solution space term, total predictive error variance with no model defects, the model defect term, and total predictive error variance with model defects included.]

Figure 4.1. The three terms of equation 4.2.12, together with total predictive error variance. (From Doherty, 2015.)

As was discussed in chapter 3 of this document, the first term of equation 4.2.12 falls with increasing number of singular values used in the inversion process (and hence with goodness of model-to-measurement fit achieved through that process). In contrast, the second term rises, eventually very fast. The third term, however, cannot be expected to show monotonic behaviour. Nor can it be expected to be zero when the number of singular values is zero.
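For a synthetic case in which Zd, yd and σ²kd are specified, the behaviour of all three terms of equation 4.2.12 can be examined at each trial truncation point, in the same way as the curve of figure 3.3 was constructed. A sketch follows (Python with numpy; every matrix and variance is a randomly generated assumption).

    import numpy as np

    rng = np.random.default_rng(4)
    Z  = rng.standard_normal((20, 40))         # simple model sensitivities (assumed)
    Zd = rng.standard_normal((20, 5))          # defect sensitivities (known only synthetically)
    y  = rng.standard_normal(40)               # prediction sensitivities to model parameters
    yd = rng.standard_normal(5)                # prediction sensitivities to defect parameters
    s2k, s2e, s2kd = 1.0, 0.01, 0.5            # assumed variances

    U, s, Vt = np.linalg.svd(Z)
    V = Vt.T

    def error_variance_terms(n):
        """The three terms of equation 4.2.12 for truncation at n singular values."""
        V1, V2, S1, U1 = V[:, :n], V[:, n:], s[:n], U[:, :n]
        null_term = s2k * y @ V2 @ V2.T @ y
        if n == 0:
            return null_term, 0.0, s2kd * yd @ yd
        G = V1 @ np.diag(1.0 / S1) @ U1.T      # the operator V1 S1^-1 U1^t
        soln_term = s2e * y @ G @ G.T @ y
        a = y @ G @ Zd - yd                    # the bracketed vector of equation 4.2.12
        return null_term, soln_term, s2kd * a @ a

    totals = [sum(error_variance_terms(n)) for n in range(len(s) + 1)]
    print("total error variance is minimized at", int(np.argmin(totals)), "singular values")

Depending on the assumed Zd and yd, the defect term may shift the minimum of the total curve relative to that of the first two terms alone, as discussed above.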

An important feature of the third term of equation 4.2.12 is its prediction-specific nature. If a prediction is such that

ydᵗ = yᵗV1S1⁻¹U1ᵗZd (4.2.13)

then this term will be zero. Doherty (2015) shows that this occurs where a prediction is entirely sensitive to solution space parameters of the real-world “model” – the model of which the numerical model actually used to make predictions is a simplification. In general, these are predictions that are very similar in character to measurements comprising the calibration dataset. The model is thus being asked to make similar predictions in the future to observations against which it was calibrated; furthermore, the stresses to which the simulated system will be subjected in the future are expected to be similar to those that it has experienced in the past. In these circumstances, the design of a model can be very forgiving indeed, for its capacity to fit the past is all that is required for it to predict the future. Furthermore, prior information pertaining to simple model parameters takes no part in predictive uncertainty quantification, as the information that the model requires to make the prediction is resident solely in the calibration dataset.

The situation is different, however, for predictions that are at least partially sensitive to null space parameter components of the "real-world model" (of which the simplified model is a defective emulator). As far as the owner of the simple model is concerned, these predictions may be sensitive only to solution space components of his/her model. However, as will be shown below, adjustment of these parameters during calibration of the simple model may entrain null space parameters of the real world model. In doing so, predictions which are sensitive to thus-entrained, real world null space parameters lose their minimum error variance status, and therefore accrue bias. The extent to which this occurs is prediction-specific and, in general, unquantifiable.

Another important feature of simple model calibration that is demonstrated by equation 4.2.12, and by figure 4.1 that schematizes application of this equation, is the shift to the left of the number of singular values at which predictive error variance is minimized. Once again, the extent of this shift is prediction-specific and cannot in general be known. In some cases the total predictive error variance curve may not even have a minimum, but instead rise from zero singular values. The repercussions of this for calibration of a simple model are profound. If the simple model is used to make certain types of predictions - predictions that are similar to measurements comprising the calibration dataset so that equation 4.2.13 applies - then the modeller is entitled to seek just as good a fit with the calibration dataset as if the model were not defective at all. Alternatively, where the model is used to make other types of predictions, a poorer fit with field measurements should be sought as the model is calibrated. For some predictions, especially those which are largely uninformed by the calibration dataset as they are sensitive mainly to null space parameter components of the background real world model, it may be better to eschew history-matching altogether. The prediction is then made using the uncalibrated model; in exploring the uncertainty of this prediction, prior uncertainty is used as a surrogate for posterior uncertainty.

An extremely important point that this analysis demonstrates is that, for a simplified model, the link between good parameters and good predictions is broken. Where equation 4.2.13 applies, parameters may adopt surrogate roles during the calibration process to the point where they are assigned values that are highly questionable from an expert knowledge point of view. However the prediction is not compromised in any way. For other predictions, parameter surrogacy induced by history-matching may engender a high degree of unquantifiable predictive bias. It is possible that this can be mitigated to some extent through under-fitting. In other cases, reformulation of the inverse problem through filtering out bias-inducing combinations of measurements from the calibration dataset using a strategy that is schematically depicted by equation 4.2.4 may be effective in reducing predictive bias. The important point, however, is that calibration of a simple model needs to be done with the prediction that it is required to make in mind. If the model is required to make a number of different predictions, then it may need to be calibrated, and then re-calibrated, accordingly.

At this point it is salient to remember that all models – not just so-called “simple models” - are gross simplifications of reality. The above considerations apply to them as well. Sadly, the extent to which they apply is unquantifiable. This makes them no less real, and recognition of the phenomena discussed herein, no less urgent.

4.3 A Pictorial Representation

The following explanation and pictures are slightly modified from Doherty (2015).

Figure 4.2 depicts a three-dimensional parameter space with orthogonal axes which coincide with the three parameters that define this space. Two of the parameters specified by these axes are employed by a simple model; these are adjustable through calibration of that model. The third parameter is a defect parameter (i.e. a "kd" parameter of equation 4.2.1). Hence its (unknown) value is hardwired into the defective model's construction. Collectively, the three parameters span the entirety of parameter space. Hence, conceptually at least, replication of past and future system behaviour is possible using these three parameters even though parameter kd1, the sole element of the kd parameter set of equation 4.2.1, cannot be varied.


Figure 4.2. Two parameters employed by a simple model and one defect parameter; collectively they span the entirety of parameter space.

Let the matrix Zr denote the “reality” model of the system. (As has been stated above, the modeller does not have access to this model; he/she has access only to the Z model). Thus

h = Zr[k; kd] + ε = [Z Zd][k; kd] + ε = Zk + Zdkd + ε (4.2.14)

where [k; kd] denotes the vector formed by stacking k above kd.

Let the real world model matrix Zr be subjected to singular value decomposition, so that

Zr = UrSrVrᵗ (4.2.15)

Suppose that the null space of Zr has one dimension and that its solution space therefore has 2 dimensions. The three vi vectors that comprise the columns of Vr are added to the three native model parameter vectors in figure 4.3. v1 and v2 span the solution space of Zr while v3 spans its null space.



Figure 4.3. Model parameters together with the vi vectors (shown in black) which result from singular value decomposition of the real world model matrix Zr.

Let the vector kr represent the three parameters of the real world model Zr. Suppose that it is possible to actually build and then calibrate this model. The calibration process of the Zr model would yield the vector k̄r shown in figure 4.4. This is the projection of kr onto the solution space of the real world model matrix Zr. This estimate of parameters is not, of course, correct. But it is of minimum error variance (and hence without bias) because the parameter set k̄r allows model outputs to fit the calibration dataset to a level that is commensurate with measurement noise while possessing no null space components; the latter are, by definition, unsupported by the calibration dataset.



Figure 4.4. True model parameter set kr and parameter set k̄r (shown in red) that would be estimated through an "ideal" calibration process undertaken using the real world model Zr.

Unfortunately, the real world model Zr cannot be calibrated because a modeller does not have access to it. He/she can only calibrate the simplified model Z. It is through estimation of the k1 and k2 parameters comprising the vector k of equation 4.2.14 that a good fit is sought with the calibration dataset. Obviously, there are only two of these. Meanwhile the third parameter kd of the simple model, which expresses its defective nature with respect to the real world system, is fixed at a certain value, this value being implied in construction of the simplified, defective model.

Because the dimensionality of the solution space of Zr is two, the two parameters of the simplified Z model are enough to support a good fit between its outputs and the calibration dataset (provided that neither of its parameters lies entirely within the null space of Zr). Let k̄ denote the parameter set achieved through calibration of the Z model. The vector corresponding to this parameter set must lie in the k1/k2 plane. At the same time, its projection onto the solution space of the real world model (i.e. the space spanned by v1 and v2) must be k̄r; if this is not the case then k̄ would not allow the model to fit the calibration dataset. This is depicted in figure 4.5.



Figure 4.5. Calibration of the Z model through adjustment of only k1 and k2 with kd fixed leads to the vector k̄ (shown in blue) which projects onto the solution space of the real world model Zr as k̄r.

While the projection of k̄ onto the solution space of the real world model is correct (as it must be for the simple model to fit the calibration dataset), its projection onto the null space of the real world model is non-zero, whereas it would need to be zero for k̄ to claim minimum error variance status. To the extent that any prediction required of the simple model is sensitive to null space components of the real world model, that prediction will therefore have gained an unquantifiable bias through calibration of the simple model.
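The geometry described above can be verified numerically for a small synthetic example. In the sketch below (Python with numpy; the reality matrix Zr and the true parameter values are arbitrary assumptions), a two-parameter simple model is calibrated while the single defect parameter is held fixed, and the projections of the result onto the solution and null spaces of Zr are then examined.

    import numpy as np

    rng = np.random.default_rng(5)
    Zr = rng.standard_normal((2, 3))           # "reality" matrix: three parameters, two observations
    Z = Zr[:, :2]                              # the simple model sees only the first two parameters
    kr = np.array([1.0, -0.5, 2.0])            # assumed true values of k1, k2 and kd1
    h = Zr @ kr                                # noise-free calibration dataset

    # SVD of the reality matrix; its solution space is spanned by the first two columns of Vr.
    Ur, sr, Vrt = np.linalg.svd(Zr)
    Vr = Vrt.T
    Vsol, Vnull = Vr[:, :2], Vr[:, 2:]

    # Ideal calibration of the reality model: projection of kr onto its solution space.
    kr_bar = Vsol @ Vsol.T @ kr

    # Calibration of the simple model: adjust k1 and k2 only, with kd1 fixed (here at zero).
    k_bar = np.linalg.solve(Z, h)
    k_full = np.append(k_bar, 0.0)             # the blue vector of figure 4.5

    # Its projection onto the solution space of Zr coincides with kr_bar (the fit is preserved),
    # but its projection onto the null space of Zr is generally non-zero, unlike that of kr_bar.
    print(np.allclose(Vsol @ Vsol.T @ k_full, kr_bar))          # True
    print(Vnull.T @ k_full, Vnull.T @ kr_bar)                   # non-zero versus (numerically) zero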

For further discussion, see Doherty (2015).

4.4 Regularisation and Simplification

The above discussion suggests that model simplification can be viewed as a form of regularisation. Recall from the discussion of chapter 3 that "regularisation" is the term used to describe the process through which a unique solution is found to an inherently nonunique inverse problem. As was discussed above, there is a metric for optimality of this unique solution. Obviously, it is desirable for the calibrated model to fit the calibration dataset well; in doing so it has extracted the information content of that dataset and transferred it to the model. However, because the inverse problem is nonunique, there are an infinite number of ways to fit the calibration dataset. The "best" way is that which achieves a parameter set of minimized error variance. Any prediction which is made using the calibrated model is therefore also of minimized error variance; that is, it is without bias.

It was shown in chapter 3 that singular value decomposition achieves this minimum error variance solution to the inverse problem. If necessary, it must be applied to an inverse problem which is modified through pre-calibration Karhunen-Loève transformation of its parameters. (Watson et al, 2013, demonstrate how parameter and predictive bias may be incurred if this is not done.)


Model simplification also achieves parameter decomposition. Ideally, a simplified model employs enough (appropriately designed) parameters to allow a good fit to be achieved between model outputs and members of the calibration dataset. However the decomposition implied by construction of the simple model may not be ideal. That is, parameter decomposition implied by construction of the simple model may depart from “optimal simplification” implied by singular value decomposition of the (unattainable) “real world model”. Hence calibration of the simple model may lead to entrainment of null space parameter components of the notional real world model. Some predictions made by the simplified model may therefore incur bias through the act of calibrating that model. Other predictions, particularly those that are similar in nature to measurements comprising the calibration dataset, will not.

For the making of predictions that are similar in nature to those comprising the calibration dataset, a simplified model is therefore "fit for purpose" as long as it can fit the calibration dataset well. It may not be "fit for purpose" when asked to make other predictions, however. This does not mean that it should not be asked to make these other predictions. What it does mean is that the simple model may need to be re-calibrated before making them in order to render it fit for this new purpose. The revised calibration procedure may adopt an inversion formulation that filters out components of the calibration dataset that may induce bias in these new predictions; at the same time, a greater misfit may be tolerated between model outputs and the calibration dataset.

4.5 Non-Linear Analysis

Despite their linear origins, the equations developed in the present chapter can be used as a basis for nonlinear analysis through which the costs and benefits of model simplification can be explored. First, we rewrite equation 4.2.11 as equation 4.5.1 after slightly re-arranging terms.

s = s̄ + yᵗV2V2ᵗk – yᵗV1S1⁻¹U1ᵗε – (yᵗV1S1⁻¹U1ᵗZd – ydᵗ)kd (4.5.1)

Suppose that a modeller has built a complex model for a study site, as well as a simpler model for the same site. Let us assume that defects associated with the complex model do not compromise its predictions, and that they do not induce parameter surrogacy through its calibration. These are standard (though not necessarily correct) assumptions that underpin most complex model usage. The simple model employs a parameter set k. However implied in its construction is a defect parameter set kd. Elements of kd are unknown and non-adjustable. For reasons discussed above, however, their existence may compromise the use of the simple model.

Suppose now that the modeller populates the complex model with N different random realisations of the (appropriately complex) set of parameters employed by this model. (The complex model may not even employ explicit parameters; perhaps its parameterization is based on stochastic, geostatistically-based, hydraulic property fields.) For each such realisation, the complex model is run to produce a set of outputs which correspond to the measurements h which comprise the calibration dataset. Realisations of measurement noise ε are added to these outputs. For each of these N parameter fields and measurement noise realisations, the simple model is then calibrated against the complex-model-generated calibration dataset to yield a parameter set k̄ corresponding to the original complex model parameter field. Ideally the simple model runs fast enough to undergo rapid calibration, yet is complex enough to support a good fit with the calibration dataset. The outcome of this process is N random realisations of the complex model's parameter field, and N corresponding k̄ parameter sets which allow the simple model to match a calibration dataset generated by the complex model.


Suppose now that a prediction s of interest is made by the complex model using each of its N random parameter sets. The simple model is then used to make this same prediction using each of its respective k̄ parameter sets. We designate the prediction made by the simple model as s̄. If s (the prediction made by the complex model) is plotted against s̄ (the prediction made by the simple model with the partnered parameter field k̄), a graph such as that shown in figure 4.6 results.

Figure 4.6. A plot of predictions s made by a complex model against predictions s̄ made by a simple model. On each occasion the simple model is calibrated against outputs generated by the complex model. (From Doherty, 2015.)

Doherty and Christensen (2011) analyse the properties of scatterplots such as those depicted schematically in figure 4.6. As is illustrated in that figure, prior parameter uncertainty is expressed by the vertical range of the plot. Where simplification is ideal, a line of best fit through the scatterplot has a slope of unity and passes through the origin. However, to the extent that calibration of the simple model induces bias in predictions made by that model because of the surrogate roles that simple model parameters must play to compensate for its defects, the slope of the best-fit line through the scatterplot falls below 1.0. This indicates that the range of predictions made by the calibrated simple model can be greater than those made by the complex model (and, by inference, those which are possible in reality). Alternatively, where the line of best-fit has a slope of greater than unity, this illustrates an incapacity on the part of the simple model to simulate the range of conditions that may prevail in the future. In terms of equation 4.2.1, this is an outcome of correlation of kd with k, an expression of the fact that, as far as the prediction of interest is concerned, the simple model is not fit for purpose.
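A sketch of this paired-model analysis for the linear case is given below (Python with numpy; both models, their prediction sensitivities and the noise level are randomly generated assumptions, and the deliberately defective simple model is obtained simply by discarding most of the complex model's parameters).

    import numpy as np

    rng = np.random.default_rng(6)
    n_obs, n_complex, n_simple, N = 30, 60, 8, 200
    Zc = rng.standard_normal((n_obs, n_complex))     # complex model under calibration conditions
    yc = rng.standard_normal(n_complex)              # its prediction sensitivities
    Zs = Zc[:, :n_simple]                            # a deliberately defective simple model
    ys = yc[:n_simple]                               # the simple model's prediction sensitivities
    sigma_e = 0.05                                   # assumed measurement noise standard deviation

    s_complex, s_simple = [], []
    for _ in range(N):
        kc = rng.standard_normal(n_complex)          # random realisation of complex model parameters
        h = Zc @ kc + sigma_e * rng.standard_normal(n_obs)   # synthetic calibration dataset
        ks = np.linalg.pinv(Zs) @ h                  # least-squares calibration of the simple model
        s_complex.append(yc @ kc)                    # prediction made by the complex model
        s_simple.append(ys @ ks)                     # the same prediction made by the calibrated simple model

    # Slope of the line of best fit through the figure 4.6 scatterplot; a slope of unity
    # would indicate ideal simplification for this prediction.
    slope = np.polyfit(s_simple, s_complex, 1)[0]
    print(slope)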

Suppose now that the simple model is calibrated one more time – this time against the real-world dataset instead of a synthetic dataset generated by the complex model of that study site. The simple model is then used to make the prediction s̄. Doherty and Christensen (2011) show that the prediction of minimum error variance that the complex model would have made if it had been calibrated can be inferred from a graph such as that shown in figure 4.6. The post-calibration predictive error variance associated with that prediction can also be ascertained; see figure 4.7. This error variance is the same as that which would have been calculated using equation 3.4.20 (if the model were linear), but using the real model.


Figure 4.7. Estimation of the prediction, and its associated error variance, that would have been made using a complex model when that prediction is actually made with a paired simple model.


5. Repercussions for Construction, Calibration and Deployment of Simple Models

5.1 Introduction

The present chapter examines some of the repercussions of the theory presented in the previous chapter for how a simple model should be built, calibrated and deployed in the decision-making context. Recall from chapter 2 that the role of modelling in decision-support is to enable risk assessment. Conceptually, this requires that the model be capable of testing hypotheses that certain bad things will happen if certain courses of management action are taken. Rejection of the hypothesis that a certain bad thing will happen may allow that course of action to proceed. Modelling fails as a basis for decision-support where a hypothesis is rejected that should not be rejected; a type II statistical error is therefore committed. Predictive uncertainty intervals calculated by a simple model must therefore be conservative. In being so, they must account for any bias that is introduced to the simple model through its construction and through its calibration. However predictive uncertainty intervals must not be so conservative as to render the support that modelling provides to the decision-making process meaningless.

The role of modelling in decision-support, and the fact that this role must take into account the defective nature of both complex and simple models, is addressed by Doherty and Simmons (2013) and by Doherty and Vogwill (2015). The discussion provided in the present chapter draws on conclusions presented in both of these documents. At the same time it expands the discussion to include other facets of simple model usage in the decision-making context.

5.2 Benefits and Drawbacks of Complex Models

From the theoretical developments of the preceding chapter it can be concluded that simple models should be used with caution. The same theory also indicates what to be cautious about. Before discussing how caution should be exercised in simple model deployment, it is worthwhile considering why there is an incentive to employ a simple model in place of a complex model in the first place. In doing so, some of the points made in chapter 2 of this document are revisited.

Conceptually, complex models have the following advantages.

The parameters and processes which they embody are represented in ways that can be informed by expert knowledge. Complex models are generally physically-based. Their parameters pertain to system properties that can be measured in the field or in a laboratory. The geometries of their constituent subdomains are directly inferable from real-world measurements, this reducing the burden of calibration in providing values for many of the parameters employed by these models.

Model complexity supports representation of system property (and therefore parameterization) detail. Not only the calibration solution space, but also the calibration null space, can therefore be adequately represented in the model. Representation of the former space allows the model to replicate historical system behaviour. Good model-to-measurement fits can therefore be attained through calibration of a complex model. Representation of the null space ensures that the uncertainties of decision-critical predictions can be properly explored. The greater the degree to which these predictions are sensitive to non-inferable parameterization detail, the more important this becomes.

There are strong sociological pressures to build a complex model. Uninformed stakeholders demand that a numerical model be in accordance with concepts of "a model" drawn from other contexts. Where models compete for public or judicial approval, a model which looks more like "the real thing" is generally favoured over one that does not for, in the eyes of the mathematically illiterate, it has already demonstrated its superior abilities to emulate an environmental system. Looks are everything.

Models that are built by one party are generally reviewed by another party. Those who are paid to build models, and who need the approval of reviewers in order to satisfy their clients, are disinclined to resort to abstraction and simplicity, as this approach to model-based decision-support is more likely to meet with peer disapproval than attempts to more “accurately” simulate the nuances of environmental processes. In short, reviewers are less likely to object to a complex model than to a simple model.

At the same time, complex models have many disadvantages. In many modelling contexts, these heavily outweigh their advantages. They include the following.

The run times of complex models can be inordinately long. While the physical basis of a complex model may provide a vehicle for expression of expert knowledge, it is rarely given proper voice in everyday modelling practice. As is discussed in chapter 3 of this document, expert knowledge is a stochastic quantity. The greater is the level of detail that is represented in a complex model, the greater is the uncertainty associated with this detail. Stochasticity can be given expression by running a model many times using different random realisations of its parameters. This becomes impractical where a model takes a long time to run. It also requires the development of stochastic descriptors of parameter fields of high dimension. The skillsets required to do this are rare among environmental modelling practitioners.

Complex models are often numerically unstable. This increases run times. It also makes computation of finite-difference derivatives of model outputs with respect to adjustable parameters almost impossible. This makes calibration of complex models difficult, if not impossible. It also obstructs use of packages such as PEST which can obtain minimum error variance solutions to highly parameterized inverse problems. Unfortunately, in many practical complex modelling exercises the burden of calibration is eased by draping a simplistic parameterization scheme over the complex model domain. Such a strategy often compromises model-to-measurement fit achieved through the calibration process at the same time as it fails to achieve a minimum error variance parameter field.

Notwithstanding the availability of numerical methodologies such as null space Monte Carlo (Tonkin and Doherty, 2009; Doherty, 2016), calibration-constrained uncertainty analysis in highly parameterized contexts can be a time-consuming undertaking. Where model run times are large, and where model numerical stability is questionable, it is simply unachievable. Where it cannot be used to assess posterior predictive uncertainty, or is compromised in its attempts to do so, a complex model can contribute little to the decision-making process, unless it is used to assess prior uncertainty as a surrogate for posterior uncertainty. The latter course of action may prove useful for predictions whose uncertainties are reduced only mildly through the history-matching process.

“Complexity” is a relative thing. Complex models are more complex than simple models. However they are far less complex than reality. Their ability to quantify the uncertainty of a particular prediction may therefore be compromised by failure of the model to include in its parameterisation all null space parameter components to which the prediction may be sensitive. Some predictions made by a complex model may incur bias through calibration-induced parameter surrogacy, especially if the complex model is endowed with a simplistic parameterization scheme. The performance of a complex model in this regard may be little better than that of a cleverly-designed simple model.


Modellers often lose touch with the decisions that modelling must support when meeting the daily challenges of building a complex model. Once they have embarked on this time-consuming and expensive exercise, their overwhelming concern is to produce something “that works”. “Works” is often defined in terms of maintaining solution convergence while finding a parameter field that provides a fit with the calibration dataset that is good enough to be accepted by reviewers or clients. The need to ensure that decision-critical predictions are unbiased, and that their uncertainties are explored, is forgotten.

The case for providing decision-support using simple models should therefore be strong. However in building a simple model, the theoretical insights provided in the preceding chapter should be heeded. Some of the means through which they can be given practical voice in construction, calibration and deployment of a simple model are now discussed.

5.3 Notation

While the present chapter draws on theory presented in previous chapters, a slight change in notation is now introduced in order to make it easier to distinguish between a complex model and a simple model. In the following discussion the former is presumed to be a numerical substitute for reality; hence its defects are not considered.

For a complex model the y, Z and k notation is preserved. Hence under calibration conditions

h = Zk + ε (5.3.1)

When a complex model makes a prediction s, this is made using the equation

s = y^t k (5.3.2)

It is presumed that the simple model counterpart to a complex model employs a smaller number of parameters than the complex model. Its parameters will be specified by the vector p. Because the model is simple, it has defects; these are specified as pd. The action of a simple model on its parameters is specified by the matrix X; defect parameters are subject to the action of the matrix Xd. Thus

h = Xp + Xd pd + ε (5.3.3)

Equation 5.3.3 can be re-written as

h = Xp - Zk + Zk + Xd pd + ε = Zk + η + ε (5.3.4)

In equation 5.3.4 η is the “structural noise” induced by model simplification. Ideally a simple model should be complex enough for this to be small. In other words, the simple model should be complex enough to fit the calibration dataset well – or to fit the “necessary parts” of the calibration dataset well; see below.

Let the sensitivities of the prediction s to p and pd parameters be contained in the vectors w and wd. Thus

s = w^t p + wd^t pd (5.3.5)
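To make this notation concrete, the following minimal numpy sketch assembles entirely synthetic (hypothetical) Z, X and Xd matrices and evaluates equations 5.3.3 to 5.3.5, including the structural noise η induced by simplification. Dimensions and values are illustrative only.

```python
# A minimal sketch (synthetic matrices, not from any real model) of equations 5.3.3 to 5.3.5.
import numpy as np

rng = np.random.default_rng(0)

n_obs, n_k, n_p, n_pd = 20, 50, 5, 3          # observation and parameter counts (illustrative)
Z  = rng.normal(size=(n_obs, n_k))            # action of the complex model on k (eq 5.3.1)
X  = rng.normal(size=(n_obs, n_p))            # action of the simple model on p (eq 5.3.3)
Xd = rng.normal(size=(n_obs, n_pd))           # action of the simple model on defect parameters pd

k   = rng.normal(size=n_k)                    # "real world" parameters
p   = rng.normal(size=n_p)                    # simple model parameters
pd  = 0.1 * rng.normal(size=n_pd)             # defect parameters
eps = 0.05 * rng.normal(size=n_obs)           # measurement noise

eta = X @ p - Z @ k + Xd @ pd                 # structural noise of equation 5.3.4
h   = Z @ k + eta + eps                       # equivalently h = X p + Xd pd + eps

# Prediction made by the simple model (equation 5.3.5)
w, wd = rng.normal(size=n_p), rng.normal(size=n_pd)
s = w @ p + wd @ pd
print("structural noise std:", eta.std(), " prediction:", s)
```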

5.4 Prediction-Specific Modelling

Complex models are often asked to make a variety of predictions of the future behaviour of an environmental system under a variety of future stresses, some of which may be different from any stresses to which the system has been subjected in the past. Given the fact that a complex model is simple compared to reality, the expectations that are thus placed on it are questionable.


Nevertheless, this often defines the context in which it is built and deployed, the justification being that, as a physically-based simulator of an environmental system, it can be asked to predict any aspect of system behaviour under any conditions.

The notion that a numerical model should be considered more as a provider of receptacles for information than as a simulator of complex environmental behaviour was discussed in chapters 1 and 2 of this document. While a complex model is rarely viewed from this standpoint (wrongly in the authors’ opinion), it is important that a simple model be viewed in this way. Furthermore, the design of a simple model should focus on a specific prediction of management interest so that its performance in making this prediction can be optimised. This occurs when

- it can make the prediction with as little bias as possible;
- it can quantify the uncertainty of the prediction;
- it can be guaranteed not to underestimate the uncertainty of the prediction (thereby avoiding a type II statistical error); while
- reducing the uncertainty of the prediction as much as may be required to test, and maybe reject, the likelihood of occurrence of an unwanted event.

The last specification is met if the simple model provides receptacles for the information contained within either or both of expert knowledge and measurements of system behaviour through which inconsistency of the occurrence of the unwanted event with this information can be demonstrated.

A simple model may not be capable of providing receptacles for all expert knowledge and measurement information that is available at a particular study site. Nor should it. Simplicity (and with it numerical tractability) may be gained if it only provides receptacles for the information which is pertinent to the prediction that it is designed to support.

5.5 Playing to a Model’s Strengths

The ability of a simple model to make predictions of future environmental behaviour with integrity is compromised by the presence of defect parameters pd. (The same applies, of course, to a complex model, for it too is defective.) Some predictions will be more sensitive than others to pd parameters. In general, the greater the extent to which a particular model prediction pertains to nuances of environmental behaviour that occur at specific locations within a model domain, and the greater the extent to which the prediction depends on subtleties and details of environmental processes that are difficult to represent in a numerical simulator, the more likely it is that the prediction is sensitive to pd parameters. Unfortunately, many predictions that are required of a model are of this type. These include the response of an environmental system to extreme climatic events (for example to prolonged droughts), and calculation of indicators of aquatic biotic health such as the number of days on which stream flow will be below a certain threshold, and/or the number of days over which nutrient concentrations will be above a certain threshold.

These same types of prediction are often subject to a high degree of uncertainty. Conceptually, if a model is complex enough, and if it includes all processes and parameters to which a prediction of this type is sensitive (whereby pd parameters of equation 5.3.5 are included in k parameters of equation 5.3.2), the uncertainty of this prediction can be quantified. Perhaps it can also be reduced through conditioning by the calibration dataset. In practice, increased complexity may fail to achieve either of these goals because

- even complex models employ simplistic representations of processes which are salient to predictions of these types;
- the long run times of complex models may preclude stochastic analysis of uncertainty;
- the long run times and questionable numerical stability of complex models may preclude reduction of predictive uncertainty through history-matching, and/or quantification of post-history-matching predictive uncertainty.

A simple model will almost certainly run faster than a complex model. However its representation of prediction-salient processes may be even more compromised, thus exacerbating the problems that it encounters in making certain types of predictions.

A question that therefore arises is whether environmental policy should be based on quantities that models (especially simple models) can calculate with integrity rather than on those which are difficult to calculate, and whose uncertainty cannot be quantified. The assistance that models provide to environmental management may be illusory, or even negative, when that assistance is based on the false premise that key model predictions are made without bias, and/or that the magnitude of possible predictive bias can be included in quantified uncertainty intervals. In contrast, to the extent that environmental decision-making is based on quantities that a model is demonstrably capable of predicting with integrity, modelling can provide better support to the decision-making process. The chances that it may diminish, rather than enhance, the decision-making process will thereby be reduced.

Expectations by the non-modelling community of what models can deliver are generally too high. These expectations are rarely disavowed by modellers themselves, as few understand the ramifications of model defects on model performance, and even fewer wish to suggest to a potential client that a large investment in modelling may be wasted. However, if decision-making was focussed more on quantities which are somewhat immune from model defects, the role that modelling can play in that process would be greatly enhanced. At the same time, it may be able to do so at lower cost. In general, these quantities tend to be differences rather than absolutes. In any particular management context they may include:

- the extent to which management option A improves stream quality over management option B;
- the extent to which durations of low flow will be increased or decreased above/below their present levels following specific changes in land management;
- the difference in drawdown at a particular observation well following an alteration to pumping at a particular production well;
- the extent to which an historical measurement of water quality would have been improved if a different land management protocol was in place at the time of the measurement.

Uncertainty analysis undertaken with a complex model would probably demonstrate that all of these predictions are accompanied by less uncertainty than predictions of actual low-flow durations, water levels and nutrient concentrations. This implies that predictive differences of these types are less sensitive to null space parameter components and to model defects than are predictive absolutes. The need to represent these null space components in a decision-support model, and/or to ensure that a decision-support model has no defects which bias these predictions, is thus reduced. This grants the modelling process a license for simplicity.

If support for environmental management can be demonstrated to be more robust when provided by a simple model than by a complex model, more such models can be built at more locations than would otherwise be the case. This would further enhance the ability of modelling to support environmental management.


5.6 Dispensing with the Need for Model Calibration

Where the information content of a calibration dataset h is limited with respect to a prediction of management interest (either because the dataset is small, noisy, or simply uninformative of the prediction), this implies that the prediction required of the model is sensitive to null space parameter components of the real world model k. In these circumstances a modeller may judge that the benefit of model calibration in reducing uncertainty is not worth the risk of predictive bias that calibration of a simple model may incur. He/she may thus decide to eschew calibration of the simple model, and make the prediction using parameters p that are informed by expert knowledge alone. Predictive uncertainty can then be explored by making the prediction many times with different stochastic realizations of p. The prior predictive probability distribution is thus used as a surrogate for the posterior predictive probability distribution.

Provided that:

- the prediction of interest is not sensitive to model defects (or can be formulated to be such), and
- realistic prior probabilities can be assigned to the elements of p, notwithstanding their abstract nature,

such a procedure, if properly undertaken, should provide conservative predictive intervals, and therefore avoid a type II statistical error (and thus failure of the modelling process according to the metrics set out in chapter 2). Conservatism arises from the fact that the posterior predictive probability distribution should, in theory, be no wider than the prior predictive probability distribution because of the constraining effect of the likelihood function on the former. However if data paucity, poor data quality, or poor data relevance suggests that its constraining effect will be small, then the degree of conservatism that is accepted through adoption of the prior predictive uncertainty interval as a surrogate for the posterior predictive uncertainty interval may not be too great.

While adoption of such a strategy can be justified on the basis of theory provided in previous chapters, it may meet with opposition from modelling stakeholders and reviewers who judge that a model can have no predictive integrity unless it is “calibrated”. While this viewpoint is not supported by mathematics presented in this document, there is nevertheless some merit in ensuring that model-calculated quantities are representative of the behaviour of the environmental system whose management it is designed to support. Hence there is validity in comparing outputs of the simple model with the historical behaviour of that system. However, there is no need to actually “calibrate” the simple model. Instead, its outputs can be compared with the calibration dataset in a probabilistic sense in order to ensure that these outputs, when generated with a variety of stochastic realizations of p, encompass the calibration dataset even if no particular p provides a fit with this dataset that can be considered to “calibrate” the model. (As is described shortly, it may be useful to process the calibration dataset before fitting model outputs to it. If this is done, processed measurements should be compared with corresponding processed model outputs.) The situation is schematized in figure 5.1.
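The following sketch illustrates, under heavy assumptions, what such a probabilistic comparison might look like in practice. The simulate() function and the observations.dat file are hypothetical placeholders for a fast simple model and its calibration dataset; the prior is assumed Gaussian for simplicity.

```python
# A minimal sketch of prior Monte Carlo prediction and "stochastic fitting" (figure 5.1).
import numpy as np

rng = np.random.default_rng(1)
obs = np.loadtxt("observations.dat")              # hypothetical calibration dataset h

def simulate(p):
    """Placeholder for the simple model: returns (outputs matching obs, prediction s)."""
    outputs = p[0] + p[1] * np.arange(obs.size)   # stand-in calculation only
    s = p[0] * 10.0
    return outputs, s

n_real = 500
prior_mean, prior_sd = np.array([1.0, 0.1]), np.array([0.5, 0.05])   # expert-knowledge prior

preds, sims = [], []
for _ in range(n_real):
    p = rng.normal(prior_mean, prior_sd)          # stochastic realization of p
    outputs, s = simulate(p)
    sims.append(outputs)
    preds.append(s)

sims, preds = np.array(sims), np.array(preds)
lo, hi = np.percentile(sims, [5, 95], axis=0)
coverage = np.mean((obs >= lo) & (obs <= hi))     # fraction of observations inside the 90% band
print("prior 90% predictive interval:", np.percentile(preds, [5, 95]))
print("fraction of observations encompassed:", coverage)
```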


[Figure: measured quantity versus time]

Figure 5.1. A schematic example of “stochastic fitting” of model outputs to measurements of system state.

The decision to use prior predictive uncertainty as a surrogate for posterior predictive uncertainty may be a difficult one to make for, in general, a modeller cannot know the extent to which the calibration process will induce bias in predictions in which he/she is most interested. After all, he/she has no access to the pd parameters and the Xd and wd matrix/vector of equations 5.3.3 to 5.3.5; the modeller has access to only the X matrix and p vector featured in these equations. Nevertheless, these items may provide enough information to justify the decision to forego model calibration. If linear analysis, undertaken using tools of the PEST or PyEMU suites, demonstrates that the uncertainty of a particular prediction is unlikely to be reduced much through history-matching, even if such an analysis ignores Xd, pd and wd, then the decision is probably being made on good grounds.

Where stochastic analysis is untenable because of time constraints, or because it is difficult to know the prior probability distribution of p parameters, then worst case scenario analysis may provide a sound basis for testing, and attempting rejection, of hypotheses pertaining to the occurrence of unwanted events. This, of course, is simple to implement in a context where calibration constraints are not applied. A modeller simply maximizes or minimizes w^t p of equation 5.3.5 by choosing values for the elements of p appropriately. Here it is assumed that the elements of wd are small because model simplification is tuned to the prediction of interest; the pd vector does not therefore compromise the model’s ability to make that prediction.
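A minimal sketch of this worst-case calculation is given below, assuming the linear relationship of equation 5.3.5 and simple expert-knowledge bounds on p; the numbers are illustrative only.

```python
# Worst-case analysis under the linear approximation: with no calibration constraints,
# w.p is maximized (or minimized) simply by setting each element of p to its upper or
# lower expert-knowledge bound according to the sign of the corresponding element of w.
import numpy as np

w    = np.array([ 2.0, -0.5, 1.3])            # prediction sensitivities (illustrative values)
p_lo = np.array([ 0.1,  0.5, 0.0])            # lower bounds from expert knowledge
p_hi = np.array([ 1.0,  2.0, 3.0])            # upper bounds from expert knowledge

p_worst = np.where(w > 0, p_hi, p_lo)         # maximizes w.p
p_best  = np.where(w > 0, p_lo, p_hi)         # minimizes w.p
print("maximum of w.p:", w @ p_worst, " minimum of w.p:", w @ p_best)
```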

5.7 Tuning the Calibration Dataset to Match the Prediction

As stated above, prediction-specificity is the key to appropriate model simplification. It follows that in a single study area, a number of different models may be constructed, each optimised to make a specific prediction and to examine the uncertainty in that prediction. The question of how a simple model may be prediction-optimised through the way it is calibrated is now addressed. See also the discussion above pertaining to the nature of predictions that are best sought from a simplified model.

Let us start by considering the making of a prediction by a complex model. From equation 5.3.2

s = y^t k = y^t k̄ - y^t(k̄ - k) (5.7.1)


where k̄ is the minimum error variance solution to the inverse problem of model calibration (the overbar denoting the calibrated parameter estimate). From (3.4.9) and (3.4.14), this becomes

s = y^t k = y^t V1 S1^-1 U1^t h + y^t V2 V2^t k - y^t V1 S1^-1 U1^t ε (5.7.2)

Ignoring measurement noise (for simplicity), this can be written as:

s = Σ(i=1 to n) si^-1 y^t v1i u1i^t h + Σ(i=n+1 to m) y^t v2i v2i^t k (5.7.3)

where n is the number of pre-truncation singular values, m is the total number of parameters comprising the vector k, v1i are unit vectors comprising the columns of V1, and v2i are unit vectors comprising the columns of V2. The first summation provides the prediction of minimum error variance while the second summation must be done probabilistically for exploration of uncertainty using, for example, different realizations of k based on expert knowledge. This can be accomplished using a methodology such as null space Monte Carlo; see Doherty (2016). Alternatively, it can be accomplished using methods discussed in chapter 6 which better accommodate the design of simple models.

For the moment, let us focus on the first summation that yields the prediction of minimized post-calibration error variance, so that

s = Σ(i=1 to n) si^-1 y^t v1i u1i^t h (5.7.4)

Equation 5.7.4 can be re-written as

s = Σ(i=1 to n) si^-1 αi u1i^t h (5.7.5)

where

αi = y^t v1i (5.7.6)

Each u1i^t h in equation 5.7.5 is a single number. It is the value of the observation dataset projected onto a single unit vector spanning the range space of the model. It can be considered to be the value of a combination of observations that is informative of a combination of parameters v1i used by the model; see Doherty (2015). The extent to which this combination of observations contains information that is relevant to the prediction is determined by the respective αi. Where the v1i which defines the combination of parameters that is uniquely and entirely informed by u1i^t h is nearly orthogonal to the predictive sensitivity vector y, the respective combination of observations lacks information that is pertinent to the prediction. It is also apparent that, regardless of its size, any calibration dataset possesses a finite number of useable pieces of information (this being equal to the number of pre-truncation singular values). Many of these may be of little pertinence to the prediction required of the model.
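The following numpy sketch, using entirely synthetic sensitivities, illustrates the decomposition of equations 5.7.4 to 5.7.6: the calibration dataset is projected onto the left singular vectors of Z, and αi measures the relevance of each such projection to the prediction.

```python
# Synthetic illustration of equations 5.7.4 to 5.7.6 using numpy's SVD.
import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(30, 60))                 # complex model sensitivity matrix (illustrative)
y = rng.normal(size=60)                       # prediction sensitivity vector
h = rng.normal(size=30)                       # calibration dataset

U, s_vals, Vt = np.linalg.svd(Z, full_matrices=False)
n_trunc = 10                                  # number of pre-truncation singular values

# Contribution of each retained singular component to the calibrated prediction (eq 5.7.5)
alpha = Vt[:n_trunc] @ y                      # alpha_i = y^t v1i
proj  = U[:, :n_trunc].T @ h                  # u1i^t h, one number per component
contrib = alpha * proj / s_vals[:n_trunc]
s_pred = contrib.sum()
print("prediction (solution space part):", s_pred)
print("relative relevance of each observation combination:", np.abs(contrib) / np.abs(contrib).sum())
```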

These concepts can free the designer of a simple model from the imperative of designing a model that is able to replicate all aspects of the historical behaviour of a system whose management it is intended to support. There are some aspects of this behaviour that the simple model must reproduce in order to constrain the prediction that it is required to make. At the same time, there are other aspects of this behaviour that it does not need to reproduce. In other words, the simple model does not need to provide receptacles for the information content of those aspects of historical system behaviour that have no bearing on the prediction that is the focus of its design. As a result, the model does not need to be as complicated as it would need to be if it were being asked to make more than the single prediction. (The analysis can easily be extended to multiple predictions, with similar conclusions.)

Despite the fact that most models are nonlinear, the above considerations are nevertheless salient to the design of any prediction-specific, simple model. The simple model must provide receptacles for information that is salient to the prediction that it supports; it does not need to provide receptacles for other information. This can reduce its complexity and increase its speed of execution. However, in calibrating the model, measurements comprising the calibration dataset may require processing (as do corresponding numbers calculated by the model) so that aspects of this dataset which the model cannot fit (because it provides no receptacles for the information which they contain) do not erode the capacity of the model to fit those aspects of the calibration dataset which it can fit, or give the visual impression that the model is poorly calibrated.

This strategy of simple model deployment is actually quite common. For example, if predictions required for decision-support pertain to permanent alterations to a groundwater system, these can be made with a steady state model. Despite the fact that the historical behaviour of the system may show seasonal variations, steady state conditions may be assumed for its calibration. While transient data is richer in information than averaged, steady state data, much of this information is not salient to predictions required of the model. Furthermore, a transient calibration process would require a much more complicated model for which the information content of the calibration dataset would have to inform more parameters (storage and recharge parameters). Hence the capacity of the model to make the steady state prediction may be no better for having undergone transient calibration. Instead, a steady state calibration in which model outputs are compared with appropriately processed historical observations can provide the model with predictive abilities of the desired type which are in no way inferior to those of a vastly more complicated model that must undergo a vastly more complex calibration process.

Another means through which the above strategy can be implemented is through formulation of a multi-component objective function (i.e. sum of weighted squared residuals) in which the same data, processed in different ways, comprises the different components of the objective function. Each component of the objective function is weighted to ensure roughly equal visibility at the commencement of the inversion process. The benefits of data transformation prior to fitting, particularly where transformation involves spatial and temporal differencing, were discussed in chapter 4. As is stated therein, such processing can “orthogonalize out” information that would otherwise be directed to model receptacles that would bias some predictions as the model is calibrated. In general, a modeller is not aware of just how non-optimal his/her model is, nor of the amount of bias that the calibration process will induce in any particular prediction. Nevertheless, this should not stop him/her from taking some precautions against the unwanted side-effects of simplification. By admitting both the original and transformed (in multiple ways) data into the calibration process, some defence against these side-effects is provided. Further defences can then be put into place if initial calibration of the model provides continued evidence of non-optimality of simplification through estimation of parameter values that are highly dubious from an expert knowledge point of view. See Doherty and Welter (2010) and Doherty (2015) for further discussion of this approach.
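The following sketch illustrates one way in which such initial balancing might be implemented; the file names, the choice of temporal differencing as the processing operation, and the target contribution are all hypothetical.

```python
# Balancing a multi-component objective function: the same data are processed in different
# ways (here raw values and temporal differences), and each component is weighted so that it
# contributes roughly equally to the total objective function at the start of the inversion.
import numpy as np

def phi(residuals, weight):
    """Weighted sum of squared residuals for one objective function component."""
    return np.sum((weight * residuals) ** 2)

obs = np.loadtxt("observations.dat")          # hypothetical calibration dataset
sim = np.loadtxt("initial_model_outputs.dat") # outputs of the uncalibrated simple model

components = {
    "raw":        obs - sim,                  # fit to the measurements themselves
    "difference": np.diff(obs) - np.diff(sim) # fit to their temporal differences
}

target_per_component = 100.0                  # desired initial contribution (arbitrary)
weights = {name: np.sqrt(target_per_component / phi(r, 1.0)) for name, r in components.items()}

for name, r in components.items():
    print(name, "initial contribution after weighting:", phi(r, weights[name]))
```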


5.8 Model Optimality and Goodness of Fit

Optimality of simplification was discussed in chapter 4. There it was pointed out that model simplification can be considered as a form of parameter decomposition. Ideally, the model simplification process should perform a role similar to that of singular value decomposition. It should separate parameter space into two orthogonal subspaces such that calibration of the simple model does not entrain real world null space parameter combinations that can bias certain predictions.

Though not often stated in these terms in the modelling literature, many instances of model simplification tend to follow this precept. Hence, for example, an “upper soil moisture store” and a “lower soil moisture store” may form important components of a rainfall-runoff model or a land use model whose domain spans a watershed of large area. Implied in definition of these storages is the notion that the information content of the calibration dataset is sufficient to parameterize each of these separately. If a more complex model were used instead, this same information would flow to similar receptacles that are not defined in these simplistic terms, but are defined using many more spatial parameters that have greater hydrogeological meaning. However few of the greater number of parameters associated with the latter conceptualization of storage mechanisms that prevail in the watershed would be individually identifiable on the basis of the calibration dataset. Nevertheless the post-calibration correlation that these parameters exhibit because of this lack of information would be internal to each storage, or would cross between these storages to only a limited degree.

It was also stated in chapter 4 that there is no need for model simplification to be optimal where predictions required of a model are entirely dependent on solution space components of the “real world model”. Of course, a modeller cannot determine whether this is actually the case because he/she has no access to the real world model. However, as has been stated, where a prediction is similar in nature to measurements comprising the calibration dataset this is likely to be the case. This often applies to rainfall-runoff models, except where the latter are asked to make predictions pertaining to extremes of rainfall or drought that are not represented in the calibration dataset; these predictions are likely to exhibit sensitivity to real-world null space parameter combinations.

In some modelling contexts, therefore, optimality of model simplification is not an issue. In these contexts a modeller should seek as good a fit with the calibration dataset as he/she can (with due consideration taken of measurement noise associated with the calibration dataset). Information contained within this calibration dataset is transferred directly to the prediction, with no possibility of predictive bias. However in modelling contexts where there is a possibility of calibration-induced predictive bias, this strategy may not be optimal. In that case a modeller may purposefully seek a misfit with the calibration dataset that is greater than that which would be judged as optimal on the basis of measurement noise in order to avoid bias when using the model to make certain predictions; see figure 4.1. The result will be a higher calculated uncertainty for those predictions (as evaluated using the simple model) than would have been the case if a better fit with the calibration dataset had been attained. However predictive error (which cannot be calculated using the simple model) may be less.

A problem with this approach, however, is that a modeller is often unaware of how great a misfit he/she should seek with the calibration dataset. Sadly, this problem is unavoidable.

It is important to note that this approach to calibration raises the spectre that a model should be “calibrated tightly” for the purpose of making some predictions and “calibrated loosely” for the purpose of making other predictions. This concept is at odds with common modelling practice. Nevertheless it is supported by theory outlined in this document.


5.9 Detecting Non-Optimality of Simplification

Despite the fact that optimality of simplification is subject to theoretical characterization, it is difficult to verify in real world modelling practice. Furthermore, for reasons discussed above, it may not matter. Theory presented in chapter 4 shows that when model defects are taken into account, the link between good parameters and good predictions is broken. A non-optimal model may therefore provide unbiased predictions of low posterior uncertainty simply because it can fit the calibration dataset well. At the same time, predictions that are highly sensitive to entrained real-world null space parameter components may be subject to considerable bias. Many predictions will fall between these two extremes. It is thus apparent that non-optimality of simple model design can only be defined for a specific prediction. Hence the same model may be optimal for the making of one prediction but non-optimal for the making of others.

The PREDVAR1C utility available through the PEST suite of software can be used to assess optimality of simplification as it pertains to specific predictions. This utility employs the theory that is developed in chapter 4 of this document. Utilities within the PyEMU suite provide similar functionality; the latter suite was used in studying optimality of simplification by White et al (2015).

While PREDVAR1C and PyEMU are unique in providing these capabilities (the authors know of no other software that provides the same functionality), there are disadvantages associated with their use. In particular:

- model linearity is assumed, and
- use of these packages requires definition of defect parameters (i.e. the pd parameters of equation 5.3.3), and that sensitivities of calibration-related and prediction-related model outputs with respect to these parameters be calculated.

The latter of the above two points restricts use of these packages to a small range of modelling instances where model defects can be expressed as continuously variable parameters. It does not allow exploration of the effects of categorical simplifications such as coarse grid spacing, missing model layers, or the choice between a physically-based model and a lumped parameter counterpart to this model (although sometimes it is possible to define continuous surrogate parameters to explore these effects).

A nonlinear alternative to the use of PREDVAR1C and PyEMU is implementation of the methodology presented by Doherty and Christensen (2011) and discussed in section 4.5 of this document. However this mode of model defect analysis is numerically intensive. It requires the construction of a complex model and a complementary simple model. The latter is then calibrated against the former for many parameter realizations of the former. Both are then employed to make one or a variety of predictions for which the effects of model defects on predictive bias are explored. This methodology is more flexible than that provided by linear analysis, and readily accommodates categorical features of model construction like grid cell size, number of layers and modelling approach.
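The workflow described in the previous paragraph is sketched below in schematic form. The complex_model(), simple_model() and calibrate() functions are trivial stand-ins; in a real application each would be replaced by actual model runs and an actual inversion.

```python
# A schematic sketch of the paired simple/complex model analysis described above.
import numpy as np

rng = np.random.default_rng(3)

def complex_model(k):  return k.sum() * np.ones(10), k[0] * 2.0     # (outputs, prediction)
def simple_model(p):   return p[0] * np.ones(10), p[0] * 2.0        # (outputs, prediction)

def calibrate(target_outputs):
    """Placeholder calibration of the simple model against complex model outputs."""
    return np.array([target_outputs.mean()])

errors = []
for _ in range(200):
    k = rng.normal(size=5)                      # prior realization of complex model parameters
    c_out, c_pred = complex_model(k)            # "truth" generated by the complex model
    p = calibrate(c_out)                        # simple model calibrated against it
    _, s_pred = simple_model(p)
    errors.append(s_pred - c_pred)              # structural contribution to predictive error

errors = np.array(errors)
print("predictive bias:", errors.mean(), " predictive error std:", errors.std())
```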

It is the authors’ opinion that further research into the costs and benefits of model simplification, and of ways that simplification can be optimised in different modelling contexts, is urgently needed. With greater industry knowledge of what simple models can achieve, the default modelling condition of “complexity is better” may change. However to achieve such a culture change, modellers will require guidance in many matters related to simple model construction and deployment, including:

- simple model specifications for different decision-support contexts; and
- formulation of appropriate calibration strategies for simple models.

As has been extensively discussed in the present and previous chapters of this document, it is in the nature of simple models that their construction, calibration and deployment must be done in a prediction-specific manner. It is of interest to note that this approach to modelling is somewhat at odds with present day modelling culture in which a large, all-purpose “model for an area” is built to support the making of many different predictions. The model is then reviewed by a third party who assesses the model in terms of its presumed capacity to simulate environmental processes within the study area. If the model is considered satisfactory in this regard, it is then used to make a variety of predictions of future system behaviour under a variety of management scenarios. A more enlightened approach to model construction and assessment would recognize that different models, and/or different approaches to calibration of the same model, may be required for the making of different decision-critical predictions. Modelling methodologies, rather than models themselves, should therefore be the subject of assessment and review.

The following list summarizes a number of features of a modelling methodology that should be assessed in reviewing that methodology for its usefulness in decision-support.

- If the prediction required of a model is very similar in nature to at least some members of the calibration dataset, then a good fit with the calibration dataset should be sought (taking into account the level of measurement noise associated with that dataset).
- If linear analysis using a simple model suggests that the uncertainty of a prediction will be reduced little through calibration, then prior predictive uncertainty analysis may be worthy of consideration as a surrogate for posterior predictive uncertainty analysis.
- When calibrating a model to make predictions which fall between these extremes, a modeller should look for early signs of over-fitting. If parameter values violate reasonableness, and/or if spatial parameters adopt unusual patterns, these may signal that the information content of the calibration dataset as it pertains to a prediction of management interest is being directed towards receptacles that are inappropriate for the making of that prediction.
- Overfitting can be avoided through purposefully seeking a “less-than-perfect” fit with the calibration dataset. (When using PEST in “regularisation” mode, this can be implemented through use of an appropriate target measurement objective function.) It can also be avoided through “orthogonalizing out” those components of the calibration dataset for which a simplified model has no, or defective, receptacles. Thus some modes of historical system behaviour are fit, while others are not.


6. Avoiding Underestimation of Predictive Uncertainty when Using Simple Models

6.1 Introduction

Most of the focus of the previous chapter has been on how to avoid bias when using a simple model to make predictions. A problem with predictive bias is that its existence is generally undetectable. Nevertheless, because it contributes to the potential error of a prediction, it needs to be included in assessment of that potential. As was discussed in chapter 2, estimation of predictive error variance is central to the use of a model in the decision-making context. As a surrogate for predictive uncertainty it is central to evaluation of risk, and hence to ascribing a level of confidence to the assertion that a “bad thing” will not happen if a certain course of management action is taken. The centrality of predictive uncertainty analysis to model use in the decision-making context was underlined in the definition of failure presented in that chapter, namely the occurrence of a type II statistical error wherein the hypothesis of occurrence of an unwanted event is falsely rejected. This type of error can, of course, be avoided by calculating uncertainty intervals which are very wide. However this renders a model unhelpful, for it removes from it the ability to reject any hypotheses at all.

The present chapter focusses on how uncertainty can be quantified when making a prediction using a simple model. In many modelling contexts this will be the greatest challenge facing use of a simple model. This is because a simple model generally employs far fewer parameters than a complex model of the same system. A simple model often dispenses with parameters that cannot be inferred through the history-matching process, and hence belong to the calibration null space. However in many instances of environmental modelling undertaken for decision-support, these are the very parameters whose presence is required for quantification of predictive uncertainty. This applies especially to predictions of expected system behaviour under climatic, land use or other conditions which are different from those which prevailed during calibration of the model. At least some of the parameters (or parameter combinations) to which such a prediction will be sensitive are likely to be uninformed by the calibration dataset. Because they therefore inhabit the calibration null space, they are informed only by expert knowledge. This provides their expected values, the extent to which their values may be different from expected, and spatial, temporal or other correlations which these differences may exhibit. The representation of null space parameter components in a model is therefore crucial to correct inference of predictive uncertainty – not because of their estimability (which is often put forward as the sole criterion for inclusion of a parameter in a model) but because of their lack of estimability.

Equation 3.2.1 (which describes posterior predictive uncertainty) and equation 3.4.20 (which describes post-calibration predictive error) proclaim that the posterior uncertainty/error of a prediction depends on:

- the parameters to which it is sensitive;
- the potential for variability of those parameters (this being a matter of expert knowledge expressed by the C(k) prior parameter covariance matrix);
- the extent to which this variability is reduced through history-matching.

Reduction of parameter variability occurs if the calibration dataset contains information which is pertinent to those parameters. However the extent to which this information can effect a reduction in parameter variability depends on the extent to which it is contaminated by measurement noise; the magnitude of measurement noise is described by the C(ε) matrix that is discussed in previous chapters of this document. Unfortunately, the level of model-to-measurement fit attained through calibration of a model is rarely commensurate with that expected from measurement noise; this is especially the case where a model has been simplified – often with consequential reduction in its ability to simulate some nuances of environmental system behaviour. So-called “structural noise” then contributes to model-to-measurement misfit, this being specified by the η vector of equation 5.3.4. In most attempts to estimate posterior parameter and predictive uncertainty, structural noise is treated as if it were measurement noise. It is assigned an (often diagonal) covariance matrix and used in equations such as 3.2.1 and 3.4.20. This is done not because it is a theoretically correct approach to handling structural noise, but because it is easy.

It can be shown that structural noise can indeed be treated as if it were measurement noise provided it is awarded a covariance matrix C(η) that is “correct” in the sense that its structural origins are acknowledged. Methods for accomplishing this in certain instances of parameterization simplification are provided by Cooley (2004), Cooley and Christensen (2006), and Christensen (2017). In most practical modelling contexts, however, this theory is difficult to apply, requiring paired models and many runs of the complex member of the pair. Christensen (2017) shows how correction factors to calculated predictive uncertainty intervals can be applied to accommodate the structural noise contribution to model-to-measurement misfit. However, more often than not, these result in predictive uncertainty intervals that are too wide to be of use.

Attempts to obtain a covariance matrix for structural noise are further hampered by the fact that C(η) is generally singular; see Doherty and Welter (2010). Its non-existent inverse makes development of a suitable calibration weighting strategy difficult; equation 3.4.3 cannot be applied.

In light of the above considerations, it is apparent that problems facing estimation of the uncertainty associated with predictions made by simplified models are considerable. Given the centrality of uncertainty analysis to decision-support, this has the potential to hinder their use in this context. However, for reasons discussed in the previous chapter, quantification of uncertainty of predictions made by complex models faces even greater challenges. Hence there is no alternative but to seek approximate means of uncertainty quantification where simple models are used as a basis for decision-support. The present chapter suggests a few of these means.

6.2 Solution Space Dependent Predictions

As has already been discussed, where a prediction is sensitive only to solution space parameter components of a notional “real world model”, then that prediction has no null space dependency. Entrainment of null space components of the background real world model through calibration of the simple model does not therefore incur predictive bias. A simple model which is designed to make this kind of prediction can be endowed with a parameterization scheme that is parsimonious enough for its calibration to constitute a well-posed inverse problem. Predictive uncertainty analysis can then proceed using standard regression techniques; see for example Draper and Smith (1998). If the model runs fast enough, methods such as Markov chain Monte Carlo can be employed to sample the posterior parameter probability distribution with a relatively high level of numerical efficiency; see for example Gamerman and Lopes (2006) and Laloy and Vrugt (2012).
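The following minimal Metropolis sampler sketches the Markov chain Monte Carlo option mentioned above, for a hypothetical two-parameter simple model with Gaussian measurement noise; it is an illustration of the idea rather than a recommended implementation.

```python
# A minimal Metropolis sampler for a fast, parsimoniously parameterized simple model.
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(20.0)
p_true = np.array([2.0, 0.3])
obs = p_true[0] + p_true[1] * t + 0.2 * rng.normal(size=t.size)   # synthetic calibration data

def log_post(p, sigma=0.2):
    """Log posterior with a flat prior and independent Gaussian measurement noise."""
    resid = obs - (p[0] + p[1] * t)
    return -0.5 * np.sum((resid / sigma) ** 2)

p = np.array([1.0, 0.0])
samples = []
for _ in range(20000):
    p_new = p + 0.05 * rng.normal(size=2)          # random-walk proposal
    if np.log(rng.uniform()) < log_post(p_new) - log_post(p):
        p = p_new                                   # accept
    samples.append(p.copy())

samples = np.array(samples[5000:])                  # discard burn-in
print("posterior means:", samples.mean(axis=0))
print("posterior stds:", samples.std(axis=0))
```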

Where an inverse problem is well posed then, theoretically, all parameter uncertainty is inherited from noise in the calibration dataset. However, as was discussed in the preceding paragraph, much of this “noise” is likely to be structural in nature so that its stochastic characterisation is elusive. The problem of stochastic characterization of measurement noise becomes particularly difficult where the measurement dataset is comprised of time series of quantities such as stream flow and/or water quality which have a very large dynamic range and exhibit considerable temporal correlation. Ways in which this has been addressed include:

- use of a subjective likelihood function (Beven, 2005; Beven et al, 2008);
- use of autoregressive moving average (ARMA) techniques (Kuczera, 1983; Campbell et al, 1999; Campbell and Bates, 2001);
- formulation of a multi-component objective function in which different modes of system behaviour are given equal visibility so that their information content is exposed to the calibration process (Doherty and Johnson, 2003; Doherty and Welter, 2010; White et al, 2015);
- simulation of model errors using a Gaussian process whose specifications are inferred through the calibration process itself (Kennedy and O’Hagan, 2001; Higdon et al, 2005).

When using a model whose outputs are accompanied by structural noise, this noise must be accommodated as “predictive noise” when assessing the total uncertainty of a prediction. The so-called “predictive interval” of a decision-critical prediction must therefore include not just the uncertainty margin of the prediction that arises from uncertainties in estimated parameters. When calculating the range of possible predictions that are compatible with the historical behaviour of the system as recorded in the calibration dataset, this range of possibilities must also include an interval that quantifies limitations in a model’s ability to replicate all nuances of that behaviour. This interval can be calculated using regression theory as outlined by Graybill (1976). However, as Christensen (2017) points out, calculations become much more difficult when the structural nature of model-to-measurement misfit is recognized. Alternatively, it may be possible to construct data-driven, model-error-correcting statistical submodels using modern “big data” processing methods; see, for example, Demissie et al (2009).

6.3 Null Space Dependent Predictions

History-matching does little to reduce the uncertainty of a prediction that is sensitive almost entirely to real world null space parameter components. In fact, as has been extensively discussed, for a simple model, history-matching may do more harm than good as it may ascribe erroneous values to real world null space parameter combinations that are artificially linked to solution space parameter combinations through the simple model’s parameterization scheme. In Section 5.6 it was suggested that when a simple model (or even a more complex model for that matter) is asked to make a prediction which is largely uninformed by the calibration dataset, history-matching may be dispensed with; the prior probability distribution of the prediction may then be considered as a surrogate for its posterior probability distribution.

While this strategy overcomes inflation of post-calibration predictive error incurred through calibration-induced null space entrainment, other problems remain. In particular:

- The simple model must represent all parameters to which the prediction of interest is sensitive despite the fact that many of these will be inestimable; and
- A prior uncertainty must be ascribed to simple model parameters; this may be difficult if they are abstract in nature.

These two problems are not independent of each other. A simple model is indeed likely to employ far fewer parameters than a more complex model of the same system. Hence each of its parameters is likely to represent a combination of complex model parameters, and hence temporal/spatial averages of real world properties.


Suppose that, either actually or conceptually, simple model parameters p can be formulated from complex model parameters (or real world system properties) k using a linear equation of the type

p = Nk (6.3.1)

For a simple groundwater model, p may represent a handful of zones of piecewise constancy whereas k may represent cell-by-cell hydraulic properties of a complementary complex model, or point-by-point hydraulic properties of the real world. In this case, N may be considered as an averaging matrix; each of its rows is thus comprised of zeroes except for a range of elements whose values are all 1/n where n is the number of complex model cells, or real-world points, comprising a particular simple model zone that is represented by a single element of p. It would be tempting to use equation 3.1.5 to calculate a covariance matrix for p as

C(p) = N C(k) N^t (6.3.2)

Presumably C(k) is known, as it expresses expert knowledge. Let the sensitivities of a prediction s to parameters p be encapsulated in the vector w. The simple model could then be used to make this prediction, the linear representation of this process being

s = w^t p (6.3.3)

The uncertainty variance of the prediction would then be calculated as

σ²s = w^t C(p) w = w^t N C(k) N^t w (6.3.4)

That this course of action is incorrect can be illustrated using a simple groundwater modelling example. Suppose that the purpose of the simple model is to compute travel time of a contaminant to a receptor, and that the subsurface contains narrow, coarse-grained, alluvial channels set in fine-grained flood plain sediments, and that preferential flow takes place through the former. Representation of these channels is lost where averaging takes place to form broad zones of piecewise constancy. Furthermore, the variability of upscaled permeability in a large zone will be quite small if calculated using equation 6.3.2. Predictive uncertainty will therefore be seriously underestimated as no account is taken of the fact that the contaminant may be transported through a channel.
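The following numpy sketch illustrates the point numerically: when N simply averages many uncorrelated point values into one zone, the upscaled variance computed from equation 6.3.2 is far smaller than the point variance that governs channel-dominated transport. All numbers are illustrative.

```python
# Illustration of equations 6.3.1 to 6.3.4 with a simple spatial-averaging N matrix.
import numpy as np

n_cells, n_zones = 100, 2
sigma_log10K = 1.0                                         # prior std of point log-permeability
C_k = (sigma_log10K ** 2) * np.eye(n_cells)                # prior covariance of k (no correlation)

N = np.zeros((n_zones, n_cells))                           # simple spatial averaging matrix
N[0, :50] = 1.0 / 50                                       # zone 1 averages the first 50 cells
N[1, 50:] = 1.0 / 50                                       # zone 2 averages the remaining cells

C_p = N @ C_k @ N.T                                        # equation 6.3.2
w = np.array([1.0, 0.0])                                   # prediction sensitive to zone 1 only
print("point variance:", sigma_log10K ** 2)
print("upscaled zonal variance:", w @ C_p @ w)             # equation 6.3.4; much smaller
```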

The mistake in the above approach arises from failure to account for the model defect term; see equation 5.3.5. If hydraulic property averaging takes place to form a large zone of piecewise constancy when the zone in fact contains a high permeability channel (or MAY contain a high permeability channel), then this comprises a defect in the simple model to which the prediction of interest is particularly sensitive but which is unrepresented in calculation of the value of the prediction or in assessment of its uncertainty.

The simple groundwater model would no longer be simple if, in order to rectify this problem, it was modified to represent discrete alluvial channels within a broader host rock. This would probably have to be done stochastically as a modeller may not know where the channels are, or even if they really exist. To retain simplicity of the model while not compromising its purpose, the modeller must introduce defects in a more appropriate way. Hence he/she may assign to zonal permeabilities downstream of a contaminant source values that are more in accordance with those of buried channel alluvium than with those of flood plain sediments. The uncertainties associated with these zonal permeabilities would also be increased to accommodate the fact that the zone may, or may not, contain an alluvial channel. The defect terms that are now implied in construction of the simple model are thus shifted from failing to represent alluvium to failing to represent host material. Overestimation of contaminant travel times (and hence underestimation of risk) will consequentially be avoided.

Alternatively, the weighting scheme used in definition of the N matrix of equation 6.3.1 could be altered from that of simple spatial averaging, to that of weighting in terms of predictive sensitivity. Because the prediction of interest is that of contaminant transport, alluvial channel permeabilities would receive much higher weights in determining upscaled zonal permeabilities than would host rock permeabilities. Upscaled zonal permeabilities, and their upscaled prior uncertainties, would thus be much more appropriate for the prediction required of the model.

Similar considerations should apply to the design of simple models for other purposes. The prediction-specific nature of their design is again apparent.

6.4 Predictions which are Dependent on Both Spaces

This is the most difficult case of all to deal with. Unfortunately it is also the most common case. It pertains to predictions that are only partly informed by the calibration dataset. Hence their uncertainties have the capacity to be reduced through history-matching. However a large amount of uncertainty may remain, this being an outcome of their sensitivity to parameters whose variability is constrained by expert knowledge alone; see equation 3.4.20. Quantification of the uncertainties of these predictions is therefore afflicted by all of the problems that have been addressed previously in this chapter. In particular:

- The solution space component of predictive error variance (second term of equation 3.4.20) may be difficult to quantify because of difficulties in ascribing a stochastic characterization to structural noise (which is likely to make a significant contribution to model-to-measurement misfit);
- The null space component of predictive error variance (first term of equation 3.4.20) may be difficult to quantify because of the abstract nature of model parameters.

A third problem with this modelling context is that this is the context in which calibration-induced predictive bias is most likely to occur.

Suggestions provided in the two previous subsections of the present chapter pertaining to formulation of an objective function and strategic definition of model parameters (and associated defects) can be applied to this case. Strategies which may apply in addition to these include the following.

Because of the need to avoid calibration-induced predictive bias, a modeller may purposefully under-fit the calibration dataset. The contribution to predictive error variance from measurement/structural noise (second term of equation 3.4.20) is thereby increased. Stochastic characterization of this noise should be such that its variance is comparable with the fit that is ultimately attained through the calibration process. The C(ε) matrix should be informed of the level of measurement/structural noise implied by model-to-measurement misfit before being used for quantification of parameter/predictive uncertainty.

With increased misfit, the minimum of the error variance curve of figure 3.3 is shifted to the left. The dimensionality of the calibration null space is consequentially increased. The role of C(p) in contributing to predictive uncertainty is therefore increased. A modeller may then decide to allow greater variability in C(p) than would be allowed on the basis of considerations that are encapsulated in an equation such as 6.3.2. This greater variability allows the uncertainty analysis process to award greater variability to the prediction, thereby avoiding a type II statistical error.


With values for elements of C(p) and C(ε) increased appropriately in accordance with the above considerations, analysis of parameter/predictive uncertainty could take place using any of a number of methodologies that are designed to undertake calibration-constrained uncertainty analysis. These include:

- linear analysis using PEST or PyEMU utilities (a minimal sketch follows this list);
- Markov chain Monte Carlo (if the simple model runs fast enough, and its parameters are few enough in number);
- PEST’s predictive analyser (if the simple model runs fast enough and its parameters are few enough in number);
- null space Monte Carlo;
- ensemble Kalman filter/smoother;
- direct hypothesis testing (see below).
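The sketch below illustrates the linear (FOSM) analysis mentioned in the first item of the list above, with deliberately inflated C(p) and C(ε) matrices; the Jacobian X and sensitivity vector w are synthetic placeholders, and the Schur complement formulation used here is one of several equivalent ways of expressing this conditioning.

```python
# Linear (FOSM) pre- and post-calibration predictive variance with inflated C(p) and C(eps).
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_par = 40, 8
X = rng.normal(size=(n_obs, n_par))                 # Jacobian of simple model outputs w.r.t. p
w = rng.normal(size=n_par)                          # prediction sensitivities

C_p   = np.diag(np.full(n_par, 2.0))                # prior parameter covariance, deliberately inflated
C_eps = np.diag(np.full(n_obs, 0.5))                # measurement/structural noise covariance, inflated

prior_var = w @ C_p @ w                             # prior predictive variance

# Schur complement (linear Bayes) conditioning of C(p) on the calibration dataset
S = X @ C_p @ X.T + C_eps
C_p_post = C_p - C_p @ X.T @ np.linalg.solve(S, X @ C_p)
post_var = w @ C_p_post @ w                         # posterior predictive variance

print("prior predictive std:", np.sqrt(prior_var))
print("posterior predictive std:", np.sqrt(post_var))
```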

The presence of “predictive noise” may also require accommodation for reasons already discussed. Alternatively, or as well, an “engineering safety margin” may be added to estimates of predictive uncertainty computed in any of the ways listed above.

6.5 Direct Hypothesis-Testing

It was stated in chapter 1 of this document that the unique feature that modelling can bring to environmental decision-support is its ability to test, and maybe reject, hypotheses that bad things will result from certain courses of management action. It can do this by demonstrating the incompatibility of these unwanted events with information contained in expert knowledge (including direct measurements of system properties), and/or with information contained in measurements of the behaviour of the system.

One way in which a hypothesis of management interest can be tested using a model is to draw samples from the posterior parameter probability distribution, run the model using each sample, and then count the number of times (if any) that the unwanted event occurs. This comprises the more-or-less “standard” way to do uncertainty analysis.

Another option is to use highly-parameterized inversion software such as PEST to “observe” the occurrence of the unwanted event in an observation dataset that is expanded from the original calibration dataset by one member to include this unwanted occurrence. If it can be established that the unwanted event will occur only if either

- the parameter field required for its occurrence cannot support a good fit with the calibration dataset, or
- the parameter field required for its occurrence is “unrealistic”

then the hypothesis of occurrence of the bad thing can be rejected on the basis of incompatibility with the two types of information for which the model is a repository. PEST can be asked to undertake this process if run in “Pareto” mode. When run in this mode, the model is initially provided with the minimum error variance parameter field estimated through a previous inversion exercise. The “bad thing” observation is initially given a weight of zero. Over a series of inversion iterations the weight ascribed to the bad thing observation is slowly increased. From data which PEST records on a number of output files which are specific to this mode of its operation, the modeller can decide for him/herself the value of the prediction of management interest at which the likelihood of that value diminishes to something approaching zero because of demonstrable incompatibility between that predictive value and either or both of fit with the calibration dataset or reasonableness of the model’s parameter field. In providing this information, PEST traverses the so-called “Pareto front” in which occurrence of the unwanted event is traded off against fit with the calibration dataset and reasonableness of model parameters.

It is important that this mode of direct hypothesis-testing not artificially close off predictive possibilities through use of a parameterization scheme that lacks the flexibility to introduce nuances of hydraulic property heterogeneity that, on the one hand, are not unrealistic and, on the other hand, are required for realization of an unwanted event. For example, pilot points may be preferred to zones of piecewise constancy as the spatial parameterization device of choice in critical parts of the model domain when undertaking direct predictive hypothesis-testing in this manner. This is because zones of piecewise constancy may prevent the emergence of realistic expressions of heterogeneity that are essential for occurrence of the unwanted event; a model’s capacity to explore predictive possibilities is therefore considerably reduced through use of zonal parameters. See Fienen et al (2010) for further discussion of this issue.

Unfortunately, the use of pilot points requires that the model be run many times in order to calculate finite-difference derivatives with respect to parameters associated with these points; these calculations must be repeated during each iteration of the inversion process through which the Pareto front is traversed. A relatively simple model may thus be required for direct hypothesis testing. However, for reasons outlined above, while the model may be simple, its parameterization may need to be locally complex.

In principle, use of PEST in “Pareto” mode allows a rigorous confidence limit to be associated with rejection of the hypothesis of occurrence of an unwanted event; see Moore et al (2010). In practice, for reasons already discussed, specification of the stochastic properties of measurement noise and prior parameter variability (these are encapsulated in the C(ε) and C(p) matrices discussed above) will probably have a subjective component. By gradually placing greater and greater weight on the observation of occurrence of the prediction of interest, a modeller will be able to witness the degree to which calibration misfit must be incurred, and/or unrealistic heterogeneity must be introduced to a model’s parameter field, in order to support values of the prediction that approach undesirable levels. Rejection of the hypothesis that the event will actually occur is subjectively enabled through PEST’s provision of model outputs and parameter fields at each stage of the Pareto front traversal process. The necessarily subjective nature of risk assessment when enabled through use of a simple model is thereby embraced through providing the modeller with as much information as is needed for him/her to exercise his/her judgement.
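
The sketch below illustrates the weight-sweeping idea in generic terms only; it does not reproduce PEST’s own Pareto-mode implementation, and calibrate_with_weight is a toy analytical stand-in for re-minimisation of a composite objective function at each weight.

    import numpy as np

    def calibrate_with_weight(w_bad):
        # Toy stand-in for re-calibration under the composite objective function
        # phi = (theta - 1)^2 + w_bad*(3*theta - 9)^2, in which theta = 1 is the
        # calibrated parameter value and the "bad thing" observation asserts that
        # the prediction (3*theta) equals 9; the minimiser is available analytically.
        theta = (1.0 + 27.0 * w_bad) / (1.0 + 9.0 * w_bad)
        return (theta - 1.0) ** 2, 3.0 * theta

    # Slowly increase the weight on the "bad thing" observation and record how much
    # calibration misfit must be incurred as the prediction approaches its unwanted
    # value; this traces a Pareto front analogous to that discussed above.
    for w_bad in np.logspace(-3, 2, 11):
        phi_calib, prediction = calibrate_with_weight(w_bad)
        print(f"{w_bad:10.3g} {phi_calib:10.4f} {prediction:8.3f}")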

The potential that it offers for a modeller to exercise informed subjectivity is a significant strength of the direct hypothesis-testing methodology offered by PEST. As has already been discussed, traditional uncertainty analysis often explores parameter and predictive uncertainty by drawing random samples from the posterior parameter probability distribution; the model is then run using each such sample. It is important to note that characterization of the posterior parameter probability distribution depends heavily on how the prior parameter probability distribution is defined. In some circumstances the prior parameter probability distribution can be described analytically. In other circumstances (particularly in groundwater modelling), more complex geostatistical descriptions of prior parameter stochasticity are employed. However even the most picturesque realizations of parameters, based on the most complex geological concepts, may fail to include key geological nuances that may enable the occurrence of an unwanted event. In other cases the key geological nuance that enables the occurrence of an unwanted event may indeed be compatible with prior stochastic characterization of geological heterogeneity; however it may not be realized in the limited number of samples that are drawn from the prior parameter probability distribution. A strength of direct hypothesis-testing, in which an inversion package such as PEST is directed to “make it happen”, is that the parameterization nuance required for the occurrence of an unwanted prediction is brought into existence, and made explicitly visible, as part of a calibration-constrained worst-case-scenario calibration exercise. It is then up to the modeller to judge the reasonableness or otherwise of this nuance.

7. Joint Usage of a Simple and Complex Model

7.1 Introduction

Much of the discussion in previous chapters has focussed on use of a simple model in place of a complex model, thereby allowing use of the complex model to be dispensed with so that a modeller can gain access to the benefits of fast run times and numerical stability that accompany use of a simple model. The dangers of replacing a complex model with a simple model have been outlined. Means by which these dangers can be averted have also been addressed.

We conclude this document with some suggestions of how a simple and complex model can be used together, thereby perhaps allowing a modeller access to the benefits of both while ameliorating the disadvantages associated with use of either on its own. The discussion is brief. References are provided through which the interested reader may acquire further information on this issue. Some of the suggestions provided in this document have not yet been implemented. The authors consider the use of complex/simple models in partnership an area of fruitful research whose outcomes may have profoundly useful consequences for model-based environmental decision-making.

7.2 Linear Analysis

As has already been discussed, equations developed for linear analysis in chapter 4 of this document are implemented in program PREDVAR1C of the PEST suite, as well as in functions available through the PyEMU suite. At the time of writing, literature-documented use of these equations is provided only by Watson et al (2013) and White et al (2014). However the authors are aware of other contexts in which they are being used to explore issues related to parameterization and model simplification. It is anticipated that the outcomes of these studies will eventually be published. They include the following.

PREDVAR1C has been used in geothermal reservoir modelling to inquire into the repercussions of treating a dual porosity system as if it were single porosity. While a geothermal reservoir model of a dual porosity system can be calibrated under either assumption, it has been found that considerable predictive bias can be incurred if the model neglects dual porosity. Predictive uncertainty may also be seriously underestimated.

In another geothermal modelling application, which has implications for other types of spatial models such as groundwater models, the use of various spatial parameterization devices was explored. The outcomes of this research suggest that if an area is faulted, but a modeller does not know the exact locations of these faults, then use of pilot points as a parameterization device can support good calibration, valid uncertainty analysis, and avoidance of predictive bias. However their use comes at a high computational cost. Zones of piecewise constancy constitute a cheaper parameterization device. However unless fault-specific zones are emplaced at the correct locations, predictive bias may be incurred and predictive uncertainty may be underestimated.

Neglecting along-river horizontal anisotropy when calibrating a groundwater/surface water model in which part of the model domain includes an alluvial aquifer does not normally compromise the goodness of fit attained through calibration of that model, as horizontal anisotropy is normally “invisible” to the calibration process. However certain predictions may accrue considerable bias. Furthermore, the uncertainties of these predictions may be seriously underestimated if the model is calibrated under the false assumption of along-river isotropy.

Implementation of the linear theory presented in chapter 4 requires that a complex model be constructed, and that families of parameters be identified as “defect parameters”. These parameters may represent hydraulic properties of the modelled system (as parameters often do). Alternatively, they may represent features of an environmental model that are not usually adjusted or estimated, but can nevertheless be considered as somewhat simplified or defective representations of the real world. Particularly important in this regard may be the specifications of model boundary conditions, which often comprise simplistic representations of far more complex environmental stresses and processes.

The “paired models” used for linear analysis, based on the equations of chapter 4, are actually the same (complex) model. This model must be designed in such a way that the assignment of defect status to a subset of its parameters results in a simple model that is representative of models used in current modelling practice. However other simplifications, such as a reduction in the number of model layers, or an increase in model cell size, cannot be readily explored using this methodology, as implementation of linear theory requires that model outputs be differentiable with respect to defect parameters. Hence, as has already been stated, though linear analysis is powerful, the range of model defects that it can examine is somewhat restricted. Nevertheless studies which employ linear analysis can provide (and have provided) fruitful insights that can be readily extended to practical modelling applications.

7.3 Predictive Scatterplots

Section 4.5 of this document describes a methodology of paired model usage whereby a simple counterpart to a complex model is repeatedly calibrated to match outputs produced by the latter as it is provided with different realizations of parameters. These realisations are sampled from the prior probability distribution of those parameters. The complex model can employ categorical (and hence non-continuous) parameter fields generated using multiple point or other modern geostatistical methods. Though not implemented so far, realizations may also include categorical features of complex model construction, such as the presence or otherwise of geological features whose dispositions are unknown.

In contrast to linear analysis, the complex and simple models used in this analysis do not need to be related to each other by a set of “defect parameters”. In fact defect parameters do not need to be explicitly defined. Nor do the parameters used by the two models need to be the same. The simple model can employ an entirely different parameterization scheme from that employed by the complex model. In fact, use of a simple, lumped-parameter model as a partner to a far more complex, physically-based model would allow the former to be calibrated rapidly to outputs generated by the latter. This would considerably improve the efficiency of the methodology, as repeated calibration of the simple model is its most time-consuming aspect.

The methodology has the strength that it allows quantification of calibration-induced simple model predictive bias, at the same time as it allows for correction of that bias. It supports the making of a prediction of minimum error variance, and quantification of that error variance, at the same time as it provides information that could be used as a basis for simple model design.
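
A schematic of this workflow is sketched below; all of the functions it defines are toy stand-ins for prior sampling, complex-model runs and simple-model calibration, and serve only to show where bias quantification and correction enter the analysis.

    import numpy as np

    rng = np.random.default_rng(1)
    N_CELLS = 20  # size of the toy complex-model parameter field

    def sample_prior():
        # Toy stand-in for drawing a complex-model parameter field from the prior.
        return rng.lognormal(mean=0.0, sigma=0.5, size=N_CELLS)

    def run_complex_model(k):
        # Toy stand-in for a complex-model run; returns outputs corresponding to
        # the calibration dataset and the prediction of interest.
        return np.cumsum(k), 1.3 * k.sum() + rng.normal(0.0, 0.5)

    def calibrate_simple_model(h):
        # Toy stand-in for calibrating a lumped simple model against h; the single
        # "parameter" is chosen so that the simple model reproduces the final output.
        return h[-1] / N_CELLS

    def predict_with_simple_model(p):
        # Toy stand-in for the prediction made by the calibrated simple model.
        return p * N_CELLS

    pairs = []
    for _ in range(200):
        k = sample_prior()
        h, s_complex = run_complex_model(k)
        p = calibrate_simple_model(h)
        pairs.append((predict_with_simple_model(p), s_complex))

    pairs = np.array(pairs)
    # Regressing complex-model predictions on calibrated simple-model predictions
    # quantifies calibration-induced predictive bias (slope and intercept) and the
    # residual spread that remains once that bias has been corrected.
    slope, intercept = np.polyfit(pairs[:, 0], pairs[:, 1], 1)
    residual_sd = np.std(pairs[:, 1] - np.polyval([slope, intercept], pairs[:, 0]))
    print(slope, intercept, residual_sd)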

Apart from repeated calibration of the simple model, the need to run the complex model using different random parameter fields adds to the computational demands of implementing this methodology. This is especially the case if the complex model takes a long time to run. This may limit the number of random parameter fields that can be tested. This, in turn, may confuse the interpretation of scatterplots yielded by this methodology.

Another problem with this methodology (one that is common to all methodologies that employ random parameter field generation) is whether the random realizations of hydraulic property fields that it employs are in fact representative of reality. Geological and environmental process “surprises” are encountered in most modelling exercises. If key aspects of model parameterization and/or processes are not represented in the complex model, then this model has defects. The methodology has no ability to explore the ramifications of these defects for predictive bias and uncertainty.

To the authors’ knowledge, implementation of this methodology has been restricted to two publications, namely Doherty and Christensen (2011) and Watson et al (2013). There have been no real-world applications of which the authors are aware. This is not surprising, given the need to construct two separate models; most modelling budgets would not support this. On the other hand, most modelling budgets support construction of a complex model that is of questionable integrity and of limited use. It is possible that co-production of a simple model for addressing some predictions required in a study area may be a fruitful activity as far as decision-support is concerned. To the extent that construction of the simple model can be informed and tested through conjunctive use of a complex model, the role of both of these models in decision-support may be strengthened. It is also possible that the simple model could be used as a surrogate for the complex model in calibration of the latter. This is further discussed below.

7.4 Surrogate and Proxy Models

Use of a simplified version of a complex model to expedite calibration and uncertainty analysis of the latter is receiving increasing attention in the literature. Nevertheless it is still far from commonplace in everyday environmental modelling practice. Razavi et al (2012) present a review of applications, with particular attention given to surface and land use modelling. Asher et al (2015) do the same for groundwater modelling.

Surrogate models (as distinct from simplified models) gain maximum utility in parameter estimation and uncertainty analysis contexts where parameters are few in number. They have been used in conjunction with so-called global optimisation methods, or Markov chain Monte Carlo uncertainty analysis methods, where speed of execution is essential because of the need for these methods to undertake many model runs.

To the authors’ knowledge, the only documented uses of simplified and surrogate models in direct partnership with complex models in a gradient-based parameter estimation and uncertainty analysis context are those described by Burrows and Doherty (2014) and Burrows and Doherty (2016). Both of these make use of PEST’s “observation re-referencing” functionality, wherein the simple or proxy model is used for filling of the Jacobian matrix while model runs required for testing parameter upgrades are carried out using the complex model. This strategy vastly reduces the number of complex model runs required for solution of the inverse problem. In the first of these publications, parameter estimation (using Tikhonov regularisation) and post-calibration uncertainty analysis (using null space Monte Carlo) were effected in a highly parameterized context involving 600 pilot point parameters; the simpler model used a much coarser numerical grid than the more complex model, thereby reducing its simulation time to a fraction of that of the latter. In the second of these cases the simpler model was not a model at all; instead, a suite of PEST-calibrated polynomial proxies linked each model output used in the calibration process to each of the eight parameters that required adjustment during calibration and subsequent calibration-constrained uncertainty analysis.
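
In the spirit of the second of these studies, the sketch below fits a quadratic proxy linking one model output to one adjustable parameter using results of previously completed model runs; the run results are synthesised here, and a real application would fit one proxy per output/parameter pair.

    import numpy as np

    rng = np.random.default_rng(2)

    # Stand-ins for the results of previously completed complex-model runs: values
    # of one adjustable parameter, and the corresponding model output that appears
    # in the calibration dataset.
    param_values = rng.uniform(0.5, 2.5, size=30)
    output_values = 4.0 * param_values - 0.8 * param_values ** 2 + rng.normal(0.0, 0.05, size=30)

    # Fit a quadratic proxy to the run results; its analytical derivative can then
    # be used to fill the relevant column of the Jacobian matrix in place of
    # repeated complex-model runs.
    proxy = np.poly1d(np.polyfit(param_values, output_values, deg=2))
    proxy_derivative = proxy.deriv()
    print(proxy(1.5), proxy_derivative(1.5))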

It is of interest to note that the use of so-called “super parameters” by PEST when implementing “SVD-assisted” parameter estimation can be considered as a form of model simplification. In this application, simplification is restricted to the definition of parameters. However such simplification can be considered to be optimal according to concepts presented in this document, as it is based on singular value decomposition of a complex parameterization scheme.
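
A minimal sketch of super-parameter definition through singular value decomposition follows; the Jacobian and observation weights are synthetic stand-ins, and the number of retained singular vectors would in practice be chosen from the singular value spectrum.

    import numpy as np

    rng = np.random.default_rng(3)

    # Stand-ins for a Jacobian computed with a fast, simpler version of the model
    # (sensitivities of 50 observations to 200 base parameters) and for observation weights.
    Z = rng.normal(size=(50, 200))
    weights = np.ones(50)

    # Singular value decomposition of the weighted Jacobian.
    U, s, Vt = np.linalg.svd(np.diag(weights) @ Z, full_matrices=False)

    # Retain only the leading right singular vectors; their span approximates the
    # calibration solution space, and super parameters are coefficients of these
    # vectors. A base parameter set is recovered as k = k0 + V1 @ super_params.
    n_super = 10
    V1 = Vt[:n_super, :].T
    k0 = np.zeros(200)
    super_params = np.zeros(n_super)
    k = k0 + V1 @ super_params
    print(k.shape)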

7.5 Suggested Improvements to Paired Simple/Complex Model Usage

7.5.1 General

It is considered that there is ample room for more innovative use of simple models to expedite calibration and uncertainty analysis of complex models. For example, a Jacobian matrix, calculated using a simpler version of a complex model, could be used for definition of super parameters employed for calibration of the latter. To expedite run times, the simple model could, for example, implement particle-tracking instead of solving the advection-dispersion equation to simulate movement of contaminants, or use the SWI2 package of Bakker et al (2013) to simulate salt water intrusion instead of a three-dimensional groundwater model which implements density-dependent flow. The role of super parameters is to span the calibration solution space; approximations used in calculation of sensitivities should not affect definition of this space. Once a limited number of super parameters are synthesised from base parameters, calibration and uncertainty analysis could then be undertaken using the complex model. In doing this, the numerical burden of parameter estimation and calibration-constrained uncertainty analysis would be considerably reduced through use of the smaller number of parameters.

There is potential for very large efficiency gains in calibration and calibration-constrained uncertainty analysis of complex models if simple models with simple parameterization schemes can be used as partial or total substitutes for them in calculation of the Jacobian matrix, and/or in evaluation of super parameter sensitivities where super parameters are actually simple model parameters. A few possible options are now discussed.

7.5.2 Option 1: Using the Simple Model for Derivatives Calculation

If h from equation 5.3.1 is equated to h from equation 5.3.3, and measurement noise is ignored, we have

Xp + X_d p_d = Zk        (7.5.1)

If we are only concerned with derivatives with respect to adjustable parameters, the second term on the left disappears, so that

Xp = Zk (7.5.2)

For simplicity, suppose that the simple model possesses enough parameters to allow a good fit to be achieved with the calibration dataset h, and that calibration of this simple model can be formulated as a well-posed inverse problem. With the inverse problem for estimation of p being well posed, an equivalent p to a complex parameter set k can be derived using the equation

p = (X^t X)^-1 X^t Z k = Nk        (7.5.3)

where, obviously,

N = (X^t X)^-1 X^t Z        (7.5.4)

(The above equation can be easily altered to accommodate measurement weights; however this is not done here for the sake of notational efficiency.) If the matrix (X^t X)^-1 X^t Z were available, then the simple model could be used for calculation of sensitivities for the complex model. On each occasion that an element of k was varied for the purpose of finite-difference derivatives calculation, an equivalent p vector could be calculated using equation 7.5.3; the simple model would then be run to compute changes in those model outputs which correspond to the calibration dataset h.
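
The numpy sketch below illustrates equations 7.5.3 and 7.5.4; the X and Z matrices are synthetic stand-ins for sensitivity matrices that would in practice be filled by finite-difference model runs.

    import numpy as np

    rng = np.random.default_rng(4)

    # Stand-ins for the sensitivity matrices: X relates simple-model outputs to its
    # 8 parameters p, while Z relates complex-model outputs to its 200 parameters k.
    X = rng.normal(size=(50, 8))
    Z = rng.normal(size=(50, 200))

    # Equation 7.5.4: N = (X^t X)^-1 X^t Z; the least-squares solution of X N = Z
    # yields the same matrix without forming the inverse explicitly.
    N, *_ = np.linalg.lstsq(X, Z, rcond=None)

    def equivalent_simple_parameters(k):
        # Equation 7.5.3: map a (perturbed) complex parameter set k to the simple-model
        # parameter set p that calibration against the complex model's outputs would
        # yield; the fast simple model can then be run using p in order to fill the
        # corresponding column of the Jacobian matrix.
        return N @ k

    print(equivalent_simple_parameters(np.ones(200)).shape)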

Unfortunately, calculation of Z for use in equation 7.5.3 may be numerically intensive, as it requires as many model runs for filling of this matrix as there are adjustable parameters employed by the complex model. In some modelling contexts this may be considered a small price to pay in order to gain access to simple-model-based super parameters for implementing the actual inversion process. However the large model would need to be numerically stable so that finite-difference derivatives have integrity. Also, the methodology would require that the complex model not display too much nonlinearity with respect to parameters, so that the (X^t X)^-1 X^t Z matrix is usable for at least a few iterations of the inversion process. If it were only usable for a single iteration, then the simple model would not be required, as the Z matrix could be used directly as a basis for computation of parameter upgrades.

Once the model was calibrated, standard null space Monte Carlo techniques could be employed for generating realisations of k that fit the calibration dataset while exploring the null space of Z. Generation of these parameter sets could be based on singular value decomposition of Z. (Definition of the null space would, once again, rest on an assumption of the integrity of Z when calculated using finite-difference derivatives.) Adjustment of random parameter sets to respect calibration constraints could be done using the simple model and the p parameter field in the manner described above. If this methodology were successful in adjusting nearly-calibration-constrained realisations of k such that the complex model respects the calibration dataset h, then large efficiency gains in making this adjustment would be realized.

7.5.3 Option 2: Modifications to Accommodate Complex Model Numerical Problems

In practice, calculation of Z for a large model with many parameters may incur a large computational burden. An alternative option is to approximate N through random field generation. Many realizations of k could be generated, and the complex model run each time. The simple model could then be calibrated against the h vector produced by each of these runs to compute an equivalent p parameter set. After enough model runs had been undertaken, an empirical cross-covariance matrix C(p,k) could be constructed. (It is possible that where the solution space of the large model is small, the number of runs required for reliable construction of this matrix may be smaller than that required for filling of the Z matrix as required by option 1 above.) Then, using the relationship

C(p,k) = N C(k)        (7.5.5)

an approximation to N could be calculated as

N = C(p,k) C^-1(k)        (7.5.6)

This N could be used in the complex model calibration process in the manner discussed above. It may also be able to support calibration-constrained Monte Carlo analysis, as the null space of N would be equivalent to the null space of Z. Singular value decomposition could thus be undertaken on N to generate realisations of null space parameter combinations of k, which could then be added to the parameter field k of the calibrated model. Adjustment to respect calibration constraints could then proceed as above.
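
A sketch of this empirical estimation of N is given below; the ensembles of k and p are synthetic stand-ins for realisations of complex-model parameters and the simple-model parameter sets obtained by calibration against the corresponding complex-model outputs.

    import numpy as np

    rng = np.random.default_rng(5)

    # Stand-ins for an ensemble of complex-model parameter fields k (one realisation
    # per row) and the simple-model parameter sets p obtained by calibrating the
    # simple model against the outputs of each corresponding complex-model run.
    k_ensemble = rng.normal(size=(100, 200))
    p_ensemble = k_ensemble[:, :8] @ rng.normal(size=(8, 8))  # toy dependence of p on k

    # Empirical covariance of k and cross-covariance between p and k.
    dk = k_ensemble - k_ensemble.mean(axis=0)
    dp = p_ensemble - p_ensemble.mean(axis=0)
    C_k = dk.T @ dk / (len(k_ensemble) - 1)
    C_pk = dp.T @ dk / (len(k_ensemble) - 1)

    # Equation 7.5.6: N approximates C(p,k) C^-1(k); a pseudo-inverse guards against
    # rank deficiency when the ensemble is smaller than the number of complex-model parameters.
    N = C_pk @ np.linalg.pinv(C_k)
    print(N.shape)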

7.5.4 Option 3: Direct Adjustment of Random Parameter Fields

This option has some resemblance to the ensemble Kalman smoother. However, in the spirit of methodologies such as that described by Chen and Oliver (2013), it attempts to lend efficiency to the method by using regression techniques. In the present case, efficiency gains would also be realised through use of a complementary simple model.

In a similar fashion to option 2, a suite of complex model parameter fields is generated using the prior probability distribution of complex model parameters. The matrix N of equation (7.5.6) is also obtained as previously. However instead of using this matrix to obtain a minimum error variance parameter field k, the random k parameter fields would themselves be adjusted to conform to calibration constraints. This would be effected using the simple model for calculation of derivatives. (Here lies the distinction with the Kalman smoother, where random parameter field generation is used to calculate C(h,k), the matrix which expresses correlation between model parameters and model outputs which correspond to measurements of system state comprising the calibration dataset.) Testing of updated model parameter fields would require that complex model runs be carried out. A simple model p counterpart to each revised random k could then be obtained through (supposedly rapid) calibration of the simple model against this h-counterpart. The N matrix could then be updated. At the same time, random generation of more k fields could take place with an evolving posterior parameter covariance matrix, calculated using a modified form of equation 3.4.15.

8. Conclusions

The use of models is now ubiquitous in environmental decision-support. However the support that they provide to the decision-making process is often far from optimal – and sometimes even counter-productive. There is a tendency for those who commission the building of models to request that models be complicated. This is done in recognition of the complex nature of environmental systems. Logic dictates, so the argument goes, that if models are to simulate these systems with integrity, then they must also be complicated.

No model can be as complex as the environmental system that it purports to simulate. Every model is simple. Problems are inevitably encountered as a modeller attempts to add complexity to his/her simulator in order that another party who views the model will consider it to be an acceptable simulator of environmental processes, unsullied by approximations that make the distinction between the simulator and the real world too plain to the naked eye.

Those who have attempted to construct complex models are sadly familiar with the unsatisfying nature of this task. Their run times are long. They are numerically delicate. Their fit with an observation dataset is often poor notwithstanding their complexity. Attempts to improve that fit are often met with frustration, whether these attempts are made manually, or employ high-end inversion software. Basic mathematics shows that, even if a good fit with a field dataset can in fact be attained, there is an uncountable number of other ways to obtain the same level of fit using other parameter sets. In most cases of complex model construction, a modeller cannot even be sure that the manner in which his/her fit with the measurement dataset was obtained is of minimum error variance. Yet important predictions are made using that model and the single parameter field which is deemed to “calibrate” the model. The model is too big, and the budget is too small, to try to find other parameter sets that also fit the calibration dataset, and that can be used to explore the potential for wrongness in decision-critical predictions made by the complex model.

Conceptually, complex models provide a mechanism for a modeller to understand environmental processes. Conceptually, they can be used to explore the range of possibilities that are compatible with expert knowledge of the set of processes that are operative at a specific study site and the hydraulic properties which govern these processes. In doing so, they have the potential to contribute much to environmental management. However the role which they are often forced to play is very different from this. Expectations are that they can be used as surrogates for a real world system. As such, different scenarios for management of that system can be tested on them; that for which a model calculates favourable management outcomes can then be adopted for the real world.

Years of collective modelling experience, supported by basic mathematics, demonstrates that this is not the correct way to view models that are built as a basis for decision-support. While they do indeed have the capacity to provide such support, the notion that they can be used as surrogates for a real-world environmental system must be abandoned if this capacity is to be realized. The world is too complex, its properties are too uncertain, and its details are too heterogeneous for this to be the case.

Instead, models should be viewed as scientific instruments, constructed (like any other scientific instrument) to conduct carefully designed experiments at study sites where a great deal is unknown, but where attempts are nevertheless being made to learn more about the system in order to support proper management of it. Their construction, deployment and calibration must be such that they can extract information from the historical behaviour of that system that is most pertinent to its future management, and that they can store this information in ways that are easily accessible when weighing up the merits of competing management scenarios. All of this must be done in full recognition of what current computing technology can provide, and of the computing resources available to those who manage a particular site.

In short, when used in the decision-making context, models should be considered as receptacles for information – information which can be used to test hypotheses of interest to those who manage environmental systems. Decisions pertaining to management of those systems will never be made with certainty. Hence the decision-making process is best served when a model can provide an environmental manager with an assessment of the risk that he/she may be making a wrong decision. When used according to this precept, it is immediately apparent that the modelling process is far more important than any model that will serve that process. It is also apparent that this process must be capable of recognizing and, if possible, quantifying the uncertainties that determine the context of real-world environmental management.

The assumption underpinning much modern-day modelling practice that a single, complex, simulator will answer all questions that a manager will ask, and provide all information that he/she needs to know, is completely unsupported – either by modelling history or by logic. Instead, it is apparent that environmental management is best served by a suite of models, each optimized in its ability to provide receptacles for certain types of information, and each able to deliver that information to the decision-maker in a manner that best supports risk assessment – the vital ingredient of decision-making in all fields of human endeavour. Some of these models may be complex. Many will be simple, thereby enhancing their utility in uncertainty assessment and risk analysis as it pertains to some aspect of a system for which scientifically-based management is required.

Simplicity in modelling brings lightness of step and flexibility, both in terms of what a simple model can achieve, and in terms of what a modeller can achieve when using that model. A simple model can serve a modeller well, thereby allowing him/her to serve his/her clients/stakeholders well. The same can rarely be said of a complex model. Far too often a complex model becomes a modeller’s master, commanding the modeller to do whatever is necessary for its capricious numerical fancies to be served, with numerical nonconvergence being the punishment for failure to provide satisfaction. The close attention to numerical detail that the complex model relentlessly demands moves the gaze of the modeller from the decisions which he/she must support to the unrelenting numerical details required for maintenance of the health of the bloated model. These details have little to do with the real world, and less to do with the problems that must be solved so that the real world is properly managed. An artificial reality is created wherein a modeller must solve a suite of problems that are of little importance while ignoring those that are of over-riding importance.

While the use of simple models is not beset by the same problems, they must nevertheless be used with caution, for simplicity comes at a cost. This document has attempted to outline the costs of simplicity, while providing some suggestions on how they can be assessed and/or minimized. Some of these costs are obvious. In particular, if a model is too simple, then it cannot provide receptacles for information resident in the historical behaviour of a system (i.e. it cannot fit a calibration dataset). Some of these costs are less obvious, but are more insidious. Thus a simple model may be capable of fitting a calibration dataset; however the information that flows from that dataset into the model is directed to receptacles which may corrupt, rather than enhance, the model’s capacity to assess future risks.

In supporting the making of environmental decisions, those who undertake model-based data processing are themselves faced with many decisions. Some of these will pertain to the level of complexity that is required of a model if its role in decision-support is to be realized. The path taken by a particular modeller will almost certainly be subjective; different choices will probably be made by different individuals. However, whatever the path that a modeller chooses to take, he/she should follow that path with a full understanding of where that path may lead, and of where other paths that have not been taken may also have led.

The idea that a single model can be used to answer all questions is challenged by the ideas and mathematics presented in this document. A simple model may assist in the assessment of some decision-critical risks. A more complex model may be warranted for the assessment of others. In still other cases, it may be necessary to build a number of complementary models with different levels of complexity that can work with each other, so that the contribution that the totality of these models makes to the decision-making process is greater than the sum of their individual contributions. Such sophistication of model usage is comparatively rare, as the intellectual and software tools to support such usage are generally unavailable. It is hoped that the present document can provide some justification for more flexible and adventurous model usage than is generally undertaken at present, and that the making of environmental decisions will benefit from this. If this is the case, software support for facilitated implementation of principles and suggestions espoused herein will naturally follow.

9. References

Asher, M.J., Croke, B.F.W., Jakeman, A.J. and Peeters, L.J.M., 2015. A review of surrogate models and their application to groundwater modelling. Water Resour. Res., 51(8), 5957-5973.

Aster, R.C., Borchers, B. and Thurber, C.H., 2013. Parameter Estimation and Inverse Problems. Second edition. New York: Academic Press.

Bakker, M., Schaars, F., Hughes, J.D., Langevin, C.D. and Dausman, A.M., 2013. Documentation of the Seawater Intrusion (SWI2) Package for MODFLOW. U.S. Geological Survey Techniques and Methods, Book 6, Chap. A46, 60 p.

Beven, K., 2005. On the concept of model structural error. Water Sci. Technol., 52(6), 167–175.

Beven, K.J., Smith, P.J. and Freer, J.E., 2008. So why would a modeller choose to be incoherent? J. Hydrol., 354, 15–32, doi:10.1016/j.jhydrol.2008.02.007.

Burrows, W. and Doherty, J., 2014. Efficient calibration/uncertainty analysis using paired complex/surrogate models. Groundwater, 53(4), 531-541.

Burrows, W. and Doherty, J., 2016. Gradient-based model calibration with proxy-model assistance. Journal of Hydrology, 533, 114-127.

Campbell, E.P. and Bates, B.C., 2001. Regionalization of rainfall-runoff model parameters using Markov chain Monte Carlo samples. Water Resour. Res., 37(3), 731-739, doi:10.1029/2000WR900349.

Campbell, E. P., Fox, D. R., and Bates, B. C., 1999. A Bayesian approach to parameter estimation and pooling in nonlinear flood event models. Water Resour. Res., 35(1), 211–220, doi:10.1029/1998WR900043.

Chen, Y. and Oliver, D.S., 2013. Levenberg-Marquardt forms of the iterative ensemble smoother for efficient history matching and uncertainty quantification. Comput. Geosci. (17) 689-703.

Cooley, R.L., 2004. A theory for modelling ground-water flow in heterogeneous media. U.S. Geological Survey Professional paper 1679, 220p.

Cooley, R.L. and Christensen, S., 2006. Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media. Adv. Water Resour., 29(5), 639-656.

Christensen, S., 2017. Methods to correct and compute confidence and prediction intervals of models neglecting sub-parameterization heterogeneity – from the ideal to practice. Adv. Water Resour., 100, 109-125.

Dausman, A.M., Doherty, J., Langevin, C.D., and Sukop, M.C., 2010. Quantifying data worth toward reducing predictive uncertainty. Ground Water, 48 (5), 729-740.

Demissie, Y.K., Valocchi, A.J., Minsker, B.S., and Bailey, B.A., 2009. Integrating a calibrated groundwater flow model with error-correcting data-driven models to improve predictions. J. Hydrol., 364, 257-271.

Doherty, J., 2015. Calibration and uncertainty analysis for complex environmental models. Published by Watermark Numerical Computing, Brisbane, Australia. 227pp. ISBN: 978-0-9943786-0-6. Downloadable from www.pesthomepage.org.

Doherty, J., 2016. PEST: Model-Independent Parameter Estimation. Watermark Numerical Computing, Brisbane, Australia.

Doherty, J. and Christensen, S., 2011. Use of paired simple and complex models in reducing predictive bias and quantifying uncertainty. Water Resour. Res., doi:10.1029/2011WR010763.

Doherty, J. and Johnston, J.M., 2003. Methodologies for calibration and predictive analysis of a watershed model, J. American Water Resources Association, 39(2):251-265.

Doherty, J. and Simmons, C.T., 2013. Groundwater modelling in decision support: reflections on a unified conceptual framework. Hydrogeology Journal 21: 1531–1537

Doherty, J. and Vogwill, R., 2015. Models, Decision-Making and Science. In Solving the Groundwater Challenges of the 21st Century. Vogwill, R. editor. CRC Press.

Doherty, J. and Welter, D., 2010, A short exploration of structural noise, Water Resour. Res., 46, W05525, doi:10.1029/2009WR008377.

Draper, N.R. and Smith, H., 1998. Applied Regression Analysis. John Wiley & Sons, Inc. ISBN 9780471170822.

Fienen, M.N., Doherty, J., Hunt, R.J. and Reeves, H.W., 2010. Using Predictive Uncertainty Analysis to Design Hydrologic Monitoring Networks: Example Applications from the Great Lakes Water Availability Pilot Project. USGS Scientific Investigations Report 2010-5159.

Freeze, R.A., Massmann, J., Smith, L., Sperling, T. and James, B., 1990. Hydrogeological decision analysis: 1. A framework. Ground Water, 28(5), 738–766.

Gamerman, D. and Lopes, H.F., 2006. Markov Chain Monte Carlo. Stochastic Simulation for Bayesian Inference. Chapman and Hall/CRC, 342pp.

Graybill, F.A., 1976. Theory and Applications of the Linear Model. Duxbury press, North Scituate, Mass., p704.

Higdon, D., Kennedy, M. , Cavendish, J. C., Cafeo, J. A. and Ryne, R. D., 2005. Combining field data and computer simulations for calibration and prediction. SIAM J. Sci. Comput., 26(2), 448–466, doi:10.1137/S1064827503426693.

Kennedy, M. C., and O’Hagan, A., 2001. Bayesian calibration of computer models. J. R. Stat. Soc., Ser. B, 63(3), 425–450.

Kitanidis, P.K., 2015. Persistent questions of heterogeneity, uncertainty, and scale in subsurface flow and transport. Water Resour. Res., 51, 5888-5904, doi:10.1002/2015WR017639.

Koch, K-R., 1999. Parameter Estimation and Hypothesis Testing in Linear Models. Springer. ISBN 9783540652571.

Kuczera, G., 1983. Improved parameter inference in catchment models: 1. Evaluating parameter uncertainty. Water Resour. Res., 19(5), 1151–1172, doi:10.1029/WR019i005p01151.

Laloy, E. and Vrugt, J.A., 2012. High-dimensional posterior exploration of hydrologic models using multiple-try DREAM(ZS) and high-performance computing. Water Resources Research, 48(1), W01526, doi:10.1029/2011WR010608.

Menke, W., 2012. Geophysical Data Analysis: Discrete Inverse Theory. Academic Press.

Moore, C. and Doherty, J., 2005. The role of the calibration process in reducing model predictive error. Water Resources Research. Vol 41, No 5. W05050.

Moore, C., Wöhling, T., and Doherty, J., 2010. Efficient regularization and uncertainty analysis using a global optimization methodology. Water Resources Research. Vol 46, W08527, doi:10.1029/2009WR008627.

Razavi, S., Tolson, B. and Burn, D., 2012. Review of surrogate modeling in water resources. Water Resour. Res., 48, W07401, doi:10.1029/2011WR011527.

Tonkin, M.J. and Doherty, J., 2009. Calibration-constrained Monte Carlo analysis of highly parameterized models using subspace techniques. Water Resour. Res., 45, W00B10, doi:10.1029/2007WR006678.

Wallis, I., Moore, C., Post, V., Wolf, L., Martens, E. and Prommer, H., 2014. Using predictive uncertainty analysis to optimise tracer test design and data acquisition. J. Hydrol., 515 pp. 191-204.

Watson, T.A., Doherty, J.E. and Christensen, S., 2013. Parameter and predictive outcomes of model simplification. Water Resour. Res., 49(7), 3952-3977, doi:10.1002/wrcr.20145.

White, J.T., Doherty, J.E. and Hughes, J.D., 2014. Quantifying the predictive consequences of model error with linear subspace analysis. Water Resour. Res, 50 (2): 1152-1173. DOI: 10.1002/2013WR014767

White, J.T., Fienen, M.N., and Doherty, J.E., 2016. pyEMU: A Python framework for environmental model uncertainty analysis. Environ Modell Softw, 85, 217-228.
