Applications of Bayesian sensitivity and uncertainty analysis to the statistical analysis of...

Page 1: Applications of Bayesian sensitivity and uncertainty analysis to the statistical analysis of computer simulators for carbon dynamics Marc Kennedy Clive.

Applications of Bayesian sensitivity and uncertainty analysis to the statistical analysis of computer simulators for carbon dynamics

Marc Kennedy

Clive Anderson, Stefano Conti, Tony O’Hagan

Probability & Statistics, University of Sheffield

Page 2:

Outline

Uncertainties in computer simulators
Bayesian inference about simulator outputs
– Creating an emulator for the simulator
– Deriving uncertainty and sensitivity measures
Example application
Some recent extensions

Page 3:

Uncertainties in computer simulators

Consider a complex deterministic code with a vector of inputs and a single output

$y = f(\mathbf{x})$

Use of the code is subject to:
– Input uncertainty
– Code uncertainty

Page 4:

Input uncertainty

The inputs to the simulator are unknown for a given real-world scenario
Therefore the true value of the output is uncertain
A Monte Carlo approach is often used to take this uncertainty into account
– Sample from the probability distribution of X
– Run the simulator for each point in the sample to give a sample from the distribution of Y
– Very inefficient… not practical for complex codes
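The Monte Carlo steps above can be sketched as follows. This is a minimal illustration, not the procedure used in the talk: `toy_simulator` is a hypothetical cheap stand-in for an expensive code, and the independent uniform input distribution is an assumption made only for the example.

```python
import numpy as np

def toy_simulator(x):
    """Hypothetical cheap stand-in for an expensive simulator: y = f(x)."""
    return np.sin(3.0 * x[0]) + 0.5 * x[1] ** 2

rng = np.random.default_rng(0)

# Step 1: sample from the (assumed) probability distribution of X.
n_samples = 10_000
X = rng.uniform(0.0, 1.0, size=(n_samples, 2))

# Step 2: run the simulator once per sample point to get a sample of Y.
Y = np.array([toy_simulator(x) for x in X])

# Summaries of the induced output distribution.
print("mean of Y:", Y.mean())
print("variance of Y:", Y.var(ddof=1))
print("simulator runs used:", n_samples)
```

Every sample point costs one simulator run, which is why the approach becomes impractical when a single run takes hours or days.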

Page 5:

Code uncertainty

The code output at a given input point is unknown until we run it at that point
– In practice codes can take hours or days to run, so we have a limited number of runs
We have some prior beliefs about the output
– Smooth function of the inputs

Page 6:

Bayesian inference about simulator outputs

Bayesian solution involves building an emulator
Highly efficient
– Makes maximum use of all available information
– A single set of simulator runs is required to train the emulator. All sensitivity and uncertainty information is derived directly from this
– The inputs for these runs can be chosen to give good information about the simulator output
A natural way to treat the different uncertainties within a coherent framework
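One common way to choose training inputs that cover the input space is a Latin hypercube design. The slides do not say which design was used, so the sketch below is an assumption for illustration only; `latin_hypercube` is a hypothetical helper written here, not a library function.

```python
import numpy as np

def latin_hypercube(n_runs, n_inputs, rng):
    """Return an (n_runs, n_inputs) design on [0, 1]^n_inputs with exactly
    one point in each of n_runs equal-width strata for every input."""
    # One random point inside each stratum, per input.
    u = (np.arange(n_runs)[:, None] + rng.uniform(size=(n_runs, n_inputs))) / n_runs
    # Shuffle the strata independently in each column.
    for k in range(n_inputs):
        u[:, k] = rng.permutation(u[:, k])
    return u

rng = np.random.default_rng(0)
design = latin_hypercube(n_runs=10, n_inputs=3, rng=rng)
print(design.round(3))
```

Each row is one simulator run. Rescaling the columns to the plausible range of each input gives the training set for the emulator.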

Page 7:

Inference about functions using Gaussian processes

We model f(.) as an unknown function having a Gaussian process prior distribution

$[f(\cdot) \mid \boldsymbol{\beta}, \sigma^2] \sim N\big(\mathbf{h}(\cdot)^T \boldsymbol{\beta},\ \sigma^2 c(\cdot,\cdot)\big)$

h(.) is a vector of regression functions and β are unknown coefficients

$\mathbf{h}(\cdot)^T \boldsymbol{\beta}$: prior expectation of the model output as a function of the inputs

Page 8:

Inference about functions using Gaussian processes

We model f(.) as an unknown function having a Gaussian process prior distribution

$[f(\cdot) \mid \boldsymbol{\beta}, \sigma^2] \sim N\big(\mathbf{h}(\cdot)^T \boldsymbol{\beta},\ \sigma^2 c(\cdot,\cdot)\big)$

c(.,.) is a correlation function, which defines our beliefs about smoothness of the output, and σ² is the GP variance

$\sigma^2 c(\cdot,\cdot)$: prior beliefs about covariance between model outputs

Page 9:

Choice of correlation function

We use the product of univariate Gaussian functions:

$c(\mathbf{x}, \mathbf{x}') = \prod_{k=1}^{p} \exp\{-b_k (x_k - x_k')^2\}$

where $b_k$ is a measure of the roughness of the function in the kth input
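As a minimal sketch, the product-of-Gaussians correlation, c(x, x') = prod over k of exp{-b_k (x_k - x'_k)^2}, can be written in a few lines. The function name `gaussian_correlation` and the example values are illustrative. The last loop reuses the roughness values 0.5, 0.2, 0.1 and 0.01 that label the plots in this deck, on an assumed unit input scale.

```python
import numpy as np

def gaussian_correlation(X1, X2, b):
    """c(x, x') = prod_k exp{-b_k (x_k - x'_k)^2} for all pairs of rows."""
    X1 = np.atleast_2d(X1)
    X2 = np.atleast_2d(X2)
    b = np.asarray(b, dtype=float)
    diff = X1[:, None, :] - X2[None, :, :]          # shape (n1, n2, p)
    return np.exp(-np.sum(b * diff ** 2, axis=2))   # product of exps = exp of sum

x = np.array([[0.0, 0.0]])
x_prime = np.array([[1.0, 1.0]])
print(gaussian_correlation(x, x, b=[0.5, 2.0]))        # a point with itself
print(gaussian_correlation(x, x_prime, b=[0.5, 2.0]))  # equals exp(-2.5)

# Larger roughness means the correlation dies away faster with distance.
for b in (0.5, 0.2, 0.1, 0.01):
    c = gaussian_correlation(np.array([[0.0]]), np.array([[1.0]]), b=[b])[0, 0]
    print(f"b = {b}: correlation at unit separation = {c:.3f}")
```

A small b keeps distant outputs strongly correlated, which corresponds to a prior belief that the function varies slowly in that input.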

Page 10:

roughness = 0.5

Page 11:

roughness = 0.2

Page 12:

roughness = 0.1

Page 13:

roughness = 0.01

Page 14:

Conditioning on code runs

Conditional on the observed set of training runs,

$y_i = f(\mathbf{x}_i), \quad i = 1, 2, \ldots, n$

f(.) is still a Gaussian process, with simple analytical forms for the posterior mean and covariance functions
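The conditioning step can be illustrated with the standard Gaussian process formulas. The sketch below makes simplifying assumptions that are not taken from the slides: a zero prior mean (no regression term), known roughness b and variance sigma2, and a one-dimensional input. The `simulator` function is a hypothetical stand-in for an expensive code.

```python
import numpy as np

def corr(X1, X2, b):
    """Gaussian correlation exp{-b (x - x')^2} between 1-D input arrays."""
    return np.exp(-b * (X1[:, None] - X2[None, :]) ** 2)

def condition_gp(x_train, y_train, x_new, b, sigma2):
    """Posterior mean and covariance of f at x_new given exact code runs.

    Assumes a zero prior mean and known b and sigma2 (simplifications)."""
    A = corr(x_train, x_train, b)    # correlations among training inputs
    T = corr(x_new, x_train, b)      # correlations between new and training inputs
    mean = T @ np.linalg.solve(A, y_train)
    cov = sigma2 * (corr(x_new, x_new, b) - T @ np.linalg.solve(A, T.T))
    return mean, cov

def simulator(x):
    """Hypothetical stand-in for an expensive deterministic code."""
    return np.sin(2.0 * np.pi * x)

x_train = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # 5 code runs
y_train = simulator(x_train)
x_new = np.linspace(0.0, 1.0, 21)

mean, cov = condition_gp(x_train, y_train, x_new, b=10.0, sigma2=1.0)
sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))

# The grid points with these indices coincide with the training inputs.
idx = [0, 5, 10, 15, 20]
print("largest error at training inputs:", np.max(np.abs(mean[idx] - y_train)))
print("largest sd at training inputs:", np.max(sd[idx]))
print("sd between training inputs (x = 0.1):", sd[2])
```

At the training inputs the posterior mean reproduces the code runs and the posterior standard deviation collapses to zero. Between runs the standard deviation grows again, which is how the emulator expresses code uncertainty.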

Page 15:

2 code runs

Page 16:

2 code runs

Page 17:

2 code runs

Large b

Small b

Page 18:

3 code runs

Page 19:

5 code runs

Page 20:

More about the emulator

The emulator mean is an estimate of the model output and can be used as a surrogate
The emulator is much more…
– It is a probability distribution for the whole function
– This allows us to derive inferences for many output related quantities, particularly integrals

Page 21:

Inference for integrals

For particular forms of input distribution (Gaussian or uniform), analytical forms have been derived for integration-based sensitivity measures
– Main effects of individual inputs
– Joint effects of pairs of inputs
– Sensitivity indices
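To make two of these measures concrete, the sketch below estimates main effects, E[Y | x_i] - E[Y], and first-order sensitivity indices, Var(E[Y | X_i]) / Var(Y), by brute-force averaging. This is not the analytical route described above: `emulator_mean` is a hypothetical cheap surrogate standing in for a fitted emulator mean, and the inputs are assumed independent and uniform on [0, 1].

```python
import numpy as np

def emulator_mean(X):
    """Hypothetical cheap surrogate standing in for a fitted emulator mean."""
    return 3.0 * X[:, 0] + X[:, 1] + X[:, 0] * X[:, 1]

rng = np.random.default_rng(1)
p = 2
n_mc = 4000
base = rng.uniform(size=(n_mc, p))       # inputs assumed independent U(0, 1)
y = emulator_mean(base)
overall_mean, total_var = y.mean(), y.var()

grid = (np.arange(50) + 0.5) / 50        # midpoints of 50 equal bins on [0, 1]

def conditional_means(i):
    """E[Y | x_i = g] for each grid value g, averaging over the other inputs."""
    out = np.empty(grid.size)
    for j, g in enumerate(grid):
        X = base.copy()
        X[:, i] = g                      # fix input i, keep the others random
        out[j] = emulator_mean(X).mean()
    return out

indices = []
for i in range(p):
    cond = conditional_means(i)
    main_effect = cond - overall_mean    # main effect of input i on the grid
    indices.append(cond.var() / total_var)   # first-order sensitivity index
    print(f"input {i}: main effect from {main_effect[0]:.2f} to {main_effect[-1]:.2f}, "
          f"sensitivity index {indices[-1]:.2f}")
```

Because the surrogate is cheap, the averaging costs nothing in simulator runs. The indices do not sum to one here because the toy function contains an interaction between the two inputs.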

Page 22:

Example Application

Page 23:

Sheffield Dynamic Global Vegetation Model (SDGVM)

Developed within the Centre for Terrestrial Carbon Dynamics
Our job with SDGVM is to:
– Apply sensitivity analysis for model testing
– Identify the greatest sources of uncertainty
– Correctly reflect the uncertainty in predictions

Page 24:

Net Ecosystem Production (carbon flux)

[Diagram: photosynthesis, with losses through plant respiration and soil respiration]

– Terrestrial carbon source if NEP is negative
– Terrestrial carbon sink if NEP is positive

Page 25:

Some Input Parameters

Leaf life span
Leaf area
Budburst temperature
Senescence temperature
Wood density
Maximum carbon storage
Xylem conductivity
Soil clay %
Soil sand %
Soil depth
Soil bulk density

Page 26:

Main Effect: Leaf life span

[Plot: mean NEP against leaf life-span, x-axis from 100 to 350]

If leaves die young, NEP is predicted to be higher, on average. Why?

Page 27:

Main Effect: Leaf life span (updated)

[Plot: mean NEP against leaf life-span, x-axis from 100 to 350]

If leaves die young, SDGVM allowed a second growing season, resulting in increased carbon uptake. This problem was fixed by the modellers.

Page 28:

Main Effect: Senescence Temperature

[Plot: mean NEP against senescence temperature, x-axis from 4 to 10]

Small values mean the leaves stay until the temperature is very low
Large values mean the leaves drop earlier, so reduce the growing season

Page 29:

When soil bulk density was added to the active parameter set, the Gaussian process model did not fit the training data properly

Page 30:

Error discovered in the soil module

[Plots: NEP against bulk density (0 to 1500000), in two panels labelled "Before…" and "After…"]

Page 31:

Our GP model depends on the output being a smooth function of the inputs. The problem was again fixed by the modellers.

Page 32:

SDGVM: new sensitivity analysis

Extended sensitivity analysis to 14 input parameters (using a more stable version)
Assumed uniform probability distributions for each of the parameters
The aim here is to identify the greatest potential sources of uncertainty

Page 33:

[Plots: NEP (g/m²/y) against max. age (years), water potential (MPa), leaf life span (days) and minimum growth rate (m)]

Page 34:

Percentage of total output variance

Leaf life span        69.1%
Minimum growth rate   14.2%
Water potential        3.4%
Maximum age            1.0%

Leaf life span: by investing effort to learn more about this parameter, output uncertainty could be significantly reduced

Page 35:

Extensions to the theory

Page 36:

Multiple outputs

So far we have created independent emulators for each output
– Ignores information about the correlation between outputs
We are experimenting with simple models linking the outputs together
This is an important first step in treating dynamic emulators and in aggregating code outputs

Page 37:

Dynamic emulators

Physical systems typically evolve over time
Their behaviour is modelled via dynamic codes

$y_t = f(y_{t-1}, \mathbf{x}, \mathbf{z}_t)$

– where x are tuning constants and z_t are context-specific drivers
– Recursive emulation of y_t over the appropriate time span shows promising results
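The recursive idea can be sketched by emulating the single-step function y_t = f(y_{t-1}, x, z_t) and feeding the emulator its own output. Everything specific below is an assumption for illustration: `one_step` is a hypothetical toy code, the emulator is a Gaussian process posterior mean with a constant prior mean and a fixed common roughness, and only the mean is propagated, so emulator uncertainty is ignored.

```python
import numpy as np

def one_step(y_prev, x, z):
    """Hypothetical single-step simulator y_t = f(y_{t-1}, x, z_t):
    the state relaxes towards the driver z at rate x."""
    return y_prev + x * (z - y_prev)

def corr(U, V, b=1.0):
    """Gaussian correlation between rows of U and V (common roughness b)."""
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-b * d2)

rng = np.random.default_rng(2)

# Train an emulator of the single-step function on a small set of runs.
D = rng.uniform(size=(60, 3))                   # columns: y_prev, x, z
runs = one_step(D[:, 0], D[:, 1], D[:, 2])
centre = runs.mean()                            # constant prior mean (assumed)
A = corr(D, D) + 1e-8 * np.eye(len(D))          # small nugget for stability
weights = np.linalg.solve(A, runs - centre)

def emulate_step(y_prev, x, z):
    """Emulator posterior mean for one time step."""
    u = np.array([[y_prev, x, z]])
    return centre + (corr(u, D) @ weights)[0]

# Iterate simulator and emulator over the same drivers.
x = 0.3                                         # tuning constant
drivers = 0.5 + 0.3 * np.sin(np.linspace(0.0, 4.0 * np.pi, 40))
y_sim, y_emu = [0.5], [0.5]
for z in drivers:
    y_sim.append(one_step(y_sim[-1], x, z))
    y_emu.append(emulate_step(y_emu[-1], x, z))  # feeds back its own output

gap = np.max(np.abs(np.array(y_sim) - np.array(y_emu)))
print("max |simulator - emulator| over the time span:", gap)
```

The training runs are single time steps, so the emulator is cheap to build even when the full time span is long. A fuller treatment would also propagate the emulator's uncertainty from step to step.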

Page 38:

[Plot: CENTURY output and dynamic emulator]

Page 39:

Aggregating outputs

Motivated by the UK carbon budget problem
– The total UK carbon absorbed by vegetation is a sum of individual pixels/sites
– Each site has a different set of input parameters (e.g. vegetation/soil properties), but some of these are correlated
This is a multiple output code
– Each site represents a different output
Bayesian uncertainty analysis is being extended, to make inference about the sum
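A small numerical sketch shows why the correlation between sites matters for the sum. The variance of a total is the sum of all entries of the covariance matrix of the site outputs, so positive correlation inflates it relative to the independent case. The site means, standard deviations and correlations below are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical per-site summaries (illustrative numbers only).
site_means = np.array([12.0, 8.0, 15.0])       # expected carbon uptake per site
site_sds = np.array([2.0, 1.5, 3.0])           # standard deviation per site
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.5],
                 [0.3, 0.5, 1.0]])             # assumed correlation between sites

cov = np.outer(site_sds, site_sds) * corr

total_mean = site_means.sum()
total_var = cov.sum()                           # Var(sum) = sum of all covariance entries
naive_var = np.sum(site_sds ** 2)               # what independence would give

print("mean of total:", total_mean)
print("sd of total, with correlation:", np.sqrt(total_var))
print("sd of total, assuming independence:", np.sqrt(naive_var))
```

Treating the sites as independent would understate the uncertainty in the total whenever the site outputs are positively correlated.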

Page 40:

References

For Bayesian analysis of computer models:
– Kennedy, M. C. and O’Hagan, A. (2001). Bayesian calibration of computer models (with discussion). J. Roy. Statist. Soc. B, 63: 425-464
For Bayesian sensitivity analysis:
– Oakley, J. E. and O’Hagan, A. (2004). Probabilistic sensitivity analysis of complex models: A Bayesian approach. J. Roy. Statist. Soc. B, 66: 751-769