

Data Assimilation

Matylda Jabłonska

Laboratory of Applied Mathematics
Lappeenranta University of Technology

University of Dar es Salaam, June 2013



Overview

1 Empirical modelling
    Least Squares Fitting
    Curve Fitting
    Regression models
    Creation of data: Design of Experiments

2 Mathematical Modelling
    Parameter estimation for nonlinear dynamical models
    Data assimilation for nonlinear dynamical models



Overview: Empirical modelling

    Least Squares Fitting
    Curve Fitting
    Regression models
    Creation of data: Design of Experiments
    Experimental plans for various purposes






Contents

Empirical models

Visualization of data

Calibration of models: Least squares

Fitting of a straight line

Curve fitting

Regression models: several independent variables

Design of experiments




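Empirical Models

We consider here situations where the model that describes the dependency between X and Y,

$$X \longrightarrow Y \qquad \text{(input} \longrightarrow \text{response)},$$

is constructed from empirical data only. If the input values consist of p variables $x_1, \dots, x_p$ and n experimental measurements have been made, the data values may be expressed as a design matrix with one column per variable and one row per experiment:

$$X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}.$$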




Visualization of data

All data contains measurement noise. But we should avoid using data that contains obvious 'big errors': outliers due to typing errors, malfunctioning instruments, etc. Before any modelling, it is advisable to check the data by visual plots.

[Figure: scatter plot of Y against X, with one point marked OUTLIER.]




Suppose we want to fit a line $y = b_0 + b_1 x$ to data measured at the points $(x_i, y_i)$, $i = 1, 2, \dots, n$. We must find values for the model parameters $b_0, b_1$ so that the values computed by the model and the measured values 'agree' as closely as possible. The most common way of doing this is to construct the least squares (LSQ) function

$$\ell(b) = \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2$$

and to find the values $b_0, b_1$ that minimize this expression.




More generally, a model is often written in the form

$$y = f(x, b) + \varepsilon$$

where x and y are the input and the response, b denotes the vector of unknown parameters, and $\varepsilon$ represents the measurement noise. The LSQ function then assumes the form

$$\ell(b) = \sum_{i=1}^{n} \bigl(y_i - f(x_i, b)\bigr)^2$$

and again we have to find the parameter values that minimize the sum. The minimization is generally performed by numerical optimization algorithms. In some cases, typically with empirical models, we can derive formulas that directly calculate the LSQ fit.
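The numerical route can be sketched as follows. This is only a minimal illustration: the data values are made up, and the model is deliberately the straight line, so that the result of the general-purpose optimizer `fminsearch` can be compared with the direct solution `X\y` of the same problem.

```matlab
% Hypothetical data, made up for illustration only
x = (0:5)';
y = [1.1; 2.9; 5.2; 7.1; 8.8; 11.0];

% Model f(x,b) and the LSQ function l(b) = sum_i (y_i - f(x_i,b))^2
f   = @(x, b) b(1) + b(2)*x;
lsq = @(b) sum((y - f(x, b)).^2);

% Numerical minimization, starting from an initial guess
b_num = fminsearch(lsq, [0; 1]);

% Direct LSQ solution of the same (linear) problem, for comparison
X = [ones(size(x)) x];
b_dir = X\y;

disp([b_num b_dir])   % the two columns should agree to a few decimals
```

For a model that is nonlinear in b, only the definition of `f` changes; the direct solution is then no longer available and the initial guess starts to matter.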




Fitting of a straight line

Let us return to the case where the model is given by a straight line, $f(x, b) = b_0 + b_1 x$.

[Figure: measured DATA points and the fitted straight-line MODEL, Y against X.]

The coefficients that minimize the LSQ function can now be calculated explicitly. Let us first consider separately the special cases where one of the coefficients vanishes.




If $f(x, b) = b_0$, the LSQ function reads

$$\ell(b) = \sum_{i=1}^{n} (y_i - b_0)^2.$$

The necessary condition for a minimum is that the derivative with respect to $b_0$ vanishes. We get the condition

$$\frac{d}{db_0}\ell(b) = \sum_{i=1}^{n} 2(y_i - b_0)(-1) = -2\sum_{i=1}^{n} y_i + 2\sum_{i=1}^{n} b_0 = 0,$$

from which $b_0 = \frac{1}{n}\sum_{i=1}^{n} y_i$. We see that if we fit a constant value to the data, the best LSQ fit is the mean value of the data.




Goodness of fit, the R² value

The result is not just a math exercise; it gives a valuable tool for assessing the goodness of fit of a model to data. The 'model' f(x, b) = constant is the simplest possible and lacks any explanatory power. We can therefore use it as a comparison case to see how well our 'real' model works: generally, it should fit the data clearly better than the trivial, constant model. The comparison is commonly written in the form of the R² value:

$$R^2 = 1 - \frac{\sum_i \bigl(y_i - f(x_i, b)\bigr)^2}{\sum_i (y_i - \bar{y})^2}.$$

Here $\bar{y}$ denotes the mean value of y, the fit by a constant. If our model fits the data much better than the mean value, R² is close to 1. If our model is as bad as the mean value, R² is close to 0. As a rule of thumb, an R² value between 0.7 and 0.9 is often regarded as good enough for an empirical model.
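A small sketch of the comparison, on made-up data (the numbers are assumptions chosen only for illustration): the constant model is fitted first and reproduces the mean, then a straight line is fitted and its R² value is computed against the constant model.

```matlab
% Hypothetical data, made up for illustration only
x = (0:5)';
y = [1.1; 2.9; 5.2; 7.1; 8.8; 11.0];

% Trivial model: a constant. Its LSQ fit is the mean of the data.
b0  = ones(size(y))\y;               % equals mean(y) up to rounding
tss = sum((y - mean(y)).^2);         % total sum of squares

% 'Real' model: a straight line
X   = [ones(size(x)) x];
b   = X\y;
rss = sum((y - X*b).^2);             % residual sum of squares

R2_line  = 1 - rss/tss;              % close to 1 for this nearly linear data
R2_const = 1 - sum((y - b0).^2)/tss; % 0 (up to rounding) by construction
```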




Next, consider a straight line that goes through the origin, $f(x, b) = b_1 x$. Now the LSQ function reads

$$\ell(b) = \sum_{i=1}^{n} (y_i - b_1 x_i)^2.$$

Again, we compute the derivative with respect to the parameter, now $b_1$, and get

$$\frac{d}{db_1}\ell(b) = \sum_{i=1}^{n} 2(y_i - b_1 x_i)(-x_i) = -2\sum_{i=1}^{n} y_i x_i + 2 b_1 \sum_{i=1}^{n} x_i^2 = 0,$$

from which we get

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$




Using vector-matrix notation, the result may be written in a compact form. Recall the definitions of the transpose of a vector and the inner product of two column vectors $x = (x_1, x_2, \dots, x_n)$, $y = (y_1, y_2, \dots, y_n)$:

$$x'y = \sum_{i=1}^{n} x_i y_i$$

(a special case of a matrix-matrix product). If y = x, we have

$$x'x = \sum_{i=1}^{n} x_i x_i.$$

So the previous result for $b_1$ can be written as

$$b_1 = \frac{x'y}{x'x} = (x'x)^{-1} x'y.$$
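As a quick numerical check of the formula for the line through the origin, the sketch below evaluates $(x'x)^{-1}x'y$ on a few made-up data points (an assumption, for illustration only) and compares it with the backslash solution.

```matlab
% Hypothetical data, made up for illustration only
x = [1; 2; 3; 4];
y = [2.1; 3.9; 6.2; 7.8];

b1_formula   = (x'*y)/(x'*x);   % b1 = x'y / x'x
b1_backslash = x\y;             % LSQ solution of x*b1 = y
```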



Finally, let us consider the full equation of a line, $f(x, b) = b_0 + b_1 x$, and suppose we have measured the response values $y = (y_1, y_2, \dots, y_n)$ at the points $x = (x_1, x_2, \dots, x_n)$. Using again vector-matrix notation, we may write the equations $y_i = b_0 + b_1 x_i$, $i = 1, 2, \dots, n$, as one matrix equation

$$y = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = Xb.$$

By analogy with the previous case (we skip a detailed derivation here) we may write the LSQ solution as $b = (X'X)^{-1}X'y$. We shall see later that this formula is valid for models with several input variables x, too.
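The formula can be tried out directly. The sketch below, on made-up data (an assumption for illustration), builds the matrix X for a straight line and solves the normal equations $(X'X)b = X'y$; the backslash operator gives the same LSQ solution and is usually preferred numerically.

```matlab
% Hypothetical data, made up for illustration only
x = (0:5)';
y = [1.1; 2.9; 5.2; 7.1; 8.8; 11.0];

X = [ones(size(x)) x];          % column of ones for b0, column of x for b1

b_normal = (X'*X)\(X'*y);       % solves the normal equations (X'X) b = X'y
b_slash  = X\y;                 % LSQ solution via the backslash operator
```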




Fitting of a polynomial

If the data is clearly nonlinear (it is curved, exhibits a maximum or minimum, etc.), we must drop the straight line and look for other functions to fit. A first extension might be a polynomial of second degree, a parabola,

$$y = b_0 + b_1 x + b_2 x^2.$$

We note that this model is nonlinear only with respect to the x variable, but linear with respect to the unknown coefficients $b_i$, $i = 0, 1, 2$. Using again matrix-vector notation, the above expression may be written as

$$y = (1, x, x^2) \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix}.$$




Suppose then that we have again measured some response values $y = (y_1, y_2, \dots, y_n)$ at the points $x = (x_1, x_2, \dots, x_n)$. Using again vector-matrix notation, we may write the equations $y_i = b_0 + b_1 x_i + b_2 x_i^2$, $i = 1, 2, \dots, n$, as one matrix equation

$$y = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} = Xb.$$

In this form, the equation is just the same as before, and the same solution formula applies. So we may write the LSQ solution as $b = (X'X)^{-1}X'y$. In Matlab, we may also use the shortcut notation b = X\y.




Example

    x = 0:5; x = x';
    b = [2 -.02 2]';
    X = [ones(6,1) x x.^2];
    y = X*b;
    plot(x, y)

    y_noisy = y + randn(6,1)*1.5;
    bn = X\y_noisy
    y_fit = X*bn;

    plot(x, y_noisy, 'o', x, y_fit, 'r-')




Example of the effect of an outlier in linear regression

    x = 0:10; x = x';
    X = [ones(11,1) x];
    y = 2*x + 3;                      % the "true" solution
    plot(x, y)

    % adding noise to observations
    y_noisy = y + randn(11,1)*2;
    bn = X\y_noisy
    y_fit = X*bn;
    plot(x, y_noisy, 'o', x, y_fit, 'r-')

    % the effect of an outlier
    y_noisy(5) = 100;
    bn = X\y_noisy
    y_fit = X*bn;
    plot(x, y_noisy, 'o', x, y_fit, 'r-')




Higher Polynomials

A polynomial of third degree, $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$, may equally well be written as

$$y = (1, x, x^2, x^3) \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix}.$$

So, to fit a polynomial of third degree, we just have to add yet another column, containing the $x^3$ values, to the matrix X. Again, we arrive at the system y = Xb, and solve it as before.

Warning: high-order polynomials may behave curiously, see the example 'census' in Matlab!
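A minimal sketch of the cubic fit, assuming noise-free synthetic data generated from known (made-up) coefficients so that the result can be checked. The built-in `polyfit` solves the same LSQ problem but returns the coefficients in descending powers, so they are flipped for comparison.

```matlab
% Synthetic data from a known cubic (coefficients are made up)
x      = (0:6)';
b_true = [1; 2; -1; 0.5];                 % b0, b1, b2, b3

X = [ones(size(x)) x x.^2 x.^3];          % one column per power of x
y = X*b_true;                             % noise-free responses

b = X\y;                                  % LSQ solution, recovers b_true

p         = polyfit(x, y, 3);             % [b3 b2 b1 b0], descending powers
b_polyfit = flipud(p(:));                 % reorder to [b0; b1; b2; b3]
```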




Example. A chemical component A reacts following an exponential decay law,

$$A(t) = A_0 e^{-kt}$$

for time $t \ge 0$.

[Figure: measured values of A against TIME.]




A polynomial is no longer a good choice for a model if we know in more detail how the data should behave: it goes to zero and stays there, which a polynomial never does. To find the parameters $A_0$, k that best fit the data, we use the methods of curve fitting.

Additional tasks

write the function to be optimized

use some optimization algorithm,

give an initial guess for the optimizer




For these purposes we may use, e.g., the function 'lsqcurvefit' in Matlab. Below is an example:

    % create data first:
    t = 0:1:10;                          % xdata
    y = 4*exp(-0.75*t);                  % the 'true' profile
    y_noisy = y + randn(1,11)*0.25;      % ydata, the 'experimental' noisy data

    % do the fit:
    initguess = [1 1];                   % the initial guess for the optimizer
    xlsq = lsqcurvefit(@myfun1, initguess, t, y_noisy);   % do the fit
    y_fit = myfun1(xlsq, t);             % compute the solution with the
                                         % optimized parameters
    plot(t, y_noisy, 'o', t, y_fit);     % plot both data and fit




The function we fit is given in a separate 'function' file with the name, for instance, 'myfun1.m':

    function F = myfun1(teta, t)
    % INPUT parameter list:
    %   teta   the parameters to be estimated
    %   t      'xdata', here the times of observation

    % y = A0*exp(-k*t)     this is just a comment row
    F = teta(1)*exp(-teta(2)*t);




So far we have considered cases where the response depends on only one experimental factor, the time for instance. But it often depends on several factors, and we do not even know in advance which ones. Regression analysis provides methods for studying responses with several independent explanatory factors.

Example. We want to optimize the quality y of food prepared in an oven. y depends on two factors: $x_1$, the temperature of the oven, and $x_2$, the time the food is kept in the oven. There clearly is an optimum: too small or too large values of $x_1$, $x_2$ would spoil the food. How should we find the optimum with minimal costs?




The question of minimizing the costs (the number of experiments performed) leads us to the topic of design of experiments. We return to this question later. Let us first suppose that we have, somehow, decided to make n experiments at the points collected in the design matrix, with one column for each of the variables $x_1$, $x_2$:

$$X = \begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ \vdots & \vdots \\ x_{1n} & x_{2n} \end{pmatrix},$$

and have measured the response values $y = (y_1, y_2, \dots, y_n)$ in the experiments. We may now try to fit to the data various models that depend on the x variables.




A linear model of two variables has the form

$$y = b_0 + b_1 x_1 + b_2 x_2.$$

Using vector-matrix notation, the equations $y_i = b_0 + b_1 x_{1i} + b_2 x_{2i}$, $i = 1, 2, \dots, n$, may be written as a single equation

$$y = \begin{pmatrix} 1 & x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{1n} & x_{2n} \end{pmatrix}
\begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} = Xb.$$

Formally, this is the same matrix system we have met before, and we may write the LSQ solution as $b = (X'X)^{-1}X'y$ (in Matlab, use the shortcut notation b = X\y).




The terms of a quadratic model

But, in our food example, we know that a linear model cannot explain the data: the optimum inside the experimental region requires some quadratic terms in the model. We may separately consider the parts:

The linear part, the main effects

$$b_0 + b_1 x_1 + b_2 x_2$$

The products, the interaction terms

$$b_{12} x_1 x_2$$

The second powers, the quadratic terms

$$b_{11} x_1^2 + b_{22} x_2^2$$




A full quadratic model contains all the above parts,

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{12} x_1 x_2 + b_{22} x_2^2.$$

If measurement data is available at the points $(x_i, y_i)$, $i = 1, 2, \dots, n$, the equations

$$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + b_{11} x_{1i}^2 + b_{12} x_{1i} x_{2i} + b_{22} x_{2i}^2$$

may again be written as a single equation, just by adding the respective columns for $x_1^2$, $x_1 x_2$ and $x_2^2$ to the design matrix X. The LSQ solution for the model coefficients b is then obtained just as before.

So, technically, we know how to calculate LSQ solutions for linear or quadratic regression models with two (or more) independent variables. But the statistical analysis remains: how good is the model, and which model should be selected?
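Adding the columns can be sketched as follows. The experimental points (a 3 x 3 grid in two factors) and the 'true' coefficients are assumptions made up for illustration; the responses are generated without noise so that the recovered coefficients can be checked.

```matlab
% Hypothetical experimental points: a 3 x 3 grid in the two factors
[g1, g2] = meshgrid(-1:1, -1:1);
x1 = g1(:);  x2 = g2(:);

% Made-up 'true' coefficients [b0 b1 b2 b11 b12 b22]'
b_true = [5; 1; -2; -3; 0.5; -1];

% Design matrix of the full quadratic model: main effects,
% quadratic terms and the interaction term
X = [ones(size(x1)) x1 x2 x1.^2 x1.*x2 x2.^2];

y = X*b_true;       % noise-free responses
b = X\y;            % LSQ solution, recovers b_true
```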




Statistics for regression

The residual of a fit is the difference between the data and the model values, for regression models given as

$$\mathrm{res} = y - Xb.$$

These values should be compared to the size of the experimental error, the noise in the measurements. In the simple (and most typical) case we may suppose that the noise level is the same in all experiments. The noise should be estimated by replicated measurements: measurements repeated several times in the same experimental condition. The size of the noise is expressed as the standard deviation (std) $\sigma$ of the repeated measurement values.

The basic statistical requirement is that a model should fit the data with the same accuracy as the measurements are obtained: the residual of the fit should roughly equal the estimated size of the measurement noise,

$$\mathrm{std}(\mathrm{res}) \approx \mathrm{std}(\mathrm{noise}).$$




Example

[Figure: MODEL curve, DATA and REPLICATES; STD(RESIDUALS) = 0.050, STD(REPLICATES) = 0.054.]




t–values

Recall the LSQ estimator for the coefficients of the model, $b = (X'X)^{-1}X'y$. Since the data y contains noise, the values of b are noisy, too.

The std of the coefficients may be calculated from the so-called covariance matrix $\mathrm{cov}(b) = (X'X)^{-1}\sigma^2$. The diagonal of cov(b) gives the variances of b, so $\mathrm{std}(b_i)$ for each coefficient is obtained as the square root of the corresponding diagonal value.

The 'signal to noise ratio', the calculated value of the coefficient divided by its std, is called the t–value of the coefficient:

$$t_i = b_i / \mathrm{std}(b_i).$$




The t–values may be used to select the terms of a model. The idea is that if a term is not known reliably enough, it should be dropped from the model. This is the situation if the uncertainty $\mathrm{std}(b_i)$ is large compared to the calculated value of the coefficient $b_i$. As a rule of thumb, terms with

$$|t_i| = |b_i| / \mathrm{std}(b_i) < 3$$

should be abandoned.




    X = [ones(size(x,1),1), x];      % add the '1' column
    [n, p] = size(X);                % number of experiments and of coefficients
    bhat = X\y; yhat = X*bhat;       % the LSQ solution
    res = y - yhat;                  % the residuals
    rss = sum(res.^2);               % the residual sum of squares
    s2 = rss/(n-p);                  % the variance of noise in y
    cb = inv(X'*X)*s2;               % covariance of b
    sdb = sqrt(diag(cb));            % the standard deviations of b
    tb = bhat./sdb;                  % the t-values
    tss = sum((y-mean(y)).^2);       % the total sum of squares
    R2 = 1 - rss/tss;                % R^2 value




Choice of the experimental region

Concepts:

Operational region: the conditions in which the experiments may reasonably be carried out

Experimental plan: the set of experiments performed, typically covering a subdomain of the operational region

Risks in choosing the region (min/max values) for the experiments:

Too small → all experiments are (almost) replications of one situation, and the effects of the factors are confounded by experimental noise.

Too big → the effects of the factors are not covered by a (linear or quadratic) regression model.




A rule of thumb: the effects due to different values of the factors should be 2 to 3 times larger than the size of the experimental noise.

The selection of the min/max values of the factors may be problematic. The experimenter needs to know a 'reasonable' region for the experiments; here the statistical methods (alone) do not help!

Suppose we have p factors, $x = (x_1, \dots, x_p)^T$, whose effect on a response variable is studied. The values for a set of experiments are given as a table, the Experimental Design, with one column per factor and one row per experiment:

$$X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}.$$




Screening: from a large number of possibly important factors, we want to find the most significant ones. Properties of the plans to be used:

+ A minimal number of experiments.

- Preliminary experiments, a 'final' model not yet found.

Experiments for a regression model:

+ A model reliably constructed, which enables an analysis of the effects of the factors and, e.g., an optimization of the response values.

- A small number of factors (max 4 or 5).




Coded Units

Design plans are typically expressed in coded units: the center point of the experimental plan is moved to the origin, and the min/max points to the values ±1. If $\bar{x}_i$ denotes the mean value and $\Delta x_i$ the difference between the max and min values of the factor $x_i$, the transformation is given by

$$X_i = \frac{x_i - \bar{x}_i}{\Delta x_i / 2}.$$

The coded units (X) give a generic way to present various plans. The values in the real 'laboratory' units (x) are obtained from the coded units by solving the above equations.
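The transformation in both directions can be sketched as below. The plan in laboratory units (oven temperature and time) is made up for illustration, and the 'mean value' is taken here as the midpoint between the min and max values, an assumption that coincides with the arithmetic mean for symmetric plans such as this one.

```matlab
% Hypothetical plan in laboratory units: [temperature (C), time (min)]
x = [150 30; 200 30; 150 60; 200 60; 175 45];

xc    = (max(x) + min(x))/2;      % center point of each factor
delta = max(x) - min(x);          % difference between max and min values

% laboratory units -> coded units
% (uses implicit expansion: MATLAB R2016b or later, or Octave)
Xcoded = (x - xc) ./ (delta/2);

% coded units -> laboratory units, by solving the same equation for x
xback = xc + Xcoded .* (delta/2);
```

With these numbers the four corner experiments map to the points (±1, ±1) and the last row to the center point (0, 0).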




The most important design plans

Different designs allow the determination of different models. So the design of experiments should be selected according to the expected behavior of the response:

$2^N$ design. Enables the estimation of a linear model (plus interaction terms, see below)

CCD, Central Composite Design. Allows the estimation of a full quadratic model (or response surface)




Replicated measurements

With the above design, it is possible to estimate a linear model together with the interaction term, $y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2$. But we should have more experiments than unknowns (the coefficients b). As a rule of thumb, the number of experiments should be roughly twice the number of unknowns.

It is always advisable to perform several (e.g. 4–5) measurements that are repeated at the same point. From replicates we get an estimate of the noise level of the measurement. Replicates may be done in many ways, e.g., by performing all experiments twice. Most commonly, the replicated measurements are carried out at the center point. Then the noise level is easily computed as the std (standard deviation) of the replicates.




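Example: a $2^N$ experiment with 4 repeated measurements in the center, in coded units, when N = 2 (columns $x_1$, $x_2$):

$$X = \begin{pmatrix}
+1 & -1 \\
+1 & +1 \\
-1 & -1 \\
-1 & +1 \\
0 & 0 \\
0 & 0 \\
0 & 0 \\
0 & 0
\end{pmatrix}$$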




Central Composite Design (CCD) plans

A $2^N$ plan may be extended by doing experiments 'One Variable at a Time' (OVAT): change only the values of one variable, keeping the rest constant at the center point.

Example with N = 2

[Figure: the plan in coded units, both axes from −1.5 to 1.5; legend: 2N EXPERIMENTS, OVAT EXPERIMENTS.]




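Example: a CCD plan with 2 center point replicates, in coded units, N = 2:

$$X = \begin{pmatrix}
+1 & -1 \\
+1 & +1 \\
-1 & -1 \\
-1 & +1 \\
\sqrt{2} & 0 \\
-\sqrt{2} & 0 \\
0 & \sqrt{2} \\
0 & -\sqrt{2} \\
0 & 0 \\
0 & 0
\end{pmatrix}$$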
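The CCD plan above can be assembled in a few lines, and a rank check illustrates why the extension is needed for a quadratic model. This is a sketch under the assumption that the model matrix of the full quadratic model has the columns 1, x1, x2, x1², x1x2, x2²; the helper name `quadcols` is hypothetical.

```matlab
% CCD plan for N = 2 in coded units
corners = [ 1 -1;  1  1; -1 -1; -1  1];      % the 2^N experiments
a       = sqrt(2);
axial   = [ a  0; -a  0;  0  a;  0 -a];      % the OVAT experiments
center  = zeros(2, 2);                       % 2 center point replicates
Xccd    = [corners; axial; center];

% Hypothetical helper: model matrix of the full quadratic model
quadcols = @(D) [ones(size(D,1),1) D(:,1) D(:,2) ...
                 D(:,1).^2 D(:,1).*D(:,2) D(:,2).^2];

r_corners = rank(quadcols(corners));   % 4: x1^2 and x2^2 equal the constant column
r_ccd     = rank(quadcols(Xccd));      % 6: all six coefficients can be estimated
```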




The plan enables the construction of a full quadratic model

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{12} x_1 x_2 + b_{22} x_2^2$$

with the main effects, the interaction terms and the quadratic terms.

Remarks

It is common to do only the 'One Variable at a Time' experiments. This is not recommended: the interaction between factors will not be seen.

Experiments may be done in two stages: first a $2^N$ design with center point replicates. If the center point values give an indication of quadratic behavior (the values are larger/smaller than at the corner points), the design may be extended by OVAT experiments to the full Central Composite Design.




Experimental Optimization

Purpose: optimize the quality of a product by doing experiments and analyzing the results by regression methods.

The procedure is sequential:

1 Select an experimental plan:
    A $2^N$ plan when far from the optimum
    A CCD plan when near the optimum

2 Perform the experiments, collect the results in data matrices

3 Create the regression model

4 Do (a few) experiments towards the optimum, as guided by the regression model ('response surface')

5 When results deteriorate, go back to step 1



Overview: Mathematical modelling

    Parameter estimation for nonlinear dynamical models
    A more complicated example
    Data assimilation for nonlinear dynamical models
    A Recipe for parameter identification with Matlab
    Temperature dependency in chemical kinetics






Example. A chemical reaction (or radioactive decay): A → B → C. Assume that the reaction rates are proportional to the amounts of the components A, B. They may then be written as the expressions $k_1 A$, $k_2 B$, where $k_1$, $k_2$ are the reaction rate constants. Modelling by material balances leads us to a system of ordinary differential equations (ODE):

$$\frac{dA}{dt} = -k_1 A$$

$$\frac{dB}{dt} = k_1 A - k_2 B$$

$$\frac{dC}{dt} = k_2 B$$

Note that the mass balance is always satisfied: $d/dt\,(A + B + C) = 0$, so A + B + C = constant.




In this example, the solution may be obtained by integrating 'by hand'. Typically, however, a solution is only available by numerical methods. Using MATLAB, this requires the steps

write an m-file (a script file) that gives all the necessary initial information and calls an ODE solver.

write an m-file (a function file) that contains the model equations.

Note that the ODE solver may remain a 'black box' for the user: it is usually enough to know just which solver to use.

The solver call, as well as the model function file, must be written in a specific way, as given in the example (and in MATLAB's help files). The files may be named, e.g., 'myfirst.m' and 'myfirstode.m'. The solution is obtained by writing the command 'myfirst' in MATLAB.




    % SCRIPT file to run the ODE simulation for A->B->C.
    s0 = [1 0 0];          % initial values for A, B, C
    tspan = [0, 10];       % time interval
    k1 = 0.7;              % model parameter
    k2 = 0.2;              % model parameter
    % Call of the MATLAB ODE solver 'ode23':
    [t, s] = ode23(@myfirstode, tspan, s0, [], k1, k2);

    % inputs:  myfirstode  the name of the m-file where the ODE is given
    %          tspan       time interval where the solution is wanted
    %          s0          initial values at time t=0
    %          []          options, empty: not used here
    %          k1, k2      model parameters used in 'myfirstode'
    % outputs: t           the time points where the solution is presented
    %          s           the solution matrix
    plot(t, s)             % plot the solution




    function ds = myfirstode(t, s, k1, k2)
    % input   t       the time variable (not used in this case)
    %         s       the state vector
    %         k1, k2  model parameters
    % output  ds      the derivative ds/dt at time t

    A = s(1);    % for clarity & readability, write the
    B = s(2);    % model using the notation A, B, C for the
    C = s(3);    % components

    dA = -k1*A;            % the ODE system equations
    dB = k1*A - k2*B;
    dC = k2*B;
    ds = [dA; dB; dC];     % collect the output in vector ds
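The remark that this system can be integrated by hand gives a convenient check of the numerical solution. The sketch below is self-contained: it writes the right-hand side as an anonymous function instead of a separate m-file and uses `ode45` (both are choices made here for illustration). The closed-form expressions assume A(0) = A0, B(0) = C(0) = 0 and k1 ≠ k2.

```matlab
k1 = 0.7;  k2 = 0.2;          % model parameters, as in the script above
s0 = [1 0 0];                 % initial values for A, B, C
tspan = 0:1:10;               % output times

% right-hand side of the ODE system as an anonymous function
rhs = @(t, s) [-k1*s(1); k1*s(1) - k2*s(2); k2*s(2)];
[t, s] = ode45(rhs, tspan, s0);

% closed-form solution (valid for B(0) = C(0) = 0 and k1 ~= k2)
A0 = s0(1);
A  = A0*exp(-k1*t);
B  = A0*k1/(k2 - k1)*(exp(-k1*t) - exp(-k2*t));
C  = A0 - A - B;              % from the mass balance A + B + C = A0

err   = max(max(abs(s - [A B C])));        % numerical vs. closed form
drift = max(abs(sum(s, 2) - sum(s0)));     % deviation from the mass balance
```

With the default solver tolerances, `err` should be small compared to the concentrations, and `drift` should be at the level of rounding errors.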




Examples of linear and nonlinear models:

$$f(x; \theta) = x_1\theta_1 + x_2\theta_2 \qquad \text{(linear model)}$$

$$f(x; \theta) = \theta_1 e^{x\theta_2} \qquad \text{(nonlinear model)}$$

Here x denotes the experimental points, $\theta$ the parameters to be estimated.

In both examples the model is written in an algebraic form, i.e., in terms of some 'simple' formulas. No numerical solvers are then required.

A dynamical model is written as an ODE system, and the solution is obtained by a numerical solver.




General form of a model

Generally, a model may be written in the form

s = f (x , θ, const)

y = g(s)

where

s state

x experimental conditions

θ estimated parameters

const known constants

y the observables

f the model function

g the observation function




Example 1. Consider again the reaction A → B → C, modelled as the ODE system

$$\frac{dA}{dt} = -k_1 A$$

$$\frac{dB}{dt} = k_1 A - k_2 B$$

$$\frac{dC}{dt} = k_2 B$$

The data y consists of the values of (any of) the components A, B, C, measured at some sampling instants $t_i$, $i = 1, 2, \dots, n$. The unknowns to be estimated are the rate constants, $\theta = (k_1, k_2)$.




Matlab solution

The parameter estimation will be done by the FMINSEARCH optimizer. Let us first suppose that only values of B have been measured, with the initial values A(0) = 1.0, B(0) = C(0) = 0. To do the LSQ fitting, we have to write a script file for the initializations, a call of the optimizer, and plots of the solution:

    % SCRIPT file for commands to call the FMINSEARCH optimizer
    clear all;             % eliminate earlier definitions

    % Generate the true solution
    odesolver;

    % the sampling instants, correspond to tspan in odesolver
    t = 0:1:10; t = t';
    % generate noisy observations with noise
    % proportional to the solution
    y = y + y.*randn(11,1)*0.1;
    data = [t y];          % data for the fitting

    % begin parameter identification
    k1 = 0.3;              % initial guesses of the unknown
    k2 = 0.2;              % parameters for the optimizer
    teta = [k1 k2];        % just collect in 1 vector

    % sampling instants t and measured B are in 'data'
    s0 = [1 0 0];          % initial values for the ODE

    % Call the optimizer:
    tetaopt = fminsearch(@my1lsq, teta, [], s0, data);

    % INPUT:  my1lsq    the filename of the objective function
    %         teta      the starting point for the optimizer
    %         []        options (not used)
    %         s0, data  parameters needed in 'my1lsq'
    % OUTPUT: tetaopt   the optimized value for teta

    % ODE solver called once more, to get the optimized solution
    k1 = tetaopt(1); k2 = tetaopt(2);

    [t, s] = ode23(@myfirstode, t, s0, [], k1, k2);
    plot(t, y, 'o', t, s)  % plot the data vs solution




The LSQ objective function is coded in the 'my1lsq' function:

    function lsq = my1lsq(teta, s0, data)
    % INPUT   teta       the unknowns k1, k2
    %         s0, data   the constants needed:
    %         s0         initial values needed by the ODE
    %         data(:,1)  time points
    %         data(:,2)  responses: B values
    % OUTPUT  lsq        value of the LSQ function
    t = data(:,1);
    y_obs = data(:,2);     % data points
    k1 = teta(1); k2 = teta(2);

    % call the ODE solver to get the states s:
    [t, s] = ode23(@myfirstode, t, s0, [], k1, k2);

    % the ODE system in 'myfirstode' is just as before
    % at each row (time point), s has
    % the values of the components [A, B, C]
    y_cal = s(:,2);        % separate the measured B

    % compute the expression to be minimized:
    lsq = sum((y_obs - y_cal).^2);




The script 'odesolver' generates the true solution:

    % SCRIPT file to run the ODE simulation for A->B->C.
    s0 = [1 0 0];          % initial values for A, B, C
    tspan = [0:1:10];      % time interval with observations
                           % at every integer value
    k1 = 0.7;              % model parameter
    k2 = 0.2;              % model parameter
    % Call of the MATLAB ODE solver 'ode23':
    [t, s] = ode23(@myfirstode, tspan, s0, [], k1, k2);

    % inputs:  myfirstode  the name of the m-file where the ODE is given
    %          tspan       time interval where the solution is wanted
    %          s0          initial values at time t=0
    %          []          options, empty: not used here
    %          k1, k2      model parameters used in 'myfirstode'
    % outputs: t           the time points where the solution is presented
    %          s           the solution matrix
    plot(t, s)             % plot the solution
    y = s(:,2);            % extract component B as the observed quantity




The function 'myfirstode' is the same as before:

    function ds = myfirstode(t, s, k1, k2)
    % input   t       the time variable (not used in this case)
    %         s       the state vector
    %         k1, k2  model parameters
    % output  ds      the derivative ds/dt at time t
    A = s(1);    % for clarity & readability, write the
    B = s(2);    % model using the notation A, B, C for the
    C = s(3);    % components
    dA = -k1*A;            % the ODE system equations
    dB = k1*A - k2*B;
    dC = k2*B;
    ds = [dA; dB; dC];     % collect the output in vector ds




Chemical kinetics with reactions

$$A + B \xrightarrow{k_1} C + F$$

$$A + C \xrightarrow{k_2} D + F$$

$$A + D \xrightarrow{k_3} E + F$$

Modelled as an ODE:

$$\frac{d[A]}{dt} = -k_1[A][B] - k_2[A][C] - k_3[A][D]$$

$$\frac{d[B]}{dt} = -k_1[A][B]$$

$$\frac{d[C]}{dt} = +k_1[A][B] - k_2[A][C]$$

$$\frac{d[D]}{dt} = +k_2[A][C] - k_3[A][D]$$

$$\frac{d[E]}{dt} = +k_3[A][D].$$




With known initial values A(0), ..., E(0), the solution $f(t, \theta)$ is again obtained by one of the 'ODExx' solvers of Matlab.

The data: $y_i$, the analyzed concentrations at the time points $t_i$.

The parameters to be estimated: $\theta = (k_1, k_2, k_3)$.

The system is more complicated, but it is solved using just the same procedure as above.




The Matlab solution

The ODE solution is obtained by a call to an ODE solver

    [t, y] = ode45('odefun', time, y0, [], theta);

where the function 'odefun' contains the code for the ODE system. The LSQ objective function to be minimized may be written as

    function ss = lsqfun(theta, data, y0)
    t = data(:,1);
    yobs = data(:,2);
    [t, ymodel] = ode45('odefun', t, y0, [], theta);
    ss = sum((yobs - ymodel).^2);

and the optimization, after the necessary initializations, is done by the call

    thetaopt = fminsearch('lsqfun', theta0, [], data, y0)




Example: heat transfer. At t = 0 a glass of beer has the temperature $T_0$. It is cooled from outside by water, which has a fixed temperature $T_{water}$. We measure temperatures and get the data $(t_i, T_i)$, $i = 1, \dots, n$. Based on this data we want to fit the parameters of a model that describes the heat transfer between the glass ('reactor') and the water ('cooler'). Note that the heat transfer takes place both through the glass and via the air/water surface:

$$\frac{dT}{dt} = -k_1(T - T_{water}) - k_2(T - T_{air})$$

The solution may be obtained either by an ODE solver, or by integrating the equation by hand:

$$T(t) = (T_0 - T_{inf})\,e^{-(k_1 + k_2)t} + T_{inf}$$

Here $T_{air}$ is the temperature of the air, $T_{inf} = (k_1 T_{water} + k_2 T_{air})/(k_1 + k_2)$ is the 'steady state' temperature ($T' = 0$), and $k_1$, $k_2$ are the unknown parameters to be fitted.
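A fit of this kind can be sketched with the closed-form solution and `fminsearch`. Everything numerical below is an assumption made for illustration: the temperatures, the 'true' rate constants, the sampling times and the tightened optimizer tolerances. The 'measurements' are synthetic and noise-free, so that the fitted values can be compared with the ones used to generate them. Note that k1 and k2 can be separated only if the water and air temperatures differ: the decay rate fixes k1 + k2, and the steady state fixes their ratio.

```matlab
% Known constants (hypothetical values, for illustration only)
T0 = 20;  Twater = 5;  Tair = 22;

% Closed-form model T(t), with th = [k1 k2]
Tinf   = @(th) (th(1)*Twater + th(2)*Tair)/(th(1) + th(2));
Tmodel = @(th, t) (T0 - Tinf(th))*exp(-(th(1) + th(2))*t) + Tinf(th);

% Synthetic, noise-free 'measurements' generated with known parameters
k_true = [0.08 0.02];
t      = (0:5:45)';
Tdata  = Tmodel(k_true, t);

% LSQ objective function and its minimization from an initial guess
lsq  = @(th) sum((Tdata - Tmodel(th, t)).^2);
opts = optimset('TolX', 1e-8, 'TolFun', 1e-10, ...
                'MaxFunEvals', 2000, 'MaxIter', 2000);
k_fit = fminsearch(lsq, [0.05 0.05], opts);   % should end up close to k_true
```

With real, noisy measurements the same objective function is used with the measured temperatures in place of `Tdata`.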




An example fit:

[Figure: measured data and the fitted model curve.]

Note the non–ideality of the data.




Examples of linear and nonlinear models:

$$f(x; \theta) = x_1\theta_1 + x_2\theta_2 \qquad \text{(linear model)}$$

$$f(x; \theta) = \theta_1 e^{x\theta_2} \qquad \text{(nonlinear model)}$$

Here x denotes the experimental points. In data assimilation, the "parameter" to be estimated is the initial state of the system, when some later observations are given.

A dynamical model is written as an ODE system, just like in parameter estimation, and the solution is obtained by numerical minimization, with the components of the initial state as the independent variables in the minimization.

The least squares cost function of data assimilation on a nonlinear model may have very many local minima!




Example 1. Consider again the reaction A → B → C, modelled as the ODE system

$$\frac{dA}{dt} = -k_1 A$$

$$\frac{dB}{dt} = k_1 A - k_2 B$$

$$\frac{dC}{dt} = k_2 B$$

The data y consists of the values of (any of) the components A, B, C, measured at some sampling instants $t_i$, $i = 1, 2, \dots, n$. The unknowns to be estimated are the initial concentrations, $\theta = (A(0), B(0), C(0))$.




Matlab solution

Data assimilation will again be carried out by the FMINSEARCH optimizer. Let us first suppose that only values of B have been measured, and that the reaction rate coefficients are known to be $k_1 = 0.7$, $k_2 = 0.2$.

    % SCRIPT file for commands to call the FMINSEARCH optimizer
    % data assimilation for initial conditions
    clear all;

    % Generate the true solution
    odesolver;

    % the sampling instants, correspond to tspan in odesolver
    t = 0:1:10; t = t';
    % generate noisy observations with noise proportional to the solution
    y = y + y.*randn(11,1)*0.1;
    data = [t y];          % data for the fitting

    % begin data assimilation
    s0 = [0.8 0.2 0];      % first guess for the initial values
    teta = s0;             % just collect in 1 vector

    % Call the optimizer:
    tetaopt = fminsearch(@assimlsq, teta, [], data);

    % INPUT:  assimlsq  the filename of the objective function
    %         teta      the starting point for the optimizer
    %         []        options (not used)
    %         data      parameters needed in 'assimlsq'
    % OUTPUT: tetaopt   the optimized value for teta

    % ODE solver called once more, to get the optimized solution
    % (k1, k2 are still defined from the call of 'odesolver')
    s0(1) = tetaopt(1); s0(2) = tetaopt(2); s0(3) = tetaopt(3);
    [t, s] = ode23(@myfirstode, t, s0, [], k1, k2);
    plot(t, y, 'o', t, s)  % plot the data vs solution




The LSQ objective function is coded in the 'assimlsq' function, different from parameter optimization since the unknown is different:

function lsq = assimlsq(teta,data)
%INPUT  unknown teta, k1, k2 known
%       data the constants needed:
%       data(:,1) time points
%       data(:,2) responses: B values
%OUTPUT lsq value
t = data(:,1);
y_obs = data(:,2); %data points
k1 = 0.7; k2 = 0.2;
s0(1)=teta(1);
s0(2)=teta(2);
s0(3)=teta(3);




%call the ODE solver to get the states s:
[t,s] = ode23(@myfirstode,t,s0,[],k1,k2);

%the ODE system in "myfirstode" is just as before
%at each row (time point), s has
%the values of the components [A,B,C]
y_cal = s(:,2); %separate the measured B

%compute the expression to be minimized:
lsq = sum((y_obs-y_cal).^2);
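A quick way to probe whether all three unknowns can be recovered from B alone is to evaluate the cost at two guesses that differ only in C(0). The sketch below does this with the closed-form expression for B(t) (valid for k1 ~= k2) in place of the ode23 call, a substitution made only to keep the example self-contained. The names `Bmodel`, `lsq_a`, `lsq_b` and the noise-free observations are assumptions for illustration.

```matlab
k1 = 0.7; k2 = 0.2;                 % known rate coefficients (from the example)
t = (0:1:10)';                      % sampling instants

% Closed-form B(t) for A -> B -> C with k1 ~= k2; teta = [A0 B0 C0].
Bmodel = @(teta) teta(2) * exp(-k2 * t) ...
    + teta(1) * k1 / (k2 - k1) * (exp(-k1 * t) - exp(-k2 * t));

y_obs = Bmodel([1 0 0]);                       % noise-free "observations"
lsq = @(teta) sum((y_obs - Bmodel(teta)).^2);  % same form of cost as assimlsq

lsq_a = lsq([0.8 0.2 0]);   % first guess used in the script
lsq_b = lsq([0.8 0.2 5]);   % same A0, B0 but a very different C0
```

The two cost values coincide because C does not feed back into the equations for A and B. With only B measured, the value returned for `teta_opt(3)` is therefore not determined by the data.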




The script 'odesolver' generates the true solution, unchanged from parameter estimation or modelling:

%SCRIPT file to run the ODE simulation for A->B->C.
s0 = [1 0 0]; %initial values for A, B, C
tspan = [0:1:10]; %time interval with observations at every integer value
k1 = 0.7; %model parameter
k2 = 0.2; %model parameter
% Call of the MATLAB ODE solver "ode23":
[t,s] = ode23(@myfirstode,tspan,s0,[],k1,k2);

%inputs:  myfirstode the name of the m-file, where the
%         ODE is given




%         tspan time interval where solution wanted
%         s0 initial values at time t=0
%         [] options, empty: not used here
%         k1,k2 model parameters used in "myfirstode"
%outputs: t the time points where solution presented,
%         s the solution matrix
plot(t,s) %plot the solution
y = s(:,2); %extract component B as the observed quantity




The function ’myfirstode’ is the same as before:

function ds = myfirstode(t,s,k1,k2)
%input  t the time variable (not used in this case)
%       s the state vector
%       k1,k2 model parameters
%output ds the derivative ds/dt at time t
A = s(1); %for clarity & readability, write the
B = s(2); %model using the notation A, B, C for the
C = s(3); %components
dA = -k1*A; %the ODE system equations
dB = k1*A - k2*B;
dC = k2*B;
ds = [dA;dB;dC]; %collect the output in vector ds
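A simple sanity check for such a right-hand side is mass conservation: the three derivatives sum to zero, so A + B + C should stay constant along the solution. The sketch below writes the same equations as an anonymous function (named `rhs` here, an assumption for illustration) so that it runs without a separate m-file.

```matlab
k1 = 0.7; k2 = 0.2;                      % values used in the example

% Same right-hand side as myfirstode, written as an anonymous function.
rhs = @(t, s) [-k1 * s(1); k1 * s(1) - k2 * s(2); k2 * s(2)];

ds0 = rhs(0, [1; 0; 0]);                 % derivative at the initial state
[t, s] = ode23(rhs, 0:1:10, [1 0 0]);    % s has one row per time point
total = sum(s, 2);                       % A + B + C at each time point
```

If `total` drifts away from its initial value, a sign or an index in the ODE function is the first thing to check.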




You need four separate routines, namely:

1. ODE step calculator that calculates the differential of the components, using the appropriate ODE for each component

2. ODE solver that sets the initial values and calls a Matlab ODE solver, with Routine 1 given as a function parameter

3. Parameter estimator that sets initial guesses for the parameters and calls a Matlab minimizer to find optimal estimates

4. Cost function calculator that calculates synthetic observations using the current guesses for the parameters, and computes their squared difference from the observations




The routines one to four are used in the following pattern:

1. Use Routine 3 to set up a minimization problem with a new guess for the parameters

2. Use Routine 2 to set up the creation of synthetic observations with the current guesses for the parameters

3. Use Routine 1 to compute the synthetic observations with the ODE system at hand

4. Use Routine 4 to compute the difference between synthetic and real observations

5. Iterate from 1.
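The pattern above can be sketched compactly with anonymous functions, so that all four routines fit in one script. This is a minimal sketch under stated assumptions, not the lecture's implementation: the observations are noise-free, C(0) is held at 0 so that the search is two-dimensional, the ODE tolerances are tightened, and the names `rhs`, `simulateB`, `lsq` are hypothetical.

```matlab
k1 = 0.7; k2 = 0.2;                 % known rate coefficients (from the example)
t_row = 0:1:10;                     % sampling instants

% Routine 1: right-hand side of the ODE system, s = [A; B; C].
rhs = @(t, s) [-k1 * s(1); k1 * s(1) - k2 * s(2); k2 * s(2)];

% Routine 2: solve the ODE from initial state s0, return B at t_row.
opts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
simulateB = @(s0) deval(ode23(rhs, [0 10], s0, opts), t_row, 2)';

% Noise-free synthetic observations from the "true" initial state.
y_obs = simulateB([1 0 0]);

% Routine 4: cost for a guess teta = [A0 B0]; C0 is held at 0 (assumption).
lsq = @(teta) sum((y_obs - simulateB([teta(1) teta(2) 0])).^2);

% Routine 3: initial guess and call to the minimizer, which iterates
% over Routines 4 -> 2 -> 1 until the simplex has converged.
teta_opt = fminsearch(lsq, [0.8 0.2]);
```

Capturing the data inside the anonymous function `lsq` is an alternative to passing `data` as a trailing argument to FMINSEARCH, as done in the script of the example.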




So far the reaction rate parameters have been assumed to be constants. But chemical reactions depend on temperature: they only take place if the temperature is high enough. Typically, the dependency is expressed by the Arrhenius law:

$$k(T) = A\,e^{-E/(RT)}$$

where T is temperature (in Kelvin), A the amplitude, E the activation energy, and R the gas constant (R = 8.314 J/(mol K) in SI units). To find out how the reaction rate depends on temperature, the parameters A and E should be determined from measured data.




Example

Suppose that the rate coefficient values k = 0.55, 0.7, 0.8267 have been measured at the temperatures T = 283 K, 300 K, 313 K. The parameters A, E may be fitted in the usual way by least squares. A perfect fit is obtained: computing the $R^2$ value of the fit gives $R^2 \approx 1$. But if we compute the $R^2$ value at other parameter values, we get the contour picture

[Figure: Contour of $R^2$ values for the fit of Arrhenius parameters A, E; contour levels 0.95, 0.5, 0.3]
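The least squares fit of A and E mentioned above can be sketched as follows. Taking logarithms turns the Arrhenius law into a straight line, ln k = ln A - E/(RT), which is solved by linear least squares. This log-linearization is a substitution chosen for illustration; the slides do not state which fitting method was used. The names `A_hat` and `E_hat` are assumptions.

```matlab
R = 8.314;                          % gas constant, J/(mol K)
T = [283; 300; 313];                % temperatures from the example, K
k = [0.55; 0.7; 0.8267];            % measured rate coefficients

% ln k = ln A - E * (1/(R*T)) is linear in the unknowns (ln A, E).
X = [ones(size(T)), -1 ./ (R * T)];
beta = X \ log(k);                  % least squares solution
A_hat = exp(beta(1));               % amplitude
E_hat = beta(2);                    % activation energy, J/mol

k_fit = A_hat * exp(-E_hat ./ (R * T));   % fitted rate coefficients
```

The second column of `X` varies very little over the measured temperature range, which is one way to see why A and E are hard to separate.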



We see that the parameters are badly identified: many values in a 'banana shaped' region give an equally good fit (high $R^2$ value). To overcome the situation, it is customary to make a change of parameters:

$$k = A\,e^{-E/(RT)} = k_{mean}\,e^{-zE}$$

where $k_{mean} = A\,e^{-E/(R T_{mean})}$, $z = \frac{1}{R}\left(\frac{1}{T} - \frac{1}{T_{mean}}\right)$, and $T_{mean}$ is some 'mean' temperature value, between the minimum and maximum used in the experiments. Instead of the original A, E we now estimate $k_{mean}$, E (or E/R, in order to avoid dimensional errors).
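The effect of the change of parameters can be made visible by comparing how strongly the two estimates are correlated in each form. The sketch below uses the log-linearized model, for which the correlation follows from the inverse of X'X. It is an illustration under assumptions: the temperatures are those of the example, the choice Tmean = 300 K and the names `rho_orig`, `rho_new` are not from the slides.

```matlab
R = 8.314;                           % gas constant, J/(mol K)
T = [283; 300; 313];                 % temperatures from the example, K
Tmean = 300;                         % assumed reference temperature, K

% Original form: ln k = ln A - E/(R*T)
X_orig = [ones(size(T)), -1 ./ (R * T)];

% Reparametrized form: ln k = ln kmean - z*E
z = (1 / R) * (1 ./ T - 1 / Tmean);
X_new = [ones(size(T)), -z];

% Correlation between the two estimates, from the inverse of X'*X.
corr_of = @(C) C(1, 2) / sqrt(C(1, 1) * C(2, 2));
rho_orig = corr_of(inv(X_orig' * X_orig));   % (ln A, E)
rho_new  = corr_of(inv(X_new' * X_new));     % (ln kmean, E)
```

A correlation close to 1 in magnitude corresponds to a long, narrow valley in the objective function, while a small correlation corresponds to a roundish one.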




Now the contour plot for the 'landscape' of the LSQ objective function becomes nicely roundish, the parameters are well identified, and the optimizer will more easily find the best fit.

[Figure: Contour of $R^2$ values for the fit of Arrhenius parameters $k_{mean}$, E; horizontal axis: KMEAN, vertical axis: E; contour levels 0.95, 0.5, 0.3]



Example: Sterilizing of apple juice

The Arrhenius type temperature dependency is often used for other processes, too. Here we discuss a biological example. Below is a table of data from an experiment where a bottle of apple juice is sterilized by heating

time      temperature of   temperature of the   number of
t (min)   juice, T (C)     heat bath, T_h (C)   microbes, B
0         20               94                   > 50
1.5       38               94                   50
3         50               94                   25
4.5       60               94                   3
6         69               94                   1
7.5       78               94                   0




To simulate the process we need to assume some model for the decay rate of the microbes, and how it depends on temperature. Let us use the 'standard' assumptions: first order kinetics for the decay rate, so that it is proportional to the number B of microbes, Re = k * B, where the temperature dependency of the parameter k is given by the Arrhenius law.

We have to model the temperature T of the juice, too. The heat transfer is given by the equation

$$m C_p \frac{dT}{dt} = U A (T_h - T)$$

where m = 0.330 kg is the mass of the juice, A = 0.034 m² is the area of the bottle, $C_p$ is the specific heat of the juice (we may use the $C_p$ of water), and U is the heat conductivity parameter.




So we arrive at the model, in the form of an ODE system:

$$\frac{dT}{dt} = \frac{UA}{m C_p}(T_h - T)$$

$$\frac{dB}{dt} = -k(T)\,B,$$

where $k(T) = k_{mean}\,e^{-(E/R)(1/T - 1/T_{mean})}$. Note that the initial value of the bacteria density is not properly known, only a lower bound B(0) > 50 is given. The unknown parameters to be fitted are the conductivity U and the Arrhenius parameters $k_{mean}$, E. Note also that the reference temperature $T_{mean}$ may be freely chosen.
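A forward simulation of this model might look as follows in MATLAB. The heat equation $m C_p\, dT/dt = UA(T_h - T)$ is divided by $m C_p$, so only the combined coefficient a = UA/(m Cp) appears. All parameter values below (`a`, `kmean`, `EoverR`, the initial count 60) are illustrative assumptions, not fitted results; time is in minutes and temperatures are converted to Kelvin for the Arrhenius term.

```matlab
a      = 0.19;          % U*A/(m*Cp) in 1/min, illustrative value
Th     = 94 + 273.15;   % heat bath temperature, K (94 C from the table)
Tmean  = 323.15;        % reference temperature, K (freely chosen)
kmean  = 0.5;           % decay rate at Tmean in 1/min, illustrative value
EoverR = 1.0e4;         % E/R in K, illustrative value

% Arrhenius law in the reparametrized form.
kT = @(T) kmean * exp(-EoverR * (1 ./ T - 1 / Tmean));

% State s = [T; B]: juice temperature and number of microbes.
rhs = @(t, s) [a * (Th - s(1)); -kT(s(1)) * s(2)];

s0 = [20 + 273.15; 60];             % T(0) from the table, B(0) assumed
[t, s] = ode45(rhs, [0 7.5], s0);   % simulate over the measured time span
T_end = s(end, 1);                  % final temperature, K
B_end = s(end, 2);                  % final number of microbes
```

In a fit, the cost function would compare `s` at the measured times with the table values of T and B, in the same way as in the reaction example.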




Several questions should be taken into account:

should B(0) also be estimated?

could the heat conductivity U be identified independently first?

is modelling of the heat conductivity necessary at all, since we have measurements for t, T, B and might deal with the latter equation separately?




Below is the fit of the model to the data, in the case where the initial amount of the bacteria is also one of the estimated parameters:

[Figure: fit of the model to the data]

But how unique is the result? How reliable are the predictions made by the model?




These questions may be answered if we create all the possible fits to the data, instead of just the best fitting one.

The production of 'all' the solutions, that is, all that fit the noisy data statistically reasonably well, may be done by 'Monte Carlo' sampling methods. Solution as a figure:

[Figure: POSSIBLE FITS AND DATA (o); horizontal axis: TIME, vertical axis: BACTERIA, TEMPERATURE]




We see that there is considerable uncertainty in the results, due to the unknown initial amount of the bacteria:

either the initial amount of bacteria is low (50), almost no bacteria die at low temperatures, but they then die rapidly at temperatures around 50 C,

or the initial amount of bacteria is higher (60), and the bacteria start vanishing already at lower temperatures.

To resolve the ambiguity we should repeat the measurements at room temperature, or use available 'a priori' knowledge about the behavior of bacteria: they hardly start decaying at room temperature.




The heat conductivity U does not depend on the reaction, and it may indeed be estimated separately. Note however that the uncertainty in estimating U also affects (increases) the uncertainty concerning the decay of bacteria, so they would better be estimated together.

In principle, the decay of bacteria might be calculated separately, too. But the integration of the equation requires temperature values T for all time values t, and the temperatures are only measured at a few time points. So we should somehow interpolate the measured, and noisy, values between the measured time points. Again, the analysis of the modelling results is more reliable if the modelling of B and T is done together.
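A separate estimate of U from the temperature data alone can be sketched as follows. With constant $T_h$ the heat equation has the solution $T(t) = T_h - (T_h - T(0))\,e^{-at}$ with a = UA/(m Cp), so log((Th - T)/(Th - T(0))) is a straight line through the origin with slope -a. The sketch treats the first measurement T(0) = 20 C as exact and uses an approximate specific heat of water, Cp = 4186 J/(kg K); both are assumptions, and the names `a_hat`, `U_hat` are hypothetical.

```matlab
t  = [0; 1.5; 3; 4.5; 6; 7.5];        % time in min, from the table
T  = [20; 38; 50; 60; 69; 78];        % juice temperature in C, from the table
Th = 94;                              % heat bath temperature in C
m  = 0.330;                           % mass of the juice, kg
A  = 0.034;                           % area of the bottle, m^2
Cp = 4186;                            % specific heat of water, J/(kg K), approximate

% log((Th - T)/(Th - T0)) = -a*t: least squares line through the origin.
zlog  = log((Th - T) / (Th - T(1)));
a_hat = -(t' * zlog) / (t' * t);      % rate coefficient in 1/min

% Convert a = U*A/(m*Cp) to U; the factor 60 changes 1/min to 1/s.
U_hat = a_hat / 60 * m * Cp / A;      % W/(m^2 K)

T_fit = Th - (Th - T(1)) * exp(-a_hat * t);   % fitted temperature curve
```

`T_fit` can be evaluated at any time, so it could also replace interpolation of the measured temperatures when the equation for B is integrated on its own. As the text notes, this ignores the uncertainty in U.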

