Iterative Algorithms for Data Reconciliation Estimator Using Generalized t‑Distribution Noise Model

Hoang Dung Vu*

Electrical and Computer Engineering, National University of Singapore, 11756 Singapore

ABSTRACT: The generalized t-distribution (GT) is well-known for its flexibility in transforming into many popular distributions. However, the implementation of a data reconciliation (DR) estimator using GT noise is rather difficult due to its complex structure. This work proposes two iterative algorithms that ease the complexity of the GT DR estimator, making it easy to implement even in large-scale problems. We also point out the convergence condition for each algorithm. Simulation examples are shown to verify the effectiveness of the proposed algorithms in terms of computational time. The results from this work can also be applied to other data reconciliation estimators.

1. INTRODUCTION

Data reconciliation (DR) is a popular technique that adjusts measurement data so that the reconciled data satisfy constraints established by physical laws such as mass and energy conservation.1−3 DR has been widely employed in many industrial fields: industrial chemical processes,3 wastewater treatment,4 thermal reactor power,5 etc. Commonly, the least-squares (LS) method is chosen to solve the DR problem. However, it is known that LS is very sensitive to outliers; it also performs poorly in the case of non-Gaussian noise. Various schemes have been developed to cope with this problem, such as using redescending estimators,6 which are mainly based on the seminal robust statistics works by Huber and Ronchetti7 and Hampel et al.;8 another approach is to use a contaminated distribution to model the measurement error;9 and so on. A broader approach is to use the generalized t-distribution (GT, Figure 1) noise model, as the GT distribution is very flexible in transforming into many other distributions.10 This approach has been widely adopted in many research areas such as econometrics,11 data reconciliation problems,12,13 bias estimation for multizone thermal systems,14 and parameter estimation in semiconductor manufacturing.15 Moreover, Wang and Romagnoli12,13 have shown that data reconciliation with the GT noise model is very robust against outliers.

However, even though many robust techniques have been developed in the literature, LS is still the favored choice for many applications, because the LS method is very easy to implement while the other techniques require solving a constrained nonlinear optimization. Solving a constrained nonlinear optimization requires very intensive computation, which is not suitable for online estimation. Although a faster computer can ease the computational burden, it also increases the cost, which is undesirable in industry. Therefore, it is necessary to develop iterative algorithms that are easy to implement yet fast enough for online estimation. Alhaj-Dibo et al.9 proposed an iterative algorithm for some DR estimators; however, it has two main drawbacks. First, it can only be used in the case of a unit sample size (N = 1); when the sample size is greater than one, e.g., when one collects several sets of data to obtain one estimate, the algorithm cannot be used. Second, it lacks a rigorous mathematical proof of convergence and hence cannot tell whether the algorithm will always converge or which estimators it is suitable for. In this work, we propose two iterative algorithms for the generalized t data reconciliation estimator. We also show the convergence proofs and convergence conditions of the two proposed algorithms. Some examples are presented to demonstrate the effectiveness of the two algorithms. Although the two algorithms are developed for the GT DR estimator, by the convergence proofs and conditions, readers may find them applicable to other data reconciliation estimators as well. This is discussed in the Discussion.

This work is organized as follows: in the next section we give a brief overview of the DR problem and the GT DR estimator; following that are our two proposed algorithms with complete convergence proofs and conditions; some simulation case studies are shown in Computation Evaluation, followed by the Discussion; finally, a summary of this work is given in the Conclusion.

2. PROBLEM FORMULATION

Consider a linear-constraint data reconciliation problem

$$ y(k) = x + \varepsilon(k), \qquad \text{s.t. } Ax = 0 \tag{1} $$

where y(k), x, and ε(k) are the kth (n × 1) measurement vector, the (n × 1) true value vector, and the (n × 1) measurement noise vector, respectively. A is the (m × n) constraint matrix, usually defined by physical laws such as mass and energy conservation; matrix A is assumed to be full rank, i.e., matrix AA^T is invertible. A maximum likelihood estimator (MLE) maximizes the following log-likelihood function

$$ L(x) = \sum_{i=1}^{n}\sum_{k=1}^{N} \log f\big(y_i(k), x_i \mid \theta_i\big) = -\sum_{i=1}^{n}\sum_{k=1}^{N} \rho\big(y_i(k), x_i \mid \theta_i\big), \qquad \text{s.t. } Ax = 0 \tag{2} $$

Received: June 5, 2013. Revised: September 10, 2013. Accepted: December 22, 2013. Published: December 23, 2013.

Article

pubs.acs.org/IECR

© 2013 American Chemical Society 1478 dx.doi.org/10.1021/ie401787z | Ind. Eng. Chem. Res. 2014, 53, 1478−1488


where N is the sample size, i.e., a collection of N samples is used for one estimate, f(·|θ) is the probability density function of the noise model distribution, θ is the parameter(s) of distribution f, and ρ(·|θ) is defined as

$$ \rho(\cdot \mid \theta) = -\log f(\cdot \mid \theta) $$

To solve (2), the Lagrange function is introduced as

$$ F(x, \lambda) = -\sum_{i=1}^{n}\sum_{k=1}^{N} \rho\big(y_i(k), x_i \mid \theta_i\big) + \lambda^{T} A x \tag{3} $$

where λ is the (m × 1) Lagrange multiplier vector. Differentiating (3) with respect to {x, λ} gives

$$ \frac{\mathrm{d}F(x,\lambda)}{\mathrm{d}x_i} = -\sum_{k=1}^{N} \frac{\mathrm{d}\rho\big(y_i(k), x_i \mid \theta_i\big)}{\mathrm{d}x_i} + a_i^{T}\lambda = \sum_{k=1}^{N} \frac{\mathrm{d}\rho\big(r_i(k)\big)}{\mathrm{d}r_i} + a_i^{T}\lambda = 0 \quad (i = 1, \dots, n) \tag{4} $$

$$ \frac{\mathrm{d}F(x,\lambda)}{\mathrm{d}\lambda} = Ax = 0 \tag{5} $$

where ri(k) = yi(k) − xi is the residual and ai is the ith column of the constraint matrix A. The GT DR estimator is defined by replacing f(·|θi) by fGT(·|θi), where

$$ \theta_i = \{p_i, q_i, \sigma_i\} $$

$$ f_{\mathrm{GT}}\big(y_i(k), x_i \mid \theta_i\big) = \frac{p_i}{2\sigma_i\, q_i^{1/p_i}\,\beta(1/p_i,\, q_i)\left(1 + \dfrac{|y_i(k) - x_i|^{p_i}}{q_i\sigma_i^{p_i}}\right)^{q_i + 1/p_i}} $$

$$ \rho\big(r_i(k)\big) = \left(q_i + \frac{1}{p_i}\right)\log\!\left(1 + \frac{|r_i(k)|^{p_i}}{q_i\sigma_i^{p_i}}\right) \tag{6} $$

$$ \frac{\mathrm{d}\rho(r_i)}{\mathrm{d}r_i} = \frac{(p_iq_i + 1)\,|r_i|^{p_i-2}\,r_i}{q_i\sigma_i^{p_i} + |r_i|^{p_i}} \tag{7} $$

The parameters θi = {pi, qi, σi} should be specified before the GT estimator is used. One suggestion for choosing θi is to continuously adapt them to the residuals obtained from the current estimate.12

Another method is based on the maximum likelihood of the historical data.15 More detailed discussion can be found in previous studies.12,15 Solving (4) and (5) gives the estimate x̂. However, (4) and (5) consist of n nonlinear and m linear equations with (m + n) unknowns. Although there are some global solvers for this problem, they require a large amount of resources and very intensive computational loads. In the next two sections, two simplified iterative algorithms will be proposed to reduce the computational complexity. The algorithms calculate only x̂ instead of both x̂ and λ; this makes the computation significantly faster and easier to implement than conventional methods. Because of the complex structure of the GT DR estimator, two algorithms will be proposed: one for p ≤ 2 and one for p ≥ 2.

3. ITERATIVE WEIGHTING ALGORITHM FOR P ≤ 2

For p ≤ 2, the iterative weighting algorithm (algorithm 1) proceeds as follows. Given an initial value x(0), at each step t compute the residuals ri(t)(k) = yi(k) − xi(t) and the weighting function

$$ w(r_i) = \frac{1}{r_i}\,\frac{\mathrm{d}\rho(r_i)}{\mathrm{d}r_i} = \frac{(p_iq_i + 1)\,|r_i|^{p_i-2}}{q_i\sigma_i^{p_i} + |r_i|^{p_i}} \tag{8} $$

form

$$ W = \operatorname{diag}\!\left(\Big(\sum_{k=1}^{N} w(r_1^{(t)}(k))\Big)^{\!-1}, \;\dots,\; \Big(\sum_{k=1}^{N} w(r_n^{(t)}(k))\Big)^{\!-1}\right) \tag{9} $$

$$ Y = \left[\sum_{k=1}^{N} w(r_1^{(t)}(k))\,y_1(k), \;\dots,\; \sum_{k=1}^{N} w(r_n^{(t)}(k))\,y_n(k)\right]^{T} \tag{10} $$

and then update

$$ x^{(t+1)} = \big(I - WA^{T}(AWA^{T})^{-1}A\big)WY \tag{11} $$

repeating until x(t) converges. When pi ≤ 2, algorithm 1 guarantees the convergence of x(t) within a finite number of steps t.

3.1. Proof for Algorithm 1. In this section, we will prove that algorithm 1 converges to the maximum of the likelihood function (2) under the constraint Ax = 0. For the sake of brevity, ρ(yi(k), xi|θi) in eq 2 is written as ρ(ri(k)). From (2), (6), and (8), we note that L(ri), ρ(ri), and w(ri) are even functions; i.e., w(ri) = w(−ri). Therefore, we just need to prove convergence for the case ri > 0. For ri > 0, we define a function

$$ g(r_i^2) = \rho(r_i) \tag{12} $$

By the definition of g(.) in (12), we have

Figure 1. Different choices of the GT distribution shape parameters p and q can give different well-known distributions.


$$ \frac{\mathrm{d}\rho(r_i)}{\mathrm{d}r_i} = \frac{\mathrm{d}g(r_i^2)}{\mathrm{d}r_i} = \frac{\mathrm{d}g(r_i^2)}{\mathrm{d}(r_i^2)}\,\frac{\mathrm{d}(r_i^2)}{\mathrm{d}r_i} = 2r_i\, g'(r_i^2) \tag{13} $$

It follows from (7), (8), and (13) that

$$ \frac{\mathrm{d}\rho(r_i)}{\mathrm{d}r_i} = r_i\, w(r_i) = 2r_i\, g'(r_i^2) $$

or

$$ w(r_i) = 2 g'(r_i^2) \tag{14} $$

(14)

Now we will provide three claims needed to prove algorithm 1.

Claim 1. With w(ri) defined in (8), if pi ≤ 2, then w(ri) is nonincreasing in ri > 0; hence, g(ri²) is concave.

Proof. The derivative of w(r) is

$$ \frac{\mathrm{d}w(r_i)}{\mathrm{d}r_i} = -\frac{(p_iq_i + 1)\,|r_i|^{p_i-3}\,\big(2|r_i|^{p_i} + (2 - p_i)\,q_i\sigma_i^{p_i}\big)}{\big(q_i\sigma_i^{p_i} + |r_i|^{p_i}\big)^{2}} $$

It is obvious that if pi ≤ 2 and ri > 0, this derivative is nonpositive, which implies that w(ri) is nonincreasing; because of (14), g(ri²) is then concave. One notices that, for ri > 0, w(ri) is nonincreasing only when pi ≤ 2; when pi > 2, w(ri) is no longer monotonic in ri > 0. Some illustrative graphics of w(ri) for different values of pi are shown in Figure 2.

By the concavity of g(·), we have the following inequality

$$ g(a^2) - g(b^2) \le g'(b^2)\,(a^2 - b^2) $$

Using (14),

$$ g(a^2) - g(b^2) \le \tfrac{1}{2}\, w(b)\,(a - b)(a + b) \tag{15} $$

Claim 2. x(t+1) given in (11) is the solution of the following equation

$$ \begin{bmatrix} W^{-1} & -A^{T} \\ A & 0 \end{bmatrix}\begin{bmatrix} x^{(t+1)} \\ \lambda^{(t+1)} \end{bmatrix} = \begin{bmatrix} Y \\ 0 \end{bmatrix} \tag{16} $$

Proof. Let

$$ \lambda^{(t+1)} = -(AWA^{T})^{-1}AWY \tag{17} $$

Substituting x(t+1) from (11) and λ(t+1) from (17) into the left-hand side of eq 16 gives the right-hand side of eq 16. This completes the proof of claim 2.

From claim 2 and eq 16, we have an immediate result

$$ W^{-1}x^{(t+1)} - A^{T}\lambda^{(t+1)} = Y $$

or

$$ \sum_{k=1}^{N} w\big(r_i^{(t)}(k)\big)\, x_i^{(t+1)} - a_i^{T}\lambda^{(t+1)} = \sum_{k=1}^{N} w\big(r_i^{(t)}(k)\big)\, y_i(k) \tag{18} $$

Claim 3. x(t+1) given in (11) is consistent at every step t (t = 1, 2, ...); i.e., at every step t > 0, the following condition holds true regardless of Y:

$$ Ax^{(t+1)} = 0 \tag{19} $$

Proof. From (11),

Figure 2. Illustrative graphics of w(ri) with different pi: (a) p = 1.6; (b) p = 2.0; (c) p = 3.0; (d) p = 4.0.


$$ \begin{aligned} Ax^{(t+1)} &= A\big(W - WA^{T}(AWA^{T})^{-1}AW\big)Y \\ &= \big(AW - (AWA^{T})(AWA^{T})^{-1}AW\big)Y \\ &= (AW - AW)\,Y \\ &= 0 \end{aligned} $$

This completes the proof of claim 3.

Now we will examine $\sum_{k=1}^{N}\rho\big(r_i^{(t+1)}(k)\big) - \sum_{k=1}^{N}\rho\big(r_i^{(t)}(k)\big)$.

Using eq 12 and inequality 15, we have

$$ \begin{aligned} \sum_{k=1}^{N}\rho\big(r_i^{(t+1)}(k)\big) - \sum_{k=1}^{N}\rho\big(r_i^{(t)}(k)\big) &= \sum_{k=1}^{N} g\big([r_i^{(t+1)}(k)]^2\big) - \sum_{k=1}^{N} g\big([r_i^{(t)}(k)]^2\big) \\ &\le \frac{1}{2}\sum_{k=1}^{N} w\big(r_i^{(t)}(k)\big)\big(r_i^{(t+1)}(k) - r_i^{(t)}(k)\big)\big(r_i^{(t+1)}(k) + r_i^{(t)}(k)\big) \\ &= \frac{1}{2}\sum_{k=1}^{N} w\big(r_i^{(t)}(k)\big)\big(x_i^{(t)} - x_i^{(t+1)}\big)\big(2y_i(k) - x_i^{(t+1)} - x_i^{(t)}\big) \end{aligned} $$

Using (18),

$$ \begin{aligned} \sum_{k=1}^{N}\rho\big(r_i^{(t+1)}(k)\big) - \sum_{k=1}^{N}\rho\big(r_i^{(t)}(k)\big) &\le \frac{1}{2}\big(x_i^{(t)} - x_i^{(t+1)}\big)\Big[\sum_{k=1}^{N} w\big(r_i^{(t)}(k)\big)\big(x_i^{(t+1)} - x_i^{(t)}\big) - 2a_i^{T}\lambda^{(t+1)}\Big] \\ &= -\frac{1}{2}\big(x_i^{(t+1)} - x_i^{(t)}\big)^2\sum_{k=1}^{N} w\big(r_i^{(t)}(k)\big) + \Delta x_i^{(t+1)}\, a_i^{T}\lambda^{(t+1)} \end{aligned} \tag{20} $$

where Δxi(t+1) = xi(t+1) − xi(t). From (8), it is obvious that w(ri(t)(k)) ≥ 0; hence, (20) becomes

$$ \sum_{k=1}^{N}\rho\big(r_i^{(t+1)}(k)\big) - \sum_{k=1}^{N}\rho\big(r_i^{(t)}(k)\big) \le \big(x_i^{(t+1)} - x_i^{(t)}\big)\, a_i^{T}\lambda^{(t+1)} \tag{21} $$

Taking the summation of (21) over all i (i = 1, ..., n),

$$ \begin{aligned} L(x^{(t)}) - L(x^{(t+1)}) &= \sum_{i=1}^{n}\sum_{k=1}^{N}\big[\rho\big(r_i^{(t+1)}(k)\big) - \rho\big(r_i^{(t)}(k)\big)\big] \\ &\le \sum_{i=1}^{n}\big(x_i^{(t+1)} - x_i^{(t)}\big)\, a_i^{T}\lambda^{(t+1)} \\ &= \big(x^{(t+1)} - x^{(t)}\big)^{T} A^{T}\lambda^{(t+1)} \end{aligned} $$

Using (19),

$$ L(x^{(t)}) - L(x^{(t+1)}) \le 0 $$

or

$$ L(x^{(t+1)}) \ge L(x^{(t)}) \tag{22} $$

Equation 22 shows that the log-likelihood L increases at every new step t until it reaches the maximum, which gives the optimal solution x̂. This completes the proof for algorithm 1.

Remark 1. From claim 1 and Figure 2, one can see that if p > 2, the weighting function w(r) is no longer nonincreasing; hence, inequality 15 does not hold true, and therefore the convergence of algorithm 1 is not guaranteed.

Remark 2. Consider the unit sample size case (N = 1): substituting N = 1 into (9) and (10) gives

$$ \begin{aligned} W &= \operatorname{diag}\big(1/w(r_1(1)),\; \dots,\; 1/w(r_n(1))\big) \\ Y &= \big[\,w(r_1(1))\,y_1(1),\; \dots,\; w(r_n(1))\,y_n(1)\,\big]^{T} \end{aligned} $$

hence,

$$ W Y = [\,y_1(1),\; \dots,\; y_n(1)\,]^{T} = Y_1 $$

Equation (11) then becomes

$$ x^{(t+1)} = \big(I - WA^{T}(AWA^{T})^{-1}A\big)WY \tag{23} $$

One notes that (23) is the same algorithm suggested by Alhaj-Dibo et al.9
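Algorithm 1 amounts to an iteratively reweighted least-squares scheme: at each step the GT weights of eq 8 are frozen and the constrained weighted least-squares system of claim 2 is solved in closed form. The sketch below is our own reconstruction in Python/NumPy (the update follows eq 11; function and variable names are ours, not the paper's):

```python
import numpy as np

def gt_weight(r, p, q, sigma):
    """w(r) = (1/r) d rho/d r for the GT model, eq 8.
    Note: for p < 2 an exactly zero residual gives an infinite weight."""
    return (p * q + 1) * np.abs(r) ** (p - 2) / (q * sigma ** p + np.abs(r) ** p)

def algorithm1(Ydata, A, p, q, sigma, x0, tol=1e-5, max_iter=200):
    """Iteratively reweighted sketch of algorithm 1 (p <= 2).

    Ydata : (N, n) array of N measurement vectors y(k)
    A     : (m, n) full-rank constraint matrix, A x = 0
    p, q, sigma : (n,) arrays of GT parameters per variable
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        R = Ydata - x                        # residuals r_i(k)
        Wk = gt_weight(R, p, q, sigma)       # elementwise weights w(r_i(k))
        W = np.diag(1.0 / Wk.sum(axis=0))    # eq 9
        WY = W @ (Wk * Ydata).sum(axis=0)    # W times Y of eq 10
        # eq 11: constrained weighted least-squares step (claim 2)
        x_new = WY - W @ A.T @ np.linalg.solve(A @ W @ A.T, A @ WY)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

By construction, every iterate satisfies Ax(t+1) = 0 exactly (claim 3), and for p ≤ 2 the log-likelihood is nondecreasing from any feasible iterate onward, mirroring the proof above.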

3.2. Illustrative Examples. In this section, we will present some calculation examples to show the effectiveness of the proposed iterative algorithm.

3.2.1. Example 1. Consider the following DR problem

$$ \begin{bmatrix} y_1(k) \\ y_2(k) \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} \varepsilon_1(k) \\ \varepsilon_2(k) \end{bmatrix} $$

$$ x_1 - 2x_2 = 0 $$

Let the true value x be [2 1]T. Given a batch of N = 5 measurements

$$ \begin{aligned} y_1(k) &= \{2.9020,\; 3.5300,\; 0.9197,\; 2.9681,\; 1.8519\} \\ y_2(k) &= \{-0.0549,\; 0.5304,\; 0.1447,\; 0.4830,\; 0.5318\} \end{aligned} $$

The parameters for the GT estimator are

$$ \begin{aligned} \theta_1 &= \{p_1, q_1, \sigma_1\} = \{2,\, 2,\, 1.414\} \\ \theta_2 &= \{p_2, q_2, \sigma_2\} = \{2,\, 3,\, 1.414\} \end{aligned} $$

Using these parameters and y(k) to solve (4) and (5) gives x̂ = {1.9885, 0.9942}. Table 1 shows the results of algorithm 1 with the initial value x(0) = {1, 1}. As can be seen in Table 1, algorithm 1 requires six steps to reach x̂ with a tolerance of 10−5. The simplicity of the calculation and the guarantee that L(x(t+1)) ≥ L(x(t)) (see the last column of Table 1) make algorithm 1 most suitable for the implementation of the GT estimator when p ≤ 2. Some graphical views of Table 1 can be seen in Figure 3.

3.2.2. Example 2. In this example, we will show one case in which, with p > 2, algorithm 1 does not converge and, in some steps t, L(x(t+1)) < L(x(t)). Consider the same problem as in example 1 but with pi = 4 (i = 1, 2) instead of 2. Algorithm 1 is implemented with the same y(k) and {qi, σi}. The results are shown in Table 2 and Figure 4. Figure 4 shows that x(t) oscillates and does not converge. This result supports our claim in remark 1

Table 1. Example 1 Results

step t | x(t) = {x1(t), x2(t)} | ∥x(t) − x̂∥ | likelihood L(x)
0 | {1.0000, 1.0000} | 0.9885 | −5.3934
1 | {1.9163, 0.9582} | 0.0807 | −3.3915
2 | {1.9772, 0.9886} | 0.0126 | −3.3769
3 | {1.9867, 0.9933} | 0.0020 | −3.3766
4 | {1.9882, 0.9941} | 0.0003 | −3.3766
5 | {1.9884, 0.9942} | 0.0001 | −3.3766
6 | {1.9885, 0.9942} | 0.0000 | −3.3766
7 | {1.9885, 0.9942} | 0.0000 | −3.3766


that when pi > 2, the convergence of algorithm 1 will not be guaranteed.

4. ITERATIVE ALGORITHM FOR P ≥ 2

When p > 2, algorithm 1 is not suitable, as the weighting w(r) is no longer nonincreasing in r > 0. Hence, in this case, we propose a new iterative algorithm which is rather simple in calculation. Starting from an initial value x(0) satisfying Ax(0) = 0 (see remark 3), at each step t the algorithm evaluates the gradient L1(x(t)) defined in (28), forms the diagonal matrix B = diag(1/m1, ..., 1/mn), where mi = min L2(ri) < 0 is given by claim 4, and updates

$$ x^{(t+1)} = x^{(t)} - \big(B - BA^{T}(ABA^{T})^{-1}AB\big)\,L_1(x^{(t)}) \tag{27} $$

where

$$ L_1(x_i) = \frac{\mathrm{d}L}{\mathrm{d}x_i} = \sum_{k=1}^{N} \frac{(p_iq_i + 1)\,\operatorname{sgn}(r_i)\,|r_i|^{p_i-1}}{q_i\sigma_i^{p_i} + |r_i|^{p_i}} \tag{28} $$

$$ L_2(x_i) = \frac{\mathrm{d}L_1}{\mathrm{d}x_i} = \sum_{k=1}^{N} \frac{(p_iq_i + 1)\,|r_i|^{p_i-2}\big(|r_i|^{p_i} + (1 - p_i)\,q_i\sigma_i^{p_i}\big)}{\big(q_i\sigma_i^{p_i} + |r_i|^{p_i}\big)^{2}} \tag{29} $$

4.1. Proofs for Algorithm 2. To prove algorithm 2, we propose the following claims.

Claim 4. With L2 defined in (29) and pi ≥ 2, L2(ri) has one global minimum at

$$ |r_i| = \sigma_i\!\left[\frac{q_i\Big(p_i^2 + 3p_i - 4 - p_i\sqrt{p_i^2 + 6p_i - 7}\Big)}{4}\right]^{1/p_i} \tag{30} $$

and mi = min L2(ri) < 0.

Figure 3. Results of example 1: (a) x(t) trajectory (x1(t), solid line; x2(t), dashed line); (b) likelihood L(x(t)).

Table 2. Results of the Iterative Weighting Algorithm for Example 2 (pi = 4)

step t | x(t) = {x1(t), x2(t)} | ∥x(t) − x̂∥ | likelihood L(x)
0 | {2.1704, 1.0852} | 0.2800 | −2.9829
1 | {2.7434, 1.3717} | 0.3606 | −3.3031
2 | {2.1495, 1.0748} | 0.3034 | −3.0229
3 | {2.7756, 1.3878} | 0.3966 | −3.4316
4 | {2.1283, 1.0641} | 0.3271 | −3.0659
5 | {2.8089, 1.4045} | 0.4339 | −3.5800
6 | {2.1069, 1.0535} | 0.3510 | −3.1113
7 | {2.8431, 1.4215} | 0.4720 | −3.7479
8 | {2.0858, 1.0429} | 0.3746 | −3.1583

Figure 4. Results of example 2: (a) x(t) trajectory (x1(t), solid line; x2(t), dashed line); (b) likelihood L(x(t)).


Proof. Differentiating L2(ri) with respect to ri and equating it to zero gives five solutions

$$ r_i = 0, \qquad |r_i| = \sigma_i\!\left[\frac{q_i\Big(p_i^2 + 3p_i - 4 \pm p_i\sqrt{p_i^2 + 6p_i - 7}\Big)}{4}\right]^{1/p_i} $$

By checking the sign of d²L2(ri)/dri², one finds that L2(ri) has a minimum at

$$ |r_i| = \sigma_i\!\left[\frac{q_i\Big(p_i^2 + 3p_i - 4 - p_i\sqrt{p_i^2 + 6p_i - 7}\Big)}{4}\right]^{1/p_i} $$

One notes that, with pi > 2, L2(ri) is continuous at all ri and

$$ \lim_{r_i \to \infty} L_2(r_i) = 0 $$

This implies that the minimum of L2(ri), with ri defined in (30), is the global minimum. One also notes that, at |ri| = σi[qi(pi − 1)]1/pi, L2(ri) = 0; therefore, mi < 0.

Following claim 4, we have the following results

$$ h^{T} L_2(x^{(t)})\, h \ \ge\ h^{T}\operatorname{diag}(m_1, \dots, m_n)\, h \quad \text{for any } (n \times 1) \text{ vector } h \tag{31} $$

$$ -\Delta x^{(t+1)T} B^{-1}\,\Delta x^{(t+1)} \ge 0 \tag{32} $$

−Δ Δ ≥+ − +x B x 0t T t( 1) 1 ( 1) (32)

The following claim is needed to prove algorithm 2.

Claim 5. x(t+1) defined in (27) in algorithm 2 is the solution of the following equation

$$ \begin{bmatrix} B^{-1} & -A^{T} \\ A & 0 \end{bmatrix}\begin{bmatrix} x^{(t+1)} \\ \lambda^{(t+1)} \end{bmatrix} = \begin{bmatrix} B^{-1} & -A^{T} \\ A & 0 \end{bmatrix}\begin{bmatrix} x^{(t)} \\ \lambda^{(t)} \end{bmatrix} - \begin{bmatrix} L_1(x^{(t)}) - A^{T}\lambda^{(t)} \\ Ax^{(t)} \end{bmatrix} \tag{33} $$

Proof. The proof is similar to that of claim 2. Let

$$ \lambda^{(t+1)} = \lambda^{(t)} + (ABA^{T})^{-1}AB\big(L_1(x^{(t)}) - A^{T}\lambda^{(t)}\big) \tag{34} $$

Substituting (27) and (34) into (33) and then verifying the equality of the left-hand side (lhs) and right-hand side (rhs) of (33) completes the proof for claim 5.

Claim 6. x(t+1) given in (27) is consistent at every step t (t = 1, 2, ...); i.e., at every step t > 0, the following condition holds true regardless of L1(x(t)):

$$ A\,\Delta x^{(t+1)} = 0 \tag{35} $$

Proof. From (27),

$$ \begin{aligned} A\,\Delta x^{(t+1)} &= A\big(x^{(t+1)} - x^{(t)}\big) \\ &= -A\big(B - BA^{T}(ABA^{T})^{-1}AB\big)\,L_1(x^{(t)}) \\ &= -\big(AB - (ABA^{T})(ABA^{T})^{-1}AB\big)\,L_1(x^{(t)}) \\ &= -(AB - AB)\,L_1(x^{(t)}) \\ &= 0 \end{aligned} $$

This completes the proof of claim 6. From (33),

$$ B^{-1}\,\Delta x^{(t+1)} = -L_1(x^{(t)}) + A^{T}\lambda^{(t+1)} \tag{36} $$

$$ \Rightarrow\; \Delta x^{(t+1)T} B^{-1}\,\Delta x^{(t+1)} = -\Delta x^{(t+1)T} L_1(x^{(t)}) + \Delta x^{(t+1)T} A^{T}\lambda^{(t+1)} \tag{37} $$

Using (35), (37) becomes

$$ \Delta x^{(t+1)T} L_1(x^{(t)}) = -\Delta x^{(t+1)T} B^{-1}\,\Delta x^{(t+1)} \tag{38} $$

Equation 36 can also be written componentwise as

$$ m_i\,\Delta x_i^{(t+1)} = -L_1(x_i^{(t)}) + a_i^{T}\lambda^{(t+1)} \tag{39} $$

Consider the second-order Taylor expansion of L(xi(t+1)) around xi(t)

$$ L(x_i^{(t+1)}) = L(x_i^{(t)}) + \big(x_i^{(t+1)} - x_i^{(t)}\big)L_1(x_i^{(t)}) + \tfrac{1}{2}\big(x_i^{(t+1)} - x_i^{(t)}\big)L_2(x_i^{(t)})\big(x_i^{(t+1)} - x_i^{(t)}\big) $$

This approximation is quite accurate, as xi(t+1) is usually not far from xi(t). Using (31),

$$ L(x_i^{(t+1)}) - L(x_i^{(t)}) \ \ge\ L_1(x_i^{(t)})\,\Delta x_i^{(t+1)} + \tfrac{1}{2}\, m_i \big(\Delta x_i^{(t+1)}\big)^2 $$

Using (39),

$$ \begin{aligned} L(x_i^{(t+1)}) - L(x_i^{(t)}) &\ge L_1(x_i^{(t)})\,\Delta x_i^{(t+1)} + \tfrac{1}{2}\,\Delta x_i^{(t+1)}\big({-L_1(x_i^{(t)})} + a_i^{T}\lambda^{(t+1)}\big) \\ &= \tfrac{1}{2}\, L_1(x_i^{(t)})\,\Delta x_i^{(t+1)} + \tfrac{1}{2}\,\Delta x_i^{(t+1)}\, a_i^{T}\lambda^{(t+1)} \end{aligned} \tag{40} $$

Taking the summation of (40) over all i (i = 1, ..., n),

$$ \begin{aligned} L(x^{(t+1)}) - L(x^{(t)}) &= \sum_{i=1}^{n}\big[L(x_i^{(t+1)}) - L(x_i^{(t)})\big] \\ &\ge \frac{1}{2}\sum_{i=1}^{n} L_1(x_i^{(t)})\,\Delta x_i^{(t+1)} + \frac{1}{2}\sum_{i=1}^{n}\Delta x_i^{(t+1)}\, a_i^{T}\lambda^{(t+1)} \\ &= \frac{1}{2}\,\Delta x^{(t+1)T} L_1(x^{(t)}) + \frac{1}{2}\,\Delta x^{(t+1)T} A^{T}\lambda^{(t+1)} \end{aligned} \tag{41} $$

Using (38), (35), and (32), (41) becomes

$$ L(x^{(t+1)}) - L(x^{(t)}) \ \ge\ -\tfrac{1}{2}\,\Delta x^{(t+1)T} B^{-1}\,\Delta x^{(t+1)} \ \ge\ 0 $$

or

$$ L(x^{(t+1)}) \ge L(x^{(t)}) \tag{42} $$

Equation 42 shows that the log-likelihood L increases at every new step t until it reaches the maximum, which gives the optimal solution x̂. This completes the proof for algorithm 2.

Remark 3. From claim 6, one notices that only Δx(t+1) satisfies the constraint AΔx(t+1) = 0, not Ax(t+1) = 0. Hence, it is necessary to choose an initial value x(0) such that Ax(0) = 0.

Remark 4. L2(r) defined in (29) has a global minimum only when p ≥ 2. Hence, algorithm 2 is only suitable when p ≥ 2. Moreover, at p = 2, L2(r) reaches its minimum at r = 0. Some plots of L2(r) with different p are shown in Figure 5.

4.2. Illustrative Example 3. In this section, we will show that the proposed algorithm 2 can deal with the cases where algorithm 1 cannot. Consider the same problem as in illustrative example 2. Solving (4) and (5) with the values of pi, qi, σi, and y(k) given in example 2 gives the estimate x̂ = {2.4209, 1.2104}. Implementing algorithm 2 with the initial value x(0) = {2.1704, 1.0852} gives the results in Figure 6 and Table 3. Unlike the oscillation seen with algorithm 1, Figure 6 clearly indicates that x(t) converges to x̂. Table 3 shows that algorithm 2 takes 13 steps to reach an error of 10−5. This suggests that algorithm 2 is slower than algorithm 1; however, its computational simplicity compensates for this weakness. The next section will illustrate this statement.


5. COMPUTATION EVALUATION

In this section, we provide an evaluation based on the computational time of the two proposed methods in comparison with the MATLAB function fmincon16 using the "active set" algorithm, a well-known constrained optimization method. Three simulation case studies are presented with different numbers of variables xi and constraint matrices A. The comparison is done as follows: the three algorithms are applied to the same set of data with the same initial value x0, and they stop when reaching the same estimate accuracy (we use an accuracy of 10−5, as in the previous examples); the computational time is then used for comparison. All of the simulations were carried out using MATLAB R2011a running on Windows 7 with an Intel Core i5 and 4 GB of RAM.

5.1. Case Study 1. Consider the chemical reactor with four flows in ref 3. The elemental balances define the following constraint matrix

$$ A = \begin{bmatrix} 0.1 & -0.6 & 0.2 & -0.7 \\ 0.8 & -0.1 & -0.2 & 0.1 \\ 0.1 & -0.3 & -0.6 & 0.2 \end{bmatrix} $$

The measured data are generated from the t2-distribution (i.e., the t-distribution with 2 degrees of freedom) for y1 and y3, and from the t3-distribution for y2 and y4, using the function trnd in MATLAB. The GT estimator parameters are as follows

$$ p_1 = \dots = p_4 = 2, \qquad q_1 = q_3 = 1, \qquad q_2 = q_4 = 1.5, \qquad \sigma_1 = \dots = \sigma_4 = 2 $$

The simulation is set up as follows: we fix the sample size N = 50 and then vary the number of batches Nb (the total data are N × Nb); the computational time (in seconds) is measured for each batch size. The initial value x0 for the three methods is the well-known least-squares solution, which is calculated by the following equation

$$ x_0 = \frac{1}{N}\sum_{k=1}^{N}\big(\Psi - \Psi A^{T}(A\Psi A^{T})^{-1}A\Psi\big)\,y(k) \tag{43} $$

For simplicity, we set Ψ = In. Let Nb increase from 10 to 200; the results are shown in Table 4.

In Table 4, the fmincon function takes about 3 times longer than the other two methods to compute the estimate, whereas

Figure 5. L2(r) with different values of p: (a) p = 1; (b) p = 2; (c) p = 4.

Figure 6. Results of example 3.

Table 3. Results of Algorithm 2 for Example 3 (pi = 4)

step t | x(t) = {x1(t), x2(t)} | ∥x(t) − x̂∥ | likelihood L(x)
0 | {2.1704, 1.0852} | 0.2800 | −2.9829
1 | {2.2727, 1.1364} | 0.1657 | −2.8246
2 | {2.3398, 1.1699} | 0.0907 | −2.7596
3 | {2.3789, 1.1895} | 0.0469 | −2.7382
4 | {2.3999, 1.1999} | 0.0235 | −2.7321
5 | {2.4106, 1.2053} | 0.0115 | −2.7306
6 | {2.4159, 1.2079} | 0.0056 | −2.7302
7 | {2.4185, 1.2092} | 0.0027 | −2.7301
8 | {2.4197, 1.2099} | 0.0013 | −2.7301
9 | {2.4203, 1.2102} | 0.0006 | −2.7301
10 | {2.4206, 1.2103} | 0.0003 | −2.7301
11 | {2.4208, 1.2104} | 0.0001 | −2.7301
12 | {2.4208, 1.2104} | 0.0001 | −2.7301
13 | {2.4209, 1.2104} | 0.0000 | −2.7301


algorithm 1 takes about 40% more time than algorithm 2 because at every step t it needs to evaluate the weighting matrix W and the matrix inversion (AWAT)−1. This might become a problem in large-scale situations where many process constraints and variables are involved. The differences in computational time between the three methods are shown in Figure 7a, while Figure 7b shows the differences between algorithms 1 and 2.

5.2. Case Study 2. Next, we will consider an industrial process steam system for a methanol synthesis unit in ref 17. The system has 12 streams with 28 measurable nodes (variables), with the constraints shown in Figure 8. For the sake of brevity, all of the measured data for the 28 variables y are generated from the uniform distribution in the range [−2, 2]. We fix the sample size at N = 100 and then vary Nb from 10 to 200. The parameters for the GT estimator are listed below

variables y are generated from the uniform distribution in therange of [−2, 2]. We fix the sample size atN = 100 and then varyNb from 10 to 200. The parameters for the GT estimator arelisted below

σ σ

= = =

= = =

= = =

p p

q q

... 2

... 1

... 2

1 28

1 28

1 28

The simulation results are tabulated in Table 5. As the sizes of x and A grow, more time is required for computation. The fmincon function takes much more time than in the previous case study. In comparison with fmincon, algorithm 1 reduces the computational time significantly, to about 1/15, while algorithm 2 needs just half the time of algorithm 1 to complete the computation. A visual view of the performance of the three methods is shown in Figure 9a, while Figure 9b shows the comparison between algorithm 1 and algorithm 2.

5.3. Case Study 3. In this example, we investigate how algorithms 1 and 2 behave when the number of variables n changes, e.g., when the number of sensors changes. We also compare the two algorithms with the fmincon function. The simulation setup is as follows: we fix N = Nb = 10; the measured data are generated from the t6-distribution using the function trnd in MATLAB; the GT parameters are set to {pi = 2, qi = 3, σi = √2} for i = 1, ..., n; the number of variables n is varied; and at each n we measure the computational time for comparison. Because changing n changes the constraint matrix A, at n = 2 we set A = [1 −2], and then each time n increases, we append a shifted [1 −2] row to matrix A. The procedure can be expressed as

A_n = [ A_{n−1}     0_{n−2} ]
      [ 0_{n−2}ᵀ    1    −2 ]

where A_n is the constraint matrix A corresponding to n variables and 0_{n−2} is the (n − 2) column vector in which all of the elements are zero. Examples with n = 2, 3, 4 are listed for illustrative purposes:

A_2 = [1  −2]

A_3 = [ 1  −2   0 ]
      [ 0   1  −2 ]

A_4 = [ 1  −2   0   0 ]
      [ 0   1  −2   0 ]
      [ 0   0   1  −2 ]
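The recursive construction above can be sketched directly (a minimal illustration; numpy is assumed, and `build_An` is our hypothetical helper name):

```python
import numpy as np

def build_An(n):
    """Build the case-study-3 constraint matrix A_n, starting from
    A_2 = [1 -2] and appending a shifted [1, -2] row as n grows."""
    A = np.array([[1.0, -2.0]])                        # A_2
    for k in range(3, n + 1):
        A = np.hstack([A, np.zeros((A.shape[0], 1))])  # pad a zero column
        row = np.zeros((1, k))
        row[0, -2:] = [1.0, -2.0]                      # new row [0 ... 0 1 -2]
        A = np.vstack([A, row])
    return A
```

For n = 4 this reproduces the A_4 listed above, with each added row enforcing one more x_{k−1} = 2x_k balance.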

As n varies from 2 to 200, Table 6 shows the computational time of the three methods. It is reasonable that computational time increases with the number of variables. However, Figure 10a shows that as n increases, the computational time for fmincon grows much faster than that of the other methods. In Figure 10b, algorithm 2 shows its potential by remaining at a very low computational time (less than 0.3 s for n = 200) compared with algorithm 1, which needs more than 1 s to finish computing the estimate.

6. DISCUSSION

In the previous section, all three simulation case studies suggest that algorithms 1 and 2 are very efficient in both computational

Table 4. Computational Time (in Seconds) of the Three Methods with N = 50 in Case Study 1

Nb fmincon algorithm 1 algorithm 2

10  0.08464  0.02386  0.01633
20  0.16350  0.04458  0.03217
40  0.31724  0.08306  0.06568
60  0.46234  0.12448  0.09735
80  0.63288  0.16462  0.13454
100 0.78168  0.20802  0.16471
120 0.95727  0.25244  0.20198
140 1.10523  0.28634  0.22852
160 1.23357  0.32845  0.25854
180 1.42277  0.36670  0.29419
200 1.55561  0.41318  0.33813

Figure 7. Graphical comparison between the three methods: (a) fmincon, algorithm 1, and algorithm 2 and (b) algorithm 1 vs algorithm 2 in case study 1.


simplicity and time efficiency. The simulation also suggests that algorithm 2 might be slightly faster than algorithm 1. However, as stated in remark 4, algorithm 2 is only applicable for the GT estimator when p ≥ 2 (so that L2(r) has a negative global minimum). For algorithm 1, as mentioned in remark 1, convergence is only guaranteed when w(r) is nonincreasing in r > 0, and this is only true when p ≤ 2. Therefore, algorithms 1 and 2 together complete the iterative computation for the GT data reconciliation estimator.
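The p-dependence of this condition can be checked numerically. The sketch below assumes the GT objective ρ(r) = (q + 1/p) ln(1 + |r|^p/(qσ^p)), which gives the weighting function w(r) = (pq + 1)|r|^(p−2)/(qσ^p + |r|^p); the formula is our own derivation, not quoted from the text.

```python
import numpy as np

def gt_weight(r, p, q=3.0, sigma=np.sqrt(2.0)):
    # Assumed GT weight w(r) = rho'(r)/r for
    # rho(r) = (q + 1/p) * ln(1 + |r|**p / (q * sigma**p)).
    return (p * q + 1) * np.abs(r) ** (p - 2) / (q * sigma ** p + np.abs(r) ** p)

r = np.linspace(0.05, 10.0, 400)   # grid over r > 0

def nonincreasing(p):
    """True if w(r) never increases along the grid."""
    return bool(np.all(np.diff(gt_weight(r, p)) <= 1e-12))
```

For p = 1.5 and p = 2 the check passes, while for p = 3 the weight first rises before falling, matching the p ≤ 2 condition for algorithm 1.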

In this work, the constraint matrix A is assumed to be full row rank; i.e., the matrix AAᵀ is invertible. If A is not full row rank, the matrices AWAᵀ in algorithm 1 and ABAᵀ in algorithm 2 will not be invertible; hence, neither algorithm can be used. However, A not being full row rank indicates that one or more rows of A are linearly dependent on the basis rows of A. One can simply delete the linearly dependent rows to make A full row rank so that the two proposed algorithms can be applied.

Figure 8. Constraint matrix A for industrial case study 2.
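The row-deletion repair described above (removing linearly dependent rows so that A regains full row rank) can be sketched as follows. The greedy rank test is our illustration, assuming well-conditioned rows; `full_row_rank` is a hypothetical helper name:

```python
import numpy as np

def full_row_rank(A, tol=1e-10):
    """Keep only rows that raise the rank, so the reduced A is full row
    rank and A W A^T stays invertible for the proposed algorithms."""
    rows, rank = [], 0
    for a in A:
        cand = np.vstack(rows + [a]) if rows else a[None, :]
        if np.linalg.matrix_rank(cand, tol=tol) > rank:  # row adds new information
            rows.append(a)
            rank += 1
    return np.vstack(rows)
```

A redundant balance (e.g., a row equal to the sum of two others) is silently dropped, leaving an equivalent full-row-rank constraint set.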

Table 5. Computational Time (in Seconds) of the Three Methods with N = 100 in Case Study 2

Nb fmincon algorithm 1 algorithm 2

10  3.1350   0.2135  0.1031
20  6.2836   0.4445  0.2104
40  12.6021  0.9233  0.4355
60  18.7140  1.3114  0.6234
80  25.9526  1.8530  0.9045
100 32.9793  2.3200  1.1419
120 39.2327  2.7834  1.3373
140 45.5501  3.0503  1.4419
160 51.9445  3.6735  1.7549
180 58.7225  3.9365  1.8475
200 65.6590  4.5616  2.1960

Figure 9. Graphical comparison between the three methods: (a) fmincon, algorithm 1, and algorithm 2 and (b) algorithm 1 vs algorithm 2 in case study 2.

Table 6. Computational Time (in Seconds) of the Three Methods with n = 2, ..., 200 in Case Study 3

n fmincon algorithm 1 algorithm 2

2   0.07128  0.00886  0.00369
5   0.08213  0.01348  0.00762
10  0.11038  0.02175  0.01325
20  0.21576  0.04740  0.03317
40  0.47817  0.08358  0.06129
60  0.75995  0.12105  0.08191
80  1.22505  0.17991  0.10390
100 2.11728  0.30169  0.16715
120 2.64124  0.34185  0.13999
140 3.40153  0.44018  0.17415
160 4.60913  0.64843  0.23207
180 5.76114  0.83588  0.25589
200 6.95839  1.12894  0.27068


In the case of some unmeasured nodes present in matrix A, techniques such as the projection matrix approach18 and the Q−R approach19 can be used to eliminate the unmeasured variables, so the proposed algorithms can still be applied.

From section 3.1, one notes that the key point in proving the

convergence of algorithm 1 is that the weighting function w(r) = (1/r)(dρ(r)/dr) is nonincreasing for r > 0. This means that for any estimator in which L(r), ρ(r), and w(r) are even functions (usually true, as we always assume symmetric noise) and w(r) is nonincreasing for r > 0, algorithm 1 can be applied. Plenty of estimators satisfy these conditions, such as the popular bisquare estimator, the Huber estimator, the χ2 estimator, and the t-estimator. However, not all estimators have a nonincreasing weighting function; e.g., the GT estimator with p > 2. When an estimator ρ(x) has a negative global minimum of the second derivative d²ρ(x)/dx², algorithm 2 is applicable, as section 4.1 points out that the condition needed for convergence of algorithm 2 is min(d²ρ(x)/dx²) < 0. However, as noted in remark 3, for algorithm 2 the initial estimate x(0) needs to satisfy the constraint Ax(0) = 0. A suggestion for choosing x(0) is the least-squares solution (43) or the previous estimate. Some estimators whose second derivative d²ρ(x)/dx² attains negative values are the bisquare estimator, the logistic estimator, and the t-estimator.

Although the two algorithms proposed in this study are to

facilitate the implementation of the estimator (2) with fixed parameters θi, they may also be applied within parts of other, broader DR frameworks, e.g., step 3 of the fully adaptive data reconciliation,12 where y is estimated provided that the constraint is linear, or the linearization step of the successive linearizations method3 for the nonlinear DR problem. However, for the case of a nonlinear constraint, a mathematical proof is essential to ensure the convergence and stability of the applied DR framework, which will be carried out in our future work.
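The projection idea for unmeasured variables mentioned earlier can also be sketched numerically. Splitting A = [A_x A_u] into measured and unmeasured columns, any matrix P whose rows span the left null space of A_u satisfies PA_u = 0, so PA_x x = 0 constrains only the measured variables. The SVD-based construction below is our illustration, not the exact method of refs 18 and 19:

```python
import numpy as np

def eliminate_unmeasured(A, measured_idx):
    """Reduce the constraint A [x; u] = 0 to one on measured x only.

    Rows of P span the null space of A_u^T, so P @ A_u = 0 and the
    reduced constraint is (P @ A_x) @ x = 0."""
    unmeasured = [j for j in range(A.shape[1]) if j not in measured_idx]
    A_x, A_u = A[:, measured_idx], A[:, unmeasured]
    _, s, vt = np.linalg.svd(A_u.T)     # A_u^T has shape (n_u, m)
    rank = int(np.sum(s > 1e-10))
    P = vt[rank:]                       # rows spanning null(A_u^T)
    return P @ A_x
```

For a two-constraint system with one unmeasured stream, the reduction collapses the balances into a single constraint on the measured streams, after which algorithm 1 or 2 applies unchanged.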

7. CONCLUSION

In this work, we propose two iterative algorithms for the GT DR estimator. The proposed algorithms are structurally simple and computationally efficient, making them suitable for online estimation and implementation in large-scale industrial processes. Rigorous proofs and conditions for convergence are also presented. Examples and industrial case studies demonstrate the efficiency of the proposed algorithms. Finally, through the presented convergence conditions, readers may find the algorithms applicable to many other estimators.

■ AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected]

Notes
The authors declare no competing financial interest.

■ REFERENCES
(1) Crowe, C. M. Data reconciliation: Progress and challenges. J. Process Control 1996, 6, 89−98.
(2) Narasimhan, S.; Jordache, C. Data reconciliation and gross error detection: An intelligent use of process data. Ann. N. Y. Acad. Sci. 1999, 195, 406.
(3) Romagnoli, J. A.; Sanchez, M. C. Data Processing and Reconciliation for Chemical Process Operations; Academic Press: New York, 1999.
(4) Rieger, L.; Takacs, I.; Villez, K.; Siegrist, H.; Lessard, P.; Vanrolleghem, P. A.; Comeau, Y. Data Reconciliation for Wastewater Treatment Plant Simulation Studies: Planning for High-Quality Data and Typical Sources of Errors. Water Environ. Res. 2010, 82, 426−433.
(5) Valdetaro, E. D.; Schirru, R. Simultaneous Model Selection, Robust Data Reconciliation and Outlier Detection with Swarm Intelligence in a Thermal Reactor Power Calculation. Ann. Nucl. Energy 2011, 38, 1820−1832.
(6) Arora, N.; Biegler, L. T. Redescending estimators for data reconciliation and parameter estimation. Comput. Chem. Eng. 2001, 25, 1585−1599.
(7) Huber, P. J.; Ronchetti, E. M. Robust Statistics, 2nd ed.; Wiley Series in Probability and Statistics; John Wiley & Sons: Hoboken, NJ, USA, 2009.
(8) Hampel, F. R.; Ronchetti, E. M.; Rousseeuw, P. J.; Stahel, W. A. Robust Statistics: The Approach Based on Influence Functions; John Wiley & Sons: New York, 1986.
(9) Alhaj-Dibo, M.; Maquin, D.; Ragot, J. Data reconciliation: A robust approach using a contaminated distribution. Control Eng. Pract. 2008, 16, 159−170.
(10) McDonald, J. B. Partially adaptive estimation of ARMA time series models. Int. J. Forecasting 1989, 5, 217−230.
(11) Theodossiou, P. Financial Data and the Skewed Generalized T Distribution. Manage. Sci. 1998, 44, 1650−1661.
(12) Wang, D.; Romagnoli, J. A. A Framework for Robust Data Reconciliation Based on a Generalized Objective Function. Ind. Eng. Chem. Res. 2003, 42, 3075−3084.

Figure 10. Graphical comparison between the three methods: (a) fmincon, algorithm 1, and algorithm 2 and (b) algorithm 1 vs algorithm 2 in case study 3.


(13) Wang, D.; Romagnoli, J. Generalized T distribution and its applications to process data reconciliation and process monitoring. Trans. Inst. Meas. Control 2005, 27, 367−390.
(14) Yan, H.; Ho, W. K.; Ling, K. V.; Lim, K. W. Multi-Zone Thermal Processing in Semiconductor Manufacturing: Bias Estimation. IEEE Trans. Ind. Inf. 2010, 6, 216−228.
(15) Ho, W. K.; Vu, H. D.; Ling, K. V. Influence Function Analysis of Parameter Estimation with Generalized t Distribution Noise Model. Ind. Eng. Chem. Res. 2013, 52, 4168−4177.
(16) Find minimum of constrained nonlinear multivariable function, MATLAB. http://www.mathworks.com/help/toolbox/optim/ug/fmincon.html
(17) Serth, R. W.; Heenan, W. A. Gross error detection and data reconciliation in steam-metering systems. AIChE J. 1986, 32, 733−742.
(18) Crowe, C. M.; Campos, Y. A. G.; Hrymak, A. Reconciliation of process flow rates by matrix projection. Part I: Linear case. AIChE J. 1983, 29, 881−888.
(19) Sanchez, M.; Romagnoli, J. Use of orthogonal transformations in data classification-reconciliation. Comput. Chem. Eng. 1996, 20, 483−493.
