Optimal control of batch processes using particle swarm optimisation with stacked neural network models

Computers and Chemical Engineering 33 (2009) 1593–1601

Fernando Herrera, Jie Zhang
School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
Corresponding author: J. Zhang ([email protected])

Article history: Received 13 October 2008; Accepted 10 January 2009; Available online 31 January 2009
Keywords: Batch processes; Neural networks; Particle swarm optimisation; Reliability

Abstract

An optimal control strategy for batch processes using particle swarm optimisation (PSO) and stacked neural networks is presented in this paper. Stacked neural network models are developed from historical process operation data. Stacked neural networks are used to improve model generalisation capability, as well as to provide model prediction confidence bounds. In order to improve the reliability of the calculated optimal control policy, an additional term is introduced in the optimisation objective function to penalise wide model prediction confidence bounds. The optimisation problem is solved using PSO, which can cope with multiple local minima and can generally find the global minimum. Application to a simulated fed-batch process demonstrates that the proposed technique is very effective.

1. Introduction

Batch or semi-batch processes are suitable for the responsive manufacturing of high value added products (Bonvin, 1998). Batch and fed-batch processes are becoming important means of manufacturing in the chemical and pharmaceutical industries. To maximise the profit from batch process manufacturing, optimal control should be applied to batch processes. The performance of optimal control depends on the accuracy of the process model. Mechanistic models have been utilised for many years in optimal control studies (Luus, 1991; Park & Ramirez, 1988). However, developing full phenomenological models for complex processes is usually very difficult and time consuming, if feasible at all. The time and effort needed to develop mechanistic models have tended to limit the application of mechanistic model-based optimal control strategies, especially in agile responsive manufacturing processes. Data-based empirical models, such as neural network models (Zhang, 2005a,b), nonlinear partial least squares models (Zhao, Zhang, & Xu, 2006), and hybrid models (Tian, Zhang, & Morris, 2001), have to be utilised. Currently the most popular data-based nonlinear modelling technique is artificial neural networks. Neural network models gain their attraction from their speed and ease of implementation, wide applicability, and the abundant knowledge and research that they have been receiving. Neural networks have been widely used in process modelling and control (Morris, Montague, & Willis, 1994; Zhang, 2005a,b). Stacked neural networks have been shown to possess better generalisation capability than single neural networks (Sridhar, Seagrave, & Bartlett, 1996; Zhang, Martin, Morris, & Kiparissides, 1997) and are used in this paper to model batch processes. An additional feature of stacked neural networks is that they can also provide prediction confidence bounds indicating the reliability of the corresponding model predictions (Zhang, 1999). Due to model–plant mismatches, the "optimal" control policy calculated from a neural network model may not be optimal when applied to the actual process (Zhang, 2004). Thus it is important that the calculated optimal control policy should be reliable.

The use of a neural network model-based optimal control strategy is faced with two major challenges. The first challenge is the non-robust performance of neural networks when they are applied to unseen data, and the second is the need for a powerful global optimisation method that can effectively overcome the conventional problem of falling into local minima. Various techniques have been proposed to enhance the robustness of neural network models, such as aggregating multiple neural networks (Sridhar et al., 1996; Wolpert, 1992; Zhang et al., 1997), integrating first principles knowledge into neural networks (Tian et al., 2001), and training neural networks with both static and dynamic process data (Zhang, 2001). Neural network models are typically nonlinear and thus are rich in sub-optimal traps that can lock in traditional gradient-based optimisation methods. Conventional gradient-based optimisation techniques are not effective in dealing with objective functions that have multiple local minima and can be trapped in local minima. Therefore, population-based optimisation methods such as genetic algorithms (Goldberg, 1989), particle swarm optimisation (Kennedy & Eberhart, 1995) and ant colony optimisation (Dorigo & Gambardella, 1997) should be used to overcome this problem.







Particle swarm optimisation (PSO) is a recently developed optimisation technique that can cope with multiple local minima. This paper proposes using PSO and stacked neural networks to find the optimal control policy for batch processes. In order to enhance the reliability of the obtained optimal control policy, an additional term is added to the optimisation objective function to penalise wide model prediction confidence bounds. The proposed method is demonstrated on a simulated fed-batch process. It is shown that, by incorporating model prediction reliability in the optimisation criteria, a reliable control policy is obtained.

The paper is organised as follows. Section 2 presents particle swarm optimisation, and the effectiveness of the PSO algorithms is demonstrated on several benchmark optimisation problems. Section 3 presents the modelling of a fed-batch reactor using stacked neural networks. Optimal control of the fed-batch reactor using PSO and the stacked neural network models is presented in Section 4. The last section concludes the paper.

2. Particle swarm optimisation

PSO was first proposed by Kennedy and Eberhart (1995) and started as a simulation of social behaviour. They were searching for a simulation of human behaviour in which a personal factor was needed in order to approximate the model to real life. The main principle behind this optimisation routine is communication. In real life, individual members of a group can profit from the discoveries and previous knowledge of other members of the community. In PSO there is a group of particles that looks for the best solution within the search area. If a particle finds a fitter answer for the objective function, the particle will communicate this result to the rest of the particles. Once the knowledge has been communicated, the particles have two options: to follow the behaviour of the group or to follow their own search paths.

All the particles in PSO have "memory" and they modify these memorised values as the optimisation routine advances. The recorded values are: velocity (V), position (p), best previous performance (pbest) and best group performance (gbest). The first one describes how fast a particle should move from its actual position and the second one is the actual position. The last two parameters are the recorded best values that have been found during the iterations. Eqs. (1) and (2) describe how these values change:

V(k + 1) = wV(k) + C1r(pbest(k) − p(k)) + C2r(gbest(k) − p(k))   (1)

p(k + 1) = p(k) + V(k + 1)   (2)

where w is the halt parameter, C1 is the personal parameter, C2 is the group parameter, k is the iteration index, and r is a random number between 0 and 1.

The above equations, proposed by Kennedy and Eberhart (1995), represent the simplest case of the PSO routine. The whole PSO routine is divided into two loops. In the inner loop, the positions of all the particles are calculated and compared, and during this loop the particles communicate their results. The outer loop encloses all the previous calculations and is repeated until a termination criterion has been met.
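As an illustration, the two-loop routine of Eqs. (1) and (2) can be sketched in a few lines of Python. This is not the authors' code: the maximisation orientation, the random seed, the per-dimension random numbers and the clipping of positions to the search bounds are assumptions made for the example.

```python
import numpy as np

def pso(objective, lb, ub, n_particles=20, n_iter=100, w=0.7, c1=2.0, c2=2.0):
    """Global-best PSO maximising `objective`, following Eqs. (1) and (2)."""
    dim = len(lb)
    rng = np.random.default_rng(0)
    p = rng.uniform(lb, ub, size=(n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                   # velocities
    pbest = p.copy()                                   # best position found by each particle
    pbest_f = np.array([objective(x) for x in p])
    gbest = pbest[np.argmax(pbest_f)].copy()           # best position found by the group
    gbest_f = pbest_f.max()
    for _ in range(n_iter):                            # outer loop: repeat until termination
        for i in range(n_particles):                   # inner loop: update and compare particles
            r1, r2 = rng.random(dim), rng.random(dim)
            # Eq. (1): velocity update with halt, personal and group terms
            v[i] = w * v[i] + c1 * r1 * (pbest[i] - p[i]) + c2 * r2 * (gbest - p[i])
            # Eq. (2): position update, clipped to the search area
            p[i] = np.clip(p[i] + v[i], lb, ub)
            f = objective(p[i])
            if f > pbest_f[i]:                         # the particle communicates its result
                pbest[i], pbest_f[i] = p[i].copy(), f
                if f > gbest_f:
                    gbest, gbest_f = p[i].copy(), f
    return gbest, gbest_f
```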

The parameters w, C1 and C2 play an important part in PSO. The halt parameter (w) helps the particles to move around the search area. If it is too large the particles may miss the solution, and if it is too small they may not reach it. Good values are usually slightly less than 1 (Kennedy & Eberhart, 1995). The coefficients C1 and C2 indicate the preference of the particles for personal or communal results. If the value of C1 is larger than C2, then the particles will search for the best value among the best results obtained during their own searches and will not try to reach a communal best point. If vice versa, the particles will not perform individual searches, which diminishes their ability to perform "adventurous" searches. Kennedy and Eberhart (1995) recommended that both these values should be 2, which keeps a balance between the personal and communal search.

Four PSO algorithms are developed in this study; they differ in the way results are communicated within the community. The first is the simplest algorithm presented by Kennedy and Eberhart (1995), in which each particle communicates its result to all the members of the community (see Fig. 1 for an illustration).

Fig. 1. PSO with global search.

The other three algorithms are based on local searches that are performed within small groups formed in the community. In the first case, the group is based on a circular community (Kennedy & Mendes, 2006). This community is not geographical; it is a "social" community formed by members that are close to each other according to the indices that they have. For example, if a swarm contains 10 particles, then small communities of three particles can be formed. Fig. 2 shows the idea behind this concept. It can be seen from the figure how the small groups form in the community. These small groups will only communicate with members of their own community (see Fig. 3). The expected result of this formation is that the particles will search the solution area more intensively.

Fig. 2. Representation of the circular community.

Fig. 3. Search within the members of the communities.




Fig. 4. Cluster community communication.


The second case proposed for the local search is presented as a cluster community. In this case, small groups are also formed according to the indices of the particles. The difference with the circular community is that only one particle will communicate and compare results with members of other groups (see Fig. 4).

The last option performs a geographical search. In this case the particles will communicate with the particles that are close to them in the solution area. Fig. 5 presents an example of this search. Particle number 9 will communicate its results only with the particles that are within the search circle. This means that particle 9 will know if particles 1, 4, 5, and 7 have found a better result for the objective function. The expected result is that the local search algorithms explore the search area more intensively.

The PSO codes were tested using some benchmark functions. An example is given by Eq. (3). The objective of the problem is to calculate the maximum value that can be obtained from the function:

max F = 1 / [0.1 + Σ(i=1..2) xi²/4000 − Π(i=1..2) cos(xi/√i) + 1]   (3)

Fig. 6. Surface plot for the optimisation problem.

This optimisation problem has several local maxima. Fig. 6 shows the surface plot of the objective function and Fig. 7 presents the different local maxima for x1 and x2 within the ranges (−10, 10). The global optimum of this function is located at [0, 0], where the value of the objective function is 10.
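For concreteness, Eq. (3) can be coded directly and handed to the PSO sketch given above (the `pso` helper below is that illustrative sketch, not code from the paper). At [0, 0] the denominator is 0.1 + 0 − 1 + 1 = 0.1, so F = 10, matching the stated global optimum.

```python
import numpy as np

def benchmark(x):
    """Eq. (3): a two-dimensional multimodal test function with maximum 10 at [0, 0]."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, 3)                       # i = 1, 2
    s = np.sum(x**2 / 4000.0)                 # summation term
    prod = np.prod(np.cos(x / np.sqrt(i)))    # product term
    return 1.0 / (0.1 + s - prod + 1.0)

best_x, best_f = pso(benchmark, lb=np.array([-10.0, -10.0]), ub=np.array([10.0, 10.0]))
print(best_x, best_f)                         # should approach [0, 0] and 10
```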

The problem was solved using the PSO codes and the MATLABptimisation Toolbox function fminunc. The parameters used forSO command are: 20 particles, 0.1 as halt, 2 for the group andersonal parameter. For the local versions the communities were

elected to be 7 for the circular community, 5 for the cluster com-unity and 5 for the geographical search code.

Using the PSO codes the global optimum was found without anyroblems. On the other hand, the fminunc optimisation routine didot find the global optimum in all the cases. This is because fminunc

Fig. 5. Geographical search.

Fig. 7. Locations of global and local optima for x1 and x2 within the ranges (−10, 10).

This is because fminunc requires as input the initial conditions for the search. If an initial condition close to [0, 0] was selected, then fminunc could find the global optimum quickly. If an initial condition far from the global optimum was introduced, then fminunc could not find the global maximum value of the function and a local maximum was obtained instead. Table 1 presents some of the results from fminunc.

Table 1
Results from MATLAB Optimisation Toolbox function fminunc.

Initial value x0 = (x1,0, x2,0)   Solution (x1, x2)                    Objective function value
(1, 1)                            (−0.0654 × 10⁻⁶, −0.2141 × 10⁻⁶)     10.0000
(2, 2)                            (3.1400, 4.4384)                      9.3113
(3, 3)                            (3.1400, 4.4384)                      9.3113
(1, 3)                            (6.2800, 17.7538)                     5.2982
(−1, −1)                          (0.0573 × 10⁻⁶, 0.2251 × 10⁻⁶)       10.0000
(−3, −1)                          (−3.1400, −4.4385)                    9.3113

The PSO algorithms can also be applied to problems that contain constraints in their formulation by using the penalty function approach (Mathur, Sachin, Sidhartha, Jayaraman, & Kulkarni, 2000). The penalty function approach consists of a term that is added to the objective function. If one of the solutions violates a constraint, the penalty term is increased. Therefore, if a particle finds a good solution for the objective function f(x) but this solution violates any constraint, then the penalty will be increased, reducing the usefulness of the new answer. Eq. (4) represents the new objective function, where Ci represents the penalty for the ith of the m constraints of the problem. The coefficient λ can be decreased or increased in order to modify the value of the penalties. If the particles are violating the problem constraints, this parameter can be increased, which will force them to select a solution located within the feasible search area:

F = f(x) + λ Σ(i=1..m) Ci(x)²   (4)
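A minimal sketch of this penalty construction follows, assuming the constraints are rewritten in the form gi(x) ≤ 0 so that Ci(x) can be taken as the violation max(0, gi(x)); the penalty is subtracted rather than added because the example objective is being maximised, and the λ value is arbitrary. Test function 1 of Table 2 below is used as the example.

```python
import numpy as np

def penalised(objective, constraints, lam=100.0):
    """Eq. (4)-style penalty: augment a maximisation objective with quadratic penalties.

    `constraints` is a list of functions g_i with feasibility meaning g_i(x) <= 0;
    C_i(x) is taken as the violation max(0, g_i(x)).
    """
    def F(x):
        violation = sum(max(0.0, g(x)) ** 2 for g in constraints)
        return objective(x) - lam * violation   # subtract: the objective is maximised
    return F

# Example: test function 1 of Table 2, max x1^2 + x2^2 + x3^2 with two constraints.
obj = lambda x: x[0]**2 + x[1]**2 + x[2]**2
g1 = lambda x: 4*(x[0]-0.5)**2 + 2*(x[1]-0.2)**2 + x[2]**2 + 0.1*x[0]*x[1] + 0.2*x[1]*x[2] - 16
g2 = lambda x: 2 - (2*x[0]**2 + x[1]**2 - 2*x[2]**2)
F = penalised(obj, [g1, g2])
best_x, best_f = pso(F, lb=np.array([-10.0]*3), ub=np.array([10.0]*3), n_iter=200)
```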


A set of constrained benchmark optimisation problems (Mathur et al., 2000), given in Table 2, was solved using the PSO codes developed in this study. Tables 3–5 give the optimisation results. It can be seen that the global optima were found in all the cases. It can also be appreciated that the performance of the global version of the PSO seems to be better: it requires fewer cycles to get an approximation of the answer and, on average, fewer particles to find it.

Table 2
Constrained benchmark optimisation problems.

1. max F = x1² + x2² + x3²
   s.t. 4(x1 − 0.5)² + 2(x2 − 0.2)² + x3² + 0.1x1x2 + 0.2x2x3 ≤ 16
        2x1² + x2² − 2x3² ≥ 2
   Global optimum: X = [0.989, 2.674, −1.884], F = 11.68

2. max F = −((x1 − 2)² − (x2 − 1)²)
   s.t. x1 − 2x2 + 1 = 0
        −(x1²/4) − x2² + 1 ≥ 0
   Global optimum: X = [0.823, 0.911], F = −1.3777

3. min F = −12x1 − 7x2 + x2²
   s.t. −2x1⁴ + 2 − x2 = 0
   Global optimum: X = [0.718, 1.47], F = −16.7389

4. min F = x1² + x2² + 2x3² + x4² − 5x1 − 5x2 − 21x3 + 7x4
   s.t. x1² + x2² + x3² + x4² + x1 − x2 + x3 − x4 − 8 ≤ 0
        x1² + 2x2² + x3² + 2x4² − x1 − x4 − 10 ≤ 0
        2x1² + x2² + x3² + 2x1 − x2 − x4 − 5 ≤ 0
   Global optimum: X = [0, 1, 2, −1], F = −44

5. min F = −10.5x1 − 7.5x2 − 3.5x3 − 2.5x4 − 1.5x5 − 10x6 − 0.5 Σ(i=1..5) xi²
   s.t. 6x1 + 3x2 + 3x3 + 2x4 + x5 ≤ 6.5
        10x1 + 10x3 + x6 ≤ 20
   Global optimum: X = [0, 1, 0, 1, 1, 20], F = −213

Table 3
Results obtained using PSO global search in constrained problems.

     Limits (lower)       Limits (upper)    Particles  Iterations  Result
1    [−10 −10 −10]        [10 10 10]        20         90          [0.999, 2.666, −1.886]
2    [−10 −10]            [10 10]           20         54          [0.824, 0.911]
3    [−10 −10]            [10 10]           20         67          [0.717, 1.471]
4    [−10 −10 −10 −10]    [10 10 10 10]     200        150         [0.020, 1.037, 1.970, −1.017]
5    [0 0 0 0 0 0]        [1 1 1 1 1 20]    100        20          [0, 1, 0, 1, 1, 20]

Table 4
Results obtained using PSO circular community in constrained problems.

     Limits (lower)       Limits (upper)    Particles  Iterations  Size  Result
1    [−10 −10 −10]        [10 10 10]        100        159         17    [0.989, 2.673, −1.884]
2    [−10 −10]            [10 10]           10         36          5     [0.824, 0.911]
3    [−10 −10]            [10 10]           20         155         7     [0.717, 1.471]
4    [−10 −10 −10 −10]    [10 10 10 10]     50         150         17    [0, 1, 2, −1]
5    [0 0 0 0 0 0]        [1 1 1 1 1 20]    50         28          17    [0, 1, 0, 1, 1, 20]

Table 5
Results obtained using PSO cluster in constrained problems.

     Limits (lower)       Limits (upper)    Particles  Iterations  Size  Result
1    [−10 −10 −10]        [10 10 10]        100        52          20    [0.963, 2.685, −1.88]
2    [−10 −10]            [10 10]           10         34          5     [0.824, 0.911]
3    [−10 −10]            [10 10]           20         200         10    [0.717, 1.471]
4    [−10 −10 −10 −10]    [10 10 10 10]     100        150         10    [0.012, 0.987, 2.012, −0.986]
5    [0 0 0 0 0 0]        [1 1 1 1 1 20]    100        15          10    [0, 1, 0, 1, 1, 20]


3. Modelling of a fed-batch process using neural networks

3.1. A fed-batch process

The fed-batch reactor used in this work was taken from Terwiesch, Ravemark, Schenker, and Rippin (1998). The batch reactor is based on the following reaction system:

A + B → C   (rate constant k1)
B + B → D   (rate constant k2)

This reaction is conducted in an isothermal semi-batch reactor. The desired product in this system is C. The objective is to convert as much as possible of reactant A by the controlled addition of reactant B, in a specified time tf = 120 min. It is not appropriate to add all B initially because the second reaction will take place, increasing the concentration of the undesired by-product D. Therefore, to keep a low concentration of by-product D and at the same time increase the concentration of product C, the reactant B has to be fed in a stream with concentration bfeed = 0.2. A mechanistic model for this process can be found in Terwiesch et al. (1998).

Based on the reaction kinetics and material balances in the reactor, the following mechanistic model can be developed (Terwiesch et al., 1998):

d[A]/dt = −k1[A][B] − ([A]/V)u   (5)

d[B]/dt = −k1[A][B] − 2k2[B]² + ((bfeed − [B])/V)u   (6)

d[C]/dt = k1[A][B] − ([C]/V)u   (7)

d[D]/dt = 2k2[B]² − ([D]/V)u   (8)

dV/dt = u   (9)

In the above equations, [A], [B], [C], and [D] denote, respectively, the concentrations of A, B, C, and D, V is the current reaction volume, u is the reactant feed rate, and the reaction rate constants have the nominal values k1 = 0.5 and k2 = 0.5. At the start of reaction, the reactor contains [A](0) = 0.2 mole/l of A, no B ([B](0) = 0) and is filled to 50% of capacity (V(0) = 0.5 m³).
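The model of Eqs. (5)–(9) is straightforward to simulate; a possible sketch with SciPy follows. The piecewise-constant feed profile over 10 equal stages (as used later in the paper) and the example feed values are assumptions made for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

k1, k2, bfeed, tf = 0.5, 0.5, 0.2, 120.0

def fed_batch(t, x, feed_rates):
    """Right-hand side of Eqs. (5)-(9); x = [A, B, C, D, V]."""
    A, B, C, D, V = x
    stage = min(int(t / (tf / len(feed_rates))), len(feed_rates) - 1)
    u = feed_rates[stage]                       # piecewise-constant feed rate
    dA = -k1 * A * B - (A / V) * u              # Eq. (5)
    dB = -k1 * A * B - 2 * k2 * B**2 + ((bfeed - B) / V) * u   # Eq. (6)
    dC = k1 * A * B - (C / V) * u               # Eq. (7)
    dD = 2 * k2 * B**2 - (D / V) * u            # Eq. (8)
    dV = u                                      # Eq. (9)
    return [dA, dB, dC, dD, dV]

U = np.full(10, 0.004)                          # illustrative feed profile, 10 stages
x0 = [0.2, 0.0, 0.0, 0.0, 0.5]                  # [A](0)=0.2, [B](0)=0, V(0)=0.5
sol = solve_ivp(fed_batch, (0.0, tf), x0, args=(U,), max_step=1.0)
A, B, C, D, V = sol.y[:, -1]
print(C * V, D * V)                             # amounts of product and by-product at tf
```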

3.2. Stacked neural networks

The developed neural network models need to be accurate and reliable in order to be applied to the optimising control of the fed-batch reactor. A limitation of single neural network models is that they can lack generalisation when applied to unseen data, i.e. the trained neural network gives good performance on the training data but unsatisfactory performance on unseen data not used in the training process. Various techniques have been developed to improve neural network generalisation capability, such as regularisation (Bishop, 1991), early stopping (Bishop, 1995), Bayesian learning (MacKay, 1992), training with both dynamic and static process data (Zhang, 2001), and combining multiple networks through stacked neural networks or bootstrap aggregated neural networks (Sridhar et al., 1996; Wolpert, 1992; Zhang et al., 1997). In the training with regularisation approach, the magnitude of the network weights is introduced as a penalty term in the training objective function and unnecessarily large network weights

are avoided. In the training with early stopping approach, neural network performance on the testing data is monitored during the training process and the training stops when the neural network prediction errors on the testing data start to increase. Among these techniques, combining multiple networks is a very promising approach to improving model predictions on unseen data. The emphasis of this approach is on generalisation accuracy of future predictions (i.e. predictions on unseen data). When building neural network models, it is quite possible that different networks perform well in different regions of the input space. By combining multiple neural networks through stacked neural networks or bootstrap aggregated neural networks, prediction accuracy over the entire input space can be improved. Stacked neural networks have been successfully used for the inferential estimation of polymer quality (Zhang et al., 1997), prediction of final product quality (Zhang, Morris, Martin, & Kiparissides, 1998), and estimation of reactive impurities and reactor fouling (Zhang, Morris, Martin, & Kiparissides, 1999) in a batch polymerisation process.

Fig. 8 presents a stacked neural network model. The overall output of the aggregated neural network is a weighted combination of the individual neural network outputs. This can be represented by the following equation:

f(X) = Σ(i=1..n) wi fi(X)   (10)

where f(X) is the aggregated neural network predictor, fi(X) is the ith neural network, wi is the aggregating weight for combining the ith neural network, n is the number of neural networks, and X is a vector of neural network inputs. Individual networks are developed with different network structures, different initial weights, and/or different data sets. One effective approach is to train the individual networks on bootstrap re-samples of the original training data (Zhang, 1999). In bootstrap re-sampling with replacement, replications of the original data set are obtained by randomly sampling, with replacement, from the original data set. Proper determination of the stacking weights is essential for good modelling performance.

Fig. 8. A stacked neural network model.


A popular choice of stacking weights is simple averaging, i.e. the stacked neural network output is an average of the individual network outputs. Perrone and Cooper (1993) show that combining n independent predictors by simple averaging can reduce the mean squared prediction errors by a factor of n. This result is interesting, although the individual models are generally not independent. An implication of this result is that significant improvement in model prediction can be obtained if dissimilar models are combined. Since the individual neural networks are highly correlated, appropriate stacking weights could be obtained through principal component regression (Zhang et al., 1997). Instead of using constant stacking weights, the stacking weights can also change dynamically with the model inputs (Ahmad & Zhang, 2005, 2006).

Another advantage of stacked neural networks or bootstrap aggregated neural networks is that model prediction confidence bounds can be calculated from the individual network predictions (Zhang, 1999). The standard error of the ith predicted value is estimated as

σe = { (1/(n − 1)) Σ(b=1..n) [y(xi; Wb) − y(xi; ·)]² }^(1/2)   (11)

where y(xi; ·) = Σ(b=1..n) y(xi; Wb)/n and n is the number of neural networks in the stacked neural network. Assuming that the individual network prediction errors are normally distributed, the 95% prediction confidence bounds can be calculated as y(xi; ·) ± 1.96σe. A narrower confidence bound, i.e. a smaller σe, indicates that the associated model prediction is more reliable.

3.3. Modelling the fed-batch process using stacked neural networks

Neural network models for the prediction of the amount of desired product [C](tf)V(tf) and the amount of undesired by-product [D](tf)V(tf) at the final batch time are of the form:

y1 = f1(U)   (12)

y2 = f2(U)   (13)

where y1 = [C](tf)V(tf), y2 = [D](tf)V(tf), U = [u1, u2, . . ., u10]T is a vector of the reactant feed rates during a batch, and f1 and f2 are nonlinear functions represented by neural networks.

For the development of the neural network models, simulated process operation data from 50 batches with different feeding profiles were generated using the mechanistic model of the process. In each batch, the batch duration is divided into 10 equal stages. Within each stage, the feed rate is kept constant. The control policy for a batch consists of the feed rates at these 10 stages. Dividing the batch time into more stages would increase the degrees of freedom in constructing the control policy, but it would also increase the computational burden.

Fig. 9. Performance of individual networks for predicting [C](tf)V(tf).

In the stacked neural network models, several individual networks are trained using bootstrap re-sampling of the original data (Efron, 1982). The individual network outputs are combined to give the final model output. For each of the stacked neural network models, a group of thirty individual neural networks was developed. Each neural network contains three nodes in the hidden layer; the number of hidden nodes was selected based on the performance on the testing data. The nodes in the hidden layer use a hyperbolic tangent activation function while that in the output layer uses a linear activation function. The stacked neural network output is taken as the average of the individual network outputs.
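A possible sketch of this bootstrap aggregation is given below, using scikit-learn's MLPRegressor as a stand-in for the networks described (three hyperbolic tangent hidden nodes and a linear output); the array names U_data and y_data, the iteration limit and the seed are assumptions made for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_stacked_network(U_data, y_data, n_networks=30, seed=0):
    """Train an ensemble on bootstrap re-samples (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    nets = []
    n = len(U_data)
    for _ in range(n_networks):
        idx = rng.integers(0, n, size=n)        # bootstrap re-sample of the training data
        net = MLPRegressor(hidden_layer_sizes=(3,), activation='tanh',
                           max_iter=5000)       # 3 tanh hidden nodes, linear output
        net.fit(U_data[idx], y_data[idx])
        nets.append(net)
    return nets

def predict_stacked(nets, U):
    preds = np.array([net.predict(U) for net in nets])   # shape (n_networks, n_samples)
    return preds.mean(axis=0), preds                      # stacked output is the average
```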

Figs. 9 and 10 show, respectively, the performance of individual networks and stacked networks for predicting the amount of desired product [C](tf)V(tf) on the training and unseen validation data sets. Note that the unseen validation data are of an interpolation nature and are within the range covered by the training data. Model generalisation capability can be verified by examining performance on the unseen validation data. Fig. 9 indicates that for some networks the SSE on the training data is small, but this is not the case on the unseen validation data. These results show that individual networks are not reliable. It can be seen from Fig. 10 that stacked networks give consistent performance on the training data and on the unseen validation data. The performance gradually improves as more networks are combined and approaches a stable level; this is observed on both the training and unseen validation data. This result indicates that the stacked model for predicting the amount of desired product [C](tf)V(tf) becomes more reliable as the number of individual networks is increased. It does not matter if some networks do not perform well; what matters is the communal performance of the group.

Using the developed stacked neural network models, the concentrations of the product and by-product at the end of the batch were predicted with confidence bounds. Figs. 11 and 12 present the stacked neural network model predictions for [C](tf)V(tf) and [D](tf)V(tf), respectively. The 95% model prediction confidence bounds are also shown in the figures: the lines located above and below the points indicate the confidence bounds of the model predictions, and the predicted values can be compared with the real ones. It can be seen that, although the predictions are quite accurate, the predictions are more reliable for some batches than for others. This confidence indication is very useful, as will be demonstrated in the next section.

Fig. 10. Performance of stacked networks for predicting [C](tf)V(tf).

Fig. 11. Model predictions and their 95% confidence bounds for [C](tf)V(tf).

Fig. 12. Model predictions and their 95% confidence bounds for [D](tf)V(tf).


Table 6
Parameters used in PSO algorithms.

           PSOG1  PSOG2  PSOG3  PSOG4  PSOL1  PSOL2  PSOL3  PSOL4
Particles  50     70     50     70     20     40     20     40
Halt       0.01   0.01   0.005  0.005  0.01   0.01   0.005  0.005

Table 7
Values of ([C](tf) − [D](tf))V(tf) on neural network models and actual process.

          Mechanistic   Single neural network       Stacked neural network
          model         Neural network   Process    Neural network   Process
fmincon   0.0381        0.0411           0.0314     0.0304           0.0363
PSOG1     0.0382        0.0400           0.0344     0.0296           0.0359
PSOG2     0.0377        0.0405           0.0319     0.0297           0.0370
PSOG3     0.0381        0.0399           0.0325     0.0302           0.0358
PSOG4     0.0379        0.0396           0.0347     0.0300           0.0368
PSOL1     0.0373        0.0377           0.0341     0.0298           0.0338
PSOL2     0.0376        0.0407           0.0307     0.0298           0.0364
PSOL3     0.0367        0.0394           0.0364     0.0297           0.0348
PSOL4     0.0370        0.0397           0.0301     0.0297           0.0363


4. Optimising control using PSO

The objective of the optimisation is to maximise the amount of the final product while reducing the amount of the by-product. The optimisation problem solved in this work is

min_U J = {α1[D](tf) − α2[C](tf)}V(tf)

s.t. 0 ≤ ui ≤ 0.01, (i = 1, 2, . . ., 10)

V(tf) = 1

where α1 and α2 are weighting parameters, both set to 0.5 in this study, U is a vector of control actions (reactant feed rates), and V is the reaction volume.
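Putting the earlier sketches together, this control problem could be posed to the illustrative pso routine as below. The helpers predict_stacked, nets_C and nets_D are the hypothetical objects from the previous sketches, the V(tf) = 1 equality is handled with the penalty approach of Eq. (4), and the volume bookkeeping (V(tf) = V(0) + 12 Σ ui for ten 12-min stages) is an assumption about the feed-rate units.

```python
import numpy as np

a1, a2, lam = 0.5, 0.5, 100.0

def control_objective(U):
    """Negative of J = {a1*[D](tf) - a2*[C](tf)}*V(tf) for the maximising PSO sketch,
    with a quadratic penalty enforcing V(tf) = 1 as in Eq. (4)."""
    U = np.asarray(U).reshape(1, -1)
    yC, _ = predict_stacked(nets_C, U)     # stacked prediction of [C](tf)V(tf)
    yD, _ = predict_stacked(nets_D, U)     # stacked prediction of [D](tf)V(tf)
    J = a1 * yD[0] - a2 * yC[0]
    V_tf = 0.5 + U.sum() * 12.0            # assumed: V(0)=0.5 plus total feed over ten 12-min stages
    return -J - lam * (V_tf - 1.0) ** 2

U_opt, _ = pso(control_objective, lb=np.zeros(10), ub=np.full(10, 0.01),
               n_particles=50, n_iter=200)
```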

For the solution using the developed PSO codes, different conditions were selected. For the global version, the number of particles and the halt parameter were modified: the options for the size of the community were 50 and 70, and the halt parameter was changed between 0.01 and 0.005. The particle population for the local PSO codes was also modified, from 20 to 40, and the options for the halt parameter were the same as those used for the global version. Table 6 lists the parameters used in the global PSO (PSOG1 to PSOG4) and local PSO (PSOL1 to PSOL4) algorithms. For the local PSO algorithms, the sizes of the internal communities were kept the same in all the cases: 17 particles.

For the purpose of comparison, optimisation using a mechanistic model and using a single neural network were first carried out. Table 7 shows the obtained results. The optimisation results obtained from the mechanistic model define the upper limit of the achievable performance of neural network model-based optimisation. As can be seen from the table, the values for the difference between the final amounts of product and by-product obtained using the PSO codes are similar to those obtained using the sequential quadratic programming (SQP) routine implemented by the MATLAB Optimisation Toolbox function fmincon for this fed-batch reactor. However, PSO can in general cope with multiple local minima, as shown in Section 2. As expected, the performance of the optimal control policies calculated from single neural networks is not as good as that calculated from the mechanistic model, due to model plant mismatches. Fig. 13 shows the mechanistic model-based optimal control profile calculated using SQP and the associated profiles of [C] and [D]. Control policies obtained using the PSO codes are similar to that obtained using SQP.

Fig. 13. Control policy calculated using SQP with the mechanistic model and the associated profiles of [C] and [D].

It can also be appreciated that an increase in the number of particles in the global version of the PSO code does not help the code to find a better solution for the optimisation problem. This could indicate that the PSO code only needed a minimum number of particles and that the inclusion of more particles is not helpful. A different behaviour was encountered in the local version of the PSO: when more particles were used for the solution of the problem, the code required fewer iterations to solve it.

Changing the value of the halt did not show any improvement inthe performance. As can be seen from the table, the results obtainedusing different halt values are similar. The results indicate that thelocal version of PSO can find a similar answer to the problem usingfewer particles than the global version of PSO.

Once the optimal feed rates were obtained, they were applied to the actual process (i.e. simulated by the mechanistic model of the process). Table 7 shows the difference between the amounts of the final product and by-product on the neural network models and on the actual process. It can be seen from Table 7 that the actual amounts of product and by-product under these "optimal" control policies are quite different from the neural network model predictions. This indicates that the single neural network-based optimal control policies are only optimal on the neural network model and are not optimal on the real process. Hence, they are not reliable. This is mainly due to the model plant mismatches, which are unavoidable in data-based modelling.



A method to overcome the impact of model plant mismatch on optimisation performance was previously investigated by Zhang (2004), where model prediction confidence bounds are incorporated as a penalty in the objective function. Therefore, the objective function can be modified as

min_U J = {α1[D](tf) − α2[C](tf)}V(tf) + α3(stderr[C] + stderr[D])

s.t. 0 ≤ ui ≤ 0.01, (i = 1, 2, . . ., m)

V(tf) = 1

where stderr[C] and stderr[D] are the standard errors of the stacked model predictions, and α3 is a weighting factor for model prediction confidence, selected as 0.5 in this work.
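In terms of the earlier illustrative sketches, this modification simply adds the Eq. (11) standard errors of the two stacked predictions to the objective; every name here (predict_stacked, nets_C, nets_D, a1, a2, lam) comes from those sketches rather than from the paper.

```python
a3 = 0.5   # weighting for model prediction confidence, as in the paper

def reliable_control_objective(U):
    """Negative of the modified J, penalising wide confidence bounds."""
    U = np.asarray(U).reshape(1, -1)
    yC, predsC = predict_stacked(nets_C, U)
    yD, predsD = predict_stacked(nets_D, U)
    nC, nD = predsC.shape[0], predsD.shape[0]
    stderr_C = np.sqrt(((predsC - yC) ** 2).sum() / (nC - 1))   # Eq. (11) for one sample
    stderr_D = np.sqrt(((predsD - yD) ** 2).sum() / (nD - 1))
    J = a1 * yD[0] - a2 * yC[0] + a3 * (stderr_C + stderr_D)
    V_tf = 0.5 + U.sum() * 12.0
    return -J - lam * (V_tf - 1.0) ** 2
```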

ion with stacked neural network models. It can be seen from Table 7hat the modified objective function with stacked neural network

odels lead to better performance on the actual process. It cane appreciated that the actual performance of the control policiesalculated from stacked neural networks with the above-modifiedbjective function is very close to that of the control policies calcu-

ated using the mechanistic model. This demonstrates that contrololicies obtained using stacked neural networks considering modelrediction confidence bounds is much more reliable than thosebtained using a single neural network model.

5. Conclusions

This study demonstrates that particle swarm optimisation is a powerful optimisation technique, especially when the objective function has several local minima. Conventional optimisation techniques can be trapped in local minima, but PSO can in general find the global minimum. Stacked neural networks not only give better prediction performance but also provide model prediction confidence bounds. In order to improve the reliability of neural network model-based optimisation, an additional term is introduced in the optimisation objective to penalise wide model prediction confidence bounds. The proposed technique is successfully demonstrated on a simulated fed-batch reactor.

References

Ahmad, Z., & Zhang, J. (2005). Bayesian selective combination of multiple neural networks for improving long range predictions in nonlinear process modelling. Neural Computing & Applications, 14, 78–87.
Ahmad, Z., & Zhang, J. (2006). Combination of multiple neural networks using data fusion techniques for enhanced nonlinear process modelling. Computers & Chemical Engineering, 30, 295–308.
Bishop, C. (1991). Improving the generalisation properties of radial basis function neural networks. Neural Computation, 13, 579–588.
Bishop, C. (1995). Neural networks for pattern recognition. Oxford: Oxford University Press.
Bonvin, D. (1998). Optimal operation of batch reactors: A personal view. Journal of Process Control, 8, 355–368.
Dorigo, M., & Gambardella, L. M. (1997). Ant colonies for the travelling salesman problem. Biosystems, 43, 73–81.
Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Philadelphia: Society for Industrial and Applied Mathematics.
Goldberg, D. E. (1989). Genetic algorithms in search, optimisation and machine learning. Addison-Wesley.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the 1995 IEEE international conference on neural networks, Perth, Australia, Vol. VI (pp. 1942–1948).
Kennedy, J., & Mendes, R. (2006). Neighbourhood topologies in fully informed and best of neighbourhood particle swarms. IEEE Transactions on Systems, Man and Cybernetics, 36, 515–519.
Luus, R. (1991). Effect of the choices of the final time in optimal control of non-linear systems. Canadian Journal of Chemical Engineering and Research, 30, 1525–1530.
MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4, 415–447.
Mathur, M., Sachin, K., Sidhartha, P., Jayaraman, V., & Kulkarni, B. D. (2000). Ant colony approach to continuous functions optimization. Industrial & Engineering Chemistry Research, 39(10), 3814–3822.
Morris, A. J., Montague, G. A., & Willis, M. J. (1994). Artificial neural networks: Studies in process modelling and control. Transactions of IChemE, 72, 3–19.
Park, S., & Ramirez, W. F. (1988). Optimal production of secreted protein in fed-batch reactors. AIChE Journal, 34, 1550–1558.
Perrone, M. P., & Cooper, L. N. (1993). When networks disagree: Ensemble methods for hybrid neural networks. In R. J. Mammone (Ed.), Artificial neural networks for speech and vision (pp. 126–142). London: Chapman and Hall.
Sridhar, D. V., Seagrave, R. C., & Bartlett, E. B. (1996). Process modelling using stacked neural networks. AIChE Journal, 42, 2529–2539.
Terwiesch, P., Ravemark, D., Schenker, B., & Rippin, D. W. T. (1998). Semi-batch process optimization under uncertainty: Theory and experiments. Computers & Chemical Engineering, 22, 201–213.
Tian, Y., Zhang, J., & Morris, A. J. (2001). Modeling and optimal control of a batch polymerization reactor using a hybrid stacked recurrent neural network model. Industrial & Engineering Chemistry Research, 40, 4525–4535.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5, 241–259.
Zhang, J., Martin, E. B., Morris, A. J., & Kiparissides, C. (1997). Inferential estimation of polymer quality using stacked neural networks. Computers & Chemical Engineering, 21, s1025–s1030.
Zhang, J., Morris, A. J., Martin, E. B., & Kiparissides, C. (1998). Prediction of polymer quality in batch polymerisation reactors using robust neural networks. Chemical Engineering Journal, 69, 135–143.
Zhang, J., Morris, A. J., Martin, E. B., & Kiparissides, C. (1999). Estimation of impurity and fouling in batch polymerisation reactors through the application of neural networks. Computers & Chemical Engineering, 23(3), 301–314.
Zhang, J. (1999). Developing robust non-linear models through bootstrap aggregated neural networks. Neurocomputing, 25, 93–113.
Zhang, J. (2001). Developing robust neural network models by using both dynamic and static process operating data. Industrial & Engineering Chemistry Research, 40, 234–241.
Zhang, J. (2004). A reliable neural network model based optimal control strategy for a batch polymerisation reactor. Industrial & Engineering Chemistry Research, 43, 1030–1038.
Zhang, J. (2005a). A neural network based strategy for the integrated batch-to-batch control and within batch control of batch processes. Transactions of the Institute of Measurement and Control, 27, 391–410.
Zhang, J. (2005b). Modelling and optimal control of batch processes using recurrent neuro-fuzzy networks. IEEE Transactions on Fuzzy Systems, 13(4), 417–427.
Zhao, S. J., Zhang, J., & Xu, Y. M. (2006). A nonlinear projection to latent structures method and its applications. Industrial & Engineering Chemistry Research, 45, 3843–3852.