
Chemometrics and Intelligent Laboratory Systems 81 (2006) 127–136
www.elsevier.com/locate/chemolab

Multi-phase principal component analysis for batch processes modelling

José Camacho ⁎, Jesús Picó

Departamento de Ingeniería de Sistemas y Automática, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022, Valencia, Spain

Received 26 July 2005; received in revised form 9 November 2005; accepted 14 November 2005. Available online 10 January 2006.

Abstract

Projection to latent structures based methods have been widely used for process monitoring and many extensions to batch processes have been reported. When data from a process include nearly non-correlated groups of variables (for example, in a batch process, because of their distance in time), it can be advantageous to model these groups separately. Additionally, traditional methods have an important drawback: they can only model linear combinations of variables. When a batch process shows non-linear dynamics in its variation around the average trajectory, linear models obtain poor performance. Traditionally, in process modelling, two solutions for non-linearity have been implemented: non-linear models and local linear models. In this paper, an algorithm for the detection of phases during the batch processing, where the behavior of the process can be well approximated by a linear local model, is presented.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Statistical process monitoring; Batch process; Multi-stage model; Local models; Piece-wise modelling

1. Introduction

Batch processes play an important role in the pharmaceutical, biochemical, food, etch and metal industries. They are processes of finite duration which mainly consist of three steps: charge of the vessel, processing and discharge [1]. Raw materials and control variables follow a specific recipe in order to obtain within-specification products. Thus, initial and processing conditions have to be controlled.

The non-linear nature of most batch processes makes modelling a very challenging task. Statistical methods, especially projection to latent structures based methods, have proven to be a very interesting tool for the modelling of batch processes. This kind of model is very different from the notion of model used in control theory. Statistical Process Control (SPC) or, more correctly, Statistical Process Monitoring, is based on modelling the variability of the process variables around their average. In the particular case of batch processes one has an average trajectory of the process variables. Subtracting this average trajectory removes most of the non-linear dynamics, and so linear models can be used [2].

⁎ Corresponding author. E-mail address: [email protected] (J. Camacho).

0169-7439/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2005.11.003

Data obtained during the process are information-poor [3] because the variables measured are highly correlated. The information (the variability) is hidden inside the data in the form of combinations of variables: the latent structures. The dimension of the latent space is much lower than that of the original variable space. Models obtained from the projection of the data onto the latent space have lower dimension and better statistical properties. Principal component analysis (PCA) and partial least squares (PLS) are the most used methods to discover the latent structures of the data.

The data set obtained from a batch process is three-dimensional. Matrix X(I×J×K) contains the measures of J variables at K time instants¹ in I batches. Traditional projection to latent structures based methods obtain the latent structures of two-way data. Thus, extensions of these methods had to be developed. Multi-way (or unfolded) PCA and PLS [4–6] unfold the three-way matrix to use PCA and PLS, respectively. Other methods analyze the data matrix without the necessity of unfolding, like PARAFAC and Tucker3 [7,8].

Prior to the analysis, batch data have to be preprocessed. The most suitable preprocessing method for batch process monitoring is subtracting the average trajectory and auto-scaling [7]. Nonetheless, many batch processes show non-linear dynamics in the variation around the mean trajectory, and the linear models designed with PCA/PLS obtain poor performance. Traditionally, non-linearity is overcome by using a non-linear model or local linear models.

¹ In general, we will use the name “time instants” for the K mode. However, the data can be measured at regular intervals of an indicator variable, instead of time.
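The preprocessing just described (subtract the average trajectory across batches, then auto-scale) can be sketched as follows. This is a minimal illustration, not code from the paper; `preprocess_batch_data` is a hypothetical helper name.

```python
import numpy as np

def preprocess_batch_data(X):
    """Subtract the average trajectory and auto-scale batch data.

    X has shape (I, J, K): I batches, J variables, K time instants.
    Each (variable, time instant) column is centered and scaled across
    batches. Returns the preprocessed array plus the mean/std trajectories.
    """
    mean_traj = X.mean(axis=0)           # (J, K) average trajectory
    std_traj = X.std(axis=0, ddof=1)     # (J, K) spread around the trajectory
    std_traj[std_traj == 0] = 1.0        # guard against constant variables
    Xp = (X - mean_traj) / std_traj
    return Xp, mean_traj, std_traj
```

After this step, each variable at each time instant has zero mean and unit variance across the batches, which is what allows linear models of the variation around the trajectory.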

In batch processes, multi-stage models are constructed by modelling each stage (see Appendix) of the process with a different model (Fig. 1). This idea seems similar to that of multi-block models [9], where variables are arranged in blocks. Nonetheless, in the multi-block approach, a hierarchical architecture is used to develop the model, with a high level where the information of the blocks is combined. Then, unlike in multi-stage modelling, the models of the different blocks (or stages) are not generated independently. Thus, the different models in the multi-stage approach are local to a stage, while the multi-block models do not possess this property of locality. From here on, when talking about multiple models, we will assume that those models are generated independently.

Kosanovich et al. [2] proposed to divide the model in two based on changes in the variance explained by the principal components during the batch. Lennox et al. [10] suggest the use of multiple models to account for changes in the correlation structure when using SPE charts. Louwerse and Smilde [11] divide the batch in segments for on-line monitoring. Each model represents the batch from its beginning to the end of a segment. These segments are obtained from process knowledge or by dividing in regular intervals. The method named Sub-PCA [12] uses a clustering algorithm to detect segments of the batch which can be modelled by the same model. The authors use this method for on-line monitoring. Ündey et al. suggested the use of several models to represent the multi-stage/multi-phase nature of the process for end-of-batch [13] and on-line [14] monitoring. Inspired by their nomenclature, we will call “stages” the different steps of the batch processing and “phases” the segments of the batch well approximated by a linear model.

In this paper, a new algorithm for the detection of phases in a batch process is presented. The authors believe this is the first attempt to automatically divide the batch into segments in order to improve the prediction power of the model in an end-of-batch monitoring framework.

Fig. 1. Sub-matrices in the time mode representing stages or phases. Each sub-matrix is modelled separately.

The rest of the paper is organized as follows: Section 2 introduces the problem of utilizing a single linear model for the monitoring of some batch processes. Section 3 describes the algorithm used to detect phases in the process. Section 4 presents the data sets used for experimentation. Section 5 shows experimental results. Section 6 gives some conclusions, remarks and future research lines. The Appendix defines some of the terms used throughout the paper.

2. Multi-phase batch processes

The most accepted way to unfold the matrix X in batch process monitoring is the batch-wise unfolding (1), although other unfolding directions have been used [15].

$$X(I \times J \times K) \Rightarrow X(I \times JK) \qquad (1)$$

The resulting number of variables in matrix X may be very large. The variability around the average will be modelled with a multi-normal distribution of the same dimension as the data. Then, in end-of-batch monitoring, the compression performed by PCA is instrumental in the analysis of the data. Moreover, the resulting set of variables shows better statistical properties.
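The batch-wise unfolding of Eq. (1) is a plain rearrangement of the three-way array into a two-way matrix with one row per batch; in numpy it is a single `reshape`. A small sketch with assumed toy dimensions:

```python
import numpy as np

# Batch-wise unfolding (Eq. (1)): X(I x J x K) becomes X(I x JK),
# one row per batch. Toy dimensions chosen for illustration only.
I, J, K = 4, 3, 5
X3 = np.arange(I * J * K, dtype=float).reshape(I, J, K)

# reshape keeps the batch mode as rows; each row concatenates, for
# each of the J variables, its K time instants (time runs fastest).
X2 = X3.reshape(I, J * K)
```

The operation is lossless: reshaping `X2` back to `(I, J, K)` recovers the original three-way array.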

PCA extracts the orthogonal directions of maximum variance in the data, the principal components (PCs), which are linear combinations of the variables. The directions of the PCs are obtained from the eigenvectors of the variance–covariance matrix of X. The eigenvectors are arranged as the columns of the loading matrix, ordered by explained variance. If only the first PCs are included in the model, the variance explained by the residuals is treated as noise. PCA follows Eq. (2):

$$X = TP' + E \qquad (2)$$

where X(I×JK) is the data matrix after batch-wise unfolding and preprocessing, P(JK×R) is the loading matrix, with R the number of PCs of the model, T(I×R) is the score matrix and E(I×JK) is the residual matrix.

PCA reduces the original dimension of the data, JK, to the number of principal components, R. When no dimension reduction is done, PCA is just a rotation of the axes. The rows of the score matrix are the projections of the data onto the latent subspace of the PCs.
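The decomposition of Eq. (2) can be sketched with an SVD, which yields the same loadings as the eigen-decomposition of the covariance matrix. `pca_model` is a hypothetical helper name, not the authors' code:

```python
import numpy as np

def pca_model(X, R):
    """Minimal PCA, a sketch of Eq. (2): X = T P' + E.

    X is the preprocessed (I x JK) matrix, R the number of PCs.
    Returns scores T (I x R), loadings P (JK x R) and residuals E.
    """
    # right singular vectors = eigenvectors of X'X, ordered by variance
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:R].T        # loading matrix
    T = X @ P           # scores: projections onto the PCs
    E = X - T @ P.T     # residual matrix
    return T, P, E
```

By construction `T @ P.T + E` always reproduces `X` exactly, and with `R` equal to the rank of `X` the residual matrix vanishes.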

When process data include independent or non-linearly related groups of variables, a model could be designed for each of these groups. In Fig. 2, a pair of examples is shown. Imagine a data set composed of a hundred observations of four variables, where the relationship between the first two (Fig. 2(a)) and between the last two (Fig. 2(b)) is perfectly linear. Nonetheless, the first pair of variables is either independent of the last pair (example in Fig. 2(c)) or non-linearly related to it (example in Fig. 2(d)).

When the pairs of variables are independent, the analysis can be done separately for each pair without loss of prediction power and with fewer principal components (a smaller model). In the example, the model of the four variables had to include two PCs whereas the models of the pairs of


Fig. 2. Independent or non-linearly related groups of variables: (a) {Var1,Var2} space; (b) {Var3,Var4} space; (c) {Var2,Var3} space, independent variables; (d) {Var2,Var3} space, non-linear relationship. Non-preprocessed observations are represented by points. Preprocessed (mean-centered and auto-scaled) observations are represented by circles.


variables needed only one. Moreover, imagine the second PC is not added to the complete model because its explained variance is low. This can occur when one of the independent groups of variables has many more variables than the other. In this case, the small group of variables could be poorly modelled. Modelling the groups separately solves the problem.
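The four-variable example above can be reproduced numerically. The data below are hypothetical (not the paper's): two perfectly linear, mutually independent pairs. One PC per pair captures each pair completely, while the joint model needs two PCs; with only one joint PC, roughly half the variance is left unexplained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
v1 = rng.standard_normal(n)
v3 = rng.standard_normal(n)
# pair {Var1, Var2} and pair {Var3, Var4}, linearly related within
# each pair, independent across pairs
X = np.column_stack([v1, 2.0 * v1, v3, -1.5 * v3])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # mean-center and auto-scale

def explained_variance(X, R):
    # fraction of total variance captured by the first R PCs
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return float((s[:R] ** 2).sum() / (s ** 2).sum())
```

Here `explained_variance(X, 2)` is essentially 1, each two-column sub-model reaches the same with a single PC, and `explained_variance(X, 1)` stays near one half.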

Separation into different models has the drawback that it does not take into account part of the multivariate nature of the data. Nonetheless, in batch-wise unfolded data, the number of variables is so large that, by fitting only one model, the variables collected in one of the short phases of the batch could be poorly modelled. In batch processes of multi-phase nature, a multi-phase model is convenient.

When dealing with non-linear relationships, the PCA model loses prediction power. Fig. 2(d) shows a quadratic relationship between variables 2 and 3. The PCA model which approximates the non-linear relationship with one PC produces large residuals and hence a high prediction error. These residuals do not follow a normal distribution, and the control limits established for monitoring are not appropriate. Some normal observations can be determined to be abnormal and the other way round, incrementing the Type I (normal observations detected as faults) and Type II (non-detected faults) risks. An alternative to treating the value of the residuals as normal is to include another principal component in the model. Again, the problem here is the normality assumption, which now is not satisfied by the scores. This causes an increment in the Type I and II risks because of the poor adjustment of the control limits to the real distribution of the samples. In Fig. 2(d), any preprocessed observation which falls between the mean (crossing point of the PCs) and the quadratic curve of circles is abnormal, but the D-statistic and Q-statistic commonly used in combination with the PCA model [1] would catalogue it as nearer to the average behavior than any of the calibration samples. This will happen unless the information obtained from the rest of the relationships among process variables detects the abnormality.

Non-linear dynamics approximated with a linear combination of variables can increment the prediction and monitoring error of the model of a process. Approximating the non-linearity with a non-linear model can be the solution, but this is a difficult task and a big and rich data set is needed. When dealing with a huge number of variables, this is almost impossible. In batch processes, a compromise solution is to model only the dynamics which are well approximated by a linear model. Nowadays, as the information of a process is repeated because a

Fig. 3. Correlation of two variables measured 116 times (batch duration) after pre-processing (mean-centered and auto-scaled) and batch-wise unfolding.

Fig. 4. Graphical example of the top–down recursive procedure. The reference for the values of k is the sub-matrix time mode, not the complete matrix time mode.


large number of variables is collected, a good model of the process can be obtained by modelling the linear or pseudo-linear relationships. Note that, to improve the prediction power, the multiple models have to be independent. Thus, a multi-block approach should not be applied to model a multi-phase process, because this approach forces a linear relationship among blocks at the highest level of the model. With a multi-block model, not only is the prediction power not improved, it can even be reduced.

Non-linear relations and independence between two variables are examples of types of relationships poorly approximated with a linear model. The variables so related show low correlation values. In batch processes, two observations of the vector of process variables commonly show less correlation as the time distance (degree of completion of the batch) between them grows. In Fig. 3, the correlation map of the batch-wise unfolded data of two variables of the nylon 6′6 polymerization process used in [2] is shown. The batch length is 116 and the data include the measurements of 50 batches. The result of unfolding is a bi-dimensional matrix with 50 observations of 232 variables, the first 116 from one of the original variables and the rest from the other.

The figure shows three diagonals of high correlation. The principal diagonal is the correlation of the variables with themselves. The other two (remember the correlation matrix is symmetric) are the correlation between the two original variables at the same time instant. The dark rectangles of the graphic represent the segments of the batch where the variables are highly correlated with their preceding values. The correlation becomes negligible as the time distance grows. The formation of rectangles of different sizes in the correlation map means that the auto-correlation and lagged cross-correlation are dependent on the phase of the batch, and that the modelling with linear models should be done independently for different phases.

The aim of this paper is to find the phases into which a batch can be divided and to evaluate whether dividing it is worthwhile or not.

3. Multi-phase principal component analysis

The goal of the multi-phase principal component analysis (MPPCA) algorithm is to find the phases of the batch process, i.e. the segments of the batch well approximated by a linear model. The algorithm sequentially executes two steps: a top–down step and a down–top step. The top–down step [16] is a recursive procedure which finds an initial solution (a partition of the batch duration in m phases). The down–top step is added to control the maximum number of phases of the final solution.

The top–down step initially finds the best time instant, in terms of prediction power, to divide the batch duration in two. Any possible time subdivision is evaluated with the percentage of reduction (3) of the sum of squared prediction errors (PRESS, Eq. (4)). The one with the highest value of reduction (max_t α^t) is chosen and the procedure recursively proceeds with the resulting segments (corresponding sub-matrices of data, see Fig. 4).

$$\alpha^t = 100 \cdot \left(1 - \frac{\mathrm{PRESS}_c^t}{\mathrm{PRESS}_c}\right) \qquad (3)$$

$$\mathrm{PRESS}_c = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (x_{ijk} - \hat{x}_{ijk})^2 \qquad (4)$$

$$\mathrm{PRESS}_c^t = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{t} (x_{ijk} - \hat{x}_{ijk}^{(1)})^2 + \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=t+1}^{K} (x_{ijk} - \hat{x}_{ijk}^{(2)})^2 \qquad (5)$$

where x̂_ijk is the prediction of variable j at time instant k in batch i, obtained using cross-validation and a model with c PCs. PRESS_c^t is the PRESS value obtained when the model is divided into two sub-models corresponding to the segments [1, …, t] and [t+1, …, K]; x̂_ijk^(1) represents the prediction of the first sub-model and x̂_ijk^(2) the prediction of the second sub-model.

In practice, the PRESS calculation can be too time-consuming. An approximation is made by evaluating the percentage of reduction of non-explained variance (6). The non-explained variance or squared error (SE) of the model is


Fig. 5. Computation of β^k.


calculated like the PRESS value (4), but the predictions x̂_ijk are obtained from a model generated with the complete data set, instead of using cross-validation. The explained variance has been used before to justify the use of multi-stage models [13,12].

$$\beta^t = 100 \cdot \left(1 - \frac{\mathrm{SE}_c^t}{\mathrm{SE}_c}\right) \qquad (6)$$
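The SE-based criterion of Eq. (6) can be sketched as follows. `se_of_model` and `beta_t` are hypothetical helper names (this is an illustration, not the authors' code); the data are assumed already preprocessed.

```python
import numpy as np

def se_of_model(X2, R):
    """Non-explained variance (squared error, SE) of an R-PC model,
    fitted on the full data set rather than by cross-validation."""
    Xc = X2 - X2.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:R].T
    E = Xc - (Xc @ P) @ P.T
    return float((E ** 2).sum())

def beta_t(X3, t, R):
    """Eq. (6): percentage of reduction of non-explained variance when
    the batch is split at time instant t. X3 has shape (I, J, K)."""
    I = X3.shape[0]
    unfold = lambda A: A.reshape(I, -1)          # batch-wise unfolding
    se_full = se_of_model(unfold(X3), R)
    se_split = (se_of_model(unfold(X3[:, :, :t]), R)
                + se_of_model(unfold(X3[:, :, t:]), R))
    return 100.0 * (1.0 - se_split / se_full)
```

On synthetic data whose first and second halves are driven by unrelated latent variables, `beta_t` peaks at the true change point, which is exactly the behavior the top–down step exploits.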

Once the top–down procedure has finished, the down–top step selects the partition in n segments with the highest explained variance, out of the m segments obtained in the top–down step, with n ≤ m.

The inputs of the MPPCA algorithm are:

• The three-way data matrix.
• The number of PCs of the model.
• Minimum length of a phase (minlength), understood as the minimum number of consecutive time instants included in a phase. This parameter will be defined as a portion of the batch length.
• Improvement threshold (threshold) to accept a subdivision.
• Maximum number of phases (maxnumber).

The first four are the inputs of the top–down step, whereas maxnumber is used in the down–top step.

The partition of the batch obtained with MPPCA depends on the number of PCs used in the model. In PCA, the selection of the number of PCs using cross-validation is done by looking at the form of the PRESS curve or by using a statistical index, like the R-statistic [17] or the W-statistic [18]. The selection of the number of PCs in MPPCA can be done in the same manner. Thus, MPPCA is a generalization of unfold-PCA. When single-phase processes are analyzed, unfold-PCA and MPPCA are just the same.

The minimum length of a phase is included in order not to obtain phases of small size. If no minimum is wanted, this input can be set to 1/K, with K the number of time instants in the batch. A segment of this length is represented by an I×J matrix which includes the measurements of the J process variables in the I batches collected at time instant k. A value of minlength=1 is equivalent to utilizing unfold-PCA.

The improvement threshold is the key parameter of the algorithm. It establishes the convenience of a subdivision. A subdivision is justified if the percentage of reduction of non-explained variance (β) is sufficiently high. A threshold=0% implies that all the subdivisions are accepted in the top–down step, constrained only by the minimum length of phase. A percentage of 100% is equivalent to utilizing unfold-PCA (no subdivision is accepted).

The search space of the solution is related to the minlength value, which constrains the maximum number of phases found in the top–down step. The parameter maxnumber and the down–top step of the algorithm have been added to separate the maximum number of phases from the minimum length. In the multi-phase framework there are as many sub-models as phases. Therefore, each batch produces one score vector per phase. Then, the number of points in a D-statistic chart for one batch is equal to the number of phases. In order to simplify the monitoring task, a maximum number of phases can be set. With the maximum number of phases set to the length of the batch, this parameter does not affect the resulting model. A value of maxnumber=1 is equivalent to utilizing unfold-PCA. The generation of the multi-phase monitoring charts can be done as proposed in Camacho and Picó [16].

The top–down step follows the next recursive algorithm:

i. Unfolding of the three-way data matrix.
ii. Data preprocessing (mean-centering and auto-scaling).
iii. PCA of the data.
iv. For each time instant (k):
   iv.1 If the subdivision at k generates two segments of length higher than the minimum established (minlength):
      iv.1.1 PCA of the two segments.
      iv.1.2 Compute β^k (6) from the predictions of the models obtained in steps iii and iv.1.1.
      iv.1.3 If β^k is the highest so far (β^k = max_t β^t : t = minlength, …, k), update the best subdivision.
v. If the best improvement does not reach the threshold, then stop this branch.
vi. Otherwise, accept the subdivision and (recursively) repeat steps iii to vi for the two resulting segments.
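The recursive steps iii–vi above can be sketched in Python. This is a simplified illustration, not the authors' implementation: it uses the SE-based β of Eq. (6) throughout (the cross-validated PRESS variant is omitted), and `top_down` and `_se` are hypothetical names. The input is assumed already preprocessed.

```python
import numpy as np

def _se(X2, R):
    # non-explained variance (SE) of an R-PC model fitted on X2 itself
    Xc = X2 - X2.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:R].T
    return float(((Xc - (Xc @ P) @ P.T) ** 2).sum())

def top_down(X3, R, minlength, threshold, offset=0):
    """Top-down step of MPPCA (steps iii-vi), a simplified sketch.

    X3: preprocessed (I, J, K) sub-array; R: number of PCs; minlength:
    minimum phase length in time instants; threshold: minimum percentage
    of SE reduction to accept a subdivision. Returns the accepted cut
    points as absolute time indices.
    """
    I, J, K = X3.shape
    unfold = lambda A: A.reshape(I, -1)       # batch-wise unfolding (Eq. (1))
    se_parent = _se(unfold(X3), R)
    if se_parent < 1e-9:                      # nothing left to explain
        return []
    best_beta, best_k = -np.inf, None
    for k in range(minlength, K - minlength + 1):   # step iv
        se_split = _se(unfold(X3[:, :, :k]), R) + _se(unfold(X3[:, :, k:]), R)
        beta = 100.0 * (1.0 - se_split / se_parent)  # Eq. (6)
        if beta > best_beta:
            best_beta, best_k = beta, k
    if best_k is None or best_beta < threshold:      # step v
        return []
    return (top_down(X3[:, :, :best_k], R, minlength, threshold, offset)
            + [offset + best_k]                      # step vi: recurse
            + top_down(X3[:, :, best_k:], R, minlength, threshold,
                       offset + best_k))
```

On a two-phase synthetic data set the recursion finds the single true cut and then stops, because both resulting segments are fully explained by one PC.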

In Fig. 5, a diagram depicting the computation of β^k is shown. In each recursion, this computation is done for every time instant which generates two segments of length higher than the minimum. Note that the preprocessing and unfolding can be done outside the recursive part of the algorithm, because the result of dividing before or after preprocessing and unfolding is the same.

If the number of phases obtained in the top–down step is higher than the maximum established (maxnumber), then the down–top step is in charge of deleting some of the subdivisions. To be deleted, a subdivision has to fulfil two conditions:

• To be a leaf of the generated subdivision tree, i.e. the segments obtained in the subdivision were not divided by the top–down procedure or, if they were, the down–top procedure has joined them again.


Fig. 6. Graphical example of Fig. 4 after one subdivision has been removed by the down–top procedure.


• Its improvement in terms of total explained variance is the lowest among the subdivisions which fulfil the first condition.

The total variance explained by the multi-phase model is computed each time according to the partition formed by the subdivisions which are not a leaf of the tree plus one of the leaves. The leaf whose inclusion in the multi-phase model yields the lowest explained variance is erased and that subdivision discarded. Thus, for instance, if the example of Fig. 4 corresponds to the input of the down–top procedure, the leaves of the subdivision tree are those for k=31 and for k=14. Imagine the multi-phase model generated including the first subdivision had higher explained variance than the one including the second subdivision. Then, subdivision k=14 is erased (the result appears in Fig. 6).
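Under the simplifying assumption that every cut is removable (the paper restricts removals to leaves of the subdivision tree), the down–top pruning can be sketched as follows; `down_top`, `partition_se` and `_se` are hypothetical names.

```python
import numpy as np

def _se(X2, R):
    # non-explained variance of an R-PC model fitted on X2
    Xc = X2 - X2.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:R].T
    return float(((Xc - (Xc @ P) @ P.T) ** 2).sum())

def partition_se(X3, cuts, R):
    # total SE of the multi-phase model defined by the cut points
    I, _, K = X3.shape
    edges = [0] + sorted(cuts) + [K]
    return sum(_se(X3[:, :, a:b].reshape(I, -1), R)
               for a, b in zip(edges, edges[1:]))

def down_top(X3, cuts, R, maxnumber):
    """Down-top step, simplified: while the partition has more than
    maxnumber phases, drop the cut whose removal costs the least
    explained variance. (The paper removes only leaves of the
    subdivision tree; this sketch considers every cut.)"""
    cuts = sorted(cuts)
    while len(cuts) + 1 > maxnumber:
        se_without = [partition_se(X3, cuts[:i] + cuts[i + 1:], R)
                      for i in range(len(cuts))]
        cuts.pop(int(np.argmin(se_without)))
    return cuts
```

Because removing a cut can only increase the partition's SE, the loop greedily keeps the cuts that matter most for explained variance.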

Thus, MPPCA is a data-driven algorithm with three intuitive parameters which can be deactivated whenever wanted. Moreover, the parameters are independent from the process. Therefore, general rules of thumb to estimate their values, or more sound tuning strategies, can be defined.

4. Data sets

Two data sets have been used for the experiments shown in this paper. The first one corresponds to data from the nylon 6′6 polymerization process. This process has five stages. In Kosanovich et al. [2], the use of two sub-models was proposed. Details of the process can be read in the cited paper. The second one is the etch process used in Wise et al. [8], where the processing has six stages. Data from this process belong to different operation points. As we are not going to use an adaptive method, only the first 30 batches, which belong to the same operation point, are used.

Data from the first set are not aligned, but each measurement vector includes information about the stage it belongs to. Alignment is done by linear interpolation from this information. Investigating the data, 13 batches were catalogued as outliers and eliminated. The resulting data matrix contains 37 batches, 9 variables and 116 time instants. The second data set is aligned and includes 30 batches, 12 variables and 80 time instants.

5. Experimental results

In this section, the capability of MPPCA to distinguish between single-phase processes and multi-phase processes is tested. Then a comparative study with other multi-model proposals is carried out.

5.1. Discriminative capability between single-phase and multi-phase processes

Initially, to detect the single-phase or multi-phase nature of a process, it is recommended to analyze the data with 1 PC and no maximum number of phases (maxnumber is set to the batch length K). With 1 PC it is easier to recognize segments of a batch where the direction of maximum variance is not correctly modelled. No maximum number of phases is used in order to observe the top–down procedure output, which is more related to the nature of the data.

Fig. 7 shows the result of the MPPCA of the polymerization process with different threshold (from 1% to 20% of non-explained variance reduction) and minlength values (1/3, 1/4, 1/5, 1/6, 1/7 and 1/8 of the entire batch). The prediction power

Fig. 7. MPPCA results with 1 PC and maximum number of phases deactivated. Polymerization process. (a) Prediction power; (b) Number of phases.


Fig. 8. MPPCA results with 1 PC and maximum number of phases deactivated. Etch process. (a) Prediction power; (b) Number of phases.


(Fig. 7(a)) is evaluated using the goodness of prediction (Q², Eq. (7)),

$$Q^2 = 1 - \frac{\mathrm{PRESS}_c}{\mathrm{SS}} \qquad (7)$$

where SS is the sum of squares of the data set. Fig. 7(b) shows the number of phases detected by the algorithm for each pair of parameter values (remember maxnumber is set to K).
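Eq. (7) requires a cross-validated PRESS. A minimal sketch using leave-one-batch-out cross-validation, where the left-out batch is reconstructed from its projection onto the loadings fitted without it: this is one of several PCA cross-validation variants, not necessarily the one used by the authors, and `q2_loo` is a hypothetical name.

```python
import numpy as np

def q2_loo(X2, R):
    """Goodness of prediction Q2 (Eq. (7)) by leave-one-batch-out
    cross-validation. X2 is the preprocessed (I x JK) matrix."""
    I = X2.shape[0]
    press = 0.0
    for i in range(I):
        Xtr = np.delete(X2, i, axis=0)          # fit without batch i
        mu = Xtr.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xtr - mu, full_matrices=False)
        P = Vt[:R].T
        xi = X2[i] - mu
        # reconstruct the left-out batch from its scores
        press += float(((xi - P @ (P.T @ xi)) ** 2).sum())
    ss = float(((X2 - X2.mean(axis=0)) ** 2).sum())
    return 1.0 - press / ss
```

Data with a strong one-component latent structure score close to 1, while unstructured noise scores much lower, which is the contrast the comparison in Fig. 7(a) relies on.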

With the pair of figures it is easy to compare the multi-phase model with the unfold-PCA model. The Q² value of the latter can be seen in Fig. 7(a) for the same parameters that do not find more than one phase in Fig. 7(b) (threshold values from 11% to 20%). A threshold equal to 6% or 7% seems to be convenient for the analysis with MPPCA, because with a higher threshold some phases which add prediction power are not found, and with a lower threshold some of the phases included add nearly no prediction power. A threshold equal to 8%, 9% or 10% could be selected too.

No matter which of these threshold values is selected, the multi-phase model improves the prediction power. For a threshold of 7%, the improvement rises to more than 320%. For a threshold of 10%, it approaches 270%.

[Fig. 9. Post-processing of an example of a Sub-PCA partition of the polymerization process. (a) Sub-PCA output: clusters (1–4) vs. batch time (0–100).]

In Fig. 8, results for the etch process are shown. These figures are very different from those mentioned before. Phases (Fig. 8(b)) are only found with a threshold of 1%. Moreover, the three or four phases found do not add enough prediction power to be justified (see Fig. 8(a) for a threshold of 1%). The conclusion is that the etch process is single-phase.

Note that any of the threshold values proposed for the other process works for this process too, since the correct output is a single phase.

5.2. Prediction power

In this section, a comparative study in terms of prediction power is performed. The data set used is that from the multi-phase process, i.e. the polymerization one. Four multi-model methods are tested against unfold-PCA and MPPCA: splitting the batch into regular intervals; splitting into regular intervals and modelling from the beginning of the batch, as proposed by Louwerse and Smilde [11]; splitting according to the known stages of the process, as proposed by Ündey and Çinar [13]; and Sub-PCA [12].

[Fig. 9(b). Post-processing output: phases (1–4) vs. batch time (0–100).]



Some remarks about Sub-PCA have to be made. Firstly, only the partition calculated with Sub-PCA is going to be used, not the modelling method, which is specific to on-line monitoring and would give worse results (it uses an average of the models of the instants included in a cluster). Secondly, as Sub-PCA uses a variation of k-means and does not include the time information explicitly, its output has to be post-processed to obtain the phases. The authors do not explain how to do this automatically. Here, post-processing was achieved by dividing the clusters into phases (where all observations are consecutive), detecting short phases (using the same minimum-length parameter used for deleting clusters in the k-means algorithm) and merging these with the nearest contiguous phase, using the same distance defined for the k-means. Fig. 9 shows an example of a partition with Sub-PCA and the post-processing effect. Fig. 9(a) displays the output of Sub-PCA. The clusters include small groups of consecutive time instants. Fig. 9(b) shows the output of the post-processing, where there are no phases smaller than the minimum used (equal to 10). Note that cluster 1 corresponds to phase 2 and vice versa.
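The post-processing just described can be sketched as follows. This is a hedged simplification: where the text merges a short phase using the k-means distance, this sketch absorbs it into the longer contiguous neighbour (a substitute criterion, since the cluster centroids are not available here), and the function name is illustrative.

```python
def labels_to_phases(labels, min_length):
    """Split a time-ordered cluster labelling into contiguous phases and
    merge phases shorter than min_length into a neighbouring phase."""
    # 1. Split into runs of consecutive identical labels (the raw phases)
    phases = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            phases.append([start, t, labels[start]])  # [begin, end, label]
            start = t
    # 2. Repeatedly absorb the shortest under-length phase
    while len(phases) > 1:
        lengths = [end - beg for beg, end, _ in phases]
        i = min(range(len(phases)), key=lambda k: lengths[k])
        if lengths[i] >= min_length:
            break
        # Simplified merge criterion: the longer contiguous neighbour wins
        # (the paper uses the k-means distance instead).
        if i == 0:
            j = 1
        elif i == len(phases) - 1:
            j = i - 1
        else:
            j = i - 1 if lengths[i - 1] >= lengths[i + 1] else i + 1
        lo, hi = sorted((i, j))
        merged = [phases[lo][0], phases[hi][1], phases[j][2]]
        phases[lo:hi + 1] = [merged]
    return [(beg, end) for beg, end, _ in phases]
```

For example, a short 4-sample run of cluster 2 sandwiched between two 15-sample runs is absorbed by one of its neighbours, so no phase shorter than the minimum length survives.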

[Fig. 10. Comparative of multi-model methods. Polymerization process. PRESS (×10^4) vs. number of principal components for unfold-PCA, regular partition, Sub-PCA, MPPCA and Louwerse & Smilde. Panels: (a) 2 sub-models, (c) 4 sub-models.]

To compare the different proposals, the multi-models generated with 2, 3, 4 and 5 sub-models have been tested against the unfold-PCA model (Fig. 10). The minimum length of phase was set to 9 because the shortest stage of the process includes 9 observations. The MPPCA threshold used was equal to 7%. The parameters for Sub-PCA were obtained after extensive experimentation, choosing the best results for 1 PC in each of the first three cases. No partition into 5 phases was found with Sub-PCA while maintaining a minimum length of 9, so this method is not shown in the last comparative graphic (Fig. 10(d)). The proposal of Ündey and Çinar only appears in the last comparative because the process has 5 stages.

Again, the tests show the multi-phase nature of the polymerization process. The unfold-PCA model gets by far the poorest performance. The proposal of Louwerse and Smilde obtains better outcomes, but it does not improve the performance as much as the other proposals do.

The method of subdividing according to the stages of the batch (Ündey and Çinar) and Sub-PCA are beaten by the regular partition. The latter method obtained remarkably good outcomes.

[Fig. 10 (continued). Panels: (b) 3 sub-models, (d) 5 sub-models. PRESS (×10^4) vs. number of principal components for the same five methods.]

A good model of a short segment only improves the predictive power of that segment, and so it does not improve the overall prediction power as much as modelling longer segments does. Dividing into regular intervals does not generate any short segment. Since the same number of sub-models is used for all methods in each of the comparatives, a partition has to correspond accurately with the nature of the process to perform better than this method. Of course, the regular partition cannot be used to model a process if it is not known whether the process has a multi-phase nature and how many phases should be used.

MPPCA provides the best outcomes in most of the cases with the first PCs. Remember that these PCs explain much of the information in the data. When much of the residual variance is noise (which occurs for higher numbers of PCs), the MPPCA output relies on that noise. Since in process monitoring only the first PCs are used, we can conclude that MPPCA obtains the highest prediction power by adjusting the model to the nature of the process.

6. Conclusion

The use of multiple models has been proposed before for many batch processes but, as far as the authors know, no attempt has been made to automatically generate a multi-model from the data to improve the modelling. In this paper, it has been shown that the model of some batch processes can be improved not only by adding principal components, but also by dividing the time mode into segments which are well approximated by a linear model: the phases.

Any subdivision implies an enlargement of the monitoring information, which should include only what is indispensable, and a loss of the relationships between variables in the model. Thus, a subdivision should be sufficiently justified.

The MPPCA framework has been proposed to analyze batchprocesses. This framework has many advantages:

• The threshold defined evaluates the degree of justification of a subdivision and so adjusts the result to the nature of the process.

• The algorithm makes the most of the repetitive nature of the process, using the time information implicitly. Thus, no post-processing is necessary unless the designer wants a reduction in the number of sub-models (bottom–up step of the algorithm).

• The parameters are intuitive and have been defined to be independent of the specific process. The analysis can be done with almost no process knowledge, only common sense, so no exhaustive experimentation is needed to apply the method to a process. The only parameter whose tuning could be more challenging is the improvement threshold. A strategy to set this parameter has been given, and values around 10% should be appropriate.

• MPPCA provides the best performance in terms of prediction power. This reflects its good adjustment to the nature of the process. Thus, the degree to which the model improves knowledge of the process and its faults is higher.

This paper opens some interesting future research lines:

• The design of other algorithms within the MPPCAframework, to improve its performance.

• The design of new statistical indexes, or variations of those already proposed, in order to weigh the convenience of subdividing a model together with that of adding a new PC.

• The use of fuzzy transitions between phases or Gaussian mixture models, which could improve the modelling of the non-linear dynamics and be more robust to alignment errors.

Acknowledgements

This work has been partially funded by the FPU grants program, Secretaría de Estado de Educación y Universidades (Ministry of Education and Science, Spain). The authors wish to thank Kenneth S. Dahl (DuPont) and Neal B. Gallagher (Eigenvector), who provided the experimental data, Lu Ningyun for his clarifications concerning Sub-PCA, and Ricard Boqué for his useful comments.

Appendix A

The sense in which one should understand some of the terms used in this paper is described in the following definitions:

Segment: time interval contained in the aligned batch duration.

Stage: segment corresponding to one unit operation.

Phase: (a) segment well approximated by a linear model; (b) segment included in the output of any of the multi-phase algorithms (MPPCA, Sub-PCA).

Partition: the division of the batch duration into disjoint segments.

Subdivision: the division of a segment in two.

Sub-model: PCA model which models the data of a sub-matrix corresponding to a segment.

Multi-model: the complete model of a batch process represented by several sub-models. Here the distinction is made with a piece-wise model, where the segments corresponding to the sub-models have to be disjoint.

Piece-wise model: the complete model of a batch process represented by the sub-models corresponding to a partition.

Multi-stage model: piece-wise model whose corresponding segments are stages.

Multi-phase model: piece-wise model whose corresponding segments are phases.

References

[1] P. Nomikos, J. MacGregor, Multivariate SPC charts for monitoring batch processes, Technometrics 37 (1) (1995) 41–59.

[2] K. Kosanovich, K. Dahl, M. Piovoso, Improved process understanding using multiway principal component analysis, Engineering Chemical Research 35 (1996) 138–146.

[3] T. Kourti, Process analysis and abnormal situation detection: from theory to practice, IEEE Control Systems Magazine 22 (5) (2002) 10–25.

[4] S. Wold, P. Geladi, K. Esbensen, J. Ohman, Multi-way principal components and PLS analysis, Journal of Chemometrics 1 (1987) 41–56.

[5] P. Nomikos, J. MacGregor, Monitoring batch processes using multiway principal components analysis, AIChE Journal 40 (8) (1994) 1361–1375.

[6] P. Nomikos, J. MacGregor, Multi-way partial least squares in monitoring batch processes, Chemometrics and Intelligent Laboratory Systems 30 (1995) 97–108.

[7] J. Westerhuis, T. Kourti, J. MacGregor, Comparing alternative approaches for multivariate statistical analysis of batch process data, Journal of Chemometrics 13 (1999) 397–413.

[8] B. Wise, N. Gallagher, S. Butler, D.W. Jr., G. Barna, A comparison of principal component analysis, multiway principal component analysis, trilinear decomposition and parallel factor analysis for fault detection in a semiconductor etch process, Journal of Chemometrics 13 (1999) 379–396.

[9] J. Westerhuis, T. Kourti, J. MacGregor, Analysis of multiblock and hierarchical PCA and PLS models, Journal of Chemometrics 12 (1998) 301–321.

[10] B. Lennox, H. Hiden, G. Montague, G. Kornfeld, P. Goulding, Application of multivariate statistical process control to batch operations, Computers and Chemical Engineering 24 (2000) 291–296.

[11] D. Louwerse, A. Smilde, Multivariate statistical process control of batch processes based on three-way models, Chemical Engineering Science 55 (2000) 1225–1235.

[12] N. Lu, F. Gao, F. Wang, Sub-PCA modeling and on-line monitoring strategy for batch processes, AIChE Journal 50 (1) (2004) 255–259.

[13] C. Ündey, A. Çinar, Statistical monitoring of multistage, multiphase batch processes, IEEE Control Systems Magazine 22 (5) (2002) 40–52.

[14] C. Ündey, S. Ertunç, A. Çinar, Online batch/fed-batch process performance monitoring, quality prediction, and variable-contribution analysis for diagnosis, Industrial and Engineering Chemical Research 42 (2003) 4645–4658.

[15] S. Wold, N. Kettaneh, H. Friden, A. Holmberg, Modelling and diagnostics of batch processes and analogous kinetic experiments, Chemometrics and Intelligent Laboratory Systems 44 (1998) 331–340.

[16] J. Camacho, J. Picó, Monitorización de procesos por lotes a través de PCA multifase [Batch process monitoring through multi-phase PCA], submitted for publication to Revista Iberoamericana de Automática e Informática Industrial.

[17] S. Wold, Cross-validatory estimation of the number of components in factor and principal components models, Technometrics 20 (4) (1978) 397–405.

[18] H. Eastment, W. Krzanowski, Cross-validatory choice of the number of components from a principal component analysis, Technometrics 24 (1) (1982) 73–77.