Evaluation of prediction error based fuzzy model clustering approaches for multiple model learning
Vidyashankar Kuppuraj • Raghunathan Rengaswamy
Published online: 16 May 2012
© Indian Institute of Technology Madras 2012
Abstract Identifying multiple models from both static
and dynamic data is an important problem in several
engineering fields. Clustering based on Euclidean distance
measure has been proposed to solve this problem. How-
ever, since Euclidean distance is not directly related to
model fidelity, these approaches can lead to suboptimal
results even when the number of models is known. In this
work, through a three step algorithm that includes initial-
ization, prediction error based fuzzy clustering and model
rationalization, we evaluate the possibility of uncovering
multiple model structures from data. The three step algo-
rithm is also assessed for the identification of piecewise
auto regressive exogenous systems with unknown number
of models and their (unknown) orders. The basic approach
can be extended for trend analysis and generalized princi-
pal component analysis.
Keywords Multiple model learning (MML) · PWARX models · Fuzzy clustering
1 Introduction
In this paper, we discuss a fuzzy clustering based model
learning formulation for estimation of multiple models.
Good predictive models are imperative in the fields of Fault
Detection and Isolation (FDI), economics, home insurance,
image processing and supervisory control applications.
There are many instances where data are generated from
multiple models operating in intersecting/non-intersecting
partitions of the input space [5, 11, 14, 22, 24, 25, 28–30].
The focus of the paper is on these scenarios where the data
samples are the input (X) and output (Y) data generated by
multiple linear models. Let Ci denote the parameters of the
ith model. The data samples are composed of several input-
output tuples [Xij Yij] that are functionally related as below.
Y_{i,j} = C_i X_{i,j} + e_{i,j},   i = 1,…,N;  j = 1,…,M_i    (1)

where e_{i,j} is i.i.d. random noise, X_{i,j} ∈ R^n, Y_{i,j} ∈ R^m, and C_i ∈ R^{m×n}. N is the number of underlying models, M_i is the number of data points from the region described by model i, and M is the total number of points, with Σ_{i=1}^{N} M_i = M. Let us
consider three instances of the MML problem [11, 24].
Case 1 The number of models and data partitioning are
known. In these cases, the objective is to estimate model
parameters. These are cases where we know the partitions
a priori based on process knowledge. The model parame-
ters for each partition can be estimated using an ordinary
least squares (OLS) method.
Case 2 The number of models is known; however, data
partitioning is not known. In this case, the objective is to
partition the data and to estimate model parameters. The
data are partitioned into subsets equal to the number of
models. Model parameters are estimated for each of these
subsets. Various clustering techniques are employed in the
literature for data partitioning.
Case 3 Neither the partitions nor the number of models
are known. This is a challenging and difficult problem.
Under some restrictive assumptions, the existing tech-
niques solve this problem either by repetitively applying
their algorithm for the known number of models case or
through an optimization formulation. Optimization for-
mulations are computationally demanding.
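The data-generating process of Eq. 1 can be made concrete with a short sketch. This is our own illustrative code, not from the paper; the input region bounds and noise level are assumptions.

```python
import numpy as np

def generate_mml_data(models, counts, n, noise_std=0.1, seed=0):
    """Draw data from multiple linear models, Y_ij = C_i X_ij + e_ij (Eq. 1).

    models : list of (m x n) parameter matrices C_i, one per model
    counts : number of samples M_i drawn from each model's region
    n      : input dimension
    """
    rng = np.random.default_rng(seed)
    X, Y, labels = [], [], []
    for i, (C, M_i) in enumerate(zip(models, counts)):
        Xi = rng.uniform(-1.0, 1.0, size=(M_i, n))               # inputs from model i's region (assumed bounds)
        Ei = rng.normal(0.0, noise_std, size=(M_i, C.shape[0]))  # i.i.d. noise e_ij
        Y.append(Xi @ C.T + Ei)                                  # Y_ij = C_i X_ij + e_ij
        X.append(Xi)
        labels.append(np.full(M_i, i))                           # true model index, kept for evaluation
    return np.vstack(X), np.vstack(Y), np.concatenate(labels)
```

For instance, `generate_mml_data([np.array([[0.8]]), np.array([[-1.5]])], [60, 40], n=1)` produces a 60/40 split between two scalar models, mimicking the unequal-sampling scenarios studied later.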
V. Kuppuraj · R. Rengaswamy (✉)
Department of Chemical Engineering, Texas Tech University,
Lubbock, TX 79409, USA
e-mail: [email protected]
Int J Adv Eng Sci Appl Math (March–June 2012) 4(1–2):10–21
DOI 10.1007/s12572-012-0058-y IIT, Madras
In this paper, we evaluate a multiple model learning
approach that performs clustering based on prediction
error. This is similar to the work of Frigui and Krish-
napuram [12] where the residual is used as a distance
measure, which is a prediction error based approach. The
residuals are generated in a total least squares (TLS)
framework as discussed in the work of [12]. In our work,
we retain the fuzzy clustering framework and develop a
model clustering approach based on an ordinary least
squares (OLS) formulation. We evaluate such an approach
for model update, and also a line search approach for model
update, which seems to be more efficient at identifying the
underlying multiple model structure. Further, we develop
post-processing techniques for model rationalization that
enhance the ability of the approach to identify the
original system as closely as possible. We also evaluate the
efficacy of such a multiple model learning framework
through extensive simulation studies where the underlying
data is generated from known multiple models under dif-
ferent sampling situations; this is something that does not
seem to have been discussed much in the literature.
In the area of identification of piecewise auto regressive
exogenous (PWARX) models, surprisingly, most of the
clustering methods seem to have as a first step, a distance
based clustering for the identification of local data sets
(LDs). When our proposed approach is applied to the
PWARX problem, this step can be removed and the LDs
and the corresponding model parameters can be identified
in a single integrated step using the prediction error based
membership. We believe that this will remove many of the
problems related to the mixed LDs reported in the litera-
ture. Further, since the clustering objective is directly
related to the model fidelity, theoretical analysis of the
performance characteristics of the proposed algorithm
should be more accessible.
1.1 Literature review
Research in the literature has addressed both the static and
dynamic multiple model learning (MML) cases. The
problem of MML has been around for a long time. The
initial work went under the name of clusterwise linear
regression [6, 7, 15, 16, 27]. Spath [27] coined the term
clusterwise regression and developed an algorithm that
finds optimal feasible partitions and the regression coeffi-
cients for each of the clusters. A conditional maximum
likelihood estimation approach for clusterwise regression
was introduced by [6]. A simulated annealing based
approach for the same problem was proposed by [7].
Hennig [15] discusses different methods for clusterwise
regression, while in [16], the identifiability of the cluster-
wise regression parameters was studied. A similar problem
has been discussed as multiple model general linear
regression (MMGLR) problem in Frigui and Krishnapuram
[12]. While the focus of this paper was on robust clustering
using competitive agglomeration, MMGLR is discussed as
an exemplar problem for the robust competitive agglom-
eration (RCA) algorithm, where the standard c-means
distance measure is replaced by a prediction error based
measure.
In a similar vein, a multiple model learning (MML)
problem is posed by Cherkassky and Ma [5] and solved
assuming the existence of dominant model structures in the
data. A repetitive two-step algorithm is proposed to esti-
mate multiple models. In the first step, a major model is
estimated. In the second data partitioning step, data
belonging to the major model are segregated from the
residual (left-out) data. The residual data is assumed to be
generated by a dominant model. These two steps are
applied on data until a specified tolerance is met. A support
vector machine (SVM) based regression procedure is
employed for model parameter estimation. In this
approach, the existence of dominant model structures is
assumed and model estimation is iterative, one model is
estimated at a time. Dufrenois and Hamad [8] discuss an
extension to support vector regression (SVR) approach to
simultaneously extract multiple linear structures from data.
In this formulation, data are assigned fuzzy memberships to
multiple linear structures. Residuals—defined as maximum
model mismatches—are used to update memberships.
Updated memberships are used in recalculation of model
parameters using a dual SVR optimization formulation.
Except for providing dynamic upper bounds for the
Lagrange multipliers, memberships do not change the
standard SVR dual formulation. The number of models is
assumed to be known in this work. The authors suggest that
the question of unknown number of models may be
resolved by competitive agglomeration but do not provide
an algorithm. Fair sampling and known number of models
are assumptions made in this approach. Elfelly et al. [10]
propose a three step algorithm for development of multiple
models. In the first step, the optimal number of clusters are
determined using a rival penalized competitive learning
(RPCL). The cluster centers for these models are identified
using a fuzzy k-means algorithm using an Euclidean dis-
tance based membership. In the second step, parametric
model identification is performed based on the clustering
results. In the third step, these local models are combined
to form a global model.
In related literature, several papers address the multiple
model regression problem under the framework of fuzzy
regression analysis (see [4, 9, 20, 21, 26, 31] and the ref-
erences therein). The ideas are very similar to the MML
algorithms but the final model is presented in a fuzzy
regression framework. The basic idea of prediction error
based distance is included in the model update step in
different forms in the works of [20, 21, 26, 31]. However,
papers in this framework do not seem to be focused on
identifying the exact underlying model structure but rather
on building a fuzzy regression model that captures such
data adequately.
MML for dynamical systems are of interest in various
fields of engineering. The most popular of these problems
is the identification of PWARX models. While there have
been several techniques for PWARX identification such as:
modified expectation minimization (EM) [17], bounded
error procedure [2], Bayesian approach [18], algebraic
approaches [30], and mixed integer programming approach
[24], our focus is on the clustering based approaches.
Nakada et al. [22] proposed an algorithm to identify
PWARX models based on a statistical clustering technique.
In the proposed algorithm, with the assumptions of known
number of models and model order, the original data is
clustered using a Gaussian mixture model. The clustered
data is projected on to the regressor space. Support vector
classifiers are used to estimate the boundary hyperplanes
between neighboring clusters in the regressor space. The
model parameters in each cluster are identified using a least
squares method. Through repetitive application of their
identification procedure, [22] discussed a method to esti-
mate the number of models from data. The consistent
Akaike information criterion (CAIC) and minimum description length (MDL)
criteria are minimized to identify the unknown number of
models. The number of models is identified by iteration
and the EM clustering algorithm is strongly dependent on
initial values.
Ferrari-Trecate et al. [11] proposed a clustering based
algorithm that works under the assumption that the model
order and number of models are known. The first step in
this algorithm is the grouping of small data sets known as
local data sets (LDs). The LDs which consist of data from a
single model are termed pure LDs. In contrast, mixed LDs
include data from different models. The authors discuss
problems in dealing with mixed LDs. An important
assumption in this work is that the ratio of the number of
mixed to pure LDs is small. In other words, fair sampling is
assumed. A parameter vector is estimated for each LD
based on a least squares method. K-means clustering is
employed to group parameter vectors into disjoint subsets
equal to the assumed known number of models. Bijective
maps between parameter vectors and data are used to
classify the original data. The authors suggest that the
estimation of number of models may be performed using
clustering algorithms; however, this idea has not been
developed further. The knowledge of number of models is
a limitation in this technique and an assumption of fair
sampling is also made. In recent work, Gegundez et al. [13]
propose an initial distance based clustering, followed by
identification of initial models and lastly, identification of
the final models using competitive learning. Baptista et al.
[1] propose a split and merge algorithm, where initial
distance based clustering is followed by a least squares
identification of the submodel ARX parameters and finally,
similar models are grouped together using a split and
merge algorithm.
As seen from the literature survey, in the static case, the
idea of prediction error based multiple model learning has
been around for a while. The original work of Frigui and
Krishnapuram [12] discussed this but the focus was not a
detailed evaluation of such an approach for multiple model
learning. Further, many of the fuzzy regression approaches
also use a prediction error based distance measure; however,
their final goal is the development of an adequate fuzzy
regression model for the data. In summary, estimating the
unknown number of models and input partitions from finite
data is a difficult problem. The majority of existing
approaches work under assumptions about either known
number of models or restrictions on sampling of data
points. Violation of these assumptions could result in poor
multiple model estimation. Some of the existing approa-
ches iteratively estimate one model at a time using the
assumption of dominant model structures. Further, many of
the prior approaches use a Euclidean distance measure for
model identification. Since Euclidean distance based clus-
tering of data points, using input data or augmented input
and output data, is not directly related to the number of
models, these approaches may be suboptimal.
2 Multiple model learning formulation
In this work, a fuzzy model clustering (FMC) approach is
developed and evaluated for MML problems. The FMC
approach builds on the fuzzy C-means (FCM) clustering
approach. In view of this, the FCM clustering algorithm is
described next.
2.1 Fuzzy clustering
Clustering is a technique that is used to assign objects with
similar characteristics to groups. The similarity is measured
through an appropriate objective. As an example, cluster-
ing of multi-dimensional data with Euclidean distance
objective identifies data rich regions in the input space.
Clustering techniques can be classified as either hard or
soft clustering. Figure 1a depicts hard clustering. The three
clusters shown in the figure are separated by sharp
boundaries. Soft clustering is shown in Fig. 1b; the clusters
overlap with each other. The K-means algorithm is a hard
clustering technique, while FCM algorithm performs soft
clustering. In hard clustering, whenever a data point
belongs to a cluster it has a membership of one to that
cluster; else the membership is zero. In soft clustering
methods, the membership of a data point to a cluster is
between zero (no membership) and one (complete mem-
bership). This is termed as partial membership. The sum of
partial memberships of a data point to all clusters equals
one. The FMC algorithm proposed in this paper uses a
novel model clustering concept that is based on FCM
clustering. The standard FCM algorithm is discussed next.
Let X ∈ R^{n+m} denote the multi-dimensional data. Here we
assume that clustering is performed in the augmented
input-output space. The data contains M samples. N cluster
centers are initialized in the FCM algorithm. Let θ_i ∈ R^{n+m}
denote the ith cluster center. i and j are the indices used to
represent the cluster centers and data points, respectively. In
the FCM algorithm, a normalized membership of the jth
data point to the ith cluster center is computed by

μ_{ij} = 1 / Σ_{k=1}^{N} (d_{ij} / d_{kj})^{2/(q−1)}    (2)
In Eq. 2, d_{ij} is the Euclidean distance from the jth data point to
the ith cluster center. The parameter q is termed the fuzzifier.
The effect of the fuzzifier on the resulting cluster partitions
is described in [19]. q (greater than one) controls the
overlap in the cluster regions. If q is close to 1, the result is
hard clustering; as q → ∞, the clustering becomes totally
fuzzy. It can be easily verified that the sum of partial
memberships of any data point to all clusters is one:

Σ_{i=1}^{N} μ_{ij} = 1    (3)
The update equation for the cluster center is given by

θ_i = Σ_{j=1}^{M} μ_{ij}^q X_j / Σ_{j=1}^{M} μ_{ij}^q    (4)
It can be seen from Eq. 4 that data with high
membership to a cluster have a larger impact on the
cluster location. This leads to competitive agglomeration.
At the termination of the FCM algorithm, the cluster
centers which have low memberships for all data are
discarded. The FCM clustering algorithm is summarized in
Table 1.
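The FCM iteration of Eqs. 2-4 can be sketched as follows. This is an illustrative implementation, not the authors' code; the random membership initialization and the max-norm stopping test are assumptions.

```python
import numpy as np

def fcm(X, N, q=2.0, tol=1e-5, max_iter=300, seed=0):
    """Minimal fuzzy c-means sketch: alternate center update (Eq. 4)
    and membership update (Eq. 2) until memberships stop changing.

    X : (M, d) data matrix, N : number of cluster centers, q : fuzzifier.
    """
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    mu = rng.random((N, M))
    mu /= mu.sum(axis=0)                                  # memberships sum to 1 per point (Eq. 3)
    centers = None
    for _ in range(max_iter):
        w = mu ** q
        centers = (w @ X) / w.sum(axis=1, keepdims=True)  # Eq. 4: weighted mean of the data
        # Euclidean distance of every point to every center
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        # Eq. 2, rewritten as d^{-2/(q-1)} normalized over centers
        inv = d ** (-2.0 / (q - 1.0))
        mu_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(mu_new - mu).max() <= tol
        mu = mu_new
        if converged:
            break
    return centers, mu
```

On two well-separated clouds of points, the two returned centers migrate to the two data-dense regions, exactly the behaviour the text describes for distance-based clustering.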
2.2 Prediction error based fuzzy model clustering
The fundamental idea behind the prediction error based
approaches is depicted in Fig. 2. In Fig. 2a, the different
symbols represent the data corresponding to the different
models. Figure 2a depicts the standard FCM algorithm
where the cluster centers (red circles) migrate towards data.
Fig. 1 Hard and soft clustering: (a) hard clustering, (b) soft clustering
Table 1 Fuzzy c-means algorithm

Choose the number of cluster centers and a tolerance tol
Initialize the membership values μ_{ij}, which are elements of the membership matrix μ
Let r be the iteration index, r = 1, 2, 3, …
Update the cluster centers: θ_i^r = Σ_{j=1}^{M} (μ_{ij}^{(r−1)})^q X_j / Σ_{j=1}^{M} (μ_{ij}^{(r−1)})^q
Update the membership values: μ_{ij}^r = 1 / Σ_{k=1}^{N} (d_{ij}/d_{kj})^{2/(q−1)}
Repeat the updates until ||μ^r − μ^{(r−1)}|| ≤ tol
Clustering is performed in the data space (that includes
inputs and outputs). This process is largely controlled by
the data density in the different regions of the data space.
Denser regions will attract more cluster centers. Notice that
this density has really no bearing on the models that are
applicable in these regions. In contrast, in the proposed
approach, conceptually depicted in Fig. 2b, the
different symbols now represent the parameter vectors for
the different models. Clustering is now performed with the
model parameters as the decision variables. If we assumed,
for example, that four different models partition the input
space in Fig. 2a, then there are four points in the model
space that need to be identified. The cluster centers (red
circles) are initialized in the model space and they migrate
towards the four points through the FMC algorithm. Notice
that if these points are exactly identified, then the question
of the number of models and the model parameters are
directly answered. The input space partition is also
answered implicitly through the membership of the data in
the input space to the cluster centers.
It is well-known that the standard FCM algorithm can be
viewed as a minimization of a membership weighted distance
with cluster centers in the data space as the decision variables.
J = Σ_{i=1}^{N} Σ_{j=1}^{M} μ_{ij}^q ||X_j − θ_i||₂²    (5)
|| · || represents the Euclidean norm. In the clustering process
as depicted in Fig. 2b, model parameters replace the cluster
centers as decision variables. Further, the weighted
distance metric is replaced by a weighted prediction error
in the objective function. The FMC formulation is inter-
preted as models migrating to accumulate data points
minimizing overall prediction error. This migration results
in simultaneous model parameter estimation and data
partitioning. The data assigned to a model characterizes the
input partition. The prediction error is computed by
PE_{ij} = ||Y_j − C_i X_j||₂    (6)

where Y_j ∈ R^m and X_j ∈ R^n are the given jth output and input data
points, respectively, and C_i ∈ R^{m×n} is the ith model. The product
C_i X_j is the predicted jth output. PE_{ij} is the
prediction error (for model C_i) between the jth measured and
predicted outputs. The membership based on prediction
error is computed by

μ_{ij} = 1 / Σ_{k=1}^{N} (PE_{ij} / PE_{kj})^{2/(q−1)}    (7)
The modified objective function for FMC is

J = (1/2) Σ_{i=1}^{N} Σ_{j=1}^{M} μ_{ij}^q ||Y_j − C_i X_j||₂²    (8)
The model update (r is the iteration counter) can be
performed through either the first-order necessary conditions
as in [12] (Algorithm I) or through a line search (Algorithm
II). If the first-order necessary conditions are used,

C_i^{r+1} = (Σ_{j=1}^{M} μ_{ij}^q Y_j X_j^T) (Σ_{j=1}^{M} μ_{ij}^q X_j X_j^T)^{−1}    (9)
Fig. 2 The idea behind the
proposed approach
If a line search is used,

C_i^{r+1} = C_i^r + α^r ∇g^r    (10)

where ∇g^r ∈ R^{m×n} (the gradient with respect to C_i^r ∈ R^{m×n}) is given by

∇g^r = −Σ_{j=1}^{M} μ_{ij}^q (Y_j − C_i^r X_j) X_j^T    (11)

and α^r, the step length, is given by

α^r = Σ_{j=1}^{M} Σ_{i=1}^{N} μ_{ij}^q (∇g^r X_j)^T (Y_j − C_i^r X_j) / Σ_{j=1}^{M} Σ_{i=1}^{N} μ_{ij}^q (∇g^r X_j)^T (∇g^r X_j)    (12)
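One iteration of Algorithm I, combining the prediction errors of Eq. 6, the memberships of Eq. 7, and the closed-form update of Eq. 9, can be sketched as follows. This is an illustrative reimplementation under assumed conventions (rows of X and Y are data points; a small constant guards against zero prediction error), not the authors' code.

```python
import numpy as np

def fmc_step(X, Y, C_list, q=2.0):
    """One FMC iteration (Algorithm I).

    X : (M, n) inputs, Y : (M, m) outputs, C_list : list of (m, n) model matrices.
    Returns the updated models and the (N, M) membership matrix.
    """
    N = len(C_list)
    # Eq. 6: prediction error of every model on every data point
    PE = np.array([np.linalg.norm(Y - X @ C.T, axis=1) for C in C_list]) + 1e-12
    # Eq. 7: prediction-error-based memberships, normalized over models
    inv = PE ** (-2.0 / (q - 1.0))
    mu = inv / inv.sum(axis=0, keepdims=True)
    # Eq. 9: membership-weighted least squares solve per model
    new_C = []
    for i in range(N):
        w = mu[i] ** q
        A = (Y.T * w) @ X          # sum_j mu_ij^q Y_j X_j^T  -> (m, n)
        B = (X.T * w) @ X          # sum_j mu_ij^q X_j X_j^T  -> (n, n)
        new_C.append(A @ np.linalg.inv(B))
    return new_C, mu
```

Iterating `fmc_step` on data generated by two distinct linear models drives each model matrix toward one of the true parameter sets, provided the initialization is not degenerate.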
In summary, it is clear that membership reflects model fit.
This results in models accumulating data that they predict well.
In the first model initialization step (Phase I), a large number of
models are initialized. An assumption here is that this is more
than the number of models that describe the underlying data. In
the second fuzzy clustering step (Phase II), the initial models
are updated till convergence. In the third model rationalization
step (Phase III), the models that do not predict the data well are
discarded. This is achieved by choosing a threshold. Further,
similar models are merged by defining an angle between
models and comparing this with a threshold for model
rationalization. The model angle is defined based on the
output, where there is the most deviation between the
parameter vectors for the two models. If for example, two
models are exactly the same, then the model angle will be zero,
in which case we would ideally want to merge the models. In
actual practice, this will never happen and the model
parameters will be slightly different for even very similar
models due to finite sampling and noise effects. This is taken
into account by comparing the model angle against a threshold.
This makes it possible to simultaneously identify the number
of models, estimate the model parameters, and identify the regions
corresponding to each model. The three phases of the FMC
algorithm are summarized in Table 2. Note that once the data
belonging to each model is classified, a technique such as the
one proposed in Rengaswamy and Venkatasubramanian [23]
can be used to describe this data in the input space.
3 Evaluation of PE-based FMC approaches
for estimation of static multiple linear regression
(SMLR) models
Three examples are presented in this section to evaluate the
prediction error based model clustering approach. For
purposes of comparison, the first and second examples are
from literature and results obtained by the FMC approach
are tabulated. The third example is a multi-input multi-
output (MIMO) MML problem. Three scenarios are dis-
cussed in this example. In the first scenario, every model
consists of equal number of data points. This scenario
evaluates a case where the assumption of dominant models
will be violated [5]. Models with unequal numbers of data
points, violating the fair sampling assumption, are considered
in the second scenario. In the third case, a majority of the data
is in the region close to the partitions which represents a
rather unfair sampling policy. This is a difficult example
for Euclidean distance based clustering approaches such as
the ones proposed in [11]. In all the results, C denotes the
original models that generated the data, E denotes the
models estimated by the proposed algorithm. Whenever
the estimated models are merged (when the angle between
them, θ < 5 deg) they are denoted by E. The models that
Table 2 Multiple model identification algorithm

Phase I: Model initialization
Let Y ∈ R^m and l be the index for components; R_l = Y_l^max − Y_l^min
R_l is partitioned into m_l partitions such that Π_l m_l = N (the number of models initialized)
The Y and X data are segregated into the N partitions identified in the previous step
Initial model parameters in each partition are calculated using the OLS technique

Phase II: Fuzzy clustering
Let r be the iteration index, r = 1, 2, 3, …
The prediction error is computed by PE_{ij} = ||Y_j − C_i X_j||₂
The membership values are updated by μ_{ij} = 1 / Σ_{k=1}^{N} (PE_{ij}/PE_{kj})^{2/(q−1)}
The models are updated by
  Algorithm I: C_i^{r+1} = (Σ_{j=1}^{M} μ_{ij}^q Y_j X_j^T) (Σ_{j=1}^{M} μ_{ij}^q X_j X_j^T)^{−1}
  Algorithm II: C_i^{r+1} = C_i^r + α^r ∇g_i^r, where ∇g_i^r = −Σ_{j=1}^{M} μ_{ij}^q (Y_j − C_i^r X_j) X_j^T and
  α^r = Σ_{j=1}^{M} Σ_{i=1}^{N} μ_{ij}^q (∇g_i^r X_j)^T (Y_j − C_i^r X_j) / Σ_{j=1}^{M} Σ_{i=1}^{N} μ_{ij}^q (∇g_i^r X_j)^T (∇g_i^r X_j)
The models and memberships are updated until the root mean square error (RMSE) satisfies
  sqrt(Σ_{j=1}^{M} PE_{bj}² / M) ≤ tol, where b denotes the best-fit model

Phase III: Model rationalization
The similarity between two models is calculated using a simple metric: given models C_i, C_j ∈ R^{m×n},
  define θ_k = cos⁻¹( (C_i^k · C_j^k) / (||C_i^k|| ||C_j^k||) ), where C_i^k represents the kth row of model C_i
The angle between two models, θ, is defined as max_k θ_k
If the angle between two models is less than a threshold (θ = 5 deg is the default), they are combined
The parameters of the merged model are computed using OLS on the data augmented from the merging models
Models that have no similarity to other models and also have less than X % of the data (X = 5 is the default) are discarded
The data from discarded models are reassigned to the models that fit them best
are not part of the true model set but are still retained after
the post processing step are denoted by O.
3.1 SMLR—Example 1
This example problem is taken from [5]. This problem
consists of three different models and the model informa-
tion is given in Table 3. There are totally 100 data points.
Gaussian white noise of standard deviation 0.1 is added to
the output Y. Cherkassky and Ma [5] formulated a multiple
model estimation problem based on the assumption that
majority of data points are from a single model. In accor-
dance with this assumption, the first model consists of
60 % data points. The second and third models consist of
30 and 10 % data points, respectively. This example is
solved using the proposed FMC algorithms. To start with,
six models (different from the number of models that
generated the data) are initialized. To study the effect of
fuzzifier q on the final result, we consider three different
values for q. The results obtained are shown in Table 4.
The data used in all three cases are the same. The FMC
algorithm I exactly found three models without misclassi-
fication of data points for fuzzifier values 2.5 and 2.0 and
overestimated a model for fuzzifier value 1.5. The FMC
algorithm II exactly found three models in all the three
cases and captured the data points belonging to these
models. The Euclidean distance between true and initial-
ized models for algorithm II are given in Table 5. The
initialized models E1 and E2 are close to true model C3
and they shared the data points of C3. Similarly, model E4
captured data points of C2 and models E5 and E6 captured
C1. The accumulation of data points at various iteration
steps are given in Table 6. Model E1 captured only 4 data
points but the angle between E1 and E2 is 3.32 deg. So, the
data points of the models E1 and E2 are augmented and
model parameters are estimated using OLS method. There
is no misclassification of data points in all the three cases.
There is a slight variation in model parameter values based
on the fuzzifier used in both the algorithms. This can be
attributed to the fuzzy weighting used in the model update
Eqs. 9 and 10. The fuzzy weighting depends on the value of
q. The model parameters in Table 4 are obtained using
OLS method by augmenting data points of the models,
whose intervening angle is less than 5 deg. Hence the
model parameters are identical. This suggests that FMC
algorithm II is insensitive to the value of q whereas FMC
algorithm I is slightly sensitive.
3.2 SMLR—Example 2
This example is taken from [5]. This example consists of
two different models and model information is given in
Table 7. Gaussian white noise of standard deviation 0.1 is

Table 3 Model information for SMLR—Example 1

Model number   Model
C1             0.8x + 2
C2             0.2x + 1
C3             −1.5x
Table 4 Results for SMLR—Example 1

Fuzzifier   FMC algorithm   Model number   Model           No. of points
2.5         I, II           E1             0.79, 2.01      60
                            E2             0.12, 1.02      30
                            E3             −1.25, −0.19    10
2.0         I, II           E1             0.79, 2.01      60
                            E2             0.12, 1.02      30
                            E3             −1.25, −0.19    10
1.5         I               E1             0.79, 2.01      58
                            E2             0.12, 1.03      28
                            E3             −1.17, −0.25    9
                            O1             3.63, −0.63     5
1.5         II              E1             0.79, 2.01      60
                            E2             0.12, 1.02      30
                            E3             −1.25, −0.19    10
Table 5 Euclidean distance between the initialized and original
models
Original model E1 E2 E3 E4 E5 E6
C1 2.98 2.88 2.15 1.19 0.31 0.16
C2 1.84 1.74 1.02 0.08 1.02 1.19
C3 0.46 0.51 1.50 1.92 2.80 3.01
Table 6 Convergence for q = 2.5 for FMC algorithm II in SMLR—
Example 1
Iteration Number E1 E2 E3 E4 E5 E6
Start 6 3 0 30 10 49
5 8 2 0 30 17 43
10 8 2 0 30 27 33
15 7 3 0 30 28 32
20 5 5 0 30 23 37
25 5 5 0 30 23 37
30 4 6 0 30 23 37
35 4 6 0 30 23 37
Final result (40) 4 6 0 30 23 37
added to the output. The first model consists of 60 % of the data and the second model consists of the remaining 40 %. Five models are initialized to solve this problem, and the results obtained for both FMC algorithms are detailed in Table 8. Two cases are considered in this example: in the first case, 100 data points are used, and in the second case 1000 data points are used. FMC algorithm I overestimated a model in the first case. The overestimated model O1 captured 9 data points of model E1, which is more than the threshold of 5 % of the data points, and hence this model is retained in the final result. The angle between the estimated models O1 and E1 is 6 deg and hence they cannot be combined. In the second case, FMC algorithm I identified two models. The model that accumulated less than 5 % of the data points is discarded, and its data points are reassigned to the models with minimum fit error. After this procedure only one data point is misclassified. The two original models are exactly identified by FMC algorithm II in both cases, with no misclassification of data points. The model parameters are almost the same for both cases. The number of data points does not seem to have much effect on the performance of FMC algorithm II, whereas FMC algorithm I overestimated a model when fewer data points are available. However, when noise levels are high, more data will be useful in estimating the multiple models.
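The reassignment step described above (moving the points of a discarded model to the surviving model with minimum fit error) can be sketched as follows; the function name and array shapes are illustrative assumptions:

```python
import numpy as np

def reassign_points(X, y, models):
    """Assign each data point to the candidate model with minimum
    squared fit error.

    X: (N, d) regressor matrix (last column may be the constant 1),
    y: (N,) measured outputs,
    models: list of (d,) parameter vectors theta, one per retained model.
    Returns an (N,) array of model indices.
    """
    # errors[i, j] = squared residual of point i under model j
    preds = np.column_stack([X @ theta for theta in models])
    errors = (preds - y[:, None]) ** 2
    return errors.argmin(axis=1)
```

For example, points generated by y = 2x are assigned to the model [2, 0] rather than [-1, 0] because their squared residual under the first model is smaller.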
3.3 SMLR—Example 3
In this example, we consider a MISO system. Three cases are considered. In the first case, the true model set contains four models and each model consists of an equal number of points. In the second case, two models consist of equal numbers of points and the remaining two models consist of unequal numbers of points. In the third case, the sampling is biased towards the partition regions. In all these cases, Gaussian white noise of standard deviation 0.1 is added to the output data.
3.3.1 Case 1
The input data for this case are shown in Fig. 3a. There are 1000 data points in total, and each model consists of 250 data points. Eight models are initialized. The final results obtained using the FMC algorithms are shown in Table 9. The proposed algorithms exactly identified the four models. In FMC algorithm I, two estimated models captured 14 and 9 data points, fewer than the threshold of 5 % of the total number of data points needed to be considered separate models. Hence their points are reassigned, after which there is no misclassification of data points. In FMC algorithm II, for model C3, four initialized models E2, E4, E6 and E8 are close; these models captured 80, 89, 38 and 43 data points respectively. The angles computed (as defined in Table 2) between model E2 and models E4, E6 and E8 are 1.5 deg, 2.7 deg and 4.05 deg respectively. As a result, the final model parameters for C3 are obtained by OLS estimation using all the data points accumulated by these models. All the data points are identified exactly to their correct partitions and hence there is no misclassification. The model parameters estimated are the same for both FMC algorithms.
3.3.2 Case 2
This case also contains 1000 data points, and the input data partition is shown in Fig. 3b. The first model has 210 data points and the second model has 490 data points. The third and fourth models have 150 data points each. Figure 3b shows the four different regions. In this case, unlike the previous one, the models are valid for regions with unequal areas.
Table 7 Model information for SMLR—Example 2

Model number   Model
C1             x1 + x2 + x3 + x4
C2             -x1 - x2 - x3 - x4 + 6
Table 8 Results for SMLR—Example 2

Total no.    FMC        Model    Model parameters                    No. of
of points    algorithm  number                                       points
100          I          E1       1.04   0.96   1.00   1.07  -0.06    51
                        E2       -1.02  -1.01  -1.03  -0.88  6.01    40
                        O1       1.62   0.95   1.42   0.85  -0.28     9
100          II         E1       1.06   1.02   1.02   1.08  -0.11    60
                        E2       -1.05  -1.04  -1.03  -0.91  6.05    40
1000         I          E1       0.99   0.99   1.00   0.98   0.01   599
                        E2       -0.99  -0.99  -1.00  -0.99  5.99   401
1000         II         E1       0.99   0.99   0.99   0.98   0.02   600
                        E2       -0.96  -1.00  -1.02  -0.98  5.98   400
The model information used is the same as in the first case.
FMC algorithms I and II identified the four models as
shown in Table 9. Only one data point is misclassified. The
estimated models are very close to the original models.
3.3.3 Case 3
This case differs from the previous one in that the sampling of data in each region is biased towards the partitions. References [8, 11] formulated the MML problem under the assumption of fair sampling in the input space. This example is considered to study the impact of unfair sampling on the FMC algorithms evaluated in this paper. Eight models are initialized. The estimated models are given in Table 9. In this case also, only one data point is misclassified.
4 Evaluation of PE-based FMC approaches
for estimation of PWARX models
In this section, we discuss the use of the proposed algorithm
in the identification of PWARX systems. As discussed in
the introduction, this idea of prediction error based clus-
tering for the identification of the LDs and the corre-
sponding ARX parameters in a single integrated step does
not seem to have been attempted in the literature. While the
same algorithm as discussed in Table 2 can be used, there
are some differences that need to be accounted for carefully.
First, in the model initialization step, ARX models with
different orders (ny and nu) are initialized. An assumption
here is that the true model structures are part of the ini-
tialized models. This makes it possible to identify the
number of models and also the orders of these models in
the proposed approach. Note however that models not in the
true set could also be present in the initialized model set.
Second, in the static case, the data used with every model
has the same dimension. However, since ARX models with
different orders are initialized for the PWARX case, dif-
ferent models will deal with data of different dimensions.
This needs to be handled carefully by stacking time data
corresponding to each model. This needs to be reflected in
the model update step. Finally, in the model rationalization
step, only models of the same order are merged. Whenever
the model orders are different, all of these models are
retained. Further, the piecewise affine systems in Rn are
separated by boundary hyperplanes. The boundary hyper-
planes between adjacent models are estimated using support
vector classifiers (SVC) described in [3].
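The order-dependent regressor stacking mentioned above can be sketched as follows. Each candidate ARX order (ny, nu) gets its own regressor matrix, so models of different orders each see data of the correct dimension; the helper below is an illustrative assumption, not code from the paper:

```python
import numpy as np

def arx_regressors(y, u, ny, nu):
    """Build the regressor matrix and target vector for an ARX(ny, nu) model
    Y_k = a1*Y_{k-1} + ... + a_ny*Y_{k-ny} + b1*U_{k-1} + ... + b_nu*U_{k-nu} + c.

    Rows start at k = max(ny, nu) so that every required lag exists; a
    trailing column of ones matches the affine PWARX regressor form
    X_k = [Y_{k-1} ... U_{k-1} ... 1]'.
    """
    k0 = max(ny, nu)
    rows = []
    for k in range(k0, len(y)):
        past_y = [y[k - i] for i in range(1, ny + 1)]
        past_u = [u[k - i] for i in range(1, nu + 1)]
        rows.append(past_y + past_u + [1.0])
    return np.array(rows), np.asarray(y[k0:])
```

Calling this once per initialized model order keeps the model update step consistent even when candidate models of different orders compete for the same time series.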
We now describe two examples taken from the literature and another synthetic example, and report the results obtained using the proposed algorithm. The estimated models are denoted by E. From here on, we show results only for FMC algorithm II.
4.1 PWARX—Example 1
This example is taken from [22]. The PWARX system consists of three models with Y_k, U_k ∈ R, ny (output order) = 1 and nu (input order) = 1. The model parameters and the partitions are given below.
Y_k =
  [-0.4   1    1.5] X_k + E_k     if X_k ∈ χ1 = {X_k : [4  -1  10] X_k < 0}
  [0.5   -1   -0.5] X_k + E_k     if X_k ∈ χ2 = {X_k : [-4  1  10; 5  1  -6] X_k ≥ 0}
  [-0.3   0.5 -1.7] X_k + E_k     if X_k ∈ χ3 = {X_k : [-5  -1  6] X_k < 0}

where X_k = [Y_{k-1}  U_{k-1}  1]' and the inequality defining χ2 holds row-wise.
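A sketch of simulating this system to generate identification data is given below. The mode tests follow the stated partitions of X_k; the random seed, the order of the region checks, and the fallback to the middle mode are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pwarx1(n=200, y0=0.0):
    """Simulate the three-mode PWARX system above.

    u_k is uniform on [-5, 5]; uniform noise on [-0.1, 0.1] is added to
    the output, as described in the text. Region chi1 and chi3 are
    checked explicitly; remaining points fall to chi2.
    """
    u = rng.uniform(-5.0, 5.0, n)
    y = np.empty(n)
    yprev = y0
    for k in range(n):
        x = np.array([yprev, u[k], 1.0])        # X_k = [y_{k-1}, u_{k-1}, 1]
        if np.dot([4.0, -1.0, 10.0], x) < 0:
            theta = [-0.4, 1.0, 1.5]            # mode 1
        elif np.dot([-5.0, -1.0, 6.0], x) < 0:
            theta = [-0.3, 0.5, -1.7]           # mode 3
        else:
            theta = [0.5, -1.0, -0.5]           # mode 2
        y[k] = np.dot(theta, x) + rng.uniform(-0.1, 0.1)
        yprev = y[k]
    return u, y
```

Since every mode has |a| < 1 on Y_{k-1} and the input is bounded, the simulated output stays bounded, matching the data ranges reported for this example.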
Table 9 Results for FMC algorithms I and II for SMLR—Example 3

Model  Original   Case 1               Case 2               Case 3
       model      Estimated    Points  Estimated    Points  Estimated    Points
E1     3    2     3.01   2.03  250     3.03   1.99  209     3.00   2.08  209
       3    4     3.07   3.93          2.95   4.03          2.99   3.95
       5    7     4.93   7.06          4.97   6.98          4.99   6.94
E2     -1   2     -0.98  2.00  250     -0.98  2.00  490     -0.97  1.99  491
       0   -1     0.01  -1.00          0.00  -0.99          0.01  -1.00
       1    3     0.97   3.02          1.01   2.99          0.99   3.01
E3     9   10     8.97  10.08  250     8.98  10.09  151     9.00  10.02  150
       12   0     12.00 -0.03          12.00 -0.01          11.99  0.01
       1    5     1.06   4.88          1.01   4.99          0.97   5.10
E4     10  13     10.00 12.99  250     10.02 12.97  150     10.07 12.90  150
       0    4     -0.02  4.01          -0.13  4.14          -0.05  4.05
       2    4     1.97   4.03          2.06   3.92          2.00   3.99
200 data points for u_k are generated from a uniform distribution in the interval [-5, 5]. To this data, noise from a uniform distribution in the interval [-0.1, 0.1] is added. 57, 71 and 72 data points respectively are generated for models C1, C2 and C3. PWARX models are identified from this data using the FMC algorithm initialized with five models.
The results obtained are tabulated in Table 10. Two esti-
mated models which accumulated less than 5 % of data
points are discarded. SVC is used to estimate boundary
hyperplanes and the estimated boundary hyperplanes are
given in Table 11. The data points belonging to discarded
models are reassigned to their appropriate models based on
the hyperplane partition. In the final result, only one data
point is misclassified. The data partition in the regressor
space is shown in Fig. 4.
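The boundary-hyperplane estimation step can be sketched as below. A simple perceptron is used here purely as a stand-in for the support vector classifier of [3] that the paper actually uses; function name and learning parameters are our assumptions:

```python
import numpy as np

def boundary_hyperplane(X1, X2, lr=0.1, epochs=200):
    """Estimate a separating hyperplane w·x + b = 0 between the regressor
    sets of two adjacent models (X1 on the positive side, X2 negative).

    Perceptron updates converge for linearly separable regressor sets;
    an SVC as in [3] would instead maximize the margin.
    """
    X = np.vstack([X1, X2])
    t = np.hstack([np.ones(len(X1)), -np.ones(len(X2))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, t):
            if ti * (xi @ w + b) <= 0:   # misclassified point
                w += lr * ti * xi
                b += lr * ti
    return w, b
```

Points of discarded models can then be routed to the side of the estimated hyperplane they fall on, as done in the final reassignment step above.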
4.2 PWARX—Example 2
This example is taken from [22]. The system is characterized by a SISO Hammerstein model and consists of three models with Y_k, U_k ∈ R, ny = 2 and nu = 1. The PWARX formulation of the SISO Hammerstein system taken from [22] is given below.
Y_k =
  [-a1  -a2  0   b1·umax] X_k + E_k   if X_k ∈ χ1 = {X_k : [0  0  1  -umax] X_k > 0}
  [-a1  -a2  b1  0] X_k + E_k         if X_k ∈ χ2 = {X_k : [0  0  -1  umax; 0  0  1  -umin] X_k ≥ 0}
  [-a1  -a2  0   b1·umin] X_k + E_k   if X_k ∈ χ3 = {X_k : [0  0  -1  umin] X_k < 0}

where X_k = [Y_{k-1}  Y_{k-2}  U_{k-1}  1]' and the inequality defining χ2 holds row-wise.
The parameter values are a1 = 0.5, a2 = 0.1, b1 = 1, umin = -1, and umax = 2. There are 250 data points. The input data is generated from a normal distribution with standard deviation 2. To the measurement data,
Fig. 3 Input data for SMLR—Example 3
Table 10 Estimated model parameters for PWARX—Example 1

Model number  Model                  No. of points
E5            -0.40   1.00   1.53    56
E3            0.52   -1.00  -0.49    72
E2            -0.30   0.50  -1.72    72
Table 11 Estimated hyperplanes for PWARX—Example 1

Hyperplane  True hyperplane         Estimated hyperplane
h12         [0.40   -0.10   1.00]   [0.40   -0.10   1.00]
h23         [-0.83  -0.16   1.00]   [-0.85  -0.14   1.00]
noise generated from a normal distribution with standard
deviation 0.02 is added. The number of data points for
models C1, C2 and C3 are 42, 132 and 76 respectively. The
FMC algorithm is initialized with six models. The results
obtained are tabulated in Table 12. The FMC algorithm
identified the three models. The models that accumulated
less than 5 % of data points are discarded. These data
points are reassigned to the three identified models. Three
data points are misclassified. The boundary hyperplanes
estimated using the SVC method are tabulated in Table 13.
4.3 PWARX—Example 3
This is a synthetic example that we consider to demonstrate the efficacy of the FMC algorithm for the case of an unknown number of models with unknown orders. The input is generated from a uniform distribution in [-10, 10]. The input space is partitioned into four regions, each governed by a model of a different order. The model orders and the parameters for this system are given below.
Y_k =
  [-0.7  0.7] [U_{k-1}  1]' + E_k                              if U_{k-1} ∈ [-10, -4]
  [-0.4  -0.7  -0.3] [Y_{k-1}  U_{k-1}  1]' + E_k              if U_{k-1} ∈ [-4, 0]
  [0.6  -0.2  -0.3  0.1] [Y_{k-1}  U_{k-1}  U_{k-2}  1]' + E_k if U_{k-1} ∈ [0, 6]
  [0.3  -0.7  0.5  0.5] [Y_{k-1}  Y_{k-2}  U_{k-1}  1]' + E_k  if U_{k-1} ∈ [6, 10]
To the output Y_k, noise generated from a uniform distribution in [-0.1, 0.1] is added. 500 data points are generated. The numbers of data points for models C1, C2, C3 and C4 are 128, 111, 92 and 169 respectively. The FMC algorithm is initialized with six models. Four of the six models are of the true model class,
while the other two are not. The model parameters obtained
by the proposed algorithm are tabulated in Table 14 and
the estimated hyperplanes are given in Table 15. It can be
seen that the proposed algorithm performs remarkably well
in identifying the unknown number of models and their
unknown orders.
5 Conclusions
This paper evaluates one possible prediction error formu-
lation for the MML problem. The basic approach relies on
Fig. 4 Results for PWARX—
Example 1
Table 12 Estimated model parameters for PWARX—Example 2

Model number  Model                          No. of points
E6            -0.50  -0.10   0.00   2.00     43
E4            -0.50  -0.10   1.00   0.00     131
E1            -0.50  -0.10   0.00  -1.00     76
Table 13 Estimated hyperplanes for PWARX—Example 2

Hyperplane  True hyperplane              Estimated hyperplane
h12         [0.00   0.00  -0.50   1.00]  [-0.01  -0.01  -0.50   1.00]
h23         [0.00   0.00   1.00   1.00]  [-0.04   0.04   0.98   1.00]
Table 14 Estimated model parameters for PWARX—Example 3

Model number  Model                         No. of points
E1            -0.70   0.71                  124
E2            0.40   -0.70  -0.27           115
E4            0.60   -0.24  -0.30   0.41    164
E5            0.30   -0.70   0.50   0.49    97
a fuzzy model clustering paradigm. Multiple models are expected to be estimated without knowledge of the number of underlying models or of how these models partition the data. This ties in with the goals of any solution to the MML problem, which are: to identify the number of models, estimate the model parameters, and partition the data corresponding to each model. We evaluate an approach in which models migrate to accumulate data through a prediction error based membership. Through a post-processing step, the models that accumulate a large number of data points (more than a specified threshold) are retained. Further, through a similarity measure, model rationalization is performed. Taken together, these ideas solve the problem of estimating the unknown number of models and identifying the underlying data partitions. Various simulated examples are used to test the robustness of the prediction error based fuzzy model clustering approaches for identifying static multiple linear regression (SMLR) and piecewise ARX (PWARX) models.
References
1. Baptista, R.S., Ishihara, J.Y., Borges, G.A.: A Split and Merge
Algorithm for Identification of Piecewise Affine Systems. In:
American Control Conference, pp. 2018–2023. San Francisco,
USA (2011)
2. Bemporad, A., Garulli, A., Paoletti, S., Vicino, A.: A bounded-
error approach to piecewise affine system identification. IEEE
Trans. Autom. Control 50(10), 1567–1580 (2005)
3. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge
University Press, Cambridge (2004)
4. Chen, J., Xi, Y., Zhang, Z.: A clustering algorithm for fuzzy
model identification. Fuzzy Sets Syst. 98, 319–329 (1998)
5. Cherkassky, V., Ma, Y.: Multiple model regression estimation.
IEEE Trans. Neural Netw 16, 785–798 (2005)
6. DeSarbo, W.S.: A maximum likelihood methodology for clusterwise linear regression. J. Classif. 5, 249–282 (1988)
7. DeSarbo, W.S., Oliver, R.L., Rangaswamy, A.: A simulated annealing methodology for clusterwise linear regression. Psychometrika 4, 707–736 (1989)
8. Dufrenois, F., Hamad, D.: Fuzzy Weighted Support Vector
Regression For Multiple Linear Model Estimation: Application to
Object Tracking in Image Sequences. In: Proceedings of the
International Joint Conference on Neural Networks, pp. 1289–
1294. Orlando, Florida, USA (2007)
9. D’Urso, P., Massari, R., Santoro, A.: A class of fuzzy clusterwise
regression models. Inf. Sci. 180, 4737–4762 (2010)
10. Elfelly, N., Dieulot, J., Benrejeb, M., Borne, P.: A new approach for multimodel identification of complex systems based on both neural and fuzzy clustering algorithms. Eng. Appl. Artif. Intell. 23, 1064–1071 (2010)
11. Ferrari-Trecate, G., Muselli, M., Liberati, D., Morari, M.: A clustering technique for the identification of piecewise affine systems. Automatica 39, 205–217 (2003)
12. Frigui, H., Krishnapuram, R.: A robust competitive clustering
algorithm with applications in computer vision. IEEE Trans.
Pattern Anal Mach Intell 21, 450–465 (1999)
13. Gegundez, M.E., Aroba, J., Bravo, J.M.: Identification of piece-
wise affine systems by means of fuzzy clustering and competitive
learning. Eng. Appl. Artif. Intell. 21, 1321–1329 (2008)
14. Gugiliya, J.K., Gudi, R.D., Lakshminarayanan, S.: Multi-model
decomposition of nonlinear dynamics using a fuzzy-CART
approach. J. Process Control 15(4), 417–434 (2005)
15. Hennig, C.: Models and Methods for Clusterwise Linear
Regression. Springer, Heidelberg (1999)
16. Hennig, C.: Identifiability of models for clusterwise linear
regression. J. Classif. 17, 273–296 (2000)
17. Jin, X., Huang, B.: Robust identification of piecewise/switching autoregressive exogenous process. AIChE J. 56, 7 (2010)
18. Juloski, A., Weiland, S., Heemels, W.: A Bayesian approach to
identification of hybrid systems. In: 43rd Conference on Decision
and Control, pp. 13–19. Paradise Island, Bahamas (2004)
19. Klawonn, F., Hoppner, F.: Advances in Intelligent Data Analysis
V. Springer, Berlin (2003)
20. Kung, C., Su, J., Nieh, Y.: A Novel Cluster Validity Criterion for
Fuzzy c-Regression Models. In: FUZZ-IEEE, pp. 1885–1890.
Korea (2009)
21. Li, C., Zhou, J., Xiang, X., Li, Q., An, X.: T-S fuzzy model identification based on a novel fuzzy c-regression model clustering algorithm. Eng. Appl. Artif. Intell. 22, 646–653 (2009)
22. Nakada, H., Takaba, K., Katayama, T.: Identification of piece-
wise affine systems based on statistical clustering technique.
Automatica 41, 905–913 (2005)
23. Rengaswamy, R., Venkatasubramanian, V.: A fast training neural
network and its updation for incipient fault detection and diag-
nosis. Comput. Chem. Eng. 24, 431–437 (2000)
24. Roll, J., Bemporad, A., Ljung, L.: Identification of piecewise
affine systems via mixed-integer programming. Automatica 40,
37–50 (2004)
25. Skeppstedt, A., Ljung, L., Millnert, M.: Construction of composite models from observed data. Int. J. Control 55, 141–152 (1992)
26. Soltani, M., Aissaoui, B., Chaari, A., Ben Hmida, F., Gossa, M.:
A modified fuzzy c-regression model clustering algorithm for t-s
model identification. In: 8th International Multi-Conference on
Systems Signals & Devices (2011)
27. Spath, H.: Multiple model regression estimation. Computing 22,
367–373 (1979)
28. Venkat, N., Gudi, D.: Fuzzy segregation-based identification and
control of nonlinear dynamic systems. Ind. Eng. Chem. Res. 41,
538–552 (2002)
29. Venkat, N., Vijaysai, P., Gudi, D.: Identification of complex
nonlinear processes based on fuzzy decomposition of the steady
state space. J. Process Control 13, 473–488 (2003)
30. Vidal, R.: Recursive identification of switched ARX systems. Automatica 44, 2274–2287 (2008)
31. Yang, M., Ko, C.: On Cluster-wise fuzzy regression analysis.
IEEE Trans. Syst. Man Cybernet. B: Cybernet. 27, 1–13 (1997)
Table 15 Estimated hyperplanes for PWARX—Example 3

Hyperplane  True hyperplane                             Estimated hyperplane
h12         [-0.12   0.05  -0.80  -0.01   0.01  -3.14]  [-0.07   0.03  -0.71  -0.03   0.01  -3.03]
h23         [-0.05   0.07  -2.78  -0.05  -0.02  -0.45]  [-0.18   0.02  -4.33  -0.09  -0.06  -0.29]
h34         [-0.13  -0.07  -0.62  -0.01  -0.03   4.24]  [0.08    0.07  -0.95   0.03   0.05   4.69]