Evaluation of prediction error based fuzzy model clustering approaches for multiple model learning
Vidyashankar Kuppuraj • Raghunathan Rengaswamy
Published online: 16 May 2012
© Indian Institute of Technology Madras 2012
Abstract Identifying multiple models from both static
and dynamic data is an important problem in several
engineering fields. Clustering based on Euclidean distance
measure has been proposed to solve this problem. How-
ever, since Euclidean distance is not directly related to
model fidelity, these approaches can lead to suboptimal
results even when the number of models is known. In this
work, through a three step algorithm that includes initial-
ization, prediction error based fuzzy clustering and model
rationalization, we evaluate the possibility of uncovering
multiple model structures from data. The three step algo-
rithm is also assessed for the identification of piecewise
auto regressive exogenous systems with unknown number
of models and their (unknown) orders. The basic approach
can be extended for trend analysis and generalized princi-
pal component analysis.
Keywords Multiple model learning (MML) · PWARX models · Fuzzy clustering
1 Introduction
In this paper, we discuss a fuzzy clustering based model
learning formulation for estimation of multiple models.
Good predictive models are imperative in the fields of Fault
Detection and Isolation (FDI), economics, home insurance,
image processing and supervisory control applications.
There are many instances where data are generated from
multiple models operating in intersecting/non-intersecting
partitions of the input space [5, 11, 14, 22, 24, 25, 28–30].
The focus of the paper is on these scenarios where the data
samples are the input (X) and output (Y) data generated by
multiple linear models. Let Ci denote the parameters of the
ith model. The data samples are composed of several input-
output tuples [Xij Yij] that are functionally related as below.
Y_{i,j} = C_i X_{i,j} + e_{i,j},   i = 1,…,N;  j = 1,…,M_i    (1)

where e_{i,j} is i.i.d. random noise, X_{i,j} ∈ R^n, Y_{i,j} ∈ R^m, and C_i ∈ R^{m×n}. N is the number of underlying models, M_i is the number of data points from the region described by model i, and M is the total number of points, with Σ_{i=1}^{N} M_i = M. Let us
consider three instances of the MML problem [11, 24].
Case 1 The number of models and data partitioning are
known. In these cases, the objective is to estimate model
parameters. These are cases where we know the partitions
a priori based on process knowledge. The model parame-
ters for each partition can be estimated using an ordinary
least squares (OLS) method.
Case 2 The number of models is known; however, data
partitioning is not known. In this case, the objective is to
partition the data and to estimate model parameters. The
data are partitioned into subsets equal to the number of
models. Model parameters are estimated for each of these
subsets. Various clustering techniques are employed in the
literature for data partitioning.
Case 3 Neither the partitions nor the number of models
are known. This is a challenging and difficult problem.
Under some restrictive assumptions, the existing tech-
niques solve this problem either by repetitively applying
their algorithm for the known number of models case or
through an optimization formulation. Optimization for-
mulations are computationally demanding.
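The data-generating process of Eq. 1 can be made concrete with a short sketch. This is our own illustrative code, not from the paper; the input region bounds and noise level are assumptions.

```python
import numpy as np

def generate_mml_data(models, counts, n, noise_std=0.1, seed=0):
    """Draw data from multiple linear models, Y_ij = C_i X_ij + e_ij (Eq. 1).

    models : list of (m x n) parameter matrices C_i, one per model
    counts : number of samples M_i drawn from each model's region
    n      : input dimension
    """
    rng = np.random.default_rng(seed)
    X, Y, labels = [], [], []
    for i, (C, M_i) in enumerate(zip(models, counts)):
        Xi = rng.uniform(-1.0, 1.0, size=(M_i, n))               # inputs from model i's region (assumed bounds)
        Ei = rng.normal(0.0, noise_std, size=(M_i, C.shape[0]))  # i.i.d. noise e_ij
        Y.append(Xi @ C.T + Ei)                                  # Y_ij = C_i X_ij + e_ij
        X.append(Xi)
        labels.append(np.full(M_i, i))                           # true model index, kept for evaluation
    return np.vstack(X), np.vstack(Y), np.concatenate(labels)
```

For instance, `generate_mml_data([np.array([[0.8]]), np.array([[-1.5]])], [60, 40], n=1)` produces a 60/40 split between two scalar models, mimicking the unequal-sampling scenarios studied later.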
V. Kuppuraj · R. Rengaswamy (✉)
Department of Chemical Engineering, Texas Tech University,
Lubbock, TX 79409, USA
e-mail: [email protected]
Int J Adv Eng Sci Appl Math (March–June 2012) 4(1–2):10–21
DOI 10.1007/s12572-012-0058-y IIT, Madras
In this paper, we evaluate a multiple model learning
approach that performs clustering based on prediction
error. This is similar to the work of Frigui and Krish-
napuram [12] where the residual is used as a distance
measure, which is a prediction error based approach. The
residuals are generated in a total least squares (TLS)
framework as discussed in the work of [12]. In our work,
we retain the fuzzy clustering framework and develop a
model clustering approach based on an ordinary least
squares (OLS) formulation. We evaluate such an approach
for model update, and also a line search approach for model
update, which seems to be more efficient at identifying the
underlying multiple model structure. Further, we develop
post-processing techniques for model rationalization that
enhance the ability of the approach to identify the
original system as closely as possible. We also evaluate the
efficacy of such a multiple model learning framework
through extensive simulation studies where the underlying
data is generated from known multiple models under dif-
ferent sampling situations; this is something that does not
seem to have been discussed much in the literature.
In the area of identification of piecewise auto regressive
exogenous (PWARX) models, surprisingly, most of the
clustering methods seem to have as a first step, a distance
based clustering for the identification of local data sets
(LDs). When our proposed approach is applied to the
PWARX problem, this step can be removed and the LDs
and the corresponding model parameters can be identified
in a single integrated step using the prediction error based
membership. We believe that this will remove many of the
problems related to the mixed LDs reported in the litera-
ture. Further, since the clustering objective is directly
related to the model fidelity, theoretical analysis of the
performance characteristics of the proposed algorithm
should be more accessible.
1.1 Literature review
Research in the literature has addressed both the static and
dynamic multiple model learning (MML) cases. The
problem of MML has been around for a long time. The
initial work went under the name of clusterwise linear
regression [6, 7, 15, 16, 27]. Spath [27] coined the term
clusterwise regression and developed an algorithm that
finds optimal feasible partitions and the regression coeffi-
cients for each of the clusters. A conditional maximum
likelihood estimation approach for clusterwise regression
was introduced by [6]. A simulated annealing based
approach for the same problem was proposed by [7].
Hennig [15] discusses different methods for clusterwise
regression, while in [16], the identifiability of the cluster-
wise regression parameters was studied. A similar problem
has been discussed as multiple model general linear
regression (MMGLR) problem in Frigui and Krishnapuram
[12]. While the focus of this paper was on robust clustering
using competitive agglomeration, MMGLR is discussed as
an exemplar problem for the robust competitive agglom-
eration (RCA) algorithm, where the standard c-means
distance measure is replaced by a prediction error based
measure.
In a similar vein, a multiple model learning (MML)
problem is posed by Cherkassky and Ma [5] and solved
assuming the existence of dominant model structures in the
data. A repetitive two-step algorithm is proposed to esti-
mate multiple models. In the first step, a major model is
estimated. In the second data partitioning step, data
belonging to the major model are segregated from the
residual (left-out) data. The residual data is assumed to be
generated by a dominant model. These two steps are
applied on data until a specified tolerance is met. A support
vector machine (SVM) based regression procedure is
employed for model parameter estimation. In this
approach, the existence of dominant model structures is
assumed and model estimation is iterative, one model is
estimated at a time. Dufrenois and Hamad [8] discuss an
extension to support vector regression (SVR) approach to
simultaneously extract multiple linear structures from data.
In this formulation, data are assigned fuzzy memberships to
multiple linear structures. Residuals—defined as maximum
model mismatches—are used to update memberships.
Updated memberships are used in recalculation of model
parameters using a dual SVR optimization formulation.
Except for providing dynamic upper bounds for the
Lagrange multipliers, memberships do not change the
standard SVR dual formulation. The number of models is
assumed to be known in this work. The authors suggest that
the question of unknown number of models may be
resolved by competitive agglomeration but do not provide
an algorithm. Fair sampling and known number of models
are assumptions made in this approach. Elfelly et al. [10]
propose a three step algorithm for development of multiple
models. In the first step, the optimal number of clusters are
determined using a rival penalized competitive learning
(RPCL). The cluster centers for these models are identified
using a fuzzy k-means algorithm using an Euclidean dis-
tance based membership. In the second step, parametric
model identification is performed based on the clustering
results. In the third step, these local models are combined
to form a global model.
In related literature, several papers address the multiple
model regression problem under the framework of fuzzy
regression analysis (see [4, 9, 20, 21, 26, 31] and the ref-
erences therein). The ideas are very similar to the MML
algorithms but the final model is presented in a fuzzy
regression framework. The basic idea of prediction error
based distance is included in the model update step in
different forms in the works of [20, 21, 26, 31]. However,
papers in this framework do not seem to be focused on
identifying the exact underlying model structure but rather
on building a fuzzy regression model that captures such
data adequately.
MML for dynamical systems are of interest in various
fields of engineering. The most popular of these problems
is the identification of PWARX models. While there have
been several techniques for PWARX identification such as:
modified expectation minimization (EM) [17], bounded
error procedure [2], Bayesian approach [18], algebraic
approaches [30], and mixed integer programming approach
[24], our focus is on the clustering based approaches.
Nakada et al. [22] proposed an algorithm to identify
PWARX models based on a statistical clustering technique.
In the proposed algorithm, with the assumptions of known
number of models and model order, the original data is
clustered using a Gaussian mixture model. The clustered
data is projected on to the regressor space. Support vector
classifiers are used to estimate the boundary hyperplanes
between neighboring clusters in the regressor space. The
model parameters in each cluster are identified using a least
squares method. Through repetitive application of their
identification procedure, [22] discussed a method to esti-
mate the number of models from data. The consistent
Akaike information criterion (CAIC) and minimum description length (MDL)
criteria are minimized to identify the unknown number of
models. The number of models is identified by iteration
and the EM clustering algorithm is strongly dependent on
initial values.
Ferrari-Trecate et al. [11] proposed a clustering based
algorithm that works under the assumption that the model
order and number of models are known. The first step in
this algorithm is the grouping of small data sets known as
local data sets (LDs). The LDs which consist of data from a
single model are termed pure LDs. In contrast, mixed LDs
include data from different models. The authors discuss
problems in dealing with mixed LDs. An important
assumption in this work is that the ratio of the number of
mixed to pure LDs is small. In other words, fair sampling is
assumed. A parameter vector is estimated for each LD
based on a least squares method. K-means clustering is
employed to group parameter vectors into disjoint subsets
equal to the assumed known number of models. Bijective
maps between parameter vectors and data are used to
classify the original data. The authors suggest that the
estimation of number of models may be performed using
clustering algorithms; however, this idea has not been
developed further. The knowledge of number of models is
a limitation in this technique and an assumption of fair
sampling is also made. In recent work, Gegundez et al. [13]
propose an initial distance based clustering, followed by
identification of initial models and lastly, identification of
the final models using competitive learning. Baptista et al.
[1] propose a split and merge algorithm, where initial
distance based clustering is followed by a least squares
identification of the submodel ARX parameters and finally,
similar models are grouped together using a split and
merge algorithm.
As seen from the literature survey, in the static case, the
idea of prediction error based multiple model learning has
been around for a while. The original work of Frigui and
Krishnapuram [12] discussed this but the focus was not a
detailed evaluation of such an approach for multiple model
learning. Further, many of the fuzzy regression approaches
also use a prediction error based distance measure; however,
their final goal is the development of an adequate fuzzy
regression model for the data. In summary, estimating the
unknown number of models and input partitions from finite
data is a difficult problem. The majority of existing
approaches work under assumptions about either known
number of models or restrictions on sampling of data
points. Violation of these assumptions could result in poor
multiple model estimation. Some of the existing approa-
ches iteratively estimate one model at a time using the
assumption of dominant model structures. Further, many of
the prior approaches use a Euclidean distance measure for
model identification. Since Euclidean distance based clus-
tering of data points, using input data or augmented input
and output data, is not directly related to the number of
models, these approaches may be suboptimal.
2 Multiple model learning formulation
In this work, a fuzzy model clustering (FMC) approach is
developed and evaluated for MML problems. The FMC
approach builds on the fuzzy C-means (FCM) clustering
approach. In view of this, the FCM clustering algorithm is
described next.
2.1 Fuzzy clustering
Clustering is a technique that is used to assign objects with
similar characteristics to groups. The similarity is measured
through an appropriate objective. As an example, cluster-
ing of multi-dimensional data with Euclidean distance
objective identifies data rich regions in the input space.
Clustering techniques can be classified as either hard or
soft clustering. Figure 1a depicts hard clustering. The three
clusters shown in the figure are separated by sharp
boundaries. Soft clustering is shown in Fig. 1b; the clusters
overlap with each other. The K-means algorithm is a hard
clustering technique, while FCM algorithm performs soft
clustering. In hard clustering, whenever a data point
belongs to a cluster it has a membership of one to that
cluster; else the membership is zero. In soft clustering
methods, the membership of a data point to a cluster is
between zero (no membership) and one (complete mem-
bership). This is termed as partial membership. The sum of
partial memberships of a data point to all clusters equals
one. The FMC algorithm proposed in this paper uses a
novel model clustering concept that is based on FCM
clustering. The standard FCM algorithm is discussed next.
Let X ∈ R^{n+m} denote the multi-dimensional data. Here we
assume that clustering is performed in the augmented
input-output space. The data contains M samples. N cluster
centers are initialized in the FCM algorithm. Let θ_i ∈ R^{n+m}
denote the ith cluster center. i and j are the indices used to
represent the cluster centers and data points, respectively. In
the FCM algorithm, a normalized membership of the jth
data point to the ith cluster center is computed by

μ_{ij} = 1 / Σ_{k=1}^{N} (d_{ij} / d_{kj})^{2/(q−1)}    (2)
In Eq. 2, d_{ij} is the Euclidean distance from the jth data point to
the ith cluster center. The parameter q is termed the fuzzifier.
The effect of the fuzzifier on the resulting cluster partitions
is described in [19]. q (greater than one) controls the
overlap in the cluster regions. If q is close to 1, the result is
hard clustering; as q → ∞, the clustering becomes totally
fuzzy. It can be easily verified that the sum of partial
memberships of any data point to all clusters is one:

Σ_{i=1}^{N} μ_{ij} = 1    (3)
The update equation for the cluster center is given by

θ_i = Σ_{j=1}^{M} μ_{ij}^q X_j / Σ_{j=1}^{M} μ_{ij}^q    (4)
It can be seen from Eq. 4 that data with high
membership to a cluster have a larger impact on the
cluster location. This leads to competitive agglomeration.
At the termination of the FCM algorithm, the cluster
centers which have low memberships for all data are
discarded. The FCM clustering algorithm is summarized in
Table 1.
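The FCM iteration of Eqs. 2-4 can be sketched as follows. This is an illustrative implementation, not the authors' code; the random membership initialization and the max-norm stopping test are assumptions.

```python
import numpy as np

def fcm(X, N, q=2.0, tol=1e-5, max_iter=300, seed=0):
    """Minimal fuzzy c-means sketch: alternate center update (Eq. 4)
    and membership update (Eq. 2) until memberships stop changing.

    X : (M, d) data matrix, N : number of cluster centers, q : fuzzifier.
    """
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    mu = rng.random((N, M))
    mu /= mu.sum(axis=0)                                  # memberships sum to 1 per point (Eq. 3)
    centers = None
    for _ in range(max_iter):
        w = mu ** q
        centers = (w @ X) / w.sum(axis=1, keepdims=True)  # Eq. 4: weighted mean of the data
        # Euclidean distance of every point to every center
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        # Eq. 2, rewritten as d^{-2/(q-1)} normalized over centers
        inv = d ** (-2.0 / (q - 1.0))
        mu_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(mu_new - mu).max() <= tol
        mu = mu_new
        if converged:
            break
    return centers, mu
```

On two well-separated clouds of points, the two returned centers migrate to the two data-dense regions, exactly the behaviour the text describes for distance-based clustering.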
2.2 Prediction error based fuzzy model clustering
The fundamental idea behind the prediction error based
approaches is depicted in Fig. 2. In Fig. 2a, the different
symbols represent the data corresponding to the different
models. Figure 2a depicts the standard FCM algorithm
where the cluster centers (red circles) migrate towards data.
Fig. 1 Hard and soft clustering: (a) hard clustering, (b) soft clustering
Table 1 Fuzzy c-means algorithm

Choose the number of cluster centers and a tolerance tol
Initialize the membership values μ_{ij}, which are elements of the membership matrix μ
Let r be the iteration index, r = 1, 2, 3, …
Update the cluster centers: θ_i^r = Σ_{j=1}^{M} (μ_{ij}^{(r−1)})^q X_j / Σ_{j=1}^{M} (μ_{ij}^{(r−1)})^q
Update the membership values: μ_{ij}^r = 1 / Σ_{k=1}^{N} (d_{ij}/d_{kj})^{2/(q−1)}
Repeat the updates until ||μ^r − μ^{(r−1)}|| ≤ tol
Clustering is performed in the data space (that includes
inputs and outputs). This process is largely controlled by
the data density in the different regions of the data space.
Denser regions will attract more cluster centers. Notice that
this density has really no bearing on the models that are
applicable in these regions. In contrast, in the proposed
approach, conceptually depicted in Fig. 2b, the
different symbols now represent the parameter vectors for
the different models. Clustering is now performed with the
model parameters as the decision variables. If we assumed,
for example, that four different models partition the input
space in Fig. 2a, then there are four points in the model
space that need to be identified. The cluster centers (red
circles) are initialized in the model space and they migrate
towards the four points through the FMC algorithm. Notice
that if these points are exactly identified, then the question
of the number of models and the model parameters are
directly answered. The input space partition is also
answered implicitly through the membership of the data in
the input space to the cluster centers.
It is well-known that the standard FCM algorithm can be
viewed as a minimization of a membership weighted distance
with cluster centers in the data space as the decision variables.
J = Σ_{i=1}^{N} Σ_{j=1}^{M} μ_{ij}^q ||X_j − θ_i||₂²    (5)
|| · || represents the Euclidean norm. In the clustering process
as depicted in Fig. 2b, model parameters replace the cluster
centers as decision variables. Further, the weighted
distance metric is replaced by a weighted prediction error
in the objective function. The FMC formulation is inter-
preted as models migrating to accumulate data points
minimizing overall prediction error. This migration results
in simultaneous model parameter estimation and data
partitioning. The data assigned to a model characterizes the
input partition. The prediction error is computed by
PE_{ij} = ||Y_j − C_i X_j||₂    (6)

where Y_j ∈ R^m and X_j ∈ R^n are the given jth output and input data
points, respectively, and C_i ∈ R^{m×n} is the ith model. The product
C_i X_j is the predicted jth output. PE_{ij} is the
prediction error (for model C_i) between the jth measured and
predicted outputs. The membership based on prediction
error is computed by

μ_{ij} = 1 / Σ_{k=1}^{N} (PE_{ij} / PE_{kj})^{2/(q−1)}    (7)
The modified objective function for FMC is

J = (1/2) Σ_{i=1}^{N} Σ_{j=1}^{M} μ_{ij}^q ||Y_j − C_i X_j||₂²    (8)
The model update (r is the iteration counter) can be
performed through either the first-order necessary conditions
as in [12] (Algorithm I) or through a line search (Algorithm
II). If the first-order necessary conditions are used,

C_i^{r+1} = (Σ_{j=1}^{M} μ_{ij}^q Y_j X_j^T) (Σ_{j=1}^{M} μ_{ij}^q X_j X_j^T)^{−1}    (9)
Fig. 2 The idea behind the
proposed approach
If a line search is used,

C_i^{r+1} = C_i^r + α^r ∇g^r    (10)

where ∇g^r ∈ R^{m×n} (the gradient with respect to C_i^r ∈ R^{m×n}) is given by

∇g^r = −Σ_{j=1}^{M} μ_{ij}^q (Y_j − C_i^r X_j) X_j^T    (11)

and α^r, the step length, is given by

α^r = Σ_{j=1}^{M} Σ_{i=1}^{N} μ_{ij}^q (∇g^r X_j)^T (Y_j − C_i^r X_j) / Σ_{j=1}^{M} Σ_{i=1}^{N} μ_{ij}^q (∇g^r X_j)^T (∇g^r X_j)    (12)
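One iteration of Algorithm I, combining the prediction errors of Eq. 6, the memberships of Eq. 7, and the closed-form update of Eq. 9, can be sketched as follows. This is an illustrative reimplementation under assumed conventions (rows of X and Y are data points; a small constant guards against zero prediction error), not the authors' code.

```python
import numpy as np

def fmc_step(X, Y, C_list, q=2.0):
    """One FMC iteration (Algorithm I).

    X : (M, n) inputs, Y : (M, m) outputs, C_list : list of (m, n) model matrices.
    Returns the updated models and the (N, M) membership matrix.
    """
    N = len(C_list)
    # Eq. 6: prediction error of every model on every data point
    PE = np.array([np.linalg.norm(Y - X @ C.T, axis=1) for C in C_list]) + 1e-12
    # Eq. 7: prediction-error-based memberships, normalized over models
    inv = PE ** (-2.0 / (q - 1.0))
    mu = inv / inv.sum(axis=0, keepdims=True)
    # Eq. 9: membership-weighted least squares solve per model
    new_C = []
    for i in range(N):
        w = mu[i] ** q
        A = (Y.T * w) @ X          # sum_j mu_ij^q Y_j X_j^T  -> (m, n)
        B = (X.T * w) @ X          # sum_j mu_ij^q X_j X_j^T  -> (n, n)
        new_C.append(A @ np.linalg.inv(B))
    return new_C, mu
```

Iterating `fmc_step` on data generated by two distinct linear models drives each model matrix toward one of the true parameter sets, provided the initialization is not degenerate.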
In summary, it is clear that membership reflects model fit.
This results in models accumulating data that they predict well.
In the first model initialization step (Phase I), a large number of
models are initialized. An assumption here is that this is more
than the number of models that describe the underlying data. In
the second fuzzy clustering step (Phase II), the initial models
are updated till convergence. In the third model rationalization
step (Phase III), the models that do not predict the data well are
discarded. This is achieved by choosing a threshold. Further,
similar models are merged by defining an angle between
models and comparing this with a threshold for model
rationalization. The model angle is defined based on the
output, where there is the most deviation between the
parameter vectors for the two models. If for example, two
models are exactly the same, then the model angle will be zero,
in which case we would ideally want to merge the models. In
actual practice, this will never happen and the model
parameters will be slightly different for even very similar
models due to finite sampling and noise effects. This is taken
into account by comparing the model angle against a threshold.
This makes it possible to simultaneously identify the number
of models, estimate the model parameters, and identify the regions
corresponding to each model. The three phases of the FMC
algorithm are summarized in Table 2. Note that once the data
belonging to each model is classified, a technique such as the
one proposed in Rengaswamy and Venkatasubramanian [23]
can be used to describe this data in the input space.
3 Evaluation of PE-based FMC approaches
for estimation of static multiple linear regression
(SMLR) models
Three examples are presented in this section to evaluate the
prediction error based model clustering approach. For
purposes of comparison, the first and second examples are
from literature and results obtained by the FMC approach
are tabulated. The third example is a multi-input multi-
output (MIMO) MML problem. Three scenarios are dis-
cussed in this example. In the first scenario, every model
consists of equal number of data points. This scenario
evaluates a case where the assumption of dominant models
will be violated [5]. Models with unequal numbers of data
points, violating the fair sampling assumption, are considered
in the second scenario. In the third case, a majority of the data
is in the region close to the partitions which represents a
rather unfair sampling policy. This is a difficult example
for Euclidean distance based clustering approaches such as
the ones proposed in [11]. In all the results, C denotes the
original models that generated the data, E denotes the
models estimated by the proposed algorithm. Whenever
the estimated models are merged (when the angle between
them, θ < 5 deg) they are denoted by E. The models that
Table 2 Multiple model identification algorithm

Phase I: Model initialization
Let Y ∈ R^m and l be the index for components; R_l = Y_l^max − Y_l^min
R_l is partitioned into m_l partitions such that Π_l m_l = N (the number of models initialized)
The Y and X data are segregated into the N partitions identified in the previous step
Initial model parameters in each partition are calculated using the OLS technique

Phase II: Fuzzy clustering
Let r be the iteration index, r = 1, 2, 3, …
The prediction error is computed by PE_{ij} = ||Y_j − C_i X_j||₂
The membership values are updated by μ_{ij} = 1 / Σ_{k=1}^{N} (PE_{ij}/PE_{kj})^{2/(q−1)}
The models are updated by
  Algorithm I: C_i^{r+1} = (Σ_{j=1}^{M} μ_{ij}^q Y_j X_j^T) (Σ_{j=1}^{M} μ_{ij}^q X_j X_j^T)^{−1}
  Algorithm II: C_i^{r+1} = C_i^r + α^r ∇g_i^r, where ∇g_i^r = −Σ_{j=1}^{M} μ_{ij}^q (Y_j − C_i^r X_j) X_j^T and
  α^r = Σ_{j=1}^{M} Σ_{i=1}^{N} μ_{ij}^q (∇g_i^r X_j)^T (Y_j − C_i^r X_j) / Σ_{j=1}^{M} Σ_{i=1}^{N} μ_{ij}^q (∇g_i^r X_j)^T (∇g_i^r X_j)
The models and memberships are updated until the root mean square error (RMSE) satisfies
  sqrt(Σ_{j=1}^{M} PE_{bj}² / M) ≤ tol, where b denotes the best-fit model

Phase III: Model rationalization
The similarity between two models is calculated using a simple metric: given models C_i, C_j ∈ R^{m×n},
  define θ_k = cos⁻¹( (C_i^k · C_j^k) / (||C_i^k|| ||C_j^k||) ), where C_i^k represents the kth row of model C_i
The angle between two models, θ, is defined as max_k θ_k
If the angle between two models is less than a threshold (θ = 5 deg is the default), they are combined
The parameters of the merged model are computed using OLS on the data augmented from the merging models
Models that have no similarity to other models and also have less than X % of the data (X = 5 is the default) are discarded
The data from discarded models are reassigned to the models that fit them best
are not part of the true model set but are still retained after
the post processing step are denoted by O.
3.1 SMLR—Example 1
This example problem is taken from [5]. This problem
consists of three different models and the model informa-
tion is given in Table 3. There are totally 100 data points.
Gaussian white noise of standard deviation 0.1 is added to
the output Y. Cherkassky and Ma [5] formulated a multiple
model estimation problem based on the assumption that
majority of data points are from a single model. In accor-
dance with this assumption, the first model consists of
60 % data points. The second and third models consist of
30 and 10 % data points, respectively. This example is
solved using the proposed FMC algorithms. To start with,
six models (different from the number of models that
generated the data) are initialized. To study the effect of
fuzzifier q on the final result, we consider three different
values for q. The results obtained are shown in Table 4.
The data used in all three cases are the same. The FMC
algorithm I exactly found three models without misclassi-
fication of data points for fuzzifier values 2.5 and 2.0 and
overestimated a model for fuzzifier value 1.5. The FMC
algorithm II exactly found three models in all the three
cases and captured the data points belonging to these
models. The Euclidean distance between true and initial-
ized models for algorithm II are given in Table 5. The
initialized models E1 and E2 are close to true model C3
and they shared the data points of C3. Similarly, model E4
captured data points of C2 and models E5 and E6 captured
C1. The accumulation of data points at various iteration
steps are given in Table 6. Model E1 captured only 4 data
points but the angle between E1 and E2 is 3.32 deg. So, the
data points of the models E1 and E2 are augmented and
model parameters are estimated using OLS method. There
is no misclassification of data points in all the three cases.
There is a slight variation in model parameter values based
on the fuzzifier used in both the algorithms. This can be
attributed to the fuzzy weighting used in the model update
Eqs. 9 and 10. The fuzzy weighting depends on the value of
q. The model parameters in Table 4 are obtained using
OLS method by augmenting data points of the models,
whose intervening angle is less than 5 deg. Hence the
model parameters are identical. This suggests that FMC
algorithm II is insensitive to the value of q whereas FMC
algorithm I is slightly sensitive.
3.2 SMLR—Example 2
This example is taken from [5]. This example consists of
two different models and model information is given in
Table 7. Gaussian white noise of standard deviation 0.1 is

Table 3 Model information for SMLR—Example 1

Model number   Model
C1             0.8x + 2
C2             0.2x + 1
C3             −1.5x
Table 4 Results for SMLR—Example 1

Fuzzifier   FMC algorithm   Model number   Model           No. of points
2.5         I, II           E1             0.79, 2.01      60
                            E2             0.12, 1.02      30
                            E3             −1.25, −0.19    10
2.0         I, II           E1             0.79, 2.01      60
                            E2             0.12, 1.02      30
                            E3             −1.25, −0.19    10
1.5         I               E1             0.79, 2.01      58
                            E2             0.12, 1.03      28
                            E3             −1.17, −0.25    9
                            O1             3.63, −0.63     5
1.5         II              E1             0.79, 2.01      60
                            E2             0.12, 1.02      30
                            E3             −1.25, −0.19    10
Table 5 Euclidean distance between the initialized and original
models
Original model E1 E2 E3 E4 E5 E6
C1 2.98 2.88 2.15 1.19 0.31 0.16
C2 1.84 1.74 1.02 0.08 1.02 1.19
C3 0.46 0.51 1.50 1.92 2.80 3.01
Table 6 Convergence for q = 2.5 for FMC algorithm II in SMLR—
Example 1
Iteration Number E1 E2 E3 E4 E5 E6
Start 6 3 0 30 10 49
5 8 2 0 30 17 43
10 8 2 0 30 27 33
15 7 3 0 30 28 32
20 5 5 0 30 23 37
25 5 5 0 30 23 37
30 4 6 0 30 23 37
35 4 6 0 30 23 37
Final result (40) 4 6 0 30 23 37
added to the output. The first model consists of 60 % of the data and the second model consists of the remaining 40 %. Five models are initialized to solve this problem, and the results obtained for both FMC algorithms are detailed in Table 8. Two cases are considered in this example: in the first case, 100 data points are used, and in the second case 1000 data points are used. FMC algorithm I overestimated a model in the first case. The overestimated model O1 captured 9 data points of model E1, which is more than the threshold of 5 % of the data points, and hence this model is retained in the final result. The angle between the estimated models O1 and E1 is 6 deg and hence they cannot be combined. In the second case, FMC algorithm I identified two models. The model that accumulated less than 5 % of the data points is discarded, and its data points are reassigned to the models with minimum fit error. After this procedure only one data point is misclassified. The two original models are exactly identified by FMC algorithm II in both cases, with no misclassification of data points. The model parameters are almost the same for both cases. The number of data points does not seem to have much effect on the performance of FMC algorithm II, whereas FMC algorithm I overestimated a model when fewer data points are available. However, when noise levels are high, more data will be useful in estimating the multiple models.
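The reassignment step described above (moving the points of a discarded model to the surviving model with minimum fit error) can be sketched as follows; the function name and array shapes are illustrative assumptions:

```python
import numpy as np

def reassign_points(X, y, models):
    """Assign each data point to the candidate model with minimum
    squared fit error.

    X: (N, d) regressor matrix (last column may be the constant 1),
    y: (N,) measured outputs,
    models: list of (d,) parameter vectors theta, one per retained model.
    Returns an (N,) array of model indices.
    """
    # errors[i, j] = squared residual of point i under model j
    preds = np.column_stack([X @ theta for theta in models])
    errors = (preds - y[:, None]) ** 2
    return errors.argmin(axis=1)
```

For example, points generated by y = 2x are assigned to the model [2, 0] rather than [-1, 0] because their squared residual under the first model is smaller.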
3.3 SMLR—Example 3
In this example, we consider a MISO system. Three cases are considered. In the first case, the true model set contains four models and each model consists of an equal number of points. In the second case, two models consist of equal numbers of points and the remaining two models consist of unequal numbers of points. In the third case, the sampling is biased towards the partition regions. In all these cases, Gaussian white noise of standard deviation 0.1 is added to the output data.
3.3.1 Case 1
The input data for this case are shown in Fig. 3a. There are 1000 data points in total, and each model consists of 250 data points. Eight models are initialized. The final results obtained using the FMC algorithms are shown in Table 9. The proposed algorithms exactly identified the four models. In FMC algorithm I, two estimated models captured 14 and 9 data points, fewer than the threshold of 5 % of the total number of data points needed to be considered separate models. Hence their points are reassigned, after which there is no misclassification of data points. In FMC algorithm II, for model C3, four initialized models E2, E4, E6 and E8 are close; these models captured 80, 89, 38 and 43 data points respectively. The angles computed (as defined in Table 2) between model E2 and models E4, E6 and E8 are 1.5 deg, 2.7 deg and 4.05 deg respectively. As a result, the final model parameters for C3 are obtained by OLS estimation using all the data points accumulated by these models. All the data points are identified exactly to their correct partitions and hence there is no misclassification. The model parameters estimated are the same for both FMC algorithms.
3.3.2 Case 2
This case also contains 1000 data points, and the input data partition is shown in Fig. 3b. The first model has 210 data points and the second model has 490 data points. The third and fourth models have 150 data points each. Figure 3b shows the four different regions. In this case, unlike the previous one, the models are valid for regions with unequal areas.
Table 7 Model information for SMLR—Example 2

Model number   Model
C1             x1 + x2 + x3 + x4
C2             -x1 - x2 - x3 - x4 + 6
Table 8 Results for SMLR—Example 2

Total no.    FMC        Model    Model parameters                    No. of
of points    algorithm  number                                       points
100          I          E1       1.04   0.96   1.00   1.07  -0.06    51
                        E2       -1.02  -1.01  -1.03  -0.88  6.01    40
                        O1       1.62   0.95   1.42   0.85  -0.28     9
100          II         E1       1.06   1.02   1.02   1.08  -0.11    60
                        E2       -1.05  -1.04  -1.03  -0.91  6.05    40
1000         I          E1       0.99   0.99   1.00   0.98   0.01   599
                        E2       -0.99  -0.99  -1.00  -0.99  5.99   401
1000         II         E1       0.99   0.99   0.99   0.98   0.02   600
                        E2       -0.96  -1.00  -1.02  -0.98  5.98   400
The model information used is the same as in the first case.
FMC algorithms I and II identified the four models as
shown in Table 9. Only one data point is misclassified. The
estimated models are very close to the original models.
3.3.3 Case 3
This case differs from the previous one in that the sampling of data in each region is biased towards the partitions. References [8, 11] formulated the MML problem under the assumption of fair sampling in the input space. This example is considered to study the impact of unfair sampling on the FMC algorithms evaluated in this paper. Eight models are initialized. The estimated models are given in Table 9. In this case also, only one data point is misclassified.
4 Evaluation of PE-based FMC approaches
for estimation of PWARX models
In this section, we discuss the use of the proposed algorithm
in the identification of PWARX systems. As discussed in
the introduction, this idea of prediction error based clus-
tering for the identification of the LDs and the corre-
sponding ARX parameters in a single integrated step does
not seem to have been attempted in the literature. While the
same algorithm as discussed in Table 2 can be used, there
are some differences that need to be accounted for carefully.
First, in the model initialization step, ARX models with
different orders (ny and nu) are initialized. An assumption
here is that the true model structures are part of the ini-
tialized models. This makes it possible to identify the
number of models and also the orders of these models in
the proposed approach. Note however that models not in the
true set could also be present in the initialized model set.
Second, in the static case, the data used with every model
has the same dimension. However, since ARX models with
different orders are initialized for the PWARX case, dif-
ferent models will deal with data of different dimensions.
This needs to be handled carefully by stacking time data
corresponding to each model. This needs to be reflected in
the model update step. Finally, in the model rationalization
step, only models of the same order are merged. Whenever
the model orders are different, all of these models are
retained. Further, the piecewise affine systems in Rn are
separated by boundary hyperplanes. The boundary hyper-
planes between adjacent models are estimated using support
vector classifiers (SVC) described in [3].
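The order-dependent regressor stacking mentioned above can be sketched as follows. Each candidate ARX order (ny, nu) gets its own regressor matrix, so models of different orders each see data of the correct dimension; the helper below is an illustrative assumption, not code from the paper:

```python
import numpy as np

def arx_regressors(y, u, ny, nu):
    """Build the regressor matrix and target vector for an ARX(ny, nu) model
    Y_k = a1*Y_{k-1} + ... + a_ny*Y_{k-ny} + b1*U_{k-1} + ... + b_nu*U_{k-nu} + c.

    Rows start at k = max(ny, nu) so that every required lag exists; a
    trailing column of ones matches the affine PWARX regressor form
    X_k = [Y_{k-1} ... U_{k-1} ... 1]'.
    """
    k0 = max(ny, nu)
    rows = []
    for k in range(k0, len(y)):
        past_y = [y[k - i] for i in range(1, ny + 1)]
        past_u = [u[k - i] for i in range(1, nu + 1)]
        rows.append(past_y + past_u + [1.0])
    return np.array(rows), np.asarray(y[k0:])
```

Calling this once per initialized model order keeps the model update step consistent even when candidate models of different orders compete for the same time series.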
We now describe two examples taken from the literature and another synthetic example, and report the results obtained using the proposed algorithm. The estimated models are denoted by E. From here on, we show results only for FMC algorithm II.
4.1 PWARX—Example 1
This example is taken from [22]. The PWARX system consists of three models with Y_k, U_k ∈ R, ny (output order) = 1 and nu (input order) = 1. The model parameters and the partitions are given below.
Y_k =
  [-0.4   1    1.5] X_k + E_k     if X_k ∈ χ1 = {X_k : [4  -1  10] X_k < 0}
  [0.5   -1   -0.5] X_k + E_k     if X_k ∈ χ2 = {X_k : [-4  1  10; 5  1  -6] X_k ≥ 0}
  [-0.3   0.5 -1.7] X_k + E_k     if X_k ∈ χ3 = {X_k : [-5  -1  6] X_k < 0}

where X_k = [Y_{k-1}  U_{k-1}  1]' and the inequality defining χ2 holds row-wise.
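A sketch of simulating this system to generate identification data is given below. The mode tests follow the stated partitions of X_k; the random seed, the order of the region checks, and the fallback to the middle mode are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pwarx1(n=200, y0=0.0):
    """Simulate the three-mode PWARX system above.

    u_k is uniform on [-5, 5]; uniform noise on [-0.1, 0.1] is added to
    the output, as described in the text. Region chi1 and chi3 are
    checked explicitly; remaining points fall to chi2.
    """
    u = rng.uniform(-5.0, 5.0, n)
    y = np.empty(n)
    yprev = y0
    for k in range(n):
        x = np.array([yprev, u[k], 1.0])        # X_k = [y_{k-1}, u_{k-1}, 1]
        if np.dot([4.0, -1.0, 10.0], x) < 0:
            theta = [-0.4, 1.0, 1.5]            # mode 1
        elif np.dot([-5.0, -1.0, 6.0], x) < 0:
            theta = [-0.3, 0.5, -1.7]           # mode 3
        else:
            theta = [0.5, -1.0, -0.5]           # mode 2
        y[k] = np.dot(theta, x) + rng.uniform(-0.1, 0.1)
        yprev = y[k]
    return u, y
```

Since every mode has |a| < 1 on Y_{k-1} and the input is bounded, the simulated output stays bounded, matching the data ranges reported for this example.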
Table 9 Results for FMC algorithms I and II for SMLR—Example 3

Model  Original   Case 1               Case 2               Case 3
       model      Estimated    Points  Estimated    Points  Estimated    Points
E1     3    2     3.01   2.03  250     3.03   1.99  209     3.00   2.08  209
       3    4     3.07   3.93          2.95   4.03          2.99   3.95
       5    7     4.93   7.06          4.97   6.98          4.99   6.94
E2     -1   2     -0.98  2.00  250     -0.98  2.00  490     -0.97  1.99  491
       0   -1     0.01  -1.00          0.00  -0.99          0.01  -1.00
       1    3     0.97   3.02          1.01   2.99          0.99   3.01
E3     9   10     8.97  10.08  250     8.98  10.09  151     9.00  10.02  150
       12   0     12.00 -0.03          12.00 -0.01          11.99  0.01
       1    5     1.06   4.88          1.01   4.99          0.97   5.10
E4     10  13     10.00 12.99  250     10.02 12.97  150     10.07 12.90  150
       0    4     -0.02  4.01          -0.13  4.14          -0.05  4.05
       2    4     1.97   4.03          2.06   3.92          2.00   3.99
200 data points for u_k are generated from a uniform distribution in the interval [-5, 5]. To this data, noise from a uniform distribution in the interval [-0.1, 0.1] is added. 57, 71 and 72 data points respectively are generated for models C1, C2 and C3. PWARX models are identified from this data using the FMC algorithm initialized with five models.
The results obtained are tabulated in Table 10. Two esti-
mated models which accumulated less than 5 % of data
points are discarded. SVC is used to estimate boundary
hyperplanes and the estimated boundary hyperplanes are
given in Table 11. The data points belonging to discarded
models are reassigned to their appropriate models based on
the hyperplane partition. In the final result, only one data
point is misclassified. The data partition in the regressor
space is shown in Fig. 4.
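The boundary-hyperplane estimation step can be sketched as below. A simple perceptron is used here purely as a stand-in for the support vector classifier of [3] that the paper actually uses; function name and learning parameters are our assumptions:

```python
import numpy as np

def boundary_hyperplane(X1, X2, lr=0.1, epochs=200):
    """Estimate a separating hyperplane w·x + b = 0 between the regressor
    sets of two adjacent models (X1 on the positive side, X2 negative).

    Perceptron updates converge for linearly separable regressor sets;
    an SVC as in [3] would instead maximize the margin.
    """
    X = np.vstack([X1, X2])
    t = np.hstack([np.ones(len(X1)), -np.ones(len(X2))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, ti in zip(X, t):
            if ti * (xi @ w + b) <= 0:   # misclassified point
                w += lr * ti * xi
                b += lr * ti
    return w, b
```

Points of discarded models can then be routed to the side of the estimated hyperplane they fall on, as done in the final reassignment step above.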
4.2 PWARX—Example 2
This example is taken from [22]. The system is characterized by a SISO Hammerstein model and consists of three models with Y_k, U_k ∈ R, ny = 2 and nu = 1. The PWARX formulation of the SISO Hammerstein system taken from [22] is given below.
Y_k =
  [-a1  -a2  0   b1·umax] X_k + E_k   if X_k ∈ χ1 = {X_k : [0  0  1  -umax] X_k > 0}
  [-a1  -a2  b1  0] X_k + E_k         if X_k ∈ χ2 = {X_k : [0  0  -1  umax; 0  0  1  -umin] X_k ≥ 0}
  [-a1  -a2  0   b1·umin] X_k + E_k   if X_k ∈ χ3 = {X_k : [0  0  -1  umin] X_k < 0}

where X_k = [Y_{k-1}  Y_{k-2}  U_{k-1}  1]' and the inequality defining χ2 holds row-wise.
The parameter values are a1 = 0.5, a2 = 0.1, b1 = 1, umin = -1, and umax = 2. There are 250 data points. The input data is generated from a normal distribution with standard deviation 2. To the measurement data,
Fig. 3 Input data for SMLR—Example 3
Table 10 Estimated model parameters for PWARX—Example 1

Model number  Model                  No. of points
E5            -0.40   1.00   1.53    56
E3            0.52   -1.00  -0.49    72
E2            -0.30   0.50  -1.72    72
Table 11 Estimated hyperplanes for PWARX—Example 1

Hyperplane  True hyperplane         Estimated hyperplane
h12         [0.40   -0.10   1.00]   [0.40   -0.10   1.00]
h23         [-0.83  -0.16   1.00]   [-0.85  -0.14   1.00]
noise generated from a normal distribution with standard
deviation 0.02 is added. The number of data points for
models C1, C2 and C3 are 42, 132 and 76 respectively. The
FMC algorithm is initialized with six models. The results
obtained are tabulated in Table 12. The FMC algorithm
identified the three models. The models that accumulated
less than 5 % of data points are discarded. These data
points are reassigned to the three identified models. Three
data points are misclassified. The boundary hyperplanes
estimated using the SVC method are tabulated in Table 13.
4.3 PWARX—Example 3
This is a synthetic example that we consider to demonstrate the efficacy of the FMC algorithm for the case of an unknown number of models with unknown orders. The input is generated from a uniform distribution in [-10, 10]. The input space is partitioned into four regions, each governed by a model of a different order. The model orders and the parameters for this system are given below.
Y_k =
  [-0.7  0.7] [U_{k-1}  1]' + E_k                              if U_{k-1} ∈ [-10, -4]
  [-0.4  -0.7  -0.3] [Y_{k-1}  U_{k-1}  1]' + E_k              if U_{k-1} ∈ [-4, 0]
  [0.6  -0.2  -0.3  0.1] [Y_{k-1}  U_{k-1}  U_{k-2}  1]' + E_k if U_{k-1} ∈ [0, 6]
  [0.3  -0.7  0.5  0.5] [Y_{k-1}  Y_{k-2}  U_{k-1}  1]' + E_k  if U_{k-1} ∈ [6, 10]
To the output Y_k, noise generated from a uniform distribution in [-0.1, 0.1] is added. 500 data points are generated. The numbers of data points for models C1, C2, C3 and C4 are 128, 111, 92 and 169 respectively. The FMC algorithm is initialized with six models. Four of the six models are of the true model class,
while the other two are not. The model parameters obtained
by the proposed algorithm are tabulated in Table 14 and
the estimated hyperplanes are given in Table 15. It can be
seen that the proposed algorithm performs remarkably well
in identifying the unknown number of models and their
unknown orders.
5 Conclusions
This paper evaluates one possible prediction error formu-
lation for the MML problem. The basic approach relies on
Fig. 4 Results for PWARX—
Example 1
Table 12 Estimated model parameters for PWARX—Example 2

Model number  Model                          No. of points
E6            -0.50  -0.10   0.00   2.00     43
E4            -0.50  -0.10   1.00   0.00     131
E1            -0.50  -0.10   0.00  -1.00     76
Table 13 Estimated hyperplanes for PWARX—Example 2

Hyperplane  True hyperplane              Estimated hyperplane
h12         [0.00   0.00  -0.50   1.00]  [-0.01  -0.01  -0.50   1.00]
h23         [0.00   0.00   1.00   1.00]  [-0.04   0.04   0.98   1.00]
Table 14 Estimated model parameters for PWARX—Example 3

Model number  Model                         No. of points
E1            -0.70   0.71                  124
E2            0.40   -0.70  -0.27           115
E4            0.60   -0.24  -0.30   0.41    164
E5            0.30   -0.70   0.50   0.49    97
a fuzzy model clustering paradigm. Multiple models are expected to be estimated without knowledge of the number of underlying models or of how these models partition the data. This ties in with the goals of any solution to the MML problem, which are: to identify the number of models, estimate the model parameters, and partition the data corresponding to each model. We evaluate an approach in which models migrate to accumulate data through a prediction error based membership. Through a post-processing step, the models that accumulate a large number of data points (more than a specified threshold) are retained. Further, through a similarity measure, model rationalization is performed. Taken together, these ideas solve the problem of estimating the unknown number of models and identifying the underlying data partitions. Various simulated examples are used to test the robustness of the prediction error based fuzzy model clustering approaches for identifying static multiple linear regression (SMLR) and piecewise ARX (PWARX) models.
References
1. Baptista, R.S., Ishihara, J.Y., Borges, G.A.: A Split and Merge
Algorithm for Identification of Piecewise Affine Systems. In:
American Control Conference, pp. 2018–2023. San Francisco,
USA (2011)
2. Bemporad, A., Garulli, A., Paoletti, S., Vicino, A.: A bounded-
error approach to piecewise affine system identification. IEEE
Trans. Autom. Control 50(10), 1567–1580 (2005)
3. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge
University Press, Cambridge (2004)
4. Chen, J., Xi, Y., Zhang, Z.: A clustering algorithm for fuzzy
model identification. Fuzzy Sets Syst. 98, 319–329 (1998)
5. Cherkassky, V., Ma, Y.: Multiple model regression estimation.
IEEE Trans. Neural Netw 16, 785–798 (2005)
6. DeSarbo, W.S.: A maximum likelihood methodology for clusterwise linear regression. J. Classif. 5, 249–282 (1988)
7. DeSarbo, W.S., Oliver, R.L., Rangaswamy, A.: A simulated annealing methodology for clusterwise linear regression. Psychometrika 4, 707–736 (1989)
8. Dufrenois, F., Hamad, D.: Fuzzy Weighted Support Vector
Regression For Multiple Linear Model Estimation: Application to
Object Tracking in Image Sequences. In: Proceedings of the
International Joint Conference on Neural Networks, pp. 1289–
1294. Orlando, Florida, USA (2007)
9. D’Urso, P., Massari, R., Santoro, A.: A class of fuzzy clusterwise
regression models. Inf. Sci. 180, 4737–4762 (2010)
10. Elfelly, N., Dieulot, J., Benrejeb, M., Borne, P.: A new approach for multimodel identification of complex systems based on both neural and fuzzy clustering algorithms. Eng. Appl. Artif. Intell. 23, 1064–1071 (2010)
11. Ferrari-Trecate, G., Muselli, M., Liberati, D., Morari, M.: A clustering technique for the identification of piecewise affine systems. Automatica 39, 205–217 (2003)
12. Frigui, H., Krishnapuram, R.: A robust competitive clustering
algorithm with applications in computer vision. IEEE Trans.
Pattern Anal Mach Intell 21, 450–465 (1999)
13. Gegundez, M.E., Aroba, J., Bravo, J.M.: Identification of piece-
wise affine systems by means of fuzzy clustering and competitive
learning. Eng. Appl. Artif. Intell. 21, 1321–1329 (2008)
14. Gugiliya, J.K., Gudi, R.D., Lakshminarayanan, S.: Multi-model
decomposition of nonlinear dynamics using a fuzzy-CART
approach. J. Process Control 15(4), 417–434 (2005)
15. Hennig, C.: Models and Methods for Clusterwise Linear
Regression. Springer, Heidelberg (1999)
16. Hennig, C.: Identifiability of models for clusterwise linear
regression. J. Classif. 17, 273–296 (2000)
17. Jin, X., Huang, B.: Robust identification of piecewise/switching autoregressive exogenous process. AIChE J. 56, 7 (2010)
18. Juloski, A., Weiland, S., Heemels, W.: A Bayesian approach to
identification of hybrid systems. In: 43rd Conference on Decision
and Control, pp. 13–19. Paradise Island, Bahamas (2004)
19. Klawonn, F., Hoppner, F.: Advances in Intelligent Data Analysis
V. Springer, Berlin (2003)
20. Kung, C., Su, J., Nieh, Y.: A Novel Cluster Validity Criterion for
Fuzzy c-Regression Models. In: FUZZ-IEEE, pp. 1885–1890.
Korea (2009)
21. Li, C., Zhou, J., Xiang, X., Li, Q., An, X.: T-S fuzzy model identification based on a novel fuzzy c-regression model clustering algorithm. Eng. Appl. Artif. Intell. 22, 646–653 (2009)
22. Nakada, H., Takaba, K., Katayama, T.: Identification of piece-
wise affine systems based on statistical clustering technique.
Automatica 41, 905–913 (2005)
23. Rengaswamy, R., Venkatasubramanian, V.: A fast training neural
network and its updation for incipient fault detection and diag-
nosis. Comput. Chem. Eng. 24, 431–437 (2000)
24. Roll, J., Bemporad, A., Ljung, L.: Identification of piecewise
affine systems via mixed-integer programming. Automatica 40,
37–50 (2004)
25. Skeppstedt, A., Ljung, L., Millnert, M.: Construction of composite models from observed data. Int. J. Control 55, 141–152 (1992)
26. Soltani, M., Aissaoui, B., Chaari, A., Ben Hmida, F., Gossa, M.:
A modified fuzzy c-regression model clustering algorithm for t-s
model identification. In: 8th International Multi-Conference on
Systems Signals & Devices (2011)
27. Spath, H.: Multiple model regression estimation. Computing 22,
367–373 (1979)
28. Venkat, N., Gudi, D.: Fuzzy segregation-based identification and
control of nonlinear dynamic systems. Ind. Eng. Chem. Res. 41,
538–552 (2002)
29. Venkat, N., Vijaysai, P., Gudi, D.: Identification of complex
nonlinear processes based on fuzzy decomposition of the steady
state space. J. Process Control 13, 473–488 (2003)
30. Vidal, R.: Recursive identification of switched ARX systems. Automatica 44, 2274–2287 (2008)
31. Yang, M., Ko, C.: On Cluster-wise fuzzy regression analysis.
IEEE Trans. Syst. Man Cybernet. B: Cybernet. 27, 1–13 (1997)
Table 15 Estimated hyperplanes for PWARX—Example 3

Hyperplane  True hyperplane                             Estimated hyperplane
h12         [-0.12   0.05  -0.80  -0.01   0.01  -3.14]  [-0.07   0.03  -0.71  -0.03   0.01  -3.03]
h23         [-0.05   0.07  -2.78  -0.05  -0.02  -0.45]  [-0.18   0.02  -4.33  -0.09  -0.06  -0.29]
h34         [-0.13  -0.07  -0.62  -0.01  -0.03   4.24]  [0.08    0.07  -0.95   0.03   0.05   4.69]