Hybrid kernel learning via genetic optimization for TS fuzzy system identification

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSINGInt. J. Adapt. Control Signal Process. 2010; 24:12–19Published online 7 November 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/acs.1089

Hybrid kernel learning via genetic optimization for TS fuzzysystem identification

Wei Li∗,† and Yupu Yang

Institute of Automation, Shanghai Jiaotong University, Shanghai, China

SUMMARY

This paper presents a new TS fuzzy system identification approach based on hybrid kernel learning and an improved geneticalgorithm (GA). Structure identification is achieved by using support vector regression (SVR), in which a hybrid kernelfunction is adopted to improve regression performance. For multiple-parameter selection of SVR, the proposed GA is adoptedto speed up the search process and guarantee the least number of support vectors. As a result, a concise model structurecan be determined by these obtained support vectors. Then, the premise parameters of fuzzy rules can be extracted fromresults of SVR, and the consequent parameters can be optimized by the least-square method. Simulation results show thatthe resulting fuzzy model not only achieves satisfactory accuracy, but also takes on good generalization capability. Copyrightq 2008 John Wiley & Sons, Ltd.

Received 24 May 2007; Revised 28 February 2008; Accepted 19 September 2008

KEY WORDS: fuzzy system identification; hybrid kernel function; support vector regression; genetic algorithm

1. INTRODUCTION

Since Zadeh proposed the conception of fuzzy setsin 1965 [1], fuzzy theory has been successfullyapplied in many fields to deal with uncertain andill-defined problems. Modeling techniques based onfuzzy sets have also played an increasingly signifi-cant role in occasions such as control, prediction andinference [2–4], etc. Based on the input–output datapairs {(xi , yi )|i=1,2, . . . ,n}, the data-driven fuzzymodeling, namely the fuzzy system identification,

∗Correspondence to: Wei Li, Institute of Automation, ShanghaiJiaotong University, Shanghai, China.

†E-mail: wei [email protected]

Contract/grant sponsor: National Basic Research Program ofPeople’s Republic of China; contract/grant number:2004CB720703

can be defined by extracting appropriate fuzzy rulesfrom these known data to express the unknownmappingfunction f , such that yi = f (xi ). Generally speaking,fuzzy system identification can be divided into twomain steps: structure identification and parametersoptimization. Structure identification puts emphases onthe partition of input–output space to achieve a propermodel structure. So far, there have been a number ofapproaches proposed to cope with this, such as a classof clustering-based methods, neuro-fuzzy methods,evolutionary computation [5–7], etc. However, forthe parameter optimization, various optimization tech-niques can be employed including gradient descent,artificial neural networks, genetic algorithm (GA), etc.Although these techniques can work well in somesituations, even now to design a fuzzy model withgood generalization ability in high-dimensional spaceis still a challenging research topic.

Copyright q 2008 John Wiley & Sons, Ltd.

HYBRID KERNEL LEARNING VIA GENETIC OPTIMIZATION 13

It is well known that support vector machine (SVM)has good generalization ability based on the principleof structure risk minimization [8]. Recently, researchershave proposed some methods for identifying fuzzysystem based on SVM. Chiang and Hao [9] presenteda fuzzy model that related the fuzzy basis functioninference system to SVM. The resulting model exhib-ited good performance in the cases of classification andprediction. Chen and Wang [10] proposed an SVM-based classification system named positive-definitefuzzy classifier. The fuzzy classifier represented as anadditive fuzzy system exhibited a good generalizationperformance. However, one challenging problem hasnot been addressed in these works, that is, how toselect proper SVM parameters to obtain an accuratestructure for the final fuzzy model. Particularly, ifabundant support vectors are produced, the resultingfuzzy model will be characterized by excessive fuzzyrules. Thus, we can manage to reduce the number ofsupport vectors to obtain a concise fuzzy model. Inthis paper, aiming to design a concise fuzzy modelwith good accuracy and generalization performance,we propose a new approach for identifying a TS fuzzysystem based on hybrid kernel learning via geneticoptimization. In our approach, a convex combination ofkernel functions instead of single kernel is adopted toimprove the performance of support vector regression(SVR). Moreover, an improved GA is proposed forparameter selection of SVR to obtain the least numberof support vectors. Consequently, a concise TS fuzzymodel can be achieved after identifying the consequentparameters with the least-square method (LSM).

The remainder of the paper is organized as follows.Section 2 presents the basics of SVR and hybrid kernelfunction. Section 3 describes an improved GA for multi-parameter optimization of SVR. Then, a new identi-fication method for TS fuzzy system is presented inSection 4. In Section 5, simulation results are providedto validate the performance of the fuzzy model. Finally,conclusions are drawn in Section 6.

2. SVR AND HYBRID KERNEL FUNCTION

Now we introduce some basics of SVR [11, 12]. Let{(x1, y1), . . . , (xl , yl)}⊂ Rn×R be a training data set;

SVR tries to find a regression model that can beexpressed as

f (x)=〈�·x〉+b (1)

where 〈., .〉 denotes an inner product in Rn , �∈ Rn ,b∈ R. To achieve the sparseness, an e-insensitive lossfunction |�|e is introduced:

|�|e={0 if |�|�e

|�|−e otherwise(2)

The task is thus to solve the following quadraticprogram:

min�,b,�i ,�

∗i

1

2‖�‖2+C

l∑i=1

(�i +�∗i )

s.t. yi −〈�·xi 〉−b�e+�i

−yi +〈�·xi 〉+b�e+�∗i

�i�0, �∗i �0, i=1, . . . , l (3)

where ‖�‖2 is a term that characterizes the modelcomplexity, the constant C>0 determines the tradeoffbetween the model complexity and accuracy. Aftersolving the constrained optimization problem, thevector � has the form: �=∑l

i=1 (�i −�∗i )xi ; therefore,

the regression function is

f (x)=l∑

i=1(�i −�∗

i )〈xi ·x〉+b (4)

where �i and �∗i are the Lagrange multipliers. These

points with (�i −�∗i ) �=0 are called as support vectors.

The nonlinear case can be achieved by employing amap � : Rn →F to map the original data xi into a high-dimensional feature space F . The map does not needto be known in advance since it is implicitly defined bya kernel function K : Rn×Rn → R, which correspondsto a dot product in some feature space:

K (x, xi )=〈�(x) ·�(xi )〉 (5)

Such a function K is also called the Mercer kernel [8].Substituting K (x, xi ) for 〈x, xi 〉, the regression function(4) then becomes:

f (x)=l∑

i=1(�i −�∗

i )K (xi , x)+b (6)

Copyright q 2008 John Wiley & Sons, Ltd. Int. J. Adapt. Control Signal Process. 2010; 24:12–19DOI: 10.1002/acs

14 W. LI AND Y. YANG

The characteristics of the SVR are influenced greatlyby the type of kernel function and its parameters beingused. There are two main types of kernel functions:local and global kernel [13]. The values of local kernelsare influenced by the close points. On the contrary, thevalues of global kernels have been greatly influencedby points that are far away from each other. The radialbasis function (RBF) kernel is a typical local kernel

K1(x, y)=exp

(−‖x− y‖2

2�2

)(7)

and the one-order homogeneous polynomial kernel isa typical global kernel:

K2(x, y)=(x · y) (8)

In order to exert the advantages of the both, an intu-itionistic method is to combine these two commonlyused kernels in some admissible way. In this paper, weuse the convex combination of them as:

K (x, y)=�K1(x, y)+(1−�)K2(x, y) (0��1) (9)

It can be proved that a non-negative linear combinationof Mercer kernels is also a Mercer kernel [12]. As aresult, for an SVR with such a hybrid kernel, thereare three parameters C , � and � to be tuned under aspecified insensitive level e. Obviously, the analyticalsolution cannot be acquired easily. In the next section,we will propose an improved GA to tackle with it.

3. IMPROVED GA FOR PARAMETERSSELECTION OF SVR

GA is a directed random search technique that has beenwidely applied in various optimization problems. Manyimproved GA algorithms have been reported by usingdifferent selection, crossover and mutation mechanisms[14]. In this paper, we improve the standard GA byadopting a heuristic crossover and mutation strategy toreduce iteration times and raise evolution speed, andthe elite strategy is utilized to ensure that the geneticprocess convergences.

The initial population is a potential solution set andis usually generated randomly. Each chromosome inthe population will be evaluated by a predefined fitness

function. In this work, to achieve a concise fuzzy model,we hope that the number of support vectors is the leastpossible under a specified insensitive loss level; hence,the fitness function can be defined as

f (�)=1− �

l(10)

where � is the number of support vectors and l is thesize of training data set of SVR. Obviously, the less �makes the bigger f . Then some chromosomes will beselected for reproduction by the roulette wheel selec-tion. Thus, the individual with a higher fitness value hasa higher chance to be selected. In order to guaranteethe convergence, the individual with the finest value iskept down, that is, the elite individual is preserved.

Now, we expatiate on the heuristic crossover andmutation. In standard GA, the gene positions forcrossover and mutation are produced randomly, whichis an inefficient strategy actually. In SVR, C is a regu-larization parameter to determine the tradeoff betweenthe model complexity and accuracy. However, a trivialvariance of C nearly has no influence on the finalregression model according to abundant experiments,that is, the parameter has an insensitive variance step.This also happens to � and �. Hence, we can restrict arange of gene cluster for crossover and locate a moreappropriate gene position for mutation to speed up thesearch process. When adopting the binary coding, theadmissible search range of the parameter is [a,b], andthe insensitive step is �, we can get a range of thelength L of the gene cluster for the genetic operator

H�L�⌊log2

(2H ·�b−a

)⌋(11)

where H is the length of chromosome, .� is the nota-tion of a step function, e.g. a� is equal to the largestinteger less than or equal to a. A random number will beproduced in that range and this number will determinethe length of the gene cluster for crossover operator.Similarly, the mutation operator will only happen ona random gene position which also locates in such arange.

In addition, considering the three parameters to beoptimized, we should adopt joint-coding strategy torealize joint optimization. If H is the length of thechromosome for one parameter, the total length of the



chromosome for three parameters will be 3H . Conse-quently, a multi-point crossover and mutation strategyshould be applied. That is, for every chromosome to beoperated, the crossover and mutation will happen threetimes on its three subsections that belong to the threedifferent parameters, respectively.

Based on the above analysis, an improved GA forthe parameter selection of the SVR can be summarizedas follows:

Step 1: Determine the admissible search ranges ofC , � and � specify the insensitive loss parameter e.Initialize the maximal generation number M , popula-tion size N and crossover and mutation probability.

Step 2: According to the insensitive step of eachparameter, use Equation (11) to get a limited range.Initialize the first generational population randomly andget initial parameter values.

Step 3: Use SVR to find support vectors. Calcu-late the fitness value according to Equation (10). Ifthe fitness value of the optimum individual has noapparent increase or the maximum generation has beenproduced, the optimum C , � and � are obtained. Other-wise, go to Step 4.

Step 4: Utilize the aforementioned selection,crossover and mutation operators to produce a newpopulation. Figure out the parameter values, go toStep 3.

4. IDENTIFICATION ALGORITHM FOR TSFUZZY SYSTEM

A one-order TS fuzzy model is characterized by linearconsequences [15]. Generally, its fuzzy rules are givenby the following:

Ri : IF x1 is Ai1 and x2 is Ai

2 and . . .and xn is Ain

THEN yi =(ci )T x (12)

where ci =[c0i , . . . ,cni ]T∈ Rn+1, i=1, . . . , p· x=[1, x]=[1, x1, . . . , xn]T∈ Rn+1, Ai is the Gaussianmembership function

Ai (x)=exp

[− (x−i )2

22i

](13)

where i and i are real-value parameters. We adopt theproduct inference and weighted average defuzzificationmethod, the overall output of the fuzzy model is inferredas follows:

y(x)=∑p

i=1

∏nj=1 A

ij (x)yi∑p

i=1

∏nj=1 A

ij (x)

(14)

Now, we give the specified procedure of identifyingTS fuzzy system based on the above theories.

Algorithm 4.1 (TS fuzzy system identification usingSVR and an improved GA) Input: A set of trainingdata samples {(x1, y1), (x2, y2) . . . (xl , yl)}, xi =[x1i , x2i , . . . , xni ]⊂ Rn , yi⊂R, i=1,2, . . . , l.

Output: A set of TS fuzzy rules parameterized bya j , j , c j , p( j =1,2, . . . , p). a j ∈ Rn is the center ofthe Gaussian membership functions of the j th rulepremise, j is the width parameter, c j ∈ Rn+1 containsthe parameters of consequent linear function of the j thrule and p is the number of fuzzy rules.

Steps:

(1) Use the improved GA to find the optimumparameters of SVR.

(2) Utilize the SVR with the optimum parameters tofind support vectors.

(3) Determine the number of fuzzy rules and param-eters of the premise: The number of supportvectors means the number of fuzzy rules; thewidth parameter of Gaussian membershipfunction is decided by kernel parameter �; andthe center positions of membership functionsare the support vector points.

(4) Estimate the consequent linear parameters c j ofj th rule using LSM to minimize root mean-square error (RMSE)

RMSE=√1

l

l∑i=1

(yi − yi )2 (15)

where yi and yi are the system and the modeloutput, respectively. A least-square solution canbe obtained by

c j =(XTX)−1XTY (16)



where X=[x ′1, . . . , x ′l ]T, Y=[y1, . . . , yl ]T,x ′i=[�1i , . . . ,�pi , x j

1 �1i , . . . , xj1 �pi , . . . , x

jn �1i , . . . ,

x jn �pi ], �i j =� j (xi )/

∑pj=1� j (xi ), � j (xi )=∏n

j=1 A j (xi ). An existing efficient Kalman filteralgorithm can be employed to solve c j as

ck+1j = ckj +Sk+1X

Tk+1(Yk+1−Xk+1c

kj )

Sk+1 = Sk− Sk XTk+1(Xk+1)Sk

1+Xk+1Sk XTk+1

,

k=0,1, . . . , l−1

(17)

Xk+1 and Yk+1 are the (k+1)th row vector of Xand Y , respectively. Sk is the gain matrix. Theinitial conditions are selected as c0j =0, S0=�I ,where � is a positive large number and I is anidentity matrix. After l times of iteration, c j canbe obtained;

(5) Extract p fuzzy rules expressed as (12), then useEquation (14) to get model outputs.

It can be seen from the above algorithm that thestructure identification is achieved by using SVR. Thus,on the one hand, it can efficiently deal with the high-dimensional problems and avoid the ‘curse of dimen-sionality’. On the other hand, the structure risk mini-mization principle is also beneficial for improving thegeneralization ability of the fuzzy model. In the nextsection, we will present two numerical examples toillustrate the effectiveness of the fuzzy model.

5. SIMULATION RESULTS

5.1. Modeling of a 2-input nonlinear function

In this example, we consider the following nonlinearfunction:

f (x1, x2)=(1+x−21 +x−1.5

2 )2 (18)

From input ranges [1,5]×[1,5] of (18), 50 data pairsare obtained and treated as training data. For giveninsensitive level e=0.5, the improved GA is used tofind the optimum parameters. The length of chromo-some for each parameter is 10 and the total size is 30.

0 5 10 15 20 25 30 35 40 45 504

4.5

5

5.5

6

6.5

7

7.5

8

No. of Iterations

Bes

t Val

ue

the improved GAthe standard GA

Figure 1. Simulation result of standard GA and theimproved GA.

The population size used for the improved GA is 10.The probability of crossover and mutation is 0.6 and0.1, respectively. The admissible search range of C , �and � is [1,50], [0.01,5] and [0,1]. The insensitive stepfor each of them is 3, 0.01 and 0.1, respectively. After 50iterations, five support vectors can be obtained, and theoptimum parameter set (C,�,�)=(36.85,2.84,0.73).Comparatively, we set the same parameters for the stan-dard GA to deal with the same task. Figure 1 showsthe variance of the best value (the number of supportvectors) of every iteration of these two. The solid lineis the standard GA and the dashed line is the improvedGA. It can be seen that the improved GA is uniformconvergence and has stronger search ability.

To show the advantages of the hybrid kernel, wefirst conduct experiments by employing the standardSVR with different single kernels. Parameters areset as above. As a result, the SVR with RBF kernelproduces six support vectors with RMSE equaling to0.35, and the SVR with the one-order homogeneouspolynomial kernel produces nine support vectors withRMSE equaling to 0.39. However, the SVR with thehybrid kernel function produces five support vectorswith RMSE equaling to 0.33. Thus, it can be seen thatthe use of hybrid kernel can improve the performanceof SVR.

By using our identification algorithm, we can get aTS fuzzy model with five rules and RMSE equaling



Table I. Parameter values of TS fuzzy model with fiverules.

Premise part Consequent part

Rule (1i ,2i ) i (c0i ,c

1i ,c

2i )

1 (1.82,1.11) 2.84 (7.85,0.31,−3.35)2 (3.81,1.32) 2.84 (4.23,−0.21,−0.59)3 (1.64,4.51) 2.84 (2.91,−0.26,−0.12)4 (2.26,1.86) 2.84 (4.47,−0.43,−0.51)5 (1.28,1.62) 2.84 (9.18,−1.95,−1.34)

0 10 20 30 40 501

1.5

2

2.5

3

3.5

4

4.5

5

data

outp

ut

support vectors

real values

approximate values

Figure 2. The approximation results for the nonlinearfunction.

to 0.117. The parameter values of the five fuzzy rulesare listed in Table I. Figure 2 shows the approximationresults on these 50 training data points. The squaresdenote the location of support vectors. The stars are thereal values and the circles are the approximate values.Based on that our fuzzy model can approximate thenonlinear function fairly well. A further comparisonof previous fuzzy modeling techniques is presented inTable II in terms of accuracy and complexity (numberof fuzzy rules). It can be seen that our model not onlyachieves the best accuracy, but also costs less fuzzyrules.

Table II. Comparison with other methods for estimatingthe nonlinear function.

Type Rules RMSE

Sugeno and Yasukawa [5] 6 0.281Gomez-Skarmeta et al. [16] 5 0.266Chan et al. [17] 6 0.324Kim and Won [18] 5 0.171Our model 5 0.117

Table III. The obtained optimum parameter sets for SVR.

Training set size C � �

50 44.77 1.52 0.02100 48.97 0.87 0.31150 49.41 1.26 0.19

5.2. Mackey-Glass chaotic time series prediction

In this example, we design a TS fuzzy model usingAlgorithm 4.1 to predict a chaotic time series: Mackey-Glass chaotic time series (n=4). The time series usedhere is generated by the following differential delayequation:

d(x(t))

dt= 0.2x(t−�)

1+x10(t−�)−0.1x(t) (19)

The task is to use known values of the timeseries {x(t−(n−1)�), . . . , x(t−�), x(t)} to predict anunknown value x(t+P). By setting �=17, �=1 andP=1, the task is to use the time series (x(t−4), x(t−3), x(t−2), x(t−1)) to predict an unknown value x(t).From t=501 to t=1000, we obtain 500 data pairs.To validate the generalization ability of our model inthe small sample case, we carry out three experiments.The first 50, 100, 150 data pairs are used as trainingdata sets, respectively, and the following 450, 400, 350data pairs are used as checking data sets. Under theinsensitive loss level e=0.01, the initial parameters forthe improved GA are set as the same as the previousexample. The final obtained optimum parameter setsare listed in Table III in terms of different sizes oftraining data set.



Table IV. Comparison with other methods for time series prediction.

RMSE

Training set size Type Training data Checking data Number of rules

50 ANFIS 0.00022 0.01677 16Kim’s model 0.00172 0.00523 7Our model 0.00144 0.00443 7



500 550 600 650 700 750 800 850 900 950 10000

0.5

1

1.5

time (sec)

N=50

500 550 600 650 700 750 800 850 900 950 10000

0.5

1

1.5

time (sec)

N=100

500 550 600 650 700 750 800 850 900 950 10000

0.5

1

1.5

time (sec)

N=150

Real value Predicted value Support vectors



Figure 3. Simulation results with different training set size N .



Because Kim’s model performs better than the othermethods in the previous example, for this data set westill make a comparison with it. In addition, adap-tive neuro-fuzzy inference system (ANFIS) proposedby Jang [6] is one of the most popular neuro-fuzzymodels. Hence, we further compare our model with theANFIS in terms of training accuracy, checking accuracyand complexity. For Kim’s model, the initial parametervalues are selected as C=30, e=0.01, �=1.2, and forANFIS we adopt grid partition, then at least 16 fuzzyrules are produced for the 4-input system. By usingour algorithm, the obtained fuzzy model has less fuzzyrules. All simulation results are listed in Table IV. It canbe found that our model not only achieves higher accu-racy than Kim’s model, but also costs less fuzzy rules.Although the training accuracy of ANFIS is higher, ourmodel exhibits better generalization ability than ANFISaccording to results on checking data. Figure 3 showsthe prediction results when setting different sizes oftraining set. It can be see that the prediction ability ofour model is also quite good in the small sample case.

6. CONCLUSIONS

In this paper, a new approach to TS fuzzy system iden-tification has been presented. Unlike previous SVM-based fuzzy modeling techniques, we not only adopta hybrid kernel function for SVR, but also proposean improved GA for the selection of multiple param-eters. These procedures effectively reduce the numberof support vectors. Consequently, the constructed TSfuzzy model can be more concise. In addition, due tothe theoretic basis of SVR, the resulting fuzzy modelalso exhibits satisfactory accuracy and good general-ization ability according to the simulation results.

ACKNOWLEDGEMENTS

This work has been supported by the National BasicResearch Program of People’s Republic of China undergrant 2004CB720703. The authors are grateful to theanonymous reviewers for their very helpful comments andconstructive suggestions with regard to this paper.

REFERENCES

1. Zadeh LA. Fuzzy sets. Information and Control 1965; 8:338–353.

2. Attia AF, Abdel-Hamid R, Quassim M. Prediction of solaractivity based on neuro-fuzzy modeling. Solar Physics 2005;227(1):177–191.

3. Hwang JP, Kim E. Robust tracking control of anelectrically driven robot: adaptive fuzzy logic approach. IEEETransactions on Fuzzy Systems 2006; 14(2):232–247.

4. Rattasiri W, Wickramarachchi N, Halgamuge SK. Anoptimized anti-lock braking system in the presence of multipleroad surface types. International Journal of Adaptive Controland Signal Processing 2007; 21(6):477–498.

5. Sugeno M, Yasukawa T. A fuzzy-logic-based approach toqualitative modeling. IEEE Transactions on Fuzzy Systems1993; 1(1):7–31.

6. Jang J. ANFIS: adaptive-network-based fuzzy inferencesystem. IEEE Transactions on Systems, Man, and Cybernetics1993; 23(3):665–685.

7. Bastian A. Identifying fuzzy models utilizing geneticprogramming. Fuzzy Sets and Systems 2000; 113(3):333–350.

8. Vapnik VN. The Nature of Statistical Learning Theory.Springer: New York, 1995.

9. Chiang JH, Hao PY. Support vector learning mechanismfor fuzzy rule-based modeling: a new approach. IEEETransactions on Fuzzy Systems 2004; 12(1):1–12.

10. Chen YX, Wang JZ. Support vector learning for fuzzy rule-based classification systems. IEEE Transactions on FuzzySystems 2003; 11(6):716–728.

11. Smola AJ, Schoelkopf B. A tutorial on support vectorregression. Statistics and Computing 2004; 14(3):199–222.

12. Cristianini N, Shawe-Taylor J. An Introduction to SupportVector Machines and other Kernel-based Learning Methods.Cambridge University Press: Cambridge, MA, 2000.

13. Brailovsky VL, Barzilay O, Shahave R. On global, local,mixed and neighborhood kernels for support vector machines.Pattern Recognition Letters 1999; 20(11):1183–1190.

14. Davis L. Handbook of Genetic Algorithm. Van NostrandReinhold: New York, 1991.

15. Takagi T, Sugeno M. Fuzzy identification of systems andapplications to modeling and control. IEEE Transactions onSystems, Man, and Cybernetics 1985; 15:116–132.

16. Gomez-Skarmeta AF, Delgado M, Vila MA. About the useof fuzzy clustering techniques for fuzzy model identification.Fuzzy Sets and Systems 1999; 106(2):179–188.

17. Chan WC, Cheung KC, Haris CJ. On the modeling ofnonlinear dynamic systems using support vector neuralnetworks. Engineering Applications of Artificial Intelligence2001; 14(2):105–113.

18. Kim J, Won S. New fuzzy inference system using a supportvector machine. Proceedings of the 41st IEEE Conferenceon Decision and Control, Las Vegas, NV, U.S.A., 2002;1349–1354.


Hybrid kernel learning via genetic optimization for TS fuzzy system identification

Documents

Transcript of Hybrid kernel learning via genetic optimization for TS fuzzy system identification