
Mechanical Systems and Signal Processing (2001) 15(2), 265–273. doi:10.1006/mssp.2000.1289, available online at http://www.idealibrary.com

MODEL ORDER SELECTION: A PRACTICAL APPROACH

Z. CHEN AND C. K. MECHEFSKE

Department of Mechanical and Materials Engineering, The University of Western Ontario, London, Ontario, Canada N6G 5B9. E-mails: chenz@engga.uwo.ca, c.mechefske@uwo.ca

(Received 14 January 2000, accepted 26 January 2000, published online 23 January 2001)

A practical model order selection criterion for autoregressive processes is presented in this paper. The criterion was developed based on the normalised error between sample data and data generated by the model. The method is an excellent indicator of how well an autoregressive model fits the available data and is therefore a useful tool when selecting the optimum model order needed to accurately and efficiently model the underlying process of the sampled data. The procedure is different from the existing criteria that are in common use such as the Akaike information criterion, the minimum descriptive length and Hannan's criterion. The paper shows that the criterion developed here performs well over a broad range of data sample lengths and signal-to-noise ratios.

© 2001 Academic Press

1. INTRODUCTION

Correct estimation of model order and parameters of a model in system identification and signal modelling has attracted much attention in the research community because of its broad application to many areas such as speech modelling, radar, sonar, spectrum estimation, signal processing, etc. In mechanical engineering, the parametric modelling approach to data analysis, in which model order selection is essential, is gaining in use as an effective procedure. It is particularly effective in situations where collection of sufficient data is difficult or impossible or where the signal strength is low relative to the background noise in the signal [1–3]. Parametric-model-based spectral estimation used in machine condition monitoring and machinery diagnostics also has the added advantage of allowing for model-based automatic diagnostic algorithms to be developed and employed [4].

There are a number of practical issues that must be considered before adopting a particular modelling technique. These primarily include the potential for real-time application of the analysis technique and the model order selection task. Model order selection is the first and a critical step toward the goal of correctly estimating a model. A model order that is too high will introduce false spike appearances and consume computing resources at an accelerated rate. A model order that is too low will produce results that will lack detail and may result in some of the system characteristics not being represented. Model order selection is therefore a key element in the effective use of any modelling-based estimation approach. The model order selection criteria used need to be well understood as a result of careful analysis before their use in any particular situation [1, 4].

2. MODEL ORDER SELECTION

The development of accurate and efficient model order selection criteria has been studied for at least the past four decades. Many different contributions have been made, a number of which stand out as hallmark achievements by scientists of world renown in this field. One of the pioneers in this work is Akaike [5–7], who first described the Akaike information criterion (AIC) in 1972 as a means of selecting the optimum model order when fitting sampled data to parametric models. Rissanen [8, 9] proposed the minimum descriptive length (MDL) criterion in 1978. Increasing the model order (the number of terms in the model) automatically improves the accuracy of the model fit to the sample data, but also increases the computational requirements. To balance these two effects, both the AIC and the MDL criterion include terms that penalise the model order selected as the model grows too large. In this way an optimum value for the model order is achieved. In 1979, Hannan and Quinn [10] introduced a criterion in which the penalty term is quantified between the two extremes of the AIC and the MDL. All of these criteria are based on, or partially based on, the maximum likelihood principle. There also exist many criteria derived from other theoretical bases that will not be described in this paper. Many of these are asymptotically equivalent to the AIC or the MDL.
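For reference, the forms of these criteria most commonly quoted for an autoregressive model of order $p$ fitted to $N$ samples are given below; they are standard textbook expressions rather than equations reproduced from this paper, and $\hat{\sigma}^2_p$ denotes the estimated prediction-error variance of the order-$p$ fit. The order that minimises the chosen criterion is selected:

$$\mathrm{AIC}(p) = N\ln\hat{\sigma}^2_p + 2p, \qquad \mathrm{MDL}(p) = N\ln\hat{\sigma}^2_p + p\ln N, \qquad \mathrm{HQ}(p) = N\ln\hat{\sigma}^2_p + 2cp\ln\ln N, \quad c > 1.$$

In each case the first term falls as the model order grows while the second penalises the added parameters; for large $N$ the MDL penalty ($p\ln N$) is the heaviest and the AIC penalty ($2p$) the lightest, with the Hannan–Quinn penalty lying between the two, which is the balance referred to above.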

The criteria described above are representative of many other model order selection criteria, but are generally considered to be the primary selection criteria used in the field of parametric model estimation. These criteria are derived from different perspectives and different theoretical bases. Each has its own merits as well as shortcomings in regard to its application. A numerical investigation of the performance of these three criteria in real-time applications was conducted [11] and the results showed that they predicted model orders accurately only in rather limited ranges in terms of the number of sample data points used and the signal-to-noise ratio (SNR).

This paper reports on a practical criterion that can be used to predict model order for autoregressive processes. As will be shown, the criterion performs well over a broad range in terms of the number of sample data points used and the signal-to-noise ratio.

3. A NORMALISED ERROR-BASED MODEL ORDER SELECTION CRITERION

The criterion developed in this section is based on the normalised prediction error. Assume that $S(n)$ is an autoregressive process, $W(n)$ is a Gaussian white noise process with zero mean, and $Y_p(n)$ is an autoregressive model of order $p$ that is intended to be used to describe the process $S(n)$. Also, let $\rho_p$ be the prediction error for model order $p$, and $\rho_0$ be the prediction error for model order zero. The normalisation of the prediction error at model order $p$, with respect to the prediction error at model order zero, results in equation (1):

$$\frac{\rho_p}{\rho_0} = \frac{\sum_{n=1}^{N}[S(n) + W(n) - Y_p(n)]^2 / N}{\sum_{n=1}^{N}[S(n) + W(n)]^2 / N} \qquad (1)$$

$$\frac{\rho_p}{\rho_0} = \frac{\sum_{n=1}^{N}[S^2(n) + W^2(n) + Y_p^2(n) + 2S(n)W(n) - 2S(n)Y_p(n) - 2W(n)Y_p(n)]}{\sum_{n=1}^{N}[S^2(n) + W^2(n) + 2S(n)W(n)]} \qquad (2)$$

The term $\sum 2S(n)W(n)$ in equation (2) equals zero (in both the numerator and the denominator) because the two processes are uncorrelated and because of the random nature of the noise. If we assume that the autoregressive model, $Y_p(n)$, perfectly describes the process $S(n)$ and nothing else, then the terms $Y_p(n)$ and $W(n)$ can also be considered uncorrelated. This leaves the term $\sum 2Y_p(n)W(n)$ equal to zero. The normalised prediction error can now be written as in equation (3):

$$\frac{\rho_p}{\rho_0} = \frac{\sum_{n=1}^{N}[S(n) - Y_p(n)]^2 + \sum_{n=1}^{N}W^2(n)}{\sum_{n=1}^{N}S^2(n) + \sum_{n=1}^{N}W^2(n)} \qquad (3)$$

TABLE 1
Signal-to-noise ratio and corresponding normalised prediction error

SNR (dB)      $\rho_p/\rho_0$
0             50%
10            9%
20            1%

Under the assumption that the model perfectly describes $S(n)$, the term $\sum_{n=1}^{N}[S(n) - Y_p(n)]^2$ in the numerator of equation (3) vanishes, giving equation (4); rewriting the remaining sums in terms of the signal and noise variances yields equation (5):

$$\frac{\rho_p}{\rho_0} = \frac{\sum_{n=1}^{N}W^2(n)}{\sum_{n=1}^{N}S^2(n) + \sum_{n=1}^{N}W^2(n)} \qquad (4)$$

$$\frac{\rho_p}{\rho_0} = \frac{\sigma^2_{\mathrm{noise}}}{\sigma^2_{\mathrm{signal}} + \sigma^2_{\mathrm{noise}}}. \qquad (5)$$

This result indicates that, if the model correctly predicts the process that generated the sample data that were used to build the model, the normalised prediction error is a function of the signal-to-noise ratio. Table 1 lists several corresponding values for the normalised prediction error and the signal-to-noise ratio (SNR).
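As a quick arithmetic check (added here for illustration; not part of the original paper), the Table 1 values follow directly from equation (5) with the SNR expressed in decibels as in equation (6) below:

```python
# Evaluate equation (5) at a few SNR values, with SNR = 10*log10(sigma_s^2 / sigma_n^2).
for snr_db in (0, 10, 20):
    ratio = 1.0 / (1.0 + 10.0 ** (snr_db / 10.0))   # sigma_n^2 / (sigma_s^2 + sigma_n^2)
    print(f"SNR = {snr_db:2d} dB  ->  rho_p/rho_0 = {100 * ratio:.1f}%")
# Prints 50.0%, 9.1% and 1.0%, matching Table 1 after rounding.
```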

It was assumed above that there was no correlation between the autoregressive model, $Y_p(n)$, and the Gaussian white-noise term, $W(n)$. The modelling algorithms effectively seek out the autoregressive components of the sample signal and use these to form the model while ignoring the noise components. This assumption cannot be fully justified given that the model is generated by applying mathematical algorithms to fit the underlying process, $S(n)$, and the noise process simultaneously. The autoregressive model, $Y_p(n)$, and the Gaussian white-noise term, $W(n)$, will be at least slightly correlated in most cases. As a result, the numerator in equation (3) should be less than the term $\sum W^2(n)$. That is, the variance of the noise will not be fully present in the numerator because the autoregressive model, $Y_p(n)$, will at least partially model the noise process as well. For this reason the true value of the normalised error will be smaller than the values listed in Table 1.

The normalised-error-based model order selection criterion presented above suggests a means for optimising model order selection without being dependent on the number of sample data points used or the signal-to-noise ratio in the sample data. Such independence from these factors could make this procedure very successful. A series of numerical experiments that test the procedure is described below.

4. PERFORMANCE OF THE CRITERION

This section serves to demonstrate how the normalised-error-based model order selection criterion works and to what extent satisfactory performance is achieved. To test this new criterion a number of numerical examinations were conducted, which include stationary processes as well as transient processes. The least-squares linear prediction method employed was the covariance method. The algorithm for solving the covariance equations adopted was the one presented by Marple [12]. For this particular work, the source codes of the routines used were modified to accommodate double-precision computations. The forward linear prediction error was the model prediction error that was normalised and employed for the selection criterion.


A Fortran 77 program was written to accomplish the computational work. As mentioned previously, equation (5) can only serve as a guideline. The value of the normalised error needs to be determined through experiment. A number of numerical experiments were conducted in order to extract a value that is suitable for most situations. It was observed that a value of normalised error between 26.5 and 30% works well with all the processes tested. As a result, a normalised error of 28% was the limit used throughout the results reported in this paper. To simulate real-time application, a selection criterion value of 28% was inserted in the program and the model orders were selected automatically by the program.
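The Fortran 77 program itself is not listed in the paper. The sketch below is a minimal Python rendering of the same selection loop, assuming an ordinary least-squares fit on the covariance-method data matrix as a stand-in for Marple's algorithm [12]; the function names, the use of numpy.linalg.lstsq and the max_order limit are illustrative choices, not the authors' implementation:

```python
import numpy as np

def forward_prediction_error(x, p):
    """Mean-square forward linear prediction error of an AR(p) fit.

    The AR coefficients are estimated by ordinary least squares on the
    covariance-method data matrix (no windowing of the data); this is a
    simple stand-in for the Marple covariance algorithm used in the paper.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    if p == 0:
        return np.mean(x ** 2)                    # rho_0: zero-order model error
    # Predict x[n] from its p previous samples, for n = p, ..., N-1.
    A = np.column_stack([x[p - k - 1:N - k - 1] for k in range(p)])
    b = x[p:]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.mean((b - A @ coeffs) ** 2)         # rho_p

def select_order(x, threshold=0.28, max_order=50):
    """Lowest order whose normalised error rho_p/rho_0 falls below the threshold."""
    rho_0 = forward_prediction_error(x, 0)
    for p in range(1, max_order + 1):
        if forward_prediction_error(x, p) / rho_0 <= threshold:
            return p
    return max_order
```

With the threshold fixed at 28% the selection is fully automatic, mirroring the way the value was embedded in the authors' program; the reported 26.5–30% working range suggests the exact threshold is not critical.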

Four examples are presented as a demonstration of the new procedure's accuracy and efficiency. All the figures are shown in three dimensions that represent the model order, the signal-to-noise ratio (SNR) and the number of sample data points used. The SNR was calculated in each case according to

$$\mathrm{SNR} = 10\log_{10}\frac{\sigma_s^2}{\sigma_n^2}. \qquad (6)$$

In equation (6) the terms $\sigma_s$ and $\sigma_n$ are the standard deviations of the signal and the noise, respectively.

The input data for the four examples were generated using equations (7)–(10). All the equations used contain a noise term, the strength of which is calculated according to the different desired SNRs. Equations (7) and (8) consist of three and five frequency components, respectively, while equation (9) is composed of eight frequency elements. All frequencies are relatively well spaced and the amplitudes of the individual components are different to ensure that whatever error occurs is not due to the source data. Equation (10) is composed of six distinct frequency components, with the individual frequencies being paired together into three sets. The amplitudes of each pair are the same. The purpose of these tests is to demonstrate that, while the new criterion can be shown to work well in a controlled case, it also works well in a relatively difficult situation:

x (t)"10 cos(25t ) 2n#n)#5 cos(10t ) 2n#n/2)#3 cos(40t ) 2n#3n/4)#w(t) (7)

x(t)"10 cos(25t ) 2n#n)#5 cos(10t ) 2n#n/2)#3 cos(40t ) 2n#3n/4)

#6 cos(20t ) 2n#n/8)#8 cos(30t ) 2n#3n/4)#w (t) (8)

x(t)"10 cos(25t ) 2n#n)#5 cos(10t ) 2n#n/2)#3 cos(15t ) 2n#3n/4)

#6 cos(20t ) 2n#n/8)#8 cos(30t ) 2n#3n/4)#8 cos(35t ) 2n#5n/8)

#3 cos(40t ) 2n#3n/2)#9 cos(45t ) 2n#3n/8)#w (t) (9)

x(t)"10 cos(25t ) 2n#n)#5 cos(10t ) 2n#n/2)#3 cos(40t ) 2n#3n/4)

#10 cos(23t ) 2n#n/4)#5 cos(8t ) 2n#n/8)#3 cos(38t ) 2n#3n/2)#w (t)

(10)
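For illustration (again a sketch, not the authors' code), the three-component signal of equation (7) can be generated with the white-noise term scaled to a prescribed SNR according to equation (6). The 200 Hz sampling rate and the 256-point length are assumptions made for this example only; the paper does not state the sampling parameters:

```python
import numpy as np

def test_signal_eq7(n_samples, snr_db, fs=200.0, seed=0):
    """Three-component signal of equation (7) plus white noise at the given SNR."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    s = (10 * np.cos(2 * np.pi * 25 * t + np.pi)
         + 5 * np.cos(2 * np.pi * 10 * t + np.pi / 2)
         + 3 * np.cos(2 * np.pi * 40 * t + 3 * np.pi / 4))
    # Equation (6): SNR = 10 log10(sigma_s^2 / sigma_n^2)  =>  sigma_n = sigma_s / 10^(SNR/20)
    sigma_n = np.std(s) / (10.0 ** (snr_db / 20.0))
    return s + rng.normal(0.0, sigma_n, n_samples)

x = test_signal_eq7(256, snr_db=10.0)
```

Feeding such signals to the select_order() sketch given earlier, over a grid of sample lengths and SNRs, reproduces the kind of three-dimensional order-selection surface shown in the figures; for a three-sinusoid signal the expected autoregressive order is 6 (two poles per sinusoid), consistent with the discussion in Section 5.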

The results are presented in Figures 1–4 following the sequence of equations (7)–(10). For the purpose of comparison, the well-known model order selection criteria, AIC, MDL and Hannan's, were applied to signals generated by using equation (7) and the results are displayed in Figures 5–7.


Figure 1. Selected orders of the three-component case [equation (7)].

Figure 2. Selected orders of the five-component case [equation (8)].

Figure 3. Selected orders of the eight-component case [equation (9)].


Figure 4. Selected orders of the six-component case [equation (10)].

Figure 5. Selected order [equation (7)] by AIC.

Figure 6. Selected order [equation (7)] by MDL.


Figure 7. Selected order [equation (7)] by Hannan's criterion.


5. DISCUSSION

From the "gures presented in the previous section it is clear that one consistentcharacteristic of all the results is that at a signal-to-noise ratio of zero there is a distinctchange in the model order required to accurately represent the underlying process. In thenegative SNR regions the model orders selected by applying the normalised predictionerror criterion are well over the true order. This is due to the fact that in this region the noiseis stronger than the signals themselves. The autoregressive model begins to model the noiseas well rather than distinguishing the signals from the noise and modelling only the signals.Situations such as this where noise is much stronger than the signal itself, are not commonin normal applications. The incorrect prediction of the model order in this region is nota great concern.

The positive SNR region is where the new model order selection criterion is most likely to be applied. In this region, for all of Figures 1–4, the model orders selected were generally correct [accurately reflecting the underlying processes of equations (7)–(10)] except for a small area where the SNR and the number of sample data points used were both small. In this region the models are all slightly overestimated. The reason for this overestimation is that there are not enough signal data available in the sample for the modelling procedure to accurately model the underlying process. The noise is relatively strong compared to the signal and the number of sample data points used in the modelling is rather small, resulting in a significant amount of noise being incorporated into the model. The model order is therefore pushed up.

Many other processes (stationary and transient), with various numbers of frequency components, were examined and they all showed the same pattern. The previous section only shows a selection of the results. Figure 4 shows the result of a case where the underlying process is made up of three pairs of closely spaced frequency elements. This situation represents a process that is inherently more difficult to model than the other three cases. The results show that the models produced by using the normalised prediction error model order selection criterion in this case are also excellent. This demonstrates the new method's capability to handle complex signals.

Figures 5–7 present the order selection results obtained by applying the three well-known criteria to the relatively simple case with three frequency components [equation (7)]. The correct order of the model should be 6, but it is clear that the results are neither accurate nor stable. Comparing these three figures with Figure 1 shows that the new criterion outperforms these criteria.

Since residuals are easily computed by modern system identification software packages such as the Matlab Identification Toolbox, it is worth noting that the purpose of this paper is to demonstrate the finding that the normalised residual is independent of the signal processes and that the practical criterion presented here works well.

6. CONCLUDING COMMENTS

This paper has presented the development details and preliminary numerical testing of a practical model order selection criterion for autoregressive processes. This new criterion was developed based on the normalised prediction error between sample data and model-generated data. The results show that the method is an excellent indicator of how well an autoregressive model fits the available data. This new method shows the potential to be a useful tool when selecting the optimum model order needed to accurately and efficiently model an underlying process based on sample data. The procedure is distinct from the existing criteria that are in common use such as the Akaike information criterion, the minimum descriptive length and Hannan's criterion. The results also show that the criterion developed here performs well over a broad range of data sample lengths and signal-to-noise ratios.

From the results presented, a few concluding comments can be made.

• A practical model order selection criterion based on the normalised prediction error was derived.

• The criterion works well in most of the positive SNR region, accurately and efficiently modelling the underlying processes with a minimum model order.

• The criterion is simple and does not require human judgement to be involved. It therefore has the potential to be directly applied in real-time applications.

• The preliminary results show that the new criterion outperforms the existing well-known criteria.

• Further work is planned to evaluate the performance of this technique when used with sample data representing real systems.

ACKNOWLEDGEMENTS

The authors wish to sincerely thank Professor S. M. Dickinson for his help and support during the course of this work and the Natural Sciences and Engineering Research Council of Canada (NSERC) for financial support.

REFERENCES

1. C. K. MECHEFSKE and J. MATHEW 1992 Mechanical Systems and Signal Processing 6, 297–307. Fault detection and diagnosis in low speed rolling element bearings, Part I: the use of parametric spectra.

2. C. K. MECHEFSKE 1993 British Journal of Non Destructive Testing 35, 503–507. Parametric spectral estimation for use in machine condition monitoring, Part I: the optimum vibration signal length.

3. C. K. MECHEFSKE 1993 British Journal of Non Destructive Testing 35, 574–579. Parametric spectral estimation for use in machine condition monitoring, Part II: the effect of noise in the vibration signal.

4. C. K. MECHEFSKE and J. MATHEW 1992 Mechanical Systems and Signal Processing 6, 309–316. Fault detection and diagnosis in low speed rolling element bearings, Part II: the use of nearest neighbour classification.

5. H. AKAIKE 1971 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, May, 267–281. Information theory and an extension of the maximum likelihood principle.

6. H. AKAIKE 1974 IEEE Transactions on Automatic Control AC-19, 716–723. A new look at statistical model identification.

7. H. AKAIKE 1969 Annals of the Institute of Statistical Mathematics 21, 407–419. Power spectrum estimation through autoregressive model fitting.

8. J. RISSANEN 1978 Automatica 14, 465–471. Modeling by shortest data description.

9. J. RISSANEN 1983 The Annals of Statistics 11, 416–431. A universal prior for integers and estimation by minimum description length.

10. E. J. HANNAN and B. G. QUINN 1979 Journal of the Royal Statistical Society 41, 190–195. The determination of the order of an autoregression.

11. Z. CHEN and C. K. MECHEFSKE 2000 CSME Forum 2000, Montreal, Canada, 16–19 May. A numerical comparison of model order selection criteria performance.

12. S. L. MARPLE 1987 Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice-Hall.