Dynamic control model of BOF steelmaking process based on ANFIS and robust relevance vector machine

Expert Systems with Applications 38 (2011) 14786–14798


Min Han *, Yao Zhao

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, 116023 Dalian, PR China

Keywords: Basic oxygen furnace (BOF) steelmaking; Dynamic control model; Adaptive-network-based fuzzy inference system (ANFIS); Robust relevance vector machine

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.05.071

* Corresponding author. Tel.: +86 411 84707847; fax: +86 411 84707847. E-mail address: [email protected] (M. Han).

Abstract

This study concerns the control of the basic oxygen furnace (BOF) steelmaking process and proposes a dynamic control model based on the adaptive-network-based fuzzy inference system (ANFIS) and a robust relevance vector machine (RRVM). The model aims to control the second blow period of BOF steelmaking and consists of two parts: the first calculates the values of the control variables, viz., the required amounts of oxygen and coolant, and the second predicts the endpoint carbon content and temperature of the molten steel. In the first part, an ANFIS classifier is first constructed to determine whether coolant should be added, and an ANFIS regression model is then used to calculate the amounts of oxygen and coolant. In the second part, a novel robust relevance vector machine is presented to predict the endpoint. RRVM addresses the sensitivity of the classical relevance vector machine to outliers and thus obtains higher prediction accuracy. The key idea of the proposed RRVM is to introduce an individual noise variance coefficient for each training sample. During training, the noise variance coefficients of outliers gradually decrease, which reduces the impact of outliers and improves the robustness of the model. Simulations on industrial data show that the proposed dynamic control model yields good results for the oxygen and coolant calculation as well as for endpoint prediction. The model is promising for use in the practical BOF steelmaking process.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Basic oxygen furnace (BOF) steelmaking is an important metallurgical technology. It is one of the most efficient methods of producing molten steel from hot metal and is also the pre-process of continuous casting and rolling. Because of its high productivity and low cost, almost 65% of the total steel in the world is to date produced by this method. In general, the aim of controlling BOF steelmaking is to guarantee the proper temperature and element contents of molten steel under the metallurgical standards. In practice, whether molten steel is acceptable mainly depends on the values of endpoint carbon content and temperature (Chou, Pal, & Reddy, 1993). After smelting, carbon content generally decreases from approximately 4% in hot metal to less than 0.08% in molten steel, and temperature increases from about 1250 °C to more than 1650 °C (Han & Huang, 2008). Methods are therefore required to control decarburization and temperature rise. In the early years, the process was controlled only according to the experience of operators, which often could not obtain satisfactory results. Thus, some mathematical models were developed to assist the operations. These models were mainly based on material balance and heat balance, which accounts for their name, "static control models". With the development of measuring techniques, many new sensors and devices, such as sonic sensors, automatic sublances and off-gas analysis, have been applied in BOF steelmaking to improve the control effect (Dippenaar, 1999; Iida et al., 1984). Based on these advanced measurements, the process control models have also been improved. The models are able to automatically calculate the required amounts of oxygen and auxiliary additions as well as predict the endpoint carbon content and temperature (Birk, Johansson, Medvedev, & Johansson, 2002; Blanco & Diaz, 1993; Chou, Pal, & Reddy, 1993; Johansson, Medvedev, & Widlund, 2000). To distinguish them from the static control models, the improved models are called "dynamic control models". Although superior to the static ones, the dynamic models are nevertheless based on physical and chemical laws, which inevitably have some inherent limitations in practical application. This is because the BOF steelmaking process couples heat transfer, mass transfer, and a large number of chemical reactions. This complexity makes the modeling and control of BOF steelmaking extremely difficult, and the process is too hard to reduce to sets of equations.

To overcome this difficulty and establish an exact mathematical relationship between the input and output variables of BOF steelmaking, data-driven models such as artificial neural networks (ANNs) are adopted. Because they can accurately identify complex and nonlinear dynamic systems (Narendra & Parthasarathy, 1990), ANNs are suitable for both modeling and control purposes in iron and steel making processes (Bloch, Sirou, Eustache, & Fatrez, 1997). Radhakrishnan and Mohamed (2000) utilized neural networks as soft sensors to predict the silicon and sulfur content of blast furnace hot metal, and created an expert control system to improve the hot metal quality. Pernía-Espinoza, Castejón-Limas, González-Marcos, and Lobato-Rubio (2005) proposed several robust learning algorithms to train neural networks and described the steel annealing process. As for BOF steelmaking, Cox, Lewis, Ransing, Laszczewski, and Berni (2002) used ANNs to predict oxygen and coolant requirements during the second blow period. Fileti, Pacianotto, and Cunha (2006) developed an inverse neural network model to calculate the end-blow process adjustments, and the model was successfully implemented on line in a Brazilian steelmaking plant. Das, Maiti, and Banerjee (2009) used ANNs with Bayesian regularization to predict the control action for a steelmaking furnace. Many successful applications of ANNs to steelmaking modeling have been reported in the literature; however, ANNs have some limitations. It is difficult to tune the structure parameters, which essentially affect the efficiency and prediction accuracy of ANNs. In addition, the models are sensitive to the initial parameter values.

In recent years, statistical learning theory has developed rapidly. It takes structural risk minimization as its principle and focuses on controlling the generalization ability of the learning process (Vapnik, 2000). Based on this theory, the support vector machine (SVM) was developed. It enhances the computational ability by using kernel functions and mapping the data into a high-dimensional space. Moreover, a regularization parameter C is defined to control the trade-off between model complexity and training error (Müller, Mika, Rätsch, Tsuda, & Schölkopf, 2001). SVM has therefore become a powerful tool for identifying nonlinear systems, and many successful applications have been reported (Esen, Ozgen, Esen, & Sengur, 2009; Vong, Wong, & Li, 2006; Zhang & Wang, 2008). In steelmaking process control, SVM has also delivered good performance. Yuan, Mao, and Wang (2007b) integrated multiple support vector machines with principal component regression to predict the endpoint parameters of electric arc furnace steelmaking. Valyon and Horváth (2009) proposed a sparse and robust extension of the least-squares SVM (LS-SVM) to calculate the amount of oxygen blown in BOF steelmaking, and demonstrated that the performance of LS-SVM was better than that of ANNs. However, despite its success, SVM has a number of significant practical disadvantages. For example, predictions are not probabilistic and the kernel function must satisfy Mercer's condition. The error/margin trade-off parameter C needs to be estimated by cross validation, which is time-consuming. Moreover, although SVM is relatively sparse, the number of support vectors still grows linearly with the size of the training sample set (Tipping, 2000). These disadvantages limit further applications of SVM.

To alleviate the above drawbacks, Tipping (2000, 2001) proposed the relevance vector machine (RVM). RVM is a nonlinear probabilistic model based on the Bayesian evidence framework. It uses the type-II maximum likelihood method, which is also referred to as the "evidence procedure" (Mackay, 1992a, 1992b), to optimize the hyperparameters of the model and obtain a sparse solution. The generalization performance of RVM is comparable to that of SVM, whereas RVM is a sparser model (Tipping, 2001). Owing to these advantages, RVM has obtained state-of-the-art results in different applications, such as microbiological fermentation (Sun & Sun, 2005), mechanical engineering (Yang, Zhang, & Sun, 2007), medical image processing (Wei, Yang, Nishikawa, Wernick, & Edwards, 2005) and fault diagnosis (Widodo et al., 2009). However, RVM has a serious weakness: it assumes that all of the training samples are subject to independent Gaussian noise, $\varepsilon \sim N(0, \sigma^2)$. A well-known disadvantage of the Gaussian noise model is that it is not robust. If the training samples are contaminated by outliers, the accuracy of the RVM model will be significantly compromised (Faul & Tipping, 2001). In this paper, a novel robust relevance vector machine (RRVM) is proposed, which assumes that each training sample has its individual coefficient of noise variance. During the model training procedure, the coefficients corresponding to outliers decrease drastically, so that outliers are detected and effectively eliminated. We utilize the proposed RRVM as an identifier to predict the endpoint carbon content and temperature of molten steel. In the BOF steelmaking process, measured data are often contaminated with outlying observations, while RRVM can reduce the impact of outliers and has good generalization ability (as the simulations will demonstrate). Therefore, it is suitable for constructing the endpoint prediction model.

On the other hand, the amounts of oxygen and coolant required in the second blow, which are considered as control variables, are critical to achieving the expected endpoint. Based on the control experience of operators and the production data of a steel plant, the adaptive-network-based fuzzy inference system (ANFIS) is adopted to calculate the values of these control variables. ANFIS can acquire knowledge from a set of input-output data and has competitive calculation accuracy (Jang, 1993). The proposed dynamic control model is implemented as follows. First, ANFIS is utilized to calculate the amounts of oxygen and coolant based on metallurgical standards and the parameters measured by the sublance. Then the calculation results and measured data are used as input variables of the RRVM model to predict the endpoint carbon content and temperature. If the predicted values are in the expected range, the control variables determined by ANFIS are accepted. Otherwise, the calculated control variables should be adjusted by operators in order to achieve the optimal values. By combining ANFIS and RRVM, a dynamic control model of the BOF steelmaking process is constructed. To achieve the expected control effect, the RRVM model must be well trained as an identifier that approximates the relationship between input and output and accurately predicts the endpoint carbon content and temperature. In the latter part of this paper, simulations will demonstrate that RRVM has good approximation ability and robustness.

The remainder of this paper is organized as follows: In Section 2, the production process of BOF steelmaking is briefly described. Section 3 presents the structure of the dynamic control model for the second blow period. Section 4 introduces the methods, ANFIS and RRVM, utilized in this paper. Some simulations on benchmark data and industrial data are given in Section 5. In Section 6, the conclusions are drawn.

2. Description of BOF steelmaking process

BOF comprises a vertical solid-bottom furnace with a vertical water-cooled oxygen lance entering the furnace from above. The furnace can be tilted for charging and tapping. Above the vessel, there are a hood and a duct for exhaust gas. The general view of the BOF is shown in Fig. 1 (Han & Huang, 2008). The molten steel capacity of a furnace generally ranges from 150 to 180 tons, and the whole production process is as follows:

Step 1: Approximately 20–30 tons of scrap and 120–130 tons of molten hot metal are charged into the furnace. The hot metal has been preprocessed for desulfurization.

Step 2: Oxygen is blown into the furnace through the lance at a rate of 500 cubic meters per minute. Meanwhile, burnt lime, dolomite and other auxiliary materials are added. On the surface of the hot metal, oxygen reacts with elements such as carbon, silicon and manganese to remove the impurities and raise the temperature continuously. Slag and exhaust gas are also generated.

Step 3: During the whole process, oxygen is blown for about 25 min. At a predetermined point in the blowing (about 85% of the whole blowing period), an automatic sublance is inserted into the molten steel to measure the temperature and take a sample. Carbon content is estimated by analyzing the sample. By comparing the estimated values with the expected endpoints, the amounts of oxygen and coolant for the second blow period are calculated.

Step 4: The blowing continues until the second blow period ends. Then the endpoint temperature and carbon content of the molten steel are measured for the second time. If the endpoint measurements are acceptable, the molten steel is tapped. Otherwise, additional oxygen is blown until the steel composition and temperature meet the technical standards. This period is called "reblow"; it consumes more oxygen and should be avoided.

Fig. 1. General view of basic oxygen furnace (BOF). Labeled parts: lance, exhausted gas hood, refractory body, slag, molten steel, tapping hole, sputtering.

The definitions of the different blowing periods are illustrated in Fig. 2. Throughout the whole process, there are three periods, namely first blow, second blow and reblow. The first blow period is also referred to as "main blow".

After the first blow, the temperature and carbon content of the molten steel are measured by the sublance to calculate the oxygen and coolant requirements of the second blow. In this paper, we design the dynamic control model to assist the second blow control. Based on the measured parameters, the model is able to calculate the appropriate amounts of oxygen and coolant as well as predict the expected endpoint carbon content and temperature. This avoids reblow and thus saves operation time and oxygen.

Fig. 2. The whole process of BOF steelmaking. Diagram labels: hot metal and scrap are charged; first blow (main blow); oxygen and additions; first measurement; second blow; oxygen and coolant; second measurement; reblow; tapping.

3. Structure of dynamic control model

As pointed out in Section 2, the aim of the dynamic control model is to accurately calculate the required volume of oxygen and the weight of coolant. Furthermore, it should achieve the expected endpoints at the end of the second blow, so the control model consists of two parts. The first part focuses on calculating the amounts of oxygen and coolant by using the ANFIS model. Input variables are the temperature and carbon content of molten steel measured at the end of the first blow. However, in practical operations, coolant is not always added; it depends on the current condition and expected endpoints (Fileti et al., 2006). We analyze 1000 groups of production data collected from Benxi Steel Sheet Co., Ltd. in China and illustrate the distribution histogram of the coolant data in Fig. 3. The data were collected from May 2008 to July 2008. The histogram shows that there are a number of cases in which no coolant is added. In addition, the ANFIS model cannot predict a zero value; hence whether coolant will be added should be determined first (Cox et al., 2002). This is a binary classification problem, and we also choose ANFIS as the classifier for this task.

Combined with the binary classifier, the block diagram of the dynamic control model is given in Fig. 4. When a new heat is under operation and the first blow period ends, the temperature and carbon content are measured by the sublance. First, the measured values are fed into the ANFIS oxygen predictor and coolant classifier as inputs to calculate the volume of oxygen and to determine whether coolant should be added. If coolant is needed, the ANFIS coolant predictor calculates how much coolant should be added. Next, based on the predicted amounts of oxygen and coolant, the endpoint model is used to predict the temperature and carbon content of the molten steel. If the predicted endpoint values are in the required range, namely a "hit", the amounts of oxygen and coolant calculated by ANFIS are used in the real system. Otherwise, the calculated amounts should be adjusted by operators and used to predict the endpoint again. This procedure may be repeated several times until the predicted temperature and carbon content are acceptable. The resulting amounts of oxygen and coolant are then used to control the steelmaking process. After the operation of the current heat, the next heat begins.
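The loop described above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions: the function name `second_blow_control`, the callables passed to it, and the `max_rounds` limit are hypothetical placeholders for the ANFIS models, the endpoint model, the hit test and the operator adjustment, none of which are specified here in code form.

```python
from typing import Callable, Tuple


def second_blow_control(
    temp: float,
    carbon: float,
    predict_oxygen: Callable[[float, float], float],
    needs_coolant: Callable[[float, float], bool],
    predict_coolant: Callable[[float, float], float],
    predict_endpoint: Callable[[float, float, float, float], Tuple[float, float]],
    is_hit: Callable[[float, float], bool],
    adjust: Callable[[float, float, float, float], Tuple[float, float]],
    max_rounds: int = 10,  # hypothetical safety limit, not from the paper
) -> Tuple[float, float, bool]:
    """Return (oxygen, coolant, hit) for one heat.

    All callables are stand-ins: the three ANFIS models, the endpoint
    predictor, the acceptance test and the operator's manual adjustment.
    """
    # ANFIS oxygen predictor and coolant classifier use the sublance values.
    oxygen = predict_oxygen(temp, carbon)
    coolant = predict_coolant(temp, carbon) if needs_coolant(temp, carbon) else 0.0

    for _ in range(max_rounds):
        # Endpoint model predicts temperature and carbon content.
        end_temp, end_carbon = predict_endpoint(temp, carbon, oxygen, coolant)
        if is_hit(end_temp, end_carbon):
            return oxygen, coolant, True
        # Otherwise the amounts are adjusted and the endpoint is predicted again.
        oxygen, coolant = adjust(oxygen, coolant, end_temp, end_carbon)
    return oxygen, coolant, False
```

Passing the models in as callables keeps the sketch independent of how ANFIS and the endpoint model are actually implemented.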

4. Methods

In this section, the methods utilized in the dynamic control model, ANFIS and RRVM, are described in detail. ANFIS is used to calculate the amounts of oxygen and coolant, whereas RRVM is used to predict the endpoint carbon content and temperature.

Fig. 3. Histogram of coolant added in the second blow for 1000 groups of production data. Horizontal axis: coolant added in the second blow (tons), 0 to 10; vertical axis: number of heats, 0 to 800.

4.1. Adaptive-network-based fuzzy inference system

Adaptive-network-based fuzzy inference system (ANFIS) is an off-line learning model. It has been widely used in modeling and control of nonlinear systems by constructing a set of fuzzy if-then rules with appropriate membership functions (Melin & Castillo, 2005; Mon, 2007). Generally, an ANFIS model consists of five layers. The architecture is shown in Fig. 5.

The fuzzy rules extracted from input–output pairs are described as (1).

$$R^r:\ \text{if } x_1 \text{ is } A_1^{s_1} \text{ and } x_2 \text{ is } A_2^{s_2} \ldots \text{ and } x_n \text{ is } A_n^{s_n} \text{ then } f_r = f_r(x_1, x_2, \ldots, x_n), \quad r = 1, \ldots, K \tag{1}$$

where $R^r$ denotes the $r$th fuzzy rule, and $A_1^{s_1}, \ldots, A_n^{s_n}$ are the fuzzy sets associated with the input variables $x_1, \ldots, x_n$. The function $f_r = f_r(x_1, x_2, \ldots, x_n)$ is the output of the $r$th fuzzy rule. The functions of the five layers are described as follows:

Layer 1: Input variables are fuzzified and the membership degrees of $x_l$ ($l = 1, \ldots, n$) in the different fuzzy sets are calculated according to formula (2).

$$\mu_l^{s_l} = \mu_{A_l^{s_l}}(x_l) \tag{2}$$

where $\mu_{A_l^{s_l}}(\cdot)$ denotes the membership function of variable $x_l$ on fuzzy set $A_l^{s_l}$ and $\mu_l^{s_l}$ is the membership degree.

Layer 2: Calculate the confidence degrees of the fuzzy rules. For the $r$th fuzzy rule, the degree of confidence is calculated by formula (3).

$$\omega_r = \mu_1^{s_1} \cdot \mu_2^{s_2} \cdots \mu_n^{s_n}, \quad r = 1, \ldots, K \tag{3}$$

Layer 3: All of the confidence degrees are normalized as:

$$\bar{\omega}_r = \omega_r \Big/ \left(\sum_{p=1}^{K} \omega_p\right), \quad r = 1, \ldots, K \tag{4}$$

Layer 4: Calculate the output of each fuzzy rule according to formula (5). Here Takagi–Sugeno type fuzzy rules are adopted.

$$f_r = p_1^r x_1 + p_2^r x_2 + \cdots + p_n^r x_n + q^r \tag{5}$$

where $p_1^r, p_2^r, \ldots, p_n^r$ and $q^r$ are fuzzy consequent parameters, which can be determined by least-squares regression.

Layer 5: Calculate the final output of ANFIS. It is the weighted sum of the $f_r$, with weights $\bar{\omega}_r$ ($r = 1, \ldots, K$).

$$y = \sum_{r=1}^{K} \bar{\omega}_r f_r \tag{6}$$
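As an illustration of Eqs. (2)–(6), the forward pass of a small ANFIS can be sketched as follows. Several choices here are assumptions made for the example and are not stated in the text: Gaussian membership functions with parameters `centers` and `widths`, the same number `m` of fuzzy sets per input, and a full rule grid with $K = m^n$ rules. The function name `anfis_forward` is likewise hypothetical.

```python
import itertools

import numpy as np


def anfis_forward(x, centers, widths, P, q):
    """Forward pass of a Takagi-Sugeno ANFIS (illustrative sketch).

    x       : (n,) input vector
    centers : (n, m) centers of assumed Gaussian membership functions
    widths  : (n, m) widths of the same membership functions
    P, q    : (K, n) and (K,) consequent parameters, K = m**n rules
    """
    x = np.asarray(x, dtype=float)
    n, m = centers.shape

    # Layer 1, Eq. (2): membership degree of x_l in each of its m fuzzy sets.
    mu = np.exp(-0.5 * ((x[:, None] - centers) / widths) ** 2)

    # Layer 2, Eq. (3): rule confidence = product of one membership per input.
    rules = list(itertools.product(range(m), repeat=n))
    omega = np.array([np.prod(mu[np.arange(n), list(s)]) for s in rules])

    # Layer 3, Eq. (4): normalize the confidence degrees.
    omega_bar = omega / omega.sum()

    # Layer 4, Eq. (5): linear (Takagi-Sugeno) rule outputs.
    f = P @ x + q

    # Layer 5, Eq. (6): weighted sum of the rule outputs.
    return float(omega_bar @ f)
```

Because the normalized weights sum to one, the output reduces to a single linear function whenever all rules share the same consequent parameters, which gives a simple sanity check.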

4.2. Classical relevance vector machine

The relevance vector machine (RVM) is a probabilistic model based on Bayesian theory. Consider a data set of input-target pairs $\{x_i, t_i\}_{i=1}^{N}$ as the training samples, where $x_i \in \mathbb{R}^n$ denotes an $n$-dimensional input vector and $t_i \in \mathbb{R}$ denotes a scalar measured output. Furthermore, assume that the targets are independently sampled from the regression model with additive noise $\varepsilon_i$ as follows:

$$t_i = y(x_i; w) + \varepsilon_i \tag{7}$$

where $\varepsilon_i$ is assumed to be zero-mean Gaussian noise with variance $\sigma^2$, namely $\varepsilon_i \sim N(\varepsilon_i \mid 0, \sigma^2)$. Similar to SVM, the prediction function $y(x; w)$ of RVM is defined as a linear combination of weighted basis functions:

$$y(x; w) = \sum_{i=1}^{N} w_i K(x, x_i) + w_0 \tag{8}$$

where $K(x, x_i)$ is a basis function, effectively defining one basis function for each sample in the training data set. The weight parameter vector is defined as $w = [w_0, \ldots, w_N]^T$. According to Eq. (7) and the noise assumption on $\varepsilon_i$, $t_i$ follows a Gaussian distribution with mean $y(x_i; w)$ and variance $\sigma^2$, viz., $p(t_i \mid x_i) = N(t_i \mid y(x_i; w), \sigma^2)$. For convenience, a hyperparameter $\beta$ is defined as $\beta = 1/\sigma^2$. Therefore, the likelihood function of the complete training data set is expressed as

$$p(t \mid w, \beta) = \left(\frac{\beta}{2\pi}\right)^{N/2} \exp\left(-\frac{\beta}{2}\|t - \Phi w\|^2\right) \tag{9}$$

where $t = [t_1, t_2, \ldots, t_N]^T$ and $\Phi \in \mathbb{R}^{N \times (N+1)}$, defined as $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]^T$, is called the design matrix. Here $\phi(x_i) = [1, K(x_i, x_1), K(x_i, x_2), \ldots, K(x_i, x_N)]^T$, $i = 1, \ldots, N$.
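To make the definition of the design matrix concrete, the following sketch builds $\Phi$ row by row from $\phi(x_i)$. The kernel is not fixed by the text above, so a Gaussian kernel with a hypothetical `width` parameter is assumed purely for illustration; the function names are also placeholders.

```python
import numpy as np


def gaussian_kernel(X1, X2, width=1.0):
    """Assumed Gaussian kernel K(x, x') = exp(-||x - x'||^2 / (2 width^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


def design_matrix(X, X_train, width=1.0):
    """Rows are phi(x) = [1, K(x, x_1), ..., K(x, x_N)] for each row x of X."""
    K = gaussian_kernel(X, X_train, width)
    # The leading column of ones carries the bias weight w_0.
    return np.hstack([np.ones((X.shape[0], 1)), K])
```

Calling `design_matrix(X_train, X_train)` gives the $N \times (N+1)$ training matrix, and the same function with new inputs gives the rows needed for prediction.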

The essence of training RVM is to determine the posterior distribution over the weight vector $w$. In order to maintain sparsity while maximizing the likelihood function, the prior distribution over $w_j$ ($j = 0, \ldots, N$) must be defined first. Assume that $w_j$ follows a zero-mean Gaussian distribution with variance $\alpha_j^{-1}$; hence the prior distribution over $w$ is expressed as

$$p(w \mid \alpha) = \prod_{j=0}^{N} N(w_j \mid 0, \alpha_j^{-1}) \tag{10}$$

which is a multivariate Gaussian distribution, where $\alpha = [\alpha_0, \alpha_1, \ldots, \alpha_N]^T$ and $\alpha_j$ is the individual hyperparameter independently associated with each weight parameter $w_j$. With the prior distribution (10) and the likelihood function (9), the posterior distribution over $w$ can be computed by Bayes' rule:

$$p(w \mid t, \alpha, \beta) = \frac{p(w \mid \alpha)\, p(t \mid w, \beta)}{p(t \mid \alpha, \beta)} \tag{11}$$

Since $p(w \mid \alpha)$ and $p(t \mid w, \beta)$ are both Gaussian, their product is also Gaussian up to normalization. Furthermore, $p(t \mid \alpha, \beta)$ does not include $w$, so it acts as a normalization coefficient. The posterior distribution over $w$ is therefore also Gaussian and can be expressed as:

$$p(w \mid t, \alpha, \beta) = N(w \mid \mu, \Sigma) \tag{12}$$

where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix, which are expressed as formulas (13) and (14), respectively:

$$\Sigma = (\beta \Phi^T \Phi + A)^{-1} \tag{13}$$

$$\mu = \beta \Sigma \Phi^T t \tag{14}$$

where $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$. The posterior distribution over $w$ is determined by the hyperparameters $\beta$ and $\alpha$, so the hyperparameters are optimized by using the evidence procedure. The iterative optimization formulas for the hyperparameters are (15) and (16), respectively.

$$\alpha_j = 1/(\mu_j^2 + \Sigma_{jj}) = \gamma_j/\mu_j^2, \quad j = 0, 1, \ldots, N \tag{15}$$

$$\beta = \frac{N - \sum_{j=0}^{N} \gamma_j}{\|t - \Phi\mu\|^2} \tag{16}$$

where $\mu_j$ denotes the $j$th element of vector $\mu$, $\Sigma_{jj}$ denotes the $j$th diagonal element of matrix $\Sigma$, and $\gamma_j = 1 - \alpha_j \Sigma_{jj}$. In the process of training, formulas (13)–(16) are calculated iteratively. Most of the $\alpha_j$ tend toward infinity and the corresponding $\mu_j$ tend toward zero. Training stops when all the hyperparameters have converged or the maximum number of iterations is reached.

Fig. 4. Block diagram of dynamic control model for BOF steelmaking process. Blocks: start a new heat; measure temperature and estimate carbon content; ANFIS oxygen predictor; ANFIS coolant classifier; add coolant? (Y/N); ANFIS coolant predictor; stop coolant calculation and wait until next heat starts; predict the endpoint temperature and carbon content; carbon content hit? temperature hit? (Y/N); finely adjust the volume of blown oxygen; finely adjust the weight of added coolant; utilize the calculation results in real system; next heat.
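The iteration over formulas (13)–(16) can be sketched as below. This is a minimal illustration rather than a reference implementation: the function name `train_rvm`, the initial values, the pruning threshold `alpha_prune`, the convergence test on $\log \alpha$ and the numerical guards (clipping $\gamma_j$, flooring $\mu_j^2$) are all assumptions added for the example. It uses the $\gamma_j/\mu_j^2$ form of Eq. (15) and the squared residual norm in Eq. (16), and it does not handle the case in which every basis function is pruned.

```python
import numpy as np


def train_rvm(Phi, t, max_iter=1000, tol=1e-6, alpha_prune=1e9):
    """Evidence-procedure training of a classical RVM (illustrative sketch)."""
    N, M = Phi.shape
    alpha = np.ones(M)                 # assumed initial prior precisions
    beta = 1.0 / np.var(t)             # assumed initial noise precision
    active = np.ones(M, dtype=bool)    # basis functions not yet pruned
    mu = np.zeros(M)

    for _ in range(max_iter):
        P = Phi[:, active]
        # Eq. (13): posterior covariance; Eq. (14): posterior mean.
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[active]))
        m = beta * Sigma @ P.T @ t

        # gamma_j = 1 - alpha_j * Sigma_jj, clipped as a numerical guard.
        gamma = np.clip(1.0 - alpha[active] * np.diag(Sigma), 1e-12, 1.0)
        # Eq. (15) in its gamma_j / mu_j^2 form.
        alpha_new = gamma / np.maximum(m ** 2, 1e-300)
        # Eq. (16): noise precision from the residual of the current fit.
        resid = t - P @ m
        beta = (N - gamma.sum()) / (resid @ resid)

        change = np.max(np.abs(np.log(alpha_new) - np.log(alpha[active])))
        alpha[active] = alpha_new
        mu[:] = 0.0
        mu[active] = m
        # Prune basis functions whose alpha_j has effectively diverged.
        active &= alpha < alpha_prune
        if change < tol:
            break

    mu[~active] = 0.0
    return mu, alpha, beta, active
```

Predictions for new inputs then follow from the posterior mean as `Phi_new @ mu`, where `Phi_new` is a design matrix built with the same basis functions.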

4.3. Robust relevance vector machine

As described above, classical RVM is based on the assumption that the noise $\varepsilon_i$ for each training sample follows a zero-mean Gaussian distribution with the same variance $\sigma^2$ (or hyperparameter $\beta$). However, in practical applications, measured data are often contaminated by outlying observations, which make the Gaussian-noise assumption untenable. This compromises the robustness of the RVM regression model and reduces its prediction accuracy. To alleviate this problem, researchers have proposed some modified methods. Faul and Tipping (2001) presented a variational relevance vector machine (VRVM) to deal with outliers. They introduced an explicit distribution, incorporated in a mixture with the standard likelihood function, to explain outliers, and utilized variational approximation to implement the inference strategy. Tipping and Lawrence (2005) modified RVM with the Student-t noise model, whose distribution has heavier tails than that of the Gaussian noise model. However, the modified algorithm was also implemented by using variational approximation, which consumes more computation time. Yang, Zhang, and Sun (2007) proposed a trimmed relevance vector machine (TRVM) that redefined the likelihood function as a trimmed one. The outliers are eliminated during model training and a re-weighted strategy is introduced to find the trimmed subset. The new algorithm is able to detect outliers and enhance the robustness of the model.

Fig. 5. The architecture of ANFIS (layers 1 to 5, from the inputs $x_1, \ldots, x_n$ to the output $y$).

The above modified strategies are mainly based on variational inference or on trimming the data set. In this section, a robust relevance vector machine is presented that reduces the impact of outliers while still being implemented by the evidence procedure. Instead of setting the same noise variance for all the samples, we assume that each training sample has its individual coefficient of noise variance. Then, based on the Bayesian evidence framework, the iteration formulas are derived to optimize the hyperparameters and noise variance coefficients. During the optimization process, the noise variance coefficients of outliers decrease, so that outliers are detected and effectively eliminated. The detailed optimization procedure is as follows.

With reference to Bayesian weighted linear regression (Ting, D’Souza, & Schaal, 2007), assume that the individual noise distribution of the $i$th training sample is:

$$p(\varepsilon_i) = N(\varepsilon_i \mid 0, \sigma^2/\beta_i), \quad i = 1, \ldots, N \tag{17}$$

where $\sigma^2$ denotes the average variance of all the training samples and $\beta_i$ denotes the noise variance coefficient of the $i$th sample. The prior distribution of $\beta_i$ is assumed to be a Gamma distribution, namely

$$p(\beta_i) = \mathrm{Gamma}(a_i, b_i) = \Gamma(a_i)^{-1} b_i^{a_i} \beta_i^{a_i - 1} e^{-b_i \beta_i} \tag{18}$$

with the gamma function $\Gamma(a_i) = \int_0^{\infty} t^{a_i - 1} e^{-t}\, dt$. Define the vector $\beta = [\beta_1, \beta_2, \ldots, \beta_N]^T$; the likelihood function of the complete training sample set then changes from formula (9) to

$$p(t \mid w, \beta, \sigma^2) = (2\pi\sigma^2)^{-N/2} |B|^{1/2} \exp\left(-\frac{1}{2\sigma^2}(t - \Phi w)^T B (t - \Phi w)\right) \tag{19}$$

where $B = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_N)$ and $|\cdot|$ denotes the determinant of a matrix. The definitions of $t$, $w$ and $\Phi$ are the same as before. The prior distribution over $w$ is still expressed as formula (10). According to Bayes' rule, the posterior distribution of $w$ is computed as

$$p(w \mid t, \alpha, \beta, \sigma^2) = \frac{p(w \mid \alpha)\, p(t \mid w, \beta, \sigma^2)}{p(t \mid \alpha, \beta, \sigma^2)} = N(w \mid \mu, \Sigma) \tag{20}$$

where the covariance matrix $\Sigma$ and mean vector $\mu$ can be computed by using formulas (21) and (22), respectively.

$$\Sigma = (A + \sigma^{-2}\Phi^T B \Phi)^{-1} = \left(A + \sigma^{-2}\sum_{i=1}^{N} \beta_i \phi(x_i)\phi(x_i)^T\right)^{-1} \tag{21}$$

$$\mu = \sigma^{-2}\Sigma\Phi^T B t = \sigma^{-2}\Sigma\left(\sum_{i=1}^{N} \beta_i \phi(x_i) t_i\right) \tag{22}$$

Table 1. Description of BOF steelmaking data.

| Variables | Range | Mean | Standard variance |
| --- | --- | --- | --- |
| Carbon content measured at the end of the first blow (0.01%) | 4.5–98.3 | 34.3 | 18.9 |
| Temperature measured at the end of the first blow (°C) | 1517.4–1708.4 | 1627.9 | 28.5 |
| The volume of oxygen (m³) | 201.0–3221.0 | 914.1 | 338.4 |
| The amount of coolant (kg) | 0.0–4089.0 | 446.7 | 791.2 |
| Endpoint carbon content (0.01%) | 2.3–10.0 | 5.6 | 1.3 |
| Endpoint temperature (°C) | 1613.7–1730.4 | 1675.2 | 16.1 |

Since the formulas for the covariance matrix and mean vector both depend on $\alpha$, $\beta$ and $\sigma^2$, these hyperparameters need to be optimized so as to maximize the posterior distribution of $w$. The optimization method is also based on the Bayesian evidence framework (Mackay, 1992a, 1992b), and is implemented by maximizing the product of the marginal likelihood function $p(t \mid \alpha, \beta, \sigma^2)$ and the prior distribution over $\beta$, $p(\beta) = \prod_{i=1}^{N} p(\beta_i)$. The marginal likelihood function is computed as follows:

$$p(t \mid \alpha, \beta, \sigma^2) = \int p(t \mid w, \beta, \sigma^2)\, p(w \mid \alpha)\, dw = (2\pi)^{-N/2} |C|^{-1/2} \exp\left(-\frac{1}{2} t^T C^{-1} t\right) \tag{23}$$

where $C = \sigma^2 B^{-1} + \Phi A^{-1} \Phi^T$. Equivalently, we can optimize the logarithm of the product of $p(t \mid \alpha, \beta, \sigma^2)$ and $p(\beta)$. Moreover, we maximize this quantity with respect to $\log \alpha$, $\log \beta$ and $\log \sigma^2$ for computational convenience. Therefore, the objective to be optimized is

$$\log p(t \mid \log \alpha, \log \beta, \log \sigma^2) + \sum_{i=1}^{N} \log p(\log \beta_i) \tag{24}$$

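Noting that $p(\log \beta_i) = \beta_i\, p(\beta_i)$ and deleting the terms that are independent of $\alpha$, $\beta$ and $\sigma^2$, we get the objective function

$$L = -\frac{1}{2}\left[-\log|\Sigma| - \log|A| + N \log \sigma^2 - \log|B| + \mu^T A \mu + \sigma^{-2}(t - \Phi\mu)^T B (t - \Phi\mu)\right] + \sum_{i=1}^{N} (a_i \log \beta_i - b_i \beta_i) \tag{25}$$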

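The optimal values of $\alpha$, $\beta$ and $\sigma^2$ cannot be obtained in closed form and have to be re-estimated iteratively. Taking the partial derivatives of formula (25) with respect to $\log \alpha_j$ ($j = 0, 1, \ldots, N$), $\log \beta_i$ ($i = 1, \ldots, N$) and $\log \sigma^2$, and rearranging the equations, yields the iteration formulas for $\alpha$, $\beta$ and $\sigma^2$. They are expressed as formulas (26)–(28), respectively (the detailed derivation is given in the Appendix).

$$\alpha_j = \frac{1}{\mu_j^2 + \Sigma_{jj}} = \frac{\gamma_j}{\mu_j^2} \tag{26}$$

$$\beta_i = \frac{a_i + 0.5}{b_i + 0.5 \cdot \left[\sigma^{-2}(t_i - \phi(x_i)^T \mu)^2 + \sigma^{-2}\,\mathrm{tr}(\Sigma \phi(x_i)\phi(x_i)^T)\right]} \tag{27}$$

$$\sigma^2 = \frac{(t - \Phi\mu)^T B (t - \Phi\mu)}{N - \sum_{j=0}^{N} \gamma_j} \tag{28}$$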

where $j = 0, \ldots, N$ and $i = 1, \ldots, N$. $\Sigma_{jj}$ is the $j$th diagonal element of the covariance matrix $\Sigma$, $\gamma_j = 1 - \alpha_j \Sigma_{jj}$, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. This completes the iterative formulas for optimization: formulas (21), (22), (26), (27) and (28) are the iterative estimates of $\Sigma$, $\mu$ and the hyperparameters $\alpha_j$, $\beta_i$ and $\sigma^2$, respectively.
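As a sketch of how the update for the noise variance coefficients can arise (this is an outline under the definitions of $L$, $\Sigma$ and $\mu$ given in this section, not a reproduction of the Appendix), differentiate the objective with respect to $\log \beta_i$. Only $\log|\Sigma|$, $\log|B|$, the weighted residual term and the Gamma-prior term depend on $\beta_i$; the dependence through $\mu$ drops out because $\mu$ minimizes $\mu^T A \mu + \sigma^{-2}(t - \Phi\mu)^T B (t - \Phi\mu)$.

```latex
\begin{aligned}
\frac{\partial \log|B|}{\partial \log \beta_i} &= 1, \qquad
\frac{\partial \log|\Sigma|}{\partial \log \beta_i}
  = -\beta_i \sigma^{-2}\,\mathrm{tr}\!\left(\Sigma\,\phi(x_i)\phi(x_i)^T\right), \\
\frac{\partial L}{\partial \log \beta_i}
  &= -\frac{1}{2}\left[\beta_i \sigma^{-2}\,\mathrm{tr}\!\left(\Sigma\,\phi(x_i)\phi(x_i)^T\right)
     - 1 + \beta_i \sigma^{-2}\left(t_i - \phi(x_i)^T\mu\right)^2\right]
     + a_i - b_i \beta_i = 0 \\
\Longrightarrow\quad
\beta_i &= \frac{a_i + 0.5}{b_i + 0.5\left[\sigma^{-2}\left(t_i - \phi(x_i)^T\mu\right)^2
     + \sigma^{-2}\,\mathrm{tr}\!\left(\Sigma\,\phi(x_i)\phi(x_i)^T\right)\right]}
\end{aligned}
```

The last line agrees with the update formula for $\beta_i$ stated above, which provides a consistency check on the sign of the $\log|\Sigma|$ term in the objective.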

In practical use of this algorithm, the quantities used in formulas (21), (22), (26), (27) and (28) must be initialized. First, $\alpha$ and $\sigma^2$ can be initialized according to the characteristics of the data set, e.g. $\alpha_j = N/\mathrm{var}(t)$ and $\sigma^2 = \mathrm{var}(t)$, where $\mathrm{var}(t)$ is the variance of $t$. Second, the parameters $a_i$ and $b_i$ of the prior distribution $\mathrm{Gamma}(a_i, b_i)$ of $\beta_i$ should be selected so that the prior mean of each $\beta_i$ is 1. For example, when the parameters are set as $a_i = 1$ and $b_i = 1$, the noise variance coefficient $\beta_i$ has a prior mean of $a_i/b_i = 1$ with a variance of $a_i/b_i^2 = 1$. This means that we start by assuming that the noise distributions of all the samples are Gaussian with the same variance, that is, that all of the training samples are inliers. With these values, formula (27) implies that the range of $\beta_i$ is $0 < \beta_i < 1.5$. This setting of the prior parameter values is generally valid for most applications or data sets. During the iterations, the $\beta_i$ corresponding to outliers gradually become small.
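The stated range can be checked directly from formula (27). In the short worked example below, $d_i$ is shorthand introduced only for this illustration; it collects the two scaled terms in the denominator, and the numerical values of $d_i$ are arbitrary.

```latex
\begin{aligned}
a_i = b_i = 1:\qquad
\beta_i &= \frac{1.5}{1 + 0.5\, d_i}, \qquad
d_i = \sigma^{-2}\left[\left(t_i - \phi(x_i)^T\mu\right)^2
      + \mathrm{tr}\!\left(\Sigma\,\phi(x_i)\phi(x_i)^T\right)\right] > 0, \\
d_i \to 0 &\;\Rightarrow\; \beta_i \to 1.5, \qquad
d_i \to \infty \;\Rightarrow\; \beta_i \to 0, \\
d_i = 1 &\;\Rightarrow\; \beta_i = 1.5 / 1.5 = 1, \qquad
d_i = 98 \;\Rightarrow\; \beta_i = 1.5 / 50 = 0.03 .
\end{aligned}
```

A sample whose scaled squared error is close to one thus keeps a coefficient near its prior mean, whereas a sample with a scaled squared error near one hundred is down-weighted by a factor of roughly thirty.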

To sum up, the whole training procedure of RRVM is as follows:

Step 1: Initialize the hyperparameters $\alpha$, $\beta$ and $\sigma^2$ as well as $a_i$ and $b_i$, $i = 1, \ldots, N$.

Step 2: Compute the covariance matrix $\Sigma$ and mean vector $\mu$ of the posterior distribution over $w$ by using formulas (21) and (22), respectively.

Step 3: Update the hyperparameters $\alpha$, $\beta$ and $\sigma^2$ according to (26)–(28). During the optimization procedure, many of the $\alpha_j$ tend to infinity (this can be judged against a large threshold value, such as $10^9$). From (26), this implies that $\mu_j$ tends to zero, and so does $w_j$. The corresponding basis functions are pruned and the sparsity of the model is realized.

Step 4: Determine whether all the parameters have converged or the maximum number of iterations has been reached. If so, stop the iteration and training. If not, go back to Step 2. After training is finished, the basis functions corresponding to non-zero $\mu_j$ are called "relevance vectors".

During the training process, formulas (21), (22), (26), (27) and (28) are computed iteratively until the termination conditions are satisfied. Formula (27) shows that the squared prediction error $(t_i - \boldsymbol{\phi}(\mathbf{x}_i)^T\boldsymbol{\mu})^2$ of the data point $\{\mathbf{x}_i, t_i\}$ appears in the denominator. If this prediction error is so large that it dominates the other terms of the denominator, the corresponding noise variance coefficient $\beta_i$ of that point becomes very small, and as the error term tends to infinity, $\beta_i$ approaches zero. As can be seen from (21) and (22), the formulas for $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}$ of the posterior distribution over $\mathbf{w}$ both include a term that is a weighted linear combination of all the samples, and the weight of each sample is exactly $\beta_i$. A sample with an extremely small coefficient therefore contributes little to the estimates of $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}$. If the coefficient of the data sample $\{\mathbf{x}_i, t_i\}$ is small enough, the effect is equivalent to detecting and removing an outlier, which improves the robustness of the model. After training, RRVM makes predictions based on the posterior distribution over $\mathbf{w}$. For a new input datum $\mathbf{x}^*$, the output is $y^* = \boldsymbol{\phi}(\mathbf{x}^*)^T\boldsymbol{\mu}$.
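The training procedure of Steps 1–4 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the posterior covariance (21) is $\boldsymbol{\Sigma} = (\mathbf{A} + \sigma^{-2}\boldsymbol{\Phi}^T\mathbf{B}\boldsymbol{\Phi})^{-1}$, as stated in the Appendix, and that the posterior mean (22) has the usual weighted form $\boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}\boldsymbol{\Phi}^T\mathbf{B}\mathbf{t}$, which is an assumption because formula (22) is not reproduced in this section. The function name `rrvm_fit`, its arguments, the simultaneous update order and the convergence test on $\sigma^2$ are all illustrative choices.

```python
import numpy as np

def rrvm_fit(Phi, t, a=1.0, b=1.0, n_iter=200, alpha_max=1e9, tol=1e-8):
    """Sketch of RRVM training. Phi: (N, M) design matrix, t: (N,) targets."""
    N, M = Phi.shape
    var_t = float(np.var(t))
    # Step 1: initialization suggested in the text.
    alpha = np.full(M, N / var_t)
    beta = np.ones(N)              # prior mean a/b = 1: all samples start as inliers
    sigma2 = var_t
    active = np.arange(M)          # indices of basis functions not yet pruned
    for _ in range(n_iter):
        # Pruning (Step 3): drop basis functions whose alpha passed the threshold.
        active = active[alpha[active] < alpha_max]
        P = Phi[:, active]
        # Step 2: posterior covariance (21) and mean (22, assumed form).
        Sigma = np.linalg.inv(np.diag(alpha[active]) + (P.T * beta) @ P / sigma2)
        mu = Sigma @ (P.T @ (beta * t)) / sigma2
        resid = t - P @ mu
        gamma = np.maximum(1.0 - alpha[active] * np.diag(Sigma), 1e-12)
        # Step 3: updates (26)-(28), all computed from the current posterior.
        new_alpha = gamma / np.maximum(mu ** 2, 1e-300)
        quad = np.einsum("ij,jk,ik->i", P, Sigma, P)   # phi_i^T Sigma phi_i
        new_beta = (a + 0.5) / (b + 0.5 * (resid ** 2 + quad) / sigma2)
        new_sigma2 = float(resid @ (beta * resid)) / (N - gamma.sum())
        # Step 4: simplified convergence test on sigma^2 only (an assumption).
        converged = abs(new_sigma2 - sigma2) <= tol * sigma2
        alpha[active], beta, sigma2 = new_alpha, new_beta, new_sigma2
        if converged:
            break
    mu_full = np.zeros(M)          # pruned weights stay at zero
    mu_full[active] = mu
    return mu_full, beta, sigma2, active

# Toy usage: a straight line with three gross outliers (synthetic data).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
Phi = np.column_stack([np.ones_like(x), x])
t = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(50)
outliers = [5, 20, 40]
t[outliers] += 10.0
mu, beta, sigma2, active = rrvm_fit(Phi, t)
print("weights:", mu)
print("beta of outliers:", beta[outliers])
```

In this toy run the coefficients $\beta_i$ of the three corrupted samples should end up far below those of the clean samples, which is the down-weighting mechanism described above. A prediction for a new input is then `phi_new @ mu`.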

5. Simulations and discussion

In this section, we use benchmark and industrial data to evaluate the performance of the dynamic control model. The simulations consist of three parts: the first part calculates the amounts of oxygen and coolant based on ANFIS; the second part validates the robustness of RRVM on a benchmark data set; the third part predicts the endpoint carbon content and temperature of molten steel by using RRVM.

5.1. Evaluation indicator

To evaluate the performance of the dynamic control model, two indicators are considered here. One is the hit ratio and the other is the root mean square error (RMSE). The hit ratio of the calculation and prediction models is defined as follows:

$$\text{Hit ratio} = \frac{N_h}{N} \times 100\% \tag{29}$$

where $N$ is the number of test samples and $N_h$ is the number of test samples whose prediction error is within the required range, viz. $|y^{\text{predict}} - y^{\text{actual}}| \le \varepsilon$, where $y^{\text{predict}}$ and $y^{\text{actual}}$ are the predicted and actual values of the output, respectively, and $\varepsilon$ is the permissible error. The hit ratio represents the computational accuracy of the model, so the larger it is, the better the accuracy.

Fig. 6. Comparison between predicted value and actual value of the volume of oxygen.

Fig. 7. The distribution of heats which are added with coolant and which are not.

Table 2. Comparison between CBR and ANFIS.

| Method | Volume of oxygen: hit ratio (%) | Volume of oxygen: RMSE (m³) | Amount of coolant: hit ratio (%) | Amount of coolant: RMSE (kg) |
|---|---|---|---|---|
| CBR | 72 | 196.82 | 68 | 624.97 |
| ANFIS | 86 | 170.45 | 80 | 447.73 |

Table 3. RMSE comparison of two-dimensional sinc function approximation (training sample set with different numbers of outliers).

| Method | 0 outliers | 10 outliers | 20 outliers | 40 outliers |
|---|---|---|---|---|
| Classical RVM | 0.04919 | 0.12933 | 0.14777 | 0.19132 |
| TRVM | 0.04755 | 0.04981 | 0.05415 | 0.08337 |
| RRVM | 0.04915 | 0.05074 | 0.05400 | 0.06774 |

RMSE is defined as formula (30):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\text{predict}} - y_i^{\text{actual}}\right)^2} \tag{30}$$

where the index $i$ denotes the $i$th sample and the other variables are defined as before. RMSE measures the deviation of the predicted values from the actual ones, so the smaller it is, the better the prediction result.

5.2. Calculating the amounts of oxygen and coolant

In this part, the volume of oxygen and the amount of coolant in the second blow period are calculated by using ANFIS. The ANFIS toolbox of MATLAB 7.5 is utilized to implement the algorithm. One thousand groups of BOF steelmaking data, collected from a 150 ton furnace of Benxi Steel Sheet Co., Ltd. in China, are used for simulation. The description of the variables, including range, mean value and standard deviation, is listed in Table 1. Firstly, the volume of second-blow oxygen is calculated based on ANFIS. The inputs of the oxygen model are the carbon content and temperature measured at the end of the first blow period. We select 400 groups of data for oxygen calculation, of which 300 groups are used for model training and 100 groups for testing. The result is shown in Fig. 6, where the curves of the predicted and actual values are both drawn. When the permissible error range is ±200 m³, the hit ratio is 86% and the RMSE is 170.45 m³.

Fig. 8. Comparison between predicted value and actual value of the amount of coolant.

Fig. 9. The surface of two-dimensional sinc function and training samples with 40 outliers.

Fig. 10. The approximation result of classical RVM.

Fig. 11. The approximation result of RRVM.

Table 4. Comparison of endpoint prediction results.

| Method | Endpoint carbon content: hit ratio (%) | Endpoint carbon content: RMSE (0.01%) | Endpoint temperature: hit ratio (%) | Endpoint temperature: RMSE (°C) |
|---|---|---|---|---|
| SVM | 92 | 1.18 | 70 | 10.76 |
| Classical RVM | 89 | 1.30 | 74 | 10.71 |
| VRVM | 90 | 1.25 | 73 | 10.68 |
| TRVM | 89 | 1.26 | 75 | 10.49 |
| RRVM | 93 | 1.22 | 76 | 10.42 |

As for coolant calculation, two ANFIS models should be trained. One is a classification model that determines whether coolant should be added, and the other is a regression model that calculates the amount of coolant. In the experiment, the classifier first determines whether coolant needs to be added. If it is required, the amount of coolant is then calculated by the regression model. The input variables of the coolant model are also the carbon content and temperature measured at the end of the first blow. Fig. 7 shows the distribution of the heats. The circles represent the heats to which coolant was added, and the crosses represent the heats to which it was not. There is a boundary between the two groups, so the ANFIS classifier can be trained on these data.

The same 100 groups of test data used for the oxygen model are used to calculate the amount of coolant. The result is shown in Fig. 8. The accuracy of the ANFIS classifier is 88%. When the permissible error range is ±500.0 kg, the hit ratio of the coolant model is 80% and the RMSE is 447.7 kg. In some cases the classifier makes the right decision to add coolant, but the calculation error of the regression model is outside the permissible range, so the hit ratio is lower than the classification accuracy. Of the 12 groups that are not classified correctly, 11 are classified by the ANFIS model as needing no coolant, whereas coolant was in fact added. This may depend on the experience of different operators. The classification accuracy of the ANFIS model reaches 88%, which is acceptable in practical BOF steelmaking.

For comparison, we also use case-based reasoning (CBR) to calculate the amounts of oxygen and coolant. Table 2 shows the comparison between CBR and ANFIS. The comparison demonstrates that the ANFIS model achieves better results than CBR in both hit ratio and RMSE.
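The two indicators reported in Table 2, the hit ratio (29) and the RMSE (30), can be computed as in the following short sketch. The function names and the four sample values are hypothetical and serve only to illustrate the definitions; they are not taken from the industrial data set.

```python
import math

def hit_ratio(predicted, actual, tolerance):
    """Formula (29): percentage of samples with |predicted - actual| <= tolerance."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return 100.0 * hits / len(actual)

def rmse(predicted, actual):
    """Formula (30): root mean square error between predictions and measurements."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical oxygen volumes in m^3, with the +/-200 m^3 tolerance used in the text.
predicted = [5100.0, 4800.0, 5300.0, 4950.0]
actual = [5000.0, 5050.0, 5250.0, 4900.0]
print(hit_ratio(predicted, actual, tolerance=200.0))  # 75.0 (one error of 250 is a miss)
print(rmse(predicted, actual))
```

Because the hit ratio only counts errors inside the tolerance while the RMSE weights large errors quadratically, the two indicators can rank models differently, which is why both are reported.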


5.3. Two-dimensional sinc function approximation

In order to evaluate the robustness of RRVM, the two-dimensional sinc function $y(x_1, x_2) = \sin|x_1|/|x_1| + 0.1x_2$ is used in an approximation experiment. The input vector is $\mathbf{x} = [x_1, x_2]^T$. Both $x_1$ and $x_2$ are drawn uniformly from $[-10, 10]$, and $y$ is corrupted by zero-mean Gaussian noise with standard deviation 0.1. As for the choice of kernel function, tests showed that the linear spline kernel is more suitable than the Gaussian kernel for this approximation experiment, so we choose the linear spline function as the kernel of RRVM. For two input vectors $\mathbf{x}_i = [x_{i1}, x_{i2}]^T$ and $\mathbf{x}_j = [x_{j1}, x_{j2}]^T$, the kernel for the two-dimensional spline function is generated as:

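$$K(\mathbf{x}_i, \mathbf{x}_j) = K(x_{i1}, x_{j1})\,K(x_{i2}, x_{j2}) \tag{31}$$

For the linear spline function, $K(x_{i1}, x_{j1})$ is expressed as follows (Vapnik, Golowich, & Smola, 1997):

$$K(x_{i1}, x_{j1}) = 1 + x_{i1}x_{j1} + x_{i1}x_{j1}\min(x_{i1}, x_{j1}) - \frac{x_{i1} + x_{j1}}{2}\min(x_{i1}, x_{j1})^2 + \frac{\min(x_{i1}, x_{j1})^3}{3} \tag{32}$$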
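A minimal sketch of the kernel in (31) and (32) is given below. The function names are illustrative, and the product over coordinates is written for any input dimension, of which the two-dimensional case (31) is a special instance.

```python
def spline_1d(u, v):
    """One-dimensional linear spline kernel, formula (32)."""
    m = min(u, v)
    return 1.0 + u * v + u * v * m - 0.5 * (u + v) * m ** 2 + m ** 3 / 3.0

def spline_kernel(xi, xj):
    """Product of one-dimensional kernels over the coordinates, formula (31)."""
    k = 1.0
    for u, v in zip(xi, xj):
        k *= spline_1d(u, v)
    return k

# The kernel is symmetric in its two arguments.
print(spline_kernel([1.0, 0.5], [2.0, -3.0]))
print(spline_kernel([2.0, -3.0], [1.0, 0.5]))
```

Each row of the design matrix is then obtained by evaluating `spline_kernel` between one training input and all training inputs, with an additional constant column for the bias weight.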

The second factor $K(x_{i2}, x_{j2})$ has the same form as (32). The size of the training sample set is 400. At first, we investigate the approximation performance of RRVM with the clean training sample set. Then, outliers generated from the standard Gaussian distribution are added to the training sample set: 10, 20 and 40 outliers are mixed with the clean training samples, respectively. After training, the model is used to approximate the two-dimensional sinc function, and the test samples are noise-free. For comparison, two other methods are also implemented in the experiment, namely classical RVM and TRVM (Yang et al., 2007). The RMSE comparison of the three methods is listed in Table 3. When the training sample set contains no outliers, the RMSE of RRVM is very close to that of classical RVM but worse than that of TRVM. When outliers are added, the approximation performance of classical RVM deteriorates drastically, while TRVM and RRVM still give good results. As the number of outliers increases, RRVM obtains better results than both classical RVM and TRVM (with 20 and 40 outliers its RMSE is the lowest of the three), which demonstrates that RRVM can effectively resist the impact of outliers and has good robustness.

Fig. 9 shows the actual surface of the two-dimensional sinc function and the distribution of training samples when 40 outliers are added. Figs. 10 and 11 are the approximation results of classical RVM and RRVM, respectively. They clearly show that RVM cannot accurately approximate the function because of the outliers, while RRVM reduces the impact of the outliers and obtains a better approximation result.

Fig. 12. Comparison between RRVM predicted value and actual value of endpoint carbon content.

Fig. 13. Comparison between RRVM predicted value and actual value of endpoint temperature.

5.4. Endpoint prediction of BOF steelmaking

Prediction of the endpoint carbon content $[C]_{\text{End}}$ and temperature $T_{\text{End}}$ is the most important function of the dynamic control model. RRVM is utilized to predict the endpoint in this section. The data used for the experiment are the same as described in Section 5.2. The size of the data set is 400, split into 300 groups for training and 100 groups for testing. The input variables are the carbon content $[C]_S$ and temperature $T_S$ measured at the end of the first blow and the amount of oxygen in the second blow. In this simulation, the volume of oxygen would ideally take the values calculated by the ANFIS model. However, the test data are collected from the practical BOF steelmaking process. Although the calculation accuracy of the ANFIS model is more than 80%, the calculated amounts of oxygen still differ slightly from the actual amounts. Furthermore, in practical operation, the endpoint carbon content and temperature result from blowing the actual volume of oxygen. Therefore, if the calculated amount of oxygen were used as input to predict the endpoint in the simulation, the result could not properly reflect the real performance of the endpoint prediction model. In this section, our main concern is to validate the prediction ability of the RRVM endpoint model. Thus, it is more appropriate to use the actual volume of oxygen in the data set to predict the endpoint carbon content and temperature. Moreover, the outputs are the decrease of carbon content and the rise of temperature in the second blow period, denoted as $\Delta[C]$ and $\Delta T$, respectively. The prediction models are then created as:

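$$\Delta[C] = f_{[C]}\left([C]_S, T_S, V_{O_2}\right) \tag{33}$$

$$\Delta T = f_T\left([C]_S, T_S, V_{O_2}\right) \tag{34}$$

Based on the prediction results, the endpoint carbon content and temperature can be calculated as follows:

$$[C]_{\text{End}} = [C]_S - \Delta[C] \tag{35}$$

$$T_{\text{End}} = T_S + \Delta T \tag{36}$$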
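As a small worked illustration of (35) and (36), the sketch below converts predicted changes into endpoint values. The function name and all numerical values are hypothetical; in the model, $\Delta[C]$ and $\Delta T$ would come from the trained predictors (33) and (34).

```python
def endpoint(c_start, t_start, delta_c, delta_t):
    """Formulas (35) and (36): carbon falls by delta_c, temperature rises by delta_t."""
    return c_start - delta_c, t_start + delta_t

# Hypothetical heat: 0.35 % C and 1600 degC at the end of the first blow,
# with a predicted decrease of 0.30 % C and a predicted rise of 60 degC.
c_end, t_end = endpoint(c_start=0.35, t_start=1600.0, delta_c=0.30, delta_t=60.0)
print(round(c_end, 4), t_end)
```

Predicting the changes rather than the endpoint values themselves means the measured starting state $[C]_S$ and $T_S$ enters the result directly through (35) and (36).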

In order to evaluate the prediction performance, we compare RRVM with four other methods: SVM, classical RVM, VRVM (Faul & Tipping, 2001) and TRVM. The popular software LIBSVM is used to implement SVM (Chang & Lin, 2001). The Gaussian kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\{-\|\mathbf{x}_i - \mathbf{x}_j\|^2/c^2\}$ is chosen, and the kernel widths $c$ of the carbon content and temperature models are selected as 1.4 and 0.7, respectively. For SVM, the parameter of the $\varepsilon$-insensitive loss function and the regularization coefficient $C$ are determined by cross validation. Table 4 lists the comparison results. The permissible error range of carbon content prediction is ±0.02% and that of temperature prediction is ±12 °C. RRVM achieves better performance than the other four methods, except that its RMSE of carbon content prediction is not as good as that of SVM. Figs. 12 and 13 compare the RRVM predicted values with the actual values of carbon content and temperature, respectively.

In addition, the results show that the prediction accuracy of temperature is not as good as that of carbon content. This is because more factors are involved in the prediction of endpoint temperature than in that of carbon content, such as splashing and heat transfer through the furnace wall. In practical production, the endpoint temperature is more difficult to predict and control. In the simulation, the hit ratio of the RRVM temperature prediction model exceeds 75%, which is acceptable in practical application. Furthermore, if the dynamic control model is combined with the operators' experience, the control effectiveness is likely to improve further.

6. Conclusions

In this paper, a dynamic model based on ANFIS and robust relevance vector machine is presented to control the second blow period of the BOF steelmaking process. Our main concern is to accurately calculate the volume of oxygen and the amount of coolant needed in the second blow, and to predict the endpoint carbon content and temperature of molten steel. Firstly, the oxygen and coolant models are constructed based on ANFIS. A classifier is first trained to determine whether coolant should be added, and the amount of coolant is calculated only if it is required. The volume of oxygen is calculated directly by the ANFIS regression model. Secondly, endpoint prediction models are constructed by using the robust relevance vector machine. It introduces noise variance coefficients to modify classical RVM and can resist the impact of outliers during training. Evaluation on a benchmark function demonstrates that RRVM is more robust than the other methods. The experiments on industrial data also show that ANFIS and RRVM obtain good performance in both hit ratio and RMSE. The proposed ANFIS model is able to accurately calculate the amounts of oxygen and coolant, and the RRVM endpoint prediction model provides more promising results than the other methods. Both of them have practical value in BOF steelmaking production.

Acknowledgements

This research was supported by the Project (2007AA04Z158) of the National High Technology Research and Development Program of China (863 Program) and the Project (60674073) of the National Natural Science Foundation of China. This support is gratefully acknowledged.

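Appendix A. Derivation of iteration formulas (26)–(28) on hyperparameters $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$ and $\sigma^2$

The objective function of the hyperparameter optimization process is formula (25):

$$L = -\frac{1}{2}\Big[-\log|\boldsymbol{\Sigma}| - \log|\mathbf{A}| + N\log\sigma^2 - \log|\mathbf{B}| + \boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\mu} + \sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\Big] + \sum_{i=1}^{N}\left(a_i\log\beta_i - b_i\beta_i\right)$$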

A.1. The derivation of formula (26)

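Formula (26) is used to iteratively estimate the hyperparameters $\alpha_j$ ($j = 0, 1, \ldots, N$). Taking the partial derivative of formula (25) with respect to $\log\alpha_j$ and ignoring the terms which are independent of $\log\alpha_j$ gives

$$\begin{aligned}
\frac{\partial L}{\partial\log\alpha_j} &= -\frac{1}{2}\,\frac{\partial}{\partial\log\alpha_j}\left(-\log|\mathbf{A}| + \boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\mu} - \log|\boldsymbol{\Sigma}|\right) \\
&= -\frac{1}{2}\left[-\frac{\partial\log|\mathbf{A}|}{\partial\log\alpha_j} + \frac{\partial}{\partial\alpha_j}\left(\boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\mu} - \log|\boldsymbol{\Sigma}|\right)\cdot\frac{\partial\alpha_j}{\partial\log\alpha_j}\right]
\end{aligned} \tag{37}$$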

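Because $\mathbf{A} = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$ and $\boldsymbol{\mu} = [\mu_0, \mu_1, \ldots, \mu_N]^T$, we obtain the following equations:

$$\frac{\partial\log|\mathbf{A}|}{\partial\log\alpha_j} = \frac{\partial}{\partial\log\alpha_j}\log\left(\prod_{k=0}^{N}\alpha_k\right) = \frac{\partial}{\partial\log\alpha_j}\left(\sum_{k=0}^{N}\log\alpha_k\right) = 1 \tag{38}$$

$$\frac{\partial\,\boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\mu}}{\partial\alpha_j} = \frac{\partial}{\partial\alpha_j}\,\boldsymbol{\mu}^T\cdot\mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)\cdot\boldsymbol{\mu} = \mu_j^2 \tag{39}$$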

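where $\mu_j$ is the $j$th element of $\boldsymbol{\mu}$. According to the formula for the derivative of a log-determinant (Roweis, 1999):

$$\frac{\partial\log|\mathbf{X}(z)|}{\partial z} = \mathrm{tr}\left(\mathbf{X}^{-1}\frac{\partial\mathbf{X}}{\partial z}\right) \tag{40}$$

we can obtain

$$-\frac{\partial\log|\boldsymbol{\Sigma}|}{\partial\alpha_j} = \frac{\partial\log|\boldsymbol{\Sigma}|^{-1}}{\partial\alpha_j} = \frac{\partial\log|\boldsymbol{\Sigma}^{-1}|}{\partial\alpha_j} = \mathrm{tr}\left(\boldsymbol{\Sigma}\,\frac{\partial\boldsymbol{\Sigma}^{-1}}{\partial\alpha_j}\right) \tag{41}$$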

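From equation (21) in this paper, $\boldsymbol{\Sigma}^{-1} = \mathbf{A} + \sigma^{-2}\boldsymbol{\Phi}^T\mathbf{B}\boldsymbol{\Phi}$. Therefore $\partial\boldsymbol{\Sigma}^{-1}/\partial\alpha_j$ is a sparse matrix whose only non-zero element is the $j$th diagonal element, which equals 1. Consequently the value of formula (41) is

$$-\frac{\partial\log|\boldsymbol{\Sigma}|}{\partial\alpha_j} = \mathrm{tr}\left(\boldsymbol{\Sigma}\,\frac{\partial\boldsymbol{\Sigma}^{-1}}{\partial\alpha_j}\right) = \Sigma_{jj} \tag{42}$$

where $\Sigma_{jj}$ is the $j$th diagonal element of the matrix $\boldsymbol{\Sigma}$. Moreover, the equation $\partial\alpha_j/\partial\log\alpha_j = \alpha_j$ holds. Substituting this result together with (38), (39) and (42) into formula (37) gives

$$\frac{\partial L}{\partial\log\alpha_j} = -\frac{1}{2}\left[-1 + \alpha_j\left(\mu_j^2 + \Sigma_{jj}\right)\right]$$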

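Setting this to zero, we obtain formula (26):

$$\alpha_j = \frac{1}{\mu_j^2 + \Sigma_{jj}} = \frac{\gamma_j}{\mu_j^2}$$

where $j = 0, \ldots, N$ and $\gamma_j = 1 - \alpha_j\Sigma_{jj}$. The second equality follows because $\alpha_j(\mu_j^2 + \Sigma_{jj}) = 1$ implies $\alpha_j\mu_j^2 = 1 - \alpha_j\Sigma_{jj} = \gamma_j$.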
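The key step of this derivation, formula (42), can be checked numerically. The sketch below compares a central finite difference of $\log|\boldsymbol{\Sigma}^{-1}|$ with respect to $\alpha_j$ against $\Sigma_{jj}$, using $\boldsymbol{\Sigma}^{-1} = \mathbf{A} + \sigma^{-2}\boldsymbol{\Phi}^T\mathbf{B}\boldsymbol{\Phi}$. The matrix sizes, the random test values and the helper names are arbitrary assumptions made for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 4                                  # arbitrary small problem size
Phi = rng.standard_normal((N, M))
alpha = rng.uniform(0.5, 2.0, M)
beta = rng.uniform(0.5, 1.5, N)
sigma2 = 0.3

def sigma_inv(alpha_vec):
    """Sigma^{-1} = A + sigma^{-2} Phi^T B Phi, with A = diag(alpha), B = diag(beta)."""
    return np.diag(alpha_vec) + (Phi.T * beta) @ Phi / sigma2

def log_det_sigma_inv(alpha_vec):
    # slogdet returns (sign, log|det|); the matrix is positive definite here.
    return np.linalg.slogdet(sigma_inv(alpha_vec))[1]

def check_formula_42(j, h=1e-6):
    """Return (finite-difference derivative, Sigma_jj) for coordinate j."""
    step = np.zeros(M)
    step[j] = h
    numeric = (log_det_sigma_inv(alpha + step) - log_det_sigma_inv(alpha - step)) / (2 * h)
    Sigma = np.linalg.inv(sigma_inv(alpha))
    return numeric, Sigma[j, j]

numeric, analytic = check_formula_42(j=2)
print(numeric, analytic)   # the two values should agree to several digits
```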


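A.2. The derivation of formula (27)

Formula (27) is used to iteratively estimate the hyperparameters $\beta_i$ ($i = 1, \ldots, N$). The partial derivative of (25) with respect to $\log\beta_i$ is

$$\begin{aligned}
\frac{\partial L}{\partial\log\beta_i} &= -\frac{1}{2}\,\frac{\partial}{\partial\log\beta_i}\left[-\log|\mathbf{B}| + \log|\boldsymbol{\Sigma}^{-1}| + \sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right] + \frac{\partial}{\partial\log\beta_i}\sum_{i=1}^{N}\left(a_i\log\beta_i - b_i\beta_i\right) \\
&= -\frac{1}{2}\left[-\frac{\partial\log|\mathbf{B}|}{\partial\log\beta_i} + \frac{\partial\log|\boldsymbol{\Sigma}^{-1}|}{\partial\log\beta_i} + \frac{\partial\left(\sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right)}{\partial\log\beta_i}\right] + \frac{\partial}{\partial\log\beta_i}\sum_{i=1}^{N}\left(a_i\log\beta_i - b_i\beta_i\right)
\end{aligned} \tag{43}$$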

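Because $\mathbf{B} = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_N)$, the first term of formula (43) is

$$\frac{\partial\log|\mathbf{B}|}{\partial\log\beta_i} = \frac{\partial}{\partial\log\beta_i}\log\left(\prod_{n=1}^{N}\beta_n\right) = \frac{\partial}{\partial\log\beta_i}\left(\sum_{n=1}^{N}\log\beta_n\right) = 1 \tag{44}$$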

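The second term is

$$\begin{aligned}
\frac{\partial\log|\boldsymbol{\Sigma}^{-1}|}{\partial\log\beta_i} &= \frac{\partial\log|\boldsymbol{\Sigma}^{-1}|}{\partial\beta_i}\cdot\frac{\partial\beta_i}{\partial\log\beta_i} = \mathrm{tr}\left(\boldsymbol{\Sigma}\,\frac{\partial\boldsymbol{\Sigma}^{-1}}{\partial\beta_i}\right)\cdot\beta_i \\
&= \mathrm{tr}\left(\boldsymbol{\Sigma}\,\frac{\partial}{\partial\beta_i}\left(\mathbf{A} + \sigma^{-2}\boldsymbol{\Phi}^T\mathbf{B}\boldsymbol{\Phi}\right)\right)\cdot\beta_i \\
&= \sigma^{-2}\cdot\mathrm{tr}\left(\boldsymbol{\Sigma}\,\boldsymbol{\phi}(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_i)^T\right)\cdot\beta_i
\end{aligned} \tag{45}$$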

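The third term of (43) is

$$\begin{aligned}
\frac{\partial}{\partial\log\beta_i}\left[\sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right] &= \frac{\partial}{\partial\beta_i}\left[\sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right]\cdot\frac{\partial\beta_i}{\partial\log\beta_i} \\
&= \sigma^{-2}\left(t_i - \boldsymbol{\phi}(\mathbf{x}_i)^T\boldsymbol{\mu}\right)^2\cdot\beta_i
\end{aligned} \tag{46}$$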

and the fourth one is

$$\frac{\partial}{\partial\log\beta_i}\sum_{i=1}^{N}\left(a_i\log\beta_i - b_i\beta_i\right) = a_i - b_i\beta_i \tag{47}$$

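Substituting (44)–(47) into formula (43) gives

$$\frac{\partial L}{\partial\log\beta_i} = -\frac{1}{2}\left[-1 + \sigma^{-2}\cdot\mathrm{tr}\left(\boldsymbol{\Sigma}\,\boldsymbol{\phi}(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_i)^T\right)\cdot\beta_i + \sigma^{-2}\left(t_i - \boldsymbol{\phi}(\mathbf{x}_i)^T\boldsymbol{\mu}\right)^2\cdot\beta_i\right] + a_i - b_i\beta_i$$

Setting the above equation to zero and rearranging, we obtain the formula for $\beta_i$ ($i = 1, \ldots, N$), that is, formula (27) in this paper:

$$\beta_i = \frac{a_i + 0.5}{b_i + 0.5\left[\sigma^{-2}\left(t_i - \boldsymbol{\phi}(\mathbf{x}_i)^T\boldsymbol{\mu}\right)^2 + \sigma^{-2}\,\mathrm{tr}\left(\boldsymbol{\Sigma}\,\boldsymbol{\phi}(\mathbf{x}_i)\boldsymbol{\phi}(\mathbf{x}_i)^T\right)\right]}$$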


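A.3. The derivation of formula (28)

Taking the partial derivative of (25) with respect to $\log\sigma^2$ and ignoring the terms which are independent of $\log\sigma^2$ gives

$$\frac{\partial L}{\partial\log\sigma^2} = -\frac{1}{2}\left[N + \frac{\partial\left(\log|\boldsymbol{\Sigma}^{-1}| + \sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right)}{\partial\log\sigma^2}\right] \tag{48}$$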

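where

$$\begin{aligned}
&\frac{\partial}{\partial\log\sigma^2}\left(\log|\boldsymbol{\Sigma}^{-1}| + \sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right) \\
&\quad= \frac{\partial}{\partial\sigma^2}\left(\log|\boldsymbol{\Sigma}^{-1}| + \sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right)\cdot\frac{\partial\sigma^2}{\partial\log\sigma^2} \\
&\quad= \left[\frac{\partial}{\partial\sigma^2}\log|\boldsymbol{\Sigma}^{-1}| - \sigma^{-4}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right]\cdot\sigma^2 \\
&\quad= \sigma^2\cdot\frac{\partial}{\partial\sigma^2}\log|\boldsymbol{\Sigma}^{-1}| - \sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})
\end{aligned} \tag{49}$$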

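The first term of formula (49) is calculated as follows:

$$\begin{aligned}
\sigma^2\cdot\frac{\partial}{\partial\sigma^2}\log|\boldsymbol{\Sigma}^{-1}| &= \sigma^2\cdot\mathrm{tr}\left(\boldsymbol{\Sigma}\,\frac{\partial\boldsymbol{\Sigma}^{-1}}{\partial\sigma^2}\right) = \sigma^2\cdot\mathrm{tr}\left(\boldsymbol{\Sigma}\,\frac{\partial\left(\mathbf{A} + \sigma^{-2}\boldsymbol{\Phi}^T\mathbf{B}\boldsymbol{\Phi}\right)}{\partial\sigma^2}\right) \\
&= \sigma^2\cdot\mathrm{tr}\left(-\sigma^{-4}\boldsymbol{\Sigma}\boldsymbol{\Phi}^T\mathbf{B}\boldsymbol{\Phi}\right) = -\mathrm{tr}\left(\boldsymbol{\Sigma}\cdot\left(\sigma^{-2}\boldsymbol{\Phi}^T\mathbf{B}\boldsymbol{\Phi}\right)\right) \\
&= -\mathrm{tr}\left(\boldsymbol{\Sigma}\cdot\left(\boldsymbol{\Sigma}^{-1} - \mathbf{A}\right)\right) = -\mathrm{tr}\left(\mathbf{I} - \boldsymbol{\Sigma}\mathbf{A}\right)
\end{aligned}$$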

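Moreover, because $\gamma_j = 1 - \alpha_j\Sigma_{jj}$ and $\mathbf{I}$ is the $(N+1)\times(N+1)$ identity matrix, the above formula can be further expressed as:

$$\begin{aligned}
\sigma^2\cdot\frac{\partial}{\partial\sigma^2}\log|\boldsymbol{\Sigma}^{-1}| &= -\mathrm{tr}\left(\mathbf{I} - \boldsymbol{\Sigma}\mathbf{A}\right) = -\left((N+1) - \mathrm{tr}(\boldsymbol{\Sigma}\mathbf{A})\right) \\
&= -\left((N+1) - \sum_{j=0}^{N}\Sigma_{jj}\alpha_j\right) = -\sum_{j=0}^{N}\left(1 - \Sigma_{jj}\alpha_j\right) \\
&= -\sum_{j=0}^{N}\gamma_j
\end{aligned} \tag{50}$$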

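Substituting (49) and (50) into formula (48), we get

$$\frac{\partial L}{\partial\log\sigma^2} = -\frac{1}{2}\left[N + \left(-\sum_{j=0}^{N}\gamma_j - \sigma^{-2}(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})\right)\right]$$

Setting the above equation to zero and rearranging, we obtain formula (28):

$$\sigma^2 = \frac{(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})^T\mathbf{B}\,(\mathbf{t} - \boldsymbol{\Phi}\boldsymbol{\mu})}{N - \sum_{j=0}^{N}\gamma_j}$$

These are the derivations of the iteration formulas (26)–(28) for the hyperparameters $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$ and $\sigma^2$.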

References

Birk, W., Johansson, A., Medvedev, A., & Johansson, R. (2002). Model-based estimation of molten metal analysis in the LD converter: experiments at SSAB Tunnplåt AB in Luleå. IEEE Transactions on Industry Applications, 38(2), 560–565.

Blanco, C., & Diaz, M. (1993). Model of mixed control for carbon and silicon in a steel converter. ISIJ International, 33(7), 757–763.

Bloch, G., Sirou, F., Eustache, V., & Fatrez, P. (1997). Neural intelligent control for a steel plant. IEEE Transactions on Neural Networks, 8(4), 910–918.

Chang, C. C., & Lin, C. J. (2001). LIBSVM: A library for support vector machines. Software available from <http://www.csie.ntu.edu.tw/~cjlin/libsvm>.

Chou, K. C., Pal, U. B., & Reddy, R. (1993). A general model for BOP decarburization. ISIJ International, 33(8), 862–868.

Cox, I. J., Lewis, R. W., Ransing, R. S., Laszczewski, H., & Berni, G. (2002). Application of neural computing in basic oxygen steelmaking. Journal of Material Processing Technology, 120(1–3), 310–315.

Das, A., Maiti, J., & Banerjee, R. N. (2009). Process control strategies for a steel making furnace using ANN with Bayesian regularization and ANFIS. Expert Systems with Applications, 37(2), 1075–1085.

Dippenaar, R. (1999). Towards intelligent steel processing. In Proceedings of the 2nd international conference on intelligent processing and manufacturing of materials, Honolulu, USA (pp. 75–84).

Esen, H., Ozgen, F., Esen, M., & Sengur, A. (2009). Modelling of a new solar air heater through least-squares support vector machines. Expert Systems with Applications, 36(7), 10673–10682.

Faul, A. C., & Tipping, M. E. (2001). A variational approach to robust regression. Lecture Notes in Computer Science, 2130, 95–102.

Fileti, A. M. F., Pacianotto, T. A., & Cunha, A. P. (2006). Neural modeling helps the BOS process to achieve aimed end-point conditions in liquid steel. Engineering Applications of Artificial Intelligence, 19(1), 9–17.

Han, M., & Huang, X. Q. (2008). Greedy kernel component acting on ANFIS to predict BOF steelmaking endpoint. In Proceedings of the 17th world congress of the international federation of automatic control, Seoul, Korea (pp. 11007–11012).

Iida, Y., Emoto, K., Ogawa, M., Masuda, Y., Onishi, M., & Yamada, H. (1984). Fully automatic blowing technique for basic oxygen steelmaking furnace. ISIJ International, 24, 540–546.

Jang, J. S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man and Cybernetics, 23(3), 665–685.

Johansson, A., Medvedev, A., & Widlund, D. (2000). Model-based estimation of metal analysis in steel converters. In Proceedings of the 39th IEEE conference on decision and control, Sydney, Australia (pp. 2029–2034).

Mackay, D. J. C. (1992a). Bayesian interpolation. Neural Computation, 4(3), 415–477.

Mackay, D. J. C. (1992b). The evidence framework applied to classification networks. Neural Computation, 4(5), 720–736.

Melin, P., & Castillo, O. (2005). Intelligent control of a stepping motor drive using an adaptive neuro-fuzzy inference system. Information Sciences, 170(2–4), 133–151.

Mon, Y. J. (2007). Airbag controller designed by adaptive-network-based fuzzy inference system (ANFIS). Fuzzy Sets and Systems, 158(24), 2706–2714.

Müller, K. R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2), 181–201.

Narendra, K. S., & Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1), 4–27.

Pernía-Espinoza, A., Castejón-Limas, M., González-Marcos, A., & Lobato-Rubio, V. (2005). Steel annealing furnace robust neural network model. Ironmaking and Steelmaking, 32(5), 418–427.

Radhakrishnan, V. R., & Mohamed, A. R. (2000). Neural networks for the identification and control of blast furnace hot metal quality. Journal of Process Control, 10(6), 509–524.

Roweis, S. (1999). Matrix identities. Document available from <http://www.cs.nyu.edu/~roweis/notes/matrixid.pdf>.

Sun, Z. H., & Sun, Y. X. (2005). Soft sensor based on relevance vector machines for microbiological fermentation. Developments in Chemical Engineering and Mineral Processing, 13(3–4), 243–248.

Ting, J. A., D'Souza, A., & Schaal, S. (2007). Automatic outlier detection: A Bayesian approach. In Proceedings of 2007 IEEE international conference on robotics and automation, Roma, Italy (pp. 2489–2494).

Tipping, M. E. (2000). The relevance vector machine. Advances in neural information processing systems (Vol. 12). Cambridge, Massachusetts, USA: MIT Press, pp. 652–658.

Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1(3), 211–244.

Tipping, M. E., & Lawrence, N. D. (2005). Variational inference for student-t models: Robust Bayesian interpolation and generalised component analysis. Neurocomputing, 69(1–3), 123–141.

Valyon, J., & Horváth, G. (2009). A sparse robust model for a Linz-Donawitz steel converter. IEEE Transactions on Instrumentation and Measurement, 58(8), 2611–2617.

Vapnik, V. N. (2000). The nature of statistical learning theory (second ed.). New York, USA: Springer-Verlag.

Vapnik, V. N., Golowich, S. E., & Smola, A. (1997). Support vector method for function approximation, regression estimation, and signal processing. Advances in neural information processing system (Vol. 9). Cambridge, Massachusetts, USA: MIT Press, pp. 281–287.

Vong, C. M., Wong, P. K., & Li, Y. P. (2006). Prediction of automotive engine power and torque using least squares support vector machines and Bayesian inference. Engineering Applications of Artificial Intelligence, 19(3), 277–287.

Wei, L. Y., Yang, Y. Y., Nishikawa, R. M., Wernick, M. N., & Edwards, A. (2005). Relevance vector machine for automatic detection of clustered microcalcifications. IEEE Transactions on Medical Imaging, 24(10), 1278–1285.

Widodo, A., Kim, E. Y., Son, J. D., Yang, B. S., Tan, A. C. C., Gu, D. S., et al. (2009). Fault diagnosis of low speed bearing based on relevance vector machine and support vector machine. Expert Systems with Applications, 36(3), 7252–7261.

Yang, B., Zhang, Z. K., & Sun, Z. S. (2007). Robust relevance vector regression with trimmed likelihood function. IEEE Signal Processing Letters, 14(10), 746–749.

Yuan, J., Wang, K., Yu, T., & Fang, M. (2007a). Integrating relevance vector machines and genetic algorithms for optimization of seed-separating process. Engineering Applications of Artificial Intelligence, 20(7), 970–979.

Yuan, P., Mao, Z. Z., & Wang, F. L. (2007b). Endpoint prediction of EAF based on multiple support vector machines. Journal of Iron and Steel Research, International, 14(2), 20–24, 29.

Zhang, R., & Wang, S. (2008). Support vector machine based predictive functional control design for output temperature of coking furnace. Journal of Process Control, 18(5), 439–448.
