Forecasting disruptions in the ADITYA tokamak using neural networks

A. Sengupta, P. Ranjan

Institute for Plasma Research, Bhat, Gandhinagar, India

Abstract. A neural network technique has been used to predict disruptions in the ADITYA tokamak. A time series prediction method is employed, whereby a series of past values of a time dependent quantity is used to predict its value in the future. The time varying observables used in the present work are the diagnostic signals from four Mirnov probes, one soft X ray monitor and one Hα monitor. The predicted quantities are the same observables at some future time. The neural network is trained with the past values of the diagnostic signals as inputs and the future values of the same quantities as targets. The trained neural network is then used to forecast several time steps ahead, which amounts to making the prediction correspondingly earlier. Very good prediction results have been obtained up to 8 ms in advance, with little distortion of the signals and no appreciable time lag, a capability which is believed to be well suited to the task of on-line prediction of disruptions in ADITYA. Because actual experimental signals are used, the results give direct confidence in the performance of the neural network in a hardware implementation.

1. Introduction

A disruption in a tokamak is a sudden loss of confinement and the subsequent transfer of plasma energy to the surrounding structures. As a result, the machine walls and the supporting structures are subjected to an enormous heat load, causing moderate to severe damage. Disruptions also result in rapid plasma current decay, which induces large electric fields that in turn drive large eddy currents in the conducting structures and mechanical supports. This results in enormous j × B forces. The damage caused by these forces determines the lifetime of a machine. Disruption avoidance, or minimization of disruptivity, is therefore important for cost effective operation of tokamaks.

Artificial neural networks (ANNs) have already been used for studying different aspects of tokamak plasmas. These include fast estimation of plasma parameters in DIII-D [1], ASDEX Upgrade [2] and ITER [3], prediction of disruptions [4–6] and determination of the vertical position of the plasma current centroid [7]. They have also been used to rank the magnetic sensors according to their importance in the estimation of plasma parameters [2, 3].

The motivation for using ANNs for prediction of disruptions came from the early use of ANNs in various forecasting applications [8, 9]. However, the ultimate aim of the prediction is to reduce the frequency of disruptions on-line, in hardware. Therefore, if used as a disruption alarm, an ANN should not only give an accurate prediction of an approaching disruption, but should also make this prediction sufficiently early to allow measures to be taken to soften the impact of the disruption. In this article, ways to predict plasma disruptions in the ADITYA tokamak [10, 11] are discussed, using time series of various time dependent quantities obtained from diagnostics. These include fluctuations of the tangential component of the poloidal magnetic field, Bθ, as measured by Mirnov probes placed at different poloidal locations around the plasma. Such signals have been used earlier [4], where only a single probe was used as input to the ANN for the prediction. The results of that study are not suitable for the goal of disruption control, since:

(a) Large errors are present for predictions more than 1.1 ms in advance.

(b) An increasing time lag appears between the actual and the predicted instants of disruption as the prediction is made earlier and earlier.

In a recent work [6], soft X rays have been used as inputs instead of magnetic signals, and the prediction is made 3.12 ms in advance of the event, which is about a 200% improvement over the results of Ref. [4]. However, the time lag problem persists for predictions more than 3.12 ms in advance. For effective real time measures to be taken, this time has to increase by at least a factor of 2.

Nuclear Fusion, Vol. 40, No. 12 ©2000, IAEA, Vienna 1993

There are two goals for this article:

(i) To use an ANN to predict the instant at which a disruption is triggered.

(ii) To use an ANN to make the prediction sufficiently early that measures can be taken to soften the impact of disruptions.

The criterion for (i) above is that the exact instant of triggering of the disruptive instabilities should be picked up, rather than the instant of current decay, because once the current quench starts, control measures, even if taken, may prove futile. The triggering of instabilities is signalled primarily by:

(a) Increased MHD activity around the plasma edge, primarily the (m, n) = (2, 1) mode, picked up by a set of Mirnov coils located around the plasma. This activity immediately precedes the thermal quench.

(b) A fall in the soft X ray (SXR) intensity at the plasma core, which immediately follows edge cooling.

(c) Increased Hα emission.

For (ii), the major issue is the earliness, i.e. the extent of early prediction, which can be quantified by a time interval Δt. For disruption avoidance or minimization, Δt must be around 5–7 ms for effective measures to be taken.

The purpose of this article is to find out whether ANN architectures different from those used earlier, together with the use of additional diagnostic information, help to improve upon these results. So, in addition to several Mirnov probe signals, SXR and Hα emission signals have also been used here. A series of values of the diagnostic signals is taken as their past values, and a prediction involves a continuation of the series. This prediction can be a single time step into the future, or several time steps. The latter represents an earlier forecast, and this earliness can be increased by increasing the number of predicted time steps. However, since the prediction error increases with the number of time steps, the choice of how early to predict must remain within permissible errors.

The organization of this article is as follows. Section 2 contains a general treatment of time series prediction, while Section 3 briefly discusses ANNs and their relation to time series prediction. In Section 4 an overview of the different ANN architectures used for time series prediction is given. Section 5 describes the preparation of the database, Section 6 gives our forecasting results in detail, Section 7 discusses the results and Section 8 summarizes the results and conclusions.

2. Time series prediction

A time series [9] is a set of values taken to be measurements of an observable over time. The system on which the observable is measured evolves with time, i.e. it is a dynamical system. The observable is a function only of the state of the system; as soon as the system returns to its original state, the observable also returns to its original value.

Let the state of the system at present be represented by $a$ and the observable being measured by $p(a)$. It is assumed that state $a$ contains all the information required to predict the state $t$ time units into the future. Let the state at this future time be $F_t(a)$. The prediction refers to the calculation of the observable at time $t$ from a knowledge only of the present. Similarly, if one goes backwards in time from the present instant, a time series of past values of the observable is obtained:

$$b = [p(a),\, p(F_{-\tau}(a)),\, p(F_{-2\tau}(a)),\, \ldots,\, p(F_{-m\tau}(a))] \tag{1}$$

where $\tau$ is the time step length, i.e. the sampling interval of the observable. $b$ is thus a segment of a time series where the time dependence is now expressed explicitly:

$$b = [x_{t_1},\, x_{t_1-\tau},\, \ldots,\, x_{t_1-m\tau}] \tag{2}$$

where $x$ is the measured quantity, $x_{t_1} = p(a)$, and $t_1$ is the present time instant.

Equation (2) is the form of the time series that is generally used [4, 6, 8, 9].
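As an illustration of Eq. (2), the following minimal sketch extracts such a segment of past values from a sampled signal. The function name `delay_vector` and the example samples are assumptions introduced here, and the time step is expressed as a number of samples rather than in physical units.

```python
def delay_vector(series, t1, m, step=1):
    """Return [x_{t1}, x_{t1-step}, ..., x_{t1-m*step}] from a sampled series.

    series : list of samples, oldest first
    t1     : index of the present sample
    m      : number of past samples to include besides the present one
    step   : spacing between retained samples, in units of the sampling interval
    """
    if t1 >= len(series) or t1 - m * step < 0:
        raise ValueError("not enough samples for the requested window")
    return [series[t1 - k * step] for k in range(m + 1)]


# Hypothetical samples, used only to show the ordering (most recent first).
samples = [0.0, 0.1, 0.4, 0.9, 1.6, 2.5]
b = delay_vector(samples, t1=5, m=3)
print(b)  # [2.5, 1.6, 0.9, 0.4]
```

The most recent value comes first, matching the ordering of Eq. (2).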

Prediction means estimating the measured variable at future times, i.e. continuing the series by extrapolation. For the extrapolation, a functional representation of the extrapolated (predicted) value in terms of the given time series is required. This should have the following form:

$$x^{\mathrm{pred}}_{t_1+n\tau} = f_n[x_{t_1},\, x_{t_1-\tau},\, \ldots,\, x_{t_1-m\tau}]. \tag{3}$$

The left hand side of the above equation gives the predicted value of the dynamical quantity at the future time $t_1 + n\tau$ ($n = 1, 2, \ldots$), where again $t_1$ refers to the present, and $f_n$ gives the functional form of the transformation. The problem, therefore, is to find an approximation for $f_n$ to bring about the extrapolation.

Extrapolation schemes for $f_n$ can be divided into two broad categories, linear and non-linear. Linear models such as auto-regressive (AR), moving average (MA) or auto-regressive moving average (ARMA) models have been most frequently used for time series analysis [9]. These models work well only for simple time series and are likely to fail for stochastic or chaotic series. Analysis of such complex series requires a long time history of the series, yielding very high order linear models, i.e. models involving a very large number of linear terms (corresponding to the past temporal points of the series). Such high order models are impractical from a computational point of view.

Non-linear techniques, such as ANNs, wavelets and chaos analysis, can provide good insight into a complex time series when linear models fail (Ref. [12] and Refs [2–15] therein). The ANN algorithm invokes non-linear models that approximate a much broader class of functions than linear models, so that it can analyse complex time series without incurring large errors due to numerical instabilities.

3. Artificial neural networks

The ANN technique, which has its origins as an artificial model of the parallel processing capabilities of the human brain, is typically used in pattern recognition, where a collection of images is presented to the network and its task is to assign the images to one or more classes. Another typical use of the ANN is non-linear regression, where the algorithm is used to find a smooth interpolation between data points. By way of contrast, time series prediction involves processing of patterns which evolve over time, the response at a particular point of time depending not only on the current value of the observable, but also on its past values. The ANN, of which the multilayer perceptron (MLP) is the most widely used type, consists of several layers of nodes or neurons, and represents an analytic mapping between a set of inputs $x_i$ and a set of outputs $y_k$ (shown in Fig. 1, where i = 1–5 and k = 1–2). The layer(s) not directly accessible to the user, referred to as the hidden layer(s), produce the inherent non-linearity in the transformation, and also increase the network's ability to model different classes of function. While the sizes of the input and output layers are determined by the problem being solved, the size of the hidden layer is determined by trial and error, from the training and testing errors.

Signals propagate in the forward direction only, i.e. from the input towards the output. Signals impinging on a particular neuron j of a hidden layer are weighted by certain factors to give the net input $g_j$ to the neuron j:

$$g_j = \sum_{i=0}^{m} w_{ji} x_i \tag{4}$$

where $x_i$ refers to the output of the ith neuron of the input layer, m is the total number of input neurons and the weight $w_{ji}$ represents the strength of the connection between neuron j of a hidden layer and neuron i of the input layer. The index i = 0 corresponds to the bias term, whose value is $x_0$. The non-linear function usually chosen for the mapping is a sigmoidal function [13], acting on $g_j$, with the form

$$f(g_j) = \frac{2}{1 + e^{-g_j}} - 1. \tag{5}$$
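Equations (4) and (5) can be written out as a short sketch of a single hidden neuron. The function names are hypothetical, and the bias input $x_0 = 1$ is an assumption made here for illustration; the text only states that the bias corresponds to i = 0.

```python
import math


def net_input(weights, inputs, bias_input=1.0):
    """Eq. (4): g_j = sum_{i=0}^{m} w_ji * x_i, with weights[0] acting on the bias x_0.

    bias_input = 1.0 is an assumed value for x_0.
    """
    x = [bias_input] + list(inputs)
    if len(weights) != len(x):
        raise ValueError("need one weight per input plus one for the bias")
    return sum(w * xi for w, xi in zip(weights, x))


def symmetric_sigmoid(g):
    """Eq. (5): f(g) = 2 / (1 + exp(-g)) - 1, bounded between -1 and +1."""
    if g < -700.0:
        # exp(-g) would overflow a float here; the function has saturated at -1.
        return -1.0
    return 2.0 / (1.0 + math.exp(-g)) - 1.0


g = net_input([0.5, 1.0, -2.0], [2.0, 1.0])
print(g, symmetric_sigmoid(g))
```

The function in Eq. (5) is identical to $\tanh(g_j/2)$, which makes its symmetry about zero explicit.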

Neural network training refers to an adjustment of the weights to minimize an error, called the mean square error, defined by

$$E^2 = \frac{\sum_{k}\sum_{l}\left(y^{(l)}_k - y^{(l)\,\mathrm{des}}_k\right)^2}{N_{\mathrm{out}}\, N_{\mathrm{ex}}} \tag{6}$$

where $y^{(l)}_k$ is the kth output calculated by the ANN for the lth member of a training data set and $y^{(l)\,\mathrm{des}}_k$ is the desired value of that output. $N_{\mathrm{out}}$ and $N_{\mathrm{ex}}$ are the total numbers of outputs and examples, respectively, in a given problem. Note that $E^2$ is averaged over all examples and all outputs (normalized). Training is stopped when $E^2$ decreases to a pre-defined error goal.

To evaluate the performance of the network, the same network with the trained weights is applied to another set of known input/output examples called the test dataset. If the network performance on this dataset is satisfactory, the network is taken to generalize to any set of similar data, and can be used to process unknown data of that kind.
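The error of Eq. (6) is simple to evaluate directly. The sketch below assumes the network outputs and the desired values are stored as one list per example; the function name and the numbers are illustrative only.

```python
def mean_square_error(outputs, targets):
    """Eq. (6): squared differences summed over outputs k and examples l,
    divided by N_out * N_ex."""
    n_ex = len(outputs)
    n_out = len(outputs[0])
    total = 0.0
    for y_l, d_l in zip(outputs, targets):
        if len(y_l) != n_out or len(d_l) != n_out:
            raise ValueError("every example must have the same number of outputs")
        total += sum((y - d) ** 2 for y, d in zip(y_l, d_l))
    return total / (n_out * n_ex)


# Two examples with two outputs each (made-up numbers).
outputs = [[1.0, 0.0], [0.5, 0.5]]
targets = [[0.0, 0.0], [0.5, -0.5]]
print(mean_square_error(outputs, targets))  # 0.5
```

The same function applies unchanged to the training set and to the test dataset, which is how the training and testing errors can be compared.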

For time series analysis, the inputs to the ANN are the past values of the measured (temporally varying) quantity and the output is the predicted value. The more complex the time series, the more past information is needed. This results in a larger number of inputs and weights. The $y_k$ in the numerator of Eq. (6) are the outputs (ANN calculated and target) at a certain (future) time instant and are therefore local in time.

Figure 1. Structure of the ANN. This shows a general 5:3:3:2 MLP-2 network. The offset bias is not shown.

The functional representation $f_n$ shown in Eq. (3) is in general unknown and, for ANN modelling, is usually approximated using a sigmoidal function, as in Eq. (5). Here a very important property of ANNs is used, namely that it is only the nature of the function, i.e. whether it is linear or non-linear, that determines a transformation, rather than its actual form. It is this property which allows the inherent non-linearity of the ANN to be defined by only certain specific forms of sigmoidal function, while the examples in different problems may involve a broad spectrum of non-linear functions. If the time series is multivariate rather than univariate, the scalars x and y representing the inputs and outputs are replaced by vectors. In that case the product in Eq. (4) is replaced by the matrix-vector product $W \cdot x$.

4. General methods for the prediction

There are three possible methods for the prediction of disruption from the past values of a given time series, using a feedforward neural network [9]. These methods are used to predict the dynamical observable at a future time $t_1 + n$, i.e. $x_{t_1+n}$, from the available data at time $t_1$.

Method 1. One possibility is to construct a single function f which predicts one point into the future, and iterate this function on its own outputs to predict further into the future. Expressed mathematically,

$$x^{\mathrm{pred}}_{t_1+1} = f(x_{t_1},\, x_{t_1-1},\, \ldots) \tag{7}$$

$$x^{\mathrm{pred}}_{t_1+2} = f(x^{\mathrm{pred}}_{t_1+1},\, x_{t_1},\, x_{t_1-1},\, \ldots) \tag{8}$$

$$\vdots$$

$$x^{\mathrm{pred}}_{t_1+n-1} = f(x^{\mathrm{pred}}_{t_1+n-2},\, x^{\mathrm{pred}}_{t_1+n-3},\, \ldots,\, x_{t_1},\, x_{t_1-1},\, \ldots) \tag{9}$$

$$x^{\mathrm{pred}}_{t_1+n} = f(x^{\mathrm{pred}}_{t_1+n-1},\, x^{\mathrm{pred}}_{t_1+n-2},\, \ldots,\, x_{t_1},\, x_{t_1-1},\, \ldots) \tag{10}$$

where $x^{\mathrm{pred}}_{t_1+n}$ is the predicted value of x at a time n steps ahead of $t_1$.

Method 2. One function can be constructed that uses only past data as inputs to directly predict one desired future point, i.e.

$$x^{\mathrm{pred}}_{t_1+n} = f(x_{t_1},\, x_{t_1-1},\, \ldots). \tag{11}$$

Method 3. Another method which can be proposed is to construct a separate function for each step, each taking both previous predictions and past values as inputs and predicting only the next future point as output:

$$x^{\mathrm{pred}}_{t_1+1} = f_1(x_{t_1},\, x_{t_1-1},\, \ldots) \tag{12}$$

$$x^{\mathrm{pred}}_{t_1+2} = f_2(x^{\mathrm{pred}}_{t_1+1},\, x_{t_1},\, x_{t_1-1},\, \ldots) \tag{13}$$

$$\vdots$$

$$x^{\mathrm{pred}}_{t_1+n-1} = f_{n-1}(x^{\mathrm{pred}}_{t_1+n-2},\, x^{\mathrm{pred}}_{t_1+n-3},\, \ldots,\, x_{t_1},\, x_{t_1-1},\, \ldots) \tag{14}$$

$$x^{\mathrm{pred}}_{t_1+n} = f_n(x^{\mathrm{pred}}_{t_1+n-1},\, x^{\mathrm{pred}}_{t_1+n-2},\, \ldots,\, x_{t_1},\, x_{t_1-1},\, \ldots). \tag{15}$$

In all the above cases, n = 1 implies a single step prediction. Although both single and multistep prediction can be used, our primary aim is the latter, since the application here requires long term prediction. Methods 1 and 3 are called iterated prediction methods, while method 2 is a direct prediction method.

Table 1. Major parameters of the tokamak ADITYA

Parameters                            Design values   Range of the discharges used
Major radius (cm)                     75
Minor radius (cm)                     25
Plasma cross-section shape            Circular
Plasma current (kA)                   250             80–100
Toroidal field at plasma centre (T)   1.5             0.75
Plasma duration (ms)                  300             60–85
Electron temperature (eV)             500             250–300
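The difference between the iterated method 1 (Eqs (7)–(10)) and the direct method 2 (Eq. (11)) can be made concrete with a small sketch. The predictor functions here are stand-ins for trained networks, and the toy dynamics $x_{t+1} = 0.5\,x_t$ is an assumption chosen only so that both routes can be checked by hand. Method 3 would differ from method 1 only in using a different function at each step.

```python
def iterated_prediction(f, history, n):
    """Method 1: apply a one-step predictor f repeatedly, n times.

    history is [x_t1, x_t1-1, ...] (most recent first). Each predicted value
    is prepended to the inputs before the next call, as in Eqs (7)-(10).
    """
    inputs = list(history)
    for _ in range(n):
        inputs.insert(0, f(inputs))
    return inputs[0]


def direct_prediction(f_n, history):
    """Method 2: one function maps the past data straight to x_{t1+n}, Eq. (11)."""
    return f_n(list(history))


# Toy predictors for the assumed dynamics x_{t+1} = 0.5 * x_t.
def one_step(inputs):
    return 0.5 * inputs[0]


def three_step(inputs):
    return 0.5 ** 3 * inputs[0]


history = [8.0, 16.0, 32.0]
print(iterated_prediction(one_step, history, 3))  # 1.0
print(direct_prediction(three_step, history))     # 1.0
```

With exact predictors the two routes agree. With an imperfect one-step predictor, the iterated route feeds its own errors back into the inputs at every step, while the direct route needs a separately trained function for each n.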

5. Database preparation

The database for the prediction task was prepared using experimental ADITYA discharges (Table 1 lists some major parameters of ADITYA). One disruptive discharge was used for training and one for testing. Forecasting was then done with three disruptive discharges. The plasma discharges chosen for our work were all sampled at intervals of 0.02 ms.

Ten past values of each of the input variables were used, with one predicted value of each variable at the output, chosen many steps ahead, given the requirements for real time prediction. This number of past temporal points was slightly less than that used for the TEXT studies, where 15 past values of a single input were used. However, as will be seen later, there was a total of 60 inputs in the present study, consisting of 10 past temporal values of six different diagnostic signals. This will be shown to be the optimum number of inputs.
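A minimal sketch of how one training example with 60 inputs might be assembled from six sampled signals follows. The function name, the ordering of the inputs (signal by signal, most recent value first) and the synthetic data are assumptions made for illustration; the text does not specify how the inputs were ordered.

```python
def build_example(signals, t1, n, n_past=10):
    """Return (inputs, targets) for a direct n-step-ahead prediction.

    signals : list of equal-length sample lists, one per diagnostic signal
    t1      : index of the present sample
    n       : number of time steps ahead to predict
    n_past  : number of past values taken from each signal
    """
    length = len(signals[0])
    if t1 - (n_past - 1) < 0 or t1 + n >= length:
        raise ValueError("window does not fit inside the recorded signal")
    inputs = [s[t1 - k] for s in signals for k in range(n_past)]
    targets = [s[t1 + n] for s in signals]
    return inputs, targets


# Six synthetic signals of 30 samples each (placeholder data, not ADITYA signals).
signals = [[i + 100 * c for i in range(30)] for c in range(6)]
inputs, targets = build_example(signals, t1=12, n=5)
print(len(inputs), len(targets))  # 60 6
```

Sliding `t1` along a discharge then yields the full set of input/target examples for a given n.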

The type of network chosen for this work was an MLP-2 ANN with two hidden layers of 16 neurons each. The reason for using this rather than an MLP-1 network lay in the quality of fitting. It was found that although the training error was smaller for the MLP-1 network with 32 hidden neurons, the testing error, as well as the difference between the training and testing errors, was much smaller for the MLP-2 network with the same 32 neurons divided equally between the two hidden layers. This was not surprising, because if the number of input neurons is large in comparison with the number of hidden neurons (as in our case), an MLP-2 contains fewer weights than an MLP-1 with the same total number of hidden neurons, and therefore tends to generalize better.
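The statement about the number of weights can be checked with a short count. The sketch below uses the layer sizes given in the text (60 inputs, 32 hidden neurons in total, six outputs) and assumes fully connected layers with one bias weight per neuron, which is a counting convention adopted here rather than a detail stated in the article.

```python
def count_weights(layer_sizes):
    """Number of weights in a fully connected MLP, counting one bias per neuron.

    layer_sizes lists the number of neurons in each layer, input layer first.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


mlp1 = count_weights([60, 32, 6])      # one hidden layer of 32 neurons
mlp2 = count_weights([60, 16, 16, 6])  # two hidden layers of 16 neurons each
print(mlp1, mlp2)  # 2150 1350
```

Under these assumptions the MLP-2 network has roughly 37% fewer adjustable weights, because the large input layer connects to only 16 neurons instead of 32.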

Looking at the iterated methods in Section 4, it was observed that since the number of inputs and outputs increased with every iteration, long term prediction would be computationally intensive, while for real time prediction of disruptions the predictions should reach far enough ahead to require of the order of 200–400 iterations. Moreover, the single step predicted variable $x^{\mathrm{pred}}_{t_1+1}$, which is fed back to the input to predict $x_{t_1+2}$, is certainly not as accurate as the measured values $x_{t_1}, x_{t_1-1}, \ldots$ Therefore, the iterative method was not thought to be well suited to the task of disruption prediction. Hence method 2, the direct method, was used for our predictions. In the present study, with a sampling time of 0.02 ms, a prediction 1 ms in advance corresponded to 50 predicted time steps (i.e. n = 50 in Eq. (11)). Similarly, n = 100 for a prediction 2 ms in advance and n = 400 when the forecast is made 8 ms in advance.
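The conversion between the prediction interval and the number of predicted time steps n in Eq. (11) is a single division by the sampling time. A small sketch, with a hypothetical function name:

```python
def steps_ahead(delta_t_ms, sampling_ms=0.02):
    """Number of time steps n corresponding to a prediction delta_t_ms in advance."""
    # round() guards against floating point error in the division.
    return round(delta_t_ms / sampling_ms)


for delta_t in (1.0, 2.0, 4.0, 8.0):
    print(delta_t, steps_ahead(delta_t))
```

For the 0.02 ms sampling time used here, this gives n = 50, 100, 200 and 400 for predictions 1, 2, 4 and 8 ms in advance.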

The non-linear mapping was brought about by the sigmoid of Eq. (5). This is a symmetric sigmoid, bounded in the interval [−1, +1]. The inputs and outputs were normalized to the same interval. Without this normalization, a normalization constant would have been required in Eq. (6), as the outputs had different dimensions.
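One common way to bring a signal into the interval [−1, +1] is linear min-max scaling. The article does not state which normalization was used, so the sketch below is an assumption shown only to make the step concrete.

```python
def normalize(values):
    """Linearly rescale values so that the minimum maps to -1 and the maximum to +1.

    Min-max scaling is an assumed choice, not necessarily the one used in the study.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant signal cannot be rescaled")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


print(normalize([0.0, 5.0, 10.0]))  # [-1.0, 0.0, 1.0]
```

Scaling each signal separately makes the six output errors in Eq. (6) dimensionless and comparable, whatever the original units of the diagnostics.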

[Figure 2: five stacked panels of the SXR signal against time (ms), 55–90 ms: expt.; predicted Δt = 1 ms; predicted Δt = 2 ms; predicted Δt = 3 ms; predicted Δt = 4 ms.]

Figure 2. Using only soft X ray signals as input to the neural network, the quality of prediction for Δt = 1, 2, 3 and 4 ms is compared with the actual signal. For Δt = 3 ms, a time lag appears for the first time in the predicted signal with respect to the actual experimental data. This lag increases with higher Δt. The vertical lines represent the instant at which the disruption is actually triggered.

The ANN was trained using the general adaptive recipe (GAR) algorithm [14]. The learning rate, or gradient descent step length, was initialized to 1.0. On-line modification of the learning rate was possible in GAR, through specification of up and down adaptation parameters, which were set at 0.002 and 0.8, respectively. These values were determined during the network training process. A larger up adaptation increased the gradient descent step length so much that it often overshot the minimum, whereby the error increased. A smaller down adaptation did not reduce the learning rate enough, so that after a few iterations the learning rate increased once again and overshot the minimum. This effectively slowed down the training.
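The role of the up and down adaptation parameters can be pictured with a generic adaptive step rule. This is not the GAR algorithm of Ref. [14], whose update formula is not given in the text; the multiplicative form below, in which the rate grows by a fraction `up` after an improvement and shrinks by a fraction `down` after a deterioration, is an assumption chosen to be consistent with the behaviour described above.

```python
def adapt_learning_rate(rate, prev_error, new_error, up=0.002, down=0.8):
    """Generic adaptive step length rule (assumed form, not the GAR update).

    If the error decreased, enlarge the rate by the fraction `up`;
    otherwise cut it by the fraction `down`.
    """
    if new_error < prev_error:
        return rate * (1.0 + up)
    return rate * (1.0 - down)


rate = 1.0
rate = adapt_learning_rate(rate, prev_error=0.50, new_error=0.40)  # error fell
print(rate)
rate = adapt_learning_rate(rate, prev_error=0.40, new_error=0.45)  # error rose
print(rate)
```

In this picture a larger `up` makes the step grow quickly until the minimum is overshot, and a smaller `down` leaves the step too large after an overshoot, which matches the two failure modes noted in the text.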

To begin with, the ANN was trained with only one diagnostic signal. This was to test the performance of the network with input information similar to that already used in Refs [4] and [6]. First, one Mirnov probe was used as input, followed by the SXR signal. Finally, only the Hα signal was used as the single input. From the training stage itself it became clear that the network required additional information to learn the trends in the data, as the learning remained very slow throughout. The only exception was the training with the Hα signal, when the error reduced much faster.

The performance of the trained network, fed with SXR signals, in forecasting disruptions is presented in Fig. 2. The vertical lines denote the actual triggering instant of the instabilities. The main observation here is that the predicted instant of triggering of the disruption started lagging behind the actual signal when the prediction was made 3 ms or more in advance. This more or less agreed with the results of Ref. [6].

The number of inputs was then increased by choosing two Mirnov probes together with the SXR and Hα signals. The Mirnov probes chosen first were two closely located ones, at poloidal angles of 114° and 138°. It was observed that the learning worsened, as did the forecasting errors on a new discharge. Next, two probes located more or less diametrically opposite each other were selected, at angles of 42° and 234°. For this set of inputs, the learning improved over the SXR case but was worse than in the Hα case. The performance of the ANN on new data, however, remained more or less the same as in the single input cases. The experiment was repeated with similar inputs, but now with the two Mirnov probes located at 138° and 330°. A much improved generalization capability of the ANN was noticed. Moreover, the ANN seemed to have gained a better tolerance for long term predictions.

Table 2. Comparison of the mean square training errors for the ANN provided with different combinations of diagnostic signals as inputs

Combination of inputs   Training error
SXR                     0.0165
Hα                      2.65 × 10⁻⁴
Four inputs (a)         0.0054
Four inputs (b)         0.0103
Six inputs (c)          0.0086

(a) Four inputs: Mirnov probes at 42° and 234°, together with SXR and Hα.
(b) Four inputs: Mirnov probes at 138° and 330°, together with SXR and Hα.
(c) Six inputs: Mirnov probes at 42°, 138°, 234° and 330°, together with SXR and Hα.

The number of inputs was further increased to four magnetic signals from probes located in the four quadrants around the plasma at angles of 42°, 138°, 234° and 330°, together with the SXR and Hα signals. Although the execution time increased because of the larger ANN structure, this set of inputs clearly produced an overall improvement in the fitting.

This observation was believed to be due to the uniformity of the probe locations around the plasma, so that more information was now put into the network. This was corroborated by the fact that initially the choice of Mirnov probes located diametrically opposite each other improved the performance, as compared with the set of signals from two closely located probes. The trained ANN then behaved still better with four probes more uniformly spread out over the four quadrants. This shows that the poloidal distribution of the probes was crucial for the ANN to perform well on out of sample discharges. Use of more probes, however, did not improve the fitting much, and the network ran the risk of becoming too heavy, resulting in unnecessary computation time.

Table 3. Comparison of ANN performance with respect to mean square error E² for single input and multiple input cases

Δt (ms)   E²(SXR)   E²(Hα)        E²(4a)   E²(4b)   E²(6)
0.02      0.0224    5.62 × 10⁻⁴   0.1396   0.0221   0.0144
1.00      0.0342    0.1039        0.1758   0.0579   0.0365
2.00      0.0721    0.1898        0.2032   0.0922   0.0562
3.00      0.1141    0.2107        0.2143   0.1105   0.0667
4.00      0.1621    0.2254        0.2278   0.1262   0.0762
8.00      0.3038    0.2546        0.2662   0.1763   0.1088

Notes:
E²(SXR): mean square error for single input with SXR signal.
E²(Hα): mean square error for single input with Hα signal.
E²(4a): mean square error for four inputs when the Mirnov probes were chosen from the 42° and 234° locations.
E²(4b): mean square error for four inputs when the Mirnov probes were chosen from the 138° and 330° locations.
E²(6): mean square error for six inputs.

Table 2 compares the training errors for different ANN inputs. Table 3 displays the performance of the trained ANN with various combinations of inputs. These include:

(a) A single SXR signal input;

(b) A single Hα signal input;

(c) Four inputs consisting of the two Mirnov probes at poloidal angles of 42° and 234°, the SXR and Hα signals;

(d) Four inputs consisting of the two Mirnov probes at poloidal angles of 138° and 330°, the SXR and Hα signals;

(e) Six inputs comprising all four Mirnov probe signals, and the SXR and Hα signals.

When applied to new data, it is clear from Table 3 that the ANN was most tolerant to the increase in the number of predicted time steps when six different diagnostics were used, although the training error as well as the single step prediction error were smallest when only the Hα signal was the input.
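One way to read Table 3 is to pick, for each prediction interval, the input combination with the smallest error. The sketch below does this with the values as tabulated; the column labels (`four_a`, `four_b` and so on) are names introduced here for illustration.

```python
COLUMNS = ("SXR", "Halpha", "four_a", "four_b", "six")

# Mean square errors from Table 3, keyed by the prediction interval in ms.
TABLE3 = {
    0.02: (0.0224, 5.62e-4, 0.1396, 0.0221, 0.0144),
    1.00: (0.0342, 0.1039, 0.1758, 0.0579, 0.0365),
    2.00: (0.0721, 0.1898, 0.2032, 0.0922, 0.0562),
    3.00: (0.1141, 0.2107, 0.2143, 0.1105, 0.0667),
    4.00: (0.1621, 0.2254, 0.2278, 0.1262, 0.0762),
    8.00: (0.3038, 0.2546, 0.2662, 0.1763, 0.1088),
}


def best_inputs(table):
    """For each interval, return the label of the column with the smallest error."""
    return {dt: COLUMNS[row.index(min(row))] for dt, row in table.items()}


best = best_inputs(TABLE3)
for dt in sorted(best):
    print(dt, best[dt])
```

With these values the six-input network has the smallest error for every interval from 2 ms upwards, while the Hα-only network is best for the single step prediction and the SXR-only network is slightly better at 1 ms.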

Therefore, the final set of diagnostic data used in this study consisted of the following:

(i) Four Mirnov probe signals. The probes chosen are located more or less symmetrically around the plasma, at poloidal angles of 42°, 138°, 234° and 330°.

(ii) One set of SXR monitor data.

(iii) One set of Hα monitor data.

Table 4. Comparison of the instants of disruption triggering as displayed by the indicator for various Δt using the Hα signal

Δt (ms)   Actual instant (ms)   Predicted instant (ms)
1.00      81.495                81.50
2.00      81.495                81.50
3.00      81.495                81.50
4.00      81.495                81.50
5.00      81.495                81.50
6.00      81.495                81.50
7.00      81.495                81.50
8.00      81.495                81.52

Since each of the inputs to the ANN was an array composed of the past values of the variable, it had to be expressed as a vector rather than a scalar, the vector components corresponding to the past values (of which there were ten in our case). Thus there were six input vectors in the network, corresponding to the six diagnostic signals listed above. The outputs were the future values of the same signals, which in this study were predicted at a single time instant only, according to Eq. (11). Thus, the ANN had six scalar outputs.

6. Forecasting disruption

After the ANN was trained and the weights were properly set, it was used to forecast disruptions in three disruptive discharges from ADITYA. These discharges differed in maximum plasma current and duration, but the general behaviour of the fluctuating quantities was similar. Another notable feature was that all these discharges ended in a major disruption, without any preceding minor disruption. As already mentioned, an important criterion for all our forecasting was to pick up the instant of disruption triggering.

For the actual detection of the instant of disruption triggering, which was our first goal, an indicator was constructed such that, the moment the instabilities set in, an alarm would be given to the control system, which could then take measures to soften the impact of the disruption. Table 4 shows the triggering instants as displayed by the indicator for various Δt, using the Hα signal of one of the forecasting discharges. The Hα radiation in ADITYA was seen to remain at a more or less constant value (Figs 3, 6 and 9) during the ramp-up and flat-top phases of the discharge, before starting to rise at the instant the disruption precursors set in (which coincides with the instant of disruption triggering). So the criterion for defining the disruption triggering was that the signal value should be greater than 2.00. The results showed that the predicted instants remained exactly the same up to Δt = 7 ms (although there was a very small discrepancy with the actual signal), while for Δt = 8 ms a small time lag of 0.02 ms was observed for the first time. This was the trend in all the discharges used for forecasting, where this time lag varied from 0.02 to 0.03 ms. Therefore, in our results Δt was limited to 8 ms.

Since the ANN inputs were experimental signals, they inevitably contained noise. It was observed that filtering out the noise gave a marked reduction in error, as shown in Table 5, so that a better fit was achieved. This motivated us to use filtered experimental data as inputs in the subsequent cases.

Table 5. Comparison of ANN performance with respect to mean square error E² for unfiltered and filtered input signals (the first value of Δt corresponds to a single step ahead prediction)

Δt (ms)   Without filter   With filter
0.02      0.0272           0.0090
1.00      0.0577           0.0365
2.00      0.0650           0.0452
3.00      0.1037           0.0858
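The threshold criterion used by the triggering indicator (Hα signal value greater than 2.00) can be sketched as a scan for the first sample that exceeds the threshold. The function name and the sample values are hypothetical; only the threshold of 2.00 is taken from the text.

```python
def trigger_instant(times_ms, signal, threshold=2.00):
    """Return the time of the first sample whose value exceeds the threshold,
    or None if the signal never crosses it."""
    for t, value in zip(times_ms, signal):
        if value > threshold:
            return t
    return None


# Made-up samples around a rise in the signal, spaced 0.02 ms apart.
times = [81.46, 81.48, 81.50, 81.52]
h_alpha = [1.2, 1.9, 2.3, 3.0]
print(trigger_instant(times, h_alpha))  # 81.5
```

Applying the same scan to the measured signal and to each predicted signal gives the pairs of actual and predicted instants of the kind listed in Table 4.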

Figure 3 shows the first of the discharges used for forecasting, shot 6690. This 95.28 kA plasma disrupted at t ≈ 82 ms, while the disruption was triggered at t ≈ 81.50 ms, as our indicator shows. Figure 4 compares the quality of prediction of this disruptive event Δt = 1, 2, 4 and 8 ms in advance, with respect to the SXR experimental signal. Figure 5 does the same, but with the Hα signals. With a sampling time of 0.02 ms for these discharges, this corresponded to predicted time instants 50, 100, 200 and 400 time steps ahead, respectively, these being the values of n in Eq. (11).

The major observations from these figures were

the following.

(a) Unlike the previous articles [4] and [6] where

a time lag was reported for the predicted instant of



9/16


0

10

20

Vloop2

5

0

5

SX

R

5

0

5

mag.

fluct.

0

50

100

Ip(kA)

Shot : 6690 06Jan1999 01:46:58 PM

0

5

H

0 10 20 30 40 50 60 70 80 90 1000

1

2

Bv(kA)

Time(ms)

Figure 3. The first disruptive discharge, shot 6690, was used for forecasting. This

plasma shot disrupted around 82 ms, and the disruption was triggered around 81 ms.

The plasma current attained prior to disruption was t 90 kA.

5

0

5

predicted t = 1ms

5

0

5

predicted t = 2ms

5

0

5

predicted t = 4ms

50 55 60 65 70 75 80 85 905

0

5

time(ms)

predicted t = 8ms

5

0

5

expt.

Figure 4. Forecasting disruption using our full network with six inputs for shot

6690. Only SXR signals are shown. The actual experimental signal is compared with

the neural network predictions for t = 1, 2, 4 and 8 ms early, as shown. The vertical

lines indicate the actual instant of triggering the disruption.



10/16


[Figure 5: five stacked Hα panels (expt.; predicted Δt = 1, 2, 4 and 8 ms) plotted against time (ms) from 50 to 90 ms.]

Figure 5. Forecasting disruption using shot 6690. Only Hα signals are shown. The actual experimental signal is compared with the neural network predictions for Δt = 1, 2, 4 and 8 ms early, as shown. The vertical lines indicate the actual instant of triggering the disruption.

disruption beyond 1.12 and 3.12 ms, respectively, the present study did not show any appreciable time lag even for a prediction 8 ms earlier. This showed a significant improvement of the results from feeding more diagnostic information into our neural network.

(b) As the temporal activities were predicted earlier and earlier, there was only a small change in the waveform of the predicted signals with respect to the corresponding targets.

(c) The last 30 ms of the discharge was scanned. This was found to be enough for our purpose, as the temporal activities around the time the instabilities were triggered were well depicted. Moreover, sawtooth phenomena are clearly observed in Fig. 4, around 55 ms, and these are also included within the predicted part of the signal.

(d) The vertical lines in Figs 4 and 5 indicate the instant at which the disruptive instabilities have just been triggered. By following this line through the five plots of each figure, the ANN prediction and the actual disruption can be compared directly.

(e) A prediction at a time Δt early means that the signal at time t is predicted at the instant t − Δt. If the prediction results are analysed for Δt = 8 ms, it is observed that the instant of observation of disruption precursors around 81 ms was predicted by using the temporal behaviour around 73 ms.
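The pairing of past values with a target n steps ahead, as described in item (e), can be sketched as follows. This is only an illustration under stated assumptions: the function name `make_direct_pairs`, the array layout, and the choice of consecutive past samples are hypothetical and do not reproduce Eq. (11) or the original program. The data are a synthetic stand-in, not ADITYA measurements.

```python
import numpy as np


def make_direct_pairs(signals: np.ndarray, n_past: int, n_ahead: int):
    """Build (input, target) pairs for direct n-step-ahead prediction.

    signals: array of shape (n_samples, n_signals), one column per diagnostic.
    Each input holds the n_past most recent samples of every signal (assumed
    consecutive); each target holds every signal n_ahead samples later.
    """
    n_samples, _ = signals.shape
    last_start = n_samples - n_past - n_ahead
    if last_start < 0:
        raise ValueError("record too short for this window and lead time")
    inputs, targets = [], []
    for start in range(last_start + 1):
        end = start + n_past  # index just past the newest input sample
        inputs.append(signals[start:end].ravel())
        targets.append(signals[end - 1 + n_ahead])
    return np.array(inputs), np.array(targets)


# Synthetic stand-in: 30 ms sampled every 0.02 ms gives 1500 samples of 6 signals.
# Every column stores the sample index, so the pairing is easy to inspect.
signals = np.arange(1500, dtype=float)[:, None] * np.ones((1, 6))
X, y = make_direct_pairs(signals, n_past=10, n_ahead=400)
print(X.shape, y.shape)    # (1091, 60) (1091, 6)
print(X[0][-1], y[0][0])   # newest input sample 9.0, target sample 409.0

# Item (e) in numbers: the signal at t = 81 ms, forecast 8 ms early,
# is produced from data available at t - Δt.
instant_of_prediction_ms = 81.0 - 8.0
print(instant_of_prediction_ms)  # 73.0
```

With six signals and ten past values each, every input vector has 60 components, and the newest input sample always precedes its target by exactly `n_ahead` steps.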

Figure 6 shows the second plasma discharge used for forecasting. This 83.57 kA discharge disrupted at t ≈ 62 ms, the disruption being triggered at t ≈ 60 ms. Figures 7 and 8 display the performance of the neural network for prediction of this disruption 1, 2, 4 and 8 ms early, with only two of the inputs, the SXR and Hα signals, being shown.

Analysis of shot 6520 revealed the following:

(i) Once again a very good prediction of the triggering of the instability, the instant of which is given by the vertical lines, was observed even for Δt = 8 ms.

(ii) The last 28 ms of this discharge were predicted. The reason for choosing only this portion was that in this temporal range the SXR signal was observed to rise along with the current ramp-up. It was observed that the signal from the monitor was able to pick up the actual rise of core temperature only around 30 ms. However, once again this





[Figure 6: time traces for shot 6520 (24 Dec 1998, 05:09:45 PM): loop voltage (Vloop), SXR, magnetic fluctuations, Ip (kA), Hα and Bv (kA), plotted against time (ms) from 0 to 80 ms.]

Figure 6. The second disruptive discharge, shot 6520, used for forecasting. This plasma discharge disrupted around 62 ms, and the disruption was triggered around 60 ms. The plasma current attained prior to disruption was ≈ 80 kA.

[Figure 7: five stacked SXR panels (expt.; predicted Δt = 1, 2, 4 and 8 ms) plotted against time (ms) from 35 to 65 ms.]

Figure 7. Forecasting disruption using shot 6520. Only SXR signals are shown. The actual experimental signal is compared with the neural network predictions for Δt = 1, 2, 4 and 8 ms early, as shown. The vertical lines indicate the actual instant of triggering the disruption.





[Figure 8: five stacked Hα panels (expt.; predicted Δt = 1, 2, 4 and 8 ms) plotted against time (ms) from 35 to 65 ms.]

Figure 8. Forecasting disruption using shot 6520. Only Hα signals are shown. The actual experimental signal is compared with the neural network predictions for Δt = 1, 2, 4 and 8 ms early, as shown. The vertical lines indicate the actual instant of triggering the disruption.

sufficed, as this time regime contained the disruption precursors followed by the current quench, as well as a portion of the discharge prior to the triggering of the instabilities.

(iii) The spikes of the SXR signal towards the negative side were only noise and did not have any physical significance. These spikes continued even after the discharge terminated. However, Fig. 7 shows that the noise level was considerably filtered, and the negative spikes were greatly reduced.

The third discharge used for forecasting, shot 6688, is shown in Fig. 9. In this case the 98.39 kA plasma disrupted at t ≈ 65 ms, while the triggering instabilities set in around 63.32 ms, according to the indicator. The observations from this discharge are described below:

(a) The signal from the Mirnov probe at 42°, and the SXR and Hα signals were predicted remarkably well, with very little distortion in the signals even for a prediction 8 ms early.

(b) The SXR signals in this case did not contain any negative spikes. In addition, sawtooth oscillations were observed prior to the disruption, for the last 30 ms. These sawteeth were picked up very well by the neural network.

(c) The vertical lines in Figs 10–12 show the instant of triggering of the disruptive instabilities. From Fig. 10 one observes that the MHD activities as picked up by the Mirnov probe started increasing around 63 ms, when the magnetic fluctuations increased in amplitude.

It was seen from the results of all the three disrup-

tive discharges that, while predicting the disruption

occurrence, the ANN did not give any false predic-

tion within the non-disruptive part of the discharge.

This should be a good motivation for using this algo-

rithm as a disruption alarm.
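One way to make the notion of a false prediction operational is sketched below. The criterion is an assumption introduced here for illustration and is not the indicator used in this work: an alarm counts as false if it is raised earlier than the longest lead time considered (8 ms) before the actual trigger. The function name `classify_alarms` and the alarm times are hypothetical; the numbers are illustrative and are not measurements.

```python
def classify_alarms(alarm_times_ms, trigger_time_ms, max_lead_ms=8.0):
    """Split alarm times into false alarms, valid warnings and late alarms.

    Assumed criterion (not from the paper): an alarm is a valid warning if it
    falls within max_lead_ms before the actual trigger, false if it comes
    earlier than that, and late if it comes after the trigger.
    """
    false_alarms, warnings, late = [], [], []
    for t in alarm_times_ms:
        if t > trigger_time_ms:
            late.append(t)
        elif t >= trigger_time_ms - max_lead_ms:
            warnings.append(t)
        else:
            false_alarms.append(t)
    return false_alarms, warnings, late


# Illustrative alarm times only; the trigger time is that quoted for shot 6690.
false_alarms, warnings, late = classify_alarms([55.0, 74.2, 83.0], trigger_time_ms=81.5)
print(false_alarms, warnings, late)  # [55.0] [74.2] [83.0]
```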

A general feature of all the predictions was that towards the beginning of the predicted interval, several of the predicted signals became a little distorted with respect to the actual signal, especially at higher Δt. However, for achieving the goals of the present study, this was not likely to prove a hurdle, as only the prediction of the signal around the instant of the triggering of disruptive instabilities was of prime concern. In the earlier part of the discharges, the





[Figure 9: time traces for shot 6688 (06 Jan 1999, 01:34:55 PM): loop voltage (Vloop), SXR, magnetic fluctuations, Hα, Bv (kA) and Ip (kA), plotted against time (ms) from 0 to 70 ms.]

Figure 9. The third disruptive discharge, shot 6688, used for forecasting. This plasma discharge shows a major disruption at t ≈ 66 ms. The plasma current attained prior to disruption was ≈ 103 kA.

[Figure 10: five stacked Mirnov probe panels (expt.; predicted Δt = 1, 2, 4 and 8 ms) plotted against time (ms) from 35 to 80 ms.]

Figure 10. Forecasting disruption using shot 6688. Only the signals from the Mirnov probe at 42° are shown.








[Figure 11: five stacked SXR panels (expt.; predicted Δt = 1, 2, 4 and 8 ms) plotted against time (ms) from 35 to 80 ms.]

Figure 11. Forecasting disruption using shot 6688. Only SXR signals are shown.




[Figure 12: five stacked Hα panels (expt.; predicted Δt = 1, 2, 4 and 8 ms) plotted against time (ms) from 35 to 80 ms.]

Figure 12. Forecasting disruption using shot 6688. Only Hα signals are shown.








point that was of real importance for our purpose was whether any false alarms were produced by the ANN, when there were no indications of the triggering of a disruption in the actual data.

None of the discharges used in this work was predicted entirely. To do this, fresh training would be necessary, as the plasma dynamics during the startup phase were not picked up by the ANN during training, which was also done using the last 35 ms of the training discharge. For forecasting the whole discharge, a large error was, therefore, anticipated. But although the time series prediction formalism requires the use of more past information for an accurate prediction of the future, the initial phase of the discharges was unlikely to provide any extra information regarding the triggering of the instabilities leading to the disruption.

7. Discussions

The forecasting of plasma disruptions in the ADITYA tokamak was described in the previous section, using a set of diagnostics different from that used in the earlier works [4, 6]. The use of a combination of several diagnostic signals, rather than a single type of diagnostic as in the studies of [4, 6], is thought to have produced the improved forecasting capabilities of the ANN.

Apart from changing the nature of the inputs, another major change was made in the present work with respect to Refs [4, 6]. This concerns the use of a direct prediction of the disruption, unlike the iterated prediction methods incorporated earlier. But it was shown that the improvement in forecasting was not really due to this change, as the use of a single input in this work did not produce better results. In particular, the performance of a trained network with only an SXR signal showed a result similar to that of Ref. [6], as the time lag was first observed around 3 ms. A glance at Table 3 would reveal that the ANN prediction error with only an Hα signal worsened further. There was not much change when the SXR and Hα signals were used along with two Mirnov probes at 42° and 234°. There was, however, a marked improvement when the magnetic signals were from probes at 138° and 330°. The best predictions were obtained from all four probes, together with the SXR and Hα signals. From this it appears that two main factors were responsible for the best prediction results in this work:

(a) A definite combination of diagnostic signals from four Mirnov probes, one SXR and one Hα monitor.

(b) The poloidal distribution of the Mirnov probes around the plasma.
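The distinction between direct and iterated prediction mentioned in this section can be illustrated with a small sketch. It is not the network used in this work: a linear least-squares predictor stands in for the trained ANN, the signal is a synthetic noise-free sine, and all function names are hypothetical. A direct predictor maps the past window straight to the value n steps ahead, whereas an iterated predictor applies a one-step model n times and feeds its own outputs back in.

```python
import numpy as np


def lagged_matrix(x, n_past, n_ahead):
    """Rows of n_past consecutive samples and the value n_ahead steps later."""
    rows = len(x) - n_past - n_ahead + 1
    X = np.array([x[i:i + n_past] for i in range(rows)])
    y = np.array([x[i + n_past - 1 + n_ahead] for i in range(rows)])
    return X, y


def fit_linear(X, y):
    """Least-squares weights; a stand-in for training the network."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict_direct(window, w_direct):
    """A single model evaluation jumps straight to n_ahead steps."""
    return float(window @ w_direct)


def predict_iterated(window, w_one_step, n_ahead):
    """Apply the one-step model n_ahead times, feeding predictions back in."""
    window = np.array(window, dtype=float)
    for _ in range(n_ahead):
        nxt = window @ w_one_step
        window = np.append(window[1:], nxt)
    return float(window[-1])


# Synthetic, noise-free test signal (not ADITYA data): a sine with a 50-sample period.
x = np.sin(2 * np.pi * np.arange(600) / 50)
n_past, n_ahead = 10, 25
train = x[:500]

w_direct = fit_linear(*lagged_matrix(train, n_past, n_ahead))
w_one_step = fit_linear(*lagged_matrix(train, n_past, 1))

window = x[540:550]          # ten most recent samples; the newest is x[549]
truth = x[549 + n_ahead]
direct = predict_direct(window, w_direct)
iterated = predict_iterated(window, w_one_step, n_ahead)
print(abs(direct - truth), abs(iterated - truth))
```

On this idealized signal both routes reproduce the target closely. With noisy data the iterated route passes its own errors forward at every step, while the direct route needs a separately trained model for each lead time.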

The performance of the trained ANN in an actual real time application for plasma disruption forecasting should not be in doubt, as the discharges used in this study were experimental, and the noise tolerance of the ANN was thereby automatically ensured.

Regarding the timescales of TEXT and ADITYA, it can be stated that the scales are much shorter for ADITYA, as the plasma duration for ADITYA is around 100 ms while that for TEXT varies from 250 to 400 ms or more [6, 15]. Thus a detection of an approaching disruption 8 ms in advance in ADITYA would correspond to a much more reliable situation for a real time prediction.

8. Summary and conclusions

In this article a neural network was used for forecasting plasma disruptions in ADITYA. A number of diagnostic signals were fed into the network input. Although this made the structure of the network heavier, it is believed that this increased input information from a definite combination of the diagnostics was the main reason for the significantly improved performance. This combination provided the optimum number of inputs to the ANN, with ten past values of each of the temporal variables.

Confidence about the performance of the ANN in real time could be gained from the fact that the algorithm not only predicted the trigger of the instabilities correctly, but did so sufficiently early, which was the basic requirement for real time operations. A forecast of an approaching disruption about 8 ms in advance is extremely crucial, not only for medium sized machines like ADITYA, but also for reactor grade tokamaks like ITER where the pulse lengths are to be around 1000 s. Since such long pulse operations can be strongly inhibited by major disruptions, a forecast as proposed in this study can be effectively used to alert the real time control systems, and measures such as electron cyclotron resonance heating, pellet injection and neutral beam heating can be put into operation to soften the harmful effects of disruptive termination of a plasma discharge. In addition, it was amply demonstrated that in the absence of any approaching disruption, the network would





not give any false alarms. Finally, since experimen-

tal plasma discharges were used in this study, the

ability of the ANN from the point of view of noise

tolerance was automatically ensured.

One crucial observation in this work was that the discharges used were not taken on the same day, and yet no effect was noticed in the prediction quality. The quality degraded slightly only with a larger Δt. From this it could be concluded that the physical conditions, such as wall conditioning and average plasma density, do not have any effect on the prediction of disruption. Prediction depends basically on the nature of the discharges. The discharges used in this work were, by nature, similar in so far as the general variation of the different temporally varying plasma parameters is concerned. Moreover, all the discharges ended in a major disruption without any intermediate minor disruption. So although the maximum plasma current, loop voltage and the duration of the discharges varied from discharge to discharge, these had no real effect on the quality of prediction.

Acknowledgements

The authors take this opportunity to express their sincere thanks to J.B. Lister for providing them with the neural network program. They gratefully acknowledge H. Ramachandran for his suggestions and critical comments after going through this manuscript. One of the authors (AS) would like to thank C. Ramdas who helped in drawing the neural network structure of Fig. 1. Finally, the authors thank the entire ADITYA team for supplying the experimental data.

References

[1] Lister, J.B., Schnurrenberger, H., Nucl. Fusion 31 (1991) 1291.
[2] Coccorese, E., Morabito, C., Martone, R., Nucl. Fusion 34 (1994) 1349.
[3] Albanese, R., et al., Fusion Technol. 30 (1996) 219.
[4] Hernandez, J.V., et al., Nucl. Fusion 36 (1996) 1009.
[5] Wroblewski, D., Jahns, G.L., Leuer, J.A., Nucl. Fusion 37 (1997) 725.
[6] Vannucci, A., Oliveira, K.A., Tajima, T., Nucl. Fusion 39 (1999) 255.
[7] Yoshino, R., Koga, J.K., Takeda, T., Fusion Technol. 30 (1996) 237.
[8] Hamilton, J.D., Time Series Analysis, Princeton University Press, Princeton, NJ (1994).
[9] Weigend, A.S., Gershenfeld, N.A., Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley, Reading, MA (1992).
[10] Bhatt, S.B., et al., Indian J. Pure Appl. Phys. 27 (1989) 710.
[11] Saxena, Y.C., Curr. Sci. 65 (1993) 25.
[12] Geva, A.B., IEEE Trans. Neural Networks NN-9 (1998) 1471.
[13] Bishop, C.M., Rev. Sci. Instrum. 65 (1994) 1803.
[14] Lister, J.B., Schnurrenberger, H., Marmillod, P., Implementation of a Multilayer Perceptron for a Non-linear Control Problem, Rep. LRP 398/90, CRPP-EPFL, Lausanne (1990).
[15] Vannucci, A., McCool, S.C., Nucl. Fusion 37 (1997) 1229.

(Manuscript received 27 October 1999

Final manuscript accepted 30 August 2000)

E-mail address of A. Sengupta:

[email protected]

Subject classification: C0, Tm