Microprocessing and Microprogramming 41 (1996) 691-702

A neural network-based replacement strategy for high performance computer architectures

Humayun Khalid *

Department of Electrical Engineering, Convent Avenue at 140th Street, City University of New York, The City College, New York, NY 10031, USA

    Received 5 May 1995; revised 8 August 1995; accepted 23 October 1995

Abstract

We propose a new scheme for the replacement of cache lines in high performance computer systems. Preliminary research, to date, indicates that neural networks (NNs) have great potential in the area of statistical prediction [1]. This attribute of neural networks is used in our work to develop a neural network-based replacement policy that can effectively eliminate dead lines from the cache memory by predicting the sequence of memory addresses referenced by the central processing unit (CPU) of a computer system. The proposed strategy may, therefore, provide better cache performance than conventional schemes such as the LRU (Least Recently Used), FIFO (First In First Out), and MRU (Most Recently Used) algorithms. In fact, we observed from the simulation experiments that a carefully designed neural network-based replacement scheme does provide excellent performance compared to the LRU scheme. The new approach can also be applied to page replacement and prefetching algorithms in virtual memory systems.

Keywords: Performance evaluation; Trace-driven simulation; Cache memory; Neural networks; Predictor

* Email: [email protected]

1. Introduction

In high performance computer systems, the bandwidth of the memory is often a bottleneck because it plays a critical role in limiting peak throughput. A cache is the simplest cost-effective way to achieve a high speed memory hierarchy, and its performance is extremely vital for high speed computers [2-7]. It is for this reason that caches have been studied extensively since their introduction by IBM in the System/360 Model 85 [8]. Caches provide, with high probability, the instructions and data needed by the CPU at a rate that is closer to the CPU's demand rate. They work because programs exhibit a property called locality of reference. Three basic cache organizations were identified by Conti [9]: direct mapped, fully associative, and set associative.

0165-6074/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved. SSDI 0165-6074(95)00030-5


The choice of cache organization can have a significant impact on cache performance and cost [10-12]. In general, the set associative cache organization offers a good balance between hit ratios and implementation cost. The selection of a line/block replacement algorithm in set associative caches can also have a significant impact on overall system performance. The common replacement algorithms used with such caches are FIFO, MRU, and LRU. These algorithms try to anticipate future references by looking at the past behavior of the programs (exploiting the locality of reference property of the programs). The relative performance of these algorithms depends mainly on the length of the history consulted. An LRU algorithm consults a longer history of past address patterns than FIFO, and therefore it has better relative performance.

Many different approaches have been suggested by researchers to improve the performance of replacement algorithms [13-15]. One such proposal was given by Pomerene et al. [13], who suggested the use of a shadow directory in order to look at a relatively longer history when making decisions with LRU. The problem with this approach is that the size of the shadow directory limits the length of the history consulted. Chen et al. [14] studied the improvement in cache performance due to instruction reordering for a variety of benchmark programs. Instruction reordering increases the spatial and sequential locality of instruction caches and thereby results in better performance of the replacement and mapping algorithms. Our research attempts to improve the performance of cache memory by developing a replacement strategy that looks at a very long history of addresses without a proportional increase in the space/time requirements. The task is accomplished through neural networks (NNs). Some rudimentary work in this area was done by Sen [16]. His results were not very encouraging; moreover, no justification or explanation was given for the design.

In the next section of this paper we describe the various neural network paradigms that have been considered in our work. Section 3 provides a theoretical framework for the replacement policy. Next, algorithms are developed to realize the replacement scheme for different categories of neural networks. Section 5 is on the hardware implementation of the algorithms. Simulation results and discussion are given in Section 6. Finally, the last section contains recommendations for future work and conclusions.

2. Neural network paradigms for the cache controller

Neural networks are human attempts to simulate and understand the nervous system, with the hope of capturing some of the power of these biological systems, such as the ability for generalization, graceful degradation, adaptivity and learning, and parallelism. There are many different types/models/paradigms of neural networks (NNs), reflecting the emphasis and goals of different research groups. Each network type generally has several variations which address some weakness or enhance some feature of the basic network. In this large array of network types, there are two characteristics that divide NNs into a few basic categories:
(1) Direction of signal flow: feed-forward/feedback/lateral,
(2) Mode of learning: supervised/unsupervised/reinforcement.
NNs are experts at many pattern recognition tasks. They generate their own rules by learning from examples shown to them. Due to their learning ability, NNs have been applied in several areas such as financial prediction, signal analysis and processing, process control, robotics, classification, pattern recognition, filtering, speech synthesis, and medical diagnosis. Several good references are available on the theory and applications of NNs [17-19]. We have chosen six of the most popular NNs, which encompass almost all of the basic categories. In addition, we have looked at variants from among these basic categories.


A very brief description of the paradigms used in our work is given hereunder.
(1) Backpropagation Neural Network (BPNN). A typical BPNN has an input layer, an output layer, and at least one hidden layer. There is no theoretical limit on the number of hidden layers; however, in our work we have tried to limit the number of cases under study by considering at most two hidden layers. There are several variants of BPNNs, a few of which are considered in the present work.
(2) Radial Basis Function Network (RBFN). In general, an RBFN is any neural network which makes use of a radially symmetric and radially bounded transfer function in its hidden layer. Probabilistic neural networks (PNNs) and general regression neural networks (GRNNs) could both be considered RBFNs. The Moody/Darken type of RBFN (MD-RBFN) that we have used in this research consists of three layers: an input layer, a prototype layer, and an output layer.
(3) Probabilistic Neural Network (PNN). The PNN actually implements the Bayes decision rule and uses Parzen windows to estimate the probability density function (pdf) related to each class. The main advantage of the PNN is that it can be used effectively with sparse data.
(4) Modular Neural Network (MNN). An MNN consists of a group of BPNNs competing to learn different aspects of a problem, and therefore it can be regarded as a generalization of the BPNN. A gating network associated with the MNN controls the competition and learns to assign different parts of the data space to different networks.
(5) Learning Vector Quantization Network (LVQ). LVQ is a classification network. It contains an input layer, a Kohonen layer which learns and performs the classification, and an output layer. The basic LVQ has various drawbacks, and therefore variants have been developed to overcome these shortcomings.
(6) General Regression Neural Network (GRNN). GRNNs are general purpose NNs. A simple GRNN consists of an input layer, a pattern layer, a summation and division layer, and an output layer. GRNNs provide good results in situations where the statistics of the data change over time. They can be regarded as a generalization of probabilistic neural networks. GRNNs use the same Parzen estimator as the PNNs.
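As a point of reference for the Parzen-window estimation mentioned above, the following is a minimal sketch of such an estimator with a Gaussian kernel. The kernel choice, the bandwidth value sigma, and the function name are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

# Minimal sketch of a Parzen-window density estimate with a Gaussian kernel,
# the kind of estimator PNNs/GRNNs build internally from stored patterns.
# (Kernel choice and bandwidth sigma are illustrative assumptions.)

def parzen_pdf(x: np.ndarray, samples: np.ndarray, sigma: float = 0.5) -> float:
    """Estimate p(x) from stored sample vectors using Gaussian windows."""
    d = samples.shape[1]
    diffs = samples - x                                   # (n, d) differences
    norm = (2 * np.pi * sigma ** 2) ** (d / 2)            # Gaussian normaliser
    kernels = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * sigma ** 2))
    return float(kernels.mean() / norm)

# Usage: the estimated density near a cluster of stored patterns is higher
# than far away from it.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
assert parzen_pdf(np.array([0.05, 0.05]), pts) > parzen_pdf(np.array([3.0, 3.0]), pts)
```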

3. Theoretical basis for the cache replacement policy

The proposed scheme uses neural networks to identify which line/block should be replaced on a cache miss in real time. The job of our neural networks is to learn a function that may approximate the unknown complex relationship between the past and future values of addresses:

f: past → future,  ∀ (past, future) ∈ Domain(addresses)

The function f can also be learned using the tag field of the addresses. Recall that a set-associative cache partitions an address into tag, set, and word fields. The word field is used to select a word within a cache block/line and the set field is used for indexing. The tag field preserves the mapping between the cache and the main memory. This means that the development of f using the tag field is only possible if we use the set field for indexing a group of neurons in the NNs. Fig. 1 shows how a NN is partitioned into groups/sets and indexed through a set selector. Here, the function f actually represents a predictor. The parameters of such predictors are developed through the use of some statistical properties of variables/functions such as the mean, variance, probability density function (pdf), cumulative distribution function (cdf), power spectral density, etc.


Fig. 1. Partial block diagram of the cache controller with any neural network paradigm except PNN/GRNN.

This means that our NNs would behave as predictors if they were informed about the statistical properties of the tags. The problem with this approach is that tags, like other similar data sets, do not have well-defined statistical properties. To accomplish the task, we have attempted to teach the NNs the histogram of references (an estimated pdf is called a histogram). Initially, the performance of the predictor would be low; however, the performance improves as the histogram develops. A well-developed histogram may contain sufficient locality information about the programs and can be used for guiding the replacement decisions. In order to accomplish this with NNs, we need at least two tags, T(i) and T(i+1), for the training/learning. This implies that the network learning should be delayed until a new tag arrives. A tag T(i) is a vector that is made up of 0s and -1s (bipolar values). The size of each vector T(i) depends on the computer architecture. For example, for a Q-bit address machine that incorporates a cache with N sets and P words per block, the size of the vector T(i) would be equal to Y = Q - (log_2 N + log_2 P) bits. So, a Y-dimensional vector T(i) should be applied to the input layer of the NNs (the predictor), and the output T̂(i+1) (the estimate of T(i+1)) is compared with T(i+1) in order to get error values for feedback purposes. The repetition of this procedure will eventually lead to the development of predictor parameters containing histogram information of the addresses. There are some NNs, like PNNs and GRNNs, which require only the present value of the tag, T(i), to estimate the categorical pdf/cdf internally using Parzen windows. Such NNs are good for classification types of problems, since they are adept at developing categorical histograms (related to a category/class).

The above discussion gives some insight into the algorithm that would make NNs behave as predictors for our problem. The question that remains is: how can the predictor information be used for block/line replacements in a cache in real time? This question is answered in two different ways for two different categories of NNs, based on whether or not the histogram is estimated internally by the NNs. Our first category consists of BPNN, MNN, LVQ, and RBFN. The second category contains PNN and GRNN. The basic approach, however, is the same for both cases. What we want to do is to make both types of NNs learn the histogram of addresses/tags. At the same time, we would also like to extract the history information for a particular tag instead of the estimate of the future values of the tag. So, the important parameter that is used for guiding the replacement decision is not T̂(i+1); rather, it is the response of a particular layer of neurons containing categorical history. The number of categories referred to in such layers, that is, the number of neurons in the layers, should obviously be equal to the associativity A of the cache in order to be able to identify the cache line to be replaced within the set. The task, therefore, is to select a NN architecture and train it to learn the histogram in such a way that at least one layer of neurons in the NNs provides the desired information. The responses of these neurons to an incoming tag T(i) will therefore determine the extent to which a tag vector T(i) has been seen in the past.
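As an illustration of the field split and the resulting input dimensionality, the following minimal sketch (not from the paper) partitions an address into tag, set, and word fields and encodes the tag as a Y-dimensional input vector. The function names, the 32-bit/64-set/8-word example values, and the exact bit ordering are assumptions for illustration; the paper's 0/-1 "bipolar" convention is followed, though a conventional ±1 encoding would work the same way.

```python
# Illustrative sketch: splitting a Q-bit address into tag / set / word fields
# and encoding the tag as a Y-dimensional vector, Y = Q - (log2 N + log2 P).

def split_address(addr: int, Q: int = 32, N: int = 64, P: int = 8):
    """Return (tag, set_index, word) for a Q-bit address with N sets and
    P words per block (N and P assumed to be powers of two)."""
    word_bits = P.bit_length() - 1      # log2 P
    set_bits = N.bit_length() - 1       # log2 N
    word = addr & (P - 1)
    set_index = (addr >> word_bits) & (N - 1)
    tag = addr >> (word_bits + set_bits)
    return tag, set_index, word

def tag_to_vector(tag: int, Q: int = 32, N: int = 64, P: int = 8):
    """Encode the tag as a Y-dimensional vector using the paper's 0 / -1
    ('bipolar') convention; a +/-1 encoding would work similarly."""
    Y = Q - (N.bit_length() - 1) - (P.bit_length() - 1)
    return [-1.0 if (tag >> i) & 1 else 0.0 for i in range(Y)]

# Example: a 32-bit address with 64 sets and 8 words per block gives
# Y = 32 - (6 + 3) = 23 input neurons per NN module.
tag, s, w = split_address(0x0040_12A8)
T_i = tag_to_vector(tag)
assert len(T_i) == 23
```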


This can be accomplished by correlating the history of references, corresponding to each cache line in a set, with the present tag vector T(i). Consider the finite correlation of two sequences x_k and y_k:

r_n = \sum_{k=0}^{m} x_k y_{k-n} = x_0 y_{-n} + x_1 y_{1-n} + \cdots + x_m y_{m-n}    (1)

The neurons in the NNs can be made to compute this sum as follows, for an input vector X and a weight vector W:

X^T W = x_0 w_0 + x_1 w_1 + \cdots + x_m w_m    (2)

Equating the right hand sides of Eqs. (1) and (2), we get the following values:

w_0 = y_{-n}, \quad w_1 = y_{1-n}, \quad \ldots, \quad w_m = y_{m-n}

This means that for certain weight values, the inner products produced by a layer of neurons can provide, to a certain extent, the correlation information. With this information, we can determine which line is the most suitable candidate to be replaced in a cache set. Another way of looking at the same thing is from the point of view of the difference between the two vectors. For an associativity A, we are looking at an A-class problem for which we require

\max_{i = 1, \ldots, A} \| X - W_i \|

Expanding \| X - W_i \| we get

\| X - W_i \| = [ (X - W_i)^T (X - W_i) ]^{1/2} = ( X^T X - 2 X^T W_i + W_i^T W_i )^{1/2}    (3)

From (3) it is obvious that to maximize \| X - W_i \|, i = 1, 2, ..., A, we require the minimum of X^T W_i. Therefore, the most suitable line to be replaced is the one that corresponds to the neuron, in a layer, with the smallest output response. Thus, the identification of such a layer for the two categories of NNs under study becomes imperative for our job.
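A minimal sketch of this selection rule is given below (illustrative only, not the paper's hardware): for weight vectors of comparable norm, the line whose neuron produces the smallest inner-product response X^T W_i is the one with the largest distance ||X - W_i|| in Eq. (3), and is therefore the replacement candidate. The function name and the random test data are assumptions.

```python
import numpy as np

# Select the victim line within a set as the neuron whose response
# (inner product X^T W_i) is smallest, which, for weight vectors of
# comparable norm, corresponds to the largest distance ||X - W_i||.

def select_victim(X: np.ndarray, W: np.ndarray) -> int:
    """X: Y-dimensional tag vector; W: (A, Y) weight matrix, one row per
    cache line in the set. Returns the index of the line to replace."""
    responses = W @ X                 # inner products X^T W_i, i = 1..A
    return int(np.argmin(responses))  # smallest response -> replacement candidate

# Quick check that the argmin of the inner product matches the argmax of
# the distance when the rows of W have equal norms.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 0.0], size=23)
W = rng.normal(size=(4, 23))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # equalise row norms
assert select_victim(X, W) == int(np.argmax(np.linalg.norm(X - W, axis=1)))
```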


Fig. 2. Partial block diagram of the cache controller with PNN/GRNN paradigm.

For the PNNs/GRNNs, Parzen estimators are responsible for developing the histogram. In such a case, we do not need T(i) and T(i+1) for the determination of the feedback error. Therefore, the output layer of such NNs can be used for the identification of replaceable cache lines (see Fig. 2). Things become a little more complicated for the other category of NNs (BPNN, MNN, LVQ, RBFN). This is because we need to train such NNs to develop the histogram by externally providing T(i+1). In this paper, we suggest a heuristic solution to the problem of identifying the layer of neurons that would provide the replacement information. Our trace-driven simulation experiments indicate that inserting a low-dimensional layer of neurons between two identical NNs, learning the histogram together, can help us identify the appropriate cache line to be replaced (see Fig. 1).

    4. The algorithms

The discussion above leads us to the following algorithms for the replacement of cache lines from a set.


Algorithm 1

(For replacements with BPNN, MNN, RBFN, and LVQ)

Step 1: (Partitioning of NNs) Partition the NNs into groups. Each group/NN module deals with a specific set. This means that for N sets of a cache the algorithmic space/time complexity would be O(αN), where α is a factor that relates to the complexity of the NNs within a module.

Step 2: (Partitioning of Modules) Each NN module is partitioned into two sub-modules. The first sub-module should have an input layer and one or more hidden layers. The second sub-module should have one or more similar hidden layers and an output layer. The two sub-modules are to be connected by a lower-dimensional layer of neurons, the interface layer, containing A neurons (A = associativity of the cache) that provide the desired response for replacements.

Step 3: (Initialization) Initialize all the weights in the NNs to small random values very close to zero.

Step 4: (Module Selection) Partition each incoming address into tag, word, and set fields. The set field is to be used for selecting one of the NN modules.

Step 5: (Replacement Line Identification) Apply the input Y-dimensional tag vector T(i) to the selected module n and look at the response from the interface layer. The lowest response from neuron i, i = 1, 2, ..., A, indicates that the ith cache line within set n should be replaced on a miss. In a hit situation, or when a cache line is empty (illegal tag), the response from the NNs should be disregarded.

Step 6: (Learning) The output response from the second sub-module is compared with the next incoming tag T(i+1) and the resulting error (T(i+1) - T̂(i+1)) is used for the weight adjustment. (A simplified software sketch of these steps is given after Algorithm 2 below.)

Algorithm 2 (For replacements with PNNs and GRNNs)

Step 1: Same as above.

Step 2: Same as Step 3 above.

Step 3: Same as Step 4 above.

Step 4: (Replacement Line Identification) Apply the input Y-dimensional tag vector T(i) to the selected module n and send the output response of the PNN/GRNN to a transformation module. This module transforms values as follows:

g: lowest value → highest possible output value for neurons; all other values → lowest possible output value for neurons.

The index of the highest output of the transformation module indicates the corresponding cache line to be replaced from set n.

Step 5: (Learning) The output values from the transformation unit are compared with its input values to obtain the error. This means that

ERROR = g(input) - input    (4)

However, for the hit situation, or for the case when a cache line is empty, the error is evaluated by comparing the input values of the transformation module with a target vector. This vector should have a value of +1.0 (the highest possible unscaled output value for a neuron) at the coordinate corresponding to the hit position / tag-insertion position (empty line case). All other values should be -1.0 (the lowest possible unscaled output value for a neuron).
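To make the flow of Algorithm 1 concrete, the following is a deliberately simplified, software-only sketch. It is not the paper's implementation: the two sub-modules are collapsed into single linear stages W1 and W2 (no hidden layers or TanH units), training uses plain gradient descent rather than the Quick-prop or Norm-cum rules discussed later, and the class name, learning rate, and seed are invented for illustration. It only shows how the A-neuron interface layer drives victim selection (Step 5) and how the delayed error T(i+1) - T̂(i+1) drives learning (Step 6).

```python
import numpy as np

# Simplified, linear stand-in for one per-set module of Algorithm 1.
# An A-neuron "interface layer" sits between an encoding stage W1 and a
# decoding stage W2 that tries to predict the next tag vector.

class NNReplacementModule:
    def __init__(self, Y: int, A: int, lr: float = 0.01, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.01, size=(A, Y))  # input -> interface layer
        self.W2 = rng.normal(scale=0.01, size=(Y, A))  # interface -> output layer
        self.lr = lr
        self.prev_tag = None

    def victim(self, tag_vec: np.ndarray) -> int:
        """Step 5: the line whose interface neuron responds least (on a miss)."""
        return int(np.argmin(self.W1 @ tag_vec))

    def learn(self, tag_vec: np.ndarray) -> None:
        """Step 6: once the next tag arrives, train on (prev_tag -> tag_vec)."""
        if self.prev_tag is not None:
            h = self.W1 @ self.prev_tag          # interface responses
            err = tag_vec - self.W2 @ h          # T(i+1) - T_hat(i+1)
            self.W2 += self.lr * np.outer(err, h)
            self.W1 += self.lr * np.outer(self.W2.T @ err, self.prev_tag)
        self.prev_tag = tag_vec

# Usage within a set: on a miss call victim(T_i) to pick the line to evict,
# and when the next reference arrives call learn(T_next) so the delayed
# error updates the weights.
```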

5. Proposed implementation of the replacement scheme

Based on the given theory and the algorithms for the different categories of NNs, we have developed two different architectures, presented in Figs. 1 and 2. Fig. 1 shows how the set field of an incoming address is extracted by the set selector to identify a neural network module that contains the information related to a particular cache set. The input and output lines for only one neural network module are shown for brevity. All the other neural network modules are identical copies of the one shown.


The tag field of an address is fed into the NN module as an input through a register. The sub-modules are represented by rectangular boxes. These boxes contain the input, output, and hidden layer(s) for the BPNN, MNN, RBFN, or LVQ. The numbers of neurons in the input and output layers are identical and equal to the dimensionality Y of the tag vector T(i). The sub-modules are connected by a neural network layer called the interface layer. Each neuron in the interface layer corresponds to a cache block/line within a cache set. The error values are generated by comparing the estimated tag T̂(i+1) with a new incoming tag T(i+1). Therefore, the error calculations are delayed by approximately the amount of time it takes for a new address/tag to arrive. The error is computed through an adder/subtracter accompanied by a buffer, shown as a circle in Fig. 1. This error is fed back to the neural networks, for the set in consideration, in order to develop the histogram of addresses. The minimum selector (MIN SEL) identifies the neuron, in the interface layer, that provides the lowest response value and accordingly sends the line replacement information to the control unit (CU). The initial performance of this architecture would obviously be low, because of the initial random weight selection. However, performance improves as the NNs learn the histogram.

The hardware implementation for the PNN/GRNN-based replacement algorithm is shown in Fig. 2. Here, only the present value of the tag T(i) is presented directly to the NN module as an input. The output layer for the PNNs/GRNNs is shown external to the rectangular box in Fig. 2. The transformation module transforms the output values from the PNNs/GRNNs for two purposes:
(1) to identify the corresponding cache line to be replaced;
(2) to determine the error.
Again, the module transforms values as follows:

g: lowest value → highest possible output value for neurons; all other values → lowest possible output value for neurons.

For example, a 4-dimensional output vector [0.1 0.9 0.8 0.3]^T is transformed as given below:

[0.1 0.9 0.8 0.3]^T → [1.0 -1.0 -1.0 -1.0]^T

A multiplexer (MUX) is used to select the vector values from either the CU or the output of the transformation unit. The information as to which of the inputs should be selected by the MUX comes from the CU.
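The transformation module and the associated error targets can be sketched as follows (an illustrative software model, not the hardware unit itself). The function names are invented for the sketch; the example vector is the one given above, and the hit-case target follows Algorithm 2, Step 5.

```python
import numpy as np

# Sketch of the transformation module g: the lowest PNN/GRNN output is mapped
# to the highest neuron value (+1.0) and all other outputs to the lowest
# (-1.0); the index of the +1.0 entry names the line to replace.

def transform_g(outputs: np.ndarray) -> np.ndarray:
    t = np.full_like(outputs, -1.0)
    t[int(np.argmin(outputs))] = 1.0
    return t

def target_for_hit(A: int, hit_pos: int) -> np.ndarray:
    """Target vector used on a hit or on tag insertion into an empty line."""
    t = np.full(A, -1.0)
    t[hit_pos] = 1.0
    return t

y = np.array([0.1, 0.9, 0.8, 0.3])
assert np.array_equal(transform_g(y), np.array([1.0, -1.0, -1.0, -1.0]))

err_miss = transform_g(y) - y          # error fed back on a miss, Eq. (4)
err_hit = target_for_hit(4, 2) - y     # error when the reference hit line 2
```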

Fig. 3. Miss ratio vs. cache size for different variants of BPNN with BI.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.


Fig. 4. Miss ratio vs. cache size for different variants of BPNN with TPC.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.

The lower input lines to the MUX are selected by the CU in a hit situation or when a cache line is empty. However, on a cache miss, the upper lines (representing the desired output vector) are selected for the purpose of error computation.

6. Simulation results and discussion

In this research work, we have used trace-driven simulations to accurately obtain information on the relative performance of the various neural network paradigms and the conventional LRU algorithm. The two trace files used were named TPC.data and BI.data. TPC.data is the trace file that was generated on an IBM PC running Borland's Turbo Pascal compiler. The other trace file, BI.data, was generated using Microsoft's Basica interpreter. Each of these original trace files contains approximately 1/2 million addresses. For more information on these trace files, see [20].

Table 1
Legend information for Figures 3 and 4

T1: Hidden layer 1 = 11, Learning rule = Quick-prop, Learning rate = 0.5, Transfer function = TanH, Momentum = 0.5
T2: Hidden layer 1 = 11, Learning rule = Max-prop, Learning rate = 0.5, Transfer function = TanH, Momentum = 0.5
T3: Hidden layer 1 = 19, Learning rule = Norm-cum, Learning rate = 0.5, Transfer function = TanH, Momentum = 0.5
T4: Hidden layer 1 = 11, Learning rule = Ext-delta-bar-delta, Learning rate = 0.5, Transfer function = Sigmoid, Momentum = 0.5
T5: Hidden layer 1 = 1, Hidden layer 2 = 19, Learning rule = Max-prop, Learning rate = 0.9, Transfer function = TanH, Momentum = 0.9
T6: Hidden layer 1 = 19, Learning rule = Norm-cum, Learning rate = 0.9, Transfer function = TanH, Momentum = 0.9
T7: Hidden layer 1 = 19, Learning rule = Norm-cum, Learning rate = 0.05, Transfer function = TanH, Momentum = 0.05

The caches considered in our simulation have set sizes that vary from N = 32 to N = 2048. The line size is fixed at 8 words per block. We have also varied the associativity A and looked at three different cases, that is, A = 2, A = 4, and A = 8. Each figure is partitioned into three parts, namely (a), (b), and (c), corresponding to the three values of associativity under consideration. Figs. 3 and 4 provide the miss ratio information for different variants of the BPNN paradigm with BI.data and TPC.data as the trace files, respectively. The T's assigned to the legends denote the type of the BPNN; a complete description of the T's is given in Table 1. Notice from Figs. 3 and 4 that the BPNN of type T6 has a relatively better performance than the other types.
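The authors' simulator and trace format are not described at this level of detail; the following is a minimal sketch of a trace-driven, set-associative cache simulator with the conventional LRU baseline against which the NN policies are compared. The trace format (a flat list of word addresses), the function name, and the parameter defaults are assumptions for illustration.

```python
from collections import OrderedDict

# Minimal trace-driven simulator in the spirit of the experiments described
# above: an LRU set-associative cache whose miss ratio is measured over an
# address trace.

def simulate(trace, n_sets=64, words_per_block=8, assoc=4):
    """Return the miss ratio of an LRU set-associative cache over a trace
    of word addresses."""
    word_bits = words_per_block.bit_length() - 1
    set_bits = n_sets.bit_length() - 1
    sets = [OrderedDict() for _ in range(n_sets)]   # per-set tags in LRU order
    misses = 0
    for addr in trace:
        block = addr >> word_bits
        idx = block & (n_sets - 1)
        tag = block >> set_bits
        s = sets[idx]
        if tag in s:
            s.move_to_end(tag)                      # hit: refresh LRU position
        else:
            misses += 1
            if len(s) >= assoc:
                s.popitem(last=False)               # evict least recently used
            s[tag] = None
    return misses / len(trace)

# Example: a short synthetic trace; real runs would use ~1/2 million
# addresses from files such as TPC.data or BI.data.
trace = [i * 8 for i in range(1000)] * 2
print(simulate(trace, n_sets=32, assoc=2))
```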


This means that the Norm-cum learning rule, along with the TanH transfer function, is suitable for our neural network-based replacement algorithm. The number of neurons in the hidden layer is varied reasonably, in our experiments, in the range 10 to 20. For T5, even with two hidden layers, the Max-prop learning rule could not outperform the Norm-cum based results. However, T3's performance is in the same league as T6 (the best case), which demonstrates that, for our data, there is very little change in learning with the increased momentum and learning rate values. The performance of T7 with respect to T6 indicates that very high values of momentum and learning rate do not produce very drastic results compared to moderate values, but very low values are indeed detrimental to performance. The decrease in the miss ratio with increasing cache size is uniform across the different types of BPNNs. Notice that the relative performance of T7 in Fig. 4 is worse than in Fig. 3; this is due to the fact that the learning of neural networks is dependent on the sequence of the data file. However, based on the general profile of the results, for a variety of address traces, we can draw some conclusions on the relative performance of the different types of neural networks. Our experiments with dynamic learning rates (based on miss ratios) failed because of the low values of the miss ratios; the related curves are, therefore, not included in the given figures. In Figs. 5 and 6, a comparative performance of BPNN, PNN, GRNN, LVQ, RBFN, and MNN is given. Here, we have considered the best results from among the paradigms for a reasonable number of neurons (10 to 20 neurons).

Fig. 6. Miss ratio vs. cache size for different NN paradigms with TPC.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.


Fig. 7. Miss ratio vs. cache size for LRU, LVQ, and BPNN with BI.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.

Our comparative study reveals that the BPNN outperforms the other paradigms. The closest rival appears to be the LVQ. PNNs and GRNNs appear to be next in line, respectively; however, PNNs in some cases perform as well as LVQ. The difference in the relative performance of the various neural network paradigms is quite appreciable for large caches. The last pair of figures, Figs. 7 and 8, provide the relative performance of LRU, LVQ, and BPNN. For the peak category, BPNN gives a significant performance improvement of 16.4711% in the miss ratio values over the LRU algorithm. The average is not significant because of the large variance in the BPNN's performance, as shown in Figs. 3 and 4. LVQ also performs well and, in most of the cache organizations considered in the simulation studies, its performance is better than that of LRU. This proves that we can get excellent results from NNs provided we choose our parameters carefully. We expect that longer trace files, for a variety of applications and systems software, can provide us with even better results.

Fig. 8. Miss ratio vs. cache size for LRU, LVQ, and BPNN with TPC.DATA as the trace file: (a) Associativity = 2; (b) Associativity = 4; (c) Associativity = 8.


The results are trace dependent; however, the theory presented here dictates that our cache replacement strategy can be expected to yield reasonably good results for any workload.

7. Conclusion and recommendations for future work

In this paper, we have presented neural network-based cache replacement algorithms and have proposed corresponding hardware implementations. Several neural network paradigms, such as BPNN, MNN, RBFN, LVQ, PNN, and GRNN, were investigated in our work, and their relative performance was studied and analyzed. Our trace-driven simulation results indicate that with a suitable BPNN paradigm, we can get an excellent improvement of 16.4711% in the miss ratio over the conventional LRU algorithm. The excellent performance of the neural network-based replacement strategy means that this new approach has potential for providing promising results when applied to page replacement and prefetching algorithms in virtual memory systems.

Acknowledgements

This research is supported in part by PSC-CUNY grants #6-64455 and #6-666353.

References

[1] S. Wu, R. Lu and N. Buesing, A stock market forecasting system with an experimental approach using an artificial neural network, Proc. 25th Small College Computing Symp., North Dakota (April 24-25, 1992) 183-192.
[2] M.S. Obaidat, H. Khalid and K. Sadiq, A methodology for evaluating the performance of CISC computer systems under single and two-level cache environments, Microprocessing and Microprogramming 40(6) (July 1994) 411-421.
[3] S. Laha, J.H. Patel and R.K. Iyer, Accurate low-cost methods for performance evaluation of cache memory systems, IEEE Trans. Computers 37(11) (Nov. 1988) 1325-1336.
[4] D. Thiébaut, H.S. Stone and J.L. Wolf, Improving disk cache hit-ratios through cache partitioning, IEEE Trans. Computers 41(6) (June 1992) 665-676.
[5] A. Agarwal, M. Horowitz and J. Hennessy, An analytical cache model, ACM Trans. Computer Systems 7(2) (May 1989) 184-215.
[6] J.E. Smith and J.R. Goodman, Instruction cache replacement policies and organizations, IEEE Trans. Computers 34(3) (March 1985) 234-241.
[7] M.D. Hill and A.J. Smith, Evaluating associativity in CPU caches, IEEE Trans. Computers 38(12) (Dec. 1989) 1612-1630.
[8] S.G. Tucker, The IBM 3090 Systems: An overview, IBM Systems J. 25(6) (Jan. 1986).
[9] C.J. Conti, Concepts for buffer storage, IEEE Comp. Group News 2(8) (March 1969) 9-13.
[10] A.J. Smith, A comparative study of set associative memory mapping algorithms and their use for cache and main memory, IEEE Trans. Software Eng. 4(2) (March 1978).
[11] H. Khalid and M.S. Obaidat, A novel cache memory controller: algorithm and simulation, Summer Computer Simulation Conf. (SCSC 95), Ottawa, Canada (July 1995) 767-772.
[12] M.S. Obaidat and H. Khalid, A performance evaluation methodology for computer systems, Proc. IEEE 14th Annual Int. Phoenix Conf. on Computers and Communications, Scottsdale, AZ (March 1995) 713-719.
[13] J. Pomerene, T.R. Puzak, R. Rechtschaffen and F. Sporacio, Prefetching mechanism for a high-speed buffer store, US Patent, 1984.
[14] W.Y. Chen, P.P. Chang, T.M. Conte and W.-M.W. Hwu, The effect of code expanding optimizations on instruction cache design, IEEE Trans. Computers 42(9) (Sep. 1993) 1045-1057.
[15] W.W. Hwu and P.P. Chang, Achieving high instruction cache performance with an optimizing compiler, Proc. 16th Annual Int. Symp. on Computer Architecture (June 1989) 242-251.
[16] C.-F. Sen, A self-adaptive cache replacement algorithm by using backpropagation neural networks, MS Thesis, University of Missouri-Rolla, 1991.
[17] M.S. Obaidat and H. Khalid, An intelligent system for ultrasonic transducer characterization, submitted to the IEEE Trans. on Instrumentation and Measurement, 1995.
[18] J. Hertz, A. Krogh and R.G. Palmer, Introduction to the Theory of Neural Computation (Addison-Wesley, 1991).
[19] S. Kong and B. Kosko, Differential competitive learning for centroid estimation and phoneme recognition, IEEE Trans. Neural Networks 2(1) (Jan. 1991) 118-124.
[20] H.S. Stone, High Performance Computer Architecture (Addison-Wesley, 1993).


Humayun Khalid was born in Karachi, Pakistan. He received his B.S.E.E. (Magna Cum Laude) and M.S.E.E. (Graduate Citation) degrees from the City College of New York. Currently, he is a Ph.D. candidate in the Department of Electrical Engineering at the City University of New York. His research interests include parallel computer architecture, high performance computing/computers, computer networks, performance evaluation, and applied artificial neural networks.

Mr. Khalid is a reviewer for the IEEE International Phoenix Conference on Computers and Communications (IPCCC), the IEEE International Conference on Electronics, Circuits, and Systems (ICECS'95), the Journal of Computer and Electrical Engineering, the Information Sciences Journal, and the Computer Simulation Journal. He is an Assistant Chair for the IEEE International Conference on Electronics, Circuits, and Systems (ICECS'95). He is a lecturer at the City University of New York (CUNY) and a lead researcher at the Computer Engineering Research Lab (CERL). He has published several refereed technical articles in national and international journals and conference proceedings.