Branch Prediction Contest: Implementation of Piecewise Linear Prediction Algorithm
Prosunjit Biswas
Department of Computer Science, University of Texas at San Antonio
Abstract
Branch predictor accuracy is critical to harnessing the instruction-level parallelism (ILP) available in programs and thus to improving the performance of today's microprocessors, especially superscalar processors. Among branch predictors, neural predictors such as the Scaled Neural Analog Predictor (SNAP) and the Piecewise Linear Branch Predictor outperform other state-of-the-art predictors. In this final project for the Computer Architecture course (CS-5513), I studied various neural predictors and implemented the Piecewise Linear Branch Predictor following the algorithm given in a research paper by Dr. Daniel A. Jimenez. The hardware budget for this project is restricted, and I implemented the predictor within a predefined budget of 64K of memory. I am also competing in the branch prediction contest.

Keywords: Piecewise Linear, Neural Network, Branch Prediction.
I. INTRODUCTION
Neural branch predictors are among the most accurate predictors in the literature, but they were long considered impractical due to the high latency associated with prediction. This latency stems from the complex computation required to determine the excitation of an artificial neuron [3]. Piecewise Linear Branch Prediction [1] improved both accuracy and latency over previous neural predictors. It works by developing a set of linear functions, one for each program path to the branch being predicted, that separate predicted-taken from predicted-not-taken branches. In the paper "Piecewise Linear Branch Prediction", Daniel A. Jimenez proposed two versions of the prediction algorithm: i) an idealized piecewise linear branch predictor and ii) a practical piecewise linear branch predictor. This project focuses on the idealized predictor.
II. RELATED WORKS
The perceptron predictor was one of the first attempts in branch prediction history to perform branch prediction with a neural network. It reduced the misprediction rate on a composite trace of SPEC2000 benchmarks by 14.7% [2]. Unfortunately, this predictor was impractical due to its high latency.
Fast Path-Based Neural Branch Prediction [4] is another attempt, combining path and pattern history to overcome the limitations of preexisting neural predictors. It improved accuracy over previous neural predictors while achieving significantly lower latency, increasing the IPC of an aggressively clocked microarchitecture by 16% over the earlier perceptron predictor. The Scaled Neural Analog Predictor, or SNAP, is another recently proposed neural branch predictor; it uses the concept of piecewise linear branch prediction and relies on a mixed analog/digital implementation, reducing both latency and power consumption compared to other neural predictors [5]. Fig. 1 (courtesy of "An Optimized Scaled Neural Branch Predictor" by Daniel A. Jimenez) shows the comparative performance of notable branch prediction approaches on a set of SPEC CPU 2000 and 2006 integer benchmarks.
III. THE ALGORITHM
Fig. 1: Performance of different branch predictors on SPEC CPU 2000 and 2006 integer benchmarks (courtesy of "An Optimized Scaled Neural Branch Predictor" by Daniel A. Jimenez)

The branch predictor algorithm has two major parts: i) the prediction algorithm and ii) the train/update algorithm. Before going into the implementation of these two algorithms, we discuss the state and variables they use. The three-dimensional array W is the data structure used to store the branch weights; it is used in both the prediction and the update algorithm.
Fig. 2: The array W with its corresponding indices
The branch address is generally taken as the last 8-10 bits of the instruction address. For each branch being predicted, the algorithm keeps a history of all other branches that precede it on the dynamic path to the branch. The second dimension, indexed by the array GA, tracks this per-branch dynamic path history. The third dimension, indexed by the position i (with outcome GHR[i]), tracks the position of the address GA[i] in the global branch history register GHR. Some of the important variables of the algorithm are listed here for clarity:

- GA: an array of addresses that keeps the path history associated with each branch address. As each new branch executes, its address is shifted into the first position of the array.
- GHR: an array of Boolean (taken/not-taken) values that tracks the outcomes of recent branches.
- H: the length of the history register.
- output: an integer value computed by the prediction algorithm to predict the current branch.

Table I: The prediction algorithm

```cpp
branch_update *predict (branch_info &b) {
    bi = b;
    if (b.br_flags & BR_CONDITIONAL) {
        // index built from bits 2..7 of the branch address (see Section V)
        address = (((b.address >> 4) & 0x0F) << 2) | ((b.address >> 2) & 0x03);
        output = W[address][0][0];               // bias weight
        for (int i = 0; i < H; i++) {
            if (GHR[i])
                output += W[address][GA[i]][i];
            else
                output -= W[address][GA[i]][i];
        }
        u.direction_prediction (output >= 0);
    } else {
        u.direction_prediction (false);
    }
    u.target_prediction (0);
    return &u;
}
```
Table II: The update/train algorithm

```cpp
void update (branch_update *u, bool taken, unsigned int target) {
    if (bi.br_flags & BR_CONDITIONAL) {
        // train when the output magnitude is below theta or the prediction was wrong
        if (abs (output) < theta || ((output >= 0) != taken)) {
            if (taken) {
                if (W[address][0][0] < SAT_VAL)
                    W[address][0][0]++;
            } else {
                if (W[address][0][0] > -SAT_VAL)
                    W[address][0][0]--;
            }
            for (int i = 0; i < H-1; i++) {
                if (GHR[i] == taken) {
                    if (W[address][GA[i]][i] < SAT_VAL)
                        W[address][GA[i]][i]++;
                } else {
                    if (W[address][GA[i]][i] > -SAT_VAL + 1)
                        W[address][GA[i]][i]--;
                }
            }
        }
        shift_update_GA (address);
        shift_update_GHR (taken);
    }
}
```
IV. TUNING PERFORMANCE
Besides the algorithm itself, the MPKI (misses per kilo-instruction) rate depends on the sizes of the dimensions of the array W. I have experimented with MPKI across various dimensions of W; Table III shows the results.

Table III: MPKI of the piecewise linear algorithm within the 64K budget
| W[i][GA[i]][GHR[i]] | MPKI |
| --- | --- |
| W[64][16][64] | 3.982 |
| W[128][16][32] | 4.217 |
| W[64][8][128] | 4.292 |
| W[32][16][128] | 5.807 |
| W[64][64][16] | 4.826 |
The table shows that the predictor performs best when the i, GA[i], and GHR[i] dimensions have 64, 16, and 64 entries, respectively.
V. TWEAKING INSTRUCTION ADDRESS
I have found that, rather than taking the lowest bits of the address directly, discarding the 2 least significant bits of the address and then taking the next bits (bits 3-8) makes the predictor predict more accurately. This decreases aliasing and thus improves the prediction rate slightly.
Fig. 3: Tweaking the branch address for a performance speedup
VI. RESULT
The misprediction rate of the benchmarks under the piecewise linear algorithm is shown in Fig. 4. Fig. 5 shows a comparison of different prediction algorithms (piecewise linear, perceptron, and gshare) across the given benchmarks.
Fig. 4: Misprediction rate of different benchmarks (164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 201.compress, 202.jess, 205.raytrace, 209.db, 213.javac, 222.mpegaudio, 227.mtrt, 228.jack, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf) using the piecewise linear prediction algorithm
Fig. 5: Comparison of prediction algorithms across different benchmarks under the given 64K budget
VII. 64K BUDGET CALCULATION
I have limited the implementation of the piecewise linear prediction algorithm to 64K + 256 bytes of memory. The algorithm performs better as the memory limit increases. Table IV shows the calculation of the 64K + 256 byte budget.
Table IV: 64K (65,536 byte) memory budget calculation

| Data structure / variable | Memory calculation | Bytes |
| --- | --- | --- |
| W[64][16][63], each entry 1 byte | 64 × 16 × 63 | 64,512 bytes |
| Constants (SIZE, H, SAT_VAL, theta, N) | 5 × 1 byte (each value < 128) | 5 bytes |
| GA | 63 entries × 6 bits / 8 | 48 bytes |
| GHR | 63 entries × 1 bit / 8 | 8 bytes |
| Variables (address, output) | 2 × 4 bytes | 8 bytes |
| Total | | 64,581 bytes |
VIII. CONCLUSION
In this individual course final project, I implemented the piecewise linear branch prediction algorithm. My implementation achieved an MPKI of 3.982 at best. I believe the performance of this algorithm could be enhanced further with better implementation tricks. I have also compared the performance of the piecewise linear prediction algorithm with the perceptron and gshare algorithms; under the same memory limit, piecewise linear prediction performs significantly better than the other two.
REFERENCES
[1] Daniel A. Jimenez. Piecewise linear branch prediction. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), June 2005.
[2] D. Jimenez and C. Lin. Dynamic branch prediction with perceptrons. In Proceedings of the Seventh International Symposium on High Performance Computer Architecture, January 2001.
[3] Lakshminarayanan, Arun; Shriraghavan, Sowmya, "Neural Branch Prediction," available at http://webspace.ulbsibiu.ro/lucian.vintan/html/neuralpredictors.pdf
[4] D.A. Jimenez, “Fast Path-Based Neural Branch Prediction,” Proc. 36th Ann. Int’l Symp. Microarchitecture, pp. 243-252, Dec. 2003.
[5] D.A. Jimenez, "An optimized scaled neural branch predictor," Computer Design (ICCD), 2011 IEEE 29th International Conference, pp. 113-118, Oct. 2011.