VLSI Project
Neural Networks based Branch Prediction
Alexander Zlotnik, Marcel Apfelbaum
Supervised by: Michael Behar, Spring 2005
VLSI Project, Spring 2005
Introduction
Branch prediction has always been a "hot" topic
20% of all instructions are branches
Correct prediction makes execution faster
Misprediction has high costs
Classic predictors are based on 2-bit counter state machines
[Figure: 2-bit saturating counter state machine with states 00 SNT (strongly not taken), 01 WNT (weakly not taken), 10 WT (weakly taken), 11 ST (strongly taken); each taken outcome moves the state toward ST, each not-taken outcome moves it toward SNT]
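The state machine above can be sketched in a few lines of C (a minimal illustration, not part of the project's code):

```c
/* 2-bit saturating counter: 0 = SNT, 1 = WNT, 2 = WT, 3 = ST. */
typedef unsigned char counter2;

/* Predict taken when the counter is in a "taken" state (WT or ST). */
static int predict_taken(counter2 c) { return c >= 2; }

/* A taken branch moves the state toward ST (3); a not-taken branch
   moves it toward SNT (0). The counter saturates at both ends. */
static counter2 counter_update(counter2 c, int taken)
{
    if (taken)
        return c < 3 ? c + 1 : 3;
    return c > 0 ? c - 1 : 0;
}
```

The hysteresis is the point: one surprise outcome only weakens the prediction, so a loop branch that is almost always taken stays predicted taken across the single not-taken exit.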
Introduction (cont.)
Modern predictors are 2-level: they use 2-bit counters plus branch history (local/global)
Known problems are:
- Memory size is exponential in history length
- Too long a history can cause errors
Recent studies explore branch prediction using neural networks
Project Objective
Develop a mechanism for branch prediction
Explore the practicability and applicability of such a mechanism and measure its success rates
Use a known neural network technique: the Perceptron
Compare and analyze against "old" predictors
Project Requirements
Develop for the SimpleScalar platform to simulate OOOE processors
Run the developed predictor on accepted benchmarks
C language
No hardware component equivalence needed; software implementation only
Background and Theory
Perceptron
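The slide presumably showed the perceptron itself; its output, as defined in the perceptron-predictor literature this project follows, is a weighted sum of the history inputs plus a bias weight:

```latex
y_{\mathrm{out}} = w_0 + \sum_{i=1}^{n} w_i x_i
```

where x_1..x_n are the last n branch outcomes encoded as +1 (taken) or -1 (not taken), and w_0 is the bias weight, whose input is always 1.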
Background and Theory (cont.)
Perceptron Training
Let θ = training threshold,
    t = 1 if the branch was taken, -1 otherwise,
    x = history vector.

if (sign(y_out) != t) or |y_out| <= θ then
    for i := 0 to n do
        w_i := w_i + t * x_i
    end for
end if
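The training rule above can be sketched in C; HIST_LEN and THETA are illustrative values, not the project's actual parameters:

```c
#include <stdlib.h>

#define HIST_LEN 15   /* n: number of history bits (illustrative) */
#define THETA    43   /* training threshold (illustrative) */

/* x[0] is the bias input and is always 1; x[1..n] hold the global
   history encoded as +1 (taken) / -1 (not taken). */
static int y_out(const int w[HIST_LEN + 1], const int x[HIST_LEN + 1])
{
    int i, y = 0;
    for (i = 0; i <= HIST_LEN; i++)
        y += w[i] * x[i];
    return y;
}

/* t = 1 if the branch was taken, -1 otherwise; y is the output that
   was used for the prediction. Weights are adjusted only when the
   prediction was wrong or |y| has not yet exceeded THETA. */
static void train(int w[HIST_LEN + 1], const int x[HIST_LEN + 1], int t, int y)
{
    int i;
    if ((y >= 0 ? 1 : -1) != t || abs(y) <= THETA)
        for (i = 0; i <= HIST_LEN; i++)
            w[i] += t * x[i];
}
```

Note that adding t*x_i pushes each weight toward agreement with the outcome: if history bit x_i matched the outcome t, the weight grows; if it disagreed, the weight shrinks.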
Development Stages
1. Studying the background
2. Learning the SimpleScalar platform
3. Coding a "dummy" predictor and using it to make sure we understand how branch prediction is handled in the SimpleScalar platform
4. Coding the perceptron predictor itself
5. Coding a perceptron behavior revealer
6. Benchmarking (smart environment)
7. A special study of our suggestion regarding perceptron predictor performance
Principles
Branch prediction needs a learning methodology; a neural network provides one based on inputs and outputs (pattern recognition)
As the history grows, the data structures of our predictor grow only linearly.
We use a perceptron to learn correlations between particular branch outcomes in the global history and the behavior of the current branch. These correlations are represented by the weights. The larger the weight, the stronger the correlation, and the more that particular branch in the history contributes to the prediction of the current branch. The input to the bias weight is always 1, so instead of learning a correlation with a previous branch outcome, the bias weight learns the bias of the branch, independent of the history.
Design and Implementation
Hardware Budget
History length
- A longer history means fewer perceptrons fit in the same budget
Threshold
- The threshold is a parameter to the perceptron training algorithm, used to decide whether the predictor needs more training
Representation of weights
- Weights are signed integers
- Number of bits per weight = 1 + floor(log2(θ))
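Reading the logarithm as base 2 (natural for a bit count), the weight-width formula can be checked with a small helper; this is an illustration, not project code:

```c
/* Bits per weight: one sign bit plus floor(log2(theta)) magnitude
   bits. For example, theta = 43 gives 1 + floor(log2(43)) = 6 bits. */
static int weight_bits(int theta)
{
    int b = 0;
    while (theta > 1) {   /* compute floor(log2(theta)) */
        b++;
        theta >>= 1;
    }
    return 1 + b;
}
```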
Algorithm
Fetch stage
1. The branch address is hashed to produce an index i ∈ 0..n-1 into the table of perceptrons.
2. The i-th perceptron is fetched from the table into a vector register of weights, P.
3. The value of y is computed as the dot product of P and the global history register.
4. The branch is predicted not taken when y is negative, and taken otherwise.
Algorithm (cont.)
Execution stage
1. Once the actual outcome of the branch becomes known, the training algorithm uses this outcome and the value of y to update the weights in P (training).
2. P is written back to the i-th entry in the table.
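Putting the fetch and execution stages together, a software-only predictor along these lines might look like this in C (table size, hash function, and parameter values are illustrative assumptions, not the project's actual choices):

```c
#define N_PERC 1024   /* number of perceptrons (illustrative) */
#define HLEN   15     /* global history length (illustrative) */
#define THETA  43     /* training threshold (illustrative) */

static int table[N_PERC][HLEN + 1];  /* weights; index 0 is the bias */
static int ghr[HLEN + 1] = {1};      /* ghr[0] = 1 is the bias input;
                                        ghr[1..HLEN] hold outcomes as
                                        +1/-1 (0 until warmed up) */

/* Fetch stage: hash the branch address, fetch the perceptron, and
   compute y as the dot product of the weights and the history. */
static int predict(unsigned pc, int *y)
{
    const int *w = table[pc % N_PERC];  /* simple hash: address mod size */
    int i;
    *y = 0;
    for (i = 0; i <= HLEN; i++)
        *y += w[i] * ghr[i];
    return *y >= 0;   /* not taken only when y is negative */
}

/* Execution stage: once the outcome is known, train the weights and
   shift the outcome into the global history register. */
static void update(unsigned pc, int taken, int y)
{
    int *w = table[pc % N_PERC];
    int t = taken ? 1 : -1;
    int i;
    if ((y >= 0 ? 1 : -1) != t || (y < 0 ? -y : y) <= THETA)
        for (i = 0; i <= HLEN; i++)
            w[i] += t * ghr[i];
    for (i = HLEN; i > 1; i--)
        ghr[i] = ghr[i - 1];
    ghr[1] = t;
}
```

In a real SimpleScalar integration, predict would run at fetch and update at branch resolution; the mod-based hash and the cold-start zeros in the history are simplifications.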
Simulation Results
On all parameters, the perceptron-based predictor outperformed GSHARE
Simulation was done over the VPR, Perl, and Parser benchmarks from ss_spec2k
Simulation Results (cont.)

[Chart: Neural predictor on VPR, prediction rate by configuration (GHr/perceptrons, memory bits): 15/64 (5760): 0.9869; 15/128 (11520): 0.9863; 15/256 (23040): 0.9859; 15/512 (46080): 0.9855; 15/1024 (92160): 0.9879; 15/2048 (184320): 0.9875]

[Chart: GSHARE on VPR, prediction rate by configuration (GHr, memory): 8 (512): 0.9325; 9 (1024): 0.9487; 10 (2048): 0.9626; 11 (4096): 0.9644; 12 (8192): 0.9716; 13 (16384): 0.9737; 14 (32768): 0.9773; 15 (65536): 0.9781; 16 (131072): 0.9785; 17 (262144): 0.9785]
Simulation Results (cont.)

[Chart: GSHARE on VPR, Instructions Per Cycle by configuration (GHr, memory): 8 (512): 1.8013; 9 (1024): 1.8283; 10 (2048): 1.8538; 11 (4096): 1.8533; 12 (8192): 1.8674; 13 (16384): 1.8719; 14 (32768): 1.877; 15 (65536): 1.8782; 16 (131072): 1.8794; 17 (262144): 1.8793]

[Chart: Neural predictor on VPR, Instructions Per Cycle by configuration (GHr/perceptrons, memory bits): 15/64 (5760): 1.9362; 15/128 (11520): 1.9311; 15/256 (23040): 1.928; 15/512 (46080): 1.9313; 15/1024 (92160): 1.937; 15/2048 (184320): 1.9375]
Simulation Results (cont.)

[Chart: Perceptron prediction rate vs. GHr size (10 to 30), one curve per number of perceptrons (64, 256, 1024, 2048); prediction rates fall in the 0.98 to 0.992 range]
Special Problems
Software simulation of hardware
- Utilizing existing data structures of SimpleScalar
Compiling self-written programs for SimpleScalar
- After several weeks of hard work we decided to use accepted benchmarks
Summary
We implemented a different branch prediction mechanism and obtained exciting results
Hardware implementation of the mechanism is hard, but possible
A longer history helps the perceptron make better predictions