VLSI Project
Neural Networks based Branch Prediction
Alexander Zlotnik, Marcel Apfelbaum
Supervised by: Michael Behar, Spring 2005
VLSI Project, Spring 2005
Introduction
Branch prediction has always been a "hot" topic
20% of all instructions are branches
Correct prediction makes execution faster
Misprediction has high costs
Classic predictors are based on 2-bit counter state machines
[Figure: 2-bit saturating counter state machine with states 00 SNT (strongly not taken), 01 WNT (weakly not taken), 10 WT (weakly taken), 11 ST (strongly taken); each taken outcome moves the state toward ST, each not-taken outcome moves it toward SNT]
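The state machine above can be sketched in a few lines of C (a minimal illustration, not part of the project's code):

```c
/* 2-bit saturating counter: 0 = SNT, 1 = WNT, 2 = WT, 3 = ST. */
typedef unsigned char counter2;

/* Predict taken when the counter is in a "taken" state (WT or ST). */
static int predict_taken(counter2 c) { return c >= 2; }

/* A taken branch moves the state toward ST (3); a not-taken branch
   moves it toward SNT (0). The counter saturates at both ends. */
static counter2 counter_update(counter2 c, int taken)
{
    if (taken)
        return c < 3 ? c + 1 : 3;
    return c > 0 ? c - 1 : 0;
}
```

The hysteresis is the point: one surprise outcome only weakens the prediction, so a loop branch that is almost always taken stays predicted taken across the single not-taken exit.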
Introduction (cont.)
Modern predictors are 2-level: they use 2-bit counters plus branch history (local/global)
Known problems are:
- Memory size is exponential in history length
- Too long a history can cause errors
Recent studies explore branch prediction using neural networks
Project Objective
Develop a mechanism for branch prediction
Explore the practicability and applicability of such a mechanism and measure its success rates
Use a known neural network technique: the Perceptron
Compare and analyze against "old" predictors
Project Requirements
Develop for the SimpleScalar platform to simulate OOOE processors
Run the developed predictor on accepted benchmarks
C language
No hardware component equivalence needed; software implementation only
Background and Theory
Perceptron
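The slide presumably showed the perceptron itself; its output, as defined in the perceptron-predictor literature this project follows, is a weighted sum of the history inputs plus a bias weight:

```latex
y_{\mathrm{out}} = w_0 + \sum_{i=1}^{n} w_i x_i
```

where x_1..x_n are the last n branch outcomes encoded as +1 (taken) or -1 (not taken), and w_0 is the bias weight, whose input is always 1.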
Background and Theory (cont.)
Perceptron Training
Let θ = training threshold,
    t = 1 if the branch was taken, -1 otherwise,
    x = history vector.

if (sign(y_out) != t) or |y_out| <= θ then
    for i := 0 to n do
        w_i := w_i + t * x_i
    end for
end if
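The training rule above can be sketched in C; HIST_LEN and THETA are illustrative values, not the project's actual parameters:

```c
#include <stdlib.h>

#define HIST_LEN 15   /* n: number of history bits (illustrative) */
#define THETA    43   /* training threshold (illustrative) */

/* x[0] is the bias input and is always 1; x[1..n] hold the global
   history encoded as +1 (taken) / -1 (not taken). */
static int y_out(const int w[HIST_LEN + 1], const int x[HIST_LEN + 1])
{
    int i, y = 0;
    for (i = 0; i <= HIST_LEN; i++)
        y += w[i] * x[i];
    return y;
}

/* t = 1 if the branch was taken, -1 otherwise; y is the output that
   was used for the prediction. Weights are adjusted only when the
   prediction was wrong or |y| has not yet exceeded THETA. */
static void train(int w[HIST_LEN + 1], const int x[HIST_LEN + 1], int t, int y)
{
    int i;
    if ((y >= 0 ? 1 : -1) != t || abs(y) <= THETA)
        for (i = 0; i <= HIST_LEN; i++)
            w[i] += t * x[i];
}
```

Note that adding t*x_i pushes each weight toward agreement with the outcome: if history bit x_i matched the outcome t, the weight grows; if it disagreed, the weight shrinks.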
Development Stages
1. Studying the background
2. Learning the SimpleScalar platform
3. Coding a "dummy" predictor and using it to make sure we understand how branch prediction is handled in the SimpleScalar platform
4. Coding the perceptron predictor itself
5. Coding a perceptron behavior revealer
6. Benchmarking (smart environment)
7. A special study of our suggestion regarding perceptron predictor performance
Principles
Branch prediction needs a learning methodology; a neural network provides one based on inputs and outputs (pattern recognition)
As the history grows, the data structures of our predictor grow only linearly.
We use a perceptron to learn correlations between particular branch outcomes in the global history and the behavior of the current branch. These correlations are represented by the weights. The larger the weight, the stronger the correlation, and the more that particular branch in the history contributes to the prediction of the current branch. The input to the bias weight is always 1, so instead of learning a correlation with a previous branch outcome, the bias weight learns the bias of the branch, independent of the history.
Design and Implementation
Hardware Budget
History length
- A longer history means fewer perceptrons fit in the same budget
Threshold
- The threshold is a parameter to the perceptron training algorithm, used to decide whether the predictor needs more training
Representation of weights
- Weights are signed integers
- Number of bits per weight = 1 + floor(log2(θ))
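Reading the logarithm as base 2 (natural for a bit count), the weight-width formula can be checked with a small helper; this is an illustration, not project code:

```c
/* Bits per weight: one sign bit plus floor(log2(theta)) magnitude
   bits. For example, theta = 43 gives 1 + floor(log2(43)) = 6 bits. */
static int weight_bits(int theta)
{
    int b = 0;
    while (theta > 1) {   /* compute floor(log2(theta)) */
        b++;
        theta >>= 1;
    }
    return 1 + b;
}
```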
Algorithm
Fetch stage
1. The branch address is hashed to produce an index i ∈ 0..n-1 into the table of perceptrons.
2. The i-th perceptron is fetched from the table into a vector register of weights, P.
3. The value of y is computed as the dot product of P and the global history register.
4. The branch is predicted not taken when y is negative, and taken otherwise.
Algorithm (cont.)
Execution stage
1. Once the actual outcome of the branch becomes known, the training algorithm uses this outcome and the value of y to update the weights in P (training).
2. P is written back to the i-th entry in the table.
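Putting the fetch and execution stages together, a software-only predictor along these lines might look like this in C (table size, hash function, and parameter values are illustrative assumptions, not the project's actual choices):

```c
#define N_PERC 1024   /* number of perceptrons (illustrative) */
#define HLEN   15     /* global history length (illustrative) */
#define THETA  43     /* training threshold (illustrative) */

static int table[N_PERC][HLEN + 1];  /* weights; index 0 is the bias */
static int ghr[HLEN + 1] = {1};      /* ghr[0] = 1 is the bias input;
                                        ghr[1..HLEN] hold outcomes as
                                        +1/-1 (0 until warmed up) */

/* Fetch stage: hash the branch address, fetch the perceptron, and
   compute y as the dot product of the weights and the history. */
static int predict(unsigned pc, int *y)
{
    const int *w = table[pc % N_PERC];  /* simple hash: address mod size */
    int i;
    *y = 0;
    for (i = 0; i <= HLEN; i++)
        *y += w[i] * ghr[i];
    return *y >= 0;   /* not taken only when y is negative */
}

/* Execution stage: once the outcome is known, train the weights and
   shift the outcome into the global history register. */
static void update(unsigned pc, int taken, int y)
{
    int *w = table[pc % N_PERC];
    int t = taken ? 1 : -1;
    int i;
    if ((y >= 0 ? 1 : -1) != t || (y < 0 ? -y : y) <= THETA)
        for (i = 0; i <= HLEN; i++)
            w[i] += t * ghr[i];
    for (i = HLEN; i > 1; i--)
        ghr[i] = ghr[i - 1];
    ghr[1] = t;
}
```

In a real SimpleScalar integration, predict would run at fetch and update at branch resolution; the mod-based hash and the cold-start zeros in the history are simplifications.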
Simulation Results
On all parameters, the perceptron-based predictor outperformed GSHARE
Simulation was done over the VPR, Perl, and Parser benchmarks from ss_spec2k
Simulation Results (cont.)

[Chart: Neural predictor on VPR, prediction rate by configuration (GHr/perceptrons, memory bits): 15/64 (5760): 0.9869; 15/128 (11520): 0.9863; 15/256 (23040): 0.9859; 15/512 (46080): 0.9855; 15/1024 (92160): 0.9879; 15/2048 (184320): 0.9875]

[Chart: GSHARE on VPR, prediction rate by configuration (GHr, memory): 8 (512): 0.9325; 9 (1024): 0.9487; 10 (2048): 0.9626; 11 (4096): 0.9644; 12 (8192): 0.9716; 13 (16384): 0.9737; 14 (32768): 0.9773; 15 (65536): 0.9781; 16 (131072): 0.9785; 17 (262144): 0.9785]
Simulation Results (cont.)

[Chart: GSHARE on VPR, Instructions Per Cycle by configuration (GHr, memory): 8 (512): 1.8013; 9 (1024): 1.8283; 10 (2048): 1.8538; 11 (4096): 1.8533; 12 (8192): 1.8674; 13 (16384): 1.8719; 14 (32768): 1.877; 15 (65536): 1.8782; 16 (131072): 1.8794; 17 (262144): 1.8793]

[Chart: Neural predictor on VPR, Instructions Per Cycle by configuration (GHr/perceptrons, memory bits): 15/64 (5760): 1.9362; 15/128 (11520): 1.9311; 15/256 (23040): 1.928; 15/512 (46080): 1.9313; 15/1024 (92160): 1.937; 15/2048 (184320): 1.9375]
Simulation Results (cont.)

[Chart: Perceptron prediction rate vs. GHr size (10 to 30), one curve per number of perceptrons (64, 256, 1024, 2048); prediction rates fall in the 0.98 to 0.992 range]
Special Problems
Software simulation of hardware
- Utilizing existing data structures of SimpleScalar
Compiling self-written programs for SimpleScalar
- After several weeks of hard work we decided to use accepted benchmarks
Summary
We implemented a different branch prediction mechanism and obtained exciting results
Hardware implementation of the mechanism is hard, but possible
A longer history helps the perceptron make better predictions