Combining Statistical and Symbolic Simulation
![Page 1: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/1.jpg)
Combining Statistical and Symbolic Simulation
Mark Oskin
Fred Chong and Matthew Farrens
Dept. of Computer Science
University of California at Davis
![Page 2: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/2.jpg)
Overview
• HLS is a hybrid performance simulation
  – Statistical + Symbolic
• Fast
• Accurate
• Flexible
![Page 3: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/3.jpg)
Motivation
[Figure: IPC (0.80 to 1.15) plotted against branch prediction accuracy (0.74 to 0.94). Other labels in the figure: I-cache hit rate, I-cache miss penalty, branch mispredict penalty, basic block size, dispatch bandwidth.]
![Page 4: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/4.jpg)
Motivation
• Fast simulation
  – seconds instead of hours or days
  – ideally interactive
• Abstract simulation
  – simulate performance of unknown designs
  – application characteristics, not applications
![Page 5: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/5.jpg)
Outline
• Simulation technologies and HLS
• From applications to profiles
• Validation
• Examples
• Issues
• Conclusion
![Page 6: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/6.jpg)
Design Flow with HLS
[Figure: design flow diagram. Box labels: Cycle-by-Cycle Simulation, Profile, HLS, Design Issue (three boxes), Possible solution, Estimate Performance.]
![Page 7: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/7.jpg)
Traditional Simulation Techniques
• Cycle-by-cycle (SimpleScalar, SimOS, etc.)
+ accurate
– slow
• Native emulation/basic block models (Atom, Pixie)
+ fast, complex applications
– useful to a point (no low-level modifications)
![Page 8: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/8.jpg)
Statistical / Symbolic Execution
• HLS
+ fast (near interactive)
+ accurate / – within regions
+ permits variation of low-level parameters
+ arbitrary design points / – use carefully
![Page 9: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/9.jpg)
HLS: A Superscalar Statistical and Symbolic Simulator
[Figure: HLS block diagram. Components: L2 cache, L1 I-cache, L1 D-cache, main memory, branch predictor, fetch unit, out-of-order dispatch unit, out-of-order execution core, out-of-order completion unit. The diagram marks one region as Statistical and one as Symbolic.]
![Page 10: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/10.jpg)
Workflow
[Figure: workflow diagram. Labels: Code, Binary, sim-stat, sim-outorder, app profile, machine-profile, R10k, Stat-binary, machine-configuration, HLS.]
![Page 11: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/11.jpg)
Machine Configurations
• Number of Functional units (I,F,[L,S],B)
• Functional unit pipeline depths
• Fetch, Dispatch and completion bandwidths
• Memory access latencies
• Mis-speculation penalties
![Page 12: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/12.jpg)
Profiles
• Machine profile:
  – cache hit rates => ()
  – branch prediction accuracy => ()
• Application profile:
  – basic block size => (,)
  – instruction mix (% of I,F,L,S,B)
  – dynamic instruction distance (histogram)
[Figure: bar chart of percent of total dynamic instructions (0 to 50) by instruction type (Integer, Floating Point, Load, Store, Branch), with a legend for dependence distance: None, 1-19, 20-100.]
![Page 13: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/13.jpg)
Statistical Binary
• 100 basic blocks
• Correlated:
  – random instruction mix
  – random assignment of dynamic instruction distance
  – random distribution of cache and branch behaviors
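The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the HLS source: every name (`PROFILE`, `SymInst`, `make_statistical_binary`) and every number in the profile is hypothetical, and it assumes that each basic block ends in a single branch, that block sizes follow a normal distribution, and that a dependence distance of 0 means "no dependence".

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical profile values, chosen only for illustration.
PROFILE = {
    "block_size_mean": 6.0,
    "block_size_stddev": 2.0,
    # Mix of non-branch instructions; the branch closes each block.
    "mix": {"integer": 0.55, "float": 0.05, "load": 0.28, "store": 0.12},
    # Dynamic instruction distance histogram (0 = no dependence, an assumption).
    "distance_hist": {0: 0.3, 1: 0.3, 2: 0.2, 8: 0.2},
    "l1_icache_hit": 0.98,
    "l2_icache_hit": 0.95,
    "l1_dcache_hit": 0.95,
    "l2_dcache_hit": 0.90,
    "branch_accuracy": 0.90,
}


@dataclass
class SymInst:
    """One symbolic instruction with its pre-drawn statistical behavior."""
    kind: str
    l1_icache_hit: bool
    l2_icache_hit: bool
    l1_dcache_hit: Optional[bool]  # None for non-memory instructions
    l2_dcache_hit: Optional[bool]
    predicted: Optional[bool]      # None for non-branch instructions
    deps: List[int]


def weighted_choice(rng, table):
    """Draw one key from a {key: weight} table."""
    keys = list(table)
    return rng.choices(keys, weights=[table[k] for k in keys])[0]


def make_instruction(rng, profile, kind):
    is_mem = kind in ("load", "store")
    n_deps = 1 if kind == "load" else 2  # mirrors the listing on the next slide
    return SymInst(
        kind=kind,
        l1_icache_hit=rng.random() < profile["l1_icache_hit"],
        l2_icache_hit=rng.random() < profile["l2_icache_hit"],
        l1_dcache_hit=(rng.random() < profile["l1_dcache_hit"]) if is_mem else None,
        l2_dcache_hit=(rng.random() < profile["l2_dcache_hit"]) if is_mem else None,
        predicted=(rng.random() < profile["branch_accuracy"]) if kind == "branch" else None,
        deps=[weighted_choice(rng, profile["distance_hist"]) for _ in range(n_deps)],
    )


def make_block(rng, profile):
    """Build one basic block: random body instructions followed by a branch."""
    size = max(1, round(rng.gauss(profile["block_size_mean"],
                                  profile["block_size_stddev"])))
    block = [make_instruction(rng, profile, weighted_choice(rng, profile["mix"]))
             for _ in range(size - 1)]
    block.append(make_instruction(rng, profile, "branch"))
    return block


def make_statistical_binary(profile, n_blocks=100, seed=0):
    """Return a list of n_blocks basic blocks drawn from the profile."""
    rng = random.Random(seed)
    return [make_block(rng, profile) for _ in range(n_blocks)]
```

Fixing the seed makes the generated binary reproducible, which is convenient when comparing two machine configurations on the same synthetic workload.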
![Page 14: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/14.jpg)
Statistical Binary
load (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dependence 0)
integer (l1 i-cache, l2 i-cache, dependence 0, dependence 1)
integer (l1 i-cache, l2 i-cache, dependence 0, dependence 1)
branch (l1 i-cache, l2 i-cache, branch-predictor accr., dep 0, dep 1)
store (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dep 0, dep 1)
load (l1 i-cache, l2 i-cache, l1 d-cache, l2 d-cache, dependence 0)
Callouts on the listing:
- core functional unit requirements
- cache behavior during I-fetch
- cache behavior during data access
- dynamic instruction distance
- branch predictor behavior
![Page 15: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/15.jpg)
HLS Instruction Fetch Stage
integer (...)
branch (...)
store (...)
load (...)
integer (...)
branch (...)
load (...)
integer (...)
Similar to conventional instruction fetch:
- has a PC
- has a fetch window
- interacts with caches
- utilizes branch predictor
- passes instructions to dispatch
Differences:
- caches and branch predictor are statistical models
Fetches symbolic instructions and interacts with a statistical memory system and branch predictor model.
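A fetch stage of this kind might look like the sketch below. It is an illustration under stated assumptions rather than the HLS implementation: the penalties, the `SymInst` fields, the rule that a fetch window ends at the branch closing a block, and the sequential (wrap-around) choice of the next block are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical penalties in cycles, for illustration only.
L1_MISS_PENALTY = 6
L2_MISS_PENALTY = 30
MISPREDICT_PENALTY = 4


@dataclass
class SymInst:
    """Symbolic instruction carrying pre-drawn statistical outcomes."""
    kind: str
    l1_icache_hit: bool = True
    l2_icache_hit: bool = True
    predicted: bool = True  # meaningful only for branches


def fetch(blocks: List[List[SymInst]], n_instructions: int,
          fetch_width: int = 4) -> Tuple[int, List[SymInst]]:
    """Fetch n_instructions symbolic instructions; return (cycles, instructions)."""
    cycles = 0
    fetched: List[SymInst] = []
    block_idx, pc = 0, 0  # the "PC" is a (block, offset) pair
    while len(fetched) < n_instructions:
        cycles += 1  # one fetch window per cycle
        for _ in range(fetch_width):
            if len(fetched) >= n_instructions:
                break
            inst = blocks[block_idx][pc]
            # The statistical cache model: outcomes are stored on the instruction.
            if not inst.l1_icache_hit:
                cycles += L1_MISS_PENALTY
                if not inst.l2_icache_hit:
                    cycles += L2_MISS_PENALTY
            fetched.append(inst)  # would be passed on to dispatch
            pc += 1
            if pc == len(blocks[block_idx]):
                # End of block: move to the next block (sequential, an assumption).
                block_idx = (block_idx + 1) % len(blocks)
                pc = 0
                # The statistical branch predictor model.
                if inst.kind == "branch" and not inst.predicted:
                    cycles += MISPREDICT_PENALTY
                break  # the fetch window ends at a block boundary (assumption)
    return cycles, fetched
```

No addresses or cache contents are tracked anywhere: the hit and prediction outcomes were drawn when the statistical binary was generated, so the fetch loop only has to charge the corresponding penalties.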
![Page 16: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/16.jpg)
Validation - SimpleScalar vs. HLS
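| Benchmark | SimpleScalar IPC | HLS IPC | Error |
|-----------|------------------|---------|-------|
| perl      | 1.27             | 1.32    | 4.20% |
| compress  | 1.18             | 1.25    | 5.50% |
| gcc       | 0.92             | 0.96    | 3.90% |
| go        | 0.94             | 1.01    | 6.80% |
| ijpeg     | 1.67             | 1.73    | 3.90% |
| li        | 1.62             | 1.5     | 7.20% |
| m88ksim   | 1.16             | 1.14    | 1.50% |
| vortex    | 0.87             | 0.83    | 5.10% |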
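The slide does not state how the Error column was computed. One plausible reading, assumed here, is the relative difference between the HLS IPC and the reference IPC. The helper below is a hypothetical sketch of that calculation; because the IPC values on the slide are rounded to two decimals, recomputing from them will not reproduce the listed percentages exactly.

```python
def percent_error(reference_ipc: float, hls_ipc: float) -> float:
    """Relative error of the HLS estimate against the reference, in percent.

    Assumed formula: |HLS - reference| / reference * 100.
    """
    return abs(hls_ipc - reference_ipc) / reference_ipc * 100.0


# perl row of the table: SimpleScalar IPC 1.27, HLS IPC 1.32
perl_error = percent_error(1.27, 1.32)
print(f"perl: {perl_error:.2f}%")
```

For the perl row this gives a value close to, but not identical to, the 4.20% listed, which is consistent with the table having been computed from unrounded IPC values.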
![Page 17: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/17.jpg)
Validation - R10k vs. HLS
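| Benchmark | R10k IPC | HLS IPC | Error |
|-----------|----------|---------|-------|
| perl      | 1.01     | 1.09    | 7.00% |
| compress  | 0.7      | 0.69    | 2.60% |
| gcc       | 0.93     | 0.96    | 3.80% |
| go        | 0.9      | 0.98    | 0.90% |
| ijpeg     | 1.45     | 1.4     | 4.00% |
| li        | 0.85     | 0.9     | 6.00% |
| m88ksim   | 1.15     | 1.15    | 0.10% |
| vortex    | 0.83     | 0.82    | 1.00% |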
![Page 18: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/18.jpg)
HLS Multi-Value Validation with SimpleScalar
[Figure: contour plot for Perl comparing HLS and SimpleScalar. X-axis: branch prediction accuracy (0.80 to 1.00). Y-axis: L1 instruction cache hit rate (0.80 to 1.00). Contour labels range from 0.5 to 2.0.]
![Page 19: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/19.jpg)
HLS Multi-Value Validation with SimpleScalar
[Figure: contour plot for Xlisp comparing HLS and SimpleScalar. X-axis: L1 instruction cache hit rate (0.80 to 1.00). Y-axis: L1 instruction cache miss penalty (2 to 20). Contour labels range from 0.2 to 1.5.]
![Page 20: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/20.jpg)
Example use of HLS
[Figure: iso-IPC contour plot for Perl. X-axis: branch prediction accuracy (0.80 to 1.00). Y-axis: basic block size (10 to 50). Contour labels range from 0.6 to 1.3.]
An intuitive result: branch prediction accuracy becomes less important (crosses fewer iso-IPC contour lines) as basic block size increases.
![Page 21: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/21.jpg)
Example use of HLS
[Figure: contour plot for Perl. X-axis: basic block size (2 to 20). Y-axis: dynamic instruction distance (2 to 20). Contour labels range from 0.7 to 1.4.]
Another intuitive result: gains in IPC due to basic block size are front-loaded.
Trade-off between front-end (fetch/dispatch) and back-end (ILP) processor performance.
![Page 22: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/22.jpg)
Example use of HLS
[Figure: contour plot for Perl. X-axis: % value predicted instructions (0 to 1). Y-axis: dynamic instruction distance (2 to 20). Contour labels: 1.1 and 1.2. Part of the plot area carries the note "This space intentionally left blank."]
![Page 23: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/23.jpg)
Related work
• R. Carl and J.E. Smith. Modeling superscalar processors via statistical simulation - PAID Workshop - June 1998.
• N. Jouppi. The non-uniform distribution of instruction-level and machine parallelism and its effect on performance. - IEEE Trans. 1989.
• D. Noonburg and John Shen. Theoretical modeling of superscalar processor performance - MICRO27 - November 1994.
![Page 24: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/24.jpg)
Questions & Future Directions
• How important are different well-performing benchmarks anyway?
  – easily summarized
  – summaries are not precise => yet precise enough
  – Will the statistical+symbolic technique work for poorly behaved applications?
• Will it extend to deeper pipelines and more real processors (i.e. Alpha, P6 architecture)?
![Page 25: Combining Statistical and Symbolic Simulation](https://reader035.fdocuments.us/reader035/viewer/2022062308/56812e85550346895d94268a/html5/thumbnails/25.jpg)
Conclusion
• HLS: Statistical + Symbolic Execution
  – Intuitive design space exploration
    • Fast
    • Accurate
  – Flexible
• Validated against cycle-by-cycle and R10k
• Future work: deeper pipelines, more hardware validations, additional domains
• Source code at: http://arch.cs.ucdavis.edu/~oskin