Source: ftp.cs.wisc.edu/sohi/talks/1997/isca.reuse.pdf
Dynamic Instruction Reuse
Avinash Sodani
Guri Sohi
Computer Sciences Department
University of Wisconsin — Madison
Motivation
• Programs consist of static instructions
• Execution sees static instruction many times
- often with same inputs
➔ produces same result
➔ no need to compute again
Exploited by Dynamic Instruction Reuse
• Buffer results of instructions
• Reuse old result if input operands are same
Instruction Reuse: Advantages
+ permits dependent instructions to issue earlier
+ reduces resource contention
+ salvages useful work from misprediction squashes
+ completes chains of dependent instructions in a single cycle
➔ potentially breaks dataflow limit
Outline
• Motivation
• What enables reuse?
• Implementing Reuse
• Three Reuse Schemes
• Some Results
• Summary
What enables reuse?
• General Reuse
- due to the way computation is expressed
➔ same code visited with same data
• Squash Reuse
- due to mis-speculation
General Reuse
find (key, list)
    foreach element in list
        access element
        if (key == element) found
    not found
[Figure: find(a, list) and find(b, list) side by side; iter 1 and iter 2 of the two calls are marked as the same computation.]
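As a rough illustration of general reuse in the list search (not taken from the talk), the sketch below records every "access element" step as a pair of static site and input, then counts how many dynamic steps repeat an earlier pair. The names `find`, `trace` and `count_repeats`, and the choice to model the input as the list index, are assumptions made for this example.

```python
def find(key, items, trace):
    """Linear search; logs each element access as (static site, input index)."""
    for i, element in enumerate(items):
        trace.append(("access", i))  # same static instruction, input = i
        if key == element:
            return True
    return False


def count_repeats(trace):
    """Count dynamic operations whose (site, input) pair was seen before."""
    seen = set()
    repeats = 0
    for op in trace:
        if op in seen:
            repeats += 1
        seen.add(op)
    return repeats


items = [3, 1, 4, 1, 5]
trace = []
find(4, items, trace)  # touches indices 0, 1, 2
find(5, items, trace)  # touches indices 0..4; the first three repeat
repeated = count_repeats(trace)
```

Here the second call repeats every access made by the first call, which is the kind of redundancy a reuse buffer could capture.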
Squash Reuse
[Figure: dynamic instruction stream with a branch, an incorrect path and a correct path; instructions are marked Squashed and Reused.]
Implementing Reuse
• Buffer results of instructions: Reuse Buffer (RB)
• Reuse if operands same as in previous execution: Reuse Test
RB
- Indexed by PC
- Reuse Test
- Selective invalidation
[Figure: the RB is accessed with the PC and receives invalidate events; instructions that pass the Reuse Test leave it as reused instructions.]
Integrating RB in pipeline
• RB access begins in fetch stage
• Reuse happens in decode stage
[Figure: pipeline stages Fetch Unit, Decode & Rename, OOO Execution, Commit; the PC from the fetch unit accesses the Reuse Buffer.]
Issues
• What information is stored in the RB?
• How is Reuse Test done?
• How is the information kept consistent?
Reuse Schemes
• Scheme Sv : operand values
+ most aggressive
- lot of bits
• Scheme Sn : operand names
+ few bits
- too conservative
• Scheme Sn+d : operand names + dependences
+ improvement on Sn
+ Sv performance at (near) Sn cost
Scheme Sv
What to store in RB?
• Store results and operand values
Reuse Test
• Reuse result if operand values are same
How to keep RB consistent?
• loads invalidated when memory location overwritten.
• other instructions not invalidated
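A minimal sketch of how scheme Sv could be modelled in software, assuming one entry per PC held in a dictionary. The class name `SvReuseBuffer`, its methods and the `load_addr` field are hypothetical and only illustrate the three points above; they are not the hardware organisation from the talk.

```python
class SvReuseBuffer:
    """Toy RB for scheme Sv: each entry stores operand values and the result."""

    def __init__(self):
        # pc -> (operand values, result, load address or None for non-loads)
        self.entries = {}

    def insert(self, pc, operands, result, load_addr=None):
        self.entries[pc] = (tuple(operands), result, load_addr)

    def lookup(self, pc, operands):
        """Reuse test: hit only if the stored operand values match the current ones."""
        entry = self.entries.get(pc)
        if entry is not None and entry[0] == tuple(operands):
            return entry[1]
        return None

    def store_to(self, addr):
        """Invalidate loads that read the overwritten location; keep everything else."""
        self.entries = {pc: e for pc, e in self.entries.items() if e[2] != addr}


rb = SvReuseBuffer()
rb.insert(0x40, (20, 30), 50)                    # an add with operands 20 and 30
hit = rb.lookup(0x40, (20, 30))                  # same operand values: reuse
miss = rb.lookup(0x40, (21, 30))                 # different operand value: no reuse
rb.insert(0x44, (0x1000,), 7, load_addr=0x1000)  # a load from address 0x1000
rb.store_to(0x1000)                              # a store overwrites that location
load_after_store = rb.lookup(0x44, (0x1000,))
```

Because the reuse test compares values, arithmetic entries never need invalidating; only loads depend on state that is not visible in their operands.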
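Scheme Sv (cont’d)
[Figure: at two points in time, the instruction PC: r1 ← r2 + r3 executes with register contents r2 = 20, r3 = 30. The RB entry holds operand values 20 and 30 and result 50. The second time, both operand comparisons (==) succeed: Reuse result.]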
Scheme Sn
What to store in RB?
• Store result and operand names
Reuse Test
• Reuse if result valid : valid bit
How to keep RB consistent?
• invalidate result when operand name overwritten
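A minimal sketch of scheme Sn under the same toy model, assuming register operands only. The class `SnReuseBuffer` and its method names are hypothetical. The usage lines follow a chain A: r1 ← r2 + 3, B: r3 ← r1 + 4 with an intervening write to r1, and assume r2 = 5.

```python
class SnReuseBuffer:
    """Toy RB for scheme Sn: each entry stores operand names, result and a valid bit."""

    def __init__(self):
        self.entries = {}  # pc -> {"srcs": names, "result": value, "valid": bool}

    def insert(self, pc, srcs, result):
        self.entries[pc] = {"srcs": tuple(srcs), "result": result, "valid": True}

    def register_written(self, reg):
        """Invalidate every entry that names the overwritten register as an operand."""
        for entry in self.entries.values():
            if reg in entry["srcs"]:
                entry["valid"] = False

    def lookup(self, pc):
        """Reuse test: only the valid bit is checked."""
        entry = self.entries.get(pc)
        if entry is not None and entry["valid"]:
            return entry["result"]
        return None


rb = SnReuseBuffer()
r2 = 5                                   # assumed register value
rb.register_written("r1")
rb.insert("A", ["r2"], r2 + 3)           # A: r1 <- r2 + 3
rb.register_written("r3")
rb.insert("B", ["r1"], (r2 + 3) + 4)     # B: r3 <- r1 + 4
rb.register_written("r1")                # another instruction overwrites r1
a = rb.lookup("A")                       # r2 untouched: A is reused
b = rb.lookup("B")                       # r1 was overwritten: B is not reused
```

The test is cheap because no values are compared, but B is lost even though A will put the same value back into r1.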
Scheme Sn (cont’d)
Time T1. Dynamic instructions: A: r1 ← r2 + 3, B: r3 ← r1 + 4. RB contents (entry, operand name): A r2, B r1.
Time T2. Dynamic instruction: R: r1 ← 4. RB contents: A r2, B r1 (B invalidated).
Time T3. Dynamic instructions: A: r1 ← r2 + 3 (reused), B: r3 ← r1 + 4. RB contents: A r2.
B performs the same computation but is not reused by Sn.
Scheme Sn+d
What to store in RB?
• Store result, name and dependences
Reuse Test
• A reused if valid : valid bit
• B reused if A is latest producer of r1
How to keep RB consistent?
• Invalidate chain when inputs overwritten
[Figure: the chain A: r1 ← r2 + 3, B: r3 ← r1 + 4 is recorded as A: r1 ← r2 + 3, B: r3 ← A + 4, with r2 as the input of the chain.]
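A simplified sketch of scheme Sn+d, assuming register operands only. The class `SndReuseBuffer`, its methods, and the `latest` table (which tracks the RB entry that last wrote each register) are hypothetical names for this illustration; the real design may differ in how dependences are stored and checked.

```python
class SndReuseBuffer:
    """Toy RB for scheme Sn+d: entries store operand names plus dependence links."""

    def __init__(self):
        self.entries = {}  # pc -> {"dest", "deps", "result", "valid"}
        self.latest = {}   # register -> pc of the RB entry that last wrote it, or None

    def _invalidate(self, pc):
        """Invalidate an entry and, transitively, the chain that depends on it."""
        entry = self.entries.get(pc)
        if entry is None or not entry["valid"]:
            return
        entry["valid"] = False
        for other_pc, other in self.entries.items():
            if pc in other["deps"].values():
                self._invalidate(other_pc)

    def _write(self, reg, writer_pc):
        """reg is overwritten: entries that read it as an independent input go stale."""
        for pc, entry in self.entries.items():
            if reg in entry["deps"] and entry["deps"][reg] is None:
                self._invalidate(pc)
        self.latest[reg] = writer_pc

    def overwrite(self, reg):
        """A write to reg by an instruction that has no RB entry."""
        self._write(reg, None)

    def execute(self, pc, dest, srcs, result):
        """Record a normally executed instruction, linking operands to valid producers."""
        self._invalidate(pc)  # an older result for this pc, and its chain, is stale
        deps = {}
        for s in srcs:
            p = self.latest.get(s)
            deps[s] = p if (p in self.entries and self.entries[p]["valid"]) else None
        self.entries[pc] = {"dest": dest, "deps": deps, "result": result, "valid": True}
        self._write(dest, pc)

    def try_reuse(self, pc):
        """Reuse test: valid bit, and each producer is still the latest writer."""
        entry = self.entries.get(pc)
        if entry is None or not entry["valid"]:
            return None
        for reg, producer in entry["deps"].items():
            if producer is not None and self.latest.get(reg) != producer:
                return None
        self._write(entry["dest"], pc)  # a reused instruction still writes its destination
        return entry["result"]


rb = SndReuseBuffer()
rb.execute("A", "r1", ["r2"], 8)    # A: r1 <- r2 + 3, assuming r2 = 5
rb.execute("B", "r3", ["r1"], 12)   # B: r3 <- r1 + 4, linked to A
rb.overwrite("r1")                  # another instruction overwrites r1
early = rb.try_reuse("B")           # A is not the latest producer of r1 yet
a = rb.try_reuse("A")               # chain input r2 untouched: A reused
b = rb.try_reuse("B")               # A is again the latest producer: B reused
rb.overwrite("r2")                  # chain input overwritten
stale = (rb.try_reuse("A"), rb.try_reuse("B"))
```

In this model B survives the write to r1 because its operand is recorded as "produced by A" rather than "register r1", and the whole chain is dropped only when r2 changes.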
Scheme Sn+d (cont’d)
Time T1. Dynamic instructions: A: r1 ← r2 + 3, B: r3 ← r1 + 4. RB contents (entry, operand name): A r2, B A.
Time T2. Dynamic instruction: R: r1 ← 4. RB contents: A r2, B A (B not invalidated).
Time T3. Dynamic instructions: A: r1 ← r2 + 3, B: r3 ← r1 + 4 (reused). RB contents: A r2, B A.
Dependent chain reused, possibly in the same cycle.
Experimental Evaluation
• OOO execution: window of 32 inst.: 4-way superscalar
• BTB: 2048 entries with 2-bit counters
• I-cache: 16K direct mapped, 32 byte line
• D-cache: 16K 2-way set assoc., 32 byte line
• Reuse Buffer
- Size: 32, 128 and 1024 entries
- Fully assoc. and 4-way set assoc. with FIFO replacement
- 4 reads, 4 writes and 4 invalidations per cycle
• Benchmarks: Spec95 Int, Spec92 Int and others
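The set-associative, FIFO-replaced organisation listed above can be sketched as follows. The class `SetAssocFifo`, the index function (PC shifted right by two bits, assuming 4-byte instructions) and the default sizes are assumptions for illustration only; port counts and the invalidation path are not modelled.

```python
from collections import OrderedDict


class SetAssocFifo:
    """PC-indexed buffer: num_entries in total, `ways` per set, FIFO eviction."""

    def __init__(self, num_entries=128, ways=4):
        self.ways = ways
        self.num_sets = num_entries // ways
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def _set_for(self, pc):
        return self.sets[(pc >> 2) % self.num_sets]  # assumes 4-byte instructions

    def insert(self, pc, payload):
        s = self._set_for(pc)
        if pc not in s and len(s) >= self.ways:
            s.popitem(last=False)  # evict the oldest inserted entry (FIFO)
        s[pc] = payload

    def get(self, pc):
        return self._set_for(pc).get(pc)  # a hit does not change eviction order


rb = SetAssocFifo(num_entries=8, ways=2)  # 4 sets of 2 entries
rb.insert(0, "a")
rb.insert(16, "b")      # PCs 0, 16 and 32 all map to set 0
first = rb.get(0)       # hit, but FIFO order is unaffected
rb.insert(32, "c")      # set full: the oldest entry (PC 0) is evicted
```

Unlike LRU, a recent hit on PC 0 does not protect it from eviction here.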
[Figure: Percentage Reuse (Sv), on a 0-100 scale, for 32, 128 and 1024 RB entries. Benchmarks: gcc, compress, eqntott, espresso, xlisp, yacr2, mpeg, go, m88ksim, perl, ijpeg, vortex.]
Significant reuse observed
[Figure: Percentage Speedup (Sv), on a 0-100 scale, for 32, 128 and 1024 RB entries. Benchmarks: gcc, compress, eqntott, espresso, xlisp, yacr2, mpeg, go, m88ksim, perl, ijpeg, vortex.]
Speedups significant too
Comparing Schemes
[Figure: harmonic mean, in percent (axis 0-40), versus RB entries (32, 128, 1024) for Scheme Sv, Scheme Sn and Scheme Sn+d.]
Summary
Instruction Reuse
• reduces critical path of the computation
• reduces contention for resources
• reduces mis-prediction penalty
Significant instruction reuse
• in some cases > 50% : typically ~ 20%
Speedups also significant
• in several cases > 20% : typically ~ 10%
Backup slides
Reuse per Inst. Category
• Reuse prevalent among all instruction categories
• About 15% from load values
• About 25-35% from address calculation
[Figure: percent (0-100) of reuse by instruction category, in three groups of 32, 128 and 1024 RB entries. Categories: integer : immediate, integer : one reg, integer : two reg, control, addr calc, loads. Annotation: 40-50% of Reuse.]
Mean Speedups
[Figure: harmonic mean, in percent (axis 0-40), versus RB entries (32, 128, 1024) for Scheme Sv, Scheme Sn and Scheme Sn+d.]
max spdup (%), one value per bar: 17 14 19 | 26 15 28 | 43 17 30
Squash Vs. General Reuse
• Recovering squashed work gives significant reuse.
• Squash reuse buys more, but general reuse is important too
[Figure: two panels, Reuse and Performance, each in percent (0-100), split into squash reuse and general reuse for gcc, compress, go, ijpeg and vortex. A = 32 entries, B = 128 entries, C = 1024 entries.]
Collapsing true dependences
[Figure: data dependence resolution latency, normalized (axis 0.2-1.0), per benchmark for 32, 128 and 1024 RB entries. Benchmarks: gcc, compress, eqntott, espresso, xlisp, yacr2, mpeg, go, m88ksim, perl, ijpeg, vortex.]
Related Work
• Harbison’s value cache (Tree machine)
• Richardson’s result cache
• Oberman and Flynn’s division and reciprocal caches
Key Difference
• They use address or operand values as index
- limits the usefulness to long latency operations
- cannot reuse dependent chain in the same cycle
Squash Reuse
[Figure: control flow graph with an incorrect path and a correct path; node labels B = A + 1, A = 0 and A = 2.]