1
J. Bradley Chen and Bradley D. D. Leupen
Division of Engineering and Applied Sciences
Harvard University
Improving Instruction Locality with Just-In-Time Code Layout
2
Goals
• Improve instruction reference locality
  – big problem for commodity applications
• Eliminate need for profile information
  – required by current compiler-based solutions
3
How?
Implement layout dynamically using Activation Order:
• A new heuristic for code layout.
• Locate procedures in order of use.
4
Requirements
• No special hardware support.
• Minimal changes to the operating system.
• Minimal system overhead.
5
Optimizing Procedure Layout
[Figure: two procedure layouts side by side, labeled "Bad Layout" and "Better Layout"]
6
Current Practice: Pettis and Hansen
• Nodes are procedures.
• Edges are caller/callee pairs.
• Weights are call frequency.
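[Figure: weighted call graph. Procedures: WinMain(), Initialize(), EventLoop(), GetEvent(), React(), HandleRareCase(), HandleInputError(), CheckForInputError(), HandleCommonCase(). Edge weights as they appear in the figure: 1, 1, 129394, 68754, 128404, 1, 10, 68753.]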
7
Pettis and Hansen Layout
[Figure: the call graph is collapsed one merge at a time; the layout list grows at each step.]
Start: EventLoop(), GetEvent(), React(), CheckForInputError(), HandleCommonCase(); edge weights 129394, 68754, 128404, 68753
  layout: []
Node-1: remaining EventLoop(), React(), HandleCommonCase(); edge weights 129394, 68754, 68753
  layout: [GetEvent, CheckForInputError]
Node-2: remaining React(), HandleCommonCase(); edge weights 68754, 68753
  layout: [EventLoop, GetEvent, CheckForInputError]
Node-3: remaining HandleCommonCase(); edge weight 68753
  layout: [React, EventLoop, GetEvent, CheckForInputError]
Node-4:
  layout: [HandleCommonCase, React, EventLoop, GetEvent, CheckForInputError]
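The merging idea can be sketched in C as a simplified greedy pass: visit edges from heaviest to lightest and concatenate the two chains each edge connects. This is only an illustration, not the authors' tool or the full Pettis and Hansen algorithm. It always appends one chain to the other and never chooses an orientation, so its output need not match the layout lists on the slide. The names `Edge`, `greedy_layout`, and `MAX_NODES` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NODES 16  /* hypothetical limit for this sketch */

/* One call-graph edge: caller/callee procedure ids and call frequency. */
typedef struct { int caller, callee; long weight; } Edge;

/* qsort comparator: heaviest edge first. */
static int by_weight_desc(const void *a, const void *b)
{
    long wa = ((const Edge *)a)->weight;
    long wb = ((const Edge *)b)->weight;
    return (wa < wb) - (wa > wb);
}

/*
 * Writes a layout of n procedures into layout[] and returns its length.
 * Every procedure starts as its own chain. Edges are visited from
 * heaviest to lightest, and the callee's chain is appended to the
 * caller's chain. Sorts edges[] in place.
 */
static int greedy_layout(Edge *edges, int num_edges, int n, int *layout)
{
    int chain_of[MAX_NODES];          /* chain id of each procedure */
    int chain[MAX_NODES][MAX_NODES];  /* members of each chain, in order */
    int len[MAX_NODES];               /* length of each chain */
    int out = 0;

    assert(n >= 0 && n <= MAX_NODES);
    for (int i = 0; i < n; i++) {
        chain_of[i] = i;
        chain[i][0] = i;
        len[i] = 1;
    }

    qsort(edges, (size_t)num_edges, sizeof *edges, by_weight_desc);
    for (int e = 0; e < num_edges; e++) {
        int a = chain_of[edges[e].caller];
        int b = chain_of[edges[e].callee];
        if (a == b)
            continue;  /* both ends are already in the same chain */
        for (int i = 0; i < len[b]; i++) {
            chain[a][len[a]++] = chain[b][i];
            chain_of[chain[b][i]] = a;
        }
        len[b] = 0;
    }

    /* Emit whatever chains remain, in chain-id order. */
    for (int c = 0; c < n; c++)
        for (int i = 0; i < len[c]; i++)
            layout[out++] = chain[c][i];
    return out;
}
```

Because the edge weights come from a profile run, this style of layout needs profile information before the program is laid out.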
8
A New Heuristic
Activation Order: Co-locate procedures that are activated sequentially.
Example:
Code:
    void main() {
        Initialize();
        Do some stuff;
        for (10000 iterations) {
            Do some stuff;
            foo();
            bar();
        }
    }
P&H layout: foo(), main(), bar(), Initialize()
AO layout: main(), Initialize(), foo(), bar()
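A minimal C sketch of the heuristic, assuming activations are reported by name as they happen: each procedure is appended to the layout the first time it is activated, and later activations change nothing. `ActivationOrder`, `ao_activate`, `ao_run_example`, and `MAX_PROCS` are hypothetical names, and the trace is a hand-written replay of the example program rather than output from a real tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROCS 64  /* hypothetical limit for this sketch */

/* Procedures in the order they were first activated. */
typedef struct {
    const char *order[MAX_PROCS];
    size_t count;
} ActivationOrder;

/* Record one activation. Returns 1 if the procedure was new, 0 if seen before. */
static int ao_activate(ActivationOrder *ao, const char *proc)
{
    for (size_t i = 0; i < ao->count; i++)
        if (strcmp(ao->order[i], proc) == 0)
            return 0;
    assert(ao->count < MAX_PROCS);
    ao->order[ao->count++] = proc;
    return 1;
}

/* Replay the example: main, Initialize, then 10000 rounds of foo and bar. */
static void ao_run_example(ActivationOrder *ao)
{
    ao->count = 0;
    ao_activate(ao, "main");
    ao_activate(ao, "Initialize");
    for (int i = 0; i < 10000; i++) {
        ao_activate(ao, "foo");
        ao_activate(ao, "bar");
    }
}
```

After `ao_run_example`, the recorded order is main, Initialize, foo, bar, which is the AO layout in the example. No call counts are needed, only the order of first use.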
9
Implementing JITCL
    __start:
        perform initializations
        call thunk_main
    thunk_main:
        . . .
    thunk_foo:
        . . .
    __InstructionMemory:
Thunk routines implement code layout on-the-fly.
10
Thunk routines
// Global variables:
//   ProcPointers[] - one element per procedure
//   INDEX_proc and LENGTH_proc for each procedure
thunk_main:
    // Copy the procedure only if it has not been moved to the text segment yet.
    if (InCodeSegment(ProcPointers[INDEX_main]))
        ProcPointers[INDEX_main] =
            CopyToTextSegment(ProcPointers[INDEX_main], LENGTH_main);
    // Redirect the calling site so later calls bypass the thunk.
    PatchCallSite(ProcPointers[INDEX_main],
                  ComputeCallSiteFromReturnAddress(RA));
    jmp ProcPointers[INDEX_main];
The thunk routines copy procedures into the text segment and update call sites at run-time.
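The control flow of a thunk can be imitated in portable C with a table of function pointers. This is a simulation under stated assumptions, not the JITCL implementation: nothing is actually copied, "placing" a procedure just records its index in a simulated text segment, and patching the table entry stands in for `PatchCallSite`. The names `place`, `text_layout`, `thunk_runs`, `body_*`, and `run_program` are hypothetical. The enum order is deliberately different from the activation order.

```c
#include <assert.h>
#include <stddef.h>

/* Indices follow an arbitrary link order, not the order of use. */
enum { INDEX_bar, INDEX_foo, INDEX_main, NUM_PROCS };

typedef void (*Proc)(void);

static void thunk_main(void);
static void thunk_foo(void);
static void thunk_bar(void);

/* One entry per procedure; every entry starts out pointing at its thunk. */
static Proc ProcPointers[NUM_PROCS] = {
    [INDEX_bar] = thunk_bar,
    [INDEX_foo] = thunk_foo,
    [INDEX_main] = thunk_main
};

/* Simulated text segment: procedure indices in placement order. */
static int text_layout[NUM_PROCS];
static size_t text_used = 0;
static int thunk_runs = 0;

/* Stand-in for CopyToTextSegment plus PatchCallSite. */
static void place(int index, Proc body)
{
    assert(text_used < NUM_PROCS);
    text_layout[text_used++] = index;  /* next free slot in order of use */
    ProcPointers[index] = body;        /* later calls skip the thunk */
    thunk_runs++;
}

static void body_foo(void) {}
static void body_bar(void) {}

static void body_main(void)
{
    for (int i = 0; i < 3; i++) {
        ProcPointers[INDEX_foo]();
        ProcPointers[INDEX_bar]();
    }
}

/* Each thunk places its procedure, then transfers control to it. */
static void thunk_main(void) { place(INDEX_main, body_main); ProcPointers[INDEX_main](); }
static void thunk_foo(void)  { place(INDEX_foo, body_foo);   ProcPointers[INDEX_foo](); }
static void thunk_bar(void)  { place(INDEX_bar, body_bar);   ProcPointers[INDEX_bar](); }

/* Plays the role of __start: enter the program through the thunk for main. */
static void run_program(void) { ProcPointers[INDEX_main](); }
```

In this sketch each thunk runs once, so the cost is paid only on first activation, and the simulated text segment ends up in activation order (main, foo, bar).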
11
Simulation Methodology
               UNIX/RISC      Win32/x86
Cache Size     8K             8K
Associativity  Direct-Mapped  2-Way
Simulation     ATOM           Etch
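To make the direct-mapped 8K configuration concrete, here is a toy instruction-cache model in C. It is not ATOM or Etch, and the 32-byte line size is an assumption, since the slides do not state one. `ICache`, `icache_fetch`, and `misses_for_layout` are hypothetical names. The sketch shows why layout matters: two procedures whose addresses differ by the cache size map to the same line and evict each other.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_BYTES 8192u  /* 8K, as in the methodology table */
#define LINE_BYTES  32u    /* assumption: line size is not given in the slides */
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

typedef struct {
    uint32_t tag[NUM_LINES];
    int valid[NUM_LINES];
    unsigned long accesses;
    unsigned long misses;
} ICache;

static void icache_reset(ICache *c)
{
    memset(c, 0, sizeof *c);
}

/* Simulate one instruction fetch at byte address addr. */
static void icache_fetch(ICache *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t set = line % NUM_LINES;  /* direct-mapped: one line per set */
    uint32_t tag = line / NUM_LINES;

    c->accesses++;
    if (!c->valid[set] || c->tag[set] != tag) {
        c->misses++;
        c->valid[set] = 1;
        c->tag[set] = tag;
    }
}

/* Alternate between two one-line procedures at addresses a and b. */
static unsigned long misses_for_layout(uint32_t a, uint32_t b, int iterations)
{
    ICache c;

    assert(iterations >= 0);
    icache_reset(&c);
    for (int i = 0; i < iterations; i++) {
        icache_fetch(&c, a);
        icache_fetch(&c, b);
    }
    return c.misses;
}
```

With 100 iterations, procedures at addresses 0 and 8192 miss on every fetch (200 misses), while procedures at addresses 0 and 32 miss only on their first fetch (2 misses).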
12
Workloads
Benchmark     Description                     Text Size
UNIX
  compress    file compression                112
  Gcc         The GNU C compiler              1552
  m88ksim     Simulation of Motorola 88K      160
  Perl        The perl scripting language     376
  Raytrace    Image rendering                 192
  Xanim       MPEG player                     2024
Win32
  Mazelord    Maze game                       1445
  Perfmon     Windows NT system utility       2805
  Wordpro 96  Lotus document preparation      5148
  Word 7      Microsoft document preparation  7694
  IE302       Microsoft web browser product   4990
13
Results
• The AO heuristic is effective.
• The overhead of JITCL is negligible.
• JITCL improves procedure layout without requiring profile information.
• JITCL reduces program memory requirements.
14
Results: The AO Heuristic
[Figure: bar chart of improvement in I-cache miss rate (axis 0 to 0.04) for compress, gcc, m88ksim, perl, raytrace, and xanim. Series: Pettis & Hansen, Activation Order.]
Conclusion: Effectiveness of heuristic is comparable to P&H.
15
Overhead of JITCL
• Copy overhead
  – instruction overhead
  – cache overhead
• Cache consistency
• Disk overhead - comparable to demand loaded text; not evaluated.
16
Results: Overhead
[Figure: bar chart of overhead instructions (%) (axis 0 to 0.1) for compress, gcc, m88ksim, perl, raytrace, and xanim.]
Conclusion: JITCL Overhead is less than 0.1% in all cases.
17
Results: Performance
[Figure: bar chart of saved cycles per instruction (axis 0 to 1.2) for compress, gcc, m88ksim, perl, raytrace, and xanim. Series: Pettis & Hansen, JITCL.]
Conclusion: Overall performance is comparable to P&H.
18
JITCL for Win32 Applications
• Windows applications are composed of multiple executable modules.
• When transitions between modules are frequent, intra-module code layout is less effective.
• With JITCL, inter-module code layout is possible and beneficial.
19
Win32 Cache Miss Rates
[Figure: bar chart of L1 cache miss rate (axis 0 to 0.06) for mazelord, perfmon, wordpro96, word 7, and ie302. Series: default, P&H, JITCL.]
Conclusion: Careful layout did not help Win32 applications.
20
Text Segment Size
[Figure: bar chart of text size in megabytes (axis 0 to 3500) for mazelord, perfmon, wordpro96, word 7, and ie302. Series: default, JITCL.]
Conclusion: JITCL typically reduces text size by 50%.
21
JITCL vs. PBO
• JITCL provides an alternative to feedback-based procedure layout.
• Many important optimizations still require profile information.
  – instruction scheduling
  – register allocation
  – other intra-procedural optimizations
• Don’t expect profile-based optimization to go away!
22
Conclusions
Just-In-Time code layout achieves comparable benefit to profile-based code layout without the need for profiles.
• The AO heuristic is effective.
• The overhead of procedure copying is low.
• Benefit in I-Cache is comparable to Pettis and Hansen layout.
• JITCL can reduce working set size.
23
The Morph Project
For more information: http://www.eecs.harvard.edu/morph/