ABACUS: A Hardware-Based Software Profiler for Modern Processors
Eric Matthews • Lesley Shannon, School of Engineering Science
Sergey Blagodurov • Sergey Zhuravlev • Alexandra Fedorova, School of Computing Science
Simon Fraser University, Vancouver, BC, Canada
Overview
Legendary Introduction to ABACUS
Delicious Profiling Units
Epic Conclusion
Introduction to ABACUS
Introduction to ABACUS
Introduction to ABACUS
Introduction to ABACUS
ABACUS
ABACUS
ASPLOS rocks!
ABACUS
Performance comparison
Memory Reuse Profile
ABACUS avg runtime: 48.5 seconds
Simics avg runtime: 1 hour 6 minutes
[Figure: memory reuse profile counts (in millions) for namd and hmmer by category (miss, Reuse 0, Reuse 1), ABACUS vs. Simics]
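Converting both reported averages to seconds gives the relative speedup. This is plain arithmetic on the two numbers above; the ratio itself is not stated on the slide:

```latex
\[
\text{speedup} = \frac{t_{\text{Simics}}}{t_{\text{ABACUS}}}
= \frac{1\,\text{h}\;6\,\text{min}}{48.5\,\text{s}}
= \frac{3960\,\text{s}}{48.5\,\text{s}} \approx 81.6
\]
```

On average, then, the ABACUS runs finished roughly 80 times faster than the Simics runs.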
Conclusion
ABACUS is a generic profiler that can be easily integrated into modern processors
It can be used by the O/S to obtain runtime information about a thread’s behaviour to make better thread assignments
Thank you! Questions?
Motivation
Future systems will be multi-core and heterogeneous
How does the OS place threads on this architecture?
Characterize thread behaviour
Instruction Mix
Memory Reuse Profile
Effectiveness of pre-fetching
Memory bandwidth utilization
Motivation (cont'd)
How are these metrics collected?
Offline analysis
Code Instrumentation
Simulation (e.g., Simics)
Software-based instruction set simulator
Models systems with full OS support
Motivation (cont'd)
Why not use current hardware counters?
Architecture-specific
Not all desired metrics provided
Help detect symptoms, not causes
Limited in number and in concurrent use
Goal
Create a hardware profiler to collect thread characteristics at runtime
Imposed constraints
External to processor
Minimally invasive
Cycle accurate
OS controllable
ABACUS
hArdware-Based Analyzer for the Characterization of User Software
A collection of runtime configurable profiling units
Collects metrics useful for thread placement
Controllable through the O/S
Hardware Platform
Proof-of-concept System
LEON3 Sparc v8 Instruction Set Architecture
Single core, single threaded
Test System
OpenSparc Niagara T1 soft processor
1 to 4 hardware threads
Multi-core, multi-board support
Hardware Platform (cont'd)
ABACUS
External Interface
Bus slave and master modules
Processing required on processor signals
Designed such that only external interface changes with different processor/system
Portability
Previously integrated with a LEON3 (SPARC v8 ISA) based system
Differences:
AMBA Advanced High-performance Bus (AHB) vs Processor Local Bus (PLB)
Processor internals
Controller
Starts or stops profiling
Can limit profiling to a specific address range
DMA interface for retrieving collected data
Linux device driver support
Profiling Units
Operate on one or more processor signals:
Instruction
PC
Cache Reuse Distance
etc.
Store data in a collection of counters
Profiling Units (cont'd)
Focus on two-dimensional metrics
– Gives bigger picture / greater insight
Aim to be as architecture independent as possible
Profile Unit
Behaves like a traditional software profiler
Operates on Program Counter
[Figure: Profile Unit modes over the code space: Range Overlap, Range Non-Overlap, Trace]
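The sketch below shows one plausible reading of the Range Overlap and Range Non-Overlap labels. Each executed PC is compared against a set of OS-configured address ranges. In overlap mode every matching range is incremented; in non-overlap mode only the first match is. This is an illustrative software model under that assumption. The function name `profile_pcs` and the example ranges are hypothetical, and the Trace mode is not modelled. In hardware the range comparisons would run in parallel each cycle rather than in a loop.

```python
from typing import List, Tuple

# A hypothetical address range: (start inclusive, end exclusive).
Range = Tuple[int, int]

def profile_pcs(pcs: List[int], ranges: List[Range], overlap: bool) -> List[int]:
    """Count executed PCs per address range.

    overlap=True:  a PC increments every range that contains it.
    overlap=False: a PC increments only the first range that contains it.
    """
    counts = [0] * len(ranges)
    for pc in pcs:
        for i, (start, end) in enumerate(ranges):
            if start <= pc < end:
                counts[i] += 1
                if not overlap:
                    break  # non-overlapping mode: stop at the first match
    return counts

# Usage: two hypothetical ranges that share 0x1800-0x1FFF.
ranges = [(0x1000, 0x2000), (0x1800, 0x3000)]
pcs = [0x1000, 0x1900, 0x2500, 0x4000]
print(profile_pcs(pcs, ranges, overlap=True))
print(profile_pcs(pcs, ranges, overlap=False))
```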
Memory Reuse Unit
Collects a measure of code or data reuse
Utilizes Least Recently Used (LRU) stack
Reuse distance is movement in the LRU stack or a miss
Useful for cache contention management
Memory Reuse Unit
Creates histogram of cache reuse pattern
Range: [0, set associativity – 1] or cache miss
[Figure: 4-way set-associative reuse profile; x-axis: Reuse Distance]
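The per-set LRU stack behind this histogram can be modelled in a few lines. The sketch assumes byte addresses and a conventional set-indexed cache, where the set index is the line address modulo the number of sets. The function `reuse_profile` and its parameters are hypothetical. Each access records one of two outcomes, which matches the range on the slide:

- its position in its set's LRU stack before the access (0 = most recently used, up to associativity - 1), or
- a miss.

```python
from collections import Counter
from typing import Dict, List

def reuse_profile(addresses: List[int], num_sets: int, ways: int,
                  line_bytes: int) -> Counter:
    """Histogram of LRU stack positions (0..ways-1) plus "miss" entries."""
    stacks: Dict[int, List[int]] = {}  # set index -> tags, most recently used first
    hist: Counter = Counter()
    for addr in addresses:
        line = addr // line_bytes
        set_index = line % num_sets
        tag = line // num_sets
        stack = stacks.setdefault(set_index, [])
        if tag in stack:
            distance = stack.index(tag)  # movement in the LRU stack
            hist[distance] += 1
            stack.pop(distance)
        else:
            hist["miss"] += 1
            if len(stack) == ways:
                stack.pop()  # evict the least recently used line
        stack.insert(0, tag)  # the accessed line becomes most recently used
    return hist

# Usage: a single 2-way set with 16-byte lines; lines A=0, B=16, C=32.
print(dict(reuse_profile([0, 16, 0, 32, 16], num_sets=1, ways=2, line_bytes=16)))
```

Here the second access to A is a hit at distance 1. The final access to B misses because C evicted it.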
Instruction Mix
Identify current instruction subset in use
Divide instructions into logical categories
Load/Store
Floating Point
Control Flow
Opcode-based table lookup
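An opcode-based table lookup can be sketched for the SPARC v8 encoding used by the LEON3 proof-of-concept system. The decode rules follow the SPARC v8 instruction formats:

- `op` in bits 31:30
- `op2` in bits 24:22
- `op3` in bits 24:19

The following are assumptions for illustration, not the actual ABACUS table:

- the category labels and the extra "other" bucket;
- counting floating-point loads and stores as Load/Store;
- counting floating-point branches as Control Flow.

```python
from collections import Counter
from typing import Iterable

# Hypothetical category labels; "other" collects integer ALU ops, SETHI, etc.
LOAD_STORE, FLOATING_POINT, CONTROL_FLOW, OTHER = "load/store", "fp", "control", "other"

# Lookup tables indexed by SPARC v8 opcode fields.
OP_TABLE = {1: CONTROL_FLOW, 3: LOAD_STORE}                      # op=1: CALL, op=3: memory
OP2_TABLE = {2: CONTROL_FLOW, 6: CONTROL_FLOW, 7: CONTROL_FLOW}  # Bicc, FBfcc, CBccc
OP3_TABLE = {0x34: FLOATING_POINT, 0x35: FLOATING_POINT,         # FPop1, FPop2
             0x38: CONTROL_FLOW, 0x39: CONTROL_FLOW,             # JMPL, RETT
             0x3A: CONTROL_FLOW}                                 # Ticc

def classify(word: int) -> str:
    """Map one 32-bit SPARC v8 instruction word to a category."""
    op = (word >> 30) & 0x3
    if op in OP_TABLE:
        return OP_TABLE[op]
    if op == 0:  # format 2: SETHI and branches
        return OP2_TABLE.get((word >> 22) & 0x7, OTHER)
    return OP3_TABLE.get((word >> 19) & 0x3F, OTHER)  # op == 2: ALU and misc

def instruction_mix(words: Iterable[int]) -> Counter:
    """Count instructions per category."""
    return Counter(classify(w) for w in words)

# Usage with a few hand-encoded instructions.
sample = [0x40000000,  # call
          0xC4004000,  # ld [%g1], %g2
          0x89A00842,  # faddd %f0, %f2, %f4
          0x86004002,  # add %g1, %g2, %g3
          0x12800000,  # bne
          0x81C3E008]  # retl (jmpl %o7+8, %g0)
print(instruction_mix(sample))
```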
Latency Unit
Break down miss latency into constituent sources
Bus contention
DRAM latency
etc.
For each category create a histogram of latency in cycles
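The slides do not say how ABACUS splits a single miss's latency among its sources. This sketch therefore assumes the external interface reports (source, cycles) pairs, and keeps one fixed-width histogram per source. The class name, source labels, bin width and bin count are all hypothetical. Fixed-width bins with a saturating last bin are one simple design choice, because each bin maps directly to a single counter.

```python
from typing import Dict, List

class LatencyHistograms:
    """One fixed-width latency histogram (in cycles) per miss-latency source."""

    def __init__(self, sources: List[str], bin_width: int, num_bins: int) -> None:
        if bin_width <= 0 or num_bins <= 0:
            raise ValueError("bin_width and num_bins must be positive")
        self.bin_width = bin_width
        self.num_bins = num_bins
        self.bins: Dict[str, List[int]] = {s: [0] * num_bins for s in sources}

    def record(self, source: str, cycles: int) -> None:
        # Latencies past the last bin saturate into it (an overflow bucket).
        index = min(cycles // self.bin_width, self.num_bins - 1)
        self.bins[source][index] += 1

# Usage with hypothetical sources and 8-cycle bins.
hist = LatencyHistograms(["bus_contention", "dram"], bin_width=8, num_bins=4)
for source, cycles in [("bus_contention", 3), ("bus_contention", 12),
                       ("dram", 20), ("dram", 40)]:
    hist.record(source, cycles)
print(hist.bins)
```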
Stall Unit
Break down Cycles Per Instruction
Attribute cycles to their sources
Cache miss
Translation Lookaside Buffer (TLB) miss
Floating Point busy stalls
etc.
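Attributing cycles to their sources amounts to building a CPI stack. The sketch makes two assumptions:

- every stall cycle is charged to exactly one source;
- the remaining cycles form a "base" component.

The function name and source labels are hypothetical. The counter values in the usage line are illustrative inputs, not ABACUS measurements.

```python
from typing import Dict

def cpi_stack(instructions: int, total_cycles: int,
              stall_cycles: Dict[str, int]) -> Dict[str, float]:
    """Split CPI = total_cycles / instructions into per-source components."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    stalls = sum(stall_cycles.values())
    if stalls > total_cycles:
        raise ValueError("stall cycles exceed total cycles")
    stack = {source: cycles / instructions for source, cycles in stall_cycles.items()}
    stack["base"] = (total_cycles - stalls) / instructions  # non-stall cycles
    return stack

# Usage with illustrative counter values.
stack = cpi_stack(1000, 2500, {"cache_miss": 900, "tlb_miss": 300, "fp_busy": 100})
print(stack, sum(stack.values()))
```

The components always sum to the overall CPI (2.5 in this example). This makes it easy to see which source dominates.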
Verification
Run a subset of the SPEC CPU2006 benchmarks
Only those whose memory usage fits within the board's specifications
Collect metrics with ABACUS and Simics
Profile for a few billion instructions
Limited by Simics performance
Test Platform
Proof-of-concept System
Single core, single threaded
XUP V2Pro: 90% slice utilization
Processor: LEON3 (SPARC v8 ISA), 50 MHz
Memory: 256 MB DDR RAM
OS: Debian Etch (4.0)
Simulation Platform
Simics System:
Processor: UltraSPARC II (SPARC v9 ISA)
Memory: 256 MB DDR RAM
OS: Debian Etch (4.0)
Differences:
SPARC v9 ISA (64-bit processor)
Local filesystem vs NFS
LEON3 Comparison
[Figure: memory reuse profile counts (in millions) for namd and hmmer by category (miss, Reuse 0, Reuse 1), ABACUS vs. Simics]
LEON3 Comparison (cont'd)
[Figure: DC Memory Reuse Profile, counts (in millions) for namd and hmmer by category (miss, Reuse 0, Reuse 1), ABACUS vs. Simics]
Resource Usage
Default configuration: 2-way LRU instruction cache, 2-way LRU data cache, 5 instruction types
[Figure: resource usage (axis 0 to 1600) in LUTs (V2P), LUTs (V5) and FFs for three builds: 32-bit counters, 40-bit counters, and 32-bit counters with the Profile Unit added]
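The arithmetic below gives a sense of scale between the 32-bit and 40-bit counter options. It assumes a counter that increments once per cycle at the 50 MHz clock of the LEON3 test system. The slide does not state why 40-bit counters were evaluated; this only shows how long each width lasts before wrapping under that assumption:

```latex
\[
t_{32} = \frac{2^{32}}{50 \times 10^{6}\,\text{Hz}} \approx 85.9\,\text{s},
\qquad
t_{40} = \frac{2^{40}}{50 \times 10^{6}\,\text{Hz}} \approx 21\,990\,\text{s} \approx 6.1\,\text{h}
\]
```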
Conclusion
ABACUS is a generic profiler that can be easily integrated into modern processors
It can be used by the O/S to obtain runtime information about a thread’s behaviour to make better thread assignments
Future Plans
Move to multi-core/multi-threaded system
Memory reuse distance independent of existing cache implementation
Process tracking
Integrate results into OS scheduler
Questions?