Treegion Instruction Scheduling in...
Transcript of Treegion Instruction Scheduling in...
![Page 1: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/1.jpg)
TINKER Microarchitecture and Compiler Research GroupDepartment of Electrical & Computer EngineeringNorth Carolina State University
Chad RosierGCC Developers’ Summit
June 30th, 2006
Treegion Instruction Scheduling Treegion Instruction Scheduling in GCCin GCC
![Page 2: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/2.jpg)
2
Presentation Outline
• Introduction• Treegion formation• Code size efficiency• Implementation• Summary• Future work
![Page 3: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/3.jpg)
3
Region Formation Techniques
• Trace Scheduling [Fisher and Ellis]– Linear regions, based on profile, optimizes likely path
• Superblock Scheduling [Chang, et al.]– Apply tail duplication to remove side entrances– Robert Kidd working at Tree SSA level
• Hyperblock Scheduling [Mahlke et al.]– Add hardware predication
• Regions are linear in nature
![Page 4: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/4.jpg)
4
Treegion Scheduling [Havanki et al.]
• Tree-shaped subgraph of CFG– single-entry, multiple-exit– acyclic, non-linear
• Multiple paths considered for speculating instructions
• Formation based on CFG• Profile used for scheduling• Additional notable characteristics
– Calculating domination– No merge points
Treegion 0
Treegion 1
Treegion 2
![Page 5: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/5.jpg)
5
Treegion Scheduling
• Treegion formation (basis of this talk)– Natural Treegion formation
• Treegion formation without tail duplication
– Region enlargement optimizations• Tail duplication
• Treegion-based instruction scheduling• Currently using existing Haifa scheduler
![Page 6: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/6.jpg)
6
![Page 7: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/7.jpg)
7
![Page 8: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/8.jpg)
8
![Page 9: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/9.jpg)
9
![Page 10: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/10.jpg)
10
![Page 11: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/11.jpg)
11
BB0
BB1 BB2
BB3 BB4
BB5
BB7BB6
BB8
Treegion 0
Treegion 1
Treegion 2
![Page 12: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/12.jpg)
12
Tail Duplication Example
BB0
BB1 BB2
BB3 BB4
BB5
BB7BB6
BB8
Treegion 0
Treegion 1
Treegion 2
![Page 13: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/13.jpg)
13
Tail Duplication Example
BB0
BB1 BB2
BB3 BB4
BB5
BB7BB6
BB8
Treegion 0
Treegion 1
Treegion 2
![Page 14: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/14.jpg)
14
Tail Duplication Example
BB0
BB1/BB5' BB2
BB3 BB4
BB5
BB7BB6
BB8
Treegion 0
Treegion 1
Treegion 2
BB7'BB6'
![Page 15: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/15.jpg)
15
Region Enlargement Optimizations
• Instruction level parallelism (ILP) vs. static code size– Region enlarging optimizations usually enhance ILP
• Cyclic scheduling: loop unrolling, loop peeling, etc.• Acyclic scheduling: tail duplication, recovery code, etc.
• I-cache and ITLB performance vs. static code size– Larger code usually means larger I-Cache footprint
• Trade off of the conflicting effects of code size increase– Especially in acyclic global scheduling
![Page 16: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/16.jpg)
16
Code Size Efficiency
• Use the ratio of IPC changes over code size changes as an indication of code size efficiency.– Average code size efficiency
– Instantaneous code size efficiency
treegionnaturalcandidate
treegionnaturalcandidateaverage sizecodesizecode
IPCIPCEfficiency
_
_
__ −
−=
napplicatioindividualbeforenapplicatioindividualafter
napplicatioindividualbeforenapplicatioindividualafterinst sizecodesizecode
IPCIPCEfficiency
____
____
__ −
−=
![Page 17: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/17.jpg)
17
Instantaneous Code Size Efficiency
Code Size
Static IPC
A1
A2
A3A4
04
04
__ AA
AAaverage SizeCodeSizeCode
IPCIPCEff−−
=
A0
23
23
__)2(
AA
AAinst SizeCodeSizeCode
IPCIPCAEff−−
=
![Page 18: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/18.jpg)
18
Estimate Static IPC Before Scheduling• Use the expected execution time to calculate the
static IPC– For a multi-path region:
• Now, IPC changes can be calculated as execution time saved by the optimization.
( )[ ]∑ ∗=ipath
ipathipathipathExpected FreqboundresourcebounddependencedataMaxTimeExe_
___ _,___
tree1
tree2
Tree1’
)2(_)'1(_)2(_)1(_)(
treesizeCodeTreetimeExetreetimeExetreetimeExeAEffinst
−+=
Example:
![Page 19: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/19.jpg)
19
Motivating Results from LEGO129.compress
2
2.2
2.4
2.6
2.8
3
0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6Relative code size
Stat
ic IP
C
best ILP for a given code size
0%
2%5%
80%30%
![Page 20: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/20.jpg)
20
Motivating Results from LEGO147.vortex
2.5
2.7
2.9
3.1
3.3
3.5
0.8 1 1.2 1.4 1.6 1.8 2 2.2Relative code size
Stat
ic IP
C
best ILP for a given code size
0%
2%5%
80%30%
Reason: only a very small part of the program is frequently executed.
![Page 21: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/21.jpg)
21
Optimal Code Size Efficiency• Point where the ‘diminishing returns’ start• Range between tan(π/12) (.268) and
tan(π /6) (.577)
Code Size
IPC
A
l
![Page 22: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/22.jpg)
22
Single Entry Acyclic Regions (SEARs)
• Prompted by recent extend region code• Form larger regions by merging treegions
– IFF predecessor edges entering the child treegion are exit edges from parent region
• Prevents side entrances (and need for compensation code)
• We now have merge points in the region so it is no longer a Treegion
• SEARs formation happens after tail duplication, but ICSE heuristic still applicable
![Page 23: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/23.jpg)
23
![Page 24: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/24.jpg)
24
![Page 25: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/25.jpg)
25
BB0
BB1 BB2
BB3 BB4
BB5
BB7BB6
BB8
SEAR 0
![Page 26: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/26.jpg)
26
Tree Traversal Scheduling• Final step is to (Haifa) schedule instructions• Other region formation techniques optimize likely
path while off path suffers• Prioritize likely path (based on exec frequency)
• TTS Algorithm:1. Sort basic blocks in depth-first traversal order2. List schedule beginning at root block3. Consider speculation of instructions dominated by current block4. Repeat step 3 until all blocks scheduled
![Page 27: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/27.jpg)
27
Implementation Details
• Serves as front end for Haifa scheduler– Scheduling pass prior to RA
• Primary modifications in sched-rgn.c
• ICSE - data dependence code in sched-dep.c
• Tail duplication – duplicationcode available in cfglayout.c
Front EndParserGenericGimple
TreesTree
GenerationTree
Optimization
RTL
Schedule 1Register Alloc
Schedule 2
BackendSelect
EBB (IA64)
Gimple
Tree-SSA
RTL
ASM
RTL Gen/Opt
![Page 28: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/28.jpg)
28
Compile Time Considerations
• Efficient updating of region structures• Calculating ICSE is expensive
– Requires building DDG to find dependence bound
• Compile time considerations:– Limit duplication based on:
• Number of blocks/instructions• Code expansion • ICSE threshold
![Page 29: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/29.jpg)
29
Region Formation Statistics
• Significant increase in region size• Increase % blocks contained within multi-block region• Limited additional duplication using ICSE• Without code size consideration region size may become
extremely large
![Page 30: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/30.jpg)
30
Speedup
![Page 31: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/31.jpg)
31
Operation Combining
• Tail duplication increases code size• General operation combining reduces code size
![Page 32: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/32.jpg)
32
Automata Based Pipeline Description
• Integral part of achieving speedup is knowing what the hardware can do
• Only need to be described once per processor• The compiler can have a generic coding• Solution must not be memory intensive• This method is used in GCC
– But the DFA representation of GCC is very complex.
![Page 33: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/33.jpg)
33
HMDES Machine Description Language
• Created by John Gyllenhaal in UIUC
• Provides a flexible way to model the pipeline for a variety of processors
• User-friendly yet Powerful• HMDES is converted to
LMDES which is then interfaced with the IMPACT-compiler
![Page 34: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/34.jpg)
34
GCC Machine Description Example
State Diagram of a Integer & FP Unit
Pipeline
START
DECODE
FP
FP
Next Cycle
INT
INT
Next Cycle Next Cycle
GCC-DFA representation of
this State Diagram
![Page 35: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/35.jpg)
35
MDES Machine Description Example
State Diagram of a Integer & FP Unit
Pipeline
START
DECODE
FP
FP
Next Cycle
INT
INT
Next Cycle Next Cycle
MDES representation of
this State Diagram
![Page 36: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/36.jpg)
36
Conclusions
• Natural Treegion formation provides significantly larger regions (SEARs extend upon Treegions)
• Application of ICSE to tail duplication – (has the potential for) good tradeoff between code size, compile time, and performance
• What’s missing?
![Page 37: Treegion Instruction Scheduling in GCCprod.tinker.cc.gatech.edu/symposia/GCC_Summit_Treegion_Slides.pdf · GCC Developers’Summit June 30th, 2006 Treegion Instruction Scheduling](https://reader035.fdocuments.us/reader035/viewer/2022071016/5fcfece83daa6509320b74e3/html5/thumbnails/37.jpg)
TINKER Microarchitecture and Compiler Research GroupNorth Carolina State Universityhttp://www.tinker.ncsu.edu
Chad Rosier [email protected] Conte [email protected] V. Iyer [email protected]
Questions & Contact InformationQuestions & Contact Information