B Compile

Section B Compilation Process

Transcript of B Compile

Page 1: B Compile

Section B Compilation Process

Page 2: B Compile

Optimization Control Philosophy

Low optimization levels:
• Shorter compile time
• Safer optimizations
• Greater computational accuracy
• Slower generated code

High optimization levels:
• Longer compile time
• More aggressive optimizations
• Precision compromised for higher performance
• Faster generated code

Fine control over optimization via a multitude of options
PathOpt2 can be a lot of help

Page 3: B Compile

Optimization Flags vs. Phases Invoked

-O0 (the default under -g): front-end and code generator, all optimizations disabled
-O1: front-end and code generator, local optimizations only
-O2 (the default): adds WOPT and the rest of CG's optimizations
-O3: adds LNO
-ipa (can be combined with any opt level): adds IPA
-Ofast: same as -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math
-OPT:Ofast is: -OPT:ro=2:Olimit=0:div_split=ON:alias=typed
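The table reads as a cumulative mapping from flags to phases. The Python sketch below models that reading. `expand_ofast` and `phases_for` are hypothetical names, and the phase list is in rough pipeline order. The sketch also assumes that the last explicit -O flag wins. The real open64/driver logic handles many more cases; for example, it does not track which CG optimizations -O0 and -O1 switch off.

```python
# Hypothetical model of the flag-to-phase table; not the Open64 driver's code.
OFAST_EXPANSION = ["-O3", "-ipa", "-OPT:Ofast", "-fno-math-errno", "-ffast-math"]
# -OPT:Ofast itself stands for -OPT:ro=2:Olimit=0:div_split=ON:alias=typed


def expand_ofast(flags):
    """Replace every -Ofast with the flags it is documented to imply."""
    out = []
    for f in flags:
        out.extend(OFAST_EXPANSION if f == "-Ofast" else [f])
    return out


def phases_for(flags):
    """Return (opt_level, phases) for a list of command-line flags."""
    flags = expand_ofast(flags)
    level = 0 if "-g" in flags else 2  # -O0 is the default under -g, else -O2
    for f in flags:
        if f in ("-O0", "-O1", "-O2", "-O3"):
            level = int(f[2])  # assumption: the last explicit -O flag wins
    phases = ["front-end"]
    if "-ipa" in flags:
        phases.append("IPA")  # independent of the -O level
    if level >= 3:
        phases.append("LNO")
    if level >= 2:
        phases.append("WOPT")
    phases.append("CG")  # always runs; -O0/-O1 only limit its optimizations
    return level, phases
```

For example, `phases_for(["-Ofast"])` yields level 3 with IPA, LNO, WOPT and CG enabled.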

Page 4: B Compile

Option Groups

Options organized into groups by compiler phase or by class of feature
General syntax: -GROUPNAME:opt[=val]{:opt[=val]}
Some GNU-style flags map to these options: -march, -ffast-math, -ffloat-store, -fno-inline

Group names:
-LNO: Loop nest optimization
-WOPT: Global scalar optimization
-CG: Code generation
-LANG: Language features
-IPA: Inter-procedural analysis
-INLINE: Back-end inlining
-TENV: Target environment
-TARG: Target machine
-OPT: Optimizations
-LIST: User listing
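The general syntax is easy to parse mechanically. The sketch below splits a group option into its group name and an option dictionary. The function name is hypothetical. The group list comes from this slide, and validation of option names and values is deliberately left out.

```python
# Minimal parser for the -GROUPNAME:opt[=val]{:opt[=val]} syntax (a sketch).
GROUPS = {"LNO", "WOPT", "CG", "LANG", "IPA", "INLINE", "TENV", "TARG", "OPT", "LIST"}


def parse_group_option(arg):
    """Split '-OPT:ro=2:Olimit=0' into ('OPT', {'ro': '2', 'Olimit': '0'}).

    Options written without '=val' map to None.
    """
    if not arg.startswith("-") or ":" not in arg:
        raise ValueError(f"not a group option: {arg!r}")
    group, *items = arg[1:].split(":")
    if group not in GROUPS:
        raise ValueError(f"unknown option group: {group!r}")
    opts = {}
    for item in items:
        name, sep, val = item.partition("=")
        opts[name] = val if sep else None
    return group, opts
```

Applied to the expansion of -OPT:Ofast above, it yields the four settings ro, Olimit, div_split and alias; `parse_group_option("-OPT:Ofast")` returns `("OPT", {"Ofast": None})`.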

Page 5: B Compile

Roles of the Compiler Driver

Implemented in open64/driver
Handles all command-line options
Invokes all compilation phases:
• Preprocessor
• Front-end
• Inliner
• Backend (be, lno, wopt, cg)
• Assembler
• Linker

Maintains compatibility with GNU options

Page 6: B Compile

Compiler Driver

open64/driver/OPTIONS: table of option specifications; can map an option to a different option

Single executable, multiple soft links: the argv[0] string identifies the language
Queries compiler-relevant environment variables
Queries the host processor under -march=auto (the default)
Looks up the compiler.defaults file for system-specific options
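Selecting the language through a soft link means the driver inspects only the basename of the name it was invoked as. The sketch below shows that dispatch. The link names (opencc, openCC, openf90, openf95) are an assumption about a typical Open64 install, and `language_from_argv0` is a hypothetical helper.

```python
import os

# Assumed soft-link names; the actual set depends on the installation.
LANG_BY_NAME = {
    "opencc": "C",
    "openCC": "C++",
    "openf90": "Fortran",
    "openf95": "Fortran",
}


def language_from_argv0(argv0):
    """Identify the source language from the name the driver was invoked as."""
    name = os.path.basename(argv0)  # ignore the directory part of argv[0]
    try:
        return LANG_BY_NAME[name]
    except KeyError:
        raise ValueError(f"unrecognized driver name: {name!r}") from None
```

A real driver would call this with `sys.argv[0]`. All links point at the same executable, so adding a language needs only a new link and a table entry.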

Page 7: B Compile

C/C++ Front-end History

GNU 2.95 when open-sourced (2000)
• Direct translation from GNU internal trees to WHIRL
• Separate C and C++ front-ends embedded inside Open64

Updated to GNU 3.3.1 (2004)

Defined .spin file format as virtual machine target (2006)
• GNU compiler no longer maintained as part of Open64
• Streamlined efforts for updating to each GNU release
• Duplicate code between C and C++ eliminated

GNU 4.0.2 front-ends shipped March 2007

GNU 4.2.0 front-ends shipped October 2007

Path to additional GNU languages in future

Page 8: B Compile

Using GNU Compiler as Front-end

Start with GNU compiler configured for X86-64

Old Way:

open64/kgccfe for C

open64/kg++fe for C++

Calls for WHIRL generation embedded in GNU code

C++ requires running the entire compilation through to assembly to produce complete translation data

Duplicate source trees between C and C++

Page 9: B Compile

Using GNU Compiler as Front-end

Start with GNU compiler configured for X86-64

New Way:

gspin tree nodes: components to model GNU trees
Utilities implemented in the libspin repository

gspin tree nodes dumped out to .spin file

Identify points to intercept GNU trees in gcc’s compilation

gspin tree nodes generated from GNU trees in gcc/tree.c

open64/wgen translates gspin tree nodes (.spin file) to WHIRL nodes (.B file)
wgen's mode of operation is modeled after kgccfe/kg++fe

Page 10: B Compile

Gspin Tree Nodes

Purpose: encode complete information in GNU trees for dumping to .spin file

8-byte gspin node as the atomic building block, defined in libspin/gspin-tree.h
• A gspin node represents a field of information in GNU's tree node
• An aggregate of contiguous gspin nodes represents a GNU tree node

GNU tree node representation scheme defined in libspin/gspin-tel.h

Allocation of gspin nodes managed by libspin
I/O of gspin nodes via mmap()
ASCII dumper

Each node only dumped once to avoid infinite recursion
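GNU trees are really graphs. For instance, in `struct s { struct s *next; }` the record type reaches itself through the pointer type. The standard way to dump such a graph is to give each node an index on its first visit and emit only that index on later visits. `Node` and `dump` below are hypothetical stand-ins that illustrate this technique; they are not libspin's ASCII dumper.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing for the memo
class Node:
    """Toy stand-in for a GNU tree node; kids may point back to ancestors."""
    code: str
    kids: list = field(default_factory=list)


def dump(root):
    """Dump every reachable node exactly once, numbered in preorder."""
    ids = {}    # node -> index assigned on first visit
    lines = []  # lines[i] describes the node with index i

    def visit(node):
        if node in ids:  # already dumped or in progress: just refer to it
            return ids[node]
        idx = ids[node] = len(ids)
        lines.append(None)  # reserve this node's slot before recursing
        kid_ids = [visit(k) for k in node.kids]
        lines[idx] = f"{idx}: {node.code} {kid_ids}"
        return idx

    visit(root)
    return lines
```

Registering the node in `ids` before visiting its kids is what breaks cycles. A kid that leads back to an ancestor finds the ancestor already numbered and recursion stops there.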

Page 11: B Compile

Example of gspin nodes

GNU’s PLUS tree node has 14 fields
• Root gspin node to encode PLUS
• 13 more gspin nodes for the rest of the fields

Root   tree_code = PLUS
 0     tree_code_class = BINARY
 1     tree_type
 2     tree_chain = NULL
 3     flags
 4     arity = 13
 5     file name
 6     line no
 7     operand 0
 8     operand 1
 9     unused
10     unused
11     unused
12     unused
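The layout above can be mimicked with fixed 8-byte slots. The sketch below packs a PLUS node into 14 contiguous little-endian 64-bit slots and reads them back. The numeric codes are arbitrary placeholders, not the values in libspin/gspin-tel.h. Modeling references (type, file name, operands) as integer indices of other gspin nodes is also an assumption of this sketch.

```python
import struct

# Arbitrary placeholder codes; the real encoding is defined in libspin.
PLUS_EXPR = 1
BINARY = 2
NULL = 0
UNUSED = 0

SLOT = struct.Struct("<q")  # one 8-byte gspin node (little-endian int64)


def encode_plus(tree_type, flags, file_name, line_no, op0, op1):
    """Lay out a PLUS node as a root slot followed by 13 field slots."""
    slots = [
        PLUS_EXPR,  # root: tree_code
        BINARY,     # 0: tree_code_class
        tree_type,  # 1: tree_type (index of another gspin node)
        NULL,       # 2: tree_chain
        flags,      # 3: flags
        13,         # 4: arity
        file_name,  # 5: file name (index of another gspin node)
        line_no,    # 6: line no
        op0,        # 7: operand 0
        op1,        # 8: operand 1
        UNUSED, UNUSED, UNUSED, UNUSED,  # 9-12: unused
    ]
    return b"".join(SLOT.pack(s) for s in slots)


def decode(buf):
    """Read a buffer back as a list of 8-byte slot values."""
    return [v for (v,) in SLOT.iter_unpack(buf)]
```

Because every slot has the same size, field k of a node sits at a fixed byte offset, 8 * (k + 1), from the root slot. That uniformity is what makes dumping via mmap() straightforward.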

Page 12: B Compile

FORTRAN Front-end

Originated from the Cray Fortran compiler

Front-end consists of:

Cray Fortran front-end (crayf90/fe90)

Adaptor for WHIRL generation (crayf90/sgi)

Multiple run-time library directories: libF77, libI77, libU77, libfi, libf, libu
libfortran.so contains all of these

Numerous bug fixes and enhancements at PathScale, e.g. TR15580 and TR15581

Page 13: B Compile

FORTRAN Front-end Implementation

Three sub-phases in fe90:

1. Lexer (src_input.c, lex.c) and Parser (p_*.c)

Program is represented in tree form

Tree nodes are entries in a variety of tables (sytb.*):
• scp_tbl for scopes
• SH_Tbl for statement headers
• global_name_tbl and name_tbl for Fortran identifiers
• AT_Tbl for attribute nodes for variables and procedures
• CN_Tbl for constant values
• IR_Tbl for operator nodes
• IL_Tbl for list-linking nodes
• More for array bounds, file names, etc.

Page 14: B Compile

FORTRAN Front-end Implementation

2. Semantic pass (s_*.c)

Operations that could not be performed on the fly during parsing

3. WHIRL generation

cvrt_to_pdg() and send*() (in icvrt.c) traverse trees and symbol tables
They call routines in crayf90/sgi to generate WHIRL

To debug:
• Build with -D_DEBUG
• Run mfef95 with -uall (dump all tables) or -uir2 (dump the most frequently useful tables)
• Routines in fe90/debug.c can be called when running mfef95 under a debugger

Page 15: B Compile

Goto Conversion

• Converts loops written with gotos into high-level loop forms, to be friendly to LNO

• Based on a paper by Ana Erosa and Laurie Hendren

• Originally applied once before LNO (be/com/opt_goto.cxx)

• GNU 4.2 front-end no longer generates high-level loop constructs

• Added a new phase to cater to VH WHIRL at the beginning of the backend (be/be/goto_conv.cxx)
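The core pattern is a label followed later by a conditional goto back to that label, which is exactly a do-while loop. The sketch below handles only that case on a hypothetical flat statement list. The Erosa and Hendren algorithm is far more general, since it also eliminates forward gotos and gotos that cross block boundaries. opt_goto.cxx and goto_conv.cxx work on WHIRL rather than on tuples like these.

```python
def goto_to_loops(stmts):
    """Turn backward conditional gotos into do-while loops.

    stmts is a flat list of tuples:
      ("label", name), ("stmt", text), ("ifgoto", cond, name)
    A backward ("ifgoto", c, L) whose label L is still at this nesting level
    becomes ("dowhile", c, body). Assumes the label has no other users, so
    it is dropped. Forward gotos are left untouched.
    """
    out = []
    for s in stmts:
        if s[0] == "ifgoto":
            _, cond, target = s
            for k in range(len(out) - 1, -1, -1):  # search backwards for the label
                if out[k] == ("label", target):
                    body = out[k + 1:]  # statements between label and goto
                    del out[k:]
                    out.append(("dowhile", cond, body))
                    break
            else:
                out.append(s)  # forward or unmatched goto: not handled here
        else:
            out.append(s)
    return out
```

Inner loops close before outer ones, so nested backward gotos become nested do-while loops in a single pass.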

Page 16: B Compile

Very High WHIRL Optimizer

Lower to High WHIRL while performing optimizations

First part deals with common language constructs (be/vho/vho_lower.cxx):

• Bit-field optimizations
• Short-circuit boolean expressions
• Switch statement optimization
• Simple if-conversion
• Assignments of small structs: lower struct copy to assignments of individual fields
• Convert patterns of code sequences to intrinsics:
  – Saturated subtract, abs()
• Other pattern-based optimizations:
  – max, min
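Pattern-based optimizations of this kind match a small expression shape and replace it with an intrinsic. The sketch below does this for abs, max and min over a hypothetical tuple IR in which ("select", c, t, f) means `c ? t : f`. It is meant for integer operands only, because max/min rewrites can change results for floating-point NaNs and signed zeros. Saturated subtract is omitted.

```python
def recognize(expr):
    """Bottom-up rewrite of select() patterns into abs/max/min intrinsics."""
    if not isinstance(expr, tuple):
        return expr  # leaf: variable name or constant
    expr = (expr[0],) + tuple(recognize(e) for e in expr[1:])
    if expr[0] != "select":
        return expr
    _, cond, t, f = expr
    if isinstance(cond, tuple) and len(cond) == 3:
        op, a, b = cond
        if op == "lt" and b == 0 and t == ("neg", a) and f == a:
            return ("abs", a)       # x < 0 ? -x : x
        if op == "gt" and (t, f) == (a, b):
            return ("max", a, b)    # a > b ? a : b
        if op == "lt" and (t, f) == (a, b):
            return ("min", a, b)    # a < b ? a : b
    return expr
```

The walk is bottom-up, so a pattern nested inside a larger expression is rewritten in place, and the surrounding expression keeps its shape.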

Page 17: B Compile

Very High WHIRL Optimizer

Second part generates efficient code from FORTRAN 90 constructs (be/vho/f90_lower.cxx):
• Array section operations expanded to loops
• Array temporaries introduced in order to preserve parallel semantics

    A(1:n) = A(B(1:n))

expands to

    do i = 1, n
      t(i) = A(B(i))
    enddo
    do i = 1, n
      A(i) = t(i)
    enddo
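The temporary matters whenever B makes a later iteration read an element that an earlier iteration has already written. The Python sketch below contrasts the two-loop expansion with a naive single loop. It uses 0-based lists, with B holding 1-based Fortran subscripts, and the function names are illustrative only.

```python
def assign_with_temp(A, B):
    """A(1:n) = A(B(1:n)) via a temporary, like the two-loop expansion."""
    n = len(B)
    t = [A[B[i] - 1] for i in range(n)]  # first loop: gather into t
    for i in range(n):                   # second loop: copy back
        A[i] = t[i]
    return A


def assign_in_place(A, B):
    """Naive single loop: wrong when B reads elements already overwritten."""
    for i in range(len(B)):
        A[i] = A[B[i] - 1]
    return A
```

With A = [10, 20, 30] and B = [3, 1, 2], the two-loop form gives [30, 10, 20], matching Fortran's array semantics. The single loop gives [30, 30, 30].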

Page 18: B Compile

Lowering

All lowering actions after VHO are performed by calling wn_lower() (be/com/wn_lower.cxx)

Each bit in LOWER_ACTIONS parameter controls one class of lowering

Recursively walk the tree and apply the lowering relevant to each node

Mostly simple tree transformations
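The sketch below shows the LOWER_ACTIONS idea at minimal scale: each bit enables one class of lowering, and a recursive walk applies only the enabled ones. The two bits, the tuple IR and the function name are hypothetical. They do not correspond to the actual bits or signature of wn_lower().

```python
from enum import IntFlag


class Lower(IntFlag):
    """Hypothetical lowering bits, one per class of lowering."""
    ARRAY = 1      # array element address -> explicit address arithmetic
    BIT_FIELD = 2  # bit-field extract -> shift and mask


def lower(node, actions):
    """Walk the tree post-order, applying only the lowerings in `actions`."""
    if not isinstance(node, tuple):
        return node
    op, *kids = node
    kids = [lower(k, actions) for k in kids]
    if op == "array" and actions & Lower.ARRAY:
        base, index, elem_size = kids
        return ("add", base, ("mul", index, elem_size))
    if op == "extract_bits" and actions & Lower.BIT_FIELD:
        word, offset, width = kids
        return ("and", ("shr", word, offset), (1 << width) - 1)
    return (op, *kids)
```

Passing a subset of bits lets a caller lower some constructs early and leave others for a later phase. For example, `lower(tree, Lower.ARRAY)` leaves bit-field extracts intact.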

Page 19: B Compile

WHIRL Simplifier

Simplify a WHIRL tree to a more efficient form

Implemented in common/com/wn_simp_code.h

Node types mapped by cpp to either wn or coderep (the latter when invoked from wopt)

(should have used a C++ template)

Evaluates constant expressions to constants
Used by front-ends to handle constant expressions in declarations

Automatically called during WHIRL tree generation

The cheapest optimization

Should be called whenever transformation occurs
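Below is a toy bottom-up simplifier in the same spirit. It folds operators whose operands are all constants, then applies a couple of algebraic identities. It works on a hypothetical tuple IR and uses unbounded Python integers. A real WHIRL simplifier must fold with the width and signedness of each node's type.

```python
import operator

FOLD = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def simplify(node):
    """Bottom-up: fold constant subtrees, then apply x+0 -> x and x*1 -> x."""
    if not isinstance(node, tuple):
        return node  # leaf: variable name or integer constant
    op, *kids = node
    kids = [simplify(k) for k in kids]
    if op in FOLD and all(isinstance(k, int) for k in kids):
        return FOLD[op](*kids)  # constant expression -> constant
    if op == "add" and kids[1] == 0:
        return kids[0]
    if op == "mul" and kids[1] == 1:
        return kids[0]
    return (op, *kids)
```

A front-end sizing `int a[4 * 8]` would rely on exactly the first rule: `simplify(("mul", 4, 8))` reduces the bound to a constant.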

Page 20: B Compile

Linkage Convention

Implemented in common/com/x8664/targ_sim.cxx

Called from the lowerer

Controls:
• How parameters of different types are passed
• How function return values of different types are returned

Fake parameters for return structs introduced by lowerer
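On x86-64, the rules being implemented come from the System V AMD64 ABI. The sketch below covers only scalar integer/pointer and float/double parameters. The first six integer-class arguments go in rdi, rsi, rdx, rcx, r8 and r9, and the first eight floating-point arguments go in xmm0 to xmm7; the rest go on the stack. A struct returned in memory uses rdi for a hidden pointer, which corresponds to the fake return-struct parameter above. Aggregate classification, long double and varargs are left out, and the function name is hypothetical.

```python
# Simplified System V AMD64 parameter assignment: scalar arguments only.
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SSE_REGS = [f"xmm{i}" for i in range(8)]


def assign_params(param_kinds, returns_struct_in_memory=False):
    """Map each parameter ('int' or 'float') to a register or stack slot."""
    ints = iter(INT_REGS)
    sses = iter(SSE_REGS)
    if returns_struct_in_memory:
        next(ints)  # rdi carries the hidden pointer to the return buffer
    locs = []
    stack_off = 0  # offset within the outgoing argument area
    for kind in param_kinds:
        reg = next(ints if kind == "int" else sses, None)
        if reg is None:  # this register class is exhausted
            locs.append(f"stack+{stack_off}")
            stack_off += 8
        else:
            locs.append(reg)
    return locs
```

Integer and floating-point registers are consumed independently. Only arguments that overflow their own class spill to the stack, in argument order.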

Page 21: B Compile

Data Layout

Refers to how program variables are allocated in memory

Program variables remain discrete until laid out in memory

Optimization opportunities arise from:

Alignment

Locality of references

Strategy: delay layout until the benefit of a particular relative positioning is known

Page 22: B Compile

Data Layout Mechanism

Designed so it can occur continuously throughout optimization and compilation

Happens during:
• IPA (common block splitting and padding)
• LNO (enforcing alignment)

Hierarchical layout representation; for each symbol:
• ST_base: symbol relative to which it is allocated (initially set to itself)
• ST_ofst: position of the symbol in ST_base's block

A symbol is laid out by setting its ST_base and ST_ofst fields

Symbol ST_base itself may not be laid out till later

Page 23: B Compile

Data Layout for Stack Frame

Implemented in be/com/data_layout.cxx

Segments for:
• Formal parameters
• Fixed temporaries for Fortran alternate entry parameters
• Actual (outgoing) parameters
• Locals (user or compiler-generated)

Different stack models:
• Small ($sp only)
• Large ($fp and $sp)
• Dynamic ($fp and $sp)

Stack frame finalized at end of code generation

Final resolution of ST_base is either $sp or $fp

Page 24: B Compile

Data Layout Example

Variable A is 32 bytes off $sp

See Base_Symbol_And_Offset() in common/com/symtab.cxx

A: base = B, offset = 12
B: base = C, offset = 20
C: base = $sp, offset = 0

So A is 12 + 20 + 0 = 32 bytes off $sp
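Resolving a symbol's final location means following ST_base links until reaching a symbol whose base is itself, summing ST_ofst along the way. The Python sketch below models that walk on the A/B/C example using a hypothetical `Sym` class and `base_symbol_and_offset` function. It mirrors the idea behind Base_Symbol_And_Offset(), not its actual code.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Sym:
    """Toy symbol carrying the two layout fields ST_base and ST_ofst."""
    name: str
    base: "Sym | None" = field(default=None, repr=False)
    ofst: int = 0

    def __post_init__(self):
        if self.base is None:
            self.base = self  # not yet laid out: ST_base is the symbol itself


def base_symbol_and_offset(sym):
    """Follow base links to the root symbol, summing offsets on the way."""
    total = 0
    while sym.base is not sym:
        total += sym.ofst
        sym = sym.base
    return sym, total
```

A symbol that has not been laid out yet resolves to itself with offset 0. Layout can therefore proceed piecemeal, with later phases filling in base links as decisions are made.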