Gcc porting

Page 1: Gcc porting

GCC porting

Use instruction patterns to describe the target ISA

Shiva Chen

May 2013

Page 2: Gcc porting

Outline

Compiler structure
Intermediate languages in GCC
Optimization passes in GCC
Define instruction patterns
Operand constraints
Match instruction patterns
Strict RTL
Target-defined constraints
Emit assembly code
Target information usage
Reserved words for describing instruction patterns
Examples of instruction patterns
Split instruction patterns
Instruction attributes
Peephole patterns
Instruction scheduling

Page 3: Gcc porting
Page 4: Gcc porting

Three main intermediate language formats in GCC

GENERIC
  Language-independent representation generated by each front end.
  Common representation for all the languages supported by GCC.

GIMPLE
  Used for language-independent and target-independent optimization.

RTL
  Used for optimizations that take target features into account through the porting code.

Page 5: Gcc porting

GIMPLE optimization passes in GCC 4.6.2

004t.gimple 006t.vcg 009t.omplower 010t.lower 012t.eh 013t.cfg 017t.ssa 018t.veclower 019t.inline_param1 020t.einline 021t.early_optimizations 022t.copyrename1 023t.ccp1 024t.forwprop1 025t.ealias 026t.esra

027t.copyprop1 028t.mergephi1 029t.cddce1 030t.eipa_sra 031t.tailr1 032t.switchconv 034t.profile 035t.local-pure-const1 036t.fnsplit 037t.release_ssa 038t.inline_param2 057t.copyrename2 058t.cunrolli 059t.ccp2 060t.forwprop2 062t.alias 063t.retslot

064t.phiprop 065t.fre 066t.copyprop2 067t.mergephi2 068t.vrp1 069t.dce1 070t.cselim 071t.ifcombine 072t.phiopt1 073t.tailr2 074t.ch 076t.cplxlower 077t.sra 078t.copyrename3 079t.dom1 080t.phicprop1 081t.dse1

082t.reassoc1 083t.dce2 084t.forwprop3 085t.phiopt2 086t.objsz 087t.ccp3 088t.copyprop3 090t.bswap 091t.crited 092t.pre 093t.sink 094t.loop 095t.loopinit 096t.lim1 097t.copyprop4 … 143t.optimized

Page 6: Gcc porting

RTL optimization passes in GCC 4.6.2

004t.gimple -> other GIMPLE passes -> 144r.expand, followed by:

145r.sibling 147r.initvals 148r.unshare 149r.vregs 150r.into_cfglayout 151r.jump 152r.subreg1 153r.dfinit 154r.cse1 155r.fwprop1

156r.cprop1 158r.hoist 159r.cprop2 162r.ce1 163r.reginfo 164r.loop2 165r.loop2_init 166r.loop2_invariant 170r.loop2_done 172r.cprop3 173r.cse2 174r.dse1 175r.fwprop2 176r.auto_inc_dec 177r.init-regs 178r.dce

179r.combine 180r.ce2 182r.regmove 183r.outof_cfglayout 184r.split1 185r.subreg2 188r.asmcons 190r.sched1 191r.ira 192r.postreload 194r.split2 198r.pro_and_epilogue 199r.dse2 200r.csa 201r.peephole2 202r.ce3

204r.cprop_hardreg 205r.dce 206r.bbro 208r.split4 209r.sched2 212r.alignments 215r.mach 216r.barriers 217r.dbr 218r.split5 220r.shorten 221r.nothrow 222r.final 223r.dfinish 224t.statistics

Page 7: Gcc porting

Why divide optimization into GIMPLE passes and RTL passes?

GIMPLE passes retain more high-level semantics.
  Ex: switch, array, structure, variable
Some optimizations are easier to design while the high-level semantics still exist.

However, GIMPLE passes lack target information.
  Ex: instruction length (size), supported ISA
Therefore, we also need RTL optimization passes.

Page 8: Gcc porting

Define instruction pattern

Every RTL pattern must match the target ISA. How do we tell GCC to generate RTL that matches the ISA?

Instruction patterns: use define_expand and define_insn to describe the instruction patterns that the target supports.

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ... )

Page 9: Gcc porting

Define instruction pattern

GCC already defines a set of standard pattern names and the semantics of each pattern.
  addsi3: addition with three SI-mode operands

GCC does not know the operand constraints of the target. How do we tell GCC the operand constraints of each instruction?
  Predicate
  Constraint

Page 10: Gcc porting

Operand Constraints

Multiple alternative constraints

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ... )

Predicate: register_operand, nonmemory_operand
Constraint: r, i

The predicate should accept every constraint of the operand.
For operand 2 with SI mode:
  r (register) belongs to nonmemory_operand
  i (immediate) belongs to nonmemory_operand

Page 11: Gcc porting

Operand Constraints

GCC already has predicates to restrict operands. Why do we need the constraint field?

It gives GCC the opportunity to change an operand during optimization.

Ex:
  movi $r0, 4
  add $r1, $r1, $r0     {addsi3}
Constant propagation =>
  addi $r1, $r1, 4      {addsi3}

Page 12: Gcc porting

Operand Constraints

GCC uses two levels of operand restriction (predicate and constraint) to group instructions with the same semantics into a single instruction pattern (addsi3).

Many ISAs have several assembly instructions with the same semantics but different operand constraints.

Grouping them reduces the number of instruction patterns to write when porting.

Page 13: Gcc porting

Operand Constraints

GCC uses the instruction patterns to check ISA support whenever it generates a new RTL pattern:
  Check whether the back end defines the pattern, by define_insn.
  Check whether the operand type is supported, by predicate.
  Check which alternative the operand belongs to, by constraint.

Page 14: Gcc porting

Operand Constraints

Multiple alternative constraints

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ... )

The first alternative (=r, %r, r) matches "add".

The second alternative (=r, r, i) matches "addi".
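The two-level check can be sketched in C. This is only an illustration: the `operand_kind` enum and the function names are hypothetical, and GCC itself performs this matching on RTL through the generated recognizer, not through code like this.

```c
#include <assert.h>

/* Hypothetical operand kinds for this sketch; GCC works on RTL codes. */
enum operand_kind { OP_REG, OP_IMM, OP_MEM };

/* Predicate of operand 2: nonmemory_operand accepts registers and immediates. */
static int nonmemory_operand_p(enum operand_kind k)
{
    return k == OP_REG || k == OP_IMM;
}

/* Alternative of addsi3 selected by operand 2:
   0 for constraint "r" (add), 1 for constraint "i" (addi),
   -1 when the predicate rejects the operand. */
static int addsi3_alternative(enum operand_kind op2)
{
    if (!nonmemory_operand_p(op2))
        return -1;
    return op2 == OP_REG ? 0 : 1;
}

/* Mnemonic that belongs to an alternative. */
static const char *addsi3_mnemonic(int alternative)
{
    assert(alternative == 0 || alternative == 1);
    return alternative == 0 ? "add" : "addi";
}
```

A memory operand fails the predicate, so the pattern does not match at all; a register or an immediate passes the predicate and the constraint then picks the alternative.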

Page 15: Gcc porting

Match instruction pattern

Multiple alternative constraints

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ... )

Ex: (set (reg/f:SI 88) (plus:SI (reg:SI 87) (reg/v:SI 55)))

1. Parse the RTL pattern:
   (set (op0)
        (plus:SI (op1) (op2)))

Page 16: Gcc porting

Match instruction pattern

When is a new RTL pattern generated?
  In the RTL expand phase (GIMPLE to RTL)
  During optimization

Ex: combine phase

Before combine:
  (set (reg/f:SI 47) (lshiftrt:SI (reg:SI 60) (const_int 2)))
  (set (reg/f:SI 88) (plus:SI (reg:SI 47) (reg:SI 55)))

  srli $r47, $r60, 2
  add $r88, $r47, $r55

After combine:
  (set (reg/f:SI 88) (plus:SI (lshiftrt:SI (reg:SI 60) (const_int 2)) (reg/v:SI 55)))

  add_srli $r88, $r55, $r60, 2

Page 17: Gcc porting

Strict RTL

Does a newly generated RTL pattern always satisfy the constraints?
  GCC allows certain kinds of constraint mismatch that reload can fix later.
  The predicate must always be satisfied.

Without optimization1: RTL1 -> Reload -> RTL3
With optimization1:    RTL1 -> RTL2 -> Reload -> RTL4

RTL2 does not satisfy the constraints.

1. RTL3 and RTL4 satisfy the constraints.
2. RTL4 is better than RTL3.

Page 18: Gcc porting

Strict RTL

A constraint may be left unmatched before reload, in the expectation that reload will fix it.
  Ex: the constraint is m (memory) but the current operand is a constant; GCC allows this before reload.

The reload phase runs after register allocation.

In fact, during register allocation GCC calls reload repeatedly while an operand does not fit its constraint.

After reload, every operand must satisfy one of its constraint alternatives (strict RTL).

Page 19: Gcc porting

Strict RTL

(define_insn "movsi"
  [(set (match_operand:SI 0 "register_operand" "=r,m")
        (match_operand:SI 1 "register_operand" "r,r"))]
  ... )

(set (reg/f:SI 47) (reg:SI 60))

  RA: assume that after register allocation pseudo register r60 is assigned to r3 and the hardware registers are exhausted.

(set (reg/f:SI 47) (reg:SI 3))

  Reload

(set (mem:SI (plus (sp) (const))) (reg:SI 3))

Page 20: Gcc porting

Target defined constraints

A target can define its own predicates and constraints.

Target-defined predicate:

(define_predicate "index_operand"
  (ior (match_operand 0 "register_operand")
       (and (match_operand 0 "const_int_operand")
            (match_test "(INTVAL (op) < 4096 && INTVAL (op) > -4096)"))))
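The test that this predicate expresses can be written as a plain C function. This is a minimal sketch, assuming a hypothetical `struct operand` that is either a register or a constant integer; in GCC the predicate is compiled from the machine description and receives an rtx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operand model: either a register or a constant integer. */
struct operand {
    bool is_reg;
    long value;   /* meaningful only when is_reg is false */
};

/* Mirrors index_operand: accept a register, or a const_int that lies
   strictly between -4096 and 4096. */
static bool index_operand_p(const struct operand *op)
{
    assert(op != NULL);
    if (op->is_reg)
        return true;
    return op->value < 4096 && op->value > -4096;
}
```

Both bounds are exclusive, so 4095 and -4095 are accepted while 4096 and -4096 are rejected.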

Page 21: Gcc porting

Target defined constraints

Target-defined constraints:

(define_register_constraint "l" "LO_REGS"
  "registers r0->r7.")

(define_memory_constraint "Uv"
  "@internal In ARM/Thumb-2 state a valid VFP load/store address."
  (and (match_code "mem")
       (match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))

Page 22: Gcc porting

Emit assembly code

Multiple alternative constraints

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ""
  "@
   add %0, %1, %2
   addi %0, %1, %2")

Ex: (set (reg/f:SI 3) (plus:SI (reg:SI 4) (reg:SI 5)))

Match: the first alternative matches "add".

Output assembly code: "add $r3, $r4, $r5"
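The substitution of %0, %1, %2 in an output template can be illustrated with a small C function. It is a sketch only: `expand_template` is a hypothetical helper, and GCC's real output routine handles many more escapes (operand modifiers, dialects) than the single-digit form shown here.

```c
#include <assert.h>
#include <string.h>

/* Expand %0..%9 in an output template with the operand strings.
   Returns the number of characters written (without the NUL),
   or -1 when the output buffer is too small. */
static int expand_template(const char *tmpl, const char *const operands[],
                           int n_operands, char *out, size_t out_size)
{
    size_t len = 0;

    assert(tmpl && operands && out && out_size > 0);
    for (const char *p = tmpl; *p; p++) {
        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9'
            && (p[1] - '0') < n_operands) {
            const char *s = operands[p[1] - '0'];
            size_t n = strlen(s);
            if (len + n >= out_size)
                return -1;
            memcpy(out + len, s, n);
            len += n;
            p++;                    /* skip the operand digit */
        } else {
            if (len + 1 >= out_size)
                return -1;
            out[len++] = *p;        /* copy ordinary characters as-is */
        }
    }
    out[len] = '\0';
    return (int)len;
}
```

With operands "$r3", "$r4", "$r5", the template of the first alternative expands to the assembly line shown in the example.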

Page 23: Gcc porting

Target information usage

When does GCC use the target information obtained from the instruction patterns?

RTL instruction pattern generation
  insn-emit.c is generated from the instruction patterns when building GCC.

RTL instruction validation (is the instruction supported by the target?)
  insn-recog.c is generated from the instruction patterns when building GCC.

Emitting target assembly code
  insn-output.c is generated from the instruction patterns when building GCC.

Page 24: Gcc porting

Reserved words for describing instruction patterns

                                 RTL generation   RTL validation   Emit assembly
define_insn "named pattern"      yes              yes              yes
define_expand "named pattern"    yes              no               no
define_insn "*.."                no               yes              yes

GCC defines a set of named patterns and their semantics, which it uses to generate RTL during the RTL expand phase.
  Ex: addsi3, subsi3, movsi, movhi …

Some target instructions have semantics that no GCC named pattern covers, yet an optimization can still generate the RTL.
  Ex: add_slli can be generated by the combine phase.
  Define an unnamed pattern to make the instruction valid:
    define_insn "*add_slli"
  A define_insn whose name has a * prefix is treated as an unnamed pattern.

Page 25: Gcc porting

Example of instruction pattern

;; These control RTL generation for conditional jump insns
(define_expand "cbranchsi4"
  [(set (pc)
        (if_then_else (match_operator 0 "ordered_comparison_operator"
                        [(match_operand:SI 1 "nonmemory_nonsymbol_operand" "")
                         (match_operand:SI 2 "nonmemory_nonsymbol_operand" "")])
                      (label_ref (match_operand 3 "" ""))
                      (pc)))]
  ""
{
  sh_expand_cbranchsi4 (operands);
  DONE;
}
)

Semantics of "cbranchsi4": compare operand 1 and operand 2 using operator 0, and branch to label 3 if the comparison result is true.

The predicate "ordered_comparison_operator" includes EQ, NE, LT, LTU, LE, LEU, GT, GTU, GE, GEU.

The porting function sh_expand_cbranchsi4 generates the RTL pattern.

Page 26: Gcc porting

Example of instruction pattern

(define_insn "*bcondz"
  [(set (pc)
        (if_then_else (match_operator 0 "bcondz_operator"
                        [(match_operand:SI 1 "register_operand" "r")
                         (const_int 0)])
                      (label_ref (match_operand 2 "" ""))
                      (pc)))]
  ""
{
  switch (GET_CODE (operands[0]))
    {
    case EQ:
      return "beqz %1, %2";
    case NE:
      return "bnez %1, %2";
    case LT:
      return "bltz %1, %2";
    case LE:
      return "blez %1, %2";
    case GT:
      return "bgtz %1, %2";
    case GE:
      return "bgez %1, %2";
    default:
      gcc_unreachable ();
    }
}
)

The unnamed pattern "*bcondz" is used to validate RTL and emit assembly code for a branch that compares with zero.

Page 27: Gcc porting

Example of instruction pattern

(define_insn "one_cmplsi2"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (not:SI (match_operand:SI 1 "register_operand" "r")))]
  ""
  "nor\t%0, %1, %1")

Semantics of "one_cmplsi2": take the bitwise NOT of operand 1 and store it in operand 0.

The named pattern "one_cmplsi2" is used to generate RTL, validate RTL, and output assembly code.

It outputs the assembly "nor ra, rb, rb" to match the semantics.

Page 28: Gcc porting

Split instruction pattern

When do we need to split an instruction pattern?
  When a const_int value is too big to encode in a single assembly instruction.
  Split the const_int into a high part and a low part.

The constant could be split in define_expand, but that is not good enough. Why?
  Splitting the constant too early loses opportunities to optimize the RTL pattern.

Page 29: Gcc porting

Split instruction pattern

The optimization phase "move2add" can do the following (assembly code is used to present the RTL semantics for convenience):

Before move2add:
  move $r0, 123456
  move $r1, 123457
  move $r2, 123458

After move2add:
  move $r0, 123456
  addi $r1, $r0, 1
  addi $r2, $r0, 2

With the constants split early:
  sethi $r0, hi20(123456)
  ori $r0, lo12(123456)
  sethi $r1, hi20(123457)
  ori $r1, lo12(123457)
  sethi $r2, hi20(123458)
  ori $r2, lo12(123458)

If the const_int is split into high and low parts too early, move2add fails to transform the moves into adds.
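The decision behind this transformation can be sketched in C: if a register already holds a known constant and the wanted constant is close enough, an add of the difference replaces the second move. The 15-bit immediate width of addi and the function names are assumptions made for this illustration; they are not taken from the slides or from GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: addi takes a signed 15-bit immediate. */
#define ADDI_IMM_BITS 15

/* True when v is representable as a signed integer of the given width. */
static bool fits_signed_bits(long v, int bits)
{
    assert(bits > 0 && bits < 32);
    long lo = -(1L << (bits - 1));
    long hi = (1L << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}

/* True when loading `wanted` can become "addi dst, base, delta",
   given that register `base` already holds `known`.
   The difference is stored in *delta. */
static bool can_use_add(long known, long wanted, long *delta)
{
    assert(delta != NULL);
    *delta = wanted - known;
    return fits_signed_bits(*delta, ADDI_IMM_BITS);
}
```

For the example above, 123457 and 123458 are reachable from 123456 with deltas 1 and 2. Once each constant has been broken into sethi/ori pairs, no single RTL insn carries the full value any more, which is why the pass misses the opportunity.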

Page 30: Gcc porting

Split instruction pattern

How do we split an instruction pattern outside the RTL expand phase? Use define_split or define_insn_and_split.

Page 31: Gcc porting

Split instruction pattern

Split passes in the GCC 4.6.2 RTL pass list: 184r.split1, 194r.split2, 208r.split4, 218r.split5.

Page 32: Gcc porting

Split instruction pattern

(define_insn_and_split "*movsi_const"
  [(set (match_operand:WORD 0 "register_operand" "=r,r")
        (match_operand:WORD 1 "immediate_operand" "P,i"))]
  ""
{
  if (GET_CODE (operands[1]) == CONST_INT
      && SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20))
    {
      return "movi\t%0, %1";
    }
  else
    return "#";
}
  "reload_completed
   && GET_CODE (operands[1]) == CONST_INT
   && ! SIGNED_INT_FITS_N_BITS (INTVAL (operands[1]), 20)"
  [(set (match_dup 0) (high:SI (match_dup 1)))
   (set (match_dup 0) (lo_sum:SI (match_dup 0) (match_dup 1)))]

If the const_int does not fit in signed 20 bits, the output code returns "#", which means the pattern will be split in a split phase.

Page 33: Gcc porting

Split instruction pattern

Split condition of "*movsi_const": reload_completed (after reload) && the const_int does not fit in signed 20 bits.

Page 34: Gcc porting

Split instruction pattern

"*movsi_const" is split into one RTL pattern that sets the high part and one that adds the low part with lo_sum.

(match_dup 0) means: duplicate operand 0 into this field.
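The arithmetic behind the high/low split can be sketched in C. This is an illustration under assumptions: `signed_fits_n_bits` stands in for the port's SIGNED_INT_FITS_N_BITS macro, whose definition is not shown in the slides, and the 20/12 bit split with an OR to recombine follows the hi20/lo12 and sethi/ori notation used earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the check named SIGNED_INT_FITS_N_BITS in the pattern. */
static bool signed_fits_n_bits(int64_t v, int n)
{
    assert(n > 0 && n < 64);
    int64_t lo = -((int64_t)1 << (n - 1));
    int64_t hi = ((int64_t)1 << (n - 1)) - 1;
    return v >= lo && v <= hi;
}

/* Upper 20 bits of a 32-bit constant: the part sethi would load. */
static uint32_t hi20(uint32_t v)
{
    return v >> 12;
}

/* Lower 12 bits: the part ori would merge in. */
static uint32_t lo12(uint32_t v)
{
    return v & 0xFFFu;
}

/* Recombine the two parts the way sethi followed by ori would. */
static uint32_t combine(uint32_t hi, uint32_t lo)
{
    return (hi << 12) | lo;
}
```

A constant that passes `signed_fits_n_bits(v, 20)` can be loaded by a single movi; any other constant goes through the two-insn sequence, and `combine(hi20(v), lo12(v))` gives back v.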

Page 35: Gcc porting

Split instruction pattern

(define_split
  [(set (match_operand:ANY64 0 "register_operand" "")
        (match_operand:ANY64 1 "register_operand" ""))]
  "reload_completed
   && (! USE_V3_SERISE_ISA)"
  [(set (match_dup 0) (match_dup 1))
   (set (match_dup 2) (match_dup 3))]
  "…

The split condition is reload_completed && not V3 ISA. V3 has movd44, which can do a 64-bit register move.

ANY64: DI, DF
DI: double int
DF: double float

                         Split RTL   RTL validation   Emit assembly
define_split             yes         no               no
define_insn_and_split    yes         yes              yes

Page 36: Gcc porting

Instruction attribute

(define_attr "type"
  "unknown,load,store,bequal, alu, .."
  (const_string "unknown"))
…
(define_insn "cmovn"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (if_then_else (ne:SI (match_operand:SI 1 "register_operand" "r")
                             (const_int 0))
                      (match_operand:SI 2 "register_operand" "r")
                      (match_operand:SI 3 "register_operand" "0")))]
  ""
  "cmovn\t%0, %2, %1"
  [(set_attr "type" "alu")
   (set_attr "length" "4")])

General form: (define_attr "attribute_name" "value domain" (default value))

Page 37: Gcc porting

Instruction attribute

The "type" attribute divides the instructions into several instruction groups, which helps when writing the instruction scheduling porting code.

The "length" attribute gives the length (size) of each instruction, so that GCC can calculate branch distances correctly.

Page 38: Gcc porting

Peephole pattern

;; Merge move 0 to bcondz
(define_peephole2
  [(set (match_operand:SI 0 "register_operand" "") (const_int 0))
   (set (pc)
        (if_then_else (match_operator 1 "bcondz_operator"
                        [(match_dup 0)
                         (match_operand:SI 2 "register_operand" "r")])
                      (label_ref (match_operand 3 "" ""))
                      (pc)))]
  "peep2_reg_dead_p (2, operands[0])"
  [(set (pc)
        (if_then_else:SI (match_dup 1)
                         (label_ref (match_dup 3)) (pc)))]
  "
{
  operands[1] = gen_rtx_fmt_ee (swap_condition (GET_CODE (operands[1])),
                                SImode, operands[2], GEN_INT(0));
}")

Old RTL:
  movi $r0, 0
  bne $r0, $r1, L3

New RTL:
  bnez $r1, L3
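The peephole moves the zero from the left side of the comparison to the right side, so the comparison code has to be swapped. The following C sketch shows why that keeps the meaning; the `cmp_code` enum and both functions are hypothetical stand-ins for this illustration, not GCC's rtx codes or its swap_condition.

```c
#include <assert.h>

/* Hypothetical comparison codes for this sketch. */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Code to use after exchanging the two compared operands:
   (a < b) is the same test as (b > a); EQ and NE are symmetric. */
static enum cmp_code swap_cmp(enum cmp_code c)
{
    switch (c) {
    case CMP_EQ: return CMP_EQ;
    case CMP_NE: return CMP_NE;
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    }
    assert(!"unknown comparison code");
    return c;
}

/* Evaluate "a <code> b" for signed integers. */
static int eval_cmp(enum cmp_code c, int a, int b)
{
    switch (c) {
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_GT: return a > b;
    case CMP_GE: return a >= b;
    }
    assert(!"unknown comparison code");
    return 0;
}
```

For every code, `eval_cmp(c, 0, x)` equals `eval_cmp(swap_cmp(c), x, 0)`. In the example the code is NE, which is symmetric, so "bne $r0, $r1" with $r0 = 0 becomes "bnez $r1".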

Page 39: Gcc porting

Instruction scheduling

Instruction scheduling is an optimization pass in GCC that reorders instructions without changing the semantics of the code.

Its goal is to reduce pipeline stalls and so improve performance.

Instruction scheduling belongs to the RTL phase.

Page 40: Gcc porting

RTL optimization pass in GCC 4.6.2

Scheduling passes in the GCC 4.6.2 RTL pass list: 190r.sched1 (before register allocation, 191r.ira) and 209r.sched2 (after register allocation).

Page 41: Gcc porting

Instruction scheduling

GCC has two scheduling passes.

Sched1
  Does interblock scheduling before register allocation.
  Tries to find the innermost loop as a region and schedules the instructions in the region.
  Improves the performance of hot spots (innermost loops).
  Extends the scope to a region to find more scheduling opportunities.

Sched2
  Does single-basic-block scheduling after register allocation.
  Register allocation may produce spill code (load/store), so the code needs to be scheduled again.

Page 42: Gcc porting

Instruction scheduling

Instruction scheduling resolves the following hazards to prevent pipeline stalls.

Structural hazard
  Occurs when two or more instructions need the same functional unit at the same time.

Data hazard
  RAW (read after write): a true dependency
  WAR (write after read): an anti-dependency
  WAW (write after write): an output dependency
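The three kinds of data hazard can be told apart from the registers each instruction reads and writes. A minimal C sketch, assuming a hypothetical `struct insn` that stores those registers as bit masks (bit i stands for register $ri); GCC's dependence analysis is far more detailed.

```c
#include <assert.h>

enum hazard { HAZ_NONE = 0, HAZ_RAW = 1, HAZ_WAR = 2, HAZ_WAW = 4 };

/* Hypothetical insn model: bit i of each mask stands for register $ri. */
struct insn {
    unsigned defs;   /* registers written */
    unsigned uses;   /* registers read */
};

/* Dependencies of `later` on `earlier`, as a bitwise OR of enum hazard. */
static int data_hazards(const struct insn *earlier, const struct insn *later)
{
    int h = HAZ_NONE;

    assert(earlier && later);
    if (earlier->defs & later->uses)
        h |= HAZ_RAW;    /* later reads what earlier wrote */
    if (earlier->uses & later->defs)
        h |= HAZ_WAR;    /* later overwrites what earlier read */
    if (earlier->defs & later->defs)
        h |= HAZ_WAW;    /* both write the same register */
    return h;
}
```

For example, "movi $r0, 0" followed by "add $r0, $r0, $r1" gives both RAW and WAW on $r0, while two moves into different registers give no hazard.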

Page 43: Gcc porting

Instruction scheduling

GCC provides several constructs to describe the pipeline model.

After parsing the pipeline description in the porting code, GCC generates an automaton that acts as a pipeline hazard recognizer. The automaton determines whether the processor can issue an instruction on a given simulated cycle.

(define_automaton "name")

Page 44: Gcc porting

Instruction scheduling

(define_automaton "a1")
(define_cpu_unit "decode1,decode2" "a1")
(define_cpu_unit "div" "a1")

(define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
(define_insn_reservation "mult_class" 1 (eq_attr "type" "mult") "decode + mult")

a1: the automaton name

decode1, decode2, div: the CPU units (functional units) in the processor

define_insn_reservation: describes the pipeline rule for each instruction class

alu_class, mult_class: insn-name (instruction class)

(eq_attr "type" "alu"): the rule matches when the type attribute of the instruction pattern is alu

"decode + alu": regular expression that describes the functional unit usage

1: the default latency in cycles when a data dependency occurs

Page 45: Gcc porting

Instruction scheduling

Multiple alternative constraints

(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" "%r,r")
                 (match_operand:SI 2 "nonmemory_operand" "r,i")))]
  ""
  "@
   add %0, %1, %2
   addi %0, %1, %2"
  [(set_attr "type" "alu")])

(define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")

Page 46: Gcc porting

Instruction scheduling

(define_automaton "a1")
(define_cpu_unit "decode1,decode2" "a1")
(define_cpu_unit "alu" "a1")
(define_cpu_unit "mult" "a1")

(define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
(define_insn_reservation "mult_class" 1 (eq_attr "type" "mult") "decode + mult")

[Figure: automaton state diagram with the states "nothing", "decode + alu", and "decode + mult", and transitions labeled alu_class, mult_class, and next_cycle. Each state is the current CPU functional unit usage; a transition leads to the functional unit usage of the next cycle.]

State transition:
1. Occupy some functional units
2. Release some functional units

Page 47: Gcc porting

Instruction scheduling

(define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
(define_insn_reservation "mult_class" 1 (eq_attr "type" "mult") "decode + mult")

(define_bypass 2 "alu_class" "alu_class")

(define_bypass 3 "mult_class" "mult_class")

producer     consumer      t = 1  2  3  4  5
alu_class    alu_class         1  0  0  0  0
alu_class    mult_class        0  0  0  0  0
mult_class   alu_class         0  0  0  0  0
mult_class   mult_class        1  1  0  0  0

1 means the consumer will stall at cycle t.

Cycle t is counted in cycles after the producer.

Page 48: Gcc porting

Instruction scheduling

[Figure: the producer/consumer stall table from the previous page, together with the automaton states derived from it. Each state records, for the current state and for each consumer class (alu_class, mult_class), the stall bits for cycles t = 1 to 5.]

Page 49: Gcc porting

Instruction scheduling

1. movi $r0, 0 {alu}
2. movi $r1, 1 {alu}
3. add $r0, $r0, $r1 {alu}
4. lwi $r4, [$sp + 4] {load}
5. mul $r5, $r0, $r4 {mul}

(define_insn_reservation "alu_class" 1 (eq_attr "type" "alu") "decode + alu")
(define_insn_reservation "mult_class" 1 (eq_attr "type" "mult") "decode + mult")
(define_insn_reservation "load_class" 1 (eq_attr "type" "load") "decode + mem")

The priority of each instruction is calculated bottom-up:

P(insn) = max over the successors s of insn { latency(insn) + P(s) }

For an instruction without successors, P is its own latency.

[Figure: dataflow graph of insns 1 to 5, annotated with the priorities 1, 2, 2, 3, 3]
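The bottom-up priority calculation can be sketched in C for the five-instruction example. The dependence edges (1->3, 2->3, 3->5, 4->5) are an assumption derived here from the registers each instruction reads and writes, and a uniform latency of 1 is taken from the reservations; the function name is hypothetical.

```c
#include <assert.h>

#define N_INSNS 5

/* Assumed dependence edges for the example: 1->3, 2->3, 3->5, 4->5.
   succ[i][j] != 0 means insn j+1 depends on insn i+1. */
static const int succ[N_INSNS][N_INSNS] = {
    {0, 0, 1, 0, 0},
    {0, 0, 1, 0, 0},
    {0, 0, 0, 0, 1},
    {0, 0, 0, 0, 1},
    {0, 0, 0, 0, 0},
};

/* Priority of insn i (0-based): its latency plus the largest priority
   among its successors, or just its latency when it has no successor.
   The recursion ends because the dependence graph has no cycles. */
static int priority(int i, int latency)
{
    int best = 0;

    assert(i >= 0 && i < N_INSNS && latency > 0);
    for (int j = 0; j < N_INSNS; j++) {
        if (succ[i][j]) {
            int p = priority(j, latency);
            if (p > best)
                best = p;
        }
    }
    return latency + best;
}
```

Under these assumptions insn 5 gets priority 1, insns 3 and 4 get 2, and insns 1 and 2 get 3, which agrees with the set of priorities annotated on the dataflow graph.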

Page 50: Gcc porting

Instruction scheduling

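[Figure: dataflow graph of insns 1 to 5]

Ready list: 1 2 4
Pending list: 3 5
Queued list:
Scheduled list:

[Figure: transitions between the Ready, Pending, Queued, and Scheduled lists, with edges labeled "scheduled", "dependency resolved", and "data hazard"]

Pick the insn with the maximum priority from the Ready list.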

Page 51: Gcc porting

Instruction scheduling

[Figure: dataflow graph of insns 1 to 5 with their types: 1 {alu}, 2 {alu}, 3 {alu}, 4 {load}, 5 {mult}]

Ready list: 4
Pending list: 3 5
Queued list: 2
Scheduled list: 1

cycle 1: 1. movi $r0, 0 {alu}

(define_bypass 2 "alu_class" "alu_class")

Page 52: Gcc porting

Instruction scheduling

[Figure: dataflow graph of insns 1 to 5 with their types: 1 {alu}, 2 {alu}, 3 {alu}, 4 {load}, 5 {mult}]

Ready list: 2
Pending list: 3 5
Queued list:
Scheduled list: 1 4

cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}

Page 53: Gcc porting

Instruction scheduling

[Figure: dataflow graph of insns 1 to 5 with their types: 1 {alu}, 2 {alu}, 3 {alu}, 4 {load}, 5 {mult}]

Ready list:
Pending list: 5
Queued list: 3
Scheduled list: 1 4 2

cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
cycle 3: 2. movi $r1, 1 {alu}

Page 54: Gcc porting

Instruction scheduling

[Figure: dataflow graph of insns 1 to 5 with their types: 1 {alu}, 2 {alu}, 3 {alu}, 4 {load}, 5 {mult}]

Ready list: 3
Pending list: 5
Queued list:
Scheduled list: 1 4 2

cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
cycle 3: 2. movi $r1, 1 {alu}
cycle 4: (no instruction issued)

Page 55: Gcc porting

Instruction scheduling

[Figure: dataflow graph of insns 1 to 5 with their types: 1 {alu}, 2 {alu}, 3 {alu}, 4 {load}, 5 {mult}]

Ready list: 5
Pending list:
Queued list:
Scheduled list: 1 4 2 3

cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
cycle 3: 2. movi $r1, 1 {alu}
cycle 4: (no instruction issued)
cycle 5: 3. add $r0, $r0, $r1 {alu}

Page 56: Gcc porting

Instruction scheduling

[Figure: dataflow graph of insns 1 to 5 with their types: 1 {alu}, 2 {alu}, 3 {alu}, 4 {load}, 5 {mult}]

Ready list:
Pending list:
Queued list:
Scheduled list: 1 4 2 3 5

cycle 1: 1. movi $r0, 0 {alu}
cycle 2: 4. lwi $r4, [$sp + 4] {load}
cycle 3: 2. movi $r1, 1 {alu}
cycle 4: (no instruction issued)
cycle 5: 3. add $r0, $r0, $r1 {alu}
cycle 6: 5. mul $r5, $r0, $r4 {mul}

Page 57: Gcc porting

Thank you

Page 58: Gcc porting

Switch initialization conversion in the GIMPLE optimization passes

Before:

int a, b;

switch (argc)
  {
  case 1:
  case 2:
    a = 8;
    b = 6;
    break;
  case 3:
    a = 9;
    b = 5;
    break;
  case 12:
    a = 10;
    b = 4;
    break;
  default:
    a = 16;
    b = 1;
  }

After:

/* One table entry per case value from 1 to 12. */
static const int CSWTCH01[] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};
static const int CSWTCH02[] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16,
                               16, 10};

/* The unsigned compare also rejects argc < 1. */
if (((unsigned) argc) - 1 < 12)
  {
    a = CSWTCH02[argc - 1];
    b = CSWTCH01[argc - 1];
  }
else
  {
    a = 16;
    b = 1;
  }

The pass tries to transform the switch statement into static array accesses.
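One way to convince yourself that a table-based form is equivalent to the switch is to write both as functions and compare them over a range of inputs. The sketch below does that; `struct ab`, `by_switch`, `by_table`, and the table names are hypothetical names for this illustration, and the tables hold one entry per case value from 1 to 12.

```c
#include <assert.h>

struct ab { int a, b; };

/* The original switch, returning both values. */
static struct ab by_switch(int argc)
{
    struct ab r;

    switch (argc) {
    case 1:
    case 2:  r.a = 8;  r.b = 6; break;
    case 3:  r.a = 9;  r.b = 5; break;
    case 12: r.a = 10; r.b = 4; break;
    default: r.a = 16; r.b = 1; break;
    }
    return r;
}

/* Lookup tables indexed by argc - 1, one entry per value 1..12. */
static const int TABLE_A[] = {8, 8, 9, 16, 16, 16, 16, 16, 16, 16, 16, 10};
static const int TABLE_B[] = {6, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1, 4};

/* The converted form: one unsigned range check, then two array reads. */
static struct ab by_table(int argc)
{
    struct ab r = {16, 1};              /* default case */
    unsigned idx = (unsigned)argc - 1u; /* wraps to a huge value if argc < 1 */

    assert(sizeof TABLE_A / sizeof TABLE_A[0] == 12);
    assert(sizeof TABLE_B / sizeof TABLE_B[0] == 12);
    if (idx < 12u) {
        r.a = TABLE_A[idx];
        r.b = TABLE_B[idx];
    }
    return r;
}
```

The single unsigned comparison covers both ends of the range: values below 1 wrap around to a large unsigned index and fall through to the default, just like values above 12.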