Learning for Optimizing Compilers

University of Massachusetts, Amherst • Department of Computer Science
John Cavazos
Architecture and Language Implementation Lab
Thesis Seminar

Transcript of Learning for Optimizing Compilers

Page 1: Learning for Optimizing  Compilers

John Cavazos
Architecture and Language Implementation Lab

Thesis Seminar

University of Massachusetts, Amherst

Learning for Optimizing Compilers

Page 2: Learning for Optimizing  Compilers

Motivation

Compiler writers have a difficult task
  Optimizations are NP-hard
  Computer architectures are complex
  Computer architects need rapid evaluation
Generating heuristics manually is slow, complicated, and ad hoc.

Page 3: Learning for Optimizing  Compilers

Propose Supervised Learning

Induces heuristics automatically
Training examples
  a, b, c, …, z, label
  a, b, c, …, z: properties of problem
  label: proper decision to make
Two objectives:
  Minimize error
  Prefer less complicated function

LOCO (Learning for Optimizing COmpilers)
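A minimal sketch of the training-example shape and the two objectives named on this slide. The names `TrainingInstance`, `error_rate`, and `score`, and the linear `penalty` weighting, are illustrative assumptions; the slides do not say how LOCO trades error against complexity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class TrainingInstance:
    features: Tuple[float, ...]  # a, b, c, ..., z: properties of the problem
    label: str                   # the proper decision to make


def error_rate(heuristic: Callable[[Tuple[float, ...]], str],
               data: Sequence[TrainingInstance]) -> float:
    """Fraction of instances on which the heuristic disagrees with the label."""
    wrong = sum(1 for inst in data if heuristic(inst.features) != inst.label)
    return wrong / len(data)


def score(heuristic: Callable[[Tuple[float, ...]], str],
          complexity: int,
          data: Sequence[TrainingInstance],
          penalty: float = 0.01) -> float:
    """Lower is better: training error plus a (hypothetical) size penalty."""
    return error_rate(heuristic, data) + penalty * complexity
```

Here `complexity` could be, for example, the number of conditions in an induced rule set; a learner would prefer the candidate heuristic with the lowest score.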

Page 4: Learning for Optimizing  Compilers

Benefits of Supervised Learning

Heuristic construction sped up
Determines relative importance of features
Effective heuristics
  Comparable to hand-tuned heuristics
Theoretically sound
  Traditional approach ad hoc

Page 5: Learning for Optimizing  Compilers

Taxonomy of Compiler Heuristics

1. What Order to Apply Optimizations
     Phase-ordering heuristics
2. When to Optimize
     Filters
3. Which Optimization Algorithm to Apply
     Hybrid Optimizations
4. How to Optimize
     Priority Functions

Page 6: Learning for Optimizing  Compilers

The LOCO Methodology

Determine class of heuristic
Generate raw data
  Instrument compiler
Process raw data
  Thresholds
  Generates training data
Induce heuristic
Integrate into compiler

Page 7: Learning for Optimizing  Compilers

The LOCO Methodology

[Diagram of the LOCO flow. Boxes: Instrumented Compiler, LOCO, Training Set, Supervised Learning, Production Compiler. Arrow labels: Generate raw learning data; Process raw data (thresholding); Rule induction; Induces heuristic.]

Page 8: Learning for Optimizing  Compilers

Experimental Setup

Java JIT compiler
  Jikes RVM 2.0.2
PowerPC 533 MHz G4, model 7410
Case Study 1: SPEC JVM benchmarks
Case Study 2: Scientific benchmarks
  Scheduling improves by 4% or more

Page 9: Learning for Optimizing  Compilers

Case Study 1: Hybrid Register Allocation

Page 10: Learning for Optimizing  Compilers

Motivation

Register allocation: important
  Effective use of registers
Different algorithms to choose from
  Graph coloring: possibly expensive
  Linear scan: not always effective
Which algorithm to apply?

Page 11: Learning for Optimizing  Compilers

Solution

Features predict which algorithm to use
Heuristic function controls allocator
  Reduces cost significantly
  Retains most benefit
Successful with simple features
Applicable to other optimizations
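A minimal sketch of how a heuristic function might control the allocator, as this slide describes. The function names, the feature names, and the cutoffs in `example_rule` are all hypothetical; the slides do not show the induced rule or the Jikes RVM interfaces.

```python
from typing import Any, Callable, Mapping

Features = Mapping[str, float]


def make_hybrid_allocator(prefers_graph_coloring: Callable[[Features], bool],
                          graph_coloring: Callable[[Any], Any],
                          linear_scan: Callable[[Any], Any]):
    """Build an allocator that picks an algorithm per method from its features."""
    def allocate(method: Any, features: Features) -> Any:
        if prefers_graph_coloring(features):
            return graph_coloring(method)  # costlier, used only when predicted to pay off
        return linear_scan(method)         # cheap default
    return allocate


def example_rule(f: Features) -> bool:
    # Hypothetical induced rule; feature names and cutoffs are made up.
    return f["intervals_total"] > 100 and f["block_size_avg"] > 8
```

The point of the design is that the rule is cheap to evaluate, so the choice itself adds little to compilation time.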

Page 12: Learning for Optimizing  Compilers

Hybrid Register Allocation

Page 13: Learning for Optimizing  Compilers

Features of Methods

Features                          Meaning
Out, In, and Exception Out Edges  Out, in, and exception out edges in CFG (total, avg)
Live on Entry, Live on Exit       Number of edges live on entry and exit (total, min, max)
Insts and Blocks                  Number of instructions and blocks in method (total)
Block size                        Size of blocks (max, min, avg)
Intervals                         Number of live intervals (max, total, avg)
Symbolics                         Number of symbolics (total, avg)

Page 14: Learning for Optimizing  Compilers

Hybrid Register Allocation

Page 15: Learning for Optimizing  Compilers

Inducing Heuristic Controller

For each block generate raw training data
  Features of method
  Additional spills incurred
  Cost of allocation algorithms
Process raw data to generate training set
Leave-one-out cross-validation
Output of LOCO = heuristic controller
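A minimal sketch of leave-one-out cross-validation. It assumes the unit left out is a whole benchmark (train on all benchmarks but one, test on the held-out one); the slide does not state the granularity, and the function name is illustrative.

```python
from typing import Dict, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def leave_one_out(groups: Dict[str, List[T]]) -> Iterator[Tuple[str, List[T], List[T]]]:
    """Yield (held_out_name, training_instances, test_instances) for each group."""
    for held_out in groups:
        train = [inst
                 for name, insts in groups.items() if name != held_out
                 for inst in insts]
        yield held_out, train, list(groups[held_out])
```

Each fold would induce a heuristic from `train` and evaluate it on the held-out instances, so no heuristic is ever tested on data it was trained on.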

Page 16: Learning for Optimizing  Compilers

Labeling Training Instances

Two factors:
  Cost of register allocation
  Spill benefit of different allocators
Prefer graph coloring
  If benefit above threshold
Prefer linear scan
  If graph coloring cost above threshold
  No spill benefit

Page 17: Learning for Optimizing  Compilers

Motivation for Threshold Technique

Noise reduction technique
Simplifies learning
  Removes cases of fine distinction
Separation by a threshold gap
For example:
  T = 10%: model estimates improvement by 10%

Page 18: Learning for Optimizing  Compilers

Thresholding

[Scatter plot of training instances. X-axis: LS Spills - GC Spills (log scale, 1.E-01 to 1.E+09). Y-axis: 1 - (LS Cost / GC Cost), 0 to 1. Marked lines: Cost Threshold (0.5) and Spill Threshold (8192). Legend: Graph Coloring, Linear Scan, No Instance.]

Page 19: Learning for Optimizing  Compilers

Labeling Training Instances

if (LS_Spill - GC_Spill > Spill_Threshold)
    Print "GC";                 // High spill benefit
else if (LS_Cost / GC_Cost > Cost_Threshold)
    Print "LS";                 // High cost
else if (LS_Spill - GC_Spill <= 0)
    Print "LS";                 // No spill benefit
else
    { /* No label */ }          // Skip training instance
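The labeling rule on this slide can be sketched as a runnable function. The comparisons are transcribed as they appear on the slide; the default thresholds (8192 and 0.5) are the values shown on the Thresholding plot, and the function name and the use of `None` for "no label" are assumptions.

```python
from typing import Optional


def label_instance(ls_spill: float, gc_spill: float,
                   ls_cost: float, gc_cost: float,
                   spill_threshold: float = 8192,
                   cost_threshold: float = 0.5) -> Optional[str]:
    """Return "GC", "LS", or None (skip this training instance).

    gc_cost must be nonzero.
    """
    spill_benefit = ls_spill - gc_spill   # spills avoided by graph coloring
    if spill_benefit > spill_threshold:
        return "GC"                       # high spill benefit
    if ls_cost / gc_cost > cost_threshold:
        return "LS"                       # the slide's "high cost" case
    if spill_benefit <= 0:
        return "LS"                       # no spill benefit
    return None                           # fine distinction: drop the instance
```

Instances that return `None` are left out of the training set, which is how thresholding removes the hard-to-separate cases.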

Page 20: Learning for Optimizing  Compilers

Threshold Example

Page 21: Learning for Optimizing  Compilers

Spill Loads (Opt Level 3, 8 Regs)

[Bar chart. Y-axis: 0 to 1.8. Bars: GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, LS.]

Page 22: Learning for Optimizing  Compilers

Benchmark Running Times (Opt Level 3, 8 Regs)

[Bar chart. Y-axis: 0 to 1.2. Bars: GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, LS.]

Page 23: Learning for Optimizing  Compilers

Register Allocation Stats (Opt Level 3, 8 Regs)

REG ALG   Run Time   Allocation Cost
GC        91.9%      100%
B0C0      93.4%      83.0%
B8kC0     93.1%      71.2%
B64kC0    93.7%      66.7%
B0C50     93.3%      82.4%
B8kC50    94.0%      40.9%
B64kC50   96.6%      27.9%
LS        100%       13.0%

Page 24: Learning for Optimizing  Compilers

Register Allocation Cost (Opt Level 3, 8 Regs)

[Bar chart. Y-axis: 0 to 1.2. Bars: GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, LS.]

Page 25: Learning for Optimizing  Compilers

Hybrid Register Allocation is Successful

Significantly reduce register allocation time
  Reduced allocation time by 60%
Preserve benefit of graph coloring
  Achieved 93% of graph coloring benefit
LOCO effective for this heuristic

Page 26: Learning for Optimizing  Compilers

Case Study 2: Instruction Scheduling Filters

Page 27: Learning for Optimizing  Compilers

Motivation

Instruction scheduling: important
  Improvements over 15%
But:
  Expensive
  Frequently not beneficial
Problem: Can we predict which blocks benefit from scheduling?

Page 28: Learning for Optimizing  Compilers

Solution

Features of block predict when to schedule
Heuristic controls scheduling
  Reduces cost of scheduling
  Retains benefit of scheduling
Successful with simple features
Filter for applying scheduler

Page 29: Learning for Optimizing  Compilers

An Optimization Filter

Page 30: Learning for Optimizing  Compilers

Features of Block

Features                            Kind             Meaning
BBLen                               Block size       Number of instructions
Load, Store, Branch, Call, Return   Operation        Fraction of that type of instruction
Integer, Float, System              Functional unit  Fraction of instructions that execute on that FU
PEI, GC, Yield, Thread              Hazard           Fraction of that type of hazard instruction
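A minimal sketch of computing the features in this table for one block. It assumes each instruction is represented as a set of category tags (an instruction may carry an operation tag, a functional-unit tag, and a hazard tag at once); that representation and the lowercase tag names are illustrative, not taken from Jikes RVM.

```python
from collections import Counter
from typing import Dict, List, Set

CATEGORIES = (
    "load", "store", "branch", "call", "return",  # operation kinds
    "integer", "float", "system",                 # functional units
    "pei", "gc", "yield", "thread",               # hazards
)


def block_features(instructions: List[Set[str]]) -> Dict[str, float]:
    """Return BBLen plus, per category, the fraction of instructions tagged with it."""
    n = len(instructions)
    counts = Counter(tag for tags in instructions for tag in tags)
    features = {"bbLen": float(n)}
    for cat in CATEGORIES:
        features[cat] = counts[cat] / n if n else 0.0
    return features
```

All of these are computable in a single pass over the block, which keeps the features cheap relative to running the scheduler.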

Page 31: Learning for Optimizing  Compilers

Inducing a Filter

Construct cheap-to-compute features of a block
Obtain training instances that include:
  Features of the block
  Labels (scheduling benefit to block)
Induce a filter using LOCO
  We used rule induction
Use the filter to control when compiler schedules
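A minimal sketch of the last step, using a filter to decide per block whether to run the scheduler. All names are hypothetical, and the filter is passed in as a plain predicate because the slides do not show the induced rules.

```python
from typing import Callable, Dict, List, TypeVar

Block = TypeVar("Block")


def schedule_with_filter(blocks: List[Block],
                         extract_features: Callable[[Block], Dict[str, float]],
                         should_schedule: Callable[[Dict[str, float]], bool],
                         schedule: Callable[[Block], Block]) -> List[Block]:
    """Run the scheduler only on blocks the filter predicts will benefit."""
    result = []
    for block in blocks:
        if should_schedule(extract_features(block)):
            result.append(schedule(block))  # pay the scheduling cost
        else:
            result.append(block)            # leave the block in its original order
    return result
```

The filter only saves time if `extract_features` and `should_schedule` together cost much less than `schedule`, which is why the features must be cheap to compute.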

Page 32: Learning for Optimizing  Compilers

Block Timing Estimator

Estimate of cycles to execute block
  Simple model of real machine
Determines cost of block in isolation
Relative cycle differences important
  Not absolute cycle counts

Page 33: Learning for Optimizing  Compilers

Labeling using Thresholds
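A minimal sketch of threshold labeling for scheduling, combining the block timing estimator with the threshold gap described earlier. It assumes a block is a positive example when the estimated improvement is at least T percent, a negative example when there is no estimated improvement, and dropped otherwise; the exact cutoffs, label names, and function name are assumptions.

```python
from typing import Optional


def label_block(cycles_unscheduled: float,
                cycles_scheduled: float,
                threshold_pct: float = 10.0) -> Optional[str]:
    """Label a block from estimated cycle counts; None means skip the instance.

    cycles_unscheduled must be positive.
    """
    improvement_pct = (100.0 * (cycles_unscheduled - cycles_scheduled)
                       / cycles_unscheduled)
    if improvement_pct >= threshold_pct:
        return "schedule"       # clear benefit
    if improvement_pct <= 0.0:
        return "no-schedule"    # no benefit
    return None                 # inside the threshold gap: not used for training
```

Only the relative difference between the two estimates matters here, which matches the estimator slide's point that absolute cycle counts are not needed.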

Page 34: Learning for Optimizing  Compilers

Running Time with Filtering

[Bar chart. Y-axis: Ratio to Not Scheduling, 80 to 105. Benchmarks: aes, bh, linpack, power, voronoi, scimark, geomean. Series: 0%, 5%, 10%, 15%, 20%, 25%, LS.]

Page 35: Learning for Optimizing  Compilers

Running Time with Filtering

[Bar chart. Y-axis: Percent of Not Scheduling, 80 to 105. Benchmarks: aes, bh, linpack, power, voronoi, scimark, geomean. Series: 0%, 5%, 10%, 15%, 20%, 25%, LS.]

Page 36: Learning for Optimizing  Compilers

Running Time with Filtering

[Bar chart. Y-axis: Percent of Not Scheduling, 80 to 105. Benchmarks: aes, bh, linpack, power, voronoi, scimark, geomean. Series: 0%, 5%, 10%, 15%, 20%, 25%, LS.]

Page 37: Learning for Optimizing  Compilers

Scheduling Time with Filtering

[Bar chart. Y-axis: Fraction of LS time, 0 to 100. Benchmarks: aes, bh, linpack, power, voronoi, scimark, geomean. Series: 0%, 5%, 10%, 15%, 20%, 25%.]

Page 38: Learning for Optimizing  Compilers

Scheduling Time with Filtering

[Bar chart. Y-axis: Fraction of LS time, 0 to 100. Benchmarks: aes, bh, linpack, power, voronoi, scimark, geomean. Series: 0%, 5%, 10%, 15%, 20%, 25%.]

Page 39: Learning for Optimizing  Compilers

Filtering Statistics

[Bar chart. Y-axis: 0% to 40%. Groups: Sched Blocks, Sched Insts, Filter/Sched, Sched/Comp. Series: 0%, 5%, 10%, 15%, 20%, 25%.]

Page 40: Learning for Optimizing  Compilers

Filters are Successful

Significantly reduce scheduling time
  Reduced scheduling time by 75%
Preserve benefit of scheduling
  Achieved 93% of scheduling benefit
LOCO effective for this heuristic

Page 41: Learning for Optimizing  Compilers

Related Work

Supervised learning
  Loop unrolling and tiling
Genetic algorithms
  Hyperblocks, register allocation, prefetching (MIT)
  Application-specific compilation strategy (Rice)
Reinforcement learning
  Used to induce heuristic for scheduling (UMass)
We argue LOCO is better

Page 42: Learning for Optimizing  Compilers

Future Work

More work on filters
  Inlining and SSA-based opts
More work on hybrid optimizations
  Garbage collection
More work on priority functions
  Register allocation spill heuristic
Use LOCO anywhere a heuristic is used

Page 43: Learning for Optimizing  Compilers

Conclusion

LOCO effective at constructing heuristics
  Faster than most alternatives
LOCO can lead to insights
  More readable than other alternatives
LOCO heuristics competitive
  Comparable to hand-tuned heuristics
LOCO easier to use

Page 44: Learning for Optimizing  Compilers

Spill Loads (Opt Level 1, 8 Regs)

[Bar chart. Y-axis: 0 to 2.5. Bars: GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, LS.]

Page 45: Learning for Optimizing  Compilers

Register Allocation Cost (Opt Level 1, 8 Regs)

[Bar chart. Y-axis: 0 to 1.2. Bars: GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, LS.]

Page 46: Learning for Optimizing  Compilers

Benchmark Running Times (Opt Level 1, 8 Regs)

[Bar chart. Y-axis: 0 to 1.2. Bars: GC, B0C0, B8kC0, B64kC0, B0C50, B8kC50, B64kC50, LS.]