Adapting Convergent Scheduling Using Machine
Learning
Diego Puppin*, Mark Stephenson†, Una-May O’Reilly†, Martin Martin†, and
Saman Amarasinghe†
*Institute for Information Science and Technologies, Italy; †Massachusetts Institute of Technology, USA
Outline
This talk shows how one can apply machine learning techniques to find good phase orderings for an instruction scheduler
First, I’ll introduce the scheduler that we are interested in improving
Then, I’ll discuss genetic programming
Then, I’ll present experimental results
[Diagram: R4000-like processor core with operand network]
Clustered Architectures
Memory and registers are separated into clusters
Examples: RAW, clustered VLIWs
When scheduling, we try to co-locate data with computation
Convergent Scheduling
Convergent scheduling passes are symmetric
Each pass takes as input a preference map and outputs a preference map
Passes are modular and can be applied in any order
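As an illustration of this interface, here is a minimal Python sketch, assuming each pass is simply a function from a preference map to a preference map. The pass names and bodies below are invented stand-ins; the real passes operate on full space-time preference maps.

```python
# Hypothetical sketch of the convergent-scheduling pass interface:
# every pass maps a preference map to a new preference map, so passes
# compose in any order. Names and transforms are illustrative only.

def noise_introduction(prefs):
    """Slightly flatten all preferences (illustrative pass)."""
    return {c: 0.9 * w + 0.025 for c, w in prefs.items()}

def load_balance(prefs):
    """Penalize the most-preferred cluster a little (illustrative pass)."""
    if not prefs:
        return prefs
    hot = max(prefs, key=prefs.get)
    out = dict(prefs)
    out[hot] *= 0.8
    return out

def run_pipeline(prefs, passes):
    """Apply passes in the given order; any order is legal."""
    for p in passes:
        prefs = p(prefs)
    return prefs

# prefs: weight of assigning one instruction to each of four clusters
prefs = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}
a = run_pipeline(prefs, [noise_introduction, load_balance])
b = run_pipeline(prefs, [load_balance, noise_introduction])
# Both orders yield valid preference maps (the weights differ).
```

Because every pass shares the same signature, reordering the pipeline is just reordering the list — which is exactly what makes the phase-ordering search space so large.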
Convergent Scheduling: Preference Maps
[Figure: preference map grid — instructions 0–7 against four clusters over time slots 0–3, shaded from low confidence to high confidence]
Each entry is a weight
The weights correspond to the “confidence” of a space-time assignment for a given instruction
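A preference map can be pictured as a 3-D weight table indexed by (instruction, cluster, time). The sketch below, with made-up sizes and random weights, shows how a final schedule could be read off by taking the highest-confidence slot per instruction.

```python
import random

random.seed(0)
N_INSTS, N_CLUSTERS, N_SLOTS = 8, 4, 4  # illustrative sizes
# prefs[i][c][t] = confidence of placing instruction i on cluster c at time t
prefs = [[[random.random() for _ in range(N_SLOTS)]
          for _ in range(N_CLUSTERS)]
         for _ in range(N_INSTS)]

def finalize(prefs):
    """Pick the highest-confidence (cluster, time) slot per instruction."""
    schedule = []
    for inst in prefs:
        best = max((w, c, t)
                   for c, row in enumerate(inst)
                   for t, w in enumerate(row))
        schedule.append((best[1], best[2]))
    return schedule

schedule = finalize(prefs)  # one (cluster, slot) pair per instruction
```

In the real scheduler, passes repeatedly reshape these weights before anything is finalized — this snippet only shows the shape of the data and the last step.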
Example Dependence Graph
Critical Path Strengthening
Path Propagation
Parallelism Distribution
Path Propagation
Final Schedule
Convergent Scheduling
“Classical” scheduling passes make absolute decisions that can’t be undone
Convergent scheduling passes make soft decisions in the form of preferences
Mistakes made early on can be undone
Passes don’t impose order!
Double-Edged Sword
The good news: convergent scheduling does not constrain phase order
A nice interface makes writing and integrating passes easy
The bad news: convergent scheduling does not constrain phase order
There is a limitless number of phase orders to consider, some of which are much better than others
Our Proposal
Use genetic programming to automatically search for a phase ordering that’s catered to a given architecture and compiler
Our inspiration comes from Cooper’s work [Cooper et al., LCTES 1999]
Genetic Programming
Searching algorithm analogous to Darwinian evolution
Maintain a population of expressions, e.g.:
(sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))
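One way to picture how such an expression drives the scheduler is as a small interpreter that flattens the tree into a list of passes to run. This is a hedged sketch: the leaf names come from the example above, but the `imbalanced` predicate and the tuple encoding are made up for illustration.

```python
# Interpret a phase-ordering expression like
#   (sequence INITTIME (sequence PLACE (if imbalanced LOAD COMM)))
# Leaves name passes, `sequence` concatenates, and `if` tests a
# property of the partial schedule (here a made-up boolean flag).

def interpret(expr, state):
    """Flatten an expression tree into a list of pass names."""
    if isinstance(expr, str):                 # leaf: a pass name
        return [expr]
    op = expr[0]
    if op == "sequence":                      # ("sequence", left, right)
        return interpret(expr[1], state) + interpret(expr[2], state)
    if op == "if":                            # ("if", predicate, then, else)
        branch = expr[2] if state[expr[1]] else expr[3]
        return interpret(branch, state)
    raise ValueError(f"unknown operator: {op}")

expr = ("sequence", "INITTIME",
        ("sequence", "PLACE", ("if", "imbalanced", "LOAD", "COMM")))

print(interpret(expr, {"imbalanced": True}))   # ['INITTIME', 'PLACE', 'LOAD']
print(interpret(expr, {"imbalanced": False}))  # ['INITTIME', 'PLACE', 'COMM']
```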
Genetic Programming
Searching algorithm analogous to Darwinian evolution
Maintain a population of expressions
Selection
The fittest expressions in the population are more likely to reproduce
Reproduction
Crossing over subexpressions of two expressions
Mutation
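Crossover and mutation on expression trees can be sketched as follows. This toy version handles only `sequence` trees encoded as nested tuples; the pass names are taken from the earlier example, and the tree-editing helpers are invented for illustration.

```python
# Illustrative GP crossover and mutation on phase-ordering trees,
# encoded as ("sequence", left, right) tuples with pass-name leaves.
import random

random.seed(1)
PASSES = ["INITTIME", "PLACE", "LOAD", "COMM", "DEP"]

def nodes(expr, path=()):
    """Enumerate (path, subtree) pairs for every node in the tree."""
    yield path, expr
    if isinstance(expr, tuple):
        yield from nodes(expr[1], path + (1,))
        yield from nodes(expr[2], path + (2,))

def replace(expr, path, sub):
    """Return a copy of expr with the subtree at `path` swapped for `sub`."""
    if not path:
        return sub
    children = list(expr)
    children[path[0]] = replace(expr[path[0]], path[1:], sub)
    return tuple(children)

def crossover(a, b):
    """Swap a random subtree of `a` for a random subtree of `b`."""
    pa, _ = random.choice(list(nodes(a)))
    _, sb = random.choice(list(nodes(b)))
    return replace(a, pa, sb)

def mutate(expr):
    """Replace a random subtree with a fresh random pass leaf."""
    p, _ = random.choice(list(nodes(expr)))
    return replace(expr, p, random.choice(PASSES))

a = ("sequence", "INITTIME", ("sequence", "PLACE", "COMM"))
b = ("sequence", "DEP", "LOAD")
child = crossover(a, b)    # mixes subexpressions of a and b
mutant = mutate(a)         # a with one subtree randomly replaced
```

Both operators always produce a syntactically valid phase ordering, since any subtree is itself a valid expression.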
General Flow
[Flow diagram: Create initial population (initial solutions) → Evaluation → Selection → Create Variants → done? (loop until done)]
Randomly generated initial population
General Flow: Evaluation
Compiler is modified to use the given expression as the phase ordering
Each expression is evaluated by compiling and running the benchmark(s)
Fitness is the relative speedup over our original phase ordering on the benchmark(s)
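The evaluation step can be sketched as below. `compile_and_run` is a hypothetical stand-in for invoking the modified compiler and the simulator and returning cycle counts; the baseline numbers are made up. Fitness is the speedup relative to the original phase ordering, averaged over the benchmarks.

```python
# Hedged sketch of the fitness computation. All names and numbers
# here are invented placeholders for the real compile-and-simulate loop.

BASELINE_CYCLES = {"vvmul": 1000, "yuv": 2400}   # made-up cycle counts

def compile_and_run(expression, benchmark):
    """Placeholder: pretend to compile with `expression` and simulate."""
    # Fake effect: longer orderings run marginally slower here.
    return BASELINE_CYCLES[benchmark] * (1.0 + 0.01 * len(expression))

def fitness(expression, benchmarks):
    """Average speedup over the baseline phase ordering."""
    speedups = [BASELINE_CYCLES[b] / compile_and_run(expression, b)
                for b in benchmarks]
    return sum(speedups) / len(speedups)

f = fitness(["inittime", "place"], ["vvmul", "yuv"])
# f > 1 would mean the candidate ordering beats the baseline.
```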
General Flow: Selection
Just as with Natural Selection, the fittest individuals are more likely to survive
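One common way to realize this is fitness-proportional (roulette-wheel) selection, sketched here with made-up individuals whose fitness values stand in for the relative speedups from the evaluation step.

```python
# Fitness-proportional selection: probability of being picked is
# proportional to fitness. Individuals and fitness values are invented.
import random

random.seed(0)
population = [("seq-a", 1.4), ("seq-b", 1.1), ("seq-c", 0.7)]  # (expr, fitness)

def select(population):
    """Pick one individual with probability proportional to fitness."""
    exprs, fits = zip(*population)
    return random.choices(exprs, weights=fits, k=1)[0]

picks = [select(population) for _ in range(1000)]
# The fittest sequence ("seq-a") is picked most often, but the
# weaker ones still survive occasionally, preserving diversity.
```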
General Flow: Create Variants
Use crossover and mutation to generate new expressions
And thus, generate new and hopefully improved phase orderings
Experimental Setup
We use an in-house VLIW compiler (SUIF, MachSUIF) and simulator
Compiler and simulator are parameterized so we can easily change VLIW configurations
Experiments presented here are for clustered architectures
Details of the architectures are in the paper
Convergent Scheduling Heuristics
Noise Introduction, Initial Time Assignment, Preplacement, Critical Path Strengthening, Communication Minimization, Parallelism Distribution, Load Balance, Dependence Enforcement, Assignment Strengthening, Functional Unit Distribution, Push to First Cluster, Critical Path Distance, Cluster Creation, Register Pressure Reduction in Time, Register Pressure Reduction in Space
Hand-Tuned Results: 4-cluster VLIW, Rich Interconnect
[Chart: speedup (0–4) of PCC, UAS, and convergent scheduling on vvmul, rbsorf, yuv, tomcatv, mxm, fir, cholesky]
Results: 4-cluster VLIW, Limited Interconnect
Training an Improved Sequence
Goal: find a sequence that works well for all the benchmarks in the last graph (vvmul, rbsorf, yuv, etc.)
Train a sequence using these benchmarks, then…
For each expression in the population, compile and run all the benchmarks, and take the average speedup as fitness
The Schedule
Evolved sequence is much more conservative in communication
inittime func dep func load func dep func comm dep func comm place
func reduces the weights of instructions on overloaded clusters
dep increases the probability that a dependent instruction is scheduled “nearby”
comm tries to keep neighboring instructions in the same cluster
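The evolved sequence can be pictured as a straight-line pass pipeline. In the toy sketch below, only `func` does anything; the other passes are identity stand-ins, since their real behavior depends on the dependence graph and the partial schedule. Everything except the pass names is invented.

```python
# Run the evolved sequence as a pipeline over a per-cluster weight map
# for one instruction. Pass bodies are illustrative stand-ins only.

EVOLVED = ("inittime func dep func load func dep func "
           "comm dep func comm place").split()

def make_passes():
    """Map pass names to toy transforms on a per-cluster weight dict."""
    def func(p):
        # Stand-in for "reduce weights on overloaded clusters":
        # shrink the most-preferred cluster's weight by 10%.
        hot = max(p, key=p.get)
        return {c: (w * 0.9 if c == hot else w) for c, w in p.items()}
    def identity(p):
        return dict(p)
    # dep, comm, inittime, load, place would inspect the dependence
    # graph and the schedule; here they are identity stand-ins.
    return {"func": func, "dep": identity, "comm": identity,
            "inittime": identity, "load": identity, "place": identity}

def run(sequence, prefs):
    passes = make_passes()
    for name in sequence:
        prefs = passes[name](prefs)
    return prefs

# func appears five times in EVOLVED, so the hot cluster's weight
# is scaled by 0.9 five times.
result = run(EVOLVED, {0: 1.0, 1: 0.5, 2: 0.5, 3: 0.5})
```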
Results: 4-cluster VLIW, Limited Interconnect
Results: Leave-One-Out Cross Validation
Summary of Results
When we changed the architecture, the hand-tuned sequence failed
UAS and PCC outperform convergent scheduling
Our GP system found a sequence that usually outperforms UAS and PCC
Cross validation suggests that it is possible to find a “general-purpose” sequence
Running Time
Using about 20 machines in a small cluster of workstations, it takes about 2 days to evolve a sequence
This is a one-time process!
It is performed by the compiler vendor
Disappointing Result
Unfortunately, sequences with conditionals are weeded out of the GP selection process
Our system rewards parsimony
Convergent scheduling passes make soft decisions, so running an extra pass may not be detrimental
We’d like to get to the bottom of this unexpected result
Conclusions
Using GP we’re able to find architecture-specific, application-independent sequences
We can quickly retune the compiler when:
The architecture changes
The compiler itself changes
Implemented Tests