Automatic Compilation for Domain Specific Accelerators · 2020. 8. 5. · Mem Tile PEak-Specified...


Automatic Compilation for Domain Specific Accelerators

Ross Daly, Caleb Donovick, Jackson Melchert

Golden Age of Computer Architecture!

• Architecture specifications change frequently
• The compiler is the (often overlooked) key component!
• Waterfall methodology:

Application Analysis
Architectural Specification
RTL Design and Test
Physical Design
Software / Compiler Design

• Agile methodology:
• Automatically generate the compiler for every spec change

[Diagram labels: Base Hardware Accelerator v0, Compiler Toolchain v0, Application 1, Application 2; Power, Performance, Area; Incremental Updates; Base Hardware Accelerator v1, Compiler Toolchain v1, Application 2.1, Application 3]

CPU

• Compile to IR (LLVM)
• Common Optimizations
• Instruction Selection
• Peephole Optimization
• Instruction Scheduling
• Register Allocation
• Assembly

CGRA/FPGA

• Compile to IR (CoreIR)
• Common Optimizations
• Mapping
• Packing
• Placement
• Routing
• Bitfile generation

CGRA Mapping

Application Halide Program -> (Lower) -> CoreIR Graph -> (Map PE and Memory) -> Mapped CoreIR Graph -> CGRA Bitstream

Our DSL-based Hardware Generation and Software Compilation Flow

Hardware generation:
PEak Program (PE spec) -> PEak Compiler -> PE HW in Magma -> Magma Compiler -> CGRA Verilog
Lake Program (MEM spec) -> Lake Compiler -> MEM HW in Magma -> Magma Compiler -> CGRA Verilog

Software compilation:
Application Halide Program -> Halide Compiler -> CoreIR Graph -> PE and MEM Mapper -> Mapped CoreIR Graph -> Place & Route Engine -> CGRA Bitstream

Compiler Collateral: produced on the hardware-generation side and consumed by the PE and MEM Mapper.

Output of Halide Compiler

[Figure: CoreIR Graph of Unified Buffers and Computation Kernels, with data arriving from the Global Buffer and returning to the Global Buffer]

Desired Output of Mapper

[Figure: Mapped CoreIR Graph of Lake-Specified Mem Tiles and PEak-Specified PE Tiles, with data arriving from the Global Buffer and returning to the Global Buffer]

Kernels are composed of CoreIR Primitives

[Figure: a Computational Kernel built from the CoreIR Primitives add, add, sub, ashr, div, mul, mul, with inputs From Buffer/IO and outputs To Buffer/IO]

CoreIR has SMT QF_BV (quantifier-free bit-vector) semantics

CoreIR.Sub (inputs In0, In1; output Out): Out = In0 - In1
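Under fixed-width bit-vector semantics, subtraction wraps modulo 2^n rather than producing a negative number. A minimal sketch in plain Python, assuming a 16-bit width; the width and the helper name `bv_sub` are illustrative and not part of CoreIR:

```python
WIDTH = 16                 # assumed word width, for illustration only
MASK = (1 << WIDTH) - 1

def bv_sub(in0: int, in1: int) -> int:
    """Fixed-width subtraction: the result wraps modulo 2**WIDTH."""
    return (in0 - in1) & MASK

print(bv_sub(5, 3))        # 2
print(hex(bv_sub(3, 5)))   # 0xfffe, i.e. -2 in two's complement
```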

Mapping

[Figure: a Kernel of CoreIR Primitives is mapped to a Mapped Kernel of PEak-Specified PE Tiles]

Instruction Selection

• Input: Kernel of CoreIR Primitives
• Rewrite Rule Table: Rewrite Rules 1-4, each with a Cost (patterns shown: div; mul add; sub; ashr add; costs shown: 4.3, 6.0, 3.1, 1.2)
• Instruction Selection Algorithm
• Output: Mapped Kernel of PEak-Specified PE Tiles
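One way to picture cost-driven instruction selection is as a minimum-cost covering of an expression tree with patterns from the rule table. The sketch below is only an illustration of that idea, not the algorithm used in the talk: the `Node` type, the `RULES` table, and its cost values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    op: str                          # primitive name, or "in" for a leaf input
    args: Tuple["Node", ...] = ()

# Hypothetical rewrite rule table: pattern -> cost.
# A pattern is (root_op, child_ops); None as a child op means "any subtree".
RULES = {
    ("add", (None, None)): 1.0,
    ("mul", (None, None)): 1.0,
    ("add", ("mul", None)): 1.5,     # multiply-add fused into one tile
}

def match(node, pattern):
    """Return the subtrees left uncovered if pattern matches at node, else None."""
    root_op, child_ops = pattern
    if node.op != root_op or len(node.args) != len(child_ops):
        return None
    rest = []
    for child, want in zip(node.args, child_ops):
        if want is None:
            rest.append(child)
        elif child.op == want:
            rest.extend(child.args)
        else:
            return None
    return rest

def select(node):
    """Return (minimum total cost, chosen patterns) for the tree rooted at node."""
    if node.op == "in":
        return 0.0, []
    best = None
    for pattern, cost in RULES.items():
        rest = match(node, pattern)
        if rest is None:
            continue
        total, chosen = cost, [pattern]
        for sub in rest:
            sub_cost, sub_patterns = select(sub)
            total += sub_cost
            chosen += sub_patterns
        if best is None or total < best[0]:
            best = (total, chosen)
    if best is None:
        raise ValueError(f"no rewrite rule covers {node.op}")
    return best

a, b, c = Node("in"), Node("in"), Node("in")
expr = Node("add", (Node("mul", (a, b)), c))
print(select(expr))   # (1.5, [('add', ('mul', None))])
```

Here the fused pattern wins because its cost (1.5) is lower than covering `mul` and `add` separately (2.0).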

PEak Compiler generates a table of Rewrite Rules

[Figure: the compilation flow (Application Halide Program, Halide Compiler, CoreIR Graph, PE and MEM Mapper, Mapped CoreIR Graph, Place & Route Engine, CGRA Bitstream), with the PEak Compiler turning the PEak Program (PE spec) into the Rewrite Rule Table (Rewrite Rules 1-4; patterns div, mul add, sub, ashr add)]

PEak: PE DSL

PE Functional Specification

    class PE(Peak):
        def __call__(self, inst: Const(Instruction), A: Word, B: Word, C: Word) -> {"res": Word, "flag": Bit}:
            if inst.invert_A:
                A = ~A
            if inst.op == Opcode.Add:
                res, c_out = A.add(B, inst.c_in)
                flag = c_out
            elif inst.op == Opcode.Mul:
                res = A * B
                flag = (res == 0)
            elif ...:
                ...
            return res, flag

PE ISA Specification

    class Opcode(Enum):
        Add = 0
        Mul = 1
        …

    # Define Instruction
    class Instruction(Product):
        op = Opcode
        invert_A = Bit
        c_in = Bit

    # Define Word
    Word = UnsignedBitVector[16]

Specific types (or composition of types) for operands and instructions
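To make the behaviour of the two listed opcodes concrete, here is a plain-integer model that runs without the PEak library. It is a sketch under assumptions: `A.add(B, c_in)` is modelled as add-with-carry returning the carry out, the elided opcodes are left unimplemented, and the standard-library `Enum` and `dataclass` stand in for the PEak types.

```python
from dataclasses import dataclass
from enum import Enum

WIDTH = 16
MASK = (1 << WIDTH) - 1

class Opcode(Enum):
    Add = 0
    Mul = 1

@dataclass(frozen=True)
class Instruction:
    op: Opcode
    invert_A: int
    c_in: int

def pe(inst: Instruction, A: int, B: int, C: int = 0):
    """Plain-integer model of the Add and Mul cases; returns (res, flag)."""
    if inst.invert_A:
        A = ~A & MASK
    if inst.op == Opcode.Add:
        total = A + B + inst.c_in
        return total & MASK, total >> WIDTH      # flag is the carry out
    if inst.op == Opcode.Mul:
        res = (A * B) & MASK
        return res, int(res == 0)                # flag is the zero flag
    raise NotImplementedError(inst.op)

print(pe(Instruction(Opcode.Add, 0, 0), 0xFFFF, 1))  # (0, 1): wraps, with carry out
print(pe(Instruction(Opcode.Mul, 0, 0), 3, 4))       # (12, 0)
```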

Subtract?

[Figure: PE with inputs inst, A, B, C and outputs res, flag]

Using the PE Functional Specification above, set:

inst = Instruction(op=Add, invert_A=1, c_in=1)

res = ~A + B + 1 = B - A
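The identity follows from two's complement: at width n, ~A = 2^n - 1 - A, so ~A + B + 1 = B - A modulo 2^n. A small exhaustive check, assuming an 8-bit width to keep the enumeration short (the helper name is illustrative):

```python
WIDTH = 8                          # small width so the check can be exhaustive
MASK = (1 << WIDTH) - 1

def add_inverted_with_carry(a: int, b: int) -> int:
    """~A + B + 1 at fixed width, i.e. op=Add, invert_A=1, c_in=1."""
    return ((~a & MASK) + b + 1) & MASK

ok = all(
    add_inverted_with_carry(a, b) == (b - a) & MASK
    for a in range(1 << WIDTH)
    for b in range(1 << WIDTH)
)
print(ok)  # True
```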

RISC-V PEak Specification

Define sub-components and state

    class RISCV(Peak):
        def __init__(self):
            self.rf = RegisterFile(32, Word)
            self.PC = Register(Data)

        def __call__(self, inst: Instruction) -> {"next_PC": Word}:
            # ID
            rs1_idx, rs2_idx, rd_idx, … = decode(inst)
            rs1_val, rs2_val = self.rf.read(rs1_idx, rs2_idx)
            # EX
            ...
            # MEM
            ...
            # WB
            self.rf.write(rd_val)

RISC-V ISA Specification with Algebraic Data Types

    class Register(Product):
        funct7 = Funct7Enum
        rs2 = BitVector[5]
        rs1 = BitVector[5]
        funct3 = Funct3Enum
        rd = BitVector[5]
        opcode = Opcode

    class Immediate(Product): ...
    class UImmediate(Product): ...
    class Store(Product): ...
    class Branch(Product): ...
    class Jump(Product): ...

    Instruction = Sum[Register, Immediate, UImmediate, Store, Branch, Jump]
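The product/sum structure can be imitated in standard Python with dataclasses (products) and `Union` (sum). This is a sketch, not the PEak ADT library: the class names `RType` and `IType` and the `encode` helper are hypothetical, and the I-type field layout is taken from the RISC-V base ISA rather than from the slides, which elide it.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class RType:            # product type: every field is present
    funct7: int
    rs2: int
    rs1: int
    funct3: int
    rd: int
    opcode: int

@dataclass(frozen=True)
class IType:            # layout assumed from the RISC-V base ISA
    imm: int
    rs1: int
    funct3: int
    rd: int
    opcode: int

Instruction = Union[RType, IType]   # sum type: exactly one variant

def encode(inst: Instruction) -> int:
    """Pack a variant into a 32-bit word using RISC-V field positions."""
    if isinstance(inst, RType):
        return (inst.funct7 << 25 | inst.rs2 << 20 | inst.rs1 << 15
                | inst.funct3 << 12 | inst.rd << 7 | inst.opcode)
    if isinstance(inst, IType):
        return (inst.imm << 20 | inst.rs1 << 15
                | inst.funct3 << 12 | inst.rd << 7 | inst.opcode)
    raise TypeError(type(inst))

# sub x3, x1, x2
print(hex(encode(RType(0b0100000, 2, 1, 0b000, 3, 0b0110011))))  # 0x402081b3
```

Dispatching on the variant type is what makes the sum type useful: each instruction format carries only the fields that format defines.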

Multiple Interpretations of PEak Specification

• A PEak program uses abstract types provided by the PEak DSL, such as Bit and BitVector
• Each component of the PEak compiler provides a separate concrete implementation of these abstract types
• This gives multiple interpretations of a PEak specification in different contexts

Python Context: PEak Program + BitVector -> Functional Model
Magma Context: PEak Program + Bits -> RTL
SMT Context: PEak Program + SMTBitVector -> Symbolic Representation (for Rewrite Rules)

SINGLE SOURCE OF TRUTH: PEak Program
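The idea of one specification with several interpretations can be sketched with two toy bit-vector classes that share an operator interface. The classes `IntBV` and `SymBV` below are hypothetical stand-ins, not the PEak, Magma, or SMT types; the symbolic one simply builds an SMT-LIB-style expression string.

```python
class IntBV:
    """Concrete interpretation: 16-bit values held as Python ints."""
    def __init__(self, v): self.v = v & 0xFFFF
    def __add__(self, o): return IntBV(self.v + o.v)
    def __invert__(self): return IntBV(~self.v)
    def __repr__(self): return f"IntBV({self.v})"

class SymBV:
    """Symbolic interpretation: builds an expression string instead of a value."""
    def __init__(self, expr): self.expr = expr
    def __add__(self, o): return SymBV(f"(bvadd {self.expr} {o.expr})")
    def __invert__(self): return SymBV(f"(bvnot {self.expr})")
    def __repr__(self): return self.expr

def spec(a, b, one):
    """One specification, written once against the operator interface."""
    return ~a + b + one

print(spec(IntBV(3), IntBV(5), IntBV(1)))              # IntBV(2)
print(spec(SymBV("a"), SymBV("b"), SymBV("#x0001")))   # (bvadd (bvadd (bvnot a) b) #x0001)
```

The function `spec` never changes; only the types passed in decide whether it computes a value or builds a formula.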

Discovering a Rewrite Rule

[Figure: CoreIR.Sub (inputs In0, In1; output Out) alongside the PE (inputs inst, A, B, C; outputs res, flag)]

• Input/Output Bindings
• Setting Constants: inst = Instruction(op=Add, invert_A=1, c_in=1)

∃(input_binding, inst) s.t. ∀(in0, in1, other):

    CoreIR.Sub(in0, in1) == PE(inst, input_binding(in0, in1, other))['res']
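The exists/forall query can be illustrated by brute force at a small bit width: enumerate candidate instructions and input bindings, and accept one only if the PE matches the target for every input. This is a sketch under assumptions and does not use an SMT solver as the talk does; the toy `pe`, the 4-bit width, and the tuple encoding of `inst` are all illustrative.

```python
from itertools import permutations, product

WIDTH = 4                          # small width so "for all inputs" can be enumerated
MASK = (1 << WIDTH) - 1
VALUES = range(1 << WIDTH)

def pe(inst, A, B, C):
    """Toy PE: inst = (op, invert_A, c_in); only the 'res' output is modelled."""
    op, invert_A, c_in = inst
    if invert_A:
        A = ~A & MASK
    if op == "Add":
        return (A + B + c_in) & MASK
    return (A * B) & MASK          # op == "Mul"

def find_rewrite_rule(target):
    """Search for (binding, inst) such that the PE matches target on all inputs."""
    for inst in product(["Add", "Mul"], [0, 1], [0, 1]):
        # binding[i] names the PE port ("A", "B" or "C") that receives in_i
        for binding in permutations("ABC", 2):
            def run(in0, in1, other):
                ports = dict.fromkeys("ABC", other)   # unbound port gets 'other'
                ports[binding[0]], ports[binding[1]] = in0, in1
                return pe(inst, ports["A"], ports["B"], ports["C"])
            if all(run(x, y, o) == target(x, y)
                   for x, y, o in product(VALUES, repeat=3)):
                return binding, inst
    return None

print(find_rewrite_rule(lambda in0, in1: (in0 - in1) & MASK))
# (('B', 'A'), ('Add', 1, 1))
```

The result says: route in0 to port B and in1 to port A, and configure the PE with op=Add, invert_A=1, c_in=1, which matches the constants shown on the slides.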

How to Handle State?

[Figure: a Transform is applied to the PE (inputs inst, A, B, C; outputs res, flag) and its State]

Floating Point?

[Figure: a Transform is applied to the PE (inputs inst, A, B, C; outputs res, flag) and its Floating Point unit]

Performance of Rewrite Rule Generator

• Problem: universally quantified SMT queries can take a long time
• Solutions:
  • It is okay to be slightly slow (unless doing design space exploration, DSE!)
  • Different ways to encode the final formula
  • Different techniques for solving quantified expressions
• Recent results:
  • ~1 minute to solve 20 rewrite rules on our current CGRA

What patterns to use in the rewrite rule table?

[Figure: the compilation flow with the PEak Compiler producing the Rewrite Rule Table; the table's patterns (div, mul add, sub, ashr add) are marked "??"]

Which Patterns?

• Enumerate all possible patterns up to a size
  • Lots of uncommon patterns
  • Bloated Rewrite Rule Table
  • Slower instruction selection
• Analyze target domain's applications for common subgraphs
  • Approach used for our upcoming DSE paper
• Only very basic patterns
  • Use peephole optimization/packing after instruction selection
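The second option, mining applications for common subgraphs, can be sketched in its simplest form as counting two-operation patterns across kernels. Everything below is hypothetical: the kernel names, the edge lists, and the threshold are made up for illustration and do not come from the talk.

```python
from collections import Counter

# Hypothetical application kernels: each is a list of (consumer_op, producer_op) edges.
kernels = {
    "blur":  [("add", "mul"), ("add", "mul"), ("ashr", "add")],
    "sharp": [("sub", "mul"), ("add", "mul"), ("ashr", "add")],
}

def common_pairs(kernels, min_count=2):
    """Count two-op patterns across kernels; keep those seen at least min_count times."""
    counts = Counter(edge for edges in kernels.values() for edge in edges)
    return [(pattern, n) for pattern, n in counts.most_common() if n >= min_count]

print(common_pairs(kernels))
# [(('add', 'mul'), 3), (('ashr', 'add'), 2)]
```

Patterns that pass the threshold are candidates for the rewrite rule table; rare ones are left to be covered by basic single-op rules.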

CPU Instruction Selection

CGRA Compilation

[Figure: CoreIR Graph of Unified Buffers and Computation Kernels, with data arriving from the Global Buffer and returning to the Global Buffer]

Control Flow Graph

[Figure: a control flow graph of Basic Blocks]

Basic Block (machine independent):

    R2 <- Sub(R0, R1)
    R3 <- M[R2]
    M[R3] <- R1
    R4 <- Add(R1, 0x50)
    …

Compiling WebAssembly to RISC-V?

WebAssembly Subtract (inputs In0, In1; output Out): Out <- Sub(In0, In1)

[Figure: RISCV block with input inst and an internal Register File]

Transform RISC-V to remove Register File

[Figure: a Transform takes the RISCV block (input inst, internal Register File) to a RISCV block with inputs inst, rs1, rs2 and output rd, with the Register File removed]
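The effect of removing the register file can be sketched as turning register reads into explicit inputs and the register write into an explicit output, leaving a pure function that is easier to compare against a primitive such as subtract. The names `alu` and `StatefulCore` and the two-op instruction set are assumptions for illustration, not the PEak transform itself.

```python
MASK = 0xFFFFFFFF                       # assumed 32-bit registers

def alu(op, rs1_val, rs2_val):
    """Pure datapath: register values in, result out (the transformed form)."""
    if op == "sub":
        return (rs1_val - rs2_val) & MASK
    if op == "add":
        return (rs1_val + rs2_val) & MASK
    raise NotImplementedError(op)

class StatefulCore:
    """Original form: reads and writes an internal register file."""
    def __init__(self):
        self.rf = [0] * 32

    def step(self, op, rd, rs1, rs2):
        self.rf[rd] = alu(op, self.rf[rs1], self.rf[rs2])

core = StatefulCore()
core.rf[1], core.rf[2] = 7, 10
core.step("sub", 3, 1, 2)
print(hex(core.rf[3]))                  # 0xfffffffd
print(hex(alu("sub", 7, 10)))           # 0xfffffffd
```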

Discovering Subtract

Out <- Sub(In0, In1) (inputs In0, In1; output Out)

[Figure: RISCV block with inputs inst, rs1, rs2 and output rd]

Branch/Memory Instructions?

[Figure: RISCV block with inputs inst, rs1, rs2, PC, Mem Read and outputs rd, Next PC, Mem Addr, Mem Write]


The Future

• Goal: Fully Automatic compiler generation for Accelerator Architectures


Thank You