DLVM: A Modern Compiler Framework for Neural Network DSLs

Richard Wei, Lane Schwartz, Vikram Adve
University of Illinois at Urbana-Champaign

Slides: learningsys.org/nips17/assets/slides/dlvm-nips17.pdf


1-2 years ago

Deep Learning Frameworks

Compiler Technologies

Today

Deep Learning Frameworks

Compiler Technologies

Deep Learning Compiler Technologies

• Latte.jl

• XLA

• NNVM + TVM

• ONNX

• PyTorch JIT

• DLVM


Neural networks are programs

Control Flow

Auto Vectorization

Intermediate Representation

Automatic Differentiation

Static Analysis

Optimizations

Compute

Typing


A New Compiler Problem

• Programs, not just a data flow graph

• Type safety

• Ahead-of-time AD

• Code generation

• Lightweight installation

C/C++

Python

DSL

Interpreter, CodeGen, Graph Optimizer

Libraries

Python

Safe language

DSL

• NN as a host language function

• Type safety

• Naturalness

• Lightweight modular staging*

• Compiler magic

* Rompf, Tiark, and Martin Odersky. Lightweight Modular Staging: A Pragmatic Approach to Runtime Code Generation and Compiled DSLs, 2010


Safe language

Libraries

DSL

• Trainer

• Layers

• Application API


Safe language

Libraries

DSL

Compiler Infrastructure

• Generic linear algebra IR

• Automatic differentiation

• Optimizations

• Code generation

• Runtime

DSL

Compiler Infrastructure

DLVM

• Linear algebra IR

• Framework for building DSLs

• Automatic backpropagator

• Multi-stage optimizer

• Static code generator based on LLVM

[Architecture diagram]

DSL stack: TEL (Standalone DSL), NNKit (Embedded DSL)

Command Line Toolchain: dlc, dlopt

DLVM Core: Intermediate Representation; Analyses, Verifier, Transforms; CoreCompute, CoreOp, CoreTensor

LLVM IR Generator, DLRuntime

LLVM Compiler Infrastructure

GPU, CPU


Core Language: DLVM IR

First-class tensors

Tensor Type

Rank  Notation                    Description
0     i64                         64-bit integer
1     <100 x f32>                 float vector of size 100
2     <100 x 300 x f64>           double matrix of size 100x300
n     <100 x 300 x ... x bool>    rank-n tensor
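The tensor type notation can be modelled in a few lines. The following is a minimal Python sketch, not part of DLVM: the `TensorType` class and `parse_tensor_type` function are hypothetical names introduced here, and the sketch assumes dimensions are always integer literals separated by ` x `.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TensorType:
    """Hypothetical model of a tensor type: a shape plus an element type."""
    shape: Tuple[int, ...]
    dtype: str

    @property
    def rank(self) -> int:
        # The rank is the number of dimensions; scalars have rank 0.
        return len(self.shape)

    def __str__(self) -> str:
        # Rank 0 is written as the bare element type, e.g. "i64".
        if not self.shape:
            return self.dtype
        dims = " x ".join(str(d) for d in self.shape)
        return f"<{dims} x {self.dtype}>"


def parse_tensor_type(text: str) -> TensorType:
    """Parse notation such as "<100 x 300 x f64>" or "i64"."""
    text = text.strip()
    if not (text.startswith("<") and text.endswith(">")):
        return TensorType((), text)
    # The last component is the element type; the rest are dimensions.
    *dims, dtype = [part.strip() for part in text[1:-1].split(" x ")]
    return TensorType(tuple(int(d) for d in dims), dtype)
```

Under these assumptions, `parse_tensor_type("<100 x 300 x f64>").rank` is 2, matching the matrix row of the table.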

Domain-Specific Instructions

Kind                 Example
Element-wise unary   tanh %a: <10 x f32>
Element-wise binary  power %a: <10 x f32>, %b: 2: f32
Dot                  dot %a: <10 x 20 x f32>, %b: <20 x 2 x f32>
Concatenate          concatenate %a: <10 x f32>, %b: <20 x f32> along 0
Reduce               reduce %a: <10 x 30 x f32> by add along 1
Transpose            transpose %m: <2 x 3 x 4 x 5 x i32>
Convolution          convolve %a: <…> kernel %b: <…> stride %c: <…> …
Slice                slice %a: <10 x 20 x i32> from 1 upto 5
Random               random 768 x 10 from 0.0: f32 upto 1.0: f32
Select               select %x: <10 x f64>, %y: <10 x f64> by %flags: <10 x bool>
Compare              greaterThan %a: <10 x 20 x bool>, %b: <1 x 20 x bool>
Data type cast       dataTypeCast %x: <10 x i32> to f64
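Because tensor shapes are part of the type, the result shape of each instruction can be computed statically. The sketch below illustrates this for three of the instructions in Python. The function names are hypothetical, and the rules are assumptions based on standard linear algebra conventions (in particular, that `reduce` drops the reduced dimension), not taken from the DLVM sources.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def dot_shape(a: Shape, b: Shape) -> Shape:
    """Matrix product: <m x k> dot <k x n> gives <m x n>."""
    if len(a) != 2 or len(b) != 2 or a[1] != b[0]:
        raise ValueError(f"dot: incompatible shapes {a} and {b}")
    return (a[0], b[1])


def concatenate_shape(a: Shape, b: Shape, axis: int) -> Shape:
    """All dimensions except `axis` must agree; the `axis` sizes are summed."""
    if len(a) != len(b) or not 0 <= axis < len(a):
        raise ValueError(f"concatenate: bad axis {axis} for {a} and {b}")
    if any(x != y for i, (x, y) in enumerate(zip(a, b)) if i != axis):
        raise ValueError(f"concatenate: incompatible shapes {a} and {b}")
    return a[:axis] + (a[axis] + b[axis],) + a[axis + 1:]


def reduce_shape(a: Shape, axis: int) -> Shape:
    """Assumed rule: reducing along `axis` removes that dimension."""
    if not 0 <= axis < len(a):
        raise ValueError(f"reduce: bad axis {axis} for {a}")
    return a[:axis] + a[axis + 1:]
```

Applied to the examples in the table, `dot_shape((10, 20), (20, 2))` gives `(10, 2)` and `concatenate_shape((10,), (20,), 0)` gives `(30,)`.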

General-Purpose Instructions

Kind                        Example
Function application        apply %foo(%x: f32, %y: f32): (f32, f32) -> <10 x 10 x f32>
Branch                      branch 'block_name(%a: i32, %b: i32)
Conditional (if-then-else)  conditional %cond: bool then 'then_block() else 'else_block()
Shape cast                  shapeCast %a: <1 x 40 x f32> to 2 x 20
Extract                     extract #x from %pt: $Point
Insert                      insert 10: f32 to %pt: $Point at #x
Allocate stack              allocateStack $Point count 1
Allocate heap               allocateHeap $MNIST count 1
Deallocate                  deallocate %x: *<10 x f32>
Load                        load %ptr: *<10 x i32>
Store                       store %x: <10 x i32> to %ptr: *<10 x i32>
Copy                        copy from %src: *<10 x f16> to %dst: *<10 x f16> count 1: i64


Instruction Set

• Primitive math operators & general purpose operators

• No softmax, sigmoid

    • Composed of primitive math ops

• No min, max, relu

    • Composed of compare & select
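To illustrate how the missing operators can be composed from primitives, here is a minimal Python sketch on plain lists. The helpers `greater_than` and `select` are hypothetical stand-ins for the compare and select instructions. The sketch assumes `select` takes the first operand where the flag is true, which the slides do not state.

```python
import math
from typing import List

Vector = List[float]


def greater_than(a: Vector, b: Vector) -> List[bool]:
    """Element-wise compare, standing in for `greaterThan`."""
    return [x > y for x, y in zip(a, b)]


def select(a: Vector, b: Vector, flags: List[bool]) -> Vector:
    """Element-wise choice: take from `a` where the flag is true, else `b`."""
    return [x if f else y for x, y, f in zip(a, b, flags)]


def maximum(a: Vector, b: Vector) -> Vector:
    """max composed from compare and select."""
    return select(a, b, greater_than(a, b))


def relu(a: Vector) -> Vector:
    """relu(x) = max(x, 0)."""
    return maximum(a, [0.0] * len(a))


def sigmoid(a: Vector) -> Vector:
    """1 / (1 + exp(-x)), using only negate, exp, add and divide."""
    return [1.0 / (1.0 + math.exp(-x)) for x in a]


def softmax(a: Vector) -> Vector:
    """exp(x) / sum(exp(x)): element-wise exp, reduce by add, then divide."""
    exps = [math.exp(x) for x in a]
    total = sum(exps)
    return [e / total for e in exps]
```

Keeping the instruction set this small means a differentiation or optimization pass only needs rules for the primitives, and composite operators are handled automatically.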

DLVM IR

• Full static single assignment (SSA) form

• Control flow graph (CFG) and basic blocks with arguments

• Custom type definitions

• Modular architecture (module - function - basic block - instruction)

• Textual format & in-memory format

• Built-in parser and verifier

• Robust unit testing via LLVM Integrated Tester (lit) and FileCheck

DLVM IR

Module
    Type Definition
    Function
        Basic Block
        Basic Block
        Basic Block
    Function
        Basic Block
        Basic Block
        Basic Block
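The module, function, basic block and instruction hierarchy maps naturally onto nested containers. The Python sketch below is an illustration only: every class and field name is hypothetical, and `verify_single_assignment` checks just one SSA property (each name is defined once), which is far less than a real verifier would check.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Instruction:
    name: str              # SSA name defined by this instruction, e.g. "%0.0"
    opcode: str
    operands: List[str]


@dataclass
class BasicBlock:
    name: str
    arguments: List[str]   # basic blocks take arguments
    instructions: List[Instruction] = field(default_factory=list)


@dataclass
class Function:
    name: str
    blocks: List[BasicBlock] = field(default_factory=list)


@dataclass
class Module:
    name: str
    functions: List[Function] = field(default_factory=list)


def verify_single_assignment(function: Function) -> bool:
    """Return True if every name in the function is defined exactly once."""
    seen: Set[str] = set()
    for block in function.blocks:
        defined = block.arguments + [i.name for i in block.instructions]
        for name in defined:
            if name in seen:
                return False
            seen.add(name)
    return True
```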

module "my_module"    // Module declaration
stage raw             // Raw stage IR in the compilation phase

struct $Classifier {
    #w: <784 x 10 x f32>,
    #b: <1 x 10 x f32>,
}

type $MyClassifier = $Classifier

func @inference: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> <1 x 10 x f32> {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = add %0.0: <1 x 10 x f32>, %b: <1 x 10 x f32>
    conditional true: bool then 'b0() else 'b1()
'b0():
    return %0.1: <1 x 10 x f32>
'b1():
    return 0: <1 x 10 x f32>
}


Transformations: Differentiation & Optimizations

func @inference: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> <1 x 10 x f32> {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = add %0.0: <1 x 10 x f32>, %b: <1 x 10 x f32>
    return %0.1: <1 x 10 x f32>
}

[gradient @inference wrt 1, 2]
func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>)

Differentiation Pass: canonicalizes every gradient function declaration in an IR module

func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>) {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
}

Copy instructions from original function

func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>) {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = add %0.0: <1 x 10 x f32>, %b: <1 x 10 x f32>
}

Generate adjoint code

func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>) {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = add %0.0: <1 x 10 x f32>, %b: <1 x 10 x f32>
    %0.2 = transpose %x: <1 x 784 x f32>
    %0.3 = multiply %0.2: <1 x 784 x f32>, 1: f32
    return (%0.3: <1 x 10 x f32>, 1: f32): (<1 x 10 x f32>, f32)
}
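For reference, the adjoints of an affine map y = x·w + b follow from standard calculus: the adjoint of w is xᵀ·seed and the adjoint of b is seed, where seed is the adjoint of y. The Python sketch below implements these textbook rules on nested lists. It is not the code DLVM generates, and all function names, including the explicit `seed` parameter, are assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inference(x: Matrix, w: Matrix, b: Matrix) -> Matrix:
    """Forward computation: dot followed by add."""
    return add(matmul(x, w), b)


def inference_grad(x: Matrix, w: Matrix, b: Matrix,
                   seed: Matrix) -> Tuple[Matrix, Matrix]:
    """Adjoints of w and b for y = x.w + b, given the adjoint `seed` of y."""
    grad_w = matmul(transpose(x), seed)   # adjoint of w is x^T . seed
    grad_b = seed                         # adjoint of b is the seed itself
    return grad_w, grad_b
```

Note that neither adjoint depends on the forward results `%0.0` and `%0.1`, which is why the copied forward instructions can later be removed as dead code.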

Algebra Simplification Pass

func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>) {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = add %0.0: <1 x 10 x f32>, %b: <1 x 10 x f32>
    %0.2 = transpose %x: <1 x 784 x f32>
    return (%0.2: <1 x 10 x f32>, 1: f32): (<1 x 10 x f32>, f32)
}

Dead Code Elimination Pass

func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>) {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = transpose %x: <1 x 784 x f32>
    return (%0.0: <1 x 10 x f32>, 1: f32): (<1 x 10 x f32>, f32)
}
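The two optimization passes can be sketched on a toy instruction list. This Python sketch is an illustration under stated assumptions, not DLVM's implementation: the tuple encoding and both function names are hypothetical, instructions are assumed to have no side effects, and operands that do not start with `%` are treated as literals.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (result name, opcode, operand names).
Instruction = Tuple[str, str, List[str]]


def simplify_algebra(body: List[Instruction],
                     returned: List[str]) -> Tuple[List[Instruction], List[str]]:
    """Rewrite `multiply %v, 1` to `%v` by forwarding uses of the result."""
    replacements: Dict[str, str] = {}
    kept: List[Instruction] = []
    for name, opcode, operands in body:
        # Apply earlier rewrites to this instruction's operands first.
        operands = [replacements.get(op, op) for op in operands]
        if opcode == "multiply" and operands[1] == "1":
            replacements[name] = operands[0]
            continue
        kept.append((name, opcode, operands))
    return kept, [replacements.get(op, op) for op in returned]


def eliminate_dead_code(body: List[Instruction],
                        returned: List[str]) -> List[Instruction]:
    """Drop instructions whose results never reach the returned values."""
    live = set(returned)
    kept: List[Instruction] = []
    # Walk backwards so that uses are seen before their definitions.
    for name, opcode, operands in reversed(body):
        if name in live:
            live.update(operands)
            kept.append((name, opcode, operands))
    kept.reverse()
    return kept
```

Running both passes on the four-instruction gradient body leaves only the `transpose`. Unlike the listing above, this sketch does not renumber the surviving value to `%0.0`.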

func @inference: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> <1 x 10 x f32> {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = add %0.0: <1 x 10 x f32>, %b: <1 x 10 x f32>
    return %0.1: <1 x 10 x f32>
}

[gradient @inference from 0]
func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>)

[gradient @inference from 0 wrt 1, 2]
func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>)

[gradient @inference from 0 wrt 1, 2 keeping 0]
func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>, <1 x 10 x f32>)

[gradient @inference from 0 wrt 1, 2 keeping 0 seedable]
func @inference_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>, <1 x 10 x f32>)

Configurable gradient declaration

• from: selecting which output to differentiate in tuple return
• wrt: with respect to arguments 1 & 2
• keeping: keeping original output
• seedable: allow passing in back-propagated gradients as seed
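How the four options shape the signature of the gradient function can be sketched in Swift. This is an illustration inferred from the declarations shown on the slides, not DLVM's API: `GradientConfig` and `gradientSignature` are hypothetical names, and types are carried as plain strings.

```swift
// Hypothetical model of a gradient declaration. The fields mirror the
// `from`, `wrt`, `keeping` and `seedable` options on the slide.
struct GradientConfig {
    var from: Int = 0        // which output to differentiate
    var wrt: [Int]? = nil    // argument indices; nil means all arguments
    var keeping: [Int] = []  // original outputs to return as well
    var seedable = false     // accept a back-propagated gradient as an extra argument
}

/// Derives the gradient function's argument and result types from the
/// original signature. Types are plain strings such as "<1 x 10 x f32>".
func gradientSignature(arguments: [String], results: [String],
                       config: GradientConfig) -> (arguments: [String], results: [String]) {
    var gradArgs = arguments
    if config.seedable {
        // The seed has the type of the output being differentiated.
        gradArgs.append(results[config.from])
    }
    let wrt = config.wrt ?? Array(arguments.indices)
    // One gradient per selected argument, typed like that argument ...
    var gradResults = wrt.map { arguments[$0] }
    // ... followed by the kept original outputs.
    gradResults += config.keeping.map { results[$0] }
    return (gradArgs, gradResults)
}

let x = "<1 x 784 x f32>", w = "<784 x 10 x f32>", b = "<1 x 10 x f32>"
let sig = gradientSignature(
    arguments: [x, w, b], results: [b],
    config: GradientConfig(from: 0, wrt: [1, 2], keeping: [0], seedable: true))
print(sig.arguments)  // x, w, b plus the seed
print(sig.results)    // ∂/∂w, ∂/∂b plus the kept output
```

With all four options set, the sketch reproduces the last declaration above: four arguments in, three results out.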

func @f: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> <1 x 10 x f32> {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = add %0.0: <1 x 10 x f32>, %b: <1 x 10 x f32>
    return %0.1: <1 x 10 x f32>
}

func @g: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> <1 x 10 x f32> {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = apply @f(%x, %w, %b): (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> <1 x 10 x f32>
    %0.1 = tanh %0.0: <1 x 10 x f32>
    return %0.1: <1 x 10 x f32>
}

[gradient @g wrt 1, 2]
func @g_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>)

[gradient @f wrt 1, 2 seedable]
func @f_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>)

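Seed: the extra <1 x 10 x f32> argument of @f_grad, which receives the gradient back-propagated through the tanh in @g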
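The role of the seed can be made concrete with a small numeric sketch in Swift. It is not generated DLVM code: matrices are plain nested arrays, the shapes are shrunk to 1x2, 2x3 and 1x3, and `fGrad` and `gGrad` are hypothetical stand-ins for @f_grad and @g_grad. It also assumes that the gradient of @g is taken with an implicit seed of ones on its output.

```swift
import Foundation

typealias Matrix = [[Double]]

/// Plain triple-loop matrix product.
func matmul(_ a: Matrix, _ b: Matrix) -> Matrix {
    var out = Matrix(repeating: [Double](repeating: 0, count: b[0].count), count: a.count)
    for i in a.indices {
        for k in b.indices {
            for j in b[0].indices {
                out[i][j] += a[i][k] * b[k][j]
            }
        }
    }
    return out
}

func transpose(_ a: Matrix) -> Matrix {
    return a[0].indices.map { j in a.map { row in row[j] } }
}

/// f(x, w, b) = x • w + b
func f(_ x: Matrix, _ w: Matrix, _ b: Matrix) -> Matrix {
    let xw = matmul(x, w)
    return xw.indices.map { i in xw[i].indices.map { j in xw[i][j] + b[i][j] } }
}

/// Seedable gradient of f wrt w and b. `seed` is ∂h/∂f for a downstream h,
/// so ∂h/∂w = xᵀ • seed and ∂h/∂b = seed.
func fGrad(_ x: Matrix, _ w: Matrix, _ b: Matrix, seed: Matrix) -> (dw: Matrix, db: Matrix) {
    return (matmul(transpose(x), seed), seed)
}

/// Gradient of g = tanh(f(x, w, b)) wrt w and b (implicit seed of ones on g).
func gGrad(_ x: Matrix, _ w: Matrix, _ b: Matrix) -> (dw: Matrix, db: Matrix) {
    // d tanh(u)/du = 1 - tanh(u)^2, evaluated at u = f(x, w, b).
    let seed = f(x, w, b).map { row in row.map { u in 1 - tanh(u) * tanh(u) } }
    // The chain rule reuses f's gradient by passing this value as the seed.
    return fGrad(x, w, b, seed: seed)
}

let x: Matrix = [[1, 2]]
let w: Matrix = [[0, 0, 0], [0, 0, 0]]
let b: Matrix = [[0, 0, 0]]
let grads = gGrad(x, w, b)
print(grads.dw)  // [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
print(grads.db)  // [[1.0, 1.0, 1.0]]
```

Because @f_grad takes the seed as an argument, the gradient of a caller such as @g can be assembled from the callee's gradient function without inlining or re-differentiating @f.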

Compilation Phases

DLVM

• Analyses & Verification: Dominance, Side Effects, Type Checking, Differentiability
• Differentiation
• Optimizations: Linear Algebra Fusion, AD Checkpointing, Algebra Simplification, Constant Propagation, Dead Code Elimination, Common Subexpression Elimination, Matrix Chaining
• Compute Generation
• Compute Scheduling
• LLGen

IR stages: stage raw, stage optimizable, stage compute, stage schedule

LLVM
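The slides do not show how the Matrix Chaining pass works. Assuming it refers to choosing a cheap parenthesization for a chain of matrix products, the textbook dynamic-programming formulation looks like the Swift sketch below; `matrixChainCost` is a hypothetical name and the cost model simply counts scalar multiplications.

```swift
/// Minimum number of scalar multiplications needed to evaluate
/// A0 • A1 • … • A(n-1), where Ai has shape dims[i] x dims[i + 1].
/// Textbook dynamic programming, not DLVM's actual pass.
func matrixChainCost(_ dims: [Int]) -> Int {
    let n = dims.count - 1  // number of matrices in the chain
    guard n > 1 else { return 0 }
    // cost[i][j] = cheapest way to multiply matrices i through j.
    var cost = [[Int]](repeating: [Int](repeating: 0, count: n), count: n)
    for length in 2...n {
        for i in 0...(n - length) {
            let j = i + length - 1
            cost[i][j] = Int.max
            // Try every split point (Ai…Ak)(Ak+1…Aj).
            for k in i..<j {
                let c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                cost[i][j] = min(cost[i][j], c)
            }
        }
    }
    return cost[0][n - 1]
}

// A 1x784 input followed by a 784x784 and a 784x10 weight matrix.
print(matrixChainCost([1, 784, 784, 10]))  // 622496
```

For this chain, multiplying left to right costs 622,496 scalar multiplications, while multiplying the two weight matrices first costs 6,154,400, which is why the order matters for a compiler.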

DLVM

DSL

Compiler Infrastructure

• NN as program, not a graph

• Static analysis

• Type safety

• Naturalness

• Lightweight modular staging

• Compiler magic

NNKit: Staged DSL in Swift

NNKit

• It’s a prototype!

• Tensor computation embedded in host language

• Type safety

• Generates DLVM IR on the fly

Language

• Statically ranked tensors
    • T, Tensor1D<T>, Tensor2D<T>, Tensor3D<T>, Tensor4D<T>
• Type wrapper for staging – Rep<Wrapped>
    • Rep<Float>, Rep<Tensor1D<Float>>, Rep<Tensor2D<T>>
• Operator overloading
    • func + <T: Numeric>(_: Rep<T>, _: Rep<T>) -> Rep<T>
    • func • (_: Rep<Tensor2D<T>>, _: Rep<Tensor2D<T>>) -> Rep<Tensor2D<T>>

• Lambda abstraction
    • func lambda<T, U>(_ f: (Rep<T>) -> Rep<U>) -> Rep<(T) -> U>
• Function application
    • subscript<T, U>(arg: Rep<T>) -> Rep<U> where Wrapped == (T) -> U
    • subscript<T, U>(arg: T) -> U where Wrapped == (T) -> U // JIT DLVM IR
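The staging idea behind Rep<Wrapped>, operator overloading, lambda and subscript application can be sketched in a few lines of Swift. This is a scalar-only toy, not NNKit: `Expr` and `evaluate` are hypothetical, `Rep` here is a simplified stand-in, and where NNKit would JIT-compile DLVM IR, the sketch just interprets the recorded expression.

```swift
// The expression recorded by staging. Hypothetical, scalar-only.
indirect enum Expr {
    case argument
    case add(Expr, Expr)
    case mul(Expr, Expr)
}

/// A staged value: it holds the expression that will compute a `Wrapped`
/// later. `Wrapped` is a phantom type, so the host type checker still
/// rejects ill-typed programs.
struct Rep<Wrapped> {
    let expr: Expr
}

// Overloaded operators record the operation instead of computing it.
func + (lhs: Rep<Double>, rhs: Rep<Double>) -> Rep<Double> {
    return Rep<Double>(expr: .add(lhs.expr, rhs.expr))
}

func * (lhs: Rep<Double>, rhs: Rep<Double>) -> Rep<Double> {
    return Rep<Double>(expr: .mul(lhs.expr, rhs.expr))
}

/// Lambda abstraction: run the host closure once on a symbolic argument
/// and keep the expression it builds.
func lambda(_ body: (Rep<Double>) -> Rep<Double>) -> Rep<(Double) -> Double> {
    let symbolic = Rep<Double>(expr: .argument)
    return Rep<(Double) -> Double>(expr: body(symbolic).expr)
}

/// Stand-in for the JIT: interpret the staged expression.
func evaluate(_ expr: Expr, _ arg: Double) -> Double {
    switch expr {
    case .argument:
        return arg
    case .add(let a, let b):
        return evaluate(a, arg) + evaluate(b, arg)
    case .mul(let a, let b):
        return evaluate(a, arg) * evaluate(b, arg)
    }
}

extension Rep where Wrapped == (Double) -> Double {
    /// Function application on a concrete argument.
    subscript(arg: Double) -> Double {
        return evaluate(expr, arg)
    }
}

let h = lambda { x in x * x + x }
print(h[3.0])  // 12.0
```

The closure passed to `lambda` runs only once, on a symbolic argument; every later `h[...]` call works on the recorded expression, which is the hook a real implementation can use to emit and compile IR.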

Staged Evaluation

Rep<(Float2D) -> Float2D>

• Expression Staging
• Shape Specialization
• DLVM IR Generation
• Function Reification

(Float2D) -> Float2D

typealias Float2D = Tensor2D<Float>

struct Parameters {
    var w: Float2D
    var b: Float2D
}

let f: Rep<(Float2D, Float2D, Float2D) -> Float2D> =
    lambda { x, w, b in x • w + b }

let params: Parameters = …
let x: Float2D = [[0.0, 1.0]]
f[x, params.w, params.b] // ==> result

let f: Rep<(Float2D, Float2D, Float2D) -> Float2D> =
    lambda { x, w, b in x • w + b }

f[x, w, b] // x: 1x784, w: 784x10, b: 1x10

func @f: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> <1 x 10 x f32> {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <1 x 10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = add %0.0: <1 x 10 x f32>, %b: <1 x 10 x f32>
    return %0.1: <1 x 10 x f32>
}
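The concrete shapes in the generated @f come from the arguments at the call site. A minimal Swift sketch of the shape rules for the two instructions involved is given below; `dotShape` and `addShape` are hypothetical helpers, and the sketch assumes that `add` requires identical shapes (no broadcasting).

```swift
/// Result shape of a rank-2 `dot`, or nil if the inner dimensions disagree.
/// Hypothetical helper, not part of NNKit or DLVM.
func dotShape(_ a: [Int], _ b: [Int]) -> [Int]? {
    guard a.count == 2, b.count == 2, a[1] == b[0] else { return nil }
    return [a[0], b[1]]
}

/// Result shape of an element-wise `add`, assuming no broadcasting.
func addShape(_ a: [Int], _ b: [Int]) -> [Int]? {
    return a == b ? a : nil
}

// Shapes from the call f[x, w, b] with x: 1x784, w: 784x10, b: 1x10.
if let xw = dotShape([1, 784], [784, 10]), let out = addShape(xw, [1, 10]) {
    print(out)  // [1, 10]
}
```

Checking shapes when the function is specialized means a mismatch, such as passing a 10x784 weight matrix, is rejected before any IR is generated.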

let f: Rep<(Float2D, Float2D, Float2D) -> Float2D> =
    lambda { x, w, b in x • w + b }

let g = lambda { x, w, b in
    let linear = f[x, w, b]
    return tanh(linear)
}

let ∇g = gradient(of: g, withRespectTo: (1, 2))
// ∇g : Rep<(Float2D, Float2D, Float2D) -> (Float2D, Float2D)>

∇g[x, w, b] // ==> ( ∂g/∂w, ∂g/∂b )

[gradient @g wrt 1, 2]
func @g_grad: (<1 x 784 x f32>, <784 x 10 x f32>, <1 x 10 x f32>) -> (<784 x 10 x f32>, <1 x 10 x f32>)

let ∇g = gradient(of: g, withRespectTo: (1, 2), seedable: true)
// ∇g : Rep<(Float2D, Float2D, Float2D, Float2D) -> (Float2D, Float2D)>

let ∇g = gradient(of: g, withRespectTo: (1, 2), seedable: true, keeping: (0))
// ∇g : Rep<(Float2D, Float2D, Float2D, Float2D) -> (Float2D, Float2D, Float2D)>

∇g[x, w, b, ∂h_∂g] // ==> ( ∂h/∂w, ∂h/∂b, g(x,w,b) )

Safe language

Libraries

DSL

Compiler Infrastructure

Swift

Libraries

NNKit

DLVM

DLVM is written in Swift!

PL & Compilers + ML

• Programs, not just a data flow graph

• Type safety

• Ahead-of-time AD

• Code generation

dlvm.org

DLVM