TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End...
Transcript of TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End...
![Page 1: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/1.jpg)
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu,
Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy
!1
![Page 3: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/3.jpg)
Beginning of Story
!3
![Page 4: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/4.jpg)
Beginning of Story
!3
Acclerator
![Page 5: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/5.jpg)
Beginning of Story
!3
Acclerator
![Page 6: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/6.jpg)
Beginning of Story
// Pseudo-code for convolution program for the VIA accelerator// Virtual Thread 00x00: LOAD(PARAM[ 0-71]) // LD@TID00x01: LOAD(ACTIV[ 0-24]) // LD@TID00x02: LOAD(LDBUF[ 0-31]) // LD@TID00x03: PUSH(LD->EX) // LD@TID00x04: POP (LD->EX) // EX@TID00x05: EXE (ACTIV[ 0-24],PARAM[ 0-71],LDBUF[ 0-31],STBUF[ 0- 7]) // EX@TID00x06: PUSH(EX->LD) // EX@TID00x07: PUSH(EX->ST) // EX@TID00x08: POP (EX->ST) // ST@TID00x09: STOR(STBUF[ 0- 7]) // ST@TID00x0A: PUSH(ST->EX) // ST@TID0// Virtual Thread 10x0B: LOAD(ACTIV[25-50]) // LD@TID10x0C: LOAD(LDBUF[32-63]) // LD@TID10x0D: PUSH(LD->EX) // LD@TID10x0E: POP (LD->EX) // EX@TID10x0F: EXE (ACTIV[25-50],PARAM[ 0-71],LDBUF[32-63],STBUF[32-39]) // EX@TID10x10: PUSH(EX->LD) // EX@TID10x11: PUSH(EX->ST) // EX@TID10x12: POP (EX->ST) // ST@TID10x13: STOR(STBUF[32-39]) // ST@TID10x14: PUSH(ST->EX) // ST@TID1// Virtual Thread 20x15: POP (EX->LD) // LD@TID20x16: LOAD(PARAM[ 0-71]) // LD@TID20x17: LOAD(ACTIV[ 0-24]) // LD@TID20x18: LOAD(LDBUF[ 0-31]) // LD@TID20x19: PUSH(LD->EX) // LD@TID20x1A: POP (LD->EX) // EX@TID20x1B: POP (ST->EX) // EX@TID20x1C: EXE (ACTIV[ 0-24],PARAM[ 0-71],LDBUF[ 0-31],STBUF[ 0- 7]) // EX@TID20x1D: PUSH(EX->ST) // EX@TID20x1E: POP (EX->ST) // ST@TID20x1F: STOR(STBUF[ 0- 7]) // ST@TID2// Virtual Thread 30x20: POP (EX->LD) // LD@TID30x21: LOAD(ACTIV[25-50]) // LD@TID30x22: LOAD(LDBUF[32-63]) // LD@TID30x23: PUSH(LD->EX) // LD@TID30x24: POP (LD->EX) // EX@TID30x25: POP (ST->EX) // EX@TID20x26: EXE (ACTIV[25-50],PARAM[ 0-71],LDBUF[32-63],STBUF[32-39]) // EX@TID30x27: PUSH(EX->ST) // EX@TID30x28: POP (EX->ST) // ST@TID30x29: STOR(STBUF[32-39]) // ST@TID3
// Convolution access pattern dictated by micro-coded program.// Each register index is derived as a 2-D affine function.// e.g. idxrf = arfy+brfx+crf
0, where crf0 is specified by
// micro op 0 fields.for y in [0…i) for x in [0…j) rf[idxrf
0] += GEVM(act[idxact0], par[idxpar
0]) rf[idxrf
1] += GEVM(act[idxact1], par[idxpar
1]) … rf[idxrf
n] += GEVM(act[idxactn], par[idxpar
n])
(b) Convolution micro-coded program
// Max-pool, batch normalization and activation function// access pattern dictated by micro-coded program.// Each register index is derived as a 2D affine function.// e.g. idxdst = adsty+bdstx+cdst
0, where cdst0 is specified by
// micro op 0 fields.for y in [0…i) for x in [0…j) // max pooling rf[idxdst
0] = MAX(rf[idxdst0], rf[idxsrc
0]) rf[idxdst
1] = MAX(rf[idxdst1], rf[idxsrc
1]) … // batch norm rf[idxdst
m] = MUL(rf[idxdstm], rf[idxsrc
m]) rf[idxdst
m+1] = ADD(rf[idxdstm+1], rf[idxsrc
m+1]) rf[idxdst
m+2] = MUL(rf[idxdstm+2], rf[idxsrc
m+2]) rf[idxdst
m+3] = ADD(rf[idxdstm+3], rf[idxsrc
m+3]) … // activation rf[idxdst
n-1] = RELU(rf[idxdstn-1], rf[idxsrc
n-1]) rf[idxdst
n] = RELU(rf[idxdstn], rf[idxsrc
n])
(c) Max pool, batch norm and activationmicro-coded program
(a) Blocked convolution program with multiple thread contexts
!3
Acclerator
![Page 7: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/7.jpg)
Goal: Deploy Deep Learning Everywhere
!4
Frameworks
![Page 8: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/8.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
!4
Frameworks
![Page 9: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/9.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
![Page 10: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/10.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
![Page 11: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/11.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
![Page 12: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/12.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
![Page 13: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/13.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
![Page 14: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/14.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
![Page 15: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/15.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
![Page 16: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/16.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
![Page 17: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/17.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
!4
Frameworks
Acclerator
![Page 18: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/18.jpg)
Goal: Deploy Deep Learning Everywhere
Explosion of models and frameworks
Explosion of hardware backends
Huge gap between model/frameworks and hardware backends
!4
Frameworks
Acclerator
![Page 19: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/19.jpg)
Existing Approach
Hardware!5
Frameworks
![Page 20: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/20.jpg)
Existing Approach
High-level data flow graph
Hardware!5
Frameworks
![Page 21: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/21.jpg)
Existing Approach
High-level data flow graph
Hardware
Primitive Tensor operators such as Conv2D
!5
Frameworks
![Page 22: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/22.jpg)
Existing Approach
High-level data flow graph
Hardware
Primitive Tensor operators such as Conv2D
eg. cuDNN Offload to heavily optimized DNN operator library
!5
Frameworks
![Page 23: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/23.jpg)
Existing Approach: Engineer Optimized Tensor Operators
!6!6
![Page 24: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/24.jpg)
Existing Approach: Engineer Optimized Tensor Operators
!6!6
![Page 25: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/25.jpg)
Existing Approach: Engineer Optimized Tensor Operators
!6!6
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Matmul: Operator Specification
![Page 26: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/26.jpg)
Existing Approach: Engineer Optimized Tensor Operators
!6!6
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Matmul: Operator Specification
![Page 27: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/27.jpg)
Existing Approach: Engineer Optimized Tensor Operators
!6!6
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Matmul: Operator Specification
for y in range(1024): for x in range(1024): C[y][x] = 0 for k in range(1024): C[y][x] += A[k][y] * B[k][x]
Vanilla Code
![Page 28: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/28.jpg)
Existing Approach: Engineer Optimized Tensor Operators
!7!7
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Matmul: Operator Specification
for yo in range(128): for xo in range(128): C[yo*8:yo*8+8][xo*8:xo*8+8] = 0 for ko in range(128): for yi in range(8): for xi in range(8): for ki in range(8): C[yo*8+yi][xo*8+xi] += A[ko*8+ki][yo*8+yi] * B[ko*8+ki][xo*8+xi]
Loop Tiling for Locality
![Page 29: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/29.jpg)
Existing Approach: Engineer Optimized Tensor Operators
!8!8
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Matmul: Operator Specification
inp_buffer AL[8][8], BL[8][8]acc_buffer CL[8][8]for yo in range(128): for xo in range(128): vdla.fill_zero(CL) for ko in range(128): vdla.dma_copy2d(AL, A[ko*8:ko*8+8][yo*8:yo*8+8]) vdla.dma_copy2d(BL, B[ko*8:ko*8+8][xo*8:xo*8+8]) vdla.fused_gemm8x8_add(CL, AL, BL) vdla.dma_copy2d(C[yo*8:yo*8+8,xo*8:xo*8+8], CL)
Map to Accelerators
Human exploration of optimized code
![Page 30: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/30.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
![Page 31: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/31.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
![Page 32: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/32.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
![Page 33: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/33.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
![Page 34: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/34.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
![Page 35: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/35.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
New operator introduced by operator fusion optimization potentially benefit: 1.5x speedup
![Page 36: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/36.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
New operator introduced by operator fusion optimization potentially benefit: 1.5x speedup
![Page 37: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/37.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
New operator introduced by operator fusion optimization potentially benefit: 1.5x speedup
![Page 38: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/38.jpg)
Limitations of Existing Approach
cuDNN
!9
Frameworks
New operator introduced by operator fusion optimization potentially benefit: 1.5x speedup
Engineering intensive
![Page 39: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/39.jpg)
Learning-based Learning System
High-level data flow graph and optimizations
!10
Hardware
Frameworks
![Page 40: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/40.jpg)
Learning-based Learning System
High-level data flow graph and optimizations
!10
Hardware
Frameworks
![Page 41: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/41.jpg)
Hardware aware Search Space of Optimized Tensor Programs
Learning-based Learning System
High-level data flow graph and optimizations
!10
Hardware
Frameworks
![Page 42: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/42.jpg)
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
Learning-based Learning System
High-level data flow graph and optimizations
!10
Hardware
Frameworks
![Page 43: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/43.jpg)
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
Learning-based Learning System
High-level data flow graph and optimizations
directly generate optimized program for new operator workloads and hardware
!10
Hardware
Frameworks
![Page 44: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/44.jpg)
Learning-based Learning System Frameworks
High-level data flow graph and optimizations
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
!11
Hardware
![Page 45: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/45.jpg)
Learning-based Learning System Frameworks
High-level data flow graph and optimizations
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
!11
Hardware
![Page 46: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/46.jpg)
Hardware-aware Search Space
Hardware!12
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Tensor Expression Language (Specification)
![Page 47: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/47.jpg)
Hardware-aware Search Space
Hardware!12
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Tensor Expression Language (Specification)
Define search space of hardware aware mappings from expression to hardware program
Based on Halide’s compute/schedule separation
![Page 48: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/48.jpg)
Hardware-aware Search Space
L1D L1IL2
L3
L1D L1IL2
implicitly managed
Compute Primitives
scalar vector
!13
Memory Subsystem
Loop Transformations
Cache Locality Vectorization
CPUs
Reuse primitives from prior work: Halide, Loopy
![Page 49: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/49.jpg)
Challenge to Support Diverse Hardware Backends
!14
CPUs GPUs TPU-like specialized Accelerators
![Page 50: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/50.jpg)
Hardware-aware Search SpaceCompute Primitives
scalar vector
!15
Memory Subsystem
L2
RF RFTX/L1
SM
RF RFTX/L1
SM
mixed
GPUs
![Page 51: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/51.jpg)
Hardware-aware Search SpaceCompute Primitives
scalar vector
!15
Memory Subsystem
L2
RF RFTX/L1
SM
RF RFTX/L1
SM
mixed
GPUs
Shared memory among compute cores
![Page 52: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/52.jpg)
Hardware-aware Search SpaceCompute Primitives
scalar vector
!15
Memory Subsystem
L2
RF RFTX/L1
SM
RF RFTX/L1
SM
mixed
Thread Cooperation
Use of Shared Memory
GPUs
Shared memory among compute cores
![Page 53: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/53.jpg)
Hardware-aware Search Space
!16
TPU-like Specialized Accelerators
tensor
Compute Primitives
Unified Buffer Acc
FIFO
explicitly managed
Memory Subsystem
![Page 54: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/54.jpg)
Hardware-aware Search Space
!16
TPU-like Specialized Accelerators
tensor
Compute Primitives
Unified Buffer Acc
FIFO
explicitly managed
Memory Subsystem
![Page 55: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/55.jpg)
Tensorization ChallengeCompute primitives
!17
![Page 56: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/56.jpg)
Tensorization ChallengeCompute primitives
scalar
!17
![Page 57: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/57.jpg)
Tensorization ChallengeCompute primitives
scalar vector
!17
![Page 58: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/58.jpg)
Tensorization ChallengeCompute primitives
scalar vector tensor
!17
![Page 59: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/59.jpg)
Tensorization ChallengeCompute primitives
scalar vector tensor
w, x = t.placeholder((8, 8)), t.placeholder((8, 8))k = t.reduce_axis((0, 8))y = t.compute((8, 8), lambda i, j: t.sum(w[i, k] * x[j, k], axis=k))
def gemm_intrin_lower(inputs, outputs): ww_ptr = inputs[0].access_ptr(“r") xx_ptr = inputs[1].access_ptr("r") zz_ptr = outputs[0].access_ptr("w") compute = t.hardware_intrin("gemm8x8", ww_ptr, xx_ptr, zz_ptr) reset = t.hardware_intrin("fill_zero", zz_ptr) update = t.hardware_intrin("fuse_gemm8x8_add", ww_ptr, xx_ptr, zz_ptr) return compute, reset, update
gemm8x8 = t.decl_tensor_intrin(y.op, gemm_intrin_lower)
declare behavior
lowering rule to generatehardware intrinsics to carry out the computation
Hardware designer: declare tensor instruction interface
with Tensor Expression
!17
![Page 60: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/60.jpg)
Tensorization ChallengeCompute primitives
scalar vector tensor
w, x = t.placeholder((8, 8)), t.placeholder((8, 8))k = t.reduce_axis((0, 8))y = t.compute((8, 8), lambda i, j: t.sum(w[i, k] * x[j, k], axis=k))
def gemm_intrin_lower(inputs, outputs): ww_ptr = inputs[0].access_ptr(“r") xx_ptr = inputs[1].access_ptr("r") zz_ptr = outputs[0].access_ptr("w") compute = t.hardware_intrin("gemm8x8", ww_ptr, xx_ptr, zz_ptr) reset = t.hardware_intrin("fill_zero", zz_ptr) update = t.hardware_intrin("fuse_gemm8x8_add", ww_ptr, xx_ptr, zz_ptr) return compute, reset, update
gemm8x8 = t.decl_tensor_intrin(y.op, gemm_intrin_lower)
declare behavior
lowering rule to generatehardware intrinsics to carry out the computation
Hardware designer: declare tensor instruction interface
with Tensor Expression
!17
Tensorize: transform program
to use tensor instructions
scalar tensor
![Page 61: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/61.jpg)
Hardware-aware Search Space
!18
TPU-like Specialized Accelerators
tensor
Compute Primitives
Unified Buffer Acc
FIFO
explicitly managed
Memory Subsystem
![Page 62: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/62.jpg)
Hardware-aware Search Space
!18
TPU-like Specialized Accelerators
tensor
Compute Primitives
Unified Buffer Acc
FIFO
explicitly managed
Memory Subsystem
![Page 63: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/63.jpg)
Software Support for Latency Hiding
Single ModuleNo Task-Pipelining
load inputs
load weights
matrix multiplication
store outputs
load inputs
load weights
matrix multiplication
store outputs
load inputs
load weights
matrix multiplication
store outputs
load inputs
load weights
matrix multiplication
store outputs
![Page 64: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/64.jpg)
Software Support for Latency Hiding
Single ModuleNo Task-Pipelining
load inputs
load weights
matrix multiplication
store outputs
load inputs
load weights
matrix multiplication
store outputs
load inputs
load weights
load inputs
load weights
matrix multiplication
matrix multiplication
store outputs
store outputs
Multiple-ModuleTask-Level Pipelining
![Page 65: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/65.jpg)
Software Support for Latency Hiding
Single ModuleNo Task-Pipelining
load inputs
load weights
matrix multiplication
store outputs
load inputs
load weights
matrix multiplication
store outputs
load inputs
load weights
load inputs
load weights
matrix multiplication
matrix multiplication
store outputs
store outputs
Multiple-ModuleTask-Level Pipelining
Explicit dependency tracking managed by software to hide memory latency
![Page 66: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/66.jpg)
Hardware-aware Search Space
Hardware!21
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Tensor Expression Language
![Page 67: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/67.jpg)
Hardware-aware Search Space
Hardware
Loop Transformations
Thread Bindings
Cache Locality
Primitives in prior work:Halide, Loopy
!21
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Tensor Expression Language
![Page 68: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/68.jpg)
Hardware-aware Search Space
Hardware
Loop Transformations
Thread Bindings
Cache Locality
Primitives in prior work:Halide, Loopy
Thread Cooperation Tensorization Latency
Hiding …New primitives for GPUs, and enable TPU-like Accelerators
!21
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Tensor Expression Language
![Page 69: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/69.jpg)
Hardware-aware Search Space
Hardware
Loop Transformations
Thread Bindings
Cache Locality
Thread Cooperation Tensorization Latency
Hiding
!22
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Tensor Expression Language
![Page 70: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/70.jpg)
Hardware-aware Search Space
Hardware
Loop Transformations
Thread Bindings
Cache Locality
Thread Cooperation Tensorization Latency
Hiding
!22
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Tensor Expression Language
![Page 71: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/71.jpg)
Hardware-aware Search Space
Hardware
Loop Transformations
Thread Bindings
Cache Locality
Thread Cooperation Tensorization Latency
Hiding
Billions of possible optimization choices
!22
C = tvm.compute((m, n), lambda y, x: tvm.sum(A[k, y] * B[k, x], axis=k))
Tensor Expression Language
![Page 72: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/72.jpg)
Learning-based Learning System Frameworks
High-level data flow graph and optimizations
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
!23
Hardware
![Page 73: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/73.jpg)
Learning-based Learning System Frameworks
High-level data flow graph and optimizations
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
!23
Hardware
![Page 74: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/74.jpg)
Learning-based Program Optimizer
!24
Program Optimizer
![Page 75: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/75.jpg)
Learning-based Program Optimizer
!24
Program Optimizer Program Code Generator
![Page 76: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/76.jpg)
Learning-based Program Optimizer
Runtime Measurements
!24
Program Optimizer Program Code Generator
![Page 77: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/77.jpg)
Learning-based Program Optimizer
Runtime Measurements
High experiment cost, each trial costs ~1second !24
Program Optimizer Program Code Generator
![Page 78: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/78.jpg)
Learning-based Program Optimizer
!25
Program Optimizer Program Code Generator
![Page 79: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/79.jpg)
Learning-based Program Optimizer
!25
Program Optimizer Program Code Generator
Cost Model
![Page 80: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/80.jpg)
Learning-based Program Optimizer
Need reliable cost model per hardware!25
Program Optimizer Program Code Generator
Cost Model
![Page 81: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/81.jpg)
Learning-based Program Optimizer
!26
Program Optimizer Program Code Generator
![Page 82: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/82.jpg)
Learning-based Program Optimizer
!26
Program Optimizer Program Code Generator
D<latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit>
Training data
![Page 83: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/83.jpg)
Learning-based Program Optimizer
!26
Program Optimizer Program Code Generator
D<latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit>
Training data
LearningStatistical Cost Model
![Page 84: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/84.jpg)
Learning-based Program Optimizer
!26
Program Optimizer Program Code Generator
D<latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit><latexit sha1_base64="1Z6CzjBl0OMVztfQ+m452YDkcY0=">AAAB8nicbVDLSsNAFL2pr1pfVZdugkVwVRIRdFnUhcsK9gFtKJPppB06mQkzN0IJ/Qw3LhRx69e482+ctFlo64GBwzn3MueeMBHcoOd9O6W19Y3NrfJ2ZWd3b/+genjUNirVlLWoEkp3Q2KY4JK1kKNg3UQzEoeCdcLJbe53npg2XMlHnCYsiMlI8ohTglbq9WOCY0pEdjcbVGte3ZvDXSV+QWpQoDmofvWHiqYxk0gFMabnewkGGdHIqWCzSj81LCF0QkasZ6kkMTNBNo88c8+sMnQjpe2T6M7V3xsZiY2ZxqGdzCOaZS8X//N6KUbXQcZlkiKTdPFRlAoXlZvf7w65ZhTF1BJCNbdZXTommlC0LVVsCf7yyaukfVH3vbr/cFlr3BR1lOEETuEcfLiCBtxDE1pAQcEzvMKbg86L8+58LEZLTrFzDH/gfP4AdN2RWg==</latexit>
Training data
LearningStatistical Cost Model
Adapt to hardware type by learning Make prediction in 1ms level
![Page 85: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/85.jpg)
Effectiveness of ML based Model
!27
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
![Page 86: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/86.jpg)
Effectiveness of ML based Model
!27
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
One Conv2D Layer of ResNet18 on Titan X
![Page 87: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/87.jpg)
Effectiveness of ML based Model
!27
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
Number of Trials
One Conv2D Layer of ResNet18 on Titan X
![Page 88: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/88.jpg)
Effectiveness of ML based Model
!27
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
Number of Trials
One Conv2D Layer of ResNet18 on Titan X
Rela
tive
Spee
dup
Baseline: CuDNN
![Page 89: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/89.jpg)
Effectiveness of ML based Model
!28
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
Number of Trials
One Conv2D Layer of ResNet18 on Titan X
Rela
tive
Spee
dup
Baseline: CuDNN
TVM: Random Search
![Page 90: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/90.jpg)
Effectiveness of ML based Model
!29
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
0 100 200 300 400 500 600 700 8000.00
0.25
0.50
0.75
1.00
1.25
1.50
One Conv2D Layer of ResNet18 on Titan X
Rela
tive
Spee
dup
TVM: Random Search
TVM: ML-based Model
Number of Trials
Baseline: CuDNN
![Page 91: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/91.jpg)
Learning-based Learning System Frameworks
High-level data flow graph and optimizations
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
!30
Hardware
![Page 92: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/92.jpg)
End to End Inference Performance (Nvidia Titan X)
ResNet-18 MobileNet0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
Tim
e(m
s)
LSTM LM DQN DCGAN0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Tensorflow-XLA
Apache MXNetTensorflow
![Page 93: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/93.jpg)
End to End Inference Performance (Nvidia Titan X)
ResNet-18 MobileNet0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
Tim
e(m
s)
LSTM LM DQN DCGAN0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Tensorflow-XLA
Apache MXNetTensorflow
Backed by cuDNN
![Page 94: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/94.jpg)
End to End Inference Performance (Nvidia Titan X)
Tensorflow-XLA
Apache MXNetTensorflow
ResNet-18 MobileNet0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
Tim
e(m
s)
LSTM LM DQN DCGAN0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
TVM: without graph optimizations
![Page 95: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/95.jpg)
End to End Inference Performance (Nvidia Titan X)
Tensorflow-XLA
Apache MXNetTensorflow TVM: without graph optimizations
ResNet-18 MobileNet0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
Tim
e(m
s)
LSTM LM DQN DCGAN0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
TVM: all optimizations
![Page 96: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/96.jpg)
End to End Inference Performance (Nvidia Titan X)
Tensorflow-XLA
Apache MXNetTensorflow TVM: without graph optimizations
ResNet-18 MobileNet0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
Tim
e(m
s)
LSTM LM DQN DCGAN0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
TVM: all optimizations
Competitive on standard models
![Page 97: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/97.jpg)
End to End Inference Performance (Nvidia Titan X)
Tensorflow-XLA
Apache MXNetTensorflow TVM: without graph optimizations
ResNet-18 MobileNet0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
Tim
e(m
s)
LSTM LM DQN DCGAN0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
TVM: all optimizations
Competitive on standard models
Bigger gap on less conventional models
![Page 98: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/98.jpg)
End to End Performance(ARM Cortex-A53)
!34ResNet-18 MobileNet
0.0
100.0
200.0
300.0
400.0
500.0
600.0
700.0
800.0
Tim
e(m
s)
DQN0.0
2.0
4.0
6.0
8.0
10.0
12.0
Tensorflow Lite TVM w/o graph opt TVM
![Page 99: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/99.jpg)
End to End Performance(ARM Cortex-A53)
!34ResNet-18 MobileNet
0.0
100.0
200.0
300.0
400.0
500.0
600.0
700.0
800.0
Tim
e(m
s)
DQN0.0
2.0
4.0
6.0
8.0
10.0
12.0
Tensorflow Lite TVM w/o graph opt TVM
Specially optimized for Embedded system(ARM)
![Page 100: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/100.jpg)
End to End Performance(ARM GPU)
!35
float32 float16ResNet-18
float32 float16MobileNet
0.0
50.0
100.0
150.0
200.0
250.0
Tim
e(m
s)
float32 float16DQN
0.0
1.0
2.0
3.0
4.0
5.0
ARMComputeLib TVM w/o graph opt TVM
![Page 101: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/101.jpg)
Supporting New Specialized Accelerators Frameworks
High-level data flow graph and optimizations
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
LLVM CUDA
!36
![Page 102: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/102.jpg)
Supporting New Specialized Accelerators Frameworks
High-level data flow graph and optimizations
Hardware aware Search Space of Optimized Tensor Programs
Machine Learning based Program Optimizer
LLVM CUDA VTA: Open, Customizable Deep Learning Accelerator
Data Center FPGAEdge FPGA ASIC
!36
![Page 103: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/103.jpg)
TVM/VTA: Full Stack Open Source System
!37
ML-based Optimizer
Tensor Program Search Space
High-level Optimizations
![Page 104: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/104.jpg)
TVM/VTA: Full Stack Open Source System
VTA MicroArchitecture
!37
ML-based Optimizer
Tensor Program Search Space
High-level Optimizations
![Page 105: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/105.jpg)
TVM/VTA: Full Stack Open Source System
VTA Hardware/Software Interface (ISA)
VTA MicroArchitecture
!37
ML-based Optimizer
Tensor Program Search Space
High-level Optimizations
![Page 106: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/106.jpg)
TVM/VTA: Full Stack Open Source System
VTA Runtime & JIT Compiler
VTA Hardware/Software Interface (ISA)
VTA MicroArchitecture
!37
ML-based Optimizer
Tensor Program Search Space
High-level Optimizations
![Page 107: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/107.jpg)
TVM/VTA: Full Stack Open Source System
VTA Runtime & JIT Compiler
VTA Hardware/Software Interface (ISA)
VTA MicroArchitecture VTA Simulator
!37
ML-based Optimizer
Tensor Program Search Space
High-level Optimizations
![Page 108: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/108.jpg)
TVM/VTA: Full Stack Open Source System
• JIT compile accelerator micro code
• Support heterogenous devices, 10x better than CPU on the same board.
• Move hardware complexity to software
VTA Runtime & JIT Compiler
VTA Hardware/Software Interface (ISA)
VTA MicroArchitecture VTA Simulator
!37
ML-based Optimizer
Tensor Program Search Space
High-level Optimizations
![Page 109: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/109.jpg)
TVM/VTA: Full Stack Open Source System
• JIT compile accelerator micro code
• Support heterogenous devices, 10x better than CPU on the same board.
• Move hardware complexity to software
VTA Runtime & JIT Compiler
VTA Hardware/Software Interface (ISA)
VTA MicroArchitecture VTA Simulator
!37
ML-based Optimizer
Tensor Program Search Space
High-level Optimizations}compiler, driver, hardware design full stack open source
![Page 110: TVM: An Automated End-to-End Optimizing Compiler for Deep ... · TVM: An Automated End-to-End Optimizing Compiler for Deep Learning Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin](https://reader034.fdocuments.us/reader034/viewer/2022042108/5e87d7f2d5d5860bbc6fd78f/html5/thumbnails/110.jpg)
TVM: Learning-based Learning System
Check it out!ML-based Optimizer
Tensor Program Search Space
High-level Optimizations
VTALLVM CUDA