CS 744: TVM


Transcript of CS 744: TVM

Page 1: CS 744: TVM

CS 744: TVM

Shivaram Venkataraman, Fall 2020

Tensor Virtual Machine


Page 2: CS 744: TVM

ADMINISTRIVIA

- Course project titles
- Project proposal aka Introduction (10/16): a 2 page writeup
  - Introduction
  - Related Work
  - Timeline (with eval plan)
- Midterm: Oct 22

Page 3: CS 744: TVM

MACHINE LEARNING: STACK

→ Distributed training
→ Inference is just the forward pass
→ Interplay between inference and training
→ Making distributed training easy vs. dealing with inference
→ Hardware

Page 4: CS 744: TVM

MOTIVATION: PERFORMANCE PORTABILITY

PyTorch → model file

"intent :?÷:www.rayf/4TE-iTIyconfute primitives matrix cow

multiply ed

you want high performance I 1

across hardware backends-

Dependence onvendor specific o o • q

libraries

MLmodels evolve fast ⇒ new operators

new combination of operators Y ⇒ Notavailable

in existingvendor

libraries

Page 5: CS 744: TVM

→ Python code describes the ML model
→ TVM
→ Binary file that runs on hardware

Page 6: CS 744: TVM

OPTIMIZATION: COMPUTATION GRAPHS

Operator Fusion
→ 1-to-1 ("map"-like) operators
→ Sum reduction, with scaling fused after

Data layout
→ Row major, column major, blocked
→ Example: a 2-layer NN, where each tensor is represented in a chosen layout
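
To make operator fusion concrete, here is a minimal pure-Python/NumPy sketch (not TVM code; the function names are made up for illustration). The unfused version materializes the matmul result in memory before applying ReLU; the fused version applies ReLU while the value is still in the accumulator, avoiding the intermediate array.

```python
import numpy as np

def matmul_then_relu(x, w):
    # Unfused: the matmul writes a full intermediate array to memory,
    # and the ReLU reads it back.
    y = x @ w
    return np.maximum(y, 0.0)

def fused_matmul_relu(x, w):
    # Fused: one pass over the output; the intermediate value never
    # leaves the accumulator, so there is no extra memory traffic.
    m, k = x.shape
    _, n = w.shape
    out = np.empty((m, n), dtype=x.dtype)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += x[i, p] * w[p, j]
            out[i, j] = acc if acc > 0.0 else 0.0  # ReLU applied in place
    return out

x = np.random.rand(4, 8).astype(np.float32)
w = np.random.rand(8, 3).astype(np.float32)
assert np.allclose(matmul_then_relu(x, w), fused_matmul_relu(x, w), atol=1e-5)
```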

Page 7: CS 744: TVM

TENSOR EXPRESSION LANGUAGE

Common arithmetic, math operations
Know the shape of the output and the data accessed

→ Each operator is expressed in the tensor expression language as (tensor) math operations
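
As a hedged sketch of what a declaration in the tensor expression language looks like, the snippet below describes a transposed matrix multiply: only the output shape and the data accessed are specified, with no loop order or hardware mapping. Names follow the tvm.te module as of roughly TVM 0.7 (Fall 2020) and may differ in other releases.

```python
import tvm
from tvm import te

n, m, l = 1024, 1024, 1024
A = te.placeholder((n, l), name="A")
B = te.placeholder((m, l), name="B")
k = te.reduce_axis((0, l), name="k")

# The expression says *what* C is: its shape and which elements of A and B
# each output element reads. Loop order, tiling, and parallelism come later
# from a schedule.
C = te.compute((n, m), lambda i, j: te.sum(A[i, k] * B[j, k], axis=k), name="C")

s = te.create_schedule(C.op)                   # default schedule
func = tvm.build(s, [A, B, C], target="llvm")  # generate CPU code
```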

Page 8: CS 744: TVM

CODE GENERATION

Nested parallelism

Tensorization

→ Halide-style expressions; nested loops (for i: for j: ...) parallelized with OpenMP-style threads that can use shared memory
→ Tensorization: map loop patterns (load, store, add) to the hardware instruction set
→ Extensible: allows you to register an operator intrinsic
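
A hedged sketch of the nested-parallelism idea using tvm.te schedule primitives (names follow TVM ~0.7 tutorials and may vary by version): the compute declaration is unchanged, and the schedule only rewrites the loop nest. Tensorization would go one step further and replace the inner tile with a registered hardware intrinsic, which is omitted here.

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)
i, j = C.op.axis
io, ii = s[C].split(i, factor=32)   # tile the i loop
jo, ji = s[C].split(j, factor=32)   # tile the j loop
s[C].reorder(io, jo, ii, ji, k)     # choose the nested-loop order
s[C].parallel(io)                   # run the outer tiles across CPU threads
s[C].unroll(ji)                     # unroll the innermost tile loop

print(tvm.lower(s, [A, B, C]))      # inspect the lowered loop nest
```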

Page 9: CS 744: TVM

LATENCY HIDING

What is the goal?

→ Same as PyTorch etc.?
→ Overlap computation and communication (memory access)
→ A schedule that utilizes both memory bandwidth and compute units
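
The paper hides memory latency on accelerators by lowering virtual threads to an explicit, synchronized instruction stream; the plain-Python sketch below (all names hypothetical) only illustrates the general overlap idea with double buffering: the next tile is loaded while the current one is being computed.

```python
import threading, queue

def load(i):
    # Stand-in for a memory / DMA load of tile i.
    return [float(i)] * 1024

def compute(tile):
    # Stand-in for the compute stage on one tile.
    return sum(tile)

def pipelined(num_tiles):
    # While tile i is being computed, tile i+1 is already being loaded,
    # so load latency is hidden behind compute.
    buf = queue.Queue(maxsize=1)

    def loader():
        for i in range(num_tiles):
            buf.put(load(i))

    t = threading.Thread(target=loader)
    t.start()
    total = 0.0
    for _ in range(num_tiles):
        total += compute(buf.get())
    t.join()
    return total

print(pipelined(8))
```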

Page 10: CS 744: TVM

AUTOMATING OPTIMIZATION

Goal: Create a specialized operator for input shape and layout
Challenge: Choose appropriate schedule optimizations: tiling size, loop unrolling

Automate the optimizer!

→ Lots of different choices, and also lots of parameters to choose for each
→ Which configurations should we try?
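
To see why automation is needed, here is a tiny sketch (knob names and values are made up) of how quickly the configuration space grows even with a handful of schedule knobs:

```python
import itertools

# Hypothetical schedule knobs for a single operator.
tile_i    = [4, 8, 16, 32, 64]
tile_j    = [4, 8, 16, 32, 64]
unroll    = [0, 2, 4, 8]
vectorize = [False, True]

configs = list(itertools.product(tile_i, tile_j, unroll, vectorize))
print(len(configs))   # 5 * 5 * 4 * 2 = 200 configs from just four knobs
```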

Page 11: CS 744: TVM

ML-BASED COST MODEL

Machine Learning Model Design Choices
Speed: Faster than the time it takes to evaluate a config
Quality: Use a rank objective to predict the relative order of runtimes

Gradient tree boosting model
- memory access count
- reuse ratio of each memory buffer at each loop level
- one-hot encoding of loop annotations

→ Running a candidate config on hardware takes seconds; the model instead predicts cost from features of the generated code
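
A hedged sketch of the cost-model idea with gradient tree boosting and a rank objective (the paper uses XGBoost this way, but this is not TVM's actual tuner, and the feature values below are random placeholders):

```python
import numpy as np
import xgboost as xgb  # assumed available

# Toy feature matrix: one row per candidate config, columns standing in for
# loop-level features (memory access count, buffer reuse ratios, one-hot
# loop annotations). Values here are random placeholders.
rng = np.random.default_rng(0)
X = rng.random((64, 10))
runtime = rng.random(64)           # measured runtimes (seconds) on hardware

# Rank objective: the model only needs the relative order of configs, so use
# a relevance label where faster configs rank higher.
relevance = (-runtime).argsort().argsort()   # fastest config gets the highest rank

dtrain = xgb.DMatrix(X, label=relevance)
dtrain.set_group([64])             # all configs belong to one query group (one operator)
params = {"objective": "rank:pairwise", "eta": 0.3, "max_depth": 6}
model = xgb.train(params, dtrain, num_boost_round=50)

scores = model.predict(xgb.DMatrix(X))   # higher score = predicted faster
best = int(np.argmax(scores))
```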

Page 12: CS 744: TVM

ML-BASED COST MODEL

Iteration:
- Select a batch of candidates
- Collect data
- Use as training data to update the model

How to select candidates? Parallel Simulated Annealing
Start from a random config; walk to a nearby config →
Successful if cost decreases, else reject

→ Each candidate is a configuration; the model predicts performance when using a config, e.g. ⟨c1, 20ms⟩, ⟨c2, 8ms⟩
→ The measured ⟨config, runtime⟩ pairs from each batch become training data
→ Ask the model: is cj better than ci? Yes → go and try cj on the cluster; No → generate another config
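
A minimal single-chain sketch of the candidate-selection loop (the paper runs several such chains in parallel); `predicted_cost` stands in for the learned cost model above, and the "nearby config" step is simplified to a random pick rather than mutating one knob.

```python
import math, random

def simulated_anneal(configs, predicted_cost, steps=500, temp=1.0, cooling=0.99):
    """Random walk over the config space guided by the cost model's predictions."""
    cur = random.choice(configs)
    cur_cost = predicted_cost(cur)
    best, best_cost = cur, cur_cost
    for _ in range(steps):
        nxt = random.choice(configs)   # simplified "nearby" config
        nxt_cost = predicted_cost(nxt)
        # Accept if cheaper; occasionally accept a worse config to escape local minima.
        if nxt_cost < cur_cost or random.random() < math.exp((cur_cost - nxt_cost) / temp):
            cur, cur_cost = nxt, nxt_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        temp *= cooling
    return best

# The returned config would then be compiled and measured on real hardware,
# and the (config, runtime) pair fed back as training data for the model.
```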

Page 13: CS 744: TVM

DISTRIBUTED DEVICE POOL

Pool of devices to speed up profiling
RPC interface to run a trial on a device

Share device pools for multiple graphs
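
A hedged sketch of the device-pool idea in plain Python (`run_trial` is a made-up placeholder for the compile-and-measure RPC round trip; TVM exposes its own RPC interface for this): one worker per device pulls configs off a shared queue so every device stays busy and no device runs two trials at once.

```python
from concurrent.futures import ThreadPoolExecutor
import queue

def run_trial(device, config):
    # Hypothetical helper: ship `config` to `device` over RPC, run it, and
    # return the measured runtime.
    return 0.0  # placeholder measurement

def profile_batch(devices, configs):
    work = queue.Queue()
    for cfg in configs:
        work.put(cfg)
    results = {}

    def worker(device):
        # Each worker owns one device and keeps pulling configs until the queue is empty.
        while True:
            try:
                cfg = work.get_nowait()
            except queue.Empty:
                return
            results[cfg] = run_trial(device, cfg)

    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        for dev in devices:
            pool.submit(worker, dev)
    return results

print(profile_batch(["gpu0", "gpu1"], [("tile", 8), ("tile", 16), ("tile", 32)]))
```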

Page 14: CS 744: TVM

SUMMARY

TVM: Compiler for ML inference models
Supports high performance for a range of models and hardware devices

Key ideas:
- Graph-level optimizations, e.g. operator fusion
- Tensor expression language: code-gen, latency hiding, etc.
- ML-based cost model for automation

Page 15: CS 744: TVM

DISCUSSION

https://forms.gle/WiVgJ3abGXXgfBN99

Page 16: CS 744: TVM

Consider that you are building an optimizer for Spark programs instead of ML inference. What would be some configuration knobs that you could similarly tune? What might be different from the TVM optimizer?

→ Similar logic: latency hiding, i.e. overlap computation and communication
→ Operator fusion → fuse map-like operators? Access patterns are harder to analyze since operators are user-defined, which makes this challenging
→ Partitioning → can you automate the number of partitions / co-partitioning for performance?
→ Caching / persistence → part of the config space, but today it is manually inserted
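
A sketch of some of the Spark knobs such a tuner could search over, using PySpark (the values and the input path are arbitrary examples, not recommendations):

```python
from pyspark.sql import SparkSession

# A few of the knobs an automated Spark tuner could search over.
spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.sql.shuffle.partitions", 200)   # number of shuffle partitions
    .config("spark.default.parallelism", 128)      # default RDD partition count
    .config("spark.executor.memory", "4g")         # memory per executor
    .config("spark.memory.fraction", 0.6)          # execution/storage vs. user memory
    .getOrCreate()
)

rdd = spark.sparkContext.textFile("hdfs:///data/input")  # hypothetical path
rdd = rdd.repartition(64)   # partitioning knob, loosely analogous to a tile size
rdd.persist()               # caching/persistence: today inserted manually
```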

Page 17: CS 744: TVM

What is your takeaway from the following figure?


Page 18: CS 744: TVM

NEXT STEPS

Next class: Ray
Course project: Oct 16 (introductions)
Midterm: Oct 22

Latency hiding in Spark?
→ rdd1 map tasks → reduce tasks: the reduce tasks must wait for shuffle data
→ rdd2 map tasks read input files with no communication