Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf ·...
Transcript of Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf ·...
![Page 1: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/1.jpg)
sa paUniversity of Washington MICRO 2012
Neural Accelerationfor General-PurposeApproximate ProgramsHadi EsmaeilzadehAdrian SampsonLuis CezeDoug Burger
University of Washington
Microsoft Research
![Page 2: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/2.jpg)
ProgramCPU
![Page 3: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/3.jpg)
JPL & Rob Hogg NASA thefrugalgirl.com
computer vision
machine learning
sensory data
physical simulation
information retrieval
augmented reality
image rendering
![Page 4: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/4.jpg)
computer vision
machine learning
sensory data
physical simulation
information retrieval
augmented reality
image rendering
Approximatecomputing
EnerJ programming language[PLDI 2011]
Truffle dual-voltage architecture[ASPLOS 2012]
Relax software fault recovery[de Kruijf et al., ISCA 2010]
Code perforation transformations[MIT]
Green runtime system[Baek and Chilimbi, PLDI 2010]
Flikker approximate DRAM[Liu et al., ASPLOS 2011]
Stochastic processors[Illinois]
Probabilistic CMOS designs[Rice, NTU, Georgia Tech…]
![Page 5: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/5.jpg)
Accelerators
CPU GPU
FPGAVector
Unit
DySERWisconsin
BERETMichiganConservation
CoresUCSD
![Page 6: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/6.jpg)
Accelerators
CPU GPU
FPGAVector
Unit
DySERWisconsin
BERETMichiganConservation
CoresUCSD
Approximatecomputingcomputer vision
machine learning
sensory data
physical simulation
information retrieval
augmented reality
image rendering
![Page 7: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/7.jpg)
An acceleratorfor approximate computations
ApproximateAccelerator
1.0
Mimics functions written in traditional languages!
Runs more efficiently than a CPU or a precise accelerator!
May introduce small errors!
√
√
√NEW
!
![Page 8: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/8.jpg)
Neural networksare function approximatorsTrainable: implements many functions
Very efficienthardware implementations
Highly parallel
Fault tolerant[Temam, ISCA 2012]
CPU
![Page 9: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/9.jpg)
Program
Neural acceleration
![Page 10: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/10.jpg)
Neural acceleration
Program
Annotate an approximateprogram component
![Page 11: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/11.jpg)
Program
Annotate an approximateprogram component
Compile the programand train a neural network
Neural acceleration
![Page 12: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/12.jpg)
Program
Annotate an approximateprogram component
Compile the programand train a neural network
Execute on a fast Neural Processing Unit (NPU)
Neural acceleration
![Page 13: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/13.jpg)
Annotate an approximateprogram component
Compile the programand train a neural network
Execute on a fast Neural Processing Unit (NPU)
Neural acceleration
Improve performance 2.3x and energy 3.0x on average
1234
![Page 14: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/14.jpg)
Programming model
float grad(float[3][3] p) { …}
edgeDetection()
void edgeDetection(Image &src, Image &dst) { for (int y = …) { for (int x = …) { dst[x][y] = grad(window(src, x, y)); } }}
[[transform]]
![Page 15: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/15.jpg)
Code region criteria
Hot code
Approximable
Well-definedinputs and outputs
grad()
run on every3x3 pixel window
small errors do not corrupt output
takes 9 pixel values;returns a scalar
√
√
√
![Page 16: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/16.jpg)
Empirically selectingtarget functions
Program AcceleratedProgram
√
√✗
![Page 17: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/17.jpg)
Compiling and transformingAnnotated
SourceCode
Training Inputs
TrainedNeural
Network
Augmented Binary
1. CodeObservation
2. Training3. Code
Generation
![Page 18: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/18.jpg)
Code observation
[[NPU]]float grad(float[3][3] p) { …}
void edgeDetection(Image &src, Image &dst) { for (int y = …) { for (int x = …) { dst[x][y] = grad(window(src, x, y)); } }}
+ =
test cases instrumented program
samplearguments & outputs
323, 231, 122, 93, 321, 49 53.2➝
p grad(p)
49, 423, 293, 293, 23, 2 94.2➝
34, 129, 493, 49, 31, 11 1.2➝
21, 85, 47, 62, 21, 577 64.2➝
7, 55, 28, 96, 552, 921 18.1➝
5, 129, 493, 49, 31, 11 92.2➝
49, 423, 293, 293, 23, 2 6.5➝
34, 129, 72, 49, 5, 2 120➝
323, 231, 122, 93, 321, 49 53.2➝
6, 423, 293, 293, 23, 2 49.7➝
record(p); record(result);
![Page 19: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/19.jpg)
Training
BackpropagationTraining
Training Inputs
![Page 20: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/20.jpg)
Training
Training Inputs
Training Inputs
Training Inputs
fasterless robust
slowermore accurate
70% 98% 99%
![Page 21: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/21.jpg)
Code generation
void edgeDetection(Image &src, Image &dst) { for (int y = …) { for (int x = …) { p = window(src, x, y); NPU_SEND(p[0][0]); NPU_SEND(p[0][1]); NPU_SEND(p[0][2]); … dst[x][y] = NPU_RECEIVE(); } }}
![Page 22: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/22.jpg)
Neural Processing Unit (NPU)
Core NPU
![Page 23: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/23.jpg)
Software interface:ISA extensions
Core NPU
input
output
configurationenq.cdeq.c
enq.d
deq.d
![Page 24: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/24.jpg)
Microarchitectural interface
NPUconfiguration
S
enq.cdeq.c
enq.d
deq.d
NS
S NS
Fetch
Decode
Issue
Execute
Memory
Commit
![Page 25: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/25.jpg)
A digital NPUBus
Scheduler
Processing Engines
input
output
scheduling
![Page 26: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/26.jpg)
A digital NPUBus
Scheduler
Processing Engines
input
output
schedulingmultiply-add unit
accumulator
sigmoid LUT
neuronweights
input
output
![Page 27: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/27.jpg)
Experiments
Several benchmarks; annotated one hot function eachFFT, inverse kinematics, triangle intersection, JPEG, K-means, Sobel
Simulated full programs on MARSSx86Energy modeled with McPAT and CACTIMicroarchitecture like Intel Penryn: 4-wide, 6-issue45 nm, 2080 MHz, 0.9 V
![Page 28: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/28.jpg)
1,079static x86-64instructions
60 neurons2 hidden layers
88 staticinstructions
18neurons
triangle intersection
edge detection
Two benchmarks
56% of dynamicinstructions
97% of dynamicinstructions
![Page 29: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/29.jpg)
Speedup with NPU acceleration
2.3x average speedupRanges from 0.8x to 11.1x
0x
2x
4x
6x
8x
10x
12x
fft inversek2j jmeint jpeg kmeans sobel geometric mean
spee
dup
over
all-
CPU
exe
cutio
n
![Page 30: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/30.jpg)
Energy savings with NPU acceleration
3.0x average energy reductionAll benchmarks benefit
0x
2x
4x
6x
8x
10x
12x
fft inversek2j jmeint jpeg kmeans sobel geometric mean
ener
gy re
duct
ion
over
all-
CPU
exe
cutio
n
21.1x
![Page 31: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/31.jpg)
Application quality loss
Quality loss below 10% in all casesBased on application-specific quality metrics
0%
20%
40%
60%
80%
100%
fft inversek2j jmeint jpeg kmeans sobel geometric mean
qual
ity d
egra
datio
n
![Page 32: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/32.jpg)
Edge detectionwith gradient calculation on NPU
![Page 33: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/33.jpg)
Also in the paper
Sensitivity to communication latency
Sensitivity to NN evaluation efficiency
Sensitivity to PE count
Benchmark statistics
All-software NN slowdown
![Page 34: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/34.jpg)
Program
![Page 35: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/35.jpg)
Program
![Page 36: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/36.jpg)
Program
Neural networks can efficiently approximate functions from programs written in conventional languages.
![Page 37: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/37.jpg)
CPU
low powerparallel
regular fault-tolerantanalog
flexible
![Page 38: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/38.jpg)
![Page 39: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/39.jpg)
Normalized dynamic instructions
0%
20%
40%
60%
80%
100%
fft inversek2j jmeint jpeg kmeans sobel geometric mean
dyna
mic
inst
ruct
ion
coun
t nor
mal
ized
to o
rigin
al
other instructionsNPU queue instructions
![Page 40: Neural Acceleration for General-Purpose Approximate Programsasampson/media/npu-micro-slides.pdf · Neural Acceleration for General-Purpose Approximate Programs Hadi Esmaeilzadeh Adrian](https://reader033.fdocuments.us/reader033/viewer/2022042214/5eb9023c096f4f49a60830d5/html5/thumbnails/40.jpg)
Slowdown with software NN
20x average slowdownUsing off-the-shelf FANN library
0x
15x
30x
45x
60x
75x
fft inversek2j jmeint jpeg kmeans sobel geometric mean
slow
dow
n ov
er o
rigin
al p
rogr
am