Automatically Fusing Functions on CuPy
Uploaded by preferred-infrastructure-preferred-networks
Akifumi Imanishi
What's CuPy?
• An implementation of a NumPy-compatible multi-dimensional array on CUDA.
• CuPy lets us write Python code that runs on the GPU.
• Two basic operations:
  • elementwise: applying a function to each element
  • reduction: reducing many elements to fewer values (e.g. a sum)
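The two kinds of operation look like this in CuPy's NumPy-compatible API. This is a minimal sketch: it assumes NumPy may stand in for CuPy when no CUDA device is available, which works here because both libraries expose the same functions.

```python
# Elementwise vs. reduction operations in CuPy's NumPy-compatible API.
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch runs without a GPU

x = xp.arange(6, dtype=xp.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Elementwise: the same function is applied to every element independently.
squared = x * x                   # [[0, 1, 4], [9, 16, 25]]

# Reduction: many elements are combined into fewer values.
total = xp.sum(x)                 # 15.0
row_sums = xp.sum(x, axis=1)      # [3.0, 12.0]
```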
Problems of CuPy
• Small functions are called many times.
  • The communication time between the CPU and the GPU becomes a bottleneck.
• A mechanism for fusing functions is needed to resolve this.
• Example: x * y + z * 3 + 5
  • This takes 4 kernel calls in total (two multiplications and two additions).
  • We want to compute the whole expression in a single kernel call.
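Written out step by step, the unfused evaluation makes the kernel count visible. On CuPy every arithmetic operator on arrays is a separate ufunc call, so each line below launches its own kernel and allocates a temporary array. The NumPy fallback is an assumption added only so the sketch runs without a GPU.

```python
# Unfused evaluation of x * y + z * 3 + 5: one kernel launch per operator on CuPy.
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
except Exception:
    import numpy as xp  # assumption: CPU fallback so the sketch runs without a GPU


def unfused(x, y, z):
    t1 = x * y      # kernel 1, temporary t1
    t2 = z * 3      # kernel 2, temporary t2
    t3 = t1 + t2    # kernel 3, temporary t3
    return t3 + 5   # kernel 4, final result


x = xp.array([1.0, 2.0, 3.0])
y = xp.array([4.0, 5.0, 6.0])
z = xp.array([7.0, 8.0, 9.0])
out = unfused(x, y, z)  # [30.0, 39.0, 50.0]
```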
UI for elementwise kernels
• Converting a Python function into an ElementwiseKernel.
• Example:
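A minimal sketch of this decorator-style UI with CuPy's `cupy.fuse` decorator follows; it is not the example from the slide. The no-op `fuse` used when CuPy or CUDA is missing is a hypothetical stand-in so the code still runs on NumPy; in that case nothing is fused.

```python
# Decorator-style fusion: one Python function becomes one elementwise kernel.
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
    fuse = xp.fuse
except Exception:
    import numpy as xp  # assumption: CPU fallback when CuPy/CUDA is unavailable

    def fuse(*args, **kwargs):
        """Hypothetical no-op stand-in for cupy.fuse: returns the function unchanged."""
        return lambda func: func


@fuse()
def f(x, y, z):
    # With CuPy, the whole body is compiled into a single kernel instead of four.
    return x * y + z * 3 + 5


x = xp.array([1.0, 2.0, 3.0])
y = xp.array([4.0, 5.0, 6.0])
z = xp.array([7.0, 8.0, 9.0])
out = f(x, y, z)  # [30.0, 39.0, 50.0]
```

The function body stays ordinary array code, so the same definition can be called unfused for debugging simply by removing the decorator.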
Constructing a Data Structure
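[Figure: the expression x * y + z * 3 + 5 represented as a tree, ((x * y) + (z * 3)) + 5, with inputs x, y, z and constants 3 and 5 as leaves]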
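One way to obtain such a tree is to call the user's function once with placeholder objects that overload `+` and `*` and record each operation instead of computing it. This is a sketch of that tracing idea under that assumption, not CuPy's internal code; `Node` and `trace` are hypothetical names.

```python
# Build an expression tree by calling a Python function with recording proxies.


class Node:
    """A tree node: a variable, a constant, or a binary operator with two children."""

    def __init__(self, op, children=(), value=None):
        self.op = op              # 'var', 'const', '+' or '*'
        self.children = children  # child Nodes (empty for leaves)
        self.value = value        # variable name or constant value for leaves

    @staticmethod
    def _lift(other):
        # Python scalars such as 3 and 5 become constant leaves.
        return other if isinstance(other, Node) else Node('const', value=other)

    def __add__(self, other):
        return Node('+', (self, Node._lift(other)))

    def __radd__(self, other):
        return Node('+', (Node._lift(other), self))

    def __mul__(self, other):
        return Node('*', (self, Node._lift(other)))

    def __rmul__(self, other):
        return Node('*', (Node._lift(other), self))

    def __repr__(self):
        if self.op in ('var', 'const'):
            return str(self.value)
        left, right = self.children
        return f'({left!r} {self.op} {right!r})'


def trace(func):
    """Call func once with one variable leaf per parameter and return the tree."""
    code = func.__code__
    names = code.co_varnames[:code.co_argcount]
    return func(*(Node('var', value=n) for n in names))


def f(x, y, z):
    return x * y + z * 3 + 5


tree = trace(f)
print(tree)  # (((x * y) + (z * 3)) + 5)
```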
Generating an ElementwiseKernel
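Once the tree exists, generating a kernel can be as simple as printing the tree as a C expression and listing its variables as inputs. The sketch below assumes a hypothetical nested-tuple tree format and produces the three strings that `cupy.ElementwiseKernel` takes as `in_params`, `out_params` and `operation`. `T` is ElementwiseKernel's type placeholder; the output name `out0` is an arbitrary choice.

```python
# Turn an expression tree into the source strings for an elementwise kernel.


def to_c(expr):
    """Render a nested-tuple tree ('op', left, right) as a parenthesized C expression."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return f'({to_c(left)} {op} {to_c(right)})'
    return str(expr)  # variable name or numeric literal


def collect_vars(expr, out=None):
    """Collect variable names in first-appearance order; they become kernel inputs."""
    if out is None:
        out = []
    if isinstance(expr, tuple):
        for child in expr[1:]:
            collect_vars(child, out)
    elif isinstance(expr, str) and expr not in out:
        out.append(expr)
    return out


def kernel_pieces(expr):
    """Return (in_params, out_params, operation) strings for one output."""
    in_params = ', '.join(f'T {name}' for name in collect_vars(expr))
    return in_params, 'T out0', f'out0 = {to_c(expr)};'


tree = ('+', ('+', ('*', 'x', 'y'), ('*', 'z', 3)), 5)  # x * y + z * 3 + 5
in_params, out_params, operation = kernel_pieces(tree)
# in_params == 'T x, T y, T z'
# operation == 'out0 = (((x * y) + (z * 3)) + 5);'
```

With CuPy available, `cupy.ElementwiseKernel(in_params, out_params, operation, 'fused_kernel')` would compile these strings into one kernel ('fused_kernel' is a hypothetical name), and calling it as `kernel(x, y, z)` would evaluate the whole expression in a single launch.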
UI for reduction kernels
• Converting a Python function into a ReductionKernel.
• Example:
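`cupy.fuse` also accepts a function that ends in a reduction. The sketch below, which is not the slide's example, folds the elementwise product into a sum along the last axis. As before, the no-op `fuse` is a hypothetical stand-in for when CuPy or CUDA is missing.

```python
# A fused elementwise-then-reduce function.
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
    fuse = xp.fuse
except Exception:
    import numpy as xp  # assumption: CPU fallback when CuPy/CUDA is unavailable

    def fuse(*args, **kwargs):
        """Hypothetical no-op stand-in for cupy.fuse: returns the function unchanged."""
        return lambda func: func


@fuse()
def sum_of_products(x, y):
    # With CuPy, the product x * y is computed inside the reduction kernel
    # instead of being materialized as a temporary array first.
    return xp.sum(x * y, axis=-1)


x = xp.array([[1.0, 2.0], [3.0, 4.0]])
y = xp.array([[5.0, 6.0], [7.0, 8.0]])
out = sum_of_products(x, y)  # [1*5 + 2*6, 3*7 + 4*8] = [17.0, 53.0]
```

For comparison, a hand-written counterpart would be roughly `cupy.ReductionKernel('T x, T y', 'T z', 'x * y', 'a + b', 'z = a', '0', 'sum_of_products')`, where the map, reduce and post-map expressions must be written as C strings.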
Rewriting adam.py using "fuse"
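The Adam update is a natural target because each step touches `param`, `m` and `v` with several elementwise operations. Below is a sketch of what a fused update could look like; it is not the code from the slide. It assumes the installed CuPy supports in-place updates of arguments inside a fused function, and that bias correction is already folded into `lr` by the caller. The function name and argument order are hypothetical.

```python
# A fused Adam update step that modifies param, m and v in place.
try:
    import cupy as xp
    if xp.cuda.runtime.getDeviceCount() == 0:
        raise RuntimeError("no CUDA device")
    fuse = xp.fuse
except Exception:
    import numpy as xp  # assumption: CPU fallback when CuPy/CUDA is unavailable

    def fuse(*args, **kwargs):
        """Hypothetical no-op stand-in for cupy.fuse: returns the function unchanged."""
        return lambda func: func


@fuse()
def adam_update(grad, lr, one_minus_beta1, one_minus_beta2, eps, param, m, v):
    # Update the first and second moment estimates in place.
    m += one_minus_beta1 * (grad - m)
    v += one_minus_beta2 * (grad * grad - v)
    # Parameter step; lr is assumed to already include bias correction.
    param -= lr * m / (xp.sqrt(v) + eps)


grad = xp.array([0.5, -1.0])
param = xp.array([1.0, 2.0])
m = xp.zeros_like(param)
v = xp.zeros_like(param)
adam_update(grad, 0.001, 1 - 0.9, 1 - 0.999, 1e-8, param, m, v)
```

Writing the three updates in one function means `grad`, `m`, `v` and `param` are each read once per step instead of once per operator.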
Results
• chainer/optimizers/adam.py (update_one_gpu)
• chainer/example/mnist/train_mnist.py
Running times and memory usage
Implementation   Running time (unit not given)   Memory usage (MiB)
Ufunc            78.656                          225
Elementwise      62.430                          211
Fusion           62.874                          211
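To read these results in relative terms, the small calculation below uses only the values transcribed from the slide; the running-time unit is not stated there, so only ratios are meaningful.

```python
# Relative running time and memory savings computed from the reported values.
running_time = {"Ufunc": 78.656, "Elementwise": 62.430, "Fusion": 62.874}
memory_mib = {"Ufunc": 225, "Elementwise": 211, "Fusion": 211}

# Running time relative to the Elementwise version.
slowdown = {name: t / running_time["Elementwise"] for name, t in running_time.items()}
# Memory saved compared with the Ufunc version.
saved_mib = {name: memory_mib["Ufunc"] - mib for name, mib in memory_mib.items()}

for name in running_time:
    print(f"{name:12s} {slowdown[name]:.3f}x  saves {saved_mib[name]} MiB")
```

By these numbers, Fusion is within about 1% of Elementwise in running time, takes roughly 20% less time than Ufunc (62.874 / 78.656 ≈ 0.80), and uses 14 MiB less memory than Ufunc, the same as Elementwise.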