Automatically Fusing Functions on CuPy

Post on 06-Jan-2017



Automatically Fusing Functions on CuPy

Akifumi Imanishi

What's CuPy

• An implementation of a NumPy-compatible multi-dimensional array on CUDA

• CuPy enables us to write Python code that runs on the GPU.

• Two basic operations

• elementwise: applying a function to each element

• reduction: reducing elements

Problems of CuPy

• Small functions are called many times.

• Communication time between CPU and GPU is a bottleneck.

• A mechanism for fusing functions is needed to resolve this.

• ex.) x * y + z * 3 + 5

• There are 4 kernel calls in total.

• We want to calculate the expression in 1 kernel call.
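To make the "4 kernel calls versus 1" point concrete, here is a minimal pure-Python sketch that uses plain lists in place of GPU arrays. The names `unfused` and `fused` are hypothetical; each call to `apply` stands in for one kernel launch that reads its inputs and writes a temporary.

```python
import operator


def unfused(x, y, z):
    """Evaluate x * y + z * 3 + 5 one operator at a time.

    Each apply() call is a stand-in for a separate kernel launch.
    """
    passes = 0

    def apply(op, a, b):
        nonlocal passes
        passes += 1
        # Broadcast a scalar operand to the length of the array operand.
        b_seq = b if isinstance(b, list) else [b] * len(a)
        return [op(p, q) for p, q in zip(a, b_seq)]

    t1 = apply(operator.mul, x, y)    # x * y
    t2 = apply(operator.mul, z, 3)    # z * 3
    t3 = apply(operator.add, t1, t2)  # x * y + z * 3
    out = apply(operator.add, t3, 5)  # ... + 5
    return out, passes


def fused(x, y, z):
    """Evaluate the whole expression in a single pass, with no temporaries."""
    out = [a * b + c * 3 + 5 for a, b, c in zip(x, y, z)]
    return out, 1


x, y, z = [1, 2, 3], [4, 5, 6], [7, 8, 9]
unfused_out, unfused_passes = unfused(x, y, z)
fused_out, fused_passes = fused(x, y, z)
print(unfused_passes, fused_passes)
```

Both versions compute the same values; the unfused one makes four passes and allocates three temporaries, which is the overhead that fusion is meant to remove.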

UI for elementwise kernel

• Converting a Python function to an Elementwise.

• ex.)

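Constructing a Data Structure

[Figure: expression tree for x * y + z * 3 + 5, with inputs x, y, z, constants 3 and 5, two * nodes and two + nodes]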
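One way such a tree could be constructed is to call the Python function with proxy objects that record every operator applied to them. The sketch below assumes this tracing approach; the `Node` class and `trace` helper are hypothetical names for illustration, not CuPy internals.

```python
class Node:
    """One node of the expression tree: an input, a constant, or an operator."""

    def __init__(self, op, args=(), name=None, value=None):
        self.op, self.args, self.name, self.value = op, args, name, value

    @staticmethod
    def wrap(v):
        # Turn plain Python numbers into constant nodes.
        return v if isinstance(v, Node) else Node('const', value=v)

    def __add__(self, other):
        return Node('+', (self, Node.wrap(other)))

    def __radd__(self, other):
        return Node('+', (Node.wrap(other), self))

    def __mul__(self, other):
        return Node('*', (self, Node.wrap(other)))

    def __rmul__(self, other):
        return Node('*', (Node.wrap(other), self))

    def __repr__(self):
        if self.op == 'input':
            return self.name
        if self.op == 'const':
            return repr(self.value)
        return '(%r %s %r)' % (self.args[0], self.op, self.args[1])


def trace(func, *names):
    """Call func with placeholder inputs and return the recorded tree."""
    return func(*[Node('input', name=n) for n in names])


def f(x, y, z):
    return x * y + z * 3 + 5


tree = trace(f, 'x', 'y', 'z')
print(tree)
```

The function body runs once, on placeholders rather than arrays, so no GPU work happens during tracing; only the structure of the computation is captured.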

Generating an Elementwise
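A rough sketch of the generation step: walk the expression tree and emit one C statement per operator, so the whole expression becomes a single kernel body. The tuple-based tree encoding and the helper names `emit` and `generate_body` are assumptions made for this illustration.

```python
def emit(node, lines, counter):
    """Return the C name holding node's value, appending statements to lines."""
    if not isinstance(node, tuple):
        # Leaf: an input name such as 'x' or a numeric constant.
        return str(node)
    op, lhs, rhs = node
    a = emit(lhs, lines, counter)
    b = emit(rhs, lines, counter)
    name = 'v%d' % counter[0]
    counter[0] += 1
    lines.append('T %s = %s %s %s;' % (name, a, op, b))
    return name


def generate_body(tree, out='out'):
    """Generate the body of one elementwise kernel for the whole tree."""
    lines = []
    result = emit(tree, lines, [0])
    lines.append('%s = %s;' % (out, result))
    return '\n'.join(lines)


# x * y + z * 3 + 5 as nested (operator, left, right) tuples.
tree = ('+', ('+', ('*', 'x', 'y'), ('*', 'z', 3)), 5)
body = generate_body(tree)
print(body)
```

A string of this shape could plausibly be passed as the operation argument of `cupy.ElementwiseKernel`, with `'T x, T y, T z'` as input parameters and `'T out'` as the output parameter, so that the four operators run in one kernel.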

UI for reduction kernel

• Converting a Python function to a ReductionKernel.

• ex.)
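The example from the slide is not preserved here. As a stand-in, the following pure-Python sketch shows what fusing an elementwise step into a reduction means: the per-element map, the reduction, and the final map all happen in one pass. The helper `fused_reduction` and its parameter names are hypothetical and only loosely mirror the map/reduce/post-map structure of a reduction kernel.

```python
def fused_reduction(pre_map, reduce_op, post_map, identity):
    """Build a function that maps, reduces, and post-processes in one pass."""

    def kernel(*arrays):
        acc = identity
        for elems in zip(*arrays):
            # The elementwise part is applied on the fly, so no
            # intermediate array of mapped values is materialized.
            acc = reduce_op(acc, pre_map(*elems))
        return post_map(acc)

    return kernel


# Sum of squared differences: sum((x - y) * (x - y)).
squared_distance = fused_reduction(
    pre_map=lambda x, y: (x - y) * (x - y),
    reduce_op=lambda a, b: a + b,
    post_map=lambda a: a,
    identity=0,
)

result = squared_distance([1, 2, 3], [3, 2, 1])
print(result)
```

Without fusion, the subtraction, the multiplication, and the sum would each be a separate kernel with its own temporary array.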

Rewrite adam.py by using "fuse"
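The rewritten code from the slide is not preserved here. The sketch below writes a common form of the Adam update (moment updates followed by the parameter step, without bias correction) as one plain Python function of scalars, which is the kind of function a fusing decorator could turn into a single elementwise kernel. It is an assumption about the shape of the update, not a copy of chainer's adam.py, and `adam_update` is a hypothetical name.

```python
import math


def adam_update(param, grad, m, v, lr, beta1, beta2, eps):
    """One Adam step for a single element; returns the updated (param, m, v)."""
    m += (1 - beta1) * (grad - m)         # first-moment estimate
    v += (1 - beta2) * (grad * grad - v)  # second-moment estimate
    param -= lr * m / (math.sqrt(v) + eps)
    return param, m, v


# Apply the update elementwise over small example "arrays".
params, grads = [1.0, -2.0], [0.5, 0.25]
ms, vs = [0.0, 0.0], [0.0, 0.0]
updated = [
    adam_update(p, g, m, v, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8)
    for p, g, m, v in zip(params, grads, ms, vs)
]
print(updated)
```

Written with one array operation per line and no fusion, the same update would launch a kernel for every arithmetic operator; as a single fused function it needs only one.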

Results

• chainer/optimizers/adam.py (update_one_gpu)

• chainer/example/mnist/train_mnist.py

Memory usage (MiB)

• Ufunc: 225

• Elementwise: 211

• Fusion: 211

Running times

• Ufunc: 78.656

• Elementwise: 62.430

• Fusion: 62.874