Automatically Fusing Functions on CuPy

Automatically Fusing Functions on CuPy Akifumi Imanishi

Transcript of Automatically Fusing Functions on CuPy

Page 1: Automatically Fusing Functions on CuPy

Automatically Fusing Functions on CuPy

Akifumi Imanishi

Page 2: Automatically Fusing Functions on CuPy

What's CuPy

• An implementation of a NumPy-compatible multi-dimensional array on CUDA

• CuPy enables us to write Python code that runs on the GPU.

• Two basic operations

  • elementwise

    • Applying a function to each element

  • reduction

    • Reducing elements

Page 3: Automatically Fusing Functions on CuPy

Problems of CuPy

• Small functions are called many times.

  • Communication time between CPU and GPU is a bottleneck.

• A mechanism for fusing functions is needed to resolve this.

• ex.) x * y + z * 3 + 5

  • There are 4 kernel calls in total (two multiplications and two additions).

• We want to calculate the expression in 1 kernel call.
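The count of 4 can be checked with a small pure-Python sketch. `CountingArray` is a hypothetical toy class, not part of CuPy: it treats every arithmetic operator as one simulated kernel launch that produces a temporary array, which is roughly what happens when each operator is dispatched separately.

```python
class CountingArray:
    """Toy array: every arithmetic operator counts as one kernel launch."""

    launches = 0  # class-wide counter of simulated kernel launches

    def __init__(self, values):
        self.values = list(values)

    def _apply(self, other, op):
        # Each operator creates a new temporary, like an unfused call.
        CountingArray.launches += 1
        if isinstance(other, CountingArray):
            pairs = zip(self.values, other.values)
            return CountingArray(op(a, b) for a, b in pairs)
        return CountingArray(op(a, other) for a in self.values)

    def __add__(self, other):
        return self._apply(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._apply(other, lambda a, b: a * b)


x = CountingArray([1.0, 2.0])
y = CountingArray([3.0, 4.0])
z = CountingArray([5.0, 6.0])

result = x * y + z * 3 + 5
print(CountingArray.launches)  # 4
print(result.values)           # [23.0, 31.0]
```

Fusion aims to produce the same values while bringing the launch count down to 1.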

Page 4: Automatically Fusing Functions on CuPy

UI for elementwise kernel

• Converting a Python function to an Elementwise.

• ex.)

Page 5: Automatically Fusing Functions on CuPy

Constructing a Data Structure

[Figure: expression tree for x * y + z * 3 + 5. Leaves: x, y, z, 3, 5. Internal nodes: two * and two +.]
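One common way to build such a tree is to call the Python function once with symbolic placeholders whose operators record the operation instead of computing it. The sketch below assumes that approach. `Node` and `trace` are hypothetical names for illustration, not CuPy internals.

```python
class Node:
    """Expression-tree node: an input variable, a constant, or an operator."""

    def __init__(self, op, children=(), name=None, value=None):
        self.op = op                    # "var", "const", "+", or "*"
        self.children = tuple(children)
        self.name = name
        self.value = value

    @staticmethod
    def wrap(obj):
        # Plain Python scalars become constant leaves.
        return obj if isinstance(obj, Node) else Node("const", value=obj)

    def __add__(self, other):
        return Node("+", (self, Node.wrap(other)))

    def __mul__(self, other):
        return Node("*", (self, Node.wrap(other)))

    def __repr__(self):
        if self.op == "var":
            return self.name
        if self.op == "const":
            return repr(self.value)
        left, right = self.children
        return f"({left!r} {self.op} {right!r})"


def trace(func, *names):
    """Call func once with symbolic inputs to record its operations."""
    return func(*(Node("var", name=n) for n in names))


tree = trace(lambda x, y, z: x * y + z * 3 + 5, "x", "y", "z")
print(tree)  # (((x * y) + (z * 3)) + 5)
```

The function body runs as ordinary Python, so no parsing of source code is needed to obtain the tree.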

Page 6: Automatically Fusing Functions on CuPy

Generating an Elementwise
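Once the whole expression is available as a tree, a single kernel body can be emitted by walking it. The following is a minimal sketch of that code-generation step under stated assumptions: the tree is written as nested tuples, and `emit`, `generate_kernel` and the kernel name `fused_kernel` are illustrative, not the generator used by CuPy. It only builds the CUDA C source as a string and does not compile it.

```python
def emit(node, index="i"):
    """Turn a nested-tuple expression tree into a C expression string."""
    if isinstance(node, str):
        return f"{node}[{index}]"      # input array, read at the thread index
    if isinstance(node, (int, float)):
        return repr(node)              # literal constant
    op, left, right = node             # binary operator node
    return f"({emit(left, index)} {op} {emit(right, index)})"


def generate_kernel(name, inputs, tree, dtype="float"):
    """Build the source of a single CUDA kernel that evaluates the tree."""
    params = ", ".join(f"const {dtype}* {v}" for v in inputs)
    return (
        f'extern "C" __global__ void {name}({params}, {dtype}* out, int n) {{\n'
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        f"    if (i < n) out[i] = {emit(tree)};\n"
        "}\n"
    )


# x * y + z * 3 + 5 as nested (operator, left, right) tuples
tree = ("+", ("+", ("*", "x", "y"), ("*", "z", 3)), 5)
source = generate_kernel("fused_kernel", ["x", "y", "z"], tree)
print(source)
```

All four operations end up inside one kernel body, so the intermediate results stay in registers instead of being written to temporary arrays.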

Page 7: Automatically Fusing Functions on CuPy

UI for reduction kernel

• Converting a Python function to a ReductionKernel.

• ex.)
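The example from the slide is not preserved in this transcript. As a stand-in, here is a pure-Python sketch of what fusing around a reduction means: a per-element map, a binary reduction, and a final map run in one pass. The three-part split mirrors the `map_expr`, `reduce_expr` and `post_map_expr` arguments of `cupy.ReductionKernel`. The helper `fused_reduction` itself is hypothetical.

```python
def fused_reduction(pre_map, reduce_op, post_map, identity):
    """Compose a per-element map, a reduction, and a final map into one pass."""

    def kernel(*arrays):
        acc = identity
        for elems in zip(*arrays):
            # Map and reduce in the same loop: no temporary array is built.
            acc = reduce_op(acc, pre_map(*elems))
        return post_map(acc)

    return kernel


# Euclidean distance: sqrt(sum((a - b) ** 2))
l2_distance = fused_reduction(
    pre_map=lambda a, b: (a - b) * (a - b),
    reduce_op=lambda acc, v: acc + v,
    post_map=lambda s: s ** 0.5,
    identity=0.0,
)

print(l2_distance([1.0, 2.0, 3.0], [1.0, 0.0, 3.0]))  # 2.0
```

Written with separate array operations, the same computation would need a subtraction, a multiplication, a sum and a square root, each as its own call.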

Page 8: Automatically Fusing Functions on CuPy

Rewrite adam.py by using “fuse”
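The rewritten code from the slide is not preserved in this transcript. For orientation, the sketch below writes one Adam step as a plain scalar function, assuming the standard Adam update with `lr` already bias-corrected by the caller. The function name, argument names and default values are illustrative, not copied from chainer. A function of this shape is a natural candidate for fusion, because its body is a short chain of elementwise operations over `param`, `grad`, `m` and `v`.

```python
import math


def adam_update(param, grad, m, v, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a single element; returns the updated (param, m, v)."""
    m = m + (1 - beta1) * (grad - m)          # first-moment estimate
    v = v + (1 - beta2) * (grad * grad - v)   # second-moment estimate
    param = param - lr * m / (math.sqrt(v) + eps)
    return param, m, v


# Applied element by element; on the GPU this loop would be one fused kernel.
params, grads = [1.0, -2.0], [0.5, 0.5]
ms, vs = [0.0, 0.0], [0.0, 0.0]
updated = [adam_update(p, g, m, v) for p, g, m, v in zip(params, grads, ms, vs)]
new_param, new_m, new_v = updated[0]
print(new_m, new_v)
```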

Page 9: Automatically Fusing Functions on CuPy

Results

• chainer/optimizers/adam.py (update_one_gpu)

• chainer/example/mnist/train_mnist.py

Memory usage (MiB)

Ufunc        225
Elementwise  211
Fusion       211

Running times (unit not stated)

Ufunc        78.656
Elementwise  62.430
Fusion       62.874