Comparison of deep learning frameworks from a viewpoint of double backpropagation

Post on 21-Jan-2018


Preferred Networks, Inc.
Kenta Oono <oono@preferred.jp>

Chainer Meetup #6 @ Preferred Networks, Sep. 30th 2017

Agenda

• Technological stack of DL frameworks
• Design choice in DL frameworks
• Double backprop primer
• Coding examples of double backprop in Chainer, PyTorch, and TF

Technology stack of a DL framework

name | functions | example
Graphical visualization | | DIGITS, TensorBoard
Machine learning workflow management | Dataset prep, Save/Load, Training loop | Keras, TF slim
Computational graph (CG) management | Build/Optimize CGs, Forward/Back prop | Theano, TensorFlow, Torch.nn
Multi-dimensional array processing | High-level array manipulation | NumPy, CuPy, Eigen, Torch (core)
Numerical computation | Matrix operation, Convolution | BLAS (OpenBLAS, MKL), cuBLAS, cuDNN, MKL DNN
Computational device | | CPU, GPU, TPU, FPGA

Technology stack of Chainer

Chainer
NumPy | CuPy
BLAS | cuBLAS, cuRAND, cuDNN
CPU | GPU

(Layers are listed from top to bottom; the left column is the CPU path and the right column is the GPU path.)

Technology stack of TensorFlow

TensorBoard
Keras, TF slim
TensorFlow
Eigen::Tensor
BLAS | cuBLAS, cuRAND, cuDNN
CPU | GPU

(Layers are listed from top to bottom; the left column is the CPU path and the right column is the GPU path.)

Technology stack of Theano

Keras, Lasagne, Blocks, etc.
Theano
NumPy | libgpuarray
BLAS | CUDA, OpenCL, CUDA Toolkit
CPU | GPU

(Layers are listed from top to bottom; the left column is the CPU path and the right column is the GPU path.)

Technology stack of Keras

Keras
TensorFlow | Theano
Technology stack of TF | Technology stack of Theano

(Layers are listed from top to bottom.)


Important design choices through user's typical workflow

Phases: Coding, Execution, Improvement

• Write NNs (in which language?)
• Compute backprop (how?)
• Update parameters (how to represent? how to update?)
• Run user code (when?)
• Optimize CG (how?)
• Scale up training (how?)


http://bit.ly/aaai-dlif

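Neural Network as a Computational Graph

• In most frameworks, an NN is conceptualized as a computational graph (CG).
• The simplest form of a CG is a bipartite DAG (Directed Acyclic Graph) consisting of data nodes and operator nodes.

y = x1 * x2
z = y - x3

[Figure: data nodes x1 and x2 feed the operator node mul, which outputs data node y; y and x3 feed the operator node sub, which outputs data node z.]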
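The bipartite structure above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the class names DataNode and OperatorNode and the helper edges are hypothetical, not taken from any framework discussed in the talk.

```python
class DataNode:
    """Holds a value; `creator` is the operator node that produced it (None for inputs)."""
    def __init__(self, name, value, creator=None):
        self.name = name
        self.value = value
        self.creator = creator


class OperatorNode:
    """Applies `fn` to its input data nodes and produces one output data node."""
    def __init__(self, name, fn, inputs):
        self.name = name
        self.fn = fn
        self.inputs = inputs

    def apply(self, out_name):
        value = self.fn(*(d.value for d in self.inputs))
        return DataNode(out_name, value, creator=self)


def edges(node):
    """List (source, destination) edges found by walking backward from a data node."""
    found = []
    if node.creator is not None:
        op = node.creator
        found.append((op.name, node.name))      # operator -> data
        for d in op.inputs:
            found.append((d.name, op.name))     # data -> operator
            found.extend(edges(d))
    return found


x1, x2, x3 = DataNode("x1", 2.0), DataNode("x2", 3.0), DataNode("x3", 1.0)
y = OperatorNode("mul", lambda a, b: a * b, [x1, x2]).apply("y")   # y = x1 * x2
z = OperatorNode("sub", lambda a, b: a - b, [y, x3]).apply("z")    # z = y - x3
```

Every edge returned by edges(z) connects a data node to an operator node or the reverse, never two nodes of the same kind, which is what makes the graph bipartite.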

Multi Layer Perceptron (MLP)

[Figure: x → Affine (W1, b1) → h1 → ReLU → a1 → Affine (W2, b2) → h2 → ReLU → a2 → Softmax → prob → Cross Entropy (with label t) → loss]
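The forward pass of this MLP graph can be written out node by node. The sketch below uses plain Python lists instead of an array library, and the parameter values are hypothetical toy numbers chosen only so the result is easy to check by hand.

```python
import math


def affine(x, W, b):
    """Return W x + b for a vector x, a matrix W (list of rows), and a bias vector b."""
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def relu(h):
    return [max(0.0, v) for v in h]


def softmax(a):
    m = max(a)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(prob, t):
    """Negative log-likelihood of the true class index t."""
    return -math.log(prob[t])


def mlp_loss(x, t, W1, b1, W2, b2):
    # Same node names as in the figure: h1, a1, h2, a2, prob, loss.
    h1 = affine(x, W1, b1)
    a1 = relu(h1)
    h2 = affine(a1, W2, b2)
    a2 = relu(h2)
    prob = softmax(a2)
    return prob, cross_entropy(prob, t)


# Hypothetical toy parameters: 2 inputs, 3 hidden units, 2 classes.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b1 = [0.0, 0.1, -0.2]
W2 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
b2 = [0.0, 0.0]
prob, loss = mlp_loss([1.0, 2.0], 1, W1, b1, W2, b2)
```

With these numbers a2 is approximately [0, 4.4], so the loss for the true class t = 1 is log(1 + exp(-4.4)), a small positive value.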

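How to compute backprop

Backprop through graphs
The framework builds graphs only for forward prop, and does backprop by backtracking through those graphs.
E.g. Torch.nn, Caffe

Backprop as extended graphs
The framework builds graphs for backprop as well as for forward prop.
E.g. Theano, MXNet, TensorFlow, Chainer, PyTorch

[Figure: two copies of the graph for y = a * b, z = y - c. In the first, backprop retraces the forward graph to obtain ∇z z = 1, ∇y z, and ∇a z. In the second, the graph is extended with gradient nodes gz, gy, gc, ga, and gb, computed by the operator nodes id, neg, mul, and mul.]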
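As an illustration of the first approach, backprop through graphs, here is a minimal sketch for y = a * b, z = y - c. The classes Mul and Sub are hypothetical stand-ins for a framework's operators; the point is that the backward pass returns plain numbers and leaves no graph behind.

```python
class Mul:
    def forward(self, a, b):
        self.a, self.b = a, b            # keep inputs for the backward pass
        return a * b

    def backward(self, gy):
        return gy * self.b, gy * self.a  # (dz/da, dz/db) given gy = dz/dy


class Sub:
    def forward(self, y, c):
        return y - c

    def backward(self, gz):
        return gz, -gz                   # (dz/dy, dz/dc) given gz = dz/dz


a, b, c = 2.0, 3.0, 4.0
mul, sub = Mul(), Sub()

# Forward prop: the only graph that gets built.
y = mul.forward(a, b)
z = sub.forward(y, c)

# Backprop: retrace the forward graph in reverse, starting from dz/dz = 1.
gz = 1.0
gy, gc = sub.backward(gz)
ga, gb = mul.backward(gy)
# ga, gb, gc are plain floats; nothing records how they were computed.
```

Because ga, gb, and gc carry no history, they cannot themselves be differentiated, which is the flexibility limit of this approach.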

How to compute backprop

Backprop through graphs
• Easy and simple to implement: backprop computation need not be defined as graphs.
• Low flexibility: features available for graphs may not apply to backprop computations.

Backprop as extended graphs
• Implementation gets complicated.
• High flexibility: any feature available for graphs can also be applied to backprop computations (e.g. backprop of backprop).

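Double backprop

[Figure: inputs x and y feed F, which outputs z; z feeds further computation (・・・) ending in L.]

class F(FunctionNode):
    def forward(self, x, y):
        # Operates on NumPy/CuPy arrays.
        return x * x + y

    def backward(self, x, y, gz):
        # Operates on chainer.Variable, so this computation itself creates a CG.
        # Returns (gx, gy) for z = x * x + y.
        return 2 * gz * x, gz

Note: The interface is simplified from the actual implementation.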

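Double backprop

[Figure: backprop from L. Starting from ∂L/∂L = 1.0, the gradient reaches F as gz = ∂L/∂z. Grad F takes x, y, and gz and outputs gx = ∂L/∂x and gy = ∂L/∂y. Inside Grad F, gx is computed from x and gz with Mul and *2 (gx = 2 * x * gz), and gy = gz.]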

Double backprop

[Figure: backprop from z with the output gradient set to 1.0. Grad F outputs gx = ∂z/∂x and gy = ∂z/∂y.]

Double backprop

[Figure: backprop through Grad F itself. Starting from 1.0 at gx, Double Grad F outputs ggx = ∂²z/∂x².]
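The sequence above can be reproduced with a toy "backprop as extended graphs" implementation. This is a minimal sketch, not how Chainer, PyTorch, or TensorFlow are implemented; the names Var and grad are hypothetical. The key assumption is that every backward rule is written with Var operations, so the gradient is itself a graph that can be backpropagated again.

```python
class Var:
    """A data node. `parents` pairs each input Var with a function that maps
    the output gradient (a Var) to that input's gradient (also a Var)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value,
                   ((self, lambda g: g), (other, lambda g: g)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, lambda g: g * other), (other, lambda g: g * self)))


def grad(output, wrt):
    """Return d(output)/d(wrt) as a Var, so the result can be differentiated again."""
    order, seen = [], set()

    def visit(v):                        # post-order DFS: inputs come first
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)

    visit(output)
    grads = {id(output): Var(1.0)}
    for v in reversed(order):            # outputs first, inputs last
        g = grads.get(id(v))
        if g is None:
            continue
        for parent, backward_fn in v.parents:
            contribution = backward_fn(g)    # built from Var ops: extends the graph
            if id(parent) in grads:
                grads[id(parent)] = grads[id(parent)] + contribution
            else:
                grads[id(parent)] = contribution
    return grads.get(id(wrt), Var(0.0))


x, y = Var(3.0), Var(5.0)
z = x * x + y            # F from the slides: z = x^2 + y
gx = grad(z, x)          # dz/dx = 2x
gy = grad(z, y)          # dz/dy = 1
ggx = grad(gx, x)        # d2z/dx2 = 2, by backprop through the gradient graph
```

Here gx.value is 6.0 at x = 3.0 and ggx.value is 2.0, matching ∂z/∂x = 2x and ∂²z/∂x² = 2 for z = x² + y.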

Double backprop

Computes the differentiation of L = G(f(x), ∇f(x)) with respect to x.

[Figure, built up over four slides: x feeds f, which outputs z; Grad f outputs gx = ∇f(x); z and gx feed further computation (・・・) ending in L. Backprop from ∂L/∂L = 1.0 passes through Grad f (receiving gz) and Double Grad f (receiving ggx) to yield ∂L/∂x.]
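A short worked derivation may make the role of the second derivative explicit. It assumes a scalar x, so that ∇f(x) reduces to f'(x), and the particular choices of f and G below are hypothetical examples rather than ones taken from the talk.

```latex
\begin{align*}
z &= f(x), \qquad g_x = f'(x), \qquad L = G(z, g_x) \\
\frac{dL}{dx} &= \frac{\partial G}{\partial z}\, f'(x) + \frac{\partial G}{\partial g_x}\, f''(x) \\
\text{Example: } f(x) &= x^2, \quad G(z, g_x) = z + g_x^2 \\
L &= x^2 + (2x)^2 = 5x^2 \\
\frac{dL}{dx} &= 1 \cdot 2x + 2(2x) \cdot 2 = 10x
\end{align*}
```

The first term is ordinary backprop through f. The second term contains f''(x), which is only available if the gradient computation Grad f is itself part of the graph, and that is what double backprop provides.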

Example (Chainer)

http://bit.ly/2wpEzO5

Example (PyTorch)

Example (TensorFlow)

Conclusion

• Several DL frameworks share a similar structure.
• Differences in design choices determine the capabilities of each framework.
• We introduced double backprop and showed toy examples in several frameworks.