
Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.

Rémi Emonet

Saintélyon Deep Learning Workshop − 2015-11-26

Disclaimer

Introductory Poll

Did you ever use?
  Caffe
  Theano
  Lasagne
  Torch
  TensorFlow
  Other?

Any experience to share?


Overview

Deep Learning?

Abstraction in Frameworks

A Tour of Existing Frameworks

More Discussions?

Finding Parameters of a Function (supervised)

Notations
  Input $i$
  Output $o$
  Function $f$ given
  Parameters $\theta$ to be learned

We suppose: $o = f_\theta(i)$

How to optimize it: how to find the best $\theta$?
  need some regularity assumptions
  usually, at least differentiability

Remark: a more generic view
  $o = f_\theta(i) = f(\theta, i)$


Gradient Descent

We want to find the best parameters
  we suppose: $o = f_\theta(i)$
  we have examples of inputs $i_n$ and target outputs $t_n$
  we want to minimize the sum of errors $L(\theta) = \sum_n L(f_\theta(i_n), t_n)$
  we suppose $f$ and $L$ are differentiable

Gradient descent (gradient = vector of partial derivatives)
  start with a random $\theta_0$
  compute the gradient and update $\theta_{t+1} = \theta_t - \gamma \nabla_\theta L(\theta_t)$

Variations
  stochastic gradient descent (SGD)
  conjugate gradient descent
  BFGS
  L-BFGS
  ...
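The update rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-parameter model f(θ, i) = θ·i, the squared error, the toy data, the step size γ = 0.05 and the fixed starting point are all hypothetical choices, none of them from the slides.

```python
# Plain gradient descent on L(theta) = sum_n (theta * i_n - t_n)^2.
# Model, data, step size and starting point are illustrative assumptions.

def loss(theta, inputs, targets):
    return sum((theta * i - t) ** 2 for i, t in zip(inputs, targets))

def grad(theta, inputs, targets):
    # dL/dtheta = sum_n 2 * (theta * i_n - t_n) * i_n
    return sum(2.0 * (theta * i - t) * i for i, t in zip(inputs, targets))

def gradient_descent(inputs, targets, theta0=0.0, gamma=0.05, steps=100):
    # The slides start from a random theta; a fixed start keeps the run reproducible.
    theta = theta0
    for _ in range(steps):
        theta = theta - gamma * grad(theta, inputs, targets)
    return theta

inputs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]   # generated with theta = 2
theta_hat = gradient_descent(inputs, targets)
print(theta_hat, loss(theta_hat, inputs, targets))
```

With this data the step size is small enough for the iteration to contract toward θ = 2; a larger γ could make it diverge, which is one reason the variations listed above exist.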


Finding Parameters of a “Deep” Function

Idea
  $f$ is a composition of functions
  2 layers: $o = f_\theta(i) = f^2_{\theta_2}(f^1_{\theta_1}(i))$
  3 layers: $o = f_\theta(i) = f^3_{\theta_3}(f^2_{\theta_2}(f^1_{\theta_1}(i)))$
  K layers: $o = f_\theta(i) = f^K_{\theta_K}(\dots f^3_{\theta_3}(f^2_{\theta_2}(f^1_{\theta_1}(i)))\dots)$
  with all $f^l$ differentiable

How can we optimize it?

The chain rule!

Many versions (with $F = f \circ g$)
  $(f \circ g)' = (f' \circ g) \cdot g'$
  $F'(x) = f'(g(x)) \, g'(x)$
  $\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$
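As a quick sanity check of the chain rule, the sketch below compares F'(x) = f'(g(x)) g'(x) with a finite-difference estimate. The choices f = sin and g(x) = x², the evaluation point and the step `eps` are arbitrary assumptions made for the example.

```python
import math

def g(x):
    return x * x          # inner function (assumed for the example)

def dg(x):
    return 2.0 * x        # g'(x)

def f(u):
    return math.sin(u)    # outer function (assumed for the example)

def df(u):
    return math.cos(u)    # f'(u)

def F(x):
    return f(g(x))        # F = f o g

def dF(x):
    # chain rule: F'(x) = f'(g(x)) * g'(x)
    return df(g(x)) * dg(x)

def numeric_derivative(fn, x, eps=1e-6):
    # central finite difference, used only as a sanity check
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

x = 0.7
print(dF(x), numeric_derivative(F, x))
```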


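Finding Parameters of a “Deep” Function

Reminders:
  K layers: $o = f_\theta(i) = f^K_{\theta_K}(\dots f^3_{\theta_3}(f^2_{\theta_2}(f^1_{\theta_1}(i)))\dots)$
  minimize the sum of errors $L(\theta) = \sum_n L(f_\theta(i_n), t_n)$
  chain rule $\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$

Goal: compute $\nabla_\theta L$ for gradient descent
  $\nabla_{\theta_K} L = \frac{dL}{d\theta_K} = \frac{dL}{df^K} \cdot \frac{df^K}{d\theta_K}$
  $\nabla_{\theta_{K-1}} L = \frac{dL}{d\theta_{K-1}} = \frac{dL}{df^K} \cdot \frac{df^K}{df^{K-1}} \cdot \frac{df^{K-1}}{d\theta_{K-1}}$
  $\nabla_{\theta_1} L = \frac{dL}{d\theta_1} = \frac{dL}{df^K} \cdot \frac{df^K}{df^{K-1}} \cdots \frac{df^2}{df^1} \cdot \frac{df^1}{d\theta_1}$

  $\frac{dL}{df^K}$: gradient of the loss with respect to its input ✔
  $\frac{df^k}{df^{k-1}}$: gradient of a function with respect to its input ✔
  $\frac{df^k}{d\theta_k}$: gradient of a function with respect to its parameters ✔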
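The three ingredients above are enough to compute every parameter gradient. Here is a minimal sketch for scalar layers, assuming each layer is f(θ, x) = tanh(θ·x) and the loss is a squared error on one example; the layer form, the parameter values and the helper names are hypothetical and only serve the illustration. The result is compared with finite differences.

```python
import math

def layer(theta, x):
    return math.tanh(theta * x)

def d_layer_d_input(theta, x):
    # df^k / df^(k-1): gradient of a layer with respect to its input
    return (1.0 - math.tanh(theta * x) ** 2) * theta

def d_layer_d_theta(theta, x):
    # df^k / dtheta_k: gradient of a layer with respect to its parameter
    return (1.0 - math.tanh(theta * x) ** 2) * x

def loss_of(thetas, i, t):
    x = i
    for theta in thetas:
        x = layer(theta, x)
    return (x - t) ** 2

def gradients(thetas, i, t):
    # forward pass, keeping the input of every layer
    xs = [i]
    for theta in thetas:
        xs.append(layer(theta, xs[-1]))
    # dL/df^K for the squared error (o - t)^2
    upstream = 2.0 * (xs[-1] - t)
    grads = [0.0] * len(thetas)
    # walk the layers from K down to 1, multiplying chain-rule factors
    for k in reversed(range(len(thetas))):
        grads[k] = upstream * d_layer_d_theta(thetas[k], xs[k])
        upstream = upstream * d_layer_d_input(thetas[k], xs[k])
    return grads

thetas = [0.5, -1.2, 0.8]
i, t = 0.9, 0.3
analytic = gradients(thetas, i, t)

eps = 1e-6
numeric = []
for k in range(len(thetas)):
    plus = list(thetas)
    minus = list(thetas)
    plus[k] += eps
    minus[k] -= eps
    numeric.append((loss_of(plus, i, t) - loss_of(minus, i, t)) / (2.0 * eps))

print(analytic)
print(numeric)
```

The running product `upstream` is what lets the shared prefix of each chain-rule product be computed once instead of once per parameter.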


Deep Learning and Composite Functions

Deep Learning?
  NN can be deep, CNN can be deep
  “any” composition of differentiable functions can be optimized with gradient descent
  some other models are also deep... (hierarchical models, etc.)

Evaluating a composition $f_\theta(i) = f^K_{\theta_K}(\dots f^3_{\theta_3}(f^2_{\theta_2}(f^1_{\theta_1}(i)))\dots)$
  “forward pass”
  evaluate each function successively

Computing the gradient $\nabla_\theta L$ (for gradient descent)
  compute the gradient of the loss with respect to its input $o$ (from the output error)
  for each $f^k$, going backward from $f^K$ to $f^1$
    compute the parameter gradient (from the output gradient)
    compute the input gradient (from the output gradient)
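The forward and backward passes described above can be sketched with small layer objects. The classes `Scale` and `Tanh`, the squared-error loss, the training example and the step size are hypothetical choices for this illustration; they do not reproduce the API of any framework discussed here.

```python
import math

class Scale:
    """f(theta, x) = theta * x, with one learnable scalar theta."""
    def __init__(self, theta):
        self.theta = theta
        self.grad_theta = 0.0

    def forward(self, x):
        self.x = x                                # remember the input for backward
        return self.theta * x

    def backward(self, grad_output):
        self.grad_theta = grad_output * self.x    # parameter gradient
        return grad_output * self.theta           # input gradient

class Tanh:
    """Element-wise activation, no parameters."""
    def forward(self, x):
        self.y = math.tanh(x)
        return self.y

    def backward(self, grad_output):
        return grad_output * (1.0 - self.y ** 2)  # input gradient only

def forward(layers, x):
    for layer in layers:                          # evaluate each function in turn
        x = layer.forward(x)
    return x

def backward(layers, grad_output):
    for layer in reversed(layers):                # last layer first
        grad_output = layer.backward(grad_output)
    return grad_output

layers = [Scale(0.5), Tanh(), Scale(1.5)]
i, t = 1.0, 0.6          # one training example (assumed)
gamma = 0.1              # step size (assumed)
for step in range(200):
    o = forward(layers, i)
    backward(layers, 2.0 * (o - t))               # gradient of (o - t)^2 with respect to o
    for layer in layers:
        if isinstance(layer, Scale):
            layer.theta -= gamma * layer.grad_theta

final_loss = (forward(layers, i) - t) ** 2
print(final_loss)
```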


Back to “seeing parameters as inputs”

Parameters ($\theta_k$)
  Just another input of $f^k$
  Can be rewritten, e.g. as $f^k(\theta_k, x)$

More generic
  inputs can be constant
  inputs can be parameters
  inputs can be produced by another function (e.g. $f(g(x), h(x))$)
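Once parameters are just inputs, and inputs may come from several functions, the composition becomes a graph rather than a chain. Below is a minimal sketch of reverse-mode differentiation on such a graph, assuming scalar values and only two operations; the `Node` class and the helpers `add`, `mul` and `backward` are hypothetical names introduced for the example.

```python
class Node:
    """A scalar value in a computation graph (constant, parameter, or result)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs (parent_node, local_derivative)
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def backward(output):
    # Depth-first search gives an order where parents come before children.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Process children before parents, accumulating gradients over all paths.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

x = Node(3.0)            # a data input
theta = Node(2.0)        # a parameter, treated like any other input
g = mul(theta, x)        # g(x) = theta * x
h = add(x, theta)        # h(x) = x + theta
out = mul(g, h)          # f(g(x), h(x)) = g(x) * h(x)
backward(out)
print(out.value, x.grad, theta.grad)
```

Because `x` and `theta` each feed two functions, their gradients are sums over both paths, which is why `backward` accumulates with `+=`.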


Function/Operator/Layer

The functions that we can use for $f^k$

Many choices
  fully connected layers
  convolution layers
  activation functions (element-wise)
  soft-max
  pooling
  ...

Loss functions: same, with no parameters

In the wild
  Torch module
  Theano operator
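To make a few of these building blocks concrete, here is a small sketch of an element-wise activation, a soft-max and a loss function with no parameters. The choice of ReLU and of the cross-entropy loss, as well as the input scores, are assumptions made for the example.

```python
import math

def relu(xs):
    # element-wise activation, no parameters
    return [max(0.0, x) for x in xs]

def softmax(xs):
    m = max(xs)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    # a loss is a function too, it simply has no parameters to learn
    return -math.log(probs[target_index])

scores = [2.0, -1.0, 0.5]
probs = softmax(relu(scores))
print(probs, cross_entropy(probs, 0))
```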


Data/Blob/Tensor

The data: input, intermediate result, parameters, gradient, ...

Usually a tensor (an n-dimensional array)

In the wild
  Torch tensor
  Theano tensor, scalars, numpy arrays


Contenders

Caffe

Torch

Theano

Lasagne

TensorFlow

Deeplearning4j

...


Overview

Basics
  install CUDA/cuBLAS/OpenBLAS
  blob/tensors, blocks/layers/loss, parameters
  cuDNN
  open source

Control flow
  define a composite function (graph)
  choice of an optimizer
  forward, backward

Extend
  write a new operator/module
  "forward"
  "backward": gradParam, gradInput
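Writing a new module mostly means providing a "forward" and a "backward" that returns gradInput and stores gradParam. The sketch below does this for a small linear layer and checks both gradients numerically. The class `Linear`, its method names and the helper `numeric_gradient` are hypothetical and are not the interface of any of the frameworks listed; the loss L = 0.5·‖y‖² is assumed only to have a simple output gradient.

```python
class Linear:
    """y = W x, with W a learnable matrix stored as a list of rows."""
    def __init__(self, weights):
        self.weights = [list(row) for row in weights]

    def forward(self, x):
        self.x = list(x)                   # remember the input for backward
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def backward(self, grad_output):
        # gradParam: dL/dW[r][c] = grad_output[r] * x[c]
        self.grad_param = [[g * xi for xi in self.x] for g in grad_output]
        # gradInput: dL/dx[c] = sum_r grad_output[r] * W[r][c]
        return [sum(g * row[c] for g, row in zip(grad_output, self.weights))
                for c in range(len(self.x))]

def half_squared_norm(y):
    return 0.5 * sum(v * v for v in y)

def numeric_gradient(fn, values, eps=1e-6):
    """Central finite differences of fn with respect to a flat list of floats."""
    grads = []
    for k in range(len(values)):
        plus = list(values)
        minus = list(values)
        plus[k] += eps
        minus[k] -= eps
        grads.append((fn(plus) - fn(minus)) / (2.0 * eps))
    return grads

weights = [[1.0, -2.0], [0.5, 3.0]]
x = [0.3, -0.7]

layer = Linear(weights)
y = layer.forward(x)
grad_input = layer.backward(y)             # dL/dy = y for L = 0.5 * ||y||^2

# Check gradInput: perturb x, keep W fixed.
numeric_grad_input = numeric_gradient(
    lambda xv: half_squared_norm(Linear(weights).forward(xv)), x)

# Check gradParam: perturb the flattened W, keep x fixed.
flat_w = [w for row in weights for w in row]
numeric_grad_param = numeric_gradient(
    lambda wv: half_squared_norm(Linear([wv[0:2], wv[2:4]]).forward(x)), flat_w)
flat_grad_param = [g for row in layer.grad_param for g in row]

print(grad_input, numeric_grad_input)
print(flat_grad_param, numeric_grad_param)
```

A numerical check of this kind is a cheap way to catch sign or indexing mistakes in a hand-written backward before plugging the module into a larger model.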


Caffe

"made with expression, speed, and modularity in mind"

"developed by the Berkeley Vision and Learning Center (BVLC)"

"released under the BSD 2-Clause license"

C++

layer-oriented: http://caffe.berkeleyvision.org/tutorial/layers.html

plaintext protocol buffer schema (prototxt) to describe models (and so to save them too)

GitHub watchers / stars / forks: 1,068 / 7,184 / 4,077


Torch7

By
  Ronan Collobert (Idiap, now Facebook)
  Clement Farabet (NYU, then Madbits, now Twitter)
  Koray Kavukcuoglu (Google DeepMind)

Lua (+ C)
  need to learn
  easy to embed

Layer-oriented
  easy to use
  difficult to extend, sometimes (merging sources)

GitHub watchers / stars / forks: 418 / 3,267 / 757


Theano

"is a Python library"

"allows you to define, optimize, and evaluate mathematical expressions"

"involving multi-dimensional arrays"

"efficient symbolic differentiation"

"transparent use of a GPU"

"dynamic C code generation"

Use symbolic expressions: reasoning on the graph
  write numpy-like code
  no forced “layered” architecture
  computation graph
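To illustrate what "reasoning on the graph" can mean, here is a toy symbolic differentiator in plain Python. It is not Theano's API: the tuple-based expression format and the functions `diff` and `evaluate` are hypothetical, and only `+` and `*` are supported. The point is that the derivative is built as a new expression before any number is plugged in.

```python
def diff(expr, var):
    """Symbolic derivative of a tiny expression language.

    expr is a number, a variable name (str), or a tuple (op, left, right)
    with op in {'+', '*'}.
    """
    if isinstance(expr, (int, float)):
        return 0
    if isinstance(expr, str):
        return 1 if expr == var else 0
    op, a, b = expr
    if op == '+':
        return ('+', diff(a, var), diff(b, var))
    if op == '*':
        # product rule
        return ('+', ('*', diff(a, var), b), ('*', a, diff(b, var)))
    raise ValueError("unsupported operator: %r" % (op,))

def evaluate(expr, env):
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, a, b = expr
    left, right = evaluate(a, env), evaluate(b, env)
    return left + right if op == '+' else left * right

expr = ('+', ('*', 'x', 'x'), ('*', 3, 'x'))   # x*x + 3*x
d_expr = diff(expr, 'x')                       # a new graph, still symbolic
print(evaluate(expr, {'x': 2.0}), evaluate(d_expr, {'x': 2.0}))
```

A real library would also simplify the resulting graph (for instance removing multiplications by 0 or 1) before generating code for it.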

GitHub watchers / stars / forks: 263 / 2,447 / 878


Lasagne (Keras, etc.)

Overlay on top of Theano

Provides a layer API close to Caffe/Torch, etc.

Layer-oriented

GitHub watchers / stars / forks: 133 / 1,401 / 342


TensorFlow

By Google, Nov. 2015

Selling points
  easy to move from a cluster to a mobile phone
  easy to distribute

Currently slow?

Not fully open yet?

GitHub watchers / stars / forks: 1,303 / 13,232 / 3,375


Deeplearning4j

“Deep Learning for Java, Scala & Clojure on Hadoop, Spark & GPUs”

Apache 2.0-licensed

Java

High level (layer-oriented)

Typed API

GitHub watchers / stars / forks: 236 / 1,648 / 548


Be creative!

anything differentiable can be tried!


How to choose a framework?


Any experience to share?

Thanks
