Transcript of "Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al." − Rémi Emonet − Saintélyon Deep Learning Workshop, 2015-11-26

Page 1

Abstractions and Frameworks for Deep Learning: a Discussion

Caffe, Torch, Theano, TensorFlow, et al.

Rémi Emonet

Saintélyon Deep Learning Workshop − 2015-11-26

Page 2

Disclaimer

Page 3

Introductory Poll

Did you ever use?
Caffe
Theano
Lasagne
Torch
TensorFlow
Other?

Any experience to share?


Page 4

Overview

Deep Learning?

Abstraction in Frameworks

A Tour of Existing Frameworks

More Discussions?

Page 5

Overview

Deep Learning?

Abstraction in Frameworks

A Tour of Existing Frameworks

More Discussions?


Page 6

Finding Parameters of a Function (supervised)

Notations
input i
output o
function f_θ, given
parameters θ, to be learned

We suppose: o = f_θ(i)

How to optimize it: how to find the best θ?
need some regularity assumptions
usually, at least differentiability

Remark, a more generic view: o = f_θ(i) = f(θ, i)

Page 7

Gradient Descent

We want to find the best parameters
we suppose: o = f_θ(i)
we have examples of inputs i_n and target outputs t_n
we want to minimize the sum of errors: L(θ) = ∑_n L(f_θ(i_n), t_n)
we suppose f and L are differentiable

Gradient descent (gradient = vector of partial derivatives)
start with a random θ_0
compute the gradient and update: θ_{t+1} = θ_t − γ ∇_θ L(θ_t)

Variations
stochastic gradient descent (SGD)
conjugate gradient descent
BFGS
L-BFGS
...
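To make the update rule concrete, here is a minimal numpy sketch (mine, not from the talk) of batch gradient descent on a linear model with a squared-error loss; the toy data, learning rate γ and number of steps are arbitrary choices:

```python
import numpy as np

# Toy supervised data: inputs i_n (with a bias feature) and targets t_n.
rng = np.random.RandomState(0)
inputs = np.hstack([rng.randn(100, 1), np.ones((100, 1))])   # shape (N, 2)
targets = 3.0 * inputs[:, 0] - 1.0 + 0.1 * rng.randn(100)    # noisy line

def f(theta, i):
    """Linear model f_theta(i) = theta . i (parameters seen as just another input)."""
    return i.dot(theta)

def loss_and_grad(theta):
    """Sum of squared errors L(theta) and its gradient with respect to theta."""
    residual = f(theta, inputs) - targets
    loss = np.sum(residual ** 2)
    grad = 2.0 * inputs.T.dot(residual)
    return loss, grad

theta = rng.randn(2)               # random theta_0
gamma = 0.001                      # learning rate
for step in range(500):
    loss, grad = loss_and_grad(theta)
    theta = theta - gamma * grad   # theta_{t+1} = theta_t - gamma * gradient
print(theta)                       # should approach [3, -1]
```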

Page 8

Finding Parameters of a “Deep” Function

Idea
f is a composition of functions
2 layers: o = f_θ(i) = f^2_{θ_2}(f^1_{θ_1}(i))
3 layers: o = f_θ(i) = f^3_{θ_3}(f^2_{θ_2}(f^1_{θ_1}(i)))
K layers: o = f_θ(i) = f^K_{θ_K}(... f^3_{θ_3}(f^2_{θ_2}(f^1_{θ_1}(i))) ...)
with all f^l differentiable

How can we optimize it?

The chain rule!

Many versions (with F = f ∘ g)
(f ∘ g)′ = (f′ ∘ g) ⋅ g′
F′(x) = f′(g(x)) ⋅ g′(x)
dF/dx = (df/dg) ⋅ (dg/dx)
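As a quick numerical sanity check of the chain rule (my addition, not from the slides), one can compare the analytic derivative f′(g(x))·g′(x) with a central finite difference:

```python
import math

# F(x) = f(g(x)) with f(u) = sin(u) and g(x) = x**2
def g(x):  return x ** 2
def f(u):  return math.sin(u)
def F(x):  return f(g(x))

# Chain rule: F'(x) = f'(g(x)) * g'(x) = cos(x**2) * 2x
def F_prime(x):
    return math.cos(g(x)) * 2 * x

x, eps = 1.3, 1e-6
numeric = (F(x + eps) - F(x - eps)) / (2 * eps)   # central finite difference
print(F_prime(x), numeric)                        # the two values should agree closely
```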

Page 9

Finding Parameters of a “Deep” Function

Reminders
K layers: o = f_θ(i) = f^K_{θ_K}(... f^3_{θ_3}(f^2_{θ_2}(f^1_{θ_1}(i))) ...)
minimize the sum of errors: L(θ) = ∑_n L(f_θ(i_n), t_n)
chain rule: dF/dx = (df/dg) ⋅ (dg/dx)

Goal: compute ∇_θ L for gradient descent

∇_{θ_K} L = dL/dθ_K = (dL/df^K) ⋅ (df^K/dθ_K)
∇_{θ_{K−1}} L = dL/dθ_{K−1} = (dL/df^K) ⋅ (df^K/df^{K−1}) ⋅ (df^{K−1}/dθ_{K−1})
∇_{θ_1} L = dL/dθ_1 = (dL/df^K) ⋅ (df^K/df^{K−1}) ⋯ (df^2/df^1) ⋅ (df^1/dθ_1)

dL/df^K: gradient of the loss with respect to its input ✔
df^k/df^{k−1}: gradient of a function with respect to its input ✔
df^k/dθ_k: gradient of a function with respect to its parameters ✔
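The same computation written compactly in LaTeX (my reconstruction, generalising the three displayed cases to an arbitrary layer k):

```latex
% K-layer composition: f_\theta = f^K_{\theta_K} \circ \dots \circ f^1_{\theta_1}
% Loss over the examples (i_n, t_n):
L(\theta) = \sum_n L\bigl(f_\theta(i_n),\, t_n\bigr)

% Gradient w.r.t. the parameters of layer k (repeated chain rule):
\nabla_{\theta_k} L
  = \frac{dL}{df^K}\cdot\frac{df^K}{df^{K-1}}\cdots\frac{df^{k+1}}{df^{k}}\cdot\frac{df^{k}}{d\theta_k}
```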

Page 10

Deep Learning and Composite Functions

Deep Learning?
NN can be deep, CNN can be deep
“any” composition of differentiable functions
can be optimized with gradient descent
some other models are also deep... (hierarchical models, etc.)

Evaluating a composition f_θ(i) = f^K_{θ_K}(... f^3_{θ_3}(f^2_{θ_2}(f^1_{θ_1}(i))) ...)
“forward pass”
evaluate successively each function

Computing the gradient ∇_θ L (for gradient descent)
compute the input (o) gradient of the loss (from the output error)
for each f^1, f^2, ...
compute the parameter gradient (from the output gradient)
compute the input gradient (from the output gradient)
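The following sketch (mine, with hypothetical function names) spells out this forward/backward pattern on a two-stage composition with a squared-error loss; each stage returns both the parameter gradient and the input gradient during the backward pass:

```python
import numpy as np

# A tiny 2-"layer" composition: o = f2(theta2, f1(theta1, i))
# with f1(theta1, x) = theta1 * x, f2(theta2, h) = tanh(theta2 * h),
# and a squared-error loss L(o, t) = (o - t)**2.

def f1_forward(theta1, x):  return theta1 * x
def f2_forward(theta2, h):  return np.tanh(theta2 * h)
def loss_forward(o, t):     return (o - t) ** 2

# Backward passes: given the gradient flowing from above, return
# (gradient w.r.t. parameters, gradient w.r.t. input).
def loss_backward(o, t):    return 2 * (o - t)                    # dL/do
def f2_backward(theta2, h, grad_o):
    local = 1 - np.tanh(theta2 * h) ** 2                          # tanh'
    return grad_o * local * h, grad_o * local * theta2            # dL/dtheta2, dL/dh
def f1_backward(theta1, x, grad_h):
    return grad_h * x, grad_h * theta1                            # dL/dtheta1, dL/dx

theta1, theta2, i, t = 0.5, -1.2, 2.0, 0.3

# Forward pass: evaluate each function successively.
h = f1_forward(theta1, i)
o = f2_forward(theta2, h)
print("loss:", loss_forward(o, t))

# Backward pass: propagate gradients from the loss back to the parameters.
grad_o = loss_backward(o, t)
grad_theta2, grad_h = f2_backward(theta2, h, grad_o)
grad_theta1, grad_i = f1_backward(theta1, i, grad_h)
print("grad theta1:", grad_theta1, "grad theta2:", grad_theta2)
```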

Page 11

Back to “seeing parameters as inputs”

Parameters (θ_k)
Just another input of f_k
Can be rewritten, e.g. as f_k(θ_k, x)

More generic
inputs can be constants
inputs can be parameters
inputs can be produced by another function (e.g. f(g(x), h(x)))
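A minimal illustration of this view (my own, with a hypothetical affine function): once a function is written as f_k(θ_k, x), the backward code treats the parameters like any other input and returns a gradient for each argument:

```python
import numpy as np

def affine(theta, x):
    """f(theta, x) = theta[0] * x + theta[1]: the parameters are just inputs."""
    return theta[0] * x + theta[1]

def affine_backward(theta, x, grad_out):
    """Gradients w.r.t. both arguments, computed in exactly the same way."""
    grad_theta = np.array([grad_out * x, grad_out])   # w.r.t. the "parameter" input
    grad_x = grad_out * theta[0]                      # w.r.t. the "data" input
    return grad_theta, grad_x

theta, x = np.array([2.0, -1.0]), 3.0
out = affine(theta, x)
grad_theta, grad_x = affine_backward(theta, x, grad_out=1.0)
print(out, grad_theta, grad_x)
```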

Page 12

Overview

Deep Learning?

Abstraction in Frameworks

A Tour of Existing Frameworks

More Discussions?


Page 13

Function/Operator/Layer

The functions that we can use for f_k

Many choices
fully connected layers
convolution layers
activation functions (element-wise)
soft-max
pooling
...

Loss functions: same, with no parameters

In the wild
Torch: module
Theano: operator

Page 14

Data/Blob/Tensor

The data: input, intermediate result, parameters, gradient, ...

Usually a tensor (n-dimensional matrices)

In the wild
Torch: tensor
Theano: tensors, scalars, numpy arrays
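For instance, in plain numpy (not tied to any particular framework), a mini-batch of RGB images is naturally a 4-dimensional tensor:

```python
import numpy as np

# A mini-batch of 32 RGB images of size 28x28, stored as a 4-D tensor
# with layout (batch, channels, height, width).
batch = np.zeros((32, 3, 28, 28), dtype=np.float32)
print(batch.ndim, batch.shape)   # 4 (32, 3, 28, 28)
```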

Page 15

Overview

Deep Learning?

Abstraction in Frameworks

A Tour of Existing Frameworks

More Discussions?


Page 16

ContendersCaffe

Torch

Theano

Lasagne

TensorFlow

Deeplearning4j

...

Page 17

Overview

Basics
install CUDA / cuBLAS / OpenBLAS
blobs/tensors, blocks/layers/losses, parameters
cuDNN
open source

Control flow
define a composite function (graph)
choice of an optimizer
forward, backward

Extend
write a new operator/module (see the sketch below)
"forward"
"backward": gradParam, gradInput
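A framework-agnostic sketch (my own, with hypothetical names) of what "write a new operator/module" usually amounts to: a forward method, and a backward method that produces gradParam and gradInput:

```python
import numpy as np

class Linear:
    """A hypothetical fully connected module following the forward/backward contract."""

    def __init__(self, n_in, n_out, rng=np.random):
        self.W = 0.01 * rng.randn(n_out, n_in)   # parameters
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                               # keep the input for the backward pass
        return self.W.dot(x) + self.b

    def backward(self, grad_output):
        # Gradients w.r.t. the parameters ("gradParam") ...
        self.grad_W = np.outer(grad_output, self.x)
        self.grad_b = grad_output
        # ... and w.r.t. the input ("gradInput"), passed to the layer below.
        return self.W.T.dot(grad_output)

layer = Linear(4, 3)
out = layer.forward(np.ones(4))
grad_in = layer.backward(np.ones(3))
print(out.shape, grad_in.shape)   # (3,) (4,)
```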

Page 18

Caffe"made with expression, speed, and modularity in mind"

"developed by the Berkeley Vision and Learning Center (BVLC)"

"released under the BSD 2-Clause license"

C++

layer-oriented: http://caffe.berkeleyvision.org/tutorial/layers.html

plaintext protocol buffer schema (prototxt) to describe models (and so save them too)

1,068 / 7,184 / 4,077
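For a feel of the workflow, a hedged sketch using Caffe's Python bindings (pycaffe); the file names are placeholders and details vary across Caffe versions:

```python
import numpy as np
import caffe   # pycaffe, Caffe's Python bindings

caffe.set_mode_cpu()   # or caffe.set_mode_gpu()

# The architecture lives in a prototxt file, the learned weights in a
# .caffemodel file (both file names below are placeholders).
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Fill the input blob and run a forward pass
# (assumes the model's input blob is called 'data').
net.blobs['data'].data[...] = np.random.randn(*net.blobs['data'].data.shape)
out = net.forward()
print(out.keys())   # names of the output blobs
```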

Page 19

Torch7

By
Ronan Collobert (Idiap, now Facebook)
Clement Farabet (NYU, then Madbits, now Twitter)
Koray Kavukcuoglu (Google DeepMind)

Lua (+ C)
need to learn it
easy to embed

Layer-oriented
easy to use
sometimes difficult to extend (merging sources)

418 / 3,267 / 757

Page 20

Theano"is a Python library"

"allows you to define, optimize, and evaluate mathematicalexpressions"

"involving multi-dimensional arrays"

"efficient symbolic differentiation"

"transparent use of a GPU"

"dynamic C code generation"

Use symbolic expressions: reasoning on the graph
write numpy-like code
no forced “layered” architecture
computation graph

263 / 2,447 / 878
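A small example of the symbolic style (standard Theano usage, written here for illustration): build an expression graph, ask for a symbolic gradient, then compile a function:

```python
import theano
import theano.tensor as T

# Symbolic variables: no values yet, just a computation graph.
x = T.dvector('x')
w = T.dvector('w')
loss = (T.dot(w, x) - 1.0) ** 2          # a tiny scalar expression

# Symbolic differentiation on the graph, then dynamic C code generation.
grad_w = T.grad(loss, w)
f = theano.function([w, x], [loss, grad_w])

print(f([1.0, 2.0], [3.0, 4.0]))         # numeric evaluation
```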

Page 21

Lasagne (Keras, etc.)

Overlay on top of Theano

Provides a layer API close to Caffe/Torch, etc.

Layer-oriented

133 / 1,401 / 342
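A sketch of the layer-oriented API on top of Theano (typical Lasagne idioms, written for illustration; the shapes and hyper-parameters are arbitrary):

```python
import lasagne
import theano
import theano.tensor as T

# A tiny network: input -> dense softmax layer, built from Lasagne layers.
l_in = lasagne.layers.InputLayer(shape=(None, 784))
l_out = lasagne.layers.DenseLayer(l_in, num_units=10,
                                  nonlinearity=lasagne.nonlinearities.softmax)

# Lasagne builds Theano expressions; training still goes through Theano.
x = T.matrix('x')
y = T.ivector('y')
probs = lasagne.layers.get_output(l_out, x)
loss = lasagne.objectives.categorical_crossentropy(probs, y).mean()
params = lasagne.layers.get_all_params(l_out, trainable=True)
updates = lasagne.updates.sgd(loss, params, learning_rate=0.01)
train_fn = theano.function([x, y], loss, updates=updates)
```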

Page 22

TensorFlow

By Google, Nov. 2015

Selling points
easy to move from a cluster to a mobile phone
easy to distribute

Currently slow?

Not fully open yet?

1,303 / 13,232 / 3,375
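A sketch of the graph-and-session style of early TensorFlow (written for illustration, roughly as the 1.x API looked; the model and numbers are arbitrary):

```python
import numpy as np
import tensorflow as tf   # graph-and-session API of early TensorFlow

# Build a computation graph: a linear model with a squared loss.
x = tf.placeholder(tf.float32, shape=[None, 3])
t = tf.placeholder(tf.float32, shape=[None, 1])
w = tf.Variable(tf.zeros([3, 1]))
o = tf.matmul(x, w)
loss = tf.reduce_mean(tf.square(o - t))

# The optimizer adds the gradient computation and the update to the graph.
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        xs = np.random.randn(32, 3).astype(np.float32)
        ts = xs[:, :1] * 2.0
        sess.run(train_step, feed_dict={x: xs, t: ts})
    print(sess.run(w))   # first weight should approach 2
```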

Page 23

Deeplearning4j

“Deep Learning for Java, Scala & Clojure on Hadoop, Spark & GPUs”

Apache 2.0-licensed

Java

High level (layer-oriented)

Typed API

236 / 1,648 / 548

Page 24

Overview

Deep Learning?

Abstraction in Frameworks

A Tour of Existing Frameworks

More Discussions?


Page 25

Be creative!

Anything differentiable can be tried!

Page 26

How to choose a framework?

Page 27

Any experience to share?

Page 28

Thanks

Abstractions and Frameworks for Deep Learning: a Discussion

Caffe, Torch, Theano, TensorFlow, et al.

Rémi Emonet

Saintélyon Deep Learning Workshop − 2015-11-26