Abstractions and Frameworks for Deep Learning: a ... · 23 / 28 − Rémi Emonet − Abstractions...
Transcript of Abstractions and Frameworks for Deep Learning: a ... · 23 / 28 − Rémi Emonet − Abstractions...
Abstractions and Frameworksfor Deep Learning:
a Discussion−
Caffe, Torch, Theano,TensorFlow, et al.
Rémi Emonet
Saintélyon Deep Learning Workshop − 2015-11-26
Disclaimer
Introductory Poll
Did you ever use?CaffeTheanoLasagneTorchTensorFlowOther?
Any experience to share?
3 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Overview
Deep Learning?
Abstraction in Frameworks
A Tour of Existing Framework
More Discussions?
Overview
Deep Learning?
Abstraction in Frameworks
A Tour of Existing Framework
More Discussions?
5 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
6 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Finding Parameters of a Function(supervised)
NotationsInput i
Output o
Function f given
Parameters θ to be learned
We suppose: o = f (i)
How to optimize it: how to find the best θ?need some regularity assumptionsusually, at least differentiability
Remark: a more generic viewo = f (i) = f(θ, i)
θ
θ
7 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Gradient DescentWe want to find the best parameters
we suppose: o = f (i)we have examples of inputs i and target output t
we want to minimize the sum of errors L(θ) = L(f (i ), t )
we suppose f and L are differentiable
Gradient descent (gradient = vector of partial derivatives)start with a random θ
compute the gradient and update θ = θ − γ∇ L(θ)
Variationsstochastic gradient descent (SGD)conjugate gradient descentBFGSL-BFGS...
θ
n n
n
∑ θn n
0
t+1 tθ
8 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Finding Parameters of a “Deep” FunctionIdea
f is a composition of functions
2 layers: o = f (i) = f (f (i))3 layers: o = f (i) = f (f (f (i)))K layers: o = f (i) = f (...f (f (f (i)))...)with all f differentiable
How can we optimize it?
The chain rule!
Many versions (with F = f ∘ g)
(f ∘ g) = (f ∘ g) ⋅ g
F (x) = f (g(x))g (x)
= ⋅
θ θ22
θ11
θ θ33
θ22
θ11
θ θKK
θ33
θ22
θ11
l
′ ′ ′
′ ′ ′
dx
df
dg
df
dx
dg
9 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Finding Parameters of a “Deep” FunctionReminders: K layers: o = f (i) = f (...f (f (f (i)))...)
minimize the sum of errors L(θ) = L(f (i ), t )
chain rule = ⋅
Goal: compute ∇ L for gradient descent
∇ L = =
∇ L = =
∇ L = = ⋯
: gradient of the loss with respect to its input ✔
: gradient of a function with respect to its input ✔
: gradient of a function with respect to its parameters ✔
θ θKK
θ33
θ22
θ11
n
∑ θn n
dx
df
dg
df
dx
dg
θ
θK
dθK
dL
df K
dL
dθK
df K
θK−1dθK−1
dL
df K
dL
df K−1
df K
dθK−1
df K−1
θ1dθ1
dL
df K
dL
df K−1
df K
df 1
df 2
dθ1
df 1
df K
dL
df k−1
df k
dθk
df k
10 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Deep Learning and Composite FunctionsDeep Learning?
NN can be deep, CNN can be deep“any” composition of differentiable functioncan be optimized with gradient descentsome other models are also deep... (hierarchical models, etc)
Evaluating a composition f (i) = f (...f (f (f (i)))...)“forward pass”evaluate successively each function
Computing the gradient ∇ L (for gradient descent)compute the input ($o) gradient (from the output error)for each f , f , ...compute the parameter gradient (from the output gradient)compute the input gradient (from the output gradient)
θ θKK
θ33
θ22
θ11
θ
1 2
11 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Back to “seeing parameters as inputs”Parameters (θ )
Just another input of f
Can be rewritten, e.g. as f (θ , x)
More genericinputs can be constantinputs can be parametersinputs can be produced by another function (e.g. f(g(x), h(x)))
k
k
k k
Overview
Deep Learning?
Abstraction in Frameworks
A Tour of Existing Framework
More Discussions?
12 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
13 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Function/Operator/LayerThe functions that we can use for f
Many choicesfully connected layersconvolutions layersactivation functions (element-wise)soft-maxpooling...
Loss Functions: same with no parameters
In the wildTorch moduleTheano operator
k
14 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Data/Blob/TensorThe data: input, intermediate result, parameters, gradient, ...
Usually a tensor (n-dimensional matrices)
In the wildTorch tensorTheano tensor, scalars, numpy arrays
Overview
Deep Learning?
Abstraction in Frameworks
A Tour of Existing Framework
More Discussions?
15 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
16 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
ContendersCaffe
Torch
Theano
Lasagne
Tensor Flow
Deeplearning4j
...
17 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
OverviewBasics
install CUDA/Cublas/OpenBlasblob/tensors, blocks/layers/loss, parameterscuDNNopen source
Control flowdefine a composite function (graph)choice of an optimizerforward, backward
Extendwrite a new operator/module"forward""backward": gradParam, gradInput
18 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Caffe"made with expression, speed, and modularity in mind"
"developed by the Berkeley Vision and Learning Center (BVLC)"
"released under the BSD 2-Clause license"
C++
layers-oriented http://caffe.berkeleyvision.org/tutorial/layers.html
plaintext protocol buffer schema (prototxt) to describe models(and so save them too)
1,068 / 7,184 / 4,077
19 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Torch7By
Ronan Collobert (Idiap, now Facebook)Clement Farabet (NYU, now Madbits now Twitter)Koray Kavukcuoglu (Google DeepMind)
Lua (+ C)need to learneasy to embed
Layer-orientedeasy to usedifficult to extend, sometimes (merging sources)
418 / 3,267 / 757
20 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Theano"is a Python library"
"allows you to define, optimize, and evaluate mathematicalexpressions"
"involving multi-dimensional arrays"
"efficient symbolic differentiation"
"transparent use of a GPU"
"dynamic C code generation"
Use symbolic expressions: reasoning on the graphwrite numpy-like codeno forced “layered” architecturecomputation graph
263 / 2,447 / 878
21 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Lasagne (Keras, etc)Overlay to Theano
Provide layer API close to caffe/torch etc
Layer-oriented
133 / 1,401 / 342
22 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Tensor FlowBy Google, Nov. 2015
Selling pointseasy to move from a cluster to a mobile phoneeasy to distribute
Currently slow?
Not fully open yet?
1,303 / 13,232 / 3,375
23 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Deeplearning4j“Deep Learning for Java, Scala & Clojure on Hadoop,Spark & GPUs“
Apache 2.0-licensed
Java
High level (layer-oriented)
Typed API
236 / 1,648 / 548
Overview
Deep Learning?
Abstraction in Frameworks
A Tour of Existing Framework
More Discussions?
24 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
25 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Be creative!anything differentiable
can be tried!
26 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
How to choose a framework?
27 / 28 − Rémi Emonet − Abstractions and Frameworks for Deep Learning: a Discussion − Caffe, Torch, Theano, TensorFlow, et al.
Any experience to share?
Thanks
Abstractions and Frameworksfor Deep Learning:
a Discussion−
Caffe, Torch, Theano, TensorFlow, et al.
Rémi Emonet
Saintélyon Deep Learning Workshop − 2015-11-26