Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What...
Transcript of Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What...
![Page 1: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/1.jpg)
27.09.18
Oliver Dürr
OverdoinglinearregressionwithTensorFlow
![Page 2: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/2.jpg)
Outline
Background• What is deep learning• What is linear regression• Why linear regression is of interest in this area• Solving linear regression the DL way• Gradient Descent
TensorFlow (as an example of a DL framework)• Computational Graph• Gradient Flow in a computational graph
![Page 3: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/3.jpg)
What is DL (in 4 slides)
4
![Page 4: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/4.jpg)
Why DL: Imagenet 2012, 2013, 2014, 2015
A. Krizhevskyfirst CNN in 2012Und es hat zoom gemacht
GoogLeNet 6.7%
Figure: https://medium.com/global-silicon-valley/machine-learning-yesterday-today-tomorrow-3d3023c7b519
1000 classes1 Mio samples
Only one non-CNN approach in 2013
2015: It gets tougher 4.95% Microsoft (Feb 6 surpassing human performance 5.1%)4.8% Google (Feb 11) -> further improved to 3.6 (Dec)?4.58% Baidu (May 11 banned due too many submissions )3.57% Microsoft (Resnet winner 2015)
…
Human: 5% misclassification
5 5
![Page 5: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/5.jpg)
Application Areas of DL
6
InputxtoDLmodel OutputyofDLmodel Application
Images Label“Tiger”
Imageclassification
Audio Sequence/Text“seeyoutomorrow”
VoiceRecognition
ASCII-Sequences“Hallo,wiegehts?”
Unicode-Sequences“������?”
Translation
ASCII-SequenceThismoviewasrathergood
Label(Sentiment)positive
Sentiment Analysis
StructuredDatacity=‘london’,device=‘mobile’
P(“userclicksonadd”) Clickprediction
Action DeepReinforcementLearninge.g.GO
Reward for last ActionState of the world
![Page 6: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/6.jpg)
Main Idea in DL
Learn hierarchy of features 7
![Page 7: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/7.jpg)
A DL model: Fully Connected aka MLP
• The input: e.g. intensity values of pixels of an image
• Information is processed layer by layer
• Output: probability that image belongs to certain person
• Arrows are weights (these need to be learned)
• For image and text there are specialized architectures (CNN, RNN) 8
![Page 8: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/8.jpg)
The learning process
• Three ingredients
– A model with weights, which needs to be learned
– Data with labels (reinforcement is a bit different)
– A loss function, describing who good the data is fit with the model
• Learning is tuning the weights to fit the training data
9
![Page 9: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/9.jpg)
Linear Regression
10
![Page 10: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/10.jpg)
An introductory remark
11
<<All the impressive achievements of deep learning amount to just curve fitting>>Judea Pearl, 2018
Let‘s look at the simplest curve fitting model: linear regression
![Page 11: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/11.jpg)
12
Linear Regression: See Backbord
• Model \hat y = a * x + b• Linear Regression as a mother of all
networks
• Training Data • (x and y pairs)• Plot
• Kriterium RSS
![Page 12: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/12.jpg)
Tuning the weights
13http://mathlets.org/mathlets/linear-regression/
![Page 13: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/13.jpg)
Gradient and Gradient Descent
14
Gradient Descent: 1-D function See Backbord
![Page 14: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/14.jpg)
Demo for influence of step size
15https://developers.google.com/machine-learning/crash-course/fitter/graph
![Page 15: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/15.jpg)
Optimization in 2-D
• 2 equivalent representations
Slide credit: Thilo, wikipedia 16
![Page 16: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/16.jpg)
Optimization in 2-D
• Gradient Descent
Slide credit: wikipedia
Gradient is perpendicular to levels Wi
t+1 =Wit − ε ∂Wi
loss
W1
W2
−∂W2loss
−∂W1loss
−∇loss
17
![Page 17: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/17.jpg)
Gradient Descent
Slide from cs229
Figure shows a 2 dimensional loss function. In DL Millions! We just know the current value (blind) 18
![Page 18: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/18.jpg)
Local vs. Global Minima
• If the loss is convex, gradient descent converges to local minima if step-size is small enough
• Linear regression is a convex problem
• Deep Learning is by far not a convex problem. Still works in practice (one of the miracles of DL)
19Image credit: Wikipedia, https://openreview.net/pdf?id=HkmaTz-0W
![Page 19: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/19.jpg)
Details left out
• Mini-batch stochastic gradient descent– Sometimes we cannot use all of the data points è just use a random
subset
• Overfitting problematic– When the models get to complicated (many weights) models can learn
the particularities of the training data
• Deep Learning is often used for classification problems– Here we had regression with RSS– Classification similar but different loss functions
20
![Page 20: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/20.jpg)
Introduction to TF
21
![Page 21: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/21.jpg)
Deep Learning frameworks
22
TensorFlow
Keraspytorch
![Page 22: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/22.jpg)
Two mayor library designs
We need gradients, of functions with 100 million+ weights.
Two design principles
• Static Computational Graph (build and run the graph in 2 steps)– Theano– TensorFlow
• Autograd– Pytorch– TensorFlow Eager
23
![Page 23: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/23.jpg)
What is TensorFlow
• It’s API about tensors, which flow in a computational graph
• What are tensors?
https://www.tensorflow.org/
24
![Page 24: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/24.jpg)
What is a tensor?
In this course we only need the simple and easy accessible definition of Ricci:
Sharpe, R. W. (1997). Differential Geometry: Cartan's Generalization of Klein's Erlangen Program. Berlin, New York: Springer-Verlag. p. 194. ISBN 978-0-387-94732-7.
25
![Page 25: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/25.jpg)
What is a tensor?
For TensorFlow: A tensor is an array with several indices (like in numpy). Order are number of indices and shape is the range.
26
![Page 26: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/26.jpg)
Computations in TensorFlow (and Theano)
27
![Page 27: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/27.jpg)
Computations in TensorFlow (and Theano)
28
![Page 28: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/28.jpg)
Summary
• The computation in TF is done via a computational graph
• The nodes are ops• The edges are the flowing tensors
29
![Page 29: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/29.jpg)
TensorFlow: Computation in 2 steps
• Computations are done in 2 steps
– First: Build the graph – Second: Execute the graph
• Both steps can be done in many languages (python, C++, R)– Best supported so far is python
• Graph can be trained and ported on different devices – TPU – GPU– Embedded System like mobile phones
• Graph can be optimized– XLA optimization
30
![Page 30: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/30.jpg)
Building the graph (python)
numpy
10 3 3( ) 22
⎛⎝⎜
⎞⎠⎟= 120
![Page 31: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/31.jpg)
32
Be the spider who knits a computational graph
TensorFlow: Building the graph
Translate the following TF code in a graph
m1=(3,3)
Finish the computation graph
wipes the graph
Quite much happen in here!
10 3 3( ) 22
⎛⎝⎜
⎞⎠⎟= 120
![Page 32: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/32.jpg)
Be the spider who knits the computational graph
33
TensorFlow: Building the graph
TensorFlow: Executing the graph
Translate the following TF code in a graph
m1=(3,3)
Finish the computation graph
m2 = 22
⎛⎝⎜
⎞⎠⎟
Matmul(m1,m2)
x=10 product=mul
![Page 33: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/33.jpg)
Building the graph (Numpy vs TensorFlow)
numpy
TensorFlow: Building the graph
10 3 3( ) 22
⎛⎝⎜
⎞⎠⎟= 120
TensorFlow: Executing the graph
![Page 34: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/34.jpg)
Session vs Graph
• A graph is the abstract definition of the calculation
• A session is a concrete realization– It places the ops on physical devices such as GPUs– It initializes variables– We can feed and fetch a session (see next slides)
35
sess = tf.Session()… #do stuffsess.close() #Free the resources (TF eats all mem on GPU!)
with tf.Session()as sess:… #do stuff
#Free the resources when leaving the scope of with
Alternatively use the with construct
![Page 35: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/35.jpg)
Gradient Descent in TensorFlow
• In Theano and TensorFlow the Framework does the calculation of the gradient for you (autodiff)
• You just have to provide a graph
36
# loss has to be defined symbolicallytrain_op = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)
…for e in range(epochs): #Fitting the data for some epochs
_, res = sess.run([train_op, loss], feed_dict={x:x_data, y:y_data})
![Page 36: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/36.jpg)
Look at the source luke
Exercises at:
https://tensorchiefs.github.io/linear_regression/
37
![Page 37: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/37.jpg)
Excercises
38
![Page 38: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/38.jpg)
Backup
39
![Page 39: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/39.jpg)
Lecture 4 - 13 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 4 - 13 Jan 201627
f
activations
gradients
“local gradient”
Gradient flow in a computational graph: local junction
Illustration: http://cs231n.stanford.edu/slides/winter1516_lecture4.pdf
is modified by local gradient
= ∂ f∂y
![Page 40: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/40.jpg)
Example
èMultiplication do a switch
Lecture 4 - 13 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 4 - 13 Jan 201627
f
activations
gradients
“local gradient”
1
∂(α + β )∂α
= 1 ∂(α *β )∂α
= β
![Page 41: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/41.jpg)
Computations using feeding and fetching
res = sess.run(f, feed_dict={b:[2]})
Fetch f (symbolic)
fetch (the numeric value) symbolic values
42
![Page 42: Overdoing linear regression with TensorFlow...Outline Background • What is deep learning • What is linear regression • Why linear regression is of interest in this area • Solving](https://reader036.fdocuments.us/reader036/viewer/2022071210/6021d35f0b647930c251bfe4/html5/thumbnails/42.jpg)
Feed and Fetch
• Fetches can be a list of tensors
• Feed (from TF docu)– A feed temporarily replaces the output of an operation with a tensor value.
You supply feed data as an argument to a run() call. The feed is only used for the run call to which it is passed. The most common use case involves designating specific operations to be “feed” operations by using tf.placeholder() to create them.
x = tf.placeholder(tf.float32, shape=(1024, 1024))res1, res2 = sess.run([loss, loss2], feed_dict={x:data[:,0], y:data[:,1]})
fetches (symbolic)
two inputs (feeds) fetches (the numeric values)
A more general example
res = sess.run(f, feed_dict={b:data[:,0]})
43