Deep Learning
Lecture 1 – Introduction, linear models
Niklas Wahlström
Division of Systems and Control
Department of Information Technology
Uppsala University
[email protected]/katalog/nikwa778
1 / 35 [email protected] DL 2019 - Introduction, linear models
What is the course about?
Machine learning
“Machine learning is about learning, reasoning and acting based on data.”

“It is one of today’s most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science.”
Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521:452-459, 2015.
Jordan, M. I. and Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255-260, 2015.
Deep learning
“Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.”
LeCun, Y., Bengio, Y. and Hinton, G. Deep learning. Nature 521:436-444, 2015.
Example: Image classification
Input: pixels of an image
Output: object identity
Each hidden layer extracts increasingly abstract features.

Image removed in handout version due to copyright © Springer International Publishing

Zeiler, M. D. and Fergus, R. Visualizing and understanding convolutional networks. Computer Vision - ECCV, 2014.
ex) Cancer diagnosis
Systems for detecting cell divisions (mitosis) in histology images can be used to improve (or automate) cancer diagnosis.

Image removed in handout version due to copyright © Springer-Verlag Berlin Heidelberg 2013
• Learn a model with
input: RGB histology image (pixel values)
output: number and locations (in the image) of mitosis detections
• Training data: Histology images labeled by experts.
D. C. Ciresan, A. Giusti, L. M. Gambardella and J. Schmidhuber. Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks. In Medical Image Computing and Computer Assisted Intervention, 411-418, 2013.
ex) Colorization (I/II)
Image removed in handout version due to copyright © Springer
Task: Colorize gray-scale photos.
• Learn a model with
input: gray-scale pixel values.
output: color pixel values.
R. Zhang, P. Isola, and A. A. Efros. Colorful Image Colorization. ECCV, 2016.
ex) Colorization (II/II)
Image removed in handout version due to copyright © Springer
Model applied to legacy grayscale photos.
ex) AlphaGo Zero
• Input: State of the game (19 × 19 grid, each position either black, white or blank)
• Output: Probability for the current player to win the game
+ reinforcement learning
Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529:484-489, 2016.
• Input: Same
• Output: Probability for the current player to win the game and what move to make
Silver et al. Mastering the game of Go without human knowledge. Nature, 550:354-359, 2017.

Silver et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140-1144, 2018.
ex) Higgs Machine Learning Challenge
HiggsML: Crowd-sourcing initiative by CERN (hosted at Kaggle)
• Separate H → ττ from background noise.
• Learn a model with
- input: 30-dimensional vector of “features” recorded during the experiment.
- output: “signal” or “background”
• Deep learning methods were among the winning methods.
C. Adam-Bourdarios, G. Cowan, C. Germain, I. Guyon, B. Kégl and D. Rousseau. The Higgs boson machine learning challenge. NIPS 2014 Workshop on High-energy Physics and Machine Learning, 2014.
ex) Generative models
Generation of novel paintings

• Learn a model with
- input: paintings
- output: style, genre, artist, year, etc.
• Used a generative model to create artificial paintings
• More about generative models in lecture 9

K. Jones, D. Bonafilia and A. Danyluk. GANGogh: Creating Art with GANs. 2017. https://towardsdatascience.com/gangogh-creating-art-with-gans-8d087d8f74a1
Course information
Lecture outline
1. Introduction
2. Feedforward neural networks
3. Optimization
4. Convolutional neural networks I
5. Convolutional neural networks II
6. Over-/underfitting, bias-variance trade-off
7. Regularization in deep learning
8. Variational inference
9. Variational autoencoders
10. Summary and guest lecture
Hand-in assignments
4 hand-in assignments (HAs):
• Cover (mainly) implementation aspects of deep learning.
• Deadlines for all HAs available on course homepage.
• Everyone has 7 joker days, which can be used with any HA.
• You are encouraged to collaborate...
• ... but you must write and submit your own report and code.
Optional helpdesks are scheduled after each lecture.
Optional project
• After the course you are encouraged to carry out a project
• Preferably 1-4 students in each team
• The projects must include real data
• Awarded 3 hp extra
• Conducted during summer, see course homepage for the dates
Course literature
• Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning, MIT Press, 2016. www.deeplearningbook.org

We will not follow the book strictly though. Another great resource is

• Michael A. Nielsen. Neural Networks and Deep Learning, Determination Press, 2015. neuralnetworksanddeeplearning.com

• The first lectures are well covered by lecture notes linked from the course homepage.

All of them are available online.
Who are we?
Teachers involved in the course (in approximate order of appearance):

Lecturers:
• Niklas Wahlström, room 2319
• Thomas Schön, room 2209
• Joakim Lindblad, room 2141
• Jalil Taghia, room 2321

Teaching assistants:
• Fredrik Gustafsson, room 2303
• Nicolas Pielawski, room xxxx
• Anindya Gupta, room xxxx

All room numbers are at ITC Polacksbacken. You can reach us by email: <firstname.lastname>@it.uu.se.
Who are you?
Linear regression and classification
Supervised machine learning
Methods for automatically learning (training, estimating, . . . ) a model for the relationship between
• the input x, and
• the output y
from observed training data

T ≜ {(x1, y1), (x2, y2), . . . , (xn, yn)}.
Regression vs. classification
We will distinguish between two types of problems: regression and classification.
Regression is when the output y is quantitative, e.g.
• Climate models (y = “increase in global temperature”)
• Economic models (y = “change in GDP”)
Classification is when the output y is qualitative, e.g.
• Spam filters (y ∈ {spam, good email})
• Diagnosis systems (y ∈ {ALL, AML, CLL, CML, no leukemia})
• Fingerprint verification (y ∈ {match, no match})
ex) A regression problem
[Figure: stopping distance (feet, output) vs. speed (mph, input). Panels show the raw data, a linear model fitted to the data (xi, yi), and predictions (x⋆, y⋆) from the fitted model.]
Consider a linear model to explain the data
y = wx + b
What is a good model?
[Figure: stopping distance (feet) vs. speed (mph), showing four different linear models fitted to the training data T.]
Learning using maximum likelihood
Assume that each data point can be described by a linear model + noise

yi = wxi + b + εi,  εi ∼ N (0, σ²)

Maximum likelihood: Think of the εi (dotted) as Gaussian random variables, and choose the model (solid) such that the resulting εi are as likely as possible.
Gaussian (Normal) distribution
Probability density function (PDF) for the scalar Gaussian distribution:

N (z; µ, σ²) = (1/√(2πσ²)) e^(−(z−µ)²/(2σ²))

• µ is the mean (expected value of the distribution)
• σ is the standard deviation
• σ² is the variance

[Figure: Gaussian PDFs for µ = −1, σ² = 0.5; µ = 7, σ² = 0.2; and µ = 4, σ² = 4.]

z ∼ N (z; µ, σ²) means that z is a Gaussian random variable with mean µ and variance σ². ∼ reads “distributed according to”.
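The PDF above is easy to evaluate directly; a minimal sketch in Python (the function name is our own, not course code):

```python
import math

def gaussian_pdf(z, mu, sigma2):
    """PDF of the scalar Gaussian N(z; mu, sigma^2)."""
    return math.exp(-(z - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The peak sits at z = mu; its height 1/sqrt(2*pi*sigma^2) grows as the
# variance shrinks, matching the three curves in the figure.
print(gaussian_pdf(0.0, 0.0, 1.0))  # ≈ 0.3989, the standard normal at its mean
```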
Maximum likelihood
A linear model with Gaussian noise
yi = wxi + b + εi,  εi ∼ N (0, σ²),  i = 1, . . . , n
The model can also be expressed as
p(yi | xi, w, b) = N (yi; wxi + b, σ²)
Pick w and b which make the data as likely as possible:

w, b = argmax_{w,b} p(y1, . . . , yn | x1, . . . , xn, w, b)

Assume all εi to be independent:

p(y1, . . . , yn | x1, . . . , xn, w, b) = ∏_{i=1}^{n} p(yi | xi, w, b)
                                       = ∏_{i=1}^{n} N (yi; wxi + b, σ²)
                                       ∝ e^(−(1/(2σ²)) Σ_{i=1}^{n} (yi − wxi − b)²)
y and z independent ⇒ p(y, z) = p(y)p(z)
Linear regression
The least squares problem
w, b = argmin_{w,b} Σ_{i=1}^{n} (wxi + b − yi)²
[Figure: stopping distance (feet) vs. speed (mph), showing the linear regression model fitted to the data (xi, yi).]
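For the scalar model y = wx + b the least squares problem has a closed-form solution. A minimal sketch in Python, using made-up numbers in the spirit of the speed/stopping-distance example (not the actual dataset):

```python
import numpy as np

# Hypothetical data for illustration: speed (mph) and stopping distance (feet).
x = np.array([4.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0])
y = np.array([2.0, 18.0, 26.0, 48.0, 54.0, 80.0, 110.0])

# Closed-form least-squares solution of argmin_{w,b} sum (w*x_i + b - y_i)^2:
#   w = cov(x, y) / var(x),   b = mean(y) - w * mean(x)
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()

print(w, b)  # fitted slope and intercept
```

The same fit can be obtained with `np.polyfit(x, y, 1)`, which is a convenient cross-check.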
ex) A classification problem
Beaver Body Temperature Data
Input: body temperature x.
Output: beaver is outside (y = 1), or inside (y = 0), of the retreat.
[Figure: Pr(y = 1 | x) vs. temperature x. Panels show the data and a linear regression model fitted to the data (xi, yi).]
Linear regression is not suitable, since its output is not constrained to [0, 1].
Logistic regression
Consider a binary classification problem y ∈ {0, 1}.
pi = Pr(yi = 1|xi) and thus Pr(yi = 0|xi) = 1− pi
The odds is the ratio between the two class probabilities

Pr(yi = 1 | xi) / Pr(yi = 0 | xi) = pi / (1 − pi) ∈ (0, ∞)

and the log odds consequently

ln(pi / (1 − pi)) ∈ (−∞, ∞)

Logistic regression: Assume a linear model for the log odds

ln(pi / (1 − pi)) = wxi + b  ⇒  pi = e^(zi) / (1 + e^(zi)),  zi = wxi + b
Logistic function (aka sigmoid function)
The function f : R → [0, 1] defined as f(z) = e^z / (1 + e^z) is known as the logistic function.

[Figure: the logistic function e^z / (1 + e^z) plotted for z ∈ [−6, 6], rising from near 0 to near 1 through 0.5 at z = 0.]
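A naive implementation of f(z) = e^z / (1 + e^z) overflows for large positive z. A common remedy is to arrange the computation so the exponent passed to exp() is never positive; a sketch (the function name is our own):

```python
import numpy as np

def logistic(z):
    """Logistic function f(z) = e^z / (1 + e^z), evaluated without overflow.

    For z >= 0 we use the equivalent form 1 / (1 + e^(-z)); for z < 0 the
    original form e^z / (1 + e^z). Both only ever exponentiate -|z|.
    """
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))                      # always in (0, 1], never overflows
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(float(logistic(0.0)))  # 0.5, the midpoint of the curve
```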
Logistic regression using maximum likelihood
Pick w and b which make the data as likely as possible:

w, b = argmax_{w,b} ln Pr(y1, . . . , yn | x1, . . . , xn, w, b)

Assume all yi to be independent:

ln Pr(y1, . . . , yn | x1, . . . , xn, w, b) = Σ_{i=1}^{n} ln Pr(yi | xi, w, b)
  = Σ_{i: yi=1} ln Pr(yi = 1 | xi, w, b) + Σ_{i: yi=0} ln Pr(yi = 0 | xi, w, b)
  = Σ_{i: yi=1} ln pi + Σ_{i: yi=0} ln(1 − pi)

This leads to the following optimization problem:

w, b = argmin_{w,b} − Σ_{i=1}^{n} [yi ln(pi) + (1 − yi) ln(1 − pi)]
ln(·) strictly increasing ⇒ argmax_x f(x) = argmax_x ln f(x)
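The negative log-likelihood (cross-entropy) objective above fits in a few lines of Python; a sketch with toy, made-up data (the function and variable names are our own):

```python
import numpy as np

def nll(w, b, x, y):
    """Negative log-likelihood -sum_i [y_i ln p_i + (1 - y_i) ln(1 - p_i)]
    for scalar-input logistic regression with labels y_i in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # p_i = Pr(y_i = 1 | x_i)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy data: negative inputs labeled 0, positive inputs labeled 1.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

# A model oriented the "right way" (w > 0) assigns the data higher
# likelihood, hence a smaller negative log-likelihood.
print(nll(2.0, 0.0, x, y) < nll(-2.0, 0.0, x, y))  # True
```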
The Beaver data example
[Figure: Pr(y = 1 | x) vs. temperature x. Panels show the data and the fitted logistic regression model together with the data (xi, yi), with the resulting decision regions y = 0 and y = 1 along the temperature axis.]
Linear and logistic regression
Multidimensional input

Linear and logistic regression also work with multidimensional inputs
xi = [xi1, . . . , xip]T, i = 1, . . . , n
We assign one parameter for each input dimension
w = [w1, . . . , wp]T
Linear regression                          Logistic regression

[Figure: linear regression and logistic regression models fitted to data with two-dimensional input.]

yi = wᵀxi + b                              Pr(yi = 1 | xi) = e^(wᵀxi + b) / (1 + e^(wᵀxi + b))
Linear and logistic regression
          Linear regression        Logistic regression
Output    yi ∈ R                   yi ∈ {0, 1}
Model     ŷi = wᵀxi + b            pi = Pr(yi = 1 | xi) = e^(wᵀxi + b) / (1 + e^(wᵀxi + b))
Loss      Li = (ŷi − yi)²          Li = −yi ln(pi) − (1 − yi) ln(1 − pi)

We find w and b by minimizing the sum of the losses:

w, b = argmin_{w,b} (1/n) Σ_{i=1}^{n} Li
• For linear regression the problem can be solved analytically.
• For logistic regression we need to use numerical optimization (we will talk about it in the next lecture).
Hand-in assignment 1A
Classifying biopsies using logistic regression
Biopsy data {yi, xi}, i = 1, . . . , n, from n = 699 breast tumors
Input:
xi1 clump thickness
xi2 uniformity of cell size
xi3 uniformity of cell shape
...
xi9 mitoses

Output: yi benign/malignant
Tasks:
• Derive the gradients of the cost function w.r.t. the parameters
• Implement the logistic regression model
• Update the parameters with gradient descent
The code will be extended to a neural network in HA 1B.
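The three tasks above fit together as follows: with pi = σ(wᵀxi + b), the gradients of the (mean) cross-entropy cost come out as ∂J/∂w = (1/n) Xᵀ(p − y) and ∂J/∂b = mean(p − y). A hedged sketch of the resulting gradient-descent loop on synthetic stand-in data, not the actual biopsy set or the official solution (all names, the learning rate, and the iteration count are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the biopsy data (the real set has n = 699, p = 9).
n, p = 200, 9
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = (X @ w_true > 0).astype(float)           # linearly separable toy labels

# Gradient descent on the mean cross-entropy cost:
#   dJ/dw = X^T (prob - y) / n,   dJ/db = mean(prob - y)
w = np.zeros(p)
b = 0.0
lr = 0.5
for _ in range(500):
    prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * X.T @ (prob - y) / n
    b -= lr * np.mean(prob - y)

accuracy = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == (y == 1.0))
print(accuracy)  # should be close to 1 on this separable toy problem
```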
A few concepts to summarize lecture 1
Machine Learning: Deals with learning, reasoning and acting based on data.
Deep Learning: A set of machine learning methods that use models composed of multiple processing layers.
Regression: Learning problem where the output is quantitative.
Classification: Learning problem where the output is qualitative.
Maximum likelihood: Learning objective based on probability theory.
Linear Regression: Linear model for regression problems.
Logistic Regression: Linear model for classification problems.