Lecture 1 Introduction - Uppsala University


Transcript of Lecture 1 Introduction - Uppsala University

Page 1:

Deep Learning

Lecture 1 – Introduction, linear models

Niklas Wahlström
Division of Systems and Control
Department of Information Technology
Uppsala University

[email protected]/katalog/nikwa778

1 / 35 [email protected] DL 2019 - Introduction, linear models

Page 2:

What is the course about?

Page 3:

Machine learning

“Machine learning is about learning, reasoning and acting based on data.”

“It is one of today’s most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science.”

Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 521:452-459, 2015.

Jordan, M. I. and Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255-260, 2015.

Page 4:

Deep learning

“Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction.”

LeCun, Y., Bengio, Y. and Hinton, G. Deep learning. Nature 521:436-444, 2015.

Example: Image classification

Input: pixels of an image
Output: object identity
Each hidden layer extracts increasingly abstract features.

Zeiler, M. D. and Fergus, R. Visualizing and understanding convolutional networks. Computer Vision - ECCV, 2014.

Image removed in handout version due to copyright © Springer International Publishing

Page 5:

ex) Cancer diagnosis

Systems for detecting cell divisions (mitosis) in histology images can be used to improve (or automate) cancer diagnosis.

Image removed in handout version due to copyright © Springer-Verlag Berlin Heidelberg 2013

• Learn a model with

input: RGB histology image (pixel values)
output: number and locations (in the image) of mitosis detections

• Training data: Histology images labeled by experts.

D. C. Ciresan, A. Giusti, L. M. Gambardella and J. Schmidhuber. Mitosis Detection in Breast Cancer Histology Images with Deep Neural Networks. In Medical Image Computing and Computer Assisted Intervention, 411-418, 2013.

Page 6:

ex) Colorization (I/II)

Image removed in handout version due to copyright © Springer

Task: Colorize gray-scale photos.

• Learn a model with

input: gray-scale pixel values
output: color pixel values

R. Zhang, P. Isola, and A. A. Efros. Colorful Image Colorization. ECCV, 2016.

Page 7:

ex) Colorization (II/II)

Image removed in handout version due to copyright © Springer

Model applied to legacy grayscale photos.

Page 8:

ex) AlphaGo Zero

• Input: State of the game (19 × 19 grid, each cell either black, white or blank)

• Output: Probability for the current player to win the game

+ reinforcement learning

Silver et al. Mastering the game of Go with deep neural networks and tree search, Nature 529, 484–489, 2016.

• Input: Same

• Output: Probability for the current player to win the game and what move to make

Silver et al. Mastering the game of Go without human knowledge, Nature 550, 354–359, 2017.

Silver et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, 362(6419): 1140–1144, 2018.

Page 9:

ex) Higgs Machine Learning Challenge

HiggsML: Crowd-sourcing initiative by CERN (hosted at Kaggle)

• Separate H → ττ from background noise.

• Learn a model with

- input: 30-dimensional vector of “features” recorded during the experiment
- output: “signal” or “background”

• Deep learning methods were among the winning entries.

C. Adam-Bourdarios, G. Cowan, C. Germain, I. Guyon, B. Kegl and D. Rousseau. The Higgs boson machine learning challenge. NIPS 2014 Workshop on High-energy Physics and Machine Learning, 2014.

Page 10:

ex) Generative models

Generating novel paintings

• Learn a model with
  - input: paintings
  - output: style, genre, artist, year, etc.

• Used a generative model to create artificial paintings

• More about generative models in lecture 9

K. Jones, D. Bonafilia and A. Danyluk. GANGogh: Creating Art with GANs. 2017. https://towardsdatascience.com/gangogh-creating-art-with-gans-8d087d8f74a1

Page 11:

Course information

Page 12:

Lecture outline

1. Introduction
2. Feedforward neural networks
3. Optimization
4. Convolutional neural networks I
5. Convolutional neural networks II
6. Over-/underfitting, bias-variance trade-off
7. Regularization in deep learning
8. Variational inference
9. Variational autoencoders
10. Summary and guest lecture

Page 13:

Hand-in assignments

4 hand-in assignments (HAs):

• Cover (mainly) implementation aspects of deep learning.

• Deadlines for all HAs available on course homepage.

• Everyone has 7 joker days, which can be used with any HA.

• You are encouraged to collaborate...

• ... but you must submit your own report and code.

Optional helpdesks are scheduled after each lecture.

Page 14:

Optional project

• After the course you are encouraged to carry out a project

• Preferably 1-4 students in each team

• The projects must include real data

• Awarded 3 hp extra

• Conducted during summer, see course homepage for the dates

Page 15:

Course literature

• Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning, MIT Press, 2016. www.deeplearningbook.org

We will not follow the book strictly, though. Another great resource is

• Michael A. Nielsen. Neural Networks and Deep Learning, Determination Press, 2015. neuralnetworksanddeeplearning.com

• The first lectures are well covered by lecture notes linked from the course homepage.

All of these are available online.

Page 16:

Who are we?

Teachers involved in the course (in approximate order of appearance):

Lecturers:
• Niklas Wahlström, room 2319
• Thomas Schön, room 2209
• Joakim Lindblad, room 2141
• Jalil Taghia, room 2321

Teaching assistants:
• Fredrik Gustafsson, room 2303
• Nicolas Pielawski, room xxxx
• Anindya Gupta, room xxxx

All room numbers are at ITC Polacksbacken. You can reach us by email: <firstname.lastname>@it.uu.se.

Page 17:

Who are you?

Page 18:

Linear regression and classification

Page 19:

Supervised machine learning

Methods for automatically learning (training, estimating, . . . ) a model for the relationship between

• the input x, and
• the output y,

from observed training data

T := {(x1, y1), (x2, y2), . . . , (xn, yn)}.

Page 20:

Regression vs. classification

We will distinguish between two types of problems: regression and classification.

Regression is when the output y is quantitative, e.g.

• Climate models (y = “increase in global temperature”)

• Economic models (y = “change in GDP”)

Classification is when the output y is qualitative, e.g.

• Spam filters (y ∈ {spam, good email})
• Diagnosis systems (y ∈ {ALL, AML, CLL, CML, no leukemia})
• Fingerprint verification (y ∈ {match, no match})

Page 21:

ex) A regression problem

[Figure: distance (feet) vs. speed (mph); panels show the data (xi, yi), a fitted linear model, and predictions (x⋆, y⋆).]

Consider a linear model to explain the data:

y = wx + b

Page 22:

What is a good model?

[Figure: distance (feet) vs. speed (mph), showing the training data T together with four different candidate linear models.]

Page 23:

Learning using maximum likelihood

Assume that each data point can be described by a linear model + noise:

yi = wxi + b + εi,  εi ∼ N(0, σ²)

Maximum likelihood: Think of the εi (dotted) as Gaussian random variables, and choose the model (solid) such that the resulting εi are as likely as possible.
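The generative assumption above can be simulated directly; a minimal Python sketch, with parameter values chosen arbitrarily for illustration (not taken from the lecture data):

```python
import random

random.seed(0)  # reproducible illustration

# Assumed model: yi = w*xi + b + eps_i, with eps_i ~ N(0, sigma^2)
w_true, b_true, sigma = 3.0, 10.0, 5.0

xs = list(range(1, 21))
ys = [w_true * x + b_true + random.gauss(0.0, sigma) for x in xs]
```

Fitting w and b to (xs, ys) and comparing against w_true and b_true is a useful sanity check for any estimator.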

Page 24:

Gaussian (Normal) distribution

Probability density function (PDF) for the scalar Gaussian distribution:

N(z; µ, σ²) = (1/√(2πσ²)) e^(−(z−µ)²/(2σ²))

• µ is the mean (expected value of the distribution)
• σ is the standard deviation
• σ² is the variance

[Figure: Gaussian PDFs for µ = −1, σ² = 0.5; µ = 7, σ² = 0.2; and µ = 4, σ² = 4.]

z ∼ N(z; µ, σ²) means that z is a Gaussian random variable with mean µ and variance σ². The symbol ∼ reads “distributed according to”.
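As a quick sanity check, the density above can be evaluated directly; a minimal Python sketch (the function name is my own):

```python
import math

def gaussian_pdf(z, mu, sigma2):
    """PDF of the scalar Gaussian N(z; mu, sigma^2)."""
    return math.exp(-(z - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The PDF peaks at z = mu, where its value is 1/sqrt(2*pi*sigma^2).
print(gaussian_pdf(0.0, 0.0, 1.0))  # 1/sqrt(2*pi) ≈ 0.3989
```

Note how a smaller variance σ² gives a taller, narrower peak, matching the plotted examples.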

Page 25:

Maximum likelihood

A linear model with Gaussian noise:

yi = wxi + b + εi,  εi ∼ N(0, σ²),  i = 1, . . . , n

The model can also be expressed as

p(yi | xi, w, b) = N(yi; wxi + b, σ²)

Pick w and b which make the data as likely as possible:

w, b = argmax_{w,b} p(y1, . . . , yn | x1, . . . , xn, w, b)

Assume all εi to be independent:

p(y1, . . . , yn | x1, . . . , xn, w, b)
  = Π_{i=1}^{n} p(yi | xi, w, b)
  = Π_{i=1}^{n} N(yi; wxi + b, σ²)
  ∝ e^{ −(1/(2σ²)) Σ_{i=1}^{n} (yi − wxi − b)² }

y and z independent ⇒ p(y, z) = p(y)p(z)

Page 26:

Linear regression

The least squares problem:

w, b = argmin_{w,b} Σ_{i=1}^{n} (wxi + b − yi)²

[Figure: distance (feet) vs. speed (mph), showing the data (xi, yi) and the fitted linear regression model.]
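In the scalar case, the least squares problem above has a closed-form solution; a minimal Python sketch (the data values are made up for illustration, not the course's speed/distance data):

```python
def fit_linear(xs, ys):
    """Closed-form least squares for y = w*x + b:
    w = sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2), b = y_mean - w*x_mean."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    w = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)
    b = y_mean - w * x_mean
    return w, b

# Illustrative data roughly following y = 2x + 1
xs = [5.0, 10.0, 20.0, 30.0, 40.0]
ys = [11.2, 20.5, 41.3, 60.8, 81.0]
w, b = fit_linear(xs, ys)
```

The same solution is what the later lectures' numerical optimizers should converge to on this problem.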

Page 27:

ex) A classification problem

Beaver Body Temperature Data
Input: body temperature x
Output: beaver is outside (y = 1) or inside (y = 0) of the retreat

[Figure: Pr(y = 1 | x) vs. temperature x; left panel shows the data, right panel a linear regression model fitted to the data (xi, yi).]

Linear regression is not suitable, since it is not constrained to [0, 1].

Page 28:

Logistic regression

Consider a binary classification problem y ∈ {0, 1}. Write

pi = Pr(yi = 1 | xi), and thus Pr(yi = 0 | xi) = 1 − pi.

The odds is the ratio between the two class probabilities

Pr(yi = 1 | xi) / Pr(yi = 0 | xi) = pi / (1 − pi) ∈ (0, ∞)

and the log odds consequently

ln( pi / (1 − pi) ) ∈ (−∞, ∞)

Logistic regression: Assume a linear model for the log odds

ln( pi / (1 − pi) ) = wxi + b  ⇒  pi = e^{zi} / (1 + e^{zi}),  zi = wxi + b

Page 29:

Logistic function (aka sigmoid function)

The function f : R → [0, 1] defined as f(z) = e^z / (1 + e^z) is known as the logistic function.

[Figure: the logistic function e^z / (1 + e^z) plotted for z ∈ [−6, 6].]
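In code, the logistic function is usually written with a case split on the sign of z so that exp never overflows; a minimal Python sketch:

```python
import math

def logistic(z):
    """Logistic function e^z / (1 + e^z), rewritten per sign of z for numerical stability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(logistic(0.0))  # 0.5
```

Both branches are algebraically equal to e^z / (1 + e^z); the split just avoids evaluating exp on large positive arguments.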

Page 30:

Logistic regression using maximum likelihood

Pick w and b which make the data as likely as possible:

w, b = argmax_{w,b} ln Pr(y1, . . . , yn | x1, . . . , xn, w, b)

Assume all yi to be independent:

ln Pr(y1, . . . , yn | x1, . . . , xn, w, b)
  = Σ_{i=1}^{n} ln Pr(yi | xi, w, b)
  = Σ_{i: yi=1} ln Pr(yi = 1 | xi, w, b) + Σ_{i: yi=0} ln Pr(yi = 0 | xi, w, b),

where Pr(yi = 1 | xi, w, b) = pi and Pr(yi = 0 | xi, w, b) = 1 − pi. This leads to the following optimization problem:

w, b = argmin_{w,b} − Σ_{i=1}^{n} [ yi ln(pi) + (1 − yi) ln(1 − pi) ]
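The negated log-likelihood being minimized here is the binary cross-entropy loss; a minimal Python sketch (the function name is my own):

```python
import math

def cross_entropy(ys, ps):
    """Negative log-likelihood -sum_i [ yi*ln(pi) + (1-yi)*ln(1-pi) ] for binary labels yi."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(ys, ps))

# Confident, correct predictions give a small loss; hedged or wrong ones a larger loss.
```

For example, predicting p = 0.5 for a positive label costs ln 2 per point, while p close to 1 costs almost nothing.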

ln(·) strictly increasing ⇒ argmax_x f(x) = argmax_x ln f(x)

Page 31:

The Beaver data example

[Figure: Pr(y = 1 | x) vs. temperature x; left panel shows the data, right panel the fitted logistic regression model, with regions labeled y = 0 and y = 1.]

Page 32:

Linear and logistic regression
Multidimensional input

Linear and logistic regression also work with multidimensional inputs

xi = [xi1, . . . , xip]^T,  i = 1, . . . , n

We assign one parameter to each input dimension:

w = [w1, . . . , wp]^T

[Figure: linear regression (left) and logistic regression (right) fitted to data with two-dimensional inputs.]

Linear regression: yi = w^T xi + b
Logistic regression: Pr(yi = 1 | xi) = e^{w^T xi + b} / (1 + e^{w^T xi + b})

Page 33:

Linear and logistic regression

         Linear regression        Logistic regression
Output   yi ∈ R                   yi ∈ {0, 1}
Model    ŷi = w^T xi + b          pi = Pr(yi = 1 | xi) = e^{w^T xi + b} / (1 + e^{w^T xi + b})
Loss     Li = (ŷi − yi)²          Li = −yi ln(pi) − (1 − yi) ln(1 − pi)

We find w and b by minimizing the sum of the losses:

w, b = argmin_{w,b} (1/n) Σ_{i=1}^{n} Li

• For linear regression, the problem can be solved analytically.
• For logistic regression, we need to use numerical optimization (we will talk about it in the next lecture).

Page 34:

Hand-in assignment 1A
Classifying biopsies using logistic regression

Biopsy data {(xi, yi)}, i = 1, . . . , n, from n = 699 breast tumors

Input:
xi1 clump thickness
xi2 uniformity of cell size
xi3 uniformity of cell shape
...
xi9 mitoses

Output: yi benign/malignant

Tasks:

• Derive the gradients of the cost function w.r.t. the parameters

• Implement the logistic regression model

• Update the parameters with gradient descent

The code will be extended to a neural network in HA 1B.
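The three tasks above can be sketched for a one-dimensional input (the biopsy data itself is 9-dimensional; this toy version only illustrates the gradient expressions, which for the mean cross-entropy cost work out to dC/dw = (1/n) Σ (pi − yi) xi and dC/db = (1/n) Σ (pi − yi)):

```python
import math

def sigmoid(z):
    # Logistic function, split on the sign of z to avoid overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Gradient descent on the mean cross-entropy cost C = (1/n) sum_i Li.
    Gradients: dC/dw = (1/n) sum_i (pi - yi) * xi,  dC/db = (1/n) sum_i (pi - yi)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        ps = [sigmoid(w * x + b) for x in xs]
        w -= lr * sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / n
        b -= lr * sum(p - y for p, y in zip(ps, ys)) / n
    return w, b

# Toy separable data: negative x -> class 0, positive x -> class 1.
w, b = train_logistic([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
```

Generalizing to the assignment's 9-dimensional inputs means replacing the scalar products w * x with inner products w^T x and accumulating one gradient component per input dimension.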

Page 35:

A few concepts to summarize lecture 1

Machine Learning: Deals with learning, reasoning and acting based on data.

Deep Learning: A set of machine learning methods that allow models composed of multiple processing layers.

Regression: Learning problem where the output is quantitative.

Classification: Learning problem where the output is qualitative.

Maximum likelihood: Learning objective based on probability theory.

Linear Regression: Linear model for regression problems.

Logistic Regression: Linear model for classification problems.
